# What can Transformers do?

Install the Transformers, Datasets, and Evaluate libraries to run this notebook.

In [None]:
!pip install datasets evaluate transformers[sentencepiece]


In [None]:
from transformers import pipeline

## sentiment-analysis (English sentiment analysis)

Classifies whether the sentiment of a sentence is positive or negative.

In [None]:
classifier = pipeline("sentiment-analysis")
classifier(
    # The last input is Korean for "I am hungry"; the default checkpoint is English-only.
    ["I've been waiting for a HuggingFace course my whole life.", "I hate this so much!", "I love PARA", "나는 배고프다"]
)

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'label': 'POSITIVE', 'score': 0.9598048329353333},
 {'label': 'NEGATIVE', 'score': 0.9994558691978455},
 {'label': 'POSITIVE', 'score': 0.9998016953468323},
 {'label': 'POSITIVE', 'score': 0.6206148862838745}]
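The fourth input is Korean, while the default checkpoint (its name ends in `-english`) was fine-tuned on English text, so its POSITIVE label at roughly 0.62 should not be trusted. The sketch below flags low-confidence predictions for review. The results are copied from the output above; the helper name `flag_uncertain` and the 0.9 threshold are illustrative assumptions, not part of the library.

```python
# Inputs and results copied from the cell and output above.
sentences = [
    "I've been waiting for a HuggingFace course my whole life.",
    "I hate this so much!",
    "I love PARA",
    "나는 배고프다",
]
results = [
    {"label": "POSITIVE", "score": 0.9598048329353333},
    {"label": "NEGATIVE", "score": 0.9994558691978455},
    {"label": "POSITIVE", "score": 0.9998016953468323},
    {"label": "POSITIVE", "score": 0.6206148862838745},
]


def flag_uncertain(sentences, results, threshold=0.9):
    """Return (sentence, label, score) for predictions scoring below the threshold.

    The threshold is an arbitrary illustrative value, not a recommended setting.
    """
    return [
        (sentence, result["label"], result["score"])
        for sentence, result in zip(sentences, results)
        if result["score"] < threshold
    ]


uncertain = flag_uncertain(sentences, results)
for sentence, label, score in uncertain:
    print(f"{sentence!r}: {label} ({score:.2f}) needs review")
```

A high score is not proof of a correct label either: "I love PARA" scores above 0.99 even though the model has no idea what PARA is.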

## zero-shot-classification

Scores how closely a given sentence matches each of the candidate labels you supply.



In [None]:
classifier = pipeline("zero-shot-classification")
classifier(
    ["This is a course about the Transformers library",'i want eat hamburger'],
    candidate_labels=["education", "politics", "business","food"],
)

No model was supplied, defaulted to facebook/bart-large-mnli and revision c626438 (https://huggingface.co/facebook/bart-large-mnli).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'sequence': 'This is a course about the Transformers library',
  'labels': ['education', 'business', 'food', 'politics'],
  'scores': [0.7947172522544861,
   0.10536065697669983,
   0.05906017869710922,
   0.04086194932460785]},
 {'sequence': 'i want eat hamburger',
  'labels': ['food', 'business', 'education', 'politics'],
  'scores': [0.9946014285087585,
   0.00301549956202507,
   0.0012812966015189886,
   0.0011017395881935954]}]
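Each result holds two parallel lists, `labels` and `scores`, and with the default settings the scores for one sequence add up to about 1. A small sketch of pulling out the best label per sequence, using the output above as data (the helper name `top_label` is hypothetical):

```python
# Results copied from the output above.
results = [
    {
        "sequence": "This is a course about the Transformers library",
        "labels": ["education", "business", "food", "politics"],
        "scores": [
            0.7947172522544861,
            0.10536065697669983,
            0.05906017869710922,
            0.04086194932460785,
        ],
    },
    {
        "sequence": "i want eat hamburger",
        "labels": ["food", "business", "education", "politics"],
        "scores": [
            0.9946014285087585,
            0.00301549956202507,
            0.0012812966015189886,
            0.0011017395881935954,
        ],
    },
]


def top_label(result):
    """Return the (label, score) pair with the highest score."""
    scores = result["scores"]
    best = max(range(len(scores)), key=scores.__getitem__)
    return result["labels"][best], scores[best]


top = [top_label(r) for r in results]
for result, (label, score) in zip(results, top):
    print(f"{result['sequence']!r} -> {label} ({score:.3f})")
```

Taking the maximum explicitly, rather than reading index 0, keeps the helper correct even if the lists are not sorted.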

## text-generation

Generates text, as the name suggests: the model predicts a continuation of the given prompt and outputs it.

In [None]:
generator = pipeline("text-generation")
generator("In this course, we will teach you how to print star in c lang")

No model was supplied, defaulted to openai-community/gpt2 and revision 6c0e608 (https://huggingface.co/openai-community/gpt2).
Using a pipeline without specifying a model name and revision in production is not recommended.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'In this course, we will teach you how to print star in c lang. In this lesson you will learn how to print star in c lang in the following format.\n\nYou need to use the following commands to print your star in c lang'}]
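The warning above recommends naming the model and revision explicitly instead of relying on the task default. A minimal sketch of doing so, with the checkpoint name and revision copied from that warning; the helper `build_generator` is a hypothetical wrapper, and the prompt in the usage comment is only an example.

```python
# Checkpoint name and revision copied from the warning message above.
MODEL_ID = "openai-community/gpt2"
REVISION = "6c0e608"


def build_generator(model_id=MODEL_ID, revision=REVISION):
    """Create a text-generation pipeline pinned to one checkpoint and revision."""
    # Imported inside the function so that defining the helper does not
    # load the library or download anything.
    from transformers import pipeline

    return pipeline("text-generation", model=model_id, revision=revision)


# Usage (downloads the checkpoint on first call):
# generator = build_generator()
# generator("In this course, we will teach you how to", max_new_tokens=30)
```

Pinning the revision means a later update to the checkpoint on the Hub does not silently change your results.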

In [None]:
generator = pipeline("text-generation", model="distilgpt2")
generator(
    # Korean prompt, roughly "I eat breakfast and"; distilgpt2 was trained mostly on English.
    "나는 아침밥을 먹고",
    max_length=100,
    num_return_sequences=2,
)

Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': '나는 아침밥을 먹고도 문그 어까 나밥의 괨는 너라 문그 가 쨄하 는 아침밥을 삐영음 뼈�'},
 {'generated_text': '나는 아침밥을 먹고려는우 그습다.\n\n\n\n\n\n\n\n\n\n'}]
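The continuation is not meaningful Korean, and the first result ends in `�`. One plausible explanation for that character is that GPT-2 style tokenizers work on UTF-8 bytes and each Hangul syllable takes three bytes, so output that stops partway through a syllable leaves bytes that cannot be decoded. The sketch below reproduces the effect with the standard library only; it illustrates the encoding issue and does not call the model.

```python
# One Hangul syllable from the prompt above.
syllable = "밥"
raw = syllable.encode("utf-8")
print(len(raw))  # number of bytes used by a single syllable

# Decoding the complete byte sequence round-trips cleanly.
print(raw.decode("utf-8"))

# Dropping the last byte simulates output cut off mid-character;
# errors="replace" substitutes U+FFFD for the undecodable bytes.
truncated = raw[:2].decode("utf-8", errors="replace")
print(truncated)
```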

## fill-mask

Predicts what should fill the masked positions in the given text.

In [None]:
unmasker = pipeline("fill-mask")
unmasker("This <mask> will teach you all about <mask> models.", top_k=5)

No model was supplied, defaulted to distilbert/distilroberta-base and revision ec58a5b (https://huggingface.co/distilbert/distilroberta-base).
Using a pipeline without specifying a model name and revision in production is not recommended.
Some weights of the model checkpoint at distilbert/distilroberta-base were not used when initializing RobertaForMaskedLM: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
- This IS expected if you are initializing RobertaForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


[[{'score': 0.2166784256696701,
   'token': 35950,
   'token_str': ' tutorial',
   'sequence': '<s>This tutorial will teach you all about<mask> models.</s>'},
  {'score': 0.19827687740325928,
   'token': 1566,
   'token_str': ' article',
   'sequence': '<s>This article will teach you all about<mask> models.</s>'},
  {'score': 0.06834729015827179,
   'token': 1040,
   'token_str': ' book',
   'sequence': '<s>This book will teach you all about<mask> models.</s>'},
  {'score': 0.06521215289831161,
   'token': 618,
   'token_str': ' post',
   'sequence': '<s>This post will teach you all about<mask> models.</s>'},
  {'score': 0.047874681651592255,
   'token': 4704,
   'token_str': ' guide',
   'sequence': '<s>This guide will teach you all about<mask> models.</s>'}],
 [{'score': 0.1583120971918106,
   'token': 30412,
   'token_str': ' mathematical',
   'sequence': '<s>This<mask> will teach you all about mathematical models.</s>'},
  {'score': 0.03415736183524132,
   'token': 38163,
   'token

## ner (named entity recognition)

Identifies which entity type each word or phrase in a sentence belongs to.
Example types: PER (person), LOC (location), ORG (organization), and so on.

In [None]:
# aggregation_strategy="simple" replaces the deprecated grouped_entities=True.
ner = pipeline("ner", aggregation_strategy="simple")
ner("My name is Chaeho and I work at PARA in YongSan Seoul")

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision f2482bf (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Some weights of the model checkpoint at dbmdz/bert-large-cased-finetuned-conll03-english were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


[{'entity_group': 'PER',
  'score': 0.9954087,
  'word': 'Chaeho',
  'start': 11,
  'end': 17},
 {'entity_group': 'ORG',
  'score': 0.9946884,
  'word': 'PARA',
  'start': 32,
  'end': 36},
 {'entity_group': 'LOC',
  'score': 0.982465,
  'word': 'YongSan Seoul',
  'start': 40,
  'end': 53}]
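The `start` and `end` values are character offsets into the input string, so they can be used to mark up the original text. A small sketch using the output above as data; the helper `annotate` and the `[text|TYPE]` notation are illustrative choices, not a library feature.

```python
# Input sentence and entities copied from the cell and output above.
text = "My name is Chaeho and I work at PARA in YongSan Seoul"
entities = [
    {"entity_group": "PER", "score": 0.9954087, "word": "Chaeho", "start": 11, "end": 17},
    {"entity_group": "ORG", "score": 0.9946884, "word": "PARA", "start": 32, "end": 36},
    {"entity_group": "LOC", "score": 0.982465, "word": "YongSan Seoul", "start": 40, "end": 53},
]


def annotate(text, entities):
    """Wrap each entity span as [span|TYPE] using its character offsets."""
    out = text
    # Work from right to left so earlier offsets stay valid after each insertion.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        span = out[ent["start"]:ent["end"]]
        out = out[:ent["start"]] + f"[{span}|{ent['entity_group']}]" + out[ent["end"]:]
    return out


annotated = annotate(text, entities)
print(annotated)
# My name is [Chaeho|PER] and I work at [PARA|ORG] in [YongSan Seoul|LOC]
```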

## question-answering

Answers a question based on the given context passage.

In [None]:
question_answerer = pipeline("question-answering")
question_answerer(
    question="what is my name",
    context="My name is Sylvain and I work at Hugging Face in Brooklyn",
)

No model was supplied, defaulted to distilbert/distilbert-base-cased-distilled-squad and revision 626af31 (https://huggingface.co/distilbert/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.


{'score': 0.991936981678009, 'start': 11, 'end': 18, 'answer': 'Sylvain'}
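This pipeline is extractive: the answer is a span copied from the context rather than newly generated text. The check below, using the values from the output above, shows that slicing the context with `start` and `end` recovers the answer.

```python
# Context and result copied from the cell and output above.
context = "My name is Sylvain and I work at Hugging Face in Brooklyn"
result = {"score": 0.991936981678009, "start": 11, "end": 18, "answer": "Sylvain"}

# The answer is the substring of the context between the two offsets.
extracted = context[result["start"]:result["end"]]
print(extracted)  # Sylvain
print(extracted == result["answer"])  # True
```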

## summarization

In [None]:
summarizer = pipeline("summarization")
summarizer(
    """
    We understand that the segregation of our consciousness into present, past, and future is both a fiction and an oddly self-referential framework; your present was part of your mother's future, and your children's past will be in part your present. Nothing is generally wrong with structuring our consciousness of time in this conventional manner, and it often works well enough. In the case of climate change, however, the sharp division of time into past, present, and future has been desperately misleading and has, most importantly, hidden from view the extent of the responsibility of those of us alive now.
"""
)

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.
Your max_length is set to 142, but your input_length is only 127. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=63)


[{'summary_text': ' We understand that the segregation of our consciousness into present, past, and future is both a fiction and an oddly self-referential framework . Nothing is generally wrong with structuring our consciousness of time in this manner, and it often works well enough . In the case of climate change, however, the division of time into past, present and future has been desperately misleading and has, most importantly, hidden from view the extent of the responsibility of those alive now .'}]
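The warning above notes that the default `max_length` of 142 exceeds the 127-token input, which is why the "summary" is almost a copy of the original, and it suggests `max_length=63`. A tiny sketch of that rule of thumb follows; the helper `suggest_max_length` and the 0.5 ratio are assumptions chosen to match the suggested value, not a documented formula.

```python
def suggest_max_length(input_length, ratio=0.5):
    """Heuristic: cap the summary at a fraction of the input length (in tokens)."""
    return max(1, int(input_length * ratio))


suggested = suggest_max_length(127)
print(suggested)  # 63, the value suggested in the warning above

# Usage with the pipeline from the cell above (not run here):
# summarizer(text, max_length=suggested)
```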

## translation

In [None]:
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-ko-en")
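# The input is deliberately misspelled Korean, roughly "I will write this so that only Koreans can read it."
translator("""한꾹인뜰만알아뽈쑤있께짝썽하껬씁니따.""")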

[{'translation_text': "It's the only place in the courtyard where you can see it, and you can see it with your eyelids."}]