Natural Language Processing: Word2Vec

Lectures by Kiyoung Kim (김기영)

 

https://github.com/kiyoungkim1/ReadyToUseAI

 


 

Korpora: a package of corpora for Korean natural language processing

https://github.com/ko-nlp/Korpora

 


 

Test / Train / Validation
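
A minimal sketch of splitting a dataset into train, validation, and test sets. The 80/10/10 ratio, the fixed seed, and the helper name `split_dataset` are illustrative assumptions, not taken from the lecture.

```python
import random


def split_dataset(items, train_ratio=0.8, val_ratio=0.1, seed=42):
    """Shuffle items and split them into train / validation / test lists."""
    shuffled = list(items)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_ratio)
    n_val = int(len(shuffled) * val_ratio)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains
    return train, val, test


train, val, test = split_dataset(range(100))
print(len(train), len(val), len(test))  # 80 10 10
```

The model is fit on the train set, hyperparameters are tuned against the validation set, and the test set is held back for a final evaluation only.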

Embedding

 

Tokenizer, subword units
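
To make "subword units" concrete, here is a rough sketch of greedy longest-match-first splitting in the style of WordPiece. The toy vocabulary, the `##` continuation prefix, and the function name `wordpiece_tokenize` are assumptions for illustration; real tokenizers learn their vocabulary from a corpus.

```python
def wordpiece_tokenize(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first split of one word into subword pieces."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        # Try the longest remaining substring first, then shrink it.
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # marks a word-internal piece
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            # No piece matches at this position: the whole word is unknown.
            return [unk]
        pieces.append(piece)
        start = end
    return pieces


# Hypothetical toy vocabulary.
vocab = {"token", "##izer", "##s", "sub", "##word"}
print(wordpiece_tokenize("tokenizers", vocab))  # ['token', '##izer', '##s']
print(wordpiece_tokenize("subword", vocab))     # ['sub', '##word']
```

Splitting into subwords lets a fixed-size vocabulary cover rare or unseen words by composing them from smaller known pieces.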

 

 

 

TF-IDF (Term Frequency-Inverse Document Frequency)

 

 

 

 

 

 

 

 

 

 

 

 

 

1. Bag of Words, TF-IDF

https://www.youtube.com/watch?v=Z201jwWo-xs 
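
A small sketch of both ideas on a toy corpus, assuming the textbook definitions tf = count / document length and idf = log(N / df). The example sentences are made up, and libraries such as scikit-learn use smoothed variants of this formula, so their numbers will differ.

```python
import math
from collections import Counter

# Hypothetical toy corpus.
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs",
]
tokenized = [doc.split() for doc in docs]

# Bag of words: per-document term counts; word order is discarded.
bow = [Counter(tokens) for tokens in tokenized]

# Document frequency: in how many documents does each term appear?
df = Counter(term for tokens in tokenized for term in set(tokens))
n_docs = len(docs)


def tfidf(term, doc_index):
    """TF-IDF of a term that occurs somewhere in the corpus."""
    tf = bow[doc_index][term] / len(tokenized[doc_index])
    idf = math.log(n_docs / df[term])
    return tf * idf


print(bow[0])
print(tfidf("the", 0), tfidf("cat", 0))
```

Although "the" occurs twice in the first document and "cat" only once, "cat" scores higher because it appears in fewer documents.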

 

 

2. word2vec, fastText, and doc2vec: Embedding, Vectorization, Hyperparameter, Inference

    word2vec (vector size, window, min_count)

https://youtu.be/5ivVf-Guqk4
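
To illustrate what window and min_count control, the sketch below builds (center, context) training pairs in the skip-gram style. The function name `skipgram_pairs` and the sentences are hypothetical, and details of real implementations such as subsampling and negative sampling are left out.

```python
from collections import Counter


def skipgram_pairs(sentences, window=2, min_count=1):
    """Build (center, context) pairs from tokenized sentences."""
    counts = Counter(word for sentence in sentences for word in sentence)
    pairs = []
    for sentence in sentences:
        # min_count: words seen fewer than min_count times are dropped.
        kept = [w for w in sentence if counts[w] >= min_count]
        for i, center in enumerate(kept):
            # window: context is up to `window` words on each side.
            lo = max(0, i - window)
            hi = min(len(kept), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    pairs.append((center, kept[j]))
    return pairs


sentences = [
    ["i", "like", "natural", "language", "processing"],
    ["i", "like", "word", "vectors"],
]
print(skipgram_pairs(sentences, window=1, min_count=2))
# [('i', 'like'), ('like', 'i'), ('i', 'like'), ('like', 'i')]
```

A larger window tends to capture broader topical context, while a higher min_count trims rare words whose vectors would be poorly estimated.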

size 100

Each word is represented as an array of 100 numbers, so every word in the vocabulary gets its own 100-dimensional vector.
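
A sketch of what that means as a data structure, assuming a three-word vocabulary and random starting values (both are illustrative). Training would then adjust these numbers so that words used in similar contexts end up with similar vectors.

```python
import random

vector_size = 100
vocab = ["king", "queen", "apple"]  # hypothetical vocabulary
rng = random.Random(0)

# One row per word: the embedding table has len(vocab) rows
# and vector_size columns.
embeddings = {
    word: [rng.uniform(-0.5, 0.5) / vector_size for _ in range(vector_size)]
    for word in vocab
}

print(len(embeddings), len(embeddings["king"]))  # 3 100
```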

 

 


 

 

 

 

 

 

 

 

3. Transformer and Transfer Learning

https://youtu.be/9HDBKS4j64M

 

Autoencoding model and Autoregressive model
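
The difference between the two model families can be sketched by how training examples are built from a sentence. Autoregressive models (GPT-style) predict the next token from the left context only; autoencoding models (BERT-style) reconstruct hidden tokens from context on both sides. The function names are hypothetical, and masking every position in turn is a simplification: BERT masks a random subset of tokens.

```python
def autoregressive_examples(tokens):
    """Next-token prediction: each prefix predicts the following token."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


def autoencoding_examples(tokens, mask_token="[MASK]"):
    """Masked prediction: hide one token, predict it from both sides."""
    examples = []
    for i, target in enumerate(tokens):
        corrupted = tokens[:i] + [mask_token] + tokens[i + 1:]
        examples.append((corrupted, target))
    return examples


tokens = ["the", "cat", "sat"]
print(autoregressive_examples(tokens))
# [(['the'], 'cat'), (['the', 'cat'], 'sat')]
print(autoencoding_examples(tokens)[1])
# (['the', '[MASK]', 'sat'], 'cat')
```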

 

 

 

 

 

4. Hugging Face

https://youtu.be/sSy8ufyiuDY