Models and Architectures in Word2vec

bebound@gmail.com (KK) — Fri, 05 Jan 2018 15:14:00 +0800

Generally, word2vec is a language model to predict the words probability based on the context. When build the model, it create word embedding for each word, and word embedding is widely used in many NLP tasks.

Models

CBOW (Continuous Bag of Words)

Use the context to predict the probability of current word. (In the picture, the word is encoded with one-hot encoding, \(W_{V*N}\) is word embedding, and \(W_{V*N}^{’}\), the output weight matrix in hidden layer, is same as \(\hat{\upsilon}\) in following equations)

Parameters in doc2vec

bebound@gmail.com (KK) — Thu, 03 Aug 2017 15:20:00 +0800

Here are some parameter in gensim’s doc2vec class.

window

window is the maximum distance between the predicted word and context words used for prediction within a document. It will look behind and ahead.

In skip-gram model, if the window size is 2, the training samples will be this:(the blue word is the input word)

min_count

If the word appears less than this value, it will be skipped

sample

High frequency word like the is useless for training. sample is a threshold for deleting these higher-frequency words. The probability of keeping the word \(w_i\) is: