<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Word2vec on KK's Blog (fromkk)</title><link>https://fromkk.com/tags/word2vec/</link><description>Recent content in Word2vec on KK's Blog (fromkk)</description><generator>Hugo</generator><language>en</language><managingEditor>bebound@gmail.com (KK)</managingEditor><webMaster>bebound@gmail.com (KK)</webMaster><lastBuildDate>Sun, 10 Aug 2025 18:44:05 +0800</lastBuildDate><atom:link href="https://fromkk.com/tags/word2vec/index.xml" rel="self" type="application/rss+xml"/><item><title>Models and Architectures in Word2vec</title><link>https://fromkk.com/posts/models-and-architechtures-in-word2vec/</link><pubDate>Fri, 05 Jan 2018 15:14:00 +0800</pubDate><author>bebound@gmail.com (KK)</author><guid>https://fromkk.com/posts/models-and-architechtures-in-word2vec/</guid><description>&lt;p&gt;Generally, &lt;code&gt;word2vec&lt;/code&gt; is a language model that predicts the probability of a word given its context. While training the model, it creates a word embedding for each word, and these embeddings are widely used in many NLP tasks.&lt;/p&gt;
&lt;h2 id="models"&gt;Models&lt;/h2&gt;
&lt;h3 id="cbow--continuous-bag-of-words"&gt;CBOW (Continuous Bag of Words)&lt;/h3&gt;
&lt;p&gt;Use the context to predict the probability of the current word. (In the picture, each word is one-hot encoded, \(W_{V*N}\) is the word embedding matrix, and \(W_{V*N}^{&amp;rsquo;}\), the output weight matrix of the hidden layer, is the same as \(\hat{\upsilon}\) in the following equations.)&lt;/p&gt;</description></item><item><title>Parameters in doc2vec</title><link>https://fromkk.com/posts/parameters-in-dov2vec/</link><pubDate>Thu, 03 Aug 2017 15:20:00 +0800</pubDate><author>bebound@gmail.com (KK)</author><guid>https://fromkk.com/posts/parameters-in-dov2vec/</guid><description>&lt;p&gt;Here are some of the parameters of &lt;code&gt;gensim&lt;/code&gt;&amp;rsquo;s &lt;code&gt;doc2vec&lt;/code&gt; class.&lt;/p&gt;
&lt;h3 id="window"&gt;window&lt;/h3&gt;
&lt;p&gt;&lt;code&gt;window&lt;/code&gt; is the maximum distance between the predicted word and the context words used for prediction within a document. It looks both behind and ahead of the predicted word.&lt;/p&gt;
&lt;p&gt;In the &lt;code&gt;skip-gram&lt;/code&gt; model, if the window size is 2, the training samples look like this (the blue word is the input word):&lt;/p&gt;
&lt;figure class="image-size-s"&gt;&lt;img src="https://fromkk.com/images/doc2vec_window.png"&gt;
&lt;/figure&gt;
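&lt;p&gt;As a rough illustration of the windowing described above, here is a minimal sketch that lists the (input word, context word) pairs for a symmetric window. The function name &lt;code&gt;skipgram_pairs&lt;/code&gt; and the sample sentence are hypothetical and are not part of &lt;code&gt;gensim&lt;/code&gt;; the sketch also ignores any extra sampling the library may apply during training.&lt;/p&gt;

```python
def skipgram_pairs(tokens, window=2):
    """Return (input_word, context_word) pairs for a symmetric window.

    Hypothetical helper for illustration only; not a gensim function.
    """
    pairs = []
    for i, center in enumerate(tokens):
        # Look up to `window` positions behind and ahead, clipped to the sentence.
        start = max(0, i - window)
        stop = min(len(tokens), i + window + 1)
        for j in range(start, stop):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


# Assumed example sentence, already tokenized by whitespace.
sentence = "the quick brown fox jumps".split()
pairs = skipgram_pairs(sentence, window=2)
for input_word, context_word in pairs:
    print(input_word, context_word)
```

&lt;p&gt;Words near the start or end of the sentence simply get fewer pairs, because the window is clipped at the sentence boundary.&lt;/p&gt;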

&lt;h3 id="min-count"&gt;min_count&lt;/h3&gt;
&lt;p&gt;Words that appear fewer than this many times in the corpus are skipped.&lt;/p&gt;
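&lt;p&gt;The effect of &lt;code&gt;min_count&lt;/code&gt; can be sketched in plain Python as a frequency filter. This is only an assumption about the behaviour, not &lt;code&gt;gensim&lt;/code&gt;&amp;rsquo;s actual code; the helper name &lt;code&gt;filter_rare_words&lt;/code&gt; and the toy documents are made up for the example.&lt;/p&gt;

```python
from collections import Counter


def filter_rare_words(documents, min_count=2):
    """Drop every word whose total corpus frequency is below min_count.

    Hypothetical helper for illustration only; not a gensim function.
    """
    # Count each word across the whole corpus, not per document.
    counts = Counter(word for doc in documents for word in doc)
    return [[word for word in doc if counts[word] >= min_count] for doc in documents]


# Assumed toy corpus of pre-tokenized documents.
documents = [
    ["the", "cat", "sat"],
    ["the", "dog", "sat"],
    ["a", "cat"],
]
filtered = filter_rare_words(documents, min_count=2)
print(filtered)
```

&lt;p&gt;Here &lt;code&gt;dog&lt;/code&gt; and &lt;code&gt;a&lt;/code&gt; each occur only once, so they are dropped, while the other words are kept.&lt;/p&gt;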
&lt;h3 id="sample"&gt;sample&lt;/h3&gt;
&lt;p&gt;High-frequency words like &lt;code&gt;the&lt;/code&gt; contribute little to training. &lt;code&gt;sample&lt;/code&gt; is a threshold for randomly downsampling these higher-frequency words. The probability of keeping the word \(w_i\) is:&lt;/p&gt;</description></item></channel></rss>