<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Transformer on KK's Blog (fromkk)</title><link>https://fromkk.com/tags/transformer/</link><description>Recent content in Transformer on KK's Blog (fromkk)</description><generator>Hugo</generator><language>en</language><managingEditor>bebound@gmail.com (KK)</managingEditor><webMaster>bebound@gmail.com (KK)</webMaster><lastBuildDate>Sun, 10 Aug 2025 18:44:05 +0800</lastBuildDate><atom:link href="https://fromkk.com/tags/transformer/index.xml" rel="self" type="application/rss+xml"/><item><title>The Annotated The Annotated Transformer</title><link>https://fromkk.com/posts/the-annotated-the-annotated-transformer/</link><pubDate>Sun, 01 Sep 2019 16:00:00 +0800</pubDate><author>bebound@gmail.com (KK)</author><guid>https://fromkk.com/posts/the-annotated-the-annotated-transformer/</guid><description>&lt;p&gt;Thanks to the articles I list at the end of this post, I understand how the Transformer works. These posts are comprehensive, but some points still confused me.&lt;/p&gt;
&lt;p&gt;First, this is the diagram referenced by almost every post about the Transformer.&lt;/p&gt;
&lt;figure class="image-size-s"&gt;&lt;img src="https://fromkk.com/images/transformer_main.png"&gt;
&lt;/figure&gt;

&lt;p&gt;The Transformer consists of these parts: Input, Encoder*N, Output Input, Decoder*N, Output. I&amp;rsquo;ll explain them step by step.&lt;/p&gt;
&lt;h2 id="input"&gt;Input&lt;/h2&gt;
&lt;p&gt;Each input word is mapped to a 512-dimensional vector. A Positional Encoding (PE) is then generated and added to the original embeddings.&lt;/p&gt;</description></item></channel></rss>