The Annotated The Annotated Transformer

bebound@gmail.com (KK) — Sun, 01 Sep 2019 16:00:00 +0800

Thanks to the articles I list at the end of this post, I understand how Transformers work. Those posts are comprehensive, but a few points still confused me.

First, this is the diagram referenced by almost every post about the Transformer.

The Transformer consists of these parts: Input, Encoder*N, Output Input (the target sequence fed to the decoder), Decoder*N, and Output. I'll explain them step by step.

Input

Each input word is mapped to a 512-dimensional vector. A Positional Encoding (PE) is then generated and added to the original embeddings.
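To make the input step concrete, here is a minimal sketch in plain Python. It assumes the sinusoidal encoding from the original Transformer paper (sine on even dimensions, cosine on odd ones). The names `positional_encoding`, `embed`, and `table`, as well as the 10-token vocabulary with random embeddings, are made up for illustration. The scaling of embeddings by the square root of the model dimension used in the paper is left out to keep it short.

```python
import math
import random

D_MODEL = 512  # embedding size mentioned in the text


def positional_encoding(seq_len, d_model=D_MODEL):
    """Return a seq_len x d_model list of sinusoidal position vectors."""
    pe = [[0.0] * d_model for _ in range(seq_len)]
    for pos in range(seq_len):
        for i in range(0, d_model, 2):  # i walks over the even dimensions
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)          # even dimension
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)  # odd dimension
    return pe


def embed(token_ids, table):
    """Look up each token's embedding and add the PE of its position."""
    pe = positional_encoding(len(token_ids), len(table[0]))
    return [
        [e + p for e, p in zip(table[tok], pe[pos])]
        for pos, tok in enumerate(token_ids)
    ]


# Hypothetical toy vocabulary: 10 tokens with random embeddings.
random.seed(0)
table = [[random.gauss(0.0, 1.0) for _ in range(D_MODEL)] for _ in range(10)]

x = embed([3, 1, 4], table)  # a 3-token "sentence"
print(len(x), len(x[0]))
```

The result has one 512-dimensional vector per token. The same word at two different positions gets two different vectors, which is how the model can tell positions apart without recurrence.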
