Transformer
The Illustrated Transformer
- The Transformer is the first sequence transduction model based entirely on attention, replacing recurrence and convolution

- Self-attention in both the encoder and the decoder is the cornerstone of the Transformer

- Represent each word with an embedding vector
- Encode each word into a feature vector with self-attention
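The two steps above — embed each word, then use self-attention to turn the embeddings into context-aware feature vectors — can be sketched with NumPy. This is a minimal illustration of scaled dot-product self-attention; the sequence length, dimensions, and random weight matrices here are placeholders, not values from the paper:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # X: (seq_len, d_model) word embedding vectors.
    # Project each embedding into a query, key, and value.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    # Scaled dot-product attention: every word attends to every word.
    scores = Q @ K.T / np.sqrt(d_k)
    weights = softmax(scores, axis=-1)   # rows sum to 1
    return weights @ V                   # (seq_len, d_k) feature vectors

# Toy example: 3 "words", illustrative dimensions d_model = d_k = 4.
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))                          # embedding vectors
Wq, Wk, Wv = (rng.normal(size=(4, 4)) for _ in range(3))
features = self_attention(X, Wq, Wk, Wv)
print(features.shape)  # (3, 4)
```

Each output row mixes information from the whole sequence, weighted by how strongly that word's query matches every word's key — this is what "encode each word into a feature vector with self-attention" means concretely.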
