This short tutorial covers the basics of the Transformer, a neural network architecture designed for handling sequential data in machine learning.

Timestamps:
0:00 - Intro
1:18 - Motivation for developing the Transformer
2:44 - Input embeddings (start of encoder walk-through)
3:29 - Attention
6:29 - Multi-head attention
7:55 - Positional encodings
9:59 - Add & norm, feedforward, & stacking encoder layers
11:14 - Masked multi-head attention (start of decoder walk-through)
12:35 - Cross-attention
13:38 - Decoder output & prediction probabilities
14:46 - Complexity analysis
16:00 - Transformers as graph neural networks

Original Transformers paper: Attention Is All You Need -

Other papers mentioned:
(GPT-3) Language Models are Few-Shot Learners -
(DALL-E) Zero-Shot Text-to-Image Generation -
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding -
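For reference, the attention mechanism walked through at 3:29 reduces to a few lines of code. Below is a minimal NumPy sketch of scaled dot-product attention, the building block the video builds multi-head attention from; it shows a single head with no masking or learned projections, and the function name and toy shapes are illustrative rather than taken from the video:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Single-head scaled dot-product attention (illustrative sketch).

    Q, K: arrays of shape (seq_len, d_k); V: shape (seq_len, d_v).
    Returns the attended values, shape (seq_len, d_v).
    """
    d_k = Q.shape[-1]
    # Similarity of every query with every key, scaled by sqrt(d_k)
    # so the softmax stays in a well-conditioned range.
    scores = Q @ K.T / np.sqrt(d_k)
    # Row-wise softmax turns scores into attention weights.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output row is a weighted average of the value vectors.
    return weights @ V

# Toy usage: self-attention over 4 tokens with 8-dim embeddings.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
out = scaled_dot_product_attention(x, x, x)
print(out.shape)  # (4, 8)
```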