Attention is one of the most important concepts behind Transformers and Large Language Models, like ChatGPT. However, it's not that complicated. In this StatQuest, we add Attention to a basic Sequence-to-Sequence (Seq2Seq or Encoder-Decoder) model and walk through how it works and is calculated, one step at a time. BAM!!!

If you'd like to support StatQuest, please consider...
Patreon: ...or...
YouTube Membership: ...buying my book, a study guide, a t-shirt or hoodie, or a song from the StatQuest store...
...or just donating to StatQuest!

Lastly, if you want to keep up with me as I research and create new StatQuests, follow me on twitter:

0:00 Awesome song and introduction
3:14 The Main Idea of Attention
5:34 A worked out example of Attention
10:18 The Dot Product Similarity
11:52 Using similarity scores to calculate Attention values
13:27 Using Attention values to predict an output word
14:22 Summary of Attention

#StatQuest #neuralnetwork #attention
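
For reference, here is a minimal NumPy sketch of the dot-product attention steps outlined in the chapters above (similarity scores, then softmax weights, then attention values). The shapes, variable names, and toy numbers are illustrative assumptions, not taken from the video.

```python
# A minimal sketch of dot-product attention for one decoder step in a
# Seq2Seq (Encoder-Decoder) model. All names and shapes are assumptions
# chosen for illustration, not the video's exact notation.
import numpy as np

def softmax(x):
    """Numerically stable softmax over the last axis."""
    e = np.exp(x - np.max(x, axis=-1, keepdims=True))
    return e / np.sum(e, axis=-1, keepdims=True)

def dot_product_attention(decoder_state, encoder_states):
    """Compute attention values for a single decoder step.

    decoder_state:  shape (hidden_dim,)         current decoder output
    encoder_states: shape (src_len, hidden_dim)  encoder output per input word
    """
    # 1) Similarity scores: dot product between the decoder state
    #    and each encoder state (one score per input word).
    scores = encoder_states @ decoder_state          # (src_len,)

    # 2) Softmax turns the scores into weights that sum to 1.
    weights = softmax(scores)                        # (src_len,)

    # 3) Attention values: weighted sum of the encoder states.
    attention = weights @ encoder_states             # (hidden_dim,)
    return attention, weights

# Toy example: 3 input words, hidden dimension of 2.
encoder_states = np.array([[0.9, 0.1],
                           [0.2, 0.8],
                           [0.5, 0.5]])
decoder_state = np.array([1.0, 0.0])

attention, weights = dot_product_attention(decoder_state, encoder_states)
print("attention weights:", weights)
print("attention values: ", attention)

# The attention values would then be combined with the decoder state
# (e.g., concatenated and passed through a fully connected layer)
# to predict the next output word.
```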