DeBERTa: Decoding-enhanced BERT with Disentangled Attention (Machine Learning Paper Explained)

About Share Download Add to

#deberta #bert #huggingface DeBERTa by Microsoft is the next iteration of BERT-style Self-Attention Transformer models, surpassing RoBERTa in State-of-the-art in multiple NLP tasks. DeBERTa brings two key improvements: First, they treat content and position information separately in a new form of disentangled attention mechanism. Second, they resort to relative positional encodings throughout the base of the transformer, and provide absolute positional encodings only at the very end. The resulting model is both more accurate on downstream tasks and needs less pretraining steps to reach good accuracy. Models are also available in Huggingface and on Github. OUTLINE: 0:00 - Intro & Overview 2:15 - Position Encodings in Transformer’s Attention Mechanism 9:55 - Disentangling Content & Position Information in Attention 21:35 - Disentangled Query & Key construction in the Attention Formula 25:50 - Efficient Relative Position Encodings 28:40 - Enhanced Mask Decoder using Absolute Position Encodings 35:30 - My Criti

Share with your friends

Link:

Embed:

<iframe width="640" height="360" src="//myvideo.cc/embed/L05jT1ltbmxjc1FCKzlRR2JuUG9TUlJ3YmlsK1JoYXNXRHJFcXFBOVFaWT0" frameborder="0" webkitallowfullscreen mozallowfullscreen allowfullscreen></iframe>

Video Size:

Custom size:

Autoplay video

Hide player controls

Hide resume playing

Add to Playlist:

Favorites

My Playlist

Watch Later