ALiBi - Train Short, Test Long: Attention with linear biases enables input length extrapolation

#alibi #transformers #attention

Transformers are essentially set models that need additional inputs to make sense of sequence data. The most widespread such inputs are position encodings or position embeddings, which add sequence-index information in various forms. However, this limits the resulting model: it cannot run inference on sequences longer than those it was trained on, because it would encounter unfamiliar position encodings. ALiBi solves this by proposing simple, fixed linear biases as position information, adding negligible overhead in time and memory; surprisingly, the resulting model can handle inference on sequences many times longer than its training sequences.

OUTLINE:
0:00 - Intro & Overview
1:40 - Position Encodings in Transformers
4:55 - Sinusoidal Position Encodings
11:50 - ALiBi Position Encodings
20:50 - How to choose the slope parameter
23:55 - Experimental Results
29:10 - Comments & Conclusion

Paper: https://of
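As a rough illustration of the idea described above (a minimal sketch, not the authors' reference implementation), the following NumPy code adds ALiBi-style linear distance penalties to causal attention scores. The geometric slope schedule 2^(-8h/n) for head h of n follows the paper's description; the function and variable names are my own.

```python
import numpy as np

def alibi_biases(num_heads: int, seq_len: int) -> np.ndarray:
    """Per-head linear distance penalties, shape (num_heads, seq_len, seq_len)."""
    # Geometric slope schedule from the paper: 2^(-8/n), 2^(-16/n), ..., 2^(-8)
    slopes = 2.0 ** (-8.0 * np.arange(1, num_heads + 1) / num_heads)
    positions = np.arange(seq_len)
    distances = positions[None, :] - positions[:, None]   # j - i, negative for past keys
    # Bias is -slope * (i - j): zero on the diagonal, increasingly negative with distance
    return slopes[:, None, None] * distances[None, :, :]

def attention_with_alibi(q, k, v, biases):
    """Scaled dot-product attention with ALiBi biases and a causal mask.

    q, k, v: (num_heads, seq_len, head_dim); biases: (num_heads, seq_len, seq_len).
    """
    seq_len, head_dim = q.shape[1], q.shape[2]
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(head_dim)
    scores = scores + biases                               # linear position penalty
    causal_mask = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores = np.where(causal_mask, -np.inf, scores)        # hide future positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

With 8 heads, for example, the first head's bias drops by 0.5 for every step a key lies further in the past, while later heads use progressively smaller slopes down to 2^-8, so different heads attenuate distant context at different rates. Because the bias depends only on relative distance, it applies unchanged to sequences longer than those seen during training, which is the extrapolation property discussed in the video.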
