Knowledge Distillation: A Good Teacher is Patient and Consistent

The optimal training recipe for knowledge distillation is consistency and patience. Consistency means showing the teacher and the student exactly the same augmented view of each image, and additionally widening the support of the input distribution with MixUp augmentation. Patience means enduring very long training schedules. Exciting to see advances in model compression that make stronger models more widely usable! A minimal sketch of the consistent distillation step is shown after the chapter list below.

Paper Links:
Knowledge Distillation: A Good Teacher is Patient and Consistent:
Does Knowledge Distillation Really Work?
Meta Pseudo Labels:
MixUp Augmentation:
Scaling Vision Transformers:
Well-Read Students Learn Better:

Chapters
0:00 Paper Title
0:05 Model Compression
1:11 Limitations of Pruning
2:13 Consistency in Distillation
4:08 Comparison w
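
As a rough illustration of the recipe above, here is a minimal PyTorch sketch of one "consistent" distillation step: a single augmented view (with MixUp) is fed to both teacher and student, and the student is trained to match the teacher's output distribution. The models, optimizer, and hyperparameters (alpha, temperature) are placeholders, not the paper's exact configuration.

```python
import torch
import torch.nn.functional as F

def mixup(images, alpha=1.0):
    # MixUp: blend each image with a randomly permuted partner image.
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(images.size(0))
    return lam * images + (1.0 - lam) * images[perm]

def distillation_step(teacher, student, images, optimizer, temperature=1.0):
    # Consistency: build ONE augmented view and feed the exact same tensor
    # to both teacher and student (crops/flips assumed done by the loader).
    view = mixup(images)

    with torch.no_grad():
        teacher_logits = teacher(view)
    student_logits = student(view)

    # Train the student to match the teacher's softened distribution
    # via KL divergence.
    loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The "patience" half of the recipe is simply repeating this step over a very long training schedule rather than stopping early.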
