tinyML Asia 2021 Dongsoo Lee: Extremely low-bit quantization for Transformers


Published on 17 Dec 2021


tinyML Asia 2021: Extremely low-bit quantization for Transformers
DongSoo LEE 이동수, Executive Officer, NAVER CLOVA

Deploying the widely used Transformer architecture is challenging because of its heavy computation load and memory overhead during inference, especially when the target is a device with limited computational resources, such as a mobile or edge device. Quantization is an effective technique for addressing these challenges. Our analysis shows that, for a given number of quantization bits, each block of a Transformer contributes to model accuracy and inference computation in a different way. Moreover, even inside an embedding block, individual words make vastly different contributions. Accordingly, we propose a mixed-precision quantization strategy that represents Transformer weights with an extremely low number of bits (e.g., under 3 bits). For example, each word in an embedding block is assigned a different number of quantization bits based on its statistical properties. We also introduce a new
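To make the per-word idea concrete, here is a minimal sketch of mixed-precision quantization for an embedding table. The abstract does not say which statistic drives the bit assignment or which quantizer is used, so everything specific here is an assumption: word frequency stands in for the "statistical property", the tier boundaries (top 10% of words get 3 bits, next 20% get 2 bits, the rest get 1 bit) are invented for illustration, and plain uniform quantization stands in for whatever quantizer the talk actually uses. The function names `assign_bits`, `quantize_row`, and `average_bits` are hypothetical.

```python
import random


def assign_bits(word_counts, tiers=((0.1, 3), (0.3, 2), (1.0, 1))):
    """Assumed rule: rank words by frequency; more frequent words get more bits.

    tiers holds (cumulative fraction of the ranked vocabulary, bits) pairs.
    """
    ranked = sorted(word_counts, key=word_counts.get, reverse=True)
    bits = {}
    for rank, word in enumerate(ranked):
        fraction = (rank + 1) / len(ranked)
        for limit, n_bits in tiers:
            if fraction <= limit:
                bits[word] = n_bits
                break
    return bits


def quantize_row(row, n_bits):
    """Uniformly quantize one embedding row to at most 2**n_bits distinct values."""
    lo, hi = min(row), max(row)
    if hi == lo:
        return list(row)  # constant row: nothing to quantize
    step = (hi - lo) / (2 ** n_bits - 1)
    return [lo + round((x - lo) / step) * step for x in row]


def average_bits(bits):
    """Average number of bits per weight, assuming equal-length rows."""
    return sum(bits.values()) / len(bits)


if __name__ == "__main__":
    random.seed(0)
    # Toy vocabulary: w0 is the most frequent word, w9 the rarest.
    counts = {f"w{i}": 100 - i for i in range(10)}
    embedding = {w: [random.gauss(0.0, 1.0) for _ in range(8)] for w in counts}

    bits = assign_bits(counts)
    quantized = {w: quantize_row(embedding[w], bits[w]) for w in embedding}

    for w in counts:
        print(w, bits[w], len(set(quantized[w])))
    print("average bits per weight:", average_bits(bits))
```

With these assumed tiers most rows are stored at 1 bit, so the average stays well under 3 bits per weight while the few most frequent words keep finer resolution. A real system would also have to store each row's bit width and scale, which this sketch ignores.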


