tinyML Talks: The Multilingual Spoken Words Corpus, a Massive Keyword Spotting Dataset

Mark Mazumder, PhD Student, Harvard University

This talk will present the Multilingual Spoken Words Corpus (MSWC), a speech dataset of over 340,000 spoken words in 50 languages, with over 23 million audio examples. MSWC has many use cases, ranging from voice-enabled consumer devices to call center automation. The dataset is CC-BY licensed and free for academic research and commercial use.

We will introduce applications of MSWC for few-shot keyword spotting and spoken term search tasks in low-resource languages, and share a brief tutorial on getting started with the dataset. We will also discuss how we automated the construction of our dataset and our self-supervised approach for detecting outlier samples.
