Speech enhancement systems remove background noise and reverberation from speech signals. They are used in video conferencing systems, virtual assistants, hearing aids, mobile and smart home devices, and similar products. Conventional speech enhancement systems are trained with supervised learning, which requires pairs of studio-quality clean target speech and synthetic noisy mixtures. This dependence on ground-truth clean speech has drawbacks: such datasets are hard to scale and lack diversity, so the trained models are not robust in real-world scenarios. Moreover, recording clean speech and noise in the same domain as the inference data is expensive. Additionally, conventional speech enhancement systems can degrade automatic speech recognition (ASR) performance. Our project goals are to improve the perceptual quality of enhanced speech, to use abundant real noisy recordings instead of relying on expensive studio-quality data, and to mitigate the problem of ASR perfor