Phase-Aware Single-Stage Speech Denoising and Dereverberation with U-Net

Paper: Submitted to Interspeech 2020
Authors: Hyeong-Seok Choi, Hoon Heo, Jie Whan Lee, and Kyogu Lee
Abstract: In this work, we tackle a denoising and dereverberation problem with a single-stage framework. Although denoising and dereverberation may be considered two separate challenging tasks, and thus two modules are typically required, one for each task, we show that a single deep network can be shared to solve the two problems. To this end, we propose a new masking method called phase-aware β-sigmoid mask (PHM), which reuses the estimated magnitude values to estimate the clean phase by respecting the triangle inequality in the complex domain between three signal components: the mixture, the source, and the rest. Two PHMs are used to deal with the direct and reverberant sources, which makes it possible to control the proportion of reverberation in the enhanced speech at inference time. In addition, to improve the speech enhancement performance, we propose a new time-domain loss function and show a reasonable performance gain compared to the MSE loss in the complex domain. Finally, to achieve real-time inference, an optimization strategy for U-Net is proposed that reduces the computational overhead by up to 88.9% compared to the naïve version.
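
The geometric idea in the abstract can be illustrated with a short sketch. Because the mixture X, the source S, and the rest N = X - S form a triangle in the complex plane, the law of cosines gives the cosine of the phase difference between S and X from the three magnitudes alone. The NumPy code below is only an illustration under assumptions: the function name `phase_from_magnitudes`, its arguments, and the externally supplied `sign` (which resolves the direction of rotation) are hypothetical and are not taken from the paper's implementation, and the β-sigmoid mask itself is not reproduced here.

```python
import numpy as np


def phase_from_magnitudes(mix, src_mag, rest_mag, sign, eps=1e-8):
    """Rebuild a complex source estimate from magnitudes and the mixture phase.

    mix      : complex array, mixture X = S + N
    src_mag  : real array, estimated |S|
    rest_mag : real array, estimated |N|
    sign     : real array of +1/-1 (assumed given), direction of rotation
    """
    mix_mag = np.abs(mix)

    # Law of cosines on the triangle (X, S, N):
    # |N|^2 = |X|^2 + |S|^2 - 2 |X| |S| cos(dtheta)
    denom = np.maximum(2.0 * mix_mag * src_mag, eps)
    cos_dtheta = (mix_mag**2 + src_mag**2 - rest_mag**2) / denom

    # Clipping keeps the value valid even if the estimated magnitudes
    # slightly violate the triangle inequality.
    cos_dtheta = np.clip(cos_dtheta, -1.0, 1.0)
    sin_dtheta = sign * np.sqrt(1.0 - cos_dtheta**2)

    # Rotate the unit-magnitude mixture phasor by dtheta, then scale by |S|.
    mix_phasor = mix / np.maximum(mix_mag, eps)
    return src_mag * mix_phasor * (cos_dtheta + 1j * sin_dtheta)
```

With oracle magnitudes and the correct sign, this recovers the source exactly (up to floating-point error); in practice all three inputs would be network estimates.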

Single-stage Denoising and Dereverberation

1. Noisy mixture to direct source speech (Denoising)

Mixture | Enhanced

2. Reverberant speech to direct source speech (Dereverberation)

Reverberant speech | Enhanced

3. Noisy-reverberant mixture to reverberant speech (Denoising)

Noisy-reverberant mixture | Enhanced

4. Noisy-reverberant mixture to direct source speech (Denoising and Dereverberation)

Noisy-reverberant mixture | Enhanced (3 examples)

5. Real-recordings

Mixture (Air conditioner) | Enhanced
Mixture (Barking) | Enhanced
Mixture (Typing) | Enhanced
Mixture (Munching) | Enhanced

The effect of phase enhancement in dereverberation task

Mixture phase vs. Est. phase

Mixture | Est. magnitude + mixture phase | Est. magnitude + Est. phase (2 examples)
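
The two enhanced variants compared here differ only in which phase is attached to the estimated magnitude. A minimal NumPy sketch of that recombination step follows; the helper name `combine_magnitude_and_phase` is hypothetical and the code is not taken from the authors' implementation.

```python
import numpy as np


def combine_magnitude_and_phase(magnitude, phase_source):
    """Attach the phase of `phase_source` (complex STFT) to `magnitude` (real, >= 0)."""
    return magnitude * np.exp(1j * np.angle(phase_source))
```

Passing the mixture STFT as `phase_source` gives the "Est. magnitude + mixture phase" variant; passing a complex estimate that carries the enhanced phase gives the "Est. magnitude + Est. phase" variant.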

Controlling reverberation

Interpolation between direct source est. and reverberant source est.

Mixture | Direct source est. | Interpolated est. | Reverberant source est. (3 examples)
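
As a rough illustration of how an interpolated estimate could be formed from the two outputs, the sketch below blends them linearly. This page does not state the exact interpolation rule, so the linear blend, the function name `interpolate_reverberation`, and the parameter `alpha` are all assumptions made for illustration.

```python
import numpy as np


def interpolate_reverberation(direct_est, reverberant_est, alpha):
    """Linearly blend two estimates of the same shape (assumed rule).

    alpha = 0.0 returns the direct source estimate,
    alpha = 1.0 returns the reverberant source estimate.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return (1.0 - alpha) * np.asarray(direct_est) + alpha * np.asarray(reverberant_est)
```

Because the blend is linear, it can be applied either to complex STFTs or to time-domain waveforms with the same result after inversion.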