Phase-Aware Single-Stage Speech Denoising and Dereverberation with U-Net
Paper: Submitted to Interspeech 2020
Authors: Hyeong-Seok Choi, Hoon Heo, Jie Whan Lee, and Kyogu Lee
Abstract: In this work, we tackle the denoising and dereverberation
problem with a single-stage framework. Although denoising and dereverberation
may be considered two separate challenging tasks, so that a dedicated module is
typically required for each, we show that a single deep network can be
shared to solve both problems. To this end, we propose a new masking method
called the phase-aware β-sigmoid mask (PHM), which reuses the estimated
magnitude values to estimate the clean phase by respecting the triangle inequality
in the complex domain among three signal components: the mixture, the source, and the rest.
Two PHMs are used to handle the direct and reverberant sources, which makes it possible to control
the proportion of reverberation in the enhanced speech at inference time.
In addition, to improve speech enhancement performance, we propose a new time-domain
loss function and show a reasonable performance gain compared to the MSE loss in the complex domain.
Finally, to achieve real-time inference, we propose an optimization strategy for U-Net
that reduces the computational overhead by up to 88.9% compared to the naïve version.
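The triangle relation mentioned in the abstract can be illustrated with the law of cosines. If the mixture is the complex sum of the source and the rest, the three magnitudes form a triangle, so the cosine of the phase difference between mixture and source follows from the magnitudes alone. The sketch below shows only this geometry and is not the authors' implementation: the function names, the `EPS` constant, and the `sign` argument (which resolves the direction of rotation and would have to come from elsewhere) are assumptions made for illustration.

```python
import numpy as np

EPS = 1e-8  # assumed small constant; guards against division by zero in silent bins


def cos_phase_difference(mix_mag, src_mag, rest_mag):
    """Cosine of the angle between mixture and source (law of cosines)."""
    cos = (mix_mag**2 + src_mag**2 - rest_mag**2) / (2.0 * mix_mag * src_mag + EPS)
    # Clipping covers magnitudes that slightly violate the triangle inequality.
    return np.clip(cos, -1.0, 1.0)


def reconstruct_source(mix, src_mag, rest_mag, sign):
    """Rotate the mixture direction by the recovered angle and rescale.

    `sign` is +1 or -1 (hypothetical input): the magnitudes fix only the
    cosine, so the rotation direction must be supplied separately.
    """
    mix_mag = np.abs(mix)
    cos = cos_phase_difference(mix_mag, src_mag, rest_mag)
    sin = sign * np.sqrt(1.0 - cos**2)
    unit_mix = mix / (mix_mag + EPS)
    return src_mag * unit_mix * (cos + 1j * sin)


# Toy check on one time-frequency bin with known components.
source = np.array([1.0 + 2.0j])
rest = np.array([0.5 - 1.0j])
mixture = source + rest
estimate = reconstruct_source(mixture, np.abs(source), np.abs(rest), sign=1.0)
```

In this toy bin the source phase leads the mixture phase, so `sign=1.0` recovers the source; with the opposite sign the result is the mirror image about the mixture direction.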
Single-stage Denoising and Dereverberation
1. Noisy mixture to direct source speech (Denoising)
Mixture
Enhanced
2. Reverberant speech to direct source speech (Dereverberation)
Reverberant speech
Enhanced
3. Noisy-reverberant mixture to reverberant speech (Denoising)
Noisy-reverberant mixture
Enhanced
4. Noisy-reverberant mixture to direct source speech (Denoising and Dereverberation)
Noisy-reverberant mixture
Enhanced
Noisy-reverberant mixture
Enhanced
Noisy-reverberant mixture
Enhanced
5. Real-recordings
Mixture (Air conditioner)
Enhanced
Mixture (Barking)
Enhanced
Mixture (Typing)
Enhanced
Mixture (Munching)
Enhanced
The effect of phase enhancement in the dereverberation task
Mixture phase vs. Est. phase
Mixture
Est. magnitude + mixture phase
Est. magnitude + Est. phase
Mixture
Est. magnitude + mixture phase
Est. magnitude + Est. phase
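The two reconstructions compared above differ only in which phase is attached to the estimated magnitude. A minimal sketch of that difference, assuming complex STFT arrays and hypothetical function names (the page does not show the authors' code):

```python
import numpy as np


def combine(magnitude, phase):
    """Build a complex spectrogram from a magnitude and a phase (radians)."""
    return magnitude * np.exp(1j * phase)


def with_mixture_phase(est_mag, mix_stft):
    """Estimated magnitude paired with the unmodified mixture phase."""
    return combine(est_mag, np.angle(mix_stft))


def with_estimated_phase(est_mag, est_phase):
    """Estimated magnitude paired with a separately estimated phase."""
    return combine(est_mag, est_phase)


# Toy single-bin example: both outputs share a magnitude and differ in phase.
mix_stft = np.array([1.0 + 1.0j])
est_mag = np.array([2.0])
a = with_mixture_phase(est_mag, mix_stft)
b = with_estimated_phase(est_mag, np.array([0.0]))
```

Either result would then be passed to an inverse STFT to obtain a waveform.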
Controlling reverberation
Interpolation between direct source est. and reverberant source est.
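One simple way to realize such an interpolation is a linear blend of the two estimates. The sketch below assumes that form; the page does not state the exact formula, and the function name and the weight `alpha` are hypothetical.

```python
import numpy as np


def blend_reverb(direct_est, reverberant_est, alpha):
    """Linear interpolation between the two estimates.

    alpha = 0.0 returns the direct source estimate only,
    alpha = 1.0 returns the reverberant source estimate only.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return (1.0 - alpha) * direct_est + alpha * reverberant_est


# Toy waveforms standing in for the two network outputs.
direct = np.array([1.0, 0.0, -1.0])
reverberant = np.array([0.5, 0.5, 0.5])
halfway = blend_reverb(direct, reverberant, alpha=0.5)
```

Because the blend is linear, it can be applied equally to waveforms or to complex spectrograms before the inverse STFT.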