Music Source Separation

Research Overview

According to Wikipedia, source separation problems in digital signal processing are those in which several signals have been mixed together into a combined signal and the objective is to recover the original component signals from it. Music is a mixture of many musical sources, so separating those sources helps us understand it better.

Many applications need music source separation. In music information retrieval (MIR), for example, estimating the beat of a piece first requires extracting its percussive sound, whereas chord estimation benefits from removing it. Likewise, estimating the lyrics requires the isolated vocal, and music transcription requires every instrument to be separated.
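To make the percussive/harmonic split concrete, here is a minimal sketch of median-filtering harmonic-percussive separation. This is a well-known generic technique used only for illustration; it is not the method of the publications listed on this page. The hand-built toy spectrogram, the function names, and the filter length of 9 are all assumptions.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def median_filter_1d(mag, size, axis):
    """Median-filter a 2-D array along one axis, padding the edges."""
    assert size % 2 == 1, "use an odd filter length"
    pad = size // 2
    pad_width = [(0, 0), (0, 0)]
    pad_width[axis] = (pad, pad)
    padded = np.pad(mag, pad_width, mode="edge")
    # The window dimension is appended as the last axis.
    windows = sliding_window_view(padded, size, axis=axis)
    return np.median(windows, axis=-1)


def hpss_masks(mag, size=9):
    """Return binary (harmonic, percussive) masks.

    mag: magnitude spectrogram of shape (n_freq_bins, n_frames).
    """
    harm = median_filter_1d(mag, size, axis=1)  # smooth along time
    perc = median_filter_1d(mag, size, axis=0)  # smooth along frequency
    harmonic_mask = harm >= perc
    return harmonic_mask, ~harmonic_mask


# Toy spectrogram (assumption): one sustained tone and one click.
mag = np.zeros((64, 64))
mag[20, :] = 1.0   # horizontal line: stable pitch at frequency bin 20
mag[:, 30] = 1.0   # vertical line: broadband click at frame 30

h_mask, p_mask = hpss_masks(mag)
harmonic = mag * h_mask
percussive = mag * p_mask
```

Sustained tones form horizontal ridges in a spectrogram and drum hits form vertical ones, so a median filter along time keeps the former and a median filter along frequency keeps the latter. A beat tracker would then work on `percussive`, and a chord estimator on `harmonic`.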

Music source separation is useful not only for MIR but also in its own right. Say you want to practice singing: vocal separation can help by providing a vocal-removed version of the original music file. In addition, if each separated source is placed at its own virtual location, the sources can be remixed into an upmixed version of the original, from mono to stereo or from stereo to 5.1 channels.
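The upmixing idea can be sketched as follows, assuming the separated sources are already available as mono arrays. Constant-power panning is one common way to place a source at a virtual position; the function names, the synthetic stand-in signals, and the chosen positions are hypothetical.

```python
import numpy as np


def pan_to_stereo(source, position):
    """Place a mono source in the stereo field with constant-power panning.

    position: -1.0 is hard left, 0.0 is centre, 1.0 is hard right.
    """
    theta = (position + 1.0) * np.pi / 4.0  # maps [-1, 1] to [0, pi/2]
    return np.stack([np.cos(theta) * source, np.sin(theta) * source])


def upmix(sources, positions):
    """Sum the panned sources into one (2, n_samples) stereo signal."""
    return sum(pan_to_stereo(s, p) for s, p in zip(sources, positions))


# Stand-ins for separated sources (assumption: synthetic signals).
sr = 8000
t = np.arange(sr) / sr
vocals = np.sin(2 * np.pi * 440.0 * t)
drums = 0.1 * np.random.default_rng(0).standard_normal(sr)

# Vocals in the centre, drums hard left.
stereo = upmix([vocals, drums], [0.0, -1.0])
```

The same pattern extends to more output channels by replacing the two panning gains with one gain per loudspeaker. A vocal-removed track for singing practice is simpler still: sum every separated source except the vocal.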

As with many other tasks, research on music source separation starts with specifying the problem. What do we want to separate? How many instruments are there? Is the input mono, stereo, or more? Do we have a pre-trained database? Is any other side information available? Depending on the task or application, there are numerous problems to solve.

Publications

Journal Articles

  • J. Park, J. Shin, and K. Lee, “Exploiting continuity/discontinuity of basis vectors in spectrogram decomposition for harmonic-percussive sound separation,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, Vol. 25, No. 5, pp. 1061-1074, May 2017. (demo)
  • I.-Y. Jeong and K. Lee, “Vocal separation from monaural music using temporal/spectral continuity and sparsity constraints,” IEEE Signal Processing Letters, Vol. 21, No. 10, pp. 1197-1200, Jun. 2014.

Conference Papers

  • I.-Y. Jeong and K. Lee, “Informed source separation from monaural music with limited binary time-frequency annotation,” submitted to IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brisbane, Australia, 2015.
  • J. Park and K. Lee, “Harmonic-percussive source separation using harmonicity and sparsity constraints,” in Proc. International Society for Music Information Retrieval Conference (ISMIR), Malaga, Spain, 2015.
  • I.-Y. Jeong and K. Lee, “Singing voice separation based on sparse nature and spectral/temporal discontinuity,” in Music Information Retrieval Evaluation eXchange (MIREX): Singing Voice Separation, 2014.
  • I.-Y. Jeong and K. Lee, “Vocal separation using extended robust principal component analysis with Schatten p/lp-norm and scale compression,” in Proc. IEEE International Conference on Machine Learning for Signal Processing (MLSP), Reims, France, Sep. 2014.

Project Members

Il-Young Jeong, Jeongsoo Park