Music Source Separation Using Stacked Hourglass Networks


Publication Year


Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing


  • Sungheon Park
  • Taehoon Kim
  • Kyogu Lee
  • Nojun Kwak


In this paper, we propose a simple yet effective method for multiple music source separation using convolutional neural networks. The stacked hourglass network, which was originally designed for human pose estimation in natural images, is applied to the music source separation task. The network learns features from a spectrogram image across multiple scales and generates a mask for each music source. The estimated masks are refined as they pass through the stacked hourglass modules. The proposed framework is able to separate multiple music sources using a single network. Experimental results on the MIR-1K and DSD100 datasets show that the proposed method achieves results competitive with state-of-the-art methods in both multiple music source separation and singing voice separation tasks.
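The mask-based separation step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the array shapes, the function name `apply_masks`, the random stand-in for network outputs, and the softmax normalization across sources are all assumptions made for illustration. It only shows how per-source masks, once estimated, are multiplied element-wise with the mixture magnitude spectrogram to obtain per-source estimates.

```python
import numpy as np


def apply_masks(mixture_mag, masks):
    """Apply per-source masks to a mixture magnitude spectrogram.

    mixture_mag: array of shape (freq, time)
    masks:       array of shape (n_sources, freq, time)
    Returns an array of shape (n_sources, freq, time).
    """
    # Broadcast the mixture over the source axis and multiply element-wise.
    return masks * mixture_mag[np.newaxis, :, :]


rng = np.random.default_rng(0)

# Toy magnitude spectrogram (hypothetical sizes: 4 frequency bins, 6 frames).
mixture_mag = rng.random((4, 6))

# Random values standing in for the network's raw outputs for 2 sources.
logits = rng.standard_normal((2, 4, 6))

# Assumption for illustration only: normalize with a softmax across sources
# so that the masks sum to 1 in every time-frequency bin.
exp = np.exp(logits - logits.max(axis=0, keepdims=True))
masks = exp / exp.sum(axis=0, keepdims=True)

estimates = apply_masks(mixture_mag, masks)
```

With masks that sum to one in each bin, the per-source estimates add back up to the mixture magnitude. A real system would then combine each estimate with the mixture phase and invert the short-time Fourier transform to recover waveforms.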