Music Source Separation Using Stacked Hourglass Networks

Venue

Publication Year

Keywords

Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing

Authors

  • Sungheon Park
  • Taehoon Kim
  • Kyogu Lee
  • Nojun Kwak

Abstract

In this paper, we propose a simple yet effective method for multiple music source separation using convolutional neural networks. The stacked hourglass network, which was originally designed for human pose estimation in natural images, is applied to the music source separation task. The network learns features from a spectrogram image across multiple scales and generates masks for each music source. The estimated masks are refined as they pass through the stacked hourglass modules. The proposed framework is able to separate multiple music sources using a single network. Experimental results on the MIR-1K and DSD100 datasets validate that the proposed method achieves results competitive with state-of-the-art methods in multiple music source separation and singing voice separation tasks.
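
To make the pipeline described in the abstract concrete (multi-scale features from a spectrogram, one soft mask per source, refinement across stacked modules), below is a minimal sketch assuming PyTorch. It is not the authors' implementation; the module names, channel sizes, and stacking depth are illustrative assumptions.

```python
# Minimal sketch (not the authors' code): stacked hourglass modules on a
# magnitude spectrogram, predicting one soft mask per source at every stage.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Hourglass(nn.Module):
    """One hourglass: downsample, recurse, upsample, and merge with a skip path."""
    def __init__(self, channels, depth=3):
        super().__init__()
        self.skip = nn.Conv2d(channels, channels, 3, padding=1)
        self.down = nn.Conv2d(channels, channels, 3, padding=1)
        self.inner = (Hourglass(channels, depth - 1) if depth > 1
                      else nn.Conv2d(channels, channels, 3, padding=1))
        self.up = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x):
        skip = F.relu(self.skip(x))              # full-resolution branch
        y = F.max_pool2d(x, 2)                   # go down one scale
        y = F.relu(self.down(y))
        y = self.inner(y)                        # recurse to coarser scales
        y = F.relu(self.up(y))
        y = F.interpolate(y, size=skip.shape[-2:], mode="nearest")
        return skip + y                          # merge coarse and fine features

class StackedHourglassSeparator(nn.Module):
    """Stack hourglass modules; each stage predicts per-source masks and feeds
    its prediction back so the next stage can refine it."""
    def __init__(self, n_sources=2, channels=32, n_stacks=2):
        super().__init__()
        self.stem = nn.Conv2d(1, channels, 3, padding=1)
        self.hourglasses = nn.ModuleList(Hourglass(channels) for _ in range(n_stacks))
        self.mask_heads = nn.ModuleList(nn.Conv2d(channels, n_sources, 1) for _ in range(n_stacks))
        self.remaps = nn.ModuleList(nn.Conv2d(n_sources, channels, 1) for _ in range(n_stacks))

    def forward(self, spec):                     # spec: (batch, 1, freq, time) magnitude spectrogram
        x = F.relu(self.stem(spec))
        masks = []
        for hg, head, remap in zip(self.hourglasses, self.mask_heads, self.remaps):
            feat = hg(x)
            mask = torch.sigmoid(head(feat))     # soft mask per source, same size as the input
            masks.append(mask)
            x = feat + remap(mask)               # feed the estimate back for refinement
        return masks                             # supervise every stage; use the last at test time

if __name__ == "__main__":
    model = StackedHourglassSeparator(n_sources=2)
    mixture = torch.randn(1, 1, 512, 64).abs()   # dummy magnitude spectrogram
    last_mask = model(mixture)[-1]
    estimates = last_mask * mixture              # masked source spectrograms, (1, 2, 512, 64)
    print(estimates.shape)
```

In this sketch, intermediate supervision would apply a loss to every element of the returned mask list, while inference keeps only the final, most refined mask before multiplying it with the mixture spectrogram.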