Detection of Cut-Points for Automatic Music Rearrangement

Venue

2018 IEEE 28th International Workshop on Machine Learning for Signal Processing (MLSP), pp. 1–6

Publication Year

2018

Keywords

Adaptation models, Feature extraction, Instruments, Music, Neural networks, Task analysis, Training

Identifiers

DOI: http://dx.doi.org/10.1109/MLSP.2018.8516706

Authors

D. Stoller
V. Akkermans
S. Dixon

Abstract

Existing music recordings are often rearranged, for example to fit their duration and structure to video content. Often an expert is needed to find suitable cut points allowing for imperceptible transitions between different sections. In previous work, the search for these cuts is restricted to the beginnings of beats or measures and only timbre and loudness are taken into account, while melodic expectations and instrument continuity are neglected. We instead aim to learn these features by training neural networks on a dataset of over 300 popular Western songs to classify which note onsets are suitable entry or exit points for a cut. We investigate existing and novel architectures and different feature representations, and find that best performance is achieved using neural networks with two-dimensional convolutions applied to spectrogram input covering several seconds of audio with a high temporal resolution of 23 or 46 ms. Finally, we analyse our best model using saliency maps and find it attends to rhythmical structures and the presence of sounds at the onset position, suggesting instrument activity to be important for predicting cut quality.
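
As a rough illustration of the temporal resolutions quoted in the abstract, the sketch below shows one way frame spacings of about 23 and 46 ms can arise. The sample rate (22050 Hz), the hop sizes (512 and 1024 samples), and the 4-second context window are assumptions chosen for illustration; the abstract does not state them, and the paper's actual settings may differ.

```python
# Assumed values, NOT taken from the abstract: a common audio sample rate
# and two common spectrogram hop sizes.
SAMPLE_RATE_HZ = 22050
HOP_SIZES = (512, 1024)

# Hypothetical stand-in for "several seconds of audio".
CONTEXT_SECONDS = 4.0


def hop_duration_ms(hop_samples, sample_rate_hz=SAMPLE_RATE_HZ):
    """Time between successive spectrogram frames, in milliseconds."""
    return 1000.0 * hop_samples / sample_rate_hz


def frames_in_window(window_seconds, hop_samples, sample_rate_hz=SAMPLE_RATE_HZ):
    """Number of whole hops that fit in a context window of the given length."""
    return int(window_seconds * sample_rate_hz) // hop_samples


for hop in HOP_SIZES:
    print(
        f"hop={hop} samples: "
        f"{hop_duration_ms(hop):.1f} ms per frame, "
        f"{frames_in_window(CONTEXT_SECONDS, hop)} frames in {CONTEXT_SECONDS} s"
    )
```

Under these assumptions, halving the hop size doubles the number of spectrogram frames a two-dimensional convolutional network sees for the same stretch of audio, which is the trade-off implied by "high temporal resolution".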