Monaural Instrument Sound Segregation by Stacked Recurrent Neural Network

A stacked recurrent neural network (sRNN) with gated recurrent units (GRUs) and a jointly optimized soft time-frequency mask was proposed for extracting target musical instrument sounds from a mixture of instrumental sounds. The sRNN model stacks and links multiple simple recurrent neural networks (RNNs), giving it both temporal dynamic behavior and genuine depth. The GRU refines the gating mechanism of long short-term memory (LSTM) and reduces computation time. Experiments were conducted to test the proposed method. A musical dataset collected from real instrumental music was used for training and testing; electric guitar and drum sounds were the target sounds. Objective and subjective assessment scores obtained for the proposed method were compared with those obtained for two models, namely Wave-U-Net and SH-4stack, and for a conventional RNN model. The results indicated that electric guitar and drum sounds can be successfully extracted through the proposed method.

Keywords

electric guitar; drums; sound separation; stacked recurrent neural network; gated recurrent unit; time-frequency mask
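
The abstract mentions a soft time-frequency mask but does not give its formula. As a rough illustration only, the sketch below assumes the commonly used ratio form, in which each source's mask is its estimated magnitude divided by the sum of both estimates, and the masks are then applied to the mixture magnitude spectrogram. The function names (`soft_masks`, `apply_masks`), the two-source setup, and the toy values are hypothetical and are not taken from the paper.

```python
import numpy as np


def soft_masks(est_a, est_b, eps=1e-8):
    """Return ratio-style soft masks for two sources.

    est_a, est_b: raw network outputs (assumed here to be magnitude
    estimates with shape [frequency_bins, frames]).
    eps: small constant that avoids division by zero.
    """
    mag_a = np.abs(est_a)
    mag_b = np.abs(est_b)
    total = mag_a + mag_b + eps
    return mag_a / total, mag_b / total


def apply_masks(mixture_mag, est_a, est_b):
    """Scale the mixture magnitude by each soft mask."""
    mask_a, mask_b = soft_masks(est_a, est_b)
    return mask_a * mixture_mag, mask_b * mixture_mag


# Toy 2x2 "spectrograms" (made-up numbers, for illustration only).
mixture = np.array([[1.0, 2.0], [4.0, 0.5]])
raw_guitar = np.array([[3.0, 1.0], [1.0, 0.0]])
raw_drums = np.array([[1.0, 1.0], [3.0, 2.0]])

guitar, drums = apply_masks(mixture, raw_guitar, raw_drums)
```

Because the two masks sum to (almost exactly) one in every time-frequency bin, the separated magnitudes add back up to the mixture magnitude, which is one reason this form of mask is often optimized jointly with the network.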
