Monaural Instrument Sound Segregation by Stacked Recurrent Neural Network





Key Words

electric guitar; drums; sound separation; stacked recurrent neural network; gated recurrent unit; time-frequency mask


Journal of Information Science and Engineering

Volume or Term/Year and Month of Publication

Vol. 38, No. 3 (2022/05/01)

Page #

499 - 515

Content Language


Abstract

A stacked recurrent neural network (sRNN) with gated recurrent units (GRUs) and a jointly optimized soft time-frequency mask was proposed for extracting target musical instrument sounds from a mixture of instrumental sounds. The sRNN model stacks and links multiple simple recurrent neural networks (RNNs), giving it both temporal dynamic behavior and genuine depth. The GRU refines the gating structure of long short-term memory (LSTM) and reduces computation time. The method was trained and tested on a dataset collected from real instrumental music, with electric guitar and drum sounds as the targets. Objective and subjective assessment scores for the proposed method were compared with those of two existing models, Wave-U-Net and SH-4stack, and of a conventional RNN model. The results indicated that the proposed method can successfully extract electric guitar and drum sounds.
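
The abstract does not give the network's dimensions, weights, or exact mask formula, so the following is only a minimal NumPy sketch of the general idea: several GRU layers are stacked so that each layer's output sequence feeds the next, the top layer produces two raw magnitude estimates, and a soft time-frequency mask rescales them so that they sum to the input mixture. All names (`init_gru`, `gru_layer`, `separate`), the layer sizes, the random weights, the ReLU output, and the `eps`-smoothed ratio mask are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_gru(n_in, n_hid, rng):
    """Random parameters for one GRU layer (hypothetical initialization)."""
    def w(rows, cols):
        return rng.normal(0.0, 0.1, size=(rows, cols))
    return {
        "Wz": w(n_hid, n_in), "Uz": w(n_hid, n_hid), "bz": np.zeros(n_hid),
        "Wr": w(n_hid, n_in), "Ur": w(n_hid, n_hid), "br": np.zeros(n_hid),
        "Wh": w(n_hid, n_in), "Uh": w(n_hid, n_hid), "bh": np.zeros(n_hid),
    }

def gru_layer(p, xs):
    """Run one GRU layer over a sequence xs of shape (frames, n_in)."""
    h = np.zeros(p["bz"].shape[0])
    outputs = []
    for x in xs:
        z = sigmoid(p["Wz"] @ x + p["Uz"] @ h + p["bz"])           # update gate
        r = sigmoid(p["Wr"] @ x + p["Ur"] @ h + p["br"])           # reset gate
        cand = np.tanh(p["Wh"] @ x + p["Uh"] @ (r * h) + p["bh"])  # candidate state
        h = (1.0 - z) * h + z * cand
        outputs.append(h)
    return np.stack(outputs)

def separate(mixture, layers, w_out, eps=1e-8):
    """Stacked GRUs -> two raw estimates -> soft mask applied to the mixture."""
    h = mixture
    for p in layers:                      # each layer's output feeds the next layer
        h = gru_layer(p, h)
    raw = np.maximum(h @ w_out, 0.0)      # non-negative, shape (frames, 2 * n_bins)
    n_bins = mixture.shape[1]
    raw_a, raw_b = raw[:, :n_bins], raw[:, n_bins:]
    # Soft ratio mask; eps keeps it defined (at 0.5) where both estimates are zero.
    mask_a = (raw_a + eps) / (raw_a + raw_b + 2.0 * eps)
    mask_b = 1.0 - mask_a
    return mask_a * mixture, mask_b * mixture

# Toy example: random magnitude "spectrogram" of 6 frames x 5 frequency bins.
rng = np.random.default_rng(0)
n_bins, n_hid, frames = 5, 8, 6
layers = [init_gru(n_bins, n_hid, rng),
          init_gru(n_hid, n_hid, rng),
          init_gru(n_hid, n_hid, rng)]
w_out = rng.normal(0.0, 0.1, size=(n_hid, 2 * n_bins))
mixture = rng.random((frames, n_bins))
guitar, drums = separate(mixture, layers, w_out)
```

Because the two masks sum to one in every time-frequency bin, the two separated magnitudes always add back up to the mixture. Presumably this is what allows the mask to be optimized jointly with the network: the masking step is differentiable, so a training loss can be placed on the masked outputs rather than on the raw estimates.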

Topic Category

Basic and Applied Sciences > Information Science