  • Thesis

Long Audio Inpainting Based on Conditional Adversarial Networks

SLAIN: A Second Long Audio Inpainting with Conditional GAN.

Advisor: 徐宏民
Co-advisor: 陳文進 (Wen-Chin Chen)

Abstract


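In this thesis we introduce a practical, flexible, and effective method for long audio inpainting. The architecture, called SLAIN, is based on conditional adversarial networks and restores corrupted portions of audio, including various sound events and instrument recordings. We adopt an architecture derived from style transfer and apply carefully designed modifications, so that the method can process undeformed audio spectrograms and be measured according to human acoustic features. In addition, integration with the latest neural vocoder makes the output audio quality considerably better than that of the traditional Griffin-Lim algorithm. Beyond the reconstruction loss and the adversarial loss, the pre-trained vocoder provides an additional acoustic loss to guide the model. In experiments on two challenging datasets, human evaluation with the mean opinion score (MOS) shows that our method handles corruption of flexible length and can inpaint up to 1 second within a 1.5-second audio sample at 44.1 kHz (a common sampling rate). The generated sounds score above 4 out of a maximum of 5 on MOS on average, which indicates that our method performs best among existing long audio inpainting methods.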

Parallel Abstract


We introduce a practical, flexible, and powerful approach to long audio inpainting. The proposed cGAN-based method, called SLAIN, recovers the missing parts of audio data, including audio events and instrument recordings. We adapt a solution from style transfer with well-designed modifications, process the audio on the magnitude spectrogram without deformation, and measure quality in terms of human acoustic features. Integration with the latest neural vocoder makes the output audio quality much better than that of Griffin-Lim. In addition to the reconstruction loss and the GAN loss, the pre-trained vocoder provides an additional vocal loss to guide the model. In experiments on two challenging datasets, human evaluation with the mean opinion score (MOS) shows that our method can handle free-form masks of up to 1 second within a 1.5-second audio sample at 44.1 kHz (a common sampling frequency). The generated sounds receive an average MOS above four out of five, which indicates that our method performs best compared to existing methods for long audio inpainting.
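To make the scale of the task concrete, the following is a minimal sketch of the sample arithmetic behind the figures quoted in the abstract: a 1.5-second clip at 44.1 kHz with a gap of up to 1 second. It is not the thesis's implementation. The abstract states that the method works on magnitude spectrograms with free-form masks, whereas this sketch assumes a single contiguous gap in the time domain for simplicity. The function names `make_gap_mask` and `apply_gap` are hypothetical.

```python
import numpy as np

SAMPLE_RATE = 44_100     # Hz, as stated in the abstract
CLIP_SECONDS = 1.5       # clip length, as stated in the abstract
MAX_GAP_SECONDS = 1.0    # longest gap reported in the abstract


def make_gap_mask(gap_seconds, start_seconds,
                  sample_rate=SAMPLE_RATE, clip_seconds=CLIP_SECONDS):
    """Return a boolean array over samples; True marks a missing sample.

    Assumption: one contiguous time-domain gap (hypothetical simplification).
    """
    n_total = int(round(clip_seconds * sample_rate))
    n_gap = int(round(gap_seconds * sample_rate))
    start = int(round(start_seconds * sample_rate))
    n_max = int(round(MAX_GAP_SECONDS * sample_rate))
    if n_gap < 0 or n_gap > n_max:
        raise ValueError("gap length must be between 0 and 1 second")
    if start < 0 or start + n_gap > n_total:
        raise ValueError("gap must lie inside the clip")
    mask = np.zeros(n_total, dtype=bool)
    mask[start:start + n_gap] = True
    return mask


def apply_gap(audio, mask):
    """Zero out the masked samples, simulating the corrupted input."""
    corrupted = np.array(audio, dtype=np.float64, copy=True)
    corrupted[mask] = 0.0
    return corrupted


# Example: a 1-second gap starting 0.25 s into a 1.5-second clip.
rng = np.random.default_rng(0)
clip = rng.standard_normal(int(round(CLIP_SECONDS * SAMPLE_RATE)))
mask = make_gap_mask(gap_seconds=1.0, start_seconds=0.25)
corrupted = apply_gap(clip, mask)
print(mask.size, int(mask.sum()))  # total samples, missing samples
```

In this setting two thirds of the clip (44,100 of 66,150 samples) must be regenerated from the remaining half second of context.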

Keywords

Audio Inpainting, cGANs, Vocoder, Acoustic, MOS

