A Study of the Differences Between Lyric Recognition in Mandarin Singing Voice and General Speech Recognition

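Song search has become an indispensable part of everyday life: without knowing a song's title or singer, a user who can hum or sing a few lines can still find the song. Most song search websites and mobile apps today rely on melody matching. This approach is inconvenient for users who cannot reproduce the pitch accurately, because an inaccurate melody does not yield the correct result. Recognizing the words in the singing voice could therefore substantially improve the accuracy of such searches. This study compares singing with read speech and uses the differences to improve the accuracy of lyric recognition in singing. The research proceeds in four parts. The first part observes and compares speech recognition on singing and on read speech, and the observations determine the direction of the experiments. The second part preprocesses the singing voice, removing noise and background sound to leave the vocals. The third part extracts features: after pre-emphasis, windowing, and related processing, the audio is converted into a spectrogram that serves as the input image for training the singing model. The fourth part trains the speech model with an end-to-end convolutional neural network (CNN) combined with connectionist temporal classification (CTC) to recognize the syllables sung in a song.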
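The feature-extraction step (pre-emphasis, windowing, conversion to a spectrogram) can be illustrated with a minimal sketch. The thesis abstract does not give its parameters, so the pre-emphasis coefficient of 0.97, the Hamming window, the frame length, and the hop size below are all assumptions chosen for illustration, and the function names are hypothetical. The sketch uses only the Python standard library and a naive DFT; a practical system would use an FFT library instead.

```python
import cmath
import math


def pre_emphasis(signal, alpha=0.97):
    """Apply y[n] = x[n] - alpha * x[n-1]; the first sample passes through.

    alpha=0.97 is a common default, not a value taken from the thesis.
    """
    if not signal:
        return []
    return [signal[0]] + [
        signal[n] - alpha * signal[n - 1] for n in range(1, len(signal))
    ]


def hamming(length):
    """Return a symmetric Hamming window of the given length."""
    if length == 1:
        return [1.0]
    return [
        0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1))
        for n in range(length)
    ]


def spectrogram(signal, frame_len=64, hop=32, alpha=0.97):
    """Return a list of frames, each holding frame_len // 2 + 1 magnitudes."""
    emphasized = pre_emphasis(signal, alpha)
    window = hamming(frame_len)
    frames = []
    for start in range(0, len(emphasized) - frame_len + 1, hop):
        frame = [emphasized[start + n] * window[n] for n in range(frame_len)]
        # Naive DFT, O(N^2) per frame; kept only to stay dependency-free.
        magnitudes = []
        for k in range(frame_len // 2 + 1):
            acc = sum(
                frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)
            )
            magnitudes.append(abs(acc))
        frames.append(magnitudes)
    return frames


# Usage: a pure tone centred on DFT bin 8 of a 64-sample frame.
tone = [math.sin(2 * math.pi * 8 * n / 64) for n in range(256)]
spec = spectrogram(tone)
```

Each row of `spec` is one time frame and each column one frequency bin, so the result can be treated as a two-dimensional image of the kind a CNN takes as input.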

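Keywords

Speech recognition; music information retrieval; convolutional neural network; singing voice with accompaniment; a cappella signal; Hanyu Pinyin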

Parallel Abstract

Song retrieval is an indispensable part of modern life. Users expect to find a song simply by singing a few words or humming a part of it. Most websites and mobile apps today use melodic features for song retrieval. However, this search method is inconvenient for users who cannot sing in tune, because an inaccurate melody does not return the correct result. Therefore, if the words in the song can be recognized, the accuracy of song search can be greatly improved. This research is divided into four parts. The first part compares singing audio with read-speech audio. The second part preprocesses the song, removing the background noise and keeping the vocals. The third part extracts features, converting the audio into a spectrogram that serves as the input image for model training. The fourth part uses a convolutional neural network (CNN) with connectionist temporal classification (CTC) to train the acoustic model.
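To make the role of CTC concrete, the following minimal sketch shows greedy (best-path) CTC decoding: pick the highest-scoring label in each frame, collapse consecutive repeats, then drop the blank symbol. This is a standard way to read out a CTC-trained network, but the abstract does not say which decoder the thesis uses, so treat it as an assumption. The label set of pinyin syllables, the blank index, and the scores are invented for illustration.

```python
def ctc_greedy_decode(frame_scores, labels, blank=0):
    """Best-path CTC decoding.

    frame_scores: one list of per-label scores for each time frame.
    labels: label names indexed like the scores; labels[blank] is the blank.
    """
    # Step 1: most likely label index in each frame.
    best_path = [
        max(range(len(scores)), key=scores.__getitem__)
        for scores in frame_scores
    ]
    # Step 2: collapse consecutive repeats, then remove blanks.
    decoded = []
    previous = None
    for index in best_path:
        if index != previous and index != blank:
            decoded.append(labels[index])
        previous = index
    return decoded


# Hypothetical label set: blank plus two pinyin syllables.
labels = ["<blank>", "ni3", "hao3"]
scores = [
    [0.1, 0.8, 0.1],    # ni3
    [0.1, 0.7, 0.2],    # ni3 (repeat, collapsed)
    [0.9, 0.05, 0.05],  # blank
    [0.2, 0.1, 0.7],    # hao3
    [0.1, 0.1, 0.8],    # hao3 (repeat, collapsed)
    [0.8, 0.1, 0.1],    # blank
]
result = ctc_greedy_decode(scores, labels)
```

The blank symbol is what lets the network emit the same syllable twice in a row: two occurrences separated by a blank frame are kept as two outputs, while adjacent repeats without a blank are merged into one.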

Parallel Keywords

Speech recognition; Lyric recognition; Music Information Retrieval; Convolutional Neural Network; Mandarin pinyin

