Front-End Filtering of Inappropriate Input for a Query-by-Singing/Humming Song Selection System

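Query by Singing/Humming (QBSH) is a music retrieval method in which a user sings or hums a melody and the system finds the matching song in a database. Melody recognition systems built around QBSH run recognition on every hummed query and ignore the problem of poorly hummed input. This thesis proposes methods for filtering hummed queries so that rejecting low-quality queries raises the recognition performance of such a system. Two filtering approaches are proposed: Gaussian Mixture Model (GMM) filtering and feature-based filtering. GMM filtering trains a Gaussian mixture model on hummed queries and uses the log-likelihood ratio computed with that model to select the best threshold. Feature-based filtering evaluates the filtering result of each feature by its false rejection rate and false acceptance rate to select the best threshold for that feature, and then uses the selected thresholds to decide which features to filter on. In the experiments, we use queries hummed from two positions, from the beginning of the song and from an arbitrary position, and perform melody recognition with three methods: linear scaling (LS), dynamic time warping (DTW), and the combination of LS and DTW. From each query we extract five feature values: mean volume, mean pitch, mean pitch clarity, mean pitch difference, and total number of frames. These feature values, together with the melody recognition results, serve as the experimental data for filtering hummed queries. Because users lose interest in query-by-humming when the system rejects too often, this thesis targets thresholds that keep the rejection rate at 30% or less. The experimental results show that the recognition rate improves for every data set after queries are rejected by either filtering method. The best result is obtained by applying GMM filtering to queries hummed from an arbitrary position and recognized with LS: at a rejection rate of 14.01%, the melody recognition rate improves by 6.45%. These results show that both GMM filtering and feature-based filtering effectively remove poorly hummed queries and thereby improve the melody recognition rate. Finally, we compare how much each filtering method helps for the different query types and melody recognition methods, propose directions for improvement based on the experimental results and error analysis, and conclude the thesis.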
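The threshold selection used in feature-based filtering can be illustrated with a minimal sketch. It is not the thesis's implementation: the function name `pick_threshold`, the rule "reject a query when its feature value is below the threshold", the choice to minimize the sum of the two error rates, and the sample data are all assumptions made for illustration. Here a false rejection is a correctly recognized query that gets rejected, and a false acceptance is an incorrectly recognized query that gets accepted.

```python
from typing import List, Optional, Tuple


def pick_threshold(
    values: List[float],
    correct: List[bool],
    max_reject: float = 0.30,
) -> Optional[Tuple[float, float, float, float]]:
    """Return (threshold, FRR, FAR, rejection rate) minimizing FRR + FAR.

    values[i]  : one feature value of query i (e.g. its mean volume)
    correct[i] : True if melody recognition found the right song for query i
    Assumption : a query is rejected when its feature value is below the threshold.
    Candidates whose rejection rate exceeds max_reject are skipped.
    """
    n_good = sum(correct)
    n_bad = len(correct) - n_good
    if n_good == 0 or n_bad == 0:
        return None  # both error rates need at least one query of each kind

    best = None
    for t in sorted(set(values)):
        rejected = [v < t for v in values]
        reject_rate = sum(rejected) / len(values)
        if reject_rate > max_reject:
            continue
        # False rejection: a correctly recognized query was rejected.
        frr = sum(r and c for r, c in zip(rejected, correct)) / n_good
        # False acceptance: an incorrectly recognized query was kept.
        far = sum((not r) and (not c) for r, c in zip(rejected, correct)) / n_bad
        if best is None or frr + far < best[1] + best[2]:
            best = (t, frr, far, reject_rate)
    return best


# Hypothetical data: ten queries, one normalized feature value each.
sample_values = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
sample_correct = [False, False, True, False, True, True, True, True, True, True]
sample_best = pick_threshold(sample_values, sample_correct)
```

Running the same search once per feature gives one candidate threshold per feature, which is the information the abstract says is used to decide which features to filter on.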

Keywords

Music retrieval; melody recognition; feature extraction; feature selection

Parallel Abstract

We introduce two methods for improving the melody recognition rate of a query by singing/humming (QBSH) system by filtering out bad queries. One is “filtration using Gaussian Mixture Model (GMM)”, and the other is “filtration using features”. Filtration using GMM trains a GMM on the features we identified and uses the trained GMM to compute a log-likelihood ratio, from which the filtration threshold is decided. Filtration using features applies the filtering result of each feature directly to choose thresholds for rejecting bad queries; the value of each threshold is decided by the sum of the false rejection rate and the false acceptance rate. We use three melody recognition methods, linear scaling (LS), dynamic time warping (DTW), and the combination of LS and DTW, to recognize queries sung from the beginning of a song or from anywhere in it. We use five features: mean volume, mean pitch, mean clarity, mean ratio of pitch, and number of frames. The training data and test data are derived from the results of the melody recognition methods. Because an excessively high rejection rate makes the system unfriendly to users, we consider only results with a rejection rate under 30% in our experiments. The results show that both proposed methods improve the recognition rate by at least 1% for each melody recognition method. The best result, a 6.45% improvement in recognition rate at a 14.01% rejection rate, is obtained by using GMM filtration to reject queries recognized by LS and sung from anywhere in the song. The proposed methods successfully eliminate bad queries and improve the recognition rate to a certain degree.
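As a rough illustration of the log-likelihood ratio used in GMM filtration, the sketch below scores a query's feature vector under two diagonal-covariance GMMs and accepts the query when the ratio reaches a threshold. Several points are assumptions not stated in the abstract: the use of two models (one for correctly recognized queries, one for incorrectly recognized ones), the diagonal covariances, the two-feature vectors, and every parameter value. Training is omitted; the model parameters are simply given.

```python
import math
from typing import List, Sequence, Tuple

# One mixture component: (weight, per-dimension means, per-dimension variances).
Component = Tuple[float, Sequence[float], Sequence[float]]


def gmm_log_likelihood(x: Sequence[float], gmm: List[Component]) -> float:
    """Log-likelihood of feature vector x under a diagonal-covariance GMM."""
    log_terms = []
    for weight, means, variances in gmm:
        log_p = math.log(weight)
        for xi, mu, var in zip(x, means, variances):
            log_p += -0.5 * (math.log(2.0 * math.pi * var) + (xi - mu) ** 2 / var)
        log_terms.append(log_p)
    # Log-sum-exp over components, shifted by the maximum for numerical stability.
    peak = max(log_terms)
    return peak + math.log(sum(math.exp(t - peak) for t in log_terms))


def log_likelihood_ratio(
    x: Sequence[float], good_gmm: List[Component], bad_gmm: List[Component]
) -> float:
    """Difference of log-likelihoods: positive values favor the 'good' model."""
    return gmm_log_likelihood(x, good_gmm) - gmm_log_likelihood(x, bad_gmm)


def accept_query(
    x: Sequence[float],
    good_gmm: List[Component],
    bad_gmm: List[Component],
    threshold: float = 0.0,
) -> bool:
    """Keep the query only if its log-likelihood ratio reaches the threshold."""
    return log_likelihood_ratio(x, good_gmm, bad_gmm) >= threshold


# Hypothetical models over two normalized features (mean volume, mean clarity).
GOOD_GMM: List[Component] = [
    (0.6, [0.7, 0.8], [0.02, 0.02]),
    (0.4, [0.5, 0.6], [0.03, 0.03]),
]
BAD_GMM: List[Component] = [
    (1.0, [0.2, 0.3], [0.03, 0.03]),
]
```

With these made-up parameters, a loud and clear query such as `[0.7, 0.8]` is accepted and a faint, unclear one such as `[0.2, 0.3]` is rejected. Raising `threshold` rejects more queries, which is the trade-off the 30% rejection-rate limit constrains.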

Parallel Keywords

No data

