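This thesis builds a song retrieval system that uses MPEG-7 audio descriptors as the basis for search. We propose a method for simulating mobile-phone recordings and assess its feasibility by evaluating its similarity and error in terms of both the descriptors and the spectrum. Having established feasibility, we add different sound sources and environmental noise to determine the identification rate under various conditions. We then discuss suitable identification strategies. We try filtering the descriptors along the frequency axis and the time axis to reduce the influence of noise, and find that processing along the frequency axis readily improves the identification rate. We also examine how the choice of frequency bands affects the identification rate: for music affected by environmental noise, more of the mid-to-high-frequency range is unsuitable for identification than for noise-free music, and a descriptor dimension of 8 to 12 is needed to maintain a high identification rate. Finally, we use principal component analysis (PCA) and factor analysis to examine the effect of dimensionality reduction on the identification rate, and find that the identification rate drops rapidly when recorded songs are reduced to a small number of dimensions.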
This thesis studies the performance of a music database system that accepts audio recorded on a mobile phone as the query and is based on MPEG-7 audio signature descriptors. We first investigate whether convolving a room impulse response with the reference audio can stand in for audio actually recorded on a mobile phone. By comparing the waveforms, we conclude that this approach is feasible. We next add environmental noise to the simulated recordings and use the result as test audio to examine various strategies for improving identification accuracy. Simulation results reveal that filtering along the frequency axis provides higher accuracy in noisy environments. We also find that comparing 8 to 12 subbands is sufficient. Our last experiment concerns accuracy versus the number of dimension-reduced descriptors. The results show that identification accuracy drops dramatically when the number of dimension-reduced features falls below a certain level.
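The simulation step described above (a room impulse response convolved with the reference audio, followed by added environmental noise) can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: the function names, the synthetic exponentially decaying impulse response, the white noise standing in for environmental noise, and the 10 dB signal-to-noise ratio are all hypothetical choices made for the example.

```python
import math
import random


def convolve(signal, impulse_response):
    """Full linear convolution of two sequences (direct form)."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def power(x):
    """Mean squared amplitude of a sequence."""
    return sum(v * v for v in x) / len(x)


def add_noise(signal, noise, snr_db):
    """Scale `noise` so the mixture has the requested SNR in dB, then add it."""
    gain = math.sqrt(power(signal) / (power(noise) * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(signal, noise)]


def simulate_recording(reference, impulse_response, noise, snr_db):
    """Reverberate the reference with the room response, then add noise."""
    reverberant = convolve(reference, impulse_response)
    return add_noise(reverberant, noise[:len(reverberant)], snr_db)


# Toy data (all assumed): a 440 Hz tone sampled at 8 kHz, a decaying
# synthetic impulse response, and white noise as the environmental noise.
rng = random.Random(0)
fs = 8000
reference = [math.sin(2 * math.pi * 440 * n / fs) for n in range(800)]
rir = [math.exp(-n / 20.0) * rng.uniform(-1, 1) for n in range(100)]
rir[0] = 1.0  # direct path from source to microphone
noise = [rng.gauss(0, 1) for _ in range(len(reference) + len(rir) - 1)]
query = simulate_recording(reference, rir, noise, snr_db=10.0)
```

In practice a measured room impulse response and recorded environmental noise would replace the synthetic ones, and an FFT-based convolution would be preferable for full-length songs; the direct form is used here only to keep the sketch free of dependencies.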