Automatic Recognition of Daily-Life Sounds

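Everyday environments contain many kinds of sound, both speech and non-speech, and people rely on the characteristics of those sounds to identify them by ear and judge what is happening around them. As technology has advanced, sound recognition has gradually become practical, especially for speech. It is also finding its way into home safety, yet regardless of a resident's identity, age, or status, urgent non-speech sounds can occur at home. Because earlier sound recognition work concentrated mostly on speech and speaker recognition, recognition of daily-life sounds becomes important here. Classifying and recognizing the urgent sounds that may occur in a home would greatly help in analyzing the surrounding situation and would also increase the sense of security of people living independently. In this thesis, we collected 372 audio files in eight classes and split them evenly: 186 files covering the eight classes form the training database, and the other 186 files form the test database. We use them to study and develop recognition methods for normal and noisy conditions. For feature extraction, we mainly use Mel-scale Frequency Cepstral Coefficients (MFCC) and perceptual features to extract feature vectors from the audio files. The classifier uses a Gaussian Mixture Model (GMM) as its front end, with an added outlier rejection mechanism based on the Likelihood Ratio Test (LRT): test files and non-database files are each matched against the database models so that non-database files are not forced into a wrong class. We apply three methods to classify the daily-life sound files: Variance-Mean, Frame Vote, and Selected Frame Vote. When matching test files against the database, the best of the three methods reaches a recognition accuracy of 96.24% under normal conditions, and we also carried out a complete evaluation of robustness to noise and echo. For outlier rejection, we ran experiments with 120 non-database audio files, and the overall error rate could be reduced to 19% at best. A second experiment with a different set of 100 non-database audio files reduced the error rate to 23% at best.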

Keywords

Mel-frequency cepstrum; Gaussian mixture model; Log Likelihood Ratio Test; sound recognition; non-speech; rejection

Parallel Abstract

Daily life is full of different sounds. Whether they are speech or non-speech, we recognize them by ear from their characteristics and use them to understand what is happening around us. With advances in technology, sound recognition has gradually become practical, especially for speech, and it is now making its way into home safety. Regardless of a user's age or status, emergencies accompanied by non-speech sounds can happen at home. Past work on sound recognition has mostly focused on speech and speaker recognition, which makes recognition of daily-life sounds important. Classifying and recognizing sounds that indicate dangerous situations in the home would help analyze the situation and increase people's sense of security when living independently. In this thesis, we collected 372 audio files in eight classes for the experiments. The files were divided equally into training and test sets, which we use to develop sound recognition methods for normal and noisy conditions. For feature extraction, the feature vector consists of Mel-scale Frequency Cepstral Coefficients (MFCC) and perceptual features. A Gaussian mixture model (GMM) serves as the front end of the classifier, and an outlier rejection mechanism is added to it. The mechanism is based on the Likelihood Ratio Test (LRT): test files and non-dataset files are each compared against the dataset models, which prevents non-dataset files from being forced into one of the known classes. We use three methods to classify the audio files: the variance-mean method, the frame-vote method, and the selected frame-vote method. When test files are matched against the dataset, the best of these methods reaches a recognition accuracy of 96.24% under normal conditions. We also carry out a complete evaluation of robustness to noise and echo.
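The frame-vote idea named above can be illustrated with a minimal sketch. It assumes one diagonal-covariance GMM per class and assumes that each feature frame votes for the class whose model gives it the highest log-likelihood; the abstract does not spell out these details, so the voting rule, the class names `class_a` and `class_b`, the 2-D toy features, and all parameter values are hypothetical and not the thesis's actual configuration.

```python
import math
from collections import Counter


def gmm_log_likelihood(frame, gmm):
    """Log-likelihood of one feature frame under a diagonal-covariance GMM.

    `gmm` is a list of (weight, means, variances) tuples, one per component.
    """
    component_logs = []
    for weight, means, variances in gmm:
        log_p = math.log(weight)
        for x, mu, var in zip(frame, means, variances):
            log_p += -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)
        component_logs.append(log_p)
    # Log-sum-exp over components, for numerical stability.
    peak = max(component_logs)
    return peak + math.log(sum(math.exp(v - peak) for v in component_logs))


def frame_vote(frames, models):
    """Let every frame vote for its best-scoring class; return (winner, votes)."""
    votes = Counter()
    for frame in frames:
        best = max(models, key=lambda name: gmm_log_likelihood(frame, models[name]))
        votes[best] += 1
    winner = votes.most_common(1)[0][0]
    return winner, votes


# Hypothetical toy models (assumption): two classes, 2-D features.
models = {
    "class_a": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [1.0, 1.0], [1.0, 1.0])],
    "class_b": [(1.0, [5.0, 5.0], [1.0, 1.0])],
}
# Hypothetical feature frames standing in for MFCC vectors of one audio file.
frames = [[0.2, -0.1], [0.9, 1.2], [4.8, 5.1], [0.5, 0.4]]

winner, votes = frame_vote(frames, models)
print(winner, dict(votes))
```

In a real system the model parameters would be estimated from the training files and the frames would be MFCC and perceptual feature vectors rather than hand-written numbers.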
For the outlier rejection mechanism, we collected 120 non-dataset audio files for the experiment, and the overall error rate could be reduced to 19%. In a second experiment with another 100 non-dataset audio files, the overall error rate could be reduced to 23%.
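One common way to build rejection on a likelihood ratio test is to compare the score of the winning class model with the score of an alternative model and to reject the file when the average log-likelihood ratio falls below a threshold. The sketch below illustrates that general idea only. The choice of alternative model, the threshold value, and the function names `average_llr` and `accept` are assumptions for illustration, and the per-frame scores are made-up numbers, not results from the thesis.

```python
def average_llr(target_scores, alternative_scores):
    """Mean per-frame log-likelihood ratio between two models.

    Both arguments are lists of per-frame log-likelihoods for the same frames.
    """
    if not target_scores or len(target_scores) != len(alternative_scores):
        raise ValueError("score lists must be non-empty and of equal length")
    diffs = [t - a for t, a in zip(target_scores, alternative_scores)]
    return sum(diffs) / len(diffs)


def accept(target_scores, alternative_scores, threshold):
    """Accept the winning class only if the average ratio reaches the threshold."""
    return average_llr(target_scores, alternative_scores) >= threshold


# Made-up per-frame log-likelihoods (assumption, for illustration only).
in_set_target = [-10.0, -11.0, -9.5]   # winning model fits the file well
in_set_alt = [-14.0, -15.0, -13.5]     # alternative model fits clearly worse
out_target = [-20.0, -21.0, -19.0]     # winning model fits the file poorly
out_alt = [-20.5, -20.0, -19.5]        # alternative model fits about equally

threshold = 1.0  # hypothetical value; in practice tuned on held-out data

print(accept(in_set_target, in_set_alt, threshold))  # in-set file
print(accept(out_target, out_alt, threshold))        # non-dataset file
```

Raising the threshold rejects more non-dataset files at the cost of rejecting more genuine ones, so the value would normally be chosen to balance the two kinds of error.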
