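To let the public quickly learn which species a bird belongs to from its call, this thesis proposes a system that automatically identifies bird sounds in an audio stream. The stream may contain many different bird sounds recorded at different times, so the system must determine which segments contain which species. We use two kinds of features for identification: pitch and timbre. In the pitch-based analysis, we convert the audio signal into a sequence of notes and use bigram models to capture the dynamic pitch changes characteristic of each species, and thereby identify the species. In the timbre-based analysis, we extract timbre features with Mel-frequency cepstral coefficients and use Gaussian mixture models to represent the timbre characteristics shared within each species, and thereby identify the species. For experimental validation, we collected 2815 sound samples from 20 bird species and randomly concatenated all of them into a single audio file 885 seconds long. The results show identification accuracies of 50.2%, 82.45%, and 85.28% when using pitch features, timbre features, and the combination of pitch and timbre features, respectively.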
To help people learn bird species from their sounds, this study proposes a system that automatically identifies bird sounds in a long audio recording. At each instant of the recording, we perform timbre-based and pitch-based analyses. In the timbre-based analysis, Mel-frequency cepstral coefficients (MFCCs) are extracted from every short segment and then classified with a Gaussian mixture model (GMM) classifier. In the pitch-based analysis, we convert the sound signal from its waveform representation into a sequence of MIDI notes, and then use bigram models to capture how the notes change over time. The database used in this thesis consists of 2815 sound recordings from 20 bird species. We further concatenated all the recordings into a single 885-second audio stream. Our experiments show that the identification accuracies obtained with pitch-based analysis, timbre-based analysis, and combined pitch- and timbre-based analysis are 50.2%, 82.45%, and 85.28%, respectively.
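As an illustration of the pitch-based step, the following is a minimal sketch of scoring a MIDI note sequence with one bigram model per species. It is not the thesis implementation: the names `hz_to_midi`, `BigramModel`, and `classify` are hypothetical, add-one smoothing is an assumption (the abstract does not say how unseen note pairs are handled), and the training sequences are toy data, not recordings from the database.

```python
import math
from collections import Counter


def hz_to_midi(freq_hz):
    """Map a frequency to the nearest MIDI note number (A4 = 440 Hz = note 69)."""
    return round(69 + 12 * math.log2(freq_hz / 440.0))


class BigramModel:
    """Bigram model over MIDI notes with additive smoothing (an assumption)."""

    def __init__(self, vocab_size=128, alpha=1.0):
        self.vocab_size = vocab_size  # MIDI defines 128 note numbers
        self.alpha = alpha            # smoothing constant, hypothetical choice
        self.pair_counts = Counter()  # counts of (previous note, current note)
        self.prev_counts = Counter()  # counts of each note used as context

    def fit(self, sequences):
        for seq in sequences:
            for prev, cur in zip(seq, seq[1:]):
                self.pair_counts[(prev, cur)] += 1
                self.prev_counts[prev] += 1
        return self

    def log_likelihood(self, seq):
        """Sum of log P(cur | prev) over consecutive note pairs in seq."""
        total = 0.0
        for prev, cur in zip(seq, seq[1:]):
            num = self.pair_counts[(prev, cur)] + self.alpha
            den = self.prev_counts[prev] + self.alpha * self.vocab_size
            total += math.log(num / den)
        return total


def classify(seq, models):
    """Return the species whose bigram model gives seq the highest likelihood."""
    return max(models, key=lambda name: models[name].log_likelihood(seq))


# Toy training data: one rising pattern, one alternating pattern.
models = {
    "species_a": BigramModel().fit([[60, 62, 64, 65, 67]] * 3),
    "species_b": BigramModel().fit([[72, 70, 72, 70, 72]]),
}
```

Calling `classify([60, 62, 64], models)` picks the model trained on the rising pattern, because its note transitions were seen in that model's training data and only receive the smoothing floor under the other model. In the same spirit, a timbre-based score (for example, a per-species GMM log-likelihood over MFCC frames) could be computed per segment, but how the two scores are combined is not described in the abstract.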