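There are currently more than 9,700 bird species in the world, and Taiwan, small as the island is, is home to about one-twentieth of them. Although many of these lovely neighbors live around us, we often only hear their sounds without knowing who they are. Bird sounds are rich and varied. By exploiting the differences in sound between species, we hope to develop a bird sound identification system so that members of the general public who are not bird experts can obtain information about a species from a bird sound clip they have casually recorded. To test the performance of such a system, this thesis approaches bird sound identification from the standpoint of speaker identification. We try features commonly used in acoustic recognition, such as Mel-frequency cepstral coefficients, Rényi entropy, and Mel spectral centroid cepstral coefficients, and combine them with Gaussian mixture models to build a baseline identification system. The experiments show, however, that different types of bird sound interfere with one another severely, and that the features differ in how well they discriminate each type of sound. To address this problem, we divide bird sounds into "song", "call", and "call sequence", build a statistical model for each type using different features, and design a two-stage identification architecture: the system first determines whether an unknown sound is a call, a call sequence, or a song, and then matches it against the species models of that type. The training and test recordings were collected from commercial CDs and from the Internet, respectively. We selected the sounds of ten bird species common in the greater Taipei area, and the training and test recordings come from different sources. The experimental results show that the two-stage architecture raises the identification rate from 60.37% for the baseline system to 80%, which demonstrates the feasibility of the proposed method.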
There are more than 9,700 bird species in the world, and about one-twentieth of them can be found on the small island of Taiwan. Despite this abundance, most people in Taiwan cannot recognize a bird by its sound, even when the bird is commonly seen. In this study, we attempt to develop automated techniques for identifying bird species from their sounds. We hope that these techniques can help people learn about such lovely animals simply by recording the bird sounds they hear. To begin, we apply the most prevalent speaker-identification method to the problem of bird sound identification. We investigate several audio features, namely Mel-scale frequency cepstral coefficients (MFCCs), Rényi entropy, spectrum centroid, and Mel spectral centroid cepstral coefficients (MSCCCs), together with Gaussian mixture modeling. However, our experiments show that the resulting identification accuracy is far from satisfactory and that the different audio features perform quite differently. To improve identification performance, we propose a two-stage paradigm based on the observation that bird sounds can be divided into three classes: bird call, bird call sequence, and bird song. We find that MFCCs are suitable for bird song identification, whereas MSCCCs are suitable for bird call sequence identification. We therefore build two separate identifiers, one for bird call sequences and one for bird songs, each using a different audio feature, and recognize which of the three classes an unknown test sound belongs to. Our experimental data comprise ten bird species and were collected from both commercial CDs and public websites. The proposed two-stage system improves identification accuracy substantially, from 60.37% to 80%.
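The two-stage decision described above can be illustrated with a minimal sketch. It is not the implementation used in this study: the function names, the model layout, and the toy numbers are all hypothetical, the GMM parameters are assumed to be already trained, and feature extraction (MFCCs, MSCCCs) is assumed to have been done elsewhere. Stage 1 picks the sound type whose model scores the recording highest, and stage 2 compares only the species models of that type. Comparing log-likelihoods computed on different feature sets in stage 1 is a simplification made for this sketch.

```python
import math

def gmm_avg_loglik(frames, gmm):
    """Average per-frame log-likelihood under a diagonal-covariance GMM.

    gmm is a list of (weight, means, variances) components.
    """
    total = 0.0
    for x in frames:
        comp_logs = []
        for weight, means, variances in gmm:
            log_p = math.log(weight)
            for xi, mu, var in zip(x, means, variances):
                log_p -= 0.5 * (math.log(2 * math.pi * var) + (xi - mu) ** 2 / var)
            comp_logs.append(log_p)
        # Log-sum-exp over components for numerical stability.
        m = max(comp_logs)
        total += m + math.log(sum(math.exp(c - m) for c in comp_logs))
    return total / len(frames)

def best_label(features, models):
    """Return the label whose model gives the highest average log-likelihood.

    models maps label -> (feature_name, gmm); features maps feature_name -> frames.
    """
    return max(
        models,
        key=lambda label: gmm_avg_loglik(features[models[label][0]], models[label][1]),
    )

def identify(features, type_models, species_models):
    """Stage 1: choose the sound type. Stage 2: choose a species within that type."""
    sound_type = best_label(features, type_models)
    species = best_label(features, species_models[sound_type])
    return sound_type, species

# Toy, hand-set parameters (hypothetical; real models would be trained on recordings).
features = {
    "mfcc": [[0.0, 0.1], [0.2, -0.1]],
    "msccc": [[5.0, 5.1], [4.9, 5.2]],
}
type_models = {
    "song": ("mfcc", [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [1.0, 1.0], [1.0, 1.0])]),
    "call": ("mfcc", [(1.0, [3.0, 3.0], [1.0, 1.0])]),
    "call_sequence": ("msccc", [(1.0, [0.0, 0.0], [1.0, 1.0])]),
}
species_models = {
    "song": {
        "species_A": ("mfcc", [(1.0, [0.0, 0.0], [1.0, 1.0])]),
        "species_B": ("mfcc", [(1.0, [2.0, 2.0], [1.0, 1.0])]),
    },
    "call": {"species_A": ("mfcc", [(1.0, [3.0, 3.0], [1.0, 1.0])])},
    "call_sequence": {"species_B": ("msccc", [(1.0, [0.0, 0.0], [1.0, 1.0])])},
}

print(identify(features, type_models, species_models))
```

With these toy values the recording is routed to the song models first and then matched against the two song-type species models only, which is how the second stage avoids interference from models of the other sound types.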