Confidence Estimation for Environmental Sound Recognition

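In recent years, environmental sound recognition has become a new research topic in home automation applications. In a home automation system, correctly recognizing the sounds in the environment is the basis for carrying out tasks. For a recognition system, the choice of features and classifier plays an important role in determining the recognition rate. This thesis uses nonuniform scale-frequency maps as the feature and chooses the Gaussian process as the classifier. Besides the features and the classifier, however, the reliability of the training data also affects the recognition rate. This thesis therefore proposes a new data confidence estimation method for performing outlier detection. The method uses a predefined dictionary to represent the features as a Gaussian distribution. From the parameters of this Gaussian distribution, we define two confidence values, called data confidence and dimension confidence, and propose two corresponding kernel functions for use in the Gaussian process. We set a threshold to decide whether a data point is an outlier: if the confidence of a data point is below the threshold, the point is regarded as an outlier; otherwise, it is a normal data point. Regarding dictionary selection, this thesis discusses how the estimated confidence differs across several dictionaries based on matrix factorization, such as standard nonnegative matrix factorization, semi-nonnegative matrix factorization, sparse nonnegative matrix factorization, principal component analysis, and two-dimensional (semi-)nonnegative matrix factorization. The test database is a 20-class environmental sound database. The experimental results show that the confidence estimated with the sparse nonnegative matrix factorization dictionary is more discriminative and performs better in the proposed outlier detection algorithm.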

Keywords

Confidence estimation; Environmental sound recognition; Gaussian process

Parallel Abstract

In recent years, environmental sound recognition has become a new research topic in home automation. In home automation systems, the sound recognized by the system is the basis for performing certain tasks. For a recognition system, features and classifiers play important roles in determining performance. This thesis adopts nonuniform scale-frequency maps (nSFMs) as the feature and chooses the Gaussian process as the classifier. Apart from features and classifiers, however, the reliability of the data should also be taken into consideration. We therefore propose a new confidence estimation approach for outlier detection. Two confidence measures, called data confidence and dimension confidence, are defined, and two corresponding kernels are proposed for the Gaussian process. A threshold is set to decide whether a data point is an outlier: if the confidence value of the data point is less than the threshold, the data point is regarded as an outlier; otherwise, it is a normal data point. For dictionary selection, several matrix-factorization-based dictionaries are discussed, such as standard nonnegative matrix factorization (NMF), Semi-NMF, sparse NMF, principal component analysis (PCA), and 2D (Semi-)NMF. Experiments are conducted on a 20-class environmental sound database. The results indicate that the confidence values estimated with the sparse NMF dictionary are more discriminative and perform better in the proposed outlier detection approach.
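The thresholding rule described above can be illustrated with a minimal sketch. It assumes that one confidence value per data point has already been computed by some estimator; the function name `flag_outliers`, the sample values, and the threshold of 0.2 are all hypothetical and are not taken from the thesis, which does not give its formulas or threshold in this abstract.

```python
import numpy as np


def flag_outliers(confidence, threshold):
    """Return a boolean mask that is True where a data point is an outlier.

    A point is an outlier when its confidence is strictly less than the
    threshold, as stated in the abstract; otherwise it is a normal point.
    """
    confidence = np.asarray(confidence, dtype=float)
    return confidence < threshold


# Hypothetical confidence values for five training samples (not thesis data).
confidence = np.array([0.91, 0.12, 0.78, 0.05, 0.66])
threshold = 0.2  # hypothetical value chosen only for illustration

is_outlier = flag_outliers(confidence, threshold)
kept_indices = np.flatnonzero(~is_outlier)  # indices of normal data points
```

In this sketch the mask would be used to drop or down-weight the flagged samples before training the classifier; how the thesis actually feeds confidence into the Gaussian process kernels is not specified here.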

Parallel Keywords

Confidence estimation; Environmental sound recognition; Gaussian process

