A Study on Robust Sound Event Recognition

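In recent years, environmental sound recognition has become a new research topic in home automation applications. In a home automation system, correctly recognizing the sounds in the environment is the basis for carrying out tasks. In real environments, however, external interference lowers the recognition rate, for example when the target sound occurs simultaneously with other sounds, or when environmental noise is present. To handle these two problems, this dissertation proposes three robust processing methods. We first propose a mixed sound recognition method to handle simultaneously occurring sounds. For environmental noise, this dissertation adopts two approaches to remove the influence of the noise. The first approach removes the noise from the received signal and then extracts feature parameters; this is called sound enhancement. The second approach extracts feature parameters while removing the noise; this is called robust feature extraction. For the problem of simultaneously occurring sounds, we propose a mixed sound verification method based on a wireless sensor network. This framework comprises wireless-sensor-network-based sound separation and sound verification techniques. For sound enhancement in noisy environments, this dissertation proposes a fast subspace sound enhancement algorithm to filter out background noise. For robust feature extraction, this dissertation proposes a feature extraction method based on a nonuniform scale-frequency map. Experimental results show that, when sounds occur simultaneously or environmental noise is present, each of the three proposed methods achieves a higher recognition rate than the baseline methods.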

Keywords

Robust Sound Event Recognition; Mixed Sound Event Verification; Audio Enhancement; Robust Feature Extraction

Parallel Abstract

In recent years, environmental sound recognition has become a new research topic in home automation. In home automation systems, the sounds recognized by the system are the basis for performing tasks. In real-world applications, however, various disturbances can cause the recognition system to fail. For example, a target sound may be mixed with another sound that occurs at the same time, or the received sound may be corrupted by background noise. To resolve these two issues, this dissertation proposes three robust processing methods. We first propose a mixed sound verification method to deal with the simultaneous occurrence of sounds. For the problem of background noise, this dissertation adopts two approaches to reduce its impact on recognition. The first approach, called sound enhancement, suppresses the noise in the received sound before feature extraction. The second approach, called robust feature extraction, performs denoising and feature extraction simultaneously. To handle the simultaneous occurrence of multiple sounds, this study proposes a framework that consists of sound separation and sound verification techniques based on a wireless sensor network (WSN). To reduce noise in the input audio, we propose a fast subspace-based sound enhancement method that filters background noise in the signal subspace. For robust feature extraction, we propose a novel feature extraction approach for environmental sound recognition called the nonuniform scale-frequency map. The experimental results demonstrate the feasibility of the three proposed systems and show that they are more robust than the baseline systems.

Parallel Keywords

Robust Sound Event Recognition; Mixed Sound Event Verification; Sound Enhancement; Robust Feature Extraction
