
基於基頻與倍頻結構之語音偵測研究

Voice/nonvoice Detection Based on Fundamental Frequency and Harmonic Structure

Advisor: 劉奕汶

Abstract


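This thesis studies voice detection, the front-end processing stage of a sound-based care system. Because such a system cannot assume that a call for help will be uttered within a preset time window, its application scenario requires voice detection so that, at any moment, it can decide whether a captured sound is a daily sound or a voice signal. In addition, the system captures sound from a distance. If the decision relied only on comparing volume or weighted SNR, misjudgments could occur, because the volume and weighted SNR of daily sounds are not necessarily lower than those of voice. This thesis therefore proposes an improved method that exploits the periodicity of voice signals to derive two features for separating voice from nonvoice, the fundamental frequency and the harmonic structure, and uses them for voice detection. The fundamental frequency is the pitch of the sound signal; this thesis computes it with the autocorrelation function. Because voice signals are periodic, peaks appear at both the fundamental and its harmonics in the spectrum, and these features indicate whether an input sound is voice. Analysis on the collected corpus shows that, at an SNR of 5 dB or higher, the false positive rate is within 28% and the miss rate is within 10%.

Spectral analysis shows that sounds produced by musical instruments (hereafter "instrumental music") also have harmonic structure. In terms of spectral energy, however, instrumental music fluctuates far more than voice (total variation, TV). A formula designed to measure this difference in fluctuation separates voice from instrumental music effectively. Analysis on the collected corpus shows that, at an SNR of 10 dB, the false positive rate is within 11% and the miss rate is within 20%. The corpus includes voice signals and daily sounds of elderly residents recorded at 雙連安養中心 and 仁濟安老所 in Taipei, daily sounds and voice signals recorded by laboratory classmates, and coughing sounds recorded with the help of 新竹小太陽醫院.

For the hardware implementation, a decimation-in-time algorithm computes the signal spectrum so that the harmonic structure can be observed, and the fast Fourier transform shortens the time the autocorrelation function needs to find the fundamental frequency. These methods were implemented on a DSP development board provided by TI, mainly using two hardware communication mechanisms, interrupt service routines (Interrupt) and the multi-channel buffered serial port (McBSP), together with the accompanying program.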
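The abstract names two building blocks, a decimation-in-time FFT and an FFT-accelerated autocorrelation for finding the fundamental frequency. The following is a minimal sketch of how they could fit together, not the thesis's DSP code. The function names, the recursive radix-2 formulation, and the 60 to 400 Hz search range (`fmin`, `fmax`) are assumptions made for illustration only.

```python
import cmath
import math


def fft_dit(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft_dit(x[0::2])  # transform of the even-indexed samples
    odd = fft_dit(x[1::2])   # transform of the odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]  # twiddle factor times odd part
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def ifft_dit(spectrum):
    """Inverse FFT via the identity ifft(X) = conj(fft(conj(X))) / n."""
    n = len(spectrum)
    y = fft_dit([v.conjugate() for v in spectrum])
    return [v.conjugate() / n for v in y]


def autocorrelation(frame):
    """Linear autocorrelation for lags 0..len(frame)-1, computed through the FFT."""
    n = len(frame)
    size = 1
    while size < 2 * n:  # zero-pad so circular correlation equals linear correlation
        size *= 2
    padded = list(frame) + [0.0] * (size - n)
    power = [abs(v) ** 2 for v in fft_dit(padded)]
    return [v.real for v in ifft_dit(power)[:n]]


def estimate_f0(frame, fs, fmin=60.0, fmax=400.0):
    """Return fs / lag for the lag with the largest autocorrelation in the search range.

    fmin and fmax are hypothetical bounds, not values taken from the thesis.
    """
    r = autocorrelation(frame)
    lag_min = max(1, int(fs / fmax))
    lag_max = min(int(fs / fmin), len(r) - 1)
    best_lag = max(range(lag_min, lag_max + 1), key=lambda lag: r[lag])
    return fs / best_lag
```

Calling `estimate_f0(frame, fs)` on a frame of samples returns a pitch estimate in Hz. Computing the autocorrelation through the power spectrum costs O(N log N) instead of the O(N²) of direct summation, which is the speed-up the abstract attributes to the FFT.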

Keywords

voice detection; harmonic structure

Parallel Abstract


This thesis focuses on voice detection, the front-end processing stage of the Voicecare system, which is designed to understand human calls for help. Since we cannot assume that calls for help occur at any predetermined time, a Voicecare device needs a voice detector that can decide, at any time, whether an incoming sound is a daily sound or voice. Moreover, the device receives distant sounds, so comparing only volume or weighted SNR may lead to wrong decisions, because the volume and weighted SNR of daily sounds are not necessarily lower than those of voice. This thesis addresses the problem by using fundamental frequency and harmonic structure to differentiate voice from nonvoice. The fundamental frequency is the pitch of the voice; this thesis uses the autocorrelation function to calculate the fundamental frequency of a signal. Because voice signals are periodic, peaks appear at the fundamental and harmonic frequencies in the spectrum, and voice and nonvoice can be distinguished based on these characteristics. Experiments show that the false positive rate is within 28% and the miss rate is within 10% when the SNR is above 5 dB. By observing the magnitude spectra of instrumental music and voice, we find that instrumental music also exhibits harmonic structure and is therefore misclassified as voice. However, the magnitude spectrum of instrumental music varies more dramatically than that of voice. Based on this observation, we separate instrumental music from voice by calculating the total variation of the magnitude spectrum. Experiments show that the false positive rate is within 11% and the miss rate is within 20% when the SNR is 10 dB. The sound collection includes voice and daily sounds of the elderly recorded at Suang-Lien Elderly Center in Taipei (台北雙連安養中心) and Yan-Chai Elderly Center in Taipei (台北仁濟安老所), voice and daily sounds gathered from classmates in the Acoustic and Hearing laboratory, and coughing sounds provided by 新竹小太陽診所. For the hardware implementation, we use a decimation-in-time algorithm to calculate the spectrum and observe the harmonic structure. In addition, the fast Fourier transform is used to shorten the computation time of the autocorrelation function. These methods were implemented on a DSP board (Texas Instruments C6416). Hardware communication techniques, namely interrupts and the Multi-channel Buffered Serial Port (McBSP), are used to enable real-time implementation on the DSP board.
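The abstract does not give the formula used to measure the total variation of the magnitude spectrum. As a rough illustration only, the sketch below assumes a common definition: the sum of absolute differences between adjacent frequency bins, normalized by the total magnitude so that the result does not depend on loudness. The function names and the `tv_threshold` parameter are hypothetical, and the thesis's actual formula may differ.

```python
def total_variation(magnitudes):
    """Assumed definition: sum of |adjacent-bin differences| divided by total magnitude."""
    total = sum(magnitudes)
    if total == 0:
        return 0.0  # silent frame: no variation to measure
    variation = sum(abs(b - a) for a, b in zip(magnitudes, magnitudes[1:]))
    return variation / total


def looks_like_voice(magnitudes, tv_threshold):
    """Label a harmonic frame as voice when its spectrum varies less than the threshold.

    tv_threshold is a hypothetical tuning parameter, not a value from the thesis.
    """
    return total_variation(magnitudes) <= tv_threshold
```

Under this assumed definition, a flat spectrum scores 0 and a spectrum that alternates sharply between bins scores higher, which matches the direction described in the abstract: frames with more dramatic spectral variation are treated as instrumental music rather than voice.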

Parallel Keywords

voice detection; harmonic structure


Cited by


何育澤 (2014). 基於支持向量機之混合聲響辨認 [Master's thesis, National Tsing Hua University]. Airiti Library. https://www.airitilibrary.com/Article/Detail?DocID=U0016-2912201413562412
