
實現智慧型電視之個人化服務的語者識別技術探討

A Study of Speaker Identification Techniques for the Personalized Services of Smart TV

Advisor: 蔡偉和

Abstract


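Personalized service is an important feature that a smart TV should offer, and distinguishing between users is the first step toward providing it. This thesis investigates how to identify users from their voices, that is, speaker identification. Because users are usually watching TV while they speak, the microphone records the TV's playback sound along with the speaker's voice, and the playback sound is sometimes even louder than the voice, which makes speaker identification challenging. Fortunately, the TV's playback sound can be obtained from the TV's line output and can therefore serve as a reference for removing the background sound. However, the playback sound recorded by the microphone is not actually identical to the sound obtained from the line output, so direct signal subtraction cannot recover the user's speech cleanly. To address this problem, this thesis uses adaptive spectral subtraction to estimate the spectrum of the user's speech and identifies the user from that estimate. Experiments show, however, that spectral subtraction usually cannot remove the background sound completely and may also damage the spectrum of the user's speech. This thesis therefore designs two approaches that compensate for the shortcomings of spectral subtraction. Experiments confirm that even when the TV playback sound is louder than the user's speech, the two proposed compensation approaches still achieve a speaker identification accuracy of 97.17%, far better than spectral subtraction alone.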

Parallel Abstract


Personalized service is an important feature of smart TVs. To enable personalized services, the TV must first identify the user. This thesis investigates how to identify a smart TV user from his or her voice, that is, speaker identification. When a user issues a voice command to a smart TV, the signal received by the TV contains not only the user's speech but also background sound, mainly from the TV itself. Sometimes the background sound is louder than the user's speech, which is detrimental to speaker identification. Fortunately, the TV's background sound can be acquired by recording the signal from "Line out". However, because the background sound coming from the TV's speaker(s) is not the same as the signal from "Line out", the user's voice cannot be recovered by direct subtraction in the time domain. To deal with this problem, we propose using spectrum subtraction to remove the background sound in the frequency domain. However, we find that although spectrum subtraction removes the background sound significantly, it also removes some components of the user's speech spectrum. To further improve speaker identification, we propose two compensating approaches built upon spectrum subtraction. Our experiments show that when the background sound is much louder than the user's speech, both proposed compensating approaches achieve a speaker identification accuracy of more than 97.17%.
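
The basic idea of subtracting a "Line out" reference in the frequency domain can be sketched as follows. This is only a minimal illustration of plain magnitude spectral subtraction, not the adaptive method or the two compensating approaches of the thesis, whose details the abstract does not give. The function names, the fixed over-subtraction factor `alpha`, the spectral floor `floor`, and the frame settings are all assumptions made for this example, and the two input signals are assumed to be time-aligned and of equal length.

```python
import numpy as np


def stft_magnitude(signal, frame_len=512, hop=256):
    """Return the magnitude spectrogram (frames x bins) of a 1-D signal."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([
        signal[i * hop: i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    return np.abs(np.fft.rfft(frames, axis=1))


def spectral_subtract(mic, line_out, alpha=1.0, floor=0.05,
                      frame_len=512, hop=256):
    """Estimate the speech magnitude spectrum from a microphone recording.

    mic      : microphone signal (speech plus TV sound), 1-D array
    line_out : reference TV signal taken from "Line out", same length
    alpha    : hypothetical over-subtraction factor (assumed fixed here)
    floor    : hypothetical spectral floor that avoids negative magnitudes
    """
    mic_mag = stft_magnitude(mic, frame_len, hop)
    ref_mag = stft_magnitude(line_out, frame_len, hop)
    residual = mic_mag - alpha * ref_mag
    # Clamp to a small fraction of the observed magnitude instead of zero.
    return np.maximum(residual, floor * mic_mag)


# Toy usage: a quiet 440 Hz "voice" under a louder 1 kHz "TV" tone.
sr = 16000
t = np.arange(sr) / sr
speech = 0.5 * np.sin(2 * np.pi * 440 * t)
tv = 1.0 * np.sin(2 * np.pi * 1000 * t)
estimate = spectral_subtract(speech + tv, tv)
```

In this toy case the reference matches the interfering sound exactly, so the subtraction works well. In the setting the abstract describes, the sound reaching the microphone differs from the "Line out" signal, which is why a fixed `alpha` leaves residual background sound or removes parts of the speech spectrum.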

