Research and Implementation of Digit-Based Text-Dependent Speaker Verification

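Voiceprint verification is an important form of biometric verification. Its greatest advantage is its simple hardware requirement: an ordinary, commercially available microphone is sufficient, so it is commonly used for biometric recognition over telephones and mobile phones. This thesis aims to build a text-dependent voiceprint verification system comprising three parts. The "dynamic time warping speaker verification system" segments the digits by forced alignment and then uses dynamic time warping to compare the Mel-frequency cepstral coefficients (MFCCs) of the digits at enrollment with those of the digits at test time. The "sentence-level speaker verification system" extracts i-vectors directly from the enrollment and test recordings and scores the two sets of i-vectors using cosine similarity or probabilistic linear discriminant analysis (PLDA). The "digit-level speaker verification system" segments the digits by forced alignment, extracts an i-vector for each digit in the enrollment and test recordings, and scores the i-vectors of corresponding digits using cosine similarity or PLDA.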
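The dynamic time warping comparison described above can be illustrated with a minimal sketch. It assumes each digit segment is already available as a sequence of feature frames (for example MFCC vectors). The function names, the Euclidean frame distance, and the length normalization are illustrative assumptions, not details taken from the thesis.

```python
import math


def frame_distance(x, y):
    """Euclidean distance between two feature frames (assumed local cost)."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(x, y)))


def dtw_distance(a, b):
    """Length-normalized DTW cost between two sequences of feature frames.

    a, b: non-empty lists of frames, each frame a list of floats
    (e.g. the MFCC frames of one digit at enrollment and at test time).
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # acc[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = frame_distance(a[i - 1], b[j - 1])
            # Allowed steps: advance in a, advance in b, or advance in both.
            acc[i][j] = cost + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    # Normalizing by n + m is one common choice, assumed here.
    return acc[n][m] / (n + m)


# Toy one-dimensional "features" standing in for real MFCC frames.
enrolled_digit = [[0.0], [1.0], [2.0]]
slower_test_digit = [[0.0], [0.0], [1.0], [2.0], [2.0]]  # same shape, stretched in time
different_digit = [[1.0], [2.0], [3.0]]

same_cost = dtw_distance(enrolled_digit, slower_test_digit)
diff_cost = dtw_distance(enrolled_digit, different_digit)
```

In this toy example the time-stretched sequence aligns with zero cost while the shifted sequence does not, which is the property that makes DTW suitable for comparing digits spoken at different speeds. A decision threshold on the cost would have to be tuned on held-out data.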

Keywords

Speaker verification; forced alignment; dynamic time warping; i-vector; probabilistic linear discriminant analysis

Parallel Abstract

Speaker recognition is an important biometric identification method. Its biggest advantage is its simple hardware requirement: only a microphone is needed. It is therefore widely used in mobile phones and call centers. The purpose of this thesis is to build a text-dependent speaker verification system, for which we implement three approaches and analyze their results. The dynamic time warping approach applies forced alignment to segment the digits and then compares the MFCCs of the digits at enrollment with those of the digits at testing. The sentence-level approach uses cosine similarity or PLDA to score the two sets of i-vectors extracted from the enrollment and test audio, respectively. The digit-level approach applies forced alignment, extracts an i-vector for every digit in the audio, and uses cosine similarity or PLDA to score the i-vectors of corresponding digits.
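The cosine-similarity scoring used by the digit-level approach can be sketched as follows. The sketch assumes that one fixed-length vector per digit has already been extracted for both the enrollment and the test audio; i-vector extraction and PLDA scoring are not shown. The function names and the choice to average the per-digit scores are illustrative assumptions, since the abstract states only that the i-vectors of corresponding digits are scored.

```python
import math


def cosine_score(u, v):
    """Cosine similarity between two equal-length vectors."""
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return sum(x * y for x, y in zip(u, v)) / (norm_u * norm_v)


def digit_level_score(enroll, test):
    """Average cosine score over the digits present in both recordings.

    enroll, test: dicts mapping a digit label (e.g. "3") to that digit's
    vector. Averaging is an assumed way of combining per-digit scores.
    """
    shared = sorted(set(enroll) & set(test))
    if not shared:
        raise ValueError("enrollment and test share no digits")
    return sum(cosine_score(enroll[d], test[d]) for d in shared) / len(shared)


# Toy two-dimensional vectors standing in for real i-vectors.
enroll_vectors = {"3": [1.0, 0.0], "7": [0.0, 1.0]}
matching_test = {"3": [2.0, 0.0], "7": [0.0, 0.5]}    # same directions
mismatched_test = {"3": [0.0, 1.0], "7": [1.0, 0.0]}  # orthogonal directions

match_score = digit_level_score(enroll_vectors, matching_test)
mismatch_score = digit_level_score(enroll_vectors, mismatched_test)
```

Because cosine similarity ignores vector length, the matching example scores 1 even though the test vectors are scaled differently, while the orthogonal example scores 0. A real system would compare the score against a threshold chosen on development data.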

Parallel Keywords

Speaker Verification; Forced Alignment; Dynamic Time Warping; i-vector; PLDA
