Speaker verification (voiceprint verification) is an important form of biometric verification. Its greatest advantage is its simple hardware requirement: an ordinary, commercially available microphone is sufficient. For this reason it is widely used for biometric verification over telephones, for example in call centers, and on mobile phones. The goal of this thesis is to build a text-dependent speaker verification system, and we compare three approaches: (1) a dynamic time warping (DTW) speaker verification system, which segments the spoken digits with forced alignment and then uses DTW to measure the difference between the mel-frequency cepstral coefficients (MFCCs) of each digit at enrollment and the MFCCs of the same digit at testing; (2) a sentence-level speaker verification system, which extracts one i-vector directly from the enrollment recording and one from the test recording and scores this pair with cosine similarity or probabilistic linear discriminant analysis (PLDA); and (3) a digit-level speaker verification system, which segments the digits with forced alignment, extracts an i-vector for each digit in the enrollment and test recordings, and scores the i-vectors of corresponding digits with cosine similarity or PLDA.
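The scoring steps named above can be illustrated with a minimal pure-Python sketch. It assumes the digits have already been segmented by forced alignment and that MFCC frames and i-vectors are available as plain lists of floats. The function names, the Euclidean frame distance, the normalisation of the DTW cost by `n + m`, and the averaging of per-digit scores are illustrative assumptions, not the thesis's implementation. PLDA scoring is omitted because it needs model parameters trained on background data.

```python
import math
from typing import Dict, List, Sequence

Frame = Sequence[float]  # one MFCC vector for a single frame


def euclidean(a: Frame, b: Frame) -> float:
    """Local cost between two MFCC frames (Euclidean distance, an assumption)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(enroll: List[Frame], test: List[Frame]) -> float:
    """DTW between the MFCC frames of one enrolled digit and one test digit.

    Allowed steps are (i-1, j), (i, j-1) and (i-1, j-1). The accumulated cost
    is divided by n + m (a hypothetical choice) so that digits of different
    durations give roughly comparable scores. Lower means more similar.
    """
    n, m = len(enroll), len(test)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of enroll[:i] with test[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = euclidean(enroll[i - 1], test[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m] / (n + m)


def cosine_score(w_enroll: Sequence[float], w_test: Sequence[float]) -> float:
    """Cosine similarity between two non-zero i-vectors. Higher means more likely the same speaker."""
    dot = sum(x * y for x, y in zip(w_enroll, w_test))
    norm = math.sqrt(sum(x * x for x in w_enroll)) * math.sqrt(sum(y * y for y in w_test))
    return dot / norm


def digit_level_score(enroll: Dict[str, Sequence[float]],
                      test: Dict[str, Sequence[float]]) -> float:
    """Hypothetical digit-level fusion: average the cosine scores of the
    i-vectors of digits that occur in both the enrollment and test utterances."""
    shared = sorted(enroll.keys() & test.keys())
    if not shared:
        raise ValueError("enrollment and test share no digit")
    return sum(cosine_score(enroll[d], test[d]) for d in shared) / len(shared)
```

For example, `dtw_distance([[0.0, 1.0], [1.0, 1.0], [2.0, 0.0]], [[0.0, 1.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]])` returns `0.0`, because the warping path absorbs the repeated first frame. The sentence-level system would call `cosine_score` once on the two whole-utterance i-vectors, while the digit-level system scores each matching digit pair separately before combining the scores. In practice, a verification decision compares the final score against a threshold tuned on development data.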