Subspace-Based Spoken Language Recognition

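This thesis proposes a novel subspace-based approach to phonotactic automatic language recognition. The method has two main parts: a feature representation of the speech signal and a subspace-based learning algorithm. The first part exploits the relationships and constraints between neighboring phones in speech. An automatic speech recognizer decodes each utterance, the likelihood of each phone in the resulting phone sequence is computed, and the features are concatenated. This yields phone frames rich in phonetic information. The extracted phone frames are assumed to lie in a low-dimensional feature subspace in which the structure of each utterance is almost completely preserved, so each utterance can be further represented as a subspace of fixed dimension. The second part measures the similarity, or distance, between two utterances (that is, between two subspaces) with non-Euclidean metrics. It then processes the features with distance-based or kernel-based discriminant analysis and finally classifies them with a back-end classifier such as the k-nearest neighbor (KNN) classifier. Experiments on the OGI-TS and NIST LRE 2005 databases show that the proposed method outperforms the vector space modeling (VSM) based method in equal error rate (EER) on both databases.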
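The feature-representation step can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation. Each input frame is assumed to already be a vector of phone log-likelihoods produced by an ASR decoder (not shown). The hypothetical helper `stack_context` concatenates each frame with its neighbors as a stand-in for the feature-concatenation step. The hypothetical helper `utterance_subspace` takes the top-k eigenvectors of the uncentered scatter matrix, found by orthogonal iteration, as the utterance's fixed-dimensional subspace. The abstract does not say whether the thesis centers the data or uses a different decomposition.

```python
import math
import random


def stack_context(frames, left=1, right=1):
    """Concatenate each frame with its neighbours (assumed form of feature concatenation).

    frames: list of equal-length lists, e.g. per-frame phone log-likelihood vectors
    (assumed input). Edge frames are padded by repeating the first/last frame.
    """
    n = len(frames)
    stacked = []
    for t in range(n):
        row = []
        for offset in range(-left, right + 1):
            idx = min(max(t + offset, 0), n - 1)  # clamp to valid frame indices
            row.extend(frames[idx])
        stacked.append(row)
    return stacked


def _orthonormalize(vectors):
    """Modified Gram-Schmidt: orthonormal vectors spanning the same space."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            proj = sum(x * y for x, y in zip(w, b))
            w = [x - proj * y for x, y in zip(w, b)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm > 1e-12:  # drop (numerically) dependent vectors
            basis.append([x / norm for x in w])
    return basis


def utterance_subspace(frames, k, iters=200, seed=0):
    """Hypothetical helper: orthonormal basis (list of k vectors) of the
    k-dimensional principal subspace of the frames, i.e. the top-k eigenvectors
    of the uncentred scatter matrix sum_t f_t f_t^T, via orthogonal iteration."""
    d = len(frames[0])
    scatter = [[sum(f[i] * f[j] for f in frames) for j in range(d)] for i in range(d)]
    rng = random.Random(seed)
    q = _orthonormalize([[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(k)])
    for _ in range(iters):
        # Multiply every basis vector by the scatter matrix, then re-orthonormalize.
        z = [[sum(scatter[i][j] * v[j] for j in range(d)) for i in range(d)] for v in q]
        q = _orthonormalize(z)
    return q
```

Every utterance yields a k-dimensional basis regardless of its length, so utterances of different durations become directly comparable objects. The pure-Python loops are for clarity only; a real system would use a linear-algebra library.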

Keywords

Language recognition; subspace-based learning

Parallel Abstract

This thesis presents a novel subspace-based approach to phonotactic language recognition. The framework has two parts: speech feature representation and subspace-based learning algorithms. First, an automatic speech recognizer decodes each utterance, and phone likelihood computation and feature concatenation then extract the phonetic information and contextual relationships in the utterance more fully. The extracted phone frames are assumed to reside in a lower-dimensional eigen-subspace that approximately captures the structure of the data, so each utterance can be represented by a fixed-dimensional linear subspace. Second, suitable non-Euclidean metrics are explored to measure the similarity between two utterances. These metrics are applied to linear discriminant analysis through two mechanisms, distance-based and kernel-based learning, followed by a back-end classifier such as the k-nearest neighbor (KNN) classifier. Experiments on the OGI-TS and NIST LRE 2005 databases show that the proposed framework outperforms the well-known vector space modeling (VSM) based method in equal error rate (EER).
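The matching step can be illustrated with one widely used non-Euclidean choice, the projection metric on the Grassmann manifold. For orthonormal bases U1 and U2 of two k-dimensional subspaces, d^2 = k - ||U1^T U2||_F^2. The quantity ||U1^T U2||_F^2 also serves as a valid (positive semidefinite) kernel, because it equals the Frobenius inner product of the two projection matrices. The abstract does not name the metrics the thesis adopts, so this is an assumed example. `projection_kernel`, `projection_distance`, and `knn_predict` are hypothetical names, and the bases are assumed to be orthonormal and of equal dimension. For brevity, the sketch skips the discriminant-analysis stage and applies KNN directly to the subspace distances.

```python
import math
from collections import Counter


def projection_kernel(u1, u2):
    """||U1^T U2||_F^2 for orthonormal bases given as lists of basis vectors.
    Equals the sum of squared cosines of the principal angles between the subspaces."""
    return sum(sum(a * b for a, b in zip(x, y)) ** 2 for x in u1 for y in u2)


def projection_distance(u1, u2):
    """Projection metric between two k-dimensional subspaces (assumed same k)."""
    k = len(u1)
    # Clamp tiny negative values caused by floating-point round-off.
    return math.sqrt(max(k - projection_kernel(u1, u2), 0.0))


def knn_predict(query, train, k=3):
    """train: list of (basis, language_label) pairs.

    Returns the majority label among the k nearest training subspaces. On a tied
    vote, Counter.most_common keeps insertion order, so the label whose first
    occurrence is nearest to the query wins.
    """
    nearest = sorted(train, key=lambda item: projection_distance(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

The projection metric has range [0, sqrt(k)]. It is 0 for identical subspaces and sqrt(k) for mutually orthogonal ones, which makes it a natural replacement for Euclidean distance once each utterance is a subspace rather than a vector.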

Parallel Keywords

language recognition; subspace-based learning
