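Mobile devices and online learning have come into widespread use in recent years, and speech recognition has become an important technology along with them. Recognition is needed not only for each of the world's languages but also for speech in which languages are mixed, especially where English serves as a second language. Speech that switches back and forth between languages within a single utterance is called "code-switching". Understanding such speech is difficult for computers and at times for humans as well, in part because the relevant corpora are too small to train a good recognition system. This thesis investigates how a Mandarin-English code-switched lecture corpus (Mandarin primary, English secondary) can be used to find complex features that let the system recover English segments that were originally misrecognized. Our multi-level framework considers low-level and high-level features together, including phonotactic, prosodic, and linguistic features as well as purely acoustic-phonetic features, which are used to decide which language the signal in each frame belongs to. We use a simple and exact method to find effective features for conditional random fields (CRFs), and we also explore how to make better use of cascaded features trained from the corpus. We find that adjusting the ratio of Chinese to English in the training data substantially improves the accuracy of the recovered English segments, and that this technique even outperforms methods based on deep neural networks (DNNs). The technique improves recognition performance not only in the traditional GMM-HMM recognition setting but also in the state-of-the-art hybrid CD-HMM-DNN setting.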
The rise of mobile devices and online learning brings into sharp focus the importance of speech recognition, not only for the many languages of the world but also for code-mixed speech, especially where English is the second language. Recognizing code-mixed speech, in which the speaker mixes languages within a single utterance, is a challenge for both computers and humans, not least because training data is limited. Working with a Mandarin-English code-mixed lecture corpus, where Mandarin is the host language and English the guest language, we seek complex features for recovering English segments that were misrecognized in the initial recognition pass. We propose a multi-level framework in which low-level and high-level cues are considered jointly: we use phonotactic, prosodic, and linguistic cues in addition to acoustic-phonetic cues to discriminate at the frame level between English- and Chinese-language segments. We develop a simple and exact method for conditional random field (CRF) feature induction, along with improved methods for using cascaded features derived from the training corpus. By additionally tuning the data imbalance ratio between English and Chinese, we obtain highly significant improvements over previous work in the recovery of English-language segments, with performance superior to DNN-based methods. These improvements are considerable not only with the traditional GMM-HMM recognition paradigm but also with a state-of-the-art hybrid CD-HMM-DNN recognition framework.
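
The abstract does not say how the English/Chinese data imbalance ratio is tuned. As one illustrative possibility only, the sketch below downsamples Chinese frames to a chosen number of Chinese frames per English frame while preserving frame order. The function name `rebalance`, the labels `"EN"` and `"ZH"`, and the downsampling strategy are all assumptions made for illustration, not the method used in this work.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical frame representation: (feature vector, language label).
Frame = Tuple[Sequence[float], str]

def rebalance(frames: List[Frame], zh_per_en: float, seed: int = 0) -> List[Frame]:
    """Keep every English frame and randomly keep at most
    round(zh_per_en * n_english) Chinese frames, preserving original order."""
    rng = random.Random(seed)
    en_idx = [i for i, (_, lang) in enumerate(frames) if lang == "EN"]
    zh_idx = [i for i, (_, lang) in enumerate(frames) if lang == "ZH"]
    # Never ask for more Chinese frames than are available.
    n_keep = min(len(zh_idx), round(zh_per_en * len(en_idx)))
    kept = set(en_idx) | set(rng.sample(zh_idx, n_keep))
    return [frames[i] for i in sorted(kept)]
```

In such a setup, `zh_per_en` would be treated as a hyperparameter and chosen on held-out data. A sequence model such as a CRF may call for rebalancing whole utterances rather than individual frames, since dropping frames alters the context each remaining frame sees.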