
使用跨語言聲學模型及音框層級語言識別來辨識高度不平衡雙語混合課程之整合性架構

An Integrated Framework for Recognizing Highly Imbalanced Bilingual Code-switched Lectures with Cross-language Acoustic Modeling and Frame-level Language Identification

Advisor: 李琳山 (Lin-Shan Lee)

Abstract


This thesis investigates the recognition of a common type of bilingual code-switched speech: most of the speech signal in the speaker's utterances is produced in the host language (usually the speaker's native language), but a small portion of words or phrases is produced in the guest language (usually the speaker's second language). In this situation, recognition is difficult not only because the languages switch frequently within an utterance, but also because far less data is available for the guest language, so its recognition accuracy is markedly low. This thesis proposes an integrated framework for recognizing such highly imbalanced bilingual code-switched speech. The framework includes unit merging at different levels of the acoustic model (model, state, Gaussian) for cross-language data sharing, unit recovery to reconstruct the merged acoustic models, unit-occupancy ranking to offer more flexible data sharing both across languages and within a language, and estimation of frame-level language posteriors with blurred posteriorgram features. In addition, the thesis extends these methods to the two most successful approaches today that use deep neural networks, one as a bottleneck feature extractor and one as an estimator of hidden Markov model states. All proposed methods are tested and thoroughly compared under unified conditions on a corpus recorded in a real-world scenario. Experimental results show that the proposed framework substantially improves the recognition accuracy of bilingual code-switched speech.
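The abstract does not give the exact merging criterion, so the following is only a minimal illustrative sketch, under assumed details, of what occupancy-ranked cross-language data sharing between HMM states could look like: states are ranked by accumulated occupancy, and any state below an assumed threshold is tied to its acoustically closest well-trained state, whichever language it belongs to. All names here (`tie_sparse_states`, `occupancy`, `distance`, `MIN_OCCUPANCY`) are hypothetical and not taken from the thesis.

```python
# Illustrative sketch only (NOT the thesis' exact algorithm) of
# occupancy-ranked cross-language state sharing.
import numpy as np

MIN_OCCUPANCY = 500.0  # assumed threshold on accumulated state occupancy


def tie_sparse_states(occupancy: np.ndarray, distance: np.ndarray) -> dict:
    """Tie each data-sparse HMM state to the closest well-trained state.

    occupancy: shape (N,), accumulated occupancy of every state, with the
               states of both languages pooled together.
    distance:  shape (N, N), symmetric acoustic distance between states.
    Returns {sparse_state_index: target_state_index}.
    """
    well_trained = np.where(occupancy >= MIN_OCCUPANCY)[0]
    tying = {}
    # Rank states by occupancy so the most data-starved ones are handled first.
    for s in np.argsort(occupancy):
        if occupancy[s] >= MIN_OCCUPANCY or well_trained.size == 0:
            break
        target = well_trained[np.argmin(distance[s, well_trained])]
        tying[int(s)] = int(target)
    return tying


# Toy example: 6 pooled states; the last two are guest-language states
# with very little data, so they get tied to the nearest rich state.
occ = np.array([9000.0, 4000.0, 2500.0, 800.0, 120.0, 60.0])
dist = np.abs(np.subtract.outer(np.arange(6.0), np.arange(6.0)))
print(tie_sparse_states(occ, dist))  # -> {5: 3, 4: 3} with these toy numbers
```

In the toy run, the two data-starved states both borrow from the nearest well-trained state; in a real system the distances would be computed from the trained acoustic models rather than made up as above.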

Parallel Abstract (English)


This thesis considers the recognition of a widely observed type of bilingual code-switched speech: the speaker speaks primarily the host language (usually his native language), but inserts a few words or phrases in the guest language (usually his second language) into many utterances. In this case, not only are the languages switched back and forth within an utterance, which makes language identification difficult, but much less data is available for the guest language, which results in poor recognition accuracy for the guest-language parts. In this thesis, we propose an integrated overall framework for recognizing such highly imbalanced code-switched speech. This includes unit merging at three levels of acoustic modeling (triphone models, HMM states, and Gaussians) for cross-lingual data sharing; unit recovery for reconstructing the identities of the units of the two languages after merging; unit occupancy ranking to offer much more flexible data sharing between units, both across and within languages, based on the accumulated occupancy of the HMM states; and estimation of frame-level language posteriors using Blurred Posteriorgram Features (BPFs) to be used in decoding. In addition, we evaluated two approaches extending the above HMM-based methods to state-of-the-art deep neural networks (DNNs): using DNN-derived bottleneck features in HMM/GMM systems, and modeling context-dependent HMM states with DNNs. We present a complete set of experimental results comparing all approaches involved, for a real-world application scenario under unified conditions, and show that the proposed approaches achieve substantial improvement.
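As a rough illustration of the frame-level language identification step, the sketch below shows one possible way, with assumed details rather than the thesis' exact formulation of Blurred Posteriorgram Features, to turn a phone posteriorgram into per-frame language posteriors: the posteriorgram is smoothed ("blurred") over a window of neighbouring frames, and the posterior mass of each language's acoustic units is then pooled frame by frame. The function names and the `host_units` / `guest_units` index sets are hypothetical.

```python
# Hedged sketch: frame-level language posteriors from a phone posteriorgram.
import numpy as np


def blur_posteriorgram(post: np.ndarray, context: int = 5) -> np.ndarray:
    """Average each frame's posterior vector over +/- `context` frames."""
    T = post.shape[0]
    blurred = np.empty_like(post)
    for t in range(T):
        lo, hi = max(0, t - context), min(T, t + context + 1)
        blurred[t] = post[lo:hi].mean(axis=0)
    return blurred


def frame_language_posteriors(post, host_units, guest_units, context=5):
    """Per-frame [P(host), P(guest)] obtained by pooling blurred unit posteriors."""
    blurred = blur_posteriorgram(post, context)
    host = blurred[:, host_units].sum(axis=1)
    guest = blurred[:, guest_units].sum(axis=1)
    total = host + guest + 1e-10  # guard against all-zero frames
    return np.stack([host / total, guest / total], axis=1)


# Toy posteriorgram: 4 frames over 3 units (units 0-1 host, unit 2 guest).
post = np.array([[0.7, 0.2, 0.1],
                 [0.6, 0.3, 0.1],
                 [0.1, 0.1, 0.8],
                 [0.2, 0.1, 0.7]])
print(frame_language_posteriors(post, host_units=[0, 1], guest_units=[2], context=1))
```

Per-frame language posteriors of this kind could then be passed to the decoder to bias it toward the more likely language at each frame, which is the role the abstract assigns to the frame-level language posteriors.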

