Character-Level Linguistic Context Feature Extraction for Speech Synthesis

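High-quality linguistic context information is a key component of speech synthesis. Traditionally, this information is obtained through natural language processing (NLP), that is, by using a parser to analyze the text. However, parsers are difficult to design and cannot easily be tailored to speech synthesis. We therefore aim to build an end-to-end speech synthesis system that takes characters directly as its processing unit. Following this idea, we use character-level word2vec and a recurrent neural network to convert the input character sequence directly into hidden feature vectors, which serve as the linguistic context information for speech synthesis. Finally, we test this idea on a mixed Chinese-English speech synthesis system; the experimental results show that the proposed approach indeed performs better than the traditional parser-based approach.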

Keywords

Speech synthesis; linguistic context information; word vectors; recurrent neural network language model

Parallel Abstract

High-quality linguistic features are key to the success of speech synthesis. Traditional linguistic feature extraction methods usually rely on a word-level natural language processing (NLP) parser. Because a good parser requires a lot of feature engineering to build, it is usually a general-purpose one and is often not specifically designed for speech synthesis. To avoid these difficulties, we propose to replace the conventional NLP parser with a character embedding and a character-level recurrent neural network language model (RNNLM) module that directly converts input character sequences, character by character, into latent linguistic feature vectors. Experimental results on a Chinese-English speech synthesis system showed that the proposed approach achieved performance comparable to that of traditional NLP parser-based methods.

Parallel Keywords

Speech Synthesis; Linguistic Features; Word2vec; RNNLM
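
The character-by-character conversion described in the abstract can be illustrated with a minimal sketch. It is not the authors' model: the weights are random and untrained, a plain Elman-style recurrence stands in for the RNNLM, and the class name `CharRNNEncoder`, the helper functions, the dimensions, and the sample sentence are all hypothetical. It only shows the data flow: each character is looked up in an embedding table, and a recurrent layer turns the sequence into one latent feature vector per character.

```python
import math
import random


def build_vocab(text):
    """Map each distinct character to an integer id."""
    return {ch: i for i, ch in enumerate(sorted(set(text)))}


def init_matrix(rows, cols, rng):
    """Small random weights; a trained model would learn these instead."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class CharRNNEncoder:
    """Untrained Elman-style RNN over character embeddings (illustrative only)."""

    def __init__(self, vocab, embed_dim=8, hidden_dim=16, seed=0):
        rng = random.Random(seed)
        self.vocab = vocab
        self.hidden_dim = hidden_dim
        self.embed = init_matrix(len(vocab), embed_dim, rng)    # one row per character
        self.w_in = init_matrix(hidden_dim, embed_dim, rng)     # embedding -> hidden
        self.w_rec = init_matrix(hidden_dim, hidden_dim, rng)   # hidden -> hidden

    def encode(self, text):
        """Return one hidden-state vector per input character."""
        hidden = [0.0] * self.hidden_dim
        states = []
        for ch in text:
            x = self.embed[self.vocab[ch]]
            from_input = matvec(self.w_in, x)
            from_past = matvec(self.w_rec, hidden)
            hidden = [math.tanh(a + b) for a, b in zip(from_input, from_past)]
            states.append(hidden)
        return states


# Mixed Chinese-English input is handled uniformly: every character is a token.
text = "我想去Taipei 101"
encoder = CharRNNEncoder(build_vocab(text))
features = encoder.encode(text)
```

Here `features` holds one vector per character of `text`. In a system like the one described, vectors of this kind would be passed to the synthesizer in place of parser-derived features, after the embedding and recurrent weights have been trained.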

