
Character-Level Linguistic Features Extraction for Text-to-Speech System

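Abstract

Good linguistic context information is a key component of speech synthesis. Traditionally, this information relies on natural language processing (NLP), that is, on a parser that analyzes the text. However, parsers are difficult to design and cannot easily be tailored to speech synthesis. We therefore aim to build an end-to-end speech synthesis system that uses the character as its processing unit. Following this idea, we use character-level word2vec and a recurrent neural network to convert the input character sequence directly into hidden feature vectors that serve as the linguistic context information for speech synthesis. Finally, we test the idea on a mixed Chinese-English speech synthesis system. The experimental results indicate that the proposed method does perform better than the traditional parser-based approach.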

Parallel Abstract


High-quality linguistic features are the key to the success of speech synthesis. Traditional linguistic feature extraction methods usually rely on a word-level natural language processing (NLP) parser. Since a good parser requires a lot of feature engineering to build, it is usually a general-purpose one and often not specially designed for speech synthesis. To avoid these difficulties, we propose to replace the conventional NLP parser with a character embedding and a character-level recurrent neural network language model (RNNLM) module that directly converts input character sequences, character by character, into latent linguistic feature vectors. Experimental results on a Chinese-English speech synthesis system showed that the proposed approach achieved performance comparable to traditional NLP parser-based methods.
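
The idea of turning a character sequence into one latent feature vector per character can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `CharRNNEncoder`, the dimensions, and the sample sentence are all hypothetical, a plain tanh recurrent cell stands in for whatever recurrent architecture the paper uses, and the weights are random rather than trained. In the described system the embedding table would come from character-level word2vec and the recurrent network would be trained as a language model.

```python
import numpy as np


def build_vocab(texts):
    """Map every distinct character (Chinese or English) to an integer id."""
    chars = sorted({ch for t in texts for ch in t})
    return {ch: i for i, ch in enumerate(chars)}


class CharRNNEncoder:
    """Hypothetical character embedding + simple tanh RNN (untrained)."""

    def __init__(self, vocab_size, embed_dim=8, hidden_dim=16, seed=0):
        rng = np.random.default_rng(seed)
        # Embedding table: one row per character. Random here; assumed to be
        # replaced by character-level word2vec vectors in a real system.
        self.E = rng.normal(0.0, 0.1, (vocab_size, embed_dim))
        self.W_xh = rng.normal(0.0, 0.1, (embed_dim, hidden_dim))
        self.W_hh = rng.normal(0.0, 0.1, (hidden_dim, hidden_dim))
        self.b_h = np.zeros(hidden_dim)

    def encode(self, char_ids):
        """Return one hidden vector per input character, in order."""
        h = np.zeros(self.W_hh.shape[0])
        states = []
        for idx in char_ids:
            x = self.E[idx]  # embedding lookup for this character
            h = np.tanh(x @ self.W_xh + h @ self.W_hh + self.b_h)
            states.append(h)
        return np.stack(states)


# Mixed Chinese-English input, processed character by character.
text = "我想聽 music"
vocab = build_vocab([text])
encoder = CharRNNEncoder(len(vocab))
features = encoder.encode([vocab[ch] for ch in text])
print(features.shape)  # one 16-dimensional vector per character
```

Each row of `features` plays the role of the linguistic feature vector for one character; no parser, word segmentation, or part-of-speech tagging is involved at any step, which is the point of the character-level design.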

Keywords

Speech Synthesis, Linguistic Features, Word2vec, RNNLM

