

English Lexical Stress Detection and Sentence-Based Intonation Assessment based on Contour Shape Description

Advisor: J.-S. Roger Jang (張智星)


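Lexical stress in English words and sentence intonation are both important to communication, and both are closely tied to the speaker's emotion, attitude, and intended meaning. This thesis has two parts: English lexical stress detection and sentence intonation scoring. For lexical stress detection, the thesis uses energy-related features, pitch-related features, duration, and contour shape features, all extracted from the vowel segment of each syllable in a word. It compares how well each single feature detects lexical stress and how different feature combinations affect detection; contour shape features have a significant effect on English lexical stress detection. It also compares syllable-number-dependent and syllable-number-independent classification methods. System performance is evaluated at the word level as the accuracy of lexical stress detection, which reaches 90.83%. For sentence intonation scoring, the thesis treats intonation similarity scoring as a classification problem. The features fall into two groups: intonation similarity features for the whole sentence and intonation similarity features for each word in the sentence. For whole-sentence similarity, the thesis computes the correlation coefficient between the pitch contours of the two sentences, the dynamic time warping distance, and contour-shape-related features. For word-level similarity, it likewise computes the correlation coefficient, features related to the dynamic time warping distance, and contour-shape-related features between the pitch contours of the two words. The two groups are then combined to form the similarity features. System performance is evaluated by the correlation coefficient between system scores and human scores, which reaches at most 0.35.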


Lexical stress and sentence intonation play an important role in communication; both are related to the speaker's emotion, attitude, and intended meaning. This thesis is divided into two parts: the first covers lexical stress detection and the second covers sentence-based intonation assessment. For lexical stress detection, energy-related features, pitch-related features, duration, and contour shape features are extracted. All of these features are vowel-based. The performance of the system is compared across individual features and across different combinations of features. These comparisons show that contour shape features are useful for lexical stress detection. In addition, two classification approaches are compared: training syllable-number-dependent classifiers and training syllable-number-independent classifiers. The best recognition rate of our system, measured per word, is 90.83%. Intonation assessment is treated as a classification problem, and the extracted features fall into two groups: the intonation similarity of sentences and the intonation similarity of words. The features used are the correlation coefficient, the dynamic time warping distance, and contour-shape-related features between two pitch contours. Performance is evaluated by Pearson's correlation coefficient between system scores and human scores, which reaches 0.35 in our system.
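Two of the similarity features named above, the correlation coefficient and the dynamic time warping (DTW) distance between two pitch contours, can be illustrated with a minimal sketch. The abstract does not specify the pitch unit, the DTW local cost or step pattern, or how contours of different lengths are aligned before correlation, so the choices below (absolute-difference cost, the basic three-direction step pattern, linear resampling) and all function names are assumptions for illustration, not the thesis's implementation.

```python
import math


def resample(contour, n):
    """Linearly interpolate a pitch contour to n points (n >= 2).

    Assumption: the abstract does not say how contours of different
    lengths are aligned before computing the correlation coefficient.
    """
    m = len(contour)
    if m == 1:
        return [float(contour[0])] * n
    out = []
    for i in range(n):
        pos = i * (m - 1) / (n - 1)
        lo = int(pos)
        hi = min(lo + 1, m - 1)
        frac = pos - lo
        out.append(contour[lo] * (1 - frac) + contour[hi] * frac)
    return out


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("sequences must have equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant contour")
    return cov / math.sqrt(vx * vy)


def dtw_distance(x, y):
    """DTW distance with absolute-difference local cost.

    Assumption: basic step pattern (insertion, deletion, match) with
    no path-length normalization and no warping-window constraint.
    """
    n, m = len(x), len(y)
    inf = float("inf")
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def similarity_features(reference, test):
    """Return (correlation, DTW distance) for two pitch contours."""
    aligned = resample(test, len(reference))
    return pearson(reference, aligned), dtw_distance(reference, test)
```

In such a setup, `similarity_features` could be applied once to the whole-sentence contours and once per word pair, with the resulting values concatenated into the feature vector passed to the classifier. Contour shape features are left out because the abstract does not describe how they are computed.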


