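Student: Chen, Yan-Chi (陳晏淇)
Thesis title: Optimizing the Lexicon for Recognizing Taiwanese Southern Min Speech (台灣閩南語語音字典改良)
Advisor: Ning, Li-Hsin (甯俐馨)
Degree: Master's
Department: Department of English
Year of publication: 2020
Academic year of graduation: 108 (ROC calendar)
Language: English
Number of pages: 123
Keywords (Chinese): 台灣閩南語 (Taiwanese Southern Min), 發音變異 (pronunciation variation), 多種發音 (multiple pronunciation), 新詞 (new word), 字典 (dictionary)
Keywords (English): Taiwanese Southern Min (TSM), pronunciation variation (PV), multiple pronunciation (MP), new word (NW), lexicon
DOI URL: http://doi.org/10.6345/NTNU202001044
Type: Academic thesis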
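  • Chinese abstract (English translation): This study aims to improve a Taiwanese Southern Min (TSM) pronunciation dictionary that covers everyday vocabulary. Given Taiwan's growing aging population, building a TSM speech database is becoming increasingly important for a range of future applications, such as improving speech technologies and preserving the language. However, because TSM is a low-resource language, very little TSM corpus data is currently available, so this study collected online TSM conversational data and manually segmented and annotated it. Using the collected corpus, the study examined how well an existing TSM dictionary covers the data and found TSM pronunciations and new TSM words that the dictionary does not yet include. These missing entries are compiled and classified into three categories: pronunciation variation (PV), multiple pronunciation (MP), and new word (NW). The thesis presents an in-depth analysis of the collected data and discusses the observations, focusing on generalizations drawn from them. We hope the results reflect how some TSM words are actually used in TSM conversation and help improve existing TSM speech recognition systems.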
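The coverage check described above can be illustrated with a short sketch. The exact coverage measure is not stated in this record, so the following is a minimal Python sketch under stated assumptions: the lexicon maps each segmented word to the set of pronunciations it lists (written with numeric tones, as in forms like kong2), and a corpus token counts as covered only when both its word and its annotated pronunciation appear in the lexicon. Token coverage weighs every occurrence; type coverage counts distinct (word, pronunciation) pairs. The function name `coverage` and all toy entries are hypothetical and are not taken from the thesis data.

```python
from collections import Counter

Lexicon = dict[str, set[str]]   # word -> pronunciations listed for it
Token = tuple[str, str]         # (segmented word, annotated pronunciation)


def coverage(lexicon: Lexicon, tokens: list[Token]) -> tuple[float, float, Counter]:
    """Return token-level coverage, type-level coverage, and uncovered items.

    Assumption: a token is covered only if its word is a lexicon entry
    AND its observed pronunciation is listed for that entry.
    """
    counts = Counter(tokens)
    if not counts:
        raise ValueError("empty corpus")
    # Keep only the (word, pronunciation) pairs the lexicon does not list.
    missing = Counter({tok: n for tok, n in counts.items()
                       if tok[1] not in lexicon.get(tok[0], set())})
    token_cov = 1 - sum(missing.values()) / sum(counts.values())
    type_cov = 1 - len(missing) / len(counts)
    return token_cov, type_cov, missing


# Hypothetical toy data (not from the thesis corpus); tones as digits.
TOY_LEXICON: Lexicon = {
    "台灣": {"tai5 uan5"},
    "食飯": {"tsiah8 png7"},
    "講": {"kong2"},
}
TOY_TOKENS: list[Token] = [
    ("台灣", "tai5 uan5"),
    ("食飯", "tsiah8 png7"),
    ("講", "kong2"),
    ("講", "kong2"),
    ("講", "kon2"),               # invented variant, for illustration only
    ("手機仔", "tshiu2 ki1 a2"),  # word absent from the toy lexicon
]

token_cov, type_cov, missing = coverage(TOY_LEXICON, TOY_TOKENS)
print(f"token coverage {token_cov:.2f}, type coverage {type_cov:.2f}")
for (word, pron), n in missing.most_common():
    print(word, pron, n)
```

The uncovered pairs in `missing` are the raw material for the PV/MP/NW classification; ordering them by frequency puts the gaps that affect the most tokens first.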

    This thesis aims to optimize a Taiwanese Southern Min (TSM) lexicon that accommodates everyday TSM vocabulary. Given Taiwan's growing aging population, a database of TSM words may be needed for diverse applications such as speech technologies and language preservation. However, because TSM is a low-resource language, little data is available for TSM research. To address this scarcity, this thesis prepared TSM data by collecting online conversational TSM speech, segmenting its content into words, and annotating the data manually. It then examined the word coverage of the existing TSM dictionary and found that some TSM pronunciations and words have not yet been included. The missing items were sorted into three categories according to their type of variation: pronunciation variation (PV), multiple pronunciation (MP), and new word (NW). An in-depth description of the data analysis is followed by a discussion of our observations, which focuses on generalizing the findings. We expect the findings to offer a glimpse of everyday TSM use, and we hope they will help optimize the lexicon for the Taiwanese Southern Min speech recognition (TSMSR) system currently under development and benefit future TSM-related studies.
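The abstract sorts items missing from the dictionary into PV, MP, and NW. A lexicon lookup on its own can separate only two cases: the word is absent (an NW candidate), or the word is present but the observed pronunciation is not listed (a PV or MP candidate). Telling PV from MP depends on the thesis's classification criteria and is left to the annotator in this sketch. The sketch also shows why storing a set of pronunciations per word avoids a one-to-one word/pronunciation mapping. It is a minimal illustration under assumptions: the names `Gap`, `triage`, and `add_pronunciation` are hypothetical, and "kon2" is an invented variant, not a documented TSM form.

```python
from enum import Enum
from typing import Optional

Lexicon = dict[str, set[str]]   # word -> set of accepted pronunciations


class Gap(Enum):
    """Coarse gap types that a lexicon lookup can detect by itself."""
    NEW_WORD = "NW candidate"            # word is not a lexicon entry
    PRONUNCIATION = "PV/MP candidate"    # entry exists, pronunciation missing


def triage(lexicon: Lexicon, word: str, pron: str) -> Optional[Gap]:
    """Classify one annotated token; None means the lexicon already covers it."""
    if word not in lexicon:
        return Gap.NEW_WORD
    if pron not in lexicon[word]:
        return Gap.PRONUNCIATION
    return None


def add_pronunciation(lexicon: Lexicon, word: str, pron: str) -> Lexicon:
    """Return a copy of `lexicon` with `pron` added as an alternative for `word`.

    Because each entry holds a set, one word can carry several
    pronunciations; the input lexicon is left unchanged.
    """
    updated = {w: set(prons) for w, prons in lexicon.items()}
    updated.setdefault(word, set()).add(pron)
    return updated


# Hypothetical toy entry; "kon2" is an invented variant for illustration.
lexicon: Lexicon = {"講": {"kong2"}}
print(triage(lexicon, "講", "kon2"))               # Gap.PRONUNCIATION
print(triage(lexicon, "手機仔", "tshiu2 ki1 a2"))  # Gap.NEW_WORD
lexicon = add_pronunciation(lexicon, "講", "kon2")
print(triage(lexicon, "講", "kon2"))               # None
```

Returning a new lexicon rather than mutating the old one keeps the original dictionary available for before/after coverage comparisons.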

    ACKNOWLEDGEMENTS
    CHINESE ABSTRACT
    ENGLISH ABSTRACT
    TABLE OF CONTENTS
    LIST OF TABLES
    LIST OF FIGURES
    CHAPTER 1 INTRODUCTION
        1.1 Background
        1.2 Significance
        1.3 Organization
    CHAPTER 2 LITERATURE REVIEW
        2.1 Introduction of Taiwanese Southern Min (TSM)
            2.1.1 TSM phonological system
            2.1.2 Vowels & consonants
            2.1.3 Tones
        2.2 Establishing a Dictionary
            2.2.1 Lexicalization
            2.2.2 Types of dictionaries
            2.2.3 What should be included in the dictionary
            2.2.4 Contraction
        2.3 Application of the Dictionary
            2.3.1 An overview of a TSMSR system
            2.3.2 Data fed to TSMSR
            2.3.3 The importance of TSM characters
            2.3.4 The problem of one-to-one mapping
            2.3.5 The importance of including pronunciation variation in the lexicon
    CHAPTER 3 METHODOLOGY
        3.1 Data Preparation/Description
        3.2 Phonetic Representation
        3.3 Data Processing
        3.4 Data Processing: Data/Dictionary Preparation
            3.4.1 Pronunciation variation (PV)
            3.4.2 Multiple pronunciation (MP)
            3.4.3 New Word (NW)
        3.5 Classification Criteria
    CHAPTER 4 ANALYSIS
        4.1 Pronunciation Variation (PV)
            4.1.1 Contraction
            4.1.2 Deletion of an entire syllable
            4.1.3 Initial consonant deletion
            4.1.4 /l/ Insertion
            4.1.5 Monophthongization (diphthong --> monophthong)
            4.1.6 Vowel change
        4.2 Multiple Pronunciation (MP)
            4.2.1 Vowel differences
            4.2.2 Consonantal differences
            4.2.3 Nasal variations
            4.2.4 Differences at the phrasal level
        4.3 New Word (NW)
            4.3.1 Words with non-separable word meanings
            4.3.2 Appendage & Insertion
            4.3.3 e7 appendage
            4.3.4 Oo3 insertion
            4.3.5 Synonyms
            4.3.6 V-著/V-到 phrases
            4.3.7 Non-verbal kong2
            4.3.8 Words with complex internal structures
            4.3.9 4-character Words
            4.3.10 Frequent words
            4.3.11 Duplication
    CHAPTER 5 DISCUSSION
        5.1 The Three Wordlists
            5.1.1 Pronunciation variation (PV)
            5.1.2 Multiple pronunciation (MP)
            5.1.3 New word (NW)
            5.1.4 Summary
        5.2 Contribution
            5.2.1 Lexicon
            5.2.2 Future application
            5.2.3 Limitation
    CHAPTER 6 CONCLUSION
        6.1 Result
        6.2 Prospect
    References
    Appendix 1 – Pronunciation Variations (PV)
    Appendix 2 – Multiple Pronunciations (MP)
    Appendix 3 – New Word (NW)

    Electronic full text: not available for download; public release deferred until 2025/08/18