Thesis


Term Matching and Error Correction Mechanisms for Jhuyin Input Method

Advisor: 黃光璿


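Chinese input methods currently take two forms: shape-based and phonetic-based. Shape-based examples include the Cangjie and Boshiamy input methods. Phonetic-based examples include Microsoft New Phonetic and New Chewing. To enter a word with a phonetic input method, the user must type its pronunciation completely and correctly. If an incorrect Jhuyin sequence is entered, the intended characters and words cannot be produced. Solutions to this problem have already appeared on mobile devices. When a Jhuyin input method on a mobile device receives an incorrect sequence, it displays likely characters or words for the user to choose from. Jhuyin input methods on current Windows systems, however, do not yet commonly offer this convenience. Building on the Jhuyin input method, this thesis uses query expansion to derive alternative Jhuyin sequences and computes their edit distance to the typed sequence. It then looks for valid characters and words with small edit distances. This makes it easier to produce words with a Jhuyin input method and reduces the need to retype a Jhuyin sequence after an input error.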
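The edit-distance step can be illustrated with a minimal Python sketch. It assumes plain Levenshtein distance over individual Jhuyin symbols, where each insertion, deletion, or substitution costs 1. Tone marks are treated as ordinary symbols and the first tone is left unmarked. The thesis may define or weight edits differently. The lexicon `LEXICON` and the helpers `edit_distance` and `rank_candidates` are hypothetical illustrations, not part of the original work.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two Jhuyin sequences, counting each
    inserted, deleted, or substituted symbol (tone marks included) as 1."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


# Hypothetical mini-lexicon: word -> Jhuyin sequence (first tone unmarked).
LEXICON = {
    "注音": "ㄓㄨˋㄧㄣ",
    "中文": "ㄓㄨㄥㄨㄣˊ",
    "輸入": "ㄕㄨㄖㄨˋ",
    "書": "ㄕㄨ",
}


def rank_candidates(typed: str, lexicon: dict[str, str],
                    max_distance: int = 2) -> list[tuple[int, str]]:
    """Return (distance, word) pairs within max_distance, closest first."""
    scored = [(edit_distance(typed, zy), word) for word, zy in lexicon.items()]
    return sorted(pair for pair in scored if pair[0] <= max_distance)
```

Take an input that ends in ㄥ instead of ㄣ. `rank_candidates("ㄓㄨˋㄧㄥ", LEXICON)` returns `[(1, "注音")]`, because one substitution recovers 注音. The other entries are more than two edits away.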


Chinese input methods currently fall into two types. Shape-based methods include Cangjie and Boshiamy. Phonetic-based methods include Microsoft New Phonetic and New Chewing. To enter a word or phrase with a phonetic-based method, the user must type the correct key sequence for its pronunciation. If the sequence is wrong, the intended words cannot be produced. Partial solutions have already appeared on mobile devices: when an invalid sequence is entered, candidate characters or phrases are displayed for the user to choose from. However, this user-friendly approach is not yet common in phonetic input methods on current Windows systems. In this thesis, we build on the Jhuyin input method and use query expansion to generate sequences similar to the input. We then compute their edit distances to the input sequence to find valid words with small edit distances. This reduces the need to re-enter a sequence after an input error.
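Query expansion could be sketched as follows. This sketch makes a hypothetical assumption about what expansion means. It generates every sequence reachable from the input by up to `max_edits` single-symbol deletions, insertions, or substitutions, then keeps only the sequences found in a lexicon indexed by Jhuyin. This is a breadth-first version of a common spelling-correction technique for generating candidates. It is not necessarily the expansion rule used in the thesis. `ZHUYIN`, `INDEX`, `edits1`, and `expand_query` are illustrative names. Every edit costs 1, so the breadth-first level at which a sequence is first reached equals its Levenshtein distance from the input. As a result, candidates come out already ranked.

```python
# Hypothetical Jhuyin alphabet: the 37 standard symbols (U+3105 ㄅ through
# U+3129 ㄩ) plus the tone marks ˊ ˇ ˋ ˙ (first tone is left unmarked).
ZHUYIN = [chr(c) for c in range(0x3105, 0x312A)] + ["ˊ", "ˇ", "ˋ", "˙"]

# Hypothetical lexicon indexed by Jhuyin sequence; homophones share a key.
INDEX = {
    "ㄓㄨˋㄧㄣ": ["注音"],
    "ㄕㄨㄖㄨˋ": ["輸入"],
    "ㄕㄨ": ["書", "輸"],
}


def edits1(seq: str) -> set[str]:
    """All sequences one deletion, insertion, or substitution away from seq."""
    splits = [(seq[:i], seq[i:]) for i in range(len(seq) + 1)]
    deletes = {left + right[1:] for left, right in splits if right}
    substitutes = {left + s + right[1:]
                   for left, right in splits if right for s in ZHUYIN}
    inserts = {left + s + right for left, right in splits for s in ZHUYIN}
    return deletes | substitutes | inserts


def expand_query(typed: str, index: dict[str, list[str]],
                 max_edits: int = 2) -> list[tuple[int, str]]:
    """Breadth-first expansion; each word is reported with the smallest
    number of edits needed to reach one of its Jhuyin sequences."""
    found: dict[str, int] = {}
    seen = {typed}
    frontier = {typed}
    for distance in range(max_edits + 1):
        # Record lexicon hits at this level; earlier (smaller) levels win.
        for seq in frontier:
            for word in index.get(seq, []):
                found.setdefault(word, distance)
        if distance == max_edits:
            break
        # Next level: unseen sequences exactly one more edit away.
        next_frontier: set[str] = set()
        for seq in frontier:
            next_frontier |= edits1(seq) - seen
        seen |= next_frontier
        frontier = next_frontier
    return sorted((d, w) for w, d in found.items())
```

Suppose ㄕㄨㄖㄨ is typed without the fourth-tone mark. Then `expand_query("ㄕㄨㄖㄨ", INDEX)` returns `[(1, "輸入"), (2, "書"), (2, "輸")]`. Each additional edit multiplies the candidate set by roughly the alphabet size times the sequence length. A small `max_edits` therefore keeps the expansion cheap.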


