
Combining Mutual Information and Entropy for Unknown Word Extraction from Multilingual Code-Switching Sentences

Abstract


In multilingual environments, a single statement may include content from more than one language, a phenomenon known as code-switching. Among speakers of Mandarin Chinese, code-switching occurs frequently in daily life, and this mixing of languages poses serious challenges for language processing. This paper collects text corpora containing Mandarin-English and Mandarin-Taiwanese code-switching, in which Mandarin is the dominant language. Mutual information and entropy then serve as the basis for an algorithm that identifies unknown words in multilingual texts, which are then automatically referenced for multilingual inclusions. Experimental results show that the proposed method effectively filters unrelated new words, thus improving the accuracy of unknown word extraction.
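
The abstract does not spell out the algorithm, so the following is only a minimal sketch of how mutual information and entropy are commonly combined for this kind of task, not the authors' method. It assumes pre-tokenized sentences, scores each adjacent token pair by pointwise mutual information (internal cohesion) and by the entropy of its left and right neighbours (boundary freedom), and keeps pairs that pass both thresholds. The function names, the bigram-only scope, and the threshold values are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(counter):
    """Shannon entropy (bits) of the distribution given by a Counter."""
    total = sum(counter.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counter.values())


def score_bigrams(sentences):
    """Return {bigram: (pmi, left_entropy, right_entropy)}.

    `sentences` is a list of token lists (tokenization is assumed done).
    """
    unigrams = Counter()
    bigrams = Counter()
    left_ctx = defaultdict(Counter)   # tokens seen just before each bigram
    right_ctx = defaultdict(Counter)  # tokens seen just after each bigram

    for tokens in sentences:
        unigrams.update(tokens)
        for i in range(len(tokens) - 1):
            pair = (tokens[i], tokens[i + 1])
            bigrams[pair] += 1
            if i > 0:
                left_ctx[pair][tokens[i - 1]] += 1
            if i + 2 < len(tokens):
                right_ctx[pair][tokens[i + 2]] += 1

    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    scores = {}
    for pair, count in bigrams.items():
        p_xy = count / n_bi
        p_x = unigrams[pair[0]] / n_uni
        p_y = unigrams[pair[1]] / n_uni
        pmi = math.log2(p_xy / (p_x * p_y))
        scores[pair] = (pmi, entropy(left_ctx[pair]), entropy(right_ctx[pair]))
    return scores


def extract_candidates(sentences, min_pmi=1.0, min_entropy=0.5):
    """Keep bigrams that are cohesive (high PMI) and occur in varied
    contexts on both sides (high left and right entropy).
    Thresholds are illustrative assumptions, not values from the paper."""
    scores = score_bigrams(sentences)
    return sorted(
        pair
        for pair, (pmi, h_left, h_right) in scores.items()
        if pmi >= min_pmi and min(h_left, h_right) >= min_entropy
    )


if __name__ == "__main__":
    # Toy Mandarin-English code-switching data, invented for illustration.
    toy = [
        ["我", "想", "喝", "bubble", "tea", "了"],
        ["他", "買", "bubble", "tea", "給", "我"],
        ["你", "喝", "bubble", "tea", "嗎"],
    ]
    print(extract_candidates(toy))
```

In this toy data the pair ("喝", "bubble") has the same mutual information as ("bubble", "tea") but is always followed by "tea", so its right entropy is zero and it is filtered out. That is the intuition behind using entropy to discard fragments that mutual information alone would accept.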
