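Near-synonym substitution is an interesting problem that is frequently discussed in natural language processing research on information retrieval (IR) and computer-assisted language learning. Choosing the correct near-synonym for a given position in a sentence often confuses second-language learners, because many words that are similar or identical in meaning can differ greatly in how they are conventionally used. Earlier studies have proposed a variety of methods for near-synonym choice, but these were developed mainly on English text, and the same methods may yield different results when applied to Chinese text. This study applies independent component analysis (ICA) to Chinese near-synonym substitution. The results show that, compared with the methods proposed in earlier studies, namely pointwise mutual information (PMI), the 5-gram language model, and the vector space model (VSM), ICA achieves higher near-synonym substitution accuracy.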
Near-synonym sets are groups of words with similar meanings, and they are useful knowledge resources for many natural language applications, such as query expansion for information retrieval (IR) and computer-assisted language learning. However, near-synonyms are not necessarily interchangeable in context, because each has its own usage and syntactic constraints. Previous studies have developed various methods for near-synonym choice in English sentences. To the best of our knowledge, no such evaluation has been conducted on Chinese sentences. Therefore, this paper proposes the use of independent component analysis (ICA) for Chinese near-synonym choice evaluation. Experimental results show that ICA achieves higher accuracy than pointwise mutual information (PMI), the 5-gram language model, and the vector space model (VSM), which have been used in previous studies.
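To make the near-synonym choice task concrete, the following is a minimal sketch of how a PMI-style baseline might pick a near-synonym for a gap in a sentence. It is not the authors' implementation: the function names, the sentence-level co-occurrence window, the zero score for unseen pairs, and the toy English corpus are all assumptions made for illustration. The candidate with the highest summed PMI against the context words is selected.

```python
import math
from collections import Counter
from itertools import combinations


def train_counts(sentences):
    """Count, per sentence, which words occur and which word pairs co-occur."""
    word_counts = Counter()
    pair_counts = Counter()
    for tokens in sentences:
        unique = set(tokens)
        word_counts.update(unique)
        pair_counts.update(frozenset(p) for p in combinations(sorted(unique), 2))
    return word_counts, pair_counts, len(sentences)


def pmi(w1, w2, word_counts, pair_counts, n):
    """PMI(w1, w2) = log2( P(w1, w2) / (P(w1) * P(w2)) ), estimated over n sentences."""
    joint = pair_counts[frozenset((w1, w2))]
    if joint == 0:
        return 0.0  # assumption: pairs never seen together contribute nothing
    return math.log2(joint * n / (word_counts[w1] * word_counts[w2]))


def choose_near_synonym(context, candidates, word_counts, pair_counts, n):
    """Return the candidate whose summed PMI with the context words is highest."""
    def score(candidate):
        return sum(pmi(candidate, w, word_counts, pair_counts, n) for w in context)
    return max(candidates, key=score)


# Hypothetical toy corpus; a real experiment would use a large segmented corpus.
corpus = [
    ["strong", "tea"],
    ["strong", "coffee"],
    ["strong", "tea", "cup"],
    ["powerful", "computer"],
    ["powerful", "engine"],
    ["powerful", "computer", "fast"],
]
counts = train_counts(corpus)
print(choose_near_synonym(["tea"], ["strong", "powerful"], *counts))
print(choose_near_synonym(["computer"], ["strong", "powerful"], *counts))
```

For Chinese text, the sentences would first need to be word-segmented; the sketch assumes tokenized input and leaves segmentation out of scope.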