A Study on Applying Independent Component Analysis to Near-Synonym Substitution

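Near-synonym substitution is a frequently discussed problem in natural language processing research on information retrieval (IR) and computer-assisted language learning. Choosing the correct near-synonym for a given position in a sentence often confuses second-language learners, because words with similar or identical meanings can still differ greatly in how they are conventionally used. Earlier studies proposed a variety of methods for near-synonym choice, but they were evaluated mainly on English text, and the same methods may produce different results when applied to Chinese text. This study applies independent component analysis (ICA) to Chinese near-synonym substitution. In a comparison with three methods from earlier studies, namely pointwise mutual information (PMI), a 5-gram language model, and the vector space model (VSM), ICA achieves higher near-synonym substitution accuracy.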

Keywords

Synonym; independent component analysis; information retrieval

Parallel Abstract

Near-synonym sets represent groups of words with similar meanings and are useful knowledge resources for many natural language applications, such as query expansion for information retrieval (IR) and computer-assisted language learning. However, near-synonyms are not necessarily interchangeable in context because of their specific usage and syntactic constraints. Previous studies have developed various methods for near-synonym choice in English sentences. To the best of our knowledge, there has been no such evaluation on Chinese sentences. Therefore, this paper proposes the use of independent component analysis (ICA) for Chinese near-synonym choice evaluation. Experimental results show that ICA achieves higher accuracy than pointwise mutual information (PMI), a 5-gram language model, and the vector space model (VSM), which have been used in previous studies.

Parallel Keywords

Synonym; independent component analysis; information retrieval
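For readers unfamiliar with the near-synonym choice task, the following is a minimal sketch of a PMI-style baseline of the kind the abstract names. It is not the paper's ICA method and not the authors' implementation. It assumes sentence-level co-occurrence counts, a score equal to the sum of PMI values between a candidate and each context word, and a score of zero for unseen pairs. The function names and the tiny English corpus are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def train_counts(sentences):
    """Count, per sentence, which words and which word pairs occur."""
    word_counts = Counter()
    pair_counts = Counter()
    for tokens in sentences:
        unique = set(tokens)
        word_counts.update(unique)
        pair_counts.update(
            frozenset(pair) for pair in combinations(sorted(unique), 2)
        )
    return word_counts, pair_counts, len(sentences)


def pmi(x, y, word_counts, pair_counts, n):
    """PMI(x, y) = log2( P(x, y) / (P(x) * P(y)) ), estimated per sentence."""
    joint = pair_counts[frozenset((x, y))]
    if joint == 0 or word_counts[x] == 0 or word_counts[y] == 0:
        return 0.0  # assumption: unseen pairs contribute nothing to the score
    return math.log2(joint * n / (word_counts[x] * word_counts[y]))


def choose_near_synonym(context, candidates, word_counts, pair_counts, n):
    """Pick the candidate with the highest summed PMI against the context."""
    def score(candidate):
        return sum(pmi(candidate, w, word_counts, pair_counts, n) for w in context)
    return max(candidates, key=score)


# Hypothetical toy corpus (illustrative only, not data from the paper).
corpus = [
    ["she", "drinks", "strong", "tea"],
    ["he", "likes", "strong", "coffee"],
    ["a", "powerful", "computer"],
    ["the", "powerful", "engine"],
    ["strong", "tea", "is", "bitter"],
]
word_counts, pair_counts, n = train_counts(corpus)

# Fill the gap in "she drinks ___ tea" from a near-synonym set.
best = choose_near_synonym(
    ["she", "drinks", "tea"], ["strong", "powerful"], word_counts, pair_counts, n
)
print(best)
```

In this sketch the gap is filled by whichever candidate co-occurs most strongly with the surrounding words, which is the general idea behind co-occurrence baselines for near-synonym choice. A real evaluation would use a large corpus and a held-out test set.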

Cited By

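黃政華 (2017). Developing an adaptive Chinese near-synonym lexicon for word-of-mouth classification [Master's thesis, Chung Yuan Christian University]. Airiti Library. https://doi.org/10.6840/cycu201700784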
