Similarity Based Chinese Synonym Collocation Extraction

Collocation extraction systems based on purely statistical methods suffer from two major problems: relatively low precision and recall, and difficulty in handling sparse (low-frequency) collocations. To improve performance, both statistical and lexicographic approaches should be considered. This paper presents a new method that extracts synonymous collocations using semantic information, which is obtained by computing word similarities based on HowNet. With this method we extracted synonymous collocations that normally cannot be extracted using lexical statistics alone. Our evaluation on a 60MB tagged corpus shows that the method can extract synonymous collocations that occur with very low frequency, and that the improvement in recall is close to 100%. In addition, compared with a collocation extraction system based on Xtract, a system developed for English, our algorithm improves precision by about 44%.
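The abstract describes the method only at a high level. As a rough illustration of the general idea, and not the authors' actual algorithm, the sketch below assumes that base collocations have already been found by a statistical extractor (for example an Xtract-style system) and expands each one by substituting words judged semantically similar. Real HowNet-based similarity is replaced by a small hand-made table (`SIMILARITY`). The threshold of 0.8, the `min_count` cutoff, the function names, and all corpus counts are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]

# Hypothetical similarity scores in [0, 1], standing in for HowNet-based
# word similarity. These numbers are invented for illustration only.
SIMILARITY: Dict[Pair, float] = {
    ("提高", "提升"): 0.92,  # raise / promote
    ("提高", "增強"): 0.85,  # raise / strengthen
    ("提高", "降低"): 0.30,  # raise / lower (antonym, low similarity)
    ("水平", "水準"): 0.95,  # level / standard
}


def similarity(a: str, b: str) -> float:
    """Symmetric lookup in the toy table; identical words score 1.0."""
    if a == b:
        return 1.0
    return SIMILARITY.get((a, b), SIMILARITY.get((b, a), 0.0))


def synonyms(word: str, vocabulary: Set[str], threshold: float) -> List[str]:
    """Other vocabulary words whose similarity to `word` reaches the threshold."""
    return sorted(w for w in vocabulary
                  if w != word and similarity(word, w) >= threshold)


def extract_synonymous_collocations(base: List[Pair],
                                    corpus_counts: Dict[Pair, int],
                                    threshold: float = 0.8,
                                    min_count: int = 1) -> Set[Pair]:
    """Substitute similar words into each base collocation, one side at a time,
    and keep candidates seen at least `min_count` times in the corpus, even if
    they are too rare for a purely statistical test to accept."""
    vocabulary = {w for pair in corpus_counts for w in pair}
    found: Set[Pair] = set()
    for head, collocate in base:
        candidates = [(s, collocate) for s in synonyms(head, vocabulary, threshold)]
        candidates += [(head, s) for s in synonyms(collocate, vocabulary, threshold)]
        for cand in candidates:
            if cand not in base and corpus_counts.get(cand, 0) >= min_count:
                found.add(cand)
    return found


# Hypothetical input: one base collocation and toy co-occurrence counts.
BASE = [("提高", "水平")]
CORPUS_COUNTS: Dict[Pair, int] = {
    ("提高", "水平"): 120,
    ("提升", "水平"): 2,
    ("降低", "水平"): 15,
    ("提高", "水準"): 1,
    ("增強", "體質"): 5,
}

result = extract_synonymous_collocations(BASE, CORPUS_COUNTS)
print(sorted(result))
```

With these toy counts, (降低, 水平) is rejected despite occurring 15 times because 降低 is not similar to 提高, and (增強, 水平) is rejected because it never occurs. The rare pairs (提升, 水平) and (提高, 水準) are accepted. This shows how semantic similarity, rather than frequency, can license low-frequency collocations.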

Keywords

Lexical Statistics; Synonymous Collocations; Similarity; Semantic Information

Cited By

Hong, J. F. (2010). 詞義預測研究：以語料庫驅動的語言學研究方法 [A study of word sense prediction: A corpus-driven linguistic approach; doctoral dissertation, National Taiwan University]. Airiti Library. https://doi.org/10.6342/NTU.2010.02757
