A New Experience in Persian Text Clustering using FarsNet Ontology

Organizing large text corpora through clustering plays a key role in making massive amounts of text data easy to navigate and browse, particularly in search engines. Conventional clustering techniques compare documents by the surface similarity of their words or extracted morphemes, which usually yields clusters that are not semantically coherent. This paper addresses Farsi, also known as Persian, given that the amount of electronic Farsi text is growing rapidly. The documents are enriched with semantic relationships (synonymy, hypernymy and hyponymy) extracted from the FarsNet lexical ontology, and a word sense disambiguation (WSD) procedure is proposed to decrease the uncertainty in choosing among word senses. After preprocessing, three clustering algorithms, namely Bisecting K-means, latent semantic indexing (LSI) based clustering and probabilistic LSI (PLSI) based clustering, are applied to the pre-categorized Persian Hamshahri corpus. Experimental results show that clustering quality improves when the text data are enriched with semantic relations, especially with the PLSI-based approach.

Keywords

text clustering; word sense disambiguation; semantic analysis; FarsNet lexical ontology; probabilistic latent semantic indexing
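The abstract does not spell out the WSD procedure or how the FarsNet relations are weighted when documents are enriched. The sketch below only illustrates the general idea under stated assumptions: `TOY_ONTOLOGY` is a hypothetical, hand-made stand-in for FarsNet (English placeholder words, not FarsNet's real data, format or API), the WSD step is a simplified Lesk overlap heuristic swapped in for illustration, and the weight of 0.5 for added related terms is an arbitrary choice.

```python
from collections import Counter

# HYPOTHETICAL stand-in for FarsNet: not its real data, format, or API.
# Each surface word maps to candidate senses; each sense lists related terms.
TOY_ONTOLOGY = {
    "bank": [
        {"id": "bank.finance", "gloss": ["money", "loan", "deposit"],
         "synonyms": ["depository"], "hypernyms": ["institution"], "hyponyms": ["savings_bank"]},
        {"id": "bank.river", "gloss": ["river", "water", "shore"],
         "synonyms": ["riverside"], "hypernyms": ["slope"], "hyponyms": []},
    ],
    "loan": [
        {"id": "loan.n", "gloss": ["money", "bank", "repay"],
         "synonyms": ["credit"], "hypernyms": ["debt"], "hyponyms": ["mortgage"]},
    ],
}

# The three relation types named in the abstract.
RELATIONS = ("synonyms", "hypernyms", "hyponyms")


def disambiguate(word, context, ontology):
    """Simplified Lesk (stand-in WSD): choose the sense whose gloss, synonyms and
    hypernyms share the most words with the surrounding context."""
    senses = ontology.get(word, [])
    if not senses:
        return None
    ctx = set(context) - {word}

    def overlap(sense):
        signature = set(sense["gloss"]) | set(sense["synonyms"]) | set(sense["hypernyms"])
        return len(signature & ctx)

    return max(senses, key=overlap)  # on ties, max() keeps the first listed sense


def enrich(tokens, ontology, weight=0.5):
    """Bag of words: each token adds 1.0; each related term of its chosen sense
    adds `weight` (an assumed value), once per occurrence of the token."""
    bag = Counter()
    for tok in tokens:
        bag[tok] += 1.0
        sense = disambiguate(tok, tokens, ontology)
        if sense is None:
            continue  # word not covered by the ontology
        for rel in RELATIONS:
            for term in sense[rel]:
                bag[term] += weight
    return bag


if __name__ == "__main__":
    print(enrich(["bank", "loan", "money"], TOY_ONTOLOGY))
```

In this example the financial sense of "bank" is selected because "loan" and "money" appear in its gloss, so "institution" and "depository" are added to the document while the river-related terms are not. Without the WSD step, related terms from every sense would be added, which is the kind of noise the abstract's WSD procedure is meant to reduce.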
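Bisecting K-means, the first of the three algorithms named in the abstract, starts with all documents in one cluster and repeatedly splits a cluster in two with 2-means until k clusters exist. The minimal sketch below assumes choices the abstract does not state: documents are sparse `{term: weight}` dicts (raw or ontology-enriched counts), vectors are length-normalized and compared by cosine similarity, the largest cluster is always the one split, and the best of several random 2-means trials is kept by total similarity of members to their own centroid. All function names are hypothetical.

```python
import math
import random


def normalize(vec):
    """Scale a sparse {term: weight} vector to unit length (cosine geometry)."""
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else dict(vec)


def dot(a, b):
    """Sparse dot product; iterates over the shorter vector."""
    if len(a) > len(b):
        a, b = b, a
    return sum(v * b.get(t, 0.0) for t, v in a.items())


def centroid(vectors):
    """Arithmetic mean of sparse vectors."""
    c = {}
    for vec in vectors:
        for t, v in vec.items():
            c[t] = c.get(t, 0.0) + v / len(vectors)
    return c


def two_means(vectors, rng, iters=10):
    """Split unit vectors into two groups by spherical 2-means (assign by dot product)."""
    cents = [vectors[i] for i in rng.sample(range(len(vectors)), 2)]
    groups = [[], []]
    for _ in range(iters):
        groups = [[], []]
        for idx, vec in enumerate(vectors):
            groups[0 if dot(vec, cents[0]) >= dot(vec, cents[1]) else 1].append(idx)
        if not groups[0] or not groups[1]:
            break  # degenerate split; the caller discards it
        cents = [centroid([vectors[i] for i in g]) for g in groups]
    return groups


def bisecting_kmeans(docs, k, trials=5, seed=0):
    """Return up to k clusters (lists of document indices) by bisecting the largest one."""
    rng = random.Random(seed)
    vectors = [normalize(d) for d in docs]
    clusters = [list(range(len(vectors)))]
    while len(clusters) < k:
        clusters.sort(key=len)
        target = clusters.pop()  # largest cluster
        best_split, best_score = None, float("-inf")
        if len(target) >= 2:
            for _ in range(trials):
                groups = two_means([vectors[i] for i in target], rng)
                if not groups[0] or not groups[1]:
                    continue
                # Score a split by total similarity of members to their own centroid.
                score = 0.0
                for g in groups:
                    c = centroid([vectors[target[i]] for i in g])
                    score += sum(dot(vectors[target[i]], c) for i in g)
                if score > best_score:
                    best_split = [[target[i] for i in g] for g in groups]
                    best_score = score
        if best_split is None:
            clusters.append(target)
            break  # nothing left that can be split
        clusters.extend(best_split)
    return clusters


if __name__ == "__main__":
    docs = [{"bank": 1, "loan": 1, "money": 1}, {"bank": 1, "money": 2, "credit": 1},
            {"river": 1, "water": 1, "shore": 1}, {"river": 2, "water": 1}]
    print(bisecting_kmeans(docs, k=2))
```

Splitting the largest cluster is only one common selection rule; splitting the cluster with the lowest internal similarity is another, and either could be substituted in the `clusters.sort` line.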
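PLSI models each document as a mixture of latent topics, P(w|d) = Σ_z P(z|d) P(w|z), and fits the parameters with expectation maximization (EM). The abstract does not describe how the PLSI-based clustering assigns documents to clusters; the sketch below assumes the simple rule of assigning each document to its most probable topic, argmax_z P(z|d), and uses plain (not tempered) EM with a seeded random initialization. Function names and defaults are hypothetical.

```python
import math
import random


def _normalized(values):
    """Rescale non-negative values to sum to 1 (uniform if they are all zero)."""
    total = sum(values)
    if total == 0:
        return [1.0 / len(values)] * len(values)
    return [v / total for v in values]


def plsi(docs, n_topics, iters=50, seed=0):
    """Fit PLSI with EM. docs: list of {word: count} dicts.
    Returns (p_z_d, p_w_z, log_likelihoods), where log_likelihoods[i] is the
    log-likelihood (up to the constant P(d) term) before update i."""
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    # Random strictly positive initialization, each distribution summing to 1.
    p_z_d = [_normalized([rng.random() + 1e-3 for _ in range(n_topics)]) for _ in docs]
    p_w_z = [dict(zip(vocab, _normalized([rng.random() + 1e-3 for _ in vocab])))
             for _ in range(n_topics)]

    log_likelihoods = []
    for _ in range(iters):
        exp_w_z = [dict.fromkeys(vocab, 0.0) for _ in range(n_topics)]
        exp_z_d = [[0.0] * n_topics for _ in docs]
        ll = 0.0
        for d, doc in enumerate(docs):
            for w, n in doc.items():
                # E-step: P(z | d, w) is proportional to P(z|d) * P(w|z).
                joint = [p_z_d[d][z] * p_w_z[z][w] for z in range(n_topics)]
                total = sum(joint)
                ll += n * math.log(total)
                for z in range(n_topics):
                    resp = n * joint[z] / total  # expected count of (d, w) under topic z
                    exp_w_z[z][w] += resp
                    exp_z_d[d][z] += resp
        log_likelihoods.append(ll)
        # M-step: renormalize expected counts into P(w|z) and P(z|d).
        for z in range(n_topics):
            probs = _normalized([exp_w_z[z][w] for w in vocab])
            p_w_z[z] = dict(zip(vocab, probs))
        p_z_d = [_normalized(row) for row in exp_z_d]
    return p_z_d, p_w_z, log_likelihoods


def plsi_clusters(p_z_d):
    """Hard-assign each document to its most probable topic (assumed clustering rule)."""
    return [max(range(len(row)), key=row.__getitem__) for row in p_z_d]


if __name__ == "__main__":
    docs = [{"a": 2, "b": 1}, {"a": 4, "b": 2}, {"c": 1, "d": 3}, {"c": 2, "d": 6}]
    p_z_d, p_w_z, ll = plsi(docs, n_topics=2, iters=30)
    print(plsi_clusters(p_z_d))
```

EM guarantees that the log-likelihood never decreases from one iteration to the next, which is a useful sanity check on any implementation, but it may converge to a local optimum; running several seeds and keeping the fit with the highest final log-likelihood is a common safeguard.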
