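A Study on the Automatic Analysis and Organization of Semantic Relations among Concept-Explanation Sentences of Multiple Domain-Specific Terms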

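This thesis uses e-books as its content source and automatically extracts, clusters, and organizes the concept-explanation sentences of two domain-specific terms. Traditional approaches build feature vectors from word frequencies and therefore ignore implicit semantic relations. To overcome this drawback, we compute the semantic similarity between every word that appears in a sentence and a set of selected feature words, use these values to build an MI (mutual information) feature vector for each sentence, and then cluster the sentences. From the clustering results, we select labels that represent each cluster's concept, use these labels to reorganize the concept hierarchy, and pick comparative sentences from each cluster that represent the two terms.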
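The abstract describes scoring each sentence against a set of feature words through mutual information rather than word counts. The following is a minimal sketch of that idea, not the thesis's exact formula, under these assumptions: mutual information is taken as pointwise mutual information (PMI) over sentence-level co-occurrence, a pair that never co-occurs scores 0, and each vector component is the maximum PMI between any word of the sentence and that feature word. The helper names (`cooccurrence_counts`, `pmi`, `mi_vector`), the toy corpus, and the feature words are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_counts(sentences):
    """Count how many sentences contain each word and each unordered word pair."""
    word_df = Counter()
    pair_df = Counter()
    for tokens in sentences:
        vocab = sorted(set(tokens))
        word_df.update(vocab)
        # combinations() over a sorted list yields pairs (a, b) with a < b
        pair_df.update(combinations(vocab, 2))
    return word_df, pair_df


def pmi(a, b, word_df, pair_df, n):
    """PMI log(P(a,b) / (P(a) P(b))) over n sentences.

    Returns 0.0 when the two words never share a sentence (assumed smoothing).
    """
    joint = word_df[a] if a == b else pair_df[(min(a, b), max(a, b))]
    if joint == 0:
        return 0.0
    return math.log(joint * n / (word_df[a] * word_df[b]))


def mi_vector(tokens, feature_words, word_df, pair_df, n):
    """One component per feature word: the strongest PMI between any word of
    the sentence and that feature word (max aggregation is an assumption)."""
    return [
        max((pmi(w, f, word_df, pair_df, n) for w in set(tokens)), default=0.0)
        for f in feature_words
    ]


# Toy corpus of tokenized sentences (hypothetical)
corpus = [
    ["tcp", "reliable", "connection"],
    ["udp", "unreliable", "datagram"],
    ["tcp", "connection", "handshake"],
    ["udp", "datagram", "fast"],
]
word_df, pair_df = cooccurrence_counts(corpus)
features = ["tcp", "udp"]  # hypothetical feature words
vectors = [mi_vector(s, features, word_df, pair_df, len(corpus)) for s in corpus]
```

Because each component measures association rather than presence, two sentences with no words in common can still receive similar vectors when their words co-occur with the same feature words, which is exactly what a term-frequency vector cannot capture.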

Keywords

Data Mining; Information Retrieval; Sentence Clustering; Automatic Summarization

Abstract (English)

In this thesis, we use PDF textbooks as the data source and focus on comparing the concept-explanation sentences of two domain-specific terms. We first calculate the mutual information between every word in a sentence and each selected feature word to build an MI vector space model. This vector space model is used to measure the similarity between two sentences for a hierarchical clustering algorithm. After clustering, we choose representative labels and a comparative sentence pair for every cluster. Based on the representative labels, clusters that share the same label are grouped into a new concept hierarchy.
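The English abstract outlines the steps that follow vectorization: hierarchical clustering by sentence similarity, a representative label and comparative sentence pair per cluster, and grouping of clusters by label. A minimal sketch of those steps follows. The thesis does not state its linkage criterion, similarity measure, stopping rule, or selection rules, so these are assumptions: cosine similarity, average linkage with a fixed similarity threshold, the label being the feature word with the highest mean weight in the cluster, and the comparative pair being the most similar pair in which one sentence mentions each term. All function names, the threshold, and the toy vectors are hypothetical.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def agglomerative(vectors, threshold):
    """Average-linkage agglomerative clustering: repeatedly merge the two most
    similar clusters until the best average similarity falls below threshold."""
    clusters = [[i] for i in range(len(vectors))]

    def link(a, b):
        sims = [cosine(vectors[i], vectors[j]) for i in a for j in b]
        return sum(sims) / len(sims)

    while len(clusters) > 1:
        best, pair = float("-inf"), None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                s = link(clusters[x], clusters[y])
                if s > best:
                    best, pair = s, (x, y)
        if best < threshold:
            break
        x, y = pair
        clusters[x] += clusters[y]
        del clusters[y]  # y > x, so index x is unaffected
    return clusters


def cluster_label(cluster, vectors, feature_words):
    """Label = feature word with the highest mean weight inside the cluster."""
    means = [sum(vectors[i][d] for i in cluster) / len(cluster)
             for d in range(len(feature_words))]
    return feature_words[max(range(len(means)), key=means.__getitem__)]


def comparative_pair(cluster, sentences, vectors, term_a, term_b):
    """Most similar (i, j) where sentence i mentions term_a and j mentions term_b."""
    candidates = [(cosine(vectors[i], vectors[j]), i, j)
                  for i in cluster if term_a in sentences[i]
                  for j in cluster if term_b in sentences[j] and i != j]
    return max(candidates)[1:] if candidates else None


def group_by_label(clusters, labels):
    """Clusters sharing a label become children of one concept node."""
    hierarchy = defaultdict(list)
    for cluster, label in zip(clusters, labels):
        hierarchy[label].append(cluster)
    return dict(hierarchy)


# Toy data (hypothetical): MI vectors over feature words ["connection", "speed"]
features = ["connection", "speed"]
sentences = [["tcp", "connection"], ["udp", "connection"],
             ["tcp", "speed"], ["udp", "speed"]]
vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]

clusters = agglomerative(vectors, threshold=0.5)
labels = [cluster_label(c, vectors, features) for c in clusters]
pairs = [comparative_pair(c, sentences, vectors, "tcp", "udp") for c in clusters]
hierarchy = group_by_label(clusters, labels)
```

Raising the threshold stops merging earlier and yields more, tighter clusters; the label step is what allows separate clusters that land on the same feature word to be collected under one concept node.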

Keywords (English)

Data Mining; Information Retrieval; Sentence Clustering; Automatic Summarization
