Hierarchical Classification of Keywords Using Content and Social Information

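The objective of this thesis is to build an unsupervised hierarchical classification system that assigns a given proper noun to an appropriate category in the classification structure of an ontology. Unlike other approaches, our method uses a term association model built by combining content and social information, which achieves stronger classification performance than an association model built from a single source of information. In addition, we exploit the hierarchical structure of the ontology to propose a novel path-based classification method. We compute the similarity between proper nouns and categories using three term association models: (1) a content model based on mutual information, (2) a static social model based on shared social attributes, and (3) a dynamic social model based on a social network and the PageRank algorithm. The hierarchical classification algorithm uses the ontology structure and the term association models to predict the category of a proper noun. Experiments on the ACM Computing Classification System show that both the proposed classification algorithm and the term association model that combines content and social information effectively improve the accuracy of proper noun classification.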

Keywords

ontology; hierarchical classification; similarity measure; taxonomic similarity

Parallel Abstract

The objective of this thesis is to develop an unsupervised hierarchical classification system in which a given proper noun is classified into an appropriate category of a designated ontology. Unlike other approaches, our methods exploit both content and social information to show that combining weaker similarity measures can produce a stronger one. To take the hierarchical information into account, we also propose a novel path-based classification strategy. In our work, similarities between proper nouns and categories are captured using three different models: a content-based model using pointwise mutual information, a static social model based on social similarity, and a dynamic social model that applies the PageRank algorithm to a social network. Our hierarchical classification algorithms exploit both the ontology structure and the similarity measures to identify the category of a given proper noun. The experimental results on the ACM Computing Classification System show that our proposed classification algorithm, when used with the combined similarity measure, significantly improves the effectiveness of proper noun classification.
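The ideas named in the abstract can be illustrated with a minimal sketch. It is not the thesis's actual algorithm: the weighted-sum combination, the mean-of-path scoring rule, the function names, the weights, and every similarity value below are assumptions made for illustration. Only the category names come from the 1998 ACM Computing Classification System, and `pmi` follows the standard definition of pointwise mutual information.

```python
import math

def pmi(count_xy, count_x, count_y, total):
    """Pointwise mutual information, log(p(x,y) / (p(x) p(y))), from raw counts.
    Returns 0.0 when any count is zero (an assumed convention)."""
    if count_xy == 0 or count_x == 0 or count_y == 0:
        return 0.0
    return math.log((count_xy * total) / (count_x * count_y))

def combine(scores, weights):
    """Weighted sum of per-model similarity scores (the weights are hypothetical)."""
    return sum(weights[model] * score for model, score in scores.items())

def best_path(tree, root, sim):
    """Score every root-to-leaf path by the mean similarity of its categories
    (an assumed scoring rule) and return the highest-scoring path."""
    best, best_score = None, float("-inf")
    stack = [(root, [root])]
    while stack:
        node, path = stack.pop()
        children = tree.get(node, [])
        if not children:
            # The root is a placeholder, so it is excluded from the mean.
            score = sum(sim[c] for c in path[1:]) / max(len(path) - 1, 1)
            if score > best_score:
                best, best_score = path, score
        for child in children:
            stack.append((child, path + [child]))
    return best, best_score

# A two-level fragment of the category tree.
tree = {
    "root": ["Software", "Information Systems"],
    "Software": ["Programming Languages", "Operating Systems"],
    "Information Systems": ["Database Management",
                            "Information Storage and Retrieval"],
}

# Made-up similarity scores between one proper noun and each category.
content = {"Software": 0.2, "Information Systems": 0.6,
           "Programming Languages": 0.3, "Operating Systems": 0.1,
           "Database Management": 0.8,
           "Information Storage and Retrieval": 0.4}
social = {"Software": 0.4, "Information Systems": 0.4,
          "Programming Languages": 0.1, "Operating Systems": 0.1,
          "Database Management": 0.6,
          "Information Storage and Retrieval": 0.6}
weights = {"content": 0.5, "social": 0.5}

sim = {c: combine({"content": content[c], "social": social[c]}, weights)
       for c in content}
path, score = best_path(tree, "root", sim)
print(path, score)
```

Scoring whole paths rather than picking the single best category at each level lets a strong match deeper in the hierarchy compensate for a weaker match near the root, which is one plausible reading of "path-based" classification.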

Parallel Keywords

ontology; hierarchical classification; similarity measure; taxonomic similarity

