  • Thesis


Hierarchical Catalog Integration based on the Maximum Entropy Model

Advisor: 楊正仁


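With the rapid growth of online information and the demand for fast integration and exchange, much of the information on the World Wide Web (WWW) is presented to users through catalogs. Consequently, how to integrate Web catalogs accurately has become an important research topic in recent years. For catalog integration, past studies have examined both flattened and hierarchical catalogs, discussing how to use the implicit catalog information of the source catalog to improve classification performance. However, in our review of the current literature, we have not yet seen an approach that uses information from an external semantic corpus to improve performance on hierarchically structured catalogs. In this thesis, we therefore explore how to use information from an external semantic corpus together with the hierarchical relationships of the catalogs to further improve integration performance. In the experiments, we implement our approach with the Maximum Entropy model and test it on real Web catalogs. Experimental results, evaluated alongside Support Vector Machines, show that using external semantic information yields better integration performance than using the catalog hierarchy relationships alone.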


In many areas, information on the Web is organized in catalogs, and the need to integrate two catalogs arises in many applications. These catalogs usually contain a large number of Web documents and have complicated hierarchical structures. Therefore, how to integrate two catalogs accurately has become an important research topic. For the catalog integration problem, past studies have mainly focused on flattened catalogs, and only a few papers discuss the integration of hierarchical catalogs. To the best of our knowledge, no research has discussed how additional semantic information can improve hierarchical catalog integration. This thesis presents an enhancement based on the Maximum Entropy (ME) model that uses the hierarchical thesaurus information embedded in the catalogs and additional semantic features expanded from an external corpus. Experimental results on real-world catalogs indicate that the proposed approach consistently improves the integration performance.


