
基於支持向量機與輔助資訊之目錄整合機制改良

Enhanced Catalog Integration Based on Support Vector Machines with Auxiliary Information

Advisor: 楊正仁

Abstract


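With the rapid development of the Internet, the number of Web pages online today is enormous. Many portal sites therefore provide catalogs to help users find information. A portal can offer richer services by integrating the Web resources of another portal. E-commerce sites can likewise use catalog integration to give online suppliers and buyers a more convenient trading environment. When two catalogs are integrated, however, inconsistencies in how the two catalogs use information complicate the integration work. Earlier research on catalog integration showed that auxiliary information from the source catalog can effectively improve the performance of probabilistic models. Later researchers who combined such auxiliary information with support vector machines (SVM), however, obtained only limited improvement. In this thesis, we propose a new catalog integration approach to improve integration accuracy. We use support vector machines, a machine learning method, together with extracted auxiliary information to integrate catalogs. We examine how to make effective use of auxiliary information, such as source catalog information, hypertext tags, and synonym information in the catalogs, to strengthen catalog integration. In the experiments, we use two well-known directory sites, Yahoo! and Google, as the data sets. The experimental results show that SVM achieves excellent accuracy in catalog integration, and that auxiliary information gives SVM a consistent and clear benefit in this task.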

Parallel Abstract


As the Internet has developed rapidly, the number of online Web pages has become very large. Many Web portals provide catalogs to facilitate information search. A Web portal can integrate other Web portals' catalogs to provide richer services. B2B electronic marketplaces bring together many online suppliers and buyers through e-catalog integration and provide a more convenient online trading environment. Catalog integration has thus become an important task in e-catalog management. However, when two catalogs are integrated, the information inconsistencies between them complicate the integration work and pose a central challenge. In this research, we present a new integration mechanism to enhance the accuracy of catalog integration. We use support vector machines (SVM) and auxiliary information to integrate the documents of one catalog into another catalog. The information from the source catalog, the destination catalog, and the hypertext weights is important auxiliary information in catalog integration. Synonyms extracted from a thesaurus are another source of auxiliary information. In the experiments, categories from two well-known Web catalogs, Yahoo! and Google, are chosen as the data sets. The experimental results show that SVM performs very well in catalog integration. The experimental results also show that the auxiliary information consistently improves the accuracy of SVM.
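
The abstract does not give the feature construction itself, so the following is only a minimal sketch of how the three kinds of auxiliary information it names (source catalog information, hypertext weights, and synonyms) might be folded into a single feature vector before SVM training. The tag weights, the synonym table, the `source_weight` parameter, and the function name `build_features` are all hypothetical and chosen for illustration; they are not taken from the thesis.

```python
from collections import Counter

# Hypothetical hypertext weights: terms in titles and headings count more
# than terms in body text. These values are illustrative only.
TAG_WEIGHTS = {"title": 3.0, "h1": 2.0, "body": 1.0}

# Hypothetical thesaurus that maps synonyms to one canonical term.
SYNONYMS = {"automobile": "car", "auto": "car"}


def build_features(tagged_terms, source_category, source_weight=2.0):
    """Return a sparse feature dict for one document.

    tagged_terms: list of (tag, term) pairs extracted from the page.
    source_category: the document's category label in the source catalog.
    """
    features = Counter()
    for tag, term in tagged_terms:
        # Merge synonyms so that related terms share one feature.
        canonical = SYNONYMS.get(term.lower(), term.lower())
        # Weight each occurrence by the hypertext tag it appeared in.
        features["term=" + canonical] += TAG_WEIGHTS.get(tag, 1.0)
    # Auxiliary information: the source-catalog label as an extra feature.
    features["src=" + source_category] += source_weight
    return dict(features)


doc = [("title", "Automobile"), ("body", "car"), ("body", "engine")]
features = build_features(doc, "Autos")
print(features)
```

Feature dicts of this shape could then be vectorized and passed to any SVM implementation that assigns documents to destination-catalog categories; that training step is omitted here.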

