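With the rapid development of the Internet, the number of Web pages available today is enormous. Many Web portals therefore provide catalogs to help users find information. A Web portal can offer richer services by integrating the Web page resources of another portal. E-commerce sites can likewise use catalog integration to give online suppliers and buyers a more convenient online trading environment. When two catalogs are integrated, however, inconsistencies in how the two catalogs use information complicate the integration work. Previous research on catalog integration has shown that relevant auxiliary information from the source catalog can effectively improve the performance of probabilistic models in catalog integration. Later researchers who combined such auxiliary information with support vector machines (SVM), however, obtained only limited improvement. In this paper, we propose a new catalog integration approach to improve the accuracy of catalog integration. We use support vector machines, a machine learning method, together with extracted auxiliary information to perform catalog integration. We investigate how to make effective use of source catalog information, hypertext tags, and synonym information in the catalogs as auxiliary information to strengthen catalog integration. In the experiments, we use two well-known directory sites, Yahoo! and Google, as data sets. The experimental results show that SVM achieves excellent accuracy in catalog integration, and that auxiliary information provides a consistent and clear benefit to SVM in catalog integration.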
As the Internet has developed rapidly, the number of online Web pages has become very large. Many Web portals provide catalogs to facilitate information search. A Web portal can integrate the catalogs of other Web portals to provide richer services. B2B electronic marketplaces bring together many online suppliers and buyers through e-catalog integration and provide a more convenient online trading environment. Catalog integration has thus become an important task in e-catalog management. However, when two catalogs are integrated, information inconsistencies between them complicate the integration work and are the main challenge of catalog integration. In this research, we present a new integration mechanism to improve the accuracy of catalog integration. We use support vector machines (SVM) and auxiliary information to integrate the documents of one catalog into another catalog. Information from the source catalog, information from the destination catalog, and hypertext weights are the important auxiliary information in catalog integration. Synonyms extracted from a thesaurus are another source of auxiliary information. In the experiments, categories from two well-known Web catalogs, Yahoo! and Google, are chosen as the data sets. The experimental results show that SVM performs very well in catalog integration in terms of accuracy. The results also show that the auxiliary information consistently improves the accuracy of SVM.
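
To make the role of auxiliary information more concrete, the following is a minimal sketch of how a document's term vector might be augmented with hypertext weights and source-catalog information before it is passed to a classifier. It is an illustration under stated assumptions, not the mechanism evaluated in this work: the function name `build_features`, the tag weights, and the source-label weight are all hypothetical, and the SVM training step and synonym expansion are omitted.

```python
from collections import Counter

# Hypothetical weights per HTML tag; the abstract does not give actual values.
TAG_WEIGHTS = {"title": 3.0, "h1": 2.0, "body": 1.0}
# Assumed boost for terms taken from the document's source-catalog category.
SOURCE_LABEL_WEIGHT = 2.0


def build_features(tagged_text, source_category,
                   tag_weights=TAG_WEIGHTS,
                   label_weight=SOURCE_LABEL_WEIGHT):
    """Return a sparse term -> weight mapping for one document.

    tagged_text: list of (tag, text) pairs extracted from the Web page.
    source_category: name of the document's category in the source catalog.
    """
    features = Counter()
    # Hypertext weights: a term counts more when it appears in a heavier tag.
    for tag, text in tagged_text:
        weight = tag_weights.get(tag, 1.0)
        for term in text.lower().split():
            features[term] += weight
    # Source-catalog information: add the category label's terms as features.
    for term in source_category.lower().split():
        features[term] += label_weight
    return dict(features)


doc = [("title", "Digital Cameras"), ("body", "compare digital camera prices")]
vec = build_features(doc, "Photography Equipment")
```

In this sketch, vectors built this way for the destination catalog's documents would serve as training data for an SVM classifier, and vectors for the source catalog's documents would then be classified into destination categories.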