Using Relevance Feedback Information to Improve the Performance of Text Classification

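With the rapid growth of the Internet, the information explosion has made an ever larger volume of information accessible. Information retrieval systems play an important role in obtaining information, and text classification is a key issue for improving retrieval quality and meeting users' information needs. This study proposes a method that extracts information from relevance feedback to build a user profile, and then uses that profile to perform feature selection and term re-weighting on documents. The method rests on two ideas: (1) The user profile represents the user's positive and negative interests, and each document keeps only the dimensions that belong to the user profile, which reduces noise during text classification. (2) Terms that appear in the user profile or in important positions of a document are given extra weight to increase the difference between the features of relevant and non-relevant documents; this document feature enhancement applies term sensitivity supplemented by semi-structured information. Experimental results confirm that the proposed method effectively captures relevance feedback information, helps improve text classification accuracy, and cuts execution time by at least half.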

Keywords

Re-weighting; User profile; Feature selection; Text classification; Relevance feedback

Parallel Abstract

With the rapid development of the Internet, the information explosion offers access to an ever increasing amount of information. Information retrieval systems play an important role in the process of obtaining information. To improve retrieval quality and provide information in line with users' needs, "text classification" is an important issue. This study proposes an approach that extracts information from relevance feedback to construct a user profile, which is then used for feature selection and term re-weighting of documents. The approach consists of two concepts: (1) The user profile represents the positive and negative interests of the user, and documents preserve only the features belonging to the user profile, which reduces noise interference in text classification. (2) Terms appearing in the user profile or in important positions in a document are given extra weight to increase the difference between the features of relevant and non-relevant documents. This feature enhancement of documents is an application of term sensitivity aided by semi-structured information. The experimental results show that the proposed approach extracts relevance feedback information effectively: it not only improves the accuracy of text classification but also reduces processing time by at least half.

Parallel Keywords

Feature selection; Re-weighting; User profile; Relevance feedback; Text classification
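
The two ideas in the abstract (keeping only user-profile dimensions, then boosting terms in important positions) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and not the authors' implementation: the whitespace tokenizer, the `min_df` rule for admitting a term into the profile, the choice of the title as the "important position", and the `title_boost` factor are all hypothetical. The abstract does not say how term sensitivity is computed, so the sketch only shows a position-based boost.

```python
from collections import Counter


def tokenize(text):
    # Lowercased whitespace split; a real system would use a proper tokenizer.
    return text.lower().split()


def build_profile(relevant_docs, non_relevant_docs, min_df=2):
    """Collect profile terms from positive and negative feedback documents.

    A term enters the profile if it occurs in at least min_df documents of
    either feedback set. This selection rule is a hypothetical stand-in.
    """
    profile = set()
    for docs in (relevant_docs, non_relevant_docs):
        df = Counter()
        for doc in docs:
            df.update(set(tokenize(doc)))  # count each term once per document
        profile |= {term for term, n in df.items() if n >= min_df}
    return profile


def reweight(title, body, profile, title_boost=2.0):
    """Keep only profile terms, then boost those that also occur in the title."""
    title_terms = set(tokenize(title))
    tf = Counter(t for t in tokenize(title) + tokenize(body) if t in profile)
    return {t: n * (title_boost if t in title_terms else 1.0) for t, n in tf.items()}


# Toy feedback data (made up for this example).
relevant = ["svm text classification", "text classification with feedback"]
non_relevant = ["football match report", "football league table"]

profile = build_profile(relevant, non_relevant)
vec = reweight(
    "Text classification survey",
    "a survey of text methods and football analogies",
    profile,
)
print(sorted(profile))
print(vec)
```

Because terms outside the profile are dropped before classification, the resulting vectors have far fewer dimensions, which is consistent with the reduction in processing time reported in the abstract.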
