Content-Based Context Relationship Analysis of Multiple Documents: A Case Study of Product-Related Document Analysis

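When information seekers search the Internet for documents, search engines usually rank the retrieved documents by their relevance to the query or by how often other users have clicked on them. The ranking of matching documents therefore does not consider the context relationship among documents (that is, it ignores the order in which document contents refer to or build on one another). As a result, information seekers cannot read the documents in a reasonable sequence that proceeds from basic to advanced material, and they may spend more time understanding the contents or encounter comprehension difficulties while reading. To address this problem, this research first collects various types of documents from the Internet through search engines, classifies the collected documents, and extracts the feature points of each document; it then generalizes the distinguishing characteristics of each document category from the extraction results. Based on these analysis results, this research develops a "Document Context Relationship Analysis" methodology consisting of three main stages: "Document Characteristic Extraction," "Document Category Determination," and "Document Context Ordering." The "Document Characteristic Extraction" stage extracts feature points from the content of each document retrieved by the search engine. The "Document Category Determination" stage then uses the extraction results, together with the generalized distinguishing characteristics of each category, to determine the category to which each target document belongs. Finally, the "Document Context Ordering" stage sorts the documents of each category into a reading sequence from basic to advanced and presents the ordering visually, showing the context relationship among the documents so that readers can conveniently choose which of the retrieved documents to read. With this approach, information seekers who have retrieved the documents they need can obtain a reasonable ordering from a large set of documents and read them in sequence from basic to advanced, reducing the time spent on understanding the contents and resolving difficulties. The method can in turn provide recommended reading content for different audiences, as well as suggested context relationships for the learning process.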

Keywords

Document Context Relationship; Document Category Determination; Reading Content Recommendation

Parallel Abstract

When one searches for documents on the Internet using keywords, the ranks of the related documents are determined by their correlation with the specified keywords and their click rates. That is, the context relationship between the related documents is not used to determine the rank. As a result, readers have to spend more time understanding the document contents or face difficulties in understanding the documents. To solve these problems, this research analyzes a large number of documents and generalizes the relationship between document characteristics and document categories. On the basis of the analysis results, this research develops a model for context relationship analysis of multiple documents. With the proposed model, the characteristics and categories of documents can be identified by using determinant vectors. Finally, the documents can be sorted and their context relationship can be displayed visually for reading. As a whole, the research can help readers obtain a reasonable, visualized ranking of documents and read the documents in an appropriate sequence.

Parallel Keywords

Document Context Relationship; Classification; Reading Recommendation

