Using Bibliographic Information to Improve a Document-Topic-Based Personalized Recommender System

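As information technology has advanced, the Internet has grown rapidly and changed the way information is disseminated, allowing data to be transmitted quickly. While users enjoy this convenience, however, they also face the problem of information overload. Amid this explosion of knowledge, people seeking information find it hard to locate the right material quickly. Recommender systems (RS) therefore offer a proactive service that helps users find useful information accurately and quickly when confronted with large volumes of data. In the current era of information excess, how knowledge workers can find suitable papers among the vast number of scholarly articles has become a topic worth studying (Xu et al., 2012). This study performs article recommendation on the basis of contextual relevance, using two main methods. The first is a topic-based recommendation method, which avoids overlooking important information because of differences in wording and, most importantly, fits the knowledge-state extension problem that we wish to investigate. The second is a citation analysis method, which recommends articles mainly by exploiting the relatedness of attributes between articles. For data collection, we prepared two types of experimental data. The first type consists of the thesis reference lists of second-year master's students in a graduate institute of information management; it requires no user participation, and the references supplied by the students are used for training and testing. The second type was collected from scholarly articles on the ScienceDirect website; it requires users to take part in the experimental testing, which helps us understand how well each method recommends. For evaluation, we measure different aspects of system performance with several criteria: Recall, Precision, Reciprocal Rank (RR), Average Precision (AP), Average Utility (AU), and Favorite Paper (FP). Across these measures, the methods rank as Citation Analysis (CA) > Latent Dirichlet Allocation (LDA) > Content-Based (CB). The CA result shows that when users select articles they consider not only the title and abstract but also other information, such as authors, journal, publication date, and citation relationships, which plays an important part. The results also confirm that appropriately incorporating bibliographic information helps users find the information they need.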
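For readers unfamiliar with the ranking measures named above, the following is a minimal sketch of how Recall, Precision, Reciprocal Rank, and Average Precision are commonly computed for a single user's recommendation list. It uses the standard textbook definitions, which may differ in detail from those used in the study; the function names and the sample data are hypothetical. Average Utility and Favorite Paper are omitted because the abstract does not define them.

```python
from typing import Sequence, Set


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k recommended items that are relevant."""
    top = ranked[:k]
    if not top:
        return 0.0
    # Divides by the number of items actually returned (at most k).
    return sum(1 for doc in top if doc in relevant) / len(top)


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of all relevant items that appear in the top k."""
    if not relevant:
        return 0.0
    return sum(1 for doc in ranked[:k] if doc in relevant) / len(relevant)


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant item, or 0 if none is found."""
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Mean of precision values at each rank where a relevant item occurs,
    normalized by the total number of relevant items."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


if __name__ == "__main__":
    # Hypothetical data: paper IDs are illustrative only.
    ranked = ["p3", "p1", "p7", "p2", "p9"]
    relevant = {"p1", "p2", "p5"}
    print(precision_at_k(ranked, relevant, 5))
    print(recall_at_k(ranked, relevant, 5))
    print(reciprocal_rank(ranked, relevant))
    print(average_precision(ranked, relevant))
```

Averaging these per-user values over all test users gives system-level figures that can be compared across methods such as CA, LDA, and CB.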

Keywords

Latent Dirichlet allocation; scholarly article recommendation; recommender systems; citation analysis

Parallel Abstract

With the advancement of information technology, the Internet has developed rapidly and changed the way information is disseminated, so that data can be delivered quickly. While users enjoy this convenience, they also face the problem of information overload. Amid the explosion of knowledge, people seeking information find it difficult to locate the right information quickly; recommender systems therefore provide a proactive service that helps users find useful information correctly and quickly when facing large amounts of data. This study uses two main methods. The first is a topic-based recommendation method, which avoids overlooking important information because of differences in wording. The second is a citation analysis method, which recommends articles mainly by using the correlations between article attributes. For data collection, we prepared two types of experimental data: the first consists of reference data from students' theses, and the second was collected from scholarly articles on the ScienceDirect website. For the experimental evaluation, we used Recall, Precision, Reciprocal Rank, Average Precision, Average Utility, and Favorite Paper to examine the effectiveness of the recommender system. Across these evaluation measures, the methods rank as Citation Analysis (CA) > Latent Dirichlet Allocation (LDA) > Content-Based (CB). The CA method shows that users selecting articles consider not only the title or abstract but also other information. The results also confirm that the CA method can recommend the Top-K articles that best meet user needs.

Parallel Keywords

Latent Dirichlet allocation; research document recommendation; recommender system; citation analysis

References

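李彥賢, 楊錦生, & 廖國堯. (2012). 以社會性標籤為基礎的擴充搜尋技術支援影音分享網站中之影片檢索 [A social-tag-based expanded search technique to support video retrieval on video-sharing websites]. 資訊管理學報 [Journal of Information Management], 19(3), 533-565.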

Aljaber, B., Stokes, N., Bailey, J., & Pei, J. (2010). Document clustering of scientific texts using citation contexts. Information Retrieval, 13(2), 101-131.

Amini, B., Ibrahim, R., Othman, M. S., & Selamat, A. (2014). Capturing scholar’s knowledge from heterogeneous resources for profiling in recommender systems. Expert Systems with Applications, 41(17), 7945-7957.

Belkin, N. J., & Croft, W. B. (1992). Information filtering and information retrieval: two sides of the same coin? Communications of the ACM, 35(12), 29-38.

Bhatia, V. K. (1993). Analysing genre: Language use in professional settings.
