
Recommending Subject Categories for Journal Articles

Advisor: 許秉瑜

Abstract


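Young scholars often misjudge the subject categories of their work when submitting journal articles. This study analyzes the titles of English-language journal articles (Journal Title) to examine how well an article matches the subject categories suitable for its submission. No previous study has used only the tokenized words (Text) of article titles as the basis for category classification. To examine the classification results under a very large data volume and a wide range of categories, the methods used include recording, for each article title, the words obtained after tokenization, their occurrence counts, and the associated sets of subject categories, together with the Naïve Bayes classifier. Prediction accuracy is reported with two measures: the Rough Hitting Ratio (RHR), at 67.24%, and the Precise Hitting Ratio (PHR), at 38.34%.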

Parallel Abstract


With the proliferation of academic journals, a common issue faced by young scholars and by researchers who wish to venture into cross-disciplinary fields is locating suitable categories and journals for submitting their work. To lessen the severity of this issue, this research proposes a Naïve Bayes classification method that recommends subject categories for a manuscript by analyzing the words in its title. The main challenge of the study came from the huge amount of data. By limiting the subject categories to the areas in which NCU faculty members have published in the past three years, we obtained 64 categories and 199 journals. These journals contain 224,870 articles. The data used to build the classification model consist of 171,625 records, and the testing data consist of 53,245 records. With intensive coding, the study produced a system that handles the task with reasonable performance. The hit ratios are 67.24% for the Rough Hitting Ratio (RHR) and 38.34% for the Precise Hitting Ratio (PHR).
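The idea of recommending categories from title words can be illustrated with a minimal sketch. It assumes a multinomial Naïve Bayes model with add-one (Laplace) smoothing and a simple regular-expression tokenizer; the abstract does not state the tokenization, smoothing, or ranking details actually used, so these choices, the function names (`tokenize`, `train`, `recommend`), and the toy records are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(title):
    # Assumption: lowercase the title and keep alphabetic runs only.
    return re.findall(r"[a-z]+", title.lower())


def train(records):
    """Count title words per category from (title, category) pairs."""
    word_counts = defaultdict(Counter)  # category -> word -> count
    category_counts = Counter()         # category -> number of titles
    vocabulary = set()
    for title, category in records:
        category_counts[category] += 1
        for word in tokenize(title):
            word_counts[category][word] += 1
            vocabulary.add(word)
    return word_counts, category_counts, vocabulary


def recommend(title, model, top_k=3):
    """Return the top_k categories ranked by Naive Bayes log-score."""
    word_counts, category_counts, vocabulary = model
    total_titles = sum(category_counts.values())
    vocab_size = len(vocabulary)
    scores = {}
    for category, n_titles in category_counts.items():
        total_words = sum(word_counts[category].values())
        # Log prior: share of training titles in this category.
        score = math.log(n_titles / total_titles)
        for word in tokenize(title):
            if word not in vocabulary:
                continue  # words never seen in training carry no evidence
            count = word_counts[category][word]
            # Add-one smoothing keeps unseen (word, category) pairs nonzero.
            score += math.log((count + 1) / (total_words + vocab_size))
        scores[category] = score
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Hypothetical toy records, not taken from the thesis data.
TOY_RECORDS = [
    ("Deep learning for image recognition", "Computer Science"),
    ("Efficient algorithms for graph search", "Computer Science"),
    ("Monetary policy and inflation dynamics", "Economics"),
    ("Labor market effects of trade policy", "Economics"),
]

model = train(TOY_RECORDS)
print(recommend("Graph algorithms for image search", model, top_k=1))
```

On this toy data the first recommendation for the sample title is "Computer Science", because every title word appears only in that category's training titles. Working in log space avoids numeric underflow when many word probabilities are multiplied, which would matter at the scale of 171,625 training records.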

Keywords

Naïve Bayes; Big Data; text mining

