
以統計分析探討文件分類程序對期刊論文分類效果之影響

The Study of the Effects of Text Categorization Processes on Journal Papers Classification by Statistical Analysis

Advisor: 薛義誠

Abstract


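Journal papers provide specialized domain knowledge, but information overload wastes time during retrieval; applying text categorization techniques lets users quickly obtain journal papers in the relevant domain. The text categorization process comprises four stages: "pre-processing", "document feature construction", "application of classification methods", and "evaluation of classification results". This study uses statistical hypothesis testing to examine how the feature weighting method, the article field used, and the choice of classifier affect the classification performance of journal papers, and compares the results with a sampling distribution classifier designed in this study. Experimental simulation and statistical hypothesis testing show the following. First, using feature ratio as the feature weighting method yields significantly better classification performance than feature frequency. Second, among article fields, the abstract gives the best classification performance, better than the title and the keywords, while the latter two show no significant difference. Third, for journal paper classification, the support vector machine performs best, followed by the Bayesian probabilistic classifier, the decision tree, and the sampling distribution classifier. Fourth, applying text categorization techniques to classify journal papers is feasible. In addition, analysis results and recommendations for the sampling distribution classifier are provided to support future research.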

Parallel Abstract


Journal papers provide professional domain knowledge. Nevertheless, information overload makes retrieval costly in time. Applying text categorization technology can help users retrieve journal papers in their domain efficiently. The text categorization process has four phases: “text pre-processing”, “document feature construction”, “applying classification methods” and “evaluation”. This research examines the effects of feature weighting, article fields and classifiers on journal paper categorization, and also applies a sampling distribution classifier within the process. The hypothesis test analysis shows the following. First, feature ratio performs significantly better than feature frequency. Second, abstracts are more effective than titles and keywords of journal papers, and there is no significant difference between the latter two. Third, support vector machines are the most effective, followed by naive Bayes, decision trees and the sampling distribution classifier, in that order. Fourth, text categorization of journal papers is feasible. Additionally, analysis and recommendations regarding the sampling distribution classifier are proposed for future study.
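The abstract contrasts two feature-weighting schemes, feature frequency and feature ratio, without defining them in this record. As a rough illustration only, the sketch below assumes that feature frequency is a term's raw count in a document and that feature ratio is that count divided by the document's total token count. These definitions, the function names, and the toy documents are assumptions made for illustration and are not taken from the thesis.

```python
from collections import Counter


def feature_frequency(tokens):
    """Raw count of each term in one document (assumed definition)."""
    return dict(Counter(tokens))


def feature_ratio(tokens):
    """Each term's count divided by the document's total token count
    (assumed definition; the thesis may define it differently)."""
    total = len(tokens)
    if total == 0:
        return {}
    return {term: count / total for term, count in Counter(tokens).items()}


# Hypothetical toy documents: the second repeats the first four times.
short_doc = "svm text classifier".split()
long_doc = ("svm text classifier " * 4).split()

for name, doc in (("short", short_doc), ("long", long_doc)):
    print(name, feature_frequency(doc), feature_ratio(doc))
```

Under these assumptions the raw counts grow with document length while the ratios stay the same for both toy documents, which is one plausible reason a length-normalized weight could behave differently from a raw count in a classifier.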


