Applying Machine Learning to the Tagging and Classification of Test Questions

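As the volume of information grows, Chinese-language articles and test questions are appearing in ever larger numbers. Classifying them manually, as in the past, is too labor-intensive and cannot keep pace with incoming questions. The goal of this study is to build a system that classifies Chinese test questions automatically. The system first trains several machine learning algorithms; in production it should quickly sort large numbers of questions by subject, chapter, and even concept, retrieve the questions a student needs, and serve different questions according to each student's learning progress. Machine learning is now applied very widely, including in a good deal of work on semantic analysis, but most of that research applies deep learning to English text. Research on Chinese text is comparatively scarce, and there is no prior work on applying it accurately to the analysis of test questions, so this study focuses on that gap. The thesis evaluates several machine learning algorithms to find the one best suited to an education platform that must automatically and accurately classify questions by subject, chapter, and concept. It compares four algorithms on concept-level analysis of questions: support vector machine, logistic regression, decision tree, and random forest. Questions are first segmented with Jieba and word vectors are trained on a Wikipedia corpus; the classifiers are then trained on the questions, with the aim of producing highly accurate question classifications. The results show that the support vector machine performs best for classifying Chinese test questions. This finding can be applied to the automatic classification of Chinese test questions on education platforms, reducing the labor cost of classifying questions.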
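The pipeline described above (segment each question, map its words to vectors, then train a classifier) can be sketched in a few lines. The following is a minimal, standard-library-only illustration, not the study's implementation. The tiny two-dimensional `WORD_VECTORS` table is a hypothetical stand-in for Word2Vec embeddings trained on a Wikipedia corpus, the token lists are assumed to be already segmented as Jieba would produce them, and averaging word vectors into a question vector is an assumed feature scheme. The hand-written logistic regression represents only one of the four algorithms compared, and all function names and the two concept labels are invented for the example.

```python
import math

# Hypothetical 2-D word vectors. They stand in for Word2Vec embeddings
# trained on a Wikipedia corpus, which would have far more dimensions.
DIM = 2
WORD_VECTORS = {
    "分數": [0.9, 0.1],    # fraction
    "分母": [0.8, 0.2],    # denominator
    "通分": [0.7, 0.1],    # common denominator
    "三角形": [0.1, 0.9],  # triangle
    "面積": [0.2, 0.8],    # area
    "周長": [0.1, 0.7],    # perimeter
    "計算": [0.5, 0.5],    # compute (appears in both concepts)
}
LABELS = {0: "fractions", 1: "geometry"}  # illustrative concept labels


def question_vector(tokens):
    """Average the vectors of known tokens; skip out-of-vocabulary tokens."""
    known = [WORD_VECTORS[t] for t in tokens if t in WORD_VECTORS]
    if not known:
        return [0.0] * DIM
    return [sum(v[i] for v in known) / len(known) for i in range(DIM)]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the mean log loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, tokens):
    """Return the predicted label index for a segmented question."""
    x = question_vector(tokens)
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0


# Toy training set: pre-segmented questions with concept labels.
TRAIN = [
    (["計算", "分數", "通分"], 0),
    (["分母", "分數"], 0),
    (["通分", "分母", "計算"], 0),
    (["三角形", "面積", "計算"], 1),
    (["周長", "三角形"], 1),
    (["面積", "周長", "計算"], 1),
]
X = [question_vector(tokens) for tokens, _ in TRAIN]
y = [label for _, label in TRAIN]
w, b = train_logistic_regression(X, y)

new_question = ["三角形", "周長", "計算"]
print(LABELS[predict(w, b, new_question)])
```

A fuller experiment along the lines of the abstract would presumably swap in real embeddings and library implementations of all four classifiers, then compare their accuracy on held-out questions.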

Keywords

Machine Learning; Chinese Semantic Analysis; Education Platform; Word-to-Vector Package

Parallel Abstract

With the steady growth in the amount of information, more and more Chinese articles and test questions are being produced. Sorting them by hand, as was done in the past, costs too much labor and makes timely judgments about questions impossible. This study aims to complete a system for the automated classification of Chinese test questions. Several machine learning algorithms are trained first; in the production system, questions can then be sorted quickly by subject, chapter, or even concept, so that the questions a student is looking for can be found and different questions can be offered depending on the student's learning situation. Machine learning is applied in a very wide range of areas, and many studies use it for semantic analysis, but most of them apply deep learning to English text and few address Chinese text. No existing research applies it accurately to the analysis of test questions, so this study emphasizes that aspect. The thesis tests a variety of machine learning algorithms to identify the most appropriate one for an education platform that automatically and accurately classifies questions by subject, chapter, and concept. Four machine learning algorithms are compared on the concept analysis of questions: Support Vector Machine, Logistic Regression, Decision Tree, and Random Forest. Jieba word segmentation is applied first and word vectors are trained on a Wikipedia text corpus; the classifiers are then trained on the questions, with highly accurate question classification as the desired output. The results show that the Support Vector Machine is the best of the four algorithms for classifying Chinese test questions. They can be used for the automatic classification of Chinese test questions on education platforms, saving labor costs in question classification.

Parallel Keywords

Machine Learning; Support Vector Machine; Chinese Semantic Analysis; Education Platform; Software (Word2Vec)

