使用支援向量機進行中文文本可讀性分類-以國小國語課文為例

Using the Support Vector Machine to classify the Chinese text readability – A Case of Elementary Chinese Textbook

Abstract


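Language ability plays an important role in every area of life, and reading is one of the most important and direct ways to acquire it. Readability measures whether a text suits a reader's reading ability. Previous studies have described the readability formula as a tool for adjusting texts to readers with different levels of education. Readability research on English texts began early, but comparable work on Chinese remains scarce, even though Chinese proficiency is increasingly important in today's society. A suitable method for classifying text readability is therefore needed. Constrained by the technology of their time, earlier Western scholars mostly used linear readability formulas to classify texts, and linear formulas impose some limitations on the data used in this study. The aim of this study is therefore to build a predictive model, trained with a Support Vector Machine (SVM), that classifies the readability of elementary school Chinese textbook lessons. We then check whether the predicted grade level of each lesson matches its actual grade level, and analyze the misclassified lessons in order to improve classification accuracy.

The experimental data come from three commercially published textbook editions (editions H, K, and N) that were compiled by curriculum experts and approved by the national textbook review authority. After removing modern poems, quatrains, classical Chinese prose, and regulated verse from the Chinese lessons for grades one through six, 386 lessons remain. Part of the lessons serve as training data and the rest as test data. After Chinese word segmentation and data format conversion, an SVM classifies the readability of the texts. The results show that predicting the textbook volume of elementary school Chinese lessons with LIBSVM yields an accuracy of 47.92% and a fit rate of 80.31%. Finally, an error analysis of the mispredicted lessons examines which factors caused the prediction errors.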
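The data format conversion step mentioned in the abstract can be illustrated with a minimal sketch. It assumes that each lesson has already been segmented into words and that the features are plain term counts; the abstract does not say which features the thesis actually uses, so that choice, the helper names `build_vocabulary` and `to_libsvm_line`, and the toy lessons are all hypothetical. The only fixed point is the LIBSVM sparse text format, `label index:value ...`, with feature indices in ascending order.

```python
from collections import Counter


def build_vocabulary(segmented_docs):
    """Assign a 1-based feature index to each distinct token, in first-seen order."""
    vocab = {}
    for tokens in segmented_docs:
        for tok in tokens:
            if tok not in vocab:
                vocab[tok] = len(vocab) + 1
    return vocab


def to_libsvm_line(label, tokens, vocab):
    """Render one document as 'label idx:count ...' with ascending indices."""
    counts = Counter(vocab[t] for t in tokens if t in vocab)
    feats = " ".join(f"{idx}:{cnt}" for idx, cnt in sorted(counts.items()))
    return f"{label} {feats}".rstrip()


# Hypothetical toy data: (level label, already-segmented words).
docs = [
    (1, ["我", "喜歡", "讀書"]),
    (2, ["我", "喜歡", "讀書", "也", "喜歡", "寫字"]),
]

vocab = build_vocabulary(tokens for _, tokens in docs)
lines = [to_libsvm_line(label, tokens, vocab) for label, tokens in docs]

for line in lines:
    print(line)
```

Each printed line is one training or test instance. A file of such lines is the kind of input that the LIBSVM command-line tools read.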

Parallel Abstract


Language plays an important part in every field, and the most efficient way to improve language ability is to read. Readability estimates whether an article is suitable for a given reader. Past research holds that a readability formula is a means of adjusting the level of an article to readers with different educational attainment. Research on English readability is well under way, while research on Chinese has made little progress, even though Chinese is increasingly important today. It is therefore important to find a suitable way to classify text readability. In past research, owing to the limits of the available technology, many Western readability formulas used linear models for text classification, and linear readability formulas are a limitation for the data in this research. The purpose of this research is therefore to use a predictive model, trained by the support vector machine, to classify the readability of elementary Chinese textbooks, and to check whether the predicted level of each text matches its actual level. Finally, the misclassified texts are analyzed to improve the accuracy of text readability classification. The experimental materials were compiled by curriculum experts and approved by the national compilation organization. They come from three versions of textbooks from private publishers (versions H, K, and N), covering the first to sixth grades with the classical Chinese texts removed, for a total of 386 texts. Part of the texts are used as training materials and the others as testing materials. After Chinese word segmentation and data format conversion, the texts are classified by SVM. The research finds that the accuracy of predicting elementary texts is 47.92%, while the fit rate is 80.31%. Finally, the wrong predictions are analyzed to understand the reasons behind them.
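The abstract reports two figures, accuracy and fit rate, but does not define the latter. The sketch below shows exact-match accuracy next to one relaxed measure that counts a prediction as acceptable when it lies within a fixed number of levels of the true one. Treating the relaxed measure as related to the thesis's fit rate is purely an assumption made for illustration. The function names, the tolerance of 1, and the label lists are hypothetical and are not data from the study.

```python
def exact_accuracy(y_true, y_pred):
    """Share of predictions that equal the true level exactly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def within_tolerance_accuracy(y_true, y_pred, tolerance=1):
    """Share of predictions at most `tolerance` levels from the true level.

    Assumption: this is only one possible relaxed measure; the thesis's
    own definition of fit rate is not given in the abstract.
    """
    hits = sum(abs(t - p) <= tolerance for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


# Hypothetical level labels, invented for this example.
y_true = [1, 2, 3, 4, 5, 6, 7, 8]
y_pred = [1, 3, 3, 6, 5, 5, 7, 12]

exact = exact_accuracy(y_true, y_pred)
relaxed = within_tolerance_accuracy(y_true, y_pred)
print(f"exact: {exact:.2%}, within one level: {relaxed:.2%}")
```

A relaxed measure of this kind is always at least as high as exact accuracy. That fits the ordering of the two reported figures, but it does not confirm how the fit rate was computed.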


Cited By


李慧萱 (2013). 華語作文分級系統 [Chinese composition grading system] [Master's thesis, National Taiwan Normal University]. Airiti Library. https://www.airitilibrary.com/Article/Detail?DocID=U0021-0801201418035119

Further Reading