
Text Classification Model Based on Latent Semantic Analysis: A Case Study of Textbook for Social Studies in Elementary School

Advisors: Kuo-En Chang (張國恩), Yao-Ting Sung (宋曜廷), Tao-Hsing Chang (張道行)

Abstract

With the growth of the internet and the spread of personal computers, students often search online for information. Search results, however, are typically large and cover many different aspects of a topic, so students spend a great deal of time reviewing them repeatedly before they find articles that match their reading level and goals. Readability-based text classification assigns a text to a difficulty level, allowing students to pick texts suited to their ability and saving them the time of searching for such texts. Most past readability studies plugged surface features of a text into a linear formula to obtain a difficulty score. In Chinese, however, semantic features matter more than surface features, and semantic information plays a larger role than it does in English. This study therefore uses Latent Semantic Analysis (LSA) to extract the semantic features of texts and classifies text readability on the basis of those features. The data are elementary school Social Studies textbook lessons. Taking advantage of the fact that each semester covers different themes, the study builds a semantic space model of Social Studies with LSA and uses it to assign Social Studies articles of unknown level to the level they belong to. When elementary Social Studies texts are classified by semester, the model reaches an accuracy of 79.06%, which indicates reasonably good classification performance. LSA offers readability research a different perspective: it analyzes a text by the meaning it conveys, which makes it especially suitable for Chinese, a language in which semantic information carries particular weight.
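The abstract contrasts LSA with earlier readability formulas that feed surface features into a linear equation. As one well-known instance of that older approach, the sketch below computes the English Flesch (1948) Reading Ease score, which combines average sentence length and average syllables per word. It is shown only for contrast and is not part of this study's method; its coefficients were fitted to English, not Chinese. The function name `flesch_reading_ease` is hypothetical, and the counts are passed in directly so as not to assume any particular tokenizer or syllable counter.

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch (1948) Reading Ease: a linear formula over surface counts.

    Higher scores mean easier text. The weights were fitted to English
    prose, so they are not meaningful for Chinese text as-is.
    """
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    avg_sentence_len = words / sentences      # words per sentence
    avg_syllables = syllables / words         # syllables per word
    return 206.835 - 1.015 * avg_sentence_len - 84.6 * avg_syllables


if __name__ == "__main__":
    # Hypothetical passage: 100 words, 5 sentences, 150 syllables.
    print(round(flesch_reading_ease(100, 5, 150), 3))  # 59.635
```

Every input is a countable surface property, which is exactly the kind of feature the abstract argues is less informative for Chinese than the meaning a text conveys.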
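The semantic-space classification described in the abstract can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the thesis's implementation. The assumptions are:

- Documents are already word-segmented; Chinese text would need a segmenter first.
- Log-entropy term weighting is used.
- The number of latent dimensions `k` is chosen by hand.
- The decision rule is a nearest-centroid one: cosine similarity to each semester's mean document vector.

The class `LSAClassifier`, the toy corpus, and the semester labels `"3-1"`/`"5-2"` are all hypothetical. The weighted term-document matrix is decomposed with an SVD, A ≈ U_k Σ_k V_kᵀ. An unseen document q is folded into the same space as qᵀU_k, which matches the coordinates of the training documents, AᵀU_k = V_kΣ_k.

```python
import numpy as np


def build_vocab(docs):
    """Map each token to a row index; docs are lists of pre-segmented tokens."""
    vocab = {}
    for doc in docs:
        for tok in doc:
            vocab.setdefault(tok, len(vocab))
    return vocab


def count_matrix(docs, vocab):
    """Term-by-document raw counts; tokens missing from vocab are skipped."""
    X = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for tok in doc:
            i = vocab.get(tok)
            if i is not None:
                X[i, j] += 1
    return X


def entropy_weights(X):
    """Global entropy weight per term: 1 + sum_j p_ij*log(p_ij) / log(n_docs)."""
    gf = X.sum(axis=1, keepdims=True)
    p = np.divide(X, gf, out=np.zeros_like(X), where=gf > 0)
    plogp = p * np.log(np.where(p > 0, p, 1.0))  # 0*log(1) = 0 where p == 0
    return 1.0 + plogp.sum(axis=1) / np.log(X.shape[1])


def log_entropy(X, g):
    """Local weight log(1 + tf) scaled by each term's global weight."""
    return np.log1p(X) * g[:, None]


def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


class LSAClassifier:
    """Hypothetical nearest-centroid classifier in an LSA semantic space."""

    def __init__(self, k):
        self.k = k  # latent dimensions kept (assumed; must not exceed the rank)

    def fit(self, docs, labels):
        self.vocab = build_vocab(docs)
        X = count_matrix(docs, self.vocab)
        self.g = entropy_weights(X)
        A = log_entropy(X, self.g)
        U, _, _ = np.linalg.svd(A, full_matrices=False)
        self.Uk = U[:, : self.k]
        D = A.T @ self.Uk  # training docs in k dims, equal to V_k * Sigma_k
        self.centroids = {
            lab: D[[i for i, l in enumerate(labels) if l == lab]].mean(axis=0)
            for lab in sorted(set(labels))
        }
        return self

    def project(self, doc):
        """Fold an unseen document into the space: q_hat = q^T U_k."""
        q = log_entropy(count_matrix([doc], self.vocab), self.g)[:, 0]
        return q @ self.Uk

    def predict(self, doc):
        v = self.project(doc)
        return max(self.centroids, key=lambda lab: cosine(v, self.centroids[lab]))


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Toy corpus: two hypothetical "semesters" with different themes.
    train = [
        "family community neighbors school park".split(),
        "community map school neighbors library".split(),
        "family park map library neighbors".split(),
        "dynasty emperor war treaty history".split(),
        "emperor trade treaty history colony".split(),
        "war dynasty colony trade history".split(),
    ]
    labels = ["3-1"] * 3 + ["5-2"] * 3
    clf = LSAClassifier(k=2).fit(train, labels)
    tests = ["neighbors school library community".split(),
             "treaty emperor colony war".split()]
    preds = [clf.predict(d) for d in tests]
    print(preds, accuracy(["3-1", "5-2"], preds))  # ['3-1', '5-2'] 1.0
```

The two toy "semesters" share no vocabulary, so the unseen texts fall cleanly into the matching class. On real textbook lessons, themes overlap across semesters. There, the choice of `k`, the weighting scheme, and the decision rule would matter much more, and accuracy would have to be measured on held-out lessons.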


Cited by


李慧萱 (2013). 華語作文分級系統 [A leveling system for Chinese compositions] (Master's thesis, National Taiwan Normal University). Airiti Library. https://www.airitilibrary.com/Article/Detail?DocID=U0021-0801201418035119
