With the growth of the Internet and the spread of computers, students often search online for the material they need. Search results, however, tend to be numerous and to cover many different aspects of a topic, so students spend considerable time reviewing them repeatedly before finding articles that suit their level and goals. Readability text classification identifies the difficulty level of a text, allowing students to choose texts suited to their ability and reducing the time they spend searching. Most previous readability studies substitute surface features of a text into a linear formula to obtain a difficulty score, but for Chinese text, semantic features are more important than surface features. This study therefore uses Latent Semantic Analysis (LSA) to extract the semantic features of texts and classifies their readability on the basis of those features. The data are lessons from elementary school Social Studies textbooks. Exploiting the fact that each semester covers different themes, we use LSA to build a semantic space model of the Social Studies texts and then apply the model to assign Social Studies articles of unknown level to the level they belong to. With semesters as the classes, the classification accuracy reaches 79.06%, which indicates that the approach classifies these texts reasonably well. LSA offers readability research another perspective, one that takes the meaning a text conveys as the basis of analysis, and it is particularly suited to Chinese, where semantic information carries more weight.
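The pipeline summarized above (build a semantic space from labeled lessons, then assign an unseen text to a level) can be illustrated with a minimal sketch. It is not the study's implementation: the abstract does not state the term weighting, the number of dimensions, or the similarity measure, so raw term counts, `k=2`, and cosine similarity to per-level centroids are all assumptions here. The toy documents, the level names `grade3_sem1` and `grade6_sem1`, and every function name are hypothetical, and Chinese text is assumed to have been segmented into word tokens beforehand.

```python
import numpy as np

def term_doc_matrix(docs, vocab):
    """Raw term counts: one row per vocabulary term, one column per document."""
    index = {term: i for i, term in enumerate(vocab)}
    X = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for term in doc:
            if term in index:  # out-of-vocabulary terms are ignored
                X[index[term], j] += 1.0
    return X

def fit_semantic_space(X, k):
    """Truncated SVD of the term-document matrix; keeps the top-k term axes."""
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :k]

def project(X, U_k):
    """Map documents (columns of X) into the k-dimensional semantic space."""
    return X.T @ U_k

def level_centroids(doc_vecs, labels):
    """Average the document vectors that belong to each level."""
    return {
        level: doc_vecs[[i for i, l in enumerate(labels) if l == level]].mean(axis=0)
        for level in sorted(set(labels))
    }

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def classify(tokens, vocab, U_k, centroids):
    """Project an unseen text and return the level with the closest centroid."""
    vec = project(term_doc_matrix([tokens], vocab), U_k)[0]
    return max(centroids, key=lambda level: cosine(vec, centroids[level]))

# Hypothetical toy corpus: pre-segmented tokens, two lessons per level.
train_docs = [
    ["family", "school", "home", "family"],
    ["school", "teacher", "home"],
    ["government", "economy", "trade"],
    ["economy", "trade", "law", "government"],
]
train_labels = ["grade3_sem1", "grade3_sem1", "grade6_sem1", "grade6_sem1"]

vocab = sorted({term for doc in train_docs for term in doc})
X = term_doc_matrix(train_docs, vocab)
U_k = fit_semantic_space(X, k=2)  # k is an assumed setting
centroids = level_centroids(project(X, U_k), train_labels)

print(classify(["school", "family"], vocab, U_k, centroids))
```

On a real corpus one would likely weight the counts (for example with TF-IDF) and choose `k` by validation; both are left out of this sketch to keep it short.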