Building a Text Classification Model with Latent Semantic Analysis: The Case of Elementary School Social Studies Textbooks

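With the spread of the internet and the ubiquity of computers, students often search online for information. The search results, however, are usually voluminous and cover many different aspects, so students spend a great deal of time repeatedly reviewing them before they find articles that fit their level and goals. Readability-based text classification identifies the difficulty level a text belongs to, letting students choose texts suited to their own level and saving the time they would otherwise spend looking for them. Previous readability research has mostly plugged surface features of a text into a linear formula to obtain a difficulty score, but in a Chinese-language setting semantic features matter more than surface features. This study therefore uses latent semantic analysis (LSA) to analyze the semantic features of texts, and then uses those features as the basis for classifying texts by readability. The data are elementary school social studies textbook passages. Exploiting the fact that each semester covers distinct topics, the study builds a semantic space model for social studies with LSA and uses that model to assign social studies articles of unknown level to the appropriate level. With semesters as the classes, classification accuracy on elementary school social studies text reaches 79.06%, which is a reasonably good result for this task. LSA offers readability research a different perspective, taking the "meaning" a text conveys as the basis of analysis, and it is especially suited to the Chinese-language setting, where semantics carries more weight.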

Keywords

Latent semantic analysis; readability; text classification

Parallel Abstract

With the development of the internet and the widespread use of computers, the internet has become the tool students use to find the information they need. The results, however, are often large and complex, and students waste a lot of time reviewing them again and again to find texts suited to their ability. Readability text classification can identify the difficulty of a text, so students can choose texts that are suitable for them and save time. Many readability studies put surface features into a linear formula to obtain a readability score, but in Chinese, semantic information is more important than in English. This study uses Latent Semantic Analysis to analyze the semantic features of texts and classifies the readability of texts by that semantic information. Elementary school Social Studies textbooks are used as the data. By utilizing the characteristics of the different themes in each semester, we constructed a semantic space model of elementary school Social Studies textbooks with Latent Semantic Analysis and applied the model to classify texts of unknown readability level into the class to which they belong. The classification accuracy in this study is 79.06%. Latent Semantic Analysis offers another point of view on readability text classification, especially for Chinese text, in which semantic information carries more weight.

Parallel Keywords

Latent Semantic Analysis; Readability; Text Classification
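
The abstract describes building a semantic space with latent semantic analysis and then assigning a text of unknown level to a class. The following is a minimal sketch of that general idea, not the study's actual pipeline. Everything specific in it is an assumption made for illustration: the toy English tokens, the labels "grade3" and "grade6", the raw term counts without weighting, the choice of k = 2 dimensions, and the rule of picking the class whose centroid has the highest cosine similarity. Real Chinese text would first need word segmentation, which is omitted here.

```python
import numpy as np

def term_doc_matrix(docs, vocab):
    """Count matrix with one row per vocabulary term and one column per document."""
    m = np.zeros((len(vocab), len(docs)))
    for j, tokens in enumerate(docs):
        for t in tokens:
            if t in vocab:  # terms outside the training vocabulary are ignored
                m[vocab[t], j] += 1.0
    return m

def fit_lsa(docs, k):
    """Truncated SVD of the term-document matrix.

    Returns the vocabulary, the top-k left singular vectors U_k, and the
    training documents projected into the k-dimensional semantic space.
    """
    vocab = {t: i for i, t in enumerate(sorted({t for d in docs for t in d}))}
    a = term_doc_matrix(docs, vocab)
    u, _, _ = np.linalg.svd(a, full_matrices=False)
    u_k = u[:, :k]
    doc_vectors = a.T @ u_k  # one row per document; equals V_k scaled by S_k
    return vocab, u_k, doc_vectors

def cosine(x, y):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    denom = np.linalg.norm(x) * np.linalg.norm(y)
    return float(x @ y / denom) if denom > 0 else 0.0

def classify(tokens, vocab, u_k, centroids):
    """Fold a new document into the semantic space and pick the closest class centroid."""
    q = term_doc_matrix([tokens], vocab)[:, 0] @ u_k
    return max(centroids, key=lambda label: cosine(q, centroids[label]))

# Hypothetical toy corpus: already-tokenized passages with made-up level labels.
train_docs = [
    ["family", "school", "neighborhood", "family"],
    ["school", "neighborhood", "community", "family"],
    ["government", "election", "law", "citizen"],
    ["law", "citizen", "government", "rights"],
]
train_labels = ["grade3", "grade3", "grade6", "grade6"]

vocab, u_k, doc_vectors = fit_lsa(train_docs, k=2)

# One centroid per class: the mean of that class's document vectors.
centroids = {
    label: doc_vectors[[i for i, l in enumerate(train_labels) if l == label]].mean(axis=0)
    for label in sorted(set(train_labels))
}

print(classify(["citizen", "election", "rights"], vocab, u_k, centroids))
```

In this toy corpus the two classes use disjoint vocabularies, so the query shares terms only with the "grade6" passages and is assigned to that class. With real textbook data the classes overlap heavily, and choices such as term weighting, the number of dimensions, and the similarity rule would all need to be tuned.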


