  • Degree thesis

Representation Learning for Text Readability (表徵學習法之文本可讀性)

Advisors: 陳柏琳, 宋曜廷
English abstract


Text readability is the degree to which readers can understand a text: the more readable a text is for its readers, the better their comprehension and retention. To help readers digest and comprehend documents, researchers have long worked on readability models that estimate text readability automatically and accurately. Conventional approaches to readability classification infer a model from a set of handcrafted features, defined a priori and computed from the training documents, together with the readability levels of those documents. Developing handcrafted features, however, is labor-intensive and time-consuming, and it demands specialized expertise. Recent advances in representation learning make it possible to extract salient features from documents efficiently without such expertise, which opens a promising avenue for research on readability classification. In view of this, this study proposes several novel readability models based on representation learning techniques; these models can effectively analyze documents from different domains covering a wide variety of topics. Compared with a baseline that uses a traditional model, the new model improves accuracy by 39.55 percentage points, reaching 78.45%. Combining different representation learning algorithms with general linguistic features yields a larger improvement of 40.95 percentage points, reaching 79.85%. Finally, this study explores character-level representations to develop a novel readability model, which assesses the readability of Chinese-language text with 78.66% accuracy.
Together, these results indicate that the readability features developed in this study can be used to train a readability model for leveling domain-specific texts, and can also be combined with more common linguistic features to make the model more effective. Future work will explore further training methods for constructing the semantic space and will incorporate text summarization techniques to distill the salient aspects of text content, which may further improve the effectiveness of a readability model.
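The abstract reports two accuracy gains (39.55 and 40.95) and two final accuracies (78.45% and 79.85%) but does not state the baseline accuracy. The sketch below is a quick arithmetic check, not part of the thesis: it assumes both gains are measured against the same traditional baseline and asks which reading of "improves by" gives a consistent baseline. The function names and the dictionary layout are illustrative assumptions.

```python
# Figures quoted in the abstract: (reported gain, final accuracy), both in percent.
reported = {
    "representation learning": (39.55, 78.45),
    "representation learning + linguistic features": (40.95, 79.85),
}


def baseline_if_absolute(gain: float, final: float) -> float:
    """Implied baseline if the gain is an absolute difference in percentage points."""
    return round(final - gain, 2)


def baseline_if_relative(gain: float, final: float) -> float:
    """Implied baseline if the gain is a relative improvement over the baseline."""
    return round(final / (1 + gain / 100), 2)


# Assumption: both models are compared against the same baseline,
# so the correct reading should yield one shared baseline value.
absolute = {name: baseline_if_absolute(g, f) for name, (g, f) in reported.items()}
relative = {name: baseline_if_relative(g, f) for name, (g, f) in reported.items()}

for name in reported:
    print(f"{name}: absolute reading -> {absolute[name]}, "
          f"relative reading -> {relative[name]}")

print("absolute reading consistent:", len(set(absolute.values())) == 1)
print("relative reading consistent:", len(set(relative.values())) == 1)
```

Under the shared-baseline assumption, only the absolute reading gives the same implied baseline for both models, which suggests the reported gains are percentage-point differences.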
