Translated Titles

Exploring the Use of Neural Network based Features for Text Readability Classification


曾厚強(Hou-Chiang Tseng);陳柏琳(Berlin Chen);宋曜廷(Yao-Ting Sung)

Key Words

Readability ; Word Vector ; Convolutional Neural Network ; Representation Learning ; fastText



Volume or Term/Year and Month of Publication

Vol. 22, No. 2 (2017/12/01)

Page #

31 - 45

Content Language


Chinese Abstract


English Abstract

Text readability refers to the degree to which a text can be understood by its readers: the more readable a text is for its readers, the better their comprehension and learning retention. To help readers digest and comprehend documents, researchers have long been developing readability models that can automatically and accurately estimate text readability. Conventional approaches to readability classification infer a readability model from a set of handcrafted features defined a priori and computed from the training documents, along with the readability levels of these documents. However, handcrafted features require special expertise, and their applicability is limited. With recent advances in representation learning techniques, salient features can be extracted efficiently from documents without recourse to specialized expertise, which offers a promising avenue of research on readability classification. In view of this, this paper proposes two novel readability models, built on a convolutional neural network based representation and on the so-called fastText representation, respectively. Both are capable of effectively analyzing documents that belong to different domains and cover a wide variety of topics. A series of empirical experiments suggests the utility of the proposed models relative to several existing methods.

Topic Category

Humanities > Library and Information Science
Basic and Applied Sciences > Information Science
Engineering > Electrical Engineering