Exploring the Use of Neural Network based Features for Text Readability Classification

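Abstract

Readability generally refers to the degree to which reading material can be understood by its readers: the better readers can understand the material, the better the learning outcome. To help readers find documents that match their reading ability, researchers have long developed models that estimate text readability automatically and accurately. Readability classification typically converts the information in a document into a set of readability features and then uses those features to train a readability model that can predict the readability of unseen documents. However, the features used by traditional readability models must be selected on the basis of expert experience, which limits their practicality. With the rapid development of representation learning in recent years, the features needed to train a readability model no longer have to depend on experts, which opens a new research direction for readability modeling. This paper therefore uses two techniques, a convolutional neural network and fastText, to extract text features automatically and to train a readability model that can analyze cross-domain documents and accommodate the diverse topics they cover. A series of experimental comparisons with existing methods confirms the performance advantage of the proposed readability models.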
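To make the notion of an expert-selected readability feature concrete, the following is a minimal sketch of two classic surface features (average sentence length and average syllables per word) combined through the well-known Flesch Reading Ease formula. It is an illustration only and is not claimed to be a baseline used in this paper. The function names, the regular-expression tokenization, and the vowel-group syllable heuristic are assumptions made for this example.

```python
import re


def count_syllables(word):
    """Rough heuristic (an assumption): one syllable per run of vowels, at least 1."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def handcrafted_features(text):
    """Compute two surface features from a non-empty English text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return {
        "words_per_sentence": len(words) / len(sentences),
        "syllables_per_word": syllables / len(words),
    }


def flesch_reading_ease(words_per_sentence, syllables_per_word):
    """Flesch Reading Ease: higher scores indicate easier text."""
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


# Toy usage on a made-up passage.
feats = handcrafted_features("The cat sat. The dog ran.")
print(feats, flesch_reading_ease(**feats))
```

Every choice in this sketch (which features to count, how to weight them) is fixed by hand in advance, which is the dependence on expert judgment that the abstract identifies as a limitation.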

Parallel Abstract


Text readability refers to the degree to which a text can be understood by its readers: the more readable a text is for its readers, the better the comprehension and learning retention that can be achieved. To help readers digest and comprehend documents, researchers have long been developing readability models that can automatically and accurately estimate text readability. The conventional approach to readability classification is to infer a readability model from a set of handcrafted features, defined a priori and computed from the training documents, along with the readability levels of these documents. However, handcrafted features require special expertise, which limits their applicability. With recent advances in representation learning, salient features can be extracted efficiently from documents without recourse to specialized expertise, which offers a promising avenue of research on readability classification. In view of this, this paper proposes two novel readability models, built on a convolutional neural network based representation and the so-called fastText representation, respectively, that can effectively analyze documents belonging to different domains and covering a wide variety of topics. A series of empirical experiments seems to demonstrate the utility of the proposed models in relation to several existing methods.
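As a rough illustration of how a fastText-style representation learns features instead of relying on handcrafted ones, the sketch below hashes word unigrams and bigrams into an embedding table, averages the embeddings into a document vector, and trains a softmax classifier on top with stochastic gradient descent. It is a simplified pure-Python stand-in, not the authors' model and not the fastText library itself. The class name, the toy sentences, the two readability labels, and all hyperparameters (`dim`, `buckets`, `lr`, number of epochs) are assumptions chosen for this example.

```python
import math
import random
import zlib


def features(text, buckets=1000):
    """Hash the unigrams and bigrams of a lowercased text into bucket ids."""
    tokens = text.lower().split()
    grams = tokens + [a + " " + b for a, b in zip(tokens, tokens[1:])]
    return [zlib.crc32(g.encode("utf-8")) % buckets for g in grams]


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class FastTextLikeClassifier:
    """Averaged n-gram embeddings followed by a linear softmax layer."""

    def __init__(self, n_classes, dim=8, buckets=1000, seed=0):
        rng = random.Random(seed)
        self.dim, self.buckets = dim, buckets
        # Embeddings start small and random; the output layer starts at zero.
        self.emb = [[rng.uniform(-1.0 / dim, 1.0 / dim) for _ in range(dim)]
                    for _ in range(buckets)]
        self.out = [[0.0] * dim for _ in range(n_classes)]

    def doc_vector(self, ids):
        """Learned document feature: the mean embedding (text must be non-empty)."""
        return [sum(self.emb[i][d] for i in ids) / len(ids) for d in range(self.dim)]

    def _probs(self, h):
        return softmax([sum(w * x for w, x in zip(row, h)) for row in self.out])

    def predict_proba(self, text):
        return self._probs(self.doc_vector(features(text, self.buckets)))

    def train_step(self, text, label, lr):
        """One SGD step on the cross-entropy loss; returns the loss before the update."""
        ids = features(text, self.buckets)
        h = self.doc_vector(ids)
        probs = self._probs(h)
        err = [p - (1.0 if c == label else 0.0) for c, p in enumerate(probs)]
        # Gradient w.r.t. the document vector, taken before the output layer changes.
        grad_h = [sum(err[c] * self.out[c][d] for c in range(len(self.out)))
                  for d in range(self.dim)]
        for c, row in enumerate(self.out):
            for d in range(self.dim):
                row[d] -= lr * err[c] * h[d]
        for i in ids:
            for d in range(self.dim):
                self.emb[i][d] -= lr * grad_h[d] / len(ids)
        return -math.log(probs[label])


def mean_loss(model, data):
    return sum(-math.log(model.predict_proba(t)[y]) for t, y in data) / len(data)


# Made-up toy data: label 0 = easier text, label 1 = harder text.
data = [
    ("the cat sat on the mat", 0),
    ("a dog ran to the park", 0),
    ("the epistemological ramifications of quantum decoherence remain contested", 1),
    ("heterogeneous regulatory frameworks complicate multinational compliance obligations", 1),
]
model = FastTextLikeClassifier(n_classes=2)
loss_before = mean_loss(model, data)
for _ in range(200):
    for text, label in data:
        model.train_step(text, label, lr=0.5)
loss_after = mean_loss(model, data)
print(loss_before, loss_after)
```

The point of the sketch is that the document vector is learned jointly with the classifier, so no feature has to be specified by an expert beforehand. A convolutional model would replace the averaging step with learned filters over the token sequence.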

