Representation Learning for Text Readability (表徵學習法之文本可讀性)

Tseng Hou-Chiang (曾厚強)

doi:10.6345/NTNU202000173

Dissertation

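Advisors: Berlin Chen (陳柏琳); Yao-Ting Sung (宋曜廷)

National Taiwan Normal University, College of Science, Department of Computer Science and Information Engineering, Ph.D. (2020)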

https://doi.org/10.6345/NTNU202000173

Abstract

Text readability refers to the degree to which a text can be understood by its readers: the higher the readability of a text, the better readers' comprehension and retention. To help readers digest and comprehend documents, researchers have long been developing readability models that automatically and accurately estimate text readability. Conventional approaches to readability classification infer a readability model from a set of handcrafted features, defined a priori and computed from the training documents, together with the readability levels of those documents. However, developing handcrafted features is not only labor-intensive and time-consuming but also demands expertise. Recent advances in representation learning make it possible to extract salient features from documents efficiently without recourse to specialized expertise, which offers a promising avenue for research on readability classification. In view of this, this study proposes several novel readability models based on representation learning techniques, which can effectively analyze documents from different domains and covering a wide variety of topics. Compared with a baseline using a traditional model, the new model improves accuracy by 39.55 percentage points, reaching 78.45%. We then combine different kinds of representation learning algorithms with general linguistic features, and accuracy improves by a larger margin of 40.95 percentage points, reaching 79.85%. Finally, this study explores character-level representations to develop a novel readability model, which shows promise for assessing the readability of Chinese text, with 78.66% accuracy.
These results indicate that the readability features developed in this study can be used both to train a readability model for leveling domain-specific texts and in combination with the more common linguistic features to enhance the model's efficacy. In future work, we will explore additional training methods for constructing the semantic space and will incorporate text summarization techniques to distill salient aspects of text content, which may further enhance the effectiveness of a readability model.

Keywords

Readability; Latent Semantic Analysis; Word2vec; fastText; StarSpace; Convolutional Neural Network; BERT
