A Study on Classifying the Difficulty of GEPT Reading Articles Using Vocabulary and Clause Structure

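The Internet is the fastest-growing communication medium and holds a rich variety of information, but within that volume it is not easy for users to find English articles suited to their own level of study. This study therefore examines the importance of the various attributes that affect the difficulty of English articles and uses data mining algorithms to identify the most suitable ones, as the basis for building a content-based adaptive recommendation system for English reading articles that improves learning efficiency. Vocabulary and sentence structure are important factors in article difficulty. For vocabulary, we compute attributes from the word lists of the GEPT (General English Proficiency Test, 全民英檢) and the Brown University corpus; syllable counts, average sentence length, and the proportion of stop words are also used as classification attributes. For sentence structure, because sentences containing clauses may affect difficulty, we compute attribute values from the features of several clause types, and we use C5.0 as the classification algorithm. Experiments on more than 480 GEPT reading-test articles show that attributes such as GEPT High-Intermediate vocabulary, Flesch-Kincaid Grade Level, and average syllable count have a certain degree of discriminative power for classifying article difficulty.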

Keywords

Recommendation system; C5.0 decision tree; attribute determination; text difficulty
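The surface attributes named in the abstract (average sentence length, average syllables per word, stop-word proportion, and Flesch-Kincaid Grade Level) can be sketched as follows. This is a minimal illustration, not the study's implementation: the function names, the tiny `STOP_WORDS` set, and the vowel-group syllable heuristic are all assumptions, and only the Flesch-Kincaid coefficients are the standard published ones.

```python
import re

# Hypothetical, deliberately tiny stop-word list for illustration only;
# the list actually used in the study is not given.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "in", "and", "on", "it"}


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption): count vowel groups, drop a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def surface_features(text: str) -> dict:
    """Compute simple readability attributes for one English article."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")

    avg_sentence_length = len(words) / len(sentences)
    avg_syllables = sum(count_syllables(w) for w in words) / len(words)
    stop_word_ratio = sum(w.lower() in STOP_WORDS for w in words) / len(words)
    # Standard Flesch-Kincaid Grade Level formula.
    fkgl = 0.39 * avg_sentence_length + 11.8 * avg_syllables - 15.59

    return {
        "avg_sentence_length": avg_sentence_length,
        "avg_syllables_per_word": avg_syllables,
        "stop_word_ratio": stop_word_ratio,
        "flesch_kincaid_grade": fkgl,
    }
```

Calling `surface_features(article_text)` on each article would yield one row of numeric attributes that a decision-tree learner could consume; a dictionary-based syllable counter would likely be more accurate than the heuristic used here.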

Parallel Abstract

The Internet is the fastest-growing medium in the world and contains abundant information. Within this volume of information, however, it is difficult for users to find, on their own, English articles suited to improving their reading comprehension. We therefore examine the influence of each attribute and use data mining algorithms to identify the most important ones. Using these attributes, this study proposes a content-based adaptive article recommendation system that recommends suitable articles to help users improve their English reading comprehension. Vocabulary and sentence pattern are the most important factors influencing the difficulty of an article. To examine the influence of vocabulary, we use the GEPT word lists and the Brown corpus to determine the vocabulary attributes. In addition, attributes such as the average number of syllables per word, the average sentence length, and the percentage of stop words are used. For the sentence pattern factor, we count the occurrences of different clause types in an article as attributes, and the C5.0 decision tree is used for article classification. In the experiments, over 480 GEPT reading test articles are used as training and testing samples. The experimental results show that, among the attributes we use, GEPT medium-level vocabulary, Flesch-Kincaid Grade Level, and the average number of syllables per word have a notable influence on article classification.

Parallel Keywords

Recommendation System; C5.0 Decision Tree; Attribute Determining; Text Difficulty
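To illustrate what "attribute determining" with a decision tree can look like, the sketch below scores one numeric attribute by gain ratio for a single threshold split. Gain ratio is the criterion generally associated with C4.5-family learners such as C5.0; whether the study used this exact criterion or these settings is an assumption, and the function names and the toy values in the usage note are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def gain_ratio(values, labels, threshold):
    """Gain ratio of splitting a numeric attribute at `threshold` (x <= threshold vs. x > threshold)."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0  # a split that sends every sample to one side carries no information

    total = len(labels)
    weights = [len(left) / total, len(right) / total]
    # Information gain: entropy before the split minus weighted entropy after it.
    gain = entropy(labels) - (weights[0] * entropy(left) + weights[1] * entropy(right))
    # Split information penalizes very unbalanced partitions.
    split_info = -sum(w * math.log2(w) for w in weights)
    return gain / split_info
```

For example, with toy values `[1, 2, 3, 4]` and labels `["easy", "easy", "hard", "hard"]`, a threshold of 2 separates the classes perfectly and scores higher than a threshold of 1. Attributes whose best split scores highest would be the ones a tree places near its root.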

Cited By

Lin, Y. H. (2016). 詞表告訴了我們什麼？—以詞義及難度分級檢驗現存英文詞表 [master's thesis, National Taiwan University]. Airiti Library. https://doi.org/10.6342/NTU201610020

薛仱芸 (2014). 改善網路操弄評論分類績效之研究 [master's thesis, Chaoyang University of Technology]. Airiti Library. https://www.airitilibrary.com/Article/Detail?DocID=U0078-0905201416542666

莊閔茹 (2016). 高低能力國中生理解順向及非順向時序英文對話性文本-來自眼動的證據 [master's thesis, National Chiao Tung University]. Airiti Library. https://www.airitilibrary.com/Article/Detail?DocID=U0030-0803201714401609
