網際網路是成長最快的傳播媒體,包含豐富多樣的資訊,但在龐大的網路資訊中,使用者若要從網路中搜尋適合自己的英語學習的文章進行閱讀是一件不容易的事。因此,本研究探討影響英文文章難易度之各種屬性的重要性,以資料探勘之演算法來找出最適當的屬性;作為建置以內容為基礎之適性化英文閱讀文章推薦系統之依據,來提高使用者的學習效率。 字彙和句型結構是影響文章難易度的重要因素,字彙部分我們利用全民英檢之字庫及美國布朗大學字庫來計算字彙結構之屬性,另外,音節數、平均句長、冗餘字比例等特徵亦用來做為文章分類的屬性,句型結構方面,考量包含子句之句型可能對文章難易度造成影響,因此我們根據多種子句之特徵計算相關屬性值為分類依據,並以C5.0作為分類演算法。我們利用480多篇全民英檢閱讀測驗文章做為實驗樣本,經實驗證明在英文文章難度的分類上,全民英檢中高級、Flesch-Kincaid Grade Level、平均音節數等屬性具有一定的判別度。
The Internet is the fastest-growing communication medium and contains abundant, diverse information. Amid this volume of information, however, it is difficult for users to find English articles suited to their own level for reading practice. This study therefore examines the importance of the attributes that affect the difficulty of an English article and uses data mining algorithms to identify the most appropriate ones. These attributes serve as the basis for a content-based adaptive English reading article recommendation system intended to improve users' learning efficiency. Vocabulary and sentence structure are important factors in the difficulty of an article. For vocabulary, we use the GEPT (General English Proficiency Test) word lists and the Brown corpus to compute vocabulary attributes. In addition, features such as the average number of syllables per word, the average sentence length, and the percentage of stop words are used as classification attributes. For sentence structure, because sentences that contain clauses may affect difficulty, we compute attribute values from the characteristics of several clause types. The C5.0 decision tree is used as the classification algorithm. In the experiments, more than 480 GEPT reading test articles are used as training and testing samples. The experimental results show that, among the attributes we use, GEPT high-intermediate level vocabulary, Flesch-Kincaid Grade Level, and the average number of syllables per word have a measurable degree of discriminative power in classifying article difficulty.
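As an illustration of the surface features named above, the following minimal sketch computes the average sentence length, the average number of syllables per word, the stop-word percentage, and the Flesch-Kincaid Grade Level, using the standard formula 0.39 × (words / sentences) + 11.8 × (syllables / words) − 15.59. The function names, the vowel-group syllable heuristic, the tokenization rules, and the stop-word set are assumptions made for this example; the abstract does not specify how the study computed these values.

```python
import re

# Runs of vowels approximate syllable nuclei (a rough heuristic, assumed here).
VOWEL_GROUP = re.compile(r"[aeiouy]+")


def count_syllables(word):
    """Estimate the syllables in one English word by counting vowel groups."""
    word = word.lower()
    count = len(VOWEL_GROUP.findall(word))
    # Treat a trailing 'e' as silent, except in '-le' endings such as 'table'.
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def readability_features(text, stop_words):
    """Return surface readability features for an English text.

    stop_words is a caller-supplied set of lowercase words (hypothetical;
    the study's actual stop-word list is not given in the abstract).
    """
    # Naive sentence and word splitting, adequate only for clean prose.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")

    syllables = sum(count_syllables(w) for w in words)
    avg_sentence_length = len(words) / len(sentences)
    avg_syllables_per_word = syllables / len(words)
    stop_word_ratio = sum(w.lower() in stop_words for w in words) / len(words)

    # Standard Flesch-Kincaid Grade Level formula.
    fk_grade = 0.39 * avg_sentence_length + 11.8 * avg_syllables_per_word - 15.59

    return {
        "avg_sentence_length": avg_sentence_length,
        "avg_syllables_per_word": avg_syllables_per_word,
        "stop_word_ratio": stop_word_ratio,
        "flesch_kincaid_grade": fk_grade,
    }


if __name__ == "__main__":
    sample = "The cat sat on the mat. It was happy."
    print(readability_features(sample, {"the", "on", "it", "was"}))
```

A feature dictionary of this kind, computed per article, could then be passed to a decision tree learner as one row of the training table. The syllable counter is only approximate, so a pronunciation dictionary would likely give more reliable values.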