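The purpose of this paper is to design an adaptive English article recommendation system. Based on how the General English Proficiency Test (GEPT) reading tests define article difficulty, we identify the attributes that most strongly affect difficulty, use those attributes to classify the large number of English web articles collected by a web agent, and recommend suitable articles according to each user's English reading ability, so as to improve learning efficiency. For the implementation, we use web agent technology: a robot program collects a large number of English articles from the Internet every day and processes them to obtain a feature vector for each article. Articles are then recommended according to each user's profile, so that users can always read new articles from the web that match their reading ability. To improve accuracy and classification efficiency, we look for the attributes that matter most when classifying article difficulty. In our experiments we used sixty articles each from the GEPT elementary, intermediate, and high-intermediate reading tests. For the feature vector we considered eight attributes for judging the difficulty of each article: the proportions of words in the article that belong to the GEPT-defined elementary, intermediate, and high-intermediate word lists; the average sentence length; the average number of syllables; the proportion of words in the Brown Corpus 1-2000 band; the proportion of words in the Brown Corpus 2000-5000 band; and the RE value. For classification we used the C5.0 decision tree algorithm and discriminant analysis, and verified the results with a back-propagation neural network.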
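As an illustration of how such a feature vector might be computed, the following is a minimal sketch and not the system's actual implementation. It assumes that the RE value refers to the Flesch Reading Ease score, that syllables can be approximated by counting vowel groups, and that each word list (GEPT levels, Brown Corpus bands) is available as a plain set of lowercase words. The names `count_syllables`, `extract_features`, and `word_lists` are hypothetical.

```python
import re

VOWEL_GROUP = re.compile(r"[aeiouy]+")


def count_syllables(word):
    """Rough heuristic (an assumption): one syllable per run of vowels, minimum one."""
    return max(1, len(VOWEL_GROUP.findall(word.lower())))


def extract_features(text, word_lists):
    """Return a dict of difficulty features for one English article.

    word_lists maps a label (e.g. "gept_elementary") to a set of lowercase
    words; the real GEPT and Brown Corpus lists are not reproduced here.
    """
    # Naive sentence and word segmentation, sufficient for a sketch.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text.lower())
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")

    avg_sentence_length = len(words) / len(sentences)
    avg_syllables = sum(count_syllables(w) for w in words) / len(words)

    features = {
        "avg_sentence_length": avg_sentence_length,
        "avg_syllables_per_word": avg_syllables,
        # Flesch Reading Ease, assumed here to be what "RE value" denotes.
        "reading_ease": 206.835 - 1.015 * avg_sentence_length - 84.6 * avg_syllables,
    }
    # One ratio per word list: the share of tokens that appear in that list.
    for label, vocabulary in word_lists.items():
        features[label + "_ratio"] = sum(w in vocabulary for w in words) / len(words)
    return features
```

With five word lists supplied (three GEPT levels and two Brown Corpus bands), the resulting dictionary holds eight values, which could then be passed to a decision tree or another classifier.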
The Internet is a fast-growing medium, growing faster than any other medium in human history. According to an ITU study, the Internet took only 4 years to reach 50 million users, whereas television took 13 years to reach the same number. For a user who wants to improve his or her English comprehension via the Internet, the good news is that numerous articles are available every day for free; the bad news is that, given the tremendous amount of information, finding suitable articles for learning has become a tedious task. This paper proposes an architecture for an adaptive English auxiliary learning system based on information retrieval techniques. It comprises four components: an English document search agent, user profiles, a document recommendation agent, and an auxiliary document reading interface. With the proposed architecture, users can improve their reading comprehension adaptively, according to their own level of English comprehension. The experimental results show that the classification process using the vector model in the proposed system performs well in comparison with the classification given by the original web site.
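To make the vector-model classification more concrete, the sketch below assigns a document to the category whose example text is closest in cosine similarity over raw term-frequency vectors. This is one common form of the vector model and is offered under assumptions: the abstract does not specify the term weighting or the similarity measure, and the names `term_vector`, `cosine`, `classify`, and `category_examples` are hypothetical.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words term-frequency vector (whitespace tokenization, an assumption)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def classify(text, category_examples):
    """Return the category whose example text is most similar to `text`.

    category_examples maps a category name to representative text for it.
    """
    doc = term_vector(text)
    return max(
        category_examples,
        key=lambda name: cosine(doc, term_vector(category_examples[name])),
    )
```

The same similarity function could in principle compare an article vector against a user profile vector when recommending documents, although the abstract does not state how the recommendation agent does this.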