  • Thesis


Peer evaluation system for open-ended questions based on pairwise comparisons

Advisor: 陳和麟
Co-advisor: 葉丙成 (Ping-Cheng Yeh)


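In recent years, the rise of MOOCs has set off a new revolution in learning. Online learning platforms of all kinds keep emerging and growing rapidly, whether they let students preview material before class, increase participation during class, or boost students' motivation to review after class. We found that these platforms share a common problem: how should open-ended questions be graded?

Some systems use complex mechanisms: peer evaluation, review, self-evaluation, and arbitration. Some rely on anonymous peer evaluation by large numbers of students. Others simply offer only questions that have standard answers. Yet we could not find a system that is both backed by theory and fast and convenient, one that teachers in traditional classrooms can pick up quickly in order to join the era of e-learning.

With speed, convenience, and effectiveness as its core principles, this study builds an online education platform, 師暢, intended to bring a different kind of change to traditional education. Unlike most education platforms, which support only questions with "standard answers", we propose an algorithm that automatically grades "open-ended questions" through "peer evaluation" and "pairwise comparison". We hope that peer evaluation raises the cognitive level students reach and gives them more opportunities to practice critical thinking, and that automatic grading lightens teachers' workload so that they can focus on designing lesson plans and questions instead of grading. While grading, the system also computes each student's "evaluation ability" and reports it to the teacher. This helps the teacher understand each student's learning status and assess whether students have integrated the knowledge and attained the highest-level learning objective in the cognitive domain: evaluation ability.

Because other platforms lack strong theoretical support, we bring in theory from several fields to implement the algorithm behind 師暢. We combine pairwise comparison algorithms commonly used in recommender systems and data mining with active learning methods from machine learning to improve the algorithm's accuracy. Working with a school teacher, we showed that the system's automatic scores are highly correlated with the teacher's scores, with a correlation coefficient of about 0.9. In this experiment, 師暢 also helped the teacher catch some grading mistakes and notice inconsistencies in the grading criteria, which led the teacher to correct previously assigned scores.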


In recent years, MOOCs and online education have changed education considerably. Many people regard MOOCs as a revolution, since anyone with an Internet connection can learn. However, online courses face a new problem: how can open-ended questions be graded automatically and accurately? Many online education platforms try to deal with this problem. Some use peer assessment, which often leads to inaccurate grades and low-quality feedback [1]. Some develop complex systems involving peer review, self-evaluation, multiple rounds of evaluation, anonymous feedback, and so on. Some can be applied only to objective questions. Currently, however, no education platform seems to offer a simple but effective way to deal with the problem. In this thesis, several theorems are proposed and an online education platform named PK-Grader is developed. The name means auto-grading by the results of PK. PK-Grader grades open-ended questions through ordinal peer evaluations and generates not only a score for each answer but also the evaluation ability of each student. It also allows teachers to better understand their students and to know whether they have really grasped the concept and reached a higher category in Bloom's cognitive domain: evaluation ability. Our algorithm draws on theorems from several fields and combines pairwise comparison algorithms, active learning methods, and probability models. We verified the auto-grading algorithm by testing it with high school students and found a high correlation between the scores from our system and the scores from the teacher (the correlation coefficient is about 0.9). PK-Grader also enabled the teacher to discover that some of the original scores were wrong and helped the teacher evaluate the assignment more accurately.
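As a rough illustration of the idea of turning pairwise peer judgments into scores and checking them against a teacher's grades, the sketch below fits a Bradley-Terry model and computes a Pearson correlation. This is not the PK-Grader algorithm: the abstract does not name the model it uses, so Bradley-Terry is an assumption chosen only because it is a common pairwise comparison model. The function names, the `prior` smoothing term, and all data are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def bradley_terry(comparisons, n_items, iters=200, prior=0.1):
    """Estimate one strength per answer from (winner, loser) index pairs.

    Uses the standard minorization-maximization update for the
    Bradley-Terry model. `prior` is an ad hoc pseudo-win added to every
    answer so that strengths stay positive (an assumption of this sketch).
    Assumes winner != loser in every pair.
    """
    wins = [prior] * n_items
    pair_counts = defaultdict(int)  # unordered pair -> number of comparisons
    for winner, loser in comparisons:
        wins[winner] += 1
        pair_counts[frozenset((winner, loser))] += 1

    strength = [1.0 / n_items] * n_items
    for _ in range(iters):
        denom = [0.0] * n_items
        for pair, count in pair_counts.items():
            i, j = tuple(pair)
            shared = count / (strength[i] + strength[j])
            denom[i] += shared
            denom[j] += shared
        # Answers that were never compared keep their current strength.
        updated = [
            wins[k] / denom[k] if denom[k] > 0 else strength[k]
            for k in range(n_items)
        ]
        total = sum(updated)
        strength = [s / total for s in updated]  # normalize to sum to 1
    return strength


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


# Hypothetical data: three answers, eight peer judgments (winner, loser).
peer_judgments = [(0, 1), (0, 1), (1, 0), (0, 2), (0, 2), (1, 2), (1, 2), (2, 1)]
teacher_scores = [90, 75, 60]  # made-up teacher grades for the same answers

system_scores = bradley_terry(peer_judgments, n_items=3)
print(system_scores)
print(pearson(system_scores, teacher_scores))
```

The sketch treats every student's judgment as equally reliable. The system described above also estimates each student's evaluation ability, which this example does not attempt to model.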


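[7] 陳豐祥, "The theoretical content of the newly revised Bloom's cognitive domain objectives and its application in history teaching" (in Chinese), 歷史教育, 2009.
[5] 沈慶珩 and 黃信義, "Application of online peer assessment on the Moodle system" (in Chinese), 教育資料與圖書館學, vol. 43, no. 3, pp. 267-284, 2006.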
[10] W. Barnett, "The modern theory of consumer behavior: Ordinal or cardinal?," Quarterly Journal of Austrian Economics, vol. 6, no. 1, pp. 41-65, 2003.
[12] K. Topping, "Peer assessment between students in colleges and universities," Review of Educational Research, vol. 68, no. 3, pp. 249-276, Fall 1998.
[13] K. J. Topping, "Peer assessment," Theory Into Practice, vol. 48, no. 1, pp. 20-27, 2009.
