
Study on Inter-rater Reliability under the Latent Trait Model through Derangement

Advisor: 陳宏

Abstract


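This study examines the inter-rater reliability of raters' grading of examinees' latent traits when a large group of examinees is assigned to raters by derangement. Under the latent trait model (LTM), the polychoric correlation coefficient is used as the measure of inter-rater reliability. We argue that assigning about 300,000 examinees to a few hundred raters by derangement ensures that any two raters who grade examinees in common share at least a hundred of them. Under this setting, all raters fall into several cycle groups. Analytic arguments and 100 simulation runs show that the number of cycle groups is mostly no more than ten, that the proportion of runs with at least one 2-cycle or 3-cycle is 0.59, and that cycle groups containing more than 100 raters occur frequently. Kolmogorov-Smirnov tests show that the latent-trait distribution of the examinees assigned to each rater is mostly consistent with the standard normal distribution; only a few groups of examinees differ from the others, and then only in their mean. Under the LTM, we argue that the discrimination parameter can be regarded as an index of a rater's grading accuracy. We also show that the correlation between a rater's grades and the examinees' latent traits depends on the rater's grade thresholds and discrimination parameter. The correlation between two raters' perceived latent variables equals the product of their discrimination parameters, and it is estimated in two stages by the polychoric correlation coefficient. Each rater's grade thresholds are obtained from the proportions of the grades that rater assigns, and the discrimination parameters are derived from the polychoric correlations together with a suitable derangement assignment. Finally, we summarize the results of the study and offer recommendations.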
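The cycle structure produced by a derangement can be explored with a small simulation. The sketch below assumes one plausible reading of the design, not necessarily the thesis's exact scheme: examinees are split into equal batches, rater i grades batch i first, and the second reading of batch i goes to rater σ(i), where σ is a uniformly random derangement (so no rater rereads their own batch). Rater i and rater σ(i) then share a whole batch, and the raters fall into the cycles of σ. The derangement is drawn by rejection sampling (shuffle until there is no fixed point); the function names and the default of 300 raters are hypothetical. Under these assumptions, 300,000 examinees over 300 raters would give each linked pair 1,000 shared examinees.

```python
import random


def random_derangement(n, rng):
    """Uniform random derangement of range(n) via rejection sampling.

    About 1/e of all permutations have no fixed point, so on average
    roughly e (about 2.7) shuffles are needed.
    """
    if n < 2:
        raise ValueError("a derangement needs at least 2 elements")
    perm = list(range(n))
    while True:
        rng.shuffle(perm)  # any shuffle of any arrangement is uniform
        if all(perm[i] != i for i in range(n)):
            return perm


def cycle_lengths(perm):
    """Cycle lengths of i -> perm[i], listed in order of each cycle's smallest element."""
    seen = [False] * len(perm)
    lengths = []
    for start in range(len(perm)):
        if seen[start]:
            continue
        length, i = 0, start
        while not seen[i]:
            seen[i] = True
            i = perm[i]
            length += 1
        lengths.append(length)
    return lengths


def simulate(n_raters=300, n_runs=100, seed=0):
    """Cycle statistics of random derangements (hypothetical default sizes)."""
    rng = random.Random(seed)
    n_cycles, largest, short_runs = [], [], 0
    for _ in range(n_runs):
        lengths = cycle_lengths(random_derangement(n_raters, rng))
        n_cycles.append(len(lengths))
        largest.append(max(lengths))
        short_runs += any(k in (2, 3) for k in lengths)  # at least one 2- or 3-cycle
    return n_cycles, largest, short_runs / n_runs


if __name__ == "__main__":
    n_cycles, largest, frac_short = simulate()
    print("runs with at most 10 cycles:", sum(c <= 10 for c in n_cycles))
    print("fraction of runs with a 2- or 3-cycle:", frac_short)
    print("runs whose largest cycle exceeds 100 raters:", sum(m > 100 for m in largest))
```

As a sanity check on such simulations: for a uniform random derangement of many raters, the numbers of 2-cycles and 3-cycles are approximately independent Poisson(1/2) and Poisson(1/3), so the probability of at least one such cycle tends to 1 - e^(-5/6), about 0.57, and an estimate from 100 runs has a standard error of roughly 0.05.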

Parallel Abstract (English)


We investigate inter-rater reliability when the abilities of a large number of examinees are classified into ordinal grades by raters to whom the examinees are assigned through derangement. Assuming the latent trait model (LTM), we use the polychoric correlation coefficient as the measure of inter-rater reliability. To ensure that at least a hundred examinees are graded by each linked pair of raters when there are a few hundred raters and about three hundred thousand examinees, we consider assigning examinees to raters through derangement. Under this setting, all raters are grouped into several cycles. Through analytic arguments and simulation, we find that the number of cycles is often no more than ten, the probability of getting at least one cycle of size 2 or 3 is close to 0.59, and the size of the largest cycle often exceeds one hundred. We also find that the distributions of the latent traits of the examinees assigned to different raters are close to each other up to a location shift. Under the LTM, the discrimination parameter can be regarded as a measure of rating accuracy. The correlation between the grades given by a rater and the examinees' latent traits depends on the interaction between the rater's thresholds and discrimination parameter. The correlation between the perceived latent trait variables of two raters is the product of their discrimination parameters, and it can be estimated by the polychoric correlation coefficient using a two-stage method. Each rater's thresholds are estimated from the proportions of the grades that rater gives, while the discrimination parameters can be estimated through an appropriate derangement. Finally, we summarize the results and offer some suggestions.
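A two-stage polychoric estimate of the kind mentioned above can be sketched in plain Python. Stage 1 takes each rater's thresholds from the cumulative proportions of their grades (the inverse standard normal CDF of the share of grades at or below each level); stage 2 holds those thresholds fixed and maximizes the bivariate-normal likelihood of the two raters' cross-classified grades over the correlation rho. The numerical choices here (Simpson's rule for the bivariate normal CDF, a golden-section search over [-0.99, 0.99], and every function name) are illustrative assumptions rather than the thesis's implementation. The last function assumes corr(Y_i, Y_j) = a_i * a_j with all three correlations positive, and shows how three pairwise correlations (for example, among three raters who are pairwise linked) would determine the three discrimination parameters a_1, a_2, a_3.

```python
from math import inf, log, sqrt
from statistics import NormalDist

STD = NormalDist()  # standard normal: pdf, cdf, inv_cdf


def thresholds(counts):
    """Stage 1: tau_k = Phi^{-1}(share of grades <= k); assumes every grade is used."""
    total, cum, taus = sum(counts), 0, []
    for c in counts[:-1]:
        cum += c
        taus.append(STD.inv_cdf(cum / total))
    return taus


def bvn_cdf(a, b, rho, steps=200):
    """P(X <= a, Y <= b) for a standard bivariate normal with correlation rho.

    Simpson's rule on the integral of phi(x) * Phi((b - rho*x) / sqrt(1 - rho^2))
    over -8 <= x <= min(a, 8); mass beyond |x| = 8 is ignored. steps must be even.
    """
    hi = min(a, 8.0)
    if hi <= -8.0 or b == -inf:
        return 0.0
    s, h, total = sqrt(1.0 - rho * rho), (hi + 8.0) / steps, 0.0
    for i in range(steps + 1):
        x = -8.0 + i * h
        w = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += w * STD.pdf(x) * STD.cdf((b - rho * x) / s)
    return total * h / 3.0


def cell_probs(taus, ups, rho, steps=200):
    """Model probability of every (grade by rater 1, grade by rater 2) cell."""
    xs, ys = [-inf, *taus, inf], [-inf, *ups, inf]
    F = [[bvn_cdf(x, y, rho, steps) for y in ys] for x in xs]
    return [[F[i][j] - F[i - 1][j] - F[i][j - 1] + F[i - 1][j - 1]
             for j in range(1, len(ys))] for i in range(1, len(xs))]


def golden_max(f, lo, hi, tol=1e-4):
    """Golden-section search for the maximizer of a unimodal f on [lo, hi]."""
    g = (sqrt(5.0) - 1.0) / 2.0
    c, d = hi - g * (hi - lo), lo + g * (hi - lo)
    fc, fd = f(c), f(d)
    while hi - lo > tol:
        if fc > fd:
            hi, d, fd = d, c, fc
            c = hi - g * (hi - lo)
            fc = f(c)
        else:
            lo, c, fc = c, d, fd
            d = lo + g * (hi - lo)
            fd = f(d)
    return (lo + hi) / 2.0


def polychoric(table):
    """Two-stage polychoric correlation of a K x L table of grade counts."""
    taus = thresholds([sum(row) for row in table])        # stage 1, rater 1 (rows)
    ups = thresholds([sum(col) for col in zip(*table)])   # stage 1, rater 2 (columns)

    def loglik(rho):                                      # stage 2 objective
        probs = cell_probs(taus, ups, rho)
        # max(...) guards against log(0) from round-off in near-empty cells
        return sum(n * log(max(p, 1e-300))
                   for row_n, row_p in zip(table, probs)
                   for n, p in zip(row_n, row_p))

    return golden_max(loglik, -0.99, 0.99)


def triad_discriminations(r12, r13, r23):
    """If corr(Y_i, Y_j) = a_i * a_j (all positive), recover a_1, a_2, a_3."""
    return (sqrt(r12 * r13 / r23), sqrt(r12 * r23 / r13), sqrt(r13 * r23 / r12))
```

Simpson's rule keeps the sketch dependency-free; a dedicated bivariate normal CDF routine would be faster and more accurate in practice. Feeding estimated polychoric correlations into `triad_discriminations` gives estimated discrimination parameters, subject to the same product-form assumption.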

