
Assessment of Differential Item Functioning for Dichotomous Items with Random Item Effects via the Mantel-Haenszel and Logistic Regression Methods

Advisor: 蘇雅蕙

Abstract


In practice, item response theory (IRT) usually treats item difficulty parameters as fixed effects and person ability parameters as random effects, although in theory item difficulty parameters can also be regarded as random effects. Most previous studies on differential item functioning (DIF) detection have treated item difficulty parameters as fixed effects. A few studies have assessed DIF with item parameters treated as random effects, but they did not adequately reflect realistic testing conditions and are hard to apply in practice: they manipulated only DIF patterns in which every DIF item favored the focal group, used low proportions of DIF items (about 25%), and relied on DIF detection methods that require computationally demanding parameter estimation. The purpose of this study was therefore to examine, under random item effects, the performance of DIF detection methods commonly used in practice, namely the Mantel-Haenszel method (MH; Holland & Thayer, 1988; Mantel & Haenszel, 1959) and the Logistic Regression method (LR; Swaminathan & Rogers, 1990), and to compare the results with those obtained under fixed item effects. The results showed that statistical power was similar for the two types of item effects in most conditions, but when item difficulty parameters were treated as random effects, the Type I error rate deviated from 0.05 more often than under fixed item effects. When the mean item difficulty difference (MIDD) between the two groups was smaller than 0.04, the two types of item effects yielded comparable Type I error rates and power for the conventional (one-stage) DIF detection procedures; when the MIDD exceeded 0.06, the Type I error rate of the conventional procedures was clearly inflated under both types of item effects, but a scale purification procedure kept the Type I error rate close to 0.05 in both cases. This study compared DIF detection under random and fixed item effects simultaneously, with a simulation design that reflected realistic conditions more closely than previous studies, and its results should help researchers understand and interpret DIF detection under random item effects.
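As a hedged illustration only (not the thesis's code), the Mantel-Haenszel statistic referred to above can be sketched in a few lines of Python. The sketch assumes NumPy and SciPy are available, takes NumPy arrays as input, and uses the raw total score as the matching variable; it returns the ETS delta measure and the continuity-corrected MH chi-square for a single dichotomous item.

import numpy as np
from scipy.stats import chi2

def mantel_haenszel_dif(item, total, group):
    # item: 0/1 responses to the studied item; total: matching total scores;
    # group: 0 = reference, 1 = focal (all NumPy arrays of equal length)
    num_or = den_or = 0.0            # numerator / denominator of the common odds ratio
    a_obs = a_exp = a_var = 0.0      # observed, expected, variance of A_k (reference correct)
    for k in np.unique(total):
        s = total == k
        A = int(np.sum((group[s] == 0) & (item[s] == 1)))   # reference, correct
        B = int(np.sum((group[s] == 0) & (item[s] == 0)))   # reference, incorrect
        C = int(np.sum((group[s] == 1) & (item[s] == 1)))   # focal, correct
        D = int(np.sum((group[s] == 1) & (item[s] == 0)))   # focal, incorrect
        N = A + B + C + D
        if N < 2 or (A + C) in (0, N) or (A + B) in (0, N):
            continue                                        # skip uninformative strata
        num_or += A * D / N
        den_or += B * C / N
        a_obs += A
        a_exp += (A + B) * (A + C) / N
        a_var += (A + B) * (C + D) * (A + C) * (B + D) / (N ** 2 * (N - 1))
    alpha_mh = num_or / den_or                              # common odds-ratio estimate
    delta_mh = -2.35 * np.log(alpha_mh)                     # ETS delta metric
    chi_sq = (abs(a_obs - a_exp) - 0.5) ** 2 / a_var        # continuity-corrected MH chi-square
    return delta_mh, chi_sq, chi2.sf(chi_sq, df=1)

Negative delta values indicate DIF against the focal group; in a simulation study such as the one summarized above, the Type I error rate is the proportion of DIF-free items flagged at the 0.05 level.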

Parallel Abstract


It is common practice in item response theory (IRT) to treat items as fixed effects and persons as random effects, although items can, in theory, be treated as random effects as well. Most studies on the assessment of differential item functioning (DIF) treat items as fixed effects. The few studies that treat items as random effects did not manipulate all possible DIF patterns or high proportions of DIF items, and the DIF detection methods they used are not easy for practitioners to implement. Therefore, the aim of this study was to investigate the performance of DIF assessment for dichotomous items with random item effects via the Mantel-Haenszel (Holland & Thayer, 1988; Mantel & Haenszel, 1959) and Logistic Regression (Swaminathan & Rogers, 1990) methods, and to compare the results with those obtained under fixed item effects. The results showed that the power of DIF detection was similar for both types of item effects, but the Type I error rate under random item effects deviated from 0.05 much more often than under fixed item effects. When the mean item difficulty difference (MIDD) was smaller than 0.04, both types of item effects yielded similar Type I error rates and power for one-stage DIF detection; when the MIDD was greater than 0.06, the Type I error rate of one-stage DIF detection was severely inflated for both types of item effects, although scale purification reduced the inflation. This study investigated DIF detection under random and fixed item effects simultaneously, and the simulation conditions were designed to reflect realistic testing situations. The results are therefore expected to facilitate the understanding and interpretation of DIF assessment when items are treated as coming from a distribution in practice.
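The Logistic Regression procedure and the scale purification step mentioned above can likewise be sketched. The Python code below is an assumption-laden illustration rather than the method exactly as implemented in the thesis: statsmodels is assumed, the matching variable is the raw total score, and the studied item is treated like any other item when the score is purified.

import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2

def logistic_regression_dif(item, total, group):
    # Compare a score-only model with a model adding group and score-by-group terms;
    # the 2-df likelihood-ratio test covers uniform and non-uniform DIF jointly.
    score = (total - total.mean()) / total.std()
    X0 = sm.add_constant(score)
    X2 = sm.add_constant(np.column_stack([score, group, score * group]))
    ll0 = sm.Logit(item, X0).fit(disp=0).llf
    ll2 = sm.Logit(item, X2).fit(disp=0).llf
    lr_stat = 2 * (ll2 - ll0)
    return lr_stat, chi2.sf(lr_stat, df=2)

def purified_dif_screen(responses, group, dif_test=logistic_regression_dif,
                        alpha=0.05, max_iter=20):
    # Scale purification: drop flagged items from the matching score and retest
    # until the flagged set stops changing (or an iteration cap is reached).
    # The p-value is assumed to be the last element returned by dif_test.
    n_items = responses.shape[1]
    flagged = np.zeros(n_items, dtype=bool)
    for _ in range(max_iter):
        total = responses[:, ~flagged].sum(axis=1)
        new_flags = np.array([dif_test(responses[:, j], total, group)[-1] < alpha
                              for j in range(n_items)])
        if np.array_equal(new_flags, flagged):
            break
        flagged = new_flags
    return flagged

Testing every item once against the raw total score corresponds to the one-stage (conventional) procedure discussed above, while the iterative loop corresponds to DIF detection with scale purification.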

