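A Study of Multiple-Group Categorical Confirmatory Factor Analysis Models for Detecting Differential Item Functioning in Polytomous Items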

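This study examines how effectively differential item functioning (DIF) in polytomous items can be detected with the robust chi-square difference test under a multiple-group categorical confirmatory factor analysis (CFA) model, using the free-baseline strategy. Simulation experiments investigate the type I error and power of the test under varying sample sizes, between-group differences in mean latent ability (impact), DIF percentages, DIF sizes, DIF types, and types of significance level, in order to understand how these factors affect detection. The results show that, overall, the robust chi-square difference test detects DIF items effectively. Power is higher with large samples, with large DIF, and when DIF is present either in the factor loadings only or in both the factor loadings and the thresholds. Neither impact nor DIF percentage has a significant effect on power. The Bonferroni correction is overly conservative, so adopting it is not considered necessary. A comparison with the past literature shows that the free-baseline strategy yields markedly lower type I error than the constrained-baseline strategy, although the constrained-baseline strategy attains acceptable type I error once the significance level is adjusted with Oort's approach. With or without that adjustment, the constrained-baseline strategy has better power on average than the free-baseline strategy. Finally, neither treating polytomous item data as discrete nor screening out poorly fitting models before DIF detection increases power.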

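Keywords

differential item functioning; multiple-group categorical confirmatory factor analysis model; robust chi-square difference test; free-baseline strategy; Bonferroni correction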

Parallel Abstract

The aim of this study is to assess the effectiveness of multiple-group categorical CFA with the robust chi-square difference test in detecting DIF in polytomous items under the free-baseline strategy. Simulation studies are conducted to examine the empirical type I error and power of DIF detection, and the effects of five factors are investigated: sample size, impact, DIF percentage, DIF size, and type of DIF. Based on our results, the robust chi-square difference test is effective in detecting DIF in polytomous items, especially with a large sample size, a large DIF size, and DIF in either the factor loadings alone or both the factor loadings and the thresholds. Moreover, impact and DIF percentage do not seem to make a significant difference in the power of DIF detection. The Bonferroni correction appears to be too conservative and is therefore not recommended. Compared with past studies that used the constrained-baseline strategy, the free-baseline strategy seems to result in smaller type I errors. However, correcting the significance level of the constrained-baseline strategy with Oort’s approach results in acceptable type I error. On average, the constrained-baseline strategy attains higher power than the free-baseline strategy, whether or not Oort’s correction is applied. Furthermore, neither treating polytomous data as discrete rather than continuous nor examining model fit before DIF detection seems to increase the power of DIF detection.
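The two outcome measures named in the abstract, empirical type I error and empirical power, are both rejection rates over simulated replications, and the Bonferroni correction simply divides the significance level by the number of tests. The following is a minimal Python sketch of that bookkeeping only, not the study's simulation design. The replication count, the number of items, and the way p-values are generated (uniform for a DIF-free item, and an arbitrary `u ** 4` stand-in for a DIF item) are assumptions made for illustration. No CFA model is fitted here.

```python
import random


def rejection_rate(p_values, alpha):
    """Proportion of replications whose p-value is below alpha."""
    return sum(p < alpha for p in p_values) / len(p_values)


def bonferroni_alpha(alpha, n_tests):
    """Per-test significance level under the Bonferroni correction."""
    return alpha / n_tests


# All settings below are hypothetical, chosen only for illustration.
rng = random.Random(2024)
n_reps = 20000   # simulated replications
n_items = 10     # number of items tested for DIF
alpha = 0.05     # nominal significance level

# DIF-free item: p-values of a well-calibrated test are uniform on (0, 1).
null_p = [rng.random() for _ in range(n_reps)]
# DIF item: p-values pile up near zero; u ** 4 is an arbitrary stand-in,
# not the distribution produced by the robust chi-square difference test.
dif_p = [rng.random() ** 4 for _ in range(n_reps)]

adjusted = bonferroni_alpha(alpha, n_items)
type1_nominal = rejection_rate(null_p, alpha)
type1_bonferroni = rejection_rate(null_p, adjusted)
power_nominal = rejection_rate(dif_p, alpha)
power_bonferroni = rejection_rate(dif_p, adjusted)

print(f"type I error: nominal {type1_nominal:.3f}, Bonferroni {type1_bonferroni:.3f}")
print(f"power:        nominal {power_nominal:.3f}, Bonferroni {power_bonferroni:.3f}")
```

With ten tests the adjusted level is 0.05 / 10 = 0.005, so the rejection rate falls for DIF-free and DIF items alike. This is the mechanism behind a correction being described as conservative: the lower type I error is paid for with lower power.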

Parallel Keywords

DIF; multiple-group categorical CFA; robust chi-square difference test; free baseline strategy; Bonferroni correction
