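An Investigation of the Effectiveness of Differential Item Functioning Detection within the Cognitive Diagnostic Measurement Framework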

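Testing for differential item functioning (DIF) has come to be regarded as an important step in test development. As cognitive diagnostic assessment continues to attract attention in both practice and methodological research, DIF issues within the cognitive diagnostic measurement framework cannot be ignored. This study has three aims. First, it proposes model-based DIF detection methods for handling compensatory and non-compensatory data under the cognitive diagnostic assessment framework. Second, it focuses on an issue neglected in previous DIF research within this framework: contamination of the test by biased items. Third, it examines more systematically the factors that may affect the performance of DIF detection methods and incorporates them into the simulation design. A Markov chain Monte Carlo (MCMC) algorithm is used to estimate the parameters of the two proposed models, and parameter recovery is compared. Under different testing conditions, the study also examines the Type I error rates and statistical power of the model-based DIF detection methods and of the nonparametric MH and LR DIF detection methods. In addition, a purification procedure is added to the MH and LR methods, and the study examines whether it improves DIF detection. Finally, the fourth-grade mathematics assessment of the 2007 Trends in International Mathematics and Science Study (TIMSS) is used as an example to illustrate how the proposed methods can be applied in practice. The results show that parameter recovery for the two proposed model-based methods is very good. In the comparison of methods, under the same testing conditions the model-based methods outperform MH and LR in both Type I error control and statistical power. The simulation results further show that, with cognitive diagnostic data, running DIF detection on a contaminated test without purification degrades detection and leads to wrong conclusions. Adding the purification procedure helps improve the Type I error control and power of MH and LR under specific conditions. Even with purification, however, these two methods do not resolve the inflation of Type I error rates that arises when the mean ability distributions of the examinee groups differ greatly. Finally, compared with MH and LR, the proposed model-based methods allow a more fine-grained interpretation of DIF results and, through model extension, offer the prospect of identifying possible causes of DIF.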
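As background for the compensatory and non-compensatory data mentioned above, the standard item response functions of the DINA (non-compensatory) and DINO (compensatory) models named in this record's keywords can be written as below. This is only the textbook form, using conventional symbols that are assumptions here rather than the author's notation; the restricted higher-order reparameterization and the DIF parameters of the proposed models are not stated in the abstract and are not reproduced.

```latex
% DINA: examinee i needs ALL attributes required by item j
\eta_{ij} = \prod_{k=1}^{K} \alpha_{ik}^{\,q_{jk}}, \qquad
P(X_{ij}=1 \mid \boldsymbol{\alpha}_i) = g_j^{\,1-\eta_{ij}} \,(1-s_j)^{\eta_{ij}}

% DINO: examinee i needs AT LEAST ONE attribute required by item j
\omega_{ij} = 1 - \prod_{k=1}^{K} (1-\alpha_{ik})^{q_{jk}}, \qquad
P(X_{ij}=1 \mid \boldsymbol{\alpha}_i) = g_j^{\,1-\omega_{ij}} \,(1-s_j)^{\omega_{ij}}
```

Here \alpha_{ik} indicates whether examinee i has mastered attribute k, q_{jk} is the Q-matrix entry for item j and attribute k, and g_j and s_j are the guessing and slip parameters of item j. In a model-based approach, DIF on an item would typically show up as group differences in such item parameters after conditioning on the attribute profile.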

Keywords

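Cognitive diagnostic measurement; differential item functioning detection; restricted higher-order reparameterized DINA model; restricted higher-order reparameterized DINO model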

Parallel Abstract

Detection of differential item functioning (DIF) is recognized as an important procedure in test development. As cognitive diagnostic measurement (CDM) continues to receive attention in both applied and methodological studies, DIF-related issues within the CDM framework remain a concern. This study had three objectives: first, to propose a model-based DIF detection method for compensatory and non-compensatory cognitive diagnostic data; second, to address the contaminated matching criterion, an issue overlooked in past DIF studies within the CDM framework; and third, to investigate additional factors that may affect DIF detection methods and to build them into the simulation design. An MCMC algorithm employing Gibbs sampling was used to estimate the two proposed models, and a simulation study examined parameter recovery, Type I error rates, and power under different testing conditions. For DIF detection, the model-based method was compared with the MH and LR methods. Furthermore, a purification procedure was applied to the MH and LR methods, and the purified methods were compared with the model-based method to assess the effectiveness of each approach. Finally, the TIMSS 2007 fourth-grade mathematics assessment was used to illustrate the implementation of the new method. Parameter recovery for the proposed models was good. The simulation results indicated that the model-based method outperformed the MH and LR methods in Type I error control and power under comparable testing conditions. Moreover, the results revealed that a biased matching criterion can undermine the effectiveness of DIF detection within the CDM framework. The purification procedure improved the Type I error rates and power of MH and LR under specific circumstances. Finally, the model-based method allowed a more detailed interpretation of results than the other DIF methods.
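To make the purification idea concrete, the following is a minimal sketch, assuming that MH refers to the Mantel-Haenszel procedure with the total score as the matching criterion. The function names `mh_statistic` and `mh_with_purification`, the 0/1 group coding, and the iteration cap are illustrative assumptions, not the study's actual implementation.

```python
import numpy as np

def mh_statistic(responses, group, item, matching_items):
    """Mantel-Haenszel chi-square (continuity-corrected) and common odds ratio.

    responses: (N, J) array of 0/1 scores; group: (N,) array, 0 = reference,
    1 = focal; matching_items: item indices summed to form the matching score.
    """
    score = responses[:, matching_items].sum(axis=1)
    y = responses[:, item]
    a_sum = e_sum = v_sum = num = den = 0.0
    for s in np.unique(score):
        ref = (score == s) & (group == 0)
        foc = (score == s) & (group == 1)
        n_r, n_f = ref.sum(), foc.sum()
        n = n_r + n_f
        if n_r == 0 or n_f == 0 or n < 2:
            continue  # stratum carries no information about group differences
        a = y[ref].sum()  # reference group, correct
        b = n_r - a       # reference group, incorrect
        c = y[foc].sum()  # focal group, correct
        d = n_f - c       # focal group, incorrect
        m1, m0 = a + c, b + d
        a_sum += a
        e_sum += n_r * m1 / n                              # E[a] under no DIF
        v_sum += n_r * n_f * m1 * m0 / (n ** 2 * (n - 1))  # Var[a] under no DIF
        num += a * d / n
        den += b * c / n
    # Continuity correction, truncated at zero so that A == E gives chi2 == 0.
    chi2 = max(0.0, abs(a_sum - e_sum) - 0.5) ** 2 / v_sum if v_sum > 0 else 0.0
    odds = num / den if den > 0 else float("inf")
    return chi2, odds

def mh_with_purification(responses, group, crit=3.841, max_iter=10):
    """Iteratively drop flagged items from the matching score, then re-test.

    crit is the chi-square critical value for df = 1 at alpha = .05.
    The studied item itself always stays in its own matching score.
    """
    n_items = responses.shape[1]
    flagged = set()
    for _ in range(max_iter):
        new_flagged = set()
        for j in range(n_items):
            anchor = [k for k in range(n_items) if k not in flagged or k == j]
            chi2, _ = mh_statistic(responses, group, j, anchor)
            if chi2 > crit:
                new_flagged.add(j)
        if new_flagged == flagged:  # flagged set is stable, stop iterating
            break
        flagged = new_flagged
    return sorted(flagged)
```

Calling `mh_with_purification(responses, group)` returns the indices of items still flagged once the flagged set stops changing. The point of the loop is that items flagged in one pass no longer contribute to the matching score in the next, which is how purification limits the influence of a contaminated matching criterion.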

Parallel Keywords

Cognitive diagnostic measurement; differential item functioning; restricted higher-order reparameterized DINA model; restricted higher-order reparameterized DINO model

