A Comparison of the Standardization and IRT Methods of Adjusting Pretest Item Statistics Using Realistic Data

標準化法與IRT法於校正預試試題統計值之比較:真實資料研究

Abstract


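In practical pretesting situations, the large-sample requirement for estimating item parameters under IRT models often cannot be met. Although classical item analysis requires far smaller samples than IRT methods, classical item statistics obtained from different samples are not on a common scale and therefore cannot be compared directly. Extending Chang, Hanson, and Harris (2000), this study uses data that are more realistic than those of the earlier study to examine how well the standardization method adjusts pretest item statistics (that is, estimates population item parameters) with small samples, and compares its performance with that of the 1PL and 3PL models. With realistic data simulated from a 50-dimensional MIRT model, the 3PL model estimated population item difficulty and discrimination better than either the 1PL model or the standardization method. The standardization method was better than the 1PL model at estimating population item discrimination, whereas the 1PL model was better than the standardization method at estimating population item difficulty. The methods were also compared using pretest data from a large-scale test; there the 1PL model performed best, the 3PL model performed worst in estimating difficulty, and the standardization method performed worst in estimating discrimination. Because this study considered only a limited number of conditions, conclusions about the standardization method relative to the 1PL and 3PL models should be drawn with caution. Although the standardization method is not more accurate than the IRT methods, its simplicity makes it appear to be a feasible alternative to them.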

Parallel Abstract


The requirement of large sample sizes for calibrating items based on IRT models is not easily met in many practical pretesting situations. Although classical item statistics could be estimated with much smaller samples, the values may not be comparable across different groups of examinees. This study extended Chang, Hanson, and Harris (2000) by further exploring the standardization method and comparing its effectiveness with the one-parameter (1PL) and three-parameter (3PL) logistic IRT models in adjusting pretest item statistics with small sample sizes, using more realistic data than the previous study. Based on the realistic data generated from a 50-dimensional MIRT model, the 3PL model performed better than the 1PL or standardization method in recovering both the population p-values and point biserial correlations. The standardization method outperformed the 1PL model in recovering the population point biserial correlations, but not in recovering the population p-values. The performance of the methods was also evaluated using the real pretest data of a high-stakes test. In terms of recovering the p-values and point biserial correlations for the real data, the 1PL model produced the most satisfactory results. The 3PL model performed worst in terms of recovering the p-values for the real data, and the standardization method performed worst in recovering the point biserial correlations for the real data. Due to the very limited number of conditions studied, one must be cautious about making conclusions about the standardization method relative to IRT methods based on these studies. The standardization method appears to be a viable alternative to IRT methods that may be simpler to implement, although these results do not suggest that it will produce more accurate results.
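The classical statistics named in the abstract can be illustrated with a short sketch. The code below is not the procedure used in the study; it only shows, under stated assumptions, how p-values and point biserial correlations are computed from a 0/1 response matrix, and one common reading of a standardization-style adjustment: averaging an item's conditional proportions correct across levels of a matching score, weighted by an assumed reference-population distribution. The function names, the choice of matching score, and the handling of unobserved score levels are all hypothetical.

```python
import numpy as np


def classical_item_stats(responses):
    """Return (p_values, point_biserials) for a 0/1 matrix of examinees x items.

    The point biserial here is the plain item-total correlation; the item is
    not removed from the total score (a simplifying assumption).
    """
    responses = np.asarray(responses, dtype=float)
    total = responses.sum(axis=1)
    p_values = responses.mean(axis=0)
    point_biserials = np.array(
        [np.corrcoef(responses[:, j], total)[0, 1] for j in range(responses.shape[1])]
    )
    return p_values, point_biserials


def standardized_p_value(item, matching_score, reference_weights):
    """Weighted average of conditional proportions correct (illustrative only).

    reference_weights maps each matching-score level to its assumed proportion
    in the reference population. Levels absent from the sample are skipped and
    the remaining weights are renormalized (a simplification).
    """
    item = np.asarray(item, dtype=float)
    matching_score = np.asarray(matching_score)
    weighted_sum = 0.0
    weight_total = 0.0
    for level, weight in reference_weights.items():
        in_level = matching_score == level
        if in_level.any():
            weighted_sum += weight * item[in_level].mean()
            weight_total += weight
    return weighted_sum / weight_total


# Toy data: 4 examinees, 3 items.
R = np.array([[1, 1, 0], [1, 0, 0], [0, 0, 0], [1, 1, 1]])
p_values, point_biserials = classical_item_stats(R)

# Toy adjustment: the sample is split evenly over two score levels, but the
# assumed reference population has 75% of examinees at the higher level.
adjusted = standardized_p_value(
    item=[1, 0, 1, 1],
    matching_score=[0, 0, 1, 1],
    reference_weights={0: 0.25, 1: 0.75},
)
print(p_values, point_biserials, adjusted)
```

In the toy adjustment the unweighted sample p-value is 0.75, while the reweighted value is 0.875, which shows why statistics from a small, unrepresentative pretest sample may need adjusting before they are compared across groups.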

References


Birnbaum, A. (1968). Statistical theories of mental test scores. Reading, MA: Addison-Wesley.
Chang, S. W., Hanson, B. A., & Harris, D. J. (2000). Annual meeting of the National Council on Measurement in Education. New Orleans.
Davey, T., Nering, M., & Thompson, T. D. (1997). Realistic simulation of item response data. Iowa City, IA: ACT, Inc.
Fraser, C. (1986). NOHARM: A computer program for fitting both unidimensional and multidimensional normal ogive models of latent trait theory. Armidale, New South Wales, Australia: Center for Behavioral Studies, The University of New England.
Hanson, B. A. (1991). A comparison of bivariate smoothing methods in common-item equipercentile equating. Applied Psychological Measurement, 15, 391-408.

Further Reading