The Effect of Computerized Adaptive Pretesting on the Precision of Item Difficulty Estimation

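This study investigates how an adaptive item-selection strategy affects item parameter estimation when samples are small. The precision of item parameters matters because many testing applications, such as computerized adaptive testing (CAT), depend on accurately estimated item parameters. In general, precise estimates require pretesting with a large sample, which is often hard to achieve in practice. This study proposes a computerized adaptive pretest (CAPT) design that gives each examinee items whose difficulty matches his or her ability, with the aim of improving the precision of item parameter estimation. The work comprises two studies. Study 1 examines how the shape of the examinees' ability distribution affects item parameter estimation, in order to offer guidance on selecting examinees under the CAPT design. Study 2 has two parts. Part one presents the CAPT design and compares parameter estimation across pretest designs. Part two examines how the correlation between subjective difficulty and true difficulty affects parameter estimation, and manipulates test length, the number of anchor items, and the difficulty distribution of the anchor items to document CAPT estimation performance under different settings. The results of Study 1 show that overall estimation is similar for normally distributed, uniformly distributed, and multi-group examinees. With normally distributed examinees, however, medium-difficulty items are estimated more precisely and easy and hard items less precisely, whereas uniform and multi-group samples give more consistent precision across difficulty levels. The results of Study 2 show that parameters obtained under the CAPT design are more precise than those obtained under the NEAT design. The lower the correlation between subjective and true difficulty, the worse the parameter estimation; estimation is better when the test is longer or when there are fewer anchor items; and the difficulty distribution of the anchor items has little effect. Overall, CAPT improves the precision of item parameter estimation in small samples. Provided that the correlation between subjective and true difficulty is kept at a moderate level or above, the findings offer useful guidance to those who cannot pretest with a large sample.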
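The matching idea behind CAPT can be illustrated with a minimal sketch. It is not the study's procedure: it assumes a Rasch model (the abstract names no model), treats the examinee's ability `theta` as already known or provisionally estimated, and uses hypothetical helper names (`rasch_prob`, `item_information`, `select_matched_items`, `mean_information`). Under these assumptions, one response carries information p(1 - p) about an item's difficulty, which is largest when difficulty equals ability, so assigning items by subjective difficulty should help to the extent that subjective difficulty tracks true difficulty.

```python
import math
import random


def rasch_prob(theta, b):
    """Probability of a correct response under the Rasch model (assumed here)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def item_information(theta, b):
    """Fisher information one response carries about difficulty b: p * (1 - p)."""
    p = rasch_prob(theta, b)
    return p * (1.0 - p)


def select_matched_items(theta, subjective_b, n_items):
    """Indices of the n_items items whose subjective difficulty is closest to theta."""
    ranked = sorted(range(len(subjective_b)),
                    key=lambda i: abs(subjective_b[i] - theta))
    return sorted(ranked[:n_items])


def mean_information(theta, true_b, chosen):
    """Average information about the TRUE difficulties of the chosen items."""
    return sum(item_information(theta, true_b[i]) for i in chosen) / len(chosen)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the illustration is repeatable
    # Hypothetical item pool: true difficulties plus noisy subjective ratings.
    true_b = [rng.uniform(-3, 3) for _ in range(60)]
    subjective_b = [b + rng.gauss(0, 0.5) for b in true_b]
    theta = 1.0  # assumed (provisional) ability of one examinee

    matched = select_matched_items(theta, subjective_b, 10)
    at_random = rng.sample(range(60), 10)
    print("matched:", round(mean_information(theta, true_b, matched), 3))
    print("random: ", round(mean_information(theta, true_b, at_random), 3))
```

Increasing the noise standard deviation in `subjective_b` weakens the link between subjective and true difficulty, which is one way to explore the correlation effect the abstract reports.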

Keywords

small sample; computerized adaptive testing; computerized adaptive pretest; item parameter estimation

Parallel Abstract

The goal of this research is to investigate the influence of an adaptive item selection method on the accuracy of pretest item calibration. The success of applications such as computerized adaptive testing (CAT) depends on the accuracy of the estimated parameters of each individual item. Typically, pretest calibration of item parameters is advised to use a large calibration sample to reduce estimation error. However, such a standard may be difficult to reach in practice. This paper proposes a Computerized Adaptive Pretest (CAPT) method that determines the optimal items for each examinee to take in the pretest and thereby improves the accuracy of item calibration. The research is composed of two studies. In Study 1, three kinds of ability distribution (normal, uniform, and multi-group) were formed to examine their influence on pretest item calibration. Study 2 is composed of two parts. Part one examines the difference in item estimation precision between the CAPT and NEAT designs. Part two mainly examines the difference in item estimation precision under three levels of correlation between subjective difficulty and true difficulty. In addition, part two examines variables that might influence item estimation precision under the CAPT design, such as test length, the number of anchor items, and the difficulty distribution of the anchor items. The results of Study 1 suggest that overall item estimation precision does not differ among normally distributed, uniformly distributed, and multi-group examinees. The difference is that, for normally distributed examinees, estimation is more precise for items of average difficulty and less accurate for easy and hard items, whereas uniform and multi-group examinees yield similar accuracy across all items. The results of Study 2 suggest that the CAPT design performs better than the NEAT design in the small-sample situation. With respect to the correlation between subjective difficulty and true difficulty, the higher the correlation, the more precise the item estimation. Furthermore, item estimation is more precise when the test is longer and the anchor items are fewer. The difficulty distribution of the anchor items has little effect on the precision of item estimation. Generally speaking, this study sheds some light on future applications of pretest design for test users who cannot obtain a large sample to estimate item parameters, as long as the correlation between subjective difficulty and true difficulty is at or above a moderate level.

Parallel Keywords

small sample size; computerized adaptive test; computerized adaptive pretest; item parameter estimation

