Incorporating a 4PL Error-Correction Mechanism into Reviewable Computerized Adaptive Testing

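During the administration of computerized adaptive tests (CAT), giving examinees the opportunity to review and revise items they have already answered may allow them to correct items answered wrongly through carelessness, so that the test result better reflects their ability; this is the basic assumption of reviewable CAT. However, allowing examinees to change their responses during a CAT may leave subsequent items unable to estimate the examinee's ability effectively, which in turn biases the test result and lowers the precision of the ability estimate. This study aims to use the four-parameter logistic IRT model (4PL IRT model) to reduce the influence on ability estimation of such inappropriate items produced by review, and to compare its ability-estimation performance with that of the three-parameter logistic IRT model (3PL IRT model). The study comprises three experimental stages. The first-stage simulation experiment and the second-stage empirical experiment examine, under simulated and actual test administration respectively, the effect of the upper asymptote parameter on the precision and efficiency of CAT ability estimation. The third-stage simulation experiment examines whether the 4PL IRT model can effectively reduce the ability-estimation bias caused by inappropriate items arising from review. The results show that the 4PL IRT model can remedy the underestimation of ability caused by response errors early in the test and provides more precise ability estimates than the 3PL IRT model; under normal administration conditions, the 4PL model can also effectively improve overall testing efficiency. In addition, when combined with a rearrangement procedure, the 4PL IRT model can also resolve the ability-estimation problems caused by inappropriate items in reviewable CAT, improving the precision and efficiency of ability estimation in reviewable CAT. Finally, the empirical results show that the English ability of female senior high school students was significantly higher than that of their male peers.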

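Keywords

Item response theory; computerized adaptive testing; reviewable computerized adaptive testing; upper asymptote parameter; four-parameter item response theory; rearrangement procedure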

Parallel Abstract

The underlying hypothesis of reviewable computerized adaptive testing (CAT) is that, after rereading or rethinking an item, examinees may correct careless mistakes they have made. The test score would therefore be closer to an examinee’s actual ability once mistakes are corrected, whereas prohibiting item review in CAT might lead to underestimating examinees’ ability. However, changing the answer to one item in CAT may leave the subsequent items no longer appropriate for estimating the examinee’s ability. These inappropriate items in a reviewable CAT can introduce bias into ability estimation and decrease its precision. This study evaluated the performance of the four-parameter logistic (4PL) model by comparing it with the three-parameter logistic (3PL) model, and used it to reduce the impact of inappropriate items on reviewable CAT. Three experiments were conducted. The first two, one simulation and one empirical, evaluated the 4PL IRT model by comparing the measurement precision and efficiency of the 3PL and 4PL IRT models under simulated and empirical conditions; the third examined whether implementing the 4PL model reduces the impact of inappropriate items on reviewable CAT. The results indicated that the 4PL IRT model could (1) improve the estimation precision of CAT under a poor-start administration condition; (2) improve the estimation efficiency of CAT under a normal administration condition; and (3) serve as a valuable solution for reducing the estimation bias introduced by inappropriate items in reviewable CAT. Finally, the language achievement of female senior-high-school examinees was higher than that of males in both midterm score and ability estimated by CAT in this study.
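To make the role of the upper asymptote parameter concrete, the following is a minimal sketch of the standard 4PL item response function, in which the probability of a correct response is c + (d - c) / (1 + exp(-a(θ - b))) and the 3PL model is the special case d = 1. The function names, the item parameters, and the value d = 0.98 are illustrative assumptions, not values taken from this study.

```python
import math


def p_4pl(theta, a, b, c, d):
    """Probability of a correct response under the standard 4PL model.

    a: discrimination, b: difficulty, c: lower asymptote (guessing),
    d: upper asymptote (d < 1 leaves room for careless errors).
    """
    return c + (d - c) / (1.0 + math.exp(-a * (theta - b)))


def p_3pl(theta, a, b, c):
    """The 3PL model is the 4PL model with the upper asymptote fixed at 1."""
    return p_4pl(theta, a, b, c, d=1.0)


# Hypothetical examinee and item (illustrative values only):
# a high-ability examinee answering an easy item.
theta, a, b, c = 2.0, 1.5, -1.0, 0.2

# Probability that this examinee misses the item under each model.
miss_3pl = 1.0 - p_3pl(theta, a, b, c)
miss_4pl = 1.0 - p_4pl(theta, a, b, c, d=0.98)

print(f"P(miss) under 3PL: {miss_3pl:.4f}")
print(f"P(miss) under 4PL: {miss_4pl:.4f}")
```

Under these assumed values the 4PL model assigns a larger probability to a high-ability examinee missing an easy item, so a single careless error early in the test is less surprising to the model and should pull the ability estimate down less than it would under the 3PL model.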

Parallel Keywords

Item response theory (IRT); computerized adaptive testing (CAT); reviewable CAT; upper asymptote parameter; four-parameter logistic (4PL) IRT model; rearrangement procedure

