A Simulation Study of the Plausible Values Method in Unidimensional Item Response Theory under Equating Designs

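Many large-scale international assessments use the plausible values method to estimate population ability parameters. Because plausible values take the form of ordinary data, analysts can also use them to describe statistical properties. In addition, the content covered by a large-scale assessment typically spans different cognitive dimensions and difficulty levels and cannot be completed by a single examinee in a short period, so test items are arranged under various equating designs to reduce examinee burden while still meeting the goals of the assessment. This study uses vertical equating designs based on the non-equivalent groups with anchor test design (NEAT) and the balanced incomplete block design (BIB), and estimates the mean and standard deviation of population ability with four methods: the plausible values method, the expected a posteriori (EAP) method with background variables, the EAP method, and maximum likelihood estimation. The main purpose is to examine how well the plausible values method and the other estimators recover population parameters. The results show that, under the various equating designs, the estimation methods that incorporate background variables give better estimates of the population mean and standard deviation; for the population standard deviation in particular, the plausible values method performs far better than the other estimation methods.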
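The contrast between the estimators for the population standard deviation can be illustrated with a minimal sketch. It is not the simulation design of this study: it assumes a simple normal measurement model in place of an IRT likelihood, a standard normal ability prior, no background variables, and no equating design, and the function name `simulate` and all parameter values are hypothetical. Under these assumptions the posterior of each ability is available in closed form, so plausible values can be drawn directly.

```python
import random
import statistics


def simulate(n_persons=20000, error_sd=0.6, n_pv=5, seed=1):
    """Compare population SD estimates from three kinds of person scores.

    Simplifying assumption: observed score = ability + normal error, with
    ability ~ N(0, 1). This stands in for the IRT likelihood so that the
    posterior of each ability is normal with known mean and SD.
    """
    rng = random.Random(seed)
    # Shrinkage weight of the posterior mean under the N(0, 1) prior.
    reliability = 1.0 / (1.0 + error_sd ** 2)
    # Posterior SD of ability given the observed score.
    post_sd = (error_sd ** 2 * reliability) ** 0.5

    mle_scores, eap_scores = [], []
    pv_sets = [[] for _ in range(n_pv)]
    for _ in range(n_persons):
        theta = rng.gauss(0.0, 1.0)            # true ability
        x = theta + rng.gauss(0.0, error_sd)   # noisy observed score
        mle_scores.append(x)                   # ML point estimate (no prior)
        eap_scores.append(reliability * x)     # posterior mean (EAP)
        for draws in pv_sets:                  # random draws from the posterior
            draws.append(rng.gauss(reliability * x, post_sd))

    # Compute the SD within each plausible-value set, then average over sets.
    pv_sd = statistics.fmean([statistics.pstdev(d) for d in pv_sets])
    return {
        "mle_sd": statistics.pstdev(mle_scores),
        "eap_sd": statistics.pstdev(eap_scores),
        "pv_sd": pv_sd,
    }


result = simulate()
```

In this toy setting the SD of the EAP scores falls below the true value of 1 because posterior means are shrunk toward the prior mean, the SD of the ML scores exceeds it because measurement error is added to the true spread, and the SD averaged over plausible-value sets lies close to 1. This matches the direction of the result reported for the standard deviation, although the magnitudes here depend entirely on the assumed `error_sd`.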

Keywords

large-scale assessment; parameter estimation; plausible values method; equating design; item response theory

Parallel Abstract

The purpose of this paper is to explore the performance of the plausible values method under BIB and NEAT designs using simulated data. The major focus of large-scale assessments is always on population statistics, such as means and standard deviations, and the plausible values method is usually used to estimate these population parameters. In large-scale assessments the spectrum of subject matter is usually wide, but the testing time is short. Therefore, multiple booklets are used to cover the proficiency domain sufficiently. The balanced incomplete block design (BIB) and the non-equivalent groups with anchor test design (NEAT) are two popular test equating designs for this situation. The experimental results show that the estimation method based on plausible values yields better estimates than the other methods under these equating designs, and that the population parameters (means and standard deviations) are well estimated as test length increases. In these experimental conditions, the estimates of the population parameters are not affected by sample size (16,128 and 10,920). Under both linking designs, BIB and NEAT, the plausible values method leads to more precise estimates.

Parallel Keywords

equating design; item response theory; large-scale assessment; parameter estimation; plausible values
