An Investigation of Equating Procedures under the Graded Response Model

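As the constructs being measured grow more complex, polytomously scored items have appeared in many real testing settings, including large-scale assessments such as NAEP in the United States and TASA-LN in Taiwan. In large-scale assessments, when several groups take different test forms and their scores are to be compared, an equating procedure is required. Kim and Cohen (2002) noted that equating is complete only when the scale of the estimates has been linked to the scale of the true values, but in many practical settings the true parameter values are unavailable. To make equating feasible in real settings, this study uses simulated data to examine, under the graded response model, how different independent variables affect equating results when a new equating procedure, "estimates standardized," is applied in two scenarios: horizontal equating and vertical equating. The results show that in horizontal equating, the longer the test and the larger the sample, the more accurate the parameter estimates, while concurrent estimation and separate estimation yield comparable accuracy. The results for vertical equating are similar, except that separate estimation and two groups of comparable sample size should be used to obtain better parameter estimates. The study finds that test length is the main factor affecting equating results, and that when the two groups have equal sample sizes, vertical equating increases the estimation error of the ability values. Increasing the proportion of anchor items reduces estimation error, but the effect is negligible in horizontal equating. As for estimation method, the two methods do not differ significantly in horizontal equating, whereas separate estimation performs better in vertical equating.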
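The abstract does not spell out the "estimates standardized" procedure, so the following is only a minimal sketch of three standard ingredients it relies on: category probabilities under the graded response model, the linear rescaling of item parameters that any IRT scale linking applies, and the RMSE criterion used to judge recovery. The function names and the linking constants `A` and `B` are illustrative assumptions, not the authors' implementation.

```python
import math

def grm_category_probs(theta, a, bs):
    """Category probabilities for one item under the graded response model.

    theta: ability; a: discrimination; bs: ordered thresholds b_1 < ... < b_{K-1}.
    Returns a list of K probabilities, one per score category 0..K-1.
    """
    # Cumulative probabilities P(X >= k); by convention P(X >= 0) = 1, P(X >= K) = 0.
    cum = [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in bs] + [0.0]
    # Each category probability is the difference of adjacent cumulative curves.
    return [cum[k] - cum[k + 1] for k in range(len(bs) + 1)]

def linear_rescale(a, bs, A, B):
    """Move item parameters onto the scale theta* = A * theta + B.

    A and B are hypothetical linking constants; a * (theta - b) is unchanged
    by this transformation, so response probabilities are preserved.
    """
    return a / A, [A * b + B for b in bs]

def rmse(estimates, truths):
    """Root mean squared error between estimated and generating values."""
    n = len(truths)
    return math.sqrt(sum((e - t) ** 2 for e, t in zip(estimates, truths)) / n)
```

Because `a * (theta - b)` is invariant under the rescaling, an examinee at `theta` on the original scale and at `A * theta + B` on the new scale receives the same category probabilities, which is the property that lets estimates from different test forms be compared on one scale.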

Keywords

concurrent estimation; polytomous scoring model; equating; graded response model

Parallel Abstract

As the constructs being measured become more complex, different types of items are being used in real testing situations. Polytomous items have long been used in large-scale tests such as NAEP and TASA-LN. To compare the scores of different groups of examinees on a large-scale test, an equating procedure must be carried out. Using simulated data, this study investigates the effect of a new equating procedure, "estimates standardized," on parameter estimation under the graded response model. Four independent factors were manipulated: (1) sample size, (2) test length, (3) percentage of anchor items, and (4) estimation method. In horizontal equating, the RMSE was smaller when the test length or the sample size increased, and the accuracy of the parameter estimates was equivalent for concurrent estimation and separate estimation. In vertical equating, the RMSE was likewise smaller when the test length increased. When separate estimation was adopted, a lower RMSE was obtained if the sample sizes of the two groups were equivalent. Test length was the main factor affecting the equating results. When the sample sizes of the base and target groups were equivalent under vertical equating, the RMSE decreased as the percentage of anchor items increased. However, this did not hold under horizontal equating. Finally, separate estimation performed better under vertical equating.

Parallel Keywords

concurrent estimation; equating; graded response model; polytomous IRT model

References

Baker, F. B. (1992). Equating tests under the graded response model. Applied Psychological Measurement, 16, 87-96.

Baker, F. B. (1993). EQUATE 2.0: A computer program for the characteristic curve method of IRT equating. Applied Psychological Measurement, 17, 20.

Cohen, A. S., & Kim, S. H. (1998). An investigation of linking methods under the graded response model. Applied Psychological Measurement, 22, 116-130.

De Ayala, R. J., Dodd, B. G., & Koch, W. R. (1992). A comparison of the partial credit and graded response models in computerized adaptive testing. Applied Measurement in Education, 5, 17-34.

Hanson, B. A., & Béguin, A. A. (2002). Obtaining a common scale for item response theory item parameters using separate versus concurrent estimation in the common-item equating design. Applied Psychological Measurement, 26, 3-24.
