An Evaluation of Judges in Angoff Standard Setting

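In standard setting, expert judges use Performance Level Descriptors (PLDs) to link descriptions of performance to scores on a standardized test, and those scores are then used to classify students' ability. This procedure usually determines what the scores mean for students and how decision makers use the test, for example in Pass/Fail or Excellent/Average/Fail decisions; in other words, these decisions are closely tied to the evaluations made by standard setting judges. In a typical standard setting, the judges on an expert panel are trained, evaluate whether examinees at a given performance level would answer the test items correctly, and then discuss their judgments with one another. The organizers of the standard setting provide feedback so that judges understand how their decisions would affect examinees' pass and fail rates and other uses of the test. In addition, throughout the standard setting, judges are asked to self-report their familiarity with and confidence in their understanding of the concepts and ideas covered in the training, and whether they are applying them correctly. The Angoff standard setting method is one of the most widely used methods for setting cut scores. In this method, a panel of expert judges makes judgments about students' ability to answer the test items correctly, one item at a time. Despite the importance of this procedure, little is known about how best to prepare judges for their role in the standard setting. The data for this study were gathered from a standard setting held at a university in Taiwan to match items from a locally developed foreign language test with the Common European Framework of Reference (CEFR); both a listening panel and a reading panel were conducted. The study evaluated two commonly used methods of preparing judges for an Angoff standard setting and their relationship with judge accuracy. Judge accuracy was measured by the p-value correlation, the Root Mean Square Error (RMSE), and the Cut-off Score Judgment (CSJ). In the first evaluation, judges were trained in the PLDs and then tested on their knowledge of the PLDs, and the test results were compared with judge accuracy. The second evaluation examined how familiarity with and confidence in the concepts and ideas introduced during training related to the measured accuracy of the judges; no relationship was found between the judges' final measured accuracy and their familiarity or confidence. Beyond the main findings, the precise wording of the instructions was observed to be very important to judge accuracy. RMSE and CSJ were also observed to lead to different decisions about accuracy than the p-value correlation. The paper closes with conclusions and suggestions for future research on training Angoff standard setting judges, and notes the limitations of the study.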

Keywords

Angoff; judges; standard setting

Parallel Abstract

In a standard setting, groups of expert judges evaluate verbal descriptions of performance (Performance Level Descriptors, or PLDs) contained in a standard and match these with scores on a standardized test that place students in categories of performance. This procedure is often used to make decisions about what scores mean for the students and policy makers who use the tests. For example, Pass/Fail decisions, as well as Excellent/Average/Fail decisions, are often tied to how tests are evaluated by standard setting judges. In a typical standard setting, panels of expert judges are trained, evaluate test items, and are then given time to discuss their results with other judges. Standard setting organizers provide feedback that allows judges to know how their decisions would affect students' Pass/Fail rates and other decisions the test will be used to make. In addition, throughout the standard setting, judges are asked to give self-reports about their familiarity with and confidence in their understanding of the concepts and ideas covered during the training, and about whether they are applying them correctly. The Angoff standard setting method is one of the most widely used methods for setting cut scores. In this method, panels of expert judges make judgments about the ability of students to correctly answer test items, considered one at a time. Despite the importance of this procedure, little is known about how best to prepare judges for their role in the standard setting. Data were gathered from a standard setting held at a Taiwan university to match items from a locally developed foreign language test with the Common European Framework of Reference (CEFR). The study then evaluated two commonly used methods of preparing judges for an Angoff standard setting and their relationship with judge accuracy. Both a listening panel and a reading panel were conducted.
Judge accuracy was measured by the p-value correlation, the Root Mean Square Error (RMSE), and the Cutoff Score Judgment (CSJ). For the first evaluation, judges were trained in the PLDs and then tested on their knowledge of the PLDs, and the test results were compared with the three measures of judge accuracy. No relationship was found between tested knowledge of the PLDs and judge accuracy. The second evaluation correlated familiarity with and confidence in the concepts and ideas introduced during the training period with the measured accuracy of the judges. Once again, no relationship was found between familiarity or confidence and the final measured accuracy of the judges. In addition to the main findings, it was also observed that the exact wording of the instructions to judges is very important to the accuracy of the judges. RMSE and CSJ were observed to lead to different decisions about accuracy than the p-value correlation. Future directions for research on the training of Angoff standard setting judges are suggested, and the limitations of this study are noted.
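As a rough illustration of the accuracy measures named above, the sketch below compares one judge's Angoff item ratings with empirical item p-values (the proportion of examinees answering each item correctly). The abstract does not state how the study computed these statistics, so the formulas here are common textbook forms used as assumptions: a Pearson correlation for the p-value correlation, a plain root mean square difference for RMSE, and the sum of a judge's ratings as that judge's implied cut score. All function names and data values are hypothetical.

```python
import math


def p_value_correlation(ratings, p_values):
    """Pearson correlation between a judge's ratings and empirical item p-values."""
    n = len(ratings)
    mean_r = sum(ratings) / n
    mean_p = sum(p_values) / n
    cov = sum((r - mean_r) * (p - mean_p) for r, p in zip(ratings, p_values))
    var_r = sum((r - mean_r) ** 2 for r in ratings)
    var_p = sum((p - mean_p) ** 2 for p in p_values)
    return cov / math.sqrt(var_r * var_p)


def rmse(ratings, p_values):
    """Root mean square difference between ratings and empirical item p-values."""
    n = len(ratings)
    return math.sqrt(sum((r - p) ** 2 for r, p in zip(ratings, p_values)) / n)


def implied_cut_score(ratings):
    """Sum of a judge's item ratings (assumed form of that judge's cut score)."""
    return sum(ratings)


# Hypothetical data: one judge's ratings for four items and the observed p-values.
judge_ratings = [0.6, 0.7, 0.4, 0.9]
item_p_values = [0.5, 0.8, 0.3, 0.9]

print(p_value_correlation(judge_ratings, item_p_values))
print(rmse(judge_ratings, item_p_values))
print(implied_cut_score(judge_ratings))
```

The correlation reflects only whether a judge ranks items in the same order of difficulty as the data, while RMSE also penalizes ratings that are uniformly too high or too low. That difference is one possible reason the measures could disagree about which judges are accurate.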

Parallel Keywords

Angoff; judges; standard setting

References

American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (2014). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.

Angoff, W. H. (1971). Scales, norms, and equivalent scores. In R. L. Thorndike (Ed.), Educational Measurement (pp. 508–600). Washington, DC: American Council on Education.

Brandon, P. R. (2004). Conclusions about frequently studied modified Angoff standard-setting topics. Applied Measurement in Education, 17(1), 59–88.

Bond, T. G., & Fox, C. M. (2001). Applying the Rasch Model: Fundamental Measurement in the Human Sciences. Mahwah, NJ: Erlbaum.
