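To provide an objective means of validating the Yes/No Angoff standard-setting method, which relies on expert judgment, this study randomly sampled 5,000 examinee records for each of the five subjects of the 2014 Comprehensive Assessment Program for Junior High School Students. Hierarchical cluster analysis was applied to divide the examinees into three groups: below basic, basic, and proficient. The expected value and standard deviation of the number of items answered correctly in each group, together with each group's share of examinees, were then used as initial values for the probability model in the expectation-maximization (EM) algorithm, which further optimized the grouping and yielded the cut scores separating the three groups in each subject. These cut scores serve as a reference for the Yes/No Angoff expert judgments. In addition, the examinees' school grades were used as an external criterion. The study found that the classification consistency between the two methods was highest in mathematics and lowest in science; that, under cluster analysis, a larger proportion of examinees was classified as below basic than under expert judgment in every subject except Chinese; and that school grades can serve as a reference for external-criterion validation of standards-referenced scoring. From the standpoint of educational decision-making, the appropriateness of cut scores affects the evaluation of overall educational performance. Cluster-analysis classification adds relatively objective cut-score information to the commonly used standard-setting methods based on subjective expert opinion, and can offer education authorities an additional perspective for decision-making.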
Test validity is a property of the interpretation assigned to test scores, and providing objective validity evidence is especially important for a standards-referenced assessment. In this study, we gathered validity evidence to support the interpretation of the test results of the 2014 Comprehensive Assessment Program for Junior High School Students in Taiwan. We used two cluster analysis methods, hierarchical clustering and the expectation-maximization (EM) algorithm, to examine the validity of one expert judgment technique, the Yes/No Angoff standard-setting method. Hierarchical clustering (HC) based on a minimum-variance algorithm was first applied to separate the examinees into three ability groups: below basic, basic, and proficient. Under the assumption that each ability cluster follows a Gaussian distribution and that the overall score distribution for each test subject is a mixture of Gaussians (MoG), we initialized the unknown parameters of the MoG (the mean, variance, and proportion of each cluster) from the HC results. After this initialization, the EM algorithm was used to optimize the parameter estimates, yielding three ability clusters. The results of the traditional Yes/No Angoff standard-setting procedure were then compared with those of the HC-EM cluster analysis. To compare the groupings produced by the two methods, we analyzed the examinees' school grades as an external criterion, providing additional evidence for checking the convergent validity of both methods. The study suggests that cluster analysis can serve as a support tool that provides validating information in the standard-setting process for high-stakes achievement tests.
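As a rough illustration of the HC-EM procedure described above, the following Python sketch clusters one-dimensional test scores with a minimum-variance (Ward) criterion, uses the three resulting groups to initialize a mixture of Gaussians, refines the parameters with EM, and reads off cut scores. It is not the authors' implementation. The scores are synthetic, the 40-item test length and the variance floor are assumptions, the function names are hypothetical, and taking the cut score to be the point where the most probable cluster changes is one plausible rule that the abstract does not specify.

```python
import math
import random
from collections import Counter


def ward_clusters(scores, k=3):
    """Minimum-variance (Ward) agglomerative clustering of 1-D scores."""
    # Identical scores merge at zero cost, so start with one cluster per distinct score.
    clusters = [{"n": n, "mean": float(s), "values": [s]}
                for s, n in sorted(Counter(scores).items())]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                a, b = clusters[i], clusters[j]
                # Increase in within-cluster sum of squares if a and b are merged.
                cost = a["n"] * b["n"] / (a["n"] + b["n"]) * (a["mean"] - b["mean"]) ** 2
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        a, b = clusters[i], clusters.pop(j)  # j > i, so index i stays valid
        n = a["n"] + b["n"]
        clusters[i] = {"n": n,
                       "mean": (a["n"] * a["mean"] + b["n"] * b["mean"]) / n,
                       "values": a["values"] + b["values"]}
    return sorted(clusters, key=lambda c: c["mean"])


def init_from_clusters(scores, clusters, min_var=0.25):
    """Turn HC groups into initial (proportion, mean, variance) triples."""
    counts, total = Counter(scores), len(scores)
    params = []
    for c in clusters:
        var = sum(counts[v] * (v - c["mean"]) ** 2 for v in c["values"]) / c["n"]
        params.append((c["n"] / total, c["mean"], max(var, min_var)))  # floor is an assumption
    return params


def posteriors(x, params):
    """Posterior probability of each cluster given score x (computed in log space)."""
    logs = [math.log(w) - 0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
            for w, m, v in params]
    top = max(logs)
    exps = [math.exp(l - top) for l in logs]
    total = sum(exps)
    return [e / total for e in exps]


def em(scores, params, n_iter=200, min_var=0.25):
    """EM for a 1-D mixture of Gaussians, run for a fixed number of iterations."""
    counts, total = Counter(scores), len(scores)
    for _ in range(n_iter):
        resp = {x: posteriors(x, params) for x in counts}  # E-step
        new_params = []
        for g in range(len(params)):  # M-step
            nk = max(sum(counts[x] * resp[x][g] for x in counts), 1e-12)
            mean = sum(counts[x] * resp[x][g] * x for x in counts) / nk
            var = sum(counts[x] * resp[x][g] * (x - mean) ** 2 for x in counts) / nk
            new_params.append((nk / total, mean, max(var, min_var)))
        params = new_params
    return params


def cut_scores(params, max_score):
    """Scores at which the most probable cluster changes (assumed cut-score rule)."""
    labels = []
    for x in range(max_score + 1):
        post = posteriors(x, params)
        labels.append(post.index(max(post)))
    return [x for x in range(1, max_score + 1) if labels[x] != labels[x - 1]]


# Synthetic number-correct scores on a hypothetical 40-item test (not the study's data).
rng = random.Random(0)
scores = []
for group_mean, group_size in [(10, 300), (22, 400), (34, 300)]:
    scores += [min(40, max(0, round(rng.gauss(group_mean, 3)))) for _ in range(group_size)]

params = em(scores, init_from_clusters(scores, ward_clusters(scores)))
cuts = cut_scores(params, max_score=40)
for weight, mean, var in params:
    print(f"proportion={weight:.3f} mean={mean:.2f} sd={math.sqrt(var):.2f}")
print("cut scores:", cuts)
```

Starting EM from the hierarchical clustering result rather than from random values gives the mixture model a data-driven initialization, which is the role the abstract assigns to the HC step. In practice the same pipeline could be built with an established library instead of the hand-written routines shown here.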