
項目反應理論在自然科學科能力測驗之應用:部份給分模式與等級反應模式之比較

Applying Item Response Theory on the Nature Science Scholastic Achievement Test: Comparing Partial Credit Model and Graded Response Model

Advisor: 張淑慧

Abstract


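This study applied two item response theory (IRT) models, the partial credit model (PCM) and the graded response model (GRM), to the natural science subject of the Scholastic Achievement Test in the multi-channel college admission program for academic year 91 (ROC calendar), and compared how well the different scoring models fit the data. There were three research questions: (1) how do the results of dichotomous and polytomous scoring models compare; (2) which scoring model fits better; (3) how do the item and ability measurement properties obtained under classical test theory (CTT) and IRT compare.

The sample consisted of 1,000 examinees randomly drawn by the College Entrance Examination Center. The test contained 68 items of three types: (1) single-answer multiple-choice items (dichotomously scored), (2) multiple-answer multiple-choice items (polytomously scored), and (3) testlets (polytomously scored). In keeping with the characteristics of the data, the multiple-answer items and the testlets were each scored both dichotomously and polytomously, and the data were analyzed with both dichotomous and polytomous models.

The main results are summarized as follows:

1. Unidimensionality: factor analysis showed that the ratio of the first to the second eigenvalue was 3.956 and that the first factor explained 12.223% of the variance, so the data roughly met the unidimensionality assumption.

2. Dichotomous versus polytomous models: item parameters were estimated successfully for all items under both kinds of model, except for items 36 and 39, which could not be estimated under the three-parameter model and the graded response model. The mean difficulty under the one-parameter model was -0.10; the mean difficulty and discrimination under the two-parameter model were 0.05 and 0.85; the mean difficulty, discrimination, and guessing parameters under the three-parameter model were 0.48, 1.47, and 0.2. When the multiple-answer items were scored polytomously, the mean difficulty under the PCM was -0.03, and the mean difficulty and discrimination under the GRM were 0.21 and 0.11. When both the multiple-answer items and the testlets were scored polytomously, the mean difficulty under the PCM was 0.00, and the mean difficulty and discrimination under the GRM were 0.13 and 0.85.

3. Model fit was assessed against three criteria. (1) Goodness-of-fit tests: 25.76% of the items fit the one-parameter model, 71.21% fit the two-parameter model, and 90.91% fit the three-parameter model. When the multiple-answer items were scored polytomously, 6.1% and 4.5% of the items fit the PCM and the GRM, respectively; when both the multiple-answer items and the testlets were scored polytomously, 3.03% of the items fit each model. For dichotomous scoring the three-parameter model fit best; for polytomous scoring both the PCM and the GRM fit poorly. (2) Item parameter estimates: correlations among the models ranged from 0.890 to 0.999 for difficulty and from 0.133 to 0.991 for discrimination, indicating that the models have different measurement properties. The difficulty correlation between the one-parameter model and the PCM (multiple-answer items scored polytomously) reached 0.999, and the difficulty and discrimination correlations between the two-parameter model and the GRM (multiple-answer items scored polytomously) reached 0.999 and 0.991, indicating that these pairs of models have similar measurement properties. (3) Test information: among the dichotomous models, the three-parameter model provided clearly more information than the others at ability levels between 0 and 3. Among the polytomous models (66 items), the GRM provided more information than the PCM. The test information of the one-parameter model and the PCM was almost identical, and the two-parameter model and the GRM gave similar results, indicating that scoring the multiple-answer items polytomously did not yield noticeably more information. For the case in which both the multiple-answer items and the testlets were scored polytomously (44 items), the information was rescaled proportionally so that it could be compared with the other models; the PCM and the GRM in this case provided clearly more information than the PCM and the GRM with only the multiple-answer items scored polytomously.

4. Classical test indices: the difficulty index ranged from 0.190 to 0.907 with a mean of 0.515, and the discrimination index ranged from 0.127 to 0.549 with a mean of 0.341. Between the IRT models and CTT, the difficulty correlations ranged from -0.758 to -0.876 and the discrimination correlations from 0.166 to 0.876; the indices of CTT correlated highly with those of the one-parameter model, the two-parameter model, the PCM, and the GRM. The ability correlations ranged from 0.976 to 0.995, indicating that IRT and CTT gave highly consistent results for the same data.

The IRT analyses indicate that the three-parameter model is the more suitable model for the data in this study. The study remains reserved about the polytomous scoring models; taking factors such as the number of score levels and the item type into account should be of considerable help in applying these models.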
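As a reading aid, the five models compared in the abstract can be sketched in a few lines of Python. This is a minimal illustration of the standard logistic forms, not the estimation procedure used in the study. The function names, the omission of the scaling constant D = 1.7, and the hypothetical three-category item are assumptions; only the 3PL mean parameters (1.47, 0.48, 0.2) come from the abstract.

```python
import math


def p_3pl(theta, a, b, c=0.0):
    """3PL probability of a correct response.

    Setting c=0 gives the 2PL; setting a=1 and c=0 gives the 1PL.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def grm_probs(theta, a, thresholds):
    """Graded response model: category probabilities for ordered thresholds."""
    # Cumulative probabilities P(X >= k), padded with 1 (k = 0) and 0 (k = m + 1).
    cum = [1.0]
    cum += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cum += [0.0]
    # Each category probability is the difference of adjacent cumulative curves.
    return [cum[k] - cum[k + 1] for k in range(len(thresholds) + 1)]


def pcm_probs(theta, steps):
    """Partial credit model: category probabilities from step difficulties."""
    # The exponent for category k is the running sum of (theta - step_j), j <= k;
    # category 0 has an empty sum.
    exponents = [0.0]
    for d in steps:
        exponents.append(exponents[-1] + (theta - d))
    largest = max(exponents)  # subtract the maximum for numerical stability
    numerators = [math.exp(e - largest) for e in exponents]
    total = sum(numerators)
    return [n / total for n in numerators]


# Reported 3PL mean parameters, evaluated at theta equal to the difficulty.
p_mean_item = p_3pl(0.48, a=1.47, b=0.48, c=0.2)

# Hypothetical three-category item, used only to exercise the polytomous models.
grm_example = grm_probs(0.0, a=1.0, thresholds=[-1.0, 1.0])
pcm_example = pcm_probs(0.0, steps=[0.0, 0.0])
```

At an ability equal to the item difficulty, the 3PL probability is c + (1 - c)/2, so an item with the reported mean guessing parameter of 0.2 is answered correctly about 60% of the time at that point. The GRM carries a discrimination parameter while the PCM does not, which parallels the 2PL/1PL distinction and is consistent with the pairings the abstract reports as nearly identical.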

Parallel Abstract


The purpose of this study was to apply the Partial Credit Model (PCM) and the Graded Response Model (GRM) of Item Response Theory (IRT) to the Science Scholastic Achievement Test (science-SAT) and to compare the two. The investigation focused on: (1) comparing dichotomous and polytomous IRT models; (2) assessing model-data fit among the IRT models; (3) comparing item and ability indices between CTT and IRT.

Data for one thousand examinees were sampled from the testing data. The science-SAT contained 68 items of three types: (1) multiple-choice items with only one best answer, (2) multiple-choice items with more than one best answer, and (3) testlets.

The major findings were:

1. Factor analysis showed that the ratio of the first to the second eigenvalue was 3.956 and that the first factor explained 12.223% of the variance. The data roughly met the assumption of unidimensionality.

2. All item parameters of the dichotomous and polytomous models were estimated successfully, except for two items under the 3PL model. The mean item difficulty of the 1PL was -0.10; the mean item difficulty and discrimination of the 2PL were 0.05 and 0.85; the mean item difficulty, discrimination, and guessing of the 3PL were 0.48, 1.47, and 0.2. When the MMC (multiple-answer multiple-choice) items were scored polytomously, the mean difficulty of the PCM was -0.03, and the mean difficulty and discrimination of the GRM were 0.21 and 0.11. When both the MMC items and the testlets were scored polytomously, the mean difficulty of the PCM was 0.00, and the mean difficulty and discrimination of the GRM were 0.13 and 0.85.

3. Three criteria were used to assess goodness of fit in model selection. (1) Chi-square and Q statistics: 17 items were retained for the 1PL, 47 for the 2PL, and 60 for the 3PL; 4 and 3 items were retained for the PCM and the GRM when the MMC items were scored polytomously; 2 items were retained for each of the PCM and the GRM when both the MMC items and the testlets were scored polytomously. The results showed that the 3PL fit the dichotomous data better than the other models. (2) Correlations of item indices between IRT models: the difficulty correlations were high, ranging from 0.890 to 0.999, while the discrimination correlations ranged from 0.133 to 0.991. (3) Test information: the 3PL provided more information than the other models. There was little difference between the 1PL and the PCM, and the same was true of the 2PL and the GRM.

4. Under CTT, the mean item difficulty and discrimination were 0.515 and 0.341. The difficulty correlations between the IRT models and CTT ranged from -0.758 to -0.876, and the discrimination correlations ranged from 0.166 to 0.876, so the IRT and CTT indices were highly correlated. The correlations between the ability parameters of the IRT models and the CTT total score ranged from 0.976 to 0.995.

According to the analysis, the 3PL is suitable for use with the science-SAT; the study takes a conservative view of the feasibility of the polytomous IRT models.
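The CTT indices in finding 4 can be illustrated with a short sketch. The abstract does not say which discrimination index was used, so the item-total correlation below is an assumption, as are the function names and the made-up response matrix. The difficulty index is taken here to be the proportion correct, which matches the reported range of 0.190 to 0.907; under that reading a higher value means an easier item, so a negative correlation with the IRT difficulty parameter, as reported, is what one would expect.

```python
def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def ctt_indices(responses):
    """Return (difficulty, discrimination) per item for 0/1 scored responses.

    responses: one row per examinee, one 0/1 score per item.
    Difficulty is the proportion correct; discrimination is the correlation
    between the item score and the total score (an assumed choice of index).
    Items answered identically by everyone have zero variance and are not
    handled in this sketch.
    """
    totals = [sum(row) for row in responses]
    indices = []
    for j in range(len(responses[0])):
        item = [row[j] for row in responses]
        difficulty = sum(item) / len(item)
        discrimination = pearson(item, totals)
        indices.append((difficulty, discrimination))
    return indices


# Made-up data for illustration only: 4 examinees, 3 items.
toy_responses = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [0, 0, 0],
]
toy_indices = ctt_indices(toy_responses)
```

The same `pearson` helper could be applied to a vector of CTT total scores and a vector of IRT ability estimates to obtain the kind of ability correlation the abstract reports.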
