

Developing and Validating a Scientific Multi-Text Reading Comprehension Assessment: Evidence from Texts Describing Relationships between Climate Changes and the Three Gorges Dam

Abstract


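The main purpose of this study was to develop the Scientific Multi-Text Reading Comprehension Assessment (SMTRCA) and to establish a Rubric of Multi-Text Reading Comprehension Assessment (RMTRCA) for evaluating reading comprehension ability. The science test booklet addresses the relationship between climate change and China's Three Gorges Dam on the Yangtze River. It comprises four subscales (information retrieval, information generalization, information interpretation, and information integration), with 10 multiple-choice items and 9 constructed-response items in total. First, intra-rater Cronbach's α values all exceeded .90, indicating reasonably good intra-rater consistency. Second, Kendall's coefficient of concordance (W) among raters exceeded .80 (p < .001), indicating that raters tended to rank responses in the same relative order. In addition, the chi-square tests from the many-facet Rasch measurement (MFRM) analysis of rater severity and from the comparison of the rating scale model with the partial credit model were both significant, indicating that raters differed in overall severity and in threshold severity. For the former, all rater infit and outfit MNSQ values fell within 1 ± 0.3, indicating that both severe and lenient raters could effectively distinguish high-ability from low-ability students. The latter reflects the fact that human scoring, which involves interpretation, evaluation, and judgment, can hardly reach machine-like consistency; this matches general expectations of human raters and is understandable and acceptable. Third, the internal consistency of the test booklet exceeded .70 for every subscale except information retrieval and information generalization, and α for the whole assessment was above .90, indicating that the Cronbach's α of the SMTRCA is within an acceptable range. Finally, confirmatory factor analysis supported the hypothesized four-factor model of the SMTRCA, with acceptable model fit. These preliminary findings suggest that the SMTRCA can be divided into the four subscales of information retrieval, information generalization, information interpretation, and information integration, and that the SMTRCA accounts for .60, .66, .80, and .80, respectively, of the variance in the first-order latent factors represented by the four subscale scores.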
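Cronbach's α and Kendall's W, the two consistency indices reported above, follow standard formulas: α = k/(k − 1) · (1 − Σσ²ᵢ / σ²ₜ) for k items, and W = 12S / (m²(n³ − n)) for m raters ranking n responses, where S is the sum of squared deviations of the per-response rank sums from their mean. The Python sketch below computes both on made-up scores. The function names and data are hypothetical illustrations, not the authors' data or analysis scripts, and the W function omits the usual correction for tied ranks.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents x items matrix (list of rows)."""
    k = len(scores[0])                        # number of items
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def average_ranks(values):
    """Rank values 1..n; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for p in range(i, j + 1):            # sorted positions i..j are tied
            ranks[order[p]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kendall_w(ratings):
    """Kendall's W for an m raters x n responses matrix (no tie correction)."""
    m, n = len(ratings), len(ratings[0])
    ranked = [average_ranks(r) for r in ratings]   # rank within each rater
    rank_sums = [sum(r[j] for r in ranked) for j in range(n)]
    mean_sum = m * (n + 1) / 2                  # expected rank sum per response
    s = sum((rj - mean_sum) ** 2 for rj in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))


# Hypothetical scores: 4 students x 3 items, and 3 raters x 4 responses.
items = [[2, 3, 3], [4, 4, 5], [1, 2, 1], [3, 3, 4]]
raters = [[3, 2, 4, 1], [3, 1, 4, 2], [2, 1, 4, 3]]
print(round(cronbach_alpha(items), 3))   # 0.947
print(round(kendall_w(raters), 3))       # 0.778
```

W reaches 1 when every rater orders the responses identically and 0 when the rank sums are all equal, which is why a value above .80 is read as raters sharing the same relative ranking even if their absolute severity differs.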

Parallel Abstract


This study aimed to develop the Scientific Multi-Text Reading Comprehension Assessment (SMTRCA), with a focus on the Rubric of Multi-Text Reading Comprehension Assessment (RMTRCA), which was designed to evaluate the extent of reading comprehension. To this end, we used scientific texts describing the dispute over the relationship between climate change and the Three Gorges Dam and developed assessment items according to our rubric. The test comprised 10 closed-ended and 9 open-ended questions, categorized into four subscales: information retrieval, information generalization, information interpretation, and information integration. The analysis showed that intra-rater Cronbach's α values exceeded .90, indicating good intra-rater consistency. Secondly, Kendall's coefficient of concordance (W) exceeded .80 (p < .001), denoting a consistent relative ranking pattern across raters. Additionally, in the many-facet Rasch measurement (MFRM) analysis and the comparison of the rating scale model (RSM) with the partial credit model (PCM), the chi-square tests of rater severity and of threshold difficulty were both significant. For the former, infit and outfit MNSQ values fell within 1 ± 0.3, meaning that both severe and lenient raters could effectively distinguish high-ability from low-ability students. The latter indicates that, because rating involves human interpretation, evaluation, and scoring processes, machine-like consistency is difficult to reach; this is, however, in line with expectations of typical human judgment. Thirdly, Cronbach's α exceeded .70 for all subscales except information retrieval and information generalization, and α for the whole assessment exceeded .90, which is within an acceptable range. Finally, confirmatory factor analysis showed acceptable goodness of fit for the hypothesized four-factor model of the SMTRCA. The SMTRCA accounts for .60, .66, .80, and .80, respectively, of the variance in the first-order factors of the four subscales.
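The infit and outfit mean-squares (MNSQ) used above to judge raters compare observed scores with model-expected scores: outfit is the unweighted mean of squared standardized residuals, and infit is the sum of squared residuals divided by the sum of model variances. The sketch below assumes a simplified dichotomous many-facet Rasch model (logit = ability − item difficulty − rater severity) with parameters fixed by hand. The study's rating scale and partial credit models are polytomous, and in practice all parameters are estimated jointly by MFRM software, so the function names, values, and dichotomous simplification here are illustrative assumptions only.

```python
import math


def rasch_prob(theta, difficulty, severity):
    """P(score = 1) under a simplified dichotomous many-facet Rasch model.

    Assumed for illustration: logit = ability - item difficulty - rater severity.
    """
    return 1.0 / (1.0 + math.exp(-(theta - difficulty - severity)))


def fit_mnsq(observations):
    """Return (infit, outfit) mean-squares for one facet element, e.g. one rater.

    observations: list of (observed score, expected score, model variance).
    """
    sq_resid = [(x - e) ** 2 for x, e, _ in observations]
    variances = [w for _, _, w in observations]
    # Outfit: unweighted mean of squared standardized residuals.
    outfit = sum(r / w for r, w in zip(sq_resid, variances)) / len(observations)
    # Infit: squared residuals weighted by their model variance.
    infit = sum(sq_resid) / sum(variances)
    return infit, outfit


def within_band(value, center=1.0, half_width=0.3):
    """True if a mean-square falls inside center +/- half_width (here 1 +/- 0.3)."""
    return center - half_width <= value <= center + half_width


# Hypothetical data: one rater (severity 0.4 logits) scoring six students
# on one item of difficulty 0.0; each pair is (ability, observed 0/1 score).
students = [(-1.5, 0), (-0.5, 0), (0.0, 1), (0.5, 0), (1.0, 1), (2.0, 1)]
obs = []
for theta, score in students:
    p = rasch_prob(theta, 0.0, 0.4)
    obs.append((score, p, p * (1 - p)))   # expected score = p, variance = p(1 - p)

infit, outfit = fit_mnsq(obs)
print(f"infit={infit:.2f}, outfit={outfit:.2f}, "
      f"both within 1 +/- 0.3: {within_band(infit) and within_band(outfit)}")
```

Values near 1.0 mean a rater's scores vary about as much as the model predicts; outfit is more sensitive to isolated surprising ratings, while infit weights residuals on well-targeted observations more heavily.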


