Title

華語文閱讀測驗信度效度分析與垂直等化研究

Translated Titles

A Reliability, Validity and Vertical Equating Study of the Reading Subtest of the Test of Chinese as a Foreign Language

Authors

藍珮君 (Pei-Jiun Lan); 陳柏熹 (Po-Hsi Chen)

Key Words

Mandarin test; reliability; validity; item response theory; vertical equating

Publication Name

華語文教學研究

Volume or Term/Year and Month of Publication

Vol. 11, No. 1 (2014/03/01)

Page #

99 - 125

Content Language

Traditional Chinese

Chinese Abstract

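This study examines the reliability and validity of the four levels of the Chinese reading test, Level 2 (基礎級), Level 3 (進階級), Level 4 (高階級), and Level 5 (流利級), and links the item difficulties of the four levels onto a common scale. The data are examinee responses from the official administrations of May and November 2011 and from the 2012 pretest, analyzed with classical test theory and item response theory (IRT). The results show the following. 1. The Reading test is highly reliable: the KR-20 coefficients of all levels approach or exceed 0.90, the reliabilities converted from the IRT standard errors of estimation all reach 0.90 or higher, and at the ability values of each test's passing cutoff the test information is relatively high and the standard error of estimation relatively low. 2. The Reading test shows construct validity: factor analysis at every level extracted a single reading-comprehension factor explaining at least 66.91% of the variance, and at least 87.5% of the items at each level fit the model. 3. The item difficulties of the four tests are well distributed. 4. When the Level 3 and Level 4 tests were each split in half and the halves merged into a single test, the test information and standard errors of estimation at the passing cutoffs were comparable to those of the original Level 3 test and slightly poorer than those of the original Level 4 test. Merging these two levels into one test therefore appears feasible in practice, although the proportions of item difficulties would need to be adjusted when assembling the test.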
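The two reliability figures reported above come from simple formulas. The sketch below is a minimal Python illustration, not the authors' procedure. `kr20` implements KR-20 = k/(k-1) * (1 - Σ p_i q_i / σ²_X) for a 0/1 response matrix. `se_based_reliability` uses one common way of turning IRT standard errors into a reliability: (observed variance of ability estimates - mean squared SE) / observed variance. The abstract does not say which conversion the study used, so that formula, both function names, and the toy data are assumptions.

```python
from statistics import mean, pvariance


def kr20(responses):
    """KR-20 for dichotomous (0/1) scores; rows are examinees, columns are items."""
    n_items = len(responses[0])
    totals = [sum(row) for row in responses]
    # Population variance of total scores, consistent with the p*q item variances below.
    total_var = pvariance(totals)
    # Sum of item variances p_i * (1 - p_i), where p_i is the proportion answering item i correctly.
    item_var_sum = 0.0
    for j in range(n_items):
        p = mean(row[j] for row in responses)
        item_var_sum += p * (1 - p)
    return n_items / (n_items - 1) * (1 - item_var_sum / total_var)


def se_based_reliability(thetas, standard_errors):
    """Assumed conversion: (observed variance - mean error variance) / observed variance."""
    observed_var = pvariance(thetas)
    error_var = mean(se ** 2 for se in standard_errors)
    return (observed_var - error_var) / observed_var


if __name__ == "__main__":
    # Hypothetical toy data: 5 examinees x 4 items.
    toy = [
        [1, 1, 1, 1],
        [1, 1, 1, 0],
        [1, 1, 0, 0],
        [1, 0, 0, 0],
        [0, 0, 0, 0],
    ]
    print(round(kr20(toy), 3))  # 0.8
    print(round(se_based_reliability([-1.0, 0.0, 1.0], [0.3, 0.3, 0.3]), 3))  # 0.865
```

With real data the response matrix would have one row per examinee. The ability estimates and their standard errors would come from the IRT calibration rather than being typed in by hand.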

English Abstract

The purpose of this study is to investigate the reliability, validity, and vertical equating of the Reading subtest of the Test of Chinese as a Foreign Language. The reading section includes four levels: Level 2, 3, 4, and 5. The data analyzed were examinee responses sampled from the official administrations of the test in May and November 2011 and from the pretest version administered in 2012. The results were as follows. First, the Kuder-Richardson 20 (KR-20) coefficients were close to or higher than .90. Moreover, each test provided large test information, and hence low standard errors of estimation, at the cutoff value that determines whether an examinee passes or fails. Second, factor analysis extracted only one factor, which accounted for at least 66.91% of the variance, and Rasch analysis revealed that at least 87.5% of the items fit the model well. Third, each level of the test covers a suitable range of item difficulties. Finally, the items of Level 3 and Level 4 were split to assemble two composite tests, and test information at the cutoff values was computed for the even-numbered items of Levels 3 and 4, the odd-numbered items of Levels 3 and 4, and the items of Levels 3 and 4. The standard errors of estimation at the cutoff values for the composite tests were similar to those of Level 3 but slightly poorer than those of Level 4. This suggests that these two adjacent levels could be combined into a single composite level in the future, reducing the burden on examinees and test developers. However, the item difficulty distribution of the composite test would need to be adjusted.

Topic Category

Humanities > Linguistics
Social Sciences > Education
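Times Cited

  1. 楊凱琳, 陳建亨 (2021). 題型對學生數學表現水準之影響-以相似形為例 [The effect of item type on students' mathematics performance levels: The case of similar figures]. 教育科學研究期刊 [Journal of Research in Education Sciences], 66(3), 247-277.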