Investigating the Score Dependability and Decision Dependability of the GEPT Listening Test: A Multivariate Generalizability Theory Approach

Chinese title: 全民英語能力分級檢定聽力測驗信度與決策信度之研究:多元概化理論分析法

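Abstract (translated from the Chinese)


Reliability estimation methods based on classical test theory are widely used to assess reliability and are familiar to and accepted by most practitioners. However, these traditional methods carry implicit problems and limitations, and they are not well suited to the analysis of criterion-referenced tests. Generalizability theory overcomes these limitations and can evaluate reliability effectively. This paper applies multivariate generalizability theory to examine the score dependability and decision dependability of the listening test of the General English Proficiency Test (GEPT). This approach examines multiple sources of measurement error simultaneously, and it gives a more accurate assessment of both the dependability of the listening test and the decision dependability of the established passing standard. A total of 609 first-year university students in Taiwan took part in the study, each taking one form of the GEPT intermediate-level listening test. Each item was coded according to the listening ability it assesses, in two categories: the ability to understand literal meaning and the ability to understand implied meaning. The results show that this test form has good dependability and that the passing standard currently in use is also reasonably dependable.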

Parallel Abstract


Classical test theory (CTT) reliability estimates are commonly used and widely accepted as a way of estimating reliability. However, several limitations of the CTT model for reliability estimation have been identified, particularly in the context of criterion-referenced tests. Generalizability (G) theory is one technique that can address these limitations and help to examine dependability issues. The purpose of this paper is to investigate the score dependability and decision dependability of the listening section of the General English Proficiency Test (GEPT). One form of the intermediate-level GEPT listening test was administered to 609 Taiwanese college freshmen. The items were then coded for the ability to understand literal meaning (operationalized as literal-explicit items) and the ability to understand pragmatic meaning (operationalized as pragmatic-implicit items). A multivariate G theory analysis was conducted to examine the relative effects of multiple sources of variance in the listening scores, the dependability of the composite scores for the listening test, and the dependability of decisions made at the current cut-off score. Both literal-explicit and pragmatic-implicit items were found to work almost equally well. The current predetermined cut-off score yielded moderately high dependability of decisions.
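To make the two quantities named in the abstract more concrete, the following is a minimal sketch of how a score dependability index (Phi) and a cut-score dependability index (Phi-lambda) can be estimated. It is not the study's analysis: the paper reports a multivariate design with two item categories, whereas this sketch assumes a simpler univariate persons-by-items design with every person answering every item. The function names, the toy score matrix, and the cut score of 0.8 are all invented for illustration, and clamping a negative estimate of the squared distance between the mean and the cut score at zero is a simplifying choice made here.

```python
def g_study_p_by_i(scores):
    """Estimate variance components for a crossed persons x items design.

    `scores` is a list of rows (one per person), each a list of item scores.
    Components come from the usual ANOVA expected-mean-square equations;
    estimates can be negative in small samples and are not adjusted here.
    """
    n_p, n_i = len(scores), len(scores[0])
    grand = sum(map(sum, scores)) / (n_p * n_i)
    p_means = [sum(row) / n_i for row in scores]
    i_means = [sum(row[j] for row in scores) / n_p for j in range(n_i)]

    ss_p = n_i * sum((m - grand) ** 2 for m in p_means)
    ss_i = n_p * sum((m - grand) ** 2 for m in i_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_res = ss_total - ss_p - ss_i  # person-by-item interaction plus error

    ms_p = ss_p / (n_p - 1)
    ms_i = ss_i / (n_i - 1)
    ms_res = ss_res / ((n_p - 1) * (n_i - 1))

    return {
        "p": (ms_p - ms_res) / n_i,   # person (universe-score) variance
        "i": (ms_i - ms_res) / n_p,   # item difficulty variance
        "pi": ms_res,                 # interaction/residual variance
        "mean": grand, "n_p": n_p, "n_i": n_i,
    }


def absolute_error_variance(g, n_items=None):
    """Absolute error variance for a test of `n_items` items."""
    n = n_items or g["n_i"]
    return (g["i"] + g["pi"]) / n


def phi(g, n_items=None):
    """Score dependability index for absolute decisions."""
    return g["p"] / (g["p"] + absolute_error_variance(g, n_items))


def phi_lambda(g, cut, n_items=None):
    """Decision dependability at cut score `cut` (mean-score metric)."""
    # Variance of the observed grand mean, used to correct (mean - cut)^2.
    var_mean = (g["p"] / g["n_p"] + g["i"] / g["n_i"]
                + g["pi"] / (g["n_p"] * g["n_i"]))
    # Simplifying choice in this sketch: clamp a negative estimate at zero.
    dist_sq = max((g["mean"] - cut) ** 2 - var_mean, 0.0)
    signal = g["p"] + dist_sq
    return signal / (signal + absolute_error_variance(g, n_items))


# Toy data (hypothetical): 4 persons x 3 dichotomously scored items.
toy = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
g = g_study_p_by_i(toy)
print(round(phi(g), 3), round(phi_lambda(g, cut=0.8), 3))
```

In this sketch the cut score is expressed on the proportion-correct scale. Phi-lambda rises as the cut score moves away from the group mean, which is why decision dependability is reported for a specific cut-off rather than for the test as a whole.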

