Development of an Adaptive Diagnostic Assessment of Chinese Reading Ability

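This study developed an adaptive diagnostic assessment system for Chinese reading ability. The system evaluates overall reading comprehension and also diagnoses proficiency in five reading subskills: word recognition, literal comprehension, contextual integration, inferential comprehension, and analysis and evaluation. It is intended for students in grades 2 through 12 and is the first Chinese reading test in the field to span multiple learning stages. The test uses modern test theory to estimate item difficulty and student ability parameters. It is supported by an item bank and norms and is administered as a computerized adaptive test, so it can locate a student's reading level quickly and precisely and track changes in ability over time. Analyses indicate that the test has good test-retest reliability, criterion-related validity, conditional reliability, and IRT validity. These results show that the test is of high quality and can assess students' Chinese reading ability effectively and consistently.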
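The following is a rough illustration of how modern test theory places items and students on one scale. It computes the probability of a correct answer and an expected a posteriori (EAP) ability estimate for a student whose item difficulties are already known. The sketch assumes the simplest special case of the MRCMLM named in the English abstract: a unidimensional dichotomous Rasch model with a standard normal prior, approximated on an evenly spaced grid. The function names and grid settings are hypothetical, and the sketch does not reproduce the study's marginal maximum likelihood calibration.

```python
import math


def rasch_prob(theta: float, b: float) -> float:
    """Probability of a correct answer under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def eap_estimate(responses, difficulties, n_points=81, lo=-4.0, hi=4.0):
    """EAP ability estimate with a N(0, 1) prior evaluated on a fixed grid.

    responses: list of 0/1 scores; difficulties: matching item difficulties.
    Returns (posterior mean, posterior standard deviation).
    """
    step = (hi - lo) / (n_points - 1)
    grid = [lo + i * step for i in range(n_points)]
    weights = []
    for t in grid:
        weight = math.exp(-0.5 * t * t)  # unnormalised standard normal prior
        for x, b in zip(responses, difficulties):
            p = rasch_prob(t, b)
            weight *= p if x == 1 else 1.0 - p  # likelihood of each response
        weights.append(weight)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(grid, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(grid, weights)) / total
    return mean, math.sqrt(var)


# Example: three items of increasing difficulty, first two answered correctly.
theta_hat, se = eap_estimate([1, 1, 0], [-1.0, 0.0, 1.0])
print(f"EAP theta = {theta_hat:.2f}, posterior SD = {se:.2f}")
```

The posterior standard deviation returned alongside the estimate is a natural measure of how precisely a short test has located the student.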

Keywords

reading comprehension; computerized adaptive testing; diagnosis; Chinese reading ability

Abstract (English version)

Reading comprehension is essential for learning in every subject and for lifelong learning. It is also a crucial ability that allows people to communicate and interact with one another. For this reason, large-scale international assessments such as the Progress in International Reading Literacy Study (PIRLS) and the Programme for International Student Assessment (PISA) include reading as an indicator of learning outcomes. This study likewise recognizes the central importance of reading comprehension. However, existing reading comprehension tests have several limitations. Most target students in specific grades (e.g., elementary school students) or specific groups (e.g., students with special needs). Most are also paper-and-pencil tests with fixed items, which require substantial resources to administer and score. Currently, no Chinese reading comprehension assessment is suitable for long-term use in general classrooms. The purpose of this study was therefore to develop the Diagnostic Assessment of Chinese Competence (DACC), a computerized adaptive test that comprehensively evaluates students' reading abilities, and to verify its reliability and validity. The DACC assesses students' overall reading comprehension as well as their performance on five reading subskills: vocabulary (word recognition), literal comprehension, contextual integration, inferential comprehension, and analysis and evaluation. The system was designed for students from grade 2 to grade 12. The DACC test items were drafted by school teachers, doctoral students in psychology, and professionals engaged in research on the Chinese language. All drafters had to attend and pass training before contributing items. Item topics were chosen to be familiar to students, such as topics related to daily or school life.
Topics are not limited to the language arts; they also cover life experience, history, geography, and science. Assessment texts appear in various formats, including continuous, non-continuous, mixed, and multiple texts, as well as texts presented as hypertext. Text types also vary and include narrative, expository, descriptive, and argumentative writing. This range reflects the real-world reading situations that students encounter in their lives. Most DACC items are testlets. Within each testlet, every question corresponds to one of five dimensions: vocabulary, literal comprehension, contextual integration, inferential comprehension, and analysis and evaluation. This design measures student performance on each dimension and yields a comprehensive profile of reading ability once the DACC is completed. All items were pilot tested to collect actual student responses. These responses served two purposes: checking whether the questions functioned as designed and estimating item parameters. All DACC items were vertically equated using a nonequivalent groups with anchor test (NEAT) design. Respondent characteristics were also considered in the pilot tests. Stratified random sampling was used to recruit students from both urban and rural areas so that the item parameter estimates would generalize to the whole student population. Items were scored dichotomously in the pilot tests. At least 300 responses were collected for each item, and both classical test theory (CTT) and item response theory (IRT) were used to analyze them.
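The abstract states only that items were vertically equated under a NEAT design. One common approach for Rasch-family models is to calibrate each form separately and then shift the new form's difficulties by the mean difference of the shared anchor items. It is shown here as an assumption, not as the study's procedure. The helper names and toy difficulties are hypothetical.

```python
def mean_shift_link(anchor_base, anchor_new):
    """Constant that moves a separately calibrated form onto the base scale.

    anchor_base, anchor_new: dicts mapping item id -> Rasch difficulty, from the
    base calibration and the new calibration. Only items present in both
    (the anchor items) are used.
    """
    common = sorted(set(anchor_base) & set(anchor_new))
    if not common:
        raise ValueError("the two calibrations share no anchor items")
    return sum(anchor_base[i] - anchor_new[i] for i in common) / len(common)


def place_on_base_scale(new_difficulties, shift):
    """Add the linking constant to every difficulty from the new calibration."""
    return {item: b + shift for item, b in new_difficulties.items()}


# Hypothetical example: a grade-4 form shares anchors A1 and A2 with grade 3.
grade3 = {"A1": 0.2, "A2": 0.6, "G3_01": -0.5}
grade4 = {"A1": -0.8, "A2": -0.4, "G4_01": 0.5}
shift = mean_shift_link(grade3, grade4)
print(place_on_base_scale(grade4, shift))
```

An alternative is concurrent calibration of all forms in a single run, which avoids a separate linking step. The abstract does not say which approach was used.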
In the IRT analysis, item parameters were estimated with the multidimensional random coefficients multinomial logit model (MRCMLM) using marginal maximum likelihood estimation. Ability parameters were estimated with expected a posteriori (EAP) estimation. In the CTT analysis, the pass rate and item discrimination were calculated for each item. Two indicators were used to screen items for favorable psychometric properties. For IRT, the information-weighted mean square fit statistic (infit MNSQ) was used to rule out misfitting items, and items with infit MNSQ values between 0.6 and 1.4 were retained. For CTT, item discrimination was the indicator, and items with discrimination of .30 or higher were retained. Only items meeting both criteria entered the formal item bank of the DACC, which contained 1,019 items after the analyses. Item difficulties span a range wider than -2 to 2, covering the ability levels of most students. This range also indicates that the DACC is suitable for assessing the reading comprehension of students from grade 2 to grade 12. To improve the effectiveness of the DACC, this study implemented it as a computerized adaptive test. Abilities were estimated with maximum a posteriori (MAP) estimation. For item selection, the system computed the Fisher information of the available items each time a student finished answering a set of questions. It then randomly selected the next question from the five items with the highest information. The test ended when the number of items answered reached a preset length. Finally, this study provided a set of reference norms for interpreting students' test results.
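The two retention rules described above can be expressed as a small filter. For the infit mean square, this sketch assumes Rasch-model probabilities and divides the sum of squared residuals over examinees by the sum of model variances. For the CTT discrimination index, it uses the corrected item-total (point-biserial) correlation. The abstract does not specify which discrimination index was used, so that choice and all function names are assumptions.

```python
import math


def rasch_prob(theta, b):
    """Probability of a correct answer under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def infit_mnsq(item_scores, thetas, b):
    """Information-weighted mean square: sum of squared residuals over
    examinees divided by the sum of model variances P(1 - P)."""
    probs = [rasch_prob(t, b) for t in thetas]
    num = sum((x - p) ** 2 for x, p in zip(item_scores, probs))
    den = sum(p * (1.0 - p) for p in probs)
    return num / den


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either variable is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def corrected_discrimination(score_matrix, j):
    """Correlation of item j with the total score on the remaining items."""
    item = [row[j] for row in score_matrix]
    rest = [sum(row) - row[j] for row in score_matrix]
    return pearson(item, rest)


def keep_item(infit, discrimination, infit_range=(0.6, 1.4), min_disc=0.30):
    """Retain an item only if it passes both screening rules."""
    low, high = infit_range
    return low <= infit <= high and discrimination >= min_disc
```

In use, `thetas` would come from the calibration run and `score_matrix` would hold the pilot responses for one form, one row per student.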
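The adaptive loop described above can be sketched as follows. It combines a MAP ability update, Fisher information for item selection, a random choice among the five most informative items, and a fixed test length. The sketch assumes a unidimensional Rasch bank with known difficulties and a standard normal prior. It also selects after every single item rather than after each testlet. The bank, the simulated examinee, and all function names are hypothetical.

```python
import math
import random


def rasch_prob(theta, b):
    """Probability of a correct answer under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def map_estimate(responses, difficulties, prior_sd=1.0, max_iter=50):
    """MAP ability estimate with a N(0, prior_sd^2) prior, via Newton-Raphson.

    The log posterior is concave under the Rasch model, so Newton steps
    converge; the estimate is clamped to [-6, 6] as a safety guard.
    """
    theta = 0.0
    for _ in range(max_iter):
        probs = [rasch_prob(theta, b) for b in difficulties]
        grad = sum(x - p for x, p in zip(responses, probs)) - theta / prior_sd ** 2
        hess = -sum(p * (1.0 - p) for p in probs) - 1.0 / prior_sd ** 2
        step = grad / hess
        theta = max(-6.0, min(6.0, theta - step))
        if abs(step) < 1e-8:
            break
    return theta


def fisher_info(theta, b):
    """Fisher information of a Rasch item at ability theta."""
    p = rasch_prob(theta, b)
    return p * (1.0 - p)


def select_next(theta, bank, used, rng, top_k=5):
    """Pick at random among the top_k most informative unused items."""
    candidates = [item for item in bank if item not in used]
    candidates.sort(key=lambda item: fisher_info(theta, bank[item]), reverse=True)
    return rng.choice(candidates[:top_k])


def run_cat(bank, answer, test_length, seed=0):
    """Fixed-length CAT: select, score, re-estimate until test_length items.

    bank: dict item id -> difficulty; answer: callable item id -> 0 or 1.
    """
    if test_length > len(bank):
        raise ValueError("test length exceeds bank size")
    rng = random.Random(seed)
    theta, used, responses, difficulties = 0.0, [], [], []
    while len(used) < test_length:
        item = select_next(theta, bank, set(used), rng)
        used.append(item)
        responses.append(answer(item))
        difficulties.append(bank[item])
        theta = map_estimate(responses, difficulties)
    return theta, used


if __name__ == "__main__":
    # Hypothetical bank of 51 items with difficulties from -2.5 to 2.5.
    bank = {f"item{i:02d}": -2.5 + 0.1 * i for i in range(51)}
    sim = random.Random(42)
    true_theta = 0.8  # ability of the simulated student

    def simulated_answer(item):
        return 1 if sim.random() < rasch_prob(true_theta, bank[item]) else 0

    estimate, items = run_cat(bank, simulated_answer, test_length=20)
    print(f"estimated theta = {estimate:.2f} after {len(items)} items")
```

The loop draws at random from the top five items instead of always taking the single most informative one. This limits how often the best-matched items are reused, a simple form of exposure control.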
A total of 38,099 students from 1,255 schools in Taiwan were included in the study, and the mean DACC score was calculated for each grade. Students who complete the assessment can therefore compare their results against these norms to understand their level of performance. The norms provide clear, objective standards that help DACC users judge the grade level of a student's reading ability. Teachers can thus determine whether their students' reading abilities meet the expected level and adjust follow-up instruction based on the results. In addition to these construction procedures, this study examined the reliability and validity of the system. For test-retest reliability, the study analyzed the scores of 1,449 students who took the test twice. The average correlation between their two scores was .76, indicating good reliability. The conditional reliability of the DACC under IRT was also high. Across the test results of 16,479 students, the average conditional reliability exceeded .80, indicating stable and high reliability for students at different levels of reading ability. Validity was examined in terms of criterion-related validity. The study analyzed 2,332 ninth-grade students who took both the DACC and the Comprehensive Assessment Program for Junior High School Students, a large-scale standardized test that all junior high school graduates in Taiwan must take. The correlation between the two sets of scores was moderate (.64). In addition, construct validity analyses showed that all DACC items fit the MRCMLM.
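The abstract does not give the formula behind its conditional reliability figures. This sketch assumes a definition often used with IRT scores: ρ(θ) = 1 − SE(θ)² / σ²_θ. Here SE(θ) = 1/√I(θ) is derived from the test information of the items a student actually answered, and σ²_θ is the population variance of ability. The sketch computes ρ(θ) for each examinee and then averages across examinees. All names and inputs are hypothetical.

```python
import math


def rasch_prob(theta, b):
    """Probability of a correct answer under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def total_information(theta, difficulties):
    """Sum of Rasch item informations P(1 - P) for the items administered."""
    return sum(rasch_prob(theta, b) * (1.0 - rasch_prob(theta, b)) for b in difficulties)


def conditional_reliability(theta, difficulties, ability_var):
    """1 - SE(theta)^2 / var(theta), with SE(theta) = 1 / sqrt(I(theta))."""
    se_squared = 1.0 / total_information(theta, difficulties)
    return 1.0 - se_squared / ability_var


def average_conditional_reliability(records, ability_var):
    """Mean conditional reliability over examinees.

    records: list of (theta_hat, difficulties of the items that examinee saw).
    """
    values = [conditional_reliability(t, ds, ability_var) for t, ds in records]
    return sum(values) / len(values)
```

Values can fall below zero when a student saw too few informative items; this sketch does not clamp them.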
In summary, this study followed a series of rigorous procedures to construct the DACC and verified its reliability and validity. IRT was used to estimate item parameters, determining item difficulty and student ability levels. An item bank and ability norms were also established for the system. Together, these enable computerized adaptive testing that can efficiently determine reading comprehension levels and track growth in reading ability over time. The results for test-retest reliability, conditional reliability, criterion-related validity, and IRT validity indicate that the DACC provides a stable and effective assessment of student reading ability. In future work, the DACC item bank will be expanded, and an item exposure control mechanism could be adopted to improve the system's effectiveness. Moreover, as a comprehensive assessment tool spanning multiple learning stages, the DACC can provide empirical evidence for addressing problems related to reading comprehension and can contribute substantially to related fields of research.

Keywords (English version)

reading comprehension; computerized adaptive testing; diagnosis; Chinese reading ability
