Dissertation

Nonparametric Item Response Theory for Continuous Items and its Applications

Advisors: 施慶麟, 陳淑英

Abstract


In applying the continuous response format (CRF), test constructors usually face a critical challenge in fitting parametric models. Models for the CRF typically impose additional assumptions about test properties, such as a specific functional relationship between the item response and the difficulty parameter. Meeting these requirements becomes increasingly difficult for real test data, especially because no method exists for evaluating the model assumptions. This dissertation therefore proposes nonparametric item response theory for continuous items (NIRTC) as a more flexible approach to analyzing continuous test data. The core of NIRTC is the item response surface function (IRSF), which can represent various continuous response models depending on the assumptions placed on it. To establish a methodology for NIRTC, five interrelated simulation studies and one empirical study were conducted.

The author first sought to establish rules of thumb for judging whether an item is scalable in an NIRTC analysis. The simulation results revealed that the H coefficient mainly reflected item quality and that the traditional rule of thumb (i.e., c > .3) may be too stringent and is debatable. It is suggested that the criterion be chosen according to both the test properties and the test purpose. Practical stochastic ordering of the latent trait (SOL), a core property of NIRTC, was then investigated by extending the work of Sijtsma and van der Ark (2001). A key finding was that test scores conforming to NIRTC models consistently ordered examinees more accurately than scores from their discrete counterparts. Since practical SOL has been shown to hold in nonparametric IRT, it should hold for NIRTC as well. The study also supports the use of NIRTC models when the sample size is small. Studies 3 and 4 then evaluated the performance of the automated item selection procedure (AISP).
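The following is a minimal Python sketch of two quantities discussed above, written under stated assumptions rather than as the dissertation's implementation. It assumes that the scalability coefficient H for continuous item scores is defined, as in Mokken scaling, as the ratio of observed inter-item covariances to their maximum attainable values given the item marginals; for any pair of items that maximum is reached by pairing the two items' sorted scores. The helper names `scalability_h` and `ordering_accuracy` are hypothetical, and `ordering_accuracy` (the share of examinee pairs whose total-score order matches their latent-trait order) is only a simple stand-in for the SOL accuracy criterion used in the simulations.

```python
import numpy as np


def scalability_h(X):
    """Item-level H_i and total-scale H for an (n_persons, n_items) score matrix.

    Assumption: H = (sum of inter-item covariances) / (sum of their maxima given
    the item marginals). The maximum covariance of a pair is obtained by sorting
    each item's scores independently (the comonotone arrangement).
    """
    X = np.asarray(X, dtype=float)
    cov = np.cov(X, rowvar=False)                       # observed inter-item covariances
    cov_max = np.cov(np.sort(X, axis=0), rowvar=False)  # maxima given the marginals
    np.fill_diagonal(cov, 0.0)                          # only item pairs count
    np.fill_diagonal(cov_max, 0.0)
    h_items = cov.sum(axis=1) / cov_max.sum(axis=1)
    h_scale = cov.sum() / cov_max.sum()
    return h_items, h_scale


def ordering_accuracy(total, theta):
    """Share of examinee pairs (untied on both variables) whose total-score
    order agrees with their latent-trait order (hypothetical stand-in)."""
    total = np.asarray(total, dtype=float)
    theta = np.asarray(theta, dtype=float)
    d_total = np.sign(total[:, None] - total[None, :])
    d_theta = np.sign(theta[:, None] - theta[None, :])
    untied = (d_total != 0) & (d_theta != 0)
    return float(np.mean(d_total[untied] == d_theta[untied]))


# Usage with simulated continuous scores: one latent trait plus item noise.
rng = np.random.default_rng(0)
theta = rng.normal(size=300)
X = theta[:, None] + rng.normal(size=(300, 5))
h_items, h_scale = scalability_h(X)
print(np.round(h_items, 2), round(float(h_scale), 2))
print(ordering_accuracy(X.sum(axis=1), theta))
```

Comparing each H_i with a lower bound c (traditionally .3) is the kind of rule-of-thumb check whose strictness the simulations call into question; for dichotomous items this covariance ratio reduces to the familiar Loevinger/Mokken H.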
For detecting aberrant items, the results indicated that the AISP could efficiently remove reverse-worded items, items following an ideal-point response process, and items with a high percentage of random guessing. However, the AISP had almost no power to detect items with differential item functioning (DIF). The author further found that the sample size required for per element accuracy (PEA) to reach a pre-specified criterion during test construction increases as item quality decreases and as the generating model becomes more complex. For conditions with high-quality items, even 100 examinees are sufficient for an NIRTC analysis. However, for conditions with a higher correlation of latent variables and item quality, a new strategy to improve PEA may be necessary. To provide a DIF method for continuous items, the author proposed the continuous SIBTEST procedure (CSIB). Compared with the continuous Mantel-Haenszel (MH) method, CSIB generally controlled Type I error rates well and yielded relatively high power in most conditions. Taking the burden of implementation into account as well, CSIB with the flexible grouping method is especially recommended for assessing DIF. Using the established methodology, the author conducted a thorough analysis of the Teacher Self-efficacy Scale (TSS), including aberrant-item detection, dimensionality assessment, evaluation of model assumptions, and DIF assessment. As a result, the 9-item TSS is considered an SDMM scale with potential for further applications.

Based on these studies, the methodology for NIRTC is considered well established in this dissertation, and NIRTC is ready for use in test construction. Apart from practical applications, NIRTC also shows its capability for examining and integrating item response models (IRMs): about fifty IRMs with various assumptions, parametric forms, response formats, and other constraints fall within NIRTC.
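A minimal Python sketch of the SIBTEST-style logic that CSIB builds on, under assumptions not taken from the dissertation: examinees are stratified on a continuous matching score (here equal-frequency quantile bins, a hypothetical stand-in for the grouping methods compared in the study), the uniform-DIF effect β is the focal-weighted average of reference-minus-focal mean differences on the studied item, and the test statistic is β divided by its standard error. SIBTEST's regression correction of the stratum means is omitted for brevity, and the function name `csib_beta` is hypothetical.

```python
import numpy as np


def csib_beta(y, match, group, n_bins=10):
    """Uniform-DIF effect size and z statistic for one continuous studied item.

    y      : studied-item scores
    match  : continuous matching score (e.g., sum of the other items)
    group  : 0 = reference group, 1 = focal group
    n_bins : number of equal-frequency strata on the matching score (assumption)
    """
    y = np.asarray(y, dtype=float)
    match = np.asarray(match, dtype=float)
    group = np.asarray(group)

    # Equal-frequency stratum boundaries; clip so the maximum falls in the last stratum.
    edges = np.quantile(match, np.linspace(0.0, 1.0, n_bins + 1))
    strata = np.clip(np.searchsorted(edges, match, side="right") - 1, 0, n_bins - 1)

    diffs, weights, var_terms = [], [], []
    for k in range(n_bins):
        in_k = strata == k
        ref = y[in_k & (group == 0)]
        foc = y[in_k & (group == 1)]
        if ref.size < 2 or foc.size < 2:
            continue  # a stratum needs both groups (n >= 2) to contribute
        diffs.append(ref.mean() - foc.mean())
        weights.append(foc.size)  # weight strata by focal-group frequency
        var_terms.append(ref.var(ddof=1) / ref.size + foc.var(ddof=1) / foc.size)

    p = np.asarray(weights, dtype=float) / np.sum(weights)
    beta = float(np.sum(p * np.asarray(diffs)))
    se = float(np.sqrt(np.sum(p ** 2 * np.asarray(var_terms))))
    return beta, beta / se


# Usage with simulated data: equal trait distributions, DIF of 0.5 against the focal group.
rng = np.random.default_rng(1)
n = 2000
group = rng.integers(0, 2, size=n)
theta = rng.normal(size=n)
match = (theta[:, None] + rng.normal(size=(n, 8))).sum(axis=1)
y = theta + rng.normal(size=n) - 0.5 * group
print(csib_beta(y, match, group))
```

A positive β indicates that, after matching, the studied item favors the reference group; comparing |z| with a standard normal critical value gives the significance test. Changing `n_bins` or the binning rule is where a "flexible grouping" choice would enter this sketch.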
In brief, the development of NIRTC not only improves methods for scale construction but also provides a general framework for examining most IRMs within IRT. Several issues remain for future study to make NIRTC complete, such as developing methods for assessing dimensionality, extending CSIB to non-uniform DIF, and proposing continuous cognitive diagnostic models and continuous unfolding models.

References


Almenberg, J., & Dreber, A. (2011). When does the price affect the taste? Results from a wine experiment. Journal of Wine Economics, 6, 111-121.
Alwin, D. F. (1997). Feeling thermometers versus 7-point scales: Which are better? Sociological Methods & Research, 25, 318-340.
Chang, H.-H., Mazzeo, J., & Roussos, L. (1996). Detecting DIF for polytomously scored items: An adaptation of the SIBTEST procedure. Journal of Educational Measurement, 33, 333-353.
Chen, J.-H., Shih, C.-L., & Chen, S.-Y. (2014, April). The continuous response model with random-effect for modeling subjective judgment in rating scale items. Paper presented at the 2014 Annual Meeting of the American Educational Research Association, Philadelphia, PA, USA.
Coombs, C. H. (1964). A theory of data. New York: Wiley.
