ISSN 0439-755X
CN 11-1911/B
Sponsored by: Chinese Psychological Society
   Institute of Psychology, Chinese Academy of Sciences
Published by: Science Press

Acta Psychologica Sinica ›› 2016, Vol. 48 ›› Issue (10): 1347-1356. doi: 10.3724/SP.J.1041.2016.01347

• Article •

Using confirmatory compensatory multidimensional IRT models to do cognitive diagnosis

ZHAN Peida; CHEN Ping; BIAN Yufang

  1. (Collaborative Innovation Center of Assessment toward Basic Education Quality, Beijing Normal University, Beijing 100875, China)
  • Received: 2015-12-21 Online: 2016-10-25 Published: 2016-10-25
  • Contact: CHEN Ping, E-mail: pchen@bnu.edu.cn; BIAN Yufang, E-mail: bianyufang66@126.com
  • Funding:

    Supported by the National Natural Science Foundation of China Young Scientists Fund (31300862), the Specialized Research Fund for the Doctoral Program of Higher Education, New Teachers category (20130003120002), and the Open Project of the Key Laboratory of Applied Statistics of the Ministry of Education, Northeast Normal University (KLAS 130028614).

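Abstract (Chinese-language version):

As demand for fine-grained test feedback grows, measurement methods with a cognitive diagnostic function have attracted increasing attention. While cognitive diagnostic models (CDMs) have been in the spotlight, multidimensional IRT models (MIRTMs), another class of models that can provide fine-grained feedback on a continuous scale, appear to have been somewhat overlooked. To explore the latent cognitive diagnostic function of MIRTMs, this paper takes compensatory models as its perspective and focuses on the multidimensional two-parameter logistic model (M2PLM), which belongs to the MIRTMs, and the linear logistic model (LLM), which belongs to the CDMs. To make the two comparable, a confirmatory matrix (Q matrix) is introduced into the compensatory M2PLM to define the relationship between items and dimensions, yielding the confirmatory compensatory M2PLM (CC-M2PLM); the latent traits are then divided at a cutoff point into trans-border attributes, so that the CC-M2PLM can exhibit the cognitive diagnostic function it should inherently have. A pilot study showed that the zero point on the logistic scale is a relatively reasonable cutoff point. A simulation study then compared the cognitive diagnostic function of the CC-M2PLM and the LLM; the results showed that the CC-M2PLM can be used to analyze diagnostic test data and that its diagnostic performance is comparable to that of using the LLM directly. Finally, two empirical datasets are used to illustrate the feasibility of the CC-M2PLM in the analysis of real diagnostic tests.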


Abstract:

Traditional testing methods, such as classical test theory or unidimensional item response theory models (UIRMs), typically provide a single sum score or an overall ability estimate. Advances in psychometrics have focused on measuring multiple dimensions of ability in order to give students more detailed and refined feedback. In recent years, cognitive diagnostic models (CDMs) have received great attention, particularly in educational and psychological measurement. The outcome of a CDM analysis is, for each person, a profile over a set of attributes, α, also called a latent class; this profile provides cognitive diagnostic information about which of the distinct skills underlying a test a student has or has not mastered. During the same period, another kind of model, the multidimensional IRT models (MIRTMs), which can also provide fine-grained information about students’ strengths and weaknesses in the learning process, has been neglected. MIRTMs differ from CDMs in that their latent variables are continuous (namely, latent traits, θ) rather than categorical (typically binary). Categorical variables in CDMs may, however, be too coarse to describe students’ skills when compared with the continuous latent traits in MIRTMs. Diagnostic measurement is the process of analyzing data from a diagnostic assessment for the purpose of making classification-based decisions. Currently, all testing methods that have a cognitive diagnostic function require substantive information about the attributes involved in specific items. For CDMs in particular, a confirmatory matrix indicating which latent variables are required by each item, often referred to as the Q matrix, is an essential component for analyzing response data. Such confirmatory matrices also exist in some MIRTMs, for example the scoring matrix in the multidimensional random coefficients multinomial logit model.
Therefore, it can be inferred that MIRTMs, when formulated as confirmatory models defined by a Q matrix, may also have diagnostic potential. Although some articles have noted this viewpoint (e.g., Embretson & Yang, 2013; Stout, 2007; Wang & Nydick, 2015), none has actually explored the diagnostic potential of confirmatory MIRTMs (C-MIRTMs). The main reason is presumably that latent traits in MIRTMs are continuous and therefore cannot be used directly to make classification-based diagnostic decisions. Whether MIRTMs or CDMs, multidimensional models can generally be classified as compensatory or non-compensatory according to the relationship among dimensions. In compensatory models, a high level on one dimension can compensate for lower levels on the other dimensions. Conversely, non-compensatory models assume that the dimensions are independent or partially independent of one another. Comparatively speaking, compensatory models are more general than non-compensatory models. Thus, owing to space limitations, this study considered only two compensatory models: the multidimensional 2-parameter logistic model (M2PLM) and the linear logistic model (LLM). To explore the cognitive diagnostic function of MIRTMs, a confirmatory compensatory M2PLM (CC-M2PLM) was first presented by introducing a Q matrix into the item response function of the M2PLM. A cutoff point (CP) was then used to transform the estimated latent traits in the CC-M2PLM into categorical variables (namely, trans-border attributes). This transformation can be carried out after the data analysis, so two kinds of results can be reported simultaneously: continuous latent traits and categorical trans-border attributes. A suitable CP is therefore very important, because different CPs lead to different classification results.
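The two steps just described (restricting the M2PLM with a Q matrix, then dichotomizing latent traits at a cutoff point) can be illustrated with a minimal Python sketch. The abstract does not state the item response function, so the slope-intercept form used here, with logit equal to d_j plus the sum over k of a_jk q_jk θ_k, is an assumption, and the function names `cc_m2pl_prob` and `to_trans_border_attributes` as well as all parameter values are hypothetical.

```python
import math
from typing import List, Optional


def cc_m2pl_prob(theta: List[float], a: List[float], q: List[int], d: float) -> float:
    """Probability of a correct response under an ASSUMED slope-intercept CC-M2PLM.

    theta: latent traits of one examinee, a: item slopes,
    q: the item's row of the Q matrix (0/1), d: item intercept.
    """
    # Dimensions with q_k = 0 drop out of the sum; this is the confirmatory restriction.
    # The sum is additive, so a high trait can offset a low one (compensatory).
    logit = d + sum(a_k * q_k * t_k for a_k, q_k, t_k in zip(a, q, theta))
    return 1.0 / (1.0 + math.exp(-logit))


def to_trans_border_attributes(
    theta: List[float], cutoffs: Optional[List[float]] = None
) -> List[int]:
    """Dichotomize latent traits: alpha_k = 1 if theta_k > CP_k, else 0."""
    if cutoffs is None:
        cutoffs = [0.0] * len(theta)  # CP_k = 0 on every dimension
    return [1 if t > cp else 0 for t, cp in zip(theta, cutoffs)]


# Hypothetical examinee and item, for illustration only.
theta = [0.8, -0.5, 0.0]
q_row = [1, 1, 0]          # the item measures dimensions 1 and 2 only
a_row = [1.2, 0.9, 1.5]
p = cc_m2pl_prob(theta, a_row, q_row, d=-0.2)
alpha = to_trans_border_attributes(theta)
```

Because the third entry of `q_row` is 0, changing the third latent trait leaves `p` unchanged, and a trait exactly at the cutoff is classified as non-mastery under the strict inequality.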
A simple pilot study was conducted to find a suitable CP: data generated with the CC-M2PLM but estimated with the LLM showed that the LLM divided the latent trait distribution approximately in half, with zero on the IRT scale being the point that separated masters (α = 1 if θ > 0) from non-masters (α = 0 if θ ≤ 0). Based on the pilot study, the CP was set to 0 for all dimensions (i.e., CP_k = 0). Parameters of the CC-M2PLM and the LLM can be estimated with the mirt and CDM packages in R, respectively. In the simulation study, a series of simulations was conducted to evaluate the cognitive diagnostic function of the CC-M2PLM. The response data were generated from the LLM and can therefore be treated as a diagnostic measurement dataset. Both the CC-M2PLM and the LLM were fitted to these data. The results showed that the pattern (profile) correct classification ratio (PCCR) and the attribute correct classification ratio (ACCR) of the trans-border attributes (from the CC-M2PLM) and of the estimated attributes (from the LLM) were almost the same, with most differences smaller than 1%. The simulation results indicated that the CC-M2PLM can be used for diagnostic measurement and that its cognitive diagnostic function is as good as that of the LLM. Finally, two empirical examples of diagnostic measurement were given to demonstrate applications and implications of the CC-M2PLM.
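The two classification indices named above can be sketched as follows. The abstract does not define them, so this assumes the usual definitions: PCCR is the proportion of examinees whose whole attribute profile is recovered, and ACCR is the per-attribute proportion of examinees classified correctly. The function names and the small data set are hypothetical and are not results from the study.

```python
from typing import List


def pccr(true_attrs: List[List[int]], est_attrs: List[List[int]]) -> float:
    """Proportion of examinees whose full attribute pattern is recovered exactly."""
    matches = sum(t == e for t, e in zip(true_attrs, est_attrs))
    return matches / len(true_attrs)


def accr(true_attrs: List[List[int]], est_attrs: List[List[int]]) -> List[float]:
    """For each attribute, the proportion of examinees classified correctly."""
    n_persons = len(true_attrs)
    n_attrs = len(true_attrs[0])
    return [
        sum(t[k] == e[k] for t, e in zip(true_attrs, est_attrs)) / n_persons
        for k in range(n_attrs)
    ]


# Hypothetical true and estimated profiles for four examinees and three attributes.
true_profiles = [[1, 0, 1], [0, 0, 1], [1, 1, 0], [0, 1, 1]]
est_profiles = [[1, 0, 1], [0, 1, 1], [1, 1, 0], [0, 1, 0]]

pattern_rate = pccr(true_profiles, est_profiles)
attribute_rates = accr(true_profiles, est_profiles)
```

In a comparison like the one described, `est_profiles` would hold either the trans-border attributes from the CC-M2PLM or the attributes estimated by the LLM, each scored against the same generating profiles.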

Key words: item response theory, multidimensional item response theory, cognitive diagnostic models, cognitive diagnosis, Q matrix, confirmatory factor analysis