Using confirmatory compensatory multidimensional IRT models to do cognitive diagnosis

doi:10.3724/SP.J.1041.2016.01347

Acta Psychologica Sinica, 2016, Vol. 48, Issue 10: 1347-1356.

ZHAN Peida; CHEN Ping; BIAN Yufang

(Collaborative Innovation Center of Assessment toward Basic Education Quality, Beijing Normal University, Beijing 100875, China)

Received: 2015-12-21 Online: 2016-10-25 Published: 2016-10-25
Contact: CHEN Ping, E-mail: pchen@bnu.edu.cn; BIAN Yufang, E-mail: bianyufang66@126.com
Funding:
Supported by the National Natural Science Foundation of China, Young Scientists Fund (31300862); the Specialized Research Fund for the Doctoral Program of Higher Education, New Teacher category (20130003120002); and the Open Project of the Key Laboratory of Applied Statistics of the Ministry of Education, Northeast Normal University (KLAS 130028614).

Summary/Abstract

Summary:

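As demand grows for more fine-grained feedback from tests, measurement methods with a cognitive diagnostic function have attracted increasing attention. While cognitive diagnostic models (CDMs) have been in the spotlight, multidimensional IRT models (MIRTMs), another class of models that can provide fine-grained feedback on a continuous scale, seem to have been somewhat neglected. To explore the potential cognitive diagnostic function of MIRTMs, this article takes the perspective of compensatory models and focuses on the multidimensional two-parameter logistic model (M2PLM), which is an MIRTM, and the linear logistic model (LLM), which is a CDM. To make the two comparable, a confirmatory matrix (Q matrix) is introduced into the compensatory M2PLM to define the relationship between items and dimensions, yielding the confirmatory compensatory M2PLM (CC-M2PLM); latent traits are then divided at a cutoff point into trans-border attributes, so that the CC-M2PLM can exhibit the cognitive diagnostic function it should inherently have. A pilot study indicated that the zero point on the logistic scale is a relatively reasonable cutoff point. A simulation study then compared the cognitive diagnostic functions of the CC-M2PLM and the LLM; the results showed that the CC-M2PLM can be used to analyze diagnostic test data and that its diagnostic performance is comparable to that of using the LLM directly. Finally, two empirical data sets are used to illustrate the feasibility of the CC-M2PLM in the analysis of real diagnostic tests.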

Abstract:

Traditional testing methods, such as classical test theory or unidimensional item response theory models (UIRMs), typically provide a single sum score or an overall ability estimate. Advances in psychometrics have focused on measuring multiple dimensions of ability to provide more detailed and refined feedback for students. In recent years, cognitive diagnostic models (CDMs) have received great attention, particularly in educational and psychological measurement. The outcome of a CDM analysis is, for each person, a profile over a set of attributes, α, also called a latent class; it provides cognitive diagnostic information about which of the distinct skills underlying a test a student has or has not mastered. During the same period, another kind of model, multidimensional IRT models (MIRTMs), which can also provide fine-grained information about students’ strengths and weaknesses in the learning process, has been neglected. MIRTMs differ from CDMs in that the latent variables in MIRTMs are continuous (namely, latent traits; θ) rather than categorical (typically binary). However, the categorical variables in CDMs may be too coarse to describe students’ skills when compared with the continuous latent traits in MIRTMs. Diagnostic measurement is the process of analyzing data from a diagnostic assessment for the purpose of making classification-based decisions. Currently, all testing methods with a cognitive diagnostic function require substantive information about the attributes involved in specific items. For CDMs in particular, a confirmatory matrix indicating which latent variables each item requires, often referred to as the Q matrix, is essential for analyzing response data. In fact, such confirmatory matrices also exist in some MIRTMs, for example the scoring matrix in the multidimensional random coefficients multinomial logit model.
It follows that MIRTMs, when formulated as confirmatory models defined by a Q matrix, may also have diagnostic potential. Although some articles have noted this point (e.g., Embretson & Yang, 2013; Stout, 2007; Wang & Nydick, 2015), none has actually explored the diagnostic potential of confirmatory MIRTMs (C-MIRTMs). The main reason is presumably that latent traits in MIRTMs are continuous and so cannot be used directly to make classification-based diagnostic decisions. Multidimensional models, whether MIRTMs or CDMs, can normally be specified as compensatory or non-compensatory according to the relationship among dimensions. In compensatory models, a high level on one dimension can compensate for lower levels on the other dimensions. Conversely, non-compensatory models assume that the dimensions are independent or partially independent of one another. Comparatively speaking, compensatory models are more general than non-compensatory models. Thus, owing to limited space, this study considered only two compensatory models: the multidimensional 2-parameter logistic model (M2PLM) and the linear logistic model (LLM). To explore the cognitive diagnostic function of MIRTMs, a confirmatory compensatory M2PLM (CC-M2PLM) was first formulated by introducing a Q matrix into the item response function of the M2PLM. A cutoff point (CP) was then used to transform the estimated latent traits in the CC-M2PLM into categorical variables (namely, trans-border attributes). This transformation can be done after the data analysis, so two kinds of results can be reported simultaneously: continuous latent traits and categorical trans-border attributes. A suitable CP is therefore very important, because different CPs lead to different classification results.
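The abstract does not print the item response functions, so the following is only a minimal sketch under stated assumptions. It assumes the common slope-intercept form of the compensatory M2PLM, with the Q matrix row switching off slopes on dimensions an item does not measure, and a main-effects logit form for the LLM. The function and parameter names (`cc_m2pl_prob`, `llm_prob`, `to_trans_border_attributes`, `a`, `d`, `lam0`, `lam`) are illustrative and do not come from the paper.

```python
import math
from typing import Optional, Sequence


def cc_m2pl_prob(theta: Sequence[float], a: Sequence[float],
                 q: Sequence[int], d: float) -> float:
    """Probability of a correct response under an assumed CC-M2PLM.

    logit P = d + sum_k a[k] * q[k] * theta[k]; q[k] = 0 removes dimension k.
    """
    z = d + sum(a_k * q_k * t_k for a_k, q_k, t_k in zip(a, q, theta))
    return 1.0 / (1.0 + math.exp(-z))


def llm_prob(alpha: Sequence[int], lam0: float, lam: Sequence[float],
             q: Sequence[int]) -> float:
    """Probability of a correct response under an assumed main-effects LLM.

    logit P = lam0 + sum_k lam[k] * q[k] * alpha[k], with binary alpha[k].
    """
    z = lam0 + sum(l_k * q_k * a_k for l_k, q_k, a_k in zip(lam, q, alpha))
    return 1.0 / (1.0 + math.exp(-z))


def to_trans_border_attributes(theta: Sequence[float],
                               cutoffs: Optional[Sequence[float]] = None) -> list:
    """Dichotomize latent traits: attribute k is 1 if theta[k] > cutoffs[k], else 0.

    The default cutoff of 0 on every dimension is an assumption of this sketch.
    """
    if cutoffs is None:
        cutoffs = [0.0] * len(theta)
    return [1 if t > c else 0 for t, c in zip(theta, cutoffs)]
```

Because the cutoff is applied to the estimated traits after calibration, both the continuous θ estimates and the dichotomized attributes remain available for reporting, which matches the two kinds of results described above.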
A simple pilot study was conducted to find a suitable CP: a test generated under the CC-M2PLM but calibrated with the LLM showed that the LLM divided the latent trait distribution approximately in half, with zero on the IRT scale separating masters (α = 1 if θ > 0) from non-masters (α = 0 if θ ≤ 0). Based on the pilot study, the CP was set to 0 for all dimensions (i.e., CPk = 0). Parameters of the CC-M2PLM and the LLM can be estimated with the mirt and CDM packages in R, respectively. In the simulation study, a series of simulations was conducted to evaluate the cognitive diagnostic function of the CC-M2PLM. The response data were generated from the LLM and can therefore be treated as a diagnostic measurement data set. Both the CC-M2PLM and the LLM were fitted to these data. The pattern (profile) correct classification ratio (PCCR) and the attribute correct classification ratio (ACCR) of the trans-border attributes (from the CC-M2PLM) and of the estimated attributes (from the LLM) were almost the same, with most differences smaller than 1%. The simulation results indicate that the CC-M2PLM can be used for diagnostic measurement and that its cognitive diagnostic function is as good as that of the LLM. Finally, two empirical examples of diagnostic measurement were given to demonstrate applications and implications of the CC-M2PLM.
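The two classification indices can be illustrated with a short sketch. It assumes the usual definitions, which the abstract does not spell out: ACCR is the proportion of persons classified correctly on each attribute, and PCCR is the proportion of persons whose whole attribute profile is recovered. The function names `accr` and `pccr` and the toy profiles are hypothetical.

```python
from typing import List, Sequence


def accr(true_profiles: Sequence[Sequence[int]],
         est_profiles: Sequence[Sequence[int]]) -> List[float]:
    """Attribute-level correct classification ratio, one value per attribute."""
    n_persons = len(true_profiles)
    n_attrs = len(true_profiles[0])
    return [
        sum(t[k] == e[k] for t, e in zip(true_profiles, est_profiles)) / n_persons
        for k in range(n_attrs)
    ]


def pccr(true_profiles: Sequence[Sequence[int]],
         est_profiles: Sequence[Sequence[int]]) -> float:
    """Pattern-level correct classification ratio: all attributes must match."""
    n_persons = len(true_profiles)
    hits = sum(list(t) == list(e) for t, e in zip(true_profiles, est_profiles))
    return hits / n_persons


# Toy data: four persons, two attributes; one person is misclassified on attribute 2.
true_alpha = [[1, 0], [1, 1], [0, 0], [0, 1]]
est_alpha = [[1, 0], [1, 0], [0, 0], [0, 1]]
attribute_rates = accr(true_alpha, est_alpha)  # [1.0, 0.75]
pattern_rate = pccr(true_alpha, est_alpha)     # 0.75
```

In a comparison like the one described above, `est_profiles` would hold either the trans-border attributes obtained by thresholding the CC-M2PLM trait estimates or the attribute estimates from the LLM, each scored against the generating profiles.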

Key words: item response theory, multidimensional item response theory, cognitive diagnostic models, cognitive diagnosis, Q matrix, confirmatory factor analysis

ZHAN Peida; CHEN Ping; BIAN Yufang. (2016). Using confirmatory compensatory multidimensional IRT models to do cognitive diagnosis. Acta Psychologica Sinica, 48(10), 1347-1356.

