The optimization of test design in Cognitive Diagnostic Assessment

doi:10.3724/SP.J.1041.2016.01600

Acta Psychologica Sinica, 2016, Vol. 48, Issue 12: 1600-1611. doi: 10.3724/SP.J.1041.2016.01600

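PENG Yafeng¹; LUO Zhaosheng¹; YU Xiaofeng¹; GAO Chunlei¹; LI Yujun²

(¹ School of Psychology, Jiangxi Normal University, Nanchang 330022, China) (² Center for Studies of Psychological Application/School of Psychology, South China Normal University, Guangzhou 510631, China)

Received: 2016-03-29 Online: 2016-12-24 Published: 2016-12-24
Corresponding author: LUO Zhaosheng, E-mail: luozs@126.com
Funding:
Supported by the Jiangxi Provincial Graduate Innovation Special Fund (YC2015-B025).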

The optimization of test design in Cognitive Diagnostic Assessment

PENG Yafeng¹; LUO Zhaosheng¹; YU Xiaofeng¹; GAO Chunlei¹; LI Yujun²

(¹ School of Psychology, Jiangxi Normal University, Nanchang 330022, China) (² Center for Studies of Psychological Application/School of Psychology, South China Normal University, Guangzhou 510631, China)

Received: 2016-03-29 Online: 2016-12-24 Published: 2016-12-24
Contact: LUO Zhaosheng, E-mail: luozs@126.com

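Abstract

Abstract (from the Chinese edition):

The Q matrix is the foundation and core element of cognitive diagnostic assessment. It reflects the construct and content design of a test and directly affects how well the test classifies examinees. Using Monte Carlo simulation, this study examined how different Q matrix designs affect cognitive diagnosis under six attribute hierarchies. The mean and the standard deviation of the pattern match ratio were used to evaluate diagnostic performance in terms of classification accuracy and classification stability, respectively. The results show that: (1) under the different attribute hierarchies, classification accuracy increases with test length, but a "ceiling effect" appears once the test length reaches a certain point; (2) the number of R* in the Q matrix (NR*) affects both classification accuracy and stability: the larger NR* is, the more stable the classification, and accuracy is highest when the test length is an integer multiple of the number of attributes and NR* is the largest odd number that this multiple allows; (3) the number of attributes measured by the items other than R* affects classification accuracy and stability differently depending on the attribute hierarchy. Based on these results, the study offers suggestions for optimizing Q matrix design in diagnostic assessment.

Keywords: cognitive diagnostic assessment, Q matrix, test structure design, classification accuracy, classification stability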
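
The two evaluation criteria named in the abstract can be illustrated with a short Python sketch. It assumes the pattern match ratio is the share of examinees whose entire attribute pattern is recovered exactly, and that accuracy and stability are summarized over simulation replications. The function names, the toy data, and the choice of the sample standard deviation are illustrative assumptions, not details taken from the paper.

```python
from statistics import mean, stdev


def pattern_match_ratio(true_patterns, estimated_patterns):
    """Share of examinees whose full attribute pattern is classified correctly."""
    if len(true_patterns) != len(estimated_patterns):
        raise ValueError("both lists must contain one pattern per examinee")
    matches = sum(
        tuple(true) == tuple(est)
        for true, est in zip(true_patterns, estimated_patterns)
    )
    return matches / len(true_patterns)


def summarize_replications(replications):
    """Return (mean PMR, SD of PMR) over replications.

    The mean is read as classification accuracy and the SD as classification
    stability (a smaller SD means a more stable design). Using the sample SD
    is an assumption of this sketch.
    """
    ratios = [pattern_match_ratio(true, est) for true, est in replications]
    return mean(ratios), stdev(ratios)


# Toy data (hypothetical): two replications, four examinees, three attributes.
true_patterns = [(1, 0, 0), (1, 1, 0), (1, 1, 1), (0, 0, 0)]
replication_1 = (true_patterns, [(1, 0, 0), (1, 1, 0), (1, 1, 1), (1, 0, 0)])
replication_2 = (true_patterns, [(1, 0, 0), (1, 0, 0), (1, 1, 1), (1, 0, 0)])

accuracy, stability = summarize_replications([replication_1, replication_2])
print(f"mean PMR = {accuracy:.3f}, SD of PMR = {stability:.3f}")
```

In this toy run the first replication recovers three of four patterns and the second recovers two of four, so the two ratios are 0.75 and 0.50.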

Abstract:

Cognitive diagnostic assessment (CDA) is designed to measure students' specific knowledge structures and processing skills so as to provide information about their cognitive strengths and weaknesses. The Q matrix is the basic component and core element of CDA: it characterizes the design of the test construct and content, and it has a direct impact on the classification efficiency of CDA. In this article, we examine how the characteristics of Q matrix design affect the performance of CDA. In a Monte Carlo simulation study, the mean and the standard deviation of the pattern match ratio are used to evaluate the classification accuracy and the classification stability of CDA, respectively. Six attribute hierarchical structures (Linear, Convergent, Divergent, Unstructured, Independent and Mixture) are simulated. The results show that: (1) classification accuracy increases with test length; however, a "ceiling effect" emerges once the test length reaches a certain value; (2) the number of R* (the matrix that has the same elements as the Reachable matrix) in the Q matrix affects the test’s classification accuracy and stability. A Q matrix design that includes more R* yields higher stability, and the Q matrix with the maximum odd number of R* has the highest classification accuracy; (3) the average number of attributes measured by each item affects classification accuracy and stability, and this effect varies across attribute hierarchy structures.
Based on these results, we make the following recommendations for test design under different attribute hierarchy structures in CDA: (1) the optimal test length and NR* (the number of R*) are, respectively, four times the number of attributes and one R* for Linear and Convergent, five times and one R* for Divergent, six times and three R* for Unstructured, six times and five R* for Independent, and six times and two R* for Mixture; (2) the design of the attributes measured by items other than R* varies across attribute hierarchy structures. (a) For Linear, every pattern in the set of potential items should be measured equally (the set of potential items is the pool of items that probes all combinations of attributes under the corresponding attribute hierarchy structure); (b) for Convergent, the attributes measured by the items should come mainly from each path of the convergent branch, then from those paths together with their prerequisite attributes, and then from the whole hierarchy structure, in that order; (c) for Divergent, the attributes measured by the items other than R* should mainly be combinations of the attributes on each path of the divergent branch; (d) for Unstructured, combinations of an attribute and its prerequisite attribute are preferred; (e) for Independent, combinations of any two attributes are recommended; (f) for Mixture, the suggestions above for each hierarchy structure can serve as a reference when designing the corresponding structural parts among the attributes.
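
To make the role of R* concrete, the following Python sketch derives a reachability matrix for a three-attribute Linear hierarchy and stacks copies of the resulting R* items into part of a Q matrix. It rests on several assumptions that the abstract does not spell out: the hierarchy is given as an adjacency matrix of direct prerequisite relations, the Q matrix is stored with one row per item, and each R* item measures one attribute together with all of its prerequisites. All names are hypothetical.

```python
def reachability_matrix(adjacency):
    """Warshall-style transitive closure with a unit diagonal.

    adjacency[i][j] == 1 means attribute i is a direct prerequisite of
    attribute j. In the result, reach[i][j] == 1 means attribute i is
    attribute j itself or a direct or indirect prerequisite of it.
    """
    size = len(adjacency)
    reach = [
        [1 if i == j or adjacency[i][j] else 0 for j in range(size)]
        for i in range(size)
    ]
    for mid in range(size):
        for i in range(size):
            for j in range(size):
                if reach[i][mid] and reach[mid][j]:
                    reach[i][j] = 1
    return reach


def stack_r_star(r_star_items, n_r_star):
    """Repeat the R* items n_r_star times (the NR* of the abstract)."""
    return [list(item) for _ in range(n_r_star) for item in r_star_items]


# Linear hierarchy A1 -> A2 -> A3 (illustrative choice of three attributes).
linear_adjacency = [
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 0],
]
reach = reachability_matrix(linear_adjacency)

# Assumption: items are rows, so each column of the reachability matrix
# becomes one R* item (an attribute plus all of its prerequisites).
r_star_items = [list(column) for column in zip(*reach)]

# R* part of a Q matrix with NR* = 2; the remaining items would be chosen
# according to the hierarchy-specific recommendations.
q_block = stack_r_star(r_star_items, 2)
for item in q_block:
    print(item)
```

The other items of a test, those "besides R*" in the abstract, would be appended to this block; how to choose them is what the hierarchy-specific recommendations address.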

Key words: cognitive diagnostic assessment, Q matrix, the design of test construct, classification accuracy, classification stability

PENG Yafeng; LUO Zhaosheng; YU Xiaofeng; GAO Chunlei; LI Yujun. (2016). The optimization of test design in Cognitive Diagnostic Assessment. Acta Psychologica Sinica, 48(12), 1600-1611.
