ISSN 0439-755X
CN 11-1911/B
Sponsored by: Chinese Psychological Society
   Institute of Psychology, Chinese Academy of Sciences
Published by: Science Press

Acta Psychologica Sinica ›› 2016, Vol. 48 ›› Issue (12): 1600-1611. doi: 10.3724/SP.J.1041.2016.01600

• Article •

Optimal design of test structure in cognitive diagnostic assessment

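PENG Yafeng1; LUO Zhaosheng1; YU Xiaofeng1; GAO Chunlei1; LI Yujun2

  1. (1 School of Psychology, Jiangxi Normal University, Nanchang 330022) (2 Center for Studies of Psychological Application/School of Psychology, South China Normal University, Guangzhou 510631)
  • Received: 2016-03-29 Online: 2016-12-24 Published: 2016-12-24
  • Corresponding author: LUO Zhaosheng, E-mail: luozs@126.com
  • Funding:

    Supported by the Jiangxi Provincial Graduate Innovation Special Fund (YC2015-B025).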

The optimization of test design in Cognitive Diagnostic Assessment

PENG Yafeng1; LUO Zhaosheng1; YU Xiaofeng1; GAO Chunlei1; LI Yujun2   

  1. (1 School of Psychology, Jiangxi Normal University, Nanchang 330022, China) (2 Center for Studies of Psychological Application/School of Psychology, South China Normal University, Guangzhou 510631, China)
  • Received: 2016-03-29 Online: 2016-12-24 Published: 2016-12-24
  • Contact: LUO Zhaosheng, E-mail: luozs@126.com

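Summary:

The Q matrix is the foundation and core element of cognitive diagnostic assessment; it reflects the construct and content design of a test and directly affects how well the test classifies examinees. Using Monte Carlo simulation, this study examined how different Q-matrix designs affect cognitive diagnosis under six attribute hierarchical structures. The mean and the standard deviation of the pattern match ratio were used to evaluate diagnostic performance in terms of classification accuracy and classification stability, respectively. The results show that: (1) under every attribute hierarchical structure, classification accuracy improves as test length increases, but a "ceiling effect" appears once the test length reaches a certain point; (2) the number of R* in the Q matrix (NR*) affects both classification accuracy and stability: the larger NR* is, the higher the classification stability, and classification accuracy is highest when the test length is an integer multiple of the number of attributes and NR* is the largest odd number that this multiple allows; (3) the number of attributes measured by the items other than R* in the Q matrix affects classification accuracy and stability differently depending on the attribute hierarchical structure. Based on these results, the study offers suggestions for the optimal design of the Q matrix in diagnostic assessment.

Keywords: cognitive diagnostic assessment, Q matrix, test structure design, classification accuracy, classification stability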

Abstract:

Cognitive diagnostic assessment (CDA) is designed to measure students' specific knowledge structures and processing skills so as to provide information about their cognitive strengths and weaknesses. The Q matrix is the basic component and core element of CDA: it characterizes the design of the test construct and content, and it has a direct impact on the classification efficiency of CDA. In this article, we examined how the characteristics of the Q-matrix design affect the performance of CDA. In a Monte Carlo simulation study, the mean and the standard deviation of the pattern match ratio were used to evaluate the classification accuracy and stability of CDA, respectively. Six attribute hierarchical structures (Linear, Convergent, Divergent, Unstructured, Independent and Mixture) were simulated. The results show that: (1) classification accuracy increases with test length; however, a "ceiling effect" emerges once the test length reaches a certain value; (2) the number of R* (the matrix that has the same elements as the reachability matrix) in the Q matrix affects the test's classification accuracy and stability. A Q matrix that includes more R* yields higher stability, and a Q matrix with the maximum odd number of R* yields the highest classification accuracy; (3) the average number of attributes measured per item affects classification accuracy and stability, and this effect varies across attribute hierarchy structures.
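
The evaluation criterion described above can be illustrated with a minimal Python sketch. It assumes that the pattern match ratio of one replication is the proportion of examinees whose entire attribute pattern is recovered exactly, and that the standard deviation is the sample standard deviation across replications; the abstract does not state either detail, and the function names are hypothetical.

```python
from statistics import mean, stdev


def pattern_match_ratio(true_patterns, estimated_patterns):
    """Proportion of examinees whose full attribute pattern is recovered exactly.

    Each pattern is a sequence of 0/1 attribute mastery indicators.
    """
    if len(true_patterns) != len(estimated_patterns):
        raise ValueError("both lists must contain one pattern per examinee")
    if not true_patterns:
        raise ValueError("at least one examinee is required")
    matches = sum(
        tuple(t) == tuple(e) for t, e in zip(true_patterns, estimated_patterns)
    )
    return matches / len(true_patterns)


def summarize_replications(pmr_values):
    """Return (mean, sample SD) of the pattern match ratio over replications.

    The mean is read as classification accuracy and the SD as (inverse)
    classification stability. Using the sample SD is an assumption.
    """
    return mean(pmr_values), stdev(pmr_values)


# Toy data: four examinees, three attributes; one pattern is misclassified.
true_patterns = [(1, 0, 0), (1, 1, 0), (1, 1, 1), (0, 0, 0)]
estimated_patterns = [(1, 0, 0), (1, 0, 0), (1, 1, 1), (0, 0, 0)]
single_pmr = pattern_match_ratio(true_patterns, estimated_patterns)

# Hypothetical ratios from three replications of one simulation condition.
accuracy, spread = summarize_replications([0.75, 0.80, 0.85])
```

A smaller spread across replications corresponds to a more stable Q-matrix design under this reading.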
Based on these results, we offer the following recommendations on test design under different attribute hierarchy structures in CDA: (1) the optimal test length, expressed as a multiple of the number of attributes, and the optimal number of R* (NR*) are four times and one R* for Linear and Convergent, five times and one R* for Divergent, six times and three R* for Unstructured, six times and five R* for Independent, and six times and two R* for Mixture; (2) the design of the attributes measured by items other than R* varies across attribute hierarchy structures. (a) For Linear, every pattern in the set of potential items should be measured equally (the set of potential items is a pool of items that probes all combinations of attributes under the corresponding attribute hierarchy structure); (b) for Convergent, the items should mainly measure the attributes on each path of the convergent branch, those attributes together with their prerequisite attributes, and the whole hierarchy structure, in that order; (c) for Divergent, the items other than R* should mainly measure combinations of the attributes on each path of the divergent branch; (d) for Unstructured, combinations of an attribute and its prerequisite attribute are preferred; (e) for Independent, combinations of any two attributes are recommended; (f) for Mixture, the suggestions above for each hierarchy structure can serve as a reference when designing the corresponding parts of the attribute hierarchy.
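
Recommendation (1) amounts to a small lookup table, sketched below in Python. The multiples and NR* values are taken from the text above; the assumption that each R* contributes one item per attribute (because a reachability matrix is square in the number of attributes) is ours, as are the names `RECOMMENDED_DESIGN` and `recommended_design`.

```python
# (test length as a multiple of the number of attributes, number of R*)
RECOMMENDED_DESIGN = {
    "linear": (4, 1),
    "convergent": (4, 1),
    "divergent": (5, 1),
    "unstructured": (6, 3),
    "independent": (6, 5),
    "mixture": (6, 2),
}


def recommended_design(structure, n_attributes):
    """Translate the recommended multiples into item counts for one test.

    Assumption: each R* occupies n_attributes items, one row per attribute.
    """
    if n_attributes < 1:
        raise ValueError("n_attributes must be a positive integer")
    multiple, n_rstar = RECOMMENDED_DESIGN[structure.lower()]
    test_length = multiple * n_attributes
    rstar_items = n_rstar * n_attributes
    return {
        "test_length": test_length,
        "n_rstar": n_rstar,
        "rstar_items": rstar_items,
        # Items left for the structure-specific combinations in (2).
        "other_items": test_length - rstar_items,
    }


# Example with a hypothetical test measuring five attributes.
linear_plan = recommended_design("Linear", 5)
independent_plan = recommended_design("Independent", 5)
```

Under these assumptions, a five-attribute Linear test would have 20 items, of which 5 form the single R* and 15 remain for the combinations described in (2).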

Key words: cognitive diagnostic assessment, Q matrix, the design of test construct, classification accuracy, classification stability