ISSN 0439-755X
CN 11-1911/B

Acta Psychologica Sinica (心理学报), 2022, Vol. 54, Issue (8): 996-1008. doi: 10.3724/SP.J.1041.2022.00996

• Research Report •


A simple and effective new method of Q-matrix validation

LI Jia, MAO Xiuzhen, WEI Jia

  1. School of Educational Science, Sichuan Normal University, Chengdu 610066, China
  • Received: 2021-06-30 Online: 2022-06-23 Published: 2022-08-25
  • Corresponding author: MAO Xiuzhen


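The correctness of the Q-matrix is an important factor affecting the accuracy of item parameter estimation and examinee classification. To address the problem of Q-matrix validation, this study first proposes a simple and effective new method (ORDP). A simulation study then compared ORDP with existing methods (R, RMSEA, and HD) while varying the distribution of examinees' knowledge states, sample size (N), test length (L), Q-matrix error rate (M), item quality (Iq), and attribute hierarchy. The results show that: (1) when knowledge states follow a uniform distribution, ORDP performs best under all hierarchical structures; when knowledge states follow a multivariate normal distribution, RMSEA and ORDP show no clear difference, with RMSEA slightly better than ORDP under every structure except the independent one; (2) all methods validate less effectively under the multivariate normal distribution than under the uniform distribution; (3) N, L, M, Iq, and the attribute hierarchy all clearly affect the performance of the four methods; (4) validation results on the fraction subtraction data of Tatsuoka (1984) show that the Q-matrix revised by ORDP fits the data best.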

Key words: cognitive diagnosis, Q-matrix validation, ORDP method, DINA model


Cognitive diagnostic theory (CDT) can provide fine-grained, multidimensional assessment results of the response process, which is of great research and practical value. The Q-matrix, which specifies the relationship between items and attributes, is the basis of CDT, and its accuracy is an important factor affecting the accuracy of item parameter estimation and examinee classification. It is therefore important to check the correctness of the Q-matrix, that is, to validate it. Many studies have addressed the estimation or validation of the Q-matrix and have proposed a variety of methods from different perspectives, each with its own advantages and disadvantages. Methods based on model-data fit can provide rich test information without complex parameter estimation or time-consuming, tedious calculation. Following this line of thinking, this study used the Gini coefficient to express the purity of the distribution of expected-number proportions and constructed a simple and efficient Q-matrix validation method, called the optimization of response distribution purity (ORDP) method, which is suitable for both reduced (simplified) models and saturated models.
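
The abstract does not give the exact formula of the ORDP index, so the following is only a minimal sketch of the Gini-purity idea under stated assumptions. It assumes that, for a candidate q-vector, examinees fall into latent groups (for DINA, the groups with eta = 0 and eta = 1; for a saturated model, the reduced attribute patterns), and that each group has an expected number of correct and incorrect responses (e.g., from EM posterior weights). Each group's Gini impurity is 1 - sum_k p_k^2, and the hypothetical `ordp_index` averages these impurities weighted by group size, so lower values mean purer response distributions. The function names and the weighting scheme are illustrative assumptions, not the authors' definition.

```python
from typing import Sequence, Tuple


def gini_impurity(counts: Sequence[float]) -> float:
    """Gini impurity 1 - sum(p_k^2) of one distribution given (expected) counts."""
    total = sum(counts)
    if total == 0:
        return 0.0  # an empty group contributes no impurity
    return 1.0 - sum((c / total) ** 2 for c in counts)


def ordp_index(groups: Sequence[Tuple[float, float]]) -> float:
    """Hypothetical ORDP-style index: size-weighted Gini impurity over groups.

    Each group is (expected number correct, expected number incorrect) for the
    examinees assigned to that latent group under a candidate q-vector.
    Lower values mean purer, more deterministic response distributions.
    """
    n_total = sum(sum(g) for g in groups)
    if n_total == 0:
        return 0.0
    return sum((sum(g) / n_total) * gini_impurity(g) for g in groups)


# Two groups (eta = 0, eta = 1) under two candidate q-vectors.
good_q = [(10.0, 90.0), (85.0, 15.0)]  # responses separate well by eta
bad_q = [(45.0, 55.0), (50.0, 50.0)]   # responses barely differ by group
print(round(ordp_index(good_q), 4), round(ordp_index(bad_q), 4))
```

Under this assumed definition, a q-vector whose latent groups separate correct from incorrect responses cleanly gets a lower index, which matches the intuition of choosing the q-vector with the purest response distribution.
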
ORDP was compared with the residual index (R), the root mean square error of approximation (RMSEA), and the Hamming distance (HD) under varied conditions: two distributions of knowledge states (KS) (uniform and multivariate normal), two sample sizes (300 and 1000), two test lengths (20 and 30 items), two Q-matrix error rates (20% and 40%), two levels of item quality ([0.05, 0.25] and [0.05, 0.24]), and four attribute hierarchies (independent, linear, convergent, and branched).
The Q-matrix validation algorithm is as follows. Let Q0 denote the initial Q-matrix. To validate item j, its q-vector in Q0 is replaced in turn by each possible q-vector, while the q-vectors of all other items are left intact. For each candidate, the EM algorithm is used to estimate the item parameters and the examinees' knowledge states. Finally, the candidate q-vector that minimizes ORDP (or R, RMSEA, or HD) is selected as the q-vector of item j.
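
The item-by-item search described above can be sketched as follows. This is a simplified illustration, not the authors' implementation: the hypothetical callback `estimate_profiles` stands in for the EM step (it should refit the model for the modified Q and return one attribute profile per examinee), and `hamming_criterion` is a simplified HD-style criterion that counts mismatches between observed responses and DINA ideal responses for the item being validated. Any other criterion with the same signature (an ORDP-, R-, or RMSEA-style index) could be plugged in.

```python
from itertools import product
from typing import Callable, List, Sequence, Tuple


def candidate_q_vectors(n_attributes: int) -> List[Tuple[int, ...]]:
    """All 2^K - 1 non-zero q-vectors (each item measures at least one attribute)."""
    return [q for q in product((0, 1), repeat=n_attributes) if any(q)]


def dina_ideal_response(alpha: Sequence[int], q: Sequence[int]) -> int:
    """DINA eta: 1 if the profile masters every attribute the q-vector requires."""
    return int(all(a >= qk for a, qk in zip(alpha, q)))


def hamming_criterion(item_responses, profiles, q) -> int:
    """Simplified HD-style criterion: observed vs. ideal response mismatches."""
    return sum(x != dina_ideal_response(a, q) for x, a in zip(item_responses, profiles))


def validate_item(Q0, j, responses, estimate_profiles: Callable, criterion: Callable):
    """Try every candidate q-vector for item j and return the minimizer.

    Q0: initial Q-matrix as a list of 0/1 rows; only row j is varied.
    estimate_profiles(Q, responses): hypothetical stand-in for the EM step.
    criterion(item_responses, profiles, q): lower is better.
    """
    item_responses = [row[j] for row in responses]
    best_q, best_value = None, float("inf")
    for q in candidate_q_vectors(len(Q0[j])):
        Q = [list(row) for row in Q0]
        Q[j] = list(q)  # all other items keep their q-vectors from Q0
        profiles = estimate_profiles(Q, responses)
        value = criterion(item_responses, profiles, q)
        if value < best_value:
            best_q, best_value = q, value
    return best_q, best_value


# Toy data: 4 examinees, 2 attributes. Item 0 truly requires only the first
# attribute, but Q0 misspecifies it as (1, 1).
profiles_known = [(0, 0), (1, 0), (0, 1), (1, 1)]
responses = [[0, 0], [1, 0], [0, 1], [1, 1]]  # rows = examinees, columns = items
Q0 = [[1, 1], [0, 1]]

best_q, best_value = validate_item(
    Q0, 0, responses,
    estimate_profiles=lambda Q, X: profiles_known,  # fixed profiles instead of EM
    criterion=hamming_criterion,
)
print(best_q, best_value)
```

Because the lambda returns fixed profiles, the toy example isolates the search step and recovers (1, 0) for item 0; in the procedure described in the abstract, each candidate Q-matrix would instead be refit by EM before the criterion is evaluated.
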
Simulation results demonstrate that: (1) The distribution of KS affects the performance of each method. Specifically, when KS are uniformly distributed, ORDP outperforms the other methods, followed by HD, then RMSEA and R. When KS follow a multivariate normal distribution, there is no clear difference between RMSEA and ORDP; RMSEA is slightly better than ORDP under every structure except the independent one, followed by HD and R. (2) All methods validate less effectively under the multivariate normal distribution than under the uniform distribution. (3) The validation rates of all four methods are affected by sample size, test length, Q-matrix error rate, item quality, and attribute hierarchy: the smaller the sample, the shorter the test, the higher the Q-matrix error rate, or the lower the item quality, the worse each method performs, and vice versa. (4) Validation results on the fraction subtraction data of Tatsuoka (1984) show that the Q-matrix revised by ORDP has the best model-data fit.
In this study, an ORDP index representing the purity of the distribution of expected-number proportions was constructed from the Gini coefficient. Simulation and empirical studies show that the method achieves high Q-matrix validation rates under a range of conditions. Overall, the proposed method validates the Q-matrix from response data, which can reduce the workload of experts and help improve the correctness of the Q-matrix.

Key words: cognitive diagnosis, Q-matrix validation methods, ORDP method, DINA model