%A LI Jia, MAO Xiuzhen, WEI Jia
%T A simple and effective new method of Q-matrix validation
%0 Journal Article
%D 2022
%J Acta Psychologica Sinica
%R 10.3724/SP.J.1041.2022.00996
%P 996-1008
%V 54
%N 8
%U https://journal.psych.ac.cn/acps/CN/abstract/article_5035.shtml
%8 2022-08-25
%X Cognitive diagnostic theory (CDT) can provide fine-grained, multidimensional process assessment results, which gives it important research and practical value. The Q-matrix, which represents the relationship between items and attributes, is the basis of CDT. The accuracy of the Q-matrix is an important factor affecting the accuracy of item parameter estimation and of participants' diagnoses. It is therefore important to check the correctness of the Q-matrix, that is, to validate it. Many studies have addressed the estimation or validation of the Q-matrix, and a variety of methods have been proposed from different perspectives, each with its own advantages and disadvantages. Methods based on model-data fit can provide rich test information without complex parameter estimation or time-consuming, tedious calculation. Following this line of thinking, this study used the Gini coefficient to express the purity of the expected numbers proportion distribution and constructed a simple and efficient Q-matrix validation method, called the optimization of response distribution purity (ORDP) method, which is suitable for both simplified and saturated models.<br/>The ORDP method was compared with methods based on the residual index (R), the root mean square error of approximation (RMSEA), and the Hamming distance (HD) under varied conditions: two distributions of knowledge states (KS) (uniform distribution, multivariate normal distribution), two sample sizes (300, 1000), two test lengths (20, 30), two Q-matrix error rates (20%, 40%), two item qualities ([0.05, 0.25], [0.05, 0.24]), and four attribute hierarchical structures (independent, linear, convergent, and branched). The specific algorithm of Q-matrix validation is as follows. Firstly, the initial Q-matrix is denoted by <em>Q<sup>0</sup></em>. 
When validating the first item <em>j</em>, the initial q-vector of item <em>j</em> in <em>Q<sup>0</sup></em> is replaced with each possible q-vector in turn, leaving the q-vectors of the remaining items intact. Then, the EM algorithm is used to estimate the item parameters and the knowledge states of the participants. Lastly, the candidate q-vector that minimizes the ORDP, R, RMSEA, or HD index is selected as the q-vector of the item.<br/>Simulation results demonstrate that: (1) The distribution of KS affects the performance of each method. Specifically, when the KS is uniformly distributed, the ORDP method is superior to the other methods, followed by the HD method and then the RMSEA and R methods; when the KS follows a multivariate normal distribution, there is no significant difference between the RMSEA and ORDP methods, with the RMSEA method slightly better than the ORDP method except under the independent structure, followed by the HD and R methods; (2) The validation performance of these methods under the multivariate normal distribution is not as good as that under the uniform distribution; (3) The validation rates of the four methods are all affected by sample size, test length, Q-matrix error rate, item quality, and attribute hierarchical structure. The smaller the number of respondents, the shorter the test length, the higher the Q-matrix error rate, or the lower the item quality, the worse each method performs, and vice versa; (4) The validation results based on the fraction subtraction data of Tatsuoka (1984) show that the Q-matrix modified by the ORDP method has the best model-data fit.<br/>In this study, the ORDP index, which represents the purity of the expected numbers proportion distribution, was constructed based on the Gini coefficient. Simulation and empirical studies show that this method has a high validation rate for Q-matrices under different conditions. 
On the whole, the new method proposed in this study validates the Q-matrix through data analysis, which can reduce the workload of experts and improve the correctness of the Q-matrix.