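An approach that can validate both Q-matrices and attribute hierarchies in cognitive diagnosis models: From the empirical application perspective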

doi:10.3724/SP.J.1041.2025.1295

Abstract

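Abstract (Chinese version, in English translation): In cognitive diagnostic assessment practice, whether the Q-matrix and the attribute hierarchy are constructed correctly affects both the accuracy of parameter estimation in cognitive diagnosis models and the classification accuracy for examinees. Attribute hierarchies and Q-matrices are usually specified by domain experts, and several studies have tested and corrected either the Q-matrix or the attribute hierarchy, but only separately. This paper proposes a method based on conditional independence tests in Bayesian networks that jointly validates the Q-matrix and the attribute hierarchy. Two simulation studies examine the method's joint correction accuracy and the specific factors that affect it. The results show that when the Q-matrix error rate is moderate or lower, the method effectively corrects both the Q-matrix and the attribute hierarchy. Joint correction is better still when item quality is high, the sample size is sufficient, and the test is long. Finally, the algorithm is applied in an empirical cognitive diagnostic assessment to jointly test and correct, on the basis of data, the attribute hierarchy and Q-matrix specified by experts; the corrected model fits better.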

Keywords (Chinese version): cognitive diagnosis, attribute hierarchy, Q-matrix, Bayesian network

Abstract:

Cognitive diagnostic models (CDMs) are developed to evaluate examinees' cognitive strengths and weaknesses diagnostically. They rely on a Q-matrix that maps items to attributes. Traditionally, the attributes each item measures are specified in the Q-matrix mainly through the subjective judgment of experts. Because this construction process is subjective, some misspecification of the Q-matrix is almost inevitable, and if left unchecked it can seriously harm cognitive diagnostic assessment. Moreover, in empirical applications of CDMs, cognitive attributes generally do not operate independently. Instead, they form an interrelated network in which a psychological order, a logical order, or a hierarchical relationship may exist among them. The correctness of both the Q-matrix and the attribute hierarchy substantially affects the accuracy of CDM parameter estimation and of examinee classification. Many recent studies have developed approaches for validating Q-matrices or for testing attribute hierarchies, but each addresses only one of the two; no method validates both simultaneously. From the perspective of empirical application, an approach that simultaneously validates a prespecified Q-matrix and a prespecified attribute hierarchy is more desirable.
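
A hierarchy matters because it restricts which attribute profiles examinees can have. The minimal sketch below reads each hierarchical relation as a prerequisite: attribute a must be mastered before attribute b, the usual reading in hierarchical CDMs. Under that assumption it enumerates the permitted profiles. The function name and the edge-list encoding are illustrative assumptions, not taken from the paper.

```python
from itertools import product


def permissible_profiles(n_attrs, prerequisites):
    """Return all 0/1 attribute profiles consistent with a hierarchy.

    prerequisites: list of (a, b) pairs meaning attribute a is a
    prerequisite of attribute b (hypothetical encoding, 0-indexed).
    """
    allowed = []
    for profile in product((0, 1), repeat=n_attrs):
        # Reject a profile that masters b without mastering its prerequisite a.
        if all(profile[a] or not profile[b] for a, b in prerequisites):
            allowed.append(profile)
    return allowed


# Hypothetical linear hierarchy A1 -> A2 -> A3.
print(permissible_profiles(3, [(0, 1), (1, 2)]))
# [(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1)]

# Hypothetical divergent hierarchy A1 -> A2, A1 -> A3.
print(len(permissible_profiles(3, [(0, 1), (0, 2)])))
# 5
```

The three-attribute linear hierarchy leaves 4 of the 8 possible profiles, and the divergent one leaves 5. A missing or redundant edge changes this set, which is one way hierarchy misspecification can propagate into classification.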

This research proposes an approach based on Bayesian networks (BNs) for validating Q-matrices and attribute hierarchies simultaneously. In the network structure, each Q-matrix element corresponds to an attribute-item edge and each hierarchical relation to an attribute-attribute edge. The correctness of each can therefore be checked by testing the strength of the corresponding edge with conditional independence tests. To evaluate the BN method, two simulation studies and one empirical data analysis were conducted, assessing the accuracy of Q-matrix validation and attribute hierarchy correction in theory and in practice.
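
As a hedged illustration only, the sketch below scores one candidate edge with a G² (likelihood-ratio) conditional independence test on binary data. This test is equivalent to a mutual-information test. The sketch assumes examinees' attribute profiles have already been estimated, so that attributes can be treated as observed columns next to item responses. The function names, the choice of G², and the conditioning sets are assumptions; the paper's own test statistic and decision rule may differ.

```python
import math
from collections import Counter


def chi2_sf(x, df):
    """Upper-tail probability P(Chi2_df > x) for a positive integer df."""
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k < df/2} (x/2)^k / k!
        return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))
    # Odd df: erfc term plus a finite series in half-integer powers.
    total = math.erfc(math.sqrt(half))
    for k in range(df // 2):
        total += math.exp(-half) * half ** (k + 0.5) / math.gamma(k + 1.5)
    return total


def g2_ci_test(rows, x, y, z=()):
    """G^2 test of X independent of Y given Z, for binary (0/1) columns.

    rows: sequence of tuples; x, y: column indices; z: conditioning indices.
    Returns (G^2 statistic, degrees of freedom, p-value).
    """
    def key(r):
        return tuple(r[i] for i in z)

    n_xyz = Counter((r[x], r[y], key(r)) for r in rows)
    n_xz = Counter((r[x], key(r)) for r in rows)
    n_yz = Counter((r[y], key(r)) for r in rows)
    n_z = Counter(key(r) for r in rows)
    g2 = 0.0
    for (xv, yv, zv), n in n_xyz.items():
        # Empty cells are absent from the Counter and contribute 0.
        g2 += 2.0 * n * math.log(n * n_z[zv] / (n_xz[(xv, zv)] * n_yz[(yv, zv)]))
    g2 = max(g2, 0.0)  # guard against tiny negative rounding error
    df = 2 ** len(z)  # (2 - 1) * (2 - 1) per stratum of binary Z variables
    return g2, df, chi2_sf(g2, df)


# Hypothetical columns: (attribute A1, item response X1).
copies = [(0, 0)] * 50 + [(1, 1)] * 50             # X1 tracks A1: edge supported
unrelated = [(0, 0), (0, 1), (1, 0), (1, 1)] * 25  # no association
print(g2_ci_test(copies, 0, 1))     # large G^2, p-value near 0
print(g2_ci_test(unrelated, 0, 1))  # (0.0, 1, 1.0)
```

For a Q-matrix entry, one natural (assumed) choice is to test the attribute-item edge conditional on the item's other required attributes. A large p-value would flag the entry as a candidate error, and the same test on attribute pairs would probe hierarchy relations.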

The first simulation validated the attribute hierarchy and the Q-matrix jointly. It examined how the Q-matrix error rate, item quality, test length, sample size, and attribute hierarchy type affect the correction accuracy of both. The results show that the BN method effectively corrects the Q-matrix and the attribute hierarchy simultaneously when the Q-matrix error rate is low or moderate. Correction accuracy is generally high when item quality is high, the sample size is sufficient, or the test is long. It gradually declines as the Q-matrix error rate increases and item quality decreases. When the Q-matrix is correct, the BN method recovers the attribute hierarchy exactly. The second simulation shows that the BN method still performs well as the number of attributes in the Q-matrix increases. Across conditions, the type of attribute hierarchy error has only a small impact on correction accuracy. In the empirical dataset, the effectiveness of the BN method was supported by a better (lower) BIC, indicating improved model-data fit.
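
The simulation factors above can be made concrete with a small data-generating sketch. It rests on three assumptions that this section does not confirm. First, DINA is the generating model. Second, the tables' g-s column (0.05-0.25 or 0.05-0.4) is the uniform range for guessing and slipping parameters. Third, the Q-matrix error rate is the proportion of entries flipped. The function names are hypothetical.

```python
import random


def perturb_q(q, error_rate, rng=None):
    """Flip a proportion of Q-matrix entries (hypothetical error mechanism).

    Flips that would leave an item measuring no attribute are skipped.
    """
    rng = random.Random(1) if rng is None else rng
    bad = [list(row) for row in q]
    cells = [(j, k) for j in range(len(q)) for k in range(len(q[0]))]
    rng.shuffle(cells)
    target = round(error_rate * len(cells))
    flipped = 0
    for j, k in cells:
        if flipped == target:
            break
        bad[j][k] = 1 - bad[j][k]
        if sum(bad[j]) == 0:
            bad[j][k] = 1  # undo: every item must require some attribute
            continue
        flipped += 1
    return bad


def simulate_dina(profiles, q, gs_range=(0.05, 0.25), rng=None):
    """Simulate DINA responses with guessing and slipping drawn from gs_range."""
    rng = random.Random(0) if rng is None else rng
    guess = [rng.uniform(*gs_range) for _ in q]
    slip = [rng.uniform(*gs_range) for _ in q]
    data = []
    for alpha in profiles:
        row = []
        for j, q_row in enumerate(q):
            # eta = 1 only if every attribute the item requires is mastered.
            eta = all(a >= need for a, need in zip(alpha, q_row))
            p_correct = 1 - slip[j] if eta else guess[j]
            row.append(1 if rng.random() < p_correct else 0)
        data.append(row)
    return data


true_q = [[1, 0], [0, 1], [1, 1], [1, 0]]  # hypothetical 4-item, 2-attribute Q
wrong_q = perturb_q(true_q, 0.25)          # 2 of 8 entries flipped
responses = simulate_dina([(0, 0), (1, 0), (1, 1)], true_q)
```

Responses are generated from the true Q-matrix, while a validation method would be given the perturbed one and judged on how much of the true structure it restores.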

In conclusion, an initially specified Q-matrix and attribute hierarchy can be validated simultaneously with the BN method. The corrected Q-matrix and refined attribute hierarchy produced by this data-driven method can then be combined with experts' theoretical judgments to obtain a better model. The result is more accurate diagnostic outcomes in cognitive diagnostic assessment (CDA) practice.

Key words: cognitive diagnosis, attribute hierarchy relationships, Q-matrix, Bayesian network

Chinese Library Classification (CLC) number: B841

汪玲玲, 孙小坚. (2025). 认知诊断模型属性层级关系和Q矩阵的联合验证方法：面向实践的视角. 心理学报, 57(7), 1295-1308.

WANG Ling-Ling, SUN Xiao-Jian. (2025). An approach that can validate both Q-matrices and attribute hierarchies in cognitive diagnosis models: From the empirical application perspective. Acta Psychologica Sinica, 57(7), 1295-1308.

Figures/Tables: 12

PCR, missing hierarchical relations
Q-error	Test length	g-s range	Divergent, N=2000	Divergent, N=1000	Unstructured, N=2000	Unstructured, N=1000	Linear, N=2000	Linear, N=1000	Convergent, N=2000	Convergent, N=1000
10%	25	0.05-0.25	0.94	0.84	0.906	0.856	0.876	0.786	0.882	0.88
10%	25	0.05-0.4	0.813	0.677	0.827	0.73	0.786	0.652	0.786	0.79
10%	40	0.05-0.25	0.964	0.891	0.945	0.883	0.93	0.853	0.945	0.901
10%	40	0.05-0.4	0.859	0.794	0.76	0.624	0.825	0.7	0.863	0.77
20%	25	0.05-0.25	0.864	0.768	0.847	0.77	0.832	0.692	0.832	0.802
20%	25	0.05-0.4	0.66	0.497	0.722	0.486	0.692	0.522	0.646	0.634
20%	40	0.05-0.25	0.936	0.869	0.925	0.821	0.929	0.826	0.93	0.83
20%	40	0.05-0.4	0.785	0.709	0.648	0.438	0.743	0.595	0.781	0.664
30%	25	0.05-0.25	0.695	0.559	0.602	0.577	0.608	0.536	0.616	0.658
30%	25	0.05-0.4	0.541	0.307	0.419	0.279	0.441	0.371	0.451	0.437
30%	40	0.05-0.25	0.825	0.689	0.823	0.715	0.835	0.784	0.798	0.778
30%	40	0.05-0.4	0.663	0.569	0.483	0.334	0.618	0.527	0.634	0.525
AACR, missing hierarchical relations
Q-error	Test length	g-s range	Divergent, N=2000	Divergent, N=1000	Unstructured, N=2000	Unstructured, N=1000	Linear, N=2000	Linear, N=1000	Convergent, N=2000	Convergent, N=1000
10%	25	0.05-0.25	0.988	0.967	0.98	0.97	0.975	0.954	0.976	0.975
10%	25	0.05-0.4	0.958	0.93	0.964	0.942	0.956	0.915	0.952	0.952
10%	40	0.05-0.25	0.993	0.976	0.989	0.977	0.986	0.968	0.989	0.978
10%	40	0.05-0.4	0.968	0.953	0.938	0.911	0.958	0.928	0.971	0.948
20%	25	0.05-0.25	0.97	0.943	0.964	0.949	0.958	0.921	0.962	0.953
20%	25	0.05-0.4	0.913	0.867	0.935	0.872	0.899	0.873	0.91	0.91
20%	40	0.05-0.25	0.986	0.972	0.985	0.961	0.984	0.96	0.986	0.959
20%	40	0.05-0.4	0.948	0.927	0.908	0.849	0.932	0.897	0.945	0.918
30%	25	0.05-0.25	0.912	0.871	0.88	0.887	0.876	0.859	0.885	0.897
30%	25	0.05-0.4	0.864	0.782	0.818	0.779	0.793	0.805	0.829	0.829
30%	40	0.05-0.25	0.95	0.916	0.956	0.926	0.953	0.943	0.944	0.944
30%	40	0.05-0.4	0.908	0.877	0.844	0.798	0.886	0.863	0.896	0.863
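
The tables report PCR and AACR without defining them in this section. The sketch below assumes they are the pattern-level and average attribute-level correct classification rates common in the CDM literature. Under that assumption, both can be computed from paired true and estimated profiles. The function name is illustrative, and the same arithmetic applies if the rates are instead computed over Q-matrix rows and entries.

```python
def pcr_aacr(true_profiles, est_profiles):
    """Return (PCR, AACR) for paired 0/1 attribute profiles.

    PCR: share of profiles recovered exactly.
    AACR: share of individual attribute classifications that are correct,
    which equals the average of the per-attribute correct rates.
    """
    n = len(true_profiles)
    k = len(true_profiles[0])
    pairs = list(zip(true_profiles, est_profiles))
    pcr = sum(tuple(t) == tuple(e) for t, e in pairs) / n
    aacr = sum(a == b for t, e in pairs for a, b in zip(t, e)) / (n * k)
    return pcr, aacr


true = [(1, 0, 1), (0, 0, 0), (1, 1, 1), (0, 1, 0)]
est = [(1, 0, 1), (0, 1, 0), (1, 1, 1), (0, 1, 1)]
print(pcr_aacr(true, est))  # (0.5, 0.8333333333333334)
```

A profile counts toward PCR only when every one of its attributes is correct, so PCR can never exceed AACR. This is consistent with every row of the tables.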

PCR and AACR, redundant hierarchical relations
Q-error	Test length	g-s range	PCR Divergent, N=2000	PCR Divergent, N=1000	PCR Unstructured, N=2000	PCR Unstructured, N=1000	AACR Divergent, N=2000	AACR Divergent, N=1000	AACR Unstructured, N=2000	AACR Unstructured, N=1000
10%	25	0.05-0.25	0.929	0.861	0.909	0.854	0.985	0.971	0.982	0.969
10%	25	0.05-0.4	0.81	0.678	0.826	0.711	0.957	0.929	0.964	0.937
10%	40	0.05-0.25	0.968	0.909	0.953	0.88	0.993	0.98	0.991	0.976
10%	40	0.05-0.4	0.876	0.791	0.865	0.783	0.974	0.954	0.973	0.955
20%	25	0.05-0.25	0.867	0.782	0.828	0.755	0.969	0.95	0.958	0.944
20%	25	0.05-0.4	0.651	0.761	0.704	0.504	0.909	0.944	0.928	0.876
20%	40	0.05-0.25	0.936	0.868	0.93	0.81	0.986	0.97	0.986	0.958
20%	40	0.05-0.4	0.814	0.695	0.8	0.703	0.955	0.925	0.958	0.935
30%	25	0.05-0.25	0.681	0.595	0.612	0.534	0.908	0.883	0.883	0.871
30%	25	0.05-0.4	0.502	0.581	0.449	0.254	0.852	0.88	0.827	0.772
30%	40	0.05-0.25	0.855	0.719	0.858	0.718	0.96	0.926	0.967	0.926
30%	40	0.05-0.4	0.616	0.575	0.61	0.478	0.891	0.884	0.887	0.854

PCR by type of attribute hierarchy error
Q-error	g-s range	One missing edge	Two missing edges	One redundant edge	Two redundant edges	Both missing and redundant edges
10%	0.05-0.25	0.849	0.859	0.836	0.861	0.846
10%	0.05-0.4	0.767	0.724	0.71	0.705	0.713
20%	0.05-0.25	0.739	0.693	0.739	0.728	0.716
20%	0.05-0.4	0.57	0.546	0.576	0.513	0.53
30%	0.05-0.25	0.51	0.511	0.578	0.51	0.555
30%	0.05-0.4	0.333	0.27	0.35	0.343	0.336
AACR by type of attribute hierarchy error
Q-error	g-s range	One missing edge	Two missing edges	One redundant edge	Two redundant edges	Both missing and redundant edges
10%	0.05-0.25	0.977	0.978	0.975	0.978	0.976
10%	0.05-0.4	0.964	0.953	0.953	0.952	0.955
20%	0.05-0.25	0.952	0.941	0.951	0.95	0.946
20%	0.05-0.4	0.914	0.912	0.919	0.904	0.909
30%	0.05-0.25	0.885	0.883	0.91	0.881	0.897
30%	0.05-0.4	0.815	0.834	0.839	0.842	0.839