Development of an Online Calibration Method Based on the SCAD Penalty and EM Perspective in CD-CAT: G-DINA Model

doi:10.3724/SP.J.1041.2024.00670

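Abstract

Chinese abstract (translated):

The G-DINA (generalized deterministic input, noisy "and" gate) model imposes few constraints, has a wide range of application, and meets the requirements of a large body of psychological and educational assessment data. This study proposes SCADOCM, a new online calibration method for cognitive diagnostic computerized adaptive testing (CD-CAT) that simultaneously calibrates the Q-matrix and item parameters of new items and is suitable for models such as G-DINA, with the aim of promoting the adoption and application of CD-CAT in practice. The study was conducted on both a simulated item bank and a real item bank. The results show that, compared with the traditional SIE method, SCADOCM achieves satisfactory calibration accuracy and calibration efficiency under all experimental conditions and has good application prospects. The SIE method is not suitable for saturated models such as G-DINA, and its Q-matrix calibration accuracy is low under all experimental conditions.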

Keywords: cognitive diagnostic computerized adaptive testing, online calibration, Q-matrix, G-DINA model, SCAD penalty

Abstract:

Cognitive diagnostic computerized adaptive testing (CD-CAT) provides a timely, accurate, and detailed diagnosis of an examinee’s strengths and weaknesses in the measured content. The diagnosis can serve as a reference for further study or remediation planning, and thus meets the practical need for efficient and detailed test results. Successful implementation of CD-CAT depends on an item bank, but maintaining the bank is very challenging. A psychometrically popular choice for maintaining an item bank is online calibration. Currently, there is little research on online calibration methods for CD-CAT that can calibrate the Q-matrix and item parameters simultaneously, and the existing methods were developed mainly for the deterministic input, noisy "and" gate (DINA) model. Compared with the DINA model, the generalized DINA (G-DINA) model has been more widely applied because it is less restrictive and can meet the requirements of a large amount of test data in psychological and educational assessment. An online calibration method that jointly calibrates the Q-matrix and item parameters for less constrained models such as G-DINA would therefore be of clear practical value.
In the current study, a new online calibration method suitable for the G-DINA model, SCADOCM, is proposed. SCADOCM is built on the smoothly clipped absolute deviation (SCAD) penalty and marginal maximum likelihood estimation with the EM algorithm (MMLE/EM). For a new item j, a SCAD-penalized log-likelihood function is formulated from the examinees’ responses to the item and their marginal attribute mastery probabilities, and the q-vector of the new item is estimated with a SCAD-based q-vector estimator. The EM algorithm is then used to estimate the item parameters of new item j from the posterior distributions of the examinees’ attribute patterns, the examinees’ responses to new item j, and the estimated q-vector.
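As a rough illustration of the penalty named above, the following sketch implements the generic SCAD penalty as defined by Fan and Li (2001), with their commonly used default a = 3.7. It is not the penalized log-likelihood of SCADOCM itself, which the abstract does not spell out; the function name `scad_penalty` and its arguments are assumptions made for this example only.

```python
def scad_penalty(beta: float, lam: float, a: float = 3.7) -> float:
    """Generic SCAD penalty of Fan and Li (2001) for a single coefficient.

    beta: coefficient being penalized (hypothetical name)
    lam:  tuning parameter lambda > 0
    a:    shape parameter, must exceed 2 (3.7 is the usual default)
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    if a <= 2:
        raise ValueError("a must be greater than 2")

    theta = abs(beta)
    if theta <= lam:
        # Small coefficients: same linear penalty as the lasso.
        return lam * theta
    if theta <= a * lam:
        # Intermediate coefficients: quadratic transition region.
        return (2 * a * lam * theta - theta ** 2 - lam ** 2) / (2 * (a - 1))
    # Large coefficients: constant penalty, so they are not shrunk further.
    return lam ** 2 * (a + 1) / 2


if __name__ == "__main__":
    for value in (0.05, 0.2, 1.0):
        print(value, scad_penalty(value, lam=0.1))
```

The three branches join continuously at |beta| = lam and |beta| = a * lam. The penalty shrinks small coefficients toward zero like the lasso while leaving large ones essentially unpenalized.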
To examine the performance of SCADOCM and compare it with the SIE method, two simulation studies (Study 1 and Study 2) were conducted. Study 1 was based on a simulated item bank, and Study 2 on a real item bank (the Internet addiction item bank; Shi, 2017). Four factors were manipulated: the calibration sample size (n_j = 50, 100, 500, 1000, or 2000), the distribution of attribute patterns (uniform, higher-order, or normal), the item quality (U(0.05, 0.15) or U(0.1, 0.3)), and the online calibration method (SCADOCM or SIE). The results showed the following. (1) SCADOCM has satisfactory calibration accuracy and calibration efficiency and is superior to the SIE method. The traditional SIE method is not applicable to the G-DINA model, and its Q-matrix estimation accuracy is low under all experimental conditions. (2) Under most conditions, the item calibration accuracy of both SCADOCM and SIE increases with calibration sample size and item quality, and it is higher under the uniform and higher-order distributions than under the normal distribution. (3) The calibration efficiency of SCADOCM decreases as the calibration sample grows but is little affected by item quality or the attribute pattern distribution; the calibration efficiency of SIE also decreases as the calibration sample grows but is little affected by item quality. Moreover, SIE is slightly less efficient under the normal distribution than under the uniform and higher-order distributions.
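The four manipulated factors can be laid out as a condition grid. The sketch below assumes the design is fully crossed, which the abstract implies but does not state outright; the constant names and the `build_conditions` helper are hypothetical and only serve to enumerate the cells.

```python
from itertools import product

# Factor levels as listed in the abstract.
SAMPLE_SIZES = (50, 100, 500, 1000, 2000)
ATTRIBUTE_DISTRIBUTIONS = ("uniform", "higher-order", "normal")
ITEM_QUALITY_RANGES = ((0.05, 0.15), (0.1, 0.3))  # bounds of the uniform distributions
METHODS = ("SCADOCM", "SIE")


def build_conditions():
    """Return every combination of the four factors (assumed fully crossed)."""
    return [
        {"n_j": n, "distribution": dist, "item_quality": quality, "method": method}
        for n, dist, quality, method in product(
            SAMPLE_SIZES, ATTRIBUTE_DISTRIBUTIONS, ITEM_QUALITY_RANGES, METHODS
        )
    ]


if __name__ == "__main__":
    conditions = build_conditions()
    print(len(conditions))  # 5 * 3 * 2 * 2 cells
    print(conditions[0])
```

Under this assumption each study has 5 × 3 × 2 × 2 = 60 cells, or 30 data-generating conditions to which both calibration methods are applied.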
In summary, SCADOCM achieves higher item calibration accuracy and calibration efficiency than the SIE method, whereas the traditional SIE method is not suitable for the G-DINA model. This study thus provides an efficient and accurate method for item calibration in CD-CAT and supports the further application of CD-CAT in practice.

Key words: Cognitive Diagnostic Computerized Adaptive Testing, Online Calibration, Q-matrix, G-DINA model, SCAD Penalty

Chinese Library Classification (CLC) number: B841

谭青蓉, 蔡艳, 汪大勋, 罗芬, 涂冬波. (2024). CD-CAT中基于SCAD惩罚和EM视角的在线标定方法开发——G-DINA模型. 心理学报, 56(5), 670-688.

TAN Qingrong, CAI Yan, WANG Daxun, LUO Fen, TU Dongbo. (2024). Development of Online Calibration Method based on SCAD penalty and EM perspective in CD-CAT: G-DINA model. Acta Psychologica Sinica, 56(5), 670-688.

Figures/Tables: 9

References: 52

[1]	Ban, J. C., Hanson, B. A., Wang, T., Yi, Q., & Harris, D. J. (2001). A comparative study of on-line pretest item-calibration/scaling methods in computerized adaptive testing. Journal of Educational Measurement, 38(3), 191-212. doi: 10.1111/jedm.2001.38.issue-3
[2]	Bradshaw, L. P., & Madison, M. J. (2015). Invariance properties for general diagnostic classification models. International Journal of Testing, 16(2), 99-118. doi: 10.1080/15305058.2015.1107076
[3]	Breheny, P., & Huang, J. (2011). Coordinate descent algorithms for nonconvex penalized regression, with applications to biological feature selection. The Annals of Applied Statistics, 5(1), 232-253.
[4]	Chen, J. (2017a). A residual-based approach to validate Q-matrix specifications. Applied Psychological Measurement, 41(4), 277-293. doi: 10.1177/0146621616686021
[5]	Chen, J., de la Torre, J., & Zhang, Z. (2013). Relative and absolute fit evaluation in cognitive diagnosis modeling. Journal of Educational Measurement, 50(2), 123-140. doi: 10.1111/jedm.2013.50.issue-2
[6]	Chen, P. (2016). Two new online calibration methods for computerized adaptive testing. Acta Psychologica Sinica, 48(9), 1184-1198. doi: 10.3724/SP.J.1041.2016.01184
	[陈平. (2016). 两种新的计算机化自适应测验在线标定方法. 心理学报, 48(9), 1184-1198.]
[7]	Chen, P. (2017b). A comparative study of online item calibration methods in multidimensional computerized adaptive testing. Journal of Educational and Behavioral Statistics, 42(5), 559-590. doi: 10.3102/1076998617695098
[8]	Chen, P., & Wang, C. (2015). A new online calibration method for multidimensional computerized adaptive testing. Psychometrika, 81(3), 674-701. doi: 10.1007/s11336-015-9482-9
[9]	Chen, P., Wang, C., Xin, T., & Chang, H. H. (2017). Developing new online calibration methods for multidimensional computerized adaptive testing. British Journal of Mathematical & Statistical Psychology, 70(1), 81-117.
[10]	Chen, P., & Xin, T. (2011a). Developing on-line calibration methods for cognitive diagnostic computerized adaptive testing. Acta Psychologica Sinica, 43(6), 710-724.
	[陈平, 辛涛. (2011a). 认知诊断计算机化自适应测验中在线标定方法的开发. 心理学报, 43(6), 710-724.]
[11]	Chen, P., & Xin, T. (2011b). Item replenishing in cognitive diagnostic computerized adaptive testing. Acta Psychologica Sinica, 43(7), 836-850.
	[陈平, 辛涛. (2011b). 认知诊断计算机化自适应测验中的项目增补. 心理学报, 43(7), 836-850.]
[12]	Chen, P., Xin, T., Wang, C., & Chang, H. (2012). Online calibration methods for the DINA model with independent attributes in CD-CAT. Psychometrika, 77(2), 201-222. doi: 10.1007/s11336-012-9255-7
[13]	Chen, Y., Liu, J., & Ying, Z. (2015). Online item calibration for Q-matrix in CD-CAT. Applied Psychological Measurement, 39(1), 5-15. doi: 10.1177/0146621613513065 pmid: 29882531
[14]	Cheng, Y. (2009). When cognitive diagnosis meets computerized adaptive testing: CD-CAT. Psychometrika, 74(4), 619-632. doi: 10.1007/s11336-009-9123-2
[15]	Chiu, C.-Y. (2013). Statistical refinement of the Q-matrix in cognitive diagnosis. Applied Psychological Measurement, 37(8), 598-618. doi: 10.1177/0146621613488436
[16]	de la Torre, J. (2011). The generalized DINA model framework. Psychometrika, 76(2), 179-199. doi: 10.1007/s11336-011-9207-7
[17]	de la Torre, J., & Chiu, C. Y. (2016). General method of empirical Q-matrix validation. Psychometrika, 81(2), 253-273. doi: 10.1007/s11336-015-9467-8
[18]	de la Torre, J., & Lee, Y. S. (2010). A note on the invariance of the DINA model parameters. Journal of Educational Measurement, 47(1), 115-127. doi: 10.1111/jedm.2010.47.issue-1
[19]	de la Torre, J., van der Ark, L. A., & Rossi, G. (2018). Analysis of clinical data from a cognitive diagnosis modeling framework. Measurement and Evaluation in Counseling and Development, 51(4), 281-296. doi: 10.1080/07481756.2017.1327286
[20]	Fan, J., & Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96(456), 1348-1360. doi: 10.1198/016214501753382273
[21]	Fan, J., & Lv, J. (2010). A selective overview of variable selection in high dimensional feature space. Statistica Sinica, 20(1), 101-148. pmid: 21572976
[22]	Fan, Y., & Tang, C. Y. (2013). Tuning parameter selection in high dimensional penalized likelihood. Journal of the Royal Statistical Society Series B: Statistical Methodology, 75(3), 531-552.
[23]	Henson, R., Templin, J., & Willse, J. (2009). Defining a family of cognitive diagnosis models using log-linear models with latent variables. Psychometrika, 74(2), 191-210. doi: 10.1007/s11336-008-9089-5
[24]	Hou, L. (2013). Differential item functioning assessment in cognitive diagnostic modeling (Unpublished doctoral dissertation). University of Delaware.
[25]	Junker, B. W., & Sijtsma, K. (2001). Cognitive assessment models with few assumptions, and connections with nonparametric item response theory. Applied Psychological Measurement, 25(3), 258-272. doi: 10.1177/01466210122032064
[26]	Kang, H. A., Zheng, Y., & Chang, H. H. (2020). Online calibration of a joint model of item responses and response times in computerized adaptive testing. Journal of Educational and Behavioral Statistics, 45(2), 175-208. doi: 10.3102/1076998619879040
[27]	Klein Entink, R. H., Kuhn, J.-T., Hornke, L. F., & Fox, J.-P. (2009). Evaluating cognitive theory: A joint modeling approach using responses and response times. Psychological Methods, 14(1), 54-75. doi: 10.1037/a0014877 pmid: 19271848
[28]	Li, H. (2012). Statistical learning method. Beijing: Tsinghua University Press.
	[李航. (2012). 统计学习方法. 北京: 清华大学出版.]
[29]	Lin, C. J., & Chang, H. H. (2019). Item selection criteria with practical constraints in cognitive diagnostic computerized adaptive testing. Educational and Psychological Measurement, 79(2), 335-357. doi: 10.1177/0013164418790634
[30]	Liu, H., You, X., Wang, W., Ding, S., & Chang, H. (2013). The development of computerized adaptive testing with cognitive diagnosis for an English achievement test in China. Journal of Classification, 30(2), 152-172. doi: 10.1007/s00357-013-9128-5
[31]	Ma, W., & de la Torre, J. (2020). GDINA: An R package for cognitive diagnosis modeling. Journal of Statistical Software, 93(14), 1-26.
[32]	Madison, M. J., & Bradshaw, L. P. (2018). Assessing growth in a diagnostic classification model framework. Psychometrika, 83, 963-990. doi: 10.1007/s11336-018-9638-5 pmid: 30264183
[33]	Peng, S., Wang, D., Gao, X., Cai, Y., & Tu, D. (2019). The CDA-BPD: Retrofitting a traditional borderline personality questionnaire under the cognitive diagnosis model framework. Journal of Pacific Rim Psychology, 13, Article e22.
[34]	Rupp, A. A., & Templin, J. L. (2008). The effects of Q-matrix misspecification on parameter estimates and classification accuracy in the DINA model. Educational and Psychological Measurement, 68(1), 78-96. doi: 10.1177/0013164407301545
[35]	Shi, S. S. (2017). Cognitive diagnosis of Internet addiction and its CD-CAT study (Unpublished master’s thesis). Jiangxi Normal University, Nanchang, China.
	[史双双. (2017). 网络成瘾的认知诊断及其CD-CAT的研究(硕士学位论文). 江西师范大学, 南昌.]
[36]	Stocking, M. L. (1988). Scale drift in on-line calibration. ETS Research Report Series, 1988(1), 1-122.
[37]	Tan, Q., Cai, Y., Luo, F., & Tu, D. (2022). Development of a high-accuracy and effective online calibration method in CD-CAT based on Gini index. Journal of Educational and Behavioral Statistics, 48(1), 103-141. doi: 10.3102/10769986221126741
[38]	Tan, Q., Wang, D., Luo, F., Cai, Y., & Tu, D. (2021). A high-efficiency and new online calibration method in CD-CAT based on information gain of entropy and EM algorithm. Acta Psychologica Sinica, 53(11), 1286-1300. doi: 10.3724/SP.J.1041.2021.01286
	[谭青蓉, 汪大勋, 罗芬, 蔡艳, 涂冬波. (2021). 一种高效的CD-CAT在线标定新方法:基于熵的信息增益与EM视角. 心理学报, 53(11), 1286-1300.]
[39]	Tan, Z., de la Torre, J., Ma, W., Huh, D., Larimer, M. E., & Mun, E.-Y. (2023). A tutorial on cognitive diagnosis modeling for characterizing mental health symptom profiles using existing item responses. Prevention Science: The Official Journal of the Society for Prevention Research, 24(3), 480-492. doi: 10.1007/s11121-022-01346-8
[40]	Tang, F., & Zhan, P. (2021). Does diagnostic feedback promote learning? Evidence from a longitudinal cognitive diagnostic assessment. AERA Open, 7(3), 296-307.
[41]	Templin, J. L., & Henson, R. A. (2006). Measurement of psychological disorders using cognitive diagnosis models. Psychological Methods, 11(3), 287-305. pmid: 16953706
[42]	Tu, D., Gao, X., Wang, D., & Cai, Y. (2017). A new measurement of Internet addiction using diagnostic classification models. Frontiers in Psychology, 8, 1768. doi: 10.3389/fpsyg.2017.01768 pmid: 29066994
[43]	van der Linden, W. J., Klein Entink, R. H., & Fox, J.-P. (2010). IRT parameter estimation with response times as collateral information. Applied Psychological Measurement, 34(5), 327-347. doi: 10.1177/0146621609349800
[44]	Wainer, H., & Mislevy, R. J. (1990). Item response theory, item calibration, and proficiency estimation. In H. Wainer (Ed.), Computerized adaptive testing: A primer (Chap. 4, pp. 65-102). Hillsdale, NJ: Erlbaum.
[45]	Wang, D., Gao, X., Cai, Y., & Tu, D. (2020). A method of Q-matrix validation for polytomous response cognitive diagnosis model based on relative fit statistics. Acta Psychologica Sinica, 52(1), 93-106. doi: 10.3724/SP.J.1041.2020.00093
	[汪大勋, 高旭亮, 蔡艳, 涂冬波. (2020). 基于类别水平的多级计分认知诊断Q矩阵修正:相对拟合统计量视角. 心理学报, 52(1), 93-106.]
[46]	Wang, H., Li, R., & Tsai, C. L. (2007). Tuning parameter selectors for the smoothly clipped absolute deviation method. Biometrika, 94(3), 553-568. doi: 10.1093/biomet/asm053 pmid: 19343105
[47]	Wang, W., Song, L., Ding, S., Meng, Y., Cao, C., & Jie, Y. (2018). An EM-based method for Q-matrix validation. Applied Psychological Measurement, 42(6), 446-459. doi: 10.1177/0146621617752991 pmid: 30787487
[48]	Xi, C., Cai, Y., Peng, S., Lian, J., & Tu, D. (2020). A diagnostic classification version of Schizotypal Personality Questionnaire using diagnostic classification models. International Journal of Methods in Psychiatric Research, 29(1), e1807. doi: 10.1002/mpr.v29.1
[49]	Xu, G., Wang, C., & Shang, Z. (2016). On initial item selection in cognitive diagnostic computerized adaptive testing. British Journal of Mathematical and Statistical Psychology, 69(3), 291-315. doi: 10.1111/bmsp.2016.69.issue-3
[50]	Zhang, X. G. (2010). Pattern Recognition (Third Edition). Tsinghua University Press, China.
	[张学工. (2010). 模式识别(第三版). 清华大学出版社.]
[51]	Zhang, Y., Li, R., & Tsai, C. L. (2010). Regularization parameter selections via generalized information criterion. Journal of the American Statistical Association, 105(489), 312-323. pmid: 20676354
[52]	Zheng, C., & Chang, H. H. (2016). High-efficiency response distribution-based item selection algorithms for short-length cognitive diagnostic computerized adaptive testing. Applied Psychological Measurement, 40(8), 608-624. doi: 10.1177/0146621616665196

Item quality	Attribute pattern distribution	Calibration sample size	P(0)		1 − P(1)
			SIE	SCADOCM	SIE	SCADOCM
0.1~0.3	Higher-order	50	0.180	0.133	0.186	0.107
		100	0.122	0.108	0.127	0.085
		500	0.055	0.045	0.057	0.037
		1000	0.037	0.031	0.039	0.026
		2000	0.027	0.022	0.028	0.018
	Uniform	50	0.362	0.162	0.356	0.122
		100	0.260	0.148	0.281	0.099
		500	0.111	0.068	0.113	0.046
		1000	0.079	0.041	0.078	0.032
		2000	0.053	0.027	0.054	0.022
	Normal	50	0.229	0.160	0.232	0.134
		100	0.154	0.124	0.155	0.101
		500	0.065	0.058	0.065	0.045
		1000	0.046	0.041	0.047	0.033
		2000	0.033	0.030	0.034	0.024
0.05~0.15	Higher-order	50	0.131	0.127	0.127	0.095
		100	0.086	0.088	0.088	0.066
		500	0.038	0.033	0.038	0.027
		1000	0.026	0.021	0.026	0.019
		2000	0.019	0.014	0.019	0.013
	Uniform	50	0.269	0.184	0.329	0.122
		100	0.198	0.142	0.218	0.087
		500	0.079	0.041	0.079	0.034
		1000	0.057	0.027	0.055	0.023
		2000	0.038	0.018	0.039	0.015
	Normal	50	0.169	0.161	0.177	0.125
		100	0.107	0.107	0.110	0.084
		500	0.046	0.044	0.047	0.035
		1000	0.033	0.029	0.033	0.025
		2000	0.023	0.019	0.023	0.017

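ID	Symptom criterion
A1	Preoccupation with Internet games (e.g., thinking about previous gaming activity or anticipating the next game; Internet gaming becomes the dominant activity in daily life).
A2	Withdrawal symptoms when away from Internet games (e.g., irritability, anxiety, or sadness, but with no physical signs of pharmacological withdrawal).
A3	Tolerance: the need to spend increasing amounts of time engaged in Internet games.
A4	Unsuccessful attempts to control participation in Internet games.
A5	Loss of interest in previous hobbies and entertainment as a result of, and with the exception of, Internet games.
A6	Continued excessive use of Internet games despite knowledge of psychosocial problems.
A7	Lying to family members, therapists, or others about the amount of Internet gaming.
A8	Use of Internet games to escape or relieve negative moods (e.g., feelings of helplessness, anxiety, or guilt).
A9	Jeopardizing or losing a significant relationship, job, or educational or career opportunity because of participation in Internet games.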

Item parameter	Minimum	Maximum	Mean	SD
1 − P(1)	0.161	0.500	0.450	0.072
P(0)	0.004	0.500	0.069	0.082