Handling missing data in cognitive diagnostic assessment: The random forest threshold imputation method

doi:10.3724/SP.J.1041.2023.01192

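Abstract

Summary:

The handling of missing data in cognitive diagnostic assessment is a topic of great interest to both theorists and practitioners. Drawing on the fact that random forest imputation (RFI) does not depend on assumptions about the missing-data mechanism, this study improves the existing RFI method and proposes a new method that uses a person-fit index (RCI) to set the imputation threshold: random forest threshold imputation (RFTI). Simulation studies show that RFTI achieves clearly higher imputation accuracy than RFI. Compared with RFI and the EM method, RFTI shows clear advantages in the pattern-level and marginal (attribute-level) correct classification rates of examinees' attribute profiles, and these advantages are most pronounced under the not-missing-at-random (MNAR) and mixed missing-data mechanisms and when the proportion of missing data is high. For item parameter estimation, however, RFTI has no advantage over the EM method.

Keywords: missing data, cognitive diagnostic assessment, random forest threshold imputation, random forest imputation, EM algorithm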

Abstract:

In recent years, interest in cognitive diagnostic assessments (CDAs), a new form of testing, has increased drastically. Because of the specific design of these tests, missing data are an inevitable problem in CDAs. Handling missing data properly is important for providing accurate diagnostic feedback to students and teachers. With the growing use of machine learning in education, relevant advances have been made in missing data imputation. Research has shown that machine learning techniques have more desirable features for missing data imputation than traditional approaches. The random forest algorithm has been extended into the random forest imputation (RFI) method for handling missing data in CDAs. The method takes the characteristics of the data into consideration rather than assuming a particular missing-data mechanism. RFI is a new non-parametric method that makes full use of the available response information and the characteristics of response patterns to impute missing data.

Building on the advantages of RFI in classification and prediction and its independence from the type of missing-data mechanism, we improved RFI and proposed the new random forest threshold imputation (RFTI) method. It can be used to impute missing responses in the widely used DINA (Deterministic Inputs, Noisy “And” Gate) model. This research applied the Response Conformity Index (RCI) to set the imputation threshold, yielding a treatment of missing responses for CDAs that does not rely entirely on imputation. Two simulation studies were conducted to compare the performance of the proposed method with that of traditional methods. Study 1 began by introducing the theoretical background and algorithmic implementation of RFTI. Then, RFTI and RFI were compared in terms of imputation accuracy for data with different proportions of missingness (10%, 20%, 30%, 40%, 50%) and different missing-data mechanisms (MIXED, MNAR, MAR, MCAR). The aim was to confirm the necessity of including RCI during imputation. Study 2 investigated the performance of RFTI, RFI, and the EM algorithm in handling missing data under different conditions. The manipulated design factors were identical to those in Study 1. We evaluated RFTI in terms of its accuracy in estimating examinees' attributes and item parameters, and compared it with the traditionally better-performing EM algorithm and RFI under various design conditions to explore the advantages of RFTI and the conditions under which it should be used.
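
For readers unfamiliar with the DINA model named above, its item response function can be sketched as follows. This is the standard textbook form of the model, not code from the study; the function name `dina_correct_prob` and the slip and guessing values in the usage lines are illustrative assumptions.

```python
from math import prod


def dina_correct_prob(alpha, q_row, slip, guess):
    """Probability of a correct response to one item under the DINA model.

    alpha : 0/1 mastery indicators of one examinee, one per attribute
    q_row : 0/1 Q-matrix row of the item (1 = attribute required)
    slip  : probability that an examinee with all required attributes errs
    guess : probability that an examinee lacking a required attribute succeeds
    """
    # eta is 1 only if every attribute required by the item is mastered.
    # Note that 0 ** 0 == 1 in Python, so non-required attributes never block.
    eta = prod(a ** q for a, q in zip(alpha, q_row))
    return (1 - slip) if eta == 1 else guess


# Hypothetical examinee who masters attributes 1 and 3 but not 2.
examinee = [1, 0, 1]
p_all_required = dina_correct_prob(examinee, [1, 0, 1], slip=0.1, guess=0.2)
p_missing_one = dina_correct_prob(examinee, [1, 1, 0], slip=0.1, guess=0.2)
```

In the first call the examinee holds every required attribute, so the probability is one minus the slip parameter; in the second, attribute 2 is required but not mastered, so the probability falls back to the guessing parameter.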

Results of Study 1 showed that RFTI, compared with RFI, improved accuracy when the imputation threshold was one. Across the design conditions, the imputation rate and accuracy of RFTI were also better. Study 2 showed that RFTI outperformed the other methods (RFI and the EM algorithm) in accurately assessing the attribute pattern and attribute margin. This advantage depended on the missing-data mechanism and the proportion of missing data. Notably, RFTI performed particularly well relative to the other methods with mixed-mechanism or MNAR missing data, and when the proportion of missing data was higher than 30%. However, RFTI was no better than the other methods in the accuracy of item parameter estimates. In most conditions, the EM algorithm provided the most accurate parameter estimates.
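
The attribute pattern and attribute margin criteria mentioned above are commonly computed as the pattern-wise and attribute-wise agreement between true and estimated mastery profiles. The following is a minimal sketch assuming those common definitions; the abstract does not give the study's exact formulas, and the function names and the toy profiles are hypothetical.

```python
def pattern_match_rate(true_profiles, est_profiles):
    """Share of examinees whose whole estimated attribute pattern is correct."""
    hits = sum(tuple(t) == tuple(e) for t, e in zip(true_profiles, est_profiles))
    return hits / len(true_profiles)


def marginal_match_rates(true_profiles, est_profiles):
    """Share of examinees classified correctly, computed per attribute."""
    n_examinees = len(true_profiles)
    n_attributes = len(true_profiles[0])
    return [
        sum(t[k] == e[k] for t, e in zip(true_profiles, est_profiles)) / n_examinees
        for k in range(n_attributes)
    ]


# Toy data: four examinees, two attributes (illustrative values only).
true_profiles = [(1, 0), (1, 1), (0, 0), (0, 1)]
est_profiles = [(1, 0), (1, 0), (0, 0), (1, 1)]

pattern_rate = pattern_match_rate(true_profiles, est_profiles)
marginal_rates = marginal_match_rates(true_profiles, est_profiles)
```

The pattern-level rate is the stricter criterion, because a single misclassified attribute makes the whole pattern count as wrong, which is why it is usually lower than the marginal rates.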

In sum, we propose a method for imputing missing data in CDAs by applying machine learning methods to measurement models. The advantage of this new method is confirmed by its accurate assessment of the attribute pattern and attribute margin under the DINA model. Theoretically, the current study provides a missing data imputation approach with fewer assumptions, which extends the traditional methods for imputing missing data in the CDA framework. Moreover, we investigate how to estimate students' attribute patterns accurately from responses to only a few items. This sheds light on imputing data that are missing because of particular designs in assessment or teaching.

Key words: missing data, cognitive diagnostic assessment, random forest threshold imputation, random forest imputation, expectation-maximization algorithm

CLC number:

B841

YOU Xiaofeng, YANG Jianqin, QIN Chunying, LIU Hongyun. (2023). Missing data analysis in cognitive diagnostic models: Random forest threshold imputation method [认知诊断测评中缺失数据的处理：随机森林阈值插补法]. Acta Psychologica Sinica (心理学报), 55(7), 1192-1206.

Figures/Tables: 6

Segment	Missing proportion (%)
0%~5%	MR×1.50
5%~15%	MR×1.35
15%~30%	MR×1.15
30%~70%	MR×1.00
70%~85%	MR×0.85
85%~95%	MR×0.65
95%~100%	MR×0.50
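
The table above can be read as a base missing proportion MR scaled by a segment-specific multiplier. The sketch below works through that arithmetic under two assumptions that are not stated in this excerpt: that the segments are percentile bands of examinees, and that the last band spans 95% to 100%, so that the bands tile the whole range without overlap. The names `BANDS`, `band_missing_rates`, and `overall_missing_rate` are hypothetical.

```python
# (lower %, upper %, multiplier) for each segment, in table order.
# Assumption: the final band is 95%~100%, so the bands do not overlap.
BANDS = [
    (0, 5, 1.50),
    (5, 15, 1.35),
    (15, 30, 1.15),
    (30, 70, 1.00),
    (70, 85, 0.85),
    (85, 95, 0.65),
    (95, 100, 0.50),
]


def band_missing_rates(mr):
    """Missing proportion applied within each band for a base proportion mr."""
    return [(lo, hi, mr * k) for lo, hi, k in BANDS]


def overall_missing_rate(mr):
    """Average missing proportion, weighting each band by its width."""
    return sum((hi - lo) / 100 * mr * k for lo, hi, k in BANDS)


rates_at_30 = band_missing_rates(0.30)
overall_at_30 = overall_missing_rate(0.30)
```

With a base proportion of 0.30, the first band receives 0.45 and the last 0.15. Under the stated assumptions the width-weighted multipliers average to one, so the overall missing proportion stays at MR while missingness varies across segments.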

Missing proportion	RFI: accuracy when the imputed value is 1 (%)				RFTI: accuracy when the imputed value is 1 (%)
Missing proportion	MIXED	MNAR	MAR	MCAR	MIXED	MNAR	MAR	MCAR
10%	49.39	59.19	75.54	75.30	71.80	78.57	82.12	83.07
20%	42.84	49.29	73.23	73.62	67.25	75.45	83.04	81.81
30%	35.42	44.98	71.49	71.65	68.26	74.91	80.35	81.48
40%	32.51	42.97	68.32	69.04	58.22	71.59	79.74	79.84
50%	30.89	42.60	66.74	64.97	49.44	64.58	76.67	78.09
Mean	38.21	47.80	71.06	70.92	62.99	73.02	80.39	80.86
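
To make the size of the difference between the two methods concrete, the mean row of the table above can be turned into percentage-point gains. This is only arithmetic on the tabulated means; the variable and function names are hypothetical.

```python
# Mean accuracies (%) copied from the last row of the table above.
MECHANISMS = ["MIXED", "MNAR", "MAR", "MCAR"]
RFI_MEAN = [38.21, 47.80, 71.06, 70.92]
RFTI_MEAN = [62.99, 73.02, 80.39, 80.86]


def accuracy_gain(rfi, rfti):
    """Percentage-point gain of RFTI over RFI for each missing-data mechanism."""
    return {m: round(b - a, 2) for m, a, b in zip(MECHANISMS, rfi, rfti)}


gains = accuracy_gain(RFI_MEAN, RFTI_MEAN)
for mechanism, gain in gains.items():
    print(f"{mechanism}: +{gain:.2f} points")
```

The gains are roughly 25 points under the MIXED and MNAR mechanisms and roughly 9 to 10 points under MAR and MCAR, in line with the abstract's statement that the advantage of RFTI is largest for non-random and mixed missingness.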

Missing proportion	MIXED		MNAR		MAR		MCAR
Missing proportion	Accuracy	Missing rate	Accuracy	Missing rate	Accuracy	Missing rate	Accuracy	Missing rate
10%	86.15	0.96	84.69	1.16	77.68	0.94	77.94	1.01
20%	85.86	2.13	84.39	2.81	77.97	2.03	77.69	2.02
30%	85.86	3.87	84.35	5.88	78.19	3.55	78.28	3.50
40%	85.61	7.27	84.38	9.03	78.27	5.98	78.48	5.60
50%	85.03	10.12	82.61	11.66	78.28	7.03	78.41	7.98