A new dual-objective CD-CAT item selection method based on the Gini index

doi:10.3724/SP.J.1041.2020.01452

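Abstract

Summary:

The results of a dual-objective CD-CAT can be used for both formative and summative assessment. The Gini index measures the uncertainty of a random variable: the smaller its value, the lower the uncertainty. This paper uses the Gini index to measure the changes in the posterior probabilities of an examinee's knowledge-state category and of the confidence interval of the ability estimate, and proposes an item selection strategy based on the Gini index. Monte Carlo experiments show that, compared with existing item selection strategies, the new strategy achieves higher knowledge-state classification accuracy and higher ability estimation accuracy. It also keeps item bank usage reasonably uniform, responds quickly in real time, and is only slightly affected by the cognitive diagnosis model and by the distribution of examinees' knowledge states, so it can be used in practical tests whose item banks mix several cognitive diagnosis models.

Keywords: cognitive diagnosis, item response theory, Gini index, dual-objective CD-CAT, item selection strategy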

Abstract:

Existing literature has shown that dual-objective CD-CAT can serve the measurement objectives of both formative and summative assessment. The Gini index can be used to measure the uncertainty of a random variable: a smaller Gini value indicates lower uncertainty. This paper therefore proposes a Gini-index-based item selection method for dual-objective CD-CAT that measures the changes in the posterior probabilities of the knowledge state and of the confidence interval of the latent trait estimate. Following Bayesian decision theory, the method extracts information about a participant from the participant’s responses and from the resulting changes in the posterior probability distributions of these two random variables.
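The uncertainty measure described in this paragraph can be illustrated with a short sketch. It assumes the usual Gini index of a discrete distribution, G(p) = 1 - sum_k p_k^2, applied to the posterior over knowledge states, and it scores a candidate item by the expected Gini index after the response is observed. The function names, the toy posterior and the response probabilities are hypothetical, and the paper's full criterion, which also covers the confidence interval of the latent trait estimate, is not reproduced here.

```python
import numpy as np

def gini(p):
    """Gini index of a discrete distribution: 1 - sum of squared probabilities."""
    p = np.asarray(p, dtype=float)
    return 1.0 - np.sum(p ** 2)

def expected_gini_after_item(posterior, p_correct):
    """Expected posterior Gini index after administering one candidate item.

    posterior: current posterior over the K knowledge states.
    p_correct: probability of a correct response for each knowledge state
               (assumed to come from whatever CDM the item follows).
    """
    posterior = np.asarray(posterior, dtype=float)
    p_correct = np.asarray(p_correct, dtype=float)
    expected = 0.0
    # Average over the two possible responses: correct and incorrect.
    for likelihood in (p_correct, 1.0 - p_correct):
        marginal = np.sum(posterior * likelihood)
        if marginal > 0:
            updated = posterior * likelihood / marginal  # Bayes update
            expected += marginal * gini(updated)
    return expected

# Toy example with 4 knowledge states and a flat posterior (hypothetical).
posterior = np.full(4, 0.25)
informative_item = [0.2, 0.2, 0.8, 0.8]    # separates states {0,1} from {2,3}
uninformative_item = [0.5, 0.5, 0.5, 0.5]  # same probability for every state

score_informative = expected_gini_after_item(posterior, informative_item)
score_uninformative = expected_gini_after_item(posterior, uninformative_item)
# The item with the smaller expected Gini index would be preferred.
```

In this toy case the informative item lowers the expected Gini index from 0.75 to 0.66, while the uninformative item leaves it at 0.75.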
A Monte Carlo simulation was used to compare the performance of the item selection methods based on Gini, ASI, IPA and JSD. Each item bank contained 250 items measuring 5 attributes, and each item measured at most 3 attributes. The true knowledge state of each participant was generated from the HO-CDM and from two multivariate normal models (both with means of 0, and with covariance coefficients of 0.8 and 0.2, respectively). G-DINA, DINA and R-RUM were adopted as the cognitive diagnostic models, and the item bank for each of these three models included both CDM and 2PL parameters. Specifically, the CDM parameters were generated with the G-DINA package in R, with the slipping and guessing parameters drawn from a uniform distribution between 0.05 and 0.25. The 2PL parameters were estimated with the mirt package from the responses of 3,000 participants to all items in the item banks. Four indices, namely the pattern match ratio, the root mean square error of the latent trait estimate, the chi-square value and the time needed for item selection, were used to compare the efficiency of the item selection methods. The value of each index was the mean over 10 replications, each simulating the responses of 1,000 participants for every item bank.
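A minimal sketch of how an item bank with these dimensions might be simulated is given below. It is a NumPy stand-in rather than the R workflow used in the study: the way the number of attributes per item is drawn, the seed, and the helper `dina_p_correct` are assumptions made for illustration, and only the DINA model is shown.

```python
import numpy as np

rng = np.random.default_rng(2020)  # arbitrary seed, for reproducibility only
N_ITEMS, N_ATTRS, MAX_ATTRS = 250, 5, 3

# Q-matrix: each item measures between 1 and 3 of the 5 attributes.
# Drawing the count uniformly is an assumption, not the paper's design.
q_matrix = np.zeros((N_ITEMS, N_ATTRS), dtype=int)
for j in range(N_ITEMS):
    n_measured = rng.integers(1, MAX_ATTRS + 1)
    q_matrix[j, rng.choice(N_ATTRS, size=n_measured, replace=False)] = 1

# Slipping and guessing parameters drawn from U(0.05, 0.25).
slip = rng.uniform(0.05, 0.25, size=N_ITEMS)
guess = rng.uniform(0.05, 0.25, size=N_ITEMS)

def dina_p_correct(alpha, q_matrix, slip, guess):
    """DINA success probability on every item for one attribute pattern."""
    alpha = np.asarray(alpha)
    # eta is True when the examinee masters every attribute the item needs.
    eta = np.all(alpha >= q_matrix, axis=1)
    return np.where(eta, 1.0 - slip, guess)

all_mastered = dina_p_correct(np.ones(N_ATTRS, dtype=int), q_matrix, slip, guess)
none_mastered = dina_p_correct(np.zeros(N_ATTRS, dtype=int), q_matrix, slip, guess)
```

An examinee who masters every attribute answers each item correctly with probability 1 - slip, while one who masters none succeeds only by guessing.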
The results showed that (1) the Gini and IPA methods performed similarly in terms of pattern match ratio, root mean square error of the latent trait estimate and chi-square value. Both methods had high measurement precision and low sensitivity to the CDM and to the distribution of participants’ cognitive patterns, which makes both applicable to item banks containing a mixture of cognitive diagnosis models. By comparison, the Gini method slightly outperformed the IPA method in pattern match ratio and in the time needed for item selection, where the Gini method took only one-tenth of the time of the IPA method; (2) both the Gini and ASI methods are weighted linear combination approaches. The two methods performed very similarly in the short test. In the long test, however, although the ASI method needed only one-third of the item selection time of the Gini method, the Gini method was superior in measurement accuracy and chi-square value; (3) although the JSD method outperformed the Gini method in uniformity of item bank usage and in the time needed for item selection, its measurement accuracy was far lower.
To summarize, the Gini, IPA and ASI methods all have good measurement accuracy and are therefore all recommended for short tests. For medium and long tests with a limited number of attributes and a smaller item bank, the Gini and IPA methods are recommended. As the number of attributes and the item bank size grow, the Gini method is recommended. When the attributes are highly correlated and both the number of attributes and the item bank are large, the ASI and JSD methods are recommended, with the ASI method slightly outperforming the JSD method in measurement accuracy.

Key words: cognitive diagnosis, item response theory, Gini index, dual-objective CD-CAT, item selection method

CLC number: B841

LUO Fen, WANG Xiaoqing, CAI Yan, TU Dongbo. (2020). A new dual-objective CD-CAT item selection method based on the Gini index. Acta Psychologica Sinica (心理学报), 52(12), 1452-1465.

CDM	Knowledge-state generation model	Item selection strategy
		Gini		ASI		IPA		JSD
		Mean/%	SD	Mean/%	SD	Mean/%	SD	Mean/%	SD
G-DINA	HO	97.00	0.009	89.28	0.025	96.10	0.010	85.04	0.024
	MV-0.8	97.22	0.004	93.05	0.011	97.44	0.008	92.02	0.014
	MV-0.2	96.84	0.007	90.78	0.014	96.35	0.006	87.51	0.016
DINA	HO	97.45	0.010	90.99	0.032	97.18	0.011	75.31	0.060
	MV-0.8	97.24	0.011	93.45	0.017	97.06	0.010	91.46	0.023
	MV-0.2	97.57	0.006	93.76	0.007	96.93	0.008	86.23	0.050
R-RUM	HO	95.41	0.010	87.61	0.021	95.38	0.010	76.64	0.028
	MV-0.8	97.09	0.009	92.45	0.014	96.82	0.008	91.67	0.010
	MV-0.2	96.81	0.008	87.88	0.022	96.82	0.012	80.52	0.038
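Assuming the Mean/% columns above report the pattern match ratio named in the abstract, the index can be sketched as the share of examinees whose whole attribute pattern is classified correctly. The toy patterns and per-replication values below are hypothetical, and the use of the sample standard deviation (ddof=1) across replications is an assumption; the SD column appears to be on the proportion scale rather than in percent.

```python
import numpy as np

def pattern_match_ratio(true_states, estimated_states):
    """Share of examinees whose whole attribute pattern is recovered exactly."""
    true_states = np.asarray(true_states)
    estimated_states = np.asarray(estimated_states)
    return float(np.mean(np.all(true_states == estimated_states, axis=1)))

# Toy data: 4 examinees, 3 attributes (hypothetical values).
true_states = [[1, 0, 1], [0, 0, 1], [1, 1, 1], [0, 1, 0]]
estimated_states = [[1, 0, 1], [0, 1, 1], [1, 1, 1], [0, 1, 0]]
pmr = pattern_match_ratio(true_states, estimated_states)  # 3 of 4 patterns match

# Summarising several replications (hypothetical per-replication ratios).
replication_pmrs = np.array([0.97, 0.96, 0.98])
mean_percent = 100 * replication_pmrs.mean()
sd = replication_pmrs.std(ddof=1)
```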

CDM	Knowledge-state generation model	Item selection strategy
		Gini		ASI		IPA		JSD
		Bias	RMSE	Bias	RMSE	Bias	RMSE	Bias	RMSE
G-DINA	HO	0.02	0.32	0.00	0.41	0.04	0.28	0.02	0.40
	MV-0.8	0.00	0.29	0.01	0.29	0.02	0.29	0.02	0.30
	MV-0.2	0.03	0.27	0.02	0.32	0.07	0.27	0.05	0.42
DINA	HO	-0.08	0.40	-0.02	0.41	-0.14	0.37	-0.05	0.46
	MV-0.8	0.02	0.34	0.01	0.32	-0.03	0.35	-0.08	0.35
	MV-0.2	-0.12	0.38	-0.09	0.36	-0.24	0.42	-0.28	0.52
R-RUM	HO	-0.07	0.35	-0.01	0.42	-0.14	0.35	-0.02	0.45
	MV-0.8	0.00	0.30	-0.02	0.30	-0.03	0.30	-0.03	0.32
	MV-0.2	-0.04	0.31	-0.01	0.43	-0.10	0.29	-0.05	0.51
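The Bias and RMSE columns can be read with the usual definitions, sketched below for the latent trait estimates. The sign convention (estimate minus true value) and the toy numbers are assumptions, since the table itself does not state them.

```python
import numpy as np

def bias_and_rmse(theta_true, theta_hat):
    """Mean error and root mean square error of latent trait estimates."""
    error = np.asarray(theta_hat, dtype=float) - np.asarray(theta_true, dtype=float)
    bias = float(np.mean(error))
    rmse = float(np.sqrt(np.mean(error ** 2)))
    return bias, rmse

# Toy example with 4 examinees (hypothetical values).
theta_true = [0.0, 1.0, -1.0, 0.5]
theta_hat = [0.2, 0.8, -1.1, 0.5]
bias, rmse = bias_and_rmse(theta_true, theta_hat)
```

Here the errors are 0.2, -0.2, -0.1 and 0, giving a bias of -0.025 and an RMSE of 0.15. A bias near zero with a sizeable RMSE, as in several rows above, indicates errors that cancel on average rather than small errors.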

CDM	Knowledge-state generation model	Item selection strategy
		Gini		ASI		IPA		JSD
		χ²	TOE	χ²	TOE	χ²	TOE	χ²	TOE
G-DINA	HO	82.38	0.41	98.75	0.47	85.34	0.42	44.45	0.26
	MV-0.8	69.37	0.36	77.30	0.39	77.11	0.39	53.26	0.29
	MV-0.2	72.50	0.37	91.36	0.44	82.94	0.41	37.08	0.23
DINA	HO	70.91	0.36	86.88	0.43	72.68	0.37	53.52	0.29
	MV-0.8	56.55	0.31	66.74	0.35	58.98	0.32	59.31	0.32
	MV-0.2	72.11	0.37	83.17	0.41	67.31	0.35	58.41	0.31
R-RUM	HO	95.78	0.46	109.29	0.52	94.55	0.46	58.22	0.31
	MV-0.8	85.70	0.42	84.99	0.42	87.92	0.43	56.27	0.30
	MV-0.2	88.92	0.44	105.01	0.50	95.48	0.46	60.78	0.32
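The two columns of this table can be related through a short sketch, under two assumptions that the table does not state: that χ² is the usual exposure-uniformity statistic, the sum over items of (er_j - L/N)² / (L/N) with er_j the exposure rate of item j, L the test length and N the bank size, and that TOE is the test overlap rate. With the large-sample overlap rate sum_j er_j² / L, the two satisfy overlap = (χ² + L) / N. The simulated administration below uses random item selection purely as a placeholder, and the test length of 20 is an inference rather than a figure given here: the tabulated pairs are consistent with it for a 250-item bank, for example (82.38 + 20) / 250 ≈ 0.41.

```python
import numpy as np

def exposure_stats(administered, test_length):
    """Chi-square uniformity statistic and large-sample test overlap rate.

    administered: (n_examinees, n_items) 0/1 matrix marking the items
                  each examinee received in a fixed-length test.
    """
    administered = np.asarray(administered, dtype=float)
    n_items = administered.shape[1]
    exposure = administered.mean(axis=0)   # exposure rate of each item
    expected = test_length / n_items       # rate under perfectly uniform usage
    chi2 = float(np.sum((exposure - expected) ** 2) / expected)
    overlap = float(np.sum(exposure ** 2) / test_length)
    return chi2, overlap

# Placeholder administration: every examinee gets 20 random items out of 250.
rng = np.random.default_rng(0)
n_examinees, n_items, test_length = 1000, 250, 20
administered = np.zeros((n_examinees, n_items), dtype=int)
for i in range(n_examinees):
    administered[i, rng.choice(n_items, size=test_length, replace=False)] = 1

chi2, overlap = exposure_stats(administered, test_length)
```

Smaller values of both statistics indicate more even use of the item bank, which is the sense in which the JSD columns above look more uniform than the others.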