Standard errors and confidence intervals for cognitive diagnostic models: Parallel bootstrap methods

doi:10.3724/SP.J.1041.2022.00703

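Abstract

Abstract (translated from the Chinese):

The standard error (SE; or variance-covariance matrix) and confidence interval (CI) of cognitive diagnostic models have important theoretical and practical value for quantifying the uncertainty of model parameter estimates, testing differential item functioning, comparing models at the item level, validating the Q-matrix, and exploring attribute hierarchies. This study proposes two new methods for computing SEs and CIs: the parallel parametric bootstrap and the parallel non-parametric bootstrap. The simulation study found that when the model was correctly specified, both methods performed well in computing the SEs and CIs of the model parameters under the high- and moderate-item-quality conditions; when the model contained redundant parameters, the SEs and CIs of most permissible model parameters performed well under the high- and moderate-item-quality conditions. An empirical data analysis illustrates the value of the new methods and the gain in computational efficiency.

Keywords: cognitive diagnostic model, standard error, confidence interval, bootstrap, parallel computing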

Abstract:

The model parameter standard error (SE; or variance-covariance matrix), which provides an estimate of the uncertainty associated with the model parameter estimate, has both theoretical and practical implications in cognitive diagnostic models (CDMs). The drawbacks of the analytic methods, such as the empirical cross-product information matrix, observed information matrix, and “robust” sandwich-type information matrix, are that they require the positive definiteness of the information matrix and may suffer from boundary problems. Another way to estimate model parameter SEs is the computer-intensive bootstrap method; however, no study has systematically explored the performance of the bootstrap in calculating model parameter SEs and confidence intervals (CIs) in CDMs.
The purpose of this research is to present two new highly efficient bootstrap methods to calculate model parameter SEs and CIs in CDMs, namely the parallel parametric bootstrap (pPB) and parallel non-parametric bootstrap (pNPB) methods. A simulation study was conducted to evaluate the performance of the pPB and pNPB methods. Five factors that may influence the performance of the model parameter SEs and CIs were manipulated. The two model specification scenarios considered in this simulation were the correctly specified and over-specified models. The sample size was set to two levels: 1,000 and 3,000. Three bootstrap sample sizes were manipulated: 200, 500, and 3,000. Three levels of item quality were considered: high [$P(\mathbf{0})=0.1$, $P(\mathbf{1})=0.9$], moderate [$P(\mathbf{0})=0.2$, $P(\mathbf{1})=0.8$], and low quality [$P(\mathbf{0})=0.3$, $P(\mathbf{1})=0.7$]. The pPB and pNPB methods were used to estimate model parameter SEs and CIs.
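The two resampling schemes named above can be sketched in a few lines. This is only an illustration under stated assumptions, not the author's implementation: `estimate` is a hypothetical stand-in (per-item proportion correct) for fitting a CDM, the function names are invented for this example, percentile CIs are assumed because the abstract does not say how the CIs are formed, and a thread pool is used only to keep the sketch self-contained (a process pool or cluster back end could be substituted).

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def estimate(responses):
    """Hypothetical stand-in for fitting a CDM: per-item proportion correct."""
    return responses.mean(axis=0)


def npb_replicate(responses, seed_seq):
    """Non-parametric replicate: resample examinees (rows) with replacement."""
    rng = np.random.default_rng(seed_seq)
    n = responses.shape[0]
    rows = rng.integers(0, n, size=n)
    return estimate(responses[rows])


def pb_replicate(responses, seed_seq):
    """Parametric replicate: simulate new data from the fitted model."""
    rng = np.random.default_rng(seed_seq)
    p_hat = estimate(responses)
    simulated = (rng.random(responses.shape) < p_hat).astype(int)
    return estimate(simulated)


def parallel_bootstrap(responses, replicate, n_boot=200, workers=4, seed=0):
    """Run n_boot independent replicates in a worker pool; return SE and 95% CI."""
    # One independent random stream per replicate keeps the result reproducible
    # regardless of how the work is scheduled.
    seeds = np.random.SeedSequence(seed).spawn(n_boot)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        reps = np.array(list(pool.map(lambda s: replicate(responses, s), seeds)))
    se = reps.std(axis=0, ddof=1)                            # bootstrap SE
    lower, upper = np.percentile(reps, [2.5, 97.5], axis=0)  # percentile CI (assumed)
    return se, lower, upper


# Toy data: 1,000 examinees, 5 items, each answered correctly with probability 0.7.
data = (np.random.default_rng(1).random((1000, 5)) < 0.7).astype(int)
se_npb, lo_npb, hi_npb = parallel_bootstrap(data, npb_replicate, n_boot=200)
se_pb, lo_pb, hi_pb = parallel_bootstrap(data, pb_replicate, n_boot=200)
```

Because each replicate is independent of the others, the replicates can be distributed across workers without any communication, which is what makes the bootstrap a natural candidate for parallel computing.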
The simulation results indicated the following.
(1) For the correctly specified CDMs, under the high- or moderate-item-quality conditions, the coverage rates of the 95% CIs based on the pNPB or pPB method were reasonably close to the expected coverage rate, and the bias of each model parameter SE approached zero, meaning that the estimated SE was almost identical to the empirical SE. Increasing the bootstrap sample size had only a slight effect on the performance of the pNPB or pPB method. Under the low-item-quality condition, the pNPB method tended to overestimate the SE, whereas the pPB method tended to underestimate it.
(2) For the over-specified CDMs, most of the permissible item parameter SEs and almost all of the permissible structural parameter SEs exhibited good performance in terms of the 95% CI coverage rates and bias. Under most of the simulation conditions, the impermissible model parameter SEs did not exhibit good performance in approximating the empirical SEs.
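The two evaluation criteria used in these results can be written out as follows. The abstract does not give the formulas, so this is a sketch of the usual definitions under assumed notation: $R$ is the number of simulation replications, $\widehat{SE}_r$ is the bootstrap SE of a parameter in replication $r$, $SE_{\text{emp}}$ is the empirical SE (the standard deviation of the point estimates $\hat{\theta}_r$ across replications), and $[L_r, U_r]$ is the 95% CI in replication $r$ for a parameter with true value $\theta$.

```latex
\mathrm{BIAS}\bigl(\widehat{SE}\bigr)
  = \frac{1}{R}\sum_{r=1}^{R}\widehat{SE}_r - SE_{\text{emp}},
\qquad
SE_{\text{emp}}
  = \sqrt{\frac{1}{R-1}\sum_{r=1}^{R}\bigl(\hat{\theta}_r - \bar{\theta}\bigr)^{2}},
\qquad
\bar{\theta} = \frac{1}{R}\sum_{r=1}^{R}\hat{\theta}_r,

\mathrm{Coverage}
  = \frac{1}{R}\sum_{r=1}^{R}\mathbb{1}\bigl\{L_r \le \theta \le U_r\bigr\}.
```

Under these definitions, a bias near zero and a coverage rate near 0.95 correspond to the "good performance" described above.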
To the best of our knowledge, this is the first study in which the performance of the bootstrap method in estimating model parameter SEs and CIs in CDMs is systematically investigated. The pNPB or pPB appears to be a useful tool for researchers interested in evaluating the uncertainty of the model parameter point estimates. As a time-saving computational strategy, the pNPB or pPB method is substantially faster than the usual bootstrap method. The simulation and real data studies showed that 3,000 re-samples might be adequate for the bootstrap method in calculating model parameter SEs and CIs in CDMs.

Key words: cognitive diagnostic model, standard error, confidence interval, bootstrap, parallel computing method

CLC number:

B841

LIU Yanlou (刘彦楼). (2022). Standard errors and confidence intervals for cognitive diagnostic models: Parallel bootstrap methods. Acta Psychologica Sinica (心理学报), 54(6), 703-724.

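Figures/Tables (20)

Figure 1. Q-matrix used in the simulation study

Figure 2. 95% CI coverage rates of the item parameters based on pNPB and pPB when the CDM is correctly specified

Figure 3. Bias of the SEs of the item parameters based on pNPB and pPB when the CDM is correctly specified

Figure 4. 95% CI coverage rates of the item parameters based on XPD, Obs, and Sw when the CDM is correctly specified

Figure 5. Bias of the SEs of the item parameters based on XPD, Obs, and Sw when the CDM is correctly specified

Figure 6. 95% CI coverage rates of the structural parameters based on pNPB and pPB when the CDM is correctly specified

Figure 7. Bias of the SEs of the structural parameters based on pNPB and pPB when the CDM is correctly specified

Figure 8. 95% CI coverage rates of the structural parameters based on XPD, Obs, and Sw when the CDM is correctly specified

Figure 9. Bias of the SEs of the structural parameters based on XPD, Obs, and Sw when the CDM is correctly specified

Figure 10. 95% CI coverage rates of the permissible item parameters based on pNPB and pPB when the CDM is over-specified

Figure 11. Bias of the SEs of the permissible item parameters based on pNPB and pPB when the CDM is over-specified

Figure 12. 95% CI coverage rates of the impermissible item parameters based on pNPB and pPB when the CDM is over-specified

Figure 13. Bias of the SEs of the impermissible item parameters based on pNPB and pPB when the CDM is over-specified

Figure 14. 95% CI coverage rates of the permissible structural parameters based on pNPB and pPB when the CDM is over-specified

Figure 15. Bias of the SEs of the permissible structural parameters based on pNPB and pPB when the CDM is over-specified

Figure 16. 95% CI coverage rates of the impermissible structural parameters based on pNPB and pPB when the CDM is over-specified

Figure 17. Bias of the SEs of the impermissible structural parameters based on pNPB and pPB when the CDM is over-specified

Figure 18. Q-matrix of the ECPE data set

Figure 19. All possible attribute mastery patterns in the ECPE data set and their corresponding structural parameter estimates

Table 1. SEs of the structural parameter estimates for the ECPE data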

Parameter no.	Analytic methods			pNPB				pPB
Parameter no.	XPD	Obs	Sw	200	500	3000	10000	200	500	3000	10000
1	0.017	0.018	0.023	0.021	0.022	0.022	0.021	0.015	0.015	0.015	0.015
2	0.003	-	0.010	0.008	0.008	0.008	0.008	0.003	0.003	0.003	0.003
3	0.013	0.014	0.017	0.013	0.014	0.013	0.013	0.010	0.010	0.011	0.011
4	0.017	0.020	0.027	0.027	0.026	0.026	0.026	0.016	0.016	0.015	0.015
5	0.006	0.006	0.007	0.007	0.007	0.008	0.008	0.005	0.005	0.005	0.005
6	0.008	0.007	0.016	0.010	0.010	0.010	0.011	0.008	0.008	0.008	0.008
7	0.018	0.020	0.027	0.023	0.023	0.024	0.024	0.018	0.018	0.017	0.017
