Acta Psychologica Sinica, 2022, Vol. 54, Issue (6): 703-724. doi: 10.3724/SP.J.1041.2022.00703
• Research Report •
Received: 2021-10-14
Online: 2022-04-26
Published: 2022-06-25
Contact: LIU Yanlou, E-mail: liuyanlou@163.com
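Abstract:
The standard errors (SEs; or the variance-covariance matrix) and confidence intervals (CIs) of cognitive diagnostic models have important theoretical and practical value for quantifying the uncertainty of model parameter estimates, testing differential item functioning, comparing models at the item level, validating the Q-matrix, and exploring attribute hierarchies. This study proposes two new methods for computing SEs and CIs: the parallel parametric bootstrap (pPB) and the parallel nonparametric bootstrap (pNPB). Simulation studies showed that when the model was correctly specified, both methods performed well in computing the SEs and CIs of model parameters under high- and medium-quality item conditions. When the model contained redundant parameters, the SEs and CIs of most permissible model parameters still performed well under high- and medium-quality item conditions. An empirical data analysis illustrates the value of the new methods and the gain in computational efficiency.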
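The two resampling schemes named in the abstract can be illustrated with a minimal sketch. It is not the article's implementation: the estimator is a stand-in (the proportion correct on one binary item rather than a fitted cognitive diagnostic model), the function names `estimate`, `one_replicate`, and `parallel_bootstrap` are hypothetical, and a percentile interval is assumed as the CI. Replicates are dispatched through a thread pool only to keep the example portable; a process pool or cluster backend would be the more natural choice when each replicate refits a model.

```python
import random
import statistics
from concurrent.futures import ThreadPoolExecutor


def estimate(responses):
    """Stand-in estimator (assumption): proportion of correct responses."""
    return sum(responses) / len(responses)


def one_replicate(args):
    """Run one bootstrap replicate with its own seeded generator."""
    responses, p_hat, parametric, seed = args
    rng = random.Random(seed)
    n = len(responses)
    if parametric:
        # Parametric bootstrap: simulate new data from the fitted model.
        sample = [1 if rng.random() < p_hat else 0 for _ in range(n)]
    else:
        # Nonparametric bootstrap: resample observed cases with replacement.
        sample = rng.choices(responses, k=n)
    return estimate(sample)


def parallel_bootstrap(responses, n_boot=500, parametric=False,
                       workers=4, seed=0, level=0.95):
    """Return (estimate, bootstrap SE, percentile CI)."""
    p_hat = estimate(responses)
    tasks = [(responses, p_hat, parametric, seed + b) for b in range(n_boot)]
    # Replicates are independent, so they can be farmed out to workers.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        reps = sorted(pool.map(one_replicate, tasks))
    se = statistics.stdev(reps)  # SE = SD of the bootstrap estimates
    alpha = 1 - level
    lo = reps[int((alpha / 2) * (n_boot - 1))]
    hi = reps[int((1 - alpha / 2) * (n_boot - 1))]
    return p_hat, se, (lo, hi)


if __name__ == "__main__":
    data = [1] * 70 + [0] * 30  # made-up responses, for illustration only
    print(parallel_bootstrap(data, parametric=False))
    print(parallel_bootstrap(data, parametric=True))
```

Giving each replicate its own seed keeps the results reproducible regardless of how the workers are scheduled, which is one common way to make a parallel bootstrap deterministic.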
LIU Yanlou. (2022). Standard errors and confidence intervals for cognitive diagnostic models: Parallel bootstrap methods. Acta Psychologica Sinica, 54(6), 703-724.
Parameter no. | Analytic XPD | Analytic Obs | Analytic Sw | pNPB 200 | pNPB 500 | pNPB 3000 | pNPB 10000 | pPB 200 | pPB 500 | pPB 3000 | pPB 10000 |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | 0.017 | 0.018 | 0.023 | 0.021 | 0.022 | 0.022 | 0.021 | 0.015 | 0.015 | 0.015 | 0.015 |
2 | 0.003 | - | 0.010 | 0.008 | 0.008 | 0.008 | 0.008 | 0.003 | 0.003 | 0.003 | 0.003 |
3 | 0.013 | 0.014 | 0.017 | 0.013 | 0.014 | 0.013 | 0.013 | 0.010 | 0.010 | 0.011 | 0.011 |
4 | 0.017 | 0.020 | 0.027 | 0.027 | 0.026 | 0.026 | 0.026 | 0.016 | 0.016 | 0.015 | 0.015 |
5 | 0.006 | 0.006 | 0.007 | 0.007 | 0.007 | 0.008 | 0.008 | 0.005 | 0.005 | 0.005 | 0.005 |
6 | 0.008 | 0.007 | 0.016 | 0.010 | 0.010 | 0.010 | 0.011 | 0.008 | 0.008 | 0.008 | 0.008 |
7 | 0.018 | 0.020 | 0.027 | 0.023 | 0.023 | 0.024 | 0.024 | 0.018 | 0.018 | 0.017 | 0.017 |
Table 1. SEs of the structural parameter estimates for the ECPE data