Advances in Psychological Science, 2020, Vol. 28, Issue 9: 1462-1477. doi: 10.3724/SP.J.1042.2020.01462
ZHANG Longfei, WANG Xiaowen, CAI Yan, TU Dongbo
Received: 2019-10-12
Publication date: 2020-09-15
Release date: 2020-07-24
Corresponding author: TU Dongbo
E-mail: tudongbo@aliyun.com
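Abstract:
Change point analysis (CPA) has only recently been introduced into psychological and educational measurement. Compared with traditional methods, CPA can not only identify examinees with aberrant responses but also automatically and precisely locate the change point, which allows response data to be cleaned efficiently. The method determines whether a response sequence contains a point (the change point) that splits the sequence into two parts with different statistical properties, and it uses a person-fit statistic (PFS) to quantify the difference between the two subsequences. Future work could extend single-change-point analysis to multiple change points, incorporate additional information such as response times, construct nonparametric indices, and extend existing indices to polytomously scored or multidimensional tests, thereby broadening the applicability and improving the effectiveness of CPA.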
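To make the principle concrete, the following is a minimal sketch of a likelihood-ratio statistic in the spirit of the $L_{\max}$ index listed in Table 1, computed for a single examinee. It is an illustration under stated assumptions, not the authors' implementation: responses are scored 0/1, the Rasch (1PL) model holds with known item difficulties `b`, ability is estimated by maximum likelihood restricted to the assumed interval [-4, 4], and each candidate split must leave at least `min_seg` items on both sides. The names `mle_theta`, `lmax` and `min_seg` are hypothetical. For every candidate change point k, ability is estimated separately for items 1..k and k+1..n, and twice the log-likelihood gain over a single common ability is recorded; the largest value and its location are returned.

```python
import math
from typing import List, Sequence, Tuple

# Assumed bounds for ability estimates; they keep all-correct or all-wrong
# segments from producing infinite maximum likelihood estimates.
THETA_LO, THETA_HI = -4.0, 4.0


def rasch_p(theta: float, b: float) -> float:
    """P(correct) under the Rasch (1PL) model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def log_lik(theta: float, u: Sequence[int], b: Sequence[float]) -> float:
    """Bernoulli log-likelihood of 0/1 responses u at ability theta."""
    total = 0.0
    for ui, bi in zip(u, b):
        z = theta - bi
        # log P = -log(1 + e^{-z});  log(1 - P) = -log(1 + e^{z})
        total -= math.log1p(math.exp(-z)) if ui == 1 else math.log1p(math.exp(z))
    return total


def mle_theta(u: Sequence[int], b: Sequence[float], tol: float = 1e-8) -> float:
    """Bounded ML ability estimate found by bisection on the score function.

    Under the Rasch model the score sum(u - P) decreases strictly in theta,
    so its root is unique; patterns without an interior root hit a bound.
    """
    def score(t: float) -> float:
        return sum(ui - rasch_p(t, bi) for ui, bi in zip(u, b))

    lo, hi = THETA_LO, THETA_HI
    if score(lo) <= 0.0:
        return lo
    if score(hi) >= 0.0:
        return hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def lmax(u: List[int], b: List[float], min_seg: int = 3) -> Tuple[float, int]:
    """Return (maximum likelihood-ratio statistic, k), where the change is
    placed after item k (k = number of items before the change)."""
    n = len(u)
    if n != len(b) or n < 2 * min_seg:
        raise ValueError("need len(u) == len(b) >= 2 * min_seg")
    # Log-likelihood with one common ability for the whole sequence.
    full = log_lik(mle_theta(u, b), u, b)
    best_stat, best_k = -math.inf, -1
    for k in range(min_seg, n - min_seg + 1):
        u1, b1, u2, b2 = u[:k], b[:k], u[k:], b[k:]
        # Separate abilities before and after the candidate change point.
        split = log_lik(mle_theta(u1, b1), u1, b1) + log_lik(mle_theta(u2, b2), u2, b2)
        stat = 2.0 * (split - full)
        if stat > best_stat:
            best_stat, best_k = stat, k
    return best_stat, best_k
```

For example, `lmax([1] * 10 + [0] * 10, [0.0] * 20)` (ten correct answers followed by ten incorrect ones on items of difficulty 0) places the change point at k = 10. The statistic alone does not decide whether an examinee is flagged: a critical value is still needed, for instance one obtained by simulating responses from the no-change model, and that choice is left open here.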
ZHANG Longfei, WANG Xiaowen, CAI Yan, TU Dongbo. (2020). Change point analysis: A new method to detect aberrant responses in psychological and educational testing. Advances in Psychological Science, 28(9), 1462-1477.
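Table 1 Comprehensive comparison of CUSUM and CPA

Aspect | CUSUM | CPA |
---|---|---|
Main idea | Cumulatively sums, item by item in administration order, the residuals between observed and expected scores. | Finds a point that splits the sequence into two parts with different statistical properties. |
PFS | One-sided indices $C_{j}^{+}$ and $C_{j}^{-}$ and the two-sided index $C^{T}$, based on the average weighted item residual. | Two-sided indices: $L_{\max}$ based on the likelihood ratio test, $W_{\max}$ based on the Wald test, $S_{\max}$ based on the score test, and $R_{\max}$ based on weighted residuals, each also with a one-sided form. |
One- vs. two-sided indices | Use a one-sided index when the target effect is specified before detection; use a two-sided index when it is not specified or no particular effect is targeted. | Same as CUSUM. |
Advantages | Produces a plot that can be used for process monitoring. | Locates the change point automatically and precisely. |
Disadvantages | The change point must be located by visually inspecting the plot, so accuracy is lower. | Change points within the first or last few items of the sequence are hard to locate. |
Applicable settings | Model parameters before and after the change point are known. | Model parameters before and after the change point are unknown; $L_{\max}$, $W_{\max}$ and $S_{\max}$ suit high-stakes (educational) tests, while $R_{\max}$ suits low-stakes (psychological) tests. |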
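
The CUSUM column of Table 1 can be illustrated with the standard two-sided cumulative-sum recursion applied to item residuals. This sketch assumes a Rasch model with known ability `theta` and item difficulties `b` (matching the "parameters known" setting in the table) and assumes each item contributes its residual $u_j - P_j(\theta)$ divided by test length $n$; the upper path $C_j^{+}$ accumulates positive drift and the lower path $C_j^{-}$ negative drift. The two-sided index $C^{T}$ is not reproduced. The function names and the bounds passed to `flag_misfit` are hypothetical; in practice the bounds would be set by simulation.

```python
import math
from typing import List, Sequence, Tuple


def rasch_p(theta: float, b: float) -> float:
    """P(correct) under the Rasch (1PL) model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def cusum_paths(u: Sequence[int], b: Sequence[float],
                theta: float) -> Tuple[List[float], List[float]]:
    """Upper and lower CUSUM paths of item residuals, in administration order.

    Each item contributes T_j = (u_j - P_j(theta)) / n. The upper path is
    C+_j = max(0, C+_{j-1} + T_j), the lower path is
    C-_j = min(0, C-_{j-1} + T_j), and both start at 0.
    """
    n = len(u)
    c_plus: List[float] = []
    c_minus: List[float] = []
    cp = cm = 0.0
    for uj, bj in zip(u, b):
        t = (uj - rasch_p(theta, bj)) / n  # averaged residual for item j
        cp = max(0.0, cp + t)
        cm = min(0.0, cm + t)
        c_plus.append(cp)
        c_minus.append(cm)
    return c_plus, c_minus


def flag_misfit(c_plus: Sequence[float], c_minus: Sequence[float],
                upper: float, lower: float) -> bool:
    """Flag the examinee if either path crosses its (hypothetical) bound."""
    return max(c_plus) > upper or min(c_minus) < lower
```

With ten correct answers followed by ten incorrect ones on items of difficulty 0 at `theta = 0.0`, $C^{+}$ peaks at 0.25 after item 10 and $C^{-}$ falls to -0.25 by item 20. Reading the turning point off such a path is the manual step the table lists as the main drawback of CUSUM; CPA replaces it with an explicit maximization over candidate change points.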