Advances in Psychological Science ›› 2021, Vol. 29 ›› Issue (9): 1696-1710. doi: 10.3724/SP.J.1042.2021.01696
• Research Methods •
Received: 2020-10-23
Published: 2021-07-22
Corresponding author: LIU Hongyun
E-mail: hyliu@bnu.edu.cn
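Abstract:
The mixture model method is a recently proposed approach to handling aberrant responses in psychological and educational testing. Compared with traditional methods such as the response time threshold method and the response time residual method, the mixture model method identifies aberrant responses and estimates model parameters simultaneously, and it still performs well when the data are heavily contaminated. The method works as follows: based on the characteristics of normal and aberrant responses, different models are specified for the response part and/or the response time part, one for each class of a categorical latent variable (i.e., a classification at the response level). This makes it possible to estimate the categorical latent variable together with the other item and person parameters in the model. This article describes in detail the mixture model methods proposed to date and compares them with the traditional methods. Future research could examine the robustness and applicability of mixture model methods when model assumptions are violated or when several types of aberrant responses are present, and could make the methods more efficient to use by fixing some item parameters or adding a model selection procedure.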
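The core idea in the abstract, a response-level latent class with a different response and response time model in each class, can be illustrated with a deliberately simplified sketch. It is not any of the specific models reviewed in the article: it ignores item and person parameters, assumes log response times are normal within each class, and fixes the probability of a correct aberrant response at `guess_p = 0.25` (a hypothetical four-option multiple-choice setting). The function names `fit_response_mixture` and `simulate` are invented for this illustration.

```python
import numpy as np


def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) evaluated at x."""
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))


def fit_response_mixture(log_rt, correct, guess_p=0.25, n_iter=200):
    """EM for a toy two-class mixture at the response level.

    Class 0 (aberrant): log RT ~ N(mu[0], sd[0]^2), P(correct) fixed at guess_p.
    Class 1 (normal):   log RT ~ N(mu[1], sd[1]^2), P(correct) = p_normal (estimated).
    """
    log_rt = np.asarray(log_rt, dtype=float)
    correct = np.asarray(correct, dtype=float)

    # Starting values (an assumption): responses faster than the median seed class 0.
    fast = log_rt < np.median(log_rt)
    mu = np.array([log_rt[fast].mean(), log_rt[~fast].mean()])
    sd = np.full(2, log_rt.std())
    p_normal = 0.7   # P(correct | normal class)
    pi_normal = 0.5  # mixing proportion of the normal class

    for _ in range(n_iter):
        # E-step: posterior probability that each response is normal.
        lik_aberrant = ((1.0 - pi_normal) * normal_pdf(log_rt, mu[0], sd[0])
                        * np.where(correct == 1, guess_p, 1.0 - guess_p))
        lik_normal = (pi_normal * normal_pdf(log_rt, mu[1], sd[1])
                      * np.where(correct == 1, p_normal, 1.0 - p_normal))
        post_normal = lik_normal / (lik_aberrant + lik_normal)

        # M-step: posterior-weighted updates of all free parameters.
        weights = np.stack([1.0 - post_normal, post_normal])
        totals = weights.sum(axis=1)
        mu = (weights * log_rt).sum(axis=1) / totals
        var = (weights * (log_rt - mu[:, None]) ** 2).sum(axis=1) / totals
        sd = np.sqrt(np.maximum(var, 1e-6))  # floor guards against collapse
        p_normal = (post_normal * correct).sum() / post_normal.sum()
        pi_normal = post_normal.mean()

    return {"mu": mu, "sd": sd, "p_normal": p_normal,
            "pi_normal": pi_normal, "post_normal": post_normal}


def simulate(n=2000, prop_normal=0.8, seed=0):
    """Simulate well-separated normal and rapid-guessing responses (toy data)."""
    rng = np.random.default_rng(seed)
    is_normal = rng.random(n) < prop_normal
    log_rt = np.where(is_normal, rng.normal(4.0, 0.4, n), rng.normal(1.0, 0.4, n))
    correct = np.where(is_normal, rng.random(n) < 0.75, rng.random(n) < 0.25)
    return log_rt, correct.astype(int), is_normal


if __name__ == "__main__":
    log_rt, correct, is_normal = simulate()
    fit = fit_response_mixture(log_rt, correct)
    flagged_aberrant = fit["post_normal"] < 0.5
    print("estimated proportion of normal responses:", round(fit["pi_normal"], 3))
    print("responses flagged as aberrant:", int(flagged_aberrant.sum()))
```

The sketch shows the property the abstract emphasizes: class membership (`post_normal`) and the model parameters are estimated in the same procedure, rather than flagging responses first and fitting a model afterwards. Because the aberrant class in this toy version is tied to fast responses with a known success probability, it shares two of the limitations listed in Table 1.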
LIU Yue, LIU Hongyun. (2021). Mixture Model Method: A new method to handle aberrant responses in psychological and educational testing. Advances in Psychological Science, 29(9), 1696-1710.
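Method type | Specific method | Does not jointly use response time and response information | Not based on a theoretical distribution | Occasional exceptions; cannot be applied in batch | Involves strong assumptions about aberrant responses | Sensitive to a high proportion of aberrant responses | Prone to problems when the proportion of aberrant responses is low | Computationally complex and time-consuming | Flagged responses are not necessarily aberrant | Applicable only when the probability of a correct aberrant response is known | Can only identify rapid aberrant responses |
---|---|---|---|---|---|---|---|---|---|---|---|
Response time threshold methods | Common threshold method | × | × | × | |||||||
Item-feature-based threshold method | × | × | × | ||||||||
Bimodal distribution intersection threshold method | × | × | × | × | |||||||
Normative threshold method | × | × | |||||||||
Information-based threshold method | × | × | × | ||||||||
Conditional distribution method | × | × | × | × | |||||||
Response time residual methods | Standardized response time residual method | × | × | × | |||||||
Bayesian residual method | × | × | × | ||||||||
Mixture model methods | Grade of membership response time model | × | × | × | |||||||
Semi-parametric mixture model | × | × | × | × | × | ||||||
Response-time-based response mixture model | × | × | × | × | × | ||||||
Mixture hierarchical model for response times and responses | × | × | × |
Table 1 Summary of the main limitations of all methods discussed in this article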