Advances in Psychological Science, 2021, Vol. 29, Issue (9): 1696-1710. doi: 10.3724/SP.J.1042.2021.01696
• Research Method •
Received: 2020-10-23
Published: 2021-07-22
Contact: LIU Hongyun
E-mail: hyliu@bnu.edu.cn
LIU Yue, LIU Hongyun. Mixture Model Method: A new method to handle aberrant responses in psychological and educational testing[J]. Advances in Psychological Science, 2021, 29(9): 1696-1710.
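Method type | Specific method | Does not jointly use response-time and response information | Not based on a theoretical distribution | Occasional exceptions; cannot be applied in batch | Involves strong assumptions about aberrant responses | Sensitive to a high proportion of aberrant responses | Prone to problems when the proportion of aberrant responses is low | Computationally complex and time-consuming | Flagged responses are not necessarily aberrant | Only applicable when the probability of a correct aberrant response is known | Can only identify rapid aberrant responses |
---|---|---|---|---|---|---|---|---|---|---|---|
Response-time threshold methods | Uniform threshold method | × | × | × | |||||||
Threshold based on item features | × | × | × | ||||||||
Threshold at the intersection of the bimodal distribution | × | × | × | × | |||||||
Normative threshold method | × | × | |||||||||
Information-based threshold method | × | × | × | ||||||||
Conditional distribution method | × | × | × | × | |||||||
Response-time residual methods | Standardized response-time residual method | × | × | × | |||||||
Bayesian residual method | × | × | × | ||||||||
Mixture model methods | Grade of membership response time model | × | × | × | |||||||
Semi-parametric mixture model | × | × | × | × | × | ||||||
Response-time-based mixture response model | × | × | × | × | × | ||||||
Mixture hierarchical model based on response times and responses | × | × | × |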