Detection of aberrant response patterns using a residual-based statistic in testing with polytomous items

doi:10.3724/SP.J.1041.2022.01122

Abstract

Tests are widely used in educational measurement and psychometrics, and examinees' aberrant responses distort the estimation of their abilities. Examinees with aberrant responses should not be handled with conventional methods; what matters is to screen them out of the normal group accurately. A common approach is to construct person-fit statistics that detect whether a response pattern fits the examinee's estimated ability.
In this study, a residual-based person-fit statistic, R, is proposed that can be applied to both dichotomous and polytomous IRT models. R is built from a weighted residual between the observed response and the expected response. Accumulating the weighted residuals gives a goodness-of-fit value, which is compared with a critical value to decide whether an examinee is aberrant. Because tests with polytomous items can provide more information, polytomously scored items are increasingly popular in educational measurement and psychometrics. This article therefore focuses on the ability of the R statistic to detect aberrant response patterns under the graded response model.
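To make the idea of accumulating weighted residuals concrete, here is a minimal Python sketch under the graded response model. The abstract does not give the exact weighting used in R, so the inverse-variance weight below is an assumption chosen for illustration, and the function names and item parameters are hypothetical. It should not be read as the authors' definition of R.

```python
import math

def grm_category_probs(theta, a, thresholds):
    """Category probabilities P(X = k), k = 0..m, under the graded response model.

    `thresholds` must be increasing; `a` is the item discrimination.
    """
    # Cumulative probabilities P(X >= k), with P(X >= 0) = 1 and P(X >= m + 1) = 0.
    cum = [1.0]
    cum += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cum += [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(thresholds) + 1)]

def weighted_residual_statistic(responses, theta, items):
    """Accumulate squared residuals, each weighted by the inverse model variance.

    ASSUMPTION: the inverse-variance weight is illustrative only; the source
    abstract does not state the weighting used in the R statistic.
    """
    total = 0.0
    for x, (a, thresholds) in zip(responses, items):
        probs = grm_category_probs(theta, a, thresholds)
        expected = sum(k * p for k, p in enumerate(probs))
        variance = sum((k - expected) ** 2 * p for k, p in enumerate(probs))
        total += (x - expected) ** 2 / variance
    return total

# Hypothetical item parameters: (discrimination, increasing thresholds).
items = [
    (1.2, [-1.0, 0.0, 1.0]),
    (0.8, [-0.5, 0.5, 1.5]),
    (1.5, [-1.5, -0.5, 0.5]),
]
theta_hat = -1.0              # estimated ability of a low-ability examinee
consistent = [1, 0, 1]        # scores close to what the model expects
suspicious = [3, 3, 3]        # top scores on every item, e.g. after cheating

stat_consistent = weighted_residual_statistic(consistent, theta_hat, items)
stat_suspicious = weighted_residual_statistic(suspicious, theta_hat, items)
```

In this toy case the response pattern that sits far from the model's expected scores accumulates the larger value, which is the behavior a residual-based person-fit statistic relies on.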
An existing polytomous person-fit statistic, l_zp, was also included for comparison because of its standardized form and superior power. In the first study, a simulation was conducted to generate the empirical distributions of the R statistic and l_zp. The R statistic, being an accumulation of weighted residuals, has a positively skewed distribution; l_zp has a negatively skewed distribution when the test has fewer than 80 items. Both differ from the standard normal distribution, so a critical value must be set according to the Type I error rate and used to decide whether each respondent's response pattern fits. In the second study, examinees with different aberrant behaviors (e.g., cheaters, lucky guessers, random respondents, careless respondents, creative respondents, and mixed) were simulated under different test lengths, and the detection rate and the area under the curve (AUC) were used to compare the effectiveness of the two person-fit statistics. The results show that the R statistic has a better detection rate than l_zp when the aberrant behavior affects only a few items or when the aberrant behavior is cheating or guessing. When the aberrant behavior covers many items, l_zp is slightly better than the R statistic. Finally, an empirical study was conducted to show the power of the R statistic.
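The role of an empirical critical value, the detection rate, and the AUC can be sketched in a few lines of Python. This is a generic illustration and not the authors' simulation code: the function names are hypothetical, the numbers are toy values, and the sketch assumes an upper-tailed statistic where large values indicate misfit. For a statistic where small values indicate misfit, the lower tail would be used instead.

```python
import math

def empirical_critical_value(null_values, alpha=0.05):
    """Upper-tail cutoff taken from statistic values simulated for normal examinees.

    At most a fraction `alpha` of the simulated null values lies strictly above it.
    """
    ordered = sorted(null_values)
    index = math.ceil((1.0 - alpha) * len(ordered)) - 1
    return ordered[max(index, 0)]

def detection_rate(values, critical_value):
    """Proportion of examinees whose statistic exceeds the critical value."""
    return sum(1 for v in values if v > critical_value) / len(values)

def auc(normal_values, aberrant_values):
    """Probability that a random aberrant examinee scores higher than a random
    normal examinee on the statistic (ties count one half)."""
    wins = 0.0
    for a in aberrant_values:
        for n in normal_values:
            if a > n:
                wins += 1.0
            elif a == n:
                wins += 0.5
    return wins / (len(aberrant_values) * len(normal_values))

# Toy values standing in for simulated statistics (not real results).
null_stats = [float(v) for v in range(1, 101)]    # normal examinees
aberrant_stats = [1.0, 200.0, 300.0, 2.0]          # aberrant examinees

cutoff = empirical_critical_value(null_stats, alpha=0.05)
type1_rate = detection_rate(null_stats, cutoff)    # false alarms among normals
power = detection_rate(aberrant_stats, cutoff)     # flagged aberrant examinees
separation = auc(null_stats, aberrant_stats)
```

Because the cutoff comes from the simulated null distribution rather than from a standard normal table, the false-alarm rate stays at or below the chosen Type I error rate even when the statistic is skewed.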
Both the R statistic and l_zp have their own strengths and weaknesses, so they may be combined in future person-fit studies. The R statistic has a better detection rate than l_zp under certain conditions, especially when cheating and lucky guessing occur. Considering that cheating and guessing by low-ability examinees are of particular concern among the many aberrant test behaviors, the R statistic is worthy of further research and exploration in real-world applications.

Key words: appropriateness measurement, item response theory, residual-based person-fit statistic, aberrant detection, polytomous item response models

CLC Number: B841

TONG Hao, YU Xiaofeng, QIN Chunying, PENG Yafeng, ZHONG Xiaoyuan. (2022). Detection of aberrant response patterns using a residual-based statistic in testing with polytomous items. Acta Psychologica Sinica, 54(9), 1122-1136.

