Two new online calibration methods for computerized adaptive testing

doi:10.3724/SP.J.1041.2016.01184

Abstract

The development of computerized adaptive testing (CAT) has raised many new issues and challenges. For example, because a test is administered continuously, new items must periodically be written, calibrated, and added to the item bank to replace flawed, obsolete, and overexposed items. The new items have to be calibrated precisely because calibration precision directly affects the accuracy of ability estimation. Online calibration is widely used to calibrate new items on the fly in CAT, since it offers several advantages over the traditional offline calibration approach. Method A (Stocking, 1988), the simplest and most straightforward online calibration method, has an obvious theoretical limitation: it treats the estimated abilities as true values and ignores the measurement error in ability estimation. To overcome this weakness, we combined Method A with, respectively, a full functional maximum likelihood estimator (FFMLE) and an estimator that makes use of the consequences of sufficiency (ECSE) (Stefanski & Carroll, 1985) to correct for the estimation error in ability; the resulting methods are referred to as FFMLE-Method A and ECSE-Method A. A simulation study was conducted to compare the two new methods with three other methods: the original Method A [denoted as Method A (Original)], the original Method A with the examinees' true abilities plugged in [Method A (True)], and the “multiple EM cycles” method (MEM). These five methods were evaluated in terms of item-parameter recovery and calibration efficiency under three sample sizes (1000, 2000, and 3000) and three CAT test lengths (10, 20, and 30), assuming the new items are randomly assigned to examinees. Under the two-parameter logistic model, the true abilities for the three groups of examinees were randomly drawn from the standard normal distribution, N(0, 1).
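The limitation of Method A described above can be illustrated with a minimal sketch. It assumes a single new item under the two-parameter logistic model and calibrates it by maximum likelihood with the abilities held fixed, which is the core idea of Method A. The function names, the true item parameters (a = 1.2, b = 0.3), and the use of additive normal noise with SD 0.5 as a crude stand-in for CAT ability-estimation error are illustrative assumptions, not details taken from the article; the FFMLE and ECSE corrections are not implemented here.

```python
import numpy as np

def simulate_2pl(theta, a, b, rng):
    """Draw 0/1 responses to one item under the 2PL model."""
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return (rng.random(theta.shape) < p).astype(float)

def method_a_calibrate(theta_fixed, responses, n_iter=50, tol=1e-8):
    """Estimate (a, b) for one item, treating theta_fixed as known.

    Newton-Raphson on the slope-intercept form logit = a*theta + d,
    with the difficulty recovered as b = -d / a.
    """
    X = np.column_stack([theta_fixed, np.ones_like(theta_fixed)])
    beta = np.array([1.0, 0.0])  # starting values: a = 1, d = 0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (responses - p)                 # score vector
        hess = X.T @ (X * (p * (1.0 - p))[:, None])  # information matrix
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    a_hat, d_hat = beta
    return a_hat, -d_hat / a_hat

rng = np.random.default_rng(0)
n = 2000
theta_true = rng.standard_normal(n)                    # abilities from N(0, 1)
theta_hat = theta_true + rng.normal(0.0, 0.5, size=n)  # assumed error SD of 0.5
y = simulate_2pl(theta_true, a=1.2, b=0.3, rng=rng)

a_true, b_true = method_a_calibrate(theta_true, y)  # analogue of Method A (True)
a_orig, b_orig = method_a_calibrate(theta_hat, y)   # analogue of Method A (Original)
print("true abilities :", a_true, b_true)
print("noisy abilities:", a_orig, b_orig)
```

In this toy setting, plugging in the noisy abilities tends to pull the discrimination estimate toward zero, which is the kind of bias the error-corrected variants are meant to reduce.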
For all conditions, 1000 operational items were simulated to constitute the CAT item bank, with the item parameter vectors randomly generated from a multivariate normal distribution MVN(u, S) following the procedures of Chen and Xin (2014). In addition, 20 new items were generated in the same way as the operational items, and the process of simulating and calibrating the new items was replicated 100 times. The maximum Fisher information method was used to select each subsequent item, and a combination of the EAP and MLE methods was used to estimate the examinees’ abilities. A fixed-length rule was used to terminate the CAT. The results showed that the two new approaches, FFMLE-Method A and ECSE-Method A, improved calibration precision over Method A (Original) in almost all conditions, and the improvement was largest when the test was short (e.g., 10 items). Furthermore, the performance of the two new methods was very close to that of the best-performing method, MEM, for short and medium test lengths (i.e., 10 and 20), whereas ECSE-Method A performed best among all methods when the test was longer (i.e., 30). Also, a larger sample size resulted in more precise item-parameter recovery for all online calibration methods. Although the simulation results are very promising, several directions for future research merit investigation, such as variable-length CAT and more complex CAT conditions (e.g., item exposure control, content balancing, and allowing item review).
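As a rough illustration of the maximum Fisher information selection rule mentioned above, the following sketch uses the standard 2PL item information function, I(θ) = a²P(θ)(1 − P(θ)), and picks the unadministered item with the largest information at the current ability estimate. The three-item bank and the function names are hypothetical and chosen only for illustration; the study's actual bank contained 1000 items.

```python
import numpy as np

def fisher_information_2pl(theta, a, b):
    """Item information under the 2PL model: a^2 * P * (1 - P)."""
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return a ** 2 * p * (1.0 - p)

def select_next_item(theta_hat, a, b, administered):
    """Return the index of the most informative item not yet administered."""
    info = fisher_information_2pl(theta_hat, a, b)
    available = np.ones(a.shape[0], dtype=bool)
    available[np.asarray(sorted(administered), dtype=int)] = False
    # Mask out administered items so they can never be chosen again.
    info = np.where(available, info, -np.inf)
    return int(np.argmax(info))

# Hypothetical three-item bank (discriminations a, difficulties b).
a_bank = np.array([1.0, 2.0, 1.5])
b_bank = np.array([0.0, 0.0, 2.0])

first = select_next_item(0.0, a_bank, b_bank, administered=set())
second = select_next_item(0.0, a_bank, b_bank, administered={first})
print(first, second)
```

At θ = 0 the highly discriminating item centred on the ability estimate is chosen first; once it is masked, the next most informative item is returned.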

Key words: full functional maximum likelihood estimator, computerized adaptive testing, item response theory, online calibration, construction of item bank

CHEN Ping. (2016). Two new online calibration methods for computerized adaptive testing. Acta Psychologica Sinica, 48(9), 1184-1198.

