Acta Psychologica Sinica
Two new online calibration methods for computerized adaptive testing
(Collaborative Innovation Center of Assessment toward Basic Education Quality, Beijing Normal University, Beijing 100875, China)

The spread of computerized adaptive testing (CAT) has raised new issues and challenges. For example, as a test is administered continuously, new items must be written, calibrated, and added to the item bank periodically to replace flawed, obsolete, and overexposed items. These new items have to be calibrated precisely, because calibration precision directly affects the accuracy of ability estimation. Online calibration has been widely used to calibrate new items on the fly in CAT, since it offers several advantages over traditional offline calibration. Method A (Stocking, 1988), the simplest and most straightforward online calibration method, has an obvious theoretical limitation: it treats the estimated abilities as true values and ignores the measurement error in ability estimation. To overcome this weakness, we combined each of two estimators with Method A to correct for the estimation error of ability: a full functional maximum likelihood estimator (FFMLE) and an estimator that makes use of the consequences of sufficiency (ECSE) (Stefanski & Carroll, 1985). The resulting methods are referred to as FFMLE-Method A and ECSE-Method A. A simulation study compared the two new methods with three others: the original Method A [denoted Method A (Original)], Method A with the examinees' true abilities plugged in [Method A (True)], and the "multiple EM cycles" method (MEM). The five methods were evaluated in terms of item-parameter recovery and calibration efficiency under three sample sizes (1000, 2000, and 3000) and three CAT test lengths (10, 20, and 30), with the new items randomly assigned to examinees. The study used the two-parameter logistic model, and the true abilities for the three groups of examinees were randomly drawn from the standard normal distribution [N(0, 1)].
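The limitation of Method A described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the two-parameter logistic model, a single new item with hypothetical parameters (a = 1.2, b = 0.3), and it mimics CAT measurement error by adding normal noise (SD 0.5, an arbitrary choice) to the true abilities. The function name `method_a_calibrate` is invented for illustration.

```python
import numpy as np

def method_a_calibrate(theta, responses, n_iter=50, tol=1e-8):
    """Maximum likelihood estimate of (a, b) for one 2PL item,
    treating the supplied abilities as known constants."""
    # Reparameterize the 2PL logit a*(theta - b) as a*theta + d, with d = -a*b,
    # so the problem becomes an ordinary logistic regression.
    X = np.column_stack([theta, np.ones_like(theta)])
    beta = np.array([1.0, 0.0])  # starting values for (a, d)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (responses - p)                  # score vector
        hess = (X * (p * (1.0 - p))[:, None]).T @ X   # information matrix
        step = np.linalg.solve(hess, grad)            # Newton-Raphson step
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    a, d = beta
    return a, -d / a

rng = np.random.default_rng(0)
n = 3000                       # largest sample size in the study
a_true, b_true = 1.2, 0.3      # hypothetical new-item parameters
theta_true = rng.normal(0.0, 1.0, size=n)
# Assumption: additive noise stands in for the error of CAT ability estimates.
theta_hat = theta_true + rng.normal(0.0, 0.5, size=n)
prob = 1.0 / (1.0 + np.exp(-a_true * (theta_true - b_true)))
responses = (rng.random(n) < prob).astype(float)

a_true_fit, b_true_fit = method_a_calibrate(theta_true, responses)  # "Method A (True)"
a_orig_fit, b_orig_fit = method_a_calibrate(theta_hat, responses)   # "Method A (Original)"
print("true abilities:     ", round(a_true_fit, 3), round(b_true_fit, 3))
print("estimated abilities:", round(a_orig_fit, 3), round(b_orig_fit, 3))
```

Under these assumptions, the noise in the plugged-in abilities tends to pull the discrimination estimate toward zero, which is the kind of bias that corrections such as FFMLE and ECSE are meant to reduce.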
For all conditions, 1000 operational items were simulated to constitute the CAT item bank, with item parameter vectors randomly generated from a multivariate normal distribution MVN(u, S) following the procedure of Chen and Xin (2014). Twenty new items were generated in the same way as the operational items, and the process of simulating and calibrating the new items was replicated 100 times. The maximum Fisher information method was used to select each successive item, the EAP method combined with the MLE method was used to estimate examinees' abilities, and a fixed-length rule was used to stop the CAT. The results showed that the two new approaches, FFMLE-Method A and ECSE-Method A, improved calibration precision over Method A (Original) in almost all conditions, and the improvement was largest when the test was short (e.g., 10 items). Furthermore, the two new methods performed very close to the best-performing MEM at short and medium test lengths (10 and 20), whereas ECSE-Method A performed best among all methods at the longest test length (30). In addition, larger sample sizes yielded more precise item-parameter recovery for all online calibration methods. Although the simulation results are very promising, several directions merit future investigation, such as variable-length CAT and more complex CAT conditions (e.g., item exposure control, content balancing, and item review).
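The CAT procedure described above (maximum Fisher information item selection with a fixed-length stopping rule) can be sketched roughly as follows. This is a simplified illustration and not the study's code. It uses EAP alone instead of the EAP and MLE combination, and it draws the item bank from independent lognormal and normal distributions as a stand-in, because the MVN(u, S) parameters of Chen and Xin (2014) are not given here. The names `run_cat`, `eap_estimate`, and `ability_rmse` are hypothetical.

```python
import numpy as np

def p_2pl(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def eap_estimate(resp, a, b, grid, prior):
    """Posterior mean of ability on a fixed quadrature grid."""
    p = p_2pl(grid[:, None], a[None, :], b[None, :])
    like = np.prod(np.where(resp[None, :] == 1, p, 1.0 - p), axis=1)
    post = like * prior
    return float(np.sum(grid * post) / np.sum(post))

def run_cat(theta_true, a_bank, b_bank, test_length, rng):
    """Fixed-length CAT: pick the unused item with maximum Fisher
    information at the current ability estimate, then update by EAP."""
    grid = np.linspace(-4.0, 4.0, 81)
    prior = np.exp(-0.5 * grid**2)            # N(0, 1) prior, unnormalized
    available = np.ones(a_bank.size, dtype=bool)
    used, resp = [], []
    theta_hat = 0.0                           # start at the prior mean
    for _ in range(test_length):
        p = p_2pl(theta_hat, a_bank, b_bank)
        # 2PL item information is a^2 * P * (1 - P); mask items already used.
        info = np.where(available, a_bank**2 * p * (1.0 - p), -np.inf)
        j = int(np.argmax(info))
        available[j] = False
        used.append(j)
        resp.append(int(rng.random() < p_2pl(theta_true, a_bank[j], b_bank[j])))
        theta_hat = eap_estimate(np.array(resp), a_bank[used], b_bank[used], grid, prior)
    return theta_hat, used

rng = np.random.default_rng(1)
# Assumption: stand-in bank of 1000 items, not the MVN(u, S) of the study.
a_bank = rng.lognormal(0.0, 0.3, size=1000)
b_bank = rng.normal(0.0, 1.0, size=1000)
thetas = rng.normal(0.0, 1.0, size=200)       # small sample to keep the run short

def ability_rmse(test_length):
    """RMSE of the final ability estimates for a given test length."""
    est = np.array([run_cat(t, a_bank, b_bank, test_length, rng)[0] for t in thetas])
    return float(np.sqrt(np.mean((est - thetas) ** 2)))

rmse_10 = ability_rmse(10)
rmse_30 = ability_rmse(30)
print("ability RMSE, 10 items:", round(rmse_10, 3))
print("ability RMSE, 30 items:", round(rmse_30, 3))
```

Shorter tests leave more error in the ability estimates, which is consistent with the finding that correcting for that error helps most at a test length of 10.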

Keywords: full functional maximum likelihood estimator; computerized adaptive testing; item response theory; online calibration; construction of item bank
Corresponding Authors: CHEN Ping, E-mail:    
Issue Date: 25 September 2016
Cite this article:   
CHEN Ping. Two new online calibration methods for computerized adaptive testing[J]. Acta Psychologica Sinica, doi: 10.3724/SP.J.1041.2016.01184
Copyright © Acta Psychologica Sinica