ISSN 0439-755X
CN 11-1911/B

Acta Psychologica Sinica ›› 2016, Vol. 48 ›› Issue (9): 1184-1198.

### Two new online calibration methods for computerized adaptive testing

CHEN Ping

1. (Collaborative Innovation Center of Assessment toward Basic Education Quality, Beijing Normal University, Beijing 100875, China)
• Received: 2015-09-28 Published: 2016-09-25 Online: 2016-09-25
• Contact: CHEN Ping, E-mail: pchen@bnu.edu.cn

Abstract:

With the development of computerized adaptive testing (CAT), many new issues and challenges have arisen. For example, as a test is administered continuously, new items must be written, calibrated, and added to the item bank periodically to replace flawed, obsolete, and overexposed items. The new items have to be calibrated precisely, because calibration precision directly affects the accuracy of ability estimation. Online calibration has been widely used to calibrate new items on the fly in CAT, since it offers several advantages over the traditional offline calibration approach.

As the simplest and most straightforward online calibration method, Method A (Stocking, 1988) has an obvious theoretical limitation: it treats the estimated abilities as true values and ignores the measurement error in ability estimation. To overcome this weakness, we separately combined Method A with a full functional maximum likelihood estimator (FFMLE) and with an estimator that makes use of the consequences of sufficiency (ECSE) (Stefanski & Carroll, 1985) to correct for the estimation error of ability; the new methods are referred to as FFMLE-Method A and ECSE-Method A.

A simulation study compared the two new methods with three others: the original Method A [denoted Method A (Original)], the original Method A with the examinees' true abilities plugged in [Method A (True)], and the "multiple EM cycles" method (MEM). The five methods were evaluated in terms of item-parameter recovery and calibration efficiency under three sample sizes (1000, 2000, and 3000) and three CAT test lengths (10, 20, and 30), assuming the new items are randomly assigned to examinees. Under the two-parameter logistic model, the true abilities of the three groups of examinees were randomly drawn from the standard normal distribution N(0, 1).
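To make Method A's limitation concrete, the following is a minimal illustrative sketch (not the paper's implementation; the function names, the Nelder-Mead optimizer, and the toy data are our own choices). Under the 2PL model, Method A fixes each examinee's ability at its estimate and finds the new item's discrimination a and difficulty b by maximum likelihood; because the ability estimates are treated as error-free, the recovered parameters are biased when test lengths are short, which is exactly what FFMLE-Method A and ECSE-Method A correct for.

```python
import numpy as np
from scipy.optimize import minimize

def p2pl(theta, a, b):
    """2PL probability of a correct response: 1 / (1 + exp(-a (theta - b)))."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def method_a_calibrate(theta_hat, responses):
    """Method A sketch: treat the ability estimates theta_hat as known true
    values and estimate the new item's (a, b) by maximum likelihood."""
    def neg_loglik(params):
        a, b = params
        p = np.clip(p2pl(theta_hat, a, b), 1e-9, 1 - 1e-9)
        return -np.sum(responses * np.log(p) + (1 - responses) * np.log(1 - p))
    res = minimize(neg_loglik, x0=[1.0, 0.0], method="Nelder-Mead")
    return res.x  # (a_hat, b_hat)

# Toy check: calibrate one new item from 2000 simulated examinees.
# Here the true thetas stand in for theta_hat, i.e. the error-free best case.
rng = np.random.default_rng(0)
theta = rng.standard_normal(2000)  # abilities ~ N(0, 1), as in the study
a_true, b_true = 1.2, 0.5
y = rng.binomial(1, p2pl(theta, a_true, b_true))
a_hat, b_hat = method_a_calibrate(theta, y)
```

With true abilities plugged in [the Method A (True) benchmark], the estimates recover (a, b) well; feeding in noisy ability estimates from a short CAT instead is what degrades Method A (Original).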
For all conditions, 1000 operational items were simulated to constitute the CAT item bank, in which the item-parameter vectors were randomly generated from a multivariate normal distribution MVN(μ, Σ) following the procedures of Chen and Xin (2014). The process of simulating and calibrating new items was replicated 100 times; in each replication, 20 new items were generated with the same simulation method as the operational items. The maximum Fisher information (MFI) method was employed to select subsequent items, EAP estimation combined with MLE was used to estimate the examinees' abilities, and a fixed-length rule was used to terminate the CAT.

The results showed that the two new approaches, FFMLE-Method A and ECSE-Method A, improved calibration precision over Method A (Original) in almost all conditions, and the improvement was largest when the test length was small (e.g., 10). Furthermore, the performance of the two new methods was very close to that of the best-performing MEM for small and medium test lengths (10 and 20), whereas ECSE-Method A performed best among all methods when the test length was relatively long (30). Larger sample sizes also yielded more precise item-parameter recovery for all online calibration methods. Although the simulation results are very promising, several directions for future research merit investigation, such as variable-length CAT and more complex CAT conditions (e.g., item exposure control, content balancing, and allowing item review).
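The two CAT components named above can each be sketched in a few lines; this is an assumed minimal implementation (our own function names, a bank stored as arrays `a` and `b`, and simple rectangular quadrature), not the study's code. MFI selects the unadministered item with the largest Fisher information a²P(1 − P) at the current ability estimate, and EAP averages ability over the posterior under a standard-normal prior.

```python
import numpy as np

def p2pl(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def fisher_info(theta, a, b):
    """Fisher information of a 2PL item at ability theta: a^2 * P * (1 - P)."""
    p = p2pl(theta, a, b)
    return a**2 * p * (1 - p)

def select_mfi(theta_hat, a, b, administered):
    """MFI rule: pick the unused item that is most informative at theta_hat."""
    info = fisher_info(theta_hat, a, b)
    info[list(administered)] = -np.inf  # exclude already-administered items
    return int(np.argmax(info))

def eap_estimate(resp, a, b, n_quad=61):
    """EAP ability estimate with a standard-normal prior, via quadrature."""
    q = np.linspace(-4.0, 4.0, n_quad)
    prior = np.exp(-0.5 * q**2)
    p = p2pl(q[:, None], a[None, :], b[None, :])      # n_quad x n_items
    like = np.prod(np.where(resp, p, 1 - p), axis=1)  # likelihood at each node
    post = prior * like
    return float(np.sum(q * post) / np.sum(post))

# Toy bank: equal discriminations, so the most informative item is the one
# whose difficulty is closest to the current ability estimate.
a_bank = np.ones(5)
b_bank = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
next_item = select_mfi(0.9, a_bank, b_bank, administered=set())

# EAP after three responses (correct, correct, incorrect) on a mixed item set.
theta_hat = eap_estimate(np.array([1, 1, 0]),
                         np.array([1.0, 1.2, 0.8]),
                         np.array([-1.0, 0.0, 1.0]))
```

A fixed-length stopping rule then simply ends the test after a preset number of such select-and-estimate cycles (10, 20, or 30 in the study).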