ISSN 0439-755X
CN 11-1911/B

心理学报 ›› 2016, Vol. 48 ›› Issue (9): 1184-1198.doi: 10.3724/SP.J.1041.2016.01184

• 论文 • 上一篇    



  1. (北京师范大学中国基础教育质量监测协同创新中心, 北京 100875)
  • 收稿日期:2015-09-28 发布日期:2016-09-25 出版日期:2016-09-25
  • 通讯作者: 陈平, E-mail:
  • 基金资助:

    国家自然科学基金青年基金项目(31300862)、高等学校博士学科点专项科研基金项目新教师类(20130003120002)和东北师范大学应用统计教育部重点实验室开放课题 (KLAS 130028614) 资助。

Two new online calibration methods for computerized adaptive testing

CHEN Ping   

  1. (Collaborative Innovation Center of Assessment toward Basic Education Quality, Beijing Normal University, Beijing 100875, China)
  • Received:2015-09-28 Online:2016-09-25 Published:2016-09-25
  • Contact: CHEN Ping, E-mail:


在线标定技术由于具有诸多优点而被广泛应用于计算机化自适应测验(CAT)的新题标定。Method A是想法最直接、算法最简单的CAT在线标定方法, 但它具有明显的理论缺陷——在标定过程中将能力估计值视为能力真值。将全功能极大似然估计方法(FFMLE)与“利用充分性结果”估计方法(ECSE)的误差校正思路融入Method A (新方法分别记为FFMLE-Method A和ECSE-Method A), 从理论上对能力估计误差进行校正, 进而克服Method A的标定缺陷。模拟研究的结果表明:(1)在大多数实验条件下, 两种新方法较Method A总体上可以改进标定精度, 且在测验长度为10的短测验上的改进幅度最大; (2)当CAT测验长度较短或中等(10或20题)时, 两种新方法的表现与性能最优的MEM已非常接近。当测验长度较长(30题)时, ECSE-Method A的总体表现最好、优于MEM; (3)样本量越大, 各种方法的标定精度越高。

关键词: 全功能极大似然估计, 计算机化自适应测验, 项目反应理论, 在线标定, 题库建设


With the development of computerized adaptive testing (CAT), many new issues and challenges have been raised. For example, as the test is continuously administered, some new items should be written, calibrated, and added to the item bank periodically to replace the flawed, obsolete, and overexposed items. The new items have to be precisely calibrated because the calibration precision will directly affect the accuracy of ability estimation. The technique of online calibration has been widely used to calibrate new items on-the-fly in CAT, since it offers several advantages over the traditional offline calibration approach. As the simplest and most straightforward online calibration method, Method A (Stocking, 1988) has an obvious theoretical limitation in that it treats the estimated abilities as true values and ignores the measurement errors in ability estimation. To overcome this weakness, we combined a full functional maximum likelihood estimator (FFMLE) and an estimator which made use of the consequences of sufficiency (ECSE) (Stefanski & Carroll, 1985) with Method A respectively to correct for the estimation error of ability, and the new methods are referred to as FFMLE-Method A and ECSE-Method A. A simulation study was conducted to compare the two new methods with three other methods: the original Method A [denoted as Method A (Original)], the original Method A which plugs in the true abilities of examinees [Method A (True)], and the “multiple EM cycles” method (MEM). These five methods were evaluated in terms of item-parameter recovery and calibration efficiency under three levels of sample sizes (1000, 2000 and 3000) and three levels of CAT test lengths (10, 20 and 30), assuming the new items are randomly assigned to examinees. Under the two-parameter logistic model, the true abilities for the three groups of examinees were randomly drawn from the standard normal distribution [N (0,1)]. For all conditions, 1000 operational items were simulated to constitute the CAT item bank in which the item parameter vector were randomly generated from a multivariate normal distribution MVN (u, S) following the procedures of Chen and Xin (2014). Furthermore, the process of simulating and calibrating new items were replicated 100 times, and 20 new items were generated and the simulation method was the same as that of the operational items. Maximum Fisher Information method was employed to select the following items, and EAP method combined with MLE method was used to estimate the examinees’ abilities. Fixed-length rule was utilized to stop the CAT test. The results showed that the two new approaches, FFMLE-Method A and ECSE-Method A, improved the calibration precision over the Method A (Original) in almost all conditions, and the magnitude of improvement reached maximum when the test length was small (e.g., 10). Furthermore, the performance of the two new methods was very close to that of the best-performing MEM for small and medium-sized test length (i.e., 10 and 20), whereas ECSE-Method A had the best performance among all methods when the test length was relatively longer (i.e., 30). Also, larger sample size resulted in more precise item-parameter recovery for all online calibration methods. Though the simulation results are very promising, several future directions for research, such as variable-length CAT and more complex CAT conditions, merit investigation (e.g., including item exposure control, content balancing and allowing item review, etc.).

Key words: full functional maximum likelihood estimator, computerized adaptive testing, item response theory, online calibration, construction of item bank