Please wait a minute...
心理学报
  论文 本期目录 | 过刊浏览 | 高级检索 |
两种新的计算机化自适应测验在线标定方法
陈平
(北京师范大学中国基础教育质量监测协同创新中心, 北京 100875)
Two new online calibration methods for computerized adaptive testing
CHEN Ping
(Collaborative Innovation Center of Assessment toward Basic Education Quality, Beijing Normal University, Beijing 100875, China)
全文: PDF(531 KB)   评审附件 (1 KB) 
输出: BibTeX | EndNote (RIS)      
摘要 

在线标定技术由于具有诸多优点而被广泛应用于计算机化自适应测验(CAT)的新题标定。Method A是想法最直接、算法最简单的CAT在线标定方法, 但它具有明显的理论缺陷——在标定过程中将能力估计值视为能力真值。将全功能极大似然估计方法(FFMLE)与“利用充分性结果”估计方法(ECSE)的误差校正思路融入Method A (新方法分别记为FFMLE-Method A和ECSE-Method A), 从理论上对能力估计误差进行校正, 进而克服Method A的标定缺陷。模拟研究的结果表明:(1)在大多数实验条件下, 两种新方法较Method A总体上可以改进标定精度, 且在测验长度为10的短测验上的改进幅度最大; (2)当CAT测验长度较短或中等(10或20题)时, 两种新方法的表现与性能最优的MEM已非常接近。当测验长度较长(30题)时, ECSE-Method A的总体表现最好、优于MEM; (3)样本量越大, 各种方法的标定精度越高。

服务
把本文推荐给朋友
加入引用管理器
E-mail Alert
RSS
作者相关文章
陈平
关键词 全功能极大似然估计 计算机化自适应测验 项目反应理论 在线标定 题库建设    
Abstract

With the development of computerized adaptive testing (CAT), many new issues and challenges have been raised. For example, as the test is continuously administered, some new items should be written, calibrated, and added to the item bank periodically to replace the flawed, obsolete, and overexposed items. The new items have to be precisely calibrated because the calibration precision will directly affect the accuracy of ability estimation. The technique of online calibration has been widely used to calibrate new items on-the-fly in CAT, since it offers several advantages over the traditional offline calibration approach. As the simplest and most straightforward online calibration method, Method A (Stocking, 1988) has an obvious theoretical limitation in that it treats the estimated abilities as true values and ignores the measurement errors in ability estimation. To overcome this weakness, we combined a full functional maximum likelihood estimator (FFMLE) and an estimator which made use of the consequences of sufficiency (ECSE) (Stefanski & Carroll, 1985) with Method A respectively to correct for the estimation error of ability, and the new methods are referred to as FFMLE-Method A and ECSE-Method A. A simulation study was conducted to compare the two new methods with three other methods: the original Method A [denoted as Method A (Original)], the original Method A which plugs in the true abilities of examinees [Method A (True)], and the “multiple EM cycles” method (MEM). These five methods were evaluated in terms of item-parameter recovery and calibration efficiency under three levels of sample sizes (1000, 2000 and 3000) and three levels of CAT test lengths (10, 20 and 30), assuming the new items are randomly assigned to examinees. Under the two-parameter logistic model, the true abilities for the three groups of examinees were randomly drawn from the standard normal distribution [N (0,1)]. For all conditions, 1000 operational items were simulated to constitute the CAT item bank in which the item parameter vector were randomly generated from a multivariate normal distribution MVN (u, S) following the procedures of Chen and Xin (2014). Furthermore, the process of simulating and calibrating new items were replicated 100 times, and 20 new items were generated and the simulation method was the same as that of the operational items. Maximum Fisher Information method was employed to select the following items, and EAP method combined with MLE method was used to estimate the examinees’ abilities. Fixed-length rule was utilized to stop the CAT test. The results showed that the two new approaches, FFMLE-Method A and ECSE-Method A, improved the calibration precision over the Method A (Original) in almost all conditions, and the magnitude of improvement reached maximum when the test length was small (e.g., 10). Furthermore, the performance of the two new methods was very close to that of the best-performing MEM for small and medium-sized test length (i.e., 10 and 20), whereas ECSE-Method A had the best performance among all methods when the test length was relatively longer (i.e., 30). Also, larger sample size resulted in more precise item-parameter recovery for all online calibration methods. Though the simulation results are very promising, several future directions for research, such as variable-length CAT and more complex CAT conditions, merit investigation (e.g., including item exposure control, content balancing and allowing item review, etc.).

Key wordsfull functional maximum likelihood estimator    computerized adaptive testing    item response theory    online calibration    construction of item bank
收稿日期: 2015-09-28      出版日期: 2016-09-25
基金资助:

国家自然科学基金青年基金项目(31300862)、高等学校博士学科点专项科研基金项目新教师类(20130003120002)和东北师范大学应用统计教育部重点实验室开放课题 (KLAS 130028614) 资助。

通讯作者: 陈平, E-mail: pchen@bnu.edu.cn    
引用本文:   
陈平. 两种新的计算机化自适应测验在线标定方法[J]. 心理学报, 10.3724/SP.J.1041.2016.01184.
CHEN Ping. Two new online calibration methods for computerized adaptive testing. Acta Psychologica Sinica, 2016, 48(9): 1184-1198.
链接本文:  
http://journal.psych.ac.cn/xlxb/CN/10.3724/SP.J.1041.2016.01184      或      http://journal.psych.ac.cn/xlxb/CN/Y2016/V48/I9/1184
[1] 孟祥斌;陶剑;陈莎莉. 四参数Logistic模型潜在特质参数的 Warm加权极大似然估计[J]. 心理学报, 2016, 48(8): 1047-1056.
[2] 郭磊; 郑蝉金; 边玉芳; 宋乃庆; 夏凌翔. 认知诊断计算机化自适应测验中新的选题策略:结合项目区分度指标[J]. 心理学报, 2016, 48(7): 903-914.
[3] 汪文义; 宋丽红;丁树良. 复杂决策规则下MIRT的分类准确性和分类一致性[J]. 心理学报, 2016, 48(12): 1612-1624.
[4] 詹沛达;陈平;边玉芳. 使用验证性补偿多维IRT模型进行认知诊断评估[J]. 心理学报, 2016, 48(10): 1347-1356.
[5] 林喆;陈平;辛涛. 允许CAT题目检查的区块题目袋方法[J]. 心理学报, 2015, 47(9): 1188-1198.
[6] 詹沛达;李晓敏;王文中;边玉芳;王立君. 多维题组效应认知诊断模型[J]. 心理学报, 2015, 47(5): 689-701.
[7] 罗照盛;喻晓锋;高椿雷;李喻骏;彭亚风;王 睿;王钰彤. 基于属性掌握概率的认知诊断计算机化自适应测验选题策略[J]. 心理学报, 2015, 47(5): 679-688.
[8] 郭磊;郑蝉金;边玉芳. 变长CD-CAT中的曝光控制与终止规则[J]. 心理学报, 2015, 47(1): 129-140.
[9] 郭磊;王卓然;王丰;边玉芳. 结合a分层的兼具项目曝光和广义测验重叠率控制的选题策略[J]. 心理学报, 2014, 46(5): 702-713.
[10] 毛秀珍;辛涛. 认知诊断CAT中具有非统计约束选题方法的比较[J]. 心理学报, 2014, 46(12): 1910-1922.
[11] 姚若松;赵葆楠;刘泽;苗群鹰. 无领导小组讨论的多侧面Rasch模型应用[J]. 心理学报, 2013, 45(9): 1039-1049.
[12] 毛秀珍;辛涛. 认知诊断CAT中项目曝光控制方法的比较[J]. 心理学报, 2013, 45(6): 694-703.
[13] 杜文久;周娟;李洪波. 二参数逻辑斯蒂模型项目参数的估计精度[J]. 心理学报, 2013, 45(10): 1179-1186.
[14] 刘红云,李冲,张平平,骆方. 分类数据测量等价性检验方法及其比较:项目阈值(难度)参数的组间差异性检验[J]. 心理学报, 2012, 44(8): 1124-1136.
[15] 罗芬,丁树良,王晓庆. 多级评分计算机化自适应测验动态综合选题策略[J]. , 2012, 44(3): 400-412.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
版权所有 © 《心理学报》编辑部
本系统由北京玛格泰克科技发展有限公司设计开发  技术支持:support@magtech.com.cn