Dynamic and Comprehensive Item Selection Strategies for Computerized Adaptive Testing Based on Graded Response Model

Acta Psychologica Sinica, 2012, Vol. 44, Issue 3: 400-412.

LUO Fen; DING Shu-Liang; WANG Xiao-Qing

(School of Computer and Information Engineering, Jiangxi Normal University, Nanchang 330022, China)

Received: 2010-10-21; Online: 2012-03-28; Published: 2012-03-28
Corresponding author: DING Shu-Liang

Abstract

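Polytomous scoring provides more information about examinees and is one direction in which computerized adaptive testing (CAT) is developing, and the item selection strategy is a central topic of CAT research. For the graded response model, this paper uses the idea of interval estimation to improve several recently proposed item selection strategies, and extends the dichotomous b-STR and a-STR to polytomous scoring in order to improve the maximum information item selection strategy. Monte Carlo simulations show that, while reaching or approaching the measurement precision of the original strategies, some of the proposed strategies effectively shorten the test and others greatly reduce item exposure rates.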
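
The maximum information criterion that the new strategies set out to improve can be sketched for the graded response model as follows. This is a minimal sketch, not the authors' code: it assumes Samejima's logistic GRM with the scaling constant D set to 1, items stored as hypothetical `(a, [b_1, ..., b_m])` tuples, and item information computed as the sum over score categories of (dP_k/dθ)² / P_k. The function names are illustrative only.

```python
import math
from typing import Sequence

# An item is a hypothetical (a, [b_1, ..., b_m]) tuple: discrimination a and m
# ordered step parameters, so the scores run from 0 to m.
Item = tuple[float, list[float]]


def _cumulative(theta: float, a: float, b: Sequence[float]) -> list[float]:
    """P*_0..P*_{m+1}, where P*_k = P(X >= k), P*_0 = 1 and P*_{m+1} = 0 (logistic, D = 1)."""
    return [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - bk))) for bk in b] + [0.0]


def grm_category_probs(theta: float, a: float, b: Sequence[float]) -> list[float]:
    """Category probabilities P_k = P*_k - P*_{k+1} for k = 0..m."""
    cum = _cumulative(theta, a, b)
    return [cum[k] - cum[k + 1] for k in range(len(b) + 1)]


def grm_item_info(theta: float, a: float, b: Sequence[float]) -> float:
    """Item Fisher information: sum over categories of (dP_k/dtheta)^2 / P_k."""
    cum = _cumulative(theta, a, b)
    # dP*_k/dtheta = a * P*_k * (1 - P*_k); this is 0 for the fixed endpoints 1 and 0.
    dcum = [a * p * (1.0 - p) for p in cum]
    info = 0.0
    for k in range(len(b) + 1):
        pk = cum[k] - cum[k + 1]
        if pk > 1e-12:  # skip categories whose probability underflows
            dpk = dcum[k] - dcum[k + 1]
            info += dpk * dpk / pk
    return info


def select_mfi(theta_hat: float, pool: list[Item], administered: set[int]) -> int:
    """Maximum Fisher information: the unused item most informative at theta_hat."""
    candidates = [j for j in range(len(pool)) if j not in administered]
    return max(candidates, key=lambda j: grm_item_info(theta_hat, *pool[j]))
```

With a single step parameter the information reduces to the familiar a²P(1 − P) of the dichotomous two-parameter logistic model, which makes a quick sanity check.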

Item selection strategy (ISS) is a core component of computerized adaptive testing (CAT). Polytomous items provide more information about examinees than dichotomous items, and using polytomously scored items is an active research direction in CAT. The most widely used ISS is the maximum Fisher information (MFI) criterion, which raises concerns about how cost-efficiently the item pool is used and poses security risks for CAT programs. For dichotomous models, Chang & Ying (1999) and Chang, Qian, & Ying (2001) proposed two alternative procedures, the a-stratified method (a-STR) and the a-stratified with b-blocking method (b-STR), to remedy the item overexposure and underexposure produced by MFI. However, a-STR and b-STR are static: the items are stratified once, according to the information available at the beginning of the test. Recently, several ISSs for the graded response model (GRM) were built on a technique that reduces the dimensionality of the difficulty (step) parameters; the limitation of this technique is that it discards much of the information. To improve on MFI, this paper therefore proposes two kinds of item selection methods for the GRM: (1) the dimension-reduction technique for the difficulty (step) parameters is modified by incorporating interval estimation; (2) dynamic a-STR and dynamic b-STR methods are applied during the testing process.
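
The a-STR and b-STR procedures that the paper generalizes can be sketched for polytomous items as follows. This is a static sketch under stated assumptions: each item's difficulty is summarized by the mean of its step parameters (a hypothetical choice, since the abstract does not say which summary the authors use), and the paper's dynamic variant is not reproduced because the abstract does not give its details. Items are sorted by this summary into blocks, each block is sorted by a and split across strata, and stratum k collects the k-th slice of every block, so low-a items are used early and high-a items late. The function names are illustrative.

```python
import math
import statistics

Item = tuple[float, list[float]]  # hypothetical (a, [b_1, ..., b_m]) GRM item


def location(item: Item) -> float:
    """Hypothetical one-number difficulty summary: mean of the step parameters."""
    return statistics.fmean(item[1])


def b_stratify(pool: list[Item], n_strata: int, n_blocks: int) -> list[list[int]]:
    """a-stratification with b-blocking; n_blocks=1 gives plain a-STR."""
    order = sorted(range(len(pool)), key=lambda j: location(pool[j]))
    block_size = math.ceil(len(pool) / n_blocks)
    strata: list[list[int]] = [[] for _ in range(n_strata)]
    for start in range(0, len(order), block_size):
        # Within each difficulty block, order items by ascending discrimination a.
        block = sorted(order[start:start + block_size], key=lambda j: pool[j][0])
        per = math.ceil(len(block) / n_strata)
        for k in range(n_strata):
            strata[k].extend(block[k * per:(k + 1) * per])
    return strata


def select_in_stratum(theta_hat: float, pool: list[Item], stratum: list[int],
                      administered: set[int]) -> int:
    """Pick the unused item in the current stratum whose location is closest to theta_hat."""
    available = [j for j in stratum if j not in administered]
    return min(available, key=lambda j: abs(theta_hat - location(pool[j])))
```

Blocking on difficulty before stratifying on a keeps every stratum spread across the difficulty range, which is what lets each stage still find an item near the current ability estimate.
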
On the one hand, the new ISSs avoid the limitations of MFI while retaining the advantages of the Fisher information function (FIF): because the FIF combines all item parameters and the ability parameter, it is by nature a comprehensive index of all parameters. On the other hand, the new ISSs exploit the property that the Fisher information approximates the inverse of the variance of the ability estimate. Let ε be the square root of the reciprocal of the Fisher information, and let d be the absolute deviation between the current ability estimate and a function of a candidate item's parameters; this function may be chosen freely and may change during the course of the CAT. The inequality d < ε then takes the form of an interval estimate, and its effect can be pictured as a more flexible shadow item pool.
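
One way to read the criterion d < ε as a selection rule is sketched below. It rests on assumptions the abstract does not settle: ε = 1/√I uses the test information accumulated so far at the current estimate, the item function g defaults to the mean of the step parameters (only one possible choice), the most informative item inside the resulting "shadow pool" is administered, and the unused item with g closest to θ̂ is the fallback when the interval is empty. These rules and all function names are illustrative, not the authors' exact procedure.

```python
import math
import statistics
from typing import Callable, Sequence

Item = tuple[float, list[float]]  # hypothetical (a, [b_1, ..., b_m]) GRM item


def grm_item_info(theta: float, a: float, b: Sequence[float]) -> float:
    """GRM item information (logistic, D = 1): sum_k (dP_k/dtheta)^2 / P_k."""
    cum = [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - bk))) for bk in b] + [0.0]
    dcum = [a * p * (1.0 - p) for p in cum]
    info = 0.0
    for k in range(len(b) + 1):
        pk = cum[k] - cum[k + 1]
        if pk > 1e-12:
            info += (dcum[k] - dcum[k + 1]) ** 2 / pk
    return info


def mean_step(item: Item) -> float:
    """Hypothetical default for g(item): the mean of the step parameters."""
    return statistics.fmean(item[1])


def interval_candidates(theta_hat: float, test_info: float, pool: list[Item],
                        administered: set[int],
                        g: Callable[[Item], float] = mean_step) -> list[int]:
    """Unused items with d = |theta_hat - g(item)| < eps, where eps = 1 / sqrt(test_info)."""
    # Before any item is given there is no information, so the interval is unbounded.
    eps = math.inf if test_info <= 0.0 else 1.0 / math.sqrt(test_info)
    return [j for j, item in enumerate(pool)
            if j not in administered and abs(theta_hat - g(item)) < eps]


def select_interval(theta_hat: float, test_info: float, pool: list[Item],
                    administered: set[int],
                    g: Callable[[Item], float] = mean_step) -> int:
    shadow = interval_candidates(theta_hat, test_info, pool, administered, g)
    if not shadow:
        # Assumed fallback: the unused item whose g(item) is closest to theta_hat.
        remaining = [j for j in range(len(pool)) if j not in administered]
        return min(remaining, key=lambda j: abs(theta_hat - g(pool[j])))
    # Assumed rule inside the interval: administer the most informative item.
    return max(shadow, key=lambda j: grm_item_info(theta_hat, *pool[j]))
```

As information accumulates, ε shrinks, so the shadow pool narrows around θ̂ automatically; early in the test the wide interval lets many items compete.
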
A simulation study based on the GRM was conducted. Four item pools with different structures were simulated, and 1000 examinees were generated with abilities drawn at random from the standard normal distribution N(0, 1). Each pool consisted of 1000 polytomous items, and the maximum score of each item was drawn at random from the set {3, 4, 5, 6}. The prior distribution of ability was assumed to be standard normal, and ability was estimated by the Bayesian expected a posteriori (EAP) method. The test stopped when the accumulated information reached the predetermined value M = 16 or the test length reached the preset maximum of 30 items.
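
The EAP estimate under a standard normal prior and the stopping rule (accumulated information of at least M = 16, or 30 items) can be sketched as below. Rectangular quadrature on 81 equally spaced points over [-4, 4] is an assumption, as are the function names and the D = 1 logistic GRM; the normalizing constant of the normal prior cancels in the ratio and is omitted.

```python
import math
from typing import Sequence

Item = tuple[float, list[float]]  # hypothetical (a, [b_1, ..., b_m]) GRM item


def grm_category_probs(theta: float, a: float, b: Sequence[float]) -> list[float]:
    """GRM category probabilities P_0..P_m (logistic, D = 1)."""
    cum = [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - bk))) for bk in b] + [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(b) + 1)]


def eap_estimate(scores: Sequence[int], items: Sequence[Item],
                 n_points: int = 81, lo: float = -4.0, hi: float = 4.0) -> float:
    """Posterior mean of theta under a N(0, 1) prior, by rectangular quadrature."""
    num = den = 0.0
    for q in range(n_points):
        t = lo + (hi - lo) * q / (n_points - 1)
        weight = math.exp(-0.5 * t * t)  # unnormalized N(0, 1) density; constant cancels
        like = 1.0
        for x, (a, b) in zip(scores, items):
            like *= grm_category_probs(t, a, b)[x]
        num += t * weight * like
        den += weight * like
    return num / den


def should_stop(test_info: float, n_items: int,
                info_target: float = 16.0, max_len: int = 30) -> bool:
    """Stop once the accumulated information reaches M or the length cap is hit."""
    return test_info >= info_target or n_items >= max_len
```

Because the standard error of the ability estimate is approximately 1/√I, the target M = 16 corresponds roughly to a standard error of 0.25.
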
The simulation results show that the new item selection methods required shorter tests and produced lower average exposure rates than the other methods while maintaining the accuracy of ability estimation. More specifically, the new ISSs that incorporate interval estimation outperformed their corresponding original ISSs on the Chi-square value, and the dynamic a-STR and dynamic b-STR showed the same advantage over MFI. Comparing the item pool structures also yielded several findings. Estimation accuracy and item exposure rates depended on the distribution of the difficulty parameters b: ability was estimated more accurately when b was drawn from N(0, 1) than when it was drawn from a uniform distribution, whereas the opposite held for item exposure. Test length depended on the distribution of the discrimination parameter a: tests were shorter when a was drawn from a uniform distribution than when log a was drawn from N(0, 1). Overall, the new ISSs may hold an advantage over their original counterparts in controlling and balancing item exposure.
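
The abstract does not define its Chi-square value. A common definition in the a-STR literature (Chang & Ying, 1999) is χ² = Σ_j (r_j − L/N)² / (L/N), where r_j is the exposure rate of item j, N the pool size and L the test length. The sketch below assumes that definition and uses the mean test length for L because the simulated tests vary in length; the function names are hypothetical. Smaller values mean exposure is spread more evenly across the pool.

```python
from typing import Sequence


def exposure_rates(tests: Sequence[Sequence[int]], pool_size: int) -> list[float]:
    """Exposure rate r_j: share of examinees who were administered item j."""
    counts = [0] * pool_size
    for administered in tests:
        for j in administered:
            counts[j] += 1
    return [c / len(tests) for c in counts]


def chi_square_index(tests: Sequence[Sequence[int]], pool_size: int) -> float:
    """Chi-square of exposure rates, using the mean test length for L."""
    rates = exposure_rates(tests, pool_size)
    mean_length = sum(len(t) for t in tests) / len(tests)
    expected = mean_length / pool_size  # rate every item would have under perfectly even use
    return sum((r - expected) ** 2 for r in rates) / expected
```

For example, two examinees who see disjoint halves of a four-item pool give χ² = 0, while two examinees who both see the same two items give χ² = 2.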

Keywords: Graded Response Model (GRM), Computerized Adaptive Testing (CAT), Item selection, Interval estimation, b-STR for polytomously scored items

LUO Fen, DING Shu-Liang, WANG Xiao-Qing. (2012). Dynamic and Comprehensive Item Selection Strategies for Computerized Adaptive Testing Based on Graded Response Model. Acta Psychologica Sinica, 44(3), 400-412.
