Item Selection Strategies for Computerized Adaptive Testing with the Generalized Partial Credit Model

Acta Psychologica Sinica, 2008, Vol. 40, Issue (05): 618-625. cstr: 32110.14.2008.00618

LIU Zhen (刘珍); DING Shu-Liang (丁树良); LIN Hai-Jing (林海菁)

Computer Information Engineering College, Jiangxi Normal University, Nanchang 330027, China

Received: 2006-12-15 Online: 2008-05-30 Published: 2008-05-30

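Abstract

Summary: The item selection strategy is an important topic in research on computerized adaptive testing (CAT); its quality bears directly on a test's reliability, validity, and security. Much CAT research and application is built on dichotomous (0-1) scoring models, and studies of item selection strategies for polytomously scored CAT are rarely reported. Although CAT research based on the GRM has been carried out in China, no reports of CAT research based on the GPCM have yet appeared. Using a computer simulation program, this study compared four item selection strategies for CAT under the Generalized Partial Credit Model (GPCM) across a variety of conditions. The results show that when examinee ability is normally distributed, the performance of an item selection strategy depends heavily on the distribution of the item step parameters: (1) when all item step parameters follow a normal distribution, the strategy that matches ability to an item step parameter performs best; (2) when all item step parameters follow a uniform distribution, the strategy that matches ability to the mean of the item's step parameters performs best.

Keywords: IRT, polytomously scored model, GPCM, a-stratification, item selection strategy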

Abstract:

The objective of computerized adaptive testing (CAT) is to construct an optimal test for each examinee. The item selection strategy (ISS) is an important part of CAT research, and its quality directly affects the reliability, validity, and security of the test.
Much CAT research and many CAT applications are based on dichotomously scored models. A polytomously scored model, however, extracts more information from each examinee's responses than a dichotomous model does, so CAT based on polytomously scored models merits further study.
Both the Generalized Partial Credit Model (GPCM) and the Graded Response Model (GRM) are polytomously scored models, but they differ. In the GRM, the grade difficulties of an item increase monotonically with the grade. The GPCM instead models the process of working through an item as a sequence of steps. Each item has several step parameters, and no ordering is imposed on them: a later step cannot be attempted until the earlier step has been completed, yet the step parameter of a later step may be lower than that of an earlier one. Considerable research has been conducted on CAT using the GRM; in China, however, there are few reports of research on CAT using the GPCM.
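To make the step structure concrete, the following is a minimal sketch of the GPCM category probabilities in their standard form, P(X = k | θ) ∝ exp(Σ_{v≤k} D·a·(θ − b_v)), with the empty sum for k = 0 equal to zero. The function name `gpcm_probs` and the scaling constant `D = 1.0` are assumptions for illustration; the abstract does not state which scaling the authors used.

```python
import math

def gpcm_probs(theta, a, steps, D=1.0):
    """Return [P(X=0), ..., P(X=m)] under the GPCM for one item.

    theta: examinee ability; a: item discrimination;
    steps: step parameters b_1..b_m (no ordering required).
    D is a scaling constant; D = 1.0 is an assumption made here.
    """
    # z_k is the cumulative sum of D*a*(theta - b_v) for v = 1..k, with z_0 = 0.
    z = [0.0]
    for b in steps:
        z.append(z[-1] + D * a * (theta - b))
    # Subtract the maximum before exponentiating for numerical stability.
    z_max = max(z)
    exp_z = [math.exp(v - z_max) for v in z]
    total = sum(exp_z)
    return [e / total for e in exp_z]

# Step parameters need not be ordered: here the second step is easier than the first.
probs = gpcm_probs(theta=0.5, a=1.2, steps=[0.8, -0.3, 1.1])
print([round(p, 3) for p in probs])
```

An item with m step parameters has m + 1 score categories, so the example above yields four probabilities that sum to one.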
This study compared four types of ISS for CAT under the GPCM in a variety of conditions, using computer simulation programs. The strategies were implemented in four item pools of 3000 items each. Each item has five step parameters, and the step parameters (b) and discrimination parameter (a) of the four pools are distributed as follows: (1) b ~ N(0,1), ln a ~ N(0,1); (2) b ~ N(0,1), a ~ U(0.2,2.5); (3) b ~ U(-3,3), ln a ~ N(0,1); (4) b ~ U(-3,3), a ~ U(0.2,2.5). Item parameters were generated by Monte Carlo simulation. Under each ISS, responses to the items were generated according to the GPCM for a sample of 1000 simulees, whose trait levels were likewise generated by Monte Carlo simulation. During testing, each simulee's ability was re-estimated from the responses obtained so far. The whole process was then repeated after the four item pools had been sorted by the discrimination parameter to form the a-stratified design. In total, thirty-two simulated CATs were administered, and their output was evaluated on the following measures: precision, stability of the ISS, evenness of item usage, average number of items used per examinee, χ2, efficiency, and item overlap.
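The pool design described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' program: the names `make_pool`, `stratify_by_a`, and `POOL_DESIGNS`, the seed, and the choice of four strata are all hypothetical, since the abstract does not give the number of strata or any implementation detail.

```python
import math
import random

# The four (step-parameter, discrimination) designs listed in the abstract.
POOL_DESIGNS = {
    "b_normal_lna_normal": ("normal", "lognormal"),
    "b_normal_a_uniform": ("normal", "uniform"),
    "b_uniform_lna_normal": ("uniform", "lognormal"),
    "b_uniform_a_uniform": ("uniform", "uniform"),
}

def make_pool(n_items, b_dist, a_dist, n_steps=5, rng=None):
    """Generate an item pool; each item has one discrimination and n_steps step parameters."""
    if rng is None:
        rng = random.Random(0)
    pool = []
    for _ in range(n_items):
        if a_dist == "lognormal":
            a = math.exp(rng.gauss(0.0, 1.0))      # ln a ~ N(0, 1)
        else:
            a = rng.uniform(0.2, 2.5)              # a ~ U(0.2, 2.5)
        if b_dist == "normal":
            steps = [rng.gauss(0.0, 1.0) for _ in range(n_steps)]     # b ~ N(0, 1)
        else:
            steps = [rng.uniform(-3.0, 3.0) for _ in range(n_steps)]  # b ~ U(-3, 3)
        pool.append({"a": a, "steps": steps})
    return pool

def stratify_by_a(pool, n_strata):
    """Sort items by discrimination and cut the sorted pool into consecutive strata."""
    ordered = sorted(pool, key=lambda item: item["a"])
    size = math.ceil(len(ordered) / n_strata)
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]

rng = random.Random(2008)  # arbitrary seed, for reproducibility only
pools = {name: make_pool(3000, b, a, rng=rng) for name, (b, a) in POOL_DESIGNS.items()}

# Assumption: four strata. Low-a strata would be used early in the test, high-a strata late.
strata = stratify_by_a(pools["b_normal_lna_normal"], n_strata=4)
print([len(s) for s in strata])
```

Crossing the four pools with the four strategies and with the stratified and unstratified conditions gives 4 × 4 × 2 = 32 simulated CATs, which matches the count reported above.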
Tables 1 and 2 report the evaluation index values obtained from the CAT runs under the four ISSs, both when the item pool was not stratified and when it used the a-stratified design, together with a weighted sum of the index values. Two conclusions follow from the tables. First, the ability estimates were highly accurate under all conditions and differed little from one another. Second, comparing the weighted sums shows that the distribution of the item step parameters strongly influences the choice of ISS.
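The abstract does not define the χ2 index. A form commonly used in the a-stratified CAT literature compares each item's observed exposure rate with the rate expected if items were used perfectly evenly; whether the paper uses exactly this form is an assumption. The function name `exposure_chi_square` and the toy data are hypothetical.

```python
def exposure_chi_square(administered, pool_size):
    """Chi-square index of uneven item usage (smaller means more even).

    administered: one list of item indices per examinee.
    Assumed form: sum over items of (rate_j - expected)^2 / expected,
    where expected = mean test length / pool size.
    """
    n_examinees = len(administered)
    counts = [0] * pool_size
    for test in administered:
        for item in test:
            counts[item] += 1
    mean_length = sum(len(test) for test in administered) / n_examinees
    expected = mean_length / pool_size
    return sum((c / n_examinees - expected) ** 2 / expected for c in counts)

# Toy data: a pool of four items and two examinees who each take two items.
even_usage = exposure_chi_square([[0, 1], [2, 3]], pool_size=4)
uneven_usage = exposure_chi_square([[0, 1], [0, 1]], pool_size=4)
print(even_usage, uneven_usage)
```

In the toy data, the first call spreads usage evenly across the pool and the second reuses the same two items for every examinee, so the second value is larger.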
When the examinees' trait levels follow a normal distribution, the performance of an ISS is closely related to the distribution of the item step parameters. (1) If the item step parameters follow a normal distribution, the ISS that matches a randomly chosen step parameter to the trait level performs much better than the others. (2) If the item step parameters follow a uniform distribution, the ISS that matches the item's average step parameter to the trait level performs much better than the others.
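The two strategies named in these conclusions can be sketched as nearest-match rules. This is one plausible reading of the abstract, not the authors' exact procedure: the function names, the item representation, and the rule of drawing one random step parameter per candidate item are assumptions.

```python
import random

def mean_step(item):
    """Average of an item's step parameters."""
    return sum(item["steps"]) / len(item["steps"])

def select_mean_match(theta, pool, used):
    """Return the index of the unused item whose mean step parameter is closest to theta."""
    candidates = [i for i in range(len(pool)) if i not in used]
    return min(candidates, key=lambda i: abs(mean_step(pool[i]) - theta))

def select_random_step_match(theta, pool, used, rng):
    """Draw one step parameter at random per unused item; return the closest item's index."""
    candidates = [i for i in range(len(pool)) if i not in used]
    return min(candidates, key=lambda i: abs(rng.choice(pool[i]["steps"]) - theta))

# Toy pool with two step parameters per item (the study uses five).
pool = [
    {"steps": [-2.0, -1.0]},
    {"steps": [-0.5, 0.7]},
    {"steps": [1.5, 2.5]},
]
theta_hat = 0.0  # current ability estimate
first = select_mean_match(theta_hat, pool, used=set())
second = select_mean_match(theta_hat, pool, used={first})
random_pick = select_random_step_match(theta_hat, pool, used=set(), rng=random.Random(0))
print(first, second, random_pick)
```

In a full CAT loop, the selected index would be added to `used`, a response simulated under the GPCM, and `theta_hat` re-estimated before the next selection.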

Key words: IRT, polytomously scored model, GPCM, a-stratified design, item selection strategy

Chinese Library Classification (CLC) number:

B841

LIU Zhen, DING Shu-Liang, LIN Hai-Jing. (2008). Item Selection Strategies for Computerized Adaptive Testing with the Generalized Partial Credit Model. Acta Psychologica Sinica, 40(05), 618-625.
