ISSN 0439-755X
CN 11-1911/B
Sponsored by: Chinese Psychological Society
   Institute of Psychology, Chinese Academy of Sciences
Publisher: Science Press

Acta Psychologica Sinica, 2012, Vol. 44, Issue (3): 400-412.



Dynamic and Comprehensive Item Selection Strategies for Computerized Adaptive Testing Based on Graded Response Model

LUO Fen; DING Shu-Liang; WANG Xiao-Qing

  1. (School of Computer and Information Engineering, Jiangxi Normal University, Nanchang 330022, China)
  • Received: 2010-10-21 Online: 2012-03-28 Published: 2012-03-28
  • Contact: DING Shu-Liang

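Abstract (Chinese version): Polytomous scoring provides more information about examinees and is one direction in which computerized adaptive testing is developing; item selection strategies are a central research topic in computerized adaptive testing. For the graded response model, a polytomous scoring model, this paper uses the idea of interval estimation to improve several recently proposed item selection strategies, and extends the dichotomous b-STR and a-STR methods to polytomous scoring in order to improve the maximum information item selection strategy. Monte Carlo simulations show that, while reaching or approaching the measurement precision of the original strategies, some of the proposed strategies effectively shorten the test and others greatly reduce item exposure rates.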


Abstract: Item selection strategy (ISS) is a core component of Computerized Adaptive Testing (CAT). Polytomous items can provide more information about examinees than dichotomous items, and the use of polytomously scored items is one research direction in CAT. The most widely used ISS, the maximum Fisher information (MFI) criterion, raises concerns about the cost-efficiency of item pool utilization and poses security risks for CAT programs. Chang & Ying (1999) and Chang, Qian, & Ying (2001) proposed two alternative item selection procedures for dichotomous models, the a-stratified method (a-STR) and the a-stratified with b blocking method (b-STR), to remedy the item overexposure and underexposure produced by MFI. However, a-STR and b-STR are static, because items are stratified according to the information available at the beginning of the test. Recently, several ISSs for the graded response model (GRM) have been constructed by reducing the dimensionality of the difficulty (or step) parameters. The limitation of this dimension reduction is that it discards a great deal of information. To improve MFI, two kinds of GRM-based item selection methods are therefore proposed: (1) the dimension-reduction technique for the difficulty (or step) parameters is modified by incorporating interval estimation; (2) dynamic a-STR and dynamic b-STR methods are applied during the testing process.
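The stratification idea behind a-STR and b-STR can be sketched in Python as follows. This is a minimal, static illustration under stated assumptions, not the authors' implementation: each GRM item is reduced to a single difficulty value, taken here as the mean of its step parameters (the hypothetical helper `b_bar` stands in for whatever reduction is actually used), strata have equal size, and each b-block holds one item per stratum. The names `Item`, `a_stratify`, `b_stratify`, and `select_item` are illustrative only.

```python
import math
from dataclasses import dataclass


@dataclass
class Item:
    a: float        # discrimination parameter
    b: list[float]  # ordered step (difficulty) parameters b_1 < ... < b_m


def b_bar(item: Item) -> float:
    # Hypothetical dimension reduction: summarize the step parameters by their mean.
    return sum(item.b) / len(item.b)


def a_stratify(pool: list[Item], n_strata: int) -> list[list[int]]:
    """a-STR: sort item indices by a (ascending) and cut them into equal-size strata."""
    order = sorted(range(len(pool)), key=lambda j: pool[j].a)
    size = math.ceil(len(order) / n_strata)
    return [order[k * size:(k + 1) * size] for k in range(n_strata)]


def b_stratify(pool: list[Item], n_strata: int) -> list[list[int]]:
    """b-STR: block items on b_bar, then deal each block across the a-strata."""
    order = sorted(range(len(pool)), key=lambda j: b_bar(pool[j]))
    strata: list[list[int]] = [[] for _ in range(n_strata)]
    for start in range(0, len(order), n_strata):
        # Within one b-block, the lowest-a item goes to stratum 0, the next to stratum 1, ...
        block = sorted(order[start:start + n_strata], key=lambda j: pool[j].a)
        for k, j in enumerate(block):
            strata[k].append(j)
    return strata


def select_item(pool: list[Item], strata: list[list[int]], stage: int,
                theta_hat: float, used: set[int]) -> int:
    """In the current stratum, pick the unused item whose b_bar is closest to theta_hat.

    Raises ValueError if every item in the stratum has already been used.
    """
    candidates = [j for j in strata[stage] if j not in used]
    return min(candidates, key=lambda j: abs(b_bar(pool[j]) - theta_hat))
```

Low-a strata are used early in the test and high-a strata later. A dynamic variant might, for example, re-run the stratification over the not-yet-used items before each stage; that update rule is an assumption here, not the paper's stated procedure.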
On the one hand, the new ISSs avoid and remedy the limitations of MFI while making good use of the advantages of the Fisher information function (FIF): because FIF combines all item parameters and the ability parameter, it is by nature a comprehensive summary of all parameters. On the other hand, the new ISSs exploit the property that the Fisher information approximates the reciprocal of the variance of the ability estimate. Let ε = 1/√I(θ̂) be the square root of the reciprocal of the Fisher information at the ability estimate θ̂, and let d be the absolute deviation between θ̂ and a function of a candidate item's parameters; this function may be chosen freely and may change during the course of the CAT. The inequality d < ε then has the form of an interval estimate, and its effect can be pictured as a more flexible shadow item pool.
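The interval-estimation rule d < ε can be made concrete with a short Python sketch. It assumes Samejima's GRM in the logistic metric (scaling constant D = 1), takes ε = 1/√I(θ̂) from the accumulated test information, and, as a hypothetical choice of "function of the item parameters", uses the mean step parameter. Items satisfying the inequality form the shadow pool; picking the most informative item inside it, and falling back to all unused items when the window is empty, are further assumptions. `grm_item_info` uses the general polytomous formula I(θ) = Σ_k P_k′(θ)² / P_k(θ).

```python
import math


def grm_category_probs(theta, a, b):
    """Samejima GRM: category probabilities P_0..P_m and boundary curves P*_0..P*_{m+1}."""
    # P*_k = P(X >= k) = 1 / (1 + exp(-a(theta - b_k))), written via tanh so it cannot
    # overflow; P*_0 = 1 and P*_{m+1} = 0 by definition.
    star = ([1.0]
            + [0.5 * (1.0 + math.tanh(0.5 * a * (theta - bk))) for bk in b]
            + [0.0])
    probs = [star[k] - star[k + 1] for k in range(len(b) + 1)]
    return probs, star


def grm_item_info(theta, a, b):
    """Item Fisher information: sum over categories of P_k'(theta)^2 / P_k(theta)."""
    probs, star = grm_category_probs(theta, a, b)
    # dP*_k/dtheta = a * P*_k * (1 - P*_k), which is 0 for the fixed ends 1 and 0.
    dstar = [a * s * (1.0 - s) for s in star]
    return sum((dstar[k] - dstar[k + 1]) ** 2 / probs[k]
               for k in range(len(probs)) if probs[k] > 0.0)


def interval_select(pool, theta_hat, test_info, used, g=lambda b: sum(b) / len(b)):
    """Hypothetical interval-estimation item selection.

    pool: list of (a, b) tuples; used: indices already administered.
    eps = 1/sqrt(test_info) approximates the standard error of theta_hat, and
    only unused items with d = |theta_hat - g(b)| < eps enter the shadow pool.
    """
    eps = 1.0 / math.sqrt(test_info) if test_info > 0.0 else math.inf
    unused = [j for j in range(len(pool)) if j not in used]
    window = [j for j in unused if abs(theta_hat - g(pool[j][1])) < eps]
    candidates = window or unused  # assumed fallback when the interval is empty
    return max(candidates, key=lambda j: grm_item_info(theta_hat, *pool[j]))
```

Early in a test the accumulated information is small, so ε is wide and the rule behaves much like MFI; as information accumulates, ε shrinks and the shadow pool narrows around θ̂.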
A simulation study based on the GRM was conducted. Four item pools with different structures were simulated, and 1000 examinees were generated with abilities drawn at random from the standard normal distribution N(0,1). Each pool contains 1000 polytomous items, and the maximum score of each item was drawn at random from the set {3, 4, 5, 6}. The prior distribution of ability was assumed to be standard normal, and ability was estimated by the Bayesian expected a posteriori (EAP) method. The CAT stopped when the accumulated information reached the pre-determined value M = 16 or the test length reached the pre-assigned maximum of 30 items.
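The simulation design can be sketched as follows. The standard normal prior, the EAP estimator, the stopping rule (information M = 16 or 30 items), the pool size, and the maximum scores {3, 4, 5, 6} come from the text; the rest is assumed: the quadrature grid (61 equally spaced points on [-4, 4]), the logistic metric with D = 1, and the hypothetical `make_pool`, which draws log a and the sorted step parameters from N(0, 1) to mimic one of the four pool structures. Since information is roughly the reciprocal of the error variance, M = 16 corresponds to a standard error of about 1/√16 = 0.25.

```python
import math
import random


def grm_probs(theta, a, b):
    """Samejima GRM category probabilities P_0..P_m (logistic metric, D = 1 assumed)."""
    # Logistic boundary curves written via tanh so they cannot overflow.
    star = ([1.0]
            + [0.5 * (1.0 + math.tanh(0.5 * a * (theta - bk))) for bk in b]
            + [0.0])
    return [star[k] - star[k + 1] for k in range(len(b) + 1)]


def make_pool(n_items, rng):
    """Hypothetical pool: max score m in {3,4,5,6}, log a ~ N(0,1), sorted steps ~ N(0,1)."""
    pool = []
    for _ in range(n_items):
        m = rng.choice([3, 4, 5, 6])
        a = math.exp(rng.gauss(0.0, 1.0))
        b = sorted(rng.gauss(0.0, 1.0) for _ in range(m))
        pool.append((a, b))
    return pool


def simulate_response(theta, a, b, rng):
    """Draw a score 0..m from the GRM category probabilities."""
    u, cum = rng.random(), 0.0
    for k, p in enumerate(grm_probs(theta, a, b)):
        cum += p
        if u < cum:
            return k
    return len(b)  # guards against floating-point round-off in cum


def eap(responses, items, n_quad=61, lo=-4.0, hi=4.0):
    """EAP estimate and posterior SD under an N(0,1) prior, on an equally spaced grid."""
    grid = [lo + (hi - lo) * i / (n_quad - 1) for i in range(n_quad)]
    post = []
    for t in grid:
        w = math.exp(-0.5 * t * t)  # unnormalized N(0,1) prior density
        for x, (a, b) in zip(responses, items):
            w *= grm_probs(t, a, b)[x]  # likelihood of the observed score
        post.append(w)
    total = sum(post)
    mean = sum(t * w for t, w in zip(grid, post)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(grid, post)) / total
    return mean, math.sqrt(var)


def should_stop(test_info, n_given, m_target=16.0, max_len=30):
    """Stop once accumulated information reaches M or the test reaches 30 items."""
    return test_info >= m_target or n_given >= max_len
```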
The simulation results show that the new item selection methods required shorter tests and produced lower average exposure rates than the other methods, while maintaining the accuracy of ability estimation. More specifically, the new ISSs that apply interval estimation outperformed their corresponding ISSs in terms of the chi-square value, and the same pattern held when the dynamic a-STR and dynamic b-STR ISSs were compared with MFI. Comparing item pools of different structures also yielded several important findings. The accuracy of ability estimation and the item exposure rate depended on the distribution of the difficulty parameters b: ability estimates were more accurate when b was sampled from N(0,1) than when it was sampled from a uniform distribution, whereas item exposure showed the opposite pattern. Test length depended on the distribution of the discrimination parameter a: tests were shorter when a was sampled from a uniform distribution than when the logarithm of a was sampled from N(0,1). Overall, the new ISSs may hold an advantage over their earlier counterparts in controlling and balancing item exposure.
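The chi-square comparison can be illustrated with a small Python sketch. It assumes the chi-square value is the item-exposure index of Chang & Ying (1999), χ² = Σ_j (r_j − L/N)² / (L/N), where r_j is the exposure rate of item j, N the pool size and L the test length; smaller values mean more even use of the pool. Because test length varies across examinees in this design, the sketch substitutes the mean test length for L (an assumption), which equals the sum of the exposure rates.

```python
def exposure_rates(administered, pool_size):
    """Exposure rate of each item: share of examinees to whom it was administered.

    administered: one list of item indices per examinee (lengths may differ).
    """
    counts = [0] * pool_size
    for items in administered:
        for j in set(items):  # count each item at most once per examinee
            counts[j] += 1
    return [c / len(administered) for c in counts]


def exposure_chi_square(rates):
    """Chang-Ying style index: sum_j (r_j - L/N)^2 / (L/N), with L the mean test length."""
    expected = sum(rates) / len(rates)  # sum(rates) equals the mean test length L
    return sum((r - expected) ** 2 for r in rates) / expected
```

A perfectly uniform pool usage, where every item has rate L/N, gives χ² = 0; a few heavily reused items drive the value up.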

Key words: Graded Response Model (GRM), Computerized Adaptive Testing (CAT), Item selection, Interval estimation, b-STR based on polytomously scored model