ISSN 0439-755X
CN 11-1911/B
Sponsored by: Chinese Psychological Society; Institute of Psychology, Chinese Academy of Sciences
Published by: Science Press

Acta Psychologica Sinica, 2010, Vol. 42, Issue (03): 434-442.

Test Equating with Testlets

WU Rui; DING Shu-Liang; GAN Deng-Wen

  1. Computer Information Engineering College, Jiangxi Normal University, Nanchang, Jiangxi 330022, China
  2. East China Jiaotong University Library, Nanchang, Jiangxi 330013, China
  • Received: 2008-09-26   Published: 2010-03-30   Online: 2010-03-30
  • Contact: DING Shu-Liang

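Abstract (Chinese): Testlets appear more and more often in examinations of all kinds. Equating tests that contain testlets with a standard IRT model may distort the equating results, because the local dependence within testlets is ignored. To address this problem, we equated tests with the testlet-based 2PTM and IRT characteristic-curve methods, and ran extensive Monte Carlo simulations under several conditions, taking the error of the estimated equating coefficients as the evaluation criterion and the Wilcoxon signed-rank test as the basis for comparison. The results show that in most cases the 2PTM, which accounts for local dependence, yields smaller equating errors than the 2PLM, and the differences are significant. In addition, the 2PTM was equated under six different equating criteria, and the relative merits of these criteria were evaluated under different conditions.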
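
The comparison above pairs an error measure for the 2PTM with the same measure for the 2PLM and tests the paired differences with the Wilcoxon signed-rank test. The sketch below is a minimal pure-Python version under stated assumptions: the error measure is taken to be the root mean squared error (RMSE) of the coefficient estimates across replications (the abstract says only "error"), the pairs are assumed to be the 2PLM and 2PTM errors for the same simulated condition, and the p-value uses the large-sample normal approximation without tie or continuity correction. The names `rmse` and `wilcoxon_signed_rank` are hypothetical, not taken from the paper.

```python
import math


def rmse(estimates, true_value):
    """Root mean squared error of coefficient estimates across replications."""
    return math.sqrt(sum((e - true_value) ** 2 for e in estimates) / len(estimates))


def wilcoxon_signed_rank(x, y):
    """Two-sided Wilcoxon signed-rank test for paired samples x and y.

    Zero differences are dropped; the p-value uses the large-sample normal
    approximation with no tie or continuity correction.
    Returns (W_plus, z, p_value).
    """
    diffs = [xi - yi for xi, yi in zip(x, y) if xi != yi]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    # Rank |d| in ascending order, giving tied values their average rank.
    order = sorted(range(n), key=lambda k: abs(diffs[k]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # 1-based average rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return w_plus, z, p_value
```

If `x` holds the 2PLM errors and `y` the 2PTM errors, a small p-value with positive `z` would indicate that the 2PLM errors are systematically larger. For real analyses, `scipy.stats.wilcoxon` offers exact and corrected versions of the same test.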

Abstract: Research on test equating is very important for the fairness of examinations, item banking, the assessment of teaching quality and computerized adaptive testing. As research on examinations has developed, testlets have appeared increasingly often in various examinations, for example in reading comprehension, mathematics and map-based items. How to equate tests composed of testlets is therefore a problem that must be solved. When item response theory (IRT) models are applied in test equating, a strong statistical assumption, local independence (LI), must be met. However, previous studies have shown that local independence is likely to be violated when a test contains testlets. Hence, when tests composed of testlets are equated with a standard IRT model, ignoring local dependence can distort the equating coefficients.
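
Why testlets violate local independence can be seen from a small calculation. Assume, as in random-effect testlet models, that two items in the same testlet share a person-specific effect gamma ~ N(0, sigma^2) and follow P(X = 1 | theta, gamma) = 1 / (1 + exp(-a(theta - b - gamma))), with responses independent given (theta, gamma). Conditional on theta alone, their covariance is then the covariance over gamma of the two response probabilities, which is zero only when sigma = 0. The sketch evaluates this on a grid; the function names and the grid integration are illustrative choices, not the paper's procedure.

```python
import math


def logistic(x):
    """Numerically safe logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    ex = math.exp(x)
    return ex / (1.0 + ex)


def conditional_covariance(theta, item1, item2, sigma, n_grid=801):
    """Cov(X1, X2 | theta) for two items that share a testlet effect gamma.

    Assumes gamma ~ N(0, sigma^2), P(X = 1 | theta, gamma) =
    logistic(a * (theta - b - gamma)), and independence given (theta, gamma),
    so the covariance equals the covariance over gamma of P1 and P2. The
    expectations are approximated by a normal-density-weighted sum on a grid
    over +/- 6 sigma.
    """
    if sigma == 0:
        return 0.0  # no testlet effect: local independence holds given theta
    (a1, b1), (a2, b2) = item1, item2
    lo, step = -6.0 * sigma, 12.0 * sigma / (n_grid - 1)
    w_sum = e1 = e2 = e12 = 0.0
    for k in range(n_grid):
        g = lo + k * step
        w = math.exp(-0.5 * (g / sigma) ** 2)  # unnormalised N(0, sigma^2) density
        p1 = logistic(a1 * (theta - b1 - g))
        p2 = logistic(a2 * (theta - b2 - g))
        w_sum += w
        e1 += w * p1
        e2 += w * p2
        e12 += w * p1 * p2
    return e12 / w_sum - (e1 / w_sum) * (e2 / w_sum)


if __name__ == "__main__":
    # Hypothetical item parameters (a, b) for two items in one testlet.
    for s in (0.0, 0.5, 1.0, 2.0):
        print(s, conditional_covariance(0.0, (1.0, 0.0), (1.2, 0.5), s))
```

Because both probabilities decrease in gamma, their covariance is positive whenever sigma > 0, and it grows with the testlet variance; a standard 2PLM treats it as zero.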
To solve this problem, we use a testlet-based model, the two-parameter testlet model (2PTM), which is derived from the IRT two-parameter logistic model (2PLM) by adding a random-effect parameter associated with each testlet, so that local dependence is taken into account. This paper presents IRT characteristic-curve equating methods and the specific procedures for calculating the equating coefficients. Extensive Monte Carlo simulation experiments were carried out, using the recovery of the equating coefficients as the evaluation measure and the Wilcoxon signed-rank test to judge significance. The effectiveness of equating tests containing testlets was investigated under several conditions, varying the accuracy of the estimation of item parameters (AEIP), the number of examinees and the degree of local dependence. The results of equating testlet-based tests with the 2PTM were compared with those of the standard IRT model, the 2PLM, which does not account for local dependence among items from a common testlet.
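
A minimal sketch of the 2PTM response function and of the linear scale transformation that equating relies on. It assumes the common random-effect form P(X_ij = 1) = 1 / (1 + exp(-a_j(theta_i - b_j - gamma_{i,d(j)}))), with no scaling constant D; the paper's exact parameterization may differ. Under theta* = A theta + B, the 2PLM rules a* = a / A and b* = A b + B carry over, and the testlet effect must be rescaled as gamma* = A gamma (so its SD becomes A sigma) for every response probability to stay unchanged. The class `Item2PTM` and the functions `p_2ptm` and `rescale` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass
class Item2PTM:
    a: float      # discrimination a_j
    b: float      # difficulty b_j
    testlet: int  # index d(j) of the testlet containing item j


def p_2ptm(theta, gamma, item):
    """P(X_ij = 1 | theta_i, gamma_{i,d(j)}) under the assumed 2PTM form.

    With gamma = 0 this reduces to the 2PLM item response function.
    """
    z = item.a * (theta - item.b - gamma)
    return 1.0 / (1.0 + math.exp(-z))


def rescale(item, sigma, A, B):
    """Express an item and its testlet SD on the scale theta* = A * theta + B.

    a* = a / A and b* = A * b + B as for the 2PLM; the testlet effect is
    scaled like theta, so gamma* = A * gamma and its SD becomes A * sigma.
    """
    return Item2PTM(item.a / A, A * item.b + B, item.testlet), A * sigma


if __name__ == "__main__":
    item = Item2PTM(a=1.2, b=-0.4, testlet=0)
    new_item, new_sigma = rescale(item, sigma=0.8, A=1.5, B=0.2)
    theta, gamma = 0.7, 0.3
    # The same person and testlet effect, expressed on both scales.
    print(p_2ptm(theta, gamma, item), p_2ptm(1.5 * theta + 0.2, 1.5 * gamma, new_item))
```

The two printed probabilities agree up to floating-point rounding, which is the invariance that lets equating coefficients A and B be estimated from common items.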
The results suggest that the 2PTM recovers the equating coefficients better than the 2PLM, with significant differences in most cases, so the 2PTM is suitable for equating tests based on testlets. In addition, the results of six different equating criteria applied to the 2PTM were compared with one another. Generally speaking, SLcrit performs best when the equating coefficient A lies between 0.5 and 0.9, SQRcrit is suitable for 0.9 < A < 1.5, and Hcrit is suitable for 1.5 < A < 2.0. The higher the AEIP, the better SQRcrit and SLcrit perform. Hcrit and SQRcrit are suitable when the testlet effect is large. LCcrit, Wcrit and SREcrit rarely outperform the others.
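
SLcrit and Hcrit presumably denote the Stocking-Lord and Haebara characteristic-curve criteria: Stocking-Lord compares test characteristic curves, Haebara compares item characteristic curves item by item. The sketch below implements both for plain 2PL curves of the common items over a fixed theta grid and minimizes them with a coarse-to-fine grid search, which stands in for the quasi-Newton optimizers usually used. It does not reproduce the other four criteria, and how the paper incorporates the 2PTM testlet parameters into the curves is not stated in the abstract; all names and parameter values are hypothetical.

```python
import math


def p2pl(theta, a, b):
    """2PL item characteristic curve (no scaling constant D), overflow-safe."""
    z = a * (theta - b)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def haebara(A, B, base, new, thetas):
    """Hcrit: squared ICC differences, summed over common items and theta points."""
    total = 0.0
    for t in thetas:
        for (ay, by), (ax, bx) in zip(base, new):
            total += (p2pl(t, ay, by) - p2pl(t, ax / A, A * bx + B)) ** 2
    return total


def stocking_lord(A, B, base, new, thetas):
    """SLcrit: squared differences of test characteristic curves over theta points."""
    total = 0.0
    for t in thetas:
        tcc_base = sum(p2pl(t, ay, by) for ay, by in base)
        tcc_new = sum(p2pl(t, ax / A, A * bx + B) for ax, bx in new)
        total += (tcc_base - tcc_new) ** 2
    return total


def minimize(crit, base, new, thetas, a_range=(0.3, 3.0), b_range=(-3.0, 3.0),
             rounds=8, n=21):
    """Coarse-to-fine grid search for (A, B); a stand-in for a real optimizer."""
    (a_lo, a_hi), (b_lo, b_hi) = a_range, b_range
    best = None
    for _ in range(rounds):
        a_step = (a_hi - a_lo) / (n - 1)
        b_step = (b_hi - b_lo) / (n - 1)
        for i in range(n):
            for j in range(n):
                A, B = a_lo + i * a_step, b_lo + j * b_step
                value = crit(A, B, base, new, thetas)
                if best is None or value < best[0]:
                    best = (value, A, B)
        # Zoom in on a window of +/- 3 grid steps around the best point so far.
        _, A0, B0 = best
        a_lo, a_hi = max(1e-3, A0 - 3 * a_step), A0 + 3 * a_step
        b_lo, b_hi = B0 - 3 * b_step, B0 + 3 * b_step
    return best[1], best[2]


if __name__ == "__main__":
    # Hypothetical common-item parameters (a, b) on the new form's scale.
    new = [(0.8, -1.5), (1.0, -0.5), (1.2, 0.0), (1.5, 0.7), (0.9, 1.4)]
    true_A, true_B = 1.2, 0.3
    # Error-free base-form parameters implied by theta_base = A * theta_new + B.
    base = [(a / true_A, true_A * b + true_B) for a, b in new]
    thetas = [-3.0 + 0.3 * k for k in range(21)]
    for crit in (haebara, stocking_lord):
        print(crit.__name__, minimize(crit, base, new, thetas))
```

With error-free item parameters both criteria reach zero at the true (A, B); in a Monte Carlo study like the one described, the estimated item parameters carry error, and the criteria then differ in how far their minimizers drift from the true coefficients.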

Key words: test equating, testlets, item response theory, Monte Carlo simulation, equating criterion