ISSN 0439-755X
CN 11-1911/B

2010, Vol. 42, Issue (03): 434-442.


Test Equating with Testlets

WU Rui;DING Shu-Liang;GAN Deng-Wen   

  1. Computer Information Engineer College, JiangXi Normal University, Nanchang, Jiangxi 330022, China
  2. East China JiaoTong University Library, Nanchang, Jiangxi 330013, China
  • Received:2008-09-26 Revised:1900-01-01 Published:2010-03-30 Online:2010-03-30
  • Contact: DING Shu-Liang

Abstract: Research on test equating is important for examination fairness, item banking, the assessment of teaching quality, and computerized adaptive testing. As research on examinations has developed, testlets have appeared increasingly in different examinations, such as those on reading comprehension, mathematics, and maps. How to equate tests composed of testlets is therefore a problem that needs to be solved. When item response theory (IRT) models are applied in test equating, a strong statistical assumption, local independence (LI), must be met. However, previous studies have shown that local independence is likely to be violated when a test contains testlets. Hence, when tests composed of testlets are equated with a standard IRT model, ignoring local dependence can distort the equating coefficients.
To solve this problem, we use a testlet-based model, the 2-Parameter Testlet Model (2PTM), which is derived from the IRT 2-Parameter Logistic Model (2PLM) by adding random-effect parameters associated with each testlet, so that local dependence is taken into account. This paper presents IRT characteristic curve equating methods and specific procedures for calculating the equating coefficients. A large number of Monte Carlo simulation experiments were conducted, and the recovery of the equating coefficients was evaluated with the Wilcoxon signed-rank test. The effectiveness of equating tests containing testlets was investigated under several conditions, including the accuracy of the estimation of item parameters (AEIP), the number of examinees, and the degree of local dependence. The results of equating testlet-based tests with the 2PTM were compared with those obtained with a standard IRT model, the 2PLM, which does not account for local dependence among items from a common testlet.
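As a rough illustration of the approach described above, the following Python sketch shows one common way to write a two-parameter testlet model and a characteristic curve criterion for the equating coefficients A and B. Everything in it is an assumption made for illustration rather than the paper's procedure: the response function a(θ − b − γ), the function names, the item values, the Stocking-Lord-style loss (which may differ from the paper's SLcrit), the evaluation of the curves at γ = 0, and the plain grid search used in place of a numerical optimizer.

```python
import math

def p_2ptm(theta, a, b, gamma=0.0):
    """Probability of a correct response; gamma is the person-specific
    testlet effect (assumed form: logistic of a * (theta - b - gamma))."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b - gamma)))

def to_base_scale(a, b, A, B):
    """Map item parameters from the new form's scale onto the base scale
    using the linear transformation theta* = A * theta + B."""
    return a / A, A * b + B

def sl_loss(A, B, items_new, items_base, thetas):
    """Stocking-Lord-style criterion: squared gap between the two test
    characteristic curves of the common items, evaluated at gamma = 0."""
    loss = 0.0
    for theta in thetas:
        tcc_base = sum(p_2ptm(theta, a, b) for a, b in items_base)
        tcc_new = sum(p_2ptm(theta, *to_base_scale(a, b, A, B))
                      for a, b in items_new)
        loss += (tcc_base - tcc_new) ** 2
    return loss

def grid_search(items_new, items_base, thetas):
    """Coarse grid search over A in [0.5, 2.0] and B in [-1.0, 1.0]."""
    best = None
    for i in range(31):
        A = round(0.5 + 0.05 * i, 2)
        for j in range(41):
            B = round(-1.0 + 0.05 * j, 2)
            loss = sl_loss(A, B, items_new, items_base, thetas)
            if best is None or loss < best[0]:
                best = (loss, A, B)
    return best[1], best[2]

# Hypothetical common items as (a, b) pairs on the new form's scale.
items_new = [(1.0, -1.0), (1.4, 0.0), (0.8, 0.5), (1.2, 1.0)]
# The same items on the base scale, generated with known A = 1.2, B = 0.3.
items_base = [to_base_scale(a, b, 1.2, 0.3) for a, b in items_new]
# Ability grid on which the characteristic curves are compared.
thetas = [-3.0 + 0.5 * k for k in range(13)]

A_hat, B_hat = grid_search(items_new, items_base, thetas)
print("recovered A, B:", A_hat, B_hat)
```

Under this parameterization the response probability is unchanged when θ, b and γ are all moved to the base scale (with γ* = Aγ) and a is divided by A, which is why a single pair (A, B) can link the two forms. With error-free item parameters, as here, the grid search should return the generating coefficients; a simulation study of the kind described above would instead use estimated parameters and compare how closely each criterion recovers A and B.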
The results suggest that the 2PTM recovers the equating coefficients better than the 2PLM and that the differences are statistically significant in most cases, so the 2PTM is suitable for equating testlet-based tests. In addition, six different equating criteria for the 2PTM were compared with one another. Generally speaking, when the value of the coefficient A is between 0.5 and 0.9, SLcrit performs best; SQRcrit is appropriate for 0.9<A<1.5, and Hcrit is appropriate for 1.5<A<2.0. The higher the AEIP, the better SQRcrit and SLcrit perform. Hcrit and SQRcrit are appropriate when the testlet effect is large. LCcrit, Wcrit and SREcrit are rarely better than the others.

Key words: test equating, testlets, item response theory, Monte Carlo simulation, equating criterion