ISSN 0439-755X
CN 11-1911/B
Sponsors: Chinese Psychological Society
   Institute of Psychology, Chinese Academy of Sciences
Publisher: Science Press

Acta Psychologica Sinica ›› 2012, Vol. 44 ›› Issue (2): 263-275.

The Necessity of the Bayesian Testlet Random-Effects Model and Its Influencing Factors

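LIU Yue; LIU Hong-Yun

  1. (School of Psychology, Beijing Normal University, Beijing 100875)
  • Received: 2011-01-25 Revised: 1900-01-01 Online: 2012-02-28 Published: 2012-02-28
  • Corresponding author: LIU Hong-Yun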

When Should We Use Testlet Model? A Comparison Study of Bayesian Testlet Random-Effects Model and Standard 2-PL Bayesian Model

LIU Yue;LIU Hong-Yun   

  1. (School of Psychology, Beijing Normal University, Beijing 100875, China)
  • Received:2011-01-25 Revised:1900-01-01 Online:2012-02-28 Published:2012-02-28
  • Contact: LIU Hong-Yun

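Abstract (translated from the Chinese): Testlet models can address the parameter estimation bias that arises in traditional IRT models when the assumption of local independence among items is violated. To explore the range of applicability of the testlet random-effects model, a Monte Carlo simulation study was conducted in which the 2-PL Bayesian testlet random-effects model (BTRM) and the 2-PL Bayesian model (BM) were each fitted to the data, examining the influence of testlet effect, testlet size, number of items, and the proportion of locally independent items. The results showed: (1) BTRM is unaffected by testlet effect and testlet size, whereas the parameter estimation error of BM increases as testlet effect and testlet size increase. (2) BTRM is fairly general, and using it reduces estimation error when the testlet effect is large, the testlets are long, and the number of items is large; however, when the number of items is small, the ability estimation errors of both models are large. (3) When the proportion of locally independent items is large, the parameter estimates from the two models differ little.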

Keywords (translated from the Chinese): testlet, 2-PL Bayesian testlet random-effects model, 2-PL Bayesian model, MCMC algorithm

Abstract: A testlet comprises a group of multiple-choice items based on a common stimulus. When testlets are used, traditional item response models may not be appropriate because the assumption of local independence (LI) is violated. A variety of new models have been proposed for analyzing testlet response data. Among them, the Bayesian random-effects model proposed by Bradlow, Wainer and Wang (1999) is one of the most promising. However, in many situations it is unclear to practitioners whether traditional IRT methods should still be used instead of a newly proposed testlet model.
The objective of the current study is to investigate the effects of model selection in various situations. In Simulation 1, response data sets were generated under three simulation factors: testlet variance (0, 0.5, 1, 2), testlet size (2, 5, 10), and test length (20, 40, 60). For each condition, the number of examinees was fixed at I = 2000 and the percentage of testlet items in a test at 50%. Under each condition, 30 replications were generated. Both the two-parameter Bayesian testlet random-effects model and the standard two-parameter Bayesian model were fitted to every data set using the MCMC method. The computer program SCORIGHT was used to conduct all analyses across conditions.
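The data-generation design described above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the study's actual procedure: the abstract does not state the link function or the generating distributions, so the logistic 2-PL form with linear predictor a_j(θ_i − b_j − γ_{i,d(j)}), the N(0, 1) abilities and difficulties, the uniform discriminations, and the function name `simulate_testlet_responses` are all assumptions made for illustration.

```python
import numpy as np

def simulate_testlet_responses(n_examinees, test_length, testlet_size,
                               testlet_variance, testlet_fraction=0.5, seed=0):
    """Simulate binary responses under an assumed logistic 2-PL testlet model."""
    rng = np.random.default_rng(seed)

    # Half of the items (by default) are grouped into testlets of equal size.
    n_testlet_items = int(round(test_length * testlet_fraction))
    n_testlets = n_testlet_items // testlet_size

    # testlet_id[j] is the testlet index of item j, or -1 for an independent item.
    testlet_id = np.full(test_length, -1)
    testlet_id[:n_testlets * testlet_size] = np.repeat(
        np.arange(n_testlets), testlet_size)

    # Generating distributions below are assumptions, not taken from the study.
    theta = rng.normal(0.0, 1.0, n_examinees)   # abilities
    a = rng.uniform(0.5, 2.0, test_length)      # discriminations
    b = rng.normal(0.0, 1.0, test_length)       # difficulties

    # Person-by-testlet random effects with the requested variance.
    gamma = rng.normal(0.0, np.sqrt(testlet_variance),
                       (n_examinees, n_testlets))

    # Append a zero column so that testlet_id == -1 selects "no testlet effect".
    gamma_padded = np.hstack([gamma, np.zeros((n_examinees, 1))])
    gamma_item = gamma_padded[:, testlet_id]

    logit = a * (theta[:, None] - b - gamma_item)
    prob = 1.0 / (1.0 + np.exp(-logit))
    responses = (rng.random(prob.shape) < prob).astype(int)

    truth = {"theta": theta, "a": a, "b": b,
             "gamma": gamma, "testlet_id": testlet_id}
    return responses, truth
```

Calling `simulate_testlet_responses(2000, 40, 5, 1.0)` would correspond to one replication of the condition with testlet variance 1, testlet size 5, and test length 40; setting `testlet_variance=0` yields locally independent data, which is the case where the two models are expected to agree.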
The two models were compared on seven criteria, including bias, mean absolute error, root mean square error, the correlation between estimated and true values, 95% posterior interval width, and 95% coverage probability. These indices were computed separately for each type of parameter.
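As a rough illustration of how the recovery criteria named above might be computed for one type of parameter, the following sketch takes true values, point estimates, and the bounds of the 95% posterior intervals. The function name `recovery_criteria` and its argument layout are hypothetical, and the definitions are the conventional ones rather than formulas confirmed by the paper.

```python
import numpy as np

def recovery_criteria(true, est, lower, upper):
    """Summarize how well estimates recover the true parameter values."""
    true = np.asarray(true, dtype=float)
    est = np.asarray(est, dtype=float)
    lower = np.asarray(lower, dtype=float)   # lower bounds of 95% intervals
    upper = np.asarray(upper, dtype=float)   # upper bounds of 95% intervals

    err = est - true
    return {
        "bias": err.mean(),
        "mae": np.abs(err).mean(),
        "rmse": np.sqrt((err ** 2).mean()),
        "corr": np.corrcoef(true, est)[0, 1],
        "interval_width": (upper - lower).mean(),
        # Share of parameters whose interval contains the true value.
        "coverage": ((lower <= true) & (true <= upper)).mean(),
    }
```

In a simulation like the one described, such a function would be applied per replication and per parameter type (for example, abilities, discriminations, difficulties), and the results averaged over the 30 replications of each condition.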
Simulation 2 compared the two models under two factors: the proportion of independent items (1/3, 1/2, 2/3) and test length (20, 30, 40, 60). The data generation, analysis procedure, and criteria mirrored those of Simulation 1.
The results showed that: (1) The accuracy of all parameter estimates under the 2-PL Bayesian testlet random-effects model remained stable across levels of testlet effect and testlet size, whereas the estimation errors of all parameters under the 2-PL Bayesian model increased dramatically as the testlet effect and testlet size grew. Moreover, the error for every parameter was always smaller under the Bayesian testlet random-effects model than under the 2-PL Bayesian model. Choosing the 2-PL Bayesian testlet random-effects model was especially necessary when testlet variance and testlet size were large. (2) Although the accuracy of the item parameter estimates under the Bayesian testlet random-effects model was not affected by test length, the accuracy of the ability estimates was. Moreover, as the test became shorter, the errors of all parameters under the 2-PL Bayesian model increased dramatically. Overall, under short-test conditions, the Bayesian testlet random-effects model did not perform well even when the testlet effect was large, and when all items were independent it produced much worse ability estimates than the 2-PL Bayesian model. (3) When the proportion of independent items was large and the test was longer than 20 items, the estimates from the two models did not differ significantly.
In conclusion, the 2-PL Bayesian testlet random-effects model is the more general of the two. Using the more complex testlet model when all items are independent yields almost the same accuracy of parameter estimation as using the 2-PL Bayesian model. The 2-PL Bayesian testlet random-effects model is the better choice when testlet variance, testlet size, and test length are large. However, when the test is short, even the Bayesian testlet random-effects model cannot provide accurate parameter estimates in the presence of local dependence, so it is important to ensure that the test comprises enough items before applying a testlet model. We also offer some suggestions for practitioners. During test construction, items should preferably be independent; if they are not, shorter testlets and a larger proportion of independent items should be used. During test analysis, local dependence should be checked first. If there is evidence of a dependence structure, an appropriate model should be chosen to avoid estimation errors.

Key words: testlet, 2-PL Bayesian testlet random-effects model, 2-PL Bayesian model, MCMC method