Comparison of MIRT Linking Methods for Different Common Item Designs

doi:10.3724/SP.J.1041.2013.00466

Abstract

Many educational assessments measure more than one trait (Ackerman, 1992; DeMars, 2006; Reckase, 1985). To adjust scores across different forms of such tests, multidimensional item response theory (MIRT) and its linking procedures need to be developed. Several researchers have already extended unidimensional IRT (UIRT) linking methods to the multidimensional case (Davey et al., 1996; Hirsch, 1989; Li & Lissitz, 2000; Min, 2003; Yon, 2006), and numerous studies have compared MIRT linking methods. However, although the choice of anchor items is of great importance in common item designs, few studies have compared MIRT linking methods under different common item designs. It therefore remains unclear how common items should be chosen for the different MIRT linking methods. The purpose of this study was to compare five MIRT linking methods under two strategies for choosing common items in a variety of situations. The study used a mixed design, with simulation conditions as between factors and linking method as the within factor. There were six between factors: (1) 2 test lengths (40 items and 80 items); (2) 2 ratios of the number of items in one dimension to the number in the other (1:1 and 1:3); (3) 3 anchor lengths (1/20, 1/5 and 1/3 of the total test); (4) 2 strategies for choosing common items (choosing the same number of items from every dimension, or choosing in proportion to the number of items in every dimension); (5) 3 correlations between the two ability dimensions (r = 0, 0.5, 0.9); (6) 2 population conditions (equivalent or non-equivalent ability levels between the two populations). The five MIRT linking methods investigated were the Mean/Mean (MM) method, the Mean/Sigma (MS) method, Stocking-Lord’s (SL) method, Haebara’s (HB) method and the Least Squares (LS) method. Under each condition, the number of examinees was fixed at I = 2000 and 30 replications were generated. BMIRT (Yao, 2003) was used to estimate item and ability parameters with an MCMC method.
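The fully crossed between-factor design described above can be enumerated with a short sketch. This is only an illustration of how the conditions multiply out, not the authors' simulation code: the dictionary keys and level labels are hypothetical names, and the three anchor lengths are assumed to be 1/20, 1/5 and 1/3 of the total test.

```python
from fractions import Fraction
from itertools import product

# Factor levels as listed in the abstract; the key names are illustrative only.
factors = {
    "test_length": [40, 80],
    "dimension_item_ratio": ["1:1", "1:3"],
    # Assumption: the three anchor lengths are 1/20, 1/5 and 1/3 of the test.
    "anchor_proportion": [Fraction(1, 20), Fraction(1, 5), Fraction(1, 3)],
    "anchor_strategy": ["equal_per_dimension", "proportional_to_dimension"],
    "ability_correlation": [0.0, 0.5, 0.9],
    "population_ability": ["equivalent", "non_equivalent"],
}

# Cross every level of every factor to obtain the between-factor conditions.
conditions = [dict(zip(factors, levels)) for levels in product(*factors.values())]

n_replications = 30  # replications generated per condition, as stated in the abstract

print(len(conditions))                   # 144 between-factor conditions
print(len(conditions) * n_replications)  # 4320 simulated data sets in total
```

Each of these data sets would then be linked with all five methods, since linking method is the within factor.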
Following previous equating studies (Kim & Cohen, 1998; Kim & Cohen, 2002), a two-step linking procedure was applied. The first step transformed the parameter scale of the new test onto that of the base test, and the second step transformed the scale of all simulated items onto the generating scale. In each step, the transformation matrices were produced by LinkMIRT (Yao, 2004) and the R package plink (Weeks, 2010). Finally, parameter recovery was evaluated with four criteria: bias, mean absolute error, root mean square error (RMSE), and the correlation between the equated parameters and their true values. The comparison of the five MIRT linking methods showed that the RMSE of the parameters under the SL, HB and LS methods was smaller and more stable across situations, whereas the RMSE under the MM and MS methods was markedly larger, especially in the non-equivalent group conditions. The remaining results are therefore reported only for the SL, HB and LS methods. These methods were not affected by the common item design: in multidimensional linking, the RMSE became acceptable once the number of common items was more than 5% of the total test. Likewise, the strategy for choosing common items had no significant influence on the linking results of the three methods across the different test structures. As for the other simulation factors: as test length increased, the RMSE of these methods decreased; as the correlation between the two ability dimensions increased, the RMSE of the ability parameters decreased; and the difference in ability levels between the two populations had only a small effect on these methods, with the non-equivalent group condition producing larger error only for the intercept parameter. In conclusion, the SL, HB and LS methods generally performed better than the other two methods across all conditions, so they are strongly recommended for use in practice.
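The four recovery criteria named above (bias, mean absolute error, RMSE and correlation) can be sketched as follows for one set of parameters. This is a minimal illustration under common textbook definitions, not the authors' implementation; the function name `recovery_criteria` and the example values are hypothetical.

```python
import math

def recovery_criteria(estimated, true):
    """Return bias, MAE, RMSE and Pearson correlation of estimates vs. true values."""
    n = len(true)
    errors = [e - t for e, t in zip(estimated, true)]

    bias = sum(errors) / n                            # mean signed error
    mae = sum(abs(d) for d in errors) / n             # mean absolute error
    rmse = math.sqrt(sum(d * d for d in errors) / n)  # root mean square error

    # Pearson correlation between the linked estimates and the true values.
    mean_e = sum(estimated) / n
    mean_t = sum(true) / n
    cov = sum((e - mean_e) * (t - mean_t) for e, t in zip(estimated, true))
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    var_t = sum((t - mean_t) ** 2 for t in true)
    correlation = cov / math.sqrt(var_e * var_t)

    return {"bias": bias, "mae": mae, "rmse": rmse, "correlation": correlation}

# Made-up true and linked values for four parameters, for illustration only.
true_values = [-1.0, 0.0, 1.0, 2.0]
linked_values = [-0.8, 0.1, 0.9, 2.2]
result = recovery_criteria(linked_values, true_values)
print(result)
```

In a simulation such as the one described, these quantities would typically be averaged over items (or examinees) and over replications within each condition; that aggregation step is omitted here.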
The SL, HB and LS methods performed similarly under the different common item designs, which is a notable result for MIRT linking. Once an appropriate method is chosen, a shorter anchor set can be used, which is helpful because developing good common items for multidimensional tests is quite time-consuming. The common items can be chosen either in proportion to the number of items in every dimension or in equal numbers from all dimensions, which may also be more convenient for practitioners. Lastly, because test length had a significant effect on the accuracy of the equated parameters, it is suggested to make sure that the test contains enough items in every dimension before conducting MIRT linking.

Key words: test equating, Multidimensional Item Response Theory, Mean/Mean (MM) method, Mean/Sigma (MS) method, Stocking-Lord’s (SL) method, Haebara’s (HB) method, Least Squares (LS) method

LIU Yue, LIU Hongyun. (2013). Comparison of MIRT Linking Methods for Different Common Item Designs. Acta Psychologica Sinica, 45(4), 466-480.

