A STUDY OF SCORE EQUATING IN THE COLLEGE ENGLISH TEST: A NEW APPROACH BASED ON “ANCHOR ITEMS” AND TWO-PARAMETER IRT MODEL

Abstract

In China’s College English Test (CET), the Rasch model has been used in the score equating procedure for 15 years, and a large body of score equating data has been accumulated. This paper discusses in detail some weaknesses of the score equating method based on the Rasch model, and introduces a new score equating approach based on “anchor items” and the two-parameter IRT (Item Response Theory) model. The old score equating method based on the Rasch model rests on three assumptions: 1) the students in the control group give equal attention to both the formal and the control papers; 2) there has been no leakage of the items in either paper; 3) all items have the same discrimination index. Assumption 1) usually fails because students feel that the control paper is an extra burden and do not give it the same importance as the formal paper. In this case their marks on the control paper are lower than their true performance would warrant. Even if the two papers were in fact equally difficult, the students would score lower on the control paper, making it appear harder. The formal paper would then seem relatively easier, and in the process of equating the students’ marks would be reduced. If assumption 2) does not hold and the control paper has not truly been kept confidential, the effect runs in the opposite direction. The candidates would do better than they should on the control paper, so their marks on that paper would be relatively high in comparison with the formal test. The formal test would therefore appear to the equating algorithm to be harder than it really is, and all the students’ marks would be increased. Note that this would be true even if only a few items were leaked. For example, if just one Reading passage were leaked together with its five associated items, those five items would be scored as correct for students who might otherwise have answered at least some of them incorrectly. Since reading items carry double weight, this could falsely increase the score of weaker students by up to 10 marks. Of course, the effect on the mean score would be smaller, since many students would have answered these items correctly anyway.
As for assumption 3), it might be argued that, since there is evidence that the items do not all have the same discrimination index, the two- or three-parameter IRT model should be used. It has to be accepted that any equating step will increase the standard error of measurement (SEM) of the final score, because the parameters used for equating are themselves estimated with some standard error. However, this increase will usually be small (given the sample size of several hundred used for model fitting) and should be more than compensated for by the reduction in the “between-forms” bias that the equating procedure is designed to correct. A pilot study with real CET test data is reported, with satisfactory score equating results.
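To make the contrast between the Rasch and two-parameter models concrete, the following is a minimal sketch in Python. It is not the procedure used in the study: the abstract does not say which linking method was applied to the anchor items, so the mean-sigma method is assumed here purely for illustration, and all function names and item values are hypothetical.

```python
import math
from statistics import mean, pstdev


def p_correct(theta, b, a=1.0):
    """Probability of a correct response under the 2PL model.

    theta: ability, b: item difficulty, a: item discrimination.
    With a fixed at 1.0 for every item this reduces to the Rasch form.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def mean_sigma_link(b_anchor_new, b_anchor_ref):
    """Mean-sigma linking (assumed method, not taken from the paper).

    Returns (A, B) such that b_ref is approximately A * b_new + B
    for the anchor items shared by both forms.
    """
    A = pstdev(b_anchor_ref) / pstdev(b_anchor_new)
    B = mean(b_anchor_ref) - A * mean(b_anchor_new)
    return A, B


def to_reference_scale(a_new, b_new, A, B):
    """Re-express one item's 2PL parameters on the reference scale."""
    return a_new / A, A * b_new + B


# Hypothetical difficulty estimates for three anchor items,
# calibrated separately on the reference form and on the new form.
b_anchor_ref = [-1.5, 0.5, 2.5]
b_anchor_new = [-1.0, 0.0, 1.0]
A, B = mean_sigma_link(b_anchor_new, b_anchor_ref)

# A non-anchor item from the new form (hypothetical a and b values).
a_star, b_star = to_reference_scale(1.2, 0.3, A, B)

# The same examinee expressed on both scales gets the same probability,
# which is what lets scores from the two forms be compared.
theta_new = 0.8
theta_ref = A * theta_new + B
p_before = p_correct(theta_new, 0.3, 1.2)
p_after = p_correct(theta_ref, b_star, a_star)

# Items with different discrimination respond differently to the same
# ability gap, which a single-slope Rasch model cannot represent.
p_rasch = p_correct(0.5, 0.0)
p_steep = p_correct(0.5, 0.0, a=2.0)
```

Because the anchor items appear in the operational form itself, a design of this kind does not depend on students taking a separate control paper seriously, which is the concern raised under assumption 1).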

Key words: College English Test, Item Response Theory, score equating

CLC Number: B841

Zhu Zhengcai. (2005). A STUDY OF SCORE EQUATING IN THE COLLEGE ENGLISH TEST: A NEW APPROACH BASED ON “ANCHOR ITEMS” AND TWO-PARAMETER IRT MODEL. , 37(02), 280-284.

