Applying IRT_ΔB Procedure and Adapted LR Procedure to Detect DIF in Tests with Matrix Sampling

doi:10.3724/SP.J.1041.2013.00921

Abstract

Matrix sampling is a technique widely used in large-scale educational assessments. In an assessment with a matrix sampling design, each examinee takes one of several booklets, each containing only a subset of the items. Detecting differential item functioning (DIF) in this setting has received considerable attention in recent years, because the observed total score from an individual booklet is not an appropriate matching variable. Traditional detection methods such as Mantel-Haenszel (MH), SIBTEST, and logistic regression (LR) are therefore not suitable. IRT_Δb may be an alternative because it provides a valid matching variable; however, its DIF classification criterion has not yet been well established. The purposes of this study were: 1) to investigate the efficiency and robustness of using ability parameters obtained from an Item Response Theory (IRT) model as the matching variable, compared with using the traditional observed raw total score; 2) to identify the factors that influence the DIF detection performance of the two methods; and 3) to propose a DIF classification criterion for IRT_Δb. Both simulated and empirical data were used to examine the robustness and efficiency of two DIF detection methods: the IRT_Δb method and the adapted LR method, which uses the IRT-based estimate of group-level ability as the matching variable. In the Monte Carlo study, a matrix sampling test was generated and the following conditions were varied: 1) proportion of DIF items; 2) true examinee ability distribution; 3) sample size; 4) DIF magnitude. The two DIF detection methods were then applied and their results compared. In addition, power functions were established to derive a DIF classification rule for IRT_Δb from the current rules for LR.
In the empirical study, a DIF analysis of the American and Korean mathematics tests from the Programme for International Student Assessment (PISA) 2003 was conducted to further examine the consistency of the classification rules for IRT_Δb and LR. The results indicated that, in the matrix sampling design, both the IRT_Δb method and the adapted LR method were sensitive to DIF magnitude. The power, Type I error, and final classification of both methods were also influenced by sample size, the percentage of items with DIF, and the ability difference between the focal group and the reference group. In conclusion, both the IRT_Δb method and the adapted LR method can be used to detect DIF in matrix sampling tests. A classification rule for IRT_Δb was proposed: a cut-off of 0.85 between negligible DIF (A) and intermediate DIF (B), and 1.23 between intermediate DIF (B) and large DIF (C). Researchers are advised to treat this rule as tentative, because ΔR² was confined to a narrow interval and the LR classification rule is very flexible compared with the MH classification rule. Further studies could consider MH, IRT_Δb, and LR simultaneously to yield more comparable and consistent classification rules across methods.
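
The proposed cut-offs can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `classify_dif`, the use of the absolute value of Δb, and the treatment of values falling exactly on a cut-off (assigned to the higher category) are all assumptions, since the abstract gives only the two thresholds.

```python
def classify_dif(delta_b, ab_cut=0.85, bc_cut=1.23):
    """Map an IRT_Δb estimate to a DIF category (A, B, or C).

    Assumptions not stated in the abstract:
    - the rule is applied to the absolute value of Δb;
    - a value exactly on a cut-off goes to the higher category.
    """
    size = abs(delta_b)
    if size < ab_cut:
        return "A"  # negligible DIF
    if size < bc_cut:
        return "B"  # intermediate DIF
    return "C"      # large DIF


if __name__ == "__main__":
    # Hypothetical Δb values, for illustration only.
    for value in (0.30, -1.00, 1.50):
        print(value, classify_dif(value))
```

Keeping the cut-offs as keyword arguments reflects the abstract's advice to treat the rule as tentative: a different threshold pair can be substituted without changing the function.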

Key words: Differential Item Functioning, Matrix Sampling, Rasch model, Logistic regression

ZHANG Xun;LI Lingyan;LIU Hongyun;SUN Yan. (2013). Applying IRT_ΔB Procedure and Adapted LR Procedure to Detect DIF in Tests with Matrix Sampling. Acta Psychologica Sinica, 45(8), 921-934.

