%A LIU Hong-Yun
%A LI Chong
%A ZHANG Ping-Ping
%A LUO Fang
%T Testing Measurement Equivalence of Categorical Items’ Threshold/Difficulty Parameters: A Comparison of CCFA and (M)IRT Approaches
%0 Journal Article
%D 2012
%J Acta Psychologica Sinica
%R
%P 1124-1136
%V 44
%N 8
%U https://journal.psych.ac.cn/xlxb/CN/abstract/article_1177.shtml
%8 2012-08-28
%X Multiple-group confirmatory factor analysis and differential item functioning (DIF) testing based on unidimensional or multidimensional item response theory are the two most commonly used approaches for assessing the measurement equivalence of categorical items. Unlike traditional linear factor analysis, multiple-group categorical confirmatory factor analysis (CCFA) can appropriately model categorical measures with a threshold structure, which is comparable to the difficulty parameters in multidimensional IRT [(M)IRT]. In this study, we compared the CCFA and (M)IRT approaches in terms of their power to detect violations of measurement invariance (i.e., DIF) using the Monte Carlo method. Moreover, given the limitations of the assumptions underlying the traditional unidimensional IRT model, this study extended the DIF testing method to the (M)IRT model. Simulation studies under both unidimensional and multidimensional conditions were conducted to compare the DIFFTEST method, the IRT-LR method (for unidimensional scales), and the MIRT-MG method (for multidimensional scales) with respect to their power to detect a lack of invariance across groups. Results indicated that all three methods, namely DIFFTEST, IRT-LR, and MIRT-MG, showed reasonable power to identify measurement non-equivalence when the threshold difference was large. For unidimensional scales, the IRT-LR test demonstrated greater power than DIFFTEST, whereas for multidimensional scales the results were not completely consistent across conditions. The power of MIRT-MG was higher than that of DIFFTEST when the test was long and the correlation between dimensions was high; in contrast, the power of DIFFTEST was higher than that of MIRT-MG when the test was short and the correlations between dimensions were low. For a fixed number of noninvariant items, the power of the DIFFTEST method decreased as test length increased, whereas the power of the IRT-LR and MIRT-MG methods increased with test length. The number of respondents per group (sample size) was found to be one of the most important factors affecting the performance of these three approaches: the power of the DIFFTEST, IRT-LR, and MIRT-MG methods all increased as the sample size increased. For a given total number of observations, the power of all three methods was larger under a balanced design, in which the two groups were equal in size, than under an unbalanced design with unequal group sizes. For the DIFFTEST method, the Type I error rate reached the nominal 5% level, while the IRT-LR and MIRT-MG methods produced much lower Type I error rates.