ISSN 0439-755X
CN 11-1911/B

Institute of Psychology, Chinese Academy of Sciences

• Article •

### Comparison of Methods for Detecting Differential Item Functioning in Cognitive Diagnostic Tests

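1. (1 National Key Laboratory of Cognitive Neuroscience and Learning, Beijing Normal University) (2 National Cooperative Innovation Center for Assessment and Improvement of Basic Education Quality, Beijing 100875)
• Received: 2013-10-22 Published: 2014-12-25 Released: 2014-12-25
• Corresponding author: BIAN Yufang, E-mail: bianyufang66@126.com
• Funding: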

Supported by the Specialized Research Fund for the Doctoral Program of Higher Education (20120003110002).

### Comparison of DIF Detecting Methods in Cognitive Diagnostic Test

WANG Zhuoran1; GUO Lei1; BIAN Yufang1,2

1. (1 National Key Laboratory of Cognitive Neuroscience and Learning, Beijing Normal University, Beijing 100875, China) (2 National Cooperative Innovation Center for Assessment and Improvement of Basic Education Quality, Beijing Normal University, Beijing 100875, China)
• Received: 2013-10-22 Online: 2014-12-25 Published: 2014-12-25
• Contact: BIAN Yufang, E-mail: bianyufang66@126.com

Abstract:

Detecting differential item functioning (DIF) is an important issue when cognitive diagnostic tests are used in practice. The Mantel-Haenszel (MH) method, SIBTEST, and the Wald test have all been applied to DIF detection in cognitive diagnostic tests, but each has limitations. Logistic regression (LR) does not depend on a particular measurement model, performs well in detecting DIF in IRT-based tests, and can distinguish uniform from non-uniform DIF, so it can be expected to compensate for some of the shortcomings of the methods currently used in cognitive diagnostic tests. This study compared the performance of LR with that of the MH method and the Wald test, and also examined the effects of matching criterion, DIF type, DIF size, and sample size.

In the simulation study, data were generated with the HO-DINA model. For the MH method and LR, three kinds of matching criteria were used: the sum score, computed as each examinee's number of correct answers; θ, estimated with the 2PL model; and the knowledge state (KS), estimated with three different cognitive diagnostic methods, namely the DINA model, the rule space method (RSM), and the attribute hierarchy method (AHM). The Wald test was applied directly to the DINA model. Four types of DIF were simulated: the slip parameter s increases; the guessing parameter g increases; s and g increase simultaneously; and s increases while g decreases. DIF size had two levels (0.05 and 0.1), and sample size had two levels (500 and 1000 examinees per group).

The results were as follows: (1) LR performed well in DIF detection for cognitive diagnostic tests, with high power and a low Type I error rate; (2) LR is not tied to a particular cognitive diagnostic model, so it can use KS estimated by any cognitive diagnostic method; (3) LR can distinguish uniform from non-uniform DIF, with good power and Type I error rates; (4) using KS as the matching criterion yields ideal power and Type I error rates; (5) as DIF size and sample size increased, power grew significantly while the Type I error rate did not change.

In conclusion, LR performs satisfactorily in DIF detection for cognitive diagnostic tests, with high power and a low, stable Type I error rate, and KS is the ideal matching criterion. In the long run, the characteristics of DIF that are specific to cognitive diagnostic tests should be explored, and DIF detection methods targeted at them should be developed.
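To make the idea of separating uniform from non-uniform DIF with LR concrete, the following is a minimal sketch and not the authors' procedure. It assumes a single DINA item, matches examinees on the true mastery indicator η (an idealized stand-in for an estimated KS), and uses expected cell counts instead of random draws so the output is deterministic. The function names (`dina_prob`, `cells`, `lr_dif`, and so on), the baseline s = g = 0.2, and the 250 examinees per cell are illustrative assumptions. Uniform and non-uniform DIF are tested with nested likelihood-ratio tests, which is one common formulation of LR-based DIF detection.

```python
import math

def dina_prob(eta, s, g):
    """DINA response probability: 1 - s for masters (eta = 1), g otherwise."""
    return 1.0 - s if eta else g

def cells(s_ref, g_ref, s_foc, g_foc, n=250.0):
    """Expected counts as (eta, group, response, count) records, with n
    examinees in every eta-by-group cell (an assumed balanced design)."""
    out = []
    for eta in (0, 1):
        for grp, (s, g) in enumerate([(s_ref, g_ref), (s_foc, g_foc)]):
            p = dina_prob(eta, s, g)
            out += [(eta, grp, 1, n * p), (eta, grp, 0, n * (1.0 - p))]
    return out

def solve(a, b):
    """Solve the small linear system a x = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def fit_loglik(rows, y, w, iters=30):
    """Fit a weighted logistic regression by Newton-Raphson and return the
    maximized log-likelihood."""
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for x, yi, wi in zip(rows, y, w):
            p = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, x))))
            for i in range(k):
                grad[i] += wi * (yi - p) * x[i]
                for j in range(k):
                    hess[i][j] += wi * p * (1.0 - p) * x[i] * x[j]
        beta = [b + d for b, d in zip(beta, solve(hess, grad))]
    ll = 0.0
    for x, yi, wi in zip(rows, y, w):
        z = sum(b * v for b, v in zip(beta, x))
        ll += wi * (yi * z - math.log(1.0 + math.exp(z)))
    return ll

def chi2_sf_1df(stat):
    """Upper-tail probability of a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(max(stat, 0.0) / 2.0))

def lr_dif(data):
    """Return (p_uniform, p_nonuniform) from nested likelihood-ratio tests:
    match only -> + group (uniform) -> + match x group (non-uniform)."""
    y = [d[2] for d in data]
    w = [d[3] for d in data]
    m0 = [[1.0, d[0]] for d in data]
    m1 = [[1.0, d[0], d[1]] for d in data]
    m2 = [[1.0, d[0], d[1], d[0] * d[1]] for d in data]
    ll0, ll1, ll2 = (fit_loglik(m, y, w) for m in (m0, m1, m2))
    return chi2_sf_1df(2.0 * (ll1 - ll0)), chi2_sf_1df(2.0 * (ll2 - ll1))

if __name__ == "__main__":
    base, d = 0.2, 0.1  # assumed baseline s = g and DIF size
    scenarios = {
        "no DIF": (base, base),
        "s up": (base + d, base),
        "g up": (base, base + d),
        "s up, g up": (base + d, base + d),
        "s up, g down": (base + d, base - d),
    }
    for name, (s_f, g_f) in scenarios.items():
        p_u, p_n = lr_dif(cells(base, base, s_f, g_f))
        print(f"{name:14s} uniform p = {p_u:.4f}  non-uniform p = {p_n:.4f}")
```

Under these assumed settings, "s up, g down" lowers the focal group's success probability for both masters and non-masters, so it registers mainly as uniform DIF. In contrast, "s up, g up" moves masters and non-masters in opposite directions and registers as non-uniform DIF. In a real analysis, the matching variable would be an estimated sum score, θ, or KS, and the counts would come from observed responses.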