ISSN 0439-755X
CN 11-1911/B
Sponsored by: Chinese Psychological Society
   Institute of Psychology, Chinese Academy of Sciences
Published by: Science Press

Acta Psychologica Sinica ›› 2014, Vol. 46 ›› Issue (12): 1923-1932. doi: 10.3724/SP.J.1041.2014.01923

• Article •

Comparison of DIF Detecting Methods in Cognitive Diagnostic Test

WANG Zhuoran1; GUO Lei1; BIAN Yufang1,2

  1. (1 National Key Laboratory of Cognitive Neuroscience and Learning, Beijing Normal University, Beijing 100875, China) (2 National Cooperative Innovation Center for Assessment and Improvement of Basic Education Quality, Beijing Normal University, Beijing 100875, China)
  • Received: 2013-10-22 Online: 2014-12-25 Published: 2014-12-25
  • Contact: BIAN Yufang, E-mail: bianyufang66@126.com
  • Supported by: Specialized Research Fund for the Doctoral Program of Higher Education (20120003110002)

Abstract:

Detecting differential item functioning (DIF) is an important issue when cognitive diagnostic tests are used in practice. The Mantel-Haenszel (MH) method, the SIBTEST method, and the Wald test have already been introduced into DIF detection for cognitive diagnostic tests, but each has limitations. Logistic regression (LR) is not tied to a particular measurement model, performs well in detecting DIF in IRT-based tests, and can distinguish uniform from non-uniform DIF, so it can be expected to compensate for some of the shortcomings of the methods currently used in cognitive diagnostic tests. This study therefore introduced LR into DIF detection for cognitive diagnostic tests and compared its performance with that of the MH method and the Wald test. The effects of matching criterion, DIF type, DIF size, and sample size were also examined.

In the simulation study, data were generated with the HO-DINA model. When DIF was detected with the MH method and LR, three kinds of matching criteria were used: the sum score, computed as the number of correct answers for each examinee; θ, estimated with the 2PL model; and the knowledge state (KS), estimated with three cognitive diagnostic methods (the DINA model, the RSM method, and the AHM method). The Wald test was applied directly to the DINA model. Four kinds of DIF were simulated in terms of the slip (s) and guessing (g) parameters: s increases; g increases; s and g increase simultaneously; and s increases while g decreases. DIF size had two levels, 0.05 and 0.1, and sample size had two levels, 500 and 1000 examinees per group.

The results were as follows: (1) LR performed well in DIF detection for cognitive diagnostic tests, with high power and a low type I error rate; (2) LR was not constrained by the cognitive diagnostic model, so it can use KS estimated by any of the cognitive diagnostic methods; (3) LR distinguished uniform DIF from non-uniform DIF, with fairly high power and a fairly low type I error rate; (4) using KS as the matching criterion yielded satisfactory power and type I error rates; (5) power increased significantly with DIF size and sample size, while the type I error rate did not change.

LR performs satisfactorily in DIF detection for cognitive diagnostic tests, with high power and a low, stable type I error rate. KS is recommended as the matching criterion for DIF detection in cognitive diagnostic tests. In the long run, the distinctive characteristics of DIF in cognitive diagnostic tests should be explored, and DIF detection methods tailored to such tests should be developed.
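
The distinction between uniform and non-uniform DIF that LR can draw is usually expressed through three nested logistic models. The block below is a sketch of that standard formulation, not a statement of the exact specification used in the study: the symbols $X_i$ (the matching variable: sum score, θ, or some numeric coding of KS), $G_i$ (group indicator), and the β coefficients are notation assumed here, and the abstract does not say how KS was coded or which test statistic was used.

```latex
% Assumed notation: Y_{ij} = 1 if examinee i answers item j correctly,
% X_i = matching variable, G_i = 0 (reference group) or 1 (focal group).
\begin{align}
M_0:\quad \operatorname{logit} P(Y_{ij}=1) &= \beta_0 + \beta_1 X_i \\
M_1:\quad \operatorname{logit} P(Y_{ij}=1) &= \beta_0 + \beta_1 X_i + \beta_2 G_i \\
M_2:\quad \operatorname{logit} P(Y_{ij}=1) &= \beta_0 + \beta_1 X_i + \beta_2 G_i + \beta_3 X_i G_i
\end{align}
% Uniform DIF:     \beta_2 \neq 0 with \beta_3 = 0 (compare M_1 against M_0).
% Non-uniform DIF: \beta_3 \neq 0                  (compare M_2 against M_1).
% One common choice is the likelihood-ratio statistic between nested models,
\begin{equation}
G^2 = -2\left(\ln L_{\text{reduced}} - \ln L_{\text{full}}\right),
\end{equation}
% referred to a chi-square distribution with degrees of freedom equal to the
% number of added parameters (1 per step when X_i is a single numeric variable).
```

If KS enters as a categorical variable rather than a single numeric score, $\beta_1$ and $\beta_3$ become vectors and the degrees of freedom change accordingly.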

Key words: cognitive diagnosis, differential item functioning, logistic regression, uniform DIF, non-uniform DIF
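
The four DIF conditions described in the abstract can be illustrated with a small Python sketch of the DINA item response function, in which DIF is introduced by shifting the focal group's s and g. The function names, the baseline values s = g = 0.2, and the example Q-matrix row are hypothetical choices made for illustration; only the four DIF patterns and the DIF sizes (0.05 and 0.1) come from the abstract.

```python
def dina_prob(alpha, q_row, s, g):
    """P(correct) under the DINA model for one examinee and one item.

    alpha: 0/1 attribute mastery pattern (the knowledge state)
    q_row: 0/1 attributes the item requires (one row of the Q-matrix)
    """
    masters_all = all(a >= q for a, q in zip(alpha, q_row))
    return 1.0 - s if masters_all else g


# Sign patterns (change in s, change in g) for the four DIF types in the abstract.
DIF_TYPES = {
    "s_up": (1, 0),
    "g_up": (0, 1),
    "s_up_g_up": (1, 1),
    "s_up_g_down": (1, -1),
}


def focal_params(s, g, dif_type, size):
    """Item parameters for the focal group after adding DIF of the given size."""
    ds, dg = DIF_TYPES[dif_type]
    return s + ds * size, g + dg * size


def dif_profile(q_row, s, g, dif_type, size):
    """Focal-minus-reference change in P(correct) for (full masters, non-masters)."""
    s_f, g_f = focal_params(s, g, dif_type, size)
    master = [1] * len(q_row)
    non_master = [0] * len(q_row)  # assumes the item requires at least one attribute
    d_master = dina_prob(master, q_row, s_f, g_f) - dina_prob(master, q_row, s, g)
    d_non = dina_prob(non_master, q_row, s_f, g_f) - dina_prob(non_master, q_row, s, g)
    return d_master, d_non


if __name__ == "__main__":
    # Hypothetical item requiring the first two of three attributes.
    for name in DIF_TYPES:
        d_m, d_n = dif_profile([1, 1, 0], s=0.2, g=0.2, dif_type=name, size=0.1)
        print(f"{name:12s} masters {d_m:+.2f}  non-masters {d_n:+.2f}")
```

In this sketch, `s_up_g_down` lowers the success probability for both masters and non-masters, whereas `s_up_g_up` lowers it for masters and raises it for non-masters. Whether the study labels these patterns as uniform and non-uniform DIF respectively is not stated in the abstract.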