Applying IRT_ΔB Procedure and Adapted LR Procedure to Detect DIF in Tests with Matrix Sampling
ZHANG Xun; LI Lingyan; LIU Hongyun; SUN Yan
(1 National Key Laboratory of Cognitive Neuroscience and Learning, Beijing Normal University, Beijing 100875, China) (2 School of Psychology, Beijing Normal University, Beijing 100875, China)
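Summary: A matrix sampling test consists of multiple booklets, and the total score on a single booklet cannot be used directly as the matching variable for DIF detection. Using simulated data, this study first applied both the IRT_Δb method and an LR method modified to use examinee ability estimated from an IRT model as the matching variable to detect DIF in a matrix sampling test, and analyzed the effectiveness of the two methods and the factors that influence it. Classification criteria for the IRT_Δb method were then derived from the existing DIF criteria for the LR method, and the results were finally verified with empirical data. The results showed that, in matrix sampling tests, both the IRT_Δb method and the modified LR method distinguished well between items with different amounts of DIF; sample size, the proportion of DIF items in a booklet, and the difference in true ability between examinee groups all had considerable effects on the power, Type I error rate, and classification results of both methods.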
Keywords: matrix sampling test; differential item functioning; Rasch model; logistic regression
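The IRT_Δb approach named in the summary above can be illustrated with a minimal Python sketch. It is not the authors' procedure: it assumes a Rasch model, treats each examinee's ability estimate as known, estimates one item's difficulty separately in the reference and focal groups, and takes Δb as the difference between the two estimates. The function names (`rasch_prob`, `estimate_difficulty`, `delta_b`), the bisection search range, and the sign convention (focal minus reference) are illustrative assumptions.

```python
import math


def rasch_prob(theta, b):
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def estimate_difficulty(thetas, responses, lo=-10.0, hi=10.0, tol=1e-10):
    """Maximum-likelihood item difficulty with abilities treated as known.

    Solves sum(P(theta_i, b)) == sum(x_i) by bisection; the left-hand side
    decreases monotonically in b. The bounds lo/hi are an assumed search range.
    """
    total = sum(responses)
    if total == 0 or total == len(responses):
        raise ValueError("no finite estimate when all responses are identical")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        expected = sum(rasch_prob(t, mid) for t in thetas)
        if expected > total:
            lo = mid  # model predicts too many correct answers: raise b
        else:
            hi = mid  # model predicts too few (or exactly enough): lower b
    return (lo + hi) / 2.0


def delta_b(ref_thetas, ref_responses, foc_thetas, foc_responses):
    """Focal-group difficulty minus reference-group difficulty (assumed sign)."""
    b_ref = estimate_difficulty(ref_thetas, ref_responses)
    b_foc = estimate_difficulty(foc_thetas, foc_responses)
    return b_foc - b_ref
```

Under this convention, a positive value means the item is harder for the focal group than for reference-group examinees of the same ability. In a real matrix sampling analysis the abilities would themselves be IRT estimates placed on a common scale across booklets, which this sketch takes as given.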
Abstract: Matrix sampling is widely used in large-scale educational assessments. In an assessment with a matrix sampling design, each examinee takes one of several booklets, each containing only a subset of the items. Detecting differential item functioning (DIF) in this setting has attracted considerable attention in recent years, because the observed total score on an individual booklet is not an appropriate matching variable for DIF detection. Traditional detection methods such as Mantel-Haenszel (MH), SIBTEST, and logistic regression (LR) are therefore not suitable. IRT_Δb may be an alternative because it can provide a valid matching variable. However, a DIF classification criterion for IRT_Δb has not yet been well established. The purposes of this study were therefore: 1) to investigate the efficiency and robustness of using ability parameters obtained from an item response theory (IRT) model as the matching variable, compared with using traditional observed raw total scores; 2) to identify the factors that influence the two methods' ability to detect DIF; 3) to propose DIF classification criteria for IRT_Δb. Both simulated and empirical data were used to explore the robustness and efficiency of two DIF detection methods: the IRT_Δb method, and the adapted LR method that uses the estimate of group-level ability based on an IRT model as the matching variable. In the Monte Carlo study, a matrix sampling test was generated and the following experimental conditions were varied: 1) the proportion of DIF items; 2) the true ability distributions of the examinee groups; 3) the sample size; 4) the size of DIF. The two DIF detection methods were then applied and their results compared. In addition, power functions were established in order to derive a DIF classification rule for IRT_Δb from the current rules for LR.
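The Monte Carlo design described above can be illustrated with a minimal Python sketch. It is not the authors' simulation code: the booklet layout, item difficulties, group sizes, and the uniform DIF shift added to the focal group's difficulty are hypothetical values chosen for illustration, and the function name `simulate_matrix_sample` is made up here.

```python
import math
import random


def simulate_matrix_sample(booklets, difficulties, dif_shift, n_per_group,
                           focal_mean=0.0, seed=0):
    """Simulate Rasch responses where each examinee sees only one booklet.

    booklets:     list of lists of item ids (hypothetical layout)
    difficulties: dict item id -> Rasch difficulty for the reference group
    dif_shift:    dict item id -> amount added to the difficulty for the
                  focal group (uniform DIF); unlisted items have no DIF
    focal_mean:   mean ability of the focal group (reference mean is 0)
    """
    rng = random.Random(seed)
    records = []
    for group, mean in (("reference", 0.0), ("focal", focal_mean)):
        for i in range(n_per_group):
            theta = rng.gauss(mean, 1.0)
            booklet_id = i % len(booklets)  # rotate booklets evenly
            responses = {}
            for item in booklets[booklet_id]:
                b = difficulties[item]
                if group == "focal":
                    b += dif_shift.get(item, 0.0)
                p = 1.0 / (1.0 + math.exp(-(theta - b)))
                responses[item] = 1 if rng.random() < p else 0
            records.append({"group": group, "booklet": booklet_id,
                            "theta": theta, "responses": responses})
    return records


# Hypothetical two-booklet design; item "i3" appears in both booklets.
booklets = [["i1", "i2", "i3"], ["i3", "i4", "i5"]]
difficulties = {"i1": -1.0, "i2": -0.5, "i3": 0.0, "i4": 0.5, "i5": 1.0}
data = simulate_matrix_sample(booklets, difficulties, {"i4": 0.6},
                              n_per_group=500, focal_mean=-0.5, seed=1)
```

Varying `dif_shift`, `focal_mean`, and `n_per_group` corresponds loosely to the DIF size, ability distribution, and sample size conditions listed in the abstract; the proportion of DIF items is controlled by how many items receive a shift.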
In the empirical study, a DIF analysis of the American and Korean mathematics test data from the Programme for International Student Assessment (PISA) 2003 was conducted to further examine the consistency of the classification rules between IRT_Δb and LR. The results indicated that, in the matrix sampling design, both the IRT_Δb method and the adjusted LR method were sensitive to differences in DIF magnitude. The power, Type I error, and final classification of both methods were also influenced by the sample size, the percentage of items with DIF, and the ability difference between the focal group and the reference group. In conclusion, both the IRT_Δb method and the adjusted LR method can be used to detect DIF in matrix sampling tests. A classification rule for IRT_Δb was proposed: 0.85 as the cutoff between negligible DIF (A) and intermediate DIF (B), and 1.23 as the cutoff between intermediate DIF (B) and large DIF (C). Researchers are advised to treat this rule as tentative, because ΔR² was confined to a narrow interval and the classification rule for LR is very flexible compared with that for MH. Further studies could consider MH, IRT_Δb, and LR simultaneously to provide more comparable and consistent classification rules across methods.
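The proposed cutoffs can be written as a small Python helper. This is a sketch under assumptions: the abstract gives only the two cutoff values (0.85 and 1.23), so applying them to the absolute value of Δb and assigning a value that falls exactly on a cutoff to the higher category are choices made here, not rules stated in the source. The function name `classify_delta_b` is hypothetical.

```python
def classify_delta_b(delta_b, ab_cutoff=0.85, bc_cutoff=1.23):
    """Map a Δb estimate to DIF category A, B, or C.

    Assumptions (not stated in the abstract): the cutoffs apply to |Δb|,
    and a value exactly on a cutoff goes to the higher category.
    """
    size = abs(delta_b)
    if size < ab_cutoff:
        return "A"  # negligible DIF
    if size < bc_cutoff:
        return "B"  # intermediate DIF
    return "C"      # large DIF


for value in (0.30, -1.00, 1.50):
    print(value, classify_delta_b(value))
```

Keeping the cutoffs as keyword arguments makes it easy to substitute other values, which fits the abstract's advice to treat the rule as tentative.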
Key words: Differential Item Functioning; Matrix Sampling; Rasch model; Logistic regression
Received: 2011-08-02      Published: 2013-08-25
Corresponding author: LI Lingyan
Cite this article: ZHANG Xun; LI Lingyan; LIU Hongyun; SUN Yan. Applying IRT_ΔB Procedure and Adapted LR Procedure to Detect DIF in Tests with Matrix Sampling. Acta Psychologica Sinica, 2013, 45(8): 921-934. doi: 10.3724/SP.J.1041.2013.00921