ISSN 0439-755X
CN 11-1911/B
Sponsored by: Chinese Psychological Society
   Institute of Psychology, Chinese Academy of Sciences
Published by: Science Press

Acta Psychologica Sinica ›› 2016, Vol. 48 ›› Issue (7): 903-914. doi: 10.3724/SP.J.1041.2016.00903

• Article •

New item selection strategies in cognitive diagnostic computerized adaptive testing: Combining item discrimination indices

GUO Lei (郭磊)1,2,3; ZHENG Chanjin (郑蝉金)4; BIAN Yufang (边玉芳)5; SONG Naiqing (宋乃庆)3,6; XIA Lingxiang (夏凌翔)1

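  1. (1 Faculty of Psychology, Southwest University, Chongqing 400715) (2 Postdoctoral Research Center for Statistics, Southwest University, Chongqing 400715) (3 Southwest University Branch, Collaborative Innovation Center of Assessment toward Basic Education Quality, Chongqing 400715) (4 School of Psychology, Jiangxi Normal University, Nanchang 330022) (5 Collaborative Innovation Center of Assessment toward Basic Education Quality, Beijing Normal University, Beijing 100875) (6 Center for Basic Education Research, Southwest University, Chongqing 400715)
  • Received: 2014-10-08 Online: 2016-07-25 Published: 2016-07-25
  • Corresponding author: GUO Lei, E-mail: happygl1229@swu.edu.cn
  • Funding:

    Supported by the Fundamental Research Funds for the Central Universities (grant no. SWU1409433), the Humanities and Social Sciences Youth Foundation of the Ministry of Education (grant no. 15YJC190003), and the research fund of the Laboratory of Self-Supporting Personality and Community Psychology (PI).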

New item selection methods in cognitive diagnostic computerized adaptive testing: Combining item discrimination indices

GUO Lei1,2,3; ZHENG Chanjin4; BIAN Yufang5; SONG Naiqing3,6; XIA Lingxiang1   

  1. (1 Faculty of Psychology, Southwest University, Chongqing 400715, China) (2 Postdoctoral Research Center for Statistics, Southwest University, Chongqing 400715, China) (3 Southwest University Branch, Collaborative Innovation Center of Assessment toward Basic Education Quality, Chongqing 400715, China) (4 School of Psychology, Jiangxi Normal University, Nanchang 330022, China) (5 Collaborative Innovation Center of Assessment toward Basic Education Quality, Beijing Normal University, Beijing 100875, China) (6 Center for Basic Education Research, Southwest University, Chongqing 400715, China)
  • Received: 2014-10-08 Online: 2016-07-25 Published: 2016-07-25
  • Contact: GUO Lei, E-mail: happygl1229@swu.edu.cn

Abstract:

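Most current research on cognitive diagnostic computerized adaptive testing (CD-CAT), in China and abroad, uses PWKL as the item selection strategy. PWKL weights the KL index by posterior distribution information, which improves the correct classification rate, but it weights with examinee-level information only and ignores the information that the items themselves can provide, so it is a single-source index. This study used item discrimination information from cognitive diagnosis to modify PWKL and proposed four new multiple-source item selection strategies: GIDPWKL, AIDPWKL, CIDPWKL, and KLEDPWKL. These were compared with PWKL and the mutual information method (MIM) under exposure control. The simulation results showed that: (1) In the vast majority of fixed-length conditions, the shorter the test, the higher the correct classification rates of the new methods. GIDPWKL had the highest average attribute and pattern correct classification rates, followed by AIDPWKL, while the advantages of CIDPWKL, KLEDPWKL, and MIM varied with the experimental conditions. (2) In the vast majority of fixed-length conditions, the higher the item quality, the more pronounced the advantage of the new methods. (3) The complexity of the Q-matrix structure affects the performance of the different item selection strategies. (4) In the variable-length test, the average test length of the four new methods and MIM was shorter than that of PWKL, and GIDPWKL performed best. Therefore, if the actual testing scenario resembles the scenarios simulated in this study, the GIDPWKL method is recommended.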

Keywords: cognitive diagnostic computerized adaptive testing, item selection strategy, item discrimination, exposure control
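The posterior weighting that PWKL applies to the KL index can be illustrated with a small Python sketch. It uses the commonly cited form of the index, in which the KL divergence between the response distribution at the current profile estimate and at each candidate profile is weighted by that profile's posterior probability, together with the DINA response function. The function names `dina_prob` and `pwkl`, the three-item Q matrix, and the slip and guess values are illustrative assumptions, not taken from the paper. Exposure control and the four new indices are not reproduced here, since their formulas are not given on this page.

```python
import itertools
import numpy as np


def dina_prob(profiles, q, slip, guess):
    """Return P(X_j = 1 | alpha_c) under the DINA model, shape (C, J)."""
    # eta[c, j] is 1 when profile c has every attribute that item j requires.
    eta = (profiles @ q.T == q.sum(axis=1)).astype(float)
    return eta * (1.0 - slip) + (1.0 - eta) * guess


def pwkl(p_correct, posterior, c_hat):
    """Return the PWKL index of every item, shape (J,)."""
    p_hat = p_correct[c_hat]  # success probabilities at the current estimate
    # KL divergence between the response distribution at the estimate and at
    # each candidate profile, summed over the two outcomes x = 1 and x = 0.
    # Slip and guess must lie strictly between 0 and 1 to keep the logs finite.
    kl = (p_hat * np.log(p_hat / p_correct)
          + (1.0 - p_hat) * np.log((1.0 - p_hat) / (1.0 - p_correct)))
    # Weight each profile's KL term by its posterior probability.
    return posterior @ kl


# Toy setup (all values are illustrative assumptions).
K = 2
profiles = np.array(list(itertools.product([0, 1], repeat=K)))  # 4 profiles
q = np.array([[1, 0], [1, 0], [1, 1]])  # Q matrix: 3 items by 2 attributes
slip = np.array([0.1, 0.3, 0.1])
guess = np.array([0.1, 0.3, 0.1])

posterior = np.full(len(profiles), 1.0 / len(profiles))  # flat posterior
c_hat = int(np.argmax(posterior))  # current profile estimate (MAP)

index = pwkl(dina_prob(profiles, q, slip, guess), posterior, c_hat)
next_item = int(np.argmax(index))  # item with the largest PWKL value
```

In this toy bank the first two items measure the same attribute but differ in slip and guess, so the comparison of `index[0]` and `index[1]` shows how item quality alone changes the value PWKL assigns to an item.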

Abstract:

Interest in developing computerized adaptive testing (CAT) under cognitive diagnostic models has increased recently. Cognitive diagnostic computerized adaptive testing (CD-CAT) attempts to classify examinees into the correct latent class profile so as to pinpoint each examinee’s strengths and weaknesses, while the CAT algorithm chooses items from the item bank to achieve that goal as efficiently as possible. Most CD-CAT research uses the posterior-weighted Kullback-Leibler (PWKL) index because of its high efficiency. The PWKL index integrates the posterior probabilities of examinees’ latent class profiles into the KL information and thus improves item selection efficiency considerably. However, it uses only examinee-based information to assess the relative importance of each latent class profile, so in this sense it is a single-source index. The current study attempted to take advantage of not only the examinee-based information but also the item-based information that can be readily obtained from the items. This paper introduces four new multiple-source item selection methods, GIDPWKL, AIDPWKL, CIDPWKL, and KLEDPWKL, which modify the PWKL index by incorporating item discrimination information. Two simulation studies were conducted to evaluate the efficiency of the new methods against the PWKL index and the mutual information (MI) index under the DINA model with exposure control. Three factors were investigated: Q-matrix structure (simple vs. complex), item quality (high vs. low), and test length (moderate vs. short). The simulation results indicated that: (1) In most fixed-length conditions, the shorter the test, the higher the average attribute correct classification rate (AACCR) and pattern correct classification rate (PCCR) of the four new methods. The GIDPWKL index had the highest AACCR and PCCR among the six methods, followed by the AIDPWKL index. The relative performance of the CIDPWKL, KLEDPWKL, and MI indices depended on the experimental conditions. (2) In most fixed-length conditions, the higher the item quality, the greater the advantage of the four new methods. (3) The structure of the Q matrix affected the performance of the different item selection methods. (4) In the variable-length test, the mean test length across all examinees was shorter for the four new methods and the MI method than for the PWKL method. Overall, the GIDPWKL index performed best and is recommended in practice when the testing scenario is similar to those simulated here.

Key words: cognitive diagnostic computerized adaptive testing, item selection strategy, item discrimination, exposure control
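The two accuracy criteria named in the abstract, the average attribute correct classification rate and the pattern correct classification rate, can be sketched in Python as follows. This assumes the usual definitions: the attribute-level rate is the proportion of examinee-by-attribute entries classified correctly, and the pattern-level rate is the proportion of examinees whose whole attribute profile is recovered. The function name `classification_rates` and the four-examinee data are hypothetical and serve only as an illustration.

```python
import numpy as np


def classification_rates(true_profiles, est_profiles):
    """Return (attribute-level rate, pattern-level rate) for N examinees."""
    true_profiles = np.asarray(true_profiles)
    est_profiles = np.asarray(est_profiles)
    match = true_profiles == est_profiles  # shape (N, K), True where correct
    # Attribute-level rate: per-attribute accuracy averaged over attributes.
    aaccr = match.mean(axis=0).mean()
    # Pattern-level rate: share of examinees with every attribute correct.
    pccr = match.all(axis=1).mean()
    return float(aaccr), float(pccr)


# Hypothetical true and estimated profiles: 4 examinees, 3 attributes.
true_alpha = [[1, 0, 1], [0, 0, 1], [1, 1, 0], [0, 1, 1]]
est_alpha = [[1, 0, 1], [0, 1, 1], [1, 1, 0], [0, 1, 0]]

aaccr, pccr = classification_rates(true_alpha, est_alpha)
```

The pattern-level rate can never exceed the attribute-level rate, because a single misclassified attribute makes the whole profile count as wrong.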