Classification accuracy and consistency indices for complex decision rules in multidimensional item response theory

doi:10.3724/SP.J.1041.2016.01612

Acta Psychologica Sinica, 2016, Vol. 48, Issue (12): 1612-1624. doi: 10.3724/SP.J.1041.2016.01612

WANG Wenyi¹; SONG Lihong²; DING Shuliang¹

(¹College of Computer Information Engineering; ²Elementary Educational College, Jiangxi Normal University, Nanchang 330022, China)

Received: 2015-10-24 Online: 2016-12-24 Published: 2016-12-24
Contact: SONG Lihong, E-mail: viviansong1981@163.com.
Funding:
This work was supported by the National Natural Science Foundation of China (31500909, 31360237, 31160203, 30860084), the Ministry of Education Key Project of the National Education Science Planning (DHA150285), the Youth Fund for Humanities and Social Sciences Research of the Ministry of Education (13YJC880060), the Natural Science Foundation of Jiangxi Province (20161BAB212044), the Jiangxi Province Social Science Research "Twelfth Five-Year" (2012) Planning Project (12JY07), the 2013 General Project of Jiangxi Province Education Science (13YB032), the Science and Technology Program of the Jiangxi Provincial Department of Education (GJJ13207), the China Scholarship Council (201509470001), and the Youth Growth Fund and Doctoral Start-up Fund of Jiangxi Normal University.

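Abstract

Summary (translated from the Chinese abstract):

This paper introduces classification accuracy and classification consistency indices under multidimensional item response theory (MIRT) models, uses the Monte Carlo method to compute the indices under complex decision rules, and proves mathematically that, under a uniform prior and the same decision rule, the two types of estimators of the classification accuracy index converge in probability to the same true value. The results show that the classification accuracy index evaluates the accuracy of classification results fairly accurately; the classification consistency index evaluates the test-retest consistency of classification results well; under certain conditions, indices based on the ability scale outperform indices based on the raw total score; estimation precision remains good even as the number of test dimensions increases; and classification accuracy and consistency are higher as test length and the correlation between dimensions increase. The indices can be used to evaluate classification reliability and validity under a variety of decision rules for criterion-referenced tests or computerized classification tests.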
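The Monte Carlo evaluation of an accuracy index under complex decision rules, as mentioned in the summary above, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each examinee's posterior for θ is approximated by a multivariate normal distribution, and the function names (`conjunctive_pass`, `compensatory_pass`, `mc_accuracy`), the cut scores, and the weights are all hypothetical. An examinee's contribution is the posterior probability of the category assigned from the point estimate, and the index is the average over examinees.

```python
import numpy as np


def conjunctive_pass(theta, cuts):
    """Conjunctive rule: pass only if every dimension reaches its cut score."""
    return np.all(theta >= cuts, axis=-1)


def compensatory_pass(theta, weights, cut):
    """Compensatory rule: pass if the weighted composite reaches one cut score."""
    return theta @ weights >= cut


def mc_accuracy(post_means, post_covs, rule, n_draws=20000, seed=0):
    """Monte Carlo estimate of a posterior-based classification accuracy index.

    Assumption (not from the paper): each examinee's posterior is treated as
    multivariate normal with the given mean vector and covariance matrix.
    """
    rng = np.random.default_rng(seed)
    per_examinee = []
    for mean, cov in zip(post_means, post_covs):
        # Replace the high-dimensional integral by an average over draws.
        draws = rng.multivariate_normal(mean, cov, size=n_draws)
        p_pass = rule(draws).mean()
        # Classify the examinee from the point estimate (posterior mean).
        assigned_pass = bool(rule(mean))
        # Probability that the assigned category is the true category.
        per_examinee.append(p_pass if assigned_pass else 1.0 - p_pass)
    return float(np.mean(per_examinee))


if __name__ == "__main__":
    means = np.array([[0.4, 0.9], [-0.2, 0.1]])   # hypothetical estimates
    covs = np.array([0.2 * np.eye(2), 0.2 * np.eye(2)])
    rule = lambda t: conjunctive_pass(t, np.array([0.0, 0.0]))
    print(mc_accuracy(means, covs, rule))
```

Passing the decision rule as a function keeps the estimator unchanged when the rule switches from conjunctive to compensatory, which is the practical appeal of Monte Carlo evaluation when the region defined by the rule has no closed-form probability.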

Abstract:

For a criterion-referenced test, classification consistency and accuracy indices are important indicators of the reliability and validity of classification results. Several procedures have been proposed to estimate these indices in the framework of unidimensional item response theory (UIRT), based on either total sum scores or latent trait estimates. Although multidimensional item response theory (MIRT) has enjoyed tremendous popularity, most research is based on total sum scores only; Yao (2016) is a case in point. The present authors believe that, under MIRT, decision rules for the two indices should consider both kinds of score, depending on the situation. There are two reasons: (1) classifications from latent trait estimates are equally or more accurate than those from total sum scores, at least for the one- and two-parameter logistic models and the graded response model in UIRT; (2) it may be difficult to estimate the two indices from total sum scores in some content areas when some items measure more than two domains (complex structure). In this study, the Guo-based consistency and accuracy indices were extended to MIRT for complex decision rules. The Monte Carlo method was employed to estimate the Lee- and Guo-based indices, in order to handle intractable summations or high-dimensional integrals. A simulation study was conducted under a multidimensional graded response model (MGRM). The number of factors was manipulated at one, two, and four. Three levels of correlation between pairs of dimensions (ρ = 0.0, ρ = 0.5, and ρ = 0.8) were considered. The examinee sample size was either 1,000 or 3,000. Ability vectors were generated from a multivariate normal distribution with a mean vector of 0 of the appropriate size and covariance matrix Σ, where the diagonal elements of Σ were all 1 and the off-diagonal elements were given by the corresponding correlations.
The test length was 10 or 20 for the one-factor model, 15 or 30 for the two-factor model, and 30 or 60 for the four-factor model. To balance the information for each domain or dimension, content balancing techniques were adopted to ensure that the tests fulfilled the content or domain requirements. The fully crossed design yielded a total of 28 conditions, each replicated 10 times. Simulation results suggested that the Guo-based indices worked well and flexibly: their values closely matched the simulated consistency and accuracy rates for three decision rules, and the difference between the Lee- and Guo-based accuracy indices was much smaller for the decision rule based on the total score, which conformed to the theoretical results. Two practical implications of this research are identified. First, the indices can be used in score interpretation and test construction. Since it is convenient to estimate consistency and accuracy indices for domain scores and composite scores when the true cut scores are set on the θ scale, items can be written to measure a specific dimension whose indices are low. Second, the indices might be useful in developing item selection algorithms in computerized classification testing for making multidimensional classification decisions.
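The ability-generation step and the condition count described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' simulation code: the names `make_sigma`, `draw_abilities`, and `design_conditions` are hypothetical, and the sketch assumes that the correlation factor is not crossed with the one-factor model, which is one reading of the design that reproduces the stated total of 28 conditions.

```python
import itertools

import numpy as np

# Design levels as stated in the abstract.
TEST_LENGTHS = {1: (10, 20), 2: (15, 30), 4: (30, 60)}  # factors -> test lengths
CORRELATIONS = (0.0, 0.5, 0.8)
SAMPLE_SIZES = (1000, 3000)


def make_sigma(n_dims, rho):
    """Covariance matrix with 1 on the diagonal and rho off the diagonal."""
    sigma = np.full((n_dims, n_dims), float(rho))
    np.fill_diagonal(sigma, 1.0)
    return sigma


def draw_abilities(n_examinees, n_dims, rho, seed=0):
    """Draw ability vectors from a multivariate normal with zero mean vector."""
    rng = np.random.default_rng(seed)
    mean = np.zeros(n_dims)
    return rng.multivariate_normal(mean, make_sigma(n_dims, rho), size=n_examinees)


def design_conditions():
    """Enumerate the simulation conditions.

    Assumption: with a single factor there is no between-dimension
    correlation to vary, so rho is recorded as None for that model.
    """
    conditions = []
    for n_dims, lengths in TEST_LENGTHS.items():
        rhos = (None,) if n_dims == 1 else CORRELATIONS
        for rho, length, n in itertools.product(rhos, lengths, SAMPLE_SIZES):
            conditions.append(
                {"n_dims": n_dims, "rho": rho, "test_length": length, "n_examinees": n}
            )
    return conditions


if __name__ == "__main__":
    print(len(design_conditions()))
    theta = draw_abilities(n_examinees=1000, n_dims=4, rho=0.8)
    print(theta.shape)
```

Under this reading the one-factor model contributes 2 × 2 conditions and the two- and four-factor models contribute 3 × 2 × 2 conditions each.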

Key words: multidimensional item response theory, decision rule, classification consistency, classification accuracy, reliability, validity

WANG Wenyi; SONG Lihong; DING Shuliang. (2016). Classification accuracy and consistency indices for complex decision rules in multidimensional item response theory. Acta Psychologica Sinica, 48(12), 1612-1624.
