Acta Psychologica Sinica    2016, Vol. 48 Issue (12) : 1612-1624     DOI: 10.3724/SP.J.1041.2016.01612
Classification accuracy and consistency indices for complex decision rules in multidimensional item response theory
WANG Wenyi¹; SONG Lihong²; DING Shuliang¹
(¹ College of Computer Information Engineering; ² Elementary Educational College, Jiangxi Normal University, Nanchang 330022, China)

For a criterion-referenced test, classification consistency and accuracy indices are important indicators of the reliability and validity of classification results. Several procedures have been proposed to estimate these indices within unidimensional item response theory (UIRT), based on either total sum scores or latent trait estimates. Although multidimensional item response theory (MIRT) has become widely used, most research on these indices under MIRT relies on total sum scores only; Yao (2016) is a case in point. The present authors argue that, under MIRT, decision rules for the two indices should consider both total sum scores and latent trait estimates, depending on the situation, for two reasons: (1) classifications based on latent trait estimates are as accurate as, or more accurate than, those based on total sum scores, at least for the one-parameter and two-parameter logistic models and the graded response model in UIRT; and (2) the two indices may be difficult to estimate from total sum scores in some content areas when some items measure more than two domains (complex structure). In this study, the Guo-based consistency and accuracy indices were extended to MIRT for complex decision rules. A Monte Carlo method was used to estimate the Lee- and Guo-based indices, which would otherwise require intractable summations or high-dimensional integrals. A simulation study was conducted under a multidimensional graded response model (MGRM), with the number of factors set to one, two, or four. Three levels of correlation between pairs of dimensions were considered (ρ = 0.0, 0.5, and 0.8), and the examinee sample size was either 1,000 or 3,000. Ability vectors were generated from a multivariate normal distribution with a zero mean vector of the appropriate length and covariance matrix Σ, whose diagonal elements were all 1 and whose off-diagonal elements were given by the corresponding correlation.
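The ability-generation step can be illustrated with a minimal sketch, assuming NumPy. The function names, the seed, and the example call are hypothetical and are not taken from the article; only the mean vector of 0 and the structure of Σ follow the description above.

```python
import numpy as np


def equicorrelation_matrix(n_dims, rho):
    """Covariance matrix with 1 on the diagonal and rho everywhere else."""
    return np.full((n_dims, n_dims), rho) + (1.0 - rho) * np.eye(n_dims)


def simulate_abilities(n_examinees, n_dims, rho, seed=0):
    """Draw ability vectors from N(0, Sigma); the seed is an arbitrary choice."""
    rng = np.random.default_rng(seed)
    sigma = equicorrelation_matrix(n_dims, rho)
    mean = np.zeros(n_dims)
    return rng.multivariate_normal(mean, sigma, size=n_examinees)


# Hypothetical example: 3,000 examinees, four dimensions, rho = 0.5.
theta = simulate_abilities(n_examinees=3000, n_dims=4, rho=0.5)
```

With a single dimension, Σ reduces to the 1 × 1 matrix [1], so the value of ρ has no effect in that case.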
Test lengths were 10 and 20 items for the one-factor model, 15 and 30 for the two-factor model, and 30 and 60 for the four-factor model. To balance the information provided for each domain or dimension, content balancing techniques were adopted so that the tests met the content or domain requirements. The fully crossed design yielded a total of 28 conditions, each replicated 10 times. Simulation results suggested that the Guo-based indices worked well and flexibly: their values closely matched the simulated consistency and accuracy rates for three decision rules, and the difference between the Lee- and Guo-based accuracy indices was much smaller for the decision rule based on the total score, consistent with the theoretical results. Two practical implications of this research are identified. First, the indices can be used in score interpretation and test construction. Because the indices are convenient to estimate for domain scores and composite scores when the true cut scores are set on the θ scale, new items can be written for specific dimensions whose indices are low. Second, the indices might be useful in developing item selection algorithms for computerized classification testing that makes multidimensional classification decisions.
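The reported total of 28 conditions can be checked with a short enumeration. This is a sketch under one assumption that the abstract does not state explicitly: the one-factor model has no between-dimension correlation to vary, so the three correlation levels are crossed only with the two- and four-factor models. All names below are hypothetical.

```python
from itertools import product

SAMPLE_SIZES = (1000, 3000)
TEST_LENGTHS = {1: (10, 20), 2: (15, 30), 4: (30, 60)}  # factors -> test lengths
CORRELATIONS = (0.0, 0.5, 0.8)


def enumerate_conditions():
    """List every simulation condition as a dict of design factors."""
    conditions = []
    for n_factors, lengths in TEST_LENGTHS.items():
        # Assumption: correlation is not a design factor when there is one dimension.
        rhos = (None,) if n_factors == 1 else CORRELATIONS
        for rho, length, n_examinees in product(rhos, lengths, SAMPLE_SIZES):
            conditions.append(
                {
                    "n_factors": n_factors,
                    "rho": rho,
                    "test_length": length,
                    "n_examinees": n_examinees,
                }
            )
    return conditions


conditions = enumerate_conditions()
```

Under this reading, the one-factor model contributes 2 × 2 = 4 conditions, and the two- and four-factor models each contribute 3 × 2 × 2 = 12, giving 4 + 12 + 12 = 28.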

Keywords: multidimensional item response theory; decision rule; classification consistency; classification accuracy; reliability; validity
Corresponding Authors: SONG Lihong, E-mail:   
Issue Date: 24 December 2016
Cite this article:
WANG Wenyi; SONG Lihong; DING Shuliang. Classification accuracy and consistency indices for complex decision rules in multidimensional item response theory[J]. Acta Psychologica Sinica, 2016, 48(12): 1612-1624.
Copyright © Acta Psychologica Sinica