ISSN 0439-755X
CN 11-1911/B

Institute of Psychology, Chinese Academy of Sciences

25 September 2021, Volume 53 Issue 9
Nonparametric methods for cognitive diagnosis to multiple-choice test items
GUO Lei, ZHOU Wenjie
2021, 53 (9): 1032-1043. doi: 10.3724/SP.J.1041.2021.01032

Cognitive diagnostic assessment (CDA) focuses on evaluating students' strengths and weaknesses in knowledge mastery, providing an opportunity for individualized teaching. CDA has therefore attracted the attention of many scholars, teachers, and students both domestically and overseas. In CDA and in many standardized tests, multiple-choice (MC) items are a typical item type, with the advantages of being free from subjective scoring errors, improving test reliability, being easy to review, allowing rapid scoring, and meeting the needs of content balance. To fulfill the potential of MC items for CDA, researchers proposed MC cognitive diagnosis models (MC-CDMs). However, these MC-CDMs are parametric methods that require a large sample size to obtain accurate parameter estimates; they are not suitable for small samples at the class level, and their MCMC estimation algorithm is very time-consuming. In this study, three nonparametric MC cognitive diagnosis methods based on Hamming distance are proposed, aiming to maximize the diagnostic efficacy of MC items while remaining suitable for small-sample diagnosis. Simulation study 1 considered four factors: sample size (30, 50, 100), test length (10, 20, 30), item quality (high and low), and the true model (MC-S-DINA1, MC-S-DINA2). Three nonparametric MC methods and two parametric models were compared. The results showed that in most conditions, the pattern accuracy rates and average attribute accuracy rates of the nonparametric MC method (${{d}_{\text{h}-\text{MC}}}$) were higher than those of the parametric models, especially when the test length was short or item quality was low. In a real test situation, the quality of different items in a test may vary greatly.
Based on this, simulation study 2 set the first half of the items at high quality and the remaining items at low quality. The results showed that the pattern accuracy rates and average attribute accuracy rates of the nonparametric MC method (${{d}_{\text{ph}-\text{MC}}}$) were higher than those of the parametric models in all conditions. In an empirical study, the nonparametric MC methods and the parametric models were applied to the same set of real data. The results showed that the nonparametric MC methods and the parametric models yielded high classification consistency rates, and the ${{d}_{\text{ph}-\text{MC}}}$ method produced satisfactory estimates. In sum, ${{d}_{\text{h}-\text{MC}}}$ was suitable in most conditions, especially when the test length was short or the item quality was low. When the quality of different items was quite diverse, ${{d}_{\text{ph}-\text{MC}}}$ was a better choice than the parametric approaches.
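The core Hamming-distance idea behind such nonparametric classification can be sketched for ordinary dichotomously scored items: enumerate all attribute patterns, derive each pattern's ideal response vector, and assign each examinee the pattern whose ideal vector is closest to the observed responses. The sketch below is illustrative only: it assumes a conjunctive (DINA-like) ideal-response rule, the function names are hypothetical, and it does not implement the article's MC-specific ${{d}_{\text{h}-\text{MC}}}$ or ${{d}_{\text{ph}-\text{MC}}}$ variants.

```python
from itertools import product

import numpy as np

def ideal_response(pattern, q_matrix):
    """Conjunctive (DINA-like) rule: an item is answered correctly
    iff every attribute the item requires is mastered."""
    return np.all(q_matrix <= pattern, axis=1).astype(int)

def classify_hamming(responses, q_matrix):
    """Assign each examinee the attribute pattern whose ideal
    response vector has minimal Hamming distance to the
    observed response vector."""
    n_attr = q_matrix.shape[1]
    patterns = np.array(list(product([0, 1], repeat=n_attr)))
    ideals = np.array([ideal_response(p, q_matrix) for p in patterns])
    # Hamming distance from every examinee to every ideal vector
    dists = np.abs(responses[:, None, :] - ideals[None, :, :]).sum(axis=2)
    return patterns[np.argmin(dists, axis=1)]

# Toy example: 4 items measuring 2 attributes
Q = np.array([[1, 0],
              [0, 1],
              [1, 1],
              [1, 0]])
X = np.array([[1, 0, 0, 1],    # consistent with mastering attribute 1 only
              [1, 1, 1, 1]])   # consistent with mastering both attributes
print(classify_hamming(X, Q))  # → [[1 0]
                               #    [1 1]]
```

Because the classifier needs no parameter estimation, it runs on arbitrarily small samples, which is the property that motivates the nonparametric approach at the class level.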
Two new termination rules for multidimensional computerized classification testing
REN He, CHEN Ping
2021, 53 (9): 1044-1058. doi: 10.3724/SP.J.1041.2021.01044

Computerized classification testing (CCT) is a subset of computerized adaptive testing (CAT); it aims to classify examinees into one of at least two categories denoting results such as pass/fail or non-mastery/partial mastery/mastery. CCTs therefore focus on increasing classification accuracy, in contrast to CATs, which are designed for precise measurement. The termination rule is one of the key components of CCT. However, as pointed out by Nydick (2013), most CCTs (i.e., UCCTs) were designed under unidimensional item response theory (IRT), whose unidimensionality assumption is easily violated in practice. Researchers therefore began to construct termination rules for multidimensional CCT (MCCT) based on multidimensional IRT. To date, however, these rules still have some deficiencies in terms of classification accuracy or test efficiency. Most current studies on MCCT termination rules build on UCCT termination rules. In UCCT, a termination rule requires setting a cut point, ${{\theta }_{0}}$, on the latent trait to calculate its statistics; when extended from UCCT to MCCT, the cut point becomes a classification bound curve or even a surface (i.e., $g(\theta )=0$). A key question is then how to convert the curve or surface into ${{\theta }_{0}}$. To this end, the projected sequential probability ratio test (P-SPRT), constrained SPRT (C-SPRT; Nydick, 2013), and multidimensional generalized likelihood ratio (M-GLR) were proposed to solve the problem in different ways.
Among them, P-SPRT and C-SPRT choose specific points on g(θ) as the approximate cut point, ${{\hat{\theta }}_{0}}$, by projecting into Euclidean space or by constraining the search to g(θ), respectively; as for M-GLR, because the generalized likelihood ratio statistic can be calculated without a cut point, it can be employed directly in MCCT. To overcome the limitation that P-SPRT may yield unstable results at the beginning of the test, this study proposed the Mahalanobis distance-based SPRT (Mahalanobis-SPRT). In addition, stochastic curtailment is a technique for shortening the test by predicting whether a participant's classification would change if the test continued. This article also combined M-GLR with stochastic curtailment, yielding M-GLR with stochastic curtailment (M-SCGLR). A full-scale simulation study was conducted to (1) compare the Mahalanobis-SPRT and M-SCGLR with the P-SPRT, C-SPRT, M-GLR, and multidimensional stochastically curtailed SPRT (M-SCSPRT) under varying conditions; and (2) compare the classification performance of the six termination rules for participants with specific abilities, to explore whether the rules differ markedly in their sensitivity when classifying such participants. For the first objective, three levels of correlation between dimensions (ρ = 0, 0.5, and 0.8), two item bank structures (within-item multidimensionality and between-item multidimensionality), and two kinds of classification boundary (compensatory and non-compensatory) were considered; for the second objective, 36 specific ability points $({{\theta }_{1}},{{\theta }_{2}})$ were generated, where ${{\theta }_{1}},{{\theta }_{2}}\in \{-0.5,-0.3,-0.1,0.1,0.3,0.5\}$.
The results showed that: (1) when the compensatory classification function was used, the Mahalanobis-SPRT achieved higher classification accuracy with a test length similar to that of the rules without stochastic curtailment; (2) under almost all conditions, the M-SCGLR not only achieved higher precision but also maintained a short test length compared with M-SCSPRT, which also uses stochastic curtailment; (3) across the six termination rules, the sensitivity of precision and test length to specific participants changed in a consistent way. To sum up, two new MCCT termination rules (Mahalanobis-SPRT and M-SCGLR) are put forward in this article. Although the simulation results are promising, several research directions merit further investigation, such as developing MCCT termination rules for more than two categories and constructing MCCT termination rules that incorporate process data such as response times.
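The SPRT machinery that these termination rules generalize can be illustrated in the unidimensional pass/fail case: test two hypotheses placed symmetrically around the cut point and stop once Wald's likelihood-ratio thresholds are crossed. The sketch below is a minimal illustration under assumed conditions, not the article's P-SPRT, C-SPRT, or M-GLR: it assumes a 2PL item response model, hypothetical item parameters, and an indifference-region half-width delta.

```python
import numpy as np

def p_correct(theta, a, b):
    """2PL item response function: probability of a correct answer."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def sprt_decision(responses, a, b, theta0, delta=0.3, alpha=0.05, beta=0.05):
    """Wald's SPRT for a pass/fail decision at cut point theta0.

    Tests H0: theta = theta0 - delta against H1: theta = theta0 + delta
    and returns 'pass', 'fail', or 'continue'."""
    p_hi = p_correct(theta0 + delta, a, b)  # response probabilities under H1
    p_lo = p_correct(theta0 - delta, a, b)  # response probabilities under H0
    # Log-likelihood ratio of H1 vs H0 over the items administered so far
    llr = np.sum(responses * np.log(p_hi / p_lo)
                 + (1 - responses) * np.log((1 - p_hi) / (1 - p_lo)))
    upper = np.log((1 - beta) / alpha)   # accept H1: classify as pass
    lower = np.log(beta / (1 - alpha))   # accept H0: classify as fail
    if llr >= upper:
        return "pass"
    if llr <= lower:
        return "fail"
    return "continue"  # administer another item

# Ten items with discrimination 1 and difficulty 0, cut point at theta = 0
a, b = np.ones(10), np.zeros(10)
print(sprt_decision(np.ones(10), a, b, theta0=0.0))   # → pass
print(sprt_decision(np.zeros(10), a, b, theta0=0.0))  # → fail
```

In MCCT the single cut point ${{\theta }_{0}}$ is replaced by the boundary $g(\theta )=0$, which is exactly the conversion problem that P-SPRT, C-SPRT, and the proposed Mahalanobis-SPRT address in different ways.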