Types, characteristics and application of termination rules in computerized classification testing

doi:10.3724/SP.J.1042.2022.01168


Summary:

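Computerized classification testing (CCT) classifies test-takers efficiently and has been widely applied in qualification testing and clinical psychology. As a key component of CCT, the termination rule decides when the test stops and into which category the test-taker is finally placed, so it directly affects test efficiency and classification accuracy. The core ideas of the three existing families of termination rules (likelihood ratio rules, Bayesian decision theory rules, and confidence interval rules) are, respectively, constructing a hypothesis test, designing a loss function, and comparing the position of a confidence interval relative to the cut score. CCT termination rules have also taken different specific forms in different testing contexts. Future research could further develop Bayesian rules, address multidimensional and multi-category settings, and incorporate response times and machine-learning algorithms. With regard to practical needs, all three types of termination rules have potential in qualification testing, whereas clinical questionnaires tend to adopt Bayesian rules.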

Key words: computerized classification testing, termination rule, likelihood ratio, stochastic curtailment, Bayesian decision theory
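The likelihood ratio rule can be illustrated with Wald's sequential probability ratio test (SPRT). The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a two-parameter logistic (2PL) IRT model, a cut score cut with an indifference region of half-width delta around it, and nominal error rates alpha and beta; all function and parameter names are hypothetical. Testing $\theta_l = \text{cut} - \delta$ against $\theta_u = \text{cut} + \delta$, the test-taker is classified as a master once the log likelihood ratio reaches $\log[(1-\beta)/\alpha]$ and as a non-master once it falls to $\log[\beta/(1-\alpha)]$.

```python
import math


def p_correct(theta, a, b):
    """2PL probability of a correct response (assumed IRT model)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def sprt_decision(responses, items, cut, delta=0.2, alpha=0.05, beta=0.05):
    """Wald's SPRT for a two-category CCT (hypothetical helper).

    responses: list of 0/1 item scores; items: list of (a, b) 2PL parameters.
    Tests H0: theta = cut - delta against H1: theta = cut + delta.
    Returns "master", "non-master", or "continue".
    """
    theta_l, theta_u = cut - delta, cut + delta
    log_lr = 0.0
    for x, (a, b) in zip(responses, items):
        p_u = p_correct(theta_u, a, b)
        p_l = p_correct(theta_l, a, b)
        # Accumulate log[L(theta_u) / L(theta_l)] item by item.
        if x == 1:
            log_lr += math.log(p_u / p_l)
        else:
            log_lr += math.log((1.0 - p_u) / (1.0 - p_l))
    upper = math.log((1.0 - beta) / alpha)  # reject H0: classify as master
    lower = math.log(beta / (1.0 - alpha))  # accept H0: classify as non-master
    if log_lr >= upper:
        return "master"
    if log_lr <= lower:
        return "non-master"
    return "continue"
```

With $a = 1$, $b = 0$, cut $= 0$ and $\delta = 0.2$, each correct response adds exactly 0.2 to the log ratio, so with $\alpha = \beta = 0.05$ (upper bound $\log 19 \approx 2.94$) 15 consecutive correct responses are needed before testing stops with a "master" decision.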

Abstract:

Computerized classification testing (CCT) adaptively classifies test-takers into two or more categories and has been widely used in qualification testing, clinical psychology, and medical diagnosis. As an essential component of CCT, the termination rule determines when the test stops and into which category the test-taker is finally classified, and thus directly affects test efficiency and classification accuracy. According to their theoretical basis, existing rules can be roughly divided into likelihood ratio rules, Bayesian decision theory rules, and confidence interval rules, whose core ideas are, respectively, constructing a hypothesis test, designing a loss function, and comparing the position of a confidence interval relative to the cut score. When constructing a specific termination rule, the requirements of the testing scenario (e.g., the number of categories and the number of dimensions measured) must also be considered.
Each of the three types of termination rules has advantages and disadvantages. The likelihood ratio rule is based on the likelihood ratio test and has sound theoretical properties. However, it requires the indifference region and the Type I and Type II error rates to be specified in advance, which introduces subjectivity, and it is harder to extend to complex testing situations such as multidimensional and multi-category CCT. Bayesian decision theory rules make classification decisions based on a loss function. Because they work backward from the final stage of the test, they can optimize decisions dynamically from a more global perspective. In addition, the variety of available loss functions makes these rules flexible in form and easy to adapt to different testing situations. In practice, however, this flexibility brings uncertainty about which loss function to choose, and an inappropriate loss function may bias the classification decisions. The confidence interval rule is the most straightforward, owing to its simple principle and low computational cost, but it is less robust and its test efficiency is relatively low.
Currently, CCT is mainly applied in qualification tests and clinical medical questionnaires. In qualification tests, all three types of termination rules have the potential for wide application. In practice, however, the principles of the likelihood ratio and Bayesian decision theory rules are not easily understood by the general public, and because these rules are usually paired with item selection methods that target the cut point, they also tend to overexpose items. Consequently, the confidence interval rule, which is simpler in principle and causes less item overexposure, has been widely used in existing qualification tests. In clinical questionnaires, Bayesian decision theory rules are more applicable because they allow finer control over the losses attached to different classification errors.
Future research on CCT termination rules could proceed in four directions. First, the flexibility of the loss function could be exploited to extend Bayesian decision theory rules so that they account for non-statistical constraints. Second, termination rules could be developed for multidimensional and multi-category CCT to meet more practical needs. Third, termination rules that incorporate response times could be developed to improve test efficiency and classification accuracy. Fourth, termination rules could be constructed within a machine-learning framework.

Key words: computerized classification testing, termination rule, likelihood ratio, stochastic curtailment, Bayesian decision theory
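The confidence interval rule can be sketched in the same way. This is a hedged illustration under assumptions the abstract does not state: a 2PL model, an EAP ability estimate computed on a grid with a standard normal prior, and the posterior standard deviation used as the standard error (implementations often use the maximum likelihood estimate and its standard error instead). Testing stops as soon as the interval $\hat\theta \pm z \cdot SE$ lies entirely above or below the cut score; all names are hypothetical.

```python
import math


def p_correct(theta, a, b):
    """2PL probability of a correct response (assumed IRT model)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def eap_and_sd(responses, items, n_points=81, lo=-4.0, hi=4.0):
    """EAP ability estimate and posterior SD on a grid (assumed N(0, 1) prior)."""
    step = (hi - lo) / (n_points - 1)
    grid = [lo + k * step for k in range(n_points)]
    weights = []
    for t in grid:
        w = math.exp(-0.5 * t * t)  # unnormalised standard normal prior
        for x, (a, b) in zip(responses, items):
            p = p_correct(t, a, b)
            w *= p if x == 1 else 1.0 - p
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(grid, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(grid, weights)) / total
    return mean, math.sqrt(var)


def ci_decision(responses, items, cut, z=1.96):
    """Stop once theta_hat +/- z * SE lies entirely on one side of the cut score."""
    theta_hat, se = eap_and_sd(responses, items)
    if theta_hat - z * se > cut:
        return "master"
    if theta_hat + z * se < cut:
        return "non-master"
    return "continue"
```

The only computation per step is one ability estimate and one interval check, which reflects the low computational cost noted above; the interval width, and hence test length, depends directly on the chosen $z$.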

REN He, HUANG Yingshi, CHEN Ping. (2022). Types, characteristics and application of termination rules in computerized classification testing. Advances in Psychological Science, 30(5), 1168-1182.

Figures/Tables: 8

References: 51

Threshold loss for two-category classification ($j'$: number of items administered; $l_c$: cost per item):
Decision	$\theta = \theta_l$	$\theta = \theta_u$
Classify as "non-master"	$j' l_c$	$l_{01} + j' l_c$
Classify as "master"	$l_{10} + j' l_c$	$j' l_c$
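The table can be turned into a decision rule by weighting each cell by the posterior probability of the corresponding true state. The sketch below assumes a 2PL model, a two-point prior on $\{\theta_l, \theta_u\}$ and hypothetical function names. It also replaces the full backward-induction computation of the Bayesian sequential rule with a simpler one-step look-ahead: testing stops when the smaller terminal expected loss is no larger than the expected loss of giving exactly one more item and then deciding. Such a myopic rule may stop earlier than the optimal rule would.

```python
import math


def p_correct(theta, a, b):
    """2PL probability of a correct response (assumed IRT model)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def posterior_master(responses, items, theta_l, theta_u, prior_master=0.5):
    """P(theta = theta_u | responses) under a two-point prior on {theta_l, theta_u}."""
    like_u, like_l = prior_master, 1.0 - prior_master
    for x, (a, b) in zip(responses, items):
        pu, pl = p_correct(theta_u, a, b), p_correct(theta_l, a, b)
        like_u *= pu if x == 1 else 1.0 - pu
        like_l *= pl if x == 1 else 1.0 - pl
    return like_u / (like_u + like_l)


def terminal_losses(p_master, n_items, l01, l10, lc):
    """Posterior expected loss of each decision, cell by cell from the loss table."""
    loss_nonmaster = p_master * l01 + n_items * lc       # wrong when theta = theta_u
    loss_master = (1.0 - p_master) * l10 + n_items * lc  # wrong when theta = theta_l
    return loss_nonmaster, loss_master


def bayes_decision(responses, items, next_item, theta_l, theta_u, l01, l10, lc):
    """One-step look-ahead version of the Bayesian sequential rule (a simplification)."""
    n = len(responses)
    pm = posterior_master(responses, items, theta_l, theta_u)
    stop_nm, stop_m = terminal_losses(pm, n, l01, l10, lc)
    # Expected loss of administering next_item, observing the answer, then deciding.
    a, b = next_item
    p_next = pm * p_correct(theta_u, a, b) + (1.0 - pm) * p_correct(theta_l, a, b)
    cont = 0.0
    for x, px in ((1, p_next), (0, 1.0 - p_next)):
        pm_x = posterior_master(list(responses) + [x], list(items) + [next_item],
                                theta_l, theta_u)
        cont += px * min(terminal_losses(pm_x, n + 1, l01, l10, lc))
    if min(stop_nm, stop_m) <= cont:
        return "master" if stop_m < stop_nm else "non-master"
    return "continue"
```

With $l_{01} = l_{10} = 100$, $l_c = 1$ and no responses yet, stopping costs 50 in expectation while one more item lowers the expected loss to 38.75, so the rule continues; the item cost $l_c$ is what eventually makes stopping preferable.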

Threshold loss for three-category classification:
Decision	$\theta \le \theta_{1l}$	$\theta_{1u} \le \theta \le \theta_{2l}$	$\theta \ge \theta_{2u}$
Classify into category 1	$j' l_c$	$l_{12} + j' l_c$	$l_{13} + j' l_c$
Classify into category 2	$l_{21} + j' l_c$	$j' l_c$	$l_{23} + j' l_c$
Classify into category 3	$l_{31} + j' l_c$	$l_{32} + j' l_c$	$j' l_c$
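With more than two categories the same logic applies row by row: the posterior expected loss of assigning the test-taker to category $k$ is $\sum_j P(\text{state } j \mid \mathbf{x})\, l_{kj} + j' l_c$, with $l_{kk} = 0$. The sketch below covers only the terminal (stop-now) decision; it takes the posterior probabilities of the three ability regions as given (how they are computed is left open) and uses hypothetical names and loss values.

```python
def expected_losses(post_probs, loss, n_items, lc):
    """Posterior expected loss of assigning the test-taker to each category.

    post_probs[j]: posterior probability that the true state is category j + 1.
    loss[k][j]: loss of deciding category k + 1 when the truth is category j + 1
    (diagonal entries are 0, as in the table).
    Every decision also carries the testing cost n_items * lc.
    """
    return [sum(p * l for p, l in zip(post_probs, row)) + n_items * lc for row in loss]


def best_category(post_probs, loss, n_items, lc):
    """Return the 1-based category with the smallest expected loss, plus all losses."""
    losses = expected_losses(post_probs, loss, n_items, lc)
    k = min(range(len(losses)), key=losses.__getitem__)
    return k + 1, losses
```

For instance, with posterior probabilities (0.1, 0.6, 0.3), $l_{12} = l_{21} = l_{23} = l_{32} = 10$, $l_{13} = l_{31} = 20$, $j' = 12$ and $l_c = 0.5$, the expected losses are 18, 10 and 14, so the test-taker would be assigned to category 2 if testing stopped now.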

Linear loss for two-category classification:
Decision	$\theta = \theta_l$	$\theta = \theta_u$
Classify as "non-master"	$j' l_c$	$b_1(\theta_0 - \theta) + j' l_c$
Classify as "master"	$b_2(\theta_0 - \theta) + j' l_c$	$j' l_c$