Advances in Psychological Science, 2022, Vol. 30, Issue 5: 1168-1182. doi: 10.3724/SP.J.1042.2022.01168
Received: 2021-06-18
Publication date: 2022-05-15
Release date: 2022-03-24
Corresponding author: CHEN Ping
E-mail: pchen@bnu.edu.cn
Authors: REN He, HUANG Yingshi, CHEN Ping
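Abstract:
Computerized classification testing (CCT) classifies examinees efficiently and has been widely used in mastery testing and clinical psychology. As a key component of CCT, the termination rule determines when a test stops and which category an examinee is finally assigned to, and therefore directly affects both test efficiency and classification accuracy. The three existing families of termination rules, namely likelihood ratio rules, Bayesian decision theory rules, and confidence interval rules, are built respectively on constructing a hypothesis test, designing a loss function, and comparing the position of a confidence interval relative to the cut point. Each family has also developed specific forms for different testing scenarios. Future research could further develop Bayesian rules, address multidimensional and multi-category scenarios, and incorporate response times and machine learning algorithms. In terms of practical needs, all three families show potential for mastery testing, whereas clinical questionnaires tend to favor Bayesian rules.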
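The core idea of the likelihood ratio family, casting classification as a hypothesis test, can be illustrated with Wald's SPRT at a single cut point. This is a minimal sketch, not the article's implementation: it assumes a 2PL model without the 1.7 scaling constant, point hypotheses θ = cut − delta (non-master) and θ = cut + delta (master), and Wald's bounds derived from nominal error rates `alpha` and `beta`. The names `p_correct` and `sprt_decision` and all default values are hypothetical.

```python
import math


def p_correct(theta, a, b):
    """2PL probability of a correct response (assumed model, no 1.7 constant)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def sprt_decision(responses, items, cut, delta=0.2, alpha=0.05, beta=0.05):
    """Wald's SPRT at one cut point.

    H0: theta = cut - delta (non-master) vs H1: theta = cut + delta (master).
    responses: list of 0/1 scores; items: list of (a, b) parameter pairs.
    Returns "master", "non-master", or "continue".
    """
    theta0, theta1 = cut - delta, cut + delta
    log_lr = 0.0
    for u, (a, b) in zip(responses, items):
        p0, p1 = p_correct(theta0, a, b), p_correct(theta1, a, b)
        # Log-likelihood-ratio contribution of one dichotomous response
        log_lr += u * math.log(p1 / p0) + (1 - u) * math.log((1 - p1) / (1 - p0))
    upper = math.log((1 - beta) / alpha)  # crossing it accepts H1
    lower = math.log(beta / (1 - alpha))  # crossing it accepts H0
    if log_lr >= upper:
        return "master"
    if log_lr <= lower:
        return "non-master"
    return "continue"
```

With items a = 1, b = 0 and cut = 0, each correct answer adds 0.2 to the log ratio, so 15 consecutive correct answers are needed to cross log 19 ≈ 2.94; after 14 the test continues.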
REN He, HUANG Yingshi, CHEN Ping. (2022). Types, characteristics and application of termination rules in computerized classification testing. Advances in Psychological Science, 30(5), 1168-1182.
Table 1 Two-category threshold loss function at stage j′
Decision | | |
---|---|---|
Examinee classified as "non-master" | | |
Examinee classified as "master" | | |
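A two-category threshold loss can drive a stopping decision roughly as follows. This is a sketch under stated assumptions, not necessarily the formulation of Lewis and Sheehan (1990): a 2PL model, a discrete θ grid with a prior, a loss `A` for classifying a true master as a non-master, a loss `B` for the opposite error, zero loss for correct decisions, and a constant cost `C` per item. It uses a myopic one-step lookahead (stop when classifying now is no worse in expected loss than testing one more item), a simplification of full backward induction. All function and parameter names are hypothetical.

```python
import math


def p_correct(theta, a, b):
    """2PL probability of a correct response (assumed model, no 1.7 constant)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def posterior(grid, prior, responses, items):
    """Normalized posterior weights over a discrete theta grid."""
    weights = []
    for theta, pr in zip(grid, prior):
        like = 1.0
        for u, (a, b) in zip(responses, items):
            p = p_correct(theta, a, b)
            like *= p if u == 1 else 1.0 - p
        weights.append(pr * like)
    total = sum(weights)
    return [w / total for w in weights]


def p_master(post, grid, cut):
    """Posterior probability that theta lies at or above the cut point."""
    return sum(w for w, theta in zip(post, grid) if theta >= cut)


def terminal_loss(pm, A, B):
    """Expected threshold loss of the better classification:
    'non-master' costs A * P(master), 'master' costs B * P(non-master)."""
    return min(A * pm, B * (1.0 - pm))


def threshold_loss_step(grid, prior, responses, items, next_item, cut, A, B, C):
    """Myopic stopping rule; returns a classification or 'continue'.
    The cost of items already given appears on both sides and cancels."""
    post = posterior(grid, prior, responses, items)
    pm = p_master(post, grid, cut)
    loss_now = terminal_loss(pm, A, B)
    # Preposterior expected loss after one more item, over both responses
    a, b = next_item
    p_right = sum(w * p_correct(theta, a, b) for w, theta in zip(post, grid))
    loss_next = C  # cost of the extra item
    for u, pu in ((1, p_right), (0, 1.0 - p_right)):
        post_u = posterior(grid, prior, responses + [u], items + [next_item])
        loss_next += pu * terminal_loss(p_master(post_u, grid, cut), A, B)
    if loss_now <= loss_next:
        return "master" if B * (1.0 - pm) <= A * pm else "non-master"
    return "continue"
```

Because the expected terminal loss after one more response can never exceed the current one, the rule stops only when the possible reduction in loss is no larger than the item cost `C`.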
Table 2 Three-category threshold loss function at stage j′
Decision | | | |
---|---|---|---|
Examinee classified as "Category 1" | | | |
Examinee classified as "Category 2" | | | |
Examinee classified as "Category 3" | | | |
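With more than two categories, a threshold loss becomes a matrix indexed by the decision and the true category, and the terminal decision picks the category with the smallest posterior expected loss. The sketch below shows only that terminal step; the stopping comparison (deciding now versus testing further) is omitted. It assumes a posterior already available on a discrete θ grid, sorted cut points, and a hypothetical loss matrix `loss[d][k]`; index 0 corresponds to "Category 1". The function names are illustrative.

```python
import bisect


def category_probs(grid, post, cuts):
    """Posterior probability of each true category; cuts must be sorted."""
    probs = [0.0] * (len(cuts) + 1)
    for theta, w in zip(grid, post):
        # bisect_right puts a theta equal to a cut into the higher category
        probs[bisect.bisect_right(cuts, theta)] += w
    return probs


def threshold_loss_classify(grid, post, cuts, loss):
    """Pick the category with the smallest posterior expected threshold loss.
    loss[d][k] is the loss of deciding category d when the true category is k."""
    probs = category_probs(grid, post, cuts)
    expected = [sum(l * p for l, p in zip(row, probs)) for row in loss]
    best = min(range(len(expected)), key=expected.__getitem__)
    return best, expected
```

For example, with category probabilities (0.1, 0.7, 0.2) and a loss equal to the number of categories missed, the expected losses are 1.1, 0.3 and 0.9, so the middle category is chosen.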
Table 3 Two-category linear loss function at stage j′
Decision | | |
---|---|---|
Examinee classified as "non-master" | | |
Examinee classified as "master" | | |
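A linear loss lets the size of an error grow with the distance between ability and the cut point. A minimal sketch, assuming the hypothetical parameterization loss("master", θ) = a_m + b_m(cut − θ) and loss("non-master", θ) = a_n + b_n(θ − cut); the intercepts and slopes are illustrative and not taken from van der Linden and Mellenbergh (1977). Because both losses are linear in θ, their posterior expectations depend on the posterior only through its mean.

```python
def posterior_mean(grid, post):
    """Posterior mean of theta from normalized grid weights."""
    return sum(t * w for t, w in zip(grid, post))


def linear_loss_decision(mean, cut, a_m=0.0, b_m=1.0, a_n=0.0, b_n=1.0):
    """Two-category decision under an assumed linear loss:
        loss('master', theta)     = a_m + b_m * (cut - theta)
        loss('non-master', theta) = a_n + b_n * (theta - cut)
    Expected losses are obtained by plugging in the posterior mean."""
    exp_master = a_m + b_m * (cut - mean)
    exp_nonmaster = a_n + b_n * (mean - cut)
    return "master" if exp_master <= exp_nonmaster else "non-master"
```

Setting the two expected losses equal gives the indifference point mean = cut + (a_m − a_n) / (b_m + b_n); with a_m = 0.5 and the other defaults it moves from 0 to 0.25, so a posterior mean of 0.2 now leads to a "non-master" decision.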
Table 4 Three-category linear loss function at stage j′
Decision | | | |
---|---|---|---|
Examinee classified as "Category 1" | | | |
Examinee classified as "Category 2" | | | |
Examinee classified as "Category 3" | | | |
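Table 5 Summary of CCT termination rules
Core principle | Number of categories | Dimensionality | Termination rule | Construction idea |
---|---|---|---|---|
Likelihood ratio rules | | | | |
Sequential likelihood ratio | Two-category | Unidimensional | SPRT | Constructs a pair of simple hypotheses at the cut point and the corresponding sequential likelihood ratio statistic |
Sequential likelihood ratio | Two-category | Unidimensional | SCSPRT | Combines SPRT with stochastic curtailment |
Sequential likelihood ratio | Two-category | Multidimensional | C-SPRT | Reduces the problem to an SPRT through a constraint on the likelihood function |
Sequential likelihood ratio | Two-category | Multidimensional | P-SPRT | Reduces the problem to an SPRT through projection in Euclidean space |
Sequential likelihood ratio | Two-category | Multidimensional | M-SCSPRT | Combines C-SPRT with stochastic curtailment |
Sequential likelihood ratio | Multi-category | Unidimensional | Sobel-Wald method | Performs one SPRT at each cut point |
Sequential likelihood ratio | Multi-category | Unidimensional | Armitage method | Performs an SPRT for every possible pair of categories |
Generalized likelihood ratio | Two-category | Unidimensional | GLR | Constructs a pair of composite hypotheses at the cut point and the corresponding generalized likelihood ratio statistic |
Generalized likelihood ratio | Two-category | Unidimensional | SCGLR | Combines GLR with stochastic curtailment |
Generalized likelihood ratio | Two-category | Multidimensional | M-GLR | Extends the ability intervals of GLR to a multidimensional ability space |
Generalized likelihood ratio | Multi-category | Unidimensional | mGLR | For each category, constructs a pair of composite hypotheses about whether the examinee belongs to it, with the corresponding generalized likelihood ratio statistic |
Bayesian rules | | | | |
Threshold loss | Two-category | Unidimensional | Lewis-Sheehan method | Specifies the loss associated with each decision |
Threshold loss | Multi-category | Unidimensional | Vos method | Specifies the loss associated with each decision |
Linear loss | Two-category | Unidimensional | Linden-Mellenbergh method | Specifies the loss associated with each decision and accounts for the distance between the ability estimate and the cut point |
Linear loss | Multi-category | Unidimensional | Vos method | Specifies the loss associated with each decision and accounts for the distance between the ability estimate and the cut point |
Confidence interval rules | | | | |
Confidence interval | Two-category | Unidimensional | ACI | Compares the position of the confidence interval for the ability estimate relative to the cut point |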
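The GLR row of the summary replaces the two point hypotheses of SPRT with composite ones and takes the maximum likelihood within each. A minimal sketch, assuming a 2PL model, H0: θ ≤ cut − delta versus H1: θ ≥ cut + delta, a bounded grid search for the suprema, and Wald's SPRT bounds reused for the GLR statistic (that last choice is an assumption). The names `log_lik` and `glr_decision` and the defaults are hypothetical.

```python
import math


def log_lik(theta, responses, items):
    """2PL log-likelihood of dichotomous responses (assumed model)."""
    total = 0.0
    for u, (a, b) in zip(responses, items):
        p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
        total += math.log(p) if u == 1 else math.log(1.0 - p)
    return total


def glr_decision(responses, items, cut, delta=0.2, alpha=0.05, beta=0.05, grid=None):
    """GLR at one cut point with composite hypotheses
    H0: theta <= cut - delta versus H1: theta >= cut + delta."""
    if grid is None:
        grid = [i / 100 for i in range(-400, 401)]  # bounded search range
    sup_h1 = max(log_lik(t, responses, items) for t in grid if t >= cut + delta)
    sup_h0 = max(log_lik(t, responses, items) for t in grid if t <= cut - delta)
    log_glr = sup_h1 - sup_h0
    # Wald's SPRT bounds reused for the GLR statistic (an assumption)
    if log_glr >= math.log((1 - beta) / alpha):
        return "master"
    if log_glr <= math.log(beta / (1 - alpha)):
        return "non-master"
    return "continue"
```

With items a = 1, b = 0 and cut = 0, four correct answers already give a log GLR of about 3.12, because the supremum under H1 sits at the upper edge of the grid rather than at cut + delta.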
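The confidence interval rule in the last row of the summary can be sketched as follows, assuming a 2PL model, a maximum likelihood estimate found by grid search (the bounded grid keeps the estimate finite for all-correct or all-wrong patterns), a standard error of 1/sqrt(test information) at that estimate, and a normal critical value `z`. The names and defaults are hypothetical and need not match the ACI variant reviewed in the article.

```python
import math


def p_correct(theta, a, b):
    """2PL probability of a correct response (assumed model, no 1.7 constant)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def mle_grid(responses, items, grid):
    """Maximum likelihood estimate of theta by searching a bounded grid."""
    def ll(t):
        return sum(math.log(p_correct(t, a, b)) if u == 1 else math.log(1.0 - p_correct(t, a, b))
                   for u, (a, b) in zip(responses, items))
    return max(grid, key=ll)


def aci_decision(responses, items, cut, z=1.96, grid=None):
    """Classify only when the interval theta_hat +/- z * SE excludes the cut."""
    if grid is None:
        grid = [i / 100 for i in range(-400, 401)]
    theta_hat = mle_grid(responses, items, grid)
    # 2PL test information at theta_hat: sum of a^2 * p * (1 - p)
    info = sum(a * a * p_correct(theta_hat, a, b) * (1.0 - p_correct(theta_hat, a, b))
               for a, b in items)
    se = 1.0 / math.sqrt(info)
    lower, upper = theta_hat - z * se, theta_hat + z * se
    if lower > cut:
        return "master"
    if upper < cut:
        return "non-master"
    return "continue"
```

For example, 24 correct answers out of 30 items with a = 1, b = 0 give θ̂ ≈ 1.39 and an interval of roughly (0.49, 2.29), which lies entirely above a cut of 0.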
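[1] Chen P. (2016). Two new online calibration methods for computerized adaptive testing. Acta Psychologica Sinica, 48(9), 1184-1198.
[2] Jian X., & Chen P. (2020). Characteristics and development of computerized classification testing: A review. Examinations Research, (6), 77-89.
[3] Kang C., & Xin T. (2010). New developments in test theory: Multidimensional item response theory. Advances in Psychological Science, 18(3), 530-536.
[4] Ren H., & Chen P. (2021). Two new termination rules for multidimensional computerized classification testing. Acta Psychologica Sinica, 53(9), 1044-1058.
[5] Zhan P. (2019). Joint analysis of response time and response accuracy data in computerized multidimensional testing. Journal of Psychological Science, (1), 170-178.
[6] Zhan P., Jiao H., & Man K. (2020). A multidimensional log-normal response time model: Exploring the multidimensionality of latent processing speed. Acta Psychologica Sinica, 52, 1132-1142.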
[7] Armitage P. (1950). Sequential analysis with more than two alternative hypotheses, and its relation to discriminant function analysis. Journal of the Royal Statistical Society, Series B, 12(1), 137-144.
[8] Bartroff J., Finkelman M., & Lai T. L. (2008). Modern sequential analysis and its applications to computerized adaptive testing. Psychometrika, 73(3), 473-486. doi: 10.1007/s11336-007-9053-9
[9] Eggen T. J. H. M. (1999). Item selection in adaptive testing with the sequential probability ratio test. Applied Psychological Measurement, 23(3), 249-261. doi: 10.1177/01466219922031365
[10] Eggen T. J. H. M., & Straetmans G. J. J. M. (2000). Computerized adaptive testing for classifying examinees into three categories. Educational and Psychological Measurement, 60(5), 713-734. doi: 10.1177/00131640021970862
[11] Ferguson R. L. (1969). Computer-assisted criterion-referenced measurement (Working Paper No. 41). Pittsburgh, PA: University of Pittsburgh, Learning and Research Development Center.
[12] Finkelman M. (2003). An adaptation of stochastic curtailment to truncate Wald's SPRT in computerized adaptive testing (CSE Report 606). Los Angeles, CA: National Center for Research on Evaluation, Standards, and Student Testing.
[13] Finkelman M. (2008). On using stochastic curtailment to shorten the SPRT in sequential mastery testing. Journal of Educational and Behavioral Statistics, 33(4), 442-463.
[14] Finkelman M. (2010). Variations on stochastic curtailment in sequential mastery testing. Applied Psychological Measurement, 34(1), 27-45. doi: 10.1177/0146621609336113
[15] Finkelman M., He Y., Kim W., & Lai A. M. (2011). Stochastic curtailment of health questionnaires: A method to reduce respondent burden. Statistics in Medicine, 30(16), 1989-2004. doi: 10.1002/sim.4231 (PMID: 21520454)
[16] Ghosh B. K. (1970). Sequential tests of statistical hypotheses. Reading, MA: Addison-Wesley.
[17] Ghosh B. K., & Sen P. K. (1991). Handbook of sequential analysis. New York, NY: Marcel Dekker.
[18] Gonzalez O. (2021). Psychometric and machine learning approaches for diagnostic assessment and tests of individual classification. Psychological Methods, 26(2), 236-254. doi: 10.1037/met0000317
[19] Govindarajulu Z. (1987). The sequential statistical analysis of hypothesis testing, point and interval estimation, and decision theory (American Series in Mathematical and Management Sciences). Columbus, OH: American Sciences Press, Inc.
[20] Huang C.-Y., Kalohn J. C., Lin C.-J., & Spray J. (2000). Estimating item parameters from classical indices for item pool development with a computerized classification test (Research Report 2000-4). Iowa City, IA: ACT, Inc.
[21] Huebner A. R., & Fina A. D. (2015). The stochastically curtailed generalized likelihood ratio: A new termination criterion for variable-length computerized classification tests. Behavior Research Methods, 47(2), 549-561. doi: 10.3758/s13428-014-0490-y (PMID: 24907003)
[22] Kingsbury G. G., & Weiss D. J. (1983). A comparison of IRT-based adaptive mastery testing and a sequential mastery testing procedure. In D. J. Weiss (Ed.), New horizons in testing: Latent trait test theory and computerized adaptive testing (pp. 257-283). New York, NY: Academic Press.
[23] Lewis C., & Sheehan K. (1990). Using Bayesian decision theory to design a computerized mastery test. Applied Psychological Measurement, 14(4), 367-386. doi: 10.1177/014662169001400404
[24] Li C., Moore S. C., Smith J., Bauermeister S., & Gallacher J. (2019). The costs of negative affect attributable to alcohol consumption in later life: A within-between random longitudinal econometric model using UK Biobank. PLOS ONE, 14(2), Article e0211357. https://doi.org/10.1371/journal.pone.0211357
[25] Man K., Harring J. R., Jiao H., & Zhan P. (2019). Joint modeling of compensatory multidimensional item responses and response times. Applied Psychological Measurement, 43(8), 639-654. doi: 10.1177/0146621618824853
[26] Nydick S. (2013). Multidimensional mastery testing with CAT (Unpublished doctoral dissertation). University of Minnesota.
[27] Reckase M. D. (1983). A procedure for decision making using tailored testing. In D. J. Weiss (Ed.), New horizons in testing: Latent trait test theory and computerized adaptive testing (pp. 237-257). New York, NY: Academic Press.
[28] Reckase M. D. (2009). Multidimensional item response theory. New York, NY: Springer.
[29] Seitz N.-N., & Frey A. (2013). The sequential probability ratio test for multidimensional adaptive testing with between-item multidimensionality. Psychological Test and Assessment Modeling, 55(1), 105-123.
[30] Sie H., Finkelman M. D., Riley B., & Smits N. (2015). Utilizing response times in computerized classification testing. Applied Psychological Measurement, 39(5), 389-405. doi: 10.1177/0146621615569504
[31] Smits N., & Finkelman M. D. (2013). A comparison of computerized classification testing and computerized adaptive testing in clinical psychology. Journal of Computerized Adaptive Testing, 1, 19-37.
[32] Smits N., Finkelman M. D., & Kelderman H. (2016). Stochastic curtailment of questionnaires for three-level classification: Shortening the CES-D for assessing low, moderate, and high risk of depression. Applied Psychological Measurement, 40(1), 22-36. doi: 10.1177/0146621615592294
[33] Sobel M., & Wald A. (1949). A sequential decision procedure for choosing one of three hypotheses concerning the unknown mean of a normal distribution. Annals of Mathematical Statistics, 20(4), 502-522.
[34] Spray J. A. (1993). Multiple-category classification using a sequential probability ratio test (ACT Research Report Series, No. 93-7). Iowa City, IA: American College Testing.
[35] Spray J. A., & Reckase M. D. (1996). Comparison of SPRT and sequential Bayes procedures for classifying examinees into two categories using a computerized test. Journal of Educational and Behavioral Statistics, 21(4), 405-414.
[36] Tartakovsky A., Nikiforov I., & Basseville M. (2014). Sequential analysis: Hypothesis testing and changepoint detection. Boca Raton, FL: Chapman and Hall/CRC.
[37] Thompson N. A. (2009). Item selection in computerized classification testing. Educational and Psychological Measurement, 69(5), 778-793. doi: 10.1177/0013164408324460
[38] Thompson N. A. (2011). Termination criteria for computerized classification testing. Practical Assessment, Research, & Evaluation, 16(4), 1-7.
[39] Tian C. (2018). Comparison of four stopping rules in computerized adaptive testing and examination of their application to on-the-fly multistage testing (Unpublished master's thesis). University of Illinois.
[40] van der Linden W. J., & Mellenbergh G. J. (1977). Optimal cutting scores using a linear loss function. Applied Psychological Measurement, 1(4), 593-599. doi: 10.1177/014662167700100414
[41] van der Linden W. J., & Vos H. J. (1996). A compensatory approach to optimal selection with mastery scores. Psychometrika, 61, 155-172. doi: 10.1007/BF02296964
[42] van Groen M. M., Eggen T. J. H. M., & Veldkamp B. P. (2014). Item selection methods based on multiple objective approaches for classifying respondents into multiple levels. Applied Psychological Measurement, 38(3), 187-200. doi: 10.1177/0146621613509723
[43] Vos H. J. (1997a). Simultaneous optimization of quota-restricted selection decisions with mastery scores. British Journal of Mathematical and Statistical Psychology, 50(1), 105-125. doi: 10.1111/j.2044-8317.1997.tb01106.x
[44] Vos H. J. (1997b). A simultaneous approach to optimizing treatment assignments with mastery scores. Multivariate Behavioral Research, 32(4), 403-433. doi: 10.1207/s15327906mbr3204_5
[45] Vos H. J. (1999). Applications of Bayesian decision theory to sequential mastery testing. Journal of Educational and Behavioral Statistics, 24(3), 271-292.
[46] Wald A. (1947). Sequential analysis. New York, NY: John Wiley.
[47] Wald A., & Wolfowitz J. (1948). Optimum character of the sequential probability ratio test. The Annals of Mathematical Statistics, 19, 326-339.
[48] Wang C., Chen P., & Huebner A. (2021). Stopping rules for multi-category computerized classification testing. British Journal of Mathematical and Statistical Psychology, 74(2), 184-202.
[49] Wang Z. (2019). Grid multi-classification adaptive classification testing with multidimensional polytomous items (Unpublished doctoral dissertation). University of Minnesota.
[50] Zhan P., Jiao H., Man K., Wang W.-C., & He K. (2021). Variable speed across dimensions of ability in the joint model for responses and response times. Frontiers in Psychology, 12, Article 469196. https://doi.org/10.3389/fpsyg.2021.469196
[51] Zheng Y., Cheon H., & Katz C. M. (2020). Using machine learning methods to develop a short tree-based adaptive classification test: Case study with a high-dimensional item pool and imbalanced data. Applied Psychological Measurement, 44(7-8), 499-514. https://doi.org/10.1177/0146621620931198