Advances in Psychological Science ›› 2020, Vol. 28 ›› Issue (5): 855-870. doi: 10.3724/SP.J.1042.2020.00855
• Research Methods •
Received: 2019-05-12
Publication date: 2020-04-26
Release date: 2020-03-27
Corresponding author: ZHANG Minqiang
E-mail: 2640726401@qq.com
WANG Shaojie, ZHANG Minqiang, LI Tuoyu, LIANG Zhengyan
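Abstract:
The kernel equating procedure consists of five steps: presmoothing, estimating score probabilities, continuization, equating, and evaluating the equating results. The method combines the advantages of linear equating and equipercentile equating, and each step is readily extensible and can accommodate alternative techniques. Because it applies smoothing and continuization, it can reduce random equating error. Concepts specific to the framework, such as the standard error of equating difference, provide reliable tools for evaluating results. Factors such as the continuization method and the bandwidth selection method can affect its performance, and new methods built on kernel equating offer fresh perspectives for the development of equating. Future work could focus on extending and refining the kernel equating framework, updating its procedure, and combining and comparing equating methods.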
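The continuization and equating steps named in the abstract can be sketched in a few lines. The following is a minimal illustration for an equivalent-groups design, assuming a Gaussian kernel and score probabilities that have already been presmoothed. The function names, the example probabilities, and the bandwidth values are all hypothetical and are not taken from the article. Presmoothing, bandwidth optimization, and the evaluation step are not shown.

```python
import math


def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def mean_var(scores, probs):
    """Mean and variance of a discrete score distribution."""
    mu = sum(p * x for x, p in zip(scores, probs))
    var = sum(p * (x - mu) ** 2 for x, p in zip(scores, probs))
    return mu, var


def continuized_cdf(x, scores, probs, h):
    """Gaussian-kernel continuization of a discrete score distribution.

    The shrinkage factor a keeps the mean and variance of the
    continuized distribution equal to those of the discrete one.
    """
    mu, var = mean_var(scores, probs)
    a = math.sqrt(var / (var + h * h))
    return sum(
        p * norm_cdf((x - a * xj - (1.0 - a) * mu) / (a * h))
        for xj, p in zip(scores, probs)
    )


def continuized_quantile(u, scores, probs, h, iters=200):
    """Invert the continuized CDF by bisection (it is strictly increasing)."""
    mu, var = mean_var(scores, probs)
    pad = 10.0 * (math.sqrt(var) + h)  # generous bracket around the score range
    lo, hi = min(scores) - pad, max(scores) + pad
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if continuized_cdf(mid, scores, probs, h) < u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def kernel_equate(x, x_scores, r, y_scores, s, h_x, h_y):
    """Equate score x on form X to the scale of form Y: G_h^{-1}(F_h(x))."""
    u = continuized_cdf(x, x_scores, r, h_x)
    return continuized_quantile(u, y_scores, s, h_y)


def linear_equate(x, x_scores, r, y_scores, s):
    """Classical linear equating, used here only as a reference point."""
    mu_x, var_x = mean_var(x_scores, r)
    mu_y, var_y = mean_var(y_scores, s)
    return mu_y + math.sqrt(var_y / var_x) * (x - mu_x)


# Hypothetical, already-presmoothed score probabilities for two 5-point forms.
x_scores = [0, 1, 2, 3, 4]
r = [0.10, 0.20, 0.40, 0.20, 0.10]
y_scores = [0, 1, 2, 3, 4]
s = [0.05, 0.15, 0.30, 0.30, 0.20]

# Columns: raw score, small-bandwidth result, large-bandwidth result, linear result.
for x in x_scores:
    small_h = kernel_equate(x, x_scores, r, y_scores, s, 0.6, 0.6)
    large_h = kernel_equate(x, x_scores, r, y_scores, s, 1e3, 1e3)
    linear = linear_equate(x, x_scores, r, y_scores, s)
    print(x, round(small_h, 3), round(large_h, 3), round(linear, 3))
```

The bandwidth of 0.6 is an arbitrary small value chosen for illustration, not an optimal bandwidth. In practice the bandwidth would be selected by a dedicated criterion, and the equated scores would be accompanied by standard errors.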
WANG Shaojie, ZHANG Minqiang, LI Tuoyu, LIANG Zhengyan. (2020). Kernel equating: A framework of observed score equating. Advances in Psychological Science, 28(5), 855-870.
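Equating design | CTT equating | Kernel equating |
---|---|---|
EG | Equipercentile equating | Kernel equating (optimal bandwidth) |
 | Linear equating | Kernel equating (large bandwidth, $h_X > 10\sigma_X$; same below) |
NEAT | Chained equipercentile equating | Chained kernel equating (optimal bandwidth) |
 | Poststratification equipercentile equating | Poststratification kernel equating (optimal bandwidth) |
 | Chained linear equating | Chained kernel equating (large bandwidth) |
 | Tucker equating | Poststratification kernel equating (large bandwidth, under certain conditions) |
 | Levine observed-score equating | - |
Table 1 Correspondence between common CTT equating methods and kernel equating methods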
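The large-bandwidth rows of Table 1 can be motivated with a short derivation. It assumes the Gaussian-kernel continuization as formulated by von Davier, Holland, and Thayer (2004). The symbols $r_j$ (probability of score $x_j$ on form X), $\mu_X$, $\sigma_X$, and $\Phi$ (standard normal CDF) are notation introduced here for illustration and do not appear on this page.

$$
F_{h_X}(x)=\sum_j r_j\,\Phi\!\left(\frac{x-a_X x_j-(1-a_X)\mu_X}{a_X h_X}\right),
\qquad
a_X=\sqrt{\frac{\sigma_X^2}{\sigma_X^2+h_X^2}}
$$

As $h_X \to \infty$, $a_X \to 0$ and $a_X h_X \to \sigma_X$, so every term in the sum tends to the same normal CDF:

$$
F_{h_X}(x)\;\to\;\Phi\!\left(\frac{x-\mu_X}{\sigma_X}\right)
$$

Applying the same limit to the continuized distribution $G_{h_Y}$ of form Y, the equating function approaches the linear equating function:

$$
e_Y(x)=G_{h_Y}^{-1}\!\bigl(F_{h_X}(x)\bigr)\;\to\;\mu_Y+\frac{\sigma_Y}{\sigma_X}\,(x-\mu_X)
$$

This limit is exact only as the bandwidths grow without bound. The threshold $h_X > 10\sigma_X$ in Table 1 is a working rule under which the approximation is already close.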