Faking modeling for forced choice measures in personality assessment based on RES theoretical framework

doi:10.3724/SP.J.1041.2025.1832

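Abstract

Abstract (Chinese version):

Compared with Likert-type self-report scales, forced-choice tests offer some resistance to faking because their items are matched on social desirability. However, a large body of research shows that an item's desirability changes depending on which items it is matched with in a block, and that it also changes across assessment contexts. Faking therefore remains unavoidable in forced-choice tests, and it seriously undermines the accuracy and fairness of the measurement results. To address this, the present study builds a statistical model of faking in forced-choice tests (abbreviated RES-TIRT) on the basis of the Thurstonian IRT model (TIRT) and Böckenholt's (2014) RES model of faking. Two simulation studies examined the performance of the new model and compared it with the traditional model, and an empirical study then explored the application and advantages of the new model in Big Five personality assessment. The simulation results showed that (1) the RES-TIRT model was estimated well across the simulated conditions; and (2) for both item parameters and person parameters, the estimation accuracy of RES-TIRT was clearly better than that of the traditional TIRT model. The empirical study applied the new model to a real Big Five personality assessment and compared the honest-response group with the faking group. Compared with the traditional TIRT model, RES-TIRT effectively reduced or even eliminated the negative influence of faking on the measurement results and further improved the faking resistance of the forced-choice test, supporting the advantages and application prospects of the RES-TIRT model.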

Keywords: Thurstonian IRT model, forced-choice test, faking, personality assessment

Abstract:

Although forced-choice (FC) assessments with social desirability matching reduce faking compared to Likert scales, the desirability of an item may shift depending on the other items in its block and may vary across contexts. Consequently, faking remains a persistent issue in FC assessments, compromising measurement accuracy and fairness. To address this, we propose a statistical model (RES-TIRT) for detecting and mitigating faking in FC assessments, integrating Böckenholt’s (2014) model of RES faking theory with the Thurstonian item response theory (TIRT) model (Brown & Maydeu-Olivares, 2011). Our approach aims to minimize the adverse effects of faking and enhance the robustness of FC measures.

Two simulation studies were conducted to evaluate the proposed RES-TIRT model. Simulation Study 1 examined model performance under varying conditions (sample size, FC scale format, item direction, trait correlation, and dimensionality). Results indicated optimal estimation accuracy when using 3-item blocks, 3 dimensions, a correlation of 0 between dimensions, and a mix of positive and negative item descriptions. Simulation Study 2 compared trait estimation accuracy between the TIRT and RES-TIRT models under increasing faking prevalence. While the TIRT model performed better in faking-free conditions, its accuracy declined more sharply than that of the RES-TIRT model as faking increased, particularly for item parameters, demonstrating the RES-TIRT model’s superior resistance to faking.

An empirical analysis further validated the model’s applicability in real-world settings. Comparing honest responses with faked responses (simulating lawyer job applications), we found that applicants strategically inflated traits such as openness, agreeableness, and extraversion to meet job requirements. The RES-TIRT model detected these distortions, showing significant discrepancies in these dimensions compared to the TIRT model. Additionally, the RES-TIRT model captured response distortion tendencies, as evidenced by significantly elevated latent trait values θ_j^E under faking conditions compared to honest responses. This indicates that the faking behavior of the applicants can be successfully captured by the RES-TIRT model. Moreover, the difficulty parameter β^E_im, which governs how readily a block triggers fake answers, can be inspected to determine whether an FC block is prone to faking. These empirically derived parameters enable targeted refinements in FC measure development, allowing test constructors to modify or eliminate items with low faking thresholds, thereby enhancing the scale's overall resistance to response biases.

In conclusion, both simulation and empirical studies have demonstrated that the RES-TIRT model is a viable alternative to the TIRT model. It can be employed to address the issue of faking in FC scales, particularly in high-stakes situations such as talent selection.
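As background for the TIRT model that RES-TIRT extends, the following is a minimal sketch of the pairwise-comparison response probability in the form given by Brown & Maydeu-Olivares (2011). The function and argument names are illustrative assumptions, and the RES-specific components (θ_j^E, β^E_im) are deliberately left out because their functional form is not stated in this abstract.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def tirt_pair_prob(eta_a, eta_b, lam_i, lam_k, gamma, psi2_i, psi2_k):
    """Probability that item i is preferred over item k in one block.

    eta_a, eta_b   : latent traits measured by items i and k
    lam_i, lam_k   : factor loadings of the two items
    gamma          : threshold of the pairwise comparison
    psi2_i, psi2_k : unique (error) variances of the two item utilities
    """
    z = (-gamma + lam_i * eta_a - lam_k * eta_b) / math.sqrt(psi2_i + psi2_k)
    return normal_cdf(z)


# Illustrative values only (not taken from the study).
p_equal = tirt_pair_prob(0.0, 0.0, 1.0, 1.0, 0.0, 1.0, 1.0)
p_higher = tirt_pair_prob(1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 1.0)
print(p_equal, p_higher)
```

With equal traits, equal loadings, and a zero threshold, neither item is favored; raising the trait behind item i raises the probability that item i is chosen.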

Key words: Thurstonian IRT model, forced-choice measures, faking responses, personality assessment

CLC number: B841

HE Cuiting, PENG Siwei, ZHU Yian, WANG Daxun, CAI Yan, TU Dongbo. (2025). Faking modeling for forced choice measures in personality assessment based on RES theoretical framework. Acta Psychologica Sinica (心理学报), 57(10), 1832-1848.

Figures/Tables: 15

References: 48

[1]	Bartram, D. (2007). Increasing validity with forced-choice criterion measurement formats. International Journal of Selection and Assessment, 15(3), 263-272.
[2]	Birkeland, S. A., Manson, T. M., Kisamore, J. L., Brannick, M. T., & Smith, M. A. (2006). A meta-analytic investigation of job applicant faking on personality measures. International Journal of Selection and Assessment, 14(4), 317-335.
[3]	Böckenholt, U. (2014). Modeling motivated misreports to sensitive survey questions. Psychometrika, 79(3), 515-537. doi: 10.1007/s11336-013-9390-9 pmid: 24297438
[4]	Brooks, S. P., & Gelman, A. (1998). General methods for monitoring convergence of iterative simulations. Journal of Computational and Graphical Statistics, 7(4), 434-455.
[5]	Brown, A., & Maydeu-Olivares, A. (2011). Item response modeling of forced-choice questionnaires. Educational and Psychological Measurement, 71(3), 460-502.
[6]	Bunji, K., & Okada, K. (2020). Joint modeling of the two-alternative multidimensional forced-choice personality measurement and its response time by a Thurstonian D-diffusion item response model. Behavior Research Methods, 52(3), 1091-1107. doi: 10.3758/s13428-019-01302-5 pmid: 32394181
[7]	Bürkner, P. C., Schulte, N., & Holling, H. (2019). On the statistical and practical limitations of Thurstonian IRT models. Educational and Psychological Measurement, 79(5), 827-854.
[8]	Cao, M., & Drasgow, F. (2019). Does forcing reduce faking? A meta-analytic review of forced-choice personality measures in high-stakes situations. Journal of Applied Psychology, 104(11), 1347-1368.
[9]	Chen, C.-W., Wang, W.-C., Chiu, M. M., & Ro, S. (2020). Item selection and exposure control methods for computerized adaptive testing with multidimensional ranking items. Journal of Educational Measurement, 57(2), 343-369.
[10]	Cheung, M., & Chan, W. (2002). Reducing uniform response bias with ipsative measurement in multiple-group confirmatory factor analysis. Structural Equation Modeling, 9(1), 55-77.
[11]	de Valpine, P., Turek, D., Paciorek, C., Anderson-Bergman, C., Temple Lang, D., & Bodik, R. (2017). Programming with models: Writing statistical algorithms for general model structures with NIMBLE. Journal of Computational and Graphical Statistics, 26(2), 403-413.
[12]	Faul, F., Erdfelder, E., Lang, A.-G., & Buchner, A. (2007). G*Power 3: A flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behavior Research Methods, 39(2), 175-191. doi: 10.3758/bf03193146 pmid: 17695343
[13]	Frick, S. (2022). Modeling faking in the multidimensional forced-choice format: The faking mixture model. Psychometrika, 87(2), 773-794.
[14]	Gelman, A., & Rubin, D. B. (1992). Inference from iterative simulation using multiple sequences. Statistical Science, 7(4), 457-472.
[15]	Guenole, N., Brown, A. A., & Cooper, A. J. (2018). Forced-choice assessment of work-related maladaptive personality traits: Preliminary evidence from an application of Thurstonian item response modeling. Assessment, 25(4), 513-526. doi: 10.1177/1073191116641181 pmid: 27056730
[16]	Guo, Z., Wang, D., Cai, Y., & Tu, D. (2023). An item response theory model for incorporating response times in forced-choice measures. Educational and Psychological Measurement, 84(3), 450-480.
[17]	Heggestad, E. D., Morrison, M., Reeve, C. L., & McCloy, R. A. (2006). Forced-choice assessments of personality for selection: Evaluating issues of normative assessment and faking resistance. Journal of Applied Psychology, 91(1), 9-24. pmid: 16435935
[18]	Holtgraves, T. (2004). Social desirability and self-reports: Testing models of socially desirable responding. Personality & Social Psychology Bulletin, 30(2), 161-172.
[19]	Huang, H. Y. (2023). Diagnostic classification model for forced-choice items and noncognitive tests. Educational and Psychological Measurement, 83(1), 146-180.
[20]	Hughes, A. W., Dunlop, P. D., Holtrop, D., & Wee, S. (2021). Spotting the “Ideal” personality response: Effects of item matching in forced choice measures for personnel selection. Journal of Personnel Psychology, 20(1), 17-26. doi: 10.1027/1866-5888/a000267
[21]	Jackson, D. N., Wroblewski, V. R., & Ashton, M. C. (2000). The impact of faking on employment tests: Does forced choice offer a solution? Human Performance, 13(4), 371-388.
[22]	Joo, S.-H., Lee, P., & Stark, S. (2020). Adaptive testing with the GGUM-RANK multidimensional forced choice model: Comparison of pair, triplet, and tetrad scoring. Behavior Research Methods, 52(2), 761-772.
[23]	König, C. J., Merz, A. S., & Trauffer, N. (2012). What is in applicants' minds when they fill out a personality test? Insights from a qualitative study. International Journal of Selection and Assessment, 20(4), 442-452.
[24]	Kreitchmann, R. S., Sorrel, M. A., & Abad, F. J. (2023). On bank assembly and block selection in multidimensional forced-choice adaptive assessments. Educational and Psychological Measurement, 83(2), 294-321. doi: 10.1177/00131644221087986 pmid: 36866066
[25]	Lin, Y., & Brown, A. (2017). Influence of context on item parameters in forced-choice personality assessments. Educational and Psychological Measurement, 77(3), 389-414. doi: 10.1177/0013164416646162 pmid: 29795919
[26]	Liu, J., Zheng, C. J., Li, Y. C., & Lian, X. (2022). IRT-based scoring methods for multidimensional forced choice tests [in Chinese]. Advances in Psychological Science, 30(6), 1410-1428. doi: 10.3724/SP.J.1042.2022.01410
[27]	MacCann, C., Ziegler, M., & Roberts, R. D. (2011). Faking in personality assessment. In M. Ziegler, C. MacCann, & R. Roberts (Eds.), New perspectives on faking in personality assessment (pp. 309-329). Oxford University Press.
[28]	Morrison, E., & Bies, R. (1991). Impression management in the feedback-seeking process: A literature review and research agenda. Academy of Management Review, 16(3), 522-541.
[29]	Pauls, C. A., & Crost, N. W. (2005). Effects of different instructional sets on the construct validity of the NEO-PI-R. Personality and Individual Differences, 39(2), 297-308.
[30]	Peng, S., Man, K., Veldkamp, B. P., Cai, Y., & Tu, D. (2024). A mixture model for random responding behavior in forced-choice noncognitive assessment: Implication and application in organizational research. Organizational Research Methods, 27(3), 414-442.
[31]	Sass, R., Frick, S., Reips, U.-D., & Wetzel, E. (2020). Taking the test taker's perspective: Response process and test motivation in multidimensional forced-choice versus rating scale instruments. Assessment, 27(3), 572-584. doi: 10.1177/1073191118762049 pmid: 29560735
[32]	Saville, P., & Willson, E. (1991). The reliability and validity of normative and ipsative approaches in the measurement of personality. Journal of Occupational Psychology, 64(3), 219-238.
[33]	Sinharay, S., Johnson, M. S., & Stern, H. S. (2006). Posterior predictive assessment of item response theory models. Applied Psychological Measurement, 30(4), 298-321.
[34]	Smith, D. B., & Ellingson, J. E. (2002). Substance versus style: A new look at social desirability in motivating contexts. Journal of Applied Psychology, 87(2), 211-219. pmid: 12002950
[35]	Speer, A. B., Wegmeyer, L. J., Tenbrink, A. P., Delacruz, A. Y., Christiansen, N. D., & Salim, R. M. (2023). Comparing forced-choice and single-stimulus personality scores on a level playing field: A meta-analysis of psychometric properties and susceptibility to faking. Journal of Applied Psychology, 108(11), 1812-1833. doi: 10.1037/apl0001099 pmid: 37326537
[36]	Tourangeau, R., & Rasinski, K. A. (1988). Cognitive processes underlying context effects in attitude measurement. Psychological Bulletin, 103(3), 299-314.
[37]	Trent, J. D., Barron, L. G., Rose, M. R., & Carretta, T. R. (2020). Tailored adaptive personality assessment system (TAPAS) as an indicator for counterproductive work behavior: Comparing validity in applicant, honest, and directed faking conditions. Military Psychology, 32(1), 51-59.
[38]	van der Linden, D., te Nijenhuis, J., & Bakker, A. B. (2010). The general factor of personality: A meta-analysis of big five intercorrelations and a criterion-related validity study. Journal of Research in Personality, 44(3), 315-327.
[39]	Vehtari, A., Gelman, A., & Gabry, J. (2017). Practical Bayesian model evaluation using leave-one-out cross validation and WAIC. Statistics and Computing, 27(5), 1413-1432.
[40]	Viswesvaran, C., & Ones, D. S. (1999). Meta-analyses of fakability estimates: Implications for personality measurement. Educational and Psychological Measurement, 59(2), 197-210.
[41]	Walczyk, J. J., Schwartz, J. P., Clifton, R., Adams, B., Wei, M., & Zha, P. (2005). Lying person-to-person about life events: A cognitive framework for lie detection. Personnel Psychology, 58(1), 141-170.
[42]	Wang, Q., Zheng, Y., Liu, K., Cai, Y., Peng, S., & Tu, D. (2024). Item selection methods in multidimensional computerized adaptive testing for forced-choice items using Thurstonian IRT model. Behavior Research Methods, 56(2), 600-614.
[43]	Wang, S., Lou, F., & Liu, H. Y. (2014). The conventional and the IRT-based scoring methods of forced-choice personality tests [in Chinese]. Advances in Psychological Science, 22(3), 549-557. doi: 10.3724/SP.J.1042.2014.00549
[44]	Wen, H. B., & Wang, S. M. (2020). Influence of block context on accuracy of ability parameter estimated by unfolding model in forced-choice test [in Chinese]. Journal of Psychological Science, 43(4), 990-996.
[45]	Wetzel, E., & Frick, S. (2020). Comparing the validity of trait estimates from the multidimensional forced-choice format and the rating scale format. Psychological Assessment, 32(3), 239-253. doi: 10.1037/pas0000781 pmid: 31738070
[46]	Wetzel, E., Frick, S., & Brown, A. (2021). Does multidimensional forced-choice prevent faking? Comparing the susceptibility of the multidimensional forced-choice format and the rating scale format to faking. Psychological Assessment, 33(2), 156-170. doi: 10.1037/pas0000971 pmid: 33151727
[47]	Zhang, B., Sun, T., Drasgow, F., Chernyshenko, O. S., Nye, C. D., Stark, S., & White, L. A. (2020). Though forced, still valid: Psychometric equivalence of forced-choice and single-statement measures. Organizational Research Methods, 23(3), 569-590.
[48]	Ziegler, M. (2011). Applicant faking: A look into the black box. The Industrial-Organizational Psychologist, 49(1), 29-36.

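Experimental factor	Levels
Sample size (N)	500, 1000
Forced-choice format (BS)	2 items per block, 3 items per block
Item keying (Key)	all positively keyed (+), mixed positively and negatively keyed (+/−)
Correlation between traits (COR)	0, 0.5
Number of dimensions (Dim)	3, 5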
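
The five two-level factors above can be crossed into a full factorial design. The sketch below enumerates the resulting cells; it assumes a fully crossed design, which this table suggests but does not state outright, and the dictionary keys simply reuse the abbreviations from the table.

```python
from itertools import product

# Factor levels copied from the design table; "+/-" stands for mixed keying.
factors = {
    "N": [500, 1000],
    "BS": [2, 3],
    "Key": ["+", "+/-"],
    "COR": [0.0, 0.5],
    "Dim": [3, 5],
}

# One dict per simulation condition, assuming all factors are fully crossed.
conditions = [dict(zip(factors, levels)) for levels in product(*factors.values())]

print(len(conditions))  # 2 ** 5 = 32 conditions
print(conditions[0])
```

Each dictionary can then be passed to a data-generation routine, one call per replication of that condition.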

BS	Dim	Key	COR	Time (min), N = 500	PSRF < 1.1 (proportion), N = 500	Time (min), N = 1000	PSRF < 1.1 (proportion), N = 1000
2	3	+	0	9.320	0.999	23.188	0.998
2	3	+	0.5	9.304	0.998	22.985	0.998
2	3	+/−	0	9.330	0.999	23.325	0.999
2	3	+/−	0.5	9.233	0.999	22.851	0.999
2	5	+	0	9.294	0.998	23.385	0.999
2	5	+	0.5	9.337	0.998	20.787	0.999
2	5	+/−	0	9.310	0.999	20.784	0.999
2	5	+/−	0.5	8.549	0.998	20.485	0.999
3	3	+	0	17.684	0.998	48.273	0.999
3	3	+	0.5	17.808	0.997	48.027	0.998
3	3	+/−	0	17.714	0.999	51.149	1.000
3	3	+/−	0.5	17.846	0.999	47.736	0.998
3	5	+	0	18.009	0.998	48.284	0.999
3	5	+	0.5	17.741	0.998	48.663	0.999
3	5	+/−	0	19.280	0.998	49.208	1.000
3	5	+/−	0.5	22.088	0.998	48.614	0.999
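
The PSRF columns above refer to the potential scale reduction factor of Gelman & Rubin (1992), with values below 1.1 conventionally read as evidence that the chains have converged. The sketch below computes the basic form of the statistic for a single parameter. It is an illustration using only the Python standard library, not the authors' estimation code, and the two sets of simulated chains are made up for the example.

```python
import math
import random
from statistics import mean, variance


def psrf(chains):
    """Basic potential scale reduction factor for one parameter.

    chains: list of m chains, each a list of n post-burn-in draws.
    """
    n = len(chains[0])
    chain_means = [mean(chain) for chain in chains]
    within = mean(variance(chain) for chain in chains)  # W: mean within-chain variance
    between = n * variance(chain_means)                 # B: between-chain variance
    pooled = (n - 1) / n * within + between / n         # pooled variance estimate
    return math.sqrt(pooled / within)


random.seed(0)
# Three chains sampling the same distribution (well mixed).
mixed = [[random.gauss(0.0, 1.0) for _ in range(2000)] for _ in range(3)]
# The same draws shifted by 0, 2, and 4: chains stuck in different regions.
stuck = [[x + 2.0 * k for x in chain] for k, chain in enumerate(mixed)]

print(psrf(mixed) < 1.1)  # True: chains agree
print(psrf(stuck) < 1.1)  # False: chains disagree
```

Applying `psrf` to every monitored parameter and averaging the indicator `psrf(...) < 1.1` gives a proportion of the kind reported in the table.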

BS	Dim	Key	COR	θ_j^R_mean	d	a	θ_j^E	β^E_im
2	3	+	0	−0.003	−0.005	0.009	−0.002	0.000
2	3	+	0.5	0.001	0.106	−0.164	0.003	−0.153
2	3	+/−	0	−0.008	0.001	−0.011	0.004	−0.001
2	3	+/−	0.5	−0.001	0.023	−0.039	−0.001	−0.044
2	5	+	0	0.000	−0.004	0.004	0.004	0.008
2	5	+	0.5	0.005	0.116	−0.214	0.011	−0.173
2	5	+/−	0	0.000	−0.007	−0.009	0.001	0.011
2	5	+/−	0.5	0.003	0.021	−0.038	0.006	−0.036
3	3	+	0	−0.001	0.067	0.054	0.015	−0.099
3	3	+	0.5	0.004	0.139	−0.131	0.031	−0.205
3	3	+/−	0	0.005	0.057	0.065	0.009	−0.076
3	3	+/−	0.5	0.002	0.071	0.024	0.014	−0.089
3	5	+	0	−0.002	0.028	0.053	−0.009	−0.028
3	5	+	0.5	0.007	0.138	−0.161	0.019	−0.169
3	5	+/−	0	−0.001	0.027	0.075	0.004	−0.032
3	5	+/−	0.5	0.007	0.029	0.020	0.003	−0.054