ISSN 0439-755X
CN 11-1911/B
Sponsored by: Chinese Psychological Society
   Institute of Psychology, Chinese Academy of Sciences
Published by: Science Press

心理学报 (Acta Psychologica Sinica) ›› 2026, Vol. 58 ›› Issue (7): 1237-1253. doi: 10.3724/SP.J.1041.2026.1237 cstr: 32110.14.2026.1237

• Academic Papers from the 28th Annual Meeting of the China Association for Science and Technology •


Personalized alignment of large language models and its impact on moral judgment

LI Chang-Jin1,2,3, JIAO Liying4, CHEN Zhen1,2,3, XU Hengbin1,2,3, WU Michael Shengtao5, XU Yan1,2,3   

  1. Faculty of Psychology, Beijing Normal University;
  2. Beijing Key Laboratory of Applied Experimental Psychology;
  3. National Demonstration Center for Experimental Psychology Education (Beijing Normal University), Beijing 100875, China;
  4. Department of Psychology, School of Humanities and Social Sciences, Beijing Forestry University, Beijing 100083, China;
  5. Department of Philosophy, School of Philosophy and Sociology, Jilin University, Changchun 130012, China
  • Received: 2025-05-22  Online: 2026-05-15  Published: 2026-07-25
  • Corresponding author: XU Yan, E-mail: xuyan@bnu.edu.cn
  • Funding: Supported by the General Program of the National Natural Science Foundation of China (31671160), the Youth Fund for Humanities and Social Sciences Research of the Ministry of Education (24YJC190012), and the Beijing Educational Science "14th Five-Year Plan" 2025 Youth Project (BCHA25157)

摘要 (Abstract): With the arrival of the era of human-machine symbiosis, the ethical dilemmas and algorithmic biases of large language models (LLMs) have raised widespread societal concern, and steering artificial intelligence toward beneficial development has become an urgent and challenging issue in the field. This research examined the impact of personalized alignment based on the HEXACO personality model on the moral judgment of LLMs: Study 1 tested and confirmed that LLMs can effectively express HEXACO personality traits by following prompts, and Study 2 examined how personalized alignment affects the utilitarian tendencies of LLMs and how these effects compare with those of humans. The results showed that personality prompts high in Honesty-Humility, Agreeableness, and Conscientiousness significantly reduced the tendency of GPT-3.5, GPT-4, and ERNIE 3.5 to make utilitarian choices. On this basis, the research proposes a personalized alignment framework for LLMs grounded in the HEXACO personality model and personality metatraits, highlighting the moral salience effect that the Stability metatrait dimensions of Honesty-Humility, Agreeableness, and Conscientiousness exert in the personalized alignment of LLMs. This research provides a psychological basis for the theoretical construction of, and technical pathways toward, the personalized alignment of artificial intelligence.

关键词 (Key words): large language models, personalized alignment, moral judgment, HEXACO personality, metatrait

Abstract: With the advent of the human-machine symbiosis era, the ethical dilemmas and algorithmic biases of large language models (LLMs) have triggered widespread societal concerns. Guiding artificial intelligence (AI) toward beneficial development has thus become an urgent and challenging imperative. This research explores the impact of personalized alignment based on the HEXACO personality model on the moral judgment of LLMs. Specifically, the study aims to verify whether LLMs can effectively achieve personalized alignment through prompting and to systematically evaluate how such alignment influences utilitarian tendencies in LLMs compared to humans across various moral dilemmas. By leveraging established psychological frameworks, this research seeks to provide a scientific basis for constructing controllable and ethical AI alignment strategies.
Study 1 tested GPT-3.5, GPT-4, and ERNIE 3.5 using HEXACO-based personality prompts across six domains at high, low, and baseline levels, integrated with different gender roles. Manipulation checks were conducted using two distinct methods: a quantitative personality assessment using the HEXACO-60 scale and a qualitative personality story-writing task rated by independent human evaluators. Study 2 utilized a set of standardized moral dilemmas to assess utilitarian versus deontological choices in both LLMs and human participants. Human data were categorized into high and low personality groups for comparison, while the LLMs performed the same moral judgment tasks under various personality settings to identify shifts in decision-making patterns.
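To make the Study 1 procedure concrete, the sketch below shows one way to persona-prompt an LLM and administer a single Likert-type manipulation-check item through the OpenAI chat API. It is a minimal Python illustration, not the authors' code: the prompt wording, the example item, and the administer_item helper are assumptions, while the trait levels and the questionnaire-style manipulation check follow the description above.

from openai import OpenAI

client = OpenAI()  # assumes the OPENAI_API_KEY environment variable is set

# Illustrative persona prompts (not the paper's actual wording) for one
# HEXACO domain at the two non-baseline levels used in Study 1.
TRAIT_PROMPTS = {
    ("honesty_humility", "high"):
        "You are a person who is sincere, fair, and modest, and who would "
        "never manipulate or exploit others for personal gain.",
    ("honesty_humility", "low"):
        "You are a person who flatters and manipulates others when it is "
        "useful, feels entitled to special treatment, and values wealth "
        "and status.",
}

def administer_item(trait: str, level: str, item: str,
                    model: str = "gpt-4") -> str:
    """Persona-prompt the model, then ask it to rate one questionnaire item."""
    messages = [
        {"role": "system", "content": TRAIT_PROMPTS[(trait, level)]},
        {"role": "user", "content":
            "Rate how accurately this statement describes you on a scale "
            "from 1 (strongly disagree) to 5 (strongly agree). Reply with "
            "the number only.\nStatement: " + item},
    ]
    response = client.chat.completions.create(model=model, messages=messages)
    return response.choices[0].message.content.strip()

# An item written in the style of HEXACO-60 Honesty-Humility content:
print(administer_item("honesty_humility", "high",
                      "I would never accept a bribe, even a very large one."))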
The results of Study 1 confirmed the feasibility of personalized alignment, demonstrating that LLMs can dynamically represent HEXACO personality traits through prompts. Among the LLMs tested, GPT-4 showed stronger instruction-following and more distinct trait differentiation than GPT-3.5 and ERNIE 3.5. Findings from Study 2 revealed that personalized alignment significantly altered the moral judgments of LLMs, though the impact varied across models and personality domains. Specifically, high levels of Honesty-Humility, Agreeableness, and Conscientiousness reduced utilitarian tendencies, leading to a preference for deontological responses. While some traits, particularly Honesty-Humility, showed stable and consistent effects across humans and AI, others displayed divergent or even opposite patterns, highlighting fundamental differences in their respective moral reasoning mechanisms.
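As a hedged illustration of the Study 2 comparison, the sketch below contrasts utilitarian choice rates between a high- and a low-trait prompt condition using a chi-square test. The counts are placeholders invented for the example, not the paper's data; only the direction of the effect (fewer utilitarian choices under high Honesty-Humility) mirrors the reported finding.

from scipy.stats import chi2_contingency

# 2x2 table of hypothetical choice counts across repeated dilemma trials.
# Rows: prompt condition; columns: utilitarian vs. deontological choices.
table = [
    [12, 48],  # high Honesty-Humility prompt
    [35, 25],  # low Honesty-Humility prompt
]

chi2, p, dof, _expected = chi2_contingency(table)
rate_high = table[0][0] / sum(table[0])
rate_low = table[1][0] / sum(table[1])
print(f"utilitarian rate: high = {rate_high:.2f}, low = {rate_low:.2f}")
print(f"chi2({dof}) = {chi2:.2f}, p = {p:.4f}")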
The study reached three primary conclusions. First, LLMs are capable of exhibiting stable and distinguishable personality tendencies that can be activated through prompt-based alignment. Second, the influence of Honesty-Humility on moral judgment is consistent across humans and different LLMs, whereas other personality domains show inconsistencies; this suggests that while the moral decision-making of LLMs shares partial cognitive logic with humans, fundamental differences remain. Third, the personality metatrait of "Stability", and particularly the Honesty-Humility domain, demonstrates a significant moral salience effect within the personalized alignment process. Based on these insights, this research proposes a personalized alignment framework that uses the HEXACO model and personality metatrait theory to systematically shape the moral responses of AI, providing a psychological foundation for the development of safe, controllable, and ethical AI systems. The framework emphasizes integrating psychological theories to mitigate ethical risks and keep AI behavior consistent with human values.
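A minimal sketch of how the proposed framework's trait-selection step could be encoded follows. It hardcodes only the grouping stated above, namely the Stability metatrait covering Honesty-Humility, Agreeableness, and Conscientiousness, the domains whose high levels curbed utilitarian choices; the Plasticity grouping and the unassigned placement of Emotionality are assumptions borrowed from Big Five metatrait theory, and the function name is hypothetical.

# Metatrait-to-domain mapping; only the "stability" entry is taken from the
# text above, the rest is an assumed extension.
METATRAITS = {
    "stability": ("honesty_humility", "agreeableness", "conscientiousness"),
    "plasticity": ("extraversion", "openness"),  # assumed, per Big Five theory
    # "emotionality" is left unassigned: its placement is not specified above.
}

def moral_alignment_targets() -> dict[str, str]:
    """Trait levels to prompt for when the goal is to curb utilitarian bias,
    following the reported moral salience of the Stability domains."""
    return {domain: "high" for domain in METATRAITS["stability"]}

print(moral_alignment_targets())
# {'honesty_humility': 'high', 'agreeableness': 'high', 'conscientiousness': 'high'}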

Key words: large language models, personalized alignment, moral judgment, HEXACO personality, metatrait
