ISSN 0439-755X
CN 11-1911/B
Sponsored by: Chinese Psychological Society
   Institute of Psychology, Chinese Academy of Sciences
Published by: Science Press

心理学报 (Acta Psychologica Sinica) ›› 2026, Vol. 58 ›› Issue (7): 1237-1253. doi: 10.3724/SP.J.1041.2026.1237 cstr: 32110.14.2026.1237

• Academic Papers from the 28th Annual Meeting of the China Association for Science and Technology •


Personalized alignment of large language models and its impact on moral judgment

LI Chang-Jin1,2,3, JIAO Liying4, CHEN Zhen1,2,3, XU Hengbin1,2,3, WU Michael Shengtao5, XU Yan1,2,3   

  1. Faculty of Psychology, Beijing Normal University;
  2. Beijing Key Laboratory of Applied Experimental Psychology;
  3. National Demonstration Center for Experimental Psychology Education (Beijing Normal University), Beijing 100875, China;
  4. Department of Psychology, School of Humanities and Social Sciences, Beijing Forestry University, Beijing 100083, China;
  5. Department of Philosophy, School of Philosophy and Sociology, Jilin University, Changchun 130012, China
  • Received: 2025-05-22  Online: 2026-05-15  Published: 2026-07-25
  • Corresponding author: XU Yan, E-mail: xuyan@bnu.edu.cn
  • Funding: Supported by the General Program of the National Natural Science Foundation of China (31671160), the Youth Fund for Humanities and Social Sciences Research of the Ministry of Education (24YJC190012), and the Beijing Educational Science "14th Five-Year Plan" 2025 Youth Project (BCHA25157)

摘要 (Abstract): With the arrival of the era of human-machine symbiosis, the ethical dilemmas and algorithmic biases of large language models (LLMs) have raised widespread societal concern, and steering artificial intelligence toward beneficial development has become an urgent and challenging issue in the field. This research examined the impact of personalized alignment based on the HEXACO personality model on the moral judgment of LLMs: Study 1 tested and confirmed that LLMs can effectively express HEXACO personality traits by following prompts, and Study 2 examined how personalized alignment affects the utilitarian tendencies of LLMs and how these effects compare with those of humans. The results showed that personality prompts high in Honesty-Humility, Agreeableness, and Conscientiousness significantly reduced the tendency of GPT-3.5, GPT-4, and ERNIE 3.5 to make utilitarian choices. On this basis, the research proposes a personalized alignment framework for LLMs grounded in the HEXACO personality model and personality metatraits, highlighting the moral salience effect that the Stability metatrait dimensions of Honesty-Humility, Agreeableness, and Conscientiousness exert in the personalized alignment of LLMs. This research provides a psychological basis for the theoretical construction of, and technical pathways toward, the personalized alignment of artificial intelligence.

关键词 (Key words): large language models, personalized alignment, moral judgment, HEXACO personality, metatrait

Abstract: With the advent of the human-machine symbiosis era, the ethical dilemmas and algorithmic biases of large language models (LLMs) have triggered widespread societal concerns. Guiding artificial intelligence (AI) toward beneficial development has thus become an urgent and challenging imperative. This research explores the impact of personalized alignment based on the HEXACO personality model on the moral judgment of LLMs. Specifically, the study aims to verify whether LLMs can effectively achieve personalized alignment through prompting and to systematically evaluate how such alignment influences utilitarian tendencies in LLMs compared to humans across various moral dilemmas. By leveraging established psychological frameworks, this research seeks to provide a scientific basis for constructing controllable and ethical AI alignment strategies.
Study 1 tested GPT-3.5, GPT-4, and ERNIE 3.5 using HEXACO-based personality prompts across six domains at high, low, and baseline levels, integrated with different gender roles. Manipulation checks were conducted using two distinct methods: a quantitative personality assessment using the HEXACO-60 scale and a qualitative personality story-writing task rated by independent human evaluators. Study 2 utilized a set of standardized moral dilemmas to assess utilitarian versus deontological choices in both LLMs and human participants. Human data were categorized into high and low personality groups for comparison, while the LLMs performed the same moral judgment tasks under various personality settings to identify shifts in decision-making patterns.
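To make the Study 1 procedure concrete, the sketch below shows one way to persona-prompt an LLM and administer a single Likert-type manipulation-check item through the OpenAI chat API. It is a minimal Python illustration, not the authors' code: the prompt wording, the example item, and the administer_item helper are assumptions, while the trait levels and the questionnaire-style manipulation check follow the description above.

from openai import OpenAI

client = OpenAI()  # assumes the OPENAI_API_KEY environment variable is set

# Illustrative persona prompts (not the paper's actual wording) for one
# HEXACO domain at the two non-baseline levels used in Study 1.
TRAIT_PROMPTS = {
    ("honesty_humility", "high"):
        "You are a person who is sincere, fair, and modest, and who would "
        "never manipulate or exploit others for personal gain.",
    ("honesty_humility", "low"):
        "You are a person who flatters and manipulates others when it is "
        "useful, feels entitled to special treatment, and values wealth "
        "and status.",
}

def administer_item(trait: str, level: str, item: str,
                    model: str = "gpt-4") -> str:
    """Persona-prompt the model, then ask it to rate one questionnaire item."""
    messages = [
        {"role": "system", "content": TRAIT_PROMPTS[(trait, level)]},
        {"role": "user", "content":
            "Rate how accurately this statement describes you on a scale "
            "from 1 (strongly disagree) to 5 (strongly agree). Reply with "
            "the number only.\nStatement: " + item},
    ]
    response = client.chat.completions.create(model=model, messages=messages)
    return response.choices[0].message.content.strip()

# An item written in the style of HEXACO-60 Honesty-Humility content:
print(administer_item("honesty_humility", "high",
                      "I would never accept a bribe, even a very large one."))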
The results of Study 1 confirmed the feasibility of personalized alignment, demonstrating that LLMs can dynamically represent HEXACO personality traits through prompts. Among the LLMs tested, GPT-4 showed stronger instruction-following and more distinct trait differentiation than GPT-3.5 and ERNIE 3.5. Findings from Study 2 revealed that personalized alignment significantly altered the moral judgments of LLMs, though the impact varied across models and personality domains. Specifically, high levels of Honesty-Humility, Agreeableness, and Conscientiousness reduced utilitarian tendencies, leading to a preference for deontological responses. While some traits, particularly Honesty-Humility, showed stable and consistent effects across humans and AI, others displayed divergent or even opposite patterns, highlighting fundamental differences in their respective moral reasoning mechanisms.
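As a hedged illustration of the Study 2 comparison, the sketch below contrasts utilitarian choice rates between a high- and a low-trait prompt condition using a chi-square test. The counts are placeholders invented for the example, not the paper's data; only the direction of the effect (fewer utilitarian choices under high Honesty-Humility) mirrors the reported finding.

from scipy.stats import chi2_contingency

# 2x2 table of hypothetical choice counts across repeated dilemma trials.
# Rows: prompt condition; columns: utilitarian vs. deontological choices.
table = [
    [12, 48],  # high Honesty-Humility prompt
    [35, 25],  # low Honesty-Humility prompt
]

chi2, p, dof, _expected = chi2_contingency(table)
rate_high = table[0][0] / sum(table[0])
rate_low = table[1][0] / sum(table[1])
print(f"utilitarian rate: high = {rate_high:.2f}, low = {rate_low:.2f}")
print(f"chi2({dof}) = {chi2:.2f}, p = {p:.4f}")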
The study reached three primary conclusions. First, LLMs are capable of exhibiting stable and distinguishable personality tendencies that can be activated through prompt-based alignment. Second, the influence of Honesty-Humility on moral judgment is consistent across humans and different LLMs, whereas other personality domains show inconsistencies; this suggests that while the moral decision-making of LLMs shares partial cognitive logic with humans, fundamental differences remain. Third, the personality metatrait of "Stability", and particularly the Honesty-Humility domain, demonstrates a significant moral salience effect within the personalized alignment process. Based on these insights, this research proposes a personalized alignment framework that uses the HEXACO model and personality metatrait theory to systematically shape the moral responses of AI, providing a psychological foundation for the development of safe, controllable, and ethical AI systems. The framework emphasizes integrating psychological theories to mitigate ethical risks and keep AI behavior consistent with human values.
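A minimal sketch of how the proposed framework's trait-selection step could be encoded follows. It hardcodes only the grouping stated above, namely the Stability metatrait covering Honesty-Humility, Agreeableness, and Conscientiousness, the domains whose high levels curbed utilitarian choices; the Plasticity grouping and the unassigned placement of Emotionality are assumptions borrowed from Big Five metatrait theory, and the function name is hypothetical.

# Metatrait-to-domain mapping; only the "stability" entry is taken from the
# text above, the rest is an assumed extension.
METATRAITS = {
    "stability": ("honesty_humility", "agreeableness", "conscientiousness"),
    "plasticity": ("extraversion", "openness"),  # assumed, per Big Five theory
    # "emotionality" is left unassigned: its placement is not specified above.
}

def moral_alignment_targets() -> dict[str, str]:
    """Trait levels to prompt for when the goal is to curb utilitarian bias,
    following the reported moral salience of the Stability domains."""
    return {domain: "high" for domain in METATRAITS["stability"]}

print(moral_alignment_targets())
# {'honesty_humility': 'high', 'agreeableness': 'high', 'conscientiousness': 'high'}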

Key words: large language models, personalized alignment, moral judgment, HEXACO personality, metatrait
