ISSN 0439-755X
CN 11-1911/B
Sponsored by: Chinese Psychological Society
   Institute of Psychology, Chinese Academy of Sciences
Published by: Science Press

Acta Psychologica Sinica (心理学报)


Personalized Alignment of Large Language Models and Its Impact on Moral Judgment (Special Issue: Psychology and the Development of Artificial Intelligence)

Li Chang-Jin, Jiao Liying, Chen Zhen, Xu Hengbin, Wu Shengtao Michael, Xu Yan

  1. Faculty of Psychology, Beijing Normal University, Beijing 100875, China
  2. Beijing Key Laboratory of Applied Experimental Psychology, Beijing 100875, China
  3. National Demonstration Center for Experimental Psychology Education (Beijing Normal University), Beijing 100875, China
  4. Department of Psychology, School of Humanities and Social Sciences, Beijing Forestry University, Beijing 100083, China
  5. School of Sociology and Anthropology, Xiamen University, Xiamen, Fujian 361005, China
  • Received: 2025-05-23  Revised: 2026-01-04  Accepted: 2026-03-05
  • Supported by:
    Humanities and Social Sciences Youth Fund of the Ministry of Education (24YJC190012); National Natural Science Foundation of China General Program (31671160); Major Project of the National Social Science Fund of China (19ZDA363)

Personalized Alignment of Large Language Models and Its Impact on Moral Judgment

Li Chang-Jin, Jiao Liying, Chen Zhen, Xu Hengbin, Wu Shengtao Michael, Xu Yan   

  1. Faculty of Psychology, Beijing Normal University, Beijing 100875, China
  2. Beijing Key Laboratory of Applied Experimental Psychology, Beijing 100875, China
  3. National Demonstration Center for Experimental Psychology Education (Beijing Normal University), Beijing 100875, China
  4. Department of Psychology, School of Humanities and Social Sciences, Beijing Forestry University, Beijing 100083, China
  5. School of Sociology and Anthropology, Xiamen University, Xiamen, Fujian 361005, China
  • Received: 2025-05-23  Revised: 2026-01-04  Accepted: 2026-03-05

Abstract: With the advent of the era of human-machine symbiosis, the lack of value alignment and the algorithmic biases exposed by large language models (LLMs) in their widespread application have raised serious ethical concerns, making it an urgent challenge to guide AI technology toward benevolence. This research examined the impact of personalized alignment based on the HEXACO personality model on the moral judgment of LLMs. Study 1 tested and confirmed that LLMs can effectively express HEXACO personality traits by following prompts. Study 2 examined the effect of personalized alignment on the utilitarian tendencies of LLMs and their similarities to and differences from humans. The results showed that personality prompts high in Honesty-Humility, Agreeableness, and Conscientiousness significantly reduced the tendency of GPT-3.5, GPT-4, and ERNIE 3.5 to make utilitarian choices. Accordingly, this research proposes a personalized alignment framework for LLMs based on the HEXACO personality model and the theory of personality metatraits, emphasizing the core role of the Stability metatrait's dimensions, namely Honesty-Humility, Agreeableness, and Conscientiousness, in the personalized alignment of LLMs. This research provides a psychological basis for the theoretical construction and practical pathways of personalized AI alignment techniques.

Key words: large language models, personalized alignment, moral judgment, HEXACO model, personality metatrait

Abstract: With the advent of the era of human-machine symbiosis, the lack of value alignment and the algorithmic bias exposed in the widespread application of large language models (LLMs) have triggered serious ethical concerns, making it an urgent challenge to guide AI technology toward benevolence. This research proposed a personalized alignment approach based on the HEXACO model of personality structure and explored its impact on the moral judgment of LLMs. The research aims to validate the ability of LLMs to exhibit HEXACO personality traits via prompts and to systematically compare the utilitarian tendencies of personality-aligned LLMs and humans in moral dilemmas, addressing two core issues: (1) verifying whether LLMs can effectively achieve personalized alignment based on the HEXACO personality model, and (2) systematically examining the impact of personalized alignment on the utilitarian tendencies of LLMs in moral dilemmas, as well as its similarities to and differences from humans, thereby evaluating the effectiveness of different personality traits as alignment targets. Study 1 tested GPT-3.5, GPT-4, and ERNIE 3.5 using HEXACO-based personality prompts (six dimensions at high/low levels plus a baseline), each combined with three gender roles (baseline/male/female; 39 conditions in total). Manipulation checks included: (1) HEXACO-PI-R scale responses (60 items) across five dialogue repetitions per condition (810 observations); and (2) a personal story writing task (generating 108 stories in total), rated by 15 psychology undergraduates on 5-point scales. Both measures were analyzed with Kruskal-Wallis tests, with personality level as the between-subjects variable. Study 2 used 60 moral dilemmas to assess utilitarian choices (yes/no). LLMs responded under Study 1's personality settings (36 prompts × 3 models × 60 dilemmas = 6,480 independent dialogues).
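The factorial design described above can be reproduced by a short enumeration. The sketch below is purely illustrative; the trait, level, and role labels are assumptions drawn from the HEXACO model and the condition counts stated in the abstract:

```python
from itertools import product

traits = ["Honesty-Humility", "Emotionality", "Extraversion",
          "Agreeableness", "Conscientiousness", "Openness to Experience"]
levels = ["high", "low"]
genders = ["baseline", "male", "female"]
models = ["GPT-3.5", "GPT-4", "ERNIE 3.5"]

# Study 1: 6 traits x 2 levels + 1 personality baseline, crossed with 3 gender roles
personality_prompts = list(product(traits, levels)) + [("baseline", None)]
study1_conditions = list(product(personality_prompts, genders))
print(len(study1_conditions))  # 39 conditions (13 x 3)

# Study 2: the 36 non-baseline prompts x 3 models x 60 dilemmas
study2_prompts = list(product(traits, levels, genders))
n_dialogues = len(study2_prompts) * len(models) * 60
print(n_dialogues)  # 6480 independent dialogues
```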
Human participants (N=215, Chinese adults; mean age=30.8; 67 male/148 female) completed moral judgments (T1) and the HEXACO-PI-R (T2), with high/low personality groups defined via extreme scoring (top/bottom 27%). Data aggregation at the dilemma level enabled repeated-measures ANOVAs comparing agent types (GPT-3.5, GPT-4, ERNIE 3.5, and human) and personality levels. Study 1 demonstrated that GPT-3.5, GPT-4, and ERNIE 3.5 could effectively express HEXACO personality traits through prompts, confirming the feasibility of personalized alignment, though effects varied by model performance and task type. Compared with GPT-3.5 and ERNIE 3.5, GPT-4 demonstrated a stronger ability to follow prompts and showed clearer differentiation across personality levels. Study 2 revealed that personality alignment based on the HEXACO model significantly affected the utilitarian tendencies of LLMs, although the direction and strength of these effects varied across models and traits. Specifically, Honesty-Humility, Agreeableness, and Conscientiousness substantially reduced the utilitarian tendencies of all three LLMs. These traits led to more deontological responses at high levels, whereas low levels showed increased utilitarian tendencies. In contrast, Extraversion, Emotionality, and Openness to Experience had weaker and more inconsistent effects. The study reached three main conclusions. First, GPT-3.5, GPT-4, and ERNIE 3.5 demonstrate stable and distinguishable personality tendencies that can be dynamically activated through prompt-based alignment with the HEXACO model. Second, the influence of personality traits on LLMs' moral judgments shows both convergence with and divergence from human patterns. Honesty-Humility showed effects consistent with human patterns, while other traits exhibited inconsistent effects, revealing essential differences in moral reasoning between LLMs and humans. Third, personality alignment based on the HEXACO model provides a viable strategy for guiding LLMs' moral behavior.
In particular, the metatrait Stability, comprising Honesty-Humility, Agreeableness, and Conscientiousness, plays a core role in reducing utilitarian tendencies and enhancing deontological reasoning. Building on this, the study proposes a personalized alignment framework based on the HEXACO model and the theory of personality metatraits to systematically shape LLMs' moral responses.
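The extreme-group split applied to the human sample (top/bottom 27%) can be sketched as follows. This is a minimal illustration of the cutoff logic only; the scores are hypothetical, and the actual grouping in the study may have handled ties and per-trait scoring differently:

```python
def extreme_groups(scores, proportion=0.27):
    """Split scores into low/high groups using bottom/top cutoffs.

    Returns (low_group, high_group); middle scores are discarded,
    as in classic extreme-group (27%) analysis.
    """
    ranked = sorted(scores)
    k = round(len(ranked) * proportion)  # size of each extreme group
    low_cut, high_cut = ranked[k - 1], ranked[-k]
    low = [s for s in scores if s <= low_cut]
    high = [s for s in scores if s >= high_cut]
    return low, high

# hypothetical HEXACO trait scores for 10 participants
scores = [2.1, 3.4, 4.8, 1.9, 3.0, 4.5, 2.7, 3.8, 4.1, 2.3]
low, high = extreme_groups(scores)
print(sorted(low))   # [1.9, 2.1, 2.3]
print(sorted(high))  # [4.1, 4.5, 4.8]
```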

Key words: large language models, personalized alignment, moral judgment, HEXACO model, personality metatrait