ISSN 0439-755X
CN 11-1911/B
Sponsored by: Chinese Psychological Society
   Institute of Psychology, Chinese Academy of Sciences
Published by: Science Press

Acta Psychologica Sinica (心理学报)


Personalized Alignment of Large Language Models and Its Impact on Moral Judgment (Special Issue: Psychology and the Development of Artificial Intelligence)

Li Chang-Jin, Jiao Liying, Chen Zhen, Xu Hengbin, Wu Shengtao Michael, Xu Yan

  1. Faculty of Psychology, Beijing Normal University, Beijing 100875, China
  2. Beijing Key Laboratory of Applied Experimental Psychology, Beijing 100875, China
  3. National Demonstration Center for Experimental Psychology Education (Beijing Normal University), Beijing 100875, China
  4. Department of Psychology, School of Humanities and Social Sciences, Beijing Forestry University, Beijing 100083, China
  5. School of Sociology and Anthropology, Xiamen University, Xiamen, Fujian 361005, China
  • Received: 2025-05-23  Revised: 2026-01-04  Accepted: 2026-03-05
  • Supported by:
    Humanities and Social Sciences Youth Fund of the Ministry of Education (24YJC190012); National Natural Science Foundation of China General Program (31671160); Major Project of the National Social Science Fund of China (19ZDA363)

Personalized Alignment of Large Language Models and Its Impact on Moral Judgment

Li Chang-Jin, Jiao Liying, Chen Zhen, Xu Hengbin, Wu Shengtao Michael, Xu Yan   

  1. Faculty of Psychology, Beijing Normal University, Beijing 100875, China
  2. Beijing Key Laboratory of Applied Experimental Psychology, Beijing 100875, China
  3. National Demonstration Center for Experimental Psychology Education (Beijing Normal University), Beijing 100875, China
  4. Department of Psychology, School of Humanities and Social Sciences, Beijing Forestry University, Beijing 100083, China
  5. School of Sociology and Anthropology, Xiamen University, Xiamen, Fujian 361005, China
  • Received: 2025-05-23  Revised: 2026-01-04  Accepted: 2026-03-05

Abstract: With the advent of the era of human-machine symbiosis, the lack of value alignment and the algorithmic biases exposed by large language models (LLMs) in their widespread application have raised serious ethical concerns, making it an urgent challenge to guide AI technology toward benevolence. This research examined the impact of personalized alignment based on the HEXACO personality model on the moral judgment of LLMs. Study 1 tested and confirmed that LLMs can effectively express HEXACO personality traits by following prompts. Study 2 examined the effect of personalized alignment on the utilitarian tendencies of LLMs and their similarities to and differences from humans. The results showed that personality prompts high in Honesty-Humility, Agreeableness, and Conscientiousness significantly reduced the tendency of GPT-3.5, GPT-4, and ERNIE 3.5 to make utilitarian choices. Accordingly, this research proposes a personalized alignment framework for LLMs based on the HEXACO personality model and the theory of personality metatraits, emphasizing the core role of the Stability metatrait's dimensions, namely Honesty-Humility, Agreeableness, and Conscientiousness, in the personalized alignment of LLMs. This research provides a psychological basis for the theoretical construction and practical pathways of personalized AI alignment techniques.

Key words: large language models, personalized alignment, moral judgment, HEXACO model, personality metatrait

Abstract: With the advent of the era of human-machine symbiosis, the lack of value alignment and the algorithmic bias exposed in the widespread application of large language models (LLMs) have triggered serious ethical concerns, making it an urgent challenge to guide AI technology toward benevolence. This research proposed a personalized alignment approach based on the HEXACO model of personality structure and explored its impact on the moral judgment of LLMs. The research aims to validate the ability of LLMs to exhibit HEXACO personality traits via prompts and to systematically compare the utilitarian tendencies of personality-aligned LLMs and humans in moral dilemmas, addressing two core issues: (1) verifying whether LLMs can effectively achieve personalized alignment based on the HEXACO personality model, and (2) systematically examining the impact of personalized alignment on the utilitarian tendencies of LLMs in moral dilemmas, as well as its similarities to and differences from humans, thereby evaluating the effectiveness of different personality traits as alignment targets. Study 1 tested GPT-3.5, GPT-4, and ERNIE 3.5 using HEXACO-based personality prompts (six dimensions at high/low levels plus a baseline), each combined with three gender roles (baseline/male/female; 39 conditions in total). Manipulation checks included: (1) HEXACO-PI-R scale responses (60 items) across five dialogue repetitions per condition (810 observations); and (2) a personal story writing task (generating 108 stories in total), rated by 15 psychology undergraduates on 5-point scales. Both measures were analyzed with Kruskal-Wallis tests, with personality level as the between-subjects variable. Study 2 used 60 moral dilemmas to assess utilitarian choices (yes/no). LLMs responded under Study 1's personality settings (36 prompts × 3 models × 60 dilemmas = 6,480 independent dialogues).
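The factorial design described above can be reproduced by a short enumeration. The sketch below is purely illustrative; the trait, level, and role labels are assumptions drawn from the HEXACO model and the condition counts stated in the abstract:

```python
from itertools import product

traits = ["Honesty-Humility", "Emotionality", "Extraversion",
          "Agreeableness", "Conscientiousness", "Openness to Experience"]
levels = ["high", "low"]
genders = ["baseline", "male", "female"]
models = ["GPT-3.5", "GPT-4", "ERNIE 3.5"]

# Study 1: 6 traits x 2 levels + 1 personality baseline, crossed with 3 gender roles
personality_prompts = list(product(traits, levels)) + [("baseline", None)]
study1_conditions = list(product(personality_prompts, genders))
print(len(study1_conditions))  # 39 conditions (13 x 3)

# Study 2: the 36 non-baseline prompts x 3 models x 60 dilemmas
study2_prompts = list(product(traits, levels, genders))
n_dialogues = len(study2_prompts) * len(models) * 60
print(n_dialogues)  # 6480 independent dialogues
```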
Human participants (N=215, Chinese adults; mean age=30.8; 67 male/148 female) completed moral judgments (T1) and the HEXACO-PI-R (T2), with high/low personality groups defined via extreme scoring (top/bottom 27%). Data aggregation at the dilemma level enabled repeated-measures ANOVAs comparing agent types (GPT-3.5, GPT-4, ERNIE 3.5, and human) and personality levels. Study 1 demonstrated that GPT-3.5, GPT-4, and ERNIE 3.5 could effectively express HEXACO personality traits through prompts, confirming the feasibility of personalized alignment, though effects varied by model performance and task type. Compared with GPT-3.5 and ERNIE 3.5, GPT-4 demonstrated a stronger ability to follow prompts and showed clearer differentiation across personality levels. Study 2 revealed that personality alignment based on the HEXACO model significantly affected the utilitarian tendencies of LLMs, although the direction and strength of these effects varied across models and traits. Specifically, Honesty-Humility, Agreeableness, and Conscientiousness substantially reduced the utilitarian tendencies of all three LLMs. These traits led to more deontological responses at high levels, whereas low levels showed increased utilitarian tendencies. In contrast, Extraversion, Emotionality, and Openness to Experience had weaker and more inconsistent effects. The study reached three main conclusions. First, GPT-3.5, GPT-4, and ERNIE 3.5 demonstrate stable and distinguishable personality tendencies that can be dynamically activated through prompt-based alignment with the HEXACO model. Second, the influence of personality traits on LLMs' moral judgments shows both convergence with and divergence from human patterns. Honesty-Humility showed effects consistent with human patterns, while other traits exhibited inconsistent effects, revealing essential differences in moral reasoning between LLMs and humans. Third, personality alignment based on the HEXACO model provides a viable strategy for guiding LLMs' moral behavior.
In particular, the metatrait Stability, comprising Honesty-Humility, Agreeableness, and Conscientiousness, plays a core role in reducing utilitarian tendencies and enhancing deontological reasoning. Building on this, the study proposes a personalized alignment framework based on the HEXACO model and the theory of personality metatraits to systematically shape LLMs' moral responses.
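The extreme-group split applied to the human sample (top/bottom 27%) can be sketched as follows. This is a minimal illustration of the cutoff logic only; the scores are hypothetical, and the actual grouping in the study may have handled ties and per-trait scoring differently:

```python
def extreme_groups(scores, proportion=0.27):
    """Split scores into low/high groups using bottom/top cutoffs.

    Returns (low_group, high_group); middle scores are discarded,
    as in classic extreme-group (27%) analysis.
    """
    ranked = sorted(scores)
    k = round(len(ranked) * proportion)  # size of each extreme group
    low_cut, high_cut = ranked[k - 1], ranked[-k]
    low = [s for s in scores if s <= low_cut]
    high = [s for s in scores if s >= high_cut]
    return low, high

# hypothetical HEXACO trait scores for 10 participants
scores = [2.1, 3.4, 4.8, 1.9, 3.0, 4.5, 2.7, 3.8, 4.1, 2.3]
low, high = extreme_groups(scores)
print(sorted(low))   # [1.9, 2.1, 2.3]
print(sorted(high))  # [4.1, 4.5, 4.8]
```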

Key words: large language models, personalized alignment, moral judgment, HEXACO model, personality metatrait