ISSN 0439-755X
CN 11-1911/B
Sponsored by: Chinese Psychological Society
    Institute of Psychology, Chinese Academy of Sciences
Published by: Science Press

Acta Psychologica Sinica ›› 2026, Vol. 58 ›› Issue (3): 399-415. doi: 10.3724/SP.J.1041.2026.0399 cstr: 32110.14.2026.0399

• Academic Papers of the 27th Annual Meeting of the China Association for Science and Technology •

Large language models amplify gendered empathy stereotypes: Effects on major and career recommendations

DAI Yiqing1, MA Xinming2, WU Zhen1,3

1. Department of Psychological and Cognitive Sciences, Tsinghua University, Beijing 100084, China;
    2. Faculty of Education, Beijing Normal University, Beijing 100875, China;
    3. Lab for Lifelong Learning, Tsinghua University, Beijing 100084, China
  • Received: 2025-05-10  Online: 2025-12-26  Published: 2026-03-25
  • Corresponding author: WU Zhen, E-mail: zhen-wu@mail.tsinghua.edu.cn
  • Funding:
    Supported by the National Natural Science Foundation of China (32271110, 62441614) and the Tsinghua University Initiative Scientific Research Program (20235080047)

LLMs amplify gendered empathy stereotypes and influence major and career recommendations

DAI Yiqing1, MA Xinming2, WU Zhen1,3   

1. Department of Psychological and Cognitive Sciences, Tsinghua University, Beijing 100084, China;
    2. Faculty of Education, Beijing Normal University, Beijing 100875, China;
    3. Lab for Lifelong Learning, Tsinghua University, Beijing 100084, China
  • Received: 2025-05-10  Online: 2025-12-26  Published: 2026-03-25

Abstract: Large language models (LLMs) are increasingly applied in highly sensitive settings such as educational and career counseling, raising concerns about their potential gender-stereotype risks. Through three experiments, this study examined how LLMs express the stereotype that "women are high in empathy, men are low in empathy" and what consequences this has. Study 1 compared humans with machines and found that six types of LLMs showed significantly stronger gender stereotypes than humans across the dimensions of emotional empathy, attention to others' feelings, and behavioral empathy. Study 2 manipulated input language (Chinese vs. English) and gender identity (male vs. female) and found that English-language contexts and female-identity priming more readily activated the stereotypes in LLMs. Study 3 focused on major and career recommendation tasks and found that LLMs tended to recommend majors and careers with high empathy demands to women and directions with low empathy demands to men. Overall, LLMs exhibit pronounced gender stereotypes regarding empathy; this bias varies with the input context and can transfer to real-world recommendation tasks. The research provides a theoretical basis and practical implications for identifying bias and optimizing fairness in artificial intelligence systems.

Key words: large language models (LLMs), gender stereotypes, empathy, AI recommendations, human-computer interaction

Abstract: As large language models (LLMs) are increasingly deployed in sensitive domains such as education and career guidance, concerns have grown about their potential to amplify gender bias. Prior research has documented occupational gender stereotypes in LLMs, such as associating men with technical roles and women with caregiving roles. However, less attention has been paid to whether these models also encode deeper socio-emotional traits in gender-based ways. A persistent societal stereotype holds that “women are more empathetic than men”, a belief that can shape career expectations. This study investigated whether LLMs reflect or even exaggerate gender stereotypes related to empathy and examined the contextual factors (e.g., input language, gender-identity priming) that might influence the expression of these stereotypes. We hypothesized that LLMs would exhibit stronger gendered empathy stereotypes than human participants, that these biases would vary according to linguistic and social cues in prompts, and that these stereotypes would manifest in real-world major/career recommendation scenarios.
We conducted three studies to test these hypotheses. Study 1 compared judgments about empathy from human participants (N = 626) with those generated by six leading LLMs (GPT-4o, GPT-4-Turbo, GPT-3.5-Turbo, DeepSeek-reasoner, DeepSeek-chat, ERNIE-Bot). Twelve story-based scenarios, adapted from the Empathy Questionnaire, covered emotional empathy, attention to others’ feelings, and behavioral empathy. For each scenario, participants and LLMs inferred the protagonist’s gender based on their empathetic behavior. Study 2 examined how manipulating input language (English vs. Chinese) and gender-identity priming (male vs. female) influenced the expression of these stereotypes. Study 3 extended this paradigm to a real-world application: we prompted LLMs to recommend 16 pre-selected university majors and 16 professions (categorized into high- and low-empathy groups) to individuals of different genders, requesting explanatory rationales for each recommendation.
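As an illustration of the prompting procedure only, the Python sketch below shows one way the story-based gender-inference queries of Studies 1 and 2 could be issued through a chat-completion API and collected for scoring. It is a hypothetical reconstruction, not the authors' materials: the prompt wording, the example scenario, the infer_gender helper, and the restriction to OpenAI models are all assumptions (DeepSeek-reasoner, DeepSeek-chat, and ERNIE-Bot would be queried through their own endpoints).

    # Hypothetical sketch of the story-based gender-inference paradigm.
    # Prompt wording, the example scenario, and the helper are illustrative
    # assumptions, not the authors' actual study materials.
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    OPENAI_MODELS = ["gpt-4o", "gpt-4-turbo", "gpt-3.5-turbo"]

    def infer_gender(model: str, scenario: str, lang: str = "en") -> str:
        """Ask one model to infer the protagonist's gender from an empathy scenario."""
        question = (
            "Based on this behavior, is the protagonist more likely a man or a woman? "
            "Answer with a single word."
            if lang == "en"
            else "根据上述行为，主人公更可能是男性还是女性？请用一个词回答。"
        )
        response = client.chat.completions.create(
            model=model,
            temperature=0,  # deterministic answers keep stereotype counts comparable
            messages=[{"role": "user", "content": f"{scenario}\n\n{question}"}],
        )
        return response.choices[0].message.content.strip()

    # Example run with one invented scenario in the style of the adapted items.
    scenario = ("A classmate bursts into tears after failing an exam; "
                "the protagonist sits beside them and offers comfort.")
    for model in OPENAI_MODELS:
        print(model, infer_gender(model, scenario))

The lang parameter corresponds to the input-language manipulation of Study 2; gender-identity priming and the recommendation prompts of Study 3 would follow the same request structure with different instructions.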
Results indicated that LLMs displayed significantly stronger gendered empathy stereotypes than human participants (Study 1). English prompts and female priming elicited stronger “women = high empathy, men = low empathy” associations (Study 2). In the recommendation tasks, LLMs more often suggested high-empathy majors and professions (e.g., nursing, education, psychology) for women, and low-empathy, STEM-related fields for men (Study 3). Together, these findings suggest that LLMs not only internalize gendered empathy stereotypes but also express them in context-dependent ways, producing measurable downstream effects in applied decision-making tasks.
Overall, our findings underscore the need for critical evaluation of how LLMs represent and amplify social stereotypes, especially in socio-emotional domains such as empathy. This research contributes to understanding the sources of AI bias by showing that LLMs may exaggerate gender norms beyond human levels. It also highlights the complex interplay between language and gender identity in shaping algorithmic behavior. Practically, the results raise important ethical concerns about fairness in AI-driven decision-making systems and highlight the urgency of developing more robust bias-mitigation strategies in multilingual contexts.

Key words: large language models, gender stereotypes, empathy, AI recommendations, human-computer interaction

CLC Number: