ISSN 0439-755X
CN 11-1911/B
Sponsored by: Chinese Psychological Society
   Institute of Psychology, Chinese Academy of Sciences
Published by: Science Press

Acta Psychologica Sinica ›› 2025, Vol. 57 ›› Issue (11): 2022-2042. doi: 10.3724/SP.J.1041.2025.2022 cstr: 32110.14.2025.2022

• Special Issue: Psychology and Governance of Artificial Intelligence •

Self-help AI psychological counseling system based on large language models and its effectiveness evaluation

HUANG Feng1,2,3, DING Huimin4,5, LI Sijia6, HAN Nuo7,8, DI Yazheng1,2, LIU Xiaoqian1,2, ZHAO Nan1,2, LI Linyan3,9, ZHU Tingshao1,2

  1 State Key Laboratory of Cognitive Science and Mental Health, Institute of Psychology, Chinese Academy of Sciences, Beijing 100101, China
    2 Department of Psychology, University of Chinese Academy of Sciences, Beijing 100049, China
    3 Department of Data Science, College of Computing, City University of Hong Kong, Hong Kong SAR 999077, China
    4 School of Education, Renmin University of China, Beijing 100872, China
    5 Department of Psychology, University of Notre Dame, IN 46556, USA
    6 Department of Social Work and Social Administration, Faculty of Social Sciences, University of Hong Kong, Hong Kong SAR 999077, China
    7 Department of Psychology, Faculty of Arts and Sciences, Beijing Normal University at Zhuhai, Zhuhai 519087, China
    8 Beijing Key Laboratory of Applied Experimental Psychology, National Demonstration Center for Experimental Psychology Education (Beijing Normal University), Faculty of Psychology, Beijing Normal University, Beijing 100875, China
    9 Department of Infectious Diseases and Public Health, Jockey Club College of Veterinary Medicine and Life Sciences, City University of Hong Kong, Hong Kong SAR 999077, China
  • Received: 2024-08-15 Online: 2025-09-24 Published: 2025-11-25
  • Corresponding author: ZHU Tingshao, E-mail: tszhu@psych.ac.cn
  • Supported by:
    Beijing Natural Science Foundation (IS23088)

Self-help AI psychological counseling system based on large language models and its effectiveness evaluation

HUANG Feng1,2,3, DING Huimin4,5, LI Sijia6, HAN Nuo7,8, DI Yazheng1,2, LIU Xiaoqian1,2, ZHAO Nan1,2, LI Linyan3,9, ZHU Tingshao1,2

  1 State Key Laboratory of Cognitive Science and Mental Health, Institute of Psychology, Chinese Academy of Sciences, Beijing 100101, China
    2 Department of Psychology, University of Chinese Academy of Sciences, Beijing 100049, China
    3 Department of Data Science, College of Computing, City University of Hong Kong, Hong Kong SAR 999077, China
    4 School of Education, Renmin University of China, Beijing 100872, China
    5 Department of Psychology, University of Notre Dame, IN 46556, USA
    6 Department of Social Work and Social Administration, Faculty of Social Sciences, University of Hong Kong, Hong Kong SAR 999077, China
    7 Department of Psychology, Faculty of Arts and Sciences, Beijing Normal University at Zhuhai, Zhuhai 519087, China
    8 Beijing Key Laboratory of Applied Experimental Psychology, National Demonstration Center for Experimental Psychology Education (Beijing Normal University), Faculty of Psychology, Beijing Normal University, Beijing 100875, China
    9 Department of Infectious Diseases and Public Health, Jockey Club College of Veterinary Medicine and Life Sciences, City University of Hong Kong, Hong Kong SAR 999077, China
  • Received: 2024-08-15 Online: 2025-09-24 Published: 2025-11-25

Abstract:

This study examined the technical feasibility of building a self-help AI psychological counseling system based on large language models without relying on real case data, as well as its effectiveness in improving mental health in the general population. The research proceeded in two stages. First, a self-help AI psychological counseling chatbot system was constructed using zero-shot learning and chain-of-thought prompting strategies. Then, a two-week randomized controlled trial with 202 recruited participants evaluated the system's effectiveness in practice. The results of Experiment 1 showed that, after optimization through prompt engineering, the GPT-4o model improved significantly in normative quality, professionalism, emotional understanding and empathy, and consistency and coherence. Experiment 2 found that, compared with the control group, participants who used the self-help AI counseling chatbot showed significant short-term improvements in depression, anxiety, and loneliness. In particular, anthropomorphized AI counselors showed a clear advantage in alleviating loneliness, whereas the non-anthropomorphized design was more effective in reducing stress. In addition, the positive change in anxiety symptoms was maintained at the one-week follow-up, while the improvements in the other indicators did not persist. This study offers a preliminary exploration of the positive effects of LLM-based self-help AI psychological counseling on mental health, reveals the differentiated effects of different AI designs on specific psychological problems, and provides a reference for future research and practice.

Key words: artificial intelligence, large language models, chain-of-thought, mental health, self-help psychological counseling, randomized controlled trial

Abstract:

The global prevalence of mental health issues, such as depression and anxiety, has become a significant public health challenge. Traditional mental health services face limitations in accessibility, affordability, and scalability. The emergence of large language models (LLMs) offers new opportunities for developing intelligent, self-help psychological counseling systems. However, optimizing LLMs for mental health applications presents unique challenges, including data scarcity and privacy concerns. This study aimed to address these challenges by constructing a self-help AI psychological counseling system using zero-shot learning and chain-of-thought prompting, and it evaluated the effectiveness of the resulting system in improving mental health outcomes among the general population. The research also explored the impact of AI anthropomorphism on human-computer interaction outcomes in mental health interventions.
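
For readers less familiar with the prompting setup described above, the following Python sketch shows what a zero-shot chain-of-thought role instruction for a single GPT-4o counseling turn could look like. It is only an illustration under stated assumptions: the prompt wording and the counsel() helper are hypothetical and are not taken from the study's materials; the call uses the standard OpenAI chat-completions client.

    # Minimal sketch (not the study's actual prompt): a zero-shot chain-of-thought
    # role instruction for a counseling-style dialogue turn with GPT-4o.
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    SYSTEM_PROMPT = (
        "You are an empathetic psychological counselor. Before replying, reason "
        "step by step: (1) identify the user's emotion, (2) identify the underlying "
        "concern, (3) choose a supportive counseling strategy; then respond with "
        "warm, professional, and coherent language."
    )

    def counsel(user_message: str) -> str:
        # A single zero-shot turn; no real case data or fine-tuning is involved.
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": user_message},
            ],
        )
        return response.choices[0].message.content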

The study comprised two parts. In Experiment 1, we constructed the AI counseling system based on the GPT-4o model. We first compared GPT-4o with two other LLMs (Claude 3 Opus and Yi-Large) using a test set of 12 common mental health topics covering interpersonal relationships, family issues, personal growth, and other categories. Three qualified psychological counselors evaluated the models' performance on normative quality, professionalism, emotional understanding and empathy, and consistency and coherence. We then optimized GPT-4o using chain-of-thought prompting and role instructions explicitly designed for psychological counseling scenarios, and re-evaluated the optimized model to assess improvements. In Experiment 2, we conducted a two-week randomized controlled trial with 202 participants from the general population who reported experiencing negative emotions or psychological distress but had not been diagnosed with severe mental health issues. Participants were randomly assigned to one of three experimental groups with varying degrees of AI anthropomorphism (F: female counselor image and name, M: male counselor image and name, R: robot image without a human name) or a control group (C: unmodified GPT-4o). To ensure active participation, only interactions with at least 10 dialogue rounds and spanning more than 10 minutes were considered valid for analysis. Mental health outcomes, including depression, anxiety, and stress (measured by the DASS-21) and loneliness (measured by the SSL), were assessed at baseline (T1), during the last two days of the one-week interaction period (T2), and one week post-intervention (T3). Linear mixed-effects models were used to analyze the data, with simple effects analysis and Tukey HSD tests for post-hoc comparisons.
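
As a rough illustration of the statistical analysis described above, the Python sketch below fits a group × time linear mixed-effects model with a random intercept per participant using statsmodels, followed by a Tukey HSD comparison of groups at T2. The file and column names (long_format_outcomes.csv, participant_id, group, time, dass_depression) are hypothetical, and the study's exact model specification may differ.

    # Illustrative sketch only: random-intercept mixed model for one outcome.
    import pandas as pd
    import statsmodels.formula.api as smf
    from statsmodels.stats.multicomp import pairwise_tukeyhsd

    df = pd.read_csv("long_format_outcomes.csv")  # hypothetical long-format data

    model = smf.mixedlm(
        "dass_depression ~ C(group) * C(time)",  # fixed effects: group, time, interaction
        data=df,
        groups=df["participant_id"],             # random intercept per participant
    )
    result = model.fit()
    print(result.summary())

    # Post-hoc pairwise comparison of groups at T2 (Tukey HSD).
    t2 = df[df["time"] == "T2"]
    print(pairwise_tukeyhsd(t2["dass_depression"], t2["group"]))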

In Experiment 1, GPT-4o significantly outperformed the other models in normative quality, emotional understanding and empathy, and consistency and coherence (all p < 0.001). After optimization with chain-of-thought prompting, the model showed further significant improvements across all evaluation dimensions (p < 0.01), with large effect sizes for normative quality (d = 1.28), emotional understanding and empathy (d = 1.06), and consistency and coherence (d = 1.14). Professionalism showed a more limited improvement (d = 0.51), reflecting current technological limitations in this dimension. In Experiment 2, the attrition rate from T1 to T3 was 24.3%, with no significant differences in demographic characteristics or baseline mental health indicators between completers and non-completers. After interaction quality control, 180 participants were retained at T2 (retention rate 89.11%) and 153 at T3 (75.74%). Compared with the control group, all experimental groups showed significant short-term improvements in depression, anxiety, and loneliness at T2 (all p < 0.001). For loneliness, the anthropomorphized AI designs (F and M groups) demonstrated significantly greater effects than the non-anthropomorphized design (R group) at T2. For stress, the group × time interaction effect was marginally significant (p = 0.05), with only the non-anthropomorphized group (R group) showing substantial improvement from T1 to T2 (b = 2.35, SE = 0.48, p < 0.001). The improvement in anxiety symptoms persisted at T3 for all experimental groups (p < 0.001), while the effects on depression, stress, and loneliness did not remain significant at follow-up.
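
To put the effect sizes above in context (by Cohen's conventions, d = 0.8 already counts as large), the short sketch below computes a standard pooled-SD Cohen's d for two sets of ratings. The study's exact effect-size formula is not reproduced here, so treat this as an assumption-laden illustration rather than the authors' computation (a paired-sample variant may have been used for repeated ratings).

    import numpy as np

    def cohens_d(scores_a: np.ndarray, scores_b: np.ndarray) -> float:
        # Pooled-standard-deviation Cohen's d for two independent samples.
        n_a, n_b = len(scores_a), len(scores_b)
        pooled_var = (
            (n_a - 1) * np.var(scores_a, ddof=1) + (n_b - 1) * np.var(scores_b, ddof=1)
        ) / (n_a + n_b - 2)
        return (np.mean(scores_a) - np.mean(scores_b)) / np.sqrt(pooled_var)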

This study provides empirical evidence for the potential of AI-based self-help psychological counseling to improve mental health outcomes, particularly by reducing mental health symptoms in the short term. The successful application of zero-shot learning and chain-of-thought prompting to optimizing LLMs for mental health dialogues offers a novel approach to overcoming the challenges of data scarcity and model adaptation in specialized domains. The differential effects of AI anthropomorphization on different mental health indicators support a nuanced design framework: anthropomorphized designs may be more effective for social functioning-related issues such as loneliness, through enhanced social presence, whereas non-anthropomorphized designs may manage stress better by reducing social evaluation pressure. However, the study also reveals significant limitations, including the lack of long-term effects for most outcomes and the limited improvement in professionalism. Future research should focus on enhancing the long-term efficacy of AI-assisted mental health interventions, improving professional depth for specialized counseling scenarios, exploring human-AI collaborative models for high-risk cases, and further investigating the mechanisms underlying the differential effects of AI design features on specific mental health issues. These findings provide valuable insights for developing more effective, personalized AI-assisted mental health services to complement traditional care.

Key words: artificial intelligence, large language models, chain-of-thought, mental health, self-help psychological counseling, randomized controlled trial

CLC number: