ISSN 1671-3710
CN 11-4766/R
Sponsor: Institute of Psychology, Chinese Academy of Sciences
Publisher: Science Press

Advances in Psychological Science (心理科学进展) ›› 2026, Vol. 34 ›› Issue (3): 424-440. doi: 10.3724/SP.J.1042.2026.0424 cstr: 32111.14.2026.0424

• Research Methods •

The application of foundation models in depression screening and diagnosis

XIE Yu (谢宇)¹, ZHENG Hongxin (郑弘欣)¹, LIU Yizi (刘怡资)¹, YU Honggang (禹红刚)², YANG Chenghe (杨成赫)²

  1. School of Education Science, Anhui Normal University, Wuhu 241000, China;
  2. Anhui Branch of China Telecom Co., Ltd., Hefei 230001, China
  • Received: 2025-06-28 Published: 2026-03-15 Online: 2026-01-07
  • Funding:
    2024 Special Research Project on Ideological and Political Education in Higher Education, Anhui Provincial Association for Research on Ideological and Political Education in Colleges and Universities (2024SZX012); China Telecom Co., Ltd. R&D Project on Integrated Intelligent Mental Health Education across Universities, Secondary Schools, and Primary Schools (24AHEKYF5020)

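Summary: Depression is a common mental disorder that severely affects patients' social functioning and quality of life. In recent years, foundation models, with their strong semantic understanding and multimodal data-processing capabilities, have shown clear advantages in the early screening and auxiliary diagnosis of depression. Building a foundation model for depression screening and diagnosis usually involves four steps: data preparation, model selection, model training, and model evaluation. In this setting, foundation models work mainly through contextualized semantic representation, attention mechanisms, multimodal behavior capture, and generative prediction. Current research nevertheless faces challenges including algorithmic bias, diagnostic specificity, hallucination, privacy and security, and ethical issues. Future work should strengthen the integrated use of foundation models in psychological intervention, focus on clinical translation pathways, build a more fine-grained, dynamic, and culturally adaptive digital phenotype of depression, and realize the digital and intelligent transformation of mental health services.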

Keywords: foundation models, depression, early screening, auxiliary diagnosis

Abstract: Depression is a common mental disorder that significantly impairs patients' social functioning and quality of life. In recent years, foundation models, with their strong semantic understanding and multimodal data-processing capabilities, have shown notable potential in the early screening and auxiliary diagnosis of depression. Trained on large and diverse datasets, these models encode intricate interactions among textual semantics, speech acoustics, facial expressions, and body movements, which benefits both computational psychiatry and innovation in mental health services.
The framework for depression screening and diagnosis powered by a foundation model typically consists of four steps: data preparation, model selection, model training, and model evaluation. The procedure begins with data collection and processing, because the quality and variability of the data are the major factors influencing the model's performance and generalization ability. The models' key strengths derive from high-quality pre-training, which gives them strong linguistic, contextual, and inferential abilities. They are usually fine-tuned further on datasets relevant to mental health disorders and to specific tasks to improve performance. The principal evaluation metric is diagnostic accuracy, that is, the model's capacity to distinguish individuals with depression from those without.
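The evaluation step can be illustrated with a minimal sketch. The abstract names only the rate of correct diagnosis; sensitivity and specificity are included here as an assumption, because they are commonly reported alongside accuracy in screening work. The function name `screening_metrics` and the toy labels are hypothetical and do not come from the article.

```python
def screening_metrics(y_true, y_pred):
    """Accuracy, sensitivity, and specificity for binary labels (1 = depression)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count the four cells of the confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    total = len(y_true)
    return {
        # Share of all cases classified correctly.
        "accuracy": (tp + tn) / total if total else 0.0,
        # Share of true depression cases that the model flags.
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        # Share of non-depressed cases that the model clears.
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }


# Hypothetical toy labels, for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]
metrics = screening_metrics(y_true, y_pred)
print(metrics)
```

Reporting the three values together matters when the two groups are unbalanced, since accuracy alone can look high for a model that rarely flags anyone.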
Current research on foundation models is moving toward clinical decision support, early screening, and personalized risk assessment for mental illness. Recent advances in multimodal intelligent screening, which integrates textual, speech-based, and facial analysis as well as behavioral patterns, have made more accurate detection of depression possible. Combined with digital health technologies, foundation models can rapidly analyze and manage large volumes of unstructured clinical data, such as health records, patient self-reports, observations from family members, standardized scale assessments, and physiological or biochemical markers, and produce diagnostic summaries that adhere to precise criteria. By incorporating genomic and biosignal data, such models also help identify biomarkers that deepen insight into the disease and support personalized, precise prevention.
Existing research suggests that the working principles of foundation models in this setting involve contextualized semantic modeling, attention mechanisms, multimodal behavior tracking, and predictive processing. Because their semantic representations are dynamic and context-sensitive, these models go beyond counting isolated negative words in the speech of patients with depression and can capture patients' distinctive, recurring thought patterns and cognitive styles as a whole. The attention weights assigned to each element of a text sequence can be construed as a simulation of the attentional biases of patients with depression, enabling the model to prioritize the diagnostic cues most indicative of depression. Modalities such as vision, speech, and text can be fed into unified architectures, which helps quantify the negative affective expressions of depression and, in turn, identify its symptoms. The predictive processing framework offers a unified account of the cognitive disturbances in depression, and its operating principles closely resemble the generative processes of large language models.
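The idea of a weighted distribution of attention over a text sequence can be sketched with scaled dot-product attention, the standard formulation in Transformer models. This is an illustrative assumption rather than the specific method of any study summarized here: the token list, the two-dimensional key vectors, and the query are hypothetical toy values, and the resulting weights carry no clinical meaning.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query, keys):
    """Scaled dot-product attention weights of one query over a list of keys."""
    dim = len(query)
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
        for key in keys
    ]
    return softmax(scores)


# Hypothetical toy embeddings: one key vector per token.
tokens = ["I", "feel", "hopeless", "today"]
keys = [[0.1, 0.0], [0.2, 0.1], [0.9, 0.8], [0.1, 0.2]]
query = [1.0, 1.0]  # hypothetical query vector

weights = attention_weights(query, keys)
for token, weight in zip(tokens, weights):
    print(f"{token}: {weight:.3f}")
```

In this toy setup the token whose key is most aligned with the query receives the largest weight, which is the sense in which attention lets a model prioritize some cues over others.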
However, the implementation of foundation models is not without obstacles. One is algorithmic bias: the models are developed mostly on data from the general adult population, so they may perform poorly on more heterogeneous populations, such as adolescents, the elderly, or individuals from different cultural backgrounds. Limited diagnostic specificity remains a core problem, especially when distinguishing depression from comorbid disorders such as anxiety. In addition, hallucination, in which models generate factually incorrect or contextually inaccurate information, poses a risk in clinical contexts. Security and privacy are central concerns for any mental health application that handles sensitive personal data. Finally, there are ethical issues concerning the balance between human agency in psychiatric care and the use of AI in clinical decisions, as well as human dependence on machines. Looking ahead, the integration of foundation models with psychological intervention should be advanced, with emphasis on clinical translation pathways, to build a more fine-grained, dynamic, and culturally adaptive digital phenotype of depression and accomplish the digital and intelligent transformation of mental health services.

Key words: foundation models, depression, early screening, auxiliary diagnosis