ISSN 1671-3710
CN 11-4766/R
Sponsored by: Institute of Psychology, Chinese Academy of Sciences
Published by: Science Press

Advances in Psychological Science ›› 2026, Vol. 34 ›› Issue (3): 424-440. doi: 10.3724/SP.J.1042.2026.0424

• Research Method •

The application of foundation models in depression screening and diagnosis

XIE Yu1, ZHENG Hongxin1, LIU Yizi1, YU Honggang2, YANG Chenghe2   

  1. School of Education Science, Anhui Normal University, Wuhu 241000, China;
  2. Anhui Branch of China Telecom Co., Ltd., Hefei 230001, China
  • Received: 2025-06-28 Online: 2026-03-15 Published: 2026-01-07

Abstract: Depression is a common mental disorder that significantly impairs patients' social functioning and quality of life. In recent years, foundation models, with their strong semantic understanding and multimodal data-processing capabilities, have shown notable potential for the early screening and auxiliary diagnosis of depression. Trained on large and diverse datasets, these models encode intricate interactions among textual semantics, speech acoustics, facial expressions, and movements, which benefits both computational psychiatry and innovation in mental health services.
A foundation-model framework for depression screening and diagnosis typically consists of four major steps: data preprocessing, model selection, model training, and model evaluation. The procedure begins with data collection and processing, because the quality and diversity of the data are the main factors influencing the model's performance and generalization ability. The models' key strengths derive from high-quality pre-training, which endows them with strong linguistic, contextual, and inferential abilities. They are usually fine-tuned further on datasets relevant to mental health disorders and to specific tasks in order to maximize performance. The principal evaluation metric is diagnostic accuracy, which reflects the model's capacity to differentiate individuals with depression from those without.
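The evaluation step can be illustrated with a minimal Python sketch. It assumes binary labels (1 = depression, 0 = no depression) and reports sensitivity and specificity alongside accuracy; the function name `screening_metrics` and the example labels are hypothetical and are not taken from the article.

```python
def screening_metrics(y_true, y_pred):
    """Return accuracy, sensitivity and specificity for binary labels.

    Assumption for illustration: 1 = depression, 0 = no depression.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    def ratio(num, den):
        # Avoid division by zero when a class is absent from the sample.
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),  # share of true cases detected
        "specificity": ratio(tn, tn + fp),  # share of non-cases correctly cleared
    }


# Hypothetical labels, invented purely for this example.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
metrics = screening_metrics(y_true, y_pred)
```

Reporting sensitivity and specificity next to accuracy is a design choice of this sketch: when depressed cases are a minority of the sample, accuracy alone can look high even if many cases are missed.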
Current research on foundation models is moving towards clinical decision support, early screening, and personalized risk assessment for mental illnesses. Recent advances in multimodal intelligent screening technologies, which integrate textual, speech-based, and facial analysis as well as behavioral patterns, have opened up the possibility of detecting depression with greater accuracy. Combined with digital health technologies, foundation models can rapidly analyze and manage large volumes of unstructured clinical data, such as health records, patient self-reports, observations from family members, standardized scale assessments, and physiological or biochemical markers, to produce diagnostic summaries that adhere to precise criteria. By incorporating genomic and biosignal data, such models can help identify biomarkers that deepen insight into the disease and support personalized, precise prevention.
Empirical evidence suggests that the basic principles of foundation models involve contextualized semantic modeling, attention mechanisms, multimodal behavior tracking, and predictive processing. Because their semantic representations are dynamic and context-sensitive, these models go beyond counting isolated negative words in the speech of patients with depression; they can also capture patients' distinctive, repetitive thought patterns and overall cognitive styles. The attention weights assigned to each successive element of a text sequence can be construed as a simulation of the attentional biases of patients with depression, enabling the model to prioritize the diagnostic cues considered most indicative of depression. Multiple modalities, such as vision, speech, and text, can be fed into unified architectures, which helps quantify the negative affective expressions of depression and, in turn, identify its symptoms. The predictive processing framework offers a unified account of cognitive disturbances in depression, and its operating principles closely resemble the generative processes of large language models.
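The weighting idea behind attention can be sketched in a few lines of Python. The example below computes scaled dot-product attention weights, the standard formulation in Transformer models, over toy two-dimensional vectors; the function name `attention_weights` and the vectors are hypothetical and only illustrate how some tokens receive more weight than others, not how any model in the reviewed studies was configured.

```python
import math


def attention_weights(query, keys):
    """Scaled dot-product attention weights of one query over several keys."""
    d = len(query)
    # Similarity of the query to each key, scaled by sqrt(dimension).
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
        for key in keys
    ]
    # Softmax with max-subtraction for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Toy vectors, invented for illustration: the first key is aligned with the
# query, the second is orthogonal to it, and the third points the opposite way.
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
weights = attention_weights(query, keys)
```

The weights sum to one, and the key most similar to the query receives the largest share. This is the sense in which attention prioritizes some cues in a sequence over others.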
However, the implementation of foundation models is not without obstacles. One is algorithmic bias: the models are developed on data drawn mostly from the general adult population, so they may perform poorly when applied to more heterogeneous populations, such as adolescents, the elderly, or individuals from different cultural backgrounds. Limited diagnostic specificity remains a core problem, especially when distinguishing depression from comorbid disorders such as anxiety. In addition, hallucination, in which models generate factually incorrect or contextually inaccurate information, poses a risk in clinical contexts. Security and privacy are a core concern for any mental health application that handles sensitive personal data. Finally, there is the ethical issue of balancing human agency in psychiatric care against the use of AI in clinical decisions, together with the risk of human dependence on machines. Looking ahead, the integration of foundation models with psychological intervention paradigms should be advanced, with a strong emphasis on clinical translation pathways, to build a more complex, adaptable, and culture-sensitive digital phenotype of depression and to accomplish the digital and intelligent transformation of mental health services.

Key words: foundation models, depression, early screening, auxiliary diagnosis