ISSN 0439-755X
CN 11-1911/B

Acta Psychologica Sinica, 2025, Vol. 57, Issue (6): 987-1000. doi: 10.3724/SP.J.1041.2025.0987

• Academic Papers of the 27th Annual Meeting of the China Association for Science and Techn

Suicidal ideation data augmentation and recognition technology based on large language models

ZHANG Yanbo1,2, HUANG Feng1,2,3, MO Liuling4, LIU Xiaoqian1,2, ZHU Tingshao1,2

  1. CAS Key Laboratory of Behavioral Science, Institute of Psychology, Chinese Academy of Sciences, Beijing 100101, China
  2. Department of Psychology, University of Chinese Academy of Sciences, Beijing 100049, China
  3. Department of Data Science, College of Computing, City University of Hong Kong, Hong Kong SAR 999077, China
  4. Department of Social Psychology, School of Sociology, Nankai University, Tianjin 300350, China
  • Received: 2024-02-08; Published: 2025-06-25; Online: 2025-04-15
  • Contact: ZHU Tingshao, E-mail: tszhu@psych.ac.cn
  • Supported by:
    National Natural Science Foundation of China (62272206); Beijing Municipal Natural Science Foundation (IS23088)

Abstract:

Suicide has become a global public health challenge. Traditional methods for identifying suicidal ideation rely primarily on patients actively seeking help, while automated identification models based on text analysis are limited by the scarcity of annotated data. This study proposes a data augmentation method based on large language models (LLMs) to improve the accuracy of suicidal ideation text recognition. The research uses a two-stage design: Study 1 focuses on data augmentation, and Study 2 validates its effect. In Study 1, ChatGLM3-6B and Qwen-7B-Chat were selected as the base models, and supervised learning strategies were combined with zero-shot and few-shot learning methods to optimize the quality of the training dataset. Across eight comparative experiments, the two self-developed models performed well in data augmentation: their processed datasets achieved comprehensive scores of 0.90 and 0.92, significantly outperforming the baseline models (p < 0.001). Study 2 further evaluated the impact of data augmentation on recognition model performance and showed that the enhanced models outperformed the best baseline models in both recognition accuracy and true negative rate (p < 0.001). These results support the effectiveness of LLM-based data augmentation in improving suicidal ideation recognition models and point to new directions for artificial intelligence applications in mental health. The approach has the potential to provide timely and effective early warning of suicide risk while protecting user privacy, offering technical support and research ideas for suicide prevention work. Future research could focus on expanding data heterogeneity, optimizing prompt engineering design, and introducing human-computer interaction paradigms to further extend the application of this method to clinical psychological diagnosis.
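
For readers unfamiliar with the two evaluation metrics named in the abstract, the following minimal Python sketch shows how recognition accuracy and true negative rate are conventionally computed for a binary classifier. It is an illustration only, not the authors' evaluation code: the function name `accuracy_and_tnr`, the label coding (1 = suicidal ideation present, 0 = absent), and the toy labels are all assumptions introduced here.

```python
def accuracy_and_tnr(y_true, y_pred):
    """Return (accuracy, true negative rate) for binary labels.

    Assumed coding (hypothetical, not from the paper):
    1 = positive class, 0 = negative class.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    total = tp + tn + fp + fn
    # Accuracy: share of all texts that were labeled correctly.
    accuracy = (tp + tn) / total if total else 0.0
    # True negative rate (specificity): share of negative texts
    # that were correctly recognized as negative.
    tnr = tn / (tn + fp) if (tn + fp) else 0.0
    return accuracy, tnr


# Toy labels invented purely for illustration (not data from the study).
y_true = [1, 0, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 1, 0, 0, 0, 0]
acc, tnr = accuracy_and_tnr(y_true, y_pred)
print(f"accuracy={acc:.2f}, true negative rate={tnr:.2f}")
```

A higher true negative rate means fewer texts without suicidal ideation are flagged by mistake, which is why it is commonly reported alongside accuracy for screening-style classifiers.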

Key words: suicidal ideation, data augmentation, suicide text recognition, large language models, artificial intelligence