Suicidal ideation data augmentation and recognition technology based on large language models

doi:10.3724/SP.J.1041.2025.0987

Abstract

Suicide is a global public health challenge. Traditional methods for identifying suicidal ideation rely primarily on patients actively seeking help, while automated identification models based on text analysis are limited by the scarcity of annotated data. This study proposes a data augmentation method based on large language models (LLMs) to improve the accuracy of suicidal ideation text recognition. The research uses a two-stage design: Study 1 focuses on data augmentation, and Study 2 validates the enhancement effect. In Study 1, ChatGLM3-6B and Qwen-7B-Chat were selected as the underlying models, and supervised learning strategies were combined with zero-shot and few-shot learning methods to optimize the quality of the training dataset. Across eight comparative experiments, the two self-developed models performed well in data augmentation: their processed datasets received comprehensive scores of 0.90 and 0.92, significantly outperforming the baseline models (p < 0.001). Study 2 evaluated the impact of data augmentation on recognition model performance and showed that the enhanced models outperformed the best baseline models in both recognition accuracy and true negative rate (p < 0.001). These results support the effectiveness of LLM-based data augmentation for improving suicidal ideation recognition models and suggest new directions for artificial intelligence applications in mental health. The approach has the potential to provide timely and effective early warning of suicide risk while protecting user privacy, offering technical support and research ideas for suicide prevention work.
Future research could expand data heterogeneity, optimize prompt engineering design, and introduce human-computer interaction paradigms to extend the application of this method to clinical psychological diagnosis.
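
The two recognition metrics reported for Study 2, accuracy and true negative rate, can be illustrated with a minimal Python sketch. It assumes binary labels in which 0 marks text without suicide risk and 1 marks text with suicide risk, as in the labeled examples on this page; the function name `accuracy_and_tnr` and the sample labels are hypothetical and are not taken from the study.

```python
from typing import Sequence, Tuple


def accuracy_and_tnr(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (accuracy, true negative rate) for binary labels 0/1.

    Assumption: 0 = no suicide risk (negative class), 1 = suicide risk.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one sample is required")

    pairs = list(zip(y_true, y_pred))
    # True negatives: negative texts that the model also labels as negative.
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    # False positives: negative texts that the model labels as positive.
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    negatives = tn + fp
    # The true negative rate is undefined when there are no negative samples.
    tnr = tn / negatives if negatives else float("nan")
    return accuracy, tnr


# Made-up labels for illustration only (not data from the study).
acc, tnr = accuracy_and_tnr([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 1, 0])
print(f"accuracy={acc:.3f}, true negative rate={tnr:.3f}")
```

A high true negative rate means that few ordinary posts are flagged as risky by mistake.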

Key words: suicidal ideation, data augmentation, suicide text recognition, large language models, artificial intelligence

ZHANG Yanbo, HUANG Feng, MO Liuling, LIU Xiaoqian, ZHU Tingshao. (2025). Suicidal ideation data augmentation and recognition technology based on large language models. Acta Psychologica Sinica, 57(6), 987-1000.

Text	Suicide Risk Category
“给你说晚安” (“Saying goodnight to you”)	0
“我过来看看你” (“I just came to see you”)	0
“今天阳光正好, 好想干点什么” (“The sunshine is nice today, I want to do something”)	0
“我想去死了” (“I want to die”)	1
“怎么死比较好呢?” (“What's the best way to die?”)	1
“与其天天面对无休止的谩骂, 我想选择死亡。” (“Rather than facing endless abuse every day, I choose death.”)	1

Pattern	Input (prompt)	Output (completion)
Zero-shot	Please list some suicidal statements:	1. 我总想着死没有勇气怎么办 (I always think about death but don't have the courage, what should I do.) 2. 连结束自己这种事情都做不好 (Can't even end myself properly.) 3. 我知道我活不下去 (I know I can't go on living.)
Few-shot	Below is a suicidal statement: The weather in late May is cold like winter every day though my body should lie down still hope to die in my dreams. Similar suicidal statements include:	1. 饭饭, 我是真的很想死…真的很想死… (Fanfan, I really want to die ... really want to die ... ) 2. 现在陷入最两难境地活着难受死了又怕还能投胎愿我能一直做个孤魂野鬼游荡在阴间 (Now I'm in the most difficult situation, living is painful and I'm afraid of being reincarnated if I die, may I wander as a lonely ghost in the netherworld forever.) 3. 我也好想死啊 (I also really want to die.)
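
The two prompt patterns in the table above can be sketched as simple string templates, together with a helper that splits a numbered completion into individual items. This is a minimal illustration, not the authors' pipeline: the names `build_prompt` and `split_numbered_completion` are hypothetical, the templates use the English wording shown in the table (the prompts used in the study may have been worded differently, for example in Chinese), and the model call itself is left out.

```python
import re
from typing import List, Optional

# Prompt wording copied from the table; treat it as an assumption about the real prompts.
ZERO_SHOT_PROMPT = "Please list some suicidal statements:"
FEW_SHOT_TEMPLATE = (
    "Below is a suicidal statement: {seed} Similar suicidal statements include:"
)

# Matches item markers such as "1. " or " 2." at the start of the text or after whitespace.
_ITEM_MARKER = re.compile(r"(?:^|\s)\d+\.\s*")


def build_prompt(seed: Optional[str] = None) -> str:
    """Return the zero-shot prompt, or the few-shot prompt when a seed text is given."""
    if seed is None:
        return ZERO_SHOT_PROMPT
    return FEW_SHOT_TEMPLATE.format(seed=seed.strip())


def split_numbered_completion(completion: str) -> List[str]:
    """Split a completion of the form '1. ... 2. ... 3. ...' into separate items."""
    parts = _ITEM_MARKER.split(completion)
    return [part.strip() for part in parts if part.strip()]


# Neutral placeholder strings are used here instead of real posts.
print(build_prompt())
print(build_prompt("<seed statement from the annotated dataset>"))
print(split_numbered_completion("1. first item 2. second item 3.third item"))
```

Each item returned by the splitter could then be stored as one augmented training example with risk category 1, although any such labeling step is an assumption of this sketch.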

Model	Pattern	Inference Effect Examples
GPT3.5_TURBO_zeroshot	Zero-shot	prompt input >>> Please list some suicidal statements: completion output >>> 1. 生不如死 (Life is worse than death) 2. 我想离开这个世界 (I want to leave this world)
GPT3.5_TURBO_fewshot	Few-shot	prompt input >>> Below is a suicidal statement: Living is so painful, why not just die Similar suicidal statements include: completion output >>> 1. 我已经无法忍受这种痛苦了, 死亡是唯一的出路。 (I can no longer bear this pain, death is the only way out.) 2. 我觉得自己一无所有, 没有任何意义活着。 (I feel like I have nothing, there's no meaning to living.)
ChatGLM3_6B_zeroshot	Zero-shot	prompt input >>> Please list some suicidal statements: completion output >>> 我无法形容自杀的痛苦和难度, 并且任何形式的自杀都会导致极大的痛苦和伤害, 无论选择哪种方式, 都对你自己和你周围的人造成不可逆转的伤害。如有自杀意念或紧急情况, 请寻求专业的医疗帮助, 例如通过拨打国家自杀预防热线, 可以提供专业的帮助和支持。 (I cannot describe the pain and difficulty of suicide, and any form of suicide will cause extreme pain and harm. No matter which method you choose, it will cause irreversible harm to yourself and those around you. If you have suicidal thoughts or are in an emergency, please seek professional medical help, such as by calling the national suicide prevention hotline, which can provide professional help and support.)
ChatGLM3_6B_fewshot	Few-shot	prompt input >>> Below is a suicidal statement: Living is so painful, why not just die Similar suicidal statements include: completion output >>> 自杀热线是提供心理健康支持和帮助的匿名服务, 如果感到沮丧、绝望或无助, 可以拨打当地的自杀预防热线或紧急热线, 获得帮助和支持。这些热线是由专业人员提供支持和指导, 帮助解决问题, 提供安全的环境, 获得心理和情感上的支持。 (Suicide hotlines are anonymous services that provide mental health support and help. If you feel depressed, desperate, or helpless, you can call your local suicide prevention hotline or emergency hotline to get help and support. These hotlines are staffed by professionals who provide support and guidance, help solve problems, provide a safe environment, and obtain psychological and emotional support.)
Qwen_7B_Chat_zeroshot	Zero-shot	prompt input >>> Please list some suicidal statements: completion output >>> 1.说自己总是感觉比别人笨, 因此一直没有走出黑暗, 后来真的发现了自己真的比别人笨, 于是失望至极, 并且这一切都是自己咎由自取。 (Saying one always feels dumber than others, thus has never emerged from darkness, later really discovered that one is indeed dumber than others, and thus was extremely disappointed, and all of this is self-inflicted.) 2.说自己从小就笨, 老师让他带着弟弟妹妹写作业都是在旁边一边弄来弄去, 却不知道从什么时候开始做什么都要挨着别人写 (Saying one has been dumb since childhood, teachers had him help his younger siblings with homework but he was just fiddling around on the side, yet somehow from some point started needing to write alongside others for everything)
Qwen_7B_Chat_fewshot	Few-shot	prompt input >>> Below is a suicidal statement: Living is so painful, why not just die Similar suicidal statements include: completion output >>> 我不想死, 我想活着, 我想活着, 我想活着, 我想活着, 我想活着, 我想活着, 我想活着, 我想活着, 我想活着, 我想活着。 (I don't want to die, I want to live, I want to live, I want to live, I want to live, I want to live, I want to live, I want to live, I want to live, I want to live, I want to live.)