ISSN 0439-755X
CN 11-1911/B

Acta Psychologica Sinica ›› 2025, Vol. 57 ›› Issue (6): 947-966. doi: 10.3724/SP.J.1041.2025.0947

• Academic Papers of the 27th Annual Meeting of the China Association for Science and Technology •

The linguistic strength and weakness of artificial intelligence: A comparison between Large Language Model(s) and real students in the Chinese context

GAO Chenghai1,2, DANG Baobao1,2, WANG Bingjie3, WU Michael Shengtao4   

  1. Northwest Minority Education Development Research Center, Northwest Normal University, Lanzhou 730070, China;
  2. School of Education, Northwest Normal University, Lanzhou 730070, China;
  3. School of Psychology, Northwest Normal University, Lanzhou 730070, China;
  4. School of Sociology and Anthropology, Xiamen University, Xiamen 361005, China
  • Received: 2024-03-12 Published: 2025-06-25 Online: 2025-04-15

Abstract: Previous research on generative artificial intelligence (AI) has been conducted primarily in the English context, and the linguistic strengths and weaknesses of generative AI in the Chinese context remain unclear. This study focuses on the accuracy, normativity, affectivity, and creativity of AI in generating language knowledge, and explores its cultural adaptability and its ability to generate humanized and personalized content. Evaluating and analyzing these key indicators helps us gain a deeper understanding of the linguistic strengths and weaknesses of AI and cultivate the unique advantages of humans in education.
By combining quantitative and qualitative methods, we evaluated the differences in knowledge accuracy, normativity, affectivity, and creativity between large language models and real students. Specifically, using an explanatory sequential design within the mixed-methods framework, we first tested the differences between the large language models (GPT-4 and ERNIE-4) and real students on each of these indicators. Next, through content analyses, we explored the specific performance of the large language models on each indicator and the mechanisms underlying their linguistic strengths and weaknesses.
Study 1 found that, compared with real students, GPT-4 exhibited higher accuracy in modern text knowledge (especially conceptual knowledge) but lower accuracy in ancient poetry and language usage. The knowledge normativity of GPT-4 was comparable to that of real students, while its affectivity and creativity were lower than those of real students. Moreover, the highest individual scores of GPT-4 in normativity and affectivity were comparable to the highest scores of real students. Study 2, based on ERNIE-4, confirmed these results; its accuracy in ancient poetry was again lower than that of real students. The results revealed the advantages of artificial intelligence in the areas of modern knowledge and norms, its shortcomings in ancient poetry knowledge, and its potential in affective and creative expression.
Taken together, the current findings demonstrate the linguistic strength of generative AI in the accuracy of its knowledge of modern Chinese literary texts, its weakness in ancient Chinese poetry and in affective and creative writing, and its potential in normative and affective expression. These findings shed light on the cultural adaptability and the affective and creative expression of generative AI, and have valuable implications for AI-assisted teaching practice in the Chinese context.

Key words: large language models, language proficiency, accuracy, emotionality, creativity