ISSN 0439-755X
CN 11-1911/B
Sponsored by: Chinese Psychological Society
   Institute of Psychology, Chinese Academy of Sciences
Published by: Science Press

Acta Psychologica Sinica ›› 2025, Vol. 57 ›› Issue (6): 947-966. doi: 10.3724/SP.J.1041.2025.0947 cstr: 32110.14.2025.0947

• Academic Papers of the 27th Annual Meeting of the China Association for Science and Technology •

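The linguistic strengths and weaknesses of artificial intelligence: A comparison of the Chinese language abilities of large language models and real students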

高承海1,2, 党宝宝1,2, 王冰洁3, 吴胜涛4

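  1. 1Northwest Minority Education Development Research Center, Northwest Normal University
    2School of Education, Northwest Normal University, Lanzhou 730070
    3School of Psychology, Northwest Normal University, Lanzhou 730070
    4School of Sociology and Anthropology, Xiamen University, Xiamen 361005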
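  • Received: 2024-03-12 Online: 2025-04-15 Published: 2025-06-25
  • Corresponding author: WU Michael Shengtao, E-mail: michaesltwu@gmail.com
  • About the authors:

    GAO Chenghai and DANG Baobao are co-first authors.

  • Funding:
    Major Program of the National Social Science Fund of China (24&ZD189)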

The linguistic strength and weakness of artificial intelligence: A comparison between Large Language Model(s) and real students in the Chinese context

GAO Chenghai1,2, DANG Baobao1,2, WANG Bingjie3, WU Michael Shengtao4

  1. 1Northwest Minority Education Development Research Center, Northwest Normal University, Lanzhou 730070, China
    2School of Education, Northwest Normal University, Lanzhou 730070, China
    3School of Psychology, Northwest Normal University, Lanzhou 730070, China
    4School of Sociology and Anthropology, Xiamen University, Xiamen 361005, China
  • Received:2024-03-12 Online:2025-04-15 Published:2025-06-25

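Abstract (translated from the Chinese):

Using a mixed-methods approach that combines quantitative and qualitative analyses, this research evaluated the linguistic strengths and weaknesses of artificial intelligence along four dimensions: accuracy, normativity, affectivity, and creativity. Study 1 found that, relative to real students, GPT-4 was more accurate on modern-text knowledge (especially conceptual knowledge) but less accurate on ancient poetry and prose and on language usage. GPT-4's normativity scores were comparable to those of real students. Its affectivity and creativity scores exceeded the passing level but were lower than those of real students, and the normativity and affectivity scores of the best-performing GPT-4 individuals matched the highest scores among real students. Study 2 replicated these results with ERNIE-4, and ERNIE-4's normativity scores were higher than those of real students. The research reveals the strengths of artificial intelligence in modern-text knowledge and normativity, its weakness in knowledge of ancient poetry and prose, and its potential in affectivity and creativity. These findings help in understanding and improving the cultural adaptability of artificial intelligence and its capacity for humanized, personalized generation, and they offer important insights for reflecting on and cultivating the unique strengths of humans.

Keywords: large language models, Chinese language ability, accuracy, affectivity, creativity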

Abstract:

Previous research on generative artificial intelligence (AI) has been conducted primarily in the English context, so the linguistic strengths and weaknesses of generative AI in the Chinese context remain unclear. This study focuses on the accuracy, normativity, affectivity, and creativity of AI in generating language knowledge, and explores its cultural adaptability and its ability to generate humanized and personalized content. Evaluating and analyzing these key indicators helps us gain a deeper understanding of the linguistic strengths and weaknesses of AI and cultivate the unique advantages of humans in education.

By combining quantitative and qualitative methods, we evaluated the differences in knowledge accuracy, normativity, affectivity, and creativity between large language models and real students. Specifically, using an explanatory sequential design within the mixed-methods framework, we first tested differences between the large language models (GPT-4 and ERNIE-4) and real students on each of these four indicators. Next, through content analyses, we explored the specific performance of the large language models on each indicator and the mechanisms underlying their linguistic strengths and weaknesses.

Study 1 found that, compared to real students, GPT-4 exhibited higher accuracy in modern-text knowledge (especially conceptual knowledge) but lower accuracy in ancient poetry and language usage. The normativity of GPT-4 was comparable to that of real students, while its affectivity and creativity were lower than those of real students. Moreover, the highest individual scores of GPT-4 in normativity and affectivity were comparable to the highest scores of real students. Study 2, based on ERNIE-4, confirmed these results, and its accuracy in ancient poetry was still lower than that of real students. The results reveal the advantages of artificial intelligence in modern-text knowledge and norms, its shortcomings in ancient poetry knowledge, and its potential in affective and creative expression.

Taken together, the current findings demonstrate the linguistic strength of generative AI in knowledge accuracy for modern Chinese texts, its weaknesses in ancient Chinese poetry and in affective and creative writing, and its potential in normative and affective expression. These findings shed light on the cultural adaptability and the affective and creative expression of generative AI, and have valuable implications for AI-assisted teaching practice in the Chinese context.

Key words: large language models, language proficiency, accuracy, emotionality, creativity
