ISSN 1671-3710
CN 11-4766/R
Sponsor: Institute of Psychology, Chinese Academy of Sciences
Publisher: Science Press

Advances in Psychological Science (心理科学进展) ›› 2025, Vol. 33 ›› Issue (12): 2083-2104. doi: 10.3724/SP.J.1042.2025.2083 cstr: 32111.14.2025.2083

• Meta-Analysis •

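Impaired vocal emotion recognition in children with autism spectrum disorder: Prosody, semantics, or integration difficulty? A three-level meta-analytic investigation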

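CHEN Lijun (陈丽君)1, JIN Yuexin (靳悦鑫)1, ZENG Hanhan (曾涵菡)2, JIANG Xiaoliu (蒋销柳)3

  1 Department of Applied Psychology, School of Humanities and Social Sciences, Fuzhou University, Fuzhou 350108
  2 Department of Psychology, School of Sociology and Psychology, Central University of Finance and Economics, Beijing 100098
  3 Department of Social Psychology, School of Sociology, Nankai University, Tianjin 300350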
  • Received: 2025-04-17 Published: 2025-12-15 Released online: 2025-10-27
  • Corresponding author: JIANG Xiaoliu, E-mail: psyjxl@126.com
  • Funding:
    Fujian Provincial Education Science Planning Project (FJJKBK23-06)

Emotional speech recognition in children with autism spectrum disorder: A three-level meta-analysis of prosodic, semantic, and integrative deficits

CHEN Lijun1, JIN Yuexin1, ZENG Hanhan2, JIANG Xiaoliu3

  1 Department of Psychology, Faculty of Humanities and Social Sciences, Fuzhou University, Fuzhou 350108, China
  2 School of Sociology and Psychology, Central University of Finance and Economics, Beijing 100098, China
  3 Department of Social Psychology, School of Sociology, Nankai University, Tianjin 300350, China
  • Received: 2025-04-17 Online: 2025-10-27 Published: 2025-12-15

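Summary: Speech in everyday social interaction carries both semantic and prosodic cues. When children on the autism spectrum judge a speaker's emotion, do they rely on prosody or on semantics? Answering this question would help clarify the origins of the impairment and guide future interventions, but it remains unresolved and hotly debated. This study therefore applied a three-level meta-analytic model to 47 included studies (93 effect sizes, 3142 participants), ran subgroup analyses on categorical variables (e.g., task type, cultural context, age group, control-group matching type, speaker gender, emotion type, and spectrum subtype), and ran meta-regressions on continuous variables (publication year, sample size, and study quality). The results showed a significant deficit in vocal emotion recognition in autism spectrum disorder (g = −0.71). The effect size was largest for integrative tasks (g = −0.90), followed by prosodic tasks (g = −0.61), and smallest for semantic tasks (g = −0.49). Cultural context (p = 0.023) and material type in integrative tasks (p < 0.001) moderated the vocal emotion recognition performance of autistic children, and task type interacted with cultural context, emotion type, and spectrum subtype. The findings support the weak central coherence theory and provide empirical evidence for understanding the mechanisms of social impairment in autism and for developing targeted interventions.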

Key words: autism spectrum disorder, vocal emotion recognition, semantic cues, prosodic cues, meta-analysis

Abstract:

Vocal emotion recognition is a foundational component of effective social communication, yet children with autism spectrum disorder (ASD) frequently experience difficulties in this domain. Although prior research has shown that individuals with ASD are impaired in processing emotional information conveyed through speech, there is little consensus on the precise source of this difficulty. Does the core deficit lie primarily in prosodic processing, in semantic comprehension, or in the integrative mechanisms required to reconcile multiple emotional cues? To address this question, the current study conducted a three-level meta-analysis of 47 independent studies, yielding 93 effect sizes and a total sample of 3,142 participants. The three-level framework models the nesting of effect sizes within studies and allows the overall deficit and its moderators to be estimated across varied task designs, cultural contexts, and developmental stages.
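
The abstract does not spell out the model specification. As a sketch only, a common way to write a three-level meta-analytic model for effect size i nested in study j is shown below; the symbols are assumed notation for illustration and are not taken from the article.

```latex
% Assumed notation (not from the source): effect size i nested in study j.
% \mu      : pooled effect (overall Hedges' g)
% u_j      : between-study deviation (level 3)
% w_{ij}   : within-study deviation between effect sizes (level 2)
% e_{ij}   : sampling error with known variance v_{ij} (level 1)
\begin{aligned}
g_{ij} &= \mu + u_{j} + w_{ij} + e_{ij}, \\
u_{j} &\sim \mathcal{N}\!\left(0, \tau^{2}_{\text{between}}\right), \quad
w_{ij} \sim \mathcal{N}\!\left(0, \tau^{2}_{\text{within}}\right), \quad
e_{ij} \sim \mathcal{N}\!\left(0, v_{ij}\right)
\end{aligned}
```

Under this formulation, moderators such as task type or cultural context would enter as additional fixed-effect terms next to \mu, while the two random terms absorb the dependence among effect sizes drawn from the same study.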

The results revealed a significant overall deficit in the vocal emotion recognition performance of children with ASD compared with their typically developing peers (Hedges' g = −0.71), a moderate-to-large effect. Importantly, the magnitude of the deficit was systematically related to task type: the largest impairments occurred in integrative tasks requiring the simultaneous processing of semantic and prosodic cues (g = −0.90), followed by tasks emphasizing prosodic features alone (g = −0.61), and then by semantic-only tasks (g = −0.49). This pattern supports the Weak Central Coherence (WCC) theory, which posits that individuals with ASD exhibit a domain-general impairment in integrating multiple channels of information. Even tasks that appear unimodal on the surface, such as recognizing emotional tone or understanding emotional vocabulary, often entail the coordination of subtle subcomponents (e.g., combining pitch, intensity, and duration for prosody, or word meaning, syntax, and context for semantics), which can disproportionately tax integrative processing in ASD.
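
The abstract reports Hedges' g but not how each effect size was derived. The following is a minimal sketch of the standard calculation from group means, standard deviations, and sample sizes, using the usual small-sample correction. The function name `hedges_g` and the input values are hypothetical and are not data from the meta-analysis.

```python
import math


def hedges_g(mean_asd, sd_asd, n_asd, mean_td, sd_td, n_td):
    """Return (g, var_g) for an ASD vs. typically developing (TD) comparison.

    A negative g means the ASD group scored lower than the TD group.
    """
    df = n_asd + n_td - 2
    # Pooled standard deviation across the two groups.
    pooled_sd = math.sqrt(
        ((n_asd - 1) * sd_asd ** 2 + (n_td - 1) * sd_td ** 2) / df
    )
    # Cohen's d: standardized mean difference.
    d = (mean_asd - mean_td) / pooled_sd
    # Small-sample correction factor (common approximation).
    j = 1 - 3 / (4 * df - 1)
    g = j * d
    # Approximate sampling variance of d, then of g.
    var_d = (n_asd + n_td) / (n_asd * n_td) + d ** 2 / (2 * (n_asd + n_td))
    var_g = j ** 2 * var_d
    return g, var_g


# Hypothetical accuracy scores (percent correct), for illustration only.
g, var_g = hedges_g(mean_asd=70.0, sd_asd=10.0, n_asd=20,
                    mean_td=80.0, sd_td=10.0, n_td=20)
print(f"g = {g:.3f}, variance = {var_g:.4f}")
```

Each study's g and its sampling variance would then serve as the inputs to the pooled model, with the variance acting as the known level-1 error term.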

Further, a series of moderator analyses revealed key contextual and methodological factors shaping the extent of impairment. Cultural context was a significant moderator of performance differences (p = 0.023). Children with ASD from high-context cultures (e.g., many East Asian cultures, where communication often relies on implicit tone and context) exhibited more pronounced deficits in emotional speech recognition (g = −1.03) than those from low-context cultures (e.g., North America or Western Europe, g = −0.60). This finding underscores the importance of considering the sociocultural environment in understanding emotional development in ASD and in designing culturally sensitive interventions. Material type in integrative tasks also emerged as a powerful moderator (p < 0.001). When semantic and prosodic cues were congruent, group differences between children with ASD and typically developing children were relatively small. However, when cues were conflicting or ambiguous, the performance of children with ASD deteriorated substantially. This suggests that the integrative challenge, especially resolving conflicting information, may constitute the most critical barrier to successful emotion recognition in ASD. Contrary to the developmental delay hypothesis, age did not significantly moderate the observed deficits, suggesting that impairments in vocal emotion recognition are not simply a reflection of delayed maturation but may instead represent a stable and persistent neurocognitive feature of ASD. Interventions may therefore need to move beyond age-compensation frameworks.

In addition, task type interacted significantly with cultural context (p = 0.035) and with emotion type (p = 0.019). Specifically, children with ASD from high-context cultures had the greatest difficulty with vocal emotion recognition in integrative tasks, and the performance gap relative to typically developing children was largest when identifying mixed and complex emotions in such tasks. Further exploratory analyses pointed to a marginal interaction between task type and diagnostic subtype (p = 0.060). For instance, children with Asperger's syndrome performed relatively better on semantic tasks, likely reflecting their preserved linguistic capabilities. However, this advantage did not extend to prosodic or integrative conditions, where their performance was as impaired as in other subtypes. This pattern further supports the idea that superficial language proficiency in ASD may mask deeper deficits in emotional inference, particularly in contexts requiring the integration of multiple cues.

This meta-analysis provides a nuanced understanding of vocal emotion recognition deficits in children with ASD. The findings show that the impairment is robust across contexts and that the greatest difficulty lies in integrating multiple communicative cues rather than in processing prosody or semantics alone. By identifying integrative processing as a critical weakness, and by noting the distinct patterns associated with cultural context and ASD subtypes, the study provides a foundation for developing targeted, evidence-based strategies to enhance social communicative functioning in children with ASD.

Key words: autism spectrum disorder, emotional speech recognition, semantic cues, prosodic cues, meta-analysis
