ISSN 1671-3710
CN 11-4766/R
Sponsored by: Institute of Psychology, Chinese Academy of Sciences
Published by: Science Press

Advances in Psychological Science ›› 2025, Vol. 33 ›› Issue (10): 1794-1804. doi: 10.3724/SP.J.1042.2025.1794 cstr: 32111.14.2025.1794

• Research Frontiers •

Audiovisual integration in infant language acquisition: Different patterns in typically developing infants and those at elevated risk for autism spectrum disorder

JIN Mengke1, YAN Linlin1, LIU Shaoying1, XIAO Naiqi2

  1. Department of Psychology, School of Science, Zhejiang Sci-Tech University, Hangzhou 310018, China
  2. Department of Psychology, Neuroscience and Behaviour, McMaster University, Hamilton L8S 4L8, Canada
  • Received: 2024-10-30  Online: 2025-10-15  Published: 2025-08-18
  • Corresponding author: YAN Linlin, E-mail: yanlinlin@zstu.edu.cn
  • Supported by:
    Major Humanities and Social Sciences Research Program of Zhejiang Provincial Universities, "A Comparative Study of Audiovisual Integration Abilities in Chinese and Canadian Infants" (2024QN133); Zhejiang Educational Science Planning Project, "Developmental Patterns of and Interventions for Early Audiovisual Perceptual Integration in Infants" (2024SCG329)


Abstract:

Multimodal perception is essential for language acquisition in infancy, and audiovisual integration plays a key role in the language development of typically developing infants. By contrast, infants at elevated risk for autism spectrum disorder (ASD) often face challenges in this integration process. The typical trajectory of speech perception development underscores the importance of facial-feature processing: attention to the eye and mouth regions facilitates language learning. Empirical studies show that infants as young as 4.5 months already possess audiovisual integration abilities, which reliably predict later language outcomes. Infants at elevated risk for ASD, however, show reduced social attention and impaired audiovisual integration, deficits that may disrupt the typical pathway of language acquisition. Early intervention strategies should therefore prioritize biologically based, sensory-guided approaches that strengthen multisensory integration rather than training attentional behaviors alone. Understanding these mechanisms not only deepens our knowledge of typical language development but also provides an empirical basis for designing early interventions that support language acquisition in infants at elevated risk for ASD.


Abstract:

Language development in infancy is fundamentally shaped by the dynamic integration of auditory and visual (AV) cues. This review examines the role of AV synergy in early language acquisition by contrasting developmental trajectories in typically developing infants and those at elevated risk for autism spectrum disorder (ASD).

In typically developing infants, AV integration progresses through stage-specific mechanisms. During the first three months after birth, infants prioritize eye gaze to establish social engagement. At this stage, infants rely on temporally synchronized cues, such as speech paired with facial expressions. Early cross-modal learning is mediated by primary audiovisual cortical responses, initially confined to narrow temporal windows. Between 3 and 6 months, attention shifts toward the mouth region, driven by the exaggerated articulatory movements and prosodic features typical of infant-directed speech (IDS). During this phase, infants also begin to show sensitivity to conflicting AV inputs, as demonstrated by the McGurk effect: infants integrate a mismatched visual /ga/ and auditory /ba/ into a fused /da/ percept. Adaptive mechanisms emerge during this period, with infants increasing mouth fixation to compensate for auditory ambiguity in noisy or unfamiliar linguistic contexts. Between 6 and 9 months, mouth-focused attention becomes dominant, facilitating precise phoneme-lip mapping. Bilingual infants exhibit adaptive plasticity, extending mouth-fixation durations to manage dual-language input. From 9 to 12 months, socio-cognitive maturation supports a dynamic rebalancing of attention: infants maintain mouth fixation during lexical acquisition to strengthen phoneme-semantic associations, while reinstating eye contact to support joint attention and intentional communication. Across all stages, IDS optimizes language learning through enhanced AV synchrony, such as slowed speech rates and amplified mouth movements, scaffolding each developmental milestone.

Infants at elevated risk for ASD show systematic deviations in AV integration that emerge early in life. A prominent feature is a progressive decline in social attention, particularly eye gaze, apparent as early as two months of age. Unlike typically developing infants, who maintain eye contact to foster social reciprocity, high-risk infants gradually reduce fixation on the eyes. This diminished attention disrupts foundational processes of joint attention, thereby limiting caregiver-infant interaction and linguistic input. Neural studies link these behavioral differences to reduced cortical activation in temporal regions during dynamic face processing, suggesting impaired encoding of social stimuli. Concurrently, high-risk infants display delayed attention to the mouth region, with significant increases in mouth fixation occurring only around 18 months, considerably later than the typical 6- to 9-month window. This delay degrades phoneme-lip mapping accuracy and leads to weaker phoneme discrimination; for instance, high-risk infants struggle to exploit visual speech cues in noisy environments, reflecting impaired AV integration. Neurophysiological evidence further points to impaired AV synchrony detection, including increased tolerance of asynchronous AV stimuli and the absence of McGurk responses by nine months. These behavioral deficits are underpinned by neural atypicalities, evidenced by attenuated event-related potentials (ERPs) such as diminished N290 responses to dynamic faces; such neural signatures predict later social and linguistic impairments. Additionally, sex differences reveal divergent compensatory strategies: female high-risk infants partially mitigate language delays by increasing mouth fixation, whereas male infants exhibit persistent deficits in social attention and AV integration. Collectively, these findings identify AV integration anomalies as early transdiagnostic markers, detectable months before overt behavioral symptoms of ASD, such as language delays or social withdrawal, emerge.

Intervention strategies aligned with developmental stages have demonstrated efficacy. Early interventions (0-6 months) leverage biofeedback to reinforce eye contact and enrich IDS-driven multimodal input. Mid-phase interventions (6-12 months) employ virtual reality training to enhance visual reliance in challenging auditory environments, alongside wearable eye-trackers to align gaze with auditory labeling. After 12 months, interventions incorporate emotional prosody and facial expressions to support socio-linguistic fluency. Preliminary studies indicate that multisensory integration training significantly improves language outcomes in high-risk infants, surpassing attention-focused approaches.

Critical challenges remain, including clarifying how prosodic cues influence phoneme discrimination, understanding neural mechanisms underlying consonant learning, and translating AV biomarkers into practical clinical tools. Future research should combine naturalistic observation with advanced neuroimaging techniques to develop multimodal risk assessment systems. Addressing these gaps will facilitate early, personalized interventions, leveraging neuroplasticity to reduce developmental impairments.

Key words: speech perception, audiovisual matching, multisensory integration, high-risk autism spectrum disorder (ASD) infants, language development, early intervention

CLC number: