ISSN 1671-3710
CN 11-4766/R
Sponsored by: Institute of Psychology, Chinese Academy of Sciences
Published by: Science Press

Advances in Psychological Science, 2025, Vol. 33, Issue (10): 1794-1804. doi: 10.3724/SP.J.1042.2025.1794

• Regular Articles •

Audiovisual integration in infant language acquisition: Different patterns in typically developing infants and those at elevated risk for autism spectrum disorder

JIN Mengke1, YAN Linlin1, LIU Shaoying1, XIAO Naiqi2

  1 Department of Psychology, School of Science, Zhejiang Sci-Tech University, Hangzhou 310018, China
  2 Department of Psychology, Neuroscience and Behaviour, McMaster University, Hamilton L8S4L8, Canada
  • Received: 2024-10-30 Online: 2025-10-15 Published: 2025-08-18
  • Contact: YAN Linlin E-mail: yanlinlin@zstu.edu.cn

Abstract:

Language development in infancy is fundamentally shaped by the dynamic integration of auditory and visual (AV) cues. This review examines the role of AV synergy in early language acquisition by contrasting developmental trajectories in typically developing infants and those at elevated risk for autism spectrum disorder (ASD).

In typically developing infants, AV integration progresses through stage-specific mechanisms. During the first three months after birth, infants prioritize eye gaze to establish social engagement. At this stage, they rely on temporally synchronized cues, such as speech paired with facial expressions. Early cross-modal learning is mediated by primary audiovisual cortical responses, initially confined to narrow temporal windows. Between 3 and 6 months, attention shifts toward the mouth region, driven by the exaggerated articulatory movements and prosodic features typical of infant-directed speech (IDS). During this phase, infants also begin to show sensitivity to conflicting AV inputs, as demonstrated by the McGurk effect, in which infants integrate a mismatched visual /ga/ and auditory /ba/ into a fused /da/ percept. Adaptive mechanisms emerge during this period, with infants increasing mouth fixation to compensate for auditory ambiguity in noisy or unfamiliar linguistic contexts. Between 6 and 9 months, mouth-focused attention becomes dominant, facilitating precise phoneme-lip mapping. Bilingual infants exhibit adaptive plasticity, extending mouth fixation durations to manage dual-language inputs. From 9 to 12 months, socio-cognitive maturation supports dynamic rebalancing of attention. Infants maintain mouth fixation during lexical acquisition to enhance phoneme-semantic associations, while simultaneously reinstating eye contact to facilitate joint attention and intentional communication. Across all stages, IDS optimizes language learning through features that enhance AV synchrony, such as slowed speech rates and amplified mouth movements, which scaffold developmental milestones.

Infants at high risk for ASD demonstrate systematic deviations in AV integration emerging early in life. A prominent feature is a progressive decline in social attention, particularly eye gaze, apparent as early as two months of age. Unlike typically developing infants, who maintain eye contact to foster social reciprocity, high-risk infants gradually reduce fixation on the eyes. This diminished attention disrupts foundational processes of joint attention, thereby limiting caregiver-infant interactions and linguistic input. Neural studies link these behavioral differences to reduced cortical activation in temporal regions during dynamic face processing, suggesting impaired encoding of social stimuli. Concurrently, high-risk infants display delayed attention to the mouth region, with significant increases in mouth fixation occurring around 18 months, considerably later than the typical 6- to 9-month period. This delay negatively impacts phoneme-lip mapping accuracy, leading to weaker phoneme discrimination. For instance, high-risk infants struggle to leverage visual speech cues in noisy environments, reflecting impaired AV integration. Neurophysiological evidence further highlights impaired AV synchrony detection, including increased tolerance to asynchronous AV stimuli and the absence of McGurk responses by nine months. These behavioral deficits are underpinned by neural atypicality, evidenced by attenuated event-related potentials (ERPs), such as diminished N290 responses to dynamic faces. Such neural signatures predict later social and linguistic impairments. Additionally, sex differences reveal divergent compensatory strategies: female high-risk infants partially mitigate language delays by increasing mouth fixation, whereas male infants exhibit persistent deficits in social attention and AV integration. Collectively, these findings highlight AV integration anomalies as early transdiagnostic markers detectable months before overt behavioral ASD symptoms, such as language delays or social withdrawal, emerge.

Intervention strategies aligned with developmental stages have demonstrated efficacy. Early interventions (0-6 months) leverage biofeedback to reinforce eye contact and enrich IDS-driven multimodal input. Mid-phase interventions (6-12 months) employ virtual reality training to enhance visual reliance in challenging auditory environments, alongside wearable eye-trackers to align gaze with auditory labeling. After 12 months, interventions incorporate emotional prosody and facial expressions to support socio-linguistic fluency. Preliminary studies indicate that multisensory integration training significantly improves language outcomes in high-risk infants, surpassing attention-focused approaches.

Critical challenges remain, including clarifying how prosodic cues influence phoneme discrimination, understanding neural mechanisms underlying consonant learning, and translating AV biomarkers into practical clinical tools. Future research should combine naturalistic observation with advanced neuroimaging techniques to develop multimodal risk assessment systems. Addressing these gaps will facilitate early, personalized interventions, leveraging neuroplasticity to reduce developmental impairments.

Key words: speech perception, audiovisual matching, multisensory integration, high-risk autism spectrum disorder (ASD) infants, language development, early intervention
