ISSN 0439-755X
CN 11-1911/B

Acta Psychologica Sinica ›› 2026, Vol. 58 ›› Issue (4): 590-602. doi: 10.3724/SP.J.1041.2026.0590

• Reports of Empirical Studies •

Cross-modal transfer of statistical learning under unimodal and multimodal learning conditions

TANG Yi1, ZHAO Yajun2, ZENG Qingzhang3, ZHANG Zhijun3, WU Shengnan1   

  1. Chongqing Academy of Governance, Chongqing 400041, China;
  2. College of Sociology and Psychology, Southwest University for Nationalities, Chengdu 610041, China;
  3. Department of Psychology and Behavioral Sciences, Zhejiang University, Hangzhou 310022, China
  • Received: 2025-03-17 Published: 2026-04-25 Online: 2026-01-16

Abstract: Statistical learning (SL), defined as an unconscious and automatic ability to extract regularities from the environment, has been shown to operate across multiple sensory modalities, including vision, audition, and touch. Although SL exhibits a certain degree of modality independence, modality-specific learning processes are not entirely isolated and may interact. From the perspective of object-feature processing, Frost et al. (2015) proposed the abstract rule representation hypothesis, suggesting that individuals may rely on four types of characteristics when learning inter-object regularities: modality specificity, stimulus specificity, modality generality, and SL jointly modulated by both modality and stimulus. However, theoretical disagreement remains regarding whether statistical regularities learned in one modality can be directly expressed in another, that is, whether cross-modal transfer occurs. Existing research on SL transfer has mainly focused on two areas: (a) transfer between low-level features within the same modality (e.g., from shape to color), which has not been extended to the cross-modal level, and (b) transfer between objects carrying semantic information, which has likewise not addressed cross-modal processing mechanisms. Against this backdrop, the present study used animal pictures and animal sounds as materials to examine the cross-modal transfer of SL between the visual and auditory modalities.
This study comprised four experiments that integrated the cross-modal transfer and multimodal SL paradigms to systematically investigate cross-modal transfer of SL in realistic object contexts. Experiment 1 constructed a visual stimulus stream from animal pictures to verify visual SL. Experiment 2 employed a cross-modal transfer paradigm in which participants were visually familiarized only with animal pictures and then tested with either animal pictures or animal sounds. By comparing performance between the visual-visual and visual-auditory conditions, this experiment evaluated whether visual SL transfers across modalities. Experiment 3 used a multimodal learning approach to separate modality-specific learning from cross-modal transfer. It aimed to (a) examine whether SL in the visual modality is independent of that in the auditory modality, and (b) investigate the relationship between visual-to-auditory transfer and auditory SL under multimodal learning conditions. Experiment 4 assessed the transfer of SL from audition to vision and, together with Experiment 3, examined the bidirectionality of cross-modal transfer.
Experiment 1 successfully validated visual SL with animal pictures, confirming previous findings (e.g., Otsuka et al., 2013). In Experiment 2, statistical regularities learned through unimodal visual exposure persisted within the visual modality and also transferred to the auditory modality, with comparable learning effects across the two senses. Experiment 3 revealed that multimodal input did not significantly interfere with unimodal visual or auditory SL, aligning with the findings of Li et al. (2018) and Mitchel and Weiss (2011) and supporting the idea that SL operates relatively independently across sensory modalities. Furthermore, regardless of whether the auditory stream contained statistical regularities, the visual-to-auditory transfer effect remained robust, suggesting that cross-modal transfer can occur alongside unimodal SL. Experiment 4 confirmed that statistical regularities learned through audition could transfer to vision. Together, these experiments offer converging evidence for bidirectional cross-modal transfer of SL, indicating that SL is not modality-specific but instead reflects a general cognitive mechanism.
In summary, the study presents three main conclusions: (1) SL of real animal objects shows bidirectional cross-modal transfer; (2) SL in the visual and auditory modalities is largely independent; and (3) unimodal SL and cross-modal transfer can proceed independently and in parallel, supporting the idea of a multilevel statistical representation system.

Key words: statistical learning, animal picture, animal sound, multimodal, cross-modal transfer