ISSN 1671-3710
CN 11-4766/R
Sponsored by: Institute of Psychology, Chinese Academy of Sciences
Published by: Science Press

Advances in Psychological Science, 2026, Vol. 34, Issue 2: 251-270. doi: 10.3724/SP.J.1042.2026.0251

• Regular Articles •

Simplify complexity: The neural mechanisms underlying ensemble perception

SUN Huanxiang1, ZHANG Fan1,2, LI Sijia1, ZHANG Xiuling1, JIANG Yi3,4   

  1. School of Psychology, Northeast Normal University, Changchun 130024, China;
  2. Jinzhong College of Information, Jinzhong 030800, China;
  3. State Key Laboratory of Brain and Cognitive Science, Institute of Psychology, Chinese Academy of Sciences, Beijing 100101, China;
  4. Department of Psychology, University of Chinese Academy of Sciences, Beijing 100049, China
  • Received: 2025-07-01 Online: 2026-02-15 Published: 2025-12-15

Abstract: Ensemble perception, the process by which the visual system extracts summary statistical information (e.g., mean and variance) from groups of similar objects at a brief glance, is critical to human adaptive functioning. While the behavioral characteristics of this “gist” perception are well researched, its underlying neural mechanisms remain elusive. The present work reviews the temporal dynamics of ensemble perception, evaluates leading theoretical models in light of key empirical findings, and distinguishes the neural substrates underlying ensemble versus single-item processing. Based on existing evidence, the review proposes a “Coarse-Fine-Refine” model that reconciles divergent findings on visual information processing pathways, temporal stages, and computational mechanisms.
A primary focus of the review is the contested temporal dynamics of ensemble processing. We first examine evidence for an early, automatic extraction of summary statistics. Event-related potential (ERP) studies suggest that mean emotional information can be processed with limited attention, as indicated by the absence of the N2pc, a marker of spatial attention, and the presence of a visual mismatch negativity (vMMN) in response to changes in unattended ensemble items. Such findings point to a rapid, pre-attentive mechanism of ensemble perception. However, this view is challenged by other studies. For instance, some studies failed to find a vMMN in the absence of attention during mean orientation perception, suggesting that at least some ensemble features require attentional resources. Furthermore, recent MEG/EEG studies revealed that a precise and stable representation correlated with behavioral performance emerges only in much later time windows (e.g., 400-700 ms). These conflicting findings suggest a possible multi-stage process, beginning with a rapid, coarse estimate followed by a slower, refined calculation.
The review then scrutinizes the integration mechanisms in ensemble perception. Early hypotheses, such as the Signal Pooling Hypothesis, posit a hierarchical, feedforward process in which signals from individual items are averaged or pooled at progressively higher levels of the visual pathway. Computational models, such as the Population Response Model, provide a plausible neural implementation of such pooling. However, a growing body of evidence challenges a simple linear averaging account. Regression-based ERP studies indicate that ensemble perception may rely on non-additive integration, capturing global interactions between items rather than the mere summation of individual elements.
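The pooling idea described above can be illustrated with a toy sketch. It is not the Population Response Model as specified in the reviewed literature; it only assumes Gaussian tuning curves, a bank of hypothetical units with evenly spaced preferred values, and a response-weighted readout. The function names, the tuning width, and the item values are all illustrative assumptions, and the feature (e.g., orientation in degrees) is treated as linear rather than circular for simplicity.

```python
import math

def tuning_response(preferred, stimulus, width=20.0):
    """Gaussian tuning curve: response of a unit preferring `preferred` to `stimulus`."""
    return math.exp(-0.5 * ((stimulus - preferred) / width) ** 2)

def pooled_population_response(stimuli, preferred_values, width=20.0):
    """Pooling step: each unit sums its responses over all items in the display."""
    return [
        sum(tuning_response(p, s, width) for s in stimuli)
        for p in preferred_values
    ]

def decode_mean(stimuli, preferred_values, width=20.0):
    """Read out an ensemble estimate as the response-weighted average of preferred values."""
    responses = pooled_population_response(stimuli, preferred_values, width)
    total = sum(responses)
    return sum(p * r for p, r in zip(preferred_values, responses)) / total

# Hypothetical bank of units with preferred values from -90 to 90, spaced 1 apart.
preferred = [float(p) for p in range(-90, 91)]

# Hypothetical display of four items (feature values are made up for illustration).
items = [-10.0, 0.0, 5.0, 25.0]

estimate = decode_mean(items, preferred)
true_mean = sum(items) / len(items)
print(f"pooled estimate: {estimate:.2f}, arithmetic mean: {true_mean:.2f}")
```

Under these assumptions the pooled readout lands close to the arithmetic mean of the items without any unit representing an individual item explicitly. This is a purely additive scheme, so it cannot by itself produce the non-additive interactions between items that the regression-based ERP findings point to.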
We also address the possible dissociation between the neural coding of the ensemble and that of its individual members. fMRI and MEG evidence suggests that ensemble emotion perception preferentially relies on the dorsal visual stream (e.g., intraparietal sulcus), particularly on rapid magnocellular (M-pathway) input. In contrast, individual item identification depends on the ventral stream (e.g., fusiform gyrus) and parvocellular (P-pathway) processing. Reverse Hierarchy Theory (RHT) also posits a neural dissociation between ensemble and individual representations, proposing that “vision at a glance” (ensemble gist) arises first via a fast feedforward sweep, whereas “vision with scrutiny” (individual details) requires slower, top-down attentional feedback.
In summary, while research on the neural mechanisms of ensemble perception has made significant progress, it remains in a critical phase of development, and several key issues have yet to be resolved. Future work should focus on: (1) using high-resolution imaging to delineate the specific contributions of different brain areas, including V1 and dorsal versus ventral stream regions; (2) clarifying the interplay between feedforward and feedback signals; (3) resolving the domain-general versus domain-specific debate; and (4) examining how ensemble mechanisms are shaped by neural development and individual experience. We further argue for a crucial distinction between neural activity related to “ensemble perception” (the overall response to multiple stimuli) and “statistical summary representation” (the specific neural computation of a statistic). Failing to separate these concepts risks misattributing general neural summation effects, such as the increase in N170 amplitude evoked by multiple faces, to specific mechanisms of statistical representation. Addressing these issues will be essential for a complete understanding of the mechanisms underlying ensemble perception.
Finally, based on existing evidence, we propose an integrative “Coarse-Fine-Refine” model. In Phase 1, a rapid, domain-general “gist” is extracted, driven by the M-pathway projecting low-spatial-frequency information to dorsal and frontoparietal regions, forming an initial, rough summary or prediction. In Phase 2, a slower, domain-specific process mediated by the P-pathway analyzes high-spatial-frequency details within specialized ventral/occipital regions (e.g., face- or orientation-specific features). In Phase 3, iterative calibration occurs via recurrent feedforward-feedback loops between high-level (frontoparietal) and feature-specific (ventral/occipital) areas. This interaction uses the initial “coarse” prediction to modulate the “fine” processing, resulting in a final, precise ensemble representation. This framework synthesizes the complex roles of distinct neural pathways and resolves ongoing debates on temporal dynamics (early vs. late) and processing mechanisms (general vs. specific), offering a plausible neural hypothesis for how the brain “simplifies complexity”.

Key words: ensemble perception, statistical summary representation, perceptual integration, temporal dynamics, neural mechanism
