ISSN 1671-3710
CN 11-4766/R
Sponsor: Institute of Psychology, Chinese Academy of Sciences
Publisher: Science Press

Advances in Psychological Science, 2026, Vol. 34, Issue (2): 251-270. doi: 10.3724/SP.J.1042.2026.0251 cstr: 32111.14.2026.0251

• Research Frontiers •

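Simplify complexity: The neural mechanisms underlying ensemble perception

SUN Huanxiang1,#, ZHANG Fan1,2,#, LI Sijia1, ZHANG Xiuling1, JIANG Yi3,4

  1 School of Psychology, Northeast Normal University, Changchun 130024, China
  2 Jinzhong College of Information, Jinzhong, Shanxi 030800, China
  3 State Key Laboratory of Brain and Cognitive Science, Institute of Psychology, Chinese Academy of Sciences, Beijing 100101, China
  4 Department of Psychology, University of Chinese Academy of Sciences, Beijing 100049, China
  • Received: 2025-07-01 Publication date: 2026-02-15 Release date: 2025-12-15
  • Corresponding author: ZHANG Xiuling, E-mail: zhangxl556@nenu.edu.cn
  • About the authors:

    #SUN Huanxiang and ZHANG Fan are co-first authors of this article

  • Funding:
    Scientific Research Project of the Education Department of Jilin Province (JJKH20241389KJ)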


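Summary:

Ensemble perception is the process by which the visual system efficiently extracts summary information, such as the mean and variance, from a complex external world, and it is important for human adaptation to the environment. Studying its neural mechanisms helps explain how the visual system achieves efficient abstract representation. This article summarizes the time course of ensemble perception, reviews the theoretical models and empirical evidence for this integration mechanism, and distinguishes the functions and neural bases of ensemble coding from those of member (individual-item) coding. Building on existing findings, it proposes a "Coarse-Fine-Refine" integration model: when the brain processes visual features at different levels, domain-general and domain-specific mechanisms may operate in sequence. Early processing relies on coarse, domain-general analysis via the magnocellular pathway; this is followed by relatively fine, domain-specific representation that depends on the parvocellular pathways of feature-selective brain regions; finally, calibration proceeds through iterative feedforward-feedback loops. Future research could address the neural pathways and specific brain regions involved in visual ensemble perception, the roles of feedforward and feedback signals, the generality versus specificity of information coding, and the effects of development and experience on ensemble perception.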


Abstract:

Ensemble perception, the process by which the visual system extracts summary statistical information (e.g., mean and variance) from groups of similar objects at a brief glance, is critical to human adaptive functioning. While the behavioral characteristics of this “gist” perception are well researched, its underlying neural mechanisms remain elusive. The present work reviews the temporal dynamics of ensemble perception, evaluates leading theoretical models in light of key empirical findings, and distinguishes the neural substrates underlying ensemble versus single-item processing. Based on existing evidence, the review proposes a “Coarse-Fine-Refine” model that reconciles divergent findings on visual information processing pathways, temporal stages, and computational mechanisms.

A primary focus of the review is the debated time course of ensemble processing. We first examine evidence for an early, automatic extraction of summary statistics. Event-related potential (ERP) studies suggest that mean emotional information can be processed with limited attention, as indicated by the absence of a spatial attention component (N2pc) and the presence of a visual mismatch negativity (vMMN) in response to changes in unattended ensemble items. Such findings point to a rapid, pre-attentive mechanism of ensemble perception. However, other studies challenge this view. For instance, some failed to find a vMMN for mean orientation when attention was absent, suggesting that at least some ensemble features require attentional resources. Furthermore, recent MEG/EEG studies revealed that a precise and stable representation correlated with behavioral performance emerges only in much later time windows (e.g., 400-700 ms). Taken together, these conflicting findings suggest a possible multi-stage process, beginning with a rapid, coarse estimate followed by a slower, refined calculation.

The review then scrutinizes the integration mechanisms in ensemble perception. Early hypotheses, such as the Signal Pooling Hypothesis, posit a hierarchical, feedforward process where signals from individual items are averaged or pooled at progressively higher levels of the visual pathway. Computational models, like the Population Response Model, provide a plausible neural implementation for this. However, a growing body of evidence challenges a simple linear averaging account. Regression-based ERP studies demonstrate that ensemble perception may rely on non-additive integration, capturing global interactions between items rather than mere summation of individual elements.
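The linear pooling account described above can be illustrated with a minimal sketch. It assumes von Mises orientation tuning, a uniform bank of channels, and a population-vector readout; the function names, the `bandwidth_deg` parameter, and the tuning shape are illustrative assumptions rather than details taken from the Population Response Model literature. The sketch covers only the additive account, not the non-additive integration reported in the regression-based ERP studies.

```python
import numpy as np

def population_response(orientations_deg, preferred_deg, bandwidth_deg=20.0):
    """Sum the responses of orientation-tuned channels over all items.

    Orientation is circular with a period of 180 degrees, so angles are
    doubled before applying a von Mises tuning curve (an assumed shape).
    """
    orientations_deg = np.asarray(orientations_deg, dtype=float)
    preferred_deg = np.asarray(preferred_deg, dtype=float)
    diff = np.deg2rad(2.0 * (orientations_deg[:, None] - preferred_deg[None, :]))
    kappa = 1.0 / np.deg2rad(2.0 * bandwidth_deg) ** 2  # tuning sharpness
    tuning = np.exp(kappa * (np.cos(diff) - 1.0))       # items x channels
    return tuning.sum(axis=0)                           # additive pooling

def decode_mean(pooled, preferred_deg):
    """Read out the ensemble orientation with a population vector."""
    angles = np.deg2rad(2.0 * np.asarray(preferred_deg, dtype=float))
    vector = np.sum(pooled * np.exp(1j * angles))
    return (np.rad2deg(np.angle(vector)) / 2.0) % 180.0

items = np.array([40.0, 50.0, 60.0, 70.0])  # hypothetical item orientations
prefs = np.arange(0.0, 180.0, 1.0)          # hypothetical channel preferences
estimate = decode_mean(population_response(items, prefs), prefs)
print(round(estimate, 3))
```

For this symmetric set the readout coincides with the arithmetic mean of 55 degrees. The population vector computes a circular mean, so for asymmetric or widely spread sets the two can differ slightly.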

We also address the possible dissociation between the neural coding of the ensemble and its individual members. fMRI and MEG evidence suggests that ensemble emotion perception preferentially relies on the dorsal visual stream (e.g., intraparietal sulcus), particularly on rapid magnocellular (M-pathway) input. In contrast, individual item identification depends on the ventral stream (e.g., fusiform gyrus) and parvocellular (P-pathway) processing. Reverse Hierarchy Theory (RHT) also posits a neural dissociation between ensemble and individual representations, proposing that “vision at a glance” (ensemble gist) arises first via a fast feedforward sweep, whereas “vision with scrutiny” (individual details) requires slower, top-down attentional feedback.

In summary, while research on the neural mechanisms of ensemble perception has made significant progress, it remains in a critical phase of development, and several directions for future work stand out: (1) using high-resolution imaging to delineate the specific contributions of different brain areas, including V1 and dorsal versus ventral stream regions; (2) clarifying the interplay between feedforward and feedback signals; (3) resolving the domain-general versus domain-specific debate; and (4) examining how ensemble mechanisms are shaped by neural development and individual experience. We further argue for a crucial distinction between neural activity related to “ensemble perception” (the overall response to multiple stimuli) and “statistical summary representation” (the specific neural computation of a statistic). Failing to separate these concepts risks misattributing general neural summation effects, such as the increase in N170 amplitude evoked by multiple faces, to specific mechanisms of statistical representation. Pursuing these directions will be essential for a complete understanding of the mechanisms underlying ensemble perception.

Finally, based on existing evidence, we propose an integrative “Coarse-Fine-Refine” model. In Phase 1, a rapid, domain-general “gist” is extracted, driven by the M-pathway projecting low-spatial-frequency information to dorsal and frontoparietal regions, forming an initial, rough summary or prediction. In Phase 2, a slower, domain-specific process mediated by the P-pathway analyzes high-spatial-frequency details within specialized ventral/occipital regions (e.g., face- or orientation-specific features). In Phase 3, iterative calibration occurs via recurrent feedforward-feedback loops between high-level (frontoparietal) and feature-specific (ventral/occipital) areas. This interaction uses the initial “coarse” prediction to modulate the “fine” processing, resulting in a final, precise ensemble representation. This framework synthesizes the complex roles of distinct neural pathways and resolves ongoing debates on temporal dynamics (early vs. late) and processing mechanisms (general vs. specific), offering a plausible neural hypothesis for how the brain “simplifies complexity”.
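The three phases can be made concrete with a toy numerical sketch. It is not the authors' computational model, since the abstract specifies no equations. The noise levels, the feedback `gain`, the number of loops, and the simple error-correcting update are all hypothetical choices made only to show how a coarse initial guess could be pulled toward a finer item-based estimate over repeated loops.

```python
import numpy as np

def coarse_fine_refine(items, coarse_noise=8.0, fine_noise=2.0,
                       gain=0.5, n_loops=6, seed=0):
    """Toy three-stage estimate of an ensemble mean (illustrative only)."""
    rng = np.random.default_rng(seed)
    items = np.asarray(items, dtype=float)

    # Phase 1 (coarse): one fast, noisy global guess at the mean.
    coarse = items.mean() + rng.normal(0.0, coarse_noise)

    # Phase 2 (fine): slower item-by-item measurements with less noise.
    fine = items + rng.normal(0.0, fine_noise, size=items.shape)
    fine_mean = fine.mean()

    # Phase 3 (refine): each loop corrects the current estimate by a
    # fraction (gain) of its mismatch with the fine evidence.
    estimate = coarse
    trajectory = [estimate]
    for _ in range(n_loops):
        estimate = estimate + gain * (fine_mean - estimate)
        trajectory.append(estimate)
    return coarse, fine_mean, np.array(trajectory)

coarse, fine_mean, trajectory = coarse_fine_refine([40, 50, 60, 70])
print(np.abs(trajectory - fine_mean))  # mismatch shrinks on every loop
```

With a gain of 0.5 the mismatch between the running estimate and the fine evidence halves on each loop, so the estimate converges toward the fine-stage mean. A different update rule, such as precision weighting, would be equally compatible with the verbal description.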

Key words: ensemble perception, statistical summary representation, perceptual integration, temporal dynamics, neural mechanism
