ISSN 1671-3710
CN 11-4766/R
Sponsor: Institute of Psychology, Chinese Academy of Sciences
Publisher: Science Press

Advances in Psychological Science ›› 2026, Vol. 34 ›› Issue (6): 1035-1048. doi: 10.3724/SP.J.1042.2026.1035 cstr: 32111.14.2026.1035

• Research Frontiers •

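A comparison of the mechanisms of level-1 and level-2 visual perspective taking: Theoretical controversies and behavioral and neuroscientific evidence

WANG Jiayin (王镓茵), LI Jing (李晶)

  1. School of Psychology, Nanjing Normal University, Nanjing 210097
  • Received: 2025-06-05 Publication date: 2026-06-15 Release date: 2026-04-17
  • Funding:
    National Natural Science Foundation of China, General Program (42371444); Ministry of Education Humanities and Social Sciences Research General Project (24YJA190007)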

Comparing the mechanisms of level-1 and level-2 visual perspective taking: Theoretical controversies, behavioral and neuroscientific evidence

WANG Jiayin, LI Jing   

  1. School of Psychology, Nanjing Normal University, Nanjing 210097, China
  • Received:2025-06-05 Online:2026-06-15 Published:2026-04-17

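Abstract: Visual perspective taking (VPT) is divided into level-1 and level-2 visual perspective taking. Existing theories disagree fundamentally about how the two are related: the two-systems account holds that their internal mechanisms are independent, whereas the single-system account holds that they share one system. Integrating the two theoretical perspectives, this paper proposes a three-stage processing model, according to which both level-1 and level-2 VPT pass through three stages: information processing, perspective simulation, and information integration with response selection. Behavioral and neural evidence indicates that, across the three stages, the processing mechanisms of level-1 and level-2 VPT may show both differences and similarities: in the information processing stage, the two share basic spatial information encoding, but level-2 VPT involves more fine-grained representations; in the perspective simulation stage, level-1 VPT relies on rapid, non-embodied line-of-sight tracking, whereas level-2 VPT requires embodied reference-frame transformation and self-rotation alignment; in the information integration and response selection stage, the two may share an understanding of others' intentions, but level-2 VPT requires stronger cognitive control. The three-stage model constructed here provides a unified framework for understanding level-1 and level-2 VPT. Future research should develop experimental paradigms that can dissociate the processing stages, further test the model's time course, and explore in depth the triggering conditions of embodied mechanisms in level-2 VPT and their cross-modal integration.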

Keywords: visual perspective taking, two-systems account, single-system account, spatial cognition

Abstract: Visual Perspective Taking (VPT), the ability to simulate and understand another's visual experience, is traditionally categorized into two levels: Level-1 (judging visibility, i.e., “what” is seen) and Level-2 (judging appearance, i.e., “how” it is seen). Current theories in this field present two opposing views: the two-systems account proposes that these two processes involve separate but complementary cognitive systems, while the single-system account suggests that a unified cognitive system is responsible for both. Both theories, however, struggle to fully explain empirical anomalies. To resolve these inconsistencies, this paper proposes a novel Three-Stage Processing Model. This framework suggests that both levels of VPT undergo three sequential phases: (1) Information Processing, (2) Perspective Simulation, and (3) Information Integration with Response Selection.
Stage 1: Information Processing. In this initial stage, both level-1 and level-2 VPT involve the encoding of spatial relationships between the self, others, and objects. However, the depth and scope of this information processing differ. Behavioral evidence suggests that level-1 VPT primarily involves tracking “line-of-sight” paths, requiring only a relatively shallow representation of whether a physical barrier exists between the agent and the target. In contrast, level-2 VPT demands a more fine-grained spatial representation, including the precise orientation and visual morphology of objects as seen from different angles. While both levels share basic spatial encoding in the occipito-parietal cortex, level-2 VPT triggers more extensive activation in the dorsal attention and frontoparietal control networks to support this greater representational depth.
Stage 2: Perspective Simulation. This stage marks the most significant divergence between the two levels. In level-1 VPT, perspective simulation is a relatively straightforward process that involves quickly tracking the other's line of sight and determining whether an object is visible. This simulation process relies on rapid, non-embodied mechanisms, such as gaze tracking, that do not require significant cognitive resources. In contrast, level-2 VPT engages more complex and embodied processes, often requiring mental rotation or reconfiguration of the reference frame. This embodied simulation involves a shift from the self's reference frame to that of the other, requiring cognitive resources such as body representation and spatial reasoning. Behavioral studies demonstrate that body posture alignment significantly facilitates level-2 VPT but has little effect on level-1 VPT. Neuroscientific data support this, showing that level-2 VPT specifically activates brain regions associated with body representation, such as the Extrastriate Body Area (EBA) and the insula, which are largely inactive during level-1 VPT.
Stage 3: Information Integration with Response Selection. In the final stage, individuals must integrate the information gathered in the first two stages and make a final judgment about the object or the other person's perspective. During this stage, level-1 and level-2 VPT share a common mechanism for integrating information about the other person's intentions and mental states. For instance, when an agent exhibits a goal-directed “reach-to-grasp” action, performance on both level-1 and level-2 VPT is enhanced, suggesting a shared understanding of others' psychological states at the response stage. However, level-2 VPT generally requires stronger cognitive control to resolve more complex perspective conflicts. Neural evidence suggests that “social brain” regions, specifically the right Temporoparietal Junction (rTPJ) and dorsomedial Prefrontal Cortex (dmPFC), play a crucial role in managing these conflicts. Although their role remains debated, current evidence suggests these regions are likely recruited in both levels when tasks explicitly require processing social intent or involve high interference.
In conclusion, we propose the Three-Stage Processing Model by integrating evidence from behavioral and neuroscience research. This model offers a unified framework that accommodates both the similarities and the differences between level-1 and level-2 VPT. To further validate and refine this model, future research should focus on developing experimental paradigms to dissociate these three stages, utilizing high-temporal-resolution techniques to map the model's temporal dynamics, and exploring the triggering conditions for embodied mechanisms in level-2 VPT and their cross-modal integration. This study provides a comprehensive framework that paves the way for a more unified theory of spatial and social cognition.

Key words: visual perspective taking, two-systems account, single-system account, spatial cognition