ISSN 1671-3710
CN 11-4766/R
Sponsored by: Institute of Psychology, Chinese Academy of Sciences
Published by: Science Press

   

Comparing the Mechanisms of Level-1 and Level-2 Visual Perspective Taking: Theoretical Controversies, Behavioral and Neuroscientific Evidence

Wang Jiayin, Li Jing   

  1. , 210097,
  • Received: 2025-06-06 Revised: 2025-12-18 Accepted: 2026-01-09
  • Contact: Li, Jing
  • Supported by:
    Humanities and Social Sciences Youth Foundation of the Ministry of Education of China (24YJA190007); National Natural Science Foundation of China (4237010315)

Abstract: Visual perspective taking (VPT) is commonly divided into two levels, level-1 and level-2. Current theories disagree fundamentally on how the two are related: the two-systems account posits distinct mechanisms, whereas the single-system account posits shared processing. Integrating these accounts, this article proposes a three-stage processing model in which both level-1 and level-2 VPT proceed through information processing, perspective simulation, and information integration with response selection. Behavioral and neural evidence indicates that the mechanisms of level-1 and level-2 VPT show both similarities and differences across these stages. During information processing, both rely on basic encoding of spatial information, but level-2 VPT requires finer-grained representations. During perspective simulation, level-1 VPT relies on rapid, non-embodied gaze tracking, whereas level-2 VPT involves embodied self-rotation with a transformation of the reference frame. During information integration and response selection, both may draw on an understanding of others’ intentions, though level-2 VPT demands stronger cognitive control. The proposed three-stage model offers a unified framework for understanding level-1 and level-2 VPT. Future research should develop experimental paradigms that dissociate these stages, use high-temporal-resolution techniques to examine the temporal dynamics of the model, and further investigate the conditions that trigger embodied mechanisms in level-2 VPT, as well as cross-modal integration.

Key words: visual perspective taking, two-systems account, single-system account, spatial cognition