ISSN 1671-3710
CN 11-4766/R
Sponsored by: Institute of Psychology, Chinese Academy of Sciences
Published by: Science Press

Advances in Psychological Science ›› 2023, Vol. 31 ›› Issue (11): 2050-2062. doi: 10.3724/SP.J.1042.2023.02050

• Research Frontiers •

Visual world paradigm reveals the time course of spoken language processing

WEI Yipu

  1. School of Chinese as a Second Language, Peking University, Beijing 100871, China
  • Received: 2023-02-06 Online: 2023-11-15 Published: 2023-08-28
  • Corresponding author: WEI Yipu, E-mail: weiyipu@pku.edu.cn
  • Funding:
    Youth Foundation for Humanities and Social Sciences Research of the Ministry of Education (21YJC740062)


Abstract:

The visual world paradigm is an eye-tracking paradigm that investigates real-time spoken language processing by tracking and measuring where the eyes fixate among visual objects. Its application to language comprehension research rests on linking hypotheses for eye movements (e.g., the coordinated interplay account and the goal-based linking hypothesis), which establish a meaningful relation between eye-movement patterns and the time course of spoken language processing. Data obtained with the visual world paradigm provide precise temporal information about spoken language processing; common analysis methods include mean fixation proportions within temporal regions of interest, divergence point analysis, and growth curve analysis. The paradigm has supplied key evidence for research on spoken word recognition, syntactic disambiguation, semantic comprehension, and discourse-level pragmatic processing.

Keywords: visual world paradigm, eye-tracking, spoken language processing

Abstract:

The visual world paradigm (VWP) is a widely used tool in psycholinguistics for studying the time course of spoken language processing (Cooper, 1974; Tanenhaus et al., 1995). In this paradigm, eye movements are tracked while participants listen to spoken language and view visual scenes, providing precise temporal information about the processing of words and sentences. As the acoustic input unfolds, comprehenders' focus of attention on particular entities in their mental representation of the spoken language changes, and their visual attention shifts accordingly (Altmann & Mirković, 2009). This allocation of attention is manifested in eye movements as overt behavioral data.
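The behavioral data described above are typically summarized as fixation proportions over time. The following sketch is not from the article; it assumes a hypothetical tidy layout with one row per eye-tracking sample and illustrates how per-bin fixation proportions could be computed:

```python
# Illustrative sketch: computing fixation proportions over time, the
# standard dependent measure in visual world experiments. The data layout
# (one (trial, time_ms, roi) tuple per sample) is an assumption for this
# example, not something specified in the article.
import numpy as np

def fixation_proportions(samples, n_bins, bin_ms=50):
    """samples: iterable of (trial, time_ms, roi) tuples, where roi is the
    region of interest fixated at that sample ('target', 'competitor', ...).
    Returns {roi: array of per-bin fixation proportions across trials}."""
    rois = sorted({roi for _, _, roi in samples})
    counts = {roi: np.zeros(n_bins) for roi in rois}
    totals = np.zeros(n_bins)
    for _, t, roi in samples:
        b = int(t // bin_ms)
        if b < n_bins:
            counts[roi][b] += 1
            totals[b] += 1
    # Guard against empty bins when normalizing.
    return {roi: counts[roi] / np.maximum(totals, 1) for roi in rois}

# Toy data: two trials sampled every 50 ms; in trial 1 the listener
# switches from a competitor to the target at 100 ms, in trial 2 the
# target is fixated throughout.
samples = [(1, t, 'target' if t >= 100 else 'competitor') for t in range(0, 200, 50)]
samples += [(2, t, 'target') for t in range(0, 200, 50)]
props = fixation_proportions(samples, n_bins=4)
print(props['target'].tolist())  # → [0.5, 0.5, 1.0, 1.0]
```

Proportions like these, aggregated across trials and participants, are the input to the analyses discussed later (time-window comparisons, divergence point analysis, growth curve analysis).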

Linking hypotheses in this field connect eye movements in visual contexts with the mental representations of linguistic input. The coordinated interplay account proposed by Knoeferle and Crocker (2006, 2007) defines three phases in visually situated spoken language comprehension: integrating new words, searching for referents in the visual context, and matching linguistic input with objects and actions in the visual context. These three phases may take place sequentially or overlap with one another in time. An alternative linking hypothesis, put forward by Altmann and Mirković (2009), holds that the processes of interpreting linguistic input and comprehending visual scenes are intertwined: linguistic meanings and non-linguistic information (e.g., visual information and world knowledge) are stored in one unitary system and jointly contribute to a dynamic representation of the situation. Salverda et al.'s (2011) goal-based linking hypothesis introduces a task-goal dimension into the theoretical model. That is, the goal of the task also affects language processing: visual objects directly related to the goal attract more attention, and additional tasks such as clicking or moving objects contribute to the goal structure of the task and directly influence eye movements.

The VWP has brought two major assets to the field: (i) the possibility of including a visual dimension in linguistic processing, and (ii) a fine-grained time-course measure of eye movements during real-time language comprehension. These have greatly expanded the range of experimental designs available for language research. Because the VWP relies primarily on listening tasks and does not require participants to have full literacy skills, it can be used to examine language processing in young children, second language learners, and people with specific language impairment.

Dependent variables in a VWP experiment include fixation proportions, target ratios, the latency of saccades, and so on. Factors such as areas of interest, participant groups, and experimental conditions can serve as independent variables. To exploit the fine-grained time-course data the VWP provides, it is crucial to incorporate a temporal dimension into the analytical models. While traditional analyses evaluate fixation or saccade differences between conditions within a time window (using t-tests, ANOVAs, or mixed-effects models), divergence point analysis and cluster-based permutation analysis can detect and compare the onset times of effects (Ito & Knoeferle, 2022). Growth curve analysis, in turn, models how looks to an interest area change over time (Mirman, 2008).
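The logic of a divergence point analysis can be sketched as follows. This is a simplified illustration in the spirit of the bootstrap procedure discussed by Ito and Knoeferle (2022), not the authors' actual implementation; the toy data, run-length criterion, and threshold are all illustrative choices:

```python
# Hedged sketch of a divergence point analysis: bootstrap over trials to
# estimate when fixation proportions in two conditions first reliably
# diverge. Real procedures use statistical tests per bin; here we use a
# simple mean comparison sustained over `run` consecutive bins.
import numpy as np

rng = np.random.default_rng(0)

def divergence_point(cond_a, cond_b, n_boot=200, run=3):
    """cond_a, cond_b: (n_trials, n_bins) arrays of 0/1 target fixations.
    Returns bootstrap samples of the first bin that starts a run of `run`
    consecutive bins where the mean of cond_a exceeds that of cond_b."""
    points = []
    for _ in range(n_boot):
        # Resample trials with replacement within each condition.
        a = cond_a[rng.integers(0, len(cond_a), len(cond_a))]
        b = cond_b[rng.integers(0, len(cond_b), len(cond_b))]
        diff = a.mean(axis=0) > b.mean(axis=0)
        for start in range(len(diff) - run + 1):
            if diff[start:start + run].all():
                points.append(start)
                break
    return np.array(points)

# Toy, noise-free data: in condition A listeners fixate the target from
# bin 4 onward; in condition B only from bin 8 onward.
n_trials, n_bins = 40, 12
cond_a = (np.arange(n_bins) >= 4).astype(float) * np.ones((n_trials, 1))
cond_b = (np.arange(n_bins) >= 8).astype(float) * np.ones((n_trials, 1))
points = divergence_point(cond_a, cond_b)
print(points.mean())  # → 4.0: conditions diverge from bin 4 onward
```

The distribution of bootstrapped divergence points (trivially degenerate here because the toy data are noise-free) is what allows onset times of effects to be estimated and compared across conditions.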

Studies employing the VWP have revealed that language processing is incremental and even predictive, in contrast to earlier findings supporting delayed integration of language. At the early stage of word recognition, phonological cohorts compete with the target, and listeners may use phonetic information to anticipate upcoming words. The processing of semantic information in verb-argument and classifier-noun structures, for example, is highly incremental or anticipatory. Discourse processing, including referential processing and the comprehension of coherence relations, has also been found to be immediate. In addition, the VWP has shown that syntactic and pragmatic processing accords with the constraint-based account (Trueswell et al., 1994): multiple types of information, including syntactic structures and pragmatic implicatures, constrain language processing from the earliest stages, alongside other constraints such as contextual features, visual information, and world knowledge.

The VWP is limited in that it does not yield direct measures of processing duration and therefore cannot answer questions about processing difficulty in language comprehension. Moreover, VWP experiments can present only a limited number of static objects in the visual display, which differs from the complex visual environment of natural conversation. In such settings, listeners may anticipate linguistic input and look strategically at certain objects (Henderson & Ferreira, 2004; but see Dahan & Tanenhaus, 2004, for a counter-argument).

Developments of the VWP are driven by both theoretical and technological advances. Future studies should investigate the role of task goals in real-time language processing situated in visual contexts. Technological innovations such as virtual reality (VR) create comparatively natural communication scenarios while maintaining precise experimental control, greatly improving the ecological validity of eye-tracking experiments that use the VWP.

Key words: visual world paradigm, eye-tracking, spoken language processing
