ISSN 1671-3710
CN 11-4766/R
Sponsor: Institute of Psychology, Chinese Academy of Sciences
Publisher: Science Press

Advances in Psychological Science ›› 2024, Vol. 32 ›› Issue (9): 1488-1501. doi: 10.3724/SP.J.1042.2024.01488

• Research Frontiers •

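Does hearing “牛黄” (niu2 huang2) bring “黄牛” (huang2 niu2) to mind? The mechanism of phonetic position encoding in spoken word recognition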

韩海宾 (HAN Haibin)1, 李兴珊 (LI Xingshan)2,3   

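  1. 1College of Education, Hebei Normal University, Shijiazhuang 050024;
    2Institute of Psychology, Chinese Academy of Sciences, Beijing 100101;
    3Department of Psychology, University of Chinese Academy of Sciences, Beijing 100049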
  • Received: 2023-10-19 Published: 2024-09-15 Released online: 2024-06-26
  • Corresponding author: LI Xingshan, E-mail: lixs@psych.ac.cn
  • Funding:
    * Supported by the Hebei Provincial Social Science Fund project (HB22JY041)

The mechanism of phonetic position encoding in spoken word recognition

HAN Haibin1, LI Xingshan2,3   

  1. 1College of Education, Hebei Normal University, Shijiazhuang 050024, China;
    2CAS Key Laboratory of Behavioral Science, Institute of Psychology, Chinese Academy of Sciences, Beijing 100101, China;
    3Department of Psychology, University of Chinese Academy of Sciences, Beijing 100049, China
  • Received: 2023-10-19 Online: 2024-06-26 Published: 2024-09-15

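Summary: Many languages contain words that remain valid words after their phonetic units are transposed; a typical example in Chinese is “牛黄” (niu2 huang2) and “黄牛” (huang2 niu2). Clarifying how such transposable words are encoded during language comprehension is an important research question. In reading research, the position encoding of words has been widely discussed, but for spoken language processing the cognitive mechanism of phonetic position encoding remains contested between sequential and flexible encoding accounts: early theories of spoken word recognition held that phonetic positions are encoded mainly sequentially, whereas recent studies have found predominantly flexible position encoding at the phoneme, syllable, and sentence levels. Future research should examine the cognitive mechanisms and neural mechanisms of phonetic encoding in spoken word recognition, as well as related questions in language acquisition and artificial intelligence. Because Chinese-character words are distinctive in their form-sound correspondence and in their units of phonological processing, their phonetic position encoding deserves particular attention.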

Keywords: spoken word recognition, phonetic position encoding, Chinese-character words

Abstract: Many languages contain sets of words that still form valid words when their phonetic components are transposed. Typical examples are “牛黄/niu2 huang2/” and “黄牛/huang2 niu2/” in Chinese, and “bus” and “sub” in English. How such transposable words are processed during language comprehension has become an important research topic. In reading research, the mechanisms of position encoding in word recognition have been widely discussed. For spoken word recognition, however, the cognitive mechanisms of phonetic position encoding remain controversial, with sequential and flexible encoding accounts in competition.
Early theories posited that phonetic position encoding is primarily sequential. These models assume that words are represented as sequences of phonemes and are activated through linear, position-by-position matching as the spoken word unfolds in time. The COHORT model proposes that word activation follows an all-or-none rule, in which only words that match the input at onset compete for activation. Later models, such as TRACE, NAM, and Shortlist, depart from this all-or-none principle and propose that word activation and recognition stem from the linear matching of the incoming speech signal with a word’s phonemic segments. These slot-based models postulate that the phonetic information of a word is held in fixed ‘slots’, and that word activation depends on the degree of match between the input and each slot’s phonetic and positional features.
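The two sequential ideas described above can be illustrated with a minimal sketch. It is a toy under stated assumptions, not an implementation of COHORT, TRACE, NAM, or Shortlist: the lexicon, the phoneme labels, and the function names `cohort_candidates` and `slot_match` are all hypothetical and chosen only for illustration.

```python
# Toy lexicon (assumption): phoneme labels are informal, not a formal transcription.
LEXICON = {
    "bus": ["b", "uh", "s"],
    "sub": ["s", "uh", "b"],
    "but": ["b", "uh", "t"],
    "sun": ["s", "uh", "n"],
}


def cohort_candidates(heard, lexicon):
    """All-or-none onset rule: keep only words whose first phonemes equal the input so far."""
    n = len(heard)
    return [word for word, phones in lexicon.items() if phones[:n] == heard]


def slot_match(heard, phones):
    """Graded slot-based score: share of positions holding the same phoneme in the same slot."""
    hits = sum(h == p for h, p in zip(heard, phones))
    return hits / max(len(heard), len(phones))


if __name__ == "__main__":
    # After hearing only /b/, onset matching keeps "bus" and "but" and drops "sub".
    print(cohort_candidates(["b"], LEXICON))
    # With the full input "bus", slot matching scores "but" above "sub",
    # because "sub" shares only the middle slot with the input.
    for word, phones in LEXICON.items():
        print(word, round(slot_match(LEXICON["bus"], phones), 2))
```

Under either rule, a transposed word such as “sub” receives little or no support from the input “bus”, which is the prediction that the flexible-encoding findings challenge.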
In reading and visual word recognition, by contrast, researchers have found that word encoding may be coarse-grained. While reading sentences, readers maintain some uncertainty about recently encountered words. The noisy-channel model of speech perception proposed by Gibson et al. (2013) likewise describes how language is understood amid ‘noise.’ These findings on ‘uncertainty’ raise the possibility that phonetic information is also encoded in a coarse-grained way during spoken word recognition. Indeed, recent studies have demonstrated flexible phonetic position encoding at the phoneme, syllable, and sentence levels. For instance, researchers have found mutual activation between words whose phoneme sequences are reversals of each other, such as “sub” and “bus,” or “/byt/” and “/tyb/,” demonstrating a transposed-phoneme effect in spoken word recognition. This effect suggests that phonemes are encoded in a coarse-grained, flexible manner that is independent of positional information.
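The contrast between slot-based and position-independent encoding can be sketched as follows. The fully position-free score `bag_match` is an assumption made for illustration (an extreme case of flexible encoding), not a model taken from the studies summarized above; the phoneme labels and function names are likewise hypothetical.

```python
from collections import Counter


def slot_match(heard, phones):
    """Position-specific score: a phoneme counts only if it sits in the same slot."""
    hits = sum(h == p for h, p in zip(heard, phones))
    return hits / max(len(heard), len(phones))


def bag_match(heard, phones):
    """Position-independent score: a phoneme counts wherever it occurs in the word."""
    shared = sum((Counter(heard) & Counter(phones)).values())
    return shared / max(len(heard), len(phones))


# Informal phoneme labels (assumption), not a formal transcription.
BUS = ["b", "uh", "s"]
SUB = ["s", "uh", "b"]
BUT = ["b", "uh", "t"]

if __name__ == "__main__":
    # Input "bus" against the transposed word "sub":
    print(round(slot_match(BUS, SUB), 2))  # 0.33: only the middle slot matches
    print(round(bag_match(BUS, SUB), 2))   # 1.0: all three phonemes are shared
    # Input "bus" against the substitution neighbor "but":
    print(round(slot_match(BUS, BUT), 2))  # 0.67
    print(round(bag_match(BUS, BUT), 2))   # 0.67
```

Only the position-independent score gives the transposed word more support than a substitution neighbor, which is the pattern a transposed-phoneme effect would require.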
However, existing research has focused mostly on alphabetic writing systems, so the generality of these findings remains unclear. Studying logographic languages such as Chinese can provide broader evidence for this flexible encoding mechanism. Chinese characters, as ideographic symbols, have several distinctive features worth investigating. First, spelling and sound are dissociated in Chinese, which makes it possible to find word pairs with entirely different written forms and meanings whose phonetic sequences are transpositions of each other, such as “冰锥/bing1 zhui1/” (ice pick) and “追兵/zhui1 bing1/” (pursuing soldiers). This property allows phonetic encoding to be examined independently of visual form, offering more precise insights into phonetic position encoding in spoken word recognition. Second, the phonetic processing units of the Chinese lexicon are distinctive. Unlike alphabetic languages, Chinese may be processed at the syllable level, with each character corresponding to one syllable. Finally, the relationship among syllables, morphemes, and semantics in Chinese spoken word recognition differs from that in alphabetic scripts, which underscores the importance of investigating phonetic position encoding in logographic languages.
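Why the choice of processing unit matters for Chinese can be shown with a small sketch. It assumes a hypothetical helper `is_transposition` and a rough pinyin-based segmentation into smaller units; both are illustrative assumptions rather than anything specified in the article.

```python
def is_transposition(units_a, units_b):
    """True when b holds the same units as a in reversed order and differs from a."""
    return units_a != units_b and units_a == units_b[::-1]


# Syllable-level units, written as toned pinyin (taken from the examples in the text).
BINGZHUI_SYLLABLES = ["bing1", "zhui1"]   # 冰锥, ice pick
ZHUIBING_SYLLABLES = ["zhui1", "bing1"]   # 追兵, pursuing soldiers

# Rough sub-syllabic segmentation (assumption): one label per pinyin segment, tones omitted.
BINGZHUI_SEGMENTS = ["b", "i", "ng", "zh", "u", "i"]
ZHUIBING_SEGMENTS = ["zh", "u", "i", "b", "i", "ng"]

if __name__ == "__main__":
    # At the syllable level the two words are exact transpositions of each other.
    print(is_transposition(BINGZHUI_SYLLABLES, ZHUIBING_SYLLABLES))  # True
    # At the segment level they are not a simple reversal.
    print(is_transposition(BINGZHUI_SEGMENTS, ZHUIBING_SEGMENTS))    # False
```

The same word pair counts as transposed or not depending on the unit chosen, so a test of flexible position encoding in Chinese has to state whether it operates on syllables or on smaller segments.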
In conclusion, phonetic position encoding in spoken word recognition likely adopts a flexible, position-independent approach, but more empirical evidence is needed to substantiate this hypothesis. Four questions call for further exploration: (1) whether flexible encoding is universal across languages, which can be tested by examining the phonetic encoding of Chinese characters; (2) the neural mechanisms underlying the temporal processing of phonetic positions, studied with techniques such as electroencephalography (EEG) and functional magnetic resonance imaging (fMRI); (3) how existing findings can guide language acquisition and learning across diverse populations; and (4) how insights into human speech signal processing can support the development of more advanced and comprehensive artificial intelligence, which is rapidly permeating many facets of modern life.

Key words: spoken word recognition, phonetic position encoding, Chinese character
