ISSN 1671-3710
CN 11-4766/R
Sponsored by: Institute of Psychology, Chinese Academy of Sciences
Published by: Science Press

Advances in Psychological Science ›› 2024, Vol. 32 ›› Issue (9): 1488-1501. doi: 10.3724/SP.J.1042.2024.01488

• Regular Articles •

The mechanism of phonetic position encoding in spoken word recognition

HAN Haibin1, LI Xingshan2,3   

  1. College of Education, Hebei Normal University, Shijiazhuang 050024, China;
  2. CAS Key Laboratory of Behavioral Science, Institute of Psychology, Chinese Academy of Sciences, Beijing 100101, China;
  3. Department of Psychology, University of Chinese Academy of Sciences, Beijing 100049, China
  • Received: 2023-10-19  Online: 2024-09-15  Published: 2024-06-26

Abstract: Many languages contain word pairs in which transposing the phonetic components of one word yields another valid word. Typical examples are the Chinese pair “牛黄 /niu2 huang2/” and “黄牛 /huang2 niu2/,” and the English pair “bus” and “sub.” How such transposable words are processed during language comprehension has become an important research topic. Within the field of reading, scholars have long debated the mechanisms of word position encoding; likewise, the cognitive mechanisms governing phonetic position encoding in spoken word recognition remain controversial.
Early theories posited that phonetic position encoding is primarily sequential. On this view, words are represented as sequences of phonemes and are activated through linear positional matching as the spoken word unfolds in time, as in the COHORT and TRACE models. The COHORT model holds that word activation follows an all-or-none rule: only words that match the input at onset enter the competitor set. Later models relax this principle, proposing instead that word activation and recognition arise from graded linear matching between the input speech signal and stored phonemic segments, as in the TRACE, NAM, and Shortlist models. These slot-based models postulate that a word’s phonetic information is stored in fixed ‘slots’, and that word activation depends on how well the input matches each slot’s phonetic and positional features.
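The contrast between the COHORT model’s all-or-none onset rule and slot-based graded matching can be illustrated with a toy sketch (not part of the article; the lexicon and scoring functions below are hypothetical simplifications, with phonemes stood in for by letters):

```python
def cohort_candidates(heard_so_far, lexicon):
    """COHORT-style all-or-none rule: only words whose onset matches
    the input heard so far remain in the competitor set."""
    return [w for w in lexicon if w.startswith(heard_so_far)]

def slot_match(word, input_phonemes):
    """Slot-based graded score: proportion of phonemes that match
    in both identity and position."""
    pairs = zip(word, input_phonemes)
    return sum(a == b for a, b in pairs) / max(len(word), len(input_phonemes))

lexicon = ["bus", "bud", "sub", "bun"]
print(cohort_candidates("bu", lexicon))  # ['bus', 'bud', 'bun']: 'sub' is excluded at onset
print(slot_match("bus", "sub"))          # 0.333...: only the middle slot matches
```

Under strict slot matching, a fully transposed pair such as “bus”/“sub” shares only one position and so receives a low score.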
Nevertheless, in the field of reading and visual word recognition, researchers have found that words may be encoded in a coarse-grained fashion: while reading sentences, readers maintain some uncertainty about the identity of recently encountered words. The noisy-channel model of language comprehension proposed by Gibson et al. (2013) likewise describes how we understand language amid ‘noise.’ These findings on uncertainty raise the possibility that phonetic information is also encoded in a coarse-grained way during spoken word recognition. Indeed, recent studies have demonstrated flexibility of phonetic position encoding at the levels of phonemes, syllables, and sentences. For instance, researchers have observed mutual activation between phonologically reversed pairs such as “sub” and “bus,” or “/byt/” and “/tyb/,” demonstrating a transposed-phoneme effect in spoken word recognition. Such position-independent phoneme encoding suggests that phonetic encoding is coarse-grained and flexible, rather than strictly tied to positional information.
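A coarse-grained, position-independent match can be sketched in the same toy style (again a hypothetical simplification, not the article’s model): counting shared phonemes regardless of position makes a reversed pair like “bus”/“sub” a perfect match, consistent with the transposed-phoneme effect.

```python
from collections import Counter

def position_free_match(word, input_phonemes):
    """Coarse-grained score: shared phonemes regardless of position
    (a bag-of-phonemes comparison)."""
    shared = Counter(word) & Counter(input_phonemes)  # multiset intersection
    return sum(shared.values()) / max(len(word), len(input_phonemes))

print(position_free_match("bus", "sub"))  # 1.0: same phonemes, any order
```

On this scheme a transposed pair is as similar as an identical one, so mutual activation between such pairs falls out naturally.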
However, current research has focused mostly on alphabetic writing systems, so its conclusions may not generalize across writing systems. Exploring logographic languages such as Chinese can provide broader evidence for this flexible encoding mechanism. Chinese characters, as logographic symbols, exhibit several distinctive features worth investigating. Foremost is the dissociation between spelling and sound in Chinese, which makes it possible to identify word pairs with entirely different forms and meanings that nevertheless share reversed phonetic sequences, such as “冰锥 /bing1 zhui1/” (ice pick) and “追兵 /zhui1 bing1/” (pursuing soldiers). These spelling-sound correspondence rules allow phonetic encoding to be examined independently of visual form, offering more precise, fine-grained insight into phonetic position encoding in spoken word recognition. In addition, the phonetic processing units of the Chinese lexicon are distinctive: unlike alphabetic languages, Chinese phonetic processing likely operates at the syllable level, with each character corresponding to one syllable. Finally, the unique relationships among syllables, morphemes, and semantics in Chinese spoken word recognition differentiate it from alphabetic scripts, underscoring the importance of investigating phonetic position encoding in logographic languages.
In conclusion, phonetic position encoding in spoken word recognition likely adopts a flexible, position-independent approach, although further empirical evidence is needed to substantiate this hypothesis. Four key questions warrant further exploration: (1) establishing the cross-linguistic universality of flexible encoding by examining the phonetic encoding peculiarities of Chinese characters; (2) unraveling the neural mechanisms underlying temporal processing of phonetic positions using techniques such as electroencephalography (EEG) and functional magnetic resonance imaging (fMRI); (3) applying existing findings to guide language acquisition and learning across diverse populations; and (4) harnessing insights into human speech signal processing to support the development of more capable speech functionalities in artificial intelligence.

Key words: spoken word recognition, phonetic position encoding, Chinese character
