ISSN 1671-3710
CN 11-4766/R
Sponsored by: Institute of Psychology, Chinese Academy of Sciences
Published by: Science Press

Advances in Psychological Science ›› 2024, Vol. 32 ›› Issue (9): 1488-1501. doi: 10.3724/SP.J.1042.2024.01488

• Regular Articles •

The mechanism of phonetic position encoding in spoken word recognition

HAN Haibin1, LI Xingshan2,3   

  1. College of Education, Hebei Normal University, Shijiazhuang 050024, China;
  2. CAS Key Laboratory of Behavioral Science, Institute of Psychology, Chinese Academy of Sciences, Beijing 100101, China;
  3. Department of Psychology, University of Chinese Academy of Sciences, Beijing 100049, China
  • Received: 2023-10-19  Online: 2024-09-15  Published: 2024-06-26

Abstract: Many languages contain word pairs in which transposing the phonetic components of one word yields another valid word. Typical examples are the Chinese pair “牛黄 /niu2 huang2/” and “黄牛 /huang2 niu2/,” and the English pair “bus” and “sub.” How such transposable words are processed during language comprehension has become an important research topic. Within the field of reading, scholars have long debated the mechanisms of word position encoding; likewise, the cognitive mechanisms governing phonetic position encoding in spoken word recognition remain controversial.
Early theories posited that phonetic position encoding is primarily sequential. On this view, words are represented as sequences of phonemes and are activated through linear positional matching as the spoken word unfolds in time, as in the COHORT and TRACE models. The COHORT model holds that word activation follows an all-or-none rule: only words that match the input at onset enter the competitor set. Later models relax this principle, proposing instead that word activation and recognition arise from graded linear matching between the input speech signal and stored phonemic segments, as in the TRACE, NAM, and Shortlist models. These slot-based models postulate that a word’s phonetic information is stored in fixed ‘slots’, and that word activation depends on how well the input matches each slot’s phonetic and positional features.
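The contrast between the COHORT model’s all-or-none onset rule and slot-based graded matching can be illustrated with a toy sketch (not part of the article; the lexicon and scoring functions below are hypothetical simplifications, with phonemes stood in for by letters):

```python
def cohort_candidates(heard_so_far, lexicon):
    """COHORT-style all-or-none rule: only words whose onset matches
    the input heard so far remain in the competitor set."""
    return [w for w in lexicon if w.startswith(heard_so_far)]

def slot_match(word, input_phonemes):
    """Slot-based graded score: proportion of phonemes that match
    in both identity and position."""
    pairs = zip(word, input_phonemes)
    return sum(a == b for a, b in pairs) / max(len(word), len(input_phonemes))

lexicon = ["bus", "bud", "sub", "bun"]
print(cohort_candidates("bu", lexicon))  # ['bus', 'bud', 'bun']: 'sub' is excluded at onset
print(slot_match("bus", "sub"))          # 0.333...: only the middle slot matches
```

Under strict slot matching, a fully transposed pair such as “bus”/“sub” shares only one position and so receives a low score.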
Nevertheless, in the field of reading and visual word recognition, researchers have found that words may be encoded in a coarse-grained fashion: while reading sentences, readers maintain some uncertainty about the identity of recently encountered words. The noisy-channel model of language comprehension proposed by Gibson et al. (2013) likewise describes how we understand language amid ‘noise.’ These findings on uncertainty raise the possibility that phonetic information is also encoded in a coarse-grained way during spoken word recognition. Indeed, recent studies have demonstrated flexibility of phonetic position encoding at the levels of phonemes, syllables, and sentences. For instance, researchers have observed mutual activation between phonologically reversed pairs such as “sub” and “bus,” or “/byt/” and “/tyb/,” demonstrating a transposed-phoneme effect in spoken word recognition. Such position-independent phoneme encoding suggests that phonetic encoding is coarse-grained and flexible, rather than strictly tied to positional information.
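A coarse-grained, position-independent match can be sketched in the same toy style (again a hypothetical simplification, not the article’s model): counting shared phonemes regardless of position makes a reversed pair like “bus”/“sub” a perfect match, consistent with the transposed-phoneme effect.

```python
from collections import Counter

def position_free_match(word, input_phonemes):
    """Coarse-grained score: shared phonemes regardless of position
    (a bag-of-phonemes comparison)."""
    shared = Counter(word) & Counter(input_phonemes)  # multiset intersection
    return sum(shared.values()) / max(len(word), len(input_phonemes))

print(position_free_match("bus", "sub"))  # 1.0: same phonemes, any order
```

On this scheme a transposed pair is as similar as an identical one, so mutual activation between such pairs falls out naturally.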
However, current research has focused mostly on alphabetic writing systems, so its conclusions may not generalize across writing systems. Exploring logographic languages such as Chinese can provide broader evidence for this flexible encoding mechanism. Chinese characters, as logographic symbols, exhibit several distinctive features worth investigating. Foremost is the dissociation between spelling and sound in Chinese, which makes it possible to identify word pairs with entirely different forms and meanings that nevertheless share reversed phonetic sequences, such as “冰锥 /bing1 zhui1/” (ice pick) and “追兵 /zhui1 bing1/” (pursuing soldiers). These spelling-sound correspondence rules allow phonetic encoding to be examined independently of visual form, offering more precise, fine-grained insight into phonetic position encoding in spoken word recognition. In addition, the phonetic processing units of the Chinese lexicon are distinctive: unlike alphabetic languages, Chinese phonetic processing likely operates at the syllable level, with each character corresponding to one syllable. Finally, the unique relationships among syllables, morphemes, and semantics in Chinese spoken word recognition differentiate it from alphabetic scripts, underscoring the importance of investigating phonetic position encoding in logographic languages.
In conclusion, phonetic position encoding in spoken word recognition likely adopts a flexible, position-independent approach, although further empirical evidence is needed to substantiate this hypothesis. Four key questions warrant further exploration: (1) establishing the cross-linguistic universality of flexible encoding by examining the phonetic encoding peculiarities of Chinese characters; (2) unraveling the neural mechanisms underlying temporal processing of phonetic positions using techniques such as electroencephalography (EEG) and functional magnetic resonance imaging (fMRI); (3) applying existing findings to guide language acquisition and learning across diverse populations; and (4) harnessing insights into human speech signal processing to support the development of more capable speech functionalities in artificial intelligence.

Key words: spoken word recognition, phonetic position encoding, Chinese character
