心理科学进展, 2018, 26(10): 1765-1774 doi: 10.3724/SP.J.1042.2018.01765

研究前沿

口语加工中的词语切分线索

于文勃, 梁丹丹

南京师范大学文学院, 南京 210097

Word segmentation cues in the process of spoken language

YU Wenbo, LIANG Dandan

School of Chinese Language and Culture, Nanjing Normal University, Nanjing 210097, China

通讯作者: 梁丹丹, E-mail: ldd233@sina.com

收稿日期: 2017-12-27   网络出版日期: 2018-10-15

基金资助: 江苏高校优势学科建设工程资助项目(PAPD)

Received: 2017-12-27   Online: 2018-10-15

摘要

词是语言的基本结构单位, 对词语进行切分是语言加工的重要步骤。口语语流中的切分线索来自于语音、语义和语法三个方面。语音线索包括概率信息、音位配列规则和韵律信息, 韵律信息中还包括词重音、时长和音高等内容, 这些线索的使用在接触语言的早期阶段就逐渐被个体所掌握, 而且在不同的语言背景下有一定的特异性。语法和语义线索属于较高级的线索机制, 主要作用于词语切分过程的后期。后续研究应从语言的毕生发展和语言的特异性两个方面考察口语语言加工中的词语切分线索。

关键词: 口语 ; 词语切分 ; 语音 ; 语义 ; 语法

Abstract

Words are generally considered the basic units of language processing; hence, word segmentation is a vital step in language comprehension. In speech processing, cues for word segmentation may be phonological, grammatical, or semantic. Phonological cues can be further classified as statistical, phonotactic, and prosodic, and prosodic information in turn involves stress, duration, and pitch. Phonological cues are generally acquired at the initial stage of language learning, and they differ across linguistic environments. Semantic and grammatical knowledge provide high-level cues that constrain word segmentation at a later stage. Future research should examine the developmental trajectory of segmentation cues across the lifespan and the language specificity of the word segmentation process.

Keywords: spoken language ; word segmentation ; phonology ; semantics ; grammar


本文引用格式

于文勃, 梁丹丹. (2018). 口语加工中的词语切分线索. 心理科学进展, 26(10), 1765-1774

YU Wenbo, LIANG Dandan. (2018). Word segmentation cues in the process of spoken language. Advances in Psychological Science, 26(10), 1765-1774

1 引言

语言单位包括语素、词、词组等, 其中, 词是最小的能独立运用的音义结合体, 是个体在头脑中存储的基本单位(张珊珊, 杨亦鸣, 2012)。口语语流是随时间展开的线性结构, 词与词之间没有清晰可靠的边界, 不像文本阅读中有明确的空间线索(标点符号或空格)。早期的研究往往关注语义、语法等方面的线索信息, 但是婴幼儿在习得语言初期并不具备完备的语义知识和语法体系, 那么他们是如何进行切分的呢?可以推测, 语音信息可能是重要的线索。另一方面, 随着人工智能和语音合成等新技术的发展, 从语音层面探究词语的切分线索、描绘人脑词语切分的内在过程已成为当前心理学的研究热点。本文着重介绍口语加工中词语切分的语音线索, 随后介绍语法和语义线索, 最后对未来的研究提出一些建议。

2 词语切分中的语音线索

本部分内容聚焦词语切分的语音线索, 从概率信息、音位配列规则和韵律信息三个方面梳理相关研究。

2.1 概率信息

20世纪90年代末, 研究者提出统计学习(statistical learning)的概念, 指个体自觉地运算刺激间的转换概率(transitional probability, TP)掌握统计规律的过程(Saffran, Aslin, & Newport, 1996; 唐溢等, 2015; Saffran & Kirkham, 2018), 这一认知过程也被认为是婴幼儿和成人在语流中切分词语、发现语法分类甚至是习得句法结构的重要方式(Newport, 2016)。

2.1.1 婴幼儿的研究

在口语语流中, 概率信息指单词内音节间的转换概率高于单词间音节的转换概率, 比如词组pretty boy中, 音节pre-ty间的转换概率要高于音节ty-boy间, 研究表明出生8个月的婴儿就已经具备了利用这一概率信息切分词语的能力(Aslin, Saffran, & Newport, 1998; Saffran, Aslin, et al., 1996; Saffran, Aslin, & Newport, 1996)。Saffran, Aslin等(1996)设计了4个由3个音节组成的固定单词(tupiro, golabu, bidaku, padoti), 这些单词随机相连组成无意义音节串(tupirogolabubidakupadotibidaku……)。在完整单词tupiro中, 三个音节是固定连接的, 相邻音节间的转换概率为1 (三个音节均为tupiro的内部音节, 同时出现); 但在跨越词边界的音节串rogola中, 前两个音节ro和go之间的转换概率为0.33 (单词golabu可能出现在其他任意三个单词之后), 后两个音节go和la间的转换概率为1, 因此跨界音节对之间较低的转换概率意味着此处可能是词语边界。所有无意义音节串均没有重音、停顿等线索, 只在转换概率上有所区分。实验分为学习阶段和测试阶段:学习阶段让婴儿听2分钟的无意义音节串; 测试1发现婴儿对学习过的完整单词tupiro注视时间短, 对没学习过的单词tilado注视时间长; 测试2发现婴儿对学习过的完整单词tupiro注视时间短, 对学习过的跨界单词rogola注视时间长。研究者认为这种去习惯化效应是因为婴儿以转换概率的高低划分词语边界, 对高转换概率的单词更为熟悉, 注视时间随之减少。
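上述转换概率的计算过程可以用一段简短的代码示意。以下 Python 片段按照 TP(A→B) = 频次(AB)/频次(A) 的定义统计相邻音节对的转换概率, 并在低于阈值处假设词边界; 材料仿照 Saffran, Aslin 等(1996)的四个人工词, 但阈值 0.5 和具体实现均为演示而设, 并非原研究的做法。

```python
# 转换概率(TP)切分的最小示意(假设性示例, 阈值与实现细节均非原研究设定)
import random
from collections import Counter

def transitional_probabilities(syllables):
    """TP(A→B) = 频次(A B) / 频次(A), 对相邻音节对逐一计算。"""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    first_counts = Counter(syllables[:-1])
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}

def segment(syllables, threshold=0.5):
    """在 TP 低于阈值的音节对之间假设词边界, 返回切分出的"词"。"""
    tps = transitional_probabilities(syllables)
    words, word = [], [syllables[0]]
    for a, b in zip(syllables, syllables[1:]):
        if tps[(a, b)] < threshold:   # 低转换概率 → 词边界
            words.append(word)
            word = []
        word.append(b)
    words.append(word)
    return words

# 仿照 Saffran, Aslin 等(1996): 四个三音节人工词随机串联成连续音节流
random.seed(1)
lexicon = [["tu", "pi", "ro"], ["go", "la", "bu"],
           ["bi", "da", "ku"], ["pa", "do", "ti"]]
stream = [s for _ in range(200) for s in random.choice(lexicon)]
recovered = segment(stream)
print(all(w in lexicon for w in recovered))  # 词内 TP=1, 跨界 TP≈0.25, 可恢复词表
```

在这一玩具语料中, 词内音节对的 TP 恒为 1, 跨界音节对的 TP 约为 0.25, 因此仅凭分布信息即可恢复全部人工词, 这正是统计学习实验所依赖的材料特性。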

概率信息在词语切分中的作用也受到一些质疑, Estes (2012)认为大多数统计学习研究都是实验室研究, 而且实验材料为人工语法词, 这一学习机制是否能推广到自然语言环境中值得商榷; 另外, 也有研究者认为婴儿识别的音节串只是根据概率信息计算出来的音节单元, 并非是具有词汇属性的真实单词(Endress & Mehler, 2009; Perruchet & Poulin-Charronnat, 2012)。一些研究者通过实验在一定程度上反驳了以上质疑, 比如Lew-Williams, Pelucchi和Saffran (2011)以意大利语为实验材料, 发现8~10个月的婴儿可以利用转换概率和词汇呈现形式来切分词语; Erickson, Thiessen和Estes (2014)发现8个月大的婴儿只会将高转换概率的音节串作为标签来对物体分类, 婴儿的这种分类能力被认为是基于真实词汇的, 因此研究者推断婴儿通过概率信息切分出来的音节串也具备一定的词汇属性。

2.1.2 成人的研究

相比于婴幼儿的研究, 成人的研究中更容易控制额外变量, 有助于深入分析概率信息在词语切分中的作用。Saffran, Aslin等人(1996)的研究虽然证明婴儿可以通过音节间的概率信息切分口语语流, 但没有细致考察概率信息的载体。音节是我们直觉上最容易划分出来的最小语音单位, 一般以元音作为核心, 辅音在元音前面或后面, 共组成4种基本类型:(1)V, (2)C-V, (3)V-C, (4)C-V-C (林焘, 王理嘉, 2013), 那么概率信息的载体是元音、辅音还是整个音节, 这一问题并没有确定的答案。近年来以成人为被试的研究发现, 不同语言背景下个体对承载概率信息的语音载体有着不同的偏好(Bonatti, Peña, Nespor, & Mehler, 2005; Gómez, Mok, Ordin, Mehler, & Nespor, 2017)。Bonatti等人(2005)在经典的转换概率范式基础上, 分别在元音和辅音层面上控制音节间的概率信息, 结果发现当辅音为载体时, 法语被试能够更好地利用概率信息进行词语切分, 研究者认为这是因为在印欧语系中辅音对单词识别的作用大于元音。Gómez等人(2017)以粤语母语者为被试, 沿用了Bonatti等(2005)的实验范式, 首先在材料中保证了音节间的转换概率恒定(音节ge后接音节du或dy), 然后分别改变元音间的概率信息(含元音/u/的音节后接含元音/e/的音节的概率为0.75, 接含元音/i/的音节的概率为0.25)和辅音间的概率信息(含辅音/h/的音节后接含辅音/b/的音节的概率为0.75, 接含辅音/g/的音节的概率为0.25), 结果发现, 相比于辅音条件, 粤语母语者在元音条件下能更好地利用概率信息切分词语。不同于大部分印欧语系语言, 以汉语普通话、粤语和越南语等为代表的汉藏语系语言具有声调这一超音段特征, Gómez等人(2017)还发现, 随着声调信息的加入, 粤语被试对词语切分的准确率进一步提高。可见, 虽然利用概率信息切分词语是人类普遍的能力, 但在不同语言背景中表现形式并不相同。

2.2 音位配列规则

每种语言都有自己的音位配列规则(phonotactics), 符合配列规则的音位搭配出现频率高, 不符合的出现频率低甚至为0, 比如在英语中/ŋk/就是高频辅音搭配(如think), 而/mk/这样的搭配几乎不会出现在同一音节内。当个体在语流中识别到不可能同处于一个音节的两个音位时, 会倾向认为二者之间存在音节边界, 而如果前后两个音节分别是单音节词, 那么在切分音节的同时就完成了词语的切分(McQueen, 1998; Suomi, McQueen, & Cutler, 1997; Tremblay & Spinelli, 2013)。在荷兰语的研究中, McQueen (1998)采用词语指认范式, 要求被试在听到无意义双音节中的真词时迅速报告, 比如在无意义双音节词pil.vrem和pilv.rem中, 真词音节均为pil, 但是前者辅音/l/和/v/分别处在两个音节中, 后者辅音/l/和/v/处在同一个音节内。结果发现, 被试在第一种条件下报告真词的反应时更短, 准确率更高。研究者指出, 荷兰语中辅音/l/和/v/不能处于同一音节内, 这与第一种条件刺激的发音方式相匹配, 被试在听到双音节词时更容易判断两个音位之间有音节边界, 进而完成了对真词的切分。
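依据音位配列规则假设边界的做法可以用一段简单代码示意。下面的 Python 片段在检测到"不可能同属一个音节"的音位对时插入边界; 其中的非法组合集合仅为演示而虚构, 并非任何语言的完整音系规则。

```python
# 音位配列线索的假设性示意: 当相邻两个音位不可能同属一个音节时, 在其间假设音节边界。
# 注意: ILLEGAL_WITHIN_SYLLABLE 是为演示虚构的小集合, 并非真实语言的完整规则。
ILLEGAL_WITHIN_SYLLABLE = {("l", "v"), ("m", "k")}

def phonotactic_boundaries(phonemes):
    """返回应插入音节边界的位置索引(边界落在该索引之前)。"""
    return [i + 1
            for i, pair in enumerate(zip(phonemes, phonemes[1:]))
            if pair in ILLEGAL_WITHIN_SYLLABLE]

# 对应正文中荷兰语 pil.vrem 的例子: /l/ 与 /v/ 不能同属一个音节
print(phonotactic_boundaries(["p", "i", "l", "v", "r", "e", "m"]))  # → [3]
```

若边界两侧恰好各是一个单音节词(如 pil), 音节切分在此同时完成了词语切分, 这正是正文所述机制的计算含义。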

紧张性和松弛性是普遍存在的音位对立特征, 既可以表现在元音上, 也可以表现在辅音上, 紧元音(tense vowel)听起来强而长, 松元音(lax vowel)听起来短而弱(王理嘉, 1991)。在英语中, 紧元音(如/u:/和/i:/)可以作为词尾音, 而松元音(如/ɪ/和/ʌ/)不可以。Skoruppa, Nevins, Gillard和Rosen (2015)发现在语音片段this is a /naɪzʌteɪ/中, 个体倾向将其切分成this is a /naɪ/ /zʌteɪ/而不是this is a /naɪzʌ/ /teɪ/, 因为松元音/ʌ/不能出现在词尾, 这说明元音的松紧性提供了必要的切分线索。音位配列规则可以看作是音位间、音节间概率信息的延伸, 暴露在语言环境下的个体可以通过它们之间的概率信息掌握音节与音节之间、词与词之间的边界, 进而内化为语音规则, 而无需特定的习得过程。

2.3 韵律信息

语言的语音结构由音段结构和超音段结构两部分组成(何善芬, 1989), 音段结构就是上文提到的音节, 也指其内部的元音和辅音, 词语切分中的概率信息和音位配列规则主要作用在音段结构上; 超音段特征包括音高、强度以及时间特性, 由音位或音位群负载(杨玉芳, 黄贤军, 高路, 2006), 相关的研究表明, 多种超音段信息也可以作为线索帮助个体切分口语语流。在韵律音系学中, 韵律特征(语调、时域分布和重音)主要通过超音段特征实现, 因此本部分所介绍的超音段信息也可以被称作韵律信息。

2.3.1 词重音

一段语流中各音节声音响亮程度并不完全相等, 在语流中听起来比其他音节突显的音节称为重音音节。重音可以分成词汇层面的词重音和句子层面的句重音或重读。词重音有词汇属性, 具有语法和词汇意义, 起到辨义作用, 而句重音彰显话语组织的突出焦点, 具有语用功能(何善芬, 1989; 许希明, 沈家煊, 2016)。Hyman (2009)将世界语言划分为重音语言和声调语言, 前者以英语为代表, 带有词层面的节律特征, 后者以汉语普通话为代表, 带有词层面的音高特征。相关的研究表明, 以重音语言为母语的个体能够利用词重音作为线索切分语流。

英语是自由重音语言, 单音节词不存在重音分配的问题, 多音节词的重音位置不固定, 虽然大多数单词词重音位于第一音节(如melody, polish, favorite), 但也可能位于其他音节上(如begin, anecdotal)。Cutler和Carter (1987)通过语料库调查发现, 在英语的实义词中, 强音节开头的单词数量是弱音节开头的三倍, 而且前者出现的频率也是后者的两倍, 因此他们推断英语母语者会通过词重音确定词语的起始位置。Cutler和Norris (1988)设计了两类无意义音节:mintayve和mintesh, 前者由两个完整元音音节组成, 记为SS (强强)音节, 后者由一个完整元音音节和一个弱化元音音节组成, 记为SW (强弱)音节, 实验要求被试在听无意义音节的同时检测真词(如mint)的出现, 结果发现被试对SS音节中真词的反应时间显著长于SW音节。这可能是因为音节mint和tayve均是重音音节, 二者会竞争辅音/t/, 进而干扰对真词mint的识别, 而SW音节中不存在这种竞争关系。婴儿的研究也证实了词重音对切分词语的线索作用, Jusczyk, Houston和Newsome (1999)采用转头偏好范式, 考察7.5个月婴儿的音节识别能力, 结果发现他们对符合英语词重音模式(重音位于第一音节)的双音节单词有偏好, 而对于不符合词重音模式的单词没有偏好。

虽然词重音可以作为英语词语切分的线索, 但是这一线索并非具有跨语言的普遍性。法语词重音形式与英语不同, 所有词重音均在词末音节上(林焘, 王理嘉, 2013), 属于固定重音语言, 研究发现法语母语者并非通过重音而是通过音节的完整性来切分词语(Mehler, Dommergues, Frauenfelder, & Segui, 1981); 而在同样是重音语言的西班牙语中, 母语者在切分词语过程中会结合音节的数量和重音两方面线索(LaCross et al., 2016)。

2.3.2 音高和时长信息

韵律结构普遍存在于所有语言中, 每一个韵律结构都会存在韵律边界, 通常伴随语段末音段延长、无声段以及相对较大的音高移动(李卫君, 杨玉芳, 2010)。研究指出这些音高和时长变化在语音歧义词的切分过程中起着消解歧义的作用(Christophe, Peperkamp, Pallier, Block, & Mehler, 2004; Gout, Christophe, & Morgan, 2004; Shatzman & McQueen, 2006)。在Christophe等人(2004)以法语为材料的实验中, 目标词可以和后面单词的首音节(歧义音节)组成合乎语义的竞争词, 但是目标词(chat)和歧义音节(gri)或者处在韵律短语内部(如[d’un chat grincheux]), 或者处在韵律短语边界处(如[le gros chat] [grimpait…])。他们发现被试对目标词的反应情况受到韵律边界的调节, 如果目标词和歧义音节分属于不同的韵律短语, 那么韵律边界有助于切分二者, 避免形成竞争词干扰目标词的识别。

韵律边界对词语的切分体现在音高和时长两方面信息的共同作用上, 那么两者中单独一个因素是否也能够起到切分词语的作用呢?Shatzman和McQueen (2006)采用跨通道语义启动范式, 考察荷兰语句子Ze heeft wel eens pot gezegd中辅音/s/的时长对歧义词组(eens pot和een spot)切分的影响。结果发现, 当辅音/s/持续时间较短时, 被试更早地注视目标词(pot)对应的图片, 这是因为位于词尾(eens)的辅音/s/的时长要短于位于词首(spot)的情况, 因此被试将较短的/s/切分为前一个单词的词尾, 进而对目标词(pot)加工更快。除此之外, 关于抑扬-扬抑规律(Iambic/Trochaic Law, ITL)的研究也提供了音高和时长信息如何在词语切分中起线索作用的证据(Frost, Monaghan, & Tatsumi, 2017; Langus et al., 2016)。早在一百多年前, 研究者就发现个体具有根据强度、时长和音高等声学特征将声音序列进行组块化的倾向(Bolton, 1894; Woodrow, 1909)。Hayes (1995)提出节奏感知的抑扬-扬抑规律:在强度参数上, 个体对节奏感知有强弱形式的扬抑偏好(后续研究发现音高参数与强度参数规律相同); 在时长参数上, 个体对节奏有短长形式的抑扬偏好; 作者进一步指出这一规律不仅仅是语言的结构形式, 也是个体组织、切分语言的方式。近年来的实证研究将焦点放在抑扬-扬抑规律对词语切分作用的跨语言特性上。Langus等人(2016)以意大利语、土耳其语和波斯语母语者为被试, 以重复出现、顺序固定的无意义音节为材料(pa su tu ke ma vi bu go ne du), 每隔一个音节改变音节的时长(180~400 ms)或基频F0 (180~400 Hz), 熟悉阶段要求被试认真听语音材料, 测试阶段给被试呈现音节对(如pa-su), 要求判断其是否刚刚出现过。结果发现在音高参数上, 三组被试成绩相当且正确率较高(0.7~0.8), 说明他们都以扬抑形式切分音节, 能够区分音节对pa-su和su-tu; 但是在时长参数上, 意大利语母语者判断的正确率显著高于其他两组被试, 说明只有意大利语母语者能够利用时长线索正确切分音节, 即词语切分过程受到语言经验的影响。不过, Frost等人(2017)的研究结论与此完全相反, 他们考察了日语母语者和英语母语者, 发现在时长参数下两组被试的回答正确率相当, 因此认为抑扬-扬抑规律对词语的切分效应是一般性的认知机制, 具有跨语言的普遍性。虽然两个研究采用的实验范式相同, 自变量和因变量指标也基本一致, 但在材料设置上有细微差别:前者的音节呈现顺序固定, 后者的音节呈现顺序随机变化; 而且后者的作答形式为迫选, 要求被试在两个音节对中选择更像单词的一个, 这可能是造成两个研究结果相悖的原因。总之, 关于时长、音高等声学信息在词语切分中作用的研究刚刚起步, 在研究范式和材料上都有不完善的地方, 还需要更多的研究加以对比。
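抑扬-扬抑规律所描述的两两组块倾向可以用一段代码粗略示意。以下 Python 片段依据相邻音节的音高或时长交替把音节序列两两成组; 其中的判断规则和数值均为演示而设, 并非 Langus 等人或 Frost 等人的实验实现。

```python
# 抑扬-扬抑规律(ITL)组块的假设性示意: 音高交替下偏好"高-低"(扬抑)组合,
# 时长交替下偏好"短-长"(抑扬)组合。规则与数值均为演示而设。
def group_by_itl(syllables, values, cue):
    """按 ITL 偏好把交替变化的音节序列两两组块; values 为各音节的音高或时长。"""
    start = 0
    if cue == "pitch" and values[0] < values[1]:      # 扬抑偏好: 应从高音高音节开始组块
        start = 1
    if cue == "duration" and values[0] > values[1]:   # 抑扬偏好: 应从短时长音节开始组块
        start = 1
    return [(syllables[i], syllables[i + 1])
            for i in range(start, len(syllables) - 1, 2)]

# 音高交替(高-低-高-低): 听者应把 pa-su、tu-ke 组块, 而非跨界的 su-tu
print(group_by_itl(["pa", "su", "tu", "ke"], [400, 180, 400, 180], "pitch"))
# → [('pa', 'su'), ('tu', 'ke')]
```

同一音节序列在音高线索和时长线索下可以得到不同的组块起点, 这与正文中被试依线索类型区分 pa-su 与 su-tu 的行为模式相对应。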

2.3.3 韵律特征的规律性

上文介绍的线索信息在词语切分过程中大多作用在目标词附近, 可以看作是个体利用即时信息对口语语流进行切分, 但也有研究发现当个体对语流进行加工时, 如果前段语流的韵律特征(时长、音高)呈规律性的变化, 那么个体会以相同的变化模式切分后续语流(Brown, Dilley, & Tanenhaus, 2012; Brown, Salverda, Dilley, & Tanenhaus, 2015; Dilley, & McAuley, 2008; Dilley, Mattys, & Vinke, 2010)。

Dilley和McAuley (2008)设计了一系列由8个音节组成的单词串(如skirmish princess side kick stand still), 其中前两个单词为重音在第一音节的双音节单词, 后面4个单音节单词可以组成多种音节形式的单词(如sidekick和standstill, 或side、kickstand和still)。考察音高线索时, 将前两个单词(skirmish和princess)的基频F0设置成由高到低(270~280 Hz到170~180 Hz)或相反的变化趋势(如图1), 其中单音节条件中(图1第一行)第五个音节(side)的F0由高到低(270~280 Hz到170~180 Hz), 双音节条件中(图1第二行)第五个音节(side)的F0为低(170~180 Hz), 两种条件下最后三个音节的F0保持一致。实验任务要求被试在听到单词串后报告他们听到的最后一个单词, 结果发现在单音节条件下, 被试会按照"高低高低"的组合规律切分单词, 将kick和stand听成一个合成词kickstand, 最后报告单音节单词still; 然而在双音节条件下, 被试更多地报告双音节单词standstill。

图1


语速通常被定义为单位时间内听到的音段或音节的数量(Reinisch, 2016), 语速快意味着每个音节的时长短, 语速慢意味着每个音节的时长长, 因此, 语速可以被看作是音节时长的一种表现形式。相关的研究表明, 语速也会对目标词的切分产生影响(Baese-Berk et al., 2014; Dilley & Pitt, 2010; Morrill, Baese-Berk, Heffner, & Dilley, 2015; Morrill, Dilley, McAuley, & Pitt, 2014)。在句子Deena doesn't have any leisure or time中, 研究者通过PSOLA软件调整语句前段音节串的语速(正常语速、1.9倍正常语速和0.6倍正常语速), 结果发现当目标词or前面的单词语速较慢时, 被试倾向于报告没有听到目标词or (leisure time), 而语速较快时, 被试倾向于报告听到目标词or (leisure or time)。研究者推断在较慢的语速中, 被试期待音节的时长较长, 单词leisure和or的协同发音导致被试感知不到目标词or (Dilley & Pitt, 2010)。在跨语言的研究中, Lai和Dilley (2016)采用相同的实验范式, 发现在汉语语句"我们想摆一盆花在窗台上"中, 音节yi1 (一)的识别也受到语速的影响; 而且即使过滤掉语义信息只保留基频信息, 目标词远端的韵律特征依旧可以影响词语切分的结果(Dilley et al., 2010)。

口语词语切分过程中, 语音范畴提供的线索大致可以分为音段线索和韵律线索两类, 虽然线索载体和作用方式都不同, 但是两类线索均是个体在接触语言早期就能够习得的, 尤其是已有研究证实个体在1岁以内就可以利用概率信息和词重音信息切分词语; 尽管抑扬-扬抑规律对词语切分的线索作用只得到成人研究的支持, 但是相关研究已经表明个体在出生伊始就具备抑扬或扬抑偏好(Abboub, Nazzi, & Gervain, 2016), 可以设想婴幼儿在切分词语时会综合使用音段和韵律线索。另外, 语音切分是语音合成的逆向过程, 现有的研究成果可以为增强合成语音的表现力和自然度提供帮助(李勇, 魏珰, 王柳渝, 2017)。

3 词语切分中的语法和语义线索

相比于语音线索, 语法和语义层面的信息对词语切分影响的研究较少, 而且考虑到这二者均是较高级的语言知识, 因此研究对象主要是成人。

3.1 语法线索

近年来, 一些研究者试图从脑神经活动的角度揭示个体切分词语的内在机制。Ding, Melloni, Zhang, Tian和Poeppel (2016)采用脑磁图技术, 向被试呈现没有韵律信息且均由单音节单词组成的中英文句子(如"历史不会重演"和white cars need gas), 结果发现当被试听到符合语法结构的词组("历史", white cars)或者句子("历史不会重演")时, 大脑皮层会出现较明显的电磁频谱反应, 研究者将其称为大脑皮层的"神经锁定" (neural entrainment)现象, 据此他们推断个体能够以语法知识切分语流, 并建构语义表征。

3.2 语义线索

索绪尔(De Saussure & Baskin, 1916)在《普通语言学教程》一书中指出要通过音节的意义对语流进行切分, 从而保证被切分的音节有对应的实体, 比如, 法语音节串sižlaprà只能切分成si-ž-la-prà (如果我拿走它)和si-ž-l-aprà (如果我掌握它)两种。虽然索绪尔的设想较为粗糙, 没有得到实验证据的支持, 但随着语言知识的增长, 个体对词语的切分必然受到语义的限制。Norris, McQueen和Cutler (1995)提出口语词语切分的可能词限制原则(Possible-Word Constraint, PWC), 认为在口语词语加工中个体头脑中的候选词语必须能够解释语流中的所有音位, 只有这样才能完成词语识别, 进而完成词语切分。在后续研究中, Norris, McQueen, Cutler和Butterfield (1997)要求被试在听到音节串的同时识别真词, 结果发现在音节串fapple中对真词apple的识别比在音节串vuffapple中更加困难, 这是因为音位/f/无法单独构成一个可能的单词, 不利于切分音节串, 而vuff构成单词的可能性较大, 有利于切分音节串。
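可能词限制的核心判断可以用一段极简代码示意。以下 Python 片段把"切分后残片是否含元音"作为能否构成可能词的近似标准; 以拼写字母近似音位仅为演示, 并非原模型的实现。

```python
# 可能词限制(PWC)的假设性示意: 若某一切分使剩余残片不含元音(如单个辅音 f),
# 则该候选切分被惩罚。此处以拼写字母近似音位, 仅作演示。
VOWELS = set("aeiou")

def violates_pwc(residue):
    """残片中不含任何元音时违反 PWC, 即不能构成一个可能的词。"""
    return not any(ch in VOWELS for ch in residue)

# 对应正文: fapple 中切出 apple 留下残片 f, 违反 PWC; vuffapple 留下 vuff 则不违反
print(violates_pwc("f"), violates_pwc("vuff"))  # → True False
```

这解释了为何被试在 vuffapple 中识别 apple 更容易:候选切分留下的 vuff 仍可能是一个词, 而单独的 /f/ 不可能。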

近年来, 视听跨通道词语启动范式被普遍应用于词语切分的研究中(White, Mattys, & Wiget, 2012)。White等人(2012)设计了强语义关联和弱语义关联的词组(oil tanker和seal tanker), 并从模拟对话中切分出真实词组作为实验材料。实验过程中, 首先呈现听觉词组作为启动刺激, 随后呈现视觉目标词, 要求被试判断目标词是否为真实单词, 视觉目标词包括三种情况:与探测词组的结尾词相同、与探测词组无关和非词。结果发现在强语义关联的探测条件下, 被试对与探测词组结尾相同的视觉单词判断更快, 这说明强语义探测刺激具有启动效应, 加快了对目标词tanker的切分。

4 多种线索的交互作用

在实际的口语加工中, 切分词语是个极其复杂的过程, 受到多种线索的协同(竞争)作用, 一些研究考察了韵律特征(重音)、音位规则、语义和语法等线索的相对权重(Babineau, Shi, & Achim, 2017; Heffner, Dilley, McAuley, & Pitt, 2013; Mattys, 2004)。在法语连音(French liaison)的研究中, Babineau等(2017)发现句法规则对连音的切分作用最大, 而语音线索只起辅助作用; 另一方面, 环境背景和被试的策略也会影响词语切分过程(Mattys, White, & Melhorn, 2005; Morrill et al., 2015)。Mattys等人(2005)的研究结果发现, 在安静环境下语义的影响权重最大, 随后是词汇信息和音位规则, 而重音的线索作用最小, 但在噪声环境中韵律特征等低层级线索的作用变大。

通过行为学实验可以判断个体对不同线索的依赖程度, 但是不能探究个体利用多种线索的时间进程。大量事件相关电位的研究证实, 在词语切分过程中语音线索的作用是即时的, 不受高级线索的影响, 而语义、语法线索的作用时间主要位于整合语义的最后阶段(Steinhauer, Alter, & Friederici, 1999; 张辉, 孙和涛, 顾介鑫, 2013)。在Steinhauer等(1999)以德语为材料的研究中, 句子的每一个韵律短语边界都会引起被试顶叶脑区的活动, 出现中止正漂移的脑电成分(closure positive shift, CPS), 而如果韵律线索切分的句子结构与句法结构矛盾, 还会出现一个双相的N400-P600成分(biphasic N400-P600)。张辉等人(2013)以相同的实验范式考察汉语母语者对四字成语材料的切分情况, 实验过程中向被试呈现两种朗读模式的成语(2+2, 1+3), 其中每种朗读模式中一半是符合成语句法结构的(废寝/忘食, 狐/假虎威), 一半是不符合的(恩重/如山, 双/喜临门)。结果发现韵律节奏主效应显著, 无论材料是否符合句法模式, 只要以"1+3"节奏朗读, 都会激发被试双相的N400-P600成分, 而以"2+2"节奏朗读则不会出现此成分。这说明个体在运用韵律信息切分语流时有一定的独立性, 切分早期并不受到语义、语法等高级线索的干扰。

5 小结与展望

本文系统地梳理了语音、语法和语义线索对口语词语切分的作用, 近年来的研究一方面集中在婴幼儿词语切分的线索机制上, 另一方面聚焦于成人是如何综合利用多种线索进行词语切分的, 笔者认为目前的研究仍有不足之处, 可以从以下两个方面丰富、扩展。

5.1 从毕生发展角度考察词语切分线索

语言发展会历经个体从出生到死亡的所有阶段, 目前的研究主要集中在成人口语词语切分上, 婴幼儿的研究才刚刚起步。一直以来, 婴幼儿是如何掌握词语这一问题始终困扰着心理学家和语言学家, 除了本文提到的概率信息外是否还有其他线索呢?一些研究发现婴幼儿对词语的习得受到他们接触词语的频率(Ambridge, Kidd, Rowland, & Theakston, 2015)、时间分布、空间分布和文本环境(Roy, Frank, DeCamp, Miller, & Roy, 2015)的影响; 语料库的调查也发现婴儿所接触的语料中有9%是单个单词(single word), 其中出现频率较高的是come、go、up、down等(Ambridge & Lieven, 2011), 因此可以猜想婴儿首先掌握单个单词, 随后以此作为"据点"切分词组和句子, 进而掌握新词, 但这一假设仍需要更多实验证据的支持。除此之外, 语言加工的老化研究也是近期兴起的热点问题, 词语切分能力是否与语义加工一样存在老化现象呢?如果存在, 是单一线索使用能力下降还是多种线索使用能力共同下降呢?

5.2 从语言特异性角度考察词语切分线索

跨语言的研究已经证实, 个体在切分语言的过程中受到母语语音结构的影响(Cutler & Otake, 1994; Mehler et al., 1981; LaCross et al., 2016)。印欧语系中, 语言的一个基本单位(词)通常对应若干音节, 因此对词语的切分首先要解决的是一个单词对应几个音节的问题; 但汉语的音节结构有其独特之处。首先, 每一个音节有一个声调, 使同一个音节中的各个音位有一种向心力, 内聚为一个整体, 从而能清楚地与其他音节区别开来(徐通锵, 2001); 此外, 汉语音节以元音结尾的占多数, 辅音结尾的只有/n/和/ŋ/两种, 以元音开头的音节又极少, 这都大大降低了连读的可能性; 最后, 按照徐通锵(2010)的说法, 汉语的特点是"1个字·1个音节·1个概念", 英语中相应的结构是"1个词·n个音节·1个概念", 即汉语音节与意义是一一对应的关系, 可见在汉语中识别音节并不存在困难。不过, 汉语词汇化过程中表现出明显的双音化倾向, 冯胜利(1998)指出由于自然音步的影响, 汉语普通话母语者习惯使用双音节词, 端木三(2000)也以"小火/车"为例, 指出汉语中的韵律结构对句法结构具有一定的限制作用, 那么这种双音节倾向是否是汉语母语者在语音层面上切分语流的线索呢?这有待于进一步考证。另一方面, 汉语没有明显的语法形态, 同音字数量多, 导致汉语中存在大量的同音异构形式。比如"炒饭"这一结构, 既可以表示动宾结构的"炒饭"(做饭的动作), 也可以表示偏正结构的"炒饭"(一种食物); 再比如"小张师傅手艺很好"也存在歧义, "小张师傅"既可以指小张师傅本人, 也可以指小张的师傅。今后的研究可以发掘韵律信息在此类结构中的切分作用。

词语切分是语言加工研究的根本问题, 口语状态下的词语切分是自然交际中的关键环节, 未来应更广泛地从不同线索、不同视角、不同语言展开对这一问题的探讨, 不但可以揭示出某种具体语言中口语加工时的词语切分过程, 而且可以在此基础上得出具有普遍性的口语词语切分模型。

The authors have declared that no competing interests exist.
作者已声明无竞争性利益关系。

参考文献

端木三 . ( 2000).

汉语的节奏

当代语言学, ( 4), 203-209.

[本文引用: 1]

冯胜利 . ( 1998).

论汉语的 “自然音步”

中国语文, ( 1), 40-47.

[本文引用: 1]

何善芬 . ( 1989).

英语超音段音位及其辨义功能

外国语, ( 6), 66-69.

[本文引用: 2]

李勇, 魏珰, 王柳渝 . ( 2017).

基于PSOLA与DCT的情感语音合成方法

计算机工程, ( 12), 278-282.

[本文引用: 1]

李卫君, 杨玉芳 . ( 2010).

绝句韵律边界的认知加工及其脑电效应

心理学报, 42( 11), 1021-1032.

URL     [本文引用: 1]

Prosodic boundary, as an integral part of prosodic features in spoken language, is very important in sentence and discourse comprehension. Recently, researchers have shown increasing interest in the neural mechanism of prosodic boundary processing. Numerous studies have found that CPS, a special ERP component reflecting the closure of phonological phrase, could be induced by phonological phrase boundary embedded in a sentence. Also, researches have demonstrated that intonational phrase boundary in the end of a sentence consistently elicited P3, which reflects the operation of yntactic closure as well as the completeness of language unit. The present study aims to investigate the cognitive processing of intonational phrase boundary embedded in discourse and its related brain effect using ERP. To explore the processing of intonational phrase boundary in discourse, quatrain was used, which is composed of four sentences and five or seven characters in each sentence. Twenty (10 males) healthy undergraduates participated in the experiment. The participants were told to listen carefully to each poem, and complete a word discrimination task. Specifically, if the word "Space" was presented, they were asked to press the spacebar to continue. If other words were presented, they were required to press the "F" or "J" key to indicate whether the word appeared in the poem they had just heard. The EEG was recorded from 64 scalp channels using electrodes mounted in an elastic cap. Boundary-related ERPs were calculated for a 1500 ms epoch including a 200 ms pre-boundary syllable baseline. It was found that the three intonational phrase boundaries embedded in the quatrain elicited CPS respectively, with no differences in time course, amplitude and scalp distribution, indicating that prosodic boundary processing was not influenced by its position in discourse. 
Moreover, the final sentence of both five-character-quatrain and seven-character-quatrain evoked the P3 effect, with the amplitude elicited by the former one lower than that of the later one, but no differences in scalp distribution and onset latency. The present study suggests that boundaries conveying both the closure of preceding information and the prediction of upcoming information will induce CPS, while boundaries reflecting only the closure of preceding information will elicit P3. The nature and characteristics of these two components were also discussed in the present study.

唐溢, 张智君, 曾玫媚, 黄可, 刘炜, 赵亚军 . ( 2015).

基于名人面孔视觉特征和语义信息的视觉统计学习

心理学报, 47( 7), 837-850.

URL     [本文引用: 1]

王理嘉 . ( 1991). 音系学基础. 北京: 语文出版社.

[本文引用: 1]

林焘, 王理嘉 . ( 2013). 语音学教程. 北京: 北京大学出版社.

[本文引用: 2]

徐通锵 . ( 2001). 基础语言学教程. 北京: 北京大学出版社.

[本文引用: 1]

徐通锵 . ( 2010). 徐通锵文选. 北京: 北京大学出版社.

[本文引用: 1]

许希明, 沈家煊 . ( 2016).

英汉语重音的音系差异

外语教学与研究, 49( 5), 643-656.

[本文引用: 1]

杨玉芳, 黄贤军, 高路 . ( 2006).

韵律特征研究

心理科学进展, 14( 4), 546-550.

[本文引用: 1]

张辉, 孙和涛, 顾介鑫 . ( 2013).

成语加工中韵律与句法互动的事件相关电位研究

外国语: 上海外国语大学学报, ( 1), 22-31.

[本文引用: 2]

张珊珊, 杨亦鸣 . ( 2012).

从记忆编码加工看人脑中的基本语言单位——一项基于单音节语言单位的 ERPs 研究

外语与外语教学, ( 2), 1-6.

[本文引用: 1]

Abboub N., Nazzi T., & Gervain J . ( 2016).

Prosodic grouping at birth

Brain and Language, 162, 46-59.

URL     [本文引用: 1]

Ambridge B., Kidd E., Rowland C. F., & Theakston A. L . ( 2015).

The ubiquity of frequency effects in first language acquisition

Journal of Child Language, 42( 2), 239-273.

URL     PMID:25644408      [本文引用: 1]

Abstract This review article presents evidence for the claim that frequency effects are pervasive in children's first language acquisition, and hence constitute a phenomenon that any successful account must explain. The article is organized around four key domains of research: children's acquisition of single words, inflectional morphology, simple syntactic constructions, and more advanced constructions. In presenting this evidence, we develop five theses. (i) There exist different types of frequency effect, from effects at the level of concrete lexical strings to effects at the level of abstract cues to thematic-role assignment, as well as effects of both token and type, and absolute and relative, frequency. High-frequency forms are (ii) early acquired and (iii) prevent errors in contexts where they are the target, but also (iv) cause errors in contexts in which a competing lower-frequency form is the target. (v) Frequency effects interact with other factors (e.g. serial position, utterance length), and the patterning of these interactions is generally informative with regard to the nature of the learning mechanism. We conclude by arguing that any successful account of language acquisition, from whatever theoretical standpoint, must be frequency sensitive to the extent that it can explain the effects documented in this review, and outline some types of account that do and do not meet this criterion.

Ambridge B. , & Lieven, E. V. M.( 2011) .

Child language acquisition: Contrasting theoretical approaches Cambridge: Cambridge University Press Contrasting theoretical approaches

Cambridge: Cambridge University Press.

[本文引用: 1]

Aslin R. N., Saffran J. R., & Newport E. L . ( 1998).

Computation of conditional probability statistics by 8-month-old infants

Psychological Science, 9( 4), 321-324.

URL     [本文引用: 1]

A recent report demonstrated that 8-month-olds can segment a continuous stream of speech syllables, containing no acoustic or prosodic cues to word boundaries, into wordlike units after only 2 min of listening experience (Saffran, Aslin, & Newport, 1996). Thus, a powerful learning mechanism capable of extracting statistical information from fluent speech is available early in development. The present study extends these results by documenting the particular type of statistical computation090009transitional (conditional) probability090009used by infants to solve this word-segmentation task. An artificial language corpus, consisting of a continuous stream of trisyllabic nonsense words, was presented to 8-month-olds for 3 min. A postfamiliarization test compared the infants' responses to words versus part-words (trisyllabic sequences spanning word boundaries). The corpus was constructed so that test words and part-words were matched in frequency, but differed in their transitional probabilities. Infants showed reliable discrimination of words from part-words, thereby demonstrating rapid segmentation of continuous speech into words on the basis of transitional probabilities of syllable pairs.

Babineau M., Shi R., & Achim A . ( 2017).

Contextual factors in lexical processing: The case of French Liaison

Language, Cognition and Neuroscience, 32( 4), 457-470.

URL     [本文引用: 2]

Abstract Lower-level and higher-level processes during lexical recognition were investigated using ambiguous pseudo-noun cases related to liaisons in French. In phrases such as un onche and un nonche, the misalignment in the former liaison case produces an identical surface form as in the latter consonant-initial case, both [0400.n000006], and two possible interpretations (onche, nonche) enter into competition. Quebec-French-speaking adults performed an implicit segmentation task testing the use of different factors. Results showed a dominant effect of syntactic category, with a general bias for vowel-initial interpretation when targets followed a determiner. The use of specific liaison acoustic cues for disambiguation was found for /z/ and /n/ only in adjective context. Liaison frequency and onset probability had no clear influence. Thus, the contextual knowledge of liaison-causing words is crucial for lexical recognition. These findings are consistent with the predictions of the hierarchy proposed by Mattys, White, and Melhorn (2005).

Baese-Berk M. M., Heffner C. C., Dilley L. C., Pitt M. A., Morrill T. H., & McAuley J. D . ( 2014).

Long-term temporal tracking of speech rate affects spoken-word recognition

Psychological Science, 25( 8), 1546-1553.

URL     PMID:24907119      [本文引用: 1]

Humans unconsciously track a wide array of distributional characteristics in their sensory environment. Recent research in spoken-language processing has demonstrated that the speech rate surrounding a target region within an utterance influences which words, and how many words, listeners hear later in that utterance. On the basis of hypotheses that listeners track timing information in speech over long timescales, we investigated the possibility that the perception of words is sensitive to speech rate over such a timescale (e.g., an extended conversation). Results demonstrated that listeners tracked variation in the overall pace of speech over an extended duration (analogous to that of a conversation that listeners might have outside the lab) and that this global speech rate influenced which words listeners reported hearing. The effects of speech rate became stronger over time. Our findings are consistent with the hypothesis that neural entrainment by speech occurs on multiple timescales, some lasting more than an hour.

Bolton, T. L . ( 1894).

Rhythm

The American Journal of Psychology, 6( 2), 145-238.

URL     [本文引用: 1]

Bonatti L. L., Peña M., Nespor M., & Mehler J . ( 2005).

Linguistic constraints on statistical computations: The role of consonants and vowels in continuous speech processing

Psychological Science, 16( 6), 451-459.

[本文引用: 3]

Brown M., Dilley L. C., & Tanenhaus M. K . ( 2012, January).

Real-time expectations based on context speech rate can cause words to appear or disappear

Proceedings of the Annual Meeting of the Cognitive Science Society. Austion, TX.

[本文引用: 1]

Brown M., Salverda A. P., Dilley L. C., & Tanenhaus M. K . ( 2015).

Metrical expectations from preceding prosody influence perception of lexical stress

Journal of Experimental Psychology: Human Perception and Performance, 41( 2), 306-323.

URL     PMID:25621583      [本文引用: 1]

Abstract Two visual-world experiments tested the hypothesis that expectations based on preceding prosody influence the perception of suprasegmental cues to lexical stress. The results demonstrate that listeners' consideration of competing alternatives with different stress patterns (e.g., 'jury/gi'raffe) can be influenced by the fundamental frequency and syllable timing patterns across material preceding a target word. When preceding stressed syllables distal to the target word shared pitch and timing characteristics with the first syllable of the target word, pictures of alternatives with primary lexical stress on the first syllable (e.g., jury) initially attracted more looks than alternatives with unstressed initial syllables (e.g., giraffe). This effect was modulated when preceding unstressed syllables had pitch and timing characteristics similar to the initial syllable of the target word, with more looks to alternatives with unstressed initial syllables (e.g., giraffe) than to those with stressed initial syllables (e.g., jury). These findings suggest that expectations about the acoustic realization of upcoming speech include information about metrical organization and lexical stress and that these expectations constrain the initial interpretation of suprasegmental stress cues. These distal prosody effects implicate online probabilistic inferences about the sources of acoustic-phonetic variation during spoken-word recognition. (c) 2015 APA, all rights reserved.

Christophe A., Peperkamp S., Pallier C., Block E., & Mehler J . ( 2004).

Phonological phrase boundaries constrain lexical access I. Adult data

Journal of Memory and Language, 51( 4), 523-547.

URL     [本文引用: 2]

The location of phonological phrase boundaries was shown to affect lexical access by English-learning infants of 10 and 13 months of age. Experiments 1 and 2 used the head-turn preference procedure: infants were familiarized with two bisyllabic words, then presented with sentences that either contained the familiarized words or contained both their syllables separated by a phonological phrase boundary. Ten-month-olds did not show any listening preference, whereas 13-month-olds listened significantly longer to sentences containing the familiarized words. Experiments 3 and 4 relied on a variant of the conditioned head-turning technique. In a first session, infants were trained to turn their heads for an isolated bisyllabic word. In the second session, they were exposed to the same sentences as above. Both 10- and 12.5-month-old infants turned significantly more often when the target word truly appeared in the sentence. These results suggest that phonological phrase boundaries constrain on-line lexical access in infants.

Cole R. A., Jakimik J., & Cooper W. E . ( 1980).

Segmenting speech into words

The Journal of the Acoustical Society of America, 67( 4), 1323-1332.

URL    

Cutler A., & Carter, D. M . ( 1987).

The predominance of strong initial syllables in the English vocabulary

Computer Speech & Language, 2( 3-4), 133-142.

URL     [本文引用: 1]

Studies of human speech processing have provided evidece for a segmentation strategy in the perception of continuous speech, whereby a word boundary is postulated, and a lexical access procedure initiated, at each metrically strong syllable. The likely success of this strategy was here estimated against the characteristics of the English vocabulary. Two computerized dictionaries were found to list approximately three times as many words beginning with strong syllables (i.e. syllables containing a full vowel) as beginning with weak syllables (i.e. syllables containing a reduced vowel). Consideration of frequency of lexical word occurrence reveals that words beginning with strong syllables occur on average more often than words beginning with weak syllables. Together, these findings motivate an estimate for everyday speech recognition that approximately 85% of lexical words (i.e. excluding function words) will begin with strong syllables. This estimate was tested against a corpus of 190 000 words of spontaneous British English conversion. In this corpus, 90% of lexical words were found to begin with strong syllables. This suggests that a strategy of postulating word boundaries at the onset of strong syllables would have a high success rate in that few actual lexical word onsets would be missed.

Cutler A., & Norris, D. ( 1988).

The role of strong syllables in segmentation for lexical access

Journal of Experimental Psychology: Human Perception and Performance, 14( 1), 113-121.

URL     [本文引用: 1]

Speech recognition is the process by which meaning is derived from the acoustic signal. Speech signals are continuous. Before a recognizer can access the meaning of any word occurring in an input, it must decide where the word begins. This would he no problem if speakers provided reliable cues that marked such points in the signal. Speech researchers have so far failed to discover such cues however and much research effort in speech recognition has been devoted to the question of where to start lexical access in the absence of reliable information about where words begin. A solution adopted by most psychological models of speech recognition is to preprocess the signal and undertake some pre-lexical classification.

Cutler A., & Otake, T. ( 1994).

Mora or phoneme? Further evidence for language-specific listening

Journal of Memory and Language, 33( 6), 824-844.

URL     [本文引用: 1]

Japanese listeners detect speech sound targets which correspond precisely to a mora (a phonological unit which is the unit of rhythm in Japanese) more easily than targets which do not. English listeners detect medial vowel targets more slowly than consonants. Six phoneme detection experiments investigated these effects in both subject populations, presented with native- and foreign-language input. Japanese listeners produced faster and more accurate responses to moraic than to nonmoraic targets both in Japanese and, where possible, in English; English listeners responded differently. The detection disadvantage for medial vowels appeared with English listeners both in English and in Japanese; again, Japanese listeners responded differently. Some processing operations which listeners apply to speech input are language-specific; these language-specific procedures, appropriate for listening to input in the native language, may be applied to foreign-language input irrespective of whether they remain appropriate.

De Saussure F., , & Baskin, W. ( 1916).

Course in general linguistics. London:

Duckworth.

[本文引用: 1]

Dilley L. C., & McAuley, J. D . ( 2008).

Distal prosodic context affects word segmentation and lexical processing

Journal of Memory and Language, 59( 3), 294-311.

URL     [本文引用: 3]

Three experiments investigated the role of distal (i.e., nonlocal) prosody in word segmentation and lexical processing. In Experiment 1, prosodic characteristics of the initial five syllables of eight-syllable sequences were manipulated; the final portions of these sequences were lexically ambiguous (e.g., note bookworm, notebook worm). Distal prosodic context affected the rate with which participants heard disyllabic final words, although identical acoustic material was judged. In Experiment 2, removing four syllables of initial context reduced the magnitude of the distal prosodic effect. Experiment 3 used a study-test recognition design; better recognition was demonstrated for visually-presented disyllabic words when these items were comprised of adjacent syllables previously heard in distal prosodic contexts predicted to facilitate perceptual grouping of these two syllables. Overall, this research identifies distal prosody as a new factor in word segmentation and lexical processing and provides support for a perceptual grouping hypothesis derived from principles of auditory perceptual organization.

Dilley, L. C., Mattys, S. L., & Vinke, L. (2010). Potent prosody: Comparing the effects of distal prosody, proximal prosody, and semantic context on word segmentation. Journal of Memory and Language, 63(3), 274-294.

Dilley, L. C., & Pitt, M. A. (2010). Altering context speech rate can cause words to appear or disappear. Psychological Science, 21(11), 1664-1670.

Ding, N., Melloni, L., Zhang, H., Tian, X., & Poeppel, D. (2016). Cortical tracking of hierarchical linguistic structures in connected speech. Nature Neuroscience, 19(1), 158-164.

The most critical attribute of human language is its unbounded combinatorial nature: smaller elements can be combined into larger structures based on a grammatical system, resulting in a hierarchy of linguistic units, e.g., words, phrases, and sentences. Mentally parsing and representing such structures, however, poses challenges for speech comprehension. In speech, hierarchical linguistic structures do not have boundaries clearly defined by acoustic cues and must therefore be internally and incrementally constructed during comprehension. Here we demonstrate that during listening to connected speech, cortical activity of different time scales concurrently tracks the time course of abstract linguistic structures at different hierarchical levels, e.g. words, phrases, and sentences. Critically, the neural tracking of hierarchical linguistic structures is dissociated from the encoding of acoustic cues as well as from the predictability of incoming words. The results demonstrate that a hierarchy of neural processing timescales underlies grammar-based internal construction of hierarchical linguistic structure.

Endress, A. D., & Mehler, J. (2009). The surprising power of statistical learning: When fragment knowledge leads to false memories of unheard words. Journal of Memory and Language, 60(3), 351-367.

Estes, K. G. (2012). Infants generalize representations of statistically segmented words. Frontiers in Psychology, 3(3), 447.

The acoustic variation in language presents learners with a substantial challenge. To learn by tracking statistical regularities in speech, infants must recognize words across tokens that differ based on characteristics such as the speaker's voice, affect, or the sentence context. Previous statistical learning studies have not investigated how these types of non-phonemic surface form variation affect learning. The present experiments used tasks tailored to two distinct developmental levels to investigate the robustness of statistical learning to variation. Experiment 1 examined statistical word segmentation in 11-month-olds and found that infants can recognize statistically segmented words across a change in the speaker's voice from segmentation to testing. The direction of infants' preferences suggests that recognizing words across a voice change is more difficult than recognizing them in a consistent voice. Experiment 2 tested whether 17-month-olds can generalize the output of statistical learning across variation to support word learning. The infants were successful in their generalization; they associated referents with statistically defined words despite a change in voice from segmentation to label learning. Infants' learning patterns also indicate that they formed representations of across word syllable sequences during segmentation. Thus, low probability sequences can act as object labels in some conditions. The findings of these experiments suggest that the units that emerge during statistical learning are not perceptually constrained, but rather are robust to naturalistic acoustic variation.

Erickson, L. C., Thiessen, E. D., & Estes, K. G. (2014). Statistically coherent labels facilitate categorization in 8-month-olds. Journal of Memory and Language, 72, 49-58.

There is considerable evidence that infants can segment speech using syllable co-occurrence probabilities; however, relatively less is known about the nature of the representations formed during this process. The present studies tested the prediction that statistically segmented items should exhibit a specific property of real words, namely, these items should have a facilitative effect on infant categorization. During the segmentation phase, eight-month-old infants listened to a fluent speech stream that contained statistical word boundary cues. Infants were then tested on their ability to categorize drawings of an unfamiliar category when category exemplars were paired with either high-probability or low-probability labels from the segmentation phase. Infants who heard high-probability labels showed evidence of categorization. In contrast, infants who heard low-probability labels did not. A follow up experiment revealed that this effect was due to facilitation for high-probability words rather than inhibition for low-probability items. These results fit with theoretical accounts that suggest that infants treat statistically segmented units as potential words.

Frost, R. L. A., Monaghan, P., & Tatsumi, T. (2017). Domain-general mechanisms for speech segmentation: The role of duration information in language learning. Journal of Experimental Psychology: Human Perception and Performance, 43(3), 466-476.

Gómez, D. M., Mok, P., Ordin, M., Mehler, J., & Nespor, M. (2017). Statistical speech segmentation in tone languages: The role of lexical tones. Language and Speech, 61(1), 84-96.

Research has demonstrated distinct roles for consonants and vowels in speech processing. For example, consonants have been shown to support lexical processes, such as the segmentation of speech based on transitional probabilities (TPs), more effectively than vowels. Theory and data so far, however, have considered only non-tone languages, that is to say, languages that lack contrastive lexical tones. In the present work, we provide a first investigation of the role of consonants and vowels in statistical speech segmentation by native speakers of Cantonese, as well as assessing how tones modulate the processing of vowels. Results show that Cantonese speakers are unable to use statistical cues carried by consonants for segmentation, but they can use cues carried by vowels. This difference becomes more evident when considering tone-bearing vowels. Additional data from speakers of Russian and Mandarin suggest that the ability of Cantonese speakers to segment streams with statistical cues carried by tone-bearing vowels extends to other tone languages, but is much reduced in speakers of non-tone languages.

Gout, A., Christophe, A., & Morgan, J. L. (2004). Phonological phrase boundaries constrain lexical access II. Infant data. Journal of Memory and Language, 51(4), 548-567.

The location of phonological phrase boundaries was shown to affect lexical access by English-learning infants of 10 and 13 months of age. Experiments 1 and 2 used the head-turn preference procedure: infants were familiarized with two bisyllabic words, then presented with sentences that either contained the familiarized words or contained both their syllables separated by a phonological phrase boundary. Ten-month-olds did not show any listening preference, whereas 13-month-olds listened significantly longer to sentences containing the familiarized words. Experiments 3 and 4 relied on a variant of the conditioned head-turning technique. In a first session, infants were trained to turn their heads for an isolated bisyllabic word. In the second session, they were exposed to the same sentences as above. Both 10- and 12.5-month-old infants turned significantly more often when the target word truly appeared in the sentence. These results suggest that phonological phrase boundaries constrain on-line lexical access in infants.

Hayes, B. (1995). Metrical stress theory: Principles and case studies. Chicago: University of Chicago Press.

In this account of metrical stress theory, Bruce Hayes builds on the notion that stress constitutes linguistic rhythm: that stress patterns are rhythmically organized, and that formal structures proposed for rhythm can provide a suitable account of stress. Through an extensive typological survey of word stress rules that uncovers widespread asymmetries, he identifies a fundamental distinction between iambic and trochaic rhythm, called the "Iambic/Trochaic law," and argues that it has pervasive effects among the rules and structures responsible for stress. Hayes incorporates the iambic/trochaic opposition into a general theory of word stress assignment, intended to account for all languages in which stress is assigned on phonological as opposed to morphological principles. His theory addresses particularly problematic areas in metrical work, such as ternary stress and unusual weight distinctions, and he proposes new theoretical accounts of them. Attempting to take more seriously the claim of generative grammar to be an account of linguistic universals, Hayes proposes analyses for the stress patterns of over 150 languages. Hayes compares his own innovative views with alternatives from the literature, allowing students to gain an overview of the field. Metrical Stress Theory should interest all who seek to understand the role of stress in language.

Heffner, C. C., Dilley, L. C., McAuley, J. D., & Pitt, M. A. (2013). When cues combine: How distal and proximal acoustic cues are integrated in word segmentation. Language and Cognitive Processes, 28(9), 1275-1302.

Spoken language contains few reliable acoustic cues to word boundaries, yet listeners readily perceive words as separated in continuous speech. Dilley and Pitt (2010) showed that the rate of nonlocal (i.e., distal) context speech influences word segmentation, but present theories of word segmentation cannot account for whether and how this cue interacts with other acoustic cues proximal to (i.e., in the vicinity of) the word boundary. Four experiments examined the interaction of distal speech rate with four proximal acoustic cues that have been shown to influence segmentation: intensity (Experiment 1), fundamental frequency (Experiment 2), word duration (Experiment 3), and high frequency noise resembling a consonantal onset (Experiment 4). Participants listened to sentence fragments and indicated which of two lexical interpretations they heard, where one interpretation contained more words than the other. Across all four experiments, both distal speech rate and proximal acoustic manipulations affected the reported lexical interpretation, but the two types of cues did not consistently interact. Overall, the results of the set of experiments are inconsistent with a strictly-ranked hierarchy of cues to word boundaries, and instead highlight the necessity of word segmentation and lexical access theories to allow for flexible rankings of cues to word boundary placement.

Hyman, L. M. (2009). How (not) to do phonological typology: The case of pitch-accent. Language Sciences, 31(2-3), 213-238.

In this paper I argue for a property-driven approach to phonological typology. Rather than seeking to classify or label languages, the central goal of phonological typology is to determine how different languages systematize the phonetic substance available to all languages. The paper focuses on a very murky area in phonological typology, word-prosodic systems. While there is agreement that certain properties converge to characterize two prosodic prototypes, tone and stress, the term “pitch-accent” is frequently adopted to refer to a defective tone system whose tone is obligatory, culminative, privative, metrical, and/or restricted in distribution. Drawing from a database of ca. 600 tone systems, I show that none of these properties is found in all systems claimed to be accentual and that all five are amply attested in canonical tone systems. Since all one can say is that alleged pitch-accent systems exhibit significant constraints on the distribution of their tonal contrasts, they do not constitute a coherent prosodic “type”. Rather, alleged “pitch-accent” systems freely pick-and-choose properties from the tone and stress prototypes, producing mixed, ambiguous, and sometimes analytically indeterminate systems which appear to be “intermediate”. There thus is no pitch-accent prototype, nor can prosodic systems be treated as a continuum placed along a single linear dimension. The paper concludes that the goal of prosodic typology should not be to classify languages, but rather the properties of their subsystems.

Jusczyk, P. W., Houston, D. M., & Newsome, M. (1999). The beginnings of word segmentation in English-learning infants. Cognitive Psychology, 39(3), 159-207.

A series of 15 experiments was conducted to explore English-learning infants' capacities to segment bisyllabic words from fluent speech. The studies in Part I focused on 7.5 month olds' abilities to segment words with strong/weak stress patterns from fluent speech. The infants demonstrated an ability to detect strong/weak target words in sentential contexts. Moreover, the findings indicated that the infants were responding to the whole words and not to just their strong syllables. In Part II, a parallel series of studies was conducted examining 7.5 month olds' abilities to segment words with weak/strong stress patterns. In contrast with the results for strong/weak words, 7.5 month olds appeared to missegment weak/strong words. They demonstrated a tendency to treat strong syllables as markers of word onsets. In addition, when weak/strong words co-occurred with a particular following weak syllable (e.g., “guitar is”), 7.5 month olds appeared to misperceive these as strong/weak words (e.g., “taris”). The studies in Part III examined the abilities of 10.5 month olds to segment weak/strong words from fluent speech. These older infants were able to segment weak/strong words correctly from the various contexts in which they appeared. Overall, the findings suggest that English learners may rely heavily on stress cues when they begin to segment words from fluent speech. However, within a few months time, infants learn to integrate multiple sources of information about the likely boundaries of words in fluent speech.

LaCross, A., Liss, J., Barragan, B., Adams, A., Berisha, V., McAuliffe, M., & Fromont, R. (2016). The role of stress and word size in Spanish speech segmentation. The Journal of the Acoustical Society of America, 140(6), EL484-EL490.

In English, the predominance of stressed syllables as word onsets aids lexical segmentation in degraded listening conditions. Yet it is unlikely that these findings would readily transfer to languages with differing rhythmic structure. In the current study, the authors seek to examine whether listeners exploit both common word size (syllable number) and stress cues to aid lexical segmentation in Spanish. Forty-seven Spanish-speaking listeners transcribed two-word Spanish phrases in noise. As predicted by the statistical probabilities of Spanish, error analysis revealed that listeners preferred two- and three-syllable words with penultimate stress in their attempts to parse the degraded speech signal. These findings provide insight into the importance of stress in tandem with word size in the segmentation of Spanish words and suggest testable hypotheses for cross-linguistic studies that examine the effects of degraded acoustic cues on lexical segmentation.

Lai, W., & Dilley, L. (2016). Cross-linguistic generalization of the distal rate effect: Speech rate in context affects whether listeners hear a function word in Chinese Mandarin. In Proceedings of Speech Prosody 2016, Boston, MA.

Langus, A., Seyed-Allaei, S., Uysal, E., Pirmoradian, S., Marino, C., Asaadi, S., ... Nespor, M. (2016). Listening natively across perceptual domains? Journal of Experimental Psychology: Learning, Memory, and Cognition, 42(7), 1127-1139.

Our native tongue influences the way we perceive other languages. But does it also determine the way we perceive nonlinguistic sounds? The authors investigated how speakers of Italian, Turkish, and Persian group sequences of syllables, tones, or visual shapes alternating in either frequency or duration. We found strong native listening effects with linguistic stimuli. Speakers of Italian grouped the linguistic stimuli differently from speakers of Turkish and Persian. However, speakers of all languages showed the same perceptual biases when grouping the nonlinguistic auditory and the visual stimuli. The shared perceptual biases appear to be determined by universal grouping principles, and the linguistic differences caused by prosodic differences between the languages. Although previous findings suggest that acquired linguistic knowledge can either enhance or diminish the perception of both linguistic and nonlinguistic auditory stimuli, we found no transfer of native listening effects across auditory domains or perceptual modalities.

Lew-Williams, C., Pelucchi, B., & Saffran, J. R. (2011). Isolated words enhance statistical language learning in infancy. Developmental Science, 14(6), 1323-1329.

Infants are adept at tracking statistical regularities to identify word boundaries in pause-free speech. However, researchers have questioned the relevance of statistical learning mechanisms to language acquisition, since previous studies have used simplified artificial languages that ignore the variability of real language input. The experiments reported here embraced a key dimension of variability in infant-directed speech. English-learning infants (8-10 months) listened briefly to natural Italian speech that contained either fluent speech only or a combination of fluent speech and single-word utterances. Listening times revealed successful learning of the statistical properties of target words only when words appeared both in fluent speech and in isolation; brief exposure to fluent speech alone was not sufficient to facilitate detection of the words' statistical properties. This investigation suggests that statistical learning mechanisms actually benefit from variability in utterance length, and provides the first evidence that isolated words and longer utterances act in concert to support infant word segmentation.

Mattys, S. L. (2004). Stress versus coarticulation: Toward an integrated approach to explicit speech segmentation. Journal of Experimental Psychology: Human Perception and Performance, 30(2), 397-408.

Although word stress has been hailed as a powerful speech-segmentation cue, the results of 5 cross-modal fragment priming experiments revealed limitations to stress-based segmentation. Specifically, the stress pattern of auditory primes failed to have any effect on the lexical decision latencies to related visual targets. A determining factor was whether the onset of the prime was coarticulated with the preceding speech fragment. Uncoarticulated (i.e., concatenated) primes facilitated priming. Coarticulated ones did not. However, when the primes were presented in a background of noise, the pattern of results reversed, and a strong stress effect emerged: Stress-initial primes caused more priming than non-initial-stress primes, regardless of the coarticulatory cues. The results underscore the role of coarticulation in the segmentation of clear speech and that of stress in impoverished listening conditions. More generally, they call for an integrated and signal-contingent approach to speech segmentation.

Mattys, S. L., Melhorn, J. F., & White, L. (2007). Effects of syntactic expectations on speech segmentation. Journal of Experimental Psychology: Human Perception and Performance, 33(4), 960-977.

Although the effect of acoustic cues on speech segmentation has been extensively investigated, the role of higher order information (e.g., syntax) has received less attention. Here, the authors examined whether syntactic expectations based on subject-verb agreement have an effect on segmentation and whether they do so despite conflicting acoustic cues. Although participants detected target words faster in phrases containing adequate acoustic cues (

Mattys, S. L., White, L., & Melhorn, J. F. (2005). Integration of multiple speech segmentation cues: A hierarchical framework. Journal of Experimental Psychology: General, 134(4), 477-500.

A central question in psycholinguistic research is how listeners isolate words from connected speech despite the paucity of clear word-boundary cues in the signal. A large body of empirical evidence indicates that word segmentation is promoted by both lexical (knowledge-derived) and sublexical (signal-derived) cues. However, an account of how these cues operate in combination or in conflict is lacking. The present study fills this gap by assessing speech segmentation when cues are systematically pitted against each other. The results demonstrate that listeners do not assign the same power to all segmentation cues; rather, cues are hierarchically integrated, with descending weights allocated to lexical, segmental, and prosodic cues. Lower level cues drive segmentation when the interpretive conditions are altered by a lack of contextual and lexical information or by white noise. Taken together, the results call for an integrated, hierarchical, and signal-contingent approach to speech segmentation.

McQueen, J. M. (1998). Segmentation of continuous speech using phonotactics. Journal of Memory and Language, 39(1), 21-46.

Newport, E. L. (2016). Statistical language learning: Computational, maturational, and linguistic constraints. Language and Cognition, 8(3), 447-461.

Our research on statistical language learning shows that infants, young children, and adults can compute, online and with remarkable speed, how consistently sounds co-occur, how frequently words occur in similar contexts, and the like, and can utilize these statistics to find candidate words in a speech stream, discover grammatical categories, and acquire simple syntactic structure in miniature languages. However, statistical learning is not merely learning the patterns presented in the input. When their input is inconsistent, children sharpen these statistics and produce a more systematic language than the one to which they are exposed. When input languages inconsistently violate tendencies that are widespread in human languages, learners shift these languages to be more aligned with language universals, and children do so much more than adults. These processes explain why children acquire language (and other patterns) more effectively than adults, and also may explain how systematic language structures emerge in communities where usages are varied and inconsistent. Most especially, they suggest that usage-based learning approaches must account for differences between adults and children in how usage properties are acquired, and must also account for substantial changes made by adult and child learners in how input usage properties are represented during learning.

Mehler, J., Dommergues, J. Y., Frauenfelder, U., & Segui, J. (1981). The syllable's role in speech segmentation. Journal of Verbal Learning and Verbal Behavior, 20(3), 298-305.

In this study a monitoring technique was employed to examine the role of the syllable in the perceptual segmentation of words. Pairs of words sharing the first three phonemes but having different syllabic structure (for instance, pa-lace and pal-mier) were used. The targets were the sequences composed of either the first two or three phonemes of the word (for instance, pa and pal). The results showed that reaction times to targets which correspond to the first syllable of the word were faster than those that did not, independently of the target size. In a second experiment, two target types, V and VC (for instance, a and al in the two target words above) were used with the same experimental list as in experiment one. Subjects detected the VC target type faster when it belonged to the first syllable than when it belonged to the first two syllables. No differences were observed for the V target type which was in the first syllable in both cases. On the basis of the reported results an interpretation in which the syllable is considered a processing unit in speech perception is advanced.

Morrill, T. H., Dilley, L. C., McAuley, J. D., & Pitt, M. A. (2014). Distal rhythm influences whether or not listeners hear a word in continuous speech: Support for a perceptual grouping hypothesis. Cognition, 131(1), 69-74.

Morrill, T., Baese-Berk, M., Heffner, C., & Dilley, L. (2015). Interactions between distal speech rate, linguistic knowledge, and speech environment. Psychonomic Bulletin & Review, 22(5), 1451-1457.

During lexical access, listeners use both signal-based and knowledge-based cues, and information from the linguistic context can affect the perception of acoustic speech information. Recent findings...

Norris, D., McQueen, J. M., & Cutler, A. (1995). Competition and segmentation in spoken-word recognition. Journal of Experimental Psychology: Learning, Memory, and Cognition, 21(5), 1209-1228.

Spoken utterances contain few reliable cues to word boundaries, but listeners nonetheless experience little difficulty identifying words in continuous speech. The authors present data and simulations that suggest that this ability is best accounted for by a model of spoken-word recognition combining competition between alternative lexical candidates, and sensitivity to prosodic structure. In a word-spotting experiment, stress pattern effects emerged most clearly when there were many competing lexical candidates for part of the input. Thus, competition between simultaneously active word candidates can modulate the size of prosodic effects, which suggests that spoken-word recognition must be sensitive both to prosodic structure and to the effects of competition. A version of the Shortlist model (D. G. Norris, 1994b) incorporating the Metrical Segmentation Strategy (A. Cutler & D. Norris, 1988) accurately simulates the results using a lexicon of more than 25,000 words.

Norris, D., McQueen, J. M., Cutler, A., & Butterfield, S. (1997). The possible-word constraint in the segmentation of continuous speech. Cognitive Psychology, 34(3), 191-243.

We propose that word recognition in continuous speech is subject to constraints on what may constitute a viable word of the language. This Possible-Word Constraint (PWC) reduces activation of candidate words if their recognition would imply word status for adjacent input which could not be a word--for instance, a single consonant. In two word-spotting experiments, listeners found it much harder to detect apple, for example, in fapple (where [f] alone would be an impossible word), than in vuffapple (where vuff could be a word of English). We demonstrate that the PWC can readily be implemented in a competition-based model of continuous speech recognition, as a constraint on the process of competition between candidate words; where a stretch of speech between a candidate word and a (known or likely) word boundary is not a possible word, activation of the candidate word is reduced. This implementation accurately simulates both the present results and data from a range of earlier studies of speech segmentation.

Perruchet, P., & Poulin-Charronnat, B. (2012). Beyond transitional probability computations: Extracting word-like units when only statistical information is available. Journal of Memory and Language, 66(4), 807-818.

Reinisch, E. (2016). Natural fast speech is perceived as faster than linearly time-compressed speech. Attention, Perception, & Psychophysics, 78(4), 1203-1217.

Listeners compensate for variation in speaking rate: In a fast context, a given sound is interpreted as longer than in a slow context. Experimental rate manipulations have been achieved either through linear compression or by using natural fast speech. However, in natural fast speech, segments are subject to processes such as reduction or deletion. If speaking rate is then defined as the number of segments per unit time, the question arises as to what impact such processes have on listeners' normalization for speaking rate. The present study tested the effect of sentence duration and fast-speech processes on rate normalization for a German vowel duration contrast. Results showed that a naturally produced short sentence containing segmental reductions and deletions led to the most "long" vowel responses whereas the long sentence with clearly articulated segments led to the fewest. This suggests that speaking rate is not merely calculated as the number of segments realized per unit time. Rather, listeners associate properties of natural fast speech with a higher speaking rate. This contrasts with earlier results and a second experiment in which perceived speaking rate was measured in an explicit task. Models of speech comprehension are evaluated with regard to the present findings.

Roy, B. C., Frank, M. C., DeCamp, P., Miller, M., & Roy, D. (2015). Predicting the birth of a spoken word. Proceedings of the National Academy of Sciences of the United States of America, 112(41), 12663-12668.

Children learn words through an accumulation of interactions grounded in context. Although many factors in the learning environment have been shown to contribute to word learning in individual studies, no empirical synthesis connects across factors. We introduce a new ultradense corpus of audio and video recordings of a single child’s...

Saffran, J. R., & Kirkham, N. Z. (2018). Infant statistical learning. Annual Review of Psychology, 69, 181-203.

Saffran, J. R., Aslin, R. N., & Newport, E. L. (1996). Statistical learning by 8-month-old infants. Science, 274, 1926-1928.

Saffran, J. R., Newport, E. L., & Aslin, R. N. (1996). Word segmentation: The role of distributional cues. Journal of Memory and Language, 35(4), 606-621.

One of the infant's first tasks in language acquisition is to discover the words embedded in a mostly continuous speech stream. This learning problem might be solved by using distributional cues to word boundaries or example, by computing the transitional probabilities between sounds in the language input and using the relative strengths of these probabilities to hypothesize word boundaries. The learner might be further aided by language-specific prosodic cues correlated with word boundaries. As a first step in testing these hypotheses, we briefly exposed adults to an artificial language in which the only cues available for word segmentation were the transitional probabilities between syllables. Subjects were able to learn the words of this language. Furthermore, the addition of certain prosodic cues served to enhance performance. These results suggest that distributional cues may play an important role in the initial word segmentation of language learners.

Shatzman, K. B., & McQueen, J. M. (2006). Segment duration as a cue to word boundaries in spoken-word recognition. Perception & Psychophysics, 68(1), 1-16.

In two eye-tracking experiments, we examined the degree to which listeners use acoustic cues to word boundaries. Dutch participants listened to ambiguous sentences in which stop-initial words (e.g., pot, jar) were preceded by eens (once); the sentences could thus also refer to cluster-initial words (e.g., een spot, a spotlight). The participants made fewer fixations to target pictures (e.g., a jar) when the target and the preceding [s] were replaced by a recording of the cluster-initial word than when they were spliced from another token of the target-bearing sentence (Experiment 1). Although acoustic analyses revealed several differences between the two recordings, only [s] duration correlated with the participants' fixations (more target fixations for shorter [s]s). Thus, we found that listeners apparently do not use all available acoustic differences equally. In Experiment 2, the participants made more fixations to target pictures when the [s] was shortened than when it was lengthened. Utterance interpretation can therefore be influenced by individual segment duration alone.

Skoruppa, K., Nevins, A., Gillard, A., & Rosen, S. (2015).

The role of vowel phonotactics in native speech segmentation

Journal of Phonetics, 49, 67-76.


English listeners use vowel phonotactics for the segmentation of unknown words. They exploit the fact that English words do not end in lax vowels. The lax vowel constraint influences segmentation in quiet but not in noise.

Steinhauer, K., Alter, K., & Friederici, A. D. (1999).

Brain potentials indicate immediate use of prosodic cues in natural speech processing

Nature Neuroscience, 2(2), 191-196.

PMID: 10195205

Spoken language, in contrast to written text, provides prosodic information such as rhythm, pauses, accents, amplitude and pitch variations. However, little is known about when and how these features are used by the listener to interpret the speech signal. Here we use event-related brain potentials (ERP) to demonstrate that intonational phrasing guides the initial analysis of sentence structure. Our finding of a positive shift in the ERP at intonational phrase boundaries suggests a specific on-line brain response to prosodic processing. Additional ERP components indicate that a false prosodic boundary is sufficient to mislead the listener's sentence processor. Thus, the application of ERP measures is a promising approach for revealing the time course and neural basis of prosodic information processing.

Suomi, K., McQueen, J. M., & Cutler, A. (1997).

Vowel harmony and speech segmentation in Finnish

Journal of Memory and Language, 36(3), 422-444.


Tremblay, A., & Spinelli, E. (2013).

Segmenting liaison-initial words: The role of predictive dependencies

Language and Cognitive Processes, 28(8), 1093-1113.


Listeners use several cues to segment speech into words. However, it is unclear how these cues work together. This study examines the relative weight of distributional and (natural) acoustic-phonetic cues in French listeners' recognition of temporarily ambiguous vowel-initial words in liaison contexts (e.g., parfait [t]abri, 'perfect shelter') and corresponding consonant-initial words (e.g., parfait tableau, 'perfect painting'). Participants completed a visual-world eye-tracking experiment in which they heard adjective-noun sequences where the pivotal consonant was /t/ (more frequent as word-initial consonant and thus expected advantage for consonant-initial words), /z/ (more frequent as liaison consonant and thus expected advantage for liaison-initial words), or /n/ (roughly as frequent as word-initial and liaison consonants and thus no expected advantage). The results for /t/ and /z/ were as expected, but those for /n/ showed an advantage for consonant-initial words over liaison-initial ones. These results are consistent with speech segmentation theories in which distributional information supersedes acoustic-phonetic information, but they also suggest a privileged status for consonant-initial words when the input does not strongly favour liaison-initial words.

White, L., Mattys, S. L., & Wiget, L. (2012).

Segmentation cues in conversational speech: Robust semantics and fragile phonotactics

Frontiers in Psychology, 3, 375.

PMID: 3464055

Multiple cues influence listeners' segmentation of connected speech into words, but most previous studies have used stimuli elicited in careful readings rather than natural conversation. Discerning word boundaries in conversational speech may differ from the laboratory setting. In particular, a speaker's articulatory effort (hyperarticulation vs. hypoarticulation, H&H) may vary according to communicative demands, suggesting a compensatory relationship whereby acoustic-phonetic cues are attenuated when other information sources strongly guide segmentation. We examined how listeners' interpretation of segmentation cues is affected by speech style (spontaneous conversation vs. read), using cross-modal identity priming. To elicit spontaneous stimuli, we used a map task in which speakers discussed routes around stylized landmarks. These landmarks were two-word phrases in which the strength of potential segmentation cues (semantic likelihood and cross-boundary diphone phonotactics) was systematically varied. Landmark-carrying utterances were transcribed and later re-recorded as read speech. Independent of speech style, we found an interaction between cue valence (favorable/unfavorable) and cue type (phonotactics/semantics). Thus, there was an effect of semantic plausibility, but no effect of cross-boundary phonotactics, indicating that the importance of phonotactic segmentation may have been overstated in studies where lexical information was artificially suppressed. These patterns were unaffected by whether the stimuli were elicited in a spontaneous or read context, even though the difference in speech styles was evident in a main effect. Durational analyses suggested speaker-driven cue trade-offs congruent with an H&H account, but these modulations did not impact on listener behavior. We conclude that previous research exploiting read speech is reliable in indicating the primacy of lexically based cues in the segmentation of natural conversational speech.

Woodrow, H. (1909).

A quantitative study of rhythm: The effect of variations in intensity, rate and duration

New York: Science Press.


Copyright © Editorial Office of Advances in Psychological Science (《心理科学进展》)