Vocal micro-expressions: A new framework for detecting emotional leakage

doi:10.3724/SP.J.1042.2026.1299

Abstract

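Summary: A micro-expression is a brief, leaked expression that reflects a person's true inner emotional state. The existing literature has focused mainly on facial micro-expressions, and the discovery and definition of micro-expressions likewise refer only to the face. Is there a similar form of emotional leakage in vocal expression, that is, do vocal micro-expressions exist? Several lines of evidence suggest that they do: in everyday life, people can hear tension and unease in another person's speech; in theory, suppressed emotion can leak through the vocal channel. How can leaked vocal micro-expressions be captured? They may be difficult to detect by ear, but speech emotion recognition techniques, drawing on the concept of strain from physics, can be used to measure objectively the brief emotional information leaked in speech. In addition, a series of psychological experiments can analyze the characteristics of this brief emotional information and apply it to deception detection, providing scientific evidence for the existence of vocal micro-expressions. From the standpoint of theory construction, this study analyzes the intension and extension, structure, and function of the concept of vocal micro-expressions, which will deepen the theoretical understanding of micro-expressions and promote their application in deception detection.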

Keywords: vocal micro-expressions, emotional leakage, speech emotion recognition, deception detection, machine learning

Abstract: Micro-expressions are brief, involuntary emotional expressions that reveal an individual’s genuine internal affective state. Existing research has predominantly operationalized micro-expressions within the facial modality, restricting both their definition and detection to transient facial muscular activations. This face-centric perspective raises an important yet underexplored question: does a comparable form of emotional leakage occur in the vocal channel? More specifically, might “vocal micro-expressions” exist as subtle, short-lived variations embedded within speech prosody? Addressing this question is critical for advancing a more comprehensive and modality-general account of emotional leakage.
A growing body of evidence supports the theoretical and empirical plausibility of vocal micro-expressions. In everyday social interactions, individuals can often infer latent emotional states—such as nervousness, hesitation, or anxiety—from subtle changes in another person’s voice, even when those emotions are intentionally concealed. From the perspective of emotional leakage theory, affective suppression is inherently incomplete, yielding residual activation that propagates through less consciously regulated channels, including vocal production. Speech production, governed by tightly coupled respiratory, phonatory, and articulatory subsystems, is modulated by autonomic arousal and affective dynamics, thereby constituting a plausible substrate for transient, low-amplitude emotional signals. Importantly, such signals are likely to manifest as fine-grained, temporally localized deviations in acoustic features rather than as sustained prosodic patterns.
To systematically examine the existence and properties of vocal micro-expressions, the present study proposes the construction of a deception-elicited emotional speech corpus under controlled experimental conditions. Deception is adopted as the elicitation paradigm due to its well-established association with elevated cognitive load and affective arousal, both of which facilitate emotional leakage. Furthermore, deception inherently involves a conflict between internal states and external expressions, thereby increasing the likelihood of transient, involuntary perturbations. Data acquisition is conducted within an interactive communication framework to preserve ecological validity, as deception predominantly occurs in dialogic rather than monologic contexts. Participants engage in structured interaction tasks designed to elicit both deceptive and truthful responses, while multimodal recordings (audio-video) are obtained under both conditions. Temporal synchronization across modalities enables fine-grained alignment between vocal and facial signals, supporting cross-modal validation and integrative analysis.
A central methodological challenge lies in the detection and quantification of transient, low-salience vocal perturbations. Unlike facial micro-expressions, which can be captured via high-speed imaging and localized in the spatial domain, vocal micro-expressions are distributed over time and often fall below the threshold of conscious auditory perception due to their brevity and low amplitude. To address this limitation, we introduce a strain-inspired measurement framework defined in acoustic feature space. Specifically, vocal micro-expressions are operationalized as normalized deviations: Micro vocal expression = ΔL/L_o, where ΔL denotes instantaneous deviations in one or a composite set of acoustic features (e.g., fundamental frequency, energy, spectral descriptors, or cepstral coefficients), and L_o denotes the corresponding baseline estimate, computed either at the utterance level or via speaker-adaptive normalization. This formulation enables robust, speaker-invariant quantification while preserving sensitivity to fine-grained temporal fluctuations. Building on this formulation, the temporal dynamics of vocal micro-expressions are parameterized using onset, apex, and offset, providing a principled representation of their emergence, peak intensity, and dissipation. These temporal markers can be extracted via change-point detection or peak analysis algorithms, enabling segmentation of continuous speech into candidate micro-expression events. Such parameterization facilitates both descriptive analysis and downstream modeling, including sequence-based learning and temporal pattern recognition.
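The strain-style measure and the onset/apex/offset parameterization described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `strain_track` and `find_events`, the use of the utterance-level median as the baseline L_o, the simple threshold rule (used here in place of a change-point or peak analysis algorithm), the 0.10 threshold, and the toy F0 values are all assumptions made for illustration.

```python
from statistics import median


def strain_track(feature, baseline=None):
    """Frame-wise relative deviation (delta L / L_o) from a baseline value.

    Assumption: when no baseline is given, the utterance-level median of the
    feature track stands in for L_o.
    """
    if baseline is None:
        baseline = median(feature)
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return [(x - baseline) / baseline for x in feature]


def find_events(strain, threshold=0.10):
    """Return (onset, apex, offset) frame indices for each candidate event.

    An event is a run of consecutive frames whose absolute strain exceeds
    `threshold` (a hypothetical value); the apex is the frame with the
    largest absolute strain inside the run.
    """
    events = []
    start = None
    # A trailing 0.0 acts as a sentinel so a run reaching the last frame is closed.
    for i, s in enumerate(list(strain) + [0.0]):
        above = abs(s) > threshold
        if above and start is None:
            start = i
        elif not above and start is not None:
            end = i - 1
            apex = max(range(start, end + 1), key=lambda k: abs(strain[k]))
            events.append((start, apex, end))
            start = None
    return events


# Toy F0 track in Hz (made-up values, one value per analysis frame).
f0 = [120, 121, 119, 120, 138, 150, 141, 120, 121, 120]
print(find_events(strain_track(f0)))
```

In this toy track the brief rise in F0 is the only run that exceeds the threshold, so a single candidate event is reported. A composite measure over several acoustic features, or a speaker-adaptive baseline, would follow the same pattern with a different definition of the baseline.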
The proposed framework is evaluated through a series of behavioral and computational experiments aimed at (i) characterizing the statistical and distributional properties of detected vocal micro-expressions and (ii) assessing their discriminative utility in deception detection tasks. Supervised and self-supervised machine learning models, including deep neural architectures, are employed for feature representation, temporal modeling, and classification. By integrating theoretical formalization, ecologically grounded data acquisition, and advanced computational modeling, this work seeks to establish a robust empirical foundation for vocal micro-expressions. More broadly, it extends micro-expression research beyond the facial modality, advancing a unified, multimodal account of emotional leakage and contributing to the development of next-generation systems for affective computing and deception detection.
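One simple way to gauge the discriminative utility mentioned above, before any learned model is trained, is the area under the ROC curve (AUC) of a single per-utterance score. The sketch below is an assumption-laden stand-in and not the supervised or self-supervised models the study refers to: the function name `auc`, the choice of "number of detected events per utterance" as the score, and the toy counts are hypothetical and are not results.

```python
def auc(scores_pos, scores_neg):
    """Probability that a random positive outscores a random negative.

    Ties count as 0.5. This pairwise definition is equivalent to the area
    under the ROC curve.
    """
    if not scores_pos or not scores_neg:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Made-up event counts per utterance, for illustration only.
deceptive_counts = [3, 2, 4, 1]
truthful_counts = [1, 0, 2, 1]
print(auc(deceptive_counts, truthful_counts))  # 0.84375 for these toy counts
```

An AUC near 0.5 would indicate that the score carries no information about the condition, whereas values closer to 1 would indicate better separation of deceptive from truthful utterances.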

Key words: vocal micro-expressions, leakage of emotion, speech emotion recognition, deception detection, machine learning

Chinese Library Classification (CLC) number: B842

SHEN Xunbing, FENG Tingting, SHENG Jing, PENG Yongmei, LIU Yihui, LI Yafang, CHEN Zhencai. (2026). Vocal micro-expressions: A new framework for detecting emotional leakage. Advances in Psychological Science (心理科学进展), 34(8), 1299-1308.
