ISSN 1671-3710
CN 11-4766/R
Sponsor: Institute of Psychology, Chinese Academy of Sciences
Publisher: Science Press

Advances in Psychological Science, 2026, Vol. 34, Issue (8): 1299-1308. doi: 10.3724/SP.J.1042.2026.1299 cstr: 32111.14.2026.1299

• Conceptual Framework •

Vocal micro-expressions: A new framework for detecting emotional leakage

SHEN Xunbing, FENG Tingting, SHENG Jing, PENG Yongmei, LIU Yihui, LI Yafang, CHEN Zhencai

  1. Key Laboratory of Psychology of TCM and Brain Science, Jiangxi Administration of Traditional Chinese Medicine; School of Humanities, Jiangxi University of Chinese Medicine, Nanchang 330004, China
  • Received: 2026-02-03 Online: 2026-08-15 Published: 2026-06-03
  • Funding:
    Supported by the National Natural Science Foundation of China, Regional Program (32560200), and the Humanities and Social Sciences Research Planning Fund of the Ministry of Education (24YJA190013)

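Abstract (Chinese version): Micro-expressions are brief, leaked expressions that reflect a person's true inner emotional state. Existing research on micro-expressions has focused mainly on facial micro-expressions, and the discovery and definition of micro-expressions mention only facial expressions. Does similar emotional leakage also occur in vocal expression, that is, do vocal micro-expressions exist? Several lines of evidence suggest that they do: in everyday life, people can hear tension and unease in another person's speech; in theory, suppressed emotions can leak through the vocal channel. How can leaked vocal micro-expressions be captured? They may be hard to perceive by ear, but speech emotion recognition techniques, borrowing the concept of strain from physics, can objectively measure the brief emotional information leaked in speech. In addition, a series of psychological experiments can analyze the characteristics of this brief emotional information and apply it to deception detection, providing scientific evidence for the existence of vocal micro-expressions. From a theory-building perspective, this study analyzes the intension and extension, structure, and function of the concept of vocal micro-expressions, which will deepen the theoretical understanding of micro-expressions and promote their application in deception detection.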

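Keywords (Chinese version): vocal micro-expressions, emotional leakage, speech emotion recognition, deception detection, machine learning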

Abstract: Micro-expressions are brief, involuntary emotional expressions that reveal an individual’s genuine internal affective state. Existing research has predominantly operationalized micro-expressions within the facial modality, restricting both their definition and detection to transient facial muscular activations. This face-centric perspective raises an important yet underexplored question: does a comparable form of emotional leakage occur in the vocal channel? More specifically, might “vocal micro-expressions” exist as subtle, short-lived variations embedded within speech prosody? Addressing this question is critical for advancing a more comprehensive and modality-general account of emotional leakage.
A growing body of evidence supports the theoretical and empirical plausibility of vocal micro-expressions. In everyday social interactions, individuals can often infer latent emotional states (such as nervousness, hesitation, or anxiety) from subtle changes in another person’s voice, even when those emotions are intentionally concealed. From the perspective of emotional leakage theory, affective suppression is inherently incomplete, yielding residual activation that propagates through less consciously regulated channels, including vocal production. Speech production, governed by tightly coupled respiratory, phonatory, and articulatory subsystems, is modulated by autonomic arousal and affective dynamics, thereby constituting a plausible substrate for transient, low-amplitude emotional signals. Importantly, such signals are likely to manifest as fine-grained, temporally localized deviations in acoustic features rather than as sustained prosodic patterns.
To systematically examine the existence and properties of vocal micro-expressions, the present study proposes the construction of a deception-elicited emotional speech corpus under controlled experimental conditions. Deception is adopted as the elicitation paradigm due to its well-established association with elevated cognitive load and affective arousal, both of which facilitate emotional leakage. Furthermore, deception inherently involves a conflict between internal states and external expressions, thereby increasing the likelihood of transient, involuntary perturbations. Data acquisition is conducted within an interactive communication framework to preserve ecological validity, as deception predominantly occurs in dialogic rather than monologic contexts. Participants engage in structured interaction tasks designed to elicit both deceptive and truthful responses, while multimodal recordings (audio-video) are obtained under both conditions. Temporal synchronization across modalities enables fine-grained alignment between vocal and facial signals, supporting cross-modal validation and integrative analysis.
A central methodological challenge lies in the detection and quantification of transient, low-salience vocal perturbations. Unlike facial micro-expressions, which can be captured via high-speed imaging and localized in the spatial domain, vocal micro-expressions are distributed over time and often fall below the threshold of conscious auditory perception due to their brevity and low amplitude. To address this limitation, we introduce a strain-inspired measurement framework defined in acoustic feature space. Specifically, vocal micro-expressions are operationalized as normalized deviations: vocal micro-expression = ΔL/L₀, where ΔL denotes the instantaneous deviation of one acoustic feature, or of a composite set of acoustic features (e.g., fundamental frequency, energy, spectral descriptors, or cepstral coefficients), from its baseline, and L₀ denotes the corresponding baseline estimate, computed either at the utterance level or via speaker-adaptive normalization. This formulation enables robust, speaker-invariant quantification while preserving sensitivity to fine-grained temporal fluctuations. Building on this formulation, the temporal dynamics of vocal micro-expressions are parameterized using onset, apex, and offset, providing a principled representation of their emergence, peak intensity, and dissipation. These temporal markers can be extracted via change-point detection or peak analysis algorithms, enabling segmentation of continuous speech into candidate micro-expression events. Such parameterization facilitates both descriptive analysis and downstream modeling, including sequence-based learning and temporal pattern recognition.
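The strain-style measure and the onset/apex/offset parameterization described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `strain` and `find_events`, the use of the utterance-level median as the baseline, the 0.1 threshold, and the 50-frame duration cap are all assumptions made for illustration, and simple thresholding stands in for the change-point or peak analysis the abstract mentions.

```python
import numpy as np

def strain(feature, baseline=None):
    """Return the normalized deviation (L - L0) / L0 for each frame."""
    feature = np.asarray(feature, dtype=float)
    # Assumption: the utterance-level median serves as the baseline L0.
    l0 = float(np.median(feature)) if baseline is None else float(baseline)
    if l0 == 0:
        raise ValueError("baseline L0 must be non-zero")
    return (feature - l0) / l0

def find_events(eps, threshold=0.1, max_frames=50):
    """Segment a strain track into (onset, apex, offset) frame indices.

    A candidate event is a run of consecutive frames whose absolute strain
    exceeds `threshold`; runs longer than `max_frames` are treated as
    sustained prosody rather than brief leakage and are skipped.
    Both parameter values are hypothetical.
    """
    eps = np.asarray(eps, dtype=float)
    above = np.abs(eps) > threshold
    # Pad with False so every run has both a rising and a falling edge.
    padded = np.concatenate(([False], above, [False]))
    edges = np.diff(padded.astype(int))
    starts = np.flatnonzero(edges == 1)        # first frame of each run
    ends = np.flatnonzero(edges == -1) - 1     # last frame of each run
    events = []
    for on, off in zip(starts, ends):
        on, off = int(on), int(off)
        if off - on + 1 > max_frames:
            continue
        # Apex: frame with the largest absolute strain inside the run.
        apex = on + int(np.argmax(np.abs(eps[on:off + 1])))
        events.append((on, apex, off))
    return events

# Synthetic fundamental-frequency track (Hz) with one brief excursion.
f0 = [120.0] * 10 + [126.0, 138.0, 150.0, 135.0, 124.0] + [120.0] * 10
eps = strain(f0)
events = find_events(eps)
print(events)
```

On this synthetic track the excursion is segmented as a single event whose apex falls on the 150 Hz frame. With real recordings, the feature track would come from a pitch or energy extractor, and the baseline choice (utterance-level versus speaker-adaptive) would need to be validated empirically.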
The proposed framework is evaluated through a series of behavioral and computational experiments aimed at (i) characterizing the statistical and distributional properties of detected vocal micro-expressions and (ii) assessing their discriminative utility in deception detection tasks. Supervised and self-supervised machine learning models, including deep neural architectures, are employed for feature representation, temporal modeling, and classification. By integrating theoretical formalization, ecologically grounded data acquisition, and advanced computational modeling, this work seeks to establish a robust empirical foundation for vocal micro-expressions. More broadly, it extends micro-expression research beyond the facial modality, advancing a unified, multimodal account of emotional leakage and contributing to the development of next-generation systems for affective computing and deception detection.
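As a rough illustration of aim (ii), the sketch below summarizes detected events per utterance and estimates discriminative utility with leave-one-out cross-validation. A nearest-centroid classifier is swapped in here purely for brevity; it is not one of the deep or self-supervised models the abstract refers to. The names `utterance_descriptor` and `loo_nearest_centroid`, the three descriptors, and the toy numbers are assumptions, and the toy values are placeholders rather than data or results from the study.

```python
import numpy as np

def utterance_descriptor(eps, events):
    """Summarize one utterance as [event count, mean |apex strain|, mean duration]."""
    if not events:
        return np.zeros(3)
    apex_strain = [abs(eps[apex]) for _, apex, _ in events]
    durations = [off - on + 1 for on, _, off in events]
    return np.array([len(events), np.mean(apex_strain), np.mean(durations)], dtype=float)

def loo_nearest_centroid(X, y):
    """Leave-one-out accuracy of a nearest-centroid classifier."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    hits = 0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        X_train, y_train = X[keep], y[keep]
        # Standardize with training statistics only, to avoid leakage.
        mu = X_train.mean(axis=0)
        sd = X_train.std(axis=0) + 1e-9
        Z = (X_train - mu) / sd
        z = (X[i] - mu) / sd
        labels = np.unique(y_train)
        centroids = np.array([Z[y_train == c].mean(axis=0) for c in labels])
        pred = labels[np.argmin(np.linalg.norm(centroids - z, axis=1))]
        hits += int(pred == y[i])
    return hits / len(y)

# Placeholder descriptors (NOT study data): label 0 = truthful, 1 = deceptive.
X_toy = [[0, 0.00, 0], [1, 0.10, 2], [0, 0.00, 0],
         [4, 0.30, 5], [5, 0.35, 6], [4, 0.25, 5]]
y_toy = [0, 0, 0, 1, 1, 1]
accuracy = loo_nearest_centroid(X_toy, y_toy)
print(accuracy)
```

The accuracy on these well-separated placeholder values says nothing about real speech; the point is only that standardization and centroids are refit inside each fold, so the held-out utterance never influences its own prediction.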

Key words: vocal micro-expressions, leakage of emotion, speech emotion recognition, deception detection, machine learning
