ISSN 1671-3710
CN 11-4766/R
Sponsor: Institute of Psychology, Chinese Academy of Sciences
Publisher: Science Press

Advances in Psychological Science, 2024, Vol. 32, Issue (1): 1-13. doi: 10.3724/SP.J.1042.2024.00001

• Conceptual Framework •

Cross-modal analysis of facial EMG in micro-expressions and data annotation algorithm

WANG Su-Jing1,2, WANG Yan1,2, LI Jingting1,2, DONG Zizhao1,2, ZHANG Jianhang3, LIU Ye2,4

  1. CAS Key Laboratory of Behavioral Science, Institute of Psychology, Beijing 100101, China
  2. Department of Psychology, University of Chinese Academy of Sciences, Beijing 100049, China
  3. School of Computer Science, Jiangsu University of Science and Technology, Zhenjiang 212003, China
  4. State Key Laboratory of Brain and Cognitive Science, Institute of Psychology, Chinese Academy of Sciences, Beijing 100039, China
  • Received: 2023-06-25 Online: 2024-01-15 Published: 2024-06-12
  • Contact: WANG Su-Jing E-mail: wangsujing@psych.ac.cn

Abstract:

Combining micro-expression analysis with deep learning has become a major trend. However, small sample sizes have long hindered the development of deep-learning-based micro-expression analysis. Because micro-expressions are brief, subtle facial expressions, annotating them is costly in both time and labor, which is the main cause of the small-sample problem. Further improving the performance of micro-expression spotting and recognition still requires a large number of micro-expression samples for training deep learning models, so the field urgently needs a solution to the annotation problem. To address this issue, our research uses facial electromyographic (EMG) signals to propose a set of solutions to micro-expression annotation from three aspects: automatic annotation, semi-automatic annotation, and unsupervised annotation of micro-expression data.

First, we combine facial EMG recording, a method from physiological psychology, with behavioral and cognitive psychology experiments to explore the physiological characteristics of micro-expressions. In this study, we recorded the signal frequency and amplitude during the contraction of facial muscles or muscle groups, and used relevant EMG metrics to quantify accurately and objectively the three features of micro-expressions, namely short duration, small movement amplitude, and asymmetry. This provides a theoretical basis for subsequent research on the annotation and intelligent analysis of micro-expressions.
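
As an illustration of how such EMG metrics might be computed, the following minimal sketch derives the three features from a pair of rectified EMG envelopes recorded at homologous left and right muscle sites. The function name `describe_burst`, the fixed threshold, and the asymmetry index (L - R) / (L + R) are assumptions made for illustration, not the metrics used in the study.

```python
def describe_burst(left, right, fs, threshold):
    """Summarize one EMG burst from left/right rectified envelopes.

    left, right: non-negative envelope samples of equal length (hypothetical inputs)
    fs: sampling rate in Hz
    threshold: activation threshold (> 0), same unit as the envelopes
    Returns None when neither side exceeds the threshold.
    """
    if len(left) != len(right):
        raise ValueError("left and right must have the same length")
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    # A sample counts as active when either side exceeds the threshold.
    active = [i for i, (l, r) in enumerate(zip(left, right)) if max(l, r) > threshold]
    if not active:
        return None
    # Assumption: the trace contains a single burst, spanning the first
    # to the last active sample.
    onset, offset = active[0], active[-1]
    peak_left = max(left[onset:offset + 1])
    peak_right = max(right[onset:offset + 1])
    return {
        "duration_s": (offset - onset + 1) / fs,          # short duration
        "peak_amplitude": max(peak_left, peak_right),     # small amplitude
        # Asymmetry in [-1, 1]: positive means the left side is stronger.
        "asymmetry": (peak_left - peak_right) / (peak_left + peak_right),
    }


features = describe_burst([0, 0, 1, 4, 2, 0], [0, 0, 1, 2, 1, 0], fs=1000, threshold=0.5)
```

A real pipeline would first band-pass filter and rectify the raw signal and would estimate the threshold from a resting baseline; those steps are omitted here.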

Second, for automatic annotation, we propose an automatic micro-expression annotation scheme based on distal facial electromyography. Specifically, we place EMG electrodes around the face so that they do not obscure the facial expression, which allows micro-expression data to be annotated automatically by combining EMG information with video. Meanwhile, we design a psychological paradigm for inducing facial muscle movements and, based on the EMG signal pattern of micro-expressions, develop an algorithm for automatic micro-expression annotation. Finally, we integrated the automatic annotation process and designed interactive software for automated annotation, which can greatly reduce annotation time and the workload of micro-expression coders, and alleviate the small-sample problem of micro-expression databases to a certain extent.
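
The idea of turning EMG activity into video labels can be sketched as follows. This is a generic threshold detector under stated assumptions, not the annotation algorithm developed in this work: the baseline rule (mean plus `k` standard deviations), the 0.5 s upper bound on micro-expression duration, and the names `detect_bursts` and `bursts_to_frames` are all hypothetical, and the EMG and video streams are assumed to start at the same instant.

```python
from statistics import mean, pstdev


def detect_bursts(envelope, baseline_len, k=3.0):
    """Return (start, end) sample indices where the envelope exceeds the
    baseline threshold. The first `baseline_len` samples are assumed to be rest."""
    base = envelope[:baseline_len]
    threshold = mean(base) + k * pstdev(base)
    bursts, start = [], None
    for i, value in enumerate(envelope):
        if value > threshold and start is None:
            start = i                      # burst begins
        elif value <= threshold and start is not None:
            bursts.append((start, i - 1))  # burst ended on the previous sample
            start = None
    if start is not None:                  # burst runs to the end of the trace
        bursts.append((start, len(envelope) - 1))
    return bursts


def bursts_to_frames(bursts, emg_rate, fps, max_duration=0.5):
    """Map short bursts to (onset_frame, offset_frame) video labels.
    Bursts longer than `max_duration` seconds are discarded."""
    labels = []
    for on, off in bursts:
        duration = (off - on + 1) / emg_rate
        if duration <= max_duration:
            # Integer arithmetic avoids floating-point rounding of frame indices.
            labels.append((on * fps // emg_rate, off * fps // emg_rate))
    return labels


envelope = [1.0] * 100 + [5.0] * 200 + [1.0] * 100
labels = bursts_to_frames(detect_bursts(envelope, baseline_len=100), emg_rate=1000, fps=100)
```

Candidate intervals produced this way would still need review by a coder, which is consistent with packaging the process as interactive software.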

Third, for semi-automatic annotation, we focus on the temporal action localization of micro-expressions (METL), i.e., inferring the onset and offset frames of a micro-expression segment from the manual annotation of a single frame within that segment. In particular, we propose a Micro-Expression Contrastive Identification Annotation (MECIA) method as a solution to METL. The backbone of MECIA is a deep learning network with three modules: a contrastive module, an identification module, and an annotation module, corresponding to the three steps of manual annotation. The network's outputs give the temporal localization of micro-expression clips. Experiments show that the inferred micro-expression intervals correspond well to the ground-truth intervals, indicating the potential of this approach to improve the efficiency of vision-based micro-expression annotation.
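
To make the METL task concrete, the sketch below grows an interval outward from a single labeled frame and scores it against a ground-truth interval. It is a simple non-learned stand-in, not the MECIA network: the per-frame motion score, the fixed threshold, and the names `grow_interval` and `interval_iou` are assumptions for illustration, and intersection-over-union is used here only as one common way to compare intervals.

```python
def grow_interval(motion, labeled_frame, threshold):
    """Expand from the labeled frame while neighbouring frames stay above
    the motion threshold; return inclusive (onset, offset) frame indices."""
    if motion[labeled_frame] <= threshold:
        raise ValueError("labeled frame shows no motion above the threshold")
    onset = offset = labeled_frame
    while onset > 0 and motion[onset - 1] > threshold:
        onset -= 1
    while offset < len(motion) - 1 and motion[offset + 1] > threshold:
        offset += 1
    return onset, offset


def interval_iou(pred, truth):
    """Intersection over union of two inclusive frame intervals."""
    inter = max(0, min(pred[1], truth[1]) - max(pred[0], truth[0]) + 1)
    union = (pred[1] - pred[0] + 1) + (truth[1] - truth[0] + 1) - inter
    return inter / union


motion = [0.1, 0.1, 0.6, 0.9, 1.2, 0.8, 0.2, 0.1]  # hypothetical per-frame motion scores
predicted = grow_interval(motion, labeled_frame=4, threshold=0.5)
score = interval_iou(predicted, (3, 6))
```

A learned model replaces the fixed threshold with features that distinguish micro-expression frames from neutral ones, but the input (one labeled frame) and output (an onset and offset pair) have the same shape.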

Fourth, for unsupervised annotation, because annotated micro-expression samples are limited, we propose a self-supervised micro-expression analysis algorithm that is trained on massive unannotated face and expression videos. Specifically, we provide time-domain supervisory information for unannotated face videos based on the correspondence between facial EMG and facial expressions, and we design a Transformer-based self-supervised model for cross-modal contrastive learning, which uses EMG signals to enhance the network's learning of features that target the motion patterns of micro-expressions. Introducing EMG signals helps the contrastive learning model capture weak dynamic facial changes in the time domain and strengthens the model's understanding of visual features. In addition, cross-modal learning allows the model to learn more generalized features, enhancing the robustness of the system.
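
The abstract does not specify the contrastive objective. As one plausible instance, the sketch below computes an InfoNCE-style loss in which each video clip embedding should be most similar to the EMG embedding recorded over the same time window. The function name `info_nce`, the temperature value, and the plain-list embeddings are assumptions for illustration; a real model would use a deep learning framework and Transformer encoders.

```python
import math


def _normalize(vec):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def info_nce(video_emb, emg_emb, temperature=0.1):
    """Mean InfoNCE loss over a batch of time-aligned (video, EMG) pairs.

    video_emb[i] and emg_emb[i] are assumed to come from the same time
    window (the positive pair); every other EMG embedding in the batch
    serves as a negative for video_emb[i].
    """
    v = [_normalize(x) for x in video_emb]
    e = [_normalize(x) for x in emg_emb]
    total = 0.0
    for i in range(len(v)):
        # Cosine similarity to every EMG embedding, scaled by temperature.
        logits = [sum(a * b for a, b in zip(v[i], e_j)) / temperature for e_j in e]
        # Numerically stable log-sum-exp for the softmax denominator.
        top = max(logits)
        log_denominator = top + math.log(sum(math.exp(l - top) for l in logits))
        total += log_denominator - logits[i]
    return total / len(v)


aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

Minimizing such a loss pulls the video embedding toward its time-aligned EMG embedding, so the loss for `aligned` is lower than for `shuffled`.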

Key words: image annotation, micro-expression analysis, distal facial electromyography, micro-expression data annotation
