Document type:
Journal article: Original article
Permanent URL:
Title:
Transformer-Based Self-Supervised Multimodal Representation Learning for Wearable Emotion Recognition
Author(s):
Wu, Yujin [Auteur]
Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 [CRIStAL]
Daoudi, Mohamed [Auteur]
Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 [CRIStAL]
Amad, Ali [Auteur]
Lille Neurosciences & Cognition (LilNCog) - U 1172
Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 [CRIStAL]
Journal title:
IEEE Transactions on Affective Computing
Journal short name:
IEEE Trans. Affect. Comput.
Number:
15
Pagination:
-
Publication date:
2024-04-19
ISSN:
1949-3045
Keyword(s) in English:
multimodal fusion
physiological signals
self-supervised learning
transformers
Emotion recognition
HAL discipline(s):
Life Sciences [q-bio]
Abstract in English:
Recently, wearable emotion recognition based on peripheral physiological signals has drawn massive attention due to its less invasive nature and its applicability in real-life scenarios. However, how to effectively fuse multimodal data remains a challenging problem. Moreover, traditional fully-supervised approaches suffer from overfitting given limited labeled data. To address these issues, we propose a novel self-supervised learning (SSL) framework for wearable emotion recognition, where efficient multimodal fusion is realized with temporal convolution-based modality-specific encoders and a transformer-based shared encoder, capturing both intra-modal and inter-modal correlations. Extensive unlabeled data are automatically assigned labels by five signal transforms, and the proposed SSL model is pre-trained with signal transformation recognition as a pretext task, allowing the extraction of generalized multimodal representations for emotion-related downstream tasks. For evaluation, the proposed SSL model was first pre-trained on a large-scale self-collected physiological dataset, and the resulting encoder was subsequently frozen or fine-tuned on three public supervised emotion recognition datasets. Ultimately, our SSL-based method achieved state-of-the-art results in various emotion classification tasks. Moreover, the proposed model proved more accurate and robust than fully-supervised methods in low-data regimes.
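The abstract describes a pretext task in which unlabeled windows are perturbed by one of five signal transforms and the transform index serves as the pseudo-label. The record does not list which five transforms the authors use, so the sketch below is a minimal, hypothetical illustration with five transforms commonly seen in SSL for time series (noise, scaling, negation, time-flip, segment permutation), not the paper's actual implementation.

```python
import numpy as np

# Hypothetical pretext-task labeling: perturb a signal window with one of
# five transforms; the transform index is the self-supervised label.
# These five transforms are assumptions, not taken from the paper.

def add_noise(x, rng):
    return x + rng.normal(0.0, 0.05, size=x.shape)

def scale(x, rng):
    return x * rng.uniform(0.7, 1.3)

def negate(x, rng):
    return -x

def time_flip(x, rng):
    return x[::-1].copy()

def permute(x, rng, n_segments=4):
    # Split the window into segments and shuffle their order.
    segs = np.array_split(x, n_segments)
    order = rng.permutation(n_segments)
    return np.concatenate([segs[i] for i in order])

TRANSFORMS = [add_noise, scale, negate, time_flip, permute]

def make_pretext_pair(window, rng):
    """Return (transformed window, transform index) for SSL pre-training."""
    k = int(rng.integers(len(TRANSFORMS)))
    return TRANSFORMS[k](window, rng), k

rng = np.random.default_rng(0)
window = np.sin(np.linspace(0, 8 * np.pi, 256))  # toy 1-D signal window
x_aug, y = make_pretext_pair(window, rng)
```

During pre-training, an encoder would be trained to classify `y` from `x_aug`; the frozen or fine-tuned encoder is then reused on the labeled emotion datasets, as the abstract outlines.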
Language:
English
Audience:
International
Popular science:
No
Institution(s):
Université de Lille
Inserm
CHU Lille
Collections:
Deposit date:
2024-05-06T23:15:33Z
2024-05-31T11:10:51Z
Files
- document
- Open access
- Access the document