HCiT: Deepfake Video Detection Using a Hybrid Model of CNN features and Vision Transformer

Kaddar, Bachir; Fezza, Sid Ahmed; Hamidouche, Wassim; Akhtar, Zahid; Hadid, Abdenour

Document type:

Conference paper with proceedings

DOI :

10.1109/VCIP53242.2021.9675402

Title:

HCiT: Deepfake Video Detection Using a Hybrid Model of CNN features and Vision Transformer

Author(s):

Kaddar, Bachir [Author]
Fezza, Sid Ahmed [Author]
Hamidouche, Wassim [Author]
Institut d'Électronique et des Technologies du numéRique [IETR]
Akhtar, Zahid [Author]
Hadid, Abdenour [Author]
COMmunications NUMériques - IEMN [COMNUM - IEMN]
Institut d’Électronique, de Microélectronique et de Nanotechnologie - UMR 8520 [IEMN]

Conference title:

2021 International Conference on Visual Communications and Image Processing (VCIP)

City:

Munich

Country:

Germany

Conference start date:

2021-12-05

Publisher:

IEEE

English keyword(s):

DeepFake video
detection
convolutional neural network
vision transformer
hybrid

HAL discipline(s):

Computer Science [cs]/Signal and Image Processing [eess.SP]

English abstract: [en]

The volume of new falsified video content is increasing dramatically, making the need for effective deepfake detection methods more urgent than ever. Even though many existing deepfake detection approaches show promising results, the majority of them still suffer from a number of critical limitations. In general, they generalize poorly to unseen or new deepfake generation methods. Consequently, in this paper, we propose a deepfake detection method called HCiT, which combines a Convolutional Neural Network (CNN) with a Vision Transformer (ViT). The HCiT hybrid architecture pairs the CNN's ability to extract local information with the ViT's self-attention mechanism to improve detection accuracy. In this hybrid architecture, the feature maps extracted by the CNN are fed into a ViT model that determines whether a given video is fake or real. Experiments were performed on the FaceForensics++ and DeepFake Detection Challenge preview datasets, and the results show that the proposed method significantly outperforms state-of-the-art methods. In addition, the HCiT method shows a strong capacity for generalization on datasets covering various deepfake generation techniques. The source code is available at: https://github.com/KADDAR-Bachir/HCiT
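
The data flow described in the abstract (CNN feature maps turned into a token sequence, passed through self-attention, then classified as fake or real) can be illustrated with a minimal, standard-library-only sketch. This is not the authors' implementation, which is in the repository linked above. Everything here is an assumption made for illustration: the random tensor standing in for a CNN output, the function names, the all-zero class token, the identity query/key/value projections in place of learned ones, and the untrained linear head.

```python
import math
import random


def feature_map_to_tokens(fmap):
    """Flatten a C x H x W feature map into H*W tokens, each of dimension C."""
    channels = len(fmap)
    height, width = len(fmap[0]), len(fmap[0][0])
    return [[fmap[c][i][j] for c in range(channels)]
            for i in range(height) for j in range(width)]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Assumption: identity Q/K/V projections; a real ViT learns these.
    """
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[i] for w, v in zip(weights, tokens))
                    for i in range(d)])
    return out


def classify(fmap, cls_token, head_weights, head_bias=0.0):
    """Prepend a class token, attend, and score the class token's output."""
    tokens = [cls_token] + feature_map_to_tokens(fmap)
    attended = self_attention(tokens)
    logit = sum(w * x for w, x in zip(head_weights, attended[0])) + head_bias
    return 1.0 / (1.0 + math.exp(-logit))  # sigmoid: score in (0, 1)


# Hypothetical stand-in for a CNN output: 4 channels on a 3 x 3 grid.
random.seed(0)
C, H, W = 4, 3, 3
fmap = [[[random.uniform(-1, 1) for _ in range(W)] for _ in range(H)]
        for _ in range(C)]
cls_token = [0.0] * C                                    # untrained class token
head_weights = [random.uniform(-1, 1) for _ in range(C)]  # untrained head

p_fake = classify(fmap, cls_token, head_weights)
print(f"{H * W} spatial tokens of dimension {C}; score = {p_fake:.3f}")
```

Because every parameter is random or fixed, the score carries no meaning. The sketch only shows how each spatial position of a feature map becomes one token, so that attention can relate locally extracted features across the whole face region.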

Language:

English

Peer reviewed:

Yes

Audience:

International

Popular science:

No

Collections :

Institut d'Électronique, de Microélectronique et de Nanotechnologie (IEMN) - UMR 8520

Source :

Harvested from HAL
