Official PyTorch Implementation of the Paper:
Ștefan Smeu, Dragoș-Alexandru Boldisor, Dan Oneață and Elisabeta Oneață
Circumventing shortcuts in audio-visual deepfake detection datasets with unsupervised learning
CVPR, 2025
To set up your data, follow these steps:
Download the datasets:
- AV-Deepfake1M (AV1M) dataset: follow the instructions from the AV-Deepfake1M repository
- FakeAVCeleb dataset: follow the instructions from the FakeAVCeleb GitHub repository
- AVLips dataset: follow the instructions from the LipFD GitHub repository
# clone/install AV-Hubert
git clone https://github.com/facebookresearch/av_hubert.git
cd av_hubert/avhubert
git submodule init
git submodule update
cd ../fairseq
pip install --editable ./
cd ../avhubert
# install additional files for AV-Hubert
mkdir -p content/data/misc/
wget http://dlib.net/files/shape_predictor_68_face_landmarks.dat.bz2 -O content/data/misc/shape_predictor_68_face_landmarks.dat.bz2
bzip2 -d content/data/misc/shape_predictor_68_face_landmarks.dat.bz2
wget --content-disposition https://github.com/mpc001/Lipreading_using_Temporal_Convolutional_Networks/raw/master/preprocessing/20words_mean_face.npy -O content/data/misc/20words_mean_face.npy
cd ../../
# moving our feature extraction files into avhubert space
cp deepfake_feature_extraction.py av_hubert/avhubert/deepfake_feature_extraction.py
cp deepfake_preprocess.py av_hubert/avhubert/deepfake_preprocess.py
# download avhubert checkpoint
wget https://dl.fbaipublicfiles.com/avhubert/model/lrs3_vox/vsr/self_large_vox_433h.pt
mv self_large_vox_433h.pt av_hubert/avhubert/self_large_vox_433h.pt

This repository also integrates code from other repositories, including the AV-HuBERT codebase set up above.
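After the setup steps above, a quick sanity check can confirm that the downloaded files landed where the later scripts expect them. The snippet below is an illustrative helper, not part of the repository; the paths mirror the commands in this section (relative to the repo root):

```python
import os

# Files fetched by the setup commands above (paths are relative to the repo
# root; adjust them if you cloned AV-Hubert elsewhere).
REQUIRED_FILES = [
    "av_hubert/avhubert/self_large_vox_433h.pt",
    "av_hubert/avhubert/content/data/misc/shape_predictor_68_face_landmarks.dat",
    "av_hubert/avhubert/content/data/misc/20words_mean_face.npy",
]

def missing_files(paths):
    """Return the subset of paths that do not exist on disk."""
    return [p for p in paths if not os.path.isfile(p)]

if __name__ == "__main__":
    missing = missing_files(REQUIRED_FILES)
    if missing:
        print("Missing files:", *missing, sep="\n  ")
    else:
        print("All setup files found.")
```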
Main prerequisites:

- Python 3.10.14
- pytorch=2.2.0 (older version for compatibility with AV-HuBERT)
- pytorch-cuda=12.4
- lightning=2.4.0
- torchvision>=0.17
- scikit-learn>=1.3.2
- pandas>=2.1.1
- numpy>=1.26.4
- pillow>=10.0.1
- librosa>=0.9.1
- dlib>=19.24.9
- skvideo>=1.1.10
- ffmpeg>=4.3
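An existing environment can be sanity-checked against these version floors with a small comparison helper. The snippet below is illustrative and not part of the repository; the version floors are copied from the list above, and the helper assumes plain dotted version strings:

```python
import sys

def version_ok(current, minimum):
    """Compare dotted version strings numerically, e.g. '2.1.1' >= '2.1'.

    Assumes plain dotted versions (no suffixes like 'rc1' or '+cu121').
    """
    to_tuple = lambda v: tuple(int(x) for x in v.split("."))
    return to_tuple(current) >= to_tuple(minimum)

if __name__ == "__main__":
    assert sys.version_info >= (3, 10), "Python 3.10+ required"
    # Version floors taken from the prerequisites list above.
    for pkg, floor in [("numpy", "1.26.4"), ("pandas", "2.1.1"), ("sklearn", "1.3.2")]:
        try:
            mod = __import__(pkg)
            status = "OK" if version_ok(mod.__version__, floor) else "too old"
            print(pkg, mod.__version__, status)
        except ImportError:
            print(pkg, "not installed")
```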
- Preprocess video files: run deepfake_preprocess.py from av_hubert/avhubert. Example for AV-Deepfake1M:
python deepfake_preprocess.py \
--dataset AV1M \
--split train \
--metadata /av1m_metadata/train_metadata.csv \
--data_path /path/to/AV1M_root \
--save_path /path/to/save/output_videos_and_audio

and for FakeAVCeleb:
python deepfake_preprocess.py \
--dataset FakeAVCeleb \
--metadata /path/to/FakeAVCeleb_metadata.csv \
--data_path /path/to/FakeAVCeleb_root \
--save_path /path/to/save/output_videos_and_audio \
--category all

- Extract features: run deepfake_feature_extraction.py from av_hubert/avhubert. Example for AV-Deepfake1M:
python deepfake_feature_extraction.py \
--dataset AV1M \
--split train \
--metadata /av1m_metadata/train_metadata.csv \
--ckpt_path self_large_vox_433h.pt \
--data_path /path/to/preprocessed/data \
--save_path /path/to/save/features

and for FakeAVCeleb:
python deepfake_feature_extraction.py \
--dataset FakeAVCeleb \
--metadata /path/to/FakeAVCeleb_metadata.csv \
--ckpt_path self_large_vox_433h.pt \
--data_path /path/to/preprocessed/data \
--save_path /path/to/save/features \
--category all

Add --trimmed to extract the trimmed version of the features.
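The downstream training and evaluation scripts consume the per-video features saved above. As an illustrative sketch of inspecting one saved file (the .npy layout and the 1024-dimensional AV-HuBERT large feature size are assumptions here, not guarantees about deepfake_feature_extraction.py's output format):

```python
import os
import tempfile
import numpy as np

def load_features(path):
    """Load one per-video feature array; a (num_frames, feature_dim)
    layout is assumed for illustration."""
    return np.load(path)

if __name__ == "__main__":
    # Demo on a dummy file standing in for a real extracted feature file;
    # 1024 dims matches AV-HuBERT large's embedding size (assumed here).
    dummy = np.random.randn(50, 1024).astype(np.float32)
    with tempfile.TemporaryDirectory() as d:
        p = os.path.join(d, "video0000.npy")
        np.save(p, dummy)
        feats = load_features(p)
        print(feats.shape, feats.dtype)
```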
To train the models described in the paper:

Set up the training and validation data paths in config.py, or pass them as arguments when running the training routine, as in the example below:
python train.py --name=<experiment_name> --data_root_path=<path_to_the_features_data> --metadata_root_path=<path_to_the_folder_containing_the_dataset_metadata_files>

The model weights will be saved to <save_path>/<name>.pt.
We provide weights for our AVH-Align model trained on 45000 real videos from AV-Deepfake1M in checkpoints/AVH-Align_AV1M.pt.
To evaluate a model, use/modify the following example:
python eval.py \
--checkpoint_path checkpoints/AVH-Align_AV1M.pt \
--features_path /path/to/saved/features \
--metadata /av1m_metadata/test_metadata.csv \
--dataset AV1M

The code is licensed under CC BY-NC-SA 4.0.
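eval.py scores videos against the labels in the metadata file. For reference, video-level AUC over fakeness scores can be computed with scikit-learn as in the generic sketch below (this is not the repository's exact evaluation code; the toy labels and scores are made up):

```python
from sklearn.metrics import roc_auc_score

def video_auc(labels, scores):
    """Video-level AUC of fakeness scores against binary labels (1 = fake)."""
    return roc_auc_score(labels, scores)

if __name__ == "__main__":
    # Toy values standing in for eval.py outputs.
    labels = [0, 0, 1, 1]
    scores = [0.1, 0.4, 0.35, 0.8]
    print(video_auc(labels, scores))  # -> 0.75
```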
If you find this work useful in your research, please cite it:
@InProceedings{AVH-Align,
author = {Smeu, Stefan and Boldisor, Dragos-Alexandru and Oneata, Dan and Oneata, Elisabeta},
title = {Circumventing shortcuts in audio-visual deepfake detection datasets with unsupervised learning},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2025}
}