Official PyTorch Implementation of the Paper:
Ștefan Smeu, Dragoș-Alexandru Boldisor, Dan Oneață and Elisabeta Oneață
Circumventing shortcuts in audio-visual deepfake detection datasets with unsupervised learning
CVPR, 2025
To set up your data, follow these steps:
Download the datasets:
- AV-Deepfake1M (AV1M) dataset: follow the instructions from the AV-Deepfake1M repository
- FakeAVCeleb dataset: follow the instructions from the FakeAVCeleb GitHub repository
- AVLips dataset: follow the instructions from the LipFD GitHub repository
# clone/install AV-Hubert
git clone https://github.com/facebookresearch/av_hubert.git
cd av_hubert/avhubert
git submodule init
git submodule update
cd ../fairseq
pip install --editable ./
cd ../avhubert
# install additional files for AV-Hubert
mkdir -p content/data/misc/
wget http://dlib.net/files/shape_predictor_68_face_landmarks.dat.bz2 -O content/data/misc/shape_predictor_68_face_landmarks.dat.bz2
bzip2 -d content/data/misc/shape_predictor_68_face_landmarks.dat.bz2
wget --content-disposition https://github.com/mpc001/Lipreading_using_Temporal_Convolutional_Networks/raw/master/preprocessing/20words_mean_face.npy -O content/data/misc/20words_mean_face.npy
cd ../../
# moving our feature extraction files into avhubert space
cp deepfake_feature_extraction.py av_hubert/avhubert/deepfake_feature_extraction.py
cp deepfake_preprocess.py av_hubert/avhubert/deepfake_preprocess.py
# download avhubert checkpoint
wget https://dl.fbaipublicfiles.com/avhubert/model/lrs3_vox/vsr/self_large_vox_433h.pt
mv self_large_vox_433h.pt av_hubert/avhubert/self_large_vox_433h.pt

This repository also integrates code from other repositories, including the AV-HuBERT codebase set up above.
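After the setup steps above, a quick sanity check can confirm that the downloaded files landed where the later scripts expect them. The snippet below is an illustrative helper, not part of the repository; the paths mirror the commands in this section (relative to the repo root):

```python
import os

# Files fetched by the setup commands above (paths are relative to the repo
# root; adjust them if you cloned AV-Hubert elsewhere).
REQUIRED_FILES = [
    "av_hubert/avhubert/self_large_vox_433h.pt",
    "av_hubert/avhubert/content/data/misc/shape_predictor_68_face_landmarks.dat",
    "av_hubert/avhubert/content/data/misc/20words_mean_face.npy",
]

def missing_files(paths):
    """Return the subset of paths that do not exist on disk."""
    return [p for p in paths if not os.path.isfile(p)]

if __name__ == "__main__":
    missing = missing_files(REQUIRED_FILES)
    if missing:
        print("Missing files:", *missing, sep="\n  ")
    else:
        print("All setup files found.")
```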
Main prerequisites:

- Python 3.10.14
- pytorch=2.2.0 (older version for compatibility with AV-HuBERT)
- pytorch-cuda=12.4
- lightning=2.4.0
- torchvision>=0.17
- scikit-learn>=1.3.2
- pandas>=2.1.1
- numpy>=1.26.4
- pillow>=10.0.1
- librosa>=0.9.1
- dlib>=19.24.9
- skvideo>=1.1.10
- ffmpeg>=4.3
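An existing environment can be sanity-checked against these version floors with a small comparison helper. The snippet below is illustrative and not part of the repository; the version floors are copied from the list above, and the helper assumes plain dotted version strings:

```python
import sys

def version_ok(current, minimum):
    """Compare dotted version strings numerically, e.g. '2.1.1' >= '2.1'.

    Assumes plain dotted versions (no suffixes like 'rc1' or '+cu121').
    """
    to_tuple = lambda v: tuple(int(x) for x in v.split("."))
    return to_tuple(current) >= to_tuple(minimum)

if __name__ == "__main__":
    assert sys.version_info >= (3, 10), "Python 3.10+ required"
    # Version floors taken from the prerequisites list above.
    for pkg, floor in [("numpy", "1.26.4"), ("pandas", "2.1.1"), ("sklearn", "1.3.2")]:
        try:
            mod = __import__(pkg)
            status = "OK" if version_ok(mod.__version__, floor) else "too old"
            print(pkg, mod.__version__, status)
        except ImportError:
            print(pkg, "not installed")
```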
- Preprocess video files: run deepfake_preprocess.py from av_hubert/avhubert. Example for AV-Deepfake1M:
python deepfake_preprocess.py \
--dataset AV1M \
--split train \
--metadata /av1m_metadata/train_metadata.csv \
--data_path /path/to/AV1M_root \
--save_path /path/to/save/output_videos_and_audio

and for FakeAVCeleb:
python deepfake_preprocess.py \
--dataset FakeAVCeleb \
--metadata /path/to/FakeAVCeleb_metadata.csv \
--data_path /path/to/FakeAVCeleb_root \
--save_path /path/to/save/output_videos_and_audio \
--category all

- Extract features: run deepfake_feature_extraction.py from av_hubert/avhubert. Example for AV-Deepfake1M:
python deepfake_feature_extraction.py \
--dataset AV1M \
--split train \
--metadata /av1m_metadata/train_metadata.csv \
--ckpt_path self_large_vox_433h.pt \
--data_path /path/to/preprocessed/data \
--save_path /path/to/save/features

and for FakeAVCeleb:
python deepfake_feature_extraction.py \
--dataset FakeAVCeleb \
--metadata /path/to/FakeAVCeleb_metadata.csv \
--ckpt_path self_large_vox_433h.pt \
--data_path /path/to/preprocessed/data \
--save_path /path/to/save/features \
--category all

Add --trimmed to extract the trimmed version of the features.
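The downstream training and evaluation scripts consume the per-video features saved above. As an illustrative sketch of inspecting one saved file (the .npy layout and the 1024-dimensional AV-HuBERT large feature size are assumptions here, not guarantees about deepfake_feature_extraction.py's output format):

```python
import os
import tempfile
import numpy as np

def load_features(path):
    """Load one per-video feature array; a (num_frames, feature_dim)
    layout is assumed for illustration."""
    return np.load(path)

if __name__ == "__main__":
    # Demo on a dummy file standing in for a real extracted feature file;
    # 1024 dims matches AV-HuBERT large's embedding size (assumed here).
    dummy = np.random.randn(50, 1024).astype(np.float32)
    with tempfile.TemporaryDirectory() as d:
        p = os.path.join(d, "video0000.npy")
        np.save(p, dummy)
        feats = load_features(p)
        print(feats.shape, feats.dtype)
```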
To train the models described in the paper:

Set up the training and validation data paths in config.py, or pass them as arguments when running the training routine, as in the example below:
python train.py --name=<experiment_name> --data_root_path=<path_to_the_features_data> --metadata_root_path=<path_to_the_folder_containing_the_dataset_metadata_files>

The model weights will be saved to <save_path>/<name>.pt.
We provide weights for our AVH-Align model trained on 45000 real videos from AV-Deepfake1M in checkpoints/AVH-Align_AV1M.pt.
To evaluate a model, use/modify the following example:
python eval.py \
--checkpoint_path checkpoints/AVH-Align_AV1M.pt \
--features_path /path/to/saved/features \
--metadata /av1m_metadata/test_metadata.csv \
--dataset AV1M

The code is licensed under CC BY-NC-SA 4.0.
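eval.py scores videos against the labels in the metadata file. For reference, video-level AUC over fakeness scores can be computed with scikit-learn as in the generic sketch below (this is not the repository's exact evaluation code; the toy labels and scores are made up):

```python
from sklearn.metrics import roc_auc_score

def video_auc(labels, scores):
    """Video-level AUC of fakeness scores against binary labels (1 = fake)."""
    return roc_auc_score(labels, scores)

if __name__ == "__main__":
    # Toy values standing in for eval.py outputs.
    labels = [0, 0, 1, 1]
    scores = [0.1, 0.4, 0.35, 0.8]
    print(video_auc(labels, scores))  # -> 0.75
```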
If you find this work useful in your research, please cite it:
@InProceedings{AVH-Align,
author = {Smeu, Stefan and Boldisor, Dragos-Alexandru and Oneata, Dan and Oneata, Elisabeta},
title = {Circumventing shortcuts in audio-visual deepfake detection datasets with unsupervised learning},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2025}
}