This repository contains code for zero-shot, metadata-conditioned personalisation of wav2vec speech models. Personalisation is achieved through an adaptation of hyperformers; the relevant reused code is assembled in ./hyperformer.
You can use either poetry (`poetry install`) or pip (`pip install -r requirements.txt`).
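For a fresh clone, the two options might look like this (a minimal sketch; the virtual environment setup is just one common choice and not prescribed by this repository):

```bash
# Option 1: let poetry create an environment and install the locked dependencies
poetry install

# Option 2: install the pinned requirements with pip, e.g. inside a venv
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
```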
The training code is configured via .json files (you can find templates for the wav2vec baseline and for personalisation in ./configs). You have to adapt the paths to your data directory (`data_base`), your label directory (`label_base`, containing train.csv, dev.csv and test.csv) and the csv file containing metadata for every subject (`metadata_file`).
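As a minimal sketch, the path-related part of such a config might look as follows. Only `data_base`, `label_base` and `metadata_file` are named in this README; the values and the remaining keys (`subject_column`, `target`) are illustrative placeholders, so consult the templates in ./configs for the authoritative structure:

```json
{
  "data_base": "/data/audio",
  "label_base": "/data/labels",
  "metadata_file": "/data/metadata.csv",
  "subject_column": "subject_id",
  "target": "label"
}
```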
The label csvs should have this header:
```
filename,[subject_column],[target]
```

where `filename` is the relative path to each audio file from `data_base`. `[subject_column]` and `[target]` are defined through the config file and should contain the key used for personalisation (e.g., a participant code) and the label for the audio file, respectively.
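Assuming `[subject_column]` is set to `subject_id` and `[target]` to `label` (placeholder names from the sketch above), a matching train.csv could look like this, with paths relative to `data_base`:

```csv
filename,subject_id,label
subj_001/recording_01.wav,subj_001,0
subj_001/recording_02.wav,subj_001,1
subj_002/recording_01.wav,subj_002,0
```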
The metadata csv should follow the format:
```
[subject_column],[m_1],[m_2],...,[m_N]
```

where `[subject_column]` is defined in the config file and should have matching keys with the subjects defined for the audio files in the labels. `[m_i]` are metadata columns containing numeric values. The config files can be adjusted to define which columns will be utilised for personalisation.
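Continuing the placeholder example, a metadata csv with two numeric metadata columns (here the made-up columns `age` and `speech_rate`) could look like:

```csv
subject_id,age,speech_rate
subj_001,34,1.2
subj_002,27,0.9
```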
Run
```bash
python -m hyperpersonalisation.trainer configs/wav2vec.json
```

to fine-tune the wav2vec encoder.
After fine-tuning, run
```bash
python -m hyperpersonalisation.trainer configs/hyperpersonalisation_all.json
```

to train the personalisation components.
Please direct any questions or requests to Maurice Gerczuk (maurice.gerczuk at informatik.uni-augsburg.de).