AST: Audio Spectrogram Transformer (Yuan Gong, Yu-An Chung, James Glass). The main idea is to apply a vision transformer to the spectrogram of a given audio signal in order to extract features for classification. The model was pretrained on the AudioSet dataset, which contains a variety of labeled YouTube audio clips (labels include music, bark, engine, etc.). [https://huggingface.co/docs/transformers/en/model_doc/audio-spectrogram-transformer]

"GTZAN is a dataset for musical genre classification of audio signals. The dataset consists of 1,000 audio tracks, each of 30 seconds long. It contains 10 genres, each represented by 100 tracks. The tracks are all 22,050Hz Mono 16-bit audio files in WAV format. The genres are: blues, classical, country, disco, hiphop, jazz, metal, pop, reggae, and rock."[https://huggingface.co/datasets/marsyas/gtzan]
The following steps are used to adapt AST to the new task during training:
- Freezing the pretrained AST model layers.
- Replacing the last layer with a 2-layer MLP with dropout, and adding an optional DoRA wrapper on the last layer of each encoder block's feed-forward network.
- Adding sound augmentation to diversify the small dataset (slow/fast, lowpass/highpass, echo, and mixes of these).
- Optuna hyperparameter search.
80% of the GTZAN samples were used as the training set, and the rest were split equally into validation and test sets (10% each).
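The echo and slow/fast augmentations listed above can be sketched in plain numpy (the repo may implement these with a dedicated audio-augmentation library such as audiomentations; the delay, decay, and rate values here are illustrative):

```python
import numpy as np

def add_echo(x, sr, delay_s=0.25, decay=0.4):
    # Mix a delayed, attenuated copy of the signal back into itself.
    d = int(delay_s * sr)
    y = np.copy(x).astype(np.float32)
    y[d:] += decay * x[:-d]
    return y / max(1.0, np.abs(y).max())  # avoid clipping

def change_speed(x, rate=1.1):
    # Naive slow/fast: resample by linear interpolation
    # (rate > 1 speeds the clip up and shortens it).
    idx = np.arange(0, len(x), rate)
    return np.interp(idx, np.arange(len(x)), x).astype(np.float32)
```

Applying a random mix of such transforms to each 30-second clip effectively multiplies the variety of the small GTZAN training set.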
Using the method above for training and validation, the classification accuracy achieved on the test set was 83-87%. Test set confusion matrix:

Cross entropy loss vs training iterations:

Validation accuracy vs training iterations (best model validation accuracy is 87-89%):

Use git to clone the repository with the following command:
git clone https://github.com/taldatech/ee046211-deep-learning.git
If an ece046211 virtual environment is already installed on your machine, activate it (conda activate deep_learn), skip to the transformers package installation in the table below, and continue from there.
Else:
- Get Anaconda with Python 3, follow the instructions according to your OS (Windows/Mac/Linux) at: https://www.anaconda.com/download
- Create a new environment for the course and install packages from scratch:
  - In Windows open Anaconda Prompt from the start menu; in Mac/Linux open the terminal. Run conda create --name deep_learn python=3.9. Full guide at https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#creating-an-environment-with-commands
  - To activate the environment, open the terminal (or Anaconda Prompt in Windows) and run conda activate deep_learn
  - Install the required libraries according to the table below (to search for a specific library and the corresponding command you can also look at https://anaconda.org/)
| Library | Command to Run |
|---|---|
| Jupyter Notebook | conda install -c conda-forge notebook |
| numpy | conda install -c conda-forge numpy |
| matplotlib | conda install -c conda-forge matplotlib |
| pandas | conda install -c conda-forge pandas |
| scipy | conda install -c anaconda scipy |
| scikit-learn | conda install -c conda-forge scikit-learn |
| seaborn | conda install -c conda-forge seaborn |
| tqdm | conda install -c conda-forge tqdm |
| opencv | conda install -c conda-forge opencv |
| optuna | pip install optuna |
| pytorch (cpu) | conda install pytorch torchvision torchaudio cpuonly -c pytorch (get command from PyTorch.org) |
| pytorch (gpu) | conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia (get command from PyTorch.org) |
| torchtext | conda install -c pytorch torchtext |
| torchdata | conda install -c pytorch torchdata + pip install portalocker |
| transformers | conda install -c conda-forge transformers |
| accelerate | conda install -c conda-forge accelerate |
| datasets | conda install -c conda-forge datasets |
| evaluate | conda install -c conda-forge evaluate |
| pydub | conda install -c conda-forge pydub |
| audiomentations | pip install audiomentations |
| librosa | conda install -c conda-forge librosa |
| tensorboardX | conda install -c conda-forge tensorboardX |
There are two Jupyter notebooks in the repository:
- train_test_gtzan.ipynb - Trains the AST model on the GTZAN dataset using the suggested fine-tuning method, saves the model that is most accurate on the validation set, and shows its test set results.
- test_best_model.ipynb - Tests the currently saved best music genre classification model on your own music files.

To open a notebook, open Anaconda Navigator or run jupyter notebook in the terminal (or Anaconda Prompt in Windows) while the deep_learn environment is activated.
- Yuan Gong, Yu-An Chung, and James Glass, "AST: Audio Spectrogram Transformer", Proc. Interspeech 2021, pp. 571-575, 2021.
- https://huggingface.co/docs/transformers/en/model_doc/audio-spectrogram-transformer
- https://huggingface.co/learn/audio-course/en/chapter4/fine-tuning#conclusion
- https://huggingface.co/datasets/marsyas/gtzan
