sbajamy/Music-Genre-Classification-Using-Audio-Spectrogram-Transformer


Music-Genre-Classification-Using-Audio-Spectrogram-Transformer

Fine-tune an Audio Spectrogram Transformer (AST) on the GTZAN dataset to classify music genre.


Model

AST: Audio Spectrogram Transformer (Yuan Gong, Yu-An Chung, James Glass). The main idea is to apply a vision transformer to the spectrogram of an audio signal in order to extract features for classification. The model was pretrained on the AudioSet dataset, which contains a variety of labeled YouTube audio clips (labels include music, bark, engine, etc.). [https://huggingface.co/docs/transformers/en/model_doc/audio-spectrogram-transformer]
(figure: AST architecture)
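The "spectrogram as image" idea can be sketched as follows. This is an illustrative stand-in for AST's input handling, not the authors' code: a (mel bins x time frames) spectrogram is cut into overlapping 16x16 patches (stride 10, as in the AST paper) that become the transformer's token sequence.

```python
import torch

# Hedged sketch: a 128-mel-bin spectrogram of ~10 s of audio (1024 frames)
# is split into overlapping 16x16 patches, each flattened into a 256-d token.
spec = torch.randn(1, 1, 128, 1024)      # (batch, channel, mel bins, frames)
unfold = torch.nn.Unfold(kernel_size=16, stride=10)
patches = unfold(spec).transpose(1, 2)   # (batch, num_patches, 16*16)
print(patches.shape)                     # 12 x 101 = 1212 patches of 256 values
```

Each of these 1212 tokens is then linearly projected and fed to the transformer encoder, exactly as image patches are in a vision transformer.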

Dataset

"GTZAN is a dataset for musical genre classification of audio signals. The dataset consists of 1,000 audio tracks, each of 30 seconds long. It contains 10 genres, each represented by 100 tracks. The tracks are all 22,050Hz Mono 16-bit audio files in WAV format. The genres are: blues, classical, country, disco, hiphop, jazz, metal, pop, reggae, and rock."[https://huggingface.co/datasets/marsyas/gtzan]

Fine-tuning method

The following techniques are used to adapt AST to the new task during training:

  • Freezing pretrained AST model layers.
  • Replacing the last layer with a 2-layer MLP with dropout, and optionally wrapping the last layer of each encoder block's feed-forward network with DoRA.
  • Adding sound augmentation to diversify the small dataset (slow/fast, lowpass/highpass, echo, and mixes of these).
  • Optuna hyperparameter search.
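The first two bullets can be sketched as follows. This uses a stand-in torch backbone rather than the actual pretrained AST, and the hidden sizes and dropout rate are illustrative, not the repository's values:

```python
import torch
import torch.nn as nn

# Stand-in for the pretrained AST encoder (illustrative only)
layer = nn.TransformerEncoderLayer(d_model=768, nhead=12, batch_first=True)
backbone = nn.TransformerEncoder(layer, num_layers=2)
for p in backbone.parameters():
    p.requires_grad = False              # freeze pretrained layers

head = nn.Sequential(                    # new trainable 2-layer MLP head
    nn.Linear(768, 256),
    nn.ReLU(),
    nn.Dropout(0.3),                     # dropout rate is illustrative
    nn.Linear(256, 10),                  # 10 GTZAN genres
)

x = torch.randn(2, 8, 768)               # (batch, tokens, hidden)
logits = head(backbone(x).mean(dim=1))   # mean-pool tokens, then classify
print(logits.shape)                      # torch.Size([2, 10])
```

Because the backbone is frozen, only the small head (and, in the repository's variant, the optional DoRA parameters) receives gradient updates, which keeps fine-tuning cheap on the small GTZAN dataset.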

Results

80% of the GTZAN samples were used as the training set, and the rest were split equally into validation and test sets (10% each).
Using the method described above during training and validation, the test-set classification accuracy reached 83-87%. Its confusion matrix:
(figure: test-set confusion matrix)
Cross-entropy loss vs. training iterations:
(figure: cross-entropy loss curve)
Validation accuracy vs. training iterations (best model validation accuracy: 87-89%):
(figure: validation accuracy curve)
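The 80/10/10 split described above can be sketched like this (the random seed is illustrative; in practice a stratified split by genre keeps the 10 classes balanced across sets):

```python
from sklearn.model_selection import train_test_split

# 1,000 GTZAN tracks -> 800 train / 100 validation / 100 test
indices = list(range(1000))
train_idx, rest = train_test_split(indices, test_size=0.2, random_state=0)
val_idx, test_idx = train_test_split(rest, test_size=0.5, random_state=0)
print(len(train_idx), len(val_idx), len(test_idx))  # 800 100 100
```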

Getting started

Use git to clone the repository:
git clone https://github.com/sbajamy/Music-Genre-Classification-Using-Audio-Spectrogram-Transformer.git
If the ee046211 course's deep_learn conda environment is already installed on your machine, activate it (conda activate deep_learn), skip to the transformers package installation in the table below and continue from there.
Otherwise:

  1. Get Anaconda with Python 3, follow the instructions according to your OS (Windows/Mac/Linux) at: https://www.anaconda.com/download
  2. Create a new environment for the course and install packages from scratch: In Windows open Anaconda Prompt from the start menu, in Mac/Linux open the terminal and run conda create --name deep_learn python=3.9. Full guide at https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#creating-an-environment-with-commands
  3. To activate the environment, open the terminal (or Anaconda Prompt in Windows) and run conda activate deep_learn
  4. Install the required libraries according to the table below (to search for a specific library and the corresponding command you can also look at https://anaconda.org/)

Libraries to Install

| Library | Command to Run |
| --- | --- |
| Jupyter Notebook | conda install -c conda-forge notebook |
| numpy | conda install -c conda-forge numpy |
| matplotlib | conda install -c conda-forge matplotlib |
| pandas | conda install -c conda-forge pandas |
| scipy | conda install -c anaconda scipy |
| scikit-learn | conda install -c conda-forge scikit-learn |
| seaborn | conda install -c conda-forge seaborn |
| tqdm | conda install -c conda-forge tqdm |
| opencv | conda install -c conda-forge opencv |
| optuna | pip install optuna |
| pytorch (cpu) | conda install pytorch torchvision torchaudio cpuonly -c pytorch (get command from PyTorch.org) |
| pytorch (gpu) | conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia (get command from PyTorch.org) |
| torchtext | conda install -c pytorch torchtext |
| torchdata | conda install -c pytorch torchdata + pip install portalocker |
| transformers | conda install -c conda-forge transformers |
| accelerate | conda install -c conda-forge accelerate |
| datasets | conda install -c conda-forge datasets |
| evaluate | conda install -c conda-forge evaluate |
| pydub | conda install -c conda-forge pydub |
| audiomentations | pip install audiomentations |
| librosa | conda install -c conda-forge librosa |
| tensorboardX | conda install -c conda-forge tensorboardX |
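The table above can equivalently be captured in a single conda environment file (a hypothetical, untested sketch of the GPU variant; the file name and layout are assumptions, not part of the repository):

```yaml
# environment.yml -- illustrative consolidation of the package table above
name: deep_learn
channels:
  - pytorch
  - nvidia
  - conda-forge
dependencies:
  - python=3.9
  - notebook
  - numpy
  - matplotlib
  - pandas
  - scipy
  - scikit-learn
  - seaborn
  - tqdm
  - opencv
  - pytorch
  - torchvision
  - torchaudio
  - pytorch-cuda=11.8
  - torchtext
  - torchdata
  - transformers
  - accelerate
  - datasets
  - evaluate
  - pydub
  - librosa
  - tensorboardx
  - pip
  - pip:
      - optuna
      - audiomentations
      - portalocker
```

With such a file, conda env create -f environment.yml would replace the step-by-step installation.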

Train and Test Notebooks

There are two jupyter notebooks in the repository:

  • train_test_gtzan.ipynb - Trains the AST model on the GTZAN dataset using the suggested fine-tuning method, saves the model that is most accurate on the validation set, and shows its test-set results.
  • test_best_model.ipynb - Tests the currently saved best music genre classification model on your own music files.

To open a notebook, open Anaconda Navigator or run jupyter notebook in the terminal (or Anaconda Prompt in Windows) while the deep_learn environment is activated.

References

  • Yuan Gong, Yu-An Chung, James Glass. "AST: Audio Spectrogram Transformer". Interspeech 2021. https://arxiv.org/abs/2104.01778
  • GTZAN dataset: https://huggingface.co/datasets/marsyas/gtzan
