This repository contains the code and resources for my Bachelor's thesis project on speech emotion augmentation with Generative Adversarial Networks (GANs), in particular CycleGAN. The project aims to transfer emotions within the speech domain, transforming the emotional characteristics of audio samples.
The project uses the Speech Emotion Recognition (SER) dataset, which comprises audio recordings from 10 speakers in each of its two languages (English and Chinese). The audio files are pre-processed into spectrograms and further quantized using the Fourier series to prepare the data for the emotion transfer process.
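As a rough illustration of the spectrogram step, here is a minimal sketch of a short-time Fourier transform written with NumPy only. It is not the repository's actual pre-processing code: the function name `log_magnitude_spectrogram`, the window size, the hop length, and the 16 kHz test tone are all assumptions chosen for the example. In practice a library such as Librosa would likely handle loading and the transform.

```python
import numpy as np


def log_magnitude_spectrogram(signal, n_fft=512, hop_length=128):
    """Return a log-magnitude spectrogram of shape (n_fft // 2 + 1, n_frames).

    Hypothetical helper: n_fft and hop_length are illustrative defaults,
    not values taken from this project.
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop_length
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack([
        signal[i * hop_length: i * hop_length + n_fft] * window
        for i in range(n_frames)
    ])
    # One real-input FFT per frame, then transpose to (frequency, time).
    magnitude = np.abs(np.fft.rfft(frames, axis=1)).T
    # log1p compresses the dynamic range and stays finite for silent frames.
    return np.log1p(magnitude)


# Synthetic stand-in for a recording: one second of a 440 Hz tone at 16 kHz.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440.0 * t)

spec = log_magnitude_spectrogram(tone)
print(spec.shape)
```

The resulting 2-D array can be treated like a single-channel image, which is what lets an image-to-image model operate on speech.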
The core methodology involves the implementation of CycleGAN, a type of GAN known for its ability to perform unpaired image-to-image translation, adapted for the transfer of emotional characteristics in speech signals. The CycleGAN framework facilitates the transformation between different emotional states in the audio domain.
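The idea that makes unpaired training possible is the cycle-consistency loss: translating a sample to the other emotion and back should reproduce the original. Below is a minimal PyTorch sketch of that term only (the adversarial losses are omitted). The function name `cycle_consistency_loss`, the single-layer toy generators, the tensor shapes, and the weight `lambda_cyc=10.0` are assumptions for illustration and are not taken from the code in this repository.

```python
import torch
import torch.nn as nn


def cycle_consistency_loss(g_ab, g_ba, real_a, real_b, lambda_cyc=10.0):
    """L1 cycle loss for two generators mapping A -> B and B -> A.

    Hypothetical helper; lambda_cyc is an illustrative weight.
    """
    l1 = nn.L1Loss()
    fake_b = g_ab(real_a)   # e.g. neutral -> angry
    fake_a = g_ba(real_b)   # e.g. angry -> neutral
    rec_a = g_ba(fake_b)    # A -> B -> A
    rec_b = g_ab(fake_a)    # B -> A -> B
    return lambda_cyc * (l1(rec_a, real_a) + l1(rec_b, real_b))


# Toy stand-ins for the generators: real ones would be deep
# encoder-decoder networks, not a single convolution.
g_ab = nn.Conv2d(1, 1, kernel_size=3, padding=1)
g_ba = nn.Conv2d(1, 1, kernel_size=3, padding=1)

# Random batches shaped like (batch, channel, frequency, time) spectrograms.
real_a = torch.randn(4, 1, 64, 64)
real_b = torch.randn(4, 1, 64, 64)

loss = cycle_consistency_loss(g_ab, g_ba, real_a, real_b)
loss.backward()  # gradients flow into both generators
```

In a full training loop this term would be added to the adversarial losses of the two discriminators before the generator update.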
Generative_Model/: Includes the implementation of the CycleGAN model and the training code for converting the source data to N target classes.
Python 3.x, PyTorch (or any other deep learning framework), and Librosa (for audio processing).