Project for the Natural Language Processing (NLP) exam – University of Salerno
DAFT.punk is a sentiment analysis system for Italian song lyrics, developed as part of the NLP course at the University of Salerno. The project leverages a fine-tuned DeBERTa model to classify lyrics into seven emotional categories, providing both a web interface and scripts for data processing and model training.
To start the web interface:

    python gradio_app.py

The web interface will be available at http://localhost:7860
- gradio_app.py – Main web interface (Gradio)
- scripts/process_lyrics.py – Script for labeling lyrics with emotions using GPT
- scripts/train_finetuned_deberta.py – Script for training the DeBERTa model
- genius_fragments_scraper.py – Scraper for collecting and fragmenting lyrics from Genius
- model/deberta_italian_v2/ – Fine-tuned Italian DeBERTa model
- data/raw/lyrics_enriched.json – Main dataset (Italian lyrics with emotion labels)
- processed/ – Training reports and metric plots
- Sentiment analysis on Italian song lyrics with 7 emotions: joy, sadness, rage, love, nostalgia, hope, fear
- Graphical visualization of emotion probabilities
- Scripts for data scraping, processing, training, and evaluation
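
The emotion probabilities mentioned above are typically obtained by applying a softmax to the classifier's seven output scores. The following is a minimal sketch of that step in plain Python, not the project's actual code: the label order in `EMOTIONS` and the example logits are assumptions made for illustration, and the real order depends on the label mapping saved with the model.

```python
import math

# Assumed label order (hypothetical); the real order comes from the model's label mapping.
EMOTIONS = ["joy", "sadness", "rage", "love", "nostalgia", "hope", "fear"]


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def rank_emotions(logits):
    """Pair each emotion with its probability, most likely first."""
    if len(logits) != len(EMOTIONS):
        raise ValueError("expected one logit per emotion")
    probs = softmax(logits)
    return sorted(zip(EMOTIONS, probs), key=lambda pair: pair[1], reverse=True)


# Made-up logits for a single lyric fragment, for illustration only.
example_logits = [0.2, 2.1, -0.5, 0.9, 1.4, -1.0, 0.0]
ranking = rank_emotions(example_logits)

for emotion, prob in ranking:
    print(f"{emotion:<10} {prob:.3f}")
```

A ranked list of (label, probability) pairs like this is the kind of data a bar chart of emotion probabilities can be drawn from.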
- Model: osiria/deberta-base-italian, fine-tuned on a custom dataset of Italian song lyrics
- Dataset: ~3MB of Italian lyrics, each labeled with one of 7 emotions (labels generated via GPT-4 and manually reviewed)
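
Since each lyric carries exactly one of the seven labels, a quick look at the label distribution can help spot class imbalance before training. The sketch below is an illustration under assumptions: the README does not document the schema of the dataset file, so the field names `lyrics` and `emotion` and the sample records are hypothetical placeholders.

```python
import json
from collections import Counter


def count_labels(records, label_key="emotion"):
    """Count how many records carry each emotion label.

    `label_key` is a hypothetical field name; adjust it to the real schema.
    """
    return Counter(record[label_key] for record in records)


# Made-up records standing in for the real dataset file.
sample_json = """
[
  {"lyrics": "fragment A", "emotion": "nostalgia"},
  {"lyrics": "fragment B", "emotion": "joy"},
  {"lyrics": "fragment C", "emotion": "nostalgia"}
]
"""

records = json.loads(sample_json)
counts = count_labels(records)

for label, n in counts.most_common():
    print(f"{label}: {n}")
```

To run this against the real data, `json.loads(sample_json)` could be replaced with `json.load` on the opened dataset file, assuming the file holds a list of objects.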
Please keep in mind that training was heavily limited by budget and hardware constraints.
- Accuracy: 0.6767
- Epochs: 10
- Batch size: 16
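
For reference, accuracy is the fraction of examples whose predicted label matches the true label. The snippet below is a small sketch of that calculation with made-up labels. It is not the project's evaluation code, and it does not reproduce the 0.6767 figure.

```python
def accuracy(true_labels, predicted_labels):
    """Return the share of predictions that match the true labels."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("at least one example is required")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# Made-up labels for illustration: 3 of 4 predictions are correct.
y_true = ["joy", "fear", "love", "hope"]
y_pred = ["joy", "fear", "rage", "hope"]
score = accuracy(y_true, y_pred)

print(f"accuracy: {score:.4f}")
```

With seven classes, picking a label uniformly at random would score about 1/7 ≈ 0.14, which gives some context for the reported value.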
Install the required dependencies with:
    pip install -r requirements.txt

Main dependencies:
- torch
- transformers
- datasets
- gradio
- pandas
- numpy
- openai
- beautifulsoup4
- To launch the web app:
    python gradio_app.py
- To process and label lyrics:
    python scripts/process_lyrics.py
- To train the model:
    python scripts/train_finetuned_deberta.py