This repository contains experiments for detecting depression from speech using deep learning models. The project leverages the autrainer toolkit, a modular framework built on PyTorch and Hydra for reproducible computer audition research.
The primary goal is to classify patients as depressive or non-depressive based on features extracted from their speech. Two main architectural approaches are explored:
- CNN10 Model: A Convolutional Neural Network trained directly on log-Mel spectrograms for binary classification.
- CNN-LSTM Model: Uses a pre-trained CNN10 as a feature extractor, followed by an LSTM with an attention mechanism to model temporal sequences (see the sketch below).
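As a rough illustration of the second approach, the sketch below shows an LSTM over CNN-extracted feature sequences with a simple attention pooling over time. The layer sizes, feature dimension, and attention variant are assumptions for illustration only and do not necessarily match the implementation in lstm_standalone.py.

```python
import torch
import torch.nn as nn

class AttentionLSTMClassifier(nn.Module):
    """Illustrative sketch: LSTM over per-snippet CNN features with
    attention pooling over time (dimensions are assumptions)."""

    def __init__(self, feat_dim: int = 512, hidden: int = 128, num_classes: int = 2):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)              # one score per timestep
        self.classifier = nn.Linear(2 * hidden, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, feat_dim) sequence of CNN features per patient
        h, _ = self.lstm(x)                               # (batch, time, 2*hidden)
        weights = torch.softmax(self.attn(h), dim=1)      # (batch, time, 1)
        pooled = (weights * h).sum(dim=1)                 # attention-weighted sum
        return self.classifier(pooled)                    # (batch, num_classes) logits

# Example: a batch of 4 patients, 30 snippets each, 512-dimensional features
logits = AttentionLSTMClassifier()(torch.randn(4, 30, 512))
```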
## Contents

- Project Structure
- 1. Setup and Installation
- 2. Environment Configuration
- 3. Data Preparation
- 4. Running Experiments
- 5. Post-processing and Analyzing Results
- 6. Inference
- 7. Manual Evaluation
## Project Structure

The repository is organized to support reproducible experiments with autrainer and Hydra.
```
/
├── conf/             # Hydra/autrainer configuration files for datasets, models, etc.
├── data/             # Processed data (e.g., features, spectrograms).
├── data_raw/         # Raw data downloaded by fetch commands.
├── results/          # Experiment outputs, logs, and trained models.
├── requirements.txt  # Project dependencies.
├── postprocess.sh    # Helper script to analyze results.
└── README.md         # This file.
```
## 1. Setup and Installation

Follow these steps to set up the environment and install the necessary dependencies.

```bash
git clone <repository-url>
cd <repository-name>
```

It is highly recommended to use a virtual environment:

```bash
python -m venv venv
source venv/bin/activate
```

Install all required packages, including autrainer and its dependencies:

```bash
pip install -r requirements.txt
```

This project may require openSMILE for certain feature extraction configurations. If needed, install it separately and ensure it is available on your system's PATH. You can install the optional openSMILE dependency for autrainer with:

```bash
pip install autrainer[opensmile]
```

## 2. Environment Configuration

Before running the experiments, you need to configure the dataset URL in an environment file.
Copy the example environment file and fill in the Extended DAIC-WOZ dataset link:
```bash
cp .env.example .env
```

Then edit the .env file and replace <LINK TO DATASET> with the actual URL to the Extended DAIC-WOZ dataset. You can reference .env.example to see the expected format:

```
BASE_URL=<LINK TO DATASET>
```

## 3. Data Preparation

The experiments rely on the Extended DAIC-WOZ dataset. The following commands download and preprocess the data into the required format.
This command downloads the necessary datasets and pre-trained model weights specified in the configuration files.
```bash
autrainer fetch
```

This command processes the raw audio files into the features required for training (e.g., log-Mel spectrograms). The configurations in the conf/ directory are set up to handle this automatically.
```bash
autrainer preprocess
```
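For reference, a log-Mel spectrogram of the kind used as model input can be computed with torchaudio as in the sketch below. This only illustrates the feature type; the actual extraction and its parameters (sample rate, window, number of Mel bands) are defined by the preprocessing configurations in conf/ and executed by autrainer preprocess, so the values here are assumptions.

```python
import torchaudio

# Illustrative log-Mel extraction; the real parameters live in the conf/ files.
waveform, sr = torchaudio.load("example.wav")                 # hypothetical input file
waveform = torchaudio.functional.resample(waveform, sr, 16000)

mel = torchaudio.transforms.MelSpectrogram(
    sample_rate=16000, n_fft=512, hop_length=160, n_mels=64   # assumed values
)(waveform)
log_mel = torchaudio.transforms.AmplitudeToDB()(mel)          # (channels, n_mels, frames)
```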
## 4. Running Experiments

All experiments are managed by autrainer and can be launched from the command line. The results, including logs, model checkpoints, and plots, are saved in the results/ directory.

### CNN10 Model

This experiment trains a CNN10 model directly on log-Mel spectrograms for end-to-end depression classification.
Run the baseline training with a fixed dataset split:
```bash
autrainer train -cn config-fixed
```

### CNN-LSTM Model

This workflow involves two stages: extracting features with the CNN, and then training the LSTM on those features.
First, run the feature extraction script. This uses the pre-trained CNN10 model to generate feature sequences for each patient and saves them in data/ExtendedDAIC-lstm/features/.
```bash
python extract_cnn_features.py \
    --data_path data/ExtendedDAIC-16k \
    --output_path data/ExtendedDAIC-lstm \
    --model_path model.pt
```
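Conceptually, this stage runs each of a patient's audio snippets through the CNN10 backbone and stacks the resulting embeddings into one feature sequence per patient. The outline below is a hedged sketch of that idea under assumed interfaces (a model returning one fixed-size embedding per snippet); it is not the actual contents of extract_cnn_features.py.

```python
import numpy as np
import torch

def extract_patient_sequence(model: torch.nn.Module,
                             snippets: list[torch.Tensor]) -> np.ndarray:
    """Stack per-snippet CNN embeddings into a (time, feat_dim) sequence.

    Assumes `model(spectrogram)` returns a fixed-size embedding per snippet;
    the real script's interface may differ.
    """
    model.eval()
    with torch.no_grad():
        embeddings = [model(s.unsqueeze(0)).squeeze(0) for s in snippets]
    return torch.stack(embeddings).cpu().numpy()

# Hypothetical usage: one .npy feature sequence per patient, saved under
# data/ExtendedDAIC-lstm/features/, e.g.
#   np.save(f"data/ExtendedDAIC-lstm/features/{patient_id}.npy", sequence)
```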
Once the features are extracted, train the LSTM model. This script runs a standalone training process and logs results to Weights & Biases.

```bash
python lstm_standalone.py
```

## 5. Post-processing and Analyzing Results

autrainer provides tools to analyze the results of your experiments, especially for grid searches.
To summarize the results of an experiment and aggregate across different seeds:
```bash
# Replace <experiment_id> with the one from your config (e.g., cnn10-fixed)
autrainer postprocess results/<experiment_id> --aggregate seed
```

This will generate summary CSVs and plots in the results/<experiment_id>/summary/ directory. You can also use the helper script:

```bash
./postprocess.sh <experiment_id>
```

## 6. Inference

To run inference on new audio files using a trained model, use the autrainer inference command.
You need to point to a trained model directory, an input directory with audio files, and an output directory.
```bash
# Example command for a trained model from the 'cnn10-fixed' experiment.
# Note: the path to the specific run may vary.
autrainer inference \
    results/cnn10-fixed/training/ExtendedDAIC-16k-fixed_CNN10-binary_Adam_0.0001_32_epoch_50_None_None_42/ \
    /path/to/your/input_audio/ \
    /path/to/your/output_predictions/ \
    --preprocess-cfg log_mel_16k \
    --device cuda:0
```

## 7. Manual Evaluation

The repository includes a script to manually calculate detailed metrics from prediction files. After generating depression_predictions.csv and snippet_predictions.csv (e.g., using predict_depression.py), you can get a full evaluation report:
```bash
python calculate_metrics.py
```
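If you want to inspect the numbers directly, the patient-level predictions can also be scored with scikit-learn, as in the hedged sketch below. The file and column names (label, prediction) are assumptions; check calculate_metrics.py and the CSVs produced by predict_depression.py for the actual schema.

```python
import pandas as pd
from sklearn.metrics import (accuracy_score, balanced_accuracy_score,
                             classification_report, confusion_matrix)

# Assumed file and column names; adjust to the actual CSV schema.
df = pd.read_csv("depression_predictions.csv")
y_true, y_pred = df["label"], df["prediction"]

print("Accuracy:         ", accuracy_score(y_true, y_pred))
print("Balanced accuracy:", balanced_accuracy_score(y_true, y_pred))
print(confusion_matrix(y_true, y_pred))
print(classification_report(y_true, y_pred, digits=3))
```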