dpo-from-scratch

A minimal, from-scratch implementation of Direct Preference Optimization (DPO) training for causal language models.

This repository contains a small experimental pipeline that:

  • Loads a pretrained causal LM and a frozen reference model using Hugging Face Transformers (a short loading sketch follows this overview).
  • Loads a preference dataset (Dahoas/full-hh-rlhf), filters for stronger preferences, and tokenizes prompt/completion pairs.
  • Implements the core DPO loss and a lightweight training loop that fine-tunes a subset of model parameters.

This project is intended as an educational reference and starting point for experimenting with preference-based alignment.
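As a rough illustration of the first step above, the snippet below loads a trainable policy model and a frozen reference copy with the Transformers library. This is a minimal sketch, not the exact code in main.py; the model name simply matches the repository default.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Gensyn/Qwen2.5-0.5B-Instruct"  # repository default

tokenizer = AutoTokenizer.from_pretrained(model_name)
policy = AutoModelForCausalLM.from_pretrained(model_name)     # fine-tuned with DPO
reference = AutoModelForCausalLM.from_pretrained(model_name)  # kept frozen

reference.eval()
for p in reference.parameters():
    p.requires_grad_(False)

device = "cuda" if torch.cuda.is_available() else "cpu"
policy.to(device)
reference.to(device)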

Explanation of DPO

Direct Preference Optimization (DPO) is a method for aligning language models with human preferences without reinforcement learning. It fine-tunes a pretrained language model directly on pairs of preferred and dispreferred completions, optimizing a loss that pushes the model to assign higher probability (relative to a frozen reference model) to the preferred completion than to the dispreferred one.

DPO loss formula
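For reference, the figure presumably renders the standard DPO objective from Rafailov et al. (2023). For a prompt x with preferred completion y_w and dispreferred completion y_l:

\mathcal{L}_{\mathrm{DPO}}(\theta) = -\,\mathbb{E}_{(x,\, y_w,\, y_l) \sim \mathcal{D}} \left[ \log \sigma\!\left( \beta \left( \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)} \right) \right) \right]

where \pi_\theta is the policy being trained, \pi_{\mathrm{ref}} is the frozen reference model, \sigma is the logistic sigmoid, and \beta controls how far the policy may drift from the reference.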

DPO training diagram
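The core loss is short to write down in PyTorch. The sketch below is illustrative only and assumes per-example completion log-probabilities have already been computed; the actual implementation lives in src/dpo.py and its function names and signature may differ.

import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """All inputs are per-example completion log-probabilities (1-D tensors)."""
    # Log-ratios of the policy vs. the frozen reference model.
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps

    # -log sigmoid(beta * margin): pushes the chosen completion's margin up.
    logits = beta * (chosen_logratio - rejected_logratio)
    loss = -F.logsigmoid(logits).mean()

    # Fraction of pairs where the chosen completion currently wins.
    accuracy = (logits > 0).float().mean()
    return loss, accuracy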

Logging

The training experiments were tracked using Weights & Biases (wandb).

To verify that the model trains correctly, we logged the following metrics:

  • Training loss over time (plot)
  • Validation accuracy on held-out preference pairs (plot)
  • Model log-difference, to check that the log-probability of the wanted answer stays above that of the unwanted one (plot)
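These metrics can be sent to wandb with ordinary wandb.log calls. The sketch below uses illustrative key names and dummy values rather than the exact keys used in main.py.

import wandb

run = wandb.init(project="dpo-from-scratch", mode="offline")  # offline mode needs no API key

for step in range(3):  # stand-in for the real training loop
    wandb.log({
        "train/loss": 0.69,    # DPO loss on the current batch
        "val/accuracy": 0.50,  # fraction of held-out pairs where the chosen completion wins
        "log_diff": 0.0,       # mean gap between chosen and rejected log-probabilities
    }, step=step)

run.finish()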

Contents

  • main.py - Entrypoint script. Loads models/tokenizer, prepares data loaders, runs a training loop with DPO loss, logs metrics to Weights & Biases (wandb), and evaluates accuracy on held-out data.
  • src/dpo.py - Implementation of the DPO loss computation.
  • src/data.py - Dataset loading and filtering (uses datasets.load_dataset("Dahoas/full-hh-rlhf")). Includes a custom is_strong_preference filter.
  • src/utils.py - Tokenization helpers, padding collate function, and a compute_logprob utility that converts model logits into average log-probabilities over completions.
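To make the last item concrete, a compute_logprob-style helper can be sketched as follows. This is an assumption about its shape, not the exact code in src/utils.py: it masks out prompt and padding tokens and averages the token log-probabilities over the completion.

import torch

def compute_logprob(model, input_ids, attention_mask, completion_mask):
    """completion_mask is 1 on completion tokens, 0 on prompt and padding tokens."""
    logits = model(input_ids=input_ids, attention_mask=attention_mask).logits
    # Shift so that logits at position t predict the token at position t+1.
    logits = logits[:, :-1, :]
    labels = input_ids[:, 1:]
    mask = completion_mask[:, 1:].float()

    logprobs = torch.log_softmax(logits, dim=-1)
    token_logprobs = logprobs.gather(-1, labels.unsqueeze(-1)).squeeze(-1)

    # Average over completion tokens only.
    return (token_logprobs * mask).sum(dim=-1) / mask.sum(dim=-1).clamp(min=1)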

Quick start

Install requirements:

python -m pip install -r requirements.txt

Running training (example):

python main.py --cache_dir /path/to/cache --model_name Gensyn/Qwen2.5-0.5B-Instruct --max_samples 1000 --batch_size 2

Notes:

  • The script will pick cuda if available, otherwise cpu.
  • main.py currently enables gradient updates only for parameters whose names include model.layers.23 and freezes the rest; this is a simple way to limit fine-tuning to a small subset of model weights (a sketch follows these notes).
  • The default model_name is Gensyn/Qwen2.5-0.5B-Instruct but you can substitute any compatible causal LM model from Hugging Face.
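The name-based freezing mentioned in the second note can be reproduced in a couple of lines, continuing the loading sketch earlier (the exact logic in main.py may differ):

# Only the parameters of transformer block 23 receive gradients.
for name, param in policy.named_parameters():
    param.requires_grad_("model.layers.23" in name)

# Optional sanity check: count the trainable parameters.
trainable = sum(p.numel() for p in policy.parameters() if p.requires_grad)
print(f"Trainable parameters: {trainable:,}")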
