Pivotal Token Search
Updated Dec 20, 2025 · Python
A Survey of Direct Preference Optimization (DPO)
PhyGDPO: Physics-Aware Groupwise Direct Preference Optimization for Physically Consistent Text-to-Video Generation
Official PyTorch implementation of "VidChain: Chain-of-Tasks with Metric-based Direct Preference Optimization for Dense Video Captioning", AAAI 2025
A Comprehensive Survey of Direct Preference Optimization: Datasets, Theories, Variants, and Applications
[ICML 2025] TGDPO: Harnessing Token-Level Reward Guidance for Enhancing Direct Preference Optimization
[ICLR 2026] Official repository of "Uni-DPO: A Unified Paradigm for Dynamic Preference Optimization of LLMs".
Notebooks to create an instruction following version of Microsoft's Phi 2 LLM with Supervised Fine Tuning and Direct Preference Optimization (DPO)
[ICML 2025 Workshop FM4BS] AnnoDPO: Protein Functional Annotation Learning with Direct Preference Optimization
RankPO: Rank Preference Optimization
The Rap Music Generator is an LLM-based tool for creating rap lyrics, offering multiple fine-tuning approaches for generating stylistically varied content.
A small, research-focused Python library for post-training large language models, with autotuning
Homework assignments for CMU 11-611 Natural Language Processing (Spring 2026) — covering language identification, n-gram LMs, text classification, machine translation evaluation, and DPO fine-tuning.
[CC 2025] [Official code] - Engaging preference optimization alignment in large language model for continual radiology report generation: A hybrid approach
Experiments, and how-to guide for the lecture "Large language models for Scientometrics"
Red-teaming harness for open-weight LLMs (LLaMA, Mistral, Pythia). LoRA-SFT on 580 examples raised refusal rate from ~6% to 89% and cut harmful replies to 8%. Includes adversarial prompt dataset, SFT + DPO training scripts, and 6 published adapters.
Notebooks to create an instruction following version of Microsoft's Phi 1.5 LLM with Supervised Fine Tuning and Direct Preference Optimization (DPO)
EPFLLaMA: A lightweight language model fine-tuned on EPFL curriculum content. Specialized for STEM education and multiple-choice question answering. Implements advanced techniques like SFT, DPO, and quantization.
Reinforcement Learning from Verified Rewards (RLVR) - Microservice for collecting execution data and generating training data for DPO, GRPO, and RLOO methods
End-to-end DPO fine-tuning pipeline for paraphrase-type generation (M.Sc. thesis, arXiv:2506.02018). DPO on 1,040 human-ranked pairs raised type accuracy +3 pp and human preference +7 pp over SFT baseline. Llama-3.1-8B + BART-large. Models on HuggingFace.
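Most of the repositories above build on the same core objective. As a hedged, self-contained sketch (plain Python on scalar log-probabilities; real implementations batch this over tensors with a framework like PyTorch), the standard DPO loss for one preference pair looks like:

```python
import math

def dpo_loss(pol_chosen, pol_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for a single preference pair.

    Inputs are summed log-probabilities of the chosen/rejected responses
    under the trainable policy and the frozen reference model.
    Function and argument names here are illustrative, not from any
    specific repository above.
    """
    chosen_reward = beta * (pol_chosen - ref_chosen)      # implicit reward, chosen
    rejected_reward = beta * (pol_rejected - ref_rejected)  # implicit reward, rejected
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)), computed stably as softplus(-margin)
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

# With no preference margin the loss is log(2); it shrinks as the policy
# favors the chosen response more strongly than the reference does.
```

Minimizing this pushes the policy's log-probability ratio up on preferred responses and down on rejected ones, without training an explicit reward model.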