Pivotal Token Search
Updated Dec 20, 2025 · Python
A Survey of Direct Preference Optimization (DPO)
PhyGDPO: Physics-Aware Groupwise Direct Preference Optimization for Physically Consistent Text-to-Video Generation
Official PyTorch implementation of "VidChain: Chain-of-Tasks with Metric-based Direct Preference Optimization for Dense Video Captioning", AAAI 2025
A Comprehensive Survey of Direct Preference Optimization: Datasets, Theories, Variants, and Applications
[ICML 2025] TGDPO: Harnessing Token-Level Reward Guidance for Enhancing Direct Preference Optimization
[ICLR 2026] Official repository of "Uni-DPO: A Unified Paradigm for Dynamic Preference Optimization of LLMs".
Notebooks to create an instruction following version of Microsoft's Phi 2 LLM with Supervised Fine Tuning and Direct Preference Optimization (DPO)
[ICML 2025 Workshop FM4BS] AnnoDPO: Protein Functional Annotation Learning with Direct Preference Optimization
RankPO: Rank Preference Optimization
The Rap Music Generator is an LLM-based tool for creating rap lyrics, offering multiple fine-tuning approaches for generating stylistically varied content.
A small, research-focused Python library for post-training large language models, with autotuning
Homework assignments for CMU 11-611 Natural Language Processing (Spring 2026) — covering language identification, n-gram LMs, text classification, machine translation evaluation, and DPO fine-tuning.
[CC 2025] [Official code] - Engaging preference optimization alignment in large language model for continual radiology report generation: A hybrid approach
Experiments, and how-to guide for the lecture "Large language models for Scientometrics"
Red-teaming harness for open-weight LLMs (LLaMA, Mistral, Pythia). LoRA-SFT on 580 examples raised refusal rate from ~6% to 89% and cut harmful replies to 8%. Includes adversarial prompt dataset, SFT + DPO training scripts, and 6 published adapters.
Notebooks to create an instruction following version of Microsoft's Phi 1.5 LLM with Supervised Fine Tuning and Direct Preference Optimization (DPO)
EPFLLaMA: A lightweight language model fine-tuned on EPFL curriculum content. Specialized for STEM education and multiple-choice question answering. Implements advanced techniques like SFT, DPO, and quantization.
Reinforcement Learning from Verified Rewards (RLVR) - Microservice for collecting execution data and generating training data for DPO, GRPO, and RLOO methods
End-to-end DPO fine-tuning pipeline for paraphrase-type generation (M.Sc. thesis, arXiv:2506.02018). DPO on 1,040 human-ranked pairs raised type accuracy +3 pp and human preference +7 pp over SFT baseline. Llama-3.1-8B + BART-large. Models on HuggingFace.
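Most of the repositories above build on the same core objective. As a hedged, self-contained sketch (plain Python on scalar log-probabilities; real implementations batch this over tensors with a framework like PyTorch), the standard DPO loss for one preference pair looks like:

```python
import math

def dpo_loss(pol_chosen, pol_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for a single preference pair.

    Inputs are summed log-probabilities of the chosen/rejected responses
    under the trainable policy and the frozen reference model.
    Function and argument names here are illustrative, not from any
    specific repository above.
    """
    chosen_reward = beta * (pol_chosen - ref_chosen)      # implicit reward, chosen
    rejected_reward = beta * (pol_rejected - ref_rejected)  # implicit reward, rejected
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)), computed stably as softplus(-margin)
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

# With no preference margin the loss is log(2); it shrinks as the policy
# favors the chosen response more strongly than the reference does.
```

Minimizing this pushes the policy's log-probability ratio up on preferred responses and down on rejected ones, without training an explicit reward model.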