feat: add ORPO trainer support #594
Conversation
Hi eghuzefa, thank you for your PR. Did you happen to format everything? It's hard to review the actual change right now. Can you format only the files that you changed?
Hey, yes I did. Two files were primarily added, and the following file was changed to add the trainer export.
I will go through it again, fix all these conflicts, and reformat my changes by tomorrow.
Hi eghuzefa, please let me know when this PR is ready for review. Right now it still has formatting issues.
Thanks for the review! I rebased 2 days ago on the latest upstream/main, and today made the following changes to match the codebase patterns:

- Updated ORPO to be compatible with recent upstream API changes: common.get_per_token_logps() now returns a single value instead of a tuple.
- Renamed the _prepare_inputs parameter from training_input to input_data to match the base class signature.
- Removed the __init__.py files from both tunix/rl/orpo/ and tests/rl/orpo/ to exactly match the structure used by DPO and GRPO (both export directly from tunix/__init__.py without intermediate __init__.py files).

All 6 tests pass, pyink formatting is clean, and pylint gives 9.78/10 with only minor non-blocking docstring warnings.

Could you please clarify what specific formatting issues you're seeing? I've run the pre-commit hooks and everything passes. Are there specific files or lines that need attention, or perhaps a different formatting tool I should use?
Hey @wang2yn84, yeah you're right — I ran pre-commit on the whole repo. I’ll revert the formatting-only changes and keep just the relevant ones. |
Hi @wang2yn84, I have fixed it now, kindly check. |
Thank you @eghuzefa, it's much better now! A couple of questions here:
1. ORPO is a variation of DPO; can you move it to sft/?
2. We are refactoring the logic now to abstract the core algorithm. Before that, can you merge the algorithm directly into dpo? Most of the added code should be similar to DPO, except for some additional configs and the loss function.
Done! I've merged ORPO into DPO in sft/. Both algorithms now share the same trainer, with an algorithm="orpo" config and no reference model. I kept the dpo folder name to minimize changes; happy to rename it to something more generic like preference or po if you prefer, but I wanted to avoid unnecessary refactoring for now. All tests passing ✓
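Roughly, usage looks like the sketch below (sketch only; the module path, class name, and field names are illustrative and may differ in the final code):

```python
# Illustrative sketch only: module path, class, and field names are
# assumptions based on this thread, not the verified tunix API.
from tunix.sft.dpo import dpo_trainer  # assumed location after the merge into sft/

# Selecting ORPO via the shared DPO trainer config; no reference model is
# configured or loaded when algorithm="orpo".
orpo_config = dpo_trainer.DpoTrainingConfig(  # assumed class name
    algorithm="orpo",   # "dpo" (default) or "orpo"
    lambda_orpo=0.1,    # ORPO-only: weight of the odds-ratio term
)
```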
Thank you so much for the support! Looks great! |
Do you have a Gmail account? Can we chat a bit more about your contribution?
One last thing, can you squash your commits into 1? |
hey, sure, here it is, m.huzefa1993@gmail.com |
Apologies for the noise! |
No worries. I saw some GitHub workflow test failures last time; they need to be fixed before we can merge.
Are you available on Google Chat?
Summary
This PR implements ORPO (Odds Ratio Preference Optimization) as a memory-efficient alternative to DPO for preference tuning. ORPO achieves approximately 50% memory savings by eliminating the need for a reference model during training.
Motivation
ORPO provides the same preference-learning capabilities as DPO while requiring forward passes through only a single model (no frozen reference model), making it well suited to resource-constrained environments and large-model training.
Paper: https://arxiv.org/abs/2403.07691
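For context, the objective from the paper pairs the usual SFT negative log-likelihood on the chosen response with an odds-ratio penalty weighted by $\lambda$ (the `lambda_orpo` hyperparameter below):

$$
\mathcal{L}_{\text{ORPO}} = \mathbb{E}_{(x, y_w, y_l)}\left[\mathcal{L}_{\text{SFT}} + \lambda \cdot \mathcal{L}_{\text{OR}}\right],
\quad
\mathcal{L}_{\text{OR}} = -\log \sigma\!\left(\log \frac{\text{odds}_\theta(y_w \mid x)}{\text{odds}_\theta(y_l \mid x)}\right),
\quad
\text{odds}_\theta(y \mid x) = \frac{P_\theta(y \mid x)}{1 - P_\theta(y \mid x)}
$$

Both terms depend only on the policy $\theta$, so no frozen reference model (and no second set of forward passes through it) is needed, which is where the memory savings come from.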
Changes
Core Implementation
- `tunix/rl/orpo/orpo_trainer.py`: Complete ORPO trainer implementation (516 lines)
  - `ORPOTrainer` class inheriting from `PeftTrainer`
  - `ORPOTrainingConfig` dataclass with `lambda_orpo` hyperparameter
  - `orpo_loss_fn` implementing odds ratio-based preference optimization
- `tunix/rl/orpo/__init__.py`: Module exports
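As a rough illustration of how the loss fits together, here is a minimal sketch, assuming the trainer works with mean per-token log-probabilities per example; the function name, signature, and metric definitions are illustrative rather than the exact 516-line implementation:

```python
import jax
import jax.numpy as jnp


def orpo_loss_sketch(chosen_logps, rejected_logps, lambda_orpo=0.1):
  """Illustrative ORPO loss.

  chosen_logps / rejected_logps: mean per-token log-probs (assumed < 0), shape [batch].
  """
  # log(odds(y|x)) = log p - log(1 - p), kept in log space for stability.
  log_odds_chosen = chosen_logps - jnp.log1p(-jnp.exp(chosen_logps))
  log_odds_rejected = rejected_logps - jnp.log1p(-jnp.exp(rejected_logps))
  log_odds_ratio = log_odds_chosen - log_odds_rejected

  # Odds-ratio term: -log sigmoid(log odds ratio).
  or_loss = -jax.nn.log_sigmoid(log_odds_ratio)

  # SFT term: NLL of the chosen responses; note there is no reference model.
  sft_loss = -chosen_logps

  loss = jnp.mean(sft_loss + lambda_orpo * or_loss)

  # Metric names mirror what the PR reports; exact definitions may differ.
  metrics = {
      "rewards/chosen": jnp.mean(chosen_logps),
      "rewards/rejected": jnp.mean(rejected_logps),
      "rewards/margin": jnp.mean(chosen_logps - rejected_logps),
      "rewards/accuracy": jnp.mean((chosen_logps > rejected_logps).astype(jnp.float32)),
      "odds_ratio": jnp.mean(log_odds_ratio),
  }
  return loss, metrics
```

The only model-dependent inputs are the policy's own log-probabilities, which is what lets the ORPO path drop the reference model that DPO needs.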
Testing
- `tests/rl/orpo/orpo_trainer_test.py`: Comprehensive test suite
Integration
- `tunix/__init__.py`: Added ORPO exports to main package API
Key Features
- Built on the existing `PeftTrainer` infrastructure
- Training metrics logged: `rewards/chosen`, `rewards/rejected`, `rewards/margin`, `rewards/accuracy`, `odds_ratio`