Conversation

@eghuzefa
Contributor

Summary

This PR implements ORPO (Odds Ratio Preference Optimization) as a memory-efficient alternative to DPO for preference tuning. ORPO achieves approximately 50% memory savings by eliminating the need for a reference model during training.

Motivation

ORPO provides the same preference-learning capabilities as DPO while requiring forward passes through only a single model (no frozen reference model), making it ideal for resource-constrained environments and large-model training.

Paper: https://arxiv.org/abs/2403.07691

Changes

Core Implementation

  • tunix/rl/orpo/orpo_trainer.py: Complete ORPO trainer implementation (516 lines)

    • ORPOTrainer class inheriting from PeftTrainer
    • ORPOTrainingConfig dataclass with lambda_orpo hyperparameter
    • orpo_loss_fn implementing odds ratio-based preference optimization (see the loss sketch after this list)
    • Support for both tokenized and string inputs
  • tunix/rl/orpo/__init__.py: Module exports
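
For reference, the odds-ratio objective that orpo_loss_fn is built around can be sketched as follows. This is a minimal illustration of the loss from the ORPO paper, not the actual trainer code; the argument names, the use of mean per-token log-probs, and the lambda_orpo default here are assumptions.

```python
# Minimal sketch of the ORPO objective, assuming `chosen_logps` and
# `rejected_logps` are length-normalized (mean per-token) log-probabilities
# of the chosen/rejected responses under the policy, and `nll_loss` is the
# usual SFT cross-entropy on the chosen responses.
import jax
import jax.numpy as jnp


def orpo_loss_sketch(chosen_logps, rejected_logps, nll_loss, lambda_orpo=0.1):
  # log odds(y|x) = log p - log(1 - p), computed stably in log space.
  log_odds_chosen = chosen_logps - jnp.log1p(-jnp.exp(chosen_logps))
  log_odds_rejected = rejected_logps - jnp.log1p(-jnp.exp(rejected_logps))
  log_odds_ratio = log_odds_chosen - log_odds_rejected

  # Odds-ratio penalty: -log sigmoid(log odds ratio of chosen vs. rejected).
  or_loss = -jax.nn.log_sigmoid(log_odds_ratio)

  # Single-model objective: SFT loss plus the weighted odds-ratio term,
  # so no frozen reference model is needed.
  loss = jnp.mean(nll_loss + lambda_orpo * or_loss)
  return loss, jnp.mean(log_odds_ratio)
```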

Testing

  • tests/rl/orpo/orpo_trainer_test.py: Comprehensive test suite
    • 6 tests covering training loop, loss computation, input handling
    • All tests passing ✅
    • End-to-end training verified

Integration

  • tunix/__init__.py: Added ORPO exports to main package API

Key Features

  • No reference model required (key difference from DPO)
  • Memory efficient: ~50% memory savings vs DPO
  • API compatible: Works with existing PeftTrainer infrastructure
  • Full metrics logging: rewards/chosen, rewards/rejected, rewards/margin, rewards/accuracy, odds_ratio
  • Flexible input formats: Supports both tokenized and string inputs
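
To make the logged metric names above concrete, here is a hedged sketch of how they could be derived from the same per-example log-probabilities. It follows the convention used by other ORPO implementations (scaled policy log-probs stand in for "rewards", since there is no reference model); the beta scaling and whether odds_ratio is logged in log space are assumptions about this PR.

```python
import jax.numpy as jnp


def orpo_metrics_sketch(chosen_logps, rejected_logps, log_odds_ratio, beta=0.1):
  # Without a reference model, "rewards" are just scaled policy log-probs.
  chosen_rewards = beta * chosen_logps
  rejected_rewards = beta * rejected_logps
  return {
      "rewards/chosen": jnp.mean(chosen_rewards),
      "rewards/rejected": jnp.mean(rejected_rewards),
      "rewards/margin": jnp.mean(chosen_rewards - rejected_rewards),
      # Fraction of pairs where the model prefers the chosen response.
      "rewards/accuracy": jnp.mean(
          (chosen_rewards > rejected_rewards).astype(jnp.float32)
      ),
      # Mean log odds ratio; the trainer may log the raw ratio instead.
      "odds_ratio": jnp.mean(log_odds_ratio),
  }
```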

@wang2yn84
Collaborator

Hi eghuzefa, thank you for your PR. Did you happen to format everything? It's hard to review the actual change right now. Can you format only the files that you changed?

@eghuzefa
Contributor Author

> Hi eghuzefa, thank you for your PR. Did you happen to format everything? It's hard to review the actual change right now. Can you format only the files that you changed?

Hey, yes I did. Two files were primarily added:
tunix/rl/orpo/orpo_trainer.py
tests/rl/orpo/orpo_trainer_test.py

and the following file was changed to add the trainer export:
tunix/__init__.py

@eghuzefa
Contributor Author

eghuzefa commented Oct 22, 2025

I will go through it again, fix all these conflicts, and reformat my changes by tomorrow.

@wang2yn84
Copy link
Collaborator

Hi eghuzefa, please let me know when this PR is ready for review. Right now it still has formatting issues.

@eghuzefa requested a review from jiangyangmu as a code owner on October 24, 2025 at 10:49
@eghuzefa
Contributor Author

> Hi eghuzefa, please let me know when this PR is ready for review. Right now it still has formatting issues.

Thanks for the review! I rebased on the latest upstream/main two days ago, and today I made the following changes to match the codebase patterns: I updated ORPO to be compatible with recent upstream API changes (common.get_per_token_logps() now returns a single value instead of a tuple), and I renamed the _prepare_inputs parameter from training_input to input_data to match the base class signature. I also removed the __init__.py files from both tunix/rl/orpo/ and tests/rl/orpo/ to exactly match the structure used by DPO and GRPO (both export directly from tunix/__init__.py without intermediate __init__.py files). All 6 tests pass, pyink formatting is clean, and pylint gives 9.78/10 with only minor non-blocking docstring warnings. Could you please clarify what specific formatting issues you're seeing? I've run the pre-commit hooks and everything passes. Are there specific files or lines that need attention, or perhaps a different formatting tool I should use?

@wang2yn84
Collaborator

Hi @eghuzefa, I'm seeing 42 files changed in this PR; I believe that's because you ran pre-commit manually on the whole repo? It's hard to focus on your actual change. Can you remove those format-only changes? @sizhit2, can you help take a look at the orpo_trainer?

@eghuzefa
Contributor Author

eghuzefa commented Nov 7, 2025

> Hi @eghuzefa, I'm seeing 42 files changed in this PR; I believe that's because you ran pre-commit manually on the whole repo? It's hard to focus on your actual change. Can you remove those format-only changes? @sizhit2, can you help take a look at the orpo_trainer?

Hey @wang2yn84, yeah you're right — I ran pre-commit on the whole repo. I’ll revert the formatting-only changes and keep just the relevant ones.

@eghuzefa force-pushed the feat-add-orpo-support branch from faf4d72 to 28f574d on November 7, 2025 at 05:51
@eghuzefa
Contributor Author

eghuzefa commented Nov 7, 2025

> Hi @eghuzefa, I'm seeing 42 files changed in this PR; I believe that's because you ran pre-commit manually on the whole repo? It's hard to focus on your actual change. Can you remove those format-only changes? @sizhit2, can you help take a look at the orpo_trainer?
>
> Hey @wang2yn84, yeah you're right — I ran pre-commit on the whole repo. I'll revert the formatting-only changes and keep just the relevant ones.

Hi @wang2yn84, I have fixed it now, kindly check.

@wang2yn84
Collaborator

Thank you @eghuzefa, it's much better now! A couple of questions here: 1. ORPO is a variation of DPO; can you move it to sft/? 2. We are refactoring the logic now to abstract the core algorithm. Before that, can you merge the algorithm directly into dpo? Most of the added code should be similar to DPO except for some additional configs and the loss function.

@eghuzefa
Contributor Author

> Thank you @eghuzefa, it's much better now! A couple of questions here: 1. ORPO is a variation of DPO; can you move it to sft/? 2. We are refactoring the logic now to abstract the core algorithm. Before that, can you merge the algorithm directly into dpo? Most of the added code should be similar to DPO except for some additional configs and the loss function.

Done! I've merged ORPO into DPO in sft/. Both algorithms now share the same trainer, with ORPO selected via an algorithm="orpo" config and no reference model. I kept the dpo folder name to minimize changes - happy to rename it to something more generic like preference or po if you prefer, but I wanted to avoid unnecessary refactoring for now. All tests passing ✓
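
For readers following along, the shape of that change can be sketched roughly as below. This is purely illustrative of the "one trainer, algorithm flag" design described above, not the merged tunix code; the config name and fields are hypothetical stand-ins (only algorithm="orpo" and lambda_orpo are mentioned in this thread).

```python
import dataclasses


@dataclasses.dataclass
class PreferenceTuningConfigSketch:
  algorithm: str = "dpo"       # "dpo" or "orpo"
  beta: float = 0.1            # DPO temperature (unused by ORPO)
  lambda_orpo: float = 0.1     # weight of the ORPO odds-ratio term


def needs_reference_model(config: PreferenceTuningConfigSketch) -> bool:
  # DPO scores the policy against a frozen reference model; ORPO only
  # penalizes the policy's own odds ratio, so it can skip the reference
  # model entirely (which is where the ~50% memory saving comes from).
  return config.algorithm == "dpo"
```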

@wang2yn84
Collaborator

Thank you so much for the support! Looks great!

@wang2yn84
Collaborator

Do you have a gmail account? Can we chat a bit more about your contribution?

@wang2yn84
Collaborator

One last thing, can you squash your commits into 1?

@eghuzefa
Contributor Author

> Do you have a gmail account? Can we chat a bit more about your contribution?

hey, sure, here it is, m.huzefa1993@gmail.com

@eghuzefa
Contributor Author

Apologies for the noise!
While squashing my commits into a single clean commit, I made a mistake during the git reset process that caused the PR to auto-close. The PR has been reopened with all the ORPO implementation changes intact and consolidated into one commit.
Sorry for any confusion this caused!

@wang2yn84
Collaborator

> Apologies for the noise! While squashing my commits into a single clean commit, I made a mistake during the git reset process that caused the PR to auto-close. The PR has been reopened with all the ORPO implementation changes intact and consolidated into one commit. Sorry for any confusion this caused!

Oh, no worries. I saw some GitHub workflow test failures last time; we need to fix them before we can merge.

@wang2yn84
Collaborator

> Do you have a gmail account? Can we chat a bit more about your contribution?
>
> hey, sure, here it is, m.huzefa1993@gmail.com

Are you available on Google chat?

@eghuzefa force-pushed the feat-add-orpo-support branch 5 times, most recently from 150d12f to 5f10c23, on November 18, 2025 at 18:27
@eghuzefa force-pushed the feat-add-orpo-support branch from 5f10c23 to 9e99644 on November 18, 2025 at 18:30
@copybara-service bot merged commit 4970237 into google:main on Nov 19, 2025
8 checks passed