Skip to content

Conversation

@joaobenedetmachado
Copy link

Problem

Logging for metrics returned by loss functions via aux was at a _iter_steps level,
whereas other metrics were logged at a global_steps level. The progress bar for
GRPO/PPO was at a _iter_steps level. This inconsistency made it confusing for
users to pass max_steps at a _iter_steps level, and not at global_steps level.

Solution

Unified logging to use global_steps consistently across all components:

  • Add global_steps property to PeftTrainer (returns train_steps for compatibility)
  • Update DPO trainer to use global_steps for aux metrics logging
  • Update progress bar to use global_steps for initial_steps
  • Ensures consistent step counting across all RL algorithms
  • Resolves confusion about max_steps parameter interpretation

Files Changed

  • tunix/sft/peft_trainer.py - Added global_steps property and updated progress bar
  • tunix/sft/dpo/dpo_trainer.py - Updated logging methods to use global_steps

Testing

  • All code verification tests pass
  • Backward compatibility maintained (global_steps = train_steps)
  • Demo notebook available in Colab

Resolves #390

Colab Notebook
https://colab.research.google.com/drive/1hMelDYWB5w3v3W6anOChPDFXruMCbjlY?usp=sharing

Checklist

  • I have added all the necessary unit tests for my change.
  • I have verified that my change does not break existing code and all unit tests pass.
  • I have added all appropriate doc-strings/documentation.
  • My PR is based on the latest changes of the main branch (if unsure, rebase the code).
  • I have signed the Contributor License Agreement.
  • I have followed Contribution Guidelines.

@wang2yn84
Copy link
Collaborator

Thank you for your PR! We recently noticed there are some issue in DPO metrics logging and it's fixed by: ac07cfd. So the DPO trainer part is no longer valid and necessary. But we can still land the peft trainer part. Can you rebase?

- Add global_steps property to PeftTrainer (returns train_steps for compatibility)
- Update DPO trainer to use global_steps for aux metrics logging
- Update progress bar to use global_steps for initial_steps
- Ensures consistent step counting across all RL algorithms
- Resolves confusion about max_steps parameter interpretation
@joaobenedetmachado
Copy link
Author

Thanks for the feedback! I’ve rebased the branch with the latest main and removed the DPO-related changes since they were already addressed in ac07cfd.
The PR now only includes the PeftTrainer updates to unify logging with global_steps.
All tests pass and everything is ready for review. 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

RL Bug/Issue Tracker

2 participants