feat: basic ppo training implementation by hXl3s · Pull Request #2027 · NVIDIA-NeMo/RL

hXl3s · 2026-02-26T14:47:32Z

DO NOT MERGE! WORK IN PROGRESS!

What does this PR do?

This PR adds a basic Proximal Policy Optimization (PPO) training loop to NeMo-RL.

What is added:

Support for a value model. The value model currently runs as a separate worker; the case where the value model is just a head on the policy is not covered yet.
A PPO training loop and an example of training a math model (convergence not tested yet).
Basic logging and validation during PPO training.
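
For readers unfamiliar with how the value model feeds into a PPO update, here is a minimal, framework-free sketch of the textbook computation: generalized advantage estimation (GAE) from value predictions, followed by the clipped surrogate policy loss and a squared-error value loss. It is not the code in this PR. The function names (`gae`, `ppo_clipped_loss`, `value_loss`) and the default hyperparameters are assumptions chosen for illustration only.

```python
import math
from typing import List, Sequence, Tuple


def gae(
    rewards: Sequence[float],
    values: Sequence[float],
    gamma: float = 0.99,
    lam: float = 0.95,
) -> Tuple[List[float], List[float]]:
    """Generalized advantage estimation for a single trajectory.

    `values` must have len(rewards) + 1 entries; the last entry is the
    bootstrap value of the state after the final step (0.0 if terminal).
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step can reuse the discounted sum after it.
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    # Value-model regression targets: advantage plus the baseline value.
    returns = [a + v for a, v in zip(advantages, values[:-1])]
    return advantages, returns


def ppo_clipped_loss(
    new_logprobs: Sequence[float],
    old_logprobs: Sequence[float],
    advantages: Sequence[float],
    clip_eps: float = 0.2,
) -> float:
    """Clipped surrogate policy loss, averaged over samples (to be minimized)."""
    total = 0.0
    for new_lp, old_lp, adv in zip(new_logprobs, old_logprobs, advantages):
        ratio = math.exp(new_lp - old_lp)
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Pessimistic bound: take the smaller of the two surrogate terms.
        total += min(ratio * adv, clipped * adv)
    return -total / len(advantages)


def value_loss(predicted: Sequence[float], returns: Sequence[float]) -> float:
    """Mean squared error between value predictions and return targets."""
    return sum((v - r) ** 2 for v, r in zip(predicted, returns)) / len(returns)
```

In a training loop, the value worker would supply `values` for each rollout, `gae` would turn them into advantages and return targets, and the two losses would drive the policy and value updates respectively. How this PR actually wires these pieces together may differ.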

Issues

No direct issue

closes #2047

Usage

uv run example/run_ppo.py

Before your PR is "Ready for review"

Pre checks:

Make sure you read and followed Contributor guidelines
Did you write any new necessary tests?
Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Additional Information

...

github-actions · 2026-02-26T14:48:53Z

✅ Submodule Fast-Forward Check Results

Check based on commit: 08aa60d (PR #2027 from lukaszp/ppo)

✅ Submodules that are properly updated:

Automodel: ✅ PR branch is ahead of main branch (fast-forward)

All submodule changes look good! ✨

github-actions · 2026-02-26T14:52:07Z

⚠️ File Consistency Check

Check based on commit: 08aa60d (PR #2027 from lukaszp/ppo)

⚠️ DTensor Policy Worker Synchronization Warning

The file nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py was modified in this PR, but nemo_rl/models/policy/workers/dtensor_policy_worker.py was not updated.

Why this matters:
These files contain related DTensor policy worker implementations that should be kept synchronized to ensure consistency across different versions.

Action required:

Please review if the changes in nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py should also be applied to nemo_rl/models/policy/workers/dtensor_policy_worker.py
Update nemo_rl/models/policy/workers/dtensor_policy_worker.py if necessary to maintain consistency
If the files are intentionally different, please add a comment in the PR explaining why

Files to check:

Modified: nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py
Not modified: nemo_rl/models/policy/workers/dtensor_policy_worker.py

This check ensures that related file implementations remain synchronized across the codebase. If you believe this warning is incorrect or the files should intentionally differ, please add a comment explaining the reasoning.

github-actions · 2026-02-26T15:37:05Z

✅ Submodule Fast-Forward Check Results

Check based on commit: efd71bb (PR #2027 from lukaszp/ppo)

✅ Submodules that are properly updated:

Automodel: ✅ PR branch is ahead of main branch (fast-forward)

All submodule changes look good! ✨

copy-pr-bot · 2026-03-10T13:48:59Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

github-actions · 2026-03-24T16:04:59Z

✅ Submodule Fast-Forward Check Results

Check based on commit: 7b83db2 (PR #2027 from lukaszp/ppo)

✅ Submodules that are properly updated:

Automodel: ✅ PR branch is ahead of main branch (fast-forward)

All submodule changes look good! ✨

github-actions · 2026-03-24T16:16:20Z

✅ Submodule Fast-Forward Check Results

Check based on commit: aee85c3 (PR #2027 from lukaszp/ppo)

✅ Submodules that are properly updated:

Automodel: ✅ PR branch is ahead of main branch (fast-forward)

All submodule changes look good! ✨

github-actions · 2026-03-24T17:45:03Z

✅ Submodule Fast-Forward Check Results

Check based on commit: 90116e9 (PR #2027 from lukaszp/ppo)

✅ Submodules that are properly updated:

Automodel: ✅ PR branch is ahead of main branch (fast-forward)

All submodule changes look good! ✨

hXl3s force-pushed the lukaszp/ppo branch from 3a82b7b to 7b83db2 · March 24, 2026 16:03

hXl3s added 2 commits March 24, 2026 18:02

feat(ppo): Implementation scaffolding

012edfd

fix(ppo): naming issues after rebase

90116e9

hXl3s force-pushed the lukaszp/ppo branch from aee85c3 to 90116e9 · March 24, 2026 17:44

hXl3s wants to merge 2 commits into NVIDIA-NeMo:main from hXl3s:lukaszp/ppo

hXl3s commented Feb 26, 2026 · edited by terrykong
