[skyrl-train] Add GDPO Support to PPO Utils by devpatelio · Pull Request #897 · NovaSky-AI/SkyRL

devpatelio · 2026-01-19T23:01:05Z

GDPO is an extension of GRPO for multi-reward settings: each reward function is normalized group-wise before the advantage is computed. The per-reward advantages are then combined, and the result is batch-normalized across all prompts in the batch (see the GDPO paper).
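A minimal sketch of the normalization order described above, written with NumPy rather than the PyTorch code in `ppo_utils.py`. The function name, argument names, the `eps` default, and the choice to combine objectives with an unweighted sum are assumptions made for illustration; they are not taken from this PR's diff.

```python
import numpy as np


def gdpo_outcome_advantage_sketch(rewards, group_ids, eps=1e-6):
    """Illustrative GDPO-style outcome advantage (hypothetical helper, not the PR's code).

    rewards:   array-like of shape (num_responses, num_objectives)
    group_ids: array-like of shape (num_responses,), one id per prompt group
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    group_ids = np.asarray(group_ids)

    # Step 1: normalize each objective separately within each prompt group.
    per_objective = np.zeros_like(rewards)
    for gid in np.unique(group_ids):
        mask = group_ids == gid
        group = rewards[mask]
        mean = group.mean(axis=0, keepdims=True)
        std = group.std(axis=0, keepdims=True)
        per_objective[mask] = (group - mean) / (std + eps)

    # Step 2: combine objectives (assumed here: unweighted sum).
    combined = per_objective.sum(axis=1)

    # Step 3: normalize the combined advantage across the whole batch.
    return (combined - combined.mean()) / (combined.std() + eps)


# Two prompts (groups 0 and 1), two responses each, two reward objectives.
example_rewards = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
example_groups = [0, 0, 1, 1]
example_adv = gdpo_outcome_advantage_sketch(example_rewards, example_groups)
```

In this example the two responses in group 0 each win on one objective and lose on the other, so their combined advantages cancel, while group 1 has one response that is better on both objectives and therefore receives the positive advantage.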

Points of clarification:

How does SkyRL handle multiple reward functions (if at all)? -> For now we assume token-level rewards carry an extra dimension for the N objectives.
Should we port from VERL? Its implementation appears to support at most 2 rewards -> https://github.com/NVlabs/GDPO/blob/main/verl-GDPO/verl/trainer/ppo/ray_trainer.py

TODOs:

add multiple reward functionality
test GDPO performance against GRPO

gemini-code-assist

Code Review

This pull request introduces support for Group-wise Distributional Policy Optimization (GDPO) advantage estimation by adding a new advantage estimator, compute_gdpo_outcome_advantage, to the PPO utilities. A medium-severity issue was identified in the new GDPO implementation: a bug in the group-wise normalization step that could lead to division by zero. The docstring's link to the reference paper was also found to be broken.
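The division-by-zero case arises when every response in a group receives the same reward, so the group standard deviation is exactly zero. The review thread itself is not reproduced on this page, so the following is only a sketch of one common guard (adding a small epsilon to the denominator); the helper name and the `eps` value are hypothetical and may differ from the fix adopted in the PR.

```python
import numpy as np


def normalize_group_sketch(group_rewards, eps=1e-6):
    """Center and scale one group's rewards; eps keeps the denominator nonzero."""
    group_rewards = np.asarray(group_rewards, dtype=np.float64)
    centered = group_rewards - group_rewards.mean()
    # With eps == 0 and identical rewards this would evaluate 0 / 0 and yield NaN.
    return centered / (group_rewards.std() + eps)


# A degenerate group: all responses scored the same on this objective.
degenerate = normalize_group_sketch([1.0, 1.0, 1.0])
# A regular group for comparison.
regular = normalize_group_sketch([0.0, 1.0])
```

With the guard in place, the degenerate group yields all-zero advantages, meaning those responses contribute no learning signal for that objective rather than propagating NaNs into the loss.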

skyrl-train/skyrl_train/utils/ppo_utils.py

GDPO implementation

6706568

gemini-code-assist bot reviewed Jan 19, 2026

skyrl-train/skyrl_train/utils/ppo_utils.py (outdated, resolved)

skyrl-train/skyrl_train/utils/ppo_utils.py (resolved)

add test + use grpo norm flag

d7521f6

devpatelio wants to merge 2 commits into NovaSky-AI:main from devpatelio:devpatel/gdpo
