
Basic RL Testing #50

Open
samherring99 wants to merge 5 commits into dev-updated-again from rl-testing

@samherring99 samherring99 commented Feb 18, 2026

This PR adds a basic RL testing setup to verify that the code in our grpo/ subdirectory works.

To run the test, set the virtual environment paths in torchtitan/grpo/test/online_multinode_vllm_test.slurm, then run:

./torchtitan/grpo/test/run_with_monitor.sh torchtitan/grpo/test/online_multinode_vllm_test.slurm

The monitor then reports the training metrics and stability.

Future changes will add support for running this with GSM8K and a smaller model.

Output:

GRPO Training Health Monitor Starting...
================================================================================
Looking for log file for job JOBID...
Found log file: /home/<user>/Projects/torchtitan/logs/JOBID.out
Monitoring log file: /home/<user>/Projects/torchtitan/logs/JOBID.out
Reading recent metrics from log...
Processing last 500 lines...
Caught up on existing metrics (8 metric lines found).
Now monitoring new lines...


================================================================================
GRPO Health Check - Step 55
================================================================================
Weak pos/neg separation: 0.0264
Policy ratio stable (~1.0)

Latest metrics:
   Loss:   -0.1149
   Reward: -0.0961
   Ratio:  1.0002
   Pos/Neg Separation: -0.1387

Status: HEALTHY
   Warnings: 0 | Critical: 0
================================================================================
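
For readers who want a feel for what a check like the one above involves, here is a minimal sketch in Python. It is not the monitor script from this PR: the names `StepMetrics`, `parse_latest_metrics`, and `check_health`, the two thresholds, and the use of the absolute value of the separation are all assumptions made for illustration. It parses the "Latest metrics" lines from the output and flags a step when the policy ratio drifts from 1.0 or the pos/neg separation is small.

```python
import re
from dataclasses import dataclass

# Hypothetical thresholds, chosen only for illustration; the PR does not state them.
RATIO_TOLERANCE = 0.05
MIN_SEPARATION = 0.05

# Matches lines such as "   Loss:   -0.1149" from the "Latest metrics" block.
METRIC_RE = re.compile(
    r"^\s*(Loss|Reward|Ratio|Pos/Neg Separation):\s*(-?\d+(?:\.\d+)?)\s*$",
    re.MULTILINE,
)


@dataclass
class StepMetrics:
    step: int
    loss: float
    reward: float
    ratio: float
    separation: float


def parse_latest_metrics(text: str) -> dict[str, float]:
    """Extract metric name/value pairs from monitor-style output."""
    return {name: float(value) for name, value in METRIC_RE.findall(text)}


def check_health(m: StepMetrics) -> tuple[str, list[str]]:
    """Return (status, warnings) for one training step."""
    warnings = []
    # A policy ratio far from 1.0 suggests the new policy is moving away from the old one.
    if abs(m.ratio - 1.0) > RATIO_TOLERANCE:
        warnings.append(f"Policy ratio drifting: {m.ratio:.4f}")
    # Assumption: only the magnitude of the separation matters for this check.
    if abs(m.separation) < MIN_SEPARATION:
        warnings.append(f"Weak pos/neg separation: {m.separation:.4f}")
    status = "HEALTHY" if not warnings else "WARNING"
    return status, warnings


if __name__ == "__main__":
    sample = """
       Loss:   -0.1149
       Reward: -0.0961
       Ratio:  1.0002
       Pos/Neg Separation: -0.1387
    """
    values = parse_latest_metrics(sample)
    metrics = StepMetrics(
        step=55,
        loss=values["Loss"],
        reward=values["Reward"],
        ratio=values["Ratio"],
        separation=values["Pos/Neg Separation"],
    )
    status, warnings = check_health(metrics)
    print(f"Step {metrics.step}: {status} ({len(warnings)} warnings)")
```

With these assumed thresholds, the step 55 values shown above are classified as HEALTHY with no warnings. The real script may use different rules and cutoffs.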

@dmahan93

how does this run? it looks very manual to setup 😅

@samherring99
Author

> how does this run? it looks very manual to setup 😅

I've updated this with a single top-level script; the only manual change now is setting the venv paths.
