Conversation

@simonsays1980
Contributor

Description

The StatelessCartPole example from APPO is timing out. This could be due to the latest changes in the APPO data pipeline. This PR modifies the setup of the example by using the new APPO attributes.

Related issues

Fixes https://buildkite.com/ray-project/postmerge/builds/15188#019b8f6e-2850-465e-a98c-63c29fbf98f7/L4702

Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
@simonsays1980 simonsays1980 requested a review from a team as a code owner January 6, 2026 10:45
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request updates the StatelessCartPole APPO example to address a timeout issue, likely by using new APPO configuration attributes. The changes involve enabling MeanStdFilter for observation normalization and adjusting training parameters by setting use_circular_buffer to False and increasing broadcast_interval. My main feedback is regarding a leftover TODO comment, which could cause confusion about the stability of MeanStdFilter.

Comment on lines +27 to +29
.env_runners(
    env_to_module_connector=lambda env, spaces, device: MeanStdFilter(),
)
Contributor

medium

You've enabled MeanStdFilter here, but the TODO comment on lines 25-26 mentions that it might cause NaNs during training. If this issue has been resolved, it would be great to remove the TODO comment to avoid confusion for future readers. If the issue is still present, perhaps a note explaining why it's being enabled despite the potential problem would be helpful.
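For readers unfamiliar with observation normalization, here is a minimal sketch of the kind of running mean/std filtering that a connector like MeanStdFilter performs. It is not RLlib's implementation: the class name RunningMeanStd and the eps parameter are hypothetical and chosen only for illustration. The sketch also shows one common way such a filter can yield non-finite values (dividing by a zero standard deviation when observations are constant) and guards against it with a small epsilon. Whether that is the cause the TODO refers to is not stated in this thread.

```python
import math


class RunningMeanStd:
    """Hypothetical illustration: running mean/std via Welford's algorithm."""

    def __init__(self, eps=1e-8):
        self.count = 0    # number of samples seen so far
        self.mean = 0.0   # running mean
        self.m2 = 0.0     # running sum of squared deviations from the mean
        self.eps = eps    # assumed guard against division by zero

    def update(self, x):
        """Fold one scalar observation into the running statistics."""
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    def std(self):
        """Sample standard deviation; 0.0 until two samples have been seen."""
        if self.count < 2:
            return 0.0
        return math.sqrt(self.m2 / (self.count - 1))

    def normalize(self, x):
        """Center and scale x; eps keeps the result finite when std is 0."""
        return (x - self.mean) / (self.std() + self.eps)
```

With a constant observation stream the standard deviation stays at zero, so the epsilon term is the only thing keeping normalize() from dividing by zero.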

@ray-gardener ray-gardener bot added the rllib label Jan 6, 2026
Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
@simonsays1980 simonsays1980 added the rllib-algorithms, rllib-system, and go labels Jan 6, 2026
@simonsays1980 simonsays1980 self-assigned this Jan 6, 2026
Member

@pseudo-rnd-thoughts pseudo-rnd-thoughts left a comment

LSTM

@simonsays1980 simonsays1980 enabled auto-merge (squash) January 6, 2026 16:51
@simonsays1980 simonsays1980 merged commit b22f6a6 into ray-project:master Jan 6, 2026
7 of 8 checks passed