[skyrl-train] Integrate Ray Core RDT for weight syncing #977

@erictang000

Description

Integrate Ray Direct Transport (RDT) as a new weight syncing backend: https://docs.ray.io/en/latest/ray-core/direct-transport.html

An additional milestone, once RDT is integrated, is to use it to dynamically add inference engines to the rollout pool (for failure handling, autoscaling, etc.), since we will be able to initiate a weight sync without forming a new NCCL group.
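To make the milestone concrete, here is a rough toy model of the difference. It does not call Ray or NCCL and is not skyrl-train's actual interface: `WeightSyncBackend`, `GroupBackend`, `PointToPointBackend`, `group_inits`, and the engine names are all hypothetical, chosen only for illustration. It assumes a collective-style backend has to rebuild its group whenever membership changes, while a point-to-point backend (the role RDT would play) can push weights to a newly added engine directly.

```python
from abc import ABC, abstractmethod
from typing import Dict, List

Weights = Dict[str, List[float]]


class WeightSyncBackend(ABC):
    """Hypothetical backend interface (illustration only, not skyrl-train's API)."""

    def __init__(self) -> None:
        self.engines: Dict[str, Weights] = {}  # engine id -> weights it holds
        self.group_inits = 0  # how many times a process group was (re)built

    @abstractmethod
    def add_engine(self, engine_id: str) -> None:
        ...

    def sync(self, weights: Weights) -> None:
        # Push the latest weights to every engine currently in the pool.
        for store in self.engines.values():
            store.update(weights)


class GroupBackend(WeightSyncBackend):
    """Models a collective backend: a membership change rebuilds the group."""

    def add_engine(self, engine_id: str) -> None:
        self.engines[engine_id] = {}
        self.group_inits += 1  # assumed cost: new group for the new member


class PointToPointBackend(WeightSyncBackend):
    """Models a direct-transfer backend: no group to rebuild."""

    def add_engine(self, engine_id: str) -> None:
        self.engines[engine_id] = {}


def run(backend: WeightSyncBackend) -> WeightSyncBackend:
    backend.add_engine("engine-0")
    backend.add_engine("engine-1")
    backend.sync({"layer.weight": [0.1, 0.2]})
    backend.add_engine("engine-2")  # joins mid-training (autoscaling/recovery)
    backend.sync({"layer.weight": [0.3, 0.4]})
    return backend


grouped = run(GroupBackend())
direct = run(PointToPointBackend())
```

In this model both pools end up with the same weights on `engine-2`, but only the group-based one pays a group rebuild per membership change, which is the cost the milestone aims to avoid.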
