Pipeline Parallelism (Supported? How to?) #827

@casper-hansen

🚀 Feature Request

Supporting tensor parallelism (TP) and sequence parallelism (SP) seems straightforward with the `replication` parameter:

replication = tp * sp
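To make the intent concrete, here is a minimal sketch of how I understand the formula above. It assumes that `replication` makes each block of consecutive global ranks read the same samples; `sample_group` is a hypothetical helper for illustration, not part of the streaming API.

```python
def sample_group(rank: int, replication: int) -> int:
    """Index of the block of consecutive ranks that share one sample partition.

    Assumption: replication groups consecutive global ranks.
    """
    if replication < 1:
        raise ValueError("replication must be >= 1")
    return rank // replication


tp, sp = 2, 2
replication = tp * sp  # 4 ranks per block
world_size = 8

groups = [sample_group(rank, replication) for rank in range(world_size)]
print(groups)  # [0, 0, 0, 0, 1, 1, 1, 1]
```

With 8 ranks this gives two blocks of 4, so every TP/SP peer inside a block would see the same batch.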

I have tried several ways to enable pipeline parallelism (PP) without success. For example, I tried including `pp` when computing `replication` and `num_canonical_nodes`, but training then shows an unexpectedly high loss.
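One possible explanation, which I have not confirmed, is that PP ranks are not consecutive in the global rank order. The sketch below assumes a row-major device mesh ordered `(pp, dp, tp)` with `pp` outermost, and assumes `replication` groups consecutive global ranks. Both are assumptions about my setup, and `mesh_coords` and `dp_indices_per_group` are hypothetical helpers.

```python
from itertools import product


def mesh_coords(pp: int, dp: int, tp: int) -> dict[int, tuple[int, int, int]]:
    """Map global rank -> (pp, dp, tp) coordinates.

    Assumption: row-major mesh with pp as the outermost dimension.
    """
    return dict(enumerate(product(range(pp), range(dp), range(tp))))


def dp_indices_per_group(pp: int, dp: int, tp: int, replication: int) -> dict[int, set[int]]:
    """For each block of consecutive ranks, collect the dp indices it contains."""
    groups: dict[int, set[int]] = {}
    for rank, (_, d, _) in mesh_coords(pp, dp, tp).items():
        groups.setdefault(rank // replication, set()).add(d)
    return groups


pp, dp, tp = 2, 2, 2
# replication = tp: each block holds a single dp index
print(dp_indices_per_group(pp, dp, tp, replication=tp))
# replication = pp * tp: each block mixes both dp indices
print(dp_indices_per_group(pp, dp, tp, replication=pp * tp))
```

Under these assumptions, folding `pp` into `replication` puts ranks from different data-parallel replicas into the same block, while leaving it out gives the two pipeline stages of one replica different blocks. If that matches the real layout, the sample assignment may need to follow the mesh coordinates rather than the raw rank order.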

Motivation

I want to use the MosaicML Streaming library with 4D parallelism. Specifically, I use TorchTitan as my training framework and have swapped in the Streaming library by adapting the `StreamingTextDataset` implementation from LLM Foundry.
