🚀 Feature Request
Supporting TP and SP seems quite easy with the `replication` parameter.
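The TP/SP case can be sketched as follows. This is a minimal illustration, assuming the TP and SP dimensions are the innermost dimensions of the device mesh, so that each TP/SP group occupies consecutive global ranks (`replication` makes consecutive devices receive the same samples). `ParallelDims`, `replication_for`, and `dp_index` are hypothetical helpers, not part of streaming or TorchTitan. The sketch covers only TP/SP and does not address PP.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParallelDims:
    """Hypothetical container for parallelism degrees (not a streaming or TorchTitan API)."""

    dp: int
    tp: int = 1
    sp: int = 1

    @property
    def world_size(self) -> int:
        return self.dp * self.tp * self.sp


def replication_for(dims: ParallelDims) -> int:
    # All ranks in one TP/SP group must read identical samples, so the whole
    # group forms one replication block. Assumes TP and SP are the innermost
    # mesh dimensions, i.e. each block spans consecutive global ranks.
    return dims.tp * dims.sp


def dp_index(rank: int, dims: ParallelDims) -> int:
    # With TP/SP innermost, every rank in a block maps to the same DP index,
    # i.e. the same data partition.
    return rank // replication_for(dims)


dims = ParallelDims(dp=2, tp=2, sp=2)
print(replication_for(dims))  # 4
print([dp_index(r, dims) for r in range(dims.world_size)])  # [0, 0, 0, 0, 1, 1, 1, 1]
```

The resulting value would then be passed as `replication=` when constructing the `StreamingDataset`.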
However, I have tried various ways to enable PP without success. For example, I tried including the PP degree when computing `replication` and `num_canonical_nodes`, but training then shows an unexpectedly high loss.
Motivation
I want to use the mosaicml streaming library with 4D parallelism. Specifically, I use TorchTitan as my training framework and have swapped in the mosaicml streaming library by adapting the `StreamingTextDataset` implementation from LLM Foundry.