# DisTrO

[![GitHub stars](https://img.shields.io/github/stars/NousResearch/DisTrO?style=social)](https://github.com/NousResearch/DisTrO/stargazers)
[![GitHub forks](https://img.shields.io/github/forks/NousResearch/DisTrO?style=social)](https://github.com/NousResearch/DisTrO/network/members)

**DisTrO (Distributed Training Over-The-Internet)** is a family of low-latency distributed optimizers that reduce inter-GPU communication requirements by **three to four orders of magnitude**.

## 🚀 Quick Start

### Prerequisites

- Python 3.8+
- PyTorch 2.0+
- CUDA-compatible GPU(s)

### Installation

```bash
# Clone the repository
git clone https://github.com/NousResearch/DisTrO.git
cd DisTrO

# Install PyTorch (see pytorch.org for the build matching your CUDA version)
pip install torch torchvision
```

### Usage

For implementation details and integration with your training pipeline, refer to:
- [DeMo Optimization Code](https://github.com/bloc97/DeMo) - Original seed research implementation
- [Production Code (Rust)](https://github.com/PsycheFoundation/psyche/blob/b13ff76f879796a071850bae2d82084f360d608d/shared/modeling/src/distro.rs) - Production-ready implementation
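
The linked codebases are the place to start for real integrations. As a toy illustration of the core idea they build on (transmitting only a small, information-dense summary of each gradient instead of the full tensor), here is a hedged pure-Python sketch of top-k sparsification. This is *not* the actual DeMo/DisTrO algorithm, which uses DCT-based momentum decoupling; all names below are illustrative.

```python
def topk_compress(grad, k):
    """Keep only the k largest-magnitude entries of a gradient vector.

    Returns (index, value) pairs -- the only data a worker would send.
    """
    order = sorted(range(len(grad)), key=lambda i: abs(grad[i]), reverse=True)
    return [(i, grad[i]) for i in sorted(order[:k])]


def topk_decompress(pairs, n):
    """Rebuild a dense gradient from the transmitted pairs (zeros elsewhere)."""
    dense = [0.0] * n
    for i, v in pairs:
        dense[i] = v
    return dense


grad = [0.01, -3.0, 0.002, 2.5, -0.04, 0.5]
sent = topk_compress(grad, k=2)          # only 2 of 6 entries travel
restored = topk_decompress(sent, len(grad))
```

In a real run, each worker would send its `sent` pairs over the network and apply the decompressed updates locally; the actual schemes in the papers above compress far more aggressively and more cleverly than plain top-k.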

## 📚 Research & Publications

| Date | Release | Links |
|------|---------|-------|
| Aug. 26, 2024 | DisTrO Preliminary Report | [📄 PDF](https://github.com/NousResearch/DisTrO/raw/main/A_Preliminary_Report_on_DisTrO.pdf) |
| Dec. 2, 2024 | DeMo Optimization | [📄 Paper](https://arxiv.org/abs/2411.19870) • [💻 Code](https://github.com/bloc97/DeMo) |
| Dec. 2, 2024 | Nous 15B Model Training | [🌐 Website](https://distro.nousresearch.com/) |
| May 14, 2025 | Psyche Network | [📢 Announcement](https://nousresearch.com/nous-psyche/) |
| May 14, 2025 | Nous Consilience 40B LLM | [🔗 Run](https://psyche.network/runs/consilience-40b-1/0) • [🤗 HuggingFace](https://huggingface.co/PsycheFoundation/consilience-40b-7Y9v38s5) |
| Oct. 14, 2025 | DeMo Optimization v2 | [📄 Paper](https://openreview.net/pdf?id=U9oewpa7cn) • [💻 Production Code](https://github.com/PsycheFoundation/psyche/blob/b13ff76f879796a071850bae2d82084f360d608d/shared/modeling/src/distro.rs) |

## 🤝 Community

Interested in helping research and build the future of distributed training?

[![Discord](https://img.shields.io/badge/Discord-Join%20Us-7289da?logo=discord&logoColor=white)](https://discord.com/invite/jqVphNsB4H)

## 📖 About

DisTrO enables efficient distributed training across the internet by dramatically reducing the bandwidth requirements for gradient synchronization. This makes it possible to train large models across geographically distributed hardware without the need for expensive high-bandwidth interconnects.
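
To put the bandwidth claim in concrete terms, a back-of-the-envelope calculation (illustrative arithmetic, not measured figures): a full fp32 gradient synchronization for a 15B-parameter model, like the one Nous trained with DisTrO, moves about 60 GB per step, and a 10,000x reduction brings that down to roughly 6 MB, which fits a consumer internet connection.

```python
params = 15e9                  # 15B-parameter model (as in the Nous DisTrO run)
bytes_per_param = 4            # fp32 gradients

dense_gb = params * bytes_per_param / 1e9            # full sync per step
reduced_mb = dense_gb * 1e9 / 10_000 / 1e6           # with a 10,000x reduction

print(f"dense sync: {dense_gb:.0f} GB, compressed: {reduced_mb:.0f} MB")
```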

### Key Features

- **Low Latency**: Optimized for training over standard internet connections
- **Massive Bandwidth Reduction**: 1000-10000x reduction in inter-GPU communication
- **Scalable**: Designed for large-scale distributed training across diverse hardware
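
Aggressive compression alone would permanently discard most of the gradient. A standard companion technique in compressed-gradient training is error feedback: each worker accumulates the untransmitted residual locally so dropped entries are retried in later rounds rather than lost. The sketch below is hedged and illustrative; DeMo's actual scheme (decoupled momentum with a DCT transform) differs, and the class name is hypothetical.

```python
class ErrorFeedbackTopK:
    """Top-k compressor with a local residual buffer, so gradient entries
    that are not transmitted this round are carried over to the next
    (illustrative sketch, not the DeMo/DisTrO algorithm)."""

    def __init__(self, n, k):
        self.residual = [0.0] * n
        self.k = k

    def compress(self, grad):
        # fold in what previous rounds failed to transmit
        full = [g + r for g, r in zip(grad, self.residual)]
        order = sorted(range(len(full)), key=lambda i: abs(full[i]), reverse=True)
        kept = set(order[:self.k])
        # everything not sent stays in the residual for next time
        self.residual = [0.0 if i in kept else v for i, v in enumerate(full)]
        return [(i, full[i]) for i in sorted(kept)]


comp = ErrorFeedbackTopK(n=3, k=1)
first = comp.compress([1.0, 0.4, 0.3])   # largest entry goes out first
second = comp.compress([0.0, 0.4, 0.0])  # carried residual makes index 1 win
```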