
Hydra

This folder contains the toy-scale prototype code accompanying the Hydra paper:

Hydra: A 1.6B-Parameter State-Space Language Model with Sparse Attention, Mixture-of-Experts, and Memory

Paper: https://arxiv.org/abs/2508.15099

Important: These scripts are illustrative and operate at small scale on synthetic data to validate integration and scaling trends. They are not a full 1.6B model training pipeline.

Contents

  • toy_hydra.py — Minimal Hydra model (SSM + sparse attention + MoE) used in benchmarks.
  • ssm_kernels.py — Placeholder selective-scan SSM; the fast surrogate lives in toy_hydra.py.
  • workspace_memory.py — Latent workspace memory (read/write) toy implementation.
  • pkm_memory.py — Product-Key Memory (PKM) toy layer and helpers.
  • fairness_benchmark.py — Training loop and short-context throughput benchmark on synthetic tasks.
  • run_long_context.py — Long-context throughput/memory runs (1k–16k tokens).
  • speedup_summary.py — Aggregates benchmark outputs and renders speedup tables.
  • plot_results.py — Plots figures from consolidated results.

Outputs are written to a results/ folder (see paper’s Reproducibility section for exact filenames).
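For orientation, here is a minimal NumPy sketch of the product-key lookup idea behind pkm_memory.py: each half of the query scores a small sub-key table, the two top-k candidate sets are combined over their Cartesian product, and the selected memory values are mixed with softmax weights. The function name, table shapes, and scoring here are illustrative assumptions, not the toy layer's actual implementation.

```python
import numpy as np

def pkm_lookup(query, subkeys_a, subkeys_b, values, k=4):
    """Illustrative product-key lookup (hypothetical, not pkm_memory.py).

    query:     (2d,)  — split into two halves of size d
    subkeys_*: (n, d) — two sub-key tables; full keys are their product
    values:    (n*n, d_v) — one value slot per (sub-key a, sub-key b) pair
    """
    d = query.shape[0] // 2
    qa, qb = query[:d], query[d:]
    sa = subkeys_a @ qa                      # (n,) scores for the first half
    sb = subkeys_b @ qb                      # (n,) scores for the second half
    ta = np.argsort(sa)[-k:]                 # top-k sub-keys per half
    tb = np.argsort(sb)[-k:]
    # Cartesian product of the two candidate sets: k*k full-key scores
    scores = sa[ta][:, None] + sb[tb][None, :]
    flat = scores.ravel()
    top = np.argsort(flat)[-k:]              # best k full keys overall
    idx_a = ta[top // k]
    idx_b = tb[top % k]
    slot = idx_a * subkeys_b.shape[0] + idx_b  # index into the n*n value table
    w = np.exp(flat[top] - flat[top].max())    # softmax over selected scores
    w /= w.sum()
    return (w[:, None] * values[slot]).sum(axis=0)
```

The two-table factorization is what makes PKM cheap: scoring 2n sub-keys stands in for scoring n² full keys.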

Requirements

  • Python 3.11
  • PyTorch ≥ 2.1
  • NumPy
  • Matplotlib (for plotting)

A GPU is optional (CPU runs work but are slower); CUDA is needed only for the long-context throughput experiments reported in the paper.

Quickstart

Create an environment and install dependencies, then run any of the scripts below from this folder.

Examples:

  • Run benchmark + training (default)

    python fairness_benchmark.py
  • Long-context benchmark (1k–16k)

    python run_long_context.py
  • Aggregate speedups and render a markdown table/JSON

    python speedup_summary.py
  • Plot figures from results/

    python plot_results.py
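As a rough illustration of what the aggregation step produces, the sketch below turns throughput numbers into a markdown speedup table. The function name, input shape, and baseline label are hypothetical — see speedup_summary.py for the actual aggregation and output format.

```python
def speedup_table(tokens_per_sec, baseline="transformer"):
    """Render a small markdown speedup table (illustrative only).

    tokens_per_sec: dict mapping model name -> measured tokens/s.
    Speedups are computed relative to the `baseline` entry.
    """
    base = tokens_per_sec[baseline]
    lines = ["| model | tokens/s | speedup |", "|---|---|---|"]
    for name, tps in sorted(tokens_per_sec.items()):
        lines.append(f"| {name} | {tps:.0f} | {tps / base:.2f}x |")
    return "\n".join(lines)
```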

Notes:

  • Default toy config uses model width d=256, 8 blocks, sparse attention every 4th block, MoE on even-indexed blocks, and Top-2 expert routing.
  • Scripts generate CSV/JSON artifacts such as throughput_summary.csv, speedup_summary.json, train_losses.csv in results/.
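The Top-2 routing mentioned above can be sketched as follows: each token's gate logits are reduced to its two highest-scoring experts, whose softmax weights are renormalized to sum to one. This is a generic NumPy sketch of Top-2 gating, not the gate in toy_hydra.py.

```python
import numpy as np

def top2_route(logits):
    """Top-2 gating sketch: pick the two highest-scoring experts per
    token and renormalize their softmax weights.

    logits: (..., n_experts) gate scores.
    Returns (indices, weights), each of shape (..., 2).
    """
    top2 = np.argsort(logits, axis=-1)[..., -2:]          # best 2 experts per token
    picked = np.take_along_axis(logits, top2, axis=-1)    # their raw scores
    w = np.exp(picked - picked.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)                    # renormalized softmax
    return top2, w
```

Routing only two experts per token keeps the per-token FLOPs roughly constant as the expert count grows, which is the point of the MoE blocks in the toy model.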

Citation

If you use this code, please cite the paper:

Hydra: A 1.6B-Parameter State-Space Language Model with Sparse Attention, Mixture-of-Experts, and Memory

https://arxiv.org/abs/2508.15099
