AI-powered Beat Saber track generator — feed in a song, get a playable custom map.
Beat Weaver uses machine learning to automatically generate Beat Saber note maps from audio files. Instead of manually placing blocks, you provide a song and the model outputs block positions, orientations, and timing for both sabers.
- Audio-to-map generation — provide an audio file, get a playable v2 Beat Saber map (BPM auto-detected or manual)
- Difficulty selection — generate for Easy, Normal, Hard, Expert, or ExpertPlus
- Seeded generation — use a fixed seed for repeatable tracks, or randomize for variety
- Grammar-constrained decoding — generated maps always follow valid Beat Saber structure
- Quality metrics — onset F1, parity violations, NPS accuracy, beat alignment, pattern diversity
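Onset F1, for example, scores how closely generated note times line up with reference onsets. A minimal sketch of tolerance-window matching (illustrative only, not the project's actual metric code):

```python
def onset_f1(pred_times, ref_times, tol=0.05):
    """F1 between predicted and reference onset times, in seconds.

    Greedy one-to-one matching within a +/- tol window; the real
    evaluator may match differently.
    """
    matched = set()
    tp = 0
    for p in sorted(pred_times):
        for i, r in enumerate(sorted(ref_times)):
            if i not in matched and abs(p - r) <= tol:
                matched.add(i)
                tp += 1
                break
    precision = tp / len(pred_times) if pred_times else 0.0
    recall = tp / len(ref_times) if ref_times else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```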
- Python 3.11+
- For training: NVIDIA GPU with CUDA support
  - Medium conformer (9.4M params): 8GB+ VRAM
  - Large conformer (62M params): 24GB+ VRAM
- Beat Saber installation (Steam) — only needed for extracting official maps
git clone https://github.com/asfilion/beat-weaver.git
cd beat-weaver
# Core (data pipeline only)
pip install -e .
# With ML model dependencies (required for training and generation)
pip install -e ".[ml]"
# Development (adds pytest)
pip install -e ".[ml,dev]"

This is the end-to-end workflow from a fresh clone to a trained model.
Download community maps from BeatSaver. This downloads maps with a rating score >= 0.75 and >= 5 upvotes. The download is resumable — you can stop and restart without losing progress.
# Download ~55K community maps (this takes several hours)
beat-weaver download --min-score 0.75 --output data/raw/beatsaver

If you have Beat Saber installed via Steam, you can also extract the 214 official/DLC maps. These are higher quality and weighted at 20% of each training batch.
# Windows (default Steam path)
beat-weaver extract-official --output data/raw/official
# Custom install path
beat-weaver extract-official --beat-saber "/path/to/Beat Saber" --output data/raw/official

Parse all downloaded/extracted maps into a normalized Parquet format for training.
beat-weaver process --input data/raw --output data/processed

Create a JSON mapping from song hash to audio file path. This tells the training pipeline where to find each song's audio.
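The manifest is keyed by song hash; an entry might look like this (the hash and path below are made up for illustration):

```json
{
  "2f8a0cd7b1e34d59a6c0f4e8b7d21c3a9e5f6b08": "data/raw/beatsaver/2f8a0cd7/song.ogg"
}
```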
beat-weaver build-manifest --input data/raw --output data/audio_manifest.json

Choose a config based on your hardware:
| Config | Params | VRAM | File |
|---|---|---|---|
| Small | 1M | 4GB | configs/small.json |
| Medium | 6.5M | 6GB | configs/medium.json |
| Medium Conformer | 9.4M | 8GB | configs/medium_conformer.json |
| Large Conformer | 62M | 24GB+ | configs/large_conformer.json |
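Each config is a plain JSON file; use_conformer (mentioned in the architecture notes later in this README) switches between the Conformer and plain Transformer encoder. The other field names here are illustrative guesses, so check the shipped configs for the real schema:

```json
{
  "use_conformer": true,
  "d_model": 384,
  "num_layers": 8,
  "num_heads": 6
}
```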
# Train with the large conformer config (recommended if you have 24GB+ VRAM)
beat-weaver train \
--config configs/large_conformer.json \
--audio-manifest data/audio_manifest.json \
--data data/processed \
--output output/training
# Or with the medium conformer for 8GB GPUs
beat-weaver train \
--config configs/medium_conformer.json \
--audio-manifest data/audio_manifest.json \
--data data/processed \
--output output/training

On the first run, mel spectrograms are pre-computed and cached to data/processed/mel_cache/ (~30GB for 23K songs, takes ~25 minutes). Subsequent runs reuse the cache.
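The cache follows a standard compute-once pattern; a sketch of the idea (the file naming and directory layout here are illustrative, not the project's actual cache code):

```python
import hashlib
from pathlib import Path

import numpy as np

CACHE_DIR = Path("data/processed/mel_cache")

def cached_mel(audio_path, compute_mel):
    """Return the spectrogram for audio_path, computing it at most once.

    compute_mel is whatever function produces the log-mel array; on a
    cache hit the saved .npy file is loaded instead.
    """
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    key = hashlib.sha1(str(audio_path).encode()).hexdigest()
    cache_file = CACHE_DIR / f"{key}.npy"
    if cache_file.exists():
        return np.load(cache_file)
    mel = compute_mel(audio_path)
    np.save(cache_file, mel)
    return mel
```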
Training logs to TensorBoard:
tensorboard --logdir output/training/tensorboard

Always resume from the best/ checkpoint (never from numbered epoch checkpoints, which may be overwritten during training).
beat-weaver train \
--config configs/large_conformer.json \
--audio-manifest data/audio_manifest.json \
--data data/processed \
--output output/training \
--resume output/training/checkpoints/best

# BPM is auto-detected from the audio
beat-weaver generate \
--checkpoint output/training/checkpoints/best \
--audio song.ogg \
--difficulty Expert \
--output my_map/
# With explicit BPM and seed for reproducibility
beat-weaver generate \
--checkpoint output/training/checkpoints/best \
--audio song.ogg \
--difficulty ExpertPlus \
--bpm 128 \
--seed 42 \
--output my_map/

The output folder can be copied directly to Beat Saber_Data/CustomLevels/ to play in-game.
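For reference, the generated v2 difficulty file represents each note with a beat time, a position on the 4x3 grid, a color, and a cut direction. An abridged example with illustrative values:

```json
{
  "_version": "2.0.0",
  "_notes": [
    { "_time": 8.0, "_lineIndex": 1, "_lineLayer": 0, "_type": 0, "_cutDirection": 1 }
  ],
  "_obstacles": [],
  "_events": []
}
```

Here _type 0/1 selects the left (red) or right (blue) saber, and _cutDirection 0-8 encodes the swing direction (8 being a dot note).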
beat-weaver evaluate \
--checkpoint output/training/checkpoints/best \
--audio-manifest data/audio_manifest.json \
--data data/processed

If you want to download, extract, process, and build the manifest in a single command:
beat-weaver run --beat-saber "/path/to/Beat Saber" --output data/processed

Note: this runs with conservative defaults (--max-maps 100). For full training data, use the individual steps above.
An encoder-decoder model that takes a log-mel spectrogram as input and generates a sequence of beat-quantized tokens representing note placements.
Audio (mel spectrogram + onset) -> [Conformer Encoder] -> [Token Decoder] -> Token Sequence -> v2 Beat Saber Map
- Tokenizer: 291-token vocabulary encoding difficulty, bar structure, beat positions, and compound note placements (position + direction per hand)
- Encoder: Linear projection + RoPE + Conformer blocks (FFN/2 + self-attention + depthwise conv + FFN/2 + LayerNorm). Falls back to a standard Transformer with use_conformer=false.
- Decoder: Token embedding + RoPE + Transformer decoder with cross-attention to encoder
- Audio features: Log-mel spectrogram (80 bins) with onset strength channel
- Training: AdamW + cosine LR, mixed-precision (fp16), SpecAugment, color balance loss, dataset filtering by difficulty/characteristic/BPM, weighted sampling (official maps oversampled)
- Inference: Autoregressive generation with grammar constraints ensuring valid map structure. Windowed generation with overlap stitching for songs of any length.
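Grammar-constrained decoding means that at each step the decoder's logits are masked so that only tokens the map grammar allows can be emitted. A toy sketch of the masking step (the token ids and transition table below are made up; the real vocabulary and grammar are far richer):

```python
import numpy as np

def constrained_argmax(logits, valid_ids):
    """Pick the best-scoring token among those the grammar allows.

    logits: np.ndarray of shape (vocab_size,)
    valid_ids: set of ids permitted as the next token
    """
    mask = np.full(logits.shape, -np.inf)
    mask[list(valid_ids)] = 0.0  # forbidden tokens stay at -inf
    return int(np.argmax(logits + mask))

# Toy grammar: which token kinds may follow which (ids are made up).
BAR, BEAT, NOTE = 0, 1, 2
NEXT = {BAR: {BEAT}, BEAT: {NOTE, BEAT}, NOTE: {NOTE, BEAT, BAR}}

def decode(step_logits, start=BAR):
    """Greedy autoregressive decode under the toy grammar."""
    tokens = [start]
    for logits in step_logits:
        tokens.append(constrained_argmax(logits, NEXT[tokens[-1]]))
    return tokens
```

Even when the model's raw argmax would be an invalid token, the mask forces a structurally legal choice, which is why generated maps always parse.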
See RESEARCH.md for research details and plans/ for implementation plans.
- Data pipeline — complete (parsers for v2/v3/v4 maps, BeatSaver downloader, Unity extractor, Parquet storage)
- ML model — complete (tokenizer, audio preprocessing, Conformer/Transformer encoder, training loop, inference, exporter, evaluation)
- Baseline training — complete (small model: 16 epochs, 23K songs, 60.6% token accuracy, generates playable maps)
- Model improvements — complete (dataset filtering, SpecAugment, onset features, RoPE, color balance loss, Conformer encoder)
- Conformer training — complete (9.4M params, best val_loss=2.23, 59.4% accuracy at epoch 26, Expert+ only)
# Run all tests (178 total; ML tests auto-skip without ML deps)
python -m pytest tests/ -v