2026.02
- Long-form generation (`generate_audio_long`): pass arbitrarily long text and it auto-splits at sentence boundaries (using BERT tokenizer counts), then runs all chunks through the batched pipeline
- Streaming output: `generate_audio_long(stream=True)` returns a generator that yields audio segments as they decode, giving low time-to-first-audio
  - Semantic tokens are generated for all text chunks in one batched pass, split into ~3 s sub-segments, then streamed through the coarse, fine, and codec stages
- Native audio playback: the `--play` flag streams audio directly to desktop speakers via `sounddevice` as it generates (no external tools needed)
- Interactive REPL: `--interactive` keeps compiled models warm between runs for fast iteration
- Non-blocking I/O: file writes and audio playback run on background threads, so generation is never stalled by slow consumers (including FIFOs)
- True batched pipeline: all four stages (semantic, coarse, fine, codec) now run natively with batch size > 1 instead of sequentially per prompt
- Performance: top-p filtering runs entirely on GPU (no CPU round-trip), removed unnecessary `cuda.synchronize()` calls, vectorised codebook flattening
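The long-form splitting described above can be pictured as a greedy packer: sentences are kept whole and appended to the current chunk until adding the next one would exceed a token budget. This is a minimal sketch, not the project's implementation. `split_long_text`, `count_tokens`, and the `max_tokens` default are hypothetical, the regex sentence splitter is a simple heuristic, and whitespace word counts stand in for the BERT tokenizer counts the pipeline actually uses.

```python
import re
from typing import Callable, List


def count_tokens(text: str) -> int:
    # Hypothetical stand-in: counts whitespace-separated words,
    # not BERT tokens as the real pipeline does.
    return len(text.split())


def split_long_text(
    text: str,
    max_tokens: int = 50,
    count: Callable[[str], int] = count_tokens,
) -> List[str]:
    """Greedily pack whole sentences into chunks of at most max_tokens."""
    # Split after '.', '!' or '?' followed by whitespace (simple heuristic).
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]

    chunks: List[str] = []
    current: List[str] = []
    current_tokens = 0
    for sentence in sentences:
        n = count(sentence)
        # Close the current chunk if this sentence would push it over budget.
        if current and current_tokens + n > max_tokens:
            chunks.append(" ".join(current))
            current, current_tokens = [], 0
        current.append(sentence)
        current_tokens += n
    if current:
        chunks.append(" ".join(current))
    return chunks


print(split_long_text("One two three. Four five. Six seven eight nine.", max_tokens=5))
```

In this sketch a single sentence longer than `max_tokens` still becomes its own oversized chunk; a fuller version might split such sentences further at commas or word boundaries.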
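The non-blocking I/O item amounts to a producer/consumer split: the generation loop hands each decoded segment to a queue and returns immediately, while a worker thread does the slow writing or playback. A minimal sketch of that pattern using only the standard library; `BackgroundSink` is a hypothetical name and this is not the project's code.

```python
import queue
import threading
from typing import Any, Callable

_STOP = object()  # sentinel telling the worker thread to exit


class BackgroundSink:
    """Run a slow consumer (file write, playback) on its own thread.

    put() never blocks because the queue is unbounded (maxsize=0), so the
    producer (the generation loop) is not stalled by a slow consumer.
    """

    def __init__(self, consume: Callable[[Any], None]) -> None:
        self._queue = queue.Queue()  # unbounded FIFO
        self._consume = consume
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self) -> None:
        # Drain items in order until the sentinel arrives.
        while True:
            item = self._queue.get()
            if item is _STOP:
                break
            self._consume(item)

    def put(self, item: Any) -> None:
        self._queue.put(item)

    def close(self) -> None:
        """Signal end of stream and wait for all queued items to be consumed."""
        self._queue.put(_STOP)
        self._thread.join()


received = []
sink = BackgroundSink(received.append)
for segment in ["seg0", "seg1", "seg2"]:  # stand-ins for audio segments
    sink.put(segment)
sink.close()
print(received)
```

The trade-off of an unbounded queue is memory: if the consumer is persistently slower than generation, queued segments accumulate until `close()` drains them.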
2025.05.17
- Multi-GPU inference example added: runs a large script across GPUs and combines the outputs at the end
- Uses torch.compile with max-autotune across all models by default
- Uses padded attention masks to support batched inference
- SageAttention will be used automatically if installed
- Compatible with batch inference and torch compile
On a 3x 4090 system, this can bring long inference job runtimes down from 2-5 minutes to 30-60 seconds.
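Batching prompts of different lengths means padding them to a common length and masking the padding out of attention. A minimal sketch of that bookkeeping with plain Python lists; `pad_batch` and its `side` parameter are hypothetical, and whether Bark pads on the left or the right is an assumption here. Left padding is the usual choice for batched autoregressive decoding because it keeps every prompt's last real token at the same position.

```python
from typing import List, Sequence, Tuple


def pad_batch(
    seqs: Sequence[Sequence[int]], pad_id: int, side: str = "left"
) -> Tuple[List[List[int]], List[List[int]]]:
    """Pad token sequences to a common length and build attention masks.

    Mask value 1 marks a real token; 0 marks padding that attention ignores.
    """
    if not seqs:
        return [], []
    max_len = max(len(s) for s in seqs)
    ids: List[List[int]] = []
    mask: List[List[int]] = []
    for s in seqs:
        n_pad = max_len - len(s)
        padding, real = [pad_id] * n_pad, list(s)
        if side == "left":
            ids.append(padding + real)
            mask.append([0] * n_pad + [1] * len(s))
        elif side == "right":
            ids.append(real + padding)
            mask.append([1] * len(s) + [0] * n_pad)
        else:
            raise ValueError("side must be 'left' or 'right'")
    return ids, mask


ids, mask = pad_batch([[5, 6, 7], [8]], pad_id=0)
print(ids, mask)
```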
Bark is licensed under the MIT License, meaning it's now available for commercial use!
```bash
git clone https://github.com/bghira/bghira-bark
cd bghira-bark
python3.12 -m venv .venv
. .venv/bin/activate
pip install -e .
pip install sounddevice  # optional, for --play
```

```bash
# One-shot generation (long text auto-splits):
python examples/batch.py -t "Your long text here..." -v en_speaker_6
# With live audio playback:
python examples/batch.py -t "Hello world." -v en_speaker_6 --play
# Interactive REPL (models stay compiled between runs):
python examples/batch.py --interactive --play -v en_speaker_6
# Multiple explicit prompts (batched):
python examples/batch.py -t "Hello|World|Goodbye" -v en_speaker_6
```

```python
from bark import generate_audio_long
from bark.generation import SAMPLE_RATE, preload_models

# Load all model weights up front.
preload_models()

# Returns a single concatenated numpy array:
audio = generate_audio_long("Your long text here...", history_prompt="en_speaker_6")

# Or stream segments as they decode:
for segment in generate_audio_long("Long text...", history_prompt="en_speaker_6", stream=True):
    # Each segment is ~3 s of float32 audio at 24 kHz (SAMPLE_RATE);
    # hand it to a player or file writer here.
    print(f"got {len(segment) / SAMPLE_RATE:.1f} s of audio")
```

This will run the multi-GPU example across all available GPUs without invoking `torch.compile`:

```bash
env SUNO_DISABLE_COMPILE=true accelerate launch examples/parallel.py --out out.mp3 --normalize -14 --compress
```

Bark is a fully generative text-to-audio model developed for research and demo purposes. It follows a GPT-style architecture similar to AudioLM and VALL-E and uses a quantized audio representation from EnCodec. It is not a conventional TTS model, but instead a fully generative text-to-audio model capable of deviating in unexpected ways from any given script. Unlike previous approaches, the input text prompt is converted directly to audio without the intermediate use of phonemes. It can therefore generalize to arbitrary instructions beyond speech, such as music lyrics, sound effects, or other non-speech sounds.
Below is a list of some known non-speech sounds, but we are finding more every day. Please let us know on Discord if you find patterns that work particularly well!
- `[laughter]`
- `[laughs]`
- `[sighs]`
- `[music]`
- `[gasps]`
- `[clears throat]`
- `—` or `...` for hesitations
- `♪` for song lyrics
- CAPITALIZATION for emphasis of a word
- `[MAN]` and `[WOMAN]` to bias Bark toward male and female speakers, respectively
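These cues are plain text, so they can be combined in ordinary strings. A small sketch of composing a prompt; `build_prompt` is a hypothetical helper, and putting the speaker tag at the start and wrapping lyrics in a pair of `♪` are assumptions about placement that the list above does not specify.

```python
from typing import Optional


def build_prompt(text: str, speaker: Optional[str] = None, lyrics: bool = False) -> str:
    """Compose a Bark text prompt from the cue conventions listed above.

    Hypothetical helper: tag placement is an assumption, not documented behaviour.
    """
    if lyrics:
        # Song lyrics are marked with ♪.
        text = f"♪ {text} ♪"
    if speaker is not None:
        if speaker not in ("MAN", "WOMAN"):
            raise ValueError("speaker must be 'MAN' or 'WOMAN'")
        # Bias toward a male or female voice.
        text = f"[{speaker}] {text}"
    return text


print(build_prompt("In the jungle, the mighty jungle", speaker="WOMAN", lyrics=True))
```

The resulting string would then be passed as the text argument, for example `generate_audio_long(build_prompt(...), history_prompt="en_speaker_6")`.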
| Language | Status |
|---|---|
| English (en) | ✅ |
| German (de) | ✅ |
| Spanish (es) | ✅ |
| French (fr) | ✅ |
| Hindi (hi) | ✅ |
| Italian (it) | ✅ |
| Japanese (ja) | ✅ |
| Korean (ko) | ✅ |
| Polish (pl) | ✅ |
| Portuguese (pt) | ✅ |
| Russian (ru) | ✅ |
| Turkish (tr) | ✅ |
| Chinese, simplified (zh) | ✅ |
Requests for future language support can be made here or in the #forums channel on Discord.