PoC: Text To Speech with VibeVoice #236

leszko · 2025-12-15T12:26:50Z

No description provided.

- Replace stub implementation with real VibeVoice model integration
- Add lazy loading of model and processor
- Support device auto-detection (cuda/mps/cpu)
- Implement text-to-speech generation with streaming output
- Add proper audio resampling from 24kHz to 48kHz
- Support configurable speaker voices
- Add comprehensive test script (test_vibevoice.py)
- Include error handling and fallback audio generation
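Here is a minimal sketch of lazy loading combined with cuda/mps/cpu auto-detection. It assumes PyTorch is the backend. The torch calls are standard, but the helper names pick_device and LazyModel and the loader callable are hypothetical and do not come from this PR. The model is only built on the first get() call, so importing the pipeline stays cheap.

```python
import threading
from typing import Any, Callable, Optional


def pick_device() -> str:
    """Prefer CUDA, then Apple MPS, then CPU; fall back to CPU if torch is missing."""
    try:
        import torch
    except ImportError:
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    # torch.backends.mps only exists in newer torch builds, so guard the lookup.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


class LazyModel:
    """Hypothetical helper: defers an expensive load until it is first needed."""

    def __init__(self, loader: Callable[[str], Any]):
        self._loader = loader  # called once with the chosen device string
        self._model: Optional[Any] = None
        self._lock = threading.Lock()

    def get(self) -> Any:
        if self._model is None:
            with self._lock:
                # Double-checked so concurrent callers trigger only one load.
                if self._model is None:
                    self._model = self._loader(pick_device())
        return self._model
```

Usage: LazyModel(lambda device: load_vibevoice(device)).get(), where load_vibevoice is a hypothetical stand-in for whatever actually builds the model and processor on that device.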

- Create run_vibevoice_test.sh for easy test execution
- Set up PYTHONPATH to include the VibeVoice installation
- Make the script executable
- Pass all test arguments through to the test script

- Handle bfloat16 tensors in audio conversion
- Flatten multi-dimensional audio arrays before resampling
- Add improved error handling for audio generation
- Successfully generates real TTS audio from text
- Audio is properly resampled from 24kHz to 48kHz for WebRTC
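The conversion steps above can be sketched as follows. NumPy has no bfloat16 dtype, so a bfloat16 torch tensor has to be cast to float32 on the torch side before it becomes an array. The output is then flattened to one dimension and upsampled from 24kHz to 48kHz. The function names are hypothetical. The resampler is a plain linear interpolation chosen for brevity, which may differ from what the PR uses. A band-limited resampler such as a polyphase filter would give better audio quality.

```python
from typing import Any

import numpy as np


def to_mono_float32(audio: Any) -> np.ndarray:
    """Convert model output (torch tensor or array-like) to a flat float32 array."""
    # Duck-type torch tensors so this sketch also runs without torch installed.
    # .float() casts bfloat16 to float32, which NumPy can represent.
    if hasattr(audio, "detach"):
        audio = audio.detach().to("cpu").float().numpy()
    # Flatten single-channel shapes like (1, N) or (1, 1, N) into (N,).
    return np.asarray(audio, dtype=np.float32).reshape(-1)


def resample_linear(audio: Any, src_rate: int = 24_000, dst_rate: int = 48_000) -> np.ndarray:
    """Linear-interpolation resampler: a simple stand-in, not a band-limited filter."""
    samples = to_mono_float32(audio)
    if src_rate == dst_rate or samples.size == 0:
        return samples
    n_out = int(round(samples.size * dst_rate / src_rate))
    src_times = np.arange(samples.size) / src_rate
    dst_times = np.arange(n_out) / dst_rate
    # np.interp holds the last input sample for output times past the end.
    return np.interp(dst_times, src_times, samples).astype(np.float32)
```

At a 2x ratio, each 24kHz sample yields two 48kHz samples, so one second of model output (24,000 samples) becomes 48,000 samples.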

Generated audio from text: 'Hello, this is a test.'
Duration: 2 seconds @ 48kHz
Speaker: Emma (en-Emma_woman)
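For scale: 2 seconds at 48kHz is 96,000 samples. The sketch below assumes 20 ms WebRTC audio frames, a common packetization interval that this PR does not state. At that interval each frame holds 960 samples, so the clip splits into 100 frames. It converts float audio to 16-bit PCM and shapes each frame as (1, samples). As we understand PyAV's convention, that is the shape AudioFrame.from_ndarray expects for packed "s16" mono audio. The helper name and constants are hypothetical.

```python
from typing import List

import numpy as np

SAMPLE_RATE = 48_000
FRAME_MS = 20  # assumption: typical WebRTC audio packetization interval
SAMPLES_PER_FRAME = SAMPLE_RATE * FRAME_MS // 1000  # 960


def to_s16_frames(audio: np.ndarray) -> List[np.ndarray]:
    """Split float audio in [-1, 1] into int16 frames of shape (1, SAMPLES_PER_FRAME)."""
    samples = np.asarray(audio, dtype=np.float32).reshape(-1)
    pcm = (np.clip(samples, -1.0, 1.0) * 32767).astype(np.int16)
    # Zero-pad the tail so the last frame is full length.
    remainder = pcm.size % SAMPLES_PER_FRAME
    if remainder:
        pcm = np.concatenate([pcm, np.zeros(SAMPLES_PER_FRAME - remainder, dtype=np.int16)])
    # One row per frame; each row becomes a (1, N) array for packed mono.
    return [row.reshape(1, -1) for row in pcm.reshape(-1, SAMPLES_PER_FRAME)]
```

With these settings, to_s16_frames applied to a 2-second clip returns 100 frames of shape (1, 960).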

- Add vibevoice as a git dependency from GitHub
- Pin transformers to 4.51.3 for compatibility
- Add hatch.metadata.allow-direct-references for git dependencies
- Update voice file search to check multiple locations:
  1. VIBEVOICE_VOICES_DIR environment variable
  2. ~/VibeVoice/demo/voices/streaming_model
  3. Installed package (if demo files are included)
- Remove PYTHONPATH requirement from test script
- Provide better error messages when voices are not found
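The voice lookup order above can be sketched like this. The sketch assumes a voice file's name without its extension equals the speaker ID (for example en-Emma_woman). It also assumes the caller passes the installed package's voices directory explicitly. The function names and the package_voices_dir parameter are hypothetical. When nothing matches, the error lists every directory searched and points at VIBEVOICE_VOICES_DIR.

```python
import os
from pathlib import Path
from typing import List, Optional


def candidate_voice_dirs(package_voices_dir: Optional[Path] = None) -> List[Path]:
    """Directories to search, in priority order."""
    dirs: List[Path] = []
    env_dir = os.environ.get("VIBEVOICE_VOICES_DIR")
    if env_dir:
        dirs.append(Path(env_dir).expanduser())  # 1. explicit override
    dirs.append(Path.home() / "VibeVoice" / "demo" / "voices" / "streaming_model")  # 2. checkout
    if package_voices_dir is not None:
        dirs.append(package_voices_dir)  # 3. installed package, if it ships demo files
    return dirs


def find_voice_file(speaker: str, package_voices_dir: Optional[Path] = None) -> Path:
    """Return the first file whose stem matches the speaker ID."""
    searched = candidate_voice_dirs(package_voices_dir)
    for directory in searched:
        if not directory.is_dir():
            continue  # missing locations are skipped, not treated as errors
        for path in sorted(directory.iterdir()):
            if path.is_file() and path.stem == speaker:
                return path
    listing = "\n  ".join(str(d) for d in searched)
    raise FileNotFoundError(
        f"Voice '{speaker}' not found. Searched:\n  {listing}\n"
        "Set VIBEVOICE_VOICES_DIR to the directory containing the voice files."
    )
```

Checking the environment variable first lets users point at any checkout without touching PYTHONPATH. This fits the PR's removal of that requirement from the test script.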

AI Assistant added 12 commits December 15, 2025 10:02

Add VibeVoice audio pipeline support

0124a4a

Fix vibevoice callback typing

856adf8

Fix audio frame shape for vibevoice track

7801d55

Add convenience script for running VibeVoice tests with uv

456b688

Add sample VibeVoice TTS output for testing

dc3a348

Update prompt change

e4c5c62

Working

419187a

Fix UI

027bfae

Remove unused test files

7050c91
