@leszko (Collaborator) commented Dec 15, 2025

No description provided.

AI Assistant added 12 commits December 15, 2025 10:02
- Replace stub implementation with real VibeVoice model integration
- Add lazy loading of model and processor
- Support device auto-detection (cuda/mps/cpu)
- Implement text-to-speech generation with streaming output
- Resample model output audio from 24kHz to 48kHz
- Support configurable speaker voices
- Add comprehensive test script (test_vibevoice.py)
- Include error handling and fallback audio generation
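The lazy loading and cuda/mps/cpu auto-detection described above can be sketched as follows. This is a minimal illustration, not the PR's code: `detect_device`, `LazyModel`, and the `loader` callable are hypothetical names, and the real integration presumably passes a function that builds the VibeVoice model and processor.

```python
import threading
from typing import Any, Callable, Optional


def detect_device() -> str:
    """Return "cuda", "mps", or "cpu", preferring accelerators when present."""
    try:
        import torch
    except ImportError:
        # Without torch there is no accelerator to use.
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    # torch.backends.mps only exists in newer torch builds; guard the lookup.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


class LazyModel:
    """Hold a loader and run it only on the first call to get()."""

    def __init__(self, loader: Callable[[str], Any]) -> None:
        # Hypothetical loader: takes a device string and returns the loaded
        # objects, e.g. a (model, processor) tuple.
        self._loader = loader
        self._value: Optional[Any] = None
        self._lock = threading.Lock()
        self.device = detect_device()

    def get(self) -> Any:
        if self._value is None:
            with self._lock:
                # Double-checked: another thread may have finished loading
                # while this one waited for the lock.
                if self._value is None:
                    self._value = self._loader(self.device)
        return self._value
```

Constructing `LazyModel` does not load anything; the first `get()` pays the load cost, and the lock keeps two concurrent first requests from loading the model twice.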
- Create run_vibevoice_test.sh for easy test execution
- Set up PYTHONPATH to include VibeVoice installation
- Make script executable
- Pass all arguments through to the test script
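What the wrapper does (prepend a VibeVoice checkout to PYTHONPATH and forward every argument) can be sketched in Python for illustration. `build_test_command` is a hypothetical helper, not the contents of run_vibevoice_test.sh, and it assumes test_vibevoice.py is run from the current directory.

```python
import os
import sys
from typing import Dict, List, Mapping, Optional, Sequence, Tuple


def build_test_command(
    vibevoice_dir: str,
    args: Sequence[str],
    base_env: Optional[Mapping[str, str]] = None,
) -> Tuple[List[str], Dict[str, str]]:
    """Return (argv, env) for running test_vibevoice.py with VibeVoice importable."""
    env = dict(os.environ if base_env is None else base_env)
    # Prepend the checkout so it wins over any other copy on the path,
    # keeping whatever PYTHONPATH was already set.
    existing = env.get("PYTHONPATH", "")
    env["PYTHONPATH"] = vibevoice_dir + (os.pathsep + existing if existing else "")
    # Forward every caller-supplied argument unchanged, like "$@" in a shell script.
    argv = [sys.executable, "test_vibevoice.py", *args]
    return argv, env
```

For example, `argv, env = build_test_command(os.path.expanduser("~/VibeVoice"), sys.argv[1:])` followed by `subprocess.run(argv, env=env, check=True)`.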
- Handle bfloat16 tensors in audio conversion
- Flatten multi-dimensional audio arrays before resampling
- Improve error handling for audio generation
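A sketch of the conversion step, assuming the model may return a torch tensor (possibly bfloat16 and multi-dimensional), a NumPy array, or nested Python lists; `to_flat_float_list` is a hypothetical name. The key detail is the upcast: NumPy has no bfloat16 dtype, so calling `.numpy()` directly on a bfloat16 tensor fails, while `.float()` first converts it to float32.

```python
from typing import Any, List


def to_flat_float_list(audio: Any) -> List[float]:
    """Flatten model audio output of any shape into a 1-D list of floats."""
    if hasattr(audio, "detach"):
        # torch.Tensor: NumPy has no bfloat16, so upcast to float32 on the CPU
        # before converting; reshape(-1) flattens any (batch, channel, time) shape.
        return audio.detach().cpu().float().reshape(-1).tolist()
    if hasattr(audio, "ravel"):
        # numpy.ndarray: ravel() returns the elements as a 1-D array in row-major order.
        return [float(x) for x in audio.ravel()]
    if isinstance(audio, (list, tuple)):
        # Nested Python sequences: flatten recursively.
        flat: List[float] = []
        for item in audio:
            flat.extend(to_flat_float_list(item))
        return flat
    # A single scalar sample.
    return [float(audio)]
```

Flattening before resampling matters because a resampler working on a 1-D signal would otherwise treat a (1, N) array as one sample per row.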
- Successfully generates real TTS audio from text
- Audio is properly resampled from 24kHz to 48kHz for WebRTC
Generated audio from text: 'Hello, this is a test.'
Duration: 2 seconds @ 48kHz
Speaker: Emma (en-Emma_woman)
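The numbers above fit together as follows: at 24kHz a 2-second clip is 48,000 samples, and doubling the rate to 48kHz gives 96,000 samples, still 2 seconds. The sketch below uses linear interpolation purely to illustrate the 2x rate change; it is an assumption, not the PR's resampler, and a production path would normally use a band-limited method such as `scipy.signal.resample_poly(x, 2, 1)` for cleaner audio.

```python
from typing import List, Sequence

SOURCE_RATE = 24_000  # model output rate, per the commit notes
TARGET_RATE = 48_000  # WebRTC output rate, per the commit notes


def upsample_2x_linear(samples: Sequence[float]) -> List[float]:
    """Double the sample rate by inserting the midpoint after each sample.

    Output length is exactly 2 * len(samples); the last sample is held
    because there is no following sample to interpolate toward.
    """
    out: List[float] = []
    for i, current in enumerate(samples):
        following = samples[i + 1] if i + 1 < len(samples) else current
        out.append(float(current))
        out.append((current + following) / 2.0)
    return out


clip = [0.0] * (2 * SOURCE_RATE)           # 2 s at 24kHz = 48,000 samples
resampled = upsample_2x_linear(clip)       # 96,000 samples at 48kHz
duration_s = len(resampled) / TARGET_RATE  # still 2.0 seconds
```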
- Add vibevoice as git dependency from GitHub
- Pin transformers to 4.51.3 for compatibility
- Set tool.hatch.metadata.allow-direct-references so Hatch accepts git dependencies
- Update voice file search to check multiple locations:
  1. VIBEVOICE_VOICES_DIR environment variable
  2. ~/VibeVoice/demo/voices/streaming_model
  3. Installed package (if it ships the demo files)
- Remove PYTHONPATH requirement from test script
- Provide clearer error messages when voice files are not found
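The lookup order above can be sketched as a first-match search that lists every location it tried when nothing is found. `voice_search_paths` and `find_voices_dir` are hypothetical names, and the installed-package location is passed in by the caller because its layout is not stated here.

```python
import os
from pathlib import Path
from typing import List, Optional, Sequence


def voice_search_paths(package_voices_dir: Optional[Path] = None) -> List[Path]:
    """Candidate voice directories, in the order the commit notes list them."""
    paths: List[Path] = []
    # 1. Explicit override via environment variable.
    env_dir = os.environ.get("VIBEVOICE_VOICES_DIR")
    if env_dir:
        paths.append(Path(env_dir).expanduser())
    # 2. A VibeVoice checkout in the home directory.
    paths.append(Path.home() / "VibeVoice" / "demo" / "voices" / "streaming_model")
    # 3. Demo files shipped with the installed package, if the caller knows where.
    if package_voices_dir is not None:
        paths.append(package_voices_dir)
    return paths


def find_voices_dir(paths: Sequence[Path]) -> Path:
    """Return the first existing directory, or fail listing every path tried."""
    for path in paths:
        if path.is_dir():
            return path
    tried = "\n".join(f"  - {p}" for p in paths)
    raise FileNotFoundError(
        "No VibeVoice voices directory found. Looked in:\n"
        f"{tried}\n"
        "Set VIBEVOICE_VOICES_DIR to the directory containing the voice files."
    )
```

Listing every candidate in the error is what makes a missing-voices failure actionable: the user sees both where the code looked and which variable overrides it.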