A modular TypeScript library for integrating PersonaPlex (NVIDIA's full-duplex speech-to-speech model) with Twilio Media Streams to create real-time voice conversation bots.
- Real-time voice conversations - Full-duplex speech-to-speech using NVIDIA PersonaPlex
- Twilio integration - Works with Twilio phone calls via Media Streams
- Opus codec support - Built-in encoding/decoding for 24kHz audio
- TypeScript first - Full type safety and modular architecture
- Performance optimized - Support for GPU acceleration and CPU offloading
```
┌──────────────────────────────────────────────────────────────────────┐
│  Phone ──▶ Twilio ──▶ Bridge Server (TypeScript) ──▶ PersonaPlex     │
│                             │                          (Python)      │
│                      Audio Processing                                │
│                       - mulaw ↔ PCM                                  │
│                       - Resampling 8kHz ↔ 24kHz                      │
│                       - Opus encoding/decoding                       │
└──────────────────────────────────────────────────────────────────────┘
```
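The mulaw ↔ PCM step in the diagram is the standard G.711 μ-law expansion that Twilio's 8kHz audio requires. As a rough sketch of the decode direction (the helper names are illustrative, not the library's actual API in `src/audio/converter.ts`):

```typescript
// G.711 mu-law byte -> 16-bit linear PCM sample (standard expansion).
// Twilio Media Streams deliver 8kHz mu-law audio, one byte per sample.
function mulawToPcm16(muByte: number): number {
  const mu = ~muByte & 0xff;           // mu-law bytes are stored inverted
  const sign = mu & 0x80;              // top bit carries the sign
  const exponent = (mu >> 4) & 0x07;   // 3-bit segment number
  const mantissa = mu & 0x0f;          // 4-bit step within the segment
  const sample = (((mantissa << 3) + 0x84) << exponent) - 0x84;
  return sign ? -sample : sample;
}

// Decode a whole Twilio media payload into a PCM buffer.
function decodeMulawBuffer(bytes: Uint8Array): Int16Array {
  const pcm = new Int16Array(bytes.length);
  for (let i = 0; i < bytes.length; i++) pcm[i] = mulawToPcm16(bytes[i]);
  return pcm;
}
```

The decoded 16-bit PCM is then resampled from 8kHz to the 24kHz PersonaPlex expects.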
- Node.js 20+
- NVIDIA GPU (recommended) or enough RAM for CPU offloading
- HuggingFace token (with access to nvidia/personaplex-7b-v1)
- Twilio account (for phone integration)
```bash
# Clone the repository
cd voice-bot

# Install dependencies
npm install

# Copy environment template
cp .env.example .env

# Edit .env with your settings
```

Copy `.env.example` to `.env` and add your `HF_TOKEN`.
You can start both the AI and the Bridge server with a single command:
```bash
# Start AI (PersonaPlex) + Bridge Server
npm run dev:ai
```

Or start them separately:
Option A: Real PersonaPlex (Python Server)

```bash
# Starts the model with CPU offloading enabled by default
tsx scripts/start-moshi.js
```

Option B: Mock Server (Simulation)
```bash
npm run mock-server
```

To test locally without Twilio:

```bash
npm run test:local
```

This records received audio to `test_output.pcm`. To hear it, import the file into Audacity as Raw Data (Float32, 24kHz).
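Since `test_output.pcm` is raw mono Float32 at 24kHz, its playback length follows directly from the file size. A small sketch of that arithmetic (the helper is illustrative, not part of the library):

```typescript
const SAMPLE_RATE = 24000;   // PersonaPlex output rate in Hz
const BYTES_PER_SAMPLE = 4;  // Float32 = 4 bytes, single channel

// Duration in seconds of a raw Float32 mono PCM file of the given byte size.
function pcmDurationSeconds(byteLength: number): number {
  return byteLength / (SAMPLE_RATE * BYTES_PER_SAMPLE);
}

// A 96,000-byte recording is exactly one second of audio.
```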
```bash
npm run dev
```

By default, the server uses the PersonaPlex URL from your `.env`. Expose it with `ngrok http 3000` and point your Twilio webhook to `https://your-url.ngrok.io/twiml`.
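When Twilio hits that webhook, the endpoint must answer with TwiML that opens a Media Stream back to the bridge. A minimal sketch of such a response, using Twilio's `<Connect><Stream>` verb (the function name and the `/media-stream` path are illustrative; the library's actual generators live in `src/twilio/twiml.ts`):

```typescript
// Build a TwiML document telling Twilio to stream the call's audio
// to our bridge server over a bidirectional WebSocket.
function buildStreamTwiml(wsUrl: string): string {
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<Response>',
    '  <Connect>',
    `    <Stream url="${wsUrl}" />`,
    '  </Connect>',
    '</Response>',
  ].join('\n');
}

// e.g. buildStreamTwiml('wss://your-url.ngrok.io/media-stream')
```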
Configuration is loaded from environment variables (.env file):
| Variable | Description | Default |
|---|---|---|
| `PERSONAPLEX_URL` | PersonaPlex WebSocket URL | `wss://localhost:8998/api/chat` |
| `PERSONAPLEX_VOICE_PROMPT` | Voice to use (NATF0-3, NATM0-3, etc.) | `NATF2.pt` |
| `PERSONAPLEX_TEXT_PROMPT` | System prompt for the AI | `You enjoy having a good conversation.` |
| `SERVER_PORT` | Bridge server port | `3000` |
| `SERVER_HOST` | Bridge server host | `0.0.0.0` |
| `LOG_LEVEL` | Logging level | `info` |
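Put together, a `.env` using the defaults from the table might look like this (the `HF_TOKEN` value is a placeholder for your own token):

```shell
PERSONAPLEX_URL=wss://localhost:8998/api/chat
PERSONAPLEX_VOICE_PROMPT=NATF2.pt
PERSONAPLEX_TEXT_PROMPT="You enjoy having a good conversation."
SERVER_PORT=3000
SERVER_HOST=0.0.0.0
LOG_LEVEL=info
# Required to download the model weights
HF_TOKEN=your-huggingface-token
```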
| Voice ID | Description |
|---|---|
| `NATF0` - `NATF3` | Natural female voices |
| `NATM0` - `NATM3` | Natural male voices |
| `VARF0` - `VARF4` | Variety female voices |
| `VARM0` - `VARM4` | Variety male voices |
```typescript
import { VoiceBot, VoiceBotConfig, PersonaPlexClient } from '@manus/voice-bot';

// Create config
const config: VoiceBotConfig = {
  personaplex: {
    url: 'wss://localhost:8998/api/chat',
    voicePrompt: 'NATF2.pt',
    textPrompt: 'You are a helpful assistant.',
  },
  server: {
    port: 3000,
    host: '0.0.0.0',
  },
  logLevel: 'info',
};

// Create bot
const bot = new VoiceBot(config);

bot.on('text', (text) => {
  console.log('Bot said:', text);
});

bot.on('audio', (pcm) => {
  // Handle audio response
});

await bot.startSession();
```

Using the PersonaPlexClient directly:

```typescript
import { PersonaPlexClient } from '@manus/voice-bot';

const client = new PersonaPlexClient({
  config: {
    url: 'wss://localhost:8998/api/chat',
    voicePrompt: 'NATM1.pt',
    textPrompt: 'You are a restaurant booking assistant.',
  },
});

client.on('audio', (data) => {
  // Opus encoded audio from AI
});

client.on('text', (text) => {
  console.log('AI:', text);
});

await client.connect();
```

Project structure:

```
voice-bot/
├── src/
│   ├── index.ts             # Main exports
│   ├── config.ts            # Configuration
│   ├── voice-bot.ts         # VoiceBot orchestrator
│   ├── audio/               # Audio processing
│   │   ├── converter.ts     # mulaw ↔ PCM
│   │   ├── resampler.ts     # Sample rate conversion
│   │   └── buffer.ts        # Audio buffering
│   ├── personaplex/         # PersonaPlex client
│   │   ├── client.ts        # WebSocket client
│   │   └── protocol.ts      # Message encoding
│   ├── twilio/              # Twilio integration
│   │   ├── media-streams.ts # Media Streams handler
│   │   └── twiml.ts         # TwiML generators
│   ├── server/              # Bridge server
│   │   └── app.ts           # Fastify application
│   └── utils/               # Utilities
│       └── logger.ts        # Logging
├── examples/
│   ├── local-test.ts        # Test without Twilio
│   └── simple-bot.ts        # Full bot example
├── personaplex/             # PersonaPlex Engine (Python)
└── package.json
```
```bash
# Run in development mode (auto-reload)
npm run dev

# Type checking
npm run typecheck

# Build for production
npm run build

# Run production build
npm start
```

- GPU/Memory requirement - PersonaPlex is a 7B model. Even with `--cpu-offload`, it needs significant system memory (VRAM + system RAM).
- Single session - The current implementation handles one conversation at a time per PersonaPlex instance.
- Complete Opus encoding/decoding integration
- Add Microphone support for local testing
- Add WebRTC support (browser-based calls)
- Multi-session support with session management
- Docker deployment configuration
MIT
- PersonaPlex - NVIDIA's full-duplex speech model
- Moshi - Base architecture by Kyutai
- Twilio Media Streams - Real-time audio streaming