AI Voice Assistant with OpenAI Realtime

A production-ready AI voice assistant built with FastAPI, Twilio Media Streams, and OpenAI’s Realtime API.

🔍 Features

Real-time speech recognition: Transcribes caller audio via OpenAI Realtime (input.text.delta).
OpenAI TTS (sage): Streams AI-generated voice audio directly in G.711 μ-law format (response.audio.delta).
Health check endpoint: /health returns 200 OK for uptime monitoring.
Dashboard: /messages displays call logs in a styled HTML UI.
Persistent logging: All conversations logged to messages.json.
Retry logic: Backoff-based reconnect for OpenAI WebSockets.
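
The reconnect behaviour can be pictured roughly as below. This is a minimal sketch, not the code in main.py: the names backoff_delays and connect_with_retry, the attempt count, and the delay values are all assumptions, and connect stands for whatever coroutine opens the OpenAI WebSocket.

```python
import asyncio
import random


def backoff_delays(max_attempts=5, base=0.5, cap=8.0, jitter=0.0):
    """Yield the wait in seconds before each retry, doubling up to `cap`."""
    for attempt in range(max_attempts):
        delay = min(cap, base * (2 ** attempt))
        # Optional jitter spreads out reconnects from many calls at once.
        yield delay + random.uniform(0, jitter)


async def connect_with_retry(connect, max_attempts=5, sleep=asyncio.sleep):
    """Call the `connect` coroutine until it succeeds or attempts run out.

    Only OSError (network-level failures) is retried here; a real handler
    would likely add the WebSocket library's own exception types.
    """
    last_error = None
    delays = backoff_delays(max_attempts)
    for attempt, delay in enumerate(delays, start=1):
        try:
            return await connect()
        except OSError as exc:
            last_error = exc
            if attempt < max_attempts:
                await sleep(delay)
    raise ConnectionError("could not reach the Realtime API") from last_error
```

Capping the delay keeps a caller from waiting indefinitely, and injecting sleep makes the loop easy to test without real pauses.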

🛠️ Prerequisites

Python 3.12+
ffmpeg (optional, for custom audio handling)
A Twilio account with a phone number and Media Streams enabled
OpenAI API key (with Realtime access)

🏗 Installation

Clone your private repo:

git clone git@github.com:YOUR_USER/YOUR_REPO.git
cd YOUR_REPO

Create a virtual environment and install dependencies:

python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

⚙️ Environment Variables

Create a .env file in the project root with:

# OpenAI
OPENAI_API_KEY=sk-...
SYSTEM_MESSAGE_PATH=./prompt.txt
VOICE=sage  # change to another voice if desired

# Twilio
TWILIO_ACCOUNT_SID=AC... 
TWILIO_AUTH_TOKEN=...

# Server
PORT=5050

Note: Do not commit .env to Git; it’s included in .gitignore.
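
One way the server might read these values at startup is sketched below. It assumes the variables are already in the process environment (for example, loaded from .env by a tool such as python-dotenv); the Settings class, the load_settings name, and the fallback defaults are illustrative and simply mirror the sample values above.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    openai_api_key: str
    system_message_path: str
    voice: str
    port: int


def load_settings(env=None):
    """Build Settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    api_key = env.get("OPENAI_API_KEY", "")
    if not api_key:
        # Fail fast: nothing works without the OpenAI key.
        raise RuntimeError("OPENAI_API_KEY is not set")
    return Settings(
        openai_api_key=api_key,
        # Assumed defaults, taken from the sample .env above.
        system_message_path=env.get("SYSTEM_MESSAGE_PATH", "./prompt.txt"),
        voice=env.get("VOICE", "sage"),
        port=int(env.get("PORT", "5050")),
    )
```

Failing at startup on a missing key tends to be easier to debug than a rejected WebSocket handshake in the middle of a call.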

▶️ Run Locally

uvicorn main:app --reload --host 0.0.0.0 --port ${PORT:-5050}

If you are running locally, start ngrok to expose the server on a public URL:
```
ngrok http $PORT
```
Then, in the Twilio Console, set your phone number’s Webhook for Voice → Incoming Call to:
```
https://<your-ngrok-or-domain>/incoming-call
```
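
For reference, the /incoming-call webhook has to answer Twilio with TwiML that connects the call to the /media-stream WebSocket. The sketch below builds such a response as a plain string; build_stream_twiml is a hypothetical helper, and the host would normally come from the incoming request rather than being hard-coded.

```python
from xml.sax.saxutils import quoteattr


def build_stream_twiml(host, path="/media-stream"):
    """Return TwiML that connects the call to a Media Streams WebSocket."""
    stream_url = f"wss://{host}{path}"
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        "<Response><Connect>"
        # quoteattr adds the surrounding quotes and escapes the value.
        f"<Stream url={quoteattr(stream_url)} />"
        "</Connect></Response>"
    )


if __name__ == "__main__":
    print(build_stream_twiml("example.ngrok.app"))
```

Twilio requires a secure wss:// URL for the stream, which is why the ngrok or production domain is needed even for local testing.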

🚀 Deployment on Railway

Add railway.nix for Python + ffmpeg (if needed).
Ensure Railway Environment Variables match local .env.
Deploy; Railway auto-detects uvicorn main:app.

📊 Dashboard & Logs

Visit GET /messages to view caller number, timestamp, transcripts, and AI replies in a styled table.
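
The log behind the dashboard is a JSON array in messages.json. A minimal sketch of reading and appending to it follows; the field names (caller, timestamp, transcript, reply) are assumptions based on the columns listed above, not the exact schema used in main.py.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def load_messages(path):
    """Return the logged messages, or [] if the file is missing or invalid."""
    try:
        data = json.loads(Path(path).read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        return []
    # Anything other than a JSON array is treated as an empty log.
    return data if isinstance(data, list) else []


def append_message(path, caller, transcript, reply):
    """Append one conversation turn and rewrite the file."""
    messages = load_messages(path)
    messages.append({
        "caller": caller,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "transcript": transcript,
        "reply": reply,
    })
    Path(path).write_text(json.dumps(messages, indent=2), encoding="utf-8")
    return messages
```

Falling back to an empty list on a missing or malformed file keeps the dashboard rendering instead of failing the request.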

📝 How It Works

Incoming call at /incoming-call: Twilio streams media to /media-stream.
WebSocket (handle_media_stream):
- Sends session update enabling audio and text modalities, with server-side VAD and realtime transcription.
- Streams caller audio to OpenAI.
Transcription: input.text.delta events are logged and shown on the dashboard.
AI Reply: OpenAI’s response.audio.delta streams back voice audio (sage) to Twilio in G.711 μ-law format.
Error Handling: Reconnects to OpenAI with backoff and closes gracefully when the socket raises ConnectionClosedOK.
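
The audio bridging in the steps above comes down to translating messages between the two sockets. The sketch below shows that translation as two pure functions; twilio_to_openai and openai_to_twilio are hypothetical names, and it assumes the session uses G.711 μ-law for both input and output so the base64 payload can pass through unchanged.

```python
import json


def twilio_to_openai(twilio_message):
    """Map a Twilio "media" message to an OpenAI audio append event."""
    event = json.loads(twilio_message)
    if event.get("event") != "media":
        return None  # start/stop/mark messages carry no audio
    return json.dumps({
        "type": "input_audio_buffer.append",
        "audio": event["media"]["payload"],  # base64 mu-law audio
    })


def openai_to_twilio(openai_message, stream_sid):
    """Map an OpenAI response.audio.delta event to a Twilio media message."""
    event = json.loads(openai_message)
    if event.get("type") != "response.audio.delta" or "delta" not in event:
        return None
    return json.dumps({
        "event": "media",
        "streamSid": stream_sid,  # taken from Twilio's "start" message
        "media": {"payload": event["delta"]},
    })
```

In a handler, each function would sit inside its own receive loop (one per socket), forwarding any non-None result to the other side.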

🛠️ Troubleshooting

No transcription? Ensure modalities includes "text" and "input_audio_transcription": {"type": "realtime"} in send_session_update().
No AI audio? Check the Twilio Media Streams configuration, and verify voice and output_audio_format in the session.
500 on /messages? Delete messages.json or reset it to a valid JSON array ([]).
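
When checking the first two items, it can help to build the session update in one place and validate it before sending. The following is a sketch under assumptions: build_session_update and session_problems are illustrative names rather than functions from main.py, and the transcription settings are passed through untouched because the accepted shape depends on the Realtime API version in use.

```python
def build_session_update(instructions, voice="sage", transcription=None):
    """Build a session.update event for a Twilio (G.711 mu-law) call."""
    session = {
        "modalities": ["text", "audio"],
        "voice": voice,
        "instructions": instructions,
        "input_audio_format": "g711_ulaw",
        "output_audio_format": "g711_ulaw",
        "turn_detection": {"type": "server_vad"},
    }
    if transcription is not None:
        # Passed through as given; check the value against the API docs.
        session["input_audio_transcription"] = transcription
    return {"type": "session.update", "session": session}


def session_problems(update):
    """List the settings the troubleshooting tips above ask you to verify."""
    session = update.get("session", {})
    problems = []
    if "text" not in session.get("modalities", []):
        problems.append('modalities is missing "text"')
    if "input_audio_transcription" not in session:
        problems.append("input_audio_transcription is not set")
    if not session.get("voice"):
        problems.append("voice is not set")
    if session.get("output_audio_format") != "g711_ulaw":
        problems.append("output_audio_format is not g711_ulaw")
    return problems
```

Logging the result of session_problems() just before the update is sent gives a quick first answer to both the "no transcription" and "no AI audio" cases.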
