This project is a real-time AI-powered voice assistant designed to help tourists explore London, UK. It transcribes live speech, generates AI responses using OpenAI's GPT, and provides voice feedback using ElevenLabs.
-
Real-time speech-to-text transcription using AssemblyAI
-
AI-generated responses powered by OpenAI's GPT-4o
-
Text-to-speech voice output using ElevenLabs
-
Interactive and conversational travel guide experience
1๏ธโฃ The assistant starts by greeting the user with a voice message.
2๏ธโฃ It listens to the user's voice and transcribes it in real-time.
3๏ธโฃ The query is sent to GPT-4o for a smart AI-generated response.
4๏ธโฃ The assistant then speaks the response using ElevenLabs.
5๏ธโฃ The loop continues, making it an interactive conversation! ๐ฃ๏ธ
-
Python ๐
-
OpenAI GPT-4o ๐ค
-
AssemblyAI (Speech-to-Text) ๐๏ธ
-
ElevenLabs (Text-to-Speech) ๐
Before running the script, make sure the following dependencies are installed on your system
๐ต MPV (Required for ElevenLabs Audio Streaming)
This is required for ElevenLabs to stream audio.
-
๐ฅ๏ธ Windows
-
Download mpv from here
-
Add it to your system PATH.
-
-
๐ Mac (macOS)
brew install mpv -
๐ง Linux (Ubuntu/Debian)
sudo apt update && sudo apt install mpv
๐ค PortAudio & PyAudio (Required for AssemblyAI Transcription)
PortAudio is required to use PyAudio, which AssemblyAI needs for real-time transcription.
-
๐ฅ๏ธ Windows
pip install pyaudio -
๐ Mac (macOS)
brew install portaudio pip install pyaudio -
๐ง Linux (Ubuntu/Debian)
sudo apt update && sudo apt install portaudio19-dev pip install pyaudio