# AudioGroupChat

A real-time audio group chat implementation enabling voice and text communication between humans and AI agents. This project combines WebRTC, speech-to-text, text-to-speech, and LLM capabilities to create interactive multi-party conversations.

## Features
- Real-time audio communication using WebRTC
- Multiple AI agents with distinct voices and personalities
- Text-to-Speech (TTS) with customizable voice options
- Speech-to-Text (STT) for human voice input
- Round-robin speaker selection for balanced conversations
- Gradio-based web interface for easy interaction
- Support for both voice and text channels
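The round-robin selection above can be sketched as a fixed rotation over the agent list. `round_robin` here is a hypothetical helper for illustration; in practice the behavior comes from passing `speaker_selection_method="round_robin"` to the group chat.

```python
def round_robin(agents, last_speaker=None):
    """Return the next speaker in fixed order, wrapping around.

    Illustrative sketch only; the project delegates this to the
    GroupChat's speaker_selection_method="round_robin" setting.
    """
    if last_speaker is None or last_speaker not in agents:
        return agents[0]
    return agents[(agents.index(last_speaker) + 1) % len(agents)]
```

Because the rotation is fixed, no agent takes two consecutive turns, which is what `allow_repeat_speaker=False` also enforces.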
## Prerequisites

- Python 3.8+
- Node.js (for frontend components)
- Ollama (for local LLM support)
## Installation

- Clone the repository:

  ```bash
  git clone <repository-url>
  cd AudioGroupChat
  ```

- Create and activate a virtual environment:

  ```bash
  python -m venv .venv
  source .venv/bin/activate  # On Windows: .venv\Scripts\activate
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Configure Ollama settings in `main_app.py`:
  ```python
  config_list = [{
      "model": "gemma3:1b",  # or another supported model
      "base_url": "http://localhost:11434/v1",
      "price": [0.00, 0.00],
  }]
  ```

- (Optional) Set up Twilio TURN server credentials for improved WebRTC connectivity:
  ```bash
  export TWILIO_ACCOUNT_SID=your_account_sid
  export TWILIO_AUTH_TOKEN=your_auth_token
  ```

## Usage

- Start the application:

  ```bash
  python main_app.py
  ```

- Open the Gradio interface URL shown in the terminal (typically http://localhost:7860) in your browser.
- Start a conversation by:
  - speaking into your microphone,
  - typing text messages, or
  - using the provided UI controls.
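The `base_url` in the Ollama configuration points at Ollama's OpenAI-compatible API. As a rough sketch of the request the configured LLM client sends under the hood (`build_chat_request` is a hypothetical helper, not part of the project):

```python
def build_chat_request(cfg, messages):
    """Build the endpoint URL and JSON payload for an OpenAI-compatible
    chat-completions call against the configured Ollama server.

    Illustrative only; the real request is issued by the LLM client
    configured through config_list.
    """
    url = cfg["base_url"].rstrip("/") + "/chat/completions"
    return url, {"model": cfg["model"], "messages": messages}
```

For the config above this yields `http://localhost:11434/v1/chat/completions` with `"model": "gemma3:1b"` in the payload.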
## Project Structure

- `main_app.py`: main application entry point
- `audio_groupchat.py`: core audio group chat implementation
- `gradio_ui.py`: Gradio web interface components
- `test_group_chat.py`: test cases and examples
## Voice Options

The system supports multiple voice options for AI agents:
- Energetic (fast, US English)
- Calm (slower, US English)
- British (UK English)
- Authoritative (moderate speed, US English)
- Default (standard US English)
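One way to wire these options up is a lookup table from profile names to TTS parameters. The keys and values below are illustrative assumptions, not taken from the project code:

```python
# Hypothetical mapping of the voice options above to TTS parameters;
# names, rates, and locales are illustrative placeholders.
VOICE_PROFILES = {
    "energetic":     {"locale": "en-US", "rate": 1.25},
    "calm":          {"locale": "en-US", "rate": 0.85},
    "british":       {"locale": "en-GB", "rate": 1.0},
    "authoritative": {"locale": "en-US", "rate": 1.0},
    "default":       {"locale": "en-US", "rate": 1.0},
}

def voice_for(agent_name, assignments):
    """Resolve an agent's voice profile, falling back to 'default'."""
    return VOICE_PROFILES[assignments.get(agent_name, "default")]
```

The fallback to `"default"` means agents without an explicit assignment still get a usable voice.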
## API Reference

### AudioGroupChat

```python
class AudioGroupChat(GroupChat):
    def __init__(self, agents=None, messages=None, max_round=10,
                 speaker_selection_method="round_robin",
                 allow_repeat_speaker=False)
```

Key methods:

- `initialize()`: set up audio processing components
- `add_human_participant(user_id)`: add a human participant
- `start_audio_session(user_id)`: start an audio session
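A hedged sketch of the expected call sequence. `AudioGroupChatDemo` is a minimal stand-in with the same method names so the example runs without the project's audio stack; the real class lives in `audio_groupchat.py`, where `initialize()` wires up the TTS/STT components.

```python
import asyncio

class AudioGroupChatDemo:
    """Minimal stand-in mirroring the AudioGroupChat API above.

    Illustrative only: it records participants and reports whether a
    session can start, with no real audio or agent logic.
    """
    def __init__(self, agents=None, max_round=10,
                 speaker_selection_method="round_robin",
                 allow_repeat_speaker=False):
        self.agents = agents or []
        self.participants = []

    async def initialize(self):
        return True  # the real class sets up audio processing here

    def add_human_participant(self, user_id):
        self.participants.append(user_id)

    async def start_audio_session(self, user_id):
        return user_id in self.participants

async def main():
    chat = AudioGroupChatDemo(agents=["assistant", "narrator"])
    await chat.initialize()
    chat.add_human_participant("user-1")
    return await chat.start_audio_session("user-1")

started = asyncio.run(main())
```

The order matters: `initialize()` before adding participants, and `add_human_participant()` before starting that user's audio session.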
### GradioUI

```python
class GradioUI:
    def __init__(self, audio_chat: AudioGroupChat)
    def create_interface(self) -> gr.Blocks
```

## Contributing

- Fork the repository
- Create a feature branch
- Commit your changes
- Push to the branch
- Create a Pull Request
## License

This project is licensed under the MIT License; see the LICENSE file for details.