Desktop AI assistant with voice input, chat interface, and powerful tools.
Double-click START_CLEAN.bat to launch everything:
- 🎤 Transcribe Service (voice-to-text)
- 🤖 Agent Service (AI brain)
- 💬 Widget (desktop interface)
Everything runs in the background - no terminal windows! The widget appears on your desktop ready to use.
A modular AI agent that:
- Listens to your voice and transcribes it
- Chats with you using GPT-5
- Executes tasks through tools
- Remembers context across sessions
- Runs as separate services for stability
Multi-service architecture - Each component runs independently:
- agent-main/ - Main AI agent (CLI + API modes)
- agent/ - Agent core (OpenAI wrapper with streaming)
- transcribe/ - Audio → text conversion
- widget/ - Desktop UI with voice/chat
- chat_history/ - Conversation persistence
- memory/ - User context storage
- tools/ - Agent capabilities (filesystem, web, todos, etc.)
- service-template/ - Boilerplate for new services
- Click to record
- Auto-transcribe using Whisper
- Multi-language support
- Type or speak your messages
- Real-time streaming responses
- Color-coded output (thinking, responses, function calls)
- Screenshot sharing
- Persistent history
- Files: Read, write, search, edit
- Web: Search and scrape
- Todos: Task management
- Memory: Remember user preferences
- Documents: Create Word files
- Charts: Generate visualizations
- Terminal: Run commands
- Images: AI image generation
INSTALL.bat
Or install each service's dependencies manually:
pip install -r agent-main/requirements.txt
pip install -r transcribe/requirements.txt
pip install -r widget/requirements.txt
1. Install dependencies first:
INSTALL.bat
2. Launch the agent:
START_CLEAN.bat # Clean launch - runs in background, no terminals
This is the main way to use the agent. Close the widget to stop everything.
START.bat # Shows terminal windows (useful for debugging)
# Interactive CLI
python agent-main/app.py --mode interactive
# Agent API
python agent-main/app.py --mode service --port 6002
# Transcribe service
python transcribe/app.py
# Widget only
python widget/widget.py
Set your OpenAI API key:
# Windows
$env:OPENAI_API_KEY = "sk-..."
# Or edit config.py files
- 6000 - Transcribe service
- 6002 - Agent service
- Widget connects to both
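To confirm that both services are up before opening the widget, a quick port check can help. The following is a minimal sketch, not part of the project: it assumes the services listen on the local machine at the ports listed above, and `port_is_open` is a hypothetical helper name.

```python
import socket

# Ports as listed above; the host is an assumption (services on this machine).
SERVICE_PORTS = {"transcribe": 6000, "agent": 6002}


def port_is_open(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections and timeouts alike.
        return False


if __name__ == "__main__":
    for name, port in SERVICE_PORTS.items():
        state = "listening" if port_is_open(port) else "not reachable"
        print(f"{name} service (port {port}): {state}")
```

A successful TCP connection only shows that a process is bound to the port, not that the service behind it is healthy.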
Each service has its own README with details:
- /agent-main/README.md - Main agent docs
- /transcribe/README.md - Transcription service
- /widget/README.md - Desktop widget
- /tools/README.md - Available tools
ai-agent-desktop/
├── INSTALL.bat          # Install all dependencies
├── START_CLEAN.bat      # Launch everything (background, no terminals)
├── START.bat            # Launch with visible terminals
├── agent-main/          # Main AI agent
├── agent/               # Core agent logic
├── transcribe/          # Voice-to-text service
├── widget/              # Desktop interface
├── tools/               # Agent tools
├── chat_history/        # Conversation storage
├── memory/              # User context
└── service-template/    # New service boilerplate
- Isolation: One crash doesn't kill everything
- Resources: Distribute load across processes
- Development: Work on parts independently
- Scaling: Add more services easily
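The isolation point above can be illustrated with a small launcher that starts each service as its own OS process. This is a sketch under assumptions, not a description of what START.bat or START_CLEAN.bat actually do: the commands are the manual launch commands from this README, and `build_commands` and `launch_all` are hypothetical helper names.

```python
import subprocess
import sys

# Script paths and flags from the manual launch commands; relative to the repo root.
SERVICE_COMMANDS = {
    "transcribe": ["transcribe/app.py"],
    "agent": ["agent-main/app.py", "--mode", "service", "--port", "6002"],
    "widget": ["widget/widget.py"],
}


def build_commands(python: str = sys.executable) -> dict[str, list[str]]:
    """Prefix each script with the interpreter to get a full argv per service."""
    return {name: [python, *args] for name, args in SERVICE_COMMANDS.items()}


def launch_all() -> dict[str, subprocess.Popen]:
    """Start every service as a separate process (hypothetical helper)."""
    return {name: subprocess.Popen(argv) for name, argv in build_commands().items()}


if __name__ == "__main__":
    processes = launch_all()
    # Wait for the widget to close, then stop whatever is still running.
    processes["widget"].wait()
    for name, proc in processes.items():
        if proc.poll() is None:  # None means the process has not exited yet
            proc.terminate()
```

Because each service is a separate process, a crash in one leaves the others running, and a single service can be restarted on its own.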
Set your OpenAI API key:
# PowerShell (permanent)
[System.Environment]::SetEnvironmentVariable('OPENAI_API_KEY', 'sk-...', 'User')
# Or just enter it when prompted by START.bat
- Install: Run INSTALL.bat to install all dependencies
- Launch: Double-click START.bat to start all services
- Use: Click 💬 on the widget to open chat, or use voice recording
For detailed instructions, see LAUNCH_GUIDE.md
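Before launching, it can be useful to check that the API key is visible to Python. The sketch below assumes the services read the key from the `OPENAI_API_KEY` environment variable, as the setup commands suggest. `get_api_key` and `mask` are hypothetical helpers, not part of the project.

```python
import os


def get_api_key(env=os.environ) -> str:
    """Return OPENAI_API_KEY from the given mapping, or raise a clear error."""
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; set it before launching.")
    return key


def mask(key: str) -> str:
    """Keep only the first 3 and last 4 characters so the key is safe to print."""
    if len(key) <= 8:
        return "***"
    return f"{key[:3]}...{key[-4:]}"


if __name__ == "__main__":
    print("Found API key:", mask(get_api_key()))
```

A variable set with `SetEnvironmentVariable(..., 'User')` is only visible to processes started afterwards, so run the check from a new terminal.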
- ▶ Start Recording - Record voice input
- ⏹ Stop Recording - Stop and transcribe
- 💬 Chat - Open/close chat window
- ⚙ Settings - Language selection and options
- Type messages or use voice input
- Real-time streaming responses
- Persistent chat history
- Color-coded display for different response types
For detailed chat features, see widget/CHAT_FEATURE.md
- QUICKSTART.md - Quick start guide
- LAUNCH_GUIDE.md - Detailed launch instructions
- COMPLETE_SETUP.md - Complete setup summary
- widget/CHAT_FEATURE.md - Chat feature documentation
- CHAT_IMPLEMENTATION_SUMMARY.md - Technical details
When services are running, access interactive API docs:
- Transcribe Service: http://localhost:6001/docs
- Agent Service: http://localhost:6002/docs
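The same information can be read programmatically. FastAPI applications serve their schema at `/openapi.json` by default, so the sketch below lists the routes a running service exposes. It assumes both services keep that default. `docs_url` and `list_routes` are hypothetical helper names, and the ports are the ones in the URLs above.

```python
import http.client
import json
import urllib.request

# Ports taken from the docs URLs above.
DOCS_PORTS = {"transcribe": 6001, "agent": 6002}

# The services are local, so skip any system-wide HTTP proxy.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def docs_url(port: int, host: str = "localhost") -> str:
    """Build the interactive docs URL for a service."""
    return f"http://{host}:{port}/docs"


def list_routes(port: int, host: str = "localhost", timeout: float = 2.0):
    """Return sorted route paths from /openapi.json, or None if unreachable."""
    url = f"http://{host}:{port}/openapi.json"
    try:
        with _opener.open(url, timeout=timeout) as resp:
            schema = json.load(resp)
    except (OSError, ValueError, http.client.HTTPException):
        # Connection refused, timeout, HTTP error, or a non-JSON response.
        return None
    return sorted(schema.get("paths", {}))


if __name__ == "__main__":
    for name, port in DOCS_PORTS.items():
        routes = list_routes(port)
        print(name, docs_url(port), routes if routes is not None else "not reachable")
```

Returning `None` instead of raising keeps the check usable in a startup script that only needs to know whether a service is reachable.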
- Audio transcription using OpenAI Whisper
- Multi-language support
- FastAPI-based REST API
- Conversational AI with GPT-5
- Tool execution (file ops, web search, memory, todos)
- Streaming responses
- Chat history management
- Always-on-top interface
- Voice recording with transcription
- Chat window with persistent history
- Draggable, customizable position
Contributions are welcome! Feel free to:
- Report bugs or issues
- Suggest new features
- Submit pull requests
- Improve documentation
This project is licensed under the MIT License - see the LICENSE file for details.
destorted93
- GitHub: @destorted93
- Repository: ai-agent-desktop