Skip to content

Desktop AI assistant with voice input and chat interface. Multi-service architecture using OpenAI Response API.

License

Notifications You must be signed in to change notification settings

destorted93/ai-agent-desktop

Repository files navigation

AI Agent

License: MIT

Desktop AI assistant with voice input, chat interface, and powerful tools.

πŸš€ Quick Start

Double-click START_CLEAN.bat to launch everything:

  • 🎀 Transcribe Service (voice-to-text)
  • πŸ€– Agent Service (AI brain)
  • πŸ’¬ Widget (desktop interface)

Everything runs in the background - no terminal windows! The widget appears on your desktop ready to use.

What it does

A modular AI agent that:

  • Listens to your voice and transcribes it
  • Chats with you using GPT-5
  • Executes tasks through tools
  • Remembers context across sessions
  • Runs as separate services for stability

Architecture

Multi-service architecture - Each component runs independently:

Core Services

  • agent-main/ - Main AI agent (CLI + API modes)
  • agent/ - Agent core (OpenAI wrapper with streaming)
  • transcribe/ - Audio β†’ text conversion
  • widget/ - Desktop UI with voice/chat

Data & Tools

  • chat_history/ - Conversation persistence
  • memory/ - User context storage
  • tools/ - Agent capabilities (filesystem, web, todos, etc.)

Utilities

  • service-template/ - Boilerplate for new services

Features

🎀 Voice Input

  • Click to record
  • Auto-transcribe using Whisper
  • Multi-language support

πŸ’¬ Chat Interface

  • Type or speak your messages
  • Real-time streaming responses
  • Color-coded output (thinking, responses, function calls)
  • Screenshot sharing
  • Persistent history

πŸ› οΈ Agent Capabilities

  • Files: Read, write, search, edit
  • Web: Search and scrape
  • Todos: Task management
  • Memory: Remember user preferences
  • Documents: Create Word files
  • Charts: Generate visualizations
  • Terminal: Run commands
  • Images: AI image generation

Installation

Option 1: Quick Install

INSTALL.bat

Option 2: Manual Install

pip install -r agent-main/requirements.txt
pip install -r transcribe/requirements.txt
pip install -r widget/requirements.txt

Running

Complete System (Recommended)

1. Install dependencies first:

INSTALL.bat

2. Launch the agent:

START_CLEAN.bat  # Clean launch - runs in background, no terminals

This is the main way to use the agent. Just close the widget to stop everything.

Alternative Launchers

START.bat        # Shows terminal windows (useful for debugging)

Individual Components

# Interactive CLI
python agent-main/app.py --mode interactive

# Agent API
python agent-main/app.py --mode service --port 6002

# Transcribe service
python transcribe/app.py

# Widget only
python widget/widget.py

Configuration

Set your OpenAI API key:

# Windows
$env:OPENAI_API_KEY = "sk-..."

# Or edit config.py files

Service Ports

  • 6000 - Transcribe service
  • 6002 - Agent service
  • Widget connects to both

Adding Features

Each service has its own README with details:

  • /agent-main/README.md - Main agent docs
  • /transcribe/README.md - Transcription service
  • /widget/README.md - Desktop widget
  • /tools/README.md - Available tools

Project Layout

ai-agent-desktop/
β”œβ”€β”€ INSTALL.bat         # Install all dependencies
β”œβ”€β”€ START_CLEAN.bat     # Launch everything (background, no terminals)
β”œβ”€β”€ START.bat           # Launch with visible terminals
β”œβ”€β”€ agent-main/         # Main AI agent
β”œβ”€β”€ agent/              # Core agent logic
β”œβ”€β”€ transcribe/         # Voice-to-text service
β”œβ”€β”€ widget/             # Desktop interface
β”œβ”€β”€ tools/              # Agent tools
β”œβ”€β”€ chat_history/       # Conversation storage
β”œβ”€β”€ memory/             # User context
└── service-template/   # New service boilerplate

Why Multi-Service Architecture?

  • Isolation: One crash doesn't kill everything
  • Resources: Distribute load across processes
  • Development: Work on parts independently
  • Scaling: Add more services easily

Set your OpenAI API key:

# PowerShell (permanent)
[System.Environment]::SetEnvironmentVariable('OPENAI_API_KEY', 'sk-...', 'User')

# Or just enter it when prompted by START.bat

Usage Guide

Getting Started

  1. Install: Run INSTALL.bat to install all dependencies
  2. Launch: Double-click START.bat to start all services
  3. Use: Click πŸ’¬ on the widget to open chat, or use voice recording

For detailed instructions, see LAUNCH_GUIDE.md

Widget Controls

  • β–Ά Start Recording - Record voice input
  • ⏹ Stop Recording - Stop and transcribe
  • πŸ’¬ Chat - Open/close chat window
  • βš™ Settings - Language selection and options

Chat Window

  • Type messages or use voice input
  • Real-time streaming responses
  • Persistent chat history
  • Color-coded display for different response types

For detailed chat features, see widget/CHAT_FEATURE.md

Documentation

API Documentation

When services are running, access interactive API docs:

Services Overview

Transcribe Service (Port 6001)

  • Audio transcription using OpenAI Whisper
  • Multi-language support
  • FastAPI-based REST API

Agent Service (Port 6002)

  • Conversational AI with GPT-5
  • Tool execution (file ops, web search, memory, todos)
  • Streaming responses
  • Chat history management

Widget (Desktop App)

  • Always-on-top interface
  • Voice recording with transcription
  • Chat window with persistent history
  • Draggable, customizable position

Contributing

Contributions are welcome! Feel free to:

  • Report bugs or issues
  • Suggest new features
  • Submit pull requests
  • Improve documentation

License

This project is licensed under the MIT License - see the LICENSE file for details.

Author

destorted93

About

Desktop AI assistant with voice input and chat interface. Multi-service architecture using OpenAI Response API.

Topics

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Contributors 2

  •  
  •