AI Agent

Desktop AI assistant with voice input, chat interface, and powerful tools.

🚀 Quick Start

Double-click START_CLEAN.bat to launch everything:

🎤 Transcribe Service (voice-to-text)
🤖 Agent Service (AI brain)
💬 Widget (desktop interface)

Everything runs in the background - no terminal windows! The widget appears on your desktop ready to use.

What it does

A modular AI agent that:

Listens to your voice and transcribes it
Chats with you using GPT-5
Executes tasks through tools
Remembers context across sessions
Runs as separate services for stability

Architecture

Multi-service architecture - Each component runs independently:

Core Services

agent-main/ - Main AI agent (CLI + API modes)
agent/ - Agent core (OpenAI wrapper with streaming)
transcribe/ - Audio → text conversion
widget/ - Desktop UI with voice/chat

Data & Tools

chat_history/ - Conversation persistence
memory/ - User context storage
tools/ - Agent capabilities (filesystem, web, todos, etc.)

Utilities

service-template/ - Boilerplate for new services

Features

🎤 Voice Input

Click to record
Auto-transcribe using Whisper
Multi-language support

💬 Chat Interface

Type or speak your messages
Real-time streaming responses
Color-coded output (thinking, responses, function calls)
Screenshot sharing
Persistent history

🛠️ Agent Capabilities

Files: Read, write, search, edit
Web: Search and scrape
Todos: Task management
Memory: Remember user preferences
Documents: Create Word files
Charts: Generate visualizations
Terminal: Run commands
Images: AI image generation

Installation

Option 1: Quick Install

INSTALL.bat

Option 2: Manual Install

pip install -r agent-main/requirements.txt
pip install -r transcribe/requirements.txt
pip install -r widget/requirements.txt

Running

Complete System (Recommended)

1. Install dependencies first:

INSTALL.bat

2. Launch the agent:

START_CLEAN.bat  # Clean launch - runs in background, no terminals

This is the main way to use the agent. Just close the widget to stop everything.

Alternative Launchers

START.bat        # Shows terminal windows (useful for debugging)

Individual Components

# Interactive CLI
python agent-main/app.py --mode interactive

# Agent API
python agent-main/app.py --mode service --port 6002

# Transcribe service
python transcribe/app.py

# Widget only
python widget/widget.py
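
The `--mode` and `--port` flags above suggest a small command-line front end. A minimal sketch of how such an entry point could parse them, assuming the standard-library `argparse` module; the `build_parser` name, the default mode, and the default port are illustrative and are not taken from agent-main/app.py:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags shown in this README."""
    parser = argparse.ArgumentParser(description="AI agent entry point (sketch)")
    parser.add_argument(
        "--mode",
        choices=["interactive", "service"],
        default="interactive",  # assumed default, not confirmed by the project
        help="run the interactive CLI or the HTTP service",
    )
    parser.add_argument(
        "--port",
        type=int,
        default=6002,  # agent service port listed in this README
        help="port to listen on in service mode",
    )
    return parser
```

Calling `build_parser().parse_args(["--mode", "service", "--port", "6002"])` yields an object with `mode` and `port` attributes. Restricting `--mode` with `choices` makes a mistyped mode fail immediately with a usage message.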

Configuration

Set your OpenAI API key:

# Windows PowerShell (current session only)
$env:OPENAI_API_KEY = "sk-..."

# Or edit config.py files
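
Inside a service, the key can then be read from the environment at startup. A minimal sketch, assuming only the `OPENAI_API_KEY` variable named above; the `load_api_key` helper is hypothetical, and the project's config.py files may handle this differently:

```python
import os


def load_api_key(env=None) -> str:
    """Return the OpenAI API key, or fail early with a readable error."""
    # Accept a mapping for testing; fall back to the real environment.
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set")
    return key
```

Failing at startup, rather than on the first API call, makes a missing key easier to spot when services run in the background without a terminal.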

Service Ports

6001 - Transcribe service
6002 - Agent service
Widget connects to both
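
Because the widget depends on both services, it can help to confirm that each port is accepting connections before troubleshooting further. A minimal sketch using only the standard library; the `is_port_open` helper is hypothetical and not part of the project:

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

For example, `is_port_open("127.0.0.1", 6002)` should return `True` while the agent service is running. This checks only that something is listening, not that it is the expected service.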

Adding Features

Each service has its own README with details:

/agent-main/README.md - Main agent docs
/transcribe/README.md - Transcription service
/widget/README.md - Desktop widget
/tools/README.md - Available tools

Project Layout

ai-agent-desktop/
├── INSTALL.bat         # Install all dependencies
├── START_CLEAN.bat     # Launch everything (background, no terminals)
├── START.bat           # Launch with visible terminals
├── agent-main/         # Main AI agent
├── agent/              # Core agent logic
├── transcribe/         # Voice-to-text service
├── widget/             # Desktop interface
├── tools/              # Agent tools
├── chat_history/       # Conversation storage
├── memory/             # User context
└── service-template/   # New service boilerplate

Why Multi-Service Architecture?

Isolation: One crash doesn't kill everything
Resources: Distribute load across processes
Development: Work on parts independently
Scaling: Add more services easily
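
Running each component as its own process is what makes this isolation possible. A minimal sketch of a launcher that starts background services, waits for the widget to close, and then stops the services, assuming the standard-library `subprocess` module; `run_until_widget_exits` and `PYTHON` are hypothetical names, and this is not the project's launcher.py:

```python
import subprocess
import sys

PYTHON = sys.executable  # interpreter used to start each component


def run_until_widget_exits(service_cmds, widget_cmd) -> int:
    """Start services, block until the widget exits, then stop the services."""
    services = [subprocess.Popen(cmd) for cmd in service_cmds]
    try:
        # Blocks until the widget process ends and returns its exit code.
        return subprocess.call(widget_cmd)
    finally:
        for proc in services:
            if proc.poll() is None:  # still running
                proc.terminate()
        for proc in services:
            try:
                proc.wait(timeout=5)
            except subprocess.TimeoutExpired:
                proc.kill()  # last resort if the service ignores terminate
```

A call might pass `[[PYTHON, "transcribe/app.py"], [PYTHON, "agent-main/app.py", "--mode", "service", "--port", "6002"]]` as the services and `[PYTHON, "widget/widget.py"]` as the widget. A crash in one service leaves the other processes running, which is the isolation benefit described above.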

To keep the OpenAI API key across sessions, store it as a user environment variable:

# PowerShell (permanent)
[System.Environment]::SetEnvironmentVariable('OPENAI_API_KEY', 'sk-...', 'User')

# Or just enter it when prompted by START.bat

Usage Guide

Getting Started

Install: Run INSTALL.bat to install all dependencies
Launch: Double-click START.bat to start all services
Use: Click 💬 on the widget to open chat, or use voice recording

For detailed instructions, see LAUNCH_GUIDE.md

Widget Controls

▶ Start Recording - Record voice input
⏹ Stop Recording - Stop and transcribe
💬 Chat - Open/close chat window
⚙ Settings - Language selection and options

Chat Window

Type messages or use voice input
Real-time streaming responses
Persistent chat history
Color-coded display for different response types

For detailed chat features, see widget/CHAT_FEATURE.md

Documentation

QUICKSTART.md - Quick start guide
LAUNCH_GUIDE.md - Detailed launch instructions
COMPLETE_SETUP.md - Complete setup summary
widget/CHAT_FEATURE.md - Chat feature documentation
CHAT_IMPLEMENTATION_SUMMARY.md - Technical details

API Documentation

When services are running, access interactive API docs:

Transcribe Service: http://localhost:6001/docs
Agent Service: http://localhost:6002/docs
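
The interactive docs are generated from an OpenAPI schema, which FastAPI serves at `/openapi.json` by default. A minimal sketch that lists a running service's routes from that schema, assuming the default schema location has not been changed; `fetch_spec` and `list_routes` are hypothetical helpers:

```python
import json
from urllib.request import urlopen

HTTP_METHODS = {"get", "post", "put", "patch", "delete", "head", "options"}


def list_routes(spec: dict) -> list:
    """Return sorted 'METHOD /path' strings from an OpenAPI schema dict."""
    routes = []
    for path, item in spec.get("paths", {}).items():
        for key in item:
            # Path items can also hold non-method keys such as "parameters".
            if key.lower() in HTTP_METHODS:
                routes.append(f"{key.upper()} {path}")
    return sorted(routes)


def fetch_spec(base_url: str) -> dict:
    """Download the schema from a running service (assumes /openapi.json)."""
    with urlopen(f"{base_url}/openapi.json", timeout=5) as resp:
        return json.load(resp)
```

With the agent service running, `list_routes(fetch_spec("http://localhost:6002"))` would print its endpoints without opening a browser.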

Services Overview

Transcribe Service (Port 6001)

Audio transcription using OpenAI Whisper
Multi-language support
FastAPI-based REST API

Agent Service (Port 6002)

Conversational AI with GPT-5
Tool execution (file ops, web search, memory, todos)
Streaming responses
Chat history management

Widget (Desktop App)

Always-on-top interface
Voice recording with transcription
Chat window with persistent history
Draggable, customizable position

Contributing

Contributions are welcome! Feel free to:

Report bugs or issues
Suggest new features
Submit pull requests
Improve documentation

License

This project is licensed under the MIT License - see the LICENSE file for details.

Author

destorted93

GitHub: @destorted93
Repository: ai-agent-desktop
