A self-hosted AI agent for developers managing production incidents. Runs locally on your laptop as Docker containers. No login, no cloud sync — everything stays on your machine.
Paste errors, logs, alerts, or screenshots. The agent triages immediately: identifies what's wrong, runs commands against your infrastructure, checks Kubernetes pods, queries AWS/Azure/GCP, and walks you through root cause and fix.
Built by Doctor Droid.
- Prerequisites
- Quick Start
- Architecture
- Configuration
- Web UI Guide
- CLI Usage
- Skills
- Memory
- Tools & MCP Servers
- Learner Worker
- API Reference
- Host CLI Access
- Troubleshooting
- Development
- Docker and Docker Compose (Docker Desktop on Mac/Windows, or Docker Engine on Linux)
- An API key from one of: Azure AI Foundry, OpenAI, or Anthropic
- (Optional) CLI tools installed on your host:
  kubectl, aws, az, gcloud, gh, docker, terraform, helm
# 1. Clone the repository
git clone https://github.com/drdroid-io/droid-agent.git
cd droid-agent
# 2. Create your environment file
cp .env.example .env
# 3. Edit .env — set your AI provider and API key (see Configuration below)
# At minimum, set AI_PROVIDER and the relevant API key
# 4. Create your MCP config (tools + external service integrations)
cp config/mcp.example.json config/mcp.json
# 5. (Optional) Edit config/mcp.json — enable MCP servers you want to use
# Set "enabled": true and fill in API keys for Datadog, Sentry, etc.
# 6. Build and start all containers
docker compose up -d --build
# 7. Open the web UI
open http://localhost:7433
# 8. (Optional) Run infrastructure sync to pre-load your stack context
docker compose exec droid-agent node sync.js

To stop:

docker compose down

To stop and wipe all data (Redis, PostgreSQL, volumes):

docker compose down -v

To rebuild from scratch:

./rebuild.sh

Three Docker containers, all local:
┌─────────────────────────────────────────────────────────────┐
│ Your Laptop │
│ │
│ ┌──────────────────┐ ┌─────────────┐ ┌───────────────┐ │
│ │ Droid Agent │ │ Redis │ │ PostgreSQL │ │
│ │ (Node.js) │ │ (conv. │ │ (incidents, │ │
│ │ │ │ history) │ │ feedback, │ │
│ │ ┌─────────────┐ │ │ │ │ learner) │ │
│ │ │ Web UI │ │ │ │ │ │ │
│ │ │ :7433 │ │ │ │ │ │ │
│ │ └─────────────┘ │ │ │ │ │ │
│ │ ┌─────────────┐ │ │ │ │ │ │
│ │ │ Agent Loop │ │ │ │ │ │ │
│ │ │ + Tools │ │ │ │ │ │ │
│ │ └─────────────┘ │ │ │ │ │ │
│ │ ┌─────────────┐ │ │ │ │ │ │
│ │ │ Learner │ │ │ │ │ │ │
│ │ │ Worker │ │ │ │ │ │ │
│ │ └─────────────┘ │ │ │ │ │ │
│ └──────────────────┘ └─────────────┘ └───────────────┘ │
│ │ │
│ ▼ (mounted volumes) │
│ ./skills/ ./memory/ ./config/ ~/.kube/ ~/.aws/ etc. │
└─────────────────────────────────────────────────────────────┘
| Component | Purpose | Persistence |
|---|---|---|
| Droid Agent | Express server, agent loop, web UI, learner worker | Stateless (code only) |
| Redis | Conversation message cache (24h TTL) | Docker volume redis_data |
| PostgreSQL | Incidents, tool audit log, conversations, feedback, learner state | Docker volume pg_data |
| ./skills/ | Domain knowledge (markdown files) | Host filesystem (volume mount) |
| ./memory/ | Agent's persistent memory | Host filesystem (volume mount) |
| ./config/ | Tool definitions, MCP server config | Host filesystem (volume mount) |
Droid Agent supports four AI providers. Set AI_PROVIDER in .env and fill in the corresponding keys.
AI_PROVIDER=azure-openai
AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com
AZURE_OPENAI_API_KEY=your-api-key
AZURE_OPENAI_DEPLOYMENT=gpt-4.1
AZURE_API_VERSION=2025-04-01-preview

AI_PROVIDER=azure-kimi
AZURE_KIMI_ENDPOINT=https://your-resource.services.ai.azure.com
AZURE_KIMI_API_KEY=your-api-key
AZURE_KIMI_DEPLOYMENT=kimi-k2
AZURE_API_VERSION=2025-04-01-preview

AI_PROVIDER=openai
OPENAI_API_KEY=sk-...
OPENAI_MODEL=gpt-4.1

AI_PROVIDER=claude
ANTHROPIC_API_KEY=sk-ant-...
CLAUDE_MODEL=claude-sonnet-4-20250514

| Variable | Description | Default |
|---|---|---|
| AI_PROVIDER | Provider: openai, claude, azure-openai, azure-kimi | azure-openai |
| AGENT_NAME | Display name in UI and prompts | Droid Agent |
| PORT | Web UI port | 7433 |
| LEARNER_ENABLED | Enable periodic learner worker | true |
| LEARNER_INTERVAL_MS | Learner run interval (milliseconds) | 3600000 (1 hour) |
| LEARNER_MIN_MESSAGES | Min messages in a conversation before the learner analyzes it | 4 |
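Since LEARNER_INTERVAL_MS is specified in milliseconds, a small shell sketch for converting a human-friendly interval (the 15 minutes here is just an example value):

```shell
# Convert a desired learner interval in minutes to the LEARNER_INTERVAL_MS value.
minutes=15
echo "LEARNER_INTERVAL_MS=$((minutes * 60 * 1000))"
# prints: LEARNER_INTERVAL_MS=900000
```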
Redis and PostgreSQL connection strings are set automatically by docker-compose — no need to configure them.
Open http://localhost:7433 after starting the containers.
The main panel. Type or paste:
- Error messages or stack traces
- Log snippets
- Alert notifications
- Descriptions like "check if my prod pods are healthy"
The agent will:
- Run commands against your infrastructure (kubectl, aws, docker, etc.)
- Show you the output inline with collapsible tool call blocks
- Analyze the results and suggest next steps
- Ask before saving anything to memory
Keyboard shortcuts:
- `Enter` — send message
- `Shift+Enter` — new line in message
- `Ctrl+L` — start new conversation
- `Escape` — close panels/modals
Image upload: Click the paperclip icon to attach up to 4 screenshots (Grafana dashboards, error pages, metric graphs). The agent can analyze images.
Click "Run Infrastructure Sync" in the sidebar, or run via CLI:
docker compose exec droid-agent node sync.js

The sync agent discovers your infrastructure by running commands and saving structured summaries to memory/infra/. It follows the skills/infra-sync.md skill file — edit that file to customize what gets discovered.
What it discovers:
- Docker containers, networks, volumes
- Kubernetes clusters (all contexts), namespaces, deployments, services, pods
- AWS accounts and profiles, EC2, ECS, RDS, Lambda, S3
- Azure subscriptions, AKS, VMs, web apps
- Google Cloud projects, GCE, GKE, Cloud Run
- GitHub authentication and repositories
- Listening network ports
- Local project directories
The sync panel shows real-time streaming output of every command executed and every file written.
Click Skills in the sidebar to browse all loaded skills. Each skill is a markdown file in ./skills/. Click a skill to expand and read its content. Skills are injected into every agent conversation as system context.
Click Memory in the sidebar to browse all memory files. Each file shows its path, size, and last modified date. Click to expand and read content. Click edit to modify a file inline — changes are saved immediately.
Memory files are organized by directory:
- `context.md` — your permanent notes
- `infra/` — auto-populated by sync
- `incidents/` — agent-written investigation summaries
- `learned/` — auto-generated by the learner worker
Click Tools in the sidebar to see all configured tools. Shows the tool name, executor type, timeout, and full JSON configuration.
The sidebar shows your 5 most recent conversations. Click any to reload it with full tool call details (command + output). Click "show all N conversations" to open the full history browser.
Each past conversation preserves:
- All user messages
- All agent responses with markdown rendering
- Tool call blocks with the exact command and output
- Memory write notifications
- Thumbs up/down feedback
Click "+ new chat" to start a fresh conversation.
Every agent response has thumbs up/down buttons. Click to record whether the response was helpful:
- Thumbs up — marks the response as validated. The agent will favor similar approaches in future conversations.
- Thumbs down — marks the response as unhelpful. The agent will try different approaches.
Feedback is stored in PostgreSQL and included in the agent's system prompt as context for future conversations.
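Feedback can also be submitted outside the UI via the /api/feedback endpoint. A sketch of the request body (field names come from the API Reference below; the messageId/conversationId values are placeholders, and 'up'/'down' as the accepted values is an assumption based on the thumbs buttons):

```shell
# Build and pretty-print a feedback payload. IDs here are placeholders.
body='{"messageId":"msg-123","conversationId":"cli-test","feedback":"up"}'
echo "$body" | python3 -m json.tool
# To submit against a running agent:
# curl -X POST http://localhost:7433/api/feedback \
#   -H 'Content-Type: application/json' -d "$body"
```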
All commands assume you're in the droid-agent directory.
# Start all containers (detached)
docker compose up -d
# Start with rebuild
docker compose up -d --build
# Stop containers (keeps data)
docker compose down
# Stop and delete all data
docker compose down -v
# Full rebuild (stop, remove image, rebuild, start)
./rebuild.sh
# View logs
docker compose logs -f droid-agent
docker compose logs -f droid-agent --tail=50
# Restart just the agent (after editing code)
docker compose up -d --build droid-agent

# Run sync interactively (shows progress in terminal)
docker compose exec droid-agent node sync.js
# Run sync in the background
docker compose exec -d droid-agent node sync.js

# Shell into the agent container
docker compose exec droid-agent sh
# Test a command as the agent would run it
docker compose exec droid-agent kubectl config get-contexts
docker compose exec droid-agent aws sts get-caller-identity
docker compose exec droid-agent az account show
docker compose exec droid-agent gcloud config list
docker compose exec droid-agent gh auth status
docker compose exec droid-agent docker ps

curl -X POST http://localhost:7433/api/learner/trigger | python3 -m json.tool

curl http://localhost:7433/api/health | python3 -m json.tool

Example output:
{
"status": "ok",
"model": "gpt-4.1",
"provider": "Azure AI Foundry (OpenAI)",
"skillsLoaded": 4,
"memoryFiles": 12,
"toolsAvailable": 4,
"mcpServersEnabled": 0,
"mcpToolsAvailable": 0,
"memoryTotalBytes": 30783,
"redis": "connected",
"postgres": "connected"
}

# Send a message (returns SSE stream)
curl -N -X POST http://localhost:7433/api/chat \
-H 'Content-Type: application/json' \
-d '{"message":"list all pods in production","conversationId":"cli-test"}'
# List conversations
curl http://localhost:7433/api/conversations | python3 -m json.tool
# Get messages for a conversation
curl http://localhost:7433/api/conversations/cli-test/messages | python3 -m json.tool
# List recent incidents
curl http://localhost:7433/api/incidents | python3 -m json.tool
# Check learner status
curl http://localhost:7433/api/learner/status | python3 -m json.tool

Skills are markdown files in ./skills/ that teach the agent domain-specific knowledge. They're injected into the system prompt on every message — no restart needed.
| Skill | File | Description |
|---|---|---|
| Kubernetes | skills/kubernetes.md | Pod logs, describe, events, common failure states (OOMKilled, CrashLoopBackOff), rollouts, HPA |
| Docker | skills/docker.md | Container logs, inspect, stats, exec, compose, common issues (port conflicts, volumes) |
| General Debugging | skills/general-debugging.md | Incident triage framework, stack trace reading, 5xx/4xx triage, memory leaks, N+1 queries, latency spikes |
| Infra Sync | skills/infra-sync.md | What to discover during infrastructure sync |
Create a .md file in ./skills/:
# Example: Add a runbook for your payments service
cat > ./skills/payments-runbook.md << 'EOF'
# Payments Service Runbook
## Service Overview
- Runs in Kubernetes namespace: payments
- Database: PostgreSQL on RDS (payments-db)
- Dependencies: Stripe API, Redis cache, notification service
## Common Issues
### Stuck payments
1. Check the payments queue: `kubectl exec -it payments-worker -- rails console`
2. Look for locked transactions in the DB
3. Check Stripe webhook delivery status
### High latency
1. Check Redis connection pool: `kubectl top pods -n payments`
2. Check RDS slow query log
3. Check Stripe API response times in Datadog
## Restart Procedure
1. `kubectl rollout restart deployment/payments-api -n payments`
2. Wait for rollout: `kubectl rollout status deployment/payments-api -n payments`
3. Verify health: `curl https://payments.internal/health`
EOF

The agent will immediately have access to this knowledge — no restart needed.
skills/infra-sync.md controls what the infrastructure sync discovers. It's a guide, not a script — the agent reads it and decides what commands to run. You can:
- Add sections for services specific to your stack
- Remove sections you don't care about
- Add hints about where things are deployed
- Include specific queries you want the agent to run
Example customization:
### Our Microservices
Check our main services running in the `production` namespace on the `prod-east` kubectl context.
List all deployments with their replica counts and images.
Check for any pods not in Running state.

memory/
├── context.md # Your permanent notes (edit this!)
├── infra/ # Auto-populated by sync
│ ├── docker.md
│ ├── kubernetes.md
│ ├── aws.md
│ ├── azure.md
│ ├── gcloud.md
│ ├── github.md
│ ├── network.md
│ ├── projects.md
│ └── summary.md
├── incidents/ # Agent writes investigation summaries here
│ └── 2026-03-15-api-latency.md
└── learned/ # Learner worker writes patterns here
├── patterns.md
└── investigation-summaries.md
Three ways:

1. Web UI — Click "Memory" in sidebar, click edit on any file, modify, click Save.

2. Host filesystem — Edit files directly:

   vim ./memory/context.md

3. API:

   curl -X POST http://localhost:7433/api/memory/write \
     -H 'Content-Type: application/json' \
     -d '{
       "path": "context.md",
       "content": "# My Stack\n\n## Services\n- api: port 8080\n- db: postgres on 5432"
     }'
memory/context.md is your permanent notes file. The agent reads it on every message. Use it for:
# My Stack
## Services
- **API Gateway**: runs on port 8080, deployed to `prod` k8s context
- **Payment Service**: port 3001, depends on Stripe and Redis
- **Database**: PostgreSQL 15 on AWS RDS, instance: prod-db-main
## On-Call
- PagerDuty escalation: Backend > SRE > VP Eng
- Slack channel: #incidents
- Runbook wiki: https://wiki.internal/runbooks
## Common Issues
- API latency usually caused by N+1 queries in the orders endpoint
- OOM on worker pods: increase memory limit to 2Gi

The learner worker periodically analyzes past conversations and writes:
- `memory/learned/patterns.md` — investigation patterns and useful commands
- `memory/learned/investigation-summaries.md` — summaries of past debugging sessions
These are automatically included in the agent's context.
| Tool | Description |
|---|---|
| run_shell | Execute any shell command on the host machine. All host CLIs (kubectl, aws, az, etc.) and credentials are available. |
| fetch_url | HTTP GET a URL. Use for health endpoints, metrics APIs. |
| read_file | Read a file from the host filesystem. |
| write_memory | Save markdown to the agent's memory. |
Configured in config/mcp.json under the tools array.
Safety: Dangerous commands (rm -rf /, DROP TABLE, shutdown, etc.) are blocked by default. To allow a specific pattern, add the regex string to allowed_dangerous in mcp.json.
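The check can be pictured as a case-insensitive regex scan over the command string. A minimal sketch (the patterns below are illustrative examples, not the agent's actual blocklist):

```shell
# Illustrative blocklist check. These patterns are examples, not the agent's real list.
is_blocked() {
  echo "$1" | grep -qiE 'rm -rf /[[:space:]]*$|drop table|shutdown'
}
is_blocked 'rm -rf /'         && echo "blocked: rm -rf /"
is_blocked 'kubectl get pods' || echo "allowed: kubectl get pods"
```

A pattern added to allowed_dangerous would be a regex string in this same style, whitelisting a specific match.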
The MCP configuration lives in config/mcp.json. This file is gitignored because it contains your service credentials. An example template is provided:
# First-time setup: copy the example
cp config/mcp.example.json config/mcp.json
# Then edit to add your credentials
vim config/mcp.json

The file has two sections:

- `tools` — Built-in tools (run_shell, fetch_url, read_file, write_memory). These work out of the box.
- `mcpServers` — External MCP server integrations. All disabled by default — enable the ones you use.
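A quick way to see which servers you have enabled (sketch: the inline sample stands in for config/mcp.json; in practice, point json.load at the real file):

```shell
# List enabled MCP servers from an mcp.json-shaped document.
echo '{"mcpServers":{"datadog":{"enabled":true},"sentry":{"enabled":false}}}' \
  | python3 -c 'import json,sys
cfg = json.load(sys.stdin)
for name, srv in cfg.get("mcpServers", {}).items():
    if srv.get("enabled"):
        print(name)'
# prints: datadog
```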
MCP (Model Context Protocol) servers give the agent access to external services like Datadog, Sentry, Grafana, PagerDuty, etc. The agent auto-discovers all tools from each enabled server on startup.
Two types of MCP servers are supported:
The agent spawns the MCP server as a child process and communicates via JSON-RPC over stdio:
"datadog": {
"enabled": true,
"command": "npx",
"args": ["-y", "@anthropic/mcp-server-datadog"],
"env": {
"DATADOG_API_KEY": "your-actual-api-key",
"DATADOG_APP_KEY": "your-actual-app-key"
}
}

Some services expose MCP over HTTP instead of stdio:
"render": {
"enabled": true,
"type": "http",
"url": "https://mcp.render.com/mcp",
"headers": {
"Authorization": "Bearer your-render-api-key"
}
}

To enable a server:
1. Open `config/mcp.json`
2. Find the server entry (e.g. `"datadog"`)
3. Set `"enabled": true`
4. Fill in `env` values with your real API keys
5. Restart: `docker compose restart droid-agent`
To verify a server is connected:
# Check health endpoint
curl -s http://localhost:7433/api/health | python3 -m json.tool
# Look for mcpServersEnabled > 0 and mcpToolsAvailable > 0
# Check container logs for MCP initialization
docker compose logs droid-agent | grep mcp

How it works at runtime:
- On startup, the agent spawns each enabled MCP server process
- Performs the MCP handshake (initialize → list tools)
- All discovered tools are injected into the agent's system prompt
- When the agent calls an MCP tool, the request is routed to the correct server
- If a server fails to start, the agent continues without it
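For the stdio case, the handshake in step 2 is plain JSON-RPC written to the child process's stdin, one JSON object per line. A sketch of the first frame (the clientInfo values are placeholders; the protocol version shown is one published MCP revision):

```shell
# The initialize request a stdio MCP client would write; pretty-printed for reading.
printf '%s\n' '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"droid-agent","version":"0.1.0"}}}' \
  | python3 -m json.tool
```

After the server's initialize response, a tools/list request retrieves the tool catalog that gets injected into the system prompt.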
Here's what your config/mcp.json might look like with Datadog, Sentry, and GitHub enabled:
{
"tools": [
{ "name": "run_shell", "description": "Run a shell command...", "executor": "shell", "timeout": 60 },
{ "name": "fetch_url", "description": "HTTP GET a URL...", "executor": "http_get", "timeout": 10 },
{ "name": "read_file", "description": "Read a file...", "executor": "read_file" },
{ "name": "write_memory", "description": "Save to memory...", "executor": "memory_write" }
],
"allowed_dangerous": [],
"mcpServers": {
"datadog": {
"enabled": true,
"command": "npx",
"args": ["-y", "@anthropic/mcp-server-datadog"],
"env": {
"DATADOG_API_KEY": "abc123...",
"DATADOG_APP_KEY": "def456..."
}
},
"sentry": {
"enabled": true,
"command": "npx",
"args": ["-y", "@sentry/mcp-server"],
"env": {
"SENTRY_AUTH_TOKEN": "sntrys_..."
}
},
"github": {
"enabled": true,
"command": "npx",
"args": ["-y", "@modelcontextprotocol/server-github"],
"env": {
"GITHUB_PERSONAL_ACCESS_TOKEN": "ghp_..."
}
}
}
}

After restart, the agent can:
- Query Datadog metrics and logs
- Search Sentry issues and view stack traces
- List GitHub repos, PRs, and issues
All through natural conversation — just ask "show me the latest Sentry errors for the payments service."
| Service | Package | Category |
|---|---|---|
| Datadog | @anthropic/mcp-server-datadog | Monitoring / APM |
| Sentry | @sentry/mcp-server | Error Tracking |
| Grafana | @grafana/mcp-server | Dashboards |
| New Relic | newrelic-mcp-server | Monitoring / APM |
| PagerDuty | @pagerduty/mcp-server | Incident Management |
| Elasticsearch | mcp-server-elasticsearch | Logs / Search |
| Cloudflare | @cloudflare/mcp-server-cloudflare | Infrastructure |
| Supabase | @supabase/mcp-server-supabase | Backend / DB |
| Vercel | @vercel/mcp-server | Deployment |
| Render | HTTP: mcp.render.com/mcp | Deployment |
| PostHog | posthog-mcp-server | Product Analytics |
| Linear | mcp-linear | Issue Tracking |
| GitHub | @modelcontextprotocol/server-github | Source Control |
| Slack | @modelcontextprotocol/server-slack | Communication |
| Kubernetes | @modelcontextprotocol/server-kubernetes | Infrastructure |
| PostgreSQL | @modelcontextprotocol/server-postgres | Database |
| Brave Search | @modelcontextprotocol/server-brave-search | Web Search |
| Filesystem | @modelcontextprotocol/server-filesystem | File Access |
A background process that runs every hour (configurable) and learns from past conversations.
What it does:
- Reads conversations from PostgreSQL that haven't been analyzed yet
- Sends transcripts to the AI model for pattern extraction
- Identifies: issue types, investigation patterns, root causes, resolutions
- Merges findings into `memory/learned/patterns.md` and `memory/learned/investigation-summaries.md`
- These files are automatically included in future agent conversations
Configuration:
LEARNER_ENABLED=true # Set to false to disable
LEARNER_INTERVAL_MS=3600000 # Run every hour (default)
LEARNER_MIN_MESSAGES=4      # Minimum messages before analyzing a conversation

Manual trigger:
curl -X POST http://localhost:7433/api/learner/trigger | python3 -m json.tool

Check last run:
curl http://localhost:7433/api/learner/status | python3 -m json.tool

| Method | Endpoint | Description |
|---|---|---|
| POST | /api/chat | Send a message (returns SSE stream). Body: {message, images?, conversationId} |
| GET | /api/health | System health (model, provider, skills, memory, redis, postgres, MCP) |
| POST | /api/sync | Run infrastructure sync (returns SSE stream) |
| GET | /api/skills | List all loaded skills |
| GET | /api/memory | List all memory files with content |
| POST | /api/memory/write | Write a memory file. Body: {path, content} |
| GET | /api/tools | List configured tools from mcp.json |
| GET | /api/conversations | List recent conversations. Query: ?limit=20 |
| GET | /api/conversations/:id/messages | Get all messages for a conversation (includes feedback) |
| POST | /api/feedback | Submit feedback. Body: {messageId, conversationId, feedback: 'up' \| 'down'} |
| GET | /api/incidents | List incidents from PostgreSQL. Query: ?limit=50 |
| GET | /api/tool-executions | Tool audit log. Query: ?conversationId=xxx&limit=100 |
| POST | /api/learner/trigger | Manually trigger a learner cycle |
| GET | /api/learner/status | Get last learner run info |
The agent container has these CLIs pre-installed:
kubectl, aws, az, gcloud, gh, docker, jq, curl, git, ssh, wget
Host credentials are mounted read-only via docker-compose volumes:
| Host Path | Container Path | Purpose |
|---|---|---|
| ~/.kube | /root/.kube | Kubernetes contexts and clusters |
| ~/.aws | /root/.aws | AWS credentials and config |
| ~/.azure | /root/.azure | Azure CLI auth tokens |
| ~/.config/gcloud | /root/.config/gcloud | GCP credentials |
| ~/.config/gh | /root/.config/gh | GitHub CLI auth |
| ~/.docker | /root/.docker | Docker registry auth |
| ~/.ssh | /root/.ssh | SSH keys |
| /var/run/docker.sock | /var/run/docker.sock | Docker daemon socket |
If a credential directory doesn't exist on your host, comment out that line in docker-compose.yml to avoid startup errors.
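Before first startup, a quick sketch to see which of the mounted directories actually exist on your machine (paths taken from the mount table above):

```shell
# One line per mount path: report whether it exists on this host.
for d in "$HOME/.kube" "$HOME/.aws" "$HOME/.azure" "$HOME/.config/gcloud" \
         "$HOME/.config/gh" "$HOME/.docker" "$HOME/.ssh"; do
  if [ -d "$d" ]; then echo "found:   $d"; else echo "missing: $d"; fi
done
```

Any path reported missing is a candidate for commenting out in docker-compose.yml.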
Error: Mount denied: path ~/.azure does not exist
Comment out the missing mount in docker-compose.yml:
volumes:
  # - ~/.azure:/root/.azure:ro   # Comment out if you don't use Azure

Check the container logs:
docker compose logs droid-agent --tail=30

Look for [WARN] or [parser] messages. Common causes:
- Model outputs malformed JSON in tool calls (parser handles most cases)
- System prompt too large (check `System prompt built: N chars` in logs)
Check if the infra-sync skill exists:
ls ./skills/infra-sync.md

Check container logs for sync output:
docker compose logs droid-agent --tail=50 | grep sync

Test directly:
docker compose exec droid-agent kubectl config get-contexts
docker compose exec droid-agent aws sts get-caller-identity

If credentials aren't found, verify the volume mounts in docker-compose.yml and that the files exist on your host.
docker system prune -f
docker compose restart postgres
docker compose restart redis

Check logs:
docker compose logs droid-agent | grep mcp

Common issues:
- npx needs to download the package (first run can be slow)
- Invalid API keys
- Network issues inside the container
Click the sun/moon icon in the top-right of the sidebar.
droid-agent/
├── docker-compose.yml # Container orchestration
├── Dockerfile # Agent container image
├── .gitignore # Excludes .env, mcp.json, user data
├── .env.example # Environment template
├── rebuild.sh # Full rebuild script
├── config/
│ ├── mcp.example.json # MCP config template (committed)
│ ├── mcp.json # Your MCP config with credentials (gitignored)
│ └── init.sql # PostgreSQL schema
├── skills/
│ ├── kubernetes.md # K8s debugging knowledge
│ ├── docker.md # Docker debugging knowledge
│ ├── general-debugging.md # General triage framework
│ └── infra-sync.md # Sync discovery guide
├── memory/
│ ├── context.md # Your permanent notes
│ ├── infra/ # Sync-populated
│ ├── incidents/ # Agent-written
│ └── learned/ # Learner-written
└── app/
├── server.js # Express HTTP server + SSE
├── agent.js # Core agent loop (chat + sync)
├── provider.js # AI provider abstraction
├── tools.js # Tool execution (shell, HTTP, file, memory)
├── mcp-client.js # MCP server client (stdio + HTTP)
├── skills.js # Skill file loader
├── memory.js # Memory filesystem helpers
├── db.js # PostgreSQL queries
├── redis.js # Redis conversation cache
├── learner.js # Periodic learning worker
├── sync.js # CLI entry point for sync
├── package.json # Node.js dependencies
└── public/
├── index.html # Single-file frontend (vanilla JS)
└── logo.png # Doctor Droid logo
- Skills/Memory/Config: Edit files on the host — they're volume-mounted, changes are live.
- Backend code (app/): Edit, then `docker compose up -d --build droid-agent`.
- Frontend (index.html): Edit, then `docker compose up -d --build droid-agent`.
- Dockerfile: Run `./rebuild.sh` for a clean build.
- Database schema (init.sql): Run `./rebuild.sh` (wipes and recreates DB).
Edit app/provider.js:
- Add a new case to the `switch (PROVIDER)` block
- Add a call function if the API format differs from OpenAI
Edit config/mcp.json — add to the tools array:
{
"name": "my_tool",
"description": "What this tool does",
"args": { "param": "description" },
"executor": "shell",
"timeout": 30
}

Then add the executor logic in app/tools.js if using a custom executor type.
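After editing, it's worth confirming the entry still parses before restarting. A sketch (the inline string mirrors the tool entry above; to check the whole file, run `python3 -m json.tool config/mcp.json` instead):

```shell
# Validate that a tool entry parses; json.tool exits non-zero on a syntax error.
echo '{"name":"my_tool","description":"What this tool does","args":{"param":"description"},"executor":"shell","timeout":30}' \
  | python3 -m json.tool > /dev/null && echo "valid JSON"
# prints: valid JSON
```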
After running infrastructure sync, try:
- "Show me all pods that aren't running in the prod cluster"
- "Check the last 100 lines of logs for the api deployment"
- "What's using the most CPU across all my k8s clusters?"
- "Are there any OOMKilled pods in the last hour?"
- "List all AWS EC2 instances that are running"
- "Check if my RDS database has any slow queries"
- "Write a runbook for restarting the payments service"
- "What changed in the last deploy?"
- Upload a Grafana screenshot + "what caused this spike?"
- Paste a stack trace + "what went wrong?"
MIT