# AutoOps Architect

Zero/low-code meta-agent that designs and runs autonomous SRE & ops workflows from natural-language goals.

AutoOps Architect takes your operations goals (like "investigate elevated 5xx errors") and automatically generates executable workflow graphs that collect data, analyze issues, and recommend actions, all without writing code.
## Features

- Natural Language to Action: Describe what you want to investigate, and the system creates a structured workflow
- Composable Workflows: Generated workflows are DAGs (directed acyclic graphs) with clear dependencies
- Pluggable Tools: Integrate with your existing monitoring, ticketing, and automation systems
- Institutional Memory: Learn from past investigations to improve future workflows
- Human-in-the-Loop: Dangerous operations require explicit approval
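A generated workflow is a plain DAG document you can inspect, version, and re-run. As a rough sketch of that idea (the field names below are illustrative, not the project's actual schema), a workflow pairs a node list with dependency edges, and dangerous nodes carry an approval flag:

```python
# Hypothetical shape of a generated workflow document.
# The real schema lives in src/autoops_architect/models.
workflow = {
    "id": "wf-example-001",
    "name": "Investigate 5xx Errors",
    "nodes": [
        {"id": "collect_logs", "tool": "log_collection", "requires_approval": False},
        {"id": "analyze", "tool": "analysis", "requires_approval": False},
        {"id": "restart_service", "tool": "remediation", "requires_approval": True},
    ],
    "edges": [
        {"from": "collect_logs", "to": "analyze"},
        {"from": "analyze", "to": "restart_service"},
    ],
}

def validate(wf: dict) -> bool:
    """Check that every edge references a declared node."""
    ids = {n["id"] for n in wf["nodes"]}
    return all(e["from"] in ids and e["to"] in ids for e in wf["edges"])

print(validate(workflow))  # True
```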
## Quick Start

### Docker Compose

```bash
# Clone the repository
git clone https://github.com/nik-kale/AutoOPS-Architect.git
cd AutoOPS-Architect

# Set your API key
export OPENAI_API_KEY="sk-..."
# or
export ANTHROPIC_API_KEY="sk-..."

# Start with Docker Compose (includes Redis cache)
docker-compose up -d

# Access web UI at http://localhost:8000

# View logs
docker-compose logs -f autoops

# Stop services
docker-compose down
```

### Local Installation

```bash
# Clone the repository
git clone https://github.com/nik-kale/AutoOPS-Architect.git
cd AutoOPS-Architect

# Install with pip
pip install -e .

# Or with development dependencies
pip install -e ".[dev]"
```

### Docker

```bash
# Build image
docker build -t autoops-architect .

# Run container
docker run -d \
  -p 8000:8000 \
  -e OPENAI_API_KEY="sk-..." \
  --name autoops \
  autoops-architect

# Access at http://localhost:8000
```

## Usage

```bash
# Generate a workflow plan from a goal
autoops plan "Investigate elevated 5xx errors for the checkout service in prod"

# Execute a workflow file
autoops run workflow.json

# Plan and execute in one step
autoops plan-and-run "Check why login API is slow" --service auth-api --env production
```
Example output:

```
Planning workflow for:
  Investigate elevated 5xx errors for the checkout service in prod

Generated workflow: Investigate 5xx Errors Workflow
  ID: wf-5xx-investigation-001
  Nodes: 6, Edges: 5

Workflow Steps
├── [log_collection] Collect service logs
├── [metric_query] Query error rate metrics
├── [analysis] Analyze collected data
├── [rca_call] Run root cause analysis
├── [summary] Generate investigation summary
└── [ticket_create] Create tracking ticket
```
## Architecture

```
┌───────────────────────────────────────────────────────────────┐
│                           User Goal                           │
│        "Investigate elevated 5xx errors for checkout"         │
└───────────────────────────────┬───────────────────────────────┘
                                │
                                ▼
┌───────────────────────────────────────────────────────────────┐
│                      Architect (Planner)                      │
│                                                               │
│  • Parses goal                                                │
│  • Retrieves similar past workflows from memory               │
│  • Uses LLM to generate workflow graph                        │
└───────────────────────────────┬───────────────────────────────┘
                                │
                                ▼
┌───────────────────────────────────────────────────────────────┐
│                      WorkflowGraph (DAG)                      │
│                                                               │
│  ┌─────────┐    ┌─────────┐    ┌─────────┐    ┌─────────┐     │
│  │ Collect │───▶│ Analyze │───▶│   RCA   │───▶│ Summary │     │
│  │  Logs   │    │  Data   │    │  Call   │    │         │     │
│  └─────────┘    └─────────┘    └─────────┘    └─────────┘     │
└───────────────────────────────┬───────────────────────────────┘
                                │
                                ▼
┌───────────────────────────────────────────────────────────────┐
│                           Executor                            │
│                                                               │
│  • Topologically sorts nodes                                  │
│  • Executes nodes via Tools                                   │
│  • Handles failures and approvals                             │
│  • Collects results                                           │
└───────────────────────────────┬───────────────────────────────┘
                                │
                                ▼
┌───────────────────────────────────────────────────────────────┐
│                        Memory Backend                         │
│                                                               │
│  • Stores workflow history                                    │
│  • User preferences                                           │
│  • Successful playbooks for reuse                             │
└───────────────────────────────────────────────────────────────┘
```
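The Executor's scheduling step described above is a standard topological sort. The sketch below is an illustrative reimplementation using Kahn's algorithm, not the project's actual executor code:

```python
from collections import deque

def topo_order(nodes: list[str], edges: list[tuple[str, str]]) -> list[str]:
    """Return an execution order in which every dependency runs first."""
    indegree = {n: 0 for n in nodes}
    children: dict[str, list[str]] = {n: [] for n in nodes}
    for src, dst in edges:
        indegree[dst] += 1
        children[src].append(dst)
    ready = deque(n for n in nodes if indegree[n] == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)           # a real executor would invoke the node's tool here
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
    if len(order) != len(nodes):
        raise ValueError("workflow graph contains a cycle")
    return order

nodes = ["collect_logs", "query_metrics", "analyze", "summary"]
edges = [("collect_logs", "analyze"), ("query_metrics", "analyze"), ("analyze", "summary")]
print(topo_order(nodes, edges))
# → ['collect_logs', 'query_metrics', 'analyze', 'summary']
```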
## Ecosystem

AutoOps Architect is designed to work with the broader AutoOps ecosystem:
| Component | Description | Status |
|---|---|---|
| AutoRCA-Core | AI-powered root cause analysis | Integration ready |
| Secure-MCP-Gateway | Secure tool execution (Jira, Slack, etc.) | Integration ready |
| Ops-Agent-Desktop | Browser-based mission automation | Integration ready |
| Autonomous Ops Hub | Central orchestration platform | Planned |
## Configuration

### Environment Variables

```bash
# LLM Provider (auto-detected if not set)
export OPENAI_API_KEY="sk-..."
# or
export ANTHROPIC_API_KEY="sk-..."

# Optional integrations
export AUTORCA_URL="http://localhost:8080"
export MCP_GATEWAY_URL="http://localhost:3000"
export OPS_AGENT_DESKTOP_URL="http://localhost:9090"
```

### LLM Providers

```python
from autoops_architect.llm import LLMConfig, LLMProvider
from autoops_architect.planner import Architect, PlannerConfig

# Use OpenAI
config = PlannerConfig(
    llm_config=LLMConfig(
        provider=LLMProvider.OPENAI,
        model="gpt-4",
    )
)

# Use Anthropic
config = PlannerConfig(
    llm_config=LLMConfig(
        provider=LLMProvider.ANTHROPIC,
        model="claude-3-sonnet-20240229",
    )
)

# Use mock (for testing)
config = PlannerConfig(
    llm_config=LLMConfig(provider=LLMProvider.MOCK)
)

architect = Architect(config=config)
```

### LLM Response Caching

AutoOps Architect supports transparent LLM response caching to reduce costs and improve latency. When enabled, identical planning requests return cached responses instantly.
Benefits:
- 50%+ cost reduction for repeated goals
- 70% faster response times for cached queries
- Supports memory, filesystem, and Redis backends
```python
from autoops_architect.llm.cache import CacheConfig, CachedLLMClient
from autoops_architect.planner import Architect, PlannerConfig

# Enable caching with in-memory backend (default)
config = PlannerConfig(
    cache_config=CacheConfig(
        enabled=True,
        backend="memory",   # Options: memory, filesystem, redis
        ttl_seconds=3600,   # Cache entries valid for 1 hour
        max_size=1000,      # Maximum cached entries
    )
)

# Filesystem cache (persists across restarts)
config = PlannerConfig(
    cache_config=CacheConfig(
        enabled=True,
        backend="filesystem",
        cache_dir="~/.cache/autoops-architect",
        ttl_seconds=7200,   # 2 hours
    )
)

# Redis cache (for distributed deployments)
config = PlannerConfig(
    cache_config=CacheConfig(
        enabled=True,
        backend="redis",
        redis_url="redis://localhost:6379/0",
        ttl_seconds=3600,
    )
)

architect = Architect(config=config)

# Force bypass cache for specific requests
workflow = await architect.llm_client.complete_json(
    messages,
    force_refresh=True,  # Ignores cache, makes a fresh LLM call
)

# Get cache statistics
if isinstance(architect.llm_client, CachedLLMClient):
    stats = architect.llm_client.get_cache_stats()
    print(f"Cache hit rate: {stats['hit_rate']:.1%}")
```

Cache backends comparison:
| Backend | Persistence | Distributed | Use Case |
|---|---|---|---|
| memory | No | No | Single-process, development |
| filesystem | Yes | No | Single-server, persists across restarts |
| redis | Yes | Yes | Multi-server, production deployments |
Note: the Redis backend requires the `redis` package, installed separately with `pip install redis`.
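Conceptually, a transparent cache like this keys each request on a hash of its exact content and expires entries after a TTL. The following is a minimal sketch of that mechanism (not the library's actual key format or backend code):

```python
import hashlib
import json
import time

def cache_key(messages: list[dict], model: str) -> str:
    """Derive a stable key from the exact request content."""
    payload = json.dumps({"model": model, "messages": messages}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

class TTLCache:
    """In-memory backend sketch: entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: int = 3600):
        self.ttl = ttl_seconds
        self.store: dict[str, tuple[float, object]] = {}
        self.hits = self.misses = 0

    def get(self, key: str):
        entry = self.store.get(key)
        if entry and time.monotonic() - entry[0] < self.ttl:
            self.hits += 1
            return entry[1]
        self.misses += 1
        return None

    def put(self, key: str, value: object) -> None:
        self.store[key] = (time.monotonic(), value)

cache = TTLCache()
key = cache_key([{"role": "user", "content": "plan goal"}], "gpt-4")
assert cache.get(key) is None       # first request: cache miss, would call the LLM
cache.put(key, {"workflow": "..."})
assert cache.get(key) is not None   # identical request: served from cache
```

Identical inputs always hash to the same key, which is what makes the caching transparent to callers.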
## CLI Reference

```bash
# Plan a workflow
autoops plan "Your goal here" [OPTIONS]
  --service, -s    Target service(s)
  --env, -e        Environment (prod/staging/dev)
  --priority, -p   Priority level
  --output, -o     Output file path
  --yaml           Output as YAML
  --mock           Use mock LLM
  --mermaid        Show Mermaid diagram

# Run a workflow
autoops run <workflow.json> [OPTIONS]
  --dry-run, -n    Simulate execution
  --auto-approve   Auto-approve all requests

# Plan and run
autoops plan-and-run "Your goal" [OPTIONS]

# View history
autoops history [OPTIONS]
  --limit, -n      Number of entries
  --search, -q     Search keywords

# List available tools
autoops tools [OPTIONS]
  --all, -a        Show disabled tools

# Validate a workflow
autoops validate <workflow.json>

# Version info
autoops version
```

## Project Structure

```
autoops-architect/
├── src/autoops_architect/
│   ├── models/      # Data models (Goal, Workflow, etc.)
│   ├── planner/     # Architect/meta-agent logic
│   ├── executor/    # Workflow execution engine
│   ├── tools/       # Tool interface and implementations
│   ├── memory/      # Memory backends
│   ├── llm/         # LLM client abstraction
│   └── cli.py       # CLI application
├── tests/           # Test suite
├── examples/        # Example goals and workflows
├── docs/            # Documentation
└── pyproject.toml   # Project configuration
```
## Development

```bash
# Install dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Run tests with coverage
pytest --cov=autoops_architect

# Type checking
mypy src/autoops_architect

# Linting
ruff check src/

# Format code
ruff format src/
```

## Contributing

We welcome contributions! See CONTRIBUTING.md for guidelines.
- Add a new tool integration
- Create example workflows for common scenarios
- Improve documentation
- Add more test cases
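Adding a new tool integration typically means implementing the project's tool interface. The sketch below is hypothetical (the class and field names are assumptions; see `src/autoops_architect/tools` for the real interface):

```python
from dataclasses import dataclass
from typing import Any

@dataclass
class ToolResult:
    success: bool
    data: dict[str, Any]

class PagerDutyTool:
    """Illustrative custom tool; field names here are assumptions, not the real API."""

    name = "pagerduty_lookup"
    requires_approval = False  # read-only lookup, safe to auto-run

    def run(self, params: dict[str, Any]) -> ToolResult:
        # A real implementation would call the PagerDuty API here.
        return ToolResult(success=True, data={"service": params.get("service")})

result = PagerDutyTool().run({"service": "checkout"})
print(result.success, result.data["service"])  # True checkout
```

Tools that mutate infrastructure would set `requires_approval = True` so the executor pauses for the human-in-the-loop gate.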
## Roadmap

See docs/roadmap.md for the detailed roadmap, including:
- Phase 2: UI, templates, and real integrations
- Phase 3: Code quality, performance, and CI/CD
- Phase 4: Security, safety, and QA
- Phase 5: Ecosystem and community features
## License

MIT License. See LICENSE for details.
Built with: