Releases: Arc-Computer/atlas-sdk

v0.1.16: Zero-Friction Onboarding and KV Cache Optimization

22 Nov 23:11

Overview

This release delivers major improvements to developer experience and performance optimization. The onboarding time drops from 30+ minutes to under 3 minutes with zero manual interventions, and new playbook injection modes enable KV cache optimization for reduced latency and token costs.

Zero-Friction Onboarding (PR #146)

Complete refactor of atlas env init to achieve "one API key to training data in 5 minutes."

What Changed

LLM-Driven Agent Selection

  • Intelligent candidate selection using Claude Haiku 4.5
  • Auto-selects with confidence > 0.85 (no user prompts required)
  • Shows reasoning for lower confidence selections
  • Graceful fallback to heuristic ranking if API unavailable
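
The selection flow above can be sketched as follows. The dataclass, function names, and fallback heuristic are illustrative assumptions, not the SDK's actual API (the real logic lives in atlas/cli/candidate_selection.py):

```python
# Illustrative sketch of confidence-gated selection with heuristic fallback.
# Names and candidate shape are hypothetical, not the SDK's real API.
from dataclasses import dataclass

AUTO_SELECT_THRESHOLD = 0.85  # auto-select above this confidence, no prompt

@dataclass
class Candidate:
    name: str
    confidence: float  # model-assigned confidence in [0, 1]
    reasoning: str

def heuristic_rank(candidates):
    # Fallback ranking used when the LLM API is unavailable;
    # here, simply prefer the first candidate as discovered.
    return candidates[0]

def select_candidate(candidates, llm_available=True):
    if not llm_available:
        return heuristic_rank(candidates), "heuristic fallback"
    best = max(candidates, key=lambda c: c.confidence)
    if best.confidence > AUTO_SELECT_THRESHOLD:
        return best, "auto-selected"  # no user prompt required
    # Lower confidence: surface the model's reasoning to the user.
    return best, f"needs review: {best.reasoning}"
```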

Anthropic-Only Defaults

  • Student: Claude Haiku 4.5 (claude-haiku-4-5-20251001)
  • Teacher: Claude Sonnet 4.5 (claude-sonnet-4-5-20250929)
  • Based on runtime evaluation showing 0.989 reward score, 20.08s latency
  • Learning features enabled by default (few-shot prompting, playbook injection)

Integrated Storage Setup

  • Folds atlas init into env init flow
  • Auto-detects running PostgreSQL with connection validation
  • Prompts to start Docker Compose if not running
  • Validates database connectivity before claiming success
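
The "validate before claiming success" idea can be sketched with a plain TCP reachability check; the real init flow also validates credentials and the database itself, and the Docker Compose service name below is an assumption:

```python
# Minimal sketch: confirm something is listening on the configured
# Postgres host/port before telling the user their storage is ready.
import socket

def postgres_reachable(host: str = "localhost", port: int = 5432,
                       timeout: float = 1.0) -> bool:
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if not postgres_reachable():
    # Error message follows the Fix/Debug/Learn more pattern.
    print("PostgreSQL is not running. Fix: docker compose up -d postgres")
```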

Improved Developer Experience

  • Database connection validation prevents cryptic first-run errors
  • Error messages follow Fix/Debug/Learn more pattern
  • Positive framing emphasizes what works vs what is missing
  • All documentation links verified working

Results

| Metric | Target | Actual |
| --- | --- | --- |
| Time to first run | < 5 min | ~3 min |
| Manual interventions | 0 | 0 |
| Learning enabled | 100% | 100% |

New Files

  • atlas/cli/candidate_selection.py - LLM-based agent selection with fallback
  • atlas/cli/config_defaults.py - Anthropic configuration templates
  • atlas/cli/progress.py - Output formatting utilities
  • atlas/cli/education.py - Mental model education (not yet integrated)
  • tests/integration/test_env_init_anthropic.py - Integration tests

KV Cache Optimization (PR #144)

Configurable playbook injection modes optimize LLM provider KV cache usage for improved latency and reduced costs.

What Changed

Simplified Injection API

  • Removed confusing separate_message mode
  • Two clear modes:
    • prefix (default): Playbook injected before system prompt
    • suffix: Playbook injected after system prompt (enables KV cache reuse)
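
The two modes reduce to where the playbook text is concatenated relative to the system prompt. In suffix mode the system prompt stays byte-identical across requests, so the provider can reuse its KV cache. This sketch is illustrative; the function name is not the SDK's actual API:

```python
# Sketch of the two injection modes. Suffix mode keeps the static system
# prompt first, preserving the provider's cached prefix across requests.
def build_system_content(system_prompt: str, playbook: str,
                         mode: str = "prefix") -> str:
    if mode == "prefix":
        return f"{playbook}\n\n{system_prompt}"
    if mode == "suffix":
        return f"{system_prompt}\n\n{playbook}"
    # The old separate_message mode was removed in this release.
    raise ValueError(f"unknown injection mode: {mode}")
```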

Critical Bug Fixes

  • Fixed learning persistence bug where empty dict blocked persistence
  • Disabled actionability gate in evaluation configs to allow cognitive learnings

Performance Benefits

  • Suffix mode preserves static system prompt for KV cache reuse
  • Reduced latency for subsequent requests (cached prefix not reprocessed)
  • Lower token costs (providers charge less for cached tokens)

Testing

Unit test suite added with 13 tests covering:

  • Prefix/suffix injection modes
  • Few-shot example extraction and formatting
  • Redaction pattern validation

End-to-end verification with OpenAI GPT-4o-mini:

  • First run: Generated 7 playbook entries (1,076 chars)
  • Second run: Loaded and injected learning (+3,126 prompt tokens from playbook)

Dependencies

  • Added anthropic>=0.74.1 for LLM-based candidate selection

Breaking Changes

None. All changes are backward compatible:

  • Heuristic fallback preserves original env init behavior
  • Default playbook injection mode remains prefix
  • Existing flags and options preserved

Installation

pip install --upgrade arc-atlas

Documentation

  • Updated README.md with new quickstart flow
  • Updated docs/sdk/quickstart.mdx and docs/guides/introduction.mdx
  • Updated docs/sdk/learning_tracking.md with injection mode documentation

Contributors

Generated with Claude Code

Co-Authored-By: Claude noreply@anthropic.com

Version 0.1.15 - Learning Synthesis Fixes and Structured Payloads

12 Nov 01:12

Version 0.1.15 (2025-11-12)

🐛 Bug Fixes

  • Fixed learning synthesis not triggering when session_reward is set via orchestrator (#138)
    • Ensures proper ExecutionContext access after orchestrator completion
    • Adds defensive initialization checks for metadata property
  • Fixed Gemini learning synthesis reliability (#139)
    • Implements structured outputs with JSON Schema validation for Gemini models
    • Improves error handling with clearer messages
    • Reduces silent failures in learning synthesis
  • Fixed discovery experience issues
    • Resolves synthesis limits and module-level instance support problems
  • Fixed validation message visibility
    • Validation success/failure messages now always shown
  • Fixed .env variable loading
    • Ensures .env variables are applied to os.environ for LLM synthesis

✨ New Features

  • Structured payloads for BYOA adapters (#137)
    • Opt-in support for structured task_payload and step_payload metadata
    • Enables integration with test harnesses and simulation environments
    • Python adapters opt-in automatically; LLM adapters remain opted-out by default
    • See documentation for details

🔧 Improvements

  • Refactored env.py modules for improved maintainability
    • Extracted LLM inference utilities to atlas/sdk/llm_inference.py
    • Extracted factory builders to atlas/sdk/factory_synthesis.py
    • Extracted shared types to atlas/cli/env_types.py
    • Reduced code duplication by ~623 lines

📚 Documentation

  • Updated README with badges and development section
  • Added structured adapter payloads documentation
  • Clarified storage requirements for rewards and learning
  • Added training pipeline documentation links

0.1.14 - Multi-Provider LiteLLM Adapter

05 Nov 15:42
bca1131

0.1.14 - Multi-Provider LiteLLM Adapter

Features

  • Renamed OpenAI adapter to LitellmAdapter to reflect multi-provider support
  • Added support for Anthropic, Gemini, Bedrock, X.AI, Azure OpenAI via litellm

Deprecations

  • OpenAIAdapterConfig → LitellmAdapterConfig (old name still works with warnings)
  • type: openai → type: litellm in YAML configs (old syntax still works)
  • AdapterType.OPENAI → AdapterType.LITELLM (old enum value still works)

Migration Guide

For new code, update your configs:

agent:
  type: litellm  # Changed from: type: openai

from atlas.config.models import AdapterType, LitellmAdapterConfig
config = LitellmAdapterConfig(type=AdapterType.LITELLM, ...)

Old code continues to work with deprecation warnings.

v0.1.13 - Direct Training Data Access

04 Nov 13:34

Python Client for Direct Training Data Access

This release introduces a new Python client for querying training data directly from PostgreSQL, eliminating JSONL export intermediates and preventing schema drift between SDK and ATLAS Core.

Key Features

  • Direct Database Queries: Query training sessions without JSONL export steps
  • Reward-Based Filtering: Filter sessions using JSONB operators on reward scores
  • Selective Data Loading: Control trajectory events and learning data inclusion via flags
  • Pagination Support: Process large datasets efficiently with async iterators
  • Enterprise-Ready: Works with Docker Postgres and on-premises deployments

New Modules

  • atlas.training_data.client: Core query functions with async/sync variants
  • atlas.training_data.converters: Database dict to dataclass conversion
  • atlas.training_data.filters: SQL WHERE clause builder
  • atlas.training_data.pagination: Async iterator for batch processing
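
Reward-based filtering with JSONB operators can be sketched as a small WHERE-clause builder in the spirit of atlas.training_data.filters. The table and column names (sessions, session_metadata) are assumptions; the real schema may differ:

```python
# Illustrative sketch of a reward filter builder producing a parameterized
# query. ->> extracts the JSONB value as text; the cast enables numeric
# comparison. Schema names here are hypothetical.
from typing import List, Optional, Tuple

def build_reward_filter(min_reward: Optional[float] = None,
                        max_reward: Optional[float] = None
                        ) -> Tuple[str, List[float]]:
    clauses: List[str] = []
    params: List[float] = []
    if min_reward is not None:
        params.append(min_reward)
        clauses.append(
            f"(session_metadata->>'session_reward')::float >= ${len(params)}")
    if max_reward is not None:
        params.append(max_reward)
        clauses.append(
            f"(session_metadata->>'session_reward')::float <= ${len(params)}")
    where = " AND ".join(clauses) if clauses else "TRUE"
    return f"SELECT * FROM sessions WHERE {where}", params
```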

Schema Updates

AtlasSessionTrace: Added 6 essential fields (session_reward, trajectory_events, student_learning, teacher_learning, learning_history, adaptive_summary) and 7 property accessors

AtlasStepTrace: Added 2 essential fields (runtime, depends_on) and 1 property accessor

Performance Improvements

Added database indexes for production workloads:

  • Reward filtering: 10-100x faster
  • Date range queries: 50-100x faster
  • Critical for training workloads with millions of sessions

Install

pip install arc-atlas==0.1.13

Atlas SDK v0.1.12

02 Nov 16:26

Progressive Learning & Developer Experience Improvements

This release introduces a streamlined quickstart experience, comprehensive documentation consolidation, and enhanced evaluation tooling for measuring tool adoption and learning effectiveness.

Features

New Atlas Quickstart CLI (#114)

  • Added atlas quickstart command replacing the old examples/quickstart.py script
  • Run interactive demonstration with 3 security review tasks, metrics visualization, and cost estimation
  • Introduced ATLAS_OFFLINE_MODE=1 for mock LLM responses (deprecates ATLAS_FAKE_LLM)
  • Improved error handling with troubleshooting guidance for missing API keys and configuration issues
  • Zero-friction onboarding for new users

Auto-Validation on Environment Init (#113)

  • atlas env init now automatically validates discovered artifacts regardless of auto_skip flag
  • Removed confusing --validate flag that didn't actually change behavior
  • Ensures configuration correctness from the start

Capability Probe Fail-Open (#112)

  • Probe now gracefully handles missing API credentials with fail-open behavior
  • Defaults to 'paired' mode for zero-config operation when probe unavailable
  • Removed deprecated 'escalate' mode from all code paths
  • Fixed default LLM consistency: uses xai/grok-4-fast throughout
  • More resilient runtime with better error messages

Enhanced Tool Adoption Telemetry

  • Added comprehensive tool adoption validation framework with scripts/validate_tool_adoption.py
  • New evaluation configs for Claude and OpenAI models in configs/eval/learning/
  • Added digest statistics validation with scripts/validate_digest_stats.py
  • Improved telemetry tracking in Student persona with tool call recording
  • Reorganized eval scripts with clearer naming: benchmark_*, validate_*, report_*, collect_*

Improvements

Documentation Consolidation (#115, #116)

  • Removed documentation redundancy across codebase
  • Established canonical sources: quickstart.mdx, introduction.mdx, configuration.md
  • Removed all --offline flags from quickstart examples (promote real execution)
  • Featured mcp_tool_learning as primary production-ready example throughout docs
  • Fixed relative paths in PyPI readme for correct GitHub rendering

Examples Cleanup (#116)

  • Streamlined examples folder from 10+ files to 1 production-ready example (mcp_tool_learning)
  • Removed low-value examples: python_example.py, http_example.py, agent.py
  • Removed duplicate triage_adapters folder
  • Moved langgraph_adapter.py to test fixtures (test utility only)
  • Net reduction of 302 lines while improving clarity
  • Added README.md documenting naming convention pattern

Evaluation Tooling Organization

  • Reorganized configs/eval/ with clear structure: baseline/, learning/, reward/
  • Added comprehensive configs/eval/README.md and scripts/README.md documentation
  • Renamed scripts for clarity and consistency
  • Added comprehensive test coverage for tool adoption validation

Installation

pip install --upgrade arc-atlas

Full Changelog: v0.1.11...v0.1.12

Atlas SDK v0.1.11

01 Nov 19:02

Learning System Enhancements & Async Support

This release introduces transfer-level classification for learning evaluation and enables Atlas to run in async contexts.

Features

Transfer Level Classification (#106)

  • Added 4-tier transfer hierarchy: task/domain/workflow/universal
  • Evaluation reports now show transfer classification (e.g., "transfer no (task)")
  • Token efficiency guidance added to synthesis prompt
  • Rubric weights documented in learning_eval.md

Async Context Support (#105)

  • Exported arun() for async execution (fixes #104)
  • Enables learning harnesses, notebooks, and async frameworks
  • Added comprehensive MCP tool learning example (25 progressive tasks)
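
The reason a separate async entry point matters: asyncio.run() cannot be called from inside an already-running event loop (a notebook or async test harness), so an awaitable variant must be exposed alongside the sync one. The function names below are illustrative stand-ins, not Atlas's real signatures:

```python
# Sketch of the sync-wrapper-plus-async-entry-point pattern behind arun().
import asyncio

async def arun_agent(task: str) -> str:
    await asyncio.sleep(0)  # stand-in for real async work
    return f"done: {task}"

def run_agent(task: str) -> str:
    # Fine from plain scripts, but raises RuntimeError if an event
    # loop is already running in this thread.
    return asyncio.run(arun_agent(task))

async def harness():
    # Inside an async context, await the async entry point directly.
    return await arun_agent("triage")
```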

Improvements

Learning System

  • Gemini 2.5 Flash as default synthesizer
  • Enhanced 7-step synthesis process with quality gates
  • Fixed adoption tracking metadata recording

Terminology

  • Replaced "RIM" with "reward system" for clarity
  • Console output now shows "Judge scores" instead of "RIM scores"

Installation

pip install --upgrade arc-atlas

Full Changelog: v0.1.10...v0.1.11

v0.1.10

30 Oct 22:32

Zero-Config Atlas Run

This release enables developers to run atlas run immediately after atlas env init without requiring environment variables.

Key Changes

Validation Marker Workflow

  • Replaces ATLAS_DISCOVERY_VALIDATE=1 requirement with persistent .validated marker file
  • Marker written automatically during atlas env init, even when validation is deferred
  • Runtime factories check for marker file at instantiation
  • Emergency bypass available via ATLAS_SKIP_VALIDATION=1
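
The marker check can be sketched as below. The .validated filename and the ATLAS_SKIP_VALIDATION variable come from the release notes; the function itself is an illustrative reconstruction, not the SDK's actual code:

```python
# Sketch of the runtime-factory marker check with emergency bypass.
import os
from pathlib import Path

def validation_ok(atlas_dir: Path, env=os.environ) -> bool:
    if env.get("ATLAS_SKIP_VALIDATION") == "1":  # emergency bypass
        return True
    # Marker is written automatically during `atlas env init`.
    return (atlas_dir / ".validated").exists()
```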

Bug Fixes

  • Fixed HumanMessage format compatibility for LangGraph/DeepAgents (#101)
  • Fixed import shadowing causing infinite recursion in generated factories
  • Optimized IntermediateStepManager caching to prevent duplicate instantiation

Developer Experience

Before:

atlas env init
ATLAS_DISCOVERY_VALIDATE=1 atlas run --config .atlas/generated_config.yaml --task "..."

After:

atlas env init
atlas run --config .atlas/generated_config.yaml --task "..."

Installation

pip install --upgrade arc-atlas

Full Changelog: v0.1.9...v0.1.10

Atlas SDK v0.1.9

30 Oct 17:05

Atlas SDK v0.1.9 — 2025-10-30

Highlights

  • Repo-aware autodiscovery: atlas env init now analyzes your codebase to generate configs that mirror your agent's actual prompts, tools, LLM settings, and control flow—eliminating manual configuration for existing projects. Discovery now runs a real session during init, capturing telemetry and saving runtime config to .atlas/runs/ by default.
  • Learning evaluation foundation: Complete end-to-end evaluation pipeline with policy quality metrics, lifecycle tracking, usage instrumentation, and reproducible baseline experiments for prompt and model variants.
  • Provider-aware metadata management: OpenAI adapter now enforces budget constraints and projects execution metadata within configurable token limits, preventing context overflow. See configuration reference below for tuning knobs.
  • Critical pip install fix: Bundled config templates now ship in the wheel, ensuring atlas env init produces runnable configs without manual fixes.
  • Polished CLI experience: Streamlined learning summaries, dual-pass telemetry reporting, and improved warning management for cleaner output. Quickstart now documents warning suppression and dual-pass telemetry expectations.

Detailed Changelog

Autodiscovery & Configuration Synthesis

#90 — Repo-aware config generation

  • Extended autodiscovery to harvest project metadata (prompts, tools, config dictionaries, constructor defaults, factory helpers) directly from source code using AST, regex, and ripgrep analysis.
  • Enhanced candidate scoring to prioritize concrete implementations over abstract base classes by detecting NotImplementedError stubs, counting prompt/tool literals, and tracking instantiation patterns across the repository.
  • Generate runtime configs that automatically wire to your project's existing factories, system prompts, and LLM configurations instead of generic templates.
  • Persist signal maps in discover.json showing why each candidate was selected and what runtime wiring exists in the repo.
  • Broaden discovery beyond classes to include module-level callables, assignment factories, and pyproject.toml entry points, with framework-specific keyword boosting for LangGraph, LangChain, CrewAI, Semantic Kernel, and DeepAgents.
  • Improved LLM candidate extraction with recursive static analysis, deduplication, and normalized tool entry merging for more accurate agent config synthesis.
  • Increased model output limits (16k for Claude Haiku and Gemini 2.5 Flash) to accommodate richer metadata in a single synthesis round.
  • Changed default behavior: run_discovery=True now executes sample loops automatically during init, capturing telemetry and runtime config by default. Adapter call order changed to prioritize act(observation, ...) first for SecRL-style agent compatibility.
  • Result: Running atlas env init now produces two repo-aware artifacts: a factory module wrapping your environment/agent and a config file reflecting your project's actual implementation. Developers only need to tweak edge cases instead of rebuilding configs manually.

#88 — Bundle config templates in wheel

  • Fixed missing template issue that caused pip installations to fail generating runnable configs.
  • Packaged baseline template under atlas.templates and load via importlib.resources, maintaining repo fallback for local development.
  • Aligned example configs with packaged templates and removed stale active_judges toggle.
  • Impact: atlas env init --scaffold-config-full now works correctly for pip users without manual configuration fixes.

Learning & Evaluation Infrastructure

#86 — Learning evaluation foundation

  • Implemented end-to-end learning evaluation pipeline with policy quality, lifecycle, usage, and efficiency metrics.
  • Added playbook entry schema enforcement with rubric gating, provenance tracking, and runtime usage instrumentation throughout the learning pipeline (synthesizer, personas, runtime wiring).
  • Extended evaluation harness (scripts/eval_learning.py, atlas/evaluation/learning_report.py) with CLI flags for prompt and model experiments, filtering by project/task/tags, and model breakdown analytics.
  • Renamed learning nuggets to playbook entries for clearer conceptual alignment with policy-driven learning.
  • Added playbook impact instrumentation to track how learned policies influence runtime behavior.
  • Shipped experiment-ready configs (configs/eval/learning_baseline.yaml, learning_claude.yaml, learning_scope_shift.yaml) enabling reproducible baseline runs and systematic comparison of prompt/LLM variants.
  • Enhanced learning summary output with better formatting, dual-pass telemetry reporting, and reusable summary helpers.
  • Developer Experience: Complete evaluation toolkit for auditing learning behavior, comparing synthesis strategies, and measuring policy effectiveness across different model configurations.

#98 — Polish learning summary CLI

  • Streamlined README onboarding flow around autodiscovery with explicit PostgreSQL requirement documentation.
  • Hardened env config synthesis to tolerate missing capability metadata while preserving learning and runtime safety blocks.
  • Made StepwiseAgentAdapter more defensive when callables drop keyword argument support.
  • Reworked quickstart to highlight dual-pass telemetry, reuse learning summary helper, and suppress noisy warnings for cleaner CLI output.
  • Impact: Cleaner, more resilient CLI experience with better error handling and focused output.

Provider Integration & Metadata Management

#94 — Provider-aware metadata digest

  • Added OpenAI adapter digest helper that projects execution metadata and enforces provider-specific budget constraints.
  • Derive default character budgets from provider context windows (approximately 10% using 4 chars/token) with simple override support.
  • Expose digest statistics and budget utilization in payloads and logs with new config knobs for fine-tuning.
  • Instrumented metadata digest utilization tracking and fixed size accounting for accurate budget enforcement.
  • Developer Experience: Prevents context overflow by automatically managing metadata size within provider limits while maintaining visibility into budget consumption.

Configuration Reference

OpenAI adapter metadata digest can be configured via metadata_digest block in your adapter config:

metadata_digest:
  enabled: true                      # Enable/disable digest projection
  char_budget: 50000                 # Global character budget (overrides provider defaults)
  provider_char_budgets:             # Per-provider overrides
    openai: 50000
    anthropic: 80000
  max_plan_steps: 5                  # Max plan steps to include (0-20)
  max_step_summaries: 5              # Max step summaries to include (0-20)
  max_learning_history_entries: 3    # Max learning entries to include (0-10)
  max_reward_audit_entries: 3        # Max reward audit entries (0-10)
  max_prompt_rewrite_chars: 2000     # Max chars for prompt rewrites (256-20000)
  max_section_chars: 4000            # Max chars per metadata section (512-20000)
  max_string_chars: 1000             # Max chars per string value (128-4000)

Default budgets are derived as ~10% of the provider context window (in tokens) × 4 chars/token.
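
Worked example of that derivation; the 128k-token context window below is an assumed figure for illustration:

```python
# ~10% of the provider's context window (in tokens), converted to
# characters at 4 chars/token.
def default_char_budget(context_window_tokens: int,
                        fraction: float = 0.10,
                        chars_per_token: int = 4) -> int:
    return int(context_window_tokens * fraction * chars_per_token)

# 128,000 tokens -> 12,800 budget tokens -> 51,200 characters, in line
# with the 50,000-char openai override shown above.
```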

Evaluation & Testing Improvements

#79 — Reward evaluation enhancements

  • Added configs/eval/reward_system.yaml for config-driven judge presets and combinations without code changes.
  • Extended scripts/eval_reward_models.py with --collect-audit flag, safe audit payload serialization, and automatic Markdown report generation alongside JSON output.
  • Fixed reward eval stub to properly handle the audit flag, ensuring audit payload size guards work correctly.
  • Updated documentation in docs/reward_eval.md for new flags and config-driven workflow.
  • Research Impact: Streamlined reward model evaluation with better auditability and configurable judge combinations.

Documentation & Developer Experience

  • Explained quickstart warning suppression patterns for cleaner first-run experience.
  • Streamlined README onboarding around autodiscovery workflow with clearer PostgreSQL setup instructions.
  • Updated learning evaluation documentation showing new schema, gates, and baseline config usage.
  • Enhanced reward evaluation docs with audit workflow and config-driven presets.
  • Added experiment design notes in docs/rfc/2025-10-29-learning-eval-foundation.md.

Testing & Quality

  • Added comprehensive test coverage for learning evaluation (tests/unit/evaluation/test_learning_report.py, tests/unit/test_learning_usage.py).
  • Added prompt digest and OpenAI adapter test coverage (tests/unit/connectors/test_prompt_digest.py).
  • Added CLI summary test coverage (tests/unit/cli/test_run_summary.py).
  • Enhanced learning synthesizer tests with playbook entry validation.
  • Fixed reward eval stub compatibility with audit flag.

Migration Notes

The following changes may require configuration updates:

Factory Synthesis LLM Provider

The factory synthesis flow now defaults to Gemini 2.5 Flash only. If you rely on OpenAI or Anthropic models for autodiscovery, you must export GEMINI_API_KEY or modify atlas/sdk/factory_synthesis.py to reintroduce other providers.

Evaluation Config Renaming

Learning evaluation configs have been renamed for clarity:

  • learning_overhaul_base.yaml → learning_baseline.yaml
  • learning_overhaul_*.yaml → learning_*.yaml

Update automation scripts referencing the old names.

Terminology Updates

Learning nuggets are now **playbook entries**.

Read more

Atlas SDK v0.1.8

24 Oct 01:08

Atlas SDK v0.1.8 — 2025-10-23

Highlights

  • Autodiscovery-first CLI with .env/PYTHONPATH bootstrapping, config replays, and fake-LLM smoke tests.
  • Learning playbooks injected into every runtime prompt with hash-based cache safety.
  • Persistent telemetry plus learning reports with model-level analytics and flexible filters.
  • Session export guardrails requiring approval, complete with drift alerts and review tooling.
  • Expanded evaluation suites (capability probe, dual-agent runtime, reward models) and an end-to-end atlas train workflow.

Detailed Changelog

  • #76 – Learning playbook resolution, prompt injection, cache-key hashing, and config toggle for Student/Teacher personas.
  • #75 – CLI scaffolding upgrades: .env loading, PYTHONPATH fixes, provider/model autodiscovery, fake LLM mode, and env-ready integration tests.
  • #74 – atlas run --config, stronger adapter registration, LangChain serialization fixes, and CLI metadata/type-safety cleanup.
  • #73 – Learning evaluation harness with filtering by project/task/tags, model breakdown analytics, window specs, async summary generation, and extra session metadata.
  • #72 – Postgres persistence for discovery/runtime telemetry, hint-less learning evaluation workflow, report utilities, and database API extensions.
  • #70 – Autodiscovery onboarding flow, new runtime execution path, learning synthesizer config/state management, and orchestration improvements.
  • #63 – Session review/approval workflow, export filters by review status, CLI moderation commands, and documentation for production guardrails.
  • #55 – Capability probe refresh with xAI Grok provider support plus packaged evaluation datasets and docs.
  • #56 – Dual-agent runtime evaluation harness, synthetic runtime dataset, reference documentation, and unit tests.
  • #57 – Reward model evaluation harness, trajectory dataset, packaging updates, and documentation.
  • #54 – Learning history limits, payload trimming, streak metrics, documentation, and unit coverage.
  • #52 – atlas train CLI, reusable export filters, sample datasets, dotenv auto-load, and workflow documentation.

v0.1.7

17 Oct 20:47
bd9285e

What's Changed