Releases: Arc-Computer/atlas-sdk
v0.1.16: Zero-Friction Onboarding and KV Cache Optimization
Overview
This release delivers major improvements to developer experience and performance optimization. The onboarding time drops from 30+ minutes to under 3 minutes with zero manual interventions, and new playbook injection modes enable KV cache optimization for reduced latency and token costs.
Zero-Friction Onboarding (PR #146)
Complete refactor of atlas env init to achieve "one API key to training data in 5 minutes."
What Changed
LLM-Driven Agent Selection
- Intelligent candidate selection using Claude Haiku 4.5
- Auto-selects with confidence > 0.85 (no user prompts required)
- Shows reasoning for lower confidence selections
- Graceful fallback to heuristic ranking if API unavailable
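The selection flow above can be sketched as a small decision function; the `Candidate` class and threshold handling here are illustrative, not the SDK's actual implementation:

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    confidence: float
    reasoning: str

def select_candidate(candidates, llm_available=True, threshold=0.85):
    """Auto-select above the confidence threshold, show reasoning
    below it, and fall back to heuristic ranking without the LLM."""
    if not candidates:
        return None
    if not llm_available:
        # Graceful fallback: plain heuristic ranking, no prompts.
        return max(candidates, key=lambda c: c.confidence)
    best = max(candidates, key=lambda c: c.confidence)
    if best.confidence > threshold:
        return best  # auto-select, no user prompt required
    # Lower confidence: surface the reasoning before proceeding.
    print(f"Selected {best.name} ({best.confidence:.2f}): {best.reasoning}")
    return best
```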
Anthropic-Only Defaults
- Student: Claude Haiku 4.5 (claude-haiku-4-5-20251001)
- Teacher: Claude Sonnet 4.5 (claude-sonnet-4-5-20250929)
- Based on runtime evaluation showing 0.989 reward score, 20.08s latency
- Learning features enabled by default (few-shot prompting, playbook injection)
Integrated Storage Setup
- Folds `atlas init` into the `env init` flow
- Auto-detects running PostgreSQL with connection validation
- Prompts to start Docker Compose if not running
- Validates database connectivity before claiming success
Improved Developer Experience
- Database connection validation prevents cryptic first-run errors
- Error messages follow Fix/Debug/Learn more pattern
- Positive framing emphasizes what works vs what is missing
- All documentation links verified working
Results
| Metric | Target | Actual |
|---|---|---|
| Time to first run | < 5 min | ~3 min |
| Manual interventions | 0 | 0 |
| Learning enabled | 100% | 100% |
New Files
- `atlas/cli/candidate_selection.py` - LLM-based agent selection with fallback
- `atlas/cli/config_defaults.py` - Anthropic configuration templates
- `atlas/cli/progress.py` - Output formatting utilities
- `atlas/cli/education.py` - Mental model education (not yet integrated)
- `tests/integration/test_env_init_anthropic.py` - Integration tests
KV Cache Optimization (PR #144)
Configurable playbook injection modes optimize LLM provider KV cache usage for improved latency and reduced costs.
What Changed
Simplified Injection API
- Removed the confusing `separate_message` mode
- Two clear modes:
  - `prefix` (default): playbook injected before the system prompt
  - `suffix`: playbook injected after the system prompt (enables KV cache reuse)
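Assuming the injection mode is set in the config's learning block (the key names below are illustrative, not the SDK's documented schema), switching modes might look like:

```yaml
learning:
  playbook_injection: suffix   # default is "prefix"; "suffix" keeps the
                               # system prompt static so the provider can
                               # reuse its KV cache across requests
```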
Critical Bug Fixes
- Fixed a learning persistence bug where an empty dict blocked persistence
- Disabled actionability gate in evaluation configs to allow cognitive learnings
Performance Benefits
- Suffix mode preserves static system prompt for KV cache reuse
- Reduced latency for subsequent requests (cached prefix not reprocessed)
- Lower token costs (providers charge less for cached tokens)
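A minimal sketch of why suffix mode helps (this is not the SDK's actual message assembly): in suffix mode the system prompt's leading bytes stay identical across runs, which is the property provider-side KV caches key on:

```python
def build_messages(system_prompt: str, playbook: str, task: str, mode: str = "prefix"):
    """Sketch of the two injection modes. In suffix mode the static
    system prompt comes first, so a KV cache built for that prefix
    survives even as the playbook evolves between sessions."""
    if mode == "prefix":
        system = f"{playbook}\n\n{system_prompt}"  # prefix changes whenever learning updates
    else:  # "suffix"
        system = f"{system_prompt}\n\n{playbook}"  # static prompt first; cached prefix survives
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": task},
    ]

# Two runs with different playbooks still share the same leading bytes in suffix mode.
a = build_messages("You are a security reviewer.", "playbook v1", "audit auth", mode="suffix")
b = build_messages("You are a security reviewer.", "playbook v2", "audit auth", mode="suffix")
shared_prefix = (a[0]["content"].startswith("You are a security reviewer.")
                 and b[0]["content"].startswith("You are a security reviewer."))
```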
Testing
Unit test suite added with 13 tests covering:
- Prefix/suffix injection modes
- Few-shot example extraction and formatting
- Redaction pattern validation
End-to-end verification with OpenAI GPT-4o-mini:
- First run: Generated 7 playbook entries (1,076 chars)
- Second run: Loaded and injected learning (+3,126 prompt tokens from playbook)
Dependencies
- Added `anthropic>=0.74.1` for LLM-based candidate selection
Breaking Changes
None. All changes are backward compatible:
- Heuristic fallback preserves original env init behavior
- Default playbook injection mode remains `prefix`
- Existing flags and options preserved
Installation
```shell
pip install --upgrade arc-atlas
```
Documentation
- Updated README.md with new quickstart flow
- Updated docs/sdk/quickstart.mdx and docs/guides/introduction.mdx
- Updated docs/sdk/learning_tracking.md with injection mode documentation
Contributors
Generated with Claude Code
Co-Authored-By: Claude noreply@anthropic.com
Version 0.1.15 - Learning Synthesis Fixes and Structured Payloads
Version 0.1.15 (2025-11-12)
🐛 Bug Fixes
- Fixed learning synthesis not triggering when `session_reward` is set via orchestrator (#138)
  - Ensures proper `ExecutionContext` access after orchestrator completion
  - Adds defensive initialization checks for the metadata property
- Fixed Gemini learning synthesis reliability (#139)
- Implements structured outputs with JSON Schema validation for Gemini models
- Improves error handling with clearer messages
- Reduces silent failures in learning synthesis
- Fixed discovery experience issues
- Resolves synthesis limits and module-level instance support problems
- Fixed validation message visibility
- Validation success/failure messages now always shown
- Fixed .env variable loading
- Ensures .env variables are applied to os.environ for LLM synthesis
✨ New Features
- Structured payloads for BYOA adapters (#137)
  - Opt-in support for structured `task_payload` and `step_payload` metadata
  - Enables integration with test harnesses and simulation environments
  - Python adapters opt in automatically; LLM adapters remain opted out by default
  - See documentation for details
🔧 Improvements
- Refactored env.py modules for improved maintainability
  - Extracted LLM inference utilities to `atlas/sdk/llm_inference.py`
  - Extracted factory builders to `atlas/sdk/factory_synthesis.py`
  - Extracted shared types to `atlas/cli/env_types.py`
  - Reduced code duplication by ~623 lines
📚 Documentation
- Updated README with badges and development section
- Added structured adapter payloads documentation
- Clarified storage requirements for rewards and learning
- Added training pipeline documentation links
0.1.14 - Multi-Provider LiteLLM Adapter
Features
- Renamed the OpenAI adapter to `LitellmAdapter` to reflect multi-provider support
- Added support for Anthropic, Gemini, Bedrock, X.AI, and Azure OpenAI via `litellm`
Deprecations
- `OpenAIAdapterConfig` → `LitellmAdapterConfig` (old name still works with warnings)
- `type: openai` → `type: litellm` in YAML configs (old syntax still works)
- `AdapterType.OPENAI` → `AdapterType.LITELLM` (old enum value still works)
Migration Guide
For new code, update your configs:
```yaml
agent:
  type: litellm  # Changed from: type: openai
```
```python
from atlas.config.models import LitellmAdapterConfig

config = LitellmAdapterConfig(type=AdapterType.LITELLM, ...)
```
Old code continues to work with deprecation warnings.
v0.1.13 - Direct Training Data Access
Python Client for Direct Training Data Access
This release introduces a new Python client for querying training data directly from PostgreSQL, eliminating JSONL export intermediates and preventing schema drift between SDK and ATLAS Core.
Key Features
- Direct Database Queries: Query training sessions without JSONL export steps
- Reward-Based Filtering: Filter sessions using JSONB operators on reward scores
- Selective Data Loading: Control trajectory events and learning data inclusion via flags
- Pagination Support: Process large datasets efficiently with async iterators
- Enterprise-Ready: Works with Docker Postgres and on-premises deployments
New Modules
- `atlas.training_data.client`: Core query functions with async/sync variants
- `atlas.training_data.converters`: Database dict to dataclass conversion
- `atlas.training_data.filters`: SQL WHERE clause builder
- `atlas.training_data.pagination`: Async iterator for batch processing
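The pagination module's async-iterator pattern can be sketched generically; `fetch_page` and the in-memory rows below are illustrative stand-ins, not the SDK's real query functions:

```python
import asyncio

async def paginate(fetch_page, page_size=100):
    """Yield rows batch by batch; a short page signals the end.
    `fetch_page(offset, limit)` stands in for a database query."""
    offset = 0
    while True:
        page = await fetch_page(offset, page_size)
        for row in page:
            yield row
        if len(page) < page_size:
            break
        offset += page_size

# Demo against an in-memory "table" of 250 rows (three pages of work).
ROWS = list(range(250))

async def fake_fetch(offset, limit):
    return ROWS[offset:offset + limit]

async def collect_all():
    return [row async for row in paginate(fake_fetch)]

collected = asyncio.run(collect_all())
```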
Schema Updates
- `AtlasSessionTrace`: Added 6 essential fields (`session_reward`, `trajectory_events`, `student_learning`, `teacher_learning`, `learning_history`, `adaptive_summary`) and 7 property accessors
- `AtlasStepTrace`: Added 2 essential fields (`runtime`, `depends_on`) and 1 property accessor
Performance Improvements
Added database indexes for production workloads:
- Reward filtering: 10-100x faster
- Date range queries: 50-100x faster
- Critical for training workloads with millions of sessions
Install
```shell
pip install arc-atlas==0.1.13
```
Atlas SDK v0.1.12
Progressive Learning & Developer Experience Improvements
This release introduces a streamlined quickstart experience, comprehensive documentation consolidation, and enhanced evaluation tooling for measuring tool adoption and learning effectiveness.
Features
New Atlas Quickstart CLI (#114)
- Added `atlas quickstart` command replacing the old `examples/quickstart.py` script
- Runs an interactive demonstration with 3 security review tasks, metrics visualization, and cost estimation
- Introduced `ATLAS_OFFLINE_MODE=1` for mock LLM responses (deprecates `ATLAS_FAKE_LLM`)
- Improved error handling with troubleshooting guidance for missing API keys and configuration issues
- Zero-friction onboarding for new users
Auto-Validation on Environment Init (#113)
- `atlas env init` now automatically validates discovered artifacts regardless of the `auto_skip` flag
- Removed the confusing `--validate` flag that didn't actually change behavior
- Ensures configuration correctness from the start
Capability Probe Fail-Open (#112)
- Probe now gracefully handles missing API credentials with fail-open behavior
- Defaults to 'paired' mode for zero-config operation when probe unavailable
- Removed deprecated 'escalate' mode from all code paths
- Fixed default LLM consistency: uses `xai/grok-4-fast` throughout
- More resilient runtime with better error messages
Enhanced Tool Adoption Telemetry
- Added comprehensive tool adoption validation framework with `scripts/validate_tool_adoption.py`
- New evaluation configs for Claude and OpenAI models in `configs/eval/learning/`
- Added digest statistics validation with `scripts/validate_digest_stats.py`
- Improved telemetry tracking in the Student persona with tool call recording
- Reorganized eval scripts with clearer naming: `benchmark_*`, `validate_*`, `report_*`, `collect_*`
Improvements
Documentation Consolidation (#115, #116)
- Removed documentation redundancy across codebase
- Established canonical sources: `quickstart.mdx`, `introduction.mdx`, `configuration.md`
- Removed all `--offline` flags from quickstart examples (promotes real execution)
- Featured `mcp_tool_learning` as the primary production-ready example throughout the docs
- Fixed relative paths in the PyPI readme for correct GitHub rendering
Examples Cleanup (#116)
- Streamlined the examples folder from 10+ files to 1 production-ready example (`mcp_tool_learning`)
- Removed low-value examples: `python_example.py`, `http_example.py`, `agent.py`
- Removed duplicate `triage_adapters` folder
- Moved `langgraph_adapter.py` to test fixtures (test utility only)
- Net reduction of 302 lines while improving clarity
- Added `README.md` documenting the naming convention pattern
Evaluation Tooling Organization
- Reorganized `configs/eval/` with a clear structure: `baseline/`, `learning/`, `reward/`
- Added comprehensive `configs/eval/README.md` and `scripts/README.md` documentation
- Renamed scripts for clarity and consistency
- Added comprehensive test coverage for tool adoption validation
Installation
```shell
pip install --upgrade arc-atlas
```
Full Changelog: v0.1.11...v0.1.12
Atlas SDK v0.1.11
Learning System Enhancements & Async Support
This release introduces transfer-level classification for learning evaluation and enables Atlas to run in async contexts.
Features
Transfer Level Classification (#106)
- Added 4-tier transfer hierarchy: task/domain/workflow/universal
- Evaluation reports now show transfer classification (e.g., "transfer no (task)")
- Token efficiency guidance added to synthesis prompt
- Rubric weights documented in learning_eval.md
Async Context Support (#105)
- Exported `arun()` for async execution (fixes #104)
- Enables learning harnesses, notebooks, and async frameworks
- Added comprehensive MCP tool learning example (25 progressive tasks)
Improvements
Learning System
- Gemini 2.5 Flash as default synthesizer
- Enhanced 7-step synthesis process with quality gates
- Fixed adoption tracking metadata recording
Terminology
- Replaced "RIM" with "reward system" for clarity
- Console output now shows "Judge scores" instead of "RIM scores"
Installation
```shell
pip install --upgrade arc-atlas
```
Full Changelog: v0.1.10...v0.1.11
v0.1.10
Zero-Config Atlas Run
This release enables developers to run `atlas run` immediately after `atlas env init` without requiring environment variables.
Key Changes
Validation Marker Workflow
- Replaces the `ATLAS_DISCOVERY_VALIDATE=1` requirement with a persistent `.validated` marker file
- Marker written automatically during `atlas env init`, even when validation is deferred
- Runtime factories check for the marker file at instantiation
- Emergency bypass available via `ATLAS_SKIP_VALIDATION=1`
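A sketch of the marker workflow under these assumptions (the real check lives inside the SDK's runtime factories; `check_validated` is a hypothetical name):

```python
import os
import tempfile
from pathlib import Path

def check_validated(atlas_dir: Path) -> None:
    """Refuse to proceed unless the `.validated` marker exists,
    honoring the ATLAS_SKIP_VALIDATION=1 emergency bypass."""
    if os.environ.get("ATLAS_SKIP_VALIDATION") == "1":
        return  # emergency bypass
    if not (atlas_dir / ".validated").exists():
        raise RuntimeError(
            "Config not validated; re-run `atlas env init` "
            "or set ATLAS_SKIP_VALIDATION=1 to bypass."
        )

# Demo: a missing marker blocks instantiation; touching it clears the check.
workdir = Path(tempfile.mkdtemp())
try:
    check_validated(workdir)
    blocked = False
except RuntimeError:
    blocked = True
(workdir / ".validated").touch()
check_validated(workdir)  # passes silently now
```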
Bug Fixes
- Fixed HumanMessage format compatibility for LangGraph/DeepAgents (#101)
- Fixed import shadowing causing infinite recursion in generated factories
- Optimized IntermediateStepManager caching to prevent duplicate instantiation
Developer Experience
Before:
```shell
atlas env init
ATLAS_DISCOVERY_VALIDATE=1 atlas run --config .atlas/generated_config.yaml --task "..."
```
After:
```shell
atlas env init
atlas run --config .atlas/generated_config.yaml --task "..."
```
Installation
```shell
pip install --upgrade arc-atlas
```
Full Changelog: v0.1.9...v0.1.10
Atlas SDK v0.1.9
Atlas SDK v0.1.9 — 2025-10-30
Highlights
- Repo-aware autodiscovery: `atlas env init` now analyzes your codebase to generate configs that mirror your agent's actual prompts, tools, LLM settings, and control flow, eliminating manual configuration for existing projects. Discovery now runs a real session during init, capturing telemetry and saving runtime config to `.atlas/runs/` by default.
- Learning evaluation foundation: Complete end-to-end evaluation pipeline with policy quality metrics, lifecycle tracking, usage instrumentation, and reproducible baseline experiments for prompt and model variants.
- Provider-aware metadata management: OpenAI adapter now enforces budget constraints and projects execution metadata within configurable token limits, preventing context overflow. See configuration reference below for tuning knobs.
- Critical pip install fix: Bundled config templates now ship in the wheel, ensuring `atlas env init` produces runnable configs without manual fixes.
- Polished CLI experience: Streamlined learning summaries, dual-pass telemetry reporting, and improved warning management for cleaner output. The quickstart now documents warning suppression and dual-pass telemetry expectations.
Detailed Changelog
Autodiscovery & Configuration Synthesis
#90 — Repo-aware config generation
- Extended autodiscovery to harvest project metadata (prompts, tools, config dictionaries, constructor defaults, factory helpers) directly from source code using AST, regex, and ripgrep analysis.
- Enhanced candidate scoring to prioritize concrete implementations over abstract base classes by detecting `NotImplementedError` stubs, counting prompt/tool literals, and tracking instantiation patterns across the repository.
- Generates runtime configs that automatically wire to your project's existing factories, system prompts, and LLM configurations instead of generic templates.
- Persists signal maps in `discover.json` showing why each candidate was selected and what runtime wiring exists in the repo.
- Broadens discovery beyond classes to include module-level callables, assignment factories, and `pyproject.toml` entry points, with framework-specific keyword boosting for LangGraph, LangChain, CrewAI, Semantic Kernel, and DeepAgents.
- Improved LLM candidate extraction with recursive static analysis, deduplication, and normalized tool entry merging for more accurate agent config synthesis.
- Increased model output limits (16k for Claude Haiku and Gemini 2.5 Flash) to accommodate richer metadata in a single synthesis round.
- Changed default behavior: `run_discovery=True` now executes sample loops automatically during init, capturing telemetry and runtime config by default. Adapter call order changed to prioritize `act(observation, ...)` first for SecRL-style agent compatibility.
- Result: Running `atlas env init` now produces two repo-aware artifacts: a factory module wrapping your environment/agent and a config file reflecting your project's actual implementation. Developers only need to tweak edge cases instead of rebuilding configs manually.
#88 — Bundle config templates in wheel
- Fixed missing template issue that caused pip installations to fail generating runnable configs.
- Packaged the baseline template under `atlas.templates` and load it via `importlib.resources`, maintaining a repo fallback for local development.
- Aligned example configs with packaged templates and removed the stale `active_judges` toggle.
- Impact: `atlas env init --scaffold-config-full` now works correctly for pip users without manual configuration fixes.
Learning & Evaluation Infrastructure
#86 — Learning evaluation foundation
- Implemented end-to-end learning evaluation pipeline with policy quality, lifecycle, usage, and efficiency metrics.
- Added playbook entry schema enforcement with rubric gating, provenance tracking, and runtime usage instrumentation throughout the learning pipeline (synthesizer, personas, runtime wiring).
- Extended the evaluation harness (`scripts/eval_learning.py`, `atlas/evaluation/learning_report.py`) with CLI flags for prompt and model experiments, filtering by project/task/tags, and model breakdown analytics.
- Renamed learning nuggets to playbook entries for clearer conceptual alignment with policy-driven learning.
- Added playbook impact instrumentation to track how learned policies influence runtime behavior.
- Shipped experiment-ready configs (`configs/eval/learning_baseline.yaml`, `learning_claude.yaml`, `learning_scope_shift.yaml`) enabling reproducible baseline runs and systematic comparison of prompt/LLM variants.
- Enhanced learning summary output with better formatting, dual-pass telemetry reporting, and reusable summary helpers.
- Developer Experience: Complete evaluation toolkit for auditing learning behavior, comparing synthesis strategies, and measuring policy effectiveness across different model configurations.
#98 — Polish learning summary CLI
- Streamlined README onboarding flow around autodiscovery with explicit PostgreSQL requirement documentation.
- Hardened env config synthesis to tolerate missing capability metadata while preserving learning and runtime safety blocks.
- Made `StepwiseAgentAdapter` more defensive when callables drop keyword argument support.
- Reworked the quickstart to highlight dual-pass telemetry, reuse the learning summary helper, and suppress noisy warnings for cleaner CLI output.
- Impact: Cleaner, more resilient CLI experience with better error handling and focused output.
Provider Integration & Metadata Management
#94 — Provider-aware metadata digest
- Added OpenAI adapter digest helper that projects execution metadata and enforces provider-specific budget constraints.
- Derive default character budgets from provider context windows (approximately 10% using 4 chars/token) with simple override support.
- Expose digest statistics and budget utilization in payloads and logs with new config knobs for fine-tuning.
- Instrumented metadata digest utilization tracking and fixed size accounting for accurate budget enforcement.
- Developer Experience: Prevents context overflow by automatically managing metadata size within provider limits while maintaining visibility into budget consumption.
Configuration Reference
The OpenAI adapter metadata digest can be configured via the `metadata_digest` block in your adapter config:
```yaml
metadata_digest:
  enabled: true                      # Enable/disable digest projection
  char_budget: 50000                 # Global character budget (overrides provider defaults)
  provider_char_budgets:             # Per-provider overrides
    openai: 50000
    anthropic: 80000
  max_plan_steps: 5                  # Max plan steps to include (0-20)
  max_step_summaries: 5              # Max step summaries to include (0-20)
  max_learning_history_entries: 3    # Max learning entries to include (0-10)
  max_reward_audit_entries: 3        # Max reward audit entries (0-10)
  max_prompt_rewrite_chars: 2000     # Max chars for prompt rewrites (256-20000)
  max_section_chars: 4000            # Max chars per metadata section (512-20000)
  max_string_chars: 1000             # Max chars per string value (128-4000)
```
Default budgets are derived as ~10% of the provider context window × 4 chars/token.
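That derivation is simple arithmetic; as a sketch (the function name is illustrative, not part of the SDK):

```python
def default_char_budget(context_window_tokens: int) -> int:
    """~10% of the provider context window, at ~4 chars per token."""
    return int(context_window_tokens * 0.10 * 4)

# A 128k-token context window yields a 51,200-character default budget.
budget = default_char_budget(128_000)
```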
Evaluation & Testing Improvements
#79 — Reward evaluation enhancements
- Added `configs/eval/reward_system.yaml` for config-driven judge presets and combinations without code changes.
- Extended `scripts/eval_reward_models.py` with a `--collect-audit` flag, safe audit payload serialization, and automatic Markdown report generation alongside JSON output.
- Fixed the reward eval stub to properly handle the audit flag, ensuring audit payload size guards work correctly.
- Updated documentation in `docs/reward_eval.md` for new flags and the config-driven workflow.
- Research Impact: Streamlined reward model evaluation with better auditability and configurable judge combinations.
Documentation & Developer Experience
- Explained quickstart warning suppression patterns for cleaner first-run experience.
- Streamlined README onboarding around autodiscovery workflow with clearer PostgreSQL setup instructions.
- Updated learning evaluation documentation showing new schema, gates, and baseline config usage.
- Enhanced reward evaluation docs with audit workflow and config-driven presets.
- Added experiment design notes in `docs/rfc/2025-10-29-learning-eval-foundation.md`.
Testing & Quality
- Added comprehensive test coverage for learning evaluation (`tests/unit/evaluation/test_learning_report.py`, `tests/unit/test_learning_usage.py`).
- Added prompt digest and OpenAI adapter test coverage (`tests/unit/connectors/test_prompt_digest.py`).
- Added CLI summary test coverage (`tests/unit/cli/test_run_summary.py`).
- Enhanced learning synthesizer tests with playbook entry validation.
- Fixed reward eval stub compatibility with audit flag.
Migration Notes
The following changes may require configuration updates:
Factory Synthesis LLM Provider
The factory synthesis flow now defaults to Gemini 2.5 Flash only. If you rely on OpenAI or Anthropic models for autodiscovery, you must export `GEMINI_API_KEY` or modify `atlas/sdk/factory_synthesis.py` to reintroduce other providers.
Evaluation Config Renaming
Learning evaluation configs have been renamed for clarity:
- `learning_overhaul_base.yaml` → `learning_baseline.yaml`
- `learning_overhaul_*.yaml` → `learning_*.yaml`
Update automation scripts referencing the old names.
Terminology Updates
Learning nuggets are now **playbook entries**.
Atlas SDK v0.1.8
Atlas SDK v0.1.8 — 2025-10-23
Highlights
- Autodiscovery-first CLI with `.env`/`PYTHONPATH` bootstrapping, config replays, and fake-LLM smoke tests.
- Learning playbooks injected into every runtime prompt with hash-based cache safety.
- Persistent telemetry plus learning reports with model-level analytics and flexible filters.
- Session export guardrails requiring approval, complete with drift alerts and review tooling.
- Expanded evaluation suites (capability probe, dual-agent runtime, reward models) and an end-to-end `atlas train` workflow.
Detailed Changelog
- #76 – Learning playbook resolution, prompt injection, cache-key hashing, and config toggle for Student/Teacher personas.
- #75 – CLI scaffolding upgrades: `.env` loading, `PYTHONPATH` fixes, provider/model autodiscovery, fake LLM mode, and env-ready integration tests.
- #74 – `atlas run --config`, stronger adapter registration, LangChain serialization fixes, and CLI metadata/type-safety cleanup.
- #73 – Learning evaluation harness with filtering by project/task/tags, model breakdown analytics, window specs, async summary generation, and extra session metadata.
- #72 – Postgres persistence for discovery/runtime telemetry, hint-less learning evaluation workflow, report utilities, and database API extensions.
- #70 – Autodiscovery onboarding flow, new runtime execution path, learning synthesizer config/state management, and orchestration improvements.
- #63 – Session review/approval workflow, export filters by review status, CLI moderation commands, and documentation for production guardrails.
- #55 – Capability probe refresh with xAI Grok provider support plus packaged evaluation datasets and docs.
- #56 – Dual-agent runtime evaluation harness, synthetic runtime dataset, reference documentation, and unit tests.
- #57 – Reward model evaluation harness, trajectory dataset, packaging updates, and documentation.
- #54 – Learning history limits, payload trimming, streak metrics, documentation, and unit coverage.
- #52 – `atlas train` CLI, reusable export filters, sample datasets, dotenv auto-load, and workflow documentation.
v0.1.7
What's Changed
- Learning history migration by @aman-jaglan
- Runtime efficiency improvements by @aman-jaglan
- Updated comparison visualizations by @aman-jaglan