Releases: Arc-Computer/atlas-sdk
v0.1.16: Zero-Friction Onboarding and KV Cache Optimization
Overview
This release delivers major improvements to developer experience and performance optimization. The onboarding time drops from 30+ minutes to under 3 minutes with zero manual interventions, and new playbook injection modes enable KV cache optimization for reduced latency and token costs.
Zero-Friction Onboarding (PR #146)
Complete refactor of atlas env init to achieve "one API key to training data in 5 minutes."
What Changed
LLM-Driven Agent Selection
- Intelligent candidate selection using Claude Haiku 4.5
- Auto-selects with confidence > 0.85 (no user prompts required)
- Shows reasoning for lower confidence selections
- Graceful fallback to heuristic ranking if API unavailable
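The selection flow above can be sketched as a small decision function; the `Candidate` class and threshold handling here are illustrative, not the SDK's actual implementation:

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    confidence: float
    reasoning: str

def select_candidate(candidates, llm_available=True, threshold=0.85):
    """Auto-select above the confidence threshold, show reasoning
    below it, and fall back to heuristic ranking without the LLM."""
    if not candidates:
        return None
    if not llm_available:
        # Graceful fallback: plain heuristic ranking, no prompts.
        return max(candidates, key=lambda c: c.confidence)
    best = max(candidates, key=lambda c: c.confidence)
    if best.confidence > threshold:
        return best  # auto-select, no user prompt required
    # Lower confidence: surface the reasoning before proceeding.
    print(f"Selected {best.name} ({best.confidence:.2f}): {best.reasoning}")
    return best
```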
Anthropic-Only Defaults
- Student: Claude Haiku 4.5 (claude-haiku-4-5-20251001)
- Teacher: Claude Sonnet 4.5 (claude-sonnet-4-5-20250929)
- Based on runtime evaluation showing 0.989 reward score, 20.08s latency
- Learning features enabled by default (few-shot prompting, playbook injection)
Integrated Storage Setup
- Folds `atlas init` into the `env init` flow
- Auto-detects running PostgreSQL with connection validation
- Prompts to start Docker Compose if not running
- Validates database connectivity before claiming success
Improved Developer Experience
- Database connection validation prevents cryptic first-run errors
- Error messages follow Fix/Debug/Learn more pattern
- Positive framing emphasizes what works vs what is missing
- All documentation links verified working
Results
| Metric | Target | Actual |
|---|---|---|
| Time to first run | < 5 min | ~3 min |
| Manual interventions | 0 | 0 |
| Learning enabled | 100% | 100% |
New Files
- `atlas/cli/candidate_selection.py` - LLM-based agent selection with fallback
- `atlas/cli/config_defaults.py` - Anthropic configuration templates
- `atlas/cli/progress.py` - Output formatting utilities
- `atlas/cli/education.py` - Mental model education (not yet integrated)
- `tests/integration/test_env_init_anthropic.py` - Integration tests
KV Cache Optimization (PR #144)
Configurable playbook injection modes optimize LLM provider KV cache usage for improved latency and reduced costs.
What Changed
Simplified Injection API
- Removed the confusing `separate_message` mode
- Two clear modes:
  - `prefix` (default): playbook injected before the system prompt
  - `suffix`: playbook injected after the system prompt (enables KV cache reuse)
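Assuming the injection mode is set in the config's learning block (the key names below are illustrative, not the SDK's documented schema), switching modes might look like:

```yaml
learning:
  playbook_injection: suffix   # default is "prefix"; "suffix" keeps the
                               # system prompt static so the provider can
                               # reuse its KV cache across requests
```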
Critical Bug Fixes
- Fixed a learning persistence bug where an empty dict blocked persistence
- Disabled actionability gate in evaluation configs to allow cognitive learnings
Performance Benefits
- Suffix mode preserves static system prompt for KV cache reuse
- Reduced latency for subsequent requests (cached prefix not reprocessed)
- Lower token costs (providers charge less for cached tokens)
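A minimal sketch of why suffix mode helps (this is not the SDK's actual message assembly): in suffix mode the system prompt's leading bytes stay identical across runs, which is the property provider-side KV caches key on:

```python
def build_messages(system_prompt: str, playbook: str, task: str, mode: str = "prefix"):
    """Sketch of the two injection modes. In suffix mode the static
    system prompt comes first, so a KV cache built for that prefix
    survives even as the playbook evolves between sessions."""
    if mode == "prefix":
        system = f"{playbook}\n\n{system_prompt}"  # prefix changes whenever learning updates
    else:  # "suffix"
        system = f"{system_prompt}\n\n{playbook}"  # static prompt first; cached prefix survives
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": task},
    ]

# Two runs with different playbooks still share the same leading bytes in suffix mode.
a = build_messages("You are a security reviewer.", "playbook v1", "audit auth", mode="suffix")
b = build_messages("You are a security reviewer.", "playbook v2", "audit auth", mode="suffix")
shared_prefix = (a[0]["content"].startswith("You are a security reviewer.")
                 and b[0]["content"].startswith("You are a security reviewer."))
```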
Testing
Unit test suite added with 13 tests covering:
- Prefix/suffix injection modes
- Few-shot example extraction and formatting
- Redaction pattern validation
End-to-end verification with OpenAI GPT-4o-mini:
- First run: Generated 7 playbook entries (1,076 chars)
- Second run: Loaded and injected learning (+3,126 prompt tokens from playbook)
Dependencies
- Added `anthropic>=0.74.1` for LLM-based candidate selection
Breaking Changes
None. All changes are backward compatible:
- Heuristic fallback preserves original env init behavior
- Default playbook injection mode remains `prefix`
- Existing flags and options preserved
Installation
```shell
pip install --upgrade arc-atlas
```
Documentation
- Updated README.md with new quickstart flow
- Updated docs/sdk/quickstart.mdx and docs/guides/introduction.mdx
- Updated docs/sdk/learning_tracking.md with injection mode documentation
Contributors
Generated with Claude Code
Co-Authored-By: Claude noreply@anthropic.com
Version 0.1.15 - Learning Synthesis Fixes and Structured Payloads
Version 0.1.15 (2025-11-12)
🐛 Bug Fixes
- Fixed learning synthesis not triggering when `session_reward` is set via orchestrator (#138)
  - Ensures proper `ExecutionContext` access after orchestrator completion
  - Adds defensive initialization checks for the metadata property
- Fixed Gemini learning synthesis reliability (#139)
- Implements structured outputs with JSON Schema validation for Gemini models
- Improves error handling with clearer messages
- Reduces silent failures in learning synthesis
- Fixed discovery experience issues
- Resolves synthesis limits and module-level instance support problems
- Fixed validation message visibility
- Validation success/failure messages now always shown
- Fixed .env variable loading
- Ensures .env variables are applied to os.environ for LLM synthesis
✨ New Features
- Structured payloads for BYOA adapters (#137)
  - Opt-in support for structured `task_payload` and `step_payload` metadata
  - Enables integration with test harnesses and simulation environments
  - Python adapters opt in automatically; LLM adapters remain opted out by default
  - See documentation for details
🔧 Improvements
- Refactored env.py modules for improved maintainability
  - Extracted LLM inference utilities to `atlas/sdk/llm_inference.py`
  - Extracted factory builders to `atlas/sdk/factory_synthesis.py`
  - Extracted shared types to `atlas/cli/env_types.py`
  - Reduced code duplication by ~623 lines
📚 Documentation
- Updated README with badges and development section
- Added structured adapter payloads documentation
- Clarified storage requirements for rewards and learning
- Added training pipeline documentation links
0.1.14 - Multi-Provider LiteLLM Adapter
Features
- Renamed the OpenAI adapter to `LitellmAdapter` to reflect multi-provider support
- Added support for Anthropic, Gemini, Bedrock, X.AI, and Azure OpenAI via `litellm`
Deprecations
- `OpenAIAdapterConfig` → `LitellmAdapterConfig` (old name still works with warnings)
- `type: openai` → `type: litellm` in YAML configs (old syntax still works)
- `AdapterType.OPENAI` → `AdapterType.LITELLM` (old enum value still works)
Migration Guide
For new code, update your configs:
```yaml
agent:
  type: litellm  # Changed from: type: openai
```
```python
from atlas.config.models import LitellmAdapterConfig

config = LitellmAdapterConfig(type=AdapterType.LITELLM, ...)
```
Old code continues to work with deprecation warnings.
v0.1.13 - Direct Training Data Access
Python Client for Direct Training Data Access
This release introduces a new Python client for querying training data directly from PostgreSQL, eliminating JSONL export intermediates and preventing schema drift between SDK and ATLAS Core.
Key Features
- Direct Database Queries: Query training sessions without JSONL export steps
- Reward-Based Filtering: Filter sessions using JSONB operators on reward scores
- Selective Data Loading: Control trajectory events and learning data inclusion via flags
- Pagination Support: Process large datasets efficiently with async iterators
- Enterprise-Ready: Works with Docker Postgres and on-premises deployments
New Modules
- `atlas.training_data.client`: Core query functions with async/sync variants
- `atlas.training_data.converters`: Database dict to dataclass conversion
- `atlas.training_data.filters`: SQL WHERE clause builder
- `atlas.training_data.pagination`: Async iterator for batch processing
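The pagination module's async-iterator pattern can be sketched generically; `fetch_page` and the in-memory rows below are illustrative stand-ins, not the SDK's real query functions:

```python
import asyncio

async def paginate(fetch_page, page_size=100):
    """Yield rows batch by batch; a short page signals the end.
    `fetch_page(offset, limit)` stands in for a database query."""
    offset = 0
    while True:
        page = await fetch_page(offset, page_size)
        for row in page:
            yield row
        if len(page) < page_size:
            break
        offset += page_size

# Demo against an in-memory "table" of 250 rows (three pages of work).
ROWS = list(range(250))

async def fake_fetch(offset, limit):
    return ROWS[offset:offset + limit]

async def collect_all():
    return [row async for row in paginate(fake_fetch)]

collected = asyncio.run(collect_all())
```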
Schema Updates
- `AtlasSessionTrace`: Added 6 essential fields (`session_reward`, `trajectory_events`, `student_learning`, `teacher_learning`, `learning_history`, `adaptive_summary`) and 7 property accessors
- `AtlasStepTrace`: Added 2 essential fields (`runtime`, `depends_on`) and 1 property accessor
Performance Improvements
Added database indexes for production workloads:
- Reward filtering: 10-100x faster
- Date range queries: 50-100x faster
- Critical for training workloads with millions of sessions
Install
```shell
pip install arc-atlas==0.1.13
```
Atlas SDK v0.1.12
Progressive Learning & Developer Experience Improvements
This release introduces a streamlined quickstart experience, comprehensive documentation consolidation, and enhanced evaluation tooling for measuring tool adoption and learning effectiveness.
Features
New Atlas Quickstart CLI (#114)
- Added `atlas quickstart` command replacing the old `examples/quickstart.py` script
- Runs an interactive demonstration with 3 security review tasks, metrics visualization, and cost estimation
- Introduced `ATLAS_OFFLINE_MODE=1` for mock LLM responses (deprecates `ATLAS_FAKE_LLM`)
- Improved error handling with troubleshooting guidance for missing API keys and configuration issues
- Zero-friction onboarding for new users
Auto-Validation on Environment Init (#113)
- `atlas env init` now automatically validates discovered artifacts regardless of the `auto_skip` flag
- Removed the confusing `--validate` flag that didn't actually change behavior
- Ensures configuration correctness from the start
Capability Probe Fail-Open (#112)
- Probe now gracefully handles missing API credentials with fail-open behavior
- Defaults to 'paired' mode for zero-config operation when probe unavailable
- Removed deprecated 'escalate' mode from all code paths
- Fixed default LLM consistency: uses `xai/grok-4-fast` throughout
- More resilient runtime with better error messages
Enhanced Tool Adoption Telemetry
- Added comprehensive tool adoption validation framework with `scripts/validate_tool_adoption.py`
- New evaluation configs for Claude and OpenAI models in `configs/eval/learning/`
- Added digest statistics validation with `scripts/validate_digest_stats.py`
- Improved telemetry tracking in the Student persona with tool call recording
- Reorganized eval scripts with clearer naming: `benchmark_*`, `validate_*`, `report_*`, `collect_*`
Improvements
Documentation Consolidation (#115, #116)
- Removed documentation redundancy across codebase
- Established canonical sources: `quickstart.mdx`, `introduction.mdx`, `configuration.md`
- Removed all `--offline` flags from quickstart examples (promotes real execution)
- Featured `mcp_tool_learning` as the primary production-ready example throughout the docs
- Fixed relative paths in the PyPI readme for correct GitHub rendering
Examples Cleanup (#116)
- Streamlined the examples folder from 10+ files to 1 production-ready example (`mcp_tool_learning`)
- Removed low-value examples: `python_example.py`, `http_example.py`, `agent.py`
- Removed duplicate `triage_adapters` folder
- Moved `langgraph_adapter.py` to test fixtures (test utility only)
- Net reduction of 302 lines while improving clarity
- Added `README.md` documenting the naming convention pattern
Evaluation Tooling Organization
- Reorganized `configs/eval/` with a clear structure: `baseline/`, `learning/`, `reward/`
- Added comprehensive `configs/eval/README.md` and `scripts/README.md` documentation
- Renamed scripts for clarity and consistency
- Added comprehensive test coverage for tool adoption validation
Installation
```shell
pip install --upgrade arc-atlas
```
Full Changelog: v0.1.11...v0.1.12
Atlas SDK v0.1.11
Learning System Enhancements & Async Support
This release introduces transfer-level classification for learning evaluation and enables Atlas to run in async contexts.
Features
Transfer Level Classification (#106)
- Added 4-tier transfer hierarchy: task/domain/workflow/universal
- Evaluation reports now show transfer classification (e.g., "transfer no (task)")
- Token efficiency guidance added to synthesis prompt
- Rubric weights documented in learning_eval.md
Async Context Support (#105)
- Exported `arun()` for async execution (fixes #104)
- Enables learning harnesses, notebooks, and async frameworks
- Added comprehensive MCP tool learning example (25 progressive tasks)
Improvements
Learning System
- Gemini 2.5 Flash as default synthesizer
- Enhanced 7-step synthesis process with quality gates
- Fixed adoption tracking metadata recording
Terminology
- Replaced "RIM" with "reward system" for clarity
- Console output now shows "Judge scores" instead of "RIM scores"
Installation
```shell
pip install --upgrade arc-atlas
```
Full Changelog: v0.1.10...v0.1.11
v0.1.10
Zero-Config Atlas Run
This release enables developers to run `atlas run` immediately after `atlas env init` without requiring environment variables.
Key Changes
Validation Marker Workflow
- Replaces the `ATLAS_DISCOVERY_VALIDATE=1` requirement with a persistent `.validated` marker file
- Marker written automatically during `atlas env init`, even when validation is deferred
- Runtime factories check for the marker file at instantiation
- Emergency bypass available via `ATLAS_SKIP_VALIDATION=1`
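A sketch of the marker workflow under these assumptions (the real check lives inside the SDK's runtime factories; `check_validated` is a hypothetical name):

```python
import os
import tempfile
from pathlib import Path

def check_validated(atlas_dir: Path) -> None:
    """Refuse to proceed unless the `.validated` marker exists,
    honoring the ATLAS_SKIP_VALIDATION=1 emergency bypass."""
    if os.environ.get("ATLAS_SKIP_VALIDATION") == "1":
        return  # emergency bypass
    if not (atlas_dir / ".validated").exists():
        raise RuntimeError(
            "Config not validated; re-run `atlas env init` "
            "or set ATLAS_SKIP_VALIDATION=1 to bypass."
        )

# Demo: a missing marker blocks instantiation; touching it clears the check.
workdir = Path(tempfile.mkdtemp())
try:
    check_validated(workdir)
    blocked = False
except RuntimeError:
    blocked = True
(workdir / ".validated").touch()
check_validated(workdir)  # passes silently now
```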
Bug Fixes
- Fixed HumanMessage format compatibility for LangGraph/DeepAgents (#101)
- Fixed import shadowing causing infinite recursion in generated factories
- Optimized IntermediateStepManager caching to prevent duplicate instantiation
Developer Experience
Before:
```shell
atlas env init
ATLAS_DISCOVERY_VALIDATE=1 atlas run --config .atlas/generated_config.yaml --task "..."
```
After:
```shell
atlas env init
atlas run --config .atlas/generated_config.yaml --task "..."
```
Installation
```shell
pip install --upgrade arc-atlas
```
Full Changelog: v0.1.9...v0.1.10
Atlas SDK v0.1.9
Atlas SDK v0.1.9 — 2025-10-30
Highlights
- Repo-aware autodiscovery: `atlas env init` now analyzes your codebase to generate configs that mirror your agent's actual prompts, tools, LLM settings, and control flow, eliminating manual configuration for existing projects. Discovery now runs a real session during init, capturing telemetry and saving runtime config to `.atlas/runs/` by default.
- Learning evaluation foundation: Complete end-to-end evaluation pipeline with policy quality metrics, lifecycle tracking, usage instrumentation, and reproducible baseline experiments for prompt and model variants.
- Provider-aware metadata management: OpenAI adapter now enforces budget constraints and projects execution metadata within configurable token limits, preventing context overflow. See configuration reference below for tuning knobs.
- Critical pip install fix: Bundled config templates now ship in the wheel, ensuring `atlas env init` produces runnable configs without manual fixes.
- Polished CLI experience: Streamlined learning summaries, dual-pass telemetry reporting, and improved warning management for cleaner output. The quickstart now documents warning suppression and dual-pass telemetry expectations.
Detailed Changelog
Autodiscovery & Configuration Synthesis
#90 — Repo-aware config generation
- Extended autodiscovery to harvest project metadata (prompts, tools, config dictionaries, constructor defaults, factory helpers) directly from source code using AST, regex, and ripgrep analysis.
- Enhanced candidate scoring to prioritize concrete implementations over abstract base classes by detecting `NotImplementedError` stubs, counting prompt/tool literals, and tracking instantiation patterns across the repository.
- Generates runtime configs that automatically wire to your project's existing factories, system prompts, and LLM configurations instead of generic templates.
- Persists signal maps in `discover.json` showing why each candidate was selected and what runtime wiring exists in the repo.
- Broadens discovery beyond classes to include module-level callables, assignment factories, and `pyproject.toml` entry points, with framework-specific keyword boosting for LangGraph, LangChain, CrewAI, Semantic Kernel, and DeepAgents.
- Improved LLM candidate extraction with recursive static analysis, deduplication, and normalized tool entry merging for more accurate agent config synthesis.
- Increased model output limits (16k for Claude Haiku and Gemini 2.5 Flash) to accommodate richer metadata in a single synthesis round.
- Changed default behavior: `run_discovery=True` now executes sample loops automatically during init, capturing telemetry and runtime config by default. Adapter call order changed to prioritize `act(observation, ...)` first for SecRL-style agent compatibility.
- Result: Running `atlas env init` now produces two repo-aware artifacts: a factory module wrapping your environment/agent and a config file reflecting your project's actual implementation. Developers only need to tweak edge cases instead of rebuilding configs manually.
#88 — Bundle config templates in wheel
- Fixed missing template issue that caused pip installations to fail generating runnable configs.
- Packaged the baseline template under `atlas.templates` and load it via `importlib.resources`, maintaining a repo fallback for local development.
- Aligned example configs with packaged templates and removed the stale `active_judges` toggle.
- Impact: `atlas env init --scaffold-config-full` now works correctly for pip users without manual configuration fixes.
Learning & Evaluation Infrastructure
#86 — Learning evaluation foundation
- Implemented end-to-end learning evaluation pipeline with policy quality, lifecycle, usage, and efficiency metrics.
- Added playbook entry schema enforcement with rubric gating, provenance tracking, and runtime usage instrumentation throughout the learning pipeline (synthesizer, personas, runtime wiring).
- Extended the evaluation harness (`scripts/eval_learning.py`, `atlas/evaluation/learning_report.py`) with CLI flags for prompt and model experiments, filtering by project/task/tags, and model breakdown analytics.
- Renamed learning nuggets to playbook entries for clearer conceptual alignment with policy-driven learning.
- Added playbook impact instrumentation to track how learned policies influence runtime behavior.
- Shipped experiment-ready configs (`configs/eval/learning_baseline.yaml`, `learning_claude.yaml`, `learning_scope_shift.yaml`) enabling reproducible baseline runs and systematic comparison of prompt/LLM variants.
- Enhanced learning summary output with better formatting, dual-pass telemetry reporting, and reusable summary helpers.
- Developer Experience: Complete evaluation toolkit for auditing learning behavior, comparing synthesis strategies, and measuring policy effectiveness across different model configurations.
#98 — Polish learning summary CLI
- Streamlined README onboarding flow around autodiscovery with explicit PostgreSQL requirement documentation.
- Hardened env config synthesis to tolerate missing capability metadata while preserving learning and runtime safety blocks.
- Made `StepwiseAgentAdapter` more defensive when callables drop keyword argument support.
- Reworked the quickstart to highlight dual-pass telemetry, reuse the learning summary helper, and suppress noisy warnings for cleaner CLI output.
- Impact: Cleaner, more resilient CLI experience with better error handling and focused output.
Provider Integration & Metadata Management
#94 — Provider-aware metadata digest
- Added OpenAI adapter digest helper that projects execution metadata and enforces provider-specific budget constraints.
- Derive default character budgets from provider context windows (approximately 10% using 4 chars/token) with simple override support.
- Expose digest statistics and budget utilization in payloads and logs with new config knobs for fine-tuning.
- Instrumented metadata digest utilization tracking and fixed size accounting for accurate budget enforcement.
- Developer Experience: Prevents context overflow by automatically managing metadata size within provider limits while maintaining visibility into budget consumption.
Configuration Reference
The OpenAI adapter metadata digest can be configured via the `metadata_digest` block in your adapter config:
```yaml
metadata_digest:
  enabled: true                      # Enable/disable digest projection
  char_budget: 50000                 # Global character budget (overrides provider defaults)
  provider_char_budgets:             # Per-provider overrides
    openai: 50000
    anthropic: 80000
  max_plan_steps: 5                  # Max plan steps to include (0-20)
  max_step_summaries: 5              # Max step summaries to include (0-20)
  max_learning_history_entries: 3    # Max learning entries to include (0-10)
  max_reward_audit_entries: 3        # Max reward audit entries (0-10)
  max_prompt_rewrite_chars: 2000     # Max chars for prompt rewrites (256-20000)
  max_section_chars: 4000            # Max chars per metadata section (512-20000)
  max_string_chars: 1000             # Max chars per string value (128-4000)
```
Default budgets are derived as ~10% of the provider context window × 4 chars/token.
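That derivation is simple arithmetic; as a sketch (the function name is illustrative, not part of the SDK):

```python
def default_char_budget(context_window_tokens: int) -> int:
    """~10% of the provider context window, at ~4 chars per token."""
    return int(context_window_tokens * 0.10 * 4)

# A 128k-token context window yields a 51,200-character default budget.
budget = default_char_budget(128_000)
```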
Evaluation & Testing Improvements
#79 — Reward evaluation enhancements
- Added `configs/eval/reward_system.yaml` for config-driven judge presets and combinations without code changes.
- Extended `scripts/eval_reward_models.py` with a `--collect-audit` flag, safe audit payload serialization, and automatic Markdown report generation alongside JSON output.
- Fixed the reward eval stub to properly handle the audit flag, ensuring audit payload size guards work correctly.
- Updated documentation in `docs/reward_eval.md` for new flags and the config-driven workflow.
- Research Impact: Streamlined reward model evaluation with better auditability and configurable judge combinations.
Documentation & Developer Experience
- Explained quickstart warning suppression patterns for cleaner first-run experience.
- Streamlined README onboarding around autodiscovery workflow with clearer PostgreSQL setup instructions.
- Updated learning evaluation documentation showing new schema, gates, and baseline config usage.
- Enhanced reward evaluation docs with audit workflow and config-driven presets.
- Added experiment design notes in `docs/rfc/2025-10-29-learning-eval-foundation.md`.
Testing & Quality
- Added comprehensive test coverage for learning evaluation (`tests/unit/evaluation/test_learning_report.py`, `tests/unit/test_learning_usage.py`).
- Added prompt digest and OpenAI adapter test coverage (`tests/unit/connectors/test_prompt_digest.py`).
- Added CLI summary test coverage (`tests/unit/cli/test_run_summary.py`).
- Enhanced learning synthesizer tests with playbook entry validation.
- Fixed reward eval stub compatibility with audit flag.
Migration Notes
The following changes may require configuration updates:
Factory Synthesis LLM Provider
The factory synthesis flow now defaults to Gemini 2.5 Flash only. If you rely on OpenAI or Anthropic models for autodiscovery, you must export `GEMINI_API_KEY` or modify `atlas/sdk/factory_synthesis.py` to reintroduce other providers.
Evaluation Config Renaming
Learning evaluation configs have been renamed for clarity:
- `learning_overhaul_base.yaml` → `learning_baseline.yaml`
- `learning_overhaul_*.yaml` → `learning_*.yaml`
Update automation scripts referencing the old names.
Terminology Updates
Learning nuggets are now **playbook entries**.
Atlas SDK v0.1.8
Atlas SDK v0.1.8 — 2025-10-23
Highlights
- Autodiscovery-first CLI with `.env`/`PYTHONPATH` bootstrapping, config replays, and fake-LLM smoke tests.
- Learning playbooks injected into every runtime prompt with hash-based cache safety.
- Persistent telemetry plus learning reports with model-level analytics and flexible filters.
- Session export guardrails requiring approval, complete with drift alerts and review tooling.
- Expanded evaluation suites (capability probe, dual-agent runtime, reward models) and an end-to-end `atlas train` workflow.
Detailed Changelog
- #76 – Learning playbook resolution, prompt injection, cache-key hashing, and config toggle for Student/Teacher personas.
- #75 – CLI scaffolding upgrades: `.env` loading, `PYTHONPATH` fixes, provider/model autodiscovery, fake LLM mode, and env-ready integration tests.
- #74 – `atlas run --config`, stronger adapter registration, LangChain serialization fixes, and CLI metadata/type-safety cleanup.
- #73 – Learning evaluation harness with filtering by project/task/tags, model breakdown analytics, window specs, async summary generation, and extra session metadata.
- #72 – Postgres persistence for discovery/runtime telemetry, hint-less learning evaluation workflow, report utilities, and database API extensions.
- #70 – Autodiscovery onboarding flow, new runtime execution path, learning synthesizer config/state management, and orchestration improvements.
- #63 – Session review/approval workflow, export filters by review status, CLI moderation commands, and documentation for production guardrails.
- #55 – Capability probe refresh with xAI Grok provider support plus packaged evaluation datasets and docs.
- #56 – Dual-agent runtime evaluation harness, synthetic runtime dataset, reference documentation, and unit tests.
- #57 – Reward model evaluation harness, trajectory dataset, packaging updates, and documentation.
- #54 – Learning history limits, payload trimming, streak metrics, documentation, and unit coverage.
- #52 – `atlas train` CLI, reusable export filters, sample datasets, dotenv auto-load, and workflow documentation.
v0.1.7
What's Changed
- Learning history migration by @aman-jaglan
- Runtime efficiency improvements by @aman-jaglan
- Updated comparison visualizations by @aman-jaglan