 # @stackbilt/llm-providers
 
-A multi-provider LLM abstraction layer with automatic failover, circuit breakers, cost tracking, and intelligent retry. Built for Cloudflare Workers but runs anywhere with a standard `fetch` API. Extracted from a production orchestration platform handling 80K+ LOC across multiple services.
+A multi-provider LLM abstraction layer with automatic failover, graduated circuit breakers, cost tracking, and intelligent retry. Built for Cloudflare Workers but runs anywhere with a standard `fetch` API. Extracted from a production orchestration platform handling 80K+ LOC across multiple services.
 
 ## Features
 
-- **Multi-provider failover** -- OpenAI, Anthropic, and Cloudflare Workers AI behind a single interface
-- **Circuit breaker** -- state machine (closed / open / half-open) prevents cascading failures
+- **Multi-provider failover** -- OpenAI, Anthropic, Cloudflare Workers AI, Cerebras, and Groq behind a single interface
+- **Graduated circuit breaker** -- 4-state machine (closed / degraded / recovering / open) with probabilistic traffic routing prevents cascading failures
 - **Exponential backoff retry** -- configurable delays, jitter, and per-error-class behavior
-- **Cost tracking and optimization** -- per-provider cost attribution, budget alerts, automatic routing to cheaper providers
-- **Streaming** -- SSE streaming support for all three providers
-- **Tool/function calling** -- OpenAI and Anthropic tool use with unified response format
-- **Batch processing** -- concurrent request batching with rate-limit awareness
+- **Cost tracking and optimization** -- per-provider cost attribution, budget alerts with CreditLedger, automatic routing to cheaper providers
+- **Rate limit enforcement** -- CreditLedger tracks RPM/RPD/TPM/TPD per provider; factory skips providers that exceed limits
+- **Streaming** -- SSE streaming support for all providers
+- **Tool/function calling** -- OpenAI, Anthropic, Cerebras, and Cloudflare tool use with unified response format
+- **Image generation** -- Cloudflare Workers AI (SDXL, FLUX) and Google Gemini
 - **Health monitoring** -- per-provider health checks, metrics, and circuit breaker state
+- **Structured logging** -- injectable `Logger` interface; silent by default, opt-in to console or custom loggers
+- **Zero runtime dependencies** -- no transitive dependency tree to audit
 
 ## Installation
 
@@ -46,55 +49,75 @@ console.log(response.message);
 console.log(`Provider: ${response.provider}, Cost: $${response.usage.cost}`);
 ```
 
-## Provider Configuration
-
-### OpenAI
+### Auto-Discovery from Environment
 
 ```typescript
-{
-  apiKey: 'sk-...',
-  organization: 'org-...', // optional
-  project: 'proj-...', // optional
-  baseUrl: 'https://api.openai.com/v1', // optional, for proxies
-  timeout: 30000,
-  maxRetries: 3,
-}
+import { LLMProviders } from '@stackbilt/llm-providers';
+
+// Scans env for ANTHROPIC_API_KEY, OPENAI_API_KEY, GROQ_API_KEY,
+// CEREBRAS_API_KEY, and the AI binding — configures only what's present
+const llm = LLMProviders.fromEnv(env, {
+  costOptimization: true,
+  enableCircuitBreaker: true,
+});
 ```
 
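Under the hood, auto-discovery amounts to checking which of those environment keys are present. A rough standalone sketch (the `discoverProviders` helper is hypothetical, not `fromEnv`'s actual code; only the variable names come from the comment above):

```typescript
// Illustrative env scan (hypothetical helper). An env object is probed for
// known API keys and the Workers AI binding; only providers with credentials
// present end up configured.
type Env = Record<string, unknown>;

function discoverProviders(env: Env): string[] {
  const found: string[] = [];
  if (env.ANTHROPIC_API_KEY) found.push("anthropic");
  if (env.OPENAI_API_KEY) found.push("openai");
  if (env.GROQ_API_KEY) found.push("groq");
  if (env.CEREBRAS_API_KEY) found.push("cerebras");
  if (env.AI) found.push("cloudflare"); // Workers AI binding, not an API key
  return found;
}
```

This is why the same code runs unchanged across environments: a Worker with only the `AI` binding and a Node process with only `OPENAI_API_KEY` each get a valid, smaller provider set.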
-### Anthropic
+## Providers
+
+| Provider | Models | Streaming | Tools | Notes |
+|----------|--------|-----------|-------|-------|
+| **OpenAI** | GPT-4o, GPT-4o Mini, GPT-4 Turbo, GPT-4 | Yes | Yes | Default: `gpt-4o-mini` |
+| **Anthropic** | Claude Opus 4.6, Sonnet 4.6, Sonnet 4, Haiku 4.5, 3.7 Sonnet, 3.5 Sonnet/Haiku, 3 Opus/Sonnet/Haiku | Yes | Yes | Default: `claude-haiku-4-5` |
+| **Cloudflare** | LLaMA 3.1 8B/70B, GPT-OSS 120B, Mistral 7B, Qwen 1.5, TinyLlama, and more | Yes | GPT-OSS only | Near-zero cost |
+| **Cerebras** | LLaMA 3.1 8B, LLaMA 3.3 70B, ZAI-GLM 4.7, Qwen 3 235B | Yes | GLM/Qwen only | ~2,200 tok/s |
+| **Groq** | LLaMA 3.3 70B Versatile, LLaMA 3.1 8B Instant | Yes | No | Ultra-fast inference |
+
+### Provider Configuration
 
 ```typescript
-{
-  apiKey: 'sk-ant-...',
-  version: '2023-06-01', // optional
-  baseUrl: 'https://api.anthropic.com', // optional
-  timeout: 30000,
-  maxRetries: 3,
-}
+// OpenAI
+{ apiKey: 'sk-...', organization: 'org-...', project: 'proj-...' }
+
+// Anthropic
+{ apiKey: 'sk-ant-...', version: '2023-06-01' }
+
+// Cloudflare Workers AI
+{ ai: env.AI, accountId: '...' }
+
+// Cerebras
+{ apiKey: 'csk-...' }
+
+// Groq
+{ apiKey: 'gsk_...' }
 ```
 
-### Cloudflare Workers AI
+## Logging
+
+The library is silent by default. Opt in to logging by passing a `Logger`:
 
 ```typescript
-{
-  ai: env.AI, // Cloudflare AI binding (required)
-  accountId: '...', // optional
-  timeout: 30000,
-  maxRetries: 3,
-}
+import { LLMProviders, consoleLogger } from '@stackbilt/llm-providers';
+
+const llm = new LLMProviders({
+  anthropic: { apiKey: '...', logger: consoleLogger },
+  logger: consoleLogger, // factory-level logging
+});
 ```
 
+Or implement your own `Logger` interface (`debug`, `info`, `warn`, `error`).
+
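For example, a structured JSON logger. The `Logger` shape below is inferred from the four documented method names; the optional `meta` parameter and the `format` helper are assumptions for illustration:

```typescript
// Hypothetical custom logger sketch. The interface is reconstructed from the
// four method names the package documents (debug/info/warn/error).
interface Logger {
  debug(message: string, meta?: Record<string, unknown>): void;
  info(message: string, meta?: Record<string, unknown>): void;
  warn(message: string, meta?: Record<string, unknown>): void;
  error(message: string, meta?: Record<string, unknown>): void;
}

// One line of JSON per log event, suitable for log aggregators.
function format(level: string, message: string, meta?: Record<string, unknown>): string {
  return JSON.stringify({ level, message, ...meta });
}

const jsonLogger: Logger = {
  debug: (m, meta) => console.log(format("debug", m, meta)),
  info: (m, meta) => console.log(format("info", m, meta)),
  warn: (m, meta) => console.warn(format("warn", m, meta)),
  error: (m, meta) => console.error(format("error", m, meta)),
};
```

Anything satisfying the four methods can be passed wherever `consoleLogger` is accepted above.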
 ## Circuit Breaker
 
-Each provider gets its own circuit breaker that tracks consecutive failures.
+Each provider gets a graduated circuit breaker that routes traffic away from failing providers with probabilistic degradation.
 
 | State | Behavior |
 |-------|----------|
-| **Closed** | Requests pass through normally. Failures increment a counter. |
-| **Open** | All requests are immediately rejected. After `resetTimeout` ms, transitions to half-open. |
-| **Half-open** | A single test request is allowed through. Success closes the circuit; failure re-opens it. |
+| **Closed** | 100% traffic to primary. Failures increment a counter. |
+| **Degraded** | Traffic splits probabilistically (90% → 70% → 40% → 10%) as failures accumulate. |
+| **Recovering** | Success steps traffic back up one level at a time. |
+| **Open** | 0% traffic. After `resetTimeout` ms, failures decay and traffic resumes. |
 
-Default thresholds: 5 failures to open, 60s reset timeout, 5-minute monitoring window.
+Default: 5-step degradation curve `[1.0, 0.9, 0.7, 0.4, 0.1]`, 60s reset timeout, 5-minute monitoring window.
 
 ```typescript
 import { CircuitBreakerManager } from '@stackbilt/llm-providers';
@@ -103,47 +126,37 @@ const manager = new CircuitBreakerManager({
   failureThreshold: 5,
   resetTimeout: 60000,
   monitoringPeriod: 300000,
+  degradationCurve: [1.0, 0.9, 0.7, 0.4, 0.1],
 });
 
 const breaker = manager.getBreaker('openai');
 console.log(breaker.getHealth());
 ```
 
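The degraded/recovering mechanics amount to walking an index along the degradation curve. A standalone illustrative model, not the library's internal `CircuitBreaker` class (all names here are hypothetical):

```typescript
// Sketch of graduated traffic routing. Each failure steps one level down the
// curve; each success steps one level back up (the "recovering" behavior).
const DEGRADATION_CURVE = [1.0, 0.9, 0.7, 0.4, 0.1];

class GraduatedBreaker {
  private level = 0; // index into the curve; the last index behaves as "open"

  recordFailure(): void {
    this.level = Math.min(this.level + 1, DEGRADATION_CURVE.length - 1);
  }

  recordSuccess(): void {
    this.level = Math.max(this.level - 1, 0); // step back up one level at a time
  }

  /** Fraction of traffic currently admitted to this provider. */
  admitProbability(): number {
    return DEGRADATION_CURVE[this.level];
  }

  /** Probabilistic routing decision: true = use this provider, false = fall back. */
  shouldAdmit(rand: () => number = Math.random): boolean {
    return rand() < this.admitProbability();
  }
}
```

Compared with a binary breaker, the graduated curve keeps a trickle of real traffic flowing to a struggling provider, so recovery is detected without a dedicated half-open probe request.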
-## Cost Optimization
-
-When `costOptimization: true`, the factory routes requests to the cheapest available provider. Cloudflare Workers AI is essentially free and gets top priority.
+## Cost Tracking & Budget Management
 
 ```typescript
-import { createCostOptimizedLLMProviders } from '@stackbilt/llm-providers';
+import { CreditLedger, LLMProviders } from '@stackbilt/llm-providers';
 
-const llm = createCostOptimizedLLMProviders({
-  openai: { apiKey: process.env.OPENAI_API_KEY },
-  cloudflare: { ai: env.AI },
+const ledger = new CreditLedger({
+  budgets: [
+    { provider: 'openai', monthlyBudget: 50, rateLimits: { rpm: 60, rpd: 10000 } },
+    { provider: 'anthropic', monthlyBudget: 100 },
+  ],
 });
 
-const analytics = llm.getCostAnalytics();
-// { breakdown: { openai: { cost, requests, tokens }, ... }, total: 0.042, recommendations: [...] }
-```
-
-## Retry with Backoff
-
-Transient errors (rate limits, network errors, server errors) are retried automatically with exponential backoff and jitter.
+// Threshold alerts fire at 80%, 90%, 95% utilization
+ledger.on((event) => {
+  if (event.type === 'threshold_crossed') {
+    console.warn(`${event.provider}: ${event.tier} — ${event.utilizationPct.toFixed(0)}% of budget`);
+  }
+});
 
-```typescript
-import { RetryManager, retry } from '@stackbilt/llm-providers';
-
-// Standalone retry for any async operation
-const result = await retry(
-  () => fetch('https://api.example.com/data'),
-  { maxRetries: 3, initialDelay: 1000, backoffMultiplier: 2 }
-);
-
-// Or configure per-provider via RetryManager
-const retryManager = new RetryManager({
-  maxRetries: 5,
-  initialDelay: 500,
-  maxDelay: 30000,
-  backoffMultiplier: 2,
+const llm = new LLMProviders({
+  openai: { apiKey: '...' },
+  anthropic: { apiKey: '...' },
+  costOptimization: true,
+  ledger, // factory enforces rate limits and tracks spend
 });
 ```
 
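The threshold events reduce to a simple utilization check against the alert tiers. An illustrative standalone sketch (hypothetical helpers, not `CreditLedger` internals):

```typescript
// The three documented alert tiers: 80%, 90%, 95% of the monthly budget.
const ALERT_TIERS = [0.8, 0.9, 0.95];

// Fraction of the monthly budget consumed so far.
function utilization(spend: number, monthlyBudget: number): number {
  return monthlyBudget > 0 ? spend / monthlyBudget : 0;
}

/** Highest alert tier crossed by the current spend, or null if below all tiers. */
function tierCrossed(spend: number, monthlyBudget: number): number | null {
  const u = utilization(spend, monthlyBudget);
  const crossed = ALERT_TIERS.filter((t) => u >= t);
  return crossed.length > 0 ? crossed[crossed.length - 1] : null;
}
```

A ledger only needs to emit a `threshold_crossed` event when this value changes between two spend updates, which keeps alerting idempotent per tier.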
@@ -156,6 +169,7 @@ const llm = new LLMProviders({
   openai: { apiKey: '...' },
   anthropic: { apiKey: '...' },
   cloudflare: { ai: env.AI },
+  cerebras: { apiKey: '...' },
   fallbackRules: [
     { condition: 'rate_limit', fallbackProvider: 'cloudflare' },
     { condition: 'cost', threshold: 10, fallbackProvider: 'cloudflare' },
@@ -164,6 +178,8 @@ const llm = new LLMProviders({
 });
 ```
 
+Default fallback priority includes all configured providers: Cloudflare → Cerebras → Groq → Anthropic → OpenAI.
+
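Conceptually, failover walks that priority chain until a provider succeeds. A simplified sketch with hypothetical types (the real factory also consults the circuit breaker and ledger before trying each provider):

```typescript
// Minimal failover walk: try providers in priority order, remember the last
// error, and surface it only if every provider in the chain fails.
type Provider = { name: string; generate: (prompt: string) => Promise<string> };

async function withFailover(chain: Provider[], prompt: string): Promise<string> {
  let lastError: unknown;
  for (const provider of chain) {
    try {
      return await provider.generate(prompt);
    } catch (err) {
      lastError = err; // fall through to the next provider in the chain
    }
  }
  throw lastError ?? new Error("no providers configured");
}
```

Surfacing the last error (rather than the first) means the caller sees the failure mode of the final, cheapest fallback, which is usually the actionable one.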
 ## Error Handling
 
 Structured error classes for each failure mode:
@@ -175,7 +191,6 @@ import {
   AuthenticationError,
   CircuitBreakerOpenError,
   TimeoutError,
-  LLMErrorFactory,
 } from '@stackbilt/llm-providers';
 
 try {
@@ -193,14 +208,16 @@ try {
 
 ## Model Constants
 
-Predefined model identifiers for convenience:
-
 ```typescript
 import { MODELS, getRecommendedModel } from '@stackbilt/llm-providers';
 
-MODELS.GPT_4O; // 'gpt-4o'
-MODELS.CLAUDE_3_5_SONNET; // 'claude-3-5-sonnet-20241022'
-MODELS.LLAMA_3_1_8B; // '@cf/meta/llama-3.1-8b-instruct'
+// Current-gen models
+MODELS.CLAUDE_OPUS_4_6; // 'claude-opus-4-6-20250618'
+MODELS.CLAUDE_SONNET_4_6; // 'claude-sonnet-4-6-20250618'
+MODELS.CLAUDE_HAIKU_4_5; // 'claude-haiku-4-5-20251001'
+MODELS.GPT_4O; // 'gpt-4o'
+MODELS.GPT_4O_MINI; // 'gpt-4o-mini'
+MODELS.CEREBRAS_ZAI_GLM_4_7; // 'zai-glm-4.7'
 
 // Get best model for a use case given available providers
 const model = getRecommendedModel('COST_EFFECTIVE', ['openai', 'cloudflare']);
@@ -214,41 +231,51 @@ const model = getRecommendedModel('COST_EFFECTIVE', ['openai', 'cloudflare']);
 |-------|-------------|
 | `LLMProviders` | High-level facade -- initialize providers, generate responses, check health |
 | `LLMProviderFactory` | Lower-level factory with provider chain building and fallback logic |
-| `OpenAIProvider` | OpenAI GPT models (streaming, tools, batch) |
+| `OpenAIProvider` | OpenAI GPT models (streaming, tools) |
 | `AnthropicProvider` | Anthropic Claude models (streaming, tools) |
-| `CloudflareProvider` | Cloudflare Workers AI (streaming, batch, cost optimization) |
+| `CloudflareProvider` | Cloudflare Workers AI (streaming, tools on GPT-OSS, batch) |
+| `CerebrasProvider` | Cerebras fast inference (streaming, tools on GLM/Qwen) |
+| `GroqProvider` | Groq fast inference (streaming) |
 | `BaseProvider` | Abstract base with shared resiliency, metrics, and cost calculation |
 
 ### Utilities
 
 | Class | Description |
 |-------|-------------|
-| `CircuitBreaker` | Per-provider circuit breaker state machine |
+| `CircuitBreaker` | Graduated 4-state circuit breaker with probabilistic degradation |
 | `CircuitBreakerManager` | Manages circuit breakers across multiple providers |
 | `RetryManager` | Exponential backoff retry with jitter |
 | `CostTracker` | Per-provider cost accumulation and budget alerts |
+| `CreditLedger` | Monthly budgets, rate limits, burn-rate projection, threshold events |
 | `CostOptimizer` | Static methods for optimal provider selection |
+| `ImageProvider` | Multi-provider image generation (Cloudflare SDXL/FLUX, Google Gemini) |
+
+### Logger
+
+| Export | Description |
+|--------|-------------|
+| `Logger` | Interface: `debug`, `info`, `warn`, `error` methods |
+| `noopLogger` | Silent logger (default) |
+| `consoleLogger` | Forwards to `console.*` (opt-in) |
 
 ### Key Types
 
 | Type | Description |
 |------|-------------|
-| `LLMRequest` | Unified request: messages, model, temperature, tools, metadata |
-| `LLMResponse` | Unified response: message, usage, provider, cost, tool calls |
-| `LLMProvider` | Provider interface: generateResponse, healthCheck, estimateCost |
-| `ProviderFactoryConfig` | Factory configuration: provider configs, fallback rules, flags |
-| `CircuitBreakerConfig` | Failure threshold, reset timeout, monitoring period |
-| `RetryConfig` | Max retries, delays, backoff multiplier, retryable error codes |
-| `CostConfig` | Token costs, monthly budget, alert threshold |
+| `LLMRequest` | Unified request: messages, model, temperature, tools, response_format |
+| `LLMResponse` | Unified response: message, usage (with cost), provider, tool calls |
+| `TokenUsage` | Token counts and cost (inputTokens, outputTokens, totalTokens, cost) |
+| `ProviderFactoryConfig` | Factory config: provider configs, fallback rules, ledger, logger |
+| `CostAnalytics` | Cost breakdown, total, and recommendations |
+| `ProviderHealthEntry` | Health status, metrics, circuit breaker state, capabilities |
 
 ### Factory Functions
 
 | Function | Description |
 |----------|-------------|
 | `createLLMProviders(config)` | Create an `LLMProviders` instance |
 | `createCostOptimizedLLMProviders(config)` | Create with cost optimization, circuit breakers, and retries enabled |
-| `createLLMProviderFactory(config)` | Create a bare `LLMProviderFactory` |
-| `createCostOptimizedFactory(config)` | Create a cost-optimized factory |
+| `LLMProviders.fromEnv(env)` | Auto-discover providers from environment variables |
 | `getRecommendedModel(useCase, providers)` | Pick the best model for a use case |
 | `retry(fn, config)` | One-shot retry wrapper for any async function |