
fix(AC-315): wrap all providers with RetryProvider for transient error handling#450

Merged
jayscambler merged 2 commits into main from fix/ac-315-anthropic-500-retry
Mar 18, 2026


@jayscambler

Summary

The RetryProvider existed with correct 500/429/timeout handling but was never wired in — create_provider() returned bare providers that crashed on transient errors.

Root Cause

L20 and L21 hit two consecutive Anthropic HTTP 500s and crashed immediately because create_provider() returned the provider with no retry layer around it:

# Before: bare provider, no retries
return AnthropicProvider(api_key=..., default_model_name=...)

Fix

All providers from create_provider() are now wrapped in RetryProvider:

# After: automatic retry with exponential backoff
return RetryProvider(AnthropicProvider(api_key=..., default_model_name=...))

Default: 3 retries, 1s initial delay, 2x backoff, 60s max delay.
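To make the defaults concrete, here is a minimal sketch of the wait schedule they imply. The function name `backoff_delays` and its parameter names are hypothetical and not taken from this PR; the sketch also assumes plain exponential growth with no jitter, which the real RetryProvider may or may not add.

```python
def backoff_delays(max_retries=3, initial_delay=1.0, backoff=2.0, max_delay=60.0):
    """Return the wait in seconds before each retry, capped at max_delay.

    Hypothetical helper: parameter names are assumptions, the default
    values are the ones quoted in the PR description.
    """
    return [
        min(initial_delay * backoff**attempt, max_delay)
        for attempt in range(max_retries)
    ]


# With the defaults, the waits before retries 1, 2 and 3 are 1s, 2s and 4s.
print(backoff_delays())
```

Under these assumptions the 60s cap only matters when max_retries is raised: the seventh retry would otherwise wait 64s.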

RetryProvider._is_transient() checks for: 500, 502, 503, 504, rate_limit, 429, timeout, overloaded, capacity, connection, temporarily unavailable.
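A check of this kind can be sketched as a substring match over the error message. This is an illustration only: the marker list is copied from the description above, while the standalone function `is_transient`, the constant name, and the lower-casing step are assumptions rather than the actual body of `RetryProvider._is_transient()`.

```python
# Markers as listed in the PR description.
TRANSIENT_MARKERS = (
    "500", "502", "503", "504",
    "rate_limit", "429",
    "timeout", "overloaded", "capacity",
    "connection", "temporarily unavailable",
)


def is_transient(error: Exception) -> bool:
    """Return True if the error message contains any transient marker.

    Assumption: the message is lower-cased before matching so that
    "Temporarily Unavailable" and "temporarily unavailable" both match.
    """
    message = str(error).lower()
    return any(marker in message for marker in TRANSIENT_MARKERS)
```

One consequence of matching on substrings is that an unrelated message which happens to contain "500" (for example, a limit of 5000) would also be treated as transient, so the retry cap is what bounds the cost of a false positive.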

Providers wrapped

  • Anthropic
  • OpenAI-compatible
  • Ollama
  • vLLM

(MLX is local — no network retries needed)
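For readers unfamiliar with the pattern, the wrapping described above could look roughly like the sketch below. It is not the project's RetryProvider: the `complete()` method, the constructor parameters, the injectable `sleep`, the reduced marker list, and the `FlakyProvider` stand-in are all hypothetical. It also assumes that "3 retries" means one initial attempt plus up to three more; the "Retry(ProviderName)" name format is the one mentioned in the commit message.

```python
import time


def _looks_transient(error: Exception) -> bool:
    # Reduced marker list for the sketch; the real check covers more cases.
    message = str(error).lower()
    return any(m in message for m in ("500", "502", "503", "504", "429", "timeout"))


class RetryProvider:
    """Hypothetical sketch: retry transient failures of a wrapped provider."""

    def __init__(self, inner, max_retries=3, initial_delay=1.0,
                 backoff=2.0, max_delay=60.0, sleep=time.sleep):
        self._inner = inner
        self._max_retries = max_retries
        self._initial_delay = initial_delay
        self._backoff = backoff
        self._max_delay = max_delay
        self._sleep = sleep  # injectable so tests do not actually wait

    @property
    def name(self) -> str:
        return f"Retry({self._inner.name})"

    def complete(self, prompt: str) -> str:
        delay = self._initial_delay
        # Assumption: one initial attempt plus up to max_retries retries.
        for attempt in range(self._max_retries + 1):
            try:
                return self._inner.complete(prompt)
            except Exception as exc:
                if attempt == self._max_retries or not _looks_transient(exc):
                    raise  # out of retries, or not worth retrying
                self._sleep(min(delay, self._max_delay))
                delay *= self._backoff
        raise AssertionError("unreachable")


class FlakyProvider:
    """Test stand-in that fails with an HTTP 500 a fixed number of times."""

    name = "Flaky"

    def __init__(self, failures: int):
        self.failures = failures
        self.calls = 0

    def complete(self, prompt: str) -> str:
        self.calls += 1
        if self.calls <= self.failures:
            raise RuntimeError("HTTP 500 Internal Server Error")
        return "ok"


if __name__ == "__main__":
    provider = RetryProvider(FlakyProvider(failures=2), sleep=lambda s: None)
    print(provider.name, provider.complete("hello"))
```

Because the wrapper exposes the same call surface as the provider it wraps, callers of a factory that returns it need no changes; only code that compares the provider name exactly has to adapt.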

Test plan

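  • 5 new tests covering: create_provider returns RetryProvider, RetryProvider retries on 500, gives up after max_retries, get_provider also wraps
  • 3 existing tests updated to check "in p.name" instead of an exact match, since RetryProvider reports its name as "Retry(ProviderName)"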
  • ruff clean, full suite 4400 passed

…r handling

The RetryProvider existed but was never used — create_provider()
returned bare providers that crashed on 500s. L20/L21 hit two
consecutive Anthropic 500s and the CLI exited immediately.

Fix: wrap all providers returned by create_provider() in RetryProvider
(default: 3 retries, exponential backoff starting at 1s, max 60s).
Covers Anthropic, OpenAI-compatible, ollama, vllm providers.

RetryProvider already handles transient errors via _is_transient()
which checks for 500, 502, 503, 504, rate limit, timeout, overloaded
substrings in error messages.

5 tests: create_provider returns RetryProvider, RetryProvider retries
on 500, gives up after max_retries, get_provider also wraps.
Updated 3 existing tests to check "in p.name" instead of exact match
(RetryProvider wraps the name as "Retry(ProviderName)").

@jayscambler jayscambler merged commit c674e9a into main Mar 18, 2026
4 checks passed