fix(AC-315): wrap all providers with RetryProvider for transient error handling #450
Merged
jayscambler merged 2 commits into main on Mar 18, 2026
…r handling
The RetryProvider existed but was never used: create_provider() returned bare providers that crashed on 500s. L20/L21 hit two consecutive Anthropic 500s and the CLI exited immediately.
Fix: wrap all providers returned by create_provider() in RetryProvider (default: 3 retries, exponential backoff starting at 1s, max 60s). Covers Anthropic, OpenAI-compatible, ollama, vllm providers.
RetryProvider already handles transient errors via _is_transient(), which checks for 500, 502, 503, 504, rate limit, timeout, overloaded substrings in error messages.
5 tests: create_provider returns RetryProvider, RetryProvider retries on 500, gives up after max_retries, get_provider also wraps.
Updated 3 existing tests to check "in p.name" instead of exact match (RetryProvider wraps the name as "Retry(ProviderName)").
Summary
The RetryProvider existed with correct 500/429/timeout handling but was never wired in: create_provider() returned bare providers that crashed on transient errors.
Root Cause
L20 and L21 hit two consecutive Anthropic HTTP 500s and crashed immediately because create_provider() returned bare providers, so nothing retried the failed calls.
Fix
All providers from create_provider() are now wrapped in RetryProvider.
Default: 3 retries, 1s initial delay, 2x backoff, 60s max delay.
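
To make the default schedule concrete, here is a minimal sketch of a retrying wrapper with exponential backoff. It is not the project's RetryProvider: the class name RetrySketch, the complete() method, the is_transient predicate argument, and the injectable sleep are all assumptions made for illustration. Only the defaults (3 retries, 1s initial delay, 2x backoff, 60s cap) and the "Retry(ProviderName)" naming come from this PR.

```python
import time


def backoff_delays(max_retries=3, initial_delay=1.0, factor=2.0, max_delay=60.0):
    """Return the wait before each retry, capped at max_delay."""
    return [min(initial_delay * factor ** i, max_delay) for i in range(max_retries)]


class RetrySketch:
    """Hypothetical stand-in for RetryProvider; `complete` is an assumed method name."""

    def __init__(self, inner, is_transient, sleep=time.sleep, **backoff):
        self.inner = inner
        self.is_transient = is_transient  # predicate: exception -> bool
        self.sleep = sleep                # injectable so tests do not have to wait
        self.delays = backoff_delays(**backoff)
        self.name = f"Retry({inner.name})"

    def complete(self, prompt):
        for delay in self.delays:
            try:
                return self.inner.complete(prompt)
            except Exception as exc:
                if not self.is_transient(exc):
                    raise  # permanent errors propagate immediately
                self.sleep(delay)
        # Final attempt after all retries are used up: any error now propagates.
        return self.inner.complete(prompt)
```

With the defaults, a call is attempted at most four times (one initial attempt plus three retries), waiting 1s, 2s, and 4s between attempts. The 60s cap only matters when max_retries is raised high enough for the doubling delay to exceed it.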
RetryProvider._is_transient() checks for: 500, 502, 503, 504, rate_limit, 429, timeout, overloaded, capacity, connection, temporarily unavailable.
Providers wrapped
Anthropic, OpenAI-compatible, ollama, vllm
(MLX is local, so no network retries are needed)
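
The substring check attributed to RetryProvider._is_transient() in the Fix section can be sketched roughly as follows. The marker list is copied from this PR description; the function name, the lowercasing of the message, and the use of str(exc) are assumptions, and the real method may differ.

```python
# Markers copied from the PR description; how they are matched is an assumption.
TRANSIENT_MARKERS = (
    "500", "502", "503", "504", "429", "rate_limit", "timeout",
    "overloaded", "capacity", "connection", "temporarily unavailable",
)


def is_transient(exc: Exception) -> bool:
    """Treat an error as retryable if its message contains any known marker."""
    message = str(exc).lower()
    return any(marker in message for marker in TRANSIENT_MARKERS)
```

One property of matching on substrings is that unrelated messages can match by accident: an error mentioning "5000 tokens" contains "500" and would be retried under this sketch. Matching on a structured status code, where the provider SDK exposes one, would avoid that, at the cost of provider-specific handling.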
Test plan