fix: reduce stop hook API timeout from 10m to 90s by greynewell · Pull Request #567 · supermodeltools/Uncompact

greynewell · 2026-03-09T16:50:04Z

Problem

When the Supermodel API is unreachable, the Stop hook hangs for up to 10 minutes before giving up. During this window, Claude Code sessions are effectively frozen.

Root cause: runHandler and runWithoutCache both create a 10-minute context for the API call. When the API is down, pollJob retries immediately-failing connections every 10 seconds — so with a 10-minute context, the hook blocks for ~600 seconds before calling silentExit().

Fix

Reduce the API fetch timeout in the run command from 10 minutes to 90 seconds.

API unreachable → Stop hook gives up after ~90s instead of ~10 minutes ✓
API slow (large repo, first run) → 90s limit, then graceful silent exit; pregen (background, 20-minute timeout) warms the cache for the next compaction ✓
Fresh/stale cache hit → API is never called; behaviour unchanged ✓
Background stale refresh goroutine and pregen both retain their 20-minute timeouts ✓

The 10-minute timeout was designed for large repos on first run, but that use case is now handled by the background pregen hook — making the shorter Stop hook timeout safe.

Test plan

Simulate API outage (block the API host via /etc/hosts) — Stop hook should give up within ~90s with no output
Fresh-cache scenario — hook still serves instantly from cache
Stale-cache scenario — hook serves stale immediately, background refresh attempts (and fails silently) in parallel
go build ./... and go vet ./... pass

🤖 Generated with Claude Code

Summary by CodeRabbit

Bug Fixes
- Enhanced connection error handling: The API client now immediately returns errors for connection-level failures (such as network problems or DNS issues) instead of attempting automatic retries with delays. This provides faster feedback when the service is unreachable. HTTP-level errors continue to be handled as before.

…blocking

When the Supermodel API is unreachable, pollJob retries failed connections every 10 seconds. With the previous 10-minute context, the Stop hook would hang for up to ~10 minutes before giving up, making Claude Code sessions unusable during API outages.

Reduce the API fetch timeout in runHandler and runWithoutCache to 90 seconds. Long-running first-time fetches for large repos are already handled by the background pregen hook (20-minute timeout), so the stop hook can fail fast and gracefully on API outage without disrupting sessions.

Co-Authored-By: Grey Newell <greyshipscode@gmail.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

coderabbitai · 2026-03-09T16:50:19Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 4773fee2-030c-47fc-bc66-176a89aaa004

📥 Commits

Reviewing files that changed from the base of the PR and between d01464b and dc468ea.

📒 Files selected for processing (1)

internal/api/client.go

Walkthrough

The API client's connection error handling was simplified. Instead of logging a warning and retrying with backoff delays when connection-level errors occur (DNS failures, refused connections, network issues), the function now immediately returns an error signal. HTTP-level errors continue using the existing retry logic.

Changes

Cohort / File(s)	Summary
Connection Error Handling `internal/api/client.go`	Removed retry/backoff logic for connection-level errors (DNS, connection refused, network timeouts); now immediately returns unreachable API error. HTTP-level error handling unchanged.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Poem

🔌 When networks fail and DNS breaks,
No more waiting, no more shakes,
Fail fast now, cut to the chase,
Connection errors show their face.
Simple paths, cleaner trace! ✨

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (2 warnings)

Check name	Status	Explanation	Resolution
Title check	⚠️ Warning	The PR title mentions reducing API timeout from 10m to 90s, but the actual change in internal/api/client.go is about making connection-level errors fail fast instead of retrying—a different fix than the timeout reduction.	Update the title to reflect the actual change: something like 'fix: fail fast on connection-level errors in pollJob' would better describe what the code change actually does.
Docstring Coverage	⚠️ Warning	Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (1 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.

…nutes

When the Supermodel API is unreachable, pollJob was retrying connection errors (connection refused, DNS failure, network down) every 10 seconds for the full context duration (up to 10 minutes) before giving up. This blocked the Claude Code Stop hook for the entire outage window.

Connection errors are fundamentally different from job-processing delays:
- "pending"/"processing" status → API is working, polling makes sense
- Connection error → API is unreachable, retrying won't help

Change pollJob to return immediately on connection-level errors so the Stop hook can call silentExit() and unblock the session without waiting for the context deadline. 5xx errors, rate limits, and job-in-progress responses continue to be retried as before.

Co-Authored-By: Grey Newell <greyshipscode@gmail.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cmd/run.go`:
- Around line 229-235: The current 90s timeout only wraps the Supermodel fetch
and doesn't bound the full Stop hook; create a single parent context with a
total deadline for the run path (e.g., in the function that calls
runWithoutCache) and pass that parent ctx into runWithoutCache so all
sub-operations (project.Detect, GetWorkingMemory, and the Supermodel fetch)
derive sub-contexts from it (use context.WithTimeout/WithDeadline on the parent
to create shorter child contexts where needed and replace the local
context.WithTimeout in the Supermodel fetch with a child of the parent). Ensure
functions like runWithoutCache, project.Detect, and GetWorkingMemory accept and
use the passed parent context so the whole end-to-end Stop hook is capped by the
single parent deadline.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 5585285b-f478-418c-8877-9b6653e90172

📥 Commits

Reviewing files that changed from the base of the PR and between 317cc93 and d01464b.

📒 Files selected for processing (1)

cmd/run.go

coderabbitai · 2026-03-09T16:53:35Z

cmd/run.go

+	// If no cache or forced refresh, fetch from API.
+	// Use a short timeout so the Stop hook never blocks a Claude Code session
+	// for more than ~90 seconds during an API outage. Long-running first-time
+	// fetches for large repos are handled by the background pregen hook.
 	if graph == nil || forceRefresh {
 		logFn("[debug] fetching from Supermodel API...")
-		ctx, cancel := context.WithTimeout(context.Background(), 10*time.Minute)
+		ctx, cancel := context.WithTimeout(context.Background(), 90*time.Second)


⚠️ Potential issue | 🟡 Minor

The 90s cap only applies to the fetch phase, not the whole Stop hook.

Right now the command can still spend up to 5s in project.Detect, 15s in GetWorkingMemory, and then another 90s here, so the real wall-clock cap is still closer to ~110s. If the PR goal is a true end-to-end Stop-hook limit, give the whole run path a parent deadline and derive these sub-contexts from it.

Possible shape of the fix

 func runHandler(cmd *cobra.Command, args []string) error {
+	runCtx, runCancel := context.WithTimeout(context.Background(), 90*time.Second)
+	defer runCancel()
+
 	...
-	gitCtx, gitCancel := context.WithTimeout(context.Background(), 5*time.Second)
+	gitCtx, gitCancel := context.WithTimeout(runCtx, 5*time.Second)
 	defer gitCancel()
 	...
-	wmCtx, wmCancel := context.WithTimeout(context.Background(), 15*time.Second)
+	wmCtx, wmCancel := context.WithTimeout(runCtx, 15*time.Second)
 	defer wmCancel()
 	...
-	ctx, cancel := context.WithTimeout(context.Background(), 90*time.Second)
+	ctx, cancel := context.WithTimeout(runCtx, 90*time.Second)
 	defer cancel()

You'd want to thread that same parent context into runWithoutCache too.

Also applies to: 475-479

greynewell merged commit 57756c9 into main Mar 9, 2026
3 checks passed

greynewell deleted the claude/issue-api-outage-20260309-0000 branch March 9, 2026 16:57

greynewell merged 2 commits into main from claude/issue-api-outage-20260309-0000
