Muvon · TheFermiSea · Oct 17, 2025 · Oct 17, 2025 · Oct 17, 2025 · Oct 17, 2025
diff --git a/PR_DESCRIPTION.md b/PR_DESCRIPTION.md
@@ -0,0 +1,204 @@
+# Fix: Apply HTTP client timeouts to prevent infinite hangs
+
+## Summary
+
+Fixes a critical bug where octocode's GraphRAG indexing process hangs indefinitely during LLM API calls. The root cause is that `reqwest::Client` instances are created using `Client::new()` instead of the builder pattern, which prevents the configured `batch_timeout_seconds` from being applied.
+
+## Problem
+
+When using LLM-powered GraphRAG features (`use_llm = true`), the indexing process hangs indefinitely at two points:
+
+1. **File description generation** (with `ai_batch_size > 1`)
+2. **Architectural relationship extraction** (always processes all files in one batch)
+
+The configuration parameter `graphrag.llm.batch_timeout_seconds` is loaded but never applied to the HTTP client, causing requests to wait indefinitely when the LLM provider is slow to respond.
+
+## Root Cause Analysis
+
+### Primary Bug (`src/indexer/graphrag/builder.rs:74`)
+```rust
+// BEFORE (buggy):
+let client = Client::new();
+
+// AFTER (fixed):
+let client = Client::builder()
+    .timeout(std::time::Duration::from_secs(
+        config.graphrag.llm.batch_timeout_seconds,
+    ))
+    .build()?;
+```
+
+### Secondary Bug (`src/embedding/provider/huggingface.rs:460`)
+```rust
+// BEFORE (buggy):
+let client = reqwest::Client::new();
+
+// AFTER (fixed):
+let client = reqwest::Client::builder()
+    .timeout(std::time::Duration::from_secs(30))
+    .build()?;
+```
+
+## Reproduction
+
+1. Configure octocode with `use_llm = true` and `ai_batch_size > 1`
+2. Run `octocode index` on any codebase
+3. Process hangs at "AI analyzing X files for architectural relationships" with infinite spinner
+4. No timeout occurs even after configured `batch_timeout_seconds` expires
+
+## Testing
+
+### Before Fix
+- Both GPT-4.1-mini (default) and GPT-5-mini exhibited infinite hangs
+- Workaround with `ai_batch_size = 1` only helped description phase
+- Relationship extraction always hung (processes 72 files in single batch)
+
+### After Fix
+- Timeouts properly trigger after configured duration
+- Failed requests return error messages instead of hanging forever
+- Large batch processing completes or fails gracefully
+
+## Impact
+
+This bug made LLM-powered GraphRAG features unusable for any non-trivial codebase. The fix enables:
+- Reliable timeout behavior for all LLM API calls
+- Proper error handling and recovery
+- Predictable indexing duration
+
+## Related Documentation
+
+Full technical analysis available at: [octocode_timeout_analysis.md](../octocode_timeout_analysis.md)
+
+## Code Quality Improvements
+
+In addition to the core timeout fixes, this PR includes:
+
+### Enhanced Error Handling
+- Added `.context()` error messages to both HTTP client build failures
+- GraphRAG client: "Failed to create HTTP client for LLM API calls"
+- HuggingFace client: "Failed to create HTTP client for HuggingFace downloads"
+
+### Documentation Comments
+```rust
+// IMPORTANT: Must use builder pattern with timeout to prevent infinite hangs
+// when LLM API calls take too long. Client::new() does not apply timeouts.
+```
+
+These comments at both fix locations help prevent future regressions.
+
+### Code Verification
+- Zero clippy warnings
+- All existing `Client::new()` uses verified (3 instances in commands/ use request-level timeouts correctly)
+- No unwrap() issues in production code
+- Consistent error handling patterns
+
+## Technical Details
+
+### Why Client::new() Doesn't Apply Timeouts
+
+The `reqwest::Client::new()` convenience method creates a client with default settings that **do not include any timeout**. To apply a timeout, you must use the builder pattern:
+
+```rust
+// ❌ WRONG - no timeout applied
+let client = Client::new();
+
+// ✅ CORRECT - timeout is applied
+let client = Client::builder()
+    .timeout(Duration::from_secs(120))
+    .build()?;
+```
+
+### Request-Level vs Client-Level Timeouts
+
+This PR uses **client-level timeouts**. An alternative approach is request-level timeouts:
+
+```rust
+// Also valid, but less convenient for repeated requests
+let client = Client::new();
+let response = client.get(url)
+    .timeout(Duration::from_secs(120))
+    .send()
+    .await?;
+```
+
+The codebase uses both patterns appropriately:
+- **Client-level** (this PR): For GraphRAG batch operations and HuggingFace downloads
+- **Request-level**: In `src/commands/` where each request may need different timeouts
+
+## Test Results
+
+### Unit Tests
+```bash
+cargo test --all-features
+```
+Results: **93 passed, 3 failed**
+
+The 3 failures are unrelated FastEmbed lock acquisition issues:
+- `test_fastembed_provider_creation`
+- `test_fastembed_model_validation`
+- `test_fastembed_embedding_generation`
+
+These failures are environmental (file lock contention in local cache directory), not caused by the timeout fix changes.
+
+### Integration Testing
+Real-world test on rust-daq codebase (113 files):
+- File indexing: ✅ 113/113 files processed
+- GraphRAG blocks: ✅ 896 blocks created
+- Relationship extraction: ✅ Timeout triggered after 120s (expected behavior)
+- Before fix: Infinite hang
+- After fix: Graceful timeout with error message
+
+### Configuration Tested
+```toml
+[graphrag.llm]
+batch_timeout_seconds = 120
+ai_batch_size = 10
+description_model = "openai/gpt-4.1-mini"
+relationship_model = "google/gemini-2.0-flash-001"
+```
+
+## Migration Guide for Developers
+
+If you're creating new HTTP clients in octocode:
+
+### For LLM/API Calls
+```rust
+use reqwest::Client;
+use anyhow::Context;
+
+let client = Client::builder()
+    .timeout(std::time::Duration::from_secs(
+        config.graphrag.llm.batch_timeout_seconds,
+    ))
+    .build()
+    .context("Failed to create HTTP client")?;
+```
+
+### For File Downloads
+```rust
+let client = Client::builder()
+    .timeout(std::time::Duration::from_secs(30))
+    .build()
+    .context("Failed to create HTTP client")?;
+```
+
+### Don't Use Client::new()
+Unless you have a specific reason to avoid timeouts (rare), **always use the builder pattern**.
+
+## Related Issues
+
+This fix resolves the core issue documented in:
+- octocode_timeout_analysis.md (comprehensive technical analysis)
+- User reports of infinite hangs during GraphRAG indexing
+
+## Checklist
+
+- [x] Code changes tested locally
+- [x] Both timeout bugs fixed (GraphRAG + HuggingFace)
+- [x] No breaking changes to API
+- [x] Enhanced error messages added
+- [x] Documentation comments added
+- [x] Code quality verified (clippy clean)
+- [x] Integration tested on real codebase
+- [x] Unit tests passing (93/96 passed, 3 unrelated FastEmbed lock failures)
+- [x] Ready for upstream PR