Replace all unsafe byte-index string slicing (`&s[..N]`) with `str::floor_char_boundary()` across 18 files in the runtime, kernel, api, memory, channels, and cli crates. These cause panics when the byte index falls inside a multi-byte UTF-8 character (common with CJK text from QQ/Telegram users). The crash was first observed when web_fetch returned Chinese web content that was truncated at a 3-byte character boundary.

Affected hot paths:
- context_budget: tool result truncation
- context_overflow: overflow recovery truncation
- compactor: conversation text tail-keeping
- stream_chunker: forced break at max_chunk_chars
- web_fetch: HTTP response body truncation
- kernel: session topic & identity file truncation
- docker_sandbox: stdout/stderr truncation
- tool_runner: command/url logging, canvas_id slicing
- provider_health: error body truncation
- subprocess_sandbox: command logging
- session_repair: injection marker stripping (to_lowercase byte mismatch)
- cron/triggers: error message & content truncation
- session (memory): thinking text truncation
- TUI screens: ID/value display truncation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
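Note that `str::floor_char_boundary` is still unstable on stable Rust at the time of writing, so a minimal sketch of the same behavior on stable, using `str::is_char_boundary` (the helper name here is hypothetical, not from the patch):

```rust
/// Clamp a byte index down to the nearest UTF-8 char boundary, then slice.
/// Equivalent in spirit to the unstable `str::floor_char_boundary`.
fn truncate_utf8(s: &str, max_bytes: usize) -> &str {
    if s.len() <= max_bytes {
        return s;
    }
    let mut idx = max_bytes;
    // Walk back until we land on a char boundary (at most 3 steps in UTF-8).
    while !s.is_char_boundary(idx) {
        idx -= 1;
    }
    &s[..idx]
}

fn main() {
    // "日本語" is 9 bytes (3 bytes per char); `&s[..4]` would panic here.
    assert_eq!(truncate_utf8("日本語", 4), "日");
    assert_eq!(truncate_utf8("ascii", 3), "asc");
    assert_eq!(truncate_utf8("short", 100), "short");
    println!("ok");
}
```

The loop terminates because index 0 is always a char boundary, so the truncated result can be at most 3 bytes shorter than the requested cap.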
Create `supervisor.rs` with centralized reconnection/backoff infrastructure and refactor 23 channel adapters to use it, eliminating ~10,000 lines of duplicated reconnection logic.

The supervisor module provides:
- `SupervisorConfig`: configurable initial/max backoff (default 1s/60s)
- `run_supervised_loop()`: generic supervised reconnection loop
- `run_supervised_loop_reset_on_connect()`: variant that resets backoff after successful connection
- `DEFAULT_CHANNEL_BUFFER` (256): shared constant replacing hardcoded sizes

Each adapter's inline reconnection loop is extracted into a standalone `async fn` that returns `Result<bool, String>`:
- `Ok(true)` = reconnect (transient failure)
- `Ok(false)` = permanent stop (shutdown or channel closed)
- `Err(msg)` = retry with backoff

Refactored adapters: telegram, discord, slack, irc, mattermost, revolt, matrix, mastodon, zulip, bluesky, twitch, guilded, gotify, gitter, discourse, ntfy, nostr, webex, twist, mumble, nextcloud, reddit, keybase, linkedin.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
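The contract above can be sketched synchronously (the real `run_supervised_loop` is presumably async and sleeps between retries; the field names and the recorded-waits return value are assumptions made for this illustration):

```rust
/// Assumed shape of the config described in the commit (defaults 1s/60s).
struct SupervisorConfig {
    initial_backoff_secs: u64,
    max_backoff_secs: u64,
}

impl Default for SupervisorConfig {
    fn default() -> Self {
        Self { initial_backoff_secs: 1, max_backoff_secs: 60 }
    }
}

/// Drives `adapter` until it reports a permanent stop, doubling the backoff
/// on each `Err` and capping it at `max_backoff_secs`. Backoff durations are
/// recorded instead of slept, purely for this sketch.
fn run_supervised_loop<F>(cfg: &SupervisorConfig, mut adapter: F) -> Vec<u64>
where
    F: FnMut() -> Result<bool, String>,
{
    let mut backoff = cfg.initial_backoff_secs;
    let mut waits = Vec::new();
    loop {
        match adapter() {
            Ok(true) => backoff = cfg.initial_backoff_secs, // clean reconnect
            Ok(false) => return waits,                      // permanent stop
            Err(_msg) => {
                waits.push(backoff); // real version: sleep(backoff) here
                backoff = (backoff * 2).min(cfg.max_backoff_secs);
            }
        }
    }
}

fn main() {
    // Adapter fails three times, then signals shutdown.
    let mut calls = 0;
    let waits = run_supervised_loop(&SupervisorConfig::default(), || {
        calls += 1;
        if calls < 4 { Err("transient".into()) } else { Ok(false) }
    });
    assert_eq!(waits, vec![1, 2, 4]); // exponential: 1s, 2s, 4s
    println!("ok");
}
```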
Break the single routes.rs god file containing 174 handler functions into domain-specific modules under routes/:

- agents.rs (1,551 lines, 24 handlers)
- channels.rs (1,220 lines, 7 handlers)
- hands.rs (668 lines, 11 handlers)
- models.rs (554 lines, 8 handlers)
- skills.rs (489 lines, 9 handlers)
- sessions.rs (384 lines, 10 handlers)
- common.rs (330 lines, shared types/helpers)
- plus 22 smaller domain modules

Backward compatibility is maintained via `pub use` re-exports in mod.rs, so server.rs and test files continue referencing `routes::handler_name` without changes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
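The re-export pattern can be sketched with inline modules standing in for the routes/*.rs files (handler names here are hypothetical):

```rust
mod routes {
    pub mod agents {
        pub fn list_agents() -> &'static str { "GET /agents" }
    }
    pub mod sessions {
        pub fn list_sessions() -> &'static str { "GET /sessions" }
    }

    // Flat re-exports preserve the old `routes::handler_name` paths
    // even though the handlers now live in submodules.
    pub use self::agents::*;
    pub use self::sessions::*;
}

fn main() {
    // Callers that predate the split keep compiling unchanged.
    assert_eq!(routes::list_agents(), "GET /agents");
    assert_eq!(routes::list_sessions(), "GET /sessions");
    println!("ok");
}
```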
Replace 12 instances of `id.to_string()[..8]` with `id.to_string().get(..8).unwrap_or(&id_str)` across kernel.rs (3) and channel_bridge.rs (9). While UUID Display always produces 36-char ASCII strings making [..8] currently safe, .get() is defensive against any future Display changes and is consistent with the floor_char_boundary() hardening in the rest of the codebase. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
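A minimal sketch of the defensive slice: `str::get` returns `None` instead of panicking when the range is out of bounds or lands mid-character (the helper name is hypothetical; a plain string stands in for `uuid::Uuid::to_string()`):

```rust
/// Falls back to the whole id if the first 8 bytes cannot be sliced safely.
fn short_id(id_str: &str) -> &str {
    id_str.get(..8).unwrap_or(id_str)
}

fn main() {
    // UUID Display is always 36 ASCII chars, so this is the normal path...
    assert_eq!(short_id("550e8400-e29b-41d4-a716-446655440000"), "550e8400");
    // ...but a shorter id no longer panics, unlike `&id_str[..8]`.
    assert_eq!(short_id("abc"), "abc");
    println!("ok");
}
```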
Replace hardcoded Whisper model names and API endpoints with LazyLock statics that read from environment variables at first use:

- GROQ_STT_MODEL (default: whisper-large-v3-turbo)
- GROQ_STT_URL (default: api.groq.com/openai/v1/audio/transcriptions)
- OPENAI_STT_MODEL (default: whisper-1)
- OPENAI_STT_URL (default: api.openai.com/v1/audio/transcriptions)

This allows users to swap STT models or providers without recompiling.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
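The likely shape of one such static (`std::sync::LazyLock` requires Rust 1.80+; the exact code is an assumption, only the variable name and default come from the commit):

```rust
use std::sync::LazyLock;

/// Env var wins; otherwise the documented default applies.
/// The closure runs exactly once, on first access.
static GROQ_STT_MODEL: LazyLock<String> = LazyLock::new(|| {
    std::env::var("GROQ_STT_MODEL")
        .unwrap_or_else(|_| "whisper-large-v3-turbo".to_string())
});

fn main() {
    // With GROQ_STT_MODEL unset in the environment, the default applies.
    assert_eq!(GROQ_STT_MODEL.as_str(), "whisper-large-v3-turbo");
    println!("ok");
}
```

Reading the variable lazily (rather than at startup) means the override works even when the config is loaded before the STT subsystem is first used.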
Tool-heavy agent sessions (web_search + web_fetch) bloat to 100K+ tokens because defaults are too generous: web_fetch returns 50K chars per call, the per-result budget cap allows 120K chars (30% of 200K context), and auto-compaction only triggers at 30 messages. With 5+ tool iterations, total context easily exceeds the model's window, causing 100+ second response times and downstream timeouts.

Four targeted changes (defense in depth):

1. web_fetch max_chars: 50,000 → 12,000
   12K chars after HTML→markdown is still a full article. The previous 50K default was far more than any LLM can usefully process per result.
2. Context budget ratios tightened:
   - per_result_cap: 30% → 8% (120K → 32K chars on 200K window)
   - single_result_max: 50% → 15% (200K → 60K chars)
   - total_headroom: 75% → 40% (300K → 160K chars)
3. MAX_HISTORY_MESSAGES: 20 → 12
   12 messages ≈ 3 tool iterations or 6 user/assistant exchanges. The previous value of 20 allowed 500K+ chars of tool results to accumulate.
4. Auto-compaction trigger: threshold 30 → 15, keep_recent 10 → 6
   Triggers LLM-based summarization earlier, preserving key context instead of blindly dropping messages when the window overflows.

Expected impact: estimated tokens per session drop from 75-125K to 10-15K, and response time from 100+ seconds to 15-25 seconds.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
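The char figures in change 2 are consistent with the ratios being applied to the window measured in chars at roughly 2 chars per token (200K tokens ≈ 400K chars). A quick check of that arithmetic, where the chars-per-token factor is inferred from the numbers, not taken from the code:

```rust
// Verify the before/after caps, assuming a 400K-char budget
// (200K-token window at an assumed ~2 chars per token).
fn cap(window_chars: usize, percent: usize) -> usize {
    window_chars * percent / 100
}

fn main() {
    let window_chars = 400_000;
    assert_eq!(cap(window_chars, 30), 120_000); // old per_result_cap
    assert_eq!(cap(window_chars, 8), 32_000);   // new per_result_cap
    assert_eq!(cap(window_chars, 15), 60_000);  // new single_result_max
    assert_eq!(cap(window_chars, 40), 160_000); // new total_headroom
    println!("ok");
}
```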
Summary
Tool-heavy agent sessions (`web_search` + `web_fetch`) bloat to 100K+ tokens and cause 100+ second response times because defaults are too generous. This PR tightens 4 parameters as defense in depth:

- `web_fetch` max_chars: 50,000 → 12,000 (12K after HTML→markdown is still a full article)
- MAX_HISTORY_MESSAGES: 20 → 12 (12 messages ≈ 3 tool iterations, sufficient for continuity)

Before vs After (200K context window)
Root cause
- `web_fetch` returns up to 50K chars per call
- `MAX_HISTORY_MESSAGES=20` trims by count, but 20 messages × 50K tool results = 500K+ chars

Test plan
- `web_search` + `web_fetch` session: confirm response time < 30s
- `[TRUNCATED:` markers appear in logs (context budget working)