
fix: UTF-8 panics + refactor: channel supervisor & routes split #193

Open
slysian wants to merge 5 commits into RightNow-AI:main from slysian:pr/architecture-improvements

slysian commented Mar 2, 2026

Replace all panic-prone byte-index string slicing (&s[..N]) with
slicing bounded by str::floor_char_boundary() across 18 files in the runtime,
kernel, api, memory, channels, and cli crates.

Such slices panic when the byte index falls inside a multi-byte UTF-8
character (common with CJK text from QQ/Telegram users). The crash was
first observed when web_fetch returned Chinese web content that was
truncated in the middle of a 3-byte character.
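A minimal sketch of the difference, using hypothetical helpers rather than the PR's code. floor_boundary mirrors what str::floor_char_boundary does and ceil_boundary mirrors its counterpart ceil_char_boundary; both are written with the long-stable str::is_char_boundary so the sketch does not depend on the toolchain version. Head truncation rounds the cut down, while tail-keeping (as in the compactor) has to round the start index up.

```rust
/// Largest index <= `index` that lies on a char boundary of `s`.
/// Hypothetical stand-in for `str::floor_char_boundary`.
fn floor_boundary(s: &str, index: usize) -> usize {
    if index >= s.len() {
        return s.len();
    }
    let mut i = index;
    // Index 0 is always a boundary, so this loop terminates.
    while !s.is_char_boundary(i) {
        i -= 1;
    }
    i
}

/// Smallest index >= `index` that lies on a char boundary of `s`.
fn ceil_boundary(s: &str, index: usize) -> usize {
    if index >= s.len() {
        return s.len();
    }
    let mut i = index;
    // s.len() is always a boundary, so this loop terminates.
    while !s.is_char_boundary(i) {
        i += 1;
    }
    i
}

/// Keep at most `max_bytes` from the start without splitting a character.
fn truncate_head(s: &str, max_bytes: usize) -> &str {
    &s[..floor_boundary(s, max_bytes)]
}

/// Keep at most `max_bytes` from the end (tail-keeping) without splitting a character.
fn keep_tail(s: &str, max_bytes: usize) -> &str {
    let start = s.len().saturating_sub(max_bytes);
    &s[ceil_boundary(s, start)..]
}

fn main() {
    let s = "网页内容"; // 4 CJK characters, 3 bytes each = 12 bytes
    // `&s[..4]` would panic: byte 4 is inside the second character.
    assert!(!s.is_char_boundary(4));
    println!("{}", truncate_head(s, 4)); // 网
    println!("{}", keep_tail(s, 4)); // 容
}
```

Both helpers may return fewer than max_bytes bytes; that shortfall is the price of never cutting a character in half.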

Affected hot paths:

  • context_budget: tool result truncation
  • context_overflow: overflow recovery truncation
  • compactor: conversation text tail-keeping
  • stream_chunker: forced break at max_chunk_chars
  • web_fetch: HTTP response body truncation
  • kernel: session topic & identity file truncation
  • docker_sandbox: stdout/stderr truncation
  • tool_runner: command/url logging, canvas_id slicing
  • provider_health: error body truncation
  • subprocess_sandbox: command logging
  • session_repair: injection marker stripping (offsets found in the to_lowercase() copy do not always match byte offsets in the original)
  • cron/triggers: error message & content truncation
  • session (memory): thinking text truncation
  • TUI screens: ID/value display truncation
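The session_repair item is the least obvious one. Full Unicode lowercasing can change a string's byte length (for example, 'İ' U+0130 is 2 bytes, but its lowercase form "i\u{307}" is 3), so an offset found in text.to_lowercase() may be wrong, or not even a char boundary, in text. One possible fix, sketched here with a hypothetical strip_marker_ascii_ci helper and not necessarily what the PR does, is to search a to_ascii_lowercase() copy instead: it only rewrites ASCII letters, so every offset carries over byte-for-byte.

```rust
/// Remove every ASCII-case-insensitive occurrence of `marker` from `text`.
/// Hypothetical helper for illustration.
fn strip_marker_ascii_ci(text: &str, marker: &str) -> String {
    // to_ascii_lowercase leaves non-ASCII bytes untouched, so `haystack`
    // has exactly the same byte layout as `text`.
    let haystack = text.to_ascii_lowercase();
    let needle = marker.to_ascii_lowercase();
    if needle.is_empty() {
        return text.to_string(); // avoid an endless loop on an empty marker
    }
    let mut out = String::with_capacity(text.len());
    let mut pos = 0;
    while let Some(found) = haystack[pos..].find(&needle) {
        let start = pos + found;
        out.push_str(&text[pos..start]); // same offsets are valid in `text`
        pos = start + needle.len();
    }
    out.push_str(&text[pos..]);
    out
}

fn main() {
    let text = "İ[SYSTEM] 你好";
    // Full lowercasing grows the string by one byte ('İ' -> "i\u{307}").
    assert_eq!(text.len() + 1, text.to_lowercase().len());
    println!("{}", strip_marker_ascii_ci(text, "[system]"));
}
```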

Create supervisor.rs with centralized reconnection/backoff infrastructure
and refactor 23 channel adapters to use it, eliminating ~10,000 lines of
duplicated reconnection logic.

The supervisor module provides:

  • SupervisorConfig: configurable initial/max backoff (default 1s/60s)
  • run_supervised_loop(): generic supervised reconnection loop
  • run_supervised_loop_reset_on_connect(): variant that resets backoff
    after successful connection
  • DEFAULT_CHANNEL_BUFFER (256): shared constant replacing hardcoded sizes

Each adapter's inline reconnection loop is extracted into a standalone
async fn that returns Result<bool, String>:

  • Ok(true) = reconnect (transient failure)
  • Ok(false) = permanent stop (shutdown or channel closed)
  • Err(msg) = retry with backoff
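A synchronous sketch of how a supervised loop might consume this contract. The PR's functions are async and their exact signatures are not shown here, so the SupervisorConfig field names and run_supervised_loop_sketch are assumptions. The sketch also assumes that Err doubles the delay up to the maximum and that Ok(true) resets it to the initial value before reconnecting. Injecting sleep keeps it testable without real waiting.

```rust
use std::time::Duration;

/// Illustrative stand-in for the PR's SupervisorConfig (field names assumed).
#[derive(Clone, Copy)]
struct SupervisorConfig {
    initial_backoff: Duration,
    max_backoff: Duration,
}

impl Default for SupervisorConfig {
    fn default() -> Self {
        // Defaults stated in the PR description: 1s initial, 60s max.
        Self {
            initial_backoff: Duration::from_secs(1),
            max_backoff: Duration::from_secs(60),
        }
    }
}

/// `attempt` follows the contract: Ok(true) = reconnect,
/// Ok(false) = stop, Err(msg) = retry with backoff.
fn run_supervised_loop_sketch(
    config: SupervisorConfig,
    mut attempt: impl FnMut() -> Result<bool, String>,
    mut sleep: impl FnMut(Duration),
) {
    let mut backoff = config.initial_backoff;
    loop {
        match attempt() {
            Ok(false) => return, // shutdown or channel closed
            Ok(true) => {
                // Assumed: a transient drop after a good session starts over.
                backoff = config.initial_backoff;
                sleep(backoff);
            }
            Err(msg) => {
                eprintln!("adapter error: {msg}; retrying in {backoff:?}");
                sleep(backoff);
                backoff = (backoff * 2).min(config.max_backoff);
            }
        }
    }
}

fn main() {
    let mut outcomes = vec![
        Err("connect refused".to_string()),
        Err("timeout".to_string()),
        Ok(true),
        Err("reset".to_string()),
        Ok(false),
    ]
    .into_iter();
    let mut sleeps = Vec::new();
    run_supervised_loop_sketch(
        SupervisorConfig::default(),
        || outcomes.next().unwrap_or(Ok(false)),
        |d| sleeps.push(d.as_secs()),
    );
    assert_eq!(sleeps, vec![1, 2, 1, 1]);
}
```

Capping the delay keeps a long outage from spacing retries arbitrarily far apart, and the reset means one flaky reconnect does not inherit the backoff of an earlier outage.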

Refactored adapters: telegram, discord, slack, irc, mattermost, revolt,
matrix, mastodon, zulip, bluesky, twitch, guilded, gotify, gitter,
discourse, ntfy, nostr, webex, twist, mumble, nextcloud, reddit,
keybase, linkedin.

…ules

Break the single routes.rs god file containing 174 handler functions
into domain-specific modules under routes/:

  • agents.rs (1,551 lines, 24 handlers)
  • channels.rs (1,220 lines, 7 handlers)
  • hands.rs (668 lines, 11 handlers)
  • models.rs (554 lines, 8 handlers)
  • skills.rs (489 lines, 9 handlers)
  • sessions.rs (384 lines, 10 handlers)
  • common.rs (330 lines, shared types/helpers)
  • plus 22 smaller domain modules

Backward compatibility is maintained via pub use re-exports in
mod.rs, so server.rs and test files continue referencing
routes::handler_name without changes.
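The re-export approach can be sketched in a single file, with inline modules standing in for routes/agents.rs, routes/sessions.rs, and routes/common.rs. The handler names and the json_ok helper are hypothetical.

```rust
// In the real tree each inline `mod` would be its own file declared
// from routes/mod.rs; here they are inlined to stay self-contained.
mod routes {
    pub mod common {
        /// Shared helper used by several domain modules (hypothetical).
        pub fn json_ok(body: &str) -> String {
            format!("{{\"ok\":true,\"data\":{body}}}")
        }
    }

    pub mod agents {
        use super::common::json_ok;
        pub fn list_agents() -> String {
            json_ok("[]")
        }
    }

    pub mod sessions {
        use super::common::json_ok;
        pub fn list_sessions() -> String {
            json_ok("[]")
        }
    }

    // Glob re-exports keep the old flat paths (routes::list_agents) valid.
    pub use agents::*;
    pub use sessions::*;
}

fn main() {
    // Callers such as server.rs keep using the pre-split paths.
    println!("{}", routes::list_agents());
    println!("{}", routes::list_sessions());
}
```

Glob re-exports are the least invasive option; explicit `pub use agents::{...}` lists would make name collisions between modules a compile error instead of an ambiguity only reported on use.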

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

slysian and others added 5 commits March 2, 2026 13:54
Replace 12 instances of `id.to_string()[..8]` with
`id_str.get(..8).unwrap_or(&id_str)`, where `id_str` holds the
`id.to_string()` result, across kernel.rs (3) and channel_bridge.rs (9).

UUID's Display impl always produces a 36-character ASCII string, so `[..8]`
is currently safe; `.get()` is nonetheless defensive against future
Display changes and consistent with the `floor_char_boundary()` hardening
elsewhere in the codebase.
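A minimal sketch of the pattern, assuming `id_str` is the `id.to_string()` result bound to a local (the fallback borrows it, so it has to live in a variable). `FakeId` is a hypothetical stand-in for `uuid::Uuid` so the example needs no external crate.

```rust
use std::fmt;

/// Hypothetical stand-in for a UUID type with a Display impl.
struct FakeId(&'static str);

impl fmt::Display for FakeId {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(self.0)
    }
}

/// First 8 bytes of the Display output, or the whole string when byte 8
/// is out of range or not a char boundary. Never panics.
fn short_id(id: &impl fmt::Display) -> String {
    let id_str = id.to_string();
    // `get` returns None instead of panicking on a bad range.
    id_str.get(..8).unwrap_or(&id_str).to_string()
}

fn main() {
    let uuid_like = FakeId("67e55044-10b1-426f-9247-bb680e5fe0c8");
    println!("{}", short_id(&uuid_like)); // 67e55044
    // A short or non-ASCII Display output falls back instead of panicking.
    println!("{}", short_id(&FakeId("日本語")));
}
```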

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace hardcoded Whisper model names and API endpoints with LazyLock
statics that read from environment variables at first use:

- GROQ_STT_MODEL (default: whisper-large-v3-turbo)
- GROQ_STT_URL (default: api.groq.com/openai/v1/audio/transcriptions)
- OPENAI_STT_MODEL (default: whisper-1)
- OPENAI_STT_URL (default: api.openai.com/v1/audio/transcriptions)

This allows users to swap STT models or providers without recompiling.
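A minimal sketch of the pattern for the two model names, using `std::sync::LazyLock` (stable since Rust 1.80). The PR's exact statics may differ; the URL statics would follow the same shape.

```rust
use std::sync::LazyLock;

// Each closure runs once, on first access; later reads reuse the cached value.
// An unset (or non-Unicode) variable falls back to the documented default.
static GROQ_STT_MODEL: LazyLock<String> = LazyLock::new(|| {
    std::env::var("GROQ_STT_MODEL").unwrap_or_else(|_| "whisper-large-v3-turbo".to_string())
});

static OPENAI_STT_MODEL: LazyLock<String> = LazyLock::new(|| {
    std::env::var("OPENAI_STT_MODEL").unwrap_or_else(|_| "whisper-1".to_string())
});

fn main() {
    println!("groq model: {}", GROQ_STT_MODEL.as_str());
    println!("openai model: {}", OPENAI_STT_MODEL.as_str());
}
```

Because the value is read only once, changing the variable after the first transcription request has no effect until the process restarts.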

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>