Environment
- Spacebot Version: v0.2.1 (pre-built binary from ghcr.io/spacedriveapp/spacebot:latest)
- OS: Ubuntu 22.04 LTS, x86_64 (bare-metal, systemd service)
- Language in use: Chinese (CJK characters in memory content)
Description
A tokio worker thread panics in cortex_chat.rs when the cortex processes memory content containing CJK (Chinese/Japanese/Korean) characters. The panic is caused by slicing a UTF-8 string at a fixed byte offset (200) that falls inside a multi-byte character.
The runtime is resilient: the panicking thread is isolated and the process continues running. However, the panic fires again every time cortex bulletin generation is triggered with CJK memory content.
Panic Message
thread 'tokio-runtime-worker' panicked at /build/src/agent/cortex_chat.rs:92:37:
byte index 200 is not a char boundary; it is inside '小' (bytes 199..202) of
`{"memories":[{"id":"...","content":"用户 小明 是一位中文开发者,正在测试 Spacebot 的功能。
他安装了 hacker-news 技能,希望我能从 Hacker `[...]
Root Cause
cortex_chat.rs:92 performs a direct byte-index slice (likely &s[..200] or similar) on a JSON-serialized memory string without checking for char boundaries. Most CJK characters occupy 3 bytes in UTF-8, so inside a run of CJK text only one byte offset in three is a valid char boundary, and truncating at a fixed byte offset is more likely than not to split a character.
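The byte arithmetic in the panic message can be reproduced in isolation. The sketch below is not taken from the Spacebot source: `can_slice_at` and `sample_with_split_char` are hypothetical names, and the sample string is constructed only to place '小' at bytes 199..202, as in the panic output. It uses `str::get`, the non-panicking counterpart of index slicing, to probe which offsets are valid.

```rust
/// Returns `true` if `&s[..idx]` would succeed and `false` if it would panic.
/// `str::get` returns `None` when `idx` is out of range or not on a char boundary.
fn can_slice_at(s: &str, idx: usize) -> bool {
    s.get(..idx).is_some()
}

/// Hypothetical sample: 199 ASCII bytes followed by two 3-byte characters,
/// so that '小' occupies bytes 199..202 and '明' occupies bytes 202..205.
fn sample_with_split_char() -> String {
    let mut s = "a".repeat(199);
    s.push('小');
    s.push('明');
    s
}
```

With this input, offsets 199 and 202 are valid cut points, while 200 and 201 fall inside '小', so `&s[..200]` panics with the same "not a char boundary" message.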
Suggested Fix
Replace the fixed byte-index slice with a char-boundary-safe equivalent, e.g.:
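// Instead of: &s[..200]
// floor_char_boundary clamps to s.len(), so shorter strings are kept whole.
let end = s.floor_char_boundary(200);
&s[..end]
Or, since floor_char_boundary may be unavailable on stable toolchains (see the E0658 round_char_boundary errors under Related), a stable alternative:
// Candidate cut points: the start of every char, plus s.len() so that a
// string of 200 bytes or fewer is not cut before its last char.
let end = s.char_indices()
    .map(|(i, _)| i)
    .chain(std::iter::once(s.len()))
    .take_while(|&i| i <= 200)
    .last()
    .unwrap_or(0);
&s[..end]
Related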
- "Build Failure: Multiple E0658 (round_char_boundary) and E0004 (non-exhaustive match) in v0.1.8" (#128) mentions that PR #49 ("fix: prevent panic in split_message on multibyte UTF-8 char boundaries") addressed some UTF-8 boundary issues, but cortex_chat.rs appears to have been missed.
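For whoever picks this up, the truncation from Suggested Fix could also be wrapped in a small helper that is easy to cover with a regression test. This is only a sketch: the name `truncate_at_char_boundary` and its signature are hypothetical and do not come from the Spacebot codebase. It walks back from the byte limit with `str::is_char_boundary`, which is available on stable Rust.

```rust
/// Hypothetical helper: returns the longest prefix of `s` that is at most
/// `max_bytes` bytes long and ends on a UTF-8 character boundary.
fn truncate_at_char_boundary(s: &str, max_bytes: usize) -> &str {
    // Nothing to cut if the whole string already fits.
    if s.len() <= max_bytes {
        return s;
    }
    let mut end = max_bytes;
    // Step back until we reach a boundary. Index 0 is always a boundary,
    // so the loop terminates without underflow.
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    &s[..end]
}
```

A call such as `truncate_at_char_boundary(&serialized, 200)` would then take the place of the fixed `&s[..200]` slice, where `serialized` stands for whatever string cortex_chat.rs is truncating.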