Builtin read tool panics on UTF-8 character boundary at 2000-byte truncation #21
Open
Description
Summary
The builtin read tool panics when its 2000-byte truncation point (byte index 2000) falls inside a multibyte UTF-8 character instead of on a character boundary.
This is reproducible through the Python API with a completely generic local setup. It does not require MCP servers, project files, or private data.
Minimal repro
from pathlib import Path
from tempfile import TemporaryDirectory

from a3s_code import Agent

with TemporaryDirectory(prefix="a3s-read-repro-") as tmp_dir:
    workspace = Path(tmp_dir)
    boundary_file = workspace / "boundary.txt"
    # 1999 ASCII bytes + one 3-byte UTF-8 char + trailing ASCII.
    # This places byte index 2000 inside a multibyte code point.
    boundary_file.write_text("a" * 1999 + "频" + "z" * 20, encoding="utf-8")
    agent = Agent.create("/path/to/your/working/config.hcl")
    session = agent.session(str(workspace), permissive=True)
    session.tool("read", {"file_path": str(boundary_file)})
Steps to reproduce
- Create any valid local a3s-code config.
- Run the script above.
- Observe the builtin read tool panic.
Expected behavior
The read tool should either:
- return a valid truncated UTF-8 string, or
- return a normal tool error
but it should not panic the runtime.
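The first option, returning a valid truncated UTF-8 string, can be sketched as follows. This is only an illustration in Python of the general technique (back the cut point up to the nearest character boundary); it is not the code in read.rs, and the helper name truncate_utf8 and its max_bytes parameter are hypothetical. A Rust fix would presumably do the equivalent with a boundary check such as str::is_char_boundary before slicing.

```python
def truncate_utf8(text: str, max_bytes: int) -> str:
    """Return the longest prefix of `text` whose UTF-8 encoding fits in `max_bytes`.

    Hypothetical helper for illustration; not part of a3s_code.
    """
    if max_bytes < 0:
        raise ValueError("max_bytes must be non-negative")

    data = text.encode("utf-8")
    if len(data) <= max_bytes:
        return text  # Nothing to truncate.

    cut = max_bytes
    # UTF-8 continuation bytes have the bit pattern 10xxxxxx. If the byte at
    # the cut point is a continuation byte, the cut is inside a character,
    # so move it back until it sits on the first byte of a character.
    while cut > 0 and (data[cut] & 0xC0) == 0x80:
        cut -= 1

    return data[:cut].decode("utf-8")


# Same content as the repro file: the cut at byte 2000 lands inside "频".
content = "a" * 1999 + "频" + "z" * 20
truncated = truncate_utf8(content, 2000)
```

With the repro content, the cut moves back from byte 2000 to byte 1999, so the result is the 1999 ASCII characters and the partial character is dropped rather than split.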
Actual behavior
The runtime panics with:
thread '<unnamed>' panicked at .../core/src/tools/builtin/read.rs:99:22:
byte index 2000 is not a char boundary; it is inside '频' (bytes 1999..2002)
...
pyo3_runtime.PanicException: byte index 2000 is not a char boundary; it is inside '频' ...
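The byte positions in the panic message can be checked without a3s_code. The sketch below rebuilds the repro content in Python and applies the same boundary rule that Rust string slicing enforces (an index is valid if it is 0, the length, or does not point at a continuation byte). The function name is_char_boundary is borrowed from Rust for readability; the Python helper itself is hypothetical.

```python
def is_char_boundary(buf: bytes, index: int) -> bool:
    """Return True if `index` is a valid UTF-8 slicing point in `buf`."""
    if index == 0 or index == len(buf):
        return True
    if index > len(buf):
        return False
    # Continuation bytes (10xxxxxx) are never the start of a character.
    return (buf[index] & 0xC0) != 0x80


# Same content as the repro file.
data = ("a" * 1999 + "频" + "z" * 20).encode("utf-8")

encoded_char = "频".encode("utf-8")
char_start = data.index(encoded_char)       # first byte of the character
char_end = char_start + len(encoded_char)   # one past its last byte

print(f"'频' occupies bytes {char_start}..{char_end}")
print("index 2000 is a char boundary:", is_char_boundary(data, 2000))
```

The character starts at byte 1999 and ends before byte 2002, which matches the range reported in the panic, and index 2000 fails the boundary check.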
Notes
- This appears to be a byte-slicing vs UTF-8 character-boundary bug in builtin read.
- The issue is independent of my application code.
- A temporary workaround on my side is to avoid feeding non-ASCII intermediate files to the builtin read tool.