Current agent loop
Simple single-turn ReAct loop (agent-runtime.ts):
- Capture screenshot → scale → send to model as fresh single-turn API call
- Model returns one action → execute via adb → append result to history
- Repeat until finish or max steps
What the model sees each step: system prompt + one screenshot + last 8 history strings (flat text, one line per step).
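The loop above can be sketched roughly as follows. This is an illustration, not the contents of agent-runtime.ts: the `Deps` interface, `buildStepInput`, `runSingleTurnLoop`, and the history line format are all hypothetical names assumed for the example. Only the overall shape (fresh call per step, one action per call, last 8 history lines) comes from the description.

```typescript
// Hypothetical types; the real ones in agent-runtime.ts may differ.
type Action = { name: string; args?: Record<string, unknown> };

interface StepInput {
  systemPrompt: string;
  screenshotBase64: string;
  recentHistory: string[]; // flat text, one line per step
}

// Assumed dependencies, injected so the sketch stays self-contained.
interface Deps {
  captureScreenshot(): Promise<string>;           // scaled screenshot, base64
  callModel(input: StepInput): Promise<Action>;   // fresh single-turn API call
  executeAction(action: Action): Promise<string>; // runs via adb, returns a one-line result
}

const HISTORY_WINDOW = 8;

// Everything the model sees on a given step: prompt, one screenshot, last 8 lines.
function buildStepInput(
  systemPrompt: string,
  screenshotBase64: string,
  history: string[],
): StepInput {
  return {
    systemPrompt,
    screenshotBase64,
    recentHistory: history.slice(-HISTORY_WINDOW),
  };
}

async function runSingleTurnLoop(
  deps: Deps,
  systemPrompt: string,
  maxSteps: number,
): Promise<string[]> {
  const history: string[] = [];
  for (let step = 1; step <= maxSteps; step++) {
    const screenshot = await deps.captureScreenshot();
    // No prior model responses are passed in, only the trimmed history lines.
    const action = await deps.callModel(
      buildStepInput(systemPrompt, screenshot, history),
    );
    if (action.name === "finish") {
      history.push(`${step}: finish`);
      break;
    }
    const result = await deps.executeAction(action);
    history.push(`${step}: ${action.name} -> ${result}`);
  }
  return history;
}
```

Note that anything older than the last 8 lines is simply dropped from the model's view, so on a long task the model loses sight of its earliest steps.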
Limitations
- No multi-turn conversation — each step is an independent API call, no memory of prior model responses
- No state accumulation — model must re-derive its plan every step from a screenshot and 8 terse history lines
- No tool call chaining — exactly one action per API round-trip
- Fixed context — same system prompt every step, no dynamic progress tracking
This works for simple tasks but is inefficient on complex multi-step tasks (e.g. GPT-5.2 needs ~30% more steps than Claude Sonnet 4.5 on the same task).
Proposal: Claude Agent SDK
Use Claude Agent SDK (Claude Code) as the agent driver. This would give us:
- Multi-turn conversation — model retains full context across steps
- Native tool use — agent SDK manages the tool call loop, retries, and error handling
- Built-in agentic patterns — planning, reflection, and recovery without manual prompt engineering
- Streaming — real-time progress instead of waiting for full responses
The current tool definitions (src/agent/tools.ts) are already compatible with the SDK's tool format.
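For contrast with the current loop, here is a minimal sketch of what multi-turn state with tool call chaining could look like. It deliberately does not use the Claude Agent SDK API: `Message`, `ToolCall`, `ModelReply`, `appendTurn`, and `runMultiTurn` are hypothetical names assumed for illustration, and the SDK would own this loop rather than us writing it by hand.

```typescript
// Hypothetical message shapes for illustration; NOT the Claude Agent SDK's types.
type ToolCall = { id: string; name: string; input: Record<string, unknown> };
type ModelReply = { text: string; toolCalls: ToolCall[] };
type Message =
  | { role: "user"; text: string }
  | { role: "assistant"; text: string; toolCalls: ToolCall[] }
  | { role: "tool"; toolCallId: string; result: string };

type Model = (transcript: readonly Message[]) => Promise<ModelReply>;
type ToolRunner = (call: ToolCall) => Promise<string>;

// Pure step: append the assistant reply and one result per tool call.
// The input transcript is not mutated.
function appendTurn(
  transcript: readonly Message[],
  reply: ModelReply,
  results: readonly string[],
): Message[] {
  const next: Message[] = [
    ...transcript,
    { role: "assistant", text: reply.text, toolCalls: reply.toolCalls },
  ];
  reply.toolCalls.forEach((call, i) => {
    next.push({ role: "tool", toolCallId: call.id, result: results[i] ?? "" });
  });
  return next;
}

async function runMultiTurn(
  task: string,
  model: Model,
  runTool: ToolRunner,
  maxTurns: number,
): Promise<Message[]> {
  let transcript: Message[] = [{ role: "user", text: task }];
  for (let turn = 0; turn < maxTurns; turn++) {
    // The model sees the whole transcript, including its own earlier replies.
    const reply = await model(transcript);
    const results: string[] = [];
    // Several tool calls may be chained within a single round-trip.
    for (const call of reply.toolCalls) {
      results.push(await runTool(call));
    }
    transcript = appendTurn(transcript, reply, results);
    if (reply.toolCalls.length === 0) break; // no tools requested: treat as done
  }
  return transcript;
}
```

The difference from the current design is that the transcript only grows: the model's own prior replies and every tool result stay in context, instead of being reduced to 8 history strings.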