Improve agent loop: multi-turn context and Claude Agent SDK #4

@baryhuang

Current agent loop

Simple single-turn ReAct loop (agent-runtime.ts):

  1. Capture screenshot → scale → send to model as fresh single-turn API call
  2. Model returns one action → execute via adb → append result to history
  3. Repeat until finish or max steps

What the model sees each step: system prompt + one screenshot + last 8 history strings (flat text, one line per step).
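For reference, the loop above can be sketched roughly as follows. This is a simplified illustration, not the actual code in agent-runtime.ts: the type names, the `Deps` interface, and the history line format are assumptions made up for the example. Only the overall shape (fresh call per step, one action per call, last 8 history lines) comes from the description above.

```typescript
// Hypothetical shapes for illustration; the real types in agent-runtime.ts may differ.
type Action = { name: string; args: Record<string, unknown> };

interface StepInput {
  system: string;
  screenshotPng: Uint8Array;
  history: string[]; // flat text, one line per prior step
}

const HISTORY_WINDOW = 8;

// Everything the model sees for one step: nothing else carries over between calls.
function buildStepInput(
  system: string,
  screenshotPng: Uint8Array,
  history: string[],
): StepInput {
  return { system, screenshotPng, history: history.slice(-HISTORY_WINDOW) };
}

// Injected dependencies (assumed names) so the sketch does not depend on adb or a model API.
interface Deps {
  capture: () => Promise<Uint8Array>; // screenshot, already scaled
  callModel: (input: StepInput) => Promise<Action>; // fresh single-turn call
  execute: (action: Action) => Promise<string>; // e.g. run the action via adb
}

async function runSingleTurnLoop(
  system: string,
  maxSteps: number,
  deps: Deps,
): Promise<string[]> {
  const history: string[] = [];
  for (let step = 1; step <= maxSteps; step++) {
    const shot = await deps.capture();
    const action = await deps.callModel(buildStepInput(system, shot, history));
    if (action.name === "finish") break;
    const result = await deps.execute(action);
    history.push(`step ${step}: ${action.name} -> ${result}`);
  }
  return history;
}
```

Note that the model's own reasoning from earlier steps never reaches `buildStepInput`; only the terse result lines do.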

Limitations

  • No multi-turn conversation — each step is an independent API call, no memory of prior model responses
  • No state accumulation — model must re-derive its plan every step from a screenshot and 8 terse history lines
  • No tool call chaining — exactly one action per API round-trip
  • Fixed context — same system prompt every step, no dynamic progress tracking

This works for simple tasks but is inefficient on complex multi-step tasks (e.g. GPT-5.2 needs ~30% more steps than Claude Sonnet 4.5 on the same task).

Proposal: Claude Agent SDK

Use the Claude Agent SDK (built on the same agent harness that powers Claude Code) as the agent driver. This would give us:

  • Multi-turn conversation — model retains full context across steps
  • Native tool use — agent SDK manages the tool call loop, retries, and error handling
  • Built-in agentic patterns — planning, reflection, and recovery without manual prompt engineering
  • Streaming — real-time progress instead of waiting for full responses
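To make the multi-turn point concrete, here is a minimal sketch of the kind of state that accumulates when assistant replies and tool results stay in the conversation. This is not the SDK's API (the SDK would manage this internally); `Conversation`, `Message`, `ToolCall`, and `runTurn` are hypothetical names used only to show the difference from a flat 8-line history.

```typescript
// Hypothetical message shapes; NOT the Claude Agent SDK's types.
type ToolCall = { id: string; name: string; args: Record<string, unknown> };

type Message =
  | { role: "user"; text: string }
  | { role: "assistant"; text: string; toolCalls: ToolCall[] }
  | { role: "tool"; toolCallId: string; result: string };

class Conversation {
  private messages: Message[] = [];

  constructor(task: string) {
    this.messages.push({ role: "user", text: task });
  }

  // Keep the model's own reply (plan, reasoning) so later turns can build on it.
  recordAssistant(text: string, toolCalls: ToolCall[]): void {
    this.messages.push({ role: "assistant", text, toolCalls });
  }

  recordToolResult(toolCallId: string, result: string): void {
    this.messages.push({ role: "tool", toolCallId, result });
  }

  // The full transcript is what gets sent on the next model call.
  snapshot(): readonly Message[] {
    return this.messages;
  }
}

// One round-trip may carry several tool calls; each result is appended in order.
function runTurn(
  conv: Conversation,
  reply: { text: string; toolCalls: ToolCall[] },
  execute: (call: ToolCall) => string,
): void {
  conv.recordAssistant(reply.text, reply.toolCalls);
  for (const call of reply.toolCalls) {
    conv.recordToolResult(call.id, execute(call));
  }
}
```

Compared with the current loop, the plan in the assistant's text survives to the next step, and two actions (say, a tap followed by typing) can share one round-trip. The trade-off is that the transcript grows with every step, so some form of truncation or summarization would still be needed for long tasks.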

The current tool definitions (src/agent/tools.ts) are already compatible with the SDK's tool format.
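As an illustration of what "compatible" could mean here, the sketch below assumes the definitions follow the name / description / JSON Schema `input_schema` shape used for tools in the Anthropic Messages API. The `tap` tool, the `ToolDefinition` interface, and `validateArgs` are hypothetical and are not taken from src/agent/tools.ts.

```typescript
// Hypothetical tool definition; the real entries in src/agent/tools.ts may differ.
interface ToolDefinition {
  name: string;
  description: string;
  input_schema: {
    type: "object";
    properties: Record<
      string,
      { type: "string" | "number" | "boolean"; description?: string }
    >;
    required: string[];
  };
}

const tapTool: ToolDefinition = {
  name: "tap",
  description: "Tap the screen at pixel coordinates (x, y).",
  input_schema: {
    type: "object",
    properties: {
      x: { type: "number", description: "Horizontal pixel coordinate" },
      y: { type: "number", description: "Vertical pixel coordinate" },
    },
    required: ["x", "y"],
  },
};

// Minimal argument check for primitive types only; not a full JSON Schema validator.
function validateArgs(
  tool: ToolDefinition,
  args: Record<string, unknown>,
): string[] {
  const errors: string[] = [];
  for (const key of tool.input_schema.required) {
    if (!(key in args)) errors.push(`missing required argument: ${key}`);
  }
  for (const [key, value] of Object.entries(args)) {
    const prop = tool.input_schema.properties[key];
    if (!prop) errors.push(`unknown argument: ${key}`);
    else if (typeof value !== prop.type) {
      errors.push(`argument ${key} should be ${prop.type}`);
    }
  }
  return errors;
}
```

If the existing definitions already look like this, wiring them into the SDK should mostly be a matter of attaching the adb handlers. That still needs to be verified against the SDK's documentation.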
