---
layout: default
title: "Chapter 2: Architecture and Agent Loop"
nav_order: 2
parent: Goose Tutorial
---
Welcome to Chapter 2: Architecture and Agent Loop. In this part of Goose Tutorial: Extensible Open-Source AI Agent for Real Engineering Work, you will build an intuitive mental model first, then move into concrete implementation details and practical production tradeoffs.
This chapter explains how Goose turns requests into concrete engineering actions.
- understand Goose's core runtime components
- trace request -> tool call -> response loop behavior
- reason about context revision and token efficiency
- use this model to debug misbehavior faster
```mermaid
flowchart TD
    U[User] --> I["Interface\n(Desktop UI or CLI)"]
    I --> A["Agent\n(goose-cli / goose-server)"]
    A --> P["Provider\n(Anthropic, OpenAI, Ollama, etc.)"]
    P -->|tool_use requests| A
    A --> E["Extensions\n(MCP servers via goose-mcp)"]
    E -->|tool results| A
    A --> C["Conversation\n(context management + compaction)"]
    C --> A
    A --> I
```
| Component | Role | Practical Impact |
|---|---|---|
| Interface (Desktop/CLI) | Collects prompts, shows outputs | Determines operator experience and control surface |
| Agent | Runs orchestration loop | Handles provider calls, tool invocations, and retries |
| Extensions (MCP) | Expose capabilities as tools | Enable shell, file, API, browser, memory, and more |
- user submits task
- Goose sends task + available tools to model
- model requests tool calls when needed
- Goose executes tool calls and returns results
- Goose revises context for relevance/token limits
- model returns answer or next action request
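The loop above can be sketched in a few lines of Rust. Everything here is an illustrative stand-in, not Goose's actual API: the model stub answers once it sees a tool result, standing in for a real provider call.

```rust
// Minimal sketch of the request -> tool call -> response loop.
// All types and strings are illustrative stand-ins, not Goose's real API.

#[derive(Debug)]
enum ModelReply {
    ToolCall { name: String, args: String },
    Answer(String),
}

// Stand-in for a provider call: answers directly once a tool result is present.
fn call_model(history: &[String]) -> ModelReply {
    if history.iter().any(|m| m.starts_with("tool_result:")) {
        ModelReply::Answer("done: 2 files matched".to_string())
    } else {
        ModelReply::ToolCall { name: "shell".into(), args: "grep -r TODO".into() }
    }
}

// Stand-in for an MCP extension executing a tool.
fn run_tool(name: &str, args: &str) -> String {
    format!("tool_result: {name} {args} -> 2 matches")
}

fn agent_loop(task: &str, max_turns: usize) -> String {
    let mut history = vec![format!("user: {task}")];
    for _ in 0..max_turns {
        match call_model(&history) {
            ModelReply::ToolCall { name, args } => {
                // Execute the tool and append the result so the model can see it.
                history.push(run_tool(&name, &args));
            }
            ModelReply::Answer(text) => return text,
        }
    }
    "max turns reached".to_string()
}

fn main() {
    println!("{}", agent_loop("find TODOs", 5));
}
```

The essential shape is the `max_turns`-bounded loop: every tool result re-enters the model's context until the model produces a plain text answer.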
- if outputs degrade, inspect tool surface and context length first
- if execution stalls, isolate whether provider/tool/permission is blocking
- if costs spike, tune context strategy and tool verbosity
Goose treats many execution failures as recoverable signals to the model:
- malformed tool arguments
- unavailable tools
- command failures
This makes multi-step workflows more resilient than simple one-shot prompting. The model sees the error text as a tool result and can retry with corrected arguments or choose a different approach.
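That recovery path can be pictured like this (hypothetical types; Goose's real error handling is richer): a failed tool call is converted into an ordinary tool result string rather than aborting the loop.

```rust
// Illustrative only: how an execution error becomes a normal tool result
// that the model can read and react to.

enum ToolOutcome {
    Ok(String),
    Err(String), // error text, still sent back to the model
}

fn execute(name: &str, args: &str) -> ToolOutcome {
    match name {
        "read_file" if args.is_empty() => ToolOutcome::Err("missing argument: path".into()),
        "read_file" => ToolOutcome::Ok(format!("contents of {args}")),
        _ => ToolOutcome::Err(format!("unknown tool: {name}")),
    }
}

// The agent wraps either outcome as a tool_result message; the model
// sees the error text and can retry with corrected arguments.
fn as_tool_result(outcome: ToolOutcome) -> String {
    match outcome {
        ToolOutcome::Ok(s) => format!("tool_result: {s}"),
        ToolOutcome::Err(e) => format!("tool_result (error): {e}"),
    }
}

fn main() {
    // First attempt: malformed arguments -> recoverable signal, not a crash.
    println!("{}", as_tool_result(execute("read_file", "")));
    // Retry with corrected arguments.
    println!("{}", as_tool_result(execute("read_file", "src/main.rs")));
}
```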
The codebase is organized as a Rust workspace with clear separation between layers:
| Crate | Path | Role |
|---|---|---|
| `goose-cli` | `crates/goose-cli/` | Binary, interactive session, headless run |
| `goose-server` | `crates/goose-server/` | HTTP server for desktop app integration |
| `goose-mcp` | `crates/goose-mcp/` | Built-in MCP extensions (memory, computer controller, etc.) |
| `goose-acp` | `crates/goose-acp/` | Agent Control Protocol server implementation |
| `goose-sdk` | `crates/goose-sdk/` | Public SDK for embedding Goose in other tools |
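A workspace of this shape is typically declared in a root `Cargo.toml` like the following. This is a sketch for orientation only; the actual member list and settings in the Goose repository may differ.

```toml
[workspace]
members = [
    "crates/goose",        # core Agent, Conversation, provider abstractions
    "crates/goose-cli",
    "crates/goose-server",
    "crates/goose-mcp",
    "crates/goose-acp",
    "crates/goose-sdk",
]
resolver = "2"
```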
The desktop application communicates with `goose-server` over a local HTTP connection. The CLI uses `goose-cli` directly. Both ultimately call the same `Agent` type from the core `goose` crate, so behavior is consistent across surfaces.
The `/reply` endpoint in `goose-server` emits structured telemetry at loop completion:
- turn count and total token usage
- tool call count per session
- session duration
- provider and model name
This telemetry is logged to the sessions directory. In production, these logs are the primary data source for cost attribution and debugging runaway sessions.
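As an illustration of what such a record carries, here is a hypothetical telemetry struct and log line. The field names and format are assumptions for this sketch, not Goose's actual schema.

```rust
// Hypothetical loop-completion telemetry record; field names are assumed,
// not Goose's actual schema.
struct LoopTelemetry {
    turns: u32,
    total_tokens: u64,
    tool_calls: u32,
    duration_secs: f64,
    provider: String,
    model: String,
}

impl LoopTelemetry {
    // Render one log line of the kind you'd grep in the sessions directory.
    fn log_line(&self) -> String {
        format!(
            "loop_complete turns={} tokens={} tool_calls={} duration_s={:.1} provider={} model={}",
            self.turns, self.total_tokens, self.tool_calls,
            self.duration_secs, self.provider, self.model
        )
    }
}

fn main() {
    let t = LoopTelemetry {
        turns: 4,
        total_tokens: 18_250,
        tool_calls: 6,
        duration_secs: 42.7,
        provider: "anthropic".into(),
        model: "claude-sonnet".into(),
    };
    println!("{}", t.log_line());
}
```

Flat key=value lines like this are easy to aggregate per session for cost attribution.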
Goose uses two mechanisms to keep context within model limits:
- Auto-compaction — triggered near `GOOSE_AUTO_COMPACT_THRESHOLD`; the agent rewrites the conversation history with a summarized version, retaining the most recent and most relevant turns
- Fallback strategies — configured via `GOOSE_CONTEXT_STRATEGY`; options include `summarize` (automatic summary) and `prompt` (pause and ask the user how to proceed)
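How an agent might consult these settings can be sketched as follows. The default values here (`summarize`, `0.8`) are assumptions for illustration, not Goose's documented defaults.

```rust
use std::env;

// Read GOOSE_CONTEXT_STRATEGY, falling back to "summarize" when unset.
// The fallback value is an assumption for this sketch.
fn context_strategy() -> String {
    env::var("GOOSE_CONTEXT_STRATEGY").unwrap_or_else(|_| "summarize".to_string())
}

// Decide whether auto-compaction should fire, given current usage and a
// threshold expressed as a fraction of the context window.
fn should_compact(tokens_used: u64, context_window: u64, threshold: f64) -> bool {
    (tokens_used as f64) / (context_window as f64) >= threshold
}

fn main() {
    let threshold: f64 = env::var("GOOSE_AUTO_COMPACT_THRESHOLD")
        .ok()
        .and_then(|v| v.parse().ok())
        .unwrap_or(0.8); // assumed default, for illustration only

    println!("strategy = {}", context_strategy());
    // 150k of a 200k window is 75% -> below an 80% threshold.
    println!("compact now? {}", should_compact(150_000, 200_000, threshold));
}
```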
The `display_context_usage()` call at the top of each loop iteration shows how far you are from the threshold before you send the next prompt.
Walking through a single turn in detail helps map the architecture to observable behavior:
1. User submits prompt — typed in the CLI editor or sent to the `/reply` endpoint
2. `CliSession.handle_input()` dispatches — validates input, handles slash commands, or passes to the agent
3. `Agent.reply()` called — packages the `Conversation` (recent messages + system prompt) and available tools into a provider request
4. Provider call made — sent to the configured LLM; streamed response begins arriving
5. Model requests tool calls — if the model emits `tool_use` blocks, the agent queues them
6. Permission check — `PermissionManager` enforces `GooseMode` for each tool; prompts if needed
7. Tool executed — MCP extension handles the call and returns a result (or error)
8. Result appended to `Conversation` — as a `tool_result` message
9. Loop back to step 3 — agent continues until the model returns a text response or max turns hit
10. Context usage displayed — token count shown at the next prompt
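The permission check in the walkthrough can be pictured like this. The mode names follow commonly described Goose operating modes (auto-run, approve-each-call, chat-only), but this enum and the decision logic are a simplification, not Goose's real `PermissionManager`.

```rust
// Simplified picture of the permission gate; not Goose's real PermissionManager.

#[derive(Clone, Copy, PartialEq, Debug)]
enum GooseMode {
    Auto,    // run tools without asking
    Approve, // ask the user before each tool call
    Chat,    // never run tools
}

#[derive(PartialEq, Debug)]
enum Decision {
    Run,
    AskUser,
    Deny,
}

fn check_permission(mode: GooseMode) -> Decision {
    match mode {
        GooseMode::Auto => Decision::Run,
        GooseMode::Approve => Decision::AskUser,
        GooseMode::Chat => Decision::Deny,
    }
}

fn main() {
    for mode in [GooseMode::Auto, GooseMode::Approve, GooseMode::Chat] {
        println!("{:?} -> {:?}", mode, check_permission(mode));
    }
}
```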
Goose exposes an Agent Control Protocol (ACP) server via `crates/goose-acp/`. ACP is a session-oriented protocol that allows external clients — including the desktop UI, IDE plugins, and custom integrations — to create and manage agent sessions over a standardized interface. The key struct is `GooseAcpSession`, which bridges ACP's session model with Goose's internal `Session` and `Thread` types. This is the abstraction that makes it possible to drive Goose from multiple surfaces without duplicating agent logic.
When behavior is unexpected, the most useful debugging signals are:
- Token usage display — visible at the top of each interactive prompt; if you're near the limit, compaction or truncation may have dropped relevant context
- Tool call logs — each tool invocation is logged in the session file; export to Markdown to read the full call sequence
- `--debug` flag — disables output truncation so you see full tool inputs and outputs in the terminal
- `goose session diagnostics` — generates a ZIP with all session data for deep investigation
You now have an operator-level mental model for Goose's execution loop and error paths.
Next: Chapter 3: Providers and Model Routing
The `CliSession` struct in `crates/goose-cli/src/session/mod.rs` is the heart of every interactive Goose session:

```rust
pub struct CliSession {
    agent: Agent,
    messages: Conversation,
    session_id: String,
    completion_cache: Arc<std::sync::RwLock<CompletionCache>>,
    debug: bool,
    run_mode: RunMode,
    scheduled_job_id: Option<String>,
    max_turns: Option<u32>,
    edit_mode: Option<EditMode>,
    retry_config: Option<RetryConfig>,
    output_format: String,
}
```

Its `interactive()` method runs the core loop:
```rust
loop {
    self.display_context_usage().await?;
    output::run_status_hook("waiting");
    let input = input::get_input(&mut editor, ...)?;
    if matches!(input, InputResult::Exit) {
        break;
    }
    self.handle_input(input, &history_manager, &mut editor).await?;
}
```

Each iteration: display token usage, read input, dispatch to `handle_input()`, which calls the Agent, streams tool invocations back, and writes results to the `Conversation`. This is the loop described conceptually in the chapter's "Interactive Loop" section.
When using the desktop app or API, the `/reply` SSE endpoint in `crates/goose-server/src/routes/reply.rs` mirrors the CLI loop over HTTP. It accepts a `ChatRequest` and streams `MessageEvent` variants — `Message`, `Error`, `Finish`, `Notification`, `UpdateConversation`, and `Ping` — back to the client. The implementation uses `tokio::select!` to multiplex cancellation signals, heartbeats (every 500ms), and agent stream responses in the same loop. This is how the agent loop maps to both CLI and desktop surfaces without duplicating logic.
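The event variants named above might be modeled as a plain enum rendered into SSE frames. The payload shapes and event names in the frames below are assumptions for this sketch, not the server's actual wire format.

```rust
// Illustrative shape of the /reply event stream; payloads are simplified
// assumptions, not goose-server's actual MessageEvent definition.

#[derive(Debug)]
enum MessageEvent {
    Message { content: String },
    Error { message: String },
    Finish { reason: String },
    Notification { text: String },
    UpdateConversation,
    Ping, // heartbeat keeps the SSE connection alive
}

// Render an event as an SSE frame: "event: <name>\ndata: <payload>\n\n".
fn to_sse(event: &MessageEvent) -> String {
    match event {
        MessageEvent::Message { content } => format!("event: message\ndata: {content}\n\n"),
        MessageEvent::Error { message } => format!("event: error\ndata: {message}\n\n"),
        MessageEvent::Finish { reason } => format!("event: finish\ndata: {reason}\n\n"),
        MessageEvent::Notification { text } => format!("event: notification\ndata: {text}\n\n"),
        MessageEvent::UpdateConversation => "event: update_conversation\ndata: {}\n\n".into(),
        MessageEvent::Ping => "event: ping\ndata: {}\n\n".into(),
    }
}

fn main() {
    println!("{}", to_sse(&MessageEvent::Ping));
    println!("{}", to_sse(&MessageEvent::Finish { reason: "stop".into() }));
}
```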
```mermaid
flowchart TD
    A["User input (CLI or desktop)"]
    B["CliSession.interactive()\nor /reply SSE endpoint"]
    C["Agent: sends messages + tools to LLM"]
    D["Tool execution via Extensions (MCP)"]
    E["Conversation updated\nContext usage tracked"]
    A --> B
    B --> C
    C --> D
    D --> E
    E --> B
```