---
layout: default
title: "Letta Tutorial - Chapter 2: Memory Architecture"
nav_order: 2
has_children: false
parent: Letta Tutorial
---
Welcome to Chapter 2: Memory Architecture in Letta. In this part of Letta Tutorial: Stateful LLM Agents, you will build an intuitive mental model first, then move into concrete implementation details and practical production tradeoffs.
Understand core memory, archival memory, and recall memory - the three pillars of persistent agent memory.
Letta's memory system is hierarchical and designed to give agents virtually unlimited context. This chapter explores the three types of memory and how they work together.
Core memory is the agent's "working memory" - the most important information that should always be accessible. It includes:
- Agent identity and persona
- Key facts about the user
- Current goals and context
- Critical instructions
```python
from letta import create_client

client = create_client()

# Get an agent's core memory
agent = client.get_agent("sam")
core_memory = agent.memory

print("Core Memory:")
for memory_block in core_memory:
    print(f"- {memory_block.name}: {memory_block.value}")
```

Archival memory is long-term storage for facts, events, and information that might be relevant later but isn't needed immediately. It's like the agent's "external hard drive".
```python
# Add to archival memory
client.add_to_archival_memory("sam", "John's favorite programming language is Python")

# Search archival memory
results = client.search_archival_memory("sam", "programming")
```

Recall memory contains recent conversation history and context. It's automatically managed and provides the immediate conversational context.
```python
# Get recent messages
messages = client.get_messages("sam", limit=10)
for msg in messages:
    print(f"{msg.role}: {msg.content}")
```

The three tiers stack into a hierarchy:

```
┌─────────────────┐
│   Core Memory   │ ← Always in context, high priority
│   (Identity,    │
│   Key Facts)    │
├─────────────────┤
│  Recall Memory  │ ← Recent conversation, auto-managed
│  (Last N turns) │
├─────────────────┤
│ Archival Memory │ ← Long-term storage, searchable
│ (Facts, Events) │
└─────────────────┘
```
When you send a message:
- Core memory is always included in the context
- Recall memory provides recent conversation history
- Archival memory is searched for relevant information
- The LLM generates a response
- New information is automatically stored in appropriate memory types
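The five steps above can be sketched as a single context-assembly function. This is an illustrative model only: the function name, data shapes, and tags are assumptions for the sketch, not Letta's actual internals.

```python
# Illustrative sketch of how a context window might be assembled from the
# three memory tiers. Everything here is hypothetical, not Letta's real code.

def assemble_context(core_blocks, recall_messages, archival_hits, recall_limit=10):
    """Combine the three memory tiers into one prompt context string."""
    parts = []

    # 1. Core memory is always included, highest priority.
    for name, value in core_blocks.items():
        parts.append(f"[core:{name}] {value}")

    # 2. Recall memory: only the most recent N turns.
    for msg in recall_messages[-recall_limit:]:
        parts.append(f"[recall] {msg['role']}: {msg['content']}")

    # 3. Archival memory: only the chunks retrieved by search.
    for hit in archival_hits:
        parts.append(f"[archival] {hit}")

    return "\n".join(parts)

context = assemble_context(
    core_blocks={"human": "Name: John", "persona": "Helpful assistant"},
    recall_messages=[{"role": "user", "content": "What should I drink?"}],
    archival_hits=["John prefers coffee over tea"],
)
print(context)
```

The ordering mirrors the priority rule in the list above: core memory first, then recent turns, then retrieved archival chunks.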
View what your agent knows:
```shell
# View core memory
letta get-agent --name sam

# Search archival memory
letta search-memory --name sam --query "python"

# View recent conversations
letta get-messages --name sam --limit 5
```

You can also update memory programmatically:

```python
# Update core memory blocks
client.update_memory_block("sam", "human", "Name: John, Occupation: Developer, Location: SF")

# Store important facts
client.add_to_archival_memory("sam", "John completed the Python certification on 2024-01-15")
client.add_to_archival_memory("sam", "John prefers dark mode in all applications")

# Semantic search
results = client.search_archival_memory("sam", "certification", top_k=3)
for result in results:
    print(f"Score: {result.score}, Content: {result.content}")
```

Letta automatically manages context to fit within LLM limits:
- Core memory: Always included (highest priority)
- Recall memory: Recent messages, truncated if needed
- Archival memory: Relevant chunks retrieved via search
For very long conversations, Letta can compress or summarize older recall memory to save space.
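That trimming behavior can be modeled with a small sketch. The heuristic (~4 characters per token) and the summary marker are assumptions for illustration; Letta's real compression is more sophisticated.

```python
# Hypothetical sketch of trimming recall memory to a token budget.
# The cost heuristic and summary marker are illustrative assumptions.

def trim_recall(messages, budget_tokens, tokens_per_msg=None):
    """Keep the newest messages whose rough token cost fits the budget;
    collapse everything older into a one-line summary marker."""
    est = tokens_per_msg or (lambda m: max(1, len(m) // 4))  # ~4 chars/token
    kept, used = [], 0
    for msg in reversed(messages):  # walk newest-first
        cost = est(msg)
        if used + cost > budget_tokens:
            break
        kept.append(msg)
        used += cost
    kept.reverse()
    dropped = len(messages) - len(kept)
    if dropped:
        kept.insert(0, f"[summary of {dropped} older messages]")
    return kept

history = ["hello there", "how are you?", "tell me about Letta memory tiers"]
print(trim_recall(history, budget_tokens=12))
```

Walking newest-first guarantees the most recent turns survive, which is the property that matters for conversational continuity.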
```shell
$ letta chat --name sam
Human: I prefer coffee over tea, and I'm allergic to nuts.
Assistant: I'll remember you prefer coffee and have a nut allergy. I'll keep this in mind for any food/drink recommendations.
Human: What should I drink in the morning?
Assistant: Based on what you've told me, you prefer coffee over tea. Would you like coffee recommendations?
```

Facts can also be stored explicitly:

```python
# Agent learns about the user's work
client.add_to_archival_memory("sam", "John works on machine learning projects at TechCorp")
client.add_to_archival_memory("sam", "John uses PyTorch for deep learning")
client.add_to_archival_memory("sam", "John is preparing for an ML conference talk")
# Later conversations will reference this knowledge
```

All memory is stored in a local database (SQLite by default) and persists across:
- Agent restarts
- System reboots
- Letta version updates
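If you want to verify persistence locally, you can inspect the SQLite file with Python's standard `sqlite3` module. The path shown is an assumption; check your Letta server configuration for the real location.

```python
# Quick persistence check using Python's standard sqlite3 module.
# The database path below is a guess -- adjust it to your installation.
import sqlite3
from pathlib import Path

def list_tables(db_path):
    """Return the table names in a SQLite database file (empty if absent)."""
    if not Path(db_path).exists():
        return []
    with sqlite3.connect(db_path) as conn:
        return [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type='table'")]

# Hypothetical default location; not guaranteed by Letta.
print(list_tables(Path.home() / ".letta" / "letta.db"))
```

Seeing the same tables before and after a restart is a simple way to confirm that agent state really lives on disk rather than in process memory.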
Core memory is organized into blocks:
```python
# View memory blocks
blocks = client.get_memory_blocks("sam")
for block in blocks:
    print(f"{block.name}: {block.value}")
```

For advanced users, you can create custom memory blocks and retrieval strategies.
- Core Memory: Keep only essential, frequently-used information
- Archival Memory: Store detailed facts, events, and preferences
- Regular Cleanup: Periodically review and clean up outdated information
- Structured Storage: Use consistent formats for better retrieval
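One way to apply the structured-storage tip is a small formatting helper that renders every fact in the same `subject | predicate | value | date` layout. The helper and the layout are assumptions, not part of Letta's API.

```python
# Hypothetical helper enforcing a consistent fact format for archival storage.
from datetime import date

def format_fact(subject, predicate, value, on=None):
    """Render a fact in a consistent, searchable layout."""
    stamp = (on or date.today()).isoformat()
    return f"{subject} | {predicate} | {value} | recorded {stamp}"

fact = format_fact("John", "prefers", "dark mode", on=date(2024, 1, 15))
print(fact)  # John | prefers | dark mode | recorded 2024-01-15
# client.add_to_archival_memory("sam", fact)
```

A fixed layout makes semantic search more reliable, because similar facts end up with similar surface forms.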
| Aspect | Traditional Chatbot | Letta Agent |
|---|---|---|
| Context | Limited to current conversation | Unlimited via hierarchical memory |
| Learning | None | Learns and remembers over time |
| Personalization | Basic | Deeply personalized experiences |
| Consistency | May contradict itself | Maintains consistent knowledge |
Next: Configure agent personalities and behavior with system prompts and models.
Most teams struggle here because the hard part is not writing more code, but drawing clear boundaries between core, recall, and archival memory so agent behavior stays predictable as complexity grows.
In practical terms, this chapter helps you avoid three common failures:
- coupling core logic too tightly to one implementation path
- missing the handoff boundaries between setup, execution, and validation
- shipping changes without a clear rollback or observability strategy
After working through this chapter, you should be able to reason about Chapter 2: Memory Architecture in Letta as an operating subsystem inside Letta Tutorial: Stateful LLM Agents, with explicit contracts for inputs, state transitions, and outputs.
Use the implementation notes around the client API, `add_to_archival_memory`, and the `letta` CLI as your checklist when adapting these patterns to your own repository.
Under the hood, Chapter 2: Memory Architecture in Letta usually follows a repeatable control path:
- Context bootstrap: initialize runtime config and prerequisites for the `client`.
- Input normalization: shape incoming data so the memory layer receives stable contracts.
- Core execution: run the main logic branch and propagate intermediate state through the agent's memory blocks.
- Policy and safety checks: enforce limits, auth scopes, and failure boundaries.
- Output composition: return canonical result payloads for downstream consumers.
- Operational telemetry: emit logs/metrics needed for debugging and performance tuning.
When debugging, walk this sequence in order and confirm each stage has explicit success/failure conditions.
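One hypothetical way to make those success/failure conditions explicit is a tiny stage-runner harness; the stage names and checks below are invented for illustration, not part of Letta.

```python
# Illustrative harness: run named stages in order, stopping at the first
# failure, so every stage has an explicit success/failure condition.

def run_stages(stages, state):
    """Run (name, fn) pairs in order; each fn returns (ok, new_state)."""
    for name, stage in stages:
        ok, state = stage(state)
        print(f"{name}: {'ok' if ok else 'FAILED'}")
        if not ok:
            return False, state
    return True, state

stages = [
    ("context bootstrap", lambda s: (True, {**s, "client": "ready"})),
    ("input normalization", lambda s: ("text" in s, s)),
    ("core execution", lambda s: (True, {**s, "reply": s["text"].upper()})),
]
ok, final = run_stages(stages, {"text": "hello"})
print(ok, final["reply"])
```

Wiring real checks into such a harness turns the debugging walk above from a mental exercise into a reproducible script.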
Use the following upstream sources to verify implementation details while reading this chapter:
- The Letta repository (github.com): authoritative reference for the client API and memory implementation.
- The Awesome Code Docs collection (github.com): the documentation index this tutorial belongs to.
Suggested trace strategy:
- search upstream code for `client` and `memory` to map concrete implementation paths
- compare docs claims against actual runtime/config code before reusing patterns in production