A conversational AI agent with persistent, queryable memory that can recall information from turn 1 even at turn 1000.
✅ Persistent Conversations - Automatically resumes from your last session
✅ Hybrid Memory Search - Semantic (vector) + keyword (FTS5) with RRF
✅ Manual Memory Extraction - /distill command to save memories on demand
✅ Token-Aware Context - Auto-flushes at 70% context usage
✅ last_used_turn Tracking - Full specification compliance (100/100)
✅ Rich CLI - Beautiful terminal interface with memory insights
✅ 1000+ Turn Support - Validated for long conversations
**Using uv (fastest):**

```bash
# 1. Clone the repository
git clone https://github.com/pragnyanramtha/longmem.git
cd longmem

# 2. Install uv (if not installed)
curl -LsSf https://astral.sh/uv/install.sh | sh

# 3. Set up API key
cp .env.example .env
# Edit .env and add your GROQ_API_KEY

# 4. Run the demo
./run_demo.sh
```

**Using pip:**

```bash
# 1. Clone the repository
git clone https://github.com/pragnyanramtha/longmem.git
cd longmem

# 2. Create virtual environment
python3.11 -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# 3. Install dependencies
pip install -r requirements.txt

# 4. Set up API key
cp .env.example .env
# Edit .env and add your GROQ_API_KEY

# 5. Run the interactive CLI
python main.py
```

**Using conda:**

```bash
# 1. Clone the repository
git clone https://github.com/pragnyanramtha/longmem.git
cd longmem

# 2. Create environment
conda env create -f environment.yml
conda activate atlas

# 3. Set up API key
cp .env.example .env
# Edit .env and add your GROQ_API_KEY

# 4. Run the interactive CLI
python main.py
```

**Using Docker:**

```bash
# 1. Clone the repository
git clone https://github.com/pragnyanramtha/longmem.git
cd longmem

# 2. Build the image
docker build -t atlas:latest .

# 3. Run with environment variables
docker run -it \
  -e GROQ_API_KEY=your_key_here \
  -v $(pwd)/memory.db:/app/memory.db \
  -v $(pwd)/snapshots:/app/snapshots \
  atlas:latest
```

- Python 3.11+ (required)
- API Key from one of:

**Option A: Groq (Recommended - FREE)**

- Visit https://console.groq.com
- Sign up for a free account
- Go to API Keys → Create API Key
- Copy the key starting with `gsk_...`

**Option B: OpenAI**

- Visit https://platform.openai.com
- Create an account and add credits
- Generate an API key starting with `sk-...`

**Option C: Local Model (Advanced)**

- Install Ollama: https://ollama.ai
- Pull a model: `ollama pull mistral` - No API key needed
Create a `.env` file in the project root:

```bash
# For Groq (recommended)
GROQ_API_KEY=gsk_your_actual_key_here

# OR for OpenAI
# OPENAI_API_KEY=sk_your_actual_key_here

# OR for Ollama (local)
# OLLAMA_BASE_URL=http://localhost:11434/v1
# OLLAMA_MODEL=mistral
```

Choose your preferred method:

**Using uv (fastest):**

```bash
uv sync
```

**Using pip:**

```bash
pip install -r requirements.txt
```

**Using conda:**

```bash
conda env create -f environment.yml
conda activate atlas
```

**Interactive CLI:**

```bash
python main.py
```

**Automated Demo Script:**

```bash
./run_demo.sh
```

**Jupyter Notebook Demo:**

```bash
jupyter notebook run_demo.ipynb
```

| Command | Description |
|---|---|
| (normal text) | Chat with the agent |
| `/memories` | Show all active memories in a table |
| `/distill` | Manually extract memories from the current conversation |
| `/snapshot` | Save current memory state to `snapshots/` |
| `/quit` | Exit (conversation state is saved automatically) |
```text
$ python main.py

╔════════════════════════════════════════════════════════════╗
║          Long-Form Memory Agent - Interactive CLI          ║
╚════════════════════════════════════════════════════════════╝

Resuming conversation from turn 0. Active memories: 0

You: My name is Priya and I'm allergic to peanuts.
Assistant: Nice to meet you, Priya! I'll remember that you're allergic to peanuts.

You: /distill
✓ Distillation complete. 2 memories extracted.
  Total active memories: 2

You: I like hiking on weekends.
Assistant: That's great! I'll remember you enjoy hiking on weekends.

You: /quit
Conversation saved. See you next time!

# --- Next day, restart the program ---
$ python main.py
Resuming conversation from turn 3. Active memories: 3

You: What's my name and what am I allergic to?
Assistant: Your name is Priya, and you're allergic to peanuts.
🧠 name: Priya (t1)
🧠 allergy: peanuts (t1)
```

```text
atlas/
├── src/
│   ├── agent.py          # Main orchestration loop
│   ├── models.py         # Data structures (Memory, DistilledMemory)
│   ├── store.py          # SQLite + sqlite-vec + FTS5
│   ├── context.py        # Token-aware context window manager
│   ├── distiller.py      # LLM-based memory extraction
│   ├── retriever.py      # Hybrid search (vector + keyword)
│   └── prompts.py        # LLM prompt templates
├── eval/
│   ├── generate.py       # Generate synthetic 1000-turn conversation
│   ├── evaluate.py       # Run evaluation and calculate metrics
│   └── scenarios.json    # Test scenarios (planted memories + probes)
├── main.py               # Interactive CLI entry point
├── run_demo.sh           # Automated demo script
├── run_demo.ipynb        # Jupyter notebook demo
├── requirements.txt      # Python dependencies
├── environment.yml       # Conda environment
└── Dockerfile            # Container image
```
- Chat normally - Messages live in the context window
- Auto-flush at 70% - When the context is 70% full, memories are extracted automatically
- Manual extraction - Use `/distill` to save memories before the 70% threshold is reached
- Persistent storage - SQLite stores memories, vectors, and conversation history
- Next session - Resume exactly where you left off
```text
User Input
    ↓
[1] Retrieve Relevant Memories (hybrid search)
    ↓
[2] Inject into System Prompt
    ↓
[3] LLM Generates Response
    ↓
[4] Track last_used_turn
    ↓
[5] Check Context Usage (70% threshold?)
    ↓
[6] If threshold hit: Distill & Save Memories
    ↓
Response + Metadata
```
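The per-turn flow above can be sketched as a single function. This is a hypothetical, self-contained sketch: the function and field names are illustrative stand-ins for the real `src/agent.py` API, and the naive keyword match stands in for the actual hybrid search.

```python
def run_turn(user_input, store, turn, llm, context_limit=8192, flush_threshold=0.70):
    # [1] Retrieve relevant memories (naive keyword match as a stand-in
    #     for the real hybrid vector + FTS5 search)
    hits = [m for m in store["memories"] if m["is_active"]
            and any(w in user_input.lower() for w in m["value"].lower().split())]
    # [2] Inject retrieved memories into the system prompt
    system = "Known facts:\n" + "\n".join(f"- {m['key']}: {m['value']}" for m in hits)
    # [3] Generate the response (llm is any callable taking system + user text)
    response = llm(system, user_input)
    # [4] Track last_used_turn for every memory that was injected
    for m in hits:
        m["last_used_turn"] = turn
    # [5] Check context usage with a rough 4-chars-per-token estimate
    used_tokens = (len(system) + len(user_input) + len(response)) / 4
    flushed = used_tokens >= flush_threshold * context_limit
    # [6] A real agent would distill and save memories here when flushed
    return {"response": response, "active_memories": hits, "flushed": flushed}
```

The return shape mirrors the `response` / `active_memories` fields shown in the Python API example below, but the internals here are purely illustrative.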
Run the comprehensive 1000-turn evaluation:

```bash
# Generate synthetic conversation
python eval/generate.py

# Run evaluation (uses local model or API)
python eval/evaluate.py

# Or with a specific model
python eval/evaluate.py --local --model mistral --turns 1000
```

Expected Results:
- ✅ Recall accuracy: >90%
- ✅ Memory persistence across 1000+ turns
- ✅ Query-specific retrieval working correctly
- ✅ last_used_turn tracking verified
See `eval/SPEC_COMPLIANCE_ANALYSIS.md` for the detailed compliance report (100/100 score).
All data is stored in `memory.db`:

```sql
CREATE TABLE memories (
    id TEXT PRIMARY KEY,
    type TEXT NOT NULL,              -- preference, fact, etc.
    category TEXT NOT NULL,          -- language, schedule, etc.
    key TEXT NOT NULL,               -- canonical identifier
    value TEXT NOT NULL,             -- the actual information
    source_turn INTEGER NOT NULL,    -- when it was created
    confidence REAL DEFAULT 0.9,
    created_at REAL NOT NULL,
    updated_at REAL NOT NULL,
    is_active INTEGER DEFAULT 1,     -- soft delete flag
    last_used_turn INTEGER DEFAULT 0 -- tracking retrieval usage
);
```

- `memories_vec` (sqlite-vec) - 384-dim embeddings for semantic search
- `memories_fts` (FTS5) - Full-text keyword index
- `profile` - User preferences, auto-populated
- `turns` - Full conversation log
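As a minimal illustration of the schema in use, the following runnable sketch (standard-library `sqlite3` only; maintaining the `memories_vec` and `memories_fts` indexes is omitted) records a memory, bumps `last_used_turn` on retrieval, and expires the row with the soft-delete flag:

```python
import sqlite3, time

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE memories (
    id TEXT PRIMARY KEY, type TEXT NOT NULL, category TEXT NOT NULL,
    key TEXT NOT NULL, value TEXT NOT NULL, source_turn INTEGER NOT NULL,
    confidence REAL DEFAULT 0.9, created_at REAL NOT NULL,
    updated_at REAL NOT NULL, is_active INTEGER DEFAULT 1,
    last_used_turn INTEGER DEFAULT 0)""")

now = time.time()
con.execute(
    "INSERT INTO memories (id, type, category, key, value, source_turn, "
    "created_at, updated_at) VALUES (?,?,?,?,?,?,?,?)",
    ("m1", "fact", "health", "allergy", "peanuts", 1, now, now))

# Mark the memory as used on turn 42 after it is retrieved
con.execute("UPDATE memories SET last_used_turn = ? WHERE id = ? AND is_active = 1",
            (42, "m1"))

# Expire via soft delete instead of removing the row
con.execute("UPDATE memories SET is_active = 0, updated_at = ? WHERE id = ?",
            (time.time(), "m1"))

row = con.execute("SELECT is_active, last_used_turn FROM memories WHERE id = 'm1'").fetchone()
print(row)  # (0, 42)
```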
Edit the `src/agent.py` constructor or pass parameters:

```python
agent = LongMemAgent(
    api_key="your_key",            # API key
    provider="groq",               # "groq", "openai", or "ollama"
    model="llama-3.1-8b-instant",  # Model name
    db_path="memory.db",           # Database file
    context_limit=8192,            # Token limit
    flush_threshold=0.70,          # When to distill (70%)
)
```

**Groq (recommended):**
- `llama-3.1-8b-instant` (fast, default)
- `llama-3.3-70b-versatile` (powerful)
- `mixtral-8x7b-32768` (large context)

**OpenAI:**
- `gpt-4o-mini` (cost-effective)
- `gpt-4o` (most capable)

**Ollama (local):**
- `mistral`
- `llama3.1`
- `qwen2.5`
Force memory extraction without waiting for the 70% threshold:

```python
from src.agent import LongMemAgent

agent = LongMemAgent()
result = agent.manual_distill()
print(result)  # {'success': True, 'memories_added': 5, ...}
```

Using the agent programmatically:

```python
from src.agent import LongMemAgent

# Initialize agent
agent = LongMemAgent(provider="groq", model="llama-3.1-8b-instant")

# Single turn
response = agent.chat("My favorite color is blue")
print(response['response'])
print(response['active_memories'])  # Memories used in this turn

# Get all memories
all_memories = agent.get_all_memories()
for mem in all_memories:
    print(f"{mem['key']}: {mem['value']} (turn {mem['source_turn']})")
```

If upgrading from an older version:
```bash
python migrate_add_last_used_turn.py
```

**Missing embeddings model:**

```bash
pip install sentence-transformers
```

**sqlite-vec errors:**

```bash
pip install --force-reinstall sqlite-vec
```

**API key not found:**

```bash
# Check your .env file
cat .env
# Make sure GROQ_API_KEY is set correctly
```

**Memories not persisting:**

```bash
# Check if memory.db exists and has data
sqlite3 memory.db "SELECT COUNT(*) FROM memories"
# Should return > 0 after distillation
```

**Retrieval missing relevant memories:**

```bash
# Increase retrieval top_k
# Edit src/agent.py line 103: top_k=10 instead of top_k=5
```

Atlas uses Reciprocal Rank Fusion (RRF) to merge:
- Vector search (semantic similarity via sentence-transformers)
- FTS5 search (keyword matching)
This ensures both semantic understanding and exact term matching.
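A minimal sketch of Reciprocal Rank Fusion itself: each input is a ranked list of ids, and each list contributes 1/(k + rank) per id, so items ranked well by both searches rise to the top. The constant k=60 is the conventional default from the RRF literature; the value and ids used by Atlas are assumptions here.

```python
def rrf(rankings, k=60):
    """Fuse several ranked id lists into one, best-first."""
    scores = {}
    for ranked in rankings:
        for rank, doc_id in enumerate(ranked, start=1):
            # Ids ranked higher (smaller rank) contribute larger scores
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["m3", "m1", "m7"]   # semantic (embedding) order, hypothetical ids
keyword_hits = ["m1", "m9"]        # FTS5 keyword order, hypothetical ids
print(rrf([vector_hits, keyword_hits]))  # m1 ranks first: it appears in both lists
```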
- Context window monitored per-turn
- Automatic flush at 70% utilization (configurable)
- Last 4 messages retained for continuity
- System prompt rebuilt with retrieved memories
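The context-management rules above can be sketched as two small helpers. This is an illustrative sketch, not the actual `src/context.py` API; the 4-characters-per-token estimate is an assumption (a real manager would use the model's tokenizer).

```python
def should_flush(messages, context_limit=8192, threshold=0.70):
    # Rough token estimate: ~4 characters per token
    est_tokens = sum(len(m["content"]) for m in messages) // 4
    return est_tokens >= threshold * context_limit

def flush(messages, keep_last=4):
    # Older messages would be handed to the distiller;
    # the last few stay in context for continuity.
    to_distill, kept = messages[:-keep_last], messages[-keep_last:]
    return to_distill, kept
```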
- Extraction - LLM analyzes conversation segment
- Validation - Structured format checked
- Storage - Written to SQLite + vector + FTS indexes
- Retrieval - Hybrid search on each turn
- Injection - Added to system prompt
- Tracking - `last_used_turn` updated on each retrieval
- Expiry - Soft delete via the `is_active` flag
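The validation step in this lifecycle might look like the following sketch. The field names come from the `memories` schema above, but the check itself is hypothetical, not the actual distiller code.

```python
# Fields a distilled memory must carry before it is written to the store
REQUIRED = {"type", "category", "key", "value", "source_turn"}

def validate_memory(mem: dict) -> bool:
    """Reject LLM extractions that lack the structured fields the schema expects."""
    return REQUIRED <= mem.keys() and isinstance(mem["source_turn"], int)
```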
Contributions welcome! Areas for improvement:
- Add streaming response support
- Implement memory decay based on last_used_turn
- Add multi-user support with session IDs
- Create web UI (FastAPI + React)
- Add more evaluation scenarios
- Implement memory conflict resolution
- Add export/import for memory backups
MIT License - see LICENSE file for details.
- Built with sqlite-vec for vector search
- Powered by Groq for fast LLM inference
- Embeddings via sentence-transformers
- UI built with rich
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Documentation: See `eval/SPEC_COMPLIANCE_ANALYSIS.md` for detailed architecture
If you use Atlas in your research, please cite:
```bibtex
@software{atlas_memory,
  title={Atlas: Long-Form Memory System for Conversational AI},
  author={Pragnyan Ramtha},
  year={2026},
  url={https://github.com/pragnyanramtha/longmem}
}
```

Built with ❤️ for production-grade persistent memory in conversational AI