A demonstration of an AI agent with persistent memory capabilities using LangGraph for workflow orchestration and Mem0 for long-term memory management.
🚀 Quick Start: Jump to the Quick Demo section to see the memory capabilities in action!
- Persistent Memory: Remembers user conversations across sessions using Mem0
- Contextual Responses: Retrieves relevant memories to provide personalized interactions
- Local Embeddings: Uses Ollama for privacy-focused embedding generation
- Workflow Orchestration: LangGraph manages the memory retrieval and response generation flow
- Multiple Interfaces: Interactive demos, CLI mode, and REST API
The system consists of two main workflow nodes:
- Memory Retrieval: Searches for relevant past conversations based on the current query
- Response Generation: Uses Claude with memory context to generate personalized responses
```
User Input → Retrieve Memory → Generate Response (with context) → Store New Memory
```
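The sketch below shows how a two-node graph like this can be wired with LangGraph and Mem0. It is illustrative rather than the project's exact `main.py`: the state fields, prompt format, and error handling are simplified, and the `Memory` instance is assumed to be configured as described under the configuration section below.

```python
# Illustrative sketch of the two-node workflow (simplified; see main.py for the real code).
from typing import TypedDict

from anthropic import Anthropic
from langgraph.graph import StateGraph, START, END
from mem0 import Memory

memory = Memory()     # assumed to be configured for Ollama + ChromaDB (see configuration)
client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment

class AgentState(TypedDict):
    user_id: str
    message: str
    memories: list
    response: str

def retrieve_memory(state: AgentState) -> dict:
    # Node 1: search Mem0 for past conversations relevant to the current query.
    hits = memory.search(query=state["message"], user_id=state["user_id"])
    return {"memories": hits["results"] if isinstance(hits, dict) else hits}

def generate_response(state: AgentState) -> dict:
    # Node 2: let Claude answer with the retrieved memories as context.
    context = "\n".join(str(m) for m in state["memories"])
    reply = client.messages.create(
        model="claude-sonnet-4-0",
        max_tokens=512,
        system=f"Relevant memories about this user:\n{context}",
        messages=[{"role": "user", "content": state["message"]}],
    )
    answer = reply.content[0].text
    # Store the new exchange so future sessions can recall it.
    memory.add(
        [{"role": "user", "content": state["message"]},
         {"role": "assistant", "content": answer}],
        user_id=state["user_id"],
    )
    return {"response": answer}

builder = StateGraph(AgentState)
builder.add_node("retrieve_memory", retrieve_memory)
builder.add_node("generate_response", generate_response)
builder.add_edge(START, "retrieve_memory")
builder.add_edge("retrieve_memory", "generate_response")
builder.add_edge("generate_response", END)
graph = builder.compile()

result = graph.invoke({"user_id": "user123", "message": "Hi, my name is Alice!",
                       "memories": [], "response": ""})
print(result["response"])
```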
- Python 3.11+ and UV package manager
- Anthropic API key
- OR Docker and Docker Compose (for API server)
1. Install UV package manager:

   ```bash
   curl -LsSf https://astral.sh/uv/install.sh | sh
   ```

   Or on Windows:

   ```powershell
   powershell -c "irm https://astral.sh/uv/install.ps1 | iex"
   ```
2. Clone the repository:

   ```bash
   git clone <repository-url>
   cd langgraph-mem0
   ```
3. Install dependencies:

   ```bash
   uv sync
   ```
4. Install and set up Ollama:

   ```bash
   # Install Ollama (macOS/Linux)
   curl -fsSL https://ollama.ai/install.sh | sh
   ```

   Or download from https://ollama.ai for other platforms.

   Then pull the embeddings model:

   ```bash
   ollama pull nomic-embed-text:latest
   ```
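   To confirm the model serves embeddings, you can hit Ollama's HTTP embeddings endpoint directly. This is an optional sanity check; the 768 in the comment matches `OLLAMA_EMBEDDING_DIMS` in the next step.

   ```python
   # Quick sanity check that Ollama can produce embeddings (assumes the default port).
   import requests

   r = requests.post(
       "http://localhost:11434/api/embeddings",
       json={"model": "nomic-embed-text:latest", "prompt": "hello"},
   )
   print(len(r.json()["embedding"]))  # nomic-embed-text returns 768-dimensional vectors
   ```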
5. Configure environment variables:

   ```bash
   cp .env.example .env
   # Edit .env and configure all required variables:
   # ANTHROPIC_API_KEY=your_api_key_here
   # ANTHROPIC_MODEL=claude-sonnet-4-0
   # OLLAMA_MODEL=nomic-embed-text:latest
   # OLLAMA_BASE_URL=http://localhost:11434
   # OLLAMA_EMBEDDING_DIMS=768
   # CHROMA_COLLECTION_NAME=test
   # CHROMA_DB_PATH=db
   ```
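   At runtime the app can pick these up with python-dotenv, along these lines (a sketch; the project may load them differently):

   ```python
   # Sketch of loading the settings above (variable names from .env.example).
   import os
   from dotenv import load_dotenv

   load_dotenv()  # reads .env into the process environment
   api_key = os.environ["ANTHROPIC_API_KEY"]  # required; raises KeyError if missing
   ollama_url = os.getenv("OLLAMA_BASE_URL", "http://localhost:11434")
   embedding_dims = int(os.getenv("OLLAMA_EMBEDDING_DIMS", "768"))
   ```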
Alternatively, run the API server with Docker and Docker Compose:

1. Clone the repository:

   ```bash
   git clone <repository-url>
   cd langgraph-mem0
   ```
2. Configure environment variables:

   ```bash
   cp .env.example .env
   # Edit .env and set your ANTHROPIC_API_KEY
   ```
3. Start the services using Docker Compose:

   ```bash
   # Start all services (API server runs by default)
   docker-compose up -d

   # Pull the embedding model (first time only)
   docker-compose run --rm ollama-setup

   # View logs
   docker-compose logs -f app
   ```
4. Access the API:

   ```bash
   # API available at http://localhost:8000
   curl http://localhost:8000/health

   # Send a chat message
   curl -X POST "http://localhost:8000/chat" \
     -H "Content-Type: application/json" \
     -d '{"message": "Hello!", "user_id": "user123"}'
   ```
5. Stop the services:

   ```bash
   docker-compose down
   ```
See the memory capabilities in action with the structured demonstration:

```bash
uv run example_demo.py
```

What this shows:
- Agent learns user details (name, profession, preferences)
- Demonstrates memory persistence across conversation topics
- Shows contextual recall when asked "What do you remember about me?"
- Choose option 1 for the automated demo or option 2 for interactive mode
Sample interaction:

```
> "Hi, my name is Alice and I'm a software developer from San Francisco."
< "Nice to meet you, Alice! It's great to connect with a fellow software developer..."

> "What do you remember about me?"
< "I remember that you're Alice, a software developer from San Francisco. You mentioned..."
```
For integration with other applications, the agent is also exposed as a REST API:
Available Endpoints:
- `GET /health` - Health check
- `POST /chat` - Send chat messages
- `POST /memory/search` - Search user memories
- `DELETE /memory/{user_id}` - Clear user memories
Example API Usage:

```bash
# Health check
curl http://localhost:8000/health

# Chat with the agent
curl -X POST "http://localhost:8000/chat" \
  -H "Content-Type: application/json" \
  -d '{"message": "Hi, my name is Alice", "user_id": "user123"}'

# Search memories
curl -X POST "http://localhost:8000/memory/search" \
  -H "Content-Type: application/json" \
  -d '{"query": "name", "user_id": "user123"}'

# Clear memories
curl -X DELETE "http://localhost:8000/memory/user123"
```

API Benefits:
- Rate limiting and security features built-in
- Proper error handling and logging
- Ready for deployment
- Supports multiple concurrent users
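For programmatic access, the same endpoints can be called from Python with the requests library. This is a hypothetical client: the request payloads mirror the curl examples above, while the response JSON is printed as-is since its exact schema is defined by api.py.

```python
# Minimal Python client for the REST API (payloads match the curl examples above).
import requests

BASE = "http://localhost:8000"

# Chat with the agent; memories for "user123" are retrieved and stored automatically.
reply = requests.post(
    f"{BASE}/chat",
    json={"message": "Hi, my name is Alice", "user_id": "user123"},
)
print(reply.json())

# Search this user's stored memories.
hits = requests.post(
    f"{BASE}/memory/search",
    json={"query": "name", "user_id": "user123"},
)
print(hits.json())

# Clear this user's memories.
requests.delete(f"{BASE}/memory/user123").raise_for_status()
```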
Docker Benefits:
- Simplified Setup: No need to install Python, UV, or Ollama locally
- Consistent Environment: Same runtime across all systems
- Isolation: Application runs in isolated containers
- Easy Cleanup: Remove everything with `docker-compose down`
- Automatic Ollama Setup: The embedding model downloads automatically
- Includes: API server with health checks and proper error handling
Note: Docker setup includes persistent volumes for database and logs, so your data persists between container restarts.
For learning and demos, we recommend the local installation to better understand the components.
The memory system is configured in `main.py` with the following components (a configuration sketch follows the list):
- Embeddings: Ollama with the `nomic-embed-text:latest` model
- LLM: Anthropic Claude Sonnet 4.0
- Vector Store: ChromaDB (local storage in the `db/` directory)
- Memory Provider: Mem0 with ChromaDB backend
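Assembled from the `.env.example` values, a Mem0 configuration for this stack plausibly looks like the sketch below. The exact dictionary in `main.py` may differ; the provider names shown are the ones Mem0 uses for Anthropic, Ollama, and ChromaDB.

```python
# Sketch of a Mem0 configuration matching the components above (values from .env.example).
from mem0 import Memory

config = {
    "llm": {
        "provider": "anthropic",
        "config": {"model": "claude-sonnet-4-0"},
    },
    "embedder": {
        "provider": "ollama",
        "config": {
            "model": "nomic-embed-text:latest",
            "ollama_base_url": "http://localhost:11434",
            "embedding_dims": 768,
        },
    },
    "vector_store": {
        "provider": "chroma",
        "config": {"collection_name": "test", "path": "db"},
    },
}

memory = Memory.from_config(config)
```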
```
main.py             # Core MemoryAgent implementation and CLI
api.py              # FastAPI server exposing REST API
example_demo.py     # Demonstration script
db/                 # ChromaDB vector store data
logs/               # Security and application logs
.env.example        # Environment variables template
pyproject.toml      # Project configuration
Dockerfile          # Docker container configuration
docker-compose.yml  # Multi-service Docker setup
.dockerignore       # Docker build exclusions
CLAUDE.md           # Development guidance
SECURITY.md         # Security guidelines
```
- LangGraph: Workflow orchestration and state management
- Mem0: Memory management and retrieval
- Anthropic: Claude API integration
- Ollama: Local embedding generation
- ChromaDB: Vector database for memory storage
- FastAPI: REST API framework
- Uvicorn: ASGI server
- SlowAPI: Rate limiting middleware
The API includes built-in rate limiting to prevent abuse:
- Chat/Memory endpoints: 20 requests/minute, 100 requests/hour (configurable)
- Health endpoint: 60 requests/minute
- Memory deletion: 10 requests/minute, 30 requests/hour
- Uses in-memory storage - No Redis required
- Configurable via environment variables (see the sketch below):
  - `RATE_LIMIT_REQUESTS_PER_MINUTE=20`
  - `RATE_LIMIT_REQUESTS_PER_HOUR=100`
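A minimal sketch of how a SlowAPI setup like this is typically wired into FastAPI follows; it is illustrative, not the project's exact `api.py`.

```python
# Illustrative SlowAPI wiring with in-memory storage and env-driven limits.
import os

from fastapi import FastAPI, Request
from slowapi import Limiter, _rate_limit_exceeded_handler
from slowapi.errors import RateLimitExceeded
from slowapi.util import get_remote_address

per_minute = os.getenv("RATE_LIMIT_REQUESTS_PER_MINUTE", "20")
per_hour = os.getenv("RATE_LIMIT_REQUESTS_PER_HOUR", "100")

# Keyed by client IP; counters live in process memory, so no Redis is needed.
limiter = Limiter(
    key_func=get_remote_address,
    default_limits=[f"{per_minute}/minute", f"{per_hour}/hour"],
)

app = FastAPI()
app.state.limiter = limiter
app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)

@app.get("/health")
@limiter.limit("60/minute")  # looser per-route limit for health checks
async def health(request: Request) -> dict:  # SlowAPI requires the Request argument
    return {"status": "ok"}
```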
Each user is identified by a `user_id`, and all conversations are stored persistently. The agent will:
- Store conversation history automatically
- Retrieve relevant memories for new interactions
- Provide contextual responses based on past conversations
- Maintain separate memory spaces for different users (illustrated in the sketch below)
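The per-user isolation comes from scoping every Mem0 call by `user_id`. A hypothetical snippet (the `add`/`search` methods are from the mem0 SDK; the user IDs are invented for illustration):

```python
# Per-user memory spaces: everything is scoped by user_id.
from mem0 import Memory

m = Memory()  # assumes the ChromaDB/Ollama configuration shown earlier

m.add([{"role": "user", "content": "I'm Alice and I'm vegetarian."}], user_id="alice")
m.add([{"role": "user", "content": "I'm Bob and I love steak."}], user_id="bob")

# Searches only ever see the given user's memories, so contexts never mix.
print(m.search(query="diet", user_id="alice"))  # Alice's memories only
print(m.search(query="diet", user_id="bob"))    # Bob's memories only
```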
User: "Hi, my name is Alice and I'm a software developer from San Francisco.
I love working with Python and machine learning projects."
Agent: "Nice to meet you, Alice! It's great to connect with a fellow
software developer from San Francisco. Python and ML are
fascinating fields - what kind of ML projects do you enjoy most?"
User: "I prefer working in the mornings, usually starting at 7 AM.
I'm vegetarian and love Italian cuisine."
Agent: "That's an early start, Alice! Morning productivity can be great.
And Italian cuisine has so many wonderful vegetarian options..."
User: "What do you remember about me?"
Agent: "I remember quite a bit about you, Alice! You're a software
developer from San Francisco who loves Python and machine
learning projects. You're an early bird who starts work at 7 AM,
and you're vegetarian with a preference for Italian cuisine."
Key Demonstration Points:
- Cross-session memory persistence
- Contextual understanding and recall
- Personalized responses based on learned preferences
- Natural conversation flow with memory integration
See CLAUDE.md for detailed development guidance and architectural information.
This project is licensed under the MIT License - see the LICENSE file for details.