Cognitive Memory System for AI Agents
Persistent semantic search · Knowledge graph · Sleep consolidation · Zero external APIs
Quick Start · Architecture · Features · Research · CLI · Agent Integration
Engram gives AI agents a brain, not a database. It remembers what matters, forgets what doesn't, and builds connections between ideas. All inference runs locally: no API keys, no cloud, no token cost per memory operation.
Most AI memory solutions (like Mem0) use LLM calls for every `add()`: extracting facts, classifying operations, resolving conflicts. That's powerful, but expensive and opaque.
Engram takes a different approach:
| | Mem0 | Engram |
|---|---|---|
| Who decides what to save | LLM automatically | Agent explicitly |
| Cost per memory write | LLM call (extraction + update) | Local embedding only |
| Cost per memory read | Vector search + LLM | Hybrid search + local reranker |
| External dependencies | OpenAI API / vector DB service | None (fully local) |
| Memory unit | Atomic fact ("User prefers X") | Typed record with full context |
| Forgetting | No built-in mechanism | Ebbinghaus-inspired decay + consolidation |
| Knowledge graph | Neo4j (separate service) | SQLite-embedded links |
Engram is designed for coding agents, where context is precious, decisions have high stakes, and the agent itself is smart enough to know what's worth remembering.
- Node.js ≥ 20.0.0

```bash
git clone https://github.com/foramoment/engram-ai-memory.git
cd engram-ai-memory/SKILLS/engram
npm install

# Register the CLI globally (optional)
npm link
```

```bash
# Add a memory
engram add reflex "Always wrap vector_top_k in try/catch" \
  -c "LibSQL's DiskANN index may not be ready during cold start. Fallback to brute-force cosine." \
  -t "libsql,vector-search" --permanent

# Recall it
engram recall "vector search error handling"

# Check your memory stats
engram stats
```

Engram ships as a Skill for AI coding assistants (Antigravity, Claude Code, Cursor, etc.):
```
your-agent-config/
└── skills/engram/       # copy or symlink SKILLS/engram here
    ├── SKILL.md         # Agent instructions (the agent reads this)
    ├── src/             # Core modules
    ├── scripts/         # Session automation
    └── references/      # Deep docs
```
See SKILLS/engram/SKILL.md for the full agent integration guide.
```
┌───────────────────────────────────────────────────────────────────┐
│                             CLI Layer                             │
│    engram recall · add · search · sleep · link · export · ...     │
├───────────────────────────────────────────────────────────────────┤
│                                                                   │
│  ┌──────────────┐  ┌──────────────┐  ┌───────────────────────┐    │
│  │ Focus of     │  │ Memory       │  │ Sleep                 │    │
│  │ Attention    │  │ CRUD +       │  │ Consolidation         │    │
│  │ (FoA)        │  │ Search       │  │                       │    │
│  │              │  │              │  │ • Ebbinghaus Decay    │    │
│  │ • Composite  │  │ • Semantic   │  │ • Prune (archive)     │    │
│  │   scoring    │  │ • FTS (BM25) │  │ • Merge (dedup)       │    │
│  │ • Token      │  │ • Hybrid     │  │ • Boost (reinforce)   │    │
│  │   budget     │  │   (RRF)      │  │                       │    │
│  │ • Session    │  │ • Reranking  │  │                       │    │
│  │   context    │  │ • Graph hops │  │                       │    │
│  └──────┬───────┘  └──────┬───────┘  └───────────┬───────────┘    │
│         │                 │                      │                │
├─────────┴─────────────────┴──────────────────────┴────────────────┤
│                          Embedding Layer                          │
│           BGE-M3 (1024-dim, 100+ langs, 8192 tokens)              │
│           BGE-reranker-base (cross-encoder)                       │
│           Hugging Face Transformers.js – runs on CPU/WebGPU       │
├───────────────────────────────────────────────────────────────────┤
│                           Storage Layer                           │
│          LibSQL/SQLite – single file, zero infrastructure         │
│          DiskANN vector index · FTS5 full-text · WAL mode         │
│         Typed memories · Tags · Links · Sessions · Access log     │
└───────────────────────────────────────────────────────────────────┘
```
| Module | File | Purpose |
|---|---|---|
| CLI | `src/cli.js` | Commander-based command interface |
| Database | `src/db.js` | Schema, migrations, LibSQL client |
| Memory | `src/memory.js` | CRUD, search (semantic/FTS/hybrid), graph links |
| Embeddings | `src/embeddings.js` | BGE-M3 embedding + BGE-reranker cross-encoder |
| FoA | `src/foa.js` | Focus of Attention – smart context assembly |
| Consolidation | `src/consolidation.js` | Sleep cycle – decay, prune, merge, boost |
| Session | `src/session.js` | Conversation session tracking |
| Migration | `src/migrate.js` | Import from legacy memory formats |
- Zero infrastructure – single file, no server, no Docker
- Portable – the entire skill (code + database) lives in one folder
- LibSQL extensions – native `vector()` type, DiskANN indexing, FTS5 built-in
- Suitable scale – agent memory is ~100s to ~10,000s of entries, not millions
- Fully local – no API keys, no network, no cost
- Multilingual – works across Russian, English, and other languages
- 1024-dim – good balance of quality and performance on CPU
- @huggingface/transformers – pure JS/WASM, no native compilation needed
Bi-encoder (embedding) search is fast but approximate. The cross-encoder (bge-reranker-base) sees query + document together, giving much better relevance ranking – critical for a system where agents need the right context, not just similar context.
Semantic search misses exact names/identifiers. FTS misses semantic similarity. RRF fusion combines both with `score = Σ 1/(k + rank)`, naturally balancing precision and recall without tuning.
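RRF needs no tuning beyond the constant `k`; a minimal sketch (helper and variable names are illustrative, not Engram's actual code):

```javascript
// Reciprocal Rank Fusion: combine ranked lists of memory IDs.
// k = 60 dampens the advantage of the very top ranks.
function rrfFuse(rankedLists, k = 60) {
  const scores = new Map();
  for (const list of rankedLists) {
    list.forEach((id, i) => {
      const rank = i + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  // Highest fused score first
  return [...scores.entries()].sort((a, b) => b[1] - a[1]).map(([id]) => id);
}

// A memory ranked well by both searches beats one found by only one:
const semantic = ['m1', 'm2', 'm3'];
const fts      = ['m2', 'm3', 'm4'];
rrfFuse([semantic, fts]); // m2 first: 1/62 + 1/61 beats m1's lone 1/61
```

Because only ranks enter the sum, cosine scores and BM25 scores never need to be normalized onto a common scale.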
Without maintenance, memory grows unbounded and search quality degrades. Biological memory consolidation during sleep inspired four steps (with a fifth planned):

- Decay – Ebbinghaus forgetting curve (`strength *= 0.95^days`). Idempotent: uses `last_consolidation_at` to prevent double-decay.
- Prune – Archive memories below the strength threshold (0.05). Permanent memories are exempt.
- Merge – Find near-duplicates (cosine > 0.92), merge content, archive the duplicate.
- Boost – Strengthen frequently accessed memories. A cooldown guard (≥ 1 day) prevents runaway boosting.
- Extract (planned) – LLM-based pattern extraction to discover meta-rules from clusters.
Agents add memories one at a time. Auto-linking discovers relationships post hoc: on each add, the top 3 most similar existing memories are found by cosine similarity and linked if they clear a 0.7 threshold. This builds a knowledge graph organically, enabling graph-hop expansion during recall.
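In sketch form (assuming unit-normalized embeddings, so cosine similarity reduces to a dot product; names are illustrative, not Engram's actual API):

```javascript
// Cosine similarity of unit-normalized vectors is a plain dot product.
const dot = (a, b) => a.reduce((sum, x, i) => sum + x * b[i], 0);

// On add: link the new memory to its top-3 neighbors above the threshold.
function autoLink(newMemory, existing, { topK = 3, threshold = 0.7 } = {}) {
  return existing
    .map((m) => ({ targetId: m.id, score: dot(newMemory.embedding, m.embedding) }))
    .filter((l) => l.score >= threshold)
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map((l) => ({ source: newMemory.id, target: l.targetId, relation: 'related_to' }));
}
```

Dissimilar memories simply produce no links, so isolated topics stay isolated until a genuinely related memory arrives.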
Agents have finite context windows. `recall` returns memories sorted by composite score (relevance × importance × strength × recency) until the token budget is filled (default: 4000). A noise gate (score < 0.001) prevents irrelevant results from wasting budget.
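The budget fitting is a greedy fill over the composite-scored list; the ~4 chars/token estimate and the field names here are assumptions for illustration:

```javascript
// Greedy fill: take memories in composite-score order until the token
// budget is exhausted. Scores below the noise gate never enter context.
function fitToBudget(memories, { budget = 4000, noiseGate = 0.001 } = {}) {
  const scored = memories
    .map((m) => ({ ...m, score: m.relevance * m.importance * m.strength * m.recency }))
    .filter((m) => m.score >= noiseGate)
    .sort((a, b) => b.score - a.score);

  const picked = [];
  let used = 0;
  for (const m of scored) {
    const tokens = Math.ceil(m.content.length / 4); // rough chars-to-tokens estimate
    if (used + tokens > budget) continue; // skip, but try smaller memories below
    picked.push(m);
    used += tokens;
  }
  return picked;
}
```

Skipping (rather than stopping) when a memory doesn't fit lets smaller, lower-ranked memories still use the remaining budget.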
Every recall query runs two parallel search paths and fuses results:
Query: "authentication error handling"
β
ββββββ΄βββββ
βΌ βΌ
Semantic FTS5
(BGE-M3) (BM25)
β β
ββββββ¬βββββ
βΌ
Reciprocal Rank Fusion (k=60)
β
βΌ
Cross-Encoder Reranking (optional)
β
βΌ
Graph Expansion (multi-hop)
β
βΌ
Composite Scoring
relevance Γ importance Γ strength Γ recency
β
βΌ
Token Budget Fitting
- Semantic search catches conceptual matches ("auth patterns" → "OAuth2 token refresh")
- FTS5/BM25 catches exact keyword matches ("LibSQL" → memories containing "LibSQL")
- RRF fusion combines both without score normalization
- Cross-encoder reranking with BGE-reranker-base for precision-critical queries
- Multi-hop graph traversal follows links to pull in related context
Five memory types modeled after cognitive science:
| Type | Analogy | Use Case | Permanent? |
|---|---|---|---|
| `reflex` | Procedural memory | "If X happens → do Y" | ✅ Recommended |
| `episode` | Episodic memory | Bug reports: trigger → cause → fix | ❌ |
| `fact` | Semantic memory | Architecture decisions, stack info | ❌ |
| `preference` | Implicit memory | User preferences, environment | ✅ Recommended |
| `decision` | Deliberative memory | "Chose X over Y because Z" | ❌ |
Memories form a linked graph with automatic and explicit connections:

- Auto-linking: every `add` finds the top 3 semantically similar memories (cosine ≥ 0.7) and creates `related_to` links
- Explicit links: `caused_by`, `evolved_from`, `contradicts`, `supersedes`
- Multi-hop retrieval: `recall` follows graph links to pull in related context
```
[reflex] Always wrap vector_top_k
 ├── related_to → [episode] Vector index NPE in production
 │                 └── caused_by → [decision] Use DiskANN over brute-force
 └── evolved_from → [fact] LibSQL vector search capabilities
```
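Multi-hop expansion over such a graph is a bounded breadth-first walk over the stored links; a sketch under assumed shapes (not the actual `recall` implementation):

```javascript
// links: [{ source, target, relation }], treated as undirected for recall.
// Returns every memory ID reachable from the seeds within `hops` steps.
function expandGraph(seedIds, links, hops = 2) {
  const seen = new Set(seedIds);
  let frontier = [...seedIds];
  for (let h = 0; h < hops && frontier.length > 0; h++) {
    const next = [];
    for (const id of frontier) {
      for (const { source, target } of links) {
        const neighbor = source === id ? target : target === id ? source : null;
        if (neighbor !== null && !seen.has(neighbor)) {
          seen.add(neighbor);
          next.push(neighbor);
        }
      }
    }
    frontier = next;
  }
  return [...seen];
}
```

With the example above, one hop from the reflex reaches the episode and the fact; a second hop pulls in the decision behind the episode.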
Biologically inspired memory maintenance, designed to run periodically:

```bash
engram sleep            # Run full cycle
engram sleep --dry-run  # Preview changes
```

| Step | What it does | Biological analogy |
|---|---|---|
| Decay | `strength *= 0.95^days` | Synaptic depression |
| Prune | Archive if strength < 0.05 | Synaptic elimination |
| Merge | Combine near-duplicates (cosine ≥ 0.92) | Memory consolidation |
| Boost | +10% strength for frequently accessed (≥ 3 times) | Repetition priming |

Permanent memories (reflexes, preferences) are exempt from decay and pruning.
Every `add` does merge-on-write:

- Exact match (same type + title) → skip, bump access count
- Semantic near-match (cosine ≥ 0.92, same type) → merge content into existing memory
- New → create, auto-embed, auto-link

No LLM needed – pure embedding similarity.
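The write path reduces to a three-way branch; a sketch with illustrative names and unit-normalized embeddings assumed:

```javascript
const DEDUP_THRESHOLD = 0.92;
// Cosine similarity of unit-normalized vectors is a dot product.
const dot = (a, b) => a.reduce((s, x, i) => s + x * b[i], 0);

// Decide what `add` should do with an incoming memory.
function mergeOnWrite(incoming, existing) {
  const exact = existing.find(
    (m) => m.type === incoming.type && m.title === incoming.title
  );
  if (exact) return { action: 'skip', id: exact.id }; // just bump access count

  const near = existing
    .filter((m) => m.type === incoming.type)
    .find((m) => dot(m.embedding, incoming.embedding) >= DEDUP_THRESHOLD);
  if (near) return { action: 'merge', id: near.id }; // fold content into existing

  return { action: 'create' }; // embed, auto-link, insert
}
```

Restricting the near-match check to the same type keeps, say, an `episode` from being silently merged into a similar-sounding `fact`.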
- Single SQLite file – `data/engram.db`
- No external vector DB – LibSQL DiskANN built-in, brute-force fallback
- No API keys – all inference via Transformers.js (CPU/WebGPU)
- No Docker – just `npm install`
- Portable – copy one file to migrate
On an i5-14600KF with 32 GB DDR5:

| Operation | Time |
|---|---|
| First run after `npm install` | ~40s (one-time WASM JIT compilation) |
| Embedding model load | ~1.4s |
| Reranker model load | ~1.2s |
| Recall (full pipeline) | ~3s total |
| Subsequent runs | ~3s (model loading dominates) |
Model loading is the bottleneck, not inference. For sub-second responses, a daemon mode (keeping models in memory) is the path forward.
Engram is an applied cognitive architecture: not a single research paper, but an engineering synthesis of ideas from cognitive psychology, information retrieval, and modern AI memory research.
"CogMem: A Cognitive Memory Architecture for Sustained Multi-Turn Reasoning in Large Language Models" – arXiv:2504.01441
CogMem's three-layer model directly inspired Engram's architecture:
| CogMem Layer | Engram Component |
|---|---|
| Long-Term Memory (LTM) – persistent knowledge store | SQLite + DiskANN vector index |
| Direct Access (DA) – session working memory | `session` – conversation context tracking |
| Focus of Attention (FoA) – dynamic context assembly | `recall()` – composite scoring + token budget fitting |
The key insight from CogMem: don't stuff the entire history into the prompt; reconstruct concise, task-relevant context at each turn.
"Γber das GedΓ€chtnis" (On Memory) β Hermann Ebbinghaus, 1885
The foundational law of memory decay: retention decreases exponentially over time unless reinforced. Engram's sleep consolidation implements this directly:
```
strength *= decayRate ^ daysSinceLastAccess    (default: decayRate = 0.95)
```
Memories that are accessed frequently resist decay. Memories that are never recalled eventually fall below the prune threshold and are archived.
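With the default rate you can solve for the prune horizon directly: strength drops below the 0.05 threshold after `ln(0.05) / ln(0.95)` ≈ 58 days without access. A back-of-the-envelope check, assuming strength starts at 1.0:

```javascript
// Days until an untouched memory (strength 1.0) crosses the prune threshold.
const daysToPrune = (rate = 0.95, threshold = 0.05) =>
  Math.log(threshold) / Math.log(rate);

daysToPrune(); // ≈ 58.4, so an unused memory is archived on day 59 of neglect
```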
"MemoryBank: Enhancing Large Language Models with Long-Term Memory" β Zhong et al., 2023 β arXiv:2305.10250
MemoryBank was the first AI memory system to systematically apply Ebbinghaus-inspired forgetting to LLM agents. Their "human-like forgetting mechanism where memories strengthen when recalled and naturally decay over time if unused" directly influenced Engram's consolidation pipeline.
"Reciprocal Rank Fusion outperforms Condorcet and individual Rank Learning Methods" β Cormack, Clarke, Buettcher β ACM SIGIR '09
The standard method for combining ranked lists from multiple retrieval systems without score normalization:
```
RRF_score(doc) = Σ 1 / (k + rank_i)    where k = 60
```
Engram fuses semantic search (BGE-M3 embeddings) and lexical search (FTS5/BM25) through RRF.
"C-Pack: Packaged Resources To Advance General Chinese Embedding" β Xiao et al. β arXiv:2309.07597
- BGE-M3: Multilingual embedding model (1024-dim, 100+ languages, 8192 token context). Runs locally via Transformers.js.
- BGE-reranker-base: Cross-encoder for precision reranking. Processes (query, document) pairs jointly through attention – much more accurate than bi-encoder similarity for relevance scoring.
"DiskANN: Fast Accurate Billion-point Nearest Neighbor Search on a Single Node" β Subramanya et al., NeurIPS 2019 β arXiv:1907.05024
LibSQL's built-in vector index is based on the Vamana graph algorithm from DiskANN, providing sub-linear approximate nearest neighbor search without an external vector database.
"Episodic and semantic memory" β Endel Tulving, in Organization of Memory, 1972
The distinction between episodic memory (specific events with context) and semantic memory (general knowledge and facts) is the foundation for Engram's type system:
| Engram Type | Cognitive Model |
|---|---|
| `reflex` | Procedural memory – automated "if X then Y" |
| `episode` | Episodic memory – specific events with trigger/cause/fix |
| `fact` | Semantic memory – declarative knowledge |
| `preference` | Implicit memory – stable preferences |
| `decision` | Deliberative memory – choices with rationale |
| Engram Component | Source | Year |
|---|---|---|
| Focus of Attention (FoA) | CogMem (arXiv:2504.01441) | 2025 |
| Forgetting curve (decay) | Ebbinghaus, Über das Gedächtnis | 1885 |
| Sleep consolidation | MemoryBank (arXiv:2305.10250) | 2023 |
| Hybrid search (RRF) | Cormack, Clarke, Buettcher (SIGIR) | 2009 |
| Embeddings (BGE-M3) | C-Pack (arXiv:2309.07597) | 2023 |
| Reranker (cross-encoder) | C-Pack (arXiv:2309.07597) | 2023 |
| Memory type taxonomy | Tulving (episodic/semantic) | 1972 |
| Vector index (DiskANN) | Subramanya et al. (NeurIPS) | 2019 |
| Semantic dedup (merge-on-write) | Standard IR cosine gating | – |
```bash
# Read
engram recall "query"            # Smart context assembly (FoA)
engram recall "query" --short    # Compact preview
engram recall "query" -t reflex  # Filter by type
engram recall "query" -b 2000    # Custom token budget

# Write
engram add <type> "Title" -c "Content" -t "tags" [--permanent]
engram ingest --file memories.json --remove-file  # Batch (4x faster)
```

```bash
engram search "query"              # Hybrid (semantic + FTS)
engram search "query" -m semantic  # Semantic only
engram search "query" -m fts       # Exact keyword (BM25)
engram search "query" --rerank     # Cross-encoder precision
engram search "query" --hops 2     # Multi-hop graph expansion
engram search "query" --since 1d   # Time filter
```

```bash
engram link <sourceId> <targetId> -r <relation>
# Relations: related_to | caused_by | evolved_from | contradicts | supersedes
```

```bash
engram sleep --dry-run  # Preview consolidation
engram sleep            # Run decay/prune/merge/boost
engram stats            # Overview
engram diagnostics      # Find weak/duplicate memories
engram export -o backup.json      # Export all
engram import --file backup.json  # Restore from backup
```

```bash
engram get <id>            # View full memory
engram update <id> --title "New" --content "..."
engram delete <id>         # Remove (cascades)
engram tag add <id> <tag>  # Manage tags
engram mark <id>           # Toggle permanent
```

```bash
# Full backup
engram export -o backup.json

# Restore to same or different machine (dedup handles overlaps)
engram import --file backup.json

# Merge two databases
engram export -o db_a.json      # on machine A
engram import --file db_a.json  # on machine B
```

Full command reference with all options: `SKILLS/engram/references/cli_reference.md`
Advanced usage patterns: `SKILLS/engram/references/effective_usage.md`
Engram has comprehensive test coverage across all modules:
```bash
cd SKILLS/engram
npm test
```

| Test Suite | Coverage |
|---|---|
| `db.test.js` | Schema, migrations, vector index |
| `embeddings.test.js` | BGE-M3 embedding + cosine similarity |
| `memory.test.js` | CRUD, dedup, search (semantic/FTS/hybrid), graph links |
| `reranker.test.js` | Cross-encoder scoring + ranking |
| `session_foa_consolidation.test.js` | Sessions, FoA recall, sleep cycle |
| `enhancements.test.js` | Edge cases, N+1 optimizations |
| `migrate.test.js` | Legacy format migration |
| Component | Technology | Role |
|---|---|---|
| Runtime | Node.js ≥ 20 | ESM modules, native test runner |
| Database | LibSQL | SQLite-compatible with vector extensions |
| Embeddings | BGE-M3 via Transformers.js | 1024-dim, 100+ languages |
| Reranker | BGE-reranker-base | Cross-encoder precision scoring |
| Full-Text Search | SQLite FTS5 | BM25 lexical ranking |
| Vector Index | DiskANN (LibSQL built-in) | Approximate nearest neighbors |
| CLI | Commander.js | Command parsing + help generation |
| Feature | Engram | Mem0 | Zep | MemGPT/Letta |
|---|---|---|---|---|
| LLM required for writes | ✅ No | ❌ Yes | ❌ Yes | ❌ Yes |
| External services | ✅ None | Vector DB + LLM API | Cloud service | LLM API |
| Knowledge graph | ✅ SQLite-embedded | ✅ Neo4j | ❌ No | ❌ No |
| Forgetting/decay | ✅ Ebbinghaus-based | ❌ No | ❌ No | ❌ No |
| Cross-encoder reranking | ✅ Local | ❌ No | ❌ No | ❌ No |
| Hybrid search (RRF) | ✅ Semantic + FTS5 | ❌ Vector only | ✅ Vector + metadata | ❌ No |
| Memory types | ✅ 5 cognitive types | ❌ Untyped facts | ❌ Untyped | ❌ Untyped |
| Portable (single file) | ✅ SQLite | ❌ No | ❌ No | ❌ No |
| Cost per operation | $0 (local inference) | $ (LLM API calls) | $$ (cloud) | $ (LLM API calls) |
```
engram-ai-memory/
├── README.md                      # This file
├── SKILLS/
│   └── engram/                    # ← portable skill folder
│       ├── SKILL.md               # Agent integration guide
│       ├── package.json           # Dependencies (3 packages)
│       ├── src/
│       │   ├── cli.js             # CLI entry point (Commander)
│       │   ├── db.js              # Schema, migrations, LibSQL client
│       │   ├── memory.js          # CRUD, search, graph links
│       │   ├── embeddings.js      # BGE-M3 + BGE-reranker
│       │   ├── foa.js             # Focus of Attention (recall)
│       │   ├── consolidation.js   # Sleep: decay, prune, merge, boost
│       │   ├── session.js         # Session management
│       │   ├── migrate.js         # Legacy import
│       │   └── __tests__/         # Test suites (node:test)
│       ├── references/
│       │   ├── cli_reference.md   # Complete CLI documentation
│       │   └── effective_usage.md # Advanced patterns & best practices
│       ├── scripts/
│       │   ├── session-start.ps1  # Auto-load context on session start
│       │   └── remember.ps1       # Batch memory ingestion helper
│       └── data/
│           └── engram.db          # SQLite database (auto-created)
└── ...
```
Contributions are welcome! Here are some areas that could use help:

- Step 4: Pattern Extraction – LLM-based pattern extraction during `sleep` (currently a placeholder)
- Cross-platform scripts – Bash equivalents for `session-start.ps1` / `remember.ps1`
- WebGPU acceleration – currently CPU-only; WebGPU support is stubbed but untested
- Turso cloud sync – LibSQL supports cloud sync; could enable multi-device memory
- More memory types – domain-specific types beyond the cognitive five
- Visualization – graph visualization of the knowledge network
- Daemon mode – keep models in memory for sub-second responses
```bash
cd SKILLS/engram

# Install dependencies
npm install

# Run tests
npm test

# Run CLI in dev mode
npm run cli -- recall "test query"

# Enable diagnostic logging
ENGRAM_TRACE=1 engram recall "test query"
```

MIT – use it however you want.
Built with neuroscience, information retrieval theory, and a healthy distrust of API bills.