
HyperMind

🧠 Intelligent Memory Layer for Large Language Models

License: MIT | TypeScript | Cloudflare Workers

Transform stateless LLMs into context-aware AI agents with persistent, optimized memory


✨ What is HyperMind?

HyperMind is a production-grade memory proxy that sits between your application and any LLM provider (OpenAI, Anthropic, Groq, Google). It automatically manages conversation context and long-term memory using cognitive science principles and advanced optimization techniques, preventing vector database bloat while retaining the memories that matter. Try the demo at HyperMind chat (experimental).

🎯 The Problem

  • LLMs are stateless - they forget everything after each conversation
  • Vector databases grow indefinitely, causing performance degradation
  • Building persistent memory is complex and expensive
  • No intelligent filtering - everything gets stored, even irrelevant content
  • Context windows are limited and expensive to extend

🚀 The Solution

HyperMind provides a universal memory layer with comprehensive optimization:

# Instead of calling providers directly:
curl https://api.openai.com/v1/chat/completions
curl https://api.anthropic.com/v1/messages
curl https://api.groq.com/openai/v1/chat/completions
curl https://generativelanguage.googleapis.com/v1beta/openai/chat/completions

# Call HyperMind (same API, but with intelligent memory):
curl https://your-hypermind.workers.dev/router/v1/chat/completions

Your AI now remembers everything - while staying fast and cost-efficient.
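
For SDK users the change is equally small: point the client's base URL at your Worker. A minimal sketch using the official openai npm package (the header names match the curl examples in this README; the deployment URL is a placeholder):

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.GROQ_API_KEY, // your own provider key - HyperMind adds no markup
  baseURL: "https://your-hypermind.workers.dev/router/v1",
  defaultHeaders: {
    "x-hypermind-user-id": "user123", // memory is scoped to this user
    "x-hypermind-provider": "groq",   // which upstream provider to proxy to
  },
});

const reply = await client.chat.completions.create({
  model: "llama-3.3-70b-versatile",
  messages: [{ role: "user", content: "Remember: I prefer TypeScript." }],
});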


🌟 Key Features

🧠 Memory Router

  • 🔌 Universal Proxy: Works with any LLM provider (OpenAI, Anthropic, Groq, Google)
  • 🔄 Multi-Provider Support: Seamlessly switch between providers while maintaining memory
  • ⚡ Low Latency: Transparent proxy adds <700ms overhead
  • 💰 Cost Transparent: Uses your API keys, zero markup

๐Ÿ” Hybrid Search Engine

Combines three search strategies for comprehensive memory retrieval:

  1. 🎯 Vector Search - Semantic similarity using embeddings
  2. 🕸️ Graph Traversal - Entity relationships and knowledge graphs
  3. ⏰ Chronological - Recent context and temporal relevance

🎛️ Intelligent Memory Optimization

Prevents vector database bloat with advanced techniques:

  1. 🔗 Smart Deduplication - Detects and merges similar memories (90% similarity threshold)
  2. 📊 Significance Filtering - Skips low-value content (greetings, filler, acknowledgments)
  3. 📦 Tiered Archival - Moves old memories through Hot→Warm→Cold→Archived tiers
  4. 🔄 Memory Consolidation - Clusters and summarizes related memories
  5. ⚡ Batch Processing - Queues embeddings for efficient API usage

Result: 40-60% storage reduction, 2-3x faster search, 50-70% fewer API calls

📊 Knowledge Graph

  • 🔗 Temporal Triplets: Subject-Predicate-Object with time validity
  • 🏷️ Entity Extraction: Automatic extraction of people, places, concepts
  • 📝 Episodic Classification: Categorizes memories by type (comparison, question, definition, list, factual)
  • 📉 Smart Decay: Different forgetting rates for different memory types
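
The triplet shape these features imply, written out as a TypeScript type (field names mirror the SQL example in the Database Schema section; the type itself is an illustration, not the repo's source):

// Subject-Predicate-Object with time validity, classified by episode type.
type EpisodicType = "comparison" | "question" | "definition" | "list" | "factual";

interface TemporalTriplet {
  subject: string;      // e.g. "user123"
  predicate: string;    // e.g. "prefers"
  object: string;       // e.g. "TypeScript"
  episodicType: EpisodicType;
  validFrom: string;    // ISO date the fact became true
  validTo?: string;     // open while the fact still holds
}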

โฑ๏ธ Cognitive Science Integration

Based on Ebbinghaus' Forgetting Curve:

Tier Age Vector Search Status
๐Ÿ”ฅ Hot 0-7 days Active Full access
๐ŸŒก๏ธ Warm 7-30 days Active Full access
โ„๏ธ Cold 30-90 days Active Lower priority
๐Ÿ“ฆ Archived 90+ days Removed D1 only
๐Ÿ—„๏ธ Ancient 180+ days Compressed R2 storage (optional)
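
The underlying idea is Ebbinghaus' retention function R = e^(-t/S): relevance decays with age t, more slowly for memory types with higher stability S. A sketch of how "smart decay" could assign per-type rates (the stability constants are illustrative placeholders, not HyperMind's tuned values):

const STABILITY_DAYS: Record<string, number> = {
  factual: 90,      // facts about the user fade slowly
  definition: 60,
  comparison: 30,
  question: 14,     // transient questions fade fast
};

// Returns 1.0 for a fresh memory, approaching 0 as it ages.
function retention(ageDays: number, episodicType: string): number {
  const stability = STABILITY_DAYS[episodicType] ?? 30;
  return Math.exp(-ageDays / stability);
}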

🚀 Quick Start

1. Deploy HyperMind (1-click)

Deploy to Cloudflare Workers

2. Get Your API Key

Sign up with any supported LLM provider (OpenAI, Anthropic, Groq, or Google) and grab an API key.

3. Make Your First Request

curl -X POST "https://your-hypermind.workers.dev/router/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "x-hypermind-user-id: user123" \
  -H "x-hypermind-provider: groq" \
  -d '{
    "model": "llama-3.3-70b-versatile",
    "messages": [
      {"role": "user", "content": "I am building a quantum computing system with 127 qubits"}
    ]
  }'

4. Test Memory Recall

curl -X POST "https://your-hypermind.workers.dev/router/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "x-hypermind-user-id: user123" \
  -H "x-hypermind-provider: groq" \
  -d '{
    "model": "llama-3.3-70b-versatile", 
    "messages": [
      {"role": "user", "content": "What quantum computing project am I working on?"}
    ]
  }'

Response: "You're building a quantum computing system with 127 qubits..." ✨


๐Ÿ—๏ธ Architecture

Request Flow

sequenceDiagram
    participant App as Your Application
    participant Router as Memory Router
    participant Search as Hybrid Search
    participant Storage as Storage Layer
    participant LLM as LLM Provider
    participant Optim as Optimization
    
    App->>Router: Chat Request<br/>(user message)
    
    Note over Router: Step 1: Memory Retrieval
    Router->>Search: Find relevant memories
    
    par Parallel Search
        Search->>Storage: Vector Search (semantic)
        Search->>Storage: Graph Traversal (entities)
        Search->>Storage: Chronological (recent)
    end
    
    Storage-->>Search: Combined Results
    Search-->>Router: Top 15 relevant memories
    
    Note over Router: Step 2: Context Injection
    Router->>Router: Inject memories into prompt
    
    Note over Router: Step 3: LLM Request
    Router->>LLM: Enhanced request<br/>(with context)
    LLM-->>Router: Response
    
    Router-->>App: Final Response<br/>(with memory)
    
    Note over Router: Step 4: Background Storage
    Router->>Optim: Store conversation async
    
    Optim->>Optim: Analyze Significance<br/>(score: 0.0-1.0)
    
    alt Low Significance (< 0.6)
        Optim->>Optim: Discard ❌
    else High Significance (>= 0.6)
        Optim->>Optim: Check for duplicates<br/>(hash + similarity)
        
        alt Similar Memory Found (> 0.9)
            Optim->>Storage: Merge with existing 🔗
        else New Memory
            Optim->>Optim: Add to batch queue
            Optim->>Storage: Store when batch full
        end
    end
    
    Note over Storage: Tiered Storage
    Storage->>Storage: Hot (0-7d): Active<br/>Warm (7-30d): Active<br/>Cold (30-90d): Active<br/>Archived (90d+): D1 only<br/>Ancient (180d+): R2

Storage Infrastructure

| Layer | Technology | Purpose | Data Retention |
|-------|------------|---------|----------------|
| Active Index | Cloudflare Vectorize | Semantic search on hot/warm/cold memories | 0-90 days |
| Primary DB | Cloudflare D1 (SQLite) | All memories, entities, triplets | Forever |
| Query Cache | Cloudflare KV | LLM analysis results | 1 hour TTL |
| Cold Archive | Cloudflare R2 (optional) | Compressed ancient memories | 180+ days |
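
In wrangler.toml these layers appear as Worker bindings; a sketch with placeholder binding names and IDs (match them to the resources created in Environment Setup below):

[[d1_databases]]
binding = "DB"
database_name = "hypermind-prod"
database_id = "<your-d1-id>"

[[vectorize]]
binding = "VECTORIZE"
index_name = "hypermind-embeddings"

[[kv_namespaces]]
binding = "CACHE"
id = "<your-kv-namespace-id>"

# Optional cold archive
[[r2_buckets]]
binding = "ARCHIVE"
bucket_name = "hypermind-archive"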

Optimization Pipeline

Incoming Memory
    ↓
[Significance Analysis]
    ↓
Score < 0.6? → Discard ❌
    ↓
[Hash Check]
    ↓
Duplicate? → Skip ❌
    ↓
[Similarity Check]
    ↓
Similar (>0.9)? → Merge 🔗
    ↓
[Batch Queue]
    ↓
Queue Full (50)? → Process Batch
    ↓
[Vector Storage]
    ↓
Stored ✅
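
The same write path condensed into TypeScript (helper signatures are hypothetical stand-ins, declared only so the sketch type-checks; thresholds mirror the defaults in Configuration below):

declare function analyzeSignificance(text: string): Promise<number>;
declare function sha256(text: string): Promise<string>;
declare function hashExists(hash: string): Promise<boolean>;
declare function embed(text: string): Promise<number[]>;
declare function nearestNeighbor(v: number[]): Promise<{ id: string; similarity: number } | null>;
declare function mergeMemories(id: string, content: string): Promise<void>;
declare function enqueueForBatch(item: object): Promise<void>;

async function storeMemory(content: string): Promise<void> {
  const significance = await analyzeSignificance(content);
  if (significance < 0.6) return;                // low value: discard

  const hash = await sha256(content);
  if (await hashExists(hash)) return;            // exact duplicate: skip

  const embedding = await embed(content);
  const nearest = await nearestNeighbor(embedding);
  if (nearest && nearest.similarity > 0.9) {
    return mergeMemories(nearest.id, content);   // near-duplicate: merge
  }

  await enqueueForBatch({ content, embedding, hash, significance });
  // the queue flushes to vector storage once 50 items accumulate
}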

📖 Usage Examples

Memory Router API

# Chat with memory (works with any provider)
curl -X POST "https://your-hypermind.workers.dev/router/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "x-hypermind-user-id: user123" \
  -H "x-hypermind-provider: groq" \
  -d '{
    "model": "llama-3.3-70b-versatile",
    "messages": [
      {"role": "user", "content": "My favorite programming language is Python"}
    ]
  }'

# Switch to different provider (memory persists)
curl -X POST "https://your-hypermind.workers.dev/router/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_OPENAI_KEY" \
  -H "x-hypermind-user-id: user123" \
  -H "x-hypermind-provider: openai" \
  -d '{
    "model": "gpt-4",
    "messages": [
      {"role": "user", "content": "What programming language do I prefer?"}
    ]
  }'

Direct Memory API

# Store a memory manually
curl -X POST "https://your-hypermind.workers.dev/api/memories?userId=user123" \
  -H "Content-Type: application/json" \
  -d '{
    "content": "I prefer TypeScript over JavaScript",
    "metadata": {"source": "manual", "tags": ["programming"]}
  }'

# Search memories
curl -X POST "https://your-hypermind.workers.dev/api/search?userId=user123" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "programming preferences",
    "limit": 5
  }'

⚡ Performance & Optimization

Optimization Features

| Feature | Impact | Description |
|---------|--------|-------------|
| Smart Deduplication | 20-30% reduction | Merges similar memories (cosine similarity > 0.90) |
| Significance Filtering | 30-40% reduction | Skips greetings, filler, low-value content |
| Tiered Archival | 2-3x faster search | Removes old memories from active vector index |
| Memory Consolidation | 30-40% reduction | Clusters related memories into summaries |
| Batch Processing | 50-70% fewer API calls | Queues embeddings for batch processing |

Performance Benchmarks

Before Optimization:

  • Storage: Linear growth, indefinite
  • Search: 5-10s for 10k+ memories
  • API Calls: Every conversation = 1+ embedding calls

After Optimization:

  • Storage: 40-60% reduction
  • Search: 2-3s for 10k+ memories (2-3x faster)
  • API Calls: 50-70% reduction via batching

Configuration

Customize optimization thresholds in wrangler.toml:

[vars]
DEDUP_SIMILARITY_THRESHOLD = "0.90"  # 0.85-0.95 recommended
MIN_SIGNIFICANCE_SCORE = "0.60"      # 0.5-0.7 recommended
CONSOLIDATION_ENABLED = "true"        # Enable memory consolidation
BATCH_EMBEDDING_SIZE = "50"          # Batch size: 10-100
ARCHIVE_COLD_AFTER_DAYS = "90"       # Days before archival: 60-180

Automated Maintenance

HyperMind runs automated tasks via cron triggers:

| Task | Schedule | Purpose |
|------|----------|---------|
| Forgetting Cycle | Daily 2 AM | Update relevance scores, archive old memories |
| Consolidation | Daily 3 AM | Cluster and summarize related memories |
| Batch Processing | Every 30 min | Process queued embeddings |
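
Expressed as Worker cron triggers in wrangler.toml (times are UTC; the exact expressions are an assumption based on the schedule table above):

[triggers]
crons = [
  "0 2 * * *",     # daily 2 AM - forgetting cycle
  "0 3 * * *",     # daily 3 AM - consolidation
  "*/30 * * * *"   # every 30 min - batch embedding processing
]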

🛠️ Development

Local Setup

git clone https://github.com/vashuteotia123/hypermind.git
cd hypermind
npm install
npm run dev

Environment Setup

# Create Cloudflare resources
wrangler d1 create hypermind-prod
wrangler vectorize create hypermind-embeddings --dimensions=768 --metric=cosine
wrangler kv:namespace create CACHE

# Optional: Create R2 bucket for ancient memory archival
wrangler r2 bucket create hypermind-archive

# Update wrangler.toml with your resource IDs

Database Migration

# Apply migrations to production
wrangler d1 migrations apply hypermind-prod --remote

Testing

npm test              # Run tests
npm run test:coverage # With coverage
npm run lint          # Code quality

📊 Database Schema

Core Tables

  • memories: Conversation storage with optimization metadata
  • memory_consolidations: Tracks consolidated memory summaries
  • entities: Extracted entities (people, places, concepts)
  • temporal_triplets: Subject-Predicate-Object relationships
  • forgetting_config: Per-user decay settings

Optimization Fields

-- New fields in memories table
significance_score REAL DEFAULT 1.0    -- 0.0-1.0 importance score
consolidated INTEGER DEFAULT 0         -- Is this memory consolidated?
consolidated_into TEXT                 -- Reference to summary memory
vector_archived INTEGER DEFAULT 0      -- Removed from vector index?
r2_archived INTEGER DEFAULT 0          -- Stored in R2?
dedup_hash TEXT                        -- Hash for duplicate detection
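
A hypothetical D1 migration adding these fields (the repo's actual migration files may differ; SQLite allows one ADD COLUMN per statement):

ALTER TABLE memories ADD COLUMN significance_score REAL DEFAULT 1.0;
ALTER TABLE memories ADD COLUMN consolidated INTEGER DEFAULT 0;
ALTER TABLE memories ADD COLUMN consolidated_into TEXT;
ALTER TABLE memories ADD COLUMN vector_archived INTEGER DEFAULT 0;
ALTER TABLE memories ADD COLUMN r2_archived INTEGER DEFAULT 0;
ALTER TABLE memories ADD COLUMN dedup_hash TEXT;
CREATE INDEX IF NOT EXISTS idx_memories_dedup_hash ON memories (dedup_hash);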

Knowledge Graph

-- Example temporal triplet
INSERT INTO temporal_triplets (subject, predicate, object, episodic_type, valid_from)
VALUES ('user123', 'prefers', 'TypeScript', 'factual', '2024-01-01');

🎯 Use Cases

🤖 AI Chatbots

Build chatbots that remember user preferences, conversation history, and context across sessions - without bloating your database.

📚 RAG Applications

Use HyperMind as your vector store with automatic optimization for document-based AI applications.

🎓 Educational AI

Create AI tutors that remember student progress, learning patterns, and knowledge gaps - with intelligent consolidation.

💼 Business AI

Build AI assistants that remember customer interactions while archiving old, irrelevant data automatically.

🎮 Gaming AI

Create NPCs with persistent memory that evolves and consolidates over time.


🔧 Advanced Features

Smart Deduplication

Prevents storing duplicate or near-duplicate memories:

// Automatic similarity detection
const similarity = cosineSimilarity(newEmbedding, existingEmbedding);
if (similarity > 0.90) {
  // Merge with existing memory instead of creating new
  await mergeMemories(existing, newContent);
}
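
For reference, a minimal cosine similarity over raw embedding vectors (the repo's own implementation may differ):

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}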

Significance Filtering

Filters out low-value content automatically:

  • โŒ Generic greetings: "hi", "hello", "thanks"
  • โŒ Acknowledgments: "ok", "got it", "understood"
  • โŒ Emoji-only messages
  • โŒ Very short content (< 20 characters)
  • โœ… Technical discussions (high significance score)
  • โœ… Personal information (high significance score)
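
A heuristic sketch of such a scorer (the rules and the 20-character cutoff follow the list above; the score values are illustrative assumptions, and the real scorer may use an LLM):

const FILLER = new Set(["hi", "hello", "thanks", "ok", "got it", "understood"]);
const EMOJI_ONLY = /^[\p{Extended_Pictographic}\u{FE0F}\u{200D}\s]+$/u;

function significanceScore(content: string): number {
  const text = content.trim().toLowerCase();
  if (text.length < 20) return 0.1;           // very short content
  if (FILLER.has(text)) return 0.1;           // greetings / acknowledgments
  if (EMOJI_ONLY.test(content)) return 0.1;   // emoji-only messages
  return 0.8;                                 // likely worth storing
}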

Memory Consolidation

Automatically clusters and summarizes related memories:

// Daily consolidation process
1. Find related memories (cosine similarity > 0.70)
2. Group into clusters (3+ memories per cluster)
3. Generate summary memory
4. Mark originals as consolidated
5. Update vector index with summary

Result: 30-40% reduction in active corpus size
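
The five steps above, condensed into a sketch (helper names are hypothetical stand-ins, declared only so the sketch type-checks):

interface Memory { id: string; content: string; embedding: number[] }

declare function greedyCluster(ms: Memory[], minSimilarity: number): Memory[][];
declare function summarize(cluster: Memory[]): Promise<string>;
declare function storeSummary(text: string): Promise<string>;
declare function markConsolidated(cluster: Memory[], summaryId: string): Promise<void>;
declare function upsertVector(id: string, text: string): Promise<void>;

async function consolidate(memories: Memory[]): Promise<void> {
  const clusters = greedyCluster(memories, 0.7);                 // 1. similarity > 0.70
  for (const cluster of clusters.filter((c) => c.length >= 3)) { // 2. 3+ per cluster
    const summary = await summarize(cluster);                    // 3. generate summary
    const summaryId = await storeSummary(summary);
    await markConsolidated(cluster, summaryId);                  // 4. mark originals
    await upsertVector(summaryId, summary);                      // 5. reindex summary
  }
}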

Tiered Archival

Automatically moves memories through storage tiers:

// Archival process
Hot (0-7d)    → Full vector search, all features active
Warm (7-30d)  → Full vector search, lower priority
Cold (30-90d) → Vector search only if needed
Archived      → Removed from vector index, D1 only
Ancient       → Compressed, stored in R2 (optional)

Result: 2-3x faster search on large datasets


๐Ÿค Contributing

We welcome contributions! Please see our Contributing Guide for details.

Development Workflow

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

Areas for Contribution

  • ๐Ÿ› Bug fixes
  • โœจ New features (LLM-powered summarization, multi-language support)
  • ๐Ÿ“š Documentation improvements
  • ๐Ÿงช Test coverage
  • ๐ŸŽจ UI/UX enhancements
  • โšก Performance optimizations

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.


🙏 Acknowledgments

  • Hermann Ebbinghaus for the forgetting curve research
  • Cloudflare for the amazing Workers platform

Built with ❤️ for developers who want their AI to remember - efficiently

⭐ Star us on GitHub
