Indexing Crate

Universal index building service for MIA's 9+ databases - zero-copy, multi-resolution, semantically aware.

📚 For complete architecture details, see ARCHITECTURE.md
Explains: What indexing IS, multi-resolution embeddings, how modules work together, the refactoring plan

What This Does

Builds search structures for ALL MIA databases:

Semantic search → HNSW indexes for vector similarity (O(log n))
Graph traversal → Adjacency lists for relationships (O(1))
Property lookups → B-tree indexes for metadata (O(log n))
Multi-resolution → Fast (0.6B) + Accurate (8B) embeddings

Quick Example

use storage::StorageManager;

// Create storage with automatic indexing
let storage = StorageManager::with_indexing("my_db")?;
let idx = storage.index_manager().unwrap();

// Query by property (zero-copy!)
if let Some(guard) = idx.get_nodes_by_property("chat_id", "chat_123")? {
    for node_id in guard.iter_strs() {
        println!("Found: {}", node_id);
    }
}

// Graph traversal
if let Some(guard) = idx.get_outgoing_edges("user_5")? {
    for edge_id in guard.iter_strs() {
        println!("Edge: {}", edge_id);
    }
}

// Semantic search
let results = idx.search_vectors(&query_vector, 10)?;

How It Works

IndexManager (public API)
    ├── StructuralIndex → B+ tree for properties
    ├── GraphIndex      → Adjacency lists for edges
    └── VectorIndex     → HNSW for embeddings

Key innovation: Zero-copy reads via transaction-backed guards (no allocations!).

Module Organization

Module	Purpose	When to Use
core/	Essential implementations	Always (automatic)
lock_free/	Concurrent structures	High-concurrency workloads
algorithms/	Graph analysis	Path finding, communities
advanced/	Optional features	Tiered storage, filtering
utils/	Helpers	Caching, metrics, batch ops

→ See each module's README for details.

Quick Start

1. Add Dependency

[dependencies]
indexing = { path = "../indexing" }
storage = { path = "../storage" }

2. Use via Storage

// Indexing happens automatically
let storage = StorageManager::with_indexing("db_path")?;

// Insert data (indexes update automatically)
storage.insert_node(node)?;
storage.insert_edge(edge)?;

// Query indexes
let idx = storage.index_manager().unwrap();
let results = idx.get_nodes_by_property("sender", "user_5")?;

3. Iterate Zero-Copy

if let Some(guard) = idx.get_nodes_by_property("chat_id", "chat_123")? {
    // guard.iter_strs() borrows from mmap - NO allocations!
    for id in guard.iter_strs() {
        process(id);
    }
    
    // O(1) count
    println!("Found {} nodes", guard.len());
}

Performance

Operation	Time	Space	Implementation
Property query	O(log n)	0 bytes	libmdbx B+ tree + guard
Graph neighbor	O(1)	0 bytes	Adjacency list + guard
Vector search	O(log n)	N vectors	HNSW approximate

Benchmark: 10K node query = 100x faster than traditional deserialization.

Common Patterns

Zero-Copy (Fastest)

// ✅ No allocations
if let Some(guard) = idx.get_nodes_by_property("sender", "user_5")? {
    for id in guard.iter_strs() {
        process(id);
    }
}

Owned Data (When Needed)

// ⚠️ Explicit allocation
let guard = idx.get_nodes_by_property("topic", "rust")?;
let owned = guard.to_owned()?;

tokio::spawn(async move {
    process(owned).await;
});

Key Features

✅ Zero-copy reads - Direct mmap access, no allocations
✅ Automatic sync - Indexes update on every mutation
✅ ACID transactions - libmdbx guarantees
✅ Pure Rust - No external dependencies
✅ Concurrent - Lock-free options available

Testing

cargo test           # Run all tests
cargo bench          # Run benchmarks
cargo check          # Check compilation

Documentation

ARCHITECTURE.md - START HERE! Complete explanation of what indexing is
Module READMEs - src/*/README.md for each module
TODO.md - Implementation status

Key Concepts

Indexing is a SERVICE, not a database owner!

storage owns → knowledge.mdbx
               ├── nodes (data)
               ├── edges (data)
               ├── structural_index ← INDEXING builds this
               └── graph_outgoing   ← INDEXING builds this

indexing builds indexes IN storage's database!

Multi-Resolution Embeddings:

embeddings.mdbx
├── 0.6b_vectors → FAST search (Stage 1: 1M→1K candidates, <1ms)
└── 8b_vectors   → ACCURATE rerank (Stage 2: 1K→100 best, <10ms)

Summary

What indexing is:

Universal index builder for ALL MIA databases
Provides: HNSW (vectors), adjacency lists (graph), B-trees (properties)
Takes storage's env pointers (doesn't create DBs)
Serves 9+ databases with same service

Start reading:

ARCHITECTURE.md - Full explanation
src/core/README.md - Core implementations
src/algorithms/README.md - Graph algorithms
src/advanced/README.md - Enterprise features

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Indexing Crate

What This Does

Quick Example

How It Works

Module Organization

Quick Start

1. Add Dependency

2. Use via Storage

3. Iterate Zero-Copy

Performance

Common Patterns

Zero-Copy (Fastest)

Owned Data (When Needed)

Key Features

Testing

Documentation

Key Concepts

Indexing is a SERVICE, not a database owner!

Multi-Resolution Embeddings:

Summary

FilesExpand file tree

README.md

Latest commit

History

README.md

File metadata and controls

Indexing Crate

What This Does

Quick Example

How It Works

Module Organization

Quick Start

1. Add Dependency

2. Use via Storage

3. Iterate Zero-Copy

Performance

Common Patterns

Zero-Copy (Fastest)

Owned Data (When Needed)

Key Features

Testing

Documentation

Key Concepts

Indexing is a SERVICE, not a database owner!

Multi-Resolution Embeddings:

Summary