The converged query engine for the TabAgent embedded database.
The query crate implements the Converged Query Pipeline, a multi-stage query execution system that fuses three distinct query facets into a unified interface:
- Structural Filters - Fast exact matching on indexed properties
- Graph Traversals - Relationship-based filtering using BFS
- Semantic Search - Vector similarity ranking
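To make the graph facet concrete, here is a minimal sketch of a depth-limited BFS. It is not the crate's implementation: the `Adjacency` alias and the `nodes_within_depth` function are hypothetical names, and the graph is assumed to be a plain in-memory adjacency list of outbound edges.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Hypothetical adjacency list: node ID -> outbound neighbour IDs.
pub type Adjacency = HashMap<String, Vec<String>>;

/// Returns every node reachable from `start` within `max_depth` hops.
/// The start node itself is not included in the result.
pub fn nodes_within_depth(graph: &Adjacency, start: &str, max_depth: usize) -> HashSet<String> {
    let mut visited: HashSet<String> = HashSet::new();
    visited.insert(start.to_string());
    let mut reached = HashSet::new();
    let mut queue = VecDeque::new();
    queue.push_back((start.to_string(), 0usize));

    while let Some((node, depth)) = queue.pop_front() {
        if depth == max_depth {
            continue; // do not expand nodes that already sit at the depth limit
        }
        if let Some(neighbours) = graph.get(&node) {
            for next in neighbours {
                // `insert` returns false when the node was already visited
                if visited.insert(next.clone()) {
                    reached.insert(next.clone());
                    queue.push_back((next.clone(), depth + 1));
                }
            }
        }
    }
    reached
}
```

Because the queue stores each node's depth, the search stops expanding at the limit instead of visiting the whole graph.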
┌─────────────────────────────────────────────────┐
│          STAGE 1: Candidate Generation          │
│                                                 │
│   ┌─────────────────┐    ┌─────────────────┐    │
│   │   Structural    │    │      Graph      │    │
│   │     Filters     │───▶│    Traversal    │    │
│   │  (Exact Match)  │    │      (BFS)      │    │
│   └─────────────────┘    └─────────────────┘    │
│            │                      │             │
│            └───────────┬──────────┘             │
│                        ▼                        │
│                  Intersection                   │
│            (Accurate Candidate Set)             │
└─────────────────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────┐
│          STAGE 2: Semantic Re-ranking           │
│                                                 │
│     ┌─────────────────────────────────────┐     │
│     │     Vector Search on Candidates     │     │
│     │             (HNSW ANN)              │     │
│     └─────────────────────────────────────┘     │
│                        │                        │
│                        ▼                        │
│                 Ranked Results                  │
└─────────────────────────────────────────────────┘
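The two stages in the diagram can be sketched in a few lines. This is an illustration under stated assumptions, not the crate's code: node IDs are plain `u32` values, the embedding store is a hypothetical slice of `(id, vector)` pairs, and stage 2 uses brute-force cosine scoring in place of the HNSW ANN index so that the example stays self-contained.

```rust
use std::collections::HashSet;

/// Cosine similarity between two equal-length vectors (0.0 if either has zero norm).
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Stage 1: keep only IDs present in both candidate sets.
/// Stage 2: score the survivors against the query vector, best first.
pub fn two_stage_query(
    structural_hits: &HashSet<u32>,
    graph_hits: &HashSet<u32>,
    embeddings: &[(u32, Vec<f32>)], // hypothetical (node id, embedding) store
    query_vector: &[f32],
    limit: usize,
) -> Vec<(u32, f32)> {
    let candidates: HashSet<u32> = structural_hits.intersection(graph_hits).copied().collect();

    let mut ranked: Vec<(u32, f32)> = embeddings
        .iter()
        .filter(|(id, _)| candidates.contains(id))
        .map(|(id, emb)| (*id, cosine_similarity(emb, query_vector)))
        .collect();

    // Sort by descending similarity, then apply the result limit.
    ranked.sort_by(|a, b| b.1.total_cmp(&a.1));
    ranked.truncate(limit);
    ranked
}
```

Only nodes that survive the intersection are ever scored, which is the point of running the filters first.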
- Accuracy First, Relevance Second: Structural and graph filters ensure factual accuracy before semantic ranking ensures relevance.
- Efficient Filtering: Filtering first with fast indexes means semantic search runs only on a small candidate set rather than on every node, which is where the performance gain comes from.
- Composable Queries: Each query facet is optional, and the facets can be combined in any way.
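One way to make facets optional is to treat an omitted facet as "no constraint" rather than as an empty set. The sketch below shows that idea for the two filtering facets; `combine_facets` is a hypothetical helper, and how the crate itself handles a query with no filters is an assumption left to the caller here.

```rust
use std::collections::HashSet;

/// Combines two optional candidate sets.
/// `None` means the facet was not specified, so it imposes no constraint.
/// `Some(empty set)` means the facet ran and matched nothing.
pub fn combine_facets(
    structural: Option<HashSet<String>>,
    graph: Option<HashSet<String>>,
) -> Option<HashSet<String>> {
    match (structural, graph) {
        // Both facets present: a node must satisfy both.
        (Some(s), Some(g)) => Some(s.intersection(&g).cloned().collect()),
        // Exactly one facet present: it alone defines the candidates.
        (Some(only), None) | (None, Some(only)) => Some(only),
        // No filtering facet: the caller decides what to do next.
        (None, None) => None,
    }
}
```

Keeping `None` distinct from an empty set avoids the bug where an unspecified filter silently wipes out every candidate.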
Core data types:
- ConvergedQuery - The top-level query specification
- SemanticQuery - Vector search parameters
- StructuralFilter - Property-based filters with operators
- GraphFilter - Relationship traversal specification
- QueryResult - Result containing node and optional similarity score
- Path - Ordered nodes and edges for graph traversals
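For orientation, the query types might look roughly like the following. The field names are taken from the usage examples in this README, but everything else is an assumption: the integer and float widths are guesses, `value` is a `String` stand-in for the JSON value the examples pass, and `active_facets` is a hypothetical helper added only for illustration.

```rust
/// Comparison applied by a structural filter. Only `Equals` appears in this README.
#[derive(Debug, Clone, PartialEq)]
pub enum FilterOperator {
    Equals,
}

#[derive(Debug, Clone, PartialEq)]
pub enum EdgeDirection {
    Outbound,
    Inbound,
}

#[derive(Debug, Clone, PartialEq)]
pub struct StructuralFilter {
    pub property_name: String,
    pub operator: FilterOperator,
    pub value: String, // stand-in: the usage examples pass a JSON value here
}

#[derive(Debug, Clone, PartialEq)]
pub struct GraphFilter {
    pub start_node_id: String,
    pub direction: EdgeDirection,
    pub edge_type: Option<String>,
    pub depth: u32, // integer width is an assumption
}

#[derive(Debug, Clone, PartialEq)]
pub struct SemanticQuery {
    pub vector: Vec<f32>, // element type is an assumption
    pub similarity_threshold: Option<f32>,
}

#[derive(Debug, Clone, PartialEq)]
pub struct ConvergedQuery {
    pub structural_filters: Option<Vec<StructuralFilter>>,
    pub graph_filter: Option<GraphFilter>,
    pub semantic_query: Option<SemanticQuery>,
    pub limit: usize, // integer width is an assumption
    pub offset: usize,
}

/// Counts how many of the three facets a query actually uses.
pub fn active_facets(q: &ConvergedQuery) -> usize {
    [
        q.structural_filters.is_some(),
        q.graph_filter.is_some(),
        q.semantic_query.is_some(),
    ]
    .iter()
    .filter(|used| **used)
    .count()
}
```

Modelling each facet as an `Option` is what lets a single struct express structural-only, graph-only, and fully converged queries.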
QueryManager is the central orchestrator that:
- Executes the multi-stage pipeline
- Manages candidate set intersection
- Coordinates storage and indexing layers
- Provides high-level convenience APIs
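Shortest-path search is one example of such a convenience operation. A common way to implement it on an unweighted graph is BFS with a predecessor map, sketched below. The `shortest_path` function and the adjacency-list representation are hypothetical and say nothing about how the crate stores edges; the sketch also returns only the node sequence, without the edges.

```rust
use std::collections::{HashMap, VecDeque};

/// Breadth-first search over an unweighted, directed adjacency list.
/// Returns the nodes on a shortest path from `from` to `to`, inclusive,
/// or `None` if `to` cannot be reached.
pub fn shortest_path(
    graph: &HashMap<String, Vec<String>>,
    from: &str,
    to: &str,
) -> Option<Vec<String>> {
    // The predecessor map doubles as the visited set.
    let mut prev: HashMap<String, Option<String>> = HashMap::new();
    prev.insert(from.to_string(), None);
    let mut queue = VecDeque::new();
    queue.push_back(from.to_string());

    while let Some(node) = queue.pop_front() {
        if node == to {
            // Walk the predecessors back to the start, then reverse.
            let mut path = vec![node.clone()];
            let mut current = node;
            while let Some(Some(p)) = prev.get(&current) {
                path.push(p.clone());
                current = p.clone();
            }
            path.reverse();
            return Some(path);
        }
        for next in graph.get(&node).into_iter().flatten() {
            if !prev.contains_key(next) {
                prev.insert(next.clone(), Some(node.clone()));
                queue.push_back(next.clone());
            }
        }
    }
    None
}
```

BFS visits nodes in order of hop count, so the first time the target is dequeued the recorded predecessors form a path with the fewest edges.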
Find all messages in a specific chat:
use query::{QueryManager, models::*};
use serde_json::json;

let query = ConvergedQuery {
    structural_filters: Some(vec![
        StructuralFilter {
            property_name: "chat_id".to_string(),
            operator: FilterOperator::Equals,
            value: json!("chat_123"),
        },
        StructuralFilter {
            property_name: "node_type".to_string(),
            operator: FilterOperator::Equals,
            value: json!("Message"),
        },
    ]),
    semantic_query: None,
    graph_filter: None,
    limit: 10,
    offset: 0,
};
// query_mgr is assumed to be an already-initialized QueryManager.
let results = query_mgr.query(&query)?;

Find all nodes within 2 hops of a starting node:
let query = ConvergedQuery {
    structural_filters: None,
    semantic_query: None,
    graph_filter: Some(GraphFilter {
        start_node_id: "entity_abc".to_string(),
        direction: EdgeDirection::Outbound,
        edge_type: Some("MENTIONS".to_string()),
        depth: 2,
    }),
    limit: 50,
    offset: 0,
};
let results = query_mgr.query(&query)?;

Find semantically similar messages in a specific chat that mention a particular entity:
let query = ConvergedQuery {
    structural_filters: Some(vec![
        StructuralFilter {
            property_name: "chat_id".to_string(),
            operator: FilterOperator::Equals,
            value: json!("chat_123"),
        },
    ]),
    graph_filter: Some(GraphFilter {
        start_node_id: "entity_project_phoenix".to_string(),
        direction: EdgeDirection::Inbound,
        edge_type: Some("MENTIONS".to_string()),
        depth: 1,
    }),
    semantic_query: Some(SemanticQuery {
        // embedding_vector is the query embedding, computed elsewhere.
        vector: embedding_vector,
        similarity_threshold: Some(0.7),
    }),
    limit: 5,
    offset: 0,
};
let results = query_mgr.query(&query)?;

Find the shortest path between two nodes:
let path = query_mgr.find_shortest_path("node_a", "node_b")?;
if let Some(path) = path {
    println!("Path length: {}", path.nodes.len());
    println!("Edges: {}", path.edges.len());
}

| Operation | Complexity | Notes |
|---|---|---|
| Structural Filter | O(1) to O(log n) | Uses secondary indexes |
| Graph Traversal | O(E + V) | BFS, limited by depth |
| Semantic Search | O(log n) | HNSW ANN on candidate set |
| Converged Query | O(C * log C) | C = candidate set size |
Key Insight: Filtering down to a small candidate set (C) before semantic search means the ranking cost depends on C rather than on the total number of nodes, so queries stay fast even on large datasets.
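The "O(1) to O(log n)" range for structural filters matches the usual trade-off between a hash-based and an ordered index. The sketch below shows both as in-memory maps. Which structure the indexing layer actually uses is not stated in this README, so `HashIndex` and `lookup_ordered` are hypothetical names chosen for illustration.

```rust
use std::collections::{BTreeMap, HashMap, HashSet};

/// Hypothetical hash-based secondary index: (property, value) -> node IDs.
/// Exact-match lookups are O(1) on average.
#[derive(Default)]
pub struct HashIndex {
    entries: HashMap<(String, String), HashSet<u64>>,
}

impl HashIndex {
    /// Records that `node_id` has `property == value`.
    pub fn insert(&mut self, property: &str, value: &str, node_id: u64) {
        self.entries
            .entry((property.to_string(), value.to_string()))
            .or_default()
            .insert(node_id);
    }

    /// Returns all node IDs with `property == value` (empty if none).
    pub fn lookup(&self, property: &str, value: &str) -> HashSet<u64> {
        self.entries
            .get(&(property.to_string(), value.to_string()))
            .cloned()
            .unwrap_or_default()
    }
}

/// Ordered alternative: the same exact-match lookup on a BTreeMap is O(log n),
/// in exchange for keeping keys sorted.
pub fn lookup_ordered(
    index: &BTreeMap<(String, String), HashSet<u64>>,
    property: &str,
    value: &str,
) -> HashSet<u64> {
    index
        .get(&(property.to_string(), value.to_string()))
        .cloned()
        .unwrap_or_default()
}
```

Either way, the lookup returns a set of node IDs that can be intersected with the other facets before any vectors are compared.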
query
├── Depends on: storage (CRUD operations)
├── Depends on: indexing (secondary indexes, graph, vector search)
├── Depends on: common (types, errors)
└── Used by: Python bindings (via PyO3)
Run tests:
cargo test -p query

Current test coverage:
- ✅ Structural filtering
- ✅ Empty result sets
- ✅ QueryManager initialization
- ✅ Doc test examples
Planned enhancements:
- Complex Filter Logic: Support nested AND/OR expressions
- Range Queries: GreaterThan, LessThan operators on indexed fields
- Multi-Hop Graph Patterns: Cypher-like pattern matching
- Cursor-Based Pagination: For efficient large result sets
- Query Optimization: Cost-based query planning
- Caching: Frequently-used query result caching
Related documentation:
- StorageLayer.md - CRUD operations
- IndexingLayer.md - Secondary indexes
- QueryEngine.md - Full specification