High-performance AI orchestration with gRPC microservices
Rust workspace providing WebRTC signaling, HTTP API, database layer, and ML orchestration. Spawns and manages a Python ML subprocess for vision/language models.
Rust Server (Orchestrator)
├── HTTP API (REST + OpenAPI)
├── WebRTC Signaling + Data Channels
├── Native Messaging (Chrome Extension)
├── Database Layer (7 MIA databases)
├── Model Management (Download + Cache)
└── Python ML gRPC Client
↓ (spawns subprocess)
Python ML Service (MediaPipe + Transformers + LiteRT)
Design:
- Rust = Brain (orchestration, state, networking)
- Python = Slave (ML inference only)
- gRPC = Communication (language-agnostic, location-transparent)
Main binary - Orchestrates all components
Modes:
- `native` - Native messaging only
- `http` - HTTP API only
- `webrtc` - WebRTC signaling only
- `web` - HTTP + WebRTC (no native)
- `all` - Everything (default)
Auto-starts: Python ML subprocess on startup
See: server/README.md
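The mode flag maps naturally onto an enum. A minimal sketch of how the `--mode` value could be parsed (the `Mode` type and parsing logic are illustrative, not taken from the actual server code):

```rust
use std::str::FromStr;

/// Server run modes, matching the CLI flag values listed above.
#[derive(Debug, PartialEq)]
enum Mode {
    Native,
    Http,
    Webrtc,
    Web,
    All,
}

impl FromStr for Mode {
    type Err = String;
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "native" => Ok(Mode::Native),
            "http" => Ok(Mode::Http),
            "webrtc" => Ok(Mode::Webrtc),
            "web" => Ok(Mode::Web),
            "all" => Ok(Mode::All),
            other => Err(format!("unknown mode: {other}")),
        }
    }
}

fn main() {
    // Default to `all` when no --mode flag is given.
    let mode: Mode = std::env::args()
        .skip_while(|a| a != "--mode")
        .nth(1)
        .map(|s| s.parse().expect("invalid --mode"))
        .unwrap_or(Mode::All);
    println!("{mode:?}");
}
```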
HTTP REST API with OpenAPI documentation
Features:
- Auto-generated routes from traits
- Swagger/ReDoc/RapiDoc UI
- Static file serving
- CORS support
Endpoints:
- `/v1/health` - Health check
- `/v1/models/*` - Model management
- `/v1/chat/*` - Chat completion
- `/v1/webrtc/*` - WebRTC signaling
- `/api-doc/*` - API documentation
See: api/README.md
Centralized state for all components
Components:
- `ModelOrchestrator` - Model lifecycle (download → load → inference)
- Database clients
- ML clients (gRPC to Python)
- Hardware info
- HuggingFace auth
Pattern: Provides AppStateProvider trait for dependency injection
See: appstate/README.md
MIA cognitive architecture - 7-database memory system
Databases:
- `conversations/` - Chat history
- `knowledge/` - Facts, entities
- `embeddings/` - Vector search
- `tool-results/` - Function call results
- `experience/` - Episodic memory
- `summaries/` - Compressed history
- `meta/` - System metadata
Features:
- gRPC service for remote access
- Direct in-process access
- Atomic transactions
- Backup/restore
See: storage/README.md
Shared types, clients, utilities
Key Components:
- `MlClient` - gRPC client for Python ML
- `PythonProcessManager` - Subprocess lifecycle
- `AppStateProvider` - State trait
- Platform-specific paths
- Error types
- gRPC generated code
See: common/README.md
Model download, storage, serving
Features:
- HuggingFace Hub integration
- Parallel chunk downloads
- Resume interrupted downloads
- Local file serving (for Python gRPC)
- Default model library
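Parallel chunk downloading boils down to splitting a file's byte range across workers. A hedged sketch of the range computation (the `chunk_ranges` helper is hypothetical, not the crate's actual API):

```rust
/// Split a file of `total` bytes into `n` contiguous byte ranges
/// (inclusive start/end, as used by HTTP `Range: bytes=start-end`).
/// Hypothetical helper; the real downloader's chunking may differ.
fn chunk_ranges(total: u64, n: u64) -> Vec<(u64, u64)> {
    let base = total / n;
    let rem = total % n;
    let mut ranges = Vec::new();
    let mut start = 0;
    for i in 0..n {
        // Spread the remainder over the first `rem` chunks.
        let len = base + if i < rem { 1 } else { 0 };
        if len == 0 {
            continue;
        }
        ranges.push((start, start + len - 1));
        start += len;
    }
    ranges
}

fn main() {
    // A 10-byte file split into 3 chunks: 4 + 3 + 3 bytes.
    assert_eq!(chunk_ranges(10, 3), vec![(0, 3), (4, 6), (7, 9)]);
}
```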
File Structure:
models/
├── microsoft--Florence-2-base/
│ ├── config.json
│ ├── model.safetensors
│ └── ...
└── meta-llama--Llama-2-7b-hf/
└── ...
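The directory names above suggest the common convention of replacing `/` in a HuggingFace repo id with `--`; a tiny sketch under that assumption:

```rust
/// Map a HuggingFace repo id (e.g. "microsoft/Florence-2-base") to the
/// on-disk directory name shown above. Assumed convention: "/" -> "--".
fn repo_dir_name(repo_id: &str) -> String {
    repo_id.replace('/', "--")
}

fn main() {
    assert_eq!(repo_dir_name("microsoft/Florence-2-base"), "microsoft--Florence-2-base");
    assert_eq!(repo_dir_name("meta-llama/Llama-2-7b-hf"), "meta-llama--Llama-2-7b-hf");
}
```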
WebRTC signaling + data channels
Features:
- SDP offer/answer exchange
- ICE candidate gathering
- Data channel creation
- Video stream handling
- Session management
Use Cases:
- Browser → Rust real-time video
- Rust → Python streaming inference
- Peer-to-peer connections
See: webrtc/README.md
Chrome extension protocol
Features:
- stdin/stdout message framing
- JSON message parsing
- Bi-directional communication
- Extension manifest generation
Protocol:
[4-byte length][JSON message]
See: native-messaging/README.md
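The framing above can be sketched in a few lines of std-only Rust. This assumes the 4-byte length is in native byte order, as Chrome's native messaging protocol specifies; the helper names are illustrative, not the crate's API:

```rust
use std::io::{self, Read, Write};

/// Write one message in the native-messaging framing described above:
/// a 4-byte length (native byte order) followed by the JSON bytes.
fn write_message<W: Write>(w: &mut W, json: &str) -> io::Result<()> {
    let len = json.len() as u32;
    w.write_all(&len.to_ne_bytes())?;
    w.write_all(json.as_bytes())
}

/// Read one framed message back.
fn read_message<R: Read>(r: &mut R) -> io::Result<String> {
    let mut len_buf = [0u8; 4];
    r.read_exact(&mut len_buf)?;
    let len = u32::from_ne_bytes(len_buf) as usize;
    let mut body = vec![0u8; len];
    r.read_exact(&mut body)?;
    String::from_utf8(body).map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}

fn main() {
    let mut buf = Vec::new();
    write_message(&mut buf, r#"{"type":"ping"}"#).unwrap();
    let msg = read_message(&mut buf.as_slice()).unwrap();
    assert_eq!(msg, r#"{"type":"ping"}"#);
}
```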
Auto-detect CPU, GPU, NPU
Detects:
- CPUs (cores, threads, model)
- NVIDIA GPUs (VRAM via nvidia-smi)
- AMD GPUs (discrete/integrated)
- Intel GPUs (Arc, integrated)
- Acceleration (CUDA, ROCm, Vulkan, DirectML)
- NPUs (AMD Ryzen AI, Intel Core Ultra)
Platforms: Windows (complete), Linux/macOS (planned)
See: hardware/README.md
ONNX Runtime integration
Features:
- Load `.onnx` models
- Execution providers (CPU, CUDA, DirectML, etc.)
- Session management
- Tensor handling
llama.cpp integration
Features:
- Load `.gguf` models
- Context management
- Sampling parameters
- Streaming generation
ONNX Runtime provider definitions
Providers: 25+ execution providers
- CUDA, TensorRT, ROCm (GPU)
- DirectML, CoreML, Metal (Platform-specific)
- XNNPACK, QNN, NNAPI (Edge)
- OpenVINO, Vitis, MIGraphX (Specialized)
See: execution-providers/README.md
Inference pipeline orchestration
Features:
- Multi-stage pipelines
- Preprocessing/postprocessing
- Model chaining
- Error handling
See: pipeline/README.md
Vector indexing and search
Features:
- HNSW implementation
- Cosine/Euclidean distance
- Lock-free concurrent access
- Incremental updates
See: indexing/README.md
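For reference, cosine distance over `f32` slices looks like the following. This is a plain scalar version for illustration; the crate's own implementation may be vectorized:

```rust
/// Cosine distance (1 - cosine similarity) between two vectors,
/// one of the metrics listed above.
fn cosine_distance(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len());
    let (mut dot, mut na, mut nb) = (0.0f32, 0.0f32, 0.0f32);
    for (&x, &y) in a.iter().zip(b) {
        dot += x * y;
        na += x * x;
        nb += y * y;
    }
    1.0 - dot / (na.sqrt() * nb.sqrt())
}

fn main() {
    // Identical direction -> distance ~0; orthogonal -> distance ~1.
    assert!(cosine_distance(&[1.0, 0.0], &[2.0, 0.0]).abs() < 1e-6);
    assert!((cosine_distance(&[1.0, 0.0], &[0.0, 1.0]) - 1.0).abs() < 1e-6);
}
```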
Semantic search and retrieval
Features:
- Hybrid search (vector + keyword)
- Filtering
- Ranking
- Result fusion
See: query/README.md
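One standard way to fuse vector and keyword rankings is Reciprocal Rank Fusion, shown here as an illustration only, since this section does not document the crate's actual fusion strategy:

```rust
use std::collections::HashMap;

/// Reciprocal Rank Fusion: each result list contributes
/// 1 / (k + rank + 1) to a document's score; higher fused score wins.
/// Illustrative sketch, not the crate's real API.
fn rrf(lists: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for list in lists {
        for (rank, id) in list.iter().enumerate() {
            *scores.entry(id.to_string()).or_default() += 1.0 / (k + rank as f64 + 1.0);
        }
    }
    let mut fused: Vec<_> = scores.into_iter().collect();
    fused.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    fused
}

fn main() {
    let vector_hits = vec!["doc_a", "doc_b", "doc_c"];
    let keyword_hits = vec!["doc_b", "doc_a", "doc_d"];
    let fused = rrf(&[vector_hits, keyword_hits], 60.0);
    // doc_a and doc_b appear in both lists, so they rank highest.
    assert!(fused[0].0 == "doc_a" || fused[0].0 == "doc_b");
}
```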
Graph-based knowledge management
Features:
- Entity extraction
- Relationship mapping
- Graph traversal
- Knowledge consolidation
See: weaver/README.md
Async task management
Features:
- Priority queues
- Deadline scheduling
- Resource limits
- Cancellation
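Deadline scheduling with priority tie-breaking can be sketched with `std::collections::BinaryHeap`. A minimal illustration, not the scheduler's real implementation (it also handles cancellation and resource limits):

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Pop tasks in earliest-deadline order, with higher priority breaking
/// ties. Tasks are (deadline_ms, priority, name) tuples.
fn drain_order(tasks: &[(u64, u8, &'static str)]) -> Vec<&'static str> {
    // BinaryHeap is a max-heap, so wrap the sort key in Reverse to pop
    // the smallest (deadline, Reverse(priority)) first.
    let mut queue: BinaryHeap<_> = tasks
        .iter()
        .map(|&(deadline, prio, name)| Reverse((deadline, Reverse(prio), name)))
        .collect();
    let mut order = Vec::new();
    while let Some(Reverse((_, _, name))) = queue.pop() {
        order.push(name);
    }
    order
}

fn main() {
    let tasks = [(200u64, 1u8, "summarize"), (100, 5, "inference"), (100, 2, "embed")];
    // Same deadline: higher priority (inference) goes first.
    assert_eq!(drain_order(&tasks), ["inference", "embed", "summarize"]);
}
```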
Text tokenization utilities
Features:
- HuggingFace tokenizers
- Byte-pair encoding
- Token counting
Typed value system
Features:
- Dynamic typing
- Serialization
- Type conversions
- Validation
See: values/README.md
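A dynamically typed value with fallible conversions typically looks like an enum plus accessor methods. This sketch uses assumed names; the real `values` crate API is likely richer:

```rust
/// Minimal dynamically typed value, in the spirit of the crate above.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Null,
    Bool(bool),
    Int(i64),
    Float(f64),
    Text(String),
}

impl Value {
    /// Try to view this value as an f64, converting ints losslessly.
    fn as_f64(&self) -> Option<f64> {
        match self {
            Value::Int(i) => Some(*i as f64),
            Value::Float(f) => Some(*f),
            _ => None,
        }
    }
}

fn main() {
    assert_eq!(Value::Int(3).as_f64(), Some(3.0));
    assert_eq!(Value::Bool(true).as_f64(), None);
    assert_eq!(Value::Text("hi".into()).as_f64(), None);
}
```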
Native messaging request handler
Features:
- Route requests to appstate
- Error handling
- Response formatting
# Rust 1.75+
rustup update
# Protocol Buffers compiler
# Windows: choco install protoc
# macOS: brew install protobuf
# Linux: apt install protobuf-compiler

cd Rust
# Debug build
cargo build
# Release build (optimized)
cargo build --release
# Build specific crate
cargo build -p tabagent-server

# Default (all modes)
cargo run --bin tabagent-server
# Specific mode
cargo run --bin tabagent-server -- --mode web --port 3000
# Release mode
cargo run --release --bin tabagent-server
# With script (auto-kills old processes)
./run-server.ps1 # Windows PowerShell
./run-server.bat # Windows CMD

# All tests
cargo test --workspace
# Specific crate
cargo test -p storage
# Integration tests
cargo test --test '*'
# With output
cargo test -- --nocapture

Located in protos/:
- `database.proto` - Database service
- `ml_inference.proto` - ML services (ModelManagement, Transformers, MediaPipe)
# Rust (automatic via build.rs)
cargo build -p common
# Python
cd ../PythonML
./generate_protos.bat

Rust → Python:
- `MlClient` in `common/src/ml_client.rs`
- Connects to `localhost:50051`
- Calls: LoadModel, GenerateText, StreamFaceDetection, etc.
Database:
- `DatabaseClient` in `storage/src/database_client.rs`
- Can run in-process or remote (gRPC)
- Transparent switching
See: GRPC_ARCHITECTURE.md
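The transparent switching works because callers depend on a trait rather than a concrete backend. A stubbed sketch of the idea (all names illustrative; the real in-process and gRPC backends are not shown):

```rust
/// Callers code against this trait; an in-process or remote backend
/// can be swapped in without changing call sites.
trait Database {
    fn get(&self, key: &str) -> Option<String>;
}

struct InProcessDb;
impl Database for InProcessDb {
    fn get(&self, key: &str) -> Option<String> {
        // Direct local lookup (stubbed).
        (key == "known").then(|| "local-value".to_string())
    }
}

struct RemoteDb;
impl Database for RemoteDb {
    fn get(&self, key: &str) -> Option<String> {
        // In the real client this would be a gRPC call (stubbed here).
        (key == "known").then(|| "remote-value".to_string())
    }
}

/// The caller is unaware of which backend answers.
fn lookup(db: &dyn Database, key: &str) -> Option<String> {
    db.get(key)
}

fn main() {
    assert_eq!(lookup(&InProcessDb, "known"), Some("local-value".to_string()));
    assert_eq!(lookup(&RemoteDb, "known"), Some("remote-value".to_string()));
}
```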
# Default (platform-specific AppData)
# Windows: %APPDATA%/TabAgent/tabagent_db/
# Linux: ~/.local/share/TabAgent/tabagent_db/
# macOS: ~/Library/Application Support/TabAgent/tabagent_db/
# Custom
cargo run --bin tabagent-server -- --db-path ./my_db

# Default
# Windows: %APPDATA%/TabAgent/models/
# Linux: ~/.local/share/TabAgent/models/
# macOS: ~/Library/Application Support/TabAgent/models/
# Custom
cargo run --bin tabagent-server -- --model-cache-path ./my_models

server
├── api (HTTP routes)
├── appstate (shared state)
├── webrtc (signaling)
├── native-messaging (extension)
└── common (types, clients)
appstate
├── storage (database)
├── model-cache (downloads)
├── hardware (detection)
├── common (gRPC clients)
└── ... (all other crates)
storage
├── indexing (vector search)
├── query (semantic search)
├── weaver (knowledge graph)
└── common (types)
common
├── tonic, prost (gRPC)
└── (no other workspace dependencies)
| Component | Metric | Value |
|---|---|---|
| HTTP API | Latency | <5ms (p99) |
| WebRTC Signaling | Setup Time | <100ms |
| Database Writes | Throughput | 10k/s |
| Database Reads | Throughput | 50k/s |
| Model Download | Speed | 100MB/s (parallel chunks) |
| gRPC (Rust↔Python) | Latency | <1ms (localhost) |
Benchmark environment: i9-12900K + NVMe SSD
- Linux/macOS hardware detection
- GPU memory management (VRAM tracking)
- Model quantization (GPTQ, AWQ)
- WebRTC video encoding/decoding
- Distributed database (multi-node)
- Model fine-tuning API
- Ray tracing for attention visualization
- Plugin system
- Web UI (dashboard)
- Monitoring/metrics (Prometheus)
- Clustering support
- Docker deployment
See: Individual crate TODO.md files for details
- GRPC_ARCHITECTURE.md - gRPC design patterns
- ARCHITECTURE.md - System architecture
- Rust-Architecture-Guidelines.md - Code standards
- docs/ - Database architecture, MIA memory
- Read architecture docs
- Check crate-specific README.md + TODO.md
- Write tests
- Run `cargo fmt` and `cargo clippy`
- No compilation warnings
- PythonML/README.md - Python ML services
- Root README.md - Project overview