ZeroCache

This is the capstone project for Rust Language UA Camp.

High-performance embedded key-value store with full-text search

ZeroCache is a fast cache server with full-text search capabilities that combines the simplicity of key-value storage with powerful full-text search capabilities. Built for speed and ease of use.

Features

🚀 Blazing Fast: Microsecond read/write operations
🔍 Full-Text Search: Tantivy inverted indexes with BM25 ranking
📦 Zero Dependencies: Single binary, no external services
🔄 ACID Compliant: Crash-safe with guaranteed data consistency
🎯 Simple REST API: JSON over HTTP, no complex protocols
🛡️ Smart Access: Public reads (rate-limited), protected writes (IP-based)
📊 Flexible Schema: Add fields anytime, index what you need
⚡ Segment-Based Indexing: Incremental updates, no full reindex needed

Access Control Architecture

ZeroCache is designed for fast public data access with protected admin operations.

Public Read Access (Any IP, Rate Limited)

Any IP address can query data, but limited to prevent abuse:

GET requests - Search products, filter by category, get items
Rate limit: rate_limit_per_second: 10 (default, configurable)

Use case: E-commerce website visitors searching products from any device/location.

Admin Operations (IP Restricted, No Rate Limit)

Only trusted IPs can modify data and access system info:

POST - Insert/update data
DELETE - Delete items/collections
PUT - Modify configuration
System endpoints - /status, /settings, /trees, /compact, /purge

Configure in settings.json:

{
  "allowed_ips": ["127.0.0.1", "192.168.1.*", "10.0.0.5"],
  "rate_limit_per_second": 10
}

Wildcard support: Use * for subnet matching (e.g., 192.168.*)

Why This Design?

✅ Admin can quickly bulk-load data from backend systems (no rate limit)
✅ Public users get fast product queries (rate limited to prevent abuse)
✅ No authentication overhead = maximum speed
✅ No encryption overhead = raw performance
✅ Simple IP whitelist = easy security model

Example workflow:

Admin (from 127.0.0.1) uploads 100k products via POST → unrestricted, fast
Website visitors query products via GET → 10 req/sec per IP, still very fast
Admin deletes old products via DELETE → unrestricted

Quick Start

Installation

cargo build --release
./target/release/zerocache

Server starts on http://127.0.0.1:8080

Basic Usage

Insert data:

curl -X POST http://127.0.0.1:8080/data/products \
  -H "Content-Type: application/x-ndjson" \
  -H "X-Upsert-Field: objectID,name,description,category,price:f64" \
  -d '{"objectID":"1","name":"Wireless Mouse","description":"Ergonomic wireless mouse with adjustable DPI","category":"Electronics","price":29.99}
{"objectID":"2","name":"Mechanical Keyboard","description":"RGB backlit mechanical keyboard","category":"Electronics","price":89.99}
{"objectID":"3","name":"Running Shoes","description":"Lightweight running shoes with cushioning","category":"Sports","price":59.99}
{"objectID":"4","name":"Coffee Maker","description":"Automatic coffee maker with timer","category":"Home Appliances","price":49.99}
{"objectID":"5","name":"Bluetooth Speaker","description":"Portable waterproof Bluetooth speaker","category":"Electronics","price":39.99}
{"objectID":"6","name":"Yoga Mat","description":"Non-slip yoga mat, 6mm thick","category":"Sports","price":24.99}
{"objectID":"7","name":"Electric Kettle","description":"Fast-boiling electric kettle 1.7L","category":"Home Appliances","price":19.99}
{"objectID":"8","name":"Gaming Headset","description":"Over-ear headset with noise cancellation","category":"Electronics","price":69.99}
{"objectID":"9","name":"Dumbbell Set","description":"Adjustable dumbbell set 5-50 lbs","category":"Sports","price":99.99}
{"objectID":"10","name":"Blender","description":"High-speed blender for smoothies","category":"Home Appliances","price":34.99}'

Get all items:

curl http://127.0.0.1:8080/data/products

Search by field:

curl http://127.0.0.1:8080/data/products?category=Electronics

Full-text search:

curl http://127.0.0.1:8080/data/products?q=wireless

API Reference

All Endpoints

Method	URL	Access	Description
GET	`http://127.0.0.1:8080/data/products`	Public (rate limited)	Get all items in collection
GET	`http://127.0.0.1:8080/data/products?objectID=1`	Public (rate limited)	Get item by primary key
GET	`http://127.0.0.1:8080/data/products?category=Electronics`	Public (rate limited)	Filter by indexed field
GET	`http://127.0.0.1:8080/data/products?q=mouse`	Public (rate limited)	Full-text search
GET	`http://127.0.0.1:8080/data/products?limit=50&offset=100`	Public (rate limited)	Pagination
GET	`http://127.0.0.1:8080/data/products?sort_by=price&sort_order=desc`	Public (rate limited)	Sorting
POST	`http://127.0.0.1:8080/data/products`	Admin only	Insert/update items
GET	`http://127.0.0.1:8080/trees`	Admin only	List all collections
GET	`http://127.0.0.1:8080/status`	Admin only	Server health and metrics
GET	`http://127.0.0.1:8080/settings`	Admin only	Get current settings
PUT	`http://127.0.0.1:8080/settings`	Admin only	Update settings
DELETE	`http://127.0.0.1:8080/data/products?objectID=1`	Admin only	Delete item by primary key
DELETE	`http://127.0.0.1:8080/data/products` + `X-Confirm-Purge: true`	Admin only	Delete entire collection
DELETE	`http://127.0.0.1:8080/compact` + `X-Confirm-Compact: true`	Admin only	Optimize database and indexes
DELETE	`http://127.0.0.1:8080/purge` + `X-Confirm-Purge: true`	Admin only	Delete ALL data

Collections

A collection is a group of JSON documents with indexed fields.

Create/Update Items (Upsert)

POST http://127.0.0.1:8080/data/products
Content-Type: application/x-ndjson
X-Upsert-Field: objectID,name,description,category,price:f64

X-Upsert-Field header:

First field = primary key (must be unique, used for direct lookups)
Other fields = indexed fields (searchable, filterable, and sortable)
Field types (optional): Add :type suffix for numeric fields
- :u64 - Unsigned 64-bit integer (e.g., quantity:u64)
- :i64 - Signed 64-bit integer (e.g., temperature:i64)
- :f64 - 64-bit float (e.g., price:f64, rating:f64)
- No suffix = text field (default)

Example:

curl -X POST http://127.0.0.1:8080/data/products \
  -H "Content-Type: application/x-ndjson" \
  -H "X-Upsert-Field: objectID,name,description,category,price:f64" \
  -d '{"objectID":"1","name":"Mouse","description":"Wireless","category":"Electronics","price":29.99}
{"objectID":"2","name":"Keyboard","description":"Mechanical","category":"Electronics","price":89.99}'

Response:

{
  "collection": "products",
  "count": 2,
  "errors": 0,
  "operation": "upsert",
  "success": true
}

Query Data

Get by Primary Key

curl http://127.0.0.1:8080/data/products?objectID=1

Returns single item directly from database (fastest - ~1-5μs).

Get by Indexed Field

curl http://127.0.0.1:8080/data/products?category=Electronics

Uses search index for filtering (~100μs).

Full-Text Search

curl "http://127.0.0.1:8080/data/products?q=wireless+mouse"

Searches across all indexed text fields (~1-5ms).

Multiple Filters

curl "http://127.0.0.1:8080/data/products?category=Electronics&price=29.99"

Combines multiple field filters (AND logic).

Range Filters

curl "http://127.0.0.1:8080/data/products?filter_min_price=50&filter_max_price=100"

Numeric range filtering.

Pagination

curl "http://127.0.0.1:8080/data/products?limit=50&offset=100"

limit: Items per page (default: 100, max: 1000)
offset: Skip N items

Response includes:

{
  "products": [...],
  "total": 50,
  "limit": 50,
  "offset": 100,
  "query_type": "full_scan"
}

Sorting

curl "http://127.0.0.1:8080/data/products?sort_by=price&sort_order=desc"

sort_by: Field name
sort_order: asc or desc (default: asc)

Combined Query

curl "http://127.0.0.1:8080/data/products?category=Electronics&filter_min_price=50&sort_by=price&limit=20"

Delete Operations

Delete by Primary Key

curl -X DELETE "http://127.0.0.1:8080/data/products?objectID=1"

Deletes single item and updates search index.

Response:

{
  "deleted": 1,
  "collection": "products",
  "id": "1"
}

Delete Entire Collection

curl -X DELETE http://127.0.0.1:8080/data/products \
  -H "X-Confirm-Purge: true"

Requires confirmation header to prevent accidental deletion.

Response:

{
  "message": "Deleted collection 'products'"
}

System Operations

Get Status

curl http://127.0.0.1:8080/status

Returns server health, memory usage, disk space, and performance metrics.

Response:

{
  "status": "healthy",
  "performance": "optimal",
  "memory": {
    "resident_bytes": 45678912,
    "resident_human": "43.56 MB",
    "virtual_bytes": 123456789,
    "virtual_human": "117.74 MB"
  },
  "db": {
    "size_bytes": 12345678,
    "size_human": "11.77 MB"
  },
  "indexes": {
    "size_bytes": 8765432,
    "size_human": "8.36 MB",
    "total_segments": 5
  },
  "disk": {
    "free_bytes": 123456789,
    "free_human": "117.74 MB"
  },
  "total_collections": 3,
  "total_items": 1250,
  "uptime": {
    "seconds": 3600,
    "human": "1h 0m 0s"
  },
  "requests": {
    "total": 45678
  },
  "can_store_data": true,
  "system_processes": 245,
  "system_memory": { ... },
  "system_cpu": { ... }
}

List Collections

curl http://127.0.0.1:8080/trees

Returns all collections with item counts and indexed fields.

Response:

{
  "collections": [
    {
      "name": "products",
      "count": 1000,
      "indexed": ["objectID(primary, text)", "name(text)", "category(text)", "price(f64)"]
    }
  ],
  "total": 1
}

Get Settings

curl http://127.0.0.1:8080/settings

Response:

{
  "port": 8080,
  "allowed_ips": ["127.0.0.1"],
  "rate_limit_per_second": 10,
  "data_path": "./data",
  "index_path": "./index",
  "upsert_index_buffer": {
    "bytes": 15000000,
    "human": "14.31 MB"
  },
  "compact_index_buffer": {
    "bytes": 50000000,
    "human": "47.68 MB"
  },
  "default_scan_limit": 100,
  "max_scan_limit": 1000,
  "payload_limit": {
    "bytes": 2097152,
    "human": "2.00 MB"
  }
}

Update Settings

curl -X PUT http://127.0.0.1:8080/settings \
  -H "Content-Type: application/json" \
  -d '{"compact_index_buffer": 45000000, "rate_limit_per_second": 20}'

Only specified fields are updated, others remain unchanged.

Compact Database

curl -X DELETE http://127.0.0.1:8080/compact \
  -H "X-Confirm-Compact: true"

When to use:

To reclaim disk space
Periodically (e.g., daily/weekly)

What it does:

Flushes Sled database to disk
Merges Tantivy index segments into fewer files
Removes deleted documents from indexes
Optimizes for faster queries

Response:

{
  "results": [
    "DB compacted",
    "Merged index for products",
    "No merge needed for users"
  ]
}

Performance impact: Can take 10s-5min depending on data size. Run during low traffic.

Purge All Data

curl -X DELETE http://127.0.0.1:8080/purge \
  -H "X-Confirm-Purge: true"

WARNING: Deletes all collections and indexes. Cannot be undone.

Response:

{
  "message": "Purged all collections and search index"
}

Indexed Fields Explained

Why Index Fields?

Indexed fields enable fast filtering and searching:

✅ Direct field filtering: ?category=Electronics
✅ Full-text search: ?q=wireless
✅ Range queries: ?filter_min_price=50
❌ Non-indexed fields cannot be filtered

How It Works

When you upsert with X-Upsert-Field: objectID,name,category,price:

objectID → Primary key (direct O(log n) lookup in database)
name, category, price → Indexed in Tantivy (full-text searchable)
Other fields → Stored in database but not searchable

Example:

{"objectID":"1","name":"Mouse","category":"Electronics","color":"black","stock":50}

What you can query:

✅ ?objectID=1 - Direct lookup (fastest)
✅ ?name=Mouse - Indexed search
✅ ?category=Electronics - Indexed filter
❌ ?color=black - NOT indexed, cannot filter
❌ ?stock=50 - NOT indexed, cannot filter

All fields are returned in results, but only indexed fields can be used for filtering.

Indexing Strategy

Index these fields:

Primary keys (objectID, SKU, etc.)
Frequently filtered fields (category, brand, status)
Search fields (name, description, tags)
Sort fields (price, date, rating)
Range filter fields (price, quantity)

Don't index:

Rarely queried fields
Large text blobs (full descriptions)
Binary data
Frequently changing fields (view_count, last_updated)

Trade-off: More indexes = slower writes, faster reads. Tune based on your use case.

Technology Stack

Core Technologies

Sled - Embedded database

Lock-free BTreeMap
ACID guarantees
Zero-copy reads
~1μs read latency
Crash-safe with write-ahead log

Tantivy - Full-text search

Inverted index (Lucene-like)
Fast term queries
BM25 relevance scoring
Efficient compression
Segment-based architecture

Actix-web - HTTP server

Async/await runtime
Multi-threaded worker pool
~50k+ req/sec throughput
Built-in middleware support

Performance Characteristics

Operation	Latency	Throughput
Get by primary key	1-5 μs	200k+ ops/sec
Get by indexed field	100 μs	10k+ ops/sec
Full-text search	1-5 ms	1k+ ops/sec
Insert/Update	50-200 μs	5k+ ops/sec
Bulk insert (1000 items)	50 ms	20k items/sec
Delete by primary key	50-100 μs	10k+ ops/sec

Benchmarks on modern NVMe SSD, varies by data size and query complexity

Why These Technologies?

No external dependencies - Single binary, no database server needed
Embedded = Fast - No network overhead, direct memory access
Rust = Safe & Fast - Memory safety without garbage collection overhead
ACID compliance - Data consistency guaranteed, even on crash
Scalable - Handles millions of documents efficiently on single node
Simple deployment - Copy binary and run, no configuration needed

Configuration

Default settings.json (auto-created on first run):

{
  "port": 8080,
  "allowed_ips": ["127.0.0.1"],
  "rate_limit_per_second": 10,
  "data_path": "./data",
  "index_path": "./index",
  "upsert_index_buffer": 15000000,
  "compact_index_buffer": 50000000,
  "default_scan_limit": 100,
  "max_scan_limit": 1000,
  "payload_limit": 2097152
}

Key Settings

Setting	Description	Unit	Default
`port`	HTTP server port	-	8080
`allowed_ips`	IP whitelist for admin operations	-	["127.0.0.1"]
`rate_limit_per_second`	Max GET requests per IP per second	req/sec	10
`data_path`	Sled database directory	-	./data
`index_path`	Tantivy indexes directory	-	./index
`upsert_index_buffer`	Memory for index writes	bytes	15 MB
`compact_index_buffer`	Memory for compaction	bytes	50 MB
`default_scan_limit`	Default items per query	items	100
`max_scan_limit`	Maximum items per query	items	1000
`payload_limit`	Max HTTP request body size	bytes	2 MB

Use Cases

✅ Perfect For:

E-commerce product catalogs - Fast product search and filtering
Content management - Articles, blogs with full-text search
API response cache - Cache expensive API calls with TTL
Session storage - Persistent sessions with fast lookup
Real-time analytics - Temporary data aggregation
Search-as-you-type - Instant search suggestions
Inventory systems - SKU lookups and stock filtering
Testing/development - Drop-in database for prototypes

❌ Not Ideal For:

Complex joins - Use relational database (PostgreSQL, MySQL)
Very large datasets - >100GB may need distributed system
Frequent schema changes - Requires re-indexing
Transactional workflows - No multi-collection transactions

Maintenance

Monitoring

Check /status endpoint regularly for these metrics:

Metric	Action Needed When
`indexes.total_segments`	> 10 segments → Run `/compact`
`disk.free_bytes`	< 10% free → Clean up or add storage
`memory.resident_bytes`	Growing continuously → Check for leaks
`can_store_data`	`false` → Disk full, urgent action needed

Optimization

Weekly maintenance:

curl -X DELETE http://127.0.0.1:8080/compact \
  -H "X-Confirm-Compact: true"

Performance tips:

Run compaction during low-traffic hours
Monitor segment count - more segments = slower search
Use specific filters instead of full scans
Limit query results with limit parameter

Troubleshooting

Slow Queries?

Check segments:

curl http://127.0.0.1:8080/status | grep total_segments

If >10 segments, run compact.

Solutions:

Run DELETE http://127.0.0.1:8080/compact
Reduce indexed fields
Add more specific filters to queries
Use primary key lookups when possible

High Memory Usage?

Check current usage:

curl http://127.0.0.1:8080/status | grep memory

Solutions:

Reduce upsert_index_buffer and compact_index_buffer in settings
Run compaction to free memory
Restart server if memory leak suspected

Disk Space Full?

Check available space:

curl http://127.0.0.1:8080/status | grep disk

Solutions:

Run compaction to reclaim space
Delete old collections
Clear unnecessary data from other applications

Search Not Working?

Verify field is indexed:

curl http://127.0.0.1:8080/trees

Check if field appears in indexed array.

Solutions:

Ensure field was in X-Upsert-Field header during POST
Re-insert data with correct header
Check field name spelling in query

Rate Limit Errors (HTTP 429)?

Increase limit for legitimate traffic:

curl -X PUT http://127.0.0.1:8080/settings \
  -H "Content-Type: application/json" \
  -d '{"rate_limit_per_second": 50}'

Or add IP to allowed_ips to bypass rate limiting.

Testing

Run integration tests:

# Start server
cargo run

# In another terminal
cargo test

All 14 tests cover:

CRUD operations
Search queries (full-text, filters, ranges)
Pagination and sorting
Delete operations (by key, entire collection)
System endpoints (status, settings, trees)
Access control validation
Rate limiting

Example: E-commerce Integration

Backend Setup (Admin IP: 127.0.0.1)

1. Bulk import products from main database:

# Export from PostgreSQL/MySQL as NDJSON
# Import to ZeroCache
curl -X POST http://127.0.0.1:8080/data/products \
  -H "Content-Type: application/x-ndjson" \
  -H "X-Upsert-Field: sku,name,description,category,brand,price" \
  --data-binary @products.ndjson

2. Update single product:

curl -X POST http://127.0.0.1:8080/data/products \
  -H "Content-Type: application/x-ndjson" \
  -H "X-Upsert-Field: sku,name,category,price" \
  -d '{"sku":"ABC123","name":"Updated Name","category":"Electronics","price":99.99}'

Frontend Queries (Any IP, Rate Limited)

Search products:

// Full-text search
fetch('http://cache.example.com:8080/data/products?q=laptop')
  .then(r => r.json())
  .then(data => console.log(data.products));

// Filter by category
fetch('http://cache.example.com:8080/data/products?category=Electronics&limit=20')
  .then(r => r.json())
  .then(data => displayProducts(data.products));

// Price range + sort
fetch('http://cache.example.com:8080/data/products?filter_min_price=100&filter_max_price=500&sort_by=price&sort_order=asc')
  .then(r => r.json())
  .then(data => displayProducts(data.products));

Architecture

┌─────────────────┐
│  PostgreSQL/    │  (Source of truth)
│  MySQL Database │
└────────┬────────┘
         │ Sync/Export
         ▼
┌─────────────────┐
│   ZeroCache     │  (Fast search cache)
│   127.0.0.1     │  - Admin access only
└────────┬────────┘
         │ Public queries
         ▼
┌─────────────────┐
│   Website       │  (Any IP)
│   Users         │  - Rate limited GET
└─────────────────┘

Benefits:

Main DB handles writes and complex queries
ZeroCache handles fast product search
10-100x faster than SQL for search queries
No load on main database for searches

ZeroCache - Zero complexity, maximum performance! 🚀

Name		Name	Last commit message	Last commit date
Latest commit History 9 Commits
benchmark		benchmark
http		http
src		src
tests		tests
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
Cargo.toml		Cargo.toml
LICENSE		LICENSE
README.md		README.md
benchmark.md		benchmark.md
settings.json		settings.json

License

brenik/ZeroCache

Folders and files

Latest commit

History

Repository files navigation

ZeroCache

Features

Access Control Architecture

Public Read Access (Any IP, Rate Limited)

Admin Operations (IP Restricted, No Rate Limit)

Why This Design?

Quick Start

Installation

Basic Usage

API Reference

All Endpoints

Collections

Create/Update Items (Upsert)

Query Data

Get by Primary Key

Get by Indexed Field

Full-Text Search

Multiple Filters

Range Filters

Pagination

Sorting

Combined Query

Delete Operations

Delete by Primary Key

Delete Entire Collection

System Operations

Get Status

List Collections

Get Settings

Update Settings

Compact Database

Purge All Data

Indexed Fields Explained

Why Index Fields?

How It Works

Indexing Strategy

Technology Stack

Core Technologies

Performance Characteristics

Why These Technologies?

Configuration

Key Settings

Use Cases

✅ Perfect For:

❌ Not Ideal For:

Maintenance

Monitoring

Optimization

Troubleshooting

Slow Queries?

High Memory Usage?

Disk Space Full?

Search Not Working?

Rate Limit Errors (HTTP 429)?

Testing

Example: E-commerce Integration

Backend Setup (Admin IP: 127.0.0.1)

Frontend Queries (Any IP, Rate Limited)

Architecture

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages