A high-performance, filesystem-backed Content-Addressed Storage (CAS) worker for the Acorn framework. Stores and retrieves data on disk using SHA256 content identifiers, with optional LFU decay eviction for capacity-bounded caches.
AcornDiskWorker implements the AcornCASWorker protocol as a Swift actor, providing thread-safe disk persistence for content-addressed data. It is designed to slot into Acorn's layered caching chain — typically as a mid-tier between a fast in-memory worker and a slower network worker.
[MemoryWorker] <-near/far-> [DiskCASWorker] <-near/far-> [NetworkWorker]
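Conceptually, a `get()` on this chain tries each tier in order and returns the first hit. A minimal sketch of that walk, using a stand-in `Worker` protocol rather than Acorn's actual types:

```swift
import Foundation

// Stand-in protocol for illustration — Acorn's real worker types differ.
protocol Worker {
    func get(_ key: String) async -> Data?
}

// First-hit-wins lookup across an ordered list of tiers.
func chainedGet(_ key: String, through tiers: [Worker]) async -> Data? {
    for tier in tiers {
        if let data = await tier.get(key) { return data } // nearer tier wins
    }
    return nil // miss at every tier
}
```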
- Swift 6.0+
- macOS 13+ / iOS 16+
- Acorn package (local dependency)
Add AcornDiskWorker as a dependency in your Package.swift:
```swift
.package(path: "../AcornDiskWorker"),
```

Then add it to your target's dependencies:

```swift
.target(name: "YourTarget", dependencies: ["AcornDiskWorker"])
```

```swift
import AcornDiskWorker
import Acorn

let worker = try DiskCASWorker(directory: cacheDir)

let data = Data("hello, world".utf8)
let cid = ContentIdentifier(for: data)
await worker.storeLocal(cid: cid, data: data)
let retrieved = await worker.get(cid: cid) // Data("hello, world")
```

```swift
let worker = try DiskCASWorker(
    directory: cacheDir,
    capacity: 1000,          // max entries before eviction
    maxBytes: 50_000_000,    // max total bytes on disk (50 MB)
    halfLife: .seconds(300), // frequency scores decay with this half-life
    sampleSize: 5            // candidates sampled per eviction decision
)
```

Eviction triggers when either the entry count or the byte limit is exceeded; the least-frequently-used entry (with time decay) is evicted.
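The decayed-frequency score behind eviction can be sketched as follows. This is an assumed illustration, not the package's internals: a hit count that halves once per elapsed half-life.

```swift
import Foundation

// Hypothetical sketch of an LFU score with exponential time decay.
// `DecayScore` is illustrative, not part of AcornDiskWorker's API.
struct DecayScore {
    var hits: Double
    var lastAccess: Date
    let halfLife: TimeInterval

    // The accumulated hit count, halved once per elapsed half-life.
    func value(at now: Date) -> Double {
        let elapsed = now.timeIntervalSince(lastAccess)
        return hits * pow(0.5, elapsed / halfLife)
    }

    // Record an access: fold the decayed score back in, then add one hit.
    mutating func touch(at now: Date) {
        hits = value(at: now) + 1
        lastAccess = now
    }
}
```

With this scheme a stale-but-once-popular entry loses to a recently touched one, which is the behavior the eviction sampler selects on.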
```swift
let worker = try DiskCASWorker(
    directory: cacheDir,
    verifyReads: false // skip SHA256 re-hash on reads
)
```

When `verifyReads` is false, reads return data without re-hashing to verify the CID. Use this for trusted-local caches where bit rot is not a concern.
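What the verification step guards against can be sketched like this (an assumed illustration using CryptoKit, not the package's actual code): re-hash the bytes read from disk and compare against the CID's digest.

```swift
import CryptoKit
import Foundation

// Re-hash data read from disk and compare to the expected hex digest.
// Returns nil on mismatch, signalling corruption to the caller.
func verified(_ data: Data, expectedHexDigest: String) -> Data? {
    let hex = SHA256.hash(data: data)
        .map { String(format: "%02x", $0) }
        .joined()
    return hex == expectedHexDigest ? data : nil
}
```

Skipping this step trades bit-rot detection for the SHA256 cost on every read, which is what the latency gap between verified and unverified reads comes down to.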
```swift
// Before shutdown — save the Bloom filter + size index to disk
try await worker.persistState()

// On next init, state is loaded from disk instead of scanning all files
let worker2 = try DiskCASWorker(directory: cacheDir, capacity: 10_000)
```

Delete an entry explicitly:

```swift
await worker.delete(cid: someCID)
```

Inspect runtime metrics:

```swift
let m = await worker.metrics
print("hits: \(m.hits), misses: \(m.misses), evictions: \(m.evictions)")
print("total bytes on disk: \(await worker.totalBytes)")
```

Inject a custom FileSystemProvider for testing or alternative storage backends:

```swift
let worker = try DiskCASWorker(directory: dir, fileSystem: MyCustomFS())
```

Chain workers into a layered cache with CompositeCASWorker:

```swift
let memory = MemoryCASWorker()
let disk = try DiskCASWorker(directory: cacheDir, capacity: 10_000)
let chain = await CompositeCASWorker(
    workers: ["memory": memory, "disk": disk],
    order: ["memory", "disk"]
)

// get() walks the chain: memory first, then disk
let data = await chain.get(cid: someCID)
```

| Method | Description |
|---|---|
| `init(directory:capacity:maxBytes:halfLife:sampleSize:timeout:verifyReads:fileSystem:)` | Create a worker backed by the given directory. Loads persisted state or scans existing files on init. Pre-creates 256 shard directories. |
| `has(cid:) -> Bool` | Check whether a CID exists. Uses the Bloom filter for fast rejection, then falls back to an `access()` syscall. |
| `getLocal(cid:) async -> Data?` | Read data from disk. The Bloom filter rejects unknown CIDs in nanoseconds. Optionally verifies SHA256 integrity. |
| `storeLocal(cid:data:) async` | Write data to disk via temp file + rename (no fsync). Evicts until within capacity. |
| `delete(cid:)` | Remove an entry from the cache, the size index, and disk. |
| `persistState()` | Save the Bloom filter and item sizes to disk for fast restart. |
| `get(cid:) -> Data?` | Protocol default: checks the near worker first, then local, with an optional timeout. |
| `store(cid:data:)` | Protocol default: stores locally, then propagates to the near worker. |
| `metrics -> CASMetrics` | Hits, misses, stores, evictions, deletions, corruption detections. |
| `totalBytes -> Int` | Current total bytes tracked on disk (O(1) lookup). |
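The temp-file-plus-rename write pattern can be sketched as follows. This is an assumed illustration built on Foundation and the POSIX `rename`, not the package's internals:

```swift
import Foundation

// Crash-safe write: write to a hidden temp file in the same directory,
// then rename it into place. rename(2) is atomic on POSIX filesystems,
// so readers see either the old file or the complete new one, never a
// partial write. No fsync: the CAS hash check catches torn data on read.
func atomicWrite(_ data: Data, to destination: URL) throws {
    let tmp = destination
        .deletingLastPathComponent()
        .appendingPathComponent(".\(UUID().uuidString).tmp")
    try data.write(to: tmp)
    guard rename(tmp.path, destination.path) == 0 else {
        try? FileManager.default.removeItem(at: tmp) // clean up on failure
        throw CocoaError(.fileWriteUnknown)
    }
}
```

Keeping the temp file in the same directory matters: `rename` is only atomic within a single filesystem.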
- Actor isolation ensures all state mutations (cache scores, size tracking) are serialized without explicit locks.
- POSIX I/O — raw `open`/`read`/`write`/`close`/`rename` syscalls bypass Foundation overhead. No fsync on writes — CAS is self-verifying, so partial writes are caught by the hash check on read.
- Bloom filter — an in-memory probabilistic filter eliminates filesystem `stat` calls for unknown CIDs. Misses resolve in ~80 nanoseconds. Serializable to disk for fast restart.
- Generic over `FileSystemProvider` — `DiskCASWorker<F>` is specialized by the compiler in release mode, eliminating existential dispatch overhead on every I/O call.
- Configurable integrity verification — SHA256 re-hash on reads can be disabled for trusted-local caches where throughput matters more than bit-rot detection.
- Data integrity (when enabled) — every read verifies that the SHA256 hash matches the CID. Corrupted files are auto-deleted.
- Directory sharding — files are stored in `<dir>/<2-char-prefix>/<hash>`, with all 256 shard directories pre-created at init.
- LFU with exponential decay — recently accessed items are favored over historically-popular-but-stale items. Scores decay continuously with a configurable half-life.
- Dual capacity limits — eviction can be triggered by entry count (`capacity`), total bytes (`maxBytes`), or both.
- State persistence — the Bloom filter and item sizes can be saved to disk, turning O(n) init scans into O(1) file loads on restart.
- Sampled eviction (like Redis) — avoids scanning the entire cache; a random sample is drawn and the lowest-scored item is evicted.
- Zero-allocation hex encoding — `ContentIdentifier` uses a byte lookup table with `String(unsafeUninitializedCapacity:)` for zero intermediate allocations.
- O(1) byte tracking — `totalBytes` is a running counter, not a reduction over the size dictionary.
- Injectable file system — the `FileSystemProvider` protocol allows swapping the I/O layer for testing or alternative backends.
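The lookup-table hex encoding mentioned above can be sketched like this. An assumed illustration — `hexString` is not the package's API — but it uses the same `String(unsafeUninitializedCapacity:)` initializer, which writes UTF-8 bytes directly into the string's storage:

```swift
// ASCII bytes for the 16 hex digits, indexed by nibble value.
let hexDigits = Array("0123456789abcdef".utf8)

// Encode bytes to lowercase hex with no intermediate allocations:
// the closure fills the String's own buffer and returns the byte count.
func hexString(_ bytes: [UInt8]) -> String {
    String(unsafeUninitializedCapacity: bytes.count * 2) { buffer in
        var i = 0
        for byte in bytes {
            buffer[i] = hexDigits[Int(byte >> 4)]       // high nibble
            buffer[i + 1] = hexDigits[Int(byte & 0x0F)] // low nibble
            i += 2
        }
        return i // number of UTF-8 bytes written
    }
}
```

The naive alternative (`map` + `joined`, or repeated `String(format:)`) allocates a temporary string per byte, which adds up when every CID is hex-encoded on each path lookup.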
Benchmarked on Apple Silicon (M-series), release mode:
| Operation | Latency | Notes |
|---|---|---|
| store (64B) | ~300us | POSIX write + rename, no fsync |
| store (4KB) | ~306us | Dominated by syscall overhead, not data size |
| get hit (64B) | ~41us | Includes SHA256 verification |
| get hit (256KB) | ~129us | SHA256 scales with data size |
| get miss | ~86ns | Bloom filter rejects without touching disk |
| has (hit) | ~15us | Bloom filter + access() syscall |
| has (miss) | ~75ns | Bloom filter only |
| delete | ~69us | unlink() syscall |
| eviction store | ~284us | Store with LFU eviction |
| mixed (80r/20w) | ~94us | Realistic workload |
```shell
swift run -c release DiskCASBenchmarks --save-baseline
swift run -c release DiskCASBenchmarks --check-baseline
```

The `--check-baseline` flag compares against a saved baseline and flags regressions greater than 15%.

```shell
swift test
```

15 tests covering: round-trip storage, missing-key lookups, cross-instance persistence, existence checks, LFU eviction, explicit deletion, corruption detection, size-based eviction, metrics tracking, init scan reconciliation, Bloom filter misses, totalBytes tracking, state persistence, and configurable verification.