Implement Hive runtime structural upgrades#21
Conversation
christopherkarani
commented
Feb 19, 2026
- Summary
- Introduce new persistent data structures, runtime indexing, native queues, mmap checkpoints, and signpost instrumentation so Hive can fork, checkpoint, and profile safely before phase-based rollout.
- Replace the global store backing map, join barrier logic, and memory recall with deterministic HAMT, bitset barriers, and BM25 inverted index respectively, plus add C-backed MPSC queues and mmap checkpoint support with new instrumentation hooks.
- Testing
- Not run (not requested)
|
@claude review |
|
Claude finished @christopherkarani's task in 2m 54s —— View job PR Review: Implement Hive runtime structural upgradesTasks:
This is a substantial PR adding meaningful capabilities. The core ideas are sound, but there are several correctness and maintainability concerns worth addressing before merge. Critical Issues1. Accidentally committed claude-mem CLAUDE.md files29 of the 50 changed files are
These all read: These should be removed. They bloat the PR noise, obscure the real changes, and commit tooling artifacts that have no business being in version control. Fix this → 2.
|
| Feature | Tests Present | Gap |
|---|---|---|
HiveBitset |
✅ 3 tests | Missing: insert out-of-bounds, wordCount: 0 |
HiveInvertedIndex |
✅ via MemoryStore tests | Missing: direct unit tests, remove on nonexistent doc |
| Static layer analysis | ✅ 3 tests | Good coverage |
HiveCachePolicy / HiveNodeCache |
❌ None | LRU eviction, TTL expiry, cache hit/miss in runtime |
Deferred nodes (HiveNodeOptions.deferred) |
❌ None | Deferred execution ordering |
fork |
❌ None | Fork from checkpoint, fork inheriting state |
getState |
❌ None | In-memory vs checkpoint fallback |
| Ephemeral channel reset | ❌ None | Reset after superstep |
Summary
The core structural changes — bitset join barriers, BM25 recall, static layer analysis, deferred nodes, and cache policy — are well-conceived. The implementation quality is generally good. The main concerns are:
- Remove the accidental claude-mem CLAUDE.md files — they dominate the diff and should not be committed
- Fix the expired cache entry bug — entries should be evicted on access
- Factor out the duplicated fork loop — maintainability risk
- Add tests for cache, deferred nodes, and fork
| PR branch: 0218
|
@claude make fixes, add tests |
|
Claude finished @christopherkarani's task in 7m 6s —— View job Fixes and tests appliedTasks:
Changes in commit
|
- Remove 21 accidentally committed claude-mem CLAUDE.md files from Sources/HiveSwarm, Sources/Wax, Sources/WaxCore, Sources/WaxMCPServer, Wax/Sources/WaxMCPServer, https:/, and ~/.claude paths - HiveCachePolicy: remove expired cache entries on lookup (bug fix) Previously, expired entries were returned as nil but stayed in `entries`, counting against maxEntries and displacing valid entries via LRU eviction - HiveCachePolicy: fix UInt64 overflow in lruTTL nanosecond conversion Cap seconds to 9 billion before multiplying, use wrapping arithmetic - HiveCachePolicy: use shared JSONEncoder instead of per-call allocation Avoids constructing a new JSONEncoder for every channel on every key computation - HiveInvertedIndex: return Double scores instead of Float Truncating Double BM25 scores to Float lost ~7 decimal digits of precision relevant to deterministic tie-breaking - HiveMemoryTypes / HiveRAGWaxStore: propagate Float→Double change HiveMemoryItem.score and HiveRAGWaxStore's internal scoring both updated to Double for consistency - Add HiveInvertedIndexTests: 11 tests covering empty index, term/limit edge cases, upsert/remove, duplicate key replacement, limit, tie-breaking, and score ordering - Add HiveNodeCacheTests: 8 tests covering miss, hit, TTL expiry with entry removal, LRU eviction, overwrite, and lruTTL overflow safety - Extend HiveBitsetTests: 4 new edge-case tests for wordCount:0, out-of-bounds insert/contains, and bitCapacity:0 clamping Co-authored-by: Christopher Karani <christopherkarani@users.noreply.github.com>
|
@claude Review |
|
Claude finished @christopherkarani's task in 4m 38s —— View job PR Review: Implement Hive runtime structural upgradesTasks:
This review covers the current state of the PR after the two previous fix passes. The core architectural additions are sound; the remaining issues are primarily in the cache integration. Critical Correctness Issues1. Cache hits do not update LRU order — eviction is broken
guard var nodeCache = state.nodeCaches[task.nodeID] else { continue }
guard let cachedOutput = nodeCache.lookup(key: cacheKey, policy: cachePolicy, nowNanoseconds: cacheNowNs)
else { continue }
cachedResultsByTaskIndex[index] = cachedOutput
Fix: after the lookup loop, write mutated caches back into 2.
|
| Feature | Status |
|---|---|
| Cache hit LRU update (issue #1) | ❌ No test demonstrates LRU update on hit |
fork from checkpoint with live step execution |
❌ No integration test |
getState with checkpoint fallback |
❌ No test |
HiveNodeOptions.deferred end-to-end |
❌ No integration test |
| Ephemeral channel reset after superstep | ❌ No test |
HiveCachePolicy.channels with unknown channel IDs |
❌ No test |
Summary
The critical item before merge is issue #1 (cache hits discarding LRU updates) — this makes the LRU eviction policy non-functional for hit traffic. Issue #2 (empty hash on registry failure) is a silent correctness hazard that should also be addressed. Issue #3 (duplicate nextNodes) is a minor API contract bug.
The forkAttempt duplication (issue #4) is a maintenance risk but doesn't produce incorrect output at runtime.
| PR branch: 0218
|
@claude fix all issues close all gaps |
|
Claude finished @christopherkarani's task in 6s —— View job Fixes applied and test gaps closedAll issues from the previous two reviews have been addressed in commit Changes made
New test file:
Remaining items (not addressed)None of the issues from the previous reviews remain open. The one item previously marked "too complex for this pass" ( |
Critical fixes: - LRU writeback: write mutated nodeCache back to state.nodeCaches after every lookup so lastUsedOrder updates (and expired-entry removals) are not silently discarded (fixes cache hit LRU ordering and eviction correctness) - storeHashKey silent failure: capture sortedGlobalSpecs at factory time in lru()/lruTTL() instead of constructing HiveSchemaRegistry per call; avoids per-call allocation AND the constant-hash bug when registry construction fails - nextNodes deduplication: HiveStateSnapshot.nextNodes now deduplicates and sorts via deduplicatedNextNodes(from:) helper (fixes duplicate nodes when scheduled by both a router and a static edge) - Log cache key errors: replace try? with do/catch + environment.logger.debug so cache key provider failures are surfaced rather than silently disabled Refactoring: - Extract shared executeRunLoop helper from runAttempt/resumeAttempt/forkAttempt eliminating ~209 lines of near-verbatim duplication; firstStepResume and clearInterruptionAfterFirstStep parameters encode the two behavioral differences New tests (HiveRuntimeCacheAndForkTests.swift): - testCache_HitSkipsNodeExecution: end-to-end cache hit skips re-execution - testCache_LRUWritebackPreservesOrder: LRU order persists across hits - testCachePolicy_ChannelsUnknownIDs: unknown channel IDs degrade gracefully - testGetState_CheckpointFallback: getState loads from checkpoint when no in-memory state exists - testGetState_NilWhenNoState: getState returns nil when thread never ran - testDeferredNodes_ExecuteAfterMainFrontier: deferred nodes run after main frontier is exhausted - testEphemeralChannel_ResetsAfterSuperstep: ephemeral channels reset to initial value after each superstep commit - testFork_RunsFromCheckpointToCompletion: fork loads checkpoint and runs new thread to completion - testGetState_NextNodesDeduplication: nextNodes deduplicates nodes that appear multiple times in frontier Co-authored-by: Christopher Karani <christopherkarani@users.noreply.github.com>