Skip to content

feat: integrate sproink Rust spreading activation via CGO#268

Merged
nvandessel merged 33 commits intomainfrom
feat/sproink-integration
Apr 14, 2026
Merged

feat: integrate sproink Rust spreading activation via CGO#268
nvandessel merged 33 commits intomainfrom
feat/sproink-integration

Conversation

@nvandessel
Copy link
Copy Markdown
Owner

Summary

  • Replaces floop's per-node-per-step SQLite edge queries with a single FFI call into sproink's in-memory CSR graph for spreading activation
  • Introduces Activator interface with NativeEngine (sproink via CGO) and existing Engine (pure-Go fallback)
  • Auto-detects CGO availability at startup — falls back gracefully on CGO_ENABLED=0 or missing libsproink
  • Adds GetAllEdges and Version() to ExtendedGraphStore for bulk graph loading and staleness detection

Architecture

floop_active call
  → LanceDB pre-filter (unchanged)
  → NativeEngine.Activate()
    → version check → auto-rebuild if stale
    → IDMap: UUID → u32
    → sproink_activate() (single FFI call, entire propagation in Rust)
    → IDMap: u32 → UUID
    → seed source attribution
  → Hebbian learning (Go-side, unchanged)

Key Design Decisions

  • Temporal decay: pre-applied in Go during graph load (sproink FFI doesn't expose it yet)
  • Feature affinity: virtual edges pre-materialized during graph build
  • Graph lifecycle: version counter on store, synchronous rebuild on staleness (sub-ms for floop graph sizes)
  • Co-activation pairs: Go-side functions retained for now; FFI wrappers implemented and tested for parity but not wired into handler

Files (23 changed, +2381/-11)

Area Files
Interface activator.go, engine.go (conformance check)
ID mapping idmap.go, edgekind.go
CGO bindings sproink_cgo.go, sproink_nocgo.go, sproink_pairs_cgo.go
Graph loading graph_loader.go
Store store.go, sqlite.go, sqlite_extended.go, multi.go
Integration server.go, handler_active.go
Tests 8 test files, 30+ test functions including parity and race detection

Test plan

  • CGO_ENABLED=1 go test ./internal/spreading/... -race — all pass
  • CGO_ENABLED=0 go test ./internal/spreading/... — nocgo stub tests pass
  • go test ./internal/store/... — store tests pass
  • Parity tests: NativeEngine vs Engine produce equivalent results (8 scenarios)
  • Property: boundary tolerance for nodes near MinActivation threshold
  • CI: needs Rust toolchain for cross-platform builds (tracked in floop-s8l)

Follow-up beads

  • floop-s8l (P2) — CI: Add Rust toolchain for sproink builds
  • floop-1wi (P3) — Wire FFI pair extraction into handler
  • floop-5ba (P3) — Expose temporal decay + step snapshots via sproink FFI
  • floop-gkd (P3) — Performance benchmarks

🤖 Generated with Claude Code

nvandessel and others added 12 commits April 5, 2026 04:16
Add Activator interface to decouple spreading activation consumers from
the concrete Engine implementation, enabling the upcoming NativeEngine
(sproink CGO) to be used interchangeably.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Implements IDMap for the sproink FFI layer. Provides GetOrAssign
(sequential ID assignment), ToU32/ToUUID (bidirectional lookup),
and Len. Includes TDD tests for roundtrip, contiguity, idempotency,
unknown-UUID lookup, out-of-range panic, and length tracking.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Map floop store.EdgeKind values to sproink uint8 constants:
conflicts→1, overrides/deprecated-to/merged-into→2,
feature-affinity→4, everything else→0 (positive).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add GetAllEdges for bulk edge retrieval (sproink CSR graph building)
and atomic Version counter for stale-graph detection. MultiGraphStore
delegates both to localStore. All mutating methods bump version via
sync/atomic.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Thin Go wrappers around all sproink C functions: graph build/free,
activate, results extraction, co-activation pairs, and Oja update.
The nocgo stub provides compile-time Activator conformance with
runtime errors when CGO is disabled.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Loads edges from store, builds IDMap with isolated node support,
pre-materializes deduplicated affinity edges, applies temporal decay,
and calls sproink_graph_build via CGO FFI.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…proink FFI

NativeEngine wraps the sproink CSR graph for spreading activation. It
detects store staleness via version checks before acquiring a read lock,
rebuilds the graph transparently, and maps FFI results back to UUIDs
with seed-source attribution and activation-descending sort.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Verify that non-CGO builds return expected errors from NativeEngine
constructor and all methods, ensuring clean build separation between
CGO and non-CGO paths.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…patch

Construct spreading Activator once at server startup (NativeEngine via
sproink FFI when ExtendedGraphStore is available, pure-Go Engine as
fallback). Replace per-call engine creation in handleFloopActive with
s.activator.Activate(). Close NativeEngine via io.Closer assertion in
Server.Close().

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Implement high-level FFI wrappers for sproink's co-activation pair
extraction and Oja weight update. Both wrappers are tested for parity
with Go-side implementations. Handler keeps Go-side functions per plan.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Verify both engines produce identical results for the same inputs.
Tests cover positive graphs, conflict suppression, directional
suppressive edges (overrides/merged-into), empty seeds, single seed
with no neighbors, inhibition toggle, and multi-seed overlap.

MinActivation raised to 0.05 in test config to avoid sigmoid(0)
phantom entries. Rank order comparison tolerates tied activations.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
NativeEngine and Go Engine can produce slightly different results for
nodes near MinActivation threshold due to propagation order differences
(Go uses map iteration; sproink uses CSR with bidirectional storage).
Nodes present in only one result set with activation below 2x
MinActivation are now tolerated as boundary effects.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Apr 6, 2026

Codecov Report

❌ Patch coverage is 75.00000% with 36 lines in your changes missing coverage. Please review.
✅ Project coverage is 78.75%. Comparing base (8bf6c8d) to head (6d6c6d9).

Files with missing lines Patch % Lines
internal/store/sqlite.go 69.09% 10 Missing and 7 partials ⚠️
internal/store/multi.go 64.28% 5 Missing and 5 partials ⚠️
internal/mcp/server.go 61.11% 5 Missing and 2 partials ⚠️
internal/store/sqlite_extended.go 66.66% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #268      +/-   ##
==========================================
- Coverage   78.81%   78.75%   -0.06%     
==========================================
  Files         186      189       +3     
  Lines       18646    18779     +133     
==========================================
+ Hits        14695    14789      +94     
- Misses       2731     2757      +26     
- Partials     1220     1233      +13     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@greptile-apps
Copy link
Copy Markdown

greptile-apps bot commented Apr 6, 2026

Greptile Summary

This PR replaces per-node SQLite edge queries with a single FFI call into a Rust CSR graph (sproink) for spreading activation, introducing a NativeEngine/Activator interface with graceful pure-Go fallback, and adding GetAllEdges/Version() to ExtendedGraphStore. All previously-flagged P0/P1 issues (version data race, TOCTOU in Rebuild and construction, double-free, thundering herd, multi-store parity gap, CI cache logic, silent global-store error degradation) are resolved in the follow-up commits.

Confidence Score: 5/5

  • Safe to merge; all prior P0/P1 findings are fixed and only two P2 style suggestions remain.
  • Every critical concurrency, memory-safety, and correctness issue raised in previous review rounds has been addressed. The two remaining findings are P2: a defensive bounds check for FFI-returned node IDs (IDMap.ToUUID panics on out-of-range) and a CI cache key that only captures header changes and not Rust source changes. Neither blocks correctness in normal operation.
  • internal/spreading/sproink_cgo.go (ToUUID panic on FFI out-of-range), .github/workflows/ci.yml (cache key gap)

Important Files Changed

Filename Overview
internal/spreading/sproink_cgo.go Core NativeEngine implementation: all previously flagged issues (version race, TOCTOU, double-free, thundering herd) are fixed; one P2 remains — IDMap.ToUUID panics on out-of-range FFI node IDs without a defensive bounds check.
internal/spreading/graph_loader.go Bulk graph loading with temporal decay, affinity edge materialisation, and empty-store handling; correct TOCTOU-safe pattern (version snapshot before reading edges) applied by callers.
internal/store/multi.go GetAllEdges and Version() now merge local + global stores; Version() uses addition (monotonically non-decreasing), global GetAllEdges errors are propagated rather than silently swallowed.
internal/store/sqlite.go Adds GetAllEdges, Version(), and bumpVersion(); all write paths (AddNode, UpdateNode, DeleteNode, AddEdge, RemoveEdge) correctly bump the atomic version counter.
internal/mcp/server.go NativeEngine wired into Server; io.Closer Close() called on shutdown after workerWg.Wait() ensuring no active activations race with cleanup.
.github/workflows/ci.yml All prior CI cache-logic bugs fixed; one P2 remains — cache key based on sproink.h hash does not detect Rust source-only changes, meaning a bug-fix in sproink without a header change reuses the stale cached binary.
internal/spreading/idmap.go Clean bidirectional UUID ↔ uint32 mapping; ToUUID intentionally panics on out-of-range — defensive callers needed at FFI boundary.
internal/store/sqlite_extended.go TouchEdges and BatchUpdateEdgeWeights now correctly bump version; PruneWeakEdges also bumped.
internal/spreading/sproink_nocgo.go Correct stub for CGO_ENABLED=0 builds; NewNativeEngine returns error, activator falls back to pure-Go Engine.
internal/spreading/sproink_pairs_cgo.go FFI pair extraction and Oja update wrappers; nativeActivateAndExtractPairs correctly holds RLock before accessing idmap/graph; NativeExtractPairs exported but parameter type (C.SproinkResults) is inaccessible from outside the package.
internal/spreading/parity_test.go 8 parity scenarios with appropriate epsilon tolerance; boundary zone handling for nodes near MinActivation threshold is well-reasoned.

Sequence Diagram

sequenceDiagram
    participant H as handler_active
    participant NE as NativeEngine
    participant S as Store (Multi/SQLite)
    participant FF as sproink FFI (Rust)

    H->>NE: Activate(ctx, seeds)
    NE->>S: Version() [atomic]
    alt "store version > engine version"
        NE->>NE: Rebuild(ctx) [write lock]
        NE->>S: Version() [snapVersion]
        NE->>S: GetAllEdges(ctx)
        S-->>NE: []Edge (local + global)
        NE->>S: "QueryNodes(kind=behavior)"
        S-->>NE: []Node (isolated nodes)
        NE->>FF: sproink_graph_build(numNodes, edges, weights, kinds)
        FF-->>NE: "*SproinkGraph (CSR)"
        NE->>NE: atomic.Store(version, snapVersion)
    end
    NE->>NE: RLock
    NE->>NE: map seeds → u32 via IDMap
    NE->>FF: sproink_activate(graph, seeds, params)
    FF-->>NE: "*SproinkResults"
    NE->>NE: map u32 results → UUID via IDMap
    NE->>NE: RUnlock
    NE-->>H: []Result (sorted by activation desc)
    H->>S: TouchEdges + Hebbian learning (Go-side)
Loading

Reviews (20): Last reviewed commit: "fix(test): stabilize hybrid scorer tests..." | Re-trigger Greptile

- P1: Use sync/atomic for NativeEngine.version to prevent data race
  between Activate (read) and Rebuild (write)
- P1: MultiGraphStore.GetAllEdges now merges local + global store edges
  so NativeEngine sees the same graph as the pure-Go Engine
- P1: MultiGraphStore.Version sums both stores' versions for staleness
  detection across both scopes
- P2: Rebuild re-checks version after acquiring write lock to prevent
  thundering herd (N goroutines observe staleness, only first rebuilds)
- Widen parity test epsilon for mixed conflict+positive edges (different
  propagation order causes different suppression magnitudes)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Set e.graph = nil immediately after sproinkGraphFree in Rebuild to
  prevent double-free if loadGraph fails on retry
- Add rows.Err() check after GetAllEdges iteration loop

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
nvandessel and others added 4 commits April 6, 2026 01:14
- P1: Capture version snapshot BEFORE loadGraph in Rebuild to prevent
  racing writes from being silently missed (TOCTOU fix)
- P2: Add bumpVersion() to TouchEdges so last_activated updates trigger
  CSR graph rebuilds with fresh temporal decay values

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Commit third_party/sproink/include/sproink.h (C header needed for CGO
  builds; binaries remain gitignored)
- Fix nil context lint errors in nocgo tests (use context.TODO())
- Add #cgo CFLAGS to sproink_pairs_cgo.go for independent compilation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two issues found — one correctness bug, one dead code cleanup.

P1 — TOCTOU in NewNativeEngine (same class of bug as the one fixed in Rebuild): The Rebuild method correctly captures snapVersion before calling loadGraph, but NewNativeEngine still calls s.Version() after loadGraph returns. If a concurrent write lands between GetAllEdges inside loadGraph and the s.Version() call, the engine is initialised with a stale graph but a version counter that already reflects the post-write state. The next Activate call will then observe store.Version() <= e.version and skip the rebuild, silently serving a stale graph until another write arrives.

P2 — Dead else-branch in activator selection: *MultiGraphStore now implements ExtendedGraphStore in full (it gained GetAllEdges and Version() in this PR). The store.GraphStore(graphStore).(store.ExtendedGraphStore) assertion will therefore always succeed, making the else branch (activator = spreading.NewEngine(graphStore, spreadConfig)) unreachable dead code. The intermediate store.GraphStore(...) widening conversion is also redundant.

nvandessel and others added 3 commits April 6, 2026 23:07
Capture store version before loadGraph (same pattern as Rebuild) to
prevent a racing write from being silently missed at startup.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Build libsproink.a from source in the test-cgo CI job so CGO+LanceDB
tests can link against it. Uses dtolnay/rust-toolchain action and
caches the Rust build artifacts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Handle Windows Rust output (sproink.lib → libsproink.a) in CI
- Make TokenStats test engine-agnostic: measure seed baseline dynamically
  then verify user behaviors add expected deltas, rather than hardcoding
  absolute counts that depend on which engine activates

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Rust defaults to MSVC on Windows but Go CGO uses MinGW. Build sproink
with --target x86_64-pc-windows-gnu to produce a MinGW-compatible
static library.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
nvandessel and others added 3 commits April 10, 2026 17:14
Prevents git clone failure on cache hit when sproink-build/target/
is restored but libsproink.a hasn't been copied to third_party/ yet.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
nvandessel and others added 2 commits April 11, 2026 21:41
- Check for .git subdir (not just directory) before cloning sproink,
  since actions/cache restores target/ creating a non-git directory
- Add --allow-multiple-definition for Windows to resolve duplicate
  rust_eh_personality symbols when linking both sproink and lancedb
  static libraries (both bundle Rust stdlib)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The build guard was checking third_party/sproink/lib/.../libsproink.a
which is never cached (only sproink-build/target/ is). On cache hit,
this always re-entered the build block where cargo build failed because
only target/ was restored without Cargo.toml and src/.

Now checks the cached artifact path directly and always copies to
third_party/ after build or cache restore.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
nvandessel and others added 3 commits April 11, 2026 21:55
Cache restores sproink-build/target/ but not .git, so the .git guard
always triggered rm + fresh clone, defeating the cache. Now checks for
Cargo.toml which exists after clone and won't conflict with cache
restore of target/.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
On cache hit, sproink-build/ exists (containing target/) but has no
source files. git clone fails because the directory is non-empty.
Now clones to a temp dir, preserves any cached target/, then renames.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
nvandessel and others added 3 commits April 13, 2026 12:19
…adation

Silent degradation on global store errors taints the version snapshot
in Rebuild — snapVersion includes the global counter but the graph is
built from local-only edges. Subsequent Activate calls see version as
current and permanently skip rebuild. Propagating the error forces a
retry on the next activation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
time.Since(CreatedAt) can be 0 on Windows (15ms clock) when the behavior
is created immediately before scoring, causing baseLevelScore to diverge
between the two Score() calls in formula-verification tests. Set CreatedAt
to 24h ago so age is large and stable regardless of clock resolution.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@nvandessel nvandessel merged commit 811f50f into main Apr 14, 2026
18 checks passed
@nvandessel nvandessel deleted the feat/sproink-integration branch April 14, 2026 03:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants