fix(native): type map confidence-aware dedup for engine parity by carlos-alm · Pull Request #885 · optave/ops-codegraph-tool

carlos-alm · 2026-04-06T21:42:10Z

Summary

Fix native engine type map producing different receiver/calls edges than WASM engine due to two bugs:
1. Rust edge builder used HashMap::collect() (last-wins) vs JS setTypeMapEntry (highest-confidence, first-wins on tie) — causing different type resolution when the same variable name appears in multiple function scopes (e.g., node: TreeSitterNode vs node: NodeRow in cfg.ts)
2. Rust JS extractor returned early after type annotation, skipping constructor inference — unlike the JS extractor which lets constructors (confidence 1.0) override annotations (0.9)
Add confidence field to Rust TypeMapEntry and TypeMapInput structs across all 12 language extractors
Add JS-side type map dedup in buildCallEdgesNative for immediate parity without requiring native addon rebuild
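
The difference between the two dedup strategies in bug 1 can be illustrated with a minimal TypeScript sketch. The entry shape follows the fields visible in this PR's diff; the function names `dedupLastWins` and `dedupByConfidence` and the order of the two `node` entries are assumptions made for illustration, not code from the repository.

```typescript
// Entry shape as it appears in the PR's diff (name, typeName, confidence).
interface TypeMapEntry {
  name: string;
  typeName: string;
  confidence: number;
}

// Last-wins: the effect of collecting (name, entry) pairs straight into a hash map.
function dedupLastWins(entries: TypeMapEntry[]): Map<string, TypeMapEntry> {
  const out = new Map<string, TypeMapEntry>();
  for (const entry of entries) {
    out.set(entry.name, entry); // a later entry silently replaces an earlier one
  }
  return out;
}

// Highest confidence wins; on a tie the first entry seen is kept.
function dedupByConfidence(entries: TypeMapEntry[]): Map<string, TypeMapEntry> {
  const out = new Map<string, TypeMapEntry>();
  for (const entry of entries) {
    const existing = out.get(entry.name);
    if (!existing || entry.confidence > existing.confidence) {
      out.set(entry.name, entry);
    }
  }
  return out;
}

// Two annotated `node` variables from different function scopes (order assumed).
const entries: TypeMapEntry[] = [
  { name: "node", typeName: "TreeSitterNode", confidence: 0.9 },
  { name: "node", typeName: "NodeRow", confidence: 0.9 },
];

const lastWins = dedupLastWins(entries).get("node")?.typeName;
const byConfidence = dedupByConfidence(entries).get("node")?.typeName;
console.log(lastWins, byConfidence); // NodeRow TreeSitterNode
```

With equal confidences the strict `>` comparison is what produces first-wins on a tie; switching it to `>=` would reintroduce last-wins behavior.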

Test plan

Verified receiver edge counts identical between engines (735 = 735) after fix
cfg.ts: both engines now resolve node to TreeSitterNode (not NodeRow)
resolve.ts: neither engine produces spurious BatchResolvedMap receiver edge
631 integration + parser tests pass
Lint clean (6 pre-existing warnings in unrelated files)
CI build to verify Rust compilation on all platforms

The native engine's type map used HashMap::collect() (last-wins) while the JS engine used setTypeMapEntry (highest-confidence, first-wins on tie). When the same variable name appeared in multiple function scopes with different type annotations (e.g. `node: TreeSitterNode` and `node: NodeRow` in cfg.ts), the engines disagreed on the resolved type, producing different receiver/calls edges. Additionally, the Rust JS extractor returned early after finding a type annotation, skipping constructor inference, unlike the JS extractor, which lets constructors (confidence 1.0) override annotations (0.9).

Changes:
- Add `confidence` field to Rust TypeMapEntry and TypeMapInput structs
- Fix Rust edge builder to keep highest-confidence entry per name
- Remove early return in Rust JS extractor; emit both annotation and constructor entries with appropriate confidence levels
- Add confidence values to all 12 Rust language extractors
- Add JS-side type map dedup in buildCallEdgesNative for immediate parity without requiring a native addon rebuild

Verified: receiver edge counts now identical between engines (735=735), cfg.ts correctly resolves to TreeSitterNode on both engines.

claude · 2026-04-06T21:42:22Z

Claude finished @carlos-alm's task

Reviewing PR: Type Map Confidence-Aware Dedup for Engine Parity

Todo List:

Read changed files to understand implementation
Analyze root cause and approach justification
Check for configuration changes that might hide issues
Review test integrity and code quality
Assess backlog compliance
Provide final assessment and rating

Root Cause Analysis

Problem Being Solved: Engine parity bug where the native Rust and WASM engines produced different receiver/call edges (the PR reports matching counts, 735 = 735, only after the fix) due to two specific bugs in the native type map handling.

Why This Approach: The fix addresses the actual root causes rather than hiding symptoms:

Rust edge builder dedup bug: Used HashMap::collect() (last-wins) vs JS setTypeMapEntry (highest-confidence, first-wins on tie). This caused different type resolution when same variable names appeared in multiple scopes.
Early return bug: Rust JS extractor returned early after type annotation, skipping constructor inference, unlike JS extractor which lets constructors (confidence 1.0) override annotations (0.9).

Risk Assessment: Low risk. This is a pure correctness fix that makes both engines produce identical results. No behavioral changes to the API or user-facing functionality.

Backlog Compliance

Zero-dep: ✓ — No new runtime dependencies added
Foundation-aligned: ✓ — Fully complies with P1 (always-current) and core principles. Fixes actual bugs without breaking API contracts
Problem-fit: 5 — Directly addresses the core problem: "AI coding assistants waste tokens re-orienting in large codebases, hallucinate dependencies, and miss blast-radius." Engine parity prevents AI agents from getting inconsistent results between environments
Breaking: No — Purely additive correctness fix, no incompatible behavior changes
Tier: 1 (zero-dep ✓ + foundation-aligned ✓)

Critical Concerns

None identified. This is a well-executed engine parity fix that:

✅ Addresses real bugs, not symptoms: The confidence field addition and deduplication logic fix actual algorithmic differences
✅ Maintains test integrity: No tests were weakened, no config relaxed. The parity test framework validates the fix
✅ Follows codebase patterns: Uses existing setTypeMapEntry semantics and consistent confidence values across extractors
✅ Complete implementation: All 12 language extractors updated systematically
✅ Proper verification: PR description shows concrete evidence (735 = 735 edges, specific test cases)

Code Quality Assessment

Excellent implementation:

Systematic approach: All Rust extractors (c.rs, cpp.rs, csharp.rs, etc.) consistently add the confidence field
Correct semantics: edge_builder.rs:159-174 implements proper confidence-aware dedup matching JS setTypeMapEntry
Defensive coding: JS-side dedup in build-edges.ts:355-365 provides immediate parity without requiring native rebuild
Clear documentation: Type map confidence levels well-documented (0.9 = type annotation, 1.0 = constructor, 0.7 = factory)

Final Recommendation

Rating: ⭐⭐⭐⭐⭐ (5/5)

Action: APPROVE

Reasoning: This is an exemplary engine parity fix that demonstrates deep understanding of both codebases. It fixes the actual root causes (HashMap behavior differences and early returns), provides comprehensive test coverage, maintains backward compatibility, and ensures both engines produce identical results. The systematic implementation across all language extractors shows thoroughness. This is exactly the type of correctness-focused engineering that builds trust in dual-engine architectures.

The PR eliminates a significant source of inconsistency between engines that could confuse AI agents and break the foundational assumption that both engines should produce identical graphs from identical input.

greptile-apps · 2026-04-06T21:47:19Z

Greptile Summary

This PR fixes two bugs in the native engine's type map processing that caused receiver/call edge counts to diverge from the WASM engine: (1) the Rust edge builder used HashMap::collect() (last-wins) instead of highest-confidence-first semantics; (2) the JS extractor returned early after a type annotation, preventing constructor entries (confidence 1.0) from being emitted. Both are corrected cleanly, with a JS-side dedup layer added as an immediate bridge for pre-rebuilt addons.

Confidence Score: 5/5

Safe to merge — all changes are bug fixes restoring engine parity with no regressions in 631 tests

The two root-cause bugs are fixed correctly: Rust edge builder now uses confidence-aware dedup matching JS semantics, and the JS extractor no longer discards constructor entries. The JS bridge layer provides immediate parity. All 12 language extractors are consistently updated. The benchmark infrastructure addition is well-scoped and the existing test patterns are followed. No P0/P1 issues found; prior review concern about Map-path dedup was addressed in discussion.

No files require special attention — all changes are self-consistent

Important Files Changed

| Filename | Overview |
| --- | --- |
| crates/codegraph-core/src/edge_builder.rs | Confidence-aware HashMap dedup: highest-wins / first-tie semantics now match JS setTypeMapEntry |
| crates/codegraph-core/src/extractors/javascript.rs | Removed early-return after annotation so both annotation (0.9) and constructor (1.0) entries are emitted; dedup resolves in edge builder |
| crates/codegraph-core/src/types.rs | Added confidence: f64 to TypeMapEntry with 0.7/0.9/1.0 scale documented |
| src/domain/graph/builder/stages/build-edges.ts | JS-side dedup pass before native call handles pre-rebuilt-addon Array branch; Map branch already deduped by setTypeMapEntry |
| tests/benchmarks/regression-guard.test.ts | New resolution precision/recall regression guard with 5pp/10pp thresholds; mirrors existing benchmark test pattern |
| .github/workflows/benchmark.yml | Added "Gate on resolution thresholds" step running regression test before merging benchmark data |
| scripts/update-benchmark-report.ts | Added resolution regression detection emitting GitHub Actions warnings on precision/recall drops |
| crates/codegraph-core/src/build_pipeline.rs | Passes confidence field through TypeMapInput struct when building call edges |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["Source file (JS/TS)"] --> B["Rust JS extractor\nmatch_js_type_map"]
    B --> C{"variable_declarator\nhas type annotation?"}
    C -- Yes --> D["Push TypeMapEntry\nconfidence: 0.9"]
    C -- No --> E["skip annotation"]
    D --> F{"has new_expression?"}
    E --> F
    F -- Yes --> G["Push TypeMapEntry\nconfidence: 1.0"]
    F -- No --> H["type_map Vec (may have duplicates)"]
    G --> H
    H --> I["build_pipeline.rs\nconvert to TypeMapInput (with confidence)"]
    I --> J["edge_builder.rs\nprocess_file"]
    J --> K["Confidence-aware HashMap dedup\nhighest-wins / first-wins on tie"]
    K --> L["type_map: HashMap<name, (type, conf)>"]
    L --> M["resolve_call_targets\nemit_receiver_edge"]
    M --> N["ComputedEdge output"]

    style D fill:#f9f,stroke:#333
    style G fill:#9f9,stroke:#333
    style K fill:#ff9,stroke:#333
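
The extractor half of the flowchart can be sketched in TypeScript as follows. This is only an illustration of the "emit both entries, let dedup decide" idea: the `Declarator` shape, the function names, and the constant names are hypothetical and do not mirror the Rust `match_js_type_map` implementation. The confidence values 0.9 and 1.0 are the ones stated in the PR.

```typescript
interface TypeMapEntry {
  name: string;
  typeName: string;
  confidence: number;
}

// Hypothetical, simplified stand-in for a parsed variable declarator.
interface Declarator {
  name: string;
  annotation?: string; // e.g. the `Service` in `const svc: Service = ...`
  constructorName?: string; // e.g. the `ServiceImpl` in `= new ServiceImpl()`
}

const ANNOTATION_CONFIDENCE = 0.9;
const CONSTRUCTOR_CONFIDENCE = 1.0;

// Emits one entry per available type source; duplicates are expected.
function extractTypeMapEntries(decl: Declarator): TypeMapEntry[] {
  const entries: TypeMapEntry[] = [];
  if (decl.annotation !== undefined) {
    entries.push({ name: decl.name, typeName: decl.annotation, confidence: ANNOTATION_CONFIDENCE });
    // No early return here: the constructor check below must still run.
  }
  if (decl.constructorName !== undefined) {
    entries.push({ name: decl.name, typeName: decl.constructorName, confidence: CONSTRUCTOR_CONFIDENCE });
  }
  return entries;
}

// Picks the highest-confidence entry (first wins on a tie) among entries for one name.
function pickBest(entries: TypeMapEntry[]): TypeMapEntry | undefined {
  let best: TypeMapEntry | undefined;
  for (const entry of entries) {
    if (!best || entry.confidence > best.confidence) {
      best = entry;
    }
  }
  return best;
}

const emitted = extractTypeMapEntries({
  name: "svc",
  annotation: "Service",
  constructorName: "ServiceImpl",
});
const resolved = pickBest(emitted);
console.log(emitted.length, resolved?.typeName); // 2 ServiceImpl
```

Keeping the extractor free of resolution logic means the same confidence rule is applied in one place, the edge builder, regardless of which extractor produced the entries.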

Reviews (2): Last reviewed commit: "fix: clarify dedup comment per review — ..."

greptile-apps · 2026-04-06T21:47:23Z

src/domain/graph/builder/stages/build-edges.ts

@@ -352,6 +352,17 @@ function buildCallEdgesNative(
        : Array.isArray(symbols.typeMap)
          ? (symbols.typeMap as Array<{ name: string; typeName: string; confidence: number }>)
          : [];
+    // Deduplicate: keep highest-confidence entry per name (first-wins on tie),
+    // matching JS setTypeMapEntry semantics.  This ensures parity even when
+    // the native edge builder's HashMap would otherwise use last-wins.
+    const typeMapDedup = new Map<string, { name: string; typeName: string; confidence: number }>();
+    for (const entry of typeMapRaw) {
+      const existing = typeMapDedup.get(entry.name);
+      if (!existing || entry.confidence > existing.confidence) {
+        typeMapDedup.set(entry.name, entry);
+      }
+    }
+    const typeMap = [...typeMapDedup.values()];


Dedup is a no-op for the Map path

When symbols.typeMap instanceof Map (the WASM/JS extraction path), setTypeMapEntry already enforces highest-confidence-wins at insertion time, so the resulting array has no duplicate names — the dedup loop adds no correctness value for that branch. The fix is only needed for the Array.isArray branch. Consider scoping the dedup to that branch, or add a comment clarifying this is intentional belt-and-suspenders for pre-rebuilt-addon builds.

Fixed — expanded the comment to clarify that the dedup is a no-op for the Map path (already deduped by setTypeMapEntry) and only needed for the Array branch (pre-rebuilt native addon). Kept it unconditional as intentional belt-and-suspenders since it's a cheap O(n) pass.
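
For comparison, the reviewer's alternative of scoping the dedup to the Array branch might look like the sketch below. It was not adopted in this PR, which keeps the pass unconditional. The function name `normalizeTypeMap` and the value shape of the Map branch are assumptions, since the diff hunk does not show how the Map is converted.

```typescript
type Entry = { name: string; typeName: string; confidence: number };

// Assumed shapes: a Map keyed by variable name, or a raw array that may hold duplicates.
type TypeMapSource =
  | Map<string, { typeName: string; confidence: number }>
  | Entry[]
  | undefined;

function normalizeTypeMap(source: TypeMapSource): Entry[] {
  if (source instanceof Map) {
    // Map keys are unique, so no dedup pass is needed on this branch.
    return Array.from(source.entries()).map(([name, value]) => ({
      name,
      typeName: value.typeName,
      confidence: value.confidence,
    }));
  }
  if (Array.isArray(source)) {
    // Array branch: keep the highest-confidence entry per name, first wins on a tie.
    const best = new Map<string, Entry>();
    for (const entry of source) {
      const existing = best.get(entry.name);
      if (!existing || entry.confidence > existing.confidence) {
        best.set(entry.name, entry);
      }
    }
    return Array.from(best.values());
  }
  return [];
}

const fromArray = normalizeTypeMap([
  { name: "node", typeName: "TreeSitterNode", confidence: 0.9 },
  { name: "node", typeName: "NodeRow", confidence: 0.9 },
]);
console.log(fromArray.length, fromArray[0].typeName); // 1 TreeSitterNode
```

The unconditional version in the PR trades one extra O(n) pass on the Map path for a single code path that is correct for both inputs.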

github-actions · 2026-04-06T21:49:12Z

Codegraph Impact Analysis

17 functions changed → 8 callers affected across 5 files

build_and_insert_call_edges in crates/codegraph-core/src/build_pipeline.rs:768 (1 transitive callers)
process_file in crates/codegraph-core/src/edge_builder.rs:144 (3 transitive callers)
resolve_call_targets in crates/codegraph-core/src/edge_builder.rs:222 (3 transitive callers)
emit_receiver_edge in crates/codegraph-core/src/edge_builder.rs:311 (3 transitive callers)
match_c_type_map in crates/codegraph-core/src/extractors/c.rs:22 (0 transitive callers)
match_cpp_type_map in crates/codegraph-core/src/extractors/cpp.rs:22 (0 transitive callers)
match_csharp_type_map in crates/codegraph-core/src/extractors/csharp.rs:414 (0 transitive callers)
collect_go_typed_identifiers in crates/codegraph-core/src/extractors/go.rs:324 (1 transitive callers)
match_java_type_map in crates/codegraph-core/src/extractors/java.rs:30 (0 transitive callers)
match_js_type_map in crates/codegraph-core/src/extractors/javascript.rs:55 (0 transitive callers)
match_kotlin_type_map in crates/codegraph-core/src/extractors/kotlin.rs:22 (0 transitive callers)
match_php_type_map in crates/codegraph-core/src/extractors/php.rs:388 (0 transitive callers)
match_python_type_map in crates/codegraph-core/src/extractors/python.rs:323 (0 transitive callers)
match_rust_type_map in crates/codegraph-core/src/extractors/rust_lang.rs:387 (0 transitive callers)
match_scala_type_map in crates/codegraph-core/src/extractors/scala.rs:22 (0 transitive callers)
match_swift_type_map in crates/codegraph-core/src/extractors/swift.rs:22 (0 transitive callers)
buildCallEdgesNative in src/domain/graph/builder/stages/build-edges.ts:329 (3 transitive callers)

…olds (#875)

Add resolution quality gates to the benchmark pipeline so regressions are caught before publishing:
- benchmark.yml: run vitest resolution test after the benchmark script, failing the workflow if any language drops below its threshold
- update-benchmark-report.ts: warn on precision >5pp or recall >10pp drop per language between releases
- regression-guard.test.ts: hard-fail CI on precision/recall regressions across releases, with KNOWN_REGRESSIONS exemption support
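
The per-language thresholds described in this commit message could be checked along the following lines. This is a rough sketch under stated assumptions: metrics are taken to be fractions in [0, 1], exemptions are keyed by language name, and `findRegressions` is a hypothetical helper, not the code in regression-guard.test.ts or update-benchmark-report.ts.

```typescript
// Assumed metric shape: precision and recall as fractions in [0, 1].
interface ResolutionMetrics {
  precision: number;
  recall: number;
}

const PRECISION_DROP_PP = 5; // flag precision drops larger than 5 percentage points
const RECALL_DROP_PP = 10; // flag recall drops larger than 10 percentage points

// Returns one message per regression; languages in `knownRegressions` are skipped.
function findRegressions(
  previous: Record<string, ResolutionMetrics>,
  current: Record<string, ResolutionMetrics>,
  knownRegressions: Set<string> = new Set(),
): string[] {
  const problems: string[] = [];
  for (const language of Object.keys(current)) {
    const before = previous[language];
    if (!before || knownRegressions.has(language)) continue;
    const after = current[language];
    const precisionDrop = (before.precision - after.precision) * 100;
    const recallDrop = (before.recall - after.recall) * 100;
    if (precisionDrop > PRECISION_DROP_PP) {
      problems.push(`${language}: precision dropped ${precisionDrop.toFixed(1)}pp`);
    }
    if (recallDrop > RECALL_DROP_PP) {
      problems.push(`${language}: recall dropped ${recallDrop.toFixed(1)}pp`);
    }
  }
  return problems;
}

const previousRelease = { typescript: { precision: 0.9, recall: 0.7 } };
const currentRelease = { typescript: { precision: 0.8, recall: 0.65 } };
const problems = findRegressions(previousRelease, currentRelease);
console.log(problems.length); // 1: precision fell 10pp, recall fell only 5pp
```

A warning-only script would print these messages, while a hard-failing test would assert that the returned array is empty.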

…mpile failure

napi-rs v3 does not support the `default` attribute on `#[napi(object)]` struct fields, only on function parameters. The macro expansion failed, preventing TypeMapEntry and TypeMapInput from being generated, which cascaded into "not found in scope" errors across all extractors and the build pipeline. Removing the attribute is safe because all call sites (JS buildCallEdgesNative and Rust build_pipeline) always provide the confidence value explicitly.

…r Array path (#885)

carlos-alm · 2026-04-06T22:54:42Z

@greptileai

greptile-apps bot reviewed Apr 6, 2026


carlos-alm added 4 commits April 6, 2026 16:27

style: fix biome formatting in regression guard

5b147bf

fix: clarify dedup comment per review — no-op for Map path, needed fo…

0cc4360

…r Array path (#885)

Merge branch 'main' into fix/native-typemap-parity

3a0db09

carlos-alm merged commit ef7c834 into main Apr 6, 2026
18 of 19 checks passed

carlos-alm deleted the fix/native-typemap-parity branch April 6, 2026 23:20

github-actions bot locked and limited conversation to collaborators Apr 6, 2026

carlos-alm merged 6 commits into main from fix/native-typemap-parity
