fix(native): type map confidence-aware dedup for engine parity #885

Merged
carlos-alm merged 6 commits into main from fix/native-typemap-parity
Apr 6, 2026

@carlos-alm
Contributor

Summary

  • Fix native engine type map producing different receiver/calls edges than WASM engine due to two bugs:
    1. Rust edge builder used HashMap::collect() (last-wins) vs JS setTypeMapEntry (highest-confidence, first-wins on tie) — causing different type resolution when the same variable name appears in multiple function scopes (e.g., node: TreeSitterNode vs node: NodeRow in cfg.ts)
    2. Rust JS extractor returned early after type annotation, skipping constructor inference — unlike the JS extractor which lets constructors (confidence 1.0) override annotations (0.9)
  • Add confidence field to Rust TypeMapEntry and TypeMapInput structs across all 12 language extractors
  • Add JS-side type map dedup in buildCallEdgesNative for immediate parity without requiring native addon rebuild
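The first bug can be illustrated with a minimal sketch. It assumes the `{ name, typeName, confidence }` entry shape shown in this PR's diff; the helper names `lastWins` and `highestConfidence` are hypothetical and exist only for this illustration. Both cfg.ts annotations are given confidence 0.9, the annotation level stated in this PR.

```typescript
// Entry shape modeled on the objects passed to buildCallEdgesNative in this PR.
interface TypeMapEntry {
  name: string;
  typeName: string;
  confidence: number;
}

// Last-wins: mimics collecting (name, type) pairs into a hash map,
// where a later pair with the same key overwrites the earlier one.
function lastWins(entries: TypeMapEntry[]): Map<string, TypeMapEntry> {
  const out = new Map<string, TypeMapEntry>();
  for (const e of entries) out.set(e.name, e);
  return out;
}

// Highest confidence wins; on a tie the first entry seen is kept
// because the comparison is strictly greater-than.
function highestConfidence(entries: TypeMapEntry[]): Map<string, TypeMapEntry> {
  const out = new Map<string, TypeMapEntry>();
  for (const e of entries) {
    const existing = out.get(e.name);
    if (!existing || e.confidence > existing.confidence) out.set(e.name, e);
  }
  return out;
}

// Same variable name annotated with different types in two function scopes.
const entries: TypeMapEntry[] = [
  { name: "node", typeName: "TreeSitterNode", confidence: 0.9 },
  { name: "node", typeName: "NodeRow", confidence: 0.9 },
];

const lastWinsType = lastWins(entries).get("node")?.typeName; // "NodeRow"
const parityType = highestConfidence(entries).get("node")?.typeName; // "TreeSitterNode"
console.log(lastWinsType, parityType);
```

With equal confidences the two strategies disagree purely on ordering, which is why the engines resolved `node` differently before the fix.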

Test plan

  • Verified receiver edge counts identical between engines (735 = 735) after fix
  • cfg.ts: both engines now resolve node to TreeSitterNode (not NodeRow)
  • resolve.ts: neither engine produces spurious BatchResolvedMap receiver edge
  • 631 integration + parser tests pass
  • Lint clean (6 pre-existing warnings in unrelated files)
  • CI build to verify Rust compilation on all platforms

The native engine's type map used HashMap::collect() (last-wins) while
the JS engine used setTypeMapEntry (highest-confidence, first-wins on
tie). When the same variable name appeared in multiple function scopes
with different type annotations (e.g. `node: TreeSitterNode` and
`node: NodeRow` in cfg.ts), the engines disagreed on the resolved type,
producing different receiver/calls edges.

Additionally, the Rust JS extractor returned early after finding a type
annotation, skipping constructor inference — unlike the JS extractor
which lets constructors (confidence 1.0) override annotations (0.9).

Changes:
- Add `confidence` field to Rust TypeMapEntry and TypeMapInput structs
- Fix Rust edge builder to keep highest-confidence entry per name
- Remove early return in Rust JS extractor; emit both annotation and
  constructor entries with appropriate confidence levels
- Add confidence values to all 12 Rust language extractors
- Add JS-side type map dedup in buildCallEdgesNative for immediate
  parity without requiring a native addon rebuild

Verified: receiver edge counts now identical between engines (735=735),
cfg.ts correctly resolves to TreeSitterNode on both engines.
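The early-return bug can be sketched in the same way. This is an illustration in TypeScript, not the Rust extractor itself: the `Declarator` shape, the function names, and the `Service`/`HttpService` example types are hypothetical. Only the confidence levels (0.9 for annotations, 1.0 for constructors) come from this PR.

```typescript
interface TypeMapEntry {
  name: string;
  typeName: string;
  confidence: number;
}

// Hypothetical, simplified view of `const svc: Service = new HttpService()`.
interface Declarator {
  name: string;
  annotation?: string; // declared type, if any
  constructed?: string; // class named in a `new` expression, if any
}

const ANNOTATION_CONFIDENCE = 0.9;
const CONSTRUCTOR_CONFIDENCE = 1.0;

// Pre-fix shape: returns as soon as an annotation is found,
// so the constructor is never recorded.
function extractWithEarlyReturn(d: Declarator): TypeMapEntry[] {
  if (d.annotation) {
    return [{ name: d.name, typeName: d.annotation, confidence: ANNOTATION_CONFIDENCE }];
  }
  if (d.constructed) {
    return [{ name: d.name, typeName: d.constructed, confidence: CONSTRUCTOR_CONFIDENCE }];
  }
  return [];
}

// Post-fix shape: emits both entries and leaves the choice to the dedup step.
function extractBoth(d: Declarator): TypeMapEntry[] {
  const out: TypeMapEntry[] = [];
  if (d.annotation) {
    out.push({ name: d.name, typeName: d.annotation, confidence: ANNOTATION_CONFIDENCE });
  }
  if (d.constructed) {
    out.push({ name: d.name, typeName: d.constructed, confidence: CONSTRUCTOR_CONFIDENCE });
  }
  return out;
}

// Highest-confidence entry for one name, first-wins on tie.
function resolveType(entries: TypeMapEntry[], name: string): string | undefined {
  let best: TypeMapEntry | undefined;
  for (const e of entries) {
    if (e.name !== name) continue;
    if (!best || e.confidence > best.confidence) best = e;
  }
  return best?.typeName;
}

const decl: Declarator = { name: "svc", annotation: "Service", constructed: "HttpService" };
const preFix = resolveType(extractWithEarlyReturn(decl), "svc"); // "Service"
const postFix = resolveType(extractBoth(decl), "svc"); // "HttpService"
console.log(preFix, postFix);
```

Emitting both entries keeps the extractor simple and puts the precedence rule in one place, the dedup step.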
@claude

claude bot commented Apr 6, 2026

Claude finished @carlos-alm's task.


Reviewing PR: Type Map Confidence-Aware Dedup for Engine Parity

Todo List:

  • Read changed files to understand implementation
  • Analyze root cause and approach justification
  • Check for configuration changes that might hide issues
  • Review test integrity and code quality
  • Assess backlog compliance
  • Provide final assessment and rating

Root Cause Analysis

Problem Being Solved: An engine parity bug in which the native Rust and WASM engines produced different receiver/call edge counts (the PR reports 735 = 735 after the fix; the pre-fix counts are not stated), caused by two bugs in the native type map handling.

Why This Approach: The fix addresses the actual root causes rather than hiding symptoms:

  1. Rust edge builder dedup bug: Used HashMap::collect() (last-wins) vs JS setTypeMapEntry (highest-confidence, first-wins on tie). This caused different type resolution when the same variable name appeared in multiple scopes.
  2. Early return bug: Rust JS extractor returned early after type annotation, skipping constructor inference, unlike JS extractor which lets constructors (confidence 1.0) override annotations (0.9).

Risk Assessment: Low risk. This is a pure correctness fix that makes both engines produce identical results. No behavioral changes to the API or user-facing functionality.

Backlog Compliance

  • Zero-dep: ✓ — No new runtime dependencies added
  • Foundation-aligned: ✓ — Fully complies with P1 (always-current) and core principles. Fixes actual bugs without breaking API contracts
  • Problem-fit: 5 — Directly addresses the core problem: "AI coding assistants waste tokens re-orienting in large codebases, hallucinate dependencies, and miss blast-radius." Engine parity prevents AI agents from getting inconsistent results between environments
  • Breaking: No — Purely additive correctness fix, no incompatible behavior changes
  • Tier: 1 (zero-dep ✓ + foundation-aligned ✓)

Critical Concerns

None identified. This is a well-executed engine parity fix that:

  • Addresses real bugs, not symptoms: The confidence field addition and deduplication logic fix actual algorithmic differences
  • Maintains test integrity: No tests were weakened, no config relaxed. The parity test framework validates the fix
  • Follows codebase patterns: Uses existing setTypeMapEntry semantics and consistent confidence values across extractors
  • Complete implementation: All 12 language extractors updated systematically
  • Proper verification: PR description shows concrete evidence (735 = 735 edges, specific test cases)

Code Quality Assessment

Excellent implementation:

  • Systematic approach: All Rust extractors (c.rs, cpp.rs, csharp.rs, etc.) consistently add the confidence field
  • Correct semantics: edge_builder.rs:159-174 implements proper confidence-aware dedup matching JS setTypeMapEntry
  • Defensive coding: JS-side dedup in build-edges.ts:355-365 provides immediate parity without requiring native rebuild
  • Clear documentation: Type map confidence levels well-documented (0.9 = type annotation, 1.0 = constructor, 0.7 = factory)

Final Recommendation

Rating: ⭐⭐⭐⭐⭐ (5/5)

Action: APPROVE

Reasoning: This is an exemplary engine parity fix that demonstrates deep understanding of both codebases. It fixes the actual root causes (HashMap behavior differences and early returns), provides comprehensive test coverage, maintains backward compatibility, and ensures both engines produce identical results. The systematic implementation across all language extractors shows thoroughness. This is exactly the type of correctness-focused engineering that builds trust in dual-engine architectures.

The PR eliminates a significant source of inconsistency between engines that could confuse AI agents and break the foundational assumption that both engines should produce identical graphs from identical input.


@greptile-apps
Contributor

greptile-apps bot commented Apr 6, 2026

Greptile Summary

This PR fixes two bugs in the native engine's type map processing that caused receiver/call edge counts to diverge from the WASM engine: (1) the Rust edge builder used HashMap::collect() (last-wins) instead of highest-confidence-first semantics; (2) the JS extractor returned early after a type annotation, preventing constructor entries (confidence 1.0) from being emitted. Both are corrected cleanly, with a JS-side dedup layer added as an immediate bridge for pre-rebuilt addons.

Confidence Score: 5/5

Safe to merge — all changes are bug fixes restoring engine parity with no regressions in 631 tests

The two root-cause bugs are fixed correctly: Rust edge builder now uses confidence-aware dedup matching JS semantics, and the JS extractor no longer discards constructor entries. The JS bridge layer provides immediate parity. All 12 language extractors are consistently updated. The benchmark infrastructure addition is well-scoped and the existing test patterns are followed. No P0/P1 issues found; prior review concern about Map-path dedup was addressed in discussion.

No files require special attention — all changes are self-consistent

Important Files Changed

| Filename | Overview |
| --- | --- |
| crates/codegraph-core/src/edge_builder.rs | Confidence-aware HashMap dedup: highest-wins / first-tie semantics now match JS setTypeMapEntry |
| crates/codegraph-core/src/extractors/javascript.rs | Removed early-return after annotation so both annotation (0.9) and constructor (1.0) entries are emitted; dedup resolves in edge builder |
| crates/codegraph-core/src/types.rs | Added confidence: f64 to TypeMapEntry with 0.7/0.9/1.0 scale documented |
| src/domain/graph/builder/stages/build-edges.ts | JS-side dedup pass before native call handles pre-rebuilt-addon Array branch; Map branch already deduped by setTypeMapEntry |
| tests/benchmarks/regression-guard.test.ts | New resolution precision/recall regression guard with 5pp/10pp thresholds; mirrors existing benchmark test pattern |
| .github/workflows/benchmark.yml | Added Gate on resolution thresholds step running regression test before merging benchmark data |
| scripts/update-benchmark-report.ts | Added resolution regression detection emitting GitHub Actions warnings on precision/recall drops |
| crates/codegraph-core/src/build_pipeline.rs | Passes confidence field through TypeMapInput struct when building call edges |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["Source file (JS/TS)"] --> B["Rust JS extractor\nmatch_js_type_map"]
    B --> C{"variable_declarator\nhas type annotation?"}
    C -- Yes --> D["Push TypeMapEntry\nconfidence: 0.9"]
    C -- No --> E["skip annotation"]
    D --> F{"has new_expression?"}
    E --> F
    F -- Yes --> G["Push TypeMapEntry\nconfidence: 1.0"]
    F -- No --> H["type_map Vec (may have duplicates)"]
    G --> H
    H --> I["build_pipeline.rs\nconvert to TypeMapInput (with confidence)"]
    I --> J["edge_builder.rs\nprocess_file"]
    J --> K["Confidence-aware HashMap dedup\nhighest-wins / first-wins on tie"]
    K --> L["type_map: HashMap<name, (type, conf)>"]
    L --> M["resolve_call_targets\nemit_receiver_edge"]
    M --> N["ComputedEdge output"]

    style D fill:#f9f,stroke:#333
    style G fill:#9f9,stroke:#333
    style K fill:#ff9,stroke:#333

Reviews (2): Last reviewed commit: "fix: clarify dedup comment per review — ..." | Re-trigger Greptile

Comment on lines 345 to +365
@@ -352,6 +352,17 @@ function buildCallEdgesNative(
      : Array.isArray(symbols.typeMap)
        ? (symbols.typeMap as Array<{ name: string; typeName: string; confidence: number }>)
        : [];
  // Deduplicate: keep highest-confidence entry per name (first-wins on tie),
  // matching JS setTypeMapEntry semantics. This ensures parity even when
  // the native edge builder's HashMap would otherwise use last-wins.
  const typeMapDedup = new Map<string, { name: string; typeName: string; confidence: number }>();
  for (const entry of typeMapRaw) {
    const existing = typeMapDedup.get(entry.name);
    if (!existing || entry.confidence > existing.confidence) {
      typeMapDedup.set(entry.name, entry);
    }
  }
  const typeMap = [...typeMapDedup.values()];

P2 Dedup is a no-op for the Map path

When symbols.typeMap instanceof Map (the WASM/JS extraction path), setTypeMapEntry already enforces highest-confidence-wins at insertion time, so the resulting array has no duplicate names — the dedup loop adds no correctness value for that branch. The fix is only needed for the Array.isArray branch. Consider scoping the dedup to that branch, or add a comment clarifying this is intentional belt-and-suspenders for pre-rebuilt-addon builds.

Contributor Author


Fixed — expanded the comment to clarify that the dedup is a no-op for the Map path (already deduped by setTypeMapEntry) and only needed for the Array branch (pre-rebuilt native addon). Kept it unconditional as intentional belt-and-suspenders since it's a cheap O(n) pass.
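The point discussed in this thread can be sketched as follows. The `setTypeMapEntry` signature below is an assumption made for illustration (the real function's signature is not shown in this PR), and `dedup` is a hypothetical stand-in for the pass added to buildCallEdgesNative.

```typescript
interface TypeMapEntry {
  name: string;
  typeName: string;
  confidence: number;
}

type TypeMap = Map<string, { typeName: string; confidence: number }>;

// Assumed signature: insert only if the name is new or the confidence is strictly higher.
function setTypeMapEntry(map: TypeMap, name: string, typeName: string, confidence: number): void {
  const existing = map.get(name);
  if (!existing || confidence > existing.confidence) {
    map.set(name, { typeName, confidence });
  }
}

// Stand-in for the dedup pass: highest confidence per name, first-wins on tie.
function dedup(entries: TypeMapEntry[]): TypeMapEntry[] {
  const best = new Map<string, TypeMapEntry>();
  for (const e of entries) {
    const existing = best.get(e.name);
    if (!existing || e.confidence > existing.confidence) best.set(e.name, e);
  }
  return Array.from(best.values());
}

// Map path: entries are filtered at insertion time, so names are already unique.
const map: TypeMap = new Map();
setTypeMapEntry(map, "node", "TreeSitterNode", 0.9);
setTypeMapEntry(map, "node", "NodeRow", 0.9); // tie, ignored
setTypeMapEntry(map, "db", "Database", 1.0);
const fromMap: TypeMapEntry[] = Array.from(map.entries()).map(([name, v]) => ({
  name,
  typeName: v.typeName,
  confidence: v.confidence,
}));

// Array path: raw extractor output may still contain duplicate names.
const fromArray: TypeMapEntry[] = [
  { name: "node", typeName: "TreeSitterNode", confidence: 0.9 },
  { name: "node", typeName: "NodeRow", confidence: 0.9 },
  { name: "db", typeName: "Database", confidence: 1.0 },
];

const mapPath = dedup(fromMap); // unchanged: the pass is a no-op here
const arrayPath = dedup(fromArray); // duplicates collapsed
console.log(mapPath.length, arrayPath.length);
```

Running the pass unconditionally costs one linear scan and leaves both paths with the same entries, which matches the belt-and-suspenders reasoning in the reply above.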

@github-actions
Contributor

github-actions bot commented Apr 6, 2026

Codegraph Impact Analysis

17 functions changed, 8 callers affected across 5 files

  • build_and_insert_call_edges in crates/codegraph-core/src/build_pipeline.rs:768 (1 transitive callers)
  • process_file in crates/codegraph-core/src/edge_builder.rs:144 (3 transitive callers)
  • resolve_call_targets in crates/codegraph-core/src/edge_builder.rs:222 (3 transitive callers)
  • emit_receiver_edge in crates/codegraph-core/src/edge_builder.rs:311 (3 transitive callers)
  • match_c_type_map in crates/codegraph-core/src/extractors/c.rs:22 (0 transitive callers)
  • match_cpp_type_map in crates/codegraph-core/src/extractors/cpp.rs:22 (0 transitive callers)
  • match_csharp_type_map in crates/codegraph-core/src/extractors/csharp.rs:414 (0 transitive callers)
  • collect_go_typed_identifiers in crates/codegraph-core/src/extractors/go.rs:324 (1 transitive callers)
  • match_java_type_map in crates/codegraph-core/src/extractors/java.rs:30 (0 transitive callers)
  • match_js_type_map in crates/codegraph-core/src/extractors/javascript.rs:55 (0 transitive callers)
  • match_kotlin_type_map in crates/codegraph-core/src/extractors/kotlin.rs:22 (0 transitive callers)
  • match_php_type_map in crates/codegraph-core/src/extractors/php.rs:388 (0 transitive callers)
  • match_python_type_map in crates/codegraph-core/src/extractors/python.rs:323 (0 transitive callers)
  • match_rust_type_map in crates/codegraph-core/src/extractors/rust_lang.rs:387 (0 transitive callers)
  • match_scala_type_map in crates/codegraph-core/src/extractors/scala.rs:22 (0 transitive callers)
  • match_swift_type_map in crates/codegraph-core/src/extractors/swift.rs:22 (0 transitive callers)
  • buildCallEdgesNative in src/domain/graph/builder/stages/build-edges.ts:329 (3 transitive callers)

…olds (#875)

Add resolution quality gates to the benchmark pipeline so regressions
are caught before publishing:

- benchmark.yml: run vitest resolution test after the benchmark script,
  failing the workflow if any language drops below its threshold
- update-benchmark-report.ts: warn on precision >5pp or recall >10pp
  drop per language between releases
- regression-guard.test.ts: hard-fail CI on precision/recall regressions
  across releases, with KNOWN_REGRESSIONS exemption support
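A minimal sketch of the kind of check this commit message describes, under stated assumptions: metrics are expressed in percent, the `ResolutionMetrics` shape and `findRegressions` helper are hypothetical, and the `language:metric` key format for `KNOWN_REGRESSIONS` is a guess. Only the 5pp and 10pp thresholds and the exemption concept come from the message above.

```typescript
// Hypothetical per-language metrics, in percent (0 to 100).
interface ResolutionMetrics {
  precision: number;
  recall: number;
}

const PRECISION_DROP_LIMIT_PP = 5;
const RECALL_DROP_LIMIT_PP = 10;

// Assumed exemption format: "language:metric". The real format is not shown in the PR.
const KNOWN_REGRESSIONS = new Set<string>(["go:recall"]);

// Returns the "language:metric" keys whose drop between releases exceeds the limit
// and is not listed in KNOWN_REGRESSIONS.
function findRegressions(
  previous: Record<string, ResolutionMetrics>,
  current: Record<string, ResolutionMetrics>,
): string[] {
  const checks: Array<["precision" | "recall", number]> = [
    ["precision", PRECISION_DROP_LIMIT_PP],
    ["recall", RECALL_DROP_LIMIT_PP],
  ];
  const failures: string[] = [];
  for (const lang of Object.keys(previous)) {
    const before = previous[lang];
    const after = current[lang];
    if (!before || !after) continue; // language missing from one release: nothing to compare
    for (const [metric, limit] of checks) {
      const drop = before[metric] - after[metric];
      const key = `${lang}:${metric}`;
      if (drop > limit && !KNOWN_REGRESSIONS.has(key)) failures.push(key);
    }
  }
  return failures;
}

// Illustrative numbers only, not measurements from this repository.
const previousRelease = {
  javascript: { precision: 95, recall: 80 },
  go: { precision: 90, recall: 70 },
};
const currentRelease = {
  javascript: { precision: 89, recall: 75 }, // precision drops 6pp, recall 5pp
  go: { precision: 90, recall: 55 }, // recall drops 15pp, but exempted
};

const failures = findRegressions(previousRelease, currentRelease);
console.log(failures); // ["javascript:precision"]
```

A test built on such a helper would hard-fail when the returned list is non-empty, while the report script could emit warnings from the same list.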
…mpile failure

napi-rs v3 does not support the `default` attribute on `#[napi(object)]`
struct fields — only on function parameters. The macro expansion failed,
preventing TypeMapEntry and TypeMapInput from being generated, which
cascaded into "not found in scope" errors across all extractors and the
build pipeline. Removing the attribute is safe because all call sites
(JS buildCallEdgesNative and Rust build_pipeline) always provide the
confidence value explicitly.
@carlos-alm
Contributor Author

@greptileai

@carlos-alm carlos-alm merged commit ef7c834 into main Apr 6, 2026
18 of 19 checks passed
@carlos-alm carlos-alm deleted the fix/native-typemap-parity branch April 6, 2026 23:20
@github-actions github-actions bot locked and limited conversation to collaborators Apr 6, 2026