fix(native): type map confidence-aware dedup for engine parity#885
fix(native): type map confidence-aware dedup for engine parity#885carlos-alm merged 6 commits intomainfrom
Conversation
The native engine's type map used HashMap::collect() (last-wins) while the JS engine used setTypeMapEntry (highest-confidence, first-wins on tie). When the same variable name appeared in multiple function scopes with different type annotations (e.g. `node: TreeSitterNode` and `node: NodeRow` in cfg.ts), the engines disagreed on the resolved type, producing different receiver/calls edges. Additionally, the Rust JS extractor returned early after finding a type annotation, skipping constructor inference — unlike the JS extractor which lets constructors (confidence 1.0) override annotations (0.9). Changes: - Add `confidence` field to Rust TypeMapEntry and TypeMapInput structs - Fix Rust edge builder to keep highest-confidence entry per name - Remove early return in Rust JS extractor; emit both annotation and constructor entries with appropriate confidence levels - Add confidence values to all 12 Rust language extractors - Add JS-side type map dedup in buildCallEdgesNative for immediate parity without requiring a native addon rebuild Verified: receiver edge counts now identical between engines (735=735), cfg.ts correctly resolves to TreeSitterNode on both engines.
|
Claude finished @carlos-alm's task —— View job Reviewing PR: Type Map Confidence-Aware Dedup for Engine ParityTodo List:
Root Cause AnalysisProblem Being Solved: Engine parity bug where native Rust and WASM engines produced different receiver/call edge counts (735 vs different numbers) due to two specific bugs in the native type map handling. Why This Approach: The fix addresses the actual root causes rather than hiding symptoms:
Risk Assessment: Low risk. This is a pure correctness fix that makes both engines produce identical results. No behavioral changes to the API or user-facing functionality. Backlog Compliance
Critical ConcernsNone identified. This is a well-executed engine parity fix that: ✅ Addresses real bugs, not symptoms: The confidence field addition and deduplication logic fix actual algorithmic differences Code Quality AssessmentExcellent implementation:
Final RecommendationRating: ⭐⭐⭐⭐⭐ (5/5) Action: APPROVE Reasoning: This is an exemplary engine parity fix that demonstrates deep understanding of both codebases. It fixes the actual root causes (HashMap behavior differences and early returns), provides comprehensive test coverage, maintains backward compatibility, and ensures both engines produce identical results. The systematic implementation across all language extractors shows thoroughness. This is exactly the type of correctness-focused engineering that builds trust in dual-engine architectures. The PR eliminates a significant source of inconsistency between engines that could confuse AI agents and break the foundational assumption that both engines should produce identical graphs from identical input. |
Greptile SummaryThis PR fixes two bugs in the native engine's type map processing that caused receiver/call edge counts to diverge from the WASM engine: (1) the Rust edge builder used Confidence Score: 5/5Safe to merge — all changes are bug fixes restoring engine parity with no regressions in 631 tests The two root-cause bugs are fixed correctly: Rust edge builder now uses confidence-aware dedup matching JS semantics, and the JS extractor no longer discards constructor entries. The JS bridge layer provides immediate parity. All 12 language extractors are consistently updated. The benchmark infrastructure addition is well-scoped and the existing test patterns are followed. No P0/P1 issues found; prior review concern about Map-path dedup was addressed in discussion. No files require special attention — all changes are self-consistent Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A["Source file (JS/TS)"] --> B["Rust JS extractor\nmatch_js_type_map"]
B --> C{"variable_declarator\nhas type annotation?"}
C -- Yes --> D["Push TypeMapEntry\nconfidence: 0.9"]
C -- No --> E["skip annotation"]
D --> F{"has new_expression?"}
E --> F
F -- Yes --> G["Push TypeMapEntry\nconfidence: 1.0"]
F -- No --> H["type_map Vec (may have duplicates)"]
G --> H
H --> I["build_pipeline.rs\nconvert to TypeMapInput (with confidence)"]
I --> J["edge_builder.rs\nprocess_file"]
J --> K["Confidence-aware HashMap dedup\nhighest-wins / first-wins on tie"]
K --> L["type_map: HashMap<name, (type, conf)>"]
L --> M["resolve_call_targets\nemit_receiver_edge"]
M --> N["ComputedEdge output"]
style D fill:#f9f,stroke:#333
style G fill:#9f9,stroke:#333
style K fill:#ff9,stroke:#333
Reviews (2): Last reviewed commit: "fix: clarify dedup comment per review — ..." | Re-trigger Greptile |
| @@ -352,6 +352,17 @@ function buildCallEdgesNative( | |||
| : Array.isArray(symbols.typeMap) | |||
| ? (symbols.typeMap as Array<{ name: string; typeName: string; confidence: number }>) | |||
| : []; | |||
| // Deduplicate: keep highest-confidence entry per name (first-wins on tie), | |||
| // matching JS setTypeMapEntry semantics. This ensures parity even when | |||
| // the native edge builder's HashMap would otherwise use last-wins. | |||
| const typeMapDedup = new Map<string, { name: string; typeName: string; confidence: number }>(); | |||
| for (const entry of typeMapRaw) { | |||
| const existing = typeMapDedup.get(entry.name); | |||
| if (!existing || entry.confidence > existing.confidence) { | |||
| typeMapDedup.set(entry.name, entry); | |||
| } | |||
| } | |||
| const typeMap = [...typeMapDedup.values()]; | |||
There was a problem hiding this comment.
Dedup is a no-op for the Map path
When symbols.typeMap instanceof Map (the WASM/JS extraction path), setTypeMapEntry already enforces highest-confidence-wins at insertion time, so the resulting array has no duplicate names — the dedup loop adds no correctness value for that branch. The fix is only needed for the Array.isArray branch. Consider scoping the dedup to that branch, or add a comment clarifying this is intentional belt-and-suspenders for pre-rebuilt-addon builds.
There was a problem hiding this comment.
Fixed — expanded the comment to clarify that the dedup is a no-op for the Map path (already deduped by setTypeMapEntry) and only needed for the Array branch (pre-rebuilt native addon). Kept it unconditional as intentional belt-and-suspenders since it's a cheap O(n) pass.
Codegraph Impact Analysis17 functions changed → 8 callers affected across 5 files
|
…olds (#875) Add resolution quality gates to the benchmark pipeline so regressions are caught before publishing: - benchmark.yml: run vitest resolution test after the benchmark script, failing the workflow if any language drops below its threshold - update-benchmark-report.ts: warn on precision >5pp or recall >10pp drop per language between releases - regression-guard.test.ts: hard-fail CI on precision/recall regressions across releases, with KNOWN_REGRESSIONS exemption support
…mpile failure napi-rs v3 does not support the `default` attribute on `#[napi(object)]` struct fields — only on function parameters. The macro expansion failed, preventing TypeMapEntry and TypeMapInput from being generated, which cascaded into "not found in scope" errors across all extractors and the build pipeline. Removing the attribute is safe because all call sites (JS buildCallEdgesNative and Rust build_pipeline) always provide the confidence value explicitly.
Summary
HashMap::collect()(last-wins) vs JSsetTypeMapEntry(highest-confidence, first-wins on tie) — causing different type resolution when the same variable name appears in multiple function scopes (e.g.,node: TreeSitterNodevsnode: NodeRowin cfg.ts)confidencefield to RustTypeMapEntryandTypeMapInputstructs across all 12 language extractorsbuildCallEdgesNativefor immediate parity without requiring native addon rebuildTest plan
nodetoTreeSitterNode(notNodeRow)BatchResolvedMapreceiver edge