cpf/enhancement: Implement import extraction with tree-sitter #324

shivasurya · 2025-10-26T20:17:43Z

This PR implements comprehensive import extraction for Python code using tree-sitter AST parsing. It handles all three main import styles:

1. Simple imports: `import module`
2. From imports: `from module import name`
3. Aliased imports: `import module as alias` and `from module import name as alias`

The implementation uses direct AST traversal instead of tree-sitter queries for better compatibility and control. It properly handles:

- Multiple imports per line (`from json import dumps, loads`)
- Nested module paths (`import xml.etree.ElementTree`)
- Whitespace variations
- Invalid/malformed syntax (fault-tolerant parsing)

Key functions:

- ExtractImports(): Main entry point that parses code and builds ImportMap
- traverseForImports(): Recursively traverses AST to find import statements
- processImportStatement(): Handles simple and aliased imports
- processImportFromStatement(): Handles from-import statements with proper module name skipping to avoid duplicate entries
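To make the traversal concrete, here is a minimal sketch of how these functions could fit together. It is not the code in imports.go: the `Node` type is a hypothetical stand-in for a tree-sitter node (keyword and punctuation children are left out), `ImportMap` is assumed to be a plain alias-to-qualified-name map, and the function signatures are assumptions. The node type names (`import_statement`, `import_from_statement`, `dotted_name`, `aliased_import`) follow the tree-sitter Python grammar.

```go
package main

import "fmt"

// Node is a hypothetical stand-in for a tree-sitter node. Keyword and
// punctuation children are omitted, so only name nodes appear.
type Node struct {
	Type     string // e.g. "import_statement", "dotted_name", "aliased_import"
	Text     string // source text covered by the node
	Children []*Node
}

// ImportMap maps a local name (or alias) to a fully qualified name.
// The real ImportMap type may be shaped differently.
type ImportMap map[string]string

// traverseForImports walks the tree depth-first and dispatches on the two
// import node types; every other node is searched recursively.
func traverseForImports(n *Node, m ImportMap) {
	if n == nil {
		return
	}
	switch n.Type {
	case "import_statement":
		processImportStatement(n, m)
	case "import_from_statement":
		processImportFromStatement(n, m)
	default:
		for _, c := range n.Children {
			traverseForImports(c, m)
		}
	}
}

// nameAndAlias returns the imported name and the local name it is bound to.
// ok is false for malformed nodes, which callers skip.
func nameAndAlias(n *Node) (name, alias string, ok bool) {
	if n.Type == "aliased_import" {
		if len(n.Children) != 2 {
			return "", "", false
		}
		return n.Children[0].Text, n.Children[1].Text, true
	}
	return n.Text, n.Text, n.Text != ""
}

// processImportStatement handles `import a.b` and `import a as x`.
func processImportStatement(n *Node, m ImportMap) {
	for _, c := range n.Children {
		if name, alias, ok := nameAndAlias(c); ok {
			m[alias] = name
		}
	}
}

// processImportFromStatement handles `from mod import a, b as x`.
// The first child is the module name; it is skipped when recording names
// so that the module itself does not become an entry.
func processImportFromStatement(n *Node, m ImportMap) {
	if len(n.Children) < 2 {
		return // malformed: no module or no imported names
	}
	module := n.Children[0].Text
	for _, c := range n.Children[1:] {
		if name, alias, ok := nameAndAlias(c); ok {
			m[alias] = module + "." + name
		}
	}
}

func leaf(text string) *Node { return &Node{Type: "dotted_name", Text: text} }

func aliased(name, alias string) *Node {
	return &Node{Type: "aliased_import", Children: []*Node{
		leaf(name),
		&Node{Type: "identifier", Text: alias},
	}}
}

// sampleTree models: import xml.etree.ElementTree / import numpy as np /
// from json import dumps, loads as ld
func sampleTree() *Node {
	return &Node{Type: "module", Children: []*Node{
		&Node{Type: "import_statement", Children: []*Node{leaf("xml.etree.ElementTree")}},
		&Node{Type: "import_statement", Children: []*Node{aliased("numpy", "np")}},
		&Node{Type: "import_from_statement", Children: []*Node{
			leaf("json"), leaf("dumps"), aliased("loads", "ld"),
		}},
	}}
}

func main() {
	m := ImportMap{}
	traverseForImports(sampleTree(), m)
	fmt.Println(m["np"], m["dumps"], m["ld"]) // numpy json.dumps json.loads
}
```

In this sketch the from-import handler reads the module from the first child and iterates only over the rest, which is one way to get the "module name skipping" behaviour described above: `json` never appears as a key of its own.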

Test coverage: 92.8% overall, 90-95% for import extraction functions

Test fixtures include:

- simple_imports.py: Basic import statements
- from_imports.py: From import statements with multiple names
- aliased_imports.py: Aliased imports (both simple and from)
- mixed_imports.py: Mixed import styles

All tests passing, linting clean, builds successfully.

This is Pass 2 Part A of the 3-pass call graph algorithm.

Checklist:

Tests passing (gradle testGo)?
Lint passing (golangci-lint run; this requires golangci-lint to be installed)?

Add foundational data structures for Python call graph construction:

New Types:
- CallSite: Represents function call locations with arguments and resolution status
- CallGraph: Maps functions to callees with forward/reverse edges
- ModuleRegistry: Maps Python file paths to module paths
- ImportMap: Tracks imports per file for name resolution
- Location: Source code position tracking
- Argument: Function call argument metadata

Features:
- 100% test coverage with comprehensive unit tests
- Bidirectional call graph edges (forward and reverse)
- Support for ambiguous short names in module registry
- Helper functions for module path manipulation

This establishes the foundation for 3-pass call graph algorithm:
- Pass 1 (next PR): Module registry builder
- Pass 2 (next PR): Import extraction and resolution
- Pass 3 (next PR): Call graph construction

Related: Phase 1 - Call Graph Construction & 3-Pass Algorithm

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
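As an illustration of the bidirectional edges mentioned in this commit, the following is a rough sketch rather than the actual type definitions. The field names `Edges` and `ReverseEdges`, the constructor, and the `AddEdge` method are all assumptions made for the example; only the idea of keeping forward and reverse edges side by side comes from the commit message.

```go
package main

import "fmt"

// CallGraph is a hypothetical sketch of a graph with forward and reverse
// edges. Field and method names are assumptions, not the real API.
type CallGraph struct {
	Edges        map[string][]string // caller -> callees
	ReverseEdges map[string][]string // callee -> callers
}

// NewCallGraph returns an empty graph with both maps initialised.
func NewCallGraph() *CallGraph {
	return &CallGraph{
		Edges:        make(map[string][]string),
		ReverseEdges: make(map[string][]string),
	}
}

// contains reports whether s is already present in list.
func contains(list []string, s string) bool {
	for _, v := range list {
		if v == s {
			return true
		}
	}
	return false
}

// AddEdge records caller -> callee in both directions and ignores
// duplicates, so each lookup direction stays consistent with the other.
func (g *CallGraph) AddEdge(caller, callee string) {
	if !contains(g.Edges[caller], callee) {
		g.Edges[caller] = append(g.Edges[caller], callee)
	}
	if !contains(g.ReverseEdges[callee], caller) {
		g.ReverseEdges[callee] = append(g.ReverseEdges[callee], caller)
	}
}

func main() {
	g := NewCallGraph()
	g.AddEdge("myapp.views.index", "myapp.utils.render")
	g.AddEdge("myapp.api.list", "myapp.utils.render")
	g.AddEdge("myapp.api.list", "myapp.utils.render") // duplicate, ignored

	fmt.Println(g.Edges["myapp.api.list"])            // callees of myapp.api.list
	fmt.Println(g.ReverseEdges["myapp.utils.render"]) // callers of myapp.utils.render
}
```

Storing the reverse map costs extra memory but makes "who calls this function?" a single lookup instead of a scan over every caller, which is presumably why both directions are kept.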

Implement the first pass of the call graph construction algorithm: building a complete registry of Python modules by walking the directory tree.

New Features:
- BuildModuleRegistry: Walks directory tree and maps file paths to module paths
- convertToModulePath: Converts file system paths to Python import paths
- shouldSkipDirectory: Filters out venv, __pycache__, build dirs, etc.

Module Path Conversion:
- Handles regular files: myapp/views.py → myapp.views
- Handles packages: myapp/utils/__init__.py → myapp.utils
- Supports deep nesting: myapp/api/v1/endpoints/users.py → myapp.api.v1.endpoints.users
- Cross-platform: Normalizes Windows/Unix path separators

Performance Optimizations:
- Skips 15+ common non-source directories (venv, __pycache__, .git, dist, build, etc.)
- Avoids scanning thousands of dependency files
- Indexes both full module paths and short names for ambiguity detection

Test Coverage: 93%
- Comprehensive unit tests for all conversion scenarios
- Integration tests with real Python project structure
- Edge case handling: empty dirs, non-Python files, deep nesting, permissions
- Error path testing: walk errors, invalid paths, system errors
- Test fixtures: test-src/python/simple_project/ with realistic structure
- Documented: Remaining 7% are untestable OS-level errors (filepath.Abs failures)

This establishes Pass 1 of 3:
- ✅ Pass 1: Module registry (this PR)
- Next: Pass 2 - Import extraction and resolution
- Next: Pass 3 - Call graph construction

Related: Phase 1 - Call Graph Construction & 3-Pass Algorithm

Base Branch: shiva/callgraph-infra-1 (PR #1)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
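The path conversion rules listed in this commit can be sketched roughly as follows. This is an approximation under stated assumptions, not the code from the PR: here `convertToModulePath` takes a path that is already relative to the project root (the real function may take different arguments), the skip list contains only the five directory names the commit message spells out, and the `ModuleRegistry` fields and `Add` method are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// skipDirs holds only the directory names named in the commit message;
// the real list is said to contain 15+ entries.
var skipDirs = map[string]bool{
	"venv": true, "__pycache__": true, ".git": true, "dist": true, "build": true,
}

// shouldSkipDirectory reports whether a directory name is a known
// non-source directory.
func shouldSkipDirectory(name string) bool { return skipDirs[name] }

// convertToModulePath turns a root-relative file path into a dotted module
// path. Taking a relative path is an assumption made for this sketch.
func convertToModulePath(relPath string) string {
	p := strings.ReplaceAll(relPath, "\\", "/") // normalise Windows separators
	p = strings.TrimSuffix(p, ".py")
	if p == "__init__" {
		return "" // an __init__.py at the root has no module name of its own
	}
	p = strings.TrimSuffix(p, "/__init__") // a package is named by its directory
	return strings.ReplaceAll(p, "/", ".")
}

// shortName returns the last dotted component of a module path.
func shortName(module string) string {
	if i := strings.LastIndex(module, "."); i >= 0 {
		return module[i+1:]
	}
	return module
}

// ModuleRegistry is a hypothetical sketch; field names are assumptions.
type ModuleRegistry struct {
	FileToModule map[string]string   // file path -> module path
	ShortNames   map[string][]string // short name -> all matching module paths
}

// Add indexes one file under both its full module path and its short name.
func (r *ModuleRegistry) Add(relPath string) {
	mod := convertToModulePath(relPath)
	if mod == "" {
		return
	}
	r.FileToModule[relPath] = mod
	s := shortName(mod)
	r.ShortNames[s] = append(r.ShortNames[s], mod)
}

func main() {
	r := &ModuleRegistry{
		FileToModule: map[string]string{},
		ShortNames:   map[string][]string{},
	}
	r.Add("myapp/utils/__init__.py")
	r.Add("lib/utils.py")
	r.Add("myapp/views.py")

	fmt.Println(r.FileToModule["myapp/views.py"]) // myapp.views
	fmt.Println(len(r.ShortNames["utils"]) > 1)   // true: "utils" is ambiguous
}
```

A short name that maps to more than one module path is what the commit calls an ambiguous short name; a resolver would need the full import path to choose between the candidates.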


safedep · 2025-10-26T20:17:46Z

SafeDep Report Summary

No dependency changes detected. Nothing to scan.

This report is generated by SafeDep Github App

codecov · 2025-10-26T20:18:51Z

Codecov Report

❌ Patch coverage is 86.66667% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.94%. Comparing base (4e21322) to head (1e53414).
⚠️ Report is 1 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| sourcecode-parser/graph/callgraph/imports.go | 86.66% | 4 Missing and 4 partials ⚠️ |

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #324      +/-   ##
==========================================
+ Coverage   73.65%   73.94%   +0.29%     
==========================================
  Files          25       26       +1     
  Lines        2615     2675      +60     
==========================================
+ Hits         1926     1978      +52     
- Misses        643      647       +4     
- Partials       46       50       +4

shivasurya and others added 3 commits October 25, 2025 22:47

shivasurya self-assigned this Oct 26, 2025

shivasurya added the enhancement and go labels Oct 26, 2025

Base automatically changed from shiva/callgraph-infra-2 to main October 29, 2025 02:10

Merge branch 'main' into shiva/callgraph-infra-3

1e53414

shivasurya merged commit 5454e99 into main Oct 29, 2025
5 checks passed

shivasurya deleted the shiva/callgraph-infra-3 branch October 29, 2025 02:12
