@shivasurya
Owner

This PR implements comprehensive import extraction for Python code using tree-sitter AST parsing. It handles all three main import styles:

  1. Simple imports: import module
  2. From imports: from module import name
  3. Aliased imports: import module as alias and from module import name as alias
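
To make the three styles concrete, here is a minimal sketch of the name mapping each one could produce. The `importMap` type and its `add` method are hypothetical stand-ins written for this illustration, not the `ImportMap` API added in this PR; the assumption is that each name visible in the file maps to a fully qualified name.

```go
package main

import "fmt"

// importMap is a hypothetical stand-in for the PR's ImportMap (assumption):
// it maps the name visible in the file to the fully qualified name.
type importMap map[string]string

// add records one imported name. module is the "from" module ("" for a
// plain import), name is the imported name, alias is the "as" name
// ("" when there is no alias).
func (m importMap) add(module, name, alias string) {
	fqn := name
	if module != "" {
		fqn = module + "." + name
	}
	local := name
	if alias != "" {
		local = alias
	}
	m[local] = fqn
}

func main() {
	m := importMap{}
	m.add("", "os", "")                 // import os
	m.add("json", "dumps", "")          // from json import dumps
	m.add("", "numpy", "np")            // import numpy as np
	m.add("json", "loads", "load_json") // from json import loads as load_json
	fmt.Println(m["os"], m["dumps"], m["np"], m["load_json"])
}
```

Keying the map by the local name means a later resolution step can look up whatever identifier appears at a call site without caring which import style introduced it.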

The implementation uses direct AST traversal instead of tree-sitter queries for better compatibility and control. It properly handles:

  • Multiple imports per line (from json import dumps, loads)
  • Nested module paths (import xml.etree.ElementTree)
  • Whitespace variations
  • Invalid/malformed syntax (fault-tolerant parsing)

Key functions:

  • ExtractImports(): Main entry point that parses code and builds ImportMap
  • traverseForImports(): Recursively traverses AST to find import statements
  • processImportStatement(): Handles simple and aliased imports
  • processImportFromStatement(): Handles from-import statements with proper module name skipping to avoid duplicate entries

Test coverage: 92.8% overall, 90-95% for import extraction functions

Test fixtures include:

  • simple_imports.py: Basic import statements
  • from_imports.py: From import statements with multiple names
  • aliased_imports.py: Aliased imports (both simple and from)
  • mixed_imports.py: Mixed import styles

All tests passing, linting clean, builds successfully.

This is Pass 2 Part A of the 3-pass call graph algorithm.

Checklist:

  • Tests passing (gradle testGo)?
  • Lint passing (golangci-lint run; this requires golangci-lint)?

shivasurya and others added 3 commits October 25, 2025 22:47
Add foundational data structures for Python call graph construction:

New Types:
- CallSite: Represents function call locations with arguments and resolution status
- CallGraph: Maps functions to callees with forward/reverse edges
- ModuleRegistry: Maps Python file paths to module paths
- ImportMap: Tracks imports per file for name resolution
- Location: Source code position tracking
- Argument: Function call argument metadata

Features:
- 100% test coverage with comprehensive unit tests
- Bidirectional call graph edges (forward and reverse)
- Support for ambiguous short names in module registry
- Helper functions for module path manipulation
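
As a rough illustration of the bidirectional edges, the sketch below keeps two maps in sync. The `callGraph` type, its field names, and `addEdge` are hypothetical and chosen for this example; the actual `CallGraph` type in the commit may be shaped differently.

```go
package main

import "fmt"

// callGraph is a hypothetical, minimal version of a bidirectional call graph.
type callGraph struct {
	edges        map[string][]string // caller -> callees (forward)
	reverseEdges map[string][]string // callee -> callers (reverse)
}

func newCallGraph() *callGraph {
	return &callGraph{
		edges:        map[string][]string{},
		reverseEdges: map[string][]string{},
	}
}

// addEdge records caller -> callee in both directions, ignoring duplicates.
func (g *callGraph) addEdge(caller, callee string) {
	for _, c := range g.edges[caller] {
		if c == callee {
			return // edge already present
		}
	}
	g.edges[caller] = append(g.edges[caller], callee)
	g.reverseEdges[callee] = append(g.reverseEdges[callee], caller)
}

func main() {
	g := newCallGraph()
	g.addEdge("myapp.views.index", "myapp.utils.render")
	g.addEdge("myapp.api.list", "myapp.utils.render")
	g.addEdge("myapp.views.index", "myapp.utils.render") // duplicate, ignored

	fmt.Println("callees of index:", g.edges["myapp.views.index"])
	fmt.Println("callers of render:", g.reverseEdges["myapp.utils.render"])
}
```

Storing the reverse map costs extra memory but makes "who calls this function?" a single lookup instead of a scan over every caller.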

This establishes the foundation for 3-pass call graph algorithm:
- Pass 1 (next PR): Module registry builder
- Pass 2 (next PR): Import extraction and resolution
- Pass 3 (next PR): Call graph construction

Related: Phase 1 - Call Graph Construction & 3-Pass Algorithm

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Implement the first pass of the call graph construction algorithm: building
a complete registry of Python modules by walking the directory tree.

New Features:
- BuildModuleRegistry: Walks directory tree and maps file paths to module paths
- convertToModulePath: Converts file system paths to Python import paths
- shouldSkipDirectory: Filters out venv, __pycache__, build dirs, etc.

Module Path Conversion:
- Handles regular files: myapp/views.py → myapp.views
- Handles packages: myapp/utils/__init__.py → myapp.utils
- Supports deep nesting: myapp/api/v1/endpoints/users.py → myapp.api.v1.endpoints.users
- Cross-platform: Normalizes Windows/Unix path separators
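
The conversion rules above can be sketched as a small pure function. `toModulePath` is a hypothetical name for this illustration (the signature of the real `convertToModulePath` is not shown here), and it assumes the input path is already relative to the project root.

```go
package main

import (
	"fmt"
	"strings"
)

// toModulePath converts a project-relative file path to a dotted module path.
// Hypothetical sketch; assumes relPath is relative to the project root.
func toModulePath(relPath string) string {
	p := strings.ReplaceAll(relPath, "\\", "/") // normalize Windows separators
	p = strings.TrimSuffix(p, ".py")
	p = strings.TrimSuffix(p, "/__init__") // a package is named by its directory
	if p == "__init__" {
		return "" // __init__.py at the root has no module name of its own
	}
	return strings.ReplaceAll(p, "/", ".")
}

func main() {
	for _, p := range []string{
		"myapp/views.py",
		"myapp/utils/__init__.py",
		"myapp/api/v1/endpoints/users.py",
		`myapp\api\v1\endpoints\users.py`,
	} {
		fmt.Printf("%s -> %s\n", p, toModulePath(p))
	}
}
```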

Performance Optimizations:
- Skips 15+ common non-source directories (venv, __pycache__, .git, dist, build, etc.)
- Avoids scanning thousands of dependency files
- Indexes both full module paths and short names for ambiguity detection
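
A minimal sketch of directory pruning and short-name indexing follows. All names here (`skipDirs`, `pythonFiles`, `shortNameIndex`, `demoFS`) are hypothetical, the skip list is an illustrative subset rather than the commit's actual list, and an in-memory file system from `testing/fstest` stands in for a real project directory.

```go
package main

import (
	"fmt"
	"io/fs"
	"strings"
	"testing/fstest"
)

// skipDirs is an illustrative subset of non-source directories (assumption).
var skipDirs = map[string]bool{
	"venv": true, "__pycache__": true, ".git": true, "dist": true, "build": true,
}

// pythonFiles returns the .py files under fsys, pruning skipped directories.
func pythonFiles(fsys fs.FS) ([]string, error) {
	var files []string
	err := fs.WalkDir(fsys, ".", func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.IsDir() {
			if skipDirs[d.Name()] {
				return fs.SkipDir // do not descend into this directory at all
			}
			return nil
		}
		if strings.HasSuffix(path, ".py") {
			files = append(files, path)
		}
		return nil
	})
	return files, err
}

// shortNameIndex maps the last component of each module path to every full
// path that ends with it; more than one entry means the short name is ambiguous.
func shortNameIndex(modules []string) map[string][]string {
	idx := map[string][]string{}
	for _, m := range modules {
		short := m[strings.LastIndex(m, ".")+1:]
		idx[short] = append(idx[short], m)
	}
	return idx
}

// demoFS builds a small in-memory project for the example.
func demoFS() fs.FS {
	return fstest.MapFS{
		"myapp/views.py":                          &fstest.MapFile{},
		"myapp/utils/helpers.py":                  &fstest.MapFile{},
		"myapp/__pycache__/views.cpython-312.pyc": &fstest.MapFile{},
		"venv/lib/site.py":                        &fstest.MapFile{},
	}
}

func main() {
	files, err := pythonFiles(demoFS())
	if err != nil {
		fmt.Println("walk failed:", err)
		return
	}
	fmt.Println("source files:", files)

	idx := shortNameIndex([]string{"myapp.utils", "lib.utils", "myapp.views"})
	for short, full := range idx {
		if len(full) > 1 {
			fmt.Println("ambiguous short name:", short, full)
		}
	}
}
```

Returning `fs.SkipDir` from the callback is what avoids scanning dependency trees: the walker never opens anything below a skipped directory.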

Test Coverage: 93%
- Comprehensive unit tests for all conversion scenarios
- Integration tests with real Python project structure
- Edge case handling: empty dirs, non-Python files, deep nesting, permissions
- Error path testing: walk errors, invalid paths, system errors
- Test fixtures: test-src/python/simple_project/ with realistic structure
- Documented: Remaining 7% are untestable OS-level errors (filepath.Abs failures)

This establishes Pass 1 of 3:
- ✅ Pass 1: Module registry (this PR)
- Next: Pass 2 - Import extraction and resolution
- Next: Pass 3 - Call graph construction

Related: Phase 1 - Call Graph Construction & 3-Pass Algorithm
Base Branch: shiva/callgraph-infra-1 (PR #1)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@shivasurya shivasurya self-assigned this Oct 26, 2025
@shivasurya shivasurya added the enhancement (New feature or request) and go (Pull requests that update go code) labels Oct 26, 2025
@safedep

safedep bot commented Oct 26, 2025

SafeDep Report Summary

Malicious packages: green | Vulnerable packages: green | Risky licenses: green

No dependency changes detected. Nothing to scan.

This report is generated by SafeDep Github App

@codecov

codecov bot commented Oct 26, 2025

Codecov Report

❌ Patch coverage is 86.66667% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.94%. Comparing base (4e21322) to head (1e53414).
⚠️ Report is 1 commit behind head on main.

Files with missing lines | Patch % | Lines
sourcecode-parser/graph/callgraph/imports.go | 86.66% | 4 Missing and 4 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #324      +/-   ##
==========================================
+ Coverage   73.65%   73.94%   +0.29%     
==========================================
  Files          25       26       +1     
  Lines        2615     2675      +60     
==========================================
+ Hits         1926     1978      +52     
- Misses        643      647       +4     
- Partials       46       50       +4     


Base automatically changed from shiva/callgraph-infra-2 to main October 29, 2025 02:10
@shivasurya shivasurya merged commit 5454e99 into main Oct 29, 2025
5 checks passed
@shivasurya shivasurya deleted the shiva/callgraph-infra-3 branch October 29, 2025 02:12

2 participants