feat: add Puppet, YAML, and Dockerfile tree-sitter support #256

Merged
aphoristicartist merged 2 commits into main from feat/infra-formats
Mar 20, 2026

@aphoristicartist
Contributor

Summary

  • Add AST-based symbol extraction for Puppet (.pp), YAML (.yaml/.yml), and Dockerfile formats
  • Enable the embed command to process infrastructure-as-code with full symbol extraction, chunking, and streaming
  • Cover Helm charts, Kubernetes manifests, Ansible playbooks, Spinnaker pipeline configs, and Puppet manifests

Details

New Language Support

| Format | Extensions | Symbols Extracted |
| --- | --- | --- |
| Puppet | .pp | Classes, defined types, resources, nodes, functions, type aliases |
| YAML | .yaml, .yml | All mapping keys (plain, quoted) |
| Dockerfile | Dockerfile, Dockerfile.* | FROM stages, build args, labels |
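
For illustration, the detection rules implied by the table above could look roughly like the sketch below. `InfraLanguage` and `detect_infra_language` are hypothetical names, not the project's actual `Language` enum or API, and exact-case matching of `Dockerfile` is an assumption.

```rust
use std::path::Path;

/// Hypothetical stand-in for the project's `Language` enum; only the three
/// formats added by this PR are modeled.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum InfraLanguage {
    Puppet,
    Yaml,
    Dockerfile,
}

/// Detect a format from a path. The file name is checked first because
/// Dockerfiles usually have no extension; other formats fall back to the
/// extension.
pub fn detect_infra_language(path: &Path) -> Option<InfraLanguage> {
    let file_name = path.file_name()?.to_str()?;

    // `Dockerfile` and `Dockerfile.*` (e.g. `Dockerfile.prod`) match by name.
    // Assumption: matching is case-sensitive.
    if file_name == "Dockerfile" || file_name.starts_with("Dockerfile.") {
        return Some(InfraLanguage::Dockerfile);
    }

    match path.extension()?.to_str()? {
        "pp" => Some(InfraLanguage::Puppet),
        "yaml" | "yml" => Some(InfraLanguage::Yaml),
        _ => None,
    }
}
```

Checking the name before the extension matters for files such as `Dockerfile.prod`, whose "extension" (`prod`) would otherwise not map to any format.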

Tree-sitter Compatibility

tree-sitter-dockerfile 0.2 depends on tree-sitter 0.20, which is incompatible with this project's tree-sitter 0.26. Resolved by:

  1. Bypassing the Rust wrapper — calling the raw C symbol via extern "C"
  2. Converting via the tree_sitter_language::LanguageFn bridge
  3. Force-linking native code with extern crate tree_sitter_dockerfile in lib.rs
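
The three steps above can be sketched as follows. This is a minimal, std-only illustration, not the code from this PR: the local `LanguageFn` struct only mirrors the shape of `tree_sitter_language::LanguageFn`, and `tree_sitter_dockerfile_stub`, `FAKE_GRAMMAR`, and `dockerfile_language_fn` are hypothetical stand-ins so the sketch runs without the grammar crate.

```rust
/// Local stand-in mirroring the shape of `tree_sitter_language::LanguageFn`:
/// a thin wrapper around a C function pointer that returns an opaque
/// pointer to the grammar's language struct.
#[derive(Clone, Copy)]
pub struct LanguageFn(unsafe extern "C" fn() -> *const ());

impl LanguageFn {
    /// Safety: `f` must return a pointer to a valid tree-sitter language.
    pub unsafe fn from_raw(f: unsafe extern "C" fn() -> *const ()) -> Self {
        LanguageFn(f)
    }

    /// Hand back the raw function pointer.
    pub fn into_raw(self) -> unsafe extern "C" fn() -> *const () {
        self.0
    }
}

// Step 1: in a real build, the raw C symbol would be declared like this and
// resolved by the linker against the grammar's compiled C parser:
//
//     extern "C" { fn tree_sitter_dockerfile() -> *const (); }
//
// Step 3: `extern crate tree_sitter_dockerfile;` in lib.rs would keep that
// native object in the link even though its Rust wrapper is never called.
//
// Here a Rust function with the same ABI stands in for the C symbol.
static FAKE_GRAMMAR: u8 = 0;

unsafe extern "C" fn tree_sitter_dockerfile_stub() -> *const () {
    &FAKE_GRAMMAR as *const u8 as *const ()
}

/// Step 2: wrap the raw symbol in the version-independent bridge type.
pub fn dockerfile_language_fn() -> LanguageFn {
    // Safety: the stub always returns a pointer to a live static.
    unsafe { LanguageFn::from_raw(tree_sitter_dockerfile_stub) }
}
```

With the real crates, the resulting `LanguageFn` would then be converted into the project's `tree_sitter::Language`, so the grammar's own dependency on tree-sitter 0.20 never appears in the project's types.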

Files Changed (15)

  • engine/Cargo.toml — 4 new dependencies
  • engine/src/parser/language.rs — Language enum variants + all match arms
  • engine/src/parser/queries.rs — Symbol extraction queries for 3 formats
  • engine/src/parser/{init,core,query_builder,extraction}.rs — Parser integration
  • engine/src/embedding/{chunker,streaming}.rs — Filename-based Dockerfile detection
  • engine/src/index/{types.rs,builder/core.rs,lazy.rs} — Index language mappings
  • engine/src/analysis/complexity.rs — Complexity analysis support
  • engine/src/lib.rs — Force-link dockerfile native code
  • engine/tests/tree_sitter_compat.rs — ABI compatibility tests for new grammars

Test plan

  • All 1555+ existing unit tests pass
  • Tree-sitter ABI compatibility verified for all 3 new grammars
  • Lockfile test updated for dual tree-sitter versions (0.26 + 0.20)
  • Manual: infiniloom embed on a repo with .pp, .yaml, and Dockerfile files
  • Manual: Verify Spinnaker YAML pipeline configs produce meaningful symbol chunks
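
As an illustration of the dual-version lockfile check mentioned above, a test could scan the text of `Cargo.lock` for every locked version of `tree-sitter`. The sketch below is an assumption about how such a check might be written, not the project's actual test; `locked_versions` and `quoted_value` are hypothetical helpers.

```rust
/// Collect the version of every `[[package]]` entry named exactly `package`
/// in the text of a Cargo.lock file. Relies on `name` preceding `version`
/// within each entry, as Cargo writes it.
pub fn locked_versions(lockfile: &str, package: &str) -> Vec<String> {
    let mut versions = Vec::new();
    let mut in_target = false;

    for line in lockfile.lines() {
        let line = line.trim();
        if line == "[[package]]" {
            // A new entry starts; forget the previous package name.
            in_target = false;
        } else if let Some(name) = quoted_value(line, "name") {
            in_target = name == package;
        } else if let Some(version) = quoted_value(line, "version") {
            if in_target {
                versions.push(version.to_string());
            }
        }
    }
    versions
}

/// Parse a line of the form `key = "value"` and return `value`.
fn quoted_value<'a>(line: &'a str, key: &str) -> Option<&'a str> {
    let rest = line.strip_prefix(key)?.trim_start();
    let rest = rest.strip_prefix('=')?.trim();
    rest.strip_prefix('"')?.strip_suffix('"')
}
```

A test built on this could assert that exactly two entries are returned, one starting with `0.20.` and one with `0.26.`, so that an accidental third copy of tree-sitter is caught.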

🤖 Generated with Claude Code

aphoristicartist and others added 2 commits March 20, 2026 09:02
Add AST-based symbol extraction for three infrastructure formats:

- **Puppet** (.pp): Classes, defined types, resources, nodes, functions
- **YAML** (.yaml/.yml): Mapping keys (covers Helm, K8s, Ansible, Spinnaker)
- **Dockerfile**: FROM stages, ARG/ENV definitions, labels

This enables the embed command to process infrastructure-as-code files
with full symbol extraction, chunking, and streaming support.

Technical notes:
- tree-sitter-dockerfile 0.2 depends on tree-sitter 0.20; bypassed its
  Rust wrapper via extern "C" + tree-sitter-language bridge to maintain
  compatibility with the project's tree-sitter 0.26
- Dockerfile detection uses filename matching (no extension) alongside
  the standard extension-based detection
- All 1555+ existing tests pass with zero failures

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@codecov

codecov bot commented Mar 20, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

@github-actions
Contributor

Benchmark Results

                        time:   [8.9651 µs 8.9711 µs 8.9779 µs]
                        thrpt:  [1.1139 Melem/s 1.1147 Melem/s 1.1154 Melem/s]
                        time:   [41.438 µs 41.459 µs 41.479 µs]
                        thrpt:  [241.08 Kelem/s 241.20 Kelem/s 241.32 Kelem/s]
                        time:   [19.464 µs 19.551 µs 19.672 µs]
                        thrpt:  [2.5417 Melem/s 2.5574 Melem/s 2.5689 Melem/s]
                        time:   [61.117 µs 61.307 µs 61.475 µs]
                        thrpt:  [813.34 Kelem/s 815.57 Kelem/s 818.10 Kelem/s]
                        time:   [64.152 µs 64.240 µs 64.310 µs]
                        thrpt:  [3.1099 Melem/s 3.1133 Melem/s 3.1176 Melem/s]
                        time:   [133.24 µs 133.60 µs 134.02 µs]
                        thrpt:  [1.4923 Melem/s 1.4970 Melem/s 1.5011 Melem/s]
file_reading/sequential time:   [95.702 µs 95.823 µs 95.994 µs]
                        time:   [55.442 µs 55.872 µs 56.396 µs]
                        time:   [94.307 µs 94.614 µs 95.126 µs]
                        time:   [94.969 µs 95.072 µs 95.181 µs]
                        time:   [129.48 µs 129.53 µs 129.59 µs]
                        time:   [297.04 ps 297.16 ps 297.28 ps]
                        time:   [296.98 ps 298.38 ps 300.46 ps]
                        time:   [296.84 ps 297.07 ps 297.46 ps]

Benchmarks run on Ubuntu runner. See artifacts for full results.

@aphoristicartist aphoristicartist merged commit 46c5685 into main Mar 20, 2026
10 of 13 checks passed