Releases: Enigmatisms/tachyon
[beta-0.2] Optimizations for tachyon evolve and codebase cleanup
Major robustness and reliability improvements for the evolve automated kernel optimization workflow, plus codebase cleanup and new developer tooling.
Changes
Codebase cleanup
Removed tachyon analyze CLI command, LiteLLM backend, rule-based analyzer (nvrules), and unused models (evidence, opt_tree) along with their tests. Project standardizes on profile + evolve as the two primary workflows.
Evolve robustness
- `read_source_file` returns a `raw_text` field — LLMs can copy-paste exact content as `old_content`, eliminating the #1 cause of edit failures
- Diff safety scanner detects removed `__syncthreads`, shared memory size changes, and removed bounds guards before compilation
- Crash classification maps exit codes to human-readable diagnostics
- Compile-error filtering reduces token waste from cascading compiler output
- Fuzzy-match fallback for `edit_source_file` when exact match fails
- `iteration_doomed` flag and benchmark-fix window prevent wasted turns
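The diff safety scan above can be illustrated with a small regex-based check. This is a minimal sketch: the pattern list, heuristics, and function name are assumptions, not tachyon's actual implementation.

```python
import re

# Hypothetical patterns whose disappearance from an edit is suspicious.
# These are illustrative heuristics, not tachyon's real rules.
DANGEROUS_REMOVALS = [
    (re.compile(r"__syncthreads\s*\("), "removed __syncthreads() barrier"),
    (re.compile(r"__shared__\s+\w+\s+\w+\s*\[[^\]]*\]"), "changed shared memory declaration"),
    (re.compile(r"if\s*\(\s*\w+\s*<\s*\w+\s*\)"), "removed bounds guard"),
]

def scan_edit(old_content: str, new_content: str) -> list[str]:
    """Flag risky constructs present in old_content but missing after the edit."""
    warnings = []
    for pattern, message in DANGEROUS_REMOVALS:
        if len(pattern.findall(new_content)) < len(pattern.findall(old_content)):
            warnings.append(message)
    return warnings
```

Running the scan before compilation lets the agent reject an edit that silently drops a synchronization barrier or a bounds check, instead of discovering the breakage at benchmark time.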
Skill-based knowledge injection
New SkillRegistry scans .md files with YAML frontmatter and injects domain-specific CUDA optimization knowledge into evolve prompts. Three built-in skills: memory-bound, compute-bound, and general strategies.
The skill base is easy to extend; in the future we plan to build a CUDA kernel skill library for LLMs with smart skill selection.
`--debug-timer`
Per-phase timing (edit / compile / benchmark / profile) for evolve iterations to identify bottlenecks during long optimization sessions.
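One possible shape for such a per-phase timer is a context-manager accumulator; this is a hypothetical sketch, not the actual `--debug-timer` code.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

class PhaseTimer:
    """Accumulate wall-clock time per named phase across iterations (sketch)."""

    def __init__(self):
        self.totals = defaultdict(float)

    @contextmanager
    def phase(self, name: str):
        start = time.perf_counter()
        try:
            yield
        finally:
            self.totals[name] += time.perf_counter() - start

    def report(self) -> str:
        return " | ".join(f"{k}: {v:.2f}s" for k, v in self.totals.items())
```

Usage would wrap each stage of an iteration, e.g. `with timer.phase("compile"): ...`, then print `timer.report()` at the end of a session to see where time went.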
NCU code aggregator
`read_source_file` falls back to reading non-line-mapped code sections from the `.ncu-rep` file when local source files are unavailable.
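The fallback order could look like the sketch below; here `report_sources` stands in for source text already extracted from a `.ncu-rep` file, and the actual extraction is not shown.

```python
from pathlib import Path

def read_source(path: str, report_sources: dict[str, str]) -> str:
    """Prefer the on-disk file; fall back to source embedded in the NCU report.

    report_sources is an assumed pre-built mapping of file path to source
    text recovered from the .ncu-rep file, not a real tachyon structure.
    """
    local = Path(path)
    if local.is_file():
        return local.read_text()
    try:
        return report_sources[path]
    except KeyError:
        raise FileNotFoundError(f"{path} not on disk and not in the NCU report")
```

The design choice here is to treat the report as a read-only mirror: edits still require the real file, but analysis can proceed from the report alone.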
Terminal UX & docs
Live progress display, urgency indicators, structured logging system (utils/log.py), updated docs, and new docs/logging-guide.md.
Test Plan
- 1,172 tests pass (up from 1,125 on master)
- New test suites: safety scanner, crash classification, fuzzy match, skills, evolve timer, evolve session
- Removed tests for deleted modules
Major Changes
- `tachyon analyze` removed — use `tachyon profile`
- `LiteLLMBackend` removed — use `AnthropicBackend`
Tachyon preview beta release
AI empowered CUDA kernel profiler
tachyon /ˈtakēˌän/ (tachyon — a theoretical particle that travels faster than light) is a CUDA kernel performance analysis and self-evolving optimization toolkit. It combines LLM-powered agents with traditional rule-based analysis to bridge the full path from NCU metrics to CUDA source code to low-level instructions (PTX/SASS), so performance analysis doesn't stop at aggregate counters — it traces back through the instruction level all the way to your source lines. Beyond analysis, the evolve mode automates the optimization loop: an agent reads profiling data, edits source code, rebuilds, re-profiles, and iterates until convergence — no manual tuning required.
Fully integrated in Python. Supports end-to-end profiling (run an executable directly after tachyon, like ncu), interactive AI-driven analysis, and fully automated iterative optimization. Multiple LLM agent vendors are supported.
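The evolve loop described above can be sketched as follows; every function name here (`profile`, `propose_edit`, `build`, `benchmark`) is a placeholder for the corresponding stage, not tachyon's API.

```python
def evolve(kernel, profile, propose_edit, build, benchmark,
           max_iters=10, min_gain=0.01):
    """Iterate profile -> edit -> rebuild -> re-benchmark until convergence.

    A conceptual sketch of the loop's shape: keep a candidate only if it
    beats the best time by at least min_gain, stop when gains dry up.
    """
    best_time = benchmark(kernel)
    for _ in range(max_iters):
        metrics = profile(kernel)                # read profiling data
        candidate = propose_edit(kernel, metrics)  # agent edits the source
        if not build(candidate):
            continue  # compile failure: keep the current kernel
        t = benchmark(candidate)
        if t < best_time * (1 - min_gain):
            kernel, best_time = candidate, t     # accept the improvement
        else:
            break  # converged: no meaningful gain this iteration
    return kernel, best_time
```

The convergence test (a relative `min_gain` threshold) is one plausible stopping rule; the real loop may use different criteria.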