Releases: Enigmatisms/tachyon
[beta-0.2] Optimizations for tachyon evolve and codebase cleanup
Major robustness and reliability improvements for the evolve automated kernel optimization workflow, plus codebase cleanup and new developer tooling.
Changes
Codebase cleanup
Removed tachyon analyze CLI command, LiteLLM backend, rule-based analyzer (nvrules), and unused models (evidence, opt_tree) along with their tests. Project standardizes on profile + evolve as the two primary workflows.
Evolve robustness
- `read_source_file` returns a `raw_text` field — LLMs can copy-paste exact content as `old_content`, eliminating the #1 cause of edit failures
- Diff safety scanner detects removed `__syncthreads`, shared memory size changes, and removed bounds guards before compilation
- Crash classification maps exit codes to human-readable diagnostics
- Compile-error filtering reduces token waste from cascading compiler output
- Fuzzy-match fallback for `edit_source_file` when exact match fails
- `iteration_doomed` flag and benchmark-fix window prevent wasted turns
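The diff safety scan above can be illustrated with a small regex-based check. This is a minimal sketch: the pattern list, heuristics, and function name are assumptions, not tachyon's actual implementation.

```python
import re

# Hypothetical patterns whose disappearance from an edit is suspicious.
# These are illustrative heuristics, not tachyon's real rules.
DANGEROUS_REMOVALS = [
    (re.compile(r"__syncthreads\s*\("), "removed __syncthreads() barrier"),
    (re.compile(r"__shared__\s+\w+\s+\w+\s*\[[^\]]*\]"), "changed shared memory declaration"),
    (re.compile(r"if\s*\(\s*\w+\s*<\s*\w+\s*\)"), "removed bounds guard"),
]

def scan_edit(old_content: str, new_content: str) -> list[str]:
    """Flag risky constructs present in old_content but missing after the edit."""
    warnings = []
    for pattern, message in DANGEROUS_REMOVALS:
        if len(pattern.findall(new_content)) < len(pattern.findall(old_content)):
            warnings.append(message)
    return warnings
```

Running the scan before compilation lets the agent reject an edit that silently drops a synchronization barrier or a bounds check, instead of discovering the breakage at benchmark time.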
Skill-based knowledge injection
New SkillRegistry scans .md files with YAML frontmatter and injects domain-specific CUDA optimization knowledge into evolve prompts. Three built-in skills: memory-bound, compute-bound, and general strategies.
The skill base is easy to extend; in the future we plan to build a CUDA kernel skill library for LLMs with smart skill selection.
`--debug-timer`
Per-phase timing (edit / compile / benchmark / profile) for evolve iterations to identify bottlenecks during long optimization sessions.
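One possible shape for such a per-phase timer is a context-manager accumulator; this is a hypothetical sketch, not the actual `--debug-timer` code.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

class PhaseTimer:
    """Accumulate wall-clock time per named phase across iterations (sketch)."""

    def __init__(self):
        self.totals = defaultdict(float)

    @contextmanager
    def phase(self, name: str):
        start = time.perf_counter()
        try:
            yield
        finally:
            self.totals[name] += time.perf_counter() - start

    def report(self) -> str:
        return " | ".join(f"{k}: {v:.2f}s" for k, v in self.totals.items())
```

Usage would wrap each stage of an iteration, e.g. `with timer.phase("compile"): ...`, then print `timer.report()` at the end of a session to see where time went.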
NCU code aggregator
`read_source_file` falls back to reading non-line-mapped code sections from the `.ncu-rep` file when local source files are unavailable.
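The fallback order could look like the sketch below; here `report_sources` stands in for source text already extracted from a `.ncu-rep` file, and the actual extraction is not shown.

```python
from pathlib import Path

def read_source(path: str, report_sources: dict[str, str]) -> str:
    """Prefer the on-disk file; fall back to source embedded in the NCU report.

    report_sources is an assumed pre-built mapping of file path to source
    text recovered from the .ncu-rep file, not a real tachyon structure.
    """
    local = Path(path)
    if local.is_file():
        return local.read_text()
    try:
        return report_sources[path]
    except KeyError:
        raise FileNotFoundError(f"{path} not on disk and not in the NCU report")
```

The design choice here is to treat the report as a read-only mirror: edits still require the real file, but analysis can proceed from the report alone.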
Terminal UX & docs
Live progress display, urgency indicators, structured logging system (utils/log.py), updated docs, and new docs/logging-guide.md.
Test Plan
- 1,172 tests pass (up from 1,125 on master)
- New test suites: safety scanner, crash classification, fuzzy match, skills, evolve timer, evolve session
- Removed tests for deleted modules
Major Changes
- `tachyon analyze` removed — use `tachyon profile`
- `LiteLLMBackend` removed — use `AnthropicBackend`
Tachyon preview beta release
AI empowered CUDA kernel profiler
tachyon /ˈtakēˌän/ (tachyon — a theoretical particle that travels faster than light) is a CUDA kernel performance analysis and self-evolving optimization toolkit. It combines LLM-powered agents with traditional rule-based analysis to bridge the full path from NCU metrics to CUDA source code to low-level instructions (PTX/SASS), so performance analysis doesn't stop at aggregate counters — it traces back through the instruction level all the way to your source lines. Beyond analysis, the evolve mode automates the optimization loop: an agent reads profiling data, edits source code, rebuilds, re-profiles, and iterates until convergence — no manual tuning required.
Fully integrated in Python. Supports end-to-end profiling (run an executable directly after tachyon, like ncu), interactive AI-driven analysis, and fully automated iterative optimization. Multiple LLM agent vendors are supported.
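The evolve loop described above can be sketched as follows; every function name here (`profile`, `propose_edit`, `build`, `benchmark`) is a placeholder for the corresponding stage, not tachyon's API.

```python
def evolve(kernel, profile, propose_edit, build, benchmark,
           max_iters=10, min_gain=0.01):
    """Iterate profile -> edit -> rebuild -> re-benchmark until convergence.

    A conceptual sketch of the loop's shape: keep a candidate only if it
    beats the best time by at least min_gain, stop when gains dry up.
    """
    best_time = benchmark(kernel)
    for _ in range(max_iters):
        metrics = profile(kernel)                # read profiling data
        candidate = propose_edit(kernel, metrics)  # agent edits the source
        if not build(candidate):
            continue  # compile failure: keep the current kernel
        t = benchmark(candidate)
        if t < best_time * (1 - min_gain):
            kernel, best_time = candidate, t     # accept the improvement
        else:
            break  # converged: no meaningful gain this iteration
    return kernel, best_time
```

The convergence test (a relative `min_gain` threshold) is one plausible stopping rule; the real loop may use different criteria.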