
Releases: Enigmatisms/tachyon

[beta-0.2] Optimizations for tachyon evolve and code-base cleanup

29 Mar 08:27
95e2224


Major robustness and reliability improvements for the evolve automated kernel optimization workflow, plus codebase cleanup and new developer tooling.

Changes

Codebase cleanup
Removed the tachyon analyze CLI command, the LiteLLM backend, the rule-based analyzer (nvrules), and unused models (evidence, opt_tree), along with their tests. The project now standardizes on profile + evolve as the two primary workflows.

Evolve robustness

  • read_source_file returns a raw_text field — LLMs can copy-paste exact content as old_content, eliminating the #1 cause of edit failures
  • Diff safety scanner detects removed __syncthreads, shared memory size changes, and removed bounds guards before compilation
  • Crash classification maps exit codes to human-readable diagnostics
  • Compile-error filtering reduces token waste from cascading compiler output
  • Fuzzy-match fallback for edit_source_file when exact match fails
  • iteration_doomed flag and benchmark-fix window prevent wasted turns
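The diff safety scanner described above can be illustrated with a minimal sketch. The heuristics and function names below are assumptions for illustration, not tachyon's actual implementation:

```python
import re

def scan_for_hazards(old_src: str, new_src: str) -> list[str]:
    """Flag edits that commonly break CUDA kernels, before compiling.

    Simplified sketch; the real scanner's checks and names are not shown
    here, so everything below is illustrative.
    """
    hazards = []

    # Removed synchronization barriers can introduce data races.
    if old_src.count("__syncthreads()") > new_src.count("__syncthreads()"):
        hazards.append("removed __syncthreads()")

    # Changed shared-memory declarations can alter occupancy or overflow.
    shared = r"__shared__\s+\w+\s+\w+\s*\[([^\]]*)\]"
    if re.findall(shared, old_src) != re.findall(shared, new_src):
        hazards.append("shared memory declaration changed")

    # Dropped bounds guards (e.g. `if (idx < n)`) can cause OOB access.
    guard = r"if\s*\(\s*\w+\s*<\s*\w+\s*\)"
    if len(re.findall(guard, old_src)) > len(re.findall(guard, new_src)):
        hazards.append("removed bounds guard")

    return hazards
```

Running such checks on the textual diff before invoking nvcc lets the agent reject an obviously unsafe edit without spending a compile/benchmark cycle on it.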

Skill-based knowledge injection
New SkillRegistry scans .md files with YAML frontmatter and injects domain-specific CUDA optimization knowledge into evolve prompts. Three built-in skills: memory-bound, compute-bound, and general strategies.

The skill base is easy to extend; in the future we plan to build a CUDA kernel skill library for LLMs with smart skill selection.

--debug-timer
Per-phase timing (edit / compile / benchmark / profile) for evolve iterations to identify bottlenecks during long optimization sessions.
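Per-phase timing of this kind can be sketched with a small accumulator; the class and method names here are illustrative, not the actual --debug-timer internals:

```python
import time
from contextlib import contextmanager

class PhaseTimer:
    """Accumulate wall-clock time per named phase of an evolve iteration."""

    def __init__(self):
        self.totals: dict[str, float] = {}

    @contextmanager
    def phase(self, name: str):
        # Time the enclosed block and add it to the phase's running total.
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed = time.perf_counter() - start
            self.totals[name] = self.totals.get(name, 0.0) + elapsed

    def report(self) -> str:
        return " | ".join(f"{k}: {v:.2f}s" for k, v in self.totals.items())
```

Usage would look like `with timer.phase("compile"): build()`, with `timer.report()` printed at the end of each iteration to show where the wall-clock time went.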

NCU code aggregator
read_source_file falls back to reading non-line-mapped code sections from the .ncu-rep file when local source files are unavailable.
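The fallback logic can be sketched roughly as below. The `ncu_sections` mapping stands in for code sections already extracted from the .ncu-rep report, keyed by file name; its shape is an assumption, not tachyon's actual API:

```python
from pathlib import Path

def read_source(path: str, ncu_sections: dict[str, str]) -> str:
    """Return kernel source, preferring the local file on disk.

    Illustrative sketch of the fallback chain only.
    """
    p = Path(path)
    if p.is_file():
        return p.read_text()
    # Local file missing: fall back to the copy embedded in the NCU report.
    if p.name in ncu_sections:
        return ncu_sections[p.name]
    raise FileNotFoundError(f"{path} not found locally or in the NCU report")
```

This keeps the agent productive when profiling a binary whose sources live on another machine: the report's embedded source, while not line-mapped, is still enough context to reason about the kernel.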

Terminal UX & docs
Live progress display, urgency indicators, structured logging system (utils/log.py), updated docs, and new docs/logging-guide.md.

Test Plan

  • 1,172 tests pass (up from 1,125 on master)
  • New test suites: safety scanner, crash classification, fuzzy match, skills, evolve timer, evolve session
  • Removed tests for deleted modules

Major Changes

  • tachyon analyze removed — use tachyon profile
  • LiteLLMBackend removed — use AnthropicBackend

Tachyon preview beta release

23 Mar 06:22
79b7b19


AI empowered CUDA kernel profiler

tachyon /ˈtakēˌän/ (tachyon — a theoretical particle that travels faster than light) is a CUDA kernel performance analysis and self-evolving optimization toolkit. It combines LLM-powered agents with traditional rule-based analysis to bridge the full path from NCU metrics to CUDA source code to low-level instructions (PTX/SASS), so performance analysis doesn't stop at aggregate counters — it traces back through the instruction level all the way to your source lines. Beyond analysis, the evolve mode automates the optimization loop: an agent reads profiling data, edits source code, rebuilds, re-profiles, and iterates until convergence — no manual tuning required.

Written entirely in Python. Supports end-to-end profiling (run an executable directly after tachyon, as with ncu), interactive AI-driven analysis, and fully automated iterative optimization. Multiple LLM vendors are supported.