A collection of memory efficient attention operators implemented in the Triton language.
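The "memory efficient" part refers to the FlashAttention-style tiling that these Triton kernels implement: keys and values are processed block by block with a running softmax max and normalizer, so the full N×N score matrix is never materialized. A minimal NumPy sketch of that online-softmax recurrence (an illustration of the idea, not code from any listed repository):

```python
import numpy as np

def flash_attention(q, k, v, block=2):
    """O(N)-memory attention for one head; q, k, v are (N, d) arrays.

    Illustrative stand-in for what the Triton kernels do on-GPU:
    iterate over key/value blocks, keeping a running row-wise max `m`
    and normalizer `l` so only one (N, block) score tile exists at a time.
    """
    n, d = q.shape
    scale = 1.0 / np.sqrt(d)
    out = np.zeros_like(q)
    m = np.full(n, -np.inf)          # running row-wise max of scores
    l = np.zeros(n)                  # running softmax normalizer
    for start in range(0, n, block):
        kb = k[start:start + block]  # (B, d) block of keys
        vb = v[start:start + block]  # (B, d) block of values
        s = (q @ kb.T) * scale       # (N, B) scores for this block only
        m_new = np.maximum(m, s.max(axis=1))
        p = np.exp(s - m_new[:, None])
        correction = np.exp(m - m_new)   # rescale old sums to the new max
        l = l * correction + p.sum(axis=1)
        out = out * correction[:, None] + p @ vb
        m = m_new
    return out / l[:, None]
```

The output matches ordinary softmax attention exactly; only the memory footprint differs.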
Triton implementation of FlashAttention-2 with custom mask support.
Triton implementation of bi-directional (non-causal) linear attention.
ViT inference in Triton, because why not?
A "standard library" of Triton kernels.
Educational resource demonstrating common GPU programming pitfalls and solutions using Triton kernels.
LAMB go brrr
🧠️🖥️2️⃣️0️⃣️0️⃣️1️⃣️💾️📜️ The sourceCode:Triton category for AI2001, containing Triton programming language datasets
Experimental Rust DSL for writing GPU kernels that compile through the Triton compiler — no Python required.
FlashAttention implementations using CUDA and Triton
A collection of high-performance CUDA implementations, ranging from naive to highly optimized versions.
💥 Optimize linear attention models with efficient Triton-based implementations in PyTorch, compatible across NVIDIA, AMD, and Intel platforms.
Writing TensorRT plugins using Triton and Python
A container of various PyTorch neural network modules written in Triton.
🌳️🌐️#️⃣️ The Bliss Browser Triton (ClosedAI) language support module, allowing Triton (ClosedAI) programs to be written in and run within the browser.
Fast GoLU activation in Triton.
Triton implementation for FISTA (Experimental)
The Chinese localized edition of Triton Puzzles.
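Several repositories above optimize linear attention. The trick those kernels exploit is that replacing the softmax with a positive feature map φ lets the products be re-associated, reducing cost from O(N²d) to O(Nd²). A NumPy sketch of the non-causal form (an illustration of the technique, not code from any listed repository; the elu+1 feature map is one common choice):

```python
import numpy as np

def linear_attention(q, k, v):
    """Non-causal linear attention for one head; q, k, v are (N, d) arrays.

    Computes phi(q) @ (phi(k).T @ v) instead of (phi(q) @ phi(k).T) @ v:
    the (d, d) key-value summary replaces the (N, N) attention matrix.
    """
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))  # elu(x) + 1, positive
    kv = phi(k).T @ v                     # (d, d) summary of keys and values
    z = phi(k).sum(axis=0)                # (d,) accumulated normalizer terms
    return (phi(q) @ kv) / (phi(q) @ z)[:, None]
```

By associativity this is exactly equal to the explicit O(N²) form with attention weights φ(q)φ(k)ᵀ, row-normalized.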