An LLM serving engine and custom CUDA kernels built from scratch, documenting every step along the way.
- Flash Attention 1 (done)
  - between 2 and 4x faster than PyTorch naive MHA
- Brought up GPT-2
- Flash Attention 2
- Flash Attention 3
- KV Cache
- Paged Attention
- Tensor Parallelism
- MoE support
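The "between 2 and 4x faster than PyTorch naive MHA" figure above is presumably measured against an unfused baseline that materializes the full attention matrix. A rough NumPy sketch of such a naive MHA (illustrative only; the actual benchmark baseline may differ):

```python
import numpy as np

def naive_mha(q, k, v):
    """Unfused multi-head attention: softmax(QK^T / sqrt(d)) V.

    q, k, v: (batch, heads, seq, head_dim). This materializes the full
    (seq, seq) score matrix, which is exactly what fused FMHA kernels avoid.
    """
    d = q.shape[-1]
    scores = q @ k.transpose(0, 1, 3, 2) / np.sqrt(d)   # (b, h, s, s)
    scores -= scores.max(axis=-1, keepdims=True)        # numerically stable softmax
    probs = np.exp(scores)
    probs /= probs.sum(axis=-1, keepdims=True)
    return probs @ v                                    # (b, h, s, d)

# Tiny smoke test with the repo's supported head dim of 64
rng = np.random.default_rng(0)
q = rng.standard_normal((1, 2, 8, 64))
k = rng.standard_normal((1, 2, 8, 64))
v = rng.standard_normal((1, 2, 8, 64))
print(naive_mha(q, k, v).shape)  # (1, 2, 8, 64)
```

A fused kernel produces the same result without ever writing the (seq, seq) score matrix to global memory, which is where most of the speedup comes from.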
Current limitations:
- No FP16, FP8, or FP4 support (fp32 only)
- Head dim must equal 64
- Can only handle a batch size of 1 (unless all sequences in the batch are evenly sized)
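These limits could be guarded at the Python boundary before dispatching to a kernel; a hypothetical helper (`check_fmha_inputs` and its signature are illustrative, not CobraML2's actual API):

```python
def check_fmha_inputs(seq_lens, head_dim, dtype="float32"):
    """Hypothetical pre-flight validation mirroring the limitations above.

    seq_lens: per-sequence lengths in the batch. Illustrative only; the
    real CobraML2 entry points may validate differently.
    """
    if dtype != "float32":
        raise ValueError("only fp32 is supported (no FP16/FP8/FP4 yet)")
    if head_dim != 64:
        raise ValueError("head dim must equal 64")
    if len(seq_lens) > 1 and len(set(seq_lens)) != 1:
        raise ValueError("batch > 1 requires all sequences to be evenly sized")

check_fmha_inputs([512], 64)        # single sequence: ok
check_fmha_inputs([512, 512], 64)   # evenly sized batch: ok
try:
    check_fmha_inputs([512], 128)
except ValueError as e:
    print(e)  # head dim must equal 64
```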
Requirements:
- Python >= 3.10
- CUDA toolkit with nvcc
- An NVIDIA GPU with compute capability >= 8.0 (Ampere+)

So far, all code has only been tested on systems with CUDA >= 12.8 and Ubuntu 22.04.
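A quick sanity check for the first two requirements, using only the standard library (this sketch checks the interpreter version and that nvcc is on PATH; it cannot verify GPU compute capability):

```python
import shutil
import sys

def check_environment(version_info=sys.version_info, which=shutil.which):
    """Return a list of missing prerequisites (parameters injected for testability)."""
    problems = []
    if version_info < (3, 10):
        problems.append("Python >= 3.10 required")
    if which("nvcc") is None:
        problems.append("CUDA toolkit with nvcc not found on PATH")
    return problems

for problem in check_environment():
    print(problem)
```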
```bash
git clone https://github.com/govindansriram/CobraML2.git
cd CobraML2
sudo chmod +x ./runner.sh
```

Install torch for your CUDA version, then build cobraml:
```bash
python3 -m venv .venv
source .venv/bin/activate
pip install torch --index-url https://download.pytorch.org/whl/cu130
pip install --no-build-isolation -e ".[dev]"
```

Replace `cu130` with your CUDA version (`cu124`, `cu126`, `cu128`, etc.); check with `nvcc --version`.
The C++ build uses CMake to locate PyTorch headers from the .venv. You don't need to build the full Python package first, but torch must be installed in the venv.
```bash
# Build and run a specific test
./runner.sh -r test_fmha_cc

# Run all tests
./runner.sh -a

# Run with benchmarking enabled
./runner.sh -c -b -r test_fmha_cc

# Filter specific test cases
./runner.sh -r test_fmha_cc -- --gtest_filter=*causal*
```

The Python tests require the Python package to be built first.
```bash
# Run all tests
pytest

# Run with benchmarking
pytest --benchmark

# Filter specific test cases
pytest -k "test_fmha_fp32[4-512-16-64-True]"
```

The runner.sh script is the main entry point for building, testing, profiling, and formatting the project.
| Flag | Description |
|---|---|
| `-h, --help` | Show help message |
| `-c, --clean` | Clean build (removes build directory) |
| `-b, --benchmark` | Enable benchmarking |
| `-t, --target <name>` | Build specific target |
| `-r, --run <name>` | Build and run specific target |
| `-a, --run-all` | Build and run all tests via ctest |
| `-f, --format [file]` | Format all files, or a specific file |
| `-p, --profile <name>` | Build and profile target with ncu |
| `-o, --output <name>` | Custom name for `.ncu-rep` file |
| `--profile-opts <opts>` | Additional ncu options |
| `--no-tests` | Disable building tests |
| `--` | Pass remaining args to executable |
```bash
# Build everything
./runner.sh

# Clean build
./runner.sh -c

# Build specific target
./runner.sh -t test_fmha_cc

# Build and run a test
./runner.sh -r test_fmha_cc

# Run all tests
./runner.sh -a

# Run with gtest filter
./runner.sh -r test_fmha_cc -- --gtest_filter=*Perf*

# Clean build with benchmarking enabled
./runner.sh -c -b -r test_fmha_cc

# Profile a kernel with ncu
./runner.sh -p test_fmha_cc

# Profile with custom output name
./runner.sh -p test_fmha_cc -o my_profile

# Profile specific kernel
./runner.sh -p test_fmha_cc --profile-opts '--kernel-name fmha'
```

All C++ files must be formatted with clang-format:
```bash
./runner.sh -f
./runner.sh -f include/cobraml2/kernels/fmha_cc.cuh
```

Python code is checked and formatted with ruff:

```bash
ruff check python/
ruff format python/
```
