This repository collects small, focused CUDA example programs and helper scripts used for learning and benchmarking. Each subdirectory contains a single example (source, README, and helper scripts).
- Prerequisites
- Quick Build
- How to run examples
- Profiling
- Repository layout and links
- CLI conventions
- CI / GitHub Actions
- Contributing
- Linux (Ubuntu recommended for scripts in this repo)
- NVIDIA CUDA toolkit (nvcc) installed and on PATH for local builds
- make and standard build tools (gcc, g++, make)
- nvprof (or your preferred NVIDIA profiler) if you want to profile; profiling scripts in each directory call nvprof by default
If you plan to use the included GitHub Actions workflow, the workflow builds inside an NVIDIA CUDA Docker image so you don't need CUDA installed locally for CI builds.
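To confirm that the local tools listed above are on PATH before building, a small check along these lines may help. This is only a sketch: the function name missing_tools is hypothetical and not part of this repository.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repository):
# print the names of the given tools that are NOT found on PATH.
missing_tools() {
  missing=""
  for tool in "$@"; do
    # command -v succeeds only if the tool can be resolved
    command -v "$tool" >/dev/null 2>&1 || missing="${missing:+$missing }$tool"
  done
  # One line of space-separated names; an empty line means nothing is missing
  printf '%s\n' "$missing"
}
```

Calling `missing_tools nvcc make gcc g++ nvprof` prints whichever of those names could not be found. nvprof is only needed for profiling, so its absence does not block a build.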
From the project root run:

    make -j$(nproc)

This will run make in every subdirectory that provides a Makefile and build the example binaries.
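The effect of the root build can be approximated by a loop over the subdirectories. The sketch below is an assumption about the behavior described here, not the repository's actual top-level Makefile; the function names list_buildable and build_all are hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch (the real top-level Makefile may work differently).

# Print each immediate subdirectory of the given root that contains a Makefile.
list_buildable() {
  root="${1:-.}"
  for dir in "$root"/*/; do
    if [ -f "${dir}Makefile" ]; then
      printf '%s\n' "${dir%/}"
    fi
  done
}

# Run make in every buildable subdirectory, stopping at the first failure.
build_all() {
  list_buildable "${1:-.}" | while IFS= read -r dir; do
    echo "building $dir"
    make -C "$dir" </dev/null || exit 1
  done
}
```

Running `list_buildable .` from the project root shows which examples would be built, which can be handy when one directory fails and you want to build the rest individually with `make -C <dir>`.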
Each example directory contains a run.sh helper script and a README with example invocations. Most binaries accept an explicit --help flag that prints usage.
Example:

    cd vector_addition
    ./vectAdd --mode 0 --n 1024 --threads 128 --granularity 1

Note: binaries accept flags only (no positional fallback). If a directory provides a run.sh, the script maps its own convenience arguments to the program flags.
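As an illustration of how a run.sh might map script arguments to program flags, here is a minimal sketch. The positional order (element count first, thread count second), the default values, and the function name build_cmd are assumptions made for this example; the actual scripts in each directory may differ.

```shell
#!/bin/sh
# Hypothetical run.sh logic (assumed argument order: N, then THREADS).
# Prints the flag-style command line instead of running it (a dry run).
build_cmd() {
  n="${1:-1024}"        # assumed default, taken from the example invocation
  threads="${2:-128}"   # assumed default, taken from the example invocation
  printf '%s\n' "./vectAdd --mode 0 --n $n --threads $threads --granularity 1"
}
```

For instance, `build_cmd 2048 256` prints `./vectAdd --mode 0 --n 2048 --threads 256 --granularity 1`. A real script would execute the binary rather than print the command.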
Per-directory profiling scripts are provided and named profile_nvprof.sh. They call nvprof and save profiler outputs. Example usage (from a subdirectory):

    ./profile_nvprof.sh --n 4096 --threads 256

If you do not have nvprof, install the CUDA toolkit, or run the GitHub Actions CI which builds the project inside a CUDA container.
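A profiling wrapper of this kind can be quite small. The sketch below is not the contents of profile_nvprof.sh; it assumes the profiler output is saved with nvprof's `--log-file` option, and the function names profile_cmd and run_profile are hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch of a profiling wrapper (not the repository's script).

# Print the nvprof command line for a given log file and program invocation.
profile_cmd() {
  log="$1"; shift
  printf 'nvprof --log-file %s' "$log"
  for arg in "$@"; do
    printf ' %s' "$arg"
  done
  printf '\n'
}

# Run the program under nvprof, failing early if nvprof is not on PATH.
run_profile() {
  if ! command -v nvprof >/dev/null 2>&1; then
    echo "nvprof not found on PATH" >&2
    return 1
  fi
  log="$1"; shift
  nvprof --log-file "$log" "$@"
}
```

For example, `profile_cmd out.log ./vectAdd --n 4096 --threads 256` shows the command that `run_profile` would execute with the same arguments.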
Click the folders below for the example README files and more details:
- Vector Addition: vector add example
- Error Handling: examples showing CUDA error handling
- Device Specification: device query and capability examples
- Image Manipulation: image processing examples (blur, grayscale); includes stb helper headers
- Matrix-Vector Multiplication: matrix-vector multiplication example
- Matrix Multiplication: matrix multiplication example
- Convolution: convolution examples (1D & 2D)
Each folder includes a README.md with per-example instructions.
- All example binaries use a flag-style CLI (e.g., --n 1024, --threads 128).
- Centralized CLI helpers live in common/cli_utils.h and are used across examples for consistent parsing and validation.
A GitHub Actions workflow is included at .github/workflows/ci.yml. The workflow builds the project inside an NVIDIA CUDA Docker image and uploads artifacts. It runs on push and pull_request to main/dev.
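To approximate the CI build locally, the project can be built inside a CUDA container with Docker. The sketch below only assembles the command. The helper name docker_build_cmd and the /src mount point are assumptions, and the image name has to come from you (for example, the one named in ci.yml), since it is not stated here.

```shell
#!/bin/sh
# Hypothetical helper: print a docker command that mounts the current
# directory at /src (assumed mount point) and runs make inside the image.
# The printed line is for inspection; paths containing spaces need quoting.
docker_build_cmd() {
  image="$1"
  printf '%s\n' "docker run --rm -v $PWD:/src -w /src $image make"
}
```

With CUDA_IMAGE set to the image used by the workflow, `docker_build_cmd "$CUDA_IMAGE"` prints a command you can review and then run from the project root. Compiling with nvcc does not require a GPU, so no GPU flags are passed in this sketch.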
- Make changes in a feature branch, run make, and add tests or smoke-tests if appropriate.
- Open a PR with a clear description and small, focused commits.