GitHub - shaohao99/cuda_examples: CUDA programs for matrix multiplication, prefix sum of an array, 2D convolution, vector addition on multiple GPUs, and a convolutional layer with CUDNN implementation.

This directory shows some CUDA examples.

mat_mul: Matrix multiplication examples. mat_mul.cu: Compute matrix multiplication using global memory. mat_mul_tile.cu: Compute matrix multiplication using shared memory. The matrix is divided into small tiles and each tile is saved in shared memory, in order to gain large bandwidth.
prefix_sum: Compute prefix sum of an array. Use shared memory. The program Works for arbitrary array size (e.g. the size can be larger than a block size).
conv2d: Compute 2D convolution. The filter is saved in read-only constant memory to gain large bandwidth.
vec_add_multi_gpus: Compute vector addition using multiple GPUs on onde node of a computer cluster. There is no data communication between GPUs in vector addition. The number of GPUs on the node is automatically detected and all GPUs on the node are used. OpenMP is used for CPU multithreading. Each CPU thread assigns a CUDA kernel to one GPU.
conv_cudnn: Build a typical convolution layer in CNN. The CUDNN library is used to compute the convolution and activation processes. The OpenCV library is used to load, normalize and save images.

Provide feedback

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
conv2d		conv2d
conv_cudnn		conv_cudnn
mat_mul		mat_mul
prefix_sum		prefix_sum
vec_add_multi_gpus		vec_add_multi_gpus
README.md		README.md