NikhilRout/TheGEMMCoreProject

TheGEMMCoreProject

(Deprecated) SystemVerilog implementation of Nvidia's SIMT CUDA, Hybrid-Precision Tensor Core, and Google's Systolic Array TPU MXU GEMM Operations.

Note: Although these modules perform the same "operations", they by no means emulate the actual microarchitecture that executes CUDA Core/Tensor Core/MXU instructions. Think of this as an introductory educational repo for FP arithmetic digital design. You could, however, use these modules as a quick way to prototype, say, an FPU in your FPGA design.

If you're interested in going deeper, I'd highly recommend checking out my work on the DRL Floating Point RTL backend for the Vortex GPGPU's Tensor Core Unit (TCU) extension, which is a much better researched, more optimized, and more realistic microarchitecture implementation.

Tensor Core Versions

TensorCore v0: Volta Architecture [FP16MUL FP32ADD]

Volta Tensor Core Architecture Diagram
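
To make the FP16MUL FP32ADD label concrete, here is a minimal Python sketch of D = A × B + C with inputs rounded to FP16 and an FP32 accumulator. It is a behavioral illustration only: it is not the repo's SystemVerilog and not a bit-exact model of Volta hardware, and the helper names (`to_fp16`, `to_fp32`, `gemm_fp16mul_fp32add`) are hypothetical.

```python
import struct


def to_fp16(x: float) -> float:
    """Round to the nearest IEEE 754 binary16 value.
    struct raises OverflowError if x is too large for FP16."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    """Round to the nearest IEEE 754 binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def gemm_fp16mul_fp32add(a, b, c):
    """D = A x B + C: FP16 multiplier inputs, FP32 accumulation.
    Assumption: the accumulator is rounded to FP32 after every add."""
    m, k, n = len(a), len(b), len(b[0])
    d = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = to_fp32(c[i][j])
            for p in range(k):
                # The product of two FP16 values has at most 22 significant
                # bits, so it is exactly representable in FP32.
                prod = to_fp16(a[i][p]) * to_fp16(b[p][j])
                acc = to_fp32(acc + prod)
            d[i][j] = acc
    return d
```

For example, `gemm_fp16mul_fp32add([[2048.0, 1.0]], [[1.0], [1.0]], [[0.0]])` gives `[[2049.0]]`. FP16 cannot represent 2049 (`to_fp16(2049.0)` rounds to 2048), which is the usual reason for accumulating in a wider format than the multiplier inputs.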

TensorCore v1: Ampere Architecture [TF32MUL FP32ADD / BF16MUL FP32ADD] + Fine-Grained Structured Sparsity

Ampere Tensor Core Architecture Diagram
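
As a rough illustration of the formats named above: TF32 keeps FP32's 8-bit exponent with a 10-bit mantissa, and BF16 keeps it with a 7-bit mantissa. NVIDIA's fine-grained structured sparsity on Ampere is a 2:4 pattern (at most two nonzeros in every group of four). The Python sketch below models both ideas behaviorally. The function names are hypothetical, the magnitude-based pruning rule is an assumption chosen for illustration, and none of this is taken from the repo's RTL.

```python
import struct


def round_fp32_mantissa(x: float, keep_bits: int) -> float:
    """Round a finite value to FP32, then keep only `keep_bits` mantissa
    bits using round-to-nearest-even. NaN inputs are not handled here."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    drop = 23 - keep_bits                    # FP32 stores 23 mantissa bits
    lsb = (bits >> drop) & 1                 # lowest bit that survives
    bits += (1 << (drop - 1)) - 1 + lsb      # bias so that ties go to even
    bits &= ~((1 << drop) - 1) & 0xFFFFFFFF  # clear the dropped bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def to_tf32(x: float) -> float:
    return round_fp32_mantissa(x, 10)        # 1 sign, 8 exponent, 10 mantissa


def to_bf16(x: float) -> float:
    return round_fp32_mantissa(x, 7)         # 1 sign, 8 exponent, 7 mantissa


def prune_2_of_4(row):
    """Zero all but the two largest-magnitude values in each group of four."""
    if len(row) % 4 != 0:
        raise ValueError("row length must be a multiple of 4")
    out = []
    for g in range(0, len(row), 4):
        group = row[g:g + 4]
        order = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)
        keep = set(order[:2])
        out.extend(v if i in keep else 0.0 for i, v in enumerate(group))
    return out
```

For instance, `to_bf16(1.0 + 2**-10)` collapses to `1.0` while `to_tf32` preserves the same value, and `prune_2_of_4([1.0, -3.0, 0.5, 2.0])` returns `[0.0, -3.0, 0.0, 2.0]`.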

TensorCore v2: Hopper Architecture [FP8(E5M2/E4M3)MUL FP16ADD]

Hopper Tensor Core Architecture Diagram
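
The two FP8 encodings differ in more than field widths. E5M2 (bias 15) follows IEEE conventions for infinities and NaNs, while E4M3 (bias 7) has no infinities and reserves only the all-ones pattern for NaN, which lets it reach a maximum of 448. The Python sketch below decodes both and performs a dot product with FP16 accumulation, following the FP8MUL FP16ADD label above. It is a behavioral illustration under those format definitions, with hypothetical function names, and is not a model of the repo's RTL or of Hopper hardware.

```python
import math
import struct


def decode_fp8(byte: int, exp_bits: int, man_bits: int, bias: int,
               ieee_specials: bool) -> float:
    """Decode one FP8 byte laid out as sign | exponent | mantissa."""
    sign = -1.0 if (byte >> 7) & 1 else 1.0
    exp = (byte >> man_bits) & ((1 << exp_bits) - 1)
    man = byte & ((1 << man_bits) - 1)
    exp_max = (1 << exp_bits) - 1
    if ieee_specials and exp == exp_max:           # E5M2: inf and NaN
        return sign * math.inf if man == 0 else math.nan
    if not ieee_specials and exp == exp_max and man == (1 << man_bits) - 1:
        return math.nan                            # E4M3: single NaN pattern
    if exp == 0:                                   # zero and subnormals
        return sign * man * 2.0 ** (1 - bias - man_bits)
    return sign * (1 + man / (1 << man_bits)) * 2.0 ** (exp - bias)


def decode_e5m2(byte: int) -> float:
    return decode_fp8(byte, 5, 2, 15, True)


def decode_e4m3(byte: int) -> float:
    return decode_fp8(byte, 4, 3, 7, False)


def to_fp16(x: float) -> float:
    """Round to FP16; values beyond the FP16 range saturate to infinity."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        return math.copysign(math.inf, x)


def dot_fp8mul_fp16add(xs, ys, decode) -> float:
    """Multiply decoded FP8 pairs and accumulate, rounding to FP16 per add."""
    acc = 0.0
    for x, y in zip(xs, ys):
        acc = to_fp16(acc + decode(x) * decode(y))
    return acc
```

As a quick check, `decode_e4m3(0x7E)` is 448.0 and `decode_e5m2(0x7B)` is 57344.0, the largest finite values of each format. `dot_fp8mul_fp16add([0x38, 0x40], [0x38, 0x40], decode_e4m3)` computes 1·1 + 2·2 = 5.0.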
