(Deprecated) SystemVerilog implementation of Nvidia's SIMT CUDA, Hybrid-Precision Tensor Core, and Google's Systolic Array TPU MXU GEMM Operations.
Note: Although these modules perform the same "operations", they by no means emulate the actual microarchitectures that execute CUDA Core/Tensor Core/MXU instructions. Think of this as an introductory, educational repo for floating-point arithmetic digital design. You could, however, use these modules as a quick alternative when, say, prototyping an FPU in your FPGA design (see the sketch below).
If you're interested in going deeper, I'd highly recommend checking out my work on the DRL Floating Point RTL backend of the Vortex GPGPU's Tensor Core Unit (TCU) extension, which is a significantly more researched, optimized, and realistic microarchitecture implementation.
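As a rough illustration of the kind of floating-point GEMM building block this repo deals with, here is a minimal behavioral sketch of an FP32 multiply-accumulate (the inner loop of a GEMM). The module name and ports are hypothetical and not taken from this repo, and it relies on simulation-only system functions (`$bitstoshortreal` / `$shortrealtobits`), so it models the arithmetic rather than a synthesizable FPU datapath:

```systemverilog
// Hypothetical behavioral FP32 multiply-accumulate: acc += a * b.
// Simulation-only sketch; not a synthesizable FPU implementation.
module fp32_mac_behavioral (
    input  logic        clk,
    input  logic        rst_n,
    input  logic        valid_in,   // assert when a and b carry a new product term
    input  logic [31:0] a,          // IEEE-754 single-precision operand
    input  logic [31:0] b,          // IEEE-754 single-precision operand
    output logic [31:0] acc         // running accumulation, IEEE-754 single precision
);
    shortreal acc_r;

    always_ff @(posedge clk or negedge rst_n) begin
        if (!rst_n)
            acc_r <= 0.0;
        else if (valid_in)
            acc_r <= acc_r + $bitstoshortreal(a) * $bitstoshortreal(b);
    end

    assign acc = $shortrealtobits(acc_r);
endmodule
```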




