TinyTorch is a lightweight deep learning training framework implemented from scratch in C++.
For more details, please refer to my blog post: *Write a nn training framework from scratch*
- PyTorch-Style API: Naming conventions similar to PyTorch's (`Tensor`, `Functions`, `nn.Module`, `Optimizer`); a usage sketch follows this list.
- Pure C++ Implementation: No dependency on external deep learning libraries.
- CPU & CUDA Support: Runs on both CPU and CUDA-enabled GPUs.
- Mixed Precision: Supports FP16, FP32, BF16.
- Distributed: Multi-machine, multi-GPU training & inference.
- LLM Inference: Supports inference for Llama/Qwen/Mistral models: https://github.com/keith2018/TinyGPT
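To give a taste of the API style, here is a minimal, hypothetical training-loop sketch. It only illustrates the PyTorch-like surface: the header name, the `tinytorch` namespace, and every signature below are assumptions modeled on PyTorch, not TinyTorch's verified interface.

```cpp
// Hypothetical sketch of PyTorch-style usage. Header, namespace, and all
// signatures are assumptions modeled on PyTorch, not the verified API.
#include "TinyTorch.h"  // assumed umbrella header

using namespace tinytorch;  // assumed namespace

int main() {
  nn::Linear fc(4, 2);                           // a single linear module
  optim::SGD opt(fc.parameters(), /*lr=*/0.1f);  // optimizer over its params

  Tensor x = Tensor::randn({8, 4});       // batch of 8 random inputs
  Tensor target = Tensor::randn({8, 2});  // matching random targets

  for (int step = 0; step < 100; step++) {
    Tensor pred = fc(x);                      // forward pass
    Tensor loss = nn::mseLoss(pred, target);  // loss from the module list
    opt.zeroGrad();                           // clear stale gradients
    loss.backward();                          // reverse-mode autograd
    opt.step();                               // gradient descent update
  }
  return 0;
}
```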
Implemented components, grouped by category:

| Category | Components |
|---|---|
| Activation | `relu`, `gelu`, `silu`, `softmax`, `logSoftmax` |
| Math | `add`, `sub`, `mul`, `div`, `matmul`, `sin`, `cos`, `sqrt`, `pow`, `maximum`, `minimum` |
| Comparison & logic | `lt`, `le`, `gt`, `ge`, `eq`, `ne`, `logicNot`, `logicAnd`, `logicOr` |
| Reduce | `min`, `argmin`, `max`, `argmax`, `sum`, `mean`, `var` |
| Transform | `reshape`, `view`, `permute`, `transpose`, `flatten`, `unflatten`, `squeeze`, `unsqueeze`, `split`, `concat`, `stack`, `hstack`, `vstack`, `narrow`, `topk`, `sort`, `cumsum`, `gather`, `scatter` |
| NN modules | `linear`, `dropout`, `maxPool2d`, `conv2d`, `embedding`, `layerNorm`, `rmsNorm`, `sdpAttention`, `mseLoss`, `nllLoss` |
| Optimizers | `SGD`, `Adagrad`, `RMSprop`, `AdaDelta`, `Adam`, `AdamW` |
| Data | `Dataset`, `DataLoader`, `data.Transform` |
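To make the tables concrete, a short hypothetical fragment combining ops from several categories (same assumed API as the sketch above; free-function vs. method spelling and the shape syntax are guesses):

```cpp
// Hypothetical combination of the listed ops; spelling and the shape
// API are assumptions, not the verified interface.
Tensor a = Tensor::randn({2, 3});  // random 2x3 tensor
Tensor b = a.transpose(0, 1);      // transform: shape becomes {3, 2}
Tensor c = matmul(a, b);           // math: {2, 3} x {3, 2} -> {2, 2}
Tensor d = relu(c);                // activation, element-wise
Tensor s = d.sum();                // reduce: scalar total
```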
TinyTorch's automatic differentiation (AD) is implemented by building a computation graph. Each operation on a `Tensor` is represented by a `Function` object, which is responsible for both the forward and backward passes. The `Function` nodes are connected via a `nextFunctions` field, forming the dependency graph. During the `backward()` call, the framework traverses this graph in reverse order, computing and propagating gradients using the chain rule.
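The mechanism is easiest to see at toy scale. Below is a self-contained, compilable sketch of the same idea for scalar values: each op records a `Function` node whose `nextFunctions` links form the graph, and `backward()` walks it applying the chain rule. This is not TinyTorch's actual code; apart from the `nextFunctions`/`backward()` terminology, every type and name is invented for illustration.

```cpp
// Toy, self-contained sketch of graph-based reverse-mode AD. This is NOT
// TinyTorch's actual code: everything except the `nextFunctions` and
// `backward()` terminology is invented here for illustration.
#include <cstdio>
#include <functional>
#include <memory>
#include <vector>

struct Function;

// Scalar stand-in for Tensor: data, accumulated gradient, and a pointer
// to the Function that produced it (null for leaf values).
struct Value {
  float data = 0.f;
  float grad = 0.f;
  std::shared_ptr<Function> gradFn;
};
using ValuePtr = std::shared_ptr<Value>;

// One node of the computation graph. `nextFunctions` links to the producers
// of this node's inputs, forming the dependency graph described above.
struct Function {
  std::vector<ValuePtr> inputs;
  std::vector<std::shared_ptr<Function>> nextFunctions;
  std::function<void(float)> backwardStep;  // pushes dL/dout to the inputs
};

ValuePtr makeLeaf(float x) {
  auto v = std::make_shared<Value>();
  v->data = x;
  return v;
}

// Wires up the graph node for a binary op.
ValuePtr makeOp(const ValuePtr& a, const ValuePtr& b, float out,
                std::function<void(float)> backwardStep) {
  auto v = std::make_shared<Value>();
  v->data = out;
  auto fn = std::make_shared<Function>();
  fn->inputs = {a, b};
  if (a->gradFn) fn->nextFunctions.push_back(a->gradFn);
  if (b->gradFn) fn->nextFunctions.push_back(b->gradFn);
  fn->backwardStep = std::move(backwardStep);
  v->gradFn = fn;
  return v;
}

// Forward computes the result; the recorded closure applies the chain rule.
ValuePtr mul(const ValuePtr& a, const ValuePtr& b) {
  return makeOp(a, b, a->data * b->data, [a, b](float g) {
    a->grad += g * b->data;  // d(a*b)/da = b
    b->grad += g * a->data;  // d(a*b)/db = a
  });
}

ValuePtr add(const ValuePtr& a, const ValuePtr& b) {
  return makeOp(a, b, a->data + b->data, [a, b](float g) {
    a->grad += g;  // d(a+b)/da = 1
    b->grad += g;  // d(a+b)/db = 1
  });
}

// Walks the graph from the output, propagating gradients. This simple
// recursion assumes a tree-shaped graph; a real framework visits nodes in
// reverse topological order so shared subgraphs are handled correctly.
void backward(const ValuePtr& out) {
  out->grad = 1.f;
  std::function<void(const ValuePtr&)> visit = [&](const ValuePtr& v) {
    if (!v->gradFn) return;
    v->gradFn->backwardStep(v->grad);
    for (const auto& in : v->gradFn->inputs) visit(in);
  };
  visit(out);
}

int main() {
  auto x = makeLeaf(2.f), w = makeLeaf(3.f), b = makeLeaf(1.f);
  auto y = add(mul(w, x), b);  // y = w*x + b = 7
  backward(y);
  std::printf("y=%g dy/dw=%g dy/dx=%g dy/db=%g\n",
              y->data, w->grad, x->grad, b->grad);  // 7 2 3 1
  return 0;
}
```

Compiled standalone (e.g. `g++ -std=c++17 sketch.cpp`), this prints `y=7 dy/dw=2 dy/dx=3 dy/db=1`, matching the analytic derivatives of y = w*x + b.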
- CMake
- A compiler with C++17 support (or newer)
- CUDA Toolkit 11.0+ (optional)
Build:

```bash
mkdir build
cmake -B ./build -DCMAKE_BUILD_TYPE=Release
cmake --build ./build --config Release
```

Run the demo:

```bash
cd demo/bin
./TinyTorch_demo
```

Run the tests:

```bash
cd build
ctest
```

This code is licensed under the MIT License (see LICENSE).
