tensorax/
├── 📄 README.md # Main project documentation
├── 📄 LICENSE # MIT License
├── 📄 setup.py # Build configuration
├── 📄 pyproject.toml # Python project metadata
├── 📄 MANIFEST.in # Package manifest
├── 📄 requirements.txt # Runtime dependencies (pybind11)
├── 📄 requirements-dev.txt # Development dependencies
├── 📄 .gitignore # Git ignore rules
├── 🔧 build.sh # Quick build script
├── 📄 demo.py # Comprehensive demo script
├── 📄 PROJECT_STRUCTURE.md # This file
├── 📄 REFACTORING_SUMMARY.md # NumPy removal summary
│
├── 📁 csrc/ # C++ and CUDA source code
│ ├── 📄 tensor_ops.h # Operation declarations
│ ├── 📄 tensor_ops.cpp # Python bindings (pybind11)
│ ├── 📁 cpu/ # CPU implementations
│ │ └── 📄 tensor_cpu.cpp # CPU operations (add, mul, matmul, etc.)
│ └── 📁 cuda/ # CUDA implementations
│ ├── 📄 cuda_utils.cuh # CUDA utilities and macros
│ ├── 📄 tensor_cuda.cu # CUDA memory management
│ └── 📁 kernels/ # Optimized CUDA kernels
│ ├── 📄 elementwise.cu # Element-wise ops (add, mul, sqrt, etc.)
│ ├── 📄 reduction.cu # Reduction ops (sum, max, etc.)
│ └── 📄 matmul.cu # Tiled matrix multiplication
│
├── 📁 tensorax/ # Python package
│ ├── 📄 __init__.py # Package initialization
│ ├── 📄 tensor.py # Core Tensor class with autograd
│ ├── 📄 functional.py # Functional API (F.relu, losses, etc.)
│ ├── 📄 optim.py # Optimizers (SGD, Adam)
│ └── 📁 nn/ # Neural network modules
│ ├── 📄 __init__.py
│ ├── 📄 module.py # Base Module class
│ └── 📄 layers.py # Layers (Linear, ReLU, Sequential, etc.)
│
├── 📁 tests/ # Test suite
│ ├── 📄 __init__.py
│ ├── 📄 test_tensor.py # Tensor operation tests
│ ├── 📄 test_nn.py # Neural network tests
│ ├── 📄 test_optim.py # Optimizer tests
│ └── 📄 test_functional.py # Functional API tests
│
├── 📁 examples/ # Usage examples
│ ├── 📄 README.md
│ ├── 📄 basic_operations.py # Basic tensor ops demo
│ ├── 📄 simple_nn.py # Neural network example
│ └── 📄 cuda_example.py # GPU acceleration demo
│
└── 📁 docs/ # Documentation
├── 📄 ARCHITECTURE.md # System architecture details
├── 📄 DEVELOPMENT.md # Development workflow
└── 📄 GUIDE.md # Complete development roadmap
Purpose: High-performance tensor operations with zero PyTorch/NumPy dependency
Key Files:
- tensor_ops.cpp: Python bindings using pybind11, high-level operation wrappers
- tensor_ops.h: Operation declarations and the TensorImpl class
- cuda/kernels/*.cu: Optimized CUDA kernels
- cpu/tensor_cpu.cpp: CPU fallback implementations
Operations Fully Implemented:
- Element-wise: add, subtract, multiply, divide, sqrt
- Matrix: matmul (tiled CUDA algorithm with 448x speedup), transpose
- Activations: ReLU, sigmoid, tanh, softmax
- Losses: MSE, cross-entropy
- Utilities: random normal distribution, device transfers (CPU↔CUDA)
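The tiled matmul mentioned above gets its speed from processing the matrices in small blocks that fit in GPU shared memory. A minimal pure-Python sketch of the blocking idea (illustrative only; the real kernel is the CUDA implementation in csrc/cuda/kernels/matmul.cu):

```python
def tiled_matmul(A, B, tile=2):
    """Blocked matrix multiply: work on TILE x TILE sub-blocks, the same
    access pattern a CUDA kernel uses to stage tiles in shared memory."""
    n, k = len(A), len(A[0])
    m = len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, tile):            # tile row of C
        for j0 in range(0, m, tile):        # tile column of C
            for k0 in range(0, k, tile):    # slide tiles along shared dim
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        for kk in range(k0, min(k0 + tile, k)):
                            C[i][j] += A[i][kk] * B[kk][j]
    return C

print(tiled_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19.0, 22.0], [43.0, 50.0]]
```

The result is identical to a naive triple loop; on a GPU the blocking pays off because each tile of A and B is loaded from global memory once and reused `tile` times.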
Purpose: User-friendly PyTorch-like interface with automatic differentiation
Key Classes:
Tensor: Core multi-dimensional array with full autograd support
- Operations: +, -, *, /, @ (matmul)
- Properties: .T (transpose), .shape, .device, .requires_grad
- Methods: .backward(), .zero_grad(), .sqrt(), .cuda(), .cpu()
- Factory methods: .zeros(), .ones(), .full(), .randn()
Module: Base class for neural network layers
- Parameter management
- Device transfer support
Optimizer: Base class for optimization algorithms
- Parameter updates with gradient descent
Neural Network Modules (nn/):
- Linear: Fully connected layer with Xavier initialization
- ReLU/Sigmoid/Tanh: Activation layers
- Sequential: Layer container accepting a list or varargs
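Xavier initialization keeps activation variance roughly constant across layers by scaling the weight range to the layer's fan-in and fan-out. A sketch of one common variant (Glorot uniform; the exact variant Linear uses is not specified here):

```python
import math
import random

def xavier_uniform(fan_in, fan_out):
    """Glorot/Xavier uniform init: sample W uniformly from
    [-sqrt(6/(fan_in+fan_out)), +sqrt(6/(fan_in+fan_out))]."""
    bound = math.sqrt(6.0 / (fan_in + fan_out))
    return [[random.uniform(-bound, bound) for _ in range(fan_out)]
            for _ in range(fan_in)]

W = xavier_uniform(128, 64)   # weight matrix for a 128 -> 64 Linear layer
```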
Optimizers (optim.py):
- SGD: Stochastic Gradient Descent with momentum support
- Adam: Adaptive moment estimation with bias correction
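Adam's bias correction compensates for the zero-initialized moment estimates, which would otherwise bias early updates toward zero. A minimal single-parameter sketch of the standard update rule (names b1, b2, eps are illustrative, not tensorax's API):

```python
def adam_step(param, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: EMAs of the gradient and squared gradient,
    plus bias correction for step count t (1-indexed)."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad * grad
    m_hat = m / (1 - b1 ** t)          # bias-corrected first moment
    v_hat = v / (1 - b2 ** t)          # bias-corrected second moment
    param -= lr * m_hat / (v_hat ** 0.5 + eps)
    return param, m, v

# On step 1 the corrections cancel exactly, so the step size is ~lr:
p, m, v = adam_step(1.0, 0.5, 0.0, 0.0, t=1)  # p ≈ 0.999
```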
Purpose: Stateless operations for functional programming style
Functions:
- Activations: relu(), sigmoid(), tanh(), softmax()
- Losses: mse_loss(), cross_entropy_loss()
- Operations: linear()
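For reference on the semantics these functions follow, a numerically stable softmax and the cross-entropy it induces can be sketched in plain Python (the standard formulas, not the library's implementation):

```python
import math

def softmax(xs):
    """Numerically stable softmax: subtract the max before exponentiating
    so large logits cannot overflow exp()."""
    mx = max(xs)
    exps = [math.exp(x - mx) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def cross_entropy(logits, target):
    """Negative log-likelihood of the target class under softmax(logits)."""
    return -math.log(softmax(logits)[target])

probs = softmax([1.0, 2.0, 3.0])   # sums to 1, largest logit wins
```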
Complete backpropagation system supporting:
- All arithmetic operations (+, -, *, /)
- Matrix operations (matmul, transpose)
- Activation functions (ReLU, sigmoid, tanh)
- Loss functions (MSE)
- Gradient accumulation and parameter updates
Gradient flow tracked through computational graph with proper chain rule application.
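The chain-rule mechanics described above can be illustrated with a tiny scalar autograd sketch (a conceptual model of reverse-mode differentiation, not tensorax's actual Tensor class):

```python
class Scalar:
    """Minimal reverse-mode autograd node in the spirit of Tensor.backward()."""
    def __init__(self, value, parents=(), grad_fns=()):
        self.value = value
        self.grad = 0.0
        self._parents = parents      # nodes this value was computed from
        self._grad_fns = grad_fns    # local derivative w.r.t. each parent

    def __mul__(self, other):
        return Scalar(self.value * other.value,
                      (self, other),
                      (lambda g: g * other.value,   # d(xy)/dx = y
                       lambda g: g * self.value))   # d(xy)/dy = x

    def __add__(self, other):
        return Scalar(self.value + other.value,
                      (self, other),
                      (lambda g: g, lambda g: g))   # d(x+y)/dx = 1

    def backward(self, grad=1.0):
        self.grad += grad            # accumulate, as with repeated use
        for parent, fn in zip(self._parents, self._grad_fns):
            parent.backward(fn(grad))

x, y = Scalar(2.0), Scalar(3.0)
z = x * y + x        # z = xy + x, so dz/dx = y + 1 = 4, dz/dy = x = 2
z.backward()
print(x.grad, y.grad)  # 4.0 2.0
```

Note that x appears in the graph twice, so its gradient accumulates across both paths; that is the same accumulation behavior .zero_grad() exists to reset.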
Test Coverage:
- Tensor operations and device transfers
- Gradient computation and backpropagation
- Layer functionality
- Optimizer behavior (SGD with momentum, Adam)
- Functional API
- ARCHITECTURE.md: System design, memory management, kernel design
- DEVELOPMENT.md: Build process, testing, debugging
- GUIDE.md: Complete development roadmap
- Python 3.8+
- C++17 compiler (g++ or clang++)
- CUDA Toolkit 11.0+ (optional, for GPU support)
- pybind11 (automatically installed)
From PyPI:

```bash
pip install tensorax
```

From Source:

```bash
# Clone repository
git clone https://github.com/NotShrirang/tensorax.git
cd tensorax

# Quick build (automatically detects CUDA)
bash build.sh

# Install in development mode
pip install -e .
```

Run the demo and tests:

```bash
# Comprehensive demonstration of all features
python demo.py

# Run all tests
pytest tests/ -v

# Run with coverage
pytest tests/ --cov=tensorax --cov-report=html
```

Run the examples:

```bash
# Basic operations
python examples/basic_operations.py

# Neural network
python examples/simple_nn.py

# CUDA demo (requires GPU)
python examples/cuda_example.py
```

- Project structure and build system
- Basic tensor operations (CPU/CUDA)
- Tensor class with device management
- Module system for neural networks
- Common layers (Linear, activations)
- Optimizers (SGD, Adam)
- Test infrastructure
- Documentation and examples
- Complete autograd system
- Convolution and pooling layers
- More reduction operations
- Batch normalization
- Learning rate schedulers
- Model serialization
- Multi-GPU support
- Mixed precision training
Runtime:
- Python 3.8+
- pybind11 (no NumPy or PyTorch dependency)
Build:
- C++17 compiler
- pybind11
- CUDA Toolkit 11.0+ (optional)
Build configuration:
- setup.py: Main build script
- pyproject.toml: Modern Python packaging
- MANIFEST.in: Files to include in distribution
Automatically detected via CUDA_HOME environment variable.
Falls back to CPU-only if CUDA not available.
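A sketch of how such detection commonly works in a setup script (assumed logic for illustration, not necessarily what setup.py does):

```python
import os
import shutil

def find_cuda_home():
    """Locate the CUDA toolkit: trust CUDA_HOME if it points at a real
    directory, otherwise look for nvcc on PATH; None means CPU-only."""
    cuda_home = os.environ.get("CUDA_HOME")
    if cuda_home and os.path.isdir(cuda_home):
        return cuda_home
    nvcc = shutil.which("nvcc")
    if nvcc:
        # .../cuda/bin/nvcc -> .../cuda
        return os.path.dirname(os.path.dirname(nvcc))
    return None  # build falls back to a CPU-only extension

print("CUDA toolkit:", find_cuda_home() or "not found (CPU-only build)")
```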
- CUDA C Programming Guide
- CUDA Best Practices Guide
- Nsight Compute/Systems profiling tools
- PyTorch source code (reference implementation)
- CS231n course (Stanford)
- Deep Learning book (Goodfellow et al.)
- PyTorch
- TinyGrad
- JAX
See docs/DEVELOPMENT.md for:
- Development environment setup
- Code style guidelines
- Testing procedures
- Pull request process
- Use tiled algorithms for matrix ops
- Maximize shared memory usage
- Ensure coalesced memory access
- Profile with Nsight Compute
- Consider kernel fusion
- Minimize Python/C++ boundary crossings
- Batch operations when possible
- Keep hot numeric loops in C++ rather than Python
- Consider Cython for critical paths
- Check CUDA_HOME is set correctly
- Verify C++ compiler supports C++17
- Ensure pybind11 is installed
- Check CUDA availability with cuda_is_available()
- Verify tensor devices match for operations
- Check array shapes for broadcasting
- CUDA memory leaks: Check cudaFree calls
- CPU memory: Let Python GC handle it
- Use smaller batch sizes if OOM
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Documentation: docs/ directory
- Examples: examples/ directory
Next Steps: Follow the development roadmap in docs/GUIDE.md to implement remaining features!