diff --git a/.github/classroom/autograding.json b/.github/classroom/autograding.json new file mode 100644 index 00000000..fb177874 --- /dev/null +++ b/.github/classroom/autograding.json @@ -0,0 +1,34 @@ +{ + "tests": [ + { + "name": "Task 3.1 - CPU Parallel Operations", + "setup": "pip install -e .", + "run": "python -m pytest -m task3_1 --tb=no -q", + "input": "", + "output": "", + "comparison": "included", + "timeout": 10, + "points": 25 + }, + { + "name": "Task 3.2 - CPU Matrix Multiplication", + "setup": "", + "run": "python -m pytest -m task3_2 --tb=no -q", + "input": "", + "output": "", + "comparison": "included", + "timeout": 10, + "points": 25 + }, + { + "name": "Style Check", + "setup": "", + "run": "python -m ruff check . && python -m pyright", + "input": "", + "output": "", + "comparison": "included", + "timeout": 10, + "points": 10 + } + ] +} \ No newline at end of file diff --git a/.github/workflows/classroom.yaml b/.github/workflows/classroom.yaml new file mode 100644 index 00000000..2853c181 --- /dev/null +++ b/.github/workflows/classroom.yaml @@ -0,0 +1,16 @@ +name: GitHub Classroom Workflow + +on: [push] + +permissions: + checks: write + actions: read + contents: read + +jobs: + build: + name: Autograding + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v2 + - uses: education/autograding@v1 \ No newline at end of file diff --git a/README.md b/README.md index e82a8886..b589efab 100644 --- a/README.md +++ b/README.md @@ -1,32 +1,164 @@ -# MiniTorch Module 3 +# MiniTorch Module 3 - Parallel and GPU Acceleration -* Docs: https://minitorch.github.io/ +**Documentation:** https://minitorch.github.io/ -* Overview: https://minitorch.github.io/module3.html +**Overview (Required reading):** https://minitorch.github.io/module3.html +## Overview -You will need to modify `tensor_functions.py` slightly in this assignment. +Module 3 focuses on **optimizing tensor operations** through parallel computing and GPU acceleration. 
You'll implement CPU parallel operations using Numba and GPU kernels using CUDA, achieving dramatic performance improvements over the sequential tensor backend from Module 2. -* Tests: +### Key Learning Goals +- **CPU Parallelization**: Implement parallel tensor operations with Numba +- **GPU Programming**: Write CUDA kernels for tensor operations +- **Performance Optimization**: Achieve significant speedup through hardware acceleration +- **Matrix Multiplication**: Optimize the most computationally intensive operations +- **Backend Architecture**: Build multiple computational backends for flexible performance +## Tasks Overview + +| Task | Description | +|---------|-------------| +| **3.1** | CPU Parallel Operations (`fast_ops.py`) | +| **3.2** | CPU Matrix Multiplication (`fast_ops.py`) | +| **3.3** | GPU Operations (`cuda_ops.py`) | +| **3.4** | GPU Matrix Multiplication (`cuda_ops.py`) | +| **3.5** | Performance Evaluation (`run_fast_tensor.py`) | + +## Documentation + +- **[Installation Guide](installation.md)** - Setup instructions including GPU configuration +- **[Testing Guide](testing.md)** - How to run tests locally and handle GPU requirements + +## Quick Start + +### 1. Environment Setup +```bash +# Clone and navigate to your assignment +git clone {{ASSIGNMENT}} +cd {{ASSIGNMENT}} + +# Create virtual environment (recommended) +conda create --name minitorch python +conda activate minitorch + +# Install dependencies +pip install -e ".[dev,extra]" +``` + +### 2. Sync Previous Module Files +```bash +# Sync required files from your Module 2 solution into the current directory +python sync_previous_module.py previous-module-dir . + +# Example: +python sync_previous_module.py ../Module-2 . +``` + +### 3. 
Run Tests +```bash +# CPU tasks (run anywhere) +pytest -m task3_1 # CPU parallel operations +pytest -m task3_2 # CPU matrix multiplication + +# GPU tasks (require CUDA-compatible GPU) +pytest -m task3_3 # GPU operations +pytest -m task3_4 # GPU matrix multiplication + +# Style checks +pre-commit run --all-files +``` + +## GPU Setup + +### Option 1: Google Colab (Recommended) +Most students should use Google Colab for GPU tasks: + +1. Upload assignment files to Colab +2. Change runtime to GPU (Runtime → Change runtime type → GPU) +3. Install packages: + ```python + !pip install -e ".[dev,extra]" + !python -c "import numba.cuda; print('CUDA available:', numba.cuda.is_available())" + ``` + +### Option 2: Local GPU (If you have NVIDIA GPU) +For students with NVIDIA GPUs and CUDA-compatible hardware: + +```bash +# Install CUDA toolkit +# Visit: https://developer.nvidia.com/cuda-downloads + +# Install GPU packages +pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118 +pip install numba[cuda] + +# Verify GPU support +python -c "import numba.cuda; print('CUDA available:', numba.cuda.is_available())" ``` -python run_tests.py + +## Testing Strategy + +### CI/CD (GitHub Actions) +- **Task 3.1**: CPU parallel operations +- **Task 3.2**: CPU matrix multiplication +- **Style Check**: Code quality and formatting + +### GPU Testing (Colab/Local GPU) +- **Task 3.3**: GPU operations (use Colab or local NVIDIA GPU) +- **Task 3.4**: GPU matrix multiplication (use Colab or local NVIDIA GPU) + +### Performance Validation +```bash +# Compare backend performance +python project/run_fast_tensor.py # Optimized backends +python project/run_tensor.py # Basic tensor backend +python project/run_scalar.py # Scalar baseline ``` -* Note: +## Development Tools -Several of the tests for this assignment will only run if you are on a GPU machine and will not -run on github's test infrastructure. 
Please follow the instructions to setup up a colab machine -to run these tests. +### Code Quality +```bash +# Automatic style checking +pre-commit install +git commit -m "your changes" # Runs style checks automatically -This assignment requires the following files from the previous assignments. You can get these by running + +# Manual style checks +ruff check . # Linting +ruff format . # Formatting +pyright . # Type checking +``` +### Debugging ```bash -python sync_previous_module.py previous-module-dir current-module-dir +# Debug Numba JIT issues +NUMBA_DISABLE_JIT=1 pytest -m task3_1 -v + +# Debug CUDA kernels on the CUDA simulator +NUMBA_ENABLE_CUDASIM=1 pytest -m task3_3 -v + +# Monitor GPU usage +nvidia-smi -l 1 # Update every second ``` -The files that will be synced are: +## Implementation Focus + +### Task 3.1 & 3.2 (CPU Optimization) +- Implement `tensor_map`, `tensor_zip`, `tensor_reduce` with Numba parallel loops +- Optimize matrix multiplication with efficient loop ordering +- Focus on cache locality and parallel execution patterns + +### Task 3.3 & 3.4 (GPU Acceleration) +- Write CUDA kernels for element-wise operations +- Implement efficient GPU matrix multiplication with shared memory +- Optimize thread block organization and memory coalescing + +## Important Notes - minitorch/tensor_data.py minitorch/tensor_functions.py minitorch/tensor_ops.py minitorch/operators.py minitorch/scalar.py minitorch/scalar_functions.py minitorch/module.py minitorch/autodiff.py minitorch/module.py project/run_manual.py project/run_scalar.py project/run_tensor.py minitorch/operators.py minitorch/module.py minitorch/autodiff.py minitorch/tensor.py minitorch/datasets.py minitorch/testing.py minitorch/optim.py \ No newline at end of file +- **GPU Limitations**: Tasks 3.3 and 3.4 cannot run in GitHub CI due to hardware requirements +- **GPU Testing**: Use Google Colab (recommended) or local NVIDIA GPU for GPU tasks +- **Performance Critical**: Implementations must show measurable speedup over sequential 
versions +- **Memory Management**: Be careful with GPU memory allocation and deallocation diff --git a/installation.md b/installation.md new file mode 100644 index 00000000..d8924157 --- /dev/null +++ b/installation.md @@ -0,0 +1,142 @@ +# MiniTorch Module 3 Installation + +MiniTorch requires Python 3.8 or higher. To check your version of Python, run: + +```bash +>>> python --version +``` + +We recommend creating a global MiniTorch workspace directory that you will use +for all modules: + +```bash +>>> mkdir workspace; cd workspace +``` + +## Environment Setup + +We highly recommend setting up a *virtual environment*. The virtual environment lets you install packages that are only used for your assignments and do not impact the rest of the system. + +**Option 1: Anaconda (Recommended)** +```bash +>>> conda create --name minitorch python # Run only once +>>> conda activate minitorch +>>> conda install llvmlite # For optimization +``` + +**Option 2: Venv** +```bash +>>> python -m venv venv # Run only once +>>> source venv/bin/activate +``` + +The first line should be run only once, whereas the second needs to be run whenever you open a new terminal to get started for the class. You can tell if it works by checking if your terminal starts with `(minitorch)` or `(venv)`. + +## Getting the Code + +Each assignment is distributed through a Git repo. Once you accept the assignment from GitHub Classroom, a personal repository under Cornell-Tech-ML will be created for you. You can then clone this repository to start working on your assignment. + +```bash +>>> git clone {{ASSIGNMENT}} +>>> cd {{ASSIGNMENT}} +``` + +## Syncing Previous Module Files + +Module 3 requires files from Module 0, Module 1, and Module 2. Sync them using: + +```bash +>>> python sync_previous_module.py previous-module-dir current-module-dir +``` + +Example: +```bash +>>> python sync_previous_module.py ../Module-2 . +``` + +Replace `previous-module-dir` with the path to your Module 2 directory and `current-module-dir` with `.` for the current directory. 
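Under the hood, `sync_previous_module.py` is a list-driven copy: it reads relative paths from `files_to_sync.txt` and copies each one from the source tree into the destination tree. The logic can be sketched as follows (a minimal sketch; `copy_listed_files` is a hypothetical name used here for illustration, and the real script adds argument checking and per-file error reporting):

```python
import os
import shutil


def copy_listed_files(source: str, dest: str, listing: str = "files_to_sync.txt") -> int:
    """Copy each file named in `listing` from `source` to `dest`, preserving relative paths."""
    with open(listing) as f:
        files = [line for line in f.read().splitlines() if line.strip()]

    copied = 0
    for rel_path in files:
        src = os.path.join(source, rel_path)
        if not os.path.exists(src):
            print(f"Warning: '{rel_path}' not found in source, skipping")
            continue
        dst = os.path.join(dest, rel_path)
        # Create intermediate directories; `or "."` guards against bare filenames
        os.makedirs(os.path.dirname(dst) or ".", exist_ok=True)
        shutil.copy(src, dst)
        copied += 1
    return copied
```

The real script prints each copied file and a final count, so you can confirm that every expected file made it across.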
+ +This will copy the following required files: +- `minitorch/tensor_data.py` +- `minitorch/tensor_functions.py` +- `minitorch/tensor_ops.py` +- `minitorch/operators.py` +- `minitorch/scalar.py` +- `minitorch/scalar_functions.py` +- `minitorch/module.py` +- `minitorch/autodiff.py` +- `minitorch/tensor.py` +- `minitorch/datasets.py` +- `minitorch/testing.py` +- `minitorch/optim.py` +- `project/run_manual.py` +- `project/run_scalar.py` +- `project/run_tensor.py` + +## Installation + +Install all packages in your virtual environment: + +```bash +>>> python -m pip install -e ".[dev,extra]" +``` + +## GPU Setup (Required for Tasks 3.3 and 3.4) + +Tasks 3.3 and 3.4 require GPU support and won't run on GitHub CI. + +### Option 1: Google Colab (Recommended) + +Most students should use Google Colab as it provides free GPU access: + +1. Upload your assignment files to Colab +2. Change runtime to GPU (Runtime → Change runtime type → GPU) +3. Install packages in Colab: + ```python + !pip install -e ".[dev,extra]" + !python -c "import numba.cuda; print('CUDA available:', numba.cuda.is_available())" + ``` + +### Option 2: Local GPU Setup (If you have NVIDIA GPU) + +For students with NVIDIA GPUs and CUDA-compatible hardware: + +1. **Install CUDA Toolkit** + ```bash + # Visit: https://developer.nvidia.com/cuda-downloads + # Follow instructions for your OS + ``` + +2. **Verify CUDA Installation** + ```bash + >>> nvcc --version + >>> nvidia-smi + ``` + +3. 
**Install GPU-compatible packages** + ```bash + >>> pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118 + >>> pip install numba[cuda] + ``` + +## Verification + +Make sure everything is installed by running: + +```bash +>>> python -c "import minitorch; print('Success!')" +``` + +Verify that the tensor functionality is available: + +```bash +>>> python -c "from minitorch import tensor; print('Module 3 ready!')" +``` + +Check if CUDA support is available (for GPU tasks): + +```bash +>>> python -c "import numba.cuda; print('CUDA available:', numba.cuda.is_available())" +``` + +You're ready to start Module 3! \ No newline at end of file diff --git a/pyproject.toml b/pyproject.toml index 442ba844..7be5e21d 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -5,17 +5,48 @@ build-backend = "hatchling.build" [project] name = "minitorch" version = "0.5" +description = "A minimal deep learning library for educational purposes" +requires-python = ">=3.8" +dependencies = [ + "colorama==0.4.6", + "hypothesis==6.138.2", + "numba==0.61.2", + "numpy>=1.24,<2.3", + "pytest==8.4.1", + "pytest-env==1.1.5", + "typing_extensions", +] + +[project.optional-dependencies] +dev = [ + "pre-commit==4.3.0", +] +extra = [ + "datasets==2.4.0", + "embeddings==0.0.8", + "networkx==3.5", + "plotly==5.24.1", + "pydot==1.4.1", + "python-mnist", + "streamlit==1.48.1", + "streamlit-ace", + "torch==2.8.0", + "watchdog==1.0.2", + "altair==4.2.2", +] [tool.pyright] include = ["**/minitorch"] -ignore = [ +exclude = [ "**/docs", - "**/docs/module1/**", + "**/docs/module3/**", "**/assignments", "**/project", "**/mt_diagrams", "**/.*", "*chainrule.py*", + "**/minitorch/autodiff.py", + "sync_previous_module.py", ] venvPath = "." 
venv = ".venv" @@ -30,6 +61,7 @@ reportUnknownLambdaType = "none" reportIncompatibleMethodOverride = "none" reportPrivateUsage = "none" reportMissingParameterType = "error" +reportMissingImports = "none" [tool.pytest.ini_options] @@ -61,7 +93,6 @@ markers = [ "task4_4", ] [tool.ruff] - exclude = [ ".git", "__pycache__", @@ -72,10 +103,20 @@ exclude = [ "**/mt_diagrams/*", "**/minitorch/testing.py", "**/docs/**/*", + "minitorch/optim.py", + "minitorch/datasets.py", + "minitorch/scalar.py", + "minitorch/autodiff.py", + "minitorch/module.py", + "minitorch/tensor.py", + "minitorch/tensor_data.py", + "minitorch/tensor_functions.py", + "minitorch/tensor_ops.py", + "sync_previous_module.py", ] +[tool.ruff.lint] ignore = [ - "ANN101", "ANN401", "N801", "E203", @@ -96,7 +137,7 @@ ignore = [ "D107", "D213", "ANN204", - "ANN102", + "D203" ] select = ["D", "E", "F", "N", "ANN"] fixable = [ @@ -147,5 +188,7 @@ fixable = [ ] unfixable = [] -[tool.ruff.extend-per-file-ignores] +[tool.ruff.lint.extend-per-file-ignores] "tests/**/*.py" = ["D"] +"minitorch/scalar_functions.py" = ["ANN001", "ANN201"] +"minitorch/tensor_functions.py" = ["ANN001", "ANN201"] diff --git a/requirements.extra.txt b/requirements.extra.txt deleted file mode 100644 index 070fa1d0..00000000 --- a/requirements.extra.txt +++ /dev/null @@ -1,11 +0,0 @@ -datasets==2.4.0 -embeddings==0.0.8 -plotly==4.14.3 -pydot==1.4.1 -python-mnist -streamlit==1.12.0 -streamlit-ace -torch -watchdog==1.0.2 -altair==4.2.2 -networkx==3.3 diff --git a/requirements.txt b/requirements.txt deleted file mode 100644 index c9cd8a02..00000000 --- a/requirements.txt +++ /dev/null @@ -1,9 +0,0 @@ -colorama==0.4.3 -hypothesis == 6.54 -numba == 0.60 -numpy == 2.0.0 -pre-commit == 2.20.0 -pytest == 8.3.2 -pytest-env -pytest-runner == 5.2 -typing_extensions diff --git a/setup.py b/setup.py deleted file mode 100644 index ff4cfa9f..00000000 --- a/setup.py +++ /dev/null @@ -1,3 +0,0 @@ -from setuptools import setup - -setup(py_modules=[]) diff --git 
a/sync_previous_module.py b/sync_previous_module.py index 9110bf9c..0e1a8bc9 100644 --- a/sync_previous_module.py +++ b/sync_previous_module.py @@ -1,50 +1,72 @@ """ -Description: -Note: Make sure that both the new and old module files are in same directory! +Sync Previous Module Files -This script helps you sync your previous module works with current modules. -It takes 2 arguments, source_dir_name and destination_dir_name. -All the files which will be moved are specified in files_to_sync.txt as newline separated strings +This script helps you sync files from your previous module to the current module. +It copies files specified in 'files_to_sync.txt' from the source directory to the destination directory. -Usage: python sync_previous_module.py +Usage: python sync_previous_module.py <source_dir> <dest_dir> -Ex: python sync_previous_module.py mle-module-0-sauravpanda24 mle-module-1-sauravpanda24 +Examples: + python sync_previous_module.py ./my-awesome-module-2 ./my-awesome-module-3 + python sync_previous_module.py ~/assignments/Module-2-unicorn_ninja ~/assignments/Module-3-unicorn_ninja """ import os import shutil import sys -if len(sys.argv) != 3: - print( - "Invalid argument count! 
Please pass source directory and destination directory after the file name" - ) - sys.exit() +def print_usage(): + """Print usage information and examples.""" + print(__doc__) -# Get the users path to evaluate the username and root directory -current_path = os.getcwd() -grandparent_path = "/".join(current_path.split("/")[:-1]) +def read_files_to_sync(): + """Read the list of files to sync from files_to_sync.txt""" + try: + with open("files_to_sync.txt", "r") as f: + return f.read().splitlines() + except FileNotFoundError: + print("Error: files_to_sync.txt not found!") + sys.exit(1) -print("Looking for modules in : ", grandparent_path) +def sync_files(source, dest, files_to_move): + """Copy files from source to destination directory.""" + if not os.path.exists(source): + print(f"Error: Source directory '{source}' does not exist!") + sys.exit(1) -# List of files which we want to move -f = open("files_to_sync.txt", "r+") -files_to_move = f.read().splitlines() -f.close() + if not os.path.exists(dest): + print(f"Error: Destination directory '{dest}' does not exist!") + sys.exit(1) -# get the source and destination from arguments -source = sys.argv[1] -dest = sys.argv[2] - -# copy the files from source to destination -try: + copied_files = 0 for file in files_to_move: - print(f"Moving file : ", file) - shutil.copy( - os.path.join(grandparent_path, source, file), - os.path.join(grandparent_path, dest, file), - ) - print(f"Finished moving {len(files_to_move)} files") -except Exception as e: - print( - "Something went wrong! 
please check if the source and destination folders are present in same folder" - ) + source_path = os.path.join(source, file) + dest_path = os.path.join(dest, file) + + if not os.path.exists(source_path): + print(f"Warning: File '{file}' not found in source directory, skipping") + continue + + try: + os.makedirs(os.path.dirname(dest_path) or ".", exist_ok=True) + shutil.copy(source_path, dest_path) + print(f"Copied: {file}") + copied_files += 1 + except Exception as e: + print(f"Error copying '{file}': {e}") + + print(f"Finished copying {copied_files} files") + +def main(): + if len(sys.argv) != 3: + print("Error: Invalid number of arguments!") + print_usage() + sys.exit(1) + + source = sys.argv[1] + dest = sys.argv[2] + files_to_move = read_files_to_sync() + + sync_files(source, dest, files_to_move) + +if __name__ == "__main__": + main() diff --git a/testing.md b/testing.md new file mode 100644 index 00000000..5f9ecff1 --- /dev/null +++ b/testing.md @@ -0,0 +1,129 @@ +## Testing Your Implementation + +### Running Tests + +This project uses pytest for testing. 
Tests are organized by task: + +```bash +# Run all tests for a specific task +pytest -m task3_1 # CPU parallel operations +pytest -m task3_2 # CPU matrix multiplication +pytest -m task3_3 # GPU operations (requires CUDA) +pytest -m task3_4 # GPU matrix multiplication (requires CUDA) + +# Run all tests +pytest + +# Run tests with verbose output +pytest -v + +# Run a specific test file +pytest tests/test_tensor_general.py # All optimized tensor tests + +# Run a specific test function +pytest tests/test_tensor_general.py::test_one_args -k "fast" +pytest tests/test_tensor_general.py::test_matrix_multiply +``` + +### GPU Testing Strategy + +**CI Limitations:** +- GitHub Actions CI only runs tasks 3.1 and 3.2 (CPU only) +- Tasks 3.3 and 3.4 require local GPU or Google Colab + +**Option 1: Google Colab Testing (Recommended):** +```python +# In Colab notebook +!pip install -e ".[dev,extra]" +!python -m pytest -m task3_3 -v +!python -m pytest -m task3_4 -v +!python -c "import numba.cuda; print('CUDA available:', numba.cuda.is_available())" +``` + +**Option 2: Local GPU Testing (If you have NVIDIA GPU):** +```bash +# Verify CUDA is available +python -c "import numba.cuda; print('CUDA available:', numba.cuda.is_available())" + +# Test GPU tasks locally +pytest -m task3_3 # GPU operations +pytest -m task3_4 # GPU matrix multiplication + +# Debug GPU issues +NUMBA_ENABLE_CUDASIM=1 pytest -m task3_3 -v # Run kernels on the CUDA simulator for debugging +``` + +### Style and Code Quality Checks + +This project enforces code style and quality using several tools: + +```bash +# Run all pre-commit hooks (recommended) +pre-commit run --all-files + +# Individual style checks: +ruff check . # Linting (style, imports, docstrings) +ruff format . # Code formatting +pyright . 
# Type checking +``` + +### Task 3.5 - Performance Evaluation + +**Training Scripts:** +```bash +# Run optimized training (CPU parallel) +python project/run_fast_tensor.py + +# Compare with previous implementations +python project/run_tensor.py # Basic tensor implementation +python project/run_scalar.py # Scalar implementation +``` + +### Pre-commit Hooks (Automatic Style Checking) + +The project uses pre-commit hooks that run automatically before each commit: + +```bash +# Install pre-commit hooks (one-time setup) +pre-commit install + +# Now style checks run automatically on every commit +git commit -m "your message" # Will run style checks first +``` + +### Debugging Tools + +**Numba Debugging:** +```bash +# Disable JIT compilation for debugging +NUMBA_DISABLE_JIT=1 pytest -m task3_1 -v + +# Enable Numba debugging output +NUMBA_DEBUG=1 python project/run_fast_tensor.py +``` + +**CUDA Debugging:** +```bash +# Check CUDA device properties +python -c "import numba.cuda; print(numba.cuda.gpus)" + +# Monitor GPU memory usage +nvidia-smi -l 1 # Update every second + +# Debug CUDA kernel launches on the CUDA simulator +NUMBA_ENABLE_CUDASIM=1 python -m pytest -m task3_3 -v +``` + +**Performance Profiling:** +```bash +# Time specific operations +python -c " +import time +import minitorch +backend = minitorch.TensorBackend(minitorch.FastOps) +# Time your operations here +" + +# Profile memory usage +python -m memory_profiler project/run_fast_tensor.py +```
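The timing stub above can be fleshed out into a small reusable harness. Here is a minimal sketch (the `bench` helper is a hypothetical name, not part of minitorch; swap the toy workload for calls through your `FastOps`/`CudaOps` backends):

```python
import time


def bench(fn, *args, repeats=5):
    """Call fn(*args) `repeats` times; return (best wall-clock seconds, last result)."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn(*args)
        best = min(best, time.perf_counter() - start)
    return best, result


if __name__ == "__main__":
    # Toy workload: naive pure-Python matmul. Replace with a backend call,
    # e.g. timing a minitorch matrix multiply on FastOps vs. CudaOps.
    def matmul(a, b):
        n, m, p = len(a), len(b), len(b[0])
        return [
            [sum(a[i][k] * b[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)
        ]

    a = [[1.0] * 64 for _ in range(64)]
    seconds, _ = bench(matmul, a, a)
    print(f"best of {5}: {seconds * 1000:.2f} ms")
```

Taking the best of several runs reduces noise from JIT warm-up, which matters for Numba-compiled backends where the first call includes compilation time.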