A from-scratch implementation of a fully-connected neural network for handwritten digit recognition on MNIST, using only NumPy for computation and Tkinter for the drawing GUI. No TensorFlow, PyTorch, or Keras dependencies.
- Input Layer: 784 neurons (28×28 flattened grayscale pixels)
- Hidden Layer 1: 128 neurons (sigmoid activation)
- Hidden Layer 2: 64 neurons (sigmoid activation)
- Output Layer: 10 neurons (sigmoid activation, trained against one-hot encoded targets)
- Loss Function: Mean Squared Error (MSE)
- Optimizer: Stochastic Gradient Descent (SGD) with mini-batches
- Weight Initialization: Xavier/Glorot initialization (zero-mean Gaussian with standard deviation 1/√n_in, i.e. variance 1/n_in)
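A minimal NumPy sketch of this initialization, assuming one weight matrix and one bias vector per layer transition (the actual parameter layout in `network.py` may differ):

```python
import numpy as np

layer_sizes = [784, 128, 64, 10]

# Xavier/Glorot: zero-mean Gaussian with variance 1/n_in (std 1/sqrt(n_in)),
# so activations keep a comparable scale from layer to layer.
weights = [np.random.randn(n_out, n_in) / np.sqrt(n_in)
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
# Biases drawn from a standard normal (a common choice in from-scratch MNIST nets).
biases = [np.random.randn(n_out, 1) for n_out in layer_sizes[1:]]
```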
| Parameter | Value | Rationale |
|---|---|---|
| Batch Size | 32 | Small batches keep memory usage low while still giving reasonably stable gradient estimates |
| Epochs | 30 | Sufficient for convergence on MNIST |
| Learning Rate | 0.25 | Works well with sigmoid activations; smaller slows convergence, larger may be unstable |
mnist-neural-network/
├── data/ # MNIST dataset files
│ ├── train-images.idx3-ubyte # Training images (60,000)
│ ├── train-labels.idx1-ubyte # Training labels
│ ├── t10k-images.idx3-ubyte # Test images (10,000)
│ └── t10k-labels.idx1-ubyte # Test labels
├── main.py # Entry point, CLI, and GUI interface
├── network.py # Neural network implementation
├── mnist_final.npz # Pre-trained model weights
└── README.md # Documentation
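For reference, the pre-trained weights in `mnist_final.npz` are a standard NumPy archive and can be inspected directly; the stored array names depend on how `network.py` saves them:

```python
import numpy as np

data = np.load("mnist_final.npz")
print(data.files)  # list the array names stored in the archive
```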
```bash
pip install numpy pillow                         # install dependencies
python main.py --train                           # train a new model
python main.py --test --model mnist_final.npz    # evaluate the pre-trained model on the test set
python main.py                                   # launch the drawing GUI with the default model
python main.py --model my_model.npz              # launch the GUI with a custom model
```

The GUI provides a 28×28 pixel grid drawing canvas that mirrors the MNIST input format:
- Real-time Drawing: Draw digits directly on the pixelated grid
- Live Prediction: Neural network predictions update as you draw
- Visual Feedback: Each grid cell darkens based on drawing intensity
- Prediction: Recognized digit (0-9)
- Accuracy: Model confidence percentage
- Loss: Mean squared error for current input
- Clear Function: Reset canvas for new digit
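Conceptually, the Prediction and Accuracy readouts come from a forward pass over the drawn 784×1 input. A rough sketch, reusing the `sigmoid` helper shown in the next section (the GUI code may compute these differently):

```python
def predict(x, weights, biases):
    # Forward pass over the 784x1 drawn image, then read off digit and confidence.
    a = x
    for w, b in zip(weights, biases):
        a = sigmoid(w @ a + b)
    digit = int(np.argmax(a))               # recognized digit 0-9
    confidence = float(a[digit]) * 100.0    # output activation reported as a percentage
    return digit, confidence
```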
Sigmoid Activation Function
σ(z) = 1 / (1 + e^(-z))
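A direct NumPy translation, together with the derivative used during the backward pass:

```python
import numpy as np

def sigmoid(z):
    """sigma(z) = 1 / (1 + e^(-z)), applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    """Derivative sigma'(z) = sigma(z) * (1 - sigma(z)), needed for backpropagation."""
    s = sigmoid(z)
    return s * (1.0 - s)
```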
Backpropagation Algorithm
- Forward pass: Compute activations layer by layer
- Backward pass: Compute gradients via chain rule
- Weight updates: Apply SGD with mini-batch averaging
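A condensed sketch of these three steps for one training example plus one mini-batch update, in the spirit of Nielsen's reference implementation. It reuses the `sigmoid`/`sigmoid_prime` helpers above and the `weights`/`biases` lists from the initialization sketch; the project's `network.py` may differ in details:

```python
def backprop(x, y, weights, biases):
    # Forward pass: keep pre-activations (zs) and activations for every layer.
    activation, activations, zs = x, [x], []
    for w, b in zip(weights, biases):
        z = w @ activation + b
        zs.append(z)
        activation = sigmoid(z)
        activations.append(activation)

    # Backward pass: output-layer error for MSE loss with a sigmoid output.
    nabla_w = [np.zeros_like(w) for w in weights]
    nabla_b = [np.zeros_like(b) for b in biases]
    delta = (activations[-1] - y) * sigmoid_prime(zs[-1])
    nabla_w[-1] = delta @ activations[-2].T
    nabla_b[-1] = delta

    # Chain rule through the hidden layers, working backwards.
    for l in range(2, len(weights) + 1):
        delta = (weights[-l + 1].T @ delta) * sigmoid_prime(zs[-l])
        nabla_w[-l] = delta @ activations[-l - 1].T
        nabla_b[-l] = delta
    return nabla_w, nabla_b

def update_mini_batch(batch, weights, biases, eta):
    # Average the per-example gradients over the mini-batch, then take one SGD step.
    grad_w = [np.zeros_like(w) for w in weights]
    grad_b = [np.zeros_like(b) for b in biases]
    for x, y in batch:
        nw, nb = backprop(x, y, weights, biases)
        grad_w = [gw + dw for gw, dw in zip(grad_w, nw)]
        grad_b = [gb + db for gb, db in zip(grad_b, nb)]
    for i in range(len(weights)):
        weights[i] -= (eta / len(batch)) * grad_w[i]
        biases[i] -= (eta / len(batch)) * grad_b[i]
```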
Loss Function (MSE)
L = (1/2n) * Σ||ŷ - y||²
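In code, assuming predictions and targets are stacked as 10×n column matrices (a sketch, not necessarily the project's exact helper):

```python
def mse_loss(y_hat, y):
    # L = (1/2n) * sum ||y_hat - y||^2, with one column per example.
    return 0.5 * np.mean(np.sum((y_hat - y) ** 2, axis=0))
```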
Data Processing
- Normalization: Pixel values scaled to [0,1] range
- One-hot Encoding: Labels converted to 10-dimensional vectors
- Input Format: 784×1 column vectors for each image
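A sketch of these preprocessing steps, assuming `images` is a sequence of 28×28 uint8 arrays and `labels` a sequence of integer digits (the project's IDX-file loader may differ):

```python
import numpy as np

def preprocess(images, labels):
    # Scale pixel intensities from [0, 255] to [0, 1] and flatten to 784x1 column vectors.
    xs = [img.reshape(784, 1).astype(np.float64) / 255.0 for img in images]
    # One-hot encode each label as a 10-dimensional column vector.
    ys = []
    for label in labels:
        y = np.zeros((10, 1))
        y[int(label)] = 1.0
        ys.append(y)
    # Pair inputs with targets, ready for mini-batch SGD.
    return list(zip(xs, ys))
```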
Typical Results:
- Training Accuracy: ~98-99%
- Test Accuracy: ~95-97%
- Training Time: ~2-5 minutes (depending on hardware)
- Model Size: <50KB (.npz format)
Edit the layer sizes in main.py:
```python
nn = NeuralNetwork([784, 32, 32, 10])  # Smaller hidden layers than the default [784, 128, 64, 10]
```

Adjust the training hyperparameters:

```python
nn.train(training_data,
         epochs=50,            # More training epochs
         mini_batch_size=20,   # Smaller batch size
         eta=0.05)             # Lower learning rate
```

This implementation demonstrates:
- Fundamental ML Concepts: Gradient descent, backpropagation, activation functions
- Matrix Operations: Efficient vectorized computations with NumPy
- Neural Network Theory: From scratch implementation without frameworks
- Interactive Visualization: Real-time model inference and feedback
- Learning Tool: Understand neural networks without framework abstractions
- Prototyping: Quick experimentation with network architectures
- Demonstration: Visual showcase of digit recognition capabilities
- Research: Baseline implementation for custom modifications
- MNIST Database - Original dataset source
- Neural Networks and Deep Learning - Michael Nielsen
- Pattern Recognition and Machine Learning - Christopher Bishop
Built with ❤️ using pure NumPy and mathematical fundamentals