A from-scratch implementation of a neural network for handwritten digit classification (0-9) using only NumPy, featuring the Adam optimization algorithm. This project demonstrates the fundamentals of deep learning by building a complete neural network without high-level frameworks.
This project implements a 3-layer neural network that classifies handwritten digits from the MNIST dataset. The implementation compares the performance of Adam optimization against standard gradient descent, showcasing the effectiveness of adaptive learning rate methods.
- Pure NumPy Implementation: No deep learning frameworks (TensorFlow, PyTorch) used for the core network
- Adam Optimization: Full implementation of the Adam optimizer as described in the original paper
- Performance Comparison: Side-by-side comparison of Adam-optimized vs. standard gradient descent
- High Accuracy: Achieves 100% accuracy on the training set with Adam optimization
- Educational: Clear, documented code ideal for learning neural network fundamentals
```
├── Image_classification_neural_network_numpy-Adam Optimization.ipynb
├── README.md
└── LICENCE
```
The network consists of three layers:
| Layer | Type | Neurons | Activation |
|---|---|---|---|
| Input | Dense | 784 (28×28 pixels) | - |
| Hidden 1 | Dense | 128 | ReLU |
| Hidden 2 | Dense | 40 | ReLU |
| Output | Dense | 10 (digits 0-9) | Softmax |
Loss Function: Mean Squared Error (MSE)
Optimization: Adam (β₁=0.9, β₂=0.99, ε=1e-8)
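The layer shapes in the table above can be set up directly in NumPy. The sketch below uses He-style scaling, a common choice for ReLU layers; the notebook's exact initialization scheme may differ, so treat this as illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Layer sizes from the architecture table: 784 -> 128 -> 40 -> 10.
layer_sizes = [784, 128, 40, 10]

# He-style initialization (scale by sqrt(2 / fan_in)) suits ReLU layers;
# this is an assumption, not necessarily what the notebook uses.
params = {}
for l in range(1, len(layer_sizes)):
    fan_in = layer_sizes[l - 1]
    params[f"W{l}"] = rng.standard_normal((layer_sizes[l], fan_in)) * np.sqrt(2.0 / fan_in)
    params[f"b{l}"] = np.zeros((layer_sizes[l], 1))
```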
- Source: Kaggle Digit Recognizer Competition
- Training Set: 42,000 labeled images
- Test Set: 28,000 unlabeled images
- Image Format: 28×28 grayscale pixels (784 features)
```
numpy
pandas
matplotlib
scikit-learn
tensorflow  # only used for validation metrics
pillow
```
1. Clone the repository

   ```
   git clone https://github.com/jvachier/Image_classification_neural_network_numpy-Adam-Optimization.git
   cd Image_classification_neural_network_numpy-Adam-Optimization
   ```

2. Install dependencies

   ```
   pip install numpy pandas matplotlib scikit-learn tensorflow pillow
   ```

3. Download the dataset
   - Download `train.csv` and `test.csv` from the Kaggle Digit Recognizer competition
   - Place them in the project directory

4. Run the notebook

   ```
   jupyter notebook "Image_classification_neural_network_numpy-Adam Optimization.ipynb"
   ```
The Adam (Adaptive Moment Estimation) optimizer combines the advantages of two popular methods:
- RMSprop: Uses adaptive learning rates
- Momentum: Accelerates convergence in relevant directions
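A single Adam update can be sketched in a few lines of NumPy, using the hyperparameters quoted above (β₁=0.9, β₂=0.99, ε=1e-8). The helper name `adam_step` and the toy quadratic objective are illustrative, not taken from the notebook:

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.99, eps=1e-8):
    """One Adam update for parameter theta at step t (t starts at 1)."""
    m = beta1 * m + (1 - beta1) * grad        # first moment (momentum term)
    v = beta2 * v + (1 - beta2) * grad ** 2   # second moment (RMSprop term)
    m_hat = m / (1 - beta1 ** t)              # bias correction
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Toy example: minimise f(x) = x^2, whose gradient is 2x.
x = np.array(5.0)
m, v = np.zeros(()), np.zeros(())
for t in range(1, 201):
    x, m, v = adam_step(x, 2 * x, m, v, t, lr=0.1)
```

Because of the bias correction, the very first step has magnitude close to the learning rate regardless of the gradient's scale, which is part of why Adam converges quickly early in training.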
Forward propagation through the network is:

- Layer 1: `Z[1] = W[1]X + b[1]`, `A[1] = ReLU(Z[1])`
- Layer 2: `Z[2] = W[2]A[1] + b[2]`, `A[2] = ReLU(Z[2])`
- Layer 3: `Z[3] = W[3]A[2] + b[3]`, `A[3] = Softmax(Z[3])`
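The three layers above translate to a short NumPy forward pass. Parameter names follow the `W[l]`, `b[l]` notation; the tiny random initialization here is purely illustrative:

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def softmax(z):
    # Subtract the column-wise max for numerical stability.
    e = np.exp(z - z.max(axis=0, keepdims=True))
    return e / e.sum(axis=0, keepdims=True)

def forward(X, p):
    """X: (784, m) batch of flattened images; p: dict with W1..W3, b1..b3."""
    Z1 = p["W1"] @ X + p["b1"]; A1 = relu(Z1)
    Z2 = p["W2"] @ A1 + p["b2"]; A2 = relu(Z2)
    Z3 = p["W3"] @ A2 + p["b3"]; A3 = softmax(Z3)
    return A3

rng = np.random.default_rng(0)
p = {"W1": rng.standard_normal((128, 784)) * 0.01, "b1": np.zeros((128, 1)),
     "W2": rng.standard_normal((40, 128)) * 0.01,  "b2": np.zeros((40, 1)),
     "W3": rng.standard_normal((10, 40)) * 0.01,   "b3": np.zeros((10, 1))}
A3 = forward(rng.random((784, 5)), p)  # each column is a probability vector
```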
Gradients are computed using the chain rule and used to update weights and biases through the Adam optimizer.
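A standard way to verify chain-rule gradients like these is a finite-difference check. The single linear layer with MSE loss below is a simplified stand-in for the notebook's full backward pass; the same comparison can be applied to any layer's gradient:

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.standard_normal((3, 4))
x = rng.standard_normal((4, 1))
y = rng.standard_normal((3, 1))

def loss(W):
    # MSE over the 3 outputs of a linear layer.
    return np.mean((W @ x - y) ** 2)

# Analytic gradient via the chain rule: dL/dW = (2/3) * (W x - y) x^T.
grad = (2 / 3) * (W @ x - y) @ x.T

# Central finite differences as an independent check.
num = np.zeros_like(W)
eps = 1e-6
for i in range(W.shape[0]):
    for j in range(W.shape[1]):
        Wp = W.copy(); Wp[i, j] += eps
        Wm = W.copy(); Wm[i, j] -= eps
        num[i, j] = (loss(Wp) - loss(Wm)) / (2 * eps)
```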
- Training Accuracy: 100% (with Adam optimization)
- Convergence: Significantly faster with Adam compared to standard gradient descent
- Visualization: Includes training curves for loss, accuracy, MSE, and R² score
- Kingma, D. P., & Ba, J. (2014). Adam: A Method for Stochastic Optimization. arXiv preprint arXiv:1412.6980.
- Kaggle Notebook: Classification with Neural Network - Adam - NumPy
- Dataset: Kaggle Digit Recognizer
This project is licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0). See the LICENCE file for details.
jvachier
Created: July 2022
This project was created as an educational exercise to understand the inner workings of neural networks and optimization algorithms by implementing them from scratch.