Understanding and Visualizing Optimizers in Machine Learning
- Kien Tran
- Ken Lam
- Start Date: [Enter Start Date]
- End Date: [Enter Due Date or Expected Completion Date]
- Milestones:
- Research and Notes: [Insert Date]
- Initial Visualizations: [Insert Date]
- Interactive Demos Setup: [Insert Date]
- Final Report and Presentation: [Insert Date]
To explore and understand the internal mechanics of widely used optimization algorithms in machine learning, and to build clear, intuitive visualizations that demonstrate how each optimizer behaves during training. The goal is to create a resource that helps students and ML practitioners grasp the differences among optimizers and the use cases each is suited to.
- Stochastic Gradient Descent (SGD) with Momentum
- RMSProp
- Adadelta
- AdaGrad
- Adam
Each optimizer will be analyzed and compared based on:
- Update rules and formulas
- Handling of gradients (magnitude, direction, and adaptation)
- Performance on convex and non-convex functions
- Convergence speed and stability
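To make "update rules and formulas" concrete, here is a minimal runnable sketch of two of the rules on a 1-D quadratic. The hyperparameter values are common defaults, not project decisions, and this scalar code is an illustration rather than the project's final implementation:

```python
import math

def adam_step(theta, grad, m, v, t, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: exponential moment estimates plus bias correction."""
    m = b1 * m + (1 - b1) * grad        # first moment (mean of gradients)
    v = b2 * v + (1 - b2) * grad ** 2   # second moment (mean of squared gradients)
    m_hat = m / (1 - b1 ** t)           # bias-corrected estimates
    v_hat = v / (1 - b2 ** t)
    return theta - lr * m_hat / (math.sqrt(v_hat) + eps), m, v

def sgd_momentum_step(theta, grad, vel, lr=0.01, mu=0.9):
    """One SGD-with-momentum update: velocity accumulates past gradients."""
    vel = mu * vel - lr * grad
    return theta + vel, vel

# Minimize f(theta) = theta**2, whose gradient is 2*theta.
theta_adam, m, v = 5.0, 0.0, 0.0
theta_sgd, vel = 5.0, 0.0
for t in range(1, 501):
    theta_adam, m, v = adam_step(theta_adam, 2 * theta_adam, m, v, t)
    theta_sgd, vel = sgd_momentum_step(theta_sgd, 2 * theta_sgd, vel)
# Both trajectories end near the minimum at theta = 0.
```

Even this toy run shows the behavioral difference the project will visualize: Adam's step size is normalized by the gradient's running magnitude, while momentum's step grows as gradients keep pointing the same way.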
- A clean, readable GitHub repository
- Python notebooks (or scripts) demonstrating and comparing optimizers
- Interactive visualizations of optimizer behavior on simple 2D loss surfaces (e.g., Rosenbrock, Himmelblau)
- Markdown-based educational explanations for each optimizer
- A final comparative summary table (speed, memory usage, convergence, etc.)
- Optional: small frontend (e.g., Streamlit or Gradio) to visualize behavior in real-time
- Programming Language: Python
- Libraries:
  - matplotlib / seaborn for plotting
  - NumPy and PyTorch or TensorFlow for numerical experiments
  - Plotly / mpl-interactions / ipywidgets for interactive visualization
  - Streamlit or Gradio (optional, for the UI)
- Project Hosting: GitHub
- Documentation: Markdown files in the repository (README.md, prd.md, etc.)
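The stack above translates into a requirements file along these lines (package names as published on PyPI; version pins and the optional entries are placeholders to be decided):

```text
numpy
matplotlib
seaborn
plotly
ipywidgets
mpl-interactions
torch        # or tensorflow
streamlit    # optional, only if the UI is built
```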
For each optimizer:
- 2D contour plot of a loss surface showing optimization path
- Animation (GIF or interactive widget) showing step-by-step movement
- Comparison plots of loss vs. epoch
- Vector field (optional): to show gradient directions vs. update directions
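The contour-plot idea can be sketched as follows: trace the iterates of one optimizer on the Rosenbrock surface, then overlay the path on a contour plot. The learning rate, step count, and output filename here are illustrative assumptions, not fixed project choices:

```python
import math

def rosenbrock(x, y, a=1.0, b=100.0):
    """Classic banana-shaped test surface with its minimum at (a, a**2)."""
    return (a - x) ** 2 + b * (y - x ** 2) ** 2

def rosenbrock_grad(x, y, a=1.0, b=100.0):
    dx = -2 * (a - x) - 4 * b * x * (y - x ** 2)
    dy = 2 * b * (y - x ** 2)
    return dx, dy

def descent_path(x0, y0, lr=1e-3, mu=0.9, steps=8000):
    """Iterates of SGD with momentum, recorded for later plotting."""
    x, y, vx, vy = x0, y0, 0.0, 0.0
    path = [(x, y)]
    for _ in range(steps):
        gx, gy = rosenbrock_grad(x, y)
        vx, vy = mu * vx - lr * gx, mu * vy - lr * gy
        x, y = x + vx, y + vy
        path.append((x, y))
    return path

path = descent_path(-1.0, 1.0)

# Overlay the path on a contour plot (skipped if matplotlib is unavailable).
try:
    import matplotlib
    matplotlib.use("Agg")  # headless backend so this also runs without a display
    import matplotlib.pyplot as plt
    import numpy as np

    X, Y = np.meshgrid(np.linspace(-2, 2, 400), np.linspace(-1, 3, 400))
    plt.contour(X, Y, rosenbrock(X, Y), levels=np.logspace(-1, 3, 15))
    px, py = zip(*path)
    plt.plot(px, py, "r.-", markersize=2, linewidth=0.8, label="SGD + momentum")
    plt.legend()
    plt.savefig("rosenbrock_path.png", dpi=120)
except ImportError:
    pass
```

The same `descent_path` loop can be swapped out for each optimizer class, so all five paths can be drawn on one shared contour plot for direct comparison; saving one frame per iteration is then a natural route to the planned GIF animations.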
optimizer-visualization/
│
├── notebooks/
│   ├── adam.ipynb
│   ├── adagrad.ipynb
│   ├── adadelta.ipynb
│   ├── rmsprop.ipynb
│   └── sgd_momentum.ipynb
│
├── visualizations/
│   ├── gifs/
│   └── plots/
│
├── scripts/
│   └── optimizer_classes.py
│
├── static/
│   └── README_assets/
│
├── prd.md
├── README.md
└── requirements.txt
- Accuracy of theoretical explanations and implementation
- Clarity and usefulness of visualizations
- Code readability and modularity
- Engagement of interactive components (if applicable)
- Final comparative insights across optimizers
- Diederik P. Kingma and Jimmy Ba. Adam: A Method for Stochastic Optimization (2014)
- Stanford CS231n Notes
- Distill.pub: Visualizing Optimization Algorithms
- PyTorch & TensorFlow official documentation
- Focus on maintaining consistent formatting across notebooks and visualizations
- Optimize animations for clarity over aesthetics
- Ensure math formulas are rendered properly in markdown files
- Suggest tooltip-based explanations for each step in visualization (if using UI)
- Help with summarizing optimizer comparisons for final presentation/report