This project is an ongoing experimental journey, not a finished product:
- What we've discovered: Geometric Adam stays stable and converges successfully in conditions where the standard Adam optimizer diverges and fails
- What we're still exploring: How Geometric Adam behaves when Adam is already stable - will it match or exceed Adam's performance?
- Why we're sharing now: We've discovered phenomena that existing optimization theory cannot explain, and we believe in transparent research
- Our primary focus: Understanding these unexplained theoretical phenomena is more important to us than just optimizer performance metrics
- What we need: We're actively seeking feedback and insights from experts who might help us understand what we've discovered (or think we've discovered)
This is an invitation to join our exploration, not a claim of a finished solution.
Someone asked whether I used LLMs to write my research paper. To be clear:
- I used LLMs to translate and polish my English writing (non-native speaker)
- I used them to discuss and validate mathematical proofs (like a 24/7 colleague)
- All core ideas, experiments, and analysis are my original work
- The novel insight of using ray tracing for optimization is mine alone
If using such tools disqualifies research, then we should also reject papers that used spell-checkers, discussed ideas with colleagues, or received any form of assistance. The scientific merit lies in the original contributions, not in the tools used to refine them.
A new kind of optimization algorithm that applies ray tracing principles from computer graphics to neural network training, achieving unprecedented stability and performance improvements.
- 59% improvement in validation perplexity (282 → 116) on 29M parameter transformer
- 100% training completion rate vs 20% for standard optimizers
- Zero divergence across 30 epochs while Adam/AdamW fail after 6 epochs
- Scale-invariant performance demonstrated on 2.5M, 10M and 29M parameter models
This repository implements the research presented in "Geometric Adam: A Ray Tracing-Inspired Adaptive Optimization" by Jaepil Jeong.
Geometric Adam treats gradient descent as light propagation through media with varying optical density:
- Refraction: Automatically adjusts step size based on loss landscape curvature
- Angular Analysis: Detects geometric changes through gradient direction vectors
- Adaptive Control: Exponential step size reduction in high-curvature regions
```
# Core geometric computation
d_t = g_t / (||g_t|| + ε)        # Gradient direction
θ_t = arccos(|d_t · d_{t-1}|)    # Angular change
r_t = exp(-λ * θ_t)              # Refraction coefficient
```

The optimizer adapts to the loss landscape geometry by:
- Computing angular changes between consecutive gradient directions
- Estimating local curvature from geometric properties
- Applying exponential step size reduction via refraction coefficients
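The adaptation loop above can be sketched as a single self-contained update step. This is a minimal numpy illustration, not the repository's implementation: the function name, the `state` dict, and the default values are assumptions chosen to mirror the formulas in this README.

```python
import numpy as np

def geometric_step(param, grad, state, lr=1e-3, lam=0.1, eps=1e-8):
    """Illustrative refraction-scaled gradient step (not the repo API).

    Computes the angular change between consecutive gradient
    directions and shrinks the step exponentially with that angle.
    """
    d = grad / (np.linalg.norm(grad) + eps)            # gradient direction d_t
    d_prev = state.get("d_prev")
    if d_prev is None:
        r = 1.0                                        # no history yet: full step
    else:
        cos = np.clip(abs(np.dot(d, d_prev)), 0.0, 1.0)
        theta = np.arccos(cos)                         # angular change θ_t
        r = np.exp(-lam * theta)                       # refraction coefficient r_t
    state["d_prev"] = d
    return param - lr * r * grad                       # refraction-scaled update
```

A sharp 90° turn in the gradient direction gives θ = π/2, so the step is scaled by exp(-λπ/2) ≈ 0.85 at the default λ = 0.1, while a straight trajectory keeps the full step.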
| Optimizer | Final Valid PPL | Training Epochs | Status |
|---|---|---|---|
| Geometric Adam | 115.6 | 30 | ✅ Stable |
| Adam | 786.0 | 6 | ❌ Diverged |
| AdamW | 423.9 | 6 | ❌ Diverged |
Geometric Adam (pink) maintains stable convergence throughout 30 epochs while standard optimizers diverge catastrophically.
The complete implementation includes:
- Geometric State Tracking: Angular changes, curvature estimates, refraction coefficients
- Numerical Stability: Safe division, device compatibility, mixed precision support
- Memory Efficiency: Optional memory-reduced variants (47% reduction)
- Comprehensive Logging: TensorBoard, W&B integration, detailed metrics
```python
class GeometricAdam(torch.optim.Optimizer):
    """
    Ray tracing-inspired adaptive optimizer.

    Args:
        params: Model parameters
        lr: Learning rate (default: 1e-3)
        betas: Adam momentum coefficients (default: (0.9, 0.999))
        lambda_refraction: Refraction sensitivity (default: 0.1)
        gamma_curvature: Curvature memory factor (default: 0.95)
        eps: Numerical stability constant (default: 1e-8)
    """
```

Our research reveals that successful optimization operates in the large-angle regime where:
- Average angular changes: 1.48 radians (85°)
- Traditional small-angle theory breaks down
- Geometric adaptation provides robust control despite theoretical gaps
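The large-angle statistic can be measured directly from a sequence of gradients. This is a hypothetical diagnostic helper (the name and signature are not from the repository) that computes the mean angle between consecutive gradient directions, the quantity reported in the scaling table below:

```python
import numpy as np

def mean_angular_change(grads, eps=1e-8):
    """Mean angle (radians) between consecutive gradient directions.

    Illustrative diagnostic mirroring the angular statistic reported
    in this README; not part of the repository's API.
    """
    dirs = [g / (np.linalg.norm(g) + eps) for g in grads]
    angles = [
        np.arccos(np.clip(abs(np.dot(a, b)), 0.0, 1.0))
        for a, b in zip(dirs, dirs[1:])
    ]
    return float(np.mean(angles))
```

Values near 1.5 radians, as observed in our runs, indicate near-orthogonal consecutive gradients, which is exactly where small-angle approximations stop being valid.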
- Linear convergence for strongly convex objectives
- Efficient saddle point escape in non-convex settings
- Robustness to systematic estimation errors (21% curvature underestimation)
The paper proposes exciting extensions incorporating Phong reflection models and recursive ray tracing:
- Phong-inspired updates: Ambient + diffuse + specular lighting terms
- Recursive reflection: Multi-bounce optimization trajectories
- Cook-Torrance BRDF: Physically-based rendering for optimization
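To make the lighting analogy concrete, here is a heavily hedged sketch of what a Phong-style update term could look like. None of this is from the paper: the function name, the coefficients `k_a`/`k_d`/`k_s`, and the use of gradient-momentum alignment as the "lighting angle" are all illustrative choices.

```python
import numpy as np

def phong_update(grad, momentum, k_a=0.1, k_d=0.7, k_s=0.2,
                 shininess=8, eps=1e-8):
    """Hypothetical Phong-style scaling of the gradient (illustrative only).

    Ambient  = constant base step,
    diffuse  = term proportional to gradient-momentum alignment,
    specular = sharp term rewarding near-perfect alignment.
    """
    g_dir = grad / (np.linalg.norm(grad) + eps)
    m_dir = momentum / (np.linalg.norm(momentum) + eps)
    align = np.clip(np.dot(g_dir, m_dir), 0.0, 1.0)    # cosine alignment
    scale = k_a + k_d * align + k_s * align ** shininess
    return scale * grad
```

Under this sketch, a gradient fully aligned with momentum receives the full step (0.1 + 0.7 + 0.2 = 1.0), while an orthogonal gradient is damped to the ambient floor of 0.1.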
| Model Size | Training Epochs | Angular Changes | Performance |
|---|---|---|---|
| 2.5M params | 100 epochs | 1.45 ± 0.28 rad | Stable |
| 10M params | 53 epochs | 1.47 ± 0.29 rad | Stable |
| 29M params | 30 epochs | 1.48 ± 0.31 rad | Stable |
- t-statistic > 11 for all comparisons (p < 0.001)
- Cohen's d > 4 indicating very large effect sizes
- Consistent across multiple random seeds
Geometric Adam excels in scenarios requiring:
- High stability for large model training
- Robustness to hyperparameter choices
- Long training schedules without divergence
- Superior final performance over training speed
We welcome contributions to extend and improve Geometric Adam:
- Implementation optimizations
- Hardware acceleration
- New geometric extensions
- Theoretical analysis
- Experimental validation
MIT License
