A production-ready machine learning model serving API with comprehensive MLOps capabilities including model versioning, monitoring, and automated training pipelines.
Deploying machine learning models to production is often a fragmented, fragile process that is hard to scale. Teams commonly face:
- Inconsistent model versions across environments
- Lack of observability into inference performance and data drift
- Manual retraining and deployment with no governance
- Poor scalability under real-world traffic
- No standardized monitoring of latency, errors, or throughput
This leads to unreliable predictions, delayed updates, and high operational overhead.
This project delivers a production-grade, end-to-end MLOps platform that enables:
- Reliable, versioned model serving via FastAPI with support for multiple formats
- Real-time monitoring & alerting using Prometheus and Grafana
- Automated drift detection and performance tracking
- CI/CD-integrated training pipelines with model registry
- Scalable, cloud-native deployment using Docker and Kubernetes
- Observability by design with latency, error, and throughput metrics
Result: Faster, safer, and more reliable ML in production.
Key features:
- RESTful API for model predictions with FastAPI
- Model Versioning with support for multiple model formats
- Real-time Monitoring with Prometheus and Grafana
- Data Drift Detection and model performance tracking
- Automated Training Pipelines with model registry
- Docker & Kubernetes ready deployment
- CI/CD Integration with GitHub Actions
| Category | Technology |
|---|---|
| API Framework | FastAPI (async/await) |
| Machine Learning | Scikit-learn, TensorFlow, PyTorch |
| Monitoring | Prometheus, Grafana |
| Database | PostgreSQL (async) |
| Storage | Local filesystem, S3, or cloud storage |
| Containerization | Docker, Docker Compose |
| Orchestration | Kubernetes manifests |
| CI/CD | GitHub Actions |
The system follows a microservices architecture:
- API Service: Handles prediction requests and model management
- Model Registry: Manages model versions and metadata
- Monitoring Service: Tracks predictions, errors, and data drift
- Training Pipeline: Automated model training and evaluation
- Database: Stores model metadata and prediction history
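The split between the API service and the model registry can be pictured with a minimal sketch. Everything here is illustrative, not the project's actual code: the class and field names are hypothetical and the registry is reduced to an in-memory map.

```python
# Illustrative sketch only: how an API service might delegate to a model registry.
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
import joblib

app = FastAPI()

class PredictRequest(BaseModel):
    model_version: str
    features: list[list[float]]

class ModelRegistry:
    """In-memory stand-in for the registry service: maps versions to loaded models."""

    def __init__(self) -> None:
        self._models: dict[str, object] = {}

    def register(self, version: str, artifact_path: str) -> None:
        self._models[version] = joblib.load(artifact_path)

    def get(self, version: str):
        return self._models.get(version)

registry = ModelRegistry()

@app.post("/api/v1/predict")
def predict(request: PredictRequest):
    # Look up the requested version; unknown versions become a 404 rather than a crash.
    model = registry.get(request.model_version)
    if model is None:
        raise HTTPException(status_code=404, detail="Unknown model version")
    return {"predictions": model.predict(request.features).tolist()}
```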
Prerequisites:
- Python 3.9+
- Docker and Docker Compose
- PostgreSQL (for production)
Clone the repository:
git clone https://github.com/mosesachizz/ml-model-serving.git
cd ml-model-serving
Set up environment:
cp .env.example .env  # Edit .env with your configuration
Install dependencies:
pip install -r requirements/dev.txt
Run with Docker Compose:
docker-compose up -d
Access the services:
- API: http://localhost:8000
- API Docs: http://localhost:8000/docs
- Grafana: http://localhost:3000 (admin/admin)
- Prometheus: http://localhost:9090
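Once the stack is up, the prediction endpoint can also be called from Python. This is a minimal sketch using the requests library; the payload mirrors the curl example below.

```python
# Sketch of a Python client for the prediction endpoint (assumes the
# docker-compose stack from the quick start is running on localhost:8000).
import requests

payload = {
    "model_version": "v1",
    "features": [[5.1, 3.5, 1.4, 0.2]],
}
response = requests.post("http://localhost:8000/api/v1/predict", json=payload, timeout=10)
response.raise_for_status()
print(response.json())
```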
Make a prediction:
curl -X POST "http://localhost:8000/api/v1/predict" -H "Content-Type: application/json" -d '{
  "model_version": "v1",
  "features": [[5.1, 3.5, 1.4, 0.2]]
}'
List available models:
curl "http://localhost:8000/api/v1/models"
Get model statistics:
curl "http://localhost:8000/api/v1/monitoring/models/v1/stats"- Pickle (.pkl)
- Joblib (.joblib)
- TensorFlow/Keras (.h5)
- ONNX (.onnx)
- PyTorch (.pt) - via custom loading
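As a rough illustration of how loading could be dispatched by file extension (the project's actual loader interface may differ), a sketch:

```python
# Illustrative loader dispatch by file extension; not the project's actual code.
import pathlib
import pickle

def load_model(path: str):
    suffix = pathlib.Path(path).suffix.lower()
    if suffix == ".pkl":
        with open(path, "rb") as f:
            return pickle.load(f)
    if suffix == ".joblib":
        import joblib
        return joblib.load(path)
    if suffix == ".h5":
        from tensorflow import keras
        return keras.models.load_model(path)
    if suffix == ".onnx":
        import onnxruntime
        return onnxruntime.InferenceSession(path)
    if suffix == ".pt":
        import torch
        return torch.load(path, map_location="cpu")  # usually wrapped by custom loading code
    raise ValueError(f"Unsupported model format: {suffix}")
```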
The API exposes Prometheus metrics at /metrics:
- model_predictions_total: Total predictions count
- model_prediction_latency_seconds: Prediction latency histogram
- model_prediction_errors_total: Error counts by type
- model_throughput_predictions_per_second: Real-time throughput
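A hedged sketch of how these metrics could be declared with prometheus_client; the metric names match the list above, but the label sets are assumptions.

```python
# Illustrative metric definitions using prometheus_client; the project's
# actual label sets may differ.
from prometheus_client import Counter, Gauge, Histogram

PREDICTIONS = Counter(
    "model_predictions",  # exposed as model_predictions_total
    "Total predictions count",
    ["model_version"],
)
PREDICTION_LATENCY = Histogram(
    "model_prediction_latency_seconds",
    "Prediction latency histogram",
    ["model_version"],
)
PREDICTION_ERRORS = Counter(
    "model_prediction_errors",  # exposed as model_prediction_errors_total
    "Error counts by type",
    ["model_version", "error_type"],
)
THROUGHPUT = Gauge(
    "model_throughput_predictions_per_second",
    "Real-time throughput",
    ["model_version"],
)

# Usage inside a request handler:
# with PREDICTION_LATENCY.labels(model_version="v1").time():
#     ...run inference...
# PREDICTIONS.labels(model_version="v1").inc()
```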
The training pipeline includes:
- Data validation and preprocessing
- Model training with hyperparameter tuning
- Model evaluation and validation
- Model registration and versioning
- Automated testing and deployment
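As a self-contained illustration of those stages, here is a toy run using scikit-learn on the iris dataset; the real pipeline's stages, registry integration, and validation thresholds live in training_pipeline/ and will differ.

```python
# Illustrative end-to-end training run; thresholds and artifact paths are assumptions.
import joblib
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Data validation and preprocessing (trivial here: load and split a known dataset)
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Model training with hyperparameter tuning
search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid={"n_estimators": [50, 100], "max_depth": [None, 5]},
    cv=3,
)
search.fit(X_train, y_train)

# Model evaluation and validation gate (threshold is illustrative)
accuracy = accuracy_score(y_test, search.best_estimator_.predict(X_test))
assert accuracy >= 0.9, "Model failed validation; do not register"

# Model registration and versioning (here: just persist an artifact the API can load)
joblib.dump(search.best_estimator_, "iris_v1.joblib")
print(f"Registered iris_v1 with accuracy={accuracy:.3f}")
```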
Run the training pipeline:
python -m training_pipeline.pipeline
Deploy to Kubernetes:
kubectl apply -f kubernetes/
The application can be deployed to:
- AWS ECS/EKS
- Google Cloud Run/GKE
- Azure Container Instances/AKS
- Heroku with Docker
GitHub Actions workflows included:
- CI: Run tests on pull requests
- CD: Deploy to staging/production
- Training: Automated model training pipeline
- Monitoring: Data drift detection and alerts
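The drift-detection method isn't spelled out here; as one common illustration, a two-sample Kolmogorov–Smirnov test per feature can flag distribution shifts between training data and recent requests (thresholds and windowing below are assumptions, not the project's configuration).

```python
# Illustrative drift check: compare recent feature values against a training
# reference with a two-sample KS test.
import numpy as np
from scipy.stats import ks_2samp

def detect_drift(reference: np.ndarray, recent: np.ndarray, alpha: float = 0.01) -> list[int]:
    """Return indices of features whose recent distribution differs from the reference."""
    drifted = []
    for i in range(reference.shape[1]):
        statistic, p_value = ks_2samp(reference[:, i], recent[:, i])
        if p_value < alpha:
            drifted.append(i)
    return drifted

# Example: feature 1 has shifted, features 0 and 2 have not
rng = np.random.default_rng(0)
reference = rng.normal(size=(1000, 3))
recent = reference.copy()
recent[:, 1] += 1.5
print(detect_drift(reference, recent))  # -> [1]
```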
To contribute:
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests and ensure they pass
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.