MACHINE LEARNING OPERATIONS PIPELINE


Built with the following tools and technologies:

Flask · scikit-learn · XGBoost · DVC · MLflow · Docker · Kubernetes · Apache Airflow · GitHub Actions · Prometheus · Grafana


📊 About the Project

This repository demonstrates a comprehensive MLOps pipeline that applies industry-standard practices for end-to-end machine learning workflow automation. The project implements a production-ready ML system with automated training, validation, deployment, and monitoring capabilities.

🎯 Key Features

  • 6-Stage DVC Pipeline: Data ingestion → Validation → Feature engineering → Transformation → Training → Evaluation (see the sketch after this list)
  • XGBoost Model: Achieved 92.15% accuracy with automated hyperparameter tuning
  • MLflow Integration: Experiment tracking and model registry for version control
  • Production Monitoring: 15+ ML metrics with Prometheus, Grafana dashboards, and health endpoints
  • Container Orchestration: Docker containerization with Kubernetes deployment
  • CI/CD Automation: GitHub Actions for testing, security scanning, and deployment
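
The sketch below is a minimal illustration of how a driver script such as main.py could chain these six stages in order; it assumes each stage module exposes a main() entry point, which this README does not confirm.

# Hypothetical driver that runs the six pipeline stages sequentially.
# Assumes each stage module exposes a main() entry point.
import importlib

STAGES = [
    "src.mlpipeline.pipeline.stage_01_data_ingestion",
    "src.mlpipeline.pipeline.stage_02_data_validation",
    "src.mlpipeline.pipeline.stage_03_feature_engineering",
    "src.mlpipeline.pipeline.stage_04_data_transformation",
    "src.mlpipeline.pipeline.stage_05_model_trainer",
    "src.mlpipeline.pipeline.stage_06_model_evaluation",
]

def run_pipeline() -> None:
    for name in STAGES:
        print(f">>> Running {name}")
        importlib.import_module(name).main()  # assumed entry point

if __name__ == "__main__":
    run_pipeline()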

🚀 Getting Started

Prerequisites

  • Python 3.11+
  • Docker & Docker Compose
  • Git & DVC
  • Kaggle Account (for data access)

Installation

  1. Clone the repository

git clone https://github.com/Abeshith/MLOps_PipeLine.git
cd MLOps_PipeLine

  2. Set up the Python environment

python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt

  3. Configure Kaggle credentials

# Create kaggle.json in the ~/.kaggle/ directory
{
  "username": "your_kaggle_username",
  "key": "your_kaggle_key"
}
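
With kaggle.json in place, data can also be pulled programmatically through the official kaggle package. A minimal sketch, assuming a placeholder dataset slug (this README does not name the actual dataset):

# Download a Kaggle dataset with the official API client.
# The dataset slug and target path are placeholders, not the project's actual values.
from kaggle.api.kaggle_api_extended import KaggleApi

api = KaggleApi()
api.authenticate()  # reads credentials from ~/.kaggle/kaggle.json

api.dataset_download_files(
    "owner/dataset-name",            # placeholder slug
    path="artifacts/data_ingestion",
    unzip=True,
)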

🔄 Pipeline Execution

Complete Pipeline

# From the repository root, activate the environment and set PYTHONPATH once
source venv/bin/activate
export PYTHONPATH="$PWD/src"

# Run all stages
python main.py

# Or use DVC
dvc repro

Individual Stages

# With the environment activated and PYTHONPATH set as above:

# Stage 1: Data Ingestion
python -m src.mlpipeline.pipeline.stage_01_data_ingestion

# Stage 2: Data Validation
python -m src.mlpipeline.pipeline.stage_02_data_validation

# Stage 3: Feature Engineering
python -m src.mlpipeline.pipeline.stage_03_feature_engineering

# Stage 4: Data Transformation
python -m src.mlpipeline.pipeline.stage_04_data_transformation

# Stage 5: Model Training
python -m src.mlpipeline.pipeline.stage_05_model_trainer

# Stage 6: Model Evaluation
python -m src.mlpipeline.pipeline.stage_06_model_evaluation

Flask Application

# Start the web interface
python app.py
# Access at: http://localhost:5000

# Production app with monitoring
python production_app.py
# Metrics at: http://localhost:5000/metrics
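
production_app.py is described as exposing Prometheus metrics at /metrics. The sketch below shows one common way to wire that up in Flask with the prometheus_client library; the route and metric names are assumptions, not the repository's actual code.

# Minimal Flask app exporting Prometheus metrics, in the spirit of production_app.py.
# Metric and route names are assumptions.
from flask import Flask, jsonify
from prometheus_client import CONTENT_TYPE_LATEST, Counter, Histogram, generate_latest

app = Flask(__name__)

PREDICTIONS = Counter("ml_predictions_total", "Total prediction requests served")
LATENCY = Histogram("ml_prediction_latency_seconds", "Prediction request latency")

@app.route("/predict", methods=["POST"])
def predict():
    with LATENCY.time():
        PREDICTIONS.inc()
        # The real app would call the trained model here.
        return jsonify({"prediction": 1, "confidence": 0.92})

@app.route("/metrics")
def metrics():
    return generate_latest(), 200, {"Content-Type": CONTENT_TYPE_LATEST}

if __name__ == "__main__":
    app.run(port=5000)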

📊 Monitoring & Observability

Start Monitoring Stack

cd observability
docker compose up -d

# Access monitoring tools:
# Prometheus: http://localhost:9090
# Grafana: http://localhost:3000
# Kibana: http://localhost:5601

Available Metrics

  • Model Performance: accuracy, precision, recall, F1-score
  • Prediction Analytics: confidence scores, class distribution
  • System Health: error rates, response times, resource usage
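
Once Prometheus is scraping the app, any of these series can also be read programmatically through its HTTP API. A small sketch, assuming an illustrative metric name (ml_predictions_total is not confirmed by this README):

# Query Prometheus's HTTP API for a single metric.
# The metric name is an assumption used for illustration.
import requests

resp = requests.get(
    "http://localhost:9090/api/v1/query",
    params={"query": "ml_predictions_total"},
)
for result in resp.json()["data"]["result"]:
    print(result["metric"].get("__name__"), "=", result["value"][1])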

📖 For detailed observability setup and configuration, see Observability.md


☸️ Kubernetes Deployment

# Start cluster
minikube start

# Deploy application
kubectl apply -f k8s/deployment.yaml
kubectl apply -f k8s/service.yaml

# Access application
kubectl port-forward svc/mlapp-service 8000:80
# Navigate to: http://localhost:8000

🔧 Apache Airflow Pipeline

# Set up Airflow (Linux/WSL)
export AIRFLOW_HOME=~/airflow
cp model_dag.py ~/airflow/dags/

# Start Airflow
airflow standalone

# Test DAG file
python ~/airflow/dags/model_dag.py

# Access UI: http://localhost:8080
# Trigger: ml_pipeline_dag
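
model_dag.py defines the ml_pipeline_dag referenced above. The sketch below shows a simplified shape such a DAG can take; the single-task breakdown, schedule, and paths are assumptions, not the repository's actual DAG.

# Simplified sketch of an Airflow DAG like model_dag.py.
# The dag_id matches the one named above; tasks, schedule, and paths are assumptions.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="ml_pipeline_dag",
    start_date=datetime(2024, 1, 1),
    schedule=None,   # triggered manually from the UI
    catchup=False,
) as dag:
    run_pipeline = BashOperator(
        task_id="run_pipeline",
        bash_command="cd /path/to/MLOps_PipeLine && dvc repro",  # placeholder path
    )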

📈 Model Performance

  • Algorithm: XGBoost Classifier
  • Accuracy: 92.15%
  • Precision: 91.47%
  • Recall: 92.00%
  • F1-Score: 91.64%
  • AUC: 94.04%

πŸ“ Project Structure

MLOps_PipeLine/
├── src/mlpipeline/          # Core ML pipeline components and stages
├── config/                  # Configuration files for pipeline settings
├── artifacts/               # Generated model artifacts and data (DVC tracked)
├── k8s/                     # Kubernetes deployment manifests
├── observability/           # Complete monitoring stack with Prometheus, Grafana
├── .github/workflows/       # CI/CD automation pipelines
├── dvc.yaml                 # DVC pipeline definition and stages
├── Dockerfile               # Container definition for deployment
├── model_dag.py             # Apache Airflow DAG for pipeline orchestration
├── app.py                   # Basic Flask web application
├── production_app.py        # Production Flask app with monitoring
└── main.py                  # Main pipeline execution script

🎯 Key Achievements

✅ End-to-End Automation: From data ingestion to model deployment
✅ Scalable Infrastructure: Kubernetes orchestration with monitoring
✅ Quality Assurance: Automated testing, validation, and security scanning
✅ Observability: Comprehensive metrics, logging, and tracing
✅ Continuous Integration: GitHub Actions for automated workflows
✅ Model Governance: Version control, experiment tracking, performance monitoring


⭐ Star this repository if you found it helpful!
