This repository demonstrates a comprehensive MLOps pipeline built on industry-standard practices for end-to-end machine learning workflow automation. The project implements a production-ready ML system with automated training, validation, deployment, and monitoring.
- 6-Stage DVC Pipeline: Data ingestion → Validation → Feature engineering → Transformation → Training → Evaluation
- XGBoost Model: Achieved 92.15% accuracy with automated hyperparameter tuning
- MLflow Integration: Experiment tracking and model registry for version control
- Production Monitoring: 15+ ML metrics with Prometheus, Grafana dashboards, and health endpoints
- Container Orchestration: Docker containerization with Kubernetes deployment
- CI/CD Automation: GitHub Actions for testing, security scanning, and deployment
- Python 3.11+
- Docker & Docker Compose
- Git & DVC
- Kaggle Account (for data access)
- Clone the repository:

```bash
git clone https://github.com/Abeshith/MLOps_PipeLine.git
cd MLOps_PipeLine
```

- Set up the Python environment:

```bash
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt
```

- Configure Kaggle credentials by creating `~/.kaggle/kaggle.json`:

```json
{
  "username": "your_kaggle_username",
  "key": "your_kaggle_key"
}
```
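The Kaggle CLI warns when the credentials file is readable by other users, so it is worth locking the file down after creating it (standard Kaggle setup, not specific to this repo):

```bash
mkdir -p ~/.kaggle
chmod 600 ~/.kaggle/kaggle.json   # restrict access, as the Kaggle CLI expects
```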
cd "/home/abhes/MlOps PipeLine" && source venv/bin/activate && PYTHONPATH="/home/abhes/MlOps PipeLine/src" python main.py
# Or use DVC
dvc repro# Stage 1: Data Ingestion
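The stage graph lives in dvc.yaml, which is what lets `dvc repro` skip stages whose inputs haven't changed. A hypothetical sketch of how the first two stages might be declared (stage names, deps, and outs here are assumptions; the repository's dvc.yaml is authoritative):

```yaml
stages:
  data_ingestion:
    cmd: python -m src.mlpipeline.pipeline.stage_01_data_ingestion
    outs:
      - artifacts/data_ingestion    # assumed output path
  data_validation:
    cmd: python -m src.mlpipeline.pipeline.stage_02_data_validation
    deps:
      - artifacts/data_ingestion    # re-runs only if ingestion output changed
    outs:
      - artifacts/data_validation
```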
cd "/home/abhes/MlOps PipeLine" && source venv/bin/activate && PYTHONPATH="/home/abhes/MlOps PipeLine/src" python -m src.mlpipeline.pipeline.stage_01_data_ingestion
# Stage 2: Data Validation
cd "/home/abhes/MlOps PipeLine" && source venv/bin/activate && PYTHONPATH="/home/abhes/MlOps PipeLine/src" python -m src.mlpipeline.pipeline.stage_02_data_validation
# Stage 3: Feature Engineering
cd "/home/abhes/MlOps PipeLine" && source venv/bin/activate && PYTHONPATH="/home/abhes/MlOps PipeLine/src" python -m src.mlpipeline.pipeline.stage_03_feature_engineering
# Stage 4: Data Transformation
cd "/home/abhes/MlOps PipeLine" && source venv/bin/activate && PYTHONPATH="/home/abhes/MlOps PipeLine/src" python -m src.mlpipeline.pipeline.stage_04_data_transformation
# Stage 5: Model Training
cd "/home/abhes/MlOps PipeLine" && source venv/bin/activate && PYTHONPATH="/home/abhes/MlOps PipeLine/src" python -m src.mlpipeline.pipeline.stage_05_model_trainer
# Stage 6: Model Evaluation
cd "/home/abhes/MlOps PipeLine" && source venv/bin/activate && PYTHONPATH="/home/abhes/MlOps PipeLine/src" python -m src.mlpipeline.pipeline.stage_06_model_evaluation# Start web interface
cd "/home/abhes/MlOps PipeLine" && source venv/bin/activate && PYTHONPATH="/home/abhes/MlOps PipeLine/src" python app.py
# Access at: http://localhost:5000
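With the app running, predictions can be requested over HTTP. A hypothetical request, assuming a JSON `/predict` endpoint and placeholder feature names (app.py defines the actual route and expected payload):

```bash
curl -X POST http://localhost:5000/predict \
  -H "Content-Type: application/json" \
  -d '{"feature_1": 0.5, "feature_2": 1.2}'   # placeholder features
```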
```bash
# Production app with monitoring
PYTHONPATH="$PWD/src" python production_app.py
# Metrics at: http://localhost:5000/metrics
```
Start the full monitoring stack with Docker Compose:

```bash
cd observability
docker compose up -d

# Access monitoring tools:
# Prometheus: http://localhost:9090
# Grafana: http://localhost:3000
# Kibana: http://localhost:5601
```

- Model Performance: accuracy, precision, recall, F1-score
- Prediction Analytics: confidence scores, class distribution
- System Health: error rates, response times, resource usage
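A minimal sketch of how such metrics can be exposed from Flask with prometheus_client (metric names here are illustrative; production_app.py defines the real set):

```python
from flask import Flask
from prometheus_client import Counter, Histogram, generate_latest, CONTENT_TYPE_LATEST

app = Flask(__name__)

# Illustrative metric names -- production_app.py defines the actual ones
PREDICTIONS = Counter("model_predictions_total", "Predictions served", ["predicted_class"])
LATENCY = Histogram("prediction_latency_seconds", "Prediction latency in seconds")

@app.route("/metrics")
def metrics():
    # Prometheus scrapes this endpoint in the text exposition format
    return generate_latest(), 200, {"Content-Type": CONTENT_TYPE_LATEST}
```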
For detailed observability setup and configuration, see Observability.md.
```bash
# Start cluster
minikube start

# Deploy application
kubectl apply -f k8s/deployment.yaml
kubectl apply -f k8s/service.yaml

# Access application
kubectl port-forward svc/mlapp-service 8000:80
# Navigate to: http://localhost:8000
```
```bash
# Set up Airflow (Linux/WSL)
export AIRFLOW_HOME=~/airflow
cp model_dag.py ~/airflow/dags/

# Start Airflow
airflow standalone

# Test DAG file
python ~/airflow/dags/model_dag.py

# Access UI: http://localhost:8080
# Trigger: ml_pipeline_dag
```
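model_dag.py defines the `ml_pipeline_dag` referenced above. A skeletal sketch of such a DAG, assuming Airflow 2.4+ (the task breakdown and schedule are assumptions; the committed file is authoritative):

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# Skeletal illustration -- model_dag.py in the repo is the real definition
with DAG(
    dag_id="ml_pipeline_dag",
    start_date=datetime(2024, 1, 1),
    schedule=None,      # triggered manually from the UI
    catchup=False,
) as dag:
    run_pipeline = BashOperator(
        task_id="run_pipeline",
        bash_command="cd /path/to/MLOps_PipeLine && python main.py",  # placeholder path
    )
```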
- Algorithm: XGBoost Classifier
- Accuracy: 92.15%
- Precision: 91.47%
- Recall: 92.00%
- F1-Score: 91.64%
- AUC: 94.04%
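Per the feature list, runs like this are tracked with MLflow. A minimal sketch of logging the metrics above (run and metric names are assumptions; the pipeline's evaluation stage performs the real logging):

```python
import mlflow

# Metric values mirror the results listed above; names are illustrative
with mlflow.start_run(run_name="xgboost_eval"):
    mlflow.log_params({"model": "XGBClassifier"})
    mlflow.log_metrics({
        "accuracy": 0.9215,
        "precision": 0.9147,
        "recall": 0.9200,
        "f1_score": 0.9164,
        "auc": 0.9404,
    })
```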
```
MLOps_PipeLine/
├── src/mlpipeline/      # Core ML pipeline components and stages
├── config/              # Configuration files for pipeline settings
├── artifacts/           # Generated model artifacts and data (DVC tracked)
├── k8s/                 # Kubernetes deployment manifests
├── observability/       # Complete monitoring stack with Prometheus, Grafana
├── .github/workflows/   # CI/CD automation pipelines
├── dvc.yaml             # DVC pipeline definition and stages
├── Dockerfile           # Container definition for deployment
├── model_dag.py         # Apache Airflow DAG for pipeline orchestration
├── app.py               # Basic Flask web application
├── production_app.py    # Production Flask app with monitoring
└── main.py              # Main pipeline execution script
```
- End-to-End Automation: From data ingestion to model deployment
- Scalable Infrastructure: Kubernetes orchestration with monitoring
- Quality Assurance: Automated testing, validation, and security scanning
- Observability: Comprehensive metrics, logging, and tracing
- Continuous Integration: GitHub Actions for automated workflows
- Model Governance: Version control, experiment tracking, performance monitoring
⭐ Star this repository if you found it helpful!