🚀 Nitro Project - Industrial Predictive Monitoring System

A real-time industrial predictive monitoring platform: ingest, process, analyze, and visualize sensor data with Kafka, Spark, Airflow, and ML.

📋 Table of Contents

  1. Features
  2. System Architecture
  3. Services and Ports
  4. Quick Start
  5. Project Structure
  6. Analysis Notebooks
  7. Data Flow
  8. Implemented Technologies
  9. Maintenance
  10. Troubleshooting

✨ Features

  • 📊 Real-time data ingestion with Kafka Producer
  • ⚡ Distributed processing with Spark Streaming
  • 🔧 Intelligent orchestration with Apache Airflow
  • 💾 Scalable storage in PostgreSQL and MinIO
  • 🤖 Advanced MLOps with MLflow and SHAP
  • 📈 Professional visualization with Streamlit and Grafana
  • 📓 Comprehensive analysis with specialized notebooks
  • 🔍 Real-time monitoring with interactive dashboards

πŸ—οΈ System Architecture

graph LR
A[Kafka Producer] --> B[Kafka Cluster]
B --> C[Spark Processor]
C --> D[(PostgreSQL)]
C --> E[MinIO Storage]
D --> F[FastAPI ML Service]
E --> F
F --> G[Streamlit Dashboard]
F --> H[Grafana Monitoring]
D --> H

🌐 Services and Ports

| Service | URL | Port | Credentials | Status |
|---------|-----|------|-------------|--------|
| 🔧 Airflow | http://localhost:8080 | 8080 | admin/admin | ✅ Operational |
| ⚡ Spark Master | http://localhost:8081 | 8081 | - | ✅ Operational |
| 📊 Streamlit Dashboard | http://localhost:8501 | 8501 | - | ✅ Operational |
| 📈 Grafana | http://localhost:3000 | 3000 | admin/admin123 | ✅ Operational |
| 💾 MinIO Console | http://localhost:9001 | 9001 | admin/admin12345 | ✅ Operational |
| 🗄️ PostgreSQL | localhost:5432 | 5432 | nitro_user/nitro_pass | ✅ Operational |
| 🚀 FastAPI | http://localhost:8000 | 8000 | - | ✅ Operational |
| 📡 Kafka | localhost:9092 | 9092/29092 | - | ✅ Operational |
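
To confirm that the web UIs listed above are reachable after startup, a quick check like the one below can save some clicking. It is a minimal sketch using only the Python standard library; the URLs mirror the table, so adjust them if you change any ports.

# check_services.py - ping each web UI from the host (standard library only)
from urllib.request import urlopen
from urllib.error import URLError

SERVICES = {
    "Airflow": "http://localhost:8080",
    "Spark Master": "http://localhost:8081",
    "Streamlit": "http://localhost:8501",
    "Grafana": "http://localhost:3000",
    "MinIO Console": "http://localhost:9001",
    "FastAPI": "http://localhost:8000",
}

for name, url in SERVICES.items():
    try:
        with urlopen(url, timeout=5) as resp:
            print(f"{name:15} {url:30} HTTP {resp.status}")
    except URLError as exc:
        print(f"{name:15} {url:30} unreachable ({exc.reason})")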

🚀 Quick Start

# Clone the repository
git clone <your-repository>
cd proyecto-nitro

# Start all services
docker-compose up -d

# Check service status
docker-compose ps

# Start the data producer
./start-producer.sh

# Access dashboards (wait 2-3 minutes for complete initialization)
echo "Access URLs:"
echo "Airflow: http://localhost:8080"
echo "Grafana: http://localhost:3000"
echo "Streamlit: http://localhost:8501"

πŸ“ Project Structure

proyecto-nitro/
├── 📊 airflow/                 # Airflow DAGs and configuration
├── 🚀 api-dashboard/           # FastAPI and Streamlit
│   ├── fastapi/                # ML prediction API
│   └── dashboards/             # Interactive dashboards
├── 📡 kafka-producer/          # Kafka data producer
│   ├── Dockerfile
│   ├── kafka_producer.py
│   └── requirements.txt
├── 🗄️ minio-setup/             # MinIO bucket configuration
├── 📓 notebooks/               # Analysis and modeling
│   ├── 📊 EDA.ipynb
│   ├── ⚙️ feature_engineering.ipynb
│   ├── 🤖 model_training.ipynb
│   ├── 🔍 mlflow_tracking.ipynb
│   ├── 📈 SHAP_analysis.ipynb
│   ├── 📋 reports/
│   ├── 🧠 models/
│   └── 💾 data/
│       └── enhanced_predictions.csv
├── 🗃️ postgres-setup/          # PostgreSQL schemas and config
├── ⚡ python-processor/        # Spark data processor
├── 🔥 spark-processing/        # Spark jobs
├── 🐳 docker-compose.yml       # Container orchestration
├── 🚀 start-producer.sh        # Startup script
└── 📖 README.md                # This file

📓 Analysis Notebooks

| Notebook | Description | Technologies |
|----------|-------------|--------------|
| 📊 EDA.ipynb | Exploratory data analysis | Pandas, Matplotlib, Seaborn |
| ⚙️ feature_engineering.ipynb | Feature engineering | Scikit-learn, Featuretools |
| 🤖 model_training.ipynb | Predictive model training | Scikit-learn, XGBoost, MLflow |
| 🔍 mlflow_tracking.ipynb | ML experiment tracking | MLflow, Hyperopt |
| 📈 SHAP_analysis.ipynb | Model explainability | SHAP, Matplotlib |
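
For orientation, here is a minimal training-and-tracking sketch in the spirit of model_training.ipynb and mlflow_tracking.ipynb. It assumes scikit-learn and MLflow are installed; the RandomForest model, the "failure" target column, and the metric choice are illustrative assumptions rather than the notebooks' exact pipeline.

# Sketch only: swap in the real features, target, and model used in the notebooks.
import mlflow
import mlflow.sklearn
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("notebooks/data/enhanced_predictions.csv")   # produced by the pipeline
X = df.drop(columns=["failure"])                               # "failure" target is an assumption
y = df["failure"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

with mlflow.start_run(run_name="rf_baseline"):
    model = RandomForestClassifier(n_estimators=200, random_state=42)
    model.fit(X_train, y_train)
    score = f1_score(y_test, model.predict(X_test))
    mlflow.log_param("n_estimators", 200)
    mlflow.log_metric("f1", score)
    mlflow.sklearn.log_model(model, "model")                   # log the fitted model as a run artifact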

🔄 Data Flow

  1. Ingestion: Kafka Producer generates simulated industrial sensor data (a minimal producer sketch follows this list)
  2. Streaming: Kafka publishes to sensor_topic with 1 partition
  3. Processing: Spark processes data in real-time with transformations
  4. Storage: Data persisted in PostgreSQL (structured) and MinIO (raw)
  5. Analysis: Specialized notebooks for EDA and predictive modeling
  6. Visualization: Real-time dashboards with Streamlit and Grafana
  7. Prediction: FastAPI serves ML models for predictive maintenance
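
As a rough illustration of step 1, the sketch below publishes simulated readings to sensor_topic with the kafka-python client over the external listener (localhost:9092). The field names are assumptions for illustration; kafka-producer/kafka_producer.py defines the actual schema.

# Sketch of a simulated-sensor producer (kafka-python assumed)
import json
import random
import time
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

while True:
    reading = {
        "sensor_id": random.randint(1, 10),                # hypothetical field names
        "temperature": round(random.uniform(20.0, 90.0), 2),
        "vibration": round(random.uniform(0.1, 5.0), 3),
        "timestamp": time.time(),
    }
    producer.send("sensor_topic", value=reading)           # topic name from step 2 above
    time.sleep(1)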

πŸ› οΈ Implemented Technologies

πŸ—οΈ Data Engineering

Apache Airflow Apache Kafka Apache Spark PostgreSQL MinIO

πŸ€– Machine Learning

Python Scikit-learn MLflow SHAP

πŸ“Š Visualization & Monitoring

Grafana Streamlit FastAPI

🐳 Infrastructure

Docker Docker Compose

πŸ› οΈ Maintenance

Useful Commands

# View logs for all services
docker-compose logs

# View specific service logs
docker-compose logs kafka
docker-compose logs spark-master

# Restart a specific service
docker-compose restart kafka-producer

# Check container status
docker-compose ps

# Access PostgreSQL
docker-compose exec postgres psql -U nitro_user -d nitro_db

# List Kafka topics
docker-compose exec kafka kafka-topics --list --bootstrap-server localhost:9092

# Scale Spark workers
docker-compose up -d --scale spark-worker=3

πŸ› Troubleshooting

Common Issues and Solutions

| Issue | Solution |
|-------|----------|
| Kafka won't start | Check Zookeeper health: docker-compose logs zookeeper |
| Producer can't connect | Wait 30-60 seconds for Kafka to finish initializing |
| PostgreSQL connection refused | Check logs: docker-compose logs postgres |
| Dashboards won't load | Wait 2-3 minutes and verify all services are up |
| Airflow webserver error | Run: docker-compose restart airflow |
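
When the producer looks healthy but nothing shows up downstream, reading the topic directly narrows the problem down. This sketch assumes the kafka-python client and the external listener on localhost:9092.

# Consume a single message from sensor_topic, or time out after 10 seconds
import json
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "sensor_topic",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    consumer_timeout_ms=10_000,
)
for message in consumer:
    print("sample message:", message.value)
    break
else:
    print("No messages received - check the producer and Kafka logs.")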

Cleanup and Reinstallation

# Stop and remove all containers and volumes
docker-compose down -v

# Rebuild and start all services
docker-compose up -d --build

# Force container recreation
docker-compose up -d --force-recreate

📊 Grafana Dashboard - Recommended Configuration

To configure Grafana with your PostgreSQL data:

  1. Access http://localhost:3000
  2. Configure the PostgreSQL data source (a quick connection check follows this list):
    • Host: postgres:5432
    • Database: nitro_db
    • User: nitro_user
    • Password: nitro_pass
  3. Import dashboards for:
    • Kafka monitoring (lag, throughput)
    • Spark performance metrics
    • ML model prediction analysis
    • Real-time sensor health status
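
Before importing dashboards, it can be worth confirming from the host that these credentials reach PostgreSQL at all. The sketch below assumes the psycopg2 driver; sensor_readings is a placeholder for whichever table the Spark processor actually writes.

# Connection check with the same credentials Grafana will use
import psycopg2

conn = psycopg2.connect(
    host="localhost", port=5432,            # inside the Docker network, Grafana uses host "postgres"
    dbname="nitro_db", user="nitro_user", password="nitro_pass",
)
with conn, conn.cursor() as cur:
    cur.execute("SELECT COUNT(*) FROM sensor_readings;")    # hypothetical table name
    print("rows available to Grafana:", cur.fetchone()[0])
conn.close()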

πŸ“ License

MIT License - see LICENSE file for details.

🤝 Contributing

Contributions are welcome!

# 1. Fork the project
# 2. Create your feature branch
git checkout -b feature/AmazingFeature

# 3. Commit your changes
git commit -m 'Add some AmazingFeature'

# 4. Push to the branch
git push origin feature/AmazingFeature

# 5. Open a Pull Request

📞 Support

If you encounter issues:

  1. Check the Troubleshooting section
  2. Verify logs with docker-compose logs [service]
  3. Open an issue in the repository with:
    • Detailed description of the problem
    • Commands executed
    • Relevant logs
    • Screenshots (if applicable)

⭐ If you find this project useful, please give it a star on GitHub!
