- Features
- System Architecture
- Services and Ports
- Quick Start
- Project Structure
- Analysis Notebooks
- Data Flow
- Implemented Technologies
- Maintenance
- Troubleshooting
## Features

- 🚀 Real-time data ingestion with Kafka Producer
- ⚡ Distributed processing with Spark Streaming
- 🔧 Intelligent orchestration with Apache Airflow
- 💾 Scalable storage in PostgreSQL and MinIO
- 🤖 Advanced MLOps with MLflow and SHAP
- 📊 Professional visualization with Streamlit and Grafana
- 📓 Comprehensive analysis with specialized notebooks
- 📈 Real-time monitoring with interactive dashboards
## System Architecture

```mermaid
graph LR
    A[Kafka Producer] --> B[Kafka Cluster]
    B --> C[Spark Processor]
    C --> D[(PostgreSQL)]
    C --> E[MinIO Storage]
    D --> F[FastAPI ML Service]
    E --> F
    F --> G[Streamlit Dashboard]
    F --> H[Grafana Monitoring]
    D --> H
```
## Services and Ports

| Service | URL | Port | Credentials | Status |
|---|---|---|---|---|
| 🔧 Airflow | http://localhost:8080 | 8080 | admin/admin | ✅ Operational |
| ⚡ Spark Master | http://localhost:8081 | 8081 | - | ✅ Operational |
| 📊 Streamlit Dashboard | http://localhost:8501 | 8501 | - | ✅ Operational |
| 📈 Grafana | http://localhost:3000 | 3000 | admin/admin123 | ✅ Operational |
| 💾 MinIO Console | http://localhost:9001 | 9001 | admin/admin12345 | ✅ Operational |
| 🗄️ PostgreSQL | localhost:5432 | 5432 | nitro_user/nitro_pass | ✅ Operational |
| 🚀 FastAPI | http://localhost:8000 | 8000 | - | ✅ Operational |
| 📡 Kafka | localhost:9092 | 9092/29092 | - | ✅ Operational |
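
Once the stack is up, a quick way to confirm that the HTTP services above are answering is a short probe script. This is a minimal sketch using the `requests` library against the URLs from the table; it only checks that each port responds, not that the service has finished initializing.

```python
# probe_services.py - ping each HTTP service listed in the table above.
import requests

SERVICES = {
    "Airflow": "http://localhost:8080",
    "Spark Master": "http://localhost:8081",
    "Streamlit": "http://localhost:8501",
    "Grafana": "http://localhost:3000",
    "MinIO Console": "http://localhost:9001",
    "FastAPI": "http://localhost:8000",
}

for name, url in SERVICES.items():
    try:
        status = requests.get(url, timeout=5).status_code
        print(f"{name}: HTTP {status}")
    except requests.RequestException as exc:
        print(f"{name}: unreachable ({exc})")
```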
## Quick Start

```bash
# Clone the repository
git clone <your-repository>
cd proyecto-nitro

# Start all services
docker-compose up -d

# Check service status
docker-compose ps

# Start the data producer
./start-producer.sh

# Access dashboards (wait 2-3 minutes for complete initialization)
echo "Access URLs:"
echo "Airflow: http://localhost:8080"
echo "Grafana: http://localhost:3000"
echo "Streamlit: http://localhost:8501"
```

## Project Structure

```
proyecto-nitro/
├── 📁 airflow/              # Airflow DAGs and configuration
├── 📁 api-dashboard/        # FastAPI and Streamlit
│   ├── fastapi/             # ML prediction API
│   └── dashboards/          # Interactive dashboards
├── 📡 kafka-producer/       # Kafka data producer
│   ├── Dockerfile
│   ├── kafka_producer.py
│   └── requirements.txt
├── 🗄️ minio-setup/          # MinIO bucket configuration
├── 📓 notebooks/            # Analysis and modeling
│   ├── 📊 EDA.ipynb
│   ├── ⚙️ feature_engineering.ipynb
│   ├── 🤖 model_training.ipynb
│   ├── 📈 mlflow_tracking.ipynb
│   ├── 📉 SHAP_analysis.ipynb
│   ├── 📁 reports/
│   ├── 🧠 models/
│   └── 💾 data/
│       └── enhanced_predictions.csv
├── 🗄️ postgres-setup/       # PostgreSQL schemas and config
├── ⚡ python-processor/     # Spark data processor
├── 🔥 spark-processing/     # Spark jobs
├── 🐳 docker-compose.yml    # Container orchestration
├── 🚀 start-producer.sh     # Startup script
└── 📖 README.md             # This file
```
## Analysis Notebooks

| Notebook | Description | Technologies |
|---|---|---|
| 📊 EDA.ipynb | Exploratory Data Analysis | Pandas, Matplotlib, Seaborn |
| ⚙️ feature_engineering.ipynb | Feature engineering | Scikit-learn, Featuretools |
| 🤖 model_training.ipynb | Predictive model training | Scikit-learn, XGBoost, MLflow |
| 📈 mlflow_tracking.ipynb | ML experiment tracking | MLflow, Hyperopt |
| 📉 SHAP_analysis.ipynb | Model explainability | SHAP, Matplotlib |
## Data Flow

- Ingestion: Kafka Producer generates simulated industrial sensor data (see the producer sketch after this list)
- Streaming: Kafka publishes to `sensor_topic` with 1 partition
- Processing: Spark processes data in real time with transformations
- Storage: Data persisted in PostgreSQL (structured) and MinIO (raw)
- Analysis: Specialized notebooks for EDA and predictive modeling
- Visualization: Real-time dashboards with Streamlit and Grafana
- Prediction: FastAPI serves ML models for predictive maintenance
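
To make the ingestion step concrete, here is a minimal producer sketch that publishes simulated sensor readings to `sensor_topic`, assuming the `kafka-python` client. The field names are illustrative; see `kafka-producer/kafka_producer.py` for the actual schema.

```python
# Minimal sketch of the ingestion step; field names are hypothetical.
import json
import random
import time

from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

while True:
    reading = {
        "sensor_id": random.randint(1, 10),                    # hypothetical fields
        "temperature": round(random.uniform(20.0, 95.0), 2),
        "timestamp": time.time(),
    }
    producer.send("sensor_topic", reading)  # topic name from the data flow above
    time.sleep(1)
```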
## Maintenance

```bash
# View logs for all services
docker-compose logs

# View specific service logs
docker-compose logs kafka
docker-compose logs spark-master

# Restart a specific service
docker-compose restart kafka-producer

# Check container status
docker-compose ps

# Access PostgreSQL
docker-compose exec postgres psql -U nitro_user -d nitro_db

# List Kafka topics
docker-compose exec kafka kafka-topics --list --bootstrap-server localhost:9092

# Scale Spark workers
docker-compose up -d --scale spark-worker=3
```

## Troubleshooting

| Issue | Solution |
|---|---|
| Kafka won't start | Check Zookeeper health: `docker-compose logs zookeeper` |
| Producer can't connect | Wait 30-60 seconds for Kafka to finish initializing |
| PostgreSQL connection refused | Check logs: `docker-compose logs postgres` |
| Dashboards won't load | Wait 2-3 minutes and verify all services are UP |
| Airflow webserver error | Run: `docker-compose restart airflow` |
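
For the "Producer can't connect" case, a quick probe from Python can tell you whether the broker is actually accepting connections yet. A minimal sketch, assuming the `kafka-python` client:

```python
# Ask the broker for its topic list to verify Kafka is reachable.
from kafka import KafkaConsumer
from kafka.errors import NoBrokersAvailable

try:
    topics = KafkaConsumer(bootstrap_servers="localhost:9092").topics()
    print("Broker reachable, topics:", sorted(topics))
except NoBrokersAvailable:
    print("Broker not reachable yet - wait for Kafka to finish initializing")
```

If problems persist, reset the stack: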
```bash
# Stop and remove all containers and volumes
docker-compose down -v

# Rebuild and start all services
docker-compose up -d --build

# Force container recreation
docker-compose up -d --force-recreate
```

To configure Grafana with your PostgreSQL data:
- Access http://localhost:3000
- Configure the PostgreSQL data source:
  - Host: `postgres:5432`
  - Database: `nitro_db`
  - User: `nitro_user`
  - Password: `nitro_pass`
- Import dashboards for:
  - Kafka monitoring (lag, throughput)
  - Spark performance metrics
  - ML model prediction analysis
  - Real-time sensor health status
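
Before wiring up panels, it can help to confirm that sensor data has actually landed in PostgreSQL. The check below is a sketch using the `psycopg2` driver and the credentials above; the `sensor_data` table name is a hypothetical placeholder, so adjust it to the schema defined in `postgres-setup/`.

```python
# Sanity check: count rows available for Grafana dashboards.
import psycopg2

conn = psycopg2.connect(
    host="localhost", port=5432,
    dbname="nitro_db", user="nitro_user", password="nitro_pass",
)
with conn, conn.cursor() as cur:
    cur.execute("SELECT COUNT(*) FROM sensor_data;")  # hypothetical table name
    print("rows available for dashboards:", cur.fetchone()[0])
conn.close()
```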
## License

MIT License - see the LICENSE file for details.
## Contributing

Contributions are welcome!

```bash
# 1. Fork the project

# 2. Create your feature branch
git checkout -b feature/AmazingFeature

# 3. Commit your changes
git commit -m 'Add some AmazingFeature'

# 4. Push to the branch
git push origin feature/AmazingFeature

# 5. Open a Pull Request
```

## Support

If you encounter issues:
- Check the Troubleshooting section
- Verify logs with `docker-compose logs [service]`
- Open an issue in the repository with:
  - Detailed description of the problem
  - Commands executed
  - Relevant logs
  - Screenshots (if applicable)
⭐ If you find this project useful, please give it a star on GitHub!