Skip to content

tanmay271/Train-Predictive-maintenance-using-AI

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

🏭 Predictive Maintenance Pipeline — Air Compressor (MetroPT3)

Production-ready ML pipeline for predicting failures in metropolitan train air compressor systems using advanced time-domain and frequency-domain feature engineering with a Random Forest classifier.

Python scikit-learn NumPy Status


📋 Table of Contents


💼 Business Case

Unplanned downtime in metropolitan rail systems costs thousands of dollars per hour and disrupts passenger services. Air compressor failures in trains are a leading cause of such downtime.

This pipeline implements a Predictive Maintenance (PdM) system that:

  • Continuously monitors 7 critical sensor signals (pressure, temperature, current, etc.)
  • Extracts 182 engineered features per sample (12 time-domain + 14 frequency-domain per sensor + LPS)
  • Trains a Random Forest classifier to detect early fault signatures
  • Achieves high precision & recall on historically documented fault events

Impact: Early fault detection enables scheduled maintenance before catastrophic failure, reducing downtime by up to 70% and maintenance costs by up to 25%.


🏗 Project Architecture

Train-Predictive-maintenance-using-AI/
│
├── config/
│   └── config.yaml              # Central configuration (all hyperparameters)
│
├── data/
│   ├── raw/                     # Raw dataset storage
│   └── processed/               # Feature-engineered outputs
│
├── models/                      # Saved model artifacts (.joblib)
├── figures/                     # Confusion matrix plots
├── notebooks/                   # Original Jupyter notebook (reference)
│
├── src/
│   ├── data/
│   │   └── make_dataset.py      # Data ingestion, parsing, slicing
│   ├── features/
│   │   ├── time_domain.py       # 12 rolling-window statistical features
│   │   ├── freq_domain.py       # 14 FFT spectral features (zero-div fixed)
│   │   └── build_features.py    # Feature engineering orchestrator
│   ├── models/
│   │   ├── train_model.py       # StandardScaler + RandomForest training
│   │   └── predict_model.py     # Model evaluation & metrics logging
│   └── visualization/
│       └── visualize.py         # Confusion matrix heatmaps
│
├── run_pipeline.py              # End-to-end orchestrator (single command)
├── requirements.txt             # Pinned dependencies
└── README.md

Pipeline Flow

flowchart LR
    A[Raw CSV<br/>1.5M rows] --> B[make_dataset.py<br/>Parse & Slice]
    B --> C[build_features.py<br/>Feature Engineering]
    C --> D[train_model.py<br/>Scale & Train RF]
    D --> E[predict_model.py<br/>Evaluate 3 Test Sets]
    E --> F[visualize.py<br/>Confusion Matrices]
    G[config.yaml] -.-> B & C & D & E
Loading

🔬 Feature Engineering

Time-Domain Features (per sensor)

Rolling-window statistics computed over a configurable window (default: 200 samples):

# Feature Formula Description
1 Mean Rolling arithmetic mean
2 Std Rolling standard deviation
3 Variance Rolling variance
4 RMS Root mean square: √(mean(x²))
5 Peak Value Rolling maximum
6 Skewness Third standardized moment
7 Kurtosis Fourth standardized moment
8 Crest Factor Peak / RMS
9 Margin Factor Peak / Variance
10 Impulse Factor Peak /
11 A-Factor Peak / (Std × Variance)
12 B-Factor (Kurtosis × Crest Factor) / Std

Frequency-Domain Features (FFT, per sensor)

Spectral features extracted using FFT on sliding windows:

# Feature Description
1 Spectral Mean Mean of FFT magnitude spectrum
2 Spectral Variance Variance of the spectrum
3 3rd Moment Spectral skewness analog
4 4th Moment Spectral kurtosis analog
5 Grand Frequency Spectral centroid (weighted mean frequency)
6 Spectral Std Frequency spread (bandwidth)
7 C-Factor RMS frequency
8 D-Factor √(Σf⁴·y / Σf²·y)
9 E-Factor Σf²·y / √(Σy · Σf⁴·y)
10 G-Factor Frequency std / Grand frequency
11 Freq 3rd Moment Frequency-weighted skewness
12 Freq 4th Moment Frequency-weighted kurtosis
13 H-Factor Shape factor based on √
14 J-Factor Shape factor variant

Critical Fix: The original code suffered from RuntimeWarning: divide by zero during FFT calculations when the spectrum was flat or all-zero. This has been resolved using epsilon-guarded denominators and np.nan_to_num() cleanup.

Feature Matrix Summary

Component Features/Sensor Sensors Total
Time-domain 12 7 84
Frequency-domain 14 7 98
LPS indicator 1
Total 183

📊 Dataset

Source: MetroPT3 (Air Compressor) Dataset — Real-world sensor data from a metropolitan train's air production unit.

Property Value
Rows 1,516,948
Columns 15 sensor signals
Time Range 2020-02-01 to 2020-09-01
Sampling ~10 second intervals
File dataset.train (1.5 GB)

Sensor Signals: TP2, TP3, H1, DV_pressure, Reservoirs, Oil_temperature, Motor_current, COMP, DV_eletric, Towers, MPG, LPS, Pressure_switch, Oil_level, Caudal_impulses.

Fault Events Used

Dataset Slice Range Fault Window Purpose
Training rows 878,462–912,357 Jun 5–7, 2020 Model training
Test 1 rows 555,000–580,000 Apr 18, 2020 Validation
Test 2 rows 830,000–850,000 May 29–30, 2020 Validation
Test 3 rows 1,164,000–1,176,000 Jul 15, 2020 Validation

🚀 Quick Start

1. Clone & Install

git clone <repository-url>
cd Train-Predictive-maintenance-using-AI

# Create virtual environment (recommended)
python -m venv venv
source venv/bin/activate        # Linux/Mac
venv\Scripts\activate           # Windows

# Install dependencies
pip install -r requirements.txt

2. Place the Dataset

Ensure dataset.train is in the project root directory.

3. Run the Full Pipeline

python run_pipeline.py

This single command will:

  1. ✅ Load and parse the 1.5M-row CSV
  2. ✅ Slice into training + 3 test datasets
  3. ✅ Extract 183 features per sample (time-domain + FFT)
  4. ✅ Train StandardScaler → RandomForest pipeline
  5. ✅ Evaluate on all 3 test sets with Precision/Recall/F1
  6. ✅ Save model to models/ and plots to figures/

Expected Output

STAGE 1: Data Loading & Slicing
  Training dataset: 33,895 samples (fault rate: X.XX%)
  Test set metrotest1: 25,000 samples
  ...

STAGE 4: Model Training
  StandardScaler → RandomForestClassifier(n_estimators=100)
  Model saved → models/random_forest_model.joblib

STAGE 5: Model Evaluation
  ═══════════════════════════════════════════
  EVALUATION SUMMARY
  Test Set         Precision      Recall          F1
  metrotest1          0.XXXX      0.XXXX      0.XXXX
  metrotest2          0.XXXX      0.XXXX      0.XXXX
  metrotest3          0.XXXX      0.XXXX      0.XXXX
  ═══════════════════════════════════════════

⚙ Configuration

All pipeline parameters are centralized in config/config.yaml:

# Data slicing
data:
  training:
    slice_start: 878462
    slice_end: 912357
    fault_start: "2020-06-05 10:00:00"
    fault_end: "2020-06-07 14:30:00"

# Feature engineering
features:
  time_domain:
    window_size: 200
  freq_domain:
    frame_size: 200
    hop_length: 1

# Model hyperparameters
model:
  n_estimators: 100
  random_state: 42

To experiment: simply modify config.yaml — no source code changes required.


📈 Results

The model is evaluated on 3 historically documented fault events using weighted Precision, Recall, and F1 Score. Confusion matrix heatmaps are saved to the figures/ directory.


🔧 Technical Details

Dependencies

Package Purpose
pandas Data manipulation & datetime handling
numpy Numerical computation
scikit-learn ML pipeline, Random Forest, metrics
scipy Signal processing (FFT)
matplotlib Plotting backend
seaborn Statistical visualization
pyyaml Configuration management
joblib Model serialization

Design Principles

  • Separation of Concerns: Each module has a single responsibility
  • Configuration-Driven: All hyperparameters in config.yaml
  • Robust Error Handling: Epsilon-guarded divisions in FFT calculations
  • Structured Logging: Python logging module replaces print statements
  • PEP 8 Compliant: Clean, readable, documented Python code
  • Reproducible: random_state parameter ensures deterministic results

📄 License

This project is for educational and research purposes.


Built with ❤️ for Predictive Maintenance and Industrial AI.

About

Engineered 183 advanced features per sample using Time-Domain rolling-window statistics and Frequency-Domain FFT spectral analysis across 7 IoT sensor signals (1.5M+ data points). Implements a modular, config-driven architecture following MLOps best practices — including robust error handling for zero-division edge cases in FFT computations, struct

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors