Commit 0779a7a (1 parent: ac109d0)

Enhance README.md and dashboard for ML pipeline integration

- Updated README.md to include details about ML pipeline monitoring and analytics, along with new API endpoints for SageMaker ML pipelines.
- Added functionality in the Flask dashboard to display both data and ML pipelines, including detailed views and analytics for ML pipeline executions.
- Improved HTML templates to separate sections for data and ML pipelines, enhancing the user experience with a modern UI.
- Implemented new API calls in app.py to fetch and analyze ML pipeline data, ensuring real-time monitoring capabilities.

File tree: 11 files changed (+573, -64 lines)

README.md (127 additions, 60 deletions)

````diff
@@ -12,10 +12,65 @@ This project implements a data pipeline that analyzes global energy consumption
 - Automatically saves outputs as GitHub Actions artifacts
 - Flask-based monitoring dashboard for pipeline status
 - Custom REST APIs for pipeline monitoring
+- ML pipeline monitoring and analytics
+
+## Dashboard App
+
+The project includes a Flask-based dashboard application that provides real-time monitoring of both data and ML pipelines. The dashboard offers:
+
+1. **Pipeline Overview**:
+   - List of all available data and ML pipelines
+   - Quick status indicators
+   - Creation and last update timestamps
+   - Pipeline tags and labels
+
+2. **Pipeline Details**:
+   - Comprehensive run history
+   - Success rate statistics
+   - Average run time metrics
+   - Error analysis and common failure patterns
+   - Detailed status distribution
+
+3. **Visual Features**:
+   - Modern, responsive UI
+   - Interactive cards with hover effects
+   - Color-coded status indicators
+   - Real-time status updates
+   - Detailed run history table
+
+### Dashboard Screenshots
+
+#### Homepage
+![Dashboard Homepage](assets/dashboard-home.png)
+
+#### Data Pipeline Details
+![Data Pipeline Details](assets/data-pipeline-details.png)
+
+#### ML Pipeline Details
+![ML Pipeline Details](assets/ml-pipeline-details.png)
+
+### Running the Dashboard
+
+1. Navigate to the dashboard directory:
+```bash
+cd dashboard
+```
+
+2. Install dashboard dependencies:
+```bash
+pip install -r requirements.txt
+```
+
+3. Start the Flask application:
+```bash
+python app.py
+```
+
+4. Access the dashboard at http://localhost:5000
 
 ## Custom APIs
 
-The project includes custom REST APIs built with AWS Lambda to interact with the Prefect Cloud pipelines. These APIs provide real-time access to pipeline information and status.
+The project includes custom REST APIs built with AWS Lambda to interact with the Prefect Cloud pipelines and SageMaker ML pipelines. These APIs provide real-time access to pipeline information and status.
 
 ### API Documentation
 
````
````diff
@@ -24,42 +79,67 @@ The complete API documentation is available at:
 
 ### Available Endpoints
 
-1. **Get All Pipelines**
+1. **Get All Data Pipelines**
    ```
    GET https://es3ozkq7i8.execute-api.us-east-1.amazonaws.com/dev/data/pipelines
    ```
-   - Returns a list of all pipelines running in Prefect Cloud
+   - Returns a list of all data pipelines running in Prefect Cloud
    - Response: List of pipeline objects with metadata
 
-2. **Get Pipeline Status**
+2. **Get Data Pipeline Status**
    ```
    GET https://es3ozkq7i8.execute-api.us-east-1.amazonaws.com/dev/data/pipelines/status?id={pipeline_id}
    ```
-   - Returns detailed status information for a specific pipeline
+   - Returns detailed status information for a specific data pipeline
    - Parameters:
      - `id`: The unique identifier of the pipeline
    - Response: Detailed pipeline status including run history and metrics
 
+3. **Get All ML Pipelines**
+   ```
+   GET https://es3ozkq7i8.execute-api.us-east-1.amazonaws.com/dev/ml/pipeline
+   ```
+   - Returns a list of all ML pipelines running in SageMaker
+   - Response: List of ML pipeline objects with metadata
+
+4. **Get ML Pipeline Status**
+   ```
+   GET https://es3ozkq7i8.execute-api.us-east-1.amazonaws.com/dev/ml/pipeline/status?pipeline_id={pipeline_id}
+   ```
+   - Returns detailed status information for a specific ML pipeline
+   - Parameters:
+     - `pipeline_id`: The unique identifier of the ML pipeline
+   - Response: Detailed ML pipeline execution history and metrics
+
 ### API Usage Example
 
 ```python
 import requests
 
-# Get all pipelines
+# Get all data pipelines
 response = requests.get('https://es3ozkq7i8.execute-api.us-east-1.amazonaws.com/dev/data/pipelines')
-pipelines = response.json()
+data_pipelines = response.json()
 
-# Get status of a specific pipeline
+# Get status of a specific data pipeline
 pipeline_id = "your-pipeline-id"
 response = requests.get(f'https://es3ozkq7i8.execute-api.us-east-1.amazonaws.com/dev/data/pipelines/status?id={pipeline_id}')
 status = response.json()
+
+# Get all ML pipelines
+response = requests.get('https://es3ozkq7i8.execute-api.us-east-1.amazonaws.com/dev/ml/pipeline')
+ml_pipelines = response.json()
+
+# Get status of a specific ML pipeline
+ml_pipeline_id = "your-ml-pipeline-id"
+response = requests.get(f'https://es3ozkq7i8.execute-api.us-east-1.amazonaws.com/dev/ml/pipeline/status?pipeline_id={ml_pipeline_id}')
+ml_status = response.json()
 ```
 
 ### API Response Format
 
 The APIs return JSON responses with the following structure:
 
-1. **Pipelines List Response**:
+1. **Data Pipelines List Response**:
 ```json
 [
   {
````
````diff
@@ -73,7 +153,7 @@ The APIs return JSON responses with the following structure:
 ]
 ```
 
-2. **Pipeline Status Response**:
+2. **Data Pipeline Status Response**:
 ```json
 [
   {
````
````diff
@@ -92,60 +172,43 @@ The APIs return JSON responses with the following structure:
 ]
 ```
 
-## Dashboard App
-
-The project includes a Flask-based dashboard application that provides real-time monitoring of pipeline runs and their status. The dashboard offers:
-
-1. **Pipeline Overview**:
-   - List of all available pipelines
-   - Quick status indicators
-   - Creation and last update timestamps
-   - Pipeline tags and labels
-
-2. **Pipeline Details**:
-   - Comprehensive run history
-   - Success rate statistics
-   - Average run time metrics
-   - Error analysis and common failure patterns
-   - Detailed status distribution
-
-3. **Visual Features**:
-   - Modern, responsive UI
-   - Interactive cards with hover effects
-   - Color-coded status indicators
-   - Real-time status updates
-   - Detailed run history table
-
-### Dashboard Screenshots
-
-#### Homepage
-![Dashboard Homepage](assets/dashboard-home.png)
-
-#### Pipeline Details
-![Pipeline Details](assets/flow-details.png)
-
-#### Run History
-![Flow Run History](assets/flow-run-history.png)
-
-### Running the Dashboard
-
-1. Navigate to the dashboard directory:
-```bash
-cd dashboard
-```
-
-2. Install dashboard dependencies:
-```bash
-pip install -r requirements.txt
+3. **ML Pipelines List Response**:
+```json
+[
+  {
+    "PipelineArn": "arn:aws:sagemaker:region:account:pipeline/pipeline-name",
+    "PipelineName": "pipeline-name",
+    "PipelineDisplayName": "Pipeline Display Name",
+    "CreationTime": "timestamp",
+    "LastModifiedTime": "timestamp",
+    "LastExecutionTime": "timestamp"
+  }
+]
 ```
 
-3. Start the Flask application:
-```bash
-python app.py
+4. **ML Pipeline Status Response**:
+```json
+{
+  "PipelineExecutionSummaries": [
+    {
+      "PipelineExecutionArn": "arn:aws:sagemaker:region:account:pipeline/pipeline-name/execution/execution-id",
+      "StartTime": "timestamp",
+      "PipelineExecutionStatus": "Succeeded|Failed|Executing",
+      "PipelineExecutionDisplayName": "Execution Name",
+      "PipelineExecutionDetails": {
+        "PipelineArn": "arn:aws:sagemaker:region:account:pipeline/pipeline-name",
+        "PipelineExecutionStatus": "Succeeded|Failed|Executing",
+        "CreationTime": "timestamp",
+        "LastModifiedTime": "timestamp",
+        "CreatedBy": {
+          "UserProfileName": "user-name"
+        }
+      }
+    }
+  ]
+}
 ```
 
-4. Access the dashboard at http://localhost:5000
-
 ## Output Artifacts
 
 The pipeline generates three main artifacts that are saved in the `output` directory and uploaded as GitHub Actions artifacts:
````
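The ML Pipeline Status Response schema shown above reduces to summary statistics in a few lines. The sketch below runs on a hypothetical sample payload (field names follow the documented schema; the ARNs, timestamps, and statuses are made up):

```python
from collections import Counter

# Hypothetical response following the "ML Pipeline Status Response" schema.
sample = {
    "PipelineExecutionSummaries": [
        {"PipelineExecutionArn": "arn:aws:sagemaker:us-east-1:123:pipeline/demo/execution/1",
         "StartTime": "2025-03-24T10:00:00Z", "PipelineExecutionStatus": "Succeeded"},
        {"PipelineExecutionArn": "arn:aws:sagemaker:us-east-1:123:pipeline/demo/execution/2",
         "StartTime": "2025-03-24T11:00:00Z", "PipelineExecutionStatus": "Failed"},
        {"PipelineExecutionArn": "arn:aws:sagemaker:us-east-1:123:pipeline/demo/execution/3",
         "StartTime": "2025-03-24T12:00:00Z", "PipelineExecutionStatus": "Succeeded"},
    ]
}

runs = sample["PipelineExecutionSummaries"]
status_counts = Counter(r["PipelineExecutionStatus"] for r in runs)
success_rate = status_counts["Succeeded"] / len(runs) * 100
latest = max(runs, key=lambda r: r["StartTime"])  # ISO-8601 strings sort chronologically

print(dict(status_counts), round(success_rate, 2), latest["PipelineExecutionStatus"])
```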
```diff
@@ -248,6 +311,10 @@ To run the pipeline manually:
 - `templates/`: HTML templates for the dashboard
 - `static/`: CSS and JavaScript files
 - `requirements.txt`: Dashboard-specific dependencies
+- `lambdas/`: AWS Lambda functions for custom APIs
+  - `data/`: Data pipeline API functions
+  - `ml/`: ML pipeline API functions
+  - `docs.json`: API documentation
 - `README.md`: This file
 
 ## Data Analysis Features
```

Binary assets:

- assets/dashboard-home.png: added (173 KB)
- assets/data-pipeline-details.png: added (1.89 MB)
- assets/flow-details.png: removed (2.09 MB)
- assets/flow-run-history.png: removed (2.58 MB)
- assets/ml-pipeline-details.png: added (1.18 MB)

dashboard/app.py (67 additions, 3 deletions)

```diff
@@ -14,6 +14,15 @@ def get_pipeline_details(pipeline_id):
     response = requests.get(f'https://es3ozkq7i8.execute-api.us-east-1.amazonaws.com/dev/data/pipeline/status?id={pipeline_id}')
     return response.json()
 
+def get_ml_pipelines():
+    response = requests.get('https://es3ozkq7i8.execute-api.us-east-1.amazonaws.com/dev/ml/pipeline')
+    print("ML Pipelines:", response.json())
+    return response.json()
+
+def get_ml_pipeline_details(pipeline_id):
+    response = requests.get(f'https://es3ozkq7i8.execute-api.us-east-1.amazonaws.com/dev/ml/pipeline/status?pipeline_id={pipeline_id}')
+    return response.json()
+
 def analyze_pipeline_runs(runs):
     # Convert string timestamps to datetime objects
     for run in runs:
```
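The new fetch helpers interpolate the pipeline id directly into the URL string. An equivalent, slightly safer pattern is to build the query string with `urlencode`, which percent-encodes ids containing special characters. This is a sketch of the alternative, not the committed code:

```python
from urllib.parse import urlencode

BASE = "https://es3ozkq7i8.execute-api.us-east-1.amazonaws.com/dev"


def ml_pipeline_status_url(pipeline_id: str) -> str:
    """Build the ML pipeline status URL with a properly escaped query string."""
    return f"{BASE}/ml/pipeline/status?{urlencode({'pipeline_id': pipeline_id})}"


# Characters like '/' or spaces in the id are escaped automatically.
print(ml_pipeline_status_url("my pipeline/1"))
```

With requests, the equivalent is passing `params={'pipeline_id': pipeline_id}` to `requests.get`, which performs the same encoding.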
```diff
@@ -46,12 +55,47 @@ def analyze_pipeline_runs(runs):
         'runs': sorted(runs, key=lambda x: x['created'], reverse=True)
     }
 
-# pipelines : [{"id": "4265d3d9-26cc-42ac-8fb7-b8e786796584", "created": "2025-03-24T14:47:28.193399Z", "updated": "2025-03-24T14:47:28.193419Z", "name": "Global Energy Transition Analysis Pipeline", "tags": [], "labels": {}}, {"id": "32e6109b-56eb-4f79-ae9c-a5513e800495", "created": "2025-03-24T12:55:48.498668Z", "updated": "2025-03-24T12:55:48.498685Z", "name": "Simple Data Pipeline", "tags": [], "labels": {}}]
+def analyze_ml_pipeline_runs(executions):
+    if not executions or 'PipelineExecutionSummaries' not in executions:
+        return {
+            'total_runs': 0,
+            'status_counts': {},
+            'avg_run_time': 0,
+            'success_rate': 0,
+            'latest_run': None,
+            'error_types': {},
+            'runs': []
+        }
+
+    runs = executions['PipelineExecutionSummaries']
+
+    # Calculate statistics
+    total_runs = len(runs)
+    status_counts = Counter(run['PipelineExecutionStatus'] for run in runs)
+    success_rate = (status_counts['Succeeded'] / total_runs * 100) if total_runs > 0 else 0
+
+    # Get latest run
+    latest_run = max(runs, key=lambda x: x['StartTime']) if runs else None
+
+    # Get error types
+    error_types = Counter(run['PipelineExecutionStatus'] for run in runs if run['PipelineExecutionStatus'] not in ['Succeeded'])
+
+    return {
+        'total_runs': total_runs,
+        'status_counts': dict(status_counts),
+        'success_rate': round(success_rate, 2),
+        'latest_run': latest_run,
+        'error_types': dict(error_types),
+        'runs': sorted(runs, key=lambda x: x['StartTime'], reverse=True)
+    }
 
 @app.route('/')
 def home():
-    pipelines = get_pipelines()
-    return render_template('index.html', pipelines=pipelines)
+    data_pipelines = get_pipelines()
+    ml_pipelines = get_ml_pipelines()
+    return render_template('index.html',
+                           data_pipelines=data_pipelines,
+                           ml_pipelines=ml_pipelines)
 
 @app.route('/pipeline/<pipeline_id>')
 def pipeline_details(pipeline_id):
```
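One subtlety in `analyze_ml_pipeline_runs`: it calls `max()` and `sorted()` directly on the `StartTime` values. That is safe as long as `StartTime` arrives as a zero-padded ISO-8601 string in a single timezone, because lexicographic order then equals chronological order. A small demonstration with made-up timestamps:

```python
from datetime import datetime

# Made-up execution summaries in the shape analyze_ml_pipeline_runs expects.
runs = [
    {"PipelineExecutionStatus": "Succeeded", "StartTime": "2025-03-24T09:15:00Z"},
    {"PipelineExecutionStatus": "Failed",    "StartTime": "2025-03-25T08:00:00Z"},
    {"PipelineExecutionStatus": "Succeeded", "StartTime": "2025-03-24T23:59:59Z"},
]

# Zero-padded ISO-8601 strings compare correctly as plain strings,
# which is what max()/sorted() over 'StartTime' relies on.
latest = max(runs, key=lambda r: r["StartTime"])

# Parsing gives the same answer, and stays correct if UTC offsets ever vary.
parsed_latest = max(
    runs,
    key=lambda r: datetime.fromisoformat(r["StartTime"].replace("Z", "+00:00")),
)
assert parsed_latest is latest
print(latest["StartTime"])  # → 2025-03-25T08:00:00Z
```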
```diff
@@ -73,5 +117,25 @@ def pipeline_details(pipeline_id):
                            pipeline=pipeline_info,
                            analysis=analysis)
 
+@app.route('/ml/pipeline/<pipeline_id>')
+def ml_pipeline_details(pipeline_id):
+    executions = get_ml_pipeline_details(pipeline_id)
+    if not executions or 'PipelineExecutionSummaries' not in executions:
+        return "Pipeline not found", 404
+
+    # Get pipeline info from the first execution
+    pipeline_info = {
+        'id': executions['PipelineExecutionSummaries'][0]['PipelineExecutionDetails']['PipelineArn'].split('/')[-1],
+        'name': executions['PipelineExecutionSummaries'][0]['PipelineExecutionDetails']['PipelineArn'].split('/')[-1],
+        'display_name': executions['PipelineExecutionSummaries'][0]['PipelineExecutionDisplayName']
+    }
+
+    # Analyze the executions
+    analysis = analyze_ml_pipeline_runs(executions)
+
+    return render_template('ml_pipeline_details.html',
+                           pipeline=pipeline_info,
+                           analysis=analysis)
+
 if __name__ == '__main__':
     app.run(debug=True)
```
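The new `ml_pipeline_details` route derives both `id` and `name` from the last path segment of the pipeline ARN. With an ARN of the shape documented in the README (the values below are illustrative, not real resources), the extraction looks like:

```python
# Illustrative execution summary following the documented response schema.
summary = {
    "PipelineExecutionDisplayName": "nightly-train",
    "PipelineExecutionDetails": {
        "PipelineArn": "arn:aws:sagemaker:us-east-1:123456789012:pipeline/energy-ml-pipeline"
    },
}

# Same extraction the route performs: the ARN's last '/'-segment is the pipeline name.
pipeline_info = {
    "id": summary["PipelineExecutionDetails"]["PipelineArn"].split("/")[-1],
    "name": summary["PipelineExecutionDetails"]["PipelineArn"].split("/")[-1],
    "display_name": summary["PipelineExecutionDisplayName"],
}
print(pipeline_info["id"])  # → energy-ml-pipeline
```

This only works for pipeline-level ARNs; an execution ARN (`.../execution/execution-id`) would yield the execution id instead, which is why the route reads `PipelineExecutionDetails.PipelineArn` rather than `PipelineExecutionArn`.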
