The goal of this project is to apply everything learned in the course to build an end-to-end machine learning system with a full MLOps workflow.

The result is a sustainable, maintainable stock price prediction system that implements the complete MLOps lifecycle: data collection, feature engineering, model training, experiment tracking, real-time inference, deployment, and monitoring.
Users can query predicted stock prices and historical trend charts through a web interface. Developers can periodically retrain models, track experiments, monitor performance and data drift, and trigger auto-retraining.
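For example, a client could request a prediction like this. This is a minimal sketch assuming the `/api/predict` route shown in the architecture diagram below; the request and response payload shapes are illustrative assumptions, not the project's documented schema:

```python
# Minimal sketch of querying the prediction API.
# The /api/predict path matches the Nginx routing shown below; the
# "ticker" payload field is an assumption for illustration only.
import requests

resp = requests.post(
    "http://localhost/api/predict",
    json={"ticker": "2330.TW"},
    timeout=10,
)
resp.raise_for_status()
print(resp.json())  # e.g., a predicted next-day closing price
```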
| Category | Tools & Frameworks |
|---|---|
| Cloud / Infra | Docker Compose (extendable to EC2), MinIO, PostgreSQL, ClickHouse |
| ML Pipeline | FastAPI, Scikit-learn, Pandas, MLflow |
| Workflow Orchestration | Prefect 2 |
| Monitoring | Evidently + Prometheus + Grafana |
| CI/CD | GitHub Actions |
| Testing | pytest (unit + integration tests) |
| Formatting / Hooks | black, pre-commit, flake8 |
| IaC | Docker Compose + Volume + Network (extendable to Terraform) |
```
.
├── backend/                    # Backend with API, ML logic, workflows
│   ├── api/                    # FastAPI routes (train, predict)
│   ├── src/                    # Feature engineering, model training/inference
│   ├── monitor/                # Monitoring logic using Evidently
│   ├── tasks/                  # Celery async tasks
│   ├── workflows/              # Prefect ETL & training flows
│   └── tests/                  # Unit & integration tests
├── frontend/                   # Frontend (Vite + React)
├── data/, db/, pgdata/         # Data and DB initialization folders
├── monitor/                    # Prometheus & Grafana configurations
├── Dockerfile.*, docker-compose.yml
├── Makefile, setup.md, implementation_log.md
├── .github/                    # GitHub Actions configuration
│   └── workflows/              # GitHub Actions CI/CD workflow
├── .pre-commit-config.yaml     # Pre-commit configuration
└── README.md
```
- ETL and training pipelines are triggered on a schedule via Prefect
- Training results are logged to MLflow and registered as versioned models (see the flow sketch after this list)
- FastAPI serves `/predict` and `/train` APIs (Celery-supported)
- Evidently exports model drift metrics to Prometheus
- Grafana dashboards visualize prediction accuracy, drift metrics, and system metrics
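As referenced above, the sketch below shows how a Prefect flow could train a model and register a new version in MLflow. The feature columns, the `LinearRegression` model, and the `stock-predictor` registry name are illustrative assumptions, not the project's actual training code:

```python
# Minimal sketch: a Prefect 2 flow that trains a model and logs it to MLflow.
# Assumes an MLflow tracking server is reachable (e.g., via MLFLOW_TRACKING_URI).
import mlflow
import mlflow.sklearn
import pandas as pd
from prefect import flow, task
from sklearn.linear_model import LinearRegression


@task
def load_features() -> pd.DataFrame:
    # Placeholder data; the real flow would query cleaned data from ClickHouse.
    return pd.DataFrame({"lag_1": [1.0, 2.0, 3.0], "close": [1.1, 2.1, 3.1]})


@task
def train_and_log(df: pd.DataFrame) -> None:
    X, y = df[["lag_1"]], df["close"]
    with mlflow.start_run():
        model = LinearRegression().fit(X, y)
        mlflow.log_metric("train_r2", model.score(X, y))
        # Registering under a name creates a new model version on each run.
        mlflow.sklearn.log_model(
            model, "model", registered_model_name="stock-predictor"
        )


@flow
def training_flow() -> None:
    train_and_log(load_features())


if __name__ == "__main__":
    training_flow()
```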
```mermaid
graph TD
%% ------------------- User / Frontend -------------------
U[User Browser] -->|HTTP/WS Requests| NG[Nginx<br>Static + Reverse Proxy]
subgraph Nginx_Proxy["Nginx Proxy"]
NG -->|/api/predict| UP1
NG -->|/api/train| UP2
NG -->|/api/| UP3
NG -->|/ws| W
NG -->|Static files<br>/index.html, /js, /css...| Static[React Build]
end
%% ------------------- Upstream Pools -------------------
subgraph Upstream_Pools["Upstream Pools"]
direction TB
UP1["backend_predict<br>70% to backend1<br>30% to backend2"]
UP2["backend_train<br>30% to backend1<br>70% to backend2"]
UP3["backend_api<br>1:1 to backend1, backend2"]
end
%% ------------------- Backend Containers -------------------
subgraph Backend_API["Backend API multiple containers"]
B1[backend1:8000]
B2[backend2:8000]
end
UP1 --> B1
UP1 --> B2
UP2 --> B1
UP2 --> B2
UP3 --> B1
UP3 --> B2
%% ------------------- Data / ETL -------------------
subgraph Data_ETL["Data and ETL"]
P[Prefect Workflow<br>backend/src/workflows] -->|ETL processing| D1[(raw_db<br>PostgreSQL)]
P -->|Cleaned data| D2[(OLAP<br>ClickHouse)]
end
B1 -->|Query cleaned data| D2
B2 -->|Query cleaned data| D2
B1 -->|Push task| E[Redis]
B2 -->|Push task| E
%% ------------------- Model Training -------------------
subgraph Model_Training["Model Training & MLflow"]
L[Celery Worker] -->|Read cleaned data| D2
L -->|Execute training| G[Model training logic]
G -->|Model version management| H[MLflow Registry]
G -->|Update model metadata| D3[(mlflow-db<br>PostgreSQL)]
H -->|Model Artifact| S[(MinIO<br>Model storage)]
H --> D4[(mlflow internal DB<br>PostgreSQL)]
end
%% ------------------- Monitoring -------------------
subgraph Monitoring["Monitoring & Real-time Push"]
W[ws_monitor<br>Kafka Consumer + WebSocket]
Q[metrics_publisher<br>Fetch & send to Kafka every 5s]
N1[Kafka - prediction topic] -->|Prediction result| W
N2[Kafka - metrics topic]
Q --> N2
N2 -->|Metrics| W
J[Prometheus]
J -->|Historical data| K[Grafana Dashboard]
end
%% ------------------- Async Queue -------------------
subgraph Async_Tasks["Async Task Queue"]
E --> |Execute| L
end
%% ------------------- Styles -------------------
classDef frontend fill:#FFD966,stroke:#333,stroke-width:2px;
classDef nginx fill:#FFB347,stroke:#333,stroke-width:2px;
classDef upstream fill:#85C1E9,stroke:#333,stroke-width:2px;
classDef backend fill:#ABEBC6,stroke:#333,stroke-width:2px;
classDef db fill:#F9E79F,stroke:#333,stroke-width:2px;
classDef cache fill:#F5B7B1,stroke:#333,stroke-width:2px;
classDef mlflow fill:#D7BDE2,stroke:#333,stroke-width:2px;
classDef monitoring fill:#FAD7A0,stroke:#333,stroke-width:2px;
classDef prom fill:#D5F5E3,stroke:#333,stroke-width:2px;
class U frontend
class NG,Static nginx
class UP1,UP2,UP3 upstream
class B1,B2 backend
class D1,D2,D3,D4,S db
class E,L cache
class G,H mlflow
class W,Q,N1,N2 monitoring
class J,K prom
```
- Visual diagram of the Docker Compose services
```mermaid
graph TD
subgraph Users
A[Browser]
end
subgraph Frontend
B[Vite + React]
end
subgraph Backend
C[FastAPI API]
D[Model Training / Inference]
E[Celery Worker]
F[Prefect Flows]
end
subgraph Storage
G[PostgreSQL as raw_db]
H[ClickHouse as cleaned data]
I[MinIO as Model Artifacts]
J[MLflow as Tracking DB]
end
subgraph Monitoring
K[Prometheus]
L[Grafana]
M[Evidently]
end
subgraph Messaging
N[Kafka]
O[Redis]
end
subgraph CICD["CI/CD"]
P[GitHub Actions]
end
A --> B
B --> C
C --> D
D --> E
E --> G
E --> H
D --> J
D --> I
F --> G
F --> H
M --> K
K --> L
D --> N
M --> N
E --> O
P -->|CI/CD| C
```
- ✔️ Well-defined scope: stock prediction + model lifecycle
- ✔️ Docker Compose setup with multiple services
- ✔️ IaC-friendly (MinIO, DB volumes, Prometheus)
- ✔️ MLflow for logging experiments and model versioning
- ✔️ Prefect 2 for ETL and training flows
- ✔️ FastAPI for model inference (containerized API)
- ✔️ Evidently + Prometheus + Grafana for data/model monitoring
- ✔️ Makefile + setup.md + requirements + Docker for consistent setup
```bash
make dev-setup
```
- Unit tests (see the pytest sketch after this list)
- Integration tests
- Code formatting (black, flake8)
- Makefile automation
- Pre-commit hooks
- GitHub Actions for CI
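For example, a unit test in the style of `backend/tests/` might exercise a feature-engineering helper. The function under test here, `make_lag_features`, is a hypothetical stand-in, not the project's actual API:

```python
# Hypothetical pytest unit test for a lagged-feature helper.
import pandas as pd


def make_lag_features(close: pd.Series, lags: int = 2) -> pd.DataFrame:
    """Build lagged-price features for next-day prediction."""
    return pd.DataFrame(
        {f"lag_{i}": close.shift(i) for i in range(1, lags + 1)}
    ).dropna()


def test_make_lag_features_drops_incomplete_rows():
    close = pd.Series([10.0, 11.0, 12.0, 13.0])
    features = make_lag_features(close, lags=2)
    assert list(features.columns) == ["lag_1", "lag_2"]
    # The first two rows lack a complete lag window and are dropped.
    assert len(features) == 2
```

Running `pytest` over such tests in GitHub Actions keeps the checks in the CI loop.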
```bash
# Create virtual environment
python -m venv .venv
source .venv/bin/activate
pip install -r backend/requirements.txt

# Start all services
docker compose up --build

# Run Prefect workflow or one-off training
make train
make workflow
```

Historical stock data from TW & US markets (e.g., 2330.TW, AAPL, TSM):
- Source: Yahoo Finance
- Transformed via ETL and stored in Parquet format (see `workflows/parquet/`); a minimal fetch sketch follows below
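The sketch below shows what the extract step could look like, assuming the `yfinance` package is used to pull Yahoo Finance data; the output path mirrors `workflows/parquet/`, but the filename is illustrative:

```python
# Sketch of fetching historical prices and storing them as Parquet.
# Assumes the yfinance package; writing Parquet requires pyarrow or fastparquet.
import yfinance as yf


def fetch_to_parquet(ticker: str, path: str) -> None:
    # yf.download returns an OHLCV DataFrame indexed by date.
    df = yf.download(ticker, start="2020-01-01", auto_adjust=True)
    df.to_parquet(path)


fetch_to_parquet("2330.TW", "workflows/parquet/2330_TW.parquet")
```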
MIT License.