Tools: GitHub · Docker · Render · Vercel · GitHub Actions · Apache Airflow

Goal: take a trained ML model, expose it as a production API, automate deployment, then set up a nightly ML training pipeline.
You are working for a fintech company that processes thousands of card transactions per day. The fraud team has trained a machine learning model (XGBoost) capable of scoring each transaction in real time and flagging it as fraudulent or legitimate.
The model takes 5 features as input:
| Feature | Description |
|---|---|
| `amount` | Transaction amount in € |
| `hour` | Hour of the transaction (0 = midnight, 23 = 11 PM) |
| `merchant_category` | Category of the merchant (0–4 = common, 5–9 = high-risk) |
| `distance_from_home` | Distance between the transaction location and the cardholder's home (km) |
| `num_transactions_last_24h` | Number of transactions on this card in the last 24 hours |
The model was trained on historical data and currently runs as a script on a data scientist's laptop. Your job is to take it to production.
Concretely, you will:
- Expose the model as a REST API that any application can call
- Build a Docker image so it runs identically everywhere
- Automate tests and deployment so every code change goes live safely
- Host the backend on Render and a simple UI on Vercel — both for free
- Set up an Airflow pipeline that retrains the model every night on fresh data, with automatic drift detection
The frontend is a simple static web page — a single HTML file with a bit of JavaScript. There is no framework (no React, no Vue), no build step, no npm install. You open the file in a browser and it works.
What it does:
- Displays a form where the user enters the 5 transaction features
- Sends a `POST /predict` request to the backend API when the form is submitted
- Shows the result (fraud / legitimate + probability) returned by the API
How it communicates with the backend:
```
Browser (index.html)
      │
      │  POST /predict  {"amount": 150, "hour": 14, ...}
      ▼
Backend API (FastAPI on Render)
      │
      │  {"is_fraud": false, "fraud_probability": 0.02, ...}
      ▼
Browser displays the result
```
The API URL is configured in `config.js`:

```javascript
const CONFIG = {
  API_URL: "https://YOUR-APP.onrender.com", // ← your Render URL
};
```

This file is the only thing you need to edit in the frontend. Everything else is already written.
Why Vercel? Vercel hosts static files for free and redeploys automatically every time you push to the prod branch — exactly like Render does for the backend. You don't write any Vercel configuration.
You are not evaluated on the frontend. Its only purpose is to give a visual interface to test your production API.
```
Developer
    │
    git push → prod branch
    │
GitHub Actions (CI/CD)
    │
┌─────────┴──────────┐
│   pytest (tests)   │
│   if OK ↓          │
│   Render deploy    │
└─────────┬──────────┘
          │
Backend API (Render)        Frontend (Vercel)
FastAPI + XGBoost     ←──   HTML / JS
/predict  /health           reads from config.js
    ▲
    │ (scheduled – configured in Part 5)
Airflow Training Pipeline
extract → validate → check_drift
            ↓ (if drift)
preprocess → train → evaluate → save
```
```
TP_Seance4/
├── backend/
│   ├── main.py                       ← FastAPI app [PROVIDED]
│   ├── requirements.txt              ← Python dependencies [PROVIDED]
│   ├── model/
│   │   └── train.py                  ← Training script [PROVIDED]
│   └── tests/
│       └── test_api.py               ← Deployment tests [PROVIDED]
│
├── frontend/
│   ├── index.html                    ← UI [PROVIDED]
│   └── config.js                     ← API URL config [edit this]
│
├── airflow/
│   ├── dags/
│   │   └── fraud_retrain_dag.py      ← Airflow DAG [PROVIDED]
│   ├── Dockerfile                    ← Airflow image [PROVIDED]
│   └── docker-compose.airflow.yml    [PROVIDED]
│
└── .github/
    └── workflows/
        └── deploy.yml                ← CI/CD skeleton [write both jobs]
```
What YOU must write:
- `backend/Dockerfile`
- `.github/workflows/deploy.yml` — both the `test` job and the `deploy-backend` job
- Render configuration (web dashboard)
- Vercel configuration (web dashboard)
- GitHub secrets (`RENDER_DEPLOY_HOOK`)
```shell
git clone <your-repo>
cd TP_Seance4
```

Install the backend dependencies:

```shell
cd backend
pip install -r requirements.txt
```

Train the model:

```shell
# From backend/
python model/train.py
# Expected: Test AUC: 0.97xx
# Creates: model/fraud_model.pkl + model/baseline_stats.json
```

Start the API:

```shell
uvicorn main:app --reload --port 8000
```

Open http://localhost:8000/docs → test the /predict endpoint with the Swagger UI.

Run the tests:

```shell
pytest tests/ -v
```

Expected: all 8 tests pass.
Your Dockerfile must:
- Start from `python:3.11-slim`
- Copy `requirements.txt` and install dependencies
- Copy all backend files
- Train the model at build time (`python model/train.py`)
- Expose port 8000
- Launch the API with `uvicorn`
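A Dockerfile meeting those requirements might look like this — a sketch, assuming the layout shown in the project tree; adapt paths if yours differ:

```dockerfile
FROM python:3.11-slim

WORKDIR /app

# Install dependencies first so this layer is cached when only code changes
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the rest of the backend and train the model at build time
COPY . .
RUN python model/train.py

EXPOSE 8000
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
```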
```shell
docker build -t fraud-api ./backend
docker run -p 8000:8000 -e MODEL_VERSION=v1.0 fraud-api
```

Check: `curl http://localhost:8000/health`
```shell
git init
git remote add origin https://github.com/<you>/fraud-detection.git
git add .
git commit -m "Initial commit"
git push -u origin main
git checkout -b prod
git push -u origin prod
```

All future production deployments happen via push to `prod`. You develop on `main` or feature branches, then merge to `prod` to deploy.
The file .github/workflows/deploy.yml contains a skeleton with two jobs and their TODO comments. You must write both jobs yourself.
Job test — runs on every push and every PR targeting prod:
- Set up Python
- Install the backend dependencies
- Train the model (the tests need it)
- Run the tests against the source code directly (not inside the container)
Job deploy-backend — runs only on push to prod, after test passes:
- Call the Render deploy hook to trigger a redeployment
- The URL must be stored as a GitHub secret, not hardcoded
Things to think about:
- How do you tell a job to wait for another job to succeed before starting?
- How do you reference a secret inside a `run:` command?
- The `working-directory` option lets you run a step from a specific folder
You will add the Render secret in Step 4.2, once your Render service is created.
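One possible shape for the workflow — a hedged sketch, not a reference solution; the action versions and exact step layout are assumptions, so reconcile it with the provided skeleton and its TODO comments:

```yaml
# Sketch of .github/workflows/deploy.yml — adapt to the provided skeleton
name: CI/CD
on:
  push:
    branches: [prod]
  pull_request:
    branches: [prod]

jobs:
  test:
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: backend      # run every step from backend/
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install -r requirements.txt
      - run: python model/train.py      # the tests need the trained model
      - run: pytest tests/ -v

  deploy-backend:
    needs: test                         # wait for test to succeed first
    if: github.event_name == 'push'     # only deploy on push, not on PRs
    runs-on: ubuntu-latest
    steps:
      - run: curl -fsS -X POST "${{ secrets.RENDER_DEPLOY_HOOK }}"
```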
- Go to https://render.com → New → Web Service
- Connect your GitHub repository
- Configure the service so that Render uses your Dockerfile to build and run the container:
  - Set the branch to `prod`
  - Set the plan to Free
  - Add the environment variable `MODEL_VERSION = v1.0`
- Deploy and note your URL: `https://YOUR-APP.onrender.com`
Hint: Render supports deploying Docker containers directly. Look for the right environment type when creating the service.
Verify your production API is live:

```shell
curl https://YOUR-APP.onrender.com/health
```

- In Render: your service → Settings → Deploy Hook → copy the URL
- In GitHub: Settings → Secrets and variables → Actions → New repository secret
  - Name: `RENDER_DEPLOY_HOOK`
  - Value: the URL from Render
- Push to `prod` → watch the Actions tab → verify Render redeploys
- Edit `frontend/config.js`:

  ```javascript
  const CONFIG = {
    API_URL: "https://YOUR-APP.onrender.com", // ← your real URL
  };
  ```

- Go to https://vercel.com → New Project → Import your GitHub repo
- Configure:
  - Framework: Other (static site)
  - Root directory: `frontend`
  - Branch: `prod`
- Deploy → visit your Vercel URL

Vercel auto-redeploys every time you push to `prod`.
- Make a small change to `frontend/index.html` (e.g. change the title)
- Push to `prod`
- Verify:
  - GitHub Actions runs and passes
  - Render redeploys the backend
  - Vercel redeploys the frontend
  - The change appears on your Vercel URL
Scope: local only. This part runs entirely on your machine. Deploying Airflow to production requires shared storage between Airflow and the API (impossible with Render's free tier without extra infrastructure). Two paths exist if you want to go further after this TP:
- Astronomer – managed Airflow service, connects to any cloud API via HTTP. Free trial available: astronomer.io
- GitHub Actions (free alternative) – replace the DAG with a GHA scheduled workflow (cron). Simpler and free, but you lose the Airflow UI and drift detection branching. See the GHA docs on scheduled events.
The DAG (airflow/dags/fraud_retrain_dag.py) is provided. Your task is to:
- Start Airflow locally
- Manually trigger and validate the pipeline
- Configure the automatic schedule and observe it trigger on its own
- Understand the drift detection logic
Airflow runs inside Docker containers and cannot see your files by default.
The connection is made entirely through Docker volume mounts defined in airflow/docker-compose.airflow.yml.
Each mount maps a folder on your machine to a path inside the container:
```
Your machine                            Inside the Airflow container
────────────────────────────────────    ──────────────────────────────────
airflow/dags/                        →  /opt/airflow/dags/
  fraud_retrain_dag.py                  DAG read live – edits are instant
data/                                →  /opt/airflow/data/
  transactions.csv                      written by extract_transactions task
backend/model/                       →  /opt/airflow/model/
  fraud_model.pkl                       read/written by train + save tasks
  baseline_stats.json                   read by check_data_drift task
```

The paths on your machine are resolved relative to the `airflow/` folder (where the compose file lives), so `../backend/model` always points to `backend/model/` regardless of where you run the command from.
The DAG reads the container-side paths via environment variables (DATA_PATH, MODEL_PATH, BASELINE_PATH) defined in the compose file.
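The mounts and environment variables described above correspond to compose entries of roughly this shape — an illustrative excerpt only; the provided `docker-compose.airflow.yml` is the reference:

```yaml
# Illustrative excerpt — see the provided airflow/docker-compose.airflow.yml
services:
  airflow:
    volumes:
      - ./dags:/opt/airflow/dags             # DAG read live
      - ../data:/opt/airflow/data            # transactions.csv
      - ../backend/model:/opt/airflow/model  # model + baseline stats
    environment:
      DATA_PATH: /opt/airflow/data/transactions.csv
      MODEL_PATH: /opt/airflow/model/fraud_model.pkl
      BASELINE_PATH: /opt/airflow/model/baseline_stats.json
```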
Read `airflow/docker-compose.airflow.yml` now. Every line is commented — make sure you understand what each volume, environment variable, and service does before moving on.
```shell
docker-compose -f airflow/docker-compose.airflow.yml up --build
```

The `--build` flag builds the custom Airflow image (with ML dependencies baked in) on the first run. Subsequent runs reuse the cached image and start instantly.
Wait ~2 min, then open http://localhost:8080
Login: admin / admin
First run only: `airflow-init` creates the DB and user. It will exit with code 0 — that's normal.
In the Airflow UI:
- Find `fraud_model_nightly_retrain`
- Click Graph → study the task dependencies before running anything
- Identify the two branches: what conditions lead to each path?
Before setting up automatic scheduling, you must confirm the pipeline runs correctly end-to-end.
- Enable the DAG toggle (OFF → ON)
- Click Trigger DAG ▶ to start a manual run
- Watch the tasks execute in the Graph view
- Check that every task succeeds and identify which branch `check_data_drift` took — and why
- Trigger the DAG a second time and observe whether the branch changes
Do not move on to 5.4 until you have at least two successful manual runs.
The DAG currently has `schedule_interval=None` — it only runs when triggered manually.
Your task: change the `schedule_interval` in `fraud_retrain_dag.py` so the DAG triggers automatically at 19:00 in your local timezone.
Things to think about:
- Airflow schedules use UTC by default — what UTC time corresponds to 19:00 in your timezone?
- Cron format: `minute hour * * *`
- After saving the file, Airflow picks up the change automatically (no restart needed)
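The UTC conversion can be checked in a few lines of Python (the Europe/Paris timezone here is only an illustration — substitute your own):

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # Python 3.9+

# 19:00 local in Europe/Paris (illustrative timezone) on a winter date → UTC
local = datetime(2026, 1, 15, 19, 0, tzinfo=ZoneInfo("Europe/Paris"))
utc = local.astimezone(timezone.utc)
print(utc.strftime("%M %H * * *"))  # cron string for schedule_interval
```

In winter this prints `00 18 * * *` (CET is UTC+1); note the offset changes with daylight saving time.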
Once configured, keep the Airflow UI open and wait for the automatic trigger at 19:00. You should see a new run appear in the DAG history without having clicked anything.
Open airflow/dags/fraud_retrain_dag.py and answer:
- What does the `check_data_drift` task return when no baseline exists?
- Which statistical test is used, and what does the p-value threshold mean?
- What happens if `validate_data` fails (e.g. only 100 rows)?
- Why does `evaluate_auc` raise a `ValueError` instead of just logging?
- What does `save_to_registry` currently do instead of a real registry?
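The drift check is based on a two-sample Kolmogorov–Smirnov test (the test and the `DRIFT_P_VALUE` threshold are named in the troubleshooting table). A minimal sketch of the idea — the function name is illustrative; the real logic lives in `check_data_drift`:

```python
import numpy as np
from scipy.stats import ks_2samp

DRIFT_P_VALUE = 0.05  # illustrative threshold

def has_drift(baseline: np.ndarray, current: np.ndarray) -> bool:
    """Two-sample KS test: a small p-value means the distributions differ."""
    _statistic, p_value = ks_2samp(baseline, current)
    return bool(p_value < DRIFT_P_VALUE)

rng = np.random.default_rng(0)
baseline = rng.exponential(60, 5000)
shifted = rng.exponential(300, 5000)   # the 5x shift used in section 5.6

print(has_drift(baseline, baseline))   # False — identical samples give p = 1.0
print(has_drift(baseline, shifted))    # True — distributions clearly differ
```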
Force a drift by editing `extract_transactions` to change the amount distribution:

```python
# Original: np.random.exponential(60, n_legit)
# Change to (simulates drift):
"amount": np.random.exponential(300, n_legit),  # ← 5x higher amounts
```

Trigger the DAG manually → the `check_data_drift` task should now take the retrain branch.
These exercises extend the core TP.
| Exercise | Where it runs |
|---|---|
| Exercise 1 – Production smoke test | GitHub Actions + Render (prod) |
| Exercise 2 – Model versioning | Local (Airflow + API) |
| Exercise 3 – Data versioning with DVC | Local |
| Exercise 4 – Faster builds | GitHub Actions + Render (prod) |
Right now, the GitHub Actions pipeline deploys to Render and stops there. The problem: a successful deploy doesn't guarantee the API is actually responding.
Your task: add a step at the end of the deploy-backend job that automatically calls the /health endpoint of your Render service and fails the pipeline if the API does not respond correctly.
Things to think about:
- Render takes a few seconds to restart the container after a deploy — your step needs to account for that
- The Render URL must not be hardcoded in the workflow file
- The step should fail the whole job if the API returns anything other than a successful response
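One way such a step could be sketched — a hedged example: the `RENDER_APP_URL` secret is an assumption (a secret you would create yourself), and the retry counts are arbitrary:

```yaml
# Appended to the deploy-backend job; RENDER_APP_URL is a secret you add yourself
- name: Smoke test /health
  run: |
    for i in $(seq 1 10); do
      # -f makes curl exit non-zero on HTTP errors, failing the step
      if curl -fsS "${{ secrets.RENDER_APP_URL }}/health"; then exit 0; fi
      echo "API not ready yet, retrying in 15s..."
      sleep 15
    done
    exit 1
```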
Currently, every deployment serves MODEL_VERSION=v1.0 forever, even after the Airflow pipeline retrains and promotes a new model.
Your task: make the version meaningful and traceable end-to-end:
- Each time the Airflow DAG successfully promotes a new model, the version must be updated automatically (e.g. using the training date or an incremental counter)
- The `/health` endpoint must always return the version of the model currently in memory
- After a successful retraining cycle, the version visible in `/health` must be different from the previous one
Things to think about:
- Where is the version stored so that both the Airflow DAG and the API can access it?
- The running API loads the model once at startup and keeps it in memory — replacing the `.pkl` file on disk has no effect on the live container. The only way to serve the new model is to restart the container.
- On Render, restarting the container means triggering a redeploy. The Render deploy hook (already used in your CI/CD pipeline) can also be called from the Airflow DAG at the end of `save_to_registry`, using a simple HTTP POST — this closes the full loop: retrain → promote → redeploy production API automatically.
Suggested approach:
- Add a `POST /reload-model` endpoint to the FastAPI app that reloads the model from disk without restarting the server — this avoids a full redeploy when the model file is updated via the shared volume.
- Store the version in a `fraud_model_meta.json` file written by `save_to_registry` alongside the model (e.g. a `trained_at` timestamp). The API reads this file when loading the model and returns the value in `/health`.
- At the end of `save_to_registry`, call `POST /reload-model` on the API URL so the live server picks up the new model immediately. The API URL should be configured via an environment variable (`API_URL`) so it works both locally (`http://host.docker.internal:8000`) and in production (`https://YOUR-APP.onrender.com`).
The full loop locally:
```
Airflow save_to_registry
  → writes fraud_model.pkl + fraud_model_meta.json (shared volume)
  → POST /reload-model
  → API reloads model from disk
  → GET /health now returns the new version
```
The Airflow pipeline currently generates new data on every run without tracking it. In a real MLOps setup, every training run must be reproducible: given a model, you must be able to retrieve the exact dataset it was trained on.
Your task: integrate DVC into the project to version the training data alongside the code.
What you need to do:
- Initialize DVC in the repository and configure a local remote (a folder on your machine is enough)
- Track the training data file (`data/transactions.csv`) with DVC so that `git status` no longer shows it as untracked
- Add a step in the Airflow DAG (or as a separate script called by the DAG) that runs `dvc push` after the data is extracted, so each training run's dataset is saved to the remote
- Verify that you can reproduce a past training run by checking out a previous Git commit and running `dvc pull` to restore the corresponding data
Things to think about:
- DVC separates data versioning from code versioning: the `.dvc` file goes in Git, the data goes in the DVC remote
- You do not need a cloud remote for this exercise — a local folder works fine
- How would you extend this to also version the trained model artifact?
Every push to prod triggers three slow steps: installing Python dependencies in CI, building the Docker image, and pushing it to Render. This exercise asks you to cut that time by applying three independent optimisations.
Optimisation 1 – Replace pip with uv
uv is a Python package installer written in Rust. It is a drop-in replacement for pip and is typically 10–100× faster, with built-in caching and dependency resolution.
Replace pip with uv in two places:
In backend/Dockerfile:
- Add `uv` to the image (there is an official way to copy just the binary from the `ghcr.io/astral-sh/uv` image)
- Replace the `pip install` step with `uv pip install`
In .github/workflows/deploy.yml:
- Use the official `astral-sh/setup-uv` action to install `uv` on the runner
- Replace the `pip install` step with `uv pip install`
- Enable uv's built-in cache so repeated runs skip already-resolved packages
Things to think about:
- `uv pip install --system` is needed when not inside a virtual environment (e.g. in Docker or CI)
- The `setup-uv` action supports a `cache-dependency-glob` parameter — what file should it watch?
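In the Dockerfile, the documented pattern copies just the `uv` binary from the published image. A sketch (pin a specific tag rather than `latest` in real use):

```dockerfile
# Copy the uv binary from the official image (the documented pattern)
COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /bin/

COPY requirements.txt .
# --system installs into the image's Python, since there is no virtualenv here
RUN uv pip install --system -r requirements.txt
```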
Optimisation 2 – Add a .dockerignore
Without it, every docker build sends the entire project directory to the Docker daemon as build context — including model/*.pkl, __pycache__, .git, test fixtures, etc. This bloats the context and can invalidate layer caches unnecessarily.
Your task: create backend/.dockerignore and list the files and folders that the Docker build does not need.
Things to think about:
- What is already generated at build time inside the container and should not be copied in?
- What development artifacts (cache folders, test outputs) are irrelevant at runtime?
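A starting point for `backend/.dockerignore` — one reasonable set of entries; adjust to what your build actually needs:

```
# Generated at build time inside the container — no need to send from the host
model/*.pkl
model/baseline_stats.json

# Development artifacts irrelevant at runtime
__pycache__/
*.pyc
.pytest_cache/
.git/
```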
Optimisation 3 – Cache Docker layers in GitHub Actions
Right now, every CI run rebuilds the Docker image from scratch, even if only main.py changed. The RUN uv pip install layer — the slowest one — is rebuilt every time because GitHub Actions runners start with an empty Docker cache.
Your task: update the deploy-backend job to build the image using docker/build-push-action with GitHub Actions cache enabled, so the dependency layer is reused across runs when requirements.txt has not changed.
Things to think about:
- `docker/build-push-action` supports `cache-from` and `cache-to` parameters — which cache type works without a registry?
- The `load: true` option is needed if you want to run the image locally after building it in CI
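A sketch of the build step using the GitHub Actions cache backend (`type=gha`), which works without a registry — action versions and the tag name are assumptions:

```yaml
- uses: docker/setup-buildx-action@v3   # buildx is required for cache export
- uses: docker/build-push-action@v6
  with:
    context: backend
    tags: fraud-api:ci
    load: true             # make the image usable by later docker run steps
    cache-from: type=gha
    cache-to: type=gha,mode=max
```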
Measure and report the observed speedup:
Fill in the table with the durations you observe in the GitHub Actions tab:
| | before | after (cold cache) | after (warm cache) |
|---|---|---|---|
| Install dependencies (CI) | | | |
| Docker build – dep layer (CI) | | | (cached) |
A difference of less than 3× on the install step is a sign that the uv cache is not correctly enabled. A difference of less than 2× on the Docker build is a sign that the layer cache is not being hit.
| Problem | Fix |
|---|---|
| Render says "Service Unavailable" | Check logs in Render dashboard – model may not be trained |
| `pytest` fails on `import main` | Run from inside the `backend/` folder |
| Airflow scheduler doesn't pick up DAG | Check the `/opt/airflow/dags/` path in the volume mount |
| Vercel shows old page | Hard-refresh (Ctrl+Shift+R). If changes still don't appear: Vercel dashboard → Deployments → latest deployment → Redeploy → uncheck "Use existing Build Cache" |
| `RENDER_DEPLOY_HOOK` secret missing | Add it in GitHub → Settings → Secrets → Actions |
| KS test always shows no drift | Increase the amount shift or lower `DRIFT_P_VALUE` to 0.1 |
MLOps · Séance 4 · ECE · 2026