This repository contains the code discussed in the following blog posts:
- Rethinking data platforms in the age of digital sovereignty
- Locking down your data: fine-grained data access on EU Clouds
The project consists of the infrastructure for each of the EU cloud providers as well as the open source components that make up the data platform.
- The open source components are:
  - Trino
  - Airflow
  - Open Policy Agent
  - HashiCorp Vault
  - ArgoCD
  - Zitadel
- The infrastructure is provided for the following EU-based providers:
  - OVH
  - Scaleway
  - UpCloud
  - Exoscale
The core of the platform is a Trino cluster, which provides a SQL interface to the data. It is used by:
- data engineers who can query the data via a database client
- jobs scheduled by Airflow
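
As a rough illustration of what a database client or a scheduled job does when it talks to Trino, the sketch below submits a statement over Trino's HTTP client protocol (`POST /v1/statement`, then following `nextUri` until the result is complete). The base URL, the user name, and the use of a bearer token are assumptions for illustration; the authentication method configured in this repository may differ, and in practice an existing Trino client library is usually the better choice.

```python
import json
import urllib.request


def trino_headers(user, token=None):
    """Headers sent with every request of one query."""
    headers = {"X-Trino-User": user}
    if token is not None:
        # Assumption: the cluster accepts a bearer token, e.g. one obtained via SSO.
        headers["Authorization"] = f"Bearer {token}"
    return headers


def build_statement_request(base_url, sql, user, token=None):
    """Build the initial POST /v1/statement request; the body is the SQL text."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/statement",
        data=sql.encode("utf-8"),
        headers=trino_headers(user, token),
        method="POST",
    )


def run_query(base_url, sql, user, token=None):
    """Submit a query and follow nextUri links until all rows are collected."""
    headers = trino_headers(user, token)
    request = build_statement_request(base_url, sql, user, token)
    rows = []
    while request is not None:
        with urllib.request.urlopen(request) as response:
            page = json.load(response)
        if "error" in page:
            raise RuntimeError(page["error"].get("message", "query failed"))
        # A page may or may not carry rows; the last page has no nextUri.
        rows.extend(page.get("data", []))
        next_uri = page.get("nextUri")
        request = urllib.request.Request(next_uri, headers=headers) if next_uri else None
    return rows
```

With a reachable cluster, a call such as `run_query("https://trino.example.com", "SELECT 1", "alice", token="...")` would return the result rows as a list; the hostname here is a placeholder.
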
Supporting components:
- Zitadel - for single sign-on
- ArgoCD - for application deployment
- Vault - for secrets management
- Open Policy Agent - for authorization of Trino queries
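
To make the authorization step more concrete, the sketch below builds the kind of request Trino's OPA access control sends to Open Policy Agent when a user selects columns from a table, and evaluates it through OPA's Data API (`POST /v1/data/<policy path>`). The OPA URL, the policy path `trino/allow`, and all catalog, schema, and table names are hypothetical; the policies actually shipped in this repository may live under a different path and inspect different fields.

```python
import json
import urllib.request


def build_opa_input(user, groups, catalog, schema, table, columns):
    """Input document for a column-level SELECT check, in the shape Trino sends."""
    return {
        "context": {"identity": {"user": user, "groups": list(groups)}},
        "action": {
            "operation": "SelectFromColumns",
            "resource": {
                "table": {
                    "catalogName": catalog,
                    "schemaName": schema,
                    "tableName": table,
                    "columns": list(columns),
                }
            },
        },
    }


def build_opa_request(opa_url, policy_path, opa_input):
    """Build a Data API request; OPA expects the document under the "input" key."""
    body = json.dumps({"input": opa_input}).encode("utf-8")
    return urllib.request.Request(
        url=f"{opa_url.rstrip('/')}/v1/data/{policy_path.strip('/')}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def is_allowed(opa_url, policy_path, opa_input):
    """Ask OPA for a decision; an undefined rule (no "result") counts as deny."""
    request = build_opa_request(opa_url, policy_path, opa_input)
    with urllib.request.urlopen(request) as response:
        decision = json.load(response)
    return decision.get("result") is True
```

Sending the same input by hand is a convenient way to debug a policy without going through Trino.
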
Data engineers interact with the platform via the Airflow UI and via a database client connected to Trino.
- make sure you have a hostname for this project: many services require SSL, which in turn requires a valid hostname
- fork this repo because you will need to change some values
- pick a cloud provider and make sure you can authenticate Terraform against it (you can run this locally, but it is tricky to get the SSL certificates right)
  - OVH https://registry.terraform.io/providers/ovh/ovh/latest
  - UpCloud https://registry.terraform.io/providers/UpCloudLtd/upcloud/latest
  - Scaleway https://registry.terraform.io/providers/scaleway/scaleway/latest
  - Exoscale https://registry.terraform.io/providers/exoscale/exoscale/latest
  - It is possible to run this on Hetzner + Cloudfleet (this combination only gives you Kubernetes and object storage, no managed databases)
- pick a provider in the `infra` folder and follow the instructions from the `README.md` in that folder
- follow the readme in the `bootstrap-data-platform` folder to set up ArgoCD
- if needed, continue bootstrapping the platform with the relevant infra provider (databases, credentials, etc.)
- deploy the Airflow DAGs
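
Before running Terraform for the chosen provider, a small preflight check can catch missing credentials early. The sketch below is one possible approach and not part of this repository: the helper name `missing_credentials` is hypothetical, and the environment variable names are assumptions based on what each Terraform provider commonly documents, so verify them against the registry pages listed in the steps above.

```python
import os

# Assumed environment variables per provider; confirm against the provider docs.
REQUIRED_ENV = {
    "ovh": [
        "OVH_ENDPOINT",
        "OVH_APPLICATION_KEY",
        "OVH_APPLICATION_SECRET",
        "OVH_CONSUMER_KEY",
    ],
    "upcloud": ["UPCLOUD_USERNAME", "UPCLOUD_PASSWORD"],
    "scaleway": ["SCW_ACCESS_KEY", "SCW_SECRET_KEY"],
    "exoscale": ["EXOSCALE_API_KEY", "EXOSCALE_API_SECRET"],
}


def missing_credentials(provider, env=None):
    """Return the required variables that are unset or empty for a provider."""
    env = os.environ if env is None else env
    try:
        required = REQUIRED_ENV[provider.lower()]
    except KeyError:
        raise ValueError(f"unknown provider: {provider!r}") from None
    return [name for name in required if not env.get(name)]
```

For example, `missing_credentials("scaleway")` returns an empty list when both Scaleway variables are set in the current shell, and the names of the missing ones otherwise.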
