datamindedbe/eu-data-platform

Data Platform Stack

This repository contains the code discussed in the following blogposts:

The project consists of the infrastructure for each of the EU cloud providers, as well as the open-source components that make up the data platform.

  • The open-source components are:
    • Trino
    • Airflow
    • Open Policy Agent
    • HashiCorp Vault
    • ArgoCD
    • Zitadel
  • The infrastructure required for the following EU-based providers:
    • OVH
    • Scaleway
    • UpCloud
    • Exoscale

Architecture

The core of the platform is a Trino cluster, which provides a SQL-like interface to the data. It is used by:

  • data engineers who can query the data via a database client
  • jobs scheduled by Airflow
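
To illustrate what a database client or a scheduled job does when it talks to Trino, here is a minimal Python sketch of the Trino HTTP client protocol: the SQL text is posted to `/v1/statement`, and result pages are fetched by following `nextUri` until it is absent. The coordinator address, the user name, and the helper names `build_statement_request` and `run_query` are assumptions made for illustration and do not come from this repository. The sketch also assumes an endpoint that accepts the plain `X-Trino-User` header, whereas a deployment behind single sign-on would need real credentials.

```python
import json
import urllib.request

# Assumption: hypothetical coordinator address, not taken from this repository.
TRINO_URL = "http://localhost:8080"


def build_statement_request(sql, user, base_url=TRINO_URL):
    """Build the initial POST /v1/statement request of the Trino client protocol."""
    return urllib.request.Request(
        url=f"{base_url}/v1/statement",
        data=sql.encode("utf-8"),
        headers={"X-Trino-User": user},
        method="POST",
    )


def run_query(sql, user, base_url=TRINO_URL):
    """Submit a query and follow nextUri links until all rows are collected."""
    rows = []
    request = build_statement_request(sql, user, base_url)
    while request is not None:
        with urllib.request.urlopen(request) as response:
            page = json.load(response)
        if "error" in page:
            raise RuntimeError(page["error"].get("message", "query failed"))
        # Not every page carries rows; "data" is absent while the query is queued.
        rows.extend(page.get("data", []))
        next_uri = page.get("nextUri")
        request = (
            urllib.request.Request(next_uri, headers={"X-Trino-User": user})
            if next_uri
            else None
        )
    return rows
```

With a reachable cluster, `run_query("SELECT 1", "alice")` would return the result rows as a list of lists. In practice a ready-made client or driver is usually the more convenient choice; the sketch only shows what happens on the wire.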

Supporting components:

  • Zitadel - for single sign-on
  • ArgoCD - for application deployment
  • Vault - for secrets management
  • Open Policy Agent - for authorization of Trino queries
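
As a rough illustration of how query authorization through Open Policy Agent can work, the Python sketch below builds an authorization request and sends it to OPA's Data API (`POST /v1/data/<path>` with an `input` document). The input shape is modeled on what Trino's OPA access control plugin sends, but the exact field names should be checked against the Trino documentation for the version in use. The OPA address, the policy path `trino/allow`, and the helper names are hypothetical and do not come from this repository.

```python
import json
import urllib.request

# Assumptions: hypothetical OPA address and policy rule path.
OPA_URL = "http://localhost:8181"
POLICY_PATH = "trino/allow"


def build_opa_input(user, groups, catalog, schema, table, columns):
    """Build an OPA request body describing a column-level SELECT on one table."""
    return {
        "input": {
            "context": {"identity": {"user": user, "groups": groups}},
            "action": {
                "operation": "SelectFromColumns",
                "resource": {
                    "table": {
                        "catalogName": catalog,
                        "schemaName": schema,
                        "tableName": table,
                        "columns": columns,
                    }
                },
            },
        }
    }


def is_allowed(payload, opa_url=OPA_URL, policy_path=POLICY_PATH):
    """Ask OPA to evaluate the policy rule and return True only on an explicit allow."""
    request = urllib.request.Request(
        url=f"{opa_url}/v1/data/{policy_path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # OPA omits "result" when the rule is undefined; treat that as a denial.
    return body.get("result") is True
```

Treating a missing `result` as a denial keeps the check fail-closed, which is the safer default for an authorization decision.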

Interacting with the Platform

Data engineers interact with the platform through the Airflow UI and through a database client connected to Trino.

Deploying the platform

Tools needed

Requirements

Infra deployment

  • pick a provider in the infra folder and follow the instructions in that folder's README.md
  • follow the README in the bootstrap-data-platform folder to set up ArgoCD
  • if needed, continue bootstrapping the platform with the relevant infra provider (databases, credentials, etc.)

Deploy some ETL jobs

  • deploy the Airflow DAGs

Contributors

About

Spin up a minimalistic Data Analytics Platform on a European cloud provider
