Skip to content

trading-engineering/trading-infrastructure

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

19 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Trading Infrastructure – GitOps Kubernetes Stack (MicroK8s + Argo CD)

License

Declarative GitOps-based infrastructure stack for quantitative research and backtesting.

Provisions and manages a single-node Kubernetes cluster on Oracle Cloud Infrastructure (OCI) using MicroK8s and Argo CD. All workloads are managed declaratively via GitOps after a one-time bootstrap.

Designed for reproducible research infrastructure with explicit storage, secret management, and operational boundaries.


🧠 What is this?

This repository defines a fully declarative infrastructure layer for a quantitative trading research platform.

It bootstraps:

  • A single-node MicroK8s Kubernetes cluster
  • GitOps management via Argo CD
  • Integrated experiment tracking, monitoring, orchestration
  • Explicit local storage model
  • OCI-native secret integration

After bootstrap, all cluster state is managed through Git via Argo CD.

No imperative Kubernetes workflows are required.


🧩 What does it solve?

Research infrastructure often suffers from:

  • Manual Kubernetes management
  • Secret sprawl
  • Mixed imperative and declarative workflows
  • Poor reproducibility
  • Overengineered multi-node setups for small research workloads

This stack provides:

  • GitOps-first cluster lifecycle
  • Strict secret isolation via OCI Vault
  • Explicit storage boundaries (boot vs scratch)
  • Fully declarative application management
  • Minimal single-node production-grade design

It enables structured research infrastructure without managed Kubernetes complexity.


πŸ— Architecture Overview

The system is divided into two clear layers.

Host Layer (Bootstrap Phase)

Installed once via bootstrap script:

  • MicroK8s
  • Secrets Store CSI Driver
  • OCI Secrets Store Provider custom multi-arch image
  • Argo CD
  • Scratch Block Volume formatting and mounting

Cluster Layer (GitOps Managed)

Managed exclusively via Argo CD:

  • PostgreSQL (MLflow metadata)
  • MLflow
  • Prometheus + Grafana
  • Argo Workflows
  • Scratch PersistentVolume

All workloads are defined declaratively in:

apps/
argocd/

🧰 Core Stack


πŸ“ Repository Structure

apps/
  mlflow/
  postgres/
  monitoring/
  argo/
  scratch/

argocd/
  mlflow-app.yaml
  postgres-app.yaml
  monitoring-app.yaml
  argo-app.yaml
  scratch-app.yaml

infrastructure/
  oci-provider/
    provider.yaml

scripts/
  bootstrap-cluster.sh

πŸš€ Quickstart (Fresh VM)

1️⃣ Requirements

  • Ubuntu VM
  • Attached Block Volume for Scratch (/dev/oracleoci/oraclevds)
  • OCI Instance Principal configured
  • Repository cloned onto the VM
  • OCI Vault with predefined secrets

2️⃣ Environment Configuration

Create environment file:

cp .env.example .env

Configure:

VAULT_ID=ocid1.vault.oc1.eu-frankfurt-1.xxxxx
OCI_REGION=eu-frankfurt-1

Load environment:

set -a
source .env
set +a

3️⃣ Bootstrap the Cluster

⚠ Bootstrap is one-shot only. Run on a fresh VM or perform a full reset. Re-running on an existing cluster is not supported. All subsequent changes must occur via GitOps.

chmod +x scripts/*
./scripts/bootstrap-cluster.sh

Installs:

  • MicroK8s
  • CSI Driver
  • OCI Provider
  • Argo CD

Deploys:

  • PostgreSQL
  • MLflow
  • Monitoring
  • Argo Workflows
  • Scratch PV/PVC

🌐 Service Access Model

Kubernetes services are exposed internally as NodePorts.

OCI network configuration:

  • Only SSH (port 22) allowed externally
  • All service ports blocked externally
  • Access exclusively via SSH local port forwarding

Example SSH configuration:

Host vps
    HostName <IP>
    User ubuntu
    IdentityFile ~/.ssh/<key>
    LocalForward 30007 localhost:30007   # Grafana
    LocalForward 32120 localhost:32120   # Argo Workflows
    LocalForward 30090 localhost:30090   # Prometheus
    LocalForward 30500 localhost:30500   # MLflow

This enables development access while preventing public exposure.


πŸ’Ύ Storage Model

Explicit separation between system storage and workload storage.

Boot Volume (VM Root Disk)

  • ~47 GB (minimum OCI size)

  • Hosts:

    • Operating system
    • Kubernetes system data
    • PostgreSQL database

MLflow metadata persists in local PostgreSQL backed by boot volume.

Scratch Block Volume

  • 153 GB (β‰ˆ142.5 GiB)
  • Mounted at:
/mnt/scratch

Intended for:

  • Backtesting data
  • Research artifacts
  • Large intermediate datasets

Exposed via PersistentVolume / PersistentVolumeClaim.

⚠ Single-node only (hostPath-based).

Reusing Scratch Storage

Old VM:

sudo microk8s kubectl delete namespace scratch
sudo microk8s kubectl delete pv scratch-pv
sudo umount /mnt/scratch/

Detach volume.

New VM:

  1. Attach volume with same device name
  2. Run bootstrap
  3. Ensure PV/PVC names match
  4. Data reused automatically

Full Reset

sudo snap remove microk8s --purge
sudo rm -rf /var/snap/microk8s/
sudo rm -rf ~/.kube/

πŸ” OCI Secrets Integration (Required)

All sensitive configuration is stored in OCI Vault.

Requirements:

  • OCI Vault must exist
  • Secrets must be created in advance
  • Secret names must match Helm/YAML references

Secrets retrieved using:

  • Secrets Store CSI Driver
  • OCI Provider (custom multi-arch image)
  • Instance Principal authentication

Secrets defined via SecretProviderClass.

No secrets stored in Git.


πŸ“Š Monitoring & Persistence Model

Prometheus runs with ephemeral local storage by default.

  • Metrics stored inside pod filesystem
  • No PersistentVolume configured
  • Data lost on pod restart or node reboot

Intended for lightweight research environments.

Storage Monitoring

Because metrics are local:

df -h /
sudo du -h --max-depth=1 /var/snap/microk8s/common/var/

Used to track disk usage and prevent exhaustion.


☁️ Why Oracle Cloud Infrastructure?

OCI selected primarily for ARM free tier:

  • 4 vCPU ARM VM
  • 24 GB RAM
  • ~200 GB free storage

Suitable for:

  • Single-node Kubernetes research clusters
  • Persistent external storage
  • Zero-cost experimentation

Boot volume kept minimal; scratch volume handles data-heavy workloads.


πŸ”„ Updating the Platform

  1. Modify YAML / Helm values in apps/<component>/
  2. Commit & push
  3. Argo CD auto-syncs

No manual kubectl required.


πŸ” Security Model

  • No secrets in Git
  • Public VM with OCI firewall (NSGs / Security Lists)
  • Only SSH exposed
  • All service ports closed externally
  • Egress allowed
  • Secrets in OCI Vault

🎯 Design Principles

  • GitOps-first
  • Single source of truth
  • Declarative workflows only
  • Multi-arch native
  • Minimal but production-grade

πŸ“Œ Project Status

Operational

  • MicroK8s bootstrap
  • Argo CD GitOps layer
  • OCI secrets integration
  • PostgreSQL
  • Scratch storage model

Experimental

  • Monitoring stack
  • Argo Workflows pipelines
  • Higher-level research workflows

πŸ‘₯ Who is this for?

  • Quant research & backtesting
  • GitOps Kubernetes experimentation
  • Reproducible infrastructure setups

Not intended for multi-node production clusters or managed Kubernetes platforms.


🏷️ Versioning

MIT license. Semantic versioning. Initial public release: v0.1.0.

About

Cloud infrastructure for quantitative research & backtesting. GitOps-driven single-node Kubernetes platform on OCI using MicroK8s, Argo CD, MLflow and Vault-backed secrets.

Topics

Resources

License

Code of conduct

Contributing

Security policy

Stars

Watchers

Forks

Contributors

Languages