Fix Cloud Run deployments after Dockerfile relocation #2558
base: master
Conversation
Dockerfiles were moved to the docker/ directory in commit fec63ec as part of the mise infrastructure refactoring, but the deployment workflows were never updated. The new Dockerfiles expect binaries in architecture-specific subdirectories (${TARGETARCH}/), while the workflows were still copying binaries directly into the crate directories.

Fixed workflows:
- deploy-agent-api.yaml: Create amd64/ subdir, copy agent and sops
- deploy-data-plane-controller.yaml: Create amd64/ subdir, copy binary, entrypoint, and sops
- deploy-oidc-discovery-server.yaml: Create amd64/ subdir, copy binary and entrypoint

Each workflow now:
1. Creates an amd64/ subdirectory in the source location
2. Copies the built binary into amd64/
3. Copies required scripts (entrypoint, sops) into amd64/
4. Copies the Dockerfile from docker/ to the source directory
5. Cloud Run deployment proceeds with the correct directory structure

Verified locally that Docker builds succeed with this structure.

This fixes Cloud Run deployments that have been broken since December 1st. Discovered when #2549 triggered the data-plane-controller deployment.
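For concreteness, here is a sketch of what that staging sequence could look like as a step in deploy-data-plane-controller.yaml. The binary output path, entrypoint filename, and sops location are assumptions for illustration, not the workflow's exact contents:

```yaml
# Hypothetical staging step; real paths in the workflow may differ.
- name: Stage binary, scripts, and Dockerfile for Cloud Run
  run: |
    # The relocated Dockerfile copies from ${TARGETARCH}/, so place artifacts under amd64/.
    mkdir -p crates/data-plane-controller/amd64
    cp target/release/data-plane-controller crates/data-plane-controller/amd64/
    cp crates/data-plane-controller/entrypoint.sh crates/data-plane-controller/amd64/
    cp "$(which sops)" crates/data-plane-controller/amd64/
    # Cloud Run builds from the source directory, so the Dockerfile must sit alongside it.
    cp docker/data-plane-controller.Dockerfile crates/data-plane-controller/Dockerfile
```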
Force-pushed from 2d1cef0 to 12d05c2
jgraettinger
left a comment
LGTM, though this ought to be cleaned up further.
A better structure would be to let the Platform Build CI task complete and then deploy the pre-built/pushed images directly, rather than compiling and building an image from scratch.
Force-pushed from baa697e to 12d05c2
Refactors deployment workflows to use pre-built Docker images from Platform Build CI
rather than rebuilding from source on every deployment, as suggested by Johnny.
Benefits:
- Faster deployments (no build step)
- Deploy exactly what was tested in CI
- More efficient resource usage
- Simpler workflows (39 net lines removed)
Changes:
- data-plane-controller: Triggers on Platform Build completion, deploys pre-built image
- oidc-discovery-server: Triggers on Platform Build completion, deploys pre-built image
- agent-api: Manual trigger only, deploys pre-built dev-next image
All workflows now (the image-tag step is sketched after this list):
1. Determine the image tag (from Platform Build or dev-next for manual triggers)
2. Deploy the pre-built image from ghcr.io/estuary/{service}:{tag}
3. Skip all build/package/Docker build steps
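For illustration, a minimal sketch of the tag-determination step; the step id and the image output name mirror the workflow inputs quoted later in this review, while the step name and the service used in the reference are just examples:

```yaml
# Hypothetical sketch; the real step may differ in detail.
- name: Determine image tag
  id: image-tag
  run: |
    # Platform Build tags pushed images with `git describe`, so derive the same tag here.
    TAG="$(git describe --tags)"
    echo "image=ghcr.io/estuary/data-plane-controller:${TAG}" >> "$GITHUB_OUTPUT"
```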
Images are built by the Platform Build workflow via mise/tasks/ci/docker-images, which:
- Builds all docker/*.Dockerfile images as multi-arch (amd64/arm64)
- Pushes to ghcr.io/estuary/ with git describe tags
- Runs automatically on every master branch push
Cloud Run cannot pull directly from external registries like ghcr.io.
Instead, it requires using Artifact Registry remote repositories which
act as a proxy to the external registry.
Updated all deployment workflows to use the Artifact Registry proxy path:
us-central1-docker.pkg.dev/estuary-control/ghcr/estuary/{service}:{tag}
The remote repository was created with:
gcloud artifacts repositories create ghcr \
--project=estuary-control \
--repository-format=docker \
--location=us-central1 \
--mode=remote-repository \
--remote-docker-repo=https://ghcr.io
Platform Build continues pushing to ghcr.io, and Artifact Registry
automatically proxies pulls from Cloud Run to the upstream registry.
Tagging @jgraettinger for re-review. I went back and forth a few times on this because I forgot, and then remembered, why these are built entirely in Cloud Build. The original deployment workflows built from source using Cloud Build, which automatically pushed images to Artifact Registry (…). Per the Google Cloud docs, Cloud Run cannot pull images directly from an external registry like ghcr.io; it requires them to be in Artifact Registry. We have been building over there to get around this limitation for a while.

The Fix

I've set up an Artifact Registry remote repository that acts as a proxy to ghcr.io:

gcloud artifacts repositories create ghcr \
--project=estuary-control \
--repository-format=docker \
--location=us-central1 \
--mode=remote-repository \
--remote-docker-repo=https://ghcr.io

I've pulled the latest data-plane-controller (…).
jgraettinger
left a comment
LGTM % questions
project_id: estuary-control
region: us-central1
source: crates/agent/
image: ${{ steps.image-tag.outputs.image }}
sanity check: does it pin the sha256 of the current dev-next, or will it truly pull the "latest" dev-next every time Cloud Run decides to pull the image?
According to https://docs.cloud.google.com/run/docs/deploying#service
Deploying to a service for the first time creates its first revision. Note that revisions are immutable. If you deploy from a container image tag, it will be resolved to a digest and the revision will always serve this particular digest.
Without fetch-depth: 0, actions/checkout does a shallow clone that doesn't include tags. This causes git describe --tags to fall back to just the commit hash instead of the proper version tag. Added to deploy-data-plane-controller and deploy-oidc-discovery-server workflows which use git describe to determine the image tag. The deploy-agent-api workflow doesn't need this since it hardcodes TAG="dev-next".
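For reference, the corresponding checkout configuration looks like this (the action version pin is illustrative):

```yaml
- name: Checkout
  uses: actions/checkout@v4
  with:
    # Fetch full history and tags so `git describe --tags` resolves a real version tag.
    fetch-depth: 0
```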
Summary
Fixes three broken Cloud Run deployment workflows that were never updated after the December 1st Dockerfile relocation (commit fec63ec).
Problem
The mise infrastructure refactoring moved Dockerfiles from crate directories to docker/:
- crates/agent/Dockerfile → docker/control-plane-agent.Dockerfile
- crates/data-plane-controller/Dockerfile → docker/data-plane-controller.Dockerfile
- crates/oidc-discovery-server/Dockerfile → docker/oidc-discovery-server.Dockerfile

The new Dockerfiles use COPY ${TARGETARCH}/binary to copy binaries from architecture-specific subdirectories, but the workflows were still copying binaries directly to crate directories without creating the amd64/ subdirectory structure. This caused all three deployments to fail when triggered.
Solution
Updated all three workflows to:
- Create the amd64/ subdirectory in the deployment source directory
- Copy the built binary into amd64/
- Copy required scripts (entrypoint, sops) into amd64/
- Copy the Dockerfile from docker/ to the source directory

Testing
Locally verified that Docker builds succeed with this directory structure by simulating the workflow steps and running docker build.

Fixed Workflows
- deploy-agent-api.yaml
- deploy-data-plane-controller.yaml (immediately broken by Add geo_region field to AnsibleHost struct #2549)
- deploy-oidc-discovery-server.yaml

Related