@mikaeld mikaeld commented Sep 18, 2025

Description

We are deprecating the infrastructure used for temporarily provisioning GKE clusters for Airflow DAG development.

Related Tickets & Documents

@mikaeld mikaeld self-assigned this Sep 18, 2025
@mikaeld mikaeld marked this pull request as draft September 18, 2025 13:53
@mikaeld mikaeld requested a review from sean-rose September 19, 2025 17:45
Contributor

@sean-rose sean-rose left a comment

The README.md file will also need to be updated appropriately (though if you agree with my last comment, I could update the readme as part of my changes).

Contributor

This script seems to be unrelated to the usage of the moz-fx-data-gke-sandbox project. Is there a particular reason why you're deleting it as well?

Contributor Author

Tech debt removal. I've asked many people in DENG over the years, and none have ever used this script.

Contributor

The readme mentions using this script for testing Dataproc jobs, and I seem to recall at least trying to test some Dataproc tasks locally when doing QA for Airflow upgrades (though I don't think I managed to get it fully working at that time).

In any case, since this is unrelated to GKE and we do still have Dataproc tasks in active DAGs, I don't think this script should be removed in this PR.

Contributor

At the end of yesterday's Data Infra WG meeting, @akkomar suggested that these GKE scripts could be repurposed to facilitate running Airflow local dev workloads in our own personal dev projects. That sounds like a reasonable approach to me: it preserves the option of a quicker Airflow dev process, at the cost of having to configure our own personal dev projects to allow this to work.

If you agree that would be reasonable, I can contribute the necessary changes to this PR, e.g. having the scripts take a project ID argument. Since it looks tricky to pass arbitrary arguments through make, we'd probably still want to remove those targets and have people run these scripts directly.
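
As a rough illustration of the project ID argument idea (not the actual scripts in this repository), a script run directly could accept the project on the command line and assemble the `gcloud` invocation from it. The function names, the `--project-id`, `--cluster-name`, and `--zone` script options, and their default values are all assumptions made for this sketch. It only builds and prints the command; it does not run it.

```python
import argparse
import shlex
from typing import List, Optional


def build_create_command(project_id: str, cluster_name: str, zone: str) -> List[str]:
    """Build (but do not run) a gcloud command that would create a GKE cluster."""
    return [
        "gcloud", "container", "clusters", "create", cluster_name,
        "--project", project_id,  # target a personal dev project instead of a shared one
        "--zone", zone,
    ]


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the hypothetical script options; the defaults are placeholders."""
    parser = argparse.ArgumentParser(description="Hypothetical GKE sandbox helper")
    parser.add_argument("--project-id", required=True, help="GCP project to create the cluster in")
    parser.add_argument("--cluster-name", default="airflow-dev", help="placeholder default")
    parser.add_argument("--zone", default="us-west1-a", help="placeholder default")
    return parser.parse_args(argv)


def main(argv: Optional[List[str]] = None) -> str:
    """Return the shell-quoted command so it can be reviewed before running."""
    args = parse_args(argv)
    return shlex.join(build_create_command(args.project_id, args.cluster_name, args.zone))
```

Calling `print(main(["--project-id", "my-dev-project"]))` prints the assembled command for review. Taking the project as an explicit argument avoids relying on make to forward it.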

Contributor Author

@mikaeld mikaeld Sep 22, 2025

> running Airflow local dev workloads in our own personal dev projects

This would require each sandbox project to have a GKE cluster with Workload Identity configured with permissions for various GCP services (e.g. BQ, GCS, GAR, SQL). It would also make each developer responsible for cleaning up unused resources. Mozcloud lacks budget monitoring for sandbox projects, so the cost of unused resources in those projects is hard to track. This is why we had make gke create resources in a centralized project, with a k8s cron job dedicated to cleaning up unused clusters.

For those reasons, I recommend against the solution proposed by @akkomar.

If you want to re-enable the feature being removed by this PR, I'd recommend building something similar on the supported Mozcloud platform (i.e. GCPv2).

Contributor

I liked the suggestion you made in Slack about potentially setting up a shared GKE cluster in a new moz-fx-data-airflow-gke-dev project where developers could run GKE tasks from local Airflow instances (potentially in user-specific namespaces). I've filed DENG-9749, "Come up with new solution for telemetry-airflow devs to run GKE tasks from local Airflow instances", which mentions that idea plus the original create-GCPv2-GKE-sandbox-project idea.
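
To make the user-specific namespace idea concrete, here is a minimal sketch of deriving a namespace name from a developer's username. It relies on the Kubernetes rule that namespace names must be valid DNS-1123 labels (lowercase alphanumerics and hyphens, starting and ending with an alphanumeric, at most 63 characters). The function name and the `airflow-dev` prefix are assumptions for illustration, not part of any agreed design.

```python
import re

MAX_LABEL_LENGTH = 63  # Kubernetes namespace names are DNS-1123 labels


def namespace_for_user(username: str, prefix: str = "airflow-dev") -> str:
    """Derive a DNS-1123-compatible namespace name from a username.

    The prefix is a hypothetical default and is assumed to be a valid label itself.
    """
    # Replace every run of disallowed characters with a single hyphen.
    slug = re.sub(r"[^a-z0-9-]+", "-", username.lower()).strip("-")
    if not slug:
        raise ValueError(f"cannot derive a namespace from {username!r}")
    # Truncate to the label limit, then drop any trailing hyphen left by the cut.
    return f"{prefix}-{slug}"[:MAX_LABEL_LENGTH].rstrip("-")
```

For example, `namespace_for_user("Jane.Doe@example.com")` returns `airflow-dev-jane-doe-example-com`. Note that truncation could make two long usernames collide, so a real setup might append a short hash.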

In the meantime I'm OK with you proceeding with this PR since the GKE scripts no longer work as is.

However, I have squirreled away revised versions of the GKE scripts in the GKE-sandbox-config branch just in case someone like me or @gleonard-m ends up needing to resort to using a custom GKE sandbox setup.

@mikaeld mikaeld requested a review from sean-rose September 22, 2025 22:44
Contributor

@sean-rose sean-rose left a comment

The readme still needs to be updated.

@sean-rose
Contributor

@mikaeld this still needs to be completed, as the readme currently references commands like make gke, which no longer work.
