roachprod: pre-bake custom roachprod cloud images #156408
Your pull request contains more than 1000 changes. It is strongly encouraged to split big PRs into smaller chunks. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.
It takes 25-30 minutes to build all flavors ( The current system ensures that an image is available in each region configured in
Prior to this patch, roachprod clusters were created from bare Ubuntu images. This was inadequate for multiple reasons, including:

- dependency on the availability of third parties (GCS, APT repositories)
- spinning up two clusters at different times could produce different systems (package versions, ...) and create reproducibility issues
- the growing number of installed dependencies increases boot time

To address this, this patch creates a new `roachprod bake-images` command that relies on HashiCorp Packer to pre-bake ready-to-use cloud images for AWS and GCP. This creates a system dependency on Packer: the machine that runs the command must have Packer installed and must be authenticated on AWS and GCP with authorization to create instances and publish new images. If an image already exists, it is not built again, which makes re-running `roachprod bake-images` safe.

The pre-baking process creates images for `amd64`, `arm64` and `fips`, and pushes them to the roachprod-compatible regions (only for AWS, since images are globally available in GCP). The images are tagged with a hashed checksum of the startup script, which defines their unique version.

At runtime, the provider checksums the startup script to figure out which pre-baked image should be used, and checks for its availability in the cloud provider for that specific region/zone:

- if the image exists, it is used to create the instance, and only a subset (runtime) of the startup script is executed on the instances, decreasing the startup time to a minimum (5s or so for disk setup)
- if the image does not exist, the system falls back to the base image and the whole startup script (pre-baking + runtime) is executed on the instances

This patch also drops the AMI IDs (or names in GCP) hardcoded in JSON and introduces auto-discovery of the base image's most recent version based on the image name/family and owner or project ID.
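The base image auto-discovery described above can be illustrated with a small, provider-agnostic sketch. It assumes the cloud listing has already been fetched; the `Image` type, the `latestBaseImage` helper, and the sample names and owner are hypothetical and are not taken from the patch.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// Image is a hypothetical, provider-agnostic view of one cloud image
// listing entry. It is not a roachprod type.
type Image struct {
	ID        string
	Name      string
	Owner     string // AWS owner ID or GCP project ID
	CreatedAt time.Time
}

// latestBaseImage returns the most recently created image whose name starts
// with namePrefix and whose owner matches. ok is false when nothing matches.
func latestBaseImage(images []Image, namePrefix, owner string) (best Image, ok bool) {
	for _, img := range images {
		if img.Owner != owner || !strings.HasPrefix(img.Name, namePrefix) {
			continue
		}
		if !ok || img.CreatedAt.After(best.CreatedAt) {
			best, ok = img, true
		}
	}
	return best, ok
}

// sampleImages returns made-up listing data used only for illustration.
func sampleImages() []Image {
	day := func(d int) time.Time { return time.Date(2025, 1, d, 0, 0, 0, 0, time.UTC) }
	return []Image{
		{ID: "img-1", Name: "ubuntu-22.04-amd64-20250101", Owner: "example-owner", CreatedAt: day(1)},
		{ID: "img-2", Name: "ubuntu-22.04-amd64-20250115", Owner: "example-owner", CreatedAt: day(15)},
		{ID: "img-3", Name: "ubuntu-22.04-amd64-20250120", Owner: "someone-else", CreatedAt: day(20)},
		{ID: "img-4", Name: "ubuntu-22.04-arm64-20250118", Owner: "example-owner", CreatedAt: day(18)},
	}
}

func main() {
	if img, ok := latestBaseImage(sampleImages(), "ubuntu-22.04-amd64", "example-owner"); ok {
		fmt.Println("base image:", img.ID, img.Name)
	}
}
```

Filtering on the owner as well as the name matters here: a newer image with a matching name published by an unrelated account is ignored.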
Auto-discovery allows us to automatically keep up to date with the latest patch releases, which are usually security updates.

Notes:

- this patch only contains the implementation for AWS and GCP; Azure and IBM should also be implemented
- a CI mechanism should be built to automatically build all images when the startup scripts change (either GitHub upon merge to `master` or TeamCity nightly runs)
- there is currently no built-in way to deprecate/clean up previous images since they might still be used on older branches; a cleanup routine should be considered if/when the number of images gets out of hand

Beyond this first iteration, a concept of "pre-bake only snippets" should come next: snippets that are only executed at pre-baking time and not at runtime, even if there is no pre-baked image. These snippets would contain ad hoc roachtest setups (building/pre-installing third-party tools like Prometheus/Grafana, Jepsen, Kafka CLI, ...), which would remove the need for these tests to build/install third-party dependencies at runtime when the test is running on an instance backed by a pre-baked image (see cockroachdb#62066 as an example).

Epic: none
Informs: cockroachdb#150144
Release note: None
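The runtime selection described in this patch (checksum the startup script, look up the matching pre-baked image, fall back to the base image) can be sketched as follows. This is a minimal illustration under assumptions: the use of SHA-256, the `roachprod-<arch>-<hash>` naming scheme, the 12-character truncation, and the `Selection`, `prebakedImageName` and `selectImage` names are all hypothetical, and the `exists` callback stands in for the provider's image lookup in a given region/zone.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// prebakedImageName derives a versioned image name from the startup script.
// SHA-256, the name layout and the truncation length are assumptions made
// for this sketch, not the patch's actual scheme.
func prebakedImageName(arch, startupScript string) string {
	sum := sha256.Sum256([]byte(startupScript))
	return fmt.Sprintf("roachprod-%s-%s", arch, hex.EncodeToString(sum[:])[:12])
}

// Selection describes which image to boot and whether the pre-baking part
// of the startup script still has to run on the instance.
type Selection struct {
	Image           string
	RunPreBakeSteps bool
}

// selectImage prefers the pre-baked image when exists reports it as
// available, and otherwise falls back to the base image, in which case the
// whole startup script (pre-baking + runtime) must run.
func selectImage(arch, startupScript, baseImage string, exists func(name string) bool) Selection {
	name := prebakedImageName(arch, startupScript)
	if exists(name) {
		return Selection{Image: name, RunPreBakeSteps: false}
	}
	return Selection{Image: baseImage, RunPreBakeSteps: true}
}

func main() {
	script := "#!/bin/bash\napt-get install -y chrony\n"
	available := map[string]bool{prebakedImageName("amd64", script): true}
	lookup := func(name string) bool { return available[name] }

	fmt.Printf("%+v\n", selectImage("amd64", script, "ubuntu-base", lookup))
	fmt.Printf("%+v\n", selectImage("arm64", script, "ubuntu-base", lookup))
}
```

Because the name is derived from the script contents, any change to the startup script yields a different name, so a stale image is never picked up by accident; the instance simply takes the slower fallback path until a new image is baked.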
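The property that re-running `roachprod bake-images` is safe can be pictured as computing only the missing build targets before invoking any build. The sketch below is an assumption about the shape of that check, not the patch's code: the `target` type, the `missingTargets` helper, and the example regions are illustrative, and `exists` stands in for a lookup of the image version in a given region.

```go
package main

import "fmt"

// target is a hypothetical (architecture, region) pair to build an image for.
type target struct{ Arch, Region string }

// missingTargets returns, in a stable order, the targets for which exists
// reports no image yet. Targets that already have an image are skipped, so
// calling this again after a successful bake yields an empty list.
func missingTargets(archs, regions []string, exists func(arch, region string) bool) []target {
	var out []target
	for _, a := range archs {
		for _, r := range regions {
			if !exists(a, r) {
				out = append(out, target{Arch: a, Region: r})
			}
		}
	}
	return out
}

func main() {
	archs := []string{"amd64", "arm64", "fips"}
	regions := []string{"us-east-2", "eu-west-1"} // illustrative regions only
	built := map[target]bool{{Arch: "amd64", Region: "us-east-2"}: true}
	exists := func(a, r string) bool { return built[target{Arch: a, Region: r}] }

	for _, tg := range missingTargets(archs, regions, exists) {
		fmt.Println("would build", tg.Arch, "in", tg.Region)
	}
}
```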