Prior to this patch, roachprod clusters were created from bare Ubuntu
images.
This was inadequate for several reasons, including:
- dependency on the availability of third parties (GCS, APT
repositories)
- spinning up two clusters at different times could lead to different
resulting systems (package versions, ...) and create reproducibility
issues
- the growing number of installed dependencies increases boot time
To address this, this patch adds a new roachprod bake-images command
that relies on HashiCorp Packer to pre-bake ready-to-use cloud images
for AWS and GCP. This creates a system dependency on Packer: the machine
that runs the command must have Packer installed and must be
authenticated on AWS and GCP with authorization to create instances and
publish new images. If an image already exists, it is not built again,
which makes re-running roachprod bake-images safe.
The pre-baking process creates images for amd64, arm64 and fips, and
pushes them to the roachprod-compatible regions (only for AWS, since
images are globally available in GCP). The images are tagged with a
hashed checksum of the startup script, which defines their unique
version.
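
As an illustration of the versioning scheme, here is a minimal sketch of
deriving an image name from the startup script. The hash function
(SHA-256), the 12-character truncation, the "roachprod-<arch>-<hash>"
name format and the imageName helper are all assumptions made for this
example, not necessarily what the patch does:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// imageName derives a deterministic image name from the architecture and
// the content of the startup script. The name format and the choice of
// SHA-256 are hypothetical and only illustrate the idea.
func imageName(arch, startupScript string) string {
	sum := sha256.Sum256([]byte(startupScript))
	// Keep a short prefix of the hex digest so the name stays readable.
	return fmt.Sprintf("roachprod-%s-%s", arch, hex.EncodeToString(sum[:])[:12])
}

func main() {
	script := "#!/bin/bash\necho 'pre-baking step'\n"
	// The same script always maps to the same name; any change to the
	// script yields a different name, hence a new image version.
	fmt.Println(imageName("amd64", script))
	fmt.Println(imageName("arm64", script))
}
```

Because the name depends only on the script content and architecture,
checking whether an image with that name already exists is enough to
skip a rebuild.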
At runtime, each provider checksums the startup script to figure out
which pre-baked image should be used, and checks for its availability in
the cloud provider for that specific region/zone:
- if the image exists, it is used to create the instance, and only a
subset (runtime) of the startup scripts is executed on the instances,
decreasing the startup time to a minimum (5s or so for disk setup)
- if the image does not exist, the system falls back to using the base
image and the full set of startup scripts (pre-baking + runtime) is
executed on the instances
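
The selection logic described above can be sketched as follows. The
launchPlan type, the planLaunch function and the exists callback are
hypothetical names introduced for this example; the real providers query
the cloud APIs for the given region/zone:

```go
package main

import "fmt"

// launchPlan describes which image to boot and which startup script
// phases to run on the instance. Hypothetical type for illustration.
type launchPlan struct {
	Image  string
	Phases []string
}

// planLaunch picks the pre-baked image when it is available and falls
// back to the base image otherwise. exists stands in for a provider
// lookup in a specific region/zone.
func planLaunch(prebaked, base string, exists func(name string) bool) launchPlan {
	if exists(prebaked) {
		// Everything else is already baked in: only run the runtime part.
		return launchPlan{Image: prebaked, Phases: []string{"runtime"}}
	}
	// No pre-baked image: run the pre-baking steps at boot, then runtime.
	return launchPlan{Image: base, Phases: []string{"pre-baking", "runtime"}}
}

func main() {
	available := map[string]bool{"prebaked-abc": true}
	lookup := func(name string) bool { return available[name] }

	fmt.Println(planLaunch("prebaked-abc", "base-image", lookup))
	fmt.Println(planLaunch("prebaked-def", "base-image", lookup))
}
```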
This patch also drops the AMI IDs (or image names in GCP) hardcoded in
JSON and introduces auto-discovery of the base image's most recent
version based on the image name/family and owner or project ID. This
allows us to automatically keep up to date with the latest patch
releases, which are usually security updates.
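
A minimal sketch of the auto-discovery idea, assuming the provider
returns a list of candidate images with a name, an owner (or project ID)
and an RFC 3339 creation timestamp. The image type and latestImage
function are hypothetical, and matching by name prefix is an assumption
standing in for the real name/family filters:

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// image is a hypothetical, provider-agnostic view of one image listing.
type image struct {
	Name    string
	Owner   string // AWS owner ID or GCP project ID
	Created string // RFC 3339 creation timestamp
}

// latestImage returns the most recently created image that matches the
// given name prefix and owner. The boolean is false if nothing matches.
func latestImage(images []image, namePrefix, owner string) (image, bool) {
	var best image
	var bestTime time.Time
	found := false
	for _, img := range images {
		if img.Owner != owner || !strings.HasPrefix(img.Name, namePrefix) {
			continue
		}
		created, err := time.Parse(time.RFC3339, img.Created)
		if err != nil {
			continue // skip entries with an unparseable timestamp
		}
		if !found || created.After(bestTime) {
			best, bestTime, found = img, created, true
		}
	}
	return best, found
}

func main() {
	images := []image{
		{Name: "base-image-v1", Owner: "owner-a", Created: "2024-01-10T00:00:00Z"},
		{Name: "base-image-v2", Owner: "owner-a", Created: "2024-03-05T00:00:00Z"},
		{Name: "base-image-v3", Owner: "owner-b", Created: "2024-06-01T00:00:00Z"},
	}
	if img, ok := latestImage(images, "base-image-", "owner-a"); ok {
		fmt.Println(img.Name)
	}
}
```

Filtering on the owner as well as the name matters here: without it, an
image published by an unrelated account under a matching name could be
picked up as the base image.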
Notes:
- this patch only contains the implementation for AWS and GCP; Azure
and IBM should also be implemented
- a CI mechanism should be built to automatically build all images when
there is a change in the startup scripts (either GitHub upon merge to
master or TeamCity nightly runs)
- there is currently no built-in way to deprecate/cleanup previous
images since they might still be used on older branches; a cleanup
routine should be considered if/when the number of images gets out of
hand
Beyond this first iteration, a concept of "pre-bake only snippets"
should come next: snippets that are only executed at pre-baking time and
not at runtime even if there is no pre-baked image.
These snippets would contain ad hoc roachtest setups
(building/pre-installing third-party tools like Prometheus/Grafana,
Jepsen, Kafka CLI, ...), which would remove the need for these tests to
build/install third-party dependencies at runtime if the test is
running on an instance supported by a pre-baked image (see #62066 as an
example).
Epic: none
Informs: #150144
Release note: None