Description
Feature Area
/area components
What feature would you like to see?
The HyperparameterTuningJobRunOp component from Google Cloud Pipeline Components allows users to run hyperparameter tuning jobs on Vertex AI.
Vertex AI also supports persistent resources, which let users reserve compute capacity ahead of time. This is particularly useful when GPU availability is limited.
Currently, the HyperparameterTuningJobRunOp component does not support specifying a persistent_resource_id. In contrast, the CustomTrainingJobOp component already supports this parameter.
I request support for a persistent_resource_id argument in HyperparameterTuningJobRunOp so that tuning jobs can be scheduled on reserved resources.
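For reference, here is a minimal sketch of where such a value would end up. It assumes the Vertex AI REST representation, in which a HyperparameterTuningJob's `trialJobSpec` is a `CustomJobSpec` and `CustomJobSpec` carries a `persistentResourceId` field. The helper `build_hpt_job_payload`, the image URI, and the study settings are hypothetical illustrations, not the component's actual implementation.

```python
from typing import Any, Dict, List, Optional


def build_hpt_job_payload(
    display_name: str,
    study_spec: Dict[str, Any],
    worker_pool_specs: List[Dict[str, Any]],
    max_trial_count: int,
    parallel_trial_count: int,
    persistent_resource_id: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a HyperparameterTuningJob body in Vertex AI REST (camelCase) form.

    Hypothetical helper: shows where a persistent_resource_id argument would
    be placed if HyperparameterTuningJobRunOp accepted one.
    """
    trial_job_spec: Dict[str, Any] = {"workerPoolSpecs": worker_pool_specs}
    if persistent_resource_id:
        # Assumption: CustomJobSpec.persistentResourceId makes every trial
        # run on the named persistent resource instead of on-demand VMs.
        trial_job_spec["persistentResourceId"] = persistent_resource_id
    return {
        "displayName": display_name,
        "studySpec": study_spec,
        "maxTrialCount": max_trial_count,
        "parallelTrialCount": parallel_trial_count,
        "trialJobSpec": trial_job_spec,
    }


payload = build_hpt_job_payload(
    display_name="hpt-example",
    study_spec={
        "metrics": [{"metricId": "accuracy", "goal": "MAXIMIZE"}],
        "parameters": [
            {
                "parameterId": "learning_rate",
                "doubleValueSpec": {"minValue": 1e-4, "maxValue": 1e-1},
                "scaleType": "UNIT_LOG_SCALE",
            }
        ],
    },
    worker_pool_specs=[
        {
            "machineSpec": {
                "machineType": "n1-standard-8",
                "acceleratorType": "NVIDIA_TESLA_T4",
                "acceleratorCount": 1,
            },
            "replicaCount": 1,
            # Hypothetical training image.
            "containerSpec": {"imageUri": "us-docker.pkg.dev/my-project/my-repo/trainer:latest"},
        }
    ],
    max_trial_count=8,
    parallel_trial_count=2,
    persistent_resource_id="cluster-20251112-143916",
)
print(payload["trialJobSpec"]["persistentResourceId"])  # cluster-20251112-143916
```

Leaving `persistent_resource_id` unset keeps the current behavior, since the field is simply omitted from `trialJobSpec`.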
What is the use case or pain point?
GPU resource availability on Google Cloud can fluctuate significantly. We frequently encounter GPU instance shortages, which leads to job scheduling failures for our hyperparameter tuning tasks.
To reliably execute these tuning jobs, we need the ability to schedule them on persistent resources. This would align the component's capabilities with CustomTrainingJobOp and underlying Vertex AI features.
Example (Desired Usage):
```python
from google_cloud_pipeline_components.v1.hyperparameter_tuning_job import HyperparameterTuningJobRunOp

hpt_op = HyperparameterTuningJobRunOp(
    ...,
    # Proposed argument; not accepted by the component today.
    persistent_resource_id="cluster-20251112-143916",
)
```
Is there a workaround currently?
No. Users of google-cloud-pipeline-components cannot schedule hyperparameter tuning jobs on persistent resources using the high-level HyperparameterTuningJobRunOp.
The only alternative is to create a custom component to handle this, which adds unnecessary complexity and maintenance overhead.
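To illustrate the overhead, here is a rough sketch of what such a custom component would have to do itself: take a HyperparameterTuningJob body, pin its trials to the persistent resource, and target the `hyperparameterTuningJobs` create endpoint. It assumes the `trialJobSpec.persistentResourceId` field exists in the v1 API and that the persistent resource lives in the same project and location as the job; `hpt_create_request`, the project, and the location are hypothetical.

```python
import copy
import json
from typing import Any, Dict, Tuple


def hpt_create_request(
    project: str,
    location: str,
    job_payload: Dict[str, Any],
    persistent_resource_id: str,
) -> Tuple[str, str]:
    """Return (url, json_body) for a hyperparameterTuningJobs.create call.

    Hypothetical helper for a custom component: copies an existing
    HyperparameterTuningJob body and pins its trials to a persistent resource.
    """
    body = copy.deepcopy(job_payload)  # leave the caller's dict untouched
    trial_spec = body.setdefault("trialJobSpec", {})
    # Assumption: same project/location as the tuning job itself.
    trial_spec["persistentResourceId"] = persistent_resource_id
    url = (
        f"https://{location}-aiplatform.googleapis.com/v1/"
        f"projects/{project}/locations/{location}/hyperparameterTuningJobs"
    )
    return url, json.dumps(body)


original = {"displayName": "hpt-example", "trialJobSpec": {"workerPoolSpecs": []}}
url, body = hpt_create_request(
    project="my-project",
    location="us-central1",
    job_payload=original,
    persistent_resource_id="cluster-20251112-143916",
)
print(url)
# https://us-central1-aiplatform.googleapis.com/v1/projects/my-project/locations/us-central1/hyperparameterTuningJobs
```

The component would still need to send this request with authenticated credentials (for example, a `google.auth` `AuthorizedSession`), poll the returned job until it finishes, and surface its outputs to the pipeline. HyperparameterTuningJobRunOp already handles all of that.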