@@ -47,7 +47,7 @@ steps:
Params:
action: start
labelSelector: app.kubernetes.io/created-by = kuberay-operator
-measurmentInterval: 1s
+measurementInterval: 1s

- name: Creating RayJobs for PyTorch MNIST fine-tuning
phases:
@@ -23,7 +23,7 @@ spec:
effect: "NoSchedule"
containers:
- name: ray-submitter
-image: rayproject/ray:2.41.0
+image: {{.Image}}
rayClusterSpec:
rayVersion: '2.9.3'
headGroupSpec:
@@ -38,7 +38,7 @@ spec:
effect: "NoSchedule"
containers:
- name: ray-head
-image: rayproject/ray:2.41.0
+image: {{.Image}}
ports:
- containerPort: 6379
name: gcs-server
@@ -65,10 +65,10 @@ spec:
effect: "NoSchedule"
containers:
- name: ray-worker
-image: rayproject/ray:2.41.0
+image: {{.Image}}
resources:
limits:
-nvidia.com/gpu: 1
+nvidia.com/gpu: {{.JobGPU}}
requests:
-nvidia.com/gpu: 1
+nvidia.com/gpu: {{.JobGPU}}
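
To preview what the templated manifest looks like once the placeholders are filled in, a rough Python sketch follows. It is only an approximation: ClusterLoader2 renders these files with Go templates, whereas this uses a plain regex substitution. The helper name `render_placeholders` is hypothetical, and the values passed in simply reproduce the previously hard-coded image and GPU count.

```python
import re


def render_placeholders(template: str, params: dict) -> str:
    """Fill Go-template style {{.Name}} placeholders from params.

    Hypothetical helper for previewing a manifest; this is not CL2's
    actual template engine and supports only simple {{.Name}} fields.
    """
    def substitute(match):
        key = match.group(1)
        if key not in params:
            # Fail loudly rather than leave a placeholder in the manifest.
            raise KeyError("no value supplied for placeholder " + match.group(0))
        return str(params[key])

    return re.sub(r"\{\{\s*\.(\w+)\s*\}\}", substitute, template)


snippet = "image: {{.Image}}\nnvidia.com/gpu: {{.JobGPU}}"
# Example values only: these match what the manifest hard-coded before this change.
rendered = render_placeholders(
    snippet, {"Image": "rayproject/ray:2.41.0", "JobGPU": 1}
)
print(rendered)
```

Raising on a missing key is a deliberate choice here, so that a forgotten override shows up before the manifest is applied.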

@@ -152,7 +152,8 @@ def install_ray_dependencies(self):
from "<cl2_config_dir>/ray/".
- Waits for operator and mock-head pods to be ready.
"""
-config_dir = os.path.join("./clusterloader2/job_controller/config", "ray")
+logger.info("cl2 config dir: %s", self.cl2_config_dir)
+config_dir = os.path.join(self.cl2_config_dir, "ray")
values_file = os.path.join(config_dir, "values.yaml")

# Install KubeRay operator via Helm
@@ -180,6 +181,8 @@ def install_ray_dependencies(self):
"--install",
"kuberay-operator",
"kuberay/kuberay-operator",
+"--version",
+"1.4.2",
"--namespace",
"kuberay-system",
"--create-namespace",
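
For context on the version pin, here is a minimal sketch of how the full Helm argument list might be assembled. Only the arguments from `--install` through `--create-namespace` appear in this hunk. The leading `helm upgrade`, the `--values` flag, and the function name `build_kuberay_install_command` are assumptions for illustration.

```python
def build_kuberay_install_command(chart_version: str, values_file: str) -> list:
    """Build an argument list for installing the KubeRay operator chart.

    Hypothetical helper: the real script may assemble this differently.
    """
    return [
        "helm",
        "upgrade",
        "--install",
        "kuberay-operator",
        "kuberay/kuberay-operator",
        "--version",
        chart_version,  # pinning avoids silently picking up a newer chart
        "--namespace",
        "kuberay-system",
        "--create-namespace",
        "--values",  # assumption: the values file is passed this way
        values_file,
    ]


command = build_kuberay_install_command("1.4.2", "ray/values.yaml")
print(" ".join(command))
```

Taking the chart version as a parameter rather than a literal would let a pipeline variable override it later without touching the script.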
@@ -283,6 +286,9 @@ def add_configure_subparser_arguments(parser):
type=str,
help="Timeout before failing the scale up test",
)
+parser.add_argument(
+"--cl2_config_dir", type=str, help="Path to the CL2 config directory"
+)
parser.add_argument(
"--cl2_override_file",
type=str,
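
One thing to keep in mind with the new flag: as declared in this diff it is optional, so if it is omitted argparse stores `None`, and `os.path.join(None, "ray")` raises a `TypeError`. The sketch below shows one way to guard against that, assuming the parsed value is what ends up in `self.cl2_config_dir`. The `required=True` setting and the helper name `resolve_ray_config_paths` are suggestions, not part of this change.

```python
import argparse
import os


def resolve_ray_config_paths(cl2_config_dir: str):
    """Return (config_dir, values_file) under the given CL2 config directory.

    Hypothetical helper mirroring the path logic in install_ray_dependencies.
    """
    config_dir = os.path.join(cl2_config_dir, "ray")
    values_file = os.path.join(config_dir, "values.yaml")
    return config_dir, values_file


parser = argparse.ArgumentParser()
parser.add_argument(
    "--cl2_config_dir",
    type=str,
    required=True,  # assumption: fail at parse time instead of inside os.path.join
    help="Path to the CL2 config directory",
)

# Example invocation using the path that was hard-coded before this change.
args = parser.parse_args(
    ["--cl2_config_dir", "./clusterloader2/job_controller/config"]
)
config_dir, values_file = resolve_ray_config_paths(args.cl2_config_dir)
print(config_dir)
print(values_file)
```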
1 change: 1 addition & 0 deletions steps/engine/clusterloader2/job_controller/execute.yml
@@ -21,6 +21,7 @@ steps:
--dra_enabled ${ENABLE_DRA:-False} \
--ray_enabled ${ENABLE_RAY:-False} \
--job_gpu ${JOB_GPU:-0} \
+--cl2_config_dir ${CL2_CONFIG_DIR} \
--cl2_override_file ${CL2_CONFIG_DIR}/overrides.yaml

PYTHONPATH=$PYTHONPATH:$(pwd) python3 $PYTHON_SCRIPT_FILE execute \