Commit 2d6841e

ci: install CATS from local checkout, use pre-built cluster images
1 parent b00ce73 commit 2d6841e

9 files changed: 148 additions, 37 deletions

.github/workflows/cluster-tests.yml

Lines changed: 16 additions & 3 deletions
@@ -7,25 +7,38 @@ on:
     paths:
       - '.github/workflows/cluster-tests.yml'
       - 'cluster/**'
+      - 'cats/**.py'
+      - 'pyproject.toml'
       - '!cluster/README.md'
   pull_request:
     branches: [ main ]
     paths:
       - '.github/workflows/cluster-tests.yml'
       - 'cluster/**'
+      - 'cats/**.py'
+      - 'pyproject.toml'
       - '!cluster/README.md'
   workflow_dispatch:

 jobs:
   build:

     runs-on: ubuntu-latest
+    permissions:
+      contents: read
+      packages: read
     steps:
     - uses: actions/checkout@v4
-    - name: Build slurm container
+    - name: Log in to GHCR
+      uses: docker/login-action@v3
+      with:
+        registry: ghcr.io
+        username: ${{ github.actor }}
+        password: ${{ secrets.GITHUB_TOKEN }}
+    - name: Start slurm cluster and install CATS
       run: |
-        ./cluster/clone.sh
-        ./cluster/build.sh
+        ./cluster/start.sh
+        ./cluster/install_cats.sh
     - name: Run tests
       run: |
         sleep 30 # wait for cluster to come up
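
Note on the "Run tests" step: the fixed `sleep 30` assumes the cluster is ready within 30 seconds. A polling loop is one possible alternative. The sketch below is only an illustration: `wait_for` is a hypothetical helper that is not part of this commit, and using `sinfo` inside the `slurmctld` container as the readiness probe is an assumption about what "cluster is up" should mean here.

```shell
#!/usr/bin/env bash
# wait_for is a hypothetical helper (not part of the CATS repository).
# Usage: wait_for ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Retries COMMAND until it exits 0; returns 1 once ATTEMPTS are used up.
wait_for() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    # Only pause when another attempt will follow.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Assumed usage in the "Run tests" step, replacing the fixed sleep:
#   wait_for 30 2 docker exec slurmctld sinfo
```

With this shape the step returns as soon as the probe succeeds and fails explicitly if the cluster never responds, rather than continuing after a fixed delay.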

cluster/README.md

Lines changed: 18 additions & 15 deletions
@@ -1,25 +1,31 @@
 # Cluster tests

-This folder contains scripts to setup an ephemeral SLURM cluster to test cats
-in a more realistic setting than the current integration tests that use
-macking. The setup builds upon work from upstream
+This folder contains scripts to setup an ephemeral SLURM cluster to test
+cats in a more realistic setting than the current integration tests that
+use mocking. The setup builds upon work from upstream
 https://github.com/giovtorres/slurm-docker-cluster with a patched
-[Dockerfile](Dockerfile) that installs the latest release of CATS and makes it
-available in the cluster.
+Dockerfile that installs jq and uv to make CATS installation easier. Our
+patches are maintained at
+https://github.com/GreenScheduler/slurm-docker-cluster.
+
+## Pre-requisites
+
+Currently slurm-docker-cluster is only built against linux/amd64 so you
+will need to be on a 64-bit machine if you want to test this locally. You
+will also need docker installed.

 ## Setup

 Clone this repository (GreenScheduler/cats) and then run

 ```shell
-./cats/clone.sh
-./cats/build.sh
+./cluster/start.sh
+```
+to fetch the `ghcr.io/greenscheduler/slurm-docker-cluster:latest` image
+and start the cluster. You can now install cats locally from the current checkout:
+```shell
+./cluster/install_cats.sh
 ```
-to clone the slurm-docker-cluster repo, patch the Dockerfile to install CATS,
-build and start the cluster. Note that this requires `docker` and `docker
-compose` to be present. Currently this compiles a specific SLURM version, so
-this may take a while on older computers. When developing locally, you should
-only need to do this once, unless you update the Dockerfile.

 Once the cluster is built and running, then you can run the following to get
 access to the control node:
@@ -28,9 +34,6 @@ access to the control node:
 docker exec -it slurmctld bash
 ```

-For more information about slurm-docker-cluster, consult the upstream
-repository.
-
 ## Tests

 An automated testing script is supplied which shows programmatic interaction

cluster/build.sh

Lines changed: 0 additions & 9 deletions
This file was deleted.

cluster/cleanup.sh

Lines changed: 2 additions & 3 deletions
@@ -2,7 +2,6 @@
 # Cleans up resources and shuts down containers, useful for local development of slurm-docker-cluster
 set -eou pipefail

+pushd cluster
 docker compose down
-if [ -d slurm-docker-cluster ]; then
-  rm -r slurm-docker-cluster
-fi
+popd

cluster/clone.sh

Lines changed: 0 additions & 6 deletions
This file was deleted.

cluster/docker-compose.yml

Lines changed: 92 additions & 0 deletions
@@ -0,0 +1,92 @@
+services:
+  mysql:
+    image: mariadb:10.11
+    hostname: mysql
+    container_name: mysql
+    environment:
+      MYSQL_RANDOM_ROOT_PASSWORD: "yes"
+      MYSQL_DATABASE: slurm_acct_db
+      MYSQL_USER: slurm
+      MYSQL_PASSWORD: password
+    volumes:
+      - var_lib_mysql:/var/lib/mysql
+    networks:
+      - slurm-network
+
+  slurmdbd:
+    image: ghcr.io/greenscheduler/slurm-docker-cluster:latest
+    command: ["slurmdbd"]
+    container_name: slurmdbd
+    hostname: slurmdbd
+    volumes:
+      - etc_munge:/etc/munge
+      - etc_slurm:/etc/slurm
+      - var_log_slurm:/var/log/slurm
+    expose:
+      - "6819"
+    depends_on:
+      - mysql
+    networks:
+      - slurm-network
+
+  slurmctld:
+    image: ghcr.io/greenscheduler/slurm-docker-cluster:latest
+    command: ["slurmctld"]
+    container_name: slurmctld
+    hostname: slurmctld
+    volumes:
+      - etc_munge:/etc/munge
+      - etc_slurm:/etc/slurm
+      - slurm_jobdir:/data
+      - var_log_slurm:/var/log/slurm
+    expose:
+      - "6817"
+    depends_on:
+      - "slurmdbd"
+    networks:
+      - slurm-network
+
+  c1:
+    image: ghcr.io/greenscheduler/slurm-docker-cluster:latest
+    command: ["slurmd"]
+    hostname: c1
+    container_name: c1
+    volumes:
+      - etc_munge:/etc/munge
+      - etc_slurm:/etc/slurm
+      - slurm_jobdir:/data
+      - var_log_slurm:/var/log/slurm
+    expose:
+      - "6818"
+    depends_on:
+      - "slurmctld"
+    networks:
+      - slurm-network
+
+  c2:
+    image: ghcr.io/greenscheduler/slurm-docker-cluster:latest
+    command: ["slurmd"]
+    hostname: c2
+    container_name: c2
+    volumes:
+      - etc_munge:/etc/munge
+      - etc_slurm:/etc/slurm
+      - slurm_jobdir:/data
+      - var_log_slurm:/var/log/slurm
+    expose:
+      - "6818"
+    depends_on:
+      - "slurmctld"
+    networks:
+      - slurm-network
+
+volumes:
+  etc_munge:
+  etc_slurm:
+  slurm_jobdir:
+  var_lib_mysql:
+  var_log_slurm:
+
+networks:
+  slurm-network:
+    driver: bridge

cluster/install_cats.sh

Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
+#!/usr/bin/env bash
+# Install cats on the slurm custer
+# This relies on a cluster already setup and running, if not run
+# ./cluster/start.sh
+set -eou pipefail
+
+docker exec slurmctld mkdir /tmp/cats
+for file in pyproject.toml ./cats; do
+  docker cp "$file" slurmctld:/tmp/cats
+done
+docker exec -it slurmctld uv tool install /tmp/cats
+docker exec slurmctld cp /root/.local/bin/cats /usr/local/bin/cats
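
Note on `docker exec -it` in this script: the `-t` flag asks Docker to allocate a TTY, which can fail when stdin is not a terminal, as is typically the case on CI runners. One possible way to keep the script usable both interactively and in CI is to choose the flags at run time. The sketch below is an assumption-laden illustration, not the commit's method: `exec_flags` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# exec_flags is a hypothetical helper: it prints the flags to pass to
# `docker exec`, requesting a TTY only when stdin is a terminal.
exec_flags() {
  if [ -t 0 ]; then
    printf '%s\n' "-it"
  else
    printf '%s\n' "-i"
  fi
}

# Assumed usage in cluster/install_cats.sh:
#   docker exec "$(exec_flags)" slurmctld uv tool install /tmp/cats
```

Only stdin is checked because the helper is meant to be called inside a command substitution, where stdout is a pipe rather than a terminal.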

cluster/start.sh

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+#!/bin/bash
+# Starts cluster
+set -eou pipefail
+pushd cluster
+docker compose pull
+docker compose up -d
+popd
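
Note on `pushd cluster`: this works when the script is invoked from the repository root, as the README instructs (`./cluster/start.sh`), and fails from any other directory. If running from elsewhere ever matters, resolving the script's own directory is one option. The sketch below is illustrative only; `script_dir` is a hypothetical helper that does not appear in this commit.

```shell
#!/usr/bin/env bash
# script_dir is a hypothetical helper: it prints the absolute path of the
# directory containing the file given as its first argument.
script_dir() {
  # The subshell keeps the caller's working directory unchanged.
  ( cd "$(dirname "$1")" && pwd )
}

# Assumed usage at the top of cluster/start.sh, instead of `pushd cluster`:
#   cd "$(script_dir "${BASH_SOURCE[0]}")"
#   docker compose pull
#   docker compose up -d
```

`cluster/install_cats.sh` would still need to run from the repository root, since it copies `pyproject.toml` and `./cats` by relative path.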

cluster/tests.sh

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 #!/usr/bin/env bash
 # Run tests to check if slurm picks up begin time set by CATS
 # This relies on a cluster already setup and running, if not run
-# ./cluster/build.sh
+# ./cluster/start.sh
 set -eou pipefail

 # Step a) Run cats inside the slurmctld container and extract start time
