From 876e69764180100eb88670ea30658154736bf25f Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Johannes=20K=C3=B6ster?=
Date: Tue, 18 Mar 2025 14:46:10 +0100
Subject: [PATCH 1/2] feat: cleanup instructions

---
 docs/slurm.md | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/docs/slurm.md b/docs/slurm.md
index 9d9186b..80eb62e 100644
--- a/docs/slurm.md
+++ b/docs/slurm.md
@@ -271,6 +271,36 @@ If the option `--gpus` is omitted, slurm does not set the `CUDA_VISIBLE_DEVICES`
 
 ## Job submission etiquette
 
+### Cleaning up /local/work/$USER
+
+Storing files under `/local/work/$USER` while computing on a node is highly recommended, because it avoids harmful I/O patterns on the shared network filesystem. Files left behind there should be cleaned up from time to time.
+This can be done using `sinfo` and `srun`.
+
+First, get the list of nodes in the cluster via:
+
+```sh
+sinfo
+```
+
+An example output could be:
+```sh
+PARTITION  AVAIL  TIMELIMIT  NODES  STATE  NODELIST
+IKIM*      up     infinite       2  drain  c[78,107]
+IKIM*      up     infinite       4  mix    c[23,76-77,99]
+IKIM*      up     infinite       8  alloc  c[20,24-29,57]
+IKIM*      up     infinite      24  idle   c[11,56,60,80-81,89,91-94,98,101,106,108-112,114-117,119-120]
+GPUampere  up     infinite       2  mix    g1-[1,6]
+GPUampere  up     infinite       5  alloc  g1-[2,4,7-8,10]
+```
+
+Then choose the nodes on which you expect to have data under `/local/work/$USER` and run the following command, replacing the node list with the nodes you want to clean up:
+
+```sh
+srun --wait 0 --nodelist "c[11,56,60,80-81,89,91-94,98,101,106,108-112,114-117,119-120]" bash -c "rm -rf /local/work/$USER"
+```
+
+Run this in a tmux session, because the command can take a while to complete.
+
 ### Setting a deadline
 
 If a job is expected to run continuously for many hours, a deadline should be specified with the option `--time`, even if just an overestimation. This information is especially valuable when all worker nodes are occupied as it allows other users to predict when their job will be scheduled. Accepted time formats include `minutes`, `minutes:seconds`, `hours:minutes:seconds`, `days-hours`, `days-hours:minutes` and `days-hours:minutes:seconds`.

From cdf31a7dd09695e62928507991ba9e9781c8bc4d Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Johannes=20K=C3=B6ster?=
Date: Tue, 18 Mar 2025 14:47:53 +0100
Subject: [PATCH 2/2] Update slurm.md

---
 docs/slurm.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/docs/slurm.md b/docs/slurm.md
index 80eb62e..72f2a0c 100644
--- a/docs/slurm.md
+++ b/docs/slurm.md
@@ -283,6 +283,7 @@ sinfo
 ```
 
 An example output could be:
+
 ```sh
 PARTITION  AVAIL  TIMELIMIT  NODES  STATE  NODELIST
 IKIM*      up     infinite       2  drain  c[78,107]