NOTE: This discussion was originally posted to the ISSM Forum, which has been decommissioned. It is reproduced here for reference. Please feel free to contribute to this discussion, as it seems that the original and follow-up questions were not answered, or start a new discussion.
CHRISTIExy
Hi there,
I'm trying to run ISSM programs in MATLAB, remotely on the Cedar server of Compute Canada. I added an md.cluster=computecanada(...) line to step 3 in ~/trunk/examples/shakti/runme.m, but it failed to run, generating the following messages:
checking model consistency
marshalling file moulin.bin
uploading input file and queuing script
Enter passphrase for key '/Users/green/.ssh/id_rsa':
moulin-07-20-2023-15-30-18-49823.tar.gz 0% 0 0.0KB/s --:-- ETA
moulin-07-20-2023-15-30-18-49823.tar.gz 100% 182KB 2.0MB/s 00:00
launching solution sequence on remote cluster
Enter passphrase for key '/Users/green/.ssh/id_rsa':
Due to MODULEPATH changes, the following have been reloaded:
1) openmpi/4.0.3
cedar1.cedar.computecanada.ca
sbatch: error: Invalid --mail-type specification
ssh -l myusername cedar.computecanada.ca "cd /home/myusername/scratch/trunk/execution && rm -rf ./moulin-07-20-2023-15-30-18-49823 && mkdir moulin-07-20-2023-15-30-18-49823 && cd moulin-07-20-2023-15-30-18-49823 && mv ../moulin-07-20-2023-15-30-18-49823.tar.gz ./ && tar -zxf moulin-07-20-2023-15-30-18-49823.tar.gz && hostname && sbatch moulin.queue ": Signal 127
waiting for /home/myusername/scratch/trunk/execution/moulin-07-20-2023-15-30-18-49823/moulin.lock hold on... (Ctrl+C to exit)
I followed the instructions at https://issm.ess.uci.edu/trac/issm/wiki/computecanada and added a file named cedar_settings.m in $ISSM_DIR/src/m. But I'm not sure that file is actually picked up by the function computecanada(), since in order to run the program I'm required to assign all of the variables in the function call myself, as shown above.
I'm trying to figure out what went wrong. Do you have any suggestions? Thanks for your advice!
mathieumorlighem
Hi CHRISTIExy
it seems like the problem is sbatch: error: Invalid --mail-type specification. Make sure to set md.cluster.mailtype to something that the server supports (see https://slurm.schedmd.com/sbatch.html), maybe 'ALL'?
Cheers
Mathieu
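For reference, the fix is a one-line setting on the cluster object; a minimal sketch (the listed values are ones sbatch's --mail-type accepts per the Slurm documentation):
md.cluster.mailtype = 'ALL';   % must be a value sbatch --mail-type accepts, e.g. NONE, BEGIN, END, FAIL, ALL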
CHRISTIExy
mathieumorlighem,
Thank you for your reply! I set md.cluster.mailtype to 'ALL' and that problem is resolved. But there's another issue: even though I'm sure I entered the correct passphrase, it keeps asking for it. I've set up an SSH key, but I still need to enter my passphrase to access the server. Furthermore, the program doesn't appear to be running, since the moulin.log and moulin.outlog files cannot be found. The following is part of the output from running the program.
cedar1.cedar.computecanada.ca
Submitted batch job 7894033
sbatch: NOTE: Your memory request of 2048M was likely submitted as 2G. Please note that Slurm interprets memory requests denominated in G as multiples of 1024M, not 1000M.
waiting for /home/myusername/scratch/trunk/execution/moulin-07-21-2023-09-54-19-49823/moulin.lock hold on... (Ctrl+C to exit)
checking for job completion (time: 0 min 5 sec)
Enter passphrase for key '/Users/green/.ssh/id_rsa':
checking for job completion (time: 0 min 16 sec)
Enter passphrase for key '/Users/green/.ssh/id_rsa':
checking for job completion (time: 0 min 51 sec)
Enter passphrase for key '/Users/green/.ssh/id_rsa':
checking for job completion (time: 1 min 4 sec)
Enter passphrase for key '/Users/green/.ssh/id_rsa':
mathieumorlighem
Hi!
This behavior is actually expected, but we can turn it off. In short, ISSM checks for job completion by looking for a .lock file in the execution directory on the cluster, and (by default) it checks every 5 seconds. Since your key has a passphrase, you probably don't want your local machine to do that. To turn it off, set md.settings.waitonlock = 0. You will need to download the results manually once the job is done.
Best
Mathieu
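In practice, the submission side of what Mathieu describes looks roughly like this (a sketch; the Transient solution and loadresultsfromcluster call mirror the moulin run used elsewhere in this thread):
md.settings.waitonlock = 0;   % submit without polling the cluster for the .lock file
md = solve(md, 'Transient');  % returns right after submission
% ...once the Slurm job has finished on the cluster, download the results manually:
md = loadresultsfromcluster(md);
% (if MATLAB no longer knows which runtime directory was used, see the nowaitlock steps later in the thread)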
CHRISTIExy
mathieumorlighem,
Thanks for your answer! That message no longer appears, but I ran into another problem: the moulin.outbin and moulin.outlog files don't appear to exist. When I run the program on my local computer without using the cluster, this error doesn't occur. Could you help me with this problem? Many thanks.
>> md=loadresultsfromcluster(md)
Enter passphrase for key '/Users/green/.ssh/id_rsa':
scp: /home/myusername/scratch/trunk/execution/moulin-07-24-2023-10-38-46-516//{moulin.outlog,moulin.errlog,moulin.outbin}: No such file or directory
Warning: issmscpin error message: could not scp moulin.outlog
In issmscpin (line 65)
In computecanada/Download (line 129)
In loadresultsfromcluster (line 47)
Warning: issmscpin error message: could not scp moulin.errlog
In issmscpin (line 65)
In computecanada/Download (line 129)
In loadresultsfromcluster (line 47)
Warning: issmscpin error message: could not scp moulin.outbin
In issmscpin (line 65)
In computecanada/Download (line 129)
In loadresultsfromcluster (line 47)
Warning:
Binary file moulin.outbin not found!
This typically happens when the run crashed.
Please check for error messages above or in the outlog
In loadresultsfromdisk (line 16)
In loadresultsfromcluster (line 50)
mathieumorlighem
Apologies, I forgot to mention that there is a subtlety: MATLAB will not remember how it named the directory where the job is running. The easiest fix is to follow the steps at https://issm.ess.uci.edu/trac/issm/wiki/nowaitlock. Let me know if you have any questions!
Mathieu
CHRISTIExy
I added this step 4 according to the website, but I'm not sure if it's correct.
if any(steps==4)
    md=loadmodel('MoulinParam2');
    md.cluster=computecanada('port', 0, 'login', 'myusername', 'name', 'narval.computecanada.ca', ...
        'time', 7, 'codepath', '/home/myusername/scratch/trunk/bin', ...
        'executionpath', '/home/myusername/scratch/trunk/execution', ...
        'projectaccount', 'def-nameofmyprofessor', 'mailtype', 'ALL');
    md.settings.waitonlock = 0;
    md.miscellaneous.name = 'moulin';
    md=solve(md,'Transient','runtimename',false);
    save MoulinTransient md
end
Once I enter the passphrase, it immediately tells me to use md=loadresultsfromcluster(md) to load the results. The same error occurred when trying to load them. I also tried using 'loadonly' in step 4, but it produced the same error, as follows.
narval1.narval.calcul.quebec
Submitted batch job 19481855
Model results must be loaded manually with md=loadresultsfromcluster(md);
md=loadresultsfromcluster(md)
Enter passphrase for key '/Users/green/.ssh/id_rsa':
scp: /home/puxinyi/scratch/trunk/execution/moulin//{moulin.outlog,moulin.errlog,moulin.outbin}: No such file or directory
Warning: issmscpin error message: could not scp moulin.outlog
In issmscpin (line 65)
In computecanada/Download (line 129)
In loadresultsfromcluster (line 47)
Warning: issmscpin error message: could not scp moulin.errlog
In issmscpin (line 65)
In computecanada/Download (line 129)
In loadresultsfromcluster (line 47)
Warning: issmscpin error message: could not scp moulin.outbin
In issmscpin (line 65)
In computecanada/Download (line 129)
In loadresultsfromcluster (line 47)
Warning:
Binary file moulin.outbin not found!
This typically happens when the run crashed.
Please check for error messages above or in the outlog
In loadresultsfromdisk (line 16)
In loadresultsfromcluster (line 50)
mathieumorlighem
CHRISTIExy,
You forgot the most important part 🙂
Once the job is completed, you should call the same code with 'loadonly',1 (which is an option of solve).
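Put together, the two-pass pattern Mathieu describes might look roughly like this (a sketch; the solve options and names follow the step 4 script above):
% Pass 1: submit only, with a fixed run directory name and no waiting
md.settings.waitonlock = 0;
md.miscellaneous.name = 'moulin';
md = solve(md, 'Transient', 'runtimename', false, 'loadonly', 0);
% Pass 2 (run later, once Slurm reports the job finished): same call, but load instead of submit
md = solve(md, 'Transient', 'runtimename', false, 'loadonly', 1);
save MoulinTransient md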
CHRISTIExy
mathieumorlighem,
Thank you for reminding me. I set loadonly = 0 in the first simulation and then ran it again with loadonly = 1, but it still didn't work and generated the same error (file not found). Is there anything wrong with the scripts?
if any(steps==4)
    md=loadmodel('MoulinParam2');
    % Run the model remotely on the Compute Canada cluster
    md.cluster=computecanada('port', 0, 'login', 'myusername', 'name', 'narval.computecanada.ca', ...
        'time', 7, 'codepath', '/home/myusername/scratch/trunk/bin', ...
        'executionpath', '/home/myusername/scratch/trunk/execution', ...
        'projectaccount', 'def-nameofmyprofessor', 'mailtype', 'ALL');
    loadonly = 1;
    md.settings.waitonlock = 0;
    md.miscellaneous.name = 'moulin';
    md=solve(md,'Transient','runtimename',false,'loadonly',loadonly);
    if loadonly
        save MoulinTransient md
    end
end
mathieumorlighem
CHRISTIExy,
Are you sure that the job completed? The path should be md.cluster.executionpath + /moulin: make sure you have an outlog, errlog, and outbin file there. If they are not there, that means that something went wrong.
Mathieu
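One quick way to check this directly from MATLAB (a sketch; the username, host, and path are the placeholders used earlier in the thread, and it assumes the ssh key is available):
system(['ssh myusername@cedar.computecanada.ca ', ...
    '"ls -l /home/myusername/scratch/trunk/execution/moulin"']);
% expect moulin.outlog, moulin.errlog, and moulin.outbin once the job has finished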
CHRISTIExy
mathieumorlighem,
I found something related to the problem of the missing outbin, outlog, and errlog files. When I run the same program locally, without computecanada, it only works if I first launch MATLAB from the Terminal like this:
source $ISSM_DIR/etc/environment.sh
cd /Applications/MATLAB_R2023a.app/bin
./matlab
Only after opening MATLAB this way can I run the program properly; otherwise I get a '.outbin not found' error. But I'm not sure how to avoid this error on computecanada. I tried 'loadonly' and made sure the program had finished running on computecanada:
[myusername@cedar5 trunk]$ squeue -j 8203199
JOBID USER ACCOUNT NAME ST TIME_LEFT NODES CPUS TRES_PER_N MIN_MEM NODELIST (REASON)
8203199 myusername def-xxx moulin R 19:58 1 8 N/A 2G cdr556 (Prolog)
[myusername@cedar5 trunk]$ squeue -j 8203199
JOBID USER ACCOUNT NAME ST TIME_LEFT NODES CPUS TRES_PER_N MIN_MEM NODELIST (REASON)
8203199 myusername def-xxx moulin CG 19:51 1 8 N/A 2G cdr556 (NonZeroExitCode)
[myusername@cedar5 trunk]$ squeue -j 8203199
JOBID USER ACCOUNT NAME ST TIME_LEFT NODES CPUS TRES_PER_N MIN_MEM NODELIST (REASON)
Nonetheless, I ran into the same problem:
scp: /home/puxinyi/scratch/trunk/execution/moulin-07-25-2023-10-18-49-28503//{moulin.outlog,moulin.errlog,moulin.outbin}: No such file or directory
Also, I checked the .errlog on computecanada:
cat /home/myusername/scratch/trunk/execution/moulin-07-25-2023-09-59-34-24631/moulin.errlog
[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
[0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[0]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind
[0]PETSC ERROR: or try http://valgrind.org/ on GNU/linux and Apple MacOS to find memory corruption errors
[0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run
[0]PETSC ERROR: to get more information on the crash.
srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
[0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[0]PETSC ERROR: Signal received
[0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
[0]PETSC ERROR: Petsc Release Version 3.17.1, Apr 28, 2022
[0]PETSC ERROR: /home/myusername/scratch/trunk/bin/issm.exe on a named cdr556.int.cedar.computecanada.ca by myusername Tue Jul 25 10:00:24 2023
[0]PETSC ERROR: Configure options --prefix=/cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/MPI/gcc9/openmpi4/petsc/3.17.1 --with-hdf5=1 --with-hdf5-dir=/cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/MPI/gcc9/openmpi4/hdf5-mpi/1.10.6 --with-cxx-dialect=C++14 --with-memalign=64 --with-python=no --with-mpi4py=no --download-party=1 --download-superlu_dist=1 --download-SuiteSparse=1 --download-superlu=1 --download-metis=1 --download-ptscotch=1 --download-hypre=1 --download-spooles=1 --download-chaco=1 --download-strumpack=1 --download-spai=1 --download-parmetis=1 --download-slepc=1 --download-hpddm=1 --download-ml=1 --download-prometheus=1 --download-triangle=1 --download-mumps=1 --download-mumps-shared=0 --download-ptscotch-shared=0 --download-superlu-shared=0 --download-superlu_dist-shared=0 --download-parmetis-shared=0 --download-metis-shared=0 --download-ml-shared=0 --download-SuiteSparse-shared=0 --download-hypre-shared=0 --download-prometheus-shared=0 --download-spooles-shared=0 --download-chaco-shared=0 --download-slepc-shared=0 --download-spai-shared=0 --download-party-shared=0 --with-cc=mpicc --with-cxx=mpicxx --with-c++-support --with-fc=mpifort --CFLAGS="-O2 -ftree-vectorize -march=core-avx2 -fno-math-errno -fPIC" --CXXFLAGS="-O2 -ftree-vectorize -march=core-avx2 -fno-math-errno -fPIC -DOMPI_SKIP_MPICXX -DMPICH_SKIP_MPICXX" --FFLAGS="-O2 -ftree-vectorize -march=core-avx2 -fno-math-errno -fPIC" --with-mpi=1 --with-build-step-np=8 --with-shared-libraries=1 --with-debugging=0 --with-pic=1 --with-x=0 --with-windows-graphics=0 --with-scalapack=1 --with-scalapack-lib="[/cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/MPI/gcc9/openmpi4/scalapack/2.1.0/lib/libscalapack.a,libflexiblas.a,libgfortran.a]" --with-blaslapack-lib="[/cvmfs/soft.computecanada.ca/easybuild/software/2020/Core/flexiblas/3.0.4/lib/libflexiblas.a,libgfortran.a]" --with-hdf5=1 --with-hdf5-dir=/cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/MPI/gcc9/openmpi4/hdf5-mpi/1.10.6 --with-fftw=1 --with-fftw-dir=/cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/MPI/gcc9/openmpi4/fftw-mpi/3.3.8
[0]PETSC ERROR: #1 User provided function() at unknown file:0
[0]PETSC ERROR: Run with -malloc_debug to check if memory corruption is causing the crash.
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 59.
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
In: PMI_Abort(59, N/A)
slurmstepd: error: *** STEP 8202307.0 ON cdr556 CANCELLED AT 2023-07-25T10:00:25 ***
srun: error: cdr556: task 0: Exited with exit code 16
srun: Terminating StepId=8202307.0
I hope this information is useful in solving the issue. Your reply is highly appreciated.
mathieumorlighem
OK, so you DO have an errlog on the supercomputer, but it shows a segmentation fault (weird, because you should also have a non-empty outlog). Have you ever been able to run anything involving ISSM on Compute Canada?
CHRISTIExy
mathieumorlighem,
So far I have only run it on my personal computer. I'm trying to run the SHAKTI model on the Cedar server of Compute Canada to make it run faster.
mathieumorlighem
CHRISTIExy,
Could you try one simple run, like test101 in test/NightlyRun/? Tell us what you get in the errlog and outlog.
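For reference, pointing a nightly test at the cluster just means setting md.cluster before the solve call in test/NightlyRun/test101.m; a minimal sketch, reusing the placeholder arguments from the step 4 script above (the 'time' value here is arbitrary):
md.cluster = computecanada('port', 0, 'login', 'myusername', ...
    'name', 'cedar.computecanada.ca', 'time', 1, ...
    'codepath', '/home/myusername/scratch/trunk/bin', ...
    'executionpath', '/home/myusername/scratch/trunk/execution', ...
    'projectaccount', 'def-nameofmyprofessor', 'mailtype', 'ALL');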
CHRISTIExy
mathieumorlighem,
I added the md.cluster=computecanada(...) line to the test101 file, as before. When running it, it generated this error:
test101
boundary conditions for stressbalance model: spc set as zero
no balancethickness.thickening_rate specified: values set as zero
uploading input file and queuing script
Enter passphrase for key '/Users/green/.ssh/id_rsa':
test101-07-26-2023-06-39-39-69827.tar.gz 0% 0 0.0KB/s --:-- ETA
test101-07-26-2023-06-39-39-69827.tar.gz 100% 78KB 280.6KB/s 00:00
launching solution sequence on remote cluster
Enter passphrase for key '/Users/green/.ssh/id_rsa':
Lmod is automatically replacing "intel/2020.1.217" with "gcc/9.3.0".
Due to MODULEPATH changes, the following have been reloaded:
1) blis/0.8.1 2) flexiblas/3.0.4 3) openmpi/4.0.3
narval1.narval.calcul.quebec
Submitted batch job 19559296
Model results must be loaded manually with md=loadresultsfromcluster(md);
Unrecognized field name "StressbalanceSolution".
Error in test101 (line 30)
(md.results.StressbalanceSolution.Vx),...
md.results
ans =
struct with no fields.
However, after deleting the md.cluster=computecanada() line, it ran without error. Is it possible that the error was triggered by the files not being found?
test101
boundary conditions for stressbalance model: spc set as zero
no balancethickness.thickening_rate specified: values set as zero
uploading input file and queuing script
launching solution sequence on remote cluster
Ice-sheet and Sea-level System Model (ISSM) version 4.22
(website: http://issm.jpl.nasa.gov/ contact: issm@jpl.nasa.gov)
call computational core:
write lock file:
FemModel initialization elapsed time: 0.041831
Total Core solution elapsed time: 0.103404
Linear solver elapsed time: 0.072022 (70%)
Total elapsed time: 0 hrs 0 min 0 sec
md.results
ans =
struct with fields:
StressbalanceSolution: [1×1 struct]
mathieumorlighem
OK, so it looks like it may actually be working, since you get Submitted batch job 19559296, but ISSM did not wait for the results. Can you check your execution directory on the cluster to see if you have a test101 directory, and see what the outlog and errlog look like?
CHRISTIExy
mathieumorlighem,
Of course. Here is the test101 execution directory and the file content.
ls /home/myusername/scratch/trunk/execution/test101-07-26-2023-06-39-39-69827
slurm-19559296.out test101.bin test101.queue
test101-07-26-2023-06-39-39-69827.tar.gz test101.errlog test101.toolkits
cat /home/myusername/scratch/trunk/execution/test101-07-26-2023-06-39-39-69827/test101.errlog
[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
[0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[0]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind
[0]PETSC ERROR: or try http://valgrind.org/ on GNU/linux and Apple MacOS to find memory corruption errors
[0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run
[0]PETSC ERROR: to get more information on the crash.
[0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[0]PETSC ERROR: Signal received
[0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
[0]PETSC ERROR: Petsc Release Version 3.17.1, Apr 28, 2022
[0]PETSC ERROR: /home/myusername/scratch/trunk/bin/issm.exe on a named nc10536.narval.calcul.quebec by myusername Wed Jul 26 09:44:53 2023
[0]PETSC ERROR: Configure options --prefix=/cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/MPI/gcc9/openmpi4/petsc/3.17.1 --with-hdf5=1 --with-hdf5-dir=/cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/MPI/gcc9/openmpi4/hdf5-mpi/1.10.6 --with-cxx-dialect=C++14 --with-memalign=64 --with-python=no --with-mpi4py=no --download-party=1 --download-superlu_dist=1 --download-SuiteSparse=1 --download-superlu=1 --download-metis=1 --download-ptscotch=1 --download-hypre=1 --download-spooles=1 --download-chaco=1 --download-strumpack=1 --download-spai=1 --download-parmetis=1 --download-slepc=1 --download-hpddm=1 --download-ml=1 --download-prometheus=1 --download-triangle=1 --download-mumps=1 --download-mumps-shared=0 --download-ptscotch-shared=0 --download-superlu-shared=0 --download-superlu_dist-shared=0 --download-parmetis-shared=0 --download-metis-shared=0 --download-ml-shared=0 --download-SuiteSparse-shared=0 --download-hypre-shared=0 --download-prometheus-shared=0 --download-spooles-shared=0 --download-chaco-shared=0 --download-slepc-shared=0 --download-spai-shared=0 --download-party-shared=0 --with-cc=mpicc --with-cxx=mpicxx --with-c++-support --with-fc=mpifort --CFLAGS="-O2 -ftree-vectorize -march=core-avx2 -fno-math-errno -fPIC" --CXXFLAGS="-O2 -ftree-vectorize -march=core-avx2 -fno-math-errno -fPIC -DOMPI_SKIP_MPICXX -DMPICH_SKIP_MPICXX" --FFLAGS="-O2 -ftree-vectorize -march=core-avx2 -fno-math-errno -fPIC" --with-mpi=1 --with-build-step-np=8 --with-shared-libraries=1 --with-debugging=0 --with-pic=1 --with-x=0 --with-windows-graphics=0 --with-scalapack=1 --with-scalapack-lib="[/cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/MPI/gcc9/openmpi4/scalapack/2.1.0/lib/libscalapack.a,libflexiblas.a,libgfortran.a]" --with-blaslapack-lib="[/cvmfs/soft.computecanada.ca/easybuild/software/2020/Core/flexiblas/3.0.4/lib/libflexiblas.a,libgfortran.a]" --with-hdf5=1 --with-hdf5-dir=/cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/MPI/gcc9/openmpi4/hdf5-mpi/1.10.6 --with-fftw=1 --with-fftw-dir=/cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/MPI/gcc9/openmpi4/fftw-mpi/3.3.8
[0]PETSC ERROR: #1 User provided function() at unknown file:0
[0]PETSC ERROR: Run with -malloc_debug to check if memory corruption is causing the crash.
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 59.
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
In: PMI_Abort(59, N/A)
srun: Job step aborted: Waiting up to 62 seconds for job step to finish.
srun: error: nc10536: task 0: Exited with exit code 16
Could it be an issue with the configuration scripts?
mathieumorlighem
Ouch! That's not good. Basically, your installation of ISSM on the cluster does not work (you have a segmentation fault). It probably comes from a library conflict or something like that. Has anybody been able to install ISSM successfully on this cluster?
DominoAJones
Hi,
I'm also running into the missing outlog, outbin, and errlog issue.
I've tried the loadonly suggestion:
loadonly = 0;
%Make sure jobs are submitted without MATLAB waiting for job completion
md.settings.waitonlock = 0;
md.cluster.interactive = 0; %only needed if you are using the generic cluster
md.miscellaneous.name = 'KNS';
%Submit job or download results, make sure that there is no runtime name (that includes the date)
md=solve(md,'Stressbalance','runtimename',false,'loadonly',loadonly);
%Save model if necessary
loadonly = 1;
%Make sure jobs are submitted without MATLAB waiting for job completion
md.settings.waitonlock = 0;
md.cluster.interactive = 0; %only needed if you are using the generic cluster
md.miscellaneous.name = 'KNS';
%Submit job or download results, make sure that there is no runtime name (that includes the date)
md=solve(md,'Stressbalance','runtimename',false,'loadonly',loadonly);
md=loadresultsfromcluster(md);
save KNS md
I cannot find any errlog, outlog, or outbin on the supercomputer (so there must be something else wrong, maybe with the way I compiled it?). No one else has used ISSM on this cluster. Would love any advice I can get!
mathieumorlighem
Hi DominoAJones,
what do you see printed to the screen?
Also, make sure to only run the call with 'loadonly',1 once the run is completed. I am not sure if you are running this locally or on a remote machine, but you have to wait until the .lock file is written (which indicates that the run is finished); see the sketch below.
All the best
Mathieu
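Putting this advice together with the script above, the key change is to split it into two separate runs instead of submitting and loading back-to-back; a sketch, keeping the names from DominoAJones's post:
% Run A: submit only, then let MATLAB return
md.settings.waitonlock = 0;
md.cluster.interactive = 0;   % only needed for the generic cluster
md.miscellaneous.name = 'KNS';
md = solve(md, 'Stressbalance', 'runtimename', false, 'loadonly', 0);
% Run B (executed later, once squeue shows the job finished or KNS.lock exists on the cluster)
md = solve(md, 'Stressbalance', 'runtimename', false, 'loadonly', 1);
save KNS md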