SLURM
Slurm is an open source, fault-tolerant, and highly scalable cluster management and job scheduling system for large and small Linux clusters.
The HPC team maintains the most comprehensive resource available for Dalma. Here we go through only some of the basic commands.
Submit jobs with - sbatch
Create a file named job.1.sh in your training folder with the following contents.
#!/bin/bash
# job.1.sh
#SBATCH -p serial
#SBATCH --job-name=job1
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --time=00:30:00
#SBATCH -o job.%J.out
#SBATCH -e job.%J.err
#Load your modules
module load gencore/1
# Commands
touch example
ls -lah
tar --remove-files -cvf example.tar example
ls -lah
sleep 5
Submit this job with sbatch.
[gencore@login-0-2 ~]$ sbatch job.1.sh
Submitted batch job MYJOBID
Take note of the job id, because we will add this job as a dependency for the next job.
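Rather than copying the job id by hand, you can capture it in a shell variable. The sketch below is one way to do that, assuming sbatch prints its usual "Submitted batch job <id>" confirmation line. The helper name extract_job_id and the sample line are illustrative and not part of Slurm.

```shell
#!/bin/bash
# Hypothetical helper (not part of Slurm): read sbatch's confirmation
# line on stdin and print only the numeric job id.
extract_job_id() {
    awk '/^Submitted batch job/ { print $4 }'
}

# On the cluster you would pipe sbatch into the helper:
#   JOBID=$(sbatch job.1.sh | extract_job_id)
# Here a sample confirmation line stands in for sbatch so the
# snippet runs anywhere.
sample_line="Submitted batch job 55124"
JOBID=$(printf '%s\n' "$sample_line" | extract_job_id)
echo "Captured job id: $JOBID"
```

On a single-cluster setup, sbatch --parsable prints just the job id, which avoids the parsing step: JOBID=$(sbatch --parsable job.1.sh).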
Investigate jobs with - squeue
squeue -u $USER
Submit a job with dependencies
#!/bin/bash
# job.2.sh
#SBATCH -p serial
#SBATCH --job-name=job2
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --time=00:30:00
#SBATCH -o job.%J.out
#SBATCH -e job.%J.err
#Load your modules
module load gencore/1
# Commands
touch example
ls -lah
tar -xvf example.tar
ls -lah
sleep 5
sbatch --dependency=afterok:MYJOBID job.2.sh #Substitute in your job id. Options must come before the script name, otherwise they are passed to the script as arguments.
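When a job depends on more than one earlier job, the ids are joined with colons after afterok. Below is a minimal sketch of building that option, assuming the job ids are already known. dependency_arg is a hypothetical helper and the ids are placeholders.

```shell
#!/bin/bash
# Hypothetical helper: turn one or more job ids into an sbatch
# --dependency option of the form afterok:ID1:ID2:...
dependency_arg() {
    local IFS=:            # join the arguments with colons
    printf -- '--dependency=afterok:%s\n' "$*"
}

# Placeholder job ids; substitute the ids reported by sbatch.
DEP=$(dependency_arg 55124 55126)
echo "$DEP"   # prints --dependency=afterok:55124:55126

# On the cluster the option goes before the script name:
#   sbatch "$DEP" job.2.sh
```

With afterok, the dependent job starts only once every listed job has finished with an exit code of zero.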
If you submitted your jobs within ~5 minutes of each other, you should see something like this when running squeue.
JOBID  PARTITION  NAME  USER     ST  TIME  NODES  NODELIST(REASON)
55125  ser_std    job2  gencore  PD  0:00  1      (Dependency)
55124  ser_std    job1  gencore  R   0:04  1      compute-7-18
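The ST column holds Slurm's compact state code, such as PD for pending and R for running. When scripting around squeue it can help to pull that column out for a single job. A rough sketch, assuming the default column layout shown above and job names without spaces: job_state is a hypothetical helper, and the sample text is copied from the listing rather than produced by a live squeue call.

```shell
#!/bin/bash
# Hypothetical helper: read squeue output on stdin and print the ST
# (state) column for the given job id. Assumes the default layout,
# where JOBID is column 1 and ST is column 5, and that the job name
# contains no spaces.
job_state() {
    awk -v id="$1" '$1 == id { print $5 }'
}

# Sample output standing in for: squeue -u "$USER"
sample_output='JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
55125 ser_std job2 gencore PD 0:00 1 (Dependency)'

STATE=$(printf '%s\n' "$sample_output" | job_state 55125)
echo "Job 55125 state: $STATE"
```

On the cluster, squeue -h -j JOBID -o %t prints the same code directly, without the header line.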