Use Slurm to submit and manage jobs on high-performance computing systems

Overview

At Indiana University, the Carbonate deep learning (DL) nodes use the Slurm Workload Manager to coordinate resource management and job scheduling.

Slurm user commands include numerous options for specifying the resources and other attributes needed to run batch jobs or interactive sessions. Options can be invoked on the command line or with directives contained in a job script.
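
For example, the same wall-time limit can be requested either on the command line when you submit a script (the 30-minute limit and the script name my_job.script here are placeholders):

sbatch --time=00:30:00 my_job.script

or with an equivalent directive inside the script itself:

#SBATCH --time=00:30:00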

Common user commands in Slurm include:

Command Description
sbatch Submit a batch script to Slurm. The command exits immediately when the script is transferred to the Slurm controller daemon and assigned a Slurm job ID. For more, see the Batch jobs section below.
srun Request resources for an interactive job. For more, see the Interactive jobs section below.
squeue Monitor job status information. For more, see the Monitor or delete your job section below.
scancel Terminate a queued or running job prior to its completion. For more, see the Monitor or delete your job section below.
sinfo View partition information. For more, see the View partition and node information section below.

SchedMD distributes and maintains the canonical version of Slurm and provides online user documentation. For a summary of Slurm commands and options, see Command/option Summary. For a complete list of Slurm commands, see Man Pages. If you're accustomed to using another resource management and scheduling system (for example, TORQUE/PBS) and need help determining the Slurm equivalents for certain commands or options, see Rosetta Stone of Workload Managers.

Batch jobs

To run a job in batch mode, first prepare a job script that specifies the application you want to run and the resources required to run it, and then use the sbatch command to submit the script to Slurm. For complete documentation about the sbatch command and its options, see the sbatch manual page; on Carbonate, enter man sbatch, or on the web, see sbatch.

Prepare a job script

Most commonly, Slurm job scripts contain at least one executable command preceded by a list of options that specify the resources and other attributes needed to execute the command (for example, wall-clock time, the number of nodes and processors, and filenames for job output and errors). These options are included as #SBATCH directives that precede any executable lines in the job script. Additionally, the job script must begin with a line that specifies the command interpreter under which it should run.

For example, a job script for running a batch job on the Carbonate DL nodes could look similar to the following:

#!/bin/bash

#SBATCH -J job_name
#SBATCH -p dl
#SBATCH --gres=gpu:p100:1
#SBATCH -o filename_%j.txt
#SBATCH -e filename_%j.err

module load modulename
./a.out

In the example script above:

  • The first line indicates that the script should be read using the Bash command interpreter.
  • The next lines are #SBATCH directives used to pass options to the sbatch command:
    Option Description
    -J job_name Assign the specified name (job_name) to the job.
    -p dl Request that the job be allocated resources in Carbonate's dl partition.
    --gres=gpu:p100:1 Request that the job be allocated one P100 GPU.
    -o filename_%j.txt Redirect standard output to the file filename_%j.txt.
    (Slurm automatically replaces %j with the job ID.)
    -e filename_%j.err Redirect standard error output to the file filename_%j.err.
    (Slurm automatically replaces %j with the job ID.)
  • The last two lines are the two executable lines that the job will run. In this case, the module command is used to load the modulename module before the a.out binary is executed.

Depending on the resources needed to run your executable lines, you may need to include other sbatch options in your job script. Here are a few other useful ones; the example after the table shows how they might be combined:

Option Description
--nodes=<n> Request that your job be allocated a minimum number (n) of nodes.
--ntasks-per-node=<n> Specify the number of tasks (n) that should be invoked on each node.
--time=<HH:MM:SS> Set a limit (HH:MM:SS) on the total wall time for your job.
--mail-user=username@iu.edu Send job-related email to the specified email address (username@iu.edu).
--mail-type=<type> Send job-related email when an event of the specified type(s) occurs. Valid type values include ALL, BEGIN, END, and FAIL.
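
For example, a directive block combining these options could look similar to the following (a sketch only; the node count, task count, time limit, and email address are placeholders you would adjust for your job):

#SBATCH --nodes=2
#SBATCH --ntasks-per-node=24
#SBATCH --time=04:00:00
#SBATCH --mail-user=username@iu.edu
#SBATCH --mail-type=END,FAIL
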
Note:

If you omit the --gres option when requesting resources in Carbonate's dl or dl-debug partition, sbatch will return an error:

Must specify a gpu resource:
Resubmit with --gres=gpu:type:count, where type is p100 or v100.
Batch job submission failed: Unspecified error

For more, see Run jobs on Carbonate's DL nodes.

Submit your job script

To submit your job script (for example, my_job.script), use the sbatch command. If the command runs successfully, it will return a job ID to standard output, for example:

[lcalriss@h2 ~]$ sbatch my_job.script
Submitted batch job 9472

MPI jobs

To run an MPI job, add #SBATCH directives to your script for requesting the required resources and add the srun command as an executable line for launching your application. For example, a job script for running an MPI job that launches 24 tasks on each of two nodes and uses two V100 GPUs per node in Carbonate's dl partition could look similar to the following:

#!/bin/bash
  
#SBATCH -J mpi_job
#SBATCH -p dl
#SBATCH -o mpi_%j.txt
#SBATCH -e mpi_%j.err
#SBATCH --mail-type=END,FAIL
#SBATCH --mail-user=username@iu.edu
#SBATCH --gres=gpu:v100:2
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=24
#SBATCH --time=00:30:00

cd /directory/with/stuff
srun a.out
Note:
If your application was compiled using a version of OpenMPI configured with --with-pmi (for example, openmpi/gnu/4.0.1 or openmpi/intel/4.0.1), you can use srun to launch it from your job script. If your application was compiled using a version of OpenMPI that was not configured with --with-pmi (for example, openmpi/gnu/2.1.0 or openmpi/intel/2.1.0), you can use mpirun to launch it from your job script.
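
For example, if your application was built with openmpi/gnu/2.1.0 (a version without PMI support), the executable lines of the script above might instead look like the following (a sketch only; the module name follows the note's example, and the binary is illustrative):

module load openmpi/gnu/2.1.0
cd /directory/with/stuff
mpirun ./a.out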

OpenMP and hybrid OpenMP-MPI jobs

To run an OpenMP or hybrid OpenMP-MPI job, use the srun command and add the necessary #SBATCH directives as in the previous example, but also add an executable line that sets the OMP_NUM_THREADS environment variable to indicate the number of threads that should be used for parallel regions. For example, a job script for running a hybrid OpenMP-MPI job that launches 12 tasks on each of two nodes and uses two V100 GPUs per node in Carbonate's dl partition could look similar to the following:

#!/bin/bash

#SBATCH -J hybrid_job
#SBATCH -p dl
#SBATCH -o hybrid_%j.txt
#SBATCH -e hybrid_%j.err
#SBATCH --mail-type=ALL
#SBATCH --mail-user=username@iu.edu
#SBATCH --gres=gpu:v100:2
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=12
#SBATCH --time=00:05:00
  
export OMP_NUM_THREADS=2
cd /directory/with/stuff
srun a.out

You also can add options to the srun command in your script. For example, to have srun explicitly launch 24 tasks across the two nodes, add the -N (node count) and -n (total number of tasks) options to the srun command:

#!/bin/bash

#SBATCH -J hybrid_job
#SBATCH -p dl
#SBATCH -o hybrid_%j.txt
#SBATCH -e hybrid_%j.err
#SBATCH --mail-type=ALL
#SBATCH --mail-user=username@iu.edu
#SBATCH --gres=gpu:v100:2
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=12
#SBATCH --time=00:05:00

export OMP_NUM_THREADS=2
cd /directory/with/stuff
srun -N2 -n24 a.out

You also can bind tasks to CPUs with the srun command's --cpu-bind option. For example, to modify the previous example so that it binds tasks to sockets, add the --cpu-bind=sockets option to the srun command:

#!/bin/bash
  
#SBATCH -J hybrid_job
#SBATCH -p dl
#SBATCH -o hybrid_%j.txt
#SBATCH -e hybrid_%j.err
#SBATCH --mail-type=ALL
#SBATCH --mail-user=username@iu.edu  
#SBATCH --gres=gpu:v100:2
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=12
#SBATCH --time=00:05:00

export OMP_NUM_THREADS=2
cd /directory/with/stuff
srun -N2 -n24 --cpu-bind=sockets a.out

Supported binding options include --cpu-bind=mask_cpu:<list>, which binds by setting CPU masks on tasks as specified by the given <list>. To view all available CPU bind options, on the Carbonate command line, enter:

srun --cpu-bind=help
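
For example, to bind the first task to CPUs 0 and 1 and the second task to CPUs 2 and 3 using hexadecimal masks (the masks and task count here are illustrative; adjust them to match your node's CPU layout):

srun -n2 --cpu-bind=mask_cpu:0x3,0xC a.out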

Interactive jobs

To request resources for an interactive job, use the srun command with the --pty option. For example, to launch a Bash session that uses one V100 GPU on a node in Carbonate's dl partition, on the command line, enter:

srun -p dl --gres=gpu:v100:1 --pty bash

When the requested resources are allocated to your job, you will be placed at the command prompt on one of Carbonate's DL nodes. When you are finished, on the command line, enter exit to free the allocated resources.
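
An interactive session could look similar to the following (the prompts, module name, and program shown are illustrative):

[lcalriss@h2 ~]$ srun -p dl --gres=gpu:v100:1 --pty bash
[lcalriss@dl11 ~]$ module load modulename
[lcalriss@dl11 ~]$ ./a.out
[lcalriss@dl11 ~]$ exit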

For complete documentation about the srun command, see the srun manual page; on Carbonate, enter man srun, or on the web, see srun.

Monitor or delete your job

To monitor the status of jobs in a Slurm partition, use the squeue command.

Some useful squeue options include:

Option Description
-u <username> Display jobs owned by the specified user.
-l Display information in long format.
-a Display all jobs.
-p <partition_name> Display jobs in the specified partition.
-t <state_list> Display jobs that have the specified state(s). Valid job states include PENDING, RUNNING, SUSPENDED, COMPLETED, CANCELLED, FAILED, TIMEOUT, NODE_FAIL, PREEMPTED, BOOT_FAIL, DEADLINE, OUT_OF_MEMORY, COMPLETING, CONFIGURING, RESIZING, REVOKED, and SPECIAL_EXIT.

For example, on Carbonate, to see pending jobs in the dl partition that belong to username, on the command line, enter:

squeue -u username -p dl -t PENDING

For complete documentation about the squeue command, see the squeue manual page; on Carbonate, enter man squeue, or on the web, see squeue.

To delete your pending or running job, use the scancel command with your job's job ID; for example, to delete your job that has a job ID of 8990, on the command line, enter:

scancel 8990
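
To cancel all of your own pending and running jobs at once, you can instead pass your username to scancel (replace username with your IU username):

scancel -u username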

View partition and node information

To view information about the nodes and partitions that Slurm manages, use the sinfo command.

By default, sinfo (without any options) displays:

  • All partition names
  • Availability of each partition
  • Maximum wall time allowed for jobs in each partition
  • Number of nodes in each partition
  • State of the nodes in each partition
  • Names of the nodes in each partition

To display node-specific information, use sinfo -N, which lists:

  • All node names
  • Partition to which each node belongs
  • State of each node

To display additional node-specific information, use sinfo -lN, which adds the following fields to the previous output:

  • Number of cores per node
  • Number of sockets per node, cores per socket, and threads per core
  • Size of memory per node in megabytes

Alternatively, to specify which information fields are displayed and control the formatting of the output, use sinfo with the -o option; for example (replace # with a number to set the display width of the field, and field1 and field2 with the desired field specifications):

sinfo -o "%<#><field1> %<#><field2>"

Available field specifications include:

Specification Field displayed
%<#>P Partition name (set field width to # characters)
%<#>N List of node names (set field width to # characters)
%<#>c Number of cores per node (set field width to # characters)
%<#>m Size of memory per node in megabytes (set field width to # characters)
%<#>l Maximum wall time allowed (set field width to # characters)
%<#>s Maximum number of nodes allowed per job (set field width to # characters)
%<#>G Generic resource associated with a node (set field width to # characters)

For example, on Carbonate, the following sinfo command outputs a node-specific list that includes partition names, node names, the number of cores per node, the amount of memory per node, the maximum wall time allowed per job, and the number and type of generic resources (GPUs) available on each node:

sinfo -No "%10P %8N  %4c  %7m  %12l %10G"

The resulting output looks similar to this:

PARTITION  NODELIST  CPUS  MEMORY   TIMELIMIT    GRES
dl         dl1       24    192888   2-00:00:00   gpu:v100:2
dl         dl2       24    192888   2-00:00:00   gpu:v100:2
dl         dl3       24    192888   2-00:00:00   gpu:p100:2
dl         dl4       24    192888   2-00:00:00   gpu:p100:2
dl         dl5       24    192888   2-00:00:00   gpu:p100:2
dl         dl6       24    192888   2-00:00:00   gpu:p100:2
dl         dl7       24    192888   2-00:00:00   gpu:p100:2
dl         dl8       24    192888   2-00:00:00   gpu:p100:2
dl         dl9       24    192888   2-00:00:00   gpu:p100:2
dl-debug   dl10      24    192888   8:00:00      gpu:p100:2
dl         dl11      24    192888   2-00:00:00   gpu:v100:2
dl         dl12      24    192888   2-00:00:00   gpu:v100:2

For complete documentation about the sinfo command, see the sinfo manual page; on Carbonate, enter man sinfo, or on the web, see sinfo.

Note:
To best meet the needs of all research projects affiliated with Indiana University, UITS Research Technologies administers the batch job queues on IU's research supercomputers using resource management and job scheduling policies that optimize the overall efficiency and performance of workloads on those systems. If the structure or configuration of the batch queues on any of IU's supercomputing systems does not meet the needs of your research project, contact UITS Research Technologies.

Get help

Support for IU research computing systems, software, and services is provided by various teams within the Research Technologies division of UITS.

For general questions about research computing at IU, contact UITS Research Technologies.

For more options, see Research computing support at IU.
