Use Slurm to submit and manage jobs on high-performance computing systems

Overview

At Indiana University, Big Red 3 and the Carbonate deep learning (DL) nodes use the Slurm Workload Manager to coordinate resource management and job scheduling.

Slurm user commands include numerous options for specifying the resources and other attributes needed to run batch jobs or interactive sessions. Options can be invoked on the command line or with directives contained in a job script.

Common user commands in Slurm include:

Command Description
sbatch Submit a batch script to Slurm. The command exits immediately when the script is transferred to the Slurm controller daemon and assigned a Slurm job ID. For more, see the Batch jobs section below.
srun Request resources for an interactive job. For more, see the Interactive jobs section below.
squeue Monitor job status information. For more, see the Monitor or delete your job section below.
scancel Terminate a queued or running job prior to its completion. For more, see the Monitor or delete your job section below.
sinfo View partition information. For more, see the View partition and node information section below.

Batch jobs

To run a job in batch mode, first prepare a job script that specifies the application you want to launch and the resources required to run it. Then, use the sbatch command to submit your job script to Slurm.

For complete documentation about the sbatch command and its options, see the sbatch manual page (on the web, see sbatch; on Big Red 3 or Carbonate, enter man sbatch).

Prepare a job script

Slurm job scripts most commonly have at least one executable line preceded by a list of options that specify the resources and attributes needed to run your job (for example, wall-clock time, the number of nodes and processors, and filenames for job output and errors).

  • A job script for running a batch job on Big Red 3 may look similar to the following:
    #!/bin/bash
    
    #SBATCH -J job_name
    #SBATCH -p general
    #SBATCH -o filename_%j.txt
    #SBATCH -e filename_%j.err
    #SBATCH --mail-type=ALL
    #SBATCH --mail-user=username@iu.edu
    #SBATCH --nodes=1
    #SBATCH --ntasks-per-node=1
    #SBATCH --time=02:00:00
    
    module load modulename
    ./a.out
    

    In the above example:

    • The first line indicates that the script should be read using the Bash command interpreter.
    • The next lines are #SBATCH directives used to pass options to the sbatch command:
      • -J job_name specifies a name for the job allocation. The specified name will appear along with the job ID number when you query running jobs on the system.
      • -p general specifies that the job should run in the general partition.
      • -o filename_%j.txt and -e filename_%j.err instruct Slurm to write the job's standard output and standard error, respectively, to the specified files, where %j is automatically replaced by the job ID.
      • --mail-type=<type> directs Slurm to send job-related email when an event of the specified type(s) occurs; valid type values include BEGIN, END, FAIL, and ALL.
      • --mail-user=username@iu.edu indicates the email address to which Slurm will send job-related mail.
      • --nodes=1 requests that a minimum of one node be allocated to this job.
      • --ntasks-per-node=1 specifies that one task should be launched per node.
      • --time=02:00:00 requests a maximum run time (wall-clock time) of two hours for the job.
    • The last two lines are the two executable lines that the job will run. In this case, the module command is used to load a specified module before the a.out binary is executed.
  • A job script for running a batch job on the Carbonate deep learning nodes must contain the --gres flag to indicate the type of GPU (p100 or v100) and the number of GPUs (1 or 2) that should be allocated to the job; for example:
    #!/bin/bash
    
    #SBATCH -J job_name
    #SBATCH -p dl
    #SBATCH --gres=gpu:v100:2
    #SBATCH -o filename_%j.txt
    #SBATCH -e filename_%j.err
    #SBATCH --mail-type=ALL
    #SBATCH --mail-user=username@iu.edu
    #SBATCH --nodes=1
    #SBATCH --ntasks-per-node=24
    #SBATCH --time=02:00:00
    
    module load modulename
    srun ./a.out
    

    If you omit the --gres flag, sbatch will return an error:

    Must specify a gpu resource:
    Resubmit with --gres=gpu:type:count, where type is p100 or v100.
    Batch job submission failed: Unspecified error
    

    For more, see Run jobs on Carbonate's DL nodes.

Depending on the resources needed to run your executable lines, you may need to include other sbatch options in your job script. Here are a few other useful options:

Option Action
--begin=YYYY-MM-DDTHH:MM:SS Defer allocation of your job until the specified date and time, after which the job is eligible to execute. For example, to defer allocation of your job until 10:30pm October 31, 2019, use:
--begin=2019-10-31T22:30:00
--no-requeue Specify that the job is not rerunnable. Setting this option prevents the job from being requeued after it has been interrupted, for example, by a scheduled downtime or preemption by a higher priority job.
--export=ALL Export all environment variables in the sbatch command's environment to the batch job.
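For example, a job script preamble that defers execution, prevents requeueing, and exports the submission environment might begin like the following sketch (the job name, output files, resource values, and date are only placeholders):

#!/bin/bash

#SBATCH -J deferred_job
#SBATCH -p general
#SBATCH -o deferred_%j.txt
#SBATCH -e deferred_%j.err
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --time=01:00:00
#SBATCH --begin=2019-10-31T22:30:00
#SBATCH --no-requeue
#SBATCH --export=ALL

./a.out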

Submit your job script

To submit your job script (for example, my_job.script), use the sbatch command. If the command runs successfully, it will return a job ID to standard output; for example, on Big Red 3:

[lcalriss@elogin2 ~]$ sbatch my_job.script
Submitted batch job 9472
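If you need the job ID in a shell variable (for example, to chain dependent jobs), sbatch's --parsable option prints only the job ID; a minimal sketch (my_job.script and next_step.script are placeholder names):

jobid=$(sbatch --parsable my_job.script)
sbatch --dependency=afterok:$jobid next_step.script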

MPI jobs

To run an MPI job, add #SBATCH directives to your script for requesting the required resources and add the srun command as an executable line for launching your application. For example, a job script for running an MPI job that launches 96 tasks across two nodes in the general partition on Big Red 3 could look similar to the following:

#!/bin/bash
  
#SBATCH -J mpi_job
#SBATCH -p general
#SBATCH -o mpi_%j.txt
#SBATCH -e mpi_%j.err
#SBATCH --mail-type=END,FAIL
#SBATCH --mail-user=username@iu.edu
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=48
#SBATCH --time=00:30:00

cd /directory/with/stuff
srun a.out

Note:
If your application was compiled using a version of OpenMPI configured with --with-pmi (for example, openmpi/gnu/4.0.1 or openmpi/intel/4.0.1), you can use srun to launch it from your job script. If your application was compiled using a version of OpenMPI that was not configured with --with-pmi (for example, openmpi/gnu/2.1.0 or openmpi/intel/2.1.0), you can use mpirun to launch it from your job script.
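For example, with a non-PMI OpenMPI build, the last lines of the script above might instead look like the following sketch (the module name is one of the examples mentioned above, and the -np value matches the 2 nodes x 48 tasks per node requested by the script):

module load openmpi/gnu/2.1.0
cd /directory/with/stuff
mpirun -np 96 a.out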

OpenMP and hybrid OpenMP-MPI jobs

To run an OpenMP or hybrid OpenMP-MPI job, use the srun command and add the necessary #SBATCH directives as in the previous example, but also add an executable line that sets the OMP_NUM_THREADS environment variable to indicate the number of threads that should be used for parallel regions. For example, a job script for running a hybrid OpenMP-MPI job that launches 24 tasks on each of two nodes, with two OpenMP threads per task, in the general partition on Big Red 3 could look similar to the following:

#!/bin/bash

#SBATCH -J hybrid_job
#SBATCH -p general
#SBATCH -o hybrid_%j.txt
#SBATCH -e hybrid_%j.err
#SBATCH --mail-type=ALL
#SBATCH --mail-user=username@iu.edu
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=24
#SBATCH --time=00:05:00
  
export OMP_NUM_THREADS=2
cd /directory/with/stuff
srun a.out

You also can bind tasks to CPUs with the srun command's --cpu-bind option. For example, to modify the previous example so that it binds tasks to sockets, add the --cpu-bind=sockets option to the srun command:

#!/bin/bash
  
#SBATCH -J hybrid_job
#SBATCH -p general
#SBATCH -o hybrid_%j.txt
#SBATCH -e hybrid_%j.err
#SBATCH --mail-type=ALL
#SBATCH --mail-user=username@iu.edu  
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=24
#SBATCH --time=00:05:00

export OMP_NUM_THREADS=2
cd /directory/with/stuff
srun --cpu-bind=sockets a.out

Supported binding options include --cpu-bind=mask_cpu:<list>, which binds by setting CPU masks on tasks as indicated in the specified list. To view all available CPU bind options, on the Big Red 3 command line, enter:

srun --cpu-bind=help
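If you prefer to have Slurm account for the OpenMP threads explicitly, another common approach is to request CPUs per task with --cpus-per-task and set OMP_NUM_THREADS from the SLURM_CPUS_PER_TASK environment variable, so the thread count always matches the allocation. A sketch of the relevant lines (the node, task, and thread counts are only illustrative):

#SBATCH --nodes=2
#SBATCH --ntasks-per-node=12
#SBATCH --cpus-per-task=2

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
srun a.out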

Interactive jobs

To request resources for an interactive job, use the srun command with the --pty option.

For example, on Big Red 3:

  • To launch a Bash session that uses one node in the general partition, on the command line, enter:
    srun -p general --pty bash
    
  • To perform debugging, submit an interactive job to the debug or general partition:
    • To request 12 cores for four hours of wall time in the debug partition, on the command line, enter:
      srun -p debug -N 1 --ntasks-per-node=12 --time=4:00:00 --pty bash
      
    • To request 12 cores for 12 hours of wall time in the general partition, on the command line, enter:
      srun -p general -N 1 --ntasks-per-node=12 --time=12:00:00 --pty bash
      
  • To run an interactive job with X11 forwarding, add the --x11 flag; for example:
    srun -p general --x11 -N 1 --ntasks-per-node=12 --time=12:00:00 --pty bash
    

When the requested resources are allocated to your job, you will be placed at the command prompt on a Big Red 3 compute node. Once you are placed on a compute node, you can launch graphical X applications, as well as your own binaries, from the command line. Depending on the application and your ~/.modules file, you may need to load the module for a desired X client before launching the application.
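Interactive jobs on the Carbonate DL nodes also need a --gres specification, as described in the Batch jobs section above; for example, a sketch requesting one V100 GPU for two hours (adjust the GPU type, count, and wall time as needed):

srun -p dl --gres=gpu:v100:1 --time=2:00:00 --pty bash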

When you are finished with your interactive session, on the command line, enter exit to free the allocated resources.

For complete documentation about the srun command, see the srun manual page (on the web, see srun; on Big Red 3 or Carbonate, enter man srun).

Monitor or delete your job

To monitor the status of jobs in a Slurm partition, use the squeue command. Some useful squeue options include:

Option Description
-a Display information for all jobs.
-j <jobid> Display information for the specified job ID.
-j <jobid> -o %all Display all information fields (with a vertical bar separating each field) for the specified job ID.
-l Display information in long format.
-n <job_name> Display information for the specified job name.
-p <partition_name> Display jobs in the specified partition.
-t <state_list> Display jobs that have the specified state(s). Valid job states include PENDING, RUNNING, SUSPENDED, COMPLETED, CANCELLED, FAILED, TIMEOUT, NODE_FAIL, PREEMPTED, BOOT_FAIL, DEADLINE, OUT_OF_MEMORY, COMPLETING, CONFIGURING, RESIZING, REVOKED, and SPECIAL_EXIT.
-u <username> Display jobs owned by the specified user.
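You can also control which fields squeue displays with the -o (--format) option; the following sketch shows the job ID, name, state, elapsed time, node count, and reason/node list (field codes are described in the squeue manual page; replace username with your IU username):

squeue -u username -o "%.10i %.20j %.8T %.10M %.6D %R"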

For example:

  • On Carbonate, to see pending jobs in the dl partition that belong to username, enter:
    squeue -u username -p dl -t PENDING
    
  • On Big Red 3, to see all jobs running in the general partition, enter:
    squeue -p general -t RUNNING
    

For complete documentation about the squeue command, see the squeue manual page (on the web, see squeue, or on Big Red 3 or Carbonate, enter man squeue).

To delete your pending or running job, use the scancel command with your job's job ID; for example, to delete your job that has a job ID of 8990, on the command line, enter:

scancel 8990

Alternatively:

  • To cancel a job named my_job, enter:
    scancel -n my_job
    
  • To cancel a job owned by username, enter:
    scancel -u username
    

For complete documentation about the scancel command, see the scancel manual page (on the web, see scancel, or on Big Red 3 or Carbonate, enter man scancel).

View partition and node information

To view information about the nodes and partitions that Slurm manages, use the sinfo command.

By default, sinfo (without any options) displays:

  • All partition names
  • Availability of each partition
  • Maximum wall time allowed for jobs in each partition
  • Number of nodes in each partition
  • State of the nodes in each partition
  • Names of the nodes in each partition
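To restrict this default display to a single partition, add the -p option; for example (a sketch; replace general with the partition you want to view):

sinfo -p general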

To display node-specific information, use sinfo -N, which lists:

  • All node names
  • Partition to which each node belongs
  • State of each node

To display additional node-specific information, use sinfo -lN, which adds the following fields to the previous output:

  • Number of cores per node
  • Number of sockets per node, cores per socket, and threads per core
  • Size of memory per node in megabytes

Alternatively, to specify which information fields are displayed and control the formatting of the output, use sinfo with the -o option; for example (replace # with a number to set the display width of the field, and field1 and field2 with the desired field specifications):

sinfo -o "%<#><field1> %<#><field2>"

Available field specifications include:

Specification Field displayed
%<#>P Partition name (set field width to # characters)
%<#>N List of node names (set field width to # characters)
%<#>c Number of cores per node (set field width to # characters)
%<#>m Size of memory per node in megabytes (set field width to # characters)
%<#>l Maximum wall time allowed (set field width to # characters)
%<#>s Maximum number of nodes allowed per job (set field width to # characters)
%<#>G Generic resource associated with a node (set field width to # characters)

For example, on Carbonate, the following sinfo command outputs a node-specific list that includes partition names, node names, the number of cores per node, the amount of memory per node, the maximum wall time allowed per job, and the number and type of generic resources (GPUs) available on each node:

sinfo -No "%10P %8N  %4c  %7m  %12l %10G"

The resulting output looks similar to this:

PARTITION  NODELIST  CPUS  MEMORY   TIMELIMIT    GRES
dl         dl1       24    192888   2-00:00:00   gpu:v100:2
dl         dl2       24    192888   2-00:00:00   gpu:v100:2
dl         dl3       24    192888   2-00:00:00   gpu:p100:2
dl         dl4       24    192888   2-00:00:00   gpu:p100:2
dl         dl5       24    192888   2-00:00:00   gpu:p100:2
dl         dl6       24    192888   2-00:00:00   gpu:p100:2
dl         dl7       24    192888   2-00:00:00   gpu:p100:2
dl         dl8       24    192888   2-00:00:00   gpu:p100:2
dl         dl9       24    192888   2-00:00:00   gpu:p100:2
dl-debug   dl10      24    192888   8:00:00      gpu:p100:2
dl         dl11      24    192888   2-00:00:00   gpu:v100:2
dl         dl12      24    192888   2-00:00:00   gpu:v100:2

For complete documentation about the sinfo command, see the sinfo manual page (on the web, see sinfo; on Big Red 3 or Carbonate, enter man sinfo).

Note:
To best meet the needs of all research projects affiliated with Indiana University, UITS Research Technologies administers the batch job queues on IU's research supercomputers using resource management and job scheduling policies that optimize the overall efficiency and performance of workloads on those systems. If the structure or configuration of the batch queues on any of IU's research supercomputers does not meet the needs of your research project, contact UITS Research Technologies.

Get help

SchedMD, the company that distributes and maintains the canonical version of Slurm, provides online user documentation, including a summary of Slurm commands and options, manual pages for all Slurm commands, and a Rosetta Stone of Workload Managers for help determining the Slurm equivalents of commands and options used in other resource management and scheduling systems (for example, TORQUE/PBS).

Support for IU research supercomputers, software, and services is provided by various teams within the Research Technologies division of UITS.

For general questions about research computing at IU, contact UITS Research Technologies.

For more options, see Research computing support at IU.
