Use Slurm to submit and manage jobs on high performance computing systems
On this page:
- Overview
- Batch jobs
- Interactive jobs
- Monitor or delete your job
- View partition and node information
- Get help
Overview
At Indiana University, Big Red 3 and the Carbonate deep learning (DL) nodes use the Slurm Workload Manager to coordinate resource management and job scheduling.
Slurm user commands include numerous options for specifying the resources and other attributes needed to run batch jobs or interactive sessions. Options can be invoked on the command line or with directives contained in a job script.
Common user commands in Slurm include:
| Command | Description |
|---|---|
| sbatch | Submit a batch script to Slurm. The command exits immediately when the script is transferred to the Slurm controller daemon and assigned a Slurm job ID. For more, see the Batch jobs section below. |
| srun | Request resources for an interactive job. For more, see the Interactive jobs section below. |
| squeue | Monitor job status information. For more, see the Monitor or delete your job section below. |
| scancel | Terminate a queued or running job prior to its completion. For more, see the Monitor or delete your job section below. |
| sinfo | View partition information. For more, see the View partition and node information section below. |
Batch jobs
To run a job in batch mode, first prepare a job script that specifies the application you want to launch and the resources required to run it. Then, use the sbatch command to submit your job script to Slurm.
For complete documentation about the sbatch command and its options, see the sbatch manual page (on the web, see sbatch; on Big Red 3 or Carbonate, enter man sbatch).
Prepare a job script
Slurm job scripts most commonly have at least one executable line preceded by a list of options that specify the resources and attributes needed to run your job (for example, wall-clock time, the number of nodes and processors, and filenames for job output and errors).
- A job script for running a batch job on Big Red 3 may look similar to the following:
#!/bin/bash

#SBATCH -J job_name
#SBATCH -p general
#SBATCH -o filename_%j.txt
#SBATCH -e filename_%j.err
#SBATCH --mail-type=ALL
#SBATCH --mail-user=username@iu.edu
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --time=02:00:00

module load modulename
./a.out
In the above example:
- The first line indicates that the script should be read using the Bash command interpreter.
- The next lines are #SBATCH directives used to pass options to the sbatch command:
  - -J job_name specifies a name for the job allocation. The specified name will appear along with the job ID number when you query running jobs on the system.
  - -p general specifies that the job should run in the general partition.
  - -o filename_%j.txt and -e filename_%j.err instruct Slurm to connect the job's standard output and standard error, respectively, to the file names specified, where %j is automatically replaced by the job ID.
  - --mail-user=username@iu.edu indicates the email address to which Slurm will send job-related mail.
  - --mail-type=<type> directs Slurm to send job-related email when an event of the specified type(s) occurs; valid type values include all, begin, end, and fail.
  - --nodes=1 requests that a minimum of one node be allocated to this job.
  - --ntasks-per-node=1 specifies that one task should be launched per node.
  - --time=02:00:00 sets a limit of two hours of wall-clock time for the job.
- The last two lines are the two executable lines that the job will run. In this case, the module command is used to load a specified module before the a.out binary is executed.
- A job script for running a batch job on the Carbonate deep learning nodes must contain the --gres flag to indicate the type of GPU (p100 or v100) and the number of GPUs (1 or 2) that should be allocated to the job; for example:

#!/bin/bash

#SBATCH -J job_name
#SBATCH -p dl
#SBATCH --gres=gpu:v100:2
#SBATCH -o filename_%j.txt
#SBATCH -e filename_%j.err
#SBATCH --mail-type=ALL
#SBATCH --mail-user=username@iu.edu
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=24
#SBATCH --time=02:00:00

module load modulename
srun ./a.out
If you omit the --gres flag, sbatch will return an error:

Must specify a gpu resource: Resubmit with --gres=gpu:type:count, where type is p100 or v100.
Batch job submission failed: Unspecified error
For more, see Run jobs on Carbonate's DL nodes.
Depending on the resources needed to run your executable lines, you may need to include other sbatch options in your job script. Here are a few other useful ones:
| Option | Action |
|---|---|
| --begin=YYYY-MM-DDTHH:MM:SS | Defer allocation of your job until the specified date and time, after which the job is eligible to execute. For example, to defer allocation of your job until 10:30pm October 31, 2019, use --begin=2019-10-31T22:30:00. |
| --no-requeue | Specify that the job is not rerunnable. Setting this option prevents the job from being requeued after it has been interrupted, for example, by a scheduled downtime or preemption by a higher-priority job. |
| --export=ALL | Export all environment variables in the sbatch command's environment to the batch job. |
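These options can be added to a job script alongside the directives shown earlier. As an illustration only (the job name, partition, date, and time limit below are placeholders, not recommendations), a script that defers its start time, disables requeueing, and exports its environment might begin like this:

#!/bin/bash

#SBATCH -J deferred_job
#SBATCH -p general
#SBATCH --begin=2019-10-31T22:30:00
#SBATCH --no-requeue
#SBATCH --export=ALL
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --time=01:00:00

./a.out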
Submit your job script
To submit your job script (for example, my_job.script), use the sbatch command. If the command runs successfully, it will return a job ID to standard output; for example, on Big Red 3:
[lcalriss@elogin2 ~]$ sbatch my_job.script
Submitted batch job 9472
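If you submit jobs from a shell script and want to capture the job ID for later use (for example, to monitor or cancel the job), sbatch's --parsable option prints only the job ID. A minimal sketch, assuming your script is named my_job.script:

jobid=$(sbatch --parsable my_job.script)   # capture just the job ID
echo "Submitted job $jobid"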
MPI jobs
To run an MPI job, add #SBATCH directives to your script for requesting the required resources and add the srun command as an executable line for launching your application. For example, a job script for running an MPI job that launches 96 tasks across two nodes in the general partition on Big Red 3 could look similar to the following:
#!/bin/bash

#SBATCH -J mpi_job
#SBATCH -p general
#SBATCH -o mpi_%j.txt
#SBATCH -e mpi_%j.err
#SBATCH --mail-type=END,FAIL
#SBATCH --mail-user=username@iu.edu
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=48
#SBATCH --time=00:30:00

cd /directory/with/stuff
srun a.out
If your application was compiled using a version of OpenMPI that was configured with --with-pmi (for example, openmpi/gnu/4.0.1 or openmpi/intel/4.0.1), you can use srun to launch it from your job script. If your application was compiled using a version of OpenMPI that was not configured with --with-pmi (for example, openmpi/gnu/2.1.0 or openmpi/intel/2.1.0), you can use mpirun to launch it from your job script.
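In the latter case, the executable lines of the script above could be adapted as follows. This is a sketch only; the module name, directory, and process count are placeholders that should match your own build and resource request:

module load openmpi/gnu/2.1.0   # a non-PMI OpenMPI build (placeholder)
cd /directory/with/stuff
mpirun -np 96 ./a.out           # -np matches the 96 tasks requested above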
OpenMP and hybrid OpenMP-MPI jobs
To run an OpenMP or hybrid OpenMP-MPI job, use the srun command and add the necessary #SBATCH directives as in the previous example, but also add an executable line that sets the OMP_NUM_THREADS environment variable to indicate the number of threads that should be used for parallel regions. For example, a job script for running a hybrid OpenMP-MPI job that launches 24 tasks across two nodes in the general partition on Big Red 3 could look similar to the following:
#!/bin/bash

#SBATCH -J hybrid_job
#SBATCH -p general
#SBATCH -o hybrid_%j.txt
#SBATCH -e hybrid_%j.err
#SBATCH --mail-type=ALL
#SBATCH --mail-user=username@iu.edu
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=24
#SBATCH --time=00:05:00

export OMP_NUM_THREADS=2
cd /directory/with/stuff
srun a.out
You also can bind tasks to CPUs with the srun command's --cpu-bind option. For example, to modify the previous example so that it binds tasks to sockets, add the --cpu-bind=sockets option to the srun command:
#!/bin/bash

#SBATCH -J hybrid_job
#SBATCH -p general
#SBATCH -o hybrid_%j.txt
#SBATCH -e hybrid_%j.err
#SBATCH --mail-type=ALL
#SBATCH --mail-user=username@iu.edu
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=24
#SBATCH --time=00:05:00

export OMP_NUM_THREADS=2
cd /directory/with/stuff
srun --cpu-bind=sockets a.out
Supported binding options include --cpu-bind=mask_cpu:<list>, which binds by setting CPU masks on tasks as indicated in the specified list. To view all available CPU bind options, on the Big Red 3 command line, enter:
srun --cpu-bind=help
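As an illustration of mask-based binding, the following hypothetical command would bind the first task on each node to CPUs 0-11 and the second task to CPUs 12-23 on a 24-core node; the hexadecimal masks are placeholders and must be adapted to your node's CPU layout and task count:

srun --cpu-bind=mask_cpu:0x000fff,0xfff000 a.out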
Interactive jobs
To request resources for an interactive job, use the srun command with the --pty option.
For example, on Big Red 3:
- To launch a Bash session that uses one node in the general partition, on the command line, enter:
srun -p general --pty bash
- To perform debugging, submit an interactive job to the debug or general partition:
- To request 12 cores for four hours of wall time in the debug partition, on the command line, enter:
srun -p debug -N 1 --ntasks-per-node=12 --time=4:00:00 --pty bash
- To request 12 cores for 12 hours of wall time in the general partition, on the command line, enter:
srun -p general -N 1 --ntasks-per-node=12 --time=12:00:00 --pty bash
- To run an interactive job with X11 forwarding, add the --x11 flag; for example:
srun -p general --x11 -N 1 --ntasks-per-node=12 --time=12:00:00 --pty bash
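- To request an interactive session on one of the Carbonate deep learning nodes, include the --gres flag described above. For example, the following sketch (the GPU type, core count, and wall time are placeholders you should adjust) requests one v100 GPU for four hours:
srun -p dl --gres=gpu:v100:1 -N 1 --ntasks-per-node=12 --time=4:00:00 --pty bash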
When the requested resources are allocated to your job, you will be placed at the command prompt on a Big Red 3 compute node. Once you are placed on a compute node, you can launch graphical X applications, as well as your own binaries, from the command line. Depending on the application and your ~/.modules file, you may need to load the module for a desired X client before launching the application.
When you are finished with your interactive session, on the command line, enter exit to free the allocated resources.
For complete documentation about the srun command, see the srun manual page (on the web, see srun; on Big Red 3 or Carbonate, enter man srun).
Monitor or delete your job
To monitor the status of jobs in a Slurm partition, use the squeue command. Some useful squeue options include:
| Option | Description |
|---|---|
| -a | Display information for all jobs. |
| -j <jobid> | Display information for the specified job ID. |
| -j <jobid> -o %all | Display all information fields (with a vertical bar separating each field) for the specified job ID. |
| -l | Display information in long format. |
| -n <job_name> | Display information for the specified job name. |
| -p <partition_name> | Display jobs in the specified partition. |
| -t <state_list> | Display jobs that have the specified state(s). Valid job states include PENDING, RUNNING, SUSPENDED, COMPLETED, CANCELLED, FAILED, TIMEOUT, NODE_FAIL, PREEMPTED, BOOT_FAIL, DEADLINE, OUT_OF_MEMORY, COMPLETING, CONFIGURING, RESIZING, REVOKED, and SPECIAL_EXIT. |
| -u <username> | Display jobs owned by the specified user. |
For example:
- On Carbonate, to see pending jobs in the dl partition that belong to username, enter:
squeue -u username -p dl -t PENDING
- On Big Red 3, to see all jobs running in the general partition, enter:
squeue -p general -t RUNNING
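You can also control which fields squeue displays with its -o (format) option. As a sketch (the field widths are arbitrary; %i, %P, %j, %T, and %M are standard squeue format specifiers for job ID, partition, job name, state, and elapsed time), the following lists your jobs in a compact custom layout:

squeue -u username -o "%.10i %.9P %.20j %.8T %.10M"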
For complete documentation about the squeue command, see the squeue manual page (on the web, see squeue, or on Big Red 3 or Carbonate, enter man squeue).
To delete your pending or running job, use the scancel command with your job's job ID; for example, to delete your job that has a job ID of 8990, on the command line, enter:
scancel 8990
Alternatively:
- To cancel a job named my_job, enter:
scancel -n my_job
- To cancel a job owned by username, enter:
scancel -u username
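These filters can be combined with a job state; for example, the following sketch (scancel's --state option accepts the same state names used by squeue) cancels only your pending jobs and leaves your running jobs untouched:

scancel -u username --state=PENDING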
For complete documentation about the scancel command, see the scancel manual page (on the web, see scancel, or on Big Red 3 or Carbonate, enter man scancel).
View partition and node information
To view information about the nodes and partitions that Slurm manages, use the sinfo command.
By default, sinfo (without any options) displays:
- All partition names
- Availability of each partition
- Maximum wall time allowed for jobs in each partition
- Number of nodes in each partition
- State of the nodes in each partition
- Names of the nodes in each partition
To display node-specific information, use sinfo -N, which lists:
- All node names
- Partition to which each node belongs
- State of each node
To display additional node-specific information, use sinfo -lN, which adds the following fields to the previous output:
- Number of cores per node
- Number of sockets per node, cores per socket, and threads per core
- Size of memory per node in megabytes
Alternatively, to specify which information fields are displayed and control the formatting of the output, use sinfo with the -o option; for example (replace # with a number to set the display width of the field, and field1 and field2 with the desired field specifications):
sinfo -o "%<#><field1> %<#><field2>"
Available field specifications include:
| Specification | Field displayed |
|---|---|
| %<#>P | Partition name (set field width to # characters) |
| %<#>N | List of node names (set field width to # characters) |
| %<#>c | Number of cores per node (set field width to # characters) |
| %<#>m | Size of memory per node in megabytes (set field width to # characters) |
| %<#>l | Maximum wall time allowed (set field width to # characters) |
| %<#>s | Maximum number of nodes allowed per job (set field width to # characters) |
| %<#>G | Generic resource associated with a node (set field width to # characters) |
For example, on Carbonate, the following sinfo command outputs a node-specific list that includes partition names, node names, the number of cores per node, the amount of memory per node, the maximum wall time allowed per job, and the number and type of generic resources (GPUs) available on each node:
sinfo -No "%10P %8N %4c %7m %12l %10G"
The resulting output looks similar to this:
PARTITION  NODELIST CPUS MEMORY TIMELIMIT  GRES
dl         dl1      24   192888 2-00:00:00 gpu:v100:2
dl         dl2      24   192888 2-00:00:00 gpu:v100:2
dl         dl3      24   192888 2-00:00:00 gpu:p100:2
dl         dl4      24   192888 2-00:00:00 gpu:p100:2
dl         dl5      24   192888 2-00:00:00 gpu:p100:2
dl         dl6      24   192888 2-00:00:00 gpu:p100:2
dl         dl7      24   192888 2-00:00:00 gpu:p100:2
dl         dl8      24   192888 2-00:00:00 gpu:p100:2
dl         dl9      24   192888 2-00:00:00 gpu:p100:2
dl-debug   dl10     24   192888 8:00:00    gpu:p100:2
dl         dl11     24   192888 2-00:00:00 gpu:v100:2
dl         dl12     24   192888 2-00:00:00 gpu:v100:2
For complete documentation about the sinfo command, see the sinfo manual page (on the web, see sinfo, or on Big Red 3 or Carbonate, enter man sinfo).
Get help
SchedMD, the company that distributes and maintains the canonical version of Slurm, provides online user documentation, including a summary of Slurm commands and options, manual pages for all Slurm commands, and a Rosetta Stone of Workload Managers for help determining the Slurm equivalents of commands and options used in other resource management and scheduling systems (for example, TORQUE/PBS).
Support for IU research supercomputers, software, and services is provided by various teams within the Research Technologies division of UITS.
- If you have a system-specific question, contact the High Performance Systems (HPS) team.
- If you have a programming question about compilers, scientific/numerical libraries, or debuggers, contact the UITS Research Applications and Deep Learning team.
For general questions about research computing at IU, contact UITS Research Technologies.
For more options, see Research computing support at IU.
This is document awrz in the Knowledge Base.
Last modified on 2020-07-01 11:08:01.