ARCHIVED: Run jobs on Carbonate
Overview
The information in this document pertains only to Carbonate's general-purpose and large-memory nodes. For information about running jobs on Carbonate's deep learning nodes, see:
Carbonate uses the TORQUE workload manager for resource management and job scheduling. The Slate and Slate-Project file systems are mounted for temporary storage of research data.
Carbonate employs a default routing queue that funnels jobs, according to their resource requirements, into two execution queues configured to maximize job throughput and minimize wait times (the time a job remains queued, waiting for required resources to become available). Depending on the resource requirements specified in your batch job script or your `qsub` command, the routing queue (BATCH) automatically places your job into the NORMAL or LARGEMEMORY queue:
- NORMAL: jobs requesting up to 251 GB of virtual memory
- LARGEMEMORY: jobs requesting more than 251 GB and up to 503 GB of virtual memory
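The routing rule above can be sketched as a small shell function. This is an illustration only: `route_queue` is not an actual Carbonate tool; the BATCH routing queue applies this logic internally when you submit a job.

```shell
# Hypothetical sketch of the BATCH routing rule; route_queue is not a real
# Carbonate command, just an illustration of the thresholds described above.
route_queue() {
    local vmem_gb=$1
    if [ "$vmem_gb" -gt 503 ]; then
        echo "rejected: exceeds the 503 GB maximum"
    elif [ "$vmem_gb" -gt 251 ]; then
        echo "LARGEMEMORY"
    else
        echo "NORMAL"
    fi
}

route_queue 16    # the 16 GB default lands in NORMAL
route_queue 400   # a 400 GB request lands in LARGEMEMORY
```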
Submit jobs
Use the TORQUE `qsub` command to submit batch or interactive jobs for execution on Carbonate's compute nodes.
- Batch jobs: To run a batch job on Carbonate, first prepare a TORQUE script that specifies the application you want to run and the resources your job will require. For information about preparing a TORQUE job script, see the ARCHIVED: Job script section of ARCHIVED: Use TORQUE to submit and manage jobs on high performance computing systems.
To submit your TORQUE script (for example, `job.script`), use the `qsub` command:

  `qsub job.script`

If the command exits successfully, it returns a job ID; for example, `123456.m1.Carbonate`.
Note: Do not specify a destination queue in your TORQUE script or `qsub` command. Carbonate uses a default routing queue (BATCH) that automatically places your job, depending on the resource requirements you specify, into the NORMAL or LARGEMEMORY queue. Your job will run in the NORMAL or LARGEMEMORY queue unless you specifically submit it to Carbonate's INTERACTIVE or DEBUG queue. For information about those queues, see Additional queue information below.

If your job has resource requirements that differ from the defaults (but do not exceed the allowed maximums), specify them either with directives in your TORQUE script, or with the `-l` (a lowercase "L"; short for `resource_list`) option in your `qsub` command. (Command-line arguments override directives in your job script.) For example:
- To submit a TORQUE script (`job.script`) for a batch job that requires more than the default 60 minutes of wall time, use:
  `qsub -l walltime=10:00:00 job.script`
- To submit a TORQUE script (`job.script`) for a batch job that requires more than the default 16 GB of virtual memory (for example, 200 GB), use:
  `qsub -l nodes=1:ppn=4,vmem=200gb job.script`

If you omit the virtual memory resource (`-l vmem=[n]gb`), you will receive a warning, and the default (16 GB) virtual memory will be applied.
With the `qsub` command, you can specify multiple attributes with either one `-l` switch followed by multiple comma-separated attributes, or multiple `-l` switches, one for each attribute. For example, to submit a TORQUE script (`job.script`) for a batch job that requires 32 GB of virtual memory and 16 cores on one node, enter either of the following equivalent commands:

  `qsub -l nodes=1:ppn=16,vmem=32gb job.script`
  `qsub -l nodes=1:ppn=16 -l vmem=32gb job.script`
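Putting these directives together, a complete batch script might look like the following. This is a minimal sketch: the job name, resource values, and the final echo line are placeholders to replace with your own application.

```shell
#!/bin/bash
# Hypothetical TORQUE batch script; all values below are placeholders.
#PBS -N sample_job
#PBS -l nodes=1:ppn=16,vmem=32gb
#PBS -l walltime=10:00:00
#PBS -m e

# TORQUE sets PBS_O_WORKDIR to the directory you submitted from; fall back
# to the current directory so the script also runs outside the scheduler.
cd "${PBS_O_WORKDIR:-.}"

# Replace this placeholder with the application you actually want to run.
msg="job running on $(hostname)"
echo "$msg"
```

Submitted with `qsub job.script` and no `-q` destination, a script like this would be routed by the BATCH queue into NORMAL, because its 32 GB request is below the 251 GB threshold.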
- Interactive jobs: To submit an interactive job, use `qsub` with the `-I` (to specify an interactive job) and `-q interactive` (to specify submission to Carbonate's INTERACTIVE queue) options; for example:

  `qsub -I -q interactive -l nodes=1:ppn=4,walltime=01:00:00`

Submitting your job to the INTERACTIVE queue directs it to a specific set of nodes that are configured for shared access (versus single-user access in the general batch queues). Consequently, your interactive job will most likely dispatch faster in the INTERACTIVE queue than in the general execution queues.
Other useful `qsub` options include:
Option | Action
--- | ---
`-a YYYYMMDDhhmm.SS` | Specify the date and time after which the job is eligible to execute (replace YYYY with the year, MM with the month, DD with the day of the month, hh with the hour, and mm with the minute; the .SS to indicate seconds is optional).
`-m e` | Mail a job summary report when the job terminates.
`-r [y\|n]` | Declare whether the job is rerunnable. If the argument is `y`, the job is rerunnable; if the argument is `n`, it is not. The default value is `y` (rerunnable).
`-V` | Export all environment variables in the `qsub` command's environment to the batch job.
For more, see the `qsub` manual page.
Monitor jobs
To monitor the status of a queued or running job, use the TORQUE `qstat` command:
- To get the status of a particular job, combine `qstat` with the job ID that TORQUE returned when you entered your `qsub` command (for example, `1071767`):

  `qstat 1071767`

- To get the status of the job (or jobs) you submitted, combine `qstat -u` with your IU username (for example, `username`):

  `qstat -u username`
Useful `qstat` options include:
Option | Action
--- | ---
`-a` | Display all jobs.
`-f` | Write a full status display to standard output.
`-n` | List the nodes allocated to a job.
`-r` | Display jobs that are running.
For more, see the `qstat` manual page.
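Because `qstat` writes a plain-text table, its output pipes cleanly into standard tools such as `awk`. The sketch below counts the running jobs in a captured listing; the sample text is illustrative only, not real `qstat` output from Carbonate.

```shell
# A captured qstat -u listing (illustrative sample, not real output).
listing='Job ID     Username  Queue        Jobname   S
123456     username  NORMAL       job1      R
123457     username  NORMAL       job2      Q
123458     username  LARGEMEMORY  job3      R'

# In this layout, column 5 holds the job state: R = running, Q = queued.
running=$(printf '%s\n' "$listing" | awk '$5 == "R" { n++ } END { print n + 0 }')
echo "running jobs: $running"   # running jobs: 2
```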
Delete jobs
To delete a queued or running job, use the TORQUE `qdel` command combined with the job ID that TORQUE returned when you entered your `qsub` command (for example, `1071767`):

  `qdel 1071767`

Use `qdel -W <delay>` to override the delay between the SIGTERM and SIGKILL signals (for `<delay>`, specify a value in seconds).

For more, see the `qdel` manual page.
Additional queue information
- DEBUG: To submit a batch job to the DEBUG queue, either add the `#PBS -q debug` directive to your job script, or enter `qsub -q debug` on the command line.

  The DEBUG queue is intended for short, quick-turnaround test jobs requiring less than one hour of wall time. For longer debugging or testing sessions, submit an interactive job to the INTERACTIVE queue instead.

  - Maximum wall time: 1 hour
  - Maximum nodes per job: 2
  - Maximum cores per job: 48
  - Maximum number of jobs per user: 2
  - Direct submission: Yes

- INTERACTIVE: To submit an interactive job to the INTERACTIVE queue, on the command line, enter `qsub` with the `-I` and `-q interactive` options added; for example:

  `qsub -I -q interactive -l nodes=1:ppn=1,walltime=4:00:00`

  Interactive jobs submitted to the INTERACTIVE queue should experience less wait time (start sooner) than interactive jobs submitted to the batch execution queues. If you enter `qsub` without the `-q interactive` option, your interactive job will be placed in the routing queue for submission to the NORMAL or LARGEMEMORY batch execution queue, which will most likely entail a longer wait time for your job.

  - Maximum wall time: 8 hours
  - Maximum cores per job: 8
  - Maximum number of jobs per queue: 128
  - Maximum number of jobs per user: 2
  - Direct submission: Yes
Get help
Support for IU research supercomputers, software, and services is provided by various teams within the Research Technologies division of UITS.
- If you have a technical issue or system-specific question, contact the High Performance Systems (HPS) team.
- If you have a programming question about compilers, scientific/numerical libraries, or debuggers, contact the UITS Research Applications and Deep Learning team.
For general questions about research computing at IU, contact UITS Research Technologies.
For more options, see Research computing support at IU.
This is document avjo in the Knowledge Base.
Last modified on 2021-04-11 07:02:04.