What is TORQUE, and how do I use it to submit and manage jobs on Quarry at IU?
Note: Following a system-wide upgrade in December 2012, Quarry now runs Red Hat Enterprise Linux version 6 (RHEL 6) and uses the Modules package (instead of SoftEnv) for manipulating user environments. For more, see Information about the 2012 upgrade to Quarry at IU. If you encounter any problems or have questions, email the High Performance Systems group.
On this page:
- About TORQUE
- Policy on Quarry
- Queues on Quarry
- Job scripts
- Submitting jobs
- Monitoring jobs
- Deleting jobs
About TORQUE
TORQUE is a resource management system for submitting and controlling jobs on supercomputers, clusters, and grids. TORQUE manages jobs that users submit to various queues on a computer system, each queue representing a group of resources with attributes necessary for the queue's jobs.
Commonly used TORQUE commands include:
- qsub: Submit a job.
- qstat: Monitor the status of a job.
- qdel: Terminate a job prior to its completion.
TORQUE is based on the original open source Portable Batch System (OpenPBS) project and is managed as an open source project by Adaptive Computing, Inc. in cooperation with the TORQUE community. For more, see Adaptive Computing's TORQUE product page.
For help using TORQUE to submit and manage jobs, see the Submitting and managing jobs chapter of Adaptive Computing's TORQUE Administrator Guide. For a list of TORQUE commands, see the Commands overview appendix. For questions, suggestions, and issues about TORQUE, subscribe to the TORQUE mailing list or view the mailing list archive.
Policy on Quarry
At Indiana University, the Quarry research system uses
TORQUE for resource management in combination with the Moab
job scheduler. Job submission and management tools are available in
/usr/local/bin. Most tools have associated manual
(man) pages. More information regarding these tools is
available below and in the documentation mentioned above.
User processes on login nodes are limited to 20 minutes of CPU time. Processes exceeding this limit are automatically terminated without warning.
To run jobs that require more than 20 minutes (but less than 24 hours) of CPU time, use the interactive nodes (q0145-q0148). To access one of these nodes, log in to Quarry, and then use SSH to connect to q0145, q0146, q0147, or q0148.
If your job requires more than 24 hours of CPU time, use the
qsub command to submit a batch job to
TORQUE.
Queues on Quarry
The following queues are available on Quarry:
Note: Cluster-wide, the maximum number of tasks is 2,768 (346 compute nodes available [342 in queues, 4 in the user-selectable DEBUG queue] × 8 tasks per node).
SERIAL queue properties
- Nodes: 19 serial (q0155-q0173) + 78 normal (q0174-q0251) + 117 long (q0252-q0368) = 214 total
- Maximum walltime: 12 hours
- Maximum nodes per job: 1 node
- Maximum cores per job: 8 cores
- Maximum number of jobs per queue: 2,000
- Maximum number of jobs per user: 500
- Direct submission: No
NORMAL queue properties
- Nodes: 78 normal (q0174-q0251) + 117 long (q0252-q0368) = 195 total
- Maximum walltime: 7 days
- Maximum nodes per job: 6 nodes
- Maximum cores per job: 48 cores
- Maximum number of jobs per queue: 1,500
- Maximum number of jobs per user: 500
- Direct submission: No
LONG queue properties
- Nodes: 117 long (q0252-q0368) = 117 total
- Maximum walltime: 14 days
- Maximum nodes per job: 117 nodes
- Maximum cores per job: 936 cores
- Maximum number of jobs per queue: 500
- Maximum number of jobs per user: 50
- Direct submission: No
DEBUG queue properties
- Nodes: 4 dedicated nodes (q0151-q0154)
- Maximum walltime: 30 minutes
- Maximum nodes per job: 2 nodes
- Maximum cores per job: 16 cores
- Maximum number of jobs per queue: No limit
- Maximum number of jobs per user: 2
- Direct submission: Yes (qsub -q debug)
Note: The DEBUG queue is intended for short, quick-turnaround test jobs requiring less than 30 minutes of wall-clock time.
Once code has been debugged, you may submit it to the other work queues via the normal qsub mechanism. You may let the batch system dispatch your jobs to an appropriate queue, or select a user-selectable queue with qsub -q queue_name.
If you don't specify a queue when you submit a job, the job is automatically placed in the queue whose limits it fits, based on criteria such as the number of nodes or CPUs requested.
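The node and core figures in the queue lists follow from the node ranges and the 8 tasks per node given in the cluster-wide note. As a sketch, assuming every compute node provides 8 cores as that note states, this bash snippet recomputes them:

```shell
#!/bin/bash
# Recompute Quarry queue sizes from the node ranges in the queue lists.
# Assumption: every compute node provides 8 cores (8 tasks per node).
cores_per_node=8

# Print the number of nodes in an inclusive range such as q0155-q0173.
# "10#" forces base 10 so leading zeros (e.g., 0155) are not read as octal.
range_size() {
  local first=$((10#${1#q}))
  local last=$((10#${2#q}))
  echo $(( last - first + 1 ))
}

serial=$(range_size q0155 q0173)
normal=$(range_size q0174 q0251)
long=$(range_size q0252 q0368)
debug=$(range_size q0151 q0154)

# Queues share nodes: SERIAL can use serial, normal, and long nodes;
# NORMAL can use normal and long nodes; LONG uses only long nodes.
echo "SERIAL nodes: $(( serial + normal + long ))"
echo "NORMAL nodes: $(( normal + long ))"
echo "LONG nodes:   $long"
echo "DEBUG nodes:  $debug"

# Maximum cores per job = maximum nodes per job x cores per node
echo "SERIAL max cores/job: $(( 1 * cores_per_node ))"
echo "NORMAL max cores/job: $(( 6 * cores_per_node ))"
echo "LONG max cores/job:   $(( long * cores_per_node ))"
echo "DEBUG max cores/job:  $(( 2 * cores_per_node ))"
```

Running it prints 214, 195, 117, and 4 nodes, and 8, 48, 936, and 16 maximum cores per job, respectively.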
Job scripts
TORQUE most commonly handles job scripts, although it also supports interactive jobs. A job script can be as simple as a bash or tcsh shell script, but it can also include TORQUE job directives. Always begin a TORQUE job script (which will be executed under your preferred login shell) with a "shebang" line that specifies the command interpreter under which it should run, for example, #!/bin/bash or #!/bin/tcsh.
TORQUE directives (lines beginning with #PBS) include switches for specifying useful information, such as the wall-clock time required to complete the job, the number of nodes and processors required, and filenames for job output and errors. Place these directives at the top of the script, immediately after the "shebang" line. For example, the lines of a typical TORQUE job script might, in order, say:
- Use bash as the command interpreter for this script.
- Keep the job output.
- This job requires four nodes, two processors per node, and 30 minutes of wall-clock time.
- Send job-related email to username@indiana.edu.
- Send email if the job is aborted (a), when it begins (b), and when it ends (e).
- The job name is JobName.
- Join standard output and standard error.
- Execute ~/bin/binaryname on eight processors from the machines in $PBS_NODEFILE using mpirun.
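Assembled into a file, the lines described above might look like the following sketch. It is written as a bash here-document so that pasting it at a shell prompt creates the script; the filename job.pbs is hypothetical, and the mpirun options shown (-np for the process count, -machinefile for the host list) are an assumption that depends on the MPI implementation you load, so check its documentation.

```shell
# Create the example job script as job.pbs (hypothetical filename).
# Quoting 'EOF' keeps $PBS_NODEFILE and ~ unexpanded until the job runs.
cat > job.pbs <<'EOF'
#!/bin/bash
#PBS -k o
#PBS -l nodes=4:ppn=2,walltime=30:00
#PBS -M username@indiana.edu
#PBS -m abe
#PBS -N JobName
#PBS -j oe

# 4 nodes x 2 processors per node = 8 MPI processes.
# Assumption: your MPI's mpirun accepts -np and -machinefile.
mpirun -np 8 -machinefile "$PBS_NODEFILE" ~/bin/binaryname
EOF
```

You could then submit it with qsub job.pbs.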
For additional details on TORQUE directives, view the manual pages (enter man qsub).
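A single-processor job can use the same structure with a smaller resource request. The sketch below is hypothetical: the filename, job name, program path, and 2-hour walltime are illustrative. It requests one node and one processor, which is within the SERIAL queue limits listed under Queues on Quarry. It also changes to $PBS_O_WORKDIR, the directory from which you ran qsub, because TORQUE starts batch jobs in your home directory by default.

```shell
# Create a single-processor job script as serial_job.pbs (hypothetical).
cat > serial_job.pbs <<'EOF'
#!/bin/bash
#PBS -l nodes=1:ppn=1,walltime=2:00:00
#PBS -N SerialJob
#PBS -j oe

# Move to the directory qsub was run from (set by TORQUE at run time).
cd "$PBS_O_WORKDIR" || exit 1

# Hypothetical serial program
~/bin/serial_program
EOF
```

Which queue such a job actually lands in depends on how the batch system routes jobs that fit several queues.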
Submitting jobs
To submit a job, use the qsub command followed by the name of your job script (for example, qsub job_script). If the command runs successfully, it prints the new job's ID to standard output.
If your job requires attribute values greater than the defaults, but less than the maximum allowed, specify these either in the job script with TORQUE directives, or on the command line with the -l (lowercase L) switch. For example, to request 10 hours of wall-clock time instead of the default 30 minutes, use:
qsub -l walltime=10:00:00 job_script
Note: Command-line arguments override directives in the job script. You can specify several attributes on the command line, either as comma-separated options following a single -l switch, or each with its own -l switch. For example, the following two commands are equivalent:
qsub -l nodes=4:ppn=2,walltime=10:00:00 job_script
qsub -l nodes=4:ppn=2 -l walltime=10:00:00 job_script
Useful qsub switches include:
- qsub -q queue_name: Submit the job to a user-selectable queue (queue_name).
- qsub -r y: Make the job rerunnable.
- qsub -a date_time: Execute the job only after a specific date and time (date_time).
- qsub -V: Export all environment variables in your current environment to the job.
- qsub -I: Run the job interactively (usually for testing purposes).
For more, see the qsub manual page (man
qsub).
Monitoring jobs
To monitor the status of a queued or running job, use the
qstat command.
Useful qstat switches include:
- qstat -u user_list: Display jobs for the users listed in user_list.
- qstat -a: Display all jobs.
- qstat -r: Display running jobs.
- qstat -f: Display a full listing of jobs (very detailed output).
- qstat -n: Display the nodes allocated to jobs.
For example, to see all the jobs running in the LONG queue, use:
qstat -r long | less
The Moab job scheduler provides another useful command for monitoring jobs, showq. To list the queued jobs in dispatch (priority) order, use:
showq -i
For more about Moab, see What is Moab? and the showq manual page (man showq).
Deleting jobs
To delete queued or running jobs, use the qdel command:
- qdel jobid: Delete a specific job (jobid).
- qdel all: Delete all of your jobs.
Occasionally, a node becomes unresponsive and ignores the TORQUE server's requests to delete a job. If that occurs, add the -W (uppercase W) option to qdel; see the qdel manual page (man qdel) for the values it accepts.
If that doesn't work, email High Performance Systems for help.
This document was developed with support from National Science Foundation (NSF) grant OCI-1053575. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the NSF.
Last modified on May 21, 2013.