Indiana University
University Information Technology Services
  

What is TORQUE, and how do I use it to submit and manage jobs on Quarry at IU?

Note: Following a system-wide upgrade in December 2012, Quarry now runs Red Hat Enterprise Linux version 6 (RHEL 6) and uses the Modules package (instead of SoftEnv) for manipulating user environments. For more, see Information about the 2012 upgrade to Quarry at IU. If you encounter any problems or have questions, email the High Performance Systems group.


About TORQUE

TORQUE is a resource management system for submitting and controlling jobs on supercomputers, clusters, and grids. TORQUE manages jobs that users submit to various queues on a computer system, each queue representing a group of resources with attributes necessary for the queue's jobs.

Commonly used TORQUE commands include:

qsub   Submit a job.
qstat  Monitor the status of a job.
qdel   Terminate a job prior to its completion.

TORQUE is based on the original open source Portable Batch System (OpenPBS) project and is managed as an open source project by Adaptive Computing, Inc. in cooperation with the TORQUE community. For more, see Adaptive Computing's TORQUE product page.

For help using TORQUE to submit and manage jobs, see the Submitting and managing jobs chapter of Adaptive Computing's TORQUE Administrator Guide. For a list of TORQUE commands, see the Commands overview appendix. For questions, suggestions, and issues about TORQUE, subscribe to the TORQUE mailing list or view the mailing list archive.


Policy on Quarry

At Indiana University, the Quarry research system uses TORQUE for resource management in combination with the Moab job scheduler. Job submission and management tools are available in /usr/local/bin. Most tools have associated manual (man) pages. More information regarding these tools is available below and in the documentation mentioned above.

User processes on login nodes are limited to 20 minutes of CPU time. Processes exceeding this limit are automatically terminated without warning.

To run jobs that require more than 20 minutes (but less than 24 hours) of CPU time, use the interactive nodes (q0145-q0148). To access one of these nodes, you must log into Quarry, and then use SSH to connect to q0145, q0146, q0147, or q0148.

If your job requires more than 24 hours of CPU time, use the qsub command to submit a batch job to TORQUE.
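The CPU-time thresholds above can be sketched as a small shell helper. This is only an illustration: the function name where_to_run is hypothetical, and treating exactly 24 hours as a batch job is an assumption, since the policy does not cover that boundary.

```shell
#!/bin/bash
# Suggest where to run a job on Quarry from its estimated CPU time in minutes.
# Thresholds follow the policy text; where_to_run is a hypothetical helper.
where_to_run() {
    local cpu_minutes="$1"
    if [ "$cpu_minutes" -le 20 ]; then
        echo "login node"
    elif [ "$cpu_minutes" -lt 1440 ]; then   # 1440 minutes = 24 hours
        echo "interactive node (q0145-q0148)"
    else
        # Assumption: exactly 24 hours or more goes to the batch system.
        echo "batch job (qsub)"
    fi
}

where_to_run 15
where_to_run 90
where_to_run 3000
```

Shorter jobs can also be submitted with qsub; the helper only reflects the minimum requirement for each CPU-time range.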


Queues on Quarry

The following queues are available on Quarry:

Note: Cluster-wide, the maximum number of tasks is 2,768 (346 compute nodes available [342 in queues, 4 in user-selectable debug] X 8 tasks per node).

SERIAL queue properties

  • Nodes: 19 serial (q0155-q0173) + 78 normal (q0174-q0251) + 117 long (q0252-q0368) = 214 total
  • Maximum walltime: 12 hours
  • Maximum nodes per job: 1 node
  • Maximum cores per job: 8 cores
  • Maximum number of jobs per queue: 2,000
  • Maximum number of jobs per user: 500
  • Direct submission: No

NORMAL queue properties

  • Nodes: 78 normal (q0174-q0251) + 117 long (q0252-q0368) = 195 total
  • Maximum walltime: 7 days
  • Maximum nodes per job: 6 nodes
  • Maximum cores per job: 48 cores
  • Maximum number of jobs per queue: 1,500
  • Maximum number of jobs per user: 500
  • Direct submission: No

LONG queue properties

  • Nodes: 117 long (q0252-q0368) = 117 total
  • Maximum walltime: 14 days
  • Maximum nodes per job: 117 nodes
  • Maximum cores per job: 936 cores
  • Maximum number of jobs per queue: 500
  • Maximum number of jobs per user: 50
  • Direct submission: No

DEBUG queue properties

  • Nodes: 4 nodes dedicated (q0151-q0154)
  • Maximum walltime: 30 minutes
  • Maximum nodes per job: 2 nodes
  • Maximum cores per job: 16 cores
  • Maximum number of jobs per queue: None
  • Maximum number of jobs per user: 2
  • Direct submission: Yes (qsub -q debug)

Note: The DEBUG queue is intended for short, quick-turnaround test jobs requiring less than 30 minutes of wall clock time.

Once your code has been debugged, submit it to the other work queues with qsub as usual. You can let the batch system dispatch your job to an appropriate queue, or choose a user-selectable queue yourself with qsub -q queue_name.

If you don't specify a queue when you submit a job, the job is routed automatically to a queue whose limits it fits, based on the resources it requests (e.g., number of nodes or CPUs).
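As a rough illustration of how the queue limits above constrain a job, the following sketch returns the first queue whose node and walltime limits a request fits. It is not the scheduler's actual routing logic, which may weigh other criteria. The function name pick_queue is hypothetical, and walltime is given in whole hours for simplicity.

```shell
#!/bin/bash
# Pick the first Quarry queue whose published limits fit a request.
# Arguments: number of nodes, walltime in whole hours.
# pick_queue is a hypothetical helper; real routing is done by TORQUE and Moab.
pick_queue() {
    local nodes="$1" hours="$2"
    if [ "$nodes" -le 1 ] && [ "$hours" -le 12 ]; then
        echo "serial"                       # 1 node, up to 12 hours
    elif [ "$nodes" -le 6 ] && [ "$hours" -le 168 ]; then
        echo "normal"                       # up to 6 nodes, 7 days
    elif [ "$nodes" -le 117 ] && [ "$hours" -le 336 ]; then
        echo "long"                         # up to 117 nodes, 14 days
    else
        echo "none"
        return 1                            # request exceeds every queue limit
    fi
}

pick_queue 1 10
pick_queue 4 24
pick_queue 1 200
```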


Job scripts

TORQUE most commonly handles job scripts, although it also supports interactive jobs. A job script can be as simple as a bash or tcsh shell script, but also can include several TORQUE job directives. Always begin a TORQUE job script (which will be executed under your preferred login shell) with a "shebang" line that specifies the command interpreter under which it should run, for example:

#!/bin/bash

TORQUE directives (lines beginning with #PBS) include switches for specifying useful information, such as the wall-clock time required to complete the job, the number of nodes and processors required, and filenames for job output and errors. Your script should include these directives at the top, following the "shebang" line. An example TORQUE job script might look like this:

#!/bin/bash
#PBS -k o
#PBS -l nodes=4:ppn=2,walltime=30:00
#PBS -M username@indiana.edu
#PBS -m abe
#PBS -N JobName
#PBS -j oe
mpiexec -np 8 -machinefile $PBS_NODEFILE ~/bin/binaryname

Line by line, the above script says:

  • Use bash as the command interpreter for this script.
  • Keep the job's standard output on the execution host.
  • This job requires four nodes, two processors per node, and 30 minutes of wall-clock time.
  • Send job-related email to username@indiana.edu.
  • Send email if the job is (a) aborted, when it (b) begins, and when it (e) ends.
  • The job name is JobName.
  • Join standard output and standard error.
  • Execute ~/bin/binaryname on eight processors from the machines in $PBS_NODEFILE using mpiexec.

For additional details on TORQUE directives, view the manual pages (enter man qsub).
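If you write many similar job scripts, a small generator can fill in the directives for you. The following is a minimal sketch, not an official tool: make_job_script is a hypothetical helper, and the generated script assumes you want the job to start in the directory from which qsub was run ($PBS_O_WORKDIR, a variable TORQUE sets for the job).

```shell
#!/bin/bash
# Print a TORQUE job script to standard output.
# Arguments: job name, nodes, processors per node, walltime, command to run.
# make_job_script is a hypothetical helper, not a TORQUE tool.
make_job_script() {
    local name="$1" nodes="$2" ppn="$3" walltime="$4" command="$5"
    cat <<EOF
#!/bin/bash
#PBS -N ${name}
#PBS -l nodes=${nodes}:ppn=${ppn},walltime=${walltime}
#PBS -j oe
# Start in the directory from which qsub was run.
cd \$PBS_O_WORKDIR
${command}
EOF
}

# Example: a one-node, eight-core job with two hours of wall-clock time.
make_job_script JobName 1 8 02:00:00 "./binaryname"
```

Redirect the output to a file (for example, make_job_script ... > job.script) and submit that file with qsub.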


Submitting jobs

To submit jobs, use the qsub command. If the command runs successfully, it will return a job ID to standard output, for example:

qsub job.script
123456.qm2

If your job requires attribute values greater than the defaults, but less than the maximum allowed, specify these either in the job script with TORQUE directives, or on the command line with the -l (lowercase L) switch. For example, to submit a job needing more than the default 30 minutes of wall-clock time, use:

qsub -l walltime=10:00:00 job.script

Note: Command-line arguments override directives in the job script, and you may specify many attributes on the command line, either as comma-separated options following the -l switch, or each with its own -l switch. The following two commands are equivalent:

qsub -l ncpus=16,mem=1024mb job.script
qsub -l ncpus=16 -l mem=1024mb job.script
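Because qsub prints the job ID to standard output, a wrapper script can capture it for later use with qstat or qdel. The sketch below assumes a job script named job.script in the current directory; job_number is a hypothetical helper that strips the server suffix (such as .qm2) from the ID.

```shell
#!/bin/bash
# Extract the numeric part of a TORQUE job ID such as 123456.qm2.
# job_number is a hypothetical helper, not a TORQUE command.
job_number() {
    echo "${1%%.*}"
}

# Submit only if qsub is available and job.script exists in this directory.
if command -v qsub >/dev/null 2>&1 && [ -f job.script ]; then
    jobid=$(qsub -l walltime=10:00:00 job.script)
    echo "Submitted job $(job_number "$jobid") as $jobid"
fi
```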

Useful qsub switches include:

-q queue_name  Specify a user-selectable queue (queue_name).
-r y           Make the job re-runnable.
-a date_time   Execute the job only after a specific date and time (date_time).
-V             Export environment variables in your current environment to the job.
-I             Run interactively (usually for testing purposes).

For more, see the qsub manual page (man qsub).


Monitoring jobs

To monitor the status of a queued or running job, use the qstat command.

Useful qstat switches include:

-u user_list  Display jobs for users listed in user_list.
-a            Display all jobs.
-r            Display running jobs.
-f            Display full listing of jobs (returns excessive detail).
-n            Display nodes allocated to jobs.

For example, to see all the jobs running in the LONG queue, use:

qstat -r long | less

The Moab job scheduler provides another useful command for monitoring jobs: showq. To list the queued jobs in dispatch order, use:

showq -i

For more about Moab, see What is Moab? and the showq manual page (man showq).
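The full listing from qstat -f is verbose, but a script can pull out a single field. The following sketch extracts the job_state line (for example, Q for queued or R for running) from qstat -f output read on standard input. The function name job_state is hypothetical, and the sample text only imitates the layout of a real listing.

```shell
#!/bin/bash
# Read "qstat -f" output on standard input and print the job's state code.
# job_state is a hypothetical helper, not a TORQUE command.
job_state() {
    awk -F' = ' '$1 ~ /^[[:space:]]*job_state$/ { print $2; exit }'
}

# Demonstration with sample text laid out like a qstat -f listing.
sample='Job Id: 123456.qm2
    Job_Name = JobName
    job_state = R
    queue = long'
printf '%s\n' "$sample" | job_state
```

On Quarry, the equivalent call would be qstat -f jobid | job_state.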


Deleting jobs

To delete queued or running jobs, use the qdel command:

qdel jobid  Delete a specific job (jobid).
qdel all    Delete all of your jobs.

Occasionally, a node becomes unresponsive and won't respond to the TORQUE server's requests to delete a job. If that occurs, add the -W (uppercase W) option:

qdel -W jobid

If that doesn't work, email High Performance Systems for help.
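To delete only some of your jobs, you can combine qdel with qselect, the TORQUE command that lists job IDs matching given criteria. The sketch below deletes your jobs that are still queued (state Q) and leaves running jobs alone. It assumes qselect is installed alongside the other TORQUE tools; delete_queued_jobs is a hypothetical helper name.

```shell
#!/bin/bash
# Delete all of the current user's jobs that are still queued (state Q).
# delete_queued_jobs is a hypothetical helper; it relies on qselect and qdel.
delete_queued_jobs() {
    local user jobs
    user="${USER:-$(id -un)}"
    jobs=$(qselect -u "$user" -s Q)
    if [ -z "$jobs" ]; then
        echo "No queued jobs to delete."
        return 0
    fi
    # qdel accepts several job IDs at once; $jobs is left unquoted on purpose
    # so that each ID becomes a separate argument.
    qdel $jobs
}
```

Call delete_queued_jobs from a Quarry login session after defining it. The block above only defines the function and deletes nothing by itself.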


This document was developed with support from National Science Foundation (NSF) grant OCI-1053575. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the NSF.

This is document avmy in domains all and xsede-all.
Last modified on May 21, 2013.
