What is TORQUE, and how do I use it to submit and manage jobs on high-performance computing systems?

Following is an overview of how to use the TORQUE resource manager to submit and manage batch jobs on high-performance computing systems.

TORQUE at IU

At Indiana University, IU research computing systems use the TORQUE resource manager (based on OpenPBS) and the Moab Workload Manager to manage and schedule jobs. For information about Moab, see What is Moab?

Important:
For your application to execute on Big Red II's compute nodes, your batch job script must include the appropriate application launch command (aprun for ESM jobs; ccmrun for CCM jobs). Additionally, for CCM jobs, you must load the ccm module (add module load ccm to your ~/.modules file) and use the -l gres=ccm TORQUE directive in your job script. TORQUE scripts that run batch jobs in other Linux environments, such as Red Hat Enterprise Linux (RHEL) or CentOS, will not work on Big Red II without the proper modifications. If your script's executable line does not begin with the appropriate launch command, your application will execute on an aprun service node, not a compute node, and will likely cause a service disruption for all users on the system. The aprun nodes are shared by all currently running jobs and are intended only for passing job requests; any memory- or computationally-intensive processes running on aprun nodes will be terminated.

For information about using TORQUE to submit batch jobs on Big Red II, see How do I run batch jobs on Big Red II at IU?

For information about batch queues available on IU research computing systems, see Queue information for IU research computing systems.

Note:
To best meet the needs of all research projects affiliated with Indiana University, the High Performance Systems (HPS) team administers the batch job queues on UITS Research Technologies supercomputers using resource management and job scheduling policies that optimize the overall efficiency and performance of workloads on those systems. If the structure or configuration of the batch queues on any of IU's supercomputing systems does not meet the needs of your research project, fill out and submit the Research Technologies Ask RT for Help form (for "Select a group to contact", select High Performance Systems).

Although UITS Research Technologies cannot provide dedicated access to an entire compute system during the course of normal operations, "single user time" is made available by request one day a month during each system's regularly scheduled maintenance window to accommodate IU researchers with tasks requiring dedicated access to an entire compute system. To request such single user time, complete and submit the Research Technologies Ask RT for Help form, requesting to run jobs in single user time on HPS systems. If you have questions, email the HPS team.

TORQUE overview

TORQUE is a resource management system for submitting and controlling jobs on supercomputers, clusters, and grids. TORQUE manages jobs that users submit to various queues on a computer system, each queue representing a group of resources with attributes necessary for the queue's jobs.

Commonly used TORQUE commands include:

qsub Submit a job.
qstat Monitor the status of a job.
qdel Terminate a job prior to its completion.

TORQUE includes numerous directives, which are used to specify resource requirements and other attributes for batch and interactive jobs. TORQUE directives can appear as header lines (lines that start with #PBS) in a batch job script or as command-line options to the qsub command.

TORQUE is based on the original open source Portable Batch System (OpenPBS) project and is managed as an open source project by Adaptive Computing, Inc. in cooperation with the TORQUE community. For more, see Adaptive Computing's TORQUE product page.

For help using TORQUE to submit and manage jobs, see the Submitting and managing jobs chapter of Adaptive Computing's TORQUE Administrator Guide. For a list of TORQUE commands, see the Commands overview appendix. For questions, suggestions, and issues about TORQUE, subscribe to the TORQUE mailing list or view the mailing list archive.

Job scripts

Note:
For information about using TORQUE job scripts on Big Red II, see How do I run batch jobs on Big Red II at IU?

To run a job in batch mode on a high-performance computing system using TORQUE, first prepare a job script that specifies the application you want to run and the resources required to run it, and then submit the script to TORQUE using the qsub command. TORQUE passes your job and its requirements to the system's job scheduler, which then dispatches your job whenever the required resources are available.

A very basic job script might contain just a bash or tcsh shell script. However, TORQUE job scripts most commonly contain at least one executable command preceded by a list of directives that specify resources and other attributes needed to execute the command (e.g., wall-clock time, the number of nodes and processors, and filenames for job output and errors). These directives are listed in header lines (lines beginning with #PBS), which should precede any executable lines in your job script.

Additionally, your TORQUE job script (which will be executed under your preferred login shell) should begin with a line that specifies the command interpreter under which it should run.

For example:

  • A TORQUE job script for an MPI job might look like this:
      #!/bin/bash
      #PBS -k o 
      #PBS -l nodes=2:ppn=6,walltime=30:00
      #PBS -M jthutt@tatooine.net
      #PBS -m abe 
      #PBS -N JobName 
      #PBS -j oe 
    
      mpiexec -np 12 -machinefile $PBS_NODEFILE ~/bin/binaryname
    

    In the above example, the first line indicates the script should be read using the bash command interpreter. Then, several header lines of TORQUE directives are included:

    TORQUE directive Description
    #PBS -k o Keeps the job output
    #PBS -l nodes=2:ppn=6,walltime=30:00 Indicates the job requires two nodes, six processors per node, and 30 minutes of wall-clock time
    #PBS -M jthutt@tatooine.net Sends job-related email to jthutt@tatooine.net
    #PBS -m abe Sends email if the job is (a) aborted, when it (b) begins, and when it (e) ends
    #PBS -N JobName Names the job JobName
    #PBS -j oe Joins standard output and standard error

    The last line in the example is the executable line. It tells the operating system to use the mpiexec command to execute the ~/bin/binaryname binary on 12 processors from the machines listed in $PBS_NODEFILE.

  • A TORQUE job script for a serial job might look like this:
      #!/bin/bash
      #PBS -k o 
      #PBS -l nodes=1:ppn=1,walltime=30:00 
      #PBS -M jthutt@tatooine.net 
      #PBS -m abe
      #PBS -N JobName 
      #PBS -j oe
      ./a.out
    

    As in the previous example, this script starts with a line that specifies the bash command interpreter, followed by several header lines of TORQUE directives:

    TORQUE directive Description
    #PBS -k o Keeps the job output
    #PBS -l nodes=1:ppn=1,walltime=30:00 Indicates the job requires one node, one processor per node, and 30 minutes of wall-clock time
    #PBS -M jthutt@tatooine.net Sends job-related email to jthutt@tatooine.net
    #PBS -m abe Sends email if the job is (a) aborted, when it (b) begins, and when it (e) ends
    #PBS -N JobName Names the job JobName
    #PBS -j oe Joins standard output and standard error

    The last line tells the operating system to execute a.out on a single processor.

For more about TORQUE directives, see the qsub manual page (enter man qsub).

Submitting jobs

To submit your job script (e.g., job.script), use the TORQUE qsub command. If the command runs successfully, it will return a job ID to standard output, for example:

qsub job.script

  123456.qm2
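Because qsub writes the job ID to standard output, you can capture it in a shell variable for later use with qstat or qdel. A minimal sketch (the ID 123456.qm2 below is a stand-in for whatever qsub actually prints on your system, where you would instead write jobid=$(qsub job.script)):

```shell
# Stand-in for: jobid=$(qsub job.script)
# On a real TORQUE system, qsub prints an ID such as "123456.qm2".
jobid="123456.qm2"

# The numeric part alone is usually accepted by qstat and qdel:
jobnum=${jobid%%.*}    # strip the ".server" suffix -> "123456"
echo "$jobnum"
```

On a live system you could then run, for example, qstat "$jobid" or qdel "$jobid".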

If your job requires attribute values greater than the defaults, but less than the maximum allowed, you can specify them with the -l (lowercase L, for "resource list") option, either in your job script (as explained in the previous section) or on the qsub command line. For example, the following command submits job.script, using the -l walltime option to indicate the job needs more than the default 30 minutes of wall-clock time:

qsub -l walltime=10:00:00 job.script
Note:
Command-line options will override TORQUE directives in your job script.

To include multiple options on the command line, use either one -l flag with several comma-separated options, or multiple -l flags, each separated by a space. For example, the following two commands are equivalent:

  qsub -l ncpus=16,mem=1024mb job.script

  qsub -l ncpus=16 -l mem=1024mb job.script

Useful qsub options include:

qsub option Description
-q queue_name Specifies a user-selectable queue (queue_name)
-r y Marks the job as re-runnable (-r n marks it non-re-runnable)
-a date_time Executes the job only after a specific date and time (date_time)
-V Exports environment variables in your current environment to the job
-I Makes the job run interactively (usually for testing purposes)
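Most of these options can equivalently be written as #PBS header lines in the job script itself. A hypothetical sketch (the queue name debug is an assumption; substitute a queue that exists on your system):

```shell
#!/bin/bash
#PBS -q debug        # submit to a user-selectable queue ("debug" is hypothetical)
#PBS -r y            # mark the job re-runnable
#PBS -V              # export your current environment variables to the job

# To bash, the #PBS lines above are ordinary comments; only TORQUE reads them.
greeting="job started on $(hostname)"
echo "$greeting"
```

Command-line options passed to qsub override the corresponding #PBS directives, as noted above.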

For more, see the qsub manual page (enter man qsub).

Monitoring jobs

To monitor the status of a queued or running job, use the qstat command.

Useful qstat options include:

qstat option Description
-u user_list Displays jobs for users listed in user_list
-a Displays all jobs
-r Displays running jobs
-f Displays the full listing of jobs (verbose; returns extensive detail)
-n Displays nodes allocated to jobs

For example, to see all the jobs running in the long queue, enter:

  qstat -r long | less
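The qstat options can also be combined; for instance, you might list one user's running jobs together with their allocated nodes. A sketch that only assembles the command line (qstat itself requires a TORQUE system; the username jthutt is the hypothetical one from the job-script examples above):

```shell
# Compose a qstat invocation (not executed here; qstat needs a TORQUE server).
user="jthutt"                 # hypothetical username from the examples above
cmd="qstat -u $user -r -n"    # running jobs for one user, with allocated nodes
echo "$cmd"
```

On a live system you would run the resulting command directly, e.g. qstat -u jthutt -r -n.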

For more, see the qstat manual page (enter man qstat).

Alternatively, use the Moab showq command for monitoring jobs. To list the queued jobs in dispatch order, enter:

  showq -i

For more, see Common Moab scheduler commands and the showq manual page (enter man showq).

For web-based status updates and information about jobs running on IU research computing systems, use the IU Cyberinfrastructure Gateway; see What is the IU Cyberinfrastructure Gateway?

Deleting jobs

To delete queued or running jobs, use the qdel command:

  • To delete a specific job (jobid), enter:
  qdel jobid
    
  • To delete all jobs, enter:
  qdel all
    

Occasionally, a node becomes unresponsive and won't respond to the TORQUE server's requests to delete a job. If that occurs, add the -W (uppercase W) option:

  qdel -W jobid

If that doesn't work, email the High Performance Systems group for help.

For more, see the qdel manual page (enter man qdel).

This document was developed with support from National Science Foundation (NSF) grants 1053575 and 1548562. Any opinions, findings, conclusions, or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the NSF.

This is document avmy in the Knowledge Base.
Last modified on 2017-07-19 11:02:51.
