ARCHIVED: Use TORQUE to submit and manage jobs on high performance computing systems
TORQUE overview
TORQUE is a resource management system for submitting and controlling jobs on supercomputers, clusters, and grids. TORQUE manages jobs that users submit to various queues on a computer system, each queue representing a group of resources with attributes necessary for the queue's jobs.
Commonly used TORQUE commands include:
Command | Description |
---|---|
qsub | Submit a job. |
qstat | Monitor the status of a job. |
qdel | Terminate a job prior to its completion. |
TORQUE includes numerous directives, which are used to specify resource requirements and other attributes for batch and interactive jobs. TORQUE directives can appear as header lines (lines that start with #PBS) in a batch job script or as command-line options to the qsub command.
TORQUE is based on the original open source Portable Batch System (OpenPBS) project and is managed as an open source project by Adaptive Computing, Inc. in cooperation with the TORQUE community. For more, see Adaptive Computing's TORQUE Resource Manager.
For help using TORQUE to submit and manage jobs, see the Submitting and managing jobs chapter of Adaptive Computing's TORQUE Administrator Guide. For a list of TORQUE commands, see the Commands overview appendix.
Job scripts
To run a job in batch mode on a high performance computing system using TORQUE, first prepare a job script that specifies the application you want to run and the resources required to run it, and then submit the script to TORQUE using the qsub command. TORQUE passes your job and its requirements to the system's job scheduler, which then dispatches your job whenever the required resources are available.
A very basic job script might contain just a bash or tcsh shell script. However, TORQUE job scripts most commonly contain at least one executable command preceded by a list of directives that specify resources and other attributes needed to execute the command (for example, wall-clock time, the number of nodes and processors, and filenames for job output and errors). These directives are listed in header lines (lines beginning with #PBS), which should precede any executable lines in your job script.
Additionally, your TORQUE job script (which will be executed under your preferred login shell) should begin with a line that specifies the command interpreter under which it should run.
For example:
- A TORQUE job script for an MPI job might look like this:
    #!/bin/bash
    #PBS -k o
    #PBS -l nodes=2:ppn=6,walltime=30:00
    #PBS -M jthutt@tatooine.net
    #PBS -m abe
    #PBS -N JobName
    #PBS -j oe
    mpiexec -np 12 -machinefile $PBS_NODEFILE ~/bin/binaryname
In the above example, the first line indicates the script should be read using the bash command interpreter. Then, several header lines of TORQUE directives are included:
TORQUE directive | Description |
---|---|
#PBS -k o | Keeps the job output |
#PBS -l nodes=2:ppn=6,walltime=30:00 | Indicates the job requires two nodes, six processors per node, and 30 minutes of wall-clock time |
#PBS -M jthutt@tatooine.net | Sends job-related email to jthutt@tatooine.net |
#PBS -m abe | Sends email if the job is (a) aborted, when it (b) begins, and when it (e) ends |
#PBS -N JobName | Names the job JobName |
#PBS -j oe | Joins standard output and standard error |
The last line in the example is the executable line. It tells the operating system to use the mpiexec command to execute the ~/bin/binaryname binary on 12 processors from the machines listed in $PBS_NODEFILE.
- A TORQUE job script for a serial job might look like this:
    #!/bin/bash
    #PBS -k o
    #PBS -l nodes=1:ppn=1,walltime=30:00
    #PBS -M jthutt@tatooine.net
    #PBS -m abe
    #PBS -N JobName
    #PBS -j oe
    ./a.out
As in the previous example, this script starts with a line that specifies the bash command interpreter, followed by several header lines of TORQUE directives:
TORQUE directive | Description |
---|---|
#PBS -k o | Keeps the job output |
#PBS -l nodes=1:ppn=1,walltime=30:00 | Indicates the job requires one node, one processor per node, and 30 minutes of wall-clock time |
#PBS -M jthutt@tatooine.net | Sends job-related email to jthutt@tatooine.net |
#PBS -m abe | Sends email if the job is (a) aborted, when it (b) begins, and when it (e) ends |
#PBS -N JobName | Names the job JobName |
#PBS -j oe | Joins standard output and standard error |
The last line tells the operating system to execute a.out on a single processor.
For more about TORQUE directives, see the qsub manual page (enter man qsub).
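The -np value in the MPI example (12) is the product of nodes and ppn (2 x 6). Rather than hard-coding it, a job script can count the entries in $PBS_NODEFILE. The following is a minimal sketch, assuming the node file holds one line per allocated processor; the count_procs helper, the stand-in node file, and the hostnames node01 and node02 are hypothetical and exist only so the snippet runs outside a TORQUE job.

```shell
#!/bin/bash
# count_procs: print the number of processor slots listed in a node file.
# Assumption: the file contains one line per allocated processor.
count_procs() {
    # grep -c . counts the non-empty lines in the file
    grep -c . "$1"
}

# Outside a TORQUE job, $PBS_NODEFILE is unset, so build a stand-in file
# that mimics nodes=2:ppn=6 (hostnames are made up for illustration).
if [ -z "${PBS_NODEFILE:-}" ]; then
    PBS_NODEFILE=$(mktemp)
    trap 'rm -f "$PBS_NODEFILE"' EXIT
    for host in node01 node02; do
        for slot in 1 2 3 4 5 6; do
            echo "$host"
        done
    done > "$PBS_NODEFILE"
fi

NP=$(count_procs "$PBS_NODEFILE")

# Print the command rather than run it, since mpiexec may not be installed.
echo "mpiexec -np $NP -machinefile \$PBS_NODEFILE ~/bin/binaryname"
```

Inside a real job script, the final line would call mpiexec directly instead of echoing the command.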
Submit jobs
To submit your job script (for example, job.script), use the TORQUE qsub command. If the command runs successfully, it will return a job ID to standard output, for example:
qsub job.script
123456.qm2
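Because qsub writes the job ID to standard output, a wrapper script can capture it for later use with qstat or qdel. The sketch below is illustrative only: job_number is a hypothetical helper, and the sample ID from the text stands in for real qsub output so the snippet runs on a machine without TORQUE.

```shell
#!/bin/bash
# job_number: strip the server suffix from a TORQUE job ID
# (for example, 123456.qm2 becomes 123456). Hypothetical helper.
job_number() {
    # ${1%%.*} removes everything from the first dot onward
    printf '%s\n' "${1%%.*}"
}

# On a system with TORQUE installed, capture the ID at submission time:
#   JOBID=$(qsub job.script)
# Here the sample ID from the text is used in place of real qsub output.
JOBID="123456.qm2"

echo "Submitted job $JOBID (number $(job_number "$JOBID"))"
```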
If your job requires attribute values greater than the defaults, but less than the maximum allowed, you can specify these with the -l (lowercase L, for "limit") option, either in your job script (as explained in the previous section) or on the qsub command line. For example, the following command submits job.script, using the -l walltime option to indicate the job needs more than the default 30 minutes of wall-clock time:
qsub -l walltime=10:00:00 job.script
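The walltime values on this page appear in two forms: 30:00 (minutes and seconds) and 10:00:00 (hours, minutes, and seconds). When a script computes the limit, writing it in the longer form avoids ambiguity. A minimal sketch follows; walltime_from_minutes is a hypothetical helper, not part of TORQUE.

```shell
#!/bin/bash
# walltime_from_minutes: format a whole number of minutes as HH:MM:SS.
# Hypothetical helper; expects a non-negative integer without leading zeros.
walltime_from_minutes() {
    local total=$1
    printf '%02d:%02d:00\n' $((total / 60)) $((total % 60))
}

# Print the qsub command for a 600-minute (10-hour) limit.
echo "qsub -l walltime=$(walltime_from_minutes 600) job.script"
```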
To include multiple options on the command line, use either one -l flag with several comma-separated options, or multiple -l flags, each separated by a space. For example, the following two commands are equivalent:
qsub -l ncpus=16,mem=1024mb job.script
qsub -l ncpus=16 -l mem=1024mb job.script
Useful qsub options include:
qsub option | Description |
---|---|
-q queue_name | Specifies a user-selectable queue (queue_name) |
-r | Makes the job re-runnable |
-a date_time | Executes the job only after a specific date and time (date_time) |
-V | Exports environment variables in your current environment to the job |
-I | Makes the job run interactively (usually for testing purposes) |
For more, see the qsub manual page (enter man qsub).
Monitor jobs
To monitor the status of a queued or running job, use the qstat command.
Useful qstat options include:
qstat option | Description |
---|---|
-u user_list | Displays jobs for users listed in user_list |
-a | Displays all jobs |
-r | Displays running jobs |
-f | Displays the full listing of jobs (returns excessive detail) |
-n | Displays nodes allocated to jobs |
For example, to see all the jobs running in the LONG queue, enter:
qstat -r long | less
For more, see the qstat manual page (enter man qstat).
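The full listing from qstat -f is verbose, but a script can pull out a single attribute from it. The sketch below extracts the job state, assuming the listing contains a line of the form "job_state = R" (typical state letters include Q for queued, R for running, and C for completed). The job_state helper is hypothetical, and the sample text is a made-up, abbreviated stand-in for real qstat -f output.

```shell
#!/bin/bash
# job_state: read `qstat -f` output on stdin and print the state letter.
# Hypothetical helper; assumes attribute lines look like "name = value".
job_state() {
    awk -F' = ' '$1 ~ /^[[:space:]]*job_state$/ { print $2; exit }'
}

# Abbreviated, made-up sample of `qstat -f` output for illustration.
sample_output='Job Id: 123456.qm2
    Job_Name = JobName
    job_state = R
    queue = long'

state=$(printf '%s\n' "$sample_output" | job_state)
echo "Job state: $state"
```

On a system with TORQUE installed, the same helper could be fed real output, for example: qstat -f 123456.qm2 | job_state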
Alternatively, use the Moab showq command for monitoring jobs. To list the queued jobs in dispatch order, enter:
showq -i
For more, see the showq manual page (enter man showq).
Delete jobs
To delete queued or running jobs, use the qdel command:
- To delete a specific job (jobid), enter:
qdel jobid
- To delete all jobs, enter:
qdel all
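Because qdel acts immediately, it can help to preview which jobs a script would delete before running the real commands. The following is a minimal sketch: delete_jobs and the DRY_RUN variable are hypothetical names (neither is part of TORQUE), and the job IDs are made up. With DRY_RUN left at its default of 1, the commands are only printed.

```shell
#!/bin/bash
# delete_jobs: run (or, by default, only print) one qdel command per job ID.
# DRY_RUN and delete_jobs are illustrative names, not TORQUE features.
delete_jobs() {
    local id
    for id in "$@"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "qdel $id"      # preview only
        else
            qdel "$id"           # requires TORQUE to be installed
        fi
    done
}

# Preview deletion of two made-up job IDs.
delete_jobs 123456.qm2 123457.qm2
```

Setting DRY_RUN=0 before calling delete_jobs would run the qdel commands for real.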
Occasionally, a node becomes unresponsive and won't respond to the TORQUE server's requests to delete a job. If that occurs, add the -W (uppercase W) option:
qdel -W jobid
If that doesn't work, email the High Performance Systems group for help.
For more, see the qdel manual page (enter man qdel).
This is document avmy in the Knowledge Base.
Last modified on 2021-04-11 07:02:18.