Run jobs on Carbonate

Overview

Note:

The information in this document pertains only to Carbonate's general-purpose and large-memory nodes. For information about running jobs on Carbonate's deep learning nodes, see:

Carbonate uses the TORQUE resource manager integrated with Moab Workload Manager to coordinate resource management and job scheduling. The Data Capacitor II, DC-WAN2, Slate, and Slate-Project file systems are mounted for temporary storage of research data.

Carbonate uses a default routing queue (BATCH) that funnels jobs into two execution queues according to their resource requirements. The execution queues are configured to maximize job throughput and minimize wait times (the time a job remains queued, waiting for the resources it requires to become available). Based on the resource requirements specified in your batch job script or your qsub command, BATCH automatically places your job in the NORMAL or LARGEMEMORY queue:

  • NORMAL: Jobs requesting up to 251 GB of virtual memory
  • LARGEMEMORY: Jobs requesting more than 251 GB and up to 503 GB of virtual memory
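
The routing rule above is a threshold on the requested virtual memory. The bash function below is a minimal sketch of that rule; expected_queue is a hypothetical name, not a TORQUE or Moab command, and it assumes that a request of exactly 251 GB goes to NORMAL. TORQUE makes the actual routing decision when you submit the job.

```shell
#!/bin/bash
# Hypothetical helper (not part of TORQUE or Moab): predicts which execution
# queue the BATCH routing queue would choose for a virtual memory request,
# given in whole GB. It only mirrors the documented limits; assumes that
# exactly 251 GB routes to NORMAL.
expected_queue() {
  local vmem_gb=$1
  if (( vmem_gb <= 251 )); then
    echo "NORMAL"
  elif (( vmem_gb <= 503 )); then
    echo "LARGEMEMORY"
  else
    echo "vmem=${vmem_gb}gb exceeds the 503 GB LARGEMEMORY limit" >&2
    return 1
  fi
}

expected_queue 200   # prints NORMAL
expected_queue 400   # prints LARGEMEMORY
```
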
Note:
To best meet the needs of all research projects affiliated with Indiana University, UITS Research Technologies administers the batch job queues on IU's research supercomputers using resource management and job scheduling policies that optimize the overall efficiency and performance of workloads on those systems. If the structure or configuration of the batch queues on any of IU's supercomputing systems does not meet the needs of your research project, contact UITS Research Technologies.

Submit jobs

Use the TORQUE qsub command to submit batch or interactive jobs for execution on Carbonate's compute nodes.

  • Batch jobs: To run a batch job on Carbonate, first prepare a TORQUE script that specifies the application you want to run and the resources your job will require. For information about preparing a TORQUE job script, see the Job script section of Use TORQUE to submit and manage jobs on high-performance computing systems.

    To submit your TORQUE script (for example, job.script), use the qsub command. For example:

    qsub job.script
    

    If the command exits successfully, it will return a job ID; for example:

    123456.m1.Carbonate
    
    Note:
    Unless you are submitting to Carbonate's INTERACTIVE or DEBUG queue, do not specify a destination queue in your TORQUE script or qsub command. Carbonate's default routing queue (BATCH) automatically places your job in the NORMAL or LARGEMEMORY queue, depending on the resource requirements you specify. For information about the INTERACTIVE and DEBUG queues, see Additional queue information below.

    If your job has resource requirements that are different from the defaults (but not exceeding the maximums allowed), specify them either with directives in your TORQUE script, or with the -l (a lower-case "L"; short for resource_list) option in your qsub command. (Command-line arguments override directives in your job script.) For example:

    • To submit a TORQUE script (job.script) for a batch job that requires more than the default 60 minutes of walltime, use:
      qsub -l walltime=10:00:00 job.script
      
    • To submit a TORQUE script (job.script) for a batch job that requires more than the default 16 GB of virtual memory (for example, 200 GB), use:
      qsub -l nodes=1:ppn=4,vmem=200gb job.script
      

      If you don't provide a virtual memory resource (omit -l vmem=[n]gb), you will receive a warning, and the default (16 GB) virtual memory will be applied.

    With the qsub command, you can specify multiple attributes either with one -l switch followed by a comma-separated list of attributes, or with multiple -l switches, one per attribute. For example, to submit a TORQUE script (job.script) for a batch job that requires 32 GB of virtual memory and 16 cores on one node, enter either of the following equivalent commands:

    qsub -l nodes=1:ppn=16,vmem=32gb job.script
    qsub -l nodes=1:ppn=16 -l vmem=32gb job.script
    
  • Interactive jobs: To submit an interactive job, use qsub with the -I (to specify an interactive job) and -q interactive (to specify submission to Carbonate's INTERACTIVE queue) options; for example:
    qsub -I -q interactive -l nodes=1:ppn=4,walltime=01:00:00
    

    Submitting your job to the INTERACTIVE queue directs it to a specific set of nodes that are configured for shared access (versus single-user access in the general batch queues). Consequently, your interactive job most likely will dispatch faster in the INTERACTIVE queue than in the general execution queues.
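
The batch-job steps above start from a TORQUE job script. The following is a minimal sketch of job.script, assuming a single-node job with 4 cores and the defaults mentioned above (16 GB of virtual memory and 60 minutes of walltime), requested explicitly so that qsub does not warn about a missing vmem value. The echo line is a placeholder for your application's command. Because the #PBS lines are ordinary comments to bash, you can also run the script with bash to catch shell errors before submitting it with qsub job.script.

```shell
#!/bin/bash
#PBS -l nodes=1:ppn=4,vmem=16gb,walltime=01:00:00
#PBS -m e
# qsub reads the #PBS directives above (resources, and mail when the job ends);
# bash treats them as comments. Command-line -l options override them.

# TORQUE sets PBS_O_WORKDIR to the directory where you ran qsub; fall back to
# the current directory when the script is run outside TORQUE.
cd "${PBS_O_WORKDIR:-.}" || exit 1

# Placeholder: replace with the command that runs your application.
echo "Job ${PBS_JOBID:-(not under TORQUE)} running on ${HOSTNAME} in $(pwd)"
```
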

Other useful qsub options include:

Option               Action
-a YYYYMMDDhhmm.SS   Specify the date and time after which the job is eligible to execute (replace YYYY with the year, MM with the month, DD with the day of the month, hh with the hour, and mm with the minute; the .SS suffix for seconds is optional).
-m e                 Mail a job summary report when the job terminates.
-r [y|n]             Declare whether the job is rerunnable: y (the default) marks it rerunnable; n marks it not rerunnable.
-V                   Export all environment variables in the qsub command's environment to the batch job.

For more, see the qsub manual page.
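
The options in the table above can be combined in a single qsub command. The sketch below assumes GNU date is available (as on typical Linux systems). It computes a start time two hours from now in the YYYYMMDDhhmm form that -a expects, then prints the resulting command instead of running it.

```shell
#!/bin/bash
# Sketch: start time two hours from now in YYYYMMDDhhmm form (assumes GNU date).
start_time=$(date -d '+2 hours' +%Y%m%d%H%M)

# Print the qsub command; remove "echo" to actually submit job.script.
# -a: not eligible to run before start_time; -m e: mail a summary when it ends;
# -r n: not rerunnable; -V: export your current environment variables.
echo qsub -a "$start_time" -m e -r n -V job.script
```
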

Monitor jobs

To monitor the status of a queued or running job, use the TORQUE qstat command:

  • To get the status of a particular job, combine qstat with the job ID that qsub returned when you submitted the job (for example, 1071767):
    qstat 1071767
    
  • To get the status of all jobs you have submitted, combine qstat -u with your IU username (for example, username):
    qstat -u username
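
qsub prints the full job ID (for example, 123456.m1.Carbonate), while the qstat examples above use only the numeric part. A minimal sketch of saving the ID at submission time and reusing it follows. To keep it runnable anywhere, it uses a sample ID in place of a real qsub call, and it assumes your shell's $USER variable holds your IU username, as it normally does after login.

```shell
#!/bin/bash
# Sketch: capture the job ID when you submit, then reuse it with qstat.
# On Carbonate the real call would be:  jobid=$(qsub job.script)
jobid="123456.m1.Carbonate"   # sample ID in the format qsub returns

# Drop everything from the first "." to keep only the numeric job ID.
jobnum=${jobid%%.*}
echo "$jobnum"                # prints 123456

# Then, for example:
#   qstat "$jobnum"           # status of this job
#   qstat -u "$USER"          # all of your jobs
```
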
    

Useful qstat options include:

Option   Action
-a       Display all jobs.
-f       Write a full status display to standard output.
-n       List the nodes allocated to a job.
-r       Display jobs that are running.

For more, see the qstat manual page.

Delete jobs

To delete a queued or running job, use the TORQUE qdel command with the job ID that qsub returned when you submitted the job (for example, 1071767):

qdel 1071767
Note:
Occasionally, a node becomes unresponsive and cannot act on the TORQUE server's request to kill a job. In such cases, try qdel -W <delay> to override the delay between the SIGTERM and SIGKILL signals (replace <delay> with a value in seconds).

For more, see the qdel manual page.

Additional queue information

  • DEBUG: To submit a batch job to the DEBUG queue, either add the #PBS -q debug directive to your job script, or enter qsub -q debug on the command line.

    The DEBUG queue is intended for short, quick-turnaround test jobs requiring less than 1 hour of wall time. For longer debugging or testing sessions, submit an interactive job to the INTERACTIVE queue instead.

    Maximum wall time: 1 hour
    Maximum nodes per job: 2
    Maximum cores per job: 48
    Maximum number of jobs per user: 2
    Direct submission: Yes
  • INTERACTIVE: To submit an interactive job to the INTERACTIVE queue, enter qsub on the command line with the -I and -q interactive options; for example:
      qsub -I -q interactive -l nodes=1:ppn=1,walltime=4:00:00
    

    Interactive jobs submitted to the INTERACTIVE queue should start sooner than interactive jobs submitted to the batch execution queues. If you enter qsub without the -q interactive option, your interactive job goes to the routing queue (BATCH), which places it in the NORMAL or LARGEMEMORY execution queue, where it will most likely wait longer to start.

    Maximum wall time: 8 hours
    Maximum cores per job: 8
    Maximum number of jobs per queue: 128
    Maximum number of jobs per user: 2
    Direct submission: Yes
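
A DEBUG-queue job uses the same kind of script as a regular batch job, plus the #PBS -q debug directive. The sketch below requests 2 nodes with 24 cores each (48 cores) and 30 minutes of walltime. It assumes 24 cores per node, which is what the 2-node, 48-core limit implies; the vmem value is a placeholder. It also relies on TORQUE's PBS_NODEFILE variable, which lists one line per assigned core, to report how many cores the job received.

```shell
#!/bin/bash
#PBS -q debug
#PBS -l nodes=2:ppn=24,vmem=16gb,walltime=00:30:00
# Sketch of a DEBUG-queue test job: 2 nodes x 24 cores = 48 cores and
# 30 minutes of walltime, within the DEBUG limits. vmem is a placeholder.

cd "${PBS_O_WORKDIR:-.}" || exit 1

# TORQUE writes one line per assigned core to $PBS_NODEFILE; outside TORQUE
# the variable is unset, so report 0.
if [ -f "${PBS_NODEFILE:-}" ]; then
  ncores=$(wc -l < "$PBS_NODEFILE")
else
  ncores=0
fi
echo "Debug run using ${ncores} cores"
```
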

Get help

Support for IU research computing systems, software, and services is provided by various teams within the Research Technologies division of UITS.

For general questions about research computing at IU, contact UITS Research Technologies.

For more options, see Research computing support at IU.

This is document avjo in the Knowledge Base.
Last modified on 2019-08-15 15:19:59.

Contact us

For help or to comment, email the UITS Support Center.