ARCHIVED: On Big Red at IU, how do I use the paralleljob script to submit jobs?

This content has been archived, and is no longer maintained by Indiana University. Information here may no longer be accurate, and links may no longer be available or reliable.

Note: Big Red, originally commissioned in 2006, was retired from service on September 30, 2013. Its replacement, Big Red II, is a hybrid CPU/GPU Cray XE6/XK7 system capable of achieving a theoretical peak performance (Rpeak) of 1 petaFLOPS, making it one of the two fastest university-owned supercomputers in the US. For more, see ARCHIVED: Get started on Big Red II. If you have questions or concerns about the retirement of Big Red, contact the High Performance Systems group.

On Big Red at Indiana University, the paralleljob script simplifies the process of submitting parallel (multiple-processor) programs to the LoadLeveler resource manager. For complete documentation, see the paralleljob manual page (from the Big Red command line, enter man paralleljob).

On this page:

  • Requirements
  • Command syntax
  • Examples

Requirements

For the paralleljob script to work:

  • An appropriate version of mpirun must be in your path or specified with the MPIRUN environment variable. Each flavor of MPI has its own version of mpirun. If you compiled your parallel application using mpicc or another appropriate compiler, mpirun should be in your path already; a quick check is sketched after this list.
  • Your program must be a "single-program, multiple-data" parallel application (i.e., it must contain only a single binary). The paralleljob script does not work for "multiple-program, multiple-data" parallel applications (i.e., programs that consist of more than one binary, such as some master/worker programs in which the master and workers are different executable files).
  • Quote arguments only with double quotes; the paralleljob command treats single quotes as double quotes.
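
As a quick sanity check before submitting, the following commands illustrate these requirements (a minimal sketch, assuming a bash-style shell; the mpirun path and the speedster.c source file are placeholders, not actual Big Red paths):

  # Confirm that an mpirun is already on your path
  which mpirun

  # Or point paralleljob at a particular mpirun via the MPIRUN environment variable
  export MPIRUN=/path/to/mpi/bin/mpirun

  # Build a single-binary (SPMD) MPI application with the MPI compiler wrapper
  mpicc -o speedster speedster.c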

Command syntax

The paralleljob command lets you specify the number of processes your job should use, the amount of time it should be allowed to run, and the queue to which it should be submitted:

  paralleljob program_name [program_options] [-CPUS n] [-wallhours h] [-queue queue_name]

Replace the sample text as follows:

  For:              Specify:
  program_name      The name of the program to submit
  program_options   Any command-line options to pass to the program
  n                 The number of processes to be used; the number is expected to
                    be a multiple of 4 and will be adjusted upward if necessary
  h                 The maximum number (an integer) of hours the job will be
                    allowed to run
  queue_name        The Big Red queue to which the job will be submitted

By default, paralleljob launches four processes for up to two hours in Big Red's LONG queue. In the default (LONG) queue, you can request up to 128 processes (32 nodes) for up to 336 hours (14 days). The NORMAL queue allows up to 1,024 processes for up to 48 hours. The DEBUG queue (for debugging) allows up to 16 processes for up to 15 minutes. For more about Big Red's batch queues, see ARCHIVED: Big Red usage policies.

Examples

For the following examples, suppose your program (speedster) takes options that specify speed (-speed) and the file to be processed (-infile):

  • To run speedster using four processes for up to two hours in the LONG queue, at the Big Red command prompt, enter:
      paralleljob speedster -speed=super -infile=mydata.dat
  • To launch 16 processes and run speedster for up to 48 hours, enter:
      paralleljob speedster -speed=super -infile=mydata.dat -CPUS 16 -wallhours 48
  • To launch 512 processes and run speedster for up to 10 hours in Big Red's NORMAL queue, enter:
      paralleljob speedster -speed=super -infile=mydata.dat -CPUS 512 -wallhours 10 -queue NORMAL

Note: If the program that you wish to run is not in your default path, make sure to use its fully qualified path name. When your job runs, the current working directory of your program is the directory from which you ran the paralleljob command.
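
For instance, assuming speedster is installed in a hypothetical directory under your home space (the path below is a placeholder), a submission using its fully qualified path might look like:

  paralleljob /N/u/username/bin/speedster -speed=super -infile=mydata.dat -CPUS 16 -wallhours 48

Because the job's working directory is the directory from which paralleljob was invoked, a relative argument such as -infile=mydata.dat is resolved there; pass a full path for the data file as well if you submit from elsewhere.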

If you need help or have questions about using paralleljob on Big Red, email the UITS Scientific Applications and Performance Tuning (SciAPT) team.

This is document awdl in the Knowledge Base.
Last modified on 2018-01-18 16:09:36.