ARCHIVED: Using LoadLeveler on Libra at IU

This content has been archived, and is no longer maintained by Indiana University. Information here may no longer be accurate, and links may no longer be available or reliable.

Note: UITS will retire Libra in spring 2009. Accounts are available on Quarry, a general-purpose Unix computing environment. For more, see ARCHIVED: About the Libra retirement.

Submitting batch jobs to LoadLeveler

To optimize job throughput and minimize paging on the Libra cluster at Indiana University, you must submit all CPU-intensive jobs (i.e., jobs requiring more than 20 minutes of CPU time) as batch jobs to a batch scheduler. Libra uses IBM's LoadLeveler, which dispatches jobs to the least busy host by means of a load-balancing algorithm. In addition, LoadLeveler provides job prioritizing, hold/release, and an easy-to-use graphical user interface for creating, submitting, and manipulating jobs. LoadLeveler's functionality is extended via its interface to an external scheduler, the Maui Scheduler, which offers superior backfill scheduling, quality of service control, advanced reservations, and more.

Your home directory resides on the UITS EMC storage system and is accessible from all nodes. You may submit jobs to LoadLeveler from the interactive nodes, libra01 or libra02, and may read and write your input and output files in your home directory, in the local /tmp and /scr directories, or in the shared scratch file systems, /scratch1 and /scratch2. When your job is started, your default login shell is invoked, along with your .login and .cshrc files (for C shell users) or your .profile and .kshrc files (for Korn and Bourne shell users). Be sure that these files do not contain any commands that would not work in a batch environment (e.g., the stty command, which expects stdout to be a terminal).

LoadLeveler batch job classes

Before you can submit your job to LoadLeveler, you must first determine its requirements in terms of CPU time, memory, and software, and then submit it to the appropriate class. The following classes are available on Libra:

  • serial: 1 CPU, less than 2GB memory, 14-day wall clock time
  • nonshared: 1-2 CPUs, less than 4GB memory, 14-day wall clock time
  • smp: 1-16 CPUs, more than 4GB memory, 14-day wall clock time

For Fortran and C jobs and other serial applications, such as SAS, SPSS, Matlab, and Mathematica, use the class "serial". If your job requires more than 2GB but less than 4GB of memory, use the class "nonshared". If your job requires multiple CPUs and/or more than 4GB of memory, use class "smp". You must also specify a feature code if the software you require is not available on all nodes that run the desired job class.
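
For example, a serial job that must run where a particular package is installed would combine the class with a feature requirement. This is only a sketch; the "matlab" feature code, path, and program name below are illustrative, and the feature codes actually defined on Libra are listed by llconfig:

  #@ class = serial
  #@ requirements = (Feature == "matlab")
  #@ initialdir = /N/u/joedoe/libra
  #@ executable = myprogram
  #@ output = myprogram.output
  #@ error = myprogram.error
  #@ queue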

For information about the LoadLeveler node configuration, see ARCHIVED: LoadLeveler configuration on Libra.

Sample LoadLeveler scripts

To communicate with LoadLeveler, you can use the X Window graphical user interface, xloadl, or LoadLeveler commands (e.g., llsubmit, llq, and llcancel). The xloadl interface allows you to create, submit, query, and manipulate your batch jobs. If you do not have an X-capable display, or if you prefer LoadLeveler command mode, submit jobs by first creating a shell script that contains all the information necessary to set up and run the desired tasks. Within the LoadLeveler job script, specify job options using keywords on lines beginning with the special characters #@. Job options include the job class, working directory, executable name, software requirements, redirection of stdin, stdout, and stderr, and notification type.

Serial job examples

  #@ class = serial
  #@ initialdir = /N/u/joedoe/libra
  #@ input = myprogram.input
  #@ output = myprogram.output
  #@ error = myprogram.error
  #@ arguments = arg1 arg2 arg3
  #@ executable = myprogram
  #@ queue

Important: You must assign values to "output" and "error" if your program writes to stdout and/or stderr. If not specified, these default to /dev/null.

To ensure that the output from each job goes to a unique file, use the Executable, Cluster, and Process values that LoadLeveler assigns. The Cluster value is a unique jobid; a Process value is assigned to each step queued within a script. Following is an example script that executes the myprogram program twice, using different arguments and creating two unique output files:

  #@ class = serial
  #@ initialdir = /N/u/joedoe/libra/bin
  #@ executable = myprogram
  #@ output = $(Executable).$(Cluster).$(Process)
  #@ arguments = arg1 arg2 arg3
  #@ queue
  #@ arguments = arg4 arg5 arg6
  #@ queue

If you do not specify an executable name, LoadLeveler assumes that anything following the #@ queue statement consists of commands to be executed. Use this format if you want to run a shell script rather than a compiled program, or if several commands need to be run in sequence. Following is an example:

  #@ class = serial
  #@ initialdir = /N/u/joedoe/libra
  #@ output = myprog.$(Cluster).out
  #@ error = myprog.$(Cluster).err
  #@ queue
  xlf -o myprog myprog.f
  ./myprog
  rm /scr/joedoe/myprog.workfile

The following script submits a SAS job:

  #@ class = serial
  #@ initialdir = /N/u/joedoe/libra/sasjobs
  #@ error = mysasjob.err
  #@ output = mysasjob.out
  #@ queue
  sas mysasjob.sas

The following script submits a Mathematica job:

  #@ class = serial
  #@ initialdir = /N/u/joedoe/libra/math
  #@ input = math.input
  #@ output = math.output
  #@ error = math.err
  #@ queue
  math 

The following script submits an Splus job. Splus is a statistics package with a node-locked license on libra05; target your job to the correct node by requesting the defined feature code, in this case "splus":

  #@ class = serial
  #@ requirements = (Feature == "splus")
  #@ initialdir = /N/u/joedoe/libra/Splus
  #@ input = Splus.commandfile
  #@ output = Splus.output 
  #@ error = Splus.error
  #@ queue
  Splus 

Note: If you specify a requirement that cannot be satisfied, you will not receive an error message; LoadLeveler will simply hold your job in the queue until the resource becomes available. Be sure to specify the resource correctly.

To see the list of defined feature codes, enter llconfig.
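
If a job stays in the Idle state and you suspect a requirements mismatch, the llq command can help you diagnose it. As a sketch (the username and job step ID below are illustrative):

  llq -u joedoe           # list your jobs and their current states
  llq -s libra01.123.0    # report why a specific job step has not started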

Nonshared job example

If you use the Gaussian03 software package, which can use more than one CPU, or other programs that require one or two CPUs and more than 2GB of memory, use the "nonshared" queue to submit your jobs to nodes that are not shared by other user programs:

  #@ class = nonshared
  #@ initialdir = /N/u/joedoe/libra/gaussian03
  #@ output = run2.output
  #@ error = run2.error
  #@ queue
  setenv g03root /software/sciapps/g03
  source $g03root/g03/bsd/g03.login
  setenv GAUSS_SCRDIR /scr/$USER
  /bin/mkdir -p $GAUSS_SCRDIR
  g03 run2.com run2.log
  /bin/rm -R $GAUSS_SCRDIR

Parallel job example

The following script submits a 16-CPU OpenMP parallel job. Note that although you specify the number of CPUs to reserve using the tasks_per_node keyword, your OpenMP program will spawn only the number of threads specified by the value of the OMP_NUM_THREADS environment variable, so you must set the value in the batch script, as follows:

  #@ class = smp
  #@ job_type = parallel
  #@ node = 1
  #@ tasks_per_node = 16
  #@ initialdir = /N/u/joedoe/libra/myprogs
  #@ executable = myopenmpcode
  #@ input = inputfile1
  #@ output = $(Executable).$(Cluster).output
  #@ error = $(Executable).$(Cluster).error 
  #@ environment = COPY_ALL; OMP_NUM_THREADS=16
  #@ queue

Libra does not have a high-speed low-latency interconnect, and does not support MPI (distributed memory parallel) jobs.

LoadLeveler keywords

Each keyword is listed below with a description and its syntax.

arguments
  Specifies the list of arguments to pass to your program when your job runs.
  Syntax: arguments = arg1 arg2 arg3 ...

checkpoint
  Specifies whether to checkpoint your program. If yes, your program must first be linked with the LoadLeveler C or FORTRAN libraries via the llcc or llxlf commands. Checkpoints occur every two hours and allow jobs to survive machine failures. Do not checkpoint jobs that fork; that use signals, dynamic loading, shared memory, semaphores, messages, or internal timers; that set the user/group ID; or that are not idempotent (i.e., whose I/O operations, when repeated, do not yield the same result).
  Syntax: checkpoint = yes | no (default is no)

class
  Specifies the name of a job class. A job class is the same as a job queue; each class has defined CPU time and elapsed time limits (see the list above).
  Syntax: class = serial | nonshared | smp (default is serial)

core_limit
  Specifies the maximum size of a core file. When a job exceeds the softlimit, it receives a signal; when it reaches the hardlimit, it is terminated. Express limits in units of b (bytes), w (words), kb (kilobytes), mb (megabytes), mw (megawords), or gb (gigabytes).
  Syntax: core_limit = hardlimit,softlimit (for example, core_limit = 1mb,0.8mb; defaults to the AIX user core limit of 1MB)

cpu_limit
  Specifies the maximum amount of CPU time a submitted job can use, expressed as hours:minutes:seconds.
  Syntax: cpu_limit = hardlimit,softlimit (for example, cpu_limit = 12:00:00,11:50:00; defaults to the CPU limit for the job class)

data_limit
  Specifies the maximum size of the data segment used by the submitted job.
  Syntax: data_limit = hardlimit,softlimit (defaults to unlimited)

dependency
  Specifies dependencies between job steps. In the expression, step_name must be a previously defined job step, and the operator is ==, !=, <=, >=, <, >, &&, or ||.
  Syntax: dependency = step_name operator value (for example, dependency = (step1 == 0) && (step2 > 0))

environment
  Specifies your initial environment variables when your job starts. Separate variables with semicolons. Specify COPY_ALL to copy all the environment variables from your shell, $var to copy an individual variable, !var to prevent the copying of a variable, and var=value to set the value of a variable and then copy it.
  Syntax: environment = env1 ; env2 ; ... (for example, environment = COPY_ALL ; !DISPLAY ;)

error
  Specifies the name of the file to use as standard error (stderr) when your job runs.
  Syntax: error = filename (defaults to /dev/null)

executable
  Identifies the name of the program to run. If not specified, LoadLeveler uses the job script file as the executable.
  Syntax: executable = filename

file_limit
  Specifies the maximum size of files created by the job.
  Syntax: file_limit = hardlimit,softlimit (defaults to the AIX user file limit of 2GB)

hold
  Specifies whether to place a hold on your job when you submit it. Three types of hold are available: user, system, and usersys. Only a system administrator can release a job from system or usersys hold; users release jobs from user hold with the llhold -r command.
  Syntax: hold = user | system | usersys

initialdir
  Specifies the path name of the directory to use as the initial working directory during execution of the job. If not specified, the initial directory is the current working directory at the time you submitted the job.
  Syntax: initialdir = pathname

input
  Specifies the name of the file to use as standard input (stdin) when your job runs.
  Syntax: input = filename (defaults to /dev/null)

job_cpu_limit
  Specifies the maximum CPU time used by all processes of a job step, expressed as hours:minutes:seconds.fraction.
  Syntax: job_cpu_limit = 12:00:00.0 (defaults to the class time limit)

job_name
  Specifies the name of the job; used in long reports from llq and llstatus, and in mail related to the job.
  Syntax: job_name = my_awesome_job

job_type
  Specifies whether the job is a single-CPU job or can run on multiple processors.
  Syntax: job_type = serial | parallel (default is serial)

node
  Specifies the number of nodes required for a parallel job.
  Syntax: node = min,max or node = n

node_usage
  Specifies whether the job can share the node with other jobs.
  Syntax: node_usage = shared | not_shared (default is shared)

notification
  Specifies when mail is sent to the user named in notify_user.
  Syntax: notification = always | error | start | never | complete (default is complete)

notify_user
  Specifies the user to whom notification mail is sent.
  Syntax: notify_user = username (defaults to the job owner)

output
  Specifies the name of the file to use as standard output (stdout) when your job runs.
  Syntax: output = filename (defaults to /dev/null)

preferences
  Lists characteristics you prefer the target machine to have. If no machine meeting the preferences is available, LoadLeveler assigns the job to a machine that meets the requirements; see the requirements keyword below.
  Syntax: preferences = Boolean expression

queue
  Places one copy of the job in the queue. If you wish, you can specify input, output, error, and arguments statements between queue statements.
  Syntax: queue

requirements
  Lists requirements the remote machine must meet to execute the job script. Supported requirements are:
  • Memory (the amount of physical memory required, in megabytes)
  • Feature (required software or some other locally defined feature)
  • Machine (hostname of the target machine)
  • Disk (kilobytes of disk space available in LoadLeveler's working directory on the target machine)
  • Arch (target machine's architecture; defaults to that of the submitting machine; all AIX nodes are defined as "R6000")
  • OpSys (target machine's operating system; defaults to that of the submitting machine; the Libra nodes are defined with "AIX53")
  Syntax: requirements = Boolean expression
  Examples:
  • requirements = (Feature == "sas")
  • requirements = ((Memory >= 2048) && (Feature == "math"))
  • requirements = (Machine == "libra09")

restart
  Specifies whether to restart the job if LoadLeveler abends or the system crashes.
  Syntax: restart = yes | no (default is yes)

rss_limit
  Specifies the maximum resident set size.
  Syntax: rss_limit = hardlimit,softlimit (defaults to unlimited)

shell
  Specifies the name of the shell to use for the job. If not specified, the shell named in the owner's password file entry is used.
  Syntax: shell = name

stack_limit
  Specifies the maximum size of the stack.
  Syntax: stack_limit = hardlimit,softlimit (defaults to the AIX user stack limit of 2GB)

startdate
  Specifies when you want the job to run, expressed as MM/DD/YY HH:MM(:SS).
  Syntax: startdate = date time (defaults to the current date and time)

step_name
  Specifies the name of a job step; used for dependencies between job steps. Do not use T or F, or start the name with a number. Do not use with tasks_per_node.
  Syntax: step_name = step_1 (defaults to 0, 1, 2, ...)

tasks_per_node
  Specifies the number of tasks to run on each node assigned to a parallel job.
  Syntax: tasks_per_node = nn (default is 1)

user_priority
  Sets the initial user priority of your job, which orders the job relative to other jobs you've submitted. Priority can be 0 to 100.
  Syntax: user_priority = number (default is 50)

wall_clock_limit
  Sets limits on the elapsed (wall-clock) time a job can run.
  Syntax: wall_clock_limit = hardlimit,softlimit
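
Several of these keywords are most useful in combination. As a sketch (the path, step names, executables, and limits below are illustrative, not site defaults), the following script builds a program in one step and runs it in a second step only if the build succeeds:

  #@ class = serial
  #@ initialdir = /N/u/joedoe/libra
  #@ step_name = build
  #@ executable = /usr/bin/make
  #@ output = build.$(Cluster).out
  #@ error = build.$(Cluster).err
  #@ queue
  #@ step_name = run
  #@ dependency = (build == 0)
  #@ executable = myprog
  #@ wall_clock_limit = 02:00:00,01:55:00
  #@ notification = complete
  #@ output = run.$(Cluster).out
  #@ error = run.$(Cluster).err
  #@ queue

The dependency expression refers to the exit code of the build step, so the run step is dispatched only after build completes with status 0.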

Submitting and monitoring the job

Once you've created your job script, enter chmod u+x scriptname to make it executable. To submit the job script to LoadLeveler, enter:

  llsubmit scriptname

If the submission is successful, LoadLeveler returns a jobid. For some useful LoadLeveler commands for tracking your job, see ARCHIVED: Monitoring LoadLeveler jobs on Big Red at IU.
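
A typical command-mode session might look like the following sketch (the job step ID shown is illustrative):

  llsubmit myjob.cmd         # submit the script; LoadLeveler prints the jobid
  llq -u joedoe              # list your queued and running jobs
  llq -l libra01.123.0       # show the long listing for one job step
  llhold libra01.123.0       # place the job in user hold
  llhold -r libra01.123.0    # release it from user hold
  llcancel libra01.123.0     # remove the job from the queue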

LoadLeveler is configured on Libra so that you cannot run more than four jobs simultaneously. If you submit more than eight jobs to the queue at one time, LoadLeveler will not consider the additional jobs eligible for queuing until previous jobs have completed. The default LoadLeveler FIFO scheduler has been replaced by the Maui Scheduler, which uses a fairshare algorithm to determine the dispatch order of submitted jobs; Maui considers both a job's submit time and the user's recent use of batch cycles when prioritizing jobs. To display fairshare usage statistics, enter showfairshare.

This is document avim in the Knowledge Base.
Last modified on 2018-01-18 15:33:10.