What is Moab?
Introduction
Moab is an advanced job scheduler for use on clusters and
supercomputers. It is a highly optimized and configurable tool capable
of supporting a large array of scheduling and fairness policies,
dynamic priorities, and extensive reservations. Acknowledged by many
as one of the most advanced schedulers available, Moab is currently in
use at hundreds of leading government, academic, and commercial sites
throughout the world. Moab improves the manageability and efficiency
of machines ranging from clusters of a few processors to
multi-teraflop supercomputers.
Moab at IU
On the Quarry system at Indiana University, Moab serves as
the job scheduler for the TORQUE resource manager (also
called PBS). TORQUE is based on OpenPBS; if you are familiar with PBS
Pro, you'll find much of the syntax the same. On the Big
Red system at IU, Moab serves as the job scheduler for the
LoadLeveler resource manager.
Once a job has been submitted to one of the TORQUE or LoadLeveler
queues, it may become eligible for dispatch by Moab. The following
commands provide useful information on the status of a queued or
running job:
showq | Display active, idle, or all jobs
showstart jobid | Display estimated start time for jobid
checkjob jobid | Display attributes for jobid
For more about these commands, as well as other Moab utilities, see the
Moab Workload Manager User's Manual.
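If you want to call these commands from a script, the following is a minimal Python sketch. It assumes the Moab client commands are on your PATH; the helper names moab_argv and run_moab are hypothetical and are not part of Moab. Only the three commands listed above are allowed, with no extra flags.

```python
import shutil
import subprocess

# Commands from the table above; the value says whether a job ID is required.
NEEDS_JOBID = {"showq": False, "showstart": True, "checkjob": True}


def moab_argv(command, jobid=None):
    """Build the argument list for one of the three status commands."""
    if command not in NEEDS_JOBID:
        raise ValueError(f"unsupported command: {command}")
    if NEEDS_JOBID[command]:
        if jobid is None:
            raise ValueError(f"{command} requires a job ID")
        return [command, str(jobid)]
    return [command]


def run_moab(command, jobid=None):
    """Run the command and return its text output (hypothetical helper)."""
    argv = moab_argv(command, jobid)
    if shutil.which(argv[0]) is None:
        raise FileNotFoundError(f"{argv[0]} was not found on PATH")
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout


# Example (only works on a system where Moab is installed):
#     print(run_moab("showstart", 16672))
```

Building the argument list separately from running it keeps the command construction easy to check on a machine without Moab.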
Fairshare scheduling
Fairshare scheduling allows historical resource usage to affect job
priority decisions. Administrators can set target utilization goals
for each user, group, class, or service group. When these utilization
goals are exceeded by one usage class, jobs from other usage classes
will take precedence over jobs from the offending class.
Currently, the fairshare policy on Quarry and Big Red tracks usage
over the last seven one-day intervals with a decay rate of 0.80: the
weight given to an interval drops by a factor of 0.80 for each day of
age, so recent usage counts most. Each usage class (usually a
username) has a target of 5% usage. Jobs from a user whose weighted
usage exceeds that target are given a lower scheduling priority.
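The decay arithmetic can be sketched in Python. This is an illustration inferred from the sample diagnose -f output on this page, not Moab's own code: it assumes each interval's weight is the decay rate raised to the interval's age, and that a user's overall percentage is the weighted share of total usage. The function names are hypothetical; the figures are copied from the TotalUsage and baikgrp rows of the sample.

```python
def fairshare_weights(depth=7, decay=0.80):
    """Weight per interval, newest first: 1, decay, decay**2, ..."""
    return [decay ** age for age in range(depth)]


def effective_usage(user_pct, total_usage, decay=0.80):
    """Decay-weighted share of usage, in percent.

    user_pct[i] is the user's percentage of usage in interval i (0 = newest);
    total_usage[i] is the system-wide usage recorded for that interval.
    """
    weights = fairshare_weights(len(total_usage), decay)
    used = sum(w * t * p / 100.0
               for w, t, p in zip(weights, total_usage, user_pct))
    available = sum(w * t for w, t in zip(weights, total_usage))
    return 100.0 * used / available


# Copied from the sample output (intervals 0 through 6).
total = [1872.2, 1605.8, 631.7, 1868.0, 3222.6, 1857.5, 1439.1]
baikgrp = [81.11, 45.57, 79.98, 49.70, 4.88, 20.49, 10.79]

print([round(w, 4) for w in fairshare_weights()])
print(round(effective_usage(baikgrp, total), 2))
```

Under these assumptions the weights agree with the FSWeight row of the sample (1.0000, 0.8000, 0.6400, ...), and the weighted share for baikgrp comes out at about 45.91%, in line with the value reported in the table.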
Use the diagnose -f command to display the fairshare usage table. The
following example shows that users baikgrp and dsheen have far
exceeded their "fair share" (45.91% and 39.26% against a 5% target)
and will be given lower priority over the next week:
[root@Quarry]# diagnose -f
FairShare Information
Depth: 7 intervals Interval Length: 1:00:00:00 Decay Rate: 0.80
FS Policy: DEDICATEDPS
System FS Settings: Target Usage: 0.00 Flags: 0
FSInterval % Target 0 1 2 3 4 5 6
FSWeight ------- ------- 1.0000 0.8000 0.6400 0.5120 0.4096 0.3277 0.2621
TotalUsage 100.00 ------- 1872.2 1605.8 631.7 1868.0 3222.6 1857.5 1439.1
USER
-------------
haiyang* 0.00 5.00 ------- ------- ------- ------- ------- ------- -------
baikgrp* 45.91 5.00 81.11 45.57 79.98 49.70 4.88 20.49 10.79
balin* 0.00 5.00 ------- ------- ------- ------- ------- ------- -------
akewalra* 0.00 5.00 ------- ------- ------- ------- ------- ------- -------
kevidale* 0.25 5.00 ------- ------- ------- 0.23 0.74 0.78 -------
dlauer* 0.00 5.00 ------- ------- ------- ------- ------- ------- -------
kmane* 0.00 5.00 ------- ------- ------- ------- ------- ------- -------
bramley* 0.00 5.00 ------- ------- ------- ------- ------- ------- -------
qzou* 0.18 5.00 ------- ------- ------- 0.05 0.34 0.53 1.01
mathess* 0.00 5.00 ------- ------- ------- ------- ------- ------- -------
iyengar* 0.54 5.00 ------- ------- ------- ------- 0.63 2.58 3.34
pewang* 0.00 5.00 ------- ------- ------- ------- ------- ------- -------
rrepasky* 0.00 5.00 ------- ------- ------- ------- ------- ------- -------
agopu* 0.00 5.00 ------- ------- ------- ------- ------- ------- -------
heap* 0.02 5.00 0.09 ------- ------- ------- ------- ------- -------
vsingan* 0.00 5.00 ------- ------- ------- ------- ------- ------- -------
huili* 0.00 5.00 ------- ------- ------- ------- ------- ------- -------
dsheen* 39.26 5.00 14.97 43.03 ------- 33.74 86.48 62.68 -------
turnerg* 0.00 5.00 ------- ------- ------- ------- ------- ------- -------
ejolson* 0.00 5.00 ------- ------- ------- ------- ------- ------- -------
ssrivast* 0.00 5.00 ------- ------- ------- ------- ------- ------- -------
smiddha* 0.00 5.00 ------- ------- ------- ------- ------- ------- -------
mburland* 4.90 5.00 0.17 5.37 4.83 11.14 3.11 5.19 16.89
febertra* 0.00 5.00 ------- ------- ------- ------- ------- ------- -------
lsandvos* 0.01 5.00 ------- 0.06 ------- ------- ------- ------- -------
mswat* 0.00 5.00 ------- ------- ------- ------- ------- ------- -------
acolubri* 5.72 5.00 3.67 5.97 15.20 5.14 3.75 7.75 10.01
mbaik* 3.22 5.00 ------- ------- ------- ------- 0.07 ------- 57.96
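To pick out over-target users from a long table, a short Python sketch like the following may help. It assumes the row layout shown above (name, usage percentage, target, then per-interval values); the function name over_target is hypothetical, and the sample rows are a subset copied from the output above.

```python
def over_target(rows):
    """Return (user, usage, target) for rows whose usage exceeds the target."""
    flagged = []
    for line in rows:
        fields = line.split()
        if len(fields) < 3:
            continue  # blank lines and separators
        try:
            usage, target = float(fields[1]), float(fields[2])
        except ValueError:
            continue  # header rows such as FSInterval or TotalUsage
        if usage > target:
            flagged.append((fields[0].rstrip("*"), usage, target))
    return flagged


sample = """\
TotalUsage 100.00 ------- 1872.2 1605.8 631.7 1868.0 3222.6 1857.5 1439.1
baikgrp* 45.91 5.00 81.11 45.57 79.98 49.70 4.88 20.49 10.79
kevidale* 0.25 5.00 ------- ------- ------- 0.23 0.74 0.78 -------
dsheen* 39.26 5.00 14.97 43.03 ------- 33.74 86.48 62.68 -------
mburland* 4.90 5.00 0.17 5.37 4.83 11.14 3.11 5.19 16.89
acolubri* 5.72 5.00 3.67 5.97 15.20 5.14 3.75 7.75 10.01
""".splitlines()

for name, usage, target in over_target(sample):
    print(f"{name}: {usage:.2f}% used, target {target:.2f}%")
```

On these rows the sketch flags baikgrp and dsheen, and also acolubri, whose 5.72% is slightly above the 5% target.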
When to expect your job to start
Moab uses the fairshare tables to determine which job will be assigned
to the next open processors. The showq command shows the
state of submitted jobs. Following is sample output:
[root@Quarry]# showq
active jobs--------------------
JOBID USERNAME STATE PROCS REMAINING STARTTIME
17199 heap Running 1 2:53:12 Wed Sep 17 11:20:45
17200 heap Running 1 2:53:52 Wed Sep 17 11:21:25
17201 heap Running 1 2:54:32 Wed Sep 17 11:22:05
17202 heap Running 1 2:55:13 Wed Sep 17 11:22:46
17203 heap Running 1 2:55:53 Wed Sep 17 11:23:26
17204 heap Running 1 2:56:33 Wed Sep 17 11:24:06
17205 heap Running 1 2:57:13 Wed Sep 17 11:24:46
.
.
.
6 active jobs
eligible jobs----------------------
JOBID USERNAME STATE PROCS WCLIMIT QUEUETIME
16672 ejolson Idle 1 8:08:00:00 Tue Sep 16 23:27:05
16673 ejolson Idle 1 8:08:00:00 Tue Sep 16 23:27:06
16674 ejolson Idle 1 16:16:00:00 Tue Sep 16 23:27:06
16675 ejolson Idle 1 16:16:00:00 Tue Sep 16 23:27:06
16676 ejolson Idle 1 8:08:00:00 Tue Sep 16 23:27:06
16677 ejolson Idle 1 8:08:00:00 Tue Sep 16 23:27:06
6 eligible jobs
blocked jobs----------------
JOBID USERNAME STATE PROCS WCLIMIT QUEUETIME
0 blocked jobs
Total Jobs: 116 Active Jobs: 104 Eligible Jobs: 6 Blocked Jobs: 0
The jobs at the top of the "eligible jobs" list will run next if
resources are available. Various reservations can prevent jobs from
running if they have blocked off resources that waiting jobs would
need. You can use the command showres to examine the list
of reservations.
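If you capture showq output in a script, the sections can be separated with a few lines of Python. This sketch assumes the banner and column layout shown in the sample above; the function name split_showq is hypothetical, and the sample text is abridged from that output. It does not handle the showq -i layout, where job IDs may carry an asterisk.

```python
import re

# Section banners as they appear in the sample, e.g. "eligible jobs-----".
SECTION = re.compile(r"^(active|eligible|blocked) jobs-+$")


def split_showq(text):
    """Group showq job rows by section, preserving the listed order."""
    sections = {"active": [], "eligible": [], "blocked": []}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        banner = SECTION.match(line)
        if banner:
            current = banner.group(1)
            continue
        fields = line.split()
        # A job row has a numeric job ID and at least six columns; summary
        # lines such as "6 eligible jobs" are shorter and are skipped.
        if current and len(fields) >= 6 and fields[0].isdigit():
            sections[current].append({
                "jobid": fields[0],
                "user": fields[1],
                "state": fields[2],
                "procs": int(fields[3]),
            })
    return sections


sample = """\
active jobs--------------------
JOBID USERNAME STATE PROCS REMAINING STARTTIME
17199 heap Running 1 2:53:12 Wed Sep 17 11:20:45
6 active jobs
eligible jobs----------------------
JOBID USERNAME STATE PROCS WCLIMIT QUEUETIME
16672 ejolson Idle 1 8:08:00:00 Tue Sep 16 23:27:05
16673 ejolson Idle 1 8:08:00:00 Tue Sep 16 23:27:06
6 eligible jobs
blocked jobs----------------
JOBID USERNAME STATE PROCS WCLIMIT QUEUETIME
0 blocked jobs
"""

jobs = split_showq(sample)
first = jobs["eligible"][0]
print("next eligible job:", first["jobid"], first["user"])
```

The first entry under "eligible" is the job that will run next if resources are available and no reservation blocks it.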
To find the estimated start time of a particular job, try:
showstart $JOBID
[root@Quarry]# showstart 16672
job 16672 requires 1 proc for 8:08:00:00
Earliest start in 5:03:54:32 on Mon Sep 22 17:00:00
Earliest completion in 13:11:54:32 on Wed Oct 1 01:00:00
Best Partition: DEFAULT
If a job already has a node or nodes reserved, showstart returns the
start time of the reservation. In all other cases, showstart returns
the earliest possible start time, assuming that the job in question is
the highest priority job. In most cases a job will not be the highest
priority, so the time reported by showstart is only an estimate and
the job may start later.
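The durations in this output can be checked with a little arithmetic. The Python sketch below assumes the colon-separated fields are days, hours, minutes, and seconds, with leading fields omitted when they are not needed (as in 20:00:00 elsewhere on this page); this is inferred from the sample output. The helper names are hypothetical.

```python
from datetime import timedelta


def parse_moab_duration(text):
    """Parse a DD:HH:MM:SS style duration (leading fields optional)."""
    parts = [int(p) for p in text.split(":")]
    if not 1 <= len(parts) <= 4:
        raise ValueError(f"unexpected duration: {text!r}")
    # Left-pad so the fields line up as days, hours, minutes, seconds.
    days, hours, minutes, seconds = [0] * (4 - len(parts)) + parts
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


def format_moab_duration(delta):
    """Format a timedelta back into the same colon-separated style."""
    total = int(delta.total_seconds())
    days, rest = divmod(total, 86400)
    hours, rest = divmod(rest, 3600)
    minutes, seconds = divmod(rest, 60)
    if days:
        return f"{days}:{hours:02d}:{minutes:02d}:{seconds:02d}"
    return f"{hours}:{minutes:02d}:{seconds:02d}"


wait = parse_moab_duration("5:03:54:32")      # "Earliest start in"
walltime = parse_moab_duration("8:08:00:00")  # requested wall clock limit
print(format_moab_duration(wait + walltime))  # prints 13:11:54:32
```

The sum matches the "Earliest completion in 13:11:54:32" line: the estimated completion is the estimated wait plus the job's requested wall clock time.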
Display jobs eligible for scheduling
Use the showq -i command to view the priority of all jobs eligible for
scheduling. Any job with an asterisk (*) appended to its job ID
already has a reservation, so for these jobs showstart reports the
start time of the reservation rather than a best-case estimate.
[root@Quarry ~]# showq -i
eligible jobs----------------------
JOBID PRIORITY XFACTOR Q USERNAME GROUP PROCS WCLIMIT CLASS SYSTEMQUEUETIME
733993* 6478 2.0 lo semadeni bus 16 5:00:00:00 long Wed Dec 3 13:47:15
768877* 2596 1.1 lo jalstott psych 1 14:00:00:00 long Sat Dec 6 17:02:44
770194* 2463 1.2 lo xl5 chem 8 10:00:00:00 long Sat Dec 6 19:31:15
770223* 2463 1.2 lo xl5 chem 8 10:00:00:00 long Sat Dec 6 19:34:18
757240* 2354 1.2 lo smdietri chem 8 14:00:00:00 long Fri Dec 5 17:01:33
713993* 2193 2.9 no wtclark chem 1 5:00:00:00 normal Fri Nov 28 20:56:25
771859* 2150 1.2 no briordan psych 8 7:00:00:00 normal Sat Dec 6 23:50:06
771876* 2149 1.2 no briordan psych 8 7:00:00:00 normal Sat Dec 6 23:51:22
735872* 2139 2.9 no sk31 med 8 2:12:00:00 normal Wed Dec 3 17:16:24
734403* 2019 3.4 no xy1 chem 8 2:00:00:00 normal Wed Dec 3 14:49:04
734404 1980 3.4 no xy1 chem 8 2:00:00:00 normal Wed Dec 3 14:49:04
710516 1830 2.3 lo partrama chem 1 10:00:00:00 long Tue Nov 25 16:58:40
674958 1740 3.4 lo ppei econ 2 8:08:00:00 long Tue Nov 18 13:23:10
674965 1739 3.4 lo ppei econ 2 8:08:00:00 long Tue Nov 18 13:24:41
713760 1606 3.3 no lee532 biol 8 4:04:00:00 normal Fri Nov 28 17:57:22
743322 1516 5.8 no gafergus chem 6 20:00:00 normal Thu Dec 4 10:17:08
779826 1419 1.7 no crosenth staff 16 1:00:00:00 normal Sun Dec 7 15:45:42
713761 1366 3.3 no lee532 biol 8 4:04:00:00 normal Fri Nov 28 17:57:22
713994 1365 2.9 no wtclark chem 1 5:00:00:00 normal Fri Nov 28 20:56:27
766205 1160 1.2 lo gohs econ 2 8:08:00:00 long Sat Dec 6 11:20:35
710517 1073 2.3 lo partrama chem 1 10:00:00:00 long Tue Nov 25 16:58:41
784858 968 1.7 no caishen iengtech 3 12:50:00 normal Mon Dec 8 00:34:49
784861 968 1.7 no caishen iengtech 3 12:50:00 normal Mon Dec 8 00:34:55
785807 822 1.1 no shehsu faculty 3 2:12:00:00 normal Mon Dec 8 03:05:24
783630 789 1.2 no jandinom chem 8 2:00:00:00 normal Sun Dec 7 22:28:13
735880 729 2.9 no sk31 med 8 2:12:00:00 normal Wed Dec 3 17:17:45
743323 723 5.8 no gafergus chem 6 20:00:00 normal Thu Dec 4 10:17:08
788480 449 1.1 no avharter biol 1 12:10:00 normal Mon Dec 8 08:16:16
788494 448 1.1 no avharter biol 1 12:10:00 normal Mon Dec 8 08:17:29
789120 345 1.0 no rllord chem 8 3:00:00:00 normal Mon Dec 8 09:18:19
30 eligible jobs
Total jobs: 30
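To separate jobs that already hold a reservation from those that do not, the asterisk on the job ID is enough. The following Python sketch assumes the column order of the header above (JOBID, PRIORITY, XFACTOR, Q, USERNAME, GROUP, PROCS, WCLIMIT, CLASS); the function name parse_eligible is hypothetical, and the rows are copied from the sample.

```python
def parse_eligible(line):
    """Parse one showq -i job row into a small dictionary."""
    fields = line.split()
    jobid = fields[0]
    return {
        "jobid": jobid.rstrip("*"),
        "reserved": jobid.endswith("*"),  # asterisk marks a reservation
        "priority": int(fields[1]),
        "user": fields[4],
        "procs": int(fields[6]),
        "wclimit": fields[7],
    }


rows = [
    "734403* 2019 3.4 no xy1 chem 8 2:00:00:00 normal Wed Dec 3 14:49:04",
    "734404 1980 3.4 no xy1 chem 8 2:00:00:00 normal Wed Dec 3 14:49:04",
    "710516 1830 2.3 lo partrama chem 1 10:00:00:00 long Tue Nov 25 16:58:40",
]

for job in map(parse_eligible, rows):
    status = "has a reservation" if job["reserved"] else "no reservation yet"
    print(job["jobid"], job["user"], job["priority"], status)
```

Of these three rows, only job 734403 is marked as reserved, so it is the only one for which showstart would report a reservation start time.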
This is document avmu in domain all.
Last modified on May 13, 2009.