ARCHIVED: If your job is waiting to run in an IU research supercomputer queue

This content has been archived, and is no longer maintained by Indiana University. Information here may no longer be accurate, and links may no longer be available or reliable.

Your job may be waiting for jobs with higher priorities to run. To see all the eligible jobs waiting in the queues, use the Moab showq -i command. Eligible jobs are listed in order of priority, highest first; find your username, and the jobs listed above yours are ahead of it in the queue:

  eligible jobs----------------------
  JOBID                 PRIORITY  XFACTOR  Q  USERNAME    GROUP  PROCS     WCLIMIT     CLASS      SYSTEMQUEUETIME

  s10c2b5.159287.0*        20852      6.9  -   dartmaul     chem    256  1:00:00:00    NORMAL  Wed Dec 12 12:54:34
  s10c2b5.159404.0*        18774      6.7  -   darvader      bio    128  1:00:00:00    NORMAL  Wed Dec 12 19:20:25
  s10c2b5.167992.0          1061      1.0  -   pamidala     chem    128  7:00:00:00    NORMAL  Tue Dec 18 09:31:03
  s10c2b5.167970.0           120      1.0  -   jajbinks     chem     32  7:00:00:00    NORMAL  Tue Dec 18 09:21:52
  s10c2b5.167971.0           119      1.0  -   bekenobi     chem     32  7:00:00:00    NORMAL  Tue Dec 18 09:22:00
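
If you want to pick out the jobs ahead of yours programmatically, the listing above can be parsed with a short script. The following is a minimal sketch, not an IU-provided tool: it assumes the whitespace-separated column layout shown in the sample listing, and the function names parse_showq_rows and jobs_ahead_of are hypothetical helpers introduced only for illustration. On a live system you would feed it the captured text of showq -i instead of the pasted sample.

```python
# Sample rows copied from the `showq -i` listing above.
SHOWQ_TEXT = """\
eligible jobs----------------------
JOBID                 PRIORITY  XFACTOR  Q  USERNAME    GROUP  PROCS     WCLIMIT     CLASS      SYSTEMQUEUETIME

s10c2b5.159287.0*        20852      6.9  -   dartmaul     chem    256  1:00:00:00    NORMAL  Wed Dec 12 12:54:34
s10c2b5.159404.0*        18774      6.7  -   darvader      bio    128  1:00:00:00    NORMAL  Wed Dec 12 19:20:25
s10c2b5.167992.0          1061      1.0  -   pamidala     chem    128  7:00:00:00    NORMAL  Tue Dec 18 09:31:03
s10c2b5.167970.0           120      1.0  -   jajbinks     chem     32  7:00:00:00    NORMAL  Tue Dec 18 09:21:52
s10c2b5.167971.0           119      1.0  -   bekenobi     chem     32  7:00:00:00    NORMAL  Tue Dec 18 09:22:00
"""


def parse_showq_rows(text):
    """Return one dict per job row, in the order the rows are listed.

    Assumes the column order shown in the sample listing:
    JOBID PRIORITY XFACTOR Q USERNAME GROUP PROCS WCLIMIT CLASS ...
    """
    jobs = []
    for line in text.splitlines():
        fields = line.split()
        # Skip banners, the header row, and blank lines.
        if len(fields) < 9 or not fields[1].isdigit():
            continue
        jobs.append({
            # Some job IDs carry a trailing "*" in the listing; strip it.
            "jobid": fields[0].rstrip("*"),
            "priority": int(fields[1]),
            "username": fields[4],
            "procs": int(fields[6]),
            "wclimit": fields[7],
        })
    return jobs


def jobs_ahead_of(jobs, username):
    """Return the jobs listed above the first job owned by `username`."""
    for index, job in enumerate(jobs):
        if job["username"] == username:
            return jobs[:index]
    return []  # the user has no eligible job in the listing


jobs = parse_showq_rows(SHOWQ_TEXT)
ahead = jobs_ahead_of(jobs, "bekenobi")
for job in ahead:
    print(job["jobid"], job["priority"], job["procs"])
print("processors requested ahead:", sum(job["procs"] for job in ahead))
```

For the sample listing, this reports four jobs ahead of the one owned by bekenobi, together requesting 544 processors.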

Use checkjob to see why a job is not running. In this example, let's look at job s10c2b5.167971.0:

  palpatin@h1:~/> checkjob s10c2b5.167971.0

    job s10c2b5.167971.0

    AName: 0
    State: Idle
    Creds:  user:palpatin  group:chem  account:NONE  class:NORMAL
    WallTime:   00:00:00 of 7:00:00:00
    SubmitTime: Tue Dec 18 09:22:00
      (Time Queued  Total: 1:58:41  Eligible: 1:58:13)

    Total Requested Tasks: 32

    Req[0]  TaskCount: 32  Partition: ALL
    Memory >= 0  Disk >= 0  Swap >= 0
    Opsys:   Linux2  Arch: PPC64  Features: ---


    IWD:            /N/dc2/scratch/palpatin/SORET/RUN2iv
    Executable:     /N/dc2/scratch/palpatin/SORET/RUN2iv/fledsub

    BypassCount:    1
    Flags:          RESTARTABLE,FSVIOLATION
    Attr:           FSVIOLATION
    StartPriority:  111
    available for 4 tasks     - s15c3b14.dim:s15c3b13.dim
    rejected for Class        -
    rejected for State        -
    rejected for Reserved     -
    NOTE:  job cannot run in partition base (idle procs do not meet
    requirements : 8 of 32 procs found)
    idle procs: 618  feasible procs:   8

    Node Rejection Summary: [Class: 156][State: 809][Reserved: 54] 

This shows that 156 nodes belong to another class (or queue) that this job cannot access, 809 are busy running other jobs and so aren't in a state where they can run this job, and 54 are reserved (in this case, to run a larger job with higher priority). The output also shows that two nodes (eight processors) are available to run this job, which is fewer than the 32 processors the job requires.
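
To read these numbers out of checkjob output without counting by eye, a few regular expressions are enough. The sketch below is illustrative only: it assumes the line formats shown in the sample output above, and the helper names parse_rejection_summary and processor_shortfall are hypothetical, not part of Moab.

```python
import re

# Lines copied from the sample checkjob output above.
CHECKJOB_TEXT = """\
Total Requested Tasks: 32
idle procs: 618  feasible procs:   8
Node Rejection Summary: [Class: 156][State: 809][Reserved: 54]
"""


def parse_rejection_summary(text):
    """Map each rejection reason to the number of nodes rejected for it."""
    summary_line = ""
    for line in text.splitlines():
        if "Node Rejection Summary" in line:
            summary_line = line
            break
    pairs = re.findall(r"\[(\w+):\s*(\d+)\]", summary_line)
    return {reason: int(count) for reason, count in pairs}


def processor_shortfall(text):
    """Return how many more feasible processors the job still needs."""
    requested = int(re.search(r"Total Requested Tasks:\s*(\d+)", text).group(1))
    feasible = int(re.search(r"feasible procs:\s*(\d+)", text).group(1))
    return max(requested - feasible, 0)


rejections = parse_rejection_summary(CHECKJOB_TEXT)
# Print the reasons from most to least common.
for reason, count in sorted(rejections.items(), key=lambda item: -item[1]):
    print(f"{reason}: {count} nodes")
print("processors still needed:", processor_shortfall(CHECKJOB_TEXT))
```

With the sample output, State is the largest category (809 nodes), and the job is 24 processors short of the 32 it requested.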

You can get an idea of when your job will start with showstart:

  jajbinks@h1:~/> showstart s10c2b5.167971.0

    job s10c2b5.167971.0 requires 32 procs for 7:00:00:00

    Estimated Rsv based start in         4:04:56:50 on Sat Dec 22 16:19:19
    Estimated Rsv based completion in   11:04:56:50 on Sat Dec 29 16:19:19

    Best Partition: base 

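Output from showstart indicates that job s10c2b5.167971.0 is estimated to start at about 4:19pm on Saturday, December 22, and, if it runs for its full seven-day wall-clock limit, to complete at the same time on Saturday, December 29.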
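
The time offsets in the showstart output use a days:hours:minutes:seconds format, and the estimated completion is simply the estimated start plus the job's wall-clock limit. The following sketch checks that arithmetic for the sample values; parse_moab_duration is a hypothetical helper written for this example, and it assumes that offsets with fewer than four fields omit the leading units (as in the 1:58:41 queue time shown by checkjob).

```python
from datetime import timedelta


def parse_moab_duration(text):
    """Convert a [DD:]HH:MM:SS string into a timedelta.

    Assumption: missing leading fields (days, then hours, ...) are zero.
    """
    parts = [int(piece) for piece in text.split(":")]
    if len(parts) > 4:
        raise ValueError(f"unexpected duration format: {text!r}")
    while len(parts) < 4:
        parts.insert(0, 0)
    days, hours, minutes, seconds = parts
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


# Values copied from the sample showstart output above.
start_in = parse_moab_duration("4:04:56:50")
complete_in = parse_moab_duration("11:04:56:50")
wall_limit = parse_moab_duration("7:00:00:00")

# Completion estimate = start estimate + requested wall-clock limit.
print("estimated wait:", start_in)
print("run time implied by the estimates:", complete_in - start_in)
print("matches wall-clock limit:", complete_in - start_in == wall_limit)
```

Because the completion estimate is derived from the requested wall-clock limit, a job that finishes early will complete sooner than showstart reports.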

This is document awgw in the Knowledge Base.
Last modified on 2021-04-06 17:12:50.