ARCHIVED: If your job is waiting to run in an IU research supercomputer queue
Your job may be waiting for jobs with higher priorities to run. To see all the eligible jobs waiting in the queues, use the Moab showq -i command. Find your username in the output to see which jobs are ahead of yours in the queue:
eligible jobs----------------------
JOBID              PRIORITY  XFACTOR  Q  USERNAME  GROUP  PROCS  WCLIMIT     CLASS   SYSTEMQUEUETIME
s10c2b5.159287.0*  20852     6.9      -  dartmaul  chem   256    1:00:00:00  NORMAL  Wed Dec 12 12:54:34
s10c2b5.159404.0*  18774     6.7      -  darvader  bio    128    1:00:00:00  NORMAL  Wed Dec 12 19:20:25
s10c2b5.167992.0   1061      1.0      -  pamidala  chem   128    7:00:00:00  NORMAL  Tue Dec 18 09:31:03
s10c2b5.167970.0   120       1.0      -  jajbinks  chem   32     7:00:00:00  NORMAL  Tue Dec 18 09:21:52
s10c2b5.167971.0   119       1.0      -  bekenobi  chem   32     7:00:00:00  NORMAL  Tue Dec 18 09:22:00
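If you would rather not scan the listing by eye, the lookup can be scripted. The following is a minimal sketch, not an official tool: the function name jobs_ahead is hypothetical, and it assumes that the column order matches the sample above (username in the fifth column) and that the listing is ordered by priority, as it is in the sample.

```python
# Sample text copied from the showq -i listing above.
SHOWQ_SAMPLE = """\
eligible jobs----------------------
JOBID              PRIORITY  XFACTOR  Q  USERNAME  GROUP  PROCS  WCLIMIT     CLASS   SYSTEMQUEUETIME
s10c2b5.159287.0*  20852     6.9      -  dartmaul  chem   256    1:00:00:00  NORMAL  Wed Dec 12 12:54:34
s10c2b5.159404.0*  18774     6.7      -  darvader  bio    128    1:00:00:00  NORMAL  Wed Dec 12 19:20:25
s10c2b5.167992.0   1061      1.0      -  pamidala  chem   128    7:00:00:00  NORMAL  Tue Dec 18 09:31:03
s10c2b5.167970.0   120       1.0      -  jajbinks  chem   32     7:00:00:00  NORMAL  Tue Dec 18 09:21:52
s10c2b5.167971.0   119       1.0      -  bekenobi  chem   32     7:00:00:00  NORMAL  Tue Dec 18 09:22:00
"""


def jobs_ahead(showq_text, username):
    """Return the job IDs listed above the user's first job.

    Returns None if the username has no job in the listing.
    Assumes the column layout of the sample output (an assumption,
    not a documented guarantee).
    """
    ahead = []
    for line in showq_text.splitlines():
        fields = line.split()
        # Job rows have a numeric PRIORITY in the second column;
        # this skips the banner and the header row.
        if len(fields) < 9 or not fields[1].isdigit():
            continue
        if fields[4] == username:
            return ahead
        # Drop the trailing "*" marker that some job IDs carry.
        ahead.append(fields[0].rstrip("*"))
    return None


print(jobs_ahead(SHOWQ_SAMPLE, "pamidala"))
# ['s10c2b5.159287.0', 's10c2b5.159404.0']
```

On a live system you would capture the output of showq -i and pass it in as showq_text instead of the sample string.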
Use checkjob to see why a job is not running. In this example, let's look at job s10c2b5.167971.0:
palpatin@h1:~/> checkjob s10c2b5.167971.0
job s10c2b5.167971.0
AName: 0
State: Idle
Creds: user:palpatin group:chem account:NONE class:NORMAL
WallTime: 00:00:00 of 7:00:00:00
SubmitTime: Tue Dec 18 09:22:00
(Time Queued Total: 1:58:41 Eligible: 1:58:13)
Total Requested Tasks: 32
Req[0] TaskCount: 32 Partition: ALL
Memory >= 0 Disk >= 0 Swap >= 0
Opsys: Linux2 Arch: PPC64 Features: ---
IWD: /N/dc2/scratch/palpatin/SORET/RUN2iv
Executable: /N/dc2/scratch/palpatin/SORET/RUN2iv/fledsub
BypassCount: 1
Flags: RESTARTABLE,FSVIOLATION
Attr: FSVIOLATION
StartPriority: 111
available for 4 tasks - s15c3b14.dim:s15c3b13.dim
rejected for Class -
rejected for State -
rejected for Reserved -
NOTE: job cannot run in partition base (idle procs do not meet requirements : 8 of 32 procs found)
idle procs: 618 feasible procs: 8
Node Rejection Summary: [Class: 156][State: 809][Reserved: 54]
The Node Rejection Summary shows that 156 nodes belong to another class (or queue) that this job cannot access, 809 are busy running other jobs and so are not in a state to run this one, and 54 are reserved (in this case, to run a larger job with higher priority). Two nodes (eight processors) are available to run this job, but that falls short of the 32 processors the job requested.
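The same figures can be pulled out of the checkjob text programmatically. The sketch below is illustrative only: the function names rejection_summary and proc_shortfall are hypothetical, and the patterns assume the output wording shown in the example above, which may differ between Moab versions.

```python
import re

# The last three lines of the checkjob output shown above.
CHECKJOB_TAIL = """\
NOTE: job cannot run in partition base (idle procs do not meet requirements : 8 of 32 procs found)
idle procs: 618 feasible procs: 8
Node Rejection Summary: [Class: 156][State: 809][Reserved: 54]
"""


def rejection_summary(checkjob_text):
    """Map each rejection reason to the number of nodes rejected for it."""
    match = re.search(r"Node Rejection Summary:\s*((?:\[[^\]]+\])+)", checkjob_text)
    if match is None:
        return {}
    pairs = re.findall(r"\[(\w+):\s*(\d+)\]", match.group(1))
    return {reason: int(count) for reason, count in pairs}


def proc_shortfall(checkjob_text):
    """Return how many more processors the job needs, or None if not reported."""
    match = re.search(r"(\d+) of (\d+) procs found", checkjob_text)
    if match is None:
        return None
    found, requested = int(match.group(1)), int(match.group(2))
    return requested - found


print(rejection_summary(CHECKJOB_TAIL))
# {'Class': 156, 'State': 809, 'Reserved': 54}
print(proc_shortfall(CHECKJOB_TAIL))
# 24
```

Here the shortfall of 24 is simply the 32 requested processors minus the 8 that were found feasible.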
You can get an idea of when your job will start with showstart:
jajbinks@h1:~/> showstart s10c2b5.167971.0
job s10c2b5.167971.0 requires 32 procs for 7:00:00:00
Estimated Rsv based start in 4:04:56:50 on Sat Dec 22 16:19:19
Estimated Rsv based completion in 11:04:56:50 on Sat Dec 29 16:19:19
Best Partition: base
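Output from showstart indicates that job s10c2b5.167971.0 is expected to start at about 4:19pm on Saturday, December 22, a little under four days and five hours after the command was run. This is an estimate based on current reservations, not a guarantee.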
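The time offsets in this output can be checked with a little arithmetic. The sketch below assumes the offsets are written as days:hours:minutes:seconds, which is inferred from the sample (a 7:00:00:00 wall clock limit separates a Dec 22 start from a Dec 29 completion); the helper name parse_duration is hypothetical.

```python
from datetime import timedelta


def parse_duration(text):
    """Parse a days:hours:minutes:seconds string into a timedelta.

    Shorter forms such as "00:00:00" are padded on the left with zeros.
    The field order is an assumption inferred from the sample output.
    """
    parts = [int(p) for p in text.split(":")]
    days, hours, minutes, seconds = [0] * (4 - len(parts)) + parts
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


start_in = parse_duration("4:04:56:50")       # offset to estimated start
wall_limit = parse_duration("7:00:00:00")     # requested wall clock limit
finish_in = parse_duration("11:04:56:50")     # offset to estimated completion

# The completion estimate is the start estimate plus the full wall clock limit.
print(start_in + wall_limit == finish_in)
# True
```

In other words, the completion estimate assumes the job uses its entire wall clock limit; a job that finishes early will complete sooner than showstart reports.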
This is document awgw in the Knowledge Base.
Last modified on 2021-04-06 17:12:50.