Why did my batch job crash, and where is the core file?

Your batch job on one of the research computing systems at Indiana University may have crashed because its stack size limit is set too low.

If you're using bash or ksh, use the ulimit command to check the stack size:

  [dartmaul@q0144 ~]$ ulimit -a
  core file size          (blocks, -c) 0
  data seg size           (kbytes, -d) unlimited
  scheduling priority             (-e) 0
  file size               (blocks, -f) unlimited
  pending signals                 (-i) 128461
  max locked memory       (kbytes, -l) 64
  max memory size         (kbytes, -m) unlimited
  open files                      (-n) 1024
  pipe size            (512 bytes, -p) 8
  POSIX message queues     (bytes, -q) 819200
  real-time priority              (-r) 0
  stack size              (kbytes, -s) 10240
  cpu time               (seconds, -t) unlimited
  max user processes              (-u) 1024
  virtual memory          (kbytes, -v) unlimited
  file locks                      (-x) unlimited
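In bash you can also query just the limits that matter here. The following is a minimal sketch (the function name show_limits is hypothetical, not an IU-provided command) that prints the soft (current) and hard (maximum) values for stack size and core file size; the -S and -H options select the soft and hard limit respectively.

```bash
#!/bin/bash
# Print the soft and hard limits most relevant to a crashing job.
# show_limits is an illustrative helper, not a system command.
show_limits() {
    # ulimit -s reports stack size in kbytes; ulimit -c reports core file size in blocks
    printf 'stack size (kbytes): soft=%s hard=%s\n' "$(ulimit -S -s)" "$(ulimit -H -s)"
    printf 'core file size (blocks): soft=%s hard=%s\n' "$(ulimit -S -c)" "$(ulimit -H -c)"
}

show_limits
```

If the soft stack limit is a number such as 10240 while the hard limit is unlimited, you can raise the soft limit without logging out.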

If you're using tcsh or csh, use limit:

  [palpatin@h1 ~]$ limit
  cputime      unlimited
  filesize     unlimited
  datasize     unlimited
  stacksize    unlimited
  coredumpsize 0 kbytes
  memoryuse    unlimited
  vmemoryuse   4194304 kbytes
  descriptors  4096
  memorylocked unlimited
  maxproc      1024

In the first example, the stack size is limited to 10240 KB (10 MB). To remove the limit, add the command for your shell to its initialization file so that the stack size is unlimited (as in the second example), and then run your job again:

  Shell   Initialization file   Command
  bash    .bashrc               ulimit -s unlimited
  ksh     .profile              ulimit -s unlimited
  csh     .cshrc                limit stacksize unlimited
  tcsh    .cshrc                limit stacksize unlimited
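Limits set with ulimit apply to the shell that sets them and to every program it starts, so you can also raise the limit at the top of a bash batch script. A minimal sketch, assuming your job script runs under bash; raise_stack_limit is a hypothetical helper, my_program is a placeholder for your executable, and scheduler directives are omitted because they depend on the system:

```bash
#!/bin/bash
# Raise the soft stack limit as far as the hard limit allows before
# starting the real work. raise_stack_limit is a hypothetical helper.
raise_stack_limit() {
    local hard
    hard="$(ulimit -H -s)"          # the soft limit may be raised up to this value
    if ulimit -S -s "$hard" 2>/dev/null; then
        echo "stack size set to: $(ulimit -S -s)"
    else
        echo "could not raise stack size (hard limit: $hard)" >&2
        return 1
    fi
}

raise_stack_limit
# ./my_program    # placeholder: replace with your actual executable
```

When the hard limit is unlimited, this has the same effect as ulimit -s unlimited; when the hard limit is lower, it raises the stack size as far as allowed instead of failing with "cannot modify limit".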

Once you've used ulimit to set your stack size to any value other than unlimited, you cannot raise it above that value in your current session, because ulimit without the -S option lowers the hard limit as well as the soft limit. To reset your stack size to a higher value, log out and log in again. In this situation, ulimit returns an error like this:

  -bash: ulimit: stack size: cannot modify limit: Operation not permitted
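If you only want to lower the stack size temporarily, changing just the soft limit with -S leaves the hard limit alone, so you can raise the soft limit again without logging out. A minimal bash sketch; demo_soft_limit is a hypothetical name, its body runs in a subshell so your login shell's limits are not touched, and the value 8192 is arbitrary (it assumes your hard limit is at least 8192 KB):

```bash
#!/bin/bash
# Demonstrate that a soft-only (-S) change can be undone in the same session.
# demo_soft_limit is illustrative; the "( ... )" body runs in a subshell,
# so the calling shell's limits are never changed.
demo_soft_limit() (
    hard="$(ulimit -H -s)"                 # current hard limit (kbytes or "unlimited")
    ulimit -S -s 8192 || exit 1            # lower only the soft limit to 8 MB
    echo "lowered: $(ulimit -S -s)"
    ulimit -S -s "$hard" || exit 1         # raising soft back up to hard is permitted
    echo "restored: $(ulimit -S -s)"
)

demo_soft_limit
```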

By default, most versions of Linux set the core file size limit to zero. To generate a core file when you run a job on a research computing system at IU, use ulimit or limit to set the core file size to unlimited:

  Shell   Initialization file   Command
  bash    .bashrc               ulimit -c unlimited
  ksh     .profile              ulimit -c unlimited
  csh     .cshrc                limit coredumpsize unlimited
  tcsh    .cshrc                limit coredumpsize unlimited
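After enabling core dumps, you can confirm the setting and see where the kernel will write the file. On Linux, the name and location of core files are controlled by /proc/sys/kernel/core_pattern: a relative name such as core is created in the crashing process's working directory, while a value beginning with | means core dumps are piped to a helper program instead. A minimal bash sketch; check_core_setup is a hypothetical helper name:

```bash
#!/bin/bash
# Enable core dumps for this shell and its children, then report where
# the kernel will put them. check_core_setup is a hypothetical helper.
check_core_setup() {
    # Try unlimited first; if that is not permitted, fall back to the hard limit.
    ulimit -c unlimited 2>/dev/null || ulimit -c "$(ulimit -H -c)"
    echo "core file size limit: $(ulimit -c)"

    local pattern_file=/proc/sys/kernel/core_pattern
    if [[ -r "$pattern_file" ]]; then
        local pattern
        pattern="$(< "$pattern_file")"
        if [[ "$pattern" == '|'* ]]; then
            echo "core dumps are piped to: ${pattern#|}"
        else
            echo "core file name pattern: $pattern (relative names are created in the working directory)"
        fi
    else
        echo "cannot read $pattern_file (not Linux?)"
    fi
}

check_core_setup
```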

To make sure you have enough space for the core file, create a symbolic link named core in the directory where your job runs, pointing to your scratch directory on the Data Capacitor II file system (replace username with your username):

  ln -s /N/dc2/scratch/username/core ./core

If you haven't already done so, you may need to create the core file in your /N/dc2/scratch/username directory on Data Capacitor II (for example, with touch /N/dc2/scratch/username/core).
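Creating the target file and the link can be combined in a small bash function. This is a minimal sketch, assuming the Data Capacitor II scratch path given above; it uses the USER environment variable in place of username, lets you override the location with a hypothetical SCRATCH_DIR variable, and setup_core_link is a hypothetical helper name. It refuses to overwrite an existing regular ./core file so that a previous core dump is not lost.

```bash
#!/bin/bash
# Create the core file target on scratch and point ./core at it.
# SCRATCH_DIR and setup_core_link are illustrative names, not IU tools.
setup_core_link() {
    local scratch="${SCRATCH_DIR:-/N/dc2/scratch/$USER}"
    local target="$scratch/core"

    if [[ ! -d "$scratch" ]]; then
        echo "scratch directory $scratch not found" >&2
        return 1
    fi
    # Do not clobber a real core file left by an earlier crash.
    if [[ -e ./core && ! -L ./core ]]; then
        echo "./core already exists as a regular file; move it first" >&2
        return 1
    fi
    [[ -e "$target" ]] || touch "$target"   # create the target if it does not exist yet
    ln -sfn "$target" ./core                # replace any existing ./core symlink
    ls -l ./core
}
```

For example, run setup_core_link from the directory your job starts in, since that is where ./core is created.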

For more on disk space, see "Available access to allocated and short-term storage capacity on IU's research systems."

This is document awdh in the Knowledge Base.
Last modified on 2018-01-30 12:01:54.

Contact us

For help or to comment, email the UITS Support Center.