ARCHIVED: Use RUR to collect resource usage statistics for batch jobs on Big Red II at IU

This content has been archived, and is no longer maintained by Indiana University. Information here may no longer be accurate, and links may no longer be available or reliable.

Note: Big Red II was retired from service on December 15, 2019. For more, see ARCHIVED: About Big Red II at Indiana University (Retired).


Overview

Resource Utilization Reporting (RUR) is a Cray Linux Environment (CLE) utility available on Indiana University's Big Red II research supercomputer. RUR collects resource usage statistics for batch job processes that run on Big Red II's CPU and CPU/GPU nodes in the Extreme Scalability Mode (ESM) and Cluster Compatibility Mode (CCM) execution environments.

Enable RUR on Big Red II

To enable RUR on Big Red II, first create a .rur subdirectory in your home directory; this is where RUR stores the log files it creates. From the Big Red II command line, enter:

  mkdir ~/.rur

Once your ~/.rur subdirectory exists, RUR creates a rur.job_id log file (where job_id is your job's ID number) for each batch job you run on Big Red II. Each rur.job_id log contains compute node state information that RUR gathers before and after each process in the job runs.
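
Because each job gets its own rur.job_id file, a job's log can be located by name. The following Python sketch assumes the files in ~/.rur follow that naming pattern exactly; find_rur_logs is a hypothetical helper written for illustration and is not part of RUR.

```python
from pathlib import Path


def find_rur_logs(rur_dir: Path) -> dict[str, Path]:
    """Map job IDs to RUR log paths, assuming files are named rur.<job_id>."""
    logs = {}
    for path in sorted(rur_dir.glob("rur.*")):
        job_id = path.name[len("rur."):]
        if path.is_file() and job_id:
            logs[job_id] = path
    return logs


if __name__ == "__main__":
    # List every RUR log found in the current user's ~/.rur directory.
    for job_id, path in find_rur_logs(Path.home() / ".rur").items():
        print(f"{job_id}\t{path}")
```

If ~/.rur does not exist or holds no logs, the function returns an empty dictionary rather than raising an error.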

Understand RUR log files

You can use the statistical data in your RUR log files to determine how your applications are using system resources. Following are sample RUR log files with explanations of the information they contain.

Sample RUR log for a CPU-only ESM job

The following sample RUR log file shows data collected for a batch job (jobid: 541080) in which aprun launched the ./bin/ior binary in two processes (apid: 3625148 and apid: 3625157) on the CPU nodes in Big Red II's Extreme Scalability Mode (ESM) execution environment:

  uid: 1000, apid: 3625148, jobid: 541080, cmdname: ./bin/ior taskstats ['utime', 5680000, 'stime',
  435040000, 'max_rss', 4136, 'rchar', 545012, 'wchar', 137438955627, 'exitcodes', 0, 0, 0, 0]

  uid: 1000, apid: 3625157, jobid: 541080, cmdname: ./bin/ior taskstats ['utime', 48968000, 'stime',
  343752000, 'max_rss', 4124, 'rchar', 137439498456, 'wchar', 2148, 'exitcodes', 0, 0, 0, 0]

The data elements collected for each process include the user CPU time (utime), system CPU time (stime), and the amount of data (in bytes) read (rchar) and written (wchar) by the application. These values are totals summed across all nodes. The max_rss value is the maximum amount of memory (in KB) used on any single node, rather than a sum across all the nodes.
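
To work with these values programmatically, a record can be split into its fields. The following Python sketch is one possible approach, not an official RUR tool: the record layout in RECORD_RE is inferred only from the sample lines above, it assumes cmdname contains no spaces, and parse_records is a hypothetical helper name. It also converts the wchar total of the first sample record from bytes to GiB.

```python
import re

# Assumed record layout, inferred from the sample log lines above:
#   uid: N, apid: N, jobid: N, cmdname: CMD PLUGIN ['key', value, ...]
RECORD_RE = re.compile(
    r"uid: (?P<uid>\d+), apid: (?P<apid>\d+), jobid: (?P<jobid>\d+), "
    r"cmdname: (?P<cmdname>\S+) (?P<plugin>taskstats|gpustat) \[(?P<body>[^\]]*)\]"
)


def parse_records(text):
    """Return one dict per RUR record found in text."""
    flat = " ".join(text.split())  # undo any line wrapping
    records = []
    for m in RECORD_RE.finditer(flat):
        stats, key = {}, None
        for token in (t.strip() for t in m["body"].split(",")):
            if token.startswith("'"):
                key = token.strip("'")  # quoted tokens are field names
                stats[key] = []
            elif token and key is not None:
                stats[key].append(int(token))  # bare tokens are values
        # Single-valued fields become plain ints; 'exitcodes' stays a list.
        stats = {k: v if k == "exitcodes" or len(v) != 1 else v[0]
                 for k, v in stats.items()}
        records.append({
            "uid": int(m["uid"]), "apid": int(m["apid"]), "jobid": int(m["jobid"]),
            "cmdname": m["cmdname"], "plugin": m["plugin"], "stats": stats,
        })
    return records


SAMPLE = """\
uid: 1000, apid: 3625148, jobid: 541080, cmdname: ./bin/ior taskstats ['utime', 5680000, 'stime',
435040000, 'max_rss', 4136, 'rchar', 545012, 'wchar', 137438955627, 'exitcodes', 0, 0, 0, 0]
"""

rec = parse_records(SAMPLE)[0]
gib_written = rec["stats"]["wchar"] / 2**30  # bytes to GiB
print(f"apid {rec['apid']}: wrote {gib_written:.1f} GiB, peak RSS {rec['stats']['max_rss']} KB")
# prints: apid 3625148: wrote 128.0 GiB, peak RSS 4136 KB
```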

Sample RUR logs for CPU/GPU ESM jobs

The following sample RUR log files show data collected for two batch jobs (jobid: 516541 and jobid: 538644). In each job, aprun launched one process of the specified binary on the CPU/GPU nodes in Big Red II's ESM execution environment:

  • rur.516541:
      uid: 1000, apid: 3524218, jobid: 516541, cmdname: /tmp/dostuff taskstats ['utime', 32000,
      'stime', 132000, 'max_rss', 1736, 'rchar', 44524, 'wchar', 289]
      
      uid: 1000, apid: 3524218, jobid: 516541, cmdname: /tmp/dostuff gpustat ['maxmem', 108000,
      'summem', 108000, 'gpusecs', 44]
    

    In the example above, the top line shows the CPU usage statistics (taskstats) for the process apid: 3524218, including the total user time (utime), system CPU time (stime), bytes read (rchar), and bytes written (wchar) across all nodes, and the maximum amount of memory used by any of the nodes (max_rss).

    The bottom line shows the GPU usage statistics (gpustat) for the same process (apid: 3524218), including the total GPU compute time (gpusecs) and memory used (summem) across all nodes, and the maximum GPU memory used by any of the nodes (maxmem).

    Taken together, these statistics show that the binary (/tmp/dostuff) used the GPU on the CPU/GPU node, offloading a portion of the computational work to it.

  • rur.538644:
      uid: 1000, apid: 3575196, jobid: 538644, cmdname: ./RunHPCC.sh taskstats ['utime', 360400000,
      'stime', 17260000, 'max_rss', 529916, 'rchar', 2376279, 'wchar', 6558, 'exitcodes', 15, 9, 0, 0]
    
      uid: 1000, apid: 3575196, jobid: 538644, cmdname: ./RunHPCC.sh gpustat ['maxmem', 0, 'summem',
      0, 'gpusecs', 0]
    

    In the example above, the top line shows the CPU usage statistics (taskstats) for the process apid: 3575196, including the total user time (utime), system CPU time (stime), bytes read (rchar), and bytes written (wchar) across all nodes, and the maximum amount of memory used by any of the nodes (max_rss).

    The bottom line shows the GPU usage statistics (gpustat) for the same process. The 0 values recorded for maxmem, summem, and gpusecs reveal that the ./RunHPCC.sh process did not use any GPU resources.
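
The comparison between these two jobs can be automated by checking whether a process's gpustat counters are all zero. The following Python sketch assumes gpustat records always list maxmem, summem, and gpusecs in the order shown in the samples; gpu_usage is a hypothetical helper, not part of RUR.

```python
import re

# Assumed gpustat layout, taken from the sample records above.
GPUSTAT_RE = re.compile(
    r"apid: (\d+), jobid: (\d+), cmdname: (\S+) gpustat "
    r"\['maxmem', (\d+), 'summem', (\d+), 'gpusecs', (\d+)\]"
)


def gpu_usage(log_text):
    """Return {apid: stats}, where stats includes a boolean 'used_gpu' flag."""
    flat = " ".join(log_text.split())  # undo any line wrapping
    usage = {}
    for apid, _jobid, cmdname, maxmem, summem, gpusecs in GPUSTAT_RE.findall(flat):
        stats = {"maxmem": int(maxmem), "summem": int(summem), "gpusecs": int(gpusecs)}
        # Treat all-zero counters as "no GPU use", as the text above does.
        stats["used_gpu"] = any(stats.values())
        stats["cmdname"] = cmdname
        usage[int(apid)] = stats
    return usage


LOGS = """\
uid: 1000, apid: 3524218, jobid: 516541, cmdname: /tmp/dostuff gpustat ['maxmem', 108000,
'summem', 108000, 'gpusecs', 44]
uid: 1000, apid: 3575196, jobid: 538644, cmdname: ./RunHPCC.sh gpustat ['maxmem', 0, 'summem',
0, 'gpusecs', 0]
"""

for apid, s in gpu_usage(LOGS).items():
    print(apid, s["cmdname"], "used GPU" if s["used_gpu"] else "no GPU use")
# prints:
# 3524218 /tmp/dostuff used GPU
# 3575196 ./RunHPCC.sh no GPU use
```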

Sample RUR log for a CPU-only CCM job

The following sample RUR log file shows data collected for a batch job (jobid: 547707) with two processes (apid: 3645534 and apid: 3645535) launched on the CPU nodes in Big Red II's Cluster Compatibility Mode (CCM) execution environment:

  uid: 1000, apid: 3645534, jobid: 547707, cmdname: /opt/cray/ccm/default/sbin/ccmlaunch taskstats
  ['utime', 72000, 'stime', 96000, 'max_rss', 3336, 'rchar', 6694077, 'wchar', 5708, 'exitcodes',
  256, 512, 32512, 9]
  
  uid: 1000, apid: 3645535, jobid: 547707, cmdname: /opt/cray/ccm/default/sbin/ccmlaunch taskstats
  ['utime', 57032000, 'stime', 5608000, 'max_rss', 9380492, 'rchar', 6866528, 'wchar', 3852,
  'exitcodes', 256, 512, 9, 0]

This example contains the same types of CPU usage statistics (taskstats) collected for CPU-only jobs running in the ESM execution environment, except this log shows that both processes were launched in the CCM execution environment by ccmlaunch (cmdname: /opt/cray/ccm/default/sbin/ccmlaunch).
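
Since CCM records name ccmlaunch as the command, the cmdname field gives a rough way to tell the two execution environments apart. The following Python sketch rests on the assumption that every CCM record names ccmlaunch and that no ESM record does, which the samples suggest but do not guarantee; execution_mode is a hypothetical helper.

```python
import re
from pathlib import PurePosixPath

# Capture the command that precedes the plugin name (taskstats or gpustat).
CMDNAME_RE = re.compile(r"cmdname: (\S+)\s+(?:taskstats|gpustat)\b")


def execution_mode(record):
    """Guess 'CCM' or 'ESM' from a record's cmdname; None if no cmdname is found."""
    m = CMDNAME_RE.search(record)
    if m is None:
        return None
    # Assumption: every CCM record names ccmlaunch, as in the samples above.
    return "CCM" if PurePosixPath(m.group(1)).name == "ccmlaunch" else "ESM"


print(execution_mode(
    "uid: 1000, apid: 3645534, jobid: 547707, "
    "cmdname: /opt/cray/ccm/default/sbin/ccmlaunch taskstats"
))  # prints: CCM
print(execution_mode(
    "uid: 1000, apid: 3625148, jobid: 541080, cmdname: ./bin/ior taskstats"
))  # prints: ESM
```

Note that "ESM" here only means the command is not ccmlaunch; the record itself does not state the execution environment.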

Sample RUR log for a CPU/GPU CCM job

The following sample RUR log file shows data collected for a batch job (jobid: 547993) with one process (apid: 3648306) launched on the CPU/GPU nodes in Big Red II's CCM execution environment:

  uid: 1000, apid: 3648306, jobid: 547993, cmdname: /opt/cray/ccm/default/sbin/ccmlaunch taskstats
  ['utime', 64000, 'stime', 128000, 'max_rss', 3336, 'rchar', 6687954, 'wchar', 2203, 'exitcodes',
  256, 512, 32512, 9]
  
  uid: 1000, apid: 3648306, jobid: 547993, cmdname: /opt/cray/ccm/default/sbin/ccmlaunch gpustat
  ['maxmem', 0, 'summem', 0, 'gpusecs', 0]

This example contains the same types of CPU (taskstats) and GPU (gpustat) usage statistics collected for CPU/GPU jobs running in the ESM execution environment, except this log shows that the process was launched in the CCM execution environment by ccmlaunch (cmdname: /opt/cray/ccm/default/sbin/ccmlaunch).

In this case, the 0 values recorded for maxmem, summem, and gpusecs reveal that the process did not use any GPU resources.

Get help

If you need help or have questions about using RUR on Big Red II, contact the UITS Research Applications and Deep Learning team.

If you have system-specific questions or encounter problems running batch jobs on Big Red II, contact the UITS High Performance Systems group.

This is document bext in the Knowledge Base.
Last modified on 2023-04-21 16:56:08.