ARCHIVED: Use R on Big Red II at IU

This content has been archived, and is no longer maintained by Indiana University. Information here may no longer be accurate, and links may no longer be available or reliable.

Note:

Big Red II was retired from service on December 15, 2019; for more, see ARCHIVED: About Big Red II at Indiana University (Retired).


Overview

R is a language and environment for statistical computing and graphics. R provides a wide variety of statistical (for example, linear and nonlinear modeling, classical statistical tests, time-series analysis, classification, and clustering) and graphical techniques, and is highly extensible. For more, see the R Project for Statistical Computing home page.

On Big Red II at Indiana University, you can set up and run R batch jobs on compute nodes in the native Extreme Scalability Mode (ESM) execution environment, or run interactive R jobs on compute nodes in the Cluster Compatibility Mode (CCM) execution environment. The ESM and CCM execution environments are features of the Cray Linux Environment (CLE) operating system running on Big Red II.

Set up your user environment

To add the default R module, you need to have the Intel programming environment module (PrgEnv-intel) already added to your user environment. To determine which modules are currently loaded, on the command line, enter:

module list

If another programming environment module (for example, PrgEnv-cray) is loaded, use the module swap command to replace it with the appropriate programming environment module; on the command line, enter:

module swap PrgEnv-cray PrgEnv-intel
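
The check-then-swap step above can be sketched as a small shell helper. This is only an illustration: it assumes module list prints the loaded modules to standard error (as the classic Environment Modules tool does), the helper name loaded_prgenv is hypothetical, and the sample module list below is invented so the sketch can run anywhere.

```shell
# Hypothetical helper: read `module list` output on stdin and print the
# name of the first programming environment module found (if any).
loaded_prgenv() {
  grep -o 'PrgEnv-[A-Za-z]*' | head -n 1
}

# Illustrative input only; the version strings are placeholders.
sample='  1) modules/x.y   2) PrgEnv-cray/x.y   3) torque/x.y'
current=$(printf '%s\n' "$sample" | loaded_prgenv)

# Print the swap command that would be needed, rather than running it.
if [ -n "$current" ] && [ "$current" != "PrgEnv-intel" ]; then
  echo "module swap $current PrgEnv-intel"
fi
```

On a system with the module command available, the helper would be fed real output with: module list 2>&1 | loaded_prgenv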

To load the default R module, on the command line, enter:

module load r

Note:

Non-default versions of R installed on Big Red II require either the Intel (PrgEnv-intel) or GNU programming environment (PrgEnv-gnu) module. To see which versions are available, on the command line, enter:

module avail r/

To load a version that does not have intel in its module name (for example, r/3.1.1), make sure the GNU programming environment (PrgEnv-gnu) module is already loaded.

To load a non-default R module, enter the full module name; for example:

module load r/3.1.1
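
The naming rule in this note can be written down as a small helper. This is a sketch that assumes the rule holds exactly as stated here (the default r module and any version with intel in its name need PrgEnv-intel; other versions need PrgEnv-gnu). The function name required_prgenv is hypothetical.

```shell
# Hypothetical helper: print the programming environment module that an
# R module name calls for, following the rule described in the note.
required_prgenv() {
  case "$1" in
    r|*intel*) echo "PrgEnv-intel" ;;   # default module, or "intel" in the name
    *)         echo "PrgEnv-gnu" ;;     # any other version
  esac
}

required_prgenv r
required_prgenv r/3.1.1
```

The two calls print the environment for the default module and for r/3.1.1, respectively; load the printed module before running module load.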

To make sure the modules required for using the default R package are loaded every time you log in to Big Red II, add the following lines to your ~/.modules file:

module swap PrgEnv-cray PrgEnv-intel
module load r
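
If you prefer to add these lines from the command line, one possible approach is to append each line only when it is not already present, so that repeating the step does not create duplicates. The helper name add_line_once is hypothetical, and the sketch writes to a scratch file rather than to your real ~/.modules file.

```shell
# Hypothetical helper: append a line to a file only if it is not already there.
add_line_once() {
  file="$1"; line="$2"
  touch "$file"
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Demonstrated on a scratch file; point modules_file at ~/.modules to apply it.
modules_file=$(mktemp)
add_line_once "$modules_file" "module swap PrgEnv-cray PrgEnv-intel"
add_line_once "$modules_file" "module load r"
add_line_once "$modules_file" "module load r"   # second call adds nothing
cat "$modules_file"
```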

Submit an R batch job

To submit an R batch job that will run on Big Red II's compute nodes in the ESM execution environment:

  1. Create an R file (for example, R_input.R) containing the commands R should run.
  2. Create a job script (for example, R_job) that includes the aprun command to launch R on a compute node in the ESM execution environment (see the job script example below).
  3. Submit your job script (for example, R_job) to the TORQUE resource manager; at the command prompt, enter:
    qsub R_job
    
  4. To check the status of your job, use the qstat command (replace username with your IU username):
    qstat -u username
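
For step 1, a minimal input file might look like the following sketch. The R commands are illustrative only and do not come from this document; the here-document simply writes them to R_input.R, the file name used in the steps above.

```shell
# Write a small, illustrative R script to R_input.R in the current directory.
cat > R_input.R <<'EOF'
# Illustrative commands only: summarize 1000 random normal values.
set.seed(42)
x <- rnorm(1000)
print(summary(x))
EOF

# Show what was written.
cat R_input.R
```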
    

Job script example

The following example can be used (with some minor modifications) to run an R batch job on all 32 processors of one compute node in Big Red II's ESM execution environment:

#!/bin/bash

#PBS -l nodes=1:ppn=32
#PBS -l walltime=01:00:00
#PBS -q cpu
#PBS -m abe
#PBS -M username@iu.edu
#PBS -N my_R_job

cd /N/u/username/BigRed2/your_working_directory

aprun -n 32 R CMD BATCH --no-save R_input.R

The TORQUE directives in the example script do the following:

Directive                    Function
#PBS -l nodes=1:ppn=32       Sets resource requirements for the job to one node, 32 processors per node
#PBS -l walltime=01:00:00    Requests one hour of wall-clock time for the job
#PBS -q cpu                  Sends the job to Big Red II's routing queue (cpu); jobs submitted to the cpu routing queue are placed in the normal, long, or serial queue based on their resource requirements (for more, see ARCHIVED: Big Red II queue information)
#PBS -m abe                  Sets event notification to send email if the job is (a) aborted, when it (b) begins, and when it (e) ends
#PBS -M                      Indicates where to send event notifications (replace username@iu.edu with your IU email address)
#PBS -N                      Assigns a job name (my_R_job)

The commands in the body of the script do the following:

  • The cd command changes the working directory to the job submission directory (where the R input file is located) before executing further commands; this is necessary because TORQUE scripts execute in your home directory by default.
  • The aprun -n 32 command launches the specified application on all 32 cores of one compute node in the ESM execution environment.
  • The R CMD BATCH --no-save R_input.R string starts R in batch mode, tells R not to save an image of the current workspace at the end of the session (--no-save), and specifies the file from which R should take its input (R_input.R).
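
The commands described above can be varied slightly, as in the following sketch. It relies on two things not covered in this document: TORQUE sets the PBS_O_WORKDIR (submission directory) and PBS_JOBID variables inside a running job, and R CMD BATCH accepts an optional output file name after the input file (by default, output goes to a file named after the input, such as R_input.Rout). The helper name rout_name is hypothetical, and the fallback values exist only so the sketch can run outside a job.

```shell
# PBS_O_WORKDIR and PBS_JOBID are set by TORQUE inside a running job;
# the fallbacks let this sketch run outside of one.
cd "${PBS_O_WORKDIR:-$PWD}" || exit 1

# Hypothetical helper: derive an output file name that includes the job ID.
rout_name() {
  printf '%s.%s.Rout\n' "${1%.R}" "${PBS_JOBID:-nojob}"
}

# Print the R command line instead of running it.
echo "R CMD BATCH --no-save R_input.R $(rout_name R_input.R)"
```

Including the job ID in the output file name keeps results from separate submissions from overwriting one another.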

Run R interactively

If your interactive session will require less than 20 minutes of processor time, you can load the required modules and launch R from the Big Red II command line; for example:

gchawwaa@login2:~> module swap PrgEnv-cray PrgEnv-intel
gchawwaa@login2:~> module load r
gchawwaa@login2:~> R

If your interactive session will require more than 20 minutes of processor time, you must run an interactive job on the compute nodes in Big Red II's Cluster Compatibility Mode (CCM) execution environment.

Note:
Because the login nodes are not intended for computational work, UITS strongly recommends this method of interactive execution.

To run an interactive R job on Big Red II:

  1. Make sure your user environment is configured properly. In addition to the PrgEnv-intel and r modules, you must also load the ccm module. Enter the following commands, or add them to your ~/.modules file:
    module swap PrgEnv-cray PrgEnv-intel
    module load r
    module load ccm
    
  2. From the command line, enter the qsub command with the -I (interactive), -l gres=ccm (use CCM), and -q cpu (CPU queue) flags added; for example:
    qsub -I -l walltime=01:00:00 -l nodes=1:ppn=32 -l gres=ccm -q cpu
    

    When the requested resources are available, your job will start. Once the CCM execution environment is initialized, you'll be placed on one of Big Red II's aprun nodes:

    chebacca@login2:~> qsub -I -l walltime=01:00:00 -l nodes=1:ppn=32 -l gres=ccm -q cpu
    qsub: waiting for job 788009 to start
    qsub: job 788009 ready
    
    In CCM JOB:  788009  JID  788009  USER  chebacca  GROUP  wook
    Initializing CCM environment, Please Wait
    CCM Start success, 1 of 1 responses
    Directory: /N/u/chebacca/BigRed2
    Thu Mar 26 16:09:44 EDT 2018
    chebacca@aprun8:~>
    
  3. From the aprun command line, enter the ccmlogin command:
    chebacca@aprun8:~> ccmlogin
    

    This will place you on a Big Red II compute node (for example, nid00885):

    Warning: Permanently added '[nid00885]:203' (RSA) to the list of known hosts.
    chebacca@nid00885:~>
    
  4. From the compute node command prompt, enter R to launch R:
    chebacca@nid00885:~> R
    
    R version 3.1.1 (2014-07-10) -- "Sock it to Me"
    Copyright (C) 2014 The R Foundation for Statistical Computing
    Platform: x86_64-unknown-linux-gnu (64-bit)
    
    R is free software and comes with ABSOLUTELY NO WARRANTY.
    You are welcome to redistribute it under certain conditions.
    Type 'license()' or 'licence()' for distribution details.
    
    R is a collaborative project with many contributors.
    Type 'contributors()' for more information and
    'citation()' on how to cite R or R packages in publications.
    
    Type 'demo()' for some demos, 'help()' for on-line help, or
    'help.start()' for an HTML browser interface to help.
    Type 'q()' to quit R.
    
    >
    

To use the features of the R graphical user interface (GUI), you must SSH to Big Red II with X forwarding enabled, and then use qsub with the -I (interactive) and -X (X forwarding) switches, as well as the -l gres=ccm (use CCM) switch; for example:

lpawaroo@login1:~> qsub -I -X -l walltime=01:00:00 -l nodes=1:ppn=32 -l gres=ccm -q cpu
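
Before requesting X forwarding for the job, it can help to confirm that your SSH session has it enabled. The following sketch assumes that SSH sets the DISPLAY variable when X forwarding is active; the helper name interactive_qsub_cmd is hypothetical, and it only prints the qsub command rather than submitting anything.

```shell
# Hypothetical helper: print an interactive qsub command, adding -X only
# when DISPLAY is set (ssh sets it when X forwarding is active).
interactive_qsub_cmd() {
  xflag=""
  if [ -n "${DISPLAY:-}" ]; then
    xflag=" -X"
  fi
  echo "qsub -I${xflag} -l walltime=01:00:00 -l nodes=1:ppn=32 -l gres=ccm -q cpu"
}

interactive_qsub_cmd
```

If the printed command lacks -X, reconnect with X forwarding enabled before starting a job that needs the R GUI.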

Get help

If you need help or have questions regarding the use of R on IU's research supercomputers, contact the UITS Research Applications and Deep Learning team.

Research computing support at IU is provided by the Research Technologies division of UITS. To ask a question or get help regarding Research Technologies services, including IU's research supercomputers and research storage systems, and the scientific, statistical, and mathematical applications available on those systems, contact UITS Research Technologies. For service-specific support contact information, see Research computing support at IU.

This is document bdrv in the Knowledge Base.
Last modified on 2023-05-09 14:42:00.