Use R on Big Red II at IU

Note:

Big Red II will be retired from service on December 15, 2019. After that date, you will no longer be able to log into Big Red II; however, the data in your Big Red II home directory will remain accessible from your home directory on any of the other IU research supercomputers. New software requests for Big Red II will no longer be accepted after the October 13, 2019, maintenance window.

Beginning October 14, 2019, IU graduate students, faculty, and staff will be able to create accounts on Big Red III. Undergraduate students and affiliates will be able to get accounts if they are sponsored by full-time IU faculty or staff members. Grand Challenges users who create Big Red III accounts will be able to request exclusive access to a portion of the system for running jobs.

For more, see Upcoming changes to research supercomputers at IU.


Overview

R is a language and environment for statistical computing and graphics. R provides a wide variety of statistical (for example, linear and nonlinear modeling, classical statistical tests, time-series analysis, classification, and clustering) and graphical techniques, and is highly extensible. For more, see the R Project for Statistical Computing home page.

On Big Red II at Indiana University, you can set up and run R batch jobs on compute nodes in the native Extreme Scalability Mode (ESM) execution environment, or run interactive R jobs on compute nodes in the Cluster Compatibility Mode (CCM) execution environment. The ESM and CCM execution environments are features of the Cray Linux Environment (CLE) operating system running on Big Red II.

Set up your user environment

To add the default R module, you need to have the Intel programming environment module (PrgEnv-intel) already added to your user environment. To determine which modules are currently loaded, on the command line, enter:

module list

If another programming environment module (for example, PrgEnv-cray) is loaded, use the module swap command to replace it with the appropriate programming environment module; on the command line, enter:

module swap PrgEnv-cray PrgEnv-intel

To load the default R module, on the command line, enter:

module load r

Note:

Non-default versions of R installed on Big Red II require either the Intel (PrgEnv-intel) or GNU programming environment (PrgEnv-gnu) module. To see which versions are available, on the command line, enter:

module avail r/

To load a version that does not have intel in its module name (for example, r/3.1.1), make sure the GNU programming environment (PrgEnv-gnu) module is already loaded.

To load a non-default R module, enter the full module name; for example:

module load r/3.1.1

To make permanent changes to your environment, edit your ~/.modules file. For more, see Use a .modules file in your home directory to save your user environment on an IU research supercomputer.

For example, to make sure the modules required for using the default R package are loaded every time you log into Big Red II, add the following lines to your ~/.modules file:

module swap PrgEnv-cray PrgEnv-intel
module load r

For more about using Modules to configure your user environment, see Use Modules to manage your software environment on IU's research computing systems.
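
The check-and-swap steps above can also be scripted. The following is a minimal sketch, not an official UITS tool: the function names loaded_prgenv and ensure_r_environment are hypothetical, and the sketch assumes that module list writes its output to standard error, as Environment Modules typically does.

```shell
#!/bin/bash

# Print the name of the first PrgEnv-* module found in `module list`-style
# text read from standard input (for example, "PrgEnv-cray").
# Prints nothing if no programming environment module is listed.
loaded_prgenv() {
    grep -o 'PrgEnv-[A-Za-z]*' | head -n 1
}

# Hypothetical helper: make sure PrgEnv-intel is loaded, then load the
# default R module. Assumes the `module` command is available, as it is
# in a Big Red II login shell.
ensure_r_environment() {
    local current
    # Assumption: `module list` writes to standard error, so merge it
    # into standard output before filtering.
    current=$(module list 2>&1 | loaded_prgenv)
    if [ -z "$current" ]; then
        module load PrgEnv-intel
    elif [ "$current" != "PrgEnv-intel" ]; then
        module swap "$current" PrgEnv-intel
    fi
    module load r
}

# Usage on a login node:
#   ensure_r_environment
```

Because the helper swaps out whichever programming environment it finds, it also covers the case where PrgEnv-gnu (rather than PrgEnv-cray) is currently loaded.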

Submit an R batch job

To submit an R batch job that will run on Big Red II's compute nodes in the ESM execution environment:

  1. Create an R file (for example, R_input.R) containing the commands R should run.
  2. Create a job script (for example, R_job) that includes the aprun command to launch R on a compute node in the ESM execution environment (see the job script example below).
  3. Submit your job script (for example, R_job) to the TORQUE resource manager; at the command prompt, enter:
    qsub R_job
    
  4. To check the status of your job, use the qstat command (replace username with your IU username):
    qstat -u username
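
As an illustration of steps 1, 3, and 4, the sketch below writes a small R input file and then submits a job script and checks its status. The function names write_r_input and submit_and_check are hypothetical, and the R commands are placeholders for your own analysis. The sketch relies on qsub printing the new job's identifier on standard output.

```shell
#!/bin/bash

# Write a small, illustrative R input file to the path given as $1.
write_r_input() {
    cat > "$1" <<'EOF'
# Summarize 1,000 draws from a standard normal distribution.
set.seed(42)
x <- rnorm(1000)
print(summary(x))
EOF
}

# Submit the job script given as $1 and show the status of the new job.
# qsub prints the job identifier on standard output; qstat accepts it.
submit_and_check() {
    local job_id
    job_id=$(qsub "$1") || return 1
    echo "Submitted job: $job_id"
    qstat "$job_id"
}

# Usage on a Big Red II login node (R_job is your job script):
#   write_r_input R_input.R
#   submit_and_check R_job
```

Passing the job identifier to qstat limits the listing to the job you just submitted, whereas qstat -u username lists all of your jobs.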
    

Job script example

The following example can be used (with some minor modifications) to run an R batch job on all 32 processors of one compute node in Big Red II's ESM execution environment:

#!/bin/bash

#PBS -l nodes=1:ppn=32
#PBS -l walltime=01:00:00
#PBS -q cpu
#PBS -m abe
#PBS -M username@iu.edu
#PBS -N my_R_job

cd /N/u/username/BigRed2/your_working_directory

aprun -n 32 R CMD BATCH --no-save R_input.R

The TORQUE directives in the example script do the following:

Directive                     Function
#PBS -l nodes=1:ppn=32        Sets resource requirements for the job to one node, 32 processors per node
#PBS -l walltime=01:00:00     Requests one hour of wall-clock time for the job
#PBS -q cpu                   Sends the job to Big Red II's routing queue (cpu); jobs submitted to the cpu routing queue are placed in the normal, long, or serial queue based on their resource requirements (for more, see Big Red II queue information)
#PBS -m abe                   Sets event notification to send email if the job is (a) aborted, when it (b) begins, and when it (e) ends
#PBS -M username@iu.edu       Indicates where to send event notifications (replace username@iu.edu with your IU email address)
#PBS -N my_R_job              Assigns a job name (my_R_job)

The commands in the body of the script do the following:

  • The cd command changes the working directory to the job submission directory (where the R input file is located) before executing further commands; this is necessary because TORQUE scripts execute in your home directory by default.
  • The aprun -n 32 command launches the specified application on all 32 cores of one compute node in the ESM execution environment.
  • The R CMD BATCH --no-save R_input.R string starts R in batch mode, tells R not to save an image of the current workspace at the end of the session (--no-save), and specifies the file from which R should take its input (R_input.R).
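
If you submit similar jobs often, you might generate the job script rather than edit it by hand. The following is a minimal sketch under that assumption: the make_r_job function and its argument order are hypothetical, and it reproduces the example script above with the username, working directory, R input file, and wall-clock time filled in.

```shell
#!/bin/bash

# Print a TORQUE job script for a one-node, 32-core R batch job.
# Arguments: IU username, working directory, R input file,
# and (optionally) the wall-clock time, which defaults to one hour.
make_r_job() {
    local user=$1 workdir=$2 rfile=$3 walltime=${4:-01:00:00}
    cat <<EOF
#!/bin/bash

#PBS -l nodes=1:ppn=32
#PBS -l walltime=${walltime}
#PBS -q cpu
#PBS -m abe
#PBS -M ${user}@iu.edu
#PBS -N my_R_job

cd ${workdir}

aprun -n 32 R CMD BATCH --no-save ${rfile}
EOF
}

# Usage (placeholder values; replace them with your own):
#   make_r_job username /N/u/username/BigRed2/your_working_directory R_input.R > R_job
```

By default, R CMD BATCH writes its output to a file named after the input file with the .Rout extension (here, R_input.Rout) in the working directory.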

Run R interactively

If your interactive session will require less than 20 minutes of processor time, you can load the required modules and launch R from the Big Red II command line; for example:

gchawwaa@login2:~> module swap PrgEnv-cray PrgEnv-intel
gchawwaa@login2:~> module load r
gchawwaa@login2:~> R

If your interactive session will require more than 20 minutes of processor time, you must run an interactive job on the compute nodes in Big Red II's Cluster Compatibility Mode (CCM) execution environment.

Note:
Because the login nodes are not intended for computational work, UITS strongly recommends this method of interactive execution.

To run an interactive R job on Big Red II:

  1. Make sure your user environment is configured properly. In addition to the PrgEnv-intel and r modules, you must also load the ccm module. Enter the following commands, or add them to your ~/.modules file:
    module swap PrgEnv-cray PrgEnv-intel
    module load r
    module load ccm
    
  2. From the command line, enter the qsub command with the -I (interactive), -l gres=ccm (use CCM), and -q cpu (CPU queue) flags added; for example:
    qsub -I -l walltime=01:00:00 -l nodes=1:ppn=32 -l gres=ccm -q cpu
    

    When the requested resources are available, your job will start. Once the CCM execution environment is initialized, you'll be placed on one of Big Red II's aprun nodes:

    chebacca@login2:~> qsub -I -l walltime=01:00:00 -l nodes=1:ppn=32 -l gres=ccm -q cpu
    qsub: waiting for job 788009 to start
    qsub: job 788009 ready
    
    In CCM JOB:  788009  JID  788009  USER  chebacca  GROUP  wook
    Initializing CCM environment, Please Wait
    CCM Start success, 1 of 1 responses
    Directory: /N/u/chebacca/BigRed2
    Thu Mar 26 16:09:44 EDT 2018
    chebacca@aprun8:~>
    
  3. From the aprun command line, enter the ccmlogin command:
    chebacca@aprun8:~> ccmlogin
    

    This will place you on a Big Red II compute node (for example, nid00885):

    Warning: Permanently added '[nid00885]:203' (RSA) to the list of known hosts.
    chebacca@nid00885:~>
    
  4. From the compute node command prompt, enter R to launch R:
    chebacca@nid00885:~> R
    
    R version 3.1.1 (2014-07-10) -- "Sock it to Me"
    Copyright (C) 2014 The R Foundation for Statistical Computing
    Platform: x86_64-unknown-linux-gnu (64-bit)
    
    R is free software and comes with ABSOLUTELY NO WARRANTY.
    You are welcome to redistribute it under certain conditions.
    Type 'license()' or 'licence()' for distribution details.
    
    R is a collaborative project with many contributors.
    Type 'contributors()' for more information and
    'citation()' on how to cite R or R packages in publications.
    
    Type 'demo()' for some demos, 'help()' for on-line help, or
    'help.start()' for an HTML browser interface to help.
    Type 'q()' to quit R.
    
    >
    

To use the features of the R graphical user interface (GUI), you must SSH to Big Red II with X forwarding enabled, and then use qsub with the -I (interactive) and -X (X forwarding) switches, as well as the -l gres=ccm (use CCM) switch; for example:

lpawaroo@login1:~> qsub -I -X -l walltime=01:00:00 -l nodes=1:ppn=32 -l gres=ccm -q cpu
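
The choice between the two qsub commands shown in this section can be sketched as a small helper. The function name interactive_r_qsub is hypothetical. The sketch assumes that the DISPLAY environment variable is set only when you connected with X forwarding enabled, which is the usual behavior of ssh -X but is not guaranteed.

```shell
#!/bin/bash

# Print the qsub command for an interactive CCM job on one 32-core node.
# Adds -X only when DISPLAY is set (assumed to indicate that the SSH
# session has X forwarding enabled).
# Argument: wall-clock time (optional; defaults to one hour).
interactive_r_qsub() {
    local walltime=${1:-01:00:00}
    local xflag=""
    if [ -n "${DISPLAY:-}" ]; then
        xflag="-X "
    fi
    echo "qsub -I ${xflag}-l walltime=${walltime} -l nodes=1:ppn=32 -l gres=ccm -q cpu"
}

# Usage: review the command first, then run it.
#   interactive_r_qsub 02:00:00
#   eval "$(interactive_r_qsub 02:00:00)"
```

Printing the command instead of running it directly lets you confirm the flags before the job is queued.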

Get help

If you need help or have questions regarding the use of R on IU's research supercomputers, contact the UITS Research Applications and Deep Learning team.

Support for IU research computing systems, software, and services is provided by the Research Technologies division of UITS. To ask a question or get help, contact UITS Research Technologies.

This is document bdrv in the Knowledge Base.
Last modified on 2019-08-01 13:02:50.
