ARCHIVED: Good practices in R programming

This content has been archived, and is no longer maintained by Indiana University. Information here may no longer be accurate, and links may no longer be available or reliable.

Overview

The following guidelines and code examples illustrate good practices in R programming. At Indiana University, R is available on the research supercomputers and via IUanyWare.

Avoid unnecessary operators

R is an interpreted language. Every operator in an R script, including a pair of parentheses, is evaluated as a function call, which requires a name lookup each time the expression is evaluated.

The following two code examples are functionally equivalent. However, in the timings shown, the first takes about twice as long, because each of its ten redundant pairs of parentheses is evaluated as a separate function call on every pass through the loop.

system.time({ 
    I = 0
    while (I<100000) {
        ((((((((((10))))))))))
        I = I + 1
    }
})

user    system    elapsed
0.125   0.000     0.125

system.time({ 
    I = 0
    while (I<100000) {
        10
        I = I + 1
    }
})

user    system    elapsed
0.055   0.000     0.055
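To see where the extra time goes, you can check that parentheses really are function calls. The sketch below parses the nested expression from the first example and counts how many calls to "(" R must evaluate each time; the helper count_paren_calls is hypothetical, written only for this illustration.

```r
# In R, `(` is a primitive function, not just syntax
is.primitive(`(`)              # TRUE
`(`(10)                        # calling it directly returns 10

# Rebuild the expression from the first example: ten nested pairs around 10
src  <- paste0(strrep("(", 10), "10", strrep(")", 10))
expr <- parse(text = src)[[1]]

# Hypothetical helper: count how many nested calls to `(` wrap the value
count_paren_calls <- function(e) {
  n <- 0
  while (is.call(e) && identical(e[[1]], as.name("("))) {
    n <- n + 1
    e <- e[[2]]                # step into the argument of this call
  }
  n
}

count_paren_calls(expr)        # 10: ten function calls per evaluation
eval(expr)                     # the value is still just 10
```

Each of those ten calls is resolved and executed by the interpreter on every loop iteration, which is why the version without parentheses is faster.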

Avoid growing objects inside loops

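Pre-allocate objects that are filled inside loops. Loops are executed by the R interpreter and are already relatively slow; growing an object inside a loop makes them much slower, because each time a vector is extended with c(), R allocates a new, longer vector and copies the existing contents into it. Whenever possible, pre-allocate vectors, lists, and data frames to their final size before the loop starts.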

Consider the following two code examples. The first grows a vector inside a for loop, while the second pre-allocates the vector and fills it in place without changing its size.

square_loop_noinit <- function (n) {
    x <- c()                  # start with an empty vector
    for (i in 1:n) {
        x <- c(x, i^2)        # c() copies x into a new, longer vector every time
    }
    x
}
system.time({
    square_loop_noinit(200)
})

user    system    elapsed
0.257   0.000     0.257

square_loop_init <- function (n) {
    x <- numeric(n)           # pre-allocate n doubles (i^2 returns a double)
    for (i in 1:n) {
        x[i] <- i^2           # fill in place; x never changes length
    }
    x
}
system.time({
    square_loop_init(200)
})

user    system    elapsed
0.099   0.000     0.099
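The advice at the start of this section also covers lists and data frames. Below is a minimal sketch of the same idea for those types; the function names are hypothetical. For a list, vector("list", n) allocates all n slots up front. For a data frame, appending rows with rbind() inside a loop builds a new data frame on every iteration, so one common alternative (shown here as one option, not the only one) is to fill pre-allocated column vectors and create the data frame once after the loop.

```r
# Pre-allocate a list with n empty (NULL) slots, then fill it in place
make_list_init <- function(n) {
  out <- vector("list", n)
  for (i in seq_len(n)) {
    out[[i]] <- rep(i, i)      # element i is a vector of length i
  }
  out
}

# Fill pre-allocated column vectors, then create the data frame once
make_df_init <- function(n) {
  id     <- integer(n)
  square <- numeric(n)
  for (i in seq_len(n)) {
    id[i]     <- i
    square[i] <- i^2
  }
  data.frame(id = id, square = square)
}

lst <- make_list_init(3)
df  <- make_df_init(5)
str(lst)
head(df)
```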

Use vectorization if possible

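In R, the basic data types are vectors: even a single number is a vector of length one, and most built-in functions operate on whole vectors at once. Where possible, write vectorized code or call existing built-in functions implemented in compiled code (which are already vectorized and optimized), so that the work is not done one element at a time by the interpreter.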
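As a short illustration, a single number in R is a vector of length one, and arithmetic operators act on whole vectors element by element, so no explicit loop is needed:

```r
x <- 10
is.vector(x)           # TRUE: a "scalar" is a numeric vector...
length(x)              # 1   ...of length one

v <- c(1, 2, 3, 4)
v * 2                  # 2 4 6 8: applied to every element
v + c(10, 20, 30, 40)  # 11 22 33 44: element-wise addition of two vectors
sum(v)                 # 10: computed by a built-in function, no R-level loop
```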

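Consider the following two code examples. In the timings shown, the second runs about 38 times faster: instead of calling rnorm() 1,000 times through lapply(), it makes a single call to rnorm(1000), which generates all 1,000 random values in compiled code. (The first returns a list of 1,000 length-one vectors and the second a single numeric vector, but both contain 1,000 standard normal draws.)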

# Calls rnorm(1) 1,000 times; returns a list of 1,000 length-one vectors
Ply <- function() lapply(rep(1, 1000), rnorm)
system.time({
    Ply()
})

user    system    elapsed
0.348   0.000     0.348

# One call to rnorm() generates all 1,000 values in compiled code
vec <- function() rnorm(1000)
system.time({
    vec()
})

user    system    elapsed
0.009   0.000     0.009
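The same idea applies to the squares example from the previous section: the loop can be replaced by a single vectorized expression. The sketch below (function names hypothetical) checks that a pre-allocated loop and the vectorized version give the same result; the timing lines are left for you to run, since results vary by machine.

```r
# Loop version with a pre-allocated result vector
square_loop_init <- function(n) {
  x <- numeric(n)
  for (i in seq_len(n)) {
    x[i] <- i^2
  }
  x
}

# Vectorized version: `^` is applied to the whole vector at once
square_vec <- function(n) seq_len(n)^2

identical(square_loop_init(200), square_vec(200))   # TRUE

# Compare timings on a larger input; numbers depend on your machine
n <- 1e6
system.time(square_loop_init(n))
system.time(square_vec(n))
```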

Get help

For help with R on the IU research supercomputers, contact the UITS Research Applications and Deep Learning team.