Using R with SGE-compatible Grid engines and Rmpi: micro-HOWTO

Boris Veytsman

Revision: 1.4 Date: 2016/08/26 16:51:06


SGE-compatible engines are very popular in the cluster world. This document describes how to run R in this environment.

The whole reason to use clusters is to run parallel computations on several cores of several computers (nodes). However, R itself is a single-threaded program. Thus, to use parallel computations in R, you need to run several copies of it (slaves) and make them talk to the original copy (master). There are several ways to do this; in this micro-HOWTO we discuss just one, MPI, based on the Message Passing Interface protocol. We discuss how to set up the computations in the simplest and most transparent way.

R packages

  1. You will need the package Rmpi, which provides low-level functions for starting slaves.
  2. You will need the package snow, which creates clusters: groups of slaves to which you can send processing jobs. However, while this package must be installed, we will not use it directly.
  3. Rather, we will use the package parallel, which provides a more streamlined system for parallel processing; it calls snow functions itself as needed.
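All three packages are available from CRAN (parallel has shipped with R itself since version 2.14.0). On a cluster, installing Rmpi sometimes requires pointing the installer at your MPI headers and libraries. The sketch below shows the relevant configure arguments; the paths and the MPI type are assumptions to be replaced by your site's values:

```r
## Sketch only: the paths and MPI type below are assumptions;
## adjust them to your actual MPI installation.
install.packages("Rmpi",
                 configure.args = c("--with-Rmpi-include=/usr/include/openmpi",
                                    "--with-Rmpi-libpath=/usr/lib64/openmpi/lib",
                                    "--with-Rmpi-type=OPENMPI"))
install.packages("snow")
```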

R scripts

We start with loading the two packages we need (remember, snow is used indirectly):

library(Rmpi)
library(parallel)

Now we need to start up a cluster. How many slaves should we start? Suppose we use just one cluster for all parallel computations. We do not want fewer workers than the number of available cores: the spare cores would sit unused. On the other hand, it does not make much sense to use more workers than the number of cores.

Some authors recommend making the number of slaves one less than the number of cores, so one core is left for the master. On the other hand, if the master is not doing heavy computations, you may use its core too. The number of available cores is given by the Rmpi primitive mpi.universe.size(), so depending on your taste you may want either

n.cores <- mpi.universe.size()
n.cores <- max(1, mpi.universe.size()-1)
The expression in the last line takes care of the fact that if you run your script outside of an MPI environment (for example, for debugging), the function mpi.universe.size() returns 1, but you still need to start at least one slave process.
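The effect of this expression can be checked without any MPI environment at all; in the sketch below a plain number stands in for the value returned by mpi.universe.size():

```r
## Stand-in for mpi.universe.size() when running outside MPI
universe.size <- 1
stopifnot(max(1, universe.size - 1) == 1)   # we still start one slave

## Stand-in for a 32-core MPI allocation
universe.size <- 32
stopifnot(max(1, universe.size - 1) == 31)  # one core is left for the master
```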

Then you use the parallel package to start the cluster:

cl <- makeCluster(n.cores, type='MPI')
You pass this cluster object to the parallel computing functions. See the documentation of the parallel package for the available functions.
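For instance, a typical call is parSapply, which distributes the elements of a vector over the slaves, or clusterEvalQ, which evaluates an expression on each slave. The sketch below is runnable anywhere because it uses a local (PSOCK) cluster as a stand-in; with Rmpi you would create cl with makeCluster(n.cores, type='MPI') as above, and the remaining calls stay the same:

```r
library(parallel)

## A local two-worker cluster stands in for the MPI one here
cl <- makeCluster(2)

## Each worker computes the square of one input value
squares <- parSapply(cl, 1:8, function(i) i^2)
print(squares)

## Evaluate an expression on every worker, e.g. load a package there
clusterEvalQ(cl, library(stats))

stopCluster(cl)
```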

After you finish your computations, you need to close the environment:

stopCluster(cl)
mpi.quit()

Let us now discuss a complete (but very simple) example. We use the function parLapply to run on all cores a very simple process:

function (i) {system2('hostname', stdout=TRUE)}

This process runs the system command hostname, which returns the name of the node. We convert the resulting list into a vector and print the summary, which shows how many cores were provided by each node. The script itself is the following:
# detectCores.R script
library(Rmpi)
library(parallel)
n.cores <- mpi.universe.size()
cl <- makeCluster(n.cores, type='MPI')
result <- parLapply(cl, 1:n.cores,
           function (i) {system2('hostname', stdout=TRUE)})
print(summary(as.factor(unlist(result))))
stopCluster(cl)
mpi.quit()

SGE scripts

SGE scripts follow the usual syntax; see your engine's documentation. You need to tell the engine to use MPI; this could be either -pe mpi N or -pe openmpi N depending on your installation, where N is the number of cores.
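Whether your installation calls the parallel environment mpi, openmpi, or something else can be checked with the qconf utility (the name mpi in the second command is an assumption; use a name from the first command's output):

```shell
# List all parallel environments configured on this engine
qconf -spl
# Show the configuration of one of them, e.g. its slot allocation rule
qconf -sp mpi
```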

Here is the script for the program above. It gives the program 32 cores in the queue main.q, writes the output to the file detectCores.out, putting both stdout and stderr there (because of -j y), and uses the current directory as the work directory:

#$ -q main.q
#$ -pe mpi 32
#$ -o detectCores.out
#$ -j y
#$ -cwd
. /etc/profile
. /home/boris/.bashrc
mpirun -n 1 Rscript detectCores.R

The most interesting line is mpirun -n 1. Why is the number of processes equal to 1 here?

The answer is that in many distributed applications mpirun starts N processes, and these processes exchange messages. In our model, Rmpi on the master starts the slaves itself, so we do not want mpirun to start any additional copies of R.


Results

Here is the output of the program discussed in the two previous sections:

         32 slaves are spawned successfully. 0 failed.
bnetnode1 bnetnode2 bnetnode3 bnetnode4 
        8         8         8         8
We see that our script was run on 32 cores, evenly distributed among the four nodes of our cluster.
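The per-node counts in this output are produced by converting the list of hostnames into a vector and summarizing it as a factor; here is a standalone illustration with made-up hostnames:

```r
## Made-up hostnames standing in for the real parLapply result
result <- as.list(rep(c("nodeA", "nodeB"), each = 2))
print(summary(as.factor(unlist(result))))
## nodeA nodeB
##     2     2
```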


Acknowledgements

I am grateful to Andy Smith, who helped me set up the cluster where these experiments were done.

Boris Veytsman