Although R has not traditionally been considered an HPC application, it has become the lingua franca of data analysis in many fields, drawing its power from its high-level expressiveness and its wealth of domain-specific, community-developed packages. In recent years, the R and HPC communities have made many efforts to bridge this gap and scale R up to supercomputers. Interest in using R on supercomputers is also rising within the XSEDE community: the two most recent R workshops organized by NICS and TACC have drawn hundreds of registrants and online participants.

The goal of this tutorial is to help participants improve and scale up existing scientific analysis workflows in R so that they make the best use of the resources available through XSEDE. The tutorial has two major components. The morning session focuses on helping users develop efficient R code, with presentations on how to profile R code and on best practices for writing and compiling efficient R code. The afternoon session focuses on approaches to scaling R computations on XSEDE-supported resources, with presentations on using parallel packages (such as parallel and pbdR), utilizing hardware accelerators (such as Xeon Phi and GPGPUs), and bridging R with other big data analysis systems (such as Hadoop and Spark).



Morning Session: Developing Efficient R Code
8:00 - 8:30 Introduction
8:30 - 9:30 Understanding your R code using profiling, debugging and benchmarking
9:30 - 10:00 Coffee break
10:00 - 10:45 Improving the performance of your R code
10:45 - 11:45 Interfacing with compiled code using Rcpp: examples and best practices
11:45 - 12:00 Morning wrap-up: Q&A and an overview of the afternoon session
Afternoon Session: Scaling R Code with XSEDE Resources
1:30 - 2:20 Parallelization with R 
2:20 - 2:30 Coffee break
2:30 - 3:15 Working with distributed memory using the Rmpi and pbdMPI packages
3:15 - 4:30 Working with Hadoop Streaming, RHadoop, and SparkR


Jason Allison
Advanced Scientific Computing
Senior Program Coordinator