Although R has not traditionally been considered an HPC application, it has become the lingua franca for many areas of data analysis, drawing power from its high-level expressiveness and its multitude of domain-specific, community-developed packages. In recent years, the R and HPC communities have made many efforts to bridge this gap and scale R to the power of supercomputers. Interest in using R with supercomputing is also on the rise within the XSEDE community: the two most recent workshops on R organized by NICS and TACC drew hundreds of registrations and online participants. The goal of this tutorial is to guide participants in improving and scaling up existing scientific analysis workflows with R so that they make the best use of the resources available through XSEDE. The tutorial consists of two major components. The morning sessions focus on helping users develop efficient R code; the presentations cover how to profile R code and best practices for writing and compiling R code for efficiency. The afternoon session focuses on approaches to scaling R computations on resources supported by XSEDE; the presentations cover using parallel packages (such as parallel and pbdR) with R, utilizing hardware accelerators (such as Xeon Phi and GPGPU), and bridging R with other big data analysis systems (such as Hadoop and Spark).
3:15 - 4:30 Working with Hadoop Streaming, RHadoop, SparkR