Name: Using Spark with Shiny and R Markdown
Start: 2016-06-28T10:30:00-0700
End: 2016-06-28T10:48:00-0700

Using Spark with Shiny and R Markdown

R is well suited to data that fits in memory, but additional tools are needed when the amount of data you want to analyze grows beyond the limits of your machine’s RAM. A variety of solutions to this problem have emerged in R over the years; one of the latest options is Apache Spark™. Spark is a cluster computing tool that enables analysis of massive, distributed data across dozens or hundreds of servers. Spark now includes an integration with R via the SparkR package. Because Spark can interact with distributed data with little latency, it is becoming an attractive tool for working with large datasets in an interactive environment. In addition to handling the storage of data, Spark incorporates a variety of other tools, including stream processing, graph computation, and a distributed machine learning framework. Some of these tools are available to R programmers via the SparkR package. In this talk, we’ll discuss how to leverage Spark’s capabilities in a modern R environment. In particular, we’ll discuss how to use Spark within an R Markdown document or even in an interactive Shiny application. We’ll also briefly discuss alternative approaches to working with large data in R and the pros and cons of using Spark.

Moderators

Duncan Temple Lang

University of California Davis

Speakers

Jeff David Allen

RStudio

Tuesday June 28, 2016 10:30am - 10:48am PDT
SIEPR 130

Contributed talk, Performance

user2016
