This is a crosspost from Jonathan Dursi, R&D computing at scale. See the original post here.
R is a great environment for interactive analysis on your desktop, but when your data needs outgrow your personal computer, it’s not clear what to do next.
I’ve put together material for a day-long tutorial on scalable data analysis in R. It covers:
The presentation for the material, in R Markdown (so including the source code), is in the presentation directory; you can read the resulting presentation as markdown there, or as a PDF.
The R code from the slides can be found in the R directory.
Some data can be found in the data directory, but as you might expect in a workshop on scalable data analysis, the files are quite large! Mostly you will just find scripts for downloading the data; running make in the main directory will pull almost everything down, but a little more work needs to go into automating the production of some of the data products used.
Suggestions, as always, are greatly welcome.