This is a crosspost from Jonathan Dursi's blog, R&D computing at scale. See the original post here.
I was asked recently to do a short presentation for the Greater Toronto R Users Group on parallel computing in R; my slides can be seen below or on github, where the complete materials can be found.
I covered much of the same material I had covered in a half-day workshop a couple of years earlier (though, obviously, without the hands-on component), with some bonus material tacked on at the end touching on a couple of advanced topics.
I was quite surprised at how little had changed since late 2014, other than further development of SparkR (which I didn’t cover) and the interesting but seemingly little-used future package. I was also struck by how hard it is to find similar materials online covering a range of parallel computing topics in R - it’s rare enough that even this simple effort made it into the HPC task view on CRAN (under “related links”). R continues to grow in popularity for data analysis; is this all desktop computing? Is Spark siphoning off the clustered-dataframe usage?
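For readers who haven't run across it, here is a minimal sketch of the sort of thing the future package enables; this is illustrative only (the `multisession` backend and toy computations here are just assumptions for the example), not material from the slides:

```r
library(future)
plan(multisession)   # run futures in separate background R sessions

# Launch an asynchronous computation; future() returns immediately.
f <- future({
  Sys.sleep(2)          # stand-in for an expensive computation
  sum(rnorm(1e7))
})

# Other work can proceed here while the future evaluates...

value(f)   # block until the result is available, then return it

# The same idea via the %<-% assignment operator:
x %<-% { mean(rnorm(1e7)) }
x
```

The appeal is that the same code can be re-run on a different backend just by changing the `plan()` call, which is part of why I found the package's apparently modest uptake surprising.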
(This was also my first time using RPres in RStudio; wow, not a fan. I’m a big fan of RMarkdown, but RPres was not ready for general release.)