On Random vs. Streaming I/O Performance; Or seek(), and You Shall Find --- Eventually.

Posted by Jonathan Dursi on May 19, 2015

This is a crosspost from Jonathan Dursi's blog, R&D computing at scale. See the original post here.

At the Simpson Lab blog, I’ve written a post on streaming vs. random-access I/O performance, an important topic in bioinformatics. Using a very simple problem (randomly choosing lines from a non-indexed text file), I give a quick overview of the file system stack and what it means for streaming performance, and of reservoir sampling for uniform random online sampling.
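The linked post's own code is not reproduced here. As a rough illustration of the reservoir-sampling idea, the following is a minimal Python sketch of the classic one-pass scheme (often called Algorithm R). `reservoir_sample` and `sample_lines` are hypothetical helper names, not taken from the original post. Each line is read once, in order, and the i-th line (counting from 0) replaces a random reservoir slot with probability k/(i + 1). That leaves every line equally likely to end up in the sample, without knowing the file's length in advance and without any seeking.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(items: Iterable[T], k: int,
                     rng: Optional[random.Random] = None) -> List[T]:
    """Uniformly sample k items from an iterable of unknown length in one pass.

    Hypothetical helper illustrating Algorithm R; not the original post's code.
    If the iterable has fewer than k items, all of them are returned in order.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    if rng is None:
        rng = random.Random()
    reservoir: List[T] = []
    for i, item in enumerate(items):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # randint is inclusive, so j is uniform over 0..i (i + 1 choices);
            # the new item is kept with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir


def sample_lines(path: str, k: int, seed: Optional[int] = None) -> List[str]:
    """Stream a text file sequentially and return k uniformly chosen lines.

    Hypothetical helper: lines keep their trailing newline characters.
    """
    rng = random.Random(seed)
    with open(path, "r") as f:
        # Iterating over the file object reads it line by line, front to back.
        return reservoir_sample(f, k, rng)
```

For example, `sample_lines("reads.txt", 10, seed=42)` would return ten lines from a hypothetical `reads.txt`. Rather than counting lines first and then seeking to random offsets, this approach streams the file once. Sequential reads are generally what disks, the page cache and kernel readahead handle best.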