Vortex: Extreme-Performance Memory Abstractions for Data-Intensive Streaming Applications

Session: Streaming computational models--In the flow!

Authors: Carson Hanel (Texas A&M University); Arif Arman (Texas A&M University); Di Xiao (Texas A&M University); John Keech (Texas A&M University); Dmitri Loguinov (Texas A&M University)

Many applications in data analytics, information retrieval, and cluster computing process huge amounts of information. The complexity of involved algorithms and massive scale of data require a programming model that can not only offer a simple abstraction for inputs larger than RAM, but also squeeze maximum performance out of the available hardware. While these are usually conflicting goals, we show that this does not have to be the case for sequentially-processed data, i.e., in streaming applications. We develop a set of algorithms called Vortex that force the application to generate access violations (i.e., page faults) during processing of the stream, which are transparently handled in such a way that creates an illusion of an infinite buffer that fits into a regular C/C++ pointer. This design makes Vortex by far the simplest-to-use and fastest platform for various types of streaming I/O, inter-thread data transfer, and key shuffling. We introduce several such applications - file I/O wrapper, bounded producer-consumer pipeline, vanishing array, key-partitioning engine, and novel in-place radix sort that is 3-4 times faster than the best prior approaches.