UBC Theses and Dissertations

Schedule data, not code

Best, Micah J

Abstract

Parallel programming is hard, and programmers still struggle to write code for shared-memory multicore architectures that is both free of concurrency errors and efficient. Tools have advanced, but for tasks that are neither embarrassingly parallel nor suited to a limited model such as map/reduce, there is little help. We aim to address some major aspects of this still underserved area. We construct a model for parallelism, Data not Code (DnC), starting from the observation that the majority of performance issues and other problems in parallel programming are rooted in the manipulation of data, and that a better approach is to schedule data, not code. Data items do not exist in a vacuum but are organized into collections, so we focus on concurrent access to these collections from both task-parallel and data-parallel operations. These concepts are already embraced by many programming models and languages, such as map/reduce, GraphLab and SQL. We seek to bring the excellent principles embodied in these models, such as declarative data-centric syntax and the myriad optimizations it enables, to conventional programming languages like C++, making them available in a larger variety of contexts. To make this possible, we define new language constructs and augment proven techniques from databases for accessing arbitrary parts of a collection in a familiar and expressive manner. These not only provide the programmer with constructs that are easy to use and reason about, but also allow us to better extract and analyze programmer intentions in order to automatically produce code with complex runtime optimizations. We present Cadmium, a proof-of-concept DnC language, to demonstrate the effectiveness of our model. We implement a variety of programs and show that, without explicit parallel programming, they scale well on multicore architectures. We show performance competitive with, and often superior to, fine-grained locks, the most widely used method of preventing error-inducing data access in parallel operations.

Rights

Attribution-NonCommercial-NoDerivatives 4.0 International