Schedule data, not code

UBC Theses and Dissertations

Featured Collection

UBC Theses and Dissertations

Schedule data, not code Best, Micah J

Abstract

Parallel programming is hard and programmers still struggle to write code for shared memory multicore architectures that is both free of concurrency errors and efficient. Tools have advanced, but for tasks that are not embarrassingly parallel, or suitable for a limited model such as map/reduce, there is little help. We aim to address some major aspects of this still underserved area. We construct a model for parallelism, Data not Code (DnC), by starting with the observation that a majority of performance and problems in parallel programming are rooted in the manipulation of data, and that a better approach is to schedule data, not code. Data items don’t exist in a vacuum but are instead organized into collections, so we focus on concurrent access to these collections from both task and data parallel operations. These concepts are already embraced by many programming models and languages, such as map/reduce, GraphLab and SQL. We seek to bring the excellent principles embodied in these models, such as declarative data-centric syntax and the myriad of optimizations that it enables, to conventional programming languages, like C++, making them available in a larger variety of contexts. To make this possible, we define new language constructs and augment proven techniques from databases for accessing arbitrary parts of a collection in a familiar and expressive manner. These not only provide the programmer with constructs that are easy to use and reason about, but simultaneously allow us to better extract and analyze programmer intentions to automatically produce code with complex runtime optimizations. We present Cadmium, a proof of concept DnC language to demonstrate the effectiveness of our model. We implement a variety of programs and show that, without explicit parallel programming, they scale well on multicore architectures. We show performance competitive with, and often superior to, fine-grained locks, the most widely used method of preventing error-inducing data access in parallel operations.

Item Metadata

Title	Schedule data, not code
Creator	Best, Micah J
Publisher	University of British Columbia
Date Issued	2020
Description	Parallel programming is hard and programmers still struggle to write code for shared memory multicore architectures that is both free of concurrency errors and efficient. Tools have advanced, but for tasks that are not embarrassingly parallel, or suitable for a limited model such as map/reduce, there is little help. We aim to address some major aspects of this still underserved area. We construct a model for parallelism, Data not Code (DnC), by starting with the observation that a majority of performance and problems in parallel programming are rooted in the manipulation of data, and that a better approach is to schedule data, not code. Data items don’t exist in a vacuum but are instead organized into collections, so we focus on concurrent access to these collections from both task and data parallel operations. These concepts are already embraced by many programming models and languages, such as map/reduce, GraphLab and SQL. We seek to bring the excellent principles embodied in these models, such as declarative data-centric syntax and the myriad of optimizations that it enables, to conventional programming languages, like C++, making them available in a larger variety of contexts. To make this possible, we define new language constructs and augment proven techniques from databases for accessing arbitrary parts of a collection in a familiar and expressive manner. These not only provide the programmer with constructs that are easy to use and reason about, but simultaneously allow us to better extract and analyze programmer intentions to automatically produce code with complex runtime optimizations. We present Cadmium, a proof of concept DnC language to demonstrate the effectiveness of our model. We implement a variety of programs and show that, without explicit parallel programming, they scale well on multicore architectures. We show performance competitive with, and often superior to, fine-grained locks, the most widely used method of preventing error-inducing data access in parallel operations.
Genre	Thesis/Dissertation
Type	Text
Language	eng
Date Available	2020-10-15
Provider	Vancouver : University of British Columbia Library
Rights	Attribution-NonCommercial-NoDerivatives 4.0 International
DOI	10.14288/1.0394747
URI	http://hdl.handle.net/2429/76292
Degree	Doctor of Philosophy - PhD
Program	Computer Science
Affiliation	Science, Faculty of; Computer Science, Department of
Degree Grantor	University of British Columbia
Graduation Date	2020-11
Campus	UBCV
Scholarly Level	Graduate
Rights URI	http://creativecommons.org/licenses/by-nc-nd/4.0/
Aggregated Source Repository	DSpace

Open Collections

UBC Theses and Dissertations

UBC Theses and Dissertations

Schedule data, not code Best, Micah J

Abstract

Item Metadata

Item Media

Item Citations and Data

Rights