UBC Theses and Dissertations

UBC Theses Logo

UBC Theses and Dissertations

Compilation-assisted performance acceleration for data analytics Mustard, Craig

Abstract

Fundamental data analytics tasks are often simple -- many useful and actionable insights can be garnered by simply filtering, grouping, and summarizing data. However the sheer volume of data to be analyzed, demands of a multi-user operating environment, and limitations of general purpose processors make it challenging to perform these operations efficiently at scale. This thesis presents two techniques that address these challenges to improve the response time of data analytics tasks: (1) Newly emerging programmable network processors can perform data analytics tasks at terabits per second. However, existing data analytics systems, like Apache Spark, cannot readily use network processors because network processors are very limited and cannot execute the tasks generated by existing analytics systems. Using network processors for analytics requires re-designing how existing systems compile and execute data processing tasks. This thesis introduces Jumpgate, a system that enables existing data processing systems to execute relational queries using network processors. Jumpgate compiles client requests to a novel execution model that coordinates execution on heterogeneous network processors. Jumpgate can already improve performance by 1.12-3x on industry standard benchmarks, and paves the way for adopting network processors for data analytics tasks. (2) Analytics systems often process similar queries, either submitted by the same or different users. Cross program memoization (CPM) is a technique to re-use results of prior computations across programs and users. However, CPM is often not enabled because prior implementations have high overhead and are unable to re-use the output of user-defined functions (UDFs). This thesis presents KeyChain, a CPM implementation that identifies equivalent UDFs and has low overhead so CPM can be always be enabled.

Item Citations and Data

Rights

Attribution-NonCommercial-NoDerivatives 4.0 International