A Streaming Algorithms Approach to Approximating Hit Rate Curves

by

Zachary Drudi

B. Math, University of Waterloo, 2011

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF Master of Science in THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES (Computer Science)

The University of British Columbia (Vancouver)

October 2014

© Zachary Drudi, 2014

Abstract

In this work, we study systems with two levels of memory: a fixed-size cache and a backing store, each of which contains blocks. In order to serve an IO request, the requested block must be in the cache. If the block is already in the cache when it is requested, the request is a cache hit. Otherwise it is a cache miss, and the block must be brought into the cache. If the cache is full, a block must be evicted from the cache to make room for the new block. A replacement policy determines which block to evict. In this work, we consider only the LRU policy. An LRU cache evicts the block which was least recently requested.

A trace is a sequence of blocks, representing a stream of IO requests. For a given trace, a hit rate curve maps cache sizes to the fraction of hits that such a cache would achieve on the trace. Hit rate curves have been used to design storage systems, partition memory among competing processes, detect phases in a trace, and dynamically adjust heap size in garbage-collected applications.

The first algorithm to compute the hit rate curve of a trace over a single pass was given by Mattson et al. in 1970. A long line of work has improved on this initial algorithm. The main contribution of our work is the presentation and formal analysis of two algorithms to approximate hit rate curves. Inspired by recent results in the streaming algorithms community on the distinct elements problem, we use memory-efficient probabilistic counters to estimate the number of distinct blocks in a subsequence of the trace, which allows us to approximate the hit rate curve using sublinear space. We also formally state some variants of the hit rate curve approximation problem which our algorithms solve, and derive lower bounds on the space complexity of these problems using tools from communication complexity.

Preface

This thesis is the formal analysis complement of the more systems-oriented work published as J. Wires, S. Ingram, N. J. A. Harvey, A. Warfield, and Z. Drudi, "Characterizing storage workloads with counter stacks," in the 11th USENIX Symposium on Operating Systems Design and Implementation. The ideas behind the algorithms described in Chapter 3 are common to both works, and were developed together by the above authors.

The analysis in Chapters 3, 4, and 5 is original, unpublished work. The implementation used for the experiments in Chapter 6 is the same as that analyzed in the published paper mentioned above.

Table of Contents

Abstract
Preface
Table of Contents
List of Tables
List of Figures
Acknowledgments
1 Introduction
2 Related Work
    2.1 Exact Algorithms
    2.2 Approximation Algorithms
3 Presentation and Analysis of Algorithms
    3.1 Preliminaries
        3.1.1 Algorithmic guarantees
    3.2 The Working Set Data Structure
    3.3 Deterministic Algorithms for Computing Hit Rate Curves
    3.4 A Streaming Algorithm for Approximating Hit Rate Curves
        3.4.1 Implementing Algorithm 3 using probabilistic counters
        3.4.2 Improved space by pruning redundant counters
4 A Unified Representation
    4.1 Distinct Elements and Probabilistic Counters
    4.2 Counter Packing
    4.3 Analysis
5 Lower Bounds
    5.1 Communication Complexity
    5.2 Gap Hamming Distance
    5.3 HRC Lower Bounds
6 Experimental Results
    6.1 Microsoft Research Traces
    6.2 Average Footprints and Phase Changes
7 Conclusion
Bibliography

List of Tables

Table 6.1  Average and maximum error for avgfp and cs.

List of Figures

Figure 6.1  Hit rate curves for MSR traces. Cache size in GBs against hit rate percentage.
Figure 6.2  Hit rate curves for cyclic trace. Cache size in GBs against hit rate percentage.

Acknowledgments

I would like to thank my supervisor, Nick Harvey, for his encouragement, optimism, and guidance. He is an inexhaustible source of ideas, and his enthusiasm helped me through many an unproductive lull.

I am indebted to Andrew Warfield for encouraging my interest in systems, and for taking me on as an intern at Coho Data, where I worked from February to August of 2014.

I would like to thank the remaining members of the team at Coho Data, Stephen Frowe Ingram and Jake Wires. Stephen was the friend at my side who kept me sane whenever I fell into a seemingly bottomless hole of circular dependencies, misconfigurations, and versioning problems. Jake Wires tolerated my absolute ignorance of the tools and techniques used in the systems world, and taught me many practical tricks.

Finally, I would like to thank my family for their long-distance but robust support.

Chapter 1
Introduction

Most computer systems use multiple levels of storage. CPU caches permit very efficient read and write operations, speeding the execution of processes, but are limited in size by their high cost. Main memory is slower, cheaper, and larger, but again too expensive to provide all the storage requirements of a system. Magnetic disks or flash drives provide very high capacity storage, but at significantly lower speeds.

Given this memory hierarchy, much work has been done to make the most efficient use of available resources.
To simplify the discussion, we assume there are only two levels of memory: a fast, small cache, and a larger backing store. Both the cache and the store contain data organized into fixed-size blocks. During execution, a process issues block requests. If a requested block is in the cache, we have a cache hit, and the process can continue execution. Otherwise, a cache miss occurs, and the requested block must be brought into the cache before the process can resume. If the cache is full, a block must be evicted from the cache into the store. A replacement policy determines how this block is chosen. To simplify matters, in the following we will consider just the trace, which is the sequence of block requests made by a process, and ignore the process itself.

Belady [4] did early, foundational work on comparing different replacement policies by simulation on request traces. The most widely used replacement policy is LRU, which stands for Least Recently Used. When a cache miss occurs, the LRU policy evicts the block whose last request occurred before the last request of any other block in the cache. Although implementing LRU involves dynamically reordering blocks on every request and is somewhat expensive, there are variants of LRU that are both simple and efficient to implement in practice, and these are widely used.

Given a replacement policy such as LRU and a trace, the hit rate curve is a function mapping cache sizes to the fraction of hits for that cache size.

Mattson et al. [23] introduced the notion of a stack algorithm, which characterizes a class of replacement policies (including LRU), and gave a simple algorithm that can compute the hit rate curve for a fixed stack algorithm in a single pass over the trace. The algorithm computes the stack distance of each request, and tallies them to produce the hit rate curve after the trace has been processed. The stack distance of a request to a block b is the number of distinct blocks that occurred in the trace between the current request to b and the last request to b. The stack distance of a request corresponds to the minimum size of an LRU cache that would produce a hit on the request. If there is no earlier request to b, the stack distance is defined to be infinite, as a request to a new block is necessarily a miss regardless of cache size. In the Mattson et al. algorithm, a linked list maintains the blocks that have been previously requested. To process a new request for a block b, the linked list is linearly searched for b. If b is found, its position in the linked list is the stack distance of the request. The block b is then removed from the linked list and reinserted at the head of the list. If b is not found in the list, then this is the first request for b, and the request is assigned a stack distance of infinity. After the trace is processed, the value of the hit rate curve for a given cache size C is simply the number of stack distances less than or equal to C, normalized by the total number of requests.
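A minimal Python sketch of this procedure may help make it concrete (a list stands in for the linked list; this is an illustration of the method described above, not the implementation evaluated in Chapter 6):

    import math
    from collections import Counter

    def mattson_hit_rate_curve(trace):
        """One-pass stack-distance computation, as described above.

        Returns a dict mapping each cache size to the fraction of requests
        with stack distance at most that size."""
        stack = []                 # most recently requested block at the head
        histogram = Counter()      # stack distance -> number of requests
        for block in trace:
            if block in stack:
                dist = stack.index(block) + 1   # position in list = stack distance
                stack.remove(block)
            else:
                dist = math.inf                 # first request: a miss at any size
            histogram[dist] += 1
            stack.insert(0, block)              # reinsert at the head

        m = len(trace)
        return {c: sum(v for d, v in histogram.items() if d <= c) / m
                for c in range(1, len(stack) + 1)}

    # Example: a repeated scan of three blocks hits only once the cache
    # holds all three: {1: 0.0, 2: 0.0, 3: 0.5}.
    print(mattson_hit_rate_curve(["a", "b", "c", "a", "b", "c"]))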
While simple, this algorithm is quite inefficient. If the trace has length m and the blocks come from a store of size n, each request takes O(n) time to process. A series of more efficient algorithms has been introduced by authors interested in CPU caches ([5], [26], [28], [1], [13], [27], [34], [15], [25]). The traces studied are produced by a processor executing a given program, and each request corresponds to a memory location. The cache is the CPU-level cache, while the backing store is main memory. For these traces, m is typically on the order of billions, even for short program executions of a minute or less, while n is a fraction of main memory. In this setting, online algorithms which process each request as it is made by the process are attractive because the trace doesn't need to be recorded. These algorithms must be very efficient or they will impose unacceptable overhead on program execution. Accordingly, past work has frequently focused on time complexity, while space complexity has received comparatively little attention.

In this work, we approach the problem motivated by storage-level traces. These traces correspond to requests made to the file system. In this setting, the backing store is the permanent storage medium of the system, which may be a magnetic disk or flash drive. The number of distinct blocks in the backing store, n, may exceed main memory, so we are very sensitive to space usage. Our contribution is the presentation and analysis of approximation algorithms that compute hit rate curves using sublinear space. The main ingredient in our approach is a line of work from the streaming algorithms community on algorithms to count the number of distinct elements in a stream using sublinear space ([16], [2], [3], [14], [17], [21]). Using these counting algorithms as a black box, we can estimate the stack distances of requests in the trace and produce an approximate hit rate curve. Unlike prior work on approximate hit rate curves, we derive precise error bounds for our algorithms. Furthermore, using results from communication complexity, we give lower bounds on the space complexity of algorithms solving several variants of the hit rate curve problem.

Chapter 2
Related Work

Researchers have done a substantial amount of work exploring improved versions of the Mattson algorithm first proposed in [23], as well as entirely new techniques for computing hit rate curves. Here we discuss some of these contributions. We first discuss exact algorithms, and then we describe some recent approximation algorithms.

2.1 Exact Algorithms

Bennett and Kruskal [5] improved on the original Mattson algorithm by replacing the linear list with a k-ary tree. The leaves of the tree represent requests, and store 1 or 0 depending on whether the request corresponds to the last request of the block. Interior nodes store the sum of their children. The last time each block was requested, with respect to a given point in the trace, is stored in a hash table, which is updated after processing each request. The stack distance of a request is computed by looking up the block to find the time it was last requested, and finding the sum of the corresponding subtree. The time complexity to process a request is thus O(log(m)). This general framework, using a hash table to store the last access time of each block and an auxiliary tree-based data structure to compute the stack distance, was used by many later authors ([26], [28], [1], [13], [25]).

Almási, Caşcaval, and Padua [1] proposed some variants of the Bennett and Kruskal algorithm, counting the number of "holes" (leaves containing 0) instead of the number of 1s in the leaves, and using an interval tree data structure (implemented using either an AVL or red-black tree) to store the locations of the holes. Their modifications were more efficient in practice, although they had the same asymptotic complexity. Sugumar and Abraham [28] used splay trees instead.

Niu et al. presented a parallel algorithm [25].
Given p processors, the trace is split into p chunks, with the kth chunk going to the kth processor. The computation is performed in a series of rounds. In each round, every processor uses a variant of the sequential Mattson algorithm to process its chunk of the trace, storing the stack distances in a local copy of the stack distance histogram. Requests to blocks which occur for the first time in a given chunk may have occurred in a previous chunk of the trace. To determine their true stack distances, these requests are added to a queue and passed to the previous processor at the end of the round. After at most O(p) rounds, all stack distances have been found, and the local histograms are aggregated to produce the complete histogram.

2.2 Approximation Algorithms

By dynamically resizing the tree, Ding and Zhong [13] approximate the hit rate curve by approximating the stack distance of each request with multiplicative error ε. For each block, their algorithms record the time range that the last access belongs to instead of the exact time. The tree structure itself only consumes O(log(n)/ε) space, but they still require a hash table mapping each block to the last time in the trace it was requested, requiring O(n) space. The trace can be preprocessed to include this information with each request, at the cost of O(m log n) time.

Zhong and Chang augment the algorithm of Ding and Zhong with sampling [34]. The trace is divided into alternating sampling and hibernating intervals. The hash table of last access times is always updated, but the stack distance of a given request is only recorded if the latest prior access occurred in a sampling interval. They also tweak the tree bookkeeping of Ding and Zhong's algorithm by merging all requests from a given hibernating interval into a single leaf node, recording only the number of distinct blocks requested in the interval.

Shen et al. use a probabilistic model to estimate the stack distance histogram [27]. Using a histogram of time distances (the number of requests between two successive requests to the same block) and the number of distinct blocks accessed in the trace, they estimate the stack distance histogram. Their experimental results show good accuracy, but they do not provide analytical error bounds.

Eklov and Hagersten combine trace sampling with approximation of stack distances from reuse distances [15]. Building on their work in [6], their method finds the average stack distance among all reuse windows of a fixed size by using the distribution of forward reuse distances.

Xiang et al. [33], building on earlier work [32], use an approach based on averaging. They define the footprint of a window of the trace to be the number of distinct blocks accessed during the window. For a given window size w, the average footprint is the average number of distinct blocks accessed in windows of size w. They calculate the average footprint for a logarithmic scale of window sizes by recording the reuse distances of requests. Inverting the average footprint function yields an approximation to the unnormalized hit rate curve. While their approach is fast and accurate, their memory requirements are linear in n, and it is unclear whether their averaging technique maintains the precise error bounds of an earlier, slower approach that measures all footprints [32]. A brute-force illustration of the footprint definition is sketched below.
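The following deliberately naive sketch computes average footprints by brute force; the algorithm of Xiang et al. obtains all window sizes in linear time via reuse distances, so this O(mw) version is for illustration only:

    def footprint(window):
        """Footprint of a trace window: the number of distinct blocks it touches."""
        return len(set(window))

    def average_footprint(trace, w):
        """Average footprint over all length-w windows of the trace."""
        windows = [trace[i:i + w] for i in range(len(trace) - w + 1)]
        return sum(footprint(win) for win in windows) / len(windows)

    # The map w -> average_footprint(trace, w) is increasing in w; inverting it
    # yields an approximation to the unnormalized hit rate curve.
    print(average_footprint([1, 2, 1, 3, 1, 2], 3))   # 2.5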
We begin by establishing some notation, and introducingthe working set data structure abstraction. In section 3.3, we present deterministicalgorithms for computing hit rate curves. In section 3.4 we modify these usingwork from the streaming algorithms community. In Theorem 2, our main result ofthis chapter, we show that the resulting algorithm computes the hit rate curve withadditive error ε , and in Corollary 6 we show it uses only O(poly(1/ε, log(nm)))space.3.1 PreliminariesTo state our results precisely, let us fix some notation. We will use n to representthe size of the store from which blocks are drawn, and m to represent the tracelength. We will identify blocks with integers from the set [n] = {i : 1≤ i≤ n}.The set of requested blocks between time t ′ and strictly before time t is:B(t ′, t) ={bi : i ∈ [m] and t′ ≤ i < t}.7At time t, the most recent request for block bt occurred at timeR(t) = max{ x : x < t and bx = bt } .We define R(t) = −∞ if bt was not requested before time t. The stack distance ofthe request at time t isD(t) =|B(R(t), t)| (if R(t)>−∞)∞ (otherwise)A cache of size k has a hit at time t if and only if D(t) ≤ k. The hit rate curve isthe function C : [n]→ [0,1] where C(k) is the hit rate for a cache of size k. ThusC(k) = |{ t ∈ [m] : D(t)≤ k }|/m.In this work we are concerned with computing the hit rate curve at ` uniformly-spaced points, where ` is a parameter. For simplicity, assume that n = `∆, n = `∆where ∆is an integer. The histogram of D is the function H : [`]→ N whereH(i) = |{ t ∈ [m] : (i−1)∆< D(t)≤ i∆ }|. (3.1)The fraction of requests that are hits with a cache of size x∆ is ∑xi=1 H(i)/m. Thehit rate curve at the desired ` uniformly-spaced points isC(x∆) =x∑i=1H(i)/m ∀x ∈ [`].3.1.1 Algorithmic guaranteesAll algorithms in Sections 3.4 and 4.2 produce a function Cˆ that satisfiesC((x−1)∆)− ε ≤ Cˆ(x∆) ≤ C(x∆)+ ε ∀x ∈ [`+1] (Weak-Guarantee)8with high probability. In fact, the algorithms of Sections 3.4.1 and 4.2 actuallysatisfy the guaranteeC((x−1)∆)− εx/` ≤ Cˆ(x∆) ≤ C(x∆)+ εx/` ∀x ∈ [`+1].(Strong-Guarantee)The algorithms of Sections 3.4.2 and 4.2 use space poly(`,1/ε, log(nm)).3.2 The Working Set Data StructureTo streamline our presentation, we introduce an abstract data type called a workingset data structure. In the following section, we will describe a single algorithm tocompute hit rate curves, given an implementation of a working set data structure.Then, we will present a series of implementations of working set data structures.The notion of a working set comes from Denning [11]. The working set of thetrace between times t ′ and t is simply B(t ′, t). As the stack distance D(t) is definedin terms of |B(R(t), t)|, if we knew |B(t ′, t)| for all t ′, t we could compute the stackdistance for every request in the trace. A working set data structure gives estimatesof |B(t ′, t)|, enabling a client to estimate stack distances and thus the hit rate curve.A working set data structure supports two operations, REGISTER(t,b), whichrecords that block b was requested at time t, and GETWORKINGSETSIZE(t), whichestimates the number of distinct blocks requested since time t.The simplest way to implement a working set data structure is to use a counter,which is an abstract data type that computes the number of distinct elements in adata stream. This data type supports two operations, INSERT and QUERY, whichreturns the number of distinct elements that were inserted. 
Pseudocode illustratingthis is shown in Algorithm 1.3.3 Deterministic Algorithms for Computing Hit RateCurvesWe are interested in computing the value of C at ` uniformly-spaced points, so webegin with Algorithm 2 which computes those values. It can be implemented inO(m logn) time and O(n) space using the hash table plus tree approach of Bennettand Kruskal [5].9Algorithm 1: An implementation of a working set data structure based onabstract counters.1 c← 12 Function Register(t,bt):3 c← dt/∆e4 if t ≡ 1 (mod ∆) then5 Create the new counter K [c]6 for i = 1, . . . ,c do7 K [i].INSERT(bt)8 Function GetWorkingSetSize(t):9 Return K [dt/∆e].QUERY()Algorithm 2: Algorithm for computing the hit rate curve at ` = n/∆uniformly-spaced points.1 Input: A sequence of requests (b1, . . . ,bm) ∈ [n]m2 Initialize the vector H ∈ N` with zeros3 for t = 1, . . . ,m do4 If D(t) is finite then increment H[dD(t)/∆e] by 15 B H[i] satisfies condition (3.1).6 Output the hit rate curve values C(x∆) = ∑xi=1 H[i]/m for x ∈ [`].Next we present Algorithm 3, which is an algorithm to approximate a hit ratecurve using a working set data structure. To facilitate compact implementations ofthat data structure, the algorithm only queries the working set size at specific timesτi = (i− 1)∆+ 1 for i = 1,2, . . .. This algorithm also allows the data structureto decline to return an estimate, in which case it returns NULL instead. Considerimplementing the working set data structure using exact counters, which computethe number of distinct elements exactly, e.g., using a hash table. Then line 9 inAlgorithm 3 will haveXi(t) = |B(τi, t)| ∀i, t. (3.2)Let us now compare the accuracy of Algorithms 2 and 3 when exact counters areused.10Algorithm 3: An algorithm for approximating the hit rate curve at `uniformly-spaced points, given an implementation W of a working set datastructure.1 Input: A sequence of requests (b1, . . . ,bm) ∈ [n]m2 Initialize the vector H ∈ N` with zeros3 B For convenience, let τi denote (i−1)∆+14 for t = 1, . . . ,m do5 B Receive request bt6 W .REGISTER(t,bt)7 Let c← dt/∆e8 for i = 1, . . . ,c do9 Let Xi(t +1)←W .GETWORKINGSETSIZE(τi)10 for i = 1, . . . ,c−1 do11 if Xi(t +1) 6= NULL and Xi+1(t +1) 6= NULL then12 Increment H[dXi(t)/∆e] by(Xi+1(t+1)−Xi+1(t))−(Xi(t+1)−Xi(t))13 Increment H[dXc(t)/∆e] by 1−(Xc(t+1)−Xc(t))14 Output the hit rate curve approximation given by C(x∆) = ∑xi=1 H[i]/m forx ∈ [`].Claim 1. Let C be the hit rate curve computed by Algorithm 2. Let Cˆ be the hitrate curve computed by Algorithm 3, using Algorithm 1 with exact counters toimplement W . ThenC((x−1)∆)≤ Cˆ(x∆) ≤ C(x∆) ∀x ∈ [`].Proof. Let H and Hˆ respectively denote the histograms computed by Algorithms2 and 3. Note that bt 6∈ B(t ′, t) for t ′ > R(t) but bt ∈ B(t ′, t) for t ′ ≤ R(t). Becauseof (3.2), we haveXi(t+1)−Xi(t) =1 (if R(t)< τi ≤ t)0 (if 1≤ τi ≤ R(t))It follows that the increment in line 12 equals 1 if τi ≤ R(t) < τi+1 and otherwiseit equals zero. Similarly, the increment in line 13 equals 1 if τc ≤ R(t). At most11one of these conditions can hold, so for each value of t, Algorithm 3 incrementsat most one entry of Hˆ. Specifically, if R(t) is finite then the algorithm incrementsHˆ[dXi∗(t)/∆e] where i∗ = dR(t)/∆e.When R(t) is finite, we have τi∗ ≤ R(t) < τi∗+1. Since Xi∗(t) = |B(τi∗ , t)| andD(t) = |B(R(t), t)|, we deriveXi∗+1(t)≤ D(t)≤ Xi∗(t). (3.3)We also haveXi∗(t)−Xi∗+1(t) = |B(τi∗ , t)|− |B(τi∗+1, t)|= |B(τi∗ , t)\B(τi∗+1, t)| ≤ |B(τi∗ ,τi∗+1)| ≤ ∆. 
Claim 1. Let C be the hit rate curve computed by Algorithm 2. Let Ĉ be the hit rate curve computed by Algorithm 3, using Algorithm 1 with exact counters to implement W. Then

    C((x−1)∆) ≤ Ĉ(x∆) ≤ C(x∆)  for all x ∈ [ℓ].

Proof. Let H and Ĥ respectively denote the histograms computed by Algorithms 2 and 3. Note that b_t ∉ B(t', t) for t' > R(t) but b_t ∈ B(t', t) for t' ≤ R(t). Because of (3.2), we have

    X_i(t+1) − X_i(t) = 1 if R(t) < τ_i ≤ t, and 0 if 1 ≤ τ_i ≤ R(t).

It follows that the increment in line 12 equals 1 if τ_i ≤ R(t) < τ_{i+1}, and otherwise it equals zero. Similarly, the increment in line 13 equals 1 if τ_c ≤ R(t). At most one of these conditions can hold, so for each value of t, Algorithm 3 increments at most one entry of Ĥ. Specifically, if R(t) is finite then the algorithm increments Ĥ[⌈X_{i*}(t)/∆⌉] where i* = ⌈R(t)/∆⌉.

When R(t) is finite, we have τ_{i*} ≤ R(t) < τ_{i*+1}. Since X_{i*}(t) = |B(τ_{i*}, t)| and D(t) = |B(R(t), t)|, we derive

    X_{i*+1}(t) ≤ D(t) ≤ X_{i*}(t).    (3.3)

We also have

    X_{i*}(t) − X_{i*+1}(t) = |B(τ_{i*}, t)| − |B(τ_{i*+1}, t)| = |B(τ_{i*}, t) \ B(τ_{i*+1}, t)| ≤ |B(τ_{i*}, τ_{i*+1})| ≤ ∆.    (3.4)

So by (3.3) and (3.4) we have

    ⌈X_{i*}(t)/∆⌉ − 1 ≤ ⌈D(t)/∆⌉ ≤ ⌈X_{i*}(t)/∆⌉.

Algorithm 2 increments H[⌈D(t)/∆⌉], whereas Algorithm 3 increments Ĥ[⌈X_{i*}(t)/∆⌉]. So

    Σ_{i=1}^{x} Ĥ[i]  (= Ĉ(x∆))  ≤  Σ_{i=1}^{x} H[i]  (= C(x∆))  ≤  Σ_{i=1}^{x+1} Ĥ[i]  (= Ĉ((x+1)∆)).

Rearranging this yields the desired inequality. □

3.4 A Streaming Algorithm for Approximating Hit Rate Curves

In this section we design an improved working set data structure using ideas from streaming algorithms. The working set data structure used in Claim 1 is not efficient because it uses exact counters. Our main idea is to use, as a black box, a streaming algorithm for estimating distinct elements, which we will call a probabilistic counter.

Each probabilistic counter has two parameters, n and α. The INSERT operation takes a value x ∈ [n], and the QUERY operation reports a value v satisfying

    |S| ≤ v ≤ (1+α)|S|.    (3.5)

An optimal probabilistic counter was developed by Kane et al. [21]. It ensures that (3.5) holds with high probability for poly(m) queries, and each instantiation uses only O((1/α² + log n) log m) bits of space. In practice, the HyperLogLog counter [17] is very simple and has excellent empirical performance.

We assume three simple consistency properties of a counter.

P1: Two consecutive calls to QUERY (without any intervening insertions) return the same value.

P2: Reinserting an item that was previously inserted does not change the value of QUERY.

P3: The values returned by QUERY do not decrease as more elements are inserted.
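Properties P1 and P2 hold automatically for the hash-based sketches we use, but a raw sketch's estimate can in principle fluctuate downward between queries. A thin wrapper that reports the running maximum enforces P3, and since the true count only grows, the wrapped value still satisfies (3.5) whenever each underlying query does (a sketch; `base` is any counter object with insert/query):

    class MonotoneCounter:
        """Wrap a probabilistic counter so that QUERY never decreases (P3)."""

        def __init__(self, base):
            self.base = base
            self.best = 0

        def insert(self, block):
            self.base.insert(block)

        def query(self):
            # The true count is nondecreasing, so the running maximum stays
            # within the same [|S|, (1+alpha)|S|] window as the raw estimates.
            self.best = max(self.best, self.base.query())
            return self.best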
3.4.1 Implementing Algorithm 3 using probabilistic counters

Now we consider the working set data structure of Algorithm 1 implemented using the optimal probabilistic counter of Kane et al. [21] with accuracy parameter α = ε∆/n = ε/ℓ. We will analyze Algorithm 3 with this working set data structure. The data structure will create ⌈m/∆⌉ counters, each of which uses s = O((1/α² + log n) log m) bits of space. So the total space usage is O(ms/∆) bits. In Section 3.4.2 we will modify the data structure to "prune" redundant counters, which reduces the space to O(ℓs/ε) bits.

The following theorem compares the accuracy guarantee of Algorithm 3 using Algorithm 1 with either exact or probabilistic counters.

Theorem 2. Let C and Ĉ respectively refer to the hit rate curves produced using exact and probabilistic counters. Then

    C((x−1)∆) − 2αx ≤ Ĉ(x∆) ≤ C(x∆) + 2αx  for all x ∈ [ℓ+1].    (3.6)

Furthermore, Ĉ satisfies the (Strong-Guarantee) condition of Section 3.1.1 with respect to the true hit rate curve.

Proof. Let H and X_i refer to the quantities using exact counters, and let Ĥ and X̂_i refer to the corresponding quantities using probabilistic counters. We require the following claim:

Claim 3. For any times a ≤ b and any index i, we have X_i(a) − X_{i+1}(a) ≥ X_i(b) − X_{i+1}(b).

Proof of claim. Recall that X_i(t) = |B(τ_i, t)|. As τ_i < τ_{i+1}, we get X_i(t) − X_{i+1}(t) = |B(τ_i, τ_{i+1}) \ B(τ_{i+1}, t)|. Thus

    X_i(a) − X_{i+1}(a) − (X_i(b) − X_{i+1}(b)) = |B(τ_i, τ_{i+1}) \ B(τ_{i+1}, a)| − |B(τ_i, τ_{i+1}) \ B(τ_{i+1}, b)| ≥ 0,

as B(τ_{i+1}, a) ⊂ B(τ_{i+1}, b). □

The histogram H and the hit rate curve C are computed by the increment operation of the ith iteration of the loop on line 10 of Algorithm 3 while processing the request at time t, for each valid pair i, t. For simplicity, we consider line 13 to be the final iteration of this loop. This increment operation involves only X_i and X_{i+1}. The same is true of Ĥ and Ĉ, using instead the pair X̂_i and X̂_{i+1}. So, to prove (3.6), we will show that the contribution from the pair X̂_i and X̂_{i+1} to Ĉ approximately equals the contribution from the pair X_i and X_{i+1} to C.

Contribution to C. Fix any x ∈ [ℓ] and recall that C(x∆) = Σ_{j=1}^{x} H[j]/m. By considering lines 4, 12, and 13 of Algorithm 3 we see that the pair X_i and X_{i+1} can only contribute to C(x∆) while t ≤ m and ⌈X_i(t)/∆⌉ ≤ x. So, let T_i be the time after the last contribution of X_i and X_{i+1} to C(x∆), i.e.,

    T_i = min({ t : X_i(t) > x∆ } ∪ {m+1}).

At all times t ≥ T_i, the pair X_i and X_{i+1} does not contribute to C(x).

For the time being, let us assume that τ_{i+1} ≤ m; that is, an (i+1)th counter is created during trace processing. The contribution to m·C(x∆) from the pair X_i and X_{i+1} is

    1 − X_i(t+1) + X_i(t)  for each time t ∈ {τ_i, …, τ_{i+1}−1}, and
    X_{i+1}(t+1) − X_{i+1}(t) − X_i(t+1) + X_i(t)  for each time t ∈ {τ_{i+1}, …, T_i−1}.

Summing up, the total contribution is

    Σ_{τ_i ≤ t < τ_{i+1}} (1 − X_i(t+1) + X_i(t)) + Σ_{τ_{i+1} ≤ t < T_i} (X_{i+1}(t+1) − X_{i+1}(t) − X_i(t+1) + X_i(t))
      = ∆ − X_i(τ_{i+1}) + X_i(τ_i) + X_i(τ_{i+1}) − X_{i+1}(τ_{i+1}) + X_{i+1}(T_i) − X_i(T_i)
      = ∆ + X_{i+1}(T_i) − X_i(T_i).    (3.7)

Contribution to Ĉ. Similarly, let T̂_i = min({ t : X̂_i(t) > x∆ } ∪ {m+1}). Then at all times t ≥ T̂_i, the pair X̂_i and X̂_{i+1} does not contribute to Ĉ(x). (This assertion uses property P3 of the counters.) Summing up as before, the total contribution of the pair X̂_i and X̂_{i+1} to m·Ĉ(x∆) is

    ∆ + X̂_{i+1}(T̂_i) − X̂_i(T̂_i).    (3.8)

Upper bound on contribution to Ĉ(x∆). The difference between the contribution of X̂_i and X̂_{i+1} to m·Ĉ(x∆) and the contribution of X_i and X_{i+1} to m·C(x∆) is the difference between (3.8) and (3.7), namely

    X̂_{i+1}(T̂_i) − X̂_i(T̂_i) − X_{i+1}(T_i) + X_i(T_i).    (3.9)

We now upper bound this quantity. First note that T̂_i ≤ T_i, by (3.5). Then Claim 3 shows that (3.9) is at most

    X̂_{i+1}(T̂_i) − X̂_i(T̂_i) − X_{i+1}(T̂_i) + X_i(T̂_i)
      ≤ αX_{i+1}(T̂_i)    (by (3.5))
      ≤ αX_i(T̂_i)    (by definition of X_i and X_{i+1})
      ≤ α(x∆+1)    (since T̂_i ≤ T_i and by definition of T_i).    (3.10)

Lower bound on contribution to Ĉ(x∆). For the lower bound, we must consider the contribution of X_i and X_{i+1} to C((x−1)∆). Define

    T'_i = min({ t : X_i(t) > (x−1)∆ } ∪ {m+1}).

Arguing as before, we find that the total contribution of this pair to m·C((x−1)∆) is

    ∆ + X_{i+1}(T'_i) − X_i(T'_i).    (3.11)

The difference between (3.8) and (3.11) is

    X̂_{i+1}(T̂_i) − X̂_i(T̂_i) − X_{i+1}(T'_i) + X_i(T'_i).    (3.12)

Claim 4. T'_i ≤ T̂_i.

Proof of claim. By definition of T'_i, we have X_i(T'_i) ≤ (x−1)∆ + 1. By definition of α, we have αn = ε∆ < ∆. So, by (3.5),

    X̂_i(T'_i) ≤ (1+α)X_i(T'_i) ≤ (1+α)((x−1)∆+1) ≤ (x−1)∆ + 1 + αn < x∆ + 1 ≤ X̂_i(T̂_i).

By P3, the claim is proven. □

Applying Claim 4, we may use Claim 3 to show that (3.12) is at least

    X̂_{i+1}(T̂_i) − X̂_i(T̂_i) − X_{i+1}(T̂_i) + X_i(T̂_i)
      ≥ −αX_i(T̂_i)    (by (3.5))
      ≥ −α(x∆+1)    (since T̂_i ≤ T_i and by definition of T_i).    (3.13)

The last counter. Here we consider the special case where m < τ_{i+1}. In this case, the contribution of X_i to C(x) is

    Σ_{τ_i ≤ t ≤ m} (1 − X_i(t+1) + X_i(t)) = m − τ_i + 1 − X_i(m+1).    (3.14)

Similarly, the contribution of X̂_i to Ĉ(x) is

    m − τ_i + 1 − X̂_i(m+1).    (3.15)

The bound of α(x∆+1) on the absolute value of the difference of (3.15) and (3.14) follows immediately from (3.5); indeed, we get the bound α∆. Note that T_i = T'_i = m+1, as X_i(t) ≤ ∆ for all t ∈ [τ_i, m]. Thus the contribution of X_i to C((x−1)∆) is equal to the expression in (3.14), and the lower bound follows as well.

Proof of (3.6). We now combine our previous observations to establish (3.6). Recall that c = ⌈m/∆⌉ is the total number of counters. Summing (3.10) over all i, we obtain that

    mĈ(x∆) ≤ mC(x∆) + αc(x∆+1) ≤ mC(x∆) + 2αmx.

This proves the second inequality of (3.6). The first inequality of (3.6) follows analogously from (3.13).
Combining(3.6), Claim 1 and the definition of α yieldsC∗((x−2)∆)−2εx/` ≤ Cˆ(x∆) ≤ C∗(x∆)+2εx/`.Applying this bound with ∆/2 instead of ∆, and ε/2 instead of ε establishes(Strong-Guarantee).3.4.2 Improved space by pruning redundant countersAs requests are processed, adjacent counters may converge to the same value.When this happens, it is not necessary to keep both counters. In this section we usethe working set data structure of Algorithm 1 with probabilistic counters, but wedelete redundant counters. The new algorithm is shown in Algorithm 4.Claim 5. The number of active counters at any point in time is O(`/ε).Proof. Due to lines 8 and 9, every i ∈ A satisfies either Xi−1(t)−Xi(t) > 2ε∆ orXi(t)−Xi+1(t)> 2ε∆. In either case, we must have|B(τi−1, t)|− |B(τi+1, t)|> ε∆ ∀i 6∈ {1,c} , (3.16)since, by (3.5) and the definition of α ,|B(τ j, t)| ≤ X j(t) ≤ (1+α)|B(τ j, t)| ≤ |B(τ j, t)|+ ε∆for every j ∈ A and every t. Summing (3.16) over i we obtainε∆(|A|−2)≤ ∑i∈A\{1,c}(|B(τi−1, t)|− |B(τi+1, t)|)≤ 2c−1∑i=1(|B(τi, t)|− |B(τi+1, t)|)≤ 2n.We conclude that |A| ≤ 2+2n/ε∆= O(`/ε).Corollary 6. The space complexity of Algorithm 3 implemented using Algorithm 4is O(`(`2/ε2 + logn) log(m)/ε)bits.18Algorithm 4: An implementation of a working set data structure incorporat-ing pruning. A is the set of active counters.1 A← /02 Function Register(t,bt):3 c← dt/∆e4 if t ≡ 1 (mod ∆) then5 Create the new counter K [c]6 A← A∪{c}7 for i ∈ A\{1,c} do8 if((i−1 6∈ A)∨ (Xi−1(t)−Xi(t)≤ 2ε∆))∧((i+1 6∈ A)∨ (Xi(t)−Xi+1(t)≤ 2ε∆))then9 Delete K [i] and set A← A\{i}10 for i ∈ A do11 K [i].INSERT(bt)12 Function GetWorkingSetSize(t):13 Let i = dt/∆e14 if i ∈ A then15 Return K [i].QUERY()16 else17 Return NULLProof. Using the optimal probabilistic counter [21] with the parameter α = ε/`each counter uses space s = O((`2/ε2 + logn) logm). The space requirement forthe histogram used by Algorithm 3 is O(` logn). By Claim 5, the total space usageis O((`/ε) · s), as required.The next theorem compares the accuracy of Algorithm 3 with two differentworking set data structures: either Algorithm 1 or Algorithm 4. In both cases weuse probabilistic counters.Theorem 7. Let C and Cˆ be respectively the hit rate curve produced from Al-gorithm 3 using Algorithm 1 or Algorithm 4 to implement the working set data19structure. Then|C(x∆)−Cˆ(x∆)| ≤ 4ε ∀x ∈ [`+1].Consequently, Cˆ satisfies the (Weak-Guarantee) condition on page 8 with respectto the true hit rate curve.Proof. Let H and Xi refer to the quantities computed using Algorithm 1. Let Hˆ andXˆi refer to the corresponding quantities computed using Algorithm 4. We will as-sume that both algorithms are furnished with the same random bits. Consequently,Xi(t) = Xˆi(t) whenever Xˆi(t) 6= NULL ∀i ∈ [c], t ∈ [m]. (3.17)As in the proof of Theorem 2, we fix x ∈ [`+ 1] and i ∈ [c] and compare thecontribution from the pair of counters Xi and Xi+1 to C(x∆) and the contributionfrom the pair of counters Xˆi and Xˆi+1 to Cˆ(x∆).Let Ti = min({ t : Xi(t)> x∆ }∪{m+1}). As in the proof of Theorem 2, thepair of counters Xi and Xi+1 cannot contribute to C(x∆) at any time t ≥ Ti.Let Tˆi be the first time t at which Xˆi(t)> x∆, {i, i+1}* A, or t ≥m+1. Then,by considering lines 11-13 of Algorithm 3, we see that the pair of counters Xˆi andXˆi+1 cannot contribute to Cˆ(x∆) at any time t ≥ Tˆi. In the case that Xˆi(t) > x∆this follows from property P3 of Xˆi, and in the case that {i, i+1}* A this followsbecause of line 17 of Algorithm 4.We first assume τi+1 ≤ m. 
Following the argument of (3.7) in the proof ofTheorem 2, the difference in contributions to m ·Cˆ(x∆) and m ·C(x∆) isXˆi+1(Tˆi)− Xˆi(Tˆi)−Xi+1(Ti)+Xi(Ti). (3.18)Case 1. Suppose that Tˆi is such that dXˆi(Tˆi)/∆e > x or Tˆi = m + 1. Then wenecessarily have Tˆi = Ti due to (3.17). In this case (3.18) is clearly zero.Case 2. Suppose that {i, i+1}* A in iteration Tˆi. In this case we might not haveXˆi(Tˆi)> x∆, but we do haveXˆi(Tˆi)− Xˆi+1(Tˆi) ≤ 2ε∆, (3.19)20by line 8 of Algorithm 4. Observe that Tˆi ≤ Ti.Let Yi(t) denote |B(τi, t)|. Since Yi(t)≤ n and α = ε∆/n, it follows from (3.5)that0 ≤ Xi(t)−Yi(t) ≤ ε∆ ∀i ∈ [c], t ∈ [m]0 ≤ Xˆi(t)−Yi(t) ≤ ε∆ ∀i ∈ [c], t ≤ Tˆi.(3.20)The first step is to upper bound (3.18).Xˆi+1(Tˆi)− Xˆi(Tˆi)−Xi+1(Ti)+Xi(Ti)≤ Yi+1(Tˆi)−Yi(Tˆi)−Yi+1(Ti)+Yi(Ti)+2ε∆ (by (3.20))≤ 2ε∆,by Claim 3, since Tˆi ≤ Ti. Next we lower bound (3.18).Xˆi+1(Tˆi)− Xˆi(Tˆi)−Xi+1(Ti)+Xi(Ti)≥ Xˆi+1(Tˆi)− Xˆi(Tˆi)−Yi+1(Ti)+Yi(Ti)− ε∆ (by (3.20))≥ Xˆi+1(Tˆi)− Xˆi(Tˆi)− ε∆ (by definition of Y )≥ −3ε∆by (3.19). It follows that (3.18) is at most 3ε∆ in absolute value.The last counter: Here we examine the special case where τi+1 > m. As τi+1 =i∆+ 1, at every time t ∈ [τi,m], c = i. Thus the for loop of 7 cannot prune the ithcounter. So Tˆi = Ti, and the difference between the contributions of counters iszero.Summing up the contribution to m ·C(x∆) and m ·Cˆ(x∆) from all pairs of coun-ters, we obtain that|m ·C(x∆)−m ·Cˆ(x∆)| ≤ c ·3ε∆ = dm/∆e ·3ε∆ ≤ 4εm.This proves the theorem.21Chapter 4A Unified RepresentationIn the past chapter, we used probabilistic counters as a black box. Here, by exam-ining the implementation of probabilistic counters, we will derive a new algorithmthat makes different space tradeoffs. In order to explain this algorithm, we willdescribe some work from the streaming algorithms community on the distinct ele-ments problem.4.1 Distinct Elements and Probabilistic CountersThe distinct elements problem, referred to as DISTINCT-ELEMENTS, is the prob-lem of estimating the number of distinct elements in a stream of tokens. There aretwo parameters: ε is the desired accuracy of the estimate, and δ is the failure prob-ability. Formally, an algorithm solves DISTINCT-ELEMENTS with parameters εand δ if given a stream S, the algorithm outputs an estimate d such thatPr((1− ε)|S| ≤ d ≤ (1+ ε)|S|)≥ 1−δ .Using O(|S|) space, it is possible to solve DISTINCT-ELEMENTS deterministi-cally with ε = δ = 0. However, it is provably impossible to use sublinear spacewith ε = 0 or δ = 0.Probabilistic counters are randomized algorithms which use only sublinearspace to solve DISTINCT-ELEMENTS. Many probabilistic counters [2, 3, 17, 21]rely on a {0,1}-matrix M that is updated while processing each item in the stream.22Each item b in the stream is hashed to a binary string σ , and then M is updatedbased on lsb(σ), the number of trailing zeros in σ . We will call such a matrix M abitmatrix.The simplest counter [2] uses a single hash function h, and the matrix M hasa single column. To process a new stream element b, the algorithm computesz = lsb(h(b)). For each i ≤ z, Mi is set to 1. After the stream is processed, thealgorithm outputs 2 j∗, where j∗ is the index of the greatest non-zero row. 
Thisalgorithm produces an O(1) estimate of |S| with failure probability ≤√2/3.Other algorithms refine this estimate by using additional columns and anotherhash function g, which determines which column to update.The estimate could be, for example, a function of the average of the lowestnon-zero value in each column [14, 17], or the number of non-zero cells below acertain row (Algorithm 3 in [3]).Many distinct elements algorithms have a fixed failure probability. To reducethe failure probability, a standard method called the median trick is used. Supposea probabilistic counter solves the distinct elements problem with parameters ε,δ ,where δ < 1/2. Now run k independent instantiations of the algorithm on thestream S, and output the median estimate. If the median is greater than (1+ ε)|S|,then the estimate of k/2 counters exceeded (1+ ε)|S|. By a Chernoff bound, thisoccurs with probability 2Ω(−k). Similarly, we obtain a 2Ω(−k) probability for theevent that the median is less than (1− ε)|S| (further details can be found in [10]and [18]). We will make use of this trick below.4.2 Counter PackingIf we imagine that Algorithm 1 uses such a counter, and that all counters use thesame hash functions h and g, then we see that there is a great deal of redundantstate. For example, at time step t we apply the hash functions to the block bt , thenperform the appropriate update on Mc, the bitmatrix corresponding to the mostrecently created counter. We also loop over all the older counters, and updatetheir bitmatrices as well. However, if Mci,z = 1, it follows that Mji′,z = 1 for every0≤ i′≤ i and 1≤ j≤ c, as older counters will have certainly undergone any updatesthat newer counters have. This observation leads to the following idea: instead of23storing the bitmatrices for all counters separately, we can store a single unifiedmatrix from which all bitmatrices can be computed.We will keep a single matrix Q to represent a sequence of counters, wherethe kth counter was started at time (k− 1)∆+ 1 in the trace. We will maintain Qsuch that at any time t during stream processing, Qi, j = r means that the bitmatrixof the rth counter has a 1 at position i, j, and for any r′ > r, the bitmatrix of ther′th counter is 0 at position i, j. By the observation made above, it follows thatfor any r′ < r, the r′th bitmatrix has a 1 in position i, j. To extract the bitmatrixcorresponding to the r′th counter, Mr′, we examine each entry of Q. Let Qi, j = r.If r < r′, then Mr′i, j = 0. Otherwise, Mr′i, j = 1. The pseudocode for the algorithm isgiven in Algorithm 5.Algorithm 5: An implementation of a working set data structure based ona unified counter representation. Parameterized by a randomized counteralgorithm A . The set Q has many independent copies of the hash functionsand the resulting table. We need only |Q|= O(log(1/δ )) with δ = m−3.Data: A collection Q of pairs (Q,H ), where Q is a matrix, H is a set ofhash functionsA randomized counter algorithm A1 c← 12 Function Register(t,bt):3 c← dt/∆e4 for (Q,H ) ∈Q do5 Update Q using bt according to A6 Function GetWorkingSetSize(t’):7 for Q ∈Q do8 Let r = dt ′/∆e9 Define the bitmatrix Mr by Mri, j ={1 if Qi, j− r ≥ 00 otherwise10 Feed Mr into counter algorithm A to obtain estimate RQ11 Return the median of estimates RQ244.3 AnalysisIn order to analyze Algorithm 5, we must specify a concrete randomized counter al-gorithmA . We will use Algorithm 2 from the paper of Bar-Yossef et al. [3]. In thisalgorithm, each matrix Q has log(n) rows, and k = O(1/α2) columns. 
Each collec-tion H consists of k t-wise independent hash functions, where t is O(log(1/α)).To update Q, for j ∈ [k], we set Qi, j = c if lsb(h j(bt))≥ i. Given the bitmatrix M,this randomized counter algorithm can produce its estimate.Claim 8. The space requirement of Algorithm 5 is O(`2 log(n) log(m) log(1/δ )/ε2).Proof. Each Q has O(1/α2) columns, logn rows, and each cell requires logmspace. Thus Q requires O(log(n) log(m)/α2)space. Each collection H requiresO(log2(1/α) logn)space, which is negligible. We have O(log(1/δ )) such pairs(Q,H ). Thus the total space requirement is O(α−2 log(n) log(m) log(1/δ )). Sub-stituting α = ε/` completes the proof.Theorem 9. Algorithm 3 using Algorithm 5 as its working set data structure sat-isfies (Strong-Guarantee).Proof. Let Xi(t) be the result of GETWORKINGSETSIZE(τi) at time t, which isan estimate of |B(τi, t)|. It suffices to show that Xi(t) ∈ [|B(τi, t)|,(1+α)|B(τi, t)|]with high probability, in which case the argument of Theorem 2 applies.For any i, t, by taking the median of O(log(1/δ )) estimates of A , we havePr[|Xi(t)−|B(τi, t)||>α|B(τi, t)|]≤ δ . At time t, the algorithm computes estimatesfor t/∆ counters, and thus O(m2) estimates are taken in total. By a union bound,the probability that any estimate is poor is ≤ δm2.There is one subtlety: Theorem 2 assumed counters with only one-sided er-ror, while A provides counters with two-sided error. If we take α ′ = α2+α as theaccuracy of A and divide all estimates by (1−α ′), we recover one-sided α ap-proximations.25Chapter 5Lower BoundsIn this section we prove lower bounds on the space needed by one-pass algorithmsto compute approximate hit rate curves. Formally, let HRCn,m,ε,` be the computa-tional problem in which, given an input sequence in [n]m, one must compute a func-tion Cˆ satisfying (Weak-Guarantee), where ∆= bn/`c. Similarly, let HRC′n,m,ε,` bethe analogous problem in which Cˆ must satisfy (Strong-Guarantee).While HRCn,m,ε,` is perhaps a more natural formulation of the hit rate curveproblem, Theorems 2 and 9 show that two of our algorithms actually solve HRC′n,m,ε,`.It is useful to analyze the fundamental complexity of both HRCn,m,ε,` and HRC′n,m,ε,`in order to understand the limits of our current algorithms and to design improvedones.5.1 Communication ComplexityIn order to prove our lower bounds, we will use communication complexity. Thebasic model of communication complexity involves two players, Alice and Bob,who are tasked with computing f (x,y), where x is given to Alice and y to Bob. Thechallenge is to devise a communication protocol such that Alice and Bob exchangethe least number of bits to compute f (x,y).A k-round protocol is a communication protocol in which k messages are sentbetween Alice and Bob. For example, in a 1-round protocol, Bob can computef (x,y) after receiving a single message from Alice. A randomized protocol fur-26nishes Alice and Bob with random bits. In a private coin protocol, Alice and Bobeach have access to their own random bits and cannot see the other’s random bits.In a public coin protocol, Alice and Bob have access to the same random bits. Fora detailed introduction to communication complexity, consult [22].In the following, we will construct k-round, public coin protocols.5.2 Gap Hamming DistanceOur lower bounds are based on reductions from the Gap Hamming Distance (GHD)problem. 
5.2 Gap Hamming Distance

Our lower bounds are based on reductions from the Gap Hamming Distance (GHD) problem. In GHD_{k,t,g}, Alice and Bob are respectively given vectors x, y ∈ {0,1}^k. They are required to determine whether the Hamming distance between x and y, denoted d(x,y), is ≤ t−g or > t+g, outputting 0 or 1 respectively.

The Gap Hamming Distance problem was first introduced by Indyk and Woodruff in [19]. Their goal was to obtain lower bounds on the space complexity of the Distinct Elements problem. They described a reduction from Gap Hamming Distance to Distinct Elements, and gave a lower bound on the communication complexity of Gap Hamming Distance. A subsequent line of work ([30], [20], [31], [7], [8]), culminating in the 2012 paper of Chakrabarti and Regev [9], settled the communication complexity of GHD_{n,n/2,√n} as Ω(n). The Chakrabarti and Regev paper [9] also contained the following generalization, which will be helpful:

Theorem 10 (Chakrabarti-Regev [9], Proposition 4.4). Any protocol that solves GHD_{k,k/2,g} with probability ≥ 2/3 communicates Ω(min{k, k²/g²}) bits.

Since GHD was originally introduced to obtain lower bounds for the distinct elements problem [19], and since hit rate curves fundamentally involve the number of distinct elements, it is not too surprising that reducing GHD to HRC and HRC' is useful.

5.3 HRC Lower Bounds

Theorem 11. The space complexity of HRC_{n,n,ε,ℓ} is Ω(min{n, 1/ε²}).

Proof. Let k = n/2 = m/2, and assume ℓ divides n. Given an instance of GHD_{k,k/2,2εk} with inputs x, y, as well as an algorithm which solves HRC_{n,n,ε,ℓ} using space s, consider the following protocol.

Alice constructs the set X = { j : x_j = 1 } ∪ { k+j : x_j = 0 }, then runs the algorithm for HRC_{n,n,ε,ℓ} on the stream of elements of X in arbitrary order. She passes the s bits of state to Bob. Bob constructs the set Y = { k+j : y_j = 1 } ∪ { j : y_j = 0 }, and using the state from Alice, continues the algorithm on the elements of Y. If Ĉ((ℓ+1)∆) ≤ 1/4, he outputs 0; otherwise he outputs 1.

By construction, d(x,y) = |X ∩ Y|, so 2k·C(ℓ∆) = d(x,y). (Recall that the hit rate curve is normalized by the length of the request sequence, which is m = 2k.) Since Ĉ satisfies (Weak-Guarantee), we have

    2k·C(ℓ∆) − 2kε ≤ 2k·Ĉ((ℓ+1)∆) ≤ 2k·C((ℓ+1)∆) + 2kε = 2k·C(ℓ∆) + 2kε.

Thus the protocol correctly distinguishes the cases d(x,y) ≤ k/2 − 2εk and d(x,y) > k/2 + 2εk. Applying Theorem 10, we conclude that s ∈ Ω(min{k, k²/(εk)²}) = Ω(min{n, 1/ε²}). □
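The identity d(x,y) = |X ∩ Y| underlying this protocol is easy to verify directly; a brute-force sanity check (for intuition only):

    import random

    def hamming_via_intersection(x, y):
        """Build Alice's set X and Bob's set Y as in the proof of Theorem 11;
        a full-size LRU cache run on the stream X then Y hits exactly on the
        second request to each block of X & Y, so |X & Y| = d(x, y)."""
        k = len(x)
        X = {j for j in range(k) if x[j]} | {k + j for j in range(k) if not x[j]}
        Y = {k + j for j in range(k) if y[j]} | {j for j in range(k) if not y[j]}
        return len(X & Y)

    x = [random.randint(0, 1) for _ in range(100)]
    y = [random.randint(0, 1) for _ in range(100)]
    assert hamming_via_intersection(x, y) == sum(a != b for a, b in zip(x, y))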
Since 2k ·C(`∆) = d(x,y) and Cˆsatisfies (Strong-Guarantee), we haved(x,y)−2kε/` ≤ 2k ·Cˆ(2∆) ≤ d(x,y)+2kε/`.Thus Bob can solve GHDk,k/2,2kε/` as before.By Theorem 10,(2`−1) ·maxjs j ≥2`−1∑j=1s j = Ω(min{k, `2/ε2}).So for some j, we have s j ∈Ω(min{n/`,`/ε2}).29Chapter 6Experimental ResultsBased on the ideas developed in Chapter 3, a prototype was implemented [29]. Wecompare this prototype against an implementation of the Mattson algorithm [23] ona collection of storage traces [24] released by Microsoft Research in Cambridge.Xiang et al., the authors of the average footprint paper [33], released an opensource implementation of their algorithm [12]. We include the results from aslightly modified version of this implementation in our figures, and discuss someof the strengths and weaknesses of their approach.6.1 Microsoft Research TracesThe MSR traces record the disk accesses of a collection of 13 different servers overa period of one week. Each trace is a list of records, where each record representsa disk access. Records have fields for the time stamp, server name, disk number,operation type (read or write), disk offset, size of data requested, and latency. Inorder to process these traces, we filtered out writes and expanded each remainingrecord into a list of distinct blocks touched by the request. We used a block size of4 KB.The smallest trace, wdev, has 52,489 distinct blocks and 725,194 requests,while the largest trace, prxy, has 523,879 distinct blocks and 358,267,307 re-quests. The proj trace touches the largest volume of data with 324,760,925 dis-tinct blocks, and contains 564,577,120 requests.30Traceshm mds prn proj prxy rsrch src1 src2 stg ts usr wdev webavgfp average 0.96 0.03 1.71 0.50 0.18 0.59 3.53 0.33 0.02 1.17 0.43 1.10 4.51max 4.43 0.57 6.97 4.29 27.91 9.04 44.14 26.46 0.15 15.52 15.56 14.17 35.68cs average 0.23 1.06 0.34 1.01 0.98 0.67 0.56 0.74 1.02 1.46 0.26 1.45 1.29max 9.71 4.68 8.46 2.54 37.50 16.45 14.55 26.51 1.70 25.41 13.36 22.33 13.91Table 6.1: Average and maximum error for avgfp and cs.In Figure 6.1, we plot the hit rate curves generated by three algorithms onthe MSR traces. The avgfp algorithm is an implementation of the average foot-print technique [33]. The authors released their implementation as open source ongithub [12]. The original implementation used a statically allocated array of size512 MB to map blocks to last access times during trace processing, an efficientapproach for CPU traces using relatively few distinct memory locations. Given thesize of the larger MSR traces, a statically allocated data structure is impossible touse on modern hardware for this purpose, so we altered avgfp to use a hash tableinstead. The cs algorithm is a protoype based on the ideas sketched in Chapter 3.Finally, mattson is a single-threaded implementation of the Mattson algorithmbundled with parda [25]. We used mattson to compute the true hit rate curves tocompare against the other algorithms.To compare the resource usage of the algorithms, we ran each on the traceobtained by merging all MSR traces and sorting by time stamp. The resultingtrace, called the master trace, is 22 GB, uncompressed. The avgfp algorithmprocessed master in 10 minutes using 22 GB of memory, mattson in 1 hourusing 92 GB, and cs in 12 minutes using 0.5 GB. 
These experiments were runon a Dell PowerEdge R720 with two six-core Intel Xeon processors and 96 GB ofRAM.We give quantitative errors for avgfp and cs on the MSR traces in Table 6.1.We measured the error of an algorithm by finding the deviation between its curveand the curve produced by mattson.Although Xiang et al. developed their technique with CPU traces in mind [33],their implementation does quite well on most of the MSR traces. Two exceptionsare src1, with 44.1 maximum and 3.5 average error, and web, with 35.7 maxi-mum and 4.5 average. In comparison, cs has 14.6 maximum and 0.6 average error310.0 0.5 1.0 1.5 2.0020406080100hm0 10 20 30 40 50 60 70 80mds0 10 20 30 40 50 60 70prn0 200 400 600 800 10001200020406080100proj0.0 0.5 1.0 1.5 2.0prxy0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7rsrch0 50 100 150 200020406080100src10 5 10 15 20 25 30 35src20 10 20 30 40 50 60 70 80stg0.0 0.1 0.2 0.3 0.4 0.5020406080100ts0 200 400 600 800 1000usr0.00 0.05 0.10 0.15 0.20wdev0 10 20 30 40 50 60 70020406080100webavgfpcsmattsonFigure 6.1: Hit rate curves for MSR traces. Cache size in GBs against hit ratepercentage.32on src1, and 13.9 maximum and 1.3 average error on web. While both algorithmsare less accurate at approximating volatile hit rate curves, avgfp exhibits highererror.As the staircase plot suggests, web is characterized by multiple working setsof markedly different sizes in different phases of the trace. Such traces can causeproblems for the average footprint technique. We investigate this further in the nextsection.6.2 Average Footprints and Phase ChangesFor each window length w, the average footprint technique computes the averageworking set size across all windows of length w in the trace. If the trace is markedby distinct phases in which working set size varies drastically, the average workingset size may skew the reported stack distances.To validate this suspicion, we generated an artificial trace, cyclic. cyclicis composed of two distinct phases. In the first phase, the first 104 blocks are readin order. This sequential scan is repeated 1000 times. The second phase consists of105 repeated scans of the first 100 blocks. Both phases consist of 107 requests, fora total of 2 ·107 requests to 105 distinct blocks. The hit rate curves for cyclic ascomputed by avgfp, cs, and mattson are plotted in 6.2.As suspected, the average footprint technique has trouble with this trace. Forcache sizes between 0.020 GB and 0.035 GB, avgfp reports a hit rate of nearly100%, while cs and mattson both give about 50%. The average and maximumerror for avgfp are 24.6 and 50.0, while for cs they are 0.5 and 41.3. Because thehit rate changes so dramatically for a small change in cache size, large maximumerrors are unavoidable for both algorithms (recall Theorem 2 bounds the error forbin x in terms of bins x−1 and x).This trace is highly artificial and cannot be interpreted as a realistic model forsystem behaviour. However, we believe that it highlights an important limitationin the average footprint approach. Traces with highly pronounced phases do exist,and will cause inaccuracies with the average footprint method, as evidenced byweb. We fabricated this trace in order to examine the worst case for the averagefootprint algorithm.330.000 0.005 0.010 0.015 0.020 0.025 0.030 0.035 0.040020406080100 avgfpcsmattsonFigure 6.2: Hit rate curves for cyclic trace. 
Cache size in GBs against hitrate percentage.34Chapter 7ConclusionIn this work, we introduced new algorithms for approximating hit rate curves whichdiffer dramatically from existing approaches. We characterized the space usage ofour algorithms, and provided lower bounds on the space complexity of any algo-rithm which approximates hit rate curves with given error bounds. We also vali-dated an implementation based on our algorithms on the MSR traces, and comparedour results to past work.Our focus was entirely on space efficiency. We did not characterize the timecomplexity of our algorithms, nor did we provide lower bounds on time complex-ity. A careful analysis of our algorithms’ time usage may result in more efficientalgorithms, and is a possible direction for future work.Our space lower bounds are not satisfactory. In particular, we were not able toobtain any dependence on ` for the lower bound of HRCn,m,ε,` in Theorem 11. Wesuspect neither Theorem 11 nor Theorem 12 gives tight bounds for their respectiveproblems. We believe a deeper investigation of the space complexity of the hitrate curve problem, besides finding tighter bounds, could result in algorithms withgreater space efficiency as well.35Bibliography[1] G. S. Alma´si, C. Cas¸caval, and D. A. Padua. Calculating stack distancesefficiently. In Proceedings of the 2002 workshop on memory systemperformance (MSP ’02), pages 37–43, 2002. → pages 2, 4[2] N. Alon, Y. Matias, and M. Szegedy. The space complexity ofapproximating the frequency moments. In Proceedings of the twenty-eighthannual ACM symposium on Theory of computing, pages 20–29. ACM, 1996.→ pages 3, 22, 23[3] Z. Bar-Yossef, T. Jayram, R. Kumar, D. Sivakumar, and L. Trevisan.Counting distinct elements in a data stream. In Randomization andApproximation Techniques in Computer Science, pages 1–10. Springer,2002. → pages 3, 22, 23, 25[4] L. A. Belady. A study of replacement algorithms for a virtual-storagecomputer. IBM Systems Journal, 5(2):78–101, 1966. → pages 1[5] B. T. Bennett and V. J. Kruskal. LRU stack processing. IBM Journal ofResearch and Development, 19(4):353–357, 1975. → pages 2, 4, 9[6] E. Berg and E. Hagersten. Statcache: a probabilistic approach to efficientand accurate data locality analysis. In Performance Analysis of Systems andSoftware, 2004 IEEE International Symposium on-ISPASS, pages 20–27.IEEE, 2004. → pages 6[7] J. Brody and A. Chakrabarti. A multi-round communication lower bound forgap hamming and some consequences. In Computational Complexity, 2009.CCC’09. 24th Annual IEEE Conference on, pages 358–368. IEEE, 2009. →pages 27[8] J. Brody, A. Chakrabarti, O. Regev, T. Vidick, and R. De Wolf. Bettergap-hamming lower bounds via better round elimination. In Approximation,36Randomization, and Combinatorial Optimization. Algorithms andTechniques, pages 476–489. Springer, 2010. → pages 27[9] A. Chakrabarti and O. Regev. An optimal lower bound on thecommunication complexity of gap-hamming-distance. SIAM Journal onComputing, 41(5):1299–1317, 2012. → pages 27[10] A. Chakrabarti et al. Cs49: Data stream algorithms lecture notes.http://www.cs.dartmouth.edu/∼ac/Teach/CS49-Fall11/Notes/lecnotes.pdf,2012. Retrieved 2014-07-17. → pages 23[11] P. J. Denning. The working set model for program behavior.Communications of the ACM, 11(5):323–333, 1968. → pages 9[12] C. Ding. Program locality analysis tool. https://github.com/dcompiler/loca,2014. Retrieved 2014-07-03. → pages 30, 31[13] C. Ding and Y. Zhong. Predicting whole-program locality through reusedistance analysis. 
[14] M. Durand and P. Flajolet. Loglog counting of large cardinalities. In Algorithms — ESA 2003, pages 605–617. Springer, 2003. → pages 3, 23

[15] D. Eklov and E. Hagersten. StatStack: Efficient modeling of LRU caches. In Performance Analysis of Systems & Software (ISPASS), 2010 IEEE International Symposium on, pages 55–65. IEEE, 2010. → pages 2, 6

[16] P. Flajolet and G. Nigel Martin. Probabilistic counting algorithms for data base applications. Journal of Computer and System Sciences, 31(2):182–209, 1985. → pages 3

[17] P. Flajolet, É. Fusy, O. Gandouet, and F. Meunier. HyperLogLog: the analysis of a near-optimal cardinality estimation algorithm. DMTCS Proceedings, 0(1), 2008. → pages 3, 13, 22, 23

[18] N. J. A. Harvey. CPSC 536N: Randomized algorithms lecture notes. http://www.cs.ubc.ca/~nickhar/W12/, 2012. Retrieved 2014-08-16. → pages 23

[19] P. Indyk and D. Woodruff. Tight lower bounds for the distinct elements problem. In Foundations of Computer Science (FOCS), 44th Annual IEEE Symposium on, pages 283–288. IEEE, 2003. → pages 27

[20] T. Jayram, R. Kumar, and D. Sivakumar. The one-way communication complexity of hamming distance. Theory of Computing, 4(1):129–135, 2008. → pages 27

[21] D. M. Kane, J. Nelson, and D. P. Woodruff. An optimal algorithm for the distinct elements problem. In Proceedings of the Twenty-Ninth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, pages 41–52. ACM, 2010. → pages 3, 13, 19, 22

[22] E. Kushilevitz and N. Nisan. Communication Complexity. Cambridge University Press, Cambridge, 1997. → pages 27

[23] R. L. Mattson, J. Gecsei, D. R. Slutz, and I. L. Traiger. Evaluation techniques for storage hierarchies. IBM Systems Journal, 9(2):78–117, 1970. → pages 2, 4, 30

[24] D. Narayanan, A. Donnelly, and A. Rowstron. Write off-loading: Practical power management for enterprise storage. ACM Transactions on Storage (TOS), 4(3):10, 2008. → pages 30

[25] Q. Niu, J. Dinan, Q. Lu, and P. Sadayappan. PARDA: A fast parallel reuse distance analysis algorithm. In Parallel & Distributed Processing Symposium (IPDPS), 2012 IEEE 26th International, pages 1284–1294. IEEE, 2012. → pages 2, 4, 5, 31

[26] F. Olken. Efficient methods for calculating the success function of fixed-space replacement policies. Technical report, Lawrence Berkeley Lab., CA (USA), 1981. → pages 2, 4

[27] X. Shen, J. Shaw, B. Meeker, and C. Ding. Locality approximation using time. In POPL, pages 55–61. ACM, 2007. → pages 2, 5

[28] R. A. Sugumar and S. G. Abraham. Multi-configuration simulation algorithms for the evaluation of computer architecture designs. Ann Arbor, MI, 1993. → pages 2, 4, 5

[29] J. Wires, S. Ingram, N. J. A. Harvey, A. Warfield, and Z. Drudi. Characterizing storage workloads with counter stacks. In OSDI. USENIX, 2014. To appear. → pages 30

[30] D. Woodruff. Optimal space lower bounds for all frequency moments. In Proceedings of the Fifteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 167–175. Society for Industrial and Applied Mathematics, 2004. → pages 27

[31] D. P. Woodruff. The average-case complexity of counting distinct elements. In Proceedings of the 12th International Conference on Database Theory, pages 284–295. ACM, 2009. → pages 27

[32] X. Xiang, B. Bao, T. Bai, C. Ding, and T. Chilimbi. All-window profiling and composable models of cache sharing. ACM SIGPLAN Notices, 46(8):91–102, 2011. → pages 6

[33] X. Xiang, B. Bao, C. Ding, and Y. Gao. Linear-time modeling of program working set in shared cache. In Parallel Architectures and Compilation Techniques (PACT), 2011 International Conference on, pages 350–360. IEEE, 2011. → pages 6, 30, 31
[34] Y. Zhong and W. Chang. Sampling-based program locality approximation. In Proceedings of the 7th International Symposium on Memory Management, pages 91–100. ACM, 2008. → pages 2, 5
