Validation of SQL queries over streaming warehouses
Jain, Ritika
Abstract
There is often a need to recover the “missing” query that produced a particular output from a data stream. For example, since a data stream is constantly evolving, an analyst may wish to take a query from the past and evaluate it on the current state of the stream for further analysis. Previous research has studied the problem of reverse engineering a query that would produce a given result at a particular database state. We study the following problem: given a streaming database D = ⟨D_0, D_1, D_2, …⟩, a result R_out, and a set of candidate queries Q, efficiently find all queries Q_i ∈ Q such that for some state D_{j_i} of the stream, Q_i(D_{j_i}) = R_out, and report the pair (Q, wit_Q), where wit_Q is the witness of (in)validity. A witness for a valid query Q_val is a state D_i such that Q_val(D_i) = R_out. For an invalid query Q_inval, a witness is a pair of consecutive states (D_i, D_{i+1}) such that R_out \ Q_inval(D_i) ≠ ∅ ≠ Q_inval(D_{i+1}) \ R_out. We allow any PTIME-computable monotone query in Q. While techniques developed in previous research can be used to generate the candidate query set Q, we focus on developing a scalable strategy for quickly determining the witness. We establish theoretical worst-case performance guarantees for our approach and show that it performs no more than a factor of O(log |D_RDS|) more work than the optimal “lucky guess” strategy, where Q(D_RDS) = R_out. We empirically evaluate our technique against natural baselines inspired by previous research, and show that the baselines either fail to scale or incur an inordinate amount of overhead by failing to take advantage of natural properties of a data stream. By contrast, our strategy scales effortlessly to very large data streams. Moreover, it never performs more than a small constant times the optimal amount of work, regardless of the state of the data stream that may have led to R_out.
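The witness definitions above suggest why monotonicity matters: since Q(D_i) ⊆ Q(D_{i+1}) for a monotone query, the predicate "Q(D_i) contains tuples outside R_out" flips from false to true at most once along the stream, so the flip point can be located by binary search. The following is a minimal sketch of that idea, not the thesis's actual algorithm: it assumes a finite list of materialized states and a query given as a Python function returning a set of tuples.

```python
def find_witness(states, query, r_out):
    """Binary-search a finite prefix of stream states D0..Dn for a
    witness of (in)validity of a monotone query.

    Hypothetical helper for illustration only: `states` is a list of
    database states, `query` maps a state to a set of result tuples,
    and monotonicity (query(Di) <= query(Di+1)) is assumed.
    """
    r_out = set(r_out)

    # P(i): Q(Di) "overshoots", i.e. contains tuples outside R_out.
    # By monotonicity, P is monotone non-decreasing in i.
    def overshoots(i):
        return bool(query(states[i]) - r_out)

    # Binary search for the first overshooting state index k.
    lo, hi = 0, len(states)
    while lo < hi:
        mid = (lo + hi) // 2
        if overshoots(mid):
            hi = mid
        else:
            lo = mid + 1
    k = lo  # first index with Q(Dk) \ R_out != empty, or len(states)

    # D_{k-1} is the largest state whose output is a subset of R_out,
    # so it is the only candidate for a validity witness.
    if k > 0 and query(states[k - 1]) == r_out:
        return ('valid', k - 1)                # witness state D_{k-1}
    if 0 < k < len(states):
        # Q(D_{k-1}) is a strict subset of R_out and Q(Dk) overshoots,
        # matching the invalidity-witness definition.
        return ('invalid', (k - 1, k))         # witness pair (D_{k-1}, Dk)
    # k == 0 (every state overshoots) or k == len(states) (no state
    # overshoots yet): no witness pair exists within this prefix.
    return ('undetermined', None)
```

The search evaluates the query O(log n) times over n states, which mirrors the logarithmic gap to the "lucky guess" strategy claimed in the abstract, though the thesis's actual strategy and cost model may differ.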
Item Metadata
Title |
Validation of SQL queries over streaming warehouses
|
Creator |
Jain, Ritika
|
Publisher |
University of British Columbia
|
Date Issued |
2017
|
Language |
eng
|
Date Available |
2017-08-30
|
Provider |
Vancouver : University of British Columbia Library
|
Rights |
Attribution-NonCommercial-NoDerivatives 4.0 International
|
DOI |
10.14288/1.0355210
|
Degree Grantor |
University of British Columbia
|
Graduation Date |
2017-11
|
Scholarly Level |
Graduate
|
Aggregated Source Repository |
DSpace
|