UBC Theses and Dissertations
From tree patterns to generalized tree patterns : on efficient evaluation of XQuery Chen, Zhimin
XQuery is the de facto standard XML query language, and it is important to have efficient query evaluation techniques available for it. It is a well-known fact that a formal bulk algebra is essential for efficient query evaluation, and the Tree Algebra for XML (TAX), among others, is invented for this purpose. It can be shown in this thesis that a substantial subset of XQuery can be expressed as TAX. An XML document is often modelled as an ordered label tree. A core operation in the evaluation of XQuery is the finding of matches for specified tree patterns against the data tree (or forest), and there has been much work towards algorithms for finding such matches efficiently. Multiple XPath expressions can be evaluated by computing one or more tree pattern matches. However, because of the flexibility of XML data, the efficient evaluation of XQuery queries as a whole is much more than a tree pattern match and combining matchings of multiple tree patterns is not the most efficient evaluation plan for XQuery. In this thesis a structure called generalized tree pattern (GTP) is proposed to concisely represent a whole XQuery expression. Evaluating a query reduces to finding the matches of its GTP, which leads to more efficient evaluation plans. Algorithms are developed to translate an XQuery expression, possibly involving join, quantifiers, grouping, aggregation and nesting, to its GTP, and to generate efficient physical plans for a specified GTP. XML data often conforms to a schema. Relevant constraints from the schema give rise to further opportunities to optimize queries. Algorithms are given in the thesis to automatically infer structural constraints from a given schema and to simplify a GTP given a set of structural constraints. Finally, a detailed set of experiments using the TIMBER XML database system shows that plans via GTPs (with or without schema knowledge) significantly outperform plans based on navigation and straightforward plans obtained directly from the query.
Item Citations and Data