UBC Theses and Dissertations

UBC Theses Logo

UBC Theses and Dissertations

On the ordering of multiattribute data in information retrieval systems Liu, Xian

Abstract

The requirement of ordering a set of multiattribute data arises frequently from spatial index processing and secondary key retrieval in modern information systems. The first application involves developing a single numerical index on a one-dimensional line for each point in multi-dimensional space, such that the spatial localizability can be preserved as best as possible. This work can be carried out by a mathematical transformation called spatial ordering. The Hilbert order has attained intensive interest in the literature due to its desirable performance. The lack of inexpensive encoding and decoding algorithms has been mentioned frequently in publications. The second application, secondary key retrieval, involves two issues: determining the resolution for each dimension in a multi-dimensional hashed space and ordering data blocks on disks. Aho and Ullman proposed a constrained nonlinear programming model applied to the partial match query. However, the literature is relatively silent on the range query retrieval. Faloutsos showed that the order determined by the reflected Gray code can significantly improve the cluster distribution of the data blocks. It is recommended to design symmetric Gray codes to reduce the bias introduced by the reflected Gray code. In this thesis, efficient analytical formulas are first derived to encode and decode the two-dimensional Hilbert order. Then the algorithms for the three-dimensional Hilbert order are discussed. The encoding and decoding processes for n-dimensional Hilbert orders (n > 4) would be considerably complicated and place a heavy computational burden on the application problems. Thus a new spatial order called the U order is proposed. Its major advantage is the significant simplicity of the encoding and decoding operations for multi-dimensional data. Performance assessments suggest that the U order is comparable with the Hilbert order and superior to most other orders. The performance analysis is followed by a formal mathematical description for the n-dimensional U order. Then a set of analytical encoding and decoding formulas are presented. Besides the U order, several new spatial orders with the quadrant-recursive structure are also investigated. The goal is to find the candidates that are competitive with the Hilbert order in performance and with relative simple encoding and/or decoding formulas. Of these new orders, the Q4 order behaves best and its performance is very close to the Hilbert order. An analysis shows that its encoding and decoding algorithms only need 64.5% and 84.2% of operations of the corresponding algorithms of the Hilbert order. The optimization of resolutions of multiattribute file systems is then discussed. Optimization models for the range query retrievals are presented by incorporating stochastic programming methodology. Three approaches axe discussed. The hereand- now and the wait-and-see approaches seem to be applicable to the situation in which range queries follow a relative simple distribution. Four typical paradigms are analyzed. One of them leads to a result that is similar to the one used in industry to evaluate the performance of disk drives. The scenario tracking approach appears to be more flexible and more appropriate for the problems where the probability distribution is not available. It also provides an adaptive technique to formulate non-conventional stochastic programming problems. Then a stochastic programming software package is developed and implemented. Numerical experiments show the coordinating effect of the scenario tracking methodology. Finally, the symmetric Hamiltonian paths in a high-dimensional hypercube are discussed. A heuristic approach is applied to construct the symmetric Hamiltonian paths in the five-, six-, and seven-dimensional hypercubes. They are equivalent to the symmetrical Gray codes. In summary, this thesis proposes three major approaches applied to the ordering of multiattribute data: a set of analytical encoding and decoding formulas for the twodimensional Hilbert order, two new spatial orders with significant simple encoding and decoding formulas, and a set of optimization models for the range query problem.

Item Media

Item Citations and Data

Rights

For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use.