- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- On the ordering of multiattribute data in information...
Open Collections
UBC Theses and Dissertations
UBC Theses and Dissertations
On the ordering of multiattribute data in information retrieval systems Liu, Xian
Abstract
The requirement of ordering a set of multiattribute data arises frequently from spatial index processing and secondary key retrieval in modern information systems. The first application involves developing a single numerical index on a one-dimensional line for each point in multi-dimensional space, such that the spatial localizability can be preserved as best as possible. This work can be carried out by a mathematical transformation called spatial ordering. The Hilbert order has attained intensive interest in the literature due to its desirable performance. The lack of inexpensive encoding and decoding algorithms has been mentioned frequently in publications. The second application, secondary key retrieval, involves two issues: determining the resolution for each dimension in a multi-dimensional hashed space and ordering data blocks on disks. Aho and Ullman proposed a constrained nonlinear programming model applied to the partial match query. However, the literature is relatively silent on the range query retrieval. Faloutsos showed that the order determined by the reflected Gray code can significantly improve the cluster distribution of the data blocks. It is recommended to design symmetric Gray codes to reduce the bias introduced by the reflected Gray code. In this thesis, efficient analytical formulas are first derived to encode and decode the two-dimensional Hilbert order. Then the algorithms for the three-dimensional Hilbert order are discussed. The encoding and decoding processes for n-dimensional Hilbert orders (n > 4) would be considerably complicated and place a heavy computational burden on the application problems. Thus a new spatial order called the U order is proposed. Its major advantage is the significant simplicity of the encoding and decoding operations for multi-dimensional data. Performance assessments suggest that the U order is comparable with the Hilbert order and superior to most other orders. The performance analysis is followed by a formal mathematical description for the n-dimensional U order. Then a set of analytical encoding and decoding formulas are presented. Besides the U order, several new spatial orders with the quadrant-recursive structure are also investigated. The goal is to find the candidates that are competitive with the Hilbert order in performance and with relative simple encoding and/or decoding formulas. Of these new orders, the Q4 order behaves best and its performance is very close to the Hilbert order. An analysis shows that its encoding and decoding algorithms only need 64.5% and 84.2% of operations of the corresponding algorithms of the Hilbert order. The optimization of resolutions of multiattribute file systems is then discussed. Optimization models for the range query retrievals are presented by incorporating stochastic programming methodology. Three approaches axe discussed. The hereand- now and the wait-and-see approaches seem to be applicable to the situation in which range queries follow a relative simple distribution. Four typical paradigms are analyzed. One of them leads to a result that is similar to the one used in industry to evaluate the performance of disk drives. The scenario tracking approach appears to be more flexible and more appropriate for the problems where the probability distribution is not available. It also provides an adaptive technique to formulate non-conventional stochastic programming problems. Then a stochastic programming software package is developed and implemented. Numerical experiments show the coordinating effect of the scenario tracking methodology. Finally, the symmetric Hamiltonian paths in a high-dimensional hypercube are discussed. A heuristic approach is applied to construct the symmetric Hamiltonian paths in the five-, six-, and seven-dimensional hypercubes. They are equivalent to the symmetrical Gray codes. In summary, this thesis proposes three major approaches applied to the ordering of multiattribute data: a set of analytical encoding and decoding formulas for the twodimensional Hilbert order, two new spatial orders with significant simple encoding and decoding formulas, and a set of optimization models for the range query problem.
Item Metadata
Title |
On the ordering of multiattribute data in information retrieval systems
|
Creator | |
Publisher |
University of British Columbia
|
Date Issued |
1996
|
Description |
The requirement of ordering a set of multiattribute data arises frequently from spatial
index processing and secondary key retrieval in modern information systems. The
first application involves developing a single numerical index on a one-dimensional line
for each point in multi-dimensional space, such that the spatial localizability can be
preserved as best as possible. This work can be carried out by a mathematical transformation
called spatial ordering. The Hilbert order has attained intensive interest in
the literature due to its desirable performance. The lack of inexpensive encoding and
decoding algorithms has been mentioned frequently in publications.
The second application, secondary key retrieval, involves two issues: determining
the resolution for each dimension in a multi-dimensional hashed space and ordering
data blocks on disks. Aho and Ullman proposed a constrained nonlinear programming
model applied to the partial match query. However, the literature is relatively silent on
the range query retrieval. Faloutsos showed that the order determined by the reflected
Gray code can significantly improve the cluster distribution of the data blocks. It is
recommended to design symmetric Gray codes to reduce the bias introduced by the
reflected Gray code. In this thesis, efficient analytical formulas are first derived to encode and decode
the two-dimensional Hilbert order. Then the algorithms for the three-dimensional
Hilbert order are discussed.
The encoding and decoding processes for n-dimensional Hilbert orders (n > 4)
would be considerably complicated and place a heavy computational burden on the
application problems. Thus a new spatial order called the U order is proposed. Its
major advantage is the significant simplicity of the encoding and decoding operations
for multi-dimensional data. Performance assessments suggest that the U order is comparable
with the Hilbert order and superior to most other orders. The performance
analysis is followed by a formal mathematical description for the n-dimensional U
order. Then a set of analytical encoding and decoding formulas are presented.
Besides the U order, several new spatial orders with the quadrant-recursive structure
are also investigated. The goal is to find the candidates that are competitive
with the Hilbert order in performance and with relative simple encoding and/or decoding
formulas. Of these new orders, the Q4 order behaves best and its performance
is very close to the Hilbert order. An analysis shows that its encoding and decoding
algorithms only need 64.5% and 84.2% of operations of the corresponding algorithms
of the Hilbert order.
The optimization of resolutions of multiattribute file systems is then discussed.
Optimization models for the range query retrievals are presented by incorporating
stochastic programming methodology. Three approaches axe discussed. The hereand-
now and the wait-and-see approaches seem to be applicable to the situation in which range queries follow a relative simple distribution. Four typical paradigms are
analyzed. One of them leads to a result that is similar to the one used in industry
to evaluate the performance of disk drives. The scenario tracking approach appears
to be more flexible and more appropriate for the problems where the probability
distribution is not available. It also provides an adaptive technique to formulate
non-conventional stochastic programming problems. Then a stochastic programming
software package is developed and implemented. Numerical experiments show the
coordinating effect of the scenario tracking methodology.
Finally, the symmetric Hamiltonian paths in a high-dimensional hypercube are
discussed. A heuristic approach is applied to construct the symmetric Hamiltonian
paths in the five-, six-, and seven-dimensional hypercubes. They are equivalent to
the symmetrical Gray codes.
In summary, this thesis proposes three major approaches applied to the ordering
of multiattribute data: a set of analytical encoding and decoding formulas for the twodimensional
Hilbert order, two new spatial orders with significant simple encoding and
decoding formulas, and a set of optimization models for the range query problem.
|
Extent |
6341419 bytes
|
Genre | |
Type | |
File Format |
application/pdf
|
Language |
eng
|
Date Available |
2009-02-19
|
Provider |
Vancouver : University of British Columbia Library
|
Rights |
For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use.
|
DOI |
10.14288/1.0065151
|
URI | |
Degree | |
Program | |
Affiliation | |
Degree Grantor |
University of British Columbia
|
Graduation Date |
1996-05
|
Campus | |
Scholarly Level |
Graduate
|
Aggregated Source Repository |
DSpace
|
Item Media
Item Citations and Data
Rights
For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use.