Parallel computation of high dimensional robust correlation and covariance matrices

UBC Theses and Dissertations

Chilson, James

Abstract

Currently, data mining applications use classical methods to calculate covariance and correlation matrices. These methods have the drawback that they can be adversely affected by outliers in the data set. Robust methods for calculating covariance and correlation matrices are therefore useful for these applications, but they take more time to compute. To counter this, we propose two parallel robust methods for calculating correlation and covariance matrices. The first algorithm is a parallel version of Quadrant Correlation (QC), and the second is a parallel version of the Maronna method. Parallel QC uses a parallel matrix library and can handle one-dimensional outliers in its data. The parallel Maronna method divides the independent correlation calculations among the processors and is capable of detecting one- and two-dimensional outliers in the data. We evaluate these algorithms using a data set from a "real-life" application: a genetic data set from cardiovascular research that contains 6068 variables. Our evaluation also includes performance results on data sets of varying dimensions, the performance of several algorithm components, a communications analysis, and improvements to the Maronna method.

From our results we conclude that our parallel algorithms make the robust calculation of correlation and covariance matrices useful in applications that deal with high-dimensional data, such as data mining. Our initial hypothesis was that Maronna would parallelize better than QC, to the point that Maronna would be faster. In fact, we found that Maronna does parallelize better than QC, in that it scales to more processors; however, our experiments do not show that parallel Maronna takes less time. We conclude that QC and Maronna are two viable options for computing robust correlation and covariance matrices: QC is faster but less robust and does not scale as well to many processors, while Maronna takes longer, is more robust, and scales to many processors.
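The abstract names the two estimators without defining them. As a rough sketch, assuming the common textbook form of quadrant correlation (signs about each variable's median, followed by the sin(pi/2 * r) transform that makes it consistent for Gaussian data), the Python below computes a QC matrix and shows a round-robin split of variable pairs, the kind of independent per-pair work division the abstract describes for parallel Maronna. The function names and the round-robin scheme are hypothetical illustrations, not the thesis's code; the Maronna pairwise estimator itself is not shown, and everything runs serially here.

```python
import math
from statistics import median


def sign_matrix(data):
    """Map an n x p data matrix (list of rows) to signs about each column's median.

    Each entry becomes +1, -1, or 0 (exactly at the median).
    """
    n_vars = len(data[0])
    medians = [median(row[k] for row in data) for k in range(n_vars)]
    return [[(v > m) - (v < m) for v, m in zip(row, medians)] for row in data]


def quadrant_correlation_matrix(data):
    """Gaussian-consistent quadrant correlation matrix (assumed textbook form).

    r_Q = S^T S / n is the mean product of signs; for bivariate normal data
    E[r_Q] = (2 / pi) * arcsin(rho), so sin(pi / 2 * r_Q) estimates rho.
    The S^T S product is the step a parallel matrix library would distribute.
    """
    s = sign_matrix(data)
    n, p = len(s), len(s[0])
    r = [[1.0] * p for _ in range(p)]  # diagonal fixed at 1
    for i in range(p):
        for j in range(i + 1, p):
            r_q = sum(row[i] * row[j] for row in s) / n
            r[i][j] = r[j][i] = math.sin(math.pi / 2 * r_q)
    return r


def partition_pairs(p, n_workers):
    """Deal the p*(p-1)/2 variable pairs (i < j) round-robin over workers.

    Hypothetical scheme: each pairwise estimate is independent, so any
    partition is valid; round-robin keeps per-worker counts within one.
    """
    buckets = [[] for _ in range(n_workers)]
    pairs = ((i, j) for i in range(p) for j in range(i + 1, p))
    for k, pair in enumerate(pairs):
        buckets[k % n_workers].append(pair)
    return buckets
```

Because only the signs about the median enter r_Q, replacing one value with an extreme outlier leaves the estimate unchanged as long as that value stays on the same side of its median. For the 6068-variable data set mentioned above, the pairwise split covers 6068 * 6067 / 2 = 18,407,278 independent pairs.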

Item Metadata

Title	Parallel computation of high dimensional robust correlation and covariance matrices
Creator	Chilson, James
Publisher	University of British Columbia
Date Issued	2004
Extent	3220459 bytes
Genre	Thesis/Dissertation
Type	Text
File Format	application/pdf
Language	eng
Date Available	2009-11-17
Provider	Vancouver : University of British Columbia Library
Rights	For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use.
DOI	10.14288/1.0051624
URI	http://hdl.handle.net/2429/15177
Degree	Master of Science - MSc
Program	Computer Science
Affiliation	Science, Faculty of; Computer Science, Department of
Degree Grantor	University of British Columbia
Graduation Date	2004-05
Campus	UBCV
Scholarly Level	Graduate
Aggregated Source Repository	DSpace

Item Media

ubc_2004-0139.pdf -- 3.07MB
