Patterns and privacy preservation with prior knowledge for classification

UBC Theses and Dissertations

Patterns and privacy preservation with prior knowledge for classification Bu, Shaofeng

Abstract

Privacy preservation is a key issue in outsourcing data mining. When we seek approaches to protect the sensitive information contained in the original data, it is also important to preserve the mining outcome. We study the problem of privacy preservation in outsourcing classification, including decision tree classification, support vector machines (SVMs), and linear classification. We investigate the possibility of guaranteeing no-outcome-change (NOC) and consider attack models with prior knowledge. We first conduct our investigation in the context of building decision trees. We propose a piecewise transformation approach built on two central ideas: breakpoints and monochromatic pieces. We show that the decision tree is preserved if the transformation functions used for the pieces satisfy global (anti-)monotonicity. We show empirically that the proposed piecewise transformation approach delivers a secure level of privacy and substantially reduces disclosure risk. We then propose two transformation approaches for outsourcing SVM: (i) principled orthogonal transformation (POT) and (ii) true negative point (TNP) perturbation. We show that POT always guarantees no-outcome-change for both linear and non-linear SVM. The TNP approach gives the same guarantee when the data set is linearly separable. For linearly non-separable data sets, we show that no-outcome-change is not always possible and propose a variant of the TNP perturbation that aims to minimize the change to the SVM classifier. Experimental results show that the two approaches are effective against powerful attack models. In the last part, we extend the POT approach to linear classification models and propose combining POT with random perturbation. We conduct a detailed set of experiments and show that, by adding less noise, the proposed combination can reduce the change to the mining outcome while still providing a high level of privacy protection.
We further investigate the POT approach and propose a heuristic to break down the correlations between the original values and the corresponding transformed values of subsets. We show that the proposed heuristic can significantly improve privacy protection in the worst case.
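
The two preservation claims in the abstract rest on simple invariants, which the following minimal sketch illustrates. It is not the thesis's implementation: the 2-D rotation stands in for a general orthogonal transformation, the single increasing map `3 * a + 7` stands in for the piecewise (anti-)monotone functions, and all function names and toy data are assumptions made for illustration. An orthogonal transformation leaves pairwise dot products unchanged, so kernels computed from dot products or distances see the same inputs. A strictly increasing map leaves the order of attribute values unchanged, so every threshold split partitions the class labels the same way.

```python
import math

def rotate(points, theta):
    """Apply a 2-D rotation (one kind of orthogonal transformation)."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y) for x, y in points]

def dot(p, q):
    """Dot product of two 2-D points."""
    return p[0] * q[0] + p[1] * q[1]

def gram(points):
    """Matrix of pairwise dot products (the linear-kernel Gram matrix)."""
    return [[dot(p, q) for q in points] for p in points]

def label_order(values, labels):
    """Class labels listed in increasing order of the attribute value."""
    return [lab for _, lab in sorted(zip(values, labels))]

# Hypothetical toy data, not taken from the thesis.
points = [(1.0, 2.0), (-0.5, 3.0), (4.0, -1.0)]
rotated = rotate(points, theta=0.7)

# Dot products before and after the rotation agree up to rounding error.
gram_matches = all(
    math.isclose(a, b, abs_tol=1e-9)
    for row_a, row_b in zip(gram(points), gram(rotated))
    for a, b in zip(row_a, row_b)
)

ages = [23, 35, 41, 58]
labels = ["no", "no", "yes", "yes"]
disguised = [3 * a + 7 for a in ages]  # arbitrary strictly increasing map

# The label sequence seen by a threshold-based split search is unchanged.
order_matches = label_order(ages, labels) == label_order(disguised, labels)

print(gram_matches, order_matches)
```

The sketch checks only the invariants themselves. It does not train a classifier, and it says nothing about how well either transformation resists an attacker with prior knowledge, which is the question the thesis studies.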

Item Metadata

Title	Patterns and privacy preservation with prior knowledge for classification
Creator	Bu, Shaofeng
Publisher	University of British Columbia
Date Issued	2010
Genre	Thesis/Dissertation
Type	Text
Language	eng
Date Available	2010-09-28
Provider	Vancouver : University of British Columbia Library
Rights	Attribution-NonCommercial-NoDerivatives 4.0 International
DOI	10.14288/1.0051744
URI	http://hdl.handle.net/2429/28749
Degree	Doctor of Philosophy - PhD
Program	Computer Science
Affiliation	Science, Faculty of; Computer Science, Department of
Degree Grantor	University of British Columbia
Graduation Date	2010-11
Campus	UBCV
Scholarly Level	Graduate
Rights URI	http://creativecommons.org/licenses/by-nc-nd/4.0/
Aggregated Source Repository	DSpace
