UBC Theses and Dissertations
Secure query answering and privacy-preserving data publishing Wang, Hui
The last several decades have witnessed a phenomenal growth in the networking infrastructure connecting computers all over the world. The Web has now become an ubiquitous channel for information sharing and dissemination. More and more data is being exchanged and published on the Web. This growth has created a whole new set of research challenges, while giving a new spin to some existing ones. For example, XML(eXtensible Markup Language), a self-describing and semi-structured data format, has emerged as the standard for representing and exchanging data between applications across the Web. An important issue of data publishing is the protection of sensitive and private information. However, security/privacy-enhancing techniques bring disadvantages: security-enhancing techniques may incur overhead for query answering, while privacy-enhancing techniques may ruin data utility. In this thesis, we study how to overcome such overhead. Specifically, we address the following two problems in this thesis: (a) efficient and secure query evaluation over published XML databases, and (b) publishing relational databases while protecting privacy and preserving utility. The first part of this thesis focuses on efficiency and security issues of query evaluation over XML databases. To protect sensitive information in the published database, security policies must be defined and enforced, which will result in unavoidable overhead. Due to the security overhead and the complex structure of XML databases, query evaluation may become inefficient. In this thesis, we study how to securely and efficiently evaluate queries over XML databases. First, we consider the access-controlled database. We focus on a security model by which every XML element either is locally assigned a security level or inherits the security level from one of its ancestors. Security checks in this model can cause considerable overhead for query evaluation. We investigate how to reduce the security overhead by analyzing the subtle interactions between inheritance of security levels and the structure of the XML database. We design a sound and complete set of rules and develop efficient, polynomial-time algorithms for optimizing security checks on queries. Second, we consider encrypted XML database in a "database-as-service" model, in which the private database is hosted by an untrusted server. Since the untrusted server has no decryption key, its power of query processing is very limited, which results in inefficient query evaluation. We study how to support secure and efficient query evaluation in this model. We design the metadata that will be hosted on the server side with the encrypted database. We show that the presence of the metadata not only facilitates query processing but also guarantees data security. We prove that by observing a series of queries from the client and responses by itself, the server's knowledge about the sensitive information in the database is always below a given security threshold. The second part of this thesis studies the problem of preserving both privacy and the utility when publishing relational databases. To preserve utility, the published data will not be perturbed. Instead, the base table in the original database will be decomposed into several view tables. First, we define a general framework to measure the likelihood of privacy breach of a published view. We propose two attack models, unrestricted and restricted models, and derive formulas to quantify the privacy breach for each model. Second, we take utility into consideration. Specifically, we study the problem of how to design the scheme of published views, so that data privacy is protected while maximum utility is guaranteed. Given a database and its scheme, there are exponentially many candidates for published views that satisfy both privacy and utility constraints. We prove that finding the globally optimal safe and faithful view, i.e., the view that does not violate any privacy constraints and provides the maximum utility, is NP-hard. We propose the locally optimal safe and faithful view as the heuristic, and show how we can efficiently find a locally optimal safe and faithful view in polynomial time.
Item Citations and Data