UBC Theses and Dissertations

UBC Theses Logo

UBC Theses and Dissertations

Schema mediation and query processing in Peer Data Management Systems Zhao, Jie 2006

Your browser doesn't seem to have a PDF viewer, please download the PDF to view this item.

Item Metadata

Download

Media
831-ubc_2006-0732.pdf [ 5MB ]
Metadata
JSON: 831-1.0051733.json
JSON-LD: 831-1.0051733-ld.json
RDF/XML (Pretty): 831-1.0051733-rdf.xml
RDF/JSON: 831-1.0051733-rdf.json
Turtle: 831-1.0051733-turtle.txt
N-Triples: 831-1.0051733-rdf-ntriples.txt
Original Record: 831-1.0051733-source.json
Full Text
831-1.0051733-fulltext.txt
Citation
831-1.0051733.ris

Full Text

Schema M e d i a t i o n and Q u e r y Processing in Peer D a t a Management Systems by Jie Zhao  '  B . S c , Fudan University, 2003  A THESIS S U B M I T T E D IN P A R T I A L F U L F I L M E N T O F THE REQUIREMENTS FOR T H E DEGREE OF Master of Science in The Faculty of Graduate Studies (Computer Science)  The University Of British Columbia October 2006 © Jie Zhao, 2006  Abstract P 2 P D a t a M a n a g e m e n t S y s t e m s ( P D M S s ) a l l o w t h e efficient s h a r i n g o f d a t a b e t w e e n peers w i t h o v e r l a p p i n g sources of i n f o r m a t i o n . T h e s e s o u r c e s s h a r e d a t a t h r o u g h semantic m a p p i n g s between peers. In current systems, queries are a s k e d over e a c h p e e r ' s l o c a l s c h e m a a n d t h e n t r a n s l a t e d u s i n g t h e sem a n t i c m a p p i n g s b e t w e e n p e e r s . I n t h i s t h e s i s we p r o p o s e t h a t a m e d i a t e d s c h e m a c a n b e n e f i t P D M S s b y a l l o w i n g access t o m o r e d a t a a n d b y m a k i n g t h a t access c o m p r e h e n s i b l e . W e p r e s e n t o u r s y s t e m - M e P S y s , w h i c h uses t h e m e d i a t e d s c h e m a as a m e d i a for q u e r y t r a n s l a t i o n i n r e l a t i o n a l P D M S s . W e show how to create a m e d i a t e d s c h e m a i n P D M S s a u t o m a t i c a l l y u s i n g t h e s e m a n t i c s of t h e e x i s t i n g m a p p i n g s p r o v i d e d t o t r a n s l a t e q u e r i e s .  We  f u r t h e r d i s c u s s h o w t o u p d a t e t h e m e d i a t e d s c h e m a i n a s t a b l e s t a t e , i.e., after t h e s y s t e m s e t u p p e r i o d .  ii  Table of Contents Abstract  ". .  ii  Table of Contents  iii  List of Tables  vi  List of Figures  vii  Acknowledgements  ix  Introduction  1  1.1  Overview  1  1.2  A Motivating Example of P D M S  1.3  Challenges and Contributions  1  2  3 5  Preliminaries  10  2.1  Definitions w.r.t. Mediated schema creation  10  2.1.1  Datalog  10  2.1.2  Mediated Schema, L A V , G A V and G L A V  10  2.1.3  Query and Mapping  13  2.2  3  ,  Definitions w.r.t. P D M S  14  2.2.1  Query Reformulation and Result Reformulation . . . .  14  2.2.2  Mapping Composition  15  Related W o r k  3.1  Peer Data Management Systems  3.2  Mediated Schema Creation  16  . . . .•  16 18  iii  Table  4  5  6  19  4.1  Motivation  19  4.2  Definitions a n d Analysis  21  Creating a Mediated Schema in P D M S  25  5.1  P o t t i n g e r ' s S c h e m a M e d i a t i o n A l g o r i t h m for D I S  25  5.2  P r o b l e m Definition  27  5.3  MappingTable Creation  34  5.3.1  Intuitions  34  5.3.2  Algorithm  35  5.3.3  Section S u m m a r y  36  5.4  Mapping Compatible Check  37  5.5  Peer Schema M e d i a t i o n  39  5.5.1  System Setup Phase - Schema Mediation  40  5.5.2  Section S u m m a r y  49  50  Updating the Mediated Schema  6.2  6.3  8  Contents  Introducing Concept into Conjunctive Mappings  6.1  7  of  A d d i n g a N e w Peer to the System  50  6.1.1  53  Algorithm  D r o p p i n g a Peer from the System  54  6.2.1  61  Section S u m m a r y  E v o l u t i o n of a Peer L o c a l S c h e m a  61  A Study of Mapping Composition  63  7.1  C o m p l e x i t y : W h e r e Difficulties C o m e F r o m  63  7.2  E x p l o r i n g Possible Patterns in M a p p i n g Composition  66  7.2.1  D i f f e r e n t P a t t e r n s of M a p p i n g C o m p o s i t i o n  68  7.2.2  A n A n a l y s i s of M a p p i n g C o m p o s i t i o n P a t t e r n s  System Implementation and Experiment  . . . .  73  76  8.1  System Implementation and Setup  76  8.2  Input Schemas a n d M a p p i n g s  77  8.3  E x p e r i m e n t 1: S c h e m a M e d i a t i o n  77  8.4  E x p e r i m e n t 2: Q u e r y R e f o r m u l a t i o n  78  iv  Table of Contents  8.5  9  E x p e r i m e n t 3: U p d a t i n g t h e M e d i a t e d S c h e m a  Conclusion and Future Work  81  84  Bibliography  86  A Details for Example 15 .  89  B Details for Example 21  95  List of Tables 7.1  A S u m m a r y of M a p p i n g C o m p o s i t i o n C o m p l e x i t y f r o m [11, 18]  67  7.2  D i f f e r e n t P a t t e r n s for M a p p i n g C o m p o s i t i o n  71  8.1  U p d a t i n g the M e d i a t e d Schema.for A d d i n g a N e w Peer  . . .  82  i  vi  List of Figures 1.1  Query Processing in traditional P D M S  4  5.1  T h e C r e a t e M e d i a t e d S c h e m a A l g o r i t h m f r o m [22]  26  5.2  T h e C r e a t e M a p p i n g A l g o r i t h m f r o m [22]  27  5.3  Q u e r y R e w r i t i n g A l g o r i t h m f r o m [22]  28  5.4  Query Processing in M e P S y s  29  5.5  M a p p i n g T a b l e for E x a m p l e 6  5.6  M a p p i n g T a b l e for E x a m p l e 12(a)  5.7  M a p p i n g T a b l e for E x a m p l e 12(b)  5.8  Create MappingTable Algorithm  5.9  Merge MappingTable Algorithm  . . .  34 35 . .  35 37  '.  38  5.10 U p d a t e M a p p i n g T a b l e A l g o r i t h m  39  5.11 M a p p i n g C o m p a t i b l e C h e c k A l g o r i t h m  40  5.12 S c h e m a M e d i a t i o n S t a r t - u p  41  5.13 S y s t e m S e t u p A l g o r i t h m  43  5.14 M e r g e T w o M e d i a t e d S c h e m a A l g o r i t h m  .  45  5.15 C o m p u t e G L A V M a p p i n g A l g o r i t h m  46  5.16 Q u e r y R e f o r m u l a t i o n A l g o r i t h m  48  6.1  A d d i n g a N e w Peer to the S y s t e m  51  6.2  U p d a t e mediated information w h e n new peer joins the network  53  6.3  Peer Leaving ( B a d Topology)  57  6.4  U p d a t e m e d i a t e d i n f o r m a t i o n w h e n a p e e r leaves t h e n e t w o r k  58  6.5  E x a m p l e 17 I n p u t I n f o r m a t i o n (a)  59  6.6  E x a m p l e 17 I n p u t I n f o r m a t i o n (b)  6.7  E x a m p l e 17 I n p u t I n f o r m a t i o n (c)  ; . . . .  59 60  vii  List of Figures 6.8  E x a m p l e 17 O u t p u t (a)  61  6.9  E x a m p l e 17 O u t p u t (b)  61  6.10 E x a m p l e 17 O u t p u t (c)  62  8.1  T i m e of b u i l d i n g a m e d i a t e d s c h e m a V . S . N u m b e r of H o p s . .  79  8.2  T i m e of Q u e r y R e f o r m u l a t i o n a n d B r o a d c a s t i n g V . S . D e p t h of T o p o l o g y  8.3  80  T i m e of Q u e r y R e f o r m u l a t i o n i n t h e n e t w o r k V . S . M a x L o c a l C o m p u t i n g T i m e for 10 t i m e s Q u e r y R e f o r m u l a t i o n  81  Acknowledgements I w o u l d like t o t h a n k a l l of people g i v i n g m e s u p p o r t a n d help d u r i n g m y s t u d y at U B C . W i t h o u t your s u p p o r t , this thesis cannot b e done. F i r s t of a l l , I w o u l d like to t h a n k m y supervisor, R a c h e l P o t t i n g e r , for always g i v i n g m e s u p p o r t , encouragement a n d excellent guidance t h r o u g h o u t m y t h e s i s w o r k . I s p e n t a w o n d e r f u l t i m e w o r k i n g w i t h R a c h e l for t h e p a s t one a n d h a l f y e a r s . I w o u l d l i k e t o t h a n k G e o r g e T s i k n i s f o r d e d i c a t i n g t i m e a n d effort i n r e v i e w i n g m y thesis a n d g i v i n g m e valuable c o m m e n t s a n d advise. I w o u l d like to t h a n k a l l m y friends i n the department a n d i n the database lab for your continuous help, a n d especially the following people: Shaofeng B u , G a n g Peng, X u n Sun, Shuan Wang, Wendy Wang, Jian X u , Chao Y a n , Suling Yang and Xiaodong Zhou. L a s t , I w o u l d like t o t h a n k m y parents a n d A l e x L i u for their endless love a n d s u p p o r t .  Jie Zhao October 2006  ix  Chapter 1  Introduction 1.1  Overview  A peer-topeer  ( P 2 P ) network  is a n e t w o r k  i n w h i c h every  participating  node i n t h e network provides power, b a n d w i d t h a n d other resources rather t h a n o n l y r e l y i n g o n a s m a l l n u m b e r o f servers. It is a d e c e n t r a l i z e d a n d d i s t r i b u t e d system. C o m p a r e d to a client-server architecture where t h e n u m b e r of servers a r e f i x e d , P 2 P is m o r e f l e x i b l e a n d e x t e n s i b l e b e c a u s e w h e n n e w nodes arrive, the t o t a l capacity of the s y s t e m increases a c c o r d i n g l y  while  i n client-server environment, the a d d i n g of n e w clients means t h e slowing d o w n of the w h o l e s y s t e m . P 2 P systems are v e r y useful a n d famous nowad a y s a m o n g users w h o w a n t t o s h a r e files w h i l e users a r e e v e r y w h e r e across t h e w o r l d . F o r e x a m p l e , N a p s t e r [3] is a s u c c e s s f u l s y s t e m f o r m u s i c s h a r i n g using P 2 P  infrastructure.  A P e e r D a t a M a n a g e m e n t S y s t e m ( P D M S ) (e.g. [7, 8, 1 4 , 24]) is t h e result of b l e n d i n g t h e benefits of peer-to-peer ( P 2 P ) networks, such as lack of a c e n t r a l i z e d a u t h o r i t y , w i t h t h e r i c h e r s e m a n t i c s o f a d a t a b a s e .  Peer  D a t a M a n a g e m e n t S y s t e m ( P D M S ) is a D a t a M a n a g e m e n t S y s t e m u s i n g P 2 P architecture.  E a c h peer i n the system holds a database.  It c a n b e  extensively used for d a t a exchanging, query answering, i n f o r m a t i o n s h a r i n g , etc.  I n t h e areas o f s c i e n t i f i c r e s e a r c h , t h e i d e a o f s e t t i n g u p a P D M S f o r  research i n t h e related area to share d a t a a m o n g peers has already been widely discussed a n d admitted. A P D M S , as a P 2 P s y s t e m itself, keeps t h e p r o p e r t i e s o f a l l P 2 P s y s t e m s : E v e r y p e e r m a y j o i n a n d leave t h e n e t w o r k  at a n y time.  A l l peers are  a u t o n o m o u s l y c r e a t e d a n d m a n a g e d . H o w e v e r , A P D M S also r e q u i r e s t h a t e a c h p e e r , u p o n e n t e r i n g t h e n e t w o r k , p u b l i s h e s i t s d a t a b a s e s c h e m a so t h a t  1  Chapter 1. Introduction it c a n b e seen b y o t h e r p e e r s i n t h e n e t w o r k , a n d t h e e n t e r i n g p e e r m u s t c r e a t e a m a p p i n g b e t w e e n i t s e l f a n d one o r m o r e n e i g h b o r s . W e c a l l p e e r s b e t w e e n w h i c h t h e r e e x i s t s a d i r e c t m a p p i n g acquaintances.  T h e s e inter-peer  m a p p i n g s a l l o w q u e r y t r a n s l a t i o n s c h e m e s [8, 2 4 , 26] w h i c h a l l o w u s e r s t o easily, s e m a n t i c a l l y q u e r y d a t a f r o m s o u r c e s t h a t are n o t t h e i r o w n .  PDMSs  are a d a t a m a n a g e m e n t a r c h i t e c t u r e w h e r e n o t a l l n o d e s i n t h e s y s t e m n e e d to b e there constantly. P D M S s are o f t e n c o m p a r e d w i t h a D a t a I n t e g r a t i o n S y s t e m ( D I S ) w h i c h uses a c l i e n t - s e r v e r a r c h i t e c t u r e .  A m a j o r differences b e t w e e n a D I S a n d  P D M S is t h e c e n t r a l i z e d c o n t r o l . I n a t r a d i t i o n a l D I S , t h e r e is a l w a y s one o r m o r e c e n t r a l servers t o s t o r e a g l o b a l v i e w f o r a l l c l i e n t s ' d a t a b a s e s c h e m a s . Q u e r i e s a r e t y p i c a l l y p o s e d over t h e g l o b a l s c h e m a a n d t h e n t r a n s l a t e d t o local schemas.  A l l t h e q u e r y p r o c e s s i n g w o r k s u c h as q u e r y t r a n s l a t i o n ,  q u e r y f o r w a r d i n g , r e c e i v i n g a n s w e r s , a n d t r a n s l a t i n g a n s w e r s e n t i r e l y relies o n t h o s e servers. T h u s , i f t h e servers b r e a k d o w n , o r b e c o m e u n a v a i l a b l e , t h e whole system w i l l fail. C o n s i d e r i n g the w o r k l o a d of the whole system, the c e n t r a l i z e d c o n t r o l d e f i n i t e l y r e s u l t s i n a severe b o t t l e n e c k . side, a P D M S  O n the other  is a d y n a m i c a n d l o o s e l y - c o u p l e d e n v i r o n m e n t .  There are  n o c e n t r a l servers i n t h e P D M S . E v e r y p e e r is b o t h a server a n d a c l i e n t . E v e n t h o u g h , s o m e n o d e s m i g h t t a k e m o r e r e s p o n s i b i l i t i e s , e.g., s u p e r - p e e r s t r u c t u r e d P 2 P s y s t e m , these n o d e s are s t i l l n o t t a k i n g t h e r e s p o n s i b i l i t i e s o f c o n t r o l l i n g a l l n o d e s , a n d t h e r e s u l t i n g a r c h i t e c t u r e is m o r e d e c e n t r a l i z e d . I n m o s t c u r r e n t P D M S s , a q u e r y is p o s e d over a l o c a l p e e r s c h e m a , a n d a n s w e r s w i l l b e c o m p u t e d i n t h i s p e e r d a t a b a s e . T h e q u e r y w i l l also b e f o r w a r d e d and reformulated t o other peers i n t h e network, a n d query results w i l l b e sent b a c k . T h e f i n a l r e s u l t set r e t u r n i n g t o t h e u s e r w i l l b e a n s w e r s f r o m t h e l o c a l p e e r as w e l l as t h o s e f r o m o t h e r p e e r d a t a b a s e s . T h e g o a l o f t h i s t h e s i s is t o s t u d y t h e p r o p e r t i e s o f a P D M S , c o m p a r e and survey existing P D M S s , a n d look for more opportunities t o improve t h e c u r r e n t P D M S d e s i g n t o l e t t h e users h a v e a b e t t e r u n d e r s t a n d i n g o f the query processing result a n d to speed u p the query reformulation process a m o n g peers.  2  Chapter 1. Introduction  1.2  A Motivating Example of P D M S  Given the above properties of flexibility and decentralization, a P D M S is very suitable for scientific research data sharing and exchange [21]. There are many scenarios where a P D M S would be useful. In bioinformatics this has already been studied from both the data management approach [17, 21] and also from the bioinformatics approach [25]. Imagine that the Biology labs from different universities are doing research in similar areas. Some might have a partial gene sequence data of a sparrow, and some might have another partial gene sequence data of a sparrow. Researchers from either lab wants more data for further experiment.  Even though their results  have been published, the data still need some way to be shared by others. Thus a P D M S where each peer comes from a biology lab doing similar gene expression research can handle such a task well. In general, the types of scenarios that we can imagine that would benefit the most from a P D M S are those scenarios where people have overlapping structured data, and they want to access other sources' additional structured information. Rather than relying on readers having an understanding of the intricacies of biological data, we present a scenario which is familiar to readers of Computer Science area: bibliography management in Example 1. Example 1 Assume that there are two database servers of bibliography information: U B C and U W , and that they have the following relations storing information about conference papers: UW.conf-paper(title, venue, year, pages) UBC.conf-paper(title, conference, year, url)  As in this example, it is very likely that each database server only stores part of the record about conference papers, e.g., only U W has page information and only U B C has url. Additionally, even though they may have overlapped record for the same paper, the content might still be different. For instance,("Data Management for Peer-to-Peer Computing: A Vision", "Madison, Wisconsin, U S A " , "2002", 6) is one data entry in UW.conf-paper; 3  Chapter  1.  Introduction  ("Data Management for Peer-to-Peer Computing: A Vision","WebDB", "2002",  "citeseer.ist.psu.edu/bernstein02data.html") is the data entry for  the same paper in UBC.conf-paper. Given the similarity of relations above, along with the difference in available data, it would be beneficial for users of the two databases to be able to share their data on conference papers.  • /Queiy/f^efbrmulation  1  U B C liJser Quer>';Qu»e  UBCDB  Figure 1.1: Query Processing in traditional P D M S Figure 1.1 is a picture to explain how queries are propagated and translated back in the P2P systems. In most PDMSs (see figure 1.1) a user at peer U B C will ask a query posed over UBC's schema. Answers are computed at U B C , and the query is also reformulated and forwarded to other peers through mapping paths in the network. For example, assume U B C has a mapping MapuBCMW  to U W . Using MapuBCMW,  a query  QUBC  will be asked over U B C and then reformulated to Quw, a query over U W ' s schema. Quw will be processed at peer UW. U W will reformulate Quw to query over any additional acquaintances' schemas in the same way. Thus queries can be forwarded as far along the mapping path as possible, subject to system constraints. Query results will be sent back to the peer U B C after a similar series of reformulations. Thus, the user at U B C will get answers not only from local database but from other peer database as well.  4  Chapter  1.3  1.  Introduction  Challenges and Contributions  T h e m e t h o d s i n E x a m p l e 1 have been well s t u d i e d a n d allow t r a n s l a t i o n a n d a n s w e r i n g of a l a r g e class of queries [26].  However, w h e n the full context  is l o o k e d a t , t h e r e are a d d i t i o n a l o p p o r t u n i t i e s for i m p r o v i n g t h e u s e f u l n e s s a n d c o m p r e h e n s i b i l i t y of q u e r y t r a n s l a t i o n i n P D M S s , w h i c h we t a c k l e i n this thesis. F i r s t , w e get r i d of t h e r e s t r i c t i v e q u e r y f o r m a t .  In previous methods,  only information supported b y the local schema can be queried. F u r t h e r , a user at a p e e r c a n n o t k n o w w h a t o t h e r i n f o r m a t i o n is a v a i l a b l e i n t h e netw o r k . T h e u s e r c a n o n l y see t h e l o c a l s c h e m a , d e s p i t e t h e fact t h a t a c t u a l l y t h e r e is m u c h m o r e i n f o r m a t i o n i n t h e P D M S . T h i s is c l e a r l y s u b o p t i m a l ; s i n c e t h e i n f o r m a t i o n is p r e s e n t , users s h o u l d b e a b l e t o f u l l y u t i l i z e these resources a n d n o t b e r e s t r i c t e d b y t h e f o r m a t of l o c a l s c h e m a s . F o r e x a m p l e , i n E x a m p l e 1, users at U B C w o u l d b e u n a b l e t o access t h e page i n f o r m a t i o n at U W b e c a u s e t h e y c a n o n l y ask q u e r i e s over, t h e i r o w n s c h e m a . S e c o n d , we i m p r o v e t h e c o m p r e h e n s i b i l i t y of t h e P D M S . C u r r e n t s y s tems can either fail to take into account some expressiveness available to the s y s t e m , or r e s u l t i n s y s t e m s t h a t are d i f f i c u l t t o u n d e r s t a n d , as is s h o w n i n E x a m p l e 2: E x a m p l e 2 C o n s i d e r t h e f o l l o w i n g m a p p i n g s , w h e r e (2.1) r e l a t e s s o u r c e A t o t a r g e t B , a n d (2.2) r e l a t e s s o u r c e B t o t a r g e t C . C m e a n s t h e l e f t - h a n d s i d e is a s u b s e t of t h e r i g h t - h a n d s i d e . A t t r i b u t e s i n different s o u r c e s t h a t c a n b e m a p p e d t o e a c h o t h e r are u s i n g t h e s a m e v a r i a b l e n a m e s .  A. grandpa(x, y) C B.father(x, X\), B . f a t h e r ^ , y)  (2.1)  B. father(x, x l ) , B . f a t h e r ^ , X2), B.father(x2, y) C C.greatgrandpa(a;, y)  (2.2)  T h i s is i s o m o r p h i c t o a m a p p i n g i n [20].  (2.1) says t h a t if x is t h e  g r a n d p a of y, t h e n t h e r e is s o m e x\ s u c h t h a t x is x\s f a t h e r . M a p p i n g (2.2) says t h a t if x a n d x\,  x\  f a t h e r , a n d x\ is y ' s  a n d X2, ^2 a n d X3 a l l h a v e  f a t h e r r e l a t i o n s h i p , t h e n x a n d y w i l l b e t h e g r e a t g r a n d p a of y.  U s i n g the  5  Chapter 1. Introduction a l g o r i t h m d e s c r i b e d i n P i a z z a [20], c o m p o s i n g m a p p i n g s (2.1) a n d (2.2) t o m a p source A to source C results i n : A.grandpa(z, X2), A.grandpa(x , y) C C.greatgrandpa(a:, y{)  (2.3)  2  (2.3) says t h a t i f x is X2 s g r a n d p a a n d £2 is y ' s g r a n d p a , t h e n x is J  s o m e b o d y ' s great g r a n d p a . H o w e v e r , (2.3) does n o t e x p r e s s a n y c o n s t r a i n t o n y\:  from the given information,  i t is c l e a r t h a t y\ is t h e f a t h e r o f y  a n d X2 is t h e f a t h e r of y i ; P i a z z a is n o t e x p r e s s i v e e n o u g h t o e x p r e s s t h a t restriction.  W h i l e t h e m a p p i n g l a n g u a g e u s e d b y F a g i n et a l . [10] h a s  e n o u g h e x p r e s s i v e p o w e r t o h a n d l e t h e s a m e e x a m p l e , t h e final r e s u l t s u s i n g [10]'s a l g o r i t h m a r e d i f f i c u l t t o c o m p r e h e n d . T h e r e a s o n f o r t h i s is t h a t [10] uses s e c o n d - o r d e r l o g i c d e p e n d e n c i e s t o e x p r e s s t h e i r m a p p i n g s . a l l t h o s e m a p p i n g s w i t h unbound attributes  T h u s for  (attributes that appear i n the  t a r g e t b u t d o n o t a p p e a r i n t h e source) i n t h e  first-order  l o g i c , e.g. x\ i n  (2.1), [10]'s m a p p i n g l a n g u a g e c a n u s e 3 a n d V t o e x p l a i n t h e c o n s t r a i n t s for these a t t r i b u t e s .  However, w h e n there are t o o m a n y constraints i n t h e  m a p p i n g , t h e r e a d a b i l i t y o f t h e m a p p i n g itself w i l l b e d e c r e a s e d .  • In our system, we make the system more comprehensible b y providing a comprehensible mediated schema, a n d each local schema w i l l b e related t o t h e m e d i a t e d s c h e m a so t h a t users o f b o t h t h e l o c a l s c h e m a a n d t h e mediated schema c a n easily understand t h e system. T h i r d , we provide more semantic query processing. I n t r a d i t i o n a l P 2 P file s y s t e m s , copies o f t h e s a m e file o n different p e e r s c a n b e v i e w e d as g r o u p e d i n t o t h e s a m e file; e a c h m a y c o n t a i n different p a r t s o f t h e o r i g i n a l file.  C o n s i d e r t h e s c e n a r i o o f a P D M S . D a t a of t h e s a m e t o p i c ( w h i c h is  e x p r e s s e d as "Concept"  i n t h i s thesis) a r e d i s p e r s e d a m o n g p e e r s .  When  d a t a are g r o u p e d according t o t h e topic i n advance, it w i l l b e very easy t o h e l p t h e peers e x c h a n g e d a t a a n d m a k e t h e q u e r y p r o c e s s i n g fast. F o u r t h , w e reconsider t h e difficulties a n d c o m p l e x i t y raised b y m a p p i n g c o m p o s i t i o n , a n d see w h e t h e r s u c h c o m p l e x i t y c a n b e a v o i d e d i n r e a l systems.  6  Chapter  1.  Introduction  F i f t h , w e m a k e g o o d use o f i n d i r e c t m a p p i n g i n f o r m a t i o n . W e c a l l t h e i n f o r m a t i o n f o u n d i n t h e c o m p o s e d m a p p i n g t h a t is n o t i n c l u d e d i n t h e d i r e c t m a p p i n g indirect mapping information.  Previous methods tend to reformu-  l a t e q u e r i e s b a s e d o n s i n g l e sources o f m a p p i n g i n f o r m a t i o n .  However, i n  r e a l s y s t e m s , i t is v e r y l i k e l y t h a t different m a p p i n g r o u t e s c a n c o m p l e m e n t each other for query reformulation. Q u e r y answers w i l l b e i m p r o v e d a lot if w e c o n s i d e r different m a p p i n g r o u t e s a n d i n d i r e c t m a p p i n g i n f o r m a t i o n . T a k i n g i n t o c o n s i d e r a t i o n o f a l l these o p p o r t u n i t i e s , t h i s t h e s i s o n l y c o n siders the m a p p i n g s t h a t c a n group d a t a into topics, a n d proposes a u t o m a t ically creating a n d m a i n t a i n i n g a m e d i a t e d schema to take advantage of the above insights t o improve functionality for P D M S s . F o r other m a p p i n g s , t h e s a m e m e t h o d p r o p o s e d i n P i a z z a is a d o p t e d i n o u r s y s t e m . T h i s h a s m a n y a d v a n t a g e s . F i r s t a n d f o r e m o s t , users a r e a b l e t o access m o r e residing i n the network t h a n i n previous systems.  information  Second, we create t h e  m e d i a t e d s c h e m a u s i n g t h e p r e - e x i s t i n g m a p p i n g s , t h u s users h a v e t h e flexi b i l i t y t o use e i t h e r t h e m e d i a t e d s c h e m a o r use t h e i r l o c a l s c h e m a as t h e y use p r e v i o u s l y e x i s t i n g m e t h o d s , (e.g., t h o s e i n P i a z z a [26]). T h i r d , u s i n g o u r m e t h o d , t h e t r a n s l a t i o n o f q u e r i e s over t h e m e d i a t e d s c h e m a i n t o t h e sources is s i m p l e a n d effective. F i n a l l y , as w i l l b e s h o w n i n t h e e x p e r i m e n t , o u r q u e r y p r o c e s s i n g m e t h o d u s i n g a m e d i a t e d s c h e m a is m u c h faster t h a n other methods, w h i c h w o n ' t slow d o w n t h e whole system b u t w i l l ensure a comprehensible P D M S . However, creating a mediated schema i n a P D M S setting requires a n s w e r i n g d i f f i c u l t q u e s t i o n s s u c h as: - H o w c a n the mediated schema be created without a centralized a u thority? - H o w can an automatically created mediated schema be comprehensible t o users? - S i n c e t h e s y s t e m is d y n a m i c , t h e m a i n t e n a n c e o f a m e d i a t e d s c h e m a must be automatic and adaptive; how can h u m a n intervention be m i n imized?  7  Chapter 1. Introduction - W h e r e is the mediated schema stored, a n d how is it u p d a t e d ? In this thesis, we begin t o answer the above problems.  We make the  following specific contributions: - W e describe our P D M S system, M e P S y s (Mediation supported Peer D a t a M a n a g e m e n t S y s t e m ) , i n w h i c h a m e d i a t e d s c h e m a c a n be created d y n a m i c a l l y a n d any information i n the network c a n be queried w i t h o u t r e q u i r i n g a n y a d d i t i o n a l g l o b a l services i n the network. - W e outline our Peer S c h e m a M e d i a t i o n algorithm ( P S M algorithm) w h i c h is u s e d t o e f f i c i e n t l y c r e a t e t h e m e d i a t e d s c h e m a r e p r e s e n t i n g a l l s c h e m a s i n t h e n e t w o r k , a n d i n t r o d u c e M a p p i n g T a b l e s as a m e c h a n i s m for r e l a t i n g t h e m e d i a t e d s c h e m a t o t h e s o u r c e s . - W e s t u d y the semantics of m a p p i n g s a n d i n t r o d u c e the idea of autom a t i c a l l y d e t e c t i n g s p e c i f i c Concepts i n m a p p i n g s . A m e d i a t e d s c h e m a b a s e d o n Concepts is m u c h easier for users t o u n d e r s t a n d . - W e study how m a p p i n g composition impacts query reformulation and h o w t o d i s c o v e r c o m m o n Concepts i n c o m p o s e d m a p p i n g s t o e n s u r e comprehensible query results. - W e s t u d y h o w t o solve t h e p r o b l e m o f u p d a t i n g t h e m e d i a t e d s c h e m a in our infrastructure. - W e d e s i g n e x p e r i m e n t s t o test t h e efficiency a n d s c a l a b i l i t y of M e P S y s . T h i s t h e s i s is o r g a n i z e d as f o l l o w s .  I n C h a p t e r 2, w e i n t r o d u c e b a c k -  g r o u n d knowledge related t o this thesis. W e discuss related works i n C h a p t e r 3.' I n C h a p t e r 4, we d e f i n e a n d a n a l y z e t h e s e m a n t i c i n f o r m a t i o n t h a t can be presented i n a conjunctive mappings. W e present our approach t o creating a mediated schema i n a P D M S and creating the mappings from t h e m e d i a t e d s c h e m a t o l o c a l s o u r c e s i n C h a p t e r 5, i n c l u d i n g a l g o r i t h m s a n d examples. W e further describe how t o update the mediated schema i n C h a p t e r 6. W e s h o w o u r s y s t e m i m p l e m e n t a t i o n s e t u p a n d t h e e x p e r i m e n t a l  8  Chapter 1.  Introduction  results in Chapter 8. We then study the mapping composition problem that can impact the correctness and capability of query translation in the system in Chapter 7. Chapter 9 concludes and discusses future work.  9  Chapter 2  Preliminaries 2.1  Definitions w.r.t. Mediated schema creation  Definitions i n this section are related to t h e m e d i a t e d s c h e m a creation.  2.1.1  Datalog  D a t a l o g , a subset of P r o l o g , is a query a n d rule based language.  It is a  l o g i c - b a s e d q u e r y l a n g u a g e for t h e r e l a t i o n a l m o d e l . T a k e t h e f o l l o w i n g c l a u s e as a n e x a m p l e : Example 3  grandpa(x, z) :- father(x,y), father(y, z)  • E a c h c l a u s e i n D a t a l o g is c o m p o s e d o f t w o p a r t s : h e a d a n d b o d y , w h i c h a r e s e p a r a t e d b y " : - " . I n E x a m p l e 3, grandpa(x, z) is t h e h e a d , a n d father(x, y ) , father(y, z) a r e c o l l e c t i v e l y t h e b o d y . T h e b o d y c a n b e v i e w e d as c o n d i t i o n s k n o w n , a n d t h e head c a n b e regarded as t h e query. T h e above clause c a n b e i n t e r p r e t e d as: i f x is y's f a t h e r a n d y is z's f a t h e r , t h e n x is z's g r a n d p a . E a c h r e l a t i o n (e.g. father) is c a l l e d a p r e d i c a t e . E a c h t e r m i n t h e p r e d i c a t e (e.g., x a n d y) c a n e i t h e r b e a v a r i a b l e o r a c o n s t a n t .  F o r safety, o n l y v a r i a b l e s  that appear in the b o d y can appear in the head.  2.1.2  Mediated Schema, L A V , G A V and G L A V  A d a t a i n t e g r a t i o n s y s t e m is a s y s t e m t h a t c o m b i n e s d a t a r e s i d i n g a t different sources a n d p r o v i d e s t h e user w i t h a u n i f i e d v i e w o f these d a t a .  A  m e d i a t e d s c h e m a , also c a l l e d a g l o b a l s c h e m a , is u s u a l l y p r o v i d e d ' i n a D a t a Integration S y s t e m . E v e r y local source has its o w n database schema, w h i c h is c a l l e d t h e l o c a l s c h e m a . A m e d i a t e d s c h e m a is a u n i f i e d v i e w o f a l l these  10  Chapter 2.  Preliminaries  l o c a l d a t a b a s e s so t h a t t h r o u g h t h e m e d i a t e d s c h e m a , u s e r s c a n q u e r y over o n l y t h e m e d i a t e d s c h e m a a n d get r e s u l t s f r o m a l l l o c a l s o u r c e s . T h e r e are t w o b a s i c a p p r o a c h e s i n a d a t a i n t e g r a t i o n s y s t e m t o s p e c i f y m a p p i n g s between the mediated schema a n d the local schemas: L o c a l - A s V i e w ( L A V ) a n d G l o b a l - A s - V i e w ( G A V ) . I n L A V , t h e c o n t e n t of e a c h l o c a l s o u r c e s . s h o u l d b e c h a r a c t e r i z e d i n t e r m s of a v i e w qG over t h e g l o b a l s c h e m a ; i n G A V , t h e c o n t e n t of each e l e m e n t g of t h e g l o b a l s c h e m a s h o u l d b e c h a r a c t e r i z e d i n t e r m s of a v i e w qs over t h e l o c a l s o u r c e s [19]. E x a m p l e 4 T o e x p l a i n these d e f i n i t i o n s , c o n s i d e r E x a m p l e 1 a g a i n .  Two  l o c a l d a t a b a s e s U W a n d U B C are p r e s e n t e d w i t h t h e d a t a b a s e r e l a t i o n s as below. U W database:  UW.conf-paper(title, venue, year, pages) U B C database:  UBC.conf-paper(title, conference, year, url) A m e d i a t e d s c h e m a M is m a n u a l l y c r e a t e d as:  M.conf-paper(title, conference, venue, year) U s i n g L A V , UW.conf-paper a n d UBC.conf-paper w i l l b e e x p l a i n e d as a v i e w of t h e m e d i a t e d s c h e m a :  UW.conf-paper(title, venue, year, pages) :M.conf-paper(title, conference, venue, year) UBC.conf-paper(title, conference, year, url) : M.conf-paper(title, conference, venue, year) U s i n g G A V , M.conf-paper c a n b e e x p r e s s e d as t h e v i e w o f a l l  local  schemas.  M.conf-paper(title, conference, venue, year) :UW.conf-paper(title, venue, year, pages) M.conf-paper(title, conference, venue, year) :UBC.conf-paper(title, conference, year, url) 11  Chapter 2. Preliminaries  • G L A V , f i r s t l y i n t r o d u c e d i n [12], is a m o r e g e n e r a l i z e d a p p r o a c h t o s p e c i f y m a p p i n g s between sources.  GLAV  m a p p i n g language adopts b o t h G A V  and L A V i n its presentation, w h i c h allows a flexible s c h e m a definition a n d c o m b i n e s t h e e x p r e s s i v e p o w e r of b o t h G A V a n d L A V . E x a m p l e 5 s h o w s a n e x a m p l e of G L A V m a p p i n g r e l a t i n g different s c h e m a s .  Example 5  A s s u m e U W a n d U B C have the following schemas.  U W database: UW.research-area(area, groupleader, department) UW.grad-student(stu-name, area, advisor) U B C database: UBC.faculty(name, office-, department) A m e d i a t e d s c h e m a M is m a n u a l l y c r e a t e d as: M.people(name, department) U s i n g G L A V , relationship between the source schema a n d t h e m e d i a t e d s c h e m a c a n b e freely e x p r e s s e d . G L A V m a p p i n g between U B C a n d M UBC.faculty(name, office, department) —> M.people(name, department) G L A V m a p p i n g between U W a n d M UW.grad-student(stu-name, area, advisor), UW.research-area(area, gl, department) —» M.people(name, department)  T h e above G L A V m a p p i n g s c a n also b e r e w r i t t e n i n conjunctive m a p pings.  12  Chapter 2.  Preliminaries  G L A V m a p p i n g between U B C and M LAV: <2i(name, department):-M.people(name, department) GAV: Qi(name, department):-UBC.faculty(name, office, department)  G L A V m a p p i n g between U W and M LAV: Q2(r\ame,  department):-M.people(name, department)  GAV: <32(name, department):-UW.grad-student(name, area, advisor), UW.research-area(area, gl, department)  • 2.1.3  Query and Mapping  A m a p p i n g is the m e d i u m for exchanging d a t a a n d reformulating  queries  a m o n g different s c h e m a s ; i n p a r t i c u l a r , t h e m a p p i n g defines t h e o v e r l a p p i n g p a r t s o f a c q u a i n t a n c e s ' s c h e m a s . A s s u c h , i t is t h e m o s t b a s i c a n d i m p o r t a n t p a r t o f t h e s y s t e m . W e use c o n j u n c t i v e m a p p i n g s [27] as o u r m a p p i n g language.  T h e s e a r e t h e s a m e m a p p i n g s as i n [22] a n d [20]. W e choose  c o n j u n c t i v e m a p p i n g s as o u r i n p u t m a p p i n g b e c a u s e t h e y c a n e a s i l y e x p r e s s c o m m o n a l i t y a m o n g different s c h e m a s a n d a l l o w t h e use o f e x i s t i n g algor i t h m s if t h e m e d i a t e d s c h e m a is n o t u s e d , e.g., t h r o u g h t h e m e t h o d s i n [26] w h i c h j u s t c o m p o s e t h e m a p p i n g s b e t w e e n p e e r s . A conjunctive a set of conjunctive  mapping is  queries r e l a t i n g a p a i r o f l o c a l s c h e m a s : i.e., a set o f  s i m p l e D a t a l o g q u e r i e s . W e b r i e f l y r e v i e w t h e s y n t a x of c o n j u n c t i v e q u e r i e s i n E x a m p l e 6. I n a c o n j u n c t i v e m a p p i n g , i f a set o f c o n j u n c t i v e q u e r i e s h a s t h e s a m e I D B n a m e (i.e., n a m e of t h e a n s w e r r e l a t i o n ) , t h a t set expresses t h e o v e r l a p p i n g i n f o r m a t i o n i n t h e s o u r c e s . N o t e t h a t w e s t i l l use m a p p i n g t o refer t o t h e g e n e r a l sense o f c o r r e s p o n d e n c e s b e t w e e n s c h e m a s . E a c h c o n j u n c t i v e q u e r y i n t h e c o n j u n c t i v e m a p p i n g is also c a l l e d a component, w h i c h relates t o o n e s c h e m a for t h a t m a p p i n g . I n a c o n j u n c t i v e q u e r y , v a r i a b l e s  13  Chapter 2.  Preliminaries  t h a t a p p e a r i n b o t h t h e h e a d a n d t h e b o d y are c a l l e d distinguished v a r i a b l e s t h a t a p p e a r o n l y i n t h e b o d y a r e c a l l e d existential  variables;  variables.  E x a m p l e 6 Consider the following conjunctive m a p p i n g o n the schemas i n E x a m p l e 4: conf-paper(title,venue,yr):-UW.conf-paper(title,venue,yr.pages) conf-paper(title,venue,yr):-UBC.conf-paper(title,venue,yr,url) T h i s m a p p i n g shows h o w U B C a n d U W c a n share i n f o r m a t i o n  about  c o n f e r e n c e p a p e r s t h r o u g h t h e reuse o f t h e I D B n a m e c o n f - p a p e r .  Each  l i n e is a c o n j u n c t i v e q u e r y .  conf-paper(title,venue,yr) is t h e h e a d o f b o t h  q u e r i e s , a n d title, venue, and yr a r e h e a d v a r i a b l e s , i n d i c a t i n g t h a t these are t h e c o m m o n a t t r i b u t e s  i n the two schemas.  A s i n general D a t a l o g ,  t h e s e m a n t i c s i m p l y t h a t conf-paper i n f o r m a t i o n c a n b e f o u n d b y t a k i n g t h e u n i o n of title, venue, and yr a t t r i b u t e s f o u n d f r o m t h e b o d i e s o f t h e t w o queries. T h e bodies express t h e conditions required t o f o r m a n answer tuple, a n d reuse o f a v a r i a b l e i n d i c a t e s t h a t t h o s e v a l u e s m u s t b e t h e s a m e . E a c h r e l a t i o n i n t h e b o d y o f a q u e r y is r e f e r r e d t o as a s u b g o a l . T o g e t h e r , t h e c o n j u n c t i v e q u e r i e s s h o w t h a t i n f o r m a t i o n a b o u t conf-papers c a n b e f o u n d t h r o u g h e i t h e r UW.conf-paper o r UBC.conf-paper, a n d a l l i n f o r m a t i o n t h a t c a n b e o b t a i n e d f r o m t h e t o p i c conf-paper a r e title, venue, yr, page a n d url i n f o r m a t i o n . N o t e t h a t t h e I D B n a m e o f a s u b g o a l is t h e s u b g o a l n a m e .  2.2 2.2.1  •  Definitions w.r.t. P D M S Query Reformulation and Result Reformulation  Q u e r y r e f o r m u l a t i o n is t h e p r o c e s s o f t r a n s l a t i n g a q u e r y over d a t a b a s e s c h e m a A t o t h a t over d a t a b a s e s c h e m a B so t h a t q u e r i e s over A c a n b e u n d e r s t o o d b y schema B . T h e reformulation process follows the m a p p i n g s b e t w e e n these t w o s c h e m a s . A c c o r d i n g l y , r e s u l t r e f o r m u l a t i o n is t h e p r o c e s s of t r a n s l a t i n g t h e r e s u l t s u s i n g d a t a b a s e s c h e m a A t o t h o s e over d a t a b a s e schema B so that results obtained at s c h e m a A c a n b e u n d e r s t o o d b y s c h e m a B. 14  Chapter 2.  2.2.2  Preliminaries  Mapping Composition  M a p p i n g c o m p o s i t i o n is a h a r d p r o b l e m w h i c h d r a w s m u c h a t t e n t i o n cently.  Intuitively, w h e n given semantic mappings M i  B , a n d M23  2  from schema A  f r o m B to C , we h o p e we c o u l d c o m p u t e a d i r e c t m a p p i n g  f r o m A t o C w h i c h is e q u i v a l e n t t o t h e c o m p o s i t i o n of m a p p i n g s M12 M  2 3  H a l e v y et a l . p r o v i d e d a f o r m a l d e f i n i t i o n of m a p p i n g c o m p o s i t i o n : is a c o m p o s i t i o n of t h e m a p p i n g s MA~*B  a q u e r y l a n g u a g e Q i f f o r a l l d a t a b a s e s DA  f o r RA,  M\z and  "The  a n d MB-*C  w.r.t.  a n d for a l l queries q over  s u c h t h a t q is i n t h e l a n g u a g e Q, t h e c e r t a i n a n s w e r s for q w . r . t .  a r e t h e s a m e as t h e c e r t a i n a n s w e r s w . r . t . MA-,B  a n d MB->C"  MA->C  [20]  F a g i n et a l . d e f i n e d t h e m a p p i n g c o m p o s i t i o n i n m o r e l o g i c a l sense: M\2  to  .  m a p p i n g MA^C  Rc  re-  ={S\,  S2, £ 1 2 ) a n d M 2 3 =(S2,  53,  "Let  £23) b e t w o s c h e m a m a p p i n g s s u c h  t h a t t h e s c h e m a s S i , S , S3 have n o r e l a t i o n s y m b o l i n c o m m o n p a i r w i s e . 2  A schema mapping M  =  ( S i , S3, £13) is a c o m p o s i t i o n of Myj  a n d M 2 3 if  Inst(M') = I n s t ( M i ) o Inst(M 3), which means that I n s t ( M ) = { < / i ; 2  i3> |  2  I>  G I n s t ( M i ) a n d < 7 ; i3> G Inst(M 3)}."  " A s c h e m a m a p p i n g is a t r i p l e M  = ( S , T, £ t ) , w h e r e S a n d T are s c h e m a s  there exists I  2  such that < / i ;  2  2  2  2  s  w i t h n o r e l a t i o n s y m b o l s i n c o m m o n a n d £ t is a set of f o r m u l a s of s o m e s  l o g i c a l f o r m a l i s m over < S , T > . "  [11]  M a p p i n g c o m p o s i t i o n is a p r o b l e m t h a t costs h i g h c o m p l e x i t y w h e n c o n s i d e r i n g a l l classes of m a p p i n g s . I n C h a p t e r 5, w e e x p l o r e a n d a n a l y z e m a p ping composition problem in more depth.  15  Chapter 3  Related Work 3.1  Peer D a t a Management Systems  F o r a long t i m e , P 2 P c o m p u t i n g a n d D a t a b a s e research groups were very i n d e p e n d e n t . T h e i d e a of i n c o r p o r a t i n g t h e d a t a b a s e r e s e a r c h i n t o P 2 P w a s p r o p o s e d a n d d i s c u s s e d e a r l y t h i s c e n t u r y b y G r i b b l e et a l .  [13] i n 2 0 0 1 ,  B e r n s t e i n et a l . [7] i n 2002. T h e p a p e r b y B e r n s t e i n et a l . i n t r o d u c e s t h e L o c a l R e l a t i o n a l M o d e l ( L R M ) as a d a t a m o d e l s p e c i f i c a l l y d e s i g n e d for P 2 P applications. M o s t research projects i n P D M S b u i l d their architecture b y e x t e n d i n g L R M , (i.e.  P i a z z a [14], H e P T o X [8], H y p e r i o n [24]).  L R M  a s s u m e s t h a t t h e set o f a l l d a t a i n a P 2 P n e t w o r k c o n s i s t s o f l o c a l ( r e l a t i o n a l ) d a t a b a s e s , e a c h w i t h a set of a c q u a i n t a n c e s , w h i c h d e f i n e t h e P 2 P n e t w o r k topology. F o r each acquaintance link, translation rules between d a t a items a n d s e m a n t i c d e p e n d e n c i e s b e t w e e n t h e t w o d a t a b a s e s are p r e d e f i n e d  [7].  O u r s y s t e m also uses s u c h a m o d e l . P i a z z a [14, 15, 20] is a w e l l - k n o w n p r o t o t y p e for P e e r D a t a M a n a g e m e n t System.  P i a z z a uses G A V / L A V s t y l e m a p p i n g s t o d e s c r i b e t h e s e m a n t i c  r e l a t i o n s h i p s b e t w e e n t w o p e e r s . O u r m a p p i n g is b a s e d o n t h e P i a z z a m a p p i n g , b u t c h a n g e d i n t o c o n j u n c t i v e m a p p i n g for b e t t e r u n d e r s t a n d i n g . P i a z z a also p r o v i d e s a q u e r y r e f o r m u l a t i o n a l g o r i t h m [26] b a s e d o n X Q u e r y to translate the queries a m o n g peers u s i n g different schemas. U n l i k e our solution, they do not support a mediated schema in their prototype, w h i c h is f l e x i b l e for q u e r y t r a n s l a t i o n . O u r s o l u t i o n uses t h e s i m i l a r i n i t i a l s e t u p b u t e n s u r e s a b e t t e r u n d e r s t a n d i n g of t h e s y s t e m , access t o m o r e i n f o r m a t i o n a n d a faster q u e r y reformulation.  H o w e v e r , s i n c e o u r s y s t e m uses t h e  s a m e m a p p i n g s as P i a z z a , users of o u r s y s t e m c o u l d c h o o s e t o use e i t h e r o u r q u e r y a n s w e r i n g s y s t e m o r t r a n s l a t e t h e i r queries u s i n g P i a z z a ' s s y s t e m .  16  Chapter 3. Related Work H e P T o X [8, 9] i s a P 2 P X M L d a t a b a s e s y s t e m . A n o v e l q u e r y t r a n s l a t i o n algorithm  has been developed for a simple b u t a significant  fragment of  X Q u e r y . B o t h P i a z z a a n d H e P T o X a s s u m e t h a t a l l q u e r i e s a r e a s k e d over a local schema; they focus o n  finding  a query reformulation  algorithm to  t r a n s l a t e a q u e r y over one p e e r s c h e m a t o a q u e r y over i t s a c q u a i n t a n c e . T h e w o r k s o f F a g i n et a l . [10, 11] m a i n l y c o n s i d e r h o w t o p r o c e s s d a t a e x c h a n g e a n d m a p p i n g c o m p o s i t i o n w i t h o u t i n f o r m a t i o n loss u s i n g m a p p i n g s u n d e r s e c o n d - o r d e r logic d e p e n d e n c i e s .  T h e y provide a nice solution for  m a p p i n g c o m p o s i t i o n w h i c h t h e y uses f o r t h e i r q u e r y t r a n s l a t i o n . H y p e r i o n [5, 24] is also a P e e r D a t a M a n a g e m e n t S y s t e m w h i c h uses b o t h data-level and schema-level mappings to specify the correspondences between acquainted peer databases a n d process query r e f o r m u l a t i o n o n these m a p p i n g s .  based  M a p p i n g tables are used to specify correspondences  between d a t a values of acquainted databases.  T h i s is t h e k e y difference  between H y p e r i o n and a l l other systems. I n P e e r D B [21], l o c a l p e e r s c h e m a s a r e n o t p u b l i s h e d .  Instead,  local  users i n p u t k e y w o r d s f r o m r e l a t i o n n a m e , a t t r i b u t e s , a n d r e c o r d s f o r e a c h peer database, a n d such i n f o r m a t i o n is used t o m a t c h relations. A s i n p r e v i o u s P D M S s : a n e w p e e r e n t e r i n g t h e n e t w o r k chooses o n e o r more acquaintances a n d provides m a p p i n g s t o one or more acquaintances, e i t h e r c r e a t e d b y h a n d , o r t h r o u g h s o m e s c h e m a m a t c h i n g t o o l (see [23] f o r a survey). O u r s y s t e m has t h e following m i n i m a l features: Peer: each p e e r stores b o t h i t s l o c a l s c h e m a a n d one o r m o r e v e r s i o n s of m e d i a t e d s c h e m a t h a t l o c a l p e e r has a n a p p l i c a t i o n b u i l t o n .  Only the  mappings from mediated schema to its local schema w i l l b e stored at this l o c a l peer, w h i c h is used for query answering.  E a c h p e e r w i l l also s t o r e  M a p p i n g T a b l e s , w h i c h help create t h e mediated schema a n d determine how to relate t h e m e d i a t e d schema t o t h e peer's schema. Query answering: users c a n e i t h e r q u e r y t h e l o c a l s c h e m a (as i n p r e v i o u s s y s t e m s ) o r use t h e a u t o m a t i c a l l y - c r e a t e d m e d i a t e d s c h e m a , w h i c h p r o v i d e s additional information.  I n e i t h e r case, user q u e r i e s are a u t o m a t i c a l l y  refor-  mulated and forwarded to the peers' acquaintances.  17  Chapter 3. Related Work  3.2  Mediated Schema Creation  A n o t h e r a s p e c t of o u r r e s e a r c h is m e d i a t e d s c h e m a c r e a t i o n ( M S C ) . M S C has b e e n s t u d i e d i n d a t a i n t e g r a t i o n a r e a . B a t i n i , L e n z e r i n i , a n d N a v a t h e [6] c o n d u c t e d a s u r v e y for d a t a b a s e s c h e m a i n t e g r a t i o n e a r l y i n 1986.  In.  [6], m e t h o d o l o g i e s for s c h e m a i n t e g r a t i o n a n d a c o m p a r i s o n o f a l l a v a i l a b l e methodologies are p r o v i d e d .  L a t e r , L e n z e r i n i [19] a n d U l l m a n [28] b o t h  discussed d a t a integration techniques i n very theoretical perspective. H o w ever, m o s t e a r l y w o r k i g n o r e s c r e a t i n g t h e m a p p i n g s b e t w e e n t h e m e d i a t e d schema a n d the sources. P o t t i n g e r p r o v i d e d a m e t h o d for creating a m e d i a t e d schema a n d m a p p i n g s f r o m t h e m e d i a t e d s c h e m a t o s o u r c e s [22]. H o w e v e r , [22] d o e s n o t consider the complications of t h e P D M S s i t u a t i o n , i n c l u d i n g t h e fact t h a t m a p p i n g s can exist between a n y pairs of schemas, rather t h a n those dictated b y a c e n t r a l i z e d a u t h o r i t y . O u r s c h e m a m e d i a t i o n m e t h o d is b a s e d o n [22]. However, we have the following improvements: o u r M S C m e t h o d can handle a n y n u m b e r s o f l o c a l s c h e m a s , w h i l e t h e o n e i n [22] c a n o n l y h a n d l e a l i m i t e d n u m b e r of local sources. I n o u r a p p r o a c h , a n y ordering of m e d i a t i n g l o c a l s c h e m a s w i l l get a n e q u i v a l e n t r e s u l t w h i l e i n [22], d i f f e r e n t o r d e r i n g o f l o c a l s c h e m a s m i g h t get t o t a l l y d i f f e r e n t m e d i a t e d s c h e m a a n d m a p p i n g s t o l o c a l s o u r c e s . T h i s is l e t ' s a s s u m e t h a t w e u s e a d i f f e r e n t a p p r o a c h f r o m t h e i r s . F o r e x a m p l e , w h e n w e h a v e t h r e e peers A , B , C m a p p i n g MapA_B A a n d B , a n d m a p p i n g MapB.c  between  between B a n d C . Pottinger's approach to  c r e a t e a m e d i a t e d s c h e m a MAB is f i r s t l y b a s e d o n s c h e m a A, B a n d m a p p i n g MapA_B-  T h e n m a p p i n g s MapAB-B  m a p p i n g MapAB.c-  a n d Maps_c  U s i n g s c h e m a s MAB,  m e d i a t e d s c h e m a of A B C c a n b e created. t h e i n t e r m e d i a t e m e d i a t e d s c h e m a s MAB a m a p p i n g MapAB.BC MAB  a r e c o m p o s e d t o get t h e  C a n d m a p p i n g MapAB.c,  the  U s i n g o u r a p p r o a c h , we create a n d MBC  first.  T h e n we create  based o n the overlapped subgoals i n the mappings  a n d MBC , a n d c r e a t e t h e final m e d i a t e d s c h e m a a n d m a p p i n g s t o l o c a l  sources.  18  Chapter 4  Introducing Concept into Conjunctive Mappings Before we delve into t h e P S M algorithm, we start b y s t u d y i n g t w o related p r o b l e m s : w h a t are t h e things t h a t c a n b e represented b y t h e same relation in the m e d i a t e d schema, a n d w h a t is t h e c o m p l e x i t y of m a p p i n g c o m p o s i t i o n in such a s y s t e m . W e explore t h e first p r o b l e m i n this chapter, a n d explore t h e s e c o n d o n e i n c h a p t e r 7. T h e definitions of C o n j u n c t i v e M a p p i n g a n d C o n j u n c t i v e Q u e r y have b e e n i n t r o d u c e d a n d e x p l a i n e d i n s e c t i o n 2.1.3. I n t h i s c h a p t e r , w e e x p l o r e how t o support semantics i n the conjunctive m a p p i n g b y applying some restrictions o n the conjunctive m a p p i n g s . W e first present t h e m o t i v a t i o n of i n t r o d u c i n g t h e i d e a o f "Concept"  into conjunctive mappings i n section 4.1.  W e t h e n use some examples t o analyze w h a t k i n d of restrictions s h o u l d b e a p p l i e d t o t h e m a p p i n g s i n o r d e r t o l e t t h e m a p p i n g s h a v e a Concept. W e g i v e t h e f o r m a l d e f i n i t i o n of a Concept i n s e c t i o n 4 . 2 .  4.1  Motivation  O n e o f o u r c o n t r i b u t i o n s is t o i n t r o d u c e t h e n o t i o n o f Concept i n t o c o n junctive mappings.  P r e v i o u s work either ignores t h e semantics t h a t t h e  m a p p i n g s m i g h t c o n t a i n [11, 20] o r s i m p l y a s s u m e d t h a t i f t h e I D B n a m e of t h e q u e r i e s w e r e t h e s a m e , t h e n t h e y d e s c r i b e d t h e s a m e Concept [22]. T h e l a t e r v i e w , t h o u g h u s i n g t h e s e m a n t i c i n f o r m a t i o n t h a t is e m b e d d e d i n t h e m a p p i n g s , is o v e r l y s i m p l i s t i c .  However, reusing the same I D B name  i n s i d e a c o n j u n c t i v e m a p p i n g l i k e l y i n d i c a t e s t h e cases t h a t (1) t h e s e m a n tics of the m a p p i n g m a y m a k e it i m p o s s i b l e t o construct a n u n d e r s t a n d a b l e m e d i a t e d s c h e m a i f t h i s i s t h e o n l y i n f o r m a t i o n t a k e n i n t o a c c o u n t a n d (2) 19  Chapter 4. Introducing Concept into Conjunctive Mappings consistency of I D B names cannot be assumed between mappings - there is no guarantee that the creator of M a p ^ - B w i l l use the same I D B name i n the same way as the creator of Mapcjo  to represent the same Concept.  this thesis we begin to understand the semantic issues entailed by  In  Concept.  However, as an introductory step, throughout this thesis we assume that when more than one conjunctive mappings express the same Concept, their I D B names are the same. Intuitively, a Concept describes the common object among different schemas. T h i s is different from a mapping which expresses the overlapped attributes across schemas.  W h e n a mapping is said to have a  Concept,  the mapping should have a clear object, which might have some aspects to describe it i n detail. In E x a m p l e 6, clearly the mapping describes the object "conf-paper", which uses five aspects, title, venue, yr, pages and url, to describe this object. However, even though there are common relation names (e.g., father) and attributes (e.g., x, x l ) occurred in E x a m p l e 2, these two mappings together do not express a common object or an idea because mapping (2.1) explains the Concept grandpa and mapping (2.2) explains the Concept greatgrandpa. Deciding whether or not one mapping can express a Concept evolves much A l research topics such as natural language processing, ontology, etc, and thus is not our focus i n this thesis.  However, deciding whether two  mappings w i t h the same I D B name together can express the same Concept is very useful, especially when we want to mediate local schemas and propagate queries along the mapping path. There are two reasons. First, users tend to provide incomplete or wrong mapping information when two mappings between two pairs of peer schemas are considered.  Second, we want to  know whether the pre-defined mappings are truly relating identical parts from different schemas. In this thesis, we only discuss this Concept problem based on two or more mappings. Throughout this thesis, the input mappings for M e P S y s are required to be conjunctive mappings w i t h the same concept (Definition 3) without selfjoins i n each component. T h e P S M algorithm is sound and complete for such input mappings. 20  Chapter 4. Introducing Concept into Conjunctive Mappings  4.2  Definitions and Analysis  I n C h a p t e r 2 , w e i n t r o d u c e d c o n j u n c t i v e m a p p i n g s , i n c l u d i n g E x a m p l e 6, w h i c h defines o n e Concept: c o n f - p a p e r . H o w e v e r , i n m a n y cases d e t e r m i n i n g a Concept f r o m t h e m a p p i n g is n o t s t r a i g h t f o r w a r d .  Consider Example 7  ( o r i g i n a l l y f r o m [11] b u t r e w r i t t e n i n t h e f o r m o f a c o n j u n c t i v e m a p p i n g ) .  Example 7  A s s u m e the following conjunctive mappings:  C o n j u n c t i v e M a p p i n g 1: b e t w e e n A a n d B M a n a g e r ( x , y) :- A . M g r ( x , y)  (7.1a)  M a n a g e r ( x , y) :- B . M g r l ( x , y)  (7.1b)  Conjunctive M a p p i n g 2: between B a n d C M a n a g e r ( x ) :- B . M g r l ( x , x )  (7.2a)  M a n a g e r ( x ) :- C . S e l f M g r ( x )  (7.2b)  B o t h c o n j u n c t i v e m a p p i n g s use t h e I D B n a m e M a n a g e r . H o w e v e r , w h i l e t h e first c o n j u n c t i v e m a p p i n g defines t h e M a n a g e r r e l a t i o n s h i p , t h e s e c o n d does not. T h o u g h t h e relation M g r l appears i n b o t h c o n j u n c t i v e m a p p i n g s , i n the second m a p p i n g , it has more restrictions o n its attributes:  x must  m a n a g e itself. T h u s , M g r l ( x , x) c a n n o l o n g e r r e p r e s e n t s t h e i d e a o f M a n a g e r but Self-Management instead.  T h e r e f o r e , E x a m p l e 7 h a s t w o Concepts:  M a n a g e r a n d S e l f - M a n a g e m e n t rather t h a n o n l y one. W e call t h e attributes i n B . M g r l ( x , x) i n (7.2a) a self-restrictive  a t t r i b u t e , w h i c h is defined t o b e a n  a t t r i b u t e t h a t is i n a s u b g o a l t h a t appears i n t w o c o n j u n c t i v e m a p p i n g s w i t h the same I D B n a m e b u t has more restrictions t h a n the same a t t r i b u t e i n t h e o t h e r m a p p i n g . C o r r e s p o n d i n g l y , c o m p o n e n t (7.2a) ' i s a s e l f - r e s t r i c t i v e component (Definition 6).  •  E x a m p l e 7 s h o w s t h a t i t is h a r d t o j u d g e w h e t h e r t w o m a p p i n g s d e s c r i b e t h e s a m e i d e a . T h i s t h e s i s p r o v i d e s t h e f i r s t p a s s at u n d e r s t a n d i n g a Concept w h i c h is some c o m m o n topic or i d e a t h a t c a n help m a k e t h e m e d i a t e d s c h e m a m o r e c o m p r e h e n s i b l e . B u t w h a t i s a Concept i n t h e c o n j u n c t i v e m a p p i n g s ? 21  Chapter 4. Introducing Concept into Conjunctive Mappings  W e b e g i n w i t h t h e f o l l o w i n g d e f i n i t i o n o f a Concept i n c o n j u n c t i v e m a p p i n g s , w h i c h a l l o w s a c a r e f u l a n a l y s i s o f t h e cases t h a t a r e c l e a r - c u t .  There are  a d d i t i o n a l issues t o c o n s i d e r , b u t i n t h i s t h e s i s w e f o c u s o n t h e s e cases t h a t are clear a n d e s s e n t i a l f o r d e c i d i n g a Concept. A  Concept i s d e f i n e d b y a set o f aspects.  A s i n relations i n relational  d a t a b a s e s , t h e aspects o f a Concept a r e r e f e r r e d t o as attributes.  For exam-  p l e , t h e c o n c e p t o f flight i n c l u d e s t h e a t t r i b u t e s f l i g h t N o , date, t i m e , departure, d e s t i n a t i o n , c r e w , e t c , w h i c h are t h e aspects t h a t a r e u s e d t o d e s c r i b e t h e Concept flight. I n a c o n j u n c t i v e m a p p i n g , t h e n a m e o f e a c h a s p e c t i s t h e corresponding variable n a m e i n t h e m a p p i n g , w h i c h represents one aspect of t h e Concept t h a t t h e m a p p i n g d e s c r i b e s . D e f i n i t i o n 1 ( C o n c e p t i n C o n j u n c t i v e M a p p i n g s ) : A Concept i n c o n j u n c t i v e m a p p i n g s i s d e f i n e d as a n i d e a , n o t i o n o r e n t i t y t h a t i s c o m m o n i n a l l s c h e m a s t h a t t h e c o n j u n c t i v e m a p p i n g s are r e l a t i n g . F o r m a l l y , a Concept C is e x p r e s s e d b y a set o f aspects S.  •  F o r ease o f d i s c u s s i o n , w e p r o v i d e t h e f o l l o w i n g t e r m s a n d d e f i n i t i o n s .  Definition 2 (Component): E a c h conjunctive query Q i n t h e conjunctive m a p p i n g is a c o m p o n e n t [22].  Definition 3 (Same C o n c e p t ) : CM\  • W e say that two conjunctive  mappings  a n d CM2 d e f i n e t h e s a m e c o n c e p t i f t h e o v e r l a p p e d s u b g o a l s i n CM\  a n d CMi  a r e e q u i v a l e n t s u b g o a l sets, w h i c h c a n b e c h e c k e d as f o l l o w s :  1. A s s u m e V c o m p o n e n t c e CM\  dk.dC.  U CM ,  c) a n d (2) \subgoals(c)\  2  >  \subgoal(d)\;  A s s u m e t h e o v e r l a p p e d s c h e m a o f CM\ L e t C\ b e a c o m p o n e n t f r o m CM\  a n d CM2 i s X;  over X\  L e t C2 b e a c o m p o n e n t f r o m CM2 over 2. L e t name(sg)  $ c' s.t. (1) c = c' (i.e., c C  X.  b e t h e r e l a t i o n n a m e of t h e s u b g o a l sg;  L e t sgjnames(Q)  b e t h e n a m e s o f a l l o f t h e r e l a t i o n s i n q u e r y Q;  22  Chapter 4. Introducing Concept into Conjunctive Mappings L e t overlap-names — {name(sg) | name(sg) e sgjnames{C\) a n d name(sg) £ Let  Ci i p  sgjnames(C )}; 2  = {sg |  over a  name(sg) G overlap.nam.es a n d s# € 6 o d y ( C i ) } ;  C2 i;er/ap = { 5 I name(sg) € overlap .names a n d s<? € 6ody(C2)}. s  0  3. W e n o w c r e a t e n e w q u e r i e s Q i a n d Q  2  that describe the overlapping  p a r t s o f C\ a n d C2 r e s p e c t i v e l y . T h e g o a l is t o see i f t h e o v e r l a p p i n g parts are equivalent:  L e t IDB{Q )  b e IDB{CM ),  X  a n d let IDB{Q )  X  2  L e t subgoals(Qi) = Cx^^ia^ Let all variables i n  be  a n d subgoals(Q ) = 2  IDB{CM ); 2  C ,i ; 2au( T av  Q\ a n d Q b e d i s t i n g u i s h e d . T h a t is let vars{head{Q\)) 2  a n d let vars{head(Q,2)) = vars(subgoals(C>2))-  = vars(subgoals(Q\))  T h e n the following conditions should hold:  1. IDB(CM\) equal to  ( w h i c h , b y t h e a b o v e r e q u i r e m e n t , is also  — IDB(CM ) 2  IDB{Q ) X  and  IDB(Q )); 2  2. overlap-names ^ 0 ; 3. Q\ c o n t a i n s Q , 2  and Q  2  c o n t a i n s Q\ (i.e., t h e y a r e e q u i v a l e n t ) .  • A d d i t i o n a l l y , c o n d i t i o n (1) is a c t u a l l y t h e a s s u m p t i o n w e h a v e m a d e a t t h e b e g i n n i n g . T h i s c o n d i t i o n c a n b e r e m o v e d for f u r t h e r s t u d y i n t h e f u t u r e . C o n d i t i o n (2) says t h a t i f s c h e m a X e x i s t s i n t w o c o n j u n c t i v e m a p p i n g s o f t h e s a m e Concept, a t least o n e r e l a t i o n r s h o u l d a p p e a r i n b o t h c o n j u n c t i v e m a p p i n g s . T h i s ensures t h a t t h e of s c h e m a  Concept is c o m p a t i b l e i n t h e r e p r e s e n t a t i o n  X.  C o n d i t i o n (3) e n s u r e s t h a t i f o n e s u b g o a l n a m e a p p e a r s i n d i f f e r e n t c o n j u n c t i v e m a p p i n g s for t h e s a m e c o n c e p t , these t w o s u b g o a l s s h o u l d b e e x a c t l y t h e s a m e after a s u b s t i t u t i o n o f v a r i a b l e n a m e s .  23  Chapter 4. Introducing Concept into Conjunctive Mappings E x a m p l e 8 T h e r e a r e o t h e r cases w h e n t w o c o n j u n c t i v e m a p p i n g s f o l l o w t h e d e f i n i t i o n o f s a m e c o n c e p t b u t s t i l l a r e n o t r e p r e s e n t i n g t h e s a m e Concept. C o n s i d e r t h e f o l l o w i n g c o n j u n c t i v e m a p p i n g s . C o n j u n c t i v e M a p p i n g 1: b e t w e e n A a n d B student(sid, name) :- A.stu(sid, name)  (8.1a)  student(sid, name) :- B.stu(sid, name, program)  (8.1b)  C o n j u n c t i v e M a p p i n g 2: between A a n d C student(sid, name, advisor) :- A.stu(sid, name), A.advisor(sid, advisor)  (8.2a)  student(sid, name, advisor) :- C.student(sid, name, advisor), C.student(sidl, advisor, name)  (8.2b)  C o n j u n c t i v e m a p p i n g 1 a n d conjunctive m a p p i n g 2 together d o not viol a t e p r o p e r t y 1 a n d 2 , b u t i f o n l y c o n s i d e r c o n j u n c t i v e m a p p i n g 2 , it is easy t o f i n d o u t t h a t C.student(sid, name, advisor), C.student(sidl, advisor, name) is n o t t a l k i n g a b o u t t h e Concept "student". T h i s c o m p o n e n t is a c t u a l l y e x p r e s s i n g t h e Concept o f a s t u d e n t t h a t has a n a d v i s o r a n d t h e s t u d e n t is also his advisor's advisor. However, i n this thesis, we o n l y deal w i t h whether t w o c o n j u n c t i v e m a p p i n g s c a n d e s c r i b e t h e s a m e Concept, r a t h e r t h a n d e c i d i n g w h e t h e r t h e g i v e n c o n j u n c t i v e m a p p i n g itself is c o r r e c t .  •  24  Chapter 5  Creating a Mediated Schema in P D M S In o u r scenario, a mediated schema of all.peer schemas a n d the mappings f r o m the m e d i a t e d s c h e m a t o l o c a l sources are necessary for q u e r y p r o p a gation a n d translation. W e presented the preliminary knowledge of schema m e d i a t i o n i n S e c t i o n 2.1 a n d S e c t i o n 2.2. W e also d i s c u s s e d Concept w h i c h c a n b e r e p r e s e n t e d as t h e s e m a n t i c i n f o r m a t i o n i n m a p p i n g s i n C h a p t e r 4. I n this chapter, we present the definition o f schema m e d i a t i o n i n P 2 P settings, a n d present o u r approach o f creating a m e d i a t e d schema for a P D M S .  5.1  Pottinger's Schema Mediation Algorithm for DIS  Pottinger  [22] p r o v i d e s a n o v e l a p p r o a c h f o r t h e d a t a i n t e g r a t i o n  system  to a u t o m a t i c a l l y create a m e d i a t e d s c h e m a M based o n t w o l o c a l schemas E a n d F, a n d a m a p p i n g MapEj?  between them.  Using her algorithm,  not only a mediated schema M b u t the mappings f r o m M t o b o t h local sources E a n d F, MapMJE a n d MapM.F, w i l l also b e g e n e r a t e d [22]. I n t h i s section we describe Pottinger's algorithms of creating a mediated schema a n d m a p p i n g s t o l o c a l sources i n a d a t a integration s y s t e m . T h e a l g o r i t h m of t r a n s l a t i o n queries is also d i s c u s s e d i n t h i s s e c t i o n . F i g u r e 5.1 s h o w s P o t t i n g e r ' s m e d i a t e d s c h e m a c r e a t i o n a l g o r i t h m . T h e r e are m a i n l y t w o c a t e g o r i e s of r e l a t i o n s t h a t w i l l b e g e n e r a t e d i n t h e m e d i a t e d s c h e m a M: O n e is d i r e c t l y f r o m l o c a l r e l a t i o n s .  T h e y are n o t sharing a  Concept w i t h o t h e r s c h e m a s s o t h a t t h o s e r e l a t i o n s d o n o t a p p e a r i n a n y o f t h e m a p p i n g s . T h e o t h e r is c r e a t e d f r o m m a p p i n g s . If t h e r e i s a m a p p i n g b e t w e e n E a n d F w h i c h expresses s o m e t h i n g t h a t E a n d F b e a r i n c o m m o n , 25  Chapter 5. Creating a Mediated Schema in PDMS Procedure CreateMediatedSchema(£ , F, MapE.F) /* E,F are schemas, a n d MapE.F is a conjunctive m a p p i n g between them. * / C = 0 l  £=0 L e t R = { r e £ U F | r ^ a projection-only component } For each relation r £ Ft L e t m be a new relation name(m) = name(r) attributes(m) = attributes(r) A d d £(r, m) to £ A d d m to M For each I D B name q S IDB(MapE.p) L e t m be a new relation L e t vars be the duplicate-free union of variables q  of queries that define q in name(m) = name(q) attributes(m) = vars A d d £(q, m) to £ A d d m to M Return M and £  Mape.F  q  F i g u r e 5 . 1 : T h e C r e a t e M e d i a t e d S c h e m a A l g o r i t h m f r o m [22]  there w i l l be a relation i n the mediated schema representing this c o m m o n i d e a across the schemas. A s i m p l e s u b s e t o f G L A V m a p p i n g s [12] is u s e d t o e x p r e s s t h e m a p p i n g from M  t o E U F.  T h e h e a d s o f q u e r i e s i n MapM-EuF  c a n b e t r e a t e d as  a n i n t e r m e d i a t e s c h e m a , 7 5 , w h i c h is u s e d t o i n d i c a t e h o w e a c h m e d i a t e d s c h e m a r e l a t i o n relates to each p a r t i c u l a r source. E a c h c o m p o n e n t i n S e c t i o n 2.1.3) c f r o m MapE_F v i e w , lv , c  creates t w o v i e w s i n MapMJiuF-  is c a l l e d a g l o b a l v i e w f o r c. gv  c  GVM,  T h e first  is c a l l e d a l o c a l v i e w f o r c. It is a c o n j u n c t i v e q u e r y f r o m M t o 7 5 .  c o n s i s t s of a l l s u c h l o c a l v i e w d e f i n i t i o n s for M.  LVM  (defined  gv , c  is a q u e r y f r o m E t o 7 5 , a n d is i n c l u d e d i n  t h e g l o b a l v i e w d e f i n i t i o n s f o r M.  creating the mapping  T h e second view,  F i g u r e 5.2 s h o w s t h e a l g o r i t h m o f  MapEuF-  G i v e n M a n d MapM.E\jF,  a n y q u e r y t h a t is over M c a n b e r e f o r m u l a t e d  t o l o c a l s o u r c e s s i m p l y f o l l o w i n g LVM  a n d GVM-  T h e query  reformulation  a l g o r i t h m is s h o w n i n F i g u r e 5.3.  26  Chapter  5.  Creating a Mediated  Schema in  PDMS  P r o c e d u r e C r e a t e M a p p i n g ( £ ' , F, MapE.F, M) /* E a n d F are schemas, MCLPE_F is a conjunctive m a p p i n g between t h e m , a n d M a n d £ are the outputs from CreateMediatedSchema(£', F, MapE.p) */  = 0 = 0  LV  M  GV  M  For each relation m e M If e € E U F a n d f (e, m) L e t q be a fresh I D B name L e t lv = q(attributes(m)) :- m(attributes(m)) L e t gv = q(attributes(m)) :- e(attributes(e)) A d d lv to LVM A d d gv to GVM For each component c € MapE.F L e t cname. = IDB(c) L e t m be the relation in M s.t. £(cname, m) L e t q be a fresh IDB name lv = q(vars(c)) :- m(attributes(m)) gv = q(vars(c)) :- body(c) A d d lv to LVM A d d gv to GVM R e t u r n LVM a n d GVM m  m  m  m  c  c  m  m  F i g u r e 5.2: T h e C r e a t e M a p p i n g A l g o r i t h m f r o m  [22]  F o r m a l d e f i n i t i o n s of a m e d i a t e d s c h e m a i n D a t a I n t e g r a t i o n S y s t e m a n d m a p p i n g f r o m m e d i a t e d s c h e m a t o l o c a l s o u r c e s , as w e l l as t h e  mediated  s c h e m a c r i t e r i a are p r o v i d e d i n [22].  5.2  Problem Definition  J u s t as w e h a v e d i s c u s s e d i n C h a p t e r 1, a m e d i a t e d s c h e m a w h i c h c a n b e u s e d t o i m p r o v e t h e c o m p r e h e n s i b i l i t y of q u e r y t r a n s l a t i o n w i l l b e v e r y d e s i r a b l e i n a P D M S . T o m a k e t h i s i d e a m o r e c l e a r , l o o k at E x a m p l e 9. E x a m p l e 9 A s s u m e t h a t UW,  UBC  a n d UT  are the three peers w i t h  databases storing the following i n f o r m a t i o n a b o u t conference papers shown i n F i g u r e 5.4:  UW.conf-paper(title, author, conference, venue, pages) UBC.conf-paper(title, conference, year) 27  Chapter  5.  Creating a Mediated  Schema in  PDMS  P r o c e d u r e R e w r i t e Q u e r y ( M , MO.PM.EUF, Q) Q ' = m a x i m a l l y - c o n t a i n e d rewriting for Q using LVM / * Q ' is over intermediate schema IS * / Q " = expansion of Q ' using GVM / * Q " is over E U F */ Return Q " F i g u r e 5.3: Q u e r y R e w r i t i n g A l g o r i t h m f r o m [22]  UT.conf-paper(title, author, abstract, area) T h e m a p p i n g s b e t w e e n UBC  Mapping  a n d UW,  UW  a n d UT  are g i v e n as f o l l o w s :  MapuBCMW-  conf-paper(title.conference):UBC.conf-paper(title,conference,year) conf-paper(title.conference):UW.conf-paper(title,author,conference,venue,pages) Mapping  MapuwjJT'-  conf-paper(title.author):UW.conf-paper(title,author,conference,venue,pages) conf-paper(title.author):UT.conf-paper(title,author,abstract,area) Assume that a mediated schema M  is c r e a t e d for a l l t h r e e p e e r s u s i n g  the above information:  M.conf-paper(title,conference,year.author.venue,pages,abstract,area) W e further assume t h a t the M a p p i n g s f r o m M to each l o c a l peer c a n be obtained using G L A V mappings:  Mapping  Map MBCM  LAV:  conf-paper(title,conference,year):M.conf-paper(title,conference,year,author,venue,pages,abstract.area) GAV: 28  Chapter 5. Creating a Mediated Schema in PDMS  Quae is translated  to £>7 6ver M i ?  Peer UBC  is translated to gt/w over UW  UBC User:; Query;£<;fla over UBC  Peer UT  Peer UW (4)  (6)  (2w.is translated to gj'rbver UT  F i g u r e 5.4: Q u e r y P r o c e s s i n g i n M e P S y s  conf-paper(title,conference,year):UBC.conf-paper(title,conference,year) Mapping  Map .uwM  LAV:  conf-paper(title,author,conference,venue,pages):M.conf-paper(title,conference,year,author.venue,pages,abstract,area) GAV:  conf-paper(title,author,conference,venue,pages):UW.conf-paper(title,author,conference,venue,pages) M a p p i n g MapMJUT'LAV:  conf-paper(title,author,abstract,area):M.conf-paper(title,conference,year,author,venue,pages,abstract,area) 29  Chapter 5. Creating a Mediated Schema in PDMS GAV: conf-paper(title,author,abstract,area):UT.conf-paper(title,author,abstract,area)  Now, using these G L A V mappings, queries can be easily translated from over M to any local schema or from any local schema to M. A user at UBC can ask a query in UBC's own schema: q(title):-UBC.conf-paper(title,conference,year)  This query is first translated to that over M using  MapMMBC-  q(title):-M.conf-paper(title, conference, year.author.venue, pages, abstract, area)  Then peer UBC broadcasts this query to all of this acquaintances, and acquaintances to acquaintances. For example, UW receives this query. Using the Mapping  MapMMW,  this query can be translated to that over UW:  q(title):-UW.conf-paper(title,author,conference,venue,pages)  In the same way, answers can be reformulated and passed back to UBC. Thus, UBC can get not only answers from its own peer database but also from other peer database. Users at UBC can also pose their query directly over the mediated schema, by using which users are able to query more information in the network.  •  Since Pottinger has already provided the related algorithm in [22] (described in Section 5.1), are these algorithms already enough to create a mediated schema in a P D M S setting? Unfortunately, the answer is no. In a P D M S , any peer can join and leaves the network at any time, so the schema mediation algorithm needs to be commutative and associative and satisfy all the requirement of a P 2 P system. Though [22] considers to mediate more than two schemas, when the order of local schemas changes, the final mediated schema might also change. Furthermore, [22] requires that to include a new local schema E into the previously created mediated schema M, a 30  Chapter 5. Creating a Mediated Schema in PDMS m a p p i n g f r o m M to E should be specified. T h i s could be achieved in a d a t a integration system. However, i n a P 2 P system, n o b o d y c a n be expected to t a k e t h e r e s p o n s i b i l i t y for t h i s t a s k . A l s o , i n a P D M S , it is v e r y l i k e l y t h a t t w o d i f f e r e n t v e r s i o n s of m e d i a t e d s c h e m a ( c o n t a i n i n g i n f o r m a t i o n of different sets of p e e r s c h e m a s ) m i g h t m e e t at s o m e p o i n t , a m e r g i n g o p e r a t i o n t o m e r g e t h e t w o m e d i a t e d s c h e m a s w o u l d b e i n e v i t a b l e . S u c h a n i n s t a n c e is n e v e r e x p e c t e d i n a d a t a i n t e g r a t i o n s y s t e m , n o r i n [22].  T h u s , these n e w  tasks w i l l be tackled in M e P S y s . T h e a l g o r i t h m s of h o w t o c r e a t e a m e d i a t e d s c h e m a a n d m a p p i n g s t o l o c a l s o u r c e s i n a P D M S s e t t i n g as s h o w n i n E x a m p l e 9 w i l l b e d e s c r i b e d i n t h e f o l l o w i n g s e c t i o n s of t h i s c h a p t e r . T o c r e a t e a m e d i a t e d s c h e m a , we e x t e n d t h e a l g o r i t h m s i n S e c t i o n 5.1. T h i s i n c l u d e s (1) u s i n g Concepts  to create a more comprehensible m e d i a t e d  s c h e m a , (2) e n s u r i n g t h a t t h e a l g o r i t h m is c o m m u t a t i v e a n d a s s o c i a t i v e (since a n y p e e r c a n e n t e r or leave t h e n e t w o r k at a n y t i m e ) . H e r e we p r o v i d e t h e f o r m a l d e f i n i t i o n s of s c h e m a m e d i a t i o n i n P 2 P e n v i r o n m e n t . W e b e g i n b y i n f o r m a l l y d e f i n i n g C o n c e p t M e d i a t i o n , a n d t h e n f o r m a l l y d e f i n e it  in  D e f i n i t i o n 4. W e use t h e d e f i n i t i o n of C o n c e p t M e d i a t i o n t o f o r m a l i z e o u r d e f i n i t i o n o f M e d i a t e d S c h e m a I n P 2 P i n D e f i n i t i o n 5. I n f o r m a l l y , c o n c e p t m e d i a t i o n c a n b e e x p l a i n e d as t h e f o l l o w i n g : A s s u m e t h a t a set of p e e r s c h e m a s S = { P i , P2,  ••• , Pfc} a n d a set of m a p p i n g s  M  b e t w e e n s o m e p a i r s of t h e s c h e m a s are g i v e n . E a c h m a p p i n g m a y c o n t a i n s e v e r a l c o n j u n c t i v e m a p p i n g s . E a c h c o n j u n c t i v e m a p p i n g o n l y specifies o n e a s p e c t of t h e c o m m o n a l i t y b e t w e e n t h e p a i r s of p e e r s . W e a s s u m e t h a t a l l c o n j u n c t i v e m a p p i n g s w i t h t h e s a m e I D B n a m e i n t h e set M  are t a l k i n g  a b o u t t h e s a m e t h i n g , i.e., m o r e f o r m a l l y t h e y refer t o t h e s a m e c o n c e p t ( D e f i n i t i o n 3).  W e p u t a l l t h e i n f o r m a t i o n p r o v i d e d b y these c o n j u n c t i v e  m a p p i n g s ( d e f i n e d i n S e c t i o n 2.1.3) t o g e t h e r a n d f o r m a g l o b a l r e l a t i o n for this concept. D e f i n i t i o n 4 ( C o n c e p t M e d i a t i o n ) : g i v e n a set of p e e r s c h e m a s S = P2, ... , Pk} a n d a set o f m a p p i n g s M  {pi,  between some pairs of the schemas, a  C o n c e p t M e d i a t i o n is t h e p r o c e s s of c r e a t i n g a r e p r e s e n t a t i o n (i.e., r e l a t i o n  31  Chapter 5. Coating a Mediated Schema in PDMS  a n d m a p p i n g to sources) i n the m e d i a t e d s c h e m a t h a t corresponds t o those c o n j u n c t i v e m a p p i n g s i n t h e set M w i t h t h e s a m e I D B n a m e .  •  I n f o r m a l l y , t h e m e d i a t e d s c h e m a i n P 2 P c a n b e d e f i n e d as t h e u n i o n o f a l l c o n c e p t s o f i t s p e e r s . N o t e t h a t i f e v e r y c o n c e p t is c a l c u l a t e d b y u s i n g C o n c e p t M e d i a t i o n , t h e n t h e M e d i a t e d S c h e m a is t h e u n i o n o f a l l m e d i a t e d c o n c e p t s . A M e d i a t e d S c h e m a s h o u l d also f o l l o w t h e I n f o r m a t i o n C a p a c i t y c r i t e r i a : a l l q u e r i e s t h a t c a n b e a s k e d over t h e s o u r c e s c h e m a s c a n b e a s k e d over t h e m e d i a t e d s c h e m a a n d t h e s a m e r e s u l t s a r e r e t u r n e d [22]. D e f i n i t i o n 5 ( M e d i a t e d S c h e m a i n P 2 P ) : g i v e n a set o f p e e r s c h e m a s S =  {Pi,  P, 2  ... , Pk] a n d a set o f m a p p i n g s M b e t w e e n s o m e p a i r s o f t h e  s c h e m a s , a M e d i a t e d S c h e m a is t h e u n i o n o f a l l r e s u l t i n g r e l a t i o n s  from  C o n c e p t M e d i a t i o n (commonalities between schemas) a n d those relations e x i s t i n g i n S b u t n o t e x i s t i n g i n a n y o f t h e s u b g o a l s o f t h e m a p p i n g s i n set M  ( s p e c i a l i t i e s for l o c a l p e e r s c h e m a ) . F u r t h e r , a m a p p i n g MapM.E  •  from the mediated schema M to each local  s o u r c e E, i.e. t h e G L A V m a p p i n g b e t w e e n M a n d UBC  i n E x a m p l e 5, is  n e c e s s a r y so t h a t a q u e r y over M c a n b e r e f o r m u l a t e d t o t h a t over E. o u r a l g o r i t h m , t h e m a p p i n g Mapuj;  is i n t h e f o r m o f G L A V , a n d e a c h p e e r  E maintains their own G L A V m a p p i n g  MapM.E-  T o m a k e t h e m e d i a t i o n p r o c e s s easier, w e i n t r o d u c e d a c o n s t r u c t pingTable. Concept.  In  Map-  A M a p p i n g T a b l e contains a l l local information about a specific W i t h t h e use o f M a p p i n g T a b l e , t h e p r e s e n c e o f i n d i r e c t m a p p i n g s  that identify additional information about relationships betweens schemas c a n b e f u l l y u s e d . C o n s i d e r E x a m p l e 10. E x a m p l e 1 0 : Consider the indirect mapping information i n the following mappings: Conjunctive Mapping A _ B :  author(name) :- A.auth(name, affiliation, contactlnfo) author(name) :- B.author_paper(name, paper) Conjunctive Mapping B_C: 32  Chapter 5. • Creating a Mediated Schema in PDMS author(name) :- B.author paper(name, paper) author(name) :- C.author(name, affiliation, title)  Conjunctive Mapping A_C: author(affiliation) :- A.auth(name, affiliation, contactlnfo) author(affiliation) :- C.author(auth-name, affiliation, title)  • For mapping A_B and mapping B_C, it is clear that the first attributes in A.auth, B.author and C.author are the same since they are mapped to the same variable: name. In mapping A_C, the first attributes of A.auth and C.author are not mapped to the same variable, so by A_C only, it is impossible to tell that they represent the same information. However, by composing A_B and B_C, it is clear that in the first attributes of A.auth and C.author represent the same information. We call the information found in the composed mapping that is not included in the direct mapping indirect information.  mapping  One advantage of our work is that such indirect information  can complement information in the original mappings. Previous work tends to focus on translating queries on a single source of information, either from the original mapping or from the composed mapping. Since users are likely to provide incorrect or incomplete mapping information, our system helps users check whether their mappings are correct, and the system will automatically combine all mapping information together, regardless of where the information comes from. In Section 5.3, we introduce the use of a MappingTable to merge all mapping information, not just the direct information.  In Section 5.4 we  introduce the idea and the algorithm of a mapping compatible check to ensure that those relations in the mediated schema are concept-based. We present the schema mediation algorithm for P2P in Section 5.5.  33  Chapter 5. Creating a Mediated Schema in PDMS  5.3  MappingTable Creation  5.3.1  Intuitions  A M a p p i n g T a b l e b o t h h e l p s c r e a t e t h e m e d i a t e d s c h e m a , a n d is u s e d t o create t h e m e d i a t e d s c h e m a t o source m a p p i n g s . E a c h M a p p i n g T a b l e represents a s i n g l e c o n c e p t , i n c l u d i n g b o t h t h e d i r e c t a n d i n d i r e c t information i n relating a concept.  mapping  A s s h o w n i n E x a m p l e 11, each source  r e l a t i o n is g i v e n a r o w , a n d e a c h a t t r i b u t e is r e p r e s e n t e d b y a c o l u m n . E a c h c o l u m n represents one aspect of t h a t concept. E x a m p l e 11 : T h e M a p p i n g T a b l e c r e a t e d f o r t h e m a p p i n g i n E x a m p l e 6 is s h o w n i n F i g u r e 5 . 5 :  R e l a t i o n : M l . c o n f - p a p e r ( t i t l e , v e n u e , y e a r , u r l , pages) Ml.conf-paper  1  2  3  4  UBC.conf-paper  1  2  3  4  UW.conf-paper  1  2  3  5 4  F i g u r e 5.5: M a p p i n g T a b l e f o r E x a m p l e 6 E a c h e n t r y i n t h e t a b l e refers to t h e l o c a t i o n o f t h e a t t r i b u t e i n t h e s o u r c e (e.g., t h e f o u r t h a t t r i b u t e i n UW.conf-paper gives i n f o r m a t i o n a b o u t t h e a t t r i b u t e pages i n t h e m e d i a t e d s c h e m a r e l a t i o n ) .  •  A s t h e mediated schema grows to encompass more peers a n d more m a p pings, sometimes a new M a p p i n g T a b l e w i l l b e created t o represent t h e same c o n c e p t as o n e c r e a t e d f o r t h e p r e v i o u s l y - e x i s t i n g m e d i a t e d s c h e m a . L e t ' s a s s u m e t h a t conf-paper is a n e x i s t i n g c o n c e p t r e p r e s e n t e d b y t h e M a p p i n g T a b l e MTold. MTnew.  L e t the new M a p p i n g T a b l e representing conf-paper b e  MTold a n d MTnew n e e d to b e m e r g e d t o f o r m o n e c o n c e p t f o r  conf-paper i n t h e m e d i a t e d s c h e m a . E x a m p l e 12 gives t h e i n t u i t i o n o f m e r g ing two M a p p i n g T a b l e s of the same concept.  34  Chapter 5. Creating a Mediated Schema in PDMS E x a m p l e 1 2 : L e t M T o l d b e t h e M a p p i n g T a b l e i n E x a m p l e 11. L e t  MT-  new b e t h e M a p p i n g T a b l e s h o w n i n F i g u r e 5.6.  Relation: M2.conf-paper(title,  venue, year, u r l , author)  M2.conf-paper  1  2  3  4  UBC.conf-paper  1  2  3  4  UT.conf-paper  1  2  3  5 4  F i g u r e 5.6: M a p p i n g T a b l e for E x a m p l e 12(a)  S i n c e M T o l d a n d M T n e w are b o t h a b o u t  "conf-paper",  they can be  m e r g e d t o b e one M a p p i n g T a b l e , s h o w n i n F i g u r e 5.7.  Relation: M3.conf-paper(title,  venue, year, u r l , pages, author) 3  4  5  2  3  4  5  2  3  4  1  2  3,  M2.conf-paper  1  2  3  4  UBC.conf-paper  1  2  3  4  UT.conf-paper  1  2  3  M3.conf-paper  1  2  Ml.conf-paper  1  UBC.conf-paper  1  UW.conf-paper  6  4 5 4  F i g u r e 5.7: M a p p i n g T a b l e for E x a m p l e 12(b)  B o t h " U B C . c o n f - p a p e r " i n l i n e 3 a n d 6 are k e p t for c l a r i t y . real implementation  However, in  a n d after o p t i m i z a t i o n , o n l y one of t h e m needs t o b e  kept.  • 5.3.2  Algorithm  F i g u r e 5.8 s h o w s h o w t o c r e a t e a M a p p i n g T a b l e for e a c h r e l a t i o n t h a t is c o n s t r u c t e d f r o m a m a p p i n g MCLPE_F c r e a t e M e d i a t e d S c h e m a ( . E , F,  MapEjp)  in the mediated schema.  Procedure  h i F i g u r e 5.1 w i l l f i r s t b e c a l l e d .  35  Chapter  G i v e n MapE_F concept in  5.  Creating a Mediated  a n d MEF,  Schema in  PDMS  t h e M a p p i n g T a b l e s w i l l b e c o n s t r u c t e d for e a c h  MapE_F-  W h e n a p e e r h a v e t w o d i f f e r e n t v e r s i o n s of m e d i a t e d s c h e m a M\ m a i n t a i n i n g d i f f e r e n t i n f o r m a t i o n , if b o t h M\  and M  2  M  2  (r\  and r  2  representing the concept  M  2  c o n t a i n c o n c e p t q, i n  o r d e r t o m e r g e r\ a n d , r 2 , c o r r e s p o n d i n g M a p p i n g T a b l e MT\(q) needs t o b e m e r g e d .  and  and  MT {q). 2  are t h e c o r r e s p o n d i n g r e l a t i o n s i n M\  and  q.)  T h e M a p p i n g T a b l e m e r g i n g process follows the same general principles: 1. R e l a t e d a t t r i b u t e s s h o u l d b e p o s i t i o n e d i n t h e s a m e c o l u m n ; 2. U n - r e l a t e d a t t r i b u t e s are i n d i f f e r e n t c o l u m n s ; 3. O v e r l a p p i n g l o c a l r e l a t i o n s i n t h e t w o M a p p i n g T a b l e s are u s e d t o d e t e r m i n e e a c h c o l u m n i n one M a p p i n g T a b l e c o r r e s p o n d s t o t h a t i n t h e other M a p p i n g T a b l e . Procedure mergeMappingTable(mti,  r\,  mt , 2  a n y t w o M a p p i n g T a b l e of t h e s a m e c o n c e p t mt\ merged MappingTable. in m i l  a n d mt ; 2  It  r) 2  i n F i g u r e 5.9 m e r g e s  a n d mt , 2  and returns  is r e q u i r e d t h a t t h e r e b e o v e r l a p p e d  otherwise, they cannot be merged.  a  relations  S i n c e it is l i k e l y t h a t  each M a p p i n g T a b l e c o n t a i n s i n d i r e c t m a p p i n g i n f o r m a t i o n  that the  other  one d o e s n o t h a v e , t h e f i r s t step for m e r g i n g t w o M a p p i n g T a b l e is t o u p d a t e s u c h i n f o r m a t i o n for e a c h M a p p i n g T a b l e s h o w n i n F i g u r e 5.10.  5.3.3  Section Summary  I n t h i s s e c t i o n , we p r e s e n t e d b y e x a m p l e s t h e m o t i v a t i o n o f i n t r o d u c i n g  a  M a p p i n g T a b l e for e a c h r e l a t i o n i n t h e m e d i a t e d s c h e m a t h a t c o m e s f r o m a mapping.  W e also d i s c u s s e d t h e case of m e r g i n g t w o M a p p i n g T a b l e s .  We  f u r t h e r gave t h r e e a l g o r i t h m s t h a t are r e l a t e d t o o p e r a t i n g M a p p i n g T a b l e s . T h e y are c r e a t i n g a M a p p i n g T a b l e , m e r g i n g M a p p i n g T a b l e s a n d  updat-  ing current M a p p i n g T a b l e using additional m a p p i n g information in another MappingTable.  36  Chapter 5. Creating a Mediated Schema in PDMS P r o c e d u r e C r e a t e M a p p i n g T a b l e ( M a p . E _ F , MEF) /* MEF is the mediated schema created from Maps.F */ MT = 0 For each relation m S M constructed from IDB(MapE.F) L e t mt be a new M a p p i n g T a b l e w i t h # a t t r ( m ) + l columns / * T h e next three lines add m into mt * / mt(0, 0) = name(m) For i = 1 to # a t t r ( m ) mt(0, i) = i L e t qs be component from Maps.F L e t qp be component from Maps.F Add(<?.E,  over schema E over schema F  mt)  / * call P r o c e d u r e A d d ( g s , mt) to add all subgoals of qE to mt  */  Add(<7F> rnt) / * call Procedure A d d ( g F > rnt) to a d d all subgoals of qF to mt  */  A d d mt to  Return  MT _F E  MT _F E  P r o c e d u r e A d d ( g , mt) / * A d d all subgoals of q to M a p p i n g T a b l e mt L e t m be the first relation in mt  */  L e t j be # next row to insert in mt For each subgoal g of q m t ( j , 0) = name(g) For each attribute x of g Let x be the i t h attr of g F i n d k s.t. var(g.attr(z)) = var(m.attr(&)) m t ( j , k) = i j = j + 1 IImove to the next row F i g u r e 5.8: C r e a t e M a p p i n g T a b l e  5.4  Algorithm  M a p p i n g Compatible Check  T w o m a p p i n g s t h a t a r e c r e a t e d at d i f f e r e n t p e e r s b y d i f f e r e n t users m i g h t h a v e t h e s a m e I D B n a m e , i n d i c a t i n g t h a t these t w o m a p p i n g s r e p r e s e n t t h e s a m e c o n c e p t . H o w e v e r , as is d i s c u s s e d i n S e c t i o n 4 . 2 , t w o m a p p i n g s h a v i n g t h e s a m e c o n c e p t s h o u l d f o l l o w D e f i n i t i o n 3 (i.e. t h e s e t w o m a p p i n g s s h o u l d have the same I D B n a m e , they have overlapped subgoal names a n d  their  o v e r l a p p e d s u b g o a l sets are e q u i v a l e n t ) . T h e m o d u l e of M a p p i n g C o m p a t i b l e C h e c k is t o m a k e s u r e t h a t t h e m a p p i n g s are r e a l l y r e p r e s e n t i n g t h e s a m e  37  Chapter 5. Creating a Mediated Schema in PDMS  Procedure M e r g e M a p p i n g T a b l e ( m * i , n , 7 n * , T- ) I* mt\ and 771*2 are two M a p p i n g T a b l e s w i t h the same concept; r\ and r-i are the corresponding relations i n the mediated schema for 2  2  mt\ and rati */ / * F i r s t update each M a p p i n g T a b l e s using indirect m a p p i n g information specified i n the other * / U p d a t e M a p p i n g T a b l e ( m * i , 771*2) / / s e e F i g u r e 5.10 / * Second, merge the M a p p i n g T a b l e s * / Let S be the set of overlapped relation names i n 771*1 and 771*2 Let overlappedCol = # column (5) newMTCol = # c o l u m n ( m i i ) + # column(7n*2) + 1 overlappedCol / * Construct a new M a p p i n g T a b l e mt of newMTCol columns*/ hashi — 0 / * how columns i n mt\ map to mt */ For V column i of mt\ hashiii) = new(/c) new  new  /* k is a column i n mt  that has not been assigned * /  new  hashi = 0 / * how columns i n 771*2 map to mt */ For V column i of mti If 3 row j s.t. mt (j, 0) G S and mti{j, i)^ blank F i n d k, I s.t. mt\{k, 0) = mt (j, 0) and mt\(k, I) = mti(j, hashi{\) = hash\{l) Else hashi(\) = new(fc) / * A d d the first row to mt */ mt (0, 0) = n a m e ( r i ) For i = l to newMTCol mt (0, i) = i I* A d d m i l to mtnew */ For each row i of mt\ Let p be the next row i n mt For each column j of m t i If mti(i, j) is not blank mt (p, hashi(j)) = mti(i, j) new  2  2  i)  new  new  new  new  new  / * A d d 771*2 to mtnew  */  For each row i of 771*2  Let p be the next row i n mt w For each column j of 771*2 If m t ( i , j ) is not blank mtnewip, hash (j)) = mt {i, R e t u r n mt w ne  2  2  2  j)  ne  F i g u r e 5.9: M e r g e M a p p i n g T a b l e  Algorithm  38  Chapter 5. Creating a Mediated Schema in PDMS Procedure U p d a t e M a p p i n g T a b l e ( m * i , 771*2) V relation name x\ a n d xi from mt\ , If x\ and X2 both have attributes i n column j of mt\, b u t i n m*2, the corresponding attributes of x\ and X2 are i n different columns k a n d Z C o m b i n e column k and Z of 771*2 V relation name x\ a n d X2 from 777*2 If x\ and X2 both have attributes i n column j of 771*2 b u t i n 771*1, the corresponding attributes of x\ a n d x^ are in different columns k a n d / C o m b i n e column k a n d / of mt\. F i g u r e 5.10: U p d a t e M a p p i n g T a b l e A l g o r i t h m  concept when merging the information contained b y two mappings. Since every peer o n l y m a i n t a i n s its o w n m a p p i n g s a n d does n o t k n o w other peers' m a p p i n g , t h e m a p p i n g c o m p a t i b l e check needs t o rely o n t h e M a p p i n g T a b l e t h a t c a n infer t h e original m a p p i n g s . T h e a l g o r i t h m i n F i g u r e 5.11 i n p u t s t h e o r i g i n a l M a p p i n g T a b l e o f c o n c e p t q f o r t h e m e d i a t e d s c h e m a , MT (q), m  a n d a n e w m a p p i n g o f t h e c o n c e p t q, MapEj-{q)-  The  a l g o r i t h m f i r s t checks w h e t h e r t h e r e e x i s t s a case w h e n t w o a t t r i b u t e s o f a c e r t a i n s u b g o a l o r t w o a t t r i b u t e s o f different s u b g o a l s i n MapE_F(q)  have  t h e s a m e v a l u e b u t h a v e different v a l u e i n a p r e v i o u s m a p p i n g , w h i c h is r e p r e s e n t e d b y a r o w o r a set o f r o w s i n MT (q). m  i f t h e r e e x i s t s a case w h e n i n MapE_F(q),  T h e a l g o r i t h m f u r t h e r checks  two attributes i n one subgoal or i n  t w o s u b g o a l s f r o m o n e c o m p o n e n t h a v e t h e s a m e v a l u e , b u t i n MT (q), m  the  c o r r e s p o n d i n g a t t r i b u t e s h a v e different v a l u e s . If e i t h e r o f t h e a b o v e case is t r u e , t h e n e w m a p p i n g of c o n c e p t q i s n o t c o m p a t i b l e w i t h a p r e v i o u s e x i s t i n g m a p p i n g o f c o n c e p t q. T h e m a p p i n g needs t o b e m o d i f i e d b y t h e user. W e a s s u m e t h a t a t least o n e s u b g o a l s h o u l d b e i n c o m m o n i f one s c h e m a is i n v o l v e d i n s e v e r a l c o n j u n c t i v e m a p p i n g s w i t h t h e s a m e I D B n a m e .  5.5  Peer Schema Mediation  W e h a v e b r i e f l y d e s c r i b e d M e P S y s i n S e c t i o n 5.2. I n t h i s s e c t i o n , w e d e scribe our Peer Schema M e d i a t i o n algorithm  (short for P S M a l g o r i t h m ) ,  w h i c h is c o m p o s e d o f t h r e e s m a l l a l g o r i t h m s , s c h e m a m e d i a t i o n , c o m p u t i n g  39  Chapter 5. Creating a Mediated Schema in PDMS Procedure M a p p i n g C o m p a t i b l e C h e c k ( M a p / j ; _ i r ( g ) , MT (q)) /* MapE.F{q) is a new m a p p i n g of concept q\ MT is the M a p p i n g T a b l e of concept q for the mediated schema * / / * check whether Self-Restrictive component exists * / / * R e t u r n True when compatible; False when incompatible * / D i v i d e a l l rows in MT (q) into several sets s.t. a l l consecutive rows from one peer are i n one set For each set S of MT (q) If 3 values v\, v<i i n the same column from rows i, j which belong to S F i n d subgoal sgi i n MCLPE„F{<I) s.t. name(sgi) = MT (q)(i, 0) F i n d subgoal sg2 i n MapE.F(q) s.t. name(s<j2) = MT (q)(j, 0) If s g l . a t t r ^ i ) =^ sp2.attr(w2) R e t u r n False m  m  m  m  m  m  •For V subgoal sg\, sg2 from the same component of Mapsj-(q) If 3 x and y s.t. sf^.att^a;) = sp2-attr(j/) F i n d row i, j i n MT (q) s.t. MT (q)(i,0) = name(spi), M r ( < ? ) ( j , 0 ) = name(sff ) If x i n row i and y i n row j are not i n the same c o l u m n R e t u r n False R e t u r n True m  m  m  2  I  F i g u r e 5.11: M a p p i n g C o m p a t i b l e C h e c k A l g o r i t h m  G L A V mapping and query reformulation.  I n C h a p t e r 6, t h e r e w i l l b e m o r e  d i s c u s s i o n o n h o w t o h a n d l e t h e case w h e n n e w p e e r s j o i n t h e n e t w o r k  after  database a p p l i c a t i o n has been b u i l t u p o n t h e earlier m e d i a t e d schema, a n d h o w t o h a n d l e t h e case w h e n s o m e p e e r leaves t h e n e t w o r k , o r s o m e l o c a l d a t a b a s e s c h e m a evolves as t i m e goes by.  5.5.1  System Setup Phase - Schema Mediation  System Work F o r ease o f d i s c u s s i o n , w e a s s u m e t h e r e is a s e t u p p h a s e f o r M e P S y s : s e v e r a l peers have j o i n e d t h e network, f o u n d their acquaintances a n d created m a p pings to t h e acquaintances b u t n o m e d i a t i o n has been started. T h e system s e t u p p h a s e s t a r t s f r o m a first p e e r s t a r t i n g m e d i a t i o n t o a t i m e p o i n t w h e n a l l peers i n t h e s y s t e m get t h e m o s t u p d a t e d m e d i a t e d s c h e m a o f t h e w h o l e network. W e a s s u m e t h a t , a t a n y t i m e , each p e e r P m a i n t a i n s a l l o f t h e f o l l o w i n g  40  Chapter 5. Creating a Mediated Schema in PDMS information: 1. P ' s l o c a l d a t a b a s e s c h e m a 2. A l i s t L of m a p p i n g s w i t h P ' s a c q u a i n t a n c e s 3. A c u r r e n t v e r s i o n of m e d i a t e d s c h e m a M 4. M a p p i n g T a b l e set c o r r e s p o n d s t o 5. G L A V m a p p i n g s f r o m M t o  M  P  >l3::B>checks andupdates;its local relation, '.information imMn-based on B. tip X creates Mi- based on:: X,.MapxiB,Mapx_c t*j X'geis responses" from B^C Mapping with other peers  X computes Mwcontaining:  X. B..p arjd;MXpK:B,';MapVic Peer B: B.Mapxja, MapBjd, MaptD  Xbroadcasts MUand COTresppnding'MappjngTable  Peer-X;: X, MapVs, Mapxlc  'Map'ping'with'otliMrpmrs'  Peer C:: Mapictt'Mapaic .'.t3: CcheckVand updates-its local reiation ^formation in -Miibased oh C;  F i g u r e 5.12: S c h e m a M e d i a t i o n S t a r t - u p A s s h o w n i n F i g u r e 5.12, w h e n m e d i a t i o n s t a r t s f r o m p e e r X,  X  gets  m a p p i n g s f r o m L one b y o n e . A t t i m e £ i , X m e d i a t e s i n f o r m a t i o n b a s e d o n 41  Chapter 5. Creating a Mediated Schema in PDMS each m a p p i n g mapx.Y-  H o w e v e r , as is d e s c r i b e d i n S e c t i o n 5 . 1 , t h e r e are  t w o k i n d s of r e l a t i o n s i n t h e m e d i a t e d s c h e m a : one is f r o m l o c a l s o u r c e s a n d a n o t h e r is f r o m t h e m a p p i n g . S i n c e i n a P D M S , e v e r y p e e r does n o t k n o w its a c q u a i n t a n c e ' s s c h e m a , t h e m e d i a t e d s c h e m a Ma o n l y c o m p o s e d of t w o p a r t s : l o c a l r e l a t i o n s f r o m X  that X  first  a n d relations from the  m a p p i n g s . I n f o r m a t i o n of F ' s l o c a l r e l a t i o n is m i s s i n g f r o m Mt\. X s e n d s Ma  c r e a t e s is  A t time  to each acquaintance, Y a n d asks Y to c o n f i r m or u p d a t e  A t t i m e £3, e a c h a c q u a i n t a n c e c o n f i r m s Ma l o c a l s c h e m a i n f o r m a t i o n , i.e.  a n d u p d a t e s Ma  t, 2  M \. t  b a s e d o n its  a d d i n g those local schema i n f o r m a t i o n t h a t  X does n o t k n o w . A t £4, X receives a l l c o n f i r m a t i o n f r o m its a c q u a i n t a n c e s , X w i l l c o m p u t e a n e w m e d i a t e d s c h e m a , s a y M ^, t  w h i c h contains all schema  i n f o r m a t i o n of X a n d a l l of its a c q u a i n t a n c e s as w e l l as t h e m a p p i n g s b e t w e e n X a n d its acquaintances. X further computes the G L A V m a p p i n g f r o m to X.  X  t  t  also u p d a t e s M a p p i n g T a b l e s a n d let t h e m e d i a t e d s c h e m a t o b e  i n its l i s t . A t t i m e £5, X  M4  M4  sends M4 t  and corresponding MappingTables  t o a l l of its a c q u a i n t a n c e s . W h e n a peer E  u p o n receiving a mediated schema M , tn  first  checks  w h e t h e r it a l r e a d y h a s a m e d i a t e d s c h e m a . If it h a s n o t , E w i l l d o t h e s a m e as X  d o e s . O t h e r w i s e , m e r g e t h e t w o m e d i a t e d s c h e m a s . If t h e m e d i a t e d  schema that E  m a i n t a i n s c h a n g e d after t h i s s t e p , E  mediated schema M ( t  n + 1  )  sends out this new  a n d c o r r e s p o n d i n g M a p p i n g T a b l e s t o a l l o f its  a c q u a i n t a n c e s ; o t h e r w i s e , n o message w i l l b e sent o u t . E also c o m p u t e s t h e new G L A V m a p p i n g a n d m a i n t a i n s the u p d a t e d i n f o r m a t i o n i n his list  L.  A f t e r a p e r i o d w h e n e v e r y p e e r E has r e c e i v e d t h e m o s t u p d a t e d m e d i a t e d s c h e m a M ( k) t  n+  the system setup  a n d computed the G L A V m a p p i n g from M ( „ t  finishes.  + f c  ) to  E,  T h e a l g o r i t h m of c r e a t i n g t h e m e d i a t e d s c h e m a  i n t h e s e t u p p h a s e is s h o w n i n F i g u r e 5.13. T h e o r e m 1 T h e t i m e of c r e a t i n g a m e d i a t e d s c h e m a u s i n g t h e a b o v e a l g o r i t h m has a n u p p e r b o u n d . P r o o f S k e t c h : A s s u m e t h e r e are n p e e r s i n t h e n e t w o r k . L e t t b e t h e t i m e for one h o p i n t h e n e t w o r k .  T w o k i n d s of messages n e e d t o b e c o n s i d e r e d :  s e n d i n g out, t h e p a r t i a l m e d i a t e d s c h e m a a n d r e c e i v i n g c o n f i r m a t i o n of l o c a l  42  Chapter 5. Creating a Mediated Schema in PDMS A l g o r i t h m SystemSetUp / * to be processed at each peer E when receiving a mediation message from other peer * / Input: 1. mediation message containing a partially created mediated schema Mtmp of previous peers, a M a p p i n g T a b l e MT corresponding to M 2. local schema m a p p i n g list O u t p u t : A new mediated schema and M a p p i n g T a b l e w h i c h w i l l be sent out to all of E's acquaintances tmp  tmp  If previously there is no mediated schema at peer E Mediate schemas M e based on E and all its mappings t o acquaintances Send out check message w i t h ME and wait for confirmation from the acquaintances Received updated mediated schema MEF from each acquaintance F w i t h F ' s local schema information i n it C o m p u t e a local mediated schema ME for E and a l l its acquaintances based on all MEF {F € E's acquaintance list) Merge M and M to be M' Else (there exists previous version of mediated schema ME at E) Merge ME and M to be M' If M' does not equal to M Send M' and its corresponding M a p p i n g T a b l e to all the acquaintances Else N o message t o be sent t m p  tmp  E  tmp  B  tmp  E  B  E  E  Figure 5.13: System Setup Algorithm relations, and sending out the mediated schema. The upper bound for the first part is 2nt. So we only consider the second part. Each peer maintains the mapping information with its acquaintances. We can model this problem to the following graph problem: when does each node get all the edge information? Since each node knows its edge information, this problem can further be transferred to the problem: When does each node get all the node information? A node E knows the information of node F, either from F directly, or from F's neighbor. The maximum time difference between these two is one hop t. We drop this one hop difference temporarily. So in order to let E get a full set of node information, E should receive a message from all other nodes. E receives the earliest message from F after (shortestPath(A, F) + shortestPath(F, E)) hops. So E receives all nodes information at hop  43  Chapter 5. Creating a Mediated Schema in PDMS hop : E  hops = max.{shortestPath(A,  F) + shortestPath(F,  E)}  Thus the number of hops for all nodes receiving all other node information is hopau: hopait = maxhopE = max{m&x{shortestPath(A, EJ  F)+shortestPath(F,  E)}}  F  E  Considering the messages to confirm local relations, the total time before every peer gets the ultimate mediated schema is T \\: a  Tail = (hopau + 2n) *t  Considering the fact that information about F can be obtained from F's neighbor, T u can sometimes be a  T u = {hopau +  2n-2)*t  a  This is the upper bound for the whole mediation process.  •  M e r g i n g T w o M e d i a t e d Schemas  In a P D M S , different versions of mediated schemas are broadcasted in the network before a final mediated schema for all peers in the network has been built. At any time in the setup phase, a peer E that already maintains a mediated schema M\ can receive a different mediated schema Mo, that contains non-overlapped information with M \ .  In this case, M\ and M  2  need to be merged before E sends any of them to other acquainted peers. Our mediation process follows the following strategy: 1. When E gets a mapping Ep, a mediated schema MEF will be created first. This can use algorithm in Figure 5.1. 2. Then consider previous existing mediated schema ME, merge MEF and ME to get M ' , and update M E  E  to be M ' . E  44  5. Creating a Mediated  Chapter  Schema in  PDMS  3. W h e n E o r i g i n a l l y h a s a m e d i a t e d s c h e m a M a n d receives a n e w m e diated schema  Mtmp,  M  and  Mtmp  w i l l be merged.  F i g u r e 5.14 s h o w s t h e a l g o r i t h m o f h o w t o m e r g e t w o m e d i a t e d s c h e m a . Algorithm M e r g i n g T w o M e d i a t e d S c h e m a ( M , M , M T V , M T ) / * M i , M are the two mediated schemas, M T i , M T are the corresponding MappingTables * / M = 0 For V mti G M T i 1  2  2  2  2  new  If 3 rati G M T s.t. name(mii(0, 0 ) ) = name(mt 2 (0, 0 ) ) = q / * both m i l and mti are about concept q */ Find the corresponding relation r\ and r from M i and M mt =MergeMappingTable(mii, r\, mt , r ) Create a new relation r name [r ) = q Let n be column size of mt euj Let the first row number of mti in mt be indexl Let the first row number of mti in mt be index2 For i = 0 to n 2  2  new  2  2  2  new  new  n  new  new  If mt (indexl, i) contains value re r . a t t r ( i ) = ri.attr(a:) Else (rrji e™ (wdea:2, i) contains value a;) r„etu.attr(i) = r .attr(a;) Add r to M For V n G M i If r i hasn't been merged or added to M Add r i to M For V r G M new  netu  n  2  new  new  new  n  2  e  w  2  If r hasn't been merged or added to M „ t u Add r to M Return M e  2  2  n  e  w  new  F i g u r e 5.14: M e r g e T w o M e d i a t e d S c h e m a A l g o r i t h m  Compute G L A V Mapping for Each Local Peer N e x t , h a v i n g created the mediated schema, we need to b e able t o create the G L A V m a p p i n g f r o m t h e m e d i a t e d s c h e m a to t h e local peer s c h e m a i n o r d e r t o a l l o w f o r q u e r i e s .to b e t r a n s l a t e d . W i t h t h e m e d i a t e d s c h e m a a n d t h e c o r r e s p o n d i n g M a p p i n g T a b l e s , c o m p u t i n g t h e G L A V m a p p i n g is easy. F i g u r e 5.15 s h o w s a n a l g o r i t h m of c o m p u t i n g t h e G L A V m a p p i n g f o r l o c a l 45  Chapter 5. Creating a Mediated Schema in PDMS  s c h e m a E. E x a m p l e 13 gives a n e x a m p l e o f c o m p u t i n g G L A V M a p p i n g f o r local peer. A l g o r i t h m C o m p u t e G L A V M a p p i n g ( M , MT, E) /* M is the mediated schema, MT is the corresponding M a p p i n g T a b l e for M , E is the local schema * / LVM — 0 PV = 0 For each m € M If m does not have a corresponding mt £ MT If 3 e £ E s.t. name(e) = name(m) Let lv = q(attributes(e)) :- m(attributes(m)) Let gv — q(attributes(e)) :- e(attributes(e)) A d d lv to LVM A d d gv to GVM Else / / m has a corresponding mt S MT Let body be the subgoal set for local peer, body = 0 For each row i of mt If 3 r e E s.t. name(r) = m t ( i , 0) Construct subgoal sg s.t. name(sg) = name(r), using m ' s attribute value as the corresponding attribute value for sg A d d sg to body li$ s e E s.t. name(s) = m i ( i + 1,0) Let lv = q(var(body)) :- m(attributes(m)) Let gv = q(var(body)) :- body A d d lv to LVM A d d gv to G V M 6odi/ = 0 R e t u r n LVM, GVM M  m  m  m  m  m  m  m  m  F i g u r e 5.15: C o m p u t e G L A V M a p p i n g  Algorithm  E x a m p l e 1 3 G i v e n s c h e m a B, m e d i a t e d s c h e m a M MT,  and MappingTable  the G L A V mapping from M to B can be computed.  S c h e m a B: B_flight(date,  c o m p a n y , f l i g h t N o , service)  B_schedule(date, flightNo, depart, arrival,  numLeft)  S c h e m a M: 46  Chapter 5. Creating a Mediated Schema in PDMS flight(date, f l i g h t N o , company, service, class, I D , discount, depart, arrival, numLeft)  M a p p i n g T a b l e : flight A f .flight  1  2  3  4  BJlight  1  3  2  4  B_schedule  1  2  C-schedule  2  3  T h e G L A V m a p p i n g for MapMJ3  7  8  9  10  3  4  5  1 2  C-price  Mapping  5 . 6  1  3  can be obtained.  MapMJB'-  LAV: Q l ( d a t e , company, flightNo, service):M.flight(date, flightNo, company, service, class, I D , discount, depart, arrival, numLeft) GAV: Q l ( d a t e , company, flightNo, service):B.B_flight(date, company, flightNo, service), B_schedule(date, flightNo, depart, arrival, numLeft)  • Query Reformulation  Algorithm  W i t h t h e G L A V m a p p i n g c o m p u t e d f o r e a c h l o c a l p e e r E, a n y q u e r y p o s e d over t h e m e d i a t e d s c h e m a M c a n b e e a s i l y r e f o r m u l a t e d t o t h a t over E a n d vice versa.  T h i s e n a b l e s a fast t r a n s l a t i o n f o r q u e r i e s b e t w e e n a n y l o c a l  s c h e m a a n d M,  w h i c h c a n b e u s e d f o r q u e r y p r o c e s s i n g i n t h e P D M S as  d i s c u s s e d i n S e c t i o n 5.2. F i g u r e 5.16 g i v e s t h e a l g o r i t h m o f q u e r y  refor-  m u l a t i o n . E x a m p l e 14 s h o w s a n e x a m p l e o f q u e r y r e f o r m u l a t i o n u s i n g t h i s algorithm. E x a m p l e 1 4 R e f o r m u l a t e Q u e r y Q t o t h a t over l o c a l s c h e m a B u s i n g t h e G L A V m a p p i n g c o m p u t e d i n E x a m p l e 13 47  Chapter  5.  Creating  a Mediated  Schema in  PDMS  Algorithm Q u e r y R e f o r m u l a t i o n ( p , M O P M . B ) ) M , E /* q is the query posed by the user; MapM.E is the G L A V m a p p i n g between M a n d E\ M is the mediated schema; E is the local schema * / Let reformQ be the reformulated query of q Let h e a d ( r e / o r m Q ) = head(g) If any subgoal sp of q S M F o r each subgoal sg of q F i n d in L A V a view Iv s.t. sg € body(Zu) For each variable v in Iv L e t pos be the position of v in body(Zu) Change n to the variable v' in sg s.t. pos is the position of v' in sp F i n d in G A V a view gv s.t. head(pv) = head(Zv) For each variable v in head(pv) Change v to 1/ w i t h the same position in the head of Iv A p p e n d body(pu) to the body of reformQ Else // sg G E For each subgoal sg of q F i n d in G A V a view gv s.t. sp £ body(pv) For each variable u in gv L e t pos be the position of v in body(pi>) Change v to the variable v' in sp s.t. pos is the position of v' in sp F i n d in L A V a view Iv s.t. head(Zv) = head(pv) For each variable v in head(Zv) Change v to t/ w i t h the same position in the head of gv A p p e n d body(lv) to the body of reformQ Return reformQ F i g u r e 5.16: Q u e r y R e f o r m u l a t i o n A l g o r i t h m  Q = q ( d , f) :- M . f l i g h t ( d , f, p r i c e , c o m p a n y , s e r v i c e , c l a s s , I D , d i s c o u n t , depart, arrival, numLeft) U s i n g t h e a l g o r i t h m i n F i g u r e 5.16, Q is r e f o r m u l a t e d t o Q' over B Q' = q ( d , f ) :- B . B _ f l i g h t ( d , c o m p a n y , f, s e r v i c e ) , B . B _ s c h e d u l e ( d , f, d e p a r t , a r r i v a l , n u m L e f t )  •  48  Chapter  5.5.2  5. Creating a Mediated  Schema in  PDMS  Section Summary  I n t h i s s e c t i o n , we d i s c u s s e d t h e s y s t e m s e t u p p h a s e - c r e a t i n g a m e d i a t e d s c h e m a for t h e P D M S f r o m the b e g i n n i n g . W e presented P S M a l g o r i t h m which includes schema mediation, computing G L A V reformulation our idea.  algorithms.  mapping and query  F o r each step, we give examples to  illustrate  M e P S y s provides the first p r o t o t y p e to a u t o m a t i c a l l y create a  mediated schema in P D M S s .  W i t h P S M a l g o r i t h m , queries c a n b e easily  p r o c e s s e d a m o n g d i f f e r e n t peers i n a P D M S .  49  Chapter 6  Updating the Schema  Mediated  S o m e of t h e peers m i g h t b u i l d u p a d a t a b a s e a p p l i c a t i o n u s i n g t h e m e d i a t e d s c h e m a after t h e s y s t e m s e t - u p p h a s e . F o r t h o s e peers h a v i n g a p p l i c a t i o n s over t h e m e d i a t e d s c h e m a , i f t h e m e d i a t e d s c h e m a c h a n g e s , t h e a l r e a d y - b u i l t a p p l i c a t i o n s m i g h t n e e d t o b e r e - b u i l t , w h i c h t a k e s q u i t e a l o t of r e d u n d a n t work.  O n t h e o t h e r h a n d , i f a p e e r leaves, t h a t p e e r s c h e m a i n f o r m a t i o n  needs to b e d r o p p e d f r o m t h e m e d i a t e d schema. However, n o e x i s t i n g a l g o r i t h m h a s b e e n p r o p o s e d t o d e a l w i t h s u c h a case. I n t h i s C h a p t e r , w e d i s c u s s h o w t o u p d a t e t h e m e d i a t e d s c h e m a i n t h e s t e a d y s t a t e - i.e., after the s y s t e m setup phase. In particular, we determine b o t h h o w to u p d a t e the m e d i a t e d s c h e m a a n d m a p p i n g s if a n e w peer joins the P D M S  network  ( S e c t i o n 6.1) o r a n o l d o n e leaves t h e n e t w o r k ( S e c t i o n 6 . 2 ) , a n d h o w t h e m e d i a t e d s c h e m a a n d associated m a p p i n g s change if some peer's l o c a l database s c h e m a evolves ( S e c t i o n 6.3).  6.1  Adding a New Peer to the System  A f t e r the s y s t e m setup phase, every peer m a i n t a i n s a n up-to-date m e d i a t e d s c h e m a M.  If a n e w p e e r P d e c i d e s t o enter t h e n e t w o r k , t h e m e d i a t e d  s c h e m a M needs t o b e u p d a t e d t o a n e w m e d i a t e d s c h e m a M' i n c l u d e P's  schema information.  w h i c h also  H o w e v e r , s i n c e after t h e s y s t e m s e t u p  p h a s e , s o m e p e e r h a s a l r e a d y b u i l t u p a n a p p l i c a t i o n over M. a p p r o a c h is t o c o m p u t e a n e w m e d i a t e d s c h e m a M',  A naive  a n d r e b u i l d a l l those  a l r e a d y - b u i l t a p p l i c a t i o n s , w h i c h takes q u i t e a l o t of r e d u n d a n t w o r k . S i n c e t h e m e d i a t e d s c h e m a c a n c h a n g e p e r i o d i c a l l y d u r i n g t h e life c y c l e of s u c h a P D M S , i t is n o t r e a l i s t i c t o a s k t h e u s e r of e a c h p e e r t o b u i l d a n e w  50  Chapter 6. Updating the Mediated Schema d a t a b a s e a p p l i c a t i o n e v e r y t i m e t h e m e d i a t e d s c h e m a is u p d a t e d . A g o o d s t r a t e g y is t o l e t t h e p e e r m a i n t a i n i t s a l r e a d y - b u i l t a p p l i c a t i o n a n d m a i n t a i n a m a p p i n g f r o m M' t o M.  APPp(M)  S i n c e t h e m a p p i n g f r o m M' t o  P can e a s i l y b e c a l c u l a t e d b y u s i n g t h e M a p p i n g T a b l e t h a t c o r r e s p o n d s t o M', a n y q u e r y over M p a s s e d f r o m o t h e r p e e r c a n b e r e f o r m u l a t e d t o t h a t over P. on which  P's user c a n also use i t s l o c a l s c h e m a o r t h e m e d i a t e d s c h e m a M  APPp is b u i l t . B e l o w is a n c o n c r e t e e x a m p l e o f s u c h a s c e n a r i o .  E x a m p l e 1 5 A n example of a n e w peer j o i n i n g t h e network T h e r e a r e t h r e e p e e r s i n t h e c u r r e n t s y s t e m : A, B a n d C.  M a p p i n g s are  c r e a t e d b e t w e e n A a n d B, a n d B a n d C as MapA_B a n d MapB.c tively. W e only consider relations t h a t c a n represent a  respec-  Concept ( i n t h e f o r m  of a m a p p i n g ) for t h e m e d i a t e d s c h e m a b e c a u s e t h e y a r e t h e c o r e p a r t o f our discussion. C o n s i d e r t h e s i t u a t i o n s h o w n i n F i g u r e 6.1(a): after t h e s e t u p p h a s e , a m e d i a t e d s c h e m a MS\ m a p p i n g s f r o m MS\  h a s b e e n b u i l t a n d m a i n t a i n e d a t each p e e r . T h e  t o A, B a n d C a n d t h e M a p p i n g T a b l e s also have b e e n  built.  ;A¥pB(Ms;t)  (a) initial system  (b) after peer D joins  F i g u r e 6.1: A d d i n g a N e w P e e r t o t h e S y s t e m  A f t e r b u i l d i n g t h e m e d i a t e d s c h e m a MS\  a n d the mappings to local  s o u r c e s , user at p e e r B b u i l t a d a t a b a s e a p p l i c a t i o n APPB  using the medi51  Chapter 6. Updating the Mediated Schema a t e d s c h e m a MS\.  S o t h e information stored at each peer c a n b e s u m m a -  r i z e d as f o l l o w s . - P e e r A : s c h e m a A , Map AM, - Peer B : schema  B,  MS\,  MapMSx-A,  Map .c,  MapA.B,  B  MS\,  MT\.  MT\,  Map s M, M  X  APP (MS ). B  1  - P e e r C : s c h e m a C , Maps.C,  MS\,  MapMSi.c,  MT\. i  S o m e t i m e later, a n e w peer D decides to j o i n i n the network. a m a p p i n g Mapc_D  It c r e a t e s  t o p e e r C, as s h o w n i n F i g u r e 6 . 1 ( b ) .  U s i n g t h e a l g o r i t h m s d e s c r i b e d i n C h a p t e r 5, a n e w m e d i a t e d s c h e m a M a p p i n g T a b l e MT  MS , 2  a n d G L A V m a p p i n g s c a n b e c o m p u t e d at peer  2  C. C p a s s e d MS ,  t o D. D c o m p u t e s MapMS .D-  MT  2  2  t h e G L A V m a p p i n g f r o m MS  t o C u s i n g MapMS -MS  2  2  get t h e n e w G L A V m a p p i n g MapMS .c C a l s o passes MS ,  MT ,  2  MT , 2  t o i t s a c q u a i n t a n c e B.  MapMS .MS  2  2  x  u s i n g MapMS .MS 2  a n d MapMS^C  x  a n d MapMSi-B-  x  2  and  2  a m a p p i n g f r o m m e d i a t e d s c h e m a MS  B com-  B also p a s s e s  t o i t s a c q u a i n t a n c e A . U s i n g MapMS .MSi  MapMS .MSi  and  f o r itself.  2  p u t e s MapMS2.B  C further computes  2  MS , 2  MapMS^JK,  to local schema A can be computed.  2  A t t h e e n d , a l l t h e p e e r s get t h e m o s t u p d a t e d i n f o r m a t i o n a b o u t E a c h p e e r a l s o k n o w s h o w t o m a p i t s l o c a l s c h e m a t o MS . 2  MS . 2  Information  s t o r e d a t e a c h p e e r c a n b e s u m m a r i z e d as f o l l o w s . - P e e r A : s c h e m a A , Map AM,  MS ,  - P e e r B : s c h e m a B, MapAM,  Mapsjo,  APP  B  (MS!),  Ma MS P  2  .MS,,  - P e e r C : s c h e m a C, Map _c, B  M  APPB(MSI).  a n d MapMSiM  MS\  MapMS -MS 2  x  2  MS\, M,  2  MapcjD,  2  MT .  2  Map s  - P e e r D : s c h e m a D, Mapcs>,MS , F o r p e e r B,  MapMS J^,  2  MapMSi-B,  MS , MT 2  MS , 2  MapMS .D, 2  2  MapMS .c, 2  MT . 2  MT . 2  are kept i n its database  application  is k e p t b e c a u s e q u e r y over t h e n e w m e d i a t e d 52  Chapter 6. Updating the Mediated Schema s c h e m a MS  2  c a n b e t r a n s l a t e d t o MS\  a n d further be used b y application  APP {MSi). B  T h u s , every peer c a n keep t h e u p d a t e d m e d i a t e d s c h e m a a n d m a i n t a i n a m a p p i n g from the mediated schema to local schema. C o m p a r e d to the naive a p p r o a c h , n o b u r d e n h a s b e e n a d d e d t o t h e l o c a l p e e r users. A d d i t i o n a l l y , a l l t h e p r e v i o u s l y c r e a t e d d a t a b a s e a p p l i c a t i o n s over p r e v i o u s v e r s i o n s o f mediated schema c a n still be used.  N e w a p p l i c a t i o n s c a n also b e c r e a t e d  over t h e n e w v e r s i o n s o f t h e m e d i a t e d s c h e m a . D e t a i l s o f t h i s e x a m p l e is s h o w n i n t h e A p p e n d i x A .  6.1.1  •  Algorithm  F i g u r e 6.2 s h o w s t h e a l g o r i t h m o f u p d a t i n g t h e m e d i a t e d i n f o r m a t i o n f o r t h e case o f a d d i n g a p e e r . I n t h i s a l g o r i t h m , w e c o n s i d e r h o w t o get a n e w mediated schema, a n d how to maintain the mapping from the new mediated schema t o t h e o l d one o n w h i c h a n application might have been built. A l g o r i t h m U p d a t e M e d i a t e d I n f o r m a t i o n ( A d d i n g al P e e r E) Let M be the original mediated schema Let MT be the corresponding M a p p i n g T a b l e set 1. E creates a mapping to F which is already i n the network 2. E creates a mediated schema MEF using the Procedure i n Figure 5.1 3. F creates a new mediated schema M' using MEF and M using the Procedure i n Figure 5.14 F computes G L A V m a p p i n g MapM'.M a n d MapM'.F F updates its maintained information: M —> M', MT —> MT' a n d Mapu-F —* MapM.F F send M', corresponding MT to E F broadcast M', corresponding MT' a n d MapM'M to a l l other acquaintances F computes MapM'.F for itself , 4. E maintains MT', M', a n d computes Mapw.E using MT' 5. F o r any other peer G that received the message originally from F If there are applications built over M G maintains the following information: M', MT', MapM'.M, M, 1  MapM'-M, MapM'.G Else G simply updates M -> M'', MT -> MT' and Map .G -> Map '.G G further broadcasts a l l mediated schema information to its acquaintances M  M  F i g u r e 6.2: U p d a t e m e d i a t e d i n f o r m a t i o n w h e n n e w p e e r j o i n s t h e n e t w o r k  53  Chapter 6. Updating the Mediated Schema  6.2  Dropping a Peer from the System  In this section, we explore four possible approaches to u p d a t e the m e d i a t e d s c h e m a after d r o p p i n g one p e e r s c h e m a . W e c o m p a r e t h e a d v a n t a g e s a n d d i s a d v a n t a g e s of e a c h s t r a t e g y . ' C o m p a r a b l y , S t r a t e g y O n e a n d T w o are t h e n a i v e s o l u t i o n s . S t r a t e g y T h r e e is b e t t e r t h a n O n e a n d T w o . H o w e v e r , i n t h e w o r s t case, it faces t h e s a m e p r o b l e m as S t r a t e g y O n e a n d T w o .  Strat-  egy F o u r , t h o u g h poses a d d i t i o n a l r e q u i r e m e n t for t h e l e a v i n g p e e r , c a n keep the system m u c h more stable a n d does not require more time. W e t h i n k this is t h e b e s t a p p r o a c h a m o n g a l l .  S t r a t e g y O n e : O n c e a p e e r d e c i d e s t o leave t h e n e t w o r k , t h e p e e r needs t o notify any other node i n the network, w h i c h triggers the s c h e m a m e d i a t i o n p r o c e s s f r o m t h e v e r y b e g i n n i n g . E v e r y n o d e i n t h e n e t w o r k is r e g a r d e d as a new peer. A d v a n t a g e s : T h e s e t u p for s u c h a s y s t e m is v e r y easy. N o a d d i t i o n a l a l g o r i t h m for r e m o v a l o p e r a t i o n needs t o b e d e s i g n e d . O n l y s c h e m a m e d i a t i o n algorithm will be involved. Disadvantage:  B a s i c a l l y , t h i s s t r a t e g y is n o t r e a l i s t i c .  F i r s t , the schema  m e d i a t i o n p r o c e s s w i l l b e t o o f r e q u e n t . If p e e r s leave t h e n e t w o r k  frequently  i n a s h o r t p e r i o d , t h e r e w i l l b e t o o m u c h s y s t e m w o r k a s s i g n e d for s c h e m a m e d i a t i o n only. Resources c a n not be used wisely. Second, the previouslyc r e a t e d m e d i a t e d s c h e m a c a n n o t b e f u l l y u s e d of i n t h e p r o c e s s of c r e a t i n g t h e n e w one. T h e n e w m e d i a t e d s c h e m a m i g h t n o t c h a n g e d r a m a t i c a l l y f r o m t h e o l d one b u t t h i s s t r a t e g y r e q u i r e s t h a t t h e n e w v e r s i o n of t h e m e d i a t e d schema should be created f r o m the beginning. I n c o n c l u s i o n , t h i s s t r a t e g y is n o t at a l l s a t i s f a c t o r y .  Strategy Two:  R e - d o the schema m e d i a t i o n once every assigned p e r i o d .  If i n one p e r i o d , one p e e r is l e a v i n g t h e n e t w o r k , t h e s y s t e m needs t o d o t h e s c h e m a m e d i a t i o n a g a i n f r o m t h e b e g i n n i n g . T h e r e are t w o w a y s t o k n o w whether a peer X  is l e a v i n g t h e n e t w o r k .  n o d e b e f o r e its d e p a r t u r e .  (1) P e e r X  (2) O t h e r p e e r , u s u a l l y X's  notifies any other acquaintance, P I N s  54  Chapter 6. Updating the Mediated Schema o r c o m m u n i c a t e s w i t h p e e r X. t o h a v e left t h e n e t w o r k .  If it gets n o r e s p o n s e , X  can be assumed  X ' s i n f o r m a t i o n needs t o b e r e m o v e d f r o m  the  mediated s c h e m a a n d triggers a new s c h e m a m e d i a t i o n f r o m the beginning. A d v a n t a g e s : B e t t e r t h a n S t r a t e g y one b e c a u s e t h e f r e q u e n c y of t h e s c h e m a m e d i a t i o n p r o c e s s is d e c r e a s e d . A l g o r i t h m s i n v o l v e d are s t i l l s i m p l e . Disadvantages:  T h e previously-created mediated schema cannot be  fully  u s e d of i n t h e p r o c e s s of c r e a t i n g t h e n e w o n e .  S t r a t e g y T h r e e : T h e leaving peer X  w i l l n o t n o t i f y o t h e r p e e r s w h e n it  leaves t h e n e t w o r k . X ' s a c q u a i n t a n c e Y w i l l r e c o g n i z e X ' s l e a v i n g b y P I N i n g X  or c o m m u n i c a t i n g w i t h X  b u t getting no response. Y once realizes  a b s e n c e , it w i l l t r y t o c o m p u t e t h e n e w m e d i a t e d s c h e m a w i t h o u t t h e s c h e m a of X.  However, this requires t h a t peer V be able to recognize  w h i c h relation i n the M a p p i n g T a b l e s comes f r o m Advantages:  X's  knowing  X.  (1) U p d a t i n g of t h e m e d i a t e d s c h e m a is t i m e l y .  leave t h e n e t w o r k at a n y t i m e w i t h o u t n o t i f y i n g a n y o t h e r .  (2) P e e r c a n (3) T h e c a l c u -  l a t i o n is b a s e d o n t h e p r e v i o u s m e d i a t e d s c h e m a , so t h e p r o c e s s w i l l n o t b e t i m e - c o n s u m i n g . O l d v e r s i o n s of t h e m e d i a t e d s c h e m a c a n b e f u l l y u s e d of. Disadvantages:  (1) E v e r y p e e r w o u l d b e a b l e t o k n o w o t h e r p e e r s s c h e m a  f r o m t h e M a p p i n g T a b l e s , w h i c h is n o t i d e a l for t h e sake of s a f e t y a n d p r i v a c y r e a s o n . (2) F o r p e e r s t h a t lost c o n n e c t i o n b e c a u s e o t h e r p e e r s ' leave, s u c h peers n e e d t o r e - j o i n t h e n e t w o r k , w h i c h c o s t s a d d i t i o n a l s c h e m a m e d i a t i o n w o r k after t h e d e l e t i o n w o r k . W e use E x a m p l e 16 t o e x p l a i n t h i s i d e a .  Example 16  C o n s i d e r t h e e x a m p l e p r o p o s e d i n E x a m p l e 15 ( d e t a i l s i n A p -  p e n d i x A ) , a n d a s s u m e t h e s t a t u s of t h e n e t w o r k is w h e n MS  2  has been  c r e a t e d a n d a l l m e d i a t e d i n f o r m a t i o n h a s b e e n s t o r e d at e v e r y p e e r , as is s h o w n i n F i g u r e 6.1(b).  C a s e 1: B r e c o g n i z e s t h a t A has left t h e n e t w o r k . O n c e B r e c o g n i z e s t h a t A h a s left t h e n e t w o r k , it c h e c k s y l ' s i n f o r m a t i o n i n t h e M a p p i n g T a b l e MT . 2  A is o n l y i n v o l v e d i n t h e m a p p i n g w i t h B  and  has n o o t h e r a c q u a i n t a n c e s , so t h e d e l e t i o n w i l l b e easy:  55  Chapter 6. Updating the Mediated Schema 1. r e m o v e f r o m M a p p i n g T a b l e set MT  all rows originally f r o m schema  2  a l l c o l u m n s o n l y f r o m s c h e m a A —> MT3  A a n d r e m o v e f r o m MT  2  2. for e v e r y r e l a t i o n a i n MS ,  if a is o r i g i n a l l y f r o m s c h e m a A, r e m o v e a  2  f r o m MS  a n d u p d a t e each r e l a t i o n a i n MS2  2  3. delete M a p ^ j g f r o m B's 4. create t h e a m a p p i n g  i n f o r m a t i o n set; MapMS MS \ 3  2  MT3, a n d MapMS .MS  5. b r o a d c a s t MS3,  c o r r e s p o n d i n g to MT3;  3  2  to the other peers.  C a s e 2: 5 R e c o g n i z e s t h a t C h a s left t h e n e t w o r k .  T h i s is a m o r e g e n e r a l  case w h e n t h e l e a v i n g p e e r h a s m o r e t h a n o n e a c q u a i n t a n c e a n d a c t s as a bridge connecting other peers. B f i r s t checks MT .  F r o m the two intermediate mediated schemas I  2  a n d IMCD,  B k n o w s t h a t C o r i g i n a l l y h a d c o n n e c t i o n w i t h B a n d D.  MBC So B  w i l l first u p d a t e the mediated s c h e m a a n d the M a p p i n g T a b l e . U p d a t e the mediated  information:  1. D e l e t e f r o m t h e m e d i a t e d s c h e m a a l l r e l a t i o n s t h a t o r i g i n a l l y f r o m s c h e m a B.  F o l l o w i n g t h e d e t a i l s i n A p p e n d i x A , delete  come  "B_flight(date,  c o m p a n y , f l i g h t N o , service)" a n d " B _ p r i c e ( f l i g h t N o , class, price)"  from  MS . 2  2. D e l e t e f r o m t h e M a p p i n g T a b l e r o w s t h a t r e p r e s e n t r e l a t i o n s f r o m s c h e m a B a n d a l l i n t e r m e d i a t e r e l a t i o n s w h i c h i n v o l v e s s c h e m a B.  In this ex-  a m p l e , delete f r o m MT  "IM c-flight",  2  "IMCD-flight"  and  the rows " B J I i g h t " , "B_price",  B  "MSi.flight".  3. C h e c k a l l c o l u m n s of t h e M a p p i n g T a b l e w h e t h e r t h e r e are c o l u m n s n o t useful any more. Delete all such columns. Delete "ID" a n d "discount" f r o m MT  2  in this example.  4. U p d a t e t h e r e l a t i o n i n t h e m e d i a t e d s c h e m a t h a t c o m e s f r o m a m a p ping.  I n t h i s e x a m p l e , "flight(date,  f l i g h t N o , price, c o m p a n y , service,  class, ID, discount, depart, arrival, n u m L e f t ) " is n o w u p d a t e d t o  "flight(date,  f l i g h t N o , price, company, service, class, depart, arrival, n u m L e f t ) " .  56  Chapter 6. Updating the Mediated Schema 5. S i n c e D is l o s t c o n n e c t i o n w i t h a n y o t h e r n o d e i n t h e n e t w o r k , B c a n a s s u m e t h a t D needs t o r e - j o i n t h e n e t w o r k . for  B d o the same deletion  D.  6. G e t n e w m e d i a t e d s c h e m a MS3 MT3. Create a mapping  and corresponding MappingTables  MapMS .MS 3  7. B r o a d c a s t MS3, MT3 a n d MapMS .MS  2  3  t  o  a 1 1  t  n  e  o  t  n  2  e  r  peers.  A l l peers t h a t lost c o n n e c t i o n need to re-enter the network.  F o r the  w o r s t case, e.g. t h e t o p o l o g y s h o w n i n F i g u r e 6 . 3 , a l l n o d e s n e e d t o r e - j o i n t h e n e t w o r k i f B leaves. (MS2)  (MS2)"  (MS2).  4m: Di  1 V(MS2)  A..  F i g u r e 6.3: Peer L e a v i n g ( B a d T o p o l o g y )  • S t r a t e g y F o u r : A p e e r X c a n n o t leave u n t i l X c a l c u l a t e s t h e n e w m e d i a t e d s c h e m a . T h e i d e a is t h a t g i v e n t h e o r i g i n a l m e d i a t e d s c h e m a M s c h e m a X t h a t is t o b e r e m o v e d , c o m p u t e t h e r e m a i n i n g p a r t .  and local  In order to  d o t h i s , a " r e m o v a l " o p e r a t o r needs t o b e d e s i g n e d t o r e m o v e p a r t of t h e mediated schema using X s y  mediated schema.  l o c a l s c h e m a a n d m a p p i n g s t o get t h e u p d a t e d  T h e removal o p e r a t i o n c a n remove some relations, or  some a t t r i b u t e s i n the relations. F u r t h e r , X helps each of its acquaintances find  another acquaintance i n X ' s acquaintance list t o m a k e sure t h a t w h e n  X leaves, t h e n e t w o r k is s t i l l c o n n e c t e d . X does n o t n e e d t o b u i l d a m a p p i n g for t h e m s i n c e t h e m a p p i n g c a n b e i n f e r r e d f r o m t h e e x i s t i n g M a p p i n g T a b l e s .  57  Chapter 6. Updating the Mediated Schema A d v a n t a g e : T h i s is a better decision t h a n the previous t w o strategies. F i r s t , it is n o t n e c e s s a r y t o r e - b u i l d t h e m e d i a t e d s c h e m a f r o m t h e v e r y b e g i n n i n g every time. T h e original mediated schema c a n b e fully used w h e n b u i l d i n g t h e n e w one. S e c o n d , t h e c a l c u l a t i o n t i m e f o r r e m o v i n g p a r t o f t h e m e d i a t e d s c h e m a is f a r less t h a n r e - b u i l d i n g t h e w h o l e m e d i a t e d s c h e m a . D i s a d v a n t a g e s : T h e o n l y d i s a d v a n t a g e is t h a t t h e l e a v i n g p e e r needs t o d o a d d i t i o n a l w o r k a n d n o t i f y o t h e r n o d e s b e f o r e i t leaves t h e n e t w o r k . I n t h i s s t r a t e g y , p e e r s c a n n o t leave w i t h o u t n o t i f y i n g t h e o t h e r n o d e s . H o w e v e r , s o m e e x i s t i n g P 2 P d a t a b a s e s y s t e m s also r e q u i r e s t h a t s o m e p e e r needs t o d o e x t r a t a s k s b e f o r e l e a v i n g t h e n e t w o r k . F o r e x a m p l e , [16] r e q u i r e s t h a t o n l y leaf n o d e s i n t h e i r n e t w o r k s t r u c t u r e c a n v o l u n t a r i l y leave t h e network while other nodes must find a replacement t o store their i n f o r m a t i o n first  b e f o r e l e a v i n g . S o t h i s is a c c e p t a b l e i f w e r e q u i r e t h a t a n o d e n e e d t o  t r i g g e r a r e m o v a l o p e r a t i o n b e f o r e i t leaves t h e n e t w o r k . T h e r e m o v a l a l g o r i t h m is s h o w n i n F i g u r e 6.4 Procedure R e m o v e L o c a l S c h e m a (MS, MT, X, MapMS-x) /* MS is the mediated schema, MT is the corresponding M a p p i n g T a b l e set, X is the local peer schema, MapMSJi is the m a p p i n g from MS to X */ For every mapping mapMS.x € MapMS-X L e t R be body(LAV(mapMS-x)) Let mt € MT be the M a p p i n g T a b l e w i t h R as the first relation If mt doesn't exist /* R is a direct copy of local peer relation * / M a r k R i n MS for later deletion Else Let r be body{GAV(mapMS.X)) Let h be head (mapMS-x) For each attribute attr i n h Let SG be the subgoals i n r containing attr If i n mt, no other relations except those i n SG have corresponding attributes for attr M a r k attr i n MS for later deletion Delete a l l attributes and relations w i t h deletion marks from MS to M S ' U p d a t e MT to MT' Create a m a p p i n g MapMS'.MS from MS' t o MS R e t u r n MS', MT', Map >_MS MS  F i g u r e 6.4: U p d a t e m e d i a t e d i n f o r m a t i o n w h e n a p e e r leaves t h e n e t w o r k  58  Chapter 6. Updating the Mediated Schema T h e r e s u l t s MS',  a n d Map s'.MS  MT'  need to be broadcasted to all  M  other peers i n the network.  E x a m p l e 17  C o n s i d e r t h e e x a m p l e p r o p o s e d i n E x a m p l e 15 ( d e t a i l s i n A p -  p e n d i x A ) , a n d a s s u m e t h a t m e d i a t e d s c h e m a for A,  C a n d D has been  B,  c r e a t e d . P e e r D w o u l d l i k e t o leave t h e n e t w o r k , so D d o e s t h e c a l c u l a t i o n o f t h e n e w m e d i a t e d s c h e m a . F i g u r e 6.5, 6.6 a n d 6.7 s h o w t h e n e t w o r k s t a t u s before D ' s leaving. M e d i a t e d schema M S = { flight(date, flightNo, price, company, service, class, I D , discount, depart, 2  arrival, numLeft)  } F i g u r e 6.5: E x a m p l e 17. I n p u t I n f o r m a t i o n (a)  MappingTable M S i .flight  1  2  .flight  3  4  flig i t :  -5  6  7  8  7  8  7  8  1  2  3  4  5  6  IM -flight  1  2  3  4  5  6  AJlight  1  2  3  BJlight  1  3  2  4  4  5  2  4  MSi AB  1  3  1  2  3  1  3  B.price I  MBC-flight  B_flight  1  B-price 2  3  IMCD-flight  1  2  Cschedule  2  3  Cschedule  3  6  D_schedule  1  11  5  6  7  3  4  5  2 1 1  3  3  4  1 2  C-price  10  2  2  Cprice  9  1  3  2  F i g u r e 6.6: E x a m p l e 17 I n p u t I n f o r m a t i o n (b) A c c o r d i n g to the a l g o r i t h m m a p p i n g i n MapMS2S>,  i n F i g u r e 6.4, D c h e c k s e v e r y  if t h e L A V p a r t is d i r e c t l y  conjunctive  from a relation r  G 59  Chapter 6. Updating the Mediated Schema  Map s .D'M  2  LV — { M S 2 ~ D i ( d a t e , flightNo, depart, arrival, numLeft) :M5 2.flight(date, flightNo, price, company, service, class, I D , discount, depart, arrival, numLeft) ,  }  '"  GV = { MS2--Di(date, flightNo, depart, arrival, numLeft) :£>.D-schedule(date, flightNo, depart, arrival, numLeft) }  "'•  F i g u r e 6.7: E x a m p l e 17 I n p u t I n f o r m a t i o n (c)  D, delete r f r o m t h e m e d i a t e d s c h e m a MS  a n d the M a p p i n g T a b l e  2  In this example, there's no such m a p p i n g .  MT . 2  If t h e c o n j u n c t i v e m a p p i n g is  o b t a i n e d f r o m a m a p p i n g b e t w e e n a p a i r o f p e e r s , let t h e L A V p a r t b e lav a n d let t h e G A V p a r t b e gav.  D e l e t e f r o m t h e MT  2  gav a n d t h e i n t e r m e d i a t e r e l a t i o n r e l a t e d w i t h gav. delete "D_schedule" a n d "IM D C  flight" f r o m MT .  all relations i n  S o i n this example, D e l e t e f r o m MT  2  2  all  c o l u m n s t h a t d o n ' t contain a n y values. I n this e x a m p l e , delete t h e c o l u m n s "depart",  "arrival" a n d "numLeft".  U p d a t e the relation in the mediated  s c h e m a t h a t r e p r e s e n t s lav a c c o r d i n g t o t h e u p d a t e d M a p p i n g T a b l e . I n t h i s e x a m p l e , "flight(date, flightNo, price, company, service, class, ID, discount, depart; arrival, numLeft)" is u p d a t e d t o "flight(date, flightNo, price, company, service, class, ID, discount)". S o a n e w M a p p i n g T a b l e MT3, s h o w n i n F i g u r e 6.8, c a n b e o b t a i n e d . A n e w m e d i a t e d s c h e m a MS3  c a n also b e o b t a i n e d ; s h o w n i n F i g u r e 6.9. D  also creates a m a p p i n g f r o m MS3 t o MS , 2  D passes MS3,  M T 3 a n d MapMS -MS 3  s h o w n i n F i g u r e 6.10. t  2  o  a  n  t  n  e  other peers i n the  n e t w o r k a n d leaves t h e n e t w o r k .  •  60  Chapter 6. Updating the Mediated Schema MappingTable  flight:  MSi  .flight  1  2  3  4  5  6  7  8  MSi  .flight  1  2  3  4  5  6  7  8  IM -flight  1  2  3  4  5  6  AJlight  1  2  3  BJlight  1  3  2  4 7  8  AB  B_price  1  3 3  IM -flight  1  2  B_flight  1  3  BC  1  B.price Cschedule  2  2 4  5  2  4  3  6 2  1  3  C-price  2  1  3  F i g u r e 6.8: E x a m p l e 17 O u t p u t (a) M e d i a t e d schema MS3 = { flight(date, flightNo, price, company, service, class, I D , discount)  } F i g u r e 6.9: E x a m p l e 17 O u t p u t (b)  6.2.1  Section Summary  In this section, we have explored four possible ways of u p d a t i n g the m e d i a t e d s c h e m a w h e n a p e e r leaves t h e n e t w o r k .  W e h a v e also  illustrated  the advantages a n d disadvantages of each approach w i t h examples. W e f i nally presented a n a l g o r i t h m t o remove a peer schema's i n f o r m a t i o n f r o m the mediated schema based o n the fourth approach. T h e f o u r t h a p p r o a c h outperforms  all other approaches because w i t h the s u p p o r t of the removal  operation, all previously constructed applications c a n still be available, a l l the peers are still connected, a n d n o r e d u n d a n t w o r k w i l l be resulted.  6.3  Evolution of a Peer Local Schema  If p e e r E's l o c a l d a t a b a s e s c h e m a evolves f r o m SE t o S ' after t h e s y s t e m E  s e t u p p h a s e , t h e m e d i a t e d s c h e m a M needs t o b e c h a n g e d a c c o r d i n g l y .  A  61  Chapter 6. Updating the Mediated Schema MapMS -MS '3  2  LV = { MS3.MS21ida.te, flightNo, price, company, service, class, I D , discount) :MS3. flight (date, flightNo, price, company, service, class, I D , discount) }  "'  GV = { MS3.MS21ida.te, flightNo, price, company, service, class, I D , discount) :M52-flight(date, flightNo, price, company, service, class, I D , discount, depart, arrival, numLeft)  } "' F i g u r e 6.10: E x a m p l e 17 O u t p u t  (c)  s i m p l e , however effective, s o l u t i o n t o t h i s case is t o f i r s t t r e a t E as l e a v i n g t h e n e t w o r k ( r e m o v i n g SE f r o m M),  then joining the network w i t h S ' . E  62  Chapter 7  A Study of M a p p i n g Composition We  have presented o u r a l g o r i t h m s of creating a m e d i a t e d s c h e m a a n d m a p -  p i n g s t o l o c a l s o u r c e s i n a P D M S i n C h a p t e r 5. W e f u r t h e r d i s c u s s e d h o w t o u p d a t e t h e m e d i a t e d s c h e m a i n C h a p t e r 6.  In a P D M S dealing w i t h  q u e r y p r o c e s s i n g , m a p p i n g c o m p o s i t i o n is a l w a y s a m a i n issue t o c o n s i d e r . How  w e l l t h e m a p p i n g MapA.c  and  MapB.c  keeps t h e i n f o r m a t i o n o f m a p p i n g s Map  largely decides h o w well the P D M S c a n transfer  AM  information  a m o n g different peers. T h u s , we m a k e a s t u d y of m a p p i n g c o m p o s i t i o n i n this chapter. N o t e t h a t o u r f o c u s i n t h i s t h e s i s is t o u n d e r s t a n d t h e f u n d a m e n t a l s o f m a p p i n g c o m p o s i t i o n i n t h i s c o n t e x t . A s s u c h , o u r a l g o r i t h m is n o t d e s i g n e d t o h a n d l e all p o s s i b l e p a t t e r n s , b u t r a t h e r t o f o c u s o n t h o s e t h a t a r e t h e m o s t common.  In particular, we only consider i n p u t m a p p i n g s to b e m a p p i n g s  w i t h t h e s a m e Concept ( D e f i n i t i o n 3 ) , i g n o r i n g s u c h c o m p l i c a t e d f a c t o r s as self-join a n d self-restrictive c o m p o n e n t (Definition 6). O n the other h a n d , u s i n g M e P S y s is a c t u a l l y t r a n s f e r r i n g t h e p r o b l e m o f m a p p i n g c o m p o s i t i o n into another: using the m e d i a t e d s c h e m a t o relate different schemas, w h i c h is  more comprehensible. However, we make a general c o m p a r i s o n here to  see  h o w well each a l g o r i t h m works i n different c i r c u m s t a n c e s . S e c t i o n 7.1 p r e s e n t s f o u r e x a m p l e s w i t h t h e i n t u i t i o n o f w h e r e t h e c o m -  p l e x i t y c o m e s f r o m . S e c t i o n 7.2 s h o w s t h e a n a l y s i s r e s u l t s .  7.1  Complexity: Where Difficulties Come From  I n section 2.2, we have presented the definition of m a p p i n g c o m p o s i t i o n a n d b r i e f l y i n t r o d u c e d t h e c o m p l e x i t y o f m a p p i n g c o m p o s i t i o n i n first o r d e r l o g i c 63  Chapter 7. A Study of Mapping  Composition  l a n g u a g e . I n t h i s s e c t i o n , w e p r e s e n t s e v e r a l e x a m p l e s t o s h o w w h e r e diffic u l t i e s c o m e f r o m f o r t h e p r o b l e m of m a p p i n g c o m p o s i t i o n . T h e s e e x a m p l e s a r e w e l l - s t u d i e d i n [20] a n d [11]. W e q u o t e t h e m h e r e u s i n g t h e i r o r i g i n a l m a p p i n g l a n g u a g e . T h e t r a n s l a t i o n t o c o n j u n c t i v e m a p p i n g s are t r i v i a l . M a d h a v a n et a l ' s w o r k [20] i n t r o d u c e d t h e c o m p l e x i t y of m a p p i n g c o m position b y a n a l y z i n g t h e n u m b e r of c o m p o s e d m a p p i n g s . T a k e a look at the following two examples. E x a m p l e 1 8 A s s u m e t h e f o l l o w i n g m a p p i n g s f r o m [20]:  M -B  = {a{x,y)C  M -c  = {b(x,xi),b(xi,x ),b(x ,y)  A  B  b(x,xi),b{xi,y)} 2  Q c{x,y)}  2  T h e final c o m p o s e d m a p p i n g set MA-C i n v o l v e s m o r e t h a n o n e f o r m u l a s : C c(x,yi)  (18.a)  a ( x i , x ) , a ( x , x ) C c(j/i,x)  (18.b)  a(x,xi),a(xi,x ) 2  2  2  a(x,xi),a(xi,x ),a(x ,y) 2  2  C c(x,yi),c{yi,y)  (18.c)  • Example  18 shows that the number  pend on the number  of the input  of composed  mappings  does not de-  mappings.  T o a n a l y z e E x a m p l e 18, [20] i l l u s t r a t e s r e l a t i o n a, b a n d c u s i n g p a t h length.  S u p p o s e b e n c o d e s a l l t h e edges o f a g r a p h G.  R e l a t i o n b(x, y)  m e a n s t h e r e ' s a p a t h f r o m x t o y w h o s e p a t h l e n g t h is o n e . R e l a t i o n a ( x , y) m e a n s t h e r e ' s a p a t h f r o m x t o y w h o s e p a t h l e n g t h is t w o . S i m i l a r l y , r e l a t i o n c ( x , y) m e a n s t h e r e ' s a p a t h f r o m x toy  w h o s e p a t h l e n g t h is t h r e e .  S o t h e m a p p i n g i n MA-B m e a n s a is a s u b s e t o f t h e n o d e p a i r s w i t h p a t h s of l e n g t h t w o i n G, a n d m a p p i n g i n M -c B  means a l l node pairs i n G w i t h  p a t h o f l e n g t h t h r e e a r e a s u b s e t of c. (18.a) d e s c r i b e s t h e fact t h a t i f t h e r e ' s a p a t h o f l e n g t h f o u r s t a r t i n g f r o m x , t h e n t h e r e is a p a t h o f l e n g t h t h r e e s t a r t i n g f r o m x . (18.b) s h o w s that if there's a p a t h of length four e n d i n g at x , then there's a p a t h o f length \ 64  Chapter  7. A Study of Mapping  Composition  t h r e e e n d i n g a t x. (18.c) tells u s i f t h e r e ' s a p a t h o f l e n g t h t h r e e i n a , t h e n t h e r e is a p a t h of l e n g t h t w o i n c. A l l o f t h e a b o v e t h r e e m a p p i n g s c a n b e i n f e r r e d f r o m MA-B a n d MB-C as t h e m a p p i n g f r o m A t o C. E x a m p l e 1 9 A s s u m e t h e f o l l o w i n g m a p p i n g s f r o m [20]: M -B  = {a {x,y)  A  Q  rg  a (x,y)  r  g  C b (x,xi),bg{x ,y)  gg  M -c  b (x,xi),b (xi,y) g  }  1  = {b (x,xi),bg(xi,x ),bg(x2,y)  B  r  C  2  bg(x,xi),b {xi,y)  C  g  c {x,y) rgg  c (x,y)} gg  T h i s e x a m p l e is also o r i g i n a l l y p r e s e n t e d i n [20]. T h e f i n a l c o m p o s e d m a p p i n g is a n i n f i n i t e set o f m a p p i n g s : Cc {x,y)  a (x,y) gg  (19.a)  gg  a {x,xi),a (xi,X2)  Qcrgg{x,y\)  a (x,xi),a g(x-i,X2),  ...,a (x ,x i)  rg  rg  gg  g  Cr (x,yi), gg  gg  c (yi, gg  n  (19.b) C  n+  y ) , • • •, c (y -i, 2  gg  n  (19.n)  Vn)  • Example  19 tells us that the composition  in infinite set of composed  of finite mappings may result  mappings.  T o a n a l y z e E x a m p l e 18, [20] uses c o l o r e d edges t o i l l u s t r a t e t h i s e x a m p l e . Informally,  each a b o v e e q u a t i o n c a p t u r e s t h e f a c t t h a t i f t h e r e ' s a p a t h  s t a r t i n g f r o m a r e d edge f o l l o w e d b y 2 n + 1 g r e e n edges, t h e n t h e r e m u s t b e a p a t h s t a r t i n g f r o m a r e d edge f o l l o w e d b y 2 n g r e e n edges. E a c h m a p p i n g o n l y e n c o d e s f i n i t e steps i n a p a t h . E a c h t i m e , a r e d edge c a n b e a d d e d t o t h e b e g i n n i n g o f s e v e r a l g r e e n edges, w h i c h w i l l c a u s e a n e w m a p p i n g . A l l of these m a p p i n g s c a n n o t b e e x p r e s s e d b y o t h e r s .  S o a n infinite  mapping  set w i l l b e c o m e t h e r e s u l t o f m a p p i n g c o m p o s i t i o n i n E x a m p l e 1 9 . I n F a g i n et a l ' s w o r k [11], t h e y a n a l y z e d m a p p i n g c o m p o s i t i o n p r o b l e m b y p r o v i n g t h a t i t is N P - c o m p l e t e i f t h e m a p p i n g is i n a f i r s t - o r d e r l o g i c language.  T h e y further proved that using Second-Order Logic m a p p i n g  language, a l l composed m a p p i n g c a n be expressed. 65  Chapter 7. A Study of Mapping Composition E x a m p l e 20  A s s u m e t h e f o l l o w i n g m a p p i n g s f r o m [11]:  M a p p i n g f r o m A to B :  Ve (Emp(e) -r* 3 m  Mgrl(e,m))  (20.1)  M a p p i n g f r o m B to C : VeVm  (Mgr-[(e,m) —> Mgr(e,m))  Ve (Mgri(e,e)  ->  (20.2)  SelfMgr(e))  (20.3)  F a g i n et a l . p r o v e d t h a t t h e c o m p o s i t i o n of M a p p i n g A - B a n d B - C is n o t d e f i n a b l e b y a n y finite set o f s o u r c e - t o - t a r g e t t u p l e g e n e r a t e d d e p e n d e n c i e s , is n o t  first-order-definable  a n d is n o t d e f i n a b l e i n D a t a l o g [11].  However,  t h i s c a n b e expressed u s i n g second-order logic. M a p p i n g from A to C :  3 / (V e(Emp(e) -» Mgr(e,f(e))  A V e(Emp(e)(e = /(e))  SelfMgr(e))).  • Example 20 shows that the composed mapping of two mappings in firstorder logic might not be expressed by first-order logic. ficulty  I n t u i t i v e l y , t h e dif-  i n t h i s e x a m p l e c o m e s f r o m t h e s e c o n d a t t r i b u t e of  Mgrl(e,m).  In  m a p p i n g (20.1), m is a n e x i s t e n t i a l v a r i a b l e . It is n o t b o u n d b y a n y of t h e v a r i a b l e s a p p e a r i n g i n t h e l e f t - h a n d s i d e . H o w e v e r , i n (20.3), t h e s i d e is a c t u a l l y b o u n d b y t h e s e c o n d a t t r i b u t e o f a selector w h i c h d e c i d e s w h e n  Self Mgr.  can be m a p p e d to  . A s t h e r e is n o i n f o r m a t i o n a b o u t  lost i n s c h e m a A. is e x p r e s s e d i n  7.2  Mgrl  Mgrl(e,m).  m  in  Emp,  Mgr  right-hand  m  acts as  a n d w h e n to  s u c h a s e l e c t o r is  T h u s m a p p i n g c o m p o s i t i o n for t h i s e x a m p l e w i l l f a i l if it first-order  logic.  Exploring Possible Patterns in M a p p i n g Composition  P r e v i o u s w o r k [11, 20] d e f i n e d w h e n m a p p i n g s w o u l d b e d i f f i c u l t or i m p o s s i b l e i n s e v e r a l w a y s . F o r e x a m p l e , F a g i n et a l . ' s p r o v e d a n d a n a l y z e d w h e n  66  Chapter 7. A Study of Mapping Composition the definability a n d c o m p u t a t i o n a l c o m p l e x i t y of t h e c o m p o s i t i o n of t w o schema mappings can be determined. T h e y showed that "the composition o f a finite s e t o f f u l l s o u r c e - t o - t a r g e t t u p l e - g e n e r a t i n g d e p e n d e n c i e s ( s o u r c e t o - t a r g e t t g d s ) w i t h a finite set o f s o u r c e - t o - t a r g e t t g d s is a l w a y s d e f i n a b l e b y a finite set o f s o u r c e - t o - t a r g e t t g d s , b u t t h e c o m p o s i t i o n o f a finite set o f s o u r c e - t o - t a r g e t t g d s w i t h a finite set of f u l l s o u r c e - t o - t a r g e t t g d s m a y n o t be definable b y a n y (finite or infinite) of source-to-target tgds; f u r t h e r m o r e , i t m a y n o t b e d e f i n a b l e b y a n y f o r m u l a o f least  fixed-point  l o g i c , a n d t h e as-  s o c i a t e d c o m p o s i t i o n q u e r y m a y b e N P - c o m p l e t e " [11]. F o r c l a r i t y , K o l a i t i s s u m m a r i z e s their results i n T a b l e 7.1. T a b l e 7.1: A S u m m a r y o f M a p p i n g C o m p o s i t i o n C o m p l e x i t y f r o m [11, 18] MapAM  .  MapB.c  Composition  MapA-C  Query finite s-t  set o f f u l l tgds  (p(x)  ->ib(x) finite  set o f s-t  finite  finite  set o f s-t  t g d s ip(x)  tp{x,y)  ib(x,y)  ->3y  set o f (full)  may not be de-  in N P ; can be  tgds  finable:  NP-complete  finite  ' t g d s <p(x) - > 3 ? / s-t ip{x,y)  set o f s-t  t g d s ip(x) - > 3 J /  in P  -^ib(x)  <p(x)  set in  by any  o f s-t t g d s ; FO-logic;  in  Datalog  In their analysis, they divided the m a p p i n g composition patterns i n to t w o m a i n c a t e g o r i e s b a s e d o n w h e t h e r t h e first m a p p i n g is a finite s e t o f f u l l s-t t g d s o r n o t . If t h e first m a p p i n g is a finite set o f f u l l s-t t g d s , t h e m a p p i n g c o m p o s i t i o n c o m p l e x i t y falls i n t o P s p a c e . If n o t , t h e c o m p l e x i t y is l i k e l y t o f a l l i n t o N P . O u r w o r k r e q u i r e s c o m p o s i t i o n f r o m a different a n g l e : r e l a t i n g different source schemas not b y t h e c o m p o s e d m a p p i n g s b u t b y t h e m e d i a t e d schema. T h i s causes u s t o w a n t t o e x p l o r e w h e n o u r a l g o r i t h m is p o s s i b l e . In this section, we m a k e a s t u d y of t h e m a p p i n g c o m p o s i t i o n p r o b l e m b a s e d o n [11] a n d [20]. R a t h e r t h a n o n l y f o c u s o n t h e s e t w o c a t e g o r i e s t h a t F a g i n et a l . a d o p t e d , w e t r y t o e x p l o r e different f a c t o r s t h a t c a n c a u s e t h e c o m p l e x i t y i n m a p p i n g c o m p o s i t i o n a n d c o m e u p w i t h 36 p a t t e r n s w h i c h c a n 67  Chapter 7. A Study of Mapping Composition be included into the above two m a i n categories, b u t i n smaller granularity.  7.2.1  Different Patterns of Mapping Composition  W e make a comparison w i t h Piazza mapping composition algorithm a n d F a g i n et a l ' s S e c o n d - O r d e r L o g i c m a p p i n g c o m p o s i t i o n a l g o r i t h m i n T a b l e 7.2 b a s e d o n s i x c r i t e r i a t h a t f o r m 36 m a p p i n g c o m p o s i t i o n p a t t e r n s . T h e m a i n i d e a o f t h i s s t u d y is t o see f o r a l l p o s s i b l e m a p p i n g c o m p o s i t i o n p a t t e r n s , how well o u r a p p r o a c h of relating the l o c a l schemas u s i n g t h e m e d i a t e d schema c a n perform a n d whether the complexity that are raised i n previo u s w o r k w o u l d also m a k e o u r a p p r o a c h d i f f i c u l t a n d i m p o s s i b l e i n c e r t a i n circumstances. T h e s i x c r i t e r i a for t h e m a p p i n g c o m p o s i t i o n c o m p a r i s o n s t u d y are listed as f o l l o w s .  It is p o s s i b l e t h a t t h e r e e x i s t o t h e r c r i t e r i a f o r t h e p r o b l e m o f  m a p p i n g c o m p o s i t i o n . T h e criteria below are those t h a t we t h i n k essential a n d clear. • W h e t h e r e a c h m a p p i n g is a 1 - s u b g o a l - t o - l - s u b g o a l m a p p i n g o r a m subgoal-to-n-subgoal mapping. • W h e t h e r t h e f i r s t m a p p i n g Map AM is a f u l l set o f t g d s o r n o t . • W h e t h e r t h e s e c o n d m a p p i n g Maps.c  is a f u l l set o f t g d s o r n o t .  T h i s c r i t e r i a w i t h t h e previous one are used to judge w h e t h e r m a p p i n g c o m p o s i t i o n is d e c i d a b l e f o r e a c h s p e c i f i c case, w h i c h is p r o p o s e d b y F a g i n et a l . i n d e c i d i n g t h e c o m p l e x i t y o f m a p p i n g c o m p o s i t i o n problem. • W h e t h e r the existential attributes i n the second schema B m a p to the t h i r d s c h e m a C. T h i s c r i t e r i a is c o m p l e m e n t a r y t o t h e p r e v i o u s t w o , a n d is v e r y i m p o r tant to decide whether the m a p p i n g composition will fall into N P . J u s t as w e s h o w i n T a b l e 7.2, t h e r e a r e a l a r g e set o f m a p p i n g c o m p o s i t i o n t h a t c a n b e easily processed w i t h o u t a n y c o m p l e x i t y even t h o u g h the f i r s t m a p p i n g is n o t a f u l l set of t g d s . A s i m p l e e x p l a n a t i o n f o r t h i s is 68  Chapter 7. A Study of Mapping Composition t h a t t h e existential a t t r i b u t e i n t h e second s c h e m a is t h e one t h a t h i n d e r s t h e f i r s t m a p p i n g t o b e a f u l l set o f t g d s , h o w e v e r , t h i s a t t r i b u t e do not act i n the m a p p i n g composition. T h u s such attributes w i l l not cause a n y difficulty i n c o m p u t i n g a c o m p o s e d m a p p i n g . •  W h e t h e r t h e r e e x i s t s s e l f - r e s t r i c t i v e c o m p o n e n t f o r s c h e m a B. restrictive  component  Self-  is d e f i n e d i n D e f i n i t i o n 6. S e l f - r e s t r i c t i v e c o m -  p o n e n t is a f a c t o r t h a t causes c o n f l i c t i n Concept f o r t w o m a p p i n g s . O n t h e o t h e r h a n d , s e l f - r e s t r i c t i v e c o m p o n e n t is a c a u s e o f c o m p l e x i t y i n t h e m a p p i n g c o m p o s i t i o n p r o b l e m as is s h o w n i n E x a m p l e 2 0 . D e f i n i t i o n 6 ( S e l f - R e s t r i c t i v e C o m p o n e n t ) W e s a y t h a t t h e c o m p o n e n t C\ is a s e l f - r e s t r i c t i v e c o m p o n e n t if: (1) C\ satisfies t h e f o l l o w i n g c o n d i t i o n : - C\ i s a c o m p o n e n t f r o m  CM\\  is a c o m p o n e n t f r o m  CM ;  C  2  C\ a n d C  2  a r e c o m p o n e n t s over t h e s a m e s c h e m a , a n d : I D B ( C i )  2  =  IDB(C ) 2  a n d (2) Q\ a n d Q  c o n s t r u c t e d b e l o w s a t i s f y Q\ C Q  2  - L e t name(sg)  2  C erlap 2oV  o v e r  G i  — {name(sg)  \ name(sg)  and  G sg.names{C\)  sg.names(C )}; 2  = {sg \ name(sg)  a p  —{9  I name(sg)  s  G overlapjnames  2  p a r t s o f C\ a n d C  2  body(C\)};  body(C )}. 2  that describe the overlapping  respectively:  a n d IDB(Q )  b e IDB{CMi)  2  q u i r e m e n t , is also equal t o L e t subgoals(Q\)  a n d sg G  a n d sg G  € overlap-names  - W e n o w c r e a t e n e w q u e r i e s Q\ a n d Q  L e t IDB{Qi)  2  b e t h e n a m e s o f a l l o f t h e r e l a t i o n s i n q u e r y Q;  L e t overlap.names  Let C i  Q.  b e t h e r e l a t i o n n a m e o f t h e s u b g o a l sg;  L e t sg-narnes(Q)  name(sg)  h Q\ ^  = Ci  o v e r  i  (which, b y the above re-  IDB(CM ); 2  a p  , a n d subgoals(Q ) 2  = C  2 a u e r  i  a p  ; 69  Chapter 7. A Study of Mapping Composition Let all variables in  Q\  and  Q  2  b e d i s t i n g u i s h e d . T h a t is let  = vars(subgoals(Q\)) a n d let vars(head(Q2)) =  vars[head(Q\))  vars(subgoals(Q )). 2  • • W h e t h e r t h e r e are a n y c o m p o s e d n o n - i d e n t i c a l s e l f - j o i n i n t h e m a p pings.  Composed Non-identical Self-Join Components is defined i n D e f i n i t i o n 7.  C o m p o s e d n o n - i d e n t i c a l s e l f - j o i n c o m p o n e n t s are t h e  main  c a u s e for t h e c o m p l e x i t y i n m a p p i n g c o m p o s i t i o n p r o b l e m for m a p pings w i t h self-join. F o r m a p p i n g s w i t h composed non-identical c o m p o n e n t s w i t h o u t s e l f - j o i n , t h e r e is n o s u c h c o m p l e x i t y . T h i s is b e c a u s e , e v e r y s u b g o a l i n one c o m p o n e n t c a n e i t h e r r e l a t e t o e x a c t l y one s u b g o a l i n t h e o t h e r c o m p o n e n t o r c a n n o t find a s u b g o a l i n t h e o t h e r at all.  B u t w h e n m a p p i n g s w i t h c o m p o s e d n o n - i d e n t i c a l self-join c o m -  p o n e n t s , m u l t i p l e choices w i l l o c c u r w h e n r e l a t i n g s u b g o a l s f r o m o n e component to the other. D e f i n i t i o n 7 ( C o m p o s e d Non-identical Self-Join C o m p o n e n t s ) A s s u m e there e x i s t s m a p p i n g s A.B tions.  Let  be t h e  b  aJb(b)  a n d B-C. be the  b  L e t a_6 G A-B  c o m p o n e n t of t h e i n t e r s e c t i o n  c o m p o n e n t of t h e i n t e r s e c t i o n 6_c.  n o n - i d e n t i c a l s e l f - j o i n C o m p o n e n t s if 1) s.t. r G  body(a-b(b))  i n at least one of  and r G  aJ)(b)  and  a n d 6_c G BJJ  a.b(b)  and  body(b-c(b)),  a n d 2)  r  aJb.  b-c(b)  aJ)(b) ^ b-c(b)  be intersecL e t 6_c(6)  are c o m p o s e d  but 3 a relation r  a p p e a r s at least t w i c e  b.c(b).  •  70  T a b l e 1.2: D i f f e r e n t P a t t e r n s f o r M a p p i n g C o m p o s i t i o n Second-  1-to-l or  MapAM  MapB.c  e x i s t e n t i a l self-  composed  num  m-to-n  fuU  full  attr  non-  Order  identical  Logic  B  in  restrictive  map  toC  self-join  / / / /  X  X  X  X  /  / /  X X  7  1:1  X  8  1:1  X  / / / /  9  1:1  X  X  X  10  1:1  X  X  X  /  11  1:1  X  X  X  12  1:1  X  X  / /  /  13  m:n  X  /  15  m:n m:n  / /  X  16 17  m:n  / / / / / /  X  m:n  / / / /  X  14  / / / / / /  / / / / / / / / / / / /  X  X  1  1:1  2  1:1  3  1:1  4  1:1  / / / /  5  1:1  X  6  1:1  X  18  m:n  / / X X  X X  / X  /  /  -  X  /  Piazza  MePSys  Pattern  algo  / / / / / /  V X  / X  / X  /  notel notel V  X  /  X  notel notel  X  / / / / / /  / / / X  / / / / / / / / / / / / V note2  X  /  X  note2  /  /  X  note2  Pattern  1-to-l or  MapAM  MapB.c  e x i s t e n t i a l self-  num  m-to-n  full  full  attr  B  in  restrictive  map  to C  Piazza  MePSys  Second-  non-  Order  identical  Logic algo  self-join  20  m:n  /  X  1 1  /  /  /  21  m:n  X  /  X  X  X  /  /  •  22  m:n  X  •  X  X_  /  X  note2  23  m:n  X  /  X  /  X  / /  X  /  24  m:n  X  /  X  /  /  /  X  noie2  25  m:n  X  /  /  X  X  not-el  /•  /  X  note2  19  -a to  composed  m:n  •  X  /  X  /  X  /  X  note2  26  m:n  X  •  /  X  /  27  m:n  X  /  /  /  X  28  m:n  X  /  /  /  /  note! notel not el  .29  m:n  X  X  X  X  X  /  /  /  30  m:n  X  X  X  X  /  /  X  noie2  31  m:n  X  X  X  /  X  /  X  /  32  m:n  X  X  X  /  •  /  X  noie2  33  m:n  X  X  /  X  X  noiel  /  /  /  notel  X  note2  • note2  X  •  X  note2  34  m:n  X  X  /  X  35  m:n  X  X  /  /  X  notel  X  36  m:n  X  X  /  /  /  notel  X  Chapter 7. A Study of Mapping Composition notel:  1.  L o g i c a l l y s p e a k i n g , t h e P i a z z a m e t h o d is n o t r i g o r o u s e n o u g h  t o h a n d l e s u c h cases. F o r e x a m p l e , j u s t as w h a t w e h a v e e x p l a i n e d i n E x a m p l e 2 , c o n s t r a i n t o n y\ is lost i n t h e c o m p o s e d m a p p i n g . W e c a l l t h e case t h a t c o n s t r a i n t s p e c i f i e d i n t h e o r i g i n a l m a p p i n g s b u t lost i n the composed m a p p i n g a  note2:  2.  constraint loss case.  R e s u l t s are logically correct, b u t are very c o m p l i c a t e d , n o t  m e a n i n g f u l a n d easy t o u n d e r s t a n d . T a k e E x a m p l e 21 as a n e x a m p l e , w h i c h is t h e s a m e as E x a m p l e 2 b u t a l l t h e v a r i a b l e s a r e r e n a m e d . E x a m p l e 21 Use the Second-Order Logic M a p p i n g Composition Algorithm to process the following m a p p i n g s .  M_: A  B  Vx  \fy(a(x,y)  -> 3 x i o ( x , x i ) , 6 ( x i , y ) )  M_: B  C  V x V x i V x Vy(b(x,xi),b(xi,x ),b(x ,y) 2  2  2  -> c(x,y))  U s i n g the Second-Order Logic M a p p i n g C o m p o s i t i o n A l g o r i t h m , the res u l t is a set o f eight v e r y c o m p l i c a t e d l o g i c e x p r e s s i o n . D e t a i l s a n d r e s u l t s of t h i s e x a m p l e a r e s h o w n i n A p p e n d i x B . C o m p a r e d t o t h e r e s u l t s g i v e n b y P i a z z a i n E x a m p l e 18, t h i s r e s u l t is m u c h t o o c o m p l i c a t e d a n d h a r d t o read.  7.2.2  •  An Analysis of Mapping Composition Patterns  F r o m T a b l e 7.2, t h e f o l l o w i n g r u l e s c a n b e c o n c l u d e d . 1. W h e t h e r P i a z z a m e t h o d is e x p r e s s i v e o r n o t d e p e n d s e n t i r e l y o n w h e t h e r existential attributes i n the second schema are m a p p e d to the t h i r d schema. 2. T h e S e c o n d - O r d e r L o g i c M a p p i n g C o m p o s i t i o n a l g o r i t h m c a n h a n d l e cases w i t h c o m p o s e d n o n - i d e n t i c a l s e l f - j o i n c o m p o n e n t s . H o w e v e r , t h e results d o not contain any semantic information, o n l y logically correct. T h e r e p r e s e n t a t i o n is c o m p l i c a t e d a n d h a r d t o u n d e r s t a n d . 73  Chapter 7. A Study of Mapping Composition 3. F o r 1:1 m a p p i n g s , w h e t h e r M e P S y s c a n h a n d l e t h e p a t t e r n d e p e n d s o n w h e t h e r s e l f - r e s t r i c t i v e c o m p o n e n t e x i s t s . If s e l f - r e s t r i c t i v e c o m p o n e n t s e x i s t , c o n f l i c t s for t h e s a m e c o n c e p t w i l l e x i s t i n t h e m a p p i n g s so t h a t n o r e l a t i o n i n t h e m e d i a t e d s c h e m a for a c o m m o n c o n c e p t c a n be created. 4. F o r m : n m a p p i n g s , w h e t h e r M e P S y s c a n h a n d l e t h e p a t t e r n d e p e n d s o n whether composed non-identical self-join c o m p o n e n t s exist i n the p a t t e r n . F o r n o w , M e P S y s h a s yet t o r e a l i z e t h e m e d i a t i o n of s c h e m a s if m a p p i n g s contain composed non-identical self-join components. W e put this into our future work. 5. F o r t h e cases w h e n  MapA_B  is a f u l l set o f t g d s , m a p p i n g c o m p o s i t i o n  c a n s u c c e e d u s i n g a n y of t h e t h r e e a l g o r i t h m . 6. F o r t h e cases w h e n Map AM  is a n o t a f u l l set of t g d s , it is n o t a l -  w a y s t h e case t h a t m a p p i n g c o m p o s i t i o n w i l l b e u n d e c i d a b l e . It  will  depend on whether existential attributes in schema B can be m a p p e d t o s c h e m a C. A n a l y z e t h e t h r e e a p p r o a c h e s , e s p e c i a l l y M e P S y s , u s i n g t h e r e s u l t s of T a b l e 7.2. T h e t w o m a p p i n g c o m p o s i t i o n a l g o r i t h m s , P i a z z a a p p r o a c h [20] a n d t h e S e c o n d - O r d e r L o g i c a l g o r i t h m [11], c a n h a n d l e a l l o f t h e p a t t e r n s , t h o u g h i n s o m e cases, t h e i r c o m p o s e d m a p p i n g s are n o t r i g o r o u s e n o u g h or not comprehensible enough. M e P S y s w i l l fail to handle m a p p i n g c o m p o s i t i o n whenever self-restrictive components or c o m p o s e d n o n - i d e n t i c a l self-join components exist. However, self-restrictive components w i l l cause a conflict i n t h e m a p p i n g s t o h a v e t h e s a m e Concept,  a n d M e P S y s w i l l o n l y merge  s c h e m a s w h e n m a p p i n g s e x p r e s s t h e s a m e Concept.  Thus, mappings with  self-restrictive c o m p o n e n t are out of the consideration of M e P S y s . A s i d e f r o m these t w o s p e c i a l g r o u p s of m a p p i n g c o m p o s i t i o n , t h e a p p r o a c h of u s i n g P S M a l g o r i t h m t o b u i l d a m e d i a t e d s c h e m a a n d u s i n g t h e m e d i a t e d s c h e m a t o r e l a t e different s o u r c e s is d e c i d a b l e . Comparatively, whenever Second-Order L o g i c algorithm has trouble in p r e s e n t i n g t h e r e s u l t s , o u r a p p r o a c h also h a s t r o u b l e of t h a t p a t t e r n .  This 74  Chapter 7. A Study of Mapping Composition is b e c a u s e a l l s u c h p a t t e r n s c o n t a i n s e l f - j o i n s . I n o u r c u r r e n t a p p r o a c h , selfj o i n c o m p o n e n t s are n o t c o n s i d e r e d for t h e i n p u t m a p p i n g s . W e h a v e yet t o s t u d y a n d a n a l y z e w h e n m a p p i n g s w i t h self-join c o m p o n e n t s c a n represent t h e s a m e Concept  and when cannot.  T h i s w i l l be our future work.  On  t h e o t h e r h a n d , w h e n P i a z z a a p p r o a c h is w e a k i n p r e s e n t a t i o n , it is n o t a l w a y s t h e case t h a t t h e M e P S y s w i l l h a v e t r o u b l e . T h i s is b e c a u s e , P i a z z a a p p r o a c h is w e a k i n p r e s e n t a t i o n w h e n e x i s t e n t i a l a t t r i b u t e s i n t h e s e c o n d s c h e m a m a p to the t h i r d one w h i l e our a p p r o a c h transfers the  mapping  c o m p o s i t i o n p r o b l e m t o t h e one t h a t h o w e a c h l o c a l s c h e m a r e l a t e s t o t h e mediated schema.  75  Chapter 8  System Implementation and Experiment In previous chapters, we discussed M e P S y s , a P D M S s u p p o r t i n g a mediated s c h e m a t o h e l p t r a n s l a t e q u e r i e s . W e also p r e s e n t e d o u r P S M a l g o r i t h m o f b u i l d i n g a mediated schema, creating m a p p i n g s t o local sources a n d transl a t i n g queries a m o n g different peers. In this chapter, we s t u d y t h e perform a n c e o f M e P S y s i n t h e case of c r e a t i n g a m e d i a t e d s c h e m a a n d m a p p i n g s , updating the mediated schema and mappings, a n d query reformulation.  8.1 We  System Implementation and Setup c h o o s e F r e e P a s t r y [2] as a p r o v i d e r o f n e t w o r k l a y e r t o o u r M e P S y s .  F r e e P a s t r y is a g e n e r i c , s c a l a b l e s u b s t r a t e f o r P 2 P a p p l i c a t i o n s . It uses effic i e n t r o u t i n g s t r a t e g y , a n d e a c h n o d e m a i n t a i n s a r o u t i n g t a b l e w h i c h keeps t r a c k o f i t s i m m e d i a t e n e i g h b o r s . F r e e P a s t r y also p r o v i d e s t h e f u n c t i o n a l i t y of n o t i f y i n g e a c h p e e r a p p l i c a t i o n o f m e s s a g e a r r i v a l , n e i g h b o r i n g n o d e f a i l ures, etc. T h e expected n u m b e r of forwarding steps i n F r e e P a s t r y overlay n e t w o r k is 0(logN). All  W e use F r e e P a s t r y v e r s i o n 1.4.4 f o r t h e n e t w o r k l a y e r .  i m p l e m e n t a t i o n is w r i t t e n i n J a v a , w h i c h m a k e s i t c r o s s - p l a t f o r m . We  s e t u p o u r e x p e r i m e n t i n E m u l a b [1], a n e t w o r k e m u l a t i o n t e s t b e d , t o  get t h e r e a l i s t i c P 2 P e n v i r o n m e n t f o r o u r s y s t e m . E m u l a b p r o v i d e s t h e a b i l i t y t o access a set o f different m a c h i n e s t o e m u l a t e n o d e s i n a r e a l n e t w o r k . N e t w o r k b a n d s a n d m e s s a g e d e l a y s c a n also b e s e t . D i f f e r e n t n u m b e r s o f p e e r n o d e s (12, 16, 20 a n d 24) h a v e b e e n u s e d f o r t h e p e r f o r m a n c e o f s c h e m a m e d i a t i o n a n d query reformulation i n M e P S y s . I n t o t a l , we get 40 machines f r o m E m u l a b , i n c l u d i n g 15 u s e d f o r n e t w o r k c o n t r o l a n d 1 f o r e x p e r i m e n t c o n t r o l , l e a v i n g u s w i t h a m a x i m u m of 24 d i f f e r e n t p e e r s .  Each Emulab 76  Chapter 8. System-Implementation and Experiment m a c h i n e t h a t we get p r o v i d e s 9 0 0 M m e m o r y w i t h 2992.787 M H z p r o c e s s o r . T h e q u a l i t y of s u c h m a c h i n e s is s i m i l a r t o t h a t of a p e e r i n a P 2 P n e t w o r k s i n c e e a c h p e e r needs t o act as b o t h a server a n d a c l i e n t . W e c h o o s e 7 0 m s d e l a y for each message a n d 5 0 M b a n d w i d t h environment.  1  to simulate a real  network  A l t h o u g h E m u l a b w a s o n l y a b l e t o p r o v i d e 24 peers for o u r  e x p e r i m e n t , w h i c h is less t h a n t h e h u n d r e d s o r t h o u s a n d s of p e e r s t h a t m a y e x i s t i n a P 2 P n e t w o r k , we b e l i e v e t h i s t o b e a r e a s o n a b l e n u m b e r of p e e r s t o test for a P D M S . I n a P D M S , t h e s y s t e m is r e l y i n g o n t h e  translation  of s e m a n t i c s , a n d w h i l e o u r s y s t e m does i m p r o v e t h e s e m a n t i c s of q u e r y t r a n s l a t i o n t h r o u g h t h e i n t r o d u c t i o n of C o n c e p t ( C h a p t e r 4 ) , a t t e m p t i n g t o s e m a n t i c a l l y c o m p o s e q u e r y r e w r i t e s across h u n d r e d s of p e e r s w i t h h e t e r o geneous s c h e m a s w i l l n o t r e s u l t i n a s e m a n t i c a l l y m e a n i n g f u l r e s u l t .  8.2  Input Schemas and Mappings  W e b a s e d t h e r e l a t i o n a l s c h e m a s at each p e e r o n t h e s c h e m a s i n T P C - H  [4],  w h e r e e a c h s c h e m a . c o n t a i n s 8 r e l a t i o n s , e a c h w i t h a n average of 8 a t t r i b u t e s . W e use t h i s i n f o r m a t i o n t o a u t o m a t i c a l l y g e n e r a t e a r e l a t i o n a l s c h e m a as a l o c a l d a t a b a s e s c h e m a for each p e e r i n M e P S y s , m a k i n g o u r p e e r s c h e m a as r e a l i s t i c as p o s s i b l e .  W e f u r t h e r g e n e r a t e m a p p i n g s b e t w e e n p a i r s of  a c q u a i n t e d n o d e s ; as w e w i l l see, t h e m a p p i n g s v a r i e d f r o m e x p e r i m e n t  to  e x p e r i m e n t , b u t we c o n t r o l l e d (1) t h e average n u m b e r of a c q u a i n t a n c e s p e r p e e r (2) t h e average n u m b e r of r e l a t i o n s p e r p e e r s c h e m a (3) t h e average n u m b e r of a t t r i b u t e s i n a r e l a t i o n .  8.3  Experiment 1: Schema Mediation  T h e m a i n p u r p o s e of t h i s e x p e r i m e n t is t o see t h e p e r f o r m a n c e o f t h e a l gorithms  described in C h a p t e r 5 and to compare the performance  when  d i f f e r e n t n u m b e r s of peers are i n v o l v e d i n t h e s y s t e m s e t u p p h a s e . The bandwidth is not an important factor in MePSys. We ran the same experiment using I M bandwidth, and no significant difference in time occurred in the result. lr  77  Chapter 8. System Implementation and Experiment I n T h e o r e m 1, w e gave a n u p p e r b o u n d o f t i m e f o r s y s t e m s e t u p p h a s e . T h e t i m e of s c h e m a m e d i a t i o n i n the P D M S  is d e c i d e d b y t h e n u m b e r  of h o p s w h e n t h e l a s t n o d e i n t h e n e t w o r k r e c e i v e d a l l n o d e i n f o r m a t i o n :  + shortestPath(F, E)}}, w h e r e A is t h e  m&XE{maxF{shortestPath(A,F)  m e d i a t i o n s t a r t i n g node, E a n d F represent a l l nodes i n the network.  Fig-  u r e 8.1 s h o w s t h e r e s u l t o f m e d i a t i n g 12, 16, 2 0 a n d 24 p e e r s ' i n f o r m a t i o n respectively.  W h e n p e e r n u m b e r is f i x e d , t h e t i m e o f c o m p l e t i n g a m e d i -  a t e d s c h e m a is l i n e a r a n d p r o p o r t i o n a l t o t h e n u m b e r o f h o p s . T h e t i m e o f e a c h h o p is d e c i d e d b y t w o f a c t o r s : d e l a y t i m e f o r e a c h m e s s a g e a n d m e s sage t r a n s f e r r i n g t i m e . I n o u r s c e n a r i o , t h e t i m e o f t r a n s f e r r i n g a m e d i a t e d schema cannot be ignored.  T h e t i m e for transferring a m e d i a t e d s c h e m a  is h i g h l y r e l a t e d t o t h e size o f t h e m e d i a t e d s c h e m a . W h e n t h e n u m b e r o f peers gets l a r g e r , t h e m e d i a t e d s c h e m a also i n c r e a s e s , r e s u l t i n g t h e t i m e o f e a c h h o p t o i n c r e a s e as w e l l . T h u s t h e t i m e o f e a c h h o p f o r m e d i a t i n g 12, 16, 20 a n d 24 peers a r e different. O n t h e o t h e r h a n d , w h e n t h e n u m b e r o f p e e r s i n a n e t w o r k is f i x e d , t h e t i m e for e a c h h o p is s i m i l a r , b e c a u s e t h e messages t h e n o d e s a r e p a s s i n g e a c h t i m e a r e o f s i m i l a r size. C o n s i d e r i n g t h e case o f 24 p e e r s w i t h 20 h o p s , t h e t o t a l t i m e of m e d i a t i n g t h e 24 p e e r s c h e m a s o n l y costs 31 s e c o n d s , w h i c h is q u i t e a s a t i s f a c t o r y t i m e cost f o r t h e s y s t e m s e t u p phase.  8.4  Experiment 2: Query Reformulation  I n t h i s e x p e r i m e n t , w e test t h e p e r f o r m a n c e o f o u r q u e r y r e f o r m u l a t i o n a l g o r i t h m a n d compare the local c o m p u t a t i o n time w i t h the whole query reformulation time i n the network.  Since the query reformulation  algorithm  is v e r y fast, f o r e a c h q u e r y o n e a c h p e e r , t h e q u e r y is r e f o r m u l a t e d 10 t i m e s , a n d t h i s q u e r y r e f o r m u l a t i o n is r e p o r t e d as l o c a l c o m p u t i n g t i m e . T h i s w a y , t h o u g h the time was artificially inflated, it w o u l d not be subject to t i m i n g errors caused b y h a v i n g to measure very s m a l l n u m b e r s . T i m e o f q u e r y r e f o r m u l a t i o n a n d b r o a d c a s t i n g i n t h e w h o l e n e t w o r k is p r o p o r t i o n a l t o t h e t o p o l o g y d e p t h w h e n e a c h q u e r y m e s s a g e size is s i m i l a r . W e d e f i n e Topology  Depth  as t h e m a x i m u m of t h e s h o r t e s t p a t h f o r a l l  78  Chapter 8. System Implementation and Experiment  Tim e ot b u il d i n g a m ed iated schemaV.S.Numb er of Hops - » - 12peers —•-IBpeers  20peers — « - 2 4 peers  31.38  „ « - M ."55  g  23.03  21.54  19.69 20  -29M5  17:67  — — •  •—~"*Tfl2  15.16 *-tfT7l  —  ^—-—"MTU  '  18.38  15.20  * 1 3 70  S.BJ  1  12  I  14  1  16  1  18  20  Num of Hops  F i g u r e 8.1: T i m e o f b u i l d i n g a m e d i a t e d s c h e m a V . S . N u m b e r o f H o p s  n o d e s f r o m t h e n o d e w h e r e q u e r y is i n i t i a l l y p o s e d .  TopologyDepth = max{shortestPath(A,  E)}  w h e r e A is t h e n o d e w h e r e q u e r y is p o s e d , E is a l l o t h e r n o d e s i n t h e n e t w o r k . T o p o l o g y d e p t h takes a n i m p o r t a n t role i n d e c i d i n g t h e t o t a l t i m e for q u e r y r e f o r m u l a t i o n i n t h e n e t w o r k . T h i s is b e c a u s e q u e r i e s w i l l b e sent t o a l l o t h e r a c q u a i n t a n c e s e x c e p t t h e message s o u r c e , a n d e a c h n o d e w i l l n o t p r o c e s s t h e s a m e q u e r y t w i c e . W h a t ' s m o r e , i n m o s t cases, q u e r y messages are s m a l l . F i g u r e 8.2 s h o w s t h e t i m e r e s u l t s o f c o m p l e t i n g t h e q u e r y r e f o r m u lation a n d broadcasting i n the network. W e posed a query w i t h 2 subgoals on one peer schema. F o r each topology d e p t h f r o m 4 to 9, we r u n e x p e r i m e n t o n 1 2 , 1 6 , 20 a n d 24 p e e r s r e s p e c t i v e l y . R e s u l t s s h o w t h a t f o r q u e r i e s ( w i t h r e l a t i v e l y s m a l l size u n d e r l k ) sent t o t o p o l o g i e s o f t h e s a m e t o p o l o g y d e p t h , t h e r e is a l m o s t n o t i m e difference t o r e f o r m u l a t e t h e q u e r y a m o n g t h e f o u r different sets. O n t h e o t h e r h a n d , r e s u l t s i n F i g u r e 8.2 s h o w t h a t t h e t i m e of q u e r y r e f o r m u l a t i o n a n d b r o a d c a s t i n g is e x a c t l y p r o p o r t i o n a l t o  79  Chapter 8. System Implementation and Experiment  Time of Query Reformulation and Broadcasting V.S.Topology Depth | —•—12 peers  16 peers - * - 2 0 p e e r s  •  24peers|  2.5 • 2 • rime (s)  ^ ,. •-  ^  ^  ^  ^  *r"-^"^**^  0.5 • 0 -  — i  4  1  5  i  i  6  7  i  8  i  9  Topology Depih  F i g u r e 8.2: T i m e of Q u e r y R e f o r m u l a t i o n a n d B r o a d c a s t i n g V . S . D e p t h o f Topology  t h e t o p o l o g y d e p t h of t h e n e t w o r k . W e f u r t h e r c o m p a r e d t h e q u a l i t y of t h e q u e r y r e f o r m u l a t i o n  algorithm  w h e n different q u e r i e s are p o s e d . W e f i x e d t h e n e t w o r k t o p o l o g y a n d p o s e d t e n different q u e r i e s t o a s o u r c e p e e r . E a c h q u e r y c o n t a i n s u b g o a l s r a n g i n g f r o m 1 to 10. F o r s o m e peer w h e n q u e r i e s c a n n o t b e r e f o r m u l a t e d , t h e l o c a l reformulation time w i l l be very little.  T h u s for each query, we c o m p a r e  the total reformulation and broadcasting time w i t h the m a x i m u m local peer c o m p u t i n g t i m e for 10 t i m e s q u e r y r e f o r m u l a t i o n . F i g u r e 8.3 s h o w s t h a t as t h e n u m b e r o f s u b g o a l g r o w s , t h e t i m e s p e n t o n l o c a l r e f o r m u l a t i o n t i m e also i n c r e a s e s . T h i s is as a n t i c i p a t e d s i n c e o u r a l g o r i t h m reformulates queries based o n each subgoal. A n o t h e r f i n d i n g f r o m F i g u r e 8.3 is t h a t w h e n t h e s u b g o a l n u m b e r i n c r e a s e s , t h e t i m e for c o m p l e t i n g t h e q u e r y i n t h e w h o l e n e t w o r k a l s o i n c r e a s e s . T h i s is b e c a u s e w h e n t h e s u b g o a l n u m b e r i n c r e a s e s , t h e size of t h e message c o n t a i n i n g t h a t q u e r y also i n c r e a s e s , w h i c h causes t h e t i m e of e a c h h o p t o i n c r e a s e as w e l l . H o w e v e r , w h e n processing queries w i t h no more t h a n 3 subgoals, the  reformulation  a n d b r o a d c a s t i n g s u c h q u e r i e s is s t i l l close t o c o n s t a n t . F r o m F i g u r e 8.3, w e  80  Chapter 8. System Implementation and Experiment c a n c o n c l u d e t h a t e v e n for 10 t i m e s of q u e r y r e f o r m u l a t i o n , m a x i m u m l o c a l r e f o r m u l a t i o n t i m e is a l w a y s less t h a n 3 % of t h e t o t a l n e t w o r k t i m e .  Tim* of Query Reformulation and Broadcasting in the network V.S. Max Local Computing Time (10 times query reformulation) B Time of Max Local Query Reformulation for 10 times • Query Reformulation in the Network  F i g u r e 8.3: T i m e of Q u e r y R e f o r m u l a t i o n i n t h e n e t w o r k V . S . M a x L o c a l C o m p u t i n g T i m e for 10 t i m e s Q u e r y R e f o r m u l a t i o n  8.5  Experiment 3: Updating the Mediated Schema  N e x t , we t e s t e d t h e p e r f o r m a n c e of u p d a t i n g t h e m e d i a t e d s c h e m a i n t h e s t a b l e s t a t e , i.e., after t h e s y s t e m s e t u p p e r i o d . T h i s e x p e r i m e n t is a f o l l o w u p e x p e r i m e n t of t h e f i r s t e x p e r i m e n t , S c h e m a M e d i a t i o n ( S e c t i o n 8.3), a n d is d e s i g n e d as f o l l o w s :  A s s u m e a l l peers m a i n t a i n t h e m o s t u p d a t e d m e -  diated schema a n d G L A V m a p p i n g s to local schema. A new peer Z t h e n e t w o r k a n d c r e a t e s a m a p p i n g t o one of t h e p e e r , s a y A,  joins  which trig-  gers a n e w s c h e m a m e d i a t i o n p r o c e s s . A f t e r t h a t , A b r o a d c a s t s t h e n e w mediated schema to all other peers, a n d t h e n all peers, u p o n receiving an  81  Chapter 8. System Implementation and Experiment u p d a t i n g message, u p d a t e their locally m a i n t a i n e d mediated schema a n d other  information.  T e n different e x p e r i m e n t s , r e p r e s e n t i n g t e n different t o p o l o g i e s , a r e exec u t e d i n o r d e r t o g e t t h e f o l l o w i n g t i m e for each s e t t i n g . 1. t\ = t i m e of i n i t i a l s y s t e m s e t u p p h a s e 2. ti = t i m e o f c o m p u t i n g t h e n e w M a t p e e r A 3. £3 = average t i m e o f u p d a t i n g M a t e a c h p e e r e x c e p t A 4. £4 = t o t a l n e t w o r k t i m e for u p d a t i n g M i n t h e w h o l e n e t w o r k after Z joins the network T i m e t\ is d i r e c t l y , o b t a i n e d f r o m t h e s t a t i s t i c s i n E x p e r i m e n t 1. T h u s , i t is p o s s i b l e t o c o m p a r e t h e t i m e of u p d a t i n g t h e m e d i a t e d s c h e m a w i t h t h e initial system setup time for a fixed topology. " T o p o l o g y D e p t h " , w h i c h is d e f i n e d i n E x p e r i m e n t 2 , is u s e d t o n o t e d o w n t h e topology d e p t h for each experiment.  S i n c e after u p d a t i n g t h e  m e d i a t e d s c h e m a a t A, t h e s a m e u p d a t i n g message w i l l o n l y b e p r o c e s s e d at e a c h p e e r o n c e , t h e t i m e f o r t h e w h o l e u p d a t i n g p r o c e s s t o c o m p l e t e depends o n the topology depth. T a b l e 8.1: U p d a t i n g t h e M e d i a t e d S c h e m a for A d d i n g a N e w Peer <2(ms) t ( m s ) t2/t4 Topology *i (ms) t4(ms) Experiment Depth 5868.13 0.0099 1.21% 16 peers 12 hops 4 .10733.86 71.33 0.97% 0.0105 7325.73 13147.71 71.10 16 peers 14 hops 5 11477.08 0.46% 0.0099 15542.58 69.16 16 peers 16 hops 7 12766.28 0.56% 17117.51 71.49 0.0105 16 peers 18 hops 8 0.0106 13075.01 0.54% 18376.80 71.38 16 peers 20 hops 9 3  20 20 20 20 20  peers peers peers peers peers  12 hops 14 hops 16 hops 18 hops 20 hops  4 5 6 8 7  15160.07 17672.95 19693.76 21542.09 23028.56  91.50 90.57 91.01 87.99 90.27  0.01 0.0103 0.0105 0.009 0.009  5297.89 6616.83 7921.49 9591.46 7940.53  1.73% 1.37% 1.15% 0.92% 1.14%  T a b l e 8.1 s h o w s t h e r e s u l t s o f different t i m e w . r . t . t h e t e n t o p o l o g i e s . T h e c o m p u t a t i o n o f a n e w m e d i a t e d s c h e m a a t p e e r A, t\, is a l w a y s a r o u n d 82  Chapter 8. System Implementation and Experiment 70 m s f o r 16 p e e r s , a n d 9 0 m s f o r 20 p e e r s , less t h a n 2 % of £4, t h e t o t a l t i m e for u p d a t i n g a m e d i a t e d s c h e m a i n t h e n e t w o r k . T h e t i m e of u p d a t i n g l o c a l m e d i a t e d i n f o r m a t i o n i n a l l o t h e r p e e r s , £3, is a l w a y s a b o u t 0.01 m s , w h i c h c a n b e ignored c o m p a r e d t o the t i m e spent for t h e network.  83  Chapter 9  Conclusion and Future  Work  I n t h i s t h e s i s , we p r e s e n t e d M e P S y s , a r e l a t i o n a l p e e r d a t a m a n a g e m e n t . s y s t e m . O u r k e y c o n t r i b u t i o n s are: • W e presented a m e c h a n i s m to s u p p o r t a mediated s c h e m a in P D M S w h i c h a l l o w s easier access t o m o r e i n f o r m a t i o n . • W e d e f i n e d w h a t a c o n c e p t m e a n s i n t h e case of c o n j u n c t i v e m a p p i n g and  h o w it i m p a c t s t h e u n d e r s t a n d a b i l i t y of t h e m a p p i n g a n d m e d i a t e d  schema. • W e i n t r o d u c e d a construct M a p p i n g T a b l e w h i c h helps us to transform unstructured m a p p i n g information to structured forms and w h i c h is  easier t o c o n s t r u c t m a p p i n g s f r o m t h e m e d i a t e d s c h e m a t o l o c a l  sources. • W e also p r o v i d e d a n a p p r o a c h t o c r e a t e m e d i a t e d s c h e m a i n and  PDMS  get m a p p i n g s t o l o c a l s o u r c e s .  - • W e further explored how to u p d a t e the mediated s c h e m a i n the steady state. • W e f i n a l l y i m p l e m e n t e d t h e s y s t e m a n d s h o w e d t h e e x p e r i m e n t a l results. I n f u t u r e w o r k , we w i l l e x t e n d t h i s w o r k t o a m o r e g e n e r i c m e t h o d . F i r s t , r e c o n s i d e r i n g t h e d e f i n i t i o n of Concept,  we w a n t t o f i g u r e o u t h o w t o d i s -  t i n g u i s h m a p p i n g s w i t h t h e s a m e s e m a n t i c i n f o r m a t i o n w h i l e u s i n g different I D B n a m e s . W e also w o u l d l i k e t o e x p l o r e t h e s e m a n t i c issues w h e n a b r o a d e r r a n g e of m a p p i n g s are c o n s i d e r e d , e.g., m a p p i n g s w i t h s e l f - j o i n s .  Second,  m o r e o p t i m i z a t i o n issues c a n b e c o n s i d e r e d i n t h e f u t u r e s y s t e m s i n c e i n  84  Chapter 9. Conclusion and Future Work c u r r e n t M e P S y s , e a c h m e d i a t e d s c h e m a message p a s s e d a m o n g p e e r s is s t i l l l a r g e . T h e r e are m a n y p o s s i b l e w a y s t o decrease t h e size of t h e messages b y sharing information  between different constructs.  G o o d optimization  can  s i g n i f i c a n t l y decrease t h e s c h e m a m e d i a t i o n t i m e i n t h e s y s t e m s e t u p p h a s e . T h i r d , we w o u l d l i k e t o e x p l o r e s o m e b e t t e r a p p r o a c h e s for u p d a t i n g  the  m e d i a t e d s c h e m a w h e n l o c a l s c h e m a evolves.  85  Bibliography [1] E m u l a b .  https://www.emulab.net/.  [2] F r e e p a s t r y . [3] N a p s t e r ,  http://www.freepastry.org/. http://www.napster.com/.  [4] T p c - h . h t t p : / / w w w . t p c . o r g / t p c h / . [5] M . A r e n a s , V . K a n t e r e , A . K e m e n t s i e t s i d i s , I. K i r i n g a , R . J . M i l l e r , a n d J. Mylopoulos. coordination.  T h e hyperion project: F r o m data integration to d a t a  SIGMOD Record,  2003.  [6] C . B a t i n i , M . L e n z e r i n i , a n d S . B . N a v a t h e . of m e t h o d o l o g i e s  A comparative  for database schema integration.  analysis  ACM Computing  Surveys, 1 8 : 3 2 3 - 3 6 4 , 1986. [7] P . B e r n s t e i n , F . G i u n c h i g l i a , A . K e m e n t s i e t s i d i s , J . M y l o p o u l o s , L . S e r a f i n i , a n d I. Z a i h r a y e u . D a t a m a n a g e m e n t f o r p e e r - t o - p e e r  computing:  A v i s i o n . I n WebDB, 2002. [8] A . B o n i f a t i , E . Q . C h a n g , T . H o , L . V . S . L a k s h m a n a n , a n d R . P o t t i n g e r . Heptox: M a r r y i n g x m l a n d heterogeneity i n your p 2 p databases (system  d e m o s t r a t i o n ) . I n VLDB, 2005. [9] E l a i n e Q . C h a n g . S c h e m a m a p p i n g a n d q u e r y t r a n s l a t i o n i n h e t e r o g e neous peer-to-peer x m l databases. Technical r e p o r t , T h e U n i v e r s i t y of B r i t i s h C o l u m b i a , 2005. M a s t e r t h e s i s . [10]  R . Fagin, P. K o l a i t i s , R . M i l l e r , and L . P o p a . D a t a exchange: Semantics a n d q u e r y a n s w e r i n g . I n ICDT, 2 0 0 3 .  86  Chapter 9. Conclusion and Future Work [11]  R. Fagin, P . G . Kolaitis, L. Popa, and W . Tan.  Composing schema  m a p p i n g s : S e c o n d - o r d e r d e p e n d e n c i e s t o t h e r e s c u e . I n PODS, 2004. [12] M . F r i e d m a n , A . L e v y , a n d T . M i l l s t e i n .  N a v i g a t i o n a l plans for d a t a  i n t e g r a t i o n . I n the 16th Nat. Conf. on Artificial Intelligence (AAAI99), pages 67 - 7 3 , 1999. [13]  S t e v e n G r i b b l e , A l o n H a l e v y , Z a c h a r y Ives, M a y a R o d r i g , a n d D a n S u c i u . W h a t can databases do for peer-to-peer.  [14]  In  WebDB,  2001.  A . Y . H a l e v y , Z . G . Ives, D . S u c i u , a n d I. T a t a r i n o v . S c h e m a m e d i a t i o n i n p e e r d a t a m a n a g e m e n t s y s t e m s . I n ICDE, 2 0 0 3 .  [15]  A l o n Y . H a l e v y , Z a c h a r y G . Ives, P e t e r M o r k ,  a n d Igor  Tatarinov.  P i a z z a : D a t a m a n a g e m e n t i n f r a s t r u c t u r e for s e m a n t i c w e b a p p l i c a t i o n s . I n WWW, [16]  2003.  H.V. Jagadish, Beng C h i n Ooi, and Q u a n g Hieu V u . Baton: A balanced t r e e s t r u c t u r e f o r p e e r - t o - p e e r n e t w o r k s . I n VLDB, 2 0 0 5 .  [17]  A . Kementsietsidis, M . Arenas, a n dR. J . Miller. peer-to-peer systems: Semantics and algorithmic  Mapping data in  issues. I n  SIGMOD,  2003. [18]  Phokion  G . Kolaitis.  mappings,  2004.  Data  exchange  Presentation  http://www.scs.carleton.ca/"  a n d composition  slides g i v e n  at Carleton  of schema University.  diis/arise/Presentations/PhokionKolaitis  .ppt. [19]  Maurizio Lenzerini.  D a t a integration:  A theoretical perspective.  In  PODS, p a g e s 2 3 3 - 2 4 6 , 2002. [20]  J . M a d h a v a n a n d A . Y . Halevy.  Composing mappings  among  data  sources. I n VLDB, 2003. [21] W . S . N g , B . C . O o i , K . T a n , a n d A . Z h o u .  Peerdb:  A p2p-based  s y s t e m f o r d i s t r i b u t e d d a t a s h a r i n g . I n ICDE, 2 0 0 3 .  87  Chapter 9. Conclusion and Future Work [22] R . P o t t i n g e r .  Processing  Queries  and Merging Schemas in Support of  P h D t h e s i s , 2004. P h D t h e s i s .  Data Integration.  [23] E . R a h m a n d P . A . B e r n s t e i n . schema matching.  VLDB  A survey of approaches t o a u t o m a t i c  J o u r n a l , 10(4) : 3 3 4 - 3 5 0 , D e c 2 0 0 1 .  [24] P . R o d r i g u e z - G i a n o l l i , M . G a r z e t t i ,  L. Jiang,  A . Kementsietsidis,  I. K i r i n g a , M . M a s u d , R . M i l l e r , a n d J . M y l o p o u l o s . i n t h e h y p e r i o n p e e r d a t a b a s e s y s t e m . I n ICDE, [25]  Data  sharing  2005.  S . P . S h a h , Y . H u a n g , T. X u , M . M S Y u e n , J . L i n g , a n d B . F . O u e l lette. A t l a s - a d a t a warehouse for integrative bioinformatics. I n Bioinformatics,  BMC  2005.  [26] Igor T a t a r i n o v a n d A l o n H a l e v y . E f f i c i e n t q u e r y r e f o r m u l a t i o n i n p e e r d a t a m a n a g e m e n t s y s t e m s . I n SIGMOD, [27] J e f f r e y D . U l l m a n . Principle  2004.  of Database and Knowledge-Base  Systems.  1989. [28] J e f f r e y D . U l l m a n . ICDT,  Information integration using logical views.  In  pages 1 9 - 4 0 , 1997.  88  Appendix A  Details for Example 15 E x a m p l e 15: A n e x a m p l e of a n e w p e e r j o i n i n g t h e n e t w o r k T h e r e are t h r e e peers i n t h e c u r r e n t s y s t e m : A, B a n d C. M a p p i n g s Map AM and  Mapsx) a r e c r e a t e d as f o l l o w s . W e o n l y c o n s i d e r t h e r e l a t i o n s t h a t a r e  c r e a t e d b y m a p p i n g s for t h e m e d i a t e d s c h e m a a n d o m i t r e l a t i o n s t h a t d o not participate i n t h e m a p p i n g s because relations t h a t represent a  Concept  a m o n g different peers a r e t h e core p a r t of o u r d i s c u s s i o n . M a p p i n g MapA_B'flight(date, flightNo, price) :- A.flight(date, flightNo, price) flight(date, flightNo, price) :B_flight(date, company,flightNo,service), B_price(flightNo, class, price) M a p p i n g Maps.c'flight (date, flightNo,. class, price) :B.flight(date, company, flightNo, service), B_price(flightNo, class, price) flight(date, flightNo, class, price) :C-Schedule(ID, date, flightNo), C-price(ID, class, price, discount) S h o w n i n i n F i g u r e 6.1(a), after t h e s e t u p p h a s e , a m e d i a t e d s c h e m a MS\  has been b u i l t a n d m a i n t a i n e d at each peer. T h e m a p p i n g s f r o m  MS\  t o A, B a n d C a n d t h e M a p p i n g T a b l e w i l l also b e b u i l t . M e d i a t e d schema MS\ = { flight(date, flightNo, price, company, service, class, I D , discount)  }  89  Appendix A. Details for Example 15 MappingTable  flight: 1  2  3  4  5  6  IM -flight  1  2  3  4  5  6  Ailight  1  2  3  B_flight  1  3  2  4  4  5  2  4  flight  MS  V  AB  B.price  1  IM -flight BC  B_flight  1  3  2  3  1  3  2  , 3  1  B_price C-Schedule  C_price  Mapping LV  7  8  7  8  2 6 2  3  1 -  2  1  3  Map Si^\'M  = {  M S i - ^ l ^ d a t e , flightNo, price) :M 5 i . f l i g h t ( d a t e , flightNo, price, company, service, class, I D , discount) }  "'  GV = {  MSi-j4i(date, }  price) :- A A _ n i g h t ( d a t e ,  flightNo,  price)  '"  Mapping  LV  flightNo,  MCLPMSLB'-  = {  M 5 i _ 5 ( d a t e , company, flightNo, service, class, price) :M 5 i . f l i g h t ( d a t e , flightNo, price, company, service, class, I D , discount) 1  }  "'  GV = {  MS\.Sedate, company, flightNo, service, class, price) :B.B_flight(date, company, flightNo, service), B.B_price(flightNo, class, price) . }  "'  90  Appendix  A. Details for Example 15  Mapping Map .cLV = { M 5 i - C ( I D , date, flightNo, class, price, discount) :MS\. flight (date, flightNo, price, company, service, class, I D , discount) MSl  1  }  '"  GV = { MS\-Ci(lD, date, flightNo, class, price, discount) :C . C - s c h e d u l e ( I D , date, flightNo), C C _ p r i c e ( I D , class, price, discount) } "' A f t e r b u i l d i n g t h e m e d i a t e d s c h e m a MS\  and the mappings to local  s o u r c e s , p e e r B also b u i l t a d a t a b a s e a p p l i c a t i o n APPB  over MS\.  So the  i n f o r m a t i o n s t o r e d at each p e e r is as f o l l o w s . - P e e r A : s c h e m a A , MapAM,  MS\, Map Si^A, M  - P e e r B : s c h e m a B, MapAM, MapB_c, MS\,  MT\. MapMS^B,  MT\,  APPB(MSI). - P e e r C : s c h e m a C, MapB.c,  MSi, Map Si.c,  A n e w peer D joins i n the network.  M  MT . Y  It creates a m a p p i n g t o p e e r  C,  shown i n F i g u r e 6.1(b).  Mapping Mapcjjflight(date, flightNo) :- Cj3chedule(ID, date, flightNo), C_price(ID, class, price, discount) flight(date, flightNo) :D_schedule(date, flightNo, depart, arrival, numLeft) U s i n g a l g o r i t h m s d e s c r i b e d i n C h a p t e r 5, a n e w m e d i a t e d s c h e m a M a p p i n g T a b l e MT  2  MS2,  and G L A V mappings can be computed.  M e d i a t e d schema MS = { flight(date, flightNo, price, company, service, class, I D , discount, depart, arrival, numLeft) 2  } 91  Appendix  A.  Details for Example  15  M a p p i n g T a b l e flight: A f S i .flight  1  2  3  4  5  6  7  8  .flight  1  2  3  4  5  6  7  8  1  2  3  4  5  6  AJlight  1  2  3  BJlight  1  3  2  4  4  5  7  8  2  4  MSi  -flight  I MAB  1  3  IM -flight  1  2  3  B.flight  1  3  B .price BC  1  B-price  6  1  3  1  2  3  4  Cschedule  2  3  1  1  2  2  C-price  2  C-price  Mapping  5  6  7.  3  .4  5  1  / M e n .flight  T w o mappings w i l l be created  11  2  3  3  D_schedule  10  2  2  C_schedule  9  1  3  first.  Map s .MSxM  2  LV = { <3i(date, flightNo, price, company, service, class, I D , discount) MS .flight (date, flightNo, price, company, service, class, I D , discount, depart, arrival, numLeft) 2  }  "'  GV = { Q i ( d a t e , flightNo, price, company, service, class, I D , discount) MS\. flight (date, flightNo, price, company, service, class, I D , discount) }  "'  92  Appendix  A.  Details for Example  15  M a p p i n g MapMS -D'LV = { M S 2 - D i ( d a t e , flightNo, depart, arrival, numLeft) :MS2-flight (date, flightNo, price, company, service, class, I D , discount, depart, arrival, numLeft) 2  }  '"  GV = { MS2-D1 (date, flightNo, depart, arrival, numLeft) :£>.D_schedule(date, flightNo, depart, arrival, numLeft) }  '"  C p a s s e d MS2, MT  a n d MapMS -D 2  2  D stores is: MS2,  MapMS -D>  MT ,  s c h e m a D,  2  2  to D. S o t h e i n f o r m a t i o n t h a t p e e r Mapcjj-  C f u r t h e r c o m p u t e s t h e G L A V m a p p i n g f r o m MS2 t o C u s i n g and  MapMS .MSj. 2  MapMSi.c-  Mapping MapMS -CLV = { MiS 2-C (ID, date, flightNo.class, price, discount) :MS2-flight (date, flightNo, price, company, service, class, I D , discount, depart, arrival, numLeft) 2  ,  1  }  "'  GV = { MS2-C1 (ID, date, flightNo,class, price, discount) :C.C-schedule(ID, date, flightNo), C.C_price(ID, class, price, discount) }  "'  ;  C w i l l keep t h e following i n f o r m a t i o n : C , Mapc.D,  2  M  2  MT , 2  MapMS -C 2  schema  MapB.c-  C passes MS , Map s iMSi  MS2,  and  MT , 2  Map :s -MS M  2  l  t o B. B c o m p u t e s MapMS2M  using  Map suM  93  Appendix A. Details for Example 15 Mapping  MapMS .B2  LV = { M £ 2 - 6 1 (date, company, flightNo, service, class, price) :M52-flight(date, flightNo, price, company, service, class, I D , discount, depart, arrival, numLeft) }  '"  GV = {  .  •  M 5 2 - 5 i ( d a t e , company, flightNo, service, class, price) :B.B_flight(date, company, flightNo, service), B.B_price(flightNo, class, price)  } S i n c e B h a s a p p l i c a t i o n APPB MS  t o MS\  MSi,  MS2,  2  s t i l l u s i n g MS\,  MS\  and the m a p p i n g  n e e d t o b e k e p t as w e l l . S o t h e i n f o r m a t i o n t h a t B s t o r e d is MapMS .MSi,  MapMS JB, 2  2  B w i l l p a s s MS2,  MT2,  U s i n g MapMS MS 2  2  MapMS .MS\  to  2  a n d MapMS\.A,  x  MT ,  s c h e m a B, Maps.c,  Map  AM-  A.  a mapping from mediated schema  to local schema A can be computed.  MS  2  Mapping  Map . 'MS2 A  LV = { M S ^ - ^ i t d a t e , flightNo, price) :M52-flight(date, flightNo, price, company, service, class, I D , discount, depart, arrival, numLeft) }  '"  GV = { MS2-Ai(d&te, flightNo, price) :A A _ f l i g h t ( d a t e , flightNo, price) }  A  Map  '"  w i l l keep the following  information:  MS2,  MT , 2  MapMS2.A  and  AM-  T h u s , every peer c a n keep the u p d a t e d m e d i a t e d s c h e m a a n d m a i n t a i n a m a p p i n g from the mediated schema to local schema.  Additionally,  all  t h e p r e v i o u s l y c r e a t e d d a t a b a s e a p p l i c a t i o n s over p r e v i o u s v e r s i o n s of t h e mediated schema can still be used. 94  Appendix B  Details for E x a m p l e 2 1 Use  the S e c o n d - O r d e r L o g i c M a p p i n g C o m p o s i t i o n A l g o r i t h m to process the  following mappings. M_: A  B  Vx  Vy(a(x,y)  -»  3xib(x,xi),b(x ,y)) x  M .c-  '  B  Vx Wxi V x Vy(b{x,x ),b(x ,x ),b(x ,y) 2  A  1  1  2  -> c(x,y))  2  d e t a i l e d p r o c e s s is as f o l l o w s u s i n g t h e S e c o n d - O r d e r L o g i c M a p p i n g  Composition Algorithm.  Step zero (change variable  names):  M_: A  B  Vxyy(a(x,y)  ->  3xib{x,x ),b{xi,y)) x  M .cB  Vs  V s V s Vt{b(s,s ),b( ,S2),b(s2,t) r  2  x  Sl  - » c(s,t))  Step one (split): M _B: A  3 / ( V x Vy(a(a;,y) -  6(x J ( x , y ) ) , 6 ( / ( z , y ) , y ) )  a ( x , y ) - > b{x,f{x,y)),  a(x,y)  -> b(f(x,y),y)  }  Step two (compose): M .cB  95  Appendix B. Details for Example 21 {b(s, s i ) , 6 ( s i ,  s ),b{s ,t)c(s,t)) 2  2  =M {a(x,y),b(si,s ),b{s ,t)  A {x = s ) A ( / ( x , y ) = S i ) - > c ( s , * ) ) ;  (a(x,y),6(s ,s ),fj(s2,i)  A (f(x,y)  2  1  2  2  = s) A (y = s i ) - » c(s,t))  }  ( a ( x , y ) , a ( x ' , y ' ) , K 2 , £ ) A (x = s) A ( / ( x , y ) = S i ) A (a;' = S i ) s  A (/(x',y') = s )-> c(s,t)); 2  (a{x,y),a(x',y'),b(s ,t) 2  A (x = s) A (f(x,y)  = s ) A (/(x',y') = a  s i ) A (y' = s ) -> c ( s , t ) ) ; 2  ( a ( x , y ) , a ( x ' y ) . ^ ( s 2 , i ) A ( / ( x , y ) = s ) A (y = S j ) A (x' = s i ) /  )  A (/(x',y') = s 2 ) - + c ( s , i ) ) ; ( a ( a ; , y ) , o ( a ; , y ' ) , r j ( s 2 , t ) A ( / ( 2 : y ) = s) A (y = si) A ( x ' = s i ) A /  (f(x',y')  !  = s ) -> c(M)); 2  } A (a: = s) A (f(x,y)  (a(x,y),a(a; ,y'),a(a;",y") /  A ( / ( x ' , y ' ) = s ) A (x" = s ) A (f(x",y")= 2  2  = si) A (x' = s ) x  t) -> c ( , £ ) ) ; S  (0(1, y ) , a ( x ' , y ' ) , a ( x " , y " ) A (x = s) A ( / ( x , y) = ) A (f{x', y ) = 1  8l  si) A (y' = s ) A (x" = s ) (f(x",y") 2  2  = t) -> c(s,t));  A ( / ( x , y ) = s) A (y = s ) A ( x ' = S i )  {a{x,y),a(x',y'),a{x",y")  :  A ( / ( x ' , y ' ) = s ) A ( X " = s ) A (f(x",y") 2  2  = t ) -> c ( , t ) ) ; S  ( a ( x , y ) , a ( x ' , y ' ) , a ( x " , y " ) A ( / ( x , y ) = s) A (y = s i ) A ( x ' = si) A ( / ( x ' , y ' ) = s ) ( x " = s ) A ( / ( x " , y " ) = t) -> c ( s , i ) ) ; 2  2  ( o ( x , y ) , a ( x ' , y ' ) , a ( x " , y " ) A (x = s ) A ( / ( x , y ) = si)  A ( / ( x ' , y ' ) = s ) A (f(x",y") 2  {a{x,y),a(x',y'),a(x",y")  = s ) A (y" = i) 2  A (x' =  s) a  c(s,t));  A (x — s) A ( / ( x , y ) = s i ) A ( / ( x ' , y ' ) =  si) A (y' = s ) A (f(x",y")  = s ) A (y" = t) -> c ( s , t ) ) ;  2  2  ( a ( x , y ) , a ( x ' , y ' ) , a ( x " , y " ) A ( / ( x , y ) = s ) A (y = si) A ( x ' = S i )  • A ( / ( x ' , y ' ) = s ) A (f(x",y") 2  (a(x,y),a(x',y'),a(x",y")  2  A ( / ( x , y ) = s) A (y = s i ) A ( x ' = S i )  A ( / ( x ' , y ' ) = s ) A (f(x",y") 2  = s ) A ( y " — t) A c(s,t)); = s ) (y" = t) A c(s,t)) 2  96 1  Appendix B. Details for Example 21  }  Step three (add quantifiers): ^{ 3 / V x V y V x ' V y' V x" V y " V s V S, V s V r- ( a ( x , y),0(0;', y ' ) , a ( x " , y " ) 2  A (x = s) A ( / ( x , y ) = si) A (x' = s i ) A (f(x',y') (x" =  = s )A 2  s )A(/(x",y")=i)-cM)); 2  3 / V x V y Vx' V y ' Vx" V y " V 4r V s V s V t (a(x, y ) , a ( x ' , y ' ) , a(x", y") x  A (x = s) A (f(x,y)  2  = s i ) A (f(x',y') = s ) A ( y ' = s ) A 2  x  (s" = s ) A ( / ( s " , y " ) = 0 - * c ( M ) ) ; 2  3 / V x V y V x ' V y ' V x " V y " V s V s i A (/(x,y)  Vs Vi(a(x,y),a(x',y'),a(x",y") 2  = s ) A (y = S j ) A ( x ' = s j ) A ( / ( x ' , y ' ) = s ) A 2  (x" = s ) A ( / ( * " , y " ) = t ) - » c ( 2  S >  t));  3 / V x V y Vx'Vy'Vx"Vy"VsVsi Vs Vi 2  (o(s,y),o(s ,y') a(x" y") /  >  A ( / ( x , y ) = «) A (y = s i ) A ( x ' = si) A ( / ( x ' , y ' ) ( x " = s ) A (/(a:"-,y") = t ) - » c ( 2  B |  3fVx\/yVx'Vy'Vx"Vy"VsVsi A (x = s) A ( / ( x , y ) (/(x",y") =  )  = s ) A 2  t));  Vs Vt 2  (a(x,y),a(x',y'),a(x",y")  = Si) A (x' = si) A (/(x',y')  = s ) 2  ) A(y" = i)^c( ,i));  S 2  S  3 / V x V y V x' V y ' V x " V y" V s V si V s V t ( a ( x , y ) , a ( x ' , y'),a{x",y") 2  A (x = s) A ( / ( x , y ) = (/(x",y") = s ) A ( y " 2  S l  ) A ( / ( x ' , y ' ) = si) A ( y ' = s ) A 2  = i)->c(M));  3/VxVyVx'Vy'Vx"Vy"VsV'si Vs A  Vi(a(x,y),a(x' y'),a(x",y")  2  !  {f{x,y) = s) A (y = s i ) A ( x ' = s ) A ( / ( x ' , y ' ) = s ) A :  (/(x",y") = s ) A ( y " 2  = t)-»c(M));  3 / V x V y V x V y ' V x " V y " V s V s i Vs /  2  A ( / ( x , y ) = s ) A (y = s ) A ( x ' = ) x  (f(x",y") = s )A(y" 2  2  =  Sl  Vi(a(x,y),a(x',y'),a(x",y")  A (f{x',y')  = s )A 2  t)^c( ,t)); S  } T h e w h o l e s e t i n step t h r e e is t h e r e s u l t o f t h e c o m p o s e d m a p p i n g  MapA_c o f Map AM  and  MapB.c-  C o m p a r e d t o the results given b y P i -  a z z a i n E x a m p l e 18, t h i s r e s u l t is m u c h t o o c o m p l i c a t e d a n d h a r d t o r e a d . 97  

Cite

Citation Scheme:

        

Citations by CSL (citeproc-js)

Usage Statistics

Share

Embed

Customize your widget with the following options, then copy and paste the code below into the HTML of your page to embed this item in your website.
                        
                            <div id="ubcOpenCollectionsWidgetDisplay">
                            <script id="ubcOpenCollectionsWidget"
                            src="{[{embed.src}]}"
                            data-item="{[{embed.item}]}"
                            data-collection="{[{embed.collection}]}"
                            data-metadata="{[{embed.showMetadata}]}"
                            data-width="{[{embed.width}]}"
                            async >
                            </script>
                            </div>
                        
                    
IIIF logo Our image viewer uses the IIIF 2.0 standard. To load this item in other compatible viewers, use this url:
http://iiif.library.ubc.ca/presentation/dsp.831.1-0051733/manifest

Comment

Related Items