UBC Theses and Dissertations

UBC Theses Logo

UBC Theses and Dissertations

Schema mediation and query processing in Peer Data Management Systems Zhao, Jie 2006

Your browser doesn't seem to have a PDF viewer, please download the PDF to view this item.

Notice for Google Chrome users:
If you are having trouble viewing or searching the PDF with Google Chrome, please download it here instead.

Item Metadata

Download

Media
831-ubc_2006-0732.pdf [ 5MB ]
Metadata
JSON: 831-1.0051733.json
JSON-LD: 831-1.0051733-ld.json
RDF/XML (Pretty): 831-1.0051733-rdf.xml
RDF/JSON: 831-1.0051733-rdf.json
Turtle: 831-1.0051733-turtle.txt
N-Triples: 831-1.0051733-rdf-ntriples.txt
Original Record: 831-1.0051733-source.json
Full Text
831-1.0051733-fulltext.txt
Citation
831-1.0051733.ris

Full Text

Schema Mediation and Query Processing in Peer Data Management Systems by Jie Zhao ' B .Sc , Fudan University, 2003 A THESIS S U B M I T T E D IN P A R T I A L F U L F I L M E N T O F T H E R E Q U I R E M E N T S F O R T H E D E G R E E O F Master of Science in The Faculty of Graduate Studies (Computer Science) The University Of British Columbia October 2006 © Jie Zhao, 2006 Abstract P 2 P D a t a M a n a g e m e n t S y s t e m s ( P D M S s ) a l low the eff icient s h a r i n g of d a t a be tween peers w i t h ove r l app ing sources of i n f o r m a t i o n . T h e s e sources share d a t a t h r o u g h semant i c m a p p i n g s be tween peers. In cur ren t sys tems , quer ies are asked over each peer 's l oca l s chema a n d t h e n t r ans la ted u s i n g the se-m a n t i c m a p p i n g s between peers. In th is thesis we p ropose tha t a m e d i a t e d schema c a n benef i t P D M S s b y a l l ow ing access to more d a t a a n d b y m a k i n g t h a t access comprehens ib le . W e present our s ys tem - M e P S y s , w h i c h uses the m e d i a t e d s c h e m a as a m e d i a for que ry t r a n s l a t i o n i n re l a t i ona l P D M S s . W e show how to create a m e d i a t e d s c h e m a i n P D M S s a u t o m a t i c a l l y u s i n g the semant ics of the ex i s t i ng m a p p i n g s p r o v i d e d to t rans la te quer ies. W e fu r the r d iscuss how to u p d a t e the m e d i a t e d schema i n a s tab le s ta te , i.e., after the sys tem setup p e r i o d . i i Table of Contents Abstract " . . i i Table of Contents iii List of Tables vi List of Figures vii Acknowledgements ix 1 Introduction 1 1.1 Overview 1 1.2 A Motivating Example of PDMS , 3 1.3 Challenges and Contributions 5 2 Preliminaries 10 2.1 Definitions w.r.t. Mediated schema creation 10 2.1.1 Datalog 10 2.1.2 Mediated Schema, L A V , GAV and G L A V 10 2.1.3 Query and Mapping 13 2.2 Definitions w.r.t. PDMS 14 2.2.1 Query Reformulation and Result Reformulation . . . . 14 2.2.2 Mapping Composition 15 3 Related Work 16 3.1 Peer Data Management Systems . . . .• 16 3.2 Mediated Schema Creation 18 iii Table of Contents 4 Introducing Concept into Conjunctive Mappings 19 4.1 M o t i v a t i o n 19 4.2 De f i n i t i ons a n d A n a l y s i s 21 5 Creating a Mediated Schema in P D M S 25 5.1 P o t t i n g e r ' s S c h e m a M e d i a t i o n A l g o r i t h m for D I S 25 5.2 P r o b l e m D e f i n i t i o n 27 5.3 M a p p i n g T a b l e C r e a t i o n 34 5.3.1 In tu i t i ons 34 5.3.2 A l g o r i t h m 35 5.3.3 Sec t i on S u m m a r y 36 5.4 M a p p i n g C o m p a t i b l e C h e c k 37 5.5 Pee r S c h e m a M e d i a t i o n 39 5.5.1 S y s t e m S e t u p P h a s e - S c h e m a M e d i a t i o n 40 5.5.2 Sec t i on S u m m a r y 49 6 Updating the Mediated Schema 50 6.1 A d d i n g a N e w Peer to the S y s t e m 50 6.1.1 A l g o r i t h m 53 6.2 D r o p p i n g a Pee r f r o m the S y s t e m 54 6.2.1 Sec t i on S u m m a r y 61 6.3 E v o l u t i o n of a Peer L o c a l S c h e m a 61 7 A Study of Mapping Composition 63 7.1 C o m p l e x i t y : W h e r e D i f f i cu l t ies C o m e F r o m 63 7.2 E x p l o r i n g Poss ib le P a t t e r n s i n M a p p i n g C o m p o s i t i o n 66 7.2.1 Di f ferent P a t t e r n s of M a p p i n g C o m p o s i t i o n 68 7.2.2 A n A n a l y s i s of M a p p i n g C o m p o s i t i o n P a t t e r n s . . . . 73 8 System Implementation and Experiment 76 8.1 S y s t e m I m p l e m e n t a t i o n a n d S e t u p 76 8.2 I npu t Schemas a n d M a p p i n g s 77 8.3 E x p e r i m e n t 1: S c h e m a M e d i a t i o n 77 8.4 E x p e r i m e n t 2: Q u e r y R e f o r m u l a t i o n 78 iv Table of Contents 8.5 E x p e r i m e n t 3: U p d a t i n g the M e d i a t e d S c h e m a 81 9 Conclusion and Future Work 84 Bibliography 86 A Details for Example 15 . 89 B Details for Example 21 95 List of Tables 7.1 A S u m m a r y of M a p p i n g C o m p o s i t i o n C o m p l e x i t y f r o m [11, 18] 67 7.2 Di f ferent P a t t e r n s for M a p p i n g C o m p o s i t i o n 71 8.1 U p d a t i n g the M e d i a t e d Schema. fo r A d d i n g a N e w Pee r . . . 82 i v i List of Figures 1.1 Q u e r y P r o c e s s i n g i n t r a d i t i o n a l P D M S 4 5.1 T h e C r e a t e M e d i a t e d S c h e m a A l g o r i t h m f r o m [22] 26 5.2 T h e C r e a t e M a p p i n g A l g o r i t h m f r o m [22] 27 5.3 Q u e r y R e w r i t i n g A l g o r i t h m f r o m [22] 28 5.4 Q u e r y P r o c e s s i n g i n M e P S y s 29 5.5 M a p p i n g T a b l e for E x a m p l e 6 . . . 34 5.6 M a p p i n g T a b l e for E x a m p l e 12(a) 35 5.7 M a p p i n g T a b l e for E x a m p l e 12(b) . . 35 5.8 C r e a t e M a p p i n g T a b l e A l g o r i t h m 37 5.9 M e r g e M a p p i n g T a b l e A l g o r i t h m '. 38 5.10 U p d a t e M a p p i n g T a b l e A l g o r i t h m 39 5.11 M a p p i n g C o m p a t i b l e C h e c k A l g o r i t h m 40 5.12 S c h e m a M e d i a t i o n S t a r t - u p 41 5.13 S y s t e m S e t u p A l g o r i t h m 43 5.14 M e r g e T w o M e d i a t e d S c h e m a A l g o r i t h m . 45 5.15 C o m p u t e G L A V M a p p i n g A l g o r i t h m 46 5.16 Q u e r y R e f o r m u l a t i o n A l g o r i t h m 48 6.1 A d d i n g a N e w Pee r to the S y s t e m 51 6.2 U p d a t e m e d i a t e d i n f o r m a t i o n w h e n new peer j o ins the ne twork 53 6.3 Pee r L e a v i n g ( B a d T o p o l o g y ) 57 6.4 U p d a t e m e d i a t e d i n f o r m a t i o n w h e n a peer leaves the ne twork 58 6.5 E x a m p l e 17 I npu t I n f o rma t i on (a) 59 6.6 E x a m p l e 17 I npu t I n f o r m a t i o n (b) ; . . . 59 6.7 E x a m p l e 17 I npu t I n f o r m a t i o n (c) . 60 v i i List of Figures 6.8 E x a m p l e 17 O u t p u t (a) 61 6.9 E x a m p l e 17 O u t p u t (b) 61 6.10 E x a m p l e 17 O u t p u t (c) 62 8.1 T i m e of b u i l d i n g a m e d i a t e d s c h e m a V . S . N u m b e r of H o p s . . 79 8.2 T i m e of Q u e r y R e f o r m u l a t i o n a n d B r o a d c a s t i n g V . S . D e p t h of T o p o l o g y 80 8.3 T i m e of Q u e r y R e f o r m u l a t i o n i n the ne twork V . S . M a x L o c a l C o m p u t i n g T i m e for 10 t imes Q u e r y R e f o r m u l a t i o n 81 Acknowledgements I w o u l d l ike to t h a n k a l l of peop le g i v i n g me s u p p o r t a n d he lp d u r i n g m y s t u d y at U B C . W i t h o u t you r s u p p o r t , th is thesis canno t be done . F i r s t of a l l , I w o u l d l i ke to t h a n k m y superv i so r , R a c h e l P o t t i n g e r , for a lways g i v i n g me s u p p o r t , encouragement a n d excel lent gu idance t h r o u g h o u t m y thesis work . I spent a w o n d e r f u l t i m e w o r k i n g w i t h R a c h e l for t he pas t one a n d ha l f years. I w o u l d l ike to t h a n k George T s i k n i s for d e d i c a t i n g t ime a n d effort i n r ev i ew ing m y thesis a n d g i v i n g me va luab le c o m m e n t s a n d adv ise . I w o u l d l ike to t h a n k a l l m y f r iends i n the depa r tmen t a n d i n the da tabase lab for you r con t inuous he lp , a n d espec ia l l y the fo l l ow ing peop le : Shao feng B u , G a n g P e n g , X u n S u n , S h u a n W a n g , W e n d y W a n g , J i a n X u , C h a o Y a n , S u l i n g Y a n g a n d X i a o d o n g Z h o u . L a s t , I w o u l d l ike to t h a n k m y paren ts a n d A l e x L i u for the i r endless love a n d s u p p o r t . Jie Zhao October 2006 i x Chapter 1 Introduction 1.1 Overview A p e e r - t o p e e r ( P 2 P ) ne twork is a ne twork i n w h i c h every p a r t i c i p a t i n g node i n the ne twork p rov ides power , b a n d w i d t h a n d o ther resources ra the r t h a n on l y r e l y i n g o n a s m a l l n u m b e r of servers. I t is a decen t ra l i zed a n d d is -t r i b u t e d s y s t e m . C o m p a r e d to a c l ient -server a rch i tec tu re whe re t he n u m b e r of servers are f i xed , P 2 P is more f lex ib le a n d ex tens ib le because w h e n new nodes a r r i ve , the t o ta l capac i t y of the sys tem increases acco rd i ng l y wh i l e i n c l ient -server env i ronmen t , the a d d i n g of new cl ients means the s l ow ing d o w n of the w h o l e sys tem. P 2 P sys tems are ve ry usefu l a n d famous nowa-days a m o n g users w h o want to share files w h i l e users are everywhere across the w o r l d . F o r e x a m p l e , N a p s t e r [3] is a successfu l s ys tem for mus i c s h a r i n g us ing P 2 P in f ras t ruc tu re . A Pee r D a t a M a n a g e m e n t S y s t e m ( P D M S ) (e.g. [7, 8, 14, 24]) is the resu l t of b l e n d i n g t he benef i ts o f peer - to -peer ( P 2 P ) ne tworks , s u c h as lack of a cen t ra l i zed au thor i t y , w i t h the r icher semant ics of a da tabase . Pee r D a t a M a n a g e m e n t S y s t e m ( P D M S ) is a D a t a M a n a g e m e n t S y s t e m u s i n g P 2 P a rch i tec tu re . E a c h peer i n the sys tem ho lds a da tabase . It c a n be ex tens ive ly used for d a t a exchang ing , que ry answer ing , i n f o r m a t i o n s h a r i n g , etc. I n the areas of sc ient i f ic research, the i dea of se t t i ng u p a P D M S for research i n the re la ted a rea to share d a t a a m o n g peers has a l ready been w i d e l y d iscussed a n d a d m i t t e d . A P D M S , as a P 2 P s y s t e m itself, keeps the p roper t ies of a l l P 2 P sys tems: E v e r y peer m a y j o i n a n d leave the ne twork at a n y t ime . A l l peers are a u t o n o m o u s l y c rea ted a n d m a n a g e d . However , A P D M S also requ i res t ha t each peer , u p o n en te r ing the ne twork , pub l i shes i ts da tabase schema so tha t 1 Chapter 1. Introduction i t c a n be seen b y o ther peers i n the ne twork , a n d the en te r ing peer mus t create a m a p p i n g be tween i tse l f a n d one or more ne ighbors . W e ca l l peers be tween w h i c h there ex is ts a d i rec t m a p p i n g acquaintances. T h e s e in te r -peer m a p p i n g s a l low que ry t r a n s l a t i o n schemes [8, 24, 26] w h i c h a l low users to easi ly, seman t i ca l l y que ry d a t a f r o m sources t ha t are no t the i r o w n . P D M S s are a d a t a managemen t a rch i tec tu re where no t a l l nodes in the sys tem need to be there cons tan t ly . P D M S s are o f ten c o m p a r e d w i t h a D a t a In tegra t ion S y s t e m (D IS) w h i c h uses a c l ient -server a rch i tec tu re . A m a j o r di f ferences be tween a D I S a n d P D M S is the cen t ra l i zed con t ro l . In a t r a d i t i o n a l D I S , there is a lways one or more cen t ra l servers to store a g loba l v i ew for a l l c l ien ts ' da tabase schemas. Quer ies are t y p i c a l l y posed over the g loba l s chema a n d t h e n t r ans la ted to loca l schemas. A l l the que ry p rocess ing wo rk such as que ry t r a n s l a t i o n , que ry f o rwa rd ing , rece iv ing answers , a n d t r a n s l a t i n g answers en t i re ly rel ies o n those servers. T h u s , i f the servers b reak d o w n , or become unava i l ab le , the who le sys tem w i l l fa i l . C o n s i d e r i n g the w o r k l o a d of the who le s y s t e m , the cen t ra l i zed con t ro l de f in i te ly resu l ts i n a severe bo t t l eneck . O n the o ther s ide, a P D M S is a d y n a m i c a n d loose ly -coup led env i ronmen t . T h e r e are no cen t ra l servers i n the P D M S . E v e r y peer is b o t h a server a n d a c l ient . E v e n t h o u g h , some nodes m igh t take more respons ib i l i t i es , e.g., super -peer -s t r u c t u r e d P 2 P s y s t e m , these nodes are s t i l l no t t a k i n g the respons ib i l i t i es of con t ro l l i ng a l l nodes , a n d the resu l t i ng a rch i tec tu re is more decen t ra l i zed . In mos t cu r ren t P D M S s , a que ry is posed over a l oca l peer schema , a n d answers w i l l be c o m p u t e d i n th is peer da tabase . T h e que ry w i l l a lso be fo rwa rded a n d re fo rmu la ted to o ther peers i n the ne twork , a n d que ry resu l ts w i l l be sent back . T h e f ina l resu l t set r e t u r n i n g to the user w i l l be answers f r o m the l oca l peer as we l l as those f r o m other peer da tabases . T h e goa l of th is thesis is to s t u d y the p roper t ies of a P D M S , c o m p a r e a n d su rvey ex i s t i ng P D M S s , a n d look for more o p p o r t u n i t i e s to imp rove the cu r ren t P D M S des ign to let the users have a be t te r u n d e r s t a n d i n g of the que ry p rocess ing resul t a n d to speed u p the que ry r e fo rmu la t i on process a m o n g peers. 2 Chapter 1. Introduction 1.2 A Motivating Example of P D M S Given the above properties of flexibility and decentralization, a PDMS is very suitable for scientific research data sharing and exchange [21]. There are many scenarios where a PDMS would be useful. In bioinformatics this has already been studied from both the data management approach [17, 21] and also from the bioinformatics approach [25]. Imagine that the Biology labs from different universities are doing research in similar areas. Some might have a partial gene sequence data of a sparrow, and some might have another partial gene sequence data of a sparrow. Researchers from either lab wants more data for further experiment. Even though their results have been published, the data still need some way to be shared by others. Thus a PDMS where each peer comes from a biology lab doing similar gene expression research can handle such a task well. In general, the types of scenarios that we can imagine that would benefit the most from a P D M S are those scenarios where people have overlapping structured data, and they want to access other sources' additional structured information. Rather than relying on readers having an understanding of the intrica-cies of biological data, we present a scenario which is familiar to readers of Computer Science area: bibliography management in Example 1. Example 1 Assume that there are two database servers of bibliography in-formation: U B C and UW, and that they have the following relations storing information about conference papers: UW.conf-paper(title, venue, year, pages) UBC.conf-paper(title, conference, year, url) As in this example, it is very likely that each database server only stores part of the record about conference papers, e.g., only U W has page infor-mation and only U B C has url. Additionally, even though they may have overlapped record for the same paper, the content might still be different. For instance,("Data Management for Peer-to-Peer Computing: A Vision", "Madison, Wisconsin, USA", "2002", 6) is one data entry in UW.conf-paper; 3 Chapter 1. Introduction ("Data Management for Peer-to-Peer Computing: A Vision","WebDB", " 2 0 0 2 " , "citeseer.ist.psu.edu/bernstein02data.html") is the data entry for the same paper in UBC.conf-paper. Given the similarity of relations above, along with the difference in available data, it would be beneficial for users of the two databases to be able to share their data on conference papers. • /Queiy/f^efbrmulation U B C liJser U B C D B Quer>';Qu»e 1 Figure 1.1: Query Processing in traditional PDMS Figure 1.1 is a picture to explain how queries are propagated and trans-lated back in the P2P systems. In most PDMSs (see figure 1.1) a user at peer U B C will ask a query posed over UBC's schema. Answers are com-puted at U B C , and the query is also reformulated and forwarded to other peers through mapping paths in the network. For example, assume U B C has a mapping MapuBCMW to UW. Using MapuBCMW, a query QUBC will be asked over U B C and then reformulated to Quw, a query over UW's schema. Quw will be processed at peer UW. U W will reformulate Quw to query over any additional acquaintances' schemas in the same way. Thus queries can be forwarded as far along the mapping path as possible, subject to system constraints. Query results will be sent back to the peer U B C after a similar series of reformulations. Thus, the user at U B C will get answers not only from local database but from other peer database as well. 4 Chapter 1. Introduction 1.3 Challenges and Contributions T h e m e t h o d s i n E x a m p l e 1 have been we l l s t u d i e d a n d a l low t r a n s l a t i o n a n d answer ing of a large class of quer ies [26]. Howeve r , w h e n the fu l l con tex t is looked at , there are a d d i t i o n a l oppo r t un i t i e s for i m p r o v i n g the usefu lness a n d c o m p r e h e n s i b i l i t y of que ry t r a n s l a t i o n i n P D M S s , w h i c h we tack le i n th is thesis. F i r s t , we get r i d of the res t r i c t i ve que ry fo rma t . In p rev ious m e t h o d s , o n l y i n f o r m a t i o n s u p p o r t e d b y the l oca l s c h e m a c a n be quer ied . F u r t h e r , a user at a peer canno t k n o w w h a t o ther i n f o r m a t i o n is ava i lab le i n the net-work . T h e user can on l y see the l oca l schema, desp i te the fact t ha t a c t u a l l y there is m u c h more i n f o r m a t i o n i n the P D M S . T h i s is c lea r l y s u b o p t i m a l ; s ince the i n f o r m a t i o n is present , users shou ld be ab le to f u l l y u t i l i ze these resources a n d not be res t r i c ted by the f o rma t of l oca l schemas. F o r examp le , i n E x a m p l e 1, users at U B C w o u l d be unab le to access the page i n f o r m a t i o n at U W because they c a n o n l y ask quer ies over, the i r o w n schema . S e c o n d , we imp rove the comprehens ib i l i t y of the P D M S . C u r r e n t sys-tems can e i ther fa i l to take in to account some expressiveness ava i lab le to the s y s t e m , or resu l t i n sys tems t h a t are d i f f i cu l t to u n d e r s t a n d , as is s h o w n in E x a m p l e 2: E x a m p l e 2 C o n s i d e r the fo l low ing m a p p i n g s , where (2.1) re lates source A to target B , a n d (2.2) re lates source B to target C . C means the le f t -hand s ide is a subset of the r i g h t - h a n d s ide. A t t r i b u t e s i n di f ferent sources t ha t c a n be m a p p e d to each other are us ing the same va r i ab le names . A. grandpa(x, y) C B.father(x, X\), B . f a t h e r ^ , y) (2.1) B. father(x, x l ) , B . f a t h e r ^ , X2), B.father(x2, y) C C.greatgrandpa(a;, y) (2.2) T h i s is i s o m o r p h i c to a m a p p i n g i n [20]. (2.1) says t ha t i f x is the g r a n d p a of y, t h e n there is some x\ such tha t x is x\s fa ther , a n d x\ is y ' s fa ther . M a p p i n g (2.2) says t ha t i f x a n d x\, x\ a n d X2, 2^ a n d X3 a l l have fa ther re la t i onsh ip , t hen x a n d y w i l l be the great g r a n d p a of y. U s i n g the 5 Chapter 1. Introduction a l g o r i t h m desc r i bed i n P i a z z a [20], c o m p o s i n g m a p p i n g s (2.1) a n d (2.2) to m a p source A to source C resu l ts i n : A.grandpa(z, X2), A.grandpa(x2, y) C C.greatgrandpa(a:, y{) (2.3) (2.3) says t ha t i f x is X2Js g r a n d p a a n d £2 is y ' s g r a n d p a , t h e n x is s o m e b o d y ' s great g r a n d p a . However , (2.3) does no t express any cons t ra in t o n y\: f r o m the g iven i n f o r m a t i o n , i t is c lear t ha t y\ is the fa ther of y a n d X2 is the fa ther of y i ; P i a z z a is no t express ive enough to express t ha t res t r i c t i on . W h i l e the m a p p i n g language used b y F a g i n et a l . [10] has enough express ive power to hand le the same examp le , the final resu l ts u s i n g [10]'s a l g o r i t h m are d i f f icu l t to c o m p r e h e n d . T h e reason for th is is t h a t [10] uses second-order log ic dependenc ies to express the i r m a p p i n g s . T h u s for a l l those m a p p i n g s w i t h unbound attributes (a t t r ibu tes t ha t appea r i n the target b u t do not appea r i n the source) i n the first-order log ic , e.g. x\ i n (2.1), [10]'s m a p p i n g language c a n use 3 a n d V to e x p l a i n the cons t ra in ts for these a t t r i bu tes . However , w h e n there are t oo m a n y cons t ra in ts i n the m a p p i n g , the readab i l i t y of the m a p p i n g i tsel f w i l l be decreased. • In ou r s y s t e m , we m a k e the sys tem more comprehens ib le b y p r o v i d i n g a comprehens ib le m e d i a t e d schema, a n d each l oca l s c h e m a w i l l be re la ted to the m e d i a t e d schema so tha t users of b o t h the l oca l s c h e m a a n d the m e d i a t e d schema c a n eas i ly u n d e r s t a n d the sys tem. T h i r d , we p rov ide more semant i c que ry p rocess ing . In t r a d i t i o n a l P 2 P file sys tems, copies of the same file o n dif ferent peers c a n be v iewed as g r o u p e d in to the same file; each m a y con ta in di f ferent pa r t s of the o r i g i na l file. C o n s i d e r the scenar io of a P D M S . D a t a of the same top i c (wh i ch is expressed as "Concept" i n th is thesis) are d ispersed a m o n g peers. W h e n d a t a are g r o u p e d acco rd ing to the top i c i n advance , i t w i l l be ve ry easy to he lp the peers exchange d a t a a n d m a k e the que ry p rocess ing fast . F o u r t h , we recons ider the di f f icu l t ies a n d c o m p l e x i t y ra i sed b y m a p -p i n g c o m p o s i t i o n , a n d see whe the r such c o m p l e x i t y c a n be avo ided i n rea l sys tems. 6 Chapter 1. Introduction Fif th , we m a k e g o o d use of i nd i rec t m a p p i n g i n f o r m a t i o n . W e ca l l the i n f o r m a t i o n f ound i n the c o m p o s e d m a p p i n g tha t is no t i n c l u d e d i n the d i rec t m a p p i n g indirect mapping information. P r e v i o u s m e t h o d s t e n d to re fo rmu-late quer ies based o n single sources of m a p p i n g i n f o r m a t i o n . However , i n rea l sys tems , i t is ve ry l i ke l y t ha t di f ferent m a p p i n g rou tes c a n comp lemen t each other for que ry re fo rmu la t i on . Q u e r y answers w i l l be i m p r o v e d a lo t i f we cons ider di f ferent m a p p i n g routes a n d ind i rec t m a p p i n g i n f o r m a t i o n . T a k i n g in to cons ide ra t i on of a l l these oppo r t un i t i e s , th is thesis on l y c o n -s iders the m a p p i n g s t ha t c a n g roup d a t a in to top ics , a n d proposes a u t o m a t -i ca l l y c rea t ing a n d m a i n t a i n i n g a m e d i a t e d schema to take advan tage of the above ins ights to imp rove func t i ona l i t y for P D M S s . F o r o ther m a p p i n g s , the same m e t h o d p roposed i n P i a z z a is a d o p t e d i n ou r sys tem. T h i s has m a n y advantages. F i r s t a n d foremost , users are able to access more i n f o r m a t i o n res id ing i n the network t h a n i n p rev ious sys tems. S e c o n d , we create the m e d i a t e d schema us ing the p re -ex i s t i ng m a p p i n g s , thus users have the f lex-i b i l i t y to use e i ther the m e d i a t e d schema or use the i r l oca l s c h e m a as they use p rev ious l y ex i s t i ng m e t h o d s , (e.g., those i n P i a z z a [26]). T h i r d , u s i n g our m e t h o d , the t r a n s l a t i o n of quer ies over the m e d i a t e d s c h e m a in to the sources is s imp le a n d effective. F i n a l l y , as w i l l be s h o w n i n the expe r imen t , ou r que ry p rocess ing m e t h o d us ing a m e d i a t e d s c h e m a is m u c h faster t h a n o ther m e t h o d s , w h i c h w o n ' t s low d o w n the who le sys tem b u t w i l l ensure a comprehens ib le P D M S . However , c rea t i ng a m e d i a t e d schema i n a P D M S se t t i ng requ i res a n -swer ing d i f f icu l t quest ions such as: - H o w c a n the m e d i a t e d schema be c rea ted w i t h o u t a cen t ra l i zed a u -tho r i t y? - H o w c a n an a u t o m a t i c a l l y c rea ted m e d i a t e d s c h e m a be comprehens ib le to users? - S ince the sys tem is d y n a m i c , the ma in tenance of a m e d i a t e d schema mus t be a u t o m a t i c a n d adap t i ve ; h o w c a n h u m a n in te rven t ion be m i n -i m i z e d ? 7 Chapter 1. Introduction - W h e r e is the m e d i a t e d schema s to red , a n d how is i t u p d a t e d ? In th is thes is , we beg in to answer the above p rob lems . W e m a k e the fo l low ing speci f ic con t r i bu t i ons : - W e descr ibe our P D M S s y s t e m , M e P S y s ( M e d i a t i o n s u p p o r t e d P e e r D a t a M a n a g e m e n t S y s t e m ) , i n w h i c h a m e d i a t e d s c h e m a c a n be cre-a ted d y n a m i c a l l y a n d any i n f o r m a t i o n i n the ne twork c a n be quer ied w i t h o u t r equ i r i ng any a d d i t i o n a l g l oba l serv ices i n the ne twork . - W e ou t l i ne our Peer S c h e m a M e d i a t i o n a l g o r i t h m ( P S M a lgo r i t hm) w h i c h is used to ef f ic ient ly create the m e d i a t e d s c h e m a represen t ing a l l schemas i n the ne twork , a n d i n t roduce M a p p i n g T a b l e s as a m e c h a n i s m for re la t i ng the m e d i a t e d schema to the sources. - W e s t u d y the semant ics of m a p p i n g s a n d i n t r oduce the i dea of au to -m a t i c a l l y de tec t ing speci f ic Concepts i n m a p p i n g s . A m e d i a t e d schema based o n Concepts is m u c h easier for users to u n d e r s t a n d . - W e s t u d y how m a p p i n g c o m p o s i t i o n i m p a c t s que ry r e f o r m u l a t i o n a n d how to d iscover c o m m o n Concepts i n c o m p o s e d m a p p i n g s to ensure comprehens ib le que ry resu l ts . - W e s t u d y how to solve the p r o b l e m of u p d a t i n g the m e d i a t e d s c h e m a i n our i n f ras t ruc tu re . - W e des ign exper imen ts to test the ef f ic iency a n d sca lab i l i t y of M e P S y s . T h i s thesis is o rgan ized as fo l lows. In C h a p t e r 2, we in t r oduce back -g r o u n d know ledge re la ted to th is thes is . W e d iscuss re la ted wo rks i n C h a p -ter 3.' I n C h a p t e r 4, we def ine a n d ana l yze the seman t i c i n f o r m a t i o n t ha t c a n be presented i n a con junc t i ve m a p p i n g s . W e present our a p p r o a c h to c rea t i ng a m e d i a t e d s c h e m a i n a P D M S a n d c rea t i ng the m a p p i n g s f r o m the m e d i a t e d s c h e m a to l o c a l sources i n C h a p t e r 5, i n c l u d i n g a l go r i t hms a n d examp les . W e fu r the r descr ibe how to u p d a t e the m e d i a t e d s c h e m a i n C h a p t e r 6. W e show our sys tem i m p l e m e n t a t i o n se tup a n d the e x p e r i m e n t a l 8 Chapter 1. Introduction results in Chapter 8. We then study the mapping composition problem that can impact the correctness and capability of query translation in the system in Chapter 7. Chapter 9 concludes and discusses future work. 9 Chapter 2 Preliminaries 2.1 Definitions w.r.t. Mediated schema creation Def in i t i ons i n th is sec t ion are re la ted to the m e d i a t e d s c h e m a c rea t ion . 2.1.1 Datalog D a t a l o g , a subset o f P r o l o g , is a q u e r y a n d r u l e based language . It is a log ic -based que ry language for the re la t i ona l m o d e l . T a k e the fo l l ow ing c lause as an examp le : Example 3 grandpa(x, z) :- father(x,y), father(y, z) • E a c h c lause i n D a t a l o g is c o m p o s e d of two par t s : head a n d b o d y , w h i c h are separa ted by " : - " . In E x a m p l e 3, grandpa(x, z) is the h e a d , a n d father(x, y ) , father(y, z) are co l lec t i ve ly the b o d y . T h e b o d y can be v iewed as cond i t i ons k n o w n , a n d the h e a d c a n b e regarded as the query. T h e above c lause c a n b e in te rp re ted as: i f x is y 's fa ther a n d y is z's fa ther , t h e n x is z's g r a n d p a . E a c h re la t i on (e.g. father) is ca l led a p red ica te . E a c h t e r m i n the p red i ca te (e.g., x and y) c a n e i ther be a va r iab le or a cons tan t . F o r safety, on l y var iab les t ha t appea r i n the b o d y c a n appea r i n the head . 2.1.2 Mediated Schema, LAV, G A V and G L A V A d a t a i n teg ra t i on sys tem is a sys tem tha t comb ines d a t a res id ing at di f-ferent sources a n d prov ides the user w i t h a un i f i ed v i ew of these d a t a . A m e d i a t e d schema , also ca l led a g loba l schema, is u s u a l l y p r o v i d e d ' i n a D a t a In tegra t ion S y s t e m . E v e r y l o c a l source has i ts o w n da tabase schema , w h i c h is ca l led the l oca l schema. A m e d i a t e d schema is a un i f i ed v i e w of a l l these 10 Chapter 2. Preliminaries l o ca l da tabases so t ha t t h r o u g h the m e d i a t e d schema , users c a n que ry over on l y the m e d i a t e d schema a n d get resu l ts f r o m a l l l o c a l sources. T h e r e are two bas ic approaches i n a d a t a in teg ra t ion s y s t e m to spec i fy m a p p i n g s be tween the m e d i a t e d schema a n d the l o c a l schemas: L o c a l - A s -V i e w ( L A V ) a n d G l o b a l - A s - V i e w ( G A V ) . In L A V , the content of each lo-c a l source s . shou ld be charac te r i zed i n t e rms of a v i e w qG over the g loba l s chema ; i n G A V , the content of each e lement g of the g l o b a l s c h e m a s h o u l d be cha rac te r i zed i n te rms of a v i ew qs over the l oca l sources [19]. E x a m p l e 4 T o e x p l a i n these de f in i t ions , cons ider E x a m p l e 1 aga in . T w o l oca l da tabases U W a n d U B C are presented w i t h the da tabase re la t ions as be low. U W da tabase : UW.conf-paper(title, venue, year, pages) U B C da tabase : UBC.conf-paper(title, conference, year, url) A m e d i a t e d schema M is m a n u a l l y c rea ted as: M.conf-paper(title, conference, venue, year) U s i n g L A V , UW.conf-paper a n d UBC.conf-paper w i l l be e x p l a i n e d as a v i e w of the m e d i a t e d schema: UW.conf-paper(title, venue, year, pages) :-M.conf-paper(title, conference, venue, year) UBC.conf-paper(title, conference, year, url) : -M.conf-paper(title, conference, venue, year) U s i n g G A V , M.conf-paper c a n be expressed as the v i e w of a l l l o ca l schemas. M.conf-paper(title, conference, venue, year) :-UW.conf-paper(title, venue, year, pages) M.conf-paper(title, conference, venue, year) :-UBC.conf-paper(title, conference, year, url) 11 Chapter 2. Preliminaries • G L A V , f i r s t l y i n t r o d u c e d i n [12], is a more genera l i zed a p p r o a c h to spec i fy m a p p i n g s between sources. G L A V m a p p i n g language adop ts b o t h G A V a n d L A V i n i ts p resen ta t i on , w h i c h a l lows a f lex ib le s c h e m a de f in i t i on a n d comb ines the express ive power of b o t h G A V a n d L A V . E x a m p l e 5 shows an e x a m p l e of G L A V m a p p i n g re la t i ng di f ferent schemas. Example 5 A s s u m e U W a n d U B C have the fo l l ow ing schemas. U W da tabase : UW.research-area(area, groupleader, department) UW.grad-student(stu-name, area, advisor) U B C da tabase : UBC.faculty(name, office-, department) A m e d i a t e d schema M is m a n u a l l y c rea ted as: M.people(name, department) U s i n g G L A V , re la t i onsh ip be tween the source s c h e m a a n d the m e d i a t e d schema c a n be freely expressed. G L A V m a p p i n g be tween U B C a n d M UBC.faculty(name, office, department) —> M.people(name, department) G L A V m a p p i n g between U W a n d M UW.grad-student(stu-name, area, advisor), UW.research-area(area, gl, department) —» M.people(name, department) T h e above G L A V m a p p i n g s c a n also be rewr i t t en i n con junc t i ve m a p -p ings . 12 Chapter 2. Preliminaries G L A V m a p p i n g be tween U B C a n d M L A V : <2i(name, department):-M.people(name, department) G A V : Qi(name, department):-UBC.faculty(name, office, department) G L A V m a p p i n g between U W a n d M L A V : Q2(r\ame, department):-M.people(name, department) G A V : <32(name, department):-UW.grad-student(name, area, advisor), UW.research-area(area, gl, department) • 2.1.3 Query and Mapping A m a p p i n g is the m e d i u m for exchang ing d a t a a n d r e f o r m u l a t i n g quer ies a m o n g di f ferent schemas; i n p a r t i c u l a r , the m a p p i n g def ines the ove r l app ing pa r t s of acqua in tances ' schemas. A s such , i t is the mos t bas ic a n d i m p o r -tan t pa r t of the sys tem. W e use con junc t i ve m a p p i n g s [27] as our m a p p i n g language. T h e s e are the same m a p p i n g s as i n [22] a n d [20]. W e choose con junc t i ve m a p p i n g s as ou r i n p u t m a p p i n g because they c a n eas i ly express c o m m o n a l i t y a m o n g dif ferent schemas a n d a l low the use of ex i s t i ng algo-r i t h m s if the m e d i a t e d s c h e m a is no t used , e.g., t h r o u g h the m e t h o d s i n [26] w h i c h j us t compose the m a p p i n g s be tween peers. A conjunctive mapping is a set of conjunctive queries r e l a t i n g a pa i r of l o c a l schemas: i.e., a set of s imp le D a t a l o g quer ies. W e br ie f l y rev iew the s y n t a x of con junc t i ve quer ies i n E x a m p l e 6. I n a con junc t i ve m a p p i n g , i f a set of con junc t i ve quer ies has the same I D B n a m e (i.e., n a m e of the answer re la t i on ) , t ha t set expresses the ove r l app ing i n f o r m a t i o n i n the sources. N o t e t ha t we s t i l l use m a p p i n g to refer to the genera l sense of cor respondences be tween schemas. E a c h con -j u n c t i v e que ry i n the con junc t i ve m a p p i n g is also ca l led a component, w h i c h relates to one s c h e m a for t h a t m a p p i n g . In a con junc t i ve query , var iab les 13 Chapter 2. Preliminaries t ha t appea r i n b o t h the head and the b o d y are ca l led distinguished variables; var iab les t ha t appea r o n l y i n the b o d y are ca l led existential variables. E x a m p l e 6 C o n s i d e r the fo l l ow ing con junc t i ve m a p p i n g o n the schemas i n E x a m p l e 4: conf-paper(title,venue,yr):-UW.conf-paper(title,venue,yr.pages) conf-paper(title,venue,yr):-UBC.conf-paper(title,venue,yr,url) T h i s m a p p i n g shows how U B C a n d U W c a n share i n f o r m a t i o n abou t conference pape rs t h r o u g h the reuse of the I D B n a m e con f -paper . E a c h l ine is a con junc t i ve query. conf-paper(title,venue,yr) is the head of b o t h quer ies, a n d title, venue, and yr are head var iab les , i n d i c a t i n g t h a t these are the c o m m o n a t t r i bu tes i n the two schemas. A s i n genera l D a t a l o g , the semant ics i m p l y tha t conf-paper i n f o r m a t i o n c a n be f o u n d b y t a k i n g the u n i o n of title, venue, and yr a t t r i bu tes f o u n d f r o m the bod ies of the two quer ies. T h e bod ies express the cond i t i ons requ i red to f o r m an answer tup le , a n d reuse of a va r iab le ind ica tes t ha t those va lues mus t be the same. E a c h re la t i on i n the b o d y of a que ry is re fer red to as a subgoa l . Toge ther , the con junc t i ve quer ies show tha t i n f o r m a t i o n a b o u t conf-papers c a n be f o u n d t h r o u g h e i ther UW.conf-paper or UBC.conf-paper, a n d a l l i n f o r m a t i o n t ha t c a n be o b t a i n e d f r o m the top i c conf-paper are title, venue, yr, page a n d url i n f o r m a t i o n . N o t e t ha t the I D B name of a s u b g o a l is the s u b g o a l name . • 2.2 Definitions w.r.t. P D M S 2.2.1 Query Reformulation and Result Reformulation Q u e r y re fo rmu la t i on is the process of t r a n s l a t i n g a que ry over da tabase schema A to tha t over da tabase s c h e m a B so tha t quer ies over A c a n be u n d e r s t o o d b y schema B . T h e re fo rmu la t i on process fo l lows the m a p p i n g s between these two schemas. A c c o r d i n g l y , resu l t r e f o rmu la t i on is the process of t r a n s l a t i n g the resu l ts u s i n g da tabase schema A to those over da tabase schema B so tha t resu l ts ob ta ined at s c h e m a A c a n be u n d e r s t o o d b y s c h e m a B . 14 Chapter 2. Preliminaries 2.2.2 Mapping Composition M a p p i n g c o m p o s i t i o n is a h a r d p r o b l e m w h i c h d raws m u c h a t ten t i on re-cent ly . In tu i t i ve ly , w h e n g iven semant i c m a p p i n g s M i 2 f r o m s c h e m a A to B , a n d M23 f r o m B to C , we hope we cou ld c o m p u t e a d i rec t m a p p i n g M\z f r o m A to C w h i c h is equ iva lent to the c o m p o s i t i o n of m a p p i n g s M12 a n d M 2 3 . H a l e v y et a l . p r o v i d e d a f o r m a l de f in i t i on of m a p p i n g c o m p o s i t i o n : " T h e m a p p i n g MA^C is a c o m p o s i t i o n of the m a p p i n g s MA~*B a n d MB-*C w.r . t . a que ry language Q i f for a l l da tabases DA for RA, a n d for a l l quer ies q over Rc such tha t q is i n the language Q, the ce r ta in answers for q w. r . t . MA->C are the same as the ce r ta i n answers w.r . t . MA-,B a n d MB->C" [20] F a g i n et a l . de f ined the m a p p i n g c o m p o s i t i o n i n more log ica l sense: " L e t M\2 ={S\, S2, £12) a n d M 2 3 =(S2, 53, £23) be two s c h e m a m a p p i n g s such t ha t the schemas S i , S 2 , S3 have no re la t i on s y m b o l i n c o m m o n pa i rw ise . A schema m a p p i n g M = ( S i , S3, £13) is a c o m p o s i t i o n of Myj a n d M 2 3 i f I ns t (M ' ) = I n s t ( M i 2 ) o I ns t (M 2 3 ) , w h i c h means tha t I n s t ( M ) = { < / i ; i3> | there ex is ts I2 such tha t < / i ; I2> G I n s t ( M i 2 ) a n d < 7 2 ; i3> G Ins t (M 2 3)}." " A schema m a p p i n g is a t r ip le M = ( S , T, £ s t ) , where S a n d T are schemas w i t h no re la t i on symbo l s i n c o m m o n a n d £ s t is a set of f o rmu las of some log ica l f o r m a l i s m over < S , T > . " [11] M a p p i n g c o m p o s i t i o n is a p r o b l e m tha t costs h i g h c o m p l e x i t y w h e n c o n -s ide r i ng a l l classes of m a p p i n g s . In C h a p t e r 5, we exp lo re a n d ana l yze m a p -p i n g c o m p o s i t i o n p r o b l e m i n more d e p t h . 15 Chapter 3 Related Work 3.1 Peer Data Management Systems F o r a l ong t ime , P 2 P c o m p u t i n g a n d D a t a b a s e research g roups were ve ry i ndependen t . T h e i d e a of i n c o r p o r a t i n g the da tabase research in to P 2 P was p roposed a n d d iscussed ear l y th is cen tu ry b y G r i b b l e et a l . [13] i n 2001, B e r n s t e i n et a l . [7] i n 2002. T h e pape r b y B e r n s t e i n et a l . i n t roduces the L o c a l R e l a t i o n a l M o d e l ( L R M ) as a d a t a m o d e l spec i f i ca l l y des igned for P 2 P app l i ca t i ons . M o s t research p ro jec ts i n P D M S b u i l d the i r a rch i tec tu re by e x t e n d i n g L R M , (i.e. P i a z z a [14], H e P T o X [8], H y p e r i o n [24]). L R M assumes tha t the set of a l l d a t a i n a P 2 P ne twork consis ts of l oca l ( re la t iona l ) da tabases , each w i t h a set of acqua in tances , w h i c h def ine the P 2 P ne twork topo logy. F o r each acqua in tance l i nk , t r a n s l a t i o n ru les be tween d a t a i tems a n d seman t i c dependenc ies be tween the two da tabases are p rede f ined [7]. O u r sys tem also uses such a m o d e l . P i a z z a [14, 15, 20] is a w e l l - k n o w n p r o t o t y p e for Peer D a t a M a n a g e m e n t S y s t e m . P i a z z a uses G A V / L A V sty le m a p p i n g s to descr ibe the seman t i c re la t i onsh ips be tween two peers. O u r m a p p i n g is based o n the P i a z z a m a p -p i n g , b u t changed in to con junc t i ve m a p p i n g for be t te r u n d e r s t a n d i n g . P i -a z z a also p rov ides a que ry re fo rmu la t i on a l g o r i t h m [26] based o n X Q u e r y to t rans la te the quer ies a m o n g peers us ing di f ferent schemas. U n l i k e our so l u t i on , t h e y do not s u p p o r t a m e d i a t e d s c h e m a i n the i r p r o t o t y p e , w h i c h is f lex ib le for que ry t r ans la t i on . O u r so lu t i on uses the s im i l a r i n i t i a l se tup b u t ensures a be t te r u n d e r s t a n d i n g of the s y s t e m , access to more i n f o r m a -t i on a n d a faster q u e r y re fo rmu la t i on . However , s ince o u r s y s t e m uses the same m a p p i n g s as P i a z z a , users of ou r s ys tem c o u l d choose to use e i ther our que ry answer ing sys tem or t rans la te the i r quer ies u s i n g P i a z z a ' s s ys tem. 16 Chapter 3. Related Work H e P T o X [8, 9] is a P 2 P X M L da tabase sys tem. A nove l que ry t r a n s l a t i o n a l g o r i t h m has been deve loped for a s imp le b u t a s ign i f i can t f ragment of X Q u e r y . B o t h P i a z z a a n d H e P T o X assume tha t a l l quer ies are asked over a l oca l schema; they focus o n finding a que ry r e fo rmu la t i on a l g o r i t h m to t rans la te a que ry over one peer s c h e m a to a que ry over i ts acqua in tance . T h e works of F a g i n et a l . [10, 11] m a i n l y cons ider how to process d a t a exchange a n d m a p p i n g c o m p o s i t i o n w i t h o u t i n f o r m a t i o n loss us ing m a p p i n g s unde r second-order logic dependenc ies . T h e y p rov ide a n ice so lu t i on for m a p p i n g c o m p o s i t i o n w h i c h they uses for the i r que ry t r a n s l a t i o n . H y p e r i o n [5, 24] is also a Pee r D a t a M a n a g e m e n t S y s t e m w h i c h uses b o t h da ta - leve l a n d schema- leve l m a p p i n g s to spec i fy the cor respondences be tween acqua in ted peer da tabases a n d process que ry r e fo rmu la t i on based o n these m a p p i n g s . M a p p i n g tab les are used to spec i fy co r respondences between d a t a values of acqua in ted da tabases . T h i s is the key di f ference between H y p e r i o n a n d a l l o ther sys tems. In P e e r D B [21], l oca l peer schemas are not p u b l i s h e d . I ns tead , l oca l users i n p u t keywords f r o m re la t i on n a m e , a t t r i bu tes , a n d records for each peer da tabase , a n d such i n f o r m a t i o n is used to m a t c h re la t ions . A s i n p rev ious P D M S s : a new peer en te r ing the ne twork chooses one or more acqua in tances a n d p rov ides m a p p i n g s to one or more acqua in tances , e i ther c reated b y h a n d , or t h r o u g h some s c h e m a m a t c h i n g t o o l (see [23] for a su rvey ) . O u r sys tem has the fo l low ing m i n i m a l features: Peer: each peer stores b o t h i ts l oca l s chema a n d one or more vers ions of m e d i a t e d schema tha t l oca l peer has an a p p l i c a t i o n b u i l t on . O n l y the m a p p i n g s f r o m m e d i a t e d schema to i ts l oca l s c h e m a w i l l be s to red at th is l o c a l peer , w h i c h is used for que ry answer ing . E a c h peer w i l l a lso s tore M a p p i n g T a b l e s , w h i c h he lp create the m e d i a t e d s c h e m a a n d de te rm ine how to re late the m e d i a t e d schema to the peer 's schema. Query answering: users c a n e i ther que ry the l oca l s c h e m a (as i n p rev ious sys tems) or use the au toma t i ca l l y - c rea ted m e d i a t e d schema , w h i c h p rov ides a d d i t i o n a l i n f o r m a t i o n . In e i ther case, user quer ies are a u t o m a t i c a l l y refor-m u l a t e d a n d fo rwarded to the peers ' acqua in tances . 17 Chapter 3. Related Work 3.2 Mediated Schema Creation A n o t h e r aspect of ou r research is m e d i a t e d s c h e m a c rea t i on ( M S C ) . M S C has been s tud ied i n d a t a in tegra t ion area. B a t i n i , L e n z e r i n i , a n d N a v a t h e [6] c o n d u c t e d a survey for da tabase schema in teg ra t i on ear l y i n 1986. In. [6], me thodo log ies for s c h e m a in tegra t ion a n d a c o m p a r i s o n of a l l ava i lab le methodo log ies are p r o v i d e d . L a t e r , L e n z e r i n i [19] a n d U l l m a n [28] b o t h d iscussed d a t a in teg ra t ion techn iques i n ve r y theore t i ca l pe rspec t i ve . H o w -ever, mos t ea r l y wo rk ignores c rea t i ng the m a p p i n g s be tween the m e d i a t e d schema a n d the sources. P o t t i n g e r p r o v i d e d a m e t h o d for c rea t ing a m e d i a t e d schema a n d m a p -p ings f r o m the m e d i a t e d schema to sources [22]. However , [22] does not cons ider the comp l i ca t i ons of the P D M S s i t u a t i o n , i n c l u d i n g the fact t ha t m a p p i n g s c a n ex is t be tween any pa i rs of schemas, ra the r t h a n those d i c t a ted b y a cen t ra l i zed au thor i t y . O u r s c h e m a m e d i a t i o n m e t h o d is based o n [22]. However , we have the fo l l ow ing improvemen ts : ou r M S C m e t h o d c a n h a n d l e any n u m b e r s of l oca l schemas, wh i l e the one i n [22] c a n o n l y h a n d l e a l i m i t e d n u m b e r of l oca l sources. In our a p p r o a c h , any o rde r i ng of m e d i a t i n g l o c a l schemas w i l l get an equiva lent resu l t wh i l e i n [22], d i f ferent o r d e r i n g of l oca l schemas m igh t get t o ta l l y di f ferent m e d i a t e d schema a n d m a p p i n g s to l oca l sources. T h i s is le t 's assume tha t we use a di f ferent a p p r o a c h f r o m the i rs . F o r e x a m p l e , w h e n we have three peers A , B , C m a p p i n g MapA_B be tween A a n d B , a n d m a p p i n g MapB.c be tween B a n d C . P o t t i n g e r ' s a p p r o a c h to create a m e d i a t e d s c h e m a MAB is f i rs t l y based o n schema A, B a n d m a p p i n g MapA_B- T h e n m a p p i n g s MapAB-B a n d Maps_c are c o m p o s e d to get the m a p p i n g MapAB.c- U s i n g schemas MAB, C a n d m a p p i n g MapAB.c, the m e d i a t e d s c h e m a of A B C c a n be c rea ted . U s i n g ou r a p p r o a c h , we create the i n te rmed ia te m e d i a t e d schemas MAB a n d MBC first. T h e n we create a m a p p i n g MapAB.BC based o n the ove r lapped subgoa ls i n the m a p p i n g s MAB a n d MBC , a n d create the final m e d i a t e d s c h e m a a n d m a p p i n g s to l oca l sources. 18 Chapter 4 Introducing Concept into Conjunctive Mappings Befo re we delve in to the P S M a l g o r i t h m , we s tar t b y s t u d y i n g two re la ted p rob lems : w h a t are the th ings t ha t can be represented by the same re la t i on i n the m e d i a t e d schema , a n d w h a t is the c o m p l e x i t y of m a p p i n g c o m p o s i t i o n i n such a sys tem. W e exp lore the f i rst p r o b l e m i n th is chap te r , a n d exp lo re the second one i n chap te r 7. T h e def in i t ions of C o n j u n c t i v e M a p p i n g a n d C o n j u n c t i v e Q u e r y have been i n t r o d u c e d a n d exp la i ned i n sec t ion 2.1.3. I n th i s chap te r , we exp lo re how to s u p p o r t semant ics i n the con junc t i ve m a p p i n g b y a p p l y i n g some res t r i c t ions o n the con junc t i ve m a p p i n g s . W e f i rst present the m o t i v a t i o n of i n t r o d u c i n g the i dea of "Concept" in to con junc t i ve m a p p i n g s i n sec t ion 4 .1 . W e t h e n use some examples to ana lyze w h a t k i n d of res t r i c t ions s h o u l d be app l i ed to the m a p p i n g s i n order to let the m a p p i n g s have a Concept. W e give the f o r m a l de f i n i t i on of a Concept i n sec t ion 4.2. 4.1 Motivation O n e of our con t r i bu t i ons is to i n t roduce the n o t i o n of Concept i n to c o n -j u n c t i v e m a p p i n g s . P r e v i o u s wo rk e i ther ignores the semant ics t h a t the m a p p i n g s m igh t con ta in [11, 20] or s i m p l y assumed tha t i f the I D B n a m e of the quer ies were the same, t h e n t hey desc r ibed the same Concept [22]. T h e la ter v i ew , t h o u g h us ing the seman t i c i n f o r m a t i o n t ha t is e m b e d d e d i n the m a p p i n g s , is over ly s imp l i s t i c . However , r eus ing the same I D B n a m e ins ide a con junc t i ve m a p p i n g l i ke ly ind ica tes the cases t ha t (1) the seman -t ics of the m a p p i n g m a y m a k e it imposs ib le to cons t ruc t a n u n d e r s t a n d a b l e m e d i a t e d schema i f th is is the on l y i n f o r m a t i o n t a k e n in to account a n d (2) 19 Chapter 4. Introducing Concept into Conjunctive Mappings consistency of I D B names cannot be assumed between mappings - there is no guarantee that the creator of M a p ^ - B wi l l use the same I D B name in the same way as the creator of Mapcjo to represent the same Concept. In this thesis we begin to understand the semantic issues entailed by Concept. However, as an introductory step, throughout this thesis we assume that when more than one conjunctive mappings express the same Concept, their I D B names are the same. Intuitively, a Concept describes the common object among different schemas. This is different from a mapping which expresses the overlapped attributes across schemas. When a mapping is said to have a Concept, the mapping should have a clear object, which might have some aspects to describe it in detail. In Example 6, clearly the mapping describes the object "conf-paper", which uses five aspects, title, venue, yr, pages and url, to describe this object. However, even though there are common relation names (e.g., father) and attributes (e.g., x, x l ) occurred in Example 2, these two mappings together do not express a common object or an idea because mapping (2.1) explains the Concept grandpa and mapping (2.2) explains the Concept greatgrandpa. Deciding whether or not one mapping can express a Concept evolves much A l research topics such as natural language processing, ontology, etc, and thus is not our focus in this thesis. However, deciding whether two mappings wi th the same I D B name together can express the same Concept is very useful, especially when we want to mediate local schemas and propagate queries along the mapping path. There are two reasons. First , users tend to provide incomplete or wrong mapping information when two mappings between two pairs of peer schemas are considered. Second, we want to know whether the pre-defined mappings are truly relating identical parts from different schemas. In this thesis, we only discuss this Concept problem based on two or more mappings. Throughout this thesis, the input mappings for MePSys are required to be conjunctive mappings wi th the same concept (Definition 3) without self-joins in each component. The P S M algorithm is sound and complete for such input mappings. 20 Chapter 4. Introducing Concept into Conjunctive Mappings 4.2 Definitions and Analysis In C h a p t e r 2, we i n t r o d u c e d con junc t i ve m a p p i n g s , i n c l u d i n g E x a m p l e 6, w h i c h def ines one Concept: con f -paper . However , i n m a n y cases d e t e r m i n i n g a Concept f r o m the m a p p i n g is no t s t ra igh t fo rward . C o n s i d e r E x a m p l e 7 (o r ig ina l l y f r o m [11] b u t rewr i t t en i n the f o r m of a con junc t i ve m a p p i n g ) . Example 7 A s s u m e the fo l l ow ing con junc t i ve m a p p i n g s : C o n j u n c t i v e M a p p i n g 1: be tween A a n d B Manager (x , y) :- A . M g r ( x , y) (7.1a) Manager (x , y) :- B . M g r l ( x , y) (7.1b) C o n j u n c t i v e M a p p i n g 2: be tween B a n d C Manager(x) :- B . M g r l ( x , x) (7.2a) Manager(x) :- C .Se l fMgr (x ) (7.2b) B o t h con junc t i ve m a p p i n g s use the I D B name Manager . However , wh i l e the first con junc t i ve m a p p i n g def ines the M a n a g e r re l a t i onsh ip , the second does not . T h o u g h the re la t i on M g r l appears i n b o t h con junc t i ve m a p p i n g s , i n the second m a p p i n g , i t has more res t r i c t ions o n i ts a t t r i bu tes : x mus t manage itself. T h u s , M g r l ( x , x) c a n no longer represents the i d e a of M a n a g e r b u t S e l f - M a n a g e m e n t i ns tead . The re fo re , E x a m p l e 7 has two Concepts: M a n a g e r a n d S e l f - M a n a g e m e n t ra ther t h a n o n l y one. W e ca l l the a t t r i bu tes i n B . M g r l ( x , x) i n (7.2a) a self-restrictive a t t r i bu te , w h i c h is def ined to be an a t t r i b u t e t ha t is i n a s u b g o a l t ha t appears i n two con junc t i ve m a p p i n g s w i t h the same I D B n a m e b u t has more res t r i c t ions t h a n the same a t t r i b u t e i n the o ther m a p p i n g . C o r r e s p o n d i n g l y , componen t (7.2a) ' is a se l f - rest r ic t ive c o m p o n e n t (De f i n i t i on 6) . • E x a m p l e 7 shows tha t i t is h a r d to j udge whe the r two m a p p i n g s descr ibe the same idea . T h i s thesis p rov ides the f i rst pass at u n d e r s t a n d i n g a Concept w h i c h is some c o m m o n top i c or i dea tha t c a n he lp m a k e the m e d i a t e d s c h e m a more comprehens ib le . B u t w h a t is a Concept i n the con junc t i ve m a p p i n g s ? 21 Chapter 4. Introducing Concept into Conjunctive Mappings W e b e g i n w i t h the fo l l ow ing de f in i t i on of a Concept i n con junc t i ve m a p p i n g s , w h i c h a l lows a care fu l ana lys is of the cases t ha t are c lear -cu t . T h e r e are a d d i t i o n a l issues to cons ider , b u t i n th is thesis we focus o n these cases t ha t are c lear a n d essent ia l for dec id i ng a Concept. A Concept is def ined b y a set of aspects. A s i n re la t ions i n re l a t i ona l da tabases , the aspects of a Concept are re fer red to as attributes. F o r e x a m -p le , the concep t of f l ight inc ludes the a t t r i bu tes f l igh tNo, date, t ime, depar-ture, dest inat ion, crew, e tc , w h i c h are the aspects t ha t are used to desc r i be the Concept f l ight. In a con junc t i ve m a p p i n g , the n a m e of each aspect is the co r respond ing va r iab le n a m e i n the m a p p i n g , w h i c h represents one aspect of the Concept t ha t the m a p p i n g descr ibes. D e f i n i t i o n 1 ( C o n c e p t i n C o n j u n c t i v e M a p p i n g s ) : A Concept i n con junc -t ive m a p p i n g s is def ined as an idea , n o t i o n or en t i t y t ha t is c o m m o n i n a l l schemas tha t the con junc t i ve m a p p i n g s are re la t i ng . F o r m a l l y , a Concept C is expressed b y a set of aspects S. • F o r ease of d i scuss ion , we p rov ide the fo l low ing te rms a n d de f in i t i ons . Definit ion 2 ( C o m p o n e n t ) : E a c h con junc t i ve que ry Q i n the con junc t i ve m a p p i n g is a c o m p o n e n t [22]. • Definit ion 3 (Same C o n c e p t ) : W e say tha t two con junc t i ve m a p p i n g s CM\ a n d CM2 def ine the s a m e concep t i f the ove r l apped subgoa ls i n CM\ a n d CMi are equ iva lent subgoa l sets, w h i c h c a n be checked as fo l lows: 1. A s s u m e V c o m p o n e n t c e CM\ U CM2, $ c' s.t. (1) c = c' ( i.e., c C dk.dC. c) a n d (2) \subgoals(c)\ > \subgoal(d)\; A s s u m e the ove r l apped s c h e m a of CM\ a n d CM2 is X; L e t C\ be a c o m p o n e n t f r om CM\ over X\ L e t C2 be a c o m p o n e n t f r o m CM2 over X. 2. L e t name(sg) be the re la t i on n a m e of the subgoa l sg; L e t sgjnames(Q) be the names of a l l o f the re la t ions i n que ry Q; 22 Chapter 4. Introducing Concept into Conjunctive Mappings L e t overlap-names — {name(sg) | name(sg) e sgjnames{C\) a n d name(sg) £ sgjnames(C2)}; L e t Cioveriap = {sg | name(sg) G overlap.nam.es a n d s# € 6 o d y ( C i ) } ; C20i;er/ap = { s 5 I name(sg) € overlap .names a n d s<? € 6ody(C2)}. 3. W e now create new quer ies Q i a n d Q2 t h a t descr ibe the ove r l app ing pa r t s of C\ a n d C2 respect ive ly . T h e goa l is to see i f the ove r l app ing pa r t s are equ iva lent : L e t IDB{QX) be IDB{CMX), a n d let IDB{Q2) be IDB{CM2); L e t subgoals(Qi) = Cx^^ia^ a n d subgoals(Q2) = C2au(,Tiav; L e t a l l var iab les i n Q\ a n d Q2 be d i s t i ngu i shed . T h a t is let vars{head{Q\)) = vars(subgoals(Q\)) a n d let vars{head(Q,2)) = vars(subgoals(C>2))-T h e n the fo l low ing cond i t i ons s h o u l d h o l d : 1. IDB(CM\) — IDB(CM2) (wh ich , b y the above requ i remen t , is also equa l to IDB{QX) a n d IDB(Q2)); 2. overlap-names ^ 0 ; 3. Q\ con ta ins Q2, a n d Q2 con ta ins Q\ (i.e., t hey are equ iva len t ) . • A d d i t i o n a l l y , c o n d i t i o n (1) is ac tua l l y the a s s u m p t i o n we have m a d e at the b e g i n n i n g . T h i s c o n d i t i o n c a n be removed for fu r the r s t u d y i n the fu tu re . C o n d i t i o n (2) says t ha t i f s chema X ex is ts i n two con junc t i ve m a p p i n g s of the same Concept, at least one re la t i on r s h o u l d appea r i n b o t h con junc t i ve m a p p i n g s . T h i s ensures t ha t the Concept is c o m p a t i b l e i n the rep resen ta t ion of s chema X. C o n d i t i o n (3) ensures t ha t i f one subgoa l n a m e appears i n di f ferent c o n -j u n c t i v e m a p p i n g s for the same concept , these two subgoa ls s h o u l d be ex-ac t l y the same after a s u b s t i t u t i o n of va r iab le names . 23 Chapter 4. Introducing Concept into Conjunctive Mappings E x a m p l e 8 T h e r e are o ther cases w h e n two con junc t i ve m a p p i n g s fo l low the de f i n i t i on of same concep t b u t s t i l l are not represen t ing the same Con-cept. C o n s i d e r the fo l low ing con junc t i ve m a p p i n g s . C o n j u n c t i v e M a p p i n g 1: be tween A a n d B student(sid, name) :- A.stu(sid, name) (8.1a) student(sid, name) :- B.stu(sid, name, program) (8.1b) C o n j u n c t i v e M a p p i n g 2: be tween A a n d C student(sid, name, advisor) :- A.stu(sid, name), A.advisor(sid, advisor) (8.2a) student(sid, name, advisor) :- C.student(sid, name, advisor), C.student(sidl, advisor, name) (8.2b) C o n j u n c t i v e m a p p i n g 1 a n d con junc t i ve m a p p i n g 2 together do not v i o -la te p r o p e r t y 1 a n d 2, b u t i f on l y cons ider con junc t i ve m a p p i n g 2, it is easy to f i nd ou t t h a t C.student(sid, name, advisor), C.student(sidl, advisor, name) is no t t a l k i n g abou t the Concept "student". T h i s c o m p o n e n t is a c t u a l l y ex-press ing the Concept of a s tuden t t ha t has a n adv i so r a n d the s tuden t is also h is adv iso r ' s adv iso r . However , i n th is thes is , we o n l y dea l w i t h whe the r two con junc t i ve m a p p i n g s c a n descr ibe the same Concept, r a the r t h a n d e c i d i n g whe the r the g iven con junc t i ve m a p p i n g i tsel f is cor rec t . • 24 Chapter 5 Creating a Mediated Schema in P D M S In ou r scenar io , a m e d i a t e d schema of a l l . pee r schemas a n d the m a p p i n g s f r o m the m e d i a t e d schema to l oca l sources are necessary for que ry p r o p a -ga t i on a n d t r ans la t i on . W e presented the p r e l i m i n a r y know ledge of s c h e m a m e d i a t i o n i n Sec t i on 2.1 a n d Sec t i on 2.2. W e also d iscussed Concept w h i c h c a n be represented as the seman t i c i n f o r m a t i o n i n m a p p i n g s i n C h a p t e r 4. I n th i s chap te r , we present the de f in i t i on of schema m e d i a t i o n i n P 2 P set t ings , a n d present ou r app roach of c rea t ing a m e d i a t e d schema for a P D M S . 5.1 Pottinger's Schema Mediation Algorithm for DIS P o t t i n g e r [22] prov ides a nove l a p p r o a c h for the d a t a i n teg ra t i on s y s t e m to a u t o m a t i c a l l y create a m e d i a t e d s c h e m a M based o n two l oca l schemas E a n d F, a n d a m a p p i n g MapEj? be tween t h e m . U s i n g her a l g o r i t h m , not o n l y a m e d i a t e d schema M b u t the m a p p i n g s f r o m M to b o t h l o c a l sources E a n d F, MapMJE a n d MapM.F, w i l l a lso be genera ted [22]. In th i s sec t ion we descr ibe P o t t i n g e r ' s a lgo r i t hms of c rea t i ng a m e d i a t e d s c h e m a a n d m a p p i n g s to l o c a l sources i n a d a t a in teg ra t ion s y s t e m . T h e a l g o r i t h m of t r a n s l a t i o n quer ies is also d iscussed i n th is sec t ion . F i g u r e 5.1 shows P o t t i n g e r ' s m e d i a t e d schema c rea t i on a l g o r i t h m . T h e r e are m a i n l y two categor ies of re la t ions t ha t w i l l be genera ted i n the m e d i a t e d s c h e m a M: O n e is d i rec t l y f r o m l oca l re la t ions . T h e y are no t s h a r i n g a Concept w i t h o ther schemas so tha t those re la t ions d o not appea r i n any of the m a p p i n g s . T h e other is c rea ted f r o m m a p p i n g s . If there is a m a p p i n g be tween E a n d F w h i c h expresses s o m e t h i n g t ha t E a n d F bea r i n c o m m o n , 25 Chapter 5. Creating a Mediated Schema in PDMS Procedure CreateMediatedSchema(£ l, F, MapE.F) /* E,F are schemas, and MapE.F is a conjunctive mapping between them. * / C = 0 £ = 0 Let R = { r e £ U F | r ^ a projection-only component } For each relation r £ Ft Let m be a new relation name(m) = name(r) attr ibutes(m) = attr ibutes(r) A d d £(r, m) to £ A d d m to M For each I D B name q S IDB(MapE.p) Let m be a new relation Let varsq be the duplicate-free union of variables of queries that define q in Mape.F name(m) = name(q) attr ibutes(m) = varsq A d d £(q, m) to £ A d d m to M Return M and £ F i g u r e 5.1: T h e C r e a t e M e d i a t e d S c h e m a A l g o r i t h m f r o m [22] there w i l l be a re la t i on i n the m e d i a t e d schema represen t ing th i s c o m m o n i d e a across the schemas. A s imp le subset of G L A V m a p p i n g s [12] is used to express the m a p p i n g f r o m M to E U F. T h e heads of quer ies i n MapM-EuF c a n be t rea ted as an in te rmed ia te schema, 7 5 , w h i c h is used to i nd i ca te h o w each m e d i a t e d schema re la t i on relates to each p a r t i c u l a r source. E a c h c o m p o n e n t (def ined i n Sec t i on 2.1.3) c f r o m MapE_F creates two v iews i n MapMJiuF- T h e first v i ew , lvc, is ca l led a l oca l v i ew for c. It is a con junc t i ve que ry f r o m M to 7 5 . LVM cons is ts of a l l such l o c a l v i e w def in i t ions for M. T h e second v i e w , gvc, is ca l led a g loba l v i e w for c. gvc is a que ry f r o m E to 7 5 , a n d is i n c l u d e d i n GVM, the g loba l v i e w def in i t ions for M. F i g u r e 5.2 shows the a l g o r i t h m of c rea t i ng the m a p p i n g MapEuF-G i v e n M a n d MapM.E\jF, any que ry tha t is over M c a n be re fo rmu la ted to l oca l sources s i m p l y fo l low ing LVM a n d GVM- T h e que ry r e fo rmu la t i on a l g o r i t h m is s h o w n i n F i g u r e 5.3. 26 Chapter 5. Creating a Mediated Schema in PDMS Procedure C r e a t e M a p p i n g ( £ ' , F, MapE.F, M) /* E and F are schemas, MCLPE_F is a conjunctive mapping between them, and M and £ are the outputs from CreateMediatedSchema(£', F, MapE.p) */ LVM = 0 GVM = 0 For each relation m e M If e € E U F and f (e, m) Let q be a fresh I D B name Let lvm = q(attr ibutes(m)) :- m(attr ibutes(m)) Let gvm = q(attr ibutes(m)) :- e(attributes(e)) A d d lvm to LVM A d d gvm to GVM For each component c € MapE.F Let cname. = IDB(c) Let m be the relation in M s.t. £(cname, m) Let q be a fresh IDB name lvc = q(vars(c)) :- m(attr ibutes(m)) gvc = q(vars(c)) :- body(c) A d d lvm to LVM A d d gvm to GVM Return LVM and GVM F i g u r e 5.2: T h e C r e a t e M a p p i n g A l g o r i t h m f r o m [22] F o r m a l de f in i t ions of a m e d i a t e d schema i n D a t a In teg ra t ion S y s t e m a n d m a p p i n g f r o m m e d i a t e d schema to l oca l sources, as we l l as the m e d i a t e d schema c r i t e r i a are p r o v i d e d i n [22]. 5.2 Problem Definition J u s t as we have d iscussed i n C h a p t e r 1, a m e d i a t e d schema w h i c h c a n be used to imp rove the comprehens ib i l i t y of que ry t r ans la t i on w i l l be ve r y des i rab le i n a P D M S . T o make th is i d e a more c lear , look at E x a m p l e 9. E x a m p l e 9 A s s u m e tha t UW, UBC a n d UT are the th ree peers w i t h databases s to r i ng the fo l l ow ing i n f o r m a t i o n a b o u t conference pape rs s h o w n i n F i g u r e 5.4: UW.conf-paper(title, author, conference, venue, pages) UBC.conf-paper(title, conference, year) 27 Chapter 5. Creating a Mediated Schema in PDMS Procedure R e w r i t e Q u e r y ( M , MO.PM.EUF, Q) Q ' = maximal ly-contained rewrit ing for Q using LVM / * Q ' is over intermediate schema IS * / Q " = expansion of Q ' using GVM / * Q " is over E U F */ Return Q " F i g u r e 5.3: Q u e r y R e w r i t i n g A l g o r i t h m f r o m [22] UT.conf-paper(title, author, abstract, area) T h e m a p p i n g s be tween UBC a n d UW, UW a n d UT are g i ven as fo l lows: M a p p i n g MapuBCMW-conf-paper(title.conference):-UBC.conf-paper(title,conference,year) conf-paper(title.conference):-UW.conf-paper(title,author,conference,venue,pages) M a p p i n g MapuwjJT'-conf-paper(title.author):-UW.conf-paper(title,author,conference,venue,pages) conf-paper(title.author):-UT.conf-paper(title,author,abstract,area) A s s u m e tha t a m e d i a t e d schema M is c rea ted for a l l three peers us ing the above i n f o r m a t i o n : M.conf-paper(title,conference,year.author.venue,pages,abstract,area) W e fu r the r assume tha t the M a p p i n g s f r o m M to each l o c a l peer c a n be o b t a i n e d u s i n g G L A V m a p p i n g s : M a p p i n g MapMMBC-L A V : conf-paper(title,conference,year):-M.conf-paper(title,conference,year,author,venue,pages,abstract.area) G A V : 28 Chapter 5. Creating a Mediated Schema in PDMS Quae is translated to £>7?6ver M i Peer UBC UBC User:; Query;£<;fla over UBC Peer UW (4) is translated to gt/w over UW Peer UT (6) (2w.is translated to gj'rbver UT F i g u r e 5.4: Q u e r y P r o c e s s i n g i n M e P S y s conf-paper(title,conference,year):-UBC.conf-paper(title,conference,year) M a p p i n g MapM.uw-L A V : conf-paper(title,author,conference,venue,pages):-M.conf-paper(title,conference,year,author.venue,pages,abstract,area) G A V : conf-paper(title,author,conference,venue,pages):-UW.conf-paper(title,author,conference,venue,pages) M a p p i n g MapMJUT'-L A V : conf-paper(title,author,abstract,area):-M.conf-paper(title,conference,year,author,venue,pages,abstract,area) 2 9 Chapter 5. Creating a Mediated Schema in PDMS GAV: conf-paper(title,author,abstract,area):-UT.conf-paper(title,author,abstract,area) Now, using these G L A V mappings, queries can be easily translated from over M to any local schema or from any local schema to M. A user at UBC can ask a query in UBC's own schema: q(title):-UBC.conf-paper(title,conference,year) This query is first translated to that over M using MapMMBC-q(title):-M.conf-paper(title, conference, year.author.venue, pages, abstract, area) Then peer UBC broadcasts this query to all of this acquaintances, and ac-quaintances to acquaintances. For example, UW receives this query. Using the Mapping MapMMW, this query can be translated to that over UW: q(title):-UW.conf-paper(title,author,conference,venue,pages) In the same way, answers can be reformulated and passed back to UBC. Thus, UBC can get not only answers from its own peer database but also from other peer database. Users at UBC can also pose their query directly over the mediated schema, by using which users are able to query more information in the network. • Since Pottinger has already provided the related algorithm in [22] (de-scribed in Section 5.1), are these algorithms already enough to create a mediated schema in a PDMS setting? Unfortunately, the answer is no. In a PDMS, any peer can join and leaves the network at any time, so the schema mediation algorithm needs to be commutative and associative and satisfy all the requirement of a P 2 P system. Though [22] considers to mediate more than two schemas, when the order of local schemas changes, the final medi-ated schema might also change. Furthermore, [22] requires that to include a new local schema E into the previously created mediated schema M, a 30 Chapter 5. Creating a Mediated Schema in PDMS m a p p i n g f r o m M to E shou ld be spec i f ied . T h i s c o u l d be ach ieved in a d a t a in teg ra t ion s y s t e m . Howeve r , i n a P 2 P s y s t e m , n o b o d y c a n be expec ted to take the respons ib i l i t y for th is task. A l s o , i n a P D M S , i t is ve ry l i ke ly t ha t two di f ferent vers ions of m e d i a t e d schema (con ta in ing i n f o r m a t i o n of differ-ent sets of peer schemas) m i g h t meet at some p o i n t , a m e r g i n g o p e r a t i o n to merge the two m e d i a t e d schemas w o u l d be i nev i tab le . S u c h a n ins tance is never expec ted i n a d a t a in teg ra t ion s y s t e m , nor i n [22]. T h u s , these new tasks w i l l be tack led i n M e P S y s . T h e a lgo r i t hms of how to create a m e d i a t e d schema a n d m a p p i n g s to l oca l sources i n a P D M S se t t i ng as shown i n E x a m p l e 9 w i l l be desc r i bed i n the fo l low ing sect ions of th is chapter . T o create a m e d i a t e d schema, we ex tend the a lgo r i t hms i n Sec t i on 5.1. T h i s i nc ludes (1) u s i n g Concepts to create a more comprehens ib le m e d i a t e d schema, (2) ensu r i ng t ha t the a l g o r i t h m is c o m m u t a t i v e a n d assoc ia t ive (since any peer c a n enter or leave the ne twork at any t ime ) . H e r e we p rov i de the f o r m a l de f in i t ions of s c h e m a m e d i a t i o n i n P 2 P env i ronmen t . W e b e g i n by i n f o r m a l l y de f in ing C o n c e p t M e d i a t i o n , a n d t h e n f o r m a l l y def ine i t i n D e f i n i t i o n 4. W e use the de f in i t i on of C o n c e p t M e d i a t i o n to fo rma l i ze our de f i n i t i on o f M e d i a t e d S c h e m a In P 2 P i n D e f i n i t i o n 5. In fo rma l l y , concep t m e d i a t i o n c a n be exp la i ned as the fo l low ing : A s s u m e tha t a set of peer schemas S = { P i , P2, ••• , Pfc} a n d a set of m a p p i n g s M between some pa i r s of the schemas are g iven . E a c h m a p p i n g m a y con ta i n severa l con junc t i ve m a p p i n g s . E a c h con junc t i ve m a p p i n g o n l y speci f ies one aspect of the c o m m o n a l i t y be tween the pa i r s of peers. W e assume tha t a l l con junc t i ve m a p p i n g s w i t h the same I D B name i n the set M are t a l k i n g abou t the same t h i n g , i.e., more f o rma l l y they refer to the same concep t (De f i n i t i on 3). W e p u t a l l the i n f o r m a t i o n p r o v i d e d b y these con junc t i ve m a p p i n g s (def ined i n Sec t ion 2.1.3) together a n d f o r m a g loba l re la t i on for th is concept . D e f i n i t i o n 4 ( C o n c e p t M e d i a t i o n ) : g iven a set of peer schemas S = {pi, P2, ... , Pk} a n d a set o f m a p p i n g s M be tween some pa i r s o f the schemas , a C o n c e p t M e d i a t i o n is the process of c rea t i ng a represen ta t ion (i.e., re la t i on 31 Chapter 5. Coating a Mediated Schema in PDMS a n d m a p p i n g to sources) i n the m e d i a t e d s c h e m a tha t co r responds to those con junc t i ve m a p p i n g s i n the set M w i t h the same I D B n a m e . • In fo rma l l y , the m e d i a t e d s c h e m a i n P 2 P c a n be def ined as the u n i o n of a l l concepts of i ts peers. N o t e t h a t i f every concept is ca l cu la ted b y us ing C o n c e p t M e d i a t i o n , t h e n the M e d i a t e d S c h e m a is the u n i o n of a l l m e d i a t e d concepts . A M e d i a t e d S c h e m a s h o u l d also fo l low the I n f o r m a t i o n C a p a c i t y c r i t e r i a : a l l quer ies t ha t c a n be asked over the source schemas c a n be asked over the m e d i a t e d schema a n d the same resu l ts are re tu rned [22]. D e f i n i t i o n 5 ( M e d i a t e d S c h e m a i n P 2 P ) : g iven a set of peer schemas S = {Pi, P2, ... , Pk] a n d a set of m a p p i n g s M be tween some pa i r s of the schemas, a M e d i a t e d S c h e m a is the u n i o n of a l l r esu l t i ng re la t ions f r o m C o n c e p t M e d i a t i o n (commona l i t i es be tween schemas) a n d those re la t ions ex i s t i ng i n S b u t not ex i s t i ng i n any of the subgoa ls of the m a p p i n g s i n set M (specia l i t ies for l oca l peer schema) . • F u r t h e r , a m a p p i n g MapM.E f r o m the m e d i a t e d s c h e m a M to each l oca l source E, i.e. the G L A V m a p p i n g between M a n d UBC i n E x a m p l e 5, is necessary so tha t a que ry over M c a n be re fo rmu la ted to tha t over E. In our a l g o r i t h m , the m a p p i n g Mapuj; is i n the f o r m of G L A V , a n d each peer E m a i n t a i n s the i r own G L A V m a p p i n g MapM.E-T o make the m e d i a t i o n process easier, we i n t r o d u c e d a cons t ruc t Map-pingTable. A M a p p i n g T a b l e con ta ins a l l l o ca l i n f o r m a t i o n abou t a spec i f ic Concept. W i t h the use of M a p p i n g T a b l e , the presence of i nd i rec t m a p p i n g s t ha t iden t i f y a d d i t i o n a l i n f o r m a t i o n abou t re la t i onsh ips betweens schemas c a n be fu l l y used . C o n s i d e r E x a m p l e 10. E x a m p l e 1 0 : C o n s i d e r the ind i rec t m a p p i n g i n f o r m a t i o n i n the fo l l ow ing m a p p i n g s : C o n j u n c t i v e M a p p i n g A _ B : author(name) :- A.auth(name, affiliation, contactlnfo) author(name) :- B.author_paper(name, paper) C o n j u n c t i v e M a p p i n g B _ C : 32 Chapter 5. • Creating a Mediated Schema in PDMS author(name) :- B.author paper(name, paper) author(name) :- C.author(name, affiliation, title) Conjunctive Mapping A_C: author(affiliation) :- A.auth(name, affiliation, contactlnfo) author(affiliation) :- C.author(auth-name, affiliation, title) • For mapping A_B and mapping B_C, it is clear that the first attributes in A.auth, B.author and C.author are the same since they are mapped to the same variable: name. In mapping A_C, the first attributes of A.auth and C.author are not mapped to the same variable, so by A_C only, it is impossi-ble to tell that they represent the same information. However, by composing A_B and B_C, it is clear that in the first attributes of A.auth and C.author represent the same information. We call the information found in the com-posed mapping that is not included in the direct mapping indirect mapping information. One advantage of our work is that such indirect information can complement information in the original mappings. Previous work tends to focus on translating queries on a single source of information, either from the original mapping or from the composed mapping. Since users are likely to provide incorrect or incomplete mapping information, our system helps users check whether their mappings are correct, and the system will auto-matically combine all mapping information together, regardless of where the information comes from. In Section 5.3, we introduce the use of a MappingTable to merge all mapping information, not just the direct information. In Section 5.4 we introduce the idea and the algorithm of a mapping compatible check to ensure that those relations in the mediated schema are concept-based. We present the schema mediation algorithm for P2P in Section 5.5. 33 Chapter 5. Creating a Mediated Schema in PDMS 5.3 MappingTable Creation 5.3.1 Intuitions A M a p p i n g T a b l e b o t h he lps create the m e d i a t e d schema , a n d is used to create the m e d i a t e d schema to source m a p p i n g s . E a c h M a p p i n g T a b l e rep-resents a s ingle concep t , i n c l u d i n g b o t h the d i rec t a n d ind i rec t m a p p i n g i n f o r m a t i o n i n re la t i ng a concept . A s s h o w n i n E x a m p l e 11, each source re la t i on is g iven a row, a n d each a t t r i bu te is represented b y a c o l u m n . E a c h c o l u m n represents one aspect of t ha t concept . Example 11 : T h e M a p p i n g T a b l e c rea ted for the m a p p i n g i n E x a m p l e 6 is s h o w n i n F i g u r e 5.5: R e l a t i o n : M l . c o n f - p a p e r ( t i t l e , venue, year , u r l , pages) M l . c o n f - p a p e r 1 2 3 4 5 U B C . c o n f - p a p e r 1 2 3 4 U W . c o n f - p a p e r 1 2 3 4 F i g u r e 5.5: M a p p i n g T a b l e for E x a m p l e 6 E a c h en t r y i n the tab le refers to the l oca t i on of the a t t r i b u t e i n the source (e.g., the f ou r t h a t t r i bu te i n UW.conf-paper gives i n f o r m a t i o n abou t the a t t r i bu te pages i n the m e d i a t e d schema re la t ion ) . • A s the m e d i a t e d s c h e m a grows to encompass more peers a n d more m a p -p ings , somet imes a new M a p p i n g T a b l e w i l l be c rea ted to represent the same concept as one crea ted for the p rev ious l y -ex i s t i ng m e d i a t e d schema. L e t ' s assume tha t conf-paper is an ex i s t i ng concept represented by the M a p -p i n g T a b l e MTold. L e t the new M a p p i n g T a b l e represen t ing con f -paper be MTnew. MTold a n d MTnew need to be merged to f o r m one concep t for conf-paper i n the m e d i a t e d schema. E x a m p l e 12 gives the i n t u i t i o n of me rg -i ng two M a p p i n g T a b l e s of the same concept . 34 Chapter 5. Creating a Mediated Schema in PDMS E x a m p l e 12 : L e t M T o l d be the M a p p i n g T a b l e i n E x a m p l e 11. L e t M T -new be the M a p p i n g T a b l e s h o w n in F i g u r e 5.6. R e l a t i o n : M 2 . c o n f - p a p e r ( t i t l e , venue, year , u r l , au tho r ) M 2 . c o n f - p a p e r 1 2 3 4 5 U B C . c o n f - p a p e r 1 2 3 4 U T . c o n f - p a p e r 1 2 3 4 F i g u r e 5.6: M a p p i n g T a b l e for E x a m p l e 12(a) S ince M T o l d a n d M T n e w are b o t h abou t " c o n f - p a p e r " , t hey c a n be merged to be one M a p p i n g T a b l e , s h o w n i n F i g u r e 5.7. R e l a t i o n : M 3 . c o n f - p a p e r ( t i t l e , venue, year , u r l , pages, au tho r ) M 3 . c o n f - p a p e r 1 2 3 4 5 6 M l . c o n f - p a p e r 1 2 3 4 5 U B C . c o n f - p a p e r 1 2 3 4 U W . c o n f - p a p e r 1 2 3 , 4 M 2 . c o n f - p a p e r 1 2 3 4 5 U B C . c o n f - p a p e r 1 2 3 4 U T . c o n f - p a p e r 1 2 3 4 F i g u r e 5.7: M a p p i n g T a b l e for E x a m p l e 12(b) B o t h " U B C . c o n f - p a p e r " i n l ine 3 a n d 6 are kept for c lar i ty . However , i n rea l i m p l e m e n t a t i o n and after o p t i m i z a t i o n , o n l y one of t h e m needs to be kept . • 5.3.2 Algorithm F i g u r e 5.8 shows how to create a M a p p i n g T a b l e for each re la t i on t h a t is cons t ruc ted f r o m a m a p p i n g MCLPE_F i n the m e d i a t e d schema . P r o c e d u r e c r e a t e M e d i a t e d S c h e m a ( . E , F, MapEjp) h i F i g u r e 5.1 w i l l f i rs t be ca l l ed . 35 Chapter 5. Creating a Mediated Schema in PDMS G i v e n MapE_F a n d MEF, the M a p p i n g T a b l e s w i l l be c o n s t r u c t e d for each concept i n MapE_F-W h e n a peer have two di f ferent vers ions of m e d i a t e d s c h e m a M\ a n d M2 m a i n t a i n i n g di f ferent i n f o r m a t i o n , i f b o t h M\ a n d M2 con ta i n concep t q, i n o rder to merge r\ and ,r2, co r respond ing M a p p i n g T a b l e MT\(q) a n d MT2{q). needs to be merged . (r\ a n d r2 are the co r respond ing re la t ions i n M\ a n d M2 represen t ing the concept q.) T h e M a p p i n g T a b l e m e r g i n g process fo l lows the same genera l p r i nc ip les : 1. R e l a t e d a t t r i bu tes s h o u l d be pos i t i oned i n the same c o l u m n ; 2. U n - r e l a t e d a t t r i bu tes are i n di f ferent c o l u m n s ; 3. O v e r l a p p i n g l oca l re la t ions i n the two M a p p i n g T a b l e s are used to de-te rm ine each c o l u m n i n one M a p p i n g T a b l e co r responds to t ha t i n the o ther M a p p i n g T a b l e . P r o c e d u r e m e r g e M a p p i n g T a b l e ( m t i , r\, mt2, r2) i n F i g u r e 5.9 merges any two M a p p i n g T a b l e of the same concept mt\ a n d mt2, a n d re tu rns a merged M a p p i n g T a b l e . It is requ i red tha t there be ove r l apped re la t ions i n m i l a n d mt2; o therw ise , they canno t be merged . S ince i t is l i ke l y t ha t each M a p p i n g T a b l e con ta ins i nd i rec t m a p p i n g i n f o r m a t i o n tha t the o ther one does not have, the f i rst step for m e r g i n g two M a p p i n g T a b l e is to u p d a t e such i n f o r m a t i o n for each M a p p i n g T a b l e s h o w n i n F i g u r e 5.10. 5.3.3 Section Summary In th i s sec t ion , we presented by examp les the m o t i v a t i o n of i n t r o d u c i n g a M a p p i n g T a b l e for each r e l a t i o n i n the m e d i a t e d s c h e m a t h a t comes f r o m a m a p p i n g . W e also d iscussed the case of m e r g i n g two M a p p i n g T a b l e s . W e fu r ther gave three a lgo r i t hms tha t are re la ted to o p e r a t i n g M a p p i n g T a b l e s . T h e y are c rea t i ng a M a p p i n g T a b l e , m e r g i n g M a p p i n g T a b l e s a n d u p d a t -i ng cu r ren t M a p p i n g T a b l e u s i n g a d d i t i o n a l m a p p i n g i n f o r m a t i o n i n ano ther M a p p i n g T a b l e . 36 Chapter 5. Creating a Mediated Schema in PDMS Procedure C r e a t e M a p p i n g T a b l e ( M a p . E _ F , MEF) /* MEF is the mediated schema created from Maps.F */ MT = 0 For each relation m S M constructed from IDB(MapE.F) Let mt be a new MappingTable wi th #a t t r (m)+ l columns / * The next three lines add m into mt * / mt(0, 0) = name(m) For i = 1 to # attr(m) mt(0, i) = i Let qs be component from Maps.F over schema E Let qp be component from Maps.F over schema F Add(<?.E, mt) / * cal l Procedure A d d ( g s , mt) to add al l subgoals of qE to mt */ Add(<7F> rnt) / * cal l Procedure Add (gF> rnt) to add al l subgoals of qF to mt */ A d d mt to MTE_F Return MTE_F Procedure A d d ( g , mt) / * A d d al l subgoals of q to MappingTable mt */ Let m be the first relation in mt Let j be # next row to insert in mt For each subgoal g of q mt( j , 0) = name(g) For each attr ibute x of g Let x be the i t h attr of g F i n d k s.t. var(g.attr(z)) = var(m.attr(&)) mt ( j , k) = i j = j + 1 IImove to the next row F i g u r e 5.8: C r e a t e M a p p i n g T a b l e A l g o r i t h m 5.4 Mapping Compatible Check T w o m a p p i n g s t ha t are c rea ted at di f ferent peers b y di f ferent users m igh t have the same I D B name , i n d i c a t i n g t ha t these two m a p p i n g s represent the same concept . However , as is d iscussed i n Sec t i on 4.2, two m a p p i n g s h a v i n g the same concept shou ld fo l low D e f i n i t i o n 3 (i.e. these two m a p p i n g s s h o u l d have the same I D B n a m e , t hey have ove r lapped subgoa l names a n d the i r ove r l apped subgoa l sets are equ iva len t ) . T h e m o d u l e of M a p p i n g C o m p a t i b l e C h e c k is to m a k e sure t ha t the m a p p i n g s are rea l l y represen t ing the same 37 Chapter 5. Creating a Mediated Schema in PDMS Procedure M e r g e M a p p i n g T a b l e ( m * i , n , 7 n * 2 , T-2) I* mt\ and 771*2 are two MappingTables wi th the same concept; r\ and r-i are the corresponding relations in the mediated schema for mt\ and rati */ / * F i rs t update each MappingTables using indirect mapping information specified in the other * / UpdateMapp ingTab le (m* i , 771*2) / / s e e Figure 5.10 / * Second, merge the Mapp ingTab les* / Let S be the set of overlapped relation names in 771*1 and 771*2 Let overlappedCol = # column (5) newMTCol = # co lumn(mi i ) + # co lumn(7n*2) + 1 - overlappedCol / * Construct a new MappingTable mtnew of newMTCol co lumns* / hashi — 0 / * how columns in mt\ map to mtnew */ For V column i of mt\ hashiii) = new(/c) /* k is a column in mtnew that has not been assigned * / hashi = 0 / * how columns in 771*2 map to mtnew */ For V column i of mti If 3 row j s.t. mt2(j, 0) G S and mti{j, i)^ blank F i n d k, I s.t. mt\{k, 0) = mt2(j, 0) and mt\(k, I) = mti(j, i) hashi{\) = hash\{l) Else hashi(\) = new(fc) / * A d d the first row to mtnew */ mtnew(0, 0) = name(r i ) For i = l to newMTCol mtnew(0, i) = i I* A d d m i l to mtnew */ For each row i of mt\ Let p be the next row in mtnew For each column j of m t i If mti(i, j) is not blank mtnew(p, hashi(j)) = mti(i, j) / * A d d 771*2 to mtnew */ For each row i of 771*2 Let p be the next row in mtnew For each column j of 771*2 If m t 2 ( i , j ) is not blank mtnewip, hash2(j)) = mt2{i, j) Return mtnew F i g u r e 5.9: M e r g e M a p p i n g T a b l e A l g o r i t h m 38 Chapter 5. Creating a Mediated Schema in PDMS Procedure U p d a t e M a p p i n g T a b l e ( m * i , 771*2) V relation name x\ and xi from mt\ , If x\ and X2 both have attr ibutes in column j of mt\, but in m*2, the corresponding attr ibutes of x\ and X2 are in different columns k and Z Combine column k and Z of 771*2 V relation name x\ and X2 from 777*2 If x\ and X2 both have attr ibutes in column j of 771*2 but in 771*1, the corresponding attr ibutes of x\ and x^ are in different columns k and / Combine column k and / of mt\. F i g u r e 5.10: U p d a t e M a p p i n g T a b l e A l g o r i t h m concept w h e n m e r g i n g the i n f o r m a t i o n con ta ined b y two m a p p i n g s . S ince every peer o n l y ma in ta i ns i ts o w n m a p p i n g s a n d does no t k n o w o ther peers ' m a p p i n g , the m a p p i n g compa t i b l e check needs to re ly o n the M a p p i n g T a b l e t ha t c a n infer the o r i g i na l m a p p i n g s . T h e a l g o r i t h m i n F i g -ure 5.11 i n p u t s the o r i g i na l M a p p i n g T a b l e of concep t q for the m e d i a t e d schema, MTm(q), a n d a new m a p p i n g of the concept q, MapEj-{q)- T h e a l g o r i t h m f i rst checks whe the r there ex is ts a case w h e n two a t t r i bu tes of a ce r ta in subgoa l or two a t t r i bu tes of di f ferent subgoa ls i n MapE_F(q) have the same va lue b u t have dif ferent va lue i n a p rev ious m a p p i n g , w h i c h is rep-resented b y a row or a set of rows i n MTm(q). T h e a l g o r i t h m fu r the r checks i f there ex is ts a case w h e n i n MapE_F(q), two a t t r i bu tes i n one subgoa l or i n two subgoa ls f r o m one componen t have the same va lue , b u t i n MTm(q), the c o r r e s p o n d i n g a t t r i bu tes have dif ferent va lues. If e i ther of the above case is t rue , the new m a p p i n g of concept q is no t c o m p a t i b l e w i t h a p rev ious ex i s t i ng m a p p i n g of concept q. T h e m a p p i n g needs to be m o d i f i e d b y the user. W e assume tha t at least one subgoa l s h o u l d be i n c o m m o n i f one schema is i nvo lved i n severa l con junc t i ve m a p p i n g s w i t h the same I D B name . 5.5 Peer Schema Mediation W e have b r ie f l y desc r ibed M e P S y s i n Sec t i on 5.2. In th i s sec t ion , we de-scr ibe our Pee r S c h e m a M e d i a t i o n a l g o r i t h m (short for P S M a l g o r i t h m ) , w h i c h is c o m p o s e d of three s m a l l a l go r i t hms , schema m e d i a t i o n , c o m p u t i n g 39 Chapter 5. Creating a Mediated Schema in PDMS Procedure M a p p i n g C o m p a t i b l e C h e c k ( M a p / j ; _ i r ( g ) , MTm(q)) /* MapE.F{q) is a new mapping of concept q\ MTm is the MappingTable of concept q for the mediated schema * / / * check whether Self-Restrict ive component exists * / / * Return True when compatible; False when incompatible * / Div ide al l rows in MTm(q) into several sets s.t. al l consecutive rows from one peer are in one set For each set S of MTm(q) If 3 values v\, v<i in the same column from rows i, j which belong to S F i n d subgoal sgi in MCLPE„F{<I) s.t. name(sgi) = MTm(q)(i, 0) F i n d subgoal sg2 in MapE.F(q) s.t. name(s<j2) = MTm(q)(j, 0) If s g l . a t t r ^ i ) =^  sp2.attr(w2) Return False •For V subgoal sg\, sg2 from the same component of Mapsj-(q) If 3 x and y s.t. sf^.at t^a;) = sp2-attr(j/) F i n d row i, j in MTm(q) s.t. MTm(q)(i,0) = name(spi) , Mr m (<?)( j ,0) = name(sff 2) If x in row i and y in row j are not in the same column Return False Return True I F i g u r e 5.11: M a p p i n g C o m p a t i b l e C h e c k A l g o r i t h m G L A V m a p p i n g a n d que ry re fo rmu la t i on . In C h a p t e r 6, there w i l l be more d i scuss ion o n how to hand le the case w h e n new peers j o i n the ne twork af ter da tabase a p p l i c a t i o n has been bu i l t u p o n the ear l ier m e d i a t e d schema, a n d how to hand le the case w h e n some peer leaves the ne twork , or some l oca l da tabase schema evolves as t i m e goes by. 5.5.1 System Setup Phase - Schema Mediation System Work For ease of d i scuss ion , we assume there is a se tup phase for M e P S y s : severa l peers have j o i n e d the ne twork , f o u n d the i r acqua in tances a n d c rea ted m a p -p ings to the acqua in tances b u t no m e d i a t i o n has been s ta r t ed . T h e sys tem se tup phase s ta r ts f r o m a f i rst peer s t a r t i n g m e d i a t i o n to a t i m e p o i n t w h e n a l l peers i n the sys tem get the mos t u p d a t e d m e d i a t e d s c h e m a of the who le ne twork . W e assume tha t , at any t ime , each peer P m a i n t a i n s a l l of the fo l l ow ing 40 Chapter 5. Creating a Mediated Schema in PDMS i n f o r m a t i o n : 1. P ' s l oca l da tabase schema 2. A l is t L of m a p p i n g s w i t h P ' s acqua in tances 3. A cu r ren t vers ion of m e d i a t e d schema M 4. M a p p i n g T a b l e set co r responds to M 5. G L A V m a p p i n g s f r o m M to P >l3::B>checks andupdates;its local relation, '.information imMn-based on B. tip X creates Mi- based on:: X,.MapxiB,Mapx_c t*j X'geis responses" from B^C X computes Mwcontaining: X. B..p arjd;MXpK:B,';MapVic Xbroadcasts MUand COTresppnding'MappjngTable Peer-X;: X, MapVs, Mapxlc Mapping with other peers Peer B: B.Mapxja, MapBjd, MaptD 'Map'ping'with'otliMrpmrs' Peer C:: Mapictt'Mapaic .'.t3: CcheckVand updates-its local reiation ^formation in -Miibased oh C; F i g u r e 5.12: S c h e m a M e d i a t i o n S t a r t - u p A s s h o w n i n F i g u r e 5.12, w h e n m e d i a t i o n s ta r ts f r o m peer X, X gets m a p p i n g s f r o m L one b y one. A t t ime £ i , X med ia tes i n f o r m a t i o n based o n 41 Chapter 5. Creating a Mediated Schema in PDMS each m a p p i n g mapx.Y- However , as is desc r i bed i n Sec t i on 5.1, there are two k i n d s of re la t ions i n the m e d i a t e d schema: one is f r o m l oca l sources a n d ano ther is f r o m the m a p p i n g . S ince i n a P D M S , every peer does not k n o w i ts acqua in tance ' s schema, the med ia ted schema Ma t ha t X first creates is o n l y c o m p o s e d of two pa r t s : l o ca l re la t ions f r o m X a n d re la t ions f r o m the m a p p i n g s . I n f o r m a t i o n of F ' s l oca l re la t i on is m i s s i n g f r o m Mt\. A t t ime t2, X sends Ma to each acqua in tance , Y a n d asks Y to c o n f i r m or u p d a t e Mt\. A t t ime £3, each acqua in tance con f i rms Ma a n d u p d a t e s Ma based o n i ts l o c a l s c h e m a i n f o r m a t i o n , i.e. a d d i n g those l o c a l s c h e m a i n f o r m a t i o n t h a t X does not k n o w . A t £4, X receives a l l c o n f i r m a t i o n f r o m i ts acqua in tances , X w i l l c o m p u t e a new m e d i a t e d schema, say Mt^, w h i c h con ta ins a l l s c h e m a i n f o r m a t i o n of X a n d a l l of i ts acqua in tances as we l l as the m a p p i n g s be tween X a n d i ts acqua in tances . X f u r the r compu tes the G L A V m a p p i n g f r o m Mt4 to X. X a lso upda tes M a p p i n g T a b l e s a n d let t he m e d i a t e d s c h e m a to b e Mt4 i n i ts l is t . A t t ime £5, X sends Mt4 a n d c o r r e s p o n d i n g M a p p i n g T a b l e s to a l l of i ts acqua in tances . W h e n a peer E u p o n rece iv ing a m e d i a t e d s c h e m a Mtn, first checks whe the r i t a l ready has a m e d i a t e d schema. If i t has no t , E w i l l do the same as X does. O t h e r w i s e , merge the two m e d i a t e d schemas. If the m e d i a t e d s c h e m a tha t E m a i n t a i n s changed after th is s tep, E sends out th i s new m e d i a t e d schema M t ( n + 1 ) and co r respond ing M a p p i n g T a b l e s to a l l of i ts acqua in tances ; o therw ise , no message w i l l be sent ou t . E a lso compu tes the new G L A V m a p p i n g a n d m a i n t a i n s the u p d a t e d i n f o r m a t i o n i n h is l is t L. A f t e r a p e r i o d w h e n every peer E has rece ived the mos t u p d a t e d m e d i -a ted s c h e m a Mt(n+k) a n d c o m p u t e d the G L A V m a p p i n g f r o m M t ( „ + f c ) to E, the sys tem se tup finishes. T h e a l g o r i t h m of c rea t i ng the m e d i a t e d s c h e m a i n the se tup phase is s h o w n i n F i g u r e 5.13. T h e o r e m 1 T h e t ime of c rea t i ng a m e d i a t e d s c h e m a u s i n g the above a lgo-r i t h m has an u p p e r b o u n d . P r o o f S k e t c h : A s s u m e there are n peers i n the network . L e t t be the t ime for one hop i n the ne twork . T w o k i n d s of messages need to be cons ide red : send ing out, the p a r t i a l m e d i a t e d s c h e m a a n d rece i v ing c o n f i r m a t i o n of l oca l 42 Chapter 5. Creating a Mediated Schema in PDMS Algor i thm SystemSetUp / * to be processed at each peer E when receiving a mediation message from other peer * / Input: 1. mediation message containing a part ial ly created mediated schema Mtmp of previous peers, a MappingTable MTtmp corresponding to Mtmp 2. local schema mapping list Output : A new mediated schema and MappingTable which wi l l be sent out to all of E's acquaintances If previously there is no mediated schema at peer E Mediate schemas M e t m p based on E and al l its mappings to acquaintances Send out check message wi th MEtmp and wait for confirmation from the acquaintances Received updated mediated schema MEF from each acquaintance F w i th F ' s local schema information in it Compute a local mediated schema ME for E and al l its acquaintances based on al l MEF {F € E's acquaintance list) Merge ME and Mtmp to be M'B Else (there exists previous version of mediated schema ME at E) Merge ME and Mtmp to be M'B If M'E does not equal to ME Send M'E and its corresponding MappingTable to al l the acquaintances Else No message to be sent Figure 5.13: System Setup Algorithm relations, and sending out the mediated schema. The upper bound for the first part is 2nt. So we only consider the second part. Each peer maintains the mapping information with its acquaintances. We can model this prob-lem to the following graph problem: when does each node get all the edge information? Since each node knows its edge information, this problem can further be transferred to the problem: When does each node get all the node information? A node E knows the information of node F, either from F directly, or from F's neighbor. The maximum time difference between these two is one hop t. We drop this one hop difference temporarily. So in order to let E get a full set of node information, E should receive a message from all other nodes. E receives the earliest message from F after (shortestPath(A, F) + shortestPath(F, E)) hops. So E receives all nodes information at hop 43 Chapter 5. Creating a Mediated Schema in PDMS hopE: hops = max.{shortestPath(A, F) + shortestPath(F, E)} Thus the number of hops for all nodes receiving all other node information is hopau: hopait = maxhopE = max{m&x{shortestPath(A, F)+shortestPath(F, E)}} EJ E F Considering the messages to confirm local relations, the total time before every peer gets the ultimate mediated schema is Ta\\: Tail = (hopau + 2n) *t Considering the fact that information about F can be obtained from F's neighbor, Tau can sometimes be Tau = {hopau + 2n-2)*t This is the upper bound for the whole mediation process. • Merging Two Mediated Schemas In a PDMS, different versions of mediated schemas are broadcasted in the network before a final mediated schema for all peers in the network has been built. At any time in the setup phase, a peer E that already maintains a mediated schema M\ can receive a different mediated schema Mo, that contains non-overlapped information with M \ . In this case, M\ and M2 need to be merged before E sends any of them to other acquainted peers. Our mediation process follows the following strategy: 1. When E gets a mapping Ep, a mediated schema MEF will be created first. This can use algorithm in Figure 5.1. 2. Then consider previous existing mediated schema ME, merge MEF and ME to get M ' E , and update ME to be M ' E . 44 Chapter 5. Creating a Mediated Schema in PDMS 3. W h e n E o r i g i na l l y has a m e d i a t e d schema M a n d receives a new me-d ia ted schema Mtmp, M a n d Mtmp w i l l be merged . F i g u r e 5.14 shows the a l g o r i t h m of how to merge two m e d i a t e d schema. Algorithm M e r g i n g T w o M e d i a t e d S c h e m a ( M 1 , M 2 , MTV, M T 2 ) / * M i , M 2 are the two mediated schemas, M T i , M T 2 are the corresponding MappingTables * / Mnew = 0 For V mti G M T i If 3 rati G M T 2 s.t. name(mii(0, 0 ) ) = name(mt 2 (0, 0 ) ) = q / * both mi l and mti are about concept q */ Find the corresponding relation r\ and r 2 from M i and M 2 mtnew =MergeMappingTable(mii, r\, mt2, r 2 ) Create a new relation rnew name [rnew) = q Let n be column size of mtneuj Let the first row number of mti in mtnew be indexl Let the first row number of mti in mtnew be index2 For i = 0 to n If mtnew(indexl, i) contains value re r n e t u .attr(i) = ri.attr(a:) Else (rrjine™ (wdea:2, i) contains value a;) r„etu.attr(i) = r2.attr(a;) Add rnew to Mnew For V n G M i If r i hasn't been merged or added to Mnew Add r i to M n e w For V r 2 G M 2 If r 2 hasn't been merged or added to M „ e t u Add r2 to M n e w Return Mnew F i g u r e 5.14: M e r g e T w o M e d i a t e d S c h e m a A l g o r i t h m Compute G L A V Mapping for Each Local Peer N e x t , h a v i n g created the m e d i a t e d schema, we need to be ab le to create the G L A V m a p p i n g f r o m the m e d i a t e d schema to the l oca l peer s c h e m a i n order to a l low for quer ies .to be t r ans la ted . W i t h the m e d i a t e d s c h e m a a n d the co r respond ing M a p p i n g T a b l e s , c o m p u t i n g the G L A V m a p p i n g is easy. F i g u r e 5.15 shows a n a l g o r i t h m of c o m p u t i n g the G L A V m a p p i n g for l oca l 45 Chapter 5. Creating a Mediated Schema in PDMS schema E. E x a m p l e 13 gives an examp le of c o m p u t i n g G L A V M a p p i n g for l oca l peer . A lgor i thm ComputeGLAVMapping(M, MT, E) /* M is the mediated schema, MT is the corresponding MappingTable for M , E is the local schema * / LVM — 0 PVM = 0 For each m € M If m does not have a corresponding mt £ MT If 3 e £ E s.t. name(e) = name(m) Let lvm = q(attributes(e)) :- m(attr ibutes(m)) Let gvm — q(attributes(e)) :- e(attributes(e)) A d d lvm to LVM A d d gvm to GVM Else / / m has a corresponding mt S MT Let body be the subgoal set for local peer, body = 0 For each row i of mt If 3 r e E s.t. name(r) = mt ( i , 0) Construct subgoal sg s.t. name(sg) = name(r) , using m's attr ibute value as the corresponding attr ibute value for sg A d d sg to body li$ s e E s.t. name(s) = m i ( i + 1,0) Let lvm = q(var(body)) :- m(attr ibutes(m)) Let gvm = q(var(body)) :- body A d d lvm to LVM A d d gvm to G V M 6odi/ = 0 Return LVM, GVM F i g u r e 5.15: C o m p u t e G L A V M a p p i n g A l g o r i t h m E x a m p l e 1 3 G i v e n schema B, m e d i a t e d schema M a n d M a p p i n g T a b l e MT, the G L A V m a p p i n g f r o m M to B c a n be c o m p u t e d . S c h e m a B: B_f l ight (date, company , f l i gh tNo , service) B_schedu le(date , f l i gh tNo , depar t , a r r i va l , n u m L e f t ) S c h e m a M: 46 Chapter 5. Creating a Mediated Schema in PDMS f l i gh t (da te , f l i gh tNo , company , serv ice, c lass, I D , d i scoun t , depar t , a r r i va l , n u m L e f t ) M a p p i n g T a b l e : f l ight A f .f l ight 1 2 3 4 5 . 6 7 8 9 10 B J l i g h t 1 3 2 4 B_schedule 1 2 3 4 5 C -schedu le 2 3 1 C - p r i c e 2 1 3 T h e G L A V m a p p i n g for MapMJ3 c a n be ob ta i ned . M a p p i n g MapMJB'-L A V : Q l ( d a t e , company , f l i gh tNo , serv ice) : -M . f l i gh t (da te , f l i gh tNo , company , serv ice , c lass, I D , d i scoun t , depa r t , a r r i va l , n u m L e f t ) G A V : Q l ( d a t e , company , f l i gh tNo , serv ice) : -B .B_ f l i gh t (da te , company , f l i gh tNo , serv ice) , B_schedule(date , f l i gh tNo , depar t , a r r i va l , n u m L e f t ) • Q u e r y R e f o r m u l a t i o n A l g o r i t h m W i t h t he G L A V m a p p i n g c o m p u t e d for each l o c a l peer E, a n y q u e r y posed over the m e d i a t e d schema M can be eas i ly re fo rmu la ted to t ha t over E a n d v ice versa . T h i s enables a fast t r ans la t i on for quer ies be tween any l oca l s chema a n d M, w h i c h c a n be used for que ry p rocess ing i n the P D M S as d iscussed i n Sec t i on 5.2. F i g u r e 5.16 gives the a l g o r i t h m o f q u e r y refor-m u l a t i o n . E x a m p l e 14 shows an examp le of que ry r e fo rmu la t i on u s i n g th is a l g o r i t h m . E x a m p l e 14 R e f o r m u l a t e Q u e r y Q to t ha t over l oca l s c h e m a B u s i n g the G L A V m a p p i n g c o m p u t e d i n E x a m p l e 13 47 Chapter 5. Creating a Mediated Schema in PDMS Algor i thm Q u e r y R e f o r m u l a t i o n ( p , M O P M . B ) ) M , E /* q is the query posed by the user; MapM.E is the G L A V mapping between M and E\ M is the mediated schema; E is the local schema * / Let reformQ be the reformulated query of q Let head ( re /o rmQ) = head(g) If any subgoal sp of q S M For each subgoal sg of q F i n d in L A V a view Iv s.t. sg € body(Zu) For each variable v in Iv Let pos be the posit ion of v in body(Zu) Change n to the variable v' in sg s.t. pos is the posit ion of v' in sp F i n d in G A V a view gv s.t. head(pv) = head(Zv) For each variable v in head(pv) Change v to 1/ wi th the same posit ion in the head of Iv Append body(pu) to the body of reformQ Else // sg G E For each subgoal sg of q F i n d in G A V a view gv s.t. sp £ body(pv) For each variable u in gv Let pos be the posit ion of v in body(pi>) Change v to the variable v' in sp s.t. pos is the posit ion of v' in sp F i n d in L A V a view Iv s.t. head(Zv) = head(pv) For each variable v in head(Zv) Change v to t/ wi th the same posit ion in the head of gv Append body(lv) to the body of reformQ Return reformQ F i g u r e 5.16: Q u e r y R e f o r m u l a t i o n A l g o r i t h m Q = q ( d , f) :- M . f l i g h t ( d , f, p r i ce , company , serv ice , c lass, I D , d i scoun t , depar t , a r r i va l , n u m L e f t ) U s i n g the a l g o r i t h m in F i g u r e 5.16, Q is re fo rmu la ted to Q' over B Q' = q ( d , f ) :- B .B_ f l i gh t (d , company , f, serv ice) , B .B_schedu le (d , f, depar t , a r r i va l , n u m L e f t ) • 48 Chapter 5. Creating a Mediated Schema in PDMS 5.5.2 Section Summary I n th i s sec t ion , we d iscussed the sys tem setup phase - c rea t i ng a m e d i a t e d s c h e m a for the P D M S f r o m the b e g i n n i n g . W e presented P S M a l g o r i t h m w h i c h inc ludes schema m e d i a t i o n , c o m p u t i n g G L A V m a p p i n g a n d que ry re fo rmu la t i on a lgo r i t hms . F o r each s tep, we give examp les to i l l us t ra te our idea . M e P S y s p rov ides the f i rst p ro to t ype to a u t o m a t i c a l l y create a m e d i a t e d schema i n P D M S s . W i t h P S M a l g o r i t h m , quer ies c a n be eas i ly p rocessed a m o n g di f ferent peers i n a P D M S . 4 9 Chapter 6 U p d a t i n g t h e M e d i a t e d S c h e m a S o m e of the peers m igh t b u i l d u p a da tabase a p p l i c a t i o n us ing the m e d i a t e d schema after the sys tem set -up phase. F o r those peers h a v i n g app l i ca t i ons over the m e d i a t e d schema, i f the m e d i a t e d s c h e m a changes, the a l r eady -bu i l t app l i ca t i ons m i g h t need to be re -bu i l t , w h i c h takes qu i te a lot of r e d u n d a n t work . O n the o ther h a n d , i f a peer leaves, t ha t peer s c h e m a i n f o r m a t i o n needs to be d r o p p e d f r o m the m e d i a t e d schema. However , no ex i s t i ng a l -g o r i t h m has been p roposed to dea l w i t h such a case. I n th is C h a p t e r , we d iscuss how to u p d a t e the m e d i a t e d schema in the s teady s ta te - i.e., after the s y s t e m se tup phase. In p a r t i c u l a r , we de te rm ine b o t h h o w to u p d a t e the m e d i a t e d schema a n d m a p p i n g s i f a new peer jo ins the P D M S ne twork (Sec t ion 6.1) or an o l d one leaves the ne twork (Sec t ion 6.2), a n d how the me-d ia ted s c h e m a a n d assoc ia ted m a p p i n g s change i f some peer 's l oca l da tabase schema evolves (Sec t ion 6.3). 6.1 Adding a New Peer to the System A f t e r the sys tem se tup phase, every peer m a i n t a i n s an up - to -da te m e d i a t e d schema M. If a new peer P decides to enter the ne twork , the m e d i a t e d schema M needs to be u p d a t e d to a new m e d i a t e d s c h e m a M' w h i c h also i nc lude P's s chema i n f o r m a t i o n . However , s ince af ter the s y s t e m setup phase, some peer has a l ready bu i l t u p an a p p l i c a t i o n over M. A na ive a p p r o a c h is to c o m p u t e a new m e d i a t e d schema M', a n d r e b u i l d a l l those a l ready -bu i l t app l i ca t i ons , w h i c h takes qu i te a lo t of r e d u n d a n t work . S ince the m e d i a t e d schema can change p e r i o d i c a l l y d u r i n g the l i fe cyc le of such a P D M S , i t is no t rea l is t ic to ask the user of each peer to b u i l d a new 50 Chapter 6. Updating the Mediated Schema da tabase a p p l i c a t i o n every t ime the m e d i a t e d s c h e m a is u p d a t e d . A g o o d s t ra tegy is to let the peer m a i n t a i n i ts a l ready -bu i l t a p p l i c a t i o n APPp(M) a n d m a i n t a i n a m a p p i n g f r o m M' to M. S ince the m a p p i n g f r o m M' to P can eas i ly be ca l cu la ted b y us ing the M a p p i n g T a b l e t h a t co r responds to M', any que ry over M passed f r o m other peer c a n be re fo rmu la ted to t h a t over P. P's user c a n also use i ts l oca l s chema or the m e d i a t e d s c h e m a M o n w h i c h APPp is bu i l t . B e l o w is an concrete examp le of such a scenar io . E x a m p l e 1 5 A n examp le of a new peer j o i n i n g the ne twork T h e r e are three peers i n the cur ren t s y s t e m : A, B a n d C. M a p p i n g s are crea ted be tween A a n d B, a n d B a n d C as MapA_B a n d MapB.c respec-t ive ly . W e on l y cons ider re la t ions t ha t c a n represent a Concept ( in the f o r m of a m a p p i n g ) for the m e d i a t e d schema because they are the core pa r t of our d i scuss ion . C o n s i d e r the s i t ua t i on s h o w n i n F i g u r e 6.1(a): af ter the se tup phase , a m e d i a t e d schema MS\ has been bu i l t and m a i n t a i n e d at each peer . T h e m a p p i n g s f r o m MS\ to A, B a n d C a n d the M a p p i n g T a b l e s also have been bu i l t . A f t e r b u i l d i n g the m e d i a t e d schema MS\ a n d the m a p p i n g s to l o c a l sources, user at peer B bu i l t a da tabase a p p l i c a t i o n APPB u s i n g the m e d i -;A¥pB(Ms;t) (a) initial system (b) after peer D joins F i g u r e 6.1: A d d i n g a N e w Pee r to the S y s t e m 51 Chapter 6. Updating the Mediated Schema ated schema MS\. So the i n f o r m a t i o n s to red at each peer c a n be s u m m a -r i zed as fo l lows. - Pee r A : s c h e m a A , Map AM, MS\, MapMSx-A, MT\. - Pee r B : schema B, MapA.B, MapB.c, MS\, MapMsXM, MT\, APPB(MS1). - Peer C : schema C , Maps.C, MS\, MapMSi.c, MT\. i S o m e t i m e la ter , a new peer D decides to j o i n i n the ne twork . It creates a m a p p i n g Mapc_D to peer C, as s h o w n i n F i g u r e 6.1(b) . U s i n g the a lgo r i t hms desc r i bed i n C h a p t e r 5, a new m e d i a t e d schema MS2, M a p p i n g T a b l e MT2 a n d G L A V m a p p i n g s c a n be c o m p u t e d at peer C. C passed MS2, MT2 to D. D compu tes MapMS2.D- C f u r the r c o m p u t e s the G L A V m a p p i n g f r o m MS2 to C us i ng MapMS2-MSx a n d MapMS^C a n d get the new G L A V m a p p i n g MapMS2.c for itself. C a lso passes MS2, MT2, MapMS2.MSx to i ts acqua in tance B. B c o m -putes MapMS2.B us i ng MapMS2.MSx a n d MapMSi-B- B also passes MS2, MT2, MapMS2.MSi to i ts acqua in tance A . U s i n g MapMS2.MSi a n d MapMS^JK, a m a p p i n g f r o m m e d i a t e d s c h e m a MS2 to l oca l s c h e m a A c a n be c o m p u t e d . A t the e n d , a l l the peers get the mos t u p d a t e d i n f o r m a t i o n abou t MS2. E a c h peer also knows how to m a p i ts l oca l s c h e m a to MS2. I n f o r m a t i o n s to red at each peer c a n be s u m m a r i z e d as fo l lows. - Pee r A : s c h e m a A , Map AM, MS2, MapMS2J^, MT2. - Peer B : schema B, MapAM, Mapsjo, MS\, MapMSi-B, APPB (MS!), MaPMS2 .MS,, MapMs2 M, MS2, MT2 - Peer C : schema C, MapB_c, MapcjD, MS2, MapMS2.c, MT2. - Peer D : s c h e m a D, Mapcs>,MS2, MapMS2.D, MT2. F o r peer B, MS\ a n d MapMSiM are kept i n i ts da tabase a p p l i c a t i o n APPB(MSI). MapMS2-MSx is kept because que ry over the new m e d i a t e d 52 Chapter 6. Updating the Mediated Schema schema MS2 c a n be t rans la ted to MS\ a n d fu r ther be used b y a p p l i c a t i o n APPB{MSi). T h u s , every peer c a n keep the u p d a t e d m e d i a t e d s c h e m a a n d m a i n t a i n a m a p p i n g f r o m the m e d i a t e d schema to l oca l schema. C o m p a r e d to the na ive a p p r o a c h , no b u r d e n has been a d d e d to the l o c a l peer users. A d d i t i o n a l l y , a l l the p rev ious l y c rea ted da tabase app l i ca t i ons over p rev ious vers ions of m e d i a t e d schema c a n s t i l l be used . N e w app l i ca t i ons c a n also be c rea ted over the new vers ions of the m e d i a t e d schema. D e t a i l s of th is examp le is s h o w n i n the A p p e n d i x A . • 6.1.1 Algorithm F i g u r e 6.2 shows the a l g o r i t h m of u p d a t i n g the m e d i a t e d i n f o r m a t i o n for the case of a d d i n g a peer . In th is a l g o r i t h m , we cons ider how to get a new m e d i a t e d schema, a n d how to m a i n t a i n the m a p p i n g f r o m the new m e d i a t e d schema to the o ld one o n w h i c h an a p p l i c a t i o n m igh t have been bu i l t . A lgor i thm U p d a t e M e d i a t e d I n f o r m a t i o n ( A d d i n g al P e e r E) Let M be the original mediated schema Let MT be the corresponding MappingTable set 1. E creates a mapping to F which is already in the network 2. E creates a mediated schema MEF using the Procedure in Figure 5.1 3. F creates a new mediated schema M' using MEF and M using the Procedure in Figure 5.14 F computes G L A V mapping MapM'.M and MapM'.F F updates its maintained information: M —> M', MT —> MT' and Mapu-F —* MapM.F F send M', corresponding MT1 to E F broadcast M', corresponding MT' and MapM'M to al l other acquaintances F computes MapM'.F for itself , 4. E maintains MT', M', and computes Mapw.E using MT' 5. For any other peer G that received the message original ly from F If there are applications built over M G maintains the following information: M', MT', MapM'.M, M, MapM'-M, MapM'.G Else G s imply updates M -> M'', MT -> MT' and MapM.G -> MapM'.G G further broadcasts al l mediated schema information to its acquaintances F i g u r e 6.2: U p d a t e m e d i a t e d i n f o r m a t i o n w h e n new peer j o ins the ne twork 5 3 Chapter 6. Updating the Mediated Schema 6.2 Dropping a Peer from the System I n th is sec t ion , we exp lo re four poss ib le approaches to u p d a t e the m e d i a t e d s c h e m a after d r o p p i n g one peer schema. W e c o m p a r e the advantages a n d d isadvantages of each s t ra tegy. ' C o m p a r a b l y , S t ra tegy O n e a n d T w o are the na ive so lu t ions . S t ra tegy T h r e e is be t te r t h a n O n e a n d T w o . However , i n the wors t case, i t faces the same p r o b l e m as S t ra tegy O n e a n d T w o . S t r a t -egy F o u r , t h o u g h poses a d d i t i o n a l requ i rement for the l eav ing peer , c a n keep the sys tem m u c h more s tab le a n d does not requ i re more t ime . W e t h i n k th is is the best app roach a m o n g a l l . S t r a t e g y O n e : O n c e a peer dec ides to leave the ne twork , the peer needs to no t i f y any o ther node i n the ne twork , w h i c h t r iggers the s c h e m a m e d i a t i o n process f r o m the ve ry b e g i n n i n g . E v e r y node i n the ne twork is regarded as a new peer. A d v a n t a g e s : T h e setup for such a sys tem is ve ry easy. N o a d d i t i o n a l a lgo-r i t h m for r emova l ope ra t i on needs to be des igned. O n l y s c h e m a m e d i a t i o n a l g o r i t h m w i l l be i nvo lved . D i s a d v a n t a g e : B a s i c a l l y , th is s t ra tegy is no t rea l is t ic . F i r s t , the s c h e m a m e d i a t i o n process w i l l be too f requent . If peers leave the ne twork f requen t l y i n a shor t p e r i o d , there w i l l be too m u c h s y s t e m wo rk ass igned for s c h e m a m e d i a t i o n only . Resources c a n not be used wise ly . S e c o n d , the p rev ious l y -c rea ted m e d i a t e d s c h e m a canno t be fu l l y used of i n the process of c rea t i ng the new one. T h e new m e d i a t e d schema m igh t no t change d r a m a t i c a l l y f r o m the o ld one b u t th is s t ra tegy requ i res tha t the new ve rs ion of the m e d i a t e d schema s h o u l d be c rea ted f r o m the b e g i n n i n g . In conc lus i on , th is s t ra tegy is not at a l l sa t is factory . S t r a t e g y T w o : R e - d o the schema m e d i a t i o n once every ass igned p e r i o d . If i n one p e r i o d , one peer is l eav ing the ne twork , the s y s t e m needs to do the schema m e d i a t i o n aga in f r o m the b e g i n n i n g . T h e r e are two ways to k n o w whe the r a peer X is l eav ing the network . (1) Peer X not i f ies any o ther node before i ts depar tu re . (2) O t h e r peer , usua l l y X's acqua in tance , P I N s 54 Chapter 6. Updating the Mediated Schema or c o m m u n i c a t e s w i t h peer X. If i t gets no response, X c a n be assumed to have left the ne twork . X ' s i n f o r m a t i o n needs to be removed f r o m the m e d i a t e d s c h e m a a n d t r iggers a new s c h e m a m e d i a t i o n f r o m the b e g i n n i n g . A d v a n t a g e s : B e t t e r t h a n S t ra tegy one because the f requency of the s c h e m a m e d i a t i o n process is decreased. A l g o r i t h m s invo lved are s t i l l s imp le . D i sadvan tages : T h e p rev ious l y -c rea ted m e d i a t e d s c h e m a canno t be fu l l y used of i n the process of c rea t ing the new one. S t r a t e g y T h r e e : T h e leav ing peer X w i l l no t no t i f y o ther peers w h e n it leaves the ne twork . X ' s acqua in tance Y w i l l recognize X ' s l eav ing b y P I N i n g X or c o m m u n i c a t i n g w i t h X b u t ge t t i ng no response. Y once real izes X's absence, i t w i l l t r y to c o m p u t e the new m e d i a t e d schema w i t h o u t k n o w i n g the schema of X. However , th is requ i res t ha t peer V be ab le to recogn ize w h i c h re la t i on i n t he M a p p i n g T a b l e s comes f r o m X. A d v a n t a g e s : (1) U p d a t i n g of the m e d i a t e d s c h e m a is t ime ly . (2) Peer c a n leave the ne twork at any t ime w i t h o u t n o t i f y i n g any o ther . (3) T h e c a l c u -l a t i o n is based o n the p rev ious m e d i a t e d schema, so the process w i l l no t be t i m e - c o n s u m i n g . O l d vers ions of the m e d i a t e d s c h e m a c a n be fu l l y used of. D i sadvan tages : (1) E v e r y peer w o u l d be able to k n o w o ther peers s c h e m a f r o m the M a p p i n g T a b l e s , w h i c h is no t i dea l for the sake of safety a n d p r i v a c y reason. (2) F o r peers tha t lost connec t i on because o ther peers ' leave, such peers need to re- jo in the ne twork , w h i c h costs a d d i t i o n a l s c h e m a m e d i a t i o n work after the de le t ion work . W e use E x a m p l e 16 to e x p l a i n th is i dea . Example 16 C o n s i d e r the examp le p roposed in E x a m p l e 15 (deta i ls i n A p -p e n d i x A ) , a n d assume the s ta tus of the ne twork is w h e n MS2 has been c rea ted a n d a l l m e d i a t e d i n f o r m a t i o n has been s to red at every peer , as is s h o w n i n F i g u r e 6.1(b). C a s e 1: B recognizes t ha t A has left the network . O n c e B recognizes t ha t A has left the ne twork , i t checks y l ' s i n f o r m a t i o n i n the M a p p i n g T a b l e MT2. A is on l y invo lved i n the m a p p i n g w i t h B a n d has no o ther acqua in tances , so the de le t ion w i l l be easy: 55 Chapter 6. Updating the Mediated Schema 1. remove f r o m M a p p i n g T a b l e set MT2 a l l rows o r i g i na l l y f r o m schema A a n d remove f r o m MT2 a l l c o l u m n s on l y f r o m s c h e m a A —> MT3 2. for every re la t i on a i n MS2, if a is o r i g i na l l y f r o m s c h e m a A, remove a f r o m MS2 a n d u p d a t e each re la t i on a i n MS2 c o r r e s p o n d i n g to MT3; 3. delete M a p ^ j g f r o m B's i n f o r m a t i o n set; 4. create the a m a p p i n g MapMS3MS2\ 5. b roadcas t MS3, MT3, a n d MapMS3.MS2 to the o ther peers. C a s e 2: 5 Recogn izes tha t C has left the network . T h i s is a more genera l case w h e n the leav ing peer has more t h a n one acqua in tance a n d acts as a b r idge connec t i ng other peers. B f i rs t checks MT2. F r o m the two in te rmed ia te m e d i a t e d schemas I MBC a n d IMCD, B knows tha t C o r i g i na l l y h a d connec t i on w i t h B a n d D. S o B w i l l f i rs t u p d a t e the m e d i a t e d s c h e m a a n d the M a p p i n g T a b l e . U p d a t e the m e d i a t e d i n f o r m a t i o n : 1. De le te f r o m the m e d i a t e d s c h e m a a l l re la t ions t ha t o r i g i na l l y come f r o m schema B. F o l l o w i n g the de ta i l s i n A p p e n d i x A , delete "B_f l ight(date, company, f l igh tNo, service)" and "B_pr ice( f l ightNo, class, pr ice)" f r o m MS2. 2. De le te f r o m the M a p p i n g T a b l e rows tha t represent re la t ions f r o m s c h e m a B a n d a l l i n te rmed ia te re la t ions w h i c h invo lves s c h e m a B. In th i s ex-amp le , delete f r o m MT2 the rows " B J I i g h t " , "B_pr i ce " , "IMBc-flight", "IMCD-flight" a n d "MSi.flight". 3. C h e c k a l l c o l u m n s of the M a p p i n g T a b l e whe the r there are c o l u m n s not usefu l any more . De le te a l l such c o l u m n s . De le te " I D " a n d "d iscount " f r o m MT2 i n th is examp le . 4. U p d a t e the re la t i on i n the med ia ted schema tha t comes f r o m a m a p -p i n g . I n th i s e x a m p l e , " f l ight(date, f l igh tNo, price, company, service, class, ID, discount, depart, arr ival, numLef t ) " is now u p d a t e d to " f l ight(date, f l igh tNo, price, company, service, class, depart, arr ival, numLe f t ) " . 56 Chapter 6. Updating the Mediated Schema 5. S ince D is lost connec t i on w i t h any other node i n the ne twork , B c a n assume tha t D needs to re- jo in the network . B d o the same de le t ion for D. 6. G e t new m e d i a t e d schema MS3 a n d c o r r e s p o n d i n g M a p p i n g T a b l e s M T 3 . C r e a t e a m a p p i n g MapMS3.MS2-7. B r o a d c a s t MS3, MT3 a n d MapMS3.MS2 t o a 1 1 t n e o t n e r peers. A l l peers t ha t lost connec t i on need to re-enter the ne twork . F o r the wors t case, e.g. the topo logy s h o w n i n F i g u r e 6.3, a l l nodes need to re- jo in the ne twork i f B leaves. (MS2)" (MS2) (MS2). 4m: Di 1 V(MS2) A.. F i g u r e 6.3: Pee r L e a v i n g ( B a d T o p o l o g y ) • S t r a t e g y F o u r : A peer X canno t leave u n t i l X ca lcu la tes the new m e d i a t e d schema. T h e idea is t h a t g iven the o r i g i na l m e d i a t e d s c h e m a M a n d l o c a l s chema X t ha t is to be removed , c o m p u t e the r e m a i n i n g pa r t . In order to do th i s , a " r e m o v a l " opera to r needs to be des igned to remove pa r t of the m e d i a t e d s c h e m a us ing Xys l oca l s c h e m a and m a p p i n g s to get the u p d a t e d m e d i a t e d schema. T h e remova l ope ra t i on c a n remove some re la t ions , or some a t t r i bu tes i n the re la t ions . F u r t h e r , X he lps each of i ts acqua in tances find ano ther acqua in tance i n X ' s acqua in tance l is t to m a k e sure t ha t w h e n X leaves, the network is s t i l l connec ted . X does no t need to b u i l d a m a p p i n g for t h e m since the m a p p i n g c a n be in fer red f r o m the ex i s t i ng M a p p i n g T a b l e s . 57 Chapter 6. Updating the Mediated Schema A d v a n t a g e : T h i s is a be t te r dec is ion t h a n the p rev ious two st ra teg ies. F i r s t , i t is not necessary to re -bu i l d the m e d i a t e d s c h e m a f r o m the ve ry b e g i n n i n g every t ime . T h e o r i g i na l m e d i a t e d schema c a n be fu l l y used w h e n b u i l d i n g the new one. S e c o n d , the ca l cu l a t i on t ime for r e m o v i n g pa r t of the m e d i a t e d schema is far less t h a n r e - b u i l d i n g the who le m e d i a t e d schema. D i sadvan tages : T h e o n l y d isadvan tage is tha t the leav ing peer needs to d o a d d i t i o n a l wo rk a n d no t i f y o ther nodes before i t leaves the ne twork . I n th is s t ra tegy, peers canno t leave w i t h o u t n o t i f y i n g the o ther nodes . However , some ex i s t i ng P 2 P da tabase sys tems also requ i res tha t some peer needs to do e x t r a tasks before leav ing the ne twork . F o r examp le , [16] requ i res t ha t on l y leaf nodes i n the i r ne twork s t r uc tu re c a n v o l u n t a r i l y leave the network wh i l e o ther nodes mus t f i nd a rep lacement to store the i r i n f o r m a t i o n first before leav ing . So th is is acceptab le if we requ i re t h a t a node need to t r igger a remova l o p e r a t i o n before i t leaves the ne twork . T h e remova l a l g o r i t h m is s h o w n i n F i g u r e 6.4 Procedure R e m o v e L o c a l S c h e m a (MS, MT, X, MapMS-x) /* MS is the mediated schema, MT is the corresponding MappingTable set, X is the local peer schema, MapMSJi is the mapping from MS to X */ For every mapping mapMS.x € MapMS-X Let R be body(LAV(mapMS-x)) Let mt € MT be the MappingTable wi th R as the first relation If mt doesn't exist /* R is a direct copy of local peer relation * / M a r k R in MS for later deletion Else Let r be body{GAV(mapMS.X)) Let h be head (mapMS-x) For each attr ibute attr in h Let SG be the subgoals in r containing attr If in mt, no other relations except those in SG have corresponding attr ibutes for attr M a r k attr in MS for later deletion Delete al l attr ibutes and relations wi th deletion marks from MS to M S ' Update MT to MT' Create a mapping MapMS'.MS from MS' to MS Return MS', MT', MapMS>_MS F i g u r e 6.4: U p d a t e m e d i a t e d i n f o r m a t i o n w h e n a peer leaves the ne twork 58 Chapter 6. Updating the Mediated Schema T h e resu l ts MS', MT' a n d MapMs'.MS need to be b r o a d c a s t e d to a l l o ther peers i n the ne twork . Example 17 C o n s i d e r the examp le p roposed i n E x a m p l e 15 (deta i ls i n A p -p e n d i x A ) , a n d assume tha t m e d i a t e d schema for A, B, C a n d D has been c rea ted . Peer D w o u l d l ike to leave the ne twork , so D does the ca l cu l a t i on of the new m e d i a t e d schema. F i g u r e 6.5, 6.6 a n d 6.7 show the ne twork s ta tus before D ' s leav ing . Mediated schema M S 2 = { flight(date, f l ightNo, price, company, service, class, ID, discount, depart, arr ival , numLeft) } F i g u r e 6.5: E x a m p l e 17. I npu t I n f o r m a t i o n (a) MappingTable flig i t : M S i .f l ight 1 2 3 4 - 5 6 7 8 9 10 11 MSi . f l ight 1 2 3 4 5 6 7 8 IMAB-flight 1 2 3 4 5 6 A J l i g h t 1 2 3 B J l i g h t 1 3 2 4 B . p r i c e 1 3 2 I MBC-flight 1 2 3 4 5 6 7 8 B_f l ight 1 3 2 4 B - p r i c e 1 3 2 C s c h e d u l e 2 3 1 C p r i c e 2 1 3 IMCD-flight 1 2 3 4 5 6 7 C s c h e d u l e 2 3 1 C - p r i c e 2 1 3 D_schedule 1 2 3 4 5 F i g u r e 6.6: E x a m p l e 17 I npu t I n f o r m a t i o n (b) A c c o r d i n g to the a l g o r i t h m i n F i g u r e 6.4, D checks every con junc t i ve m a p p i n g i n MapMS2S>, i f the L A V pa r t is d i r ec t l y f r o m a re la t i on r G 59 Chapter 6. Updating the Mediated Schema MapMs2.D'-LV — { MS2~Di(date, flightNo, depart, arr ival , numLeft) :-M5 ,2.flight(date, flightNo, price, company, service, class, ID, discount, depart, arr ival , numLeft) } ' " GV = { MS2--Di(date, flightNo, depart, arr ival , numLeft) :-£>.D-schedule(date, flightNo, depart, arr ival , numLeft) } "'• F i g u r e 6.7: E x a m p l e 17 Inpu t I n f o r m a t i o n (c) D, delete r f r o m the m e d i a t e d s c h e m a MS2 a n d the M a p p i n g T a b l e MT2. In th is e x a m p l e , there 's no such m a p p i n g . If the con junc t i ve m a p p i n g is ob ta i ned f r o m a m a p p i n g between a p a i r of peers, let the L A V p a r t be lav a n d let the G A V pa r t be gav. De le te f r o m the MT2 a l l re la t ions i n gav a n d the in te rmed ia te re la t i on re la ted w i t h gav. So i n th is e x a m p l e , delete "D_schedule" a n d "IMCD flight" f r o m MT2. De le te f r o m MT2 a l l c o l u m n s tha t d o n ' t con ta in any va lues. In th is e x a m p l e , delete the c o l u m n s "depart", "arrival" a n d "numLeft". U p d a t e the re la t i on i n the m e d i a t e d s c h e m a tha t represents lav a cco rd i ng to the u p d a t e d M a p p i n g T a b l e . In th is examp le , "flight(date, flightNo, price, company, service, class, ID, discount, depart; arrival, numLeft)" is u p d a t e d to "flight(date, flightNo, price, company, service, class, ID, discount)". So a new M a p p i n g T a b l e MT3, s h o w n i n F i g u r e 6.8, c a n be o b t a i n e d . A new m e d i a t e d s c h e m a MS3 c a n also be o b t a i n e d ; s h o w n i n F i g u r e 6.9. D also creates a m a p p i n g f r o m MS3 to MS2, s h o w n i n F i g u r e 6.10. D passes MS3, M T 3 a n d MapMS3-MS2 t o a n t n e o ther peers i n the ne twork and leaves the network . • 60 Chapter 6. Updating the Mediated Schema M a p p i n g T a b l e flight: MSi . f l ight 1 2 3 4 5 6 7 8 MSi . f l ight 1 2 3 4 5 6 7 8 IMAB-flight 1 2 3 4 5 6 A J l i g h t 1 2 3 B J l i g h t 1 3 2 4 B_pr ice 1 3 2 IMBC-flight 1 2 3 4 5 6 7 8 B_fl ight 1 3 2 4 B . p r i c e 1 3 2 C s c h e d u l e 2 3 1 C - p r i c e 2 1 3 F i g u r e 6.8: E x a m p l e 17 O u t p u t (a) Mediated schema MS3 = { f l ight(date, f l ightNo, price, company, service, class, ID, discount) } F i g u r e 6.9: E x a m p l e 17 O u t p u t (b) 6.2.1 Section Summary In th is sec t ion , we have exp lo red four poss ib le ways of u p d a t i n g the m e -d ia ted schema w h e n a peer leaves the ne twork . W e have also i l l u s t ra ted the advantages a n d d isadvantages of each a p p r o a c h w i t h examp les . W e f i -n a l l y p resented an a l g o r i t h m to remove a peer schema 's i n f o r m a t i o n f r o m the m e d i a t e d schema based o n the fou r th app roach . T h e f o u r t h a p p r o a c h ou tpe r fo rms a l l o ther approaches because w i t h the s u p p o r t of the remova l o p e r a t i o n , a l l p rev i ous l y cons t ruc ted app l i ca t i ons c a n s t i l l be ava i lab le , a l l the peers are s t i l l connec ted , a n d no r e d u n d a n t work w i l l be resu l ted . 6.3 Evolution of a Peer Local Schema If peer E's l o c a l da tabase schema evolves f r o m SE to S ' E after the sys tem se tup phase, the m e d i a t e d schema M needs to be changed accord ing ly . A 61 Chapter 6. Updating the Mediated Schema MapMS3-MS2'-LV = { MS3.MS21ida.te, flightNo, price, company, service, class, ID , discount) :-MS3. flight (date, flightNo, price, company, service, class, ID, discount) } " ' GV = { MS3.MS21ida.te, flightNo, price, company, service, class, ID, discount) :-M52-flight(date, f l ightNo, price, company, service, class, ID, discount, depart, arr ival , numLeft) } "' F i g u r e 6.10: E x a m p l e 17 O u t p u t (c) s imp le , however effect ive, so l u t i on to th is case is to f i rst t reat E as leav ing the ne twork ( remov ing SE f r o m M), t hen j o i n i n g the ne twork w i t h S ' E . 62 Chapter 7 A S t u d y o f M a p p i n g C o m p o s i t i o n W e have presented our a lgo r i t hms of c rea t i ng a m e d i a t e d s c h e m a a n d m a p -p ings to l oca l sources i n a P D M S i n C h a p t e r 5. W e fu r the r d iscussed how to u p d a t e the m e d i a t e d schema i n C h a p t e r 6. I n a P D M S dea l i ng w i t h que ry p rocess ing , m a p p i n g c o m p o s i t i o n is a lways a m a i n issue to cons ider . H o w we l l the m a p p i n g MapA.c keeps the i n f o r m a t i o n of m a p p i n g s Map AM a n d MapB.c la rge ly decides how we l l the P D M S c a n t rans fer i n f o r m a t i o n a m o n g di f ferent peers. T h u s , we m a k e a s t u d y of m a p p i n g c o m p o s i t i o n i n th is chapter . N o t e tha t our focus i n th is thesis is to u n d e r s t a n d the f u n d a m e n t a l s of m a p p i n g c o m p o s i t i o n i n th is contex t . A s s u c h , ou r a l g o r i t h m is no t des igned to hand le all poss ib le pa t te rns , b u t ra the r to focus o n those t h a t are the mos t c o m m o n . In pa r t i cu l a r , we on l y cons ider i n p u t m a p p i n g s to be m a p p i n g s w i t h the same Concept (De f i n i t i on 3) , i gno r i ng such c o m p l i c a t e d factors as sel f - jo in a n d se l f - rest r ic t ive c o m p o n e n t (De f i n i t i on 6) . O n the o ther h a n d , us ing M e P S y s is a c t u a l l y t rans fe r r i ng the p r o b l e m of m a p p i n g c o m p o s i t i o n in to ano ther : us ing the m e d i a t e d schema to re la te di f ferent schemas , w h i c h is mo re comprehens ib le . However , we m a k e a genera l c o m p a r i s o n here to see how wel l each a l g o r i t h m works i n di f ferent c i r cums tances . Sec t i on 7.1 presents four examp les w i t h the i n t u i t i o n of where the c o m -p lex i t y comes f r om . Sec t i on 7.2 shows the ana lys is resu l ts . 7.1 Complexity: Where Difficulties Come From In sec t ion 2.2, we have presented the de f in i t i on of m a p p i n g c o m p o s i t i o n a n d br ie f l y i n t r o d u c e d the c o m p l e x i t y of m a p p i n g c o m p o s i t i o n i n first o rder log ic 63 Chapter 7. A Study of Mapping Composition l anguage. In th is sec t ion , we present severa l examp les to show where di f f i -cu l t ies come f r o m for the p r o b l e m of m a p p i n g c o m p o s i t i o n . T h e s e examp les are we l l - s tud ied i n [20] a n d [11]. W e quo te t h e m here u s i n g the i r o r i g i na l m a p p i n g language. T h e t r a n s l a t i o n to con junc t i ve m a p p i n g s are t r i v i a l . M a d h a v a n et a l 's wo rk [20] i n t r o d u c e d the c o m p l e x i t y of m a p p i n g c o m -p o s i t i o n b y a n a l y z i n g the n u m b e r of c o m p o s e d m a p p i n g s . T a k e a look at the fo l l ow ing two examp les . E x a m p l e 18 A s s u m e the fo l low ing m a p p i n g s f r o m [20]: MA-B = {a{x,y)C b(x,xi),b{xi,y)} MB-c = {b(x,xi),b(xi,x2),b(x2,y) Q c{x,y)} T h e final c o m p o s e d m a p p i n g set MA-C invo lves more t h a n one fo rmu las : a(x,xi),a(xi,x2) C c(x ,y i ) (18.a) a ( x i , x 2 ) , a ( x 2 , x ) C c ( j / i , x ) (18.b) a(x,xi),a(xi,x2),a(x2,y) C c(x,yi),c{yi,y) (18.c) • Example 18 shows that the number of composed mappings does not de-pend on the number of the input mappings. T o ana lyze E x a m p l e 18, [20] i l l us t ra tes re la t i on a, b a n d c us ing p a t h l eng th . S u p p o s e b encodes a l l the edges of a g r a p h G. R e l a t i o n b(x, y) means there 's a p a t h f r o m x to y whose p a t h l eng th is one. R e l a t i o n a ( x , y) means there 's a p a t h f r o m x to y whose p a t h l eng th is two. S i m i l a r l y , re la t i on c (x , y) means there 's a p a t h f r o m x toy whose p a t h l eng th is three. S o the m a p p i n g i n MA-B means a is a subset of the node pa i r s w i t h pa ths of l eng th two i n G, a n d m a p p i n g i n MB-c means a l l node pa i rs i n G w i t h p a t h of l eng th three are a subset of c. (18.a) descr ibes the fact t ha t i f there 's a p a t h of l eng th four s t a r t i n g f r o m x , t hen there is a p a t h of l eng th three s t a r t i n g f r o m x . (18.b) shows tha t i f there 's a p a t h of l eng th four e n d i n g at x , t h e n there 's a p a t h o f l eng th \ 64 Chapter 7. A Study of Mapping Composition three end ing at x. (18.c) tel ls us i f there 's a p a t h of l eng th three i n a , t h e n there is a p a t h of l eng th two i n c. A l l of the above three m a p p i n g s c a n be in fe r red f r o m MA-B and MB-C as the m a p p i n g f r o m A to C. E x a m p l e 1 9 A s s u m e the fo l low ing m a p p i n g s f r o m [20]: MA-B = {arg{x,y) Q br(x,xi),bg(xi,y) agg(x,y) C bg(x,xi),bg{x1,y) } MB-c = {br(x,xi),bg(xi,x2),bg(x2,y) C crgg{x,y) bg(x,xi),bg{xi,y) C cgg(x,y)} T h i s examp le is also o r i g ina l l y p resented i n [20]. T h e f ina l c o m p o s e d m a p p i n g is an in f in i te set of m a p p i n g s : agg(x,y) Ccgg{x,y) (19.a) arg{x,xi),agg(xi,X2) Qcrgg{x,y\) (19.b) arg(x,xi),agg(x-i,X2), ...,agg(xn,xn+i) C Crgg(x,yi), cgg(yi, y 2 ) , • • •, cgg(yn-i, Vn) (19.n) • Example 19 tells us that the composition of finite mappings may result in infinite set of composed mappings. T o ana l yze E x a m p l e 18, [20] uses co lo red edges to i l l us t ra te th is examp le . In fo rma l l y , each above equa t i on cap tu res the fact t ha t i f there 's a p a t h s t a r t i n g f r o m a red edge fo l lowed b y 2 n + 1 green edges, t h e n there mus t be a p a t h s t a r t i n g f r o m a red edge fo l lowed b y 2 n green edges. E a c h m a p p i n g o n l y encodes f in i te steps i n a p a t h . E a c h t ime , a red edge c a n be a d d e d to the b e g i n n i n g of severa l green edges, w h i c h w i l l cause a new m a p p i n g . A l l of these m a p p i n g s canno t be expressed by o thers . So an in f in i te m a p p i n g set w i l l become the resu l t of m a p p i n g c o m p o s i t i o n i n E x a m p l e 19. In F a g i n et a l 's work [11], they ana l yzed m a p p i n g c o m p o s i t i o n p r o b l e m b y p r o v i n g t h a t i t is N P - c o m p l e t e i f the m a p p i n g is i n a f i rs t -o rder log ic language. T h e y fu r ther p roved tha t us ing S e c o n d - O r d e r L o g i c m a p p i n g language, a l l c o m p o s e d m a p p i n g c a n be expressed. 65 Chapter 7. A Study of Mapping Composition Example 20 A s s u m e the fo l l ow ing m a p p i n g s f r o m [11]: M a p p i n g f r o m A to B : Ve (Emp(e) -r* 3 m Mgrl(e,m)) (20.1) M a p p i n g f r o m B to C : V e V m (Mgr-[(e,m) —> Mgr(e,m)) Ve (Mgri(e,e) -> SelfMgr(e)) (20.2) (20.3) F a g i n et a l . p roved tha t the c o m p o s i t i o n of M a p p i n g A - B a n d B - C is no t de f inab le b y any finite set of source- to- target t up l e genera ted dependenc ies , is no t first-order-definable a n d is no t de f inab le i n D a t a l o g [11]. Howeve r , t h i s c a n be expressed u s i n g second-order logic. M a p p i n g f r o m A to C : Example 20 shows that the composed mapping of two mappings in first-order logic might not be expressed by first-order logic. In tu i t i ve ly , the dif-ficulty i n th is examp le comes f r o m the second a t t r i b u t e of Mgrl(e,m). In m a p p i n g (20.1), m is an ex is ten t ia l va r iab le . It is no t b o u n d b y any of t he var iab les a p p e a r i n g i n the l e f t -hand s ide. However , i n (20.3), the r i g h t - h a n d s ide is a c t u a l l y b o u n d b y the second a t t r i bu te of Mgrl(e,m). m acts as a selector w h i c h decides w h e n Mgrl c a n be m a p p e d to Mgr a n d w h e n to Self Mgr. . A s there is no i n f o r m a t i o n abou t m i n Emp, such a selector is lost i n schema A. T h u s m a p p i n g c o m p o s i t i o n for th is examp le w i l l fa i l i f i t is expressed i n first-order log ic . 7.2 Exploring Possible Patterns in Mapping Composition P r e v i o u s work [11, 20] de f ined w h e n m a p p i n g s w o u l d be d i f f i cu l t or i m p o s -s ib le i n severa l ways . F o r examp le , F a g i n et a l . 's p roved a n d a n a l y z e d w h e n 3 / (V e(Emp(e) -» Mgr(e,f(e)) A V e(Emp(e)(e = /(e)) SelfMgr(e))). • 66 Chapter 7. A Study of Mapping Composition the de f i nab i l i t y and c o m p u t a t i o n a l c o m p l e x i t y of the c o m p o s i t i o n of two schema m a p p i n g s can be de te rm ined . T h e y showed tha t " the c o m p o s i t i o n o f a finite set o f fu l l source- to- target tup le -genera t i ng dependenc ies (source-to- target tgds) w i t h a finite set of source- to- target tgds is a lways def inab le by a finite set of source- to- target tgds , b u t the c o m p o s i t i o n of a finite set of source- to- target tgds w i t h a finite set of fu l l source- to- target tgds m a y no t be def inab le b y any (f in i te or in f in i te) of source- to- target tgds ; f u r t he rmore , i t m a y no t be de f inab le b y a n y f o r m u l a of least fixed-point log ic , a n d t he as-soc ia ted c o m p o s i t i o n que ry m a y be N P - c o m p l e t e " [11]. F o r c la r i ty , K o l a i t i s s u m m a r i z e s the i r resu l ts i n T a b l e 7.1. T a b l e 7.1: A S u m m a r y of M a p p i n g C o m p o s i t i o n C o m p l e x i t y f r o m [11, 18] MapAM MapB.c . MapA-C C o m p o s i t i o n Q u e r y finite set of fu l l s-t tgds (p(x) ->ib(x) finite set of s-t tgds ip(x) - > 3 J / tp{x,y) finite set of s-t tgds ip(x) ->3y ib(x,y) i n P finite set of s-t ' tgds <p(x) - > 3 ? / ip{x,y) finite set of ( ful l ) s-t tgds <p(x) -^ib(x) m a y not be de-finable: b y any set of s-t tgds ; i n F O - l o g i c ; i n D a t a l o g i n N P ; c a n be N P - c o m p l e t e In the i r ana lys i s , they d i v i d e d the m a p p i n g c o m p o s i t i o n pa t t e rns i n to two m a i n categor ies based o n whe the r t he first m a p p i n g is a finite set o f fu l l s-t tgds or not . If the first m a p p i n g is a finite set of fu l l s-t tgds , the m a p p i n g c o m p o s i t i o n c o m p l e x i t y fal ls in to P space. If no t , the c o m p l e x i t y is l i ke ly to fa l l in to N P . O u r wo rk requ i res c o m p o s i t i o n f r o m a di f ferent ang le : re la t i ng di f ferent source schemas not b y the c o m p o s e d m a p p i n g s b u t by the m e d i a t e d schema. T h i s causes us to w a n t to exp lo re w h e n our a l g o r i t h m is poss ib le . I n th is sec t ion , we m a k e a s t u d y of the m a p p i n g c o m p o s i t i o n p r o b l e m based o n [11] a n d [20]. R a t h e r t h a n on l y focus o n these two categor ies t ha t F a g i n et a l . a d o p t e d , we t r y to exp lo re di f ferent fac tors t ha t c a n cause t he c o m p l e x i t y i n m a p p i n g c o m p o s i t i o n a n d come u p w i t h 36 pa t te rns w h i c h c a n 67 Chapter 7. A Study of Mapping Composition be i n c l u d e d in to the above two m a i n categor ies, b u t i n sma l le r g ranu la r i t y . 7.2.1 Different Patterns of Mapping Composition W e m a k e a c o m p a r i s o n w i t h P i a z z a m a p p i n g c o m p o s i t i o n a l g o r i t h m a n d F a -g in et a l ' s S e c o n d - O r d e r L o g i c m a p p i n g c o m p o s i t i o n a l g o r i t h m i n T a b l e 7.2 based o n s i x c r i t e r i a t ha t f o r m 36 m a p p i n g c o m p o s i t i o n pa t te rns . T h e m a i n i d e a of th is s t u d y is to see for a l l poss ib le m a p p i n g c o m p o s i t i o n pa t te rns , how we l l ou r a p p r o a c h of re la t i ng the l oca l schemas u s i n g the m e d i a t e d schema c a n p e r f o r m a n d whe the r the c o m p l e x i t y t ha t are ra ised i n p r e v i -ous wo rk w o u l d also m a k e our app roach d i f f icu l t a n d imposs ib le i n ce r ta i n c i r cums tances . T h e s i x c r i t e r i a for the m a p p i n g c o m p o s i t i o n c o m p a r i s o n s t u d y are l i s ted as fo l lows. It is poss ib le t ha t there exist o ther c r i t e r i a for the p r o b l e m of m a p p i n g c o m p o s i t i o n . T h e c r i t e r i a be low are those tha t we t h i n k essent ia l a n d c lear . • W h e t h e r each m a p p i n g is a 1 -subgoa l - to - l - subgoa l m a p p i n g or a m-subgoa l - to -n -subgoa l m a p p i n g . • W h e t h e r the f i rst m a p p i n g Map AM is a f u l l set of tgds or not . • W h e t h e r the second m a p p i n g Maps.c is a fu l l set of tgds or no t . T h i s c r i t e r i a w i t h the p rev ious one are used to j udge w h e t h e r m a p -p i n g c o m p o s i t i o n is dec idab le for each spec i f ic case, w h i c h is p roposed by F a g i n et a l . i n dec id i ng the c o m p l e x i t y of m a p p i n g c o m p o s i t i o n p r o b l e m . • W h e t h e r the ex is ten t ia l a t t r i bu tes i n the second s c h e m a B m a p to the t h i r d schema C. T h i s c r i t e r i a is c o m p l e m e n t a r y to the p rev ious two, a n d is ve r y i m p o r -tan t to dec ide whe the r the m a p p i n g c o m p o s i t i o n w i l l fa l l in to N P . J u s t as we show i n T a b l e 7.2, there are a large set of m a p p i n g c o m p o s i t i o n t ha t c a n be eas i ly p rocessed w i t h o u t any c o m p l e x i t y even t h o u g h the f i rst m a p p i n g is no t a fu l l set of tgds . A s imp le e x p l a n a t i o n for th is is 68 Chapter 7. A Study of Mapping Composition t ha t the ex is ten t ia l a t t r i bu te i n the second s c h e m a is the one t h a t h i n -ders the f i rst m a p p i n g to be a fu l l set of tgds , however , th is a t t r i bu te do no t act i n the m a p p i n g c o m p o s i t i o n . T h u s such a t t r i bu tes w i l l no t cause any d i f f i cu l ty i n c o m p u t i n g a c o m p o s e d m a p p i n g . • W h e t h e r there ex is ts se l f - rest r ic t ive c o m p o n e n t for s c h e m a B. Self-restrictive component is def ined i n D e f i n i t i o n 6. Se l f - res t r ic t i ve c o m -ponen t is a factor t ha t causes conf l ic t i n Concept for two m a p p i n g s . O n the o ther h a n d , se l f - rest r ic t ive componen t is a cause of c o m p l e x i t y i n the m a p p i n g c o m p o s i t i o n p r o b l e m as is s h o w n i n E x a m p l e 20. D e f i n i t i o n 6 (Se l f -Res t r i c t i ve C o m p o n e n t ) W e say tha t the c o m p o n e n t C\ is a se l f - rest r ic t ive componen t if: (1) C\ sat isf ies the fo l l ow ing c o n d i t i o n : - C\ is a c o m p o n e n t f r o m CM\\ C2 is a c o m p o n e n t f r o m CM2; C\ a n d C2 are componen ts over the same schema , a n d : I D B ( C i ) = I D B ( C 2 ) a n d (2) Q\ and Q2 cons t ruc ted be low sat is fy Q\ C Q2 h Q\ ^ Q2. - L e t name(sg) be the re la t i on n a m e of the subgoa l sg; L e t sg-narnes(Q) be the names of a l l of the re la t ions i n que ry Q; L e t overlap.names — {name(sg) \ name(sg) G sg.names{C\) a n d name(sg) G sg.names(C2)}; L e t C i o v e r i a p = {sg \ name(sg) G overlapjnames a n d sg G body(C\)}; C2oVerlap — {s9 I name(sg) € overlap-names a n d sg G body(C2)}. - W e now create new quer ies Q\ a n d Q2 t ha t descr ibe the ove r l app ing pa r t s of C\ a n d C2 respec t ive ly : L e t IDB{Qi) and IDB(Q2) be IDB{CMi) (wh i ch , b y the above re-qu i remen t , is also equa l to IDB(CM2); L e t subgoals(Q\) = C i o v e r i a p , a n d subgoals(Q2) = C 2 a u e r i a p ; 69 Chapter 7. A Study of Mapping Composition L e t a l l va r iab les i n Q\ a n d Q2 be d i s t i ngu i shed . T h a t is let vars[head(Q\)) = vars(subgoals(Q\)) a n d let vars(head(Q2)) = vars(subgoals(Q2)). • • W h e t h e r there are any composed non - i den t i ca l sel f - jo in i n the m a p -p ings . Composed Non-identical Self-Join Components is def ined i n D e f i n i -t i o n 7. C o m p o s e d non - i den t i ca l sel f - jo in c o m p o n e n t s are the m a i n cause for the c o m p l e x i t y i n m a p p i n g c o m p o s i t i o n p r o b l e m for m a p -p ings w i t h se l f - jo in . F o r m a p p i n g s w i t h c o m p o s e d non - i den t i ca l c o m -ponen ts w i t h o u t se l f - jo in , there is no such comp lex i t y . T h i s is because , every subgoa l i n one componen t c a n e i ther re la te to exac t l y one sub -goa l i n the o ther componen t or canno t find a subgoa l i n the o ther at a l l . B u t w h e n m a p p i n g s w i t h c o m p o s e d non - i den t i ca l sel f - jo in c o m -ponen ts , m u l t i p l e choices w i l l occu r w h e n re l a t i ng subgoa ls f r o m one c o m p o n e n t to the other . D e f i n i t i o n 7 ( C o m p o s e d N o n - i d e n t i c a l S e l f - J o i n C o m p o n e n t s ) A s s u m e there ex is ts m a p p i n g s A.B a n d B-C. L e t a_6 G A-B a n d 6_c G BJJ be in tersec-t ions . L e t aJb(b) be the b componen t of the in te rsec t ion aJb. L e t 6_c(6) be the b c o m p o n e n t of the in te rsec t ion 6_c. a.b(b) a n d b-c(b) are c o m p o s e d non - i den t i ca l sel f - jo in C o m p o n e n t s if 1) aJ)(b) ^ b-c(b) b u t 3 a re la t i on r s.t. r G body(a-b(b)) a n d r G body(b-c(b)), a n d 2) r appears at least tw ice i n at least one of aJ)(b) a n d b.c(b). • 70 T a b l e 1.2: Di f ferent P a t t e r n s for M a p p i n g C o m p o s i t i o n P a t t e r n 1- to- l or MapAM MapB.c ex is tent ia l self- composed P i a z z a M e P S y s Second -n u m m- to -n fuU f u l l a t t r i n B m a p t o C res t r ic t i ve n o n -i den t i ca l sel f - jo in O r d e r L o g i c a lgo 1 1:1 / / / X / / V / 2 1:1 / / / / / / X / 3 1:1 / X / X / / / / 4 1:1 / X / / / / X / 5 1:1 X / X X / / / / 6 1:1 X / X / / / X / 7 1:1 X / / X / notel / / 8 1:1 X / / / / notel X / 9 1:1 X X X X / V / / 10 1:1 X X X / / / X / 11 1:1 X X / X / notel / / 12 1:1 X X / / / notel X / 13 m : n / / / X X / / V 14 m : n / / / X / / X note2 15 m : n / / / / X / X / 16 m : n / / / - / / / X note2 17 m :n / X / X X / / / 18 m : n / X / X / X note2 P a t t e r n 1- to- l o r MapAM MapB.c ex is tent ia l self- composed P i a z z a M e P S y s S e c o n d -n u m m- to -n fu l l f u l l a t t r i n B m a p to C res t r ic t ive n o n -iden t i ca l sel f - jo in O r d e r L o g i c a lgo 19 m : n • X 1 / X / X / 20 m :n / X 1 / / / X no te2 21 m : n X / X X X / / • 22 m : n X • X X_ / / X no te2 23 m : n X / X / X / X / 24 m : n X / X / / / X n o i e 2 25 m : n X / / X X not-el /• / 26 m :n X • / X / note! X note2 27 m : n X / / / X notel X • 28 m :n X / / / / not el X no te2 .29 m : n X X X X X / / / 30 m : n X X X X / / X n o i e 2 31 m : n X X X / X / X / 32 m :n X X X / • / X n o i e 2 33 m :n X X / X X n o i e l / / 34 m :n X X / X / n o t e l X no te2 35 m :n X X / / X n o t e l X • 36 m : n X X / / / n o t e l X note2 - a to Chapter 7. A Study of Mapping Composition 1. notel: L o g i c a l l y s p e a k i n g , the P i a z z a m e t h o d is no t r igorous enough to hand le such cases. F o r e x a m p l e , j us t as w h a t we have e x p l a i n e d i n E x a m p l e 2, cons t ra in t o n y\ is lost i n the c o m p o s e d m a p p i n g . W e ca l l the case t ha t cons t ra in t spec i f ied i n the o r i g i na l m a p p i n g s b u t lost i n the c o m p o s e d m a p p i n g a constraint loss case. 2. note2: R e s u l t s are log ica l l y cor rec t , b u t are ve ry c o m p l i c a t e d , no t m e a n i n g f u l a n d easy to u n d e r s t a n d . Take E x a m p l e 21 as a n examp le , w h i c h is the same as E x a m p l e 2 b u t a l l the var iab les are r e n a m e d . E x a m p l e 2 1 U s e the S e c o n d - O r d e r L o g i c M a p p i n g C o m p o s i t i o n A l g o r i t h m to process the fo l low ing m a p p i n g s . MA_B: V x \fy(a(x,y) -> 3 x i o ( x , x i ) , 6 ( x i , y ) ) MB_C: V x V x i V x 2 Vy(b(x,xi),b(xi,x2),b(x2,y) -> c(x,y)) U s i n g the S e c o n d - O r d e r L o g i c M a p p i n g C o m p o s i t i o n A l g o r i t h m , the re-su l t is a set of eight ve ry c o m p l i c a t e d log ic express ion . D e t a i l s a n d resu l ts of th is examp le are s h o w n i n A p p e n d i x B . C o m p a r e d to the resu l ts g iven by P i a z z a i n E x a m p l e 18, th is resu l t is m u c h too c o m p l i c a t e d a n d h a r d to read . • 7.2.2 An Analysis of Mapping Composition Patterns F r o m T a b l e 7.2, the fo l low ing ru les c a n be conc luded . 1. W h e t h e r P i a z z a m e t h o d is express ive or no t depends en t i re ly o n whe the r ex is ten t ia l a t t r i bu tes i n the second s c h e m a are m a p p e d to the t h i r d schema. 2. T h e S e c o n d - O r d e r L o g i c M a p p i n g C o m p o s i t i o n a l g o r i t h m c a n hand le cases w i t h c o m p o s e d non - i den t i ca l sel f - jo in c o m p o n e n t s . However , the resu l ts do not con ta in any semant i c i n f o r m a t i o n , o n l y l og ica l l y cor rect . T h e represen ta t ion is c o m p l i c a t e d a n d h a r d to u n d e r s t a n d . 73 Chapter 7. A Study of Mapping Composition 3. F o r 1:1 m a p p i n g s , whe the r M e P S y s c a n hand le the p a t t e r n depends o n whe the r se l f - res t r ic t ive c o m p o n e n t ex is ts . I f se l f - res t r ic t i ve c o m p o -nents ex is t , conf l ic ts for the same concept w i l l ex is t i n the m a p p i n g s so tha t no re la t i on i n the m e d i a t e d s c h e m a for a c o m m o n concep t c a n be c rea ted . 4. F o r m :n m a p p i n g s , whe the r M e P S y s c a n hand le the p a t t e r n depends o n whe the r composed non - i den t i ca l sel f - jo in c o m p o n e n t s ex is t i n the p a t t e r n . F o r now, M e P S y s has yet to real ize the m e d i a t i o n of schemas i f m a p p i n g s con ta i n c o m p o s e d non - i den t i ca l se l f - jo in componen t s . W e p u t th is i n to our fu tu re work . 5. F o r the cases w h e n MapA_B is a fu l l set of tgds , m a p p i n g c o m p o s i t i o n c a n succeed u s i n g any of the three a l g o r i t h m . 6. F o r the cases w h e n Map AM is a not a fu l l set of tgds , i t is no t a l -ways the case tha t m a p p i n g c o m p o s i t i o n w i l l be undec idab le . It w i l l d e p e n d o n whe the r ex is ten t ia l a t t r i bu tes i n s c h e m a B c a n be m a p p e d to schema C. A n a l y z e the three approaches , espec ia l l y M e P S y s , u s i n g the resu l ts of T a b l e 7.2. T h e two m a p p i n g c o m p o s i t i o n a l go r i t hms , P i a z z a a p p r o a c h [20] a n d the S e c o n d - O r d e r L o g i c a l g o r i t h m [11], c a n h a n d l e a l l of the pa t te rns , t h o u g h i n some cases, the i r c o m p o s e d m a p p i n g s are not r igo rous enough or no t comprehens ib le enough . M e P S y s w i l l fa i l to hand le m a p p i n g c o m p o s i -t i on whenever se l f - rest r ic t ive componen ts or c o m p o s e d non - i den t i ca l sel f - jo in c o m p o n e n t s ex is t . However , se l f - rest r ic t ive componen ts w i l l cause a conf l ic t i n the m a p p i n g s to have the same Concept, a n d M e P S y s w i l l o n l y merge schemas w h e n m a p p i n g s express the same Concept. T h u s , m a p p i n g s w i t h se l f - res t r ic t ive c o m p o n e n t are ou t of the cons ide ra t i on of M e P S y s . A s i d e f r o m these two spec ia l g roups of m a p p i n g c o m p o s i t i o n , the ap -p r o a c h of us ing P S M a l g o r i t h m to b u i l d a m e d i a t e d s c h e m a a n d us ing the m e d i a t e d s c h e m a to re late di f ferent sources is dec idab le . C o m p a r a t i v e l y , whenever S e c o n d - O r d e r L o g i c a l g o r i t h m has t r o u b l e i n p resen t ing the resu l ts , ou r app roach also has t roub le of tha t p a t t e r n . T h i s 74 Chapter 7. A Study of Mapping Composition is because a l l such pa t te rns con ta in sel f - jo ins. In our cu r ren t a p p r o a c h , self-j o i n c o m p o n e n t s are not cons idered for the i n p u t m a p p i n g s . W e have yet to s t u d y a n d a n a l y z e w h e n m a p p i n g s w i t h sel f - jo in c o m p o n e n t s c a n represent the same Concept a n d w h e n canno t . T h i s w i l l be ou r fu tu re work . O n the o ther h a n d , w h e n P i a z z a a p p r o a c h is weak i n p resen ta t i on , i t is not a lways the case t ha t the M e P S y s w i l l have t roub le . T h i s is because , P i a z z a a p p r o a c h is weak i n p resen ta t ion w h e n ex is ten t ia l a t t r i bu tes i n the second s c h e m a m a p t o the t h i r d one w h i l e ou r a p p r o a c h t ransfers t he m a p p i n g c o m p o s i t i o n p r o b l e m to the one tha t how each l oca l s c h e m a re lates to the m e d i a t e d schema. 75 Chapter 8 S y s t e m I m p l e m e n t a t i o n a n d E x p e r i m e n t In p rev ious chap te rs , we d iscussed M e P S y s , a P D M S s u p p o r t i n g a m e d i a t e d s c h e m a to he lp t rans la te quer ies. W e also presented ou r P S M a l g o r i t h m of b u i l d i n g a m e d i a t e d schema , c rea t ing m a p p i n g s to l oca l sources a n d t rans -l a t i ng quer ies a m o n g di f ferent peers. In th i s chap te r , we s t u d y the per for -m a n c e of M e P S y s i n the case of c rea t ing a med ia ted s c h e m a a n d m a p p i n g s , u p d a t i n g the m e d i a t e d schema a n d m a p p i n g s , a n d que ry r e fo rmu la t i on . 8.1 System Implementation and Setup W e choose F r e e P a s t r y [2] as a p rov ide r of ne twork layer to ou r M e P S y s . F r e e P a s t r y is a gener ic , sca lab le subs t ra te for P 2 P app l i ca t i ons . It uses effi-c ient r o u t i n g st rategy, a n d each node m a i n t a i n s a r o u t i n g tab le w h i c h keeps t rack of i ts i m m e d i a t e ne ighbors . F r e e P a s t r y also p rov ides the f unc t i ona l i t y of n o t i f y i n g each peer a p p l i c a t i o n of message a r r i va l , n e i g h b o r i n g node fa i l -u res , etc. T h e expec ted n u m b e r o f f o r w a r d i n g s teps i n F r e e P a s t r y over lay ne twork is 0(logN). W e use F r e e P a s t r y ve rs ion 1.4.4 for the ne twork layer. A l l i m p l e m e n t a t i o n is w r i t t e n i n J a v a , w h i c h makes i t c ross -p la t f o rm . W e se tup our expe r imen t i n E m u l a b [1], a ne twork e m u l a t i o n t es tbed , to get the rea l i s t i c P 2 P env i ronmen t for our sys tem. E m u l a b p rov ides the a b i l -i t y to access a set of di f ferent mach ines to emu la te nodes i n a rea l ne twork . N e t w o r k b a n d s a n d message delays c a n also be set. D i f ferent n u m b e r s of peer nodes (12, 16, 20 a n d 24) have been used for the pe r f o rmance of s c h e m a m e d i a t i o n a n d query re fo rmu la t i on i n M e P S y s . I n t o ta l , we get 40 mach ines f r o m E m u l a b , i n c l u d i n g 15 used for ne twork con t ro l a n d 1 for e x p e r i m e n t con t ro l , l eav ing us w i t h a m a x i m u m of 24 di f ferent peers. E a c h E m u l a b 76 Chapter 8. System-Implementation and Experiment m a c h i n e t h a t we get p rov ides 9 0 0 M m e m o r y w i t h 2992.787 M H z processor . T h e qua l i t y of such mach ines is s im i l a r to t ha t of a peer i n a P 2 P ne twork s ince each peer needs to act as b o t h a server a n d a c l ient . W e choose 70ms de lay for each message a n d 5 0 M b a n d w i d t h 1 to s imu la te a rea l ne twork env i ronmen t . A l t h o u g h E m u l a b was o n l y ab le to p rov ide 24 peers for our expe r imen t , w h i c h is less t h a n the hund reds or t housands of peers t ha t m a y ex is t i n a P 2 P ne twork , we bel ieve th is to be a reasonab le n u m b e r of peers to test for a P D M S . I n a P D M S , the sys tem is r e l y i n g o n the t r a n s l a t i o n of semant i cs , a n d wh i l e our s ys tem does imp rove the semant ics of que ry t r a n s l a t i o n t h r o u g h the i n t r o d u c t i o n of C o n c e p t ( C h a p t e r 4 ) , a t t e m p t i n g to seman t i ca l l y compose que ry rewr i tes across hund reds of peers w i t h hetero-geneous schemas w i l l no t resu l t i n a seman t i ca l l y m e a n i n g f u l resu l t . 8.2 Input Schemas and Mappings W e based the re la t i ona l schemas at each peer o n the schemas in T P C - H [4], where each schema.con ta ins 8 re la t ions , each w i t h an average of 8 a t t r i bu tes . W e use th is i n f o r m a t i o n to a u t o m a t i c a l l y generate a re l a t i ona l s c h e m a as a l oca l da tabase schema for each peer i n M e P S y s , m a k i n g our peer s c h e m a as rea l is t ic as poss ib le . W e fu r the r generate m a p p i n g s be tween pa i r s of acqua in ted nodes ; as we w i l l see, the m a p p i n g s va r ied f r o m expe r imen t to expe r imen t , b u t we con t ro l l ed (1) the average n u m b e r of acqua in tances pe r peer (2) the average n u m b e r of re la t ions pe r peer s c h e m a (3) the average n u m b e r of a t t r i bu tes i n a re la t i on . 8.3 Experiment 1: Schema Mediation T h e m a i n pu rpose of th is expe r imen t is to see the pe r f o rmance of the a l -go r i t hms desc r ibed i n C h a p t e r 5 a n d to c o m p a r e the pe r f o rmance w h e n di f ferent numbers of peers are invo lved i n the sys tem se tup phase . l rThe bandwidth is not an important factor in MePSys. We ran the same experiment using I M bandwidth, and no significant difference in time occurred in the result. 77 Chapter 8. System Implementation and Experiment In T h e o r e m 1, we gave a n u p p e r b o u n d of t i m e for s y s t e m se tup phase. T h e t i m e of s c h e m a m e d i a t i o n i n the P D M S is dec ided b y the n u m b e r of hops w h e n the last node i n the ne twork rece ived a l l node i n f o r m a t i o n : m&XE{maxF{shortestPath(A,F) + shortestPath(F, E)}}, where A is the m e d i a t i o n s t a r t i n g node , E a n d F represent a l l nodes i n the ne twork . F i g -ure 8.1 shows the resul t of m e d i a t i n g 12, 16, 20 a n d 24 peers ' i n f o r m a t i o n respect ive ly . W h e n peer n u m b e r is f i xed , the t ime of c o m p l e t i n g a m e d i -a ted schema is l inear a n d p r o p o r t i o n a l to the n u m b e r of hops . T h e t ime of each hop is dec ided b y two factors : de lay t ime for each message a n d mes-sage t rans fe r r i ng t ime . In our scenar io , the t i m e of t rans fe r r i ng a m e d i a t e d s c h e m a canno t be ignored . T h e t i m e for t rans fe r r i ng a m e d i a t e d s c h e m a is h i g h l y re la ted to the size of the m e d i a t e d schema. W h e n the n u m b e r of peers gets larger , the m e d i a t e d s c h e m a also increases, r esu l t i ng the t i m e of each hop to increase as we l l . T h u s the t ime of each hop for m e d i a t i n g 12, 16, 20 a n d 24 peers are di f ferent. O n the o ther h a n d , w h e n the n u m b e r of peers i n a ne twork is f i xed , the t ime for each hop is s im i l a r , because the messages the nodes are pass ing each t ime are of s im i l a r s ize. C o n s i d e r i n g the case of 24 peers w i t h 20 hops , the t o ta l t i m e of m e d i a t i n g the 24 peer schemas o n l y costs 31 seconds, w h i c h is qu i te a sa t i s fac to ry t ime cost for the s y s t e m se tup phase. 8.4 Experiment 2: Query Reformulation In th is expe r imen t , we test the pe r fo rmance of ou r que ry r e f o r m u l a t i o n a l -g o r i t h m a n d c o m p a r e the l oca l c o m p u t a t i o n t i m e w i t h the who le que ry re-f o r m u l a t i o n t i m e i n the network . S ince the que ry r e fo rmu la t i on a l g o r i t h m is ve ry fast , for each que ry o n each peer , the que ry is r e fo rmu la ted 10 t imes , a n d th i s que ry re fo rmu la t i on is r epo r t ed as l oca l c o m p u t i n g t ime . T h i s way, t h o u g h the t ime was a r t i f i c ia l l y i n f l a ted , i t w o u l d no t be sub jec t to t i m i n g errors caused b y h a v i n g to measure very s m a l l numbe rs . T i m e of que ry r e fo rmu la t i on a n d b r o a d c a s t i n g i n the who le ne twork is p r o p o r t i o n a l to the topo logy d e p t h w h e n each que ry message size is s im i l a r . W e def ine Topology Depth as the m a x i m u m of the shor test p a t h for a l l 78 Chapter 8. System Implementation and Experiment Tim e ot b u il d i n g a m ed iated schemaV.S.Numb er of Hops - » - 12peers —•-IBpeers 20peers —«-24 peers g 20 31.38 „ -29M5 « - M ."55 19.69 21.54 23.03 15.16 17:67 •—~"*Tfl2 — — • 18.38 15.20 *-tfT7l ' ^ — - — " M T U — * 1 3 70 S.BJ 1 12 14 I 1 16 Num of Hops 1 18 20 F i g u r e 8.1: T i m e of b u i l d i n g a m e d i a t e d s c h e m a V . S . N u m b e r of H o p s nodes f r o m the node where que ry is i n i t i a l l y posed . TopologyDepth = max{shortestPath(A, E)} where A is the node where que ry is posed , E is a l l o ther nodes i n the ne twork . T o p o l o g y d e p t h takes a n i m p o r t a n t ro le i n dec id i ng the t o t a l t ime for query re fo rmu la t i on i n the network . T h i s is because quer ies w i l l b e sent to a l l o ther acqua in tances except the message source, a n d each node w i l l no t process the same que ry tw ice . W h a t ' s more , i n mos t cases, query messages are s m a l l . F i g u r e 8.2 shows the t ime resul ts of c o m p l e t i n g the que ry re fo rmu-la t i on and b r o a d c a s t i n g i n the network . W e posed a que ry w i t h 2 subgoa ls on one peer schema. Fo r each topo logy d e p t h f r o m 4 to 9, we r u n expe r i -ment on 12, 16, 20 a n d 24 peers respect ive ly . R e s u l t s show tha t for quer ies (w i th re la t i ve ly s m a l l s ize unde r l k ) sent to topo log ies of the same topo logy d e p t h , there is a lmos t no t ime di f ference t o re fo rmu la te the q u e r y a m o n g the four di f ferent sets. O n the o ther h a n d , resul ts i n F i g u r e 8.2 show tha t the t ime of que ry re fo rmu la t i on a n d b r o a d c a s t i n g is exac t l y p r o p o r t i o n a l to 79 Chapter 8. System Implementation and Experiment Time of Query Reformulation and Broadcasting V.S.Topology Depth | —•—12 peers 16 peers - * - 2 0 p e e r s • 24peers| 2.5 • 2 • ^ ^ ^ ^ ^ rime (s) , . • - * r " - ^ " ^ * * ^ 0.5 • 0 -4 — i 1 i i i i 5 6 7 8 9 Topology Depih F i g u r e 8.2: T i m e of Q u e r y R e f o r m u l a t i o n a n d B r o a d c a s t i n g V . S . D e p t h of T o p o l o g y the topo logy d e p t h of the network . W e fu r ther c o m p a r e d the qua l i t y of the query r e fo rmu la t i on a l g o r i t h m w h e n di f ferent quer ies are posed . W e f i xed the ne twork t opo logy a n d posed ten di f ferent quer ies t o a source peer. E a c h query con ta in subgoa ls r a n g i n g f rom 1 to 10. Fo r some peer w h e n quer ies canno t be re fo rmu la ted , the l o c a l r e fo rmu la t i on t i m e w i l l be ve r y l i t t l e . T h u s for each query , we c o m p a r e the t o t a l r e fo rmu la t i on and b r o a d c a s t i n g t i m e w i t h the m a x i m u m loca l peer c o m p u t i n g t i m e for 10 t imes que ry re fo rmu la t i on . F i g u r e 8.3 shows tha t as the n u m b e r o f subgoa l g rows, the t i m e spen t on l oca l r e f o rmu la t i on t ime also increases. T h i s is as an t i c i pa ted s ince ou r a l g o r i t h m re fo rmu la tes quer ies based o n each subgoa l . A n o t h e r f i n d i n g f r o m F i g u r e 8.3 is t ha t w h e n the subgoa l n u m b e r increases, the t i m e for comp le t -ing the que ry i n the who le ne twork a lso increases. T h i s is because w h e n the subgoa l n u m b e r increases, the size of the message c o n t a i n i n g t ha t que ry also increases, w h i c h causes the t i m e of each hop to increase as we l l . However , w h e n p rocess ing quer ies w i t h no more t h a n 3 subgoa ls , t he r e fo rmu la t i on and b r o a d c a s t i n g such quer ies is s t i l l c lose to cons tan t . F r o m F i g u r e 8.3, we 80 Chapter 8. System Implementation and Experiment c a n conc lude tha t even for 10 t imes of que ry re fo rmu la t i on , m a x i m u m loca l r e fo rmu la t i on t ime is a lways less t h a n 3 % of the t o t a l ne twork t ime . Tim* of Query Reformulation and Broadcasting in the network V.S. Max Local Computing Time (10 times query reformulation) B Time of Max Local Query Reformulation for 10 times • Query Reformulation in the Network F i g u r e 8.3: T i m e of Q u e r y R e f o r m u l a t i o n i n the ne twork V . S . M a x L o c a l C o m p u t i n g T i m e for 10 t imes Q u e r y R e f o r m u l a t i o n 8.5 Experiment 3: Updating the Mediated Schema N e x t , we tested the pe r fo rmance of u p d a t i n g the m e d i a t e d s c h e m a i n the s tab le s ta te , i.e., after the sys tem se tup p e r i o d . T h i s expe r imen t is a fo l low-up expe r imen t of the f i rst expe r imen t , S c h e m a M e d i a t i o n (Sec t ion 8.3), a n d is des igned as fo l lows: A s s u m e a l l peers m a i n t a i n the mos t u p d a t e d me-d ia ted schema a n d G L A V m a p p i n g s to l oca l schema. A new peer Z j o i ns the ne twork a n d creates a m a p p i n g to one of the peer , say A, w h i c h t r i g -gers a new schema m e d i a t i o n process. A f t e r t ha t , A b roadcas ts the new med ia ted schema to a l l o ther peers, a n d t h e n a l l peers , u p o n rece iv ing an 81 Chapter 8. System Implementation and Experiment u p d a t i n g message, u p d a t e the i r l oca l l y m a i n t a i n e d m e d i a t e d s c h e m a a n d other i n f o r m a t i o n . T e n di f ferent expe r imen ts , represen t ing t en di f ferent topo log ies , are exe-cu ted i n order to get the fo l low ing t ime for each se t t ing . 1. t\ = t ime of i n i t i a l s ys tem se tup phase 2. ti = t i m e of c o m p u t i n g the new M at peer A 3. £3 = average t ime of u p d a t i n g M at each peer except A 4. £4 = t o ta l ne twork t ime for u p d a t i n g M i n the who le ne twork af ter Z j o ins the ne twork T i m e t\ is d i rec t l y , ob ta i ned f r o m the s ta t is t i cs i n E x p e r i m e n t 1. T h u s , i t is poss ib le to c o m p a r e the t ime of u p d a t i n g the m e d i a t e d s c h e m a w i t h the i n i t i a l s ys tem se tup t ime for a f ixed topo logy. " T o p o l o g y D e p t h " , w h i c h is def ined i n E x p e r i m e n t 2, is used to no te d o w n the topo logy d e p t h for each expe r imen t . S ince af ter u p d a t i n g the m e d i a t e d s c h e m a at A, the same u p d a t i n g message w i l l o n l y be processed at each peer once, the t ime for the who le u p d a t i n g process to comp le te depends o n the topo logy d e p t h . T a b l e 8.1: U p d a t i n g the M e d i a t e d S c h e m a for A d d i n g a N e w Pee r Exper iment Topology Depth *i (ms) <2(ms) t 3 (ms) t4(ms) t2 / t4 16 peers 12 hops 4 .10733.86 71.33 0.0099 5868.13 1.21% 16 peers 14 hops 5 13147.71 71.10 0.0105 7325.73 0.97% 16 peers 16 hops 7 15542.58 69.16 0.0099 11477.08 0.46% 16 peers 18 hops 8 17117.51 71.49 0.0105 12766.28 0.56% 16 peers 20 hops 9 18376.80 71.38 0.0106 13075.01 0.54% 20 peers 12 hops 4 15160.07 91.50 0.01 5297.89 1.73% 20 peers 14 hops 5 17672.95 90.57 0.0103 6616.83 1.37% 20 peers 16 hops 6 19693.76 91.01 0.0105 7921.49 1.15% 20 peers 18 hops 8 21542.09 87.99 0.009 9591.46 0.92% 20 peers 20 hops 7 23028.56 90.27 0.009 7940.53 1.14% T a b l e 8.1 shows the resu l ts of di f ferent t ime w.r . t . t he t en topo log ies . T h e c o m p u t a t i o n of a new m e d i a t e d schema at peer A, t\, is a lways a r o u n d 82 Chapter 8. System Implementation and Experiment 70 ms for 16 peers, a n d 90 ms for 20 peers , less t h a n 2 % of £4, the t o t a l t ime for u p d a t i n g a m e d i a t e d s c h e m a i n the network . T h e t ime of u p d a t i n g l oca l m e d i a t e d i n f o r m a t i o n i n a l l o ther peers, £3, is a lways a b o u t 0.01 m s , w h i c h c a n be igno red c o m p a r e d to the t i m e spent for the ne twork . 83 Chapter 9 C o n c l u s i o n a n d F u t u r e W o r k In th is thes is , we presented M e P S y s , a re l a t i ona l peer d a t a m a n a g e m e n t . sys tem. O u r key con t r i bu t i ons are: • W e presented a m e c h a n i s m to s u p p o r t a m e d i a t e d s c h e m a in P D M S w h i c h a l lows easier access to more i n f o r m a t i o n . • W e def ined w h a t a concep t means in the case of con junc t i ve m a p p i n g a n d how it i m p a c t s the u n d e r s t a n d a b i l i t y of the m a p p i n g a n d m e d i a t e d schema. • W e i n t r o d u c e d a cons t ruc t M a p p i n g T a b l e w h i c h he lps us to t rans -f o r m u n s t r u c t u r e d m a p p i n g i n f o r m a t i o n to s t r u c t u r e d fo rms a n d w h i c h is easier to cons t ruc t m a p p i n g s f r o m the m e d i a t e d s c h e m a to l oca l sources. • W e also p r o v i d e d an a p p r o a c h to create m e d i a t e d s c h e m a i n P D M S a n d get m a p p i n g s to l oca l sources. - • W e fu r the r exp lo red how to u p d a t e the m e d i a t e d s c h e m a i n the s teady s ta te . • W e f ina l l y i m p l e m e n t e d the sys tem a n d showed the e x p e r i m e n t a l re-su l ts . I n fu tu re work , we w i l l e x tend th is wo rk to a more gener ic m e t h o d . F i r s t , recons ide r ing the de f i n i t i on of Concept, we w a n t to f igure ou t how to d is -t i ngu i sh m a p p i n g s w i t h the same seman t i c i n f o r m a t i o n w h i l e u s i n g di f ferent I D B names. W e also w o u l d l ike to exp lo re the seman t i c issues w h e n a b roader range of m a p p i n g s are cons ide red , e.g., m a p p i n g s w i t h sel f - jo ins. S e c o n d , mo re o p t i m i z a t i o n issues c a n be cons idered i n the fu tu re s y s t e m s ince i n 84 Chapter 9. Conclusion and Future Work cur ren t M e P S y s , each m e d i a t e d schema message passed a m o n g peers is s t i l l large. T h e r e are m a n y poss ib le ways to decrease the size of the messages b y sha r i ng i n f o r m a t i o n be tween di f ferent cons t ruc ts . G o o d o p t i m i z a t i o n can s ign i f i can t l y decrease the schema m e d i a t i o n t ime i n the sys tem se tup phase . T h i r d , we w o u l d l ike to exp lore some bet te r approaches for u p d a t i n g the m e d i a t e d s c h e m a w h e n l o c a l s c h e m a evolves. 85 B i b l i o g r a p h y [1] E m u l a b . h t t p s : / / w w w . e m u l a b . n e t / . [2] F reepas t ry . h t t p : / / w w w . f r e e p a s t r y . o r g / . [3] N a p s t e r , h t t p : / / w w w . n a p s t e r . c o m / . [4] T p c - h . h t t p : / / w w w . t p c . o r g / t p c h / . [5] M . A r e n a s , V . K a n t e r e , A . K e m e n t s i e t s i d i s , I. K i r i n g a , R . J . M i l l e r , a n d J . M y l o p o u l o s . T h e h y p e r i o n p ro jec t : F r o m d a t a i n teg ra t i on to d a t a c o o r d i n a t i o n . SIGMOD Record, 2003. [6] C . B a t i n i , M . L e n z e r i n i , a n d S . B . N a v a t h e . A c o m p a r a t i v e ana lys is of methodo log ies for da tabase s c h e m a in teg ra t ion . ACM Computing Surveys, 18 :323-364, 1986. [7] P . B e r n s t e i n , F . G i u n c h i g l i a , A . K e m e n t s i e t s i d i s , J . M y l o p o u l o s , L . Ser -a f in i , a n d I. Z a i h r a y e u . D a t a managemen t for peer - to -peer c o m p u t i n g : A v i s i on . In WebDB, 2002. [8] A . B o n i f a t i , E . Q . C h a n g , T . H o , L . V . S . L a k s h m a n a n , a n d R . P o t t i n g e r . H e p t o x : M a r r y i n g x m l a n d heterogene i ty i n y o u r p 2 p da tabases (sys tem demos t ra t i on ) . In VLDB, 2005. [9] E l a i n e Q . C h a n g . S c h e m a m a p p i n g a n d que ry t r a n s l a t i o n i n heteroge-neous peer - to -peer x m l da tabases . T e c h n i c a l r epo r t , T h e U n i v e r s i t y of B r i t i s h C o l u m b i a , 2005. M a s t e r thesis. [10] R . F a g i n , P . K o l a i t i s , R . M i l l e r , a n d L . P o p a . D a t a exchange: Seman t i cs a n d que ry answer ing . In ICDT, 2003. 86 Chapter 9. Conclusion and Future Work [11] R . F a g i n , P . G . K o l a i t i s , L . P o p a , a n d W . T a n . C o m p o s i n g s c h e m a m a p p i n g s : Second -o rde r dependenc ies to the rescue. I n PODS, 2004. [12] M . F r i e d m a n , A . L e v y , a n d T . M i l l s t e i n . N a v i g a t i o n a l p lans for d a t a i n teg ra t ion . I n the 16th Nat. Conf. on Artificial Intelligence (AAAI99), pages 67 - 73, 1999. [13] S teven G r i b b l e , A l o n H a l e v y , Z a c h a r y Ives, M a y a R o d r i g , a n d D a n S u c i u . W h a t c a n da tabases do for peer - to-peer . In WebDB, 2001. [14] A . Y . H a l e v y , Z . G . Ives, D . S u c i u , a n d I. T a t a r i n o v . S c h e m a m e d i a t i o n i n peer d a t a managemen t sys tems. I n ICDE, 2003. [15] A l o n Y . H a l e v y , Z a c h a r y G . Ives, Pe te r M o r k , a n d Igor T a t a r i n o v . P i a z z a : D a t a managemen t i n f ras t ruc tu re for seman t i c web app l i ca t i ons . In WWW, 2003. [16] H . V . J a g a d i s h , B e n g C h i n O o i , a n d Q u a n g H i e u V u . B a t o n : A b a l a n c e d t ree s t ruc tu re for peer - to-peer ne tworks . In VLDB, 2005. [17] A . K e m e n t s i e t s i d i s , M . A r e n a s , a n d R . J . M i l l e r . M a p p i n g d a t a i n peer - to-peer sys tems: Semant i cs a n d a l g o r i t h m i c issues. I n SIGMOD, 2003. [18] P h o k i o n G . K o l a i t i s . D a t a exchange a n d c o m p o s i t i o n of schema m a p p i n g s , 2004. P r e s e n t a t i o n s l ides g iven at C a r l e t o n Un i ve rs i t y . h t t p : / / w w w . s c s . c a r l e t o n . c a / " d i i s / a r i s e / P r e s e n t a t i o n s / P h o k i o n K o l a i t i s .ppt . [19] M a u r i z i o L e n z e r i n i . D a t a in teg ra t ion : A theo re t i ca l pe rspec t i ve . In PODS, pages 233 -246 , 2002. [20] J . M a d h a v a n a n d A . Y . Ha levy . C o m p o s i n g m a p p i n g s a m o n g d a t a sources. In VLDB, 2003. [21] W . S . N g , B . C . O o i , K . T a n , a n d A . Z h o u . P e e r d b : A p2p -based sys tem for d i s t r i b u t e d d a t a sha r i ng . In ICDE, 2003. 87 Chapter 9. Conclusion and Future Work [22] R . Po t t i nge r . Processing Queries and Merging Schemas in Support of Data Integration. P h D thes is , 2004. P h D thes is . [23] E . R a h m a n d P . A . B e r n s t e i n . A survey of approaches to a u t o m a t i c s c h e m a m a t c h i n g . VLDB J o u r n a l , 10(4) :334-350, D e c 2001. [24] P . R o d r i g u e z - G i a n o l l i , M . G a r z e t t i , L . J i a n g , A . K e m e n t s i e t s i d i s , I. K i r i n g a , M . M a s u d , R . M i l l e r , a n d J . M y l o p o u l o s . D a t a sha r i ng i n the h y p e r i o n peer da tabase sys tem. In ICDE, 2005. [25] S. P . S h a h , Y . H u a n g , T. X u , M . M S Y u e n , J . L i n g , a n d B . F . O u e l -let te. A t l a s - a d a t a warehouse for in tegra t ive b i o i n fo rma t i cs . In BMC Bioinformatics, 2005. [26] Igor T a t a r i n o v and A l o n Ha levy . E f f i c ien t que ry r e fo rmu la t i on i n peer-d a t a managemen t sys tems. In SIGMOD, 2004. [27] Jef f rey D . U l l m a n . Principle of Database and Knowledge-Base Systems. 1989. [28] Jef f rey D . U l l m a n . I n f o r m a t i o n in teg ra t ion u s i n g l og i ca l v iews . In ICDT, pages 19 -40 , 1997. 88 Appendix A Details for Example 15 Example 15: A n examp le of a new peer j o i n i n g the ne twork T h e r e are th ree peers i n the cur ren t sys tem: A, B a n d C. M a p p i n g s Map AM a n d Mapsx) are c rea ted as fo l lows. W e on l y cons ider the re la t ions t ha t are c rea ted by m a p p i n g s for the m e d i a t e d s c h e m a a n d o m i t re la t ions t ha t do not pa r t i c i pa te i n the m a p p i n g s because re la t ions t ha t represent a Concept a m o n g di f ferent peers are the core pa r t of ou r d i scuss ion . Mapp ing MapA_B'-flight(date, f l ightNo, price) :- A.f l ight(date, f l ightNo, price) flight(date, f l ightNo, price) :-B_flight(date, company,fl ightNo,service), B_price(fl ightNo, class, price) Mapp ing Maps.c'-flight (date, f l ightNo,. class, price) :-B.f l ight(date, company, flightNo, service), B_price(fl ightNo, class, price) flight(date, flightNo, class, price) :-C-Schedule(ID, date, f l ightNo), C-pr ice( ID, class, price, discount) S h o w n i n i n F i g u r e 6.1(a), af ter the se tup phase , a m e d i a t e d s c h e m a MS\ has been bu i l t a n d m a i n t a i n e d at each peer. T h e m a p p i n g s f r o m MS\ to A, B a n d C a n d the M a p p i n g T a b l e w i l l a lso be bu i l t . Mediated schema MS\ = { flight(date, flightNo, price, company, service, class, ID, discount) } 89 Appendix A. Details for Example 15 M a p p i n g T a b l e flight: MSV flight 1 2 3 4 5 6 7 8 IMAB-flight 1 2 3 4 5 6 A i l i g h t 1 2 3 B _ f l i g h t 1 3 2 4 B . p r i c e 1 3 2 IMBC-flight 1 2 3 4 5 6 7 8 B _ f l i g h t 1 3 2 4 B _ p r i c e 1 3 2 C - S c h e d u l e 2 , 3 1 C_pr ice - 2 1 3 Mapp ing MapMSi^\'-LV = { M S i - ^ l ^ d a t e , flightNo, price) :-M5 i . f l i gh t (da te , flightNo, price, company, service, class, ID, discount) } " ' GV = { M S i - j 4 i ( d a t e , flightNo, price) :- AA_n igh t (da te , flightNo, price) } ' " Mapp ing MCLPMSLB'-LV = { M 5 i _ 5 1 ( d a t e , company, flightNo, service, class, price) :-M5 i . f l i gh t (da te , f l ightNo, price, company, service, class, ID , discount) } " ' GV = { MS\.Sedate, company, flightNo, service, class, price) :-B.B_fl ight(date, company, f l ightNo, service), B.B_price(f l ightNo, class, price) . } " ' 90 Appendix A. Details for Example 15 Mapp ing MapMSl.c-LV = { M 5 i - C 1 ( I D , date, flightNo, class, price, discount) :-MS\. flight (date, flightNo, price, company, service, class, ID, discount) } ' " GV = { MS\-Ci(lD, date, flightNo, class, price, discount) :-C.C-schedule( ID, date, flightNo), C C _ p r i c e ( I D , class, price, discount) } "' A f t e r b u i l d i n g the m e d i a t e d schema MS\ a n d the m a p p i n g s to l oca l sources, peer B also b u i l t a da tabase a p p l i c a t i o n APPB over MS\. So the i n f o r m a t i o n s to red at each peer is as fo l lows. - Peer A : schema A , MapAM, MS\, MapMSi^A, MT\. - Pee r B : schema B, MapAM, MapB_c, MS\, MapMS^B, MT\, APPB(MSI). - Pee r C : schema C, MapB.c, MSi, MapMSi.c, MTY. A new peer D j o ins i n the network . It creates a m a p p i n g to peer C, s h o w n i n F i g u r e 6.1(b) . Mapp ing Mapcjj-flight(date, flightNo) :- Cj3chedule(ID, date, flightNo), C_price(ID, class, price, discount) flight(date, flightNo) :-D_schedule(date, flightNo, depart, arr ival , numLeft) U s i n g a lgo r i t hms desc r ibed i n C h a p t e r 5, a new m e d i a t e d s c h e m a MS2, M a p p i n g T a b l e MT2 a n d G L A V m a p p i n g s can be c o m p u t e d . Mediated schema MS2 = { fl ight(date, flightNo, price, company, service, class, ID, discount, depart, arr ival , numLeft) } 91 Appendix A. Details for Example 15 M a p p i n g T a b l e f l ight : A f S i . f l ight 1 2 3 4 5 6 7 8 9 10 11 MSi . f l ight 1 2 3 4 5 6 7 8 I MAB -fl ight 1 2 3 4 5 6 A J l i g h t 1 2 3 B J l i g h t 1 3 2 4 B .p r i ce 1 3 2 IMBC-flight 1 2 3 4 5 6 7 8 B . f l i gh t 1 3 2 4 B - p r i c e 1 3 2 C_schedule 2 3 1 C - p r i c e 2 1 3 / M e n .f l ight 1 2 3 4 5 6 7 . C s c h e d u l e 2 3 1 C - p r i c e 2 1 3 D_schedule 1 2 3 . 4 5 T w o m a p p i n g s w i l l be c rea ted first. Mapp ing MapMs2.MSx-LV = { <3i(date, f l ightNo, price, company, service, class, ID , discount) MS2.flight (date, flightNo, price, company, service, class, ID, discount, depart, arr ival, numLeft) } " ' GV = { Qi(da te , flightNo, price, company, service, class, ID, discount) MS\. flight (date, flightNo, price, company, service, class, ID , discount) } " ' 92 Appendix A. Details for Example 15 Mapp ing MapMS2-D'-LV = { MS2-Di(date, flightNo, depart, arr ival , numLeft) :-MS2-flight (date, flightNo, price, company, service, class, ID, discount, depart, arr ival , numLeft) } ' " GV = { MS2-D1 (date, flightNo, depart, arr ival , numLeft) :-£>.D_schedule(date, flightNo, depart, arr ival , numLeft) } ' " C passed MS2, MT2 a n d MapMS2-D to D. So the i n f o r m a t i o n t ha t peer D stores is : MS2, MT2, MapMS2-D> s c h e m a D, Mapcjj-C fu r ther compu tes the G L A V m a p p i n g f r o m MS2 to C u s i n g MapMS2.MSj. a n d MapMSi.c-Mapp ing MapMS2-C-LV = { MiS ,2-C 1(ID, date, flightNo.class, price, discount) :-MS2-flight (date, flightNo, price, company, service, class, ID , discount, depart, arr ival , numLeft) } " ' GV = { MS2-C1 (ID, date, flightNo,class, price, discount) :-C.C-schedule( ID, date, flightNo), C.C_pr ice( ID, class, price, discount) } " ' ; C w i l l keep the fo l low ing i n f o r m a t i o n : MS2, MT2, MapMS2-C s chema C , Mapc.D, MapB.c-C passes MS2, MT2, MapM:s2-MSl to B. B c o m p u t e s MapMS2M u s i n g MapMs2iMSi a n d MapMsu-93 Appendix A. Details for Example 15 Mapp ing MapMS2.B-LV = { M £ 2 - 6 1 (date, company, flightNo, service, class, price) :-M52-fl ight(date, flightNo, price, company, service, class, ID , discount, depart, arr ival , numLeft) } ' " GV = { . • M52-5i (date, company, flightNo, service, class, price) :-B.B_fl ight(date, company, flightNo, service), B.B_price(f l ightNo, class, price) } Since B has a p p l i c a t i o n APPB s t i l l u s i n g MS\, MS\ a n d the m a p p i n g MS2 to MS\ need to be kept as we l l . So the i n f o r m a t i o n t ha t B s to red is MSi, MS2, MapMS2.MSi, MapMS2JB, MT2, s c h e m a B, Maps.c, Map AM-B w i l l pass MS2, MT2, MapMS2.MS\ to A. U s i n g MapMS2MSx a n d MapMS\.A, a m a p p i n g f r o m m e d i a t e d s c h e m a MS2 to l oca l s chema A can be c o m p u t e d . Mapp ing MapMS2.A'-LV = { M S ^ - ^ i t d a t e , flightNo, price) :-M52-fl ight(date, f l ightNo, price, company, service, class, ID, discount, depart, arr ival , numLeft) } ' " GV = { MS2-Ai(d&te, flightNo, price) :-AA_f l ight (date , flightNo, price) } ' " A w i l l keep the fo l low ing i n f o r m a t i o n : MS2, MT2, MapMS2.A a n d Map AM-T h u s , every peer c a n keep the u p d a t e d m e d i a t e d s c h e m a a n d m a i n t a i n a m a p p i n g f r o m the m e d i a t e d s c h e m a to l oca l schema. A d d i t i o n a l l y , a l l the p rev i ous l y c reated da tabase app l i ca t i ons over p rev ious vers ions of the m e d i a t e d s c h e m a can s t i l l be used. 94 Appendix B D e t a i l s f o r E x a m p l e 2 1 U s e the S e c o n d - O r d e r L o g i c M a p p i n g C o m p o s i t i o n A l g o r i t h m to process the fo l low ing m a p p i n g s . MA_B: V x Vy(a(x,y) - » 3xib(x,xi),b(xx,y)) ' MB.c-Vx Wxi V x 2 Vy(b{x,x1),b(x1,x2),b(x2,y) -> c(x,y)) A de ta i l ed process is as fo l lows us ing the S e c o n d - O r d e r L o g i c M a p p i n g C o m p o s i t i o n A l g o r i t h m . Step zero (change variable names): MA_B: Vxyy(a(x,y) -> 3xib{x,xx),b{xi,y)) MB.c-V s V s r V s 2 Vt{b(s,sx),b(Sl,S2),b(s2,t) - » c (s , t ) ) Step one (split): MA_B: 3 / ( V x Vy(a (a ; , y ) - 6(x J ( x , y ) ) , 6 ( / ( z , y ) , y ) ) a ( x , y ) - > b{x,f{x,y)), a(x,y) -> b(f(x,y),y) } Step two (compose): MB.c-95 Appendix B. Details for Example 21 {b(s, s i ) , 6 ( s i , s2),b{s2,t)c(s,t)) = M {a(x,y),b(si,s2),b{s2,t) A {x = s) A ( / ( x , y ) = S i ) - > c ( s , * ) ) ; ( a ( x , y ) , 6 ( s 1 , s 2 ) , f j ( s 2 , i ) A (f(x,y) = s) A (y = s i ) - » c(s,t)) } (a(x,y),a(x',y ' ) ,K s 2,£) A (x = s) A ( / ( x , y ) = Si ) A (a;' = Si ) A ( / ( x ' , y ' ) = s 2 ) - > c ( s , t ) ) ; (a{x,y),a(x',y'),b(s2,t) A (x = s) A (f(x,y) = s a ) A ( / ( x ' , y ' ) = s i ) A (y' = s2) -> c ( s , t ) ) ; ( a ( x , y ) , a ( x ' ) y / ) . ^ ( s 2 , i ) A ( / ( x , y ) = s) A (y = Sj) A (x' = s i ) A ( / ( x ' , y ' ) = s 2 ) - + c ( s , i ) ) ; (a(a;,y),o(a; / ,y'),rj(s2,t)A(/(2: ! y) = s) A (y = si) A (x' = s i ) A (f(x',y') = s2) -> c(M)); } ( a ( x , y ) , a ( a ; / , y ' ) , a ( a ; " , y " ) A (a: = s) A (f(x,y) = si) A (x' = sx) A ( / ( x ' , y ' ) = s2) A (x" = s2) A (f(x",y")= t) -> c ( S , £ ) ) ; (0(1, y ) , a ( x ' , y ' ) , a ( x " , y " ) A (x = s) A ( / ( x , y) = 8l) A (f{x', y1) = si) A (y' = s2) A (x" = s2) (f(x",y") = t) -> c(s,t)); {a{x,y),a(x',y'),a{x",y") A ( / ( x , y ) = s) A (y = s : ) A (x' = S i ) A ( / ( x ' , y ' ) = s2) A ( X " = s2) A (f(x",y") = t) -> c ( S , t ) ) ; ( a ( x , y ) , a ( x ' , y ' ) , a ( x " , y " ) A ( / ( x , y ) = s) A (y = s i ) A (x' = si) A ( / ( x ' , y ' ) = s 2 ) (x" = s 2 ) A ( / ( x " , y " ) = t) -> c ( s , i ) ) ; ( o ( x , y ) , a ( x ' , y ' ) , a ( x " , y " ) A (x = s) A ( / ( x , y ) = si) A (x' = sa) A ( / ( x ' , y ' ) = s2) A (f(x",y") = s 2 ) A (y" = i ) - c(s,t)); {a{x,y),a(x',y'),a(x",y") A (x — s) A ( / ( x , y ) = s i ) A ( / ( x ' , y ' ) = si) A (y' = s 2 ) A (f(x",y") = s2) A (y" = t) -> c ( s , t ) ) ; ( a ( x , y ) , a ( x ' , y ' ) , a ( x " , y " ) A ( / ( x , y ) = s) A (y = si) A (x' = S i ) • A ( / ( x ' , y ' ) = s2) A (f(x",y") = s2) A (y" — t) A c(s,t)); (a(x,y),a(x',y'),a(x",y") A ( / ( x , y ) = s) A (y = s i ) A (x' = S i ) A ( / ( x ' , y ' ) = s2) A (f(x",y") = s2) (y" = t) A c(s,t)) 96 1 Appendix B. Details for Example 21 } Step three (add quantifiers): ^{ 3 / V x V y V x ' V y' V x" V y " V s V S, V s2 V r- (a (x , y),0(0;', y ' ) , a ( x " , y " ) A (x = s) A ( / ( x , y ) = si) A (x' = s i ) A (f(x',y') = s 2 ) A ( x " = s 2 ) A ( / ( x " , y " ) = i ) - c M ) ) ; 3 / V x V y Vx' V y ' Vx" V y " V 4r V sx V s 2 V t (a(x, y ) , a ( x ' , y ' ) , a(x", y") A (x = s) A (f(x,y) = s i ) A (f(x',y') = sx) A (y ' = s 2 ) A ( s " = s 2 ) A ( / ( s " , y " ) = 0 - * c ( M ) ) ; 3 / V x V y V x ' V y ' V x " V y " V s V s i V s 2 V i ( a ( x , y ) , a ( x ' , y ' ) , a ( x " , y " ) A ( / ( x , y ) = s) A (y = Sj) A (x ' = s j ) A ( / ( x ' , y ' ) = s 2 ) A ( x " = s 2 ) A ( / ( * " , y " ) = t ) - » c ( S > t ) ) ; 3 / V x V y V x ' V y ' V x " V y " V s V s i V s 2 V i ( o ( s , y ) , o ( s / , y ' ) > a ( x " ) y " ) A ( / ( x , y ) = «) A (y = s i ) A (x ' = si) A ( / ( x ' , y ' ) = s 2 ) A ( x " = s 2 ) A (/(a:"-,y") = t ) - » c ( B | t ) ) ; 3fVx\/yVx'Vy'Vx"Vy"VsVsi Vs2Vt (a(x,y),a(x',y'),a(x",y") A (x = s) A ( / ( x , y ) = S i ) A (x ' = s i ) A ( / ( x ' , y ' ) = s 2 ) ( / ( x " , y " ) = S 2 ) A ( y " = i ) ^ c ( S , i ) ) ; 3 / V x V y V x' V y ' V x " V y" V s V si V s 2 V t (a (x , y ) , a ( x ' , y'),a{x",y") A (x = s) A ( / ( x , y ) = S l ) A ( / ( x ' , y ' ) = si) A (y ' = s 2 ) A ( / ( x " , y " ) = s 2 ) A ( y " = i ) - > c ( M ) ) ; 3 / V x V y V x ' V y ' V x " V y " V s V ' s i V s 2 V i ( a ( x , y ) , a ( x ' ! y ' ) , a ( x " , y " ) A {f{x,y) = s) A (y = s i ) A (x ' = s : ) A ( / ( x ' , y ' ) = s 2 ) A ( / ( x " , y " ) = s 2 ) A ( y " = t ) - » c ( M ) ) ; 3 / V x V y V x / V y ' V x " V y " V s V s i V s 2 V i ( a ( x , y ) , a ( x ' , y ' ) , a ( x " , y " ) A ( / ( x , y ) = s) A (y = s x ) A (x ' = Sl) A (f{x',y') = s 2 ) A (f(x",y") = s2)A(y" = t)^c(S,t)); } T h e who le set i n step three is the resu l t o f the c o m p o s e d m a p p i n g MapA_c of Map AM a n d MapB.c- C o m p a r e d to the resu l ts g i ven by P i -a z z a i n E x a m p l e 18, th is resu l t is m u c h too c o m p l i c a t e d a n d h a r d to read . 97 

Cite

Citation Scheme:

        

Citations by CSL (citeproc-js)

Usage Statistics

Share

Embed

Customize your widget with the following options, then copy and paste the code below into the HTML of your page to embed this item in your website.
                        
                            <div id="ubcOpenCollectionsWidgetDisplay">
                            <script id="ubcOpenCollectionsWidget"
                            src="{[{embed.src}]}"
                            data-item="{[{embed.item}]}"
                            data-collection="{[{embed.collection}]}"
                            data-metadata="{[{embed.showMetadata}]}"
                            data-width="{[{embed.width}]}"
                            data-media="{[{embed.selectedMedia}]}"
                            async >
                            </script>
                            </div>
                        
                    
IIIF logo Our image viewer uses the IIIF 2.0 standard. To load this item in other compatible viewers, use this url:
https://iiif.library.ubc.ca/presentation/dsp.831.1-0051733/manifest

Comment

Related Items