Open Collections

UBC Theses and Dissertations

UBC Theses Logo

UBC Theses and Dissertations

Schema reintegration using generic schema manipulation operators Sun, Xun 2006

Your browser doesn't seem to have a PDF viewer, please download the PDF to view this item.

Notice for Google Chrome users:
If you are having trouble viewing or searching the PDF with Google Chrome, please download it here instead.

Item Metadata

Download

Media
831-ubc_2006-0666.pdf [ 21.92MB ]
Metadata
JSON: 831-1.0051766.json
JSON-LD: 831-1.0051766-ld.json
RDF/XML (Pretty): 831-1.0051766-rdf.xml
RDF/JSON: 831-1.0051766-rdf.json
Turtle: 831-1.0051766-turtle.txt
N-Triples: 831-1.0051766-rdf-ntriples.txt
Original Record: 831-1.0051766-source.json
Full Text
831-1.0051766-fulltext.txt
Citation
831-1.0051766.ris

Full Text

Schema Reintegration Using Generic Schema Manipulation Operators by X u n Sun B . S c , Concordia University, 2004 B . E . , J i l i n University, 1997 A T H E S I S S U B M I T T E D I N P A R T I A L F U L F I L M E N T O F T H E R E Q U I R E M E N T S F O R T H E D E G R E E O F Master of Science in The Faculty of Graduate Studies (Computer Science) The University Of Br i t i sh Columbia September, 2006 © X u n Sun 2006 Abstract Schemas are very likely to be changed from time to time because of re-quirement changes, design revisions, database migration and such scenarios. Especially in a multi-user environment, schemas may be used by different groups or people and are modified to different versions. A t some point in time, it is necessary to reintegrate the different versions and have a final unique version of the schema. Basically, the problem of creating the unique version is to merge the modified schemas. Previous works on schema merg-ing describe how to merge two schemas given the mapping between them. In those works, the two schemas need not come from a common source schema. In fact, the original schema is often unavailable. However, in our scenario, we have the original schema, and we use it in decision-making. This s impli -fies what might otherwise be a complex matching procedure. We attempt to find a generic solution to the schema reintegration problem (i.e., when the original schema is present). In this thesis, we created a framework that implemented the schema re-integration algorithms using generic model management operators. These generic operators have been widely mentioned and explained abstractly in many previous papers. Here we have implemented each operator required, which necessitated formally defining and creating the algorithm for each operator used for this specific schema re-integration purpose. Our contri-butions are: (a) determining that the generic operators can be used for schema reintegration, and (b) designing, implementing, and analyzing the model management operators used in details. Table of Contents Abstract " Table of Contents iv List of Tables v i i i List of Figures ix Acknowledgements x i i i 1 Introduction 1 1.1 Motivation 1 1.2 Problem Statement 5 1.3 Contributions . . 5 1.4 Thesis Outline 6 2 Background 7 2.1 Generic Model Management 7 2.1.1 Model 8 2.1.2 Mapping 11 2.1.3 The Inclusion Rules . 1 3 2.1 A Generic Model Management Operators 16 2.2 Generic Merge 18 2.2.1 Rondo 19 2.3 Three-Way-Merge 20 2.4 Summary ,28 3 Operators 30 3.1 Operator Merge 30 3.1.1 Input 31 3.1.2 Output 32 3.1.3 The Merge Algorithm 32 3.2 Operator Diff 36 3.2.1 Input 37 3.2.2 Output 39 3.2.3 Algorithm 41 3.2.4 Related Work 43 3.3 Operator range 46 3.3.1 Input • 46 3.3.2 Output 46 3.3.3 Algor i thm , 47 3.3.4 Related Work 48 3.4 Operator Apply 49 3.4.1 Apply 50 3.4.2 Function T in Reintegration . 51 3.4.3 Related Work 53 . 3.5 Operator Match 54 3.5.1 Input 54 3.5.2 Output 55 3.5.3 Algor i thm 55 3.5.4 Related Work 59 3.6 Summary 59 4 Support Elements 60 4.1 Definition 62 4.2 Usabil ity of Support Elements 63 4.3 Support Elements Adding Algorithms 65 4.3.1 Algor i thm to F i n d Which Elements Need Support . . 66 4.3.2 E N S Representatives 68 4.3.3 Support Elements Adding Engine and Algorithms . . 69 4.4 Discussion 78 4.4.1 The Root Element 78 4.4.2 Support Elements in Input 79 5 Reintegration System 80 5.1 Reintegration (Three-Way-Merge) Algor i thm • . . 80 5.2 Discussion of Problems in Reintegration System . 82 5.2.1 N u l l Model 82 5.2.2 Support Elements in Input 84 5.2.3 Model Validation 86 5.3 Summary 87 6 Experiment 89 6.1 Operator Experiments 91 6.2 Support Element Engine Experiments 92 6.3 Reintegration System Experiments . . 94 7 Conclusion and Future Work 98 7.1 Conclusion 98 7.2 Future Work 99 Bibliography 101 A Reintegration System Example 104 List of Tables 2.1 Three-Way-Merge Rules 22 2.2 Three-Way-Merge Example 27 3.1 Element Correspondence for Each Operator 57 6.1 Experiments Performed for Each Operator 92 List of Figures 2.1 Different relationship kinds 9 2.2 Model Product 10 2.3 ModelMapping Product 13 2.4 Illegal Model 13 2.5 Morphism in Rondo 20 2.6 Three-Way-Merge Example [14] 21 2.7 Three-Way-Merge Result 21 3.1 Mapping between Model A and Model B 31 3.2 Merge result of Model A , Model B and Mapping 31 3.3 A Mapping between Model A and Mode l B, Model A is the Preferred Model 34 3.4 Mapping between Model O and Model B. Element c is not mapped 37 3.5 Diff result from Figure 3.4, assuming that O is the base model. Element a is shaded as support element 38 4.1 Support Elements Mapping Example 60 4.2 The Diff result in Figure 4.1 wi th the support elements needed to make it a valid model. Support elements are visualized as shaded nodes throughout this thesis 63 4.3 Mapping between a Model wi th a Support Element and a Normal , Element 65 4.4 Elements Need Support & Representatives 68 4.5 Greedy Algorithm Example 76 4.6 Redundancy in Greedy Algorithm: B — E are candidate sup-port elements. The Greedy Algor i thm wi l l add al l of them as support elements, but D is not necessary 76 5.1 Support element c in Input model, which is not needed in Merge result 85 6.1 Model Reintegration System 90 A . l The Original Model MO 105 A.2 The First Modified Model MA 106 A .3 The Second Modified Model MB 107 A.4 The Mapping MapoA between Model MO and Model MA . 108 A.5 The Mapping MapoB between Model MO and Model MB . 109 A.6 The Model Map'OA from Apply(Map0A) . 1 1 0 A . 7 The Model Map'OB from Apply(Map0B) H I A .8 The Model ChangedA 112 A .9 The Model ChangedB (A N U L L model) 113 A . 10 The Mapping MapchangedBChangedA between Models ChangedB and ChangedA 114 A . 11 The Mapping MapchangedAChangedB between Models Changed A and ChangedB 115 A.12 The Difference Model A' between Models ChangedA and ChangedB 116 A.13 The Difference Model B' between Models ChangedB and ChangedA 117 A . 14 The Mapping MapAB between Models MA and MB 118 A . 15 The Merged Model G from Models MA and MB 119 A . 16 The Mapping MapoA' between Models G and A' 120 A . 17 The Merged Model GA' from Models G and A' 121 A . 18 The Mapping MapcA'B' between Models GA' and B' . . . . 122 A . 19 The Merged Model GAB from Models GA' and B' 123 A.20 The Model DeletedA Showing The Difference between Models MO and MA 124 A.21 The Model Deleteds Showing The Difference between Models MO and MB 125 A.22 The Mapping Model MapoACB between Models DeletedA and ChangedB 126 A.23 The Mapping Model MapoBCA between Models Deleteds and ChangedA 127 A.24 The Difference Model ShouldDeleteA between DeletedA and Changeds 128 A.25 The Difference Model ShouldDeleteB between Deleteds and Changed A 129 A.26 The Mapping Model between Models GAB and ShouldDeleteA 130 A.27 The Difference Model GABSDA between Models GAB and ShouldDeleteA 131 A.28 The Mapping Model between Model GABSDA and Model ShouldDeleteB 132 A.29 The F i n a l Result Model Result, Difference between Models GABSDA and ShouldDeleteB 133 ) Acknowledgements I would like to thank Rachel A . Pottinger for being my supervisor and supporting me on finishing this thesis. She provided great guidance and help on my work. I would like to thank Dr . E d w i n M . K n o r r for being my second reader and giving me great comments on this thesis. I would like also to thank all the people at database management lab and the University of Br i t i sh Columbia. I have really enjoyed the time working with them. Finally, I would like to thank my wife Huifen L i , my daughter Cecil ia Sun and all my family members for their supports. I couldn't have accomplished this thesis without them. Chapter 1 Introduction 1.1 M o t i v a t i o n Database schemas are frequently modified and updated because of require-ment changes, design revision, environment changes, customers' require-ments changes, project re-engineering, etc. Additionally, in a teamwork environment, different teams may work remotely from each other and would also potentially make changes on database schema structures. Such changes can cause problems for al l users of the database. Let's consider the following scenario: Company O has designed a product management system that includes a schema named "product" with the following data definition: C R E A T E T A B L E P R O D U C T S ( ProductID int P R I M A R Y K E Y , ProductName string, Brand string, Quantity real, UnitPr ice int This schema is used by two groups, A and B , separately to develop the system. Producto is the original schema and Product A and Product B are the schemas as modified by groups A and B respectively. Dur ing the de-velopment, each group makes a lot of changes, both on the design of the system and the database. Comparing with Producto,the main differences in schema Product A is that the property Brand has been deleted: C R E A T E T A B L E P R O D U C T S ( ProductID int P R I M A R Y K E Y , ProductName string, Quantity real, UnitPr ice int ); While , in schema Products, a new property Discount has been added. C R E A T E T A B L E P R O D U C T S ( ProductID int P R I M A R Y K E Y , ProductName string, Brand string, Quantity real, UnitPr ice int, Discount real ); The changes made by groups A and B fit their local needs. After some time, however, the two parts need to be integrated. Whi le the schema and data from each group are very similar, they cannot be reintegrated easily. Company O has the original schema Producto, which is a baseline to com-pare the changes that each subsidiary has made and match the components that are from the original one. Based on this init ial schema, we expect to generate the following schema as the integration result: C R E A T E T A B L E P R O D U C T S ( ProductID int P R I M A R Y K E Y , ProductName string, Quantity real, UnitPr ice int, Discount real ); Previous research shows that in reintegration scenarios the goal is to keep all changes [14]. Here the attribute Brand has been deleted as in schema A ; it should not appear in the final version schema. Because the new attribute "Discount" appears in the version B , it is believed that the new property is valuable and should be added in the final version. \ Reintegration has been researched in various scenarios before. In [12], the authors show a flexible object merging framework that defines the merging policy targeting to different applications and the context of the collabora-tive activities, so that the reintegration process can be done in automatic, semi-automatic, and interactive ways. It also tries to be generic to suit objects wi th arbitrary structure and semantics. In [2], the authors give a framework describing how to synchronize file systems. It focuses on how to resolve update conflicts. Reintegrating a small number of small schemas can be done manually. However, if the schemas are more complicated and there are many schemas, it is not efficient and productive to do it manually. It is necessary to have a way to solve it programmatically. Some previous works, such as [11], can do reintegrations programmatically. However, these solutions have to be reprogrammed for each data model, and the operations that are done are not always generalizable to other schema operations. Schema management can usually be abstracted as Model Management [4][1]. A Model is not bounded to any kind of specific schema, but is con-sidered to be.a general form for schema integration. In Model Management, we use generic model management operators to manipulate the models. It wi l l be more valuable to use a generic way to achieve the reintegration goal, so that it can also be used in other scenarios that can also be represented by model management, such as reintegrating different versions of U M L ' s or E R diagrams. Frameworks and algorithms developed for Model Management are generic and can be applied to a variety of data models. 1.2 P r o b l e m Statement In this thesis we focus on analyzing and presenting the problem of reintegra-tion of models. Given an original model MO and two modification versions MA and MB, where MA and MB are generated from MO by using model management operators which provide History properties showing the map-pings from MA and MB to MO, we generate an integrated model MG that takes into consideration the changes made in both MA and MB. In [14], the authors have given an algorithm for the steps necessary to do wi th such reintegration. Here we solve the problem programmatically by analyzing and implementing the details of each operator and solving the conflicts that arise in each operator and each step. 1.3 Contributions We have the following contributions in this thesis: • We analyzed the generic model management operators that are used in the schema reintegration system and gave details on the algorithms for implementation. • We discussed the details of "Support Element" that are used to help the integrity of models. We give three algorithms -to select support elements based on different expectations. The "Support Element E n -gine" is generic and can be used by any model management operator. • We fully designed and implemented the Three-Way-Merge system that can be used for Schema Reintegration purposes. 1.4 Thesis Outl ine The remainder of this thesis is organized as follows: Chapter 2 gives back-ground knowledge on generic model management, generic merge and the three-way-merge algorithm. In Chapter 4, we discuss the support elements and the algorithms used to add support elements to models. In Chapter 3, the operators used in the reintegration system are formally defined and analyzed, including Diff, Range, Apply and Match. In Chapter 5, the Three-Way-Merge algorithm and the Schema Reintegration system are ex-plained. In Chapter 7, we talk about the experiments and give conclusions and suggestions for future work. Chapter 2 Background In this chapter, we describe previous works on generic model management, model manipulation operators and merging methods. 2.1 Generic M o d e l Management There are many different kinds of schemas: relational, X M L , etc. Schema manipulations to schemas share many commonalities to problems related to objects and relationships operations outside of databases. For example, U M L , E R diagrams, file systems and ontologies also present a k ind of schema structure that can be used to generate different kinds of data models and designs. When there are different modified versions of these kinds of objects, the reintegration process is very similar to the problem of database schema reintegration. The issues all relate to the objects and the mappings between them, and how to resolve conflicts and keep updates from different versions. If the research of schema manipulation is only restricted to database schema, it would lose the potential to solve similar problems, such as those of E R and the other models mentioned above. Therefore, previous research of Schema Manipulat ion has been abstracted to a higher level: the Model level [1] [5] [4]. A Model is denned as a complex application artifact. It is usually represented as a directed graph and it can abstract and represent many applications, including relational schemas, X M L D T D s , ontologies, U M L s , file systems, and network flows. This abstraction generates an 0 0 like style of data model and platform. Therefore, it can utilize some benefits of 0 0 design and it is generic enough to be applied to any specific data model, including schema management. 2.1.1 M o d e l Models [5] [14] [11] can be represented with graphs. The graph is composed of elements and relationships between these elements. Each element contains a set of properties, which describe the detail of the element, such as the name, constraints, or any needed information of these properties. Each element has a required property Name, ID and History. Property Name is used to represent the name of the element. ID is used to uniquely identify the element. The History property records the last operations to the element. In other words, it shows where and how this element is generated. For example, one element may have a history property of: "diff(300012)", meaning this element is generated by applying the diff operator to the first element with ID: 300012. Note that the element must have another unique ID different from 300012. A relationship is a binary link between two elements and must be one of following five types: Associates, Contains, Has-A, Is-A, and Type-Of. Associates Contains Has-a ls*a Type-of Figure 2.1: Different relationship kinds These five relationship types are illustrated in Figure 2.1. The relationship types and Figure 2.1 come from [14]. More specifically, the semantics of the relationship types are as follows: 1. Associate, A(x,y), is a weak relationship. It simply expresses that if x Associate y, then they have a very weak relationship; it implies no restrictions to the other as shown in Figure 2.1(a). 2. Contains, C(x,y), means x Contains y. It shows a container - x and containee - y relationship. The existence of the containee relies on the existence of the container. For example, in Figure 2.1(b), if table is deleted, then attribute Bob cannot be kept either. Contains is transitive and acyclic. 3. Has-a, H(x,y), means x Has-a sub-component y. It is similar to Contains in the sense that it also expresses the hierarchy of com-ponent vs. sub-component. The difference is that it can be cyclic and the sub-component does not need to be deleted when the component Figure 2.2: Model Product is deleted. For example, in Figure 2.1(c), if the relation deletes the key, the column can st i l l exist in the relation. 4. Is-a, I(x,y), means x Is-a specialization of y. It is very like the idea of inheritance in Object Oriented Design, which expresses a specialization relationship. In Figure 2.1(d), Student is a special kind of Person. Is - a is transitive and acyclic. 5. Type-of, T(x,y) means x's type is y. It expresses the type of an ele-ment. In Figure 2.1(e), it means the Street is the type of Column. It is required that one element can only have one Type-of relationship. In [14], there are more details of the one-type restriction. We can represent the product schema in our previous schema examples using a graph like Figure 2.2. For example, in Figure 2.2, element product has five Contains relationships and each subelement has a Type-of relation-ship. The Product Jd element also has an Is.a relationship showing that it is a primary key. 2.1.2 M a p p i n g One important representation in model management is the mapping, which shows the relationship between two models. Without the mapping, it is hard do anything to manipulate models. As in [3], we assume that a mapping is also a model, which means it also includes elements Emodei and relationships Rmodel that any model has. Moreover, it also contains: • Two models, Modeldamain and ModelTange. The mapping defines the relationship between them. • Mapping relationships Rmap- A mapping relationship r is between an element em e Emodei and an element e G Edomain U ETange. Ele -ments in domain and range that have Model Mapping Relation ships from the same element in map are related to each other. The type of mapping relationship can be either "Equal i ty" or "Simi lar i ty" . Here the expression of "Similarity" is different from the way in [14]. In [14], "Similarity" is presented as a "how related" property in the element. If the element is marked as a "Similarity" type, al l the elements in the domain or range that the element connects to are treated as similar to each other. Here we use the idea used in [13], such that the elements are the same, but the mapping relationships they connected to could have different types, "Equal i ty" or "Similar i ty" . In this way, whether the elements are equal or similar does not rely on the mapping ele-ment, but on each element in the domain and range separately. The same mapping element can be "similar" to one element and equal to another. This also frees the mapping elements from showing "how related". The mapping elements are the same as normal elements in a model and can exist in a mapping model without any mapping re-lationships linked. It makes more sense to the 0 0 design. In this way, "similarity" mapping relationships can be easily identified from "equality" mapping relationships at the relationship level. For example, Figure 2.3 shows a mapping between the original product schema (partial) and the version of group A . The domain model is the orig-inal product schema and the range model is the product schema of version A . The map includes ModelMapping elements that are used to match the elements in model domain and range. Each ModelMapping element has a M odelM apping-Relationship to one of elements in the domain and range models. The relation may be equality, meaning they are exactly the same; or similarity, meaning that they are similar but have differences. There may also be existing elements in the mapping model that are not model map-ping elements, but normal model elements to express some structure of the mapping. In summary, a mapping defines how the domain and range models are related to each other. The ability to name both mapping relationships and all other relationships in the mapping means that each element in the map-ping is not required to be the origin of a mapping relationship. However, the mapping must be a model. To do this, it must be adhere to the inclusion rules that follow in section 2.1.3. Domain (original) Mapping Range (version A) Figure 2.4: Illegal Model 2.1.3 T h e Inclusion Rules In order for the operators to be composable, the result of any operator must be a model. To be a model, all elements must be connected by relationships showing that the elements are members of the model. It is not always true that al l relationships among a number of elements result in a valid model. For example, Figure 2.4 shows an example that even though all the three elements are connected by relationships, they do not construct a legal model. If an invalid model is used as the input of some operators, the result may be unpredictable and error-prone. In model management, the completeness can be verified using inclusion rules [14]. B y saying legal, it means that al l the elements and relationships can be covered using the inclusion rules and there is no dangling element or relationship. The inclusion rules are based on the relationship types in the model. Each model has a unique root element, eR, and all the other elements, E L , have direct or indirect relationships, Re, to the root element. The rules include: • The root element, CR, is always in the model. • I(p, q),p G EL —• q G EL- If p has a Is-a relationship to q, and p is an element of Model , then q is also in the Model . • H(p,q),p G EL —> q G EL- If p has a Contains relationship to q, and p is an element of Model , then q is also in the Model . • T(p,q),p G EL —> q G EL- If p has a Type-of relationship to q, and p is an element of Model , then q is also in the Model . • p G EL,q G EL —> R{p,q) G ReL- If P , 9 are al l elements of Model , then all the relationships between p and q are in the Model . • M(p,q),p G EL -> M{p,q) G i ? e L : If V is an element of Model , then all the model mapping relationships that have p as the origin are in the model. Because the inclusion rules determine when an element is in a model by the relationships we need find all the relationships in a given model, includ-ing those relationships that can be deducted implicitly. The implications include those relationship types that are transitive and the cross — kind — relationship implications. Transitive relationship types include Contains, Has —a and Is —a. The cross — kind — relationship implications are: • If p (Type-of) q and q (Is-a) r =» p (Type-of) r • lip (Is-a) q and q (Has-a) r =*> p (Has-a) r • If p (Is-a) q and q (Contains) r => p (Contains) r • If p (Contains) q and q (Is-a) r ==> p (Contains) r • If p (Has-a) q and q (Is-a) r => p (Has-a) r Using these rules, we can find al l the explicit or implicit relationships in the model and then we can determine if a given model is legal or not. The inclusion rules can also be used to remove the redundant relationships that can be implied using existing relationships. We say element X is inclusion — implied by a relationship R (denoted II(R, X) if: (1) R(X, Y) and the inclusion rules state that if R(X, Y) and Y is in a model, then X is in the model, or, (2) R(Z,X) and the inclusion rules state that if R(Z, X) and Z is in a model then X is in the model. 2.1 A Generic M o d e l Management Operators There axe several operators in model management, such as Diff, Delete, Match , Extract , Domain, Range, Compose, Invert, Apply , Copy, and M o d -elGen [11] [14] [3]. These operators give a programmatic way to transform models in abstract levels. Here we briefly introduce the functionality of these operators, which wi l l be used to implement our generic schema rein-tegration. v • Diff - Dif f(x, y) = z: Takes two models, x and y, as input and returns model z, which contains the elements that are in x but not in y. Diff wil l be discussed in more detail in Section 3.2. • Delete - Delete(x,y) = z: Takes one model x and a set of elements y, and deletes elements y from x and returns the result model z. • Match - Match(x,y) — Mapz: Takes two models x and y and finds the mapping relationships between two given models and returns the mapping model Mapz. Match w i l l be discussed in more detail in Section 3.5. • Extract - Extract(x, y) = z: Takes one model x and a set of elements y, which contains partial elements of x, and selects those elements in y from x and relationships related to y and returns the result model • Domain - Domain(x,ymap) = z: Takes a model x and a mapping model y that has x as the domain of the mapping, and returns a partial model z of x that contains only the elements having mapping relationships in y. • Range - Range(x,ymap) — z: Takes a model x and a mapping model y that has x as the range of the mapping, and returns a partial model z of x that contains only the elements having mapping relationships in y. Range w i l l be discussed in more detail in Section 3.3. • Compose - Compose(Mi(x,y), M2{y,z)) = Ms(x,z): Takes two map-ping models M\ and M2, which are mappings between model x and y, and between y and z respectively. The operator Compose generates a new mapping model M3 between x and z. Whi le quite complicated in some cases, it is not complicated in our schema reintegration. More information on composition can be found in [3]. • Invert - Invert(Map(Mi,M2)) = Map'(M2,Mi): Takes a mapping model Map between domain model M\ and range model M2, swaps the domain and range roles in the mapping and returns a new mapping model Map' that has M2 as domain and M\ as range. • Apply - Apply(x) = y: Takes a model x and applies a function to each element in x to make changes according to the function and returns the modified model M'. Apply w i l l be discussed in more detail in Section 3.4. • Copy - Copy{x) = y: Takes a model x and return a new model y that has the same elements and relationships as x. • ModelGen - ModelGen(x) = y: Takes a type of model x and generates a new type of model y from x. For example, generate a relational schema from an X M L schema. 2.2 Generic Merge Given two models and the mapping between them, the generic merge gen-erates the model that includes al l the information from the two input models. There are several algorithms and implementations [14] [11] describing generic merge. The general steps are: First , generate the necessary elements. Basically, the elements are the set union of the two input models, where the mapped elements are treated as the same principle and are only represented using one element in the result. Each such representative element includes all the properties of the elements that it represents. Then, the relationships are merged. One key factor is that any relationships in the domain/range and the mapping should not be lost. A l l the elements, whether they are in the domain/range or the mapping, have representatives in the result model. A l l the relationships are between elements in the domain/range and mapping, so all the relationships can be preserved in the mapping result. However, if all the relationships are simply added to the final result, there may be conflicts existing, such as cycles or multiple types for an element. There may also be some existing redundant relationships that can be implied and induced from other relationships. So the final step is to check and fix these kinds of conflicts. 2.2.1 R o n d o Rondo [11] is the first complete prototype of generic model management sys-tem. Rondo systematically defines the key model management operators, such as Domain, Invert, Compose, etc, and suggests several new generic operators, such as Extract, Delete, etc. It also shows a schema integration system that can be used for relational database Schemas or X M L Schemas. The differences of our approach to Rondo's lie on the data structure. Rondo uses a new data structure to represent the mapping. It is called a morphism. A morphism is a binary relationship between two models. Figure 2.5 is an example from Rondo that shows a morphism between a relational schema and an X M L schema. In implementations, a morphism is a pair of elements. This is a convenient way to represent mapping relation-ships. However, in our framework, we have everything existing as first-class objects. The idea of a morphism violates the 0 0 principle. The existing of mapping relies on the models and cannot be presented by itself. Therefore, .we use a mapping model to present the mapping relationships. Furthermore, in our system, every step gives an output of a complete model, which is easy to maintain and extend. A l l of our operators have first-class objects as input and output. These operators can be easily used in any other generic model CREATE TABLE PRODUCTS( PID int, Pname varchar ) <schema xmlns = "..."> <complexTvpe name - "product"> <element name = "ProdudlD" type = "xs:int" /> <element name = "ProductName" type = "xs:string" /> •^ ftlpment name = "ProductType" type = "xs:string" /> </complexType> </schema> Figure 2.5: Morphism in Rondo management tasks. 2.3 T h r e e - W a y - M e r g e In this section, we review the Three-Way-Merge algorithm discussed in [14], which is the basic algorithm used in this thesis. The Three-Way-Merge algorithm merges a given model and two different modified versions of it into one model. For example, Figure 2.6 shows the example used in [14]. There is the original model O and two different modi-fied versions, model A and model B. The goal is to create a final model, L, that captures all of the changes in A and B. The modification of model A is that the element d changed its parent from the root to element b, while in Model B, the element c was deleted. When these models are to be merged, because element c is deleted in Model B, the final version should capture the deletion and exclude c. E l -ement d is modified in Model A; therefore, the final version should also Model A Model 0 C D C D C D \ / \ • v • C D C D Figure 2.6: Three-Way-Merge Example [14] Model L Figure 2.7: Three-Way-Merge Result capture the modification and change the parent element of d to b from a. The merged result model is shown in Figure 2.7. There are several rules targeting the different changes made to each modified version. Let A, B denote the two modified versions, O denote the original version, and L be the final merged version. The rules are shown in Table 2.1. It defines different situations that different versions may have and how the final version should be with regard to the element. For example, i n row 4, it means if the element is in O, it is deleted in one model (A here) and modified in the other model (B here), then in the final result L, the modified version should be kept, it is in the final model. Based on these rules, the three-way merge algorithm in [14] w i l l do the # ' O A B L 1 0 add add © 2 0 unmodified unmodified 0 3 O deleted unmodified 0 4 O deleted modified O 5 O modified unmodified O 6 O modified modified O Table 2.1: Three-Way-Merge Rules 0: element in model 0: element not in model following steps to merge models A, B and O. The result is the same as shown in Figure 2.7. Note that because we are only demonstrating a simple example, some steps may generate empty models. Step Operation Figure MapoA = Match(0, A) Mapping Cp ( p Cp (7^L (~p fP B y History Property » \ X 1 Mapcu ' V if' MapoB = Match(0, B) Mapping Gp CD Cp C p ^ C p CD f D \ s \ 0 * B y History Property Mapa MapoA' = Apply(MapoA) CD cp if e E MapoA and domain(e) is identical to range(e), then delete e Mapping f a 3 MapoB' = Apply(MapoB) He £ MapoB anddomain(e) is identical to range(e), then delete e 0 5 ChangedA = range(MapoA') ChangedA the things changed in A 6 Changeds = range(MapoB') the things changed in B 0 7 MapchA-ChB = Match(ChangedA,ChangedB) 0 8 MapchBJOhA = Match(Changeds, Changed A) 0 9 A' = Dif f (ChangedA, ChangedB, MapChA.ChB) A' CO ( a ) 10 B' = Diff(Changeds, Changed A , MapchB-ChA) 0 11 MapA_B = Match(A,B) According to OIDs Modal A Mapping M ^ B CD C D C D C p C p CD ~ - - Map»* Model G 12 G = Merge(A, MapA_B, B) C p CD CD 13 MapG_A' = Match(G, A') Model G Mapping C p CD C p Cjb CD / \ CD Mao % 14 GA Merge(G, MapG_A',A') Model G A with preference for A ' C p CD CD 15 MapGA'-B> = Match(GA',B') 0 16 GAB = Merge(GA',MapCA>-B',B') with preference for B ' Mode! GAB CD 17 DeletedA = Diff(0,A,Map0A) 0 Deleted B C O 18 Deleteds = Diff(0, B, MapoB) 19 MapDeletedA-ChangedB = Match{D el eted A , ChangedB) 0 20 MapDeletedB-ChangedA = Match(Deleteds, Changed A) 0 21 ShouldDeleteA = Dif/(DeletedA, ChangedB, MaprjeletedA-ChangedB) 0 SltouWDalBled G C D v 22 ShouldDeleteB = Dif f (Deleteds, ChangedA, MaprjeletedB.ChangedA) cb 23 MapGAB^DA = Match(GAB, ShouldDeleteA) 0 24 GABSDA Di f f(GAB, ShouldDeleteA, MapcABs Model GABSDA ShouldDtMoted B C p CEX & > s £ D T s / \ / CD DA) 25 MapcABSDASDB = Match(GABSDA, ShouldDeleteB) Model GABSDA ShouldDelated B c^c^x cb. c L cL - ^ 26 Fina l result — Diff {GABSDA, ShouldDeleteB, MapGABSDASDB) Model L Table 2.2: Three-Way-Merge Example In the original algorithm, it does not mention how to deal with the completeness problem of models. For example, in step 5, the range operator wi l l only keep the element d in the result. It is desirable to keep the result not being a simple element, but st i l l a model. Here the elements a and b are added to keep the structure of the model. In order to distinguish them from the result elements, they are marked as support elements. Not only could the operator range generate an incomplete model, other operators could also have the same problem and need support elements to help in the result model. A key factor is that because Rondo considers only morphisms, it does not use support elements. Support elements are discussed briefly in [3], but one contribution of this thesis is a fuller understanding of support elements. We discuss them more fully and describe our conclusions in Chapter 4. 2.4 S u m m a r y Model Management gives an abstract way to solve meta-data management problems. B y using graph theories, models can be programmatically ma-nipulated by the model management operators. The mapping between two models is important and used very often in most operators and applications. However, the presentation of mapping varies a lot in different systems. It can be a first-class object - a model; it can be a morphism that uses direct links between model elements; or it can be just some pairs without data structure support. Some have been re-searched deeply, such as match and merge. The other operators have been introduced and discussed, but few have details on the algorithm and few have been researched for the behavior of the operator and how they interact with each other. These operators are st i l l in an abstract level. Even if the idea is clear, the actual implementation varies depending on different k ind of applications that the operators use. In this thesis, we only use first-class objects (models) and the operators are implemented based on the require-ment of schema reintegration and functionality. Chapter 3 Operators In this chapter, we discuss the generic model management operators that are used in our reintegration system. As described Section 2.3, there are five operators used: Merge, Diff, Range, Apply, and Match. In Section 3.1, we briefly review Merge. We have fully analyzed and implemented the other four operators, which are discussed in this chapter. We discuss Diff in Section 3.2, Range in Section 3.3, Apply in Section 3.4 and Match in Section 3.5. 3.1 Operator Merge Merge has been fully described and implemented in [14] as discussed in Sec-tion 2.2. Because it is a generic model management operator, it can directly fit into our schema reintegration system as one of the operators. Merge works as follows: Given two models, M\ and M2, and the map-ping, map, showing their relationships, Merge generates a new model, MQ, that unions the two models without any duplicates and conflicts. For ex-ample, Figure 3.1 shows an input of Merge and Figure 3.2 shows the result Modal A Mapping Modei B q £ (V) Op C p v - — • •—flu. N . -"** Map*s Figure 3.1: Mapping between Model A and Model B Model G Figure 3.2: Merge result of Model A , Model B and Mapping model of Merge operation. 3.1.1 Input The Merge operator has three input models, M\, M2 and the mapping be-tween them, Map. Map defines how M\ and Mo. are related as described in section 2.1.2. One of M\ or M2 can be designated as the preferred model. When there are conflicts between M\ and Mj j , Merge w i l l follow the behav-ior of the preferred model to break the conflicts. 3.1.2 O u t p u t The Merge operator generates a model as output, which contains al l the information in Mi, Mo. and Map, but has duplicates eliminated and conflicts resolved. The duplicates are from the Map, which shows the relationship of elements in M\ and M2. 3.1.3 T h e M e r g e A l g o r i t h m The Merge algorithm [14] includes the following steps: 1. Initialize: Create a new empty model G as the result. 2. Elements: Group the elements from M\, M2 and Map based on the mapping relationships. If there is a mapping relationship R(ei,em), where G M i UM2 and em € Map, then and e m belong to the same group. In the original paper, em must be an Equality mapping element for ej and e m to be grouped together. The difference between Equality and Similarity is in the mapping elements. In this thesis, mapping elements are all the same; the relationship R differentiates between Equality and Similarity. Therefore, rather than requiring e m to be an Equality element, Merge requires that R is an Equality type mapping relationship. A l l the elements in M i , M2 and Map are divided into each group. Then for each group, create a corresponding element in G. 3. Element Properties: Intuitively, each element e created in G has a cor-responding group of element(s) Em from M\, M2 and Map. Element e has the union of the properties that all the elements in Em have, ex-cluding the History, ID and HowRelated properties. When the same property p is contained in multiple elements in Em, its value in e is determined following the order of Map, Preferred Model, Any Model. Formally, for a group of elements Em, a property p, and some elements e m £ Em have the property p, if there is an element emap 6 Map n Em that has property p, then the value, of p in e is same as the value of p in emap. If e m a p does not exist, it tries the same rule but from the preferred model, then the other model. Whenever p is included in more than one element in the same model, its value is chosen arbitrarily. For example, Figure 3.3 shows a simple mapping between model A and B, where model A is designated as the preferred model. Elements EA and E' of model A both map to the same mapping element EM of Map, which also maps to element EB in model B. Suppose elements E A , E' and EB contains the same property p, but different values, VA, v' and Vb respectively. When the two models are merged, elements E A , E', EM and EB belong to the same group. The corresponding element EG € G also contains property p. To decide the value of p, the algorithm first checks the mapping. Since the mapping does not have the property p, the algorithm wi l l further check the elements in model A, which is the preferred model. Because both element E and E' are in model A and contain property p, the algorithm w i l l arbi-trari ly choose one from them. Therefore, the resulting element Eg w i l l have a property p, whose value is either VA or v'.. Model A Mapping Model B L L Figure 3.3: A Mapping between Model A and Model B, Model A is the Preferred Model . The ID property is set to a new number assigned by the system. The History property is set to the value of "Merge(IDS)", where IDS includes all the IDs of the elements in the group Em. This gives the traceability of the elements in G to the original elements that it merges. If M i or M2 is a mapping, the element e is a mapping element given the corresponding elements in group Em, M i or M2, are also mapping elements. If e is a mapping element, its How — Related property is determined following the same selection order as the other properties. 4. Relationships: Each element in G has a corresponding group of el-ements in M i , M2 and Map. If there exists a relationship R(e,f), where e and / are from different groups, a same type of relationship R'(e', / ' ) is created in G, where e' and / ' are elements in G and cor-respond to e and / . Mapping relationships wi l l not be created in G since if element e\ and e2 have a mapping relationship, they would have been in the same group and the mapping relationship between them would not be copied to G. Since there is no "similarity mapping element" any more in the system, the steps to process the similarity mapping elements are ignored. Finally, the relationships that can be applied from other relationships in G are removed. 5. Fundamental conflict resolution: The previous steps generate a model G that includes the duplicate-free union of the input models. How-ever, there may be some conflicts existing in G at the meta-meta-model level. This step resolves the conflicts according to the rules and strate-gies that have been specified. For example: in Figure 3.1, Mapping shows how Model A and Model B are related. Basically, the elements are matched by names shown in the figure. Figure 3.2 shows the result after the algorithm is executed upon the mapping. Note that there was a has-a relationship between element a and d after all relationships are added based on the input models. However, since the has-a relationship from a to d can be implied from a to 6 and b to d, this relationship is deleted in the final result. To build the reintegration system, we must extend the algorithm to take into consideration the use of support elements mentioned in Chapter 4. The Merge operation usually does not need extra support elements for the merged result elements if the input models have no support elements. However, if the input models have support elements, they need to be taken care of in the result model. According to the merge algorithm, each element e in the result model G has a corresponding group of elements Em which are from the input models. If al l the elements in Em are support element, the element e may act as a support element in G as well. It may not be necessary and may need to<be deleted. Therefore, it w i l l be marked as "delete" by creating a property of "delete=true" and then when the model is passed to the "Support Elements Engine" mentioned in Chapter 4, the engine wi l l decide if it wi l l be a support element or be deleted. 3.2 Operator Diff The Diff operator is used to compute the difference between two models, given the mapping between the two models. Intuitively, it w i l l find all the objects in the first model, that we refer to as the base model, that do not have corresponding matching in the second model. As in Merge, the map-ping should have been created by some other operators, such as match, and is assumed to be given. In previous works, such as [11] [3] [5], the Diff operator is either men-tioned as a combination of other operators, or using extra mapping to show which elements are the Diff result and which are not. In this thesis, we give a detailed definition of Diff and the algorithm to compute it . We.also use the support elements to make the result a self-contained model. Diff works as follows: suppose there, are two models M i and M2. The Model O Mapping Ma pen Figure 3.4: Mapping between Model O and Model B. Element c is not mapped relation between them is also provided, which is presented as a mapping map. Formally, Dif f(M\,Map,M2) — Md finds the elements that appear in the base model M i but not in model M2 based on map. The result wi l l be a new model named Md-3.2.1 Input The Diff operator has two input models, M\, and M2, one of which has been denoted as the "base" model (say, M i ) , i.e., the model which the re-sult model is based on, and the Mapping map . The mapping map tells how the model M i is related to M2. Without knowing map, the computation wi l l be meaningless. The format of the mapping is covered in depth in section 2.1.2 To see how Diff works, we present an example in Figure 3.4. Figure 3.4 Deleted B Figure 3.5: Diff result from Figure 3.4, assuming that O is the base model. Element a is shaded as support element shows a mapping between two models, Model O and Model A, where we assume that Model O has been designated as the base model. Note that ele-ments b and d of Model O are matched to corresponding elements in Model A w i th the "Equal i ty" mapping relationship. Therefore, these two elements should not be part of the result model, since both elements appear both in the base model and in the other model. The element c has no mapping relationship, and it w i l l be part of the result model. Sometimes, elements may have a mapping relationship with some elements in the mapping, but do not match to any other elements in the other model. These elements should also be included in the result model. The result is shown in Figure 3.5. A "Similarity" Model Mapping Relationship means that the Model Ele -ment e\ is similar to the corresponding element based on the mapping. Such elements need to be kept in the result model to show the difference. 3.2.2 O u t p u t The expected output result wi l l be a model Md , which includes all the elements that are in the model M i , but not involved in any "Equal i ty" type Model Mapping Relationships with any Model Mapping Element in the mapping map that has Model Mapping Relationship wi th elements in the other model. Therefore, intuitively, the result model Md is a sub-model of the input base model M\. Each Model Element Ed in Md is generated from some element E\ in M i . Therefore, Ed has the same properties of E\ and those relationships of E\ such that the other elements of the relationships are also in Md. Furthermore, because the result model Md contains Model Elements and Relationships that correspond to only some elements and relationships in M i , the init ial result may not be a complete model. As described in the [14], al l the elements and relationships in a model must conform to the i n -clusion constraint: an element or relationship is included in the base model if and only if there is a path of containment from the root to that element or relationship. Therefore, some elements may not be included in Md af-ter applying all the inclusion rules (i.e., searching for a containment path). Clearly, we would like our result, M ^ , to be a valid model, both from the perspective of wanting to make sure that the operators are composable and from the perspective of wanting to make sure that there is enough informa-tion to solve future problems using Md- To solve the integrity problem, the model is processed to "Support Element Engine" as described in chapter 4. Algorithmically, this means that we have several phases to the algorithm. To compute the different elements in Md, we first delete all the elements that have matching element in map; this corresponds to the elements (and relationships) that are truly in the difference of the models, but this result is not a model. Then we add back some elements and relationships as the support elements. However, because our model is not a tree (i.e., there can be more than one containment path from the root to a given node), we have to decide which deleted elements should be added to model Md as the sup-port elements. This can be done using the algorithms described in Chapter 4. To summarize, the result model Md includes the following: 1. Elements: For each element e\ G M\ s.t. there is no ei G M2 and BM G map s.t. Me(eM,ei), Me(eM,e-2), 3 a new element e<i G Md, s.t. ed has a new ID, and inherits the name and al l the properties except the History property of e\ . Give the new History property of as r > i / / ( d ) . • History property. Each result element w i l l have a new History property, wi th the value Dif f(ID(e\)) Because the new History has the original element ID, we can easily trace back to the orig-inal and find the history of this element. 2. Relationships: For each relationship Rd{&\^2) € M\, e\,e'2 G Md, s.t. ID (History (e\)) = ID(ex) and ID(History(e'2)) = ID(e2), 3R'd G Md and R'd conform to the inclusion rules. The relationships of the result model Md are all inherited from M i , which means each relationship R'd G Md must have a corresponding relationship Rd G M\ and the origin/destination element of R'd corresponds to the ori-gin/destionation element of Rd- However, not all the relationships of M\ have corresponding relationships in Md- Only those corresponding relationships of Mi that have both its origin elements and destination elements (including different elements and support elements) included in Md and conform to the inclusion rules are included in the result model Md-3. Supporting elements: As described above, supporting elements are those nodes added to support the model integrity of the result ele-ments. These elements wi l l include an extra property, named "Sup-port" with value true, to indicate that they are support elements, but not part of the real result elements. They are added as described in Chapter 4. 3.2.3 A l g o r i t h m To compute the Diff result, the algorithm includes the following steps: 1. Duplicate: Make a full copy of model M i , including all its elements and all relationships in M i (i.e., we do not include relationships that are incident on elements in M\ but are not in M\ according to the inclusion rules). Note that the mapping relationships in Map are not part of M\, and therefore they are not copied. However, it is st i l l necessary to keep the information about which elements correspond to those elements e € Mi that have mapping relationships Me(x,e), where x e Map and there exists some element e2 S M2 s . t .M e (x ,e 2 ) . Let the set of such e elements = Eremave, which are the elements set to be deleted. To show which elements have mapping relationships, a list, L, is built to include all the elements in E r e m o v e . The new model is called Mc. 2. Marking: For each element e € E r e m o v e , e is marked as "delete". These elements are not in the result elements set, but may appear in the model as support elements. 3. Support Elements: G iv ing the model Mc to the support element en-gine to check which elements should be added as support elements. Those support elements that are decided by the support element en-gine are marked "support" and do not have the "delete" mark any more. 4. Delete: For each element e 6 Mc that has the "delete" mark, e is deleted from Mc and all the relationships that have e as either an origin or a destination are also deleted. 5. Finally, return Mc. 3.2.4 Related W o r k There axe several papers that mention the diff operation. They are summa-rized as follows: • Rondo[ l l ] : There is no explicit definition of diff , but it uses All(sl) — Domain(sls2) to do the same job as diff if the two models are s i and s2. This is described as: al l elements of s\ without the matched (and thus not deleted) element. The result is a new schema and a mapping between the result schema and the original schema, which describes how these two are related. In Rondo, modules are mapped using Morphisms, which are just binary relations. For example, the following shows two schemas: sl.Orders(Oid, OrderDate, Employee, Customer, PONum, SalesTaxRate) s2.0rders(Oid, OrderDate, Customer, PONum, SalesTaxRate, ShipDate FreightCharge, Rebate) In Rondo, the mapping is represented using the following morphism: s i s2 O i d O i d OrderDate OrderDate Customer Customer P O N u m P O N u m SalesTaxRate SalesTaxRate Then the operation combination All(sl) - Domain(sls2) would be the whole elements set in s i , which include al l the attributes of s i , minus the left part of the morphism, and the result is simply computed by set operations. This result elements set itself cannot represent the data structure. The result elements set has to be combined with the original schema to show a meaningful result, so that it can be used in other operators. Rondo does not mention how the "-" operator is implemented. It does not mention support nodes either. It uses mapping (morphisms) indicating how the deleted result module is related to the original module. Our method is more self-contained. • Vision[5]: Difference is believed to be basically the same as matching, except that the answer needs to highlight the differences. The diff operation is said to be a Fu l l OuterMatch. In Vision, it treats diff as a contrast of match. It does not explicitly describe the input and output of the diff. However, the desired result can be computed from some hints: In Vis ion, schemas are represented as first class objects; Mapping is also a first class object. In the description of Match , it tries to find how the domain and the range objects are related. Therefore, in the Differencing, it should be assumed that the main task is to find how the domain and the range objects differ. It is different from what we have described about diff, where we assume the mapping is in hand, but Vis ion somehow tries to find the mapping (and difference). • Apply ing Model Management to Classical M e t a Data Problems [3]: In this paper, the input is the model and the mapping. (M\,map\) and the result also include two parts, the objects that are not refer-enced in the mapping and a new mapping between the result model and the original model: (M[,map2)-Three problems are considered in this paper. 1. Root is always in -cluded in the result object. 2. To ensure the result model is a well-formed model, all the objects on the path of has-a relationships from the root to the result objects are also included. These added objects are support objects. 3. Use a new mapping between the original model and the diff result to mark the support objects. Mark ing support objects in the result is treated as introducing another structure (the marked model) and is avoided. Their paper defines the desired input and output of diff in detail. However, in our diff, we wi l l have different methods to add support elements, which wi l l ensure least elements are added. Moreover, we wi l l consider all the inclusion constraints rather than only the "hasA" relationships. Because in our system, we have created the structure of "property", we can utilize this structure to mark the support elements. Therefore, we do not need to add another mapping, which may otherwise make the system more complex. In the Diff operator of this thesis, we keep all the input models and out-put models as first-class models and we use "Support Elements" to keep the integrity of these models. Our Diff allows "Support Elements" appearing in both input models and output models. 3.3 Operator range Given two models, one being the range model and the other being the domain model, and the mapping between them, the Range operator finds the elements that appear in the range model of a mapping that have cor-responding mapped elements in the domain model. The mapping itself is presented as a model, which contains elements that correspond to the do-main model (Md), the range model (Mr) and other mapping elements (Em). The range model here is not the same as the result model (Mrange) by the Range operator. The Range result model Mrange only contains elements that refer to part of the range model Mr. A l l the elements in Mr that are referred by elements in Mrange must be associated in a mapping relationship with some elements in the domain model. We w i l l give more details when discussing the output part. 3.3.1 Input The map model includes the domain model, the range model and the map-ping elements, which describe the relationships between elements in the domain and the range. Therefore, the map model itself provides enough information for the range operator. It is the only input for the operator range. 3.3.2 O u t p u t The output result would be a new model that copies the range model in the input map model, but only those elements that have been "mapped" to some elements in the domain model are the result elements and the result elements would have new ID's. In order to keep the result as a complete model, some unnecessary elements are also needed to act as support ele-ments. Formally, for each element e € Mr, there is a corresponding element Grange G Mrange such that 3e0 G Mmap (the Mapping) , and 3ed G Mdomain (the domain model), and there exists model mapping relationships r\ and r2 such that r\ is between e0 and erange and r2 is between eQ and edomain-The relationships and support elements would be added in the same way as for operator diff. In each element, range(id(erange)) w i l l be added to the history property. 3.3.3 A l g o r i t h m The algorithm is very similar to the algorithm in the operator diff in that it also selects part of the model in map, but it selects the contrary part to what Diff selects. The range operator includes the following steps: 1. Duplicate: Extract the range model from the input mapping. Make a new model, Mr, which copies all the elements (with new ID's) and relationships of model range. 2. Mark : First ly , iterate al l the elements in Mr and mark them as "delete". For each element e G MT, find the corresponding element erange G range, such that e is generated by copying erange. If 3em G Mmap and ed G Mdomain such that there exists Model Mapping relationship r i (e m,&domain <md r2(em, e r a n o e ) , remove the delete mark of e. In this step, all the elements that should be in the result model are se-lected and al l the other elements are marked as "delete". 3. A d d Support Elements: Give the model MT to the "Support Element Engine" to check which elements should be added as support elements. Those support elements that are decided by the "Support Element Engine" are marked "support" and do not have the "delete" mark any more. 4. Delete: For each element e G Mr that has the "delete" mark, e is deleted from MT and all the relationships that have e as either an origin or a destination are also deleted. 5. Return: Final ly return model Mr. 3.3.4 Re lated W o r k Operator range was only previously discussed in Rondo[ l l ] . In Rondo, range is defined as Domain(Invert(map)). It firstly reverse the role of domain and range in the input map and then return the domain model. It is because the operator Domain is defined as a primitive operator and the operator range can be generated using existing primitive operators. Because the map in Rondo uses morphisms, in other words, the map-pings are element pairs between domain and range, the domain or range operator can then simply extract the needed half in the morphism. It is a straight-forward operation. The range in this thesis uses first-class models, so it needs to extract some elements that are really acting in the mapping. Here we also show the detailed algorithm and use "Support Elements" to make the result self-contained. The domain operator wi l l be exactly the same and only care about the schema in the other side of the mapping. 3.4 Operator Apply The operator Apply is different than the other operators. It is a generic method that takes effect on each element in the input Model . There is no specific target on how Apply affects each element. Instead, Apply provides a generic template for a function which affects the elements of some model. When Apply is used, it must bind to some real function, T, to perform the expected functionality. Therefore, operator Apply is like a delegate function that has T as a internal function pointer. In this section, we discuss both Apply and function T used in the rein-tegration system. 3.4.1 Apply 3.4.1.1 Input The input of Apply includes the following: • A Model , M, which contains the elements that are to be affected by the operator. • A n internal function, F, which is the real function that wi l l take effect on the elements in M. 3.4.1.2 Output The output of Apply is a Model , which is generated from model M, but al l the elements have been processed with function T. Formally, MA = Apply(M), and for each element eA € MA, 3eM G M such that eA = Y(eM)-3.4.1.3 Algorithm The algorithm for Apply includes the following steps: 1. Duplicate: Create a new model, MA, which copies all the elements, EQ (with new ID's) and relationships of model M. Update the History property of element e € MA as Apply(e0). 2. Execute: For each element e G MA, execute function T(e). Note that because the functionality of T is not unique, the result element may be changed in one of several ways. For example, it may be deleted. However, because the system always guarantees the integrity of the model, if the element is to be deleted by function T, it is marked as "delete" and w i l l be processed with the Support Element Engine to decide if it should be deleted or act as a support element. 3. A d d Support Elements: process the model MA with the Support Ele -ment Engine. The elements that are marked as "delete" but need to exist as support elements wi l l be marked as "support". 4. Delete: For each element e G MA that has a "delete" mark, e is deleted from MA and al l the relationships that have e as either an origin or a destination are also deleted. If MA is a mapping, the model mapping relationships of "support" elements are also deleted. 5. Return model MA-3.4.2 F u n c t i o n T in Reintegrat ion The function T that is created to be used by Apply in our Reintegration system only has one target: it checks the input mapping element e to see if the elements that e links to in the domain and range are identical. If they are the same, element e is deleted. Two elements, a G domain and b G range, are believed to be identical if: • They are mapped by a mapping element and "equality" mapping re-lationships (i.e., 3 an element m G Map, s.t. Me(m,a) and Me(m,b)). • They have all the same properties (both the name and the value), except for the "history" property. If each property p{X) of a and X is not ID or History, there is a property p'{X) of b and p{X) — p'(X). • Element a and 6 should have same kind of relationships, of which a and b are the destinations, and the origins should match each other. For each non-mapping relationship r(x, a) in the domain, there exists a relationship r'(y,b) in the range, where r' has the same type as r , and there exists a mapping element e in map, such that there exists mapping relationships Me(e,x) and Me(e,y). Then a and b are be-lieved to be identical. Here it only considers the relationships from the "parent", and if a child changed, it wi l l only be considered when the child element is inspected. Otherwise, if both the origins and des-tinations are checked, one change wi l l be caught in two places and be redundant. 3.4.2.1 Input The input is an element e of a Mapping Model M. It can be a normal element or a mapping element. 3.4.2.2 Output Strictly, the function does not have Output. Instead, it modifies the input element e and updates e with the proper action. If the function needs to Chapter 3. Operators delete e, it only mark it as "delete". 3.4.2.3 Algorithm The algorithm includes the following steps: 1. From the given model mapping model M, extract the domain and range in M - Md and MT respectively. For the given mapping element e, find the elements ed £ Md and e r £ Mr. 2. Comparison: • First ly , for each property pd £ e ,^ such that Name(pd) ^ "His-tory" , if 3 property pr £ er such that Name(pT) = Name(pd) and Value(pr) — Value(pd), continue; otherwise, mark e as "delete" and return. • Secondly, for each non-mapping relationship rd £ Md, such that destination(rd) = and origin(rd) = e0d {eod £ Md), check that 3 r r £ Mr and destination(rT) — er , origin(rr) = eor (eor £ Mr), such that 3em £ M and 3 model mapping relationship r\,r2 £ M, and destination[r{) = eQd, destionation[ri) — e^. If so, continue; otherwise, mark e as "delete" and return. 3.4.3 Re lated W o r k The operator Apply was first discussed in [3], where it gives the formal definition. The purpose of this operator is to reduce the need for application programs to navigate a model. The input function can define a purpose and the Apply operator defines the rules and algorithms to traverse the model. However, it does not have any implementation details. Here we have discussed the input, output and algorithms in detail, and the need to involve support elements. 3.5 Operator Match Match finds the mapping relationships between the elements in two input models. In this thesis, we do not try to solve the general matching problem, which is the subject of many other papers such as [15] [11] [3] [5]. The mapped elements are based on History properties of the elements in the input models. It is assumed that only the model management operators appear in the History properties. If two elements in two different models can be tracked to the same element from the History properties, the two elements match to each other. The details of tracking from the History properties based on different operators are discussed in the algorithm part. 3.5.1 Input The Match operator takes two models as input. The ID of each element in the models wi l l be used to track which elements are derived from the same element and can be matched. Here, it is assumed that the History properties of all the elements are available, so that each element can be tracked based on the full history information. 3.5.2 O u t p u t The Match operator returns a mapping model. This mapping model con-tains the two input models as domain and range, the model mapping ele-ments E, and the model mapping relationships Rmap that present the match-ing relationships. It also contains relationships Rmodel that connect these model mapping elements E. These relationships are based on one of the two input models. The purpose of these relationships is to link the mapping model elements and they should not create any additional information for the input models. Because the output model would not include more ele-ments than any of the input models, the model relationships of the output model can be created based on any of them. Sometimes, only part of the in -put elements could have been matched. In this case, some mapping elements might have been created with no mapping relationships to any other model elements. They are only used to connect the mapped mapping elements to the root based on the relationship given in the input models. These elements would act as support elements. 3.5.3 A l g o r i t h m The mapping model includes elements and relationships. 1. Elements: • Create a new Mapping Mode l M, and make the input domain and range the internal domain and range respectively. Create a model mapping element em for each element ed £ domain. • B u i l d the element correspondences, Coru. The matched elements in domain and range may not be generated from an element d i -rectly. They may have been manipulated by several operators in multiple steps. Therefore, the only available source of all the el-ements is the whole space, which is called the universe, U, here. In order to create the element correspondences, for each element e £ U, extract its History property Ph- Ph usually has the format of OPERA TOR (IDs), where OPERA TOR is one of the operators: Match, Diff, Merge, Compose, Apply, ModelGen, Select, Range, Domain and Delete. The history with any of these operators ex-cept Match provide tracing information. The ID of e and IDs can be put into one correspondence. For example, if ID(e) — 300008 and the History property is Mer#e(300001,300002), the created correspondence wi l l be {300008, 300001, 300002}. The number of IDs in OPERATOR(IDs) depends on different oper-ators, which is summarized in Table 3.1: If either ID(e) £ correspondence c o r l or IDs £ cor2 before, then the corresponding IDs in c o r l and cor2 are combined and we generate a new correspondence which includes all the related IDs. Operator Created Correspondence x = Match(y, z) N / A x = Diff(y,z) {x, y} x = Merge(y,z,w) {x,y,z,w} x = Compose(y,z) {x,y,z} x = Apply (y) {x, y} x = Copy(y) {x, y} x = ModelGen(y) {x, y} x — Select(y) {x, y} x = Range(y, z) {x, y} x — Domain(y, z) {x, y} x — Delete(y, z) {x, y} Table 3.1: Element Correspondence for Each Operator • Locate the Matched Elements: M has the same number of el-ements as domain. Therefore, for each element e<i G domain, 3em G M, such that e m is generated from e .^ If 3er G range, core G Coru, such that ID{ed) G core and ID(er) G core, then em is marked with a new History property as Match(ID(ed),ID(er)); otherwise, e m is marked as "delete". 2. Create Relationships: M has the same corresponding relationships to the ones in domain. For each non-mapping relationship r^ G domain, let Origin(rd) = eod, Destination(rd) = edd\ F i n d the elements e ^ , edm G M, which are corresponding elements of,e0d and edd , create relationship r m G M, such that Origin(rm) = e ^ , Destination(rm) = edm and the relationship type is same as r^. 3. P u t M into the Support Element Engine to find the necessary support elements. 4. Remove elements in M that are marked as delete and return M. In the above algorithm, it is assumed that the matched elements in domain and range are one-to-one. However, it may happen that the match-ing is one-tc-many or many-to-many. In this case, the above algorithm only need a small modification: l imit the usage of correspondence to one time. The "Locate the Matched Elements" step w i l l be changed to: • Each element ed £ domain, 3em £ M, such that e m is generated from ed- F i n d the correspondence core £ Coru, such that £ core; F i n d al l the elements Ec = {ec\ec £ core such that e c £ domain or e c £ range; Mark the History property of em as Match(ID(EC)); Remove core from Coru. If the correspondence core cannot be found, mark the element e<j as "delete". In this way, if elements ei,e2 £ domain match to the same group of elements Er £ range, when e\ is processed, the history property of the corresponding element em w i l l include all the elements,ei, e2 and al l Er. Then, this corresponding tuple wi l l be deleted. When e2 is processed, it cannot find the corresponding elements. Therefore, the element em2 £ M that is generated from e2 w i l l be marked as "delete" and wi l l probably be deleted or act as a "support element", but not have mapping relationships to any elements in domain or range. This guarantees that each group of matched elements only have one mapping element in M. 3.5.4 Re lated W o r k The Match operator has been widely discussed,(e.g., [15] [11] [3] [5]). Generic Match is not discussed in this thesis. The Match operator in this thesis is used for the Reintegration system environment and the matching process is done with the help of the History properties. 3.6 S u m m a r y In this chapter, we have discussed the five operators that are used in the Reintegration system. Four of them have not been discussed in such detailed level before. We have formally defined the input, output and algorithms to be used for each operator. Moreover, we have used the idea of Support Elements on al l the operators, which features heavily in each operator, and has not previously been described in detail. To handle support elements, we designed and implemented the Support Elements Engine, which wi l l be discussed in Chapter 4. It is a generic engine that can be used by any operator. This method makes the input and output of each operator more self-contained and more generic. Chapter 4 Support Elements In Model Management, most operators generate new models. It is valuable to ensure that all outputs of operators are composable For this to be true, the output of each operator must be a valid model. Here the term valid model means that al l the elements and relationships in the output must conform to the inclusion rules mentioned in chapter 2.1.3. However, some operations wi l l delete some elements or relationships from a model and this may make some elements "unconnected", and thus not included in the model. C o n -sider the example shown in Figure 4.1. It is a mapping between Productl and Product2. If we apply the Diff operation on it to compute the differ-ence between them, the result only includes the Product element (the root) and the Pid element. They are not connected and it is hard to say what relationships they should have between them. Domain Range Figure 4.1: Support Elements Mapping Example In order to keep the result sti l l as a complete model, some elements are added to the result model to support the structural integrity of the model. They are originally in the model, but may have been deleted. However, because these elements should not be included in the result, they must be identified from the other elements in the model. Here they are called support elements. The support elements idea was first proposed in [3]. To identify the sup-port elements, it uses a mapping between the result model and the original model. Only the result elements have mapping relationships to the original model elements, and those support elements are not mapped. This way has its benefits. It does not require a new data structure to mark the sup-port element and it only uses available structures, model and mapping, to identify the result. However, the result of the operators that need support elements wi l l then become a pair < M, mapping >. When the results are further used as input for other operators, it w i l l make the process complex. The operators cannot always expect a simple model as input, but sometimes the input is a pair, which is actually a new data structure. If there are a number of such steps, the result model wi l l become more complex and hard to follow or trace. Another method to mark the support elements is also mentioned in [3], which is to simply mark the elements as support. The disadvantage of this method is that it introduces another structure, the marked model. This method is not recommended. In our framework, we have the property attributes in each element. We can simply use the property as a marker to identify if the element is a support element. 4.1 Definition In model management, Support Elements are those elements that are i n -cluded in the model only for model completeness and integrity purposes. They are included in the model, but except for l inking other elements to-gether, they do not have any other functionalities." If any of the support elements are not included, the rest of the elements would not be a complete model anymore. If elements that need support are deleted, those support elements that only support the deleted elements should also be deleted. Formally, for a model M with elements EM , we divide EM into the sets ESM, where ESM contains al l support elements, and EM, where EeM contains al l elements in the model that appear as normal, non-support elements. These sets are disjoint and completely partition EM- That is EMC\EM — 0 andEsMUEM = EM. Additionally, let M' be the attempt at inducing a model from the ele-ments of EeM. That is, M' is built by ( l )adding EeM to M', and (2) adding to M' each relationship r , s.t. origin(r) G M' and dest(r) G M'. Then based on the inclusion rules M' is not a model. Similarly, all elements in EM are required for the result to be a model. That is, let M" be any model M — E's, where E's is a non-empty subset of ESM. Then by the inclusion ^Product! Figure 4.2: The Diff result in Figure 4.1 wi th the support elements needed to make it a valid model. Support elements are visualized as shaded nodes throughout this thesis. rules, M" is not a model. To visualize the difference, we shade the elements to identify the support elements. For example, the Diff result from the example in Figure 4.1 would be the model shown in Figure 4.2. Note that the element Pid is acting as a support element and is shaded. In the underlying representation, this is represented by adding a property support = true to the Pid element to show that it is now a support element. 4.2 Usabil ity of Support Elements Though originally motivated as a requirement for Diff [3], we show that a l -most all the other operators may also need support elements. For example, in the range operator, only the elements in the range model that have map-ping relationships wi l l be in the result. This means that only some elements of the original model wi l l be in the final result. In the operator match, two models may have a leaf element matched, but not on their ancestors. To make the mapping be a first-class object, we need make their ancestors support elements. In order to keep the result as a model, we have to use support elements to help. If we have support elements existing in the result model, the result may be further used in other operators. This means that support elements may be quite common in both input and output models. It is valuable to keep all our input and output as full models. In this way, we can make our operators more generic, composable, and easy to maintain. The output of one operator can be used as the input of another operator. If we are expecting a kind of model, we can always assume that it is a full completed model that conforms to the inclusion rules. It w i l l be trouble-some if the Diff operation only generated a set of discrete elements without structure and cannot be used in those operators that need models as input. Our purpose is to keep the result model complete. However, we should not add any extra information to the result and we should not lose any information either. For example, in Figure 4.2, we cannot just add in a type-of relationship, or any relationship between the element product and int that would make it a model. This addition may make the operation not accurate. B y adding the support element pid, as shown in Figure 4.2, we can keep the original structure and mark those elements that do not belong to the result model. We can achieve the goal of not adding any extra i n -formation or losing anything essential. Moreover, the support elements also help in improving the traceability of operations. If a model is manipulated Mapping Figure 4.3: Mapping between a Model with a Support Element and a Normal Element by a series of operations, some support elements may become not supported anymore in the final result; but, we can st i l l trace the generation of the ele-ments from the history property. Suppose the model in Figure 4.2 is used in a merge operator with another model as shown in Figure 4.3. In the result model, Pid w i l l be a normal element (non-support) with a history property of merge(ID(pid\),ID(pid^)). 4.3 Support Elements A d d i n g Algor i thms In this section, we discuss the engine for adding support elements that the algorithms use to determine which support elements to add. We consider as input to the problem of adding support elements a model M , and a set of elements EM C EM that are required elements in our result: the model M'. Our goal is to find the support elements set E3 to add to M'. B y adding the set E S , the partial graph M' of M includes only elements in ES and E M , and the relationships whose origin and destination are all in E S \ J E M . The partial graph M' is then our return result. Because our model is a graph, not a tree, for a given element that needs support elements, there may exist several paths to the root. We have to select a proper path based on some expectations or rules and add the ele-ments in the path but not in E M . First we determine which elements need support, EENS, in Section 4.3.1. Then we determine which elements EENSR, EENSR £ DENS, can represent all the elements in EENS hi Section 4.3.2. We only need consider EENSR when we look for support elements. Finally, we discuss the different algo-rithms used to find the support elements for EENSR m Section 4.3.3. 4.3.1 A l g o r i t h m to F i n d W h i c h Elements N e e d S u p p o r t Firstly , we need find our target elements set that need support, which is called Elements Need-Support (EENS)- Basically, any graph search algo-r i thm can be used to recursively include the elements by the inclusion rules starting from the root, which is always in the model by definition. Here a breadth-first search method is used. A queue can be used to keep track of which elements to explore next. It is also necessary to avoid cycles by remembering which elements have been used in the queue. The algorithm takes a model M' as input. Model M' can be any valid model, including a mapping. There are two categories of elements of M'. One is the normal elements, E M , which are the elements that the result model must contain and some of which need support elements. The other category is the elements, E M , that can be deleted and which may need to be support elements. These elements are different from EM in that they have a property delete w i th value true, meaning that they are to be deleted if they are not support elements. Note that this input is also the input of the whole "Support Element Engine" . The expected result of this part of algorithm is a set of elements, EENSU E M , that cannot be included in the model without the support of some el-ements, ES U E M . The algorithm details are as follows: 1. Define element set ECMIERED — 0 , EQUEUE '= 0 ; 2. P u t the root element of M' into Ecovered and E Q U E U E ; 3. While( E Q U E U E IS not Empty) Do: L E T e = Dequeue(EQUEUE); I F e has been visited, N E X T ; F I N D the corresponding element e' G M that corresponds to e; F O R E A C H inclusion-rule relationship R I F e' is the origin of a relationship R ; L E T ed be the corresponding destination; I F ed £ EEM P U T ed into Ecmiered\ Figure 4.4: Elements Need Support & Representatives P U T ed into E Q U E U E ; E N D W H I L E 4. R E T U R N EENS = E- E ^ ^ 4.3.2 E N S R e p r e s e n t a t i v e s Not all the elements in EENS need to be considered when adding support elements. In the example shown in Figure 4.4, elements B and C are not in E E M . EENS includes elements D,E,F. However, it is not necessary to consider element F when adding support elements, because if either D or E is included in the model by adding proper support elements (B, C here), F wi l l automatically be included in the model. Although D,E,F are con-nected together, it is st i l l necessary to consider both D and E as the E N S R , because they do not have ancestor- descendant relationships, and adding D wil l not ensure that e is in the model, and vice-versa. Therefore, if relationship r is between elements A and B, A G EENS, B £ EENS and II(r, B) (i.e., r implies inclusion of B, see Section 2.1.3), by which it can induced that if A is in the model then B is also in the model, then it is only necessary to consider adding A as a support element, and not B, since adding A would also include B . Therefore, in this step, only some elements from the EENS set are selected as representatives. Only these E N S Rep-resentatives ( E N S R ) are considered to add support elements. Each E N S R element may represent a set of elements in EENS- This may reduce the number of elements to be considered and speed up the overall process. This step takes the following as input: the model M' and the output elements set EENS from previous step in section 4.3.1. The algorithm is: Iterate each element e G EENS and test if 3ep G EENS, such that if ep G M then e G M by inclusion rules. If so, we delete e from EENS- Finally, we get the set EENSR, the representative elements set. 4.3.3 Support Elements A d d i n g E n g i n e a n d A l g o r i t h m s Because support elements are used in almost al l the operators in the rein-tegration system, the adding process wi l l be a separate, reusable engine, so that al l the operators can share the process and algorithms. The Engine has the following input and output: 1. Input: The input includes the original model M, the current uncom-pleted result model M', whose elements set as E M . The whole el-ements set of M as E. Here M' is a copy of M, except that the elements which are not in the result set E E M are marked as delete by adding the property delete = true. If these elements are marked as support later by adding the property support — true, they wi l l be kept as support elements. For those elements in EM that are not marked as support, they wi l l be deleted in the final step. In this way, the algorithm can operate on one model and avoid the frequent steps to locate the corresponding element in the original model. 2. Output: Updated model M' such that al l the elements of ENS are included by inclusion rules. To find the support elements set, there are many different ways. Here we list three methods. Each of them has different advantages and disad-vantages. Section 4.3.3.1 discusses the "Shortest Path A lgor i thm" , which is a fast algorithm and the most straightforward way: we consider each el-ement in ENSR separately. Section 4.3.3.2 discusses the "Least Support Elements" algorithm, which is valuable when the number of support ele-ments is desired to be as small as possible. For example, when support elements are costly for storage reasons, it wi l l be better to add as few sup-port elements as possible. Section 4.3.3.3 discusses the "Greedy A lgor i thm" , which is useful when the model structures are complicated and the elements interact with each other a lot. 4.3.3.1 Shortest Path Algorithm One straightforward method is to find the shortest path for each E N S R el-ement. For each E N S R element, the algorithm uses a reverse breadth-first search to track its ancestors. The ending condition is the tracing process reaches to either the root element, or any element that is already included in the model, or an element that has been marked as a support element by using the inclusion rules. Then al l the elements in the found path that did not belong to the result elements set are included in the result model as sup-port elements. Furthermore, because the path is found with breadth-first search, it is guaranteed that it is the shortest path. The detailed algorithm is as follows: 1. For each element e m G ENSR, put em into the processing queue Q. 2. Whi le the queue is not empty and there is no path found, let eq be the first element in Q , use reverse breadth-first search method. For each relationship r that is in the inclusion rules and for which eq is the destination, find the elements E 0 , such that for each element e0 G E0 there exists a relationship r between e0 and eq and II(r,eq). 3. For each element e0 G E 0 , if e0 G EENS, put it into Q and record the corresponding path; otherwise, if eD G E M , or e0 G EENS and e0 has a property of "support — true", then the shortest path for element e has been found. 4. F i n d out the whole path from e to e0. In this path, if any element is marked as delete, change its property to support. For each such element, recursively apply the inclusion rules again to find its descen-dant. If any of its descendant is in the EENSR, remove it from EENSR-5. When al l the elements in EENSR have been covered, the algorithm is done. Final ly remove those elements from M' if they st i l l sti l l marked as delete by having the property of udelete=truev. The advantage of this algorithm is the speed. Even to reach the root, the average case complexity for this algorithm is 0(log(n)). However, since we need to keep track of each level, the space usage is very high. 4.3.3.2 Least Support Elements Algorithm (LSE) In some applications, it may be valuable to use as few support elements as possible. This algorithm can minimize the number of support elements to be added overall. This means if n support elements are added to the model and make up all the elements in E N S R included in the model by the inclusion rules, it would be impossible to find any other better combination with fewer than n support elements that could also include all the elements of E N S R in the model. To determine the least support elements, it is necessary to find all the possible paths that all the E N S R elements may use to be included in the model. The steps are as follows: 1. The first step is very similar to the Shortest Path Algor i thm in that it also uses the reverse breadth-first search for each element e G EENSR-It also keeps an elements tracing queue Q for the breadth-first search similar to that for the Shortest Path Algorithm. The difference is that it does not stop when it finds the first element that is already in the model. Instead, it wi l l only record this path (but not trace this path), and it wi l l continue with other possible paths. When all the possible paths have been found, which means Q is empty, we terminate the tracing process. We record al l the possible paths, but do not delete elements from E N S R when we find all the paths, because we are not sure yet which path wi l l be used in the final result. 2. Repeat the above step for each element in E N S R . Then we have all the paths set for ej € EENSR- For example, the following may be the final result of the paths to be considered: path(eo)[0] = { e n , e i 2 , e i 3 } path(e0)[l] = { e n , e i 5 } path(e\)[0] = { e n , e i 2 , e i 7 , e i 8 } path(e\)[l] = { e i 9 , e 2 o , e 2 i } path(e2)[0] = {e 22} path(en)[k} = {e2i,e26,e27} The paths may have intersections. Each element can choose one of the available paths to add support elements. Different combinations of paths chosen by each element in E N S R have different number of elements by union the elements of selected paths. 3. Compute the total number of elements of each path combination by selecting a path from each element's path set. For the above example, we have: count(path(eo)[0] Upath(ei)[0] Upath(e2)[0)) = 6, (counting elements e n , ei2, e i 3 , e i 7 , e i 8 , e20) count(path(eo)[0} U Path(ei)[l] Upath(e2)[0]) — 7, (counting elements e n , ei2, e i3 , e ig , e2o, 621,22) 4. Pick up a combination of paths with the minimal count and mark all the elements in the selected paths that are not in the model (marked as delete) and mark them as support. LSE guarantees that only minimum support elements are selected. How-ever, the complexity of this algorithm is relatively high. 4.3.3.3 Greedy Algorithm The idea of the Greedy Algor i thm is to select the element that can cover most of the E N S R elements to be a support element in each step. The de-tailed algorithm is as follows: 1. F i n d al l the elements Edp, such that for each element edp € Edp, 3e e ENSR and a relationship r between edp and e, and II(r,e). 2. For each element in Edp, compute the out degree: the relationship r that is included in the inclusion — rule and II(r, e) for some element e € ENSR. 3. Select the one with highest out degree, e^, and mark it as a support element by adding property "Support = true". 4. P u t eh into E N S R . 5. Remove those elements in E N S R that can be covered by e/i. 6. F i n d the direct parents of and put them into Edp. 7. Repeat from step 2, unti l E N S R is empty. The Greedy Algor i thm cannot guarantee an optimal result (i.e., the least support elements). For example, Figure 4.5 shows an example of using the Greedy Algor i thm. The algorithm may choose Do...Dn as the support elements, because Dn has the highest out degree. However, we know that the least support elements should be only B and C. This example demonstrates that in some cases, the Greedy Algor i thm may generate a worse selection of support elements. However, it can generate a "better than naive" result and has an acceptable complexity. In practice, since database schemas are mostly well structured and the depth of the model graph is small , the Greedy Algor i thm can often find the "important" elements to be added as support elements first. These elements can play a good role on supporting elements in ENSR. Redundancy in the Greedy Algorithm C D ( J D Figure 4.5: Greedy Algorithm Example Figure 4.6: Redundancy in Greedy Algorithm: B - E are candidate support elements. The Greedy Algorithm wi l l add al l of them as support elements, but D is not necessary. The Greedy Algorithm that it cannot always produce the optimum result (the least support elements). Moreover, sometimes elements are included as support elements, but the model could sti l l be completed even if those ele-ments are not support elements. For example, Figure 4.6 shows an example of adding support elements using the Greedy Algorithm. The model shows that elements F to O are result elements and elements B,C,D,E are not in the model and are candidates for support elements. Following the Greedy Algor i thm, the steps axe: • E N S R s are elements F to 0 , and the out-degree D for the direct parents are: D(C) = 5, D(D) = 6, and D(E) = 5 • Element D is firstly selected as the support element, and it covers elements H to M. • The updated E N S R s are elements F,G,D,N,0, and the out-degrees are: D(C) = 2, D{B) = 1 and D(E) = 2 • Then, the element with highest degree, C, is selected as the support element, then E is selected. • Finally, B is selected as the support element and we are done. • The support elements are B, C, D, E. However, notice that element D is redundant. If D is not a support element, all the E N S R s can st i l l be covered by C and E. In our Schema Reintegration system, all three algorithms have been im-plemented and tested; In practice, we have used the Shortest Path A l -gorithm the most, because most models that represent database schemas are trees. In such cases, the Shortest Path Algorithm can correctly, effi-ciently, and optimally solve the problem. However, if the model structure is a complex structured graph, and it is necessary to consider the cost of the total number of added support elements, the other two algorithms can help. When the number of added support elements is really" critical, for example, adding a support element wi l l cost several gigabytes of disk space, then the "Least Support Elements" gives the optimal solution. When the model structure is very broad (high out-degree on most elements) and has complex relationships between elements, the "Least Support Elements" is very costly. It needs to find all the paths to some elements already included in the model. In such cases, the "Greedy Algor i thm" gives a good solution to quickly find the support elements. 4.4 Discussion 4.4.1 T h e R o o t Element In Model Management, each Model is required to have a unique root ele-ment. In a mapping, it is also assumed that the root elements of the domain and range match each other. When the models are manipulated by opera-tors the root element may not be deleted from the model. For example, in the Diff operator, if the root elements of the domain and range are matched in the mapping, the root of the result model wi l l be a support element. However, in the support elements adding algorithms, one of the conditions to stop tracking each path is when the path reaches the root. Therefore, the root element must be processed separately. In any operator, if support elements are to be added, it is necessary to guarantee that the root element is included in the model first. If not, it should be marked as a support element at the beginning of any operations. 4.4.2 Support Elements in Input Most operators need support elements in their result model. These results can be further used as input to other operators. Therefore, each operator needs to consider how to deal wi th the support elements in their input. Moreover, it should also be considered when such models (which already have support elements) need to add more support elements. These issues wi l l be addressed in Section 5.2.2. In this chapter, we discussed in detail the support elements used in model management. Support elements are helpful to maintain the integrity of mod-els during model management operations without adding extra information or losing information. We gave an algorithm to locate the support elements and we gave three support elements adding algorithms that could be used in different situations. Chapter 5 Reintegration System Having developed all the operators needed for the reintegration algorithm, in this chapter, we discuss the details of the reintegration system (Section 5.1) and some complicated problems that have to be solved (Section 5.2). 5.1 Reintegration (Three -Way-Merge) A l g o r i t h m As shown in the example in Table 2.2 in Chapter 2, the reintegration algo-r i thm can be done in 26 steps using the operators described in Chapter 3. The input of the algorithm is one original model O and two different mod-ified versions O, Model A and Mode l B. In model A and B, each element e has a History property indicating which element eQ G O e is derived from. Again, since each Model Management operator provides the history property, this information would be easy to derive even if there were more intervening operators. The output of the algorithm is a single model Mresuit, which is generated conforming to the rules mentioned in Table 2.1. A t the highest level, the reintegration steps include the following: 1. Merge the two modified versions, Model A and Model B, into Mode l MQ- This includes all the existing elements in both versions (steps 10-11). When two elements from two models are merged, sometimes, there are conflicts. For example, one element may have the property "minOccur = 5" and in the other model, the matched corresponding element may have the same property with a different value, such as "minOccur = 5". In the merge theory, there is usually a preferred model to break the conflict. If the preferred model does not include the latest modification, it may override the latest version. Therefore, the merged version need check whether the modified part in Model A and the modified part in Model B are included. This is done by using merge again between Model MQ and the changed parts of A and B. Steps 1 to 9 find the elements that are changed in A or B but not both. Steps 12 to 16 merge the changes wi th MQ- However, MQ may also include those elements that should be deleted. 2. F i n d the deleted elements in Model A and Model B separately, namely MoeletedA and MoeletedB, which may need to be deleted from the merged version MQ- Steps 17 to 18 find these deleted elements. How-ever, these deleted elements may have been modified in the other ver-sion. Therefore, not all the elements in MrjeietedA a n d MrjeietedB c a n be deleted from MG-3. F i n d the those elements that are deleted in one model but not modified in the other. These elements are those that can be really removed from the wholly merged version. Steps 19 to 26 find those elements and delete them from the merged model MQ-5.2 Discussion of Problems in Reintegration System We have implemented the reintegration system, including all the operators discussed in Chapter 3 and the reintegration algorithm [14] shown as an example in Chapter 2 and summarized in Section 5.1. The system was tested with several test samples, from very simple ones to very complicated ones. We discovered some interesting problems that have not been previ-ously studied. 5.2.1 N u l l M o d e l Our experiments started with the simple example shown in Chapter 2. Though this example is simple, it can raise some unexpected results. For example, many operations generate a resulting model that is NULL, i.e., the result model includes no element or relationships. Previous work had not identified this as a possibility, but is clearly a special case that must be handled properly. We treat the N U L L model as a special type of model. The Null model is necessary. M a n y operators take a pair of models as input, such as Diff, match, range, and match. Also, the map can be an input. Without the Null model, these operators would not be able to han-dle the cases where input models have no elements. But intuitively, the existence of the null model makes sense to real problems. For example, in the example in Table 2.2, Step 7, models changed^ and changeds have no common elements. Their mapping wi l l result in a null model. We need to define the behavior of each operators when one or more input models are null . • In Diff, if the domain model is a null model, then the result model is null too. If the range model or the map is a null model, then the result model is exactly the same as the domain model. • In merge if any of domain or range is a null model, the result w i l l be the same as the other model (range or domain). In merge it is assumed that the two root elements of the domain and range are mapped to each other if neither of them is null . Therefore, the map cannot be null . • In range, if the domain model is nul l , the map w i l l be null too. There-fore, the result model w i l l be same as the range model. If the range itself is null , the result wi l l be null too. • In match, if any of the domain or range is nul l , the result mapping wi l l be null too. • In Apply, if the input model is null , obviously a nul l model wi l l be returned. Furthermore, if all the elements in a result model are support elements, this model wi l l be a null model, instead of a model that includes al l support elements. 5.2.2 S u p p o r t Elements in Input As we have mentioned in Section 4.4.2, support elements may appear in the input models of each operator. While the Reintegration system is composed of the operators and the result of one operator w i l l be used in future oper-ators as input, the support elements wi l l be more likely to appear in input models. They need to be handled with clear rules. Here is how they are handled in the system. The first step is that al l the support elements in input models are marked as "delete" candidates. More specifically, if an element in the input mod-els has a property of "p(support) = true", this property is deleted and we add a new property as "p(delete) = true". This means that these elements may be deleted after the operator finishes. Because the "support elements" may not useful in the result model, i.e., it is not necessary to "support" any elements. For example, Figure 5.1 (a) shows a mapping between model A and model B, which includes a support element c to support the element d to be included in the model. The merge result is shown in (b). Obviously, element d is now included in the model because of the existence of element b, and support element c is not necessary any more. The second step is that the model with elements marked as "delete" is processed by each operator normally as input. The elements marked as Model A Map Model B Merge(A, B, Map) Figure 5.1: Support element c in Input model, which is not needed in Merge result. "delete" are treated as normal elements in the algorithms. However, each operator needs to handle the delete property in the input element differently. • In Merge, each element in the result model represents a group of corresponding elements. If any element in the group does not have the "delete" property, the corresponding result element wi l l not have the "delete" property. • In Diff, if the domain model contains elements that have the "delete" property , the corresponding elements in the result model also have the "delete" property. • In Range, if the range model contains elements that have the "delete" property, the corresponding elements in the result model also contain the "delete" property. • In Apply, the "delete" property of the elements in input model w i l l be kept in the corresponding elements in the result model, unless the Apply function modifies this property on purpose. • In Match, each model mapping element in the result map represents a group of elements. Only when all the elements in the group have the "delete" property, the corresponding element in the result map also has the "delete" property. Otherwise, the elements in the result map do not have the "delete" property. In the third step, the result model w i l l be processed by the "Support Elements Engine" as usual and if the elements that are marked as "delete" are st i l l essential to support some other elements, they w i l l be marked as "support" by the engine. If the elements are st i l l marked as "delete", they wi l l be deleted from the result. The specific handling of "Support Elements" enables the Reintegration system to work for any cases with full "Support Elements" functionality support. The "Support Elements" can appear in any place. The Support Element Engine has been fully discussed in Chapter 4. 5.2.3 M o d e l V a l i d a t i o n A l l the operators in the Reintegration system assume that the input models are valid. A l l the algorithms in each operator also rely on valid models. However, this assumption may be challenged when the input models are complex. During system testing, we tried some complex examples that include hun-dreds of elements. Some human errors may make the input models invalid. For example, some elements are supposed to be in a model, but cannot be inclusion-implied, or the relationships have cycles. If the input models have such kinds of invalid problems, the performance and result of each operation can hardly be guaranteed. This may lead to system instability. Therefore, all the input models need to be validated before being used. The model val -idation procedure includes validating the elements and relationships. The rules used are the "Inclusion Rules" shown in Section 2.1.3. Starting from the root, if al l the elements and relationship can be included by using the inclusion rules, then the model is valid. If the model is mapping, we also check if the two roots of the domain and range are mapped. The correctness of logic of the input models is not validated in this system. It depends on the user to make sure the logic in the model is correct. Invalid input models are rejected from proceeding further during Reintegration. In this way, the system becomes more stable and could be used on com-plex models, which are more general, and hence the system is more valuable. 5.3 S u m m a r y The Reintegration system integrates all the operators that are required for Reintegration. Because all the operators are designed in a way that they can consume the output models from other operators, they can be easily, incorporated into the system to accomplish the integration task. The "Sup-port Element Engine" is designed to support all the operators, so that each operator can focus on its functional algorithm. This kind of design makes the system more flexible and extensible. It is suitable for future model management operators development and other relevant model management topics. The system also includes a graphic pre-sentation that enables the models and algorithm procedures to be visualized. This makes the system easier to be validated and observed for discovering interesting issues. We have made the system stable in a sense that it always checks the validation of input models, and during each algorithm. From our exper-iments, we have also discovered the corner cases in the system that need to be considered to make the system more general and compatible. These corner cases are not usually discussed but they are st i l l technically essential to make everything stable in the system. A l l these efforts make the system ready to accomplish the schema rein-tegration. Now that the system is ready, we describe the experiments on the system in Chapter 6. Chapter 6 Experiment The implementation of the Model Reintegration System is based on the previous work of Merge in [14]. The system is written in C # and includes the classes for the basic data structures, such as Element, Model , Relationship, Mapping, and operators. We built each operator as a class, so that the class can be decoupled from other operators, and is easy to extend. We also built a class that integrates al l the operators and follows the Three-Way-Merge algorithm. In order to make the system more transparent and clear, the result models in the steps are al l visualized using Visua l Studio . N E T 2003 and the FlowChart . N E T component [8]. A l l the elements are displayed using an automatic tree-structure arrangement plan. This is a good way for observing the elements and relationships for demonstration purpose. Figure 6.1 shows a mapping in the Model Reintegration System interface that we used for demoing the reintegration procedure. The system includes the following main parts: • Model Display • Model Selection • Support tools. Internet Model Reintegration System Unknown Site E Load Reintegration Model MO Model MA Model MB 4. MapOB' B, MapChangeA 1. MapOA 2 MapOB 3. MapOA' 5. ChangedA B. ChangedB 7. MapChangeB 9. A' 10.B' 11 MapAB 12.G 13MapGA' 14.GA' 15MapGA'B' 16.GAB U.DeletedA 18.DeletedB 19.MapDACB 20MapDBCA 21.ShouldDelete 22.ShouldDelete 23.MapGABSDA 24 GABSDA 25.MapGABSDA 26Resii l t Property Display 8 Simple C Full MapOA = Match(MO, MA) Figure 6.1: Model Reintegration System After a demo is loaded, the user can select any model in the steps by clicking the related item in the left bar, and the selected model wi l l be dis-played in the main area. There are some functionalities to make the system more convenient for the users. For example, there are two modes in which to view the model: the simple mode only shows the name of each element, and the full mode shows all the details of each element, including all the properties. There is also a zoom bar to view the model in different sizes. 6.1 Operator Experiments While we developed each operator, we performed numerous experiments to ensure the correct behavior of each operator. These experiments are designed to cover different situations that may happen in the input. For example, the following are part of the experiment cases that were used for testing Diff: • Element in domain and has a valid match to some elements in the range • Element in domain but has no mapping relationship • Element in domain and has a mapping relationship with some element in the map, which has no mapping relationships wi th elements in range • Mult iple elements in domain match to the same element in range • One element in domain matches multiple elements in range # Operator # of experiment cases 1 Merge 25 2 Diff 6 3 Range 6 4 Apply 2 5 Match 4 Table 6.1: Experiments Performed for Each Operator • A l l the elements in domain have valid matched elements in range We have designed different test cases for different operators. We have covered different rules in each operator, including some special scenarios that have been mentioned. In each operator and some basic modules, there are also some functionalities that accomplish a single task. For example, we have a function to verify if a model is valid. Table 6.1 shows the experiments that we have performed for the operators, not including the test on basic functions, support elements and the whole system. From these experiments, we learned a lot of corner cases that we must cover for a complete system, such as how to deal wi th the root element in each operator. These observations are helpful when using these operators in the reintegration system. 6.2 Support Element Engine Experiments The "Support Element Engine" described in Chapter 4 is an assistant mod-ule that all the other operators rely on. Because its input is only a model and its output is the same model with support elements found and marked, the experiments on this module were performed in both single module sce-narios and integration scenarios wi th other operators. We have designed 18 test cases to covered different scenarios. We also tested the engine with mappings models. The engine is expected to be able to support any kind of model. Moreover, because we have three different support elements adding al -gorithms (See Sections 4.3.3.1 4.3.3.2 and 4.3.3.3), al l of the experiments were performed using all the algorithms separately. We have also designed cases to show that different algorithms would generate different results on the selection of support elements. We also tested the performance of the engine. We have designed test cases that include elements arranged in a wide structure, for example, one level contains more than 10 elements, and in a deep structure, for example, the model contains more than 5 levels. These models have very complex relationships between elements in each level and in different levels. The result shows that al l three algorithms can finish in less than one second, even for large models, which is acceptable speed. Among the three algorithms, the shortest path algorithm is a little bit faster than the other two. This is because this algorithm searches support elements in a direct way, which is less than the height of the graph from the root. The support elements adding step for each element is independent from each element. However, in the other two algorithms, the correlation of each element that needs support has to be considered. This increased the complexity of the algorithms. Therefore the Shortest P a t h Algorithm is the one that we recommend if the number of support elements added is not very small. 6.3 Reintegration System Experiments We designed four experiments to test the reintegration system. Because the system is composed of only the five operators mentioned in this thesis, these tests are also a way to further test each operator. In the experiments, we primarily used the shortest path algorithm in the Support Element Engine. However, in the last two more complex experiments, we have randomly chosen one of the three support element adding algorithms in each step of the reintegration system. This ensures that all three algorithms work properly. In order to quickly create models for testing, we also designed functions to automatically add elements to models and find and add history prop-erties to match to corresponding elements in another model (usually the original model). This gives us a quick way of adding elements and building the structure of models and the relationships between elements in different models, so that we can put more of our focus on the reintegration system itself. Our first experiment is a simple one mentioned in [14] (Merging Models Based on Given Correspondences). The original model includes four ele-ments. The derived models contain two scenarios: one element is modified in one model and not changed in the other; one element is deleted in one model and not changed in the other. Even from this simple experiment, we found many improvements that the system needed in order to be more sta-ble. For example, the simpler the models are, the more likely there are null models appearing in the result of operators, which may be used by other op-erators as input. From Table 2.2, we can see that there are 11 results among the total 26 steps are that null models. This brought our attention to han-dling null models for each operator and it is necessary to define the correct behavior when the input has null models, as discussed in Section 5.2.1. We have attached screen shots of all the steps of this experiment in Appendix A . The rules mentioned in Table 2.1 are the principle cases that our system should cover. Therefore, our second experiment was designed to include elements that can cover al l the scenarios in the rules. The original model in -cludes eight elements which are arranged in three levels. One derived model contains eight elements and the other contains seven elements. These ele-ments are arranged in a designed hierarchy to cover each scenario that we are going to test. Furthermore, we also performed experiments on the cases that Rondo [11] uses. This includes a relational schema case and an X M L schema case. Both schemas are translated into models that can be used in our system and run the Reintegration algorithm on them. In the first of these two experiments, we created a four-level parent-child hierarchy data model. The first level is the root for the whole database; the second level includes one element for each schema in the database; the third level includes one element for each attribute; and the last level includes the type of each attribute and if it is a primary key, there is also a child element of primary key. The translated models contain the following information: • Model O: 37 elements, in 4 levels • Model A : 49 elements, in 4 levels • Mode l B : 45 elements, in 4 levels • Mapping relationships between O and A : 33 • Mapping relationships between O and B : 29 • Mapping relationships between A and B : 31 • F i n a l result model contains 57 elements In the second example, we translate each X M L node of the example as an element in the each model. The translated models contain the following information: • Model O: 27 elements in 5 levels • Model A : 32 elements in 5 levels • Model B : 26 elements in 5 levels • Mapping relationships between O and A : 19 • Mapping relationships between O and B: 20 • Mapping relationships between A and B : 16 • F ina l result model contains 37 elements The overall test results showed that our system achieves the correct out-put. The first two experiments resulted in the results dictated by three-way merge. The last two cases got the same results as Rondo. These were the results that were dictated by the rules of three-way merge, and is the be-havior that make sense to most people. From these two more complex examples, we observed that it is very i m -portant to always keep the input model and output model valid. Sometimes, a dangling relationship or element is a human mistake while the model is created. Therefore, we added a validation step to make sure that all the input models are valid, that is, they are compatible with the inclusion rules. This extra step makes the system more reliable. Chapter 7 Conclusion and Future Work 7.1 Conclusion In this thesis, we discussed the details of the Model Management operators: Diff (Section 3.2), Range(Section 3.3), Apply(Section 3.4) and Ma£c/t(Section 3.5) . We formally defined each operator and gave algorithms after analyz-ing them in detail. We showed that these operators always need support elements to maintain integrity. We developed a model reintegration framework that integrates al l the operators and accomplishes the Reintegration task, which is the essential algorithm for schema reintegration purposes. The model reintegration sys-tem demonstrated how the original schema can help when merging different versions of modified schemas correctly. The schema reintegration system can be used in many areas, such as for relational schemas, X M L schemas, U M L , and ontologies. When the circum-stance happens that there are one original schema and two or more different modified versions of the schema, the schema reintegration can facilitate the automation of the reintegration steps. We discovered the need for, and also discussed the "Support Element Engine" in detail. This engine is designed and developed in a generic way, so that al l other operators can simply rely on it to select the appropriate elements as the support elements. This method does not add any extra information to the result of each operator and does not lose any existing information either. It is also a way to keep tracking the history of the evolvement of each element. W i t h this engine assisting of model manage-ment, all the other operators can now focus on their own algorithms and leave the structure problem of their result models to the engine in the final step. Our experiments have shown that these generic model management op-erators indeed can be used for schema reintegration purposes. This system can be used for different kind of schemas and generate good result. The rules implemented in each operator can be customized to be compatible wi th dif-ferent utilizations in the real world. We always keep "generic" in mind when we design our system. This system can be easily extended to other relevant applications. 7.2 Future W o r k In this thesis, we focused on analyzing and implementing the operators Diff, Range, Apply, and Match (based on History property), which are for rein-tegration. There are st i l l many other operators in model management to be explored, such as compose and modelGen. These operators can be useful for other model management purposes. In our reintegration system, models are created manually. This is only for demo purposes. The transformation from a schema to our model is slow if it is done manually. It wi l l be very helpful if there are interpreters that can transfer each kind of schemas to our models. The Match operator in the system is based on the "History" property. This works fine and is reliable in the middle of the algorithm after the models are built. However, in the first step, when we create the input models, we have used the element names to automatically build the corresponding "History" properties. This is because we do not discuss other automatic matching methods. The automatic matching in model management has been discussed widely, for example, [11] [15] [6] [7] [10] [9]. If a reliable matcher can be integrated into the system and build the init ia l relationships between the input models and elements, the system wi l l be more complete and have better usability. Bibliography [1] Suad Alagic and Phi l ip A . Bernstein, A model theory for generic schema management, D B P L , 2001, pp. 228-246. [2] S. Balasubramaniam and Benjamin C. Pierce, What is a file synchro-nizer?, A C M / I E E E International Conference on Mobile Computing and Networking ( M O B I C O M ) , 1998, pp. 98-108. [3] P. A . Bernstein, Applying model management to classical meta data problems, Conference on Innovative Data Systems Research ( C I D R ) , 2003, pp. 209-220. [4] Phi l ip A . Bernstein, Generic model management: A database infras-tructure for schema manipulation., CoopIS, 2001, pp. 1-6. [5] Ph i l ip A . Bernstein, A lon Y . Halevy, and Rachel A . Pottinger, A vision of management of complex models, S I G M O D Record 29 (2000), no. 4, 55-63. [6] David W . Embley, L i X u , and Yihong Ding , Automatic direct and indirect schema mapping: experiences and lessons learned, S I G M O D Record 33 (2004), no. 4, 14-19. [7] Fausto Giunchiglia and Mikala i Yatskevich, Semantic matching, K n o w l -edge Engineering Review 18 (2004), no. 3, 265-280. [8] MindFusion Limited , Flowchart.net component, Version 3.2.2, 2006. [9] Jayant Madhavan, Ph i l ip A . Bernstein, A n H a i Doan, and A lon Halevy, Corpus-based schema matching, I C D E '05: Proceedings of the 21st In-ternational Conference on Data Engineering (ICDE'05) (Washington, D C , U S A ) , I E E E Computer Society, 2005, pp. 57-68. [10] Jayant Madhavan, Ph i l ip A . Bernstein, and Erhard R a h m , Generic schema matching with cupid, V L D B '01: Proceedings of the 27th In-ternational Conference on Very Large Data Bases (San Francisco, C A , U S A ) , Morgan Kaufmann Publishers Inc., 2001, pp. 49-58. [11] Sergey Melnik, Erhard Rahm, and P. A . Bernstein, Rondo: A pro-gramming platform for generic model management, A C M S I G M O D International Conference on Management of Data ( S I G M O D ) , 2003, pp. 193-204. [12] Jonathon P. Munson and Prasun Dewan, A flexible object merging framework, Conference on Computer Supported Cooperative Work ( C S C W ) , 1994, pp. 231-242. [13] Rachel A Pottinger, Processing queries and merging schemas in support of data integration, P h . D . thesis, University of Washington, 2004. [14] Rachel A . Pottinger and P. A . Bernstein, Merging models based on given correspondences, Technical Report UW-CSE-03-02-03, University of Washington, 2003. [15] Erhard R a h m and Phi l ip A . Bernstein, A survey of approaches to au-tomatic schema matching., V L D B J . 10 (2001), no. 4, 334-350. Appendix A Reintegration System Example Here we list the screen shots of the first experiment that we did on the Rein-tegration System. Figure A . l shows the original model and Figures A .2 and A .3 are the two different modified versions. Figures A .4 through A.29 list al l the steps that correspond to the procedures shown in Table 2.2. [Ji' Internet - Model Reintegration System - Unknown Site Load R e i n t e g r a t i o n ' , P r o p e r t y D i s p l a y 1 ! «8 S i m p l e C Ful l M o d e l M A M o d e l M B 1. M a p O A M o d e l O 2 . M a p O B 3. M a p O A ' 4. M a p O B ' 5. C h a n g e d A CD 6. C h a n g e d B 7. M a p C h a n g e B 8. M a p C h a n g e A m CD Q 9. A ' 1 0 . B ' 11 . M a p A B 1 2 . G 1 3 . M a p G A ' 1 4 . G A 1 5 . M a p G A ' B ' 1 6 . G A B 1 7 . D e l e t e d A 1 8 . D e l e t e d B 1 9 . M a p D A C B 2 0 . M a p D B C A 21 . S h o u l d D e l e t e 2 2 . S h o u l d D e l e t e 2 3 . M a p G A B S D A 2 4 . G A B S D A 2 5 . M a p G A B S D A 2 6 . R e s u l t Figure A . l : The Original Model MO Hi' Internet - Model Reintegration System - Unknown Site Load R e i n t e g r a t i o n 1 1 P r o p e r t y D i s p l a y [ M o d e l M O " ® S i m p l e M o d e l M A r Ful l j M o d e l M B 1. M a p O A M o d e l A 2 . M a p O B 3. M a p O A ' 4. M a p O B ' 5. C h a n g e d A 6. C h a n g e d B 7. M a p C h a n g e B B. M a p C h a n g e A 9. A ' 1 0 . B ' 11 . M a p A B 1 2 . G 1 3 . M a p G A ' 1 4 . G A ' 1 5 . M a p G A ' B ' 1 6 . G A B 1 7 . D e l e t e d A 1 8 . D e l e t e d B 1 9 . M a p D A C B 2 0 . M a p D B C A 21 . S h o u l d D e l e t e 2 2 . S h o u l d D e l e t e 2 3 , M a p G A B S D A 2 4 . G A B S D A 2 5 . M a p G A B S D A 2 6 . R e s u l t Figure A . 2 : The First Modified Model MA 5? Internet - Model Reintegration System - Unknown Site Load R e i n t e g r a t i o n -•—— , P r o p e r t y D i s p l a y M o d e l M O ) r eg S i m p l e C Ful l M o d e l M A M o d e l M B 1. M a p O A M o d e l B 2 . M a p O B 3. M a p O A ' 4. M a p O B ' 5. C h a n g e d A B. C h a n g e d B Mi 7. M a p C h a n g e B 8. M a p C h a n g e A • Q 9. A ' 1 D . B ' 11 . M a p A B 1 2 . G 1 3 M a p G A ' 1 4 . G A ' 1 5 . M a p G A ' B ' 1 6 . G A B 1 7 . D e l e t e d A 1 8 . D e l e t e d B 1 9 . M a p D A C B 2 0 . M a p D B C A 21 . S h o u l d D e l e t e 2 2 . S h o u l d D e l e t e 2 3 . M a p G A B S D A 2 4 . G A B S D A 25 . M a p G A B S D A | 2 B . R e s u l t Figure A .3 : The Second Modified Model MB " Internet - Model Reintegration System - Unknown Site Load Reintegration Model MO Model M, Model M 1.MapOA 2 .MapOB 3. MapOA' Property Display <8 Simple r Full MapOA = Match(MO, MA) 10.B1 11 .MapAB 12.G 13.MapGA' 14.GA' 15.MapGA'B' 16.GAB 17.DeletedA 18.DeletedB 19.MapDACB 20.MapDBCA 21,ShouldDelete 22.ShouldDelete 23.MapGABSDA 24.GABSDA |25.MapGABSDA 26. Result Figure A.4 : The Mapping MapoA between Model MO and Model MA Internet - Model Reintegration System - Unknown Site i Load i Reintegration Model MO Model MA Model MB 1. MapOA 2 .MapOB 3. MapOA' 4. MapOB' 5. ChangedA 6. ChangedB 7. MapChangeB 8. MapChangeA 9. A' 10.B' 11 .MapAB 12.G 13.MapGA' 14.GA' 15.MapGA'B' 16.GAB 17DeletedA 18.DeletedB 19.MapDACB 20.MapDBCA 21.ShouldDelete 22.ShouldDelete 23.MapGABSDA 24.GABSDA 25.MapGABSDA 26.Result Property Display S Simple C Full MapOB = Match(MO, MB) Figure A .5 : The Mapping MapoB between Model MO and Mode l MB Internet - Model Reintegration System - Unknown Site Load r- Reintegrat ion Model M O Model M A Model M B 1. M a p O A 2 . M a p O B 3. M a p O A 1 4. M a p O B ' 5. C h a n g e d A 6. C h a n g e d B 7. M a p C h a n g e B 8. M a p C h a n g e A 9. A ' 10 .B ' 11 . M a p A B 1 2 . G 1 3 . M a p G A ' 1 4 . G A ' 1 5 M a p G A ' B ' 1 6 . G A B 1 7 . D e l e t e d A 1 8 D e l e t e d B 1 9 M a p D A C B 2 0 , M a p D B C A 21 . S h o u l d D e l e t e 2 2 . S h o u l d D e l e t e 2 3 . M a p G A B S D A 2 4 . G A B S D A 2 5 . M a p G A B S D A 2 6 . R e s u l t P r o p e r t y D i s p l a y fi" S i m p l e r Ful l M a p O A ' = A p p l y ( M a p O A ) Figure A .6 : The Model Map'OA from Apply{MapoA) 1 . M a p O A 2 . M a p O B 3. M a p O A ' 4. M a p O B 1 5. C h a n g e d A 6. C h a n g e d B 7. M a p C h a n g e B 8. M a p C h a n g e A 9. A ' 10 .B ' 11 . M a p A B 1 2 . G 1 3 M a p G A ' 1 4 . G A ' 1 5 M a p G A ' B ' 16 G A B 1 7 . D e l e t e d A 1 8 . D e l e t e d B 1 9 M a p D A C B 2 0 M a p D B C A 21 . S h o u l d D e l e t e 2 2 . S h o u l d D e l e t e 2 3 . M a p G A B S D A 2 4 . G A B S D A 2 5 . M a p G A B S D A 2 6 . R e s u l t _ [JR Internet - Model Reintegration System - Unknown Site Load Reintegration Model M O Model M A M o d e l M B • i D l x l P r o p e r t y D i s p l a y S i m p l e r Full M a p O B ' = A p p l y ( M a p O B ) Figure A .7 : The Model Map'OB from Apply(MapoB) It,}"' Internet - Model Reintegration System - Unknown Site Load p Reintegrat ion - m P r o p e r t y D i s p l a y Model M O " <8 S i m p l e C Full Model M A Model M B 1. M a p O A C h a n g e d A = R a n g e ( M a p O A ) 2 . M a p O B 3. M a p O A ' 4. M a p O B ' 5. C h a n g e d A 6. C h a n g e d B K 7. M a p C h a n g e B B. M a p C h a n g e A flfe 9. A ' n 10 .B ' 11 . M a p A B 1 2 . G 1 3 . M a p G A ' 1 4 . G A 1 5 . M a p G A ' B ' 1 6 . G A B 1 7 . D e l e t e d A 1 8 . D e l e t e d B 1 9 . M a p D A C B 2 0 . M a p D B C A 21 . S h o u l d D e l e t e 2 2 S h o u l d D e l e t e 2 3 . M a p G A B S D A 2 4 . G A B S D A 2 5 . M a p G A B S D A 126.Result Figure A .8 : The Model ChangedA Internet - Model Reintegration System - Unknown Site Load Reintegration Model M O Model M A Model M B 1. M a p O A 2 . M a p O B 3. M a p O A ' 4. M a p O B ' 5. C h a n g e d A 6. C h a n g e d B 7. M a p C h a n g e B 8. M a p C h a n g e A 9. A ' 10 .B ' 11 M a p A B 1 2 . G 1 3 M a p G A ' 1 4 . G A 1 5 . M a p G A ' B ' 1 6 . G A B 1 7 . D e l e t e d A 1 8 D e l e t e d B 1 9 . M a p D A C B 2 0 M a p D B C A 21 . S h o u l d D e l e t e 2 2 . S h o u l d D e l e t e 2 3 M a p G A B S D A 2 4 . G A B S D A 2 5 . M a p G A B S D A 2 6 . R e s u l t P r o p e r t y D i s p l a y S i m p l e r Ful l C h a n g e d B = R a n g e ( M a p O A ) Figure A .9 : The Model ChangedB (A N U L L model) Hi Internet - Model Reintegration System - Unknown Site Load Reintegration Model M Model M A Model M B 1. M a p O A 2 . M a p O B 3. M a p O A ' 4. M a p O B ' 5. C h a n g e d A 6. C h a n g e d B 7. M a p C h a n g e B 8. M a p C h a n g e A 9. A ' 10.B' 11 . M a p A B 12.G 1 3 M a p G A ' 14.GA' 15 .MapGA'B ' 1 6 . G A B 17.DeletedA 18.DeletedB 1 9 . M a p D A C B 2 0 M a p D B C A 21 .ShouldDelete 2 2 S h o u l d D e l e t e 2 3 . M a p G A B S D A 2 4 . G A B S D A 2 5 . M a p G A B S D A 26.Result Property Display CS Simple r Full M a p C h a n g e d B A = M a t c h ( C h a n g e d B , C h a n g e d A ) Figure A . 10: The Mapping MapchangedBChangedA between Models ChangedB and ChangedA 2i Internet - Model Reintegration System - Unknown Site Load R e i n t e g r a t i o n - -" ' M o d e l M O /lodel M A Model M B LQjxJ 1. M a p O A 2 . M a p O B 3. M a p O A ' 4. M a p O B ' 5. C h a n g e d A 6. C h a n g e d B 7. M a p C h a n g e B 8. M a p C h a n g e A 9. A ' 10.B' 11 . M a p A B 12.G 13 .MapGA' 14.GA' 15 .MapGA'B ' 1 6 . G A B 17.DeletedA 18.DeletedB 1 9 . M a p D A C B 2 0 . M a p D B C A 2 1 S h o u l d D e l e t e 2 2 S h o u l d D e l e t e 2 3 M a p G A B S D A 2 4 . G A B S D A 2 5 . M a p G A B S D A 26.Result I • i • i i i i i i | Property Display ® Simple r Full M a p C h a n g e d A B = M a t c h ( C h a n g e d A , C h a n g e d B ) Figure A . l l : The Mapping MapchangedAChangedB between Models ChangedA and ChangedB [fi Internet - Model Reintegration System - Unknown Site Load Reintegration lei M O 3. M a p O A ' 4. M a p O B ' 5. C h a n g e d A 6. C h a n g e d B 7. M a p C h a n g e B 8. M a p C h a n g e A 9. A ' 10 .B ' 11 . M a p A B 1 2 . G 1 3 . M a p G A ' 1 4 . G A ' 1 5 . M a p G A ' B ' 1 6 . G A B 1 7 . D e l e t e d A 1 8 . D e l e t e d B 1 9 . M a p D A C B 2 0 . M a p D B C A 21 . S h o u l d D e l e t e 2 2 . S h o u l d D e l e t e 2 3 , M a p G A B S D A 24 . G A B S D A 2 5 . M a p G A B S D A 2 6 . R e s u l t Property D i s p l a y S i m p l e r Full A ' = D i f f ( C h a n g e d A , C h a n g e d B ) Figure A.12: The Difference Model A' between Models ChangedA and ChangedB IJi Internet - Model Reintegration System - Unknown Site Load Reintegrat ion M o d e l M O rfodel M, Model M B 1. M a p O A 2 . M a p O B 3. M a p O A ' 4. M a p O B ' 5. C h a n g e d A 6. C h a n g e d B 7. M a p C h a n g e B 8. M a p C h a n g e A 9. A ' 10 .B ' 11 . M a p A B 1 2 . G 1 3 M a p G A ' 1 4 . G A 1 5 . M a p G A ' B ' 1 6 . G A B 1 7 . D e l e t e d A 1 8 . D e l e t e d B 1 9 M a p D A C B 2 0 M a p D B C A 21 . S h o u l d D e l e t e 2 2 . S h o u l d D e l e t e 2 3 . M a p G A B S D A 2 4 . G A B S D A 2 5 M a p G A B S D A 2 6 . R e s u l t ^ l P l . * J P r o p e r t y D i s p l a y ® S i m p l e r Full B ' = D i f f ( C h a n g e d B , C h a n g e d A ) Figure A.13: The Difference Model B' between Models ChangedB and ChangedA 1. M a p O A 2 . M a p O B 3. M a p O A ' 4. M a p O B ' 5. C h a n g e d A 6. C h a n g e d B 7. M a p C h a n g e B 8. M a p C h a n g e A 9. A ' 10 .B' 11 . M a p A B 1 2 . G 1 3 . M a p G A ' 1 4 . G A ' 1 5 . M a p G A ' B ' 1 6 . G A B 1 7 . D e l e t e d A 1 8 . D e l e t e d B 1 9 . M a p D A C B 2 0 . M a p D B C A 21 . S h o u l d D e l e t e 2 2 , S h o u l d D e l e t e 2 3 , M a p G A B S D A 2 4 . G A B S D A 2 5 , M a p G A B S D A 26 .ResuIt y " Internet - Model Reintegration System - Unknown Site Load Reintegration M o d e l M O M o d e l M A M o d e ! M B P r o p e r t y D i s p l a y ® S i m p l e r Full M a p A B = M a t c h ( M A , M B ) Figure A . 14: The Mapping MapAB between Models MA and MB Qf Internet - Model Reintegration System - Unknown Site Reintegration Model M A Model M B 1. M a p O A 2 . M a p O B 3. M a p O A ' 4. M a p O B ' 5. C h a n g e d A 6. C h a n g e d B 7. M a p C h a n g e B 8. M a p C h a n g e A 9. A ' 10 .B ' 11 . M a p A B 1 2 . G 1 3 . M a p G A ' 1 4 . G A ' 1 5 . M a p G A ' B ' 1 6 . G A B 1 7 . D e l e t e d A 1 8 . D e l e t e d B 1 9 . M a p D A C B 2 0 . M a p D B C A 21 . S h o u l d D e l e t e 2 2 , S h o u l d D e l e t e 2 3 . M a p G A B S D A 2 4 . G A B S D A 2 5 . M a p G A B S D A 2 6 . R e s u l t P r o p e r t y D i s p l a y S i m p l e r Full G = M e r g e ( M A , M B ) Figure A.15: The Merged Model G from Models MA and MB LI*1 Internet - Model Reintegration System - Unknown Site Load Reintegration • | D | x | M o d e l M O M o d e l M A i M o d e l M B 1. M a p O A 2 . M a p O B 3. M a p O A ' 4. M a p O B ' 5. C h a n g e d A 6. C h a n g e d B 7. M a p C h a n g e B 8. M a p C h a n g e A 9. A ' 10 .B' 11 . M a p A B 1 2 . G 1 3 . M a p G A ' 1 4 . G A ' 1 5 . M a p G A ' B ' 1 6 . G A B 1 7 . D e l e t e d A 1 8 . D e l e t e d B 1 9 . M a p D A C B 2 0 . M a p D B C A 21 . S h o u l d D e l e t e 2 2 . S h o u l d D e l e t e 2 3 . M a p G A B S D A 2 4 . G A B S D A 2 5 . M a p G A B S D A 2 6 . R e s u l t P r o p e r t y D i s p l a y c S i m p l e r Full M a p G A ' = M a p ( G , A') Figure A.16: The Mapping MapoA' between Models G and A' Wi Internet - Model Reintegration System - Unknown Site Load r Reintegrat ion j P r o p e r t y D i s p l a y Model M O f1- S i m p l e Model M A C Ful l Model M B 1. M a p O A G A ' = M e r g e ( G . A') 2 . M a p O B 3. M a p O A ' 4. M a p O B ' 5. C h a n g e d A B. C h a n g e d B 7. M a p C h a n g e B 8. M a p C h a n g e A CD G) 9. A ' 10 .B ' 11 . M a p A B 1 2 . G 1 3 . M a p G A ' 1 4 . G A ' 1 5 . M a p G A ' B ' 1 6 . G A B 1 7 . D e l e t e d A 1 8 . D e l e t e d B 1 9 . M a p D A C B 2 0 . M a p D B C A 21 . S h o u l d D e l e t e 2 2 . S h o u l d D e l e t e 2 3 . M a p G A B S D A 2 4 . G A B S D A 2 5 . M a p G A B S D A |2B.Result Figure A.17: The Merged Model GA' from Models G and A' |Q"' Internet - Model Reintegration System - Unknown Site Load r~ Reintegrat ion Model M O j P r o p e r t y D i s p l a y ' ffi S i m p l e r Full [ Model M A | M o d e l M B M a p G A ' B ' = M a t c h ( G A , B') 1. M a p O A 2 . M a p O B 3. M a p O A ' 4. M a p O B ' 5. C h a n g e d A 6. C h a n g e d B 7. M a p C h a n g e B 8. M a p C h a n g e A 9. A ' 10 .B ' 11 . M a p A B 1 2 . G 1 3 . M a p G A 1 4 . G A ' 1 5 . M a p G A ' B ' 1 6 . G A B 1 7 . D e l e t e d A 1 8 . D e l e t e d B 1 9 . M a p D A C B 2 0 . M a p D B C A 21 . S h o u l d D e l e t e 2 2 . S h o u l d D e l e t e 2 3 . M a p G A B S D A 2 4 . G A B S D A 2 5 , M a p G A B S D A |26.Result Figure A . 18: The Mapping MapcA'B1 between Models GA' and B' Q? Internet - Model Reintegration System - Unknown Site Load - R e i n t e a r a t i o n - - " - -Model M O j P r o p e r l y D i s p l a y ® S i m p l e C Full Model M B 1. M a p O A G A B = M e r g e ( G A , B;) 2 . M a p O B 3. M a p O A 4. M a p O B 1 5. C h a n g e d A 6. C h a n g e d B j p i 7. M a p C h a n g e B 8. M a p C h a n g e A 9. A ' f 10 .B ' 11 . M a p A B 1 2 . G 1 3 . M a p G A 1 4 . G A 1 5 . M a p G A ' B ' 1 6 . G A B 1 7 . D e l e t e d A 1 8 . D e l e t e d B 1 9 . M a p D A C B 2 0 . M a p D B C A 21 . S h o u l d D e l e t e 2 2 . S h o u l d D e l e t e 2 3 . M a p G A B S D A 2 4 . G A B S D A 2 5 . M a p G A B S D A 126.Result Figure A.19: The Merged Model GAB from Models GA' and B ' IB? Internet - Model Heintegral ion System - Unknown Site ''flHHflHIi HHHHHiilHHiHHHHHflHHHIHIHH - R e i n t e g r a t i o n , P r o p e r t y D i s p l a y Model M O ) S i m p l e C Full Model M A j Model M B 11. M a p O A D e l e t e d A = D i f f ( M O , MA) 2 . M a p O B 3. M a p O A ' 4. M a p O B ' 5. C h a n g e d A 6. C h a n g e d B null 7. M a p C h a n g e B 8. M a p C h a n g e A 9. A ' 10 .B ' 11 . M a p A B 1 2 . G 1 3 . M a p G A ' 1 4 . G A ' 1 5 . M a p G A ' B ' 1 6 . G A B 1 7 . D e l e t e d A 1 8 . D e l e t e d B 1 9 . M a p D A C B 2 0 . M a p D B C A 21 . S h o u l d D e l e t e 2 2 . S h o u l d D e l e t e 2 3 . M a p G A B S D A 24 . G A B S D A 2 5 . M a p G A B S D A 126.Resu.lt Figure A .20: The Model DeletedA Showing The Difference between Models MO and MA |LTi Internet - Model Reintegration System - Unknown Site - i n l x| Load -Reintegrat ion Model M O Model M A Model M B 1. M a p O A 2 . M a p O B 3. M a p O A 4. M a p O B ' 5. C h a n g e d A 6. C h a n g e d B 7. M a p C h a n g e B 8. M a p C h a n g e A 9. A ' 10 .B ' 11 . M a p A B 1 2 . G 1 3 . M a p G A 1 4 . G A 1 5 . M a p G A ' B ' 1 6 . G A B 1 7 . D e l e t e d A 1 8 . D e l e t e d B 1 9 . M a p D A C B 2 0 . M a p D B C A 21 . S h o u l d D e l e t e 2 2 . S h o u l d D e l e t e 2 3 . M a p G A B S D A 2 4 . G A B S D A 2 5 . M a p G A B S D A 2 6 . R e s u l t P r o p e r t y D i s p l a y f S i m p l e r Full D e l e t e d B = D i f f ( M O , M B ) Figure A.21: The Model Deleteds Showing The Difference between Models MO and MB \i Internet - Model Reintegration System - Unknown Site Load l F Reintegrat ion M o d e l M O M o d e l M A M o d e l M 1. M a p O A 2 . M a p O B 3. M a p O A ' 4. M a p O B ' 5. C h a n g e d A 6. C h a n g e d B 7. M a p C h a n g e B 8. M a p C h a n g e A 9. A ' 10 .B ' 11 . M a p A B 1 2 . G 1 3 . M a p G A 1 4 . G A 1 5 . M a p G A ' B ' 1 6 . G A B 1 7 . D e l e t e d A 1 8 . D e l e t e d B 1 9 M a p D A C B 2 0 M a p D B C A 21 . S h o u l d D e l e t e 2 2 S h o u l d D e l e t e 2 3 . M a p G A B S D A 2 4 . G A B S D A 2 5 . M a p G A B S D A 2 6 . R e s u l t P r o p e r t y D i s p l a y C" S i m p l e r Full M a p D A C B = M a t c h ( D e l e t e d A , C h a n g e d B ) Figure A .22: The Mapping Model MapoACB between Models DeletedA and Changeds Q± Internet - Model Reintegration System - Unknown Site Load Reintegration I Model M O Model M A Model M B 1. M a p O A 2 . M a p O B 3. M a p O A ' 4. M a p O B ' 5. C h a n g e d A 6. C h a n g e d B 7. M a p C h a n g e B 8. M a p C h a n g e A 9. A ' 10 .B ' 11 . M a p A B 1 2 . G 1 3 , M a p G A ' 1 4 . G A ' 1 5 . M a p G A ' B ' 1 6 . G A B 1 7 . D e l e t e d A 1 8 . D e l e t e d B 1 9 . M a p D A C B 2 0 M a p D B C A 21 . S h o u l d D e l e t e 2 2 . S h o u l d D e l e t e 2 3 . M a p G A B S D A 2 4 . G A B S D A 2 5 . M a p G A B S D A 2 6 . R e s u l t P r o p e r t y D i s p l a y <S S i m p l e r Full M a p D B C A = M a t c h ( D e l e t e d B , C h a n g e d A ) Figure A.23: The Mapping Model MapoBCA between Models Deletedg and Changed A Internet - Model Reintegration System - Unknown Site Load Reintegration i P r o p e r t y D i s p l a y Model M O J S i m p l e Model M A r Full | Model M B 11. M a p O A S h o u l d D e l e t e A = Dif f (DeletedA, C h a n g e d B ) 2 . M a p O B 3. M a p O A ' 4. M a p O B ' 5. C h a n g e d A 6. C h a n g e d B null 7. M a p C h a n g e B 8. M a p C h a n g e A 9. A ' 10 .B 1 11 . M a p A B 1 2 . G 1 3 . M a p G A 1 4 . G A 1 5 . M a p G A ' B ' 1 6 . G A B 1 7 . D e l e t e d A 1 8 . D e l e t e d B 1 9 . M a p D A C B 2 0 . M a p D B C A 21 . S h o u l d D e l e t e 2 2 , S h o u l d D e l e t e 2 3 . M a p G A B S D A 2 4 . G A B S D A 2 5 . M a p G A B S D A 126 . R e s u l t Figure A.24: The Difference Model ShouldDeleteA between DeletedA and ChangedB 1 Internet - Model Reintegration System 1 - Unknown Site - ICI l x| Load R e i n t e g r a t i o n ^ Model M O Model M A Model M B 1. M a p O A 2 . M a p O B 3. M a p O A 4. M a p O B ' 5. C h a n g e d A 6. C h a n g e d B 7. M a p C h a n g e B 8. M a p C h a n g e A 9. A 10 .B' 11 . M a p A B 1 2 . G 1 3 . M a p G A 1 4 . G A 1 5 . M a p G A ' B ' 1 6 . G A B 17 D e l e t e d A 1 8 . D e l e t e d B 1 9 . M a p D A C B 2 0 . M a p D B C A 21 . S h o u l d D e l e t e 2 2 . S h o u l d D e l e t e 2 3 . M a p G A B S D A 2 4 . G A B S D A 2 5 . M a p G A B S D A 2 6 . R e s u l t Property D i s p l a y «S S i m p l e C Full S h o u l d D e l e t e B = Di f f (DeletedB, C h a n g e d A ) Figure A.25: The Difference Model ShouldDeleteB between DeletedB and ChangedA Q"' Internet - Model Reintegration System - Unknown Site Load [ - R e i n t e g r a t i o n ^ 1 • P r o p e r t y D i s p l a y Model M O ) <* S i m p l e Model M A r Full | Model M B 1. M a p O A M a p G A B S D A = M a t c h ( G A B . S h o u l d D e l e t e A 2 . M a p O B 3. M a p O A " 4. M a p O B ' 5. C h a n g e d A B. C h a n g e d B null J 7. M a p C h a n g e B 8. M a p C h a n g e A 9. A ' 10 .B ' 11 . M a p A B 1 2 . G 1 3 . M a p G A ' 1 4 . G A ' 1 5 . M a p G A ' B ' 1 6 . G A B 1 7 . D e l e t e d A 1 8 . D e l e t e d B 1 9 . M a p D A C B 2 0 . M a p D B C A 21 . S h o u l d D e l e t e 2 2 . S h o u l d D e l e t e 2 3 . M a p G A B S D A 2 4 . G A B S D A 2 5 . M a p G A B S D A |2B.Result Figure A.26: The Mapping Model between Models GAB and ShouldDeleteA 0 ? Internet - Model Reintegration System - Unknown Site Load Reintegration i P r o p e r l y D i s p l a y Model M O J S i m p l e Model M A r Full ! Model M B 1. M a p O A G A B S D A = Diff ( G A B , S h o u l d D e l e t e A ) 2 . M a p O B 3. M a p O A ' 4. M a p O B ' 5. C h a n g e d A 6. C h a n g e d B 7. M a p C h a n g e B 8. M a p C h a n g e A f y CD 9. A ' m 110.B' j 11 . M a p A B |12.G 1 3 . M a p G A ' 1 4 . G A ' 1 5 . M a p G A ' B ' 1 6 . G A B 1 7 . D e l e t e d A 1 8 . D e l e t e d B 1 9 , M a p D A C B 2 0 . M a p D B C A 21 . S h o u l d D e l e t e 2 2 . S h o u l d D e l e t e 2 3 . M a p G A B S D A 24 . G A B S D A 2 5 . M a p G A B S D A 126.Result Figure A.27: The Difference Model GABSDA between Models GAB and ShouldDeleteA I'* Internet - Model Reintegration System - Unknown Site Load Reintegration Model MO I Property Display ' (? Simple r Full Model MA Model MB 1. M a p O A M a p G A B S D A S D B = Match(GABSDA, ShouldDeleteB) 2 .MapOB 3. MapOA' 4. MapOB' 5. ChangedA U\ , ) j o ) 6. ChangedB 7. MapChangeB 8. MapChangeA 9. A' ¥ ^> 2 V 10.B' 11 .MapAB 112.G 13.MapGA' 14.GA' 15.MapGA'B' 16.GAB 17.DeletedA 18.DeletedB 1 9 M a p D A C B 2 0 . M a p D B C A 21.ShouldDelete |22.ShouldDelete 2 3 . M a p G A B S D A 2 4 . G A B S D A 2 5 . M a p G A B S D A 126.Result Figure A.28: The Mapping Model between Model GABSDA and Model ShouldDeleteB Lfi Internet - Model Reintegration System - Unknown Site Load Reintegration Model M O Model MA Model MB 1. MapOA 2 M a p O B 3. MapOA' 4. MapOB' 5. ChangedA 8. ChangedB 7. MapChangeB 8. MapChangeA 9. A' 10.B' 11 .MapAB 12.G 1 3 M a p G A 14.GA' 15MapGA'B ' 16.GAB 17.DeletedA 18.DeletedB 1 9 M a p D A C B 2 0 M a p D B C A 21.ShouldDelete 22.ShouldDelete 2 3 . M a p G A B S D A 2 4 . G A B S D A 2 5 M a p G A B S D A 26.Result Jfllxj Property Display CS Simple r Full Result = Diff(GABSDA, ShouldDeleteB) Figure A.29: The F i n a l Result Model Result, Difference between Models GABSDA and ShouldDeleteB 

Cite

Citation Scheme:

        

Citations by CSL (citeproc-js)

Usage Statistics

Share

Embed

Customize your widget with the following options, then copy and paste the code below into the HTML of your page to embed this item in your website.
                        
                            <div id="ubcOpenCollectionsWidgetDisplay">
                            <script id="ubcOpenCollectionsWidget"
                            src="{[{embed.src}]}"
                            data-item="{[{embed.item}]}"
                            data-collection="{[{embed.collection}]}"
                            data-metadata="{[{embed.showMetadata}]}"
                            data-width="{[{embed.width}]}"
                            data-media="{[{embed.selectedMedia}]}"
                            async >
                            </script>
                            </div>
                        
                    
IIIF logo Our image viewer uses the IIIF 2.0 standard. To load this item in other compatible viewers, use this url:
https://iiif.library.ubc.ca/presentation/dsp.831.1-0051766/manifest

Comment

Related Items