UBC Theses and Dissertations

UBC Theses Logo

UBC Theses and Dissertations

View integration in database design Ng, Victor 1996

Your browser doesn't seem to have a PDF viewer, please download the PDF to view this item.

Item Metadata

Download

Media
831-ubc_1997-0266.pdf [ 14.18MB ]
Metadata
JSON: 831-1.0087708.json
JSON-LD: 831-1.0087708-ld.json
RDF/XML (Pretty): 831-1.0087708-rdf.xml
RDF/JSON: 831-1.0087708-rdf.json
Turtle: 831-1.0087708-turtle.txt
N-Triples: 831-1.0087708-rdf-ntriples.txt
Original Record: 831-1.0087708-source.json
Full Text
831-1.0087708-fulltext.txt
Citation
831-1.0087708.ris

Full Text

V i e w I n t e g r a t i o n i n D a t a b a s e D e s i g n by Victor Ng B .Sc , The University of Calgary, 1995 A Thesis Submitted in Partial Fulfillment of the Requirements for the Degree of Master of Science in THE F A C U L T Y OF GRADUATE STUDIES The Department of Management Information Systems The Faculty of Commerce and Business Administration The University of British Columbia We accept this thesis as conforming to the required standard April 1996 © Victor Ng, 1996 in presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission. Department of HftMfta"JM£K\T -^f0?^f\ t I0\j <>\ S T t M ^ The University of British Columbia Vancouver, Canada Date APR ^ , tSS^ DE-6 (2788) Abstract Database designers usually spend much time integrating database views created by different users. This is because different users have different perceptions of the real world. Therefore, conflicts exist between database views. The most common conflicts found are naming and structural ones. Wagner (1989) suggests four characteristics (name, meaning, construct and context) that can be used to identify and distinguish database elements. He also provides a complete solution for the inter-view conflicts. Finally, he derives some heuristic rules that are used to determine similar database elements. However, Wagner's approach does not provide database designers with an efficient conflict analysis procedure. Moreover, no direction is given to database designers for the comparison of different database elements. So, database designers have to compare unnecessary elements. The objective of this research is to lessen the requirement of the view integration system to interact with database designers during a conflict analysis. A more efficient method of conflict analysis is outlined which compares elements in the following order: 1) entities, 2) relatedness, 3) relationships and 4) attributes. Fuzzy logic is also used to provide database designers with some numerical analysis of the degree of similarity between entities. This concept is used at the stage of entities and relatedness identification. The improved view integration system reduces much of the database designers' workload by comparing fewer elements during the conflict analysis. ii Table of Contents ABSTRACT II TABLE OF CONTENTS Ill LIST OF SIGNIFICANT TABLES XIII LIST OF SIGNIFICANT FIGURES XIV ACKNOWLEDGMENTS XV 1. INTRODUCTION .'. 1 1.1 OBJECTIVE OF THE STUDY 2 1.2 REPRESENTATION OF CONCEPTUAL DESIGN 3 1.3 PROBLEMS ASSOCIATED WITH VIEW INTEGRATION 7 1.3.1 Naming Conflicts 8 1.3.2 Structural Conflicts 9 2. LITERATURE REVIEW '. 12 2.1 PREVIOUS VIEW INTEGRATION APPROACH 12 2.2 T H E VIEW INTEGRATION APPROACH BY WAGNER (1989) 14 2.2.1 Criteria for Object Comparison 15 2.2.2 Conflict Resolution 20 iii 2.2.2.1 A Relationship Becomes an Entity (SI) 21 2.2.2.2 A Relationship Attribute Becomes an Entity (S2) 22 2.2.2.3 An Entity Attribute Becomes an E-R Construct (S3) 23 2.2.2.4 Association of an Entity to a Relationship (S4) 24 2.2.2.4.1 A Comment about Conflict Solution S4 (Association of an Entity to a Relationship) 25 2.2.2.5 Relocation of a relationship after creation of new superset or subset classes (S5) 26 2.2.2.6 Representation of Containment (S6) 27 2.2.2.7 Representation of a Common Role (W-relationship) (S7) 28 2.2.2.8 Representation of Common Superset without Overlap (S8) 29 2.2.2.9 Representation of Common Superset with Overlap (S9) 29 2.2.2.10 Renaming of Homonyms and Synonyms (S10) 30 2.2.2.11 Addition of Missing Objects (SI 1) 30 2.2.3 Heuristic Rules 31 2.2.3.1 HI: Two elements with identical or related meanings have a common context 32 2.2.3.2 H2: Two elements with identical or related meanings have the same construct 32 iv 2.2.3.3 H3: If no two elements with identical or related meanings and with identical constructs are found, the construct mismatch could be of the following types: 33 2.2.3.4 H4: Elements with identical meanings have identical names 34 2.2.3.5 H5: Elements with related meanings have names with identical word stems. 34 2.2.3.6 H6: Two elements with identical or related meanings have some attributes with identical names (for entities and relationships only) 34 2.2.3.7 H7: If elements are classified according to their meanings, then elements with identical or related meanings belong to the same pre-defined meaning category 35 2.2.4 Conflict Analysis Procedure 36 2.3 F U Z Z Y L O G I C 39 2.3.1 Basic Zadeh-Type Operations on Fuzzy Sets 43 2.3.1.1 The Intersection of Fuzzy Sets 44 2.3.1.2 The Union of Fuzzy Sets 45 2.3.2 Building a Fuzzy System 47 2.3.2.1 Defining Fuzzy Rules 47 2.3.2.1.1 Defining Fuzzy Sets 50 2.3.2.2 Composing Fuzzy Rules 53 2.3.2.2.1 Correlation Methods (The First Step) 53 2.3.2.2.1.1 Correlation Minimum 54 v 2.3.2.2.1.2 Correlation Product 56 2.3.2.2.1.3 Comparisons of the Correlation Minimum and Correlation Product Methods 58 2.3.2.2.2 Updating the Resultant Output Fuzzy Region (The Second Step) 59 2.3.2.2.2.1 The Min-Max Method 59 2.3.2.2.2.2 The Fuzzy Additive Method 65 2.3.2.2.2.3 Comparisons of the Min-Max Method and the Fuzzy Additive Method 70 2.3.2.3 Decomposing (Defuzzifying) Fuzzy Rules 71 2.3.2.3.1 Composite Moments (Centroid) 71 2.3.2.3.2 Composite Maximum (Maximum Height) 72 2.3.2.3.3 Comparisons of the Composite Moments Method and the Composite Maximum Method 73 3. CONTRIBUTION TO KNOWLEDGE 74 4. IMPROVED VIEW INTEGRATION SYSTEM 76 4.1 N E W C O N F L I C T A N A L Y S I S P R O C E D U R E 77 4.1.1 Entities Identification 78 4.1.1.1 Creating a Testing List Consisting Merely of Entities from Each of the Two Original Database Views (Phase Ei) 78 4.1.1.2 Comparing Each Entity from One Testing List with All Entities in the Other List (Phase Eii) 79 v i 4.1.1.2.1 An Entity in One View Corresponds to an Entity in the Other View (Situation El) 79 4.1.1.2.2 An Entity in One View Corresponds to None of the Entities in the Other View (Situation E2) 80 4.1.2 Relatedness Identification 85 4.1.2.1 Copying Existing Is-a Relationships from the Two Original Views to the Resultant View (Phase Li) 85 4.1.2.2 Identifying New Is-a Relationships between Unique Entities in the Resultant View (Phase Lii) 87 4.1.2.2.1 The Test for Containment (Test La) 88 4.1.2.2.1.1 The Common Entity is a Subset of the Two Different Unique Entities in the Original Views (Situation La.l) 89 4.1.2.2.1.2 The Common Entity is a Superset of the Two Different Unique Entities in the Original Views (Situation La.2) 90 4.1.2.2.1.3 The Common Entity is a Subset of One Unique Entity and a Superset of Another Unique Entity in the Original Views (Situation La.3)....91 4.1.2.2.2 Test for Superset-subset (Test Lb) 92 4.1.2.2.3 Test for a W-relationship (Test Lc) 95 4.1.3 Relationships Identification 97 4.1.3.1 Identifying Relationships in the Two Original Database Views (Phase Ri). ...97 4.1.3.1.1 With a Pair of Entities, None of the Entities Is Involved in Any Is-a Relationship (Situation Ra) 98 vii 4.1.3.1.1.1 Both Entities Are Unique in Different Database Views (Situation Ra.l) 98 4.1.3.1.1.2 Both Entities Are Present Together in Only One Original Database View (Situation Ra.2) 99 4.1.3.1.1.2.1 Two Entities Existing Only in One Database View Are not Linked by Any Relationship (Situation Ra.2.1) 100 4.1.3.1.1.2.2 Two Entities Existing Only in One Database View Are Linked by a Relationship (Situation Ra.2.2) 101 4.1.3.1.1.3 Both Entities Are Present in the Two Original Views (Situation Ra.3) 102 4.1.3.1.1.3.1 No Relationship Exists between the Two Entities in Both Database Views (Situation Ra.3.1) 103 4.1.3.1.1.3.2 A Relationship Exists between the Two Entities in Both Database Views (Situation Ra.3.2) 104 4.1.3.1.1.3.2.1 The Relationships between the Two Entities in the Two Database Views Are Identical (Situation Ra.3.2.1) 104 4.1.3.1.1.3.2.2 The Relationships between the Two Entities in the Two Database Views Are Non-identical (Situation Ra.3.2.2)...105 4.1.3.1.1.3.3 A Relationship Exists between the Two Entities in Only One Database View (Situation Ra.3.3) 106 4.1.3.1.1.3.3.1 The Relationship in One View May be Missing in the Other View (Situation Ra.3.3.1) 107 viii 4.1.3.1.1.3.3.2 The Relationship in One View May Correspond to an Entity in the Other View (Situation Ra.3.3.2) 108 4.1.3.1.2 With aPair of Entities, Either One or Both Entities Is/Are Involved in Is-a Relationships (Situation Rb) 109 4.1.3.1.2.1 Only One of the Two Entities Is Involved in Is-a Relationships (Situation Rb.l) 109 4.1.3.1.2.2 Both Entities are Involved in Is-a Relationships (Situation Rb.2)...l 16 4.1.3.2 Copying Cardinalities from the Two Original Views to the Resultant View (Phase Rb) 122 4.1.4 Attributes Identification 123 4.1.4.1 Identifying Attributes in the Two Original Database Views (Phase Ai) 123 4.1.4.1.1 An Attribute that Is Associated with an Entity (Situation Aa) 123 4.1.4.1.1.1 The Entity Is not Involved in an Is-a Relationships Chain (Situation Aa.l) 124 4.1.4.1.1.1.1 The Entity Exists in Only One Database View (Situation Aa.1.1) 124 4.1.4.1.1.1.2 The entity exists in both database views (Situation Aa. 1.2) 125 4.1.4.1.1.1.2.1 The Entity Attribute in One View Corresponds to an Attribute Associated with the Same Entity in the Other View (Situation Aa.l.2.1) 126 ix 4.1.4.1.1.1.2.2 The Entity Attribute in One View Corresponds to an E-R Construct Associated with the Same Entity in the Other View (Situation Aa. 1.2.2) 127 4.1.4.1.1.1.2.3 The Entity Attribute in One View Corresponds to No Attribute in the Other View (Situation Aa.1.2.3) 128 4.1.4.1.1.2 The Entity Is Involved in an Is-a Relationships Chain (Situation Aa.2) 129 4.1.4.1.1.2.1 No New Is-a Relationship Has Been Added to the Is-a Relationships Chain (Situation Aa.2.1) 129 4.1.4.1.1.2.2 New Is-a Relationships Have Been Added to the Is-a Relationships Chain (Situation Aa.2.2) 130 4.1.4.1.2 An Attribute Is Associated with a Relationship (Situation Ab) 134 4.1.4.1.2.1 The Relationship Attribute in One View Corresponds to an Attribute Associated with the Same Relationship in the Other View (Situation Ab.l) 134 4.1.4.1.2.2 The Relationship Attribute in One View Corresponds to an Entity in a Ternary Relationship in the Other View (Situation Ab.2) 135 4.1.4.1.2.3 The Relationship Attribute in One View Corresponds to an Attribute of Another Relationship Connecting the Same Entities in the Other View (Situation Ab.3) 139 4.1.4.1.2.4 The Relationship Attribute in One View Corresponds to No Attribute in the Other View (Situation Ab.4) 140 4.1.4.2 Solving key conflicts in the two original database views (Phase Aii) 144 4.1.5 Wagner's View Integration System Vs the Improved View Integration System....145 4.1.5.1 Case 1: Partially Overlapped Database Views 146 4.1.5.2 Case 2: Non-overlapped Database Views 159 4.1.5.3 Case 3: Identical Database Views 174 4.2 DETERMINING INTER-VIEWS ENTITIES' SIMILARITIES USING FUZZY LOGIC 184 4.2.1 Building the Fuzzy System for Entities' Similarities Determination 185 4.2.1.1 Defining Fuzzy Rules for Entities' Similarities Determination 185 4.2.1.1.1 Defining Model Variables for Entities' Similarities Determination 186 4.2.1.1.2 Defining Ranges of Fuzzy Set Domains 190 4.2.1.1.3 Selecting Surface Type of Fuzzy Sets 195 4.2.1.1.4 Defining Fuzzy Rules 196 4.2.1.2 Composing Fuzzy Rules 204 4.2.1.3 Decomposing (defuzzifying) Fuzzy Rules 217 4.2.2 Sensitivity Analysis of the Fuzzy Reasoning 226 4.2.2.1 Variation of the Surface Type of the Fuzzy Sets 230 4.2.2.2 Variation of the Correlation Method 233 4.2.2.3 Variation of the Number of Fuzzy Sets 236 4.2.2.4 Variation of the Overlap Degree of Neighboring Fuzzy Sets 247 4.2.2.5 Variation of the Antecedent Truth Membership of the Model Variable Name 258 xi 4.2.3 Advantages of Using Fuzzy Logic in Entities Similarity Determination 270 5. CONCLUSIONS AND RECOMMENDATIONS 272 6. BIBLIOGRAPHY 277 7. APPENDIX (CASE STUDY) 279 ( xii List of Significant Tables Table 1. The Neighbor Property of Different Kinds of Elements by Batini and Lenzerini (1984) 16 Table 2. All the Possible Cases for Comparing Two Elements from Two Different Views ...17 Table 3. Summary of the Candidates of a Relationship 121 Table 4. Summary of the Candidates of an Attribute 143 Table 5. The Comparison of Wagner's System with the Improved System in Integration of Partially Overlapped Views 147 Table 6. The Comparison of Wagner's System with the Improved System in Integration of Non-overlapped Views 160 Table 7. The Comparison of Wagner's System with the Improved System in Integration of Identical Views 175 Table 8. Comparisons of Wagner's Approach with the New Approach 272 xin List of Significant Figures Figure 1. The Metadescription of the Extended E-R Model 3 Figure 2. The Crispy Set for the Tall Concept 39 Figure 3. The Fuzzy Set for the Tall Concept 40 Figure 4. The General Fuzzy Set Structure 41 Figure 5. An Alpha Cut for the Fuzzy Set of the Tall Concept 42 Figure 6. The Intersection of Two Fuzzy Sets 45 Figure 7. The Union of Two Fuzzy Sets 46 Figure 8. Fuzzy Sets of the Model Variable Price 51 Figure 9. The Correlation Minimum Method 55 Figure 10. The Correlation Product Method 57 Figure 11. Composing Fuzzy Propositions Using the Min-Max Method (Case 1) 61 Figure 12. Composing Fuzzy Propositions Using the Min-Max Method (Case 2) 63 Figure 13. Composing Fuzzy Propositions Using the Fuzzy Additive Method (Case 1) 66 Figure 14. Composing Fuzzy Propositions Using the Fuzzy Additive Method (Case 2) 68 Figure 15. Fuzzy Sets of the Model Variable Similarity 191 Figure 16. Fuzzy Sets of the Model Variable Attributes' Resemblance 193 xiv Acknowledgments "Praise be to the God and Father of our Lord Jesus Christ, who has blessed us in the heavenly realms with every spiritual blessing in Christ." Ephesians 1: 3 I would like to give all the credits of my work to God. I would not be able to finish my thesis without His blessings. I thank my chief adviser Dr. Robert Goldstein for guiding me through this interesting and challenging research. I appreciate my second adviser Dr. Carson Woo for giving me invaluable comments and helps. My thanks also go to Dr. Ananth Srinivasan who is the third adviser in my committee. I would also like to give thanks to my family members for their love and support. Finally, I wish to show appreciation to my friends Samson, Ada, Aaron and Shen who constantly give me encouragement. xv 1. Introduction The design of a database system usually requires collaboration between database users and designers. Before they start building a database system, designers have to gather information from users. Different users might have different conceptual representations of information, while one designer's idea about how to convert users' information into a database design could be quite different from other designers'. Therefore, the database design process becomes time consuming and error prone. The database system development process, as suggested by the New Orleans Database Design Workshop, 1979 and other database researchers, can be divided into the following phases: 1. Requirements Analysis - During this phase, information is obtained from users to define a top down focus on an enterprise's requirements and on the boundaries of a database system. 2. Conceptual Design - The goal of this phase is "to obtain an integrated formal implementation independent of application specific enterprise information" (Navathe et al., 1980). Conceptual design can be divided into two stages: a) View Modeling and Modification - The objective of this stage is to represent users' information requirements as database views. b) View Integration - Individual database views are integrated into a global database schema. 1 3. Logical Design - Issues of integrity, consistency, recovery, security and efficiency of the database system are resolved in this phase. 4. Physical Design - This phase deals with the functionality and efficiency of the database system. 1.1 Objective of the Study In this research, the focus will be on the view integration stage which has as its goal to produce from users' database views a global database schema which is free of redundancy, inconsistency and information omission. The objective of this research is: 1. to improve Wagner's conflict analysis approach (Wagner 1989); 2. to identify general situations that happen during a conflict analysis; 3. to reduce the number of interactions with database designers during a conflict analysis; and 4. to provide database designers with numerical hints using the concept of fuzzy logic in order to help them recognize identical or related entities. 2 1.2 Representation of Conceptual Design One traditional approach to represent conceptual design has been the entity-relationship (E-R) method introduced by Chen (1976). In this research, an extended version of Chen's E-R model is used to represent the conceptual design. The metadescription of the extended E-R model, which is a modification of the E-R model's metaschema by Batini et al. (1984 p. 651), is shown in the figure below. Figure 1. The Metadescription of the Extended E-R Model 1,1 l,n Identifier 3 There are three basic classes of constructs in the original E-R model: entities, attributes, and relationships. An entity is a fact or an object that can be distinctly identified from the real world, for instance, a course. It is represented diagrammatically by a rectangle: Course An entity can be associated with relationships and attributes. However, it can also exist without any association with relationships and attributes. A relationship is a connection between entities, for example, an instructor teaches a course. It is represented by a rhombus: A relationship has to be associated with at least two entities, but the two entities are not necessarily different. For instance, a unary relationship is associated with the same entity twice. An example of a unary relationship is the prerequisites of courses: Courses A relationship can possess zero or more than one attribute. 4 An attribute is graphically represented by an eclipse. The representation of the attribute Bank Account of an entity Person is as follows: Person < 2 ank Accoui An attribute is associated with one and only one entity or relationship. There are two types of attributes: property and interconnection attributes. A property attribute, such as color or name, describes the characteristics of an entity or a relationship. An interconnection attribute portrays the association of an entity or a relationship with another entity or relationship. In other words, an interconnection attribute can also be abstracted as an entity. For example, a real world object Bank Account can be represented as the interconnection attribute of an entity Person: Person <^ T4ank Account} but the bank account can also be represented as an entity: has An identifier is an attribute which is able to identify each occurrence of an entity. For example, a person can be identified by a social insurance number. An identifier can be composed of one attribute (single key) or more than one attribute (concatenated key). It must be underlined to be distinguishable from other attributes of the entity it is associated with. For example, an attribute Employee ID is underlined to denote that it is the key of an entity Employee. Employee -^E^loyeelD^) 5 The cardinalities and the total and partial nature of relationships can be described by min-card (inalities) and max-card (inalities). The min-card is the minimum number of occurrences of a relationship that an occurrence of an entity can be associated with. A value of 0 means that an occurrence of an entity may exist without any association with an occurrence of a relationship; a value of 1 (or n) means that an occurrence of an entity has to be associated with at least 1 (or n) occurrence(s) of a relationship. Correspondingly, the max-card is the maximum number of occurrences of a relationship that each occurrence of an entity can be associated with. With figure 1, an occurrence of the entity Entity can be associated with at least 0 (min-card) and at most n (max-card) occurrence(s) of the relationship E-A. An entity can be related to other entities through an is-a relationship. An is-a relationship is used to describe a subset relationship between entities. For example, an entity Part-Time Employee Is-A subset of an entity Employee. The min-card and max-card of a subset entity to an is-a relationship are 1 and 1 respectively, while the min-card and max-card of a superset entity to an is-a relationship are 0 and 1 respectively. A broken line connecting a subset entity and an is-a relationship is used to allow users to identify a subset entity from an is-a relationship quickly, and this notation will be used throughout this study. Employee Part-Time (1,1) Employee 6 1.3 Problems Associated with View Integration During the view modeling phase, database users attempt to abstract data requirements from the "real world" into individual database views. In other words, a database view reflects a user's understanding of some part of the real world. These pieces of information are then integrated to form a complete database schema by database designers during the view integration phase. Users, rather than database designers, are requested to describe the information requirements of a database system because they are most familiar with their task domains. For instance, an accounting manager is most familiar with the operations within the accounting department. By requesting users to describe their data needs, the accuracy of the information gathered, and hence the validity of the database views, can be improved. Different users, however, perceive the world in different ways, because users have diverse areas of responsibilities. For example, the objective of a sales representative of a company is to sell products to customers, whereas the job of an accounting clerk is to collect payments from customers. Moreover, different users may have overlapping scopes of responsibilities. For instance, both the sales representative and the accounting clerk have to contact customers. Therefore, conflicting opinions may arise when users are asked to abstract information from the real world into database views. In general, there are two kinds of conceptual representation conflicts among users' database views: naming and structural conflicts. 7 1.3.1 Naming Conflicts Users from different application areas of an organization may have their own terminology and names. Therefore, inconsistencies among names of elements (entities, relationships and attributes) in different users' database views occur. Naming conflicts are of two types: 1. Homonyms: This conflict occurs when the same name is used for elements representing different real world objects in different database views. For example, in one view, an entity Suppl ier means both manufacturers and wholesalers, whereas in another view, the entity Suppl ier means wholesalers only. 2. Synonyms: This conflict results when elements representing the same real world object are described by different names in different database views. For instance, in two different database views, entities Cl ient and Cus tomer are synonyms and are used to describe the same real world object. Note that homonyms can be avoided among attributes by adopting the naming convention: objectname.attr ibutename (Elmasri et al., 1985). In this way, the attribute Address of an entity Suppl ier can be distinguished from the attribute A d d r e s s of an entity Customer. 8 1.3.2 Structural Conflicts Structural conflicts arise as a result of users' different choices for modeling real world objects. In general, structural conflicts are of the following types: 1. Type Conflicts (Construct Mismatches): These conflicts arise when the same real world object is represented by different modeling constructs in different database views. For example, a real world object Bank Account is represented as an entity in one database view, whereas it is the attribute of an entity Customer in another database view. View 1: Customer View 2: Customer 9 2. Key Conflicts: These conflicts result when different keys are assigned to entities or relationships representing the same real world object in different database views. For example, attributes Employee ID and SIN may be used as the keys of an entity Employee in different database views. View 1: Employee View 2: Employee - ( ^ ^ ^ b y e e l D ^ ) 3. Cardinalities Conflicts: These conflicts occur when the min-card and max-card within an entity-relationship association are incongruent in different database views. For instance, in view 1, a customer can have no bank account, while in view 2, a customer must have at least one bank account. View 1: 0,n Customer View 2: l,n Customer 10 Semantic Conflicts: These conflicts arise when different semantic interpretations are abstracted into different database views. For example, in view 1, it states that all employees is-a shareholders of a company (an entity Employee Is-A subset of an entity Shareholders), while in view 2, all shareholders is-a employees (an entity Shareholders Is-A subset of an entity Employee). View 1: Employee Is-A Shareholders View 2: Shareholders 11 2. Literature Review In this section, previous view integration approaches are first briefly illustrated. Then, the research conducted by Wagner (1989) on view integration is summarized. Finally, the concept of fuzzy logic based on Cox (1995, 1994) is introduced. 2.1 Previous View Integration Approach The view integration technique can be classified into two main streams: syntactic and semantic. The syntactic approach employs functional relationships (functional dependencies) among attributes in different database views to perform view integration. Researchers using the syntactic approach for view integration include Bernstein (1976), Raver and Hubbard (1977), Yao et al. (1982), Casanova and Vidal (1983), and Biskup and Convent (1986, 1985). The advantage of using the syntactic approach is that it does not require a database designer to fully understand the information requirements of a database system because functional dependencies derivation does not require tremendous consideration of the meaning of the elements in database views. However, the syntactic approach requires substantial information (constraint definitions) at the attribute level which is not easily obtained. Moreover, since the syntactic approach operates at the attribute level, the technique may be computationally (NP) hard. 12 Finally, the approach is not able to differentiate between dependencies that have different meanings, but involve the same attributes. For example, the functional dependency between attributes Emp loyee ID and Department Number (Employee ID implies Department Number) may actually represent two totally different functions. One function may state the department to which an employee belongs to while the other function may determine the department where an employee resides at. Since an employee belonging to the accounting department may sit in the financial department, the syntactic approach cannot differentiate such functions unless the attributes are explicitly renamed. In this case, the attribute Department Number has to be renamed to Department Be longed and Department Res ided accordingly. The semantic approach uses the meanings of the elements in database views to perform view integration. Examples of researchers using the semantic approach are Batini and Lenzerini, (1984) and Navathe and Elmasri (1986). Since the semantic approach operates at the entity and relationship level, and not at the attribute level, the technique is less complicated. The resultant global schema is also more "natural" and understandable to users and designers. The disadvantage of the semantic approach is that it requires more users' and designers' interactions to interpret and analyze conflicts. Summaries of each researcher's work on view integration can be found in Batini et al. (1986) and Wagner (1989). 13 2.2 The View Integration Approach by Wagner (1989) Wagner's main contributions (1989) are to suggest the criteria for identifying similar and distinguishing different elements in different database views, to provide a comprehensive list of solutions to views conflicts and to recommend heuristic rules that help find similar elements in different database views. The view integration approach proposed by Wagner is implemented using the semantic approach. It contains the following assumptions and rules: 1. Each database view created by a database user is free of error and is minimal in representation, that is, no redundant information is shown in the view. 2. Two database views are integrated at a time. The views are modified until they become identical. Therefore, the result of this pairwise integration is two new identical views, not a third view which combines elements from the two original ones. 3. Each real world object can only be represented by one element in a database view. 4. No information loss should result from view integration. 14 2.2.1 Criteria for Object Comparison Wagner (1989) proposes that elements in different database views can be identified and distinguished using four characteristics: meaning, name, construct and context. Two view elements (entities, relationships and attributes) have the same meaning if they represent the same real world object. Name is the label assigned to a view element. Construct refers to the structural type of view elements which can be one of entity, relationship, entity-attribute, or relationship-attribute. Context is the set of view elements which a view element is associated with. By definition, an entity has no context. The context of a relationship is all the entities it is associated with. An attribute's context is the entity or relationship it belongs to. Meaning is the most important concept among the four characteristics. By definition, if two elements from different views have the same meaning, all other characteristics have to be identical. If two elements refer to different real world objects, they should have different names, but may have identical constructs and contexts. Only a human database designer is able to decide whether two elements from different database views have the same meaning. A view integration algorithm can only help designers determine identical view elements. The algorithm still requires designers' inputs and interactions to make the final decision on whether or not two elements refer to the same real world object. 15 Batini and Lenzerini (1984) define a neighbor property to help identify and distinguish view elements. The neighbor property of each kind of element is shown in the following table: Table 1. The Neighbor Property of Different Kinds of Elements by Batini and Lenzerini (1984) Element Related Neighbor Properties Entity Attributes, relationships, is-a relationships Relationship Entities, attributes Attribute Entity/relationship Although two view elements referring to different real world objects do not have the same meaning, they can be related in meaning. Two elements can be related through a subset relationship. For example, the entity Part-Time Employee Is-A subset of the entity Employee. Elements that are of the entity or relationship type can be related in meaning. However, two related relationships do not require explicit depiction in an E-R diagram. In other words, an is-a relationship only exists between two related entities. Two related elements exhibit one of the following subset characteristics: 1. One element contains another element. For example, an entity Par t -T ime Emp loyee Is-A subset of an entity Employee. 2. Both elements have a common superset and subset. For example, entities Product T e a m Member and Project T e a m Member have a superset Emp loyee and a subset Product & Project T e a m Member . 16 3. Both elements have a common superset, and they do not overlap. For example, entities Part Time Employee and Full Time Employee have a superset Employee, yet they do not overlap. Two unrelated elements may share a common role (W-relationship). For instance, entities Person and Company are different and unrelated. However, they can have a common role which is the role of a shareholder. The following table, constructed by Wagner (1989), lists all the possible cases for comparing two elements from two different views. A '=' means that the corresponding characteristic of the two elements is identical, and a V denotes that the characteristic is non-identical. As for the meaning characteristic, 'r' means two elements are different in meaning but related. Table 2. All the Possible Cases for Comparing Two Elements from Two Different Views Case Name Construct Meaning Context Classification 1 — = Identical elements. 2 Identical elements with different contexts. 3 — — Synonyms. 4 — = Synonyms with different contexts. 5 A construct mismatch (semantic relativism). 6 A construct mismatch and 17 synonyms. 7 — = or* Different and unrelated elements. 8 — = or * Homonyms. 9 Different elements with different constructs. 10 Different elements with different constructs, but homonyms. 11 — r — Different but related elements. 12 — r = Different but related homonyms. 13 r Different but related elements with different contexts. 14 r Different but related homonyms with different contexts. 15 r Different but related elements of different types. 16 r Different but related homonyms of different types. 17 A missing element. 18 Cases 1 to 6 illustrate the situations when the meaning characteristics of two elements in different views are identical. In cases 7 to 10, the two elements have different meanings. The two elements in cases 11 to 16 are related in meaning. Case 17 shows the situation when an element in one view is not found in the other view. Note that if two elements have different constructs, their contexts will be different. The goal of view integration is to solve the conflicts between database views so as to achieve the situations in case 1, case 7, case 9 and case 11. The cases other than these four cases exhibit either a naming conflict or a construct mismatch. Therefore, cases 2 to 6 (elements with identical meanings but having either a naming conflict or a construct mismatch) have to be converted to case 1 (identical elements), as identical elements should be congruent in all four characteristics. Case 8 and case 10 (different elements with a homonym problem) should be converted to case 7 and case 9 (elements with different meanings) respectively. Cases 12 to 16 (related elements with either a homonym problem or a construct mismatch) are to be converted to case 11 (related elements). A new element is added to the view if case 17 occurs. 19 2.2.2 Conflict Resolution When a conflict is found between two database views, a solution has to be found. All case solutions are combinations of a set of eleven elementary solution operations: 1. A relationship becomes an entity. 2. A relationship attribute becomes an entity. 3. An entity attribute becomes an E-R construct. 4. Association of an entity to a relationship. 5. Relocation of a relationship after creation of new superset or subset classes. 6. Representation of containment. 7. Representation of a common role (W-relationship). 8. Representation of a common superset without overlap. 9. Representation of a common superset with overlap. 10. Renaming of homonyms and synonyms. 11. Addition of missing objects. 20 2.2.2.1 A Relationship Becomes an Entity (SI) When a real world object appears as a relationship in one view and as an entity in another view, the relationship construct has to be converted to the entity construct. New relationships will be added to connect the newly converted entity to other entities that the original relationship is associated with. For example, in view 1, a real world object contract is represented as a relationship, while in view 2, it is represented as an entity. The structural representation in view 1 is then converted to that in view 2. View 1: View 2: Customer Contract contract ^>- Dealer 21 2.2.2.2 A Relationship Attribute Becomes an Entity (S2) If a real world object is represented as a relationship attribute in one view, and as an entity in another view, the relationship attribute is transformed into an entity. The new entity is then connected back to the relationship. For example, a relationship attribute Project in view 1 will be converted to an entity Project as in view 2 to form a ternary relationship. View 1: View 2: 22 2.2.2.3 An Entity Attribute Becomes an E-R Construct (S3) Similarly to relationship attributes, entity attributes are converted to entities if conflicts occur. A new relationship will be added to connect the newly created entity with the entity originally is associated with. For instance, an entity attribute Bank Account is converted into an entity, and a new relationship has is added to connect the converted entity Bank Account with the entity Customer. View 1: Customer View 2: Customer 23 2.2.2.4 Association of an Entity to a Relationship (S4) An existing entity may have to be associated with an already existing relationship. For example, in view 1, entities Supplier and Project are connected by a relationship supply, while in view 2, the entity Supplier and an entity Part are connected by the same relationship supply. Hence, all entities are connected to the relationship to form a ternary relationship. Such a situation arises when two relationships are similar. View 1: View 2: Project Part Project Global Schema: 24 2.2.2.4.1 A Comment about Conflict Solution S4 (Association of an Entity to a Relationship) The circumstance in conflict solution S4 should not have arisen because the relationship supply in the global schema should not have existed without the three entities Suppl ier, Part and Project, while the relationship supply in view 1 and view 2 can exist without the entity Part and Project respectively. In fact, the relationship supply in different views possesses different meanings. In view 1, suppliers supply some resources that are not 'parts' for projects. In view 2, suppliers supply 'parts' which are not used in projects. In the global schema, suppliers supply 'parts' used in projects. 25 2.2.2.5 Relocation of a relationship after creation of new superset or subset classes (S5) When a subset relationship is created in a view, the relationships connecting the original entities may have to be relocated. For instance, in view 1, entities Full T ime Employee and Department are connected by a relationship be long, while in view 2, the entity Full T ime Employee Is-A subset of an entity Emp loyee . The resultant schema requires the relocation of the relationship belong from the entity Full T ime Emp loyee to the entity Employee. View 1: Department belong Full Time Employee View 2: Full Time Employee Global Schema: Department Employee Is-A Full Time Employee 26 2.2.2.6 Representation of Containment (S6) The subset relationship is represented by an is-a relationship. If an entity Full Time Employee is contained by an entity Employee, an is-a relationship is added to connect the two entities. View 1: Employee View 2: Full Time Employee Global Schema: Full Time Employee 27 2.2.2.7 Representation of a Common Role (W-relationship) (S7) Two unrelated elements can nevertheless have a common role. This kind of association is called a W-relationship (Goldstein and Storey, 1988). After the common role is represented, the relationships may have to be relocated. For example, entities Person and Company are not related, but both of them are shareholders of a company. This common role then has to be represented in the global schema. View 1: Person View 2: Company Global Schema: Person Company Shares 28 2.2.2.8 Representation of Common Superset without Overlap (S8) If two elements belong to a superset but exclude each other, the superset has to be added to represent the situation. For example, the entities Full T ime Emp loyee and Part T ime Emp loyee have a superset Employee. However, these two entities do not overlap. View 1: Full Time Employee View 2: Part Time Employee Global Schema: Full Time Employee Part Time Employee 2.2.2.9 Representation of Common Superset with Overlap (S9) Two elements can have both a common superset and a common subset. For instance, both entities Project T e a m Member and Product T e a m M e m b e r have a superset Employee and a subset Project & Product T e a m Member . View 1: Product Team Member View 2: Project Team Member 29 Global Schema: Product Team Member Project & Product Team Member Employee Project Team Member 2.2.2.10 Renaming of Homonyms and Synonyms (S10) When elements representing the same real world object have different names (synonyms) or when elements representing different real world objects have the same name (homonyms), the names of the elements have to be corrected. After synonyms are dealt with, the elements should bear the same name. Once homonyms are treated, the elements should carry different names. 2.2.2.11 Addition of Missing Objects (Sil) Elements may exist in one database view, but not in the other view as most database views merely overlap partially. If an element is missing in one view, it has to be added to make the views identical. 30 2.2.3 Heuristic Rules Heuristic rules are used during view integration to facilitate the determination of identical or related elements in two database views. Heuristic rules are rules that are generally correct in most circumstances. Some heuristic rules are defined as follows: 1. Two elements with identical or related meanings have a common context. 2. Two elements with identical or related meanings have the same construct. 3. If no two elements with identical or related meanings and with identical constructs are found, the construct mismatch could be of the following types: a) If a real world object is represented as an entity or a relationship in one view, it may appear as an entity attribute in the other view. b) If a real world object is represented as an entity in one view, but it does not appear as an entity attribute in the other view, then it may be a relationship attribute. c) If a real world object is represented as a relationship in one view, but it is not an entity attribute in the other view, then it may be an entity. 4. Elements with identical meanings have identical names. 5. Elements with related meanings have names with identical word stems. 6. Two elements with identical or related meanings have some attributes with identical names (for entities and relationships only). 7. If elements are classified according to their meanings, then elements with identical or related meanings belong to the same pre-defined meaning category. 31 2.2.3.1 HI: Two elements with identical or related meanings have a common context. This heuristic rule states that identical or related elements are usually found in the vicinity of the identical elements. If, for instance, in view 1, a relationship hire is associated with an entity Emp loyee , it would be reasonable to anticipate that the relationship hire can be found among the relationships that are connected to the identical entity Emp loyee in view 2. Even if a construct mismatch occurs, the relationship hire may be one of the entities or attributes that is associated with the entity Employee. 2.2.3.2 H2: Two elements with identical or related meanings have the same construct. This heuristic rule states that two identical elements are likely to have the same structural construct. So if a real world object Emp loyee is represented as an entity in view 1, then it is likely to appear as one of the entities in view 2. This heuristic rule is based on the tendency of people to employ the same type of construct to represent an object when they abstract information from the real world. For example, people usually represent real world objects Cus tomer as an entity and Co lor as an attribute 32 2.2.3.3 H3: If no two elements with identical or related meanings and with identical constructs are found, the construct mismatch could be of the following types: a) If a real world object is represented as an entity or a relationship in one view, it may appear as an entity attribute in the other view. A very common mistake in database design is the representation of an entity-relationship (E-R) construct as an entity's interconnection attribute (Storey, 1988). Note that an entity's property attribute would not be mistakenly represented as an E-R construct because a property attribute describes a characteristic of an entity or a relationship. b) If a real world object is represented as an entity in one view, but it does not appear as an entity attribute in the other view, then it may appear as a relationship attribute. Again, an entity in one view is only compared with the interconnection attributes of relationships in the other view. c) If a real world object is represented as a relationship in one view, but it is not an entity attribute in the other view, then it may be an entity. In other words, a relationship in one view can be represented as either an entity attribute or an entity in the other view if a construct mismatch occurs. 33 2.2.3.4 H4: Elements with identical meanings have identical names. In a company, people tend to use the same terminology to describe a real world object. However, one database user may use a singular term to label an object while the other one employs a plural name. Moreover, synonyms may be used to describe the same object. Therefore, this inconsistency may decrease the usefulness of this heuristic rule. 2.2.3.5 H5: Elements with related meanings have names with identical word stems. Word stems could be used as a filter for identifying related elements in different database views. For example, both entities Full T ime Emp loyee and Emp loyee have the same word stem Employee. Hence, it is useful to check the prefix and postfix of an element's name. 2.2.3.6 H6: Two elements with identical or related meanings have some attributes with identical names (for entities and relationships only). Two identical or related elements in different database views most likely possess the same key and property attributes. In fact, the neighbor property proposed by Batini and Lenzerini (1984) can help identify and distinguish view elements. 34 2.2.3.7 H7: If elements are classified according to their meanings, then elements with identical or related meanings belong to the same pre-defined meaning category. If the meaning of database elements can be classified according to some common sense schemes, then identical or related elements can be easily identified. A simple example of such common sense schemes could consist of two categories: animate elements and inanimate elements. Hence, an element Employee belongs to the category of animate elements because an employee is a living object, whereas an element Department belongs to the category of inanimate elements as a department is a non-living object. Then, the element Employee only corresponds or relates to other elements in the category of animate elements. Similarly, the element Department is only required to be compared with elements in the category of inanimate elements. 35 2.2.4 Conflict Analysis Procedure The conflict analysis procedure suggested by Wagner (1989) can be basically divided into two phases: testing the identity of elements and testing the relatedness1 of elements. An element in one view can be identical to at most one element in the other view, but it can relate to more than one element in the other view. The test for identity precedes the test for relatedness because once a pair of identical elements is determined, it is not necessary to check for relatedness anymore. In other words, inter-views' relatedness only exists between elements that are originally unique to one view. For example, view 1 contains elements Pe rson and Emp loyee , while view 2 includes elements Pe rson , Eng ineer and Secretary. The element Emp loyee is unique in view 1, and the elements Eng ineer and Secretary are unique in view 2. Hence, a designer only has to check if Emp loyee in view 1 relates to Eng ineer and Secretary in view 2. During the test for identity, the set of heuristic rules described in section 2.2.3 is used to help determine identical inter-views' elements. Then, the set of conflict solutions in section 2.2.2 is employed to make sure the four characteristics of name, construct, meaning and context are parallel between the inter-views' identical elements after the identity test. 1 The term relatedness is used instead of relationship to avoid confusion between the sets relationship and the relationship construct of the E-R modeling language. 36 The procedure of testing elements' identity is outlined in the following pseudo-code: Create a list of e lements for each of the two views. W H I L E there are still e lements in one of the two lists D O Se lec t one e lement (E1) from one list (L1). Se lec t a set of e lements (Sa) from the other list (L2) that has the s a m e construct as the element from List 1. IF one of the e lements (E2) in S a has the s a m e meaning a s E1 T H E N IF N O T ( E 2 has the s a m e context as E1) T H E N M a k e the context identical. E N D IF IF N O T ( E 2 has the s a m e name as E1) T H E N Make the name identical. E N D IF Delete E1 from L1 and E 2 from L2. E L S E IF a homonym is found from S a T H E N R e n a m e homonym. E N D IF Selec t a set of e lements (Sb) from L2 that have different construct as E 1 . IF one of the e lements (E3) in S b has the s a m e meaning as E1 T H E N M a k e the construct identical. IF N O T ( E 3 has the same context as E1) T H E N M a k e the context identical. E N D IF IF N O T ( E 2 has the same name as E1) T H E N Make the name identical. 37 END IF Delete E1 from L1 and E3 from L2. E L S E IF a homonym is found from Sb THEN Rename homonym. END IF Add E1 to another view. Delete E1 from L1. END IF END IF END WHILE The test for relatedness contains the following steps: 1. Test for Containments. This test determines whether an element in one view is contained by an element in the other view, or vice versa. 2. Test for Superset-subset. This test determines whether two elements from different views belong to a common superset and possess a common subset. 3. Test for a W-relationship. This test determines whether two elements from different views possess a common role. 4. If a construct mismatch exists, the element that is not an entity will be changed to an entity. 38 2.3 Fuzzy Logic This section discusses fuzzy logic concepts based on Cox (1995 and 1994). Only those concepts used in this study will be summarized. Fuzzy logic was invented by Lotfi Zadeh, an engineer and systems scientist. It measures how well the value of an object instance matches a semantic concept. The measure of how tall (a semantic concept) a person perceives another individual who is six feet tall to be (the value of an object instance) is an example of fuzzy logic. Zadeh states that humans reason not in terms of discrete symbols and numbers, but in terms of fuzzy sets. Fuzzy sets are functions that map a value within the domain of a set to a degree of memberships which is a number between zero and one. A degree of zero means that the value is not in the set, and a degree of one means that the value is completely representative of the set. So, the concept of tallness perceived by humans is different than that in classical set theory which enforces the selection of an arbitrary cutoff point, such as six feet. Figure 2. The Crispy Set for the Tall Concept TALL 1 -j I 0.8 • n 0.6 • Degree of Membership o.4 -0.2 • 0 -I , , , 1 , , 4 4.5 5 5.5 6 6.5 height in feet 39 The boundary between what is inside and outside the set, Tall, is very clear. If a person is taller than six feet, he/she is considered to be tall. On the contrary, if the height of a person is less than six feet, he/she is not considered tall. This kind of set is called a crispy set. However, the concept of tallness is normally perceived in a gradually increasing manner. That means as a person's height increases, the belief that the person is tall increases. One may think that a person who is four feet tall has a zero degree of membership (no membership) to the set Tall, and a person who is six feet tall has an one degree of membership (complete membership). Figure 3. The Fuzzy Set for the Tall Concept Tall 0.9 • y ^ 0.8 • 0.7 • yS Degree of o.6 • yS Membership °- 5 - / 0.4 / 0.3 - yS 0.2 yS 0.1 • yS 0 V— , , , , , 4 4.5 5 5.5 6 6.5 height in feet 40 The concept of tallhess described above is an example of a fuzzy set. A fuzzy set is comprised of three components: 1. a horizontal domain axis of monotonically increasing real numbers that constitute the population of the fuzzy set; 2. a vertical membership axis between zero and one indicating the degree of membership, p in the fuzzy set; and 3. the surface of the fuzzy set itself that correlates an element in the domain to a degree of membership in the set. Figure 4. The General Fuzzy Set Structure In fact, the fuzzy modeling technique revolves around the idea of a linguistic variable. The so called linguistic variable is the name of a fuzzy set. In the tallness example, Tall is a linguistic variable. 41 In order to restrict the domain of a fuzzy set based on the membership grade of each domain value, an alpha level (or alpha cut) is introduced. The result of applying the alpha cut is a set which contains all the domain values that are part of the original fuzzy set with a minimum membership value of a. pA(x) > a Returning now to the tallness example, if the alpha value is set to 0.25, the Tall fuzzy set is restricted to the domain of 4.5 to 6.5 feet. Figure 5. An Alpha Cut for the Fuzzy Set of the Tall Concept 4.5 5 5.5 height in feet An alpha cut is useful in excluding the candidate values with a low degree of membership in a fuzzy set, that is, the values which are less likely to belong to the specified fuzzy set. 42 2.3.1 Basic Zadeh-Type Operations on Fuzzy Sets Like conventional crispy sets, basic sets operations (intersection, union and complement) are also applicable to fuzzy sets. The fuzzy logic operations initially defined by Zadeh are as follows: 1. Intersection: AnB = min (u.A[x], |J.B[y]) 2. Union: AuB = max (u-A[x], i-iB[y]) 3. Complement: ~A = 1 - |J.A[x] where [i denotes the degree of membership of elements or objects x and y in the fuzzy sets (linguistic variables) A and B respectively. Since only intersection and union operations are used in this study, these two operations will be discussed in more detail. 43 2.3.1.1 The Intersection of Fuzzy Sets The And operation defined by Zadeh is supported by taking the minimum of the truth membership grades. The following table demonstrates the fuzzy And values for the x and y elements' memberships. 0.00 0.25 0.50 0.75 1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.25 0.00 0.25 0.25 0.25 0.25 0.50 0.00 0.25 0.50 0.50 0.50 0.75 0.00 0.25 0.50 0.75 0.75 1.00 0.00 0.25 0.50 0.75 1.00 44 A bell-curved and a straight-lined graph are used to demonstrate the intersection of two fuzzy sets. The result of the intersection of the two fuzzy sets is represented by the shaded area in the graph. Figure 6. The Intersection of Two Fuzzy Sets Degree of Membership Domain 2.3.1.2 The Union of Fuzzy Sets The OR operator is supported by taking the maximum of the membership grades. The following table shows the fuzzy OR values for the x and y elements' memberships. 0.00 0.25 0.50 0.75 1.00 0.00 0.00 0.25 0.50 0.75 1.00 0.25 0.25 0.25 0.50 0.75 1.00 0.50 0.50 0.50 0.50 0.75 1.00 0.75 0.75 0.75 0.75 0.75 1.00 1.00 1.00 1.00 1.00 1.00 1.00 45 The result of the union of the two fuzzy sets (a bell-curved graph and a straight-lined graph) is represented by the shaded area in the following graph. 46 2.3.2 Building a Fuzzy System Although fuzzy logic has been known and examined for nearly thirty years, the concept has only recently been used in the business and technical areas. Fuzzy logic is mainly used in the area of process and control engineering. The development of a fuzzy system can be basically divided into the following steps: 1. Defining fuzzy rules. 2. Composing fuzzy rules. 3. Decomposing (defuzzifying) fuzzy rules. 2.3.2.1 Defining Fuzzy Rules The root mechanism in a fuzzy system is propositions. A fuzzy system is comprised of a series of conditional and unconditional fuzzy rules. A rule is made of propositions which relate the value of a model variable, x, to a fuzzy set, Y; it is expressed in the following form: x is Y The result of evaluating a fuzzy proposition is the degree of membership, pA of x in Y. pA <- (x e Y) 47 A conditional rule is virtually an if statement, and it is analogous to the rule of a conventional expert system. It has the general form: if w is Z, then x is Y the phrases w is Z and x is Y are propositions, where w and x are model variables, and Z and Y are fuzzy sets (linguistic variables). The proposition following the if term is the antecedent or predicate, while the one following the then term is the consequent. The rule is interpreted in the following way: x is a member of Y to the degree that w is a member of Z That means the truth membership of the consequent is correlated with the truth of the antecedent. An example of conditional rule will be: if competition price is HIGH, then price is MODERATEHIGH This rule states that the price of the product is (a member of) moderatehigh if the price of the competitor is (a member of) high. The antecedent can be comprised of more than one proposition, and the propositions are connected by fuzzy operators such as AND or OR. The general form of multiple antecedent propositions is: if (w is Z) • (y is W) • ... • (u is S) then x is Y where • could be the AND or OR operator. The truth membership of the consequent of the rule is correlated with the composite truth of the antecedent propositions. An example of the rule with multiple antecedent propositions is: if cost is MODERATE and sales is HIGH, then price is MODERATE 48 Therefore, the price of the product is (a member of) moderate to the degree that cost of the product is (a member of) moderate AND the sales is (a member of) high. An unconditional fuzzy rule is one that is not qualified by an if statement. It has the general form: x is Y where X is a model variable and Y is a fuzzy set. An unconditional rule is interpreted as: X is the minimum subset of Y where the output fuzzy set X is equal to min(X, Y). An example of the unconditional rule is: our price must be high. So the price of the product must be (a member of) high. Before writing rules for a fuzzy system, the way fuzzy sets are defined must be explained. 49 2.3.2.1.1 Defining Fuzzy Sets To define a fuzzy set, the range of the set domain must first be selected. For instance, $75 and $100 can be selected as the lower and upper limit respectively of the domain of the fuzzy set high for the model variable price. Next, the type of the surface of the fuzzy set must be chosen. The two most common types of surfaces used in fuzzy sets are straight-lined and S-curved. The choice of the surface is not too important, given that a fuzzy set is an approximation of the problem space. In fact, the following figure shows that the difference between a triangular (formed by two mirrored straight-lined graphs) and bell-curved (formed by two mirrored S-curved graphs) representations is apparent. Hence, in practice, fuzzy models are rarely sensitive to this kind of elasticity in the fuzzy set descriptions. 1 Degree of Membership 0 Domain 50 In order to write fuzzy rules, a relationship must be established between a collection of fuzzy sets and the model variable. A model variable must sometimes be split into one or more fuzzy sets in order to reflect its semantic properties. Each fuzzy set describes a certain range of the variable's domain, and a name is assigned to denote the linguistic meaning of the range. The number of fuzzy sets for a model variable is always an odd number between 3 and 11, with 5 or 7 most commonly used. As an example, the model variable price can be sliced up into five sections: low ($0 - $25), moderatelow ($0 - $50), moderate ($25 - $75), moderatehigh ($50 - $75) and high ($75 - $100). Figure 8. Fuzzy Sets of the Model Variable Price Price Low ModerateLow Moderate ModerateHigh High 0 25 50 75 100 One may notice that the fuzzy sets overlap to a certain extent. This overlap is required in order to convert a series of individual fuzzy regions into one continuous and smooth surface. For example, the idea of warm and hot can be represented as follows: 5i Degree of Membership -Warm -Hot 40 50 60 70 80 90 100 110 120 130 140 Degrees Centigrade However, this representation is at odds with the normal perception of the relationship between warm and hot, because in the figure, there is no perception of warmth and hotness at 90°C. The next figure illustrates a more logical representation of the two concepts. Degree of Membership -Warm -Hot 40 50 60 70 80 90 100 110 120 130 140 Degrees Centigrade As the temperature increases, the perception of warm decreases, but the perception of hot increases. There is no precise algorithm for determining the degree of overlap, but the overlap should reflect the semantics of the model variable. Experience shows that the degree of overlap averages between 25% and 50%. 52 2.3.2.2 Composing Fuzzy Rules There are two principal methods for the composition of fuzzy rules: the min-max method and the fuzzy additive method. The first step of both methods is identical. This step involves correlating the truth of the fuzzy rule's consequent fuzzy set with the composite truth of the rule's antecedent proposition. The only difference between these methods is the second step when the representation of the output fuzzy representation is updated. 2.3.2.2.1 Correlation Methods (The First Step) For both the min-max and fuzzy additive methods, the consequent fuzzy region of a rule is restricted to the minimum of its antecedent truth. The following equation summarizes this step: pcfs[xj <- min(ppt,pcfs[x;]) This equation illustrates that the consequent fuzzy set (cfs) is restricted to the minimum of either the truth value of itself or that of the antecedent proposition (pt). There are two main methods of restricting the height of the consequent fuzzy set: correlation minimum and correlation product. 53 2.3.2.2.1.1 Correlation Minimum This method truncates the consequent fuzzy region at the truth of the antecedent. The following fuzzy rule is used to demonstrate the mechanism of the correlation minimum method: if cost is LOW and sales is HIGH, then price is MODERATELOW The price of the product is dependent on its cost and sales. The figure on the next page shows the result of applying the rule using the correlation minimum method. The cost C has a degree of membership, 0.78, in the fuzzy set low and the sales S has a degree of membership, 0.48, in the fuzzy set high. The height of the consequent fuzzy set moderatelow of the model variable price is then equal to the minimum of these two values. Hence, the fuzzy set moderatelow is truncated at the degree of membership, 0.48. 54 Figure 9. The Correlation Minimum Method Cost L o w 1 .Pr ice Moderate Low 0.48 40 50 60 70 80 90 100 110 Sales H ' 9 n 0.48 10000 55 2.3.2.2.1.2 Correlation Product This method scales the consequent fuzzy region of a rule according to the composite truth of its antecedent propositions. This action has the effect of shrinking the consequent fuzzy region. The fuzzy rule from section 2.3.2.2.1.1: if cost is LOW and sales is HIGH, then price is MODERATELOW is used again to demonstrate the mechanism of the correlation product method. Since the cost C has a degree of membership, 0.78, in the fuzzy set low and the sales S has a degree of membership, 0.48, in the fuzzy set high, therefore, the fuzzy set moderatelow of the variable price is scaled at the degree of membership 0.48. 56 Figure 10. The Correlation Product Method Sales H ' 9 h 0.48 i I 10000 s 57 2.3.2.2.1.3 Comparisons of the Correlation Minimum and Correlation Product Methods The correlation minimum method always creates a plateau since the top of the fuzzy region is sliced by the antecedent truth value. This truncation causes a certain amount of information loss as the shape of the consequent fuzzy set is altered. The correlation product method, on the other hand, maintains the shape of the fuzzy set and thus reduces the amount of information loss. However, Cox (1994) says that the correlation minimum method is often preferred over the correlation product method because the former method involves less complex and faster arithmetic. Moreover, the aggregated output fuzzy region determined by the correlation minimum method is easier to be defuzzified using conventional defuzzification techniques (composite moments and composite maximum which will be discussed in section 2.3.2.3) than when using the correlation product method, because the shape of the output region determined by the former method is less complicated and irregular. Hence, the correlation minimum method is useful in situations requiring quicker decisions, but the decisions are not fully dependent on precise information, while the correlation product method is useful in situations which can allow slower decisions, but the decisions require more precise information. 58 2.3.2.2.2 Updating the Resultant Output Fuzzy Region (The Second Step) As mentioned in section 2.3.2.2, the min-max method and the fuzzy additive method have different mechanisms for updating the resultant output fuzzy region. 2.3.2.2.2.1 The Min-Max Method As for the min-max method, the resultant output fuzzy region is updated by taking the maximum of the correlated consequent fuzzy sets. The second step of the min-max method is summarized in the following equation: f V s M <~ max(Liofs, u^xj) This equation indicates that the output fuzzy set (ofs) is updated by taking the maximum of either the truth value of itself or that of the correlated consequent fuzzy set (cfs). To demonstrate the mechanism of the min-max method, two cases are used. Case 1: 1. if sales is HIGH, then price is MODERATELOW 2. if cost is LOW, then price is MODERATELOW Case 2: 1. if cost is MODERATE and sales is HIGH, then price is MODERATE 2. if cost is LOW and sales is HIGH, then price is MODERATELOW 59 In case 1, the antecedent propositions of the two fuzzy rules change the shape of the same consequent fuzzy set (moderatelow), while in case 2 , the antecedent propositions of the two fuzzy rules affect the shape of two different, but adjacent consequent fuzzy sets (moderate and moderatelow). The correlation minimum method is used in both cases to show the effect of applying the min-max method. The effect of applying the min-max method on case 1 is illustrated in the following figure: 60 Figure 11. Composing Fuzzy Propositions Using the Min-Max Method (Case Cost L o w Sales H ' 9 n 10000 s 0.23 1 .Pr ice Moderate Low 0.48 0.23 " T " " \ T7T:\ 40 50 60 70 80 90 100 110 61 In case 1, the first fuzzy rule reduces the height of the consequent fuzzy set moderatelow of the variable price (the region enclosed by a solid pink line) to 0.23, as the sales S has a degree of membership, 0.23, in the fuzzy set high. The second rule truncates the height of the same consequent fuzzy set (the region enclosed by a solid green line) at 0.48, as the cost C has a degree of membership, 0.48, in the fuzzy set low. By taking the maximum of the truncated heights of the two consequent fuzzy sets, the height of the resultant output fuzzy region (the region enclosed by a broken black line) is truncated at 0.48. . Price Moderate 1 n Low 0.48 - r 40 50 60 70 80 90 100 110 Note that the resultant output fuzzy region is merely determined by the first fuzzy rule which has a relatively higher degree of membership in the consequent fuzzy set. Therefore, with the min-max method, the fuzzy rule producing a higher degree of membership in a consequent fuzzy set will dominate over other rules producing lower degrees of membership in the same fuzzy set. 62 The illustration of the effect of the min-max method on case 2 is as follows: Figure 12. Composing Fuzzy Propositions I sing the Min-Max Method (Case 2) Sales H i 9 h 0.48 10000 s 63 Each antecedent proposition updates the fuzzy set of the consequent proposition. The first rule results in the consequent fuzzy set, moderate (the lower plateau region enclosed by a solid pink line), while the second one results in the consequent fuzzy set, moderatelow (the higher plateau region enclosed by a solid green line). The resultant output fuzzy region, the broken black lined region, is formed by taking the maximum of these two truncated fuzzy sets. Price Moderate Moderate o Low 0.48 „ _ _ 0.23 ' \ t 1 , , , r-40 50 60 70 80 90 100 110 64 2.3.2.2.2.2 The Fuzzy Additive Method Instead of taking the maximum of the truth membership functions of the consequent fuzzy sets, the output fuzzy set under the fuzzy additive method is formed by adding the truth values of the consequent fuzzy sets. The addition is bounded by [1, 0] so that the result of the addition does not exceed the maximum truth value of a fuzzy set. The mechanism of updating the output fuzzy region can be summarized in the following equation: M-ofsW <~ min(l, p o f s + pcfs[x;]) This equation indicates that the output fuzzy set (ofs) is updated by taking the minimum of either one or the sum of the values of itself and the correlated consequent fuzzy set (cfs). To demonstrate the mechanism of the fuzzy additive method, the two cases in section 2.3.2.2.2 are used again. Case 1: 1. if sales is HIGH, then price is MODERATELOW 2. if cost is LOW, then price is MODERATELOW Case 2: 1. if cost is MODERATE and sales is HIGH, then price is MODERATE 2. if cost is LOW and sales is HIGH, then price is MODERATELOW The correlation minimum method is again used to compare the effect of applying the fuzzy additive method to the two cases. 65 The effect of applying the fuzzy additive method to case 1 is illustrated in the following figure: Figure 13. Composing Fuzzy Propositions Using the Fuzzy Additive Method (Case 1) Cost L o w 1 n 1 Pr ice Moderate ' 0.48 I !0.48 0.23 Low 1-0.23=0.71 m « T77?\ -I 1 "I 1 1 1 40 50 60 70 80 90 100 110 Sales H '9 n 0.23 10000 66 The first rule reduces the height of the consequent fuzzy set moderatelow of the variable price (the region enclosed by a solid pink line) to 0.23, as the sales S has a degree of membership, 0.23, in the fuzzy set high. The output fuzzy region is then formed by adding the consequent fuzzy set with the truth membership, 0.48, from the second rule (the region enclosed by a solid green line) onto the consequent fuzzy set from the first rule, as the cost C has a degree of membership, 0.48, in the fuzzy set low. Hence, the resultant output fuzzy region (the region enclosed by a broken black line) has a height of 0.71. 1 Pr ice Moderate Low 0.48 + 0.23 = 0.71 * \ I \ l \ — * , , — — 1 r-40 50 60 70 80 90 100 110 67 The illustration of the effect of the fuzzy additive method on case 2 is as follows: Figure 14. Composing Fuzzy Propositions Using the Fuzzy Additive Method (Case 2) Cost Low Moderate 0.78 1 i Sales High 0.48 Pr ice Moderate Moderate Low 0.48 + 0.23 = 0.71 -°-4V- 0 0.23 ! • • • / ' • 4- £X3 40 50 60 70 80 90 100 110 10000 68 The first rule results in the consequent fuzzy set, moderate (the lower plateau region enclosed by a solid pink line), while the second one results in the consequent fuzzy set, moderatelow (the higher plateau region enclosed by a solid green line). The resultant output fuzzy region (the irregular region enclosed by a broken black line) is formed by taking the bounded sum of these two truncated fuzzy sets. 1 Pr ice Moderate Moderate Low 0.48 + 0.23 = 0.71 r * 0.48 » r * 0.23 ' — a 1 1 1 1 1 1 40 50 60 70 80 90 100 110 69 2.3.2.2.2.3 Comparisons of the Min-Max Method and the Fuzzy Additive Method With the min-max method, only those fuzzy rules that generate the highest truth values in the consequent fuzzy sets will contribute to the resultant output fuzzy region. In other words, certain rules in the fuzzy system will dominate over other rules when the min-max method is used. With the fuzzy additive method, all fuzzy rules contribute to the resultant output fuzzy region. In other words, the fuzzy additive method accumulates the effects of all the rules in a fuzzy system to reach the final result. Therefore, the min-max method is useful in situations when decisions are only dependent upon control factors that have the greatest impacts, while the fuzzy additive method is useful in situations when decisions are based on the cumulative effect of all the control factors. 70 2.3.2.3 Decomposing (Defuzzifying) Fuzzy Rules The objective of the decomposition (defuzzification) of fuzzy rules is to represent the information contained in the fuzzy set with a scalar (an expected) value. There are many defuzzification techniques. Here, two main techniques, composite moments (centroid) and composite maximum (maximum height), are introduced. 2.3.2.3.1 Composite Moments (Centroid) The composite moments technique is the most commonly used method in defuzzification. It computes the "balance" point of the output fuzzy region by finding the weighted mean of the fuzzy region. The formula for calculating the weighted mean of the fuzzy region is as follows: 1=0 where d is the ith domain value, and (j.(d) is the truth membership value for that domain point. 71 The following figure shows how the composite moments method finds the expected value to represent an output fuzzy region's center of gravity. 1 Pr ice Moderate Moderate Low 40 50 60 70 80 90 100 110 2.3.2.3.2 Composite Maximum (Maximum Height) The composite maximum technique finds a domain point with the maximum truth in the output fuzzy set. If this point is ambiguous, for instance, if it lies along a plateau, then the technique will find the center of the plateau. In the following figure, the composite maximum technique takes the center point of the consequent fuzzy set with the highest plateau, moderatelow, as the expected value for defuzzification. Price Moderate Moderate • -i Low r i \ \ 4-40 50 60 70 80 90 100 110 72 2.3.2.3.3 Comparisons of the Composite Moments Method and the Composite Maximum Method The composite moments technique employs the weighted mean to compute the expected value, while the composite maximum technique finds the highest point in the resultant output fuzzy region as the expected value. Therefore, the composite maximum technique is easier and simpler than the composite moments one as it involves less arithmetic calculations. However, the composite maximum technique is not as common as the composite moments one as the expected value calculated using the former technique is sensitive to a single fuzzy rule that dominates the resultant output fuzzy region. 73 3. Contribution to Knowledge This research builds on the work of Wagner (1989). Wagner suggests four characteristics, name, meaning, construct and context, for identifying view elements. He also provides a detailed solution, the eleven elementary solutions, to resolve conflicts among different database views. Moreover, he recommends some heuristic rules that help identify similar view elements. However, in the conflict analysis procedure suggested by Wagner, each element in one view is required to compare with almost every element in another view in the worst situation. The worst situation may involve two database views which have no common element, but an element in one view is related to another element in the other view. For example, no common element is found between view 1 and 2. However, an entity student in view 1 is related to an entity graduate student in view 2. View 1: Student belong Department View 2: Graduate Student work Project 74 View 3: Department Project Moreover, when designers compare an element in one view with a group of candidate elements in another view, they are not given any degree of similarity between the testing element and the candidates. These design deficiencies increase the time designers have to spend on identifying and comparing elements from different database views. Therefore, the first contribution of this research is to suggest a better conflict analysis procedure that will decrease the number of time that the system has to interact with database designers during the whole conflict analysis. The next contribution of this study is to provide designers with a mechanism to help determine the similarity of elements from different views. 75 4. Improved View Integration System The view integration system suggested by Wagner (1989) integrates two database views at a time. This integration results in two identical views bearing all the information from the original views. The improved view integration system will also integrate two database views each time. The result of the improved view integration system, however, will produce a new third view which contains all the information from the two original ones. The purpose of producing a third database view instead of modifying the two original views is that the improved conflict analysis requires a new working space to store the interim results from the analysis. The improved view integration system basically uses all the concepts and assumptions in the old view integration system which have been summarized in section 2. However, a new approach to conflict analysis will be suggested. Moreover, fuzzy logic concepts will be used to help users determine identical and related elements from different views. 76 4.1 New Conflict Analysis Procedure In Wagner's conflict analysis approach (1989), the elements, which are in the vicinity of the element that has just been compared, will be chosen for the subsequent identity examinations. Hence, after an entity is identified, its relationships and attributes will be compared before the next entity is compared. As a result, the conflict analysis approach is not able to identify all the unique entities (entities that appear in one view only) before relationships and attributes are compared, and this deficiency may cause more interactions with database designers during the identity test. Moreover, since the relatedness test is carried out at the end of the conflict analysis, the designers have to go through again some elements which have already been compared during the identity test in order to determine the possibility of relocation. If elements of different constructs are compared in a specified order, the number of time that designers have to be consulted could be greatly reduced. The suggested conflict analysis procedure is shown in the following figure: Entities ^ Relatedness ^ Relationships Attributes Identification Identification Identification • Identification 77 4.1.1 Entities Identification The first stage is to resolve conflicts for all entities in both database views. At this stage, two main phases are executed: 1. Creating a testing list consisting merely of entities from each of the two original database views (Phase Ei). 2. Comparing each entity from one testing list with all entities in the other list (Phase Eii). 4.1.1.1 Creating a Testing List Consisting Merely of Entities from Each of the Two Original Database Views (Phase Ei). At this stage, database elements which are of relationship or attribute types will not be compared. The comparisons of entities with relationships and attributes are deferred until the stages of relationships and attributes identification respectively. Therefore, a testing list comprising of entities only is created from each original view. If a real world object is represented as an entity in one view and as a relationship in the other view, the entity representation will be copied to the resultant third view. This solution is based on conflict solution SI (a relationship becomes an entity). If an entity in one view has the same meaning as an attribute in the other view, the element which is of the entity type will be added to the resultant third view. This solution is based on conflict solutions S2 (a relationship attribute becomes an entity) and S3 (an entity attribute becomes an E-R construct). 78 Since in the above two situations, the real world object is ultimately represented as an entity in the resultant third view, the relationships and attributes identification stages can be carried out after the entities identification stage. During the relationships or attributes identification stage, if a relationship or an attribute corresponds to one of the entities already recognized in the entity identification stage, then the relationship or attribute can be ignored. 4.1.1.2 Comparing Each Entity from One Testing List with All Entities in the Other List (Phase Eii). Two situations occur in this phase: 1. An entity in one view corresponds to an entity in the other view (Situation El). 2. An entity in one view corresponds to none of the entities in the other view (Situation E2). 4.1.1.2.1 An Entity in One View Corresponds to an Entity in the Other View (Situation E1). If an entity in one view is found to have the same meaning as an entity in the other view, the next step is to check whether they have the same name. If they have different names, conflict solution S10 (renaming of homonyms and synonyms) will be applied to solve the naming problem. Finally, the entity can be added to the resultant third view. 79 4.1.1.2.2 An Entity in One View Corresponds to None of the Entities in the Other View (Situation E2). If an entity in one view is unique, that is, if it is found only in one view, it will be simply added to the resultant third view. Before the entity is added, the existence of a homonym in the resultant third view is determined. If a homonym related to this entity is found, the conflict solution S10 (renaming of homonyms and synonyms) is executed to solve the naming conflict. The unique entity in a view is also recorded in a list created for that view. The unique entities' lists of the two views will be used as references in the later identification stages. After it is identified, the tested entity, together with its corresponding entity, if any, in the other view, can be removed from the testing lists. Consequently, when another entity is tested, less elements will be required to be compared. The testing list that has fewer entities will be regarded as the "host" list. That means the entities in this testing list will be compared first. Then, a database designer may have to go through fewer identification steps. For example, since view 1 consists of three entities and view 2 is comprised of two entities. Entities in view 2 will be compared first. View 1: Engineer View 2: Engineer Person 80 Step Comparing entities in view 1 first Comparing entities in view 2 first 1 Testing subject: Person Testing subject: Person Candidates: Person, Engineer Candidates: Person, Employee, Engineer 2 Testing subject: Employee Testing subject: Engineer Candidates: Engineer Candidates: Employee, Engineer 3 Testing subject: Engineer Candidates: Engineer The heuristics rules H4 (elements with identical meanings have identical names) and H6 (two elements with identical or related meanings have some attributes with identical names) can be used at this stage to help find identical entities in different database views. Furthermore, entities that have the same meaning usually have identical keys. Moreover, entities with names that are synonyms may indicate that the entities refer to the same real world object, because different users may use different terminology to refer to the same real world object. For example, a user chooses the term Person to refer to humans while another user employs the term People. In this regard, a thesaurus is useful to help identify synonyms. An entity in one view can correspond to more than one entity in the other view and this situation can be found in aggregation and association abstractions. An example of the aggregation abstraction is: an entity Work in view 1 is an aggregate of entities Employee and Project and a relationship work in view 2. 81 View 1: View 2 <T^ ployee^ Ko> <CTroj ectNo^> ic^mployeeNo^ Employee Person require Machinery -^Project No^> 82 As for an association abstraction, a committee is a set entity, and it includes engineers and sa les representat ives as its members. View 1: Sales Engineers WW Committee Representatives View 2 Engineers committee Sales Representatives! work Project 83 However, the handling of the aggregation and association abstractions is beyond the scope of this research. Hence, the improved view integration system is not designed to solve these abstraction problems. If database designers encounter such abstraction problems, they have to use the view creation system to correct the resultant view after view integration. 84 4.1.2 Relatedness Identification The second stage is to determine subset relationships between entities in the resultant third database view. Two phases are executed at this stage: 1. Copying existing is-a relationships from the two original views to the resultant view (Phase Li). 2. Identifying new is-a relationships between unique entities in the resultant view (Phase Lii). 4.1.2.1 Copying Existing Is-a Relationships from the Two Original Views to the Resultant View (Phase Li). Existing is-a relationships in the two original views should be copied to the third view, except for is-a relationships that have already been represented by a series of is-a relationships because of transitivity. For example, in view 1, Engineer Is-A Employee ls-A Person, while in view 2, Engineer Is-A Employee. The is-a relationship in view 2 has been represented by the two is-a relationships in view 1. Hence, it is redundant and is not copied to view 3. View 1: Engineer View 2: Engineer Person 85 View 3: Engineer Semantic conflicts may happen at this phase: an entity E1 may be a superset of an entity E2 in one view, while the entity E1 appears as a subset of the entity E2 in the other view. For example, in view 1, it states that all employees is-a shareholders of a company (an entity Employee Is-A subset of an entity Shareholder), while in view 2, all shareholders is-a employees (an entity Shareholder Is-A subset of an entity Employee). View 1: Employee Is-A Shareholder View 2: Shareholder This kind of conflicts is resolved by interactively consulting with the designers. Indeed, whenever an is-a relationships chain forms a loop in the resultant third view, a semantic conflict occurs. For example, view 1 illustrates the relationship: Engineer Is-A Employee Is-A Person, while view 2 shows that Person Is-A Engineer. Hence, a loop is formed in the resultant third view. The problem is solved by deleting one of the is-a relationships in order to break the loop. 86 View 1: View 2: View 3: Is-A 4.1.2.2 Identifying New Is-a Relationships between Unique Entities in the Resultant View (Phase LU). During this phase, only entities that appear in one view (are unique) are determined, because if an entity appears in both database views, its relatedness with other entities should have been shown in either one of the two database views. For example, if view 1 contains entities Employee and Person, and view 2 contains an entity Engineer and the same entity Employee, then only the entity Person in view 1 and the entity Engineer in view 2 are tested for relatedness. The entity Employee will not be tested as it appears in the two database views. The unique entities' information can be retrieved from the unique entities' lists generated at the entities identification stage. 87 The examination for relatedness involves the following tests: 1. test for containment (Test La); 2. test for superset-subset (Test Lb); and 3. test for a W-relationship (Test Lc). 4.1.2.2.1 The Test for Containment (Test La) As for the containment test, if an entity exists in both views and it is a subset or superset of two different unique entities in the original views, then it is likely that the two unique entities in the original views are related. One of the original is-a relationships will become redundant because of transitivity and will have to be removed from the resultant third view. Three situations are found in the containment test: 1. The common entity is a subset of the two different unique entities in the original views (Situation La.l). 2. The common entity is a superset of the two different unique entities in the original views (Situation La.2). 3. The common entity is a subset of one unique entity and a superset of another unique entity in the original views (Situation La.3). 88 4.1.2.2.1.1 The Common Entity is a Subset of the Two Different Unique Entities in the Original Views (Situation La. 1). An entity Engineer exists in both views 1 and 2, and it is a subset of entities Person and Employee in views 1 and 2 respectively. Thus, the entities Person and Employee may be related. In this case, Employee Is-A Person and hence, an is-a relationship is added to view 3, and the existing is-a relationship between Engineer and Person has to be deleted. View 1: View 2: Engineer View 3: Engineer 89 4.1.2.2.1.2 The Common Entity is a Superset of the Two Different Unique Entities in the Original Views (Situation La. 2). An entity Person is the common entity in the two views and it is a superset of unique entities Engineer and Employee. Then, an is-a relationship between the entities Engineer and Employee is added to the third view. The transitive is-a relationship between the entities Engineer and Person is deleted. View 1: Engineer View 2: Employee View 3: Engineer 90 4.1.2.2.1.3 The Common Entity is a Subset of One Unique Entity and a Superset of Another Unique Entity in the Original Views (Situation La. 3). If the common entity is a subset of a unique entity in one view, while it is a superset of another unique entity in the other view, then no is-a relationship between these unique entities has to be added to the resultant third view as it is redundant because of transitivity. For example, an entity Employee is a superset of an entity Engineer in view 1 and a subset of an entity Person in view 2. Even if the entities Engineer and Person are related, no is-a relationship is added to view 3 because of transitivity. View 1: Engineer View 2: Employee View 3: 91 4.1.2.2.2 Test for Superset-subset (Test Lb) Two unique entities could have a common superset and subset. If the unique entities have a common subset, they both should belong to a common superset, but it is not necessary for a database designer to record the common superset in the resultant third view as he or she may not be interested in including the superset in the database design. For example, entities Project Employee and Engineer are unique in view 1 and view 2 respectively. They have a common subset Project Engineer and a common superset Employee. Since a designer may not be interested in the information stored in the entity Employee, the entity is not recorded in view 3. View 1: Project Employee View 2: Engineer 92 View 3: Engineer 2 If the unique entities have a common superset, they may or may not have a common subset because the domains of the two entities may or may not overlap. For example, the domains of entities Project T e a m Member and Project T e a m M e m b e r overlap, and they have a common superset Emp loyee and a common subset Project & Product Member . View 1: Product Team Member View 2: Project Team Member View 3: Product Team] Member Project Team Member Is-A Project & Product Team Member Is-A 93 However, the domains of entities Full Time Employee and Part Time Employee do not overlap. Hence, the entities only belong to a common superset Employee. View 1: Full Time Employee View 2: Part Time Employee View 3: Full Time Employee Part Time Employee 94 4.1.2.2.3 Test for a W-relationship (Test Lc) If two unique entities do not have common attributes, but they are associated with the same entity in the original views, then a W-relationship might exist between the two unique entities. For example, entities Person and Company in views 1 and 2 respectively have different attributes as they refer to two totally unrelated real world objects. However, they both are associated with the same entity Shares in views 1 and 2. Hence, they are participating in the same role. This common role relationship is represented in view 3. View 1: Person View 2: Company View 3: Person Company Shares 95 During the test for superset-subset (test Lb) and the test for a W-relationship (test Lc), before each new superset or subset is added to the resultant third view, the existence of a homonym has to be checked to make sure no entity has the same name as the superset or subset. Two entities are most likely related if they have common neighbor entities, common attributes (heuristic rule H6: two elements with identical or related meanings have some attributes with identical names), common keys, and names that are synonyms or have identical word stems (heuristic rule H5: elements with related meanings have names with identical word stems). The relatedness identification test is carried out before the relationships identification test because of conflict solution S5 (relocation of a relationship after creation of new superset or subset classes). By testing relatedness first, the steps for relocating relationships can be avoided. Two relationships can be related in meaning. For example, in the following diagram, the relationship manage is a subset of the relationship work on. However, the E-R model does not have any special notation to represent such relatedness between relationships. Therefore, the relatedness identification stage is limited to entities only. 96 4.1.3 Relationships Identification The third stage is to identify the relationships between entities in the resultant third database view. Two phases are executed in this stage: 1. Identifying relationships in the two original database views (Phase Ri). 2. Copying cardinalities from the two original views to the resultant view (Phase Rii). 4.1.3.1 Identifying Relationships in the Two Original Database Views (Phase Ri). During this phase, each relationship in one view is compared with intellectually selected relationships or entities in other view. A relationship in one view can correspond to an entity in the other view according to conflict solution SI (a relationship becomes an entity). The comparisons of relationships with attributes are deferred until the attributes identification stage. The comparisons can be deferred because based on conflict solution S3 (entities attributes become an E-R construct), an entity attribute will eventually become an E -R construct. Possibly, every pair of entities in the new third database view is linked through a relationship. The following situations could occur at the stage of relationships identification: 1. With a pair of entities, none of the entities is involved in any is-a relationship (Situation Ra). 2. With a pair of entities, either one or both entities is/are involved in is-a relationships (Situation Rb). 97 4.1.3.1.1 With a Pair of Entities, None of the Entities Is Involved in Any Is-a Relationship (Situation Ra). If two entities in the resultant third view are not involved in any is-a relationship, then the possible situations between these entities are as follows: 1. Both entities are unique in different database views (Situation Ra. 1). 2. Both entities are present together in only one original database view (Situation Ra.2). 3. Both entities are present in both original views (Situation Ra.3). 4.1.3.1.1.1 Both Entities Are Unique in Different Database Views (Situation Ra. 1). No relationship should have been found between two unique entities. Therefore, no relationship is copied to the resultant third view. For instance, entities Car and Course exist only in views 1 and 2 respectively. Hence, in view 3, no relationship should link these two entities. View 1: Car View 2: Course View 3: 98 4.1.3.1.1.2 Both Entities Are Present Together in Only One Original Database View (Situation Ra.2). In other words, no, or only one, common entity exists in the other original view. Two situations could occur: 1. Two entities existing only in one database view are not linked by any relationship (Situation Ra.2.1). 2. Two entities existing only in one database view are linked by a relationship (Situation Ra.2.2). 99 4.1.3.1.1.2.1 Two Entities Existing Only in One Database View Are not Linked by Any Relationship (Situation Ra.2.1). If no relationship is found between the entities which exist only in one database view, then no relationship is copied to the resultant third view. For instance, in view 1, entities Car and Course are not linked by any relationship, while in view 2, none or either one of the entities Car or Course is found. Then, in view 3, no relationship should link these two entities. View 1: View 2: View 3: or No or Common Entity 100 4.1.3.1.1.2.2 Two Entities Existing Only in One Database View Are Linked by a Relationship (Situation Ra.2.2). If, in one original view, a relationship exists between the entities which are found only in one database view, the relationship is copied to the third view. For example, entities Department and Emp loyee are connected by a relationship belong in view 1, but none or either one of the entities Department or Emp loyee exists in view 2. Then, the relationship belong in view 1 is copied to the resultant view 3. View 1: Department belong Employee View 2: Department or Employee No or common entity View 3: Department belong Employee 101 4.1.3.1.1.3 Both Entities Are Present in the Two Original Views (Situation Ra.3). In this case, the following situations could be found: 1. No relationship exists between the two entities in both database views (Situation Ra.3.1). 2. A relationship exists between the two entities in both database views (Situation Ra.3.2). 3. A relationship exists between the two entities in only one database view (Situation Ra.3.3). 102 4.1.3.1.1.3.1 No Relationship Exists between the Two Entities in Both Database Views (Situation Ra.3.1). Since no relationship is found between the two entities in the two original views, no relationship should be added to the third view. For example, in both views 1 and 2, entities Department and Employee are not linked by any relationship. So, no relationship should link these two entities in view 3. View 1: Department Employee View 2: Department Employee View 3: Department Employee 103 4.1.3.1.1.3.2 A Relationship Exists between the Two Entities in Both Database Views (Situation Ra.3.2). In this case, designers should be asked whether the relationships in different views have the same meaning. Two situations can be found: 1. The relationships between the two entities in the two database views are identical (Situation Ra.3.2.1). 2. The relationships between the two entities in the two database views are non-identical (Situation Ra.3.2.2). 4.1.3.1.1.3.2.1 The Relationships between the Two Entities in the Two Database Views Are Identical (Situation Ra.3.2.1). If the relationships are identical, only one of the relationships will be added to the third view. For example, entities Department and Employee are linked by a relationship belong in views 1 and 2. Hence, only one relationship is copied to view 3. View 1: Department belong" Employee View 2: Department belong Employee View 3: Department belong Employee 104 4.1.3.1.1.3.2.2 The Relationships between the Two Entities in the Two Database Views Are Non-identical (Situation Ra.3.2.2). If the relationships do not have the same meaning, both relationships will be copied to the third view. For example, entities Department and Employee are linked by a relationship belong and manage in view 1 and view 2 respectively. Since the relationships belong and manage have different meanings, both relationships are copied to view 3. View 1: Department View 2: View 3: belong Department belong manage Employee Employee Employee •> 105 4.1.3.1.1.3.3 A Relationship Exists between the Two Entities in Only One Database View (Situation Ra.3.3). If a relationship exists between the two entities in only one view, then two situations can be found: 1. The relationship in one view may be missing in the other view (Situation Ra.3.3.1). 2. The relationship in one view may correspond to an entity in the other view (Situation Ra.3.3.2). 106 4.1.3.1.1.3.3.1 The Relationship in One View May be Missing in the Other View (Situation Ra.3.3.1). If a relationship is present in one view but not in the other view, then the relationship is added to the third view. For instance, the relationship belong appears in view 1, but not in view 2. Hence, the relationship belong is copied to view 3. View 1: Department belong Employee View 2: Department Employee View 3: Department belong Employee 107 4.1.3.1.1.3.3.2 The Relationship in One View May Correspond to an Entity in the Other View (Situation Ra.3.3.2). In other words, the two entities are linked indirectly through another entity. Since the entity is already identified in the entities identification stage, the relationship will not be added. For instance, in view 1, entities Department and Emp loyee are linked indirectly by an entity Project, while in view 2, the entities are linked by a relationship project. Since the entity Project and the relationship project refer to the same real world object, the situation in view 1 is copied to view 3. View 1: Department Project )articipa Employee View 2: Department Employee View 3: 108 4.1.3.1.2 With a Pair of Entities, Either One or Both Entities Is/Are Involved in Is-a Relationships (Situation Rb). If either one or both entities associated with a relationship is/are involved in is-a relationships, then the following situations could combine with situation Ra (with a pair of entities, none of the entities is involved in any is-a relationship). 1. Only one of the two entities is involved in is-a relationships (Situation Rb. 1). 2. Both entities are involved in is-a relationships (Situation Rb.2). 4.1.3.1.2.1 Only One of the Two Entities Is Involved in Is-a Relationships (Situation Rb.1). Suppose in one view, a relationship A is associated with entities X and Y , and one of the two entities, say Y, is a subset or superset of an entity Z in the resultant third view. In the other view, the entities X and Z are connected by a relationship B. Then, the relationship A in the former view can correspond to the relationship B in the latter view. A database designer will be asked whether the relationship should be associated with the entities Y or Z in the resultant third view. For example, in view 1, a relationship work is associated with entities Project and Engineer. However, in view 2, the relationship work is associated with the entity Project and an entity Emp loyee , and the entity Emp loyee is identified as the superset of the entity Engineer in the resultant view 3. Then, a designer is consulted about whether or not the relationship should connect the superset entity Emp loyee or the subset entity Engineer. Notice that the entity Project is present in both views 1 and 2. 109 View 1: (all possible situations) work or View 2: (all possible situations) work or View 3: (all possible results) or 110 In fact, this comparison of relationships solves another semantic conflict. When a database user perceives that a relationship should be associated with the superset entity while another user thinks that the relationship should be associated with the subset entity. For example, a relationship work is associated with an entity Employee which is a superset of an entity Engineer in view 1, while the relationship work is associated with the subset entity Engineer in view 2. Then, a designer has to decide whether the relationship should be associated with the superset entity or the subset entity. View 1: Engineer View 2: View 3: (all possible results) or 111 If an entity Y is involved in a chain of is-a relationships, then a relationship associated with an entity X and the entity Y could be found among the relationships associated with the subsets or supersets of the entity Y along the is-a relationships chain, provided that the subset or superset also is associated with the same entity X through a relationship. Then, in the resultant third view, the relationship can be relocated to any entity along the is-a relationships chain. For example, in view 1, a relationship work is associated with entities Project and Engineer and the entity Engineer is involved in an is-a relationships chain: Engineer Is-A Employee Is-A Person. In view 2, the relationship work is associated with the entities Person and Project. Then, a designer has to decide which entity along the is-a relationships chain the relationship should be associated with. View 1: Project View 2: Project 112 View 3: (all possible results) or or 113 As mentioned in section 4.1.3.1.1.3.3.2, a relationship in one view can be represented as an entity in the other view. Hence, suppose a relationship A is associated with entities X and Y, and the entity Y is involved in an is-a relationships chain in the resultant third view. If there is an entity Z associated with the entity X and associated with a subset or superset of the entity Y, then the relationship A can correspond to the entity Z, and it is not copied to the resultant third view as it has been represented by the entity Z. For example, in view 1, a relationship work on is associated with entities Project and Engineer, and the entity Engineer is involved in an is-a relationships chain: Engineer Is-A Employee Is-A Person. In view 2, an entity Assignment is associated with the entities Project and Employee. Hence, the relationship work on in view 1 has been represented by the entity Assignment in view 2 and is thus ignored. View 1: Project View 2: Project 114 View 3: (all possible results) or or 115 4.1.3.1.2.2 Both Entities are Involved in Is-a Relationships (Situation Rb.2). Suppose in one view, a relationship A is associated with entities W and X, and in the resultant third view, an entity Y is involved in the is-a relationships chain the entity W is participating in, and an entity Z is involved in the is-a relationships chain the entity X is participating in. Then, in the other view, the relationship A could be found among the following relationships: 1. a relationship associated with entities W and Z; 2. a relationship associated with entities Y and Z; 3. a relationship associated with entities Y and X. 4. an entity indirectly associated with entities W and Z; 5. an entity indirectly associated with entities Y and Z; and 6. an entity indirectly associated with entities Y and X. Again, a designer has to decide which entity from each is-a relationships chain the relationship is finally associated with. For example, in view 1, a relationship work exists between entities Engineer and Division. In view 2, entities Employee and Department are connected by the relationship work. In view 3, the entity Engineer is found to be the subset of the entity Employee while the entity Division is the subset of the entity Department. Then, a designer has to decide where the relationship work is placed. View 1: 116 View 2: Department work Employee View 3: Division Department or Division Is-A Department / or Division Department Employee 117 or In fact, whenever a relationship is found to be associated with an entity which is involved in an is-a relationship, a designer should be asked whether the relationship has to be relocated to another entity along the is-a relationships chain. This requirement is based on conflict solution S5 (relocation of a relationship after creation of new superset or subset classes). For example, a relationship belong is associated with entities Department and Full T ime Emp loyee in view 1, while the entity Full T ime Emp loyee is a subset of an entity Emp loyee in view 2. Then, in view 3, the relationship belong may have to be relocated from the entity Full T ime Employee to the entity Employee . 118 View 1: > Full Time Employee View 2: Full Time Employee View 3: Department belong Employee Is-A Full Time Employee Here is the list of all the situations which may happen at the relationships identification stage: 1. With a pair of entities, none of the entities is involved in any is-a relationship (Situation Ra). a) Both entities are unique in different database views (Situation Ra. 1). b) Both entities are present together in only one original database view (Situation Ra.2). i) Two entities existing only in one database view are not linked by any relationship (Situation Ra.2.1). ii) Two entities existing only in one database view are linked by a relationship (Situation Ra.2.2). c) Both entities are present in both original views (Situation Ra.3). i) No relationship exists between the two entities in both database views (Situation Ra.3.1). 119 ii) A relationship exists between the two entities in both database views (Situation Ra.3.2). a) The relationships between the two entities in the two database views are identical (Situation Ra.3.2.1). b) The relationships between the two entities in the two database views are non-identical (Situation Ra.3.2.2). iii) A relationship exists between the two entities in only one database view (Situation Ra.3.3). a) The relationship in one view may be missing in the other view (Situation Ra.3.3.1). b) The relationship in one view may correspond to an entity in the other view (Situation Ra.3.3.2). With a pair of entities, either one or both entities is/are involved in is-a relationships (Situation Rb). a) Only one of the two entities is involved in is-a relationships (Situation Rb. 1). b) Both entities are involved in is-a relationships (Situation Rb.2). 120 Assume that a relationship A is associated with entities X and Y in one view. The following table summarizes the situations when a database designer is consulted about which candidate in the other view corresponds to the relationship A: Table 3. Summary of the Candidates of a Relationship Situation in one view Candidates in the other view The entities X and Y are present only in one • Relationships associated with subsets or database view (Situation Ra.2.2). supersets of X and Y (Situation Rb). • Entities indirectly associated with subsets or supersets of X and Y (Situation Rb). The entities X and Y are present in both • Relationships associated with the same database views (Situation Ra.3). entities X and Y (Situation Ra.3.2). • Entities connecting the same entities X and Y indirectly (Situation Ra.3.3.2). • Relationships associated with subsets or supersets of X and Y (Situation Rb). • Entities indirectly associated with subsets or supersets of X and Y (Situations Rb). Before a relationship is added to the third view, naming anomalies have to be resolved (conflict solution S10: renaming of homonyms and synonyms). 121 4.1.3.2 Copying Cardinalities from the Two Original Views to the Resultant View (Phase Rb). Once a relationship has been added to the third view, the cardinalities of the relationship should be copied as well. If there is an inconsistency of cardinalities between the two original views, a designer has to be consulted to determine the right ones. A relationship in one view can correspond to an entity in the other view. For example, the relationship project in view 1 corresponds to the entity Project in view 2. Hence, the cardinalities of the E-R construct Department-project in view 1 should match the cardinalities of the E-R construct Department-belong in view 2. Similarly, the cardinalities of the E-R construct Employee-project must be equal to the cardinalities of the E-R construct Employee-part ic ipate. View 1: Department l ,n View 2 l,n Department -^belong^*'* Project Employee l,n / \ l , n •articipai Employee 122 4.1.4 Attributes Identification This stage is the final phase of the conflict analysis. Elements that are of attribute type are resolved at this stage. Two phases are executed in this stage: 1. Identifying attributes in the two original database views (Phase Ai). 2. Solving key conflicts in the two original database views (Phase Aii). 4.1.4.1 Identifying Attributes in the Two Original Database Views (Phase Ai). According to definition (section 1.1), an attribute can be associated with either an entity or a relationship. 4.1.4.1.1 An Attribute that Is Associated with an Entity (Situation Aa). The entity that an attribute is associated with can be involved in the following two situations: 1. The entity is not involved in an is-a relationships chain (Situation Aa. 1). 2. The entity is involved in an is-a relationships chain (Situation Aa.2). 123 4.1.4.1.1.1 The Entity Is not Involved in an Is-a Relationships Chain (Situation Aa.1). If the entity an attribute is associated with is not involved in an is-a relationships chain, then two situations occur for the entity: 1. The entity exists in only one database view (Situation Aa. 1.1). 2. The entity exists in both database views (Situation Aa. 1.2). 4.1.4.1.1.1.1 The Entity Exists in Only One Database View (Situation Aa.1.1). If an entity is unique in one view, then all its attributes can be copied immediately to the third database view. Since the entity is unique in one view, the attributes of the entity do not correspond to any attribute in the other view. Moreover, no new relationship is added to be associated with the unique entity during the relationships identification stage. In other words, the attributes of the unique entity do not correspond to any E-R construct in the other view. Therefore, the attributes of the entity in one view should not be found in the other view. For example, an entity Project is unique in view 1. Therefore, the attribute Date of the entity Project should not be found in view 2. Hence, the attribute is added to the resultant view 3. View 1: <^Date7> View 2: Department manage Employee 124 4.1.4.1.1.1.2 The entity exists in both database views (Situation Aa.1.2). If the entity an attribute is associated with exists in both original database views, then the attribute in one view can correspond to: 1. an attribute associated with the same entity in the other view (Situation Aa. 1.2.1); 2. an E-R construct associated with the same entity in the other view (Situation Aa. 1.2.2); and 3. no attribute in the other view (Situation Aa. 1.2.3). 125 4.1.4.1.1.1.2.1 The Entity Attribute in One View Corresponds to an Attribute Associated with the Same Entity in the Other View (Situation Aa. 1.2.1). If an entity attribute in one view is found among the attributes of the same entity in the other view, it is added to the resultant third view. For example, the attribute Address of an entity Person in view 1 is found among the attributes of the entity Person in view 2. Then, the attribute Address is added to view 3. View 1: Person View 2 Person .<^~Addres§~j~j> <^Addres§~T-> <TJZEmail View 3 : Person <^ f___Address_~J> "<TTj3mair_~~J^  126 4.1.4.1.1.1.2.2 The Entity Attribute in One View Corresponds to an E-R Construct Associated with the Same Entity in the Other View (Situation Aa. 1.2.2). If an entity attribute in one view is not found among the attributes of the same entity in the other view, then it may be found among the E-R constructs associated with the same entity in the other view (conflict solution S3: an entity attribute becomes an E-R construct). Notice that the E-R constructs which will be compared exist only (are unique) in the other view. If the attribute corresponds to an E-R construct, then the attribute is ignored as the E-R construct has already been identified during the entities and relationships identification stages. For example, the attribute Bank Account of an entity Employee in view 1 corresponds to an E-R construct has-Bank Account in view 2. Then, the attribute Bank Account is ignored as the E-R construct has-Bank Account has already been identified in view 3. View 1: Employee <^~~Bank Account View 2: Employee View 3: Employee has has Bank Account Bank Account 127 4.1.4.1.1.1.2.3 The Entity Attribute in One View Corresponds to No Attribute in the Other View (Situation Aa.1.2.3). If an entity attribute in one view does not correspond to any attribute in the other view, it is copied to the resultant third view. For instance, the attribute Address of an entity Person in view 1 is not found among the attributes of the entity Person in view 2. Then, the attribute Address is added to view 3. View 1: Person AddresiT^ View 2: Person View 3 : Person <f~~J Address~~~~j> <!TJl_Emair_J -^-> 128 4.1.4.1.1.2 The Entity Is Involved in an Is-a Relationships Chain (Situation Aa.2). If the entity is involved in an is-a relationships chain, then the following two situations can mix with situation Aa.l (the entity is not involved in an is-a relationships chain). 1. No new is-a relationship has been added to the is-a relationships chain (Situation Aa.2.1). 2. New is-a relationships have been added to the is-a relationships chain (Situation Aa.2.2). 4.1.4.1.1.2.1 No New Is-a Relationship Has Been Added to the Is-a Relationships Chain (Situation Aa.2.1). If an entity is involved in an is-a relationships chain, and no new is-a relationship has been added to this chain during the relatedness identification stage, then all the attributes of this entity can be copied to the resultant third view. For example, in view 1, an entity Full Time Employee has a superset Employee and in view 2, no entity is related to the entities Full Time Employee or Employee. Then the attribute Salary of the entity Full Time Employee is copied to view 3. View 1: Full Time Employee - C Z j S alaryJ~J> View 2: no entity is related to the chain Full Time Employee Is-A Employee 129 View 3: Full Time Employee <Z^Sa]arv_2^ 4.1.4.1.1.2.2 New Is-a Relationships Have Been Added to the Is-a Relationships Chain (Situation Aa.2.2). If the entity is involved in an is-a relationships chain, and new is-a relationships are added to this chain during the relatedness identification stage, then the attributes of the entity can be found among the attributes of the newly added supersets or subsets. Moreover, the attributes may be required to be relocated to the new superset or subset in order to reflect the inheritance property of the new subset relationship. For instance, in view 1, an entity Employee is a superset of an entity Engineer and it has two attributes Address and Email. After the relatedness identification stage, an entity Person which has an attribute Address in view 2 is added to the is-a relationship in view 1. The attribute Address of the entity Employee in view 1 is found to be the same as the attribute Address of the entity Person in view 2, and the attribute is finally attached to the entity Person in view 3. Since now the entity Person becomes a superset of the entity Employee, the attribute Email is relocated from the entity Employee to the entity Person. View 1: c^~~ Address~~^> 130 View 2: View 3: Engineer <^~~Addres£~^ Address" <^~~Emaii An entity attribute can also be found among the E-R constructs associated with the new subsets and supersets. For example, the attribute Contract of an entity Engineer corresponds to an E-R construct has-Project associated with the new superset Employee of the entity Engineer. View 1: <^J~Contracr^ View 2: Employee View 3: Employee has Contract 131 Furthermore, an entity attribute can correspond to more than one entity if the entity it is associated with is involved in is-a relationships. This situation can happen in generalization abstraction when a superset's attribute is used to refer to all the subsets. Since the subsets have been identified during the relatedness identification stage, the attribute of the superset is ignored. For example, an employee can be either part-time or full-time, but not both. So in view 1, the generalization abstraction can be represented using an attribute Type which value set can only be either Part-t ime or Full-t ime. However, this data abstraction can also be represented as it is in view 2, where Emp loyee is a superset of Full-time Employee and Part-t ime Employee. Since the subsets Full-t ime Emp loyee and Part-time Emp loyee have been identified during the relatedness identification stage, the attribute Type is ignored. View 1: Employee < _ T y p e _~^ > View 2: Part-time Employee View 3: Part-time Employee Full-time Employee Full-time Employee 132 The improved system determines the above situation by checking whether an entity possesses more than one subset. If the entity is a superset of more than one entity, then the designer will be asked whether an attribute of the entity is used to denote the generalization abstraction. 133 4.1.4.1.2 An Attribute Is Associated with a Relationship (Situation Ab). If an attribute is associated with a relationship in one view, it can correspond to: 1. an attribute associated with the same relationship in the other view (Situation Ab. 1). 2. an entity in a ternary relationship in the other view (Situation Ab.2). 3. an attribute of another relationship connecting the same entities in the other view (Situation Ab.3). 4. no attribute in the other view (Situation Aa.4). 4.1.4.1.2.1 The Relationship Attribute in One View Corresponds to an Attribute Associated with the Same Relationship in the Other View (Situation Ab. 1). If a relationship attribute in one view can be found among the attributes of the same relationship in the other view, then it is copied to the resultant third view. For example, the attribute G rade of a relationship take in view 1 can be found among the attributes of the same relationship take in view 2. Then, the attribute is copied to view 3. View 1: G r a d e d 134 View 2: C^Date Grade J > View 3: CTDate Grade _~^ > 4.1.4.1.2.2 The Relationship Attribute in One View Corresponds to an Entity in a Ternary Relationship in the Other View (Situation Ab.2). If the attribute of a binary relationship in one view corresponds to an entity in the other view, then the entity in the other view should be unique and involved in a ternary relationship. The existence of a binary relationship attribute is dependent on the occurrences of the two entities associated with the relationship. Hence, if the relationship attribute corresponds to an entity, the occurrences of the entity are also dependent on the occurrences of the same entities associated with the relationship; this dependency is only found in a ternary relationship. Since the corresponding entity has already been identified during the entities identification stage, the relationship attribute is ignored. 135 For example, in view 1, a relationship attribute Project is dependent on one occurrence of an entity Suppl ier and one occurrence of an entity Part. In view 2, each occurrence of an entity Project is also dependent on one occurrence of the entity Suppl ier and one occurrence of the entity Part. Hence, the relationship attribute Project in view 1 corresponds to the entity Project in view 2. However, since the entity Project has already been identified during the entities identification stage, the relationship attribute Project is ignored. View 1: View 2: View 3: 136 Similarly, an attribute associated with a ternary or higher level relationship may correspond to an entity involved in a quaternary or higher level relationship. However, a quaternary or higher level relationship hardly ever exists. An attribute associating with a unary relationship in one view does not correspond to any entity in the other view, because a unary relationship attribute is dependent on two occurrences of an entity. For example, a unary relationship attribute Project is dependent on two occurrences of an entity Employee. Since no entity's occurrence is dependent on two occurrences of an entity, an attribute of a unary relationship in one view can only correspond to an attribute associated the same relationship in the other view. If one of the entities associated with a relationship is exhibiting aggregation or association abstraction, then a relationship attribute can also correspond to an entity. For example, a relationship attribute Machinery in view 1 can correspond to an entity Machinery in view 2, because the aggregate Work in view 2 consists of two entities Employee and Person. 137 View 1 jCE^ l^oyeeWo" Employee Person -dProj ect NcT i^ C^achinerx]> View 2: <T^ployee^o> <!lProjectNoI> Machinery However, the improved view integration system does not support aggregation and association abstractions. 138 4.1.4.1.2.3 The Relationship Attribute in One View Corresponds to an Attribute of Another Relationship Connecting the Same Entities in the Other View (Situation Ab.3). As mentioned in section 4.1.2, two relationships connecting the same entities can be related in meaning. Then, the attribute of one relationship may be found among the attributes of another relationship. Moreover, the attributes of the relationships may have to be relocated. For example, a relationship manage is found to be a subset of a relationship work on. The attribute Pay Rate of the relationship manage in view 1 is also found attached to the relationship work on in view 2. Moreover, the attribute Hour of the relationship manage has to be relocated to the relationship work on in view 3. View 1: C^Hour Pay Rat£^> View 2: Pay Rate~> 139 View 3: 4.1.4.1.2.4 The Relationship Attribute in One View Corresponds to No Attribute in the Other View (Situation Ab.4) If a relationship attribute in one view does not correspond to any attribute in the other view, it is simply copied to the resultant third view. A relationship attribute does not correspond to any attribute in the other view when the relationship the attribute is associated with exists in only one view or when the attribute is simply not found among the attributes of the same relationship in the other view. For example, a relationship attribute Grade in view 1 does not correspond to any attribute in view 2. Hence, it is copied to view 3. View 1: Grade~~~~J> View 2: Student take Course or no relationship take is found. 140 View 3: Graded The following list summarizes all the situations which happen at the attributes identification stage: 1. An attribute that is associated with an entity (Situation Aa). a) The entity is not involved in an is-a relationships chain (Situation Aa. 1). i) The entity exists in only one database view (Situation Aa. 1.1). ii) The entity exists in both database views (Situation Aa. 1.2). a) The entity attribute in one view corresponds to an attribute associated with the same entity in the other view (Situation Aa. 1.2.1). b) The entity attribute in one view corresponds to an E-R construct associated with the same entity in the other view (Situation Aa.l.2.2). c) The entity attribute in one view corresponds to no attribute in the other view (Situation Aa.l.2.3). b) The entity is involved in an is-a relationships chain (Situation Aa.2). 141 i) No new is-a relationship has been added to the is-a relationships chain (Situation Aa.2.1). ii) New is-a relationships have been added to the is-a relationships chain (Situation Aa.2.2). An attribute is associated with a relationship (Situation Ab) a) The relationship attribute in one view corresponds to an attribute associated with the same relationship in the other view (Situation Ab.l). b) The relationship attribute in one view corresponds to an entity in a ternary relationship in the other view (Situation Ab.2). c) The relationship attribute in one view corresponds to an attribute of another relationship connecting the same entities in the other view (Situation Ab.3). d) The relationship attribute in one view corresponds to no attribute in the other view (Situation Ab.4) 142 The following table summarizes the candidates for an attribute being tested: Table 4. Summary of the Candidates of an Attribute Situation in one view Candidates in the other view An attribute associated with an entity • Attributes of the same entity (Situation (Situation Aa). Aa. 1.2.1). • E-R constructs associated with the entity (Situation Aa.l.2.2). • Attributes associated with the subsets and supersets of the entity (Situation Aa.2.2). • E-R constructs associated with the subsets and supersets of the entity (Situation Aa.2.2) An attribute associated with a relationship • Attributes of the same relationship (Situation Ab). (Situation Ab.l). • An entity involved in a ternary relationship (Situation Ab.2). • Attributes of another relationship linking the same entities (Situation Ab.3). Before an attribute is added to the resultant third view, naming conflicts have to be resolved (conflict solution S10: renaming of homonyms and synonyms). 143 4.1.4.2 Solving key conflicts in the two original database views (Phase AU). Each entity must have one key. However, the improved integration system does not enforce this rule in order to allow the database design process to be more flexible. However, if database users employ different keys for the same entity or relationship in different views, designers have to resolve this inconsistency. For example, attributes Employee ID and SIN may be used as the key of an entity Employee in different database views. Therefore, designers have to choose the right key for the entity. View 1: Employee -<r~Ejnployee I1T~^ > View 2: Employee View 3: Employee -^Employee ITT^1 or Employee 144 4.1.5 Wagner's View Integration System Vs the Improved View Integration System Wagner's view integration system is written in Turbo Prolog. Unfortunately, the Turbo Prolog's compiler has become obsolete and is no longer available in the market. Therefore, Wagner's system cannot be run. Not being able to run the system makes it difficult to compare Wagner's system with the improved one. However, Wagner (1989) has used a case to demonstrate the operation of his system, and this case will be used to compare the two systems. Three cases are used to show the differences of Wagner's view integration system and the improved view integration system: 1. Case 1: partially overlapped database views (Wagner's case). 2. Case 2: non-overlapped database views. 3. Case 3: identical database views. Case 1 is given by Wagner to show the operation of his system. The actual operations of Wagner's system on cases 2 and 3 are unknown, but the executions could be deduced from the conflict solutions (section 2.2.2) and heuristic rules (section 2.2.3) given by him. Wagner's system does not handle cardinalities and key conflicts. Hence, all the cardinalities and keys are ignored in all cases. 145 4.1.5.1 Case 1: Partially Overlapped Database Views. In the following case, an entity Dealer appears in both views 1 and 2, and the attribute Contract of the entity Dealer in view 1 corresponds to an entity Contract in view 2. View 1: "Dealer_rKT> Global Schema: ,dealer_ y-oontjact Dealer_n5> Branch <~J^ T^ ancrTn&> BranchliO> Contract Q7^racrt~n3> Contract (C^ C^ ntractTRy> Customer QCustome^^p Customer I^ CUstorne^ ^p 146 The following table shows when a database designer is consulted by the systems. Table 5. The Comparison of Wagner's System with the Improved System in Integration of Partially Overlapped Views No Wagner's View Integration System Improved View Integration System 1 Subject: Dealer (an entity in view 1). Subject: Dealer (an entity in view 1). Candidates: Dealer, Contract and Candidates: Dealer, Contract and Customer (entities in view 2). Customer (entities in view 2). Test: Identity. Stage: Entities identification. Execution: Execution: 1. Create a list of elements for each view. 1. Phase Ei: Create a testing list 2. Heuristic rule H2: Two elements with consisting merely of entities for each identical or related meanings have the of the two original database views. same construct. 2. Phase Eii: Comparing each entity from Result: one testing list with all entities in the Dealer (view 1) = Dealer (view 2). other list. a) Situation E l : An entity in one view corresponds to an entity in the other view. Result: 1. Dealer (view 1) = Dealer (view 2). 2. Dealer (view 1) is added to view 3. 2 Subject: supply (a relationship in view Subject: Branch (an entity in view 1). 147 1). Candidates: Contract and Cus tomer Candidates: dealer_contract (a (entities in view 2). relationship in view 2). Stage: Entities identification. Test: Identity. Execution: Execution: 1. Phase Eii: Comparing each entity from 1. Heuristic rule HI: Two elements with one testing list with all entities in the identical or related meanings will have other list. common contexts. a) Situation E2: An entity in one 2. Heuristic rule H2: Two elements with view corresponds to none of the identical or related meanings have the entities in the other view. same construct. Result: Result: 1. None of the candidate is selected. None of the candidates is selected. 2. The entities B ranch (view 1), Contract (view 2) and Cus tomer (view 2) are added to view 3 and the corresponding unique entities' lists. 3 Subject: supply (a relationship in view Subject: B ranch (an entity in view 1). 1). Candidates: Contract and Cus tomer Candidates: dealer_contract (a (entities in view 2). relationship in view 2). Stage: Relatedness and relationships Test: Relatedness. identification. 148 Execution: Execution: Heuristic rule H2: Two elements with 1. Phase Li: Copying existing is-a identical or related meanings have the relationships from the two original same construct. views to the resultant view. Result: 2. Phase Lii: Identifying new is-a None of the candidates is selected. relationships between unique entities in the resultant view. 3. Phase Ri: Identifying relationships in the two original database views. a) Situation Ra. 1: Both entities are unique in different database views. b) Situation Ra.2.1: Two entities existing only in one database view are not linked by any relationship. c) Situation Ra.2.2: Two entities existing only in one database view are linked by a relationship. 4. Phase Rii: Copying cardinalities from the two original views to the resultant view. Result: 149 1. None of the candidates is selected. 2. The relationships supply (view 1), dealer_contract (view 2) and customer_contract (view 2) are copied to view 3. 4 Subject: Dealer_no (an attribute in view Subject: Contract (an attribute in view 2). 1). Candidates: Branch-supply (an E-R Candidates: Dealer_no (an attribute in construct in view 1). view 2 and Contract -Test: Identity. dealer_contract (an E -R construct in Execution: view 2). 1. Heuristic rule HI: Two elements with Stage: Attributes identification. identical or related meanings will have Execution: common contexts. 1. Phase Ai: Identifying attributes in the 2. Conflict solution S3: An entity two original database views. attribute becomes an E-R construct. a) Situation Aa. 1.2.1: The entity Result: attribute in one view corresponds None of the candidates is selected. to an attribute associating with the same entity in the other view. b) Situation Aa. 1.2.2: The entity attribute in one view corresponds 150 to an E-R construct associating with the same entity in the other view. 2. Phase Aii: Solving key conflicts in the two original database views. Result: 1. Contract (view 1) = Contract-dealer_contract (view 2). 5 Subject: Branch (an entity in view 1). Subject: Dealer_no (an attribute in view Candidates: Contract and Customer 2). (entities in view 2). Candidates: Branch-supply (an E-R Test: Identity. construct in view 1). Execution: Stage: Attributes identification. Heuristic rule H2: Two elements with Execution: identical or related meanings have the 1. Phase Ai: Identifying attributes in the same construct. two original database views. Result: a) Situation Aa. 1.2.2: The entity None of the candidates is selected. attribute in one view corresponds to an E-R construct associating with the same entity in the other view. 151 Subject: Contract (an attribute in view 1). Candidates: Contract -dealer_contract (an E-R construct in view 2). Test: Identity. Execution: Conflict solution S3: An entity attribute becomes an E-R construct. Result: 1. Contract (view 1) = Contract -dealer_contract (view 2). b) Situation Aa. 1.1: The entity exists in only one database view. 2. Phase Aii: Solving key conflicts in the two original database views. Result: 1. None of the candidates is selected. 2. The attributes B ranch_no (view 1), Dealer_no (view 2), C o n t r a c t n o (view 2) and Cus tomer_no (view 2) are copied to view 3. 152 2. The elements supply, Branch and Branch_no are copied from view 1 to view 2. 3. The element Contract in view 1 is changed to Contract -dea le r con t rac t . 4. The elements customer_contract, Customer , Customer_no and Contract_no are copied from view 2 to view 1. 153 The graphical interim results of the two systems are as follows: Wagner's View Integration System: The database views in Wagner's system are not modified until the final interaction. So, at the sixth interaction, the two views become identical. Interaction no. 6: View 1: Dealer supply "Dealer_ncr> Branch ^Tf^ranchTRr) Contract I^Cbntracf^> Customer I^Customef^p View 2: .dealer.^-csntpact Dealer 5 5 > Branch <^ ~Branch^ nO> Contract r^ Cbntract^ &> Customer ICT^omeTTtp 154 The Improved View Integration System: Interaction no. 1: View 3: Dealer Interaction no. 2: View 3: Dealer Contract Customer Branch Interaction no. 3: View 3: csntjaci I Contract Customer 155 Interaction no. 4: View 3: Dealer Contract Customer Interaction no. 5: View 3: Dealer Contract "Dealer TuT> Customer r^ ^ontract H9> I^ CustomefT^p supply Branch H^BrancTTnO> The comparison shows that the improved system can solve this case with fewer interactions with a database designer than Wagner's system. The improved system requires fewer interactions with a database designer because it has automated more checking processes than Wagner's system. For example, all the identifications of the relationships in this case are done without any interaction with a designer. These automated checking processes require that all the unique entities be identified ahead of time. Therefore, these automated checking processes cannot be implemented in Wagner's system in which the order of the elements chosen for comparisons is not based on the elements' construct type. 156 The number of elements in one view which an element in the other view has to be compared at each interaction when using Wagner's system is approximately the same as that when using the improved system in this case. For example, the entity B ranch in view 1 has to be compared with two elements (Contract and Customer) in view 2 when using Wagner's approach, while it has to be compared with two elements (again Contract and Customer) when using the improved system's approach. It should be noted that for Wagner's approach, the relatedness test for the relationship supply is carried out at the third interaction. Therefore, the test for relatedness may not, as Wagner claims, be executed after the test for identity. Moreover, Wagner's approach does not check the relatedness between the unique entities Branch and Cus tomer in view 1 and view 2 respectively. Therefore, Wagner's system may not be able to solve the following case: View 1: Cffivision~"nQ> Dealer Contract TJealernc?^ epartmeht— Department l^ onn*act~5&> ^e^artmenLSb 157 Global Schema: C^vision~no> This case is the same as the previous one, except that the entities Branch and Customer are now replaced by the entities Division and Department respectively, and the entity Division is a subset of the entity Department. Since Wagner's system does not perform a relatedness test on the entities Branch and Customer, it may not be able to identify the is-a relationship between the entities Division and Department. 158 4.1.5.2 Case 2: Non-overlapped Database Views In the following case, no element is identical in views 1 and 2. However, an entity Student in view 1 is a superset of an entity Graduate Student in view 2. View 1: Student Ic^ Student 5cT> View 2: Department (D e^partme^ TjjD belong Department (D^artmerrT^jD <^ ~JTroject li5> Graduate Student 159 The following table shows the operation of the two systems in case 2: Table 6. The Comparison of Wagner's System with the Improved System in Integration of Non-overlapped Views No Wagner's View Integration System Improved View Integration System 1 Subject: Student (an entity in view 1). Subject: Student (an entity in view 1). Candidates: Graduate Student and Candidates: Graduate Student and Project (entities in view 2). Project (entities in view 2). Test: Identity. Stage: Entities identification. Execution: Execution: 1. Create a list of elements for each view. 1. Phase Ei: Create a testing list 2. Heuristic rule H2: Two elements with consisting merely of entities for each identical or related meanings have the of the two original database views. same construct. 2. Phase Eii: Comparing each entity from Result: one testing list with all entities in the None of the candidates is selected. other list. a) Situation E2: An entity in one view corresponds to none of the entities in the other view. Result: 1. None of the candidates is selected. 2. Student (view 1) is added to view 3 and the corresponding unique entities' 160 lists. 2 Subject: Student (an entity in view 1). Subject: Department (an entity in view Candidates: Student_no, Project_no 1). (attributes in view 2). Candidates: Gradua te Student and Test: Identity. Project (entities in view 2). Execution: Stage: Entities identification. Heuristic rule H3a: If a real world Execution: object is represented as an entity or a 1. Phase Eii: Comparing each entity from relationship in one view, it may appear one testing list with all entities in the as an entity attribute in the other view. other list. Result: a) Situation E2: An entity in one None of the candidates is selected. view corresponds to none of the entities in the other view. Result: 1. None of the candidates is selected. 2. The entities Department (view 1), Graduate Student (view 2) and Project (view 2) are added to view 3 and the corresponding unique entities' lists. 3 Subject: belong (a relationship in view Subject: Student (an entity in view 1). 161 1). Candidates: Gradua te Student and Candidates: Graduate Student and Project (entities in view 2). Project (entities in view 2). Stage: Relatedness identification. Test: Identity. Execution: Execution: 1. Phase Li: Copying existing is-a Heuristic rule H3c: If a real world relationships from the two original object is represented as a relationship views to the resultant view. in one view, but it is not an entity 2. Phase Lii: Identifying new is-a attribute in the other view, then it may relationships between unique entities be an entity. in the resultant view. Result: a) Test Li: The test for containment. None of the candidates is selected. Result: 1. Graduate Student is selected. 2. An is-a relationship between the entity Student and Gradua te Student is created in view 3. 4 Subject: Department (an entity in view Subject: Department (an entity in view 1). 1). Candidates: Graduate Student and Candidates: Graduate Student and Project (entities in view 2). Project (entities in view 2). Test: Identity. Stage: Relatedness and relationships 162 Execution: identification. Heuristic rule H2: Two elements with Execution: identical or related meanings have the 1. Phase Lii: Identifying new is-a same construct. relationships between unique entities Result: in the resultant view. None of the candidates is selected. 2. Phase Ri: Identifying relationships in the two original database views. a) Situation Ra. 1: Both entities are unique in different database views. b) Situation Ra.2.2: Two entities existing only in one database view are linked by a relationship. c) Ask for relocation of the relationships work and belong. 3. Phase Rii: Copying cardinalities from the two original views to the resultant view. Result: 1. None of the candidates is selected. 2. No relocation is performed. 3. The relationships belong (view 1) and 163 work (view 2) are copied to view 3. 5 Subject: Department (an entity in view Subject: Student_no (an attribute in 1). view 1). Candidates: Student_no, Project_no Candidates: Student_no (an attribute in (attributes in view 2). view 2) and work-Project (an E-R Test: Identity. construct in view 2). Execution: Stage: Attributes identification. Heuristic rule H3a: If a real world Execution: object is represented as an entity or a 1. Phase Ai: Identifying attributes in the relationship in one view, it may appear two original database views. as an entity attribute in the other view. a) Situation Aa.2.2: New is-a Result: relationships have been added to None of the candidates is selected. the is-a relationships chain. b) Situation Aa. 1.1: The entity exists in only one database view. 2. Phase Aii: Solving key conflicts in the two original database views. Result: 1. Student_no (view 1) = Student_no (view 2). 2. Student_no is attached to the entity 164 Student. 3. The attributes Department_no (view 1) and Project_no (view 2) are copied to view 3. 6 Subject: Graduate Student (an entity in view 2). Candidates: S t u d e n t n o , Department_no (attributes in view 1). Test: Identity. Execution: Heuristic rule H3a: If a real world object is represented as an entity or a relationship in one view, it may appear as an entity attribute in the other view. Result: None of the candidates is selected. 7 Subject: Project (an entity in view 2). Candidates: Student_no, Department_no (attributes in view 1). 165 Test: Identity. Execution: Heuristic rule H3a: If a real world object is represented as an entity or a relationship in one view, it may appear as an entity attribute in the other view. Result: None of the candidates is selected. 8 Subject: work (a relationship in view 2). Candidates: Student and Department (entities in view 1). Test: Identity. Execution: Heuristic rule H3c: If a real world object is represented as a relationship in one view, but it is not an entity attribute in the other view, then it may be an entity. Result: None of the candidates is selected. 9 Subject: Student (an entity in view 1). 166 Candidates: Graduate Student and Project (entities in view 2). Test: Relatedness. Execution: Heuristic rule H2: Two elements with identical or related meanings have the same construct. Result: Graduate Student is selected. 10 Subject: Department (an entity in view 1). Candidates: Graduate Student and Project (entities in view 2). Test: Relatedness. Execution: 1. Heuristic rule H2: Two elements with identical or related meanings have the same construct. 2. Conflict solution S5: Relocation of a relationship after creation of new superset or subset classes. 167 3. Ask for relocation of the attribute Student_no. Result: 1. None of the candidates is selected. 2. The elements Student, Student_no, be long, Department and Department_no are copied from view 1 to view 2. 3. The elements Graduate Student, Student_no, work, Project and Project_no are copied from view 2 to view 1. 4. An is-a relationship is established between the entities Student and Graduate Student. 5. The attribute Student_no is relocated from the entity Graduate Student to the entity Student. 168 The interim results of the two systems are shown in graphical representations as follows: Wagner's View Integration System: Interaction no. 10: View 1: Department iD^artmenFTjp belong Student "CSuclentjK^ <^Project~nO> Graduate Student Project View 2: Department (B^ epartmeriFTjp belong Student tCStudent no^> <^Troject~B&>-Graduate Student Project 169 The Improved View Integration System: Interaction no. 1: View 3: Department Interaction no. 2: View 3: Department Student Graduate Student Project Interaction no. 3: View 3: Department Student Graduate Student Project Interaction no. 4: View 3: Department belong Student Graduate Student Project 170 Interaction no. 5: View 3: Department Student Graduate Student l^epartaienF^ <^Troject^^> Project The improved system obviously requires fewer interactions with a database designer than Wagner's system because some checking processes can be done automatically after the information on unique entities is acquired. In fact, this lack of information on unique entities in Wagner's system increases the number of elements in one view which each element in the other view has to be compared. For example, when using Wagner's approach, the entity Student in view 1 has to be compared with all the elements (Graduate Student, Student_no, work, Project and Project_no) in view 2. However, when using the improved system, the entity Student in view 1 only has to be compared with two elements (Graduate Student and Project) in view 2. 171 Notice that Wagner's approach leaves the test for relatedness last, while the improved system performs the relatedness identification stage before the relationships and attributes identification stages. Therefore, when using Wagner's approach, a database designer has to come back to a previously identified relationship or attribute to decide whether it should be relocated, and this checking delay may be disadvantageous because a designer has to analyze the same relationship or attribute twice. On the contrary, when using the improved system, a designer is asked immediately whether the newly identified relationship or attribute has to be relocated. So he or she only has to analyze the relationship or attribute once. Wagner (1989) actually does not mention that the attributes of a new superset or subset may have to be relocated. As a result, how Wagner's system solves the fact that the attribute Student_no of the entity Student in view 1 corresponds to the attribute Student_no of the entity Graduate Student in view 2 is unclear. The two database views may end up in the following situations: View 1: Department Student Graduate Student rfJepartmenTTff) CStudentnQr^ CSfudent no <^TrojecTTKJ> Project 172 View 2: Department belong Student ^[^artmenTTjo Graduate Student CStudent CS d^ent no. <~"Troject TT9> Project The attribute Student_no may be still attached to the entity Graduate Student and be regarded as different from the attribute Student_no of the entity Student. 173 4.1.5.3 Case 3: Identical Database Views. In the following case, the two database views are identical. View 1: Dealer TJealerlKT} Contract c^ustomer1 mtr#ct Customer View 2: Dealer "Dealer_ncT> Contract Customer Global Schema: Dealer T>ealer_ncT> Contract Customer 174 The following table shows the operations of the two systems in case 3: Table 7. The Comparison of Wagner's System with the Improved System in Integration of Identical Views No Wagner's View Integration System Improved View Integration System 1 Subject: Dealer (an entity in view 1). Subject: Dealer (an entity in view 1). Candidates: Dealer, Contract and Candidates: Dealer, Contract and Customer (entities in view 2). Customer (entities in view 2). Test: Identity. Stage: Entities identification. Execution: Execution: 1. Create a list of elements for each view. 1. Phase Ei: Create a testing list 2. Heuristic rule H2: Two elements with consisting merely of entities for each identical or related meanings have the of the two original database views. same construct. 2. Phase Eii: Comparing each entity from Result: one testing list with all entities in the Dealer (view 1) = Dealer (view 2). other list. a) Situation E l : An entity in one view corresponds to an entity in the other view. Result: 1. Dealer (view 1) = Dealer (view 2). 2. Dealer (view 1) is added to view 3. 2 Subject: dealer_contract (a relationship Subject: Contract (an entity in view 1). 175 in view 1). Candidates: Contract and Cus tomer Candidates: dealer_contract (a (entities in view 2). relationship in view 2). Stage: Entities identification. Test: Identity. Execution: Execution: 1. Phase Eii: Comparing each entity from 1. Heuristic rule HI: Two elements with one testing list with all entities in the identical or related meanings will have other list. common contexts. a) Situation E l : An entity in one 2. Heuristic rule H2: Two elements with view corresponds to an entity in identical or related meanings have the the other view. same construct. Result: Result: 1. Contract (view 1) = Contract (view dealer_contract (view 1) = 2). dealer_contract (view 2). 2. Contract (view 1) is added to view 3. 3 Subject: Dealer_no (an attribute in view Subject: Cus tomer (an entity in view 1). 1). Candidates: Cus tomer (entities in view Candidates: Dealer_no (an attribute in 2). view 2). Stage: Entities and relatedness Test: Identity. identification. Execution: Execution: 1. Heuristic rule HI: Two elements with 1. Phase Eii: Comparing each entity from 176 identical or related meanings will have one testing list with all entities in the common contexts. other list. 2. Heuristic rule H2: Two elements with a) Situation E l : An entity in one identical or related meanings have the view corresponds to an entity in same construct. the other view. Result: 2. Phase Li: Copying existing is-a Dealer_no (view 1) = Dealer_no relationships from the two original (view 2). views to the resultant view. 3. Phase Lii: Identifying new is-a relationships between unique entities in the resultant view. Result: 1. Customer (view 1) = Customer (view 2). 2. Customer (view 1) is added to view -5 3. J. No new is-a relationship is added to view 3. 4 Subject: Contract (an entity in view 1). Subject: dealer_contract (a relationship Candidates: Contract (entities in view in view 1). 2). Candidates: dealercontract (a 177 Test: Identity. relationship in view 2). Execution: Stage: Relationships identification. 1. Heuristic rule HI: Two elements with Execution: identical or related meanings will have 1. Phase Ri: Identifying relationships in common contexts. the two original database views. 2. Heuristic rule H2: Two elements with a) Situation Ra.3.2.1: The identical or related meanings have the relationships between the two same construct. entities in the two database views Result: are identical. Contract (view 1) = Contract (view Result: 2). 1. dealer_contract (view 1) = dealer_contract (view 2). 2. dealer_contract (view 1) is copied to view 3. 5 Subject: customer_contract (a Subject: customer_contract (a relationship in view 1). relationship in view 1). Candidates: customer_contract (a Candidates: c u s t o m e r c o n t r a c t (a relationship in view 2). relationship in view 2). Test: Identity. Stage: Relationships identification. Execution: Execution: 1. Heuristic rule HI: Two elements with 1. Phase Ri: Identifying relationships in 178 identical or related meanings will have the two original database views. common contexts. a) Situation Ra.3.2.1: The 2. Heuristic rule H2: Two elements with relationships between the two identical or related meanings have the entities in the two database views same construct. are identical. Result: 2. Phase Rii: Copying cardinalities from customer_contract (view 1) = the two original views to the resultant customer_contract (view 2). view. Result: 1. customer_contract (view 1) = customer_contract (view 2). 2. customer_contract (view 1) is copied to view 3. 6 Subject: Cus tomer (an entity in view 1). Subject: Dealer_no (an attribute in view Candidates: Cus tomer (entities in view 1). 2). Candidates: Dealer_no (an attribute in Test: Identity. view 2). Execution: Stage: Attributes identification. 1. Heuristic rule HI: Two elements with Execution: identical or related meanings will have 1. Phase Ai: Identifying attributes in the common contexts. two original database views. 179 2. Heuristic rule H2: Two elements with a) Situation Aa. 1.2.1: The entity identical or related meanings have the attribute in one view corresponds same construct. to an attribute associating with the Result: same entity in the other view. Customer (view 1) = Customer 2. Phase Aii: Solving key conflicts in the (view 2). two original database views. Result: 1. Dealer_no (view 1) = Dealer_no (view 2). 2. Dealer_no (view 1) is copied to view 3. The graphical interim results of the two systems are as follows: Wagner's View Integration System: Interaction no. 6: View 1: cetntjarit I . d e a l e r ^ csritr^ ct "Dealer H3^> Contract Contract c^ustomer ?ntrdct Customer Customer 180 The Improved View Integration System: Interaction no. 1: View 3: Dealer Interaction no. 2: View 3: Dealer Contract Interaction no. 3: View 3: Dealer Contract Customer Interaction no. 4: View 3: Interaction no. 5: View 3: Dealer Interaction no. 6: View 3: Dealer T)ealerHS^ > . dealer_y-CBntjacr Contract Contract Contract Customer Customer Customer 181 The comparison in this case shows that Wagner's system interacts with a designer as frequently as is the case with the improved system, because the former system asks less unnecessary questions in this case where the two database views are identical. In fact, when comparing elements of type entity, Wagner's system outperforms the improved system by reducing the number of elements that each entity has to be compared with. In Wagner's approach, the first entity Dealer in view 1 has to be compared with the entities Dealer, Contract and Cus tomer in view 2, but other entities Contract and Cus tomer in view 1 only have to be compared with one entity in the view 2 (Contract and Cus tomer in view 2 respectively). On the contrary, in the improved system, each entity in view 1 is required to be compared with all the entities in view 2: the entity Dealer in view 1 has to be compared with the entities Dealer , Contract and Cus tomer in view 2, the entity Contract in view 1 is compared with the entities Contract and Cus tomer in view 2, and the entity Cus tomer in view 1 is compared with entity Cus tomer in view 2. The better performance of Wagner's system is to the credit of heuristic rule HI (two elements with identical or related meanings will have common contexts), which states that identical or related elements are usually found in the vicinity of the identical elements. 182 In conclusion, the improved view integration system in general requires fewer interactions with a designer than Wagner's system because some checking processes can be implemented automatically if all the unique entities have been identified ahead of time. However, when two database views are highly overlapped, a database designer has to compare more candidates for each entity using the improved system. In order to solve this problem, fuzzy logic is used to provide a numerical analysis of the similarities between entities in different views. Using a numerical analysis, a database designer may only have to compare the entities that are highly similar. 183 4.2 Determining Inter-views Entities' Similarities Using Fuzzy Logic Wagner's view integration system performs better than the improved view integration system in situations when the two database views are highly overlapped, because each entity in one view, except the one chosen for the first comparison, is only required to be compared with one entity in the other view. The better performance is mainly achieved by heuristic rule HI (two elements with identical or related meanings have common contexts) which states that identical or related elements are usually found in the vicinity of the identical elements. However, when a database designer examines the first entity in one view, he or she still has to compare it with all the entities in the other view if Wagner's approach is used. In Wagner's view integration system, no clue is given to the database designers about how similar an entity in one view is to entities in the other view. Hence, when database designers compare an entity in one view with many entities in the other view, they may have to spend considerable amount of time analyzing the properties of the entities being compared. The relative degrees of similarity of inter-view elements can be determined using the concept of fuzzy logic. This relative similarities determination is mainly used at the stage of the entities identification and relatedness identification. 184 4.2.1 Building the Fuzzy System for Entities' Similarities Determination As mentioned in section 2.3.2, the development of a fuzzy system includes the following steps: 1. Defining fuzzy rules. 2. Composing fuzzy rules. 3. Decomposing (defuzzifying) fuzzy rules. 4.2.1.1 Defining Fuzzy Rules for Entities' Similarities Determination Before fuzzy rules are defined, the following steps have to be implemented: 1. Defining model variables. 2. Defining ranges of fuzzy set domains. 3. Selecting surface type of fuzzy sets. 185 4.2.1.1.1 Defining Model Variables for Entities' Similarities Determination As mentioned in section 4.1.1, entities are similar if they have the following characteristics: 1. identical names (heuristic rule H4: elements with identical meanings have identical names); 2. common attributes (heuristic rule H6: two elements with identical or related meanings have some attributes with identical names); 3. common neighbor entities; 3. identical keys; 4. synonyms; and 5. identical word stems. Users of a company tend to use the same terminology to describe a real world object. Therefore, the identical names characteristic is very useful in finding identical entities in different database views and can be employed at the entities identification stage. 186 Identical or related entities usually have some attributes that have identical names or are synonyms. Hence, the common attributes characteristic can be employed at both entities and relatedness identification stages to determine identical or related entities. However, the improved view integration system does not include a thesaurus. Therefore, the improved system is not able to determine attributes that are synonyms. Besides, users may use different abbreviated names to describe the same real world object. For example, one user may select a term tel_no to refer to telephone number, while another user may employ a term tel_num. This usage of abbreviated names thus increases the difficulty of checking identical attributes. Therefore, the current improved system can only determine attributes that have identical names. Identical or related entities also have common neighbor entities. However, since most database views are only partially overlapped, this characteristic may decrease the entities' similarities calculated. For example, view 1 and view 2 are partially overlapped. An entity Dealer exists in both views; the entity Dealer in view 1 has the same attributes as the entity Dealer in view 2. However, they have different neighbor entities (the entities Branch and Contract). Hence, the common neighbor entities characteristic actually decreases the degree of similarity between the two entities. View 1: T T H I 5 F -^ Ic^raicrT^ 187 View 2: Dealer Contract Customer T3ealer_n<r> QContractnD> IC^Custome^ p^ Therefore, the fuzzy reasoning of the improved system does not build upon the common neighbor entities characteristic. Identical entities usually possess the same key, but related entities also have the same key. For instance, an entity Student is a superset of an entity Graduate Student, and they both have an attribute Student No as their keys. In a few occasions, however, users may use attributes with identical names as the keys of entities referring to totally different real world objects. For example, a user may employ an attribute N a m e as the keys of entities Student, Instructor, Department, Cou rse and so on. Therefore, this naming abuse decreases the usefulness of the identical keys characteristic. However, the identical keys characteristic is still a useful tool to determine identical or related entities as it does not decrease the degree of entities' similarities like the common neighbor entities characteristic, and it is easy to be checked. The improved view integration system is not designed to handle synonyms. Hence, the entities' similarities determination process is not based on the synonyms characteristic. 188 The identical word stems characteristic is only useful at the relatedness identification stage. Two entities may be related in meaning if their names have identical word stems. For example, entities Full Time Employee and Part Time Employee have the same word stem Employee and they are related in meaning. However, identification of word stems requires some artificial intellectual devices. For instance, a user may put all three words Full Time Employee in a single term FullTimeEmployee, and it is hard to retrieve the word stem from this single term. Two entities with names having the same word stem may not be related. For example, entities Full Time Teacher and Full Time Student have the same word stem Full Time, but yet they are not related in meaning. In addition, entities with names having different word stems can be related in meaning. For instance, entities Employee and Engineer are related and do not have common word stems in their names. Identifying naming differences is beyond the scope of this research. Therefore, the improved view integration system does not employ the identical word stems characteristic to determine related entities. Based on the above discussions, four model variables are defined: similarity, name, key and attributes' resemblance. The variable similarity is the degree of similarity between two entities. The variable name determines whether or not two entities have the same name. The variable key checks whether or not two entities have the same key. The variable attributes' resemblance computes the degree of similarity between entities' attributes at the entities and relatedness identification stages. The model variables similarity, key and attributes' resemblance are used at both the entities and relatedness identification stages, while the variable name is used only at the entities identification stage. 189 4.2.1.1.2 Defining Ranges of Fuzzy Set Domains A model variable may have to be split into one or more fuzzy sets to reflect its semantic properties. Each fuzzy set then represents a certain range of the variable's domain. The number of fuzzy sets for a model variable is usually an odd number between 3 and 11. Neighboring fuzzy sets must overlap in order to give a continuous surface. The degree of overlap averages between 25% and 50%. The domain of the model variable similarity ranges from 0 to 100 as the degree of similarity between two entities can be between 0% and 100%. The default number of fuzzy sets of the variable similarity is 7, because it is the middle number between 3 and 11. The default degree of overlap between neighboring fuzzy sets is set at 50% because this degree is most commonly used. Hence, the variable similarity is split into seven fuzzy sets: LowLow (FS1) , Low (FS2) , Modera teLow (FS3) , Moderate (FS4) , Modera teHigh (FS5) , High (FS6) and HighHigh (FS7) , where FS stands for fuzzy set. Then the domains of the fuzzy sets LowLow, Low, Modera teLow, Moderate, Modera teHigh, High and H ighHigh are 0-16.7, 16.7-33.3, 33.3-50.0, 50.0-66.7, 66.7-83.3 and 83.3-100 respectively. The following straight-lined graphs demonstrate how the model variable similarity is split into the seven fuzzy sets: LowLow, Low, Modera teLow, Moderate, Modera teHigh , High and HighHigh. 190 Figure 15. Fuzzy Sets of the Model Variable Similarity. similarity LowLow Low Degree of Membership ModerateLow Moderate ModerateHigh High HighHIGH 0 16.7 33.3 50 66.7 83.3 100 The model variable name is not split into several fuzzy sets because two entities either have the same name or have different names. Therefore, the domain of the variable name is: {True, False}. Similarly, the domain of the model variable key is: {True, False}, because two entities can have either identical or different keys. When an entity in one view (subject) is compared with many entities in the other view (candidates) at the entities or relatedness identification stage, the degree of attributes' resemblance can be computed by finding the ratio of the number of the candidate's attributes matching the subject's attributes to the total number of attributes owned by the subject. The formula for this calculation is as follows: Number of identical attributes between subject and candidate Total number of attributes of subject 191 The ratio describes how similar in terms of attributes each candidate is to the subject. For example, in view 1, an entity Employee possesses five attributes Employee ID, Name, Address, Position and Phone Number, while in view 2, an entity Teacher has four attributes Employee ID, Name, Address and Phone Number, and an entity Person has six attributes SIN, Employee ID, Name, Address, Email and Phone Number. View 1: (Employe' one Number} View 2: ( ^ ^ l o v e T r o ^ ^X^^g> vS-D\ y(Name) Teacher (^mployee ID (^ xidress^  ojie Number} (ASdress} hone Numbef) Although, the two candidates Teacher and Person in view 2 have different numbers of attributes (4 and 6 respectively), they have four attributes (Name, Address and Phone Number) matching the attributes of the subject Employee in view 1. Since the subject Employee has five attributes, the degree of attributes' resemblance between the subject Employee and the candidate Teacher is: 4/5 = 80%. Similarly, the degree of attributes' resemblance between the subject Employee and the candidate Person is: 4/5 = 80%. Therefore, the two candidates Teacher and Person have the same degree of attributes' resemblance with the subject Employee. 192 The model variable attributes' resemblance is again divided into seven sections: LowLow (FS1), Low (FS2) , ModerateLow (FS3) , Moderate (FS4) , ModerateHigh (FS5), High (FS6) and HighHigh (FS7). The default degree of overlap between neighboring fuzzy sets is again set at 50%. Then the domains of the fuzzy sets LowLow, Low, ModerateLow, Moderate, ModerateHigh, High and HighHigh are 0-16.7, 16.7-33.3, 33.3-50.0, 50.0-66.7, 66.7-83.3 and 83.3-100 respectively. The following straight-lined graphs demonstrate how the model variable attributes' resemblance is split into the seven fuzzy sets: LowLow, Low, ModerateLow, Moderate, ModerateHigh, High and HighHigh. Figure 16. Fuzzy Sets of the Model Variable Attributes' Resemblance attributes' resemblance Degree of Membership -LowLow -Low ModerateLow Moderate -ModerateHigh -High -HighHIGH 16.7 33.3 SO 66.7 83.3 100 A 80% degree of attributes' resemblance gives 0.6 degree of membership in the fuzzy set moderatehigh and 0.4 degree of membership in the fuzzy set high. 193 attributes' resemblance 1 Degree of Membership -LowLow -Low ModerateLow Moderate -ModerateHigh j -High -HighHIGH 16.7 33.3 50 66.7 83.3 100 194 4.2.1.1.3 Selecting Surface Type of Fuzzy Sets There are two types of surfaces used in fuzzy sets: straight-lined and S-curved. The straight-lined type is chosen as the default surface type of the improved view integration system because it is most commonly used. In fact, the difference between a straight-lined surface and a S-curved surface is apparent. Degree of Membership •3 Domain 195 4.2.1.1.4 Defining Fuzzy Rules After all the model variables, ranges of fuzzy set domains and fuzzy sets' surface type have been settled, the fuzzy rules can now be defined. The fuzzy rules for the entities identification stage are as follows: Rules for the model variable attributes' resemblance: The entities' similarity is proportional to the attributes' resemblance between the entities. The rules for entities' similarity have the following format: If the degree of attributes' resemblance is FSi, then the degree of similarity is FSi. where i is the fuzzy set's number between 1 and 7. So, the rules are listed as follows: a) If the degree of attributes' resemblance is FS1 (LowLow), then the degree of similarity is FS1 (LowLow). b) If the degree of attributes' resemblance is FS2 (Low), then the degree of similarity is FS2 (Low). c) If the degree of attributes' resemblance is FS3 (ModerateLow), then the degree of similarity is FS3 (ModerateLow). d) If the degree of attributes' resemblance is FS4 (Moderate), then the degree of similarity is FS4 (Moderate). e) If the degree of attributes' resemblance is FS5 (ModerateHigh), then the degree of similarity is FS5 (ModerateHigh). 196 f) If the degree of attributes' resemblance is FS6 (High), then the degree of similarity is FS6 (High). g) If the degree of attributes' resemblance is FS7 (HighHigh), then the degree of similarity is FS7 (HighHigh). 197 Rules for the model variable name: Two entities can either have identical or non-identical names. If the entities have identical names, the degree of similarity between the entities increases. The idea of the rules for entities which have identical names and attributes' resemblance at a certain fuzzy set range is to find the next higher available fuzzy set range of the model variable similarity. Consequently, the rules have the following format: If the entities have identical names and the degree of attributes' resemblance is FSi, then the degree of similarity is FSi+1. where i is the fuzzy set's number between 1 and 7. The truth membership of the proposition, the entities have identical names, is 0.5. a) If the entities have identical names and the degree of attributes' resemblance is FS1 (LowLow), then the degree of similarity is FS2 (Low). b) If the entities have identical names and the degree of attributes' resemblance is FS2 (Low), then the degree of similarity is FS3 (ModerateLow). c) If the entities have identical names and the degree of attributes' resemblance is FS3 (ModerateLow), then the degree of similarity is FS4 (Moderate). d) If the entities have identical names and the degree of attributes' resemblance is FS4 (Moderate), then the degree of similarity is F S 5 (ModerateHigh). e) If the entities have identical names and the degree of attributes' resemblance is FS5 (ModerateHigh), then the degree of similarity is FS6 (High). f) If the entities have identical names and the degree of attributes' resemblance is FS6 (High), then the degree of similarity is FS7 (HighHigh). If the entities have non-identical names, the degree of similarity between the entities decreases. The idea of the rules for entities which have non-identical names and attributes' resemblance at a certain fuzzy set range is to find the next lower available fuzzy set range of the model variable similarity. The general format of the rules is as follows: If the entities have non-identical names and the degree of attributes' resemblance is FSi, then the degree of similarity is FSi-1. where i is the fuzzy set's number between 1 and 7. Again, the truth membership of the proposition, the entities have non-identical names, is 0.5. a) If the entities have non-identical names and the degree of attributes' resemblance is FS2 (Low), then the degree of similarity is FS1 (LowLow). b) If the entities have non-identical names and the degree of attributes' resemblance is FS3 (ModerateLow), then the degree of similarity is FS2 (Low). c) If the entities have non-identical names and the degree of attributes' resemblance is FS4 (Moderate), then the degree of similarity is FS3 (ModerateLow). d) If the entities have non-identical names and the degree of attributes' resemblance is FS5 (ModerateHigh), then the degree of similarity is FS4 (Moderate). 199 e) If the entities have non-identical names and the degree of attributes' resemblance is FS6 (High), then the degree of similarity is FS5 (ModerateHigh). f) If the entities have non-identical names and the degree of attributes' resemblance is FS7 (HighHigh), then the degree of similarity is FS6 (High). The truth membership of the antecedent of the fuzzy rules for the model variable name is set at 0.5 because although the entities have the same name, they may be homonyms. Therefore, the probability that the entities with identical names refer to the same real world object is 0.5. 200 Rules for the model variable key: Two entities can either have identical or non-identical keys. If the entities have identical keys, the degree of similarity between the entities increases. So the rules for entities with identical keys have the following format: If the entities have identical keys and the degree of attributes' resemblance is FSi, then the degree of similarity is FSi+1. where i is the fuzzy set's number between 1 and 7. The truth membership of the proposition, the entities have identical keys, is 0.2. a) If the entities have identical keys and the degree of attributes' resemblance is FS1 (LowLow), then the degree of similarity is FS2 (Low). b) If the entities have identical keys and the degree of attributes' resemblance is FS2 (Low), then the degree of similarity is FS3 (ModerateLow). c) If the entities have identical keys and the degree of attributes' resemblance is FS3 (ModerateLow), then the degree of similarity is FS4 (Moderate). d) If the entities have identical keys and the degree of attributes' resemblance is FS4 (Moderate), then the degree of similarity is FS5 (ModerateHigh). e) If the entities have identical keys and the degree of attributes' resemblance is FS5 (ModerateHigh), then the degree of similarity is FS6 (High). f) If the entities have identical keys and the degree of attributes' resemblance is FS6 (High), then the degree of similarity is FS7 (HighHigh). 201 If the entities have non-identical keys, the degree of similarity between the entities decreases. The format of the rules for entities having non-identical keys is shown below: If the entities have non-identical keys and the degree of attributes' resemblance is FSi, then the degree of similarity is FSi-1. where i is the fuzzy set's number between 1 and 7. Again, the truth membership of the proposition, the entities have non-identical keys, is 0.2. a) If the entities have non-identical keys and the degree of attributes' resemblance is FS2 (Low), then the degree of similarity is FS1 (LowLow). b) If the entities have non-identical keys and the degree of attributes' resemblance is FS3 (ModerateLow), then the degree of similarity is FS2 (Low). c) If the entities have non-identical keys and the degree of attributes' resemblance is FS4 (Moderate), then the degree of similarity is FS3 (ModerateLow). d) If the entities have non-identical keys and the degree of attributes' resemblance is FS5 (ModerateHigh), then the degree of similarity is FS4 (Moderate). e) If the entities have non-identical keys and the degree of attributes' resemblance is FS6 (High), then the degree of similarity is FS5 (ModerateHigh). 202 f) If the entities have non-identical keys and the degree of attributes' resemblance is FS7 (HighHigh), then the degree of similarity is FS6 (High). The antecedent truth of the fuzzy rules for the model variable key is set at 0.2, because two entities having identical keys may be related in meanings. The antecedent truth of the fuzzy rules for the model variable name should be greater than that for the model variable key, because entities having identical names shows stronger evidence that they are similar than entities having identical keys. The fuzzy rules for the relatedness identification stage include all the rules listed above, except the rules for the model variable name. At the relatedness identical stage, the identical names characteristic is no longer useful because related entities do not have the same name. 203 4.2.1.2 Composing Fuzzy Rules Before choosing the method for fuzzy rules composition, the method for correlation (the first step of fuzzy rules composition) has to be selected. There are two methods for correlation: correlation minimum and correlation product. The correlation minimum method is selected as the default method for correlation because it is most commonly used. The composition of the fuzzy rules can be implemented using either the min-max method or the fuzzy additive method. If the min-max method is used to compose fuzzy rules, then in some situations, certain fuzzy rules may dominate over other rules as mentioned in section 2.3.2.2.2.3. One of these situations is shown below: View 1: 204 View 2 : Hourly Pay^ > (^ ffiame^ ) <^TJepartrne )^ one NunuSef) ccount Number^ Hourly Pay> T^JepartmOTT^ )- one NumBef) Scount NurnT5er> The two candidates Engineer and Clerk in view 2 have nine attributes matching those of the subject Employee in view 1. Therefore, the degree of attributes' resemblance between the subject Employee in view 1 and the candidate Engineer in view 2 is 9/10 = 0.9 (90%), and so is the degree of attributes' resemblance between the subject Employee and the candidate Clerk. The 90% degree of attributes' resemblance gives 0.6 degree of membership in the fuzzy set High and a 0.4 degree of membership in the fuzzy set HighHigh of the model variable attributes' resemblance. 205 206 So the consequent fuzzy sets High and HighHigh of the model variable similarity are truncated at 0.6 and 0.4 respectively. 11 High attributes' resemblance High HighHigh similarity HighHigh attributes' resemblance 207 Since the names of the candidates Engineer and Clerk are different from that of the subject Employee, two fuzzy rules for the model variable name are executed: 1. If the entities have non-identical names and the degree of attributes' resemblance is F S 6 (High), then the degree of similarity is F S 5 (ModerateHigh). 2. If the entities have non-identical names and the degree of attributes' resemblance is F S 7 (HighHigh), then the degree of similarity is F S 6 (High). With the first fuzzy rule of the model variable name executed, the antecedent truth for non-identical names is 0.5, while the antecedent truth for the degree of attributes' resemblance is 0.6. Therefore, by taking the minimum of these two antecedent truths, the consequent fuzzy set ModerateHigh of the model variable similarity is truncated at 0.5. 11 High attributes' resemblance name 0.5 ModerateHigh Min(0.6, 0.5) = 0.5 / \ similarity 208 With the second fuzzy rule of the model variable name executed, the antecedent truth for attributes' resemblance is 0.4 and the antecedent truth for non-identical names is 0.5. Hence, the consequent fuzzy set High of the model variable similarity is truncated at 0.4. 1 i HighHigh -1 1 IT attributes' resemblance name 0.5 High Min(0.4, 0.5) = 0.4 / \ / \ similarity The candidate Clerk and the subject Employee have different keys. Therefore, two fuzzy rules for the model variable key are executed: 1. If the entities have non-identical keys and the degree of attributes' resemblance is FS6 (High), then the degree of similarity is FS5 (ModerateHigh). 2. If the entities have non-identical keys and the degree of attributes' resemblance is FS7 (HighHigh), then the degree of similarity is FS6 (High). 209 The first fuzzy rule of the model variable key executed truncates the consequent fuzzy set ModerateHigh of the model variable similarity at 0.2, because the antecedent truth for non-identical keys is 0.2, and it is less than the antecedent truth for attributes' resemblance which is 0.6. High attributes' resemblance key 0.2 ModerateHigh Min(0.6, 0.2) = 0.2 similarity 210 The second fuzzy rule of the model variable key executed truncates the consequent fuzzy set High of the model variable similarity at 0.2, because the antecedent truth for non-identical keys is 0.2 , while the antecedent truth for attributes' resemblance is 0.4. HighHigh 1 1 1 attr ibutes' r e s e m b l a n c e n a m e 0.2 1 i High Min(0.4, 0.2) = 0.2 simi lar i ty The candidate Engineer, however, has the same key as the subject Employee, so only one fuzzy rules for the model variable key is executed, because the fuzzy set HighHigh represents the highest fuzzy set range of the variable key: 1. If the entities have identical keys and the degree of attributes' resemblance is FS6 (High), then the degree of similarity is FS7 (HighHigh). 211 The fuzzy rule of the model variable key executed truncates the consequent fuzzy set HighHigh of the model variable similarity at 0.2, because the antecedent truth for identical keys is 0.2, and it is less than the antecedent truth for attributes' resemblance which is 0.6. High 0.6 / 90% : \ 1 1 1 1—• 1 attributes' resemblance 1 - ModerateHigh key 0.2 Min(0.6, 0.2) = 0.2 o - 1 1 1 1 1 similarity 212 Therefore the resultant output fuzzy region (broken black line) in the case of the candidate Engineer is as follows: Resultant Output Fuzzy Region 0.7 T Degree of Similarity 213 The resultant output fuzzy region (broken black line) in the case of the candidate Clerk shown below: Resultant Output Fuzzy Region Degree of Similarity 214 Hence, the two candidates have the same resultant output fuzzy region and thus have the same degree of similarity with the subject. If the fuzzy additive method is used instead, the effect of the fuzzy rules will accumulate in the resultant output fuzzy region. As a result, the resultant output fuzzy region (broken black line) in the case of the candidate Engineer when using the fuzzy additive method is as follows: Resultant Output Fuzzy Region Degree of Similarity 215 while resultant output fuzzy region (broken black line) in the case of the candidate Clerk is shown below: Resultant Output Fuzzy Region Degree of Similarity Therefore, the fuzzy additive method is used to compose the fuzzy rules of the improved view integration system. 216 4.2.1.3 Decomposing (defuzzifying) Fuzzy Rules Techniques for decomposing fuzzy rules include composite moments and composite maximum. As mentioned in section 2.3.2.3.3, the composite maximum technique is sensitive to a single fuzzy rule that dominates the resultant output fuzzy region. Therefore, some situations cannot be solved by the composite maximum technique. One of these situations could be: The two candidates Employee and Person in view 2 do not have any attribute matching the attributes of the subject Employee in view 1. Therefore, the two candidates have a zero degree of attributes' resemblance with the subject. The 0.0 degree of attributes resemblance gives 1.0 degree of membership in the fuzzy set LowLow of the model variable attributes' resemblance. View 1: View 2: 217 a t t r i b u t e s ' r e s e m b l a n c e 1.01 Degree of Membership 0% o -LowLow -Low ModerateLow Moderate -ModerateHigh -High -HighHIGH 16.7 33.3 SO 66.7 83.3 100 Therefore, the consequent fuzzy set LowLow of the model variable Similarity is shown as follow: LowLow 1.0 attr ibutes ' r e s e m b l a n c e 1.0 LowLow similar i ty The keys of the two candidates are different from that of the subject; no fuzzy rules for the model variable key is executed because the fuzzy set LowLow represents the lowest fuzzy set range of the model variable similarity. 218 Since the candidate Person and the subject Employee have different names, so no fuzzy rule for the model variable name is executed, for the same reason that the fuzzy set LowLow represents the lowest fuzzy set range of the model variable similarity. However, the candidate Employee has the same name as the subject Employee. Thus, one fuzzy rule for the model variable name is executed: 1. If the entities have identical names and the degree of attributes' resemblance is FS1 (LowLow), then the degree of similarity is FS2 (Low). With the fuzzy rule of the model variable name executed, the antecedent truth for the model variable name is 0.5, while the antecedent truth for attributes' resemblance is 1.0. Therefore, the consequent fuzzy set Low of the model variable similarity is truncated at 0.5. LowLow 10 i o attributes' resemblance 11 ModerateHigh name 0.5 similarity 219 Therefore, the resultant output fuzzy region (broken black line) in the case of the candidate Employee is as follows: Resultant Output Fuzzy Region 220 The resultant output fuzzy region (broken black line) in the case of the candidate Person is as follows: Resultant Output Fuzzy Region If the composite maximum technique is used to decompose fuzzy rules, both candidates will have the same degree of similarity (0.0%) with the subject, as this technique finds the domain point with the maximum truth in the resultant output fuzzy region. If the composite moments technique is used instead, the candidates Employee and Person have degree of similarity 12.1% and 5.4% respectively with the subject Employee, because the technique uses the weighted mean to find the "balance point" of the resultant output fuzzy region. The following diagram shows the expected value in the case of the candidate Employee when using the composite moments technique: Resultant Output Fuzzy Region Degree of Similarity 222 The expected value of the resultant output fuzzy region in the case of the candidate Person when using the composite moments technique is shown below: Resultant Output Fuzzy Region In short, the result computed using the composite moments technique is more realistic than that using the composite maximum one. Although two entities that do not have identical names, identical keys and common attributes are unlikely identical or related in meaning, they are not necessary totally different (have zero degree of similarity). Moreover, even though two entities do not have identical keys and common attributes, if they have identical names, the degree of similarity between the two entities again should not be zero. 223 Returning now to the case in the last section (section 4.2.1.2). The degree of similarity between the subject Employee and the candidate Engineer, when using the fuzzy additive method and the composite moments technique is 79.7%. Resultant Output Fuzzy Region Degree of Similarity 79 7%) 224 The degree of similarity between the subject Employee and the candidate Clerk when using the fuzzy additive method and the composite moments technique is 77.9%. Resultant Output Fuzzy Region Degree of Similarity 77 9% 225 4.2.2 Sensitivity Analysis of the Fuzzy Reasoning The degrees of similarity between entities in different database views are mainly determined by the model variables attributes' resemblance, name and key. However, they are also affected by the values of the following system variables: 1. surface type of fuzzy sets; 2. correlation method; 3. number of fuzzy sets; 4. degree of overlap of neighboring fuzzy sets; 5. truth membership of the model variable name; and 6. truth membership of the model variable key. If the values of these system variables are varied, the fuzzy reasoning performance of the improved view integration system may be changed. The default settings of these variables in the improved view integration system are as follows: 1. surface type of fuzzy sets = straight-lined; 2. correlation method = correlation minimum; 3. number of fuzzy sets = 7; 4. degree of overlap of neighboring fuzzy sets = 50%; 5. truth membership of the model variable name = 0.5; and 6. truth membership of the model variable key = 0.2. 226 At the entities identification stage, the degrees of similarity between entities in different views are dependent on the model variable attributes' resemblance, name and key. If the degree of attributes' resemblance remains constant, four situations can be found: 1. two entities having non-identical keys and names; 2. two entities having identical keys and non-identical names; 3. two entities having non-identical keys and identical names; and 4. two entities having identical keys and names. At the relatedness identification stage, only two situations are found because related entities do not have the same name: 1. two entities having non-identical keys and names; and 2. two entities having identical keys and non-identical names. 227 With the system variables mentioned above remain at their default values, the following table shows how the degree of entities' similarity varies with the degree of attributes' resemblance in the four situations at the entities identification stage: Attributes' Resemblance (%) Non-identical Keys and Names Identical Keys and Non-identical Names Non-identical Keys and Identical Names Identical Keys and Names 0 5.4 NA 12.1 NA 10 12.4 17.4 20.3 22.1 20 18.0 24.5 26.8 32.3 30 21.4 27.5 31.8 37.3 40 31.1 38.3 43.8 50.8 50 41.1 46.9 53.1 58.9 60 49.2 56.2 61.7 68.9 70 62.7 68.2 72.5 78.6 80 67.7 73.2 75.5 82.0 90 77.9 79.7 82.6 87.6 100 86.4 87.9 90.0 94.6 228 No value for the degree of similarity should be found in the column Identical Keys and Non-identical names and the column Identical Keys and N ames when the attributes' resemblance is 0%. If the attributes' resemblance is 0%, that means no common attribute is found between the entities in different views. Hence, no identical key should be found between these two entities. Default Sett ings of the Fuzzy Reasoning 10 20 30 40 50 60 70 Attributes' Resemblance 80 90 100 -Non-identical Keys and Names -Identical Keys and Non-identical Names Non-identical Keys and Identical Names Identical Keys and Names In general, the degree of similarity rises with the attributes' resemblance. The degree of similarity between two entities with identical keys and names is the highest among the four situations at the entities' identification stage. The degree of similarity between two entities having non-identical keys and identical names is the second highest. The degree of similarity between two entities having identical keys and non-identical names is the third highest, and finally, the degree of similarity between two entities having non-identical keys and names is the lowest. 229 4.2.2.1 Variation of the Surface Type of the Fuzzy Sets There are two common surface types for fuzzy sets: straight-lined and S-curved. The table below shows the entities' similarities computed using straight-lined surfaces versus those using S-curved surfaces: Attributes' Resemblance (%) Non-identical Keys and Names Identical Keys and Non-identical Names Non-identical Keys and Identical Names Identical Keys and Names Surface Type: Straight-lined 0 5.4 NA 12.1 NA 10 12.4 17.4 20.3 22.1 20 18.0 24.5 26.8 32.3 30 21.4 27.5 31.8 37.3 40 31.1 38.3 43.8 50.8 50 41.1 46.9 53.1 58.9 60 49.2 56.2 61.7 68.9 70 62.7 68.2 72.5 78.6 80 67.7 73.2 75.5 82.0 90 77.9 79.7 82.6 87.6 100 86.4 87.9 90.0 94.6 230 Surface Type: S-curved 0 4.7 NA 11.5 NA 10 11.9 16.7 19.9 21.9 20 17.3 24.0 26.4 31.8 30 21.6 27.5 31.9 37.4 40 31.0 38.0 43.8 50.6 50 41.7 47.0 53.0 58.3 60 49.4 56.2 62.0 69.0 70 62.6 68.1 72.9 78.4 80 68.2 73.6 76.0 82.7 90 78.1 80.1 83.3 88.1 100 87.1 88.5 90.7 95.3 231 The following graph shows that the difference between the degrees of similarity computed using the straight-lined surface and those computed using the S-curved surface is negligible. Hence, the surface type of the fuzzy sets does not affect the fuzzy reasoning results of the improved view integration system. Straight-l ined Surface Vs S-curved Surface 232 4.2.2.2 Variation of the Correlation Method Two most commonly used methods for fuzzy rules correlation are the correlation minimum method and the correlation product method. The table below shows the difference of entities' similarities between using the correlation minimum method and the correlation product method: Attributes' Resemblance (%) Non-identical Keys and Names Identical Keys and Non-identical Names Non-identical Keys and Identical Names Identical Keys and Names Correlation Method: Correlation Minimum 0 5.4 NA 12.1 NA 10 12.4 17.4 20.3 22.1 20 18.0 24.5 26.8 32.3 30 21.4 27.5 31.8 37.3 40 31.1 38.3 43.8 50.8 50 41.1 46.9 53.1 58.9 60 49.2 56.2 61.7 68.9 70 62.7 68.2 72.5 78.6 80 67.7 73.2 75.5 82.0 90 77.9 79.7 82.6 87.6 233 100 86.4 87.9 90.0 94.6 Correlation Method: Correlation Product 0 5.4 NA 10.9 NA 10 11.8 15.7 19.7 21.5 20 14.9 19.4 23.0 26.5 30 24.7 29.2 34.7 38.9 40 30.4 36.5 42.8 48.8 50 43.1 47.1 52.9 56.9 60 51.2 57.2 63.5 69.6 70 61.1 65.3 70.8 75.3 80 73.5 77.0 80.6 85.1 90 78.5 80.3 84.3 88.2 100 88.1 89.1 91.5 94.6 234 The following graph illustrates the difference of entities' similarities between using the correlation minimum method and the correlation product method: Correlat ion Minimum V s Correlat ion Product Identical Keys and Names Non-identical Keys and IdenticalNames Identical Keys and Non-identicalNames Non-identical Keys and Names 10 20 30 40 50 60 70 80 90 100 Attributes' Resemblance Correlation Minimum Correlation Product The entities' similarities computed by the correlation minimum method is different from those by the correlation product method. Therefore, different correlation methods produce different fuzzy reasoning result. However, the graph shows that the correlation minimum method is not superior to the correlation product method nor vice versa. Hence, a database designer has to decide which method better suits his or her needs. 235 4.2.2.3 Variation of the Number of Fuzzy Sets The following table shows how the entities' similarities change with the number of fuzzy sets of the model variables attributes' resemblance and similarity. The comparison involves varying the number of fuzzy sets from 5 to 9. Attributes' Resemblance (%) Non-identical Keys and Names Identical Keys and Non-identical Names Non-identical Keys and Identical Names Identical Keys and Names Number of Fuzzy Sets: 5 0 8.2 NA 18.2 NA 10 17.6 25.9 29.2 32.2 20 19.3 26.4 31.2 34.0 30 27.0 36.8 40.2 48.4 40 31.0 39.2 45.8 53.3 50 36.6 45.4 54.6 63.4 60 46.7 54.2 60.8 69.0 70 51.6 59.8 63.2 73.0 80 66.0 68.8 73.6 80.7 90 67.8 70.8 74.1 82.4 100 79.6 81.8 84.9 91.8 236 Number of Fuzzy Sets: 6 0 6.5 NA 14.5 NA 10 14.6 20.8 24.1 26.4 20 15.9 21.4 26.7 30.7 30 24.2 30.8 36.2 42.0 40 29.3 36.3 43.7 50.7 50 38.0 46.3 53.7 62.0 60 49.3 56.3 63.7 70.7 70 58.0 63.8 69.2 75.8 80 69.3 73.3 78.6 84.1 90 73.6 75.9 79.2 85.4 100 83.7 85.5 87.9 93.5 Number of Fuzzy Sets: 7 0 5.4 NA 12.1 NA 10 12.4 17.4 20.3 22.1 20 18.0 24.5 26.8 32.3 30 21.4 27.5 31.8 37.3 40 31.1 38.3 43.8 50.8 50 41.1 46.9 53.1 58.9 237 60 49.2 56.2 61.7 68.9 70 62.7 68.2 72.5 78.6 80 67.7 73.2 75.5 82.0 90 77.9 79.7 82.6 87.6 100 86.4 87.9 90.0 94.6 Number of Fuzzy Sets: 8 0 4.6 NA 10.3 NA 10 10.8 15.0 17.5 19.2 20 16.8 21.7 25.0 29.3 30 23.6 29.6 33.8 40.0 40 29.5 36.0 39.5 46.2 50 41.4 47.4 52.6 58.6 60 53.8 60.5 64.0 70.5 70 60.0 66.2 70.4 76.4 80 70.7 75.0 78.3 83.2 90 80.8 82.5 85.0 89.2 100 88.4 89.7 91.4 95.4 Number of Fuzzy Sets: 9 0 4.0 NA 9.0 NA 238 10 9.6 13.2 15.6 17.0 20 15.5 19.6 22.9 26.7 30 23.3 28.7 32.9 38.1 40 34.6 40.5 43.5 49.2 50 43.3 47.7 52.3 56.7 60 50.8 56.5 59.5 65.4 70 61.9 67.1 71.3 76.7 80 73.3 77.1 80.4 84.5 90 83.0 84.4 86.8 90.4 100 89.9 91.0 92.5 96.0 239 Graphs in Set A The following graphs illustrates how the entities' similarities vary with the number of fuzzy sets of the model variables attributes' resemblance and similarity in each of the four situations at the entities identification stage: Entities with Non-identical Keys and Names 50 60 Attributes' Resemblance 70 80 90 100 -5 Fuzzy Sets -6 Fuzzy Sets 7Fuzzy Sets 8 Fuzzy Sets 9 Fuzzy Sets 240 Entit ies with Identical Keys and Non-identical Names 2 4 1 242 Graphs in Set B The following graphs show how the entities' similarities vary with the four situations at the entities identification stage when different numbers of fuzzy sets of the model variable attributes' resemblance and similarity are used. N u m b e r o f F u z z y S e t s E q u a l s 5 30 40 50 60 70 Attributes' Resemblance 30 90 100 -Non-identical Keys and Names -Identical Keys and Non-identical Names Non -identical Keys and Identical Names Identical Keys and Names 243 Numberof Fuzzy Sets Equals 6 244 Numberof Fuzzy Sets Equals 8 245 With regard to the graphs in set A, the slopes of the line graphs become steeper when the number of fuzzy sets of the model variables attributes' resemblance and similarity increases. Hence, raising the number of fuzzy sets has an effect of contrasting similar entities from dissimilar entities. This means that if the number of fuzzy sets increases, the degrees of similarity between dissimilar entities become lower, while those between similar entities become higher. Therefore, increasing the number of fuzzy sets may help a database designer distinguish similar entities from dissimilar entities more easily and quickly at a glance. The graphs in set B show that the line graphs become closer together when number of fuzzy sets increases. In other words, the vertical distance between the line graphs decreases as the number of fuzzy sets increases. Hence, raising the number of fuzzy sets has an effect of minimizing the difference of entities' similarities in the four situations at the entities identification stage. 246 4.2.2.4 Variation of the Overlap Degree of Neighboring Fuzzy Sets The following table shows the entities' similarities when the degree of overlap between neighboring fuzzy sets of the model variables attributes' resemblance and similarity is at 30%, 40%, 50%, 60% and 70%: Attributes' Resemblance (%) Non-identical Keys and Names Identical Keys and Non-identical Names Non-identical Keys and Identical Names Identical Keys and Names Degree of Overlap: 30% 0 4.3 NA 12.3 NA 10 11.2 17.2 20.9 22.9 20 12.4 18.0 23.1 26.7 30 23.3 29.8 36.8 43.4 40 29.1 37.4 42.8 51.1 50 40.2 46.6 53.4 59.8 60 48.9 57.2 62.6 70.9 70 56.6 63.2 70.2 76.7 80 73.3 76.9 82.0 87.6 90 77.1 79.1 82.8 88.8 100 86.4 87.7 90.0 95.7 247 Degree of Overlap: 40% 0 4.8 NA 12.1 NA 10 11.8 17.2 20.5 22.4 20 16.0 22.4 25.5 30.8 30 21.7 28.3 33.6 39.3 40 30.2 37.9 43.5 51.0 50 40.6 46.8 53.2 59.4 60 49.0 56.5 62.1 69.8 70 60.7 66.4 71.7 78.3 80 69.2 74.5 77.6 84.0 90 77.6 79.5 82.8 88.2 100 86.4 87.9 90.0 95.2 Degree of Overlap: 50% 0 5.4 NA 12.1 NA 10 12.4 17.4 20.3 22.1 20 18.0 24.5 26.8 32.3 30 21.4 27.5 31.8 37.3 40 31.1 38.3 43.8 50.8 50 41.1 46.9 53.1 58.9 248 60 49.2 56.2 61.7 68.9 70 62.7 68.2 72.5 78.6 80 67.7 73.2 75.5 82.0 90 77.9 79.7 82.6 87.6 100 86.4 87.9 90.0 94.6 Degree of Overlap: 60% 0 8.8 NA 13.0 NA 10 13.0 17.6 20.2 22.0 20 19.1 25.0 27.8 32.4 30 21.7 26.8 31.1 36.1 40 31.9 38.7 44.0 50.5 50 43.7 49.9 54.3 60.6 60 49.5 56.0 61.3 68.1 70 63.9 68.9 73.2 78.3 80 67.6 72.2 75.0 80.9 90 78.0 79.8 82.4 87.0 100 85.7 87.0 88.5 91.2 Degree of Overlap: 70% 0 10.7 NA 13.9 NA 249 10 13.6 17.9 20.2 21.9 20 20.0 25.4 28.4 32.5 30 22.1 26.6 30.6 35.4 40 32.6 39.0 44.1 50.1 50 45.0 51.6 54.8 60.7 60 49.9 55.9 61.0 67.4 70 64.6 69.4 73.4 77.9 80 67.5 71.6 74.6 80.0 90 78.1 79.8 82.1 86.4 100 85.1 86.1 87.4 89.3 250 Graphs in Set A The following graphs illustrate how entities" similarities vary with the degree of overlap between neighboring fuzzy sets of the model variables attributes' resemblance and similarity in different situations at the entities identification stage. Entit ies with Non-identical Keys and Names 30% Overlap 40% Overlap 50% Overlap 60% Overlap 70% Overlap 0 10 20 30 40 50 60 70 80 90 100 Attributes' Resemblance 251 Entit ies with Identical Keys and Non-Identical Names Entit ies with Non-identical Keys and Identical Names 30% Overlap 40% Overlap 50% Overlap 60% Overlap 70% Overlap 0 10 20 30 40 50 60 70 80 90 100 Attributes' Resemblance 252 Entit ies with Identical Keys and Names 100 -90 -80 -70 - 30% Overlap 60 - 40% Overlap 50 - 50% Overlap 40 • 60% Overlap 30 -70% Overlap 20 -10 • o -~ — i 1 1——i 1 1 1 1— i — , 0 10 20 30 40 50 60 70 80 90 100 Attributes' Resemblance 253 Graphs in Set B The following graphs illustrate how the entities' similarities vary with the four situations at the entities identification stage when different degrees of overlap of neighboring fuzzy sets are chosen: 254 Degree of Overlap Equals 40% 255 Degree of Overlap Equals 60% 256 The graphs in set A and set B show that when the degree of overlap between neighboring fuzzy sets of the model variables attributes' resemblance and similarity changes, the entities' similarities also vary. However, the graphs do not show any specific trait which indicate how the change of the degree of overlap between neighboring fuzzy sets would affect the entities' similarities. Therefore, a database designer can select the degree of overlap between neighboring fuzzy sets of the model variables attributes' resemblance and similarity according to his or her preference. 257 4.2.2.5 Variation of the Antecedent Truth Membership of the Model Variable Name The table below shows the entities' similarities when the antecedent truth membership of the model variable name is set at 0.3, 0.4, 0.5, 0.6 and 0.7: Attributes' Resemblance (%) Non-identical Keys and Names Identical Keys and Non-identical Names Non-identical Keys and Identical Names Identical Keys and Names Antecedent Truth of Name: 0.3 0 5.4 NA 11.0 NA 10 12.5 17.6 18.9 21.2 20 18.2 25.2 26.2 ^ 32.3 30 21.5 28.4 30.2 36.3 40 32.1 39.8 42.8 50.2 50 42.1 48.7 51.3 57.9 60 49.8 57.2 60.2 67.9 70 63.7 69.8 71.6 78.5 80 67.7 73.8 74.8 81.8 90 78.8 81.1 82.4 87.5 100 87.1 89.0 90.0 94.6 258 Antecedent Truth of Name: 0.4 0 5.4 NA 11.6 NA 10 12.4 17.5 19.7 21.7 20 18.1 24.8 26.5 32.3 30 21.5 27.9 31.1 36.8 40 31.6 38.9 43.8 50.8 50 41.5 47.7 52.3 58.5 60 49.2 56.2 61.1 68.4 70 63.2 68.9 72.1 78.5 80 67.7 73.5 75.2 81.9 90 78.3 80.3 82.5 87.6 100 86.7 88.4 90.0 94.6 Antecedent Truth of Name: 0.5 0 5.4 NA 12.1 NA 10 12.4 17.4 20.3 22.1 20 18.0 24.5 26.8 32.3 30 21.4 27.5 31.8 37.3 40 31.1 38.3 43.8 50.8 50 41.1 46.9 53.1 58.9 259 60 49.2 56.2 61.7 68.9 70 62.7 68.2 72.5 78.6 80 67.7 73.2 75.5 82.0 90 77.9 79.7 82.6 87.6 100 86.4 87.9 90.0 94.6 Antecedent Truth of Name: 0.6 0 5.4 NA 12.4 NA 10 12.4 17.4 20.7 22.4 20 17.9 24.3 27.0 32.3 30 21.4 27.2 32.3 37.6 40 30.8 37.8 43.8 50.8 50 40.8 46.4 53.6 59.2 60 49.2 56.2 62.2 69.2 70 62.4 67.7 72.8 78.6 80 67.7 73.0 75.7 82.1 90 77.6 79.3 82.6 87.6 100 86.2 87.6 90.0 94.6 Antecedent Truth of Name: 0.7 0 5.4 NA 12.6 NA 260 10 12.4 17.4 20.7 22.4 20 17.8 24.1 27.1 32.3 30 21.4 27.1 32.6 37.8 40 30.8 37.8 43.8 50.8 50 40.6 46.0 54.0 59.4 60 49.2 56.2 62.2 69.2 70 62.2 67.4 72.9 78.6 80 67.7 72.9 75.9 82.2 90 77.6 79.3 82.6 87.6 100 86.1 86.1 90.0 94.6 261 Graphs in Set A The following graphs illustrate how entities' similarities vary with the antecedent truth membership of the model variable name in different situations at the entities identification stage: Entities with Non-identical Keys and Names 0.3 Membership 0.4 Membership 0.5 Membership 0.6 Membership 0.7 Membership 0 10 20 30 40 50 60 70 80 90 100 Attributes' Resemblance 262 Entit ies with Identical Keys and Non-identical Names 0.3 Membership 0.4 Membership 0.5 Membership 0.6 Membership 0.7 Membership 0 10 20 30 40 50 60 70 80 90 100 Attributes' Resemblance Entit ies with Non-identical Keys and Identical Names 0.3 Membership 0.4 Membership 0.5 Membership 0.6 Membership 0.7 Membership 0 10 20 30 40 50 60 70 80 90 100 Attributes' Resemblance 263 Entit ies with Identical Keys and Names 264 Graphs in Set B The graphs below illustrate how the entities' similarities change with the four situations at the identification stage when the antecedent truth of the model variable name is set at 0.3, 0.4, 0.5, 0.6 and 0.7: Antecedent Truth of Name Equals 0.3 -Non-identical Keys and Names -Identical Keys and Non-identical Names Non-identical Keys and Identical Names Identical Keys and Names 30 40 SO 60 70 Attributes' Resemblance 80 90 100 265 Antecedent Truth of Name Equals 0.4 Antecedent Truth of Name Equals 0.5 266 Antecedent Truth of Name Equals 0.6 Antecedent Truth of Name Equals 0.7 267 The graphs in set A show that the change of the antecedent truth of the model variable name causes very small changes in the entities' similarities. Therefore, varying the antecedent truth of the model variable name does not affect the fuzzy reasoning results of the improved view integration system. Regarding the graphs in set B, the vertical distance between entities with non-identical keys and names and entities with non-identical keys and identical names, and the vertical distance between entities with identical keys and non-identical names and entities with identical keys and names increases with the antecedent truth of the model variable name. This change in vertical distance again indicates that increasing the antecedent truth of the model variable name contrast entities with identical names from entities with non-identical names. Note that when the antecedent truth of the model variable name decreases, the vertical distance between entities with identical keys and non-identical names and entities with non-identical keys and identical names drops. This drop in the vertical distance is reasonable as the antecedent truth of the model variable key is 0.2. When the antecedent truth of the model variable name decreases, it becomes comparable to the antecedent truth of the model variable key. Varying the antecedent truth of the model variable key has the same effect as changing the antecedent truth of the model variable name. Hence, varying the antecedent truth of the model variable key does not affect the fuzzy reasoning results of the improved system. Note that the antecedent truth of the model variable key has to be smaller than that of the model variable name, because the identical keys characteristic provides a less certain indication that the entities are similar than does the identical names characteristic. 268 In conclusion, using different fuzzy sets' surface types does not affect the computation of entities' similarities. Different correlation methods would produce different results concerning entities' similarities. However, no one correlation method is better than the other. Hence, a database has to decide which correlation to use. Raising the number of fuzzy sets of the model variables similarity and attributes' resemblance contrasts the similarities between similar entities from those between dissimilar entities. However, increasing the number of fuzzy sets also decreases the difference of the entities' similarities in the four situations at the entities identification stage. There is no specific pattern as to how the change of the degree of overlap between neighboring fuzzy sets of the model variables name and attributes' resemblance affects the degrees of similarities between entities. Finally, changing the antecedent truth of the model variables name and key would not affect the fuzzy reasoning results of the improved system. 269 4.2.3 Advantages of Using Fuzzy Logic in Entities Similarity Determination The merits of using fuzzy logic in determining similar entities are as follows: 1. A conventional expert system draws one conclusion to each of the rules executed; the conclusion is limited to each rule executed. However, a fuzzy system can draw a general conclusion based on the results of all the fuzzy rules executed. For example, the conventional expert system can only conclude that the following entities Employee and Engineer have 90% attributes' resemblance, non-identical names and identical keys. View 1: xxount NumbeT> 270 However, the fuzzy system can combine all the conclusions drawn by the conventional expert system and conclude that the two entities are 79.7% similar. 2. New fuzzy rules can be readily appended and deleted without modifying the existing fuzzy rules. New model variables can also be added easily to the system without affecting other variables. Each fuzzy rule in the system is independent of other fuzzy rules. Therefore, fuzzy rules are easy to write and modify. 3. Different opinions can be accommodated by the system. The output fuzzy region, which is formed at the stage of fuzzy rule composition, accumulates the results of all the fuzzy rules applied. Two entities with identical names are very similar. However, if these two entities do not have identical keys, they may not be similar. Fuzzy logic can accommodate these contradictory results and can provide a reasonable conclusion concerning these results. 4. All fuzzy rules in the system are applied in a parallel manner. This parallel processing allows the system to draw a general conclusion on the results of all the rules executed. 271 5. Conclusions and Recommendations The main contribution of this research is the improvement of Wagner's view integration system. In Wagner's approach, the testing of elements' identity is carried out before the testing of elements' relatedness. During the identity test, the elements in the vicinity of the element that has just been identified will be chosen for the subsequent comparisons. A new conflict analysis is introduced. It analyses the database elements in the following order: 1) entities, 2) relatedness, 3) relationships and 4) attributes. The table below summarizes the comparison of the two approaches: Table 8. Comparisons of Wagner's Approach with the New Approach Situations Wagner's Approach New Approach Partially Overlapped Views Better Non-overlapped Views Better Views with Is-a Relationships Better Highly Overlapped Views Better 272 The new approach performs better than Wagner's approach at situations when the views are partially overlapped or non-overlapped, because the former approach identifies all the unique entities at the entities identification stage. Hence, some of the comparison steps at the relationships and attributes identification stages can be done automatically by the system. This automatic checking greatly reduces the needs of the system to interact with the designer during a conflict analysis. The new approach is also more efficient at situations when the views contain is-a relationships, because the relatedness identification is carried out before the relationships and attributes identification. Hence, when a database designer compares a relationship or an attribute, the system can ask the designer immediately whether he or she wants to relocate the relationship or attribute to another entity along the is-a relationships chain. In Wagner's approach, the relatedness test is carried out after the identity test. Therefore, Wagner's system requires the designer to go through again some of the relationships or attributes that have been compared during the identity test so as to determine whether or not they have to be relocated to another entity along the is-a relationships chain. 273 However, Wagner's approach performs better than the new approach at situations when the views are highly overlapped. With Wagner's approach, each entity in one view, except the first entity, is only compared to one entity in the other view, while with the new approach, most entities in one view have to be compared to more than one entity in the other view. Therefore, the concept of fuzzy logic is used to provide the database designer with the relative degree of similarity between entities. The similarity degree is given as a numerical value which is based on the resemblance of the entities' names, keys and attributes. The entities' similarity is used at the phase of identifying entities and subset relationship of entities during the conflicts analysis. The improved view integration system can only identify inter-view structural conflicts; it cannot determine semantic differences between database views. For instance, the information in view 1 is abstracted at the department level, while the information in view 2 is abstracted at the faculty level. The attribute Budget only applies to departments. Hence, the attribute cannot be simply copied to view 3, because view 3 depicts the information at the faculty level. A solution to this problem is to add a new attribute Department to the entity Program in view 3 in order to distinguish the budgets of different departments. View 1: (Department) [Budget ^> View 2: (Faculty) 274 View 3: (Faculty) C e^partrnenfJ> Future extensions to this research could include the integration of the concept of common sense into the entities' similarity computation. The concept of common sense can help determine the meaning characteristic of database elements. If this concept can be represented using fuzzy sets, it can be used in writing fuzzy rules to help compute the entities' similarity. Then, the reliability and validity of the degree of similarity will be greatly improved. Another focus could be on refining the fuzzy rules and selecting the best set of values for the system variables of the improved system. More fuzzy rules can be added to the improved system if the system is able to identify and distinguish synonyms, abbreviated names and common word stems in names. Varying the values of the system variables could change the performance of the fuzzy reasoning of the improved system. After the fuzzy rules and system variables are finely tuned, an alpha level could be introduced to reduce the number of entities designers have to compare each time they interact with the system. Recall that an alpha level excludes those elements with low truth memberships in a model variable. If the degree of similarity between the subject entity and the candidate entity is low, the candidate entity can be exempted from the comparison. 275 This research has outlined a list of situations found at each of the four identification stages (entities, relatedness, relationships and attributes). However, this list of situations may not be exhaustive. Therefore, searching for other situations can further improve this research. Indeed, the current view integration system can only solve structural conflicts. Hence, the future research should concentrate more on resolving semantic conflicts. The improved system does not support aggregation and association abstractions. Therefore, the system can be improved by incorporating these data abstractions into the system's functionality. 276 6. Bibliography Batini, C , Lenzerini M., and Navathe S. B., "A Comparative Analysis of Methodologies for Database Schema Integration", ACM Computing Surveys, vol. 18, no. 4, 1986, pp. 323-364. Batini, C , Lenzerini M. "A Methodology for Data Schema Integration in the Entity Relationship Model", IEEE Transactions on Software Engineering, vol. 10, no. 6, 1984, pp. 650-663. Bernstein, P. "Synthesizing Third Normal Form Relations from Functional Dependencies", ACM Transactions on Database Systems, vol. 1, no. 4, December 1976, pp. 277-298. Biskup, Joachim and Convent, Bernhard, "A Formal View Integration Method", Int'I ACM SIGMOD Conf, 1986, pp. 398-407. Biskup, Joachim and Convent, Bernhard, "A Formal View Integration Method", Forschungsbericht 208, Universitat Dortmund, 1985. Casanova, M. and Vidal, V. "A Sound Approach to View Integration", Proceedings of the ACM Conference on Principles of Database Systems, March 1983, pp. 35-47. Chen, P. P. "The entity-relationship model: Toward a unified view of data", ACM Trans. Database Syst., vol. 1, no. 1, 1976, pp. 9-36. Cox, Earl D. Fuzzy Logic for Business and Industry, Charles River Media Inc., 1995. Cox, Earl D. The Fuzzy Logic Handbook,, Charles River Media Inc., 1994. Elmasri, R., Weeldryer, J., and Hevner, A. "The category concept: An extension to the entity-relationship model", Data Knowl. Eng. 1, 1 (June), 1985. Goldstein, Robert C. and Storey, Veda, "Unraveling Is-A Structures", University of British Columbia, November, 1988. New Orleans Database Design Workshop Report (Summary), VLDB, RIO (1979). Navathe, Shamkant, Elmasri, Ramez and Larson James, "Integrating User Views in Database Design", IEEE Computer, 1986, pp. 50-62. Raver, N. and Hubbard, G. U. "Automated Logical Database Design methodology and Techniques", IBM Systems Journal, vol. 16, no. 3, 1977 277 Wagner, Christian, "View Integration in Database Design", April 1989. Yao, S. Bing, Waddle, Vance E., and Housel, Barron C , "View Modeling and Integration Using the Functional Data Model", IEEE Transactions on Software Engineering, vol. SE-8, November 1982, pp. 544-553. 278 7. Appendix (Case Study) The following case, provided by Dr. Robert Goldstein, is used to demonstrate the operation of the improved view integration system: View 1 - Department Entities (Attributes): Course (Number. Name, Credits) Section (Number. Max_enrollment) Student (Number. Name, Address) Undergraduate (Option) Grad_student Program (Name. Budget) Relationships (Attributes): Section (1,1) instanceof Course (l,n) Student (l,n) registered_in Section (l,n) (Mark) Gradstudent (1,1) enrolledin Program (l,n) Undergraduate is_a Student Grad student is a Student 279 View 1 - Department Undergraduate (1,1) Course 280 View 2 - Faculty of Graduate Studies Entities (Attributes): Program (Name) Student (Number. Name, Citizenship, Full_time) Course (Dept. Number. Name, Credits) Teaching_asst (Hours, Salary) Relationships (Attributes): Student (1,1) enrolled_in Program (l,n) (year_started) Student (l,n) completed Course (l,n) (mark) Teachingasst (l,n) teaches Course (l,n) Teaching_asst is_a Student 281 V i e w 2 - F a c u l t y o f G r a d u a t e Studies 282 V i e w 3 - G l o b a l S c h e m a Entities (Attributes): Course (Dept. Number. Name, Credits) Section (Number. Max_enrollment) Student (Number. Name, Citizenship, Full_time, Address) Undergraduate (Option) Grad_student Program (Name. Budget) Teachingasst (Hours, Salary) Relationships (Attributes): Section (1,1) instanceof Course (l,n) Student (l,n) registered_in Section (l,n) (Mark) Gradstudent (1,1) enrolled_in Program (l,n) Student (l,n) completed Course (l,n) (mark) Teaching_asst (l,n) teaches Course (l,n) Undergraduate is_a Student Gradstudent is_a Student Teaching_asst is_a Student 283 V i e w 3 - G l o b a l Schema The following points highlight some of the steps executed during the conflict analysis: 1. The system suggests that the entity Grad_student in view 1 could correspond to the entities Student, Teach ing_ass t , Cou rse and Program in view 2. The entity Student in view 2 is selected. 2. The system suggests that the relationship completed in view 2 could correspond to the entity Sect ion in view 1. The reason why the system comes up with such suggestion is that the relationship completed is associated with the entities Cou rse and Grad_Student , while the entity Sect ion is associated with the entities Cou rse and Student. Since the entity Student is the superset of the entity Grad_Student , the system, therefore, suggests that the relationship completed in view 2 could correspond to the entity Sect ion in view 1. This situation has been discussed in section 4.1.3.1.2.1. The relationship completed in view 2 does not correspond to any element in view 1. 3. The system suggests that the relationship teaches in view 2 could correspond to the entity Sect ion in view 1. The reason for this suggestion is the same as the reason described in the previous step. The relationship teaches in view 2 does not correspond to any element in view 1. 284 4. The attributes Citizenship, Fulltime of the entity Student in view 2 (Grad_student in view 3) is relocated to the entity Student in view 3. 5. A key conflict occurs between the entity Course in view 1 and that in view 2. The attributes Dept and Name is selected as the key of the entity Course in view 3.. 285 V i e w 3 - G l o b a l S c h e m a Program Grad student (1,1) Is-A Teachingasst Undergraduate (1,1) < ^ Is-A ^ > (0,1) f(0,l) Student (l,n) ;egistered_ir. ( U ) Section nstance of C U ) teaches Course 286 Comments about the Case Study The above case is used to demonstrate that the improved view integration system is incapable of identifying semantic differences between database views. In view 1, all the graduate students belong to the same department. In view 2, although the graduate students belong to the same faculty, they can belong to different departments. Hence, the domain of the graduate students in view 1 is, in fact, a subset of the domain of the graduate students in view 2. The result of the view integration is also dependent on how well the database designer understands the information captured in the database views. For instance, a database designer, who does not thoroughly understand the information in views 1 and 2, will not know that the attributes of the entity Student in view 2 (Grad_Student in view 3) have to be relocated to the entity Student in view 3. In view 3, the entity Program represents the academic programs at the faculty level. However, in view 1, the attribute Budget of the entity P rogram only applies to a department. Therefore, it is wrong to simply copy the attribute Budget to the entity Program in view 3. A new entity or attribute Department may have to be added to view 3 in order to distinguish the budgets of different departments. Again, this problem can only be identified by a designer who truly understands the information abstracted in the views. 287 This case also shows a weakness of the E-R modeling. In view 1, the information stored in the relationship: Student-registered_in-Section is only temporary, because after a student has finished a course, no one will be interested in which section the student has registered. However, this short term information storage cannot be represented in an E-R diagram. 2 8 8 

Cite

Citation Scheme:

        

Citations by CSL (citeproc-js)

Usage Statistics

Share

Embed

Customize your widget with the following options, then copy and paste the code below into the HTML of your page to embed this item in your website.
                        
                            <div id="ubcOpenCollectionsWidgetDisplay">
                            <script id="ubcOpenCollectionsWidget"
                            src="{[{embed.src}]}"
                            data-item="{[{embed.item}]}"
                            data-collection="{[{embed.collection}]}"
                            data-metadata="{[{embed.showMetadata}]}"
                            data-width="{[{embed.width}]}"
                            async >
                            </script>
                            </div>
                        
                    
IIIF logo Our image viewer uses the IIIF 2.0 standard. To load this item in other compatible viewers, use this url:
https://iiif.library.ubc.ca/presentation/dsp.831.1-0087708/manifest

Comment

Related Items