UBC Theses and Dissertations

UBC Theses Logo

UBC Theses and Dissertations

Transformation of set schema into relational structures Lee, Anna 1987

Your browser doesn't seem to have a PDF viewer, please download the PDF to view this item.

Item Metadata

Download

Media
831-UBC_1987_A6_7 L42.pdf [ 4.68MB ]
Metadata
JSON: 831-1.0051915.json
JSON-LD: 831-1.0051915-ld.json
RDF/XML (Pretty): 831-1.0051915-rdf.xml
RDF/JSON: 831-1.0051915-rdf.json
Turtle: 831-1.0051915-turtle.txt
N-Triples: 831-1.0051915-rdf-ntriples.txt
Original Record: 831-1.0051915-source.json
Full Text
831-1.0051915-fulltext.txt
Citation
831-1.0051915.ris

Full Text

T R A N S F O R M A T I O N O F S E T S C H E M A INTO R E L A T I O N A L S T R U C T U R E S By A N N A L E E B. Math., University of Waterloo, 1984 A THESIS S U B M I T T E D IN P A R T I A L F U L F I L L M E N T O F T H E R E Q U I R E M E N T S F O R T H E D E G R E E O F M A S T E R O F SCIENCE in T H E F A C U L T Y O F G R A D U A T E STUDIES ( D E P A R T M E N T O F C O M P U T E R SCIENCE) We accept this thesis as conforming to the required standard T H E U N I V E R S I T Y O F BRITISH C O L U M B I A August 1987 © Anna Lee, 1987 In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission. Department The University of British Columbia 1956 Main Mall Vancouver, Canada V6T 1Y3 DE-6(3/81) Abstract This thesis describes a new approach of relational database design using the SET con-ceptual model. The SET conceptual model is used for information modelling. The database schema generated from the information modelling is called the SET schema. The SET schema consists of the declarations of all the sets of the database schema. A domain graph can be constructed based on the information declared in the SET schema. A domain graph is a directed graph with nodes labelled with declared sets and arcs la-belled with degree information. Each are in the domain graph points to a node S from a node labelled with an immediate domain predecessor of S. The new method of table design for the relational database involves partitioning the domain graph into mutually exclusive <1,1>-connected components based on the degree information. These components (subgraphs) are then transformed into tree struc-tures. These trees are extended to include the domain predecessors of their nodes to make them predecessor total. The projections of these extended trees into the value sets labelling their leaf nodes form a set of relations which can be represented by tables. This table design method is described and presented in this thesis, along with d program that automates the method. Given a schema of the SET model, together with some degree information about defined sets that a user must calculate based on the intention of the defined sets, the program produces a relational database schema that will record data for the SET schema correctly and completely. ii Contents Abstract ii List of Tables v List of Figures vi Acknowledgement viii 1 INTRODUCTION 1 1.1 Overview 1 1.2 Literature 4 1.2.1 Why Database? 4 1.2.2 What is a Data Model? 5 1.2.3 Databases versus Knowledge Bases 6 1.2.4 Similarities of Database Systems and Knowledge-based Systems . 10 1.3 Motivations 12 1.4 Objectives 14 1.5 Organization of Thesis 14 2 SET DATA M O D E L 15 2.1 Introduction 15 2.2 Basic Concepts 16 2.2.1 Set 17 2.2.2 Intension & Extension 17 2.2.3 Domain 18 2.2.4 Ontology of Sets 21 2.2.5 Membership of Set 25 2.2.6 Association, Degree and Attribute 26 2.2.7 Principles of SET modelling 31 2.3 Domain Graph & Domain Tree 33 iii 3 T A B L E DESIGN P R O G R A M 37 3.1 Domain Graph Method of Table Design 38 3.2 Description of Algorithm 42 3.3 Outline of Table Design Program 43 3.3.1 Key 46 3.4 Correctness of Algorithm 47 3.5 Complexity of Table Design Program 51 4 DESCRIPTION OF A N E X A M P L E APPLICATION 53 4.1 Problem Definition 54 4.2 S E T Schema 56 4.3 Domain Graph of University Database Schema 57 . 4.4 Simplified Domain Graph of University Database Schema 58 4.4.1 1-1 connected Components 59 4.4.2 Domain Trees generated from the Domain Graph 60 4.4.3 Extended Domain Trees 61 4.5 Results - Program output 67 4.6 Analysis of Results . 69 5 COMPARISON WITH O T H E R WORKS 71 5.1 The Relational Model 71 5.1.1 Relational View 71 5.1.2 Functional Dependencies/Multivalued Dependencies and Normal-ization . . . 73 5.1.3 Relational Structure . 83 5.2 Entity-Relationship Data Model (ERM) 84 5.2.1 Extended Entity-Relationship data model 98 5.3 Future Extensions 1 100 5.4 Conclusion 101 A Normalization Procedures 106 iv List of Tables 5.1 S E T Schema of SHIPMENT database . . 77 5.2 Table SUPPLIER with primary key supplier number S N U M 79 5.3 Table P A R T with primary key part number P N U M 80 5.4 Table C I T Y with primary key city name C N A M E 80 5.5 Table SUPPLIER-PART (SP) 80 5.6 Table C T with primary key (cname,tname) 82 5.7 Table C X with primary key (cname,textname) 83 5.8 Step 3: Define the value sets and attributes for the entity/relationship sets 94 5.9 Step 4: Organization of data into entity/relationship relations 95 5.10 The S E T schema for the manufacturing firm database 96 v List of Figures 2.1 A simple example of domain graph 20 2.2 A simple domain graph 21 2.3 Ontology of sets 22 2.4 a . S T U D M A J O R augmented domain subgraph. b.Simplified domain graph 30 3.1 Breaking cycle(s) in a subgraph T(SN) of domain graph 41 4.1 Domain graph of University database schema 57 4.2 Simplified Domain graph of University database schema 58 4.3 1-1 connected components 59 4.4 Domain trees 60 4.5 Extended trees 61 4.6 Extended trees - continued 62 4.7 Extended trees - continued . 63 4.8 Extended trees - continued 64 4.9 Extended trees - continued 65 4.10 Extended trees - continued 66 5.1 a. 1:1 and Functional relationship between 2 relations by key propaga-tion, b. M:N relationship between 2 relations via a separate relation scheme 72 5.2 Functional dependency diagram for relation FIRST 75 5.3 Functional dependency diagram for relations S E C O N D and SP 76 5.4 Domain graph of Shipment database 78 5.5 Decomposition based on multivalued dependencies 80 5.6 Domain graph 81 5.7 Basic components of entity-relationship model represented in E R D . . . 86 5.8 Example of a simple entity-relationship diagram (ERD) and the corre-sponding domain graph 87 5.9 Recursive links 87 vi 5.10 Two relationship sets between the same two entity sets 88 5.11 Ternary relationship 88 5.12 Roles of entity sets 89 5.13 a.Attribute of an entity set b.Attribute of a relationship set 90 5.14 Multivalued attribute 90 5.15 Existence dependency 91 5.16 Identity dependency. Members of DIAGNOSIS can be uniquely identi-fied by their relationship with a P A T I E N T entity 92 5.17 Step 1 and 2: A n E R D for analysis of information in a manufacturing firm 94 5.18 Domain graph for firm database S E T schema 97 5.19 Generalization and subset hierarchies 99 5.20 Extended entity-relationship constructs 100 vii Acknowledgement / would like to thank my supervisor, Prof. Paul Gilmore, for his inspirations, guidance and advice on this thesis and Dr. Gerald Neufeld for reading the final draft. I would also like to thank all the lecturers and my fellow graduate students in the Department of Computer Science for making the time at UBC a very interesting and pleasant one. The University Graduate Fellowships awarded by the University of British Columbia is also very much appreciated. viii Chapter 1 I N T R O D U C T I O N 1.1 Overview This thesis presents a new approach of relational database design using the SET con-ceptual data model and a new method of table schema design based on the SET model. The earlier data models, the hierarchical data model [33,11], and the network data model [2,11] are low level models since they are too implementation oriented (i.e., they require database designers to consider physical characteristics of data in the conceptual design stage, making the design process unnecessarily complicated). Therefore, their capabilities to achieve data independence have been challenged. The relational data model [9,8,11], on the other hand, can achieve a high degree of data independence. It frees the database designers from most implementation issues, hence it is considered as a high level data model. However, it has several weaknesses. It requires the database designers to consider the presentation of data (tables) at the early stages of database design, hence it may lose some of the capabilities to represent 1 CHAPTER 1. INTRODUCTION 2 important semantic information in data modelling. These insufficient data modelling capabilities cause unnecessary complex normalization method of table schema design and the inability to deal with referential integrity in a uniform manner. The entity-relationship modelling approach is considered more natural and more conceptual oriented than the models described above. It incorporates most of the strength of the earlier models. For example, like the network model it separates enti-ties and relationships to provide a natural view of data. And like the relational model, it achieves a high degree of data independence. At the same time, it also avoids the premature implementation issues (of the hierarchical and the network model) and the restrictive presentation concerns (of the relational model) in the conceptual design stage. The weakness of the entity-relationship model, however, is its lack of a sound foundation upon which a management system might be based [14]. One of the motiva-tions for the development of the conceptual SET model is to provide such foundation. Both the relational model and the entity-relationship model may involve complex normalization process in the table schema design level of database design. In this thesis, a new method of table schema design based on the SET model, called the domain graph method of table design, is described and presented, along with a program that automates the method. The domain graph method provides a simple and systematic way to design table schema. The outline of the database design approach which will be presented in this thesis CHAPTER 1. INTRODUCTION 3 is as follows: 1. The choice of data model: S E T model. The S E T data model is used for informa-tion modelling. Information is modelled purely in terms of sets, entity sets and association sets. The database schema generated from this information modelling is called the S E T schema. 2. The design of logical database schemas, also known as logical database design phase. The logical database design phase includes: • Requirement specification (specified by natural language expressions). • View modelling, that is, gathering " local views" of the logical database and describing how they will use the database. • View integrating, that is, intergrating local views to form global description of the entire database [28,29,39]. • View restructuring, this involves mapping of the global description into a structure which is compatible with the data model chosen in step 1. This step will produce a S E T schema for the database. The SET schema consists of the declarations of all the sets of the database schema. • Transformation of the S E T schema constructed above into relations. A do-main graph is constructed based on the information provided in the SET CHAPTER 1. INTRODUCTION 4 schema. This domain graph is then partitioned into <1, l>-connected com-ponents. These components are transformed into tree structures. These tree structures are then projected into relations. The above transformation pro-cess produces the optimum number of tables to record the data for the SET schema, without losing any information. 3. The loading of the physical database, also known as physical database design phase. This design phase analyses the structure provided by the logical design phase (schema analysis) and maps it into storage structures supported by the available relational DBMS. This step varies among different DBMS and is not discussed in this thesis. Before we present the main theme of this thesis, we will present the literature, motivations and objectives of our work in database design. 1.2 Literature 1 . 2 . 1 W h y D a t a b a s e ? The enormity of problems existing in the traditional file or data processing method is the impetus to the design and implementation of database systems [11]. Among others, the problems that arise in the conventional file or data processing methods are: high redundancy of data and complex set of interrelating files which may become the bottleneck in a complex enterprise. CHAPTER 1. INTRODUCTION 5 The solution is to create a base of data, a database, that can be used or shared by various programs through a front-end program (the DBMS). A database is a collection of files that includes a self description(data dictionary). It contains within itself the format, location and method of storing data in its own files. In addition to solving the above problems, advance databases provide data inde-pendence, data integrity and consistency, query flexibility and control access [11]. 1 . 2 . 2 W h a t i s a D a t a M o d e l ? The major tool for database planning is the construction of models which represent the database structure in a manner which allows manipulation of the conceptual building blocks for the database [35]. Data models are conceptual tools for the organization and representation of information. In addition, data models • allow us to see information content of the data as opposed to individual values of data, • emphasize the relevant details of entities and suppress the irrelevant ones, • capture some of the interpretation of the data in additional to the data itself, • provide notation that allows concise data descriptions and statements about inter-relationships, and CHAPTER 1. INTRODUCTION 6 • provide a means for the representation and manipulation of information in a way that is amenable to computerization. Any design methodology must be based on some underlying data model which forms the basis for describing the structure of the data and the operations on that structure. In a database environment, a complete data model usually contains the following elements: • a collection of static and dynamic modelling primitives, • a collection of constraints which are used to integrate the primitives together to provide the structures for interpreting the data, and • a language for communication between the users and the computers. 1 . 2 . 3 D a t a b a s e s v e r s u s K n o w l e d g e B a s e s The emergence of knowledge bases in the area of Artificial Intelligence raises questions on the differences and similarities between databases and knowledge bases. In the following we briefly present the differences and similarities of databases and knowledge bases. Definition & characteristics - A database is a collection of information to which uniform access is provided. Database systems are usually defined and manip-ulated by some Data Definition Language (DDL) and Data Manipulation Lan-CHAPTER 1. INTRODUCTION 7 guage (DML) respectively. Advanced database systems attempt to combine these two languages. For example, for S E T model database system, a language called D E F I N E is under implementation to serve as both the D D L and D M L of the S E T database. Some database systems employ a database dictionary to provide directories to the large volumes of information stored in the database. Knowledge-based systems are usually associated with the application of A l re-search in Knowledge Representation (KR) 1 and reasoning to domains of appli-cations. Typically, knowledge bases are viewed as collections of domain-specific knowledge obtained from experts in the domains and are encoded into computer programs. These programs are then used to provide problem solving assistance to professionals working in the domains. In fact, the ability of an A l system resides in the detailed explicit knowledge base that represents the system's understand-ing of its domain. In this sense, the system is said to be anthropomorphized, where the program is an individual whose knowledge base forms its foundation for understanding and behaving in its environment [18]. The major ingredients of a knowledge-based systems are: 1. the representation language (e.g., K R L [3].), 2. the inference regime, and 1 Formal definition of Knowledge Representation will be presented at the end of this section. CHAPTER 1. INTRODUCTION 8 3. the particular domain knowledge. Nature of information - [37] has argued that the kinds of information stored in databases and knowledge bases are different. Databases are usually designed to hold large volumes of highly structured data, while knowledge bases tend to capture abstract (unstructured) information (e.g., All students are poor), in addition to the concrete (structured) information typical of a database (e.g., John is a student). D o m a i n of information and application - Databases are used for real world ap-plications whereas knowledge bases are often used for a more restricted and contrived domains [18]. Databases are appropriate for management and storing of information, while knowledge bases are generally associated with diagnosis systems. Volume of information and inferencing/retrieval mechanism - As mentioned, databases hold large volumes of structured information and they use some re-trieval mechanism based on the structure of the information to provide uniform access. Knowledge bases capture smaller volumes of less structured information and use some inference mechanisms to recover implicit information when required. The new data models for database systems attempt to provide such capabilities (i.e., CHAPTER 1. INTRODUCTION 9 only the basic information are stored, derivable information will be computed only when required [6]). Researchers who work on knowledge-based systems claim that database systems lack reasoning ability (most manipulations of data deal with yes-no sentences), and the means to deal with incomplete information, uncertainty and general information. In a sense, a database management system is a limited form of a knowledge-based system. However, this limited form has its own merits in efficiency, tractability and ease of use in answering questions that are related to the structure in the database itself [25]. There is no complicated reasoning or inferencing. Some database researchers proposed to incorporate rules into databases. The rules can be used to enforce integrity and consistency for storing data and for ease in querying the database. Management of information - Management of information involves maintaining consistency and enforcing integrity and accuracy. In a database system, the database administrator is responsible to enforce some of the integrity and accu-racy of the database. In a knowledge-based system, a subsystem, usually called truth maintenance system, assists in maintaining its knowledge and beliefs [12,23] by providing CHAPTER 1. INTRODUCTION 10 a set of consistent beliefs to the system (problem solver) at a given time and environment. The truth maintenance system determines the current set of beliefs, given the justifications encountered so far at that particular point in time, and not with respect to the logic of the problem solver. The integrity conditions must be supplied by the user before the truth maintenance subsystem can correctly determine whether a statement (e.g., an axiom) can be included in the current set of beliefs [12]. Note that, unlike database systems, the whole knowledge base need not be consistent. V i e w integration - Most traditional database systems distinguish between concep-tual schemas, external/user views as well as data administrators' and program-mers' views. On the other hand, knowledge bases incorporate all of them in a uniform manner (i.e., there is no explicit distinction between conceptual schema information and particular facts). Hence, they are more flexible than databases to accommodate changes in conceptual schemas. One of the objectives of the S E T model is to support view integration in this manner. 1 . 2 . 4 S i m i l a r i t i e s o f D a t a b a s e S y s t e m s a n d K n o w l e d g e - b a s e d S y s t e m s In recent years, both database and knowledge base researchers are sharing some com-mon goals in attempts to solve pr enhance the following: CHAPTER 1. INTRODUCTION 11 1. Revision of the base of information (database or knowledge base) in a nonmono-tonic way as new information is added to the base. 2. Reasoning ability to deal with incomplete "knowledge". 3. User interfaces, for examples: intelligent interactions, richness and flexibility of user-system interaction and descriptive response. 4. Data independence. Analogous terms in database and knowledge base: D a t a model versus Knowledge Representation : The minimal definition of a data model consists of: 1. a formal specification of all operations on objects belonging to the data model, and 2. a formal specification of the semantics of the operations, including a set of integrity rules. Knowledge Representation consists of : 1. a collection of data structure types, 2. a collection of operators and inferencing rules, and 3. a collection of integrity rules. CHAPTER 1. INTRODUCTION 12 D a t a Independence : Data independence is one of the goals of database and knowl-edge base design. Ent i ty Sc association versus facts & rules : Facts and rules in rule-based systems constitute entities and associations2 in database systems. 1.3 Motivations Although the entity-relationship model is considered the premier model for conceptual design, it has been shown to be inadequate in its modelling concepts. Extensions of the entity-relationship model have been proposed in [31,1,34,24,30,13,27,32]. The ex-tensions are done on topics such as the concept of generalization, existence constraints, integrity constraints and extensions on the entity-relationship diagram representation. However, a more disciplined entity-relationship modelling process is required. The SET model presented in [17] is a formalized semantic model based on the entity-relationship approach. It is proposed to serve as a foundation for the entity-relationship model. Although a formal model like the relational data model can be used in the table de-sign stage after the conceptual design stage, it requires complex analysis of functional and multivalued dependencies. By analyzing functional and multivalued dependencies associations in a relation schema, the normalization process is carried out to produce normalized relations to ensure no update anomalies. As a database schema grows 2In this thesis, we will use the term association instead of relation, to avoid confusion with the term relation in relational database systems. CHAPTER 1. INTRODUCTION 13 bigger and more complicated, the functional and multivalued dependencies also in-creases in number and become hard to analyse, thereby causing complex normalization processes. Normalization solves intra-table integrity problems but causes inter-table integrity problems, hence it may break relations into semantically meaningless units. Therefore, the relational model has limited capability to capture the semantics of re-lationships and constraints on and among tables. Works on the transformation of the entity-relationship model to the relational schemes can be found in [36,7,19,20,4,10,27,13,38]. In most cases, the candidate rela-tions which resulted from the transformation still need to be normalized. It is possible to transform the S E T schema into a corresponding relational schema which is capable of correctly recording the extensions of declared sets of the SET schema. Such a transformation and the domain graph method of table design on which the transformation process is based will be presented in this thesis. The domain graph method of table design frees database designers from analysing great numbers of data dependencies and complex normalization process. The transformation of the S E T schema into the relational schemes (also called table design algorithm) will enable a database design approach to benefit from both the SET model capabilities of conceptual design and the simplicity of the relational structure. CHAPTER 1. INTRODUCTION 14 1.4 Objectives The objective of this thesis is to present a conceptual design methodology for relational databases using the SET model. It is then necessary to be able to transform the S E T schema into the relational table schema. A n algorithm to transform the SET schema into the relational table schema is described and presented in this thesis. This algorithm is automated to assist the database designer to produce the optimum number of tables which will correctly record the data for a given S E T schema without losing any information. The purpose of the program is to assist the database designer to design the table schema for relatively large and complex databases. This table schema can then be used in the various existing relational systems. 1.5 Organization of Thesis This thesis consists of five chapters. The overview of the S E T data model will be presented in chapter 2. The algorithm for table design will be presented in chapter 3. A n example of application will be presented in chapter 4 . Finally, the comparison with other works and future extensions are presented in chapter 5. Chapter 2 S E T D A T A M O D E L > 2.1 Introduction In this chapter we will describe the S E T conceptual model proposed by Gilmore in [17]. A good conceptual model should be simple and have a sound theoretical base. We will see that the S E T model satisfies both requirements. First, the S E T model is purely based on one concept, that of a set. Both entity sets and associations between them are represented in term of sets. Second, it is based on a provably consistent formalized set theory[l6]. It will be shown in chapter 5 that most of the fundamental features in entity-relationship data model[5] are supported by the S E T model. It is shown in [14] that the S E T model could serve as a foundation for the entity-relationship model. As in the entity-relationship model, the conceptual design process involves: • identifying entities (entity sets), associations (association sets) and information concerning the entities and the associations, and 15 CHAPTER 2. SET DATA MODEL 16 • providing a unified view by integrating views of entity, association and attribute sets. 2.2 Basic Concepts Defining a S E T schema for an organizational information system requires careful ori-entation of design. As in other data models, there is more than one way to define a database schema [21]. We will first introduce some definitions and describe the features of the S E T model. Throughout this chapter, the design of a university database will be used as an example application to illustrate the features of the S E T model. The complete solution will then be presented and summarized in chapter 4. The description of the university database is as follows. At the university, there are various academic departments, each of which has its own unique department name. Each academic department offers some courses related to the department. Each course is identified by the department name and a course number. In each department, there are academic staffs (we do not consider nonacademic staffs in this example). Each academic staff is assigned a unique staff number and belongs to exactly one department. Exactly one academic staff in the department is currently the head of the department. Each academic staff/instructor in a department is competent to teach some courses offered by the department. Each course, when CHAPTER 2. SET DATA MODEL 17 offered on particular semester, is taught by one instructor. Each student attending the university majors in one area of studies from a depart-ment. For each major, there are courses requirements. Students who are eligible to take some required courses must register for the required courses. For a course registered by a student, the student will be granted a mark (and also credit) upon completion of the course. Each student currently attending the university is given a unique student ID. 2 . 2 . 1 S e t A set is a selected collection of entities possessing a common property or characteristic. The entity sets and association sets (including attribute sets) of interest are gathered during the process of information analysis, and are declared in a S E T schema. 2 . 2 . 2 I n t e n s i o n & E x t e n s i o n The membership of a set is characterized by its intension, the property that an entity must possess to be a member of the set. For example, consider the set declarations: I N T E G E R for { x: INTEGER I x i s system defined I a set of integers } and P O S I T I V E _ E V E N J N T E G E R for { a:INTEGER I [ 3x:INTEGER ] x>0 and a=2x I a is an even, positive integer} The set I N T E G E R is a primitive defined set and the set P O S I T I V E _ E V E N J N T E G E R CHAPTER 2. SET DATA MODEL 18 is a nonprimitive defined set1. The notation a:INTEGER means a is a member of the set I N T E G E R . The first part of the set declaration is called the domain assertion. It declares the range of values that a can be assigned. The second part of the declaration contains either degree information (for nonprimitive base set) or assertion (for defined set). The last part of the declaration defines the permissible occurrences of the set by specifying the membership condition (the intention) and hence is important in database design. It can be expressed in a natural language expression. The second property of a set is representational in nature and is called the extension of the set. The intention of a set, together with domain assertion of the set, determine the extension of the set. For example, the set representation {2,4,6,8,...} gives one possible extension of the set P O S I T I V E _ E V E N J N T E G E R whose intention is specified as above. It specifies the actual occurrences of the set by explicitly listing its members. Two sets are said to be extensionally identical if each member in one is in the other and vice versa. Note that a set has no duplicate members. A member appears in a set once and only once. 2 . 2 . 3 D o m a i n A collection of declarations of sets is called a SET schema. Each declared set must have a domain. Domains are used as sets of values from which certain semantically meaningful entities and their properties can take values over time. Hence the domain 1The different formats of set declarations will be elaborated in section 2.2.4. CHAPTER 2. SET DATA MODEL 19 declaration of a set is an integrity constraint expressing restrictions in the membership of the set. This type of integrity constraint is called inclusion constraint. The domain of a set S is the extensions of the cartesian product of one or more pre-viously declared set(s). Each set to be incorporated in the database must be declared and none is assumed to pre-exist. To avoid infinite regression of domains, the domain of a set S is either itself or cartesian product of previously declared set(s). The set with itself as its domain is called the primitive set. For a given declared set S, an immediate domain predecessor of S is any of the sets whose cartesian product in some order is the domain of S 2. For example, the immediate domain predecessors of a set called E M P D E P T , which associates employ-ees with their department, are the sets E M P L O Y E E and D E P A R T M E N T . The declaration of the above three sets are as follows: EMPLOYEE for <EMPL0YEE|I a set of employees employed by the university} DEPARTMENT for {DEPARTMENT I I a set of departments i n the university} EMPDEPT for {EMPLOYEE,DEPARTMENT I<1.1>.<1.*>| associate employees with their department. Each employee belongs to exactly one department and each department may have one or more employees} 2 A data base schema is closed under immediate domain predecessor. That is, if a set S is in a database schema, then all its immediate predecessors are also in the database schema. CHAPTER 2. SET DATA MODEL 20 The members of E M P D E P T are the binary tuples <emp,dept> where emp is a member of E M P L O Y E E and dept is a member of D E P A R T M E N T . Figure 2.1 shows a simple domain graph representation of the set E M P D E P T with its immediate domain pre-decessor information. A set can only label one node in a domain graph. Each arc in the domain graph points to a node labelled with a set S from a node labelled with an immediate domain predecessor of S3. The arity domain of a set is defined as follows: • The arity domain of a primitive set is the set itself. • The arity domain of a set S with domain Si x ... x Sk, where k >= 2 is the set S. • The arity domain of a set Sa with domain Si, i.e. Ss is a nonprimitive set with a single immediate domain predecessor Si, is the arity domain of Si. 3The concept of domain graph will be further elaborated in section 2.3. Figure 2.1: A simple example of domain graph CHAPTER 2. SET DATA MODEL 21 A n arity predecessor of a set is any immediate domain predecessor of its arity domain. The arity of a set is 1 if its arity domain is primitive, otherwise it is the number of its arity predecessors. If a set occurs more than once in the arity predecessors of a set S, the multiplicity of that arity predecessor is added instead to the computation of the arity of S. A B e D H Figure 2.2: A simple domain graph In figure 2.2, the immediate domain predecessors of G are E and F . The domain of G is (ExF), i.e., (AxB)x(CxD). The arity of G is 2. The arity domain of G is G. The arity predecessors of G are therefore E and F . The immediate domain predecessor of H is G , and the arity domain of H is also G . The arity predecessors of H are those of G , namely E and F . The arity of H is 2. 2 . 2 . 4 O n t o l o g y o f S e t s Declared sets can be categorized as depicted in the figure 2.3. They are categorized as follows: Pr imit ive set - Primitive sets which are base are the basic sets of an information CHAPTER 2. SET DATA MODEL 22 BASE ^DEFINED PRIMITIVE' 1 i r NONVALUE VALUE VALUE NONPRIMmVE NONVALUE & NONVALUE r Figure 2.3: Ontology of sets system. They are the entity sets. No two primitive base sets have any member in common. Every primitive base set must be provided with an identifier such that each member of the set is associated with a string in a value set that is unique to that member. For example, a primitive base set called E M P L O Y E E must have an identifier declared for it, say E M P N U M , to provide a one-to-one association between its members and the value set, V E M P N U M , which contains the range of possible values for E M P N U M . That is, the members of the set E M P L O Y E E are identified by members of the value set V E M P N U M through the identifier E M P N U M . Primitive value sets are system defined sets. Examples of such sets are I N T E G E R , STRING, etc. A primitive set is declared to have itself as its domain, since no set is assumed to preexist. CHAPTER 2. SET DATA MODEL 23 Base set - Base sets have only human understandable intension and are usually time dependent. That is, the extension of a base set can vary with time although its intension remains unchanged. For example, the extension of a primitive base set EMPLOYEE, where members of EMPLOYEE are employees currently employed in the company (its intension), may change as new employees are being hired or some old employees are no longer with the company. A base set declaration is as follows: {Domain declaration I degrees declaration I intention/comment} The first two parts are machine interpretable. As mentioned, the second part is left blank for primitive base set declarations. Base sets can be categorized into primitive base sets and nonprimitive base sets. Primitive base sets are the entity sets and nonprimitive base sets are sets contain-ing information (attributes or associations between entity sets) about the entity sets. Two examples of base sets are : EMPLOYEE (primitive) and EMPDEPT (nonprimitive). Defined sets - Primitive defined sets are value sets that are provided by a system. For example: the set INTEGER which consists of a set of all possible integers provided by the system. Note that each member of the primitive denned set, CHAPTER 2. SET DATA MODEL 24 I N T E G E R , identifies itself. Therefore it is not necessary to provide identifier for primitive defined sets. Nonprimitive defined sets are sets that can be defined in term of existing sets. A n example of a nonprimitive defined value set is : a set V N A M E consisting of a range of possible department names. V N A M E is defined in term of an existing set, say STRING. Explicit constraint can be added to restrict the string values in V N A M E ; for example, all the string values in V N A M E must be of length less than 12 characters. A declaration of a defined set has the following format: {Domain*Variable declaration I assertion I intention/comment} The first two parts are machine interpretable. Example of a nonprimitive nonvalue defined set is: HEADS for { e:EMPLOYEE I [For some d:DEPARTMENT] <e,d>:HEAD_0FI a set of heads of departments} where the base set HEAD_0F i s declared as HEAD_0F for {EMPDEPTI<0,1>,<1,1>I associate employees as heads of departments, each department i s managed by exactly one head} The domain declaration of the set H E A D . O F asserts that a head of a department CHAPTER 2. SET DATA MODEL 25 must be an employee of his/her department. Value sets - Value sets are defined sets of machine readable and writable strings. Any defined set with its domain a value set or a cartesian product of value sets is also a value set. Value sets generally have time independent extensions and they require no identifier since their members can be recorded directly. Primitive value sets can be viewed as basic data types of programming languages, for instance, I N T E G E R , STRING, B O O L E A N and R E A L 4 . Note that the ex-tension of these sets hardly change with time. A n example of a nonprimitive value set is V N A M E , a value set for a set of names. V N A M E may have domain STRING or the combination of STRING and I N T E G E R . 2 . 2 . 5 M e m b e r s h i p o f S e t The membership of a set is determined as follows: 1. If S is a base set (primitive or nonprimitive), membership is determined by human understanding of the intension of S. 2. If S is a nonprimitive defined set, membership is determined by machine inter-pretation of the intension of S. . 4In relational model, primitive value set is called domain of attribute. CHAPTER 2. SET DATA MODEL 26 3. If S is a primitive value set, membership is determined prior to machine imple-mentation of the intension of S. 2.2.6 Association, Degree and Attribute Let AB be an association between the sets A and B. Let a be any member of A. The degree of a for AB (or the degree of AB on A) is the number of b's in B that are AB associated with a, denoted by <a,b>:AB or a:AB:b. The degree of a member of an entity set for its association can vary with time. The intension of a set is used to determined its smallest possible degree, called the lower degree, and its largest possible degree, called the upper degree. In this sense, the degrees capture some of the semantics of data. The lower degree is restricted to 0 or 1 and the upper degree to 1 and *. The * sign represents any positive integer value. For example: If AB is an association between the entity sets A and B and AB has a lower degree of 1 on A, then at all times every member of A is AB associated with at least one member of B. If it has an upper degree of * on A, then at some time there is a member of A that is associated with any number of members of B. • An association AB between A and B is said to be total on A if its lower degree on A is 1, • It is single valued on A if its upper degree on A is 1, and CHAPTER 2. SET DATA MODEL 27 • It is multivalued on A if its upper degree on A is *. • It is functional on A if it is total and single valued. Degrees are declared only on base sets and are integrity constraints expressing restrictions on the membership of a set. This type of integrity constraint is also called the degree constraint. During information analysis, only base associations have degrees declared for them, defined associations do not. It is assumed however that the degrees for defined sets can be determined from their declaration since their intensions indirectly imposes the restriction on their degrees. In this thesis, we will assume that the degrees for defined sets are computable by the user. For example, suppose we have declared: VMAJOR for {major:STRINGllength(major) <=12|a set of f i e l d s of study offered by the university} (assume length i s a predeclared primitive defined function) DEPTMAJOR for {DEPARTMENT.VMAJOR I <1 , *>,<6.*> I each major i s associated with a department and is i d e n t i f i e d through the i d e n t i f i e r of the department} STUDDEPT for {STUDENT.DEPARTMENT I <l.l>.<i,*> I each student belongs to exactly one department, and CHAPTER 2. SET DATA MODEL 28 each department has one or more students} then we can define a defined set DSTUDMAJOR as follows: DSTUDMAJOR for {s:STUDENT.<d.major>:DEPTMAJOR I <s,d>:STUDDEPT I associates student with f i e l d s of studies i n his/her department} The second part of the declaration states that the student may only major the majors offered in his/her department. The declaration of DSTUDMAJOR states that the set DSTUDMAJOR consists of all possible pairs of <s,<d,major>> for which <s,d> :STUDDEPT. Since there exists at least one major in a department, therefore for each student in STUDDEPT.STUDENT (the projection of the set STUDDEPT on STUDENT), there may be at least one dept-major such that <s ,<d, ma jor»: DSTUDMAJOR. Hence the degree of DSTUDMAJOR on STUDENT is <1,*>. Similarly, for each department there is at least one student, therefore the degree of DSTUDMAJOR on DEPTMAJOR is also <1 .*>. The set DSTUDMAJOR defined as above is the domain for the base set STUD MA-JOR which consists of the set of pairs <s,<d,major*> and which determines the actual student's major for each student. STUDMAJOR is declared as follows: STUDMAJOR for {DSTUDMAJOR I <1,1>.<0,*> I each student majors i n one area of studies i n his/her department} Since each student must major in at least one major and one major only, therefore CHAPTER 2. SET DATA MODEL 29 the degree of S T U D M A J O R on D S T U D M A J O R . S T U D E N T is <1,1>. Similarly, a department major may be majored by 0 or more students in the department, therefore, the degree of S T U D M A J O R on D S T U D M A J O R . D E P T M A J O R is <0,*>. The degrees of S T U D M A J O R are declared on D S T U D M A J O R . S T U D E N T and D S T U D M A J O R . D E P T M A J O R because the set S T U D M A J O R is not its own arity domain. However, since the lower degrees of D S T U D M A J O R on S T U D E N T and D E P T M A J O R are both 1, therefore the sets D S T U D M A J O R . S T U D E N T and S T U -D E N T are extensionally identical, and the set D S T U D M A J O R . D E P T M A J O R and D E P T M A J O R are also extensionally identical. Hence, the degree of S T U D M A J O R is also <1,1> on S T U D E N T and similarly <0,*> on D E P T M A J O R . In this case, the set D S T U D M A J O R . S T U D E N T and D S T U D M A J O R . D E P T M A J O R are not necessarily declared. Figure 2.4 shows the domain graph (subgraph) for the above example, with each arc labelled with the degree information. The nodes labelled by projection sets ( D S T U D M A -J O R . S T U D E N T , D S T U D M A J O R . D E P T M A J O R ) are drawn in broken lines in fig-ure 2.4.a since the projection sets are not declared. This type of domain graph is called the augmented domain graph. The membership of the projection sets and the denned set D S T U D M A J O R need not be recorded since they were needed only as domains for base sets, so the subgraph containing nodes labelled with such sets is dropped in the simplified domain graph, figure 2.4.b. The simplified graph in fig-CHAPTER 2. SET DATA MODEL 30 VMAJOR <1,1> STUDENT <1,1> STUDDEPT <1,*> DEPARTMENT -STUDMAJOR "I <0,*> <1,*> DEPTMAJOR <0,*> WIAJOR Figure 2.4: a . S T U D M A J O R augmented domain subgraph, b.Simplified domain graph CHAPTER 2. SET DATA MODEL 31 ure 2.4.D shows the direct links from each arity predecessor of a node to the node. It is this simplified domain graph that will be used to determine the tables in the domain graph method of table design. Degrees can also be declared for a set with more than two immediate domain predecessors. Consider a set S with immediate domain predecessors, PxQxR, then the lower degree of S on P is the least number of pair <q, r> for any p, for which <p, q, r> may be a member of S. The upper degree of S on P is 1 if there is at most one <q,r> for each p in P, otherwise it is * if there is more than one <q,r> for some p in P. A n attribute on a set A is an association A V between A and a value set V . The attribute is said to be total, single valued, multivalued or functional if the association is respectively total, single valued, multivalued or functional on A . The value of A V for a member a of A is the unique member v of V for which [a,v] is a member of A V . A n attribute on A is said to be an identifier of A if it is functional and A V is single valued on V . 2 . 2 . 7 P r i n c i p l e s o f S E T m o d e l l i n g The following guidelines are followed in the set modelling: • The choice of primitive base sets is crucial in the design of a data base system. A primitive base set should have as members only those entities of interest to the enterprise. Primitive base sets should be chosen to be as large as possi-CHAPTER 2. SET DATA MODEL 32 ble as long as it is consistent with the requirement for a coherent identifier. For example, in a university database, E M P L O Y E E and D E P A R T M E N T may be de-clared as primitive base sets E M P L O Y E E and D E P A R T M E N T respectively. E M P L O Y E E is the domain for all employees employed by the university and D E P A R T M E N T is the domain for all departments of the university. How-ever, a set with members employees and departments must not be declared as a single primitive base set since a coherent identifier for such a set cannot be provided. On the other hand, it is not necessary to declare female employees and male employees in a university database as two separate primitive base sets since a larger primitive base set E M P L O Y E E can naturally represent both and a simple identifier for such a set can be provided. • Do not declare a set as base if it can be declared as a defined set in terms of sets that could be previously declared. For example, the set H E A D S must not be declared as a base set, since the set H E A D S can be defined in term of the sets E M P D E P T and E M P L O Y E E . By declaring the set H E A D S as a base set, unnecessary human intervention is needed to maintain the membership of H E A D S such that it is consistent with the membership of the set E M P D E P T and E M P L O Y E E , while we can easily rely on the system's capabilities to maintain the membership of H E A D S as a defined set (recall that membership of base sets must be maintained by human, while membership of defined sets is maintained CHAPTER 2. SET DATA MODEL 33 by the system). • Do not declare as an arity 1 set, a set of an arity greater than or equal to 2. For example, the set D E P T M A J O R should be declared as a base set of arity 2 as follows: {DEPARTMENT,VMAJORI<1,*>,<0.*>I associates departments with the ir majors} rather than a base set of arity 1. • Do not declare as an arity k set, where k is greater than or equal to 3, if the set can be naturally declared as an arity 2 set. For example, a set A B C of arity 3, should be declared as a base set (AB)C of arity 2 whenever its upper degree on A B is 1. This is to avoid composite identifiers as much as possible[32]. One other helpful guideline to classify entity sets and attribute or association sets, proposed in [32] is: if there is descriptive information about an object, the object should be declared as an entity (a member of an entity set), if only an identifier is needed for an object, it can be declared as a member of a value set (of an attribute). 2.3 Domain Graph & Domain Tree Simple examples of the graphical representation of a domain graph have been intro-duced in figure 2.1 and figure 2.4. Such graphical representations provide the database CHAPTER 2. SET DATA MODEL 34 designer/user with an overview of declarations that have been made. In this section we will elaborate on the concept of domain graph. In chapter 3, we will present a table design method which is based on the domain graph concept. A domain graph is a directed graph with nodes labelled with the declared sets and directed arcs labelled with the degree information. A domain graph for a database schema must meet the following conditions: 1. Each node in the domain graph is labelled with a declared set from the database schema. No set in a data base schema can label more than one node in the domain graph. Each arc in the domain graph points to a node labelled with a set S from a node labelled with an immediate domain predecessor of S. 2. The domain graph is predecessor total in the following sense: all the immediate domain predecessors of a node must be included in the domain graph as tails of arcs pointing to the node. There is an arc for every pair of node and its immediate domain predecessor, with the node labelling the head of the arc and the immediate domain predecessor labelling the tail of the arc. A domain graph with its arcs labelled with degree information is called a degree domain graph. The degree information is obtained from the declarations in the S E T schema. As mentioned in section 2.2.6, the degrees for base sets are declared CHAPTER 2. SET DATA MODEL 3 5 during information analysis. The degrees for defined sets are calculated based on their intention and the degree information that is already declared in the S E T schema. Two nodes are 1-connected if there is an undirected path from one to the other using edges of lower degree 1 only. A single node is 1-connected to itself. Two nodes are < 1 , l>-connected if there is an undirected path from one to the other using the edges with degrees < 1 , 1 > only. Since the immediate predecessor domains of a set must be declared prior to the declaration of the set itself, a domain graph representing the set schema is necessarily directed acyclic. However, there may be undirected cycles in a domain graph, since a set may be associated with several other sets and itself, and a set can only label one node in the domain graph. It is necessary to break such cycles in a domain graph. By breaking such cycles in a domain graph, we transform the domain graph into domain trees. A domain tree is a directed graph which must satisfy the following conditions: 1 . a domain tree satisfies all the conditions of a domain graph, except an entity set can label more than one nodes in the tree and a domain tree need not be predecessor total. 2. a domain tree must not have any undirected cycle. A set can appear more than once in a domain tree, represented by nodes with CHAPTER 2. SET DATA MODEL 36 different labels. This is to take care of the set which associates with more than one other sets or itself. A degree domain tree is a domain tree with its arcs labelled with degree information derived from the degree information presented in its degree domain graph. The method of transforming a domain graph into domain trees will be presented in chapter 3. The algorithm for table schemas design will be based on this method. Each tree generated from the transformation corresponds to a relation generated by the table design algorithm. Chapter 3 TABLE DESIGN PROGRAM As mentioned in chapter 1, the objective of this thesis is to present a conceptual design methodology for relational databases using the SET model. It is then necessary to be able to transform the S E T schema resulting from the conceptual modelling to the relational table schema. We have presented the SET conceptual model in chapter 2. In this chapter we will present an algorithm which will transform the S E T schema into a relation schema supplemented with integrity constraints. This algorithm is automated to produce the optimum number of tables to record the data for a given S E T schema correctly and completely without losing any information. A n example of its application is presented in chapter 4. The table design algorithm is based on the domain graph method of table design. The domain graph method of table design transforms domain graph into domain trees. Each of these trees corresponds to a relation. 37 CHAPTER 3. TABLE DESIGN PROGRAM 38 The domain graph method requires correct degrees information for both base sets and defined sets declared in the SET schema. Degrees for base sets are declared during information analysis. We assume that degrees for defined set can be calculated by the user based on the intention of the defined set and the degree information that is already in the S E T schema. 3.1 Domain Graph Method of Table Design The domain graph method is as follows (statements in italics are comments): 1. Label each edge of a domain graph of a S E T schema with lower and upper degrees. Start with G = {all the sets labelling the nodes of the degree domain graph }. 2. Do until there is no more set in G . (a) set S N = E M P T Y and T ( S N ) = E M P T Y . (b) Choose an arbitrary set in G and delete it from G . Include this set in SN and include the node labelled by this set in T(SN). (c) For every set S in G that is 1-connected to any set in SN, include it in SN and delete it from G . Include in T(SN), the node labelled by S and the arc joining S to the corresponding node in T(SN) together with the degree information. CHAPTER 3. TABLE DESIGN PROGRAM 3 9 (d) The subgraph T(SN) can further be refined, by threading the <1, l>-connected components of the subgraph, to later produce more efficient tables. The re-fined <1, l>-connected components are threaded in a similar way as in step 2c by using the upper degree on each arc. (e) Construct a tree out of T(SN). For each cycle in T(SN), choose an arc pointing out from bottom node. This arc is to be deleted from T(SN). A bottom node is a node with all its arc(s) pointing away from it. That is, a bottom node is a node that is not the head of any edge in the cycle. The choice of the arc to be deleted is important in determining which node is to be duplicated in the resulting tree. For efficiency and a better represen-tation, choose ones that are pointing out from bottom node in the cycle to minimize duplications in the next step. Add to T(SN) a new node N labelled with the set which labels the tail of the chosen arc, and add a new arc A from the new node to the head of the chosen arc. Assign to A the degree of the chosen arc. Remove the chosen arc from T(SN). An example is shown in figure 3.1. (f) To be able to construct a corresponding table from the tree T(SN), extend the tree to include the identifier for each set labelling the nodes of the tree. CHAPTER 3. TABLE DESIGN PROGRAM 40 This is done by making each node in the tree predecessor total by adding new leaf nodes labelled with the predecessors and new arcs joining the new nodes to their successors. Assign to the new arc the degree of the corre-sponding successor on the predecessor. Repeat until each bottom node of the extended tree is either labelled with primitive base set or value set. As mentioned, primitive base sets must be provided with an identifier. Hence, for each bottom node labelled with primitive base set above, link it with an arc pointing to a new node labelled by the primitive base set identifier. This new node is in turn linked with an arc pointing from the node representing its value set. If such a'node has not already existed, it must be added. •. Value set is its own identifier, therefore no further extension is necessary for the bottom nodes labelled by value set. Each of the value sets labelling the bottom nodes of the tree constitutes a column in the table. The value set also indicates the range of possible values for the corresponding column. Note that the same primitive base set may label more than one node in the extended tree. Therefore, it may constitute more than one column in the resulting table. Different column names must be given for these columns to avoid ambiguity. In this case, the corresponding association name is used CHAPTER 3. TABLE DESIGN PROGRAM 41 to reflect the role of the attribute represented in the column. (g) Output SN and the extended tree. (h) Project the extended tree onto its value sets to form a table. Figure 3.1: Breaking cycle(s) in a subgraph T(SN) of domain graph Note that each subgraph T(SN) in step 2c is mutually exclusive from the other in the domain graph. That is, the subgraphs generated in step 2c are maximal subgraph and they do not share any node. Hence, they provide unique partitions of G. This is also why we can arbitrarily choose a set in G in step 2.b. It will be shown that each set in the SET schema is included or covered by at least one tree by the above construction1. A table can be constructed from each subgraph generated from step 2.c. This table represents a relation (in relational data model terminology) that is lossless. A table consists of any ordered set of tuples that are member of the cartesian product (join) of the value sets labelling the bottom nodes of the trees in step 2.f. The number of tables 1 Proofs are presented in section 3.4. | CHAPTER 3. TABLE DESIGN PROGRAM 42 generated at this step is the minimal number of tables necessary to record all the data completely and correctly without losing any information2. However, these tables need to be refined for storage efficiency and user convenience purposes, since data may have to be duplicated in rows of a table as a trade off from minimizing the number of tables. This refinement is done in step 2.d by threading the <1, l>-connected components in the domain tree. From each of the <1, l>-connected components, an efficient tree (and hence efficient table) can be constructed. 3.2 Description of Algorithm The automated table design algorithm presented in this chapter is based on the domain graph method of table design described above. The objective of the table design algorithm is to produce an optimum number of tables which represent the base of data in a given S E T schema correctly and efficiently. Basically, for every declared non value set (primitive or association base sets), say X, join all the associations that are <1, l>-connected to the set X to produce a <1, 1>-connected component. It will be shown in section 3.4 that this join is a lossless join on the set X . The <1, l>-connected components form pretables (table schema) for the database. Projection of each of these pretables onto their value sets forms their corresponding table. 2 The proof of this claim is presented in section 3.4. CHAPTER 3. TABLE DESIGN PROGRAM 43 As in the domain graph method of table design, this algorithm assumes that degree information for both base sets and defined sets are provided in the SET schema. Also, the program which automates the algorithm only consider associations with arity 2. Slight modification on the data structure used in the program can be made to incorpo-rate associations with arity other than 2. However, the semantics of such higher order association sets can become quite complex to comprehend. Also, it is rare that we will declared an association with arity more than 2 if it can be naturally declared as arity 2. 3.3 Outline of Table Design Program 1. Make sure a l l entity sets (primitive base sets) are covered by some table. For each entity set E i n the SET schema do: If the entity set E has not been included i n any table, -mark E. -thread a l l <1,l>-connected attributes and associations associated with E to form a pretable. The resulting table w i l l record information on E. (We c a l l such a table the entity table.) -A key for such a table i s the i d e n t i f i e r of the entity set E. CHAPTER 3. TABLE DESIGN PROGRAM 2 . Take care of the uncovered associat ion sets. For each associat ion A i n the SET schema do: If the associat ion A has not been incorporated i n any table , -get the a r i t y predecessor(s) of A and include them i n the table , -thread a l l <1,l>-connected attr ibutes and associations associated with A. -determine the primary key for the table . The key for such a table may be a combination of the i d e n t i f i e r s of the ent i ty sets involved i n the table such that a 1-1 mapping can be drawn from th i s combination to the group of value sets i n the table . (we c a l l such a table the associat ion table . ) Recursive Subroutine THREAD(S). •Threading for <1,l>-connected component for a set S* For each associat ion ASSOC i n the SET schema, CHAPTER 3. TABLE DESIGN PROGRAM 45 i f ASSOC i s <l,l>-connected with S. -mark ASSOC. - include ASSOC i n the table . -check the opposite set i n the immediate domain predecessors of If i t i s a primit ive set, -set the i d e n t i f i e r of ASSOC to be the i d e n t i f i e r of the ent i ty set. If i t i s an associat ion set, -get the a r i t y predecessor(s) of the associat ion, - include them i n the table . - I f any of the a r i t y predecessor(s), say P, i s <1,l>-connected to ASSOC and has not yet been included i n any table , -mark P. - recurs ive ly thread the <1,l>-connected attr ibutes and associations associated to P, and include them i n the table . If i t i s a value set, -set the value set (domain) of ASSOC to be the value set. -thread for <1,l>-connected component of ASSOC. CHAPTER 3. TABLE DESIGN PROGRAM 46 3 . 3 . 1 K e y A key for a table is an attribute or a combination of attributes which is used to identify a tuple in a table. Therefore, values of a key for a table must be unique in the table. The existence of a key for each table is guaranteed by the fact that each tuple in a table is a member of a set whose domains are the cartesian products of the value sets labelling the leaf nodes of the extended trees generated in step 2.f. There is no duplicate member in a set. Therefore each tuple must be unique, that is, each tuple can be uniquely identified. A key for an entity table is the identifier of the primitive base set (entity set) recorded in the table, since such an identifier must be able to uniquely identify the member of the primitive base set. A key for an association table may be a combina-tion of the identifiers of the primitive base sets which participated in the association recorded in the table. Note also that some tables may be all-key. That is, the whole combination of all attributes in the table is used to identify the tuple in the table. Given adequate information on the identifier of each primitive base sets (identifier for weak entity set may be composite), the program for the table design uses the above method to automatically determine the primary key for each table. Therefore, no human intervention is necessary in determining a key for each table. Examples will be presented in chapter 4. CHAPTER 3. TABLE DESIGN PROGRAM 47 3.4 Correctness of Algorithm To prove that the tables generated from the domain graph method of table design record all the data of interest of an enterprise specified in a SET schema3, consider any set S in the S E T schema. From the construction of the corresponding domain graph, S will label exactly one node of the domain graph. It will be shown that any such S will be recorded in at least one table resulted from step 2.f, and the membership of S can be determined from the table. It is clear from the definition and construction of the trees in step 2.e that these trees are either a single node or a 1-connected component. They form a forest which is a collection of mutually exclusive partitions of the domain graph. Therefore no node in the domain graph is left-out or not covered by at least one tree. By the extension in step 2.f, S is projected onto its corresponding value set(s) to form a relation R. It is proved in [15] that S and R have the same extension. In this way, the members of S are recorded and can be determined from the table representing the relation. On the other hand, it is necessary to show that the members of S are not redun-dantly recorded in the tables generated in step 2.h, such that update anomalies can be avoided. In step 2.c and 2.d, a set is either included in only one tree or more than one tree. This means members of a set S are either recorded in only one table or the members of the subset of S are also recorded in other table(s) (generated from 3The detailed proof can be found in [15]. CHAPTER 3. TABLE DESIGN PROGRAM 48 <1, l>-connected components), the association table(s). In the algorithm, a set S is included in a pretable only when it has not been covered in other table, or it is <1, l>-connected to a set already incorporated in the pretable. The following theorem shows the role of lower degrees in inclusion constraints. T h e o r e m l Given an association A B between any two sets A and B, the projection of A B onto A , denoted by A B . A , and the set A are extensionally identical if the lower degree of A B on A is 1. Collorary: If the lower degree of A B on A is 0, then the sets A and A B . A may, at some times, be not extensionally identical. Proof: Since A B has lower degree 1 on A , for each member, a of A , there is a b in B for which a:AB:b by the definition of lower degree. This means that a must be in A B . A . Since this is true for each a in A , therefore by the definition of the set A B . A , A B . A and A are necessarily extensionally identical. That is, no information on A is lost in the join and projection. If the lower degree of A B on A is 0, this means there can be a member a in A for which there is no b in B. Such an a would not be in A B . A and so A B . A and A might not be extensionally identical. In this case, A B . A is a subset of A. Theorems 2 and 3 are generalization of theorem 1. For theorem 2 and 3: CHAPTER 3. TABLE DESIGN PROGRAM 49 Let A B and A C be associations between A and B, and A and C respectively, and let A B C be a join of A B and A C on A and A B C . A B is the projection A B C on A B . Similarly A B C . A C is the projection of A B C on A C . Theorem2 A B C . A B is extensionally identical with A B , if the set A B . A is a subset of the set A C . A . Proof: If A B . A is a subset of the set A C . A , then from every <a,b> in A B , there is <a,c> in A C . Hence there is some c in C for which <a,b,c> is in A B C . Hence <a,b> is in A B C . A B by the definition of A B C . A B . Since A B C . A B is a subset of A B , therefore A B C . A B are extensionally identical with A B . Theorem3 If the lower degree of A B and A C on A are both 1, then the projection A B C . A B of A B C on A B is extensional identical with A B , and A B C . A C and A C is also extensionally identical. Proof: Since A B and A C has lower degree 1 on A , then from theorem 1, A B . A and A C . A are extensionally identical with A. Hence they are extensionally identical to each other. From theorem 2, A B C . A B and A B is also extensionally identical. The inclusion constraint is translatable into the referential integrity described in [11]. This type of constraint dictates that all the values of a base set identifier appear-ing in an association table must also appear in the entity table in which the members of the base set are primarily recorded. For example: each department name appear-CHAPTER 3. TABLE DESIGN PROGRAM 50 ing in the table recording the association set, E M P D E P T = { EMPLOYEE, DEPARTMENT I <1. 1> . <1 . *> I associates the members of the sets E M P L O Y E E and D E P A R T M E N T } , must also appear in the entity table recording the members of the primary base set D E -P A R T M E N T . That is, if a tree generated from step 2 .d, say T ( E M P D E P T ) , is made predecessor total by extending it with its predecessors ( E M P L O Y E E and D E P A R T -M E N T ) , and their corresponding identifier (empnum and deptname respectively) to form a table T _ E M P D E P T , the values of the identifiers empnum and deptname ap-pearing in T _ E M P D E P T must also appear in the entity tables in which the members of the predecessors are actually recorded ( T _ E M P L O Y E E and T . D E P A R T M E N T re-spectively) . The inclusion constraint also justified that the deletion of a tuple in the association table does not also delete the corresponding identifier value in the entity table. That is, there is no delete anomaly. In relational systems, inclusion constraints should be explicitly stated and checked on each related update to ensure no update anomaly. For example: Rulel : After inser t ing EMPDEPT After updating EMPDEPT.empnum Exis ts ( EMPLOYEE where EMPLOYEE.empnum = EMPDEPT.empnum). Finally, it can be shown that the mutually exclusive partitions in a domain graph is unique. Since the partitions are mutually exclusive (clear from the partition process), CHAPTER 3. TABLE DESIGN PROGRAM 51 therefore no matter which node we start with, the result is always the same set of mutually exclusive <1, l>-connected components. Hence the <1, l>-connected com-ponents generated from the same domain graph are also always the same (uniqueness of results). The degrees labelling the edges of each tree generated from the algorithm determines the dependencies among the nodes in the tree. This is due to the property of trees that there is a unique undirected path from any node of a tree to any other node of the tree. This eliminates the complex traditional relational method of determining the functional and multivalued dependencies in conceptual design. The correctness of a S E T schema is critically dependent upon the correctness of the degrees supplied and the choice of primitive base sets. The degrees for base sets are declared by the database designer. The degrees for defined sets is assumed to be computable. 3.5 Complexity of Table Design Program The program starts by gathering all the entity sets and association sets pertaining to the database schema. A n identifier must be declared for each primitive base set. For the program to work correctly, correct degree information must be supplied for each association. The table design program then starts threading <1, l>-connected component for CHAPTER 3. TABLE DESIGN PROGRAM 52 each entity set, followed by threading <1, l>-eonnected component for each of the uncovered (left-over) association sets. Each set which has been covered in a table is marked. The recursive subroutine thread will terminate when it encounters an arc with degree which is not <1, 1>. If there is a cycle in the subgraph, the subroutine will stop when it hits the set which has previously been marked and merely duplicate the set. The complexity of the algorithm is proportional to the degrees (the number of columns) of the tables. This is because subroutine thread is the most resource con-suming. The higher the degree of a table, the more times subroutine thread is (recur-sively) called to generate the table. However, the increased complexity is not significant since subroutine thread is not a heavy duty subroutine. Chapter 4 DESCRIPTION OF A N E X A M P L E APPLICATION An example application will be presented in this chapter. The format of the example is as follows: 1. The description of the problem. 2. The corresponding SET data model representation (SET schema) that defines what entity sets and relationship types may exist and their attributes, along with additional information which is needed to construct the domain graph. 3. The construction of the domain graph (based on the information from the SET schema) representing entity sets and relationships between sets, particularly the domain predecessor(s) of each set. 4. The output of the program generating the optimum number of tables which will correctly record the base of sets without losing any information. 53 CHAPTER 4. DESCRIPTION OF AN EXAMPLE APPLICATION 54 5. Analysis of results. 4.1 Problem Definition We will present the design and implementation of an Academic Information System for a university. At the university, there are various academic departments, each of which has its own unique department name. Each academic department offers some courses related to the department. Each course is identified by the department name and a course number. In each department, there are academic staffs (we do not consider nonacademic staffs in this example). Each academic staff is assigned a unique staff number and belongs to exactly one department. Exactly one academic staff in the department is currently the head of the department. Each academic staff/instructor in a department is competent to teach some courses offered by the department. Each course, when offered on particular semester, is taught by one instructor. Each student attending the university majors in one area of studies from a depart-ment. For each major, there are courses requirements. Students who are eligible to take some required courses must register for the required courses. For a course registered by a student, the student will be granted a mark (and also credit) upon completion of the course. Each student currently attending the university is given a unique student ID. CHAPTER 4. DESCRIPTION OF AN EXAMPLE APPLICATION 55 Students, academic staffs and departments may have some other attributes. To keep the example from being unnecessarily large, we only include the attributes and associations that are deemed to be representative. In the operational level, the Academic Information System must support the fol-lowing function: • Recording and following students' performance. • Checking the completion of program requirements by a student. • Reporting of academic standing of students. • Checking for student's eligibility to attend specific courses. • Generation of lists of student requesting a course, or students actually attending a course. • Reporting courses schedule (given by whom, where and when/which semester). CHAPTER 4. DESCRIPTION OF AN EXAMPLE APPLICATION 4.2 SET Schema 56 Setname Domain Degree Assertion Intent ion/Cornment Primitive value H t i STRING S^TRING x is defined by the system a set of strings of characters or difite INTEGER yJNTEGER y k defined by the system a aet of intefen LEN • STRING, 1JNTEGER 1 U computed by system a function to compute length of a given string Primitive base > e t e DEPARTMENT DEPARTMENT M t of departmente in the university EMPLOYEE EMPLOYEE •et of academic staff a employed by the univeriity STUDENT STUDENT aet of students attending the unlveralty Nonprimitive value sets VDEPTNAME neme;STRING {n*me:LEN:} <- 12 a value set for names of departments VCOURSENUM cnunuINTECER {cnunuLEN:} <• S a value aet for course numbers VEMPNUM esumJNTEGER {eaunuLEN:} <• 6 a value aet for employee numbers VSTUDNUM •numJNTEGER {snum-LKN:} <• 1 a value aet for student numbers VMAJOR meJonSTRING {major-XEN:} <• 13 a value set for fields of studies offered by university VMARK mark INTEGER 0 <• Berk <- TO a value set for marks V8EMESTER term :ST RING tem le eae ef a value set of university calender semesters {WDJTER^ PRINGJ?ALL} Nonprimitive base M U DEPTNAME DEPARTMENT, VDEPTNAME <1.1>,<0,1> each department is identified by a unique name EMPNUM EMPLOYEE,VEMPNUM <1.1>,<0.1> each academic stag ia identified by a unique employee number STUDNUM STUDBNT.VSTUDNUM <l.l>.<0.1> each student is identified by a unique student number Assecamo* set* baae and defined sets DEPTCOURSE DEPARTMENT,VCOURSENUM <!,•>.<0.»> each course offered by a department is identified by the department identifier and the course number EMPDEPT EMPLOYEE JJEPARTMENT <l,l>.<l.e> each staff belong to exactly one department and a department may have one or more staffs HEAD-OF EMPDEPT <0.1>,<1,1> each department is managed by exactly one head DEMPCOURSE •EMPLOYEE, <4,c»n>:0O>IC(nnSE <e.4>:IMPllEn associate employees with their department's courses EMPCOURSE DEMPCOURSE <!,•>,<!,•> an employee of a department is competent to teach some of the courses offend by his/her department COURSEO PEERED DEPTCOURSE.VSEMESTER <0.•>,<!.•> a course is offered in some semesters of the year DINSTRUCTOR «4,caas> ,tem> :C00181QrraB>, c:lNPLOTEE <e.e>:EMPDEPT associates competent lecturers with the courses of-fered INSTRUCTOR DINSTRUCTOR <!.»>,<0,e> a course when offered on a particular semester is taught by only one instructor DEPTMAJOR DEPARTMENT,VMAJOR <!,•>,<0,«> associate fields of studies to each department STUDDEPT STUDENT DEPARTMENT <l.l>.<l.e> each student belongs to one department DSTUDMAJOR »:STUDENT, <d.«aJer>:DEPnUJ01 <s.l>:SIDU)EPT associate student with fields of studies in his/her de-partment STUDMAJOR DSTUDMAJOR <l,l>.<0.e> each student majors in one of area of studies in his/her department COURSEREQ DEPTMAJORJJEPTCOURSE <!,•>.<0.»> for each major, there ere courses requirements COURSEREG STUDENT.COURSEOPFERED <».•>.<0.e> each student has registered for some offered courses COURSECOMPLETED COURSEREG <0,e>.<0,e> courses completed by students must be first be regis-tered MARK <e, «4, c e « n > , tem»: C01BSICQNPLE1D, <!.!>. <0.» each courae completed by a student is granted a mark mark:VMARK Set name Domain Degree computed Assertion Intention/Comment Dcfimi arts DEMPCOURSE DINSTRUCTOR DSTUDMAJOR e EMPLOYEE, cd. caem>: DnTCOUlSI <<d,casB»,term»:C1)TaSKQrTnB), aiEXPLGTEK a:STUDENT, <d.a«jor>:DEPTH!JM <l,e>,<l,e> <l,e>,<l.e> <l,e>,<l,e> <e,d>:M>DEPT <e.e>:EMPDEPT <s,d>:STtn>DEPI associate employees with their department's courses associates competent lecturers with the courses offered associate student with fields of studies in his/her de-partment CHAPTER 4. DESCRIPTION OF AN EXAMPLE APPLICATION 57 4.3 Domain Graph of University Database Schema <i,i> ^ I OQURSEOCMREiro _<0,1> , •> j" NTEGffl | » | VMAHK <0,*> | SRJDNUM ^ |OOURSffBa | <1,1> MARK vsnx INUM I <0.1> INTEGER Figure 4 . 1 : Domain graph of University database schema CHAPTER 4. DESCRIPTION OF AN EXAMPLE APPLICATION 58 4.4 Simplified Domain Graph of University Database Schema N T B 3 E R r<0,1> S T R M G |  |<0,1> STUTJCEPT [4-VB/IPNUM ^ <0,1> EMPNUM t<1,1> -J^i—1<1,1 EMPLOVEE l * 1 ' 1 ^ ! EMPDEPT \ + ' * |DEPARTMBNnf VDEPTNAMEl  Y<0,1> DEPTNAME | f <1,1a <V> STRING <1,*> , rc0,1> VMAJOR <0,1> * \ H E A D O F " <1,1> NTEGfft <0,*> <1,*> E M F C O U F S E <1,*> <Q--> » NSTRUCTOR <1,1> 3^1 VrxUFSENUM DEPTMAJOR <0,*> * \ C O L R 3 B T O ] 4 <0,*> <0,*> <1,*> <o,*> V S E M B3TER 4 k <0,1> STUDMAJOR CCURSEflEGH-<1,*> <1,1> STUDBvn" ICOJFBBDCMIEIH < 1 «0,1>f-i 1 <0,">| NTB3ER I H VMAPK | ^ MARK <0.*>r <0,*> <1,1> <1,1a STUDNUM | I <0,1> V3TUDNUM | <0,1> N T E G E R Figure 4.2: Simplified Domain graph of University database schema CHAPTER 4. DESCRIPTION OF AN EXAMPLE APPLICATION 5 9 4 . 4 . 1 1-1 c o n n e c t e d C o m p o n e n t s NTEGER VB/IPNUM I •<01> STRNG | V D E P T N A M E I STUCCBT K-<1.1> EMPNUM f<1,1> EMPLOYEE ^ EMPOEPT~J^  l<0,1; D E P T N A M E f <1,1> DEPARTMENTl <f}A?U hEAD <1,1 = <1.*> S T R I N G <1 ,*> 4<o,i> VMAJOR EMFCOURSE DEPTCOURSE hBTRUCTOR NTEGER v>sc ;r<o,*> \O0URSENUM DEPTMAJOR C O U R S E R E O c0,*> <0,*> jCCLRSECRTO^<1' "| V S B ^ T J R " c V S E M E S T E R i i <0,1> GOURSEREG •<i.i> NTEGER "r-^gj VMARK MARK OOUR3EOCNFLETE <0,*> <0,*> STUDMAJOR <1,1> STUDENT <1,1> STUDNUM  f <0,1>" VSTUDNUM — r <0,1> N T E G E R Figure 4.3: 1-1 connected components CHAPTER 4. DESCRIPTION OF AN EXAMPLE APPLICATION 4 . 4 . 2 D o m a i n T r e e s g e n e r a t e d f r o m t h e D o m a i n G r a p h EMPNUM f <1,1> | EMPLOYEE r^i\ EMPDEPT DEPTNAME |  t<1.1> | D E P A H T M E N T | , <1,1> I COIRSEFEGI I CqjRSB3CftPLETH)| | COURSEPEOl HEAD Or <1,1> r MARK |gSTRLC^"]4-^|oaJ^ECRg=ED| • I DEPTMAJORI | STUDMAJOR i i <1,1> | STUDENT I <1,1>STUDNUM I Figure 4.4: Domain trees CHAPTER 4. DESCRIPTION OF AN EXAMPLE APPLICATION 4 . 4 . 3 E x t e n d e d D o m a i n T r e e s NTEGER |  + <0,1>" VEMPNUM  ^ <0,1> EMPNUM | f <1.1: E M P L O Y E E » E M P D E P T h* .<1 ,*> STRNG | ^ <0,1> V D E P T N A M E |  | <0,1> D E P T N A M E |  f <1,lT D E P A R T M E N T c NTEGER |  ^ <0,1> VEMPNUM I <0,1> EMPNUM <1,1> I E M P L O Y E E ] [ STRNG ]  <^0,i> | \ C E P T N A M E ] | D E P T N A M E ]  t<1,i>  I D E P A T W N T <0,1> -*\ HEAD_OF "}» I STRNG I  ^ <o[i> | N / D E P T N A M E ]  ^ <o[i> | D E P T N A M E ~ |  t <i]i> | D E P A R T M E N T ] NTEGER |  ^ <0,1>  \03URSENUM! <1,*> . <0,*> ^CEPTCOURSEp  Figure 4.5: Extended trees CHAPTER 4. DESCRIPTION OF AN EXAMPLE APPLICATION | NTEGER "j «y <gi> VEMPNUM ]  «/<o~i>  I EMPNUM |  f<1.lT I EMRDYEEl <1,*> - I f EMPOCUPgl<-| S T R I M G j  «/ <gi>  | V E E P T N A M E ] I D E P T N A M E " ! t <i!i> DEPARTMENT] I NTEGER |  T <0,?7 |VQ3URSENUr^  <0,*> CEPTOQURSEr*-<1,"> NTEGER VEMPNUM  Y <0,1> [ E M P N U M . EMPLOYEE <0,*> I-INSTRUCTOR M-I STRNG «y <0,1: | V D E P T N A M E  «y <0,1> D E P T N A M ; I t <1,1> DEPARTMENT NTEGER [ v a S l i <v> »j CEPT03URSE}* e0,*> <1-1> <o,*> - |0GUFEg3Tffi^«»—[ V S E M E S T E R <0,1> STRIMG Figure 4.6: Extended trees - continued CHAPTER 4. DESCRIPTION OF AN EXAMPLE APPLICATION STRIMG ^ <0j> | W S ^ T W M E ] «y <o.i> I STRIMG | OEPTNWE f <1,1> | DEPAPJMENTf-gTTS-H DEPTM\J3R| <0,1> | VMAJOR <0,*> STRIMG «y <0,1: V D E P T N A M E | <0,1: D E P T N A M E <^1,1> | STRING | <0,1> T | V M A X R | <0,"> DEPARTMENT{-^yp^-»>j^BJTKWJ3R STRIMG J <0,*> | STUDMAJOR | «/<0,1> | VDEPTNAME | ^ <Q,1> | D E P T N A M E " ] i ^ f <1,l7 | STUDENT STUDDEPT r^~~~~l DEPARTMENT! <1,1> <1,1> STUDNUM | <0,1> VSTUDNUM| •> <0,1> NTEGER | Figure 4.7: Extended trees - continued CHAPTER 4. DESCRIPTION OF AN EXAMPLE APPLICATION | STRNG |  <^0,1> | VDEPTNAME] T<0.1>* | DEPTNAME ] t<i;"i>  | DEPARTMENT] <1.*> rrjEPTCCURSEP NTEGER • <0,1> IVCCUREENJ^  <0.*> STRNG STRNG I <0,1> IVCEPmftME I ^ <6ji>  I DEPTNAMEI t <\i>  | DEPARTMENT} <0,1> T | VMAJOR "| <v> <0," DEPTMAJOR -H ccuraeaV <i,*> Figure 4.8: Extended trees - continued CHAPTER 4. DESCRIPTION OF AN EXAMPLE APPLICATION | NTEGER "J 1 VEMPNUM~1 + <0,1> EMPNUM f <1,1> EMPLOYEE 1 <0,*> N5TRUCTOR \+~ STUDENT <1,1> | STRNG |  +c0,1> | V D E P T N A M E ] V <gi>  | D E P T N A M E ] f<1,1> .  | D E P A R T M E N T ] D E P 1 C C U R S E | <1,1> NTEGER J^ <0,1>  |VOgjFSENUV] <0,*> <0,'> OCURBEG f*~ V S E M E S T E R — z <0,1> I ; STRNG 1 i k <0,1> V S T U D N U M i <0,1> NTEGER Figure 4.9: Extended trees - continued CHAPTER 4. DESCRIPTION OF AN EXAMPLE APPLICATION | NTEGER | I STTttJG I I +*°' 1 > . *><0,1> | VEMPNUM I | VDEPTNAME | «/<0,1> *<0.1> I EMPNUM I | DEPTNAME | t<1.1» f1.1> <1/> EMPLOYEE NTEGER 0^,1 > \03URSENUM| |DEPARTMENT|—»j DEPTO0UR9E1 |OaJRSE03^«^Elkj NTEGER VMARK )——»{ MARK 1 <1,1> snjo MJM '<0,1> VSTUONUM | |<0,1> NTEGER | Figure 4.10: Extended trees - continued CHAPTER 4. DESCRIPTION OF AN EXAMPLE APPLICATION 4.5 Results - Program output Star t reading e n t i t y sets . . . The e n t i t y sets are: DEPARTMENT with i d e n t i f i e r DEPTNAME EMPLOYEE with i d e n t i f i e r EMPNUM STUDENT with i d e n t i f i e r STUDNUM F i n i s h e d reading e n t i t y sets S tar t reading a s s o c i a t i o n sets . . . The a s s o c i a t i o n sets are: DEPTNAME DEPARTMENT 1 1 VDEPTNAME 0 1 EMPNUM EMPLOYEE 1 1 VEMPNUM 0 1 STUDNUM STUDENT 1 1 VSTUDNUM 0 1 DEPTCOURSE DEPARTMENT 1 * VCOURSENUM 0 * EMPDEPT EMPLOYEE 1 1 DEPARTMENT 1 * HEAD_OF EMPLOYEE 0 1 DEPARTMENT 1 1 EMPCOURSE EMPLOYEE 1 * DEPTCOURSE 1 * COURSEOFFERED DEPTCOURSE 0 * VSEMESTER 1 * INSTRUCTOR COURSEOFFERED 1 1 EMPLOYEE 0 * DEPTMAJOR DEPARTMENT 1 * VMAJOR 0 * STUDDEPT STUDENT 1 1 DEPARTMENT 1 * STUDMAJOR STUDENT 1 1 DEPTMAJOR 0 * COURSEREQ DEPTMAJOR 1 * DEPTCOURSE 0 * COURSEREG STUDENT 1 * COURSEOFFERED 0 * COURSECOMPLETED STUDENT 0 * COURSEOFFERED 0 * MARK COURSECOMPLETED 1 1 VMARK 0 * F i n i s h e d reading a s s o c i a t i o n sets CHAPTER 4. DESCRIPTION OF AN EXAMPLE APPLICATION The tab le s requ ired are: Tab le ' s name: DEPARTMENT a t t r i b u t e s : DEPTNAME with value set VDEPTNAME HEAD_OF with i d e n t i f i e r EMPNUM Primary key for t h i s tab le i s DEPTNAME T a b l e ' s name: EMPLOYEE a t t r i b u t e s : EMPNUM with value set VEMPNUM EMPDEPT with i d e n t i f i e r DEPTNAME Primary key for t h i s tab le i s EMPNUM Table ' s name: STUDENT a t t r i b u t e s : STUDNUM with value set VSTUDNUM STUDDEPT with i d e n t i f i e r DEPTNAME STUDMAJOR with a r i t y predecessor(s ) : DEPARTMENT VMAJOR Primary key for t h i s tab le i s STUDNUM Tab le ' s name: DEPTCOURSE a t t r i b u t e s : DEPTCOURSE with a r i t y predecessor(s) : DEPARTMENT VCOURSENUM This t a b l e i s a l l - k e y T a b l e ' s name: EMPCOURSE a t t r i b u t e s : EMPCOURSE with a r i t y predecessor(s) : EMPLOYEE DEPTCOURSE This tab le i s a l l - k e y Tab le ' s name: COURSEOFFERED a t t r i b u t e s : COURSEOFFERED with a r i t y predecessor(s) : DEPTCOURSE VSEMESTER INSTRUCTOR with i d e n t i f i e r EMPNUM Primary key for t h i s tab le i s COURSEOFFERED Table ' s name: DEPTMAJOR a t t r i b u t e s : DEPTMAJOR with a r i t y predecessor(s) : DEPARTMENT VMAJOR This t a b l e i s a l l - k e y Tab le ' s name: COURSEREQ a t t r i b u t e s : COURSEREQ with a r i t y predecessor(s) : DEPTMAJOR DEPTCOURSE This t a b l e i s a l l - k e y T a b l e ' s name: COURSEREG a t t r i b u t e s : COURSEREG with a r i t y predecessor(s) : STUDENT COURSEOFFERED This tab le i s a l l - k e y Tab le ' s name: COURSECOMPLETED a t t r i b u t e s : COURSECOMPLETED with a r i t y predecessor(s) : STUDENT COURSEOFFERED MARK with value set VMARK Primary key for t h i s tab le i s COURSECOMPLETED DONE CHAPTER 4. DESCRIPTION OF AN EXAMPLE APPLICATION 69 4.6 Analysis of Results The first three tables are the entity tables which record the entities information of the primitive base sets D E P A R T M E N T , E M P L O Y E E and S T U D E N T respectively. The primary key for these tables is the identifier declared for the corresponding primitive base sets, i.e., department name ( D E P T N A M E ) , employee number (EMPNUM) and student number (STUDNUM) respectively. The rest of the tables are association tables which record the association informa-tions which are not already included in the first three entity tables. The primary key for these tables are determined in the manner described in section 3.3.1. The tables D E P T C O U R S E , E M P C O U R S E , D E P T M A J O R , C O U R S E R E Q and C O U R S E R E G are all-key tables. That is, the whole combination of values of all the attributes in the tuple is needed to identify the tuple. The primary key for the table C O U R S E O F F E R E D is the attribute C O U R S E O F -F E R E D with arity predecessors DEPTCOURSE x VSEMESTER. Note that the node la-belled by C O U R S E O F F E R E D in figure 4.4 is a bottom node in the subgraph (tree). Similarly the primary key for table C O U R S E C O M P L E T E D is the attribute C O U R -S E C O M P L E T E D with arity predecessors STUDENT x COURSEREG. Note also that, the attribute H E A D _ O F in the table D E P A R T M E N T is identi-fied through the identifier E M P N U M . However, the value set of E M P N U M is stated CHAPTER 4. DESCRIPTION OF AN EXAMPLE APPLICATION 70 only in the table scheme E M P L O Y E E . This means each empnum value appearing in the column HEAD_of of the table D E P A R T M E N T must also appear in the corre-sponding column E M P N U M of table E M P L O Y E E . Similarly for E M P D E P T , STUD-D E P T , S T U D M A J O R , D E P T C O U R S E , E M P C O U R S E , C O U R S E O F F E R E D , IN-S T R U C T O R , D E P T M A J O R , C O U R S E R E Q , C O U R S E R E G and C O U R S E C O M P L E T E D . This type of integrity constraint together with the degree constraint must be enforced by the relational database to fully and accurately represent the S E T schema with its integrity constraints. The output of the table design program can be easily accommodated by any rela-tional DBMS (for examples: dBASEII, dBASEIII or RBASE5000) or even any prolog database shells (for example, Microprolog system in [6] Intelligence Database). i Chapter 5 COMPARISON WITH OTHER WORKS 5 .1 The Relational Model 5.1.1 R e l a t i o n a l V i e w A relational database consists of a set of relations with tables as their representations, proposed by Codd [9]. Given the entity sets D l , D2, Dn (not necessarily distinct), R is a relation on these n sets if it is a set of n-tuples, each of which has its first element from D l , second element from D2, and so on. Denoted by: R = { (dl, d2, ... , dn) I dl:Dl , . . . , dn:Dn }. The sets Di 's are called the domains of R and (dl, d2, dn) is called a tuple of R. The number n is the degree of R and the number of tuples in R is called its cardinality. In a relational database schema declared in the data definition language, a relation is a listing of a relation name and its corresponding domain names, together with the specification of one or more keys. A relational schema is a multiset D = {Rl, Rn} 71 CHAPTER 5. COMPARISON WITH OTHER WORKS 72 in which each Ri is a relation. It specifies the intention of a relational database. Relationships between relations are represented by key propagation from one rela-tion to another; for the case of many-to-many relationship, as depicted in figure 5.1.b, via a separate relation. Role of domain EMPLOYEE NAME ADDRESS EMPDEPT Domain VEMP# /NAME VADDRESS VDEPTNAME Tuple (iTi^  [2221 Varah Jones Balaclava Broadway Role of domain DEPARTMENT HEAD ADDRESS Domain VDEPTNAME VEMP# VADDRESS Tuple 0 Q Agriculture Rd Each employee belongs to one department. Each department may have 1 or more employees. Each department has exactly one head a. 1:1 and Functional relationships Role of domain EMPLOYEE DEPARTMENT COURSE* Domain VEMP# VDEPTNAME /GOURSE# Tuple 111 111 222 CS CS CS 302 402 302 Each employee may be competent to teach 1 or more courses. Each course can be taught by 1 or more employees. b. M:N relationship Figure 5.1: a. 1:1 and Functional relationship between 2 relations by key propagation, b. M:N relationship between 2 relations via a separate relation scheme. Role names are used to clarify the semantics of the domains in a relation, especially CHAPTER 5. COMPARISON WITH OTHER WORKS 73 if the same domain appears more than once in a relation1. Comparing figure 5.1 with the corresponding S E T schema in chapter 4, we note that the domains in the relational model are basically equivalent to the value sets. The role names in the relational model seem to correspond to the attribute/association set names in the S E T model. However, an attribute set in the SET model also serves as an association between an entity set and its value set. And an association set in the SET model also serves as an association between declared sets. In this way, the distinction between the conceptual objects (entity and association sets and information associated with them) and the representation of such conceptual objects (by data) is apparent. This distinction helps clarify the semantics of data. Also, as in the entity-relationship model, the distinction between the conceptual objects and their representation, and the separation of entity sets and association sets clarify the semantics of functional and multivalued dependencies among data[5]. 5 . 1 . 2 F u n c t i o n a l D e p e n d e n c i e s / M u l t i v a l u e d D e p e n d e n c i e s a n d N o r m a l i z a t i o n Relations in a relational database are always normalized, in order, as much as possible, to have one fact presented in one place, to remove undesirable updating anomalies and redundancy [11]. To understand why normalization is necessary, it is helpful to recognize what is meant by functional dependency and multivalued dependency. 1 However, role names cannot be used to clarify semantic of the same domain name in different relations. CHAPTER 5. COMPARISON WITH OTHER WORKS 74 Definition: Functional dependency ( F D ) Given a relation R, attribute Y of R is functionally dependent on attribute X of R if and only if each of the X-value in R has associated with it precisely one Y-value in R at any one time. Both X and Y may be composite attributes. As an example, attribute N A M E of the relation E M P L O Y E E in figure 5.1 is func-tionally dependent on the attribute (primary key) E M P N U M of the relation E M -P L O Y E E . Definition: Mult iva lued dependency ( M V D ) Given a relation R with attributes A , B and C , the multivalued dependence(MVD) denoted by R . A 2 - >->R.B (i.e., B is M V D on A) holds in R if and only if the set of B-values matching a given (A-value,C-value) pair in R depends only on the A-value and is independent of the C-value. (A,B,C may be composite attributes). The normalization process provides a simple means to declare these kinds of depen-dencies. A relation is said to be normalized (in its first normal form) if each attribute value in each tuple (a row in the table representing the relation) is atomic (non de-composible so far as the system is concerned). Relations in its first normal form can further be normalized to second, third, fourth or fifth normal forms. The definitions of these normal forms and the normalization procedure can be found in the appendix. All relations must be at least in its first normal form. However, not all relations must 2Here, R.A means A is an attribute of R 1 CHAPTER 5. COMPARISON WITH OTHER WORKS 75 be normalized all the way to the last level of normal forms. To illustrate how normalization is carried out, let us use the examples from Date[ll]. Suppose we are given a simple relation called FIRST(SNUM,STATUS,CITY.PNUM.QTY) concerning information about suppliers and shipment of parts, together with its func-tional dependencies information. The primary key for this relation is the combination (SNUMjPNUM). Figure 5.2 shows the corresponding functional dependency diagram for relation FIRST. This relation is in INF since all the underlying domains contain atomic values only (i.e., nondecomposible). OTY S# P# STATUS CITY Figure 5.2: Functional dependency diagram for relation FIRST We see from figure 5.2 that: 1. The attribute Q T Y is functional dependent on (SNUM,PNUM). 2. The attribute STATUS is functional dependent on C I T Y . 3. The attributes STATUS and C I T Y are not fully functional dependent on the primary key. CHAPTER 5. COMPARISON WITH OTHER WORKS 76 4. The attributes STATUS and C I T Y are functional dependent only on SNUM, which is part of the primary key. The relation FIRST suffers from update anomalies as are shown in Date[ll]. The solution to these update anomalies is to normalize the relation. The relation FIRST is split into two relations: SECOND (SNUM, STATUS, CITY) and SP(SNUM.PNUM.qTY). The reduction of FIRST to these two relations is called non-loss decomposition, because no information is lost in the reduction process. Figure 5.3 shows the corresponding functional dependency diagram for these two relations. STATUS 1 s# CITY S# P# QTY Figure 5.3: Functional dependency diagram for relations S E C O N D and SP These new relations are said to be in 2NF. They eliminate the nonfull functional dependencies in FIRST and resolve some of the update problems. However, the nonkey attributes C I T Y and STATUS are not mutually independent. Specifically, the depen-dency of STATUS on S N U M is transitive via C I T Y . This transitivity will lead to other types of update difficulties as are shown in date[ll]. Again the reduction process is CHAPTER 5. COMPARISON WITH OTHER WORKS 77 carried out to solve these difficulties. The relation S E C O N D is now replaced by two of its projections: SC(SNUM,CITY) and CS(CITY,STATUS). These two relations satisfy the conditions of 3NF. Using the new approach of table design, the following sets are declared. Setname Domain degree Intention Primitive value sets INTEGER STRING x:INTEGER y:STRING x is any valid integer defined by the system y is any valid string (may be concatenation of alphabets and integers) defined by the system Primitive base sets (entity sets) SUPPLIER PART CITY s:SUPPLIER p:PART c:CITY set of suppliers for the parts set of parts for shipment set of cities where the suppliers reside Nonprimitive value sets VSNUM VPNUM VCNAME VSTATUS VQTY snum:STRING pnum: STRING city:STRING status:INTEGER qty:INTEGER VSNUM is a value set for supplier numbers VPNUM is a value set for part numbers VCNAME is a value set for city names VSTATUS is a value set for supplier status VQTY is a value set for the quantity of shipment of a part by a supplier Nonprimitive base sets (identifier sets) SNUM PNUM CNAME s:SUPPLIER,snum: VSNUM p:PART,pnum:VPNUM c:CITY,city:VCNAME <1,1>,<0,1> <i.i>1<0,l> <1,1>,<0.1> each supplier is identified by a unique supplier number each part is identified by a unique part number each city is identified by a unique city name (association sets) SP sc STATUS QTY s:SUPPLIER,p:PART s:SUPPLIER,c:CITY c:GITY,status: VSTATUS <s,p>:SP,qty:VQTY <1,*>,<1,*> <1,1>,<0,*> <l,l>i<0,*> <1,1>,<0,*> associate suppliers and parts associate suppliers and cities each supplier in the same city has the same status the quantity of shipment of a part by a supplier Table 5.1: S E T Schema of SHIPMENT database The domain graph representation for the above S E T schema is shown in figure 5.4. The corresponding 3NF relations can be easily derived using the table design algo-rithm. The results are as follows: Start reading entity sets ... The entity sets are: SUPPLIER with identifier SNUM PART with identifier PNUM CITY with identifier CNAME CHAPTER 5. COMPARISON WITH OTHER WORKS VSTATUS STATUS t <L1> CITY <1,1> SC CNAME <0,1J VCNAME VS# <0,1> S# '—1 r°.*> <i,i> L <i,i> VP# <0,1> P# SP kV>f <1,1> <1,1>; QTY <o ,*i VQTY Figure 5.4: Domain graph of Shipment database Finished reading entity sets Start reading association sets ... The association sets are: SNUM SUPPLIER 1 1 VSNUM 0 1 PNUM PART 1 1 VPNUM 0 1 CNAME CITY 1 1 VCNAME 0 1 SP SUPPLIER 1 * PART 1 * SC SUPPLIER 1 1 CITY 0 * STATUS CITY 1 1 VSTATUS 0 * QTY SP 1 1 VQTY 0 * Finished reading association sets The tables required are: Table's name: SUPPLIER attributes: SNUM with value set VSNUM SC with identifier CNAME Primary key for this table is SNUM Table's name: PART attributes: PNUM with value set VPNUM CHAPTER 5. COMPARISON WITH OTHER WORKS 79 Primary key for this table is PNUM Table's name: CITY attributes: CNAME with value set VCNAME STATUS with value set VSTATUS Primary key for this table is CNAME Table's name: SP attributes: SP with arity predecessor(s) : SUPPLIER PART QTY with value set VQTY Primary key for this table is SP DONE Note that although the set P A R T has only one attribute P N U M , it is declared as an entity set, instead of as an attribute set of SUPPLIER. This is due to the assumption that P A R T may have some other attributes (for examples, colour, name, etc.) which for simplicity are not declared here. Similarly, C I T Y is declared as an entity set (and hence an identifier C N A M E is declared for it) because the attribute SUPPLIER is functionally dependent on C I T Y . The corresponding table representation of the above relations are shown in tables 5.2, 5.3, 5.4 and 5.5. R O L E £JC D O M A I N V C N A M E T U P L E Table 5.2: Table SUPPLIER with primary key supplier number S N U M A n example of decomposition based on multivalued dependencies is shown in fig-ure 5.5. For a given course, there may exist any number of teachers and any number CHAPTER 5. COMPARISON WITH OTHER WORKS 80 R O L E P N U M D O M A I N V P N U M T U P L E Table 5.3: Table P A R T with primary key part number P N U M R O L E C N A M E S T A T U S D O M A I N V G N A M E V S T A T U S T U P L E i Table 5.4: Table C I T Y with primary key city, name C N A M E R O L E S P Q Y T S U P P L I E R P A R T D O M A I N V S N U M V P N U M V Q Y T TUPLE Table 5.5: Table SUPPLIER-PART (SP) COURSE TEACHER TEXT COURSE TEACHER COURSE TEXT Figure 5.5: Decomposition based on multivalued dependencies i CHAPTER 5. COMPARISON WITH OTHER WORKS 81 of course texts. A given teacher or text can be associated with any number of courses. Assume teacher and text are independent of each other. The multivalued dependencies are: COURSE ->->,TEACHER COURSE ->-> TEXT or simply COURSE ->-> TEACHER I TEXT Using the domain graph table design approach, the domain graph representation is shown in figure 5.6. TEACHER j<1.1> <1,*> CT COURSE cx .A *. - 1 * ^ TNAME k<0,1> VTNAME Figure 5.6: Domain graph The results are: Start reading entity seta ... The entity sets are: COURSE with identifier CNAME TEACHER with identifier TNAME TEXT with identifier TEXTNAME Finished reading entity sets <1,*> TEXT <1,1> TEXTNAME f <0,1> IVTEXTNAMtt Start reading association sets CHAPTER 5. COMPARISON WITH OTHER WORKS 82 The associations sets are: CNAME COURSE 1 1 VCNAME 0 1 TNAME TEACHER 1 1 VTNAME 0 1 TEXTNAME TEXT 1 i VTEXTNAME 0 1 CT COURSE 1 * TEACHER 1 * CX COURSE 1 * TEXT 1 * Finished reading association sets The tables required are: Table's name: COURSE attributes: CNAME with value set VCNAME Primary key for this table is CNAME Table's name: TEACHER attributes: TNAME with value set VTNAME Primary key for this table is TNAME Table '8 name: TEXT attributes: TEXTNAME with value set VTEXTNAME Primary key for this table is TEXTNAME Table's name: CT CT with arity predecessor(s) : COURSE TEACHER This table is all-key Table's name: CX CX with arity predecessor(a) : COURSE TEXT This table is all-key DONE The corresponding table representations are shown in table 5.6 and 5.7. R O L E C N A M E T N A M E DOMAIN V C N A M E V T N A M E T U P L E Table 5.6: Table C T with primary key (cname,tname) CHAPTER 5. COMPARISON WITH OTHER WORKS 83 R O L E C N A M E T E X T N A M E D O M A I N V C N A M E V T E X T N A M E T U P L E Table 5.7: Table C X with primary key (cname,textname) As we see from the above two examples, the distinction between the information concerning entities and association sets and the actual organization of information, in which the entities and associations are represented by data, clarifies the semantics of the dependencies among data 3. As mentioned in Chen[5], one of the problems with the normalization approach is that if we have to make a new assumption or change some assumptions, the relations may no longer be in the desired normal form. Hence further normalization may be necessary. In the S E T model and entity-relationship model approach, semantic in-formation is used to organize data in both the entity and association relations in a more apparent semantic meaning ( and also more meaningful relationships between the relations). 5 . 1 . 3 R e l a t i o n a l S t r u c t u r e i The simplicity of the relational approach arises from the fact that both entities and relationships are handled in a uniform manner. The only type of information bear-ing construct is the relation. Associations between tuples of different relations are 3Details on analysis of MVD based on the entity-relationship approach can be found in [26]. CHAPTER 5. COMPARISON WITH OTHER WORKS 84 represented by values of attributes of a common data type. To provide some explicit semantic interpretation of a relational schema, the rela-tional data model augments the relations by providing a facility for specifying explicit constraints on the relations. For example, the comparability domain constraint is used to determine if it is semantically meaningful to compare the values of two attributes. If the two attributes do not have the same comparability domain, they should not be compared and used in a join operation. This facility reduces the possibility of forming meaningless relationships. A n example of the usage of comparability domain is to per-mit units such as meters, grams, dollars, etc. to be specified for attributes such that only attributes which have the same kind of unit, i.e., having comparability domain, can be compared and used in a join operation. 5.2 Entity-Relationship Data Model (ERM) The entity-relationship data model is one of the new generation semantic data models which solves some of the inadequacies of the traditional data models. It is a gener-alization of the hierarchical, network and relational data models. The generalization is mostly in terms of allowing the representation of explicit constraints and many-to-many relationship types directly in the data model. Its advantage over the traditional data models is its more clearly defined and visible semantics. The entity-relationship approach to database design was proposed by Chen[5]. The CHAPTER 5. COMPARISON WITH OTHER WORKS 85 entity-relationship data model was conceived to facilitate database conceptual design by allowing specification of an enterprise schema which represents the entire enterprise's view of data. It is independent of storage or efficiency considerations. The enterprise schema is then mapped into an appropriate database schema which is realized by some DBMS. The basic components of an entity-relationship model are entities, relationships and attributes. 1. An entity is an element (thing/object) of interest. Entities are classified into different entity sets. 2. A relationship is an association between entities. Relationships are classified into different relationship types: one-to-one, functional and many-to-many. 3. Both entities and relationships can have attributes which are general properties and are classified into different attribute types. An attribute A of an entity set E or a relationship set RS is a mapping from E or RS into a value set V or a cartesian product of N value sets VI x V2 x ... x Vn. A: E or RS --> V or VI x V2 x . . . x Vn If n is greater than or equal to 2 then we call A a composite attribute. One key property of an entity set or a relationship set is its identifier or some CHAPTER 5. COMPARISON WITH OTHER WORKS 86 artificial one. A minimal set of attribute(s) K of an entity set E which has a 1-1 mapping from E into the cartesian product of the associated value sets of K is called a key of E . Relationships are identified using the identifiers of the involved entities. If there is no 1-1 mapping, then some combination of the n:l attributes must form a key. The structure of a database organized according to the entity-relationship model can be depicted by a diagrammatic technique called an entity-relationship diagram (ERD) [5,21]. The basic components of the entity-relationship model can be graphically represented as depicted in figure 5.7. A n entity set is represented diagrammatically ENTTTTY < ^ T O N S H > ( ATTRIBUTE ) Figure 5.7: Basic components of entity-relationship model represented in E R D as a rectangle. A relationship is represented diagrammatically as a diamond. An attribute is graphically represented as an ellipse. The arcs in an E R D represent either relationships connecting entity sets, or attributes connecting entity and/or relationship sets to value sets. AH arcs are marked with relationship/mapping types and are often labelled by the name of the relationship of the attribute (including the role of the related entity set, if it is deemed ambiguous). A n E R D conveys the intension of a database according to the entity-relationship CHAPTER 5. COMPARISON WITH OTHER WORKS 87 model. A simple example of ERD and the corresponding4 SET model domain graph is shown in figure 5.8. The entity sets that participate in a relationship set are indicated A member of A is associated by relation C with N members of B, A member of B is associated by relation C with one member of A. N = {0,1,2,...} ERD j <i,i> c k 1 1 -Each member of A is C associated with 0 or. more member of B. Each member of B is C associated with exactly 1 member of A. Domain graph Figure 5.8: Example of a simple entity-relationship diagram (ERD) and the corre-sponding domain graph by links which connect the entity sets to the relationship set. In the entity-relationship model, the links can represent n-ary relationships among the entity sets and can be one-to-one, functional, or many-to-many. Recursive links are also allowed. For example, see figure 5.9. EMPLOYEE N <V>, EMPLOYEE MANAGER <1,1>* ERD Domain graph Figure 5.9: Recursive links 4Note that the information represented in the ERD and in the domain graph may not be exactly equivalent. CHAPTER 5. COMPARISON WITH OTHER WORKS 88 It is also possible to have more than one relationship sets between the same entity sets. For example, see figure 5.10. „ , SUPERMSE <°. & — — — W< i , i> G R A D S T U D CONSULTS <0,*> ERD Domain gra p h Figure 5.10: Two relationship sets between the same two entity sets Relationship sets are not restricted to only binary relationship; it could be n-ary, that is, a relationship set among n entity sets where n is greater than two. A n example is figure 5.11. However the semantic of such higher order relationship sets can become quite complex to comprehend. 1 T E A C H E R M S T U D E N T PROJECT ERD T E A C H E F I—• S T U D E N T l<1 .*> P R O J E C T Each student can be involved in several projects and can work under the supervision of several teachers. For each project, the student works under the instruction of exactly one teacher. Domain graph Figure 5.11: Ternary relationship CHAPTER 5. COMPARISON WITH OTHER WORKS 89 A role of an entity set in a relationship is the function it plays in the relation-ship. For example, in figure 5.12, superior and subordinate are roles played by (staff) manager and the staff person respectively in the relationship set M A N A G E S . Superior 1 STAFF < ^ M A M « ^ > STAFF N <1,1>» MANAGER Subordinate ERD Figure 5.12: Roles of entity sets As mentioned, an attribute is a mapping from an entity set or relationship set to a value set or the cartesian product of n value sets. A n attribute associates an entity set or relationship set with the value set(s) and provides an interpretation of the value set(s) in the context of an entity set or relationship set respectively. For example, in figure 5.13.a, the attribute S U P N U M gives the interpretation(semantics) of the value set V S U P N U M in the context of the SUPPLIER entity set. The attribute mapping is presented in an E R D by a link from the entity set or relationship set to the value set(s). Attributes can be multivalued. For example, see figure 5.14. Explicit constraints can be expressed in the entity-relationship model. Some exam-ples are: 1. Explicit constraint on attribute, e.g., the range of value in the value set for the CHAPTER 5. COMPARISON WITH OTHER WORKS SUPPLIER SUPPLIER SUFNUM ZEE SUFNUM 1> VSUPNUM <0,1> VSUPNUM Domain graph ERD A. SUPPLIER! <1,*> |e1,1a SP OTY <1,*> <o,*> PART vary Domain graph ERD Figure 5.13: a.Attribute of an entity set b.Attribute of a relationship set EMPLOYEE M NICKNAME EMPLOYEE p <0, ' : NICKNAME i <o; VNICKNAME ERD Domain graph Figure 5.14: Multivalued attribute CHAPTER 5. COMPARISON WITH OTHER WORKS 91 attribute. 2. Explicit constraint on existing values in the database, e.g., when we desire to specify a subset of an existing set (inclusion constraint). 3. Existence constraints, these constraints specify the dependency of an entity on another entity. Figure 5.15 shows an example of existence constraint. BANKBFWNGH Weak relationship set Account Weak entity set ERD BANKBRANCH <1,*> AO30UNT <0*> VAG03UNT# Domain graph Figure 5.15: Existence dependency In this example, the deletion of B A N K B R A N C H will cause the deletion of A C -C O U N T too. The dependent entity set is called a weak entity set and its correspond-ing relationship set is called a weak relationship set. Notice that the dependent entity set is represented as a double rectangle, labelled box with the arc from the weak re-lationship set pointing at it. A weak relationship set can be one-to-one, functional or many-to-many. In many-to-many relationship set, the deletion of a regular (single rectangle) entity set does not necessarily imply the deletion of the associated weak CHAPTER 5. COMPARISON WITH OTHER WORKS 92 entities if more than one regular entity set is associated with the weak entity set. A similar kind of constraint is called ID dependent constraint. This constraint arises when an entity can not be identified by the value of its own attribute (s) but has to be identified by its relationship(s) with other entities [33]. A n example is shown in figure 5.16. PATIENT PATIENT ID > PATIENT DIAGNOSIS <1,1> DIAGNOSS <0,*> VDIAGN06E| ERD Domain Graph Figure 5.16: Identity dependency. Members of DIAGNOSIS can be uniquely identified by their relationship with a P A T I E N T entity The extension of an E R D is represented as a series of tables [5]. The extension of an entity set is represented as a table called an entity relation. The extension of a regular entity set corresponds naturally to the definition of the entity set. That is, there is one column for each attribute associated with the entity set. The extension of a weak entity set, called a weak entity relation, consists of the attributes of the weak entity set and the entity key of the regular entity set(s) associated to it. In a similar way, the extension of a regular relationship set is represented as a table CHAPTER 5. COMPARISON WITH OTHER WORKS 93 called a relationship relation. In this case, the attributes of the relationship relation consist of the entity keys of the entity sets associated with the relationship set and the attributes, if any, of the relationship set. Briefly, the database design approach based on the entity-relationship model is as follows[5,32]: 1. Identify the entity sets and relationship sets of interest. 2. Identify the semantic information in the relationship sets. S. Define the value sets and attributes for the entity sets/relationship sets. 4. Organize the data into entity/relationship relations and decide the primary key. 5. Normalize the above relations. The relations generated by the above approach are candidate relations for the re-lational database. Note however, the candidate relations generated from the transfor-mation of E R D may still require certain natural decomposition[21]. In the following, we present an example from [5] of the relational database design using the entity-relationship data model. Steps 1 and 2 are expressed in the E R D shown in figure 5.17. Table 5.8 illustrates step 3 for a small part of the database. Step 4 is summarized in table 5.9. CHAPTER 5. COMPARISON WITH OTHER WORKS 9 4 SUPPLIER Figure 5.17: Step 1 and 2: A n E R D for analysis of information in a manufacturing firm DECLARE VALUE-SETS REPRESENTATION ALLOWABLE-VALUES EMPLOYEE-NO INTEGER(4) (0,2000) FIRST-NAME CHARACTERS) ALL LAST-NAME GHARACTER(IO) ALL NO-OF-YEARS INTEGER(3) (0,100) PROJECT-NO INTEGER(S) (1,500) PERCENTAGE FIXED(5.2) (1,100.0) Table 5.8: Step 3: Define the value sets and attributes for the entity/relationship sets CHAPTER 5. COMPARISON WITH OTHER WORKS D E C L A R E R E G U L A R E N T I T Y R E L A T I O N E M P L O Y E E A T T R I B U T E / V A L U E - S E T : E M P L O Y E E - N O / E M P L O Y E E - N O ! N A M E / ( F I R S T - N A M E , L A S T - N A M E ) A L T E R N A T I V E - N A M E / ( F I R S T - N A M E > L A S T - N A M E ) A G E / N O - O F - Y E A R S PRIMARY K E Y E M P L O Y E E - N O D E C L A R E R E G U L A R E N T I T Y R E L A T I O N P R O J E C T A T T R I B U T E / V A L U E - S E T : P R O J E C T - N O / P R O J E C T - N O PRIMARY K E Y P R O J E C T - N O D E C L A R E R E G U L A R R E L A T 1 6 N S H l P R E L A T I O N P R O J E C T - W O R K E R R O L E / E N T I T Y - R E L A T I O N . P r i m a r y K e y / M A X - N O - O F - E N T I T I E S W O R K E R / E M P L O Y E E . P K / m P R O J E C T / P R O J E C T . P K / n (M .N mapping) A T T R I B U T E / V A L U E - S E T : P E R C E N T A G E - O F - T I M E / P E R C E N T A G E D E C L A R E W E A K RELATIONSHIP R E L A T I O N E M P L O Y E E - D E P E N D E N T R O L E / E N T I T Y - R E L A T I O N . P r i m a r y K e y / M A X - N O - O F - E N T I T I E S S U P P O R T E R / E M P L O Y E E . P K / 1 D E P E N D E N T / D E P E N D E N T . P K / n E X I S T E N C E O F D E P E N D E N T D E P E N D S O N E X I S T E N C E O F S U P P O R T E R D E C L A R E W E A K E N T I T Y REILATION D E P E N D E N T A T T R I B U T E / V A L U E - S E T : N A M E / F I R S T - N A M E A G E / N O - O F - Y E A R S PRIMARY K E Y N A M E E M P L O Y E E . P K T H R O U G H E M P L O Y E E - D E P E N D E N T Table 5.9: Step 4: Organization of data into entity/relationship relations CHAPTER 5. COMPARISON WITH OTHER WORKS 96 Using the SET model, the S E T schema for the small part of the database in the above manufacturing firm example is shown in table 5.10. The corresponding domain graph for figure 5.17 is shown in figure 5.18. SETNAME DOMAIN DEGREE INTENTION Primitive base sets entity sets EMPLOYEE e:EMPLOYEE set of employees in the firm PROJECT prtPROJECT set of projects in the firm DEPENDENT dep:DEPENDENT set of employees dependents Nonprimitive value sets FIXED z:REAL z is any valid fixed number between 0 and 100.0 VEMPNUM enum:INTEGER a value set for employee numbers VFN fn: STRING a value set for first names VLN In: STRING a value set for last names VNAME fn:VFN,ln:VLN a value set for names W E A R S yr: INTEGER a value set for number of years VPROJNUM pn:INTEGER a value set for project numbers VPERCENT pcFIXED a value set for percentage Nonprimitive base sets and defined sets EMPNUM e:EMPLOYEE,enum: VEMPNUM <1,1>,<0I1> each e is identified by a unique enum EMPNAME e:EMPLOYEE,<fn,-ln>:VNAME <i,l> .<(>,*> each e has a legal name EMPALN e:EMPLOYEE,<f n. ln> :VNAME <l,i>,<0 1»> each e has an alternate name EMPAGE e:EMPLOYEE,yr:VYEARS <l.i>1<0,*> each e is yr old PROJNUM pnPROJECT.pn: VPROJNUM <1,1>,<0.1> each pr is identified by a unique pn EMPPROJ e:EMPLOYEE,pr:PROJECT <1,*>,<1,*> associate e with pr PERCENTAGE <e, pr> :EMPPR0J,pc: VPERCENT <l,i>,<0,*> percentage of time work on pr by an e EMPDEP e:EMPLOYEE,dep:DEPENDENT «>,*>,<1,1> associate e with his dep(s) if any EMPDEPNAME <e, dep>: EMPDEP.nm: VNAME <1,1>,<<>,*> each empdep has a name EMPDEPAGE <e, dep>: EMPDEP.yr: VYEARS <i,i>1<0,*> each empdep is yr old Table 5.10: The SET schema for the manufacturing firm database The results of the table design are: The tables required are: Table's name: EMPLOYEE attributes: EMPNUM with value set VEMPNUM EMPNAME with arity predecessor(s) : VFN VLN EMPALN with arity predecessor(s) : VFN VLN EMPAGE with value set VYEARS CHAPTER 5. COMPARISON WITH OTHER WORKS Primary key for this table is EMPNUM Table's name: PROJECT attributes: PROJNUM with value set VPROJNUM Primary key for this table is PROJNUM Table's name: DEPENDENT attributes: EMPDEP with identifier EMPNUM EMPDEPNAME with arity predecessor(s) : VFN VLN EMPDEPAGE with value set VYEARS i Primary key for this table is EMPNUM-EMPDEPNAME Table's name: VNAME attributes: VNAME with arity predecessor(s) : VFN VLN This table is all-key Table's name: EMPPROJ attributes: EMPPROJ with arity predecessor(s) : EMPLOYEE PROJECT PERCENTAGE with value set VPERCENT Primary key for this table is EMPPROJ DONE DEPARTMENT) DEPT-EMP t<1,1>_ EMPLOYEE <V>_ EMPPROJ SUPPLIER PROJECT PROJ-MGR S-P-J "T<1*> PROJ-PART <1,*> PART IT <i ,*> EMPDEP f <1,1> DEPENDENT COMPONENT Figure 5.18: Domain graph for firm database SET schema CHAPTER 5. COMPARISON WITH OTHER WORKS 98 Comparing figure 5.17, table 5.8, table 5.9 and figure 5.18, table 5.10, the decla-rations in the S E T schema provide a more concise and accurate information of the database schema to the database designer, the database administrator and the users. The domain graph is superior than the E R D in term of the degree constraints repre-sentation, clarity and simpler representation constructs. Furthermore, the table design program can automatically generate the required set of tables based on the information in the S E T schema. 5 . 2 . 1 E x t e n d e d E n t i t y - R e l a t i o n s h i p d a t a m o d e l The extended entity-relationship model is an extension of entity-relationship model with the following additional concepts: Extended classes of objects - The capabilities to represent generalization hier-archies and subset hierarchies. A category is a subset of entities from an entity set. Categories of an entity set share some common attributes. Categories allow the representation of generalization hierarchies and subset hierar-chies. Figure 5.19 shows an example of such hierarchies in E E R D and their corresponding domain graph representation. Extended entity-relations hip constructs -• Degree of a relationship: unary, binary, ternary. CHAPTER 5. COMPARISON WITH OTHER WORKS 9 9 • Connectivity of a relationship: 1:1, 1:N, M:N. • Membership class in a relationship: mandatory, optional. Examples are shown in figure 5.20. EMPLOYEE EMPLOYEE | MANAGER [SECRETARY |TEaVllCtAN| EMPvffEMPLOYEE | MANAGER |-<0,1> ^<1,1» EMPJT * VJOB-TTTLE EERD Defined set MANAGER: {e:EMPLOYEE| [For some <ejt>:EMPJT] jt = 'manager11 set of managers} Simaarly for SECRETARY AND TECHNICIAN. Note that the sets MANAGER, SECRETARY AND TECHNICIAN do not overlap. Generalization hierarchy Domain Graph EMFLOYEE EMPLOYEE E M P . '.STUDENf^ EMP.POLmCIAN j EMP.STUDENJ-<a—|PRCfESSCN VPROFESSICN| <0,1> i <0,*> EERD Defined set EMP.STUDENT: {e:EMPLOYEE|[For some <ep>:PROFESSION] p='student'|set of student employees} Similarly for EMP.POUTICIAN. Note that EMP.STUDENT and EMP.POUT1GIAN may overlap. Domain Graph Subset hierarchy Figure 5.19: Generalization and subset hierarchies The addition of new concepts and new representations in E R D defeats the simplic-ity of the original entity-relationship model which facilitates communications among users and database designers. Note that the degree' concept in the S E T model covers the degree, connectivity, membership class concept in the extended entity-relationship model without introducing several additional new constructs. CHAPTER 5. COMPARISON WITH OTHER WORKS 100 Connectivity Membership c lass Mandatory Optional EER constructs SET model constructs Figure 5.20: Extended entity-relationship constructs 5.3 Future Extensions The objective of the SET model is to model all phases of the database design. It tries to provide uniformity of representation, with no distinction between the conceptual schema and the tuples (relations) recorded in the database. The kernel as well as the specification and query language, called D E F I N E , for the S E T model is currently being developed by Gilmore & Morrison. In this system, only the members of the primitive base sets are recorded. The members of the defined sets (tables) are computed (defined) upon query. These significantly reduce the storage space, at the same time providing an easy control on integrity constraint. CHAPTER 5. COMPARISON WITH OTHER WORKS 101 5.4 Conclusion We have described and presented a new approach of database design employing the S E T conceptual data model. In addition, we have also demonstrated the relative ease of conceptual modelling using the SET conceptual model compared to the entity-relationship model. The domain graph method of table design described in this thesis greatly reduces i the complex process of normalization and analysis of complex functional and multival-ued dependencies. The automated table design algorithm presented in this thesis produces the opti-mum number of tables to record the data for a given SET schema without losing any information. The table design program also reproduces unique results for a given SET schema. A n example application has also been presented to illustrate the database design process. This approach of database design benefits from both the simplicity of the S E T model conceptual design and the simplicity of the relational model structures. Bibliography [1] P. Atzeni, C . Batini, M . Lenzerini, and F . Villanelli. INCOD: a system for conceptual design of data and transactions in the entity-relationship model. Entity-Relationship Approach to Information Mod-eling and Analysis, 1981. [2] C . Bachman. The role concept in data models. In Proceedings of the 3rd International Conference on Very Large Databases, pages 464-476, Tokyo, October 1977. [3] D . G . Bobrow and T . Winograd. A n overview of K R L , a knowledge representation language. Cognitive Science, l(l):3-46, 1977. [4] H . Briand, H . Habrias, J . Hue, and Y . Simon. Expert system for translating an E - R diagram into databases. In Proceedings of the 4th International Conference on Entity-Relationship Approach, pages 199-206, I E E E Computer Society Press, Silver Spring, Md. , Chicago, 1985. [5] P.P. Chen. The entity-relationship modehtoward a unifying view of data. A CM Transactions on Database Systems, l(l):9-36, March 1976. [6] D.S. Ross Christopher. Intelligence databases. BYTE, January 1987. [7] I. Chung, F; Nakamura, and P. Chen. A decomposition of relations using the entity-relationship approach. Entity-Relationship Approach to Information Modeling and Analysis, 1981. [8] E . F . Codd. Extending the relational model to capture more meaning. ACM Transactions on Database Systems, 4(4), December 1979. [9] E . F . Codd. A relational model of data for large shared data banks. CACM, 13(6), June 1970. I 102 BIBLIOGRAPHY 103 [10] C . J . Date. An Introduction to Database Systems. Volume 1, Addison-Wesley, fourth edition, 1985. [11] C . J . Date. An Introduction to Database Systems. Volume 1, Addison-Wesley, third edition, 1981. [12] J.Doyle. A truth maintenance system. A.I. journal, 12:231-272, 1979. [13] R. Elmasri, A . Hevner, and J . Weeldreyer. The category concept: an extension to the entity-relationship model. Data Knowledge Engineer-ing, 1(1):75-116, 1985. [14] P. C . Gilmore. A Foundation for the Entity-Relationship Model: Why and How. Technical Report 87-10, Dept. of Computer Science, Uni-versity of British Columbia, Vancouver, B .C . , Canada, 1987. [15] P. C . Gilmore. Justifications and Applications of the Set Conceptual Model. Technical Report 87-9, Dept. of Computer Science, University of British Columbia, Vancouver, B . C . , Canada, 1987. [16] P. C . Gilmore. Natural deduction based set theories: a new resolution of the old paradoxes. Symbolic Logic, 51(2):393-411, June 1986. [17] P. C . Gilmore. The Set Conceptual Model and the Domain Graph Method of Table Design. Technical Report 87-7, Dept. of Computer Science, University of British Columbia, Vancouver, B . C . , Canada, 1987. [18] R. Goebel. A Logic Data model for the Machine Representation of Knowledge. PhD thesis, Dept. of Computer Science, University of British Columbia, June 1985. [19] S. Jajodia and P. Ng. On the representation of relational structures by entity-relationship diagrams. Entity-Relationship Approach to Soft-ware Engineering, 223-248, 1983. [20] S. Jajodia and P. Ng. Translation of entity-relationship diagrams into relational structures. System Software, 4:123-133, 1984. [21] S. Jajodia, P.A. Ng, and F . N . Springsteel. The problem of equivalence for entity-relationship diagrams. IEEE Transactions on Software En-gineering, SE-9(5):617-630, September 1983. BIBLIOGRAPHY 104 W. Kent. A simple guide to fifth normal forms in relational database theory. Communications of the ACM, 26(2):120-125, February 1983. A . Lee. Surveys on truth maintenance systems. March 1987. Term paper, U B C . M . Lenzerini and G . Santucci. Cardinality constraints in the entity-relationship model. Entity-Relationship Approach to Software Engi-neering, 529-549, 1983. A . J . Levesque and R.J . Brachman. A fundamental tradeoff in knowledge representation and reasoning. In Proceedings of CSCSI, pages 141-152, London, Ontario, Canada, 1984. original. T . Ling. An Analysis of Multivalued and Join Dependencies based on the ER approach. Technical Report TRA4/86, Dept. of Informa-tion Systems and Computer Science, National University of Singapore, Kent Ridge, Singapore, April 1986. T . Ling. A normal form for entity-relationship diagrams. In Pro-ceedings of the 4th International Conference on Entity-Relationship Approach, pages 24-35, I E E E Computer Society Press, Silver Spring, Md. , Chicago, 1985. S. B. Navathe and R. Elmasri. Integrating user views in database design. IEEE Computer, 50-62, January 1986. S.B. Navathe and S.G. Gadgil. A methodology for view intergration in logical database design. In Proceedings of the 8th International Conference on Very Large Databases, pages 142-164, Mexico city, 1982. H . Sakai. Entity-relationship approach to logical database design. Entity-Relationship Approach to Software Engineering, 155-187, 1983. P. Scheuermann, G . Scheffner, and H . Webber. Abstraction capabili-ties and invariant properties modelling within the entity-relationship approach. Entity-Relationship Approach to Systems Analysis and De-sign, 121-140, 1980. [32] T . J . Teorey, D. Yang, and J.P. Fry. A logical design methodology for relational databases using the extended entity-relationship model. Computing Survey, 18(2), June 1986. BIBLIOGRAPHY 105 [33] D . C . Tsichritzis and F . H . Lochovsky. Data Models. Prentice Hall, Engelwood-Cliffs, New Jersey, U.S.A., 1982. [34] N . Webre. A n extended E-R model and its use in a defense project. In Proceedings of the 2nd International Conference on Entity-Relationship Approach, pages 175-194, North-Holland,Amsterdam, Washington, D . C , 1981. [35] G . Wiederhold. Database Design. Megraw-Hill, New York, 1977. [36] E . Wong and R. Katz. Logical design and schema conversion for rela-tional and D B T G databases. Entity-Relationship Approach to Systems Analysis and Design, 311-322, 1980. [37] H . K . T . Wong and J . Mylopoulos. Two views of data semantics: a survey of data models in A l and D B M . INF OR, 15(3):344-383, 1977. [38] D. Yang, T . J . Teorey, and J.P. Fry. A Practical Approach to trans-forming extended ER Diagrams into the relational model. Technical Report CRL-TR-6-85, Computing Research Laboratory, Electrical En-gineering and Computer Science Department University of Michigan, Ann Arbor, 1985. [39] S.B. Yao, V . E . Waddle, and B . C . Housel. View modelling and integra-tion using the functional data model. IEEE Transactions on Software Engineering, SE-8(6):544-553, November 1982. Appendix A Normalization Procedures A good design principle is to have one fact in one place [11]. Normalization theory is basically a formalization of this idea. Definition: A relation is said to be normalized (in its First Normal form) if it satisfies the following condition: each attribute value in each tuple in the relation is atomic (nondecomposible so far as the system is concerned). Definition: First N o r m a l F o r m ( I N F ) A relation is said to be in INF if and only if it satisfies the constraint that all the underlying domains of attributes contain atomic values only. Definition: Second N o r m a l F o r m (2NF) A relation is in 2NF if and only if it is in INF and every nonkey attribute is fully dependent on the primary key. An attribute is nonkey if it does not participate in the primary key. Definition: T h i r d N o r m a l F o r m (3NF) 106 APPENDIX A. NORMALIZATION PROCEDURES 107 A relation R is in 3NF if and only if it is in 2NF and every nonkey attribute is nontransitively dependent on the primary key. That is, at all time, each tuple of R consists of a primary key value that identifies some entity, together with a set of mutually independent attribute values that describe that entity in some way. Two attributes are mutually independent if neither is functional dependent on the other. Definition: B o y c e / C o d d N o r m a l F o r m ( B C N F ) A relation is in B C N F if and only if every determinant is a candidate key. A n attribute, possibly composite, on which some other attribute(s) is/are fully functional dependent is a (functional) determinant. Definition: Four th N o r m a l F o r m (4NF) A relation is in 4NF if and only if whenever there exists a multivalued dependency in R, say R.A ->-> R.B, then all attributes of R are also functional dependent on attribute R.A . That is, R . A -> X for all attribute X of R. Definition: Fi f th N o r m a l F o r m (5NF) [22], also called Projection Jo in N o r -m a l F o r m ( P J / N F ) A relation R is in 5NF if and only if every join dependence in R is implied by the candidate keys of R. Join dependency: A relation R satisfies join dependence *(X,Y,.. ,Z) if and only if it is the join of its projections on X , Y , . . , Z , where X , Y , . . , Z are subsets of the set of attributes of R. Definition: D o m a i n K e y N o r m a l F o r m ( D K / N F ) APPENDIX A. NORMALIZATION PROCEDURES 108 A relation is in D K / N F if and only if every constraint on the relation is a logical consequence of key constraint and domain constraint. Key constraint: when a certain attribute or a combination of attributes is a candidate key. Domain constraint: when the values of a certain attribute is restricted to some prescribed range of values. Reduction process 1. Take projections of the original INF relation to eliminate any non null functional dependencies on the primary key. This will generate a collection of 2NF relations. 2. Take the projections of these 2NF relations to eliminate any transitive dependen-cies. This will generate a collection of 3NF relations. 3. Take projections of these 3NF relations to eliminate any remaining functional dependencies in which the determinant is not a candidate key. This will generate a collection of B C N F relations. 4. Take projections of these B C N F relations to eliminate any multivalued depen-dencies that are not also functional dependencies. This will generate a collection of 4NF relations. 5. Take projections of these 4NF relations to eliminate any join dependencies that are not implied by the candidate keys. APPENDIX A. NORMALIZATION PROCEDURES 109 This reduction process may be quite complex and requires human intervention. 

Cite

Citation Scheme:

        

Citations by CSL (citeproc-js)

Usage Statistics

Share

Embed

Customize your widget with the following options, then copy and paste the code below into the HTML of your page to embed this item in your website.
                        
                            <div id="ubcOpenCollectionsWidgetDisplay">
                            <script id="ubcOpenCollectionsWidget"
                            src="{[{embed.src}]}"
                            data-item="{[{embed.item}]}"
                            data-collection="{[{embed.collection}]}"
                            data-metadata="{[{embed.showMetadata}]}"
                            data-width="{[{embed.width}]}"
                            async >
                            </script>
                            </div>
                        
                    
IIIF logo Our image viewer uses the IIIF 2.0 standard. To load this item in other compatible viewers, use this url:
https://iiif.library.ubc.ca/presentation/dsp.831.1-0051915/manifest

Comment

Related Items