UBC Theses and Dissertations


The design of semantic database model SDBM Xie, Linchi 1987


THE DESIGN OF SEMANTIC DATABASE MODEL SDBM

by

LINCHI XIE

B.Sc. Shanghai Jiao Tong University

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE (BUSINESS ADMINISTRATION)

in

THE FACULTY OF GRADUATE STUDIES
Faculty of Commerce and Business Administration

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
February 1987
© Linchi Xie, 1987

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at The University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the Head of my Department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Faculty of Commerce and Business Administration
The University of British Columbia
2075 Wesbrook Place
Vancouver, Canada V6T 1W5

Date: February 1987

ABSTRACT

This thesis is mainly concerned with semantic data modelling as it relates to database design. The domain of this research is restricted to general data modelling and the discussion is carried out at the conceptual level. The thesis assesses a number of serious modelling shortcomings of the conventional data models and reviews several basic principles and mechanisms developed in current semantic data modelling research. Based on these findings, the thesis identifies the inadequacy of the prevailing conceptualization of data modelling and develops the two-view conceptualization of data modelling. The basic idea behind the two-view conceptualization is that the conceptual structure of the applications being modelled should be separated from its external data representation. A new semantic database model, SDBM, is designed based on the conceptualization.
The model makes a clear separation between the conceptual structure and its external data representation. It offers a data type mechanism to deal with the data representation, a window mechanism to model the conceptual structure, and a transaction mechanism to provide database operations. One of its major extensions over current semantic data models is that, in SDBM, the specialization relationship is just a special case of the constraints that can be specified among SDBM windows. A formal syntax and informal semantics of SDBM are given in the thesis, along with comparisons between SDBM and a closely related semantic data model, TAXIS.

TABLE OF CONTENTS

ABSTRACT
TABLE OF CONTENTS
ACKNOWLEDGEMENTS
1. INTRODUCTION
   1.1 OBJECTIVES
   1.2 APPROACH
   1.3 PREVIOUS WORK
   1.4 THESIS OUTLINE
2. REVIEW OF DATA MODELLING
   2.1 SEMANTIC MODELLING PROBLEMS OF THE RELATIONAL DATA MODEL
      2.1.1 TUPLE ORIENTATION
      2.1.2 SEMANTIC OVERLOADING
      2.1.3 ATOMIC DOMAINS
      2.1.4 NORMALIZATION
      2.1.5 RELIANCE ON USER-CONTROLLED IDENTIFIERS
      2.1.6 GENERALIZATION/SPECIALIZATION
      2.1.7 REQUIREMENT OF HOMOGENEITY
      2.1.8 AGGREGATION
   2.2 SEMANTIC EXTENSIONS TO THE RELATIONAL DATA MODEL
      2.2.1 EXTRA CONSTRAINTS
      2.2.2 SEMANTIC CLASSIFICATION OF RELATIONS
      2.2.3 THE ENTITY-RELATIONSHIP APPROACH
      2.2.4 CONCLUSIONS
3. REVIEW OF SEMANTIC DATA MODELLING
   3.1 BASIC PRINCIPLE
   3.2 OBJECT ORIENTATION
   3.3 DATA ABSTRACTION
      3.3.1 CLASSIFICATION
      3.3.2 AGGREGATION
      3.3.3 GENERALIZATION/SPECIALIZATION
      3.3.4 ASSOCIATION
   3.4 SEMANTIC INTEGRITY CONSTRAINTS
   3.5 APPLICATION OF DATA TYPES
      3.5.1 USE OF THE DATA TYPE CONCEPT
      3.5.2 DATA TYPE STRUCTURES
4. THE CONCEPTUALIZATION OF DATA MODELLING
   4.1 THE UNDERLYING PHILOSOPHY
   4.2 THE TWO-VIEW CONCEPTUALIZATION
      4.2.1 A GENERAL DESCRIPTION
      4.2.2 SDBM OBJECTS
      4.2.3 SDBM RELATIONSHIPS
      4.2.4 SDBM PROPERTIES
      4.2.5 SDBM TYPE CONCEPT
      4.2.6 SDBM WINDOW CONCEPT
      4.2.7 AN EXTENSION TO SPECIALIZATION MECHANISM
5. OVERVIEW OF SDBM
   5.1 SOME NOTATION CONVENTIONS
   5.2 SDBM TYPE SYSTEM
      5.2.1 ELEMENTARY DATA TYPES
         5.2.1.1 NUMERIC
         5.2.1.2 BOOLEANS
         5.2.1.3 CHARACTERS
         5.2.1.4 ENUMERATION
      5.2.2 STRUCTURED DATA TYPES
      5.2.3 ABSTRACT DATA TYPES
      5.2.4 SUBTYPE HIERARCHIES
   5.3 SDBM WINDOW SYSTEM
      5.3.1 SDBM SUBWINDOW HIERARCHY
      5.3.2 CONSTRAINTS AMONG WINDOWS
   5.4 SDBM TRANSACTION SYSTEM
      5.4.1 TRANSACTION AS ABSTRACTION OPERATION
      5.4.2 TRANSACTION DESIGNATOR
      5.4.3 TRANSACTION BODY
         5.4.3.1 ASSIGNMENT
         5.4.3.2 CREATION
         5.4.3.3 DESTRUCTION
         5.4.3.4 INSERTION
         5.4.3.5 MODIFICATION
         5.4.3.6 DELETION
         5.4.3.7 RETRIEVAL
         5.4.3.8 COMPOUND STATEMENT
         5.4.3.9 CONDITIONAL STATEMENT
         5.4.3.10 ITERATION STATEMENT
         5.4.3.11 ABORT STATEMENT
   5.5 EXPRESSIONS
   5.6 OPERATIONS ON MULTI-VALUED PROPERTIES
      5.6.1 ADDITION
      5.6.2 REMOVAL
      5.6.3 MODIFICATION
6. AN APPLICATION EXAMPLE OF SDBM
   6.1 A HYPOTHETICAL PROJECT MANAGEMENT SYSTEM
   6.2 A SDBM PROJECT MANAGEMENT DATABASE
   6.3 TRANSACTIONS FOR END USER OPERATIONS
   6.4 SUMMARY
7. CONCLUSIONS
BIBLIOGRAPHY
APPENDIX A. SYNTAX OF SDBM
APPENDIX B. COMPARISONS BETWEEN SDBM AND TAXIS
   B.1 SIMILARITIES
   B.2 DIFFERENCES

ACKNOWLEDGEMENTS

I am deeply indebted to many people who have helped me throughout my graduate study at the University of British Columbia. Here, I can only mention those who have been involved in my thesis work. I wish to express my sincere thanks to my supervisor Prof. Robert C.
Goldstein for his guidance and encouragement. I truly appreciate his time and effort in supervising my study. I have learnt so much from his database course, which greatly stimulated my interest in database research. I also wish to thank Prof. Yair Wand for his insightful comments on the thesis, which have helped to make it more complete. Many thanks to my third reader, Mr. Albert Fowler. I would like to thank Ms. Grace Wong, who has made my stay in Canada much easier. I can always turn to her for help. Thanks also to Ms. Mackie Chase, who took pains to correct many typos and English language errors in my thesis draft, which makes this final version more intelligible. Deep thanks to my friend Kefeng Xu, who typed part of this thesis. We share a lot of serious thoughts as well as many not-so-serious jokes. My study at the University of British Columbia is financially supported by the Canadian International Development Agency.

1. INTRODUCTION

1.1 OBJECTIVES

It has been recognized that the tools for database design should be oriented more to the application-minded user than to the computer expert. The application-minded user is interested in the modelling of real world applications, which are described in terms of entities, their attributes, their relationships, and their changes. Therefore, data models used as conceptual modelling tools for database design should be based on real world entities as directly as possible.

As a result of this recognition, during the past few years many researchers have devoted a considerable amount of effort to so-called semantic data modelling. Semantic data modelling has the goal of capturing more of the meaning of the 'real' world being modelled than the conventional data models can. A large number of semantic data models have been proposed.
A few examples are the Data Semantics model (Abrial [1974]), the Entity-Relationship model (Chen [1976]), the Semantic Hierarchy Model (Smith & Smith [1976]), the RM/T data model (Codd [1979]), the TAXIS data model (Mylopoulos et al [1980]), the DAPLEX data model (Shipman [1982]), and the Semantic Data Model (Hammer & McLeod [1981]). From their work, a set of concepts, mechanisms, and methodologies for developing database-based information systems at the conceptual level has been established.

At the same time, many important concepts and methodologies have been developed in programming languages. Consider, for example, the data typing concept, abstract data types, data and procedural abstractions, control structures, and exception handling. These concepts have found extensive use in database design. Software engineering concepts have also been applied to databases in the development of data modelling tools and techniques (Brodie et al [1984]). The result is that many recently developed semantic data models are not just data models. Rather, they are database programming languages which integrate data models and programming languages.

In this thesis, through designing SDBM (Semantic DataBase Model), we will investigate alternatives for conceptualizing semantic data modelling and alternatives for incorporating programming language constructs and semantic data modelling concepts. SDBM is designed to offer a clearer conceptual view of database designs and to offer greater flexibility in certain aspects than other relevant data models.

The most important aspect of this investigation is the conceptualization of data modelling in terms of two different views. On one side, we consider the 'real' world[1] to be modelled as a collection of objects. These objects have properties and are interrelated in a certain structure. On the other side, objects, relationships, and properties have to be represented in the computer.
On this side, the focus is on how data is represented in the computer. This is the external view of objects, relationships, and properties. The SDBM type concept is designed for external data representation. For data modelling to be complete, an appropriate correspondence between these two sides must be provided. The SDBM window concept is introduced to provide such correspondence. For the user, an SDBM window allows him to see part of the 'real' world. For the designer, SDBM windows also specify the conceptual structure among objects, relationships, and properties. It is our conceptualization that differentiates our approach from the previous work.

We will also investigate the separation between the type hierarchy and the window hierarchy in databases. However, most of all, we are interested in providing a data model that has greater modelling ability than existing data models. We will demonstrate that our conceptualization of data modelling is sound through the design of the semantic database model, SDBM.

[1] The 'real' world refers to the part of the real world that we are interested in.

1.2 APPROACH

This thesis starts with the question 'What are the data modelling problems of the conventional data models?'. In order to limit the research to a manageable range, we focus our attention on the relational data modelling methodology, which is currently widely used and is expected to be used even more in the future. For the same reason of manageability, the domain of this research is restricted to general data modelling. By that, we mean that we do not intend to make our model capable of modelling every aspect of the 'real' world. For example, we exclude modelling of data from special purpose applications such as CAD/CAM and office procedures. What we are interested in is data that can be regularly formatted and data that is managed in many current databases.
Examples which fall into our modelling realm include student registration data, personnel data, payroll data, and inventory data. Within this perspective, semantic data modelling problems of the relational data model are identified through analyzing the data modelling implications of some of its presumptions and inherent constraints, and through analyzing its inability to provide the basic modelling mechanisms that are necessary at the conceptual level. These problems set up the major target of this research.

To overcome these identified problems, a semantic database model, SDBM (Semantic DataBase Model), is designed, primarily for the purpose of database design. The underlying mechanisms of this model come from two major disciplines, semantic data models and programming languages. In this research, semantic data modelling is viewed as a set of well-established concepts including object orientation, classification, aggregation, generalization, and the concept of transaction. They are the basic tools for organizing and manipulating objects, and are essential to almost all the semantic data models developed recently.

SDBM is designed to be a management facility for objects which are, conceptually, the most elementary units that can be operated on. Two other conceptual constructs are relationships and properties. Objects, relationships, and properties together form the conceptual world, which is the conceptual counterpart of the 'real' world. This conceptual world may be viewed from different angles; different views may be related to each other; and different viewers (users) may have different requirements for information details. To cope with this variety, we will develop a window concept. We will actually specify the conceptual structure through this window concept.

The computer data representation aspect of SDBM objects is handled by integrating into the system a data type facility.
The system provides some built-in data types and some construction mechanisms to create complex data types. This facility enables certain automatic semantic integrity enforcements.

Through analysis of the relational data model, it is clear that the record data structure is a major cause of its data modelling problems. To solve these problems, we may abandon the record structure and look for some completely different structures, or we may modify the record structure and the way it is used. The latter alternative is taken in this research, because our further investigation reveals that the trouble with the record structure is only part of the story. The real underlying cause of relational data modelling problems is that objects are confusingly mixed up with their computer data representations. Namely, the problems mainly lie in the way the record data structure is being used. An SDBM record structure is therefore designed. This kind of record permits complex component data types. Certain important inter-component constraints can be specified as part of the type definition.

The advantages of static type-checking are recognized by designing SDBM to be typed. In SDBM, every denotable value has a type. Every expression can be statically type checked. A generalization/specialization mechanism is also provided in the SDBM type facility. Type hierarchy rules exist to check the correctness of specification. An aggregation mechanism is used in data type construction and window construction.

In addition to static type-checking, SDBM also enforces semantic integrity constraints through window checking. Window checking is primarily used for referential constraints, relationship constraints, and uniqueness constraints.

The behavioral aspect of the database is modelled by SDBM transactions, which are considered to be the most basic operation units that can be used by the end user to manipulate a database.
The SDBM transaction is constructed from statements, which are the smallest conceptually meaningful operations. Several sequence control mechanisms are borrowed from programming languages to enhance the expressiveness of SDBM transactions.

In summary, the approach taken in this research is to first identify the semantic data modelling problems of the relational data model and the basic mechanisms used in current semantic data modelling research. Then the two-view conceptualization of data modelling is formed. Based on this conceptualization, several fundamental data modelling concepts and mechanisms are re-interpreted and some necessary modelling mechanisms are added. Finally, a semantic database model, SDBM, is designed through the integration of these concepts and mechanisms.

1.3 PREVIOUS WORK

A considerable amount of prior research work in the semantic data modelling area has influenced the design of SDBM. Usually, such research work results in new models or modelling methodologies. This section identifies previous work related to this thesis and argues the relative advantages of SDBM by pointing out shortcomings of some closely related work.

In his landmark paper, 'Data Semantics', Abrial investigated the modelling of data semantics in a mathematical model based on entities and binary relationships (Abrial [1974]). The model is rather abstract and is viewed as a conceptual tool for understanding data semantics.

Hammer, Schmidt, and Weber identified the applicability of data type concepts in databases (Hammer [1976]), (Schmidt [1978]), (Weber [1976]). Brodie is one of the first researchers to develop a semantic data model which applies the data type concept to the specification and verification of databases (Brodie [1978]). His model focuses on the logical properties of data expressed in denotational specifications.

McLeod and Hammer designed a semantic data model, SDM, for database design and documentation purposes (McLeod & Hammer [1981]).
The model is not computer executable. Subsequently, they developed a database programming language, DIAL, which is based on SDM (Hammer & Berkowitz [1980]). The data type concept is supported by DIAL. However, the data type system is not integrated with features from data modelling. The data type system only provides a very limited number of data types and type construction mechanisms. Moreover, the concept of data type and the concept of class[2] are not separated.

The two semantic data models that are most directly related to the design of SDBM are TAXIS and Galileo. TAXIS is a database programming language which provides a framework for information system design and is based on the concept of class, which is applied uniformly to all aspects of data modelling (Mylopoulos et al [1980]). Therefore, we have classes such as metaclasses, data classes, and transaction classes. The fundamental modelling mechanism of TAXIS is inheritance. TAXIS is not a typed database programming language. Well-known techniques for data type checking in programming languages cannot be directly applied. TAXIS provides very simple data structures. Only the tuple data structure is available. In addition, in TAXIS everything is an object and therefore everything is grouped into classes. Although such a uniform approach to data modelling makes the model simple because of the scarcity of basic modelling constructs, it is not always desirable and may be excessive in certain circumstances. One of the problems with such an approach is conceptual overloading of the class concept. For example, in a TAXIS program the sole purpose of some classes is to define another class. The former classes are conceptually different from the latter and apparently do not directly interest the database designer in determining the conceptual structure of the 'real' world.

[2] The concept of class will be explained in Section 3.3.
Moreover, TAXIS maintains no clear separation between the type and class concepts. Each data type has a unique class. Finally, in TAXIS the type associated with a class is restricted to a tuple type. That is, objects always have values of the tuple type, and they can be related to each other only through properties which are single-valued functions.

Galileo was designed by Albano (Albano [1985]). It is designed as a strongly typed database programming language and utilizes the benefits of static type checking. A generalization mechanism is built into the type system of Galileo. The type hierarchy in Galileo specifies the intensional aspect of its class hierarchy. With Galileo, it is not possible to construct two extents[3] on the same data type, and therefore different classes must have different data types. The type hierarchy of Galileo, except for abstract types, is specified implicitly and established through automatic enforcement of Galileo type hierarchy rules. Such implicitness is believed to be detrimental to the development of reliable database information systems. In addition, the referential constraints between objects of different classes have to be specified manually, along with their processing procedures, in Galileo. Finally, it is worth pointing out that Galileo is not a data model. Rather, it is a database programming language. It is very complex when compared with many other semantic data models.

[3] That is, two sets of associated instances. A set of associated instances is usually managed in terms of a class in most semantic data models.

From our point of view, the most serious problem with these semantic data models is their underlying conceptualization of data modelling. They have not made a clear distinction between the conceptual structure (objects, relationships, and properties) and its external data representation. We will elaborate on this point later.
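The distinction drawn in this section, between a data type (how values are represented) and an extent (which instances are grouped together), can be illustrated with a small Python sketch. It is not SDBM, TAXIS, or Galileo syntax; the names PersonRecord and Extent are hypothetical and chosen only for illustration. The sketch shows two separately managed collections built over one and the same representation type, which is the situation the text says cannot be expressed when each class must have its own data type.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PersonRecord:
    """Representation only: how a person is written down (hypothetical type)."""
    name: str
    birth_year: int


@dataclass
class Extent:
    """A named set of instances, kept separate from the type that represents them."""
    name: str
    record_type: type
    members: list = field(default_factory=list)

    def add(self, record):
        # The type is checked, but it does not decide which extent a record joins.
        if not isinstance(record, self.record_type):
            raise TypeError(f"{self.name} only accepts {self.record_type.__name__}")
        self.members.append(record)


# Two conceptually different collections share one representation type.
students = Extent("STUDENT", PersonRecord)
employees = Extent("EMPLOYEE", PersonRecord)

lee = PersonRecord("Lee", 1960)
students.add(lee)
employees.add(lee)  # the same person seen from a second angle
employees.add(PersonRecord("Kim", 1955))
```

Keeping the type and the extent as separate notions is the only design choice being illustrated here; nothing in the sketch is meant to reproduce the SDBM window mechanism itself.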
1.4 THESIS OUTLINE

This first chapter has given the reasons for undertaking the research, presented the research approach to be taken, and discussed the related previous work.

The next chapter provides a review of conventional data modelling. It focuses on identifying the semantic modelling problems associated with the relational data model and problems with some of its direct extensions. These problems set up the goal for SDBM.

Chapter 3 summarizes several of the most important features of current semantic data modelling. Specific semantic data models are not discussed. These features provide some basic modelling mechanisms for the design of SDBM.

Chapter 4 gives a description of our conceptualization of data modelling. It explains the basic concepts such as objects, properties, relationships, data types, roles, and windows. The chapter also discusses the specialization mechanism based on the conceptualization and proposes some extensions to this mechanism.

In Chapter 5, an overview of SDBM is given. The overview is divided into three major parts: the SDBM type system, the SDBM window system, and the SDBM transaction system. Syntax is given informally by using prototypes. The semantics of each syntactic construct are also given informally, both through examples and through explanation in plain English. A formal syntax of SDBM, written as a context-free grammar, is provided in APPENDIX A.

Chapter 6 gives an example of using SDBM to design a hypothetical database. The example is highly simplified in order to emphasize the underlying principles and basic structures of the model. In APPENDIX B, we compare the database model, SDBM, and the data model, TAXIS. Finally, Chapter 7 concludes the thesis by summarizing the contributions of this research.

2. REVIEW OF DATA MODELLING

2.1 SEMANTIC MODELLING PROBLEMS OF THE RELATIONAL DATA MODEL

There are serious semantic problems associated with conventional data models in general.
For obvious reasons, the hierarchical data model and the network data model are basically oriented towards the computer instead of the human user. At the conceptual level, their modelling ability is limited. In this thesis, we will not consider these two data models but focus attention on the relational data model only.

2.1.1 TUPLE ORIENTATION

The relational data model has the relation as its modelling primitive. Basically, a relation is a flat file with tuples as its rows[4]. Although tuples are excellent for data processing purposes (primarily because of the current architecture of the computer), sometimes they are not directly related to the primitives in the human problem domain. With tuples, both the database designer and the database user may have to effect a mental transformation from human problem domain primitives, such as objects, to their computer representation, tuples. The transformation is complicated by the complex relationships between relations (or tuples) and conceptual primitives. As will be seen in the following discussion, relationships of this type are not one-to-one mappings. For example, in a relational database, a 'real' world object may be represented in any of the following ways:

1. As part of a tuple.
2. As a single tuple.
3. As a collection of tuples within a relation.
4. As a join of tuples from two or even more relations.

[4] One difference between a relation and a flat file is that rows have no order in a relation while in a flat file they do. However, this difference is not significant in our discussion.

This suggests that a tuple may or may not represent an object. To illustrate the implication of this fact, let us take a look at a relational database consisting of several relations, each with tuples describing certain aspects of employee objects. For example, a relation may describe the employee as a supervisor and another relation may describe him as a research team member.
However, the database management system based on the relational data model is not capable of obtaining and keeping the knowledge that they are just different aspects of the same employee. Therefore, it is impossible for it to carry responsibilities such as that for the cross-relation consistency of the stored data.

The relational representation of properties of an object and relationships between objects suffers from the same problems.

1. A property or a relationship may be represented as part of a tuple that describes an object. That is, a property or a relationship is expressed in terms of coexistence in a tuple. For example, a one-to-many relationship is almost always represented by embedding the relationship into the tuple on the 'many' side.
2. A property or a relationship may be represented in separate tuples. For example, if COLOR is a multi-valued property of object PART, a separate relation has to be created. In fact, for any many-to-many relationship, a separate tuple must be used if normalization is desired.

The worst implication of representing a relationship with several separate tuples may be that the database user might not know of the existence of such a relationship. Or, even if he knows of the existence of a relationship, he may have difficulty in traversing the relevant tuples, because the relational database simply does not explicitly keep the information on this kind of relationship.

Such a non-one-to-one relationship between the relational data modelling primitives and the human conceptual primitives may also cause some difficulties for the relational database user, because he is unable to directly manipulate objects and relationships between objects. The primitives he can directly manipulate are the much less meaningful tuples and relationships among tuples. The desired result may be obtained only as the side effect of this kind of manipulation. For example, the unit of creation and destruction is the tuple in the relational database.
If an object is modelled by two tuples, then it can only be created as the result of inserting two tuples.

Notice that certain conclusions of the above discussion may be subject to disagreement, depending on the interpretation of the relational representation of objects, relationships, and properties. Also notice that because the concept of an object is not well defined (in fact, this is a concept that can't be rigorously defined), we may be able to avoid the problems identified above by adopting the view that a tuple always corresponds to an object and anything that can't be represented by a tuple is not considered an object. However, we believe that designers as well as users do have certain tendencies in perceiving objects, properties, and relationships. We should not try to unnecessarily adapt human designers and users to constraints presented in the relational data model.

2.1.2 SEMANTIC OVERLOADING

The relational data model has only one construct, the relation (and consequently, the tuple). This fact is often quoted as evidence of the simplicity of the relational data model. However, such simplicity does not always lead to a good result. Viewing it from a different angle, we will see that such simplicity can cause difficulty in interpreting a relation. This is sometimes called 'semantic overloading' (McLeod [1978]). That is, a single construct is used for many purposes:

1. To record the existence of objects.
2. To describe properties of an object.
3. To model relationships between objects.

A user may have difficulty in determining which of the above purposes a given relation is used for.

2.1.3 ATOMIC DOMAINS

By definition, the relational data model specifies that each entry of a relation can only be drawn from an atomic domain. And to ensure that a relational database behaves properly, the relations are usually required to be, at least, in the third normal form. 'Atomic domain' means that the domain must be non-decomposable and non-hierarchical.
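As a rough illustration of what the atomic-domain restriction rules out, the following Python sketch treats a row as a tuple and rejects any entry that is itself a collection. The helper names (check_atomic, split_multivalued) and the sample part data are assumptions made for this example only; they are not part of the relational model or of SDBM. The second helper shows the usual workaround of moving a set-valued entry into a separate list of (key, value) rows.

```python
ATOMIC_TYPES = (int, float, str, bool)


def check_atomic(row):
    """Raise ValueError if any entry of the row is not a simple scalar value."""
    for value in row:
        if not isinstance(value, ATOMIC_TYPES):
            raise ValueError(f"non-atomic entry: {value!r}")
    return row


def split_multivalued(row, index):
    """Split a row whose entry at `index` is a set into a base row and detail rows.

    The first entry of the row is assumed to be its key.
    """
    key = row[0]
    base = row[:index] + row[index + 1:]
    detail = [(key, value) for value in sorted(row[index])]
    return base, detail


# One conceptual part with two colours (hypothetical data).
part = ("bolt", 12, "Acme", {"red", "blue"})

try:
    check_atomic(part)
    accepted = True
except ValueError:
    accepted = False  # the set-valued colour entry is rejected

base_row, colour_rows = split_multivalued(part, 3)
# base_row    -> ("bolt", 12, "Acme")
# colour_rows -> [("bolt", "blue"), ("bolt", "red")]
```

After the split, every entry is atomic, but the single conceptual part is now spread over three rows in two collections.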
Several problems are associated with this restriction.

1. Multi-valued properties are difficult to accommodate. Unfortunately, multi-valued properties are phenomena that are confronted frequently in the real world. For instance, a part may have more than one color. In order to remain in the third normal form, the relational data model either introduces an extra relation or adds more attributes to a relation. Conceptually, the first approach is not always desirable. For example, the part relation P(NAME, SIZE, SUPPLIER, COLOR) has to be broken into two relations P1(NAME, SIZE, SUPPLIER) and P2(NAME, COLOR) just because of the multi-valued nature of the COLOR attribute. Conceptually, we would prefer the single-table representation. This approach also introduces additional problems associated with referential integrity (e.g. the integrity constraint between the NAME attribute of the relation P1 and the NAME attribute of the relation P2). The second approach has only very limited applications. First of all, we have to know the maximum number of values a property can have. Secondly, relations created in this way could be very sparse (many null values may occur). Finally, this approach is not symmetrical and natural. Conceptually, we would prefer a single multi-valued attribute to several single-valued attributes.
2. An object domain (a domain whose values directly model objects) is impossible to express. For two major reasons, we would like the domain values of an attribute to directly model a collection of objects:
   a. Under certain circumstances, it is close to the way we think about certain relationships. For example, in the relation STUDENT(STDNO, NAME, DEPT, YEAR), we could treat the domain of attribute DEPT in two different ways. First, we could think of DEPT as a collection of character strings. This is, in fact, the view taken by the relational data model. Alternatively, DEPT could be viewed as a collection of department objects.
If we are interested in modelling departments as objects, the latter approach is more natural.
   b. It helps the computer to precisely specify and enforce semantic integrity constraints. For example, the problem of referential integrity can be solved automatically if object domains are allowed. The relational data model simply does not have such a capability.
3. The relational data model uses very few types of symbolic strings (e.g. integers, reals, and characters) to represent the varieties of real world properties. These few data types are static in that they have predefined operations which cannot be changed. In addition, no abstraction mechanism is provided to structure domains. Therefore, the desired behavioural semantics cannot be attached to domains and subsequently to properties. The result is that two conceptually different properties may have to have the same behavioural semantics. For example, if the part SIZE and the employee AGE are both defined on the domain of two-digit integers, a conceptually meaningless comparison such as AGE greater than SIZE is perfectly legitimate in the relational data model.

2.1.4 NORMALIZATION

It has been realized that, despite the apparent simplicity of the relational data model, particular care has to be taken to prevent a relational database from behaving abnormally. A relational database design theory called normalization was introduced for this very purpose. In a sense, relational normalization increases the ability to capture certain meanings of data (e.g. functional dependencies). However, conceptually it is irrelevant or even undesirable. We will elaborate on this point below.

The relational normalization theory dictates that anomalies, including insertion anomalies, deletion anomalies, and modification anomalies, be eliminated through decomposition or synthesis. There are two basic principles underlying relational normalization: the principle of equivalence and the principle of minimal redundancy.
Our argument is that all these concepts are basically data-processing oriented.

1. Decomposition and synthesis are recommended because the relational data model is handicapped by its limited data structure. An application-minded user does not care what data structure the computer is using, nor whether it decomposes or synthesizes its structure. In addition, there are semantic problems associated with the relational normalization theory itself. For example, normalized relations are usually less meaningful at the conceptual level, since a relation which contains certain information about a 'real' world object is often not in normalized form, and the process of normalizing such a relation tends to split it into several relations which do not correspond to the object being modelled.

2. Equivalence is also basically oriented towards the computer. The basic criterion of equivalence is the lossless join property under the universal relation assumption (Aho et al. [1977], Rissanen [1977], Beeri et al. [1980], Maier et al. [1979], Maier [1983] and Ullman [1983]). To the human user, two tables are usually not equivalent to one table.

3. Minimal redundancy is desirable at the data-processing level, but may not be so at the conceptual level. Conceptual data redundancy is often desired because, for example, the same person may be viewed from different angles or at different levels of detail. Furthermore, even for data processing purposes, relational normalization can only eliminate some kinds of data redundancy. It may in fact introduce new data redundancy.

Obviously, there is considerable lower level data processing involved in relational normalization. The key point is that the system should be designed in terms of semantically meaningful constructs so that analysis and design on the database designer's side, as well as interpretation and manipulation on the user's side, can be facilitated.
The data structure manipulation for the purpose of efficient processing and internal data integrity should be left to the computer and completely hidden from the user.

2.1.5 RELIANCE ON USER-CONTROLLED IDENTIFIERS

In the relational data model, user-controlled names or symbolic references are used to convey a considerable amount of semantic information. For example, objects and relationships are all represented by symbolic names that are chosen at the user's discretion. Whether the expected objective can be achieved depends on the database designer's careful selection and the user's awareness. Kent has given an extensive discussion of this subject (Kent [1979]). See also (Date [1983]). The problems associated with reliance on user-controlled identifiers are summarized as follows.

1. User-controlled symbolic identifiers may be subject to change. For example, a supplier number may change for some reason. A person may no longer like his name and decide to change it. A journal may also change its name. When user-controlled symbolic identifiers are used as keys, such changes, though conceptually legitimate, are usually not allowed in the relational data model. This kind of change has to be accomplished through a deletion operation followed by an insertion operation. There are at least two problems with this approach. First, it is difficult for the user to find all the occurrences of the symbolic identifiers for an object so that proper insertions and deletions may be done. Second, as the result of deletion and insertion, semantic ambiguities may be introduced. For example, there is no way for the system to tell whether we have only changed the person's number or have replaced him with a different person who has identical properties except for his number. Such semantic ambiguities do matter when, for example, a report on personnel archives is to be generated.

2. Symbolic references could result in references to non-existent objects.
This is usually called a 'dangling reference', which is sometimes desirable and sometimes not. Consider for example the part-supplier case. For each part, if we only want to know who the supplier is, we obviously do not care whether the supplier information is still in the database or has been deleted from it when, for instance, the supplier goes bankrupt. Under such circumstances, a 'dangling reference' does no harm. Let us consider a different situation where the information on purchasing contracts is to be recorded. Obviously, a contract can exist only if the supplier still exists. In this case, a 'dangling reference' is highly undesirable (see footnote 5). The problem is that the relational data model provides no alternatives for specifying such constraints.

Footnote 5: Of course, we only consider the static state of the system to be modelled. Dynamic transitions are not within our consideration.

3. The database designer can have difficulty in selecting a proper identifier for a relation or attribute because of the strong interaction between the identifiers and the perception of the objects being modelled. Perception of objects can affect the choice of identifiers, and conversely identifiers can also affect the perception of objects. The latter can result in a quite different semantic interpretation of a relation or attribute.

4. Even if a choice of keys is made, it is often arbitrary on the conceptual level. This arbitrariness can create serious problems when, for example, a user view that uses an alternative key needs to be derived.

5. In reality, different identifiers may be used to represent the same object. For instance, Robert Jackson and Bob Jackson could refer to the same person. Different identifiers may even be from different domains, for example, part-name and part-#. Because the relational data model depends only on domains to control joinability, the above real world possibilities must be banned in the relational database.
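The first two problems listed above can be made concrete with a small sketch. It is an illustration only, written in Python (a language the thesis itself does not use). The supplier and contract data, the helper `renumber_by_delete_insert`, and the `Supplier` class are all assumptions introduced for the example and do not come from any particular DBMS.

```python
# Hypothetical data: suppliers keyed by a user-controlled supplier number,
# and contracts that refer to suppliers through that symbolic number.
suppliers = {"S1": {"name": "Acme", "city": "Vancouver"}}
contracts = [{"contract": "C1", "supplier_no": "S1"}]


def renumber_by_delete_insert(table, old_no, new_no):
    """Change a key the only way described above: delete the row, then insert it."""
    row = table.pop(old_no)      # deletion
    table[new_no] = row          # insertion under the new symbolic identifier


renumber_by_delete_insert(suppliers, "S1", "S9")

# The contract still carries the old symbol, which now refers to nothing.
dangling = [c["contract"] for c in contracts
            if c["supplier_no"] not in suppliers]


class Supplier:
    """Object-style alternative: the number is a property, not the identity."""

    def __init__(self, number, name):
        self.number = number
        self.name = name


acme = Supplier("S1", "Acme")
contract_supplier = {"C1": acme}   # the contract refers to the object itself
acme.number = "S9"                 # an ordinary property update

same_object = contract_supplier["C1"] is acme
```

After the renumbering, `dangling` contains contract C1, and nothing in the stored rows records whether S9 is the old supplier under a new number or a different supplier with the same properties. In the object-style half, the contract keeps referring to the same supplier object after its number changes.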
2.1.6 GENERALIZATION/SPECIALIZATION

The relational data model is not an effective representation tool in some cases. This statement also applies to the modelling of the important conceptual mechanism of generalization/specialization. The problems involved can be categorized into the following two parts.

1. The relational data model can handle certain machine-controlled specialization (see footnote 6) by introducing extra relations. But this is both artificial and potentially problematic where data integrity is concerned. In fact, as the number of specializations increases, the approach can result in unbearable data redundancy. For this reason, the concept of the relational view may be used. However, the view concept can provide only limited help. For example, it is impossible to model the additional specialities of a subset with a relational view.

Footnote 6: Machine-controlled specialization refers to specialization that can be obtained by evaluating some expressions, for example first order logic expressions, against a group of objects, usually in a relation.

2. Specialization that can only be specified and controlled by the user is very difficult to model with the relational data model. User-controlled specialization is often necessary. For example, in the PART-SUPPLIER case, we may want to have some extra information about wholesalers. Probably only human beings can understand what is meant here by 'wholesaler'. Therefore, we have to classify wholesalers manually. But at the same time, we still want the computer to maintain the semantic integrity constraint that a wholesaler is a supplier.

2.1.7 REQUIREMENT OF HOMOGENEITY

Kent identifies two assumptions underlying the relational representation, namely horizontal and vertical homogeneity in data (Kent [1979]).
Horizontal homogeneity refers to the situation where each tuple of a given relation contains the same fields, and vertical homogeneity refers to the situation where, in a relation, a given attribute allows only the same kind of information in each tuple. For a real world user, these constraints are undesirable, because in reality counterexamples are abundant.

For instance, a patient relation may have an attribute PREGNANT-OR-NOT. For a male patient, we obviously do not want this attribute at all. On the other hand, it does not make much sense to have different relations for male and female patients. It is worth noticing that in this case, permitting null values might cause difficulty in interpretation. It becomes more problematic when the nonhomogeneity is associated with an attribute which we might want to use as an identifier. For example, social security numbers may often be used as employee identifiers. But in a multi-national corporation, some employees might not have social security numbers (Kent [1979]).

Vertical nonhomogeneity is also a common phenomenon. For example, company cars can be assigned either to employees or to departments. Here employees and departments quite possibly have different formats of identifiers. The relational data model can handle this situation in a number of ways, such as adding a new attribute, creating a new relation, or creating new uniform identifiers. However, none of them provides a good conceptual solution. For example, if we use two attributes, one for employees and another for departments, we not only introduce data integrity problems but also create a representation structure which bears little resemblance to the conceptual structure of what is being modelled. More awkward problems might arise when we try to assign cars to a new kind of unit with yet another identifier format. For a more detailed discussion on this subject, see (Kent [1979]).
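The company-car case can be sketched as follows. This is a hedged illustration in Python rather than anything proposed in the thesis. The class names, the identifier formats (a numeric `emp_no`, an alphabetic `dept_code`), and both helper functions are assumptions made for the example. The first half models the holder of a car as a single property that may be either kind of object. The second half states the extra integrity rule that the two-attribute relational workaround would need.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Employee:
    emp_no: int            # hypothetical numeric identifier


@dataclass(frozen=True)
class Department:
    dept_code: str         # hypothetical alphabetic identifier


@dataclass
class Car:
    plate: str
    holder: Union[Employee, Department]   # one property, two kinds of holder


def describe(car: Car) -> str:
    """Report the holder without needing a uniform identifier format."""
    if isinstance(car.holder, Employee):
        return f"{car.plate} -> employee {car.holder.emp_no}"
    return f"{car.plate} -> department {car.holder.dept_code}"


def two_column_row_is_valid(emp_no: Optional[int],
                            dept_code: Optional[str]) -> bool:
    """Rule the two-attribute workaround needs: exactly one column is set."""
    return (emp_no is None) != (dept_code is None)


cars = [Car("ABC 123", Employee(1042)), Car("XYZ 789", Department("MIS"))]
```

In the two-attribute form, a row with both columns filled or both empty is structurally legal, so the rule in `two_column_row_is_valid` has to be enforced separately. In the single-property form, such rows cannot be written down at all.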
2.1.8 AGGREGATION

The relational data model is not a very effective tool when we want to model another important data abstraction process, aggregation. For example, when recording data flow diagrams obtained from information needs analysis, we obviously want to preserve the relationships between different levels of a data flow diagram. Here, a level(n) diagram can be viewed as an aggregation of several level(n+1) diagrams. With the relational data model, a special relation R(DIAGRAM, SUBDIAGRAM) may be created. In each tuple of this relation, SUBDIAGRAM is an immediate subdiagram of DIAGRAM.

There are at least four difficulties associated with this approach. First of all, the modelling process is not direct and explicit. All the relationships between different levels are hidden in data. Secondly, specialities that may be associated with different levels are difficult to accommodate. In our example, no such difficulty arises because diagrams at all levels are assumed to have homogeneous attributes. However, it is common that each level of aggregation has its own unique attributes. Thirdly, certain semantic integrity constraints cannot be maintained by the relational data model. For example, if A is a subdiagram of B and B is a subdiagram of C, then A is a subdiagram of C. This constraint can be easily violated. Finally, in certain cases information retrieval will be inconvenient. For example, it is awkward to express the query 'find all the subdiagrams of diagram A'.

To conclude the discussion of semantic modelling problems of the relational data model, it should be mentioned that the above list is not exhaustive in any sense. The discussion could be expanded by further elaborating the listed points. For example, a further discussion can be made about problems associated with composite primary keys. This paper has neither enough space to accommodate them all nor the intention to do so.
Moreover, it should also be mentioned that the above criticisms of the relational data model may be less sound when looked at from different angles. Some of our criteria, such as simplicity and naturalness, are generally subjective and may be subject to different interpretations by different people.

2.2 SEMANTIC EXTENSIONS TO THE RELATIONAL DATA MODEL

Ever since Codd published his article 'A Relational Model of Data for Large Shared Data Banks' (Codd [1970]), the relational data model has been subjected to substantial scrutiny by database researchers. There are not only considerable criticisms, as presented in the previous sections, but also many extensions which aim at enhancing the semantic modelling power of this data model. In this section, I will briefly summarize this work by commenting on how these extensions achieve this goal and what problems are associated with the approaches. This section will only cover the direct extensions of the relational data model. By 'direct extension', we mean that the basic constructs of the extended data model are still relations and tuples. The next chapter reviews semantic data modelling in more detail.

2.2.1 EXTRA CONSTRAINTS

To some extent, the relational data model has the above problems because the model is too general. In order to use such a general model to capture the great variety of the real world, the database designer sometimes has to make very subtle and difficult choices among many possible alternatives. Similarly, the database user usually faces difficulties in correctly interpreting the intension of the database. Therefore, a natural way of solving the semantic integrity constraint problems of the relational data model appears to be to add some kinds of additional semantic integrity constraint mechanisms to it. In fact, much work has been done in this direction (Stonebraker et al. [1974], Eswaran [1975], Astrahan [1976], Hammer & McLeod [1975], Hammer & McLeod [1976]).
In this type of approach, integrity constraints are specified by a predicate assertion facility. Assertions can be stated in the database schema and enforced automatically by the DBMS. For example, in System R (Astrahan et al. [1976]) there is an integrity constraint subsystem which uses the query language SQL to express constraints. The system can handle both static restrictions, e.g. attribute SEX can have only the two values 'F' and 'M', and dynamic restrictions, e.g. the specification of a trigger.

Although this approach proves to be very powerful and flexible in dealing with some semantic integrity problems, it cannot solve many other problems of the relational data model, such as those associated with the tuple representation. Furthermore, it is a piecemeal approach. Constraints and the data model are separated. This may lead to unnecessarily increased conceptual difficulty in using the model. A better approach would be to let more constraints be directly expressed in the data model.

2.2.2 SEMANTIC CLASSIFICATION OF RELATIONS

The second approach is characterized by its attempt to classify relations so that some semantics can be clearly captured. Schmid & Swenson [1975] and Wiederhold [1977] investigated the classification of relations into the following categories:

1. Those that describe the existence of objects (autonomous objects).
2. Those that model multi-valued characteristics of objects.
3. Those that capture relationships among objects.

This approach explains various ways in which relations can be associated with semantic primitives. However, as pointed out by McLeod [1978], it does not really provide any guideline as to how relations should be used, nor does it extend the power of the relational data model in capturing more semantic information. Moreover, it does not eliminate any of the previously identified problems.
2.2.3 THE ENTITY-RELATIONSHIP APPROACH

Another important extension of the relational data model is the work done by Peter Chen [1976]. His model is called the entity-relationship (ER) model. The original ER model has traditionally been viewed as a direct extension of the relational data model. However, it is more helpful to view it from two different angles: 1) its data structure and 2) its conceptual view. As far as the data structure is concerned, there is nothing significant about the model. Except for some minor differences, it is very close to the proposal of Schmid & Swenson [1975]. However, the significant part of the ER model is its underlying conceptual view, which is profound and far-reaching. It is this conceptual view that has stimulated considerable interest in the entity-relationship approach. To some extent, many recently proposed semantic data models can be directly or indirectly linked to the ER model. It is also very interesting to notice that the ER model, originally proposed as a database tool, has gained wide acceptance in system analysis and design (Chen [1981]).

The major problems associated with the ER model are identified as follows:

1. Table Representation. The structure of the ER model is the same as that of the relational data model, namely a two-dimensional flat table. Therefore, the problems identified for the tuple representation still exist.

2. Rigid Distinction between Objects and Relationships. The ER model explicitly defines relationships and prohibits them from participating in further relationships. This distinction is neither necessary nor convenient.

3. Lack of Richness. The ER model in its original form is a good database design aid, but it is not rich and flexible enough. For example, it does not allow attributes to be viewed as relationships. It does not allow entity sets to be attribute domains, either. In fact, the values of entity attributes can only be taken from so-called value sets, which contain atomic symbolic strings.
2.2.4 CONCLUSIONS

In this section, we have briefly reviewed some direct extensions of the relational data model. In conclusion, although this earlier research has contributed a great deal to semantic data modelling, it is handicapped to some extent by its close tie with the relational data model. None of these extensions is sufficient to be used as a satisfactory conceptual modelling tool to capture the semantics of real world applications.

3. REVIEW OF SEMANTIC DATA MODELLING

This chapter is devoted to a discussion of semantic data modelling. The discussion does not focus on individual semantic data models, but rather on the basic concepts and principles that are common to many of these data models. In the following, these concepts and principles are discussed in turn.

3.1 BASIC GUIDELINE

The human conceptual primitives are objects, properties, and relationships between objects. Database design is a transformation from the conceptual representation of an application into structures particular to a data model. Using a database is also a transformation, but in the reverse direction. One of the objectives of the data modelling activity is to reduce these mental transformations as much as possible.

3.2 OBJECT ORIENTATION

In semantic data modelling, 'object' is considered the most basic modelling construct. An object is anything that interests us. In the data modelling literature, different terms have been used to mean the same thing. Abrial [1974], Berild & Nachmens [1977], Smith & Smith [1976], Shipman [1981], and Brodie [1984] use the term 'object'. Chen [1976], Kent [1979], and Codd [1979] use 'entity', while others choose the term 'token', for instance (Mylopoulos et al. [1980]) and (Tsichritzis & Lochovsky [1982]). The term 'object' is used in this paper as the most basic modelling construct. The term 'entity' is used to mean the real world thing to be modelled.
Based on the observation that objects may be different in nature and in their implications for data processing, researchers generally classify objects into categories. Abrial distinguishes between concrete and abstract objects (Abrial [1974]). A concrete object represents a physically existing thing, for example, a part, a supplier, or an employee, while an abstract object represents a concept such as a date. Chen makes the distinction between a regular and a weak object (Chen [1976]). A regular object may exist independently of any other objects, whereas the existence of a weak object is dependent on the existence of some other object (or objects). The data processing implication is that the deletion of a regular object may cascade to cause deletion of all the related weak objects. In (Codd [1979]), Codd proposes to classify objects into characteristic, associative, and kernel objects.

To human beings, the existence of an object is manifested by its properties, which are the characteristics of that object. If we define attributes as characteristics of a real world entity, properties are their conceptual modelling counterparts.

One important principle is that there should be a one-to-one correspondence between entities of the real world and objects in the database. We should emphasize that in this paper the term 'object' is used as a modelling construct. An object is an entity of the modelling world. The principle of one-to-one correspondence, which cannot be over-emphasized, is inherent in this definition of objects. This principle indicates that for an entity in the 'real' world, there should be exactly one corresponding object in the model representing that entity. Moreover, each object in the model must not represent more than one entity in the 'real' world. Finally, this principle is being discussed only at the conceptual level. Physically, certain redundancy may sometimes be desirable.
For example, in order to optimize different access strategies, multiple copies of the same data may be required. Of course, such redundancy must be well controlled so that consistency and synchronization are ensured. The implementation of this one-to-one principle makes it possible to eliminate some very serious integrity problems of the relational data model, such as insertion and deletion anomalies and, to some extent, the referential integrity problem.

Another important concept of object orientation is object identity. While in a set-oriented data model such as the relational data model two objects (tuples) can only be distinguished by some intrinsic properties, object identity allows any object to be referenced directly as a unit. An important implication is that an object is not the same as its symbolic name. This avoids the problem of unchangeable unique identifiers and a collection of other serious data modelling problems. We will come back to this point later in Section 4.2.3. A further consequence of object identity is that relationships are asserted between objects instead of between their symbolic identifiers. This contributes to the elimination of the referential integrity problem, for example, by specifying that an object cannot be deleted until it no longer participates in any relationships.

As a final comment on object orientation, it should be clarified that although object orientation is thought of as a semantic data modelling principle, this does not mean that conventional data modelling does not deal with objects. The difference between conventional data modelling and semantic data modelling is that the latter manipulates objects directly in the models, while the former does so indirectly through constructs such as tuples. The data abstraction aspect of semantic data modelling is discussed in the following section.
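The consequences of object identity described above can be illustrated with a toy object store. The sketch below is written in Python and is only one possible reading of the principle. `Entity`, `ObjectBase`, and their methods are hypothetical names invented for the example and are not constructs of SDBM or of any model cited here. Identity is borrowed from the host language (the `is` test), and the deletion rule is the one mentioned in the text: an object may not be deleted while it still participates in a relationship.

```python
class Entity:
    """A modelling object; its identity is the Python object itself."""

    def __init__(self, **properties):
        self.properties = dict(properties)


class ObjectBase:
    """Toy object store (hypothetical) holding objects and relationships."""

    def __init__(self):
        self.objects = []
        self.relationships = []          # (name, participants) pairs

    def add(self, obj):
        self.objects.append(obj)
        return obj

    def relate(self, name, *participants):
        # Relationships hold the objects themselves, not symbolic names.
        self.relationships.append((name, participants))

    def participates(self, obj):
        return any(obj is p for _, ps in self.relationships for p in ps)

    def delete(self, obj):
        if self.participates(obj):
            raise ValueError("object still participates in a relationship")
        self.objects = [o for o in self.objects if o is not obj]


db = ObjectBase()
a = db.add(Entity(name="J. Smith", number=7))
b = db.add(Entity(name="J. Smith", number=7))   # same properties, other object
contract = db.add(Entity(title="Contract 1"))
db.relate("signed_by", contract, a)

a.properties["number"] = 8      # renumbering does not disturb the relationship

try:
    db.delete(a)
    deletion_refused = False
except ValueError:
    deletion_refused = True

db.delete(b)                    # b participates in nothing, so this succeeds
```

Objects `a` and `b` have identical properties at creation yet remain distinct, which a purely value-based tuple model cannot express without an added key. Changing the number of `a` leaves the `signed_by` relationship intact because the relationship refers to the object and not to the number.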
3.3 DATA ABSTRACTION

Data abstraction is a process in which irrelevant details are suppressed and details relevant to the application being modelled are retained. Relevant details can be understood with respect either to the applications at hand or to a particular group of users. Clearly, different applications need different details. Different users also need to access different levels of detail. These needs have undoubtedly motivated the explicit application of data abstraction in databases. Furthermore, like any other abstraction process, data abstraction is a very effective way of reducing the mental load on both database designers and users. As soon as we understand a group of phenomena, we abstract the relevant information from them and then represent this abstraction with a concept, which is usually identified with a symbolic name. Thereafter, we only need to refer to this symbolic name instead of going back through all the detailed information.

The basic mechanisms that are used in most semantic data models to provide data abstraction are classification, aggregation, generalization, and association (see footnote 7).

3.3.1 CLASSIFICATION

As mentioned before, objects are used as basic constructs to model the real world. Objects may share some commonalities, particularly some common properties. Based on these common properties, we can group objects into categories, called object classes, or classes for short. This grouping process (mechanism) is called classification. Precisely, classification is defined as a form of abstraction in which a collection of objects is considered as a higher level object. By classification, the relevant common properties shared by objects are kept by the type of objects in a class, while all the other properties of these objects are ignored. Classification establishes an instance-of relationship between a particular collection of the objects currently existing in the database and a class denoted by the name of the class.
The term 'class' has several synonyms such as 'entity set', 'entity class', 'entity type', 'object type', etc. Unfortunately, 'class', 'set', and 'type' do not usually mean the same thing. To avoid any confusion, I will use 'class' throughout this paper to mean a collection of classified objects (see footnote 8). One of the distinctive features of classes is that they are persistent. Namely, they exist even after a program invocation has terminated.

Footnote 7: This is not to be confused with the association between two sets. Association as an abstraction mechanism will be explained in Section 3.3.4.

Footnote 8: The concept of class defined here can be found in various semantic data models proposed recently. As will be seen in the next chapter, the class concept is less powerful than the SDBM window concept in terms of data modelling ability.

Some semantic data models also allow classification to be applied to classes. The result is a metaclass. A metaclass is an abstraction of a collection of object classes which share some common properties. Note that the properties that are to be abstracted belong to an object class rather than to instances of an object class. An example of a metaclass is CHARACTER-STRING with object class ALPHABETIC-STRING and object class NUMERIC-STRING as its instances. Notice here that a specific character string is not an instance of CHARACTER-STRING as defined.

3.3.2 AGGREGATION

Sometimes a relationship among several objects may become so important that we would rather ignore these objects and refer only to the relationship. Clearly, this is an abstraction process. It is usually called aggregation. Formally, aggregation is defined to be a form of abstraction in which a relationship among objects is regarded as a higher-level object, with lower-level details suppressed.
Thus, with this abstraction, an object may be viewed in two different ways: first as a relationship between an identity and a set of lower-level objects (attributes), and second as an object in its own right (Date [1983]).

Some refinements of aggregation are made in semantic data models. There are basically three forms of aggregation (Codd [1979]).

1. Cartesian aggregation, which views an object as an aggregate of properties. For example, object STUDENT may be considered to be an aggregate of properties STUDENT-NO, NAME, SEX, and ADDRESS. Aggregation of this kind was originally addressed by (Smith & Smith [1977]).

2. Part aggregation, which views an object as an aggregate of its parts. For example, a PC may be viewed as a part aggregation of a monitor, a keyboard, a CPU and a disk drive. This aggregation can be applied recursively to generate a part aggregation hierarchy. Part aggregations are primarily useful to maintain relationships among objects, i.e. part hierarchies, whereas Cartesian aggregations focus on describing objects by their properties.

3. Cover aggregation, which refers to the grouping together of a collection of possibly heterogeneous objects into a higher-level object, in accordance with some kind of membership criterion (Date [1983]). For example, a task force may be viewed as a cover aggregation of ships, planes, tanks and personnel. There are two important characteristics of cover aggregation. First of all, the structure of its members is not well defined. This is unlike Cartesian aggregation, where the structure of its members is explicitly specified in the definition of the aggregation. Secondly, the grouping may or may not be machine-understandable. For this reason, the Semantic Data Model calls it user-controllable grouping (Hammer & McLeod [1979]).

There is still another kind of aggregation, statistical aggregation, though it is somewhat different from the aggregation concept defined here.
This concept is usually supported in most data models by providing some built-in functions, e.g. MAX, MIN, etc.

3.3.3 GENERALIZATION/SPECIALIZATION

Generalization is a form of abstraction in which a set of similar objects is also considered at a generic level. With the generalization mechanism, an object may also be viewed as a generic object. This abstraction establishes an is-a relationship. By generalization, many individual differences between objects are suppressed. In set theory, this is similar to generating a superset. For example, object classes DOCTOR and NURSE can be generalized to a more generic object class STAFF, if we are not particularly interested in their differences.

Generalization can be repeated to specify a generalization hierarchy, also called a subclass hierarchy when the generalization mechanism is applied to object classes. This hierarchy corresponds to one of the important ways we acquire and process our knowledge. Therefore it is not surprising that generalization should be one of the inherent properties of data modelling. However, the conventional data models fail to provide a rigorous and consistent mechanism for modelling this hierarchy. This is especially true of the relational data model, which almost totally ignores the importance of generalization. Theories like normalization focus exclusively on Cartesian aggregation.

In database design, it is the concept of specialization, the reverse of generalization, that is widely used to 'generate' new object classes. Because a database application is often well-bounded, we can work downwards, i.e. from generic concepts to more specific concepts. The terms 'generalization' and 'specialization' are distinguished only to indicate the order in which relevant classes are defined. They have no different implications for the conceptual structure of the database. In other words, generalization and specialization are exactly symmetrical with respect to the conceptual structure.
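The DOCTOR/NURSE/STAFF example can be written down as a short sketch. It uses Python subclassing as a stand-in for the is-a relationship, which is only one of several possible encodings and is not the mechanism of any model reviewed here. The properties `specialty` and `ward`, the sample names, and the `extension` helper are assumptions added for illustration.

```python
class Staff:
    """Generic class: keeps only what doctors and nurses have in common."""

    def __init__(self, name):
        self.name = name


class Doctor(Staff):
    def __init__(self, name, specialty):
        super().__init__(name)
        self.specialty = specialty     # hypothetical extra property


class Nurse(Staff):
    def __init__(self, name, ward):
        super().__init__(name)
        self.ward = ward               # hypothetical extra property


def extension(cls, objects):
    """All objects currently in the database that are instances of cls."""
    return [o for o in objects if isinstance(o, cls)]


database = [Doctor("Lee", "cardiology"), Nurse("Park", "3B"), Nurse("Diaz", "ICU")]

# Viewed at the generic level, individual differences are suppressed.
staff_names = [s.name for s in extension(Staff, database)]
doctor_names = [d.name for d in extension(Doctor, database)]
```

Every doctor and nurse appears in the extension of `Staff`, so the extension of the generic class is a superset of the extension of each of its subclasses, in line with the set-theoretic reading given above.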
Several very important characteristics are inherent in specialization. There are at least the following four:

1. The extension of the specialization is a subset of the extension of the generic class.
2. The specialization inherits all the properties of the generic class. For example, if class PERSON has a property called height, then class CHILD will have the property height too, if class CHILD is specified to be a specialization of class PERSON.
3. Values of inherited properties must be specializations of the generic property values. For instance, if values of PERSON's height are specified to be less than 10, then values of CHILD's height must also be less than 10.
4. The specialization may have additional properties. For instance, class CHILD has a GUARDIAN property which class PERSON may not have.

3.3.4 ASSOCIATION

This is a concept introduced by M. Brodie (see footnote 9). Association is a form of abstraction in which a collection of member objects is considered as a higher level set object. By association, the properties of a set object are explicitly differentiated from the properties of its member objects. For example, each member of object class STUDENT has properties SNAME, STD#, and AGE. At the set level, the set object STUDENTS may have properties STUDENT-REPRESENTATIVE and AVERAGE-AGE, which are related to its member properties but do not exist at the member level. Here we say that, in this respect, STUDENTS is an association of student objects. Another example is that set object PEOPLE is an association of object instances of class PERSON. Here, PEOPLE may have properties such as POPULATION, which is related to the set of objects as a whole, whereas PERSON has properties directly related to each individual person object.

In this section, I have briefly reviewed four basic concepts of data abstraction, namely classification, aggregation, generalization and association.
These concepts provide useful or even necessary constructs for a powerful data modelling tool: the semantic hierarchy.

Footnote 9: Brodie claims that sets are fundamental modelling concepts. Association is one of three primitives of his set concept (Brodie [1983]).

3.4 SEMANTIC INTEGRITY CONSTRAINTS

Another important aspect of data models is which semantic integrity constraints are modelled and how. Data models differ dramatically in the means they provide for representing semantic integrity constraints. This section reviews major semantic integrity constraint issues in semantic data modelling.

A database can be viewed as a repository of data values, and a database schema can be viewed as a collection of constraints that restrict the values that may exist and the changes that may happen in the database. Therefore, to some extent, a data model is nothing but a methodology for specifying constraints. The following discussion will be pursued from this viewpoint. As will be seen, this view clarifies the difference between conventional data modelling and semantic data modelling.

In conventional data modelling, database schema and integrity constraints are separate concepts. Usually, a database schema is more concerned with accurately reflecting the real world situation, while integrity constraints restrict the possible database states that can be generated from a given schema to those that meet the constraints. The former is more concerned with organizing modelling constructs into meaningful structures.

In this paper, the term 'semantic integrity constraint' is used to refer to any database constraint that ensures accuracy, correctness, or validity of data in the database. With a specific data model, different semantic integrity constraints may have different relationships with the data model. Some are an inseparable part of a data model. They are called inherent constraints of that data model.
For example, the requirement that tuples in the relational data model must be unique within a relation is an inherent constraint. Constraints that must be specified using some kind of constraint mechanism are called explicit constraints (footnote 10). For example, constraints that are specified with first order logic assertions are explicit constraints. The third kind of constraint is the implicit constraint. An implicit constraint is one that can be derived from inherent and/or explicit constraints.

Footnote 10: In some literature, this kind of constraint is called a semantic integrity constraint.

Viewing a data model as a methodology for organizing semantic integrity constraints, we see that one major difference between data models lies in which semantic integrity constraints have been built into them and how. The relational data model has very few inherent constraints. Most semantic integrity constraints have to be externally specified. Often, many kinds of semantic integrity constraints cannot be specified at all. To overcome the modelling shortcomings of the relational data model, researchers have proposed many external semantic modelling facilities. We have seen examples of this kind in INGRES and System R. With most semantic data models, however, more data semantics are captured by incorporating a greater variety of inherent constraints in the models. Of course, there is a trade-off in doing so. On one side, we are increasing our ability to model certain data semantics. On the other side, we are narrowing the modelling realm.

Several types of semantic integrity constraints are of special interest to semantic data modelling. They are domain constraints, cardinality constraints, uniqueness constraints, referential constraints, and inheritance constraints.

A domain constraint refers to the restriction on values that a property can have. In the literature, the term 'type constraint' is also used.
Almost all the data models prefixed with the adjective 'semantic' provide facilities to enforce these kinds of constraints. The basic mechanism employed by these models is to provide additional levels of data abstraction between properties and primitive types such as integer and character. In this way, underlying domains can be associated with distinct semantics.

In database literature, a cardinality constraint usually refers to a restriction on the cardinality of relationships between classes. Cardinality is the number of objects in a class that can be related to an object in another class, and vice versa. This type of constraint is often extensively studied in semantic data models that directly deal with binary relationships, for example, in the Binary Semantic Data model (Abrial [1974]). It is easy to show that uniqueness constraints are a special case of cardinality constraints.

Referential constraints refer to the restriction on 'dangling references' (footnote 11). This restriction requires that if an existing object has referenced other objects, then all referenced objects must exist. 'Dangling references' may be caused either by insertion of a new object or by removal of an existing object. Upon insertion, all the referenced objects must already be in the database, and upon removal, the object must not be referenced by any other object. Otherwise, the insertion or deletion should not be allowed to take place. This constraint is also called the existence constraint (Codd [1979]) or dependency constraint (Chen [1976]). In various extensions to the relational data model, some referential integrity rules have been developed to handle this type of constraint (Codd [1979], Date [1981]).

An inheritance constraint refers to inheritance properties of data abstraction. For example, each subclass inherits all the properties of its superclass. This kind of constraint is automatically enforced in most recently developed semantic data models, such as TAXIS and Galileo.
In these models, the constraint is specified whenever the database structure is specified.

The approach taken by semantic data models in modelling constraints is considerably different from that of the conventional data models. Most semantic data models present modelling primitives, structures, and constraints within an integrated framework, while the conventional data models usually approach them separately.

Footnote 11: When we say 'A is referenced by B', we mean that to maintain integrity of a database, A must exist if B exists in the database.

3.5 APPLICATION OF DATA TYPES

In recent years, more and more attention has been paid to the application of data abstraction capabilities developed in programming languages to the database management system. Systems such as TAXIS (Mylopoulos et al [1980]), Galileo (Albano [1985]), RIGEL (Rowe [1979]), and INGRES (Stonebraker et al [1976]) all provide some applications of data types. This section gives a brief review of such applications.

3.5.1 USE OF THE DATA TYPE CONCEPT

The concept of the data type was originally developed in programming languages for the purpose of improving data reliability, program readability, program verification, and processing efficiency. The concept has been considered fundamental in the development of programming languages, basically because it is a very effective abstraction mechanism. Recently, a considerable amount of research has demonstrated that with some extensions this concept, especially the concept of abstract data type, can apply directly to the database context. The primary uses of the data type concept in the database context are the following.

For the specification of a database. The concern of database structures is centered around data and their abstraction. Abstraction mechanisms such as generalization, aggregation and encapsulation (footnote 12) can be provided by data types. At the same time, data types also specify that certain operations are to be provided for data objects.
One of the many advantages of applying data types to database specifications is that certain semantic integrity constraints, e.g. domain constraints, can be defined uniformly. For example, a hierarchy of data abstraction can be established with desirable semantics being associated with each participating data type. Another distinctive advantage of applying the data type concept to database design is that the behavioral semantics embedded within a data type, in terms of specific operations, can be integrated with the structural semantics of a database design. From this point of view, abstract data types are those data types with behavioral semantics which can be specified by the designer.

For the verification of a database specification. To prove that a database specification includes the desired structure and has no inconsistencies has not been an easy task. The use of the data type concept makes it possible to apply verification techniques developed in programming languages. Such techniques include theoretical models like operational semantics, denotational semantics, and axiomatic semantics. Two examples of such application are the work done by Brodie [1978] and the work by Wong [1981].

Footnote 12: By encapsulation abstraction, we mean that the user of the abstraction does not need to know the hidden information in order to use the abstraction, and he/she is not allowed to directly use or manipulate the hidden information.

For the validation of a database. The data type concept provides a means to relate constraints and data objects. Typechecking techniques associated with data types have proven to be one of the best mechanisms for ensuring database correctness.

3.5.2 DATA TYPE STRUCTURES

Data type structures provided by different semantic data models differ significantly from each other. Some offer only very primitive types such as integers, character strings, booleans, and enumerations. A few examples are TAXIS, DIAL, and GEM.
Others such as RIGEL, ADAPLEX, and GALILEO provide a richer set of data types. All these models support the notion of class which, as mentioned before, is a special data type associated with a persistent extent, namely an associated collection of objects.

In a recently published paper, Buneman and Atkinson [1986] have argued that type and class may be separated to provide more general data models. They also pointed out that many recently developed data models have not maintained a clear separation of this kind. Separation of types and classes means that a data type may have multiple extents, which is desirable in many situations. However, such separation of types and classes is usually not supported by semantic data models. This is a serious flaw in many semantic data models. They do not support the fact that a role (footnote 13) can have several types and a type can be used to model more than one role. That is to say, there could be a many-to-many relationship between roles and types. This will become clearer in the discussion of our conceptualization of data modelling.

Footnote 13: The concept of role is explained in Section 4.2.2.

4. THE CONCEPTUALIZATION OF DATA MODELLING

In this chapter, we explain in detail our conceptualization of data modelling. This conceptualization will be applied to the design of a semantic database model, SDBM, which will be specified in chapter 5. In section 4.1 of this chapter, we give a statement of the underlying philosophy of our approach. The statement is presented without any explanation, because the rest of this chapter is, in fact, the elaboration of this statement.

4.1 THE UNDERLYING PHILOSOPHY

The 'real' world exists in the database.

4.2 THE TWO-VIEW CONCEPTUALIZATION

In the previous two chapters, a review of data modelling practice was provided. We discussed both conventional data modelling (relational data modelling in particular) and semantic data modelling.
The ability of conventional data models to model the 'real' world was analysed. In the analysis, we assessed data modelling shortcomings of the relational constructs and modelling mechanisms. A natural question to ask therefore is "With respect to these shortcomings, is there a more fundamental cause or is it just because some particular constructs or mechanisms are inadequate?" Tracing these shortcomings to their roots, we find that most of them stem from the inadequate conceptualization of data modelling which is embedded in these models and their corresponding modelling methodology. In other words, it is not some of their particular building constructs but the fundamental principles that are to blame. Before further explaining this assertion, let us describe what our two-view conceptualization means.

4.2.1 A GENERAL DESCRIPTION

Conceptually, there are two different aspects involved in data modelling. On one side, we want the database to resemble the 'real' world as closely as possible. The 'real' world is considered to consist of a set of objects, their properties, and the relationships among these objects. These fundamental concepts, and their differences from the identically-named concepts in previously developed data models (both conventional and semantic), will be discussed in detail below. The guiding principle is that these fundamental concepts must be directly modelled in the database.

On the other side, we should realize that we are constrained by the current technology of the computer. We have to represent these concepts in the computer. Unfortunately, we cannot accomplish this without using various kinds of symbolic strings. In other words, symbolic strings are absolutely indispensable for the description of objects, properties, and relationships and for input and output between the user and the computer.
However, no matter how they are used, the concepts of objects, properties and relationships and their representation with symbolic strings must be clearly separated. The separation must not depend on the consciousness of the designer and the user. Rather, it must be built into the data model as an inherent part of the data model, so that many constraints which are natural in the 'real' world can be modelled just as naturally.

To further clarify our conceptualization, we will discuss the relevant concepts in detail and present our interpretation of several existing semantic modelling mechanisms.

4.2.2 SDBM OBJECTS

The concept of an SDBM object is basically the same as it is defined in other semantic data models. Naturally, the fundamental principle is that there should be a one-to-one correspondence between entities in the 'real' world and objects in the database. The rest of this section discusses several aspects of the SDBM object concept: the existence of objects, the naming of objects, and the type of objects. Because it is difficult to present these aspects sequentially, the discussion may be better understood after reading the entire section.

To discuss the type of an SDBM object, let us first describe the concept of role. The term role has been used in data models with various meanings. Codd used it to describe the way in which a relation attribute relates to a domain (Codd [1970]). Chen used the term to describe the function that an entity plays in a relationship (Chen [1976]). The role concept in SDBM is similar to that used by Bachman and Daya [1977]. Its meaning is similar to the one it has in the theatrical context. A role is an abstraction of certain properties of a set of objects. There is a many-to-many relationship between roles and objects. That is, a role can be played by various objects and a specific object may play many roles at the same time.
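The many-to-many relationship between roles and objects described above can be sketched in Python as follows. This is a minimal illustration, not SDBM itself: the class names `Obj` and `RoleRegistry`, the method names, and the role names are all hypothetical.

```python
class Obj:
    """A bare object: it carries identity only (no properties of its own)."""


class RoleRegistry:
    """Hypothetical bookkeeping of which objects play which roles."""

    def __init__(self):
        self._players = {}  # role name -> set of objects playing that role

    def assign(self, obj, role):
        """Let obj play role (an object may play any number of roles)."""
        self._players.setdefault(role, set()).add(obj)

    def roles_of(self, obj):
        """All roles currently played by obj."""
        return {role for role, objs in self._players.items() if obj in objs}

    def players_of(self, role):
        """All objects currently playing role."""
        return set(self._players.get(role, ()))


registry = RoleRegistry()
a, b = Obj(), Obj()
registry.assign(a, "STUDENT")
registry.assign(a, "EMPLOYEE")  # one object, several roles
registry.assign(b, "STUDENT")   # one role, several objects
```

Here `a` plays two roles at once, and the STUDENT role is played by two distinct objects.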
All the roles played by an object characterize that object. In SDBM, objects are not directly related to each other. Rather, they are interrelated through their roles. For example, consider that an object plays a student role and another object plays a course role. The two objects can be interrelated through these two roles, e.g. the student takes the course.

Now, let us go back to the consideration of the object type. The concept of object type has been used in semantic data models primarily for the purpose of reducing complexity and increasing understandability. Unlike objects in other data models, SDBM objects have no type. This conceptualization is supported by the following considerations:

1. We believe that this is a more natural way to look at objects. In most existing data models, objects and roles have been mixed up. For example, we could have an Engineer object type, an Employee object type, a Manager object type, and a Shareholder object type. But in fact they are just different roles. An object may assume all these roles.
2. The abstraction function and the constraint specification function provided by the object type concept can be obtained by using the role concept. We will demonstrate this contention in Chapter 5 when we present the design of SDBM.
3. There is a restriction in other semantic data models that an object may belong to several object classes, but these classes must be in the same class generalization hierarchy. Our conceptualization of objects pinpoints this restriction and highlights its undesirability in modelling data.

Now, we turn to the existence of an SDBM object. The role played by an object has been conceptualized differently from the object itself. We should also clarify the relationship between the role of an object and the existence of that object. They are two different notions, although in SDBM the existence of an object is manifested by the roles played by that object.
In a sense, the role concept does indirectly model the existence of SDBM objects. From a data modelling point of view, without playing any role, an object simply does not exist. For example, an object may appear to be a person, a student, and an instructor. The person, the student, and the instructor are all roles. Obviously, in reality an entity still exists even without playing these roles. But in data modelling, for practical purposes, an object needs to play at least one role to exist, which in fact implies that we are still interested in the corresponding entity.

Another characteristic that ought to be emphasized is the uniqueness of SDBM object existence. We have limited our discussion exclusively to the conceptual level. In fact, the entire paper is presented at this level. At this level, a specific object can have only one existence. Multiple existence is not allowed. From this uniqueness constraint, we can infer that in SDBM two objects are never the same, even though they may play identical roles and their external data representations may be exactly the same. This conceptual constraint, when implemented, can lead to the elimination of certain data duplications in the database.

This uniqueness constraint also leads us to clarify our conceptual view of differentiating two objects. We maintain that any two objects must be conceptually distinguishable. Syntactically, however, two SDBM objects need not be differentiated, for sometimes we do not need to make syntactic distinctions between two objects. However, we know that they are different objects. The conceptual differentiation provides us with the potential to syntactically distinguish them in the future. Physically, an SDBM object may be represented with an internal character string which is invisible to the database user and unique for each object in the entire database, or it may be realized through some other means.
These internal identifiers or any other physical constructs should not be accessed by the user.

The existence of an SDBM object starts with an explicit instruction of object creation. The act of creation also assigns a role to the object at the same time. Without binding to a role, an object cannot be created. Once an object is created, it can then be assigned to various permitted roles. A role played by an object can also be de-assigned when the object no longer plays that role. The existence of an object is terminated with an explicit destruction instruction.

Now, we discuss the naming of SDBM objects. We have identified in chapter 2 that many pathological implications in the relational data model are caused by confusion of these two concepts, objects and their naming. In SDBM, we maintain an explicit distinction between objects and symbolic strings. To make our point clear, we will discuss issues of SDBM object naming at three levels.

At the internal level, an SDBM object may be represented with an internal symbolic string. Notice that such symbolic strings are significantly different from external symbolic strings. Firstly, they are invisible to the database user and cannot be accessed by the database user. Secondly, they are generated automatically by the computer and are unique for each object in the entire database. Thirdly, their structure, syntax, etc. are irrelevant at the other levels.

At the conceptual level, an SDBM object has no name. That is, we do not bind an object to a specific symbolic string throughout the lifetime of the object. By not doing so, we isolate an object from any possible relationships it may be participating in and from any roles it may be playing. We also separate an object from its properties.

At the external data representation level, an SDBM object can be identified with any number of legitimate symbolic strings. These symbolic strings are bound to various syntactic forms.
For example, an object may be identified by a number, an alphabetic string, or a record. The structure of symbolic identifiers for an object is differentiated from the object itself. Notice that here multiple identifiers for an object do not necessarily refer to synonyms as often mentioned.

In the rest of this paper, we may loosely call an object playing the role of a student a student object. Because an object can play several roles at the same time, it may have several different 'names'. However, this is only for expository purposes.

4.2.3 SDBM RELATIONSHIPS

Another fundamental concept is the relationship between two SDBM objects. This section discusses this concept. The next section will discuss the concept of object properties, which can be viewed as a special type of relationship.

Like the SDBM object concept, the SDBM relationship concept follows the fundamental principle of one-to-one correspondence. This section focuses on the modelling of different types of relationships, their existence, and their representations.

We first consider the issue of n-ary vs. binary relationships. The criteria for deciding which is more suitable as the basic modelling construct are modelling simplicity and modelling accuracy, even though they are highly subjective. On one side, it is argued that because the binary relationship is the most primitive concept, it is the most appropriate for conceptual modelling (Bracchi et al [1976]). On the other side, the n-ary relationship is favored because of its conciseness. For instance, the relational data model uses n-ary relationships as its basic construct. The major disadvantage of the binary relationship approach is that sometimes we are forced to introduce arbitrary relationships which we are not interested in. Consider, for example, the relationship where a student takes a course taught by a professor.
In order to model this relationship with binary relationships, we may arbitrarily break it into two binary relationships. For example, one may be that a student takes a course, and the other may be the relationship between the first relationship and a professor.

In SDBM, we adopt the n-ary relationship but with the restriction that the n-ary relation is primitive. An n-ary relationship is said to be primitive if we are not interested in the smaller relationships that can be used to reconstruct the n-ary relationship. This approach is similar to the approach characterized by "irreducible relations" (Hall et al [1976]).

We believe that a data model should be able to model relationships with different characteristics differently. One important characteristic of relationships is whether they have properties of their own. A binary relationship without its own properties can be modelled in three different ways. For expository purposes, let us again use the example that students take courses.

One way to model this relationship is to treat the relationship as if it were a property. That is, the student object will have the property COURSE which identifies courses taken by the student, and the course object will have the property STUDENT which identifies students enrolled in the course. Obviously, these 'properties' are multi-valued. Relationships with different kinds of functional characteristics (e.g. one-to-one and many-to-many) can all be modelled this way. A very important conceptual constraint is that for such a relationship to exist, both the student object and the course object must exist. SDBM provides the proper mechanisms to ensure that this constraint is observed.

Another way to model the relationship is to treat it as an SDBM object, and then use the first approach to establish the relationship between this object and the student object as well as the relationship between this object and the course object.
Obviously, it does not make much sense to model the relationship in such a way if it does not have its own properties. It simply complicates data manipulation without any gain.

The third approach is similar to the second approach. But with this approach, no object will be created to represent the relationship. In fact, this approach only creates a window (footnote 14) through which the relationship can be seen. This approach will be clarified after the window concept is discussed. It will be demonstrated that this approach and the second approach also have different implications for database manipulation.

Footnote 14: The SDBM concept of window will be explained in Section 4.2.6.

For binary relationships with their own properties and any n-ary (n>2) relationships, the second and third approaches are used. With these approaches, we in fact represent the relationship with several binary relationships between each participating object and the relationship itself, which appears as an object under the second approach or as a pseudo-object under the third approach. However, we are not interested in these binary relationships themselves, because the relationship of interest has been modelled with SDBM objects. In this respect, we can view all the relationships in SDBM as modelled with binary relationships.

As mentioned previously, the existence of an SDBM relationship depends on the existence of the related objects. This necessary condition must be checked by the computer. To establish a specific relationship, we must explicitly modify appropriate properties of the participating objects. These properties are those which have been dedicated to representing the relationship. A relationship can be terminated by modifying the appropriate properties, too. It is important to be aware of the conceptual difference between the existence of an object and the existence of a relationship.

We believe that modelling the relationship through object properties is natural.
Take the student object as an example. We would naturally think that he has a name, he has a student number, he has an age, he has degrees, and he takes courses. Although the computer must distinguish courses from the name, age, etc., the user can choose to ignore their differences. For the user, they are all properties of the student.

4.2.4 SDBM PROPERTIES

The SDBM property can be viewed as a special kind of relationship. It is defined to be the relationship between an SDBM object and a symbolic string or strings, whereas the relationship discussed in the previous section is between two SDBM objects. The important characteristics of a symbolic string participating in a property can be summarized as follows:

1. The existence of a symbolic string is considered to be eternal. Therefore, the conceptual constraint imposed on the relationship between two objects will always hold for properties as long as the object exists. In other words, the existence of a property depends only on the existence of the object.
2. A symbolic string is not an object.
3. Not every symbolic string is allowed to exist. For a specific property, only certain symbolic patterns are permitted. They are specified with the SDBM type and window system.

So far, we have focused on the discussion of the conceptual structure and its basic constructs: objects, relationships, and properties. We now turn to the data representations of the conceptual structure. The next section discusses the concept of the SDBM data type, and the section following it presents the concept of the SDBM window.

4.2.5 SDBM TYPE CONCEPT

As mentioned before, we have to resort to symbolic strings for data manipulation. The semantics of objects, relationships and properties all have to be conveyed through symbolic strings. The conveyance runs in both directions: from the user to the computer and from the computer to the user.
To ensure the accuracy of such conveyance, we need certain mechanisms to structure symbolic strings and to provide certain semantics for the conceptual structure. SDBM data types are designed to be such a mechanism.

Let us define the concept of data type. In programming languages, a data type is defined by a symbolic name, a set of data objects (footnote 15), a set of operations that manipulate these data objects, and the possible value bindings. This definition requires that the specification of a data type must contain the following three elements:

1. a name for the type
2. a syntactic specification for symbolic strings of the type
3. a set of operations for manipulating data objects of the type. They are the only operations that can directly manipulate the syntactic representation of symbolic strings of the type.

In SDBM, we use the concept of data type as defined above. We should emphasize that the term data object does not necessarily refer to objects in the physical computer; it may refer to objects in any virtual computer. The name of a type is an abstraction of the syntactic specification and the set of operations. The name dictates the symbolic patterns and the operations that may be allowed.

In SDBM, the name equivalence rule is adopted for the data type equivalence test. Therefore, distinctive semantics are associated with the SDBM data type name. For example, data type MILE and data type WEIGHT may both be defined on the same set of integers with the same syntactic specification. But they are not comparable in SDBM, because different semantics are associated with them. We will discuss SDBM data types further in the next chapter, where their syntax and semantics are described.

Footnote 15: Not to be confused with SDBM objects. The term data object here refers to a run-time grouping of one or more pieces of data in a virtual computer. Alternatively, it can be viewed as a container for data values.
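The name equivalence rule can be illustrated with a small Python sketch. It assumes, purely for illustration, that MILE and WEIGHT are both defined over integers; the class `NamedIntType` and its method `less_than` are hypothetical names and do not reflect the SDBM syntax, which is specified in the next chapter.

```python
class NamedIntType:
    """Hypothetical base for named types defined over the integers."""

    def __init__(self, value):
        if not isinstance(value, int):
            raise TypeError("value must be an integer")
        self.value = value

    def less_than(self, other):
        """Compare two values, but only if their type names are the same."""
        if type(self) is not type(other):
            # Name equivalence: identical structure is not enough.
            raise TypeError(
                f"{type(self).__name__} and {type(other).__name__} "
                "are not comparable"
            )
        return self.value < other.value


class Mile(NamedIntType):
    """Same structure as Weight, different name, hence different semantics."""


class Weight(NamedIntType):
    """Same structure as Mile, different name, hence different semantics."""


same_type_result = Mile(3).less_than(Mile(5))

try:
    Mile(3).less_than(Weight(5))
    mixed_comparison_rejected = False
except TypeError:
    mixed_comparison_rejected = True
```

Under structural equivalence the two types would be interchangeable, since both wrap an integer; under name equivalence the comparison between a `Mile` and a `Weight` is rejected.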
4.2.6 SDBM WINDOW CONCEPT

So far, we have constructed a conceptual world consisting of objects, relationships, and properties. We also introduced the SDBM data type concept, which is used to structure symbolic strings for the purpose of external data representation and manipulation. We have not yet answered questions such as "How can we specify the conceptual structure among objects, relationships, and properties?" and "How can we integrate the conceptual structure and its external data representation?". The SDBM window concept is developed to answer these questions. This is probably the most important concept of the Semantic DataBase Model, SDBM. More specifically, the following functions will be accomplished by the window concept:

1. To provide a window through which certain aspects of SDBM objects or relationships can be seen (accessed) and manipulated.
2. To describe and maintain the structure among objects, relationships, and properties.
3. To model the persistence of objects, relationships, and properties.

We will further clarify this concept below.

To begin with, it is impractical, even though possible, to manage objects individually, for the following three reasons. First, the number of objects in a database may be enormous. Second, there may be differences between every two objects. Third, each object may play a variety of roles. Thus, we need to treat them in groups. This is accomplished by using SDBM windows. That is, a window specifies certain properties which we are interested in and which are allowed to be accessed and manipulated. Through this window, only objects that have these properties can be accessed and manipulated. The specific syntax of symbolic strings permitted to describe properties as well as objects and relationships through SDBM windows is specified with the SDBM type system.
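The access-restricting function of a window described above can be sketched in Python as a filter and projection over objects. This is a simplified sketch under assumptions: objects are modelled as plain dictionaries of properties, and the class `Window`, its method `visible`, and the sample data are hypothetical; persistence and the type system are left out.

```python
class Window:
    """Hypothetical window: a named set of properties of interest."""

    def __init__(self, name, properties):
        self.name = name
        self.properties = frozenset(properties)

    def visible(self, objects):
        """Yield, for each object that has every window property,
        a view restricted to those properties."""
        for obj in objects:
            if self.properties.issubset(obj):
                yield {p: obj[p] for p in sorted(self.properties)}


# Hypothetical database contents; each dict stands for one object.
database = [
    {"SNAME": "Ann", "STD#": 101, "AGE": 20},
    {"SNAME": "Bob", "STD#": 102, "AGE": 22, "SALARY": 900},
    {"CNAME": "Databases", "CREDITS": 3},
]

student = Window("STUDENT", ["SNAME", "STD#"])
seen = list(student.visible(database))
```

Note that the window holds no objects itself: it only determines which objects in the database can be seen and which of their properties are exposed.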
From the conventional semantic data modelling point of view, an SDBM window may appear to be the result of applying the classification mechanism to objects and, therefore, the same as the class concept used in many existing semantic data models. Although similarities do exist between these two concepts, there are conceptual differences. At this point, we can identify at least three differences between the window concept and the class concept.

First, a window is not an object container as a class is.

Second, in a class, objects are grouped together because of the homogeneous properties shared by them. Through a window, however, we may access and manipulate two objects with different properties. The only requirement is that the properties of every object have the same abstraction. Take the example used in Section 2.1.7. With SDBM, both female and male patients can be seen through the window PATIENT [16]. We will be able to see female patients with the property PREGNANT-OR-NOT and male patients without this property. Although variations are allowed among SDBM properties, the number of such variations should be relatively small compared with the total number of objects or relationships that can be seen from that window.

The third difference is that a class contains only objects, while both objects and relationships may be seen through windows, though they may not be seen through the same window. This will be demonstrated in Section 5.3, where the syntax and semantics of the SDBM window are presented.

[16] When we say that object A can be seen through a window, we mean that A can be accessed or manipulated through that window.

From now on, we will call the abstraction of properties seen through a window the role of that window. Apparently, it satisfies the definition of the role concept given in Section 4.2.2.

As identified before, the SDBM window's other function is to describe and maintain the conceptual structure [17].
An important aspect of this function is related to the relationship between windows. In SDBM, relationships between windows are divided into two kinds: interrelationships that do not affect the insertion of an object into the windows or the deletion of an object from the windows, and interrelationships that do affect these operations. A relationship of the latter kind is called a reference and should be specified explicitly. In SDBM, we do not rely on the behavior of the external data representation of relevant roles to convey the semantics of insertions or deletions. Rather, we explicitly specify what is intended.

Finally, the SDBM window concept supports the notion of persistence. Anything seen through a window can survive after a program invocation is terminated. This is done by mapping relevant data to some internally persistent data types.

To conclude this section, we point out the differences between the window concept and the view concept in conventional data models. First, an SDBM window is an integrated part of the database specification. The definition of a conventional view is usually not in the database specification. For example, a view may be the result of a retrieval. The second difference is that a window usually contains additional information that cannot be obtained elsewhere, whereas a view is usually derived from some other information.

4.2.7 AN EXTENSION TO SPECIALIZATION MECHANISM

In Section 3.3.3, we discussed the specialization mechanism, which is used as a major tool to enhance the semantic modelling ability in many recently developed semantic data models. TAXIS applies this mechanism uniformly to all the aspects of data modelling. Clearly, different views of the same thing are supported by this mechanism.
However, using the conceptualization presented in this paper, we can easily pinpoint the limitation of the specialization mechanism. A major limitation which is inherent to the mechanism is that all the different views generated with the mechanism must be on the same specialization hierarchy. The implication of this limitation is that two different views of the same thing differ only in the levels of information detail. The fact that an object may be viewed from completely different angles cannot be effectively modelled by this mechanism.

[17] A conceptual structure is a structure consisting of objects, properties, and relationships and is the conceptualization of the 'real' world being modelled.

For example, in the real world an object can act as a student and an employee at the same time. Here the student is neither a specialization of the employee nor is the employee a specialization of the student. This important real world phenomenon cannot be effectively modelled by many existing semantic data models. For example, to model such a fact with TAXIS, a TAXIS relation StudentEmployee has to be created, which should be a specialization of both the STUDENT relation and the EMPLOYEE relation. When the number of such specializations increases, the database specification can become very complex.

It is apparent that we should extend the modelling ability provided by the specialization mechanism. To discuss what extensions should be introduced, we first investigate the relationship between windows and objects. For two different windows W1 and W2, an object S can be related to them in three different ways:

Type 1. If S can be seen through W1, it will be seen through W2.
Type 2. If S can be seen through W1, it may be seen through W2.
Type 3. If S can be seen through W1, it will not be seen through W2.

From now on, we will call these relationships constraints.
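The three kinds of constraints can be read as conditions on the sets of objects visible through two windows. The following Python sketch illustrates this reading only; it is not an SDBM construct. The enumeration Constraint, the function holds, and the sample window contents are hypothetical, and each window is reduced to the set of identifiers of the objects seen through it.

```python
from enum import Enum
from typing import Set


class Constraint(Enum):
    ALWAYS = 1  # Type 1: seen through W1 implies seen through W2
    MAYBE = 2   # Type 2: seen through W1 may or may not be seen through W2
    NEVER = 3   # Type 3: seen through W1 implies not seen through W2


def holds(kind: Constraint, w1: Set[str], w2: Set[str]) -> bool:
    """Check a declared window-level constraint against current window contents."""
    if kind is Constraint.ALWAYS:
        return w1.issubset(w2)
    if kind is Constraint.NEVER:
        return w1.isdisjoint(w2)
    return True  # Type 2 places no restriction at the window level


# Hypothetical window contents, given as object identifiers.
student = {"ann", "bob", "eve"}
grad_student = {"bob"}
employee = {"bob", "dan"}
retiree = {"zoe"}

print(holds(Constraint.ALWAYS, grad_student, student))  # True
print(holds(Constraint.MAYBE, student, employee))       # True
print(holds(Constraint.NEVER, student, retiree))        # True
print(holds(Constraint.NEVER, student, employee))       # False
```

In this reading a Type 1 constraint is a subset condition, a Type 3 constraint is a disjointness condition, and a Type 2 constraint cannot be violated by any window contents, which is why it has to be resolved per object rather than per window.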
Notice that these constraints are defined in such a way that they are oriented towards windows instead of towards individual objects. In other words, they are constraints among windows.

Type 1 constraints are modelled with the SDBM subwindow hierarchy. This type of hierarchy can be described with the specialization mechanism. The SDBM subwindow concept provides a means to give an object a more detailed view by adding to the object information which may not be seen through its superwindows. The most important property of the SDBM subwindow hierarchy is that an SDBM window and its subwindows are not dealing with different objects. Rather, they are modelling the same group of objects with the restriction that objects seen from a subwindow form a subset of objects seen through its superwindows. A more detailed discussion of the SDBM subwindow hierarchy will be given in Section 5.3.2.

Type 2 constraints are the most common among these three types of constraints. For example, a student object may also be an instructor, but another student object may not be an instructor. Therefore, the constraint cannot be definitely specified at the window level. Rather, it varies with individual objects. In SDBM, Type 2 constraints are modelled from two aspects. At the window level, we specify the possibility for an object to play multiple roles. For a specific object, different roles are assumed with corresponding data manipulations.

An example of a Type 3 constraint is a university policy stating that a full-time student cannot be a university administrator. A Type 3 constraint is actually an exclusion constraint between two windows. Type 3 constraints can be enforced at the window level. The specific SDBM construct provided for this purpose will be described in Section 5.3.3.

5. OVERVIEW OF SDBM

In this chapter, we present an overview of the semantic database model, SDBM. Various considerations of the design of SDBM are also addressed.
SDBM is designed to attack the modelling problems of the relational data model and to present a more effective way of modelling the real world than that available from the relational data model. Compared with other recently developed semantic data models, SDBM has several distinct and novel features. In Section 1.3, we pointed out shortcomings of some relevant models. In the previous chapter, we outlined the conceptualization behind the design of SDBM. We will summarize the novelties of our approach in the last chapter.

SDBM supports many well-developed features of semantic data modelling, e.g., object orientation and data abstraction. The model also supports the type hierarchy and the subwindow hierarchy. However, the most important feature is that greater modelling power has been made possible by conceptualizing semantic data modelling activities in a different but more effective and natural way.

This research is intended to be a preliminary part of further study in database programming languages. At the moment, we do not attempt to design SDBM as a database programming language. Rather, it is intended to be a semantic database model [18]. For this reason, some very important features, such as exception handling facilities that can be specified by the database designer, are not provided.

[18] The difference between database programming languages and semantic data models is one of emphasis: a database programming language is a set of constructs for specifying the objects and procedures involved in an information system, whereas a semantic data model is a set of constructs for building a representation of the structure of the 'real' world, along with the necessary operations to manipulate that representation (King & McLeod [1985]).

In the following, we will discuss the SDBM type system, and then the SDBM window system. Every SDBM schema specification consists of these two parts. Finally, the SDBM transaction facility will be described.
In this chapter, the specification of SDBM, both syntax and semantics, is informal. A formal syntax is provided in APPENDIX A.

5.1 SOME NOTATION CONVENTIONS

The following notation conventions are used in the prototypes of SDBM syntax in this chapter.

Upper case : indicates an SDBM keyword which is to be typed in verbatim by the user.
Braces { } : indicates an optional repetition. The symbol inside may be repeated zero or more times.
Brace Plus { }+ : indicates a repetition. The symbol inside may be repeated one or more times.
Brackets [ ] : indicates an optional alternative. The symbol inside may have zero or one occurrence.
Bracket Plus [ ]+ : indicates an alternative. The symbol inside must have one occurrence.
< > : indicates a nonterminal symbol.
Vertical bar | : indicates an alternative.
:= : indicates that the left side is to be replaced by the right side.

To increase readability, in the syntax we always add some kind of prefix or postfix to nonterminal symbols to indicate their usage. In the case of ambiguity, double quotes " " are used to delimit a terminal symbol.

5.2 SDBM TYPE SYSTEM

In this paper, the two-view conceptualization is used throughout the design of SDBM. To realize such a two-view conceptualization in the computer, we need a mechanism to support data representations and a mechanism to support conceptual structures. The former is the SDBM type system, which is discussed in this section. The latter will be discussed in Section 5.3.

5.2.1 ELEMENTARY DATA TYPES

The built-in elementary SDBM data types are NUMERIC, BOOLEAN, CHARACTER, and ENUMERATION. They are the elementary data types that will be used to construct other, more complex data types. Each elementary data type has certain built-in operations that are permissible on it.

5.2.1.1 NUMERIC

An SDBM NUMERIC data type is specified in a way similar to the COBOL picture clause.
The prototype of the SDBM NUMERIC data type declaration is

    TYPE ( <NumericTypeID>= NUMERIC WITH <NumericRange> { OR <NumericRange> } )
    <NumericRange>:= <NumericConstant>..<NumericConstant>

The following are some examples.

Example 5-1.

    TYPE ( Salary= NUMERIC WITH 10000..99999 )
    TYPE ( Height= NUMERIC WITH 0..99.99 )
    TYPE ( Temperature= NUMERIC WITH -99..99 )

In the above examples, each type declaration is preceded by the keyword 'TYPE' to distinguish it from window declarations and transaction declarations. 'Salary', 'Height', and 'Temperature' are numeric-type identifiers. 10000..99999 declares a five-digit positive integer data type whose values are between 10000 and 99999; 0..99.99 declares a non-negative real number with two digits before the decimal point and two digits after; and -99..99 declares an integer data type which can assume either a positive or a negative value with two digits. Like any other type declaration, the declaration is enclosed in parentheses ( ).

The operations defined for the NUMERIC data type are the assignment, the arithmetic operations, and the relational operations.

5.2.1.2 BOOLEANS

A boolean value can be either 'TRUE' or 'FALSE'. The specification is straightforward. For instance

Example 5-2.

    TYPE ( True= BOOLEAN WITH {TRUE} )
    TYPE ( B= BOOLEAN WITH {TRUE,FALSE} )

The prototype of the BOOLEAN data type declaration is

    TYPE ( <BooleanTypeID>= BOOLEAN WITH "{"TRUE|FALSE|TRUE,FALSE|FALSE,TRUE"}" )

There is no ordering defined for the BOOLEAN data type. The operations on a BOOLEAN data type include the assignment and the logic operations (AND, OR, NOT).

5.2.1.3 CHARACTERS

A character data type consists of data objects that have a string of characters as their values. This data type specifies a desired format for character strings. For example

Example 5-3.
    TYPE ( Name= CHARACTER WITH A(10) )
    TYPE ( Addressnum= CHARACTER WITH 9(4) )
    TYPE ( Student#= CHARACTER WITH X(11) )

Notice that although 'Addressnum' is a four-digit number, which is indicated by 9(4), it is not of a numeric type. That is, the arithmetic operations do not apply to it. A(10) indicates an alphabetic string with a length of 10. X(11) represents an alphanumeric string with a length of 11.

The prototype of the CHARACTER data type declaration is

    TYPE ( <CharacterTypeID>= CHARACTER WITH <CharacterPicture> {OR <CharacterPicture>} )
    <CharacterPicture>:= X(<Integer>) | 9(<Integer>) | A(<Integer>) | <CharacterList>
    <CharacterList>:= { [A|X|9|<QuateList>]+ }+
    <QuateList>:= "<Character>{<Character>}"

The above syntax indicates that a character data type can be defined to have several variants, which are separated by the keyword OR. This is desirable, for example, when telephone numbers are to be recorded, so that we do not have to write down the area code for each phone number. <Character> is defined in APPENDIX A.

5.2.1.4 ENUMERATIONS

An enumeration is a list of distinctive values. Usually, these values are in a certain order. To eliminate some potential ambiguities, the ordering will not be considered in SDBM. Some declaration examples of the ENUMERATION data type are

Example 5-4.

    TYPE ( Sex = ENUMERATION WITH {Male,Female} )
    TYPE ( A = ENUMERATION WITH {High,Medium,Low} )

The enumeration data type is especially useful when a data object can only take on a small number of values. The prototype of the enumeration data type declaration is

    TYPE ( <EnumerationTypeID>= ENUMERATION WITH "{" <EnumerationList> "}" )

The basic operations on the ENUMERATION data type are the assignment and the relational operations.
The relational operations do not include the less-than and greater-than operators because no ordering of listed values is provided in the current specification of the SDBM ENUMERATION data type.

5.2.2 STRUCTURED DATA TYPES

A structured data type is a data type whose data objects are aggregates of other data objects. A structured data type can be constructed from elementary or any other previously declared data types. Currently, the only structured data type supported by SDBM is the SDBM record data type. An SDBM record data type is similar to an ordinary record data type. However, improvements have been introduced in order to eliminate some problems associated with the ordinary record data type. Let us first use an example to demonstrate the concept.

Example 5-5.

    TYPE ( Date= RECORD WITH
        Month:Monthtype,
        Day:Daytype,
        Year:Yeartype;
      BEGIN
        Month LT 13 AND Month GT 0,
        IF Month EQ 1 OR 5 OR 7 OR 8 OR 10 OR 12
        THEN Day LE 31
        ELSE IF Month EQ 2
             THEN Day LE 28 OR 29
             ELSE Day LE 30
             ENDIF
        ENDIF
      END )

In the above example, the SDBM record type 'Date' defines a set of values that can be taken to represent a chronological date. This data type guarantees (almost) that any incorrect input will be rejected when it goes through type checking. Therefore, the integrity of the date description is protected. 'Monthtype', 'Daytype', and 'Yeartype' are data types that must be declared previously.

The prototype of the SDBM record data type declaration is

    TYPE ( <RecordTypeID>= RECORD WITH
        [ [VARIANT|COMMON] <ComponentDeclaration>{, [VARIANT|COMMON] <ComponentDeclaration>}; ]
        [ BEGIN <ConstraintAssertion>{, <ConstraintAssertion>} END ] )
    <ComponentDeclaration>:= <ComponentID>:<TypeSpecification> [MULTIPLE|OPTIONAL]
    <TypeSpecification>:= <TypeDefinition> | <SubtypeDefinition> | <TypeID>

The SDBM record data type declaration can be divided into two parts, the component part and the constraint part.
In the component part, which is ended with a semicolon ';', a record structure is declared which is composed of a number of named components that are usually heterogeneous. The components of this structure can be declared to have any previously declared data type.

The SDBM record structure can signify that certain components are multi-valued with the keyword MULTIPLE. If a component is specified to be multi-valued, it can take more than one value of the same type. Because the role modelled by an SDBM window may not be syntactically homogeneous, certain components may not make any sense for some objects. Therefore, the SDBM record structure allows its components to be optional, which is specified with the keyword OPTIONAL.

Another important feature of the SDBM record structure is that it can have several variants. This kind of record structure specifies a collection of records of different syntactic types. All of them have some components in common. Components that appear under the keyword COMMON, and components that follow neither COMMON nor VARIANT, are common to all the records of the structure. But some of these records may have components that are different from others. Such record variants are specified by giving several component-ID and component-declaration pairs, which are placed under the keyword VARIANT. For example, an employee payroll record structure may have two variants because permanent employees are paid monthly and temporary employees are paid hourly.

Example 5-6.

    TYPE ( Employee= RECORD WITH
        COMMON
          EmpID: Integer,
          Dept: DeptName,
        VARIANT
          EmpType: {Permanent},
          MonthlyRate: Rate1,
          StartDate: Date,
        VARIANT
          EmpType: {Temporary},
          HourlyRate: Rate2,
          OvertimeRate: Rate3 )

In the constraint part, which is optional and is enclosed by BEGIN...END, constraints on components or among components are expressed. Constraints are all specified in first-order logic.
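As an informal illustration of how the constraint part acts as a first-order condition on component values, the constraints of the 'Date' record in Example 5-5 can be written as a Python predicate. This is only a sketch under stated assumptions: the function name is hypothetical, 'Day LE 28 OR 29' is read as 'Day LE 28 OR Day LE 29', and March is included among the 31-day months even though the printed example does not list month 3. Lower bounds on Day are assumed to come from 'Daytype' and are not checked here.

```python
def date_satisfies_constraints(month: int, day: int) -> bool:
    """Hypothetical predicate mirroring the constraint part of the Date record."""
    # Month LT 13 AND Month GT 0
    if not (0 < month < 13):
        return False
    # 31-day months (assumption: month 3 added to the printed list)
    if month in (1, 3, 5, 7, 8, 10, 12):
        return day <= 31
    # "Day LE 28 OR 29": leap years are not distinguished, so 29 is accepted
    if month == 2:
        return day <= 28 or day <= 29
    return day <= 30


print(date_satisfies_constraints(2, 29))   # True
print(date_satisfies_constraints(2, 30))   # False
print(date_satisfies_constraints(4, 31))   # False
print(date_satisfies_constraints(13, 1))   # False
```

Because the year is not consulted, 29 February is accepted in every year, which is one way to read the remark that the type guarantees correctness only "almost".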
Constraint assertions are separated by a comma ',', which is shorthand for the logic operator AND. To simplify specification, an IF...THEN...ELSE structure is introduced, and Boolean expressions are allowed on the right side of a relational operation sign. ENDIF is used to eliminate any potential syntactic ambiguity associated with the IF...THEN...ELSE structure.

The constraints specified here apply only to the values of the data type. Namely, the SDBM type system only deals with domain constraints and, to a certain degree, cardinality constraints. The constraints on objects are specified in the window declaration part.

There are two basic operations defined for the SDBM record data type. One is the component selection operation (or the dot operation), whose syntax is

    <VariableID>.<ComponentID>

It selects the designated component from the record bound to the variable <VariableID>. The other operation is the assignment, which assigns the value of a record to another record of identical data type. For example

    Record1 = Record2

assigns the value of Record2 to Record1. Here variable Record1 and variable Record2 have been declared to be of the same record data type.

5.2.3 ABSTRACT DATA TYPES

An abstract data type is a new data type defined by the database designer that includes:

1. one or more data type definitions.
2. a set of abstract operations on the defined data types. These operations are usually defined with several program units (procedures or functions).
3. encapsulation of the type definition and the operations.

In SDBM, we provide a facility which supports the above ADT (Abstract Data Type) concept. The following is an example that illustrates the specification of an SDBM abstract data type 'Date1'.

Example 5-7.
    ADT ( Date1 =
        TYPE Date LOCAL;
        ADTOP Compare(d1:Date,d2:Date): IS ENUMERATION WITH {L,S,E};
        ACTION:
        FUNCTION Compare
          TYPE (T IS NUMERIC WITH 19760000..20000000);
          VARIABLE x,y: T;
        BEGIN
          x= d1.Year*10000+d1.Month*100+d1.Day,
          y= d2.Year*10000+d2.Month*100+d2.Day,
          IF x EQ y THEN compare= "S",
          IF x GT y THEN compare= "L",
          IF x LT y THEN compare= "E"
        END FUNCTION compare )

The declaration of an SDBM ADT can be recognized by the keyword ADT. In the above declaration, the type definition of the abstract data type is given after the keyword TYPE. The logical structure (syntactic structure) of ADT 'Date1' is defined with the data type 'Date', which of course must be previously declared. The operations that are permitted to access the internal components of the data type are declared after the keyword ADTOP, which is short for ADT Operations. Parameters of the ADT operation are given along with their data types. Operations are defined to be either functions or procedures. In the former case, a single result is returned explicitly. In the latter case, no result or more than one result is returned explicitly.

In the above declaration, there is a keyword LOCAL for the data type 'Date'. This declaration excludes any other operations from accessing the internal structure of the data type 'Date'. Therefore, the encapsulation abstraction is achieved. Notice that such encapsulation can only be obtained through dedicated language or model design and implementation. Ordinary subprogram facilities such as those in PASCAL are not sufficient to prevent illegal accesses to the internal structure, even though these facilities can be used to produce data types with new operations.

The details of ADT operations are specified after the keyword ACTION. In this part, actions performed by one or more operations are defined.

Let us explain the ADT operation 'Compare'. This operation is used to compare two dates. If the first date is later than the second date, character 'L' is returned.
If the dates are the same, 'S' is returned. If the first date is earlier, 'E' is returned. Therefore, we define the value returned to be of an enumeration data type. This enumeration type is defined to be a subtype of the built-in data type ENUMERATION [19]. We assume that all the components of the data type 'Date' have been declared to be numeric and of the same data type. Arithmetic operations are performed on them, and the results are compared. Based on these comparisons, appropriate characters are assigned to the ADT operation.

[19] The concept of subtype will be discussed in the next section.

The prototype of the SDBM ADT declaration is as follows. For more detail, see APPENDIX A.

    ADT ( <ADTID>= TYPE <TypeID> [LOCAL];
        ADTOP [<ADTFunctionOP>|<ADTProcedureOP>]+ {, [<ADTFunctionOP>|<ADTProcedureOP>]+ };
        ACTION: <ADTProcedureOrFunction> )
    <ADTFunctionOP>:= <ADTFunctionID>(<FunctionParameters>): <TypeSpecification>
    <ADTProcedureOP>:= <ADTProcedureID>(<ProcedureParameters>)

Though, generally speaking, the internal structure of an abstract data type should not be accessible from outside the ADT definition, sometimes we only want to add several special operations to a data structure. We do not want to hide the structure, and we do not want to eliminate the ordinary operations defined for that data type in SDBM. In this case, the keyword LOCAL must not be specified. This will make the structure of the ADT and the operations on the structure visible outside. That is, any operation applicable to the data type <TypeID> will also be applicable to the ADT.

5.2.4 SUBTYPE HIERARCHIES

The concept of subtype hierarchy (or type hierarchy for short) has been supported by several programming languages or data models such as Simula 67 (Birtwistle [1973]), Smalltalk (Ingalls [1978]), Ada (Wegner [1980]), and Galileo.
The type hierarchy concept of Ada is a facility to give another name to a type whose set of values has been constrained. The concepts of type hierarchy supported by Simula 67, Smalltalk, and Galileo are similar. The type hierarchy concept of Galileo, for example, can be defined as follows. Data type t' is a specialization or a subtype of the data type t if a value of t' can be used in any context where a value of the data type t is permitted.

The type hierarchy concept of SDBM is similar to that defined above. However, there are both conceptual and structural differences between the type hierarchy of Galileo and that of SDBM. Conceptually, the type hierarchy of Galileo is a mechanism used to deal with the intensional aspect of its class hierarchy. Its extensional aspect is dealt with by the derived classes of Galileo. With SDBM, the type hierarchy does not necessarily define the inclusion hierarchy of windows. Structurally, Galileo adopts a mechanism that automatically establishes type hierarchies among data types based on their structure. One flaw associated with this approach is that two 'compatible' structures may have very different semantics. Consider for instance the following two data type examples defined in Galileo.

Example 5-8.

    TYPE ( Company := ( Name : STRING AND
                        Address : Address AND
                        Phone : PhoneNumber )
           AND
           Person := ( Name : STRING AND
                       Address : Address AND
                       Phone : PhoneNumber AND
                       BirthDate : Date ) )

Obviously, the data type 'Person' should not be considered a subtype of the data type 'Company'. With SDBM, a type hierarchy is established only when the designer explicitly specifies it and the related data structures are compatible according to the SDBM type hierarchy rules, which will be explained later in this section.

Let us first look at the type hierarchy related to elementary data types.
The prototype is:

    TYPE ( <SubtypeID> IS <TypeID> [ WITH <ElementaryTypePicture> ] )
    <ElementaryTypePicture>:= <NumericRange> | <BooleanPicture> | <CharacterPicture> | <EnumerationPicture>

where <TypeID> must be a previously declared type or subtype ID. The four built-in elementary data types, NUMERIC, BOOLEAN, CHARACTER, and ENUMERATION, can be used here. They cannot be used as subtypes. <ElementaryTypePicture> is in fact a constraint which must specify a subset of the values of type <TypeID>; otherwise the subtype relationship cannot be established.

Notice that an elementary subtype declaration does not introduce a new data type. A subtype and its supertype are the same data type in terms of type equivalence. To put it in a different way, every variable declared of the subtype will have the same set of permissible operations as that of the supertype. However, it takes its values from a subset of the values permitted to the supertype.

The following are several examples of elementary data type hierarchy declarations.

Example 5-9.

    TYPE ( ManagerSalary IS Salary WITH 30000..99999 )
    TYPE ( Open IS BOOLEAN )
    TYPE ( GradStudent# IS Student# WITH "95-"XXXXXX )
    TYPE ( Sex IS ENUMERATION WITH {M,F} )

We now clarify the SDBM record type hierarchy concept with several examples. To a certain extent, in SDBM the type hierarchy is a convenient way of creating data types. For example

Example 5-10.

    TYPE ( Time IS Date WITH Hour : (+99,NUMERIC) )
    TYPE ( Am IS Time WITH BEGIN Hour LE 12 END )
    TYPE ( SpringDate IS Date WITH Month: IS MonthType WITH 3..5 )

In the first example, the data type 'Time' is defined to be the data type 'Date' with an additional component Hour, which is defined by the data type HourType, assumed to be a subtype of NUMERIC. The data type 'Date' used above is defined in Section 5.2.2.
In the second example, the data type 'Am' is defined to be the data type 'Time' with the additional constraint that Hour must be less than or equal to 12. Structurally, the declaration of 'Am' is equivalent to the following declaration.

Example 5-11.

    TYPE ( Am = RECORD WITH
        Month: MonthType,
        Day: DayType,
        Year: YearType,
        Hour: HourType;
      BEGIN
        Hour LE 12,
        Month LE 12,
        IF Month EQ 1 OR 5 OR 7 OR 8 OR 10 OR 12
        THEN Day LE 31
        ELSE IF Month EQ 2
             THEN Day LE 28 OR 29
             ELSE Day LE 30
             ENDIF
        ENDIF
      END )

These two declarations are, however, not equivalent. In the latter case, no type hierarchy is established. That is, the data type 'Am' declared in the latter example is not a subtype of 'Time'. It is a different data type with the same structure. We can make the latter a subtype of 'Time' by explicitly adding 'IS Time' to the declaration.

Example 5-12.

    TYPE ( Am IS Time WITH
        Month: MonthType,
        Day: DayType,
        Year: YearType,
        Hour: HourType;
      BEGIN
        Hour LE 12,
        Month LE 12,
        IF Month EQ 1 OR 5 OR 7 OR 8 OR 10 OR 12
        THEN Day LE 31
        ELSE IF Month EQ 2
             THEN Day LE 28 OR 29
             ELSE Day LE 30
             ENDIF
        ENDIF
      END )

It is apparent that the declaration of 'Am' in Example 5-10 has the advantage of economical representation. Namely, it occupies much less space because only the additional components need to be declared. In SDBM, the order of components in a record data type does not affect the type hierarchy. Therefore, the above declaration is more flexible in that it can have its components in any order.

In SDBM, a record subtype ST is obtained by adding information to a record type T. Information can be added in any of three ways. First, the information can be added so that the set of components of ST contains the set of components of T. Second, the information can be added by simply adding more constraint assertions to T. Third, the information can be added through redefining the data type of an existing record component so that the new data type is the subtype of the original one.
This third approach has been illustrated with the last example in Example 5-10. Any combination of these approaches will also produce a subtype.

In summary, we have the following rules to determine the subtype hierarchies. These are necessary conditions but not sufficient ones. They are used in SDBM to check the correctness of the type hierarchy declaration.

1. For any type t, <t, t> [20].
2. For any numeric data types t1 and t2, <t1, t2> only if the numeric value set of type t1 is a subset of the numeric value set of type t2.
3. For any boolean data types t1 and t2, <t1, t2> only if the boolean value set of type t1 is a subset of the boolean value set of type t2.
4. For any character data types t1 and t2, <t1, t2> only if the character value set of type t1 is a subset of the character value set of type t2.
5. For any enumeration data types t1 and t2, <t1, t2> only if the enumeration value set of type t1 is a subset of the enumeration value set of type t2.
6. For any SDBM record data types t1 and t2, <t1, t2> only if at least one of the following conditions is satisfied.
   a. If c2 is a component of type t2, then t1 must have a corresponding component c1, and the data type of c1 must be a subtype of the data type of c2.
   b. If c2 is a constraint assertion of t2, then c2 must be a constraint assertion of t1.

[20] This notation is borrowed from (Albano [1985]). <a, b> means that type a is a subtype of b.

The significance of the type hierarchy concept can be summarized in three parts. First, it facilitates type checking. The type hierarchy ensures that if type t1 is a subtype of type t2, then a type t1 value can be used in any context where a type t2 value is expected. The reverse, however, is not true. This property makes it easier to develop complex database software on an incremental basis, because once a system is verified and validated for data at a more general level, it is guaranteed for data of any subtype which may be introduced later on.
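The necessary conditions listed in the rules above can be sketched as a structural check in Python. This is an illustration under several assumptions, not the SDBM type checker: the classes Numeric, Enumeration, and Record and the function may_be_subtype are hypothetical; a numeric type is reduced to a single range; constraint assertions are compared as plain strings; and for records the sketch requires both condition (a) and condition (b), which is stricter than the "at least one" wording of rule 6. The component ranges in the usage part are invented for the example.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Union


@dataclass
class Numeric:
    low: float
    high: float


@dataclass
class Enumeration:
    values: FrozenSet[str]


@dataclass
class Record:
    components: Dict[str, "Type"]
    constraints: FrozenSet[str] = frozenset()


Type = Union[Numeric, Enumeration, Record]


def may_be_subtype(t1: Type, t2: Type) -> bool:
    """Necessary (not sufficient) check that t1 can be declared a subtype of t2."""
    if isinstance(t1, Numeric) and isinstance(t2, Numeric):
        # Rule 2: the value set of t1 is a subset of that of t2.
        return t2.low <= t1.low and t1.high <= t2.high
    if isinstance(t1, Enumeration) and isinstance(t2, Enumeration):
        # Rule 5: the listed values of t1 are among those of t2.
        return t1.values.issubset(t2.values)
    if isinstance(t1, Record) and isinstance(t2, Record):
        # Rule 6a: every component of t2 appears in t1 with a subtype.
        components_ok = all(
            name in t1.components and may_be_subtype(t1.components[name], c2)
            for name, c2 in t2.components.items()
        )
        # Rule 6b: every constraint assertion of t2 is also asserted by t1.
        constraints_ok = t2.constraints.issubset(t1.constraints)
        return components_ok and constraints_ok
    return False


salary = Numeric(10000, 99999)
manager_salary = Numeric(30000, 99999)
date = Record(
    {"Month": Numeric(1, 12), "Day": Numeric(1, 31), "Year": Numeric(1976, 2000)},
    frozenset({"Month LE 12"}),
)
time = Record({**date.components, "Hour": Numeric(0, 24)}, date.constraints)
spring_date = Record({**date.components, "Month": Numeric(3, 5)}, date.constraints)

print(may_be_subtype(manager_salary, salary))  # True
print(may_be_subtype(salary, manager_salary))  # False
print(may_be_subtype(time, date))              # True
print(may_be_subtype(date, time))              # False
print(may_be_subtype(spring_date, date))       # True
```

Because the rules are only necessary conditions, a True result here means that a declared hierarchy is not rejected; it does not establish a hierarchy between types that the designer has not explicitly related.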
Second, the type hierarchy concept increases the semantic modelling ability in that it provides a means to associate distinctive semantics with data types in terms of type equivalence. We have mentioned before that SDBM uses the name equivalence rule. With this rule, any two type definitions are always different. For example, in the following declaration:

    TYPE ( MILE = NUMERIC WITH 0..1000 )
    TYPE ( HEIGHT = NUMERIC WITH 0..100 )

both MILE and HEIGHT are numeric data types, but they are not equivalent. Therefore, illegal operations like MILE + HEIGHT can be detected. Data types that are equivalent can be declared with the type hierarchy mechanism. For example, the width and length of a rectangle can be declared to be equivalent as follows:

    TYPE ( WIDTH IS NUMERIC WITH 10..50 )
    TYPE ( LENGTH IS NUMERIC WITH 10..100 )

In fact, any two data types with a common root are equivalent.

Third, the notion of the type hierarchy also enhances the specification of the database. It makes the specification concise and readable. Therefore, it may contribute to more reliable database software development.

As a final comment, for SDBM records, the structural part and the constraint assertion part have different implications for the type checking of type hierarchies.

5.3 SDBM WINDOW SYSTEM

In Section 5.2, we described the SDBM type system. An SDBM data type specifies a set of symbolic values that can be used to describe certain roles played by an object (or a set of objects) and a set of operations for manipulating these values. The SDBM type system only deals with the external data representation aspect of the database. To accomplish the database functions, however, we also need mechanisms to describe objects, to describe properties and relationships among objects, to model persistence of objects, and to open a channel for the database user to communicate with the database. The SDBM window system is such a mechanism.
Section 4.2.6 has outlined the basic idea of the SDBM window concept. This section further clarifies the specific syntactic constructs that support this concept. For this purpose, several examples will be used in this section. Of course, all the explanations are informal. Window 'Student' and window 'Course' are declared as follows. The relevant data types are also defined.

Example 5-13.

    TYPE ( StudentType = RECORD WITH
        StdNo: StudentNumber,
        StdName: StudentName,
        DeptName: DepartmentName,
        CrsNo: CourseNumber (MULTIPLE),
        Section: SectionNumber (MULTIPLE),
        Address: Address,
        PhoneNo: PhoneNumber )

    TYPE ( CourseType = RECORD WITH
        CrsNo: CourseNumber,
        CrsTitle: CourseTitle,
        DeptName: DepartmentName,
        Credit: CourseCredit )

    TYPE ( CourseSectionType = RECORD WITH
        CrsNo: CourseNumber,
        Section: SectionNumber,
        Instructor: InstructorName,
        StdNo: StudentNumber (MULTIPLE) )

    TYPE ( DepartmentType = RECORD WITH
        DeptName: DepartmentName,
        Director: DirectorName,
        Address: Address )

    WINDOW ( Student =
        TYPE: StudentType;
        REFERENCE: Department(DeptName = DeptName);
        RELATIONSHIP:
            Take(CourseSection(CrsNo = CrsNo, Section = Section)) <--> IsTaken,
            BelongTo(Department(DeptName = DeptName)) <--> Have;
        KEY: StdNo )

    WINDOW ( Course =
        TYPE: CourseType;
        REFERENCE: Department(DeptName = DeptName);
        RELATIONSHIP:
            IsOffered(Department(DeptName = DeptName)) <--> Offer,
            Have(CourseSection(CrsNo = CrsNo)) <--> BelongTo;
        KEY: CrsNo )

    WINDOW ( CourseSection =
        TYPE: CourseSectionType;
        REFERENCE: Course(CrsNo = CrsNo);
        RELATIONSHIP:
            BelongTo(Course(CrsNo = CrsNo)) <--> Have,
            IsTaken(Student(StdNo = StdNo)) <--> Take;
        KEY: CrsNo, Section )

    WINDOW ( Department =
        TYPE: DepartmentType;
        RELATIONSHIP:
            Have(Student) <--> BelongTo,
            Offer(Course) <--> IsOffered;
        KEY: DeptName )

Several points need to be elaborated. First of all, there are four attributes that may be defined in an SDBM window specification. They are TYPE, REFERENCE, RELATIONSHIP, and KEY. We explain these attributes in turn.

The TYPE attribute defines what type of values can be used to describe an object seen through that window. Therefore, in the example, a Student object must have values of data type StudentType and a Course object must have values of data type CourseType [21]. Different identifiers are required for different windows and their corresponding data types. This is to eliminate potential confusion that might occur in the variable declaration part of the SDBM transaction, which will be discussed later on. The data type used in the SDBM window declaration must be previously specified in the type declaration part of the SDBM schema.

[21] When we say that a Student object must have values of data type StudentType, we mean that only the values of data type StudentType can be used to describe the object playing the role of a student.

In this thesis, a component of an SDBM record data type that is used to describe the SDBM window, and therefore to describe certain aspects of objects, is called a property of the objects. TYPE, REFERENCE, RELATIONSHIP, and KEY are called attributes of the window.

Interrelationships that affect insertions or deletions of objects need to be specified in the window declaration part. In other words, references must be declared explicitly. This attribute is necessary because in SDBM objects and symbolic strings are explicitly differentiated. Such a declaration is accomplished through the REFERENCE attribute.
Identifiers of the referenced windows are listed after the keyword REFERENCE. In order to successfully create or insert an object with reference constraints, the database designer must also specify how references should be established. For this purpose, certain property identifiers that can identify specific objects in a window may be given in the parentheses following the window identifier in the REFERENCE attribute. These identifiers must be components of the data type corresponding to that window so that data types are secured and value transmissions are provided. For example, in window Course we have the following reference:

    REFERENCE: Department(DeptName = DeptName)

Here the identifier DeptName before the '=' sign must identify a property of window Department, and the identifier DeptName after the '=' sign must be a property identifier of window Course. These identifiers are not necessarily keys. The REFERENCE attribute is optional, which indicates that not every object depends on some other object to exist. The specific implications of the REFERENCE attribute for database operations will be discussed later when we describe the SDBM transaction.

The RELATIONSHIP attribute is used to explicitly establish relationships between objects. Let us explain this attribute with an example taken from Example 5-13.

    RELATIONSHIP: Take(CourseSection(CrsNo = CrsNo, Section = Section)) <--> IsTaken;

The symbolic string 'Take' is the identifier for a relationship between a Student object and a CourseSection object. CourseSection in the parentheses identifies the window through which a Student object is to establish a relationship with an object, i.e. a CourseSection object. In the inner parentheses, the symbolic string 'CrsNo' before the '=' identifies a property of CourseSection objects, and the symbolic string 'CrsNo' after the '=' represents a property of Student objects.
These two identifiers need not be identical, as they happen to be in this case, but they must have the same data type. 'Section = Section' in the same parentheses can be interpreted similarly. These property identifiers are used to establish a specific relationship between two objects. Whenever a Student object is created or inserted into the Student window, or when its CrsNo property value and/or Section property value is modified, its new CrsNo property value and Section property value will be passed to the CourseSection window. Based on the values of the CourseSection properties CrsNo and Section, the appropriate CourseSection object will be retrieved. If such an object is found, a specific relationship will be established. Otherwise, no relationship will be established. More detailed implications for data manipulation will be discussed in Chapter 6.

The symbolic string 'IsTaken' after the sign '<-->' identifies the same relationship as identified by 'Take' but views it from the reverse direction. Namely, a relationship between two objects, for example between a student and a course section, can be viewed from two different angles. On one side, a student is 'taking' a course section. On the other side, a course section is 'taken' by a student. By specifying 'Take <--> IsTaken', we explicitly define two different views of the same relationship. Although there are two identifiers for a relationship, we do not need to establish both the 'Take' relationship and the 'IsTaken' relationship in order to establish the relationship between a student and a course section. In fact, once a 'Take' relationship is established, an 'IsTaken' relationship is automatically established, because they are actually the same relationship.

The specification in Example 5-13 permits the relationship between a Student object and a CourseSection object to be established from either side. However, the relationship between a student and a department can only be established through the Student window.
We give this restriction in the specification only to demonstrate that the way of establishing a relationship is controlled by the database designer. The information requirements may show that in the real world the relationship that a student belongs to a department is established when the student is registered rather than when the department is created. Therefore, it is also natural to have the above restriction. The RELATIONSHIP attribute is optional.

Let us discuss how the REFERENCE attribute and the RELATIONSHIP attribute differ with respect to existence. The REFERENCE attribute specifies the semantic integrity constraint that must be satisfied by an object if it is to exist in the window. The RELATIONSHIP attribute, however, specifies the semantic integrity constraint for the existence of a relationship. It states that for a relationship to exist the two participating objects must exist. But the existence of a relationship does not affect the existence of objects.

The KEY attribute is used to impose the uniqueness constraint on the data representations of objects seen through a window. Identifiers listed after the KEY attribute must not conflict with the associated data type declaration. In other words, they must be components of the data type and therefore properties of objects when seen through that window. Apart from specifying the uniqueness constraint, keys in SDBM are solely for external convenience. They are not used internally to identify objects. The KEY attribute is also optional.
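The two-sided nature of a relationship such as 'Take <--> IsTaken' can be sketched informally in Python. The sketch below is not SDBM syntax and makes several assumptions: the class Relationship, its methods, and the object identifiers are all hypothetical, and objects are represented simply by strings. It stores each link once and answers queries under either identifier, and it shows that a relationship needs both objects to exist while dissolving it leaves the objects untouched.

```python
class Relationship:
    """One stored relationship, named differently from each side."""

    def __init__(self, forward, inverse):
        self.forward = forward    # e.g. 'Take', seen from the Student side
        self.inverse = inverse    # e.g. 'IsTaken', seen from CourseSection
        self.pairs = set()        # (source object, target object)

    def establish(self, source, target, existing):
        """Link two objects; both participants must already exist."""
        if source not in existing or target not in existing:
            return False
        self.pairs.add((source, target))
        return True

    def seen_as(self, name, obj):
        """View the same stored links under either identifier."""
        if name == self.forward:
            return {t for s, t in self.pairs if s == obj}
        if name == self.inverse:
            return {s for s, t in self.pairs if t == obj}
        raise KeyError(name)

    def dissolve(self, source, target):
        """Remove the link only; the participating objects are untouched."""
        self.pairs.discard((source, target))


# Hypothetical object identifiers standing in for persistent objects.
objects = {"student:1011", "section:CPSC511/001"}
take = Relationship("Take", "IsTaken")
take.establish("student:1011", "section:CPSC511/001", objects)

# Establishing 'Take' is enough: 'IsTaken' is the same link read backwards.
print(take.seen_as("IsTaken", "section:CPSC511/001"))
```

Storing the link once and deriving both views is one possible design; it avoids the two directions ever disagreeing, which is the property the text attributes to 'Take' and 'IsTaken'.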
The prototype of the SDBM window declaration is

    WINDOW ( <WindowID> [ IS <WindowID> WITH | = ]+
        TYPE: <TypeID> {, <TypeID>};
        [REFERENCE: <References>;]
        [RELATIONSHIP: <Relationships>;]
        [KEY: <KeyIDs>] )

    <References> := <WindowID>(<PropertyPairs>) {, <WindowID>(<PropertyPairs>)}
    <PropertyPairs> := <PropertyID> = <PropertyID> {, <PropertyID> = <PropertyID>}
    <Relationships> := <Relationship> {, <Relationship>}
    <Relationship> := <RelationshipID>(<WindowID>[(<PropertyPairs>)]) <--> <RelationshipID>
    <KeyIDs> := <PropertyID> {, <PropertyID>}

Notice that in the above SDBM window syntax specification, a window can be associated with multiple data types. This reflects the real world phenomenon that two collections of properties with different symbolic appearances may be abstracted to the same conceptual role. Consider the familiar example that an employer could be an individual, a corporation, or a government organization. This fact can be modelled in two ways. On one hand, if we are interested in individuals, corporations, and government organizations in their own right, so that we treat them as different roles and model them with different windows, we can model the fact with an employer window which has a single data type. We then insert [22] the corresponding individual objects, corporation objects, or government organization objects into the employer window. On the other hand, if we are not particularly interested in treating individuals, corporations, and governments as objects, the fact can be modelled with an employer window that has multiple data types. For example:

    WINDOW ( Employer =
        TYPE: PersonEmployerType, CompanyEmployerType, GovernmentEmployerType )

[22] The 'insert' operation will be explained in Section 5.4.3.4.

Let us take a look at how the fact that a student is enrolled in a certain section of a course can be modelled in SDBM. For convenience, we call this fact enrollment.
In Example 5-13, we have in fact modelled enrollment. But there an enrollment does not have its own properties, so enrollments are modelled with a binary relationship through the properties of students and courses. In the following examples, enrollments with their own properties will be discussed. An enrollment can be viewed from two different angles. First, it can be treated as an object and declared as follows.

Example 5-14.

    TYPE ( EnrollmentType = RECORD WITH
        StdName: StudentName,
        StdNo: StudentNumber,
        CrsNo: CourseNumber,
        Section: SectionNumber,
        Grade: Grade )

    WINDOW ( Enrollment =
        TYPE: EnrollmentType;
        REFERENCE: Student(StdNo = StdNo), Course(CrsNo = CrsNo);
        KEY: StdNo, CrsNo, Section )

The above declaration defines Enrollment objects to have properties StdName, StdNo, CrsNo, Section, and Grade. The existence of an Enrollment object is dependent on the existence of the corresponding Student object and Course object. This approach is similar to that of the relational data model.

Alternatively, we can view an enrollment as an aggregation of a student object and a course object. Instead of creating a new object for the enrollment, we can simply generate a new 'window' through which the aggregation of the student object and the course object is seen. So far in discussing the SDBM window concept, we have restricted ourselves to seeing only certain aspects of objects of the same kind through a window. Here, we extend the SDBM window concept so that relationships among objects of different kinds can also be viewed through a window. Let us explain this concept through the following declaration.

Example 5-15.
    WINDOW ( Enrollment =
        TYPE: StdNo OF Student,
              CrsNo OF Course,
              Section: SectionNumber,
              Grade: Grade;
        AGGREGATE: Student, Course )

The difference between this declaration and the declaration in Example 5-14 is that this SDBM window declaration does not create a new set of objects. In other words, there will be no enrollment objects. All we have from this declaration is a new view of some already existing objects. The keyword AGGREGATE indicates the windows from which objects will be used to construct the aggregation. The keyword TYPE defines the external view of the aggregation. In this case, properties StdNo, CrsNo, Section, and Grade are given. Notice that the window identifier following the keyword OF indicates that the property must also be a property defined in that window, and that it is used to establish the link between a specific aggregation and a specific object of that window. Furthermore, such an aggregation does not simply put two objects together. It may have its own characteristics too. These characteristics do not belong to any single underlying object and cannot be obtained from them. Rather, they must be explicitly specified. In SDBM, they are declared in the TYPE part of the aggregation window. The detailed syntax of aggregation is given in Appendix A.

5.3.1 SDBM SUBWINDOW HIERARCHY

In SDBM, an instance relationship is simply a relationship between a window and its objects. SDBM supports a three-level instance hierarchy, i.e. object, window, and metawindow. The metawindow-window hierarchy, however, is trivial in SDBM. There is only one built-in metawindow, WINDOW, of which every specific window is declared as an instance. It is not possible to define new metawindows in SDBM. This restriction, however, is not a defect of the model, for two reasons. First, three levels of instance hierarchy are sufficient for modelling most data processing phenomena.
Instance relationships above these three levels are both beyond our comprehension and of little practical use. Second, the existence of the SDBM type system partially renders a complex metawindow definition facility unnecessary.

On the other hand, the SDBM subwindow concept provides a mechanism to give an object a more detailed view by adding to the object information which cannot be seen through its superwindows. We use the following example to clarify the specification of the SDBM subwindow hierarchy.

Example 5-16.

    TYPE ( GradStudentType IS StudentType WITH
        AdvisorName: PersonName,
        AdvisorPhone: PhoneNo )

    WINDOW ( GradStudent IS Student WITH
        TYPE: GradStudentType; )

A subwindow hierarchy is specified with the keyword IS. The changed attributes are specified after the keyword WITH. The semantics of this definition need to be clarified.

Firstly, for this subwindow hierarchy to be established, data type GradStudentType must be a subtype of data type StudentType. SDBM is deliberately not designed so that the subtype hierarchy determines the subwindow hierarchy automatically, because that would assume that each type is associated with a unique window. Types and windows are two different concepts and therefore should be separated. In SDBM, subwindow relationships must be explicitly declared.

Secondly, the REFERENCE attribute of window Student is inherited by window GradStudent. Additional REFERENCE attributes can be declared for window GradStudent. The RELATIONSHIP attribute of window Student is also inherited by window GradStudent. Additional relationships between objects may exist through subwindows. The KEY attribute of window Student is inherited by window GradStudent. An extra KEY attribute may be added to window GradStudent.
Finally, as soon as the subwindow relationship is verified, internal routines will be generated to ensure that every object seen through window GradStudent can also be seen through window Student. When an object is successfully inserted into window GradStudent, it will also be inserted into window Student. Upon deletion of an object from GradStudent, it will also be deleted from window Student if such a cascade is allowed.

5.3.2 CONSTRAINTS AMONG WINDOWS

In Section 4.2.7, we pointed out the limitations of specialization as a data modelling mechanism. We identified three different types of constraints among windows. The last section described the SDBM subwindow hierarchy, which is the mechanism to model Type 1 constraints, namely the constraints that if an object is seen in the first window it must be seen in the second window. This section presents the constructs for modelling Type 2 and Type 3 constraints.

In SDBM, the construct MAYBE is used to record Type 2 constraints. Let us consider an example.

Example 5-17.

    WINDOW ( Student IS Person, MAYBE Instructor WITH
        TYPE: StudentType )

The declaration specifies that if an object plays the role of a student, it must also play the role of a person. By using the keyword MAYBE, the declaration also specifies that an object may be a student and an instructor at the same time. Notice that IS constraints and MAYBE constraints have very different implications for the semantics of data manipulation. With the IS constraint, a student automatically 'becomes' a person. With the MAYBE constraint, a student is only permitted to 'become' an instructor. It 'becomes' an instructor only after an appropriate SDBM Insertion Statement is successfully executed. The SDBM Insertion Statement will be explained in Section 5.4.3.4.
Because of the symmetrical nature of Type 2 constraints, the effect is the same whether the MAYBE construct is specified with the Student window, with the Instructor window, or with both windows.

A Type 3 constraint is declared with the keyword ISNOT. Consider the following example.

Example 5-18.

    WINDOW ( Student IS Person, MAYBE Instructor, ISNOT Administrator WITH
        TYPE: StudentType )

The above declaration enforces the policy that a full-time student cannot be a university administrator. Every time an object is to be inserted into the Student window, the system will first check whether it is already in the Administrator window. In other words, before the system allows an object to play the role of a student, it will first check whether the object plays the role of an administrator, and vice versa.

5.4 SDBM TRANSACTION SYSTEM

If a database management system is to be useful, at least four basic operations must be provided. They are object creation, object removal, object value [23] retrieval, and object value modification. SDBM provides these four operations with automatic integrity enforcement by performing adequate type checking and adequate window constraint checking. The database state can be permanently affected only after these two kinds of checking are completed successfully. Otherwise, the operation will be aborted and the database will be kept intact.

Besides the above four operations, two additional operations should also be defined because of the conceptualization of data modelling on which SDBM is built. They are object insertion and object deletion. Object insertion inserts an already existing object into a different window or, in other words, gives a new role to the object. Object deletion deletes an object from a window so that it will not be seen in that window. That is, it will no longer play the role described by that window.
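To make the interplay between object insertion and the IS, MAYBE, and ISNOT constraints concrete, the following Python sketch may help. It is an informal illustration under stated assumptions rather than the SDBM mechanism itself: the names WindowSystem, ConstraintViolation, declare, and insert are hypothetical, objects are plain identifiers, and MAYBE is modelled simply by the absence of automatic propagation, so that the second role is acquired only by an explicit insertion.

```python
class ConstraintViolation(Exception):
    """Raised when an insertion would break an ISNOT constraint."""


class WindowSystem:
    def __init__(self):
        self.members = {}    # window -> set of objects seen through it
        self.supers = {}     # window -> superwindows declared with IS
        self.excluded = {}   # window -> windows declared with ISNOT

    def declare(self, window, is_a=(), is_not=()):
        self.members.setdefault(window, set())
        self.supers[window] = list(is_a)
        for sup in is_a:
            self.members.setdefault(sup, set())
        for other in is_not:             # ISNOT is recorded symmetrically
            self.members.setdefault(other, set())
            self.excluded.setdefault(window, set()).add(other)
            self.excluded.setdefault(other, set()).add(window)

    def _with_superwindows(self, window):
        seen, stack = [], [window]
        while stack:
            w = stack.pop()
            if w not in seen:
                seen.append(w)
                stack.extend(self.supers.get(w, []))
        return seen

    def insert(self, obj, window):
        """Give obj the role `window`, plus every role implied by IS."""
        targets = self._with_superwindows(window)
        for w in targets:                # check first, so a failure changes nothing
            for other in self.excluded.get(w, ()):
                if obj in self.members[other]:
                    raise ConstraintViolation(f"{obj}: {w} ISNOT {other}")
        for w in targets:
            self.members[w].add(obj)


ws = WindowSystem()
ws.declare("Student", is_a=["Person"], is_not=["Administrator"])
ws.declare("Instructor", is_a=["Person"])   # MAYBE: allowed, never automatic

ws.insert("obj1", "Student")      # obj1 also becomes a Person
ws.insert("obj1", "Instructor")   # the extra role needs its own insertion
```

All checks run before any window is changed, so a rejected insertion leaves every window as it was, in the spirit of the requirement that an aborted operation keeps the database intact.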
[23] Without causing any confusion, we use object values to mean values that are used to describe certain properties of an object.

Object creation is done by the CREATE statement; object destruction by the DESTROY statement; object value retrieval by the RETRIEVE statement; and object value modification by the MODIFY statement. In addition, object insertion is done by the INSERT statement and object deletion by the DELETE statement.

Often a set of such operations needs to be integrated in order to effect certain conceptually meaningful changes to the database. In terms of abstraction, users are usually only interested in the 'macro' effect of such a set of operations. How such an operation is constructed is not important to them. Moreover, from the user's point of view, such an operation should be not only conceptually meaningful but also conceptually primitive. An operation is primitive if it either achieves its desired effect successfully or terminates without any effect on the database. Conceptually, we want changes to the database to directly reflect changes to the 'real' world being modelled. The SDBM transaction system provides a behavioral abstraction facility allowing the definition of conceptually primitive operations.

As stated before, the major objective of this research is to investigate our conceptualization of data modelling, and SDBM is not designed to be a database programming language. For this reason, we have not attempted to provide a sophisticated transaction system for SDBM. Rather, the transaction constructs are centered around the basic behavioral operations such as those mentioned at the beginning of this section.

5.4.1 TRANSACTION AS AN ABSTRACTION OPERATION

An SDBM transaction closely resembles the subprogram concept in programming languages. The specification allows a user to understand it without digging into how it is implemented. The syntax of a transaction declaration consists of the following four aspects:

1. the name of the transaction;
2. the arguments, their order, and their data types;
3. the results, their order, and their data types;
4. the body of the transaction.

The syntax is borrowed from the typical syntax for subprograms with parameter transmission in PASCAL-like programming languages. The prototype of the transaction declaration is

    TRANSACTION ( <TransactionDesignator> <TransactionBody> )

Below, we will first address issues related to TransactionDesignator. Its semantics will be clarified. In Section 5.4.3, we will discuss TransactionBody in detail.

5.4.2 TRANSACTION DESIGNATOR

The TransactionDesignator gives a means to identify a transaction and specifies the data for the database manipulation and the data to be returned from the manipulation. The identification of the transaction is simply accomplished by giving each transaction (pre-defined set of operations) a unique identifier.

In an SDBM transaction designator, the transaction identifier is followed by a set of formal parameters together with their type identifiers or window identifiers. These formal parameters are further separated into IN parameters and OUT parameters. The former designates parameters that are not modifiable by the transaction, while the latter designates results to be returned. When a formal parameter is given a type or a window, the type or the window must be previously declared in the SDBM type or window system. Any declared type or window is legal.

The prototype of the transaction designator specification is

    <TransactionID> ( [ <ParameterDefinition> {, <ParameterDefinition>} ] )

An SDBM transaction is a procedural operation when it has no OUT parameter or more than one OUT parameter definition. It is a functional operation when it has exactly one OUT parameter. Both kinds are, however, specified uniformly. <ParameterDefinition> is specified as follows.
    [IN] <ParameterID>: <TypeIDOrWindowID>
    or
    OUT <ParameterID>: <TypeIDOrWindowID>

For IN parameters, the keyword IN may be omitted. Below, we make some further comments about the transaction designator of the SDBM transaction facility.

Firstly, all the formal parameters are local to the transaction and are invisible outside.

Secondly, actual parameters are given when the transaction is called. The correspondence between the actual parameters and the formal parameters is established by pairing actual and formal parameters according to their respective positions in the actual and formal parameter lists.

Thirdly, for the IN parameters, the transmission-by-value method is used. That is, when the transaction is called, the value of the actual parameter is copied into the formal parameter and the actual parameter cannot be modified by the transaction. This method is justified because in a database system the primary concern is with modifying the persistent objects rather than modifying variables local to a transaction. In addition, objects are always globally accessible. For the OUT parameters, the transmission-by-result method is used. That is, when the transaction call terminates, the final value of the formal parameter is copied to the actual parameter.

Fourthly, a transaction may have no formal parameters.

Finally, when <TypeIDOrWindowID> is a window identifier, the formal parameter is specified to have objects as its values. For such a formal parameter, the value of the actual parameter is in fact a pointer to some specific object. However, how such binding is implemented should not be a concern of the user.

5.4.3 TRANSACTION BODY

A Transaction Body, corresponding to the body of a subprogram definition, consists of three parts: type declaration, variable declaration, and statements.
The prototype of the definition is

    [<TypeDeclaration>]
    [<VariableDeclaration>]
    BEGIN
        <Statements>
    END

We will describe the type declaration part first, then the variable declaration part, and finally the statement part. The prototype of the type declaration part is

    <TypeDeclaration> :=
        TYPE <TypeID> [ = <TypeDefinition> | <SubtypeDefinition> ]+
             { , <TypeID> [ = <TypeDefinition> | <SubtypeDefinition> ]+ }

The prototype of the variable declaration is

    VARIABLE <VariableList>: <TypeSpecification> | <WindowID>
             { , <VariableList>: <TypeSpecification> | <WindowID> };

This part is provided to ease the design of SDBM transactions. Two kinds of variables can be declared: type variables and object variables. A type variable is specified with a type identifier and an object variable is defined with a window identifier. An object variable usually has objects of the specified window as its values. However, if the window does not directly model an object but an aggregation of two or more objects, the value of the object variable is this aggregation. In effect, an object variable is a surrogate for some existing objects or their aggregations.

Two issues should be clarified: scope rules and initialization. In SDBM, (static) scope rules are straightforward. Temporary data are always visible only within their transaction and persistent data are always globally visible. All the variables declared inside a transaction are temporary and local. They are visible only inside the transaction and are destroyed when the transaction terminates. Object variables are no exception, even though objects are permanent and global.
For both formal parameters and variables, uninitialized values could be dangerous to the database design because the computer cannot distinguish between an initialized value and an uninitialized value. In SDBM, all the parameters and variables are initialized immediately upon creation to NULL, which indicates that they have not been purposely bound to anything.

The major part of a transaction is a set of statements. These statements are used to accomplish database manipulation functions. Some statements provide actions that actually make changes to persistent data in an SDBM database and actions that retrieve data from a database. Others provide control facilities necessary to define a conceptually meaningful transaction. SDBM statements are listed as follows.

    <AssignmentStatement> | <TransactionCallStatement> | <CreateStatement> |
    <DestroyStatement> | <InsertStatement> | <ModifyStatement> |
    <DeleteStatement> | <RetrieveStatement> | <AbortStatement> |
    <CompoundStatement> | <ConditionalStatement> | <IterationStatement> |
    <AddStatement> | <RemoveStatement>

Below, we will discuss these different types of statements in turn.

5.4.3.1 ASSIGNMENT

The syntax of the assignment operation is

    <AssignmentStatement> := <VariableID> = <Expression>

Assignment is the operation used to change the binding between a value and a data object. It is defined in SDBM to have a meaning similar to the assignment statement in PASCAL. The meaning of the above syntax is that the value of the SDBM expression specified by <Expression> is copied to variable <VariableID> and no explicit result is returned. Therefore, an SDBM assignment operation can only be used as a statement. Because SDBM is typed, both the result of the expression on the right side and the variable on the left side have specific data types. Only when both sides have the same data type can the operation pass type checking.
When the type is an SDBM data type, the assignment has the same semantics as that of PASCAL. When the type is an SDBM window, the result of the right side must be bound to a specific object (or aggregate of objects) seen through the window, and the statement causes the object variable on the left to be bound to that object (or aggregate of objects).

5.4.3.2 CREATION

The prototype of <CreateStatement> is

    CREATE <VariableID> INTO <WindowID> WITH ( <PropertyValueList> )

This built-in operation primitive creates a new object whose component values are given by <PropertyValueList>. Conceptually, variable <VariableID> will have the object as its value. This variable, which can be viewed as a container of objects, is specified to provide some operational convenience. For example, we often need to perform additional operations on the object subsequently. If no such variable were defined, we would have to retrieve the object from the database each time we needed it.

In order to create a new object, several conditions must be satisfied. Firstly, variable <VariableID> must have been declared in the transaction to be an object variable of the window designated by <WindowID>, and the values in <PropertyValueList> must conform to the data type of window <WindowID>. Secondly, the execution of an SDBM CreateStatement must check the uniqueness constraints on the window, if the KEY attribute has been specified in the SDBM schema. Any attempt to create an object that violates key constraints will be rejected. Thirdly, before creating a permanent object in the database, SDBM will check the REFERENCE attribute. If the attribute is specified, the system will use the relevant data values given in the value list of the statement to retrieve the referenced objects. If any one of them does not exist, no new object will be created.
If the RELATIONSHIP attribute is specified and some properties are dedicated to establishing the relationship, then the appropriate objects in some other window will be searched for to determine their existence. If the search fails, the corresponding property values will be set to NULL and therefore no relationship will be established as the result of the object creation. Notice that the RELATIONSHIP attribute does not affect the creation of the object itself. Finally, it should be reiterated that a new object can be created through any legitimate window, namely any window that has been declared for the object. The successful execution of a CreateStatement has its effect propagated to all the generalizations (or superwindows) of the window. Namely, the new object will be seen through all its generalized windows. We use the following two examples to further explain the CreateStatement.

Example 5-19.

    CREATE x INTO Student WITH ( StdNo="1011", StdName="Gary Utter", DeptName="Computer Science", CrsNo="CPSC511", Section="001", Address="4460 W12th. Van.", Phone="224-6578" )

    CREATE y INTO Course WITH ( CrsNo="Comm534", CrsTitle="System Analysis", DeptName="Commerce", Credit="1.5" )

Let us explain the first example. If the Department object named 'Computer Science' exists and all the data type requirements are met, an object will be created. This object is currently playing the role of a student. His student number is 1011, his name is Gary Utter, his department is Computer Science, his address is 4460 W12th. Van., and his phone number is 224-6578. If the course CPSC511 exists and it has section 001, then the relationship that he takes CPSC511 in Section 001 will be established. Otherwise, properties CrsNo and Section will have the value NULL, which indicates that he 'takes' no course section or that we do not know what course section he 'takes'. After the creation is completed, variable x will be bound to this object.
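The order of the checks performed by a CreateStatement can be illustrated with a small Python sketch. It is only a toy model under stated assumptions, not part of SDBM: the dictionary-based database, the names `create_object`, `find`, and `CreateRejected`, and the window name `CourseSection` are all hypothetical, and neither data type checking nor propagation to superwindows is modelled.

```python
class CreateRejected(Exception):
    """Raised when a CREATE violates a KEY or REFERENCE constraint."""


def find(db, window, **props):
    """Return the first object seen through `window` matching `props`, or None."""
    for obj in db.get(window, []):
        if all(obj.get(name) == value for name, value in props.items()):
            return obj
    return None


def create_object(db, schema, window, values):
    """Model CREATE ... INTO window WITH (values) on a dict-based toy database."""
    spec = schema[window]
    obj = dict(values)

    # KEY: reject an object that duplicates an existing key value.
    key = spec.get("key")
    if key is not None and find(db, window, **{key: obj[key]}) is not None:
        raise CreateRejected(f"duplicate {key}")

    # REFERENCE: every referenced object must already exist.
    for prop, (target, target_prop) in spec.get("references", {}).items():
        if find(db, target, **{target_prop: obj[prop]}) is None:
            raise CreateRejected(f"no {target} with {target_prop}={obj[prop]!r}")

    # RELATIONSHIP: a missing partner does not block creation;
    # the dedicated properties are set to NULL (None) instead.
    for props, target in spec.get("relationships", {}).items():
        if find(db, target, **{p: obj[p] for p in props}) is None:
            for p in props:
                obj[p] = None

    db.setdefault(window, []).append(obj)
    return obj


# Hypothetical schema fragment mirroring the Student window of Example 5-19.
schema = {
    "Student": {
        "key": "StdNo",
        "references": {"DeptName": ("Department", "DeptName")},
        "relationships": {("CrsNo", "Section"): "CourseSection"},
    }
}
db = {"Department": [{"DeptName": "Computer Science"}], "CourseSection": []}

x = create_object(db, schema, "Student", {
    "StdNo": "1011", "StdName": "Gary Utter", "DeptName": "Computer Science",
    "CrsNo": "CPSC511", "Section": "001",
})
```

In this run the department exists, so the student is created, but no matching course section exists, so CrsNo and Section end up as None, which corresponds to the NULL case described for the first example.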
5.4.3.3 DESTRUCTION

The prototype of <DestroyStatement> is

    DESTROY <VariableID>

This built-in behavioral primitive, which reverses the effect of the CREATE statement, destroys an object from the entire database. That is, when an object is destroyed, it can no longer be seen in any window.

In order to destroy an object from a database, two conditions must be satisfied. First, the variable <VariableID> must already be bound to the desired object. Second, the object must not be referenced by any other object in the database. As long as some object anywhere in the database references the object, the DESTROY statement will be aborted.

When an object is destroyed from a database, all the relationships it participates in are terminated. Viewed from the side of a related object, the effect is that the values of the properties dedicated to establishing the relationship are changed to NULL. When the DESTROY statement is completed successfully, the variable <VariableID> will be bound to NULL.

5.4.3.4 INSERTION

The prototype of <InsertStatement> is

    INSERT <VariableID> ( <VarID> {, <VarID>} ) INTO <WindowID> [WITH ( <PropertyValueList> )]

This built-in behavioral primitive inserts an existing object into a different window. That is, the object is given a different role with this operation. The additional data values needed are given by <PropertyValueList>. <PropertyValueList> is optional because the new window may have the same data type as the original one. In this case, all the necessary values are already in the database, and we need not specify them again in this statement.

In order to insert an object into a different window, several conditions must be satisfied. First of all, the variable <VariableID> must be declared in the transaction to be an object variable of the window designated by <WindowID>.
The format of the values in <PropertyValueList> must conform to the data type of window <WindowID>. Secondly, before execution, the variable <VarID> must already be bound to the object to be inserted. Such a binding can be achieved through operations such as object creation and object retrieval. The binding locates the object which we want to view from a different angle. Thirdly, the execution of an SDBM InsertStatement must also check the uniqueness constraint on the window, if the KEY attribute has been specified in the SDBM schema. Any attempt to insert an object that violates the key constraint will be rejected. Fourthly, before making permanent changes to the conceptual structure, SDBM will check the relevant REFERENCE attribute, if any. The system will use the data given in the InsertStatement to retrieve the referenced objects. If the retrieval returns a NULL value, which means that the referenced object does not exist, the insertion will be rejected. Finally, when the RELATIONSHIP attribute is specified, an INSERT statement may cause new relationships to be established. Based on the values of the properties dedicated to establishing certain relationships, a new relationship will be established if the appropriate object exists. Otherwise, no relationship will be established. Notice that whether or not a relationship can be established has no effect on the successful execution of an INSERT statement. The following is an example of the InsertStatement.

Example 5-20.

    INSERT x1(x) INTO GradStudent WITH ( AdvisorName="Jim Jackson", AdvisorPhone="228-9990" )

In this example, variable x is already bound to the student whose student number is 1011, as a result of its creation in Example 5-19. The above statement records that this student is also a graduate student, with an advisor whose name is Jim Jackson and whose phone number is 228-9990.

It is time to further clarify the concept of binding between an SDBM transaction variable and an SDBM object [24].
In SDBM, the variable-object binding is always achieved through the binding between the variable and the window and the binding between the window and the object. This is because every variable in SDBM belongs to exactly one type. For instance, in the previous example, variable x is bound to an object through window Student. When we want to insert the object into window GradStudent, we cannot specify the insertion statement as follows.

Footnote 24: This type of binding will be called variable-object binding.

Example 5-21.

    INSERT x INTO GradStudent WITH ( AdvisorName="Jim Jackson", AdvisorPhone="228-9990" )

If forced to use the above specification, we would have to choose between two alternatives. First, we could dynamically change the typing of x so that after the execution of this statement it would be a variable of window GradStudent. In this case, SDBM would no longer be typed and we would lose all the benefits expected from static data typing. The second alternative is to keep the type of variable x. However, this choice is also awkward because we could not gain access to the object through window GradStudent unless we made another retrieval.

Therefore, x1(x) is used in the insertion statement. Variable x is as described above, and variable x1 must be declared to be a variable of window GradStudent in the variable declaration part of the transaction. With this mechanism, a new variable-object binding is established through window GradStudent. Of course, the object is the same one. This time it is bound to object variable x1.

5.4.3.5 MODIFICATION

The SDBM ModifyStatement is primarily designed to change the property values of an existing object. The prototype of the ModifyStatement is

    <VariableID>.<PropertyID> = <NewValue>
    <NewValue> := "<CharacterString>" | <Expression>

<CharacterString> is defined in APPENDIX A. <Expression> will be explained in Section 5.4.4.
Modification is accomplished through a local object variable, designated in the prototype by <VariableID>, which serves as a surrogate of the object to be modified. <PropertyID> indicates the property to be modified. New values are obtained either from directly specified values, in the case of <CharacterString>, or through the evaluation of an expression, in the case of <Expression>.

In order to accomplish a modification operation successfully, several conditions must be satisfied. First of all, the relevant data types must be consistent. For example, the type of the new value must conform to the type of <VariableID>.<PropertyID>. Secondly, the uniqueness constraint specified by the KEY attribute may be affected by the operation, because in SDBM any property is allowed to be modified, even when it is specified to be a KEY. Therefore, the uniqueness constraint must be checked, if it is so specified. Thirdly, the referential constraint specified by the REFERENCE attribute may be violated by the operation. That is, an object may be modified to reference a non-existing object. For example, a graduate student may change his advisor, and we have to make sure that the new advisor exists. Finally, the MODIFY statement can be used to modify the relationships among objects by modifying the values of the properties that are dedicated to establishing relationships. In this case, we terminate an old relationship and start a new one. Let us take a look at an example.

Example 5-22.

    x1.AdvisorName= "Larry Peters"

After this operation, the object bound to x1 through its role as a graduate student will have Larry Peters as his advisor.

It should be emphasized that only the property value of an object variable is modified with the ModifyStatement and that such modification is permanent. The value of a type variable or an object variable itself can be modified simply through an assignment.
Such modification, however, is temporary and only has effect within the transaction. The modification statement can be considered an extension of the assignment statement discussed in Section 5.4.3.1. Notice that the modification statement will not create a new object no matter what changes have been made. Its effect is to change the property value of an object in a certain window so that the object looks different or, when the property is dedicated to a certain relationship, to terminate an existing relationship and possibly establish a new one.

5.4.3.6 DELETION

The prototype of <DeleteStatement> is

    DELETE <VariableID>

<VariableID> designates the object to be deleted from a window. Symmetric to the problem of insertion, deletion of an object requires that it is not currently referenced by any other object in the database. This is essential to prevent the database from having 'dangling references'. It is also required that none of its specializations is referenced by any other object, because of the inheritance rule of the SDBM window hierarchy. The deletion of an object propagates to all its specializations. An alternative would be to allow an object to be deleted only if it has no specialization. However, we believe that the deletion cascade should be allowed because it is more natural. For example, suppose that an object plays the role of a student, the role of a graduate student, and the role of a PhD student at the same time. If he ceases to be a student, he must also cease to be a graduate student, and therefore cease to be a PhD student.

Deletion of an object will not destroy the object. The object will still be in the database until it is explicitly destroyed. Deleting an object from a window only means that some aspects of the object will no longer be seen. Where the RELATIONSHIP attribute is concerned, the DELETE statement has the same effect as the DESTROY statement.
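The cascade rule described above can be sketched in Python. This is a minimal illustration under assumed representations, not an SDBM implementation: `roles` maps a hypothetical object identifier to the set of windows through which the object is currently seen, `referenced` is a set of (object, window) pairs that other objects currently reference, and `hierarchy` maps each window to its direct specializations. All of these names are introduced here for illustration only.

```python
def specializations(hierarchy, window):
    """Return `window` followed by every window specialized from it."""
    result = [window]
    for child in hierarchy.get(window, []):
        result.extend(specializations(hierarchy, child))
    return result


def delete_from_window(roles, referenced, hierarchy, obj_id, window):
    """Model DELETE: remove the object's role in `window` and in all its specializations.

    Returns False, changing nothing, if any affected role is still referenced.
    The object itself stays in `roles`: deletion does not destroy it.
    """
    affected = [w for w in specializations(hierarchy, window) if w in roles[obj_id]]
    if any((obj_id, w) in referenced for w in affected):
        return False  # would leave a dangling reference
    roles[obj_id] -= set(affected)
    return True


# The student / graduate student / PhD student example from the text.
hierarchy = {"Student": ["GradStudent"], "GradStudent": ["PhDStudent"]}
roles = {"obj1": {"Student", "GradStudent", "PhDStudent"}}
```

Calling `delete_from_window(roles, set(), hierarchy, "obj1", "Student")` on this data removes all three roles while leaving `obj1` in `roles`, which mirrors the distinction between deleting an object from a window and destroying it.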
Deletion of an object will set the variable designated by <VariableID> to its initialized value, i.e. NULL. That is, the variable-object binding is terminated so that the variable no longer indicates any object.

5.4.3.7 RETRIEVAL

The SDBM RetrieveStatement is straightforward. The prototype is

    RETRIEVE <VariableID>[.<PropertyID> TO <VariableID>{, <VariableID>.<PropertyID> TO <VariableID> }] [ WHERE (<Expression>) ]

When some window property values of an object are to be retrieved, the object variable identifier should be followed by a dot and the window property identifier. Here the dot '.' is a window property selection operator [25]. Below are three examples of the RetrieveStatement.

Footnote 25: Do not confuse this operation with record component selection. In record component selection, the variable identifier before the dot must be bound to an SDBM record type instead of an SDBM object through a window.

Example 5-23.

    RETRIEVE x.StdName TO stdname, x.Address TO stdaddress WHERE ( x.StdNo='1234' )

    RETRIEVE x WHERE ( x.StdNo='1234' )

    RETRIEVE x.Phone TO stdphone

The first retrieval statement retrieves the values of properties 'StdName' and 'Address' of the object x whose student number is 1234. The values obtained are stored in variable stdname and variable stdaddress, respectively. Also as a result of this operation, object variable x will be bound to the retrieved object through the window Student. The second retrieval statement retrieves a Student object whose student number is 1234. If such a student exists, it will be bound to the variable x. The third retrieval statement retrieves the value of property 'Phone' of the object bound to the variable x [26] and assigns the value to the variable stdphone.

Footnote 26: For short, object x will be used to mean the object that is bound to variable x.

Three additional points in regard to this example should be made in order to further clarify the semantics of the SDBM RetrieveStatement. First, x must have been previously declared to be an object variable of window Student.
Moreover, when the statement is executed, x must be bound to a specific object or a specific set of objects seen through window Student; otherwise execution will be aborted. The binding may be obtained by using the WHERE clause of the RetrieveStatement. Only those objects for which the Boolean expression after the keyword WHERE evaluates to true will be retrieved. Obviously, if x is already bound to the object, the WHERE clause is not needed. Second, the property identifier following the dot must be consistent with the window designated by the variable identifier. Third, variable stdphone must be declared to have the same data type as the property Phone of window Student. This variable is used to store the value of the retrieved student phone number. The retrieval statement itself does not return an explicit result. The dot operator used for property selection will be further discussed in Section 5.5.

Clearly, a single RetrieveStatement alone can only provide answers to very simple queries. More powerful data manipulation functions are achieved by combining it with the sequence control facilities discussed below. There are three different sequence control facilities in SDBM: the CompoundStatement, the ConditionalStatement, and the IterationStatement.

5.4.3.8 COMPOUND STATEMENT

The prototype of the CompoundStatement is

    BEGIN <Statements> END

A compound statement is a sequence of statements which are enclosed by BEGIN and END and are arranged in the order in which they are to be executed. Statements are separated by commas. A compound statement is treated in an SDBM transaction as if it were a single statement. The compound statement enables the database designer to define a transaction incrementally. Its internal structure may be changed without altering its overall effect.
This concept is in fact an application of the aggregation principle.

5.4.3.9 CONDITIONAL STATEMENT

A conditional statement is one that provides a choice between two or more different sequences of statement execution. A conditional statement is also treated as a single statement in SDBM transaction construction. Its prototype is

    IF <Expression> THEN <Statement> [ELSE <Statement>] ENDIF

The execution sequence is controlled by a test on the Boolean expression. If the expression evaluates to true, the statement after THEN is executed; if the expression evaluates to false and there is an ELSE specification, the statement after ELSE is executed; otherwise, the conditional statement is skipped entirely. The statement after THEN or ELSE must be a single statement or something that is treated as a single statement. The conditional statement is ended with the keyword ENDIF, which is used to eliminate any syntactic ambiguity.

5.4.3.10 ITERATION STATEMENT

There are two kinds of iteration statements:

    <ForStatement> | <WhileStatement>

To suit database applications, SDBM has borrowed two different kinds of iteration statement from programming languages with a little modification. One kind provides iteration over all the objects of a window. The prototype of the statement is

    FOR EACH <VariableID> DO <Statement>

This statement resembles counter-controlled repetition in programming languages, such as the FOR statement in FORTRAN and ALGOL. The difference is that this iteration is over each object of a window, whereas the FOR statement in programming languages is over integers. Three points are worth mentioning about the FOR statement. First, variable <VariableID> must be declared in the variable declaration part of the transaction to be an object variable. Second, the iteration is unconditional. That is, variable <VariableID> will be bound to each object of the window in turn.
For each object, statement <Statement> is executed, and the binding cannot be changed by the execution of the statement. Third, statement <Statement> must be a single statement or something that is treated as a single statement, e.g. a compound statement.

The other type of iteration statement provides a facility to terminate the iteration on a specified condition. The prototype of the statement is:

    WHILE <Expression> DO <Statement>

In the WhileStatement, the expression, which must be a Boolean expression, is evaluated first. If it evaluates to true, statement <Statement> is executed. Otherwise, the WhileStatement is skipped. Each time after statement <Statement> is executed, the expression is re-evaluated.

5.4.3.11 ABORT STATEMENT

As stated previously, we do not intend to provide a user-specifiable exception handling facility in SDBM. However, we need at least one mechanism to stop the execution of an SDBM transaction if a certain condition cannot be satisfied. The AbortStatement is provided for this purpose. Its syntax is simple.

    <AbortStatement> := ABORT

The meaning of this statement is that it stops the transaction wherever it is specified and undoes whatever has been done. Upon the completion of this statement, the database should be in the same state as if the transaction had never been executed.

5.5 EXPRESSIONS

Viewed as a programming language, SDBM is basically statement oriented. Expressions are one of the basic syntactic constituents of SDBM statements. They are used in statements when a value needs to be computed. The formal syntax of SDBM expressions is given in APPENDIX A and will not be repeated here. The basic issues associated with SDBM expressions are the syntactic form, the precedence of SDBM operators, and the typing of SDBM expressions. Below, we discuss these issues in turn.
SDBM expressions are constructed in a mixed syntactic form, with the infix form for arithmetic, Boolean, and relational operations and the prefix form for transaction calls and ADT operation calls. In SDBM expressions, the following operations are permitted: relational operations including EQ (equality), NE (inequality), GT (greater than), GE (greater than or equal to), LT (less than), and LE (less than or equal to); Boolean operations including AND (conjunction), OR (inclusive disjunction), and NOT (negation); arithmetic operations including + (addition), - (subtraction), * (multiplication), and / (division); unary operations including - (negation) and + (absolute value); the dot operation; transaction calls; and ADT operation calls. Following the convention of many programming languages, SDBM defines the precedence of these operations as follows [27].

    Lowest Priority     EQ  NE  GT  GE  LT  LE
                        +   -   OR
                        *   /   AND
                        +   -   NOT
    Highest Priority    ADT Call   Dot Operator

Footnote 27: The + and - operations that have the same precedence as the OR operation are the binary operations.

The above precedence is incorporated into the grammar for SDBM expressions. As indicated in the grammar, the order of evaluation within an expression can be changed with parentheses.

SDBM expressions are typed. That is, every SDBM expression has a type which can be statically determined at the compilation time of an SDBM schema. Syntactically, expressions in SDBM are well typed. Therefore, there will be no execution-time type errors with an SDBM expression that has been type checked to be syntactically correct.

To conclude this section, we discuss further the dot operator used in SDBM expressions. The following two examples are used for this purpose.

Example 5-24.

    y= x.DeptName

    z= x.BelongTo

In the first example, the dot operator is used for property selection. Here, variable x is already bound to a Student object. x.DeptName returns the value of property DeptName of the object. In the example, the value is then assigned to variable y, which must be declared to have the same data type as the property DeptName of the student object. In the second example, the dot operator is used for relationship selection. x.BelongTo returns an object that participates in the relationship BelongTo with object x. The object returned here is the department which the student x belongs to. In the example, the returned department object will be bound to variable z. Variable z must have been declared to be an object variable of Department.

5.6 OPERATIONS ON MULTI-VALUED PROPERTIES

In Section 2.1.3, we identified the inadequacy of the restriction in the relational data model that property domains must be atomic. We, therefore, permit multi-valued properties in SDBM. Such properties are specified with the keyword MULTIPLE. In SDBM, multi-valued properties not only provide a natural description of one-to-many and many-to-many relationships between objects and symbolic string values but also provide a natural description of one-to-many and many-to-many relationships between objects. However, while we improve the modelling of the static structures of the 'real' world, we have to deal with some difficulties associated with the behavioral aspect of multi-valued properties. For example, the modification operation defined in Section 5.4.3.5 is only valid for single-valued properties. This section discusses the operations provided for manipulating multi-valued properties.

5.6.1 ADDITION

A new value may need to be added to a multi-valued property, for example, when an employee joins an additional project. The syntax of this operation is as follows:

    ADD [<VariableID>|"<CharacterStrings>"]+ TO <WindowVariableID>.<PropertyID>

Here <VariableID> should have been declared to have the same data type as the property <PropertyID> and should already be bound to a specific value of that data type. If the property is a REFERENCE property, then the new value to be added must lead to an existing object so that the reference can be established. If the property is a RELATIONSHIP property, the value to be added must also lead to an existing object. Otherwise, the relationship cannot be established. The KEY constraint must be checked, if the uniqueness requirement is specified for the property. The ADD statement is illustrated in the following two examples:

Example 5-25.

    ADD x TO t.Project

    ADD "'Blue','Yellow'" TO y.Color

In the first example, variable x has the data type used to define the Project property of Employee and should already be bound to a value of that type. In the second example, variable y is bound to a Part object.

5.6.2 REMOVAL

Removing a value from a property reverses the effect of adding that value to the property. For example, an employee may drop a project, or we may find out that a part has only one color instead of the two recorded previously. Therefore, we want to disassociate the employee from the project and correct the color information recorded for the part. These desired operations can be expressed as follows:

Example 5-26.

    REMOVE p FROM t.Project

    REMOVE "Yellow" FROM y.Color

Here p is a variable that has the same data type as the Project property and is bound to a value indicating the project to be removed. Variable t is bound to the Employee object and variable y is bound to the Part object. The syntax of the REMOVE statement is:

    REMOVE [<VariableID>|"<CharacterStrings>"]+ FROM <VariableID>.<PropertyID>

Here <VariableID>, <StringList>, <VariableID>, and <PropertyID> have the same meaning as defined in Section 5.6.1.
If the property is a REFERENCE property, this statement will terminate one or more references. If the property is a RELATIONSHIP property, this statement will likewise terminate one or more relationships.

5.6.3 MODIFICATION

The modification operation defined in this section is an extension of the assignment operation in Section 5.4.3.1 and the modification operation in Section 5.4.3.5. There, only a single value can be operated on. The syntax of the modification operation for multi-valued properties is:

    <VariableID>.<PropertyID>= NULL|"<StringList>"|<VariableID>.<PropertyID>

The following are three examples of the modification operation:

Example 5-27.

    x.Color= NULL

    x.Color= "'yellow','White','Black'"

    y.Project= z.Project

As the result of the first modification, the multi-valued property Color of the part will have a NULL value, which means that we do not know what color(s) the part object x has. The second modification indicates that the part has three colors: yellow, white, and black. With the third modification, object y will have the same project values as object z has. The modification operation discussed here has the same implications for KEY, REFERENCE, and RELATIONSHIP properties as the modification operation discussed in Section 5.4.3.5. From now on, we will not distinguish between the assignment statement and the modification statement.

6. AN APPLICATION EXAMPLE OF SDBM

In Chapter 5, we outlined the basic elements and structures of a proposed semantic database model, SDBM. To further clarify the model and the underlying rationale, we shall use SDBM to design a database. The database is designed to manage information related to a hypothetical project management system. The basic methodology of using SDBM to design databases is the same as that used to design SDBM itself, namely the two-view conceptualization of data modelling and the principles of data abstraction. The design of SDBM databases begins with identifying objects and their roles.
Constraints are classified into categories. Some constraints are only related to the external data representation aspect, and therefore should be modelled with SDBM types. Others are related to objects, roles, and their structures and therefore are naturally modelled with SDBM windows. Then, these windows are refined so that different views of the same objects may be generated according to the different access requirements. The refinement is carried out through studying objects and the roles they play. All the possible roles that we are interested in should be identified, and then the relationships or constraints are analysed and classified.

Further refinement is accomplished through the specialization mechanism. To begin with, the most general situations are considered to generate windows at this level. Then, with or without added detail, more specialized windows are generated until the desired detail is arrived at. The same process is also applied to SDBM types that serve to provide the data representation of the database contents and maintain semantic integrity constraints that are related to the data representation. The mechanism of aggregation is also used in the design of an SDBM database. The application of this mechanism is demonstrated by allowing an object window (or a type) to participate in a certain relationship without forcing the user to know the details of that window (or type).

In this chapter, we will first give a general description of the hypothetical project management system. Then, more detail will be added, where necessary, when we go through the actual design. Because the system is used only to demonstrate the features of SDBM in a concrete manner, we have tried to make the example as simple and short as possible. For example, some data types will not be defined if they are obvious and if no confusion is caused.
6.1 A HYPOTHETICAL PROJECT MANAGEMENT SYSTEM

A large manufacturing company often undertakes projects besides its normal operations. Some projects are launched internally. For example, the computer service department may install an electronic accounting system for the department of finance. Other projects are contracted from outside clients.

For normal operations, the company has an ordinary organizational structure. Namely, it is divided functionally into several departments such as administration, finance, marketing, engineering consulting, R&D, and manufacturing. A different structure is adopted for project management. For each project, a team of employees is set up by assigning employees from individual departments to the project. Sometimes, extra people are hired specially for a project. When the project is finished, their employment is terminated. In this chapter, we call these people temporary employees, and other employees will be called permanent employees. Permanent employees are hired by individual departments. A permanent employee of the company must belong to a department. When the project is completed, the permanent employees go back to their own departments.

For each project team, there is a principal manager. It is the company's policy that only a permanent employee who has completed his/her training period is allowed to be a principal manager. A team may also have an assistant manager, who could be a temporary employee. Permanent employees are paid monthly by their own departments and temporary employees are paid hourly by their projects. For their overtime work, both are paid hourly, and rates vary with projects. The company does not allow any employee to work on more than two projects at the same time.

For outside clients, we have restricted their locations to be within North America. This restriction is solely for the purpose of simplifying the demonstration. A contract project will be terminated if the client ceases to exist.
A team will be dissolved if the project is terminated. The operation of interest is assigning employees to projects, which includes assigning a permanent employee to a project and hiring a temporary employee for a project.

6.2 AN SDBM PROJECT MANAGEMENT DATABASE

Let us assume that all the necessary information requirements analysis has been completed, so that we have the basic knowledge of the conceptual structure consisting of objects, properties, and relationships. To design the project management database in SDBM, we start by identifying the different roles of objects and proceed to analyse the relationships and constraints among these roles. To simplify the definition process, we begin with the most general roles (views) of objects. Here, employees, projects, and clients may be considered the most general roles, because many other roles can be derived from them through the specialization mechanism. To define the SDBM database schema, we have to give two kinds of specifications:

1. The conceptual structure of the application to be modelled. This concerns how objects, properties, and relationships among objects are viewed conceptually by the user.
2. The computer data representation. This concerns how the conceptual structure is represented in computer-processable symbolic strings.

The former is specified by using the SDBM window concept, and the latter is dealt with by defining proper data types in SDBM. Let us start with employee objects. We want to record the ID, name, address, and phone number of each employee. These employees can be modelled as follows.

    TYPE ( EmployeeType = RECORD WITH
             EmpID: EmpID,
             EmpName: EmpName,
             Address: EmpAddress,
             Phone#: Phone# )

    WINDOW ( Employee =
             TYPE: EmployeeType;
             KEY: ID )

The meanings of data types EmpID, EmpName, EmpAddress, and Phone# are obvious. To give an example, we define data type Phone#.
    TYPE ( Phone# = CHARACTER WITH
             '('999') '999'-'9999 OR 999'-'9999 )

Legitimate telephone numbers are, for example, (604) 228-8651 and 224-1973. We have taken into consideration that it may not be necessary to record the area code for local phone numbers.

Assume that we are interested in the following information about each project: its name, code, category, start date, end date, budget, and total cost to date. The following are the definitions of the corresponding data type and window.

    TYPE ( ProjectType = RECORD WITH
             PjtName: PjtName,
             PjtCode: PjtCode,
             Category: PjtCategory,
             Start-Date: Date,
             End-Date: Date,
             Budget: Money,
             Cost: Money; )

    WINDOW ( Project =
             TYPE: ProjectType;
             KEY: PjtCode )

Data type PjtCategory can be defined as an enumeration data type.

    TYPE ( PjtCategory = ENUMERATION WITH
             { Consulting, Repair, Manufacturing, Research } )

To model client objects, we assume that each client is assigned a special ID. Other information of interest includes the client name, address, and phone number. We have the following specification.

    TYPE ( ClientType = RECORD WITH
             CltID: ClientID,
             CltName: ClientName,
             Address: ClientAddress,
             Phone#: Phone# )

    WINDOW ( Client =
             TYPE: ClientType;
             KEY: ID )

Among the data types ClientID, ClientName, ClientAddress, and Phone#, ClientAddress is the most interesting. ClientAddress can be modelled with the SDBM record data type. Its definition must take into consideration the difference between the American postal code and the Canadian postal code.
    TYPE ( Address = RECORD WITH
             No: IntegerNo,
             Street: StreetName,
             City: CityName,
             VARIANT
               State: StateName,
               Country: {U.S.A.},
               PostalCode: USCode,
             VARIANT
               Province: ProvinceName,
               Country: {Canada},
               PostalCode: CanadianCode; )

Postal codes are defined as follows.

    TYPE ( USCode = CHARACTER WITH 9(5) )

    TYPE ( CanadianCode = CHARACTER WITH 9A9' '9A9 )

The above specification dictates that if the country is U.S.A., we must write American state names and use American postal codes, and if the country is Canada, we must write Canadian provincial names and use Canadian postal codes.

So far, we have been concerned with information at a very general level. More details are needed. Let us consider project objects further. There are two types of projects: internal projects and contract projects. For internal projects, we need to know the names of their clients in addition to the properties defined for project objects in general.

    TYPE ( InternalProjectType IS ProjectType WITH
             CltName: InternalClientName )

    WINDOW ( InternalProject IS Project WITH
             TYPE: InternalProjectType )

For contract projects, the additional properties of interest include the client ID, client name, contract code, and amount of money. Contract projects can be modelled as follows.

    TYPE ( ContractProjectType IS ProjectType WITH
             CltID: ClientID,
             CltName: ContractClientName,
             ContractCode: ContractCode,
             Amount: Money; )

    WINDOW ( ContractProject IS Project WITH
             TYPE: ContractProjectType;
             REFERENCE: Client(CltID = CltID) )

In the definition of window ContractProject, we have given a referential constraint.
This constraint states that a contract project must have at least one outside client if it is to be created in the database. At the same time, an outside client object cannot be destroyed if it has at least one contract with the company.

Now, we consider some different roles of an employee. Like any other real-world object, when viewed from different angles, an employee may appear to play different roles at the same time. For example, in our case, an employee may be a permanent employee or a temporary one, a trainee or a manager. For temporary employees, the corresponding object class can be defined as follows.

    WINDOW ( TemporaryEmployee IS Employee WITH
             TYPE: EmployeeType )

Here, the data representation aspect of window TemporaryEmployee is exactly the same as that of window Employee. This example illustrates that it is not necessary to add more detail in order to generate a specialization. It also demonstrates the flexibility gained from separating the type concept and the window concept. Here, data type EmployeeType has two sets of instances.

For a permanent employee, we need to know his/her department and the date he/she started working for the company.

    TYPE ( PermanentEmployeeType IS EmployeeType WITH
             DeptName: DepartmentName,
             Start-Date: Date )

    WINDOW ( PermanentEmployee IS Employee WITH
             TYPE: PermanentEmployeeType )

    TYPE ( DepartmentName = ENUMERATION WITH
             { Administration, Finance, Marketing,
               EngineeringConsulting, R&D, Manufacturing } )

Among permanent employees, some are newly hired trainees. In addition to the properties that an employee and a permanent employee have, a trainee has the special property that he/she has a permanent employee as his/her supervisor. Let us first define those employees who are not trainees.
    WINDOW ( Nontrainee IS PermanentEmployee, ISNOT Trainee WITH
             TYPE: PermanentEmployeeType )

    TYPE ( TraineeType IS PermanentEmployeeType WITH
             Supervisor: EmpID )

    WINDOW ( Trainee IS PermanentEmployee WITH
             TYPE: TraineeType;
             REFERENCE: Nontrainee(EmpID = Supervisor) )

In the above definition, we have specified the constraints that the supervisor must exist, that he/she must be a permanent employee, and that he/she must not be another trainee.

A special group of non-trainees are the principal managers of projects. They are defined as follows.

    TYPE ( ManagerType IS PermanentEmployeeType WITH
             PjtCode: ProjectCode )

    WINDOW ( Manager IS PermanentEmployee, ISNOT Trainee WITH
             TYPE: ManagerType;
             REFERENCE: Project(PjtCode = PjtCode) )

Obviously, the existence of a manager only makes sense if the project to be managed exists.

All the objects we have dealt with so far are somewhat concrete. Now, let us define a collection of more abstract objects: assignments. An assignment is specified to be the relationship in which an employee is assigned to a project. The properties of an assignment include the employee ID, employee name, project code, and date of the assignment.

    TYPE ( AssignmentType = RECORD WITH
             EmpName: EmpName,
             EmpID: EmpID,
             ProjectCode: ProjectCode,
             Date: Date )

    WINDOW ( Assignment =
             TYPE: AssignmentType;
             REFERENCE: Employee(EmpID = EmpID),
                        Project(PjtCode = ProjectCode);
             KEY: EmpID, ProjectCode )

Conceptually, an Assignment object is defined to be an aggregation of an employee object and a project object. We enforce that both the employee object and the project object must exist in order to have the assignment.
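The behaviour of the REFERENCE clauses used above can be illustrated with a small Python sketch. It is not part of SDBM and does not reflect any implementation; the class Window, its methods, and the exception ReferentialError are hypothetical names introduced only for this illustration. The sketch assumes that a window is a keyed collection of records, that an insert is rejected when a referenced object is missing, and that a delete is rejected while another window still refers to the object.

```python
class ReferentialError(Exception):
    """Raised when an insert or delete would break a REFERENCE constraint."""


class Window:
    """Hypothetical stand-in for an SDBM window: a keyed collection of objects."""

    def __init__(self, name, key, references=()):
        self.name = name
        self.key = key                # tuple of property names forming the key
        self.references = references  # (target window, target property, local property)
        self.rows = {}
        self.referrers = []           # windows whose REFERENCE clauses point here
        for target, _, _ in references:
            target.referrers.append(self)

    def insert(self, obj):
        # Every referenced object must already exist in its own window.
        for target, target_prop, local_prop in self.references:
            if not any(r[target_prop] == obj[local_prop] for r in target.rows.values()):
                raise ReferentialError(
                    f"{self.name}.{local_prop} has no match in {target.name}")
        self.rows[tuple(obj[k] for k in self.key)] = obj

    def delete(self, key):
        obj = self.rows[key]
        # An object cannot be destroyed while another window still refers to it.
        for window in self.referrers:
            for target, target_prop, local_prop in window.references:
                if target is self and any(
                        r[local_prop] == obj[target_prop] for r in window.rows.values()):
                    raise ReferentialError(
                        f"{self.name} object is referenced by {window.name}")
        del self.rows[key]


# Windows mirroring Employee, Project, and Assignment (sample values are made up).
employee = Window("Employee", key=("EmpID",))
project = Window("Project", key=("PjtCode",))
assignment = Window(
    "Assignment", key=("EmpID", "ProjectCode"),
    references=((employee, "EmpID", "EmpID"), (project, "PjtCode", "ProjectCode")))

employee.insert({"EmpID": "E1", "EmpName": "A"})
project.insert({"PjtCode": "P1", "PjtName": "Demo"})
assignment.insert({"EmpID": "E1", "EmpName": "A", "ProjectCode": "P1", "Date": "1987-02-01"})
```

With this data, inserting an assignment for a non-existent project, or deleting employee E1 while the assignment exists, raises ReferentialError. This mirrors the two directions of the constraint described for ContractProject and Client.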
Alternatively, we can define an assignment to be an aggregation of an employee object and a project object without creating any new object.

We notice that assigning a temporary employee is different from assigning a permanent employee. To assign a temporary employee is to hire him or her for a project. He or she may be hired for several projects at the same time. Assume that the temporary employee is paid by the hour and that the minimum rate is $3 per hour. The hourly rate may be different for different projects. For simplicity, temporary hiring is defined to be an assignment with the additional property Rate.

    TYPE ( TemporaryHiringType IS AssignmentType WITH
             Rate: Rate; )

    WINDOW ( TemporaryHiring IS Assignment WITH
             TYPE: TemporaryHiringType )

and data type Rate is defined to be

    TYPE ( Rate = NUMERIC WITH 3..99 )

Another interesting concept is that of a project team, which is the result of assigning employees to a project. It is reasonable to assume that a project has only one team and that a team is formed for only one project. Though this one-to-one relationship suggests that a project and its team may be represented by a single object, we notice their conceptual differences. For example, a project may exist before a team is created for it. Properties of the team object include the project code, project name, principal manager, assistant manager, and number of employees in the team. Because a team may not have an assistant manager, the AssistantManager property is marked as optional.
    TYPE ( TeamType = RECORD WITH
             ProjectCode: ProjectCode,
             ProjectName: ProjectName,
             PrincipalManager: EmpID,
             AssistantManager: EmpID (OPTIONAL),
             NoOfEmployees: Integer )

    WINDOW ( Team =
             TYPE: TeamType;
             REFERENCE: Manager(EmpID = PrincipalManager),
                        Manager(EmpID = AssistantManager),
                        Project(PjtCode = ProjectCode);
             KEY: ProjectCode )

The data types and windows declared above form the basic structure of the hypothetical project management database. This part is sufficient to establish an empty database in the computer. However, we also need to define several operations in order to manipulate the database. SDBM database operations can be defined with the SDBM transaction facility.

6.3 TRANSACTIONS FOR END USER OPERATIONS

As we pointed out previously, transactions are the only means by which the end user can operate on an SDBM database. Transactions are designed to ensure that certain behavioural constraints are observed when the database is altered by the end user. In this section, we define several end user operations in terms of transactions.

First of all, we define the operation of assigning an employee to a project team. To complete such an operation, we must first check that the employee is not already assigned to two teams, the maximum that the company allows. If the check succeeds, a proper object is created and inserted into the corresponding window. The number of employees in the corresponding project team must also be increased by one.
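Before giving the SDBM definition, the steps just described can be sketched in Python. This is only an illustration under stated assumptions: assignments are held in a plain list of dictionaries, teams in a dictionary keyed by project code, and the names MAX_PROJECTS, Abort, count, and assign are hypothetical. The limit of two follows the company policy stated in Section 6.1.

```python
MAX_PROJECTS = 2  # assumed from the policy in Section 6.1: at most two projects per employee


class Abort(Exception):
    """Signals that the operation was abandoned without changing anything."""


def count(assignments, emp_id):
    """Return how many assignments the employee currently holds (read-only)."""
    return sum(1 for a in assignments if a["EmpID"] == emp_id)


def assign(assignments, teams, emp_id, emp_name, project_code, date):
    """Assign an employee to a project team, or raise Abort and leave the data unchanged."""
    # All checks come first, so a failure cannot leave a half-finished update.
    if count(assignments, emp_id) >= MAX_PROJECTS:
        raise Abort(f"employee {emp_id} already works on {MAX_PROJECTS} projects")
    if project_code not in teams:
        raise Abort(f"no team exists for project {project_code}")
    new_assignment = {
        "EmpName": emp_name,
        "EmpID": emp_id,
        "ProjectCode": project_code,
        "Date": date,
    }
    assignments.append(new_assignment)
    teams[project_code]["NoOfEmployees"] += 1
    return new_assignment
```

The sketch checks every precondition before it modifies anything, which is one simple way to obtain the all-or-nothing behaviour expected of a transaction. A real system would need a more general rollback mechanism.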
    TRANSACTION ( Assign ( id: EmpID,
                           n: EmpName,
                           pc: ProjectCode,
                           d: Date,
                           OUT a: Assignment );
        VARIABLE i: Integer,
                 x: Assignment,
                 t: Team;
        BEGIN
            Count(id, i),
            IF i GE 2 THEN ABORT
            ELSE
                CREATE x WITH ( EmpName=n, EmpID=id, ProjectCode=pc, Date=d ),
                RETRIEVE t WHERE ( ProjectCode=pc ),
                t.NoOfEmployees = ( t.NoOfEmployees + 1 )
        END )

Of course, we assume that the project has been initiated. In the transaction Assign, another transaction, Count, is called to find out how many teams the employee has already been assigned to. This is a very simple transaction. In fact, it does not alter anything in the database.

    TRANSACTION ( Count ( id: EmpID,
                          OUT i: Integer );
        VARIABLE y: Assignment;
        BEGIN
            i = 0,
            RETRIEVE y WHERE (EmpID=id),
            WHILE y NE NULL DO
                BEGIN
                    i = i + 1,
                    RETRIEVE y WHERE (EmpID=id)
                END
        END )

More realistically, when we assign an employee to a project team, we usually know what kind of employee he/she is. Therefore, the assignment operations special to that kind of employee should also be defined. Consider, for example, the operation of hiring a temporary employee, which is defined as follows.

    TRANSACTION ( HireTemporary ( id: EmpID,
                                  n: EmpName,
                                  ta: EmpAddress,
                                  ph: Phone#,
                                  p: ProjectCode,
                                  d: Date,
                                  r: Rate );
        VARIABLE a: AssignmentType,
                 e: Employee,
                 te: TemporaryEmployee,
                 th: TemporaryHiring;
        BEGIN
            CREATE e WITH ( ID=id, Name=n, Address=ta, Phone#=ph ),
            INSERT te(e) INTO TemporaryEmployee,
            Assign(id, n, p, d, a),
            INSERT th(a) INTO TemporaryHiring WITH ( Rate=r )
        END )

6.4 SUMMARY

In this chapter, a hypothetical project management database is designed to demonstrate the concepts, the structure, and the modelling methodology of the semantic database model SDBM.
Although many examples have been given in Chapter 5, they may appear fragmented. Our goal in this chapter is to give a more coherent picture of the model.

7. CONCLUSIONS

In this thesis, we have addressed the shortcomings associated with conventional data models, particularly the relational data model. We have also discussed the features of recently proposed semantic data models. Based on this investigation, a semantic database model, SDBM, is developed. SDBM is designed primarily to attack the modelling problems of the relational data model and to offer some improvements over other semantic data models. The rest of this chapter summarizes the major ideas presented in this thesis.

First of all, we conceptualize data modelling in a way different from other data models. The conceptualization itself is simple and effective. What is more important is that our model directly reflects this conceptualization. We believe that data modelling should be viewed from two sides. From one side, we see objects, properties, and relationships among these objects. This image is a one-to-one mapping of the real-world situation to be modelled. From the other side, we see their computer data representations. We realize that it is not objects, nor properties, nor relationships among objects that should be or can be directly reflected by these computer data representations. Rather, it is their roles that should and can be directly reflected by these representations. Roles are certain aspects of objects. Viewed more generally, a role is an abstraction of a collection of properties. Roles can be modelled with SDBM windows. Notice that there are similarities between the SDBM window concept and the class concept of other semantic data models. However, because the SDBM window concept is built on a different conceptualization, it is conceptually different from the class concepts of other semantic data models.
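The distinction between roles and representations can be made concrete with a short Python sketch. It is an illustration only, not a rendering of SDBM itself: the classes RecordType and Window, the function roles_of, the object identity "obj-1", and the sample values are all hypothetical. The sketch assumes that a type describes only the external representation, that a window holds the objects playing one role, and that the same object may appear in several windows with different representations, as with the Employee and Manager windows of Chapter 6.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RecordType:
    """External data representation: a named set of property names."""
    name: str
    properties: tuple

    def conforms(self, data):
        return set(data) == set(self.properties)


@dataclass
class Window:
    """A role: the objects that play it, each seen through one record type."""
    name: str
    rtype: RecordType
    members: dict = field(default_factory=dict)  # object identity -> representation

    def insert(self, oid, data):
        if not self.rtype.conforms(data):
            raise TypeError(f"{self.name} expects {self.rtype.name}")
        self.members[oid] = dict(data)


def roles_of(oid, windows):
    """Return the names of all windows through which the object is visible."""
    return [w.name for w in windows if oid in w.members]


employee_type = RecordType("EmployeeType", ("EmpID", "EmpName"))
manager_type = RecordType("ManagerType", ("EmpID", "EmpName", "PjtCode"))

# One type may serve several windows (Employee and TemporaryEmployee) ...
employee = Window("Employee", employee_type)
temporary = Window("TemporaryEmployee", employee_type)
manager = Window("Manager", manager_type)

# ... and one object may be seen through windows with different types.
employee.insert("obj-1", {"EmpID": "E7", "EmpName": "Lee"})
manager.insert("obj-1", {"EmpID": "E7", "EmpName": "Lee", "PjtCode": "P1"})
```

Here roles_of("obj-1", [employee, temporary, manager]) lists the Employee and Manager windows, while the object itself is identified independently of either representation.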
As mentioned before, we envision that there exists a conceptual structure or world consisting of objects, properties, and relationships, which is in one-to-one correspondence with the 'real' world. Open a window, and we see some part of the structure. This part is a role, which is characterized by a collection of properties. These properties could manifest either an object or a relationship. Open another window in a different direction, and we see another role, possibly being played by the same object. No matter how many roles we have seen, the underlying structure as a whole is the same. Clearly, the concept of window in SDBM extends the concept of class in other semantic data models. Speaking of instances, as one does of a class in other semantic data models, we find that the 'instances' of an SDBM window can be either objects or relationships among these objects, while the instances of a class in other semantic data models can only be objects. Therefore, SDBM accommodates views that are more flexible for the user.

The object concept in SDBM is also different from the object concepts found in other semantic data models. In other semantic data models, an object is always rigidly tied to one role, or to several roles if these roles have compatible external data representations in terms of the data subtype hierarchy. In SDBM, such rigidity is removed as a result of applying our two-view conceptualization. An SDBM object can have totally different external data representations. This extension of the object concept greatly enhances the data modelling ability.

Based on our conceptualization, we investigated the relationship between the type concept and the window concept. The type concept in SDBM is actually a tool that makes it possible to display our vision of the 'real' world being modelled.
We found that there should be an explicit separation between the type concept and the window concept, and that a semantic data model should support the many-to-many relationship between these two concepts. However, this has not been supported by many recently proposed semantic data models. We have developed SDBM so that such separation is built into it. It has been demonstrated that this separation and this many-to-many relationship give the model enhanced modelling power. Following this separation, we classify semantic integrity constraints into two categories: those associated with types and those associated with windows.

Secondly, a database design methodology based on our two-view conceptualization is presented. This methodology expands the design methodology proposed in TAXIS (Mylopoulos et al [1980]). Like TAXIS, our methodology promotes the concept of specialization. However, unlike TAXIS, our approach does not simply start with the most general level. It starts with objects that are both general and conceptually significant on their own. The design of an SDBM database also needs to go through a role identification process. In this process, we identify all the interesting roles for objects and analyse the relationships that may exist among them. Thus, we expect that this approach will improve the database specification because of the enhanced modelling power.

Thirdly, because the model has a built-in type system, well-established type checking techniques from programming languages can be readily introduced to database applications.

Finally, several new modelling constructs are introduced to support our conceptualization of data modelling. The SDBM record data type is developed as the basic external data representation tool for SDBM roles. As demonstrated in the examples, this data structure is flexible in modelling various situations. The relationships among windows have been investigated.
As a result, constructs like MAYBE and ISNOT are introduced to model the constraints between different roles that might be played by the same object. For the same purpose, dynamic modelling constructs such as INSERT and DELETE are also introduced. To cope with the difficulty associated with multi-valued properties, several behavioural operations are provided. They can add a value to the property, remove a value from the property, and modify a multi-valued property. We have pointed out that the specialization process may be carried out without additional details.

Owing to time constraints, the semantic database model SDBM has not been implemented on a computer. It would be interesting to investigate how our two-view conceptualization can be realized internally. It would also be interesting to investigate a user-specifiable exception handling facility at the conceptual level. Currently, such a facility is not supported.

BIBLIOGRAPHY

Albano, A. et al. [1985], 'Galileo: A Strongly Typed Interactive Conceptual Language', ACM TODS, Vol.10, No.2, 1985.
Abrial, J.R. [1974], 'Data Semantics', In Klimbie, J.W. and Koffeman, K.L. (Eds.), DATA BASE MANAGEMENT, North-Holland, 1974.
Adiba, M., Delobel, C., and Leonard, M. [1976], 'A Unified Approach for Modelling in Logical Data Base Design', In Nijssen, G.M. (Ed.), MODELLING IN DATA BASE MANAGEMENT SYSTEMS, North-Holland, 1976.
Aho, A.V., Beeri, C. and Ullman, J.D. [1977], 'The Theory of Joins in Relational Databases (Extended Abstract)', Proc. of 18th Annual Symposium on Foundations of Computer Science, Oct. 1977.
Astrahan, M.M., et al. [1976], 'System R: A Relational Approach to Data Base Management', ACM TODS, Vol.1, No.2, 1976.
Bachman, C.W. [1977], 'The Role Concept in Database Models', Proc. of Int. Conf. on Very Large Data Bases, Tokyo, Japan, October 1977.
Benci, E., Bodart, F., Bogaert, H., and Cabanes, A. [1976], 'Concepts for the Design of a Conceptual Schema', In Nijssen, G.M.
(Ed.), MODELLING IN DATA BASE MANAGEMENT SYSTEMS, North-Holland, 1976.
Bernstein, P.A., Swenson, J.R. and Tsichritzis, D.C. [1975], 'A Unified Approach to Functional Dependencies and Relations', Proc. of ACM SIGMOD Int. Conf. on the Mgt. of Data, San Jose CA, May 1975.
Biller, H., Glatthaar, W. and Neuhold, E.J. [1976], 'On the Semantics of Data Bases: the Semantics of Data Manipulation Languages', In Nijssen, G.M. (Ed.), MODELLING IN DATA BASE MANAGEMENT SYSTEMS, North-Holland, 1976.
Birtwistle, G.M., Dahl, O.J., Myhrhaug, B., Nygaard, K. [1973], SIMULA BEGIN, New York, Petrocelli, 1973.
Bracchi, G., Paolini, P. and Pelagatti, G. [1976], 'Binary Logical Associations in Data Modelling', In Nijssen, G.M. (Ed.), MODELLING IN DATA BASE MANAGEMENT SYSTEMS, North-Holland, 1976.
Brodie, M.L. [1978], 'Specification and Verification of Data Base Semantic Integrity', Technical Report CSRG-91, Univ. of Toronto, April 1978.
Brodie, M.L. [1981], 'Association: A Database Abstraction for Semantic Modelling', In Chen, P.P. (Ed.), ENTITY-RELATIONSHIP APPROACH TO INFORMATION MODELLING AND ANALYSIS, North-Holland, 1981.
Brodie, M.L., Mylopoulos, J. and Schmidt, J.W. [1984], ON CONCEPTUAL MODELLING, Springer-Verlag, 1984.
Buneman, P. and Atkinson, M. [1986], 'Inheritance and Persistence in Database Programming Languages', Proc. of ACM SIGMOD Int. Conf. on the Mgt. of Data, Washington, D.C., 1986.
Chen, P.P. [1976], 'The Entity-Relationship Model: Toward a Unified View of Data', ACM TODS, Vol.1, No.1, 1976.
Chen, P.P. [1981], ENTITY-RELATIONSHIP APPROACH TO INFORMATION MODELLING AND ANALYSIS, North-Holland, 1981.
Codd, E.F. [1970], 'A Relational Model of Data for Large Shared Data Banks', Comm. of ACM, Vol.13, No.6, 1970.
Codd, E.F. [1971a], 'A Data Base Sublanguage Founded on the Relational Calculus', Proc. of ACM SIGFIDET Workshop on Data Description, Access, and Control, San Diego CA, 1971.
Codd, E.F.
[1971b], 'Further Normalization of the Data Base Relational Model', In Rustin, R. (Ed.), DATA BASE SYSTEMS, Prentice-Hall, 1971.
Codd, E.F. [1971c], 'Normalized Data Base Structure: A Brief Tutorial', Proc. of ACM SIGFIDET Workshop on Data Description, Access, and Control, San Diego, 1971.
Codd, E.F. [1971d], 'Relational Completeness of Data Base Sublanguages', In Rustin, R. (Ed.), DATA BASE SYSTEMS, Prentice-Hall, 1971.
Codd, E.F. [1974a], 'Recent Investigation in Relational Data Systems', Information Processing '74, North-Holland, 1974.
Codd, E.F. [1974b], 'Seven Steps to Rendezvous with the Casual User', In Klimbie, J.W. et al. (Eds.), DATA BASE MANAGEMENT, North-Holland, 1974.
Codd, E.F. [1979], 'Extending the Database Relational Model to Capture More Meaning', ACM TODS, Vol.4, No.4, Dec. 1979.
Date, C.J. [1983], AN INTRODUCTION TO DATABASE SYSTEMS, Vol.II, Addison-Wesley, 1983.
Eswaran, K.P. and Chamberlin, D.D. [1975], 'Functional Specifications of a Subsystem for Database Integrity', Proc. of Int. Conf. on Very Large Data Bases, Framingham MA, USA, Sept. 1975.
Fagin, R. [1975], 'The Decomposition Versus the Synthetic Approach to Database Design', Proc. of Int. Conf. on Very Large Data Bases, Tokyo, Japan, October 1977.
Fagin, R. [1977], 'Multivalued Dependencies and a New Normal Form for Relational Databases', ACM TODS, Vol.2, No.3, 1977.
Fagin, R. [1981], 'A Normal Form for Relational Databases That Is Based on Domains and Keys', ACM TODS, Vol.6, No.3, 1981.
Falkenberg, E. [1976a], 'A Uniform Approach to Data Base Management', In Nijssen, G.M. (Ed.), MODELLING IN DATA BASE MANAGEMENT SYSTEMS, North-Holland, 1976.
Falkenberg, E. [1976b], 'Concepts for Modelling Information', In Nijssen, G.M. (Ed.), MODELLING IN DATA BASE MANAGEMENT SYSTEMS, North-Holland, 1976.
Goldstein, R.C. [1985], DATABASE: TECHNOLOGY AND MANAGEMENT, John Wiley & Sons, 1985.
Hall, P., Owlett, J. and Todd, S. [1976], 'Relations and Entities', In Nijssen, G.
M. (Ed.), MODELLING IN DATA BASE MANAGEMENT SYSTEMS, North-Holland, 1976.
Hammer, M.M. and Berkowitz, B. [1980], 'DIAL: A Programming Language for Data Intensive Applications', Proc. of ACM SIGMOD Int. Conf. on the Mgt. of Data, 1980.
Hammer, M.M. and McLeod, D.J. [1975], 'Semantic Integrity in a Relational Data Base System', Proc. of Int. Conf. on Very Large Data Bases, Framingham MA, USA, Sept. 1975.
Hammer, M.M. and McLeod, D.J. [1976], 'A Framework for Data Base Semantic Integrity', Proc. of Second Int. Conf. on Software Eng., San Francisco, Oct. 1976.
Hammer, M. and McLeod, D. [1981], 'Database Description with SDM: A Semantic Database Model', ACM TODS, Vol.6, No.3, 1981.
Ingalls, D.H. [1978], 'The Smalltalk-76 Programming Systems: Design and Implementation', Conf. Record of the 5th Annual ACM Symposium on Principles of Programming Languages, Tucson, Arizona, 1978.
Kambayashi, Y., Tanaka, K. and Yajima, S. [1978], 'Problems of Relational Database Design', In Yao, S.B. and Navathe, S.B. (Eds.), DATA BASE DESIGN TECHNIQUES I: REQUIREMENTS AND LOGICAL STRUCTURE, Springer-Verlag, 1978.
Kent, W. [1977], 'Entities and Relationships in Information', In Nijssen, G.M. (Ed.), ARCHITECTURE AND MODELS IN DATA BASE MANAGEMENT SYSTEMS, North-Holland, 1977.
Kent, W. [1979], 'Limitations of Record-Based Information Models', ACM TODS, Vol.4, No.1, 1979.
King, R. and McLeod, D. [1982], 'The Event Database Specification Model', Proc. 2nd Int. Conf. on Databases: Improving Usability and Responsiveness, Jerusalem, Israel, June 1982.
King, R. and McLeod, D. [1984], 'A Unified Model and Methodology for Conceptual Database Design', In Brodie, M.L. et al. (Eds.), ON CONCEPTUAL MODELLING, Springer-Verlag, 1984.
King, R. and McLeod, D. [1985], 'Semantic Data Models', In Yao, S.B. (Ed.), PRINCIPLES OF DATABASE DESIGN, VOL. I, Prentice-Hall, 1985.
Kroenke, D.M. [1983], DATABASE PROCESSING: FUNDAMENTALS, DESIGN, IMPLEMENTATION, 2nd ed., SRA, 1983.
King, B.
and Sonke, S. [1984], 'A Semantic Data Language', IEEE 84, Trend and , 1984.
Maier, D. [1983], THE THEORY OF RELATIONAL DATABASES, Computer Science Press, 1983.
Manalo, B.A. [1983], 'A User-oriented Conceptual Schema Model of Data Base with a Supporting Design Strategy', University of Illinois at Urbana-Champaign, unpublished Ph.D. Thesis, 1983.
McGee, W.C. [1976], 'On User Criteria for Data Model Evaluation', ACM TODS, Vol.1, No.4, 1976.
McLeod, D.J. [1976a], 'High Level Domain in a Relational Data Base System', Proc. of ACM SIGPLAN/SIGMOD Conf. on Data: Abstraction, Definition, and Structure, Salt Lake City, March 1976.
McLeod, D.J. [1976b], 'High Level Expression of Semantic Integrity Specifications in a Relational Data Base System', Technical Report TR-165, Lab. for Computer Science, MIT, Cambridge MA, Sept. 1976.
Mylopoulos, J., Bernstein, P.A. and Wong, H.K.T. [1980], 'A Language Facility for Designing Database-intensive Applications', ACM TODS, Vol.5, No.2, June 1980.
Pratt, T.W. [1984], PROGRAMMING LANGUAGES: DESIGN AND IMPLEMENTATION, (2nd ed.), Prentice-Hall, 1984.
Rissanen, J. [1977], 'Independent Components of Relations', ACM TODS, Vol.2, No.4, Dec. 1977.
Rowe, L. and Shoens, K. [1979], 'Data Abstraction, Views and Updates in RIGEL', Proc. of ACM SIGMOD Int. Conf. on the Mgt. of Data, Boston MA, 1979.
Schmid, H.A. and Swenson, J.R. [1975], 'On the Semantics of the Relational Data Model', Proc. of ACM SIGMOD Int. Conf. on the Management of Data, San Jose, 1975.
Schmid, H.A. [1977], 'An Analysis of Some Concepts for Conceptual Models', In Nijssen, G.M. (Ed.), ARCHITECTURE AND MODELS IN DATA BASE MANAGEMENT SYSTEMS, North-Holland, 1977.
Schmidt, J.W. [1978], 'Type Concepts for Database Definition', Proc. of Int. Conf. on Data Bases, Haifa, Israel, 1978.
Shipman, D.W. [1982], 'The Functional Model and the Data Language DAPLEX', ACM TODS, Vol.6, No.1, 1982.
Smith, J.M. and Smith, D.C.P.
[1977], 'Database Abstractions: Aggregation and Generalization', ACM TODS, Vol.2, No.2, 1977.
Steuert, J. and Goldman, J. [1974], 'The Relational Data Management System: A Perspective', Proc. of ACM SIGFIDET Workshop on Data Description, Access, and Control, 1974.
Stonebraker, M.R. [1974], 'High Level Integrity Assurance in Relational Data Base Management Systems', Elec. Research Lab. Report ERL-M473, Univ. of California, Berkeley CA, Aug. 1974.
Stonebraker, M., Wong, E., Kreps, P. and Held, G. [1976], 'The Design and Implementation of INGRES', ACM TODS, Vol.1, No.3, 1976.
Stonebraker, M.R. [1984], 'Adding Semantic Knowledge to a Relational Database System', In Brodie, M.L. et al. (Eds.), ON CONCEPTUAL MODELLING, Springer-Verlag, 1984.
Tsichritzis, D.C. and Lochovsky, F.H. [1982], DATA MODELS, Prentice-Hall, 1982.
Tsur, S. and Zaniolo, C. [1984], 'An Implementation of GEM--Supporting a Semantic Data Model on a Relational Back-end', Proc. of SIGMOD, 1984 annual meeting, Vol.14, No.2, June 1984.
Ullman, J.D. [1983], PRINCIPLES OF DATABASE SYSTEMS, (2nd ed.), Computer Science Press, 1983.
Weber, H. [1976], 'A Semantic Model of Integrity Constraints on a Relational Data Base', In Nijssen, G.M. (Ed.), MODELLING IN DATA BASE MANAGEMENT SYSTEMS, North-Holland, 1976.
Wegner, P. [1980], 'Programming with Ada: An Introduction by Means of Graduated Examples', Englewood Cliffs, N.J., Prentice-Hall, 1980.
Wiederhold, G. [1977], DATABASE DESIGN, McGraw-Hill, 1977.
Wong, H.K.T. [1981], 'Design and Verification of Integrity Information Systems Using TAXIS', Technical Report CSRG-129, Univ. of Toronto, April 1981.

APPENDIX A.
SYNTAX OF SDBM

<Letter>:= A|B|C|D|E|F|G|H|I|J|K|L|M|N|O|P|Q|R|S|T|U|V|W|X|Y|Z |
           a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z
<Digit>:= 0|1|2|3|4|5|6|7|8|9
<LetterOrDigit>:= <Letter> | <Digit>
<SpecialCharacter>:= + | - | * l / l-M;l# l s|t | { | } | [ | ] | , l B | ) | ( | I
<Character>:= <LetterOrDigit> | <SpecialCharacter>
<CharacterString>:= <Character>{<Character>}
<Identifier>:= <Letter>{<LetterOrDigit>}
<KeyWord>:= NUMERIC | CHARACTER | BOOLEAN | VARIANT | COMMON |
            BEGIN | END | NULL | ISNOT | MAYBE | LOCAL | IS | WITH |
            ADT | TYPE | ADTOP | FUNCTION | ACTION | MULTIPLE |
            OPTIONAL | KEY | OF | INTO |
            TRANSACTION | OUT | IN | VARIABLE | INSERT | MODIFY | DELETE |
            RETRIEVE | DESTROY | CREATE | TO | WINDOW | REFERENCE | RELATIONSHIP |
            WHERE | IF | THEN | ELSE | ENDIF | FOR | EACH | DO | WHILE |
            GT | GE | LT | LE | EQ | NE | AND |
            OR | NOT | <--> | ADD | REMOVE | TRUE | FALSE
<UnsignedInteger>:= <Digit>{<Digit>}
<PositiveInteger>:= <UnsignedInteger> | +<UnsignedInteger>
<NegativeInteger>:= -<Digit>{<Digit>}
<Integer>:= <PositiveInteger> | <NegativeInteger>
<UnsignedReal>:= <Digit>{<Digit>}.
{ < D i g i t > } <Real>:= < U n s i g n e d R e a l > | +<UnsignedReal> | - < U n s i g n e d R e a l > <Number>:= <Real> | < I n t e g e r > <Constant>:= < U n s i g n e d I n t e g e r > | < U n s i g n e d R e a l > | " < C h a r a c t e r S t r i n g > " | NULL < S i g n e d C o n s t a n t > : = <Number> | " < C h a r a c t e r S t r i n g > " | NULL <SDBMSchema>:= < T y p e D e c l a r a t i o n P a r t > <WindowDeclarat i o n P a r t > [ < T r a n s a c t i o n D e c l a r a t i o n P a r t > ] < T y p e D e c l a r a t i o n P a r t > : = < T y p e D e c l a r a t i o n > { < T y p e D e c l a r a t i o n > } < T y p e D e c l a r a t i o n > : = TYPE ( <TypeID> [ = < T y p e D e f i n i t i o n > | < S u b t y p e D e f i n t i o n > ] + ) < T y p e D e f i n i t i o n > : = <AnyTypeID> WITH < T y p e C o n s t r a i n t > <AnyTypeID>:= NUMERIC | BOOLEAN | CHARACTER | ENUMERATION | RECORD | <TypeID> < T y p e C o n s t r a i n t > : = < N u m e r i c R a n g e s > | < B o o l e a n P i c t u r e > | < C h a r a c t e r P i c t u r e s > | < E n u m e r a t i o n P i c t u r e > | < R e c o r d P i c t u r e > <NumericRanges>:= <NumericRange> {OR <NumericRange>} <NumericRange>:= <Number>..<Number> < B o o l e a n P i c t u r e > : = "{"TRUE|FALSE|TRUE,FALSE|FALSE,TRUE"}" < C h a r a c t e r P i c t u r e s > : = < C h a r a c t e r P i c t u r e > {OR < C h a r a c t e r P i c t u r e > } < C h a r a c t e r P i c t u r e > : = < N u m e r i c C h a r a c t e r > | < A l p h a b e t i c C h a r a c t e r > | < A l p h a n u m e r i c C h a r a c t e r > <Numer i c C h a r a c t e r > : = 9 ( < I n t e g e r > ) < A l p h a b e t i c C h a r a c t e r > : = A ( < I n t e g e r > ) < A l p h a n u m e r i c C h a r a c t e r > : = X ( < I n t e g e r > ) < C h a r a c t e r L i s t > : = { [ A | X | 9 | < Q u a t e L i s t > ] + } + < Q u a t e L i s t > : = " < C h a r a c t e r > { < C h a r a c t e r > } " <Enumerat i o n P i c t u r e > : = " { " < E n u 
m e r a t i o n L i s t > " } " < E n u m e r a t i o n L i s t > : = < C h a r a c t e r S t r i n g > { , < C h a r a c t e r S t r i n g > } < R e c o r d S t r u c t u r e > : = [[VARIANT|COMMON] <ComponentAndTypePair> {, [VARIANT|COMMON] <ComponentAndTypePair> };] [BEGIN < C o n s t r a i n t A s s e r t i o n > { , < C o n s t r a i n t A s s e r t i o n > } END] <ComponentAndTypePai r>: = <ComponentID>:<TypeSpec i f i c a t ion>[MULTIPLE|OPTIONAL] < C o m p o n e n t I D > : = < I d e n t i f i e r > < T y p e S p e c i f i c a t i o n > : = < T y p e I D > | < T y p e D e f i n i t i o n > | <SubtypeDef i n i t i o n > < C o n s t r a i n t A s s e r t i o n > : = < C o m p l e x A s s e r t i o n > | < S i m p l e A s s e r t ion> < S i m p l e A s s e r t i o n > : = <ConTerm> { < B o o n l e a n O p e r a t o r > <ConTerm>} <B o o n l e a n O p e r a t o r > : = AND | OR | NOT <ConTerm>:=<Element> { < R e l a t i o n a l O p e r a t o r > <Element>} < R e l a t i o n a l O P e r a t o r > : = GE | GT | LE | LT | EQ | NE <ConPrimary>:= ( < C o n s t r a i n t A s s e r t i o n > ) | <CompoentID> | <Constant> < C o m p l e x A s s e r t i o n > : = I F < C o n s t r a i n t A s s e r t i o n > THEN < C o n s t r a i n t A s s e r t i o n > [ ELSE < C o n s t r a i n t A s s e r t i o n > ] ENDIF < S u b t y p e D e f i n i t i o n > : = IS < T y p e D e f i n i t i o n > <ADTDe f i n i t i on>: = ADT(<ADTID>=TYPE <TypeID> [ L O C A L ] ; ADTOP [<ADTFunctionOP>|<ADTProcedureOP>] {, [<ADTFunctionOP>|<ADTProcedureOP>] +}; 139 ACTION: < A D T P r o c e d u r e O r F u n c t i o n > ) <ADTFunctionOP> : = <ADTFunct i o n I D > ( < F u n c t i o n P a r a m e t e r s > ) : < T y p e S p e c i f i c a t ion> <ADTFunct ionID>:= < I d e n t i f i e r > <ADTID>:= < I d e n t i f i e r > <Funct ionParameters>:=<ParameterID>:<TypeSpec i f i c a t i o n > { , <ParameterID>:<TypeSpec i f i c a t i o n > } <ParameterID>:= < I d e n t i f i e r > < 
ADTProcedureOP>:=<ADTProceureID>(<ProcedureParameters>) <ADTProcedureID>:= < I d e n t i f i e r > < P r o c e d u r e P a r a m e t e r s > : = [OUT]<ParameterID>:<TypeSpec i f i c a t ion>{, [OUT]<ParameterID>:<TypeSpec i f i c a t i o n > } < A D T P r o c e d u r e O r F u n c t i o n > : = <ADTProcedure>|<ADTFunct i o n > | < P r o c e d u r e > | < F u n c t i o n > <ADTProcedure>:= PROCEDURE <ADTProcedureID> <Body> END PROCEDURE <ADTProcedureID> <Body>:= [ < A D T T y p e D e c l a r a t i o n P a r t > ] [ < A D T V a r i a b l e D e c l a r a t i o n > ] <ADTStatementPart> < A D T T y p e D e c l a r a t i o n P a r t > : = TYPE < A D T T y p e D e c l a r a t i o n > { , 140 <ADTTypeDeclaration>} <ADTTypeDeclaration>:= <TypeID> [= <TypeDefinition> | <SubtypeDefinition> ] + <ADTVar i a b l e D e c l a r a t i o n P a r t > : = VARIABLE <VariableList>:<TypeID>{, <VariableList>:<TypeID>} <ADTStatementPart>:= BEGIN <ADTStatement>{, <ADTStatement>} END <ADTStatement>:= <ADTAssignment>|<ADTPrecedureCall>| <ADTCompound>|<ADTCondition>| <ADTIteration> <ADTAssignment>:= <VariableID>=<ADTExpression>| <FunctionID>=<ADTExpression> <ADTPrecedureCall>:= <PrecedureID>([<ActualParameter>{, <ActualParameter>}]) <ActualParameter>:= <ADTVariab1e> | <SignedConstant> <ADTVariable>:= <ADTVariableID>|<ADTVariableID>.<ComponentID> <ADTVariableID>:= <VariableID> <ADTCompound>:= BEGIN <ADTStatement>{, <ADTStatement>} END <ADTCondition>:= I F < A D T E x p r e s s i o n > THEN <ADTStatement> [ELSE <ADTStatement>] ENDIF < A D T I t e r a t i o n > : = <ADTFor>|<ADTWhile> <ADTFor>:= FOR < V a r i a b l e I D > = < I n i t i a l > TO < F i n a l > DO <ADTStatement> < I n i t i a l > : = < I n t e g e r > < F i n a l > : = < I n t e g e r > <ADTWhile>:= WHILE < A D T E x p r e s s i o n > DO <ADTStatement> <ADTExpression>:= <ADTTerm> < R e l a t i o n a l O p e r a t o r > <ADTTerm> <ADTTerm> <ADTTerm>:= <ADTUnaryTerm> < A d d i n g O p e r a t o r > <ADTUnaryTerm> <ADTUna ryTerm> 
<AddingOperator>:= +|-|OR
<ADTUnaryTerm>:= <ADTFactor> | +<ADTFactor> | -<ADTFactor>
<ADTFactor>:= <ADTPrimary> <MultiplyingOperator> <ADTPrimary> | <ADTPrimary>
<MultiplyingOperator>:= *|/|AND
<ADTPrimary>:= NOT <ADTElement> | <ADTElement>
<ADTElement>:= ( <ADTExpression> ) | <ADTVariable> | <FunctionCall> | <ADTFunctionCall>
<FunctionCall>:= <FunctionID>([<ActualParameter>{, <ActualParameter>}])
<ADTFunctionCall>:= <ADTFunctionID>([<ActualParameter>{, <ActualParameter>}])
<ADTFunction>:= FUNCTION <ADTFunctionID> <Body> END FUNCTION <ADTFunctionID>
<Procedure>:= PROCEDURE <ProcedureID> <Body> END PROCEDURE <ProcedureID>
<Function>:= FUNCTION <FunctionID> <Body> END FUNCTION <FunctionID>

<WindowDeclarationPart>:= <WindowDeclaration>{<WindowDeclaration>}
<WindowDeclaration>:= <SimpleWindowDeclaration> | <AggregateWindowDeclaration>
<SimpleWindowDeclaration>:= WINDOW (<WindowID> [IS <WindowID> WITH | =]+
                            [TYPE: <TypeID>{,<TypeID>};]
                            [REFERENCE: <References>;]
                            [RELATIONSHIP: <Relationships>;]
                            [KEY: <KeyIDs>] )
<WindowID>:= <Identifier>
<KeyID>:= <PropertyID>{,<PropertyID>}
<PropertyID>:= <Identifier>
<References>:= <WindowID>(<PropertyPairs>){, <WindowID>(<PropertyPairs>)}
<PropertyPairs>:= <PropertyID>=<PropertyID>{, <PropertyID>=<PropertyID>}
<Relationships>:= <Relationship>{, <Relationship>}
<Relationship>:= <RelationshipID>(<WindowID>[(<PropertyPairs>)])<--><RelationshipID>
<AggregateWindowDeclaration>:= WINDOW (<WindowID>=
                               TYPE: <AggregateTypeDefinition>{, <AggregateTypeDefinition>};
                               AGGREGATE: <WindowID>, <WindowID>{, <WindowID>} )
<AggregateTypeDefinition>:= <ComponentID> [OF <WindowID> | : <TypeSpecification>]+

<TransactionDeclarationPart>:= <TransactionDeclaration>{<TransactionDeclaration>}
<TransactionDeclaration>:= TRANSACTION ( <TransactionDesignator> <TransactionBody> )
<TransactionDesignator>:= <TransactionID>([<ParameterDefinitionList>])
<TransactionID>:= <Identifier>
<ParameterDefinitionList>:= <ParameterDefinition>{, <ParameterDefinition>}
<ParameterDefinition>:= <OUTParameter> | <INParameter>
<OUTParameter>:= OUT <ParameterID>:<TypeIDOrWindowID>
<INParameter>:= [IN] <ParameterID>:<TypeIDOrWindowID>
<TypeIDOrWindowID>:= <TypeID> | <WindowID>
<TransactionBody>:= [<TransactionTypeDeclaration>]
                    [<TransactionVariableDefinition>]
                    BEGIN <Statement>{, <Statement>} END
<TransactionTypeDeclaration>:= TYPE <TypeID> [= <TypeDefinition> | <SubtypeDefinition>]+
                               {, <TypeID> [= <TypeDefinition> | <SubtypeDefinition>]+}
<VariableDeclaration>:= VARIABLE <VariableList>:<TypeSpecification>|<WindowID>
                        {, <VariableList>:<TypeSpecification>|<WindowID>};
<VariableList>:= <VariableID>{, <VariableID>}
<VariableID>:= <Identifier>
<Statement>:= <AssignmentStatement> | <CreateStatement> | <DestroyStatement> |
              <InsertStatement> | <ModifyStatement> | <DeleteStatement> |
              <RetrieveStatement> | <AbortStatement> | <CompoundStatement> |
              <ConditionalStatement> | <IterationStatement> | <TransactionCall> |
              <ADTProcedureCall> | <AddStatement> | <RemoveStatement> |
              <Multi-valuedPropertyModifyStatement>
<AssignmentStatement>:= <VariableID>=<Expression>
<CreateStatement>:= CREATE <VariableID> INTO <WindowID> WITH (<PropertyValueList>)
<PropertyValueList>:= <PropertyValuePair>{, <PropertyValuePair>}
<PropertyValuePair>:= <PropertyID>="<CharacterStrings>" | <VariableID>
<CharacterStrings>:= '<CharacterString>'{, '<CharacterString>'}
<DestroyStatement>:= DESTROY <VariableID>
<InsertStatement>:= INSERT <VariableID>(<VarID>{,<VarID>}) INTO <WindowID>
                    [WITH (<PropertyValueList>)]
<VarID>:= <Identifier>
<ModifyStatement>:= <VariableID>.<PropertyID>= <NewValue>
<NewValue>:= "<CharacterString>" | <Expression>
<DeleteStatement>:= DELETE <VariableID>
<RetrieveStatement>:= RETRIEVE <VariableID>[.<PropertyID> TO <VariableID>
                      {, <VariableID>.<PropertyID> TO <VariableID>}]
                      [WHERE <Expression>]
<AbortStatement>:= ABORT
<AddStatement>:= ADD [<VariableID>|"<CharacterStrings>"]+ TO <VariableID>.<PropertyID>
<RemoveStatement>:= REMOVE [<VariableID>|"<CharacterStrings>"]+ FROM <VariableID>.<PropertyID>
<Multi-valuedPropertyModifyStatement>:= <VariableID>.<PropertyID>=
                      NULL | "<CharacterStrings>" | <VariableID>.<PropertyID>
<CompoundStatement>:= BEGIN <Statement>{, <Statement>} END
<ConditionalStatement>:= IF <Expression> THEN <Statement> [ELSE <Statement>] ENDIF
<IterationStatement>:= <ForStatement> | <WhileStatement>
<ForStatement>:= FOR EACH <VariableID> DO <Statement>
<WhileStatement>:= WHILE <Expression> DO <Statement>
<Expression>:= <Term> {<RelationalOperator> <Term>}
<Term>:= <UnaryTerm> {<AddingOperator> <UnaryTerm>}
<UnaryTerm>:= <Factor> | +<Factor> | -<Factor>
<Factor>:= <Primary> {<MultiplyingOperator> <Primary>}
<Primary>:= NOT <Element> | <Element>
<Element>:= ( <Expression> ) | <VariableID> | <VariableID>.<ComponentID> |
            <VariableID>.<PropertyID> | <ADTFunctionCall> | <Constant>
<TransactionCall>:= <TransactionID>(<ActualParameter>{, <ActualParameter>})

APPENDIX B. COMPARISONS BETWEEN SDBM AND TAXIS

B.1 SIMILARITIES

SDBM:  Modelling primitives are objects.
TAXIS: Modelling primitives are objects.

SDBM:  There can be dependency constraints.
TAXIS: There are dependency constraints among objects.

SDBM:  Objects have properties.
TAXIS: Objects have properties.

SDBM:  Object property constraints can be expressed with assertions.
TAXIS: Object property constraints can be expressed with assertions.

SDBM:  Relationships exist between objects.
TAXIS: Relationships exist between objects.

SDBM:  Basic operations are available, including insertion, deletion, modification, and retrieval.
TAXIS: Basic operations are available, including insertion, deletion, modification, and retrieval.

SDBM:  Variables may be used as object surrogates in transactions.
TAXIS: Variables are used as object surrogates in transactions.

B.2 DIFFERENCES

SDBM:  Objects have no types. The external data representation is handled by windows.
TAXIS: Objects are categorized into classes.

SDBM:  Instance hierarchy is trivial.
TAXIS: Instance hierarchy is extensively used.

SDBM:  Dependency (reference) constraints exist only if specified explicitly.
TAXIS: Dependency constraints always exist.

SDBM:  Many-to-many relationships between two objects may be established directly without introducing an artificial construct.
TAXIS: Many-to-many relationships between two objects must be established through creating a third object (class).

SDBM:  The existence of references is different from the existence of relationships.
TAXIS: The existence of references is not differentiated from the existence of relationships.

SDBM:  Semantics associated with the name of a relationship can be explicitly expressed.
TAXIS: Semantics associated with the name of a relationship can be implied through syntactic combination of the properties of a class.

SDBM:  Properties can be multi-valued.
TAXIS: Properties must be single-valued.

SDBM:  Specialization mechanism is only a special constraint among different windows of an object.
TAXIS: Specialization mechanism is the only relationship that can be specified among classes.

SDBM:  Data types are used extensively to specify the external data representations of objects.
TAXIS: The only data type is the TAXIS class.

SDBM:  Intra-data-type constraints are specified within data type definitions.
TAXIS: Intra-relation constraints are specified through transactions.

SDBM:  Data types are differentiated from windows. A window can have more than one data type and a data type can be used with more than one window.
TAXIS: Data types and classes are mixed up. They are all TAXIS classes.

SDBM:  Objects and symbolic strings are explicitly differentiated.
TAXIS: Symbolic strings are treated as objects.

SDBM:  Abstract data types are supported.
TAXIS: Abstract data types are not supported.

SDBM:  The underlying conceptualization is the two-view conceptualization. Appropriate constructs are introduced to support this conceptualization.
TAXIS: The two-view conceptualization is not fully supported.

SDBM:  SDBM is typed.
TAXIS: TAXIS is not typed.
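The difference in how many-to-many relationships are established can be illustrated with a small sketch. The Python below is not SDBM or TAXIS syntax and is not taken from either system; the names Entity, DirectRelationship, Enrollment, and courses_via_link are hypothetical and exist only for illustration. It records the same facts twice, once as a directly stated relationship between two objects and once through a third, artificial object, and shows that both encodings answer the same query.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entity:
    """A bare object identified only by a surrogate name (hypothetical)."""
    name: str


# Style 1: the many-to-many relationship is recorded directly as a set of
# object pairs, loosely mirroring "established directly" in the comparison.
@dataclass
class DirectRelationship:
    name: str
    pairs: set = field(default_factory=set)

    def relate(self, left: Entity, right: Entity) -> None:
        self.pairs.add((left, right))

    def partners_of(self, left: Entity) -> set:
        return {b for (a, b) in self.pairs if a == left}


# Style 2: the same facts are recorded through a third, artificial object,
# loosely mirroring "creating a third object (class)" in the comparison.
@dataclass(frozen=True)
class Enrollment:
    student: Entity
    course: Entity


def courses_via_link(links: list, student: Entity) -> set:
    return {e.course for e in links if e.student == student}


alice, bob = Entity("alice"), Entity("bob")
db, ai = Entity("databases"), Entity("ai")

takes = DirectRelationship("takes")
for s, c in [(alice, db), (alice, ai), (bob, db)]:
    takes.relate(s, c)

links = [Enrollment(alice, db), Enrollment(alice, ai), Enrollment(bob, db)]

# Both encodings answer the same question about the same facts.
direct_answer = takes.partners_of(alice)
linked_answer = courses_via_link(links, alice)
```

In the second style the Enrollment object carries no information of its own; it exists only to connect the two objects, which is the sense in which the comparison calls such a construct artificial.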
