INCORPORATING SEMANTIC INTEGRITY CONSTRAINTSIN A DATABASE SCHEMAl)yHeng-Li YangB. Sc., National Chiao Tung University, 1978M. Commerce, National Cheng Chi University, 1980M. Sc. (Computer Science), Pennsylvania State University, 1985A THESIS SUBMITTED IN PARTIAL FULFILLMENT OFTHE REQUIREMENTS FOR THE DEGREE OFDOCTOR OF PHILOSOPHYinTHE FACULTY OF GRADUATE STUDIESTHE FACULTY OF COMMERCE AND BUSINESS AI)MINISTRATIONWe accept this thesis as conformingto the required sta dardTHE UNIVERSITY OF BRITISH COLUMBIAAugust 1992© Heng-Li Yang, 1992In presenting this thesis in partial fulfilment of the requirements for an advanceddegree at the University of British Columbia, I agree that the Library shall make itfreely available for reference and study. I further agree that permission for extensivecopying of this thesis for scholarly purposes may be granted by the head of mydepartment or by his or her representatives. It is understood that copying orpublication of this thesis for financial gain shall not be allowed without my writtenpermission.______________Department of OJIcL 8usiui esc Ad,nmis*iatioJ%The University of British ColumbiaVancouver, CanadaDate 6’, /792DE-6 (2/88)AbstractA database schema should consist of structures and semantic integrity constraints. Semantic integrity constraints (SICs) are invariant restrictions on the static states of thestored data and the state transitions caused by the primitive operations: insertion, deletion, or update. Traditionally, database design has been carried out on an ad hoc basisand focuses on structure and efficiency. Although the E-R model is the popular conceptual modelling tool, it contains few inherent SICs. Also, although the relational databasemodel is the popular logical data model, a relational database in fourth or fifth normalform may still represent little of the data semantics. Most integrity checking is distributedto the application programs or transactions. This approach to enforcing integrity via theapplication software causes a number of problems.Recently, a number of systems have been developed for assisting the database designprocess. However, only a few of those systems try to help a database designer incorporateSICs in a database schema. Furthermore, current SIC representation languages in theliterature cannot be used to represent precisely the necessary features for specifyingdeclarative and operational semantics of a SIC, and no modelling tool is available toincorporate SICs.This research solves the above problems by presentillg two models and one subsystem.The E-R-SIC model is a comprehensive modelling tool for helping a database designer incorporate SICs in a database schema. It is application domain-independent and suitable11for implementation as part of an automated database design system. The SIC Representation model is used to represent precisely these SICs. The SIC elicitation subsystemwould verify these general SICs to a certain extent, decompose them into sub-SICs ifnecessary, and transform them into corresponding ones in the relational model.A database designer using these two modelling tools can describe more data semanticsthan with the widely used relational model. The proposed SIC elicitation subsystem canprovide more modelling assistance for him (her) than current automated database designsystems.U’Table of ContentsAbstract iiList of Figures xiAcknowledgement xii1 Introduction 11.1 Database Design 11.2 Semantic Integrity Constraints 31.3 SICs, “Constraints” and Transactions 41.4 Enforce Semantic Integrity Constraints via Application Software 81.5 Embed Semantic Integrity Constraints in a Database 101.6 The Research Questions, Objectives, Methodology and Scope 121.7 Contributions 171.8 The Dissertation Outline 192 Review of Previous Work 20iv2.1 SIC Classification.2.1.1 Classification Based on2.1.2 Classification Based on2.1.3 Classification Based on2.1.4 Classification Based on2.1.5 Classification Based on2.1.6 Classification Based on2.1.7 Classification Based on2.1.8 Classification Based on2.22.32.42.5ExplicitnessApplied ObjectsOperation TypePreconditionCertainties of SICsViolation ActionEnforcement Schedule .Dynamics21212225262628293134343840413 A Model for Representing Semantic Integrity Constraints3.1 An Overview of the Representation Model3.2 SIC Name4343512.1.9 Summary of Comments on SIC ClassificationSIC RepresentationSIC VerificationSIC Reformulation and DecompositionAutomated Database Design Aids for Eliciting SICsv3.3 Certainty Factor (F) 533.4 Object (0) 543.5 Operation Type (T) 553.6 Precondition (C) 553.7 Predicate (P) 563.8 Violation Action (A) 574 The Application of the SIC Representation Model 614.1 Completeness of a SIC Specification for Database Design 614.1.1 Not Fewer Components 634.1.2 No More Components 694.2 SIC Abstractions 764.3 Database Management 804.3.1 SIC Management 804.3.2 Other Aspects of Database Management 845 An Extended E-R Model Incorporating Semantic Integrity Constraints 865.1 Problems with Previous E-R Models 865.2 An Overview of the E-R-SIC Model 88vi5.2.1 Primitive Modelling Constructs 885.2.2 Data Abstractions 925.2.3 Basic Properties of SICs 975.3 Entity Attribute SICs 1045.4 Entity SICs 1065.5 Relationship SICs 1095.5.1 Necessary Conditions 1125.5.2 Sufficient Conditions 1185.6 SICs Implied by Implicit Relationships and Data Abstractions 205.7 Summary of the E-R-SIC Model 1246 The Application of the E-R-SIC Model 1306.1 An Example of Using the E-R-SIC Model 1306.2 Potential Pitfalls of Using the E-R-SIC Model . . . 1376.3 Data Integrity Semantic Completeness 1407 A Proposed Database Design Aid for Eliciting SICs 1447.1 An Overview of the SIC Elicitation Subsystem . . . 1457.2 SIC Verification. 149rjj7.2.1 Consistency and Nonredundancy Rules for SIC Types 1557.3 SIC Reformulation and Decomposition 1577.3.1 Representation of Generic SICs 1617.4 Transforming SICs to Relational Form 1658 Conclusions and Further Research 1708.1 Conclusions and Contributions 1708.2 Future Research Extensions to this Dissertation 172Bibliography 176Appendices 193A BNF Descriptions of the SIC Representation Model 193B Summary of the Predicates used in this Research 198B.1 Input Predicates 198B.2 Manipulation Predicates 203C BNF Descriptions of the Simplified Format 210D SIC Type Classification in the E-R-SIC Model 213viiiE Examples of Heuristics 225F Verification of Aggregate Attribute SICs and Cardinalities 227F.1 Simple Tests on Aggregate Attribute SICs 227F.2 Algorithms for Verifying Cardinalities 228G Consistency and Nonredundancy Rules for SIC Elicitation Subsystem 231H SIC Reformulation and Decomposition Algorithms 23611.1 Find the Relevant Object and Operation Components 236H.2 Write the Proper Precondition and Predicate Components 240H.3 Suggest the Violation Action Component 245H.4 Generate the SIC Name 246I Some Examples of SIC Reformulation and Decomposition 248J Generic SIC Representation in the E-R-SIC Model 254K Algorithms Transforming SICs to Relational Form 266K.1 Transform the SIC Representation 266IK.2 Construct SIC Name Sets for the Foreign Key Update 269ixL Some Examples of Transforming SICs to Relational Form 270M Generic SIC Representation in the Relational Model 276xList of Figures1.1 A Proposed Automated Database Design Subsystem for Eliciting SICs . 165.1 Grouping: Member (M), Derived Set (DS), Indexing Entity (I), IndexingRelationship (R) 975.2 A Line Layout Context 1105.3 A Sta.r Layout Context 1115.4 A Loop-2 Layout Context 1115.5 A Loop-n Layout Context 1125.6 What is in the database? 1255.7 Single Entity Attribute SICs 1265.8 Single Entity SICs 1275.9 Relationship SICs 1285.10 SICs Implied by Implicit Relationships and Data Abstractions 1296.1 All Example: A Car Dealership Database. 131xiAcknowledgementI wish to take this opportunity to express my sincere gratitude to all members of mydissertation committee, Professor Alvin Fowler, Dr. Robert C. Goldstein, Dr. Veda C.Storey, Dr. Yair Wand, and Dr. Carson Woo. In particular, I am especially indebted tomy research supervisor, Dr. R. C. Goldstein, for his patience, support and countless hoursof valuable discussions. I am also grateful to Dr. V. Storey because she suggested thisgreat research topic and gave her opinions all the way. Many thanks go to Dr. Y. Wandfor his critical and stimulating comments. I appreciate Dr. C. Woo for his commentson the dissertation organization and Professor Al. Fowler for his practical suggestions.In addition, I would like to thank my friend, Dr. Hsueh-Ming Hang, and all staff of theinter-library loa.n division of UBC Library, for their helping collect the related literaturein the early stage of this research. Finally, I thank my parents for their love and supportall the time.xiiChapter 1Introduction1.1 Database DesignDatabase management systems (DBMSs) have been available for more than two decades.However, database design has also been recognized as a task with a high level of complexity ([Obretenov, et al., 1988]) and has often been referred to as an art rather than ascience ([Holsapple, et al., 1982]). Database design is the process of modelling the information requirements of a real-world application and mapping them onto an underlyingDBMS. Database design must go through the following phases: (1) information requirement elicitation, during which requirements for knowledge of a real-world application aredetermined; (2) conceptual database design, which produces a high-level representationof the requirements independent of the DBMS that will be used, e.g., the output mightbe expressed as an Entity-Relationship (E-R) model; (3) logical database design, whichproduces a logical schema that corresponds to the data model of the chosen DBMS, e.g.,the output might he expressed as a relational data model; and (4) physical databasedesign, which transforms the logical design into a form that is suitable for the givenhardware and DBMS, and considers efficiencies of storage and processing.Traditionally, database design has been carried out on an ad hoc basis ([Bouzeghoubet al., 1985], [Goldstein, 1985j). It has usually been performed by a human “database1Chapter 1. Introduction 2design expert”. The weaknesses of the traditional approach are that: (1) it is a difficulttask and expert database designers are scarce; and (2) the design of a database is done bysomeone who is unfamiliar with the application domain, instead of end-users ([Storey andGoldstein, 1991]). Recently, a number of computerized systems have been developed forassisting the database design process ([Ram, 1989], [Storey and Goldstein, 1990]). Someof those can be classified as knowledge-based or expert systems; others are automationtools. Those systems are designed for assisting conceptual and/or logical design processes.Some even provide help for physical database design. Automation of database designprocess can help overcome the problems of the above traditional approach by codifyingthe database design methodology in a computer program.However, some major problems remain. First, only a few of the above automatedsystems try to help a database designer incorporate semantic integrity constraints ina database schema ([Yang and Goldstein, 1989]). Many types of semantic integrityconstraints have never been identified by any system. Furthermore, as observed byTroyer [1989, p. 423], “constraints often considered as first class citizens in the conceptualmodelling seem to become pariahs during the transformation [from a conceptual schemato a relational schema]”. Second, those automated systems usually only consider databasestates and static properties. They seldom consider the behaviour of a database, that is,the state transitions and dynamic properties. Some systems (e.g., E2R [Kozaczynskiand Lilien, 1988] and EXIS [Yasdi and Ziarko, 1987]) model the behavioral semantics bytransaction modelling or event modelling. Dynamic semantic integrity constraints, whichplace restrictions on database transitions, have not really been treated as constraints ondata.Chapter 1. Introduction 31.2 Semantic Integrity ConstraintsSemantic integrity is concerned with the logical meaning (i.e., the intension) of storeddata and preserving the correctness of database contents even though users and application programs try to modify it incorrectly ([Fong and Kimbleton, 1980], [Fernandez, etal., 1981]). A database schema consists of structures and semantic integrity constraints(hereafter abbreviated as SICs) ([Tsichritzis and Lochovsky, 1982], [Frost, 1984]). Thestructure part tells us relatively little information other than the basic structure whatelementary items of data are grouped into what larger units; but the SIC part providesinformation about all allowable occurrences— current and future ([Morgenstern, et al.,1989]). These SICs express data integrity semantics, that is, the part of the meaning ofstored data needed to determine correctness. They are invariant restrictions on the staticstates of the stored data and the state transitions caused by the primitive operations:insertion, deletion, or update. They express what is and is not allowed in the part of theuniverse that is represented by the stored data’.Traditional database design techniques focus on structure and efficiency. The relational database model is the popular logical data model with a good theoretical foundation. Data dependency (e.g., functional dependencies, and multivalued dependencies) theory has been well-formalized in the literature (e.g., [Delohel, 1978], [Ullman, 1982]). However, data dependencies only capture part of semantic integrity. A relational database infourth or fifth normal form may still represent little of the data semantics, that is, themeaning of stored data. In addition, as observed by Kent [1979, p. 127], “The assurnption teiids to be that functional dependencies (if specified at all) have been used duringthe design phase of the database to insure that relations are in third normal form, and‘SICs also provide information that can be used to determine the most appropriate structure for theschema.Chapter 1. Introduction 4then discarded. They do not seem to be present at run time to explain the semanticstructure of the data.”Limitation of SICs Although a database with embedded SICs would be more correctthan the same one without SICs, the absolute correctness of the database is still notguaranteed. For instance, a user might incorrectly update the salary of an employee but,as long as it was within the range allowed by the SICs, it would still be accepted bythe DBMS2.Also note that enforcement of a SIC is based on the assumption that alldata already stored in the database are correct. If we know that some data were enteredincorrectly, the related SICs may need to be turned off in order to correct these errors3.In addition, SICs are passive restrictions on data. By incorporating violation actionsto alter other objects when a SIC is violated, one can make the database more active([Morgenstern, 1983]). However, a SIC can only trigger some action when it is violated.1.3 SICs, “Constraints” and TransactionsThe reader should be cautious that in the literature the word “constraints” is sometimesused to include all abstract relationships between objects in an information system, e.g.,the correctness of mapping to physical storage, and those preserving reliability, concurrentconsistency, and security. The SICs discussed in this dissertation are a proper subset of2Similarly, Thompson [1989, P. 95] points out that “a semantic database does not, and cannot,provide meaning, or strong-semantics, but it provides contexts within which it is possible for data to bemeaningful”. He names these contexts as weak-semantics. This dissertation does not adopt that term,but the reader should be cautious of the limitation.3For instance, if a SIC states “salary never decreases”, an update from $12,000 to $10,000 will berejected. However, it is possible that the $12,000 was entered incorrectly the first time. The SIC mustbe turned off in order to correct the input error.Chapter 1. Introduction 5these more general “constraints” (e.g., [Shepherd and Kerschberg, 1986]) “laws”, or “sub-laws” ([Paulson, 1989], [Wand and Weber, 1988; 1989; 1990]). The following are somekinds of “constraints” that are not dealt with in this research.Other Database Constraints In a database management system, there are at leastfour major aspects to the prevention of errors in a database environment: reliability,concurrent consistency, security, and semantic integrity ([Hammer and McLeod, 1975],[Eswarn and Chamberlin, 1975]). Reliability is concerned with errors due to the malfunctioning of system hardware or software. Concurrent consistency is the prevention ofinconsistencies that may arise due to concurrent processing (in which multiple processesconcurrently operate on shared data). Security deals with preventing users from accessing and manipulating the data in unauthorized ways. Although unauthorized updatesto the database are sometimes said to violate the “integrity” of the system, this type oferror is a security problem and is not considered in this research.Non-database Constraints Database design is part of information system development, which is the process of modelling a portion of the real world and transforming itinto an implemented artifact to deal with information processing functions in an organization. SICs are facts about the stored data ([Oren, 1985]) and are used to capture dataintegrity semantics ([Dampney, 1988]). However, data integrity semantics do not includeall information system semantics, that is. knowledge represented in the information system. Some constraints are inherent to application programs or transactions rather thandata ([Flemming and Halle, 1989, p. 140]).Chapter 1. Introduction 6Transaction Modelling Data-oriented system modelling has long been criticizedfor focusing only on static properties of an information system. There is ongoing researchon how to add dynamics to data-oriented modelling. For example, the ACM/PCM (Active and Passive Component Modelling) uses SHM+ (the Extended Semantic HierarchyModel) to include both structural and behavioral properties ([Brodie and Ridjanovic,1984].) The popular conceptual modelling tool, the E-R model, has also been extendedto model behavioral aspects of the real world by defining transactions or events (e.g.,applying the Petri Nets technique [Sakai, 1983a], [Solvberg and Kung, 1986]). A transaction, sometimes called an application-oriented operation (e.g., “hire”) ([Casanova andFurtado, 1984]), consists of one or more database querying and/or altering primitive operations that must be treated as an atomic unit, and reflects changes and “happenings”in the real world. In transaction modelling, transaction specifications often include preconditions and post-conditions. The pre-conditions specify what must be true before thetransaction can be applied. The post-conditions specify the actions to be taken and thetest-conditions that should be true after the transaction.Transaction-driven Constraints An information system is an artifact that relieson transactions to track changes in the real world. SICs place logical restrictions onstored data and are independent of any particular transaction. Transaction-driven constraints are inherent to transactions rather than data, and assure consistency betweenthe information system and the real world. A transaction-driven constraint requires thatwhen a change or “happening” occurs in the real world, a transaction must be performed4Similarly, there are some information system development methods, e.g., the Z approach ([Spivey,1988]) and VDM (Vienna Development Method) ([Jones, 1986]) to express formal specifications of staticand dynamic aspects of information systems by modelling transactions or events.Chapter 1. Introductionto modify the database faithfully. For example, pre-conditions of a transaction may include some procedural or manual checks (e.g., issuing a message to ask whether thereare signed documents or calls from customers). They are more prone to change as anorganization evolves. They might include rules on some objects (e.g., virtual fields) inthe information system that are not modelled as stored data in the database. They arelikely to require a great deal of human checking since they may involve non-data objects(e.g., the above mentioned signed documents).Note that because a transaction consists of primitive operations, it must also conform to all of the SICs on data. It might be inefficient to enforce SIC checking when atransaction is performed. It is possible to design an algorithm to transform SICs into thepre-conditions and post-conditions of transactions. For example, Lipeck [19861 describessome general rules for these transformations based on his temporal logic language for dynamic SICs. In fact. most of the usual pre- or post-conditions of transactions discussedin the database literature are transformed SICs rather than true transaction-driven constraints.To illustrate, consider the following example. A clerk receives a telephone call from acustomer to place an order. A transaction-driven constraint stipulates that a transaction“new-order” must be performed. It may stipulate that the transaction must be performedexactly according to the memo written by the clerk or the cassette recording of the call.In addition, it may stipulate that the transaction should be performed immediately bythe clerk or in batch during the night by a computer operator. If the transaction is tobe performed, a record of the order is to be “inserted” into the database. SICs wouldthen check the attributes of an order, the existence of the customer in the database,etc. However, since SICs are intensional expressions, they do not, in general, restrictwhen an order must be put into the database or to whom the order should be shipped,chapter 1. Introduction 8etc. In this example, in practice, it would also be impossible to use a SIC to increaseautomatically the customer’s account balance by the total price of the order unless allpast orders and payments are kept in the database to allow evaluation of an invariantformula among these objects5. Sending a bill to a customer may also involve a numberof transaction-driven constraints in addition to SICs.1.4 Enforce Semantic Integrity Constraints via Application SoftwareIn traditional data modelling, database design is concerned with the structure of the dataand most integrity checking is left to the application programs (procedures). Fernandez,et al. [1981, p. 109] identify the problems of relying on application programs for integritychecking as follows.• Checking is likely to be incomplete because the application programmer may notbe aware of the semantics of the complete database;• Each application program must trust other programs that modify the database —one rogue program could corrupt the whole database;• Code to enforce the same SICs occurs in a number of programs, wasting programming effort and risking inconsistencies;5One should note that statement-i “the new customer balance is equal to its old balance plus totalprice of the order” is not an invariant assertion of a SIC for updating Customer.Balance. That statementis specific for the transaction, New_Order, to update the Customer.Balance, and does not apply to othertransactions (e.g., a new payment by the customer or an update of the Order. TotaiPrice), which alsoaccess these same data objects (i.e. Customer.Balance or Order. TotaiPrice). The invariant formula inthis example would be that Cttstomer.Balance is equal to the sum of Order. TotaiPrice minus the sumof PaymenLAmount. However, this formula will normally be inefficient to check. For efficiency reasons,we may transform this formula into pre- or post-conditions on transactions. For example, statement-iwould be a post-condition of the New-Order transaction.Chapter 1. Introduction 9• The criteria for integrity are buried in procedures and are therefore hard to understand and control;• Primitive operations (update, insert, and delete) performed by users of high-levelquery languages cannot be controlled.Fleming and Halle [1989, p. 140] state that traditional modelling methods typicallyleave the definition of SICs to application development rather than to database development. This application-driven method raises three dangers as outlined below:• A user may define the SICs too narrowly, reflecting only the needs of the immediateapplication;• Multiple users interested in different applications may have different perspectiveson SICs. Thus they may define inconsistent or even conflicting SICs as part of theapplication specifications;• Maintaining correctness and consistency across SICs implemented via multiple applications may be extremely difficult (or impossible) to accomplish as applicationsevolve and new applications arrive.In summary, Fernandez et al., and Fleming and Halle identify the disadvantages ofenforcing integrity via the application software as: incomplete, inconsistent, redundant, incorrect, hard to understand and control, and difficult to maintain.Using the database embedded SIC approach will overcome these disadvantages.Chapter 1. Introduction 101.5 Embed Semantic Integrity Constraints in a DatabaseDatabase Approach Compared to traditional information processing before the development of database concepts, the database approach is claimed to have the followingadvantages [Goldstein, 1985, p. 8]:• Controlling data duplication and inconsistency;• Facilitating the sharing of data among applications;• Assisting the coherent management of data as a basic organization resource;• Increased programmer productivity;• Increased applications’ reliability;• Enabling quick, economical response to ad hoc requests for information;• Protecting data from damage or unauthorized access;• Providing data independence.It is not surprising to find that the disadvantages of enforcing integrity via the application software criticized by Fernandez et al., and Fleming and Halle are similar to thedisadvantages to distributing stored data to separate files instead of a database. The fulladvantages of the “true database approach” are still not achieved if only the corporatewide “data” are included in the database, but the corporate-wide SICs are distributedto separate application programs.Chapter 1. Introduction 11Behavioral Modelling As stated in the previous section, research has been ongoingto enhance data-oriented system modelling by incorporating some behavioral modellingmethods to model the state transition and dynamic properties of an information system. Some researchers claim that transaction modelling is part of database design (e.g.,[Brodie, 1986], [Brodie and Manola, 1989]). Some researchers even propose using transaction specifications to replace SIC specifications (e.g., [Abiteboul and Vianu, 1985],[Lipeck, 1986]). However, even if we agree that a “complete” database design includestransaction modelling, there would be some disadvantages of distributing SICs to separate transactions (e.g., redundancy and maintenance problems) that are similar to thecase of distributing SICs to separate application programs. Both dynamic and staticSICs are logical restrictions on data and should be embedded in a database.In summary, this research assumes that transaction modelling is still valuable, butSIC specifications are fundamental to behavioral modelling. The position of this researchis similar to the idea of having both specifications in the BASIS approach ([Leveson et al.,1983]), the idea of hierarchical levels of database specifications proposed by de Castilho,et al. [1982] and Casanova and Furtado [1984], and the hierarchical specification layersused in the CADDY design environment ([Hohenstein and Hülsmann, 1991]). That is, a“complete” database design will produce the two levels of specifications as follows.1. First-level specifications form the database schema that consists of structures andSICs. These are data-driven and tend to be more fundamental (stable) since theyare not affected by the addition or deletion of transactions;2. Second-level specifications consist of the specifications of transactions. The preconditions and post-conditions provide an effective way of implementing SICs andmay contain some transaction-driven constraints.Chapter 1. Introduction 12Knowledge-based Perspective Databases have been criticized for lacking abstractknowledge6 ([Wong and Mylopoulos, 1977], [Bubenko, 1980]), and inference capabilities([Wong and Mylopoulos, 1977], [Brodie, 1986]). Recently, the development of knowledge-based systems (KBS) provides useful insights for database research. Some researchers(e.g., Jarke and Vassiliou [1984], Missikoff and Wiederhold [1986], and Kennedy and Yeh[1990]) have recognized that a KBS can provide a DBMS with better semantic modelling,reasoning ability, improved user interface, etc. A special type of system, expert databasesystem, has been proposed to integrate a knowledge-based system (expert system) with adatabase system ([Missikoff and Wiederhold, 1986]). The approach to embedding SICs inthe database fits this new trend. SICs are abstract knowledge that can be used to providedeductive capabilities to a database, that is, make a DBMS appear more “intelligent”.An obvious example is the case where SICs are used to reduce knowledge-based queryevaluation costs ([Brodie, 1986]). For example, some queries can be answered using oniythe semantics expressed through SICs ([Chakravarthy, et al., 1987]).1.6 The Research Questions, Objectives, Methodology and ScopeRecently, a number of semantic data models (e.g., [Hull and King, 1987], [Peckham andMaryanski, 1988]) have been proposed to overcome the weaknesses of traditional datamodels (i.e., the hierarchical, network, and relational models) in modelling the semanticsof the real world. However, semantic integrity constraints have not received the attentionthey deserve. The relational model is still the most popular logical data model. Almostall research on semantic integrity constraints in the literature is confined to exploring6Bubenko [1980] defines two kinds of knowledge: (a) concrete knowledge is seen as facts, statementsconcerning individual phenomena, entities or relationships in the model of the environment; (b) abstractknowledge denotes such information which augments our interpretation of concrete information and bywhich we can draw inferences, conclusions of other facts.Chapter 1. Introduction 13the efficient enforcement (checking) methods for a few kinds of SICs in a relationaldatabase or deductive database. No suitable language exists to represent the featuresof SICs precisely: certain or uncertain; for what data object; operation-independent oroperation-dependent; conditional or unconditional; strong, soft or self-correcting; andstatic or dynamic7.Research Questions Based upon the above observation, this research addresses thefollowing questions:Is it possible that the features of SICs can be precisely represented by somelanguage when using an E-R model for conceptual modelling and a relationalmodel for logical modelling?Can we provide a model to incorporate the necessary SICs in a databaseschema during conceptual modelling?Can we help a database designer capture these SICs by providing automatedtools?Research Objectives To answer the above research questions, the following specificobjectives were established.1. To develop a precise SIC Representation model for specifying the componentsof a SIC. Since current languages are not sufficient to represent the features of a7These features are described in detail in Chapter 2. An operation-dependent SIC is the one thatmust hold for some object upon some operations, but not upon other operations. Violating a strong SICwould cause errors — reject the operation. Violating a soft SIC would only get a warning message.Chapter 1. Introduction 14SIC when the structure part of a schema is in the E-R model or in the relationalmodel, this research will first present a SIC Representation model that is extendedand modified from the model proposed by Fernandez et al. [1981], Date [1983], andBertino and Apuzzo [1984].2. To develop a comprehensive modelling tool, called the E-R-SIC model, for incorporating the necessary SICs during conceptual modelling. Since the traditionalE-R model contains only a few inherent SICs, this research proposes a comprehensive model, which is an extended E-R model, for incorporating SICs in a databaseschema.3. To propose conceptually the SIC elicitation subsystem of an automateddatabase design system for helping a database designer capture the necessarySICs. The SIC Representation model and the E-R-SIC model are complementary.The captured SICs are represented in terms of the Representation model. The SICRepresentation model is a language or representation tool to represent SICs. Itsfunction is similar to the E-R diagram that we use to represent the structural partwhen using an E-R model. Both the models are application domain-independentand are suitable for implementation as part of an automated database design system. Some problems, for example, the procedure to query the database designerto identify SICs, verification of the captured SICs for consistency and nonredundancy, reformulation and decomposition of a general SIC into operation-dependentsub-SICs, and transformation of the SICs referencing entities and relationshipsinto corresponding ones referencing relations, need to be solved in order to have aworkable automated database design system. A SIC originally obtained from thedatabase designer may be general, i.e., it may be relevant to several objects onvarious operations. Decomposing such a SIC is to rewrite it into several sub-SIGsChapter 1. Introduction 15represented in terms of the SIC Representation model. Each of them is only relevant for one object on one access operation — that is, it is operation-dependent.The purpose of decomposition is not only to let the database designer know clearlywhat precise implications of a general SIC would be, but also to reformulate theoriginal SIC into several formats that can be efficiently enforced later.Research Methodology Methodologically, this research has two components. Thefirst is model building to develop two models. The second is algorithm designing todevelop algorithms to verify, decompose, and transform SICs.Research Scope Figure 1.1 illustrates the scope of this research. This research isprimarily concerned with semantic integrity constraints in the conceptual design phaseand how they are translated into SICs in the logical design phase. In particular, itfocuses on the Entity-Relationship model and its transformation into a relational modelbecause of the popularity and wide-sprea.d adoption of these as conceptual and logicaldata modelling tools, respectively.A complete automated database design system would include a structure subsystemfor constructing the structural entities, relationships, and relations, and a SIC elicitationsubsystem for eliciting the SICs. The construction of the structure part of the schemais not the focus of this research although the SIC elicitation subsystem would need theconstructed E-R diagram as input.The proposed approach in this research is different from Codd’s ([Codd, 1979]) although the purpose of capturing more meaning may be the same. Codd proposed a newdata model RM/T to extend (indeed replace) the relational model. This research retainsSIC specificationsfor E-R model-_____________________________________-.Transaction :Design ITool : Transactions rograms ipre condition an dithte ity Maintenance Subsysten [lnterity Maintenance Subsystenpost-condition I Ispecifications L E H Data Structure Relational Data Structureof transactions B-H DBMS Relational DBMSfor E-R modepre-condition and post-conditionspecifications of transactionsI for relational modelNote that some general SICs needs to be first decomposed into precondition and predicate components before verifying them.Legend: dash lines and boxes are beyond the scope of this research.Chapter 1. Introduction 16An Automated Database Design SystemProposed SIC Elicitation SubsystemInterfacebetweentwoSubsystemsDB specificationin the extendedE-R modelStructure SubsystemStructureElicitationInterfaceE-RReqviremen,,,,,/DB DesignerHeuristicsConsistency & NonredundancyRules for DifferentSIC TypesE-R-STC Modeldoonain- independentknowledaesemantic integrityconstraintstransaction-drivenconstraintsSICspecificationsfor relationalmodelFigure 1.1: A Proposed Automated Database Design Subsystem for Eliciting SICsChapter 1. Introduction 17the traditional relational model because of its popularity, but proposes to include thenecessary SICs in addition to relation structures in a relational schema. The output ofthe proposed SIC elicitation subsystem would be SIC specifications, which are suitablefor either the E-R model or the relational model. There should be an integrity maintenance subsystem in a traditional E-R DBMS8 or relational DBMS. Some functionalrequirements (e.g., regarding SIC enforcement schedules and SIC inheritance, etc.) ofthis integrity maintenance subsystem are discussed in Chapter 4. However, in general,how this integrity maintenance subsystem would work is a future research topic beyondthe scope of this dissertation. It is also assumed that the DBMS would support inheritance mechanisms.This research does not address transaction modelling. The database designer maylater input transaction-driven constraints and SIC specifications to a transaction designtool to produce the pre-conditions and post-conditions of transactions, which would beperformed in an E-R DBMS or relational DBMS.This research neither empirically tests the “usefulness” of using database embeddedSICs versus the traditional enforcing integrity via the application software nor tests the“usefulness” of using the proposed automated database design system. The empiricalresearch is a future research topic.1.7 ContributionsThe contributions of this research are both theoretical and practical.The theoretical contributions include the following:8] is assumed that there are some (probably experimental) ]JBMSs to define and manipulate dataobjects directly in the E-R model.Chapter 1. Introduction 181. This research provides a model to represent precisely the features of a SIC.2. This research also develops a model to incorporate the necessary SICs in a databaseschema. The approach is to model dynamic as well as static SICs in the databaserather than in transactions or programs. The gap between traditional databasemodelling and application programming (transaction modelling) will be bridged bythe two models presented in this research.3. The work on reformulating and decomposing a general SIC into sub-SICs, andtransforming them from an E-R schema into a relational schema may be interestingto the computer science discipline because the current literature has explored theseproblems for only a few kinds of SICs.On the practical side, the contributions of this research include the following:1. The proposed automated database design system would help a database designernot only design the database structure but also include the necessary SICs.2. This research provides a foundation for overcoming the well-known problem ofrepresenting data integrity semantics in current relational database systems. Theresulting database would have the advantages of embedded SICs, e.g., greater consistency, deductive capabilities, etc. The SIC representation would facilitate theefficient enforcement.3. A database designer would find that modelling data integrity semantics becomeshis (her) responsibility and right. Important business rules and even heuristics canenter the database schema in an early database design phase— conceptual modelling. Applica.tion programmers could focus on information system semantics otherChapter 1. Introduction 19than data semantics, and need not worry about SIC checking in their individualprograms.4. This research provides a starting point for future empirical research to test theusefulness of using database-embedded SICs versus the traditional approach ofenforcing integrity via application software.1.8 The Dissertation OutlineThis chapter has given the definition of SICs, described the motivations, methodology,scope and contributions of this research. Chapter 2 briefly reviews work on SICs in theliterature. Chapter 3 introduces the SIC Representation model for representing SICsuniformly and precisely. Chapter 4 describes how the SIC Representation model can beapplied. Chapter 5 proposes the E-R-SIC model for incorporating SICs in a databaseschema. Chapter 6 gives an example of the use of the E-R-SIC model and discusses relatedissues and usefulness. Chapter 7 conceptually proposes a SIC elicitation subsystem tohelp the database designer use the E-R-SIC model and the SIC Representation model.Finally, Chapter 8 offers conclusions and describes how future researchers can extendthis dissertation. A series of appendices is attached to provide related materials in detailand give some examples.Chapter 2Review of Previous WorkAs briefly mentioned in Chapter 1, an automated database design system would firstelicit general SICs from a database designer, then verify and reformulate (decomposeif necessary) them in some representation language for the database structural schemarepresented in the E-R model, and finally transform them into corresponding ones in therelational model. This chapter reviews previous work on related aspects of producingSICs.The literature on SICs is rich. Most research concentrates on classifying and efficientlyenforcing SICs rather than capturing and incorporating them in a database schema. Inaddition, most research discusses SICs in the context of relational or deductive databasesrather than the databases in the conceptual level of E-R schema. Therefore, transformation from the SIC representations in the E-R model into corresponding ones in therelational model has not been explicitly discussed in the literature although some existingautomated database design systems may perform it for a few types of SICs that theyidentify. Section 2.1 first reviews the different ways to classify SICs because the classification can help understand the important SIC features so that we can incorporate,represent and enforce them properly. Section 2.2 reviews previous research on languagesfor representing SICs. Section 2.3 reviews how verification of SICs has been dealt within the literature. Section 2.4 reviews previous research on reformulating SICs. Finally,20Chapter 2. Review of Previous Work 21Section 2.5 reviews how SICs are handled by existing automated database design systems.2.1 SIC ClassificationIn the literature, there are a number of different ways to classify SICs.2.1.1 Classification Based on ExplicitnessOne way to classify SICs is based on their explicitness from the perspective of the datamodel in use. SICs can be inherent, explicit or implicit ([Tsichritzis and Lochovsky,1982], [Brodie, 1983], [Brodie, 1984], [Davis and Bollriell, 1989]). Inherent constraints areintegral parts of the structure of a data model (e.g., record relationships in the hierarchicaldata model are structured as trees; tuples in the relational data model are not duplicatedand the ordering of tuples is not important). Explicit constraints are defined explicitly bysome specification mechanism; for example, a database designer might explicitly specifythat the salary of any employee must be less than $1,000,000. Finally, implicit constraintsare logical consequences derived from inherent or explicit constraints; for example, thetransitive closure of functional dependencies can be deduced from a subset of functionaldependencies.Discussion A data model with rich inherent SICs would relieve the database designerfrom specifying many SICs explicitly. From this perspective, neither the E-R modelnor the relational model is a good modelling tool for designing a database since bothmodels, especially the relational model, contain very few inherent SICs. However, theyare widely used for other reasons. It is one of the motivations of this research to helpChapter 2. Review of Previous Work 22the database designer identify and precisely specify explicit SICs. A database designerand an automated database design system need to know the inherent SICs of the datamodel (e.g., the E-R model) that they use. Otherwise, it is likely that these inherentSICs would be lost when the structural schema of the conceptual model is transformedinto the logical model (e.g., the relational model). Both inherent and explicit SICs shouldbe represented properly at the conceptual design phase and transformed into a logicalschema. Given a set of inherent and explicit SICs, the database designer should alsobe aware of the derived consequences, i.e., implicit SICs, to verify its consistency andnon-redundancy.2.1.2 Classification Based on Applied ObjectsClassification of SICs based on applied objects is the most common classification andis concerned with the data objects to which a SIC applies ([Eswarn and Chamberlin,1975], [Fong and Kimbleton, 1980], [Fernandez et al., 1981], [Tsichritzis and Lochovsky,1982], [Date, 1983], [Weber, et al., 1983], [Simon and Valduriez, 1984]). There are threecategories of these SICs:1. Strong data type constraints: Strong data type constraints (also called domainconstraints) are applied to a single data item, for example, a field or an attribute.They include the following:(a) Value constraints specify the range of acceptable values of a data item (e.g.,within some numerical bounds or an enumerated set) and whether a null valueis allowed.Chapter 2. Review of Previous Work 23(b) Norivolatility constraints declare whether a data item value can be changed([Kozaczynski and Lilien, 1988]).(c) Extended format constraints permit specifications of data type, length, andformat (mask) pattern;(d) Legal operation constraints limit the operations that can be performed on agiven domain ([Eswarn and Chamberlin, 1975], [Fong and Kimbleton, 1980]);for example, two dates cannot be multiplied.2. Record (tuple) constraints: Record constraints apply to an individual record (anoccurrence in terms of the E-R model). For example, in each payroll record,Gross_Salary should be greater than Deductions. One interesting kind of recordconstraint is the case where the value of one field in a record is conditional onthe values of other fields ([Benci et al., 1976]). For example, Salary is equal toBase_Salary plus Bonus. In this example, regardless of whether the value of Salaryis entered by a user or computed by the system itself, as long as this value is explicitly stored, the integrity constraint must hold. If the conditional attribute isnot explicitly stored, it is a virtual invariant, derived item because one data itemis defined as a function of another ([Etzion, 1989]).3. Set constraints: Set constraints apply to a set of records (occurrences). They may hebased on built-in aggregate functions (e.g., average, minimum, count). Therefore,they are sometimes called aggregate constraints ([Tsichritzis and Lochovsky, 1982])or set function constraints ([Dogac et al., 1985]). They may be based on comparison(e.g., exclusion or inclusion) of one set to another. For example, the set of managersmust be contained in the set of employees. These records (occurrences) may belongto the same relation (entity) type or different relation (entity) types. From thisChapter 2. Review of Previous Work 24observation, some researchers (e.g., [Fong and Kimbleton, 1980]) further classifythem into relation constraints and multi-relation constraints.Discussion This categorization reminds us that all objects in a database— attributes,entity and relationship occurrences and types in the E-R model, columns, relation tuplesand types in the relational model may have related SICs that need to be identifiedand represented. However, there are two controversial points.• Should we capture legal operation constraints? Note that this kind of “constraint”borrows the notion of abstract data types ([Goldstein, 1985]), and the “operations”are basic arithmetic or string operations that may be performed on value domains,not the database primitive manipulation operations. This research will not considerthese “constraints” because they do not conform to the SIC definition. This kindof “constraint” is relevant only if the data object is manipulated by some restricted“operations” (e.g., arithmetic or date operations). It is not a restriction on staticdatabase states or state transitions.• Should we capture extended format constraints? One may argue that the specification of data type, length, and format of an attribute is a syntactic rather thana semantic issue. However, this research includes extended format constraints aspart of SICs for the following two reasons. (1) The semantics of an attribute arebased on the syntax we agree to use. For example, the meaning of a salary rangefrom $1,000 to $50,000 in integers is different from that of the same range of realnumbers. (2) SICs in this research are attached to the stored data rather than realworld objects. A database designer would specify a SIC in terms of the interests ofthe application and include the data in the format that is meaningful to it.Chapter 2. Review of Previous Work 252.1.3 Classification Based on Operation TypeSICs that are concerned with database operations can be classified as operation-dependentor operation-independent constraints ([Eswarn and Chamberlin, 1975], [Fernandez et al.,1981], [Weber, et al., 1983]).• operation-independent: A constraint is operation-independent if it must hold forsome object on all access operations, although for efficiency reason it may beenforced only on some operations.• operation-dependent: A constraint is operation-dependent if it must hold for someobject on some access operations, hut not on other access operations.Discussion We only need to consider three kinds of operation types: insertion, deletionand update because a retrieval (query) operation does not change any data. An updateoperation is on an attribute of an entity, relationship occurrence, or relation tuple, butnot on the whole entity, relationship occurrence, or relation tuple. It is also the onlyapplicable operation on an attribute. Therefore, if a SIC is relevant to an attribute, it isoperation-dependent. For an entity, relationship occurrence, or relation tuple, there aretwo possible operations insertion and deletion. It appears that in the real world if a SICapplies to an entity, relationship occurrence, or relation tuple, it is likely to be operationindependent. Operation-dependent constraints (e.g., “a project can be deleted only if itsbudget is equal to zero”) seem to be relatively less common. This may explain whyprevious research pays only little attention to them. However, the possible relevance tospecific operations creates a special requirement on the SIC representation language.Chapter 2. Review of Previous Work 262.1.4 Classification Based on PreconditionAccording to Fernandez et al. [1981], SICs based on the precondition for enforcement canbe classified as either conditional or unconditional. Conditional constraints are enforcedonly when certain preconditions are met. For example, the salary of an employee whohas worked for less than three years must not be over $30,000. Unconditional constraintsare always enforced.Discussion Whether a SIC is conditional or unconditional is also relative to the object concerned. It seems that the research of Fernandez et al. [1981] is the only workthat classifies SICs in this way. However, this categorization perspective places anotherrequirement on the SIC representation language— the context under which the SIC isapplicable should precisely be represented.2.1.5 Classification Based on Certainties of SICsSICs can be classified as certain or uncertain ([Oren, 1985], [Morgenstern et al., 1989]).A certain constraint specifies some fact about the data semantics that is assumed to beabsolutely true (e.g., “the height of a person is greater than zero” [Oren, 1985]). Anuncertain constraint is one that is generally true, but there is a slight probability that itcan be violated (e.g., “the weight of a person is less than 200 kgs”). Certain and uncertainconstraints lie on a continuum. Wiederhold [Morgenstern et al., 1989] identifies four levelswith respect to the degree of certainty and absoluteness of constraints:1. absolute truths that arise in the physical world and in the database; for example,an employee has a birth date.Chapter 2. Review of Previous Work 272. rigid situational rules that do not change often; for example, each employee isassigned to one department.3. business rules that may change often; for example, a manager’s salary is greaterthan the salary of his or her subordinates.4. heuristics; for example, employees are usually assigned to projects that match theirspecialities.A more sophisticated proposal such as defining and maintaining “measure(s) of accuracy”may be possible ([Eswarn and Chamberlin, 1975]). Violation of a certain constraint maybe rejected as an error, whereas violation of an uncertain constraint may only cause adiagnostic message. Therefore, uncertain constraints are sometimes called soft constraints([Eswarn and Chamberlin, 1975]).Discussion This classification is another perspective that has not drawn much attention. Since it is likely that an organization might have a number of uncertain but usefulconstraints, there should be some way to represent them.Note that the four levels of certainty mentioned by Wiederhold ([Morgenstern et al.1989]) are in fact classified by two kinds of “uncertainty” exception and permanence. “Heuristics” may have exceptions. “Absolute truths”, “rigid situational rules”,and “business rules” have no exceptions. If there is any exception, the “rules” shouldbe modified to accommodate the exception and the “modified rules” have no exceptions.These levels differ on their permanence the “uncertainty” of how often they may bechanged.Should SIC specifications include both “uncertainty” of exception and “uncertainty”Chapter 2. Review of Previous Work 28of permanence? The “permanence” information may he useful for SIC managementduring database usage. However, the organizational environment is turbulent and constantly changing. The permanence of a rule is not easily decided in advance when adatabase designer designs a database. Therefore, this research does not include this kindof information.2.1.6 Classification Based on Violation ActionSICs based on the violation action, i.e., what would happen if the SIC is to be violated,can be classified as strong, soft, or self-correcting ([Weber et al., 1983]). If a strongconstraint is violated, the operation is rejected and the user receives an error message.If a soft constraint is violated, the user only receives a warning. If a self-correctingconstraint is violated, its error correcting action is executed.Discussion In the published literature, [Weber et al., 1983] is the only research thatclassifies SICs according to their violation actions although there is other research thatdiscusses alternative violation actions. Violation actions are classified into the three categories given below, which are synthesized from previous research ([Hammer and McLeod,1975; 1976], [Hammer and McLeod, 1976], [Casanova and Tucherman, 1988] and [Flemingand Halle, 1989]).1. Reject— reject the requested database operation, signalling an error.2. Warning — allow the requested database operation, but issue a warning.3. Corrective Action — perform a corrective action; that is, an auxiliary procedureknown as a triggered action ([Fernandez et al., 1981]). Usually, it causes the DBMSChapter 2. Review of Previous Work 29to insert, delete, or update other objects.For referential integrity constraints, e.g., E.A C F.B, where E.A is a foreign key ofa relation E and F.B is the primary key of another relation F, the corrective actioncan be further classified as:(a) Propagate for example, insert a referenced tuple or delete the referencingtuples.(b) Nullify for example, set the referencing attributes in the referencing tupleto null.(c) Default for example, set the referencing attributes in the referencing tupleto predefined default values.(d) Others — for example, triggers or other procedures that are domain specific.Note that “nullify” and “default” are special cases of “propagate”.Traditionally, a SIC is strong by default. However, this classification is important toremind a database designer that it is not necessary that if a SIC is violated, the operationis just rejected. There are other options that can be specified at database design time.2.1.7 Classification Based on Enforcement ScheduleSICs based on the enforcement time, i.e., when the SIC is enforced, can be classifiedas immediate or deferred (e.g., [Eswarn and Chamberlin, 1975], [Fong and Kimbleton,1980], [Fernandez et al., 1981], [Date, 1983], [Weber, et al., 1983]). Immediate constraintsare enforced immediately after each database operation. Deferred constraints are notenforced until the end of a transaction. It may also be possible to permit the user toChapter 2. Review of Previous Work 30switch integrity checking ON or OFF that is, to have user-invocable constraints ([Fougand Kimbleton, 1980], [Bertino and Apuzzo, 1984], [Weber. et al., 1983]).Discussion A number of researchers discuss this categorization. However, differentspecification levels should not be mixed up. This categorization is useful when consideringthe enforcement of SICs, or designing transaction specifications. Fernandez et al. [1981]allow the precondition component of their original model to specify whether the SICis to be applied immediately or deferred to the end of a transaction or to a periodicaudit. However, Bertino and Apuzzo [1984] treat the enforcement schedule separately asdecided by an integrity maintenance subsystem because they believe that the enforcementschedule of a SIC depends upon the transactions or programs that are executing. Theirexample may be helpful to understand their arguments:A SIC: “each employee working on project P125 must earn less than $3,000.”Suppose that now the database contains an employee entity occurrence whois in department D55, and works on project P125.Consider a transaction: first update the salary of all employees in D55 to$3, 500; then re-assign the project of all employees in D55 to P200.This transaction should be correct.Based upon the above observation, Bertino and Apuzzo [1984] propose a criterion:Basically, an integrity constraint is enforced at the end of the transaction, ifattributes present in the constraint are modified by more than one update statementChapter 2. Review of Previous Work 31in the transaction. Otherwise, the constraint is enforced after each tupleupdate, if it is in class Cl [i.e., tuple constraints], or after each update-request, if it is in class C2 [i.e., relation constraints] or C3 [i.e., multi-relationconstraints].Furthermore, the enforcement schedule can be more sophisticated and more efficient.Lafue [1982] proposes that constraint checking can be delayed until the dependent instances become of interest9. That strategy is called by Morgenstern [1986] “propagationwhen used.” Between immediate propagation and “propagation when used” is an intermediate strategy that has been referred to by Morgenstern [1986] as opportunisticpropagation, both in the sense of doing the work of constraint propagation when thecomputer is idle, and in the sense of using priority ranking of the constraints to selectthe order in which they should be considered for propagation.Thus, because the enforcement schedule is closely related to transaction modellingand enforcement implementation efficiency strategies, it should not be included in SICspecifications. Neither should be the option to switch integrity checking ON or OFF.2.1.8 Classification Based on DynamicsSICs based on their dynamics can be classified as static or dynamic ( [Eswarn and Chamberlin, 1975], [Bracchi et al., 1979], [Fong and Kimbleton, 1980], [Fernandez et al., 1981],[Date, 1983], [Bertino and Apuzzo, 1984], [Brady and Dampney, 1984], [Heuser andRichter, 1986]). Static constraints specify correct database states. Transitional (dynamic)constraints characterize valid state transitions, i.e., are concerned with ‘admissibi1ity” of°A dependent variable is a variable that can be operated on by the constraint, i.e., by a violationaction.Chapter 2. Review of Previous Work 32a database state sequence.There are two major kinds of transitional constraints mentioned in the literature([Fong and Kimbleton, 1980]): (1) old/new transitional constraints that restrict an updateof an attribute during which its “old” value is to be changed to a “new” value (e.g.,“new salary must be greater than old salary”); (2) nonexistence/existence transitionalconstraints that restrict either a nonexistence to existence transition or an existence tononexistence transition (e.g., “only if the account balance is zero, can the account bedeleted”).Some dynamic constraints (e.g., [de Castilho et al., 1982], [Casanova and Furtado,1984], [Ehrich et al., 1984], [Kung, 1984] and [Lipeck, 1986]), which are often neglectedby researchers, illclude:• constraints on a sequence of operations: some operations must happen in aspecific sequence or at the same time. For example, “ownership of a car must bepassed from a manufacturer to a dealer first, before it may be passed to a purchaser”.• constraints involving time explicitly:1. SICs having some explicit time restriction or time-triggering condition. Forexample, we may have “an employee cannot receive a salary raise during his(her) first 6 months in the company”, or “at 0:00 on 1/1/1993, increase thesalary of each employee by $1,000.,’2. Time-triggered or restricted SICs depending on past generations of data and,thus reference historical data at some specific time point. For example, we mayhave “the price of any product at any time cannot be more than 5% higher thanits price one year ago.”Chapter 2. Review of Previous Work 33Discussion Both static and dynamic SICs are important to preserve the logical meaning of stored data. A database designer should capture the necessary dynamic SICsand have some language to represent them. Constraints on a sequence of operations arenot easily captured and represented as SICs. This may explain why researchers ofteneither neglect them or model them by transactions or events. However, they are theconsequences of enforcing some SICs if specified properly.Note that an operation-dependent SIC is also a dynamic SIC. Old/new transitionalconstraints and nonexistence/existence transitional constraints are important. However,one should not define them too narrowly. That is, old/new transitional constraints arespecial cases of the update transitional constraints, which do not necessarily involvethe old/new comparison. Nonexistence/existence transitional constraints are also specialcases of the deletion/insertion transitional constraints, which may involve more than oneobject.In order to capture and represent SICs having explicit time restriction, in a real timeenvironment, we would need a special system variable Current_time to register thecurrent clock time’0, and explicit time-valued attributes in the entity or relationship. Ina non-real time environment, those become ordinary data-driven constraints involvingsome time-valued attributes. If we wish to capture and represent SICs using historicaldata in general, we would need time-stamped generations of data. This research doesriot explore the last kind of SIC since a database keeping time-stamped generations ofdata is both unusual and expensive to implement.10t is assumed that a DBMS has access to a clock that registers both current date and time. Current..time can be thought of as an attribute of a special system entity type. Because constraints maybe affected by the normal advance of time, we assume that the operating system can be instructed tosignal the DBMS integrity maintenance subsystem when a pre-specified time is reached.Chapter 2. Review of Previous Work 342.1.9 Summary of Comments on SIC ClassificationIn the literature, researchers describe different SIC categorization schemes for diversepurposes — discussing data models, incorporating, representing, and enforcing SICs.For incorporating and representing SICs adequately, the most important categorization schemes are: (1) certain or uncertain, (2) the classification on applied objects,(3) operation-independent or operation-dependent, (4) unconditional or conditional, (5)strong, soft or self-correcting, and (6) static or dynamic . These categorizations are toorudimentary to serve as a modelling tool for incorporating SICs. However, they are veryimportant for precisely representing SICs since they provide a feature listing of a SIC.Without any explicit description, one could only assume the given SIC to be certain,related to all objects mentioned in that SIC, operation-independent, unconditional, andstrong.2.2 SIC RepresentationSQL is a generally accepted relational database language. However, only very few SICsare mentioned in the ISO SQL standard. The implied violation action of a SIC is to setSQLCODE negative. That is, it causes an error. Other SIC features, such as whetherthe SIC is operation-independent or operation-dependent, unconditional or conditional,certain or uncertain, are not specified.The SQL standard has two levels and one addendum [van der Lans, 1989]. Thespecified SICs are as below:1. SQL Level 1 Standard: It only specifies data types and length of data items.NOT NULL must be specified in every column definition in a CREATE TABLEChapter 2. Review of Previous Work 35statement.2. SQL Level 2 Standard: In addition to data types, it allows the designer tospecify whether a data item is UNIQUE and whether it can be NULL.3. SQL with Addendum:• In addition to the above, it allows specification of a value range by using theCHECK specification for a data item;• By using the CHECK specification separately (e.g., CHECK (YEAR-OF-BIRTH < YEAR-JOINED)), tuple constraints can be specified;• Few set constraints have been included:— Referential constraint is provided by using FOREIGN KEY (column list)REFERENCES (table name);— UNIQUE or PRIMARY KEY specificationA number of researchers propose alternative languages to represent more SICs. Someare variants of first order logic (FOL) languages, for example, the many-sorted first-orderpredicate calculus applied by Furtado et al. [1981]; the constraint equation proposedby Morgenstern [1983; 1984a; 1984b; 1986]; the equation statements suggested by Cosmadakis and Kanellakis [1985]; the first order formulas used by Reiter [1984; 1988],Henschen et al. [1984], and Urban and Delcambre [1989]. These FOL family languagesare purely declarative. The operations to be checked are treated as an implementationproblem. No violation action is specified. The constra.ints are assumed to be certain.Furthermore, these FOL languages cannot represent some kinds of SICs, e.g., operationdependent or dynamic SICs. Other researchers (e.g., [de Castilho et al., 1982], [Casanovaand Furtado, 1984], [Ehrich et al., 1984], [Kung, 1984] and [Lipeck, 1986]) propose anChapter 2. Review of Previous Work 36extension of FOL — temporal logic that may include explicit “state” or “time” parameters. They invent some special temporal quantification or a list of modalities (e.g.,always, until, heretofore, sometime) to model dynamic SICs. One potential problem ofthese temporal logic languages is that they may not be easily understood and implemented.Others just extend the original SQL proposals, but do not follow the SQL standard.For example, Hammer and McLeod [1975] state that the syntax of their language is“rather similar” to SEQUEL. Bertino and Apuzzo [1984] propose a 5-component modelto specify SICs and state that their language is a “simple extension” to SQL. Date[1983] applies a language of his own — very loosely based on the PL/I version of UDL(Unified Database Language) and describes [1987] some proposed extensions of the baseSQL standard. These SQL extensions are more precise and powerful for representingSIC features than the SQL standard. Among these, the language model proposed byBertino and Apuzzo [1984] and the similar one by [Fernandez et al., 1981]” may becapable of representing the SIC features mentioned above except for certainties. However,neither work carefully elaborates what would be in each component12.In addition, SICsrepresented in their original model may not be efficiently enforced if we compare therepresentation to Date’s UDL [Date, 1983] that uses the idea of a cursor to facilitate SICenforcement on occurrences.The certainty feature of SICs has not been properly represented. The term fuzzyintegrity constraints has appeared in the fuzzy database literature (e.g., [Zvieli and Chen,“However, [Fernandez, et al., 1981] does not describe what is really their language — first order logicor SQL.‘2For example, The model proposed by Fernandez, et al., [1981] allows to specify the enforcementschedule as a part of precondition of a SIC. The model proposed by Bertino and Apuzzo [1984] allowsthe options to switch a SIC ON/OFF. However, as discussed in the previous section, these should notbe in a SIC specification.Chapter 2. Review of Previous Work 371986], [Raju and Majumdar, 1988]). However, its meaning may have not been fullyexplored. Raju and Majumdar [1988] classify fuzzy relations in relational databasesinto two categories. A type-i fuzzy relation captures the impreciseness in the association among entities, e.g., the certainty of John liking the course AT is 80%. Atype-2 fuzzy relation produces further fuzzy semantics by allowing the domain of anattribute to be a set of fuzzy sets, e.g., “allowing salary of John to be in the range$40, 000 to $50, 000 and that of Mary to be a fuzzy set, However, their fuzzy integrity constraints only attach some fuzzy modifiers, e.g., “very”, “more or less” to anassertion by choosing some “fuzzy resemblance relation”, for example, “for any job, employees having approximately equal experience should have approximately equal salaril’ or“any items having approximately equal order-date should have more or less equal deliverydate”. This kind of SIC does not capture the “impreciseness” of a SIC itself. For example, “any items having approximately equal order-date should have more or less equaldelivery date” is a SIC that is true with certainty 75%. RESTRICT is another languageto consider the certainties of SICs ([Oren, 1985]). However, it simply uses a special operator “?“ to represent an uncertain assertion, e.g., “SALARY(PERSON) 10000ELSE rejectInterpretation: The first line is the SIC name. It indicates that this is a SICfor Employee.Salary on update and it is a “RshipDepEntVal” type because itasserts that the existence of a relationship (Work_for) depends on the value ofan entity attribute (Employee.Salary). The rship_occ_part is an assertion thatstands for “relationship occurrence participant”. In this example, it is usedto specify the Work_for occurrence in which the currently checked Employeeoccurrence participates with the entity type, “Employee”. This SIC statesthat with 100% certainty, when an Employee.Salary occurrence is to be updated, if the Employee participates in at least one Work_for, his or her Salarymust be greater than $10,000. Otherwise, the update operation is rejected.Note that the database designer might choose “propagate(delete(Work_for))”as the violation action. In that case, the propagation action might imply thatif the organization could not afford the minimum salary for an employee, itChapter 3. A Model for Representing Semantic Integrity Constraints 46could not require that the employee be associated with any project (so thecurrent Work_for occurrence must also be deleted).The restriction intention of a SIC represented by this model is expressed as the predicate component (P) separately. Its other components precisely specify the SIC featureslisted in Section 2.1.9. These are:1. certain or uncertain: indicated by the component F — certainty factor;2. applied data: shown in the component 0 — object;3. operation-dependent: a SIC represented in this model is always operation-dependent,the operation is specified in the component T — operation Type.4. unconditional or conditional: the conditions are listed in the component Cprecondition;5. strong, soft or self-correcting: indicated by the component A violation action.A SIC represented in this model is always dynamic in that it indicates a valid databasestate transition in which the object would be manipulated by the specified operation type.A static constraint is represented by rewriting it into one or more dynamic constraints forthe related object(s) on the operation(s) that could cause unallowed database state(s).The reasons for doing this are as follows:• Because some constraints are inherently dynamic, this allows uniform representations for all kinds of constraints.• Enforcement efficiency can be greatly increased by knowing which operations cancause constraint violations.Chapter 3. A Model for Representing Semantic Integrity Constraints 47• Violation actions may be different depending on the operation types causing constraint violations.The usual approaches to representing a SIC mix up several components in one statement and do not explicitly describe the above features. Taking a SIC represented in atraditional language, one could only assume that it is 100Yo certain; related to all objectsmentioned in that SIC; applicable to all primitive database access operations; unconditional in all contexts; and so strong that it would cause errors if violated. For instance,according to the language used by Ling and Rajagopalan [1984], the above example wouldbe expressed in the prenex normal form of the first order predicates as follows.E ranges over EmployeeJ ranges over WorLforV E V J (E.EmpNo J.EmpNo OR E.Salary> 10000)Evaluation:This statement is neither precise nor powerful because of the following:(1) It is not clear how certain this SIC would be. So, it can only be assumedas 100%.(2) Since E and J appear there and no operation type is explicitly specified,it would seem that the SIC is relevant to any operation on an Employeeor a Work...for. However, in fact, it could never be violated by deletionor insertion of an Employee, or deletion of a Work_for.(3) It does not clearly distinguish between the condition specifying thoseemployees to whom the constraint would apply (those working for someproject(s), i.e., the condition “E.EmpNo = J.EmpNo”) and the assertionChapter 3. A Model for Representing Semantic Integrity Constraints 48on the salary of those employees (i.e., the expression “E.Salary> 10000”).(4) Since no violation action is specified, it can only be assumed as “reject”.The representation in the SIC Representation model is also more precise than theoriginal models presented by Fernandez et al. [1981], and Bertino and Apuzzo [1984].According to the model presented by Bertino and Apuzzo [1984], the above examplewould be expressed by a single SIC in which the object component may include “Employee” and “Work_for” and the operation component may include “update” and “insert”.It implicitly means that the SIC is asserted on the insertion of an Employee or Work_forand the updates of any attribute of an Employee or Work_for.In addition, the representations in terms of the Representation model have some advantages that will be described in detail in the next chapter. In short, the certaintyfactor introduces fuzzy semantics, facilitates knowledge-based query processing and provides deductive capabilities. The identified object and operation type are useful forefficient enforcement and together with the violation action, they provide different perspectives for specifying and enforcing a general SIC. The separation of precondition frompredicate allows natural and precise representation of a SIC. The explicit representationsof the object, operation type and precondition components are useful for applying SICabstractions (that are introduced in Section 4.2).These six components and the naming convention of the SIC Representation modelare described in the following sections. Detailed BNF (Backus-Naur Form) descriptionsof the model are given in Appendix A. The rest of this section describes briefly thenotation and the allowed expressions in this language.Chapter 3. A Model for Representing Semantic Integrity Constraints 49Expressions First order logic and higher order logic expressions, arithmetic expressions and date expressions, are allowed in the precondition and predicate components.They may consist of quantifiers V (for all), (there exists), logical connectives“A” (conjunction), “V” (disjunction), “—i” (negation), but not the “—÷“ (implication)connective since it has been separated as the precondition. They may also contain aggregate functions: e.g.. avg, count, max, mm, sum, or other user-defined aggregationfunctions. In addition, a special pair of functions, old and new, are used. The new/oldfunction in SIC references the new/old value of the referenced object after/before checking the SIC. An implemented integrity maintenance subsystem of a DBMS must assurethe correct functioning of the new and old functions6.One set expression is also allowed:set{Ei some restriction(s)}. This is read as for all E1 satisfying “some restriction(s)” 17The special predicates used with the SIC Representation model are summarized in theAppendix B18.Subscript Notation This research uses a subscript to simulate the “cursor” in Date’sUDL to emphasize that a SIC is enforced on an occurrence although it is specified intensionally for an entity/relationship/relation type. A cursor in UDL is an object whosevalue is (normally) the address of some specific record in the database ([Date, 1983]).Since a SIC is enforced on an occurrence, if there are other occurrences of the same entity,‘61f the SIC is checked immediately, it references the new/old value of the referenced object after/beforethe operation. However, if the SIC is deferred to be checked at end of transaction, it references thenew/old value of the referenced object after/before the transaction.‘7The keyword set can be omitted. This notation is adopted from an “integrity constraint definitionlanguage” [Gardarin and Melkanoff, 1979].18n addition, this research follows the Prolog naming convention that a variable is written as a wordbeginning with a capital letter and an atom is written as a word beginning with a lower case. The onlyexceptions are keywords (CERTAINTY, FOR, ON, IF, ASSERT and ELSE) and the SIC name in thefirst line, which are character strings in nature, not variables.Chapter 3. A Model for Representing Semantic Integrity Constraints 50relationship or relation type referred in the precondition or predicate component, different subscripts will be used to distinguish them. The one to he checked is referenced byattaching a default subscript 0, e.g., E0. Variables with subscripts other than 0 (e.g., E1)represent any occurrence of the same object (e.g., E) type. In terms of programming language, the one with subscript 0 refers to the particular inserted/deleted/updated “value”to be checked; those with subscripts other than 0 are “variables” of the same object type.If there is only one occurrence (i.e., the one to be checked) referred in the preconditionand predicate components, the default subscript 0 is omitted. In any case, we do not usea subscript in the object component because the variable in that component is used toindicate that the SIC is applicable to any occurrence of the specified type.An example from Date [1983] will help to clarify the above subscript idea. Supposethat there is a relationship Supply in the context of “Supplier Supply Part”; and a SICrequiring that “any quantity value of a supply must not be more than 5 percent greaterthan the average of all such values”. One of the required SICs is represented by the Representation model as below, in which Supply1.Qty stands for any Supply. Qty occurrence.The assertion in its predicate component requires that each of other Supply. Qty’s mustsatisfy the average property when an occurrence is to be deleted.Supply-D-AggFcn- (Supply. Qty)CERTAINTY certainFOR SupplyON deletionASSERT -‘(Supplyi.Qty> 1.05 x avg({Supplyi.Qty Supply1 Supplyo}))ELSE rejectLegend: Supplyo indicates the value of Supply occurrence currently checked.Chapter 3. A Model for Representing Semantic Integrity Constraints 513.2 SIC NameSIC name is not counted as a component in the Representation model since it is notessential. A database designer may freely use other naming conventions, such as thesimple one Ii, 12, ... used in the literature ([Bertino and Apuzzo [1984], and [Date,1983]), or some application dependent conventions. In this research a concatenatedcharacter string is used as a SIC name for conveying some of the meaning of a SIC and alsofor serving as its unique identifier. This convention allows the SIC elicitation subsystem ofan automated database design system (introduced in Chapter 7) to generate a SIC nameautomatically by incorporating an abbreviated application domain-independent SIC typeand some application information (object type, operation type and related object typeset). A SIC name contains five parts. The general format is like:ObjectType-OperationType-SICType- (RelatedObjectTypeSet)-SequenceNoFor example. from “Employee. Salary- U-RshipDepEnt Val- (Work_for)” we know thatthis SIC should be checked on the update of Employee.Salary because the existence of aWork_for relationship occurrence depends on its value.1. The Object Type (e.g., Employee.Salary) is the specific type name of an attribute,entity, relationship, or relation for which this SIC is asserted. It corresponds to thecomponent 0 in the SIC.2. The Operation Type is either U, I, or D representing update, insertion, or deletion,respectively. Similarly, it corresponds to the component T in the SIC.Chapter 3. A Model for Representing Semantic Integrity Constraints 523. The SIC Type is the abbreviation of a SIC type conveying some application domain-independent meaning; e.g., “Totality” (if an entity occurrence exists, it must participate in some minimum number of relationship occurrences of the specified type),“RshipDepEntVal” (the existence of a relationship depends on an attribute valueof an entity type). By applying the E-R-SIC model (introduced in Chapter 5), wecan classify SICs into a number of domain-independent SIC types.4. The Related Object Type Set includes those object types that appear in the precondition and predicate components of a SIC and can supplement the meaningof the SIC type. A SIC name may not have this part if there is no such objects. For example, there is no other object type related to an absolute cardinality constraint for an entity type, which restricts the maximum number of occurrences of the entity type that can exist in a database. However, in the aboveproject_employee_minimum_salary example, it may be useful to know which relationship type (i.e., Work_for) depends on the attribute (i.e., Employee.Salary). Fordomain constraints on the insertion of an entity, although all the attributes of theentity are referenced in the SIC, they need not be included in its SIC name sinceit is implied by the SIC type “Domain”.5. The Sequence No is a positive integer number. In a complicated application, it ispossible that the above parts would not be sufficient to distinguish one SIC fromothers. In that case, as a last resort, a sequence number starting with 1 will beadded.Chapter 3. A Model for Representing Semantic Integrity Constraints 533.3 Certainty Factor (F)A certainty factor, F, which does not appear in Fernandez et al. [1981] and Bertino andApuzzo [1984], is introduced in the Representation model. A certainty factor is closelyrelated to the predicate (P) and violation action (A). It may be attached to SICs in thefollowing alternative ways:1. Ratio Scales: A ratio scale to measure the certainty of a SIC is needed if we woulduse the certainty factor to provide deductive capabilities under uncertainty. Thecertainty factor is defined to have a value between 0% and 100%. Any SIC with acertainty factor less than 100% can only have “warning”, “conditionallyieject”, or“conditionally_propagate” as its violation action. The value of the certainty factoris based on the experience of the database designer. If the value is too low, itimplies that the SIC is very unlikely to hold in the database and is less useful inproviding semantic information.2. Ordinal Scales: If we are only interested in traditional databases and there area number of uncertain SICs, an ordinal scale to measure the certainty of a SICis needed. This ordinal scale would be used to rank the certainty of SICs in thesame “family” (in terms of the same restriction intention, but different levels ofstringency) when verifying and enforcing them. The certainty factor might bedefined to have a value between 1 and 10 if there are at most 9 uncertain SICsfor the same “family” of SICs. Any SIC with a. certainty factor less than 10 is anuncertain SIC.3. Two levels: It is possible that a database designer would like to specify at mostone uncertain SIC for a restriction intention because an organization is mainlychapter 3. A Model for Representing Semantic Integrity Constraints 54concerned with SICs with 100% certainty. In this case, two discrete levels of certainty are enough. In this case, “uncertain” (i.e., certainty 100%) would havea “warning”, “conditionally_reject”, or “conditionally_propagate” violation actionwhereas “certain” (i.e., certainty = 100%) would have a corresponding “reject”, or“propagate” violation action.4. Fuzzy terms: It is also possible to attach a fuzzy term, e.g., “usually”, “sometimes”, as the certainty factor if that term has been properly specified to be equivalent to a specific certainty ratio number (e.g., “usually” may be equivalent to80%).3.4 Object (0)O represents the data object to which the SIC applies. In the E-R model, it correspondsto any occurrence of the specified attribute, entity, or relationship type. In the relationalmodel, it corresponds to any occurrence of an attribute or a relation type, i.e., a column,or tuple. The E.A or R.A is used to refer to the attribute A of entity E or relationship(relation in the relational model) R. Note that an attribute is treated as an object inthis model since it would make the SIC clearer and it is possible that a constraint isrelevant to an update of an attribute, but not the insertion/deletion of its related entity,relationship, or relation. For example, in the above project_employee_minimum_salaryexample, it is not relevant to the insertion or deletion of an Employee although it isrelevant to the update of an Employee.Salary.Chapter 3. A Model for Representing Semantic Integrity Constraints 553.5 Operation Type (T)The operation type, T, (also called the access type [Bertino and Apuzzo, 1984]) specifiesthe types of database manipulation operations to which the SIC is applicable. Only threeoperation types need to be considered — insertion, deletion, or update. Note that aSIC is never relevant to a retrieval (query) operation.The data definition operations, i.e., “creation” or “destruction” of an entity, relationship, or relation type, are also excluded’9.A database designer should design a databaseto include those necessary object types according to the information requirements. However, whether an object type should be in a database schema is not a SIC. Destroyingan object type is concerned with the database reorganization. In the case of databasereorganization, destroying an object type may have different implications in different situations. For example, destroying a type may occur because this type ceases to exist inthe real world or just because the organization will no longer keep the super-type (e.g.Person) information, hut is still interested in its subtypes (e.g. Employee). The databasedesigner needs to be involved to discover the changed information requirements. Thedifferent situations cannot be foreseen and automated in the database design phase.3.6 Precondition (C)The precondition (C) specifies the implicit or explicit presuppositions of a SIC. Ifit is not satisfied, it would cause the predicate of the SIC to be either undefined orirrelevant. For example, in the above project_employee_minimum_salary example, theSIC for Employee.Salary on update is irrelevant to any employee who does not work for‘9So the language represented by this model is not a relationally complete query language.Chapter 3. A Model for Representing Semantic Integrity Constraints .56any project. It also identifies the object (0) in the context of this SIC. In the aboveexample, the SIC for Work_for on insertion would have the precondition as “ Employee,rship_occ_part(Work_for, “Employee”, Employee)” to indicate the Employee occurrenceparticipating in the specific Work_for occurrence. If a SIC has no precondition, thekeyword IF may be omitted.3.7 Predicate (P)The predicate (P) is the assertion for the database state transition when the object(0) is to be manipulated by the operation (T). If it is true, the operation (T) on theobject (0) is allowed to be performed. For an attribute other than primary key20, thepredicate should involve the attribute itself. For an entity, relationship, or relation, thepredicate involves its attributes, aggregate attributes (e.g., average value), or relatedother entities and/or relationships (relations in the relational model). Since a SIC placeslogical restriction on data, its predicate component should contain invariant assertionson the related objects rather than procedural statements of programs.The precondition and predicate are both expressions that control whether the violation action will be invoked. The precondition specifies the general state of the databasethat makes the constraint relevant. The predicate specifies something that must be trueabout the object after the operation. The justification of having both components isdescribed in detail in Chapter 4.20We have this exception because in the relational model and traditional E-R model, primary keysplay the role of “surrogates”. The tipdate of a primary key of an entity/relationship/relation may implythe deletion of an old entity/relationship/relation followed by the insertion of a new one.Chapter 3. A Model for Representing Semantic Integrity Constraints 573.8 Violation Action (A)The violation action (A) specifies how the system is to behave if the predicate P is falsewhen the SIC is checked. In order to have precise meaning for a single SIC, it is desirableto have a single violation action (or a deterministic series of well-defined actions) ratherthan a complex procedure with many choices. The violation action may be specified asfollows.1. “warning” — allow the access operation, but issue a warning.2. “reject” reject the access operation, signalling an error.3. “propagate” allow the access operation, but perform a related corrective actionto restore the database to a correct state.4. “conditionally_reject” or “conditionally_propagate”— request confirmation from theuser to ignore the SIC; the processing of the access operation is suspended untilthe user indicates either to ignore the SIC or to take the violation action (rejectthe access operation, or allow the operation but propagate to perform a correctiveaction).Discussion on Propagation Traditionally, the violation action of a SIC is just arejection. A warning action indeed turns off the enforcement of the SIC except forgenerating a message. A propagation action changes other objects in the database. Date[1983] indicates that integrity rules may he considered important special cases of triggeredprocedures21.In discussing semantic integrity constraints, Fernandez et al. [1981] mentionthe following problems raised by the trigger mechanism.21Other application areas for triggered procedures include virtual fields, security, etc. [Date, 1983]Chapter 3. A Model for Representing Semantic Integrity Constraints 58(1) Integrity or security violation within auxiliary procedures could result ina never-ending sequence of invoking procedures.(2) The proper access rights of the invoked procedures are an issue: Shouldthese be the rights of the invoker of the triggering maintenance operationor the rights of the DBA who specified the auxiliary procedure?(3) Triggering of maintenance procedures results in a somewhat confusingdivision of responsibility between the user (or the programmer) and theDBA. For example, who should update the count of employees on insertinga new employee? The system or the programmer?The problem of security or access rights is beyond the scope of this research22.Besides,as stated before, the intention of this research is not to propose an approach to replacingthe transaction or application programming. If the intention of the violation actionis just to maintain a correct database state, there is no reason that we would need acomplex violation action. Therefore, although in principle a “violation action” couldbe very complex, in this research it is generally a single action or a deterministic seriesof well-defined actions. The responsibility of programmers to write proper applicationprograms to update data is not changed23.If the violation action is to propagate to insert some other objects, establishing the attribute values of those objects might become a problem. It is not possible for a databasedesigner to know all these values during database design. Those values that cannot beknown by an automated database design system will be assumed to be “null” (i.e., “unknown”). This may violate other SICs if some unknown attribute values (e.g., part of22llowever, this unsolved security problem raised by Fernandez et al. [1981] might also be one argumentagainst complex violation actions.23llowever, now a DBMS becomes more powerful to detect programming errors of programmers andtakes violation actions. Programmers need not check SICs because the DBMS will do that.Chapter 3. A Model for Representing Semantic Integrity Constraints 59key of a relationship) are not allowed to be “null”. There are two approaches to dealingwith this problem. In the first approach, the violation action “reject” rather than “propagate(insert(O1))” must be specified. In the second approach, the propagating insertionis allowed. Compared to the first approach, the second approach is more powerful. Evenif the propagating insertion is finally “rejected” due to other SICs, it is easier for theuser to understand why the original insertion operation is rejected. It also leaves openthe possibility of designing an enhanced integrity maintenance subsystem of DBMS thatwill prompt the user to supply the unknown attribute values or obtain them from someother sources ([Casanova and Tucherman, 1988]).Two-valued Logic Note that the violation action is taken only when the predicate,not the precondition, is What would happen if a database allows null values? Anull value can have at least two interpretations: “unknown” and “not applicable”. In thisresearch, “null” is only allowed to represent “unknown” because the “not applicable null”should not appear in a well-defined, properly-normalized database with clear semantics24.In principle, some warning messages can be issued25 if the predicate is evaluated to beunknown. However, since the “unknown” values will eventually become “known”, theywill be checked at that time. If they violate SICs, appropriate actions will he taken24The existence of “not applicable null” in some attributes of an entity, relationship. or relationindicates that these attributes do not apply to all occurrences. It implies that some subtypes shouldbe specified in order to clarify the semantics. In the literature (e.g., [Lee, 1988]) there are two otherinterpretations of “null”. Lee states that, for example, with regard to John’s spouse, the null value maymean: (1) literal interpretation “null” (e.g., the spouse name is “null”); (2) sense of “none” (e.g., Johnis a person, but he has no spouse); (3) “not applicable” because spouse is not an attribute of the object(e.g., John is the name of a building, not a person); (4) “unknown”. Note that the first interpretation isthe result of bad implementation of “null”. “Null” should be represented by an unambiguous symbol in adatabase. The second and third interpretations are also results of a bad database design — the semanticsare not clear because some attributes are not defined for some occurrences of the specific “type”. In theabove example, there should be two types, Person and Building, to avoid the third interpretation. Thesubtype Married_Person should be defined to avoid the second interpretation. In this research, the “notapplicable null” interpretation includes both the above second and third interpretations.25These are termed “weak violations” by Ho [1982].Chapter 3. A Model for Representing Semantic Integrity Constraints 60then. Therefore, in this research, for simplification, it is assumed that if the predicateis evaluated to be unknown, no violation action will be taken immediately. By takingsuch a position, the verification of SIC consistency can be simplified from 3-valued (true,false, unknown) to 2-valued (true, false) logic.Chapter 4The Application of the SIC Representation ModelThis chapter describes how we can apply the SIC Representation model for databasedesign and management. In Section 4.1 it is claimed that by applying this model we canrepresent precisely the features of any SIC. In Section 4.2 SIC abstractions are introducedto facilitate SIC specification and management. The usefulness of the Representationmodel in SIC abstraction is also discussed. In Section 4.3 the application to databasemanagement is briefly described.4.1 Completeness of a SIC Specification for Database DesignSu & Raschid [1985] and Shepherd Kerschberg [1986] suggest that “constraints” mustexplicitly specify both declarative semantics as well as process oriented or operationalsemantics. Declarative semantics correspond to logical formulas describing relationshipsbetween objects in a database. The information needed to check that these relationships are true or to maintain these relationships is the operational semantics. It is alsosuggested that the “constraint” formalisms must provide information along the lines ofWHAT, WHEN, WHERE and HOW if they are to be complete. Although our SICs areonly a proper subset of their “constraints” (refer to Section 1.3), the same formalism withsome modifications can be applied to examine the completeness of a SIC specification.61Chapter 4. The Application of the SIC Representation Model 621. Operational and Declarative Semantics. In the Representation model, thepredicate (P) and certainty factor (F) components specify the invariant declarativesemantics among data objects. The operation type (T) is relevant to the operationalsemantics to check these invariant facts. The violation action (A) specifies theoperational semantics that are needed to maintain these assertions. In addition,the object (0) and precondition (C) components supplement both declarative andoperational semantics to assert and check these invariant assertions. The certaintyfactor (F) also provides some operational semantics since the highest certain SICshould be enforced first.2. WHAT, WHEN, WHERE and HOW. The Representation model specifies:• WHAT the SIC requires — its invariant assertions (predicates, P) and certainty (certainty factor, F).• WHEN the SIC is to be checked these assertions should be checked whensome primitive access operation (T) (update, deletion, insertion) is performedon the data.• WHERE the SIC occurs — these assertions are applicable to some data object(0) in some contexts (preconditions, C).• HOW the system should behave in the case of these assertions being violated(violation action, A).Therefore, the Representation model provides declarative and operational semanticsof a SIC specification. It also provides a SIC specification along the lines of WHAT,WHEN, WHERE, and HOW. The rest of this section further explores whether we wouldexactly need these six components in order to represent completely declarative and operational semantics of a SIC specification and express precisely its features— can we haveChapter 4. The Application of the SIC Representation Model 63fewer or do we need more?4.1.1 Not Fewer ComponentsThe strengths of the Representation model are that it is precise enough to express thefeatures of a SIC; uniform for all kinds of SICs; and powerful enough to offer a correctiveaction rather than simply reject the operation violating the SIC. As mentioned in theprevious chapter, traditional languages do not explicitly describe all features included inthe Representation model. A SIC represented in a traditional language could only beassumed as 100% certain, related to all objects mentioned in that SIC; applicable to allprimitive database access operations; unconditional in all contexts; and so strong thatits violation would cause errors. It is beyond question that the predicate (P) componentis absolutely needed. Other components need to be further discussed.Object (0) Depending on the language used to represent a SIC, a number of objectsmay be mentioned. However, the declarative and operational semantics of the constraintmay not be relevant to some of these objects. In the project_employee_minimum_salaryexample of Chapter 3, the constraint is not relevant to the Project that may be mentionedif it is declared in a natural language. Neither is it relevant to the Employee if it isrepresented in a pure first order logic such as the one used by Ling and Rajagopalan[1984]. Consider another example. Suppose we have a constraint: “if an employee worksfor any project, his (her) salary cannot decrease”. This constraint is only relevant to theEmployee.Salary, not Employee, Project or Work_for. In order to have precise declarativeand operational semantics for a SIC, it is necessary to specify explicitly for which objectthis SIC is asserted. Since in this model a SIC is applied to a single operation, which canonly manipulate a single object, it is sufficient to assert only for this object.Chapter 4. The Application of the SIC Representation Model 64Operation Type (T) If a constraint is relevant to an entity, relationship, or relation,but not an attribute, and is operation-dependent, the explicit specification of its operationtype is needed. If the constraint is operation-independent, there are still two advantages(in addition to improving enforcement efficiency) to specifying the operation type:• First, explicitly expressing which operation could cause constraint violation provides the valuable operational semantics and clarifies the restriction intention (declarative semantics) of the SIC. It would be desirable to have a conceptual picture ofthe consequence of constraint enforcement even in the database design phase. Itwould also be helpful for later designing transaction specifications. For example,given the constraint “the total number of employees cannot exceed 200”, it is helpfulto know that its restriction is only relevant to the insertion of an employee althoughthe original constraint is operation-independent.• Second, it is impossible to specify the violation action without referring to a specificoperation type. When a general SIC is rewritten into several sub-SICs for separateobjects in terms of the Representation model, their objects, operations, and possiblydifferent violation actions provide useful declarative and operational semanticsnecessary conditions for operations on the objects in the object component andsufficient conditions for propagated operations on related objects.However, for a SIC asserted on an attribute, the explicit operation type specification isredundant since update is the only possible manipulation operation type. In this case,we explicitly specify it only for clear and uniform representation.Chapter 4. The Application of the SIC Representation Model 65Precondition (C) One may argue that it is unnecessary to separate preconditionfrom predicate in terms of pure logical expressiveness. However, without this separation. the declarative and operational semantics of the SIC a.re ambiguous. The precondition specifies the current database state that makes the constraint relevant; thepredicate asserts the allowable state after the operation on the object. We have seenthe project_employee_minimum_salary example in Chapter 3. Some constraints are morenaturally expressed in the IF... THEN. . . format. By explicitly separating the predicatefrom the precondition, the restriction intention of the predicate becomes clearer and isclosely related to the violation action that now can be simplified to be a deterministicseries of well-defined actions (usually a single one).Even if a constraint is not in the IF ... THEN . . . format when we first write it inEnglish, it may have some implicit “presuppositions”. For example, in a database containing Department, Employee, and Manager, a constraint might he stated in Englishas SIC-6P6: “each employee must earn less than his (her) department manager”. Inthat database, the above statement has two implicit presuppositions: (1) “the employeebelongs to some department”; and (2) “the department has a manager”. If these presuppositions are only true for some employees and departments, SIC-O is conditional and thepresuppositions should be explicitly represented in its precondition component. If theseare facts that are true for all employees and departments, we should have two other SICs,each of them asserting one of the above facts in its predicate component. In addition,these presuppositions should also be in the precondition component of SIC-O so that theconnection between a specific Employee and Manager are clear27. Each SIC would haveprecise meaning.261n this chapter, SIC-i, i=O,1,2,... are used as SIC names for simplicity.27That is, by these preconditions, we can find the manager occurrence related to an employee or findthe employee occurrences related to a manager. However, these are not the restriction intention of theabove SIC.Chapter 4. The Application of the SIC Representation Model 66The precise context specifications in the precondition are also needed to provideoperational semantics when enforcing some related SICs. Suppose we have two SICsfor Employee.Salary on update.SIC-iCERTAINTY certainASSERT Employee. Salary 100,000ELSE propagate(update(Department .Budget))SIC-2CERTAINTY certainASSERT Employee.Salary 150,000ELSE rejectWhen updating the salary of an employee to be $180,000, we would have an enforcement problem— which violation action should be taken. The problem is caused by theimprecise representation of SIC-i. The original restriction intention of these two SICs is:“the salary of an employee should be less than or equal to $100,000, otherwise, if it is nottoo high, we can increase the department budget; but if it is higher than $150,000, we canonly reject it”. The precondition of SIC-i should include an expression, “Employee.Salary< 150,00U’8.28Alternatively, we can use an expression, “checkpreSIC(SIC-, Employee.Salary”, that checks thepre-SIC, SIC-2, for Employee.Salary and returns true if the pre-SIC is satisfied. One SIC’s (say SIC-i)precondition may be the precondition and predicate of other SICs (say, SIC-2, SIC-3). SIC-2 and SIC-3may be called as the pre-SICs of SIC-i. If a data object violates the pre-SICs, SIC-i becomes irrelevantto further testing.Chapter 4. The Application of the SIC Representation Model 67Violation Action (A) Without the violation action component, all SICs could onlybe assumed to be strong. The SIC specifications would be less powerful since they couldnot include self-correcting or soft SICs.Certainty Factor (F) If a database designer only considers certain SICs, he or shewould not need the certainty factor component. However, the purpose of introducing certainty factors in database specifications is to represent more data semantics — includingfuzzy semantics. If the certainty factor component was not included in the Representation model, all SICs would be assumed to be 100% certain — no exceptions. Theviolation action of a SIC would be either a rejection or a corrective action. Systemswith only such SICs contain less declarative semantics since the fuzzy semantics are lost.They have also been criticized as too rigid to “deal with unusual, atypical, or unexpectedoccurrences”, and “it is not possible to allow violations of integrity constraints to persist without turning off completely the checking of those constraints” ([Bordiga, 1985]).Having considered such possibilities and in order to avoid such criticisms, a databasedesigner might specify too few SICs or too “generous” ones. The inclusion of certaintyfactors in the Representation model provides a mechanism for accepting data violatingthe assertions of some uncertain constraints— the introduction of “controlled inconsistency” For example, a database designer may specify a SIC for employee’s salaryas “Employee Salary < $1, 000,000” even though few employees in an organization havesalary greater than $100,000 Such a SIC would not catch common data entry errors,e.g., misplaced or omitted decimal points. Employing a certainty factor, he (she) mightspecify two or more SICs for the same object on the same operation. For example:SIC-1: with 100% certainty, Employee.Salary $1,000,000SIG-2: with 90% certainty, Employee.Salary < $100,000Chapter 4. The Application of the SIC Representation Model 68SIC-3: with 80% certainty, Employee.Salary < $50,000SIC-2 and SIC-3 are fuzzy (uncertain) semantic integrity constraints. If they areviolated, different warning messages might be issued.One may wonder since the certainty factor component relates closely to the violationaction, we might only need the latter. However, we may lose some semantics if we onlyhave the violation action component:• Suppose that we only have at most one uncertain SIC for each restriction intention.It might be acceptable to keep only the violation action component if we can assurethat the violation actions would correctly be specified and be consistent with their“implicit” certainty factors.• Suppose that we have a number of uncertain SICs. In the above example, SIG-2 andSIC-3 should have measures of certainty at least on an ordinal scale. Otherwise,we cannot know whether these SICs are inconsistent29.In addition, when enforcinga set of uncertain SICs, some ordering is needed. The most certain SIC in thesame “family” (in terms of the same restriction intention, but different levels ofstringency) should be checked first since its violation is more likely to cause thedatabase to be inconsistent, the second certain one is then checked, etc. If anoccurrence violates the higher certain SIC, the remaining SICs with lower certaintyin the same “family” need not be checked. Without the certainty factor, we wouldlose some declarative and operational semantics.29ff the SIC-S were with certainty 95%, SIC-S and SIC-S would become inconsistent. Note that SICsrepresented in this model are consistent if there exists a database state transition that is allowable withregard to all of the restrictions, i.e., considering all their components except for violation actions.Chapter 4. The Application of the SIC Representation Model 69Conclusion Based on the above arguments, we can conclude the following:If we wish to have uniform and precise representations for all kinds of SICs,it is necessary to have all six components of the Representation model.4.1.2 No More ComponentsIs it possible that more components are needed to represent declarative and operationalsemantics of a SIC or express its features? In the following, some possible proposals arediscussed.Enforcement Schedule Shepherd and Kerschberg [1986] suggest that a “constraint”formalism should include when a constraint is to be checked -— after every transactionor at audit time. Note that their discussion corresponds to the classification based onthe enforcement schedule described in the Section 2.1.7. As discussed in that section, theenforcement schedule is a transaction-driven specification. It is impossible to specify theenforcement schedule of a SIC as another component in the level of database schema.Permanence As also discussed in the Section 2.1.5, the permanence of a SIC is noteasily decided in advance since the organizational environment is changing. The permanence of a SIC is closely related to SIC maintenance rather than database design or SICenforcement. If a SIC is still in a database schema, it should be enforced anyway. Itspermanence information adds no more declarative or operational semantics. Therefore,this “permanence” information is not included in the Representation model.Chapter 4. The Application of the SIC Representation Model 70Object Type, Set, or Occurrence The object component in the Representationmodel means that the SIC is applicable to any occurrence of the specified object type.Occurrences are fundamental things in the database, and insertion, deletion, and updateare primitive operations on them. The integrity problems caused by data definitionoperations (destruction or creation) on data object types are not the focus of this research.Any SIC that involves data manipulation operations on higher levels of objects, e.g.,types, or sets, is finally enforced at the occurrence level. Therefore, the explicit indicationof the level to which the SIC is applied would add no more declarative or operationalsemantics.Static or Dynamic As mentioned in Section 2.1.8, researchers have also discussedstatic and dynamic constraints. Should we explicitly specify a SIC as static or dynamic?As described in Chapter 3, the Representation model is indeed transition-oriented; thatis, all SICs represented in terms of the Representation model are basically dynamic. Thefundamental premise is as below.Premise 4.1 The current database state is semantically correct.A database state is semantically correct if it can be constructed starting from an emptydatabase by a sequence of valid database primitive operations (insertions, deletions, orupdates). A database operation is valid if the state transition caused by it satisfies theSICs that exist at the time the operation is performed. We assume that a database is initially empty. Over time, object occurrences are inserted into the database, then updated,and finally probably deleted. Before ai occurrence is manipulated, the old database stateis semantically correct. A SIC is specified so that the database transition caused by itsprimitive operation would bring the database to a new semantically correct state. AnyChapter 4. The Application of the SIC Representation Model 71“static constraint” is represented by rewriting it into one or more dynamic constraints forthe related object(s) on the operation(s) that could cause unallowed database state(s).By doing so, we do not lose its declarative semantics, rather, proper operational semantics are attached. It is unnecessary to indicate whether the original constraint is dynamicor static in this model.In addition, a constraint on a sequence of operations is not a “real” SIC in terms ofthe SIC Representation model. Instead, it is the consequence of enforcing several SICs.For example, in order to model “a car must be owned by a manufacturer before it can beowned by a dealer’, given that there are Manufacturer_Ownership and Dealer_Ownershiprelationships, and Car, Manufacturer, and Dealer entities, we would need four SICs:Manufacturer_ Ownership-I-RshipExclusive- (Dealer_Ownership)CERTAINTY certainFOR ManufacturerOwnershipON insertionIF Car, rship_occ_part (Manufacturer_Ownership, Car)ASSERT —di DealerOwnership,rship_occ_part(Dealer_Ownership, Car)ELSE rejectChapter 4. The Application of the SIC Representation Model 72Dealer_ Ownership-I-RshipExciusive- (Manufacturer_Ownership)CERTAINTY certainFOR Dea1erOwnershipON insertionIF Car, rship_occ_part (Dealer_Ownership, Car)ASSERT - ManufacturerOwnership,rship_occ_part (Manufacturer_Ownership, Car)ELSE rejectDealer_Ownership-I-RshipBeforeRship- (ManufacturerOwnership)CERTAINTY certainFOR Dea1erOwnershipON insertionIF Car, rship_occ_part (Dealer_Ownership, Car)ASSERT ManufacturerOwnership,rship_occ_part (old(ManufacturerOwnership), Car)ELSE rejectManufacturer Ownership-D-Rship TriggerRship- (Dealer Ownership)CERTAINTY certainFOR ManufacturerOwnershipON deletionIF Car, rship_occ_part (Manufacturer_Ownership, Car)ASSERT DealerOwnership,rship_occ_part(Dealer_Ownership, Car)ELSE propagate(insert (Dealer_Ownership))Chapter 4. The Application of the SIC Representation Model 73Interpretation:These SICs restrict the relationships for a single Car. The first two SICsspecify that Dealer_Ownership and Manufacturer_Ownership are exclusive.The third SIC restricts Manufacturer_Ownership to have existed at the timeDealer_Ownership is being created. Thus, these first three SICs require that ifa Dealer_Ownership exists, then Manufacturer_Ownership must have existedin the past (and must no longer exist now). The fourth SIC requires thatwhen a Manufacturer_Ownership is to be deleted, a Dealer_Ownership mustbe created. These four SICs together restrict Manufacturer_Ownership andDealer_Ownership to exist in sequence. These SICs are independent of anytransactions. However, because of them, the order of the related transactionsis restricted. The sequence restriction on transactions is naturally guaranteedif we specify completely SICs on data. It need not be worried about at thelevel of transaction specification.For example, let us consider the following four possible transactions. Transaction-2 (which can be called Transfer_Car transaction to convey the applicationmeaning) can only be performed after Transaction-i (which can be calledProduce_Car transaction)30.Note that Transaction-3 and Transaction-4 arenot allowed to be performed. Transaction-3 would be rejected because either(i) if Transaction-i was not executed before, executing Transaction-3 wouldviolate the third SIC; or (ii) if Transaction-i was executed before, executing Transaction-S would violate the second SIC. Transaction-4 would also berejected due to the fourth SIC.30llowever, the Transaction-2 may never be performed— that is, a car may be always in the hand ofa manufacturer. Whether the Tran.action-2 should be performed would depend upon the happeningsin the real world. Also, note that since all these related SICs are enforced at the end of Trarisaction-2,we can switch the order of those two database operations inside it.Chapter 4. The Application of the SIC Representation Model 74Transaction-iBeginTransactioninsert (Manufacturer_Ownership)End TransactionTransaction-SBegin Transactiondelete (Manufacturer_Ownership)insert (Dealer_Ownership)End TransactionTransaction-SBegin Transactioninsert (Dealer_Ownership)End TransactionTransaction-4Begin Transactiondelete(Manufacturer_Ownership)End TransactionA SIC with explicit time restriction is still a data-driven constraint though it mayinvolve a special system variable Current_time that registers the current clock time. Forexample, to model the SIC: “an employee cannot receive a salary raise during his or herfirst 6 months in the company”, an Employee entity must have a HireDate attribute, andthe SIC will be:Chapter 4. The Application of the SIC Representation Model 75Employee. Salary- U- TimeRestrict Transition- (Current_time, Employee. HireDate)CERTAINTY certainFOR Employee. SalaryON updateIF Current_time < Employee.HireDate + “6 months”AS SERT new(Employee. Salary) old(Employee . Salary)ELSE rejectThis kind of SIC may only occur in an environment in which the event that causesmanipulation of the involved objects is processed in real time so that the Current_timein the computer matches the event time in the real world. Otherwise, Current_time mustbe replaced by a time-valued attribute that records the external event time, and theSIC becomes an ordinary data-driven constraint. In the above example, if the request toupdate Salary is not processed in real time, the expression in the precondition componentwould be: “Salary UpdateRequest.Date < Employee.HireDate + “6 months”.Conclusion Four possible proposals to incorporate more components in the Representation model have been discussed. They suggest either some things that cannot bespecified in a database schema, or others that add no more declarative or operationalsemantics of a SIC. Given these six components we can represent any kind of SIC thatis mentioned in the literature, and list its features precisely. These six components sufficiently provide the declarative and operational semantics of a SIC what should betrue in the database, and the information to check and maintain those assertions. Thus,we conclude the following:Chapter 4. The Application of the SIC Representation Model 76It is sufficient to have these six components of the SIC Representation modelto represent a SIC precisely.With this SIC Representation model, the database designer can represent the dataintegrity semantics properly during conceptual modelling.4.2 SIC AbstractionsOne may be concerned that there would be a huge number of SICs represented in termsof the Representation model in an actual database. However, the explicit componentseparation in the Representation model allows us to apply abstraction concepts to reducethe number of SICs that must he specified using the full Representation model.SIC Aggregation Assume that 01 and Ti are the data object and operation type forSIC-i, respectively; and Ui and Ti, where i= 2,3, ..., are the data object and operationtype for SIC-i, respectively. SIC-i (called aggregate-SIC) is the aggregation of otherSICs (called component-SICs) if• 01 contains all Oi’s as components;• the operation Ti on 01 can be conceptually thought of as the combination ofoperations Ti on Ui;• component-SICs and aggregate-SIC are sub-SICs decomposed from the same general SIC.An aggregate-SIC may have its own assertions and violation action. The enforcementof an aggregate-SIC can be simulated by checking all of its component-SICs and itsChapter 4. The Application of the SIC Representation Model 77own assertions. If an object violates any component-SIC, the violation action of theaggregate-SIC will be taken. Thus, we can use a special logical predicate (checkeomSlC,see Appendix B.2) to refer to all its component-SICs by their names in the aggregate-SIC and avoid having to write the same assertions explicitly for 01 on Ti. One exampleis that the domain constraint on insertion of an entity can he simulated by checkingthe domain constraint of updating its attributes (from unknown values to some values).We would have SICs asserting not-null, value, unique, etc., for each of its attributes onupdate. These assertions need not be repeated for the entity on insertion.SIC Specialization Assume that 01 and Ti are, respectively, the object and operationtype for SIC-i; and 02 and T2 are, respectively, the object and operation type for SIG-2.SIC-2 is the specialization of SIC-i if 02 is a specialization (i.e., sub-type) of 01, andT1=T2.The specialized SIC would inherit assertions from its parent SIC with the propervariable substitution (02 for 01). Thus, representation of the specialized SIC can beomitted unless it has special restrictions. The specialized SIC’s own assertions can onlyrefine its parent SIC’s assertions, but not overwrite them. For example, suppose thatthe database designer defines SIC-i for Employee.Salary on update: “(Employee.Salary> 1000) A (Employee.Salary < 120000)”. The SIC representation for Manager.Salaryon update is not needed unless there are special restrictions (e.g., “(Manager.Salary>15000)”).SIC Association A SIC is a set of other SICs if conceptually the enforcement ofthe set-SIC is the same as the enforcement of all of its member-SICs and nothingmore. The violation action of the “set-SIC” is dummy because if an object violates anyChapter 4. The Application of the SIC Representation Model 78member-SIC, the violation action of that member-SIC will be taken. Thus, we can usea special logical predicate (checkmemSlG, see Appendix B.2) to refer to all its memberSICs by their names in the “set-SIC” and avoid having to write the same representationsof the member-SICs explicitly for the object asserted by the “set-SIC”. The concept ofSIC association may be useful for SIC enforcement in a DBMS. For example, all SICsfor the same object on the same operation may be grouped as a “set-SIC” or several“set-SICs” according to their certainty factors and/or scheduling requirements. Thiskind of “set-SIC” is not a new type of SIC. However, in SIC specifications, conceptually,a SIC on updating the primary key of an entity (or relationship, or relation) can besimulated as two “set-SICs” that refer to SICs for deleting the corresponding old entity(relationship, or relation), and inserting a corresponding new entity (relationship, orrelation), respectively.Generic SICs By applying the above abstraction concepts, we can reduce the numberof SICs that must be specified explicitly using the Representation model. The conceptof generic SICs is introduced to reduce the number of required explicit representationseven further. Assume we have the following generic object types:— (1) Entity* is the generalization (i.e., union) of all entity types that are defined by thedatabase designer. Entity*.Attribute* is the generalization of all attributes of allentity types that are defined by the database designer.— (2) Relationship* is the generalization of all relationship types that are defined by thedatabase designer. Relationship .Attribte is the generalization of all attributes ofall relationship types that are defined by the database designer.These generic object types are conceptual modelling objects. They do not actually exist ina database. That is, neither data definition operations nor data manipulation operationsChapter 4. The Application of the SIC Representation Model 79will actually be applied to them. However, we can write some common SIC types (e.g.,domain constraints) for them. These SIC representations for generic object types canbe called generic SICs (SICs for specific object types can be called specific SICs incontrast). The precondition component of a generic SIC includes logical predicates3’(refer to Appendix B.1) to indicate clearly the contexts where the SIC type is applicable(e.g., the fact that two entity types are exclusive) and/or identify the related information(e.g., its primary key, etc.) of the object type for which the SIC type applies. Supposethat we keep the constraint information for specific object types as logical predicates inthe database (e.g., in the data dictionary). Also suppose that the DBMS would supportSIC inheritance properly. Since all object types defined by the database designer are subtypes of these generic object types, if they satisfy the preconditions of some generic SICs,these SICs would be applied to them by the principle of SIC specialization. Thus, thesegeneric SICs serve as “templates” for common SIC representations and are expected tobe inherited by specific object types32. We would need only one representation for eachSIC type (e.g., two entity types are exclusive) in a database regardless of the numberof specific object types to which the SIC type applies. The advantage of this approachis to reduce the possibly huge number of explicit representations of SICs so that theconceptual structure and future maintenance (management) of SICs become easier. Inaddition, since generic SICs can be pre-defined in an automated database design system,31These logical predicates in the precondition component of a generic SIC contain some uninstantiated variables. (For example, Primary_Key is a variable in the logical predicate entity(Entity*Primary_Key ) These variables will be instantiated when a specific entity type (e.g., Employee)has the constraint information to satisfy the precondition and inherit the SIC. (For example, the abovePrimary_Key can be instantiated to be “Empld” if the database designer has specified Employee as anentity type with the primary key “Empld”, that is, the assertion entity(”Employee”, “Empld”,...) hasbeen given.)32The idea here is similar to the following simple case. In a database, there are entity types Employee,Customer, Supplier, etc. Though the Person entity type does not actually in the database, we can writesome SICs for Person, which would be inherited by the specific entity types (e.g., Customer). The levelof our Entity* type is still higher than Person.Chapter 4. The Application of the SIC Representation Model 80the consultation to elicit SICs would be more efficient. Section 7.3.1 will describe indetail how to apply generic SICs for representing some SIC types that are identifiedduring conceptual modelling.Usefulness of the Representation Model It is necessary to identify the object andoperation type when applying SIC aggregation or specialization abstractions. The explicitcomponent separation in the SIC Representation model helps identify these componentseasily. In addition, the representation of generic SICs relies heavily on the preconditioncomponent. Most SIC types (e.g., two entity types are exclusive) only apply to somespecific object types. The precondition component of such a generic SIC indicates theconditions for the specific object type where the SIC type is applicable. A few SIC types(e.g., domain constraints) are common to all entity types. In that case, the preconditioncomponent of a generic SIC is used to identify the relevant object in this context so thatits variables can be instantiated with proper values when a specific object type inheritsthe SIC.4.3 Database ManagementIn this section, some possible applications of the SIC Representation model to databasemanagement rather than database design will be briefly mentioned.4.3.1 SIC ManagementA DBMS should include an integrity maintenance subsystem to enforce SICs. Someprevious research (e.g., [Hammer and McLeod, 1975] and [Bertino and Apuzzo, 1984])Chapter 4. The Application of the SIC Representation Model 81have proposed the functional structure of this integrity maintenance subsystem. Insteadof fully describing its structure, this dissertation briefly discusses its major functionsenforcing and maintaining (adding or removing) SICs. With regard to SIC enforcement,the focus is on the basic checking strategy (the enforcement schedule) and on the violationactions.SIC Enforcement If all accesses to the database are through transactions, the DBMSwould enforce SICs through pre- and post-conditions on transactions. The integritymaintenance subsystem would only maintain SICs, and would not be responsible forenforcing them. If it is possible to access the database using primitive operations or if no *pre- or post-conditions have been specified for a transaction, the integrity maintenancesubsystem would determine which SICs must be enforced and when.If the checking of a SIC is not at the end of the transaction, it should be done beforethe intended database operation is to be performed, not after it. While examining theassertions in the precondition and predicate components of a SIC, the effect of its specifiedoperation type should be taken into account. The Bertino and Apuzzo’s [1984] originalcriteria of deciding when a SIC must be enforced with regarding to a transaction aremodified as below:Basically, a SIC is enforced at the end of the transaction, if the set of objectsin all components of the SIC is affected (updated, inserted, or deleted) bymore than one update, insertion, or deletion statement in the transaction.Otherwise, the SIC is enforced on each occurrence update, insertion, or deletion, if it is a SIC involving only one entity/relationship occurrence or tuple;or on each update, insertion, or deletion-request, if it is a SIC involving one orChapter 4. The Application of the SIC Representation Model 82more than one set of occurrences or tuples belonging to the same or differententity, relationship types, or relations.The violation action of a SIC might become a part of a transaction. In determiningthe enforcement schedule of SICs, the related propagation actions should he consideredtoo. For example, suppose a transaction includes action-i, action-2, action-3, ..., andsuppose that SIG-i is only affected by action-i and has a violation action violation-action1. Therefore, SIC-i is enforced immediately. If action-i violates SIG-i, violation-action-ibecomes part of the transaction. If SIC-2 is affected by action-S and violation-action-i,it should be enforced at the end of transaction. If the transaction is rolled back for anyreason, the related propagation actions should also be undone and if a related propagationaction aborts, the transaction must be rolled back too.In most modern DBMS, there is a recovery mechanism that is responsible for recovering the database from many possible failures, e.g., transaction failures, system-widefailures, media failures ([Date, 1983]). We assume that the integrity maintenance subsystem cooperates with this recovery mechanism to keep information in a log requiredfor undoing an operation request The information needed in the log is similar to thatfor recovery, e g, (i) identification of each modification (update, insertion, or deletion)statement, (ii) identifers of the occurrences affected b each statement, and (iii) for eachoccurrence updated, the old occurrence value A special requuement on the log for deferring SIC checking until the end of transaction is the following. If the SIC contains new/oldfunctions, the old values of the objects should be logged at the beginning of the transaction even if the values are not referenced by any transaction statement. For example, wemay have a SIC to restrict old(snm(Employee.Salary)) new(snm(Employee.Salary)),but a transaction does not reference or operate on “siim(Employee.Salary)”.Chapter 4. The Application of the SIC Representation Model 83If the violation action of the related SIC is “warning”, the intended access operation isallowed, but a warning message would be issued. If the violation action is “propagate”,the access operation would also be allowed, but a related corrective action would beperformed (some message may also be issued). If the violation action is “reject”, theaccess operation would be rejected, an error code would be raised, and some explanationmessage describing the occurrences in violation of the SICs would also be given. In thislatter case, if the transaction has proper exception-handling procedures, this error wouldbe handled as planned (e.g., change the operation in some way and resubmit it, or skipthe operation). Otherwise, the integrity maintenance subsystem would cooperate withthe recovery mechanism to rollback the transaction automatically.The SIC Representation Model SIC representation in terms of the Representation model would facilitate their enforcement since the necessary operational semanticsare included. The checking is usually limited to the occurrence (that is currently manipulated by the operation in the operation component) of the object type (in the objectcomponent). The database state that makes the checking relevant is unambiguously described (in the precondition component). The necessary corrective action is also clearlyspecified (in the violation action component). The certainty factor component serves asthe ordering factor when checking a group of SICs in the same “family”. Other implementation mechanisms (e.g., keeping redundant minimum or maximum data value) canbe used to improve efficiency further.SIC Maintenance After a database is populated, there is a maintenance problema new SIC may he added or an old SIC may be removed. Suppose that no time-stampedpast generation of data is kept. In principle, deleting a SIC does not affect the currentChapter 4. The Application of the SIC Representation Model 84database state. Inserting a SIC can affect the current database state only if the SIC isrelevant to some object(s) on insertion. If the organization decides the new SIC will notbe applicable to the existing data, it should be explicitly stated in the SIC33. In thatcase, even if the new SIC is asserted on “insertion”, the current database state need notbe checked. Otherwise, a complicated procedure is needed to simulate insertions of alloccurrences of the related object (asserted by the new SIC) and settle possible violations.It may be easier to maintain SICs represented in the Representation model since theiroperation type components are explicitly expressed.4.3.2 Other Aspects of Database ManagementThe introduction of certainty factors in the Representation model brings some advantagesto other aspects of database management. The first is to support flexible management.The second is to facilitate intelligent query processing.Flexible Management By allowing uncertain SICs, we would decrease the seriousrigidity of SICs criticized by Brodiga [1985]. Although those SICs are uncertain, theirviolation might still convey some information. Usually warning messages are issued.The purpose of issuing warning messages is twofold. First, specific occurrences thatviolate an uncertain SIC deserve further examination either interactively or in batch toassure that the real organizational situation is reflected. Note that the enforcement ofa good uncertain SIC will generate warning messages for those, likely erroneous, (butcomparatively a few) cases without interfering with the processing of routine cases. Thiscorresponds to the idea of exception reporting. Second, if warning messages are often33For example, if a new restriction on employees salary is published on “1991/2/1” and is onlyapplicable to new employees, it should be attached the precondition: Employee.Hiredate “1991/2/1”.Chapter 4. The Application of the SIC Representation Model 85issued for a SIC34, the organization might need to reevaluate its organizational policy,which might invoke SIC maintenance.Knowledge-based Query Optimization and Deductive Capabilities One advantage of introducing certainty factors in the Representation model is to facilitate queryprocessing. An uncertain SIC is similar to a heuristic for directing query processingto the mostly likely objects first in an attempt to reduce response time. Furthermore,uncertain SICs can be applied to provide deductive capabilities under uncertainty. Forexample, from some sources we have the input data as “with 90% certainty, a ship is atanker, i.e., Ship. Type=tanker”. If we have a SIC “with 80% certainty factor, a tankercarries oil (i.e., if Ship. Type=tanker then Ship. Cargo=oil)” , we can conclude that with72% (i.e., 90% x 80%) certainty, the Ship.Cargo may be oil. Thus, this approach allowsthe Representation model to apply to expert database systems (i.e., the integration ofknowledge based systems and database systems) as well as traditional database systems.34For a SIC involving aggregate functions, it is possible that even if one warning message is issued onlyonce, the organizational policy needs to be re-evaluated because violating such a SIC might imply thata number of occurrence exceptions have happened so far owing to the nature of aggregate functions. Forexample, there might be two SICs such as SIc-4: with 100% certainty. avq(Employee.Saiary, < $25,000and SIC-5: with 90% certainty, avg(Ernployee.Saiary) $20,000. but no other SICs on Empioyee.Salary.Then, a single violation for SIC-5 might imply that many employees have too high salary, the salarypolicy may need to be reevaluated.Chapter 5An Extended E-R Model Incorporating Semantic Integrity ConstraintsThis chapter proposes a conceptual modelling tool, called the E-R-SIC model, forincorporating SICs. Section 5.1 describes the shortcomings of previous E-R models fordealing with SICs. Section 5.2 introduces the primitive constructs and data abstractionsof the E-R-SIC model and describes the basic properties of SICs. Sections 5.3 to 5.6explore SIC properties in more detail. Section 5.7 summarizes the E-R-SIC model insome figures.5.1 Problems with Previous E-R ModelsExisting database design methodologies based on the E-R model provide little supportfor incorporating SICs. Rather, they are treated as unessential accessories to conceptualmodelling.The E-R model was originally proposed by Chen [1976]. It has three primitive constructs: entity, relationship and attribute. More recently, researchers (e.g., [Smith andSmith, 1977a], [Smith and Smith, 1977b], [Teorey et al., 1986]) have extended Chen’soriginal E-R model to provide some powerful data abstractions, e.g., generalization, aggregation, association, etc. Very few SICs are discussed in the Chen’s original E-R model:86Chapter 5. An Extended E—R Model Incorporating Semantic Integrity Constraints 87• An incidence constraint requires that the existence of a relationship occurrencealways depends on the existence of the participating entity occurrences ([Furtado,et al., 1988]).• The existence of an occurrence of a weak entity type depends on some occurrence(s)of another entity type ([Chen, 1985]).• The maximum cardinality specifies the maximum number of occurrences of a relationship that can be related to one occurrence of an entity type.• Attributes cannot exist on their own, but are always attached to entities or relationships.• Primary keys must have unique values.The extended E-R models proposed by previous researchers introduce a few moreSICs, e.g., minimum cardinality. However, in all cases, SICs are only considered asaccessory properties of relationship or entity types (usually a single one). For example,researchers (e.g., Palmer[1978], Tsichritzis and Lochovsky [1982]) usually discuss theoptionality property of a relationship type (i.e., a relationship type, as a mapping, istotal or optional to an entity type). One problem with this modelling approach is thatSICs do not receive adequate attention in the conceptual design phase and are usually lostduring the transformation process from the E-R model to the relational model. SICs arenot treated as essential to conceptual modelling. Instead, they are considered mainly forselecting the appropriate logical data structure (e.g., relations). Because some semanticscan not be considered as “properties” of a single entity or relationship type, the traditionalapproach may not identify all necessary SICs. For example, consider the E-R diagramwhere an entity type E is related to entity types F and G via relationship types RX andChapter 5. An Extended E-R Model Incorporating Semantic Integrity Constraints 88RY, respectively. What SICs (if any) are implied by the adjacency of these relationshiptypes in an E-R diagram? There may he a SIC to require that an RX occurrence relatingto an E occurrence must exist if the corresponding RY occurrence exists, or conversely.Alternatively, there may be a SIC to require that RX he exclusive to RY. Such kindsof SICs are seldom discussed. Since SICs express restrictions on the logical meaning ofdata, it is worth considering the SIC as an distinct modelling construct.5.2 An Overview of the E-R-SIC ModelThe E-R-SIC model is an extended E-R model. It can be used for incorporating SICs ina database schema.5.2.1 Primitive Modelling ConstructsThe proposed model is application domain-independent. There are four primitive modelling constructs in the E-R-SIC model — entity, relationship, attribute and SIC. Thesefour constructs are all needed to model data semantics.Construct Description These primitive constructs are defined as follows:Definition 5.1 An entity is the database representation of a real-world object that canbe distinctly identified.Definition 5.2 A relationship is the database representation of an association amongreal-world objects.Chapter 5. An Extended E-R Model Incorporating Semantic Integrity Constraints 89Definition 5.3 An attribute is the database representation of a property of a real-worldobject or an association that is a function mapping an entity type or a relationship typeto values.Definition 5.4 A SIC is a logical, invariant restriction on the static state of a database(that is, a collection of attributes, entities and relationships), or on the database statetransition caused by an insertion, deletion, or update operation.Entity, relationship, and SIC occurrences are classified into different types accordingto some criteria. At a particular moment, certain groups of entity, relationship, or SICoccurrences can be considered as sets in the mathematical sense, and these sets mayhave some aggregate properties.Following the arguments on page 59, the E-R-SIC model does not allow the “notapplicable null”. A database designer should properly define entity or relationship typesto avoid that problem.A relationship type cannot be a participant in another relationship type. If a relationship type participates in another relationship type, it will be modelled as an entitytype and the original E-R diagram will be changed appropriately35.Since neither relationships nor attributes can exist without entities, and a relationshipcan be modelled as an entity, the followillg is clear:35That is, if a relationship type RX, which has two entity type participants E and F, is also a participantin another relationship type RY, then a new entity type, say G will be used to replace the originalrelationship type RX; and two new relationship types, say RE and RF, will be created to connectthis new entity type, G, with entity types E and F, respectively. The new entity type, G, becomes aparticipant in the relationship type RY.Chapter 5. An Extended E-R Model Incorporating Semantic Integrity Constraints 90Entity is the most fundamental construct among attribute, entity, and relationship.Weak Entity Type The existence of an occurrence of a weak entity type dependson the existence of some occurrence(s) of another entity type the “regular” entitytype. An existence subset [Webre, 1983] is the subset of occurrences of the regular entitytype upon which a weak entity occurrence is dependent. The E-R-SIC model requiresthat the above relationship be explicitly specified in order to make clear the “dependencesemantics” — the existence subset for each weak entity occurrence36.Time The E-R-SIC model does not model “time” as an entity type37. That is,time-stamped past generations of data are not modelled. However, there may be SICsapplied to some time-valued attributes, temporal sequences between object occurrences,or the special system variable, Current_time.Simplifying Assumptions For simplification, there are two assumptions on relationships.361n the literature there is a related discussion on another term, weak relationship. However, note thatresearchers use that term to imply diverse meanings. For example, Tsichritzis and Lochovsky [19821 useit to mean the relationship between a weak entity type and its related regular entity type. Scheuermannet al. [1980] classify weak relationships into two types. Their second type should be modelled byan association abstraction. Their first type (also see [Dogac and Chen, 1983]) implies a special SIC,which is called Critical_Relationship_Occurrence SIC in this research. That is, the existence ofan occurrence of an entity type E depends upon the existence of exactly one critical occurrence of arelationship type R, via which it related to another entity type F. An occurrence of E cannot exist ifits related critical relationship occurrence of R does not exist though it may still participate in otheroccurrences of R. For example, each employee should be assigned to at least two projects, and exactly oneof these would be critical. In order to model such a situation, the relationship type R should have a specialattribute, e.g., “Criticality”, which is a binary variable to indicate whether a relationship occurrence iscritical.371f we model “time” as an entity type, all other entities and relationships would connect with “time”via at least two special relationship types “has_creation time” and “has_deletion time” ([Studer,1988]). The E-R diagram would become very complicated.chapter 5. An Extended E-R Model Incorporating Semantic Integrity Constraints 91No Ternary or Higher Degree of Relationships Ternary relationships and relationships of higher degree are not discussed in this dissertation because of their additionalcomplexity38. Instead, ternary relationships or relationships of degree greater than twoare simulated using binary relationships ([Kent, 1977] )39.No Recursive Relationships The relationships discussed in this dissertation arebinary relationships involving two distinct entity types. Recursive relationships, whichare relationships involving only one entity type, are simulated by binary relationships.That is, one or two40 subtypes will be created for the entity type participating in arecursive relationship type. Such a simulation avoids the need to attach an explicit “role”when referencing an entity type participating in a relationship type. These subtypes alsomake the semantics clearer in most cases41. When an entity type is required to participatein both roles of a recursive relationship, such a distinct subtype, although it may beredundant, will be helpful for understanding the semantics. We may have some problemswhen an entity type is required to participate in both roles of a recursive relationship38For example, the cardinality specifications are more complicated. In a ternary relationship, thereare twelve pairs of minimum and maximum cardinalities according to the cardinality definition given byLenzerini and Santucci [1983].39That is, in the case of ternary relationships among entity types, the first binary relationship isformed between two entity types, the second binary relationship is then formed to connect the firstbinary relationship with the remaining entity type. Since the participants of a relationship type shouldbe entity types, the above means that the first binary relationship should become an entity type. Forexample, suppose that there is a ternary relationship Order among three entity types, Parts, Warehouses, and Suppliers. We will model the situation as four entity types, Parts, Warehouses, Allocation,Suppliers, and three binary relationship types connecting Allocation with Parts, Warehouses, Suppliers,respectively. Allocation is a new entity type converted from the original binary relationship betweenParts and Warehouses.400ne subtype is sufficient to replace a recursive relationship with a binary relationship although adatabase designer may prefer to have both. For example, in the relationship type Supervise, “Employee(0, *) Supervise Employee (0, *)‘ where * stands for some positive number other than 0, one subtype(e.g., Supervisor) is sufficient.41Compare to an extreme case where there is only one entity type “Thing” in the database and allrelationships are recursive.Chapter 5. An Extended E-R Model Incorporating Semantic Integrity Constraints 92and those two roles are not distinct42. However, such cases are very idiosyncratic. Forsimplification, this research is limited to handling binary relationships with two distinctentity types.5.2.2 Data AbstractionsA data abstraction is a simplified description of a system that suppresses specific detailswhile emphasizing those pertinent to the problem. Like other extended E-R models inthe literature, the E-R-SIC model provides three kinds of data abstractions: inclusion,aggregation, and association.Inclusion AbstractionThe inclusion abstraction concept ([Goldstein and Storey, 1990]) encompasses classification, generalization, and specialization. Classification is a form of abstractionin which a type is defined as a collection of occurrences with common properties. Specialization occurs when every occurrence of a type is also an occurrence of another type.Specialization is indicated by the term, is_a, that is, S is_a G, where S is a subtype and Gis a super-type. It is possible to have generalization or subset hierarchies ([Teorey, et al.,1986]). A generalization hierarchy occurs when a type is union of non-overlapping subtypes. A subset hierarchy occurs when a type is union of possibly overlapping subtypes.Property inheritance, which means that attributes, associated relationships and imposedSICs of the super-type are inherited by each subtype (or a type’s are inherited by itsoccurrences), is the most important characteristic of the inclusion abstraction. In the42That is the relationship is symmetric, e.g., “Persom Fri emdship Person”Chapter 5. An Extended E-R Model Incorporating Semantic Integrity Constraints 93case of the classification abstraction (related occurrences to types), the property inheritance principle has been enforced traditionally by any DBMS. This research assumes thatthe DBMS (either in the E-R model or the relational model) would also automaticallyimplement property inheritance for the specialization abstraction. That is, the principleof redundancy minimization. i.e., properties that can be inherited from some other entity type via an is_a relationship should not be stored explicitly ([Goldstein and Storey,1990]), has been followed. Otherwise, the inherited attributes, relationships and imposedSICs would need to be explicitly and redundantly stored for each specific subtype andother SICs would be required to assure that these properties have been really “inherited”! However, there is one exception to the use of redundancy minimization principle.Although the primary key of the super-type may not be chosen as the primary key of thesubtype, it is indeed an attribute and candidate key of the subtype. In order to make thesemantics of a subtype entity clearer and have the same related SIC(s) regardless of thechoice of the primary key, the primary key of the super-type is suggested to be stored inthe subtype.Aggregation AbstractionAggregation is an abstraction that allows a relationship between objects (i.e., attributes,entities, relationships) to be considered as a higher level object ([Smith and Smith,1977b]). The term, component_of, is used to indicate the aggregation, that is, C cornponeritof A, where C is a component and A is an aggregate. The E-R-SIC model provides composite entity aggregation that is an abstraction in which an entity containsother dependent entities and some attributes as real components43.The aggregate entity431n addition to composite entity aggregation, aggregation abstractions can be classified into fourother kinds based on discussions about aggregation in the literature. (1) Attribute aggregation is theabstraction in which an attribute may be defined as the aggregation of attributes. This research doesChapter 5. An Extended E-R Model Incorporating Semantic Integrity Constraints 94“owns” these other entities ([Lee and Lee, 1990]). That is, the existence of these otherentities is dependent on their aggregate and are owned by exactly one aggregate (i.e.,the cardinalities of any component in the component_of relationship are always (1,1)).Although some researchers argue that in aggregation the inheritance is upwards ([Brodie,1983, p. 576], [Mees and Put, 1987], [Potter and Kerschberg, 1988]), this research argues that it is probably more suitable to state that the aggregate “owns” componentsand components “own” their attributes, so the aggregate “owns”, rather than “inherits”, components’ attributes. For example, we may state “a car owns engine.weight,engine, brand, and brake. brand, etc. “.Association AbstractionAssociation is the abstraction in which a collection of member objects is consideredas a higher level (more abstract) object ([Brodie, 1983]). The term member_of isused to indicate the association, that is, M member_of 5, where M is a member and Sis a set. Brodie [1983] states “as with aggregation, the inheritance goes upward” andsome researchers (e.g., Mees and Put [1987]) even state that association may supportboth upward and downward inheritance. This research takes the position that there isno inheritance in association. That is, is distinguished from type and a “set” inassociation is considered to represent a pure mathematical set.not consider the hierarchical structure of attributes. (2) Simple entity aggregation is the abstractionin which an entity is defined by aggregation of attributes. This is the traditional entity concept. (3)Complex entity aggregation is the abstraction in which an entity is defined by aggregation of attributes,other independent entities, and probably sets. The aggregate object does not really “own” them ascomponents. That is, the cardinalities of these “components” in the component_of relationship maynot be (1,1). This aggregation may convey less semantics because these “relationships” between theaggregate object and its “components” are all implicitly named as “component-of”. Therefore, the E-RSIC model does not adopt it. (4) Relationship aggregation is the abstraction in which a “relationship” isobtained by aggregation of entities and some attributes. It is just another way to represent a relationship.chapter 5. An Extended E-R Model Incorporating Semantic Integrity Constraints 95Before Brodie mentioned association, dos Santos et al. [1980] had already proposed auseful data abstraction “correspondence”, which was later referred to by Furtado andNeuhold [1986] as “grouping”. Grouping is more general than association. It createsa group of sets, i.e., grouping is an abstraction that defines a new entity type in whicheach occurrence is a set formed from a collection of occurrences of the source entity type.According to the correspondence idea in [dos Santos et al.,1980}, a group of sets isformed by an indexing mechanism. Applying the idea of correspondence, the E-R-SICmodel provides three kinds of associations as below.1. Natural Set Association: A set, S, is defined to contain all occurrences of anentity Mtype. Classification is the indexing mechanism to form a set. For example,we have: “Employee member_of Employees” where Employees is a set consisting ofall Employee occurrences in the employee type; or “Department member_of Departments “.If these sets only have attributes that are derived from those of their members, theirexplicit representations may be redundant and may not be efficient after consideringthe enforcement of SICs. However, there may be a priori attributes, for example,Employees.Representative. In the whole database, there are a number of such sets,e.g., Employees, Departments. Since they may have different derived attributes anda priori attributes, they form different entity set types containing a single occurrence, respectively. This association abstraction relationship, memberof, shouldnot have its own attributes and need not be explicitly represented.2. Indexing Derived Set Association: This is the original correspondence abstraction discussed by dos Santos et al. [1980]. The indexing mechanism44 can be an44Formally, Furtado and Neuhold [1986] define grouping as below. “If T designates some entity setChapter 5. An Extended E-R Model Incorporating Semantic Integrity Constraints 96indexing attribute that is an attribute of the indexed entity type, or an indexingentity type that is related to the indexed entity type via an indexing relationshiptype. An example is cosets of employees of the same age, where Employee.Age is theindexing attribute for the indexed entity type Employee. If the indexing attributeis an attribute that disallows null (“unknown”), the cosets obtained by groupingwould form a partition of the indexed entity type occurrences. Another example isshown in Figure 5.1, where DSis the indexing derived set (e.g., cosets of employeeswho work_for the same project), M (e.g., Employee) is the indexed entity type, I(e.g., Project) is the indexing entity type, and R (e.g., Work_for) is the indexingrelationship. Grouping is a powerful abstraction. There may be more than oneindexing type, which can be combined with indexing attributes as the indexingmechanism. Although we can get a group of sets from the indexing mechanism, wemay only be interested in some of these (e.g., the set of employees who work_forproject p100).In general, this association abstraction relationship, member_of does not have itsown attributes and need not be explicitly represented. This kind of set has someattributes derived from the indexed entity type, some are defined a priori. Twokinds of attributes in a set are important for membership derivation: the indexingattribute (e.g., Employee.Age) and the primary key of the indexing entity type (e.g.,Project. Id).3. Enumerated Set Association: There is no indexing mechanism in this kind ofassociation because the database designer does not know or does not care aboutand T1,T2, ..., T, are either value sets associated with T or entity sets related via some relationship withT, then the Grouping operator denoted by T1,T2, ..., T {T} constructs a new (grouped) entity set TGwhere each element is a set of entities of T such that inside of one such set all entities have the samevalues and related entities from the entity sets T1 , T2 ..., T associated. The types T1,T2, ..., T will becalled indexing types, T the basis.”Chapter 5. An Extended E-R Model Incorporating Semantic Integrity Constraints 97_rof DSIFigure 5.1: Grouping: Member (M), Derived Set (DS), Indexing Entity (I), IndexingRelationship (R)the indexing entity types or attributes. The set membership can only be explicitlyenumerated by the member_of relationship.5.2.3 Basic Properties of SICsBased upon the ontological concepts of Bunge’s formalism, Wand and Weber ([1988];[1989]; [1990]) develop a formal model of the deep structure of an information system.Their work begins with a fundamental premise that is an adaptation of Newell andSimon’s [1976] physical symbol system hypothesis. Although as described in Section 1.3the SICs in this research are only parts of their “laws”, we can borrow this premise todevelop the SIC concept.Premise 5.1 A physical-symbol system has the necessary and sufficient properties torepresent real-world meaning.Chapter 5. An Extended E-R Model Incorporating Sernantic Integrity Constraints 98An information system is a physical-symbol system. A database is part of an information system and consists of attribute, entity and relationship occurrences. In orderto represent data integrity semantics, there may be constraints restricting the existenceor change of these occurrences. Some constraints may specify necessary conditions thatmust hold for an occurrence to exist, not exist or change in a database. Other constraintsspecify sufficient conditions, which if true, imply that an occurrence must exist, not exist,or change in a database. Thus, we can interpret a constraint as follows.A SIC is an assertion a sufficient or necessary condition for an occurrence of an attribute, entity, or relationship type to exist, not exist, or changein a database.What are these conditions?• Although these conditions are usually specified for an entity or relationship type orits attributes, they are actually restrictions on the insertion, deletion, or update ofits occurrences in a database, rather than on the addition or removal of the typefrom the database schema.• A SIC is defined intensionally in a database schema rather than extensionally. Theconditions apply to an occurrence by virtue of the fact it belongs to an entity,relationship, or attribute type. However, the conditions for entities or relationshipsmay only be relevant to some occurrences of the specified type. These occurrences,indeed, can be considered to belong to an “implicit subtype”.Chapter 5. An Extended E-R Model Incorporating Semantic Integrity Constraints 99Reasons of having implicit subtypes. By applying the classification abstraction([Schrefi et al., 1984]) those entity/relationship occurrences with common properties form an entity/relationship type; and by the principle of property inheritance every entity/relationship occurrence should have exactly all the attributesof the entity/relationship type and conform to the SICs associated with the entity/relationship type ([Knuth et al., 1988]). However, these occurrences may notbe “homogenous”. That is, there may be special SICs on some occurrences. Onecould create a subtype for only those special occurrences. By taking this approach(similar to [Dampney, 1988]) we could avoid a number of special types of SICs associated with “implicit subtypes”. However, since there may be a number of suchspecial SICs, there would have to be a corresponding number of such subtypes ina database. The organization may have no intrinsic interest in these explicit “subtypes”. In this research, the subtypes will be created only if they are meaningfulto the organization or if they are needed to eliminate recursive relationships. Thatis, we have the following premise:Premise 5.2 The occurrences of each entity or relationship type are not homogenous; that is, they have different attribute values, and are related to different occurrences of other objects.This implies that a database designer could not specify all entity and relationshiptypes such that all occurrences of each type satisfied the same set of SICs. That is,“implicit subtypes” with unique SICs always exist.An entity subtype may be implicitly defined by restricting attribute values of itsoccurrences or by requiring occurrences to participate in some relationship types45.451n another semantic data model, SDM, there are four ways to define “sub-classes”: attribute-defined,user-specified, set operator-defined, and existence-defined ([Hammer and McLeod, 1981], [Urban and Delcambre, 1986]). A user can decide what occurrences the subtype will contain. From the SIC perspective,Chapter 5. An Extended E-R Model Incorporating Semantic Integrity Constraints 100A relationship subtype may by implicitly defined by restricting attribute values ofits occurrences, by requiring its participating entity occurrences to participate insome other relationship types, or by restricting attribute values of its participatingentity occurrences. A condition for an entity or relationship type can be consideredas the definition of an implicit subtype if there are other conditions and this condition provides a criterion to decide whether an occurrence satisfies other conditions(that are special SICs for the “implicit subtype”).• These conditions may be positive or negative. A positive condition requires thatsomething must happen. A negative condition requires that something musthappen. For attributes, by simply reversing the comparison operator (e.g., reversing“not E.A> value” to “E.A < value”), we could consider only positive conditions.However, we must consider both positive (e.g., participate_in) and negative (e.g.,not participate_in) conditions for relationships.• These conditions may be simple assertions, or complicated arithmetic formulas.To be considered a single SIC, a condition should be “atomic”, i.e., indivisible.Otherwise, it is really two or more SICs. That is, usually we need not considerconjunctions of conditions. However, there may be some “further restrictions” thatare conditions on some basic restrictions. For example, there may be a SIC suchas, “if the salary of an employee is greater than $40,000, he (she) must participatein some project, which is projecti”, in which the last part of the statement is afurther restriction on the participation in a project.• These conditions may only require that any one occurrence of a specific type exist, or more strictly, may require that at least, at most, or some exact number ofthe semantics are embedded in the user’s mind and it is left to the user to maintain the database in orderto reflect real-world changes. If a subtype is defined by some set operation, the definition is explicit.The ideas of the “attribute-defined” and “existence-defined” “sub-classes” in SDM are adopted here.Chapter 5. An Extended E-R Model Incorporating Semantic Integrity Constraints 101occurrences of the specific type exist. That is, the restriction may be qualitativeor quantitative. Since an attribute type has exactly one occurrence per entity orrelationship occurrence, no quantitative requirement is placed on attributes. Thequantitative requirements are more important for a relationship type. They arethe relative cardinalities of a relationship type — restrictions on the minimum,maximum, or exact number of relationship occurrences in which an occurrence ofthe specified entity type participates; or requirements that for each occurrence ofthe specified entity type, all occurrences of the other entity type relate to it via therelationship type.• These conditions may directly restrict the value of an attribute of a single entity orrelationship occurrence, or may restrict the aggregate value of an attribute for a setof entity or relationship occurrences. All occurrences of an entity or relationshiptype (or its “implicit” subtype) naturally form a set (although the database designermay not explicitly define it). The most important set properties are the countingnumber, minimum, maximum, summing and average values.• Although a condition may be asserted for occurrences of one entity or relationshiptype, it might also be for occurrences of a group of types taken together.• These conditions may explicitly involve time to restrict or trigger the existence,nonexistence, or change of an occurrence, or may assert a temporal sequence between the existence of occurrences46.46Without time-stamped past data, the temporal requirement would imply “no time lag” between theexistence of adjacent occurrences in a sequence of objects.Chapter 5. An Extended E-R Model Incorporating Semantic Integrity Constraints 102Systematic Modelling A naive approach to modelling SICs would enumerate all necessary and sufficient conditions for the existence, nonexistence and change of the occurrences of each attribute, entity, and relationship type. However, if a SIC involves severalobjects, we need not specify it several times. For example, suppose that a SIC requiresthat if an RX occurrence exists for an occurrence of the specified entity occurrence, anRY occurrence must also exist. We would specify this SIC when we identify necessaryconditions for the existence of RX. Although the existence of RX is also a sufficient condition for the existence of RY, it need not be incorporated again as a general constraint ifwe properly represent the above SIC (i.e., by decomposing it into two sub-SICs in termsof the Representation model). It would be desirable to be able to guarantee that allconditions on all objects have been completely covered although we need not explicitlyconsider all kinds of conditions for each object. The systematic modelling procedureproposed in the remainder of this chapter and the decomposition algorithms proposedin Chapter 7 are used to achieve this goal. By applying the procedure described in Sections 5.3 to 5.6 a database designer can model SICs systematically by first consideringthe necessary and sufficient conditions for the existence, nonexistence, and change of eachattribute of each entity and each whole entity in isolation, and then examining the wholeE-R diagram by considering each explicit or implicit relationship.Representation Languages Although SICs incorporated by the E-R-SIC model candirectly be represented in terms of the SIC Representation model, it is suggested thatthey are first represented in a simplified format using rules or expressions. The purposeof using this simplified format language is to represent the precondition and predicatecomponents first in a language that is closer to natural language. The relevant objects andoperation types can be derived from the simplified format by applying the algorithmsChapter 5. An Extended E-R Model Incorporating Semantic Integrity Constraints 103introduced in Chapter 7. The certainty factor and violation action specifications areclosely related to the objects and operation types, and can be added later.The simplified format contains the preconditions, predicates, and some operationtype information (by the keywords, is_to_be_deleted, and is_to_be_updated for operation-dependent SICs), hut not the certainty factor and violation action. For example, we mayrepresent a requirement on the salary of any programmer first in the simplified formatas:if Employee. Title= “programmer”then Employee.Salary 18000.BNF descriptions of the simplified format are described in Appendix C. For convenience,this dissertation describes a SIC as being in rule format if it can be written by using thekeywords if and then (and the “if” condition part is not empty) in the simplified format.Since a relationship type name usually can be a verb phrase, when a relationship typeis discussed, its participating entity types will be mentioned too. Because a relationshiptype involves two entity types, we need a with_respect_to keyword to clarify to whichentity type we are referring47. An assertion may have a temporal modifier (before orpreviously) in a rule describing a temporal sequence requirement between the existenceof the object in this assertion and the existence of the object in another assertion. If a“before” modifier is attached to an assertion, the assertion must be true for its involvedobject at the time another assertion is becoming true48. If a “previously” modifier isattached to an assertion, at the time the assertion becomes false for its involved object,47For example, “if with_respect_to E, E RX F then with_respect_to E, -‘(E RY F)” is a SIC to assertthat RX and RY are mutually exclusive relative to E only.48For example, we may have: “if Dealer Own Car then Manufacturer Possess Car before”.Chapter 5. An Extended E-R Model Incorporating Semantic Integrity Constraints 104another assertion must become true49.SICs written in the simplified format should be later reformulated in terms of theSIC Representation model by applying the algorithms introduced in Chapter 7. Most ofdesigner-specified SICs are general SICs. These general SICs must then be decomposedinto several operation-dependent sub-SICs in terms of the Representation model. Forexample, the static constraint in the simplified format shown on the preceding page willbe represented as three operation-dependent SICs — one each on update of Salary, updateof Title, and insertion of Employee. At this time, the certainty factor and violation actionspecifications are added.SIC Types The E-R-SIC model can be used to classify SICs into a number of domain-independent SIC types. This SIC type classification facilitates development of SIC consistency and nonredundancy rules that can be applied to make SIC elicitation more efficient.It also allows us to present generic SICs for some common SIC types so that the numberof SICs that must be specified using the Representation model can be greatly reduced.Appendix D provides a detailed description of this classification.5.3 Entity Attribute SICsFirst examine necessary and sufficient conditions for the existence/nonexistence/changeof each attribute of each entity in isolation.Necessary Conditions An attribute cannot become nonexistent once it exists. Therefore, we only need to consider necessary conditions for its existence and change.49For example, we may have: “if Manufacturer Possess Car previously then Dealer Own Car”.Chapter 5. An Extended E-R Model Incorporating Scmantic Integrity Constraints 105Associated Entity Type SICs If an attribute exists, its value may be restrictedto be in a data type and in some range, and/or may be specified as not-null that meansthe “unknown” value is not’ allowed. In addition, its value must be expressed in somespecified format that is meaningful to the organization. If an attribute is allowed to beupdated, it must be specified as changeable. All attributes (including primary keys) ofrelationships or entities are updatable unless declared otherwise. If it can be updated,there may be special SICs restricting the pairs of before and after values.Associated Entity Set SICs Because an attribute type is defined as a functionfrom an entity type to value, it has only one occurrence per entity occurrence, i.e., itis single-valued, not multi-valued. We need not consider the attribute set. However,because an attribute belongs to an entity, there may be some SICs restricting entity setproperties. They may require that each attribute value he unique. The restrictions onminimum, maximum, summing, or average value of the attribute in an entity set mayalso be specified.Primary Key Problem In the E-R-SIC model, following the traditional E-Rmodel and relational model, there is no internal identifier (“surrogate”) to representan entity or relationship. Rather, some attribute or combination of attributes is used asthe primary key. Unfortunately, this overloading causes the semantics implied by an update of a primary key to be ambiguous it may imply a simple update of an attributeor it may imply the deletion of an “old” entity (relationship or tuple) followed by aninsertion of a “new” one. The update of a primary key is allowed in the E-R-SIC model.However, the SICs related to deletion and insertion must be enforced.Chapter 5. An Extended E-R Model Incorporating Semantic Integrity Constraints 106Time Restriction There may be some SICs restricting a time-valued attribute.The expression of such SICs may involve an explicit Current_time variable50.Sufficient Conditions An attribute type is included by a database designer to satisfyorganizational needs. Because the “not applicable null” is not allowed, for each entityoccurrence, there is at least one attribute occurrence. Therefore, no SICs can be expressedas sufficient conditions for the existence or nonexistence of an attribute considered inisolation.If we consider a single attribute in isolation, neither is there any sufficient conditionfor attribute change except time-triggering. That is, an increment of the Current_timemight trigger an update to an attribute. For example, at 0:00 on 1/1/1993, increase thesalary of each employee by $1,000.5.4 Entity SICsExamine necessary and sufficient conditions for the existence/nonexistence/change ofeach whole entity type in isolation.Necessary ConditionEntity Type SICs If an entity occurrence exists, there may be a formula5’amongits attributes. In general, any entity occurrence can be deleted to reflect the real world50For example, Current_time (Employee.FirstWorkDate + “2 years”) is an expression to assert arule that “each employee must have at least 2 years working experience”.511f there is more than one attribute appearing in an expression, we call that expression a formula.Chapter 5. An Extended E-R Model Incorporating Semantic Integrity Constraints 107situation. However, there may be SICs that require some of its attributes to have certainvalues before the deletion may take place.Entity Set SICs Because an entity occurrence belongs to an entity set, there maybe SICs restricting set properties. They may specify the maximum number of occurrencesof an entity type that are allowed to exist in the database. The concatenated values ofsome attributes may be required to be unique. Some aggregate values of attributesmay be required to be interdependent52,or more strictly, satisfy a formula. There is norestriction on the minimum number of occurrences that can exist in the database becausein the beginning there are no occurrences, and we should allow that an entity type mayhave no occurrences temporarily even after the database is already populated.There are some further complicating factors such as the following:The First Complicating Factor Implicit Subtype The single attribute SICsin Section 5.3 and the SICs mentioned above in this section may be conditional. That is,they may include conditions that can be thought of as defining “implicit subtypes”. Ifwe consider a single entity type in isolation, the only way to define “implicit subtypes” isby some range restrictions or formulas involving its attribute values. It is possible thata SIC (e.g., nonvolatility, value restrictions, etc.) applies to a single attribute only fora specified “implicit entity subtype”. It is also possible that a SIC (e.g., restriction onthe maximum number of occurrences, etc.) applies to an occurrence only because thisoccurrence belongs to a specified “implicit entity subtype”.specifications52For example, if min(E.A1)> vi then avg(E.A2) > v2.Chapter 5. An Extended E-R Model Incorporating Semantic Integrity Constraints 108The Second Complicating Factor Time Restriction There may be SICsthat restrict other attributes when time-valued attributes satisfy certain conditions. Astime passes, the restriction will either finally disappear or come into force53. Theseexplicit time restrictions may be added to all SICs mentioned in this section and inSection 5.3. It is possible to combine time restrictions and “implicit subtypes” factors ina single SIC.The Third Complicating Factor — Temporal Sequence among AttributesThere may be some temporal conditions among the attributes of an entity. They mayrequire that an entity occurrence’s attributes must satisfy a certain condition at thetime one of its other attributes is going to acquire some value54. It is also possible torequire that if its other attribute(s) satisfied a certain condition in the past (and no longersatisfies it now), one of its attribute must take some value.Sufficient Conditions An entity type exists in the database only because the databasedesigner includes it in order to satisfy the interests of the organization. It would be veryunusual to have a SIC requiring a specific entity occurrence to exist or not to existin a database. Therefore, we exclude the possibility of expressing a SIC as a sufficientcondition for the existence or nonexistence of an entity occurrence considered in isolation.If we consider a single entity type in isolation, neither is there any sufficient conditionfor entity change except time-triggering. That is, for an entity occurrence, an incrementof the Current_time might trigger an update to one of its attributes if its other attributes53For example, we may have a SIC like “an employee cannot receive a salary raise during his (her) first6 months in the company”, or “if an employee has worked for at least two years, he (she) must have atleast 10 vacation days”.54For example, we may have a SIC: “if E.A1=vl then E.A=v2 before”.Chapter 5. An Extended E-R Model Incorporating Semantic Integrity Constraints 109satisfy some condition(s).5.5 Relationship SICsNow traverse an E-R diagram and consider the necessary and sufficient conditions for theexistence/nonexistence/change of each relationship. In terms of topology, the importantlocal contexts describing how relationships and entities are connected together in an E-Rdiagram are55: line, star, loop-2, and loop-n. Each relationship type is in one or more ofthese contexts. When incorporating SICs for a relationship, the database designer needsto recognize its contexts.Definition 5.5 If an entity type participates in a group of relationship types, it is asharing entity type to them.Definition 5.6 If in a part of an E-R diagram, each entity type participates in at mosttwo relationship types and there is no cycle (loop) among these entity types, e.g., Figure 5.2, each of these relationships is in a line context.Definition 5.7 If there is a sharing entity type participating in three or more relationship types and there is no loop among these entity types, e.g., Figure 5.3, each of theserelationships is in a star context.Definition 5.8 If there is a loop between two entity types, i.e., two or more relationshipsexist between two common entity types, e.g., Figure 5., each of these relationships is ina ioop-2 context.551f we allowed explicit recursive relationships, they would be in the loop-i context.Chapter 5. An Extended E-R Model Incorporating Semantic Integrity Constraints 110£_____ _____F++__HiFigure 5.2: A Line Layout ContextChapter 5. An Extended E-R Model Incorporating Semantic Integrity Constraints 111Figure 5.3: A Star Layout ContextERZ RX RYFFigure 5.4: A Loop-2 Layout ContextChapter 5. An Extended E-R Model Incorporating Semantic Integrity Constraints 112E__________H__________FFigure 5.5: A Loop-n Layout ContextDefinition 5.9 If there is a loop among n3 entity types, e.g., Figtre 5.5, each of theserelationships forming the loop is in a loop-n context.5.5.1 Necessary ConditionsIt is necessary to examine the necessary conditions for one or multiple occurrences ofa single relationship type t.o exist (not exist/change), and for occurrences of a group ofrelationship types to co-exist.Single Relationship Type SICsFirst we focus on a single relationship type.Focus on Attributes A relationship is similar to an entity because it can have its ownattributes in addition to the primary keys from the participating entities. Therefore, arelationship can have single attribute SICs similar to those discussed in Sections 5.3 andChapter 5. An Extended E-R Model Incorporating Semantic Integrity Constraints 113single relationship SICs56 such as those of Section 5.4. Since a relationship’s attributes areusually few, such SICs are not common. However, it is possible that there may be somemore complicated SICs57 because the definition of an “implicit relationship subtype”could be based on some conditions on one or both participating entity occurrences (e.g.,having some attribute values, or participating in some other relationship types), or someformula involving the attributes of the relationship and its participating entities.Focus on Association In another sense, a relationship is much different from anentity because a relationship occurrence represents the association between two entityoccurrences.Inherent SIC First, an inherent SIC is that the relationship occurrence’s participating entity occurrences must exist.Restricted-connecting Set There may be some restrictions on constructing arelationship, i.e., the necessary conditions for the existence of a relationship occurrence.Definition 5.10 Suppose that the relationship type R connects the entity types E withF. The restricted-connecting set of an entity occurrence of E in the relationship typeR is those occurrences of F that the entity occurrence of E is allowed to be related to viaR.561n contrast to an entity type, absolute maximum cardinality for a relationship type need not beconsidered although some researchers mention it. As shown by [Lenzerini and Santucci, 1983], theabsolute maximum cardinaJity of a relationship is bounded by the absolute maximum cardinalities andrelative maximum cardinalities of participating entity types.57For example, there might be a SIC stating that “if Employee Work_for Project, Project_Id=plOOthen Work_for.Hours 50”.chapter ö. An Extended E-R Model Incorporating Semantic Integrity Constraints 114Normally, a DBMS would not know the restricted-connecting set of each entity occurrence in each relationship since a SIC is seldom specified extensionally. However, theremay be some general business rules that restrict intensionally the restricted-connectingsets. The restrictions may be positive or negative. The restrictions on the freedom toconstruct a relationship include:1. One-side condition. The restriction may be on one entity type. It would imply that some occurrences of each participating entity type are not allowed toparticipate in the relationship type. That is, their restricted-connecting sets areundefined. The relationship type R is in fact defined on some “implicit entity subtype(s)” of E or F. The “implicit entity subtypes” may be defined positively, i.e.,as occurrences having some specific attribute values, having attributes satisfyingsome formula, participating in some other relationship types; or negatively, i.e., occurrences not participating in some other relationship types. We need to considerpossible disjunctions of conditions in a star context because there are at least twoother relationship types58.2. Two-side condition. The restrictions may be coupling conditions on both entitytypes at the same time. If these are positive restrictions, they are stronger than theabove one-side restriction. If these are negative restrictions, they are weaker thanthe above one-side restriction. Although all occurrences of each participating entitytype, E or F, may be allowed to participate in the relationship type, they are notfreely connected together. An example of a positive restriction is that each occurrence of an “implicit subtype” of E can only connect with those occurrences of some“implicit subtype” of F. An example of a negative restriction is that each occurrence of an “implicit subtype” of E cannot connect with those occurrences of some58For example, in Figure 5.3, we may have: “if E RX F then (E RY G) V (E RZ H)”.Chapter 5. An Extended E-R Model Incorporating Semantic Integrity Constraints 115“implicit subtype” of F. The “implicit entity subtypes” on both sides of a relationship may be defined interdependently as occurrences having some specific attributevalues59,having attributes satisfying some formula, participating in some otherlationship types (e.g., occurrences of E participating in RX and their connectedoccurrences of F participating in RY at the same time)60. Some SICs described inthe literature are just special cases. For example, a Subset_Relationship SIC([Palmer, 1978]) in a loop-2 context61 and the necessary conditions for the composition of relationships ([Lenzerini and Santucci, 1983], [Azar and Pitchat, 1987])in a loop-n context62,are special cases requiring that the specific occurrences of Fconnecting with an occurrence of Evia other relationship types (e.g., RY, RZ) be inthe restricted-connecting sets of the occurrence of E. An Exclusive_OccurrenceSIC ([MaFadden and Hoffer, 1988]) in a loop-2 context63 is also a special case requiring that the specific occurrences of F connecting with an occurrence of E viathe other relationship type (e.g., RY) llQ be in the restricted-connecting set of theoccurrence of E.3. Intra-relationship condition. It is possible that we can define a restricted-connecting set based on some properties of the relationship type. For example, wemay define a restricted-connecting set of an occurrence of E by requiring that if someoccurrences (e.g., having some attribute values) of Fare in the restricted-connectingset, some other occurrences (e.g., having some other attribute values) must or mustnot be in the set. A relationship type may also be symmetric or transitive.59For example, in a relationship Drive connecting Driver with Vehicle, even if it is total on both sides,some drivers (e.g., with class 5 driver-licence) can only drive some vehicles (i.e., with gross weight10,900 kgs).°°For example, there may be a SIC that state: if Employee Is_Allocated Car then if Employee Work_forProject then Car IsJnsured Collision_[nsurarice61For example, in Figure 5.4, we may have: “if E RX F then E RY F’.62For example, in Figure 5.5, we may have: “if E RX F then (E RY H) A (H RZ F)”.63For example, in Figure 5.4, we may have: “if E R F then -(E RX F)”.Chapter 5. An Extended E-R Model Incorporating Semantic Integrity Constraints 116The fundamental properties of any relationship type are assumed by default tobe irreflexivity, asymmetry, and intransitivity. However, some relationships maybe reflexive, symmetric, or transitive. The reflexivity of a relationship will notbe discussed in this research because it is quite unusual in a traditional database.Symmetric64 or transitive65 relationships may occur in a specialization hierarchy,that is, when the two involved entity types belong to the same super-type; or oneis a super-type (e.g., Employee), and the other is its subtype (e.g., Manager).There are some further complicating factors such as the following:1. The first complicating factor temporal sequence conditions. A SICmay require that for a relationship occurrence of a specified type to exist, one ofits participating entity occurrences must have attributes with certain values, orparticipate (or not participate) in other relationship type(s) at the time it is goingto participate in this relationship. Those occurrences of other relationship typesare allowed to be deleted after insertion of this relationship.2. The second complicating factor— quantitative requirements. For a relationship occurrence to exist, there may be a quantitative restriction on the maximum number of relationship occurrences of the same relationship type, in whichone involved entity occurrence has participated— that is the relative maximumcardinality to the specified entity type. If a necessary condition for the existenceof a relationship involves other relationship type(s), there may be some specialquantitative requirements that are cardinalities of those other relationship types66.64Some examples of such relationship types are: Sibling_of, Married_to, Partner_of.65Some examples of such relationship types are: Sibling_of, Ancestorof, Supervise, Partner_of.66For example, there may be a SIC such as, “if RX, E RX F then exactly 3 RY, E RY G”.Chapter 5. An Extended E-R Model Incorporating Semantic Integrity Constraints 1173. The third complicating factor — multiple occurrences. The above onlyconsiders necessary coilditions for the existence of one relationship occurrence.However, there may be some necessary conditions for the existence of multiplerelationship occurrences of the same type relative to a sharing entity occurrence67.Group Relationship Type SICsThe above focuses on the necessary conditions for the occurrences of a single relationshiptype although sometimes several relationship types may be involved in the conditions.In a star context, there are several relationship types with a sharing entity type. Ifseveral occurrences of different relationship types co-exist, there may he some necessaryconditions to require that the sharing entity must participate in some other relationships,have some attribute values, or there may he a formula among some attributes of thesharing entity and those relationships.Relationships Involved in Defining “Implicit Entity Subtypes”It is necessary to review those single attribute SICs in Section 5.3 and Single Entity SICsin Section 5.4 in the presence of relationships. For example, it is possible that a SIC(e.g., nonvolatility, value restrictions, etc.) applies to a single attribute only because itsassociated entity participates in some relationship(s). It is also possible that a SIC (e.g.,restriction on the maximum number of occurrences, etc.) applies to an entity occurrenceonly because the occurrence participates in some relationship(s).67For example, there may be a SIC such as, “if 3 exactly 3 RX, B RX F then 3 RY, B RY G”.Chapter 5. An Extended E-R. Model Incorporating Semantic Integrity Constraints 1185.5.2 Sufficient ConditionsAs stated in Section 5.2.3, if a SIC requires the existence of an occurrence of one relationship type to depend on the existence of an occurrence of the other relationship type,it would be incorporated when identifying necessary conditions for the existence of theoccurrences of the first relationship type. At this step, we need not specially incorporateit as a sufficient condition for the existence of the occurrences of the other relationshiptype. However, other SICs may specify sufficient conditions for a relationship to existinvolving the existence of an entity occurrence, time-triggering, or temporal sequencerequirements. They are as follows.The Existence of an Entity There may be SICs to require that if an entity occurrenceexists, it must participate in a specified relationship type. In this case, the existence of arelationship occurrence might be thought of as a necessary condition for the existence ofan entity occurrence. However, since “entity” is a more fundamental construct it is morenatural to think that the existence of an entity occurrence is the sufficient condition forthe existence of a relationship occurrence. There are some further complicating factorssuch as the following:1. The first complicating factor implicit subtype. It is possible that a relationship type must exist only for an “implicit entity subtype”. In this case, the definition of an implicit entity subtype may be based on the values of its attributes68.68j this case, the only focus is the existence of an entity. We need not consider SICs related to an“implicit entity subtype” that is defined as occurrences participating in other relationship types. ThoseSICs have been covered when we consider necessary conditions for the existence of other relationshiptypes.Chapter 5. An Extended E-R Model Incorporating Semantic Integrity Constraints 1192. The second complicating factor — quantitative requirements. We mayrequire not only the existence of any occurrence of the specified relationship type,but also specify the number of such occurrences. These are two kinds of relativecardinalities69:the requirement of the minimum number of relationship occurrencesin which an occurrence of the specified entity type must participate; or the requirement that for an occurrence of the specified entity type, all occurrences of the otherentity type are required to connect with it via this relationship type. Combiningthis factor with the first factor, we would have cardinalities for an “implicit entitysubtype”7°3. The third complicating factor — further restrictions. It is possible thatthere may be further restrictions on the existence of a relationship in addition to thecardiiialities. A SIC may require that if an occurrence of one entity type exists, itmust participate in some minimum number of occurrences of a specified relationshiptype, and there are further restrictions placed on the relationship attribute value(s),or on the occurrence(s) of the other participating entity type. SICs related toa weak entity type7’ and a Critical_Relationship_Occurrence SIC72 are justspecial cases73.69Note that as described in the previous section, a relative maximum cardinality is a necessary condition for the existence of a relationship occurrence. The exact number of the cardinalities is theconjunction of the requirements of the minimum and maximum cardinalities.70Again note that some subtype cardinalities, which are related to the entity participating in otherrelationship types, have been considered in the previous section.71By the semantics of the weak entity, the relationship type R, via which the weak entity type isdependent upon a regular entity type, is total to the weak entity type and its key is fixed. The fixedretention restricts the key of relationship from updating. Note that the relative cardinalities of the weakentity in the relationship type R are not necessarily (1,1), it may be (c,d) where d is a number greaterthan c, and c greater than 1. For example, a child as a dependent may have both father and mother asemployees in a company.72A Critical_Relationship_Occurrence SIC requires the totality constraint to the specified entitytype and the existence of exactly one critical relationship occurrence.T3here are many other examples such as Stronger Totality Constraints (refer to Appendix D).Chapter 5. An Extended E-R Model Incoiporating Semantic Integrity Constraints 1204. The fourth complicating factor — a group of relationship types. We maynot require the existence of an occurrence of a specific relationship type for anyoccurrence of the specified entity type. Instead, we may require the existence ofone relationship occurrence among a group of relationship types. Combining thisfactor with the above factors, we might have a more complicated SIC.Time-Triggering A single relationship, like an entity in isolation, may have time-triggered SICs to trigger its update.Temporal Sequence Conditions It is possible to require that if an entity occurrencehad some attribute value(s) in the past and no longer has it now, it must participate insome relationship(s). In the case of a group of relationship types, if some relationship(s)existed in the past, once it is deleted, other relationship(s) may he required to exist now.5.6 SICs Implied by Implicit Relationships and Data AbstractionsIn general, the E-R-SIC model requires us to express explicitly all relationship typesin an E-R diagram. Even in cases involving an ID dependency74,is_a or component_ofrelationships where the candidate key of one entity provides information about a relatedentity, the relationship type is required to be specified to make the semantics clearer.There are some exceptions (e.g., the cases that two entity types are exclusive; an entitytype may be formed by set operations on other entity types; or member_of relationshipsexcept for in the enumerated set association) that there is a SIC between some entity74An entity type (e.g., E) has “ID dependency” on other entity types if this entity type cannot beuniquely identified by its own attributes and has to be identified by its relationship types to the otherentity types ([Chen, 1985]). The entity type E needs the key of some other entity type (e.g., F) as apart of its key.Chapter 5. An Extended E-R Model Incorporating Semantic Integrity Constraints 121types, but we need not express the relationship type explicitly because no suitable relationship type can be specified or the relationship is derivable. This section considers theSICs implied by these “implicit” relationship types. In addition, this section discussesthe SICs implied by data abstractions since they are special relationships.Specialization Abstraction The specification of a specialization abstraction, “S is_aG”, implies that the relative cardinality constraints of is_a must be S (1,1) and G (0,1)([Goldstein and Storey, 1990]). Because the primary key of G (say G. Gkey) is a candidatekey of 5, there is a necessary condition, in addition to the existence of participating entityoccurrences, for a relationship is_a to exist, i.e., S. Gkey=G. Gkey.Entity Types in a Specialization Hierarchy In a specialization hierarchy, twoentity types may be exclusive, or an entity type may be formed by set operations (intersection, union, or difference) on other entity types ([Biller and Neuhold, 1978]). It is notnatural to define a relationship to express the exclusion between two entity types. Neitheris there an explicit relationship to express the set operations although is_a relationshipsamong those entity types should be specified.• SICs implied by exclusion between entity types. Suppose that two entitytypes E, F are exclusive and Ekey is their common candidate key. It implies a SICwritten in the simplified format as: if E.Ekey= Value then -i(F.Ekey_—Value).• SICs implied by set operations on entity types. Suppose E is formed by theoperation on two entity types F, G and Ekey is the common candidate key among7It is not meaningful to discuss exclusion or set operations if those entity types are not in the samehierarchy. In addition, if those entity types formed by set operations do not have special attributes orparticipate in special relationships, they may be redundant.Chapter 5. An Extended E-R Model Incorporating Semantic Integrity Constraints 122E, Fand G.— F = F fl G. There are some SICs implied by the isa relationships betweenthe subtype (E and the super-types F, G, respectively. In addition, it implies‘if (F Ekey= Value) A (G Ekey= Value) then E Ekey— Value”— E = F U G There are some SICs implied by the is_a relationships between thethe subtypes F, G and the super-type F, respectively In addition, it implies“if E Ekey== Value then (F Ekey= Value) V (G Ekey= Value)”— E = F — G There are some SICs implied by the is_a ielationships between thesubtype (E) and the super-type (F), and the exclusion between E and G. In addition, it implies “if F.Ekey= Value then (E.Ekey= Value) V (G.Ekey= Value)”.Inheritance Conflict Problem An inheritance conflict may occur when two ormore super entity types of a single specialization type have some attribute(s) with thesame name(s). In such a case, there should be a further higher level super-type of whichthese super-types are subtypes. However, the organization may not be interested in thishigher level super-type. If these attributes are really semantically different, we just prefixtheir names with the super-type names and no SIC needs to be specified. However, iftheir semantics are the same, the subtype should inherit only one of these attributes andthere must be an integrity constraint to maintain the equality of those attributes’ values.In addition, the SICs associated with the super-types must not be inconsistent with eachother, otherwise, the multiple inheritance will cause the subtype to be empty.Aggregation Abstraction The specification of a composite entity aggregation, Ccomponent_of A, implies that the cardinalities of component_of should be either C (1,1)Chapter 5. An Extended E-R Model Incorporating Semantic Integrity Constraints 123component_of A (O,n) or C (1,1) component_of A (n,m), where 77 1 and m n76.Because a candidate key of C is formed by the primary key (say A.Akey) of A plusprobably some other attributes, there is a necessary condition, in addition to the existence of participating entity occurrences, for a relationship component_of to exist, i.e.,C.Akey=A.Akey.Association Abstraction Association (grouping) is a powerful abstraction, but itsimplied SICs are complicated. In all cases except for the enumerated set association, themember_of is implicit since it can be derived. Their aggregate SICs should be speciallydealt with77. Suppose that agg_fcn stands for an aggregate function, Derived_Att is anattribute of the set, Att is its corresponding attribute of the members.1. Natural Set Association: Suppose that in the M memberof Ms, the memberofis implicit. We would have “Ms.Derived_Att=aggfcn(M.Att,)”. In addition, anoccurrence of Ms cannot be deleted if it, as a set, is not empty.2. Indexing Derived Set Association;• Only involving one indexing type: Suppose that in Figure 5.1, for Mmemberof DS, the indexing entity type is I with the key I.Ikey, the indexing761f the component C (e.g., Automatic_window_controller) is just relevant to the aggregate A (e.g.,Car), the aggregate A may have 0 or up to n components C. If the component C (e.g., Engine) ischaracteristic or identifying to the aggregate A (e.g., Car), component_of is total to the aggregate A.(Reference to [dos Santos, et al., 1980] for the formal definitions of “relevant”, “characteristic”, and“identifying”.) In either case, the component_of relationship should always be total to the componentC. In some applications, some “components” can exist independently, e.g., an engine may be sold as aseparate part. It would be better to define a subtype of the original “component” type to include thosecomponents that cannot exist independently so that the semantics become clearer. For example, wemay have a type Engine and its subtype ComponenL engine that belongs to cars. Those components aredifferent from other occurrences in terms of behavioral constraints and structural attributes.77The cardinalities of member_of in the case of natural set association and indexed set association canbe derived from the relative cardinalities of other relationship and absolute maximum cardinalities ofentities. So, they need not be considered.Chapter 5. An Extended E-R Model Incorporating Semantic Integrity Constraints 124relationship is R, member_of is implicit. We would have:“DS.Derived_Att=agg_fcn({M.Att M R I, DS.Ikey=I.Ikey})”.• Only involving indexing attributes: Suppose that in M memberof DS,M.Iridex is the indexing attribute, member_of is implicit. We would have:“DS. Derived_Att=agg_fcrq’{ M. Att DS. Index=M. Index}J’.In both cases, an occurrence of DS cannot be deleted if it, as a set, is not empty.In addition, DS.Ikey (or DS.Index) cannot be updated.3. Enumerated Set Association: In this case, M member_of ES, the member_of isexplicit. That is, we have:“ES.Derived_Att—agg_fcn({M.Att) M M_member_of_ES ES})”.5.7 Summary of the E-R-SIC ModelFigures 5.6 to 5.10 give a summary of the E-R-SIC model.Chapter 5. An Extended E-R Model Incorporating Semantic Integrity Constraints 125E-R DiagramAttributes?DB Designer SICs?Entities?think: Relationships?±Construct Definition:Definition 5.1: An entity is the database representation of a real-world objectthat can be distinctly identified.Definition 5.2: A relationship is the database representation of an association amongreal-world objects.Definition 5.3: An attribute is the database representation of a property of a real-world objector an association that is a function mapping an entity type or a relationshiptype to values.Definition 5.4: A SIC is a logical, invariant restriction on the static state of a database (that is,a collection of attributes, entities, and relationships), or on the database statetransition caused by an insertion, deletion, or update operation.Observation:• Entity, relationship, and SIC occurrences are classified into different entity, relationship, andSIC types according to some criteria.• At a particular moment, certain groups of entity, relationship, or SIC occurrences can beconsidered as sets in the mathematical sense, and these sets may have some aggregate properties.• Entity is the most fundamental construct among attribute, entity, and relationship.Simplifying Assumptions:No ternary relationships or relationships of higher degree.No recursive relationships.Data Abstractions: inclusion (classification, generalization), aggregation, and association.Premise 5.1: A physical-symbol system has the necessary and sufficient propertiesto represent real-world meaning.Interpretation: A SIC is an assertion— a sufficient or necessary condition— for an occurrenceof an attribute, entity, or relationship type to exist, not exist, or change in a database.Premise 5.2: The occurrences in each entity or relationship type are not homogenous; that is, theyhave different attribute values, and are related to different occurrences of other objects.Implication: A database designer could not specify all entity and relationship types such thatall occurrences of each type satisfied the same set of SICs.That is, “implicit subtypes” with unique SICs always exist.What are SICs that should be incorporated?Figure 5.6: What is in the database?Chapter 5. An Extended E-R Model Incorporating Semantic Integrity Constraints 126DiagramDB Designerthink:Examine each attribute of each entity in isolation.(That is, imagine a hypothetical situation:examine each attribute, say E.A1, and suppose that it is the only “object”of interest at this moment.)Consider:Are there necessary conditions that must holdfor E.A1 to exist, not exist, or change in the database?What are these necessary conditions:• associated single entity type SICs?existence: value, null, etc.?— change: changeable, new/old value pair?• associated whole entity set SICs?— unique?— aggregate value?• because we choose it as the primary key?• because of time restriction?Are there sufficient conditions, which if true, imply thatE.A1 must exist, not exist, or change in the database?What are these sufficient conditions: none except for time-triggering?Figure 5.7: Single Entity Attribute SICsChapter 5. An Extended E-R Model Incorporating Semantic Integrity Constraints 127DiagramDB Designerthink:Examine each whole entity in isolation.(That is, imagine a hypothetical situation:examine each whole entity type, say E, and suppose that it is the only “object”of interest at this moment.)Consider:Are there necessary conditions that must holdfor an E occurrence to exist. not exist, or change in the database?What are these necessary conditions:• single entity type SICs?— existence: a formula among attributes?— change: deleted entity attributes?• whole entity SICs?— the restriction on the maximal number of occurrences?— concatenated values of some attributes are unique?— a formula involving aggregate value(s)?aggregate values are interdependent?• because of an implicit subtype?• because of time-restriction?• because of temporal sequence among attributes?Are there sufficient conditions, which if true, imply thatan E occurrence must exist, not exist, or change in the database?What are these sufficient conditions: None except for time-triggering?Figure 5.8: Single Entity SICsChapter 5. An Extended E-R Model Incorporating Semantic Integrity Constraints 128E-R DiagramDB Designerthink:Examine each relationship type, say R, and consider its context in the whole E-R diagram.What are the contexts that each relationship is in the E-R diagram: line, star, loop-2, and loop-n?Consider:What are necessary conditions that must holdfor an R occurrence to exist, not exist, or change in the database?What are these necessary conditions:• single relationship SICs?similarly to a single entity, focus only on its attributes:* each of its single attribute has conditions in Figure 5.7?* the whole relationship has conditions in Figure 5.8?but could be more complicated since an implicit relationship type may be defined based onits participating entities or there may be a formula among those attributes ofthe relationship and its participating entity?— focus on its role as an association between entities:* inherent SIC: its participant entity must exist?* restricted-connecting set:one-side, two-side of its participating entities, or intra-relationship condition?© complicating factors:1. temporal conditions on relationships or the values of entity attributes?2. quantitative requirements on occurrences of the same or different relationship types?3. existence of multiple relationship occurrences of the same type?• group relationship type SICs: conditions for occurrences of several relationship types to coexist?• relationships involved in defining “implicit entity subtypes”, so review Figures 5.7 and 5.8 again?Are there sufficient conditions, which if true, imply thatan R occurrence must exist, not exist, or change in the database?Some are covered as necessary conditions on other relationships, what are other sufficient conditions?• the existence of an entity?© complicating factors:1. because of implicit entity subtypes?2. quantitative requirements on the existence of the relationship occurrences?3. further restrictions on the existence of the relationship occurrences?4. the existence of one relationship occurrence among a group of relationship types?• Time-triggering?• Temporal sequence conditions?Figure 5.9: Relationship SICsChapter 5. An Extended E-R Model Incorporating Semantic Integrity Constraints 129E-R DiagramDB Designerthink:Is there any SIC implied by some “implicit” relationships?If there is a SIC between any two entity types,in general we should express the involved relationship type explicitly in the E-R diagram.In what situations would we have some “implicit” relationship types?• these entity types are in the same specialization hierarchy?— two entity types are exclusive?one entity type is the intersection of other entity types?— one entity type is the union of other entity types?one entity type is the difference between two entity types?— multiple inheritance conflict problem?What are SICs implied by data abstractions?• generalization: S (1,1) is_a G (0,1),a necessary condition for is_a S. Gkey=G. Gkey• aggregation: C (1,1) component_of A (0,n) or A (n,m), n 1, m na necessary condition for component_of— C.Akey=A.Akey• association and grouping: member_of may be implicitly represented except forthe enumerated member_of;the involved relationships and entities are tightly related;SICs are complicatedFigure 5.10: SICs Implied by Implicit Relationships and Data AbstractionsChapter 6The Application of the E-R-SIC ModelThis chapter begins with a hypothetical example to show how the E-R-SIC model canbe applied. Potential pitfalls of using the E-R-SIC model are then discussed. Finally, wediscuss the usefulness of the E-R-SIC model.6.1 An Example of Using the E-R-SIC ModelDescription Suppose that we have an example called car_dealer.database_design. TheE-R diagram in shown in Figure 6.1. The database is intended to keep track of eachcar from the time it is ordered from the factory through its sale to a customer andsubsequent service history as long as the car is serviced by that dealership. Informationabout a (potential) customer is also kept. For simplicity, the possibility that a customermight transfer his (her) car to other persons is not considered.Attributes The attributes are not shown in the first level of the E-R diagram.They are as below78.• Entity Types:ThThe property inheritance principle is assumed. Therefore, the inherited attributes, e.g., Salesperson.Salary, are not listed.130Chapter 6. The Application of the E-R-SIC Model 131is..aFigure 6.1: An Example: A Car Dealership DatabaseChapter 6. The Application of the E-R-SIC Model 132— Person: SIN (key, i.e., social insurance number), Name, Sex, Address, Phone.— Customer: SIN (key).— Employee: EmpJd (key, i.e., employee identification number), Salary, Hire_Date.— Salesperson: Empld (key), BasicSalary, Commissioll.— Mechanic: EmpJd (key).— Car: Engine_No (key), Model, Year, Regular_Price, Cost.— Part: Code (key), Name, Price, Cost, Qty_on_hand, Reorder_Point, Date_Last_Used.— Service: Sequence_No (key), Date, Charge, Part_Fee, Labor_Fee, Discount_Amt.• Relationship Types:— Pre_Sell: no attributes.— Own: no attributes.— Sell: Discount_Rate, Total_price, Sell_Date.— Maintained_by: no attributes.— Installed_in: Qty_Used (i.e., the number of the related part installed duringa service).— Work_on: Hours (i.e., how much hours a mechanic spends on a service).The example does not include all SIC types. In addition, only some of the relevantSICs are illustrated. The SICs are written in the simplified format according to the BNFin Appendix C.Entity Attribute SICs Each attribute of each entity may have some SICs requiringit to be not-null, to take some specified value range, data type and format. The followingtwo attributes are used for illustration.Chapter 6. The Application of the E-R-SIC Model 133• Person.SIN: there are SICs to assert that a person’s SIN must be known,is of type non-arithmetic (i.e., may not be used in conventional arithmeticoperations) and has a format such as 123-56-789, i.e.,is..null(Person.SIN)=nosatisfy_datatype( Person. SIN, non_arithmetic)satisfy_format(Person.SIN, 999!999!999)• Employee.Salary: there are SICs to assert that an employee’s monthly salarymust be known, is of type arithmetic (i.e., all of the normal arithmetic operations may be performed on it in the usual way), and is in the range $1,500and $10,000 inclusive, i.e.,is_nuil(Employee. Salary) =nosatisfy_datatype(Employee. Salary, arithmetic)satisfy_value(Employee.Salary, [1500 .. 10000])Some attributes are not allowed to be updated. For example, the social insurancenumber of a person cannot be updated, i.e.79if Person.SIN is_to_be_updated then falseSome attributes have SICs to restrict changes in value. For example, a salesperson’sbasic salary cannot decrease; and the recently used date of each part is kept up-to-date,i.e.,new(Salesperson .Basic..Salary) old( Salesperson .BasicSalary)new(Part .DateLastUsed) old(Part .DatelastUsed)791n this section, it is assumed that there is no automated database design aid. If there is such asystem, the system, rather than the database designer, would write the SICs in the simplified format forlater reformulating them in terms of SIC Representation model.Chapter 6. The Application of the E-R-SIC Model 134Some attribute SICs relate to the entire associated entity set, for example, eachperson’s SIN is unique; and the sum of the monthly salary of employees should be lessthan or equal to $1,000,000, i.e.,unique (Person. SIN)sum(Employee.Salary) 1000000Entity SICs There may be some SICs that require the values of several attributes ofan entity to follow some formulas. For example, the salary of a salesperson includes twoparts — basic_salary and commission; the regular price of a car is set to have at least25% gross profit; and for each part, its quantity on hand must be always greater than orequal to its reorder point, i.e.,Salesperson. Salary = Salesperson .Basic.Salary + Salesperson. CommissionCar.Cost Car.RegularYrice x 0.75Part. Qtyon±and Part .ReorderPointThere are some SICs to restrict entity set properties. For example, the dealer canhave at most 30 mechanics, i.e.,count(Mechanic) < 30There are some SICs restricting attribute values as a function of time. For example,for those mechanics who have worked for at least two years, their salaries should notdecrease and the average should be at least $3,000, i.e.,if (Current_time — Mechanic.Hire_Date) “2 years”then new(Mechanic . Salary) old(Mechanic. Salary)Chapter 6. The Application of the E-R-SIC Model 135if (Current_time — Mechanic.Hire_Date) “2 years”then avg(Mechanic.Salary) 3000Relationship SICs Attributes of relationships may have similar attribute SICs. However, some restrictions on attributes of one relationship are more complicated. For example, a salesperson cannot offer more than 80% of the gross profit rate of a car as adiscount rate to a customer and the total price of a sold car is equivalent to its regularprice minus discount, then adding 13% tax rate, i.e.,if Salesperson Sell Carthen Sell.Discount_Rate (Car.Regular_Price— Car.Cost) ÷ Car.Regular_Pricex 0.80if Salesperson Sell Carthen Sell.Total_Price Car.Regular_Price x (1—Sell.Discount_Rate) x 1.13There are some restrictions on constructing a relationship. Some restrictions are one-side conditions. For example, the dealer only services those cars that are first owned bysome customers that are in the database, i.e.,if Car Maintained_by Servicethen (Customer Own Car) beforeOther restrictions are two-side conditions. For example, examine the loop formedby the three relationships, Pre_Sell, Sell and Own, there are only three non-redundantSICs to assert their dependencies: (1) if a salesperson sells a car, he (she) must havecontacted a customer, who then buys the car — there must be a “customer” involved;Chapter 6. The Application of the E-R-SIC Model 136any other “entity occurrences”, which are not in the “customer type”, cannot own thatcar (2) a customer can own those cars that have been pre-sold by a salesperson and thenactually sold by the same salesperson; (3) if a car is sold by a salesperson and is ownedby a customer then that salesperson must have contacted that customer transfer of thecar among customers is not recorded, i.e.,if Salesperson Sell Carthen Salesperson Pre_Sell Customer, Customer Own Carif Customer Own Carthen Salesperson Pre_Sell Customer, Salesperson Sell Carif Salesperson Sell Carthen if Customer Own Carthen Salesperson Pre_Sell CustomerSome conditions are complicated. For example, a part_fee including 6% tax is placedon the parts installed in those cars that have been bought more than 5 years, i.e.,if Part Installed_in Service,Car Maintained_by Service,Salesperson Sell Car,Service.Date— Sell.Date > “5 years”thenService.Part_Fee = sum({MCharge Part Installedln Service,MCharge= Part.Price x InstalledJn.QtyUsed x 1.06})Chapter 6. The Application of the E-R-SIC Model 137There can also be SICs expressing sufficient conditions for relationships. Those arethe relative minimum cardinalities in Figure 6.1. For example, any service should haveat least one mechanic working on it, i.e.,V Service, Work_on, Mechanic Work_on ServiceSICs in a Specialization Hierarchy There are some SICs implied by implicit relationships. For example, an employee cannot be both a mechanic and a salesperson,i.e.,if Mechanic.EmpJd=Valuethen —i (Salesperson .EmpJd=Value)There are also some SICs implied by is_a relationships. For example, “each mechanicis an employee” implies that relative cardinalities— Mechanic (1,1) is_a Employee (0,1),and a necessary condition for an is_a occurrence to exist—Mechanic. EmpId—Employee. EmpId.6.2 Potential Pitfalls of Using the E-R-SIC ModelThere might be some potential pitfalls that need to be avoided when using the E-R-SICmodel.Pitfall 1: Inconsistent and Redundant SICs Inconsistent or redundant SICs mightbe specified. For instance, in the car_dealer_database_design example, if the database designer specified that “if Salesperson Pre_Sell Customer then (Salesperson Sell Car)”, itChapter 6. The Application of the E-R-SIC Model 138would be inconsistent with “if Customer Own Car then Salesperson Pre_Sell Customer,Salesperson Sell Car”. If the database designer specified: “Salesperson.Basic_Salary2000”, “Salesperson. Commission 500”, and “Employee.Salary < 2OO”, these SICswould be inconsistent because Salesperson.Salary would then have the lower bound $2,500that is greater than the upper bound of Employee.Salary. If the database designer specified “if Customer Own Car then Salesperson Sell Car”, it is redundant since it is subsumed by “if Customer Own Car then Salesperson Fre_Sell Customer, Salesperson SellCar”. It becomes more difficult to detect such inconsistencies and redundancies as thenumber of SICs gets larger. There needs to be some automated tools to help detect themor prevent them in advance although this issue may not he completely solvable.Pitfall 2: Imprecise or Incomplete SIC Representation The simplified representation format in the preceding section should only be used as a convenient way ofinitially incorporating SICs rather than as a formal way of eventually representing them.Although the simplified representation format might be more natural to a database designer, it is not precise. Recall that there are six components in the SIC Representationmodel. In most cases, the simplified representation only contains precondition and predicate components. For operation-dependent SICs, object and operation type componentsare also specified. Even in that case, other important specification information (the certainty factor and violation action) is still missing. In order to represent any SIC precisely,we need some algorithms to reformulate it (decompose it if necessary) in terms of theRepresentation model.We do not take the naive approach of enumerating all conditions for all objects explicitly. For example, a general SIC represented in the simplified format, “if E RX Fthen E RY 0”, would be incorporated when we consider necessary conditions for theChapter 6. The Application of the E-R-SIC Model 139existence of RX. We do not incorporate it again as a sufficient condition for the existenceof RY. However, if we decompose the original general SIC, there would be two sub-SICsrepresented in the Representation model:one for RX on insertion, whose violation action would be “reject” or“propagate(insert (RY))”— the other for RY on deletion, whose violation action would be “reject” or“propagate(delete(RX))”Note that the first sub-SIC in terms of the SIC Representation model is a necessarycondition for the insertion of RX, and is also a sufficient condition for the insertion ofRY if the violation action is “propagate(insert(RY))”. The second sub-SIC is a necessarycondition for the deletion of RY, and is also a sufficient condition for the deletion ofRX if the violation action is “propagate(delete(RX))”. Without decomposing a generalSIC into several sub-SICs represented in terms of the SIC Representation model, theSIC specifications would not be complete because the violation action may depend onthe object and operation causing violation. Therefore, we can conclude that the SICRepresentation model provides a complete and precise representation of SICs in the ER-SIC model.Pitfall 3: E-R Structure Orientation The E-R-SIC model helps a database designerincorporate SICs based on E-R structural descriptions. However, our logical data modelis the relational model. There should be some procedure for losslessly transforming SICsin the E-R-SIC model into corresponding ones in the relational model80.80The “losslessly” means that, for each SIC incorporated in the E-R-SIC model placing restrictions onoccurrences of entities, relationships, or attributes, there is (are) SIC(s) in the relational model placingcorresponding restrictions on relation tuples or attributes.Chapter 6. The Application of the E-R-SIC Model 140Pitfall 4: Inefficient Modelling In the absence of an automated database design aid,a database designer must specify all explicit and inherent SICs in terms of the simplifiedformats as described in Appendix C. For example, in addition to specifying the entitytypes, Employee and Mechanic, and the relationship type Mechanic_is_a_Employee, he(she) needs also to specify the relative cardinalities of is_a, a necessary condition for itsexistence — Mechanic.EmpId=Employee.EmpJd, and even the fundamental incidenceconstraints the existence of participating entity occurrences. In general, there mightbe many possible SICs given an E-R diagram. Since one SIC may involve more thanone object, the database designer needs to record what SICs have already been identifiedso that they would not be incorporated again. There would be a heavy burden for thedatabase designer if he (she) needs to figure out all the possible SICs, verify, reformulate, decompose, and transform them by himself (herself). It is desirable to have someautomated database design system to help the database designer model and representSICs.Proposals for avoiding these pitfalls will be introduced in the next chapter (pitfall 1is not completely avoided).6.3 Data Integrity Semantic CompletenessIn the E-R-SIC model, the SIC is the construct used to provide logical restrictions onthe structural schema, that is, on the attributes, entities and relationships. Completelyincorporating SICs depends on the correct specification of the structural schema. TheE-R-SIC model requires all relationships to be explicitly represented except for somespecial cases, such as exclusion between two entity types. The explicit representation ofrelationships makes the semantics clear.Chapter 6. The Application of the E-R-SIC Model 141Based on the assumed correct E-R structural schema, we could systematically modelSICs by considering a wide range of possible restrictions or requirements— positiveor negative, simple assertions or complicated formulas, qualitative or quantitative, time-restricted, time-triggering, or temporal sequence requirements, implicitly/explicitly type-related or set-related. Most SICs we consider are necessary conditions for the existenceof the occurrences of one or more attribute, entity, or relationship types. They are, bynature, static SICs restricting the possible states of the database. A few are inherentlydynamic SICs, e.g., new_old transitional constraints; and some others are consideredas sufficient conditions. However, after decomposing them into sub-SICs in terms of theSIC Representation model, all become dynamic and operation-dependent SICs specifyingconditions for the deletion, insertion, or update of the related objects. Since these threeoperations are the only ways the database can change, those necessary conditions forthese operations completely cover the necessary conditions for existence, nonexistenceand change of an object in a database.In addition, sufficient conditions (if any) are also incorporated because of the violationaction component of the SIC Representation model.Whenever we find that the structural schema cannot model the data integrity semantics as SICs, we would need to go back and modify the E-R structure. Therefore, theE-R-SIC model not only allows modelling the SICs completely but also forces the structural schema to reflect the data integrity semantics more faithfully. The structure andSIC parts of the resulting schema would be closely related. This would form a more suitable and fundamental base for higher level transaction or application program modelling.Therefore, we can conclude the following:With the support of the SIC Representation model, the E-R-SIC model wouldChapter 6. The Application of the E-R-SIC Model 142completely model the data integrity semantics.There is one limitation. Although some entities or relationships may have time-valuedattributes or temporal sequence requirements, time-stamped data integrity semantics(e.g., dealing with historical data at some specific time) in general may not he completelymodelled81.A related discussion concerns SICs in the relational model. Suppose that the SICs inthe E-R model have been losslessly transformed into corresponding ones in the relationalmodel. Are there any other new SICs, e.g., data dependencies, that must be consideredin the relational model?Data Dependency in the Relational Model Tile relational data model has well-formalized theories. Data dependency is one of its important concepts. Uliman [1982]defines a data dependency as “a constraint on the possible relations that can be thecurrent value for a relation scheme”. However, one should note that the usefulness ofdata normalization theory is for designing a database directly from the relation concept,not for capturing semantics. The partial and transitive functional dependencies implythe embedding of independent relationships ([Makowsky, et al., 1986]). Similarly, theexistence of non-trivial multi-valued dependencies occurs when a relation represents morethan one 1:N relationship ([Kent, 1983]). These problems can be avoided if entitiesand relationships are properly designed and carefully transformed into relations ([Storey,1988]). Some researchers try to apply other dependencies, e.g., inclusion dependencies,811n the literature, [Palmer, 1982] discusses time-dependent relationships and [Taiizovich, 1991] introduces lifetime cardinalities. It is not clear how the restrictions of lifetime cardinalities are applied overthe lifetime of the entities and relationships.Chapter 6. The Application of the E-R-SIC Model 143exclusion dependencies and co-exclusion dependencies82,to capture semantics. Exclusiondependencies have been covered in the E-R-SIC model. Inclusion dependencies comeabout because some relationships have not been explicitly represented83;and co-exclusiondependencies occur because the SICs are not precisely represented84.Therefore, we neednot add other SICs after transforming those SICs identified by the E-R-SIC model tocorresponding ones that reference the schema in the relational model.52Let E, F be two relations (possibly the same), and A, B, he attributes of E and F, respectively.E[A1,... ,Am] ç F[B1,... ,B] is called an inclusion dependency ([Casanova and Vidal, 1983]). IfE[A,... ,Am] is exclusive to F[B1,... , B,], it is called an exclusion dependency ([Casanova and Vidal,1983]). The negation of an exclusion dependency is called a co-exclusion dependency, ([Arisawa andMiura, 1986]).83Take an example of [Minnila and Räihâ, 1986]: Registered_Cars/Model] CarTypes/ModelJ. Thereshould be some relationship, e.g., Hasmodel, between Registered_Cars and Car_Types.84Arisawa and Miura [1986] state that what co-exclusion dependencies mean is to constrain databasewhen updating and sharing entities. However, there should a higher level of entity that is the super-typeof those entity types. If the SICs implied the is_a relationships between the sub-types and super-typeare precisely represented, the co-exclusion dependency constraints would be incorporated.Chapter 7A Proposed Database Design Aid for Eliciting SICsIn Figure 1.1 of Chapter 1 a proposed database design subsystem for eliciting SICsis sketched. Both the SIC Representation model and E-R-SIC model are application-domain independent. They are suitable for implementation as part of an automateddatabase design system. An automated database design system is needed because of thepotential heavy work load of a database designer when incorporating and representingSICs. This chapter describes conceptually how this subsystem for eliciting SICs shouldwork. In the following, the term “database designer” will be used for the user of theautomated database design subsystem who may be a professional database designer oran end-user who plays the role of the database designer to design his (her) system. Section 7.1 gives a brief overview of the elicitation subsystem. The three major functions ofthe subsystem — verifying elicited SICs for consistency and non-redundancy, reformulating and decomposing general SICs into sub-SICs in terms of the SIC Representationmodel, and transforming them into corresponding ones in the relational model are described in detail in Sections 7.2, 7.3 and 7.4, respectively.144Chapter 7. A Proposed Database Design Aid for Eliciting SICs 1457.1 An Overview of the SIC Elicitation SubsystemInterfaces From the view of the SIC elicitation subsystem, there are two interfaces.The first is an interface to elicit explicit constraints from the database designer in a dialogue or by menus, etc. The second is an interface between the SIC elicitation subsystemand the structure subsystem (e.g., the View Creation System in [Storey, 1988] and [Storeyand Goldstein, 1988]) for constructing the structural schema. From that interface, theSIC elicitation subsystem fetches the structural descriptions in terms of the E-R-SICmodel.Elicitation Based on these structure descriptions, the SIC elicitation subsystem willquery the database designer to obtain the attribute SICs for each attribute of each entity.Next, whole entity SICs would he obtained for each entity. The system then traversesthe E-R diagram to detect possible relationship SICs85 and data abstraction SICs. Ina dialogue, the database designer may need to add some further restrictions on a SICby using expressions such as those in the simplified format (Appendix C). In general,though, the database designer need not worry about the assertion representation syntax.In addition, the system, rather than the database designer, would specify those inherentSICs implied by special relationships (e.g., the is_a data abstraction).Elicitation Knowledge It is possible that the elicitation procedure may be lengthybecause the goal is to include the complete data integrity semantics and the system hasneither common sense nor application domain knowledge. However, it is expected that85The original structural specifications may include some traditional relative cardinalities of relationships. However, the exact numbers except for 0, 1 may not he known since they are irrelevant forconstructing normalized structures.Chapter 7. A Proposed Database Design Aid for Eliciting SICs 146the system should query the database designer based on the following knowledge:• the E-R-SIC model to capture the possible SICs from different objects in the(line, star, ioop-2, and loop-n) contexts of an E-R diagram;• consistency and nonredundancy rules for different SIC types;• heuristics from naming conventions and data types;• specification information that the database designer has provided so far.With consistency and nonredundancy rules for different SIC types, the subsystemwould not request, or could refuse immediately, some impossible SICs. This kind ofknowledge is introduced in Section 7.2.1 to show how it can alleviate the need for somedetailed verification.Since there may be a large number of possible SICs, heuristics may help the subsystemto be even more efficient in asking questions. The object data type — date, arithmetic ornon-arithmetic may suggest its volatility. For example, often a date is unchangeable. Inaddition, naming conventions may hint at some SICs. For example, attributes with thesame names in the entity types linked via some relationship types may suggest the needfor SICs involving some formula between them. Appendix E contains more examples.Verification Although full verification for consistency and non-redundancy is hard toachieve, the elicited SICs would be verified to some extent, especially for cardinalities.The cardinality information is important because a totality constraint might make manypossible SICs redundant or inconsistent. Section 7.2 discusses the related issues on verification in detail.Chapter 7. A Proposed Database Design Aid for Eliciting SICs 147Reformulation and Decomposition Based on the information obtained from thedatabase designer, the system would represent general SICs in terms of the simplifiedformat described in Appendix C. Then, the system should automatically rewrite eachof the above SICs, and decompose it into sub-SICs if necessary, in terms of the SICRepresentation model. All six components plus the SIC name in the model would bespecified. The reformulation and decomposition algorithms are introduced in Section 7.3.The default certainty factor is “certain”, but the database designer can change it to be“uncertain” or provide specific certainty factors.Without application domain knowledge, it is impossible for an automated databasedesign system to provide automatically complex and application-dependent violationactions. For example, if we allow arbitrary violation actions, even for a SIC on updateof E.A2 with a simple formula predicate “E.A1 = E.A2 + E.A3”, the violation actionmay be either a propagation to update E.A1, or update E.A3, or update both E.A1and E.A3 (this may imply a number of choices with different ratios of E.A1 and E.A3).The number of possible combinations of violation actions would grow rapidly as thenumber of involved objects increases. The “propagate” action is helpful and powerful,but is also costly if the set of SICs is not optimized86. In this research, it is envisionedthat the system would only provide some restricted predefined violation action choices(“reject”, “propagate”, “conditionallyreject”, “conditionally_propagate”, or “warning”).Most of the violation actions would be “reject”, “conditionally_reject”, or “warning” ifthe system could not decide how to propagate to fix the violation. If the system decidesthat a “propagate” action is feasible, it would query the database designer to determine86An example here consists of two SICs: “SIC-i: (B RX F) V (B RY G)”; “SIC-2: if B RX F thenB RY G” attached with “propagate” actions in their all sub-SICs. Suppose that the original databasestate is “RY exists, but RX does not”. A “deiete(RY)’ operation would cause the following operations tobe performed: “delete(RY)”, “propagate(insert(RX))” owing to SIC-i, “propagate(insert(RYJ)” owingto SIC-S. A RY is forced to be inserted back although now it may connect B with a different F.Chapter 7. A Proposed Database Design Aid for Eliciting SICs 148whether to “propagate” or “reject”.Generic SICs As described on page 78, we can represent some common SIC typesfor the generic object (Entity*, Relationship, Entity*.Attribute*. and Relatioriship*.Attributejto reduce the number of explicit SICs represented using the Representation model. Bydoing so, we can also alleviate the need for the invocation of the reformulation and decomposition algorithms during the database design consultation session. The subsystemshould have these pre-defined representations of generic SICs. If the database designerspecifies that occurrences of an object type must satisfy one of these common SIC types(e.g., two entity types are mutually exclusive), the related constraint information wouldonly be kept in a logical predicate, called an input predicate (refer to Appendix B.1,e.g., ex_ents(ExEntSet,) storing the information that the entity types in ExEntSet areexclusive). Such specific constraint information need not be directly reformulated or decomposed in terms of the SIC Representation model. It is expected that the specifiedobject type could inherit the related generic SICs. Section 7.3.1 describes these genericSICs in more detail.Transformation It is assumed that if a relationship type has (1,1) cardinalities relative to an entity type, it would be represented by a foreign key rather than a separaterelation in the relational model. \‘Vhen the structure subsystem transforms the E-R specifications into relations, the SIC elicitation subsystem should also transform the SICs inthe E-R-SIC model into corresponding ones in the relational model. The input predicatesneed not be transformed. There should also be pre-defined generic SICs in the relationalmodel corresponding to those in the E-R-SIC model. The principles of representing relationships in the relational model and the SIC transformation algorithms are introducedChapter 7. A Proposed Database Design Aid for Eliciting SICs 149in Section 7.4.Final Outputs The final results of the consultation session would be a listing of incorporated SIC specifications in a relational database schema. These specifications include:(1) specific SICs, which directly apply to specific object types and are represented interms of the Representation model; (2) input predicates describing specific object typesto which some common SIC types apply; (3) generic SICs, which serve as the “templates”for common SIC representations and are expected to be inherited by the specific objecttypes described in (2).7.2 SIC VerificationDifficulty in Verification As discussed in Section 2.3, full verification of SICs for consistency and non-redundancy would need application-domain and common-sense knowledge. Even those aspects of verification that do not require application-domain knowledgestill pose difficult research problems because of the potential complexities of SICs. Thisresearch discusses the issues, but does not offer general solution algorithms.Because the interactions among SICs might, in general, be complicated87,we wouldneed logic theorem proving techniques (e.g., the resolution refutation method [Nilsson,1980]). However, standard logic must be enhanced to handle the following problems.• Attribute Value Problem. In standard logic programming theory, objects are8TFor example, suppose that Ci, C2, CS are the preconditions of three SICs, SIC-i, SIC-2, SIC-3,respectively; and P1, P2, P3 are their predicates, respectively. SIC-i and SIC-S are asserted for thesame object. If Ci overlaps with P2 and C3 implies C2, Pi should be the same as PS. Otherwise, theyare inconsistent — because CS implies P2 and P2 overlaps with Ci, if some occurrences satisfy C3, theymay also satisfy Ci; so CS transitively implies Pi.Chapter 7. A Proposed Database Design Aid for Eliciting SICs 150represented as finite terms, and the set of terms is countable (within the contextof the Herbrand Universe) ([Lassez, 1987]). Suppose that the value domains of allattributes are discrete and finite. Then, consistency among SICs can be checkedby applying the standard logic proving technique although, for efficiency reasons,it may be accompanied by the standard CSP (constraint satisfaction problem)algorithms ([Mackworth, 1977; 1987]; [Mackworth and Freuder, 19851)88. However,suppose that the values of attributes are allowed to be continuous and infinite89.Standard logic cannot deal with the real number domain. It must be extendedto have other techniques to handle real number values and formulas. That isthe motivation for developing constraint logic programming (CLP) ([Jaffar andLassez, 1987]; [Cohen, 1990])°. We may apply linear programming (LP) algorithms(e.g., the simplex method [Hillier and Lieberman, 1986]) to check the consistencyof a set of linear formulas (i.e., whether a feasible solution exists) and removethe inconsistent value ranges (by finding the values of optimization functions—maximization and minimization for each involved attributes)91;apply integer linearprogramming algorithms if the values are restricted to be integer numbers92;and88There are three sets of algorithms node, arc, and path algorithms. In this case, we would applyarc algorithms to check the consistency between value constraints and a formula constraint; apply pathalgorithms to check the consistency among value constraints and several formula constraints. Theycan be checked in polynomial time ([Mackworth and Freuder, 1985]) although the path algorithms arecomplicated.89They may be numerical real numbers or integers; or may have date data type if date operations aresuitably defined.90For example, one CLP representative, CLP(R) provides both expressive and computational powersto solve linear equations and linear inequalities (by two-phase simplex algorithm) incrementally ([Jafi’arand Michaylov, 1987]). However, it is still a rudimentary experimental product and can only run underUNIX-based operating systems. Other CLP languages are Prolog III ([Colmerauer, 1990]), Trilogy, andCHIP ([Hentenryck, 1989]), etc.91A simple algorithm used by ALICE (A Language for Intelligent Combinatorial Exploration) ([Lauriere. 1978]) can also be applied to test the consistency between a single linear formula and the valuerange constraints, and further to remove the inconsistent value ranges.92It is not usual to have complex numbers in a database. Suppose we do have a complex numberapplication, Gröbner method would test whether a system of multivariate polynomial equations has asolution over the complex numbers ([Cohen, 1990]).Chapter 7. A Proposed Database Design Aid for Eliciting SICs 151apply non-linear programming algorithms to handle a set of non-linear formulas93.However, all existing methods have some limitations. For example, if any of theconnectives between linear formulas is a disjunction (“V”), e.g., “P1 V P2”, thesimplex method is no longer applicable since “P1 V P2” may represent a non-convexpolyhedron ([Cohen, 199O]). The general integer linear programming problemis NP-complete ([Papadimitriou and Steiglitz, 1982]). Furthermore, it is evendifficult to achieve nonredundancy. We may apply these algorithms to removethe inconsistent values from value constraints. However, given a set of arbitraryformulas, no existing algorithms can tell us whether a formula is redundant.• Aggregate Function Problem. There may be some attribute SICs related toentire entity or relationship set properties. We cannot verify these SIC specificationswithout actual data. Even if they are not in rule format, at the best we can onlyhave some simple tests (Appendix F.1), such as: the specified maximum value of anattribute value must be greater than or equal to its minimum value. Iii addition, asstated, the verification for cardinality constraints is important for other relationshipSICs. Standard logic proving should be supplemented with the algorithms describedin Appendix F.2 to check consistency and nonredundancy for cardinalities.• Incremental Verification Problem. Verification would be conducted for theSICs in the sequence of the incorporation. Since it is desirable to identify any inconsistencies each time a new SIC is added, the SIC elicitation subsystem would931f there are non-linear formulas and the functions are monotonically increasing or decreasing foreach involved attribute, the ALICE algorithm can be applied to check the consistency between a singlenon-linear formula and value constraints.941n addition, one should note that the simplex algorithm would take an exponential number of stepsin the worst case although in general it can be regarded as very efficient ([Papadimitriou and Steiglitz,1982]).9It is known that the theory of predicates with =, , , <, >, , +, x over real numbers is satisfactioncomplete, i.e., every constraint is either provably satisfiable or provably unsatisfiable. However, the theoryof predicates with =, +, x over integer numbers is not satisfaction-complete ([Cohen, 1990]).Chapter 7. A Proposed Database Design Aid for Eliciting SICs 152not verify the whole set of SICs oniy once, instead there would be an incremental constraint satisfaction problem ([Hentenryck, 1990]), in which constraints areincrementally added and dropped.• Feedback from the Database Designer. It is impossible for the SIC elicitationsubsystem to achieve complete verification without feedback from the databasedesigner. The following are two examples.— If the preconditions of two SICs overlap, the database designer is required tomodify these two SICs.— If the preconditions of two SICs are not related in the syntax (i.e., they areexpressions on different objects, for example, one is on Employee.Age, anotheris on Employee.Education), but their predicates are inconsistent, the systemshould query the database designer as to whether it is possible that an objectwould satisfy these preconditions at the same time. If it is, these two SICs areinconsistent.Optimization Problem Even if two constraints are neither inconsistent nor redundant, they might not be “optimized”— i.e., there might be a single constraint that canreplace the set of these two constraints and be enforced more efficiently. The followingare two examples.• Given two value range constraints that are neither inconsistent nor redundant,they may overlap. That is, there may be a single refined range to replace those tworanges.• A set of two constraints, “(E RX F) V (E RY G,,)”, and “if E RX F then E RY G”,is neither inconsistent nor redundant. However, after optimization, the above setChapter 7. A Proposed Database Design Aid for Eliciting SICs 153would be replaced by a single constraint, “V E, RY, E RY G” (i.e., RY total toE.To get a single refined range constraint may be easy. However, in general, optimizationis more difficult than consistency and non-redundancy checking even for a small andsimple set of constraints. For its related objects, each constraint allows a set of databasestates to exist. When several constraints relate to the same object, if they are consistent,the intersection of all those sets of states is not empty. One approach to optimization isto find this intersection set and describe it by some expressions. However, the number ofstates could he large and it would be very difficult to write expressions for those states.Issue of Unexpected Implicit Constraints Even if a set of SICs is not inconsistentor redundant, there may still exist some unexpected implicit constraints. For example,suppose that Supervise is a relationship type between entity types Manager and Project,the database designer has specified the absolute maximum cardinality of Project— therecan be no more than 8 projects; and some relative maximum and minimum cardinalitiesin the relationship Supervise there can be no more than one manager per projectand any manager must be assigned to a project. This set of three constraints implies animplicit constraint — the organization can hire at most 8 managers although the databasedesigner does not specify it explicitly or may think that there is no upper bound on theabsolute maximum cardinality of the entity type Manager. The elicitation subsystemneeds to recognize that some SIC types may be missing, and then check the current setof constraints to see whether such types of SICs can be derived from it. The databasedesigner would then be asked to judge as to whether he (she) has made some mistakes.Chapter 7. A Proposed Database Design Aid for Eliciting SICs 154Verification related to Certainty Factors The certainty factor component in theSIC Representation model is a source of possible inconsistency between SICs. Supposing that both SIG-i and SIC’-2 are for the same object on the same operation with thesame precondition, if the predicate of SIC-2 implies that of SIC-i (i.e., SIC-2 is morerestrictive than SIC-i), the SIC-2 should be less certain than SIC-i. Otherwise, theyare inconsistent. In addition, the certainty factor and violation action are closely related.An uncertain SIC can only have a predefined violation action of “warning”, “conditionally_reject”, or “conditionally_propagate”. A certain SIC can only have a predefinedviolation action of “reject” or “propagate”.Verification related to Violation Actions Furtado et al. [1988] describe a situationwhere the violation actions of two SICs may be inconsistent. However, if the SICs areprecisely defined and the transaction concept is applied when enforcing them, the problemdescribed by Furtado et al. could not occur96. The major consistency problem causedby the violation action component in the SIC Representation model is that the violationactions may cause an endless enforcement cycle a “flip-flopping” behaviour between aset of SICs described by Ceri and Widom [1990] if arbitrary actions are allowed. Ceri andWidom suggest using a triggering graph97 to detect potential cycles. However, they leavethe responsibility of determining whether infinite triggering would actually be possibleto the database designer. It is difficult, in general, to take an arbitrary violation action96The problem raised by Furtado et al. [1988] is as follows. Suppose that there is a relationship typeR and an entity type E, R is total to E. If the violation action of the incidence constraint, SIC-i, forE on deletion is “propagate(delete(R))” and the violation action of the totality constraint, SIC2, for Ron deletion is “reject”, Furtado et al. claim that they are incompatible. However, note that if the SICsare precisely defined, when the SIC-i has been violated and the deletion of the R is taken, the SIC-2becomes irrelevant because the B occurrence has become non-existent.971n their triggering graph, the nodes of the graph correspond to the SICs. There is a directed edgefrom node SIC-i to node SIC-2, if and only if the execution of SIC-i’s violation action is likely to violateSIC-2 (the likelihood is only in terms of objects and operation types).Chapter 7. A Proposed Database Design Aid for Eliciting SICs 155and decide whether it would violate other SICs. In this research, oniy some restrictedand predefined violation actions would be suggested by the SIC elicitation subsystem.Basically, if the violation action of a SIC is “propagate”, when the SIC is violated, it wouldallow the intended operation to be performed, but would also take a simple compensatoryaction to bring the database to another consistent state. With such a restriction, if thoseSICs are consistent (in terms of the components other than the violation action), whenan operation violates a SIC, the database would return to a consistent state althoughsome operations may be undone.Verification and Decomposition If a general SIC is not in rule format, it can beverified for consistency and nonredundancy before decomposing it. However, a generalSIC in rule format must first be decomposed into several sub-SICs before verifying it.These sub-SICs indicate the consequences of the general SIC that must be taken intoconsideration when verifying a set of SICs. For example, if there is a general SIC, “ifCl then P1”, where Cl and P1 are some assertions, the SIC elicitation subsystem wouldnot know that it is inconsistent with “if P1 then Cl” without first decomposing it.The decomposition algorithms described in the next section can be assured to representcorrectly the sub-SICs of a given single SIC. However, the verification for consistencyand nonredundancy among the representations for jj SICs is still needed.7.2.1 Consistency and Nonredundancy Rules for SIC TypesSince conceptually by using the E-R-SIC model, the SICs can be classified into a numberof types, some general problems of consistency and nonredundancy among those typesof SICs can be explored in advance. The results would be some rules about consistencyand nonredundancy for SIC types, which can be stored iii the knowledge base of theChapter 7. A Proposed Database Design Aid for Eliciting SICs 156elicitation subsystem in order to expedite the elicitation procedure.Take a simple example. In Figure 6.1 of the car_dealer_database_design example, thetwo relationship types, PreSell and Own, may be involved in a number of types of SICs.For example, they may be exclusive (i.e., “if Customer Own Car then —i(SalespersonPre_Sell Customer)”), or one may depend on the other (e.g., “if Customer Own Car thenSalesperson Pre_Sell Customer”). Suppose that the database designer has specified thatthe minimum cardinality of Pre_Sell relative to Customer is 1. With the built-in consistency and nonredundancy rules in Appendix G, the SIC elicitation subsystem wouldautomatically know that the first SIC type cannot occur and that the second SIC type isredundant. It might be confusing to the database designer if the SIC elicitation subsystem asks him (her) to confirm these SICs. It would be inefficient if the database designerinadvertently specified the above redundant or inconsistent SICs and the elicitation subsystem invoked the whole process to verify them during a consultation session.Note that this knowledge cannot totally replace the actual verification during theconsultation, but it would reduce the number of cases that need to be verified. Kung[1984, 1985] applies a sophisticated tableaux approach to verify a set of two SICs: “everyemployee earns more than $20, 000”, and “every manager is an employee”. Since the firstis a simple value constraint for a non-key attribute and the second implies some specialrelative cardinalities and the equality condition on key values of two sides of entities forthe existence of an is_a occurrence, a human database designer would know that theycannot be inconsistent. If the SIC elicitation subsystem had similar rules, it could alsoskip the detailed consistency checking process.Chapter 7. A Proposed Database Design Aid for Eliciting SICs 1577.3 SIC Reformulation and DecompositionAs discussed in Chapter 6, the E-R-SIC model uses the Representation model to specifySICs. SICs should be represented in terms of the Representation model. If a SIC is onlyrelevant to an object on an operation, we only need to reformulate it in the Representationmodel without changing its certainty. In most cases, because a SIC is relevant to severalobjects on various operations, the decomposition would get several sub-SICs written interms of the Representation model.The SIC elicitation subsystem would invoke the reformulation and decomposition algorithms, which are described in Appendix H, to reformulate and decompose SICs obtainedfrom the database designer. By using the E-R-SIC model, the subsystem should recognize all inherent SICs that are implied by the specifications (e.g., an is...a relationship)provided by the database designer. Before invoking the reformulation and decompositionalgorithms, the subsystem should first represent these explicit and inherent SICs in termsof the simplified formats in Appendix C, but without using the nested if. . . then rules.Algorithms previously proposed in the literature can deal only with some restrictedtypes of SICs and only consider the object and operation type components. The algorithms in Appendix H deal with all SICs that are recognized by the E-R-SIC model. Allcomponents in the SIC Representation model are considered.Correctness of the Reformulation and Decomposition Algorithms It is necessary to assure the correctness of the proposed reformulation and decomposition algorithms for representing a given general SIC. The following briefly examines the algorithms.Chapter 7. A Proposed Database Design Aid for Eliciting SICs 1581. Same Certainty Factor. The certainty of sub-SICs cannot be lower than thegeneral SIC; otherwise, the enforcement of these sub-SICs cannot achieve the certainty of the general SIC. Therefore. if the original general SIC is 100% certain,its sub-SICs should also have 100% certainty. Suppose that the original generalSIC is uncertain. Since its sub-SICs may have different violation actions, some ofthem may have higher certainty. Without further information, the elicitation subsystem can assume that they have at least the same certainty as the general SIC.(Later, the database designer can change the certainty of a sub-SIC to be higher ifnecessary.)2. Relevant Object and Operation Types. The search for the relevant objectand operation types is based on the following ideas.• An operation-dependent SIC is only relevant to one object on one operation.For example, “if E is_to_be_deleted then .. . “is only relevant to E on deletion.• It is desirable to remove obvious redundancies when checking objects mentioned in a SIC. By the nature of relationship, if a relationship occurrenceexists, its participating entity occurrences must exist too (the incidence constraint). If a general SIC is relevant to the insertion of an occurrence of arelationship type and we have a sub-SIC for the relationship, we need nothave other sub-SICs to restrict the insertion of the occurrences of its participating entity types. It is possible that some occurrences of these entity typesmay not participate in any occurrence of this relationship type. The checkingof the general SIC for any occurrence of these entity types can be delayed untilit really participates in an occurrence of this relationship type. For example,according to the algorithms, “if E RX F then E RY G” would be relevant toRX on insertion, but not to E, F or G.Chapter 7. A Proposed Database Design Aid for Eliciting SICs 159• Premise 4.1 in Section 4.1.2 is adopted to avoid some operation checking.Since by assumption, the current database state is semantically correct, wewould need to check an operation only because the database state transitioncaused by the operation may violate the SIC. For example, an insertion of anRX occurrence may violate “if E RX F then E RY G”, but a deletion of aRX could not. Similarly, an insertion of a.n RX occurrence may violate “if 3atieast 3 RX, E RX F then 3 RY, E RY G”, hut a deletion of a RX couldnot.• Current_time and some aggregate functions have special monotonicity properties. For example, a deletion of an E occurrence may violate “max(E.A) >Value “, but not an insertion of an E because, by the monotonicity of functionmax, an insertion of a new occurrence would never decrease the aggregatefunction value.• Since a primary key is used as a surrogate to represent an entity or a relationship in a traditional database, its update may imply a deletion of an “old”entity or relationship occurrence followed by an insertion of a “new” entity orrelationship occurrence. Therefore, sub-SICs related to deletion and insertionshould also be asserted on the update of a primary key.Note that the algorithms have limited capability to identify avoidable operationchecking for SICs involving aggregate function. In addition, it is possible to incorporate more consistency and nonredundancy rules for SIC types into the algorithmsto reduce the sub-SICs further. However, it can be claimed that the current algorithms correctly find the relevant object and operation types for a single SIC.3. Proper Precondition and Predicate Components. Basically, the algorithmsjust rewrite the original SIC so that it becomes more precise and suitable for eachChapter 7. A Proposed Database Design Aid for Eliciting SICs 160sub-SIC. The following are some brief ideas.• Subscripts and some special predicates (e.g., rship_occ_part) are used to makea sub-SIC precise.• If a sub-SIC is for an attribute on update, in general, its predicate componentcontains only an assertion on the attribute requiring it to have some specialvalue after update.• If a sub-SIC is for an entity or relationship and it does not only assert thevalues of its attributes, in general, the precondition component is importantto identify what is the entity or relationship occurrence to be checked.• If the constraint violation is because an entity occurrence is deleted, in general,because of the key uniqueness, it is impossible to find another occurrence ofthe same type to avoid the constraint violation (unless the constraint is oneasserting aggregate properties). However, it is possible to “reconnect” anoccurrence of one entity type with another occurrence of the other entitytype. That is, we might find another relationship occurrence to satisfy theSIC unless both sides of the relationship are bound.• Constraints that assert explicitly the equality of some key attributes of twoentity types, e.g.. SICs implied by an ID Dependency, is_a or componenLofrelationships, should be dealt with specially. Because the key attributes (e.g.,SIN) of the two entity occurrences (e.g., Person and Employee) in fact referencethe same physical entity occurrence (e.g., the same physical Person), the linkshould be permanent.4. Suggested Violation Actions. As stated, because of the verification problem,the SIC elicitation subsystem could only suggest some restricted and predefinedChapter 7. A Proposed Database Design Aid for Eliciting SICs 161violation actions appropriate to the certainty of the SIC. A propagation actioncan be automatically suggested by the SIC elicitation subsystem only in the casethat there are two simple assertions on two objects in the SIC. In that case, asimple propagation action is suggested to regain a consistent database state. If thedatabase designer has set a certainty threshold (say 75%), any uncertain SIC withcertainty less than the threshold will be given a “warning” as its violation actionautomatically.5. Given SIC Names. A SIC name is given automatically after applying the abovealgorithms and using some predefined SIC types.Examples Some examples are included in Appendix I for illustration.General SIC Information After decomposing a general SIC, we may still need tolink its decomposed sub-SICs together since the verification for consistency and nonredundancy may still be needed, and furthermore, a general SIC may later be deleted.In addition, for documentation purposes, it may be desirable to keep the original general SIC information, which contains only the certainty factor (if not default “certain”),precondition and predicate components. Therefore, the SIC elicitation subsystem wouldneed to link a general SIC and its decomposed sub-SICs together.7.3.1 Representation of Generic SICsThe principles of the reformulation and decomposition algorithms can be similarly appliedto produce the generic sub-SICs for the generic object types Entity*, Relationship*,Entity*.Attribute*, and Relationship* .Attribute*. However, the SIC representation forChapter 7. A Proposed Database Design Aid for Eliciting SICs 162the generic types is complicated because the data dictionary retrieval and manipulation,and the pre-coudition contexts must be explicitly stated. The input information andmanipulation logical predicates used in generic SICs are listed in Appendix B. This subsection introduces the representation of generic SICs for some common SIC types anddiscusses the related issues. In all of the following cases, the principle of SIC specializationallows us to omit the explicit SIC representations for specific object types if we have therepresentations for generic object types.Domain Constraint Representation SIC Aggregation and SpecializationBy the principle of SIC aggregation, domain constraints on insertion of an entity/relationshipoccurrence can be simulated by applying the domain constraints of all its attributes.There are three separate sub-SICs for restricting not-null, uniqueness, and nonvolatility, respectively, and another sub-SIC dealing with data-type, format, and value. Forrestricting the insertion of an entity, we need one sub-SIC, which would call the aboverelated sub-SICs (except for the nonvolatility) for asserting attribute domain constraints.In total, the five sub-SICs in Appendix J are sufficient to represent all domain constraintsregardless of the number of entity types and their attributes in a database. The numberof SICs that must be specified explicitly is dramatically reduced through using the SICabstraction concepts. Similarly, five sub-SICs are needed for relationship types and theirattributes in a database.Primary Key Constraint Representation— SIC Association and Specialization A number of SICs must be specified to capture the possible inconsistent states of adatabase when updating a primary key. The SIC association and specialization conceptswill be used to reduce the number of explicit SICs required. During the database designChapter 7. A Proposed Database Design Aid for Eliciting SICs 163phase, if a SIC is identified for the insertion (or deletion) of a relationship and thereshould be a sub-SIC for checking the update of a part of its key, its SIC name is addedinto an associated SIC.NameSet of the affected key attribute. The SICNameSeis ofkey attributes of a relationship type may be different. However, in the case of a SIC thatis relevant to the insertion (or deletion) of an entity type, all affected key attributes ofthe entity have the same SICiVame_Set.For example, suppose that we have a SIC such as “if an employee is assigned to aproject, he (she) must participate in an insurance plan”, and assume the key of the employee and project are Empld and Projld, respectively. The name of the sub-SIC restricting an insertion of the relationship Assigned_to would be inserted into the SIC_Name_Setof a special logical predicate (associated_PKSICs_1) for the key attribute Assigned_to.Empld.Note that since the non-sharing entities are not concerned, the update of another keyattribute Assigned_to.Projld of the relationship is not restricted. Similarly, the name ofanother sub-SIC restricting a deletion of the relationship Insure, is inserted into AssociatedPKSICsD for Insure.Empld.Two “set-SICs” in Appendix J are needed for the key attributes of Relationship*.(Similarly, there are two “set-SICs” for key attributes of Entity*.) By the principle ofSIC association, the enforcement of such a “set-SIC” is the same as the enforcement ofall of its “member-SICs”.Other SIC Representation SIC Specialization A number of other SIC types98can be similarly represented. Usually, these SICs types can be described in the “closedform” of predicates, that is, without further arbitrary restrictions. If a DBMS finds98These SIC types include: Composite_Attribute_Unique Constraint, Absolute MaximumCardinality Constraint of an Entity Type, “traditional” relative cardinality constraints (i.e., Totality Constraint, and Relative Maximum Cardinality Constraint), Incidence Constraint,Chapter 7. A Proposed Database Design Aid for Eliciting SIC’s 164the related information on a specific entity, relationship or attribute type, e.g., symmetric(Married_to) indicating its symmetry, the related sub-SICs would be automaticallyinherited from the generic types.The number of such generic SICs stored in the SIC maintenance subsystem woulddepend on the complexity of the application at hand. It is possible that in a specialdatabase application all SICs can be represented as generic SICs in advance so that theactual invocation of the reformulation and decomposition algorithms might be almosttotally avoided during the database design consultation session.Some examples of the above generic SICs are included in Appendix J.Some Improvements The generic SICs in Appendix J have two problems, which canbe solved as follows.• Uniform Certainty Factors It is assumed that the generic SICs are all “certain”by default If we consider the possibilities that a few SIC types for some specificobject types may be “uncertain”, the certainty factors need to be stored in theinput predicates for these specific object types• Uniform Violation Actions. Because of the SIC inheritance principle, specific object types would inherit the same violation actions from the generic objecttypes. The advantage of this is that the elicitation sub-system only needs to ask theSymmetry Property of a Relationship, Transitivity Property of a Relationship, Subset_Relationship SIC, Relationships_Union Special SIC, Exclusive Relationship SIC, Exclusive Occurrence SIC, Not_And_Relationships SIC, Either_Existence_Relationships SIC, Relationship_Before_Relationship SIC, Relationship_Not_Before_Relationship SIC, Relationships_Join SIC, Relationships_Depends_onLoopN_Relationships SIC, Weak_Entity SIC,ID_Dependency_Relationship SIC, Weak_Relationship SIC, Completeness_Mapping SIC,Relationships_Intersection Special SIC, Relationship_Trigger_Relationship SIC, Exclusionbetween Entity Types, Entities_Intersection Special SIC, and Entities_Union Special SIC.Chapter 7. A Proposed Database Design Aid for Eliciting SIC’s7.4 Transforming SICs to Relational FormThe final result of the consultation is a logical database specification implemented inthe relational model. These SICs may be later enforced by an integrity maintenaicesubsystem in a relational database system. Therefore, the incorporated SICs referencingentities and relationships in the E-R-SIC model must be automatically transformed by theelicitation subsystem into SICs referencing the corresponding relations in the relationalmodel.Relationship Representation in the Relational Model When constructing the relations in the relational model by using the E-R descriptions, each entity type is naturallyrepresented by a separate relation. However, there are two alternatives for representinga binary relationship type in the relational model. Some researchers favour always representing a relationship type by using a separate relation rather than a foreign key because165database designer the violation action once for each of these common SIC types.However, this rigidity may not be suitable for all applications. An improvementwould be to store a violation action “list” in the input predicate for each relatedspecific object type. The number of elements in the list corresponds to the numberof its sub-SICs. Each element is a violation action for each sub-SIC. Then the subSICs of a specific object type would not inherit the violation action from its genericobject type, but have its own “custom-made” violation action. Considering alsothe above problem of uniform certainty factors, we may allow both the certaintyfactor and the violation action in generic SICs to be variables. They would bebound with actual values when a specific object type satisfies the preconditions.Chapter 7. A Proposed Database Design Aid for Eliciting SICs 166they argue that foreign keys decrease the adaptability of database designs ([Wilmot,1984]) and “[E-R consistent relational schema] assures a greater adaptability to changesnot concerning the structure of the modeled information, such as the cardinality of relationships” ([Makowsky, et al., 1986, p. 321]). In addition, by adopting the separaterelation approach we would have the same SIC representations in both the E-R-SICmodel and the relational model.However, because of access efficiency considerations, the foreign key approach stillprevails in practice. This research allows the foreign key approach to representing arelationship type having (1,1) relative cardinalities. The cardinalities stability criterionis adopted if there is a tie to decide which entity key will become the foreign key. Supposethat a relationship type R relates to entity types E and F. If the relationship type R hasany attribute, it is represented by a separate relation in the relational model. Otherwise,its representation is based on the following rules99:1. Suppose only one of the involved entity types has (1,1) cardirialities. If E has the(1,1) cardinalities, add the primary key of F to the E relation as a foreign key.Otherwise, add the primary key of E to the F relation as a foreign key.2. Suppose both entity types, E and F, have (1,1) cardinalities.(a) The decision is based on the stability of the cardinalities. If the (1,1) cardinalities of F are more likely to change in the future, add the primary key ofF to the E relation as a foreign key. Otherwise, add the primary key of E tothe F relation as a foreign key.99There is an exception. If the relationship is a relationship via which an ID Dependency happens,an is_a or component_of relationship, the entity type with the (1,1) relative cardinalities already has theprimary key of the other entity type as its candidate key attributes. These key attributes can play boththe role of (part of) a candidate key and a foreign key. We need not add the key attributes of the otherentity type to it.Chapter 7. A Proposed Database Design Aid for Eliciting SICs 167(b) If the above cannot be decided, the choice is then based on the relative frequencies of “F of E’ or “E of F” type queries. If the query “F of E” wouldbe encountered more often, add the primary key of F to the E relation as aforeign key. Otherwise, add the primary key of F to the F relation as a foreignkey.(c) If neither of the above can be decided, the choice depends on the last resort— how the database designer specifies the relationship, “F R F” or “F R F”.It is based on the heuristic that future queries may be similar to the way thedatabase designer states the relationship although he (she) may not admit it.If originally the relationship is expressed by the database designer as “F RF”, add the primary key of F to the F relation as a foreign key. Otherwise,add the primary key of E to the F relation as a foreign key.3. In all other cases, construct a separate relation for the relationship.SIC Representation in the Relational Model Although we allow the foreign keyapproach to representing relationships in the special cases of (1,1) cardinalities, this kindof representation is only for query and processing efficiency. The semantics should bethe same as in the E-R-SIC model. Therefore, the entity and relationship descriptionswould still be stored in the data dictionary of a relational database. There would also bethe same classification of SIC types. However, the adoption of the foreign key approachwould cause some semantic confusion because now a relation could represent an “entity”,or “relationship”, or even both. In this research, if a relation represents both an entitytype and one or more relationship types, it is deemed a special entity type with someinformation kept in the data dictionary. The associated special information specifies therelationship types hidden in it (by adding the key of the other entity type(s) to it asChapter 7. A Proposed Database Design Aid for Eliciting SICs 168a foreign key) and the relative cardinalities of the other entity type(s) in this hiddenrelationship type’°°.The (1,1) cardinalities of one entity type in a relationship type would cause some SICsto be redundant or inconsistent. In addition, some SICs in the E-R representation neednot be transformed when constructing relations because either they do not mention anexplicit relationship or the relationship representation is not relevant to them’°1. TheseSIC’s representation in the relational model would be the same as corresponding ones inthe E-R-SIC model.The Algorithms The general transformation algorithms for transforming SICs inthe E-R-SIC model into SICs referencing the corresponding relations in the relationalmodel are described in Appendix K. It is relatively simple to transform the relationshipname and the manipulation predicates of its participating entities102. However, in addition to the primary key update problem, in the relational model we have a problemof the update of foreign key attributes owing to the well-known semantic overload issue.The update of any attributes of a foreign key would imply the deletion of an old relationship occurrence and the insertion of a new one. The SIC association and specializationconcepts can be applied here too.‘°°That is, two related “relationship_participant” predicates would be deleted and a new “relationship_hidden_entity” predicate will created by the SIC elicitation subsystem.‘°‘Some of these SIC types are: all single entity attribute SICs, all entity SICs, SICs on a singlerelationship’s attributes, Entitieslntersection Special SICs, Entities_Union Special SICs. Inaddition, if a relationship is declared to be complete, usually the relative cardinalities of any involvedentity type should not be (1,1). Otherwise, the other side of the entity type can only have exactly oneoccurrence, which is not practical. Therefore, the Completeness_Mapping SIC usually need not betransformed.‘°2By taking the foreign key approach to representing relationships, some SICs would become redundant. For example, we do not need a SIC requiring the relative maximum cardinality to be 1 for the“entity” relation in which the relationship is now hidden because the foreign key is single-valued. Thealgorithms would also remove them.Chapter 7. A Proposed Database Design Aid for Eliciting SICs 169Examples corresponding to the ones in the E-R-SIC model are included in Appendix Lfor illustration.Generic SICs It is also desirable to apply the SIC abstraction concept here torepresent generic SICs. In principle, we can apply the above transformation algorithms totransform the generic SICs in the E-R-SIC model into corresponding ones in the relationalmodel although the references to some new information from the data dictionary maybe needed. However, the foreign key approach would need additional sets of predefinedsub-SICs. Because now relationship representation is not uniform, we need to considereach of the three possible relation representations for each relationship. If a sub-SICinvolves two relationships, it would have nine possible combinations of representations;if it involves three relationships, the possible combinations would become twenty-sevenalthough some impossible cases can be excluded103.Such an analysis and preparation of pre-defined generic SICs could be carried outfor SIC types involving only a few relationships. However, it becomes impossible whena SIC type may involve an unknown number of relationships (e.g., the intersection ofrelationships). In those SIC types, we would be forced to pre-define generic SICs onlyfor the simple cases having at most three relationships involved.Some of these SICs representations in the relational model are included in Appendix Mfor illustration.103For example, suppose that we have an Exclusive_Relationship SIC, such as RXIIRY relative toE where RX type connects the entity types E with F, and BY connects the entity types E with G. Wewould know that E cannot have (1,1) cardinalities in either RX or RY. So we need not consider thoseimpossible representations for relationships RX and BY— adding the primary key of F or G as theforeign entity of E. However, F may have (1,1) cardinalities iii RX; and G may have (1,1) cardinalitiesin RY too. So we would need two more sets of representations for the predefined sub-SICs in additionto the original one in the E-R-SIC model.Chapter 8Conclusions and Further Research8.1 Conclusions and ContributionsConclusions This research has presented two models. The E-R-SIC model is a comprehensive modelling tool for helping the database designer systematically incorporatesemantic integrity constraints that are relevant to attributes, entities, and relationshipsin a database. The SIC Representation model is used to represent precisely the featuresof these SICs. The declarative and operational semantics of each SIC are specified. Byusing these two models, the data integrity semantic constraints on the allowable statesand state transitions can be completely modelled and properly represented in a databaseschema. The focus of database designers will be changed from traditionally emphasizingonly structure, functional dependencies, efficiency, etc. to describing data semantics.In a database, the number of explicit SICs specified using the Representation modelwould not be huge because of the application of SIC abstractions and the representationof generic SICs. These SICs could be efficiently enforced because of the characteristicsof the SIC Representation model.Both the Representation model and the E-R-SIC model are application-domain independent. They are suitable for implementation as part of an automated database design170Chapter 8. Conclusions and Further Research 171system. Conceptually, this research proposes a SIC elicitation subsystem. The SIC elicitation subsystem would detect where general SICs may he needed and prompt a databasedesigner to confirm or provide them. The subsystem would automatically decide for whatdata objects and on what database operations these SICs should be enforced, reformulate them (decompose them into sub-SICs if necessary) in terms of the Representationmodel, and suggest some violation actions. The subsystem would have some built-in consistency and nonredundancy rules for different SIC types, would verify the consistencyand nonredundancy of SICs to some extent and transform them into corresponding onesin the relational model. This kind of automated database design system would providemore assistance to a database designer in modelling the data semantics.Contributions Most previous SIC research concentrates on classifying and efficientlyenforcing a few types of SICs. Current languages do not represent all features of aSIC precisely. Existing automated database aids do not provide adequate facilities forincorporating SICs into a design. This research contributes to our understanding ofdatabase semantic integrity. On the theoretical side, there are the following contributions:1. The SIC Representation model is defined to represent precisely the necessary features for specifying declarative and operational semantics of a SIC.2. The E-R-SIC model is proposed to incorporate dynamic and static SICs in adatabase schema rather than in transactions and programs.3. Algorithms are provided to reformulate and decompose SICs elicited using the ER-SIC model, and to transform them into corresponding ones in the relational datamodel.chapter 8. Conclusions and Further Research 172On the practical side, there are the following contributions:1. A SIC elicitation szzbsystem has been proposed to help the database designer designSICs in addition to the structure part of a database schema.2. This research provides a foundation for overcoming the well-known problem ofrepresenting data integrity semantics in current relational database systems. Theresulting database would have the advantages of embedded SICs as described inChapter 1. The SIC representation would facilitate the efficient enforcement.3. Although not empirically tested, we may conjecture that the information systemconceptual design would more completely and systematically include data and information system semantics. The database designer would be able to model moredata integrity semantics, thereby reducing the need for this information to be included in application programs. This research provides a starting point for futureempirical tests.8.2 Future Research Extensions to this DissertationThis dissertation is part of a research program at the University of British Columbia toformalize the database design process and make databases more intelligeilt. There aremany areas for further research to extend this dissertation. Some of these are suggestedbelow.• Non-binary Relationships. Currently, the E-R-SIC model assumes all relationship types to be binary. It is assumed that a database designer would know howChapter 8. Conclusions and Further Research 173to use binary relationships to simulate non-binary ones (including recursive relationships, ternary relationships and relationships of higher degree). This restrictioncould be relaxed to allow the database designer to model non-binary relationshipsdirectly.• Development and Implementation of Efficient Algorithms to Assure theConsistency, Nonredundancy, and Even Optimization of SICs. To provethe consistency, nonredundancy and optimization of arbitrary SIC statements isstill an open issue. In addition, to design and implement those verification algorithms needs more research. These are challenges for researchers in the fields ofmanagement information system, computer science, and mathematics.• Integration of SICs Elicited from Multiple Sources. This dissertation isbased on a simplified assumption of a single database designer. View integration orsynthesis is an important topic in database design research. Previous research (e.g.,[Wagner, 1989]) have addressed this issue without considering SIC specifications. Itis also possible that the SICs need to be obtained from several database designers.Future researchers may address this complicated issue.• Programming Implementation. Future research would need to implement theproposed SIC elicitation subsystem. Then the produced SIC elicitation subsystemmust be merged with a system to construct the structure of a schema (e.g., theView Creation System [Storey, 1988]) to provide the complete database schemadesign assistance to a database designer. It is also desirable to incorporate it witha view integration system (e.g., AVIS [Wagner, 1989]) and a future SIC integrationsystem in the case that multiple sources for a database specification are needed. Acomplete automated database design system needs to be implemented.Chapter 8. Conclusions and Further Research 174• Add-on Knowledge, Capabilities and Features of the SIC Elicitation Subsystem. Based on the E-R-SIC model proposed in this research, future researchersmay add some common-sense, domain-dependent, and organization-dependent knowledge to the proposed automated database design system to make it more intelligentand provide more efficient and effective assistance for a database designer. Ideally,the system may have a learning capability so that it can accumulate knowledgeeach time it is used. It is also desirable to provide some natural language facilitiesand graphical interface so that a database designer can more easily describe entitiesand relationships, and directly design an E-R database on the screen.• Transaction Modelling. In order to capture transaction-driven semantics andcheck the SICs related to transactions more efficiently, specifications of transactions,i.e.. user-predefined operations, need to he specified. Future research may proposealgorithms to transform SICs identified by the E-R-SIC model to the pre-conditionsand post-conditions of transactions.• Application of SICs in an Expert Database System. As stated earlier, SICscan be used to facilitate intelligent query evaluation and provide deductive capabilities. Further research may explore how to include and apply SICs representedby the Representation model in an expert database system.• Design and Management of an Integrity Maintenance Subsystem in theRelational Database. Very few kinds of SICs are enforced in commercial relational database systems. One reason for this is probably concern for efficiency.Based on SIC specifications identified in this research, future researchers may design an integrity maintenance subsystem to enforce SICs efficiently in a relationaldatabase. The management of SICs, i.e., the insertion or deletion of SICs, after the database is populated should be carefully taken into consideration. SomeChapter 8. Conclusions and Further Research 175administrative procedures may need to be invented to handle the “change of SICs”.• Empirical Research for Testing “Usefulness”. Future empirical researchersmay test two kinds of usefulness. The first is the usefulness of using database embedded SICs versus the traditional approach of enforcing integrity via applicationsoftware. The second is the usefulness of the design approach adopting an automated database design system compared with the manual design approach (withoutany assistance of an automated design system) to incorporate the necessary SICsfor embedding in a database. The automated database design system proposed inthis dissertation can be taken as a tool.Bibliography[1] Abiteboul, S., and Vianu, \7, “Transactions and Integrity Constraints”, PTOC. ofthe Second ACM SIGA CT-SIGMOD Symposium of Principles of Database Systems,Portland, Oregon, May 1985, pp. 193-204.[2] Aho, A. V., Hopcroft, J. E., and Ulirnan, J. D., The Design and Analysis of Computer Algorithms, Addison-Wesley, Reading, Mass., 1974.[3] Arisawa, H., and Miura, T., “On the Properties of Extended Inclusion Dependencies”, Proc. of the Twelfth International Conference on Very Large Data Base,Kyoto, August 1986, pp. 449-457.[4] Azar, N., and Pichat, E., “Translation of an Extended Entity-Relationship Modelinto the Universal Relation with Inclusion Formalism”, in Entity-Relationship (theFifth International Conference on E-R Approach, France, 1986) edited by S. Spaccapietra, Elsevier Science Publishers B.V. (North-Holland), 1987, pp. 253-268.[5] Benci, E., Bodart, F., Bogaert, H., and Cabanes, A., “Concepts for the Design of aConceptual Schema”, in Modelling in Data Base Management Systems, edited byG.M. Nijssen, North-Holland Publishing Co., 1976, pp. 181-200.[6] Bernstein, P. A., Blaustein, B. T., and Clarke, E. M., “Fast Maintenance of Semantic Integrity Assertions Using Redundant Aggregate Data”, Proc. of the SixthInternational Conference on Very Large Data Base, 1980, pp. 126-136.[7] Bertino, E., and Apuzzo, D., “Integrity Aspects in Data Base Management Systems”, Proc. Trends é4 Applications, Gaithereshurg, Maryland, 1984, pp. 43-52.[8] Biller, H., and Neuhold, E. J., “Semantics of Data Bases: The Semantics of dataModels”, Information System, Vol. 1 No. 3, 1978, pp. 273-292.[9] Borgida, A., “Language Features for Flexible Handling of Exceptions in InformationSystems”, ACM Transactions on Database Systems, Vol. 10, No. 4, December 1985,pp. 565-603.[10] Bouzeghoub, M., and Gardarin, G., “The Design of An Expert System for DatabaseDesign” in New Applications of Data Bases, edited by G. Gardarin and E. Gelenbe,1984, pp. 203-223.176Bibliography 177[11] Bouzeghoub, M., Gardarin, G., and Metais, E., “Database Design Tools: An ExpertSystem Approach:”, Proc. of the Eleventh International Conference on Very LargeData Base, Stockholm, August 1985, pp. 82-95.[12] Bouzeghoub, M., and Metais, E., “SECSI: An Expert System Approach forDatabase Design”, in Information Processing 86, edited by H.J. Kugler, ElsevierScience Publishers B.V. (North-Holland), 1986, pp. 251-257.[13] Bracchi, G., Furtado, A., and Pelagatti, G., “Constraints Specification in Evolutionary Data Base Design”, in Formal Models and Practical Tools for InformationSystems Design, edited by H.-J. Schneider, North-Holland Publishing Co., 1979,pp. 149-165.[14] Brady, L. I., and Dompney, C. N. G., “Dynamics of Database Semantic Integrityor Managing the Meaning of Data”, Proc. of the Australian Computer Conference,Sydney, November 1984, pp. 82-96.[15] Bragger, R. P., Dudler, A., Rebsamen, J., and Zehnder, C. A., “Gambit: An Interactive Database Design Tool for Data Structures, Integrity Constraints and Transactions”, International Conference on Data Engineering, Los Angeles, California,August 1984, pp. 399-407.[16] Brodie, M. L., Specification and Verification of Database Semantic Integrity, Ph.D.Thesis, University of Toronto, 1978 (also as Technical Report CSRG-91, April1978).[17] Brodie, M. L., “Association: A Database Abstraction for Semantic Modelling”, inEntity-Relationship Approach to Information Modelling and Analysis (the SecondInternational Conference on E-R Approach, 1981), edited by P. P. Chen, ElsevierScience B.V. (North-Holland), 1983, pp. 577-602.[18] Brodie, M. L., “On the Development of Data Models”, in On Conceptual Modelling,edited by M.L. Brodie, J. Mylopoulus and J.W. Schmidt, Spring-Verlag, 1984, pp.19-47.[19] Brodie, M. L., “Database Management: A Survey”, in On Knowledge-Base Management Systems, edited by M.L. Brodie and J. Mylopoulus, Springer-Verlag, 1986,pp. 201-218.[20] Brodie, M. L., and Manola, F., “Database Management: A Survey”, in Foundations of Knowledge Base Management: Contributions from Logic, Databases,and Artificial Intelligence Applications, edited by J. W. Schmidt and C. Thanos,Springer-Verlag, 1989, pp. 205-234.Bibliography 178[21] Brodie, M. L., and Ridjanovic, D., “On the Design and Specification of DatabaseTransactions”, in On Conceptual Modelling, Edited by M.L. Brodie, J. Mylopoulos,and J. W. Schmidt, Spring-Verlag, 1984, pp. 277-306.[22] Bry, F., and Mauthey, R., “Checking Consistency of Database Constraints: a Logical Basis”, Proc. of the Twelfth International Conference on Very Large Data Base,Kyoto, August 1986, pp. 13-20.[23] Bubenko, J. A., “Information Modeling in the Context of System Development”,Information Processing 80, edited by S. H. Lavington, North-Holland PublishingCo., 1980, pp. 395-411.[24] Carsnell, J. L., and Navathe, B. “SA-ER: A Methodology that Links Structured Analysis and Entity-Relationship Modelling for Database Design”, in Entity-Relationship (the Fifth International Conference on E-R Approach, France, 1986)edited by S. Spaccapietra, Elsevier Science Publishers B.V. (North-Holland), 1987,pp. 381-397.[25] Casanova, M. A., and Furtado, A. L., “On the Description of Database TransitionConstraints Using Temporal Languages”, in Advances in Database Theory, Vol. 2,edited by H. Gallaire, J. Minker and J.M. Nicolas, Plenum Press, New York, 1984,pp. 211-236.[26] Casanova, M. A., and Tucherman, L., “Enforcing Inclusion Dependencies and Referential Integrity”, Proc. of the Fourteenth International Conference on Very LargeData Base, California, 1988, pp. 38-49.[27] Casanova, M. A., and Vidal, V. M. P., “Towards a Sound View Integration”, Proc.of the Second ACM SIGACT-SIGMOD Symposium of Principles of Database Systems, Atlanta, George, March 1983, pp. 36-46.[28] Cauvet, C., Proix, C., and Rolland C., “A Knowledge Base for an InformationSystem Design Tool”, in Methodologies for Intelligent Systems, edited by Z.W.Ras, and M. Zemankova, Elsevier Science Publishing Co. Inc., 1987, pp. 56-63.[29] Ceri, S., and Widom, J., “Deriving Production Rules for Constraint Maintenance”,IBM Research Report, RJ 7348, March 1, 1990. (A short version appears in Proc.of the 16th VLDB Conference, Australia, 1990, pp. 566-577).[30] Chakravarthy, U. S., Minker, J., and Grant, J., “Semantic Query Optimization:Additional Constraints and Control Strategies”, in Expert Database Systems, editedby L. Kerschberg, Benjamin/Cummings Pub. Co. Inc., 1987, pp. 345-377.Bibliography 179[31] Chen, P. P.. “The Entity-Relationship Model— Toward a Unified View of Data”,ACM Transactions on Database Systems, Vol. 1 No. 1, December 1976, pp. 9-36.[32] Chen, P. P., “Database Design Based on Entity and Relationship”, in Principles ofDatabase Design, Vol. 1: Logical Organization, edited by S.B. Yao, Prentice-Hall,1985, pp. 174-210.[33] Choobineh, J., “Form Driven Conceptual Data Modelling”, Ph.D. Dissertation,Dept. Management Information Systems, University of Arizona, 1985.[34] Choobineh, J., Mannino, M. V., Nunamaker, J.F., and Konsynski, B.R., “An Expert Database Design System Based on Analysis of Forms”, IEEE Transaction onSoftware Engineering, Vol. 14, No. 2, February 1988, pp. 242-253.[35] Codd, E. F., “Extending the Database Relational Model to Capture More Meaning”, ACM Transactions on Database Systems, Vol. 4, No. 4, December 1979, pp.397-434.[36] Cohen, J., “Constraint Logic Programming Languages”, Communications of theAGM, Vol. 33, No. 7, July 1990, pp. 52-68.[37] Colmerauer, A., “An Introduction to Prolog III”, Communications of the ACM,Vol. 33, No. 7, July 1990, pp. 69-90.[38] Cosmadakis, S. C., and Kanellakis, P. C., “Equation Theories and Database Constraints”, Proc. of the 17th Annual ACM symposium on Theory of Computing,Rhode Island, May 1985, pp. 273-284.[39] Dampney, C. N. G., “Specifying a Semantically Adequate Structure for informationSystems and Databases”, in Entity-Relationship Approach (the Sixth InternationalConference on E-R Approach, New York , Nov. 1987), edited by S. T. March,Elsevier Science Publishers, B.V. (North-Holland), 1988, pp. 165-188.[40] Date, C. J., An Introduction to Database System, Vol. II, Addison-Wesley Publishing Co., Reading, Massachusetts, 1983.[4].] Date, C. J., A Guide to the SQL Standard, Addison-Wesley Publishing Co., Reading, Massachusetts, 1987.[42] Davis, J. P., and Bonnell, R. D., “Modelling Semantic Constraints with Logic inthe EARL Data Model”, The Fifth International Conference on Data Engineering,Los Angeles, California, February 1989, pp. 226-233.Bibliography 180[43] de Castilho, J. M. V., Casanova, M. A., and Purtado, A. L., “A Temporal Framework for Database Specifications”, Proc. the Eighth International Conference onVery Large Data Base, Mexico City, Mexico, 1982. pp. 280-291.[44] Delobel, C., “Normalization and Hierarchical Dependencies in the Relational DataModel”, ACM Transactions on Database Systems, Vol. 3, No. 3, September 1978,pp. 201-222.[45] Dogac, A., and Chen P. P., “Entity-Relationship Model in the ANSI/SPARCFramework”, in Entity-Relationship Approach to Information Modelling and Analysis (the second International Conference on E-R Approach, 1981), edited by P.P.Chen, Elsevier Science B.V. (North-Holland), 1983, pp. 357-374.[46] Dogac, A., Chen P. P. and Erol, N., “the Design and Implementation of an IntegritySubsystem for the Relational DBMS RAP”, in the Fourth Conference on Entity-Relationship Approach, Chicago, Illinois, October 1985, pp. 295-302.[47] dos Santos, C. S., Neuhold, E. J., and Purtado, A. L., “A Data Type Approach tothe Entity-Relationship Model”, in Entity-Relationship Approach to System Analysis and Design (the First International Conference on E-R Approach), edited byP. P. Chen, North-Holland Publishing Co., 1980, Pp. 103-119.[48] Ehrich, H. D., Lipeck, U. W., and Gogolla, M., “Specification, Semantics, andEnforcement of Dynamic Database Constraints”, Proc. of the Tenth InternationalConference on Very Large Data Base, Singapore, August 1984, pp. 301-308.[49] Eswarn, K. P., and Chamberlin, D. D., “Functional Specifications of a Subsystemfor Data Base Integrity”, Proc. the Second International Conference on Very LargeData Base, Framingham, Massachusetts, September 1975, pp. 48-68.[50] Etzion, 0., “PARDES — a Model for Supporting Derivation Closure”, Working Paper, Computer and Information Sciences Department, Temple University, Philadelphia, Pennsylvania, 1989.[51] Fernandez, E. B., Summers, R. C., and Wood, C., Database Security and Integrity,Addison-Wesley Publishing Co., Reading, Massachusetts, 1981.[52] Fleming, C. C., and Halle, B. V.,Handbook of Relational Database Design, AddisonWesley Publishing Co., Reading, Massachusetts, 1989.[53] Fong, E., and Kimbleton, S. R., “Database Semantic Iitegrity for a Network DataVIanger”, AFIPS Proceedings of National Computer Conference, (Vol. 49), California, May 1980, pp. 261-268.Bibliography 181[54] Frost, R. A. (ed).,Database Management Systems, Granada Publishing Ltd., London, 1984.[55] Furtado, A. L., dos Santos, C. S., and de Castilho, J. M. V., “Dynamic Modellingof a Simple Existence Constraint”, Information Systems, Vol. 6, 1981, pp. 73-81.[56] Furtado, A. L., and Neuhold, E. J., Formal Techniques for Data Bases Design,Springer-Verlag, Berlin, 1986.[57] Furtado, A. L., Casanova, M. A., and Tucherman, L., “The CHRIS CONSULTANT”, in Entity-Relationship Approach (the Sixth International Conference onE-R Approach, New York , Nov. 1987), edited by S. T. March, Elsevier SciencePublishers, B.V. (North-Holland), 1988, pp. 515-532.[58] Gardarin, G., and Melkanoff, M., “Proving Consistency of Database Transactions”,Proc. of the Fifth International Conference on Very Large Data Base, Brazil, October 1979, pp. 291-298.[59] Goldstein, R. C., Database: Technology and Management, John Wiley & Sons,1985.[60] Goldstein, R. C., and Storey, V. C., “Data Abstraction: The Impact on DatabaseManagement”, Working Paper, Faculty of Commerce and Business Administration,The University of British Columbia, October 1990.[61] Goldstein, R. C., and Wagner, C., Manual of Instruction Database Software for Microcomputers, Faculty of Commerce and Business Administration, The Universityof British Columbia, December 1988.[62] Hammer, M. M., and McLeod, D. J., “Semantic Integrity in a Relational DataBase System”, Proc. the Second International Conference on Very Large Data Base,Framingham, Massachusetts, September 1975, pp. 25-47.[63] Hammer, M. M., and McLeod, D. J., “A Framework for Data Base Semantic Integrity”, Proc. the Second International Conference on Software Engineering, SanFranciso, California, October 1976, pp. 498-504.[64] Hammer, M. M., and McLeod, D. J., “Database Description with SDM: A SemanticDatabase Model”, ACM Transaction on Database Systems, Vol. 6, No. 3, September1981, pp. 351-386.[65] Hentenryck, P. V., Constraint Satisfaction in Logic Programming, MIT Press, Massachusetts, 1989.Bibliography 182[66] Hentenryck, P. V., “Incremental Constraint Satisfaction in Logic Programming”,in Logic Programming: Proc. of the Seventh International Conference, edited byD. H. D. Warren and P. Szeredi, 1990, PP. 189-202.[67] Henschen, 1. J., McCune, W. W., and Naqvi, S. A., “Compiling Constraint-Checking Programs from First-Order Formulas”, in Advances in Database Theory,Vol. , edited by H. Gallaire, J. Minker and J. M. Nicolas, Plenum Press, NewYork, 1984, pp. 145-169.[68] Heuser, C. A.. and Richter, G.. “On the Relationship between Conceptual Schemaand Integrity Constraints on Databases”, in Database Semantics (DS-1), editedby T. B. Steel, Jr. and R. Meersman, Elsevier Science Publishers, B.V. (NorthHolland), 1986, pp. 27-39.[69] Hillier, F. S., and Lieberman, G. J., Introduction to Operations Research, the fourthedition, Holden-Day, 1986.[70] Ho, H. C., “Integrity Control in a Relational Database”, Technical ReportS.O.C.S.828, School of Computer Science, McGill University, Montreal, Canada,March 1982.[71] Hohenstein, U. and Hülsmann, K., “A Language for Specifying Static and DynamicIntegrity Constraints”, in the Tenth International Conference on E-R Approach,San Mateo, California, October, 1991, (proceedings edited by T.J. Teorey), pp.389-416.[72] Holsapple, C., Shen, S., and Whinston, A., “A Consulting System for DatabaseDesign”, Information System, Vol. 7, No. 3., 1982, pp. 281-296.[73] Hsu, A., and Imielinski, T., “Integrity Checking for Multiple Updates”, Proc. ofACM SIGMOD International Management of Data, Austin, Texas, May 1985, pp.152- 167.[74] Hsu, C., Perry, A., Bouziane, M. and Cheung, W., “TSER: A Data ModellingSystem using the Two-Stage Entity-Relationship Approach” ,in Entity-RelationshipApproach (the Sixth International Conference on E-R Approach, New York, November 1987), edited by S. T. March, Elsevier Science Publishers, B.V. (NorthHolland), 1988, pp. 497-514.[75] Hull, R., and King, R., “Semantic Database Modelling: Survey, Applications, andResearch Issues”, ACM Computing Surveys, Vol. 19, No. 3, September 1987, pp.201-260.Bibliography 183[76] Jaffar, J., and Lassez, J-L., “Constraint Logic Programming”, 14th Annual SCMSymposium on Principles of Programming Languages, Munich, West Germany, January 1987, PP. 111-119.[77] Jaffar, J., and Michaylov, S., “Methodology and Implementation of a CLP system”,Proc. of 4th International Conference on Logic Programming, edited by J-L. Lassez,MIT Press, 1987, pp. 196-217.[78] Jarke, M., and Vassiliou, Y., “Databases and Expert Systems: Opportunities andArchitectures for Integration”, in New Applications of Databases, edited by G.Gardarin and E. Gelenhe, Academic Press London, 1984, pp. 185-201.[79] Jones, C. B., Systematic Software Development Using VDM, Prentice-Hall, 1986.[80] Kawaguchi, A., Taoka, N., Mizoguchi, R., Yamaguchi, T., and Kakusho, 0., “AnIntelligent Interview System for Conceptual Design of Database”, Proc. ECAI 1986.[81] Kennedy, A. J., and Yen, D. C., “Enhancing a DBMS Through the Use of anExpert System”, Journal of Information Management, Spring 1990, Pp. 55-61.[82] Kent, W., “Entities and Relationships in Information”, in Architecture and Models in Data Base Management Systems, edited by G.M. Nijssen, North-HollandPublishing Co., 1977, pp. 67-91.[83] Kent, W., “Limitations of Record-Based Information Models”, ACM Transactionson Database Systems, Vol. 4, No. 1, March 1979, pp. 107-131.[84] Kent, W., “A Single Guide to Five Normal Forms in Relational Database Theory”,Communications of the ACM, Volume 26, No. 2, February 1983, pp. 120-125.[85] Kerstern, M. L., Weigand, H., Dignum, F., Boom, J., “A Conceptual IViodellingExpert System”, in Entity-Relationship (the Fifth International Conference on ER Approach, France, 1986) edited by S. Spaccapietra, Elsevier Science PublishersB.V. (North-Holland), 1987, pp. 35-48.[86] Kim, M.-J., Lee, W.-U., and Derniame, J.-C., “Automatic Relational Data BaseDesigns by Transformation of the Entity-Relationship Model”, IEEE the SecondInternational Conference on Computer and Application, Beijing, China, 1987, pp.418-425.[87] Knuth, E., Hannák, L., Radó, P., “A Taxonomy of Conceptual Foundations”, inData and Knowledge (DS-2), edited by R.A. Meersrnan and A.C. Sernadas, ElsevierScience Publishers, B.V. (North-Holland), 1988, pp. 205-219.Bibliography 184[88] Kobayashi, I., “Validating Database Updates”, Information Systems, Vol. 9, No.1., 1984, pp. 1-17.[89] Kozaczynski, W., and Lilieri, L., “An Extended Entity-Relationship (E2R)Database Specification and its Automatic Verification and Transformation intothe Logical Relational Design”, in Entity-Relationship Approach (the Sixth International Conference on E-R Approach, New York , November, 1987), edited by S.T. March, Elsevier Science Publishers, B.V. (North-Holland), 1988, pp. 533-549.[90] Kung, C. H., “A Temporal Framework for Database Specification and Verification”,Proc. of the Tenth International Conference on Very Large Data Base, Singapore,August 1984, pp. 91-99.[91] Kung, C. H., “A Tableaux Approach for Consistency Checking”, Information Systems: Theoretical and Formal Aspects, edited by A. Sernadas, J. Bubenko, Jr., andA. Olive, 1985, pp. 191-207.[92] Lafue, G., “Semantic Integrity Dependencies and Delayed Integrity Checking”,Proc. the Eighth International Conference on Very Large Data Base, Mexico City,Mexico, 1982, pp. 292-299.[93] Lassez, C., “Constraint Logic Programming”, BYTE, August 1987, pp. 171-176.[94] Lauriere, J. L., “A Language and a Program for Stating and Solving CombinatorialProblems”, Artificial Intelligence, Vol. 10, No. 1, 1978, pp. 29-127.[95] Lee, R. M., “Logic, Semantics and Data Modelling: An Ontology”, in Data andKnowledge (DS-2), edited by R. A. Meersman and A. C. Sernadas, Elsevier SciencePublishers, B .V. (North-Holland), 1988, pp. 221-243.[96] Lee, K., and Lee, S., “An Object-Oriented Approach to Data/Knowledge Modelling Based on Logic”, The Sixth International Conference on Data Engineering,California, February 1990, pp. 289-294.[97] Leuzerini, M., and Nobili, P., “On the Satisfiahility of Dependency Constraintsin Entity-Relationship Schema”, Proc. the Thirtieth International Conference onVery Large Data Base, Brighton, 1987, pp. 147-154.[98] Lenzerini, v1., and Santucci, G., “Cardinality Constraints in the EntityRelationship Model”, in Entity-Relationship Approach to Software Engineering (theThird International Conference on E-R Approach, California, 1983), edited by C.G. Davis, S. Jajodia, P. A. Ng and R. T. Yeh, Elsevier Science Publishers B. V.(North-Holland), 1983, pp. 529-549.Bibliography 185[99] Leveson, N. G., Wasserman, A. I., and Berry, D. M., “BASIS: A Behavioral Approach to the Specification of Information Systems”, Information Systems, Vol. 8,No. 1, 1983, pp. 15-23.[100] Ling, T.-W., “Integrity Constraint Checking in Deductive Database using the Pro-log not-Predicate”, Tech. Report, DISCS Pub. No. NOTRA7/86, National University of Singapore, July 1986.[101] Ling, T.-W., and Rajagopalan, P., “A Method to Eliminate Avoidable Checkingof Integrity Constraints”, Proc. Trends é4 Applications, Gaithereshurg, Maryland,1984, pp. 60-68.[102] Lipeck, U. W., “Stepwise Specification of Dynamic Database Behaviour”, International Conference on Data Engineering, Washington, D.C., May 1986, pp. 387-397.[103] Lockemann, P. C., “Object-Oriented Information Management”, Decision SupportSystem, 5, 1989, pp.79-102.[104] Mackworth, A. K., “Consistency in Networks of Relations”, Artificial Intelligence,Vol. 8, 1977, pp. 99-118.[105] Mackworth, A. K., and Freuder, E. C., “The Complexity of Some PolynomialNetwork Consistency Algorithms for Constraint Satisfaction Problems”, ArtificialIntelligence, Vol. 25, No. 1, January 1985, pp. 65-74.[106] Mackworth, A. K., “Constraint Satisfaction”, Encyclopedia of Artificial Intelligence, edited by S. C. Shapiro, J. Wiley & Sons, N.Y., 1987, pp. 205-211.[107] MaFadden, F. R., and Hoffer, J. A., Data Base Management, the second edition,Benjamin/Cummings Publishing Co. Inc., 1988.[108] Makowsky, J. A., Markowitz, V. M., Rotics, N., “Entity-Relationship ConsistencyFor Relational Schemas”, Proc. International Conference on Database Theory,(Lecture Note in Computer Science V. 243) edited by G. Ausiello, and P. Atzeni,Springer-Verlag, 1986, pp. 306-322.[109] Maunila, H., and Räihä, K-J, “Inclusion Dependencies in Database Design”, TheSecond International Conference on Data Engineering, Los Angeles, California,February 1986, pp. 713-718.[110] Maryanski, F., Francis, S., Hong, S., and Peckham, J., “Generation of ConceptualData Models”, Working Paper, Computer Science and Engineering Department,University of Connecticut, 1984.Bibliography 186[111] Maryanski, F., and Hong, S., “A Tool for Generating Semantic Database Applications”, IEEE COMPSAC, Chicago, Illinois, October 1985, PP. 368-375.[112] Meersman, R., “Towards Models for Practical Reasoning about ConceptualDatabase Design”, in Data and Knowledge (DS-2), edited by R. A. Meersmanand A. C. Sernadas, Elsevier Science Publishers, B.V. (North-Hollalld), 1988, pp.245-263.[113] Mees, M., and Put, F., “ Extending a Dynamic vIodelling Methods using DataModelling Capabilities: The Case of JSD”, in Entity-Relationship (the Fifth International Conference on E-R Approach, France, 1986) edited by S. Spaccapietra,Elsevier Science Publishers B.V. (North-Holland), 1987, Pp. 399-418.[114] Missikoff, M., and Wiederhold, G., “Toward a Unified Approach for Expert andDatabase Systems”, in Expert Database Systems (Proceedings from the first International Workshop), edited by L. Kerschberg, Benjamin/Cummings Publishing,1986, pp-383-99.[115] Morgenstern, M., “Active Databases as a Paradigm for Enhanced Computing Environments”, Proc. the Ninth International Conference on Very Large Data Base,Italy, 1983, pp. 34-42.[116] Morgenstern, M., “CONSTRAINT EQUATIONS: Declarative Expression of Constraints with Automatic Enforcement”, Proc. the Tenth International Conferenceon Very Large Data Base, Singapore, August 1984a, pp. 291-300.[117] Morgenstern, M., “A Concise Compatible Representation for Quantified Constraints in Semantic Networks”, AAAI-84, Proc. of National Conference on Artificial Intelligence, Texas, August 1984b, pp. 255-259.[118] Morgenstern, M., “The Role of Constraints in Databases, Expert Systems, andKnowledge Representation”, in Expert Database Systems, edited by L. Kerschberg,Benjamin/Cummings Publishing Co., 1986, pp. 351-368.[119] Morgenstern, M., Borgida, A., Lassez, C., Maier, D., and Wiederhold, G.,“Constraint-Based Systems: Knowledge about Data”, in Expert Database Systems(Proc. from the Second International Conference on EDS), edited by L. Kerschberg,Benjamin/Cummings Publishing Co., 1989, pp. 23-43.[120] Nakano, R., “Integrity Checking in a Logic-Oriented ER Model”, in Entity-Relationship Approach to Software Engineering (the Third International Conference on E-R Approach, California, 1983), edited by C. G. Davis, S. Jajodia, P. A.Ng and R. T. Yeh, Elsevier Science Publishers B. V. (North-Holland), 1983, pp.551-564.Bibliography 187[121] Newell, A., and Simon H. A., “Computer Science as Empirical Inquiry: Symbolsand Search”, Communications of the ACM, Volume 19, March 1976, PP. 113-126.[122] Nicolas, J.-M., “Logic for Improving Integrity Checking in Relational Data Bases”,Acta Informatica, 18, 1982, pp. 227-253.[123] Nilssori, N. J., Principles of Artificial Intelligence, Tioga Publishing Company,1980.[124] Obretenov, D., Angelov, Z., Mihaylov, J., Dishlieva, P., and Kirova, N., “AKnowledge-Based Approach to Relational Database Design”, Data é4 KnowledgeEngineering, 3, 1988, pp. 173-180.[125] Oren, 0., “Integrity Constraints in the Conceptual Schema Language SYSDOC”,the Fourth Conference on Entity-Relationship Approach, Chicago, Illinois, October1985, pp. 288-294.[126] Palmer, I. R., “Practicalities in Applying a Formal Methodology to Data Analysis”,in Data Base Design Techniques I: Requirements and Logical Structures, edited byS. B. Yao, et al., Spring-Verlag, Berlin, 1982, pp. 147-171.[127] Papadirnitriou, C. H., and Steiglitz, K., Combinatorial Optimization: Algorithmsand Gomplexity, Prentice-hall, N.J., 1982.[128] Paulson, D., Reasoning Tools to Support System Analysis and Design, UnpublishedPh.D. Dissertation, The University of British Columbia, Vancouver, B.C., Canada,1989.[129] Peckham, J., and Maryanski, F., “Semantic Data Models”, ACM Computing Surveys, Vol. 20, No. 3, September 1988, pp. 153-189.[130] Potter, W. D., and Kerschberg, L., “A Unified Approach to Modelling Knowledgeand Data”, in Data and Knowledge (DS-), edited by R.A. Meersman and A.C.Sernadas, Elsevier Science Publishers, B.V. (North-Holland), 1988, pp. 265-291.[131] Proix, C., and Rolland, C., “A Knowledge Base for Information System Design”, inData and Knowledge (DS-2), edited by R.A. Meersman and A.C. Sernadas, ElsevierScience Publishers, B.V. (North-Holland), 1988, pp. 293-306.[132] Qian, X., and Wiederhold, G., “Knowledge-based Integrity Constraint Validation”,Proceedings of the Twelfth International Conference on Very Large Data Base,Kyoto, 1986, pp. 3-12.Bibliography 188[133] Qian, X., and Smith, D. R., “Integrity Constraint Reformulation for Efficient Validation”, Proceedings of the thirteenth International Conference on Very Large DataBase, Brighton, 1987, pp. 417-425.[1341 Raju, K., and Majumdar, A. K., “ Fuzzy Functional Dependencies and LosslessJoin Decomposition of Fuzzy Relational Database Systems”, AGM Transactionson Database Systems, Vol. 13, No. 2, June 1988, pp. 129-166.[135] Ram, S., “Automated Tools for Database Design: Sate of the Art”, Working Paper, Dept. of Management Information Systems, College of Business and PublicAdministration, University of Arizona, 1989.[136] Reiter, R., “ On the Integrity of Typed First Order Data Bases”, in Advancesin Database Theory, Vol. 1, edited by H. Gallaire, J. Minker and J. M. Nicolas,Plenum Press, New York, 1984, pp. 137-157.[137] Reiter, R., “On Integrity Constraints”, Proc. of the Second Conference on Theoretical Aspects of Reasoning about Knowledge, 1988, pp. 97-111.[138] Rolland, C., and Proix, C., “An Expert System Approach to Information SystemDesign”, in Information Processing 86, edited by H. J. Kugler, Elsevier SciencePublishers B.V. (North-Holland), 1986, pp. 241-250.[139] Sakai, H., “On the Optimization of an Entity-Relationship Model”, 3rd USA-JAPAN Computer Conference, San Francisco, California, October 1978, pp. 145-149.[140] Sakai, H., “A Method for Entity-Relationship Behaviour Modelling”, in Entity-Relationship Approach to Software Engineering (the Third International Conferenceon E-R Approach, California, 1983), edited by C. G. Davis, S. Jajodia, P. A. Ng andR. T. Yeh, Elsevier Science Publishers B. V. (North-Holland), 1983a, pp. 111-129.[141] Sakai, H., “E-R Approach to Logical Database Design”, in Entity-RelationshipApproach to Software Engineering (the Third International Conference on E-RApproach, California, 1983), edited by C. G. Davis, S. Jajodia, P. A. Ng and R. T.Yeh, Elsevier Science Publishers B. V., North Holland, 1983b, pp. 155-187.[142] Scheuermann, P., Schiffner, G., and Weber, H., “Abstraction Capabilities and Invariant Properties Modelling within the Entity-Relationship Approach”, in EntityRelationship Approach to System Analysis and Design (the First International Conference on E-R Approach), edited by P. P. Chen, North-Holland Publishing Co.,1980, pp. 121-140.Bibliography 189[143] Schrefl, M., Tjoa, A. M., and Wagner, R. R., “Comparison-Criteria for SemanticData Models”, International Conference on Data Engineering, 1984, pp. 120-124.[144] Segev, A., “Transitive Dependencies”. in the Surveyors’ Forum of AM ComputingSurveys, Vol. 19, No. 2, June 1987, pp. 191-193.[145] Shepherd, A., and Kerschberg, L., “Constraint Management in Expert DatabaseSystems”, in Expert Database Systems (Proc. form the First International Workshop), edited by Larry Kerschberg, Benjamin/Cummings Publishing Co., 1986, pp.309-331.[146] Simon, E., and Valduriez, P., “Design and Implementation of an Extendible Integrity Subsystem”, ACM-SIGMOD Proc. International Conference on Management of Data, Boston, Massachusetts, June 1984, pp. 9-17.[147] Smith, J. M., and Smith, D. C. P., “Database Abstractions: Aggregation and Generalization”, ACM Transactions on Database Systems, Vol. 2, No. 2, June 1977a,pp. 105-133.[148] Smith, J. M., and Smith, D. C. P., “Database Abstractions: Aggregation”, Communications of the ACM, Vol. 20, No. 6, June 1977b, pp. 405-413.[149] Solvberg, A., and Kung. C. H., “On Structural and Behavioral Modelling of Reality”, in Database Semantics (DS-1), edited by T.B. Steel, Jr. and R. Meersman,Elsevier Science Publishers, B.V. (North-Holland), 1986, pp. 205-221.[150] Spivey, J. M., Understanding Z, Cambridge University Press, 1988.[151] Stonebraker, M., “Implementation of Integrity Constraints and Views by QueryModification”, Proc. of ACM SIGMOD International Management of Data, SanJose, May 1975, pp. 65-78.[152] Storey, V. C., View Creation: An Expert View Creation System for DatabaseDesign, Ph.D. Dissertation, Faculty of Commerce and Business Administration,University of British Columbia, Vancouver, B.C., Canada, October 1986, ICITPress,1988.[153] Storey, V. C., and Goldstein, R. C., “A Methodology for Creating User Viewsin Database Design”, AC’M Transactions on Database Systems, Vol. 13, No. 3,September 1988, pp. 30.5-338.[154] Storey, V. C. and Goldstein, R. C., “Design and Development of an ExpertDatabase Design System”, International Journal of Expert Systems: Research andApplications, Vol.3, No.1, 1990, pp. 31-63.Bibliography 190[155] Storey, V. C., and Goldstein R. C., “Knowledge-Based Approaches to DatabaseDesign”, Working Paper, University of Rochester, 1991.[156] Studer, R., “A Conceptual Model for Physical and Logical time”, in Entity-Relationship Approach (the Sixth International Conference on E-R Approach, NewYork , Nov. 1987), edited by S. T. March, Elsevier Science Publishers, B.V. (North-Holland), 1988, pp. 223-235.[157] Su, S. Y. XV., and Raschid, L. “Incorporating Knowledge Rules in a SemanticData Model: An Approach to Integrated Knowledge Management”, the SecondConference on Artificial Intelligence Application, Miami, Florida, December 1985,pp. 250-256.[158] Tauzovich, B., “An Expert System for Conceptual Data Modelling”, in the EighthInternational Conference on E-R Approach, Toronto, Canada, October 1989.[159] Tauzovich, B., “Towards Temporal Extensions to the Entity-Relationship Model”,in the Tenth International Conference on E-R Approach, San Mateo, California,October. 1991, (proceedings edited by T. J. Teorey). pp. 163-179.[160] Teorey, T. J., Yang, D., and Fry, J. P., ‘ A Logical Design Methodology for Relational Databases Using the Extended Entity-Relationship Model”, ComputingSurveys, Vol. 18, No. 2, June 1986, pp. 197-222.[161] Thompson, J. P., Data with Semantics: Data Models and Data Management, VanNostrand Reinhold, N.Y., 1989.[162] Troyer, 0. D., RIDL*: A Tool for the Computer-Assisted Engineering of LargeDatabases in the Presence of Integrity Constraints”, Proc. of ACM SIGMOD International Management of Data, Oregon, June 1989, pp. 418-429.[163] Tsichritzis, D. C., and Lochovsky, F. H., Data Models, Prentice-Hall Inc., 1982.[164] Ullman, J. D., Principles of Database Systems, the second edition, Computer Science Press, Rockville. Maryland, 1982.[165] Urban, S. D., and Delcambre, L. M. L., “An Analysis of the Structural Dynamic,and Temporal Aspects of Semantic Data Models”, The Second International Conference on Data Engineering, Los Angeles, California, February, 1986, pp. 382-389.[166] Urban, S. D., and Delcambre, L. M. L., “Constraint Analysis for Specifying Perspectives of Class Objects”, The Fifth International Conference on Data Engineering, Los Angeles, California, February 1989, pp. 10-17.Bibliography 191[167] van der Lans, R. F., The SQL Standard — A Complete Reference, originally inDutch, translated into English by Andrea Gray, original Dutch published by Academic Science, Schoonhoven, 1988 (translated English version published by PrenticeHall, 1989).[168] Wagner, C., View Integration in Database Design Unpublished Ph.D. Dissertation,The University of British Columbia, Vancouver, B.C., Canada, April 1989.[169] Wand, Y., and Weber, R., “An Ontological Analysis of Some Fundamental Information Systems Concepts”, Proc. of the Ninth International Conference on Information Systems, Minneapolis, Mn., 1988, pp. 213-225.[170] Wand, Y., and Weher, R., “An Ontological Analysis of Some Systems Analysisand Design Methods”, in Information Systems Concepts— An In-Depth Analysis,edited by E. Falkenberg and P. Lindgreen, North-Holland Publishing Co., Amsterdam, 1989, pp. 79-107.[171] Wand, Y., and Weber, R., “Toward a Theory of the Deep Structure of InformationSystems”, Proc. of the Eleventh International Conference on Information Systems,Copenhagen. December 1990.[172] Weber, W., Stucky, W., and Karszt, J., “Integrity Checking in Data Base Systems”,Information Systems, Vol. 8, No. 2, 1983, pp. 125-136.[173] Webre, N.W., “An Extended Entity-Relationship Model and its Use on a DefenseProject”, in Entity-Relationship Approach to Information Modelling and Analysis(the second International Conference on E-R Approach, 1981), edited by P. P.Chen, Elsevier Science, (North-Holland Publishing Co.), 1983, pp. 173-193.[174] Wilmot, R. B., “Foreign Keys Decrease Adaptability of Database Designs”, Communications of the ACM, Vol. 27, No. 12, December 1984, pp. 1237-1243.[175] Wong, H. K.T., and Mylopoulos, J., “Two Views of Data Semantics: A Survey ofData Models in Artificial Intelligence and Database Management”, INFOR, Vol.15, No. 3, October 1977, pp. 344-383.[176] Yang, H.-L., and Goldstein, R. C., “Identification of Semantic Integrity Constraintsfor Database Design”, Working Paper, 89-MIS-021, Faculty of Commerce and Business Administration, The University of British Columbia, 1989.[177] Yang, H.-L., “Semantic Integrity Constraint Representation Model: Some Illustrative Examples”, Working Paper, Faculty of Commerce and Business Administration, The University of British Columbia, February 1992.Bibliography 192[178] Yasdi, R., and Ziarko, W., “Conceptual Schema Design: A Machine LearningApproach”, in Methodologies for Intelligent Systems, edited by Z. W. Ras andM. Zemankova, Elsevier Science Publishers Co., 1987, pp. 379-391.[179] Zvieli, A., and Chen, P. P., “Entity-Relationship Modelling and Fuzzy Databases”,The Second International Conference on Data Engineering, Los Angeles, California,February 1986, pp. 320-327.Appendix ABNF Descriptions of the SIC Representation ModelThis appendix provides a summary of the syntax of the SIC Representation model usedin this research. The following usual BNF meta-symbols are used: < > { } [If a terminal symbol happens to be identical to a meta-symbol, it will be written withina pair of escape symbols, e.g., :j representing the terminal []CERTAINTY FOR ON [IF ]ASSERT ELSE ::= -< operation abbreviation>-[-< optional part>]SIC type is specified by the automated database design system or the databasedesigner using some conventions. [-] ::= I .< attribute name>I .< attribute name>.Entity type name, relationship type name, relation type name, and attribute nameare all specified by the database designer, and by convention, begin with a capitalletter. ::= I D U193Appendix A. BNF Descriptions of the SIC Representation Model 194 ::= ( {, }) :: Prolog variable is any name beginning with a capital letter or an underline sign(following the Prolog convention). In this case, it is used to represent a singleobject type name or a set including some object type names. 1 2 I 3Note that they are positive integer numbers. ::= I ::= certain uncertainNote that “certain” is equivalent to the ratio certainty number 100%, “uncertain”expresses an unknown ratio certainty number that is less than 100%. ::= usually I sometimesNote that the database designer would specify these fuzzy certainty factors to beequivalent to some certainty numbers (for example, “usually” may be set to 80%). ::= 1 2 3Note that they are positive integer numbers. ::= 1% I ... I 100%Note that they are positive real number from 1% to 100% inclusive and are represented in percentage. ::= insertion deletion I update :— [ ] [ ] :: [ ] ::= () ()Appendix A. BNF Descriptions of the SIC Representation Model 195 ::= [(< argument> {,})]Predicate name is any name beginning with a lower-case letter following the usualProlog convention. The built-in logical predicates for storing or manipulating information in data dictionary are listed in Appendix B. not j — ::= A ::= V ::= V I ::= I ::= I Atom is the same as in Prolog, i.e. it is made up of letters and digits, and beginswith a lower-case letter; string is a character string. ::= ::=