Open Collections

UBC Theses and Dissertations

UBC Theses Logo

UBC Theses and Dissertations

Incorporating semantic integrity constraints in a database schema Yang, Heng-li 1992

Your browser doesn't seem to have a PDF viewer, please download the PDF to view this item.

Item Metadata

Download

Media
831-ubc_1992_fall_yang_heng-li.pdf [ 6.94MB ]
Metadata
JSON: 831-1.0086621.json
JSON-LD: 831-1.0086621-ld.json
RDF/XML (Pretty): 831-1.0086621-rdf.xml
RDF/JSON: 831-1.0086621-rdf.json
Turtle: 831-1.0086621-turtle.txt
N-Triples: 831-1.0086621-rdf-ntriples.txt
Original Record: 831-1.0086621-source.json
Full Text
831-1.0086621-fulltext.txt
Citation
831-1.0086621.ris

Full Text

INCORPORATING SEMANTIC INTEGRITY CONSTRAINTSIN A DATABASE SCHEMAl)yHeng-Li YangB. Sc., National Chiao Tung University, 1978M. Commerce, National Cheng Chi University, 1980M. Sc. (Computer Science), Pennsylvania State University, 1985A THESIS SUBMITTED IN PARTIAL FULFILLMENT OFTHE REQUIREMENTS FOR THE DEGREE OFDOCTOR OF PHILOSOPHYinTHE FACULTY OF GRADUATE STUDIESTHE FACULTY OF COMMERCE AND BUSINESS AI)MINISTRATIONWe accept this thesis as conformingto the required sta dardTHE UNIVERSITY OF BRITISH COLUMBIAAugust 1992© Heng-Li Yang, 1992In presenting this thesis in partial fulfilment of the requirements for an advanceddegree at the University of British Columbia, I agree that the Library shall make itfreely available for reference and study. I further agree that permission for extensivecopying of this thesis for scholarly purposes may be granted by the head of mydepartment or by his or her representatives. It is understood that copying orpublication of this thesis for financial gain shall not be allowed without my writtenpermission.______________Department of OJIcL 8usiui esc Ad,nmis*iatioJ%The University of British ColumbiaVancouver, CanadaDate 6’, /792DE-6 (2/88)AbstractA database schema should consist of structures and semantic integrity constraints. Semantic integrity constraints (SICs) are invariant restrictions on the static states of thestored data and the state transitions caused by the primitive operations: insertion, deletion, or update. Traditionally, database design has been carried out on an ad hoc basisand focuses on structure and efficiency. Although the E-R model is the popular conceptual modelling tool, it contains few inherent SICs. Also, although the relational databasemodel is the popular logical data model, a relational database in fourth or fifth normalform may still represent little of the data semantics. Most integrity checking is distributedto the application programs or transactions. This approach to enforcing integrity via theapplication software causes a number of problems.Recently, a number of systems have been developed for assisting the database designprocess. However, only a few of those systems try to help a database designer incorporateSICs in a database schema. Furthermore, current SIC representation languages in theliterature cannot be used to represent precisely the necessary features for specifyingdeclarative and operational semantics of a SIC, and no modelling tool is available toincorporate SICs.This research solves the above problems by presentillg two models and one subsystem.The E-R-SIC model is a comprehensive modelling tool for helping a database designer incorporate SICs in a database schema. It is application domain-independent and suitable11for implementation as part of an automated database design system. The SIC Representation model is used to represent precisely these SICs. The SIC elicitation subsystemwould verify these general SICs to a certain extent, decompose them into sub-SICs ifnecessary, and transform them into corresponding ones in the relational model.A database designer using these two modelling tools can describe more data semanticsthan with the widely used relational model. The proposed SIC elicitation subsystem canprovide more modelling assistance for him (her) than current automated database designsystems.U’Table of ContentsAbstract iiList of Figures xiAcknowledgement xii1 Introduction 11.1 Database Design 11.2 Semantic Integrity Constraints 31.3 SICs, “Constraints” and Transactions 41.4 Enforce Semantic Integrity Constraints via Application Software 81.5 Embed Semantic Integrity Constraints in a Database 101.6 The Research Questions, Objectives, Methodology and Scope 121.7 Contributions 171.8 The Dissertation Outline 192 Review of Previous Work 20iv2.1 SIC Classification.2.1.1 Classification Based on2.1.2 Classification Based on2.1.3 Classification Based on2.1.4 Classification Based on2.1.5 Classification Based on2.1.6 Classification Based on2.1.7 Classification Based on2.1.8 Classification Based on2.22.32.42.5ExplicitnessApplied ObjectsOperation TypePreconditionCertainties of SICsViolation ActionEnforcement Schedule .Dynamics21212225262628293134343840413 A Model for Representing Semantic Integrity Constraints3.1 An Overview of the Representation Model3.2 SIC Name4343512.1.9 Summary of Comments on SIC ClassificationSIC RepresentationSIC VerificationSIC Reformulation and DecompositionAutomated Database Design Aids for Eliciting SICsv3.3 Certainty Factor (F) 533.4 Object (0) 543.5 Operation Type (T) 553.6 Precondition (C) 553.7 Predicate (P) 563.8 Violation Action (A) 574 The Application of the SIC Representation Model 614.1 Completeness of a SIC Specification for Database Design 614.1.1 Not Fewer Components 634.1.2 No More Components 694.2 SIC Abstractions 764.3 Database Management 804.3.1 SIC Management 804.3.2 Other Aspects of Database Management 845 An Extended E-R Model Incorporating Semantic Integrity Constraints 865.1 Problems with Previous E-R Models 865.2 An Overview of the E-R-SIC Model 88vi5.2.1 Primitive Modelling Constructs 885.2.2 Data Abstractions 925.2.3 Basic Properties of SICs 975.3 Entity Attribute SICs 1045.4 Entity SICs 1065.5 Relationship SICs 1095.5.1 Necessary Conditions 1125.5.2 Sufficient Conditions 1185.6 SICs Implied by Implicit Relationships and Data Abstractions 205.7 Summary of the E-R-SIC Model 1246 The Application of the E-R-SIC Model 1306.1 An Example of Using the E-R-SIC Model 1306.2 Potential Pitfalls of Using the E-R-SIC Model . . . 1376.3 Data Integrity Semantic Completeness 1407 A Proposed Database Design Aid for Eliciting SICs 1447.1 An Overview of the SIC Elicitation Subsystem . . . 1457.2 SIC Verification. 149rjj7.2.1 Consistency and Nonredundancy Rules for SIC Types 1557.3 SIC Reformulation and Decomposition 1577.3.1 Representation of Generic SICs 1617.4 Transforming SICs to Relational Form 1658 Conclusions and Further Research 1708.1 Conclusions and Contributions 1708.2 Future Research Extensions to this Dissertation 172Bibliography 176Appendices 193A BNF Descriptions of the SIC Representation Model 193B Summary of the Predicates used in this Research 198B.1 Input Predicates 198B.2 Manipulation Predicates 203C BNF Descriptions of the Simplified Format 210D SIC Type Classification in the E-R-SIC Model 213viiiE Examples of Heuristics 225F Verification of Aggregate Attribute SICs and Cardinalities 227F.1 Simple Tests on Aggregate Attribute SICs 227F.2 Algorithms for Verifying Cardinalities 228G Consistency and Nonredundancy Rules for SIC Elicitation Subsystem 231H SIC Reformulation and Decomposition Algorithms 23611.1 Find the Relevant Object and Operation Components 236H.2 Write the Proper Precondition and Predicate Components 240H.3 Suggest the Violation Action Component 245H.4 Generate the SIC Name 246I Some Examples of SIC Reformulation and Decomposition 248J Generic SIC Representation in the E-R-SIC Model 254K Algorithms Transforming SICs to Relational Form 266K.1 Transform the SIC Representation 266IK.2 Construct SIC Name Sets for the Foreign Key Update 269ixL Some Examples of Transforming SICs to Relational Form 270M Generic SIC Representation in the Relational Model 276xList of Figures1.1 A Proposed Automated Database Design Subsystem for Eliciting SICs . 165.1 Grouping: Member (M), Derived Set (DS), Indexing Entity (I), IndexingRelationship (R) 975.2 A Line Layout Context 1105.3 A Sta.r Layout Context 1115.4 A Loop-2 Layout Context 1115.5 A Loop-n Layout Context 1125.6 What is in the database? 1255.7 Single Entity Attribute SICs 1265.8 Single Entity SICs 1275.9 Relationship SICs 1285.10 SICs Implied by Implicit Relationships and Data Abstractions 1296.1 All Example: A Car Dealership Database. 131xiAcknowledgementI wish to take this opportunity to express my sincere gratitude to all members of mydissertation committee, Professor Alvin Fowler, Dr. Robert C. Goldstein, Dr. Veda C.Storey, Dr. Yair Wand, and Dr. Carson Woo. In particular, I am especially indebted tomy research supervisor, Dr. R. C. Goldstein, for his patience, support and countless hoursof valuable discussions. I am also grateful to Dr. V. Storey because she suggested thisgreat research topic and gave her opinions all the way. Many thanks go to Dr. Y. Wandfor his critical and stimulating comments. I appreciate Dr. C. Woo for his commentson the dissertation organization and Professor Al. Fowler for his practical suggestions.In addition, I would like to thank my friend, Dr. Hsueh-Ming Hang, and all staff of theinter-library loa.n division of UBC Library, for their helping collect the related literaturein the early stage of this research. Finally, I thank my parents for their love and supportall the time.xiiChapter 1Introduction1.1 Database DesignDatabase management systems (DBMSs) have been available for more than two decades.However, database design has also been recognized as a task with a high level of complexity ([Obretenov, et al., 1988]) and has often been referred to as an art rather than ascience ([Holsapple, et al., 1982]). Database design is the process of modelling the information requirements of a real-world application and mapping them onto an underlyingDBMS. Database design must go through the following phases: (1) information requirement elicitation, during which requirements for knowledge of a real-world application aredetermined; (2) conceptual database design, which produces a high-level representationof the requirements independent of the DBMS that will be used, e.g., the output mightbe expressed as an Entity-Relationship (E-R) model; (3) logical database design, whichproduces a logical schema that corresponds to the data model of the chosen DBMS, e.g.,the output might he expressed as a relational data model; and (4) physical databasedesign, which transforms the logical design into a form that is suitable for the givenhardware and DBMS, and considers efficiencies of storage and processing.Traditionally, database design has been carried out on an ad hoc basis ([Bouzeghoubet al., 1985], [Goldstein, 1985j). It has usually been performed by a human “database1Chapter 1. Introduction 2design expert”. The weaknesses of the traditional approach are that: (1) it is a difficulttask and expert database designers are scarce; and (2) the design of a database is done bysomeone who is unfamiliar with the application domain, instead of end-users ([Storey andGoldstein, 1991]). Recently, a number of computerized systems have been developed forassisting the database design process ([Ram, 1989], [Storey and Goldstein, 1990]). Someof those can be classified as knowledge-based or expert systems; others are automationtools. Those systems are designed for assisting conceptual and/or logical design processes.Some even provide help for physical database design. Automation of database designprocess can help overcome the problems of the above traditional approach by codifyingthe database design methodology in a computer program.However, some major problems remain. First, only a few of the above automatedsystems try to help a database designer incorporate semantic integrity constraints ina database schema ([Yang and Goldstein, 1989]). Many types of semantic integrityconstraints have never been identified by any system. Furthermore, as observed byTroyer [1989, p. 423], “constraints often considered as first class citizens in the conceptualmodelling seem to become pariahs during the transformation [from a conceptual schemato a relational schema]”. Second, those automated systems usually only consider databasestates and static properties. They seldom consider the behaviour of a database, that is,the state transitions and dynamic properties. Some systems (e.g., E2R [Kozaczynskiand Lilien, 1988] and EXIS [Yasdi and Ziarko, 1987]) model the behavioral semantics bytransaction modelling or event modelling. Dynamic semantic integrity constraints, whichplace restrictions on database transitions, have not really been treated as constraints ondata.Chapter 1. Introduction 31.2 Semantic Integrity ConstraintsSemantic integrity is concerned with the logical meaning (i.e., the intension) of storeddata and preserving the correctness of database contents even though users and application programs try to modify it incorrectly ([Fong and Kimbleton, 1980], [Fernandez, etal., 1981]). A database schema consists of structures and semantic integrity constraints(hereafter abbreviated as SICs) ([Tsichritzis and Lochovsky, 1982], [Frost, 1984]). Thestructure part tells us relatively little information other than the basic structure whatelementary items of data are grouped into what larger units; but the SIC part providesinformation about all allowable occurrences— current and future ([Morgenstern, et al.,1989]). These SICs express data integrity semantics, that is, the part of the meaning ofstored data needed to determine correctness. They are invariant restrictions on the staticstates of the stored data and the state transitions caused by the primitive operations:insertion, deletion, or update. They express what is and is not allowed in the part of theuniverse that is represented by the stored data’.Traditional database design techniques focus on structure and efficiency. The relational database model is the popular logical data model with a good theoretical foundation. Data dependency (e.g., functional dependencies, and multivalued dependencies) theory has been well-formalized in the literature (e.g., [Delohel, 1978], [Ullman, 1982]). However, data dependencies only capture part of semantic integrity. A relational database infourth or fifth normal form may still represent little of the data semantics, that is, themeaning of stored data. In addition, as observed by Kent [1979, p. 127], “The assurnption teiids to be that functional dependencies (if specified at all) have been used duringthe design phase of the database to insure that relations are in third normal form, and‘SICs also provide information that can be used to determine the most appropriate structure for theschema.Chapter 1. Introduction 4then discarded. They do not seem to be present at run time to explain the semanticstructure of the data.”Limitation of SICs Although a database with embedded SICs would be more correctthan the same one without SICs, the absolute correctness of the database is still notguaranteed. For instance, a user might incorrectly update the salary of an employee but,as long as it was within the range allowed by the SICs, it would still be accepted bythe DBMS2.Also note that enforcement of a SIC is based on the assumption that alldata already stored in the database are correct. If we know that some data were enteredincorrectly, the related SICs may need to be turned off in order to correct these errors3.In addition, SICs are passive restrictions on data. By incorporating violation actionsto alter other objects when a SIC is violated, one can make the database more active([Morgenstern, 1983]). However, a SIC can only trigger some action when it is violated.1.3 SICs, “Constraints” and TransactionsThe reader should be cautious that in the literature the word “constraints” is sometimesused to include all abstract relationships between objects in an information system, e.g.,the correctness of mapping to physical storage, and those preserving reliability, concurrentconsistency, and security. The SICs discussed in this dissertation are a proper subset of2Similarly, Thompson [1989, P. 95] points out that “a semantic database does not, and cannot,provide meaning, or strong-semantics, but it provides contexts within which it is possible for data to bemeaningful”. He names these contexts as weak-semantics. This dissertation does not adopt that term,but the reader should be cautious of the limitation.3For instance, if a SIC states “salary never decreases”, an update from $12,000 to $10,000 will berejected. However, it is possible that the $12,000 was entered incorrectly the first time. The SIC mustbe turned off in order to correct the input error.Chapter 1. Introduction 5these more general “constraints” (e.g., [Shepherd and Kerschberg, 1986]) “laws”, or “sub-laws” ([Paulson, 1989], [Wand and Weber, 1988; 1989; 1990]). The following are somekinds of “constraints” that are not dealt with in this research.Other Database Constraints In a database management system, there are at leastfour major aspects to the prevention of errors in a database environment: reliability,concurrent consistency, security, and semantic integrity ([Hammer and McLeod, 1975],[Eswarn and Chamberlin, 1975]). Reliability is concerned with errors due to the malfunctioning of system hardware or software. Concurrent consistency is the prevention ofinconsistencies that may arise due to concurrent processing (in which multiple processesconcurrently operate on shared data). Security deals with preventing users from accessing and manipulating the data in unauthorized ways. Although unauthorized updatesto the database are sometimes said to violate the “integrity” of the system, this type oferror is a security problem and is not considered in this research.Non-database Constraints Database design is part of information system development, which is the process of modelling a portion of the real world and transforming itinto an implemented artifact to deal with information processing functions in an organization. SICs are facts about the stored data ([Oren, 1985]) and are used to capture dataintegrity semantics ([Dampney, 1988]). However, data integrity semantics do not includeall information system semantics, that is. knowledge represented in the information system. Some constraints are inherent to application programs or transactions rather thandata ([Flemming and Halle, 1989, p. 140]).Chapter 1. Introduction 6Transaction Modelling Data-oriented system modelling has long been criticizedfor focusing only on static properties of an information system. There is ongoing researchon how to add dynamics to data-oriented modelling. For example, the ACM/PCM (Active and Passive Component Modelling) uses SHM+ (the Extended Semantic HierarchyModel) to include both structural and behavioral properties ([Brodie and Ridjanovic,1984].) The popular conceptual modelling tool, the E-R model, has also been extendedto model behavioral aspects of the real world by defining transactions or events (e.g.,applying the Petri Nets technique [Sakai, 1983a], [Solvberg and Kung, 1986]). A transaction, sometimes called an application-oriented operation (e.g., “hire”) ([Casanova andFurtado, 1984]), consists of one or more database querying and/or altering primitive operations that must be treated as an atomic unit, and reflects changes and “happenings”in the real world. In transaction modelling, transaction specifications often include preconditions and post-conditions. The pre-conditions specify what must be true before thetransaction can be applied. The post-conditions specify the actions to be taken and thetest-conditions that should be true after the transaction.Transaction-driven Constraints An information system is an artifact that relieson transactions to track changes in the real world. SICs place logical restrictions onstored data and are independent of any particular transaction. Transaction-driven constraints are inherent to transactions rather than data, and assure consistency betweenthe information system and the real world. A transaction-driven constraint requires thatwhen a change or “happening” occurs in the real world, a transaction must be performed4Similarly, there are some information system development methods, e.g., the Z approach ([Spivey,1988]) and VDM (Vienna Development Method) ([Jones, 1986]) to express formal specifications of staticand dynamic aspects of information systems by modelling transactions or events.Chapter 1. Introductionto modify the database faithfully. For example, pre-conditions of a transaction may include some procedural or manual checks (e.g., issuing a message to ask whether thereare signed documents or calls from customers). They are more prone to change as anorganization evolves. They might include rules on some objects (e.g., virtual fields) inthe information system that are not modelled as stored data in the database. They arelikely to require a great deal of human checking since they may involve non-data objects(e.g., the above mentioned signed documents).Note that because a transaction consists of primitive operations, it must also conform to all of the SICs on data. It might be inefficient to enforce SIC checking when atransaction is performed. It is possible to design an algorithm to transform SICs into thepre-conditions and post-conditions of transactions. For example, Lipeck [19861 describessome general rules for these transformations based on his temporal logic language for dynamic SICs. In fact. most of the usual pre- or post-conditions of transactions discussedin the database literature are transformed SICs rather than true transaction-driven constraints.To illustrate, consider the following example. A clerk receives a telephone call from acustomer to place an order. A transaction-driven constraint stipulates that a transaction“new-order” must be performed. It may stipulate that the transaction must be performedexactly according to the memo written by the clerk or the cassette recording of the call.In addition, it may stipulate that the transaction should be performed immediately bythe clerk or in batch during the night by a computer operator. If the transaction is tobe performed, a record of the order is to be “inserted” into the database. SICs wouldthen check the attributes of an order, the existence of the customer in the database,etc. However, since SICs are intensional expressions, they do not, in general, restrictwhen an order must be put into the database or to whom the order should be shipped,chapter 1. Introduction 8etc. In this example, in practice, it would also be impossible to use a SIC to increaseautomatically the customer’s account balance by the total price of the order unless allpast orders and payments are kept in the database to allow evaluation of an invariantformula among these objects5. Sending a bill to a customer may also involve a numberof transaction-driven constraints in addition to SICs.1.4 Enforce Semantic Integrity Constraints via Application SoftwareIn traditional data modelling, database design is concerned with the structure of the dataand most integrity checking is left to the application programs (procedures). Fernandez,et al. [1981, p. 109] identify the problems of relying on application programs for integritychecking as follows.• Checking is likely to be incomplete because the application programmer may notbe aware of the semantics of the complete database;• Each application program must trust other programs that modify the database —one rogue program could corrupt the whole database;• Code to enforce the same SICs occurs in a number of programs, wasting programming effort and risking inconsistencies;5One should note that statement-i “the new customer balance is equal to its old balance plus totalprice of the order” is not an invariant assertion of a SIC for updating Customer.Balance. That statementis specific for the transaction, New_Order, to update the Customer.Balance, and does not apply to othertransactions (e.g., a new payment by the customer or an update of the Order. TotaiPrice), which alsoaccess these same data objects (i.e. Customer.Balance or Order. TotaiPrice). The invariant formula inthis example would be that Cttstomer.Balance is equal to the sum of Order. TotaiPrice minus the sumof PaymenLAmount. However, this formula will normally be inefficient to check. For efficiency reasons,we may transform this formula into pre- or post-conditions on transactions. For example, statement-iwould be a post-condition of the New-Order transaction.Chapter 1. Introduction 9• The criteria for integrity are buried in procedures and are therefore hard to understand and control;• Primitive operations (update, insert, and delete) performed by users of high-levelquery languages cannot be controlled.Fleming and Halle [1989, p. 140] state that traditional modelling methods typicallyleave the definition of SICs to application development rather than to database development. This application-driven method raises three dangers as outlined below:• A user may define the SICs too narrowly, reflecting only the needs of the immediateapplication;• Multiple users interested in different applications may have different perspectiveson SICs. Thus they may define inconsistent or even conflicting SICs as part of theapplication specifications;• Maintaining correctness and consistency across SICs implemented via multiple applications may be extremely difficult (or impossible) to accomplish as applicationsevolve and new applications arrive.In summary, Fernandez et al., and Fleming and Halle identify the disadvantages ofenforcing integrity via the application software as: incomplete, inconsistent, redundant, incorrect, hard to understand and control, and difficult to maintain.Using the database embedded SIC approach will overcome these disadvantages.Chapter 1. Introduction 101.5 Embed Semantic Integrity Constraints in a DatabaseDatabase Approach Compared to traditional information processing before the development of database concepts, the database approach is claimed to have the followingadvantages [Goldstein, 1985, p. 8]:• Controlling data duplication and inconsistency;• Facilitating the sharing of data among applications;• Assisting the coherent management of data as a basic organization resource;• Increased programmer productivity;• Increased applications’ reliability;• Enabling quick, economical response to ad hoc requests for information;• Protecting data from damage or unauthorized access;• Providing data independence.It is not surprising to find that the disadvantages of enforcing integrity via the application software criticized by Fernandez et al., and Fleming and Halle are similar to thedisadvantages to distributing stored data to separate files instead of a database. The fulladvantages of the “true database approach” are still not achieved if only the corporatewide “data” are included in the database, but the corporate-wide SICs are distributedto separate application programs.Chapter 1. Introduction 11Behavioral Modelling As stated in the previous section, research has been ongoingto enhance data-oriented system modelling by incorporating some behavioral modellingmethods to model the state transition and dynamic properties of an information system. Some researchers claim that transaction modelling is part of database design (e.g.,[Brodie, 1986], [Brodie and Manola, 1989]). Some researchers even propose using transaction specifications to replace SIC specifications (e.g., [Abiteboul and Vianu, 1985],[Lipeck, 1986]). However, even if we agree that a “complete” database design includestransaction modelling, there would be some disadvantages of distributing SICs to separate transactions (e.g., redundancy and maintenance problems) that are similar to thecase of distributing SICs to separate application programs. Both dynamic and staticSICs are logical restrictions on data and should be embedded in a database.In summary, this research assumes that transaction modelling is still valuable, butSIC specifications are fundamental to behavioral modelling. The position of this researchis similar to the idea of having both specifications in the BASIS approach ([Leveson et al.,1983]), the idea of hierarchical levels of database specifications proposed by de Castilho,et al. [1982] and Casanova and Furtado [1984], and the hierarchical specification layersused in the CADDY design environment ([Hohenstein and Hülsmann, 1991]). That is, a“complete” database design will produce the two levels of specifications as follows.1. First-level specifications form the database schema that consists of structures andSICs. These are data-driven and tend to be more fundamental (stable) since theyare not affected by the addition or deletion of transactions;2. Second-level specifications consist of the specifications of transactions. The preconditions and post-conditions provide an effective way of implementing SICs andmay contain some transaction-driven constraints.Chapter 1. Introduction 12Knowledge-based Perspective Databases have been criticized for lacking abstractknowledge6 ([Wong and Mylopoulos, 1977], [Bubenko, 1980]), and inference capabilities([Wong and Mylopoulos, 1977], [Brodie, 1986]). Recently, the development of knowledge-based systems (KBS) provides useful insights for database research. Some researchers(e.g., Jarke and Vassiliou [1984], Missikoff and Wiederhold [1986], and Kennedy and Yeh[1990]) have recognized that a KBS can provide a DBMS with better semantic modelling,reasoning ability, improved user interface, etc. A special type of system, expert databasesystem, has been proposed to integrate a knowledge-based system (expert system) with adatabase system ([Missikoff and Wiederhold, 1986]). The approach to embedding SICs inthe database fits this new trend. SICs are abstract knowledge that can be used to providedeductive capabilities to a database, that is, make a DBMS appear more “intelligent”.An obvious example is the case where SICs are used to reduce knowledge-based queryevaluation costs ([Brodie, 1986]). For example, some queries can be answered using oniythe semantics expressed through SICs ([Chakravarthy, et al., 1987]).1.6 The Research Questions, Objectives, Methodology and ScopeRecently, a number of semantic data models (e.g., [Hull and King, 1987], [Peckham andMaryanski, 1988]) have been proposed to overcome the weaknesses of traditional datamodels (i.e., the hierarchical, network, and relational models) in modelling the semanticsof the real world. However, semantic integrity constraints have not received the attentionthey deserve. The relational model is still the most popular logical data model. Almostall research on semantic integrity constraints in the literature is confined to exploring6Bubenko [1980] defines two kinds of knowledge: (a) concrete knowledge is seen as facts, statementsconcerning individual phenomena, entities or relationships in the model of the environment; (b) abstractknowledge denotes such information which augments our interpretation of concrete information and bywhich we can draw inferences, conclusions of other facts.Chapter 1. Introduction 13the efficient enforcement (checking) methods for a few kinds of SICs in a relationaldatabase or deductive database. No suitable language exists to represent the featuresof SICs precisely: certain or uncertain; for what data object; operation-independent oroperation-dependent; conditional or unconditional; strong, soft or self-correcting; andstatic or dynamic7.Research Questions Based upon the above observation, this research addresses thefollowing questions:Is it possible that the features of SICs can be precisely represented by somelanguage when using an E-R model for conceptual modelling and a relationalmodel for logical modelling?Can we provide a model to incorporate the necessary SICs in a databaseschema during conceptual modelling?Can we help a database designer capture these SICs by providing automatedtools?Research Objectives To answer the above research questions, the following specificobjectives were established.1. To develop a precise SIC Representation model for specifying the componentsof a SIC. Since current languages are not sufficient to represent the features of a7These features are described in detail in Chapter 2. An operation-dependent SIC is the one thatmust hold for some object upon some operations, but not upon other operations. Violating a strong SICwould cause errors — reject the operation. Violating a soft SIC would only get a warning message.Chapter 1. Introduction 14SIC when the structure part of a schema is in the E-R model or in the relationalmodel, this research will first present a SIC Representation model that is extendedand modified from the model proposed by Fernandez et al. [1981], Date [1983], andBertino and Apuzzo [1984].2. To develop a comprehensive modelling tool, called the E-R-SIC model, for incorporating the necessary SICs during conceptual modelling. Since the traditionalE-R model contains only a few inherent SICs, this research proposes a comprehensive model, which is an extended E-R model, for incorporating SICs in a databaseschema.3. To propose conceptually the SIC elicitation subsystem of an automateddatabase design system for helping a database designer capture the necessarySICs. The SIC Representation model and the E-R-SIC model are complementary.The captured SICs are represented in terms of the Representation model. The SICRepresentation model is a language or representation tool to represent SICs. Itsfunction is similar to the E-R diagram that we use to represent the structural partwhen using an E-R model. Both the models are application domain-independentand are suitable for implementation as part of an automated database design system. Some problems, for example, the procedure to query the database designerto identify SICs, verification of the captured SICs for consistency and nonredundancy, reformulation and decomposition of a general SIC into operation-dependentsub-SICs, and transformation of the SICs referencing entities and relationshipsinto corresponding ones referencing relations, need to be solved in order to have aworkable automated database design system. A SIC originally obtained from thedatabase designer may be general, i.e., it may be relevant to several objects onvarious operations. Decomposing such a SIC is to rewrite it into several sub-SIGsChapter 1. Introduction 15represented in terms of the SIC Representation model. Each of them is only relevant for one object on one access operation — that is, it is operation-dependent.The purpose of decomposition is not only to let the database designer know clearlywhat precise implications of a general SIC would be, but also to reformulate theoriginal SIC into several formats that can be efficiently enforced later.Research Methodology Methodologically, this research has two components. Thefirst is model building to develop two models. The second is algorithm designing todevelop algorithms to verify, decompose, and transform SICs.Research Scope Figure 1.1 illustrates the scope of this research. This research isprimarily concerned with semantic integrity constraints in the conceptual design phaseand how they are translated into SICs in the logical design phase. In particular, itfocuses on the Entity-Relationship model and its transformation into a relational modelbecause of the popularity and wide-sprea.d adoption of these as conceptual and logicaldata modelling tools, respectively.A complete automated database design system would include a structure subsystemfor constructing the structural entities, relationships, and relations, and a SIC elicitationsubsystem for eliciting the SICs. The construction of the structure part of the schemais not the focus of this research although the SIC elicitation subsystem would need theconstructed E-R diagram as input.The proposed approach in this research is different from Codd’s ([Codd, 1979]) although the purpose of capturing more meaning may be the same. Codd proposed a newdata model RM/T to extend (indeed replace) the relational model. This research retainsSIC specificationsfor E-R model-_____________________________________-.Transaction :Design ITool : Transactions rograms ipre condition an dithte ity Maintenance Subsysten [lnterity Maintenance Subsystenpost-condition I Ispecifications L E H Data Structure Relational Data Structureof transactions B-H DBMS Relational DBMSfor E-R modepre-condition and post-conditionspecifications of transactionsI for relational modelNote that some general SICs needs to be first decomposed into precondition and predicate components before verifying them.Legend: dash lines and boxes are beyond the scope of this research.Chapter 1. Introduction 16An Automated Database Design SystemProposed SIC Elicitation SubsystemInterfacebetweentwoSubsystemsDB specificationin the extendedE-R modelStructure SubsystemStructureElicitationInterfaceE-RReqviremen,,,,,/DB DesignerHeuristicsConsistency & NonredundancyRules for DifferentSIC TypesE-R-STC Modeldoonain- independentknowledaesemantic integrityconstraintstransaction-drivenconstraintsSICspecificationsfor relationalmodelFigure 1.1: A Proposed Automated Database Design Subsystem for Eliciting SICsChapter 1. Introduction 17the traditional relational model because of its popularity, but proposes to include thenecessary SICs in addition to relation structures in a relational schema. The output ofthe proposed SIC elicitation subsystem would be SIC specifications, which are suitablefor either the E-R model or the relational model. There should be an integrity maintenance subsystem in a traditional E-R DBMS8 or relational DBMS. Some functionalrequirements (e.g., regarding SIC enforcement schedules and SIC inheritance, etc.) ofthis integrity maintenance subsystem are discussed in Chapter 4. However, in general,how this integrity maintenance subsystem would work is a future research topic beyondthe scope of this dissertation. It is also assumed that the DBMS would support inheritance mechanisms.This research does not address transaction modelling. The database designer maylater input transaction-driven constraints and SIC specifications to a transaction designtool to produce the pre-conditions and post-conditions of transactions, which would beperformed in an E-R DBMS or relational DBMS.This research neither empirically tests the “usefulness” of using database embeddedSICs versus the traditional enforcing integrity via the application software nor tests the“usefulness” of using the proposed automated database design system. The empiricalresearch is a future research topic.1.7 ContributionsThe contributions of this research are both theoretical and practical.The theoretical contributions include the following:8] is assumed that there are some (probably experimental) ]JBMSs to define and manipulate dataobjects directly in the E-R model.Chapter 1. Introduction 181. This research provides a model to represent precisely the features of a SIC.2. This research also develops a model to incorporate the necessary SICs in a databaseschema. The approach is to model dynamic as well as static SICs in the databaserather than in transactions or programs. The gap between traditional databasemodelling and application programming (transaction modelling) will be bridged bythe two models presented in this research.3. The work on reformulating and decomposing a general SIC into sub-SICs, andtransforming them from an E-R schema into a relational schema may be interestingto the computer science discipline because the current literature has explored theseproblems for only a few kinds of SICs.On the practical side, the contributions of this research include the following:1. The proposed automated database design system would help a database designernot only design the database structure but also include the necessary SICs.2. This research provides a foundation for overcoming the well-known problem ofrepresenting data integrity semantics in current relational database systems. Theresulting database would have the advantages of embedded SICs, e.g., greater consistency, deductive capabilities, etc. The SIC representation would facilitate theefficient enforcement.3. A database designer would find that modelling data integrity semantics becomeshis (her) responsibility and right. Important business rules and even heuristics canenter the database schema in an early database design phase— conceptual modelling. Applica.tion programmers could focus on information system semantics otherChapter 1. Introduction 19than data semantics, and need not worry about SIC checking in their individualprograms.4. This research provides a starting point for future empirical research to test theusefulness of using database-embedded SICs versus the traditional approach ofenforcing integrity via application software.1.8 The Dissertation OutlineThis chapter has given the definition of SICs, described the motivations, methodology,scope and contributions of this research. Chapter 2 briefly reviews work on SICs in theliterature. Chapter 3 introduces the SIC Representation model for representing SICsuniformly and precisely. Chapter 4 describes how the SIC Representation model can beapplied. Chapter 5 proposes the E-R-SIC model for incorporating SICs in a databaseschema. Chapter 6 gives an example of the use of the E-R-SIC model and discusses relatedissues and usefulness. Chapter 7 conceptually proposes a SIC elicitation subsystem tohelp the database designer use the E-R-SIC model and the SIC Representation model.Finally, Chapter 8 offers conclusions and describes how future researchers can extendthis dissertation. A series of appendices is attached to provide related materials in detailand give some examples.Chapter 2Review of Previous WorkAs briefly mentioned in Chapter 1, an automated database design system would firstelicit general SICs from a database designer, then verify and reformulate (decomposeif necessary) them in some representation language for the database structural schemarepresented in the E-R model, and finally transform them into corresponding ones in therelational model. This chapter reviews previous work on related aspects of producingSICs.The literature on SICs is rich. Most research concentrates on classifying and efficientlyenforcing SICs rather than capturing and incorporating them in a database schema. Inaddition, most research discusses SICs in the context of relational or deductive databasesrather than the databases in the conceptual level of E-R schema. Therefore, transformation from the SIC representations in the E-R model into corresponding ones in therelational model has not been explicitly discussed in the literature although some existingautomated database design systems may perform it for a few types of SICs that theyidentify. Section 2.1 first reviews the different ways to classify SICs because the classification can help understand the important SIC features so that we can incorporate,represent and enforce them properly. Section 2.2 reviews previous research on languagesfor representing SICs. Section 2.3 reviews how verification of SICs has been dealt within the literature. Section 2.4 reviews previous research on reformulating SICs. Finally,20Chapter 2. Review of Previous Work 21Section 2.5 reviews how SICs are handled by existing automated database design systems.2.1 SIC ClassificationIn the literature, there are a number of different ways to classify SICs.2.1.1 Classification Based on ExplicitnessOne way to classify SICs is based on their explicitness from the perspective of the datamodel in use. SICs can be inherent, explicit or implicit ([Tsichritzis and Lochovsky,1982], [Brodie, 1983], [Brodie, 1984], [Davis and Bollriell, 1989]). Inherent constraints areintegral parts of the structure of a data model (e.g., record relationships in the hierarchicaldata model are structured as trees; tuples in the relational data model are not duplicatedand the ordering of tuples is not important). Explicit constraints are defined explicitly bysome specification mechanism; for example, a database designer might explicitly specifythat the salary of any employee must be less than $1,000,000. Finally, implicit constraintsare logical consequences derived from inherent or explicit constraints; for example, thetransitive closure of functional dependencies can be deduced from a subset of functionaldependencies.Discussion A data model with rich inherent SICs would relieve the database designerfrom specifying many SICs explicitly. From this perspective, neither the E-R modelnor the relational model is a good modelling tool for designing a database since bothmodels, especially the relational model, contain very few inherent SICs. However, theyare widely used for other reasons. It is one of the motivations of this research to helpChapter 2. Review of Previous Work 22the database designer identify and precisely specify explicit SICs. A database designerand an automated database design system need to know the inherent SICs of the datamodel (e.g., the E-R model) that they use. Otherwise, it is likely that these inherentSICs would be lost when the structural schema of the conceptual model is transformedinto the logical model (e.g., the relational model). Both inherent and explicit SICs shouldbe represented properly at the conceptual design phase and transformed into a logicalschema. Given a set of inherent and explicit SICs, the database designer should alsobe aware of the derived consequences, i.e., implicit SICs, to verify its consistency andnon-redundancy.2.1.2 Classification Based on Applied ObjectsClassification of SICs based on applied objects is the most common classification andis concerned with the data objects to which a SIC applies ([Eswarn and Chamberlin,1975], [Fong and Kimbleton, 1980], [Fernandez et al., 1981], [Tsichritzis and Lochovsky,1982], [Date, 1983], [Weber, et al., 1983], [Simon and Valduriez, 1984]). There are threecategories of these SICs:1. Strong data type constraints: Strong data type constraints (also called domainconstraints) are applied to a single data item, for example, a field or an attribute.They include the following:(a) Value constraints specify the range of acceptable values of a data item (e.g.,within some numerical bounds or an enumerated set) and whether a null valueis allowed.Chapter 2. Review of Previous Work 23(b) Norivolatility constraints declare whether a data item value can be changed([Kozaczynski and Lilien, 1988]).(c) Extended format constraints permit specifications of data type, length, andformat (mask) pattern;(d) Legal operation constraints limit the operations that can be performed on agiven domain ([Eswarn and Chamberlin, 1975], [Fong and Kimbleton, 1980]);for example, two dates cannot be multiplied.2. Record (tuple) constraints: Record constraints apply to an individual record (anoccurrence in terms of the E-R model). For example, in each payroll record,Gross_Salary should be greater than Deductions. One interesting kind of recordconstraint is the case where the value of one field in a record is conditional onthe values of other fields ([Benci et al., 1976]). For example, Salary is equal toBase_Salary plus Bonus. In this example, regardless of whether the value of Salaryis entered by a user or computed by the system itself, as long as this value is explicitly stored, the integrity constraint must hold. If the conditional attribute isnot explicitly stored, it is a virtual invariant, derived item because one data itemis defined as a function of another ([Etzion, 1989]).3. Set constraints: Set constraints apply to a set of records (occurrences). They may hebased on built-in aggregate functions (e.g., average, minimum, count). Therefore,they are sometimes called aggregate constraints ([Tsichritzis and Lochovsky, 1982])or set function constraints ([Dogac et al., 1985]). They may be based on comparison(e.g., exclusion or inclusion) of one set to another. For example, the set of managersmust be contained in the set of employees. These records (occurrences) may belongto the same relation (entity) type or different relation (entity) types. From thisChapter 2. Review of Previous Work 24observation, some researchers (e.g., [Fong and Kimbleton, 1980]) further classifythem into relation constraints and multi-relation constraints.Discussion This categorization reminds us that all objects in a database— attributes,entity and relationship occurrences and types in the E-R model, columns, relation tuplesand types in the relational model may have related SICs that need to be identifiedand represented. However, there are two controversial points.• Should we capture legal operation constraints? Note that this kind of “constraint”borrows the notion of abstract data types ([Goldstein, 1985]), and the “operations”are basic arithmetic or string operations that may be performed on value domains,not the database primitive manipulation operations. This research will not considerthese “constraints” because they do not conform to the SIC definition. This kindof “constraint” is relevant only if the data object is manipulated by some restricted“operations” (e.g., arithmetic or date operations). It is not a restriction on staticdatabase states or state transitions.• Should we capture extended format constraints? One may argue that the specification of data type, length, and format of an attribute is a syntactic rather thana semantic issue. However, this research includes extended format constraints aspart of SICs for the following two reasons. (1) The semantics of an attribute arebased on the syntax we agree to use. For example, the meaning of a salary rangefrom $1,000 to $50,000 in integers is different from that of the same range of realnumbers. (2) SICs in this research are attached to the stored data rather than realworld objects. A database designer would specify a SIC in terms of the interests ofthe application and include the data in the format that is meaningful to it.Chapter 2. Review of Previous Work 252.1.3 Classification Based on Operation TypeSICs that are concerned with database operations can be classified as operation-dependentor operation-independent constraints ([Eswarn and Chamberlin, 1975], [Fernandez et al.,1981], [Weber, et al., 1983]).• operation-independent: A constraint is operation-independent if it must hold forsome object on all access operations, although for efficiency reason it may beenforced only on some operations.• operation-dependent: A constraint is operation-dependent if it must hold for someobject on some access operations, hut not on other access operations.Discussion We only need to consider three kinds of operation types: insertion, deletionand update because a retrieval (query) operation does not change any data. An updateoperation is on an attribute of an entity, relationship occurrence, or relation tuple, butnot on the whole entity, relationship occurrence, or relation tuple. It is also the onlyapplicable operation on an attribute. Therefore, if a SIC is relevant to an attribute, it isoperation-dependent. For an entity, relationship occurrence, or relation tuple, there aretwo possible operations insertion and deletion. It appears that in the real world if a SICapplies to an entity, relationship occurrence, or relation tuple, it is likely to be operationindependent. Operation-dependent constraints (e.g., “a project can be deleted only if itsbudget is equal to zero”) seem to be relatively less common. This may explain whyprevious research pays only little attention to them. However, the possible relevance tospecific operations creates a special requirement on the SIC representation language.Chapter 2. Review of Previous Work 262.1.4 Classification Based on PreconditionAccording to Fernandez et al. [1981], SICs based on the precondition for enforcement canbe classified as either conditional or unconditional. Conditional constraints are enforcedonly when certain preconditions are met. For example, the salary of an employee whohas worked for less than three years must not be over $30,000. Unconditional constraintsare always enforced.Discussion Whether a SIC is conditional or unconditional is also relative to the object concerned. It seems that the research of Fernandez et al. [1981] is the only workthat classifies SICs in this way. However, this categorization perspective places anotherrequirement on the SIC representation language— the context under which the SIC isapplicable should precisely be represented.2.1.5 Classification Based on Certainties of SICsSICs can be classified as certain or uncertain ([Oren, 1985], [Morgenstern et al., 1989]).A certain constraint specifies some fact about the data semantics that is assumed to beabsolutely true (e.g., “the height of a person is greater than zero” [Oren, 1985]). Anuncertain constraint is one that is generally true, but there is a slight probability that itcan be violated (e.g., “the weight of a person is less than 200 kgs”). Certain and uncertainconstraints lie on a continuum. Wiederhold [Morgenstern et al., 1989] identifies four levelswith respect to the degree of certainty and absoluteness of constraints:1. absolute truths that arise in the physical world and in the database; for example,an employee has a birth date.Chapter 2. Review of Previous Work 272. rigid situational rules that do not change often; for example, each employee isassigned to one department.3. business rules that may change often; for example, a manager’s salary is greaterthan the salary of his or her subordinates.4. heuristics; for example, employees are usually assigned to projects that match theirspecialities.A more sophisticated proposal such as defining and maintaining “measure(s) of accuracy”may be possible ([Eswarn and Chamberlin, 1975]). Violation of a certain constraint maybe rejected as an error, whereas violation of an uncertain constraint may only cause adiagnostic message. Therefore, uncertain constraints are sometimes called soft constraints([Eswarn and Chamberlin, 1975]).Discussion This classification is another perspective that has not drawn much attention. Since it is likely that an organization might have a number of uncertain but usefulconstraints, there should be some way to represent them.Note that the four levels of certainty mentioned by Wiederhold ([Morgenstern et al.1989]) are in fact classified by two kinds of “uncertainty” exception and permanence. “Heuristics” may have exceptions. “Absolute truths”, “rigid situational rules”,and “business rules” have no exceptions. If there is any exception, the “rules” shouldbe modified to accommodate the exception and the “modified rules” have no exceptions.These levels differ on their permanence the “uncertainty” of how often they may bechanged.Should SIC specifications include both “uncertainty” of exception and “uncertainty”Chapter 2. Review of Previous Work 28of permanence? The “permanence” information may he useful for SIC managementduring database usage. However, the organizational environment is turbulent and constantly changing. The permanence of a rule is not easily decided in advance when adatabase designer designs a database. Therefore, this research does not include this kindof information.2.1.6 Classification Based on Violation ActionSICs based on the violation action, i.e., what would happen if the SIC is to be violated,can be classified as strong, soft, or self-correcting ([Weber et al., 1983]). If a strongconstraint is violated, the operation is rejected and the user receives an error message.If a soft constraint is violated, the user only receives a warning. If a self-correctingconstraint is violated, its error correcting action is executed.Discussion In the published literature, [Weber et al., 1983] is the only research thatclassifies SICs according to their violation actions although there is other research thatdiscusses alternative violation actions. Violation actions are classified into the three categories given below, which are synthesized from previous research ([Hammer and McLeod,1975; 1976], [Hammer and McLeod, 1976], [Casanova and Tucherman, 1988] and [Flemingand Halle, 1989]).1. Reject— reject the requested database operation, signalling an error.2. Warning — allow the requested database operation, but issue a warning.3. Corrective Action — perform a corrective action; that is, an auxiliary procedureknown as a triggered action ([Fernandez et al., 1981]). Usually, it causes the DBMSChapter 2. Review of Previous Work 29to insert, delete, or update other objects.For referential integrity constraints, e.g., E.A C F.B, where E.A is a foreign key ofa relation E and F.B is the primary key of another relation F, the corrective actioncan be further classified as:(a) Propagate for example, insert a referenced tuple or delete the referencingtuples.(b) Nullify for example, set the referencing attributes in the referencing tupleto null.(c) Default for example, set the referencing attributes in the referencing tupleto predefined default values.(d) Others — for example, triggers or other procedures that are domain specific.Note that “nullify” and “default” are special cases of “propagate”.Traditionally, a SIC is strong by default. However, this classification is important toremind a database designer that it is not necessary that if a SIC is violated, the operationis just rejected. There are other options that can be specified at database design time.2.1.7 Classification Based on Enforcement ScheduleSICs based on the enforcement time, i.e., when the SIC is enforced, can be classifiedas immediate or deferred (e.g., [Eswarn and Chamberlin, 1975], [Fong and Kimbleton,1980], [Fernandez et al., 1981], [Date, 1983], [Weber, et al., 1983]). Immediate constraintsare enforced immediately after each database operation. Deferred constraints are notenforced until the end of a transaction. It may also be possible to permit the user toChapter 2. Review of Previous Work 30switch integrity checking ON or OFF that is, to have user-invocable constraints ([Fougand Kimbleton, 1980], [Bertino and Apuzzo, 1984], [Weber. et al., 1983]).Discussion A number of researchers discuss this categorization. However, differentspecification levels should not be mixed up. This categorization is useful when consideringthe enforcement of SICs, or designing transaction specifications. Fernandez et al. [1981]allow the precondition component of their original model to specify whether the SICis to be applied immediately or deferred to the end of a transaction or to a periodicaudit. However, Bertino and Apuzzo [1984] treat the enforcement schedule separately asdecided by an integrity maintenance subsystem because they believe that the enforcementschedule of a SIC depends upon the transactions or programs that are executing. Theirexample may be helpful to understand their arguments:A SIC: “each employee working on project P125 must earn less than $3,000.”Suppose that now the database contains an employee entity occurrence whois in department D55, and works on project P125.Consider a transaction: first update the salary of all employees in D55 to$3, 500; then re-assign the project of all employees in D55 to P200.This transaction should be correct.Based upon the above observation, Bertino and Apuzzo [1984] propose a criterion:Basically, an integrity constraint is enforced at the end of the transaction, ifattributes present in the constraint are modified by more than one update statementChapter 2. Review of Previous Work 31in the transaction. Otherwise, the constraint is enforced after each tupleupdate, if it is in class Cl [i.e., tuple constraints], or after each update-request, if it is in class C2 [i.e., relation constraints] or C3 [i.e., multi-relationconstraints].Furthermore, the enforcement schedule can be more sophisticated and more efficient.Lafue [1982] proposes that constraint checking can be delayed until the dependent instances become of interest9. That strategy is called by Morgenstern [1986] “propagationwhen used.” Between immediate propagation and “propagation when used” is an intermediate strategy that has been referred to by Morgenstern [1986] as opportunisticpropagation, both in the sense of doing the work of constraint propagation when thecomputer is idle, and in the sense of using priority ranking of the constraints to selectthe order in which they should be considered for propagation.Thus, because the enforcement schedule is closely related to transaction modellingand enforcement implementation efficiency strategies, it should not be included in SICspecifications. Neither should be the option to switch integrity checking ON or OFF.2.1.8 Classification Based on DynamicsSICs based on their dynamics can be classified as static or dynamic ( [Eswarn and Chamberlin, 1975], [Bracchi et al., 1979], [Fong and Kimbleton, 1980], [Fernandez et al., 1981],[Date, 1983], [Bertino and Apuzzo, 1984], [Brady and Dampney, 1984], [Heuser andRichter, 1986]). Static constraints specify correct database states. Transitional (dynamic)constraints characterize valid state transitions, i.e., are concerned with ‘admissibi1ity” of°A dependent variable is a variable that can be operated on by the constraint, i.e., by a violationaction.Chapter 2. Review of Previous Work 32a database state sequence.There are two major kinds of transitional constraints mentioned in the literature([Fong and Kimbleton, 1980]): (1) old/new transitional constraints that restrict an updateof an attribute during which its “old” value is to be changed to a “new” value (e.g.,“new salary must be greater than old salary”); (2) nonexistence/existence transitionalconstraints that restrict either a nonexistence to existence transition or an existence tononexistence transition (e.g., “only if the account balance is zero, can the account bedeleted”).Some dynamic constraints (e.g., [de Castilho et al., 1982], [Casanova and Furtado,1984], [Ehrich et al., 1984], [Kung, 1984] and [Lipeck, 1986]), which are often neglectedby researchers, illclude:• constraints on a sequence of operations: some operations must happen in aspecific sequence or at the same time. For example, “ownership of a car must bepassed from a manufacturer to a dealer first, before it may be passed to a purchaser”.• constraints involving time explicitly:1. SICs having some explicit time restriction or time-triggering condition. Forexample, we may have “an employee cannot receive a salary raise during his(her) first 6 months in the company”, or “at 0:00 on 1/1/1993, increase thesalary of each employee by $1,000.,’2. Time-triggered or restricted SICs depending on past generations of data and,thus reference historical data at some specific time point. For example, we mayhave “the price of any product at any time cannot be more than 5% higher thanits price one year ago.”Chapter 2. Review of Previous Work 33Discussion Both static and dynamic SICs are important to preserve the logical meaning of stored data. A database designer should capture the necessary dynamic SICsand have some language to represent them. Constraints on a sequence of operations arenot easily captured and represented as SICs. This may explain why researchers ofteneither neglect them or model them by transactions or events. However, they are theconsequences of enforcing some SICs if specified properly.Note that an operation-dependent SIC is also a dynamic SIC. Old/new transitionalconstraints and nonexistence/existence transitional constraints are important. However,one should not define them too narrowly. That is, old/new transitional constraints arespecial cases of the update transitional constraints, which do not necessarily involvethe old/new comparison. Nonexistence/existence transitional constraints are also specialcases of the deletion/insertion transitional constraints, which may involve more than oneobject.In order to capture and represent SICs having explicit time restriction, in a real timeenvironment, we would need a special system variable Current_time to register thecurrent clock time’0, and explicit time-valued attributes in the entity or relationship. Ina non-real time environment, those become ordinary data-driven constraints involvingsome time-valued attributes. If we wish to capture and represent SICs using historicaldata in general, we would need time-stamped generations of data. This research doesriot explore the last kind of SIC since a database keeping time-stamped generations ofdata is both unusual and expensive to implement.10t is assumed that a DBMS has access to a clock that registers both current date and time. Current..time can be thought of as an attribute of a special system entity type. Because constraints maybe affected by the normal advance of time, we assume that the operating system can be instructed tosignal the DBMS integrity maintenance subsystem when a pre-specified time is reached.Chapter 2. Review of Previous Work 342.1.9 Summary of Comments on SIC ClassificationIn the literature, researchers describe different SIC categorization schemes for diversepurposes — discussing data models, incorporating, representing, and enforcing SICs.For incorporating and representing SICs adequately, the most important categorization schemes are: (1) certain or uncertain, (2) the classification on applied objects,(3) operation-independent or operation-dependent, (4) unconditional or conditional, (5)strong, soft or self-correcting, and (6) static or dynamic . These categorizations are toorudimentary to serve as a modelling tool for incorporating SICs. However, they are veryimportant for precisely representing SICs since they provide a feature listing of a SIC.Without any explicit description, one could only assume the given SIC to be certain,related to all objects mentioned in that SIC, operation-independent, unconditional, andstrong.2.2 SIC RepresentationSQL is a generally accepted relational database language. However, only very few SICsare mentioned in the ISO SQL standard. The implied violation action of a SIC is to setSQLCODE negative. That is, it causes an error. Other SIC features, such as whetherthe SIC is operation-independent or operation-dependent, unconditional or conditional,certain or uncertain, are not specified.The SQL standard has two levels and one addendum [van der Lans, 1989]. Thespecified SICs are as below:1. SQL Level 1 Standard: It only specifies data types and length of data items.NOT NULL must be specified in every column definition in a CREATE TABLEChapter 2. Review of Previous Work 35statement.2. SQL Level 2 Standard: In addition to data types, it allows the designer tospecify whether a data item is UNIQUE and whether it can be NULL.3. SQL with Addendum:• In addition to the above, it allows specification of a value range by using theCHECK specification for a data item;• By using the CHECK specification separately (e.g., CHECK (YEAR-OF-BIRTH < YEAR-JOINED)), tuple constraints can be specified;• Few set constraints have been included:— Referential constraint is provided by using FOREIGN KEY (column list)REFERENCES (table name);— UNIQUE or PRIMARY KEY specificationA number of researchers propose alternative languages to represent more SICs. Someare variants of first order logic (FOL) languages, for example, the many-sorted first-orderpredicate calculus applied by Furtado et al. [1981]; the constraint equation proposedby Morgenstern [1983; 1984a; 1984b; 1986]; the equation statements suggested by Cosmadakis and Kanellakis [1985]; the first order formulas used by Reiter [1984; 1988],Henschen et al. [1984], and Urban and Delcambre [1989]. These FOL family languagesare purely declarative. The operations to be checked are treated as an implementationproblem. No violation action is specified. The constra.ints are assumed to be certain.Furthermore, these FOL languages cannot represent some kinds of SICs, e.g., operationdependent or dynamic SICs. Other researchers (e.g., [de Castilho et al., 1982], [Casanovaand Furtado, 1984], [Ehrich et al., 1984], [Kung, 1984] and [Lipeck, 1986]) propose anChapter 2. Review of Previous Work 36extension of FOL — temporal logic that may include explicit “state” or “time” parameters. They invent some special temporal quantification or a list of modalities (e.g.,always, until, heretofore, sometime) to model dynamic SICs. One potential problem ofthese temporal logic languages is that they may not be easily understood and implemented.Others just extend the original SQL proposals, but do not follow the SQL standard.For example, Hammer and McLeod [1975] state that the syntax of their language is“rather similar” to SEQUEL. Bertino and Apuzzo [1984] propose a 5-component modelto specify SICs and state that their language is a “simple extension” to SQL. Date[1983] applies a language of his own — very loosely based on the PL/I version of UDL(Unified Database Language) and describes [1987] some proposed extensions of the baseSQL standard. These SQL extensions are more precise and powerful for representingSIC features than the SQL standard. Among these, the language model proposed byBertino and Apuzzo [1984] and the similar one by [Fernandez et al., 1981]” may becapable of representing the SIC features mentioned above except for certainties. However,neither work carefully elaborates what would be in each component12.In addition, SICsrepresented in their original model may not be efficiently enforced if we compare therepresentation to Date’s UDL [Date, 1983] that uses the idea of a cursor to facilitate SICenforcement on occurrences.The certainty feature of SICs has not been properly represented. The term fuzzyintegrity constraints has appeared in the fuzzy database literature (e.g., [Zvieli and Chen,“However, [Fernandez, et al., 1981] does not describe what is really their language — first order logicor SQL.‘2For example, The model proposed by Fernandez, et al., [1981] allows to specify the enforcementschedule as a part of precondition of a SIC. The model proposed by Bertino and Apuzzo [1984] allowsthe options to switch a SIC ON/OFF. However, as discussed in the previous section, these should notbe in a SIC specification.Chapter 2. Review of Previous Work 371986], [Raju and Majumdar, 1988]). However, its meaning may have not been fullyexplored. Raju and Majumdar [1988] classify fuzzy relations in relational databasesinto two categories. A type-i fuzzy relation captures the impreciseness in the association among entities, e.g., the certainty of John liking the course AT is 80%. Atype-2 fuzzy relation produces further fuzzy semantics by allowing the domain of anattribute to be a set of fuzzy sets, e.g., “allowing salary of John to be in the range$40, 000 to $50, 000 and that of Mary to be a fuzzy set, However, their fuzzy integrity constraints only attach some fuzzy modifiers, e.g., “very”, “more or less” to anassertion by choosing some “fuzzy resemblance relation”, for example, “for any job, employees having approximately equal experience should have approximately equal salaril’ or“any items having approximately equal order-date should have more or less equal deliverydate”. This kind of SIC does not capture the “impreciseness” of a SIC itself. For example, “any items having approximately equal order-date should have more or less equaldelivery date” is a SIC that is true with certainty 75%. RESTRICT is another languageto consider the certainties of SICs ([Oren, 1985]). However, it simply uses a special operator “?“ to represent an uncertain assertion, e.g., “SALARY(PERSON) <? 150000”.This approach has only two levels of certainty certain and uncertain.In addition, all of the languages mentioned above represent SICs in the relationalmodel, not in the E-R model. It is desirable to have a uniform language to representall SICs in the E-R model and in the relational model. It would be easier for systemanalysts, database designers, and programmers to learn, comprehend and communicatewith each other if there is a uniform language for both the conceptual modelling andlogical modelling phases.Chapter 2. Review of Previous Work 382.3 SIC VerificationIt is suggested ([Bracchi et al., 1979] and [Morgenstern et al., 1989]) that it is necessaryto verify a set of SICs based on some criteria: completeness, correctness, consistency,nonredundancy, no unexpected implicit constraints, and insensitivity to order.In the database literature, no substantial results have been published that assure thata set of SICs is complete and correct, which would need both application-domain andgeneral world knowledge.If the SIC representation is not precise enough to capture the restriction intention,the enforcement outcome of a set of SICs may depend upon the order in which particularconstraints are executed. Other remaining criteria — consistency, nonredundancy and nounexpected implicit constraints— are closely related although their verification difficultymay be increasing. Redundancy occurs in a set of constraints if some constraints subsumeother constraints. It would be an issue only if we are concerned with efficient enforcementof a set of constraints. To assure that a set of SICs has no unexpected implicit constraintswould require a database designer to understand well the closure of the set, i.e., all thoseconsequences derived from the specified SICs. It requires that the database designermake judgements as to whether an implicit constraint is unwanted. The fundamentaland most important issue is the consistency problem. Constraints are consistent if thereexists a database state or a state transition that is allowable with regard to all of therestrictions. Unfortunately, the consistency problem has been described (e.g., [Meersman,1988]) as a difficult one. SICs are either tacitly assumed to be consistent, or only someof them are verified for consistency. For example, Kung [1984] [1985] presents a tableauxmethod to check the consistency of restricted first-order SICs. Bry and Manthey [1986]describe two basic approaches to extending refutation methods into procedures to checkChapter 2. Review of Previous Work 39consistency of closed and function-free first-order SICs. Brodie [1978] mainly relies onan actual database to verify the consistency of some static SICs. Lenzerini and Nobili[1987] have contributed much to the consistency problem for cardinality constraints. Insummary, past researchers have oniy tried to verify very few types of SICs for consistency.Furthermore, the redundancy problem of SICs has never been directly addressed in theliterature. Nor has the issue of unexpected implicit constraints been explored for arbitrarySICs.The major reason for this is that it has been known (e.g., [Nillson, 1980]) that eventhe “logical implication” of first order logic predicates is “undecidable” or only “semi-decidable” 13 In this research, in order to cover more types of SICs, expressions in theprecondition and predicate of a SIC could include not only first order, but also higherorder logic. The full verification of consistency and non-redundancy of all SICs relatesto the fundamental issue of computer science and mathematics— constructing a Turingmachine to decide whether an arbitrary language is acceptable. Such an issue would beNP-complete in nature (i.e., putatively having exponential time complexity)14Based on current techniques, we cannot verify a set of arbitrary SICs for consistencyand nonredundancy. However, it is possible to classify SICs into several different typesand analyze at least some of them for consistency and non-redundancy. Some researchershave tried it, but only for very few types of SICs and produced relatively rudimentaryresults (e.g., Troyer [1989] verified five types of SICs for consistency in the Binary Relationship model, Furtado et al., [1988] verified two types of SICs for consistency in the‘3A property is semi-decidable if algorithms can be constructed that are guaranteed to report therespective property after finite (but indefinite) time if applied to a set that actually has this property,but possibly run forever otherwise.14t is shown ([Aho, et al., 1974]), [Papadimitriou and Steiglitz, 1982]) that even the “satisfiabilityproblem” of Boolean formulas—whether a Boolean formula can be made true by some truth assignmentto its variables is NP-Complete.chapter 2. Review of Previous Work 40E-R model).2.4 SIC Reformulation and DecompositionThe reformulation and decomposition of a general SIC into several sub-SICs is related toprevious work on efficiently checking SICs, an important research topic in the databaseliterature. A naive approach is to perform the modification and then check whether thenew database state satisfies all SICs. Such an approach is called full integrity constraintchecking ([Ling, 1986]) or total integrity checking ([Nakano, 1983]).Full SIC checking is time-consuming. Researchers have proposed a number of moreefficient SIC checking techniques or algorithms. For example, Stonebraker [1975] considers monitoring immediate static SICs and elementary database updates by applyingthe “query modification” technique. Niacolas [1982] and Kobayashi [1984], respectively,present a simplification algorithm that transforms static SICs into simplified forms. Hsuand Imielinski [1985] provide a simplification method for transactions, and also for SICsexpressed in the prenex normal form of relational tuple calculus. Ling and Rajagopalan[1984] propose a method for eliminating avoidable checking of integrity constraints expressed in first order predicate calculus. Ceri and Widom [1990] present a labelling algorithm to derive automatically the set of operations that may cause constraint violationfor any given SIC expressed in a SQL-based language.Basically, the above efficient algorithms are derived from the syntactic structure of aSIC specification. Some researchers (e.g., [Qian and Wiederhold, 1986], [Qian and Smith,1987]) go further to propose transformational mechanisms that exploit knowledge aboutthe application domain and even database physical organization structure to reformulateSICs into semantically equivalent, but more efficient ones. Bernstein et al. [1980] hasChapter 2. Review of Previous Work 41proposed improving SIC checking efficiency by ma.intainillg some redundant data (e.g.,minima and maxima of certain sets).All the above work is based upon the assumption that the current database statesatisfies all SICs that have been specified. This type of checking is called incrementalintegrity constraint checking ([Nakano, 1983], [Ling, 1986]).The purpose of previous work on SIC decomposition is mainly to make SIC checkingmore efficient. Some of them only attach a set of operations, which may cause constraintviolation, to a SIC. Others rewrite the original SIC into a set of sub-SICs that is semantically equivalent to the original SIC. However, one should note that when a general SIC isdecomposed into sub-SICs, it is possible that the sub-SICs might have different violationactions. Thus, these sub-SICs are also part of database specifications rather than justfor implementation efficiency.In addition, the previous algorithms are only suitable for few types of SICs in somerestricted languages.2.5 Automated Database Design Aids for Eliciting SICsYang and Goldstein [1989] survey twenty automated database design systems to investigate how SICs have been included. Many systems (e.g., 12S [Kawaguchi et al., 1986];SA-ER [Carsnell and Navathe, 1987]; ACME [Kerstern et al., 1987]) do not attemptto identify SICs at all. Some systems (e.g., SECSI [Bouzeghoub and Gardarin, 1984;Bouzeghoub et al., 1985; Bouzeghoub and Metais, 1986]; RIDL* [Troyer, 1989]; E2R[Kozaczynski and Lilien, 1988]; CHRIS [Furtado et al., 1988]; Gambit [Bragger et al.,1984]; EXIS [Yasdi and Ziarko, 1987]; TSER [Hsu et al., 1988]; PROEX [Obretenov etChapter 2. Review of Previous Work 42al., 1988]; Modeller [Tauzovich, 1989]; EDDS [Choobineh, 1985 Choobineh et al., 1988];OICSI [Rolland and Proix, 1986; Cauvet et al., 1987; Proix and Rolland, 1988]; andDatabase Generation Tool [Maryanski et al., 1984; Maryanski and Hong, 1985]) do provide some mechanisms for eliciting and representing SICs. However, only a few typesof SICs are identified. Most of them are the common SICs, e.g., incidence constraints,totality constraints, or inherent to data abstractions. No system provides a model toguide the incorporation of SICs.The SIC representation of some systems is as arcs in a semantic network (e.g., OICSI)in logic (e.g., Gambit), or rules (e.g., CHRIS, EDDS). Some SICs are even represented aspreconditions/post-conditions of transactions or events rather than data SICs (e.g., E2R,EXIS). Some SIC features mentioned above are not explicitly represented. Gambit maybe the one that represents them most explicitly. However, in Gambit, the responsibilityfor knowing the operation type on which a SIC must be checked rests with the databasedesigner.The discussions of SIC verification reported for these systems are either nonexistentor tend to be somewhat rudimentary.Chapter 3A Model for Representing Semantic Integrity ConstraintsThis chapter presents a language, called the SIC Representation model, to representthe features of a SIC that are mentioned in Section 2.1.9.3.1 An Overview of the Representation ModelThe SIC Representation model integrates and formalizes the ideas of Fernandez et al.[1981], Bertino and Apuzzo [1984], and Date [1983]. This model specifies precisely features of a SIC. The original model proposed by Fernandez et al., and Bertino and Apuzzohas been extended to include a measure of certainty, and some of their original conceptshave been modified. The cursor concept in UDL proposed by Date has also been incorporated into the model.The model represents a constraint in terms of six components: Object (0), Operation Type (T), Precondition (C), Predicate (P), Certainty Factor (F) andViolation Action (A). In addition, each constraint is given a descriptive name. Thejustification of including these six components is provided in Chapter 4. This chapteronly gives the description of the model.Using this model, a SIC is represented as:43Chapter 3. A Model for Representing Semantic Integrity Constraints 44SIC-NameCERTAINTY F (Certainty Factor)FOR 0 (Object)ON T (Operation Type)IF C (Precondition)ASSERT P (Predicate)ELSE A (Violation Action)In terms of the usual production rule syntax, the above whole statement (except forSIC-Name) is interpreted as:with certainty FIF(O,T,C) THEN(IF NOT P THEN A)SIC-Name is used as an identifier that conveys some meaning of a SIC. It is not essentialto the Representation model.The following example, named as project_employeeminimum_salary for convenience,will be used for illustration. Suppose that there is an Employee entity, and a Work_forrelationship in the E-R model and the corresponding Employee, and Work_for relationsin the relational model’5. A SIC might state “if an employee works for any project, his(her) salary should be greater than $10,000’. This SIC can be decomposed into severalsub-SICs represented in the Representation model. Each sub-SIC is operation-dependentand only relevant to a single object. In addition to the related sub-SICs on the updateof Work_for’s primary key, two sub-SICs are:‘5That is, a separate relation, Work_for, is assumed for simplifying the comparison of our model toother work in the literature since they only deal with the relational model.Chapter 3. A Model for Representing Semantic Integrity Constraints 45— one for Employee.Salary on update;— one for Work_for on insertionThe first one may be represented as the following.Employee. Salary- U-RshipDepEnt Val- (Workfor)CERTAINTY certainFOR Employee.SalaryON updateIF 3 Work_for, rship_occ_part(Work_for, “Employee”, Employee)ASSERT Employee.Salary> 10000ELSE rejectInterpretation: The first line is the SIC name. It indicates that this is a SICfor Employee.Salary on update and it is a “RshipDepEntVal” type because itasserts that the existence of a relationship (Work_for) depends on the value ofan entity attribute (Employee.Salary). The rship_occ_part is an assertion thatstands for “relationship occurrence participant”. In this example, it is usedto specify the Work_for occurrence in which the currently checked Employeeoccurrence participates with the entity type, “Employee”. This SIC statesthat with 100% certainty, when an Employee.Salary occurrence is to be updated, if the Employee participates in at least one Work_for, his or her Salarymust be greater than $10,000. Otherwise, the update operation is rejected.Note that the database designer might choose “propagate(delete(Work_for))”as the violation action. In that case, the propagation action might imply thatif the organization could not afford the minimum salary for an employee, itChapter 3. A Model for Representing Semantic Integrity Constraints 46could not require that the employee be associated with any project (so thecurrent Work_for occurrence must also be deleted).The restriction intention of a SIC represented by this model is expressed as the predicate component (P) separately. Its other components precisely specify the SIC featureslisted in Section 2.1.9. These are:1. certain or uncertain: indicated by the component F — certainty factor;2. applied data: shown in the component 0 — object;3. operation-dependent: a SIC represented in this model is always operation-dependent,the operation is specified in the component T — operation Type.4. unconditional or conditional: the conditions are listed in the component Cprecondition;5. strong, soft or self-correcting: indicated by the component A violation action.A SIC represented in this model is always dynamic in that it indicates a valid databasestate transition in which the object would be manipulated by the specified operation type.A static constraint is represented by rewriting it into one or more dynamic constraints forthe related object(s) on the operation(s) that could cause unallowed database state(s).The reasons for doing this are as follows:• Because some constraints are inherently dynamic, this allows uniform representations for all kinds of constraints.• Enforcement efficiency can be greatly increased by knowing which operations cancause constraint violations.Chapter 3. A Model for Representing Semantic Integrity Constraints 47• Violation actions may be different depending on the operation types causing constraint violations.The usual approaches to representing a SIC mix up several components in one statement and do not explicitly describe the above features. Taking a SIC represented in atraditional language, one could only assume that it is 100Yo certain; related to all objectsmentioned in that SIC; applicable to all primitive database access operations; unconditional in all contexts; and so strong that it would cause errors if violated. For instance,according to the language used by Ling and Rajagopalan [1984], the above example wouldbe expressed in the prenex normal form of the first order predicates as follows.E ranges over EmployeeJ ranges over WorLforV E V J (E.EmpNo J.EmpNo OR E.Salary> 10000)Evaluation:This statement is neither precise nor powerful because of the following:(1) It is not clear how certain this SIC would be. So, it can only be assumedas 100%.(2) Since E and J appear there and no operation type is explicitly specified,it would seem that the SIC is relevant to any operation on an Employeeor a Work...for. However, in fact, it could never be violated by deletionor insertion of an Employee, or deletion of a Work_for.(3) It does not clearly distinguish between the condition specifying thoseemployees to whom the constraint would apply (those working for someproject(s), i.e., the condition “E.EmpNo = J.EmpNo”) and the assertionChapter 3. A Model for Representing Semantic Integrity Constraints 48on the salary of those employees (i.e., the expression “E.Salary> 10000”).(4) Since no violation action is specified, it can only be assumed as “reject”.The representation in the SIC Representation model is also more precise than theoriginal models presented by Fernandez et al. [1981], and Bertino and Apuzzo [1984].According to the model presented by Bertino and Apuzzo [1984], the above examplewould be expressed by a single SIC in which the object component may include “Employee” and “Work_for” and the operation component may include “update” and “insert”.It implicitly means that the SIC is asserted on the insertion of an Employee or Work_forand the updates of any attribute of an Employee or Work_for.In addition, the representations in terms of the Representation model have some advantages that will be described in detail in the next chapter. In short, the certaintyfactor introduces fuzzy semantics, facilitates knowledge-based query processing and provides deductive capabilities. The identified object and operation type are useful forefficient enforcement and together with the violation action, they provide different perspectives for specifying and enforcing a general SIC. The separation of precondition frompredicate allows natural and precise representation of a SIC. The explicit representationsof the object, operation type and precondition components are useful for applying SICabstractions (that are introduced in Section 4.2).These six components and the naming convention of the SIC Representation modelare described in the following sections. Detailed BNF (Backus-Naur Form) descriptionsof the model are given in Appendix A. The rest of this section describes briefly thenotation and the allowed expressions in this language.Chapter 3. A Model for Representing Semantic Integrity Constraints 49Expressions First order logic and higher order logic expressions, arithmetic expressions and date expressions, are allowed in the precondition and predicate components.They may consist of quantifiers V (for all), (there exists), logical connectives“A” (conjunction), “V” (disjunction), “—i” (negation), but not the “—÷“ (implication)connective since it has been separated as the precondition. They may also contain aggregate functions: e.g.. avg, count, max, mm, sum, or other user-defined aggregationfunctions. In addition, a special pair of functions, old and new, are used. The new/oldfunction in SIC references the new/old value of the referenced object after/before checking the SIC. An implemented integrity maintenance subsystem of a DBMS must assurethe correct functioning of the new and old functions6.One set expression is also allowed:set{Ei some restriction(s)}. This is read as for all E1 satisfying “some restriction(s)” 17The special predicates used with the SIC Representation model are summarized in theAppendix B18.Subscript Notation This research uses a subscript to simulate the “cursor” in Date’sUDL to emphasize that a SIC is enforced on an occurrence although it is specified intensionally for an entity/relationship/relation type. A cursor in UDL is an object whosevalue is (normally) the address of some specific record in the database ([Date, 1983]).Since a SIC is enforced on an occurrence, if there are other occurrences of the same entity,‘61f the SIC is checked immediately, it references the new/old value of the referenced object after/beforethe operation. However, if the SIC is deferred to be checked at end of transaction, it references thenew/old value of the referenced object after/before the transaction.‘7The keyword set can be omitted. This notation is adopted from an “integrity constraint definitionlanguage” [Gardarin and Melkanoff, 1979].18n addition, this research follows the Prolog naming convention that a variable is written as a wordbeginning with a capital letter and an atom is written as a word beginning with a lower case. The onlyexceptions are keywords (CERTAINTY, FOR, ON, IF, ASSERT and ELSE) and the SIC name in thefirst line, which are character strings in nature, not variables.Chapter 3. A Model for Representing Semantic Integrity Constraints 50relationship or relation type referred in the precondition or predicate component, different subscripts will be used to distinguish them. The one to he checked is referenced byattaching a default subscript 0, e.g., E0. Variables with subscripts other than 0 (e.g., E1)represent any occurrence of the same object (e.g., E) type. In terms of programming language, the one with subscript 0 refers to the particular inserted/deleted/updated “value”to be checked; those with subscripts other than 0 are “variables” of the same object type.If there is only one occurrence (i.e., the one to be checked) referred in the preconditionand predicate components, the default subscript 0 is omitted. In any case, we do not usea subscript in the object component because the variable in that component is used toindicate that the SIC is applicable to any occurrence of the specified type.An example from Date [1983] will help to clarify the above subscript idea. Supposethat there is a relationship Supply in the context of “Supplier Supply Part”; and a SICrequiring that “any quantity value of a supply must not be more than 5 percent greaterthan the average of all such values”. One of the required SICs is represented by the Representation model as below, in which Supply1.Qty stands for any Supply. Qty occurrence.The assertion in its predicate component requires that each of other Supply. Qty’s mustsatisfy the average property when an occurrence is to be deleted.Supply-D-AggFcn- (Supply. Qty)CERTAINTY certainFOR SupplyON deletionASSERT -‘(Supplyi.Qty> 1.05 x avg({Supplyi.Qty Supply1 Supplyo}))ELSE rejectLegend: Supplyo indicates the value of Supply occurrence currently checked.Chapter 3. A Model for Representing Semantic Integrity Constraints 513.2 SIC NameSIC name is not counted as a component in the Representation model since it is notessential. A database designer may freely use other naming conventions, such as thesimple one Ii, 12, ... used in the literature ([Bertino and Apuzzo [1984], and [Date,1983]), or some application dependent conventions. In this research a concatenatedcharacter string is used as a SIC name for conveying some of the meaning of a SIC and alsofor serving as its unique identifier. This convention allows the SIC elicitation subsystem ofan automated database design system (introduced in Chapter 7) to generate a SIC nameautomatically by incorporating an abbreviated application domain-independent SIC typeand some application information (object type, operation type and related object typeset). A SIC name contains five parts. The general format is like:ObjectType-OperationType-SICType- (RelatedObjectTypeSet)-SequenceNoFor example. from “Employee. Salary- U-RshipDepEnt Val- (Work_for)” we know thatthis SIC should be checked on the update of Employee.Salary because the existence of aWork_for relationship occurrence depends on its value.1. The Object Type (e.g., Employee.Salary) is the specific type name of an attribute,entity, relationship, or relation for which this SIC is asserted. It corresponds to thecomponent 0 in the SIC.2. The Operation Type is either U, I, or D representing update, insertion, or deletion,respectively. Similarly, it corresponds to the component T in the SIC.Chapter 3. A Model for Representing Semantic Integrity Constraints 523. The SIC Type is the abbreviation of a SIC type conveying some application domain-independent meaning; e.g., “Totality” (if an entity occurrence exists, it must participate in some minimum number of relationship occurrences of the specified type),“RshipDepEntVal” (the existence of a relationship depends on an attribute valueof an entity type). By applying the E-R-SIC model (introduced in Chapter 5), wecan classify SICs into a number of domain-independent SIC types.4. The Related Object Type Set includes those object types that appear in the precondition and predicate components of a SIC and can supplement the meaningof the SIC type. A SIC name may not have this part if there is no such objects. For example, there is no other object type related to an absolute cardinality constraint for an entity type, which restricts the maximum number of occurrences of the entity type that can exist in a database. However, in the aboveproject_employee_minimum_salary example, it may be useful to know which relationship type (i.e., Work_for) depends on the attribute (i.e., Employee.Salary). Fordomain constraints on the insertion of an entity, although all the attributes of theentity are referenced in the SIC, they need not be included in its SIC name sinceit is implied by the SIC type “Domain”.5. The Sequence No is a positive integer number. In a complicated application, it ispossible that the above parts would not be sufficient to distinguish one SIC fromothers. In that case, as a last resort, a sequence number starting with 1 will beadded.Chapter 3. A Model for Representing Semantic Integrity Constraints 533.3 Certainty Factor (F)A certainty factor, F, which does not appear in Fernandez et al. [1981] and Bertino andApuzzo [1984], is introduced in the Representation model. A certainty factor is closelyrelated to the predicate (P) and violation action (A). It may be attached to SICs in thefollowing alternative ways:1. Ratio Scales: A ratio scale to measure the certainty of a SIC is needed if we woulduse the certainty factor to provide deductive capabilities under uncertainty. Thecertainty factor is defined to have a value between 0% and 100%. Any SIC with acertainty factor less than 100% can only have “warning”, “conditionallyieject”, or“conditionally_propagate” as its violation action. The value of the certainty factoris based on the experience of the database designer. If the value is too low, itimplies that the SIC is very unlikely to hold in the database and is less useful inproviding semantic information.2. Ordinal Scales: If we are only interested in traditional databases and there area number of uncertain SICs, an ordinal scale to measure the certainty of a SICis needed. This ordinal scale would be used to rank the certainty of SICs in thesame “family” (in terms of the same restriction intention, but different levels ofstringency) when verifying and enforcing them. The certainty factor might bedefined to have a value between 1 and 10 if there are at most 9 uncertain SICsfor the same “family” of SICs. Any SIC with a. certainty factor less than 10 is anuncertain SIC.3. Two levels: It is possible that a database designer would like to specify at mostone uncertain SIC for a restriction intention because an organization is mainlychapter 3. A Model for Representing Semantic Integrity Constraints 54concerned with SICs with 100% certainty. In this case, two discrete levels of certainty are enough. In this case, “uncertain” (i.e., certainty 100%) would havea “warning”, “conditionally_reject”, or “conditionally_propagate” violation actionwhereas “certain” (i.e., certainty = 100%) would have a corresponding “reject”, or“propagate” violation action.4. Fuzzy terms: It is also possible to attach a fuzzy term, e.g., “usually”, “sometimes”, as the certainty factor if that term has been properly specified to be equivalent to a specific certainty ratio number (e.g., “usually” may be equivalent to80%).3.4 Object (0)O represents the data object to which the SIC applies. In the E-R model, it correspondsto any occurrence of the specified attribute, entity, or relationship type. In the relationalmodel, it corresponds to any occurrence of an attribute or a relation type, i.e., a column,or tuple. The E.A or R.A is used to refer to the attribute A of entity E or relationship(relation in the relational model) R. Note that an attribute is treated as an object inthis model since it would make the SIC clearer and it is possible that a constraint isrelevant to an update of an attribute, but not the insertion/deletion of its related entity,relationship, or relation. For example, in the above project_employee_minimum_salaryexample, it is not relevant to the insertion or deletion of an Employee although it isrelevant to the update of an Employee.Salary.Chapter 3. A Model for Representing Semantic Integrity Constraints 553.5 Operation Type (T)The operation type, T, (also called the access type [Bertino and Apuzzo, 1984]) specifiesthe types of database manipulation operations to which the SIC is applicable. Only threeoperation types need to be considered — insertion, deletion, or update. Note that aSIC is never relevant to a retrieval (query) operation.The data definition operations, i.e., “creation” or “destruction” of an entity, relationship, or relation type, are also excluded’9.A database designer should design a databaseto include those necessary object types according to the information requirements. However, whether an object type should be in a database schema is not a SIC. Destroyingan object type is concerned with the database reorganization. In the case of databasereorganization, destroying an object type may have different implications in different situations. For example, destroying a type may occur because this type ceases to exist inthe real world or just because the organization will no longer keep the super-type (e.g.Person) information, hut is still interested in its subtypes (e.g. Employee). The databasedesigner needs to be involved to discover the changed information requirements. Thedifferent situations cannot be foreseen and automated in the database design phase.3.6 Precondition (C)The precondition (C) specifies the implicit or explicit presuppositions of a SIC. Ifit is not satisfied, it would cause the predicate of the SIC to be either undefined orirrelevant. For example, in the above project_employee_minimum_salary example, theSIC for Employee.Salary on update is irrelevant to any employee who does not work for‘9So the language represented by this model is not a relationally complete query language.Chapter 3. A Model for Representing Semantic Integrity Constraints .56any project. It also identifies the object (0) in the context of this SIC. In the aboveexample, the SIC for Work_for on insertion would have the precondition as “ Employee,rship_occ_part(Work_for, “Employee”, Employee)” to indicate the Employee occurrenceparticipating in the specific Work_for occurrence. If a SIC has no precondition, thekeyword IF may be omitted.3.7 Predicate (P)The predicate (P) is the assertion for the database state transition when the object(0) is to be manipulated by the operation (T). If it is true, the operation (T) on theobject (0) is allowed to be performed. For an attribute other than primary key20, thepredicate should involve the attribute itself. For an entity, relationship, or relation, thepredicate involves its attributes, aggregate attributes (e.g., average value), or relatedother entities and/or relationships (relations in the relational model). Since a SIC placeslogical restriction on data, its predicate component should contain invariant assertionson the related objects rather than procedural statements of programs.The precondition and predicate are both expressions that control whether the violation action will be invoked. The precondition specifies the general state of the databasethat makes the constraint relevant. The predicate specifies something that must be trueabout the object after the operation. The justification of having both components isdescribed in detail in Chapter 4.20We have this exception because in the relational model and traditional E-R model, primary keysplay the role of “surrogates”. The tipdate of a primary key of an entity/relationship/relation may implythe deletion of an old entity/relationship/relation followed by the insertion of a new one.Chapter 3. A Model for Representing Semantic Integrity Constraints 573.8 Violation Action (A)The violation action (A) specifies how the system is to behave if the predicate P is falsewhen the SIC is checked. In order to have precise meaning for a single SIC, it is desirableto have a single violation action (or a deterministic series of well-defined actions) ratherthan a complex procedure with many choices. The violation action may be specified asfollows.1. “warning” — allow the access operation, but issue a warning.2. “reject” reject the access operation, signalling an error.3. “propagate” allow the access operation, but perform a related corrective actionto restore the database to a correct state.4. “conditionally_reject” or “conditionally_propagate”— request confirmation from theuser to ignore the SIC; the processing of the access operation is suspended untilthe user indicates either to ignore the SIC or to take the violation action (rejectthe access operation, or allow the operation but propagate to perform a correctiveaction).Discussion on Propagation Traditionally, the violation action of a SIC is just arejection. A warning action indeed turns off the enforcement of the SIC except forgenerating a message. A propagation action changes other objects in the database. Date[1983] indicates that integrity rules may he considered important special cases of triggeredprocedures21.In discussing semantic integrity constraints, Fernandez et al. [1981] mentionthe following problems raised by the trigger mechanism.21Other application areas for triggered procedures include virtual fields, security, etc. [Date, 1983]Chapter 3. A Model for Representing Semantic Integrity Constraints 58(1) Integrity or security violation within auxiliary procedures could result ina never-ending sequence of invoking procedures.(2) The proper access rights of the invoked procedures are an issue: Shouldthese be the rights of the invoker of the triggering maintenance operationor the rights of the DBA who specified the auxiliary procedure?(3) Triggering of maintenance procedures results in a somewhat confusingdivision of responsibility between the user (or the programmer) and theDBA. For example, who should update the count of employees on insertinga new employee? The system or the programmer?The problem of security or access rights is beyond the scope of this research22.Besides,as stated before, the intention of this research is not to propose an approach to replacingthe transaction or application programming. If the intention of the violation actionis just to maintain a correct database state, there is no reason that we would need acomplex violation action. Therefore, although in principle a “violation action” couldbe very complex, in this research it is generally a single action or a deterministic seriesof well-defined actions. The responsibility of programmers to write proper applicationprograms to update data is not changed23.If the violation action is to propagate to insert some other objects, establishing the attribute values of those objects might become a problem. It is not possible for a databasedesigner to know all these values during database design. Those values that cannot beknown by an automated database design system will be assumed to be “null” (i.e., “unknown”). This may violate other SICs if some unknown attribute values (e.g., part of22llowever, this unsolved security problem raised by Fernandez et al. [1981] might also be one argumentagainst complex violation actions.23llowever, now a DBMS becomes more powerful to detect programming errors of programmers andtakes violation actions. Programmers need not check SICs because the DBMS will do that.Chapter 3. A Model for Representing Semantic Integrity Constraints 59key of a relationship) are not allowed to be “null”. There are two approaches to dealingwith this problem. In the first approach, the violation action “reject” rather than “propagate(insert(O1))” must be specified. In the second approach, the propagating insertionis allowed. Compared to the first approach, the second approach is more powerful. Evenif the propagating insertion is finally “rejected” due to other SICs, it is easier for theuser to understand why the original insertion operation is rejected. It also leaves openthe possibility of designing an enhanced integrity maintenance subsystem of DBMS thatwill prompt the user to supply the unknown attribute values or obtain them from someother sources ([Casanova and Tucherman, 1988]).Two-valued Logic Note that the violation action is taken only when the predicate,not the precondition, is What would happen if a database allows null values? Anull value can have at least two interpretations: “unknown” and “not applicable”. In thisresearch, “null” is only allowed to represent “unknown” because the “not applicable null”should not appear in a well-defined, properly-normalized database with clear semantics24.In principle, some warning messages can be issued25 if the predicate is evaluated to beunknown. However, since the “unknown” values will eventually become “known”, theywill be checked at that time. If they violate SICs, appropriate actions will he taken24The existence of “not applicable null” in some attributes of an entity, relationship. or relationindicates that these attributes do not apply to all occurrences. It implies that some subtypes shouldbe specified in order to clarify the semantics. In the literature (e.g., [Lee, 1988]) there are two otherinterpretations of “null”. Lee states that, for example, with regard to John’s spouse, the null value maymean: (1) literal interpretation “null” (e.g., the spouse name is “null”); (2) sense of “none” (e.g., Johnis a person, but he has no spouse); (3) “not applicable” because spouse is not an attribute of the object(e.g., John is the name of a building, not a person); (4) “unknown”. Note that the first interpretation isthe result of bad implementation of “null”. “Null” should be represented by an unambiguous symbol in adatabase. The second and third interpretations are also results of a bad database design — the semanticsare not clear because some attributes are not defined for some occurrences of the specific “type”. In theabove example, there should be two types, Person and Building, to avoid the third interpretation. Thesubtype Married_Person should be defined to avoid the second interpretation. In this research, the “notapplicable null” interpretation includes both the above second and third interpretations.25These are termed “weak violations” by Ho [1982].Chapter 3. A Model for Representing Semantic Integrity Constraints 60then. Therefore, in this research, for simplification, it is assumed that if the predicateis evaluated to be unknown, no violation action will be taken immediately. By takingsuch a position, the verification of SIC consistency can be simplified from 3-valued (true,false, unknown) to 2-valued (true, false) logic.Chapter 4The Application of the SIC Representation ModelThis chapter describes how we can apply the SIC Representation model for databasedesign and management. In Section 4.1 it is claimed that by applying this model we canrepresent precisely the features of any SIC. In Section 4.2 SIC abstractions are introducedto facilitate SIC specification and management. The usefulness of the Representationmodel in SIC abstraction is also discussed. In Section 4.3 the application to databasemanagement is briefly described.4.1 Completeness of a SIC Specification for Database DesignSu & Raschid [1985] and Shepherd Kerschberg [1986] suggest that “constraints” mustexplicitly specify both declarative semantics as well as process oriented or operationalsemantics. Declarative semantics correspond to logical formulas describing relationshipsbetween objects in a database. The information needed to check that these relationships are true or to maintain these relationships is the operational semantics. It is alsosuggested that the “constraint” formalisms must provide information along the lines ofWHAT, WHEN, WHERE and HOW if they are to be complete. Although our SICs areonly a proper subset of their “constraints” (refer to Section 1.3), the same formalism withsome modifications can be applied to examine the completeness of a SIC specification.61Chapter 4. The Application of the SIC Representation Model 621. Operational and Declarative Semantics. In the Representation model, thepredicate (P) and certainty factor (F) components specify the invariant declarativesemantics among data objects. The operation type (T) is relevant to the operationalsemantics to check these invariant facts. The violation action (A) specifies theoperational semantics that are needed to maintain these assertions. In addition,the object (0) and precondition (C) components supplement both declarative andoperational semantics to assert and check these invariant assertions. The certaintyfactor (F) also provides some operational semantics since the highest certain SICshould be enforced first.2. WHAT, WHEN, WHERE and HOW. The Representation model specifies:• WHAT the SIC requires — its invariant assertions (predicates, P) and certainty (certainty factor, F).• WHEN the SIC is to be checked these assertions should be checked whensome primitive access operation (T) (update, deletion, insertion) is performedon the data.• WHERE the SIC occurs — these assertions are applicable to some data object(0) in some contexts (preconditions, C).• HOW the system should behave in the case of these assertions being violated(violation action, A).Therefore, the Representation model provides declarative and operational semanticsof a SIC specification. It also provides a SIC specification along the lines of WHAT,WHEN, WHERE, and HOW. The rest of this section further explores whether we wouldexactly need these six components in order to represent completely declarative and operational semantics of a SIC specification and express precisely its features— can we haveChapter 4. The Application of the SIC Representation Model 63fewer or do we need more?4.1.1 Not Fewer ComponentsThe strengths of the Representation model are that it is precise enough to express thefeatures of a SIC; uniform for all kinds of SICs; and powerful enough to offer a correctiveaction rather than simply reject the operation violating the SIC. As mentioned in theprevious chapter, traditional languages do not explicitly describe all features included inthe Representation model. A SIC represented in a traditional language could only beassumed as 100% certain, related to all objects mentioned in that SIC; applicable to allprimitive database access operations; unconditional in all contexts; and so strong thatits violation would cause errors. It is beyond question that the predicate (P) componentis absolutely needed. Other components need to be further discussed.Object (0) Depending on the language used to represent a SIC, a number of objectsmay be mentioned. However, the declarative and operational semantics of the constraintmay not be relevant to some of these objects. In the project_employee_minimum_salaryexample of Chapter 3, the constraint is not relevant to the Project that may be mentionedif it is declared in a natural language. Neither is it relevant to the Employee if it isrepresented in a pure first order logic such as the one used by Ling and Rajagopalan[1984]. Consider another example. Suppose we have a constraint: “if an employee worksfor any project, his (her) salary cannot decrease”. This constraint is only relevant to theEmployee.Salary, not Employee, Project or Work_for. In order to have precise declarativeand operational semantics for a SIC, it is necessary to specify explicitly for which objectthis SIC is asserted. Since in this model a SIC is applied to a single operation, which canonly manipulate a single object, it is sufficient to assert only for this object.Chapter 4. The Application of the SIC Representation Model 64Operation Type (T) If a constraint is relevant to an entity, relationship, or relation,but not an attribute, and is operation-dependent, the explicit specification of its operationtype is needed. If the constraint is operation-independent, there are still two advantages(in addition to improving enforcement efficiency) to specifying the operation type:• First, explicitly expressing which operation could cause constraint violation provides the valuable operational semantics and clarifies the restriction intention (declarative semantics) of the SIC. It would be desirable to have a conceptual picture ofthe consequence of constraint enforcement even in the database design phase. Itwould also be helpful for later designing transaction specifications. For example,given the constraint “the total number of employees cannot exceed 200”, it is helpfulto know that its restriction is only relevant to the insertion of an employee althoughthe original constraint is operation-independent.• Second, it is impossible to specify the violation action without referring to a specificoperation type. When a general SIC is rewritten into several sub-SICs for separateobjects in terms of the Representation model, their objects, operations, and possiblydifferent violation actions provide useful declarative and operational semanticsnecessary conditions for operations on the objects in the object component andsufficient conditions for propagated operations on related objects.However, for a SIC asserted on an attribute, the explicit operation type specification isredundant since update is the only possible manipulation operation type. In this case,we explicitly specify it only for clear and uniform representation.Chapter 4. The Application of the SIC Representation Model 65Precondition (C) One may argue that it is unnecessary to separate preconditionfrom predicate in terms of pure logical expressiveness. However, without this separation. the declarative and operational semantics of the SIC a.re ambiguous. The precondition specifies the current database state that makes the constraint relevant; thepredicate asserts the allowable state after the operation on the object. We have seenthe project_employee_minimum_salary example in Chapter 3. Some constraints are morenaturally expressed in the IF... THEN. . . format. By explicitly separating the predicatefrom the precondition, the restriction intention of the predicate becomes clearer and isclosely related to the violation action that now can be simplified to be a deterministicseries of well-defined actions (usually a single one).Even if a constraint is not in the IF ... THEN . . . format when we first write it inEnglish, it may have some implicit “presuppositions”. For example, in a database containing Department, Employee, and Manager, a constraint might he stated in Englishas SIC-6P6: “each employee must earn less than his (her) department manager”. Inthat database, the above statement has two implicit presuppositions: (1) “the employeebelongs to some department”; and (2) “the department has a manager”. If these presuppositions are only true for some employees and departments, SIC-O is conditional and thepresuppositions should be explicitly represented in its precondition component. If theseare facts that are true for all employees and departments, we should have two other SICs,each of them asserting one of the above facts in its predicate component. In addition,these presuppositions should also be in the precondition component of SIC-O so that theconnection between a specific Employee and Manager are clear27. Each SIC would haveprecise meaning.261n this chapter, SIC-i, i=O,1,2,... are used as SIC names for simplicity.27That is, by these preconditions, we can find the manager occurrence related to an employee or findthe employee occurrences related to a manager. However, these are not the restriction intention of theabove SIC.Chapter 4. The Application of the SIC Representation Model 66The precise context specifications in the precondition are also needed to provideoperational semantics when enforcing some related SICs. Suppose we have two SICsfor Employee.Salary on update.SIC-iCERTAINTY certainASSERT Employee. Salary 100,000ELSE propagate(update(Department .Budget))SIC-2CERTAINTY certainASSERT Employee.Salary 150,000ELSE rejectWhen updating the salary of an employee to be $180,000, we would have an enforcement problem— which violation action should be taken. The problem is caused by theimprecise representation of SIC-i. The original restriction intention of these two SICs is:“the salary of an employee should be less than or equal to $100,000, otherwise, if it is nottoo high, we can increase the department budget; but if it is higher than $150,000, we canonly reject it”. The precondition of SIC-i should include an expression, “Employee.Salary< 150,00U’8.28Alternatively, we can use an expression, “checkpreSIC(SIC-, Employee.Salary”, that checks thepre-SIC, SIC-2, for Employee.Salary and returns true if the pre-SIC is satisfied. One SIC’s (say SIC-i)precondition may be the precondition and predicate of other SICs (say, SIC-2, SIC-3). SIC-2 and SIC-3may be called as the pre-SICs of SIC-i. If a data object violates the pre-SICs, SIC-i becomes irrelevantto further testing.Chapter 4. The Application of the SIC Representation Model 67Violation Action (A) Without the violation action component, all SICs could onlybe assumed to be strong. The SIC specifications would be less powerful since they couldnot include self-correcting or soft SICs.Certainty Factor (F) If a database designer only considers certain SICs, he or shewould not need the certainty factor component. However, the purpose of introducing certainty factors in database specifications is to represent more data semantics — includingfuzzy semantics. If the certainty factor component was not included in the Representation model, all SICs would be assumed to be 100% certain — no exceptions. Theviolation action of a SIC would be either a rejection or a corrective action. Systemswith only such SICs contain less declarative semantics since the fuzzy semantics are lost.They have also been criticized as too rigid to “deal with unusual, atypical, or unexpectedoccurrences”, and “it is not possible to allow violations of integrity constraints to persist without turning off completely the checking of those constraints” ([Bordiga, 1985]).Having considered such possibilities and in order to avoid such criticisms, a databasedesigner might specify too few SICs or too “generous” ones. The inclusion of certaintyfactors in the Representation model provides a mechanism for accepting data violatingthe assertions of some uncertain constraints— the introduction of “controlled inconsistency” For example, a database designer may specify a SIC for employee’s salaryas “Employee Salary < $1, 000,000” even though few employees in an organization havesalary greater than $100,000 Such a SIC would not catch common data entry errors,e.g., misplaced or omitted decimal points. Employing a certainty factor, he (she) mightspecify two or more SICs for the same object on the same operation. For example:SIC-1: with 100% certainty, Employee.Salary $1,000,000SIG-2: with 90% certainty, Employee.Salary < $100,000Chapter 4. The Application of the SIC Representation Model 68SIC-3: with 80% certainty, Employee.Salary < $50,000SIC-2 and SIC-3 are fuzzy (uncertain) semantic integrity constraints. If they areviolated, different warning messages might be issued.One may wonder since the certainty factor component relates closely to the violationaction, we might only need the latter. However, we may lose some semantics if we onlyhave the violation action component:• Suppose that we only have at most one uncertain SIC for each restriction intention.It might be acceptable to keep only the violation action component if we can assurethat the violation actions would correctly be specified and be consistent with their“implicit” certainty factors.• Suppose that we have a number of uncertain SICs. In the above example, SIG-2 andSIC-3 should have measures of certainty at least on an ordinal scale. Otherwise,we cannot know whether these SICs are inconsistent29.In addition, when enforcinga set of uncertain SICs, some ordering is needed. The most certain SIC in thesame “family” (in terms of the same restriction intention, but different levels ofstringency) should be checked first since its violation is more likely to cause thedatabase to be inconsistent, the second certain one is then checked, etc. If anoccurrence violates the higher certain SIC, the remaining SICs with lower certaintyin the same “family” need not be checked. Without the certainty factor, we wouldlose some declarative and operational semantics.29ff the SIC-S were with certainty 95%, SIC-S and SIC-S would become inconsistent. Note that SICsrepresented in this model are consistent if there exists a database state transition that is allowable withregard to all of the restrictions, i.e., considering all their components except for violation actions.Chapter 4. The Application of the SIC Representation Model 69Conclusion Based on the above arguments, we can conclude the following:If we wish to have uniform and precise representations for all kinds of SICs,it is necessary to have all six components of the Representation model.4.1.2 No More ComponentsIs it possible that more components are needed to represent declarative and operationalsemantics of a SIC or express its features? In the following, some possible proposals arediscussed.Enforcement Schedule Shepherd and Kerschberg [1986] suggest that a “constraint”formalism should include when a constraint is to be checked -— after every transactionor at audit time. Note that their discussion corresponds to the classification based onthe enforcement schedule described in the Section 2.1.7. As discussed in that section, theenforcement schedule is a transaction-driven specification. It is impossible to specify theenforcement schedule of a SIC as another component in the level of database schema.Permanence As also discussed in the Section 2.1.5, the permanence of a SIC is noteasily decided in advance since the organizational environment is changing. The permanence of a SIC is closely related to SIC maintenance rather than database design or SICenforcement. If a SIC is still in a database schema, it should be enforced anyway. Itspermanence information adds no more declarative or operational semantics. Therefore,this “permanence” information is not included in the Representation model.Chapter 4. The Application of the SIC Representation Model 70Object Type, Set, or Occurrence The object component in the Representationmodel means that the SIC is applicable to any occurrence of the specified object type.Occurrences are fundamental things in the database, and insertion, deletion, and updateare primitive operations on them. The integrity problems caused by data definitionoperations (destruction or creation) on data object types are not the focus of this research.Any SIC that involves data manipulation operations on higher levels of objects, e.g.,types, or sets, is finally enforced at the occurrence level. Therefore, the explicit indicationof the level to which the SIC is applied would add no more declarative or operationalsemantics.Static or Dynamic As mentioned in Section 2.1.8, researchers have also discussedstatic and dynamic constraints. Should we explicitly specify a SIC as static or dynamic?As described in Chapter 3, the Representation model is indeed transition-oriented; thatis, all SICs represented in terms of the Representation model are basically dynamic. Thefundamental premise is as below.Premise 4.1 The current database state is semantically correct.A database state is semantically correct if it can be constructed starting from an emptydatabase by a sequence of valid database primitive operations (insertions, deletions, orupdates). A database operation is valid if the state transition caused by it satisfies theSICs that exist at the time the operation is performed. We assume that a database is initially empty. Over time, object occurrences are inserted into the database, then updated,and finally probably deleted. Before ai occurrence is manipulated, the old database stateis semantically correct. A SIC is specified so that the database transition caused by itsprimitive operation would bring the database to a new semantically correct state. AnyChapter 4. The Application of the SIC Representation Model 71“static constraint” is represented by rewriting it into one or more dynamic constraints forthe related object(s) on the operation(s) that could cause unallowed database state(s).By doing so, we do not lose its declarative semantics, rather, proper operational semantics are attached. It is unnecessary to indicate whether the original constraint is dynamicor static in this model.In addition, a constraint on a sequence of operations is not a “real” SIC in terms ofthe SIC Representation model. Instead, it is the consequence of enforcing several SICs.For example, in order to model “a car must be owned by a manufacturer before it can beowned by a dealer’, given that there are Manufacturer_Ownership and Dealer_Ownershiprelationships, and Car, Manufacturer, and Dealer entities, we would need four SICs:Manufacturer_ Ownership-I-RshipExclusive- (Dealer_Ownership)CERTAINTY certainFOR ManufacturerOwnershipON insertionIF Car, rship_occ_part (Manufacturer_Ownership, Car)ASSERT —di DealerOwnership,rship_occ_part(Dealer_Ownership, Car)ELSE rejectChapter 4. The Application of the SIC Representation Model 72Dealer_ Ownership-I-RshipExciusive- (Manufacturer_Ownership)CERTAINTY certainFOR Dea1erOwnershipON insertionIF Car, rship_occ_part (Dealer_Ownership, Car)ASSERT - ManufacturerOwnership,rship_occ_part (Manufacturer_Ownership, Car)ELSE rejectDealer_Ownership-I-RshipBeforeRship- (ManufacturerOwnership)CERTAINTY certainFOR Dea1erOwnershipON insertionIF Car, rship_occ_part (Dealer_Ownership, Car)ASSERT ManufacturerOwnership,rship_occ_part (old(ManufacturerOwnership), Car)ELSE rejectManufacturer Ownership-D-Rship TriggerRship- (Dealer Ownership)CERTAINTY certainFOR ManufacturerOwnershipON deletionIF Car, rship_occ_part (Manufacturer_Ownership, Car)ASSERT DealerOwnership,rship_occ_part(Dealer_Ownership, Car)ELSE propagate(insert (Dealer_Ownership))Chapter 4. The Application of the SIC Representation Model 73Interpretation:These SICs restrict the relationships for a single Car. The first two SICsspecify that Dealer_Ownership and Manufacturer_Ownership are exclusive.The third SIC restricts Manufacturer_Ownership to have existed at the timeDealer_Ownership is being created. Thus, these first three SICs require that ifa Dealer_Ownership exists, then Manufacturer_Ownership must have existedin the past (and must no longer exist now). The fourth SIC requires thatwhen a Manufacturer_Ownership is to be deleted, a Dealer_Ownership mustbe created. These four SICs together restrict Manufacturer_Ownership andDealer_Ownership to exist in sequence. These SICs are independent of anytransactions. However, because of them, the order of the related transactionsis restricted. The sequence restriction on transactions is naturally guaranteedif we specify completely SICs on data. It need not be worried about at thelevel of transaction specification.For example, let us consider the following four possible transactions. Transaction-2 (which can be called Transfer_Car transaction to convey the applicationmeaning) can only be performed after Transaction-i (which can be calledProduce_Car transaction)30.Note that Transaction-3 and Transaction-4 arenot allowed to be performed. Transaction-3 would be rejected because either(i) if Transaction-i was not executed before, executing Transaction-3 wouldviolate the third SIC; or (ii) if Transaction-i was executed before, executing Transaction-S would violate the second SIC. Transaction-4 would also berejected due to the fourth SIC.30llowever, the Transaction-2 may never be performed— that is, a car may be always in the hand ofa manufacturer. Whether the Tran.action-2 should be performed would depend upon the happeningsin the real world. Also, note that since all these related SICs are enforced at the end of Trarisaction-2,we can switch the order of those two database operations inside it.Chapter 4. The Application of the SIC Representation Model 74Transaction-iBeginTransactioninsert (Manufacturer_Ownership)End TransactionTransaction-SBegin Transactiondelete (Manufacturer_Ownership)insert (Dealer_Ownership)End TransactionTransaction-SBegin Transactioninsert (Dealer_Ownership)End TransactionTransaction-4Begin Transactiondelete(Manufacturer_Ownership)End TransactionA SIC with explicit time restriction is still a data-driven constraint though it mayinvolve a special system variable Current_time that registers the current clock time. Forexample, to model the SIC: “an employee cannot receive a salary raise during his or herfirst 6 months in the company”, an Employee entity must have a HireDate attribute, andthe SIC will be:Chapter 4. The Application of the SIC Representation Model 75Employee. Salary- U- TimeRestrict Transition- (Current_time, Employee. HireDate)CERTAINTY certainFOR Employee. SalaryON updateIF Current_time < Employee.HireDate + “6 months”AS SERT new(Employee. Salary) old(Employee . Salary)ELSE rejectThis kind of SIC may only occur in an environment in which the event that causesmanipulation of the involved objects is processed in real time so that the Current_timein the computer matches the event time in the real world. Otherwise, Current_time mustbe replaced by a time-valued attribute that records the external event time, and theSIC becomes an ordinary data-driven constraint. In the above example, if the request toupdate Salary is not processed in real time, the expression in the precondition componentwould be: “Salary UpdateRequest.Date < Employee.HireDate + “6 months”.Conclusion Four possible proposals to incorporate more components in the Representation model have been discussed. They suggest either some things that cannot bespecified in a database schema, or others that add no more declarative or operationalsemantics of a SIC. Given these six components we can represent any kind of SIC thatis mentioned in the literature, and list its features precisely. These six components sufficiently provide the declarative and operational semantics of a SIC what should betrue in the database, and the information to check and maintain those assertions. Thus,we conclude the following:Chapter 4. The Application of the SIC Representation Model 76It is sufficient to have these six components of the SIC Representation modelto represent a SIC precisely.With this SIC Representation model, the database designer can represent the dataintegrity semantics properly during conceptual modelling.4.2 SIC AbstractionsOne may be concerned that there would be a huge number of SICs represented in termsof the Representation model in an actual database. However, the explicit componentseparation in the Representation model allows us to apply abstraction concepts to reducethe number of SICs that must he specified using the full Representation model.SIC Aggregation Assume that 01 and Ti are the data object and operation type forSIC-i, respectively; and Ui and Ti, where i= 2,3, ..., are the data object and operationtype for SIC-i, respectively. SIC-i (called aggregate-SIC) is the aggregation of otherSICs (called component-SICs) if• 01 contains all Oi’s as components;• the operation Ti on 01 can be conceptually thought of as the combination ofoperations Ti on Ui;• component-SICs and aggregate-SIC are sub-SICs decomposed from the same general SIC.An aggregate-SIC may have its own assertions and violation action. The enforcementof an aggregate-SIC can be simulated by checking all of its component-SICs and itsChapter 4. The Application of the SIC Representation Model 77own assertions. If an object violates any component-SIC, the violation action of theaggregate-SIC will be taken. Thus, we can use a special logical predicate (checkeomSlC,see Appendix B.2) to refer to all its component-SICs by their names in the aggregate-SIC and avoid having to write the same assertions explicitly for 01 on Ti. One exampleis that the domain constraint on insertion of an entity can he simulated by checkingthe domain constraint of updating its attributes (from unknown values to some values).We would have SICs asserting not-null, value, unique, etc., for each of its attributes onupdate. These assertions need not be repeated for the entity on insertion.SIC Specialization Assume that 01 and Ti are, respectively, the object and operationtype for SIC-i; and 02 and T2 are, respectively, the object and operation type for SIG-2.SIC-2 is the specialization of SIC-i if 02 is a specialization (i.e., sub-type) of 01, andT1=T2.The specialized SIC would inherit assertions from its parent SIC with the propervariable substitution (02 for 01). Thus, representation of the specialized SIC can beomitted unless it has special restrictions. The specialized SIC’s own assertions can onlyrefine its parent SIC’s assertions, but not overwrite them. For example, suppose thatthe database designer defines SIC-i for Employee.Salary on update: “(Employee.Salary> 1000) A (Employee.Salary < 120000)”. The SIC representation for Manager.Salaryon update is not needed unless there are special restrictions (e.g., “(Manager.Salary>15000)”).SIC Association A SIC is a set of other SICs if conceptually the enforcement ofthe set-SIC is the same as the enforcement of all of its member-SICs and nothingmore. The violation action of the “set-SIC” is dummy because if an object violates anyChapter 4. The Application of the SIC Representation Model 78member-SIC, the violation action of that member-SIC will be taken. Thus, we can usea special logical predicate (checkmemSlG, see Appendix B.2) to refer to all its memberSICs by their names in the “set-SIC” and avoid having to write the same representationsof the member-SICs explicitly for the object asserted by the “set-SIC”. The concept ofSIC association may be useful for SIC enforcement in a DBMS. For example, all SICsfor the same object on the same operation may be grouped as a “set-SIC” or several“set-SICs” according to their certainty factors and/or scheduling requirements. Thiskind of “set-SIC” is not a new type of SIC. However, in SIC specifications, conceptually,a SIC on updating the primary key of an entity (or relationship, or relation) can besimulated as two “set-SICs” that refer to SICs for deleting the corresponding old entity(relationship, or relation), and inserting a corresponding new entity (relationship, orrelation), respectively.Generic SICs By applying the above abstraction concepts, we can reduce the numberof SICs that must be specified explicitly using the Representation model. The conceptof generic SICs is introduced to reduce the number of required explicit representationseven further. Assume we have the following generic object types:— (1) Entity* is the generalization (i.e., union) of all entity types that are defined by thedatabase designer. Entity*.Attribute* is the generalization of all attributes of allentity types that are defined by the database designer.— (2) Relationship* is the generalization of all relationship types that are defined by thedatabase designer. Relationship .Attribte is the generalization of all attributes ofall relationship types that are defined by the database designer.These generic object types are conceptual modelling objects. They do not actually exist ina database. That is, neither data definition operations nor data manipulation operationsChapter 4. The Application of the SIC Representation Model 79will actually be applied to them. However, we can write some common SIC types (e.g.,domain constraints) for them. These SIC representations for generic object types canbe called generic SICs (SICs for specific object types can be called specific SICs incontrast). The precondition component of a generic SIC includes logical predicates3’(refer to Appendix B.1) to indicate clearly the contexts where the SIC type is applicable(e.g., the fact that two entity types are exclusive) and/or identify the related information(e.g., its primary key, etc.) of the object type for which the SIC type applies. Supposethat we keep the constraint information for specific object types as logical predicates inthe database (e.g., in the data dictionary). Also suppose that the DBMS would supportSIC inheritance properly. Since all object types defined by the database designer are subtypes of these generic object types, if they satisfy the preconditions of some generic SICs,these SICs would be applied to them by the principle of SIC specialization. Thus, thesegeneric SICs serve as “templates” for common SIC representations and are expected tobe inherited by specific object types32. We would need only one representation for eachSIC type (e.g., two entity types are exclusive) in a database regardless of the numberof specific object types to which the SIC type applies. The advantage of this approachis to reduce the possibly huge number of explicit representations of SICs so that theconceptual structure and future maintenance (management) of SICs become easier. Inaddition, since generic SICs can be pre-defined in an automated database design system,31These logical predicates in the precondition component of a generic SIC contain some uninstantiated variables. (For example, Primary_Key is a variable in the logical predicate entity(Entity*Primary_Key ) These variables will be instantiated when a specific entity type (e.g., Employee)has the constraint information to satisfy the precondition and inherit the SIC. (For example, the abovePrimary_Key can be instantiated to be “Empld” if the database designer has specified Employee as anentity type with the primary key “Empld”, that is, the assertion entity(”Employee”, “Empld”,...) hasbeen given.)32The idea here is similar to the following simple case. In a database, there are entity types Employee,Customer, Supplier, etc. Though the Person entity type does not actually in the database, we can writesome SICs for Person, which would be inherited by the specific entity types (e.g., Customer). The levelof our Entity* type is still higher than Person.Chapter 4. The Application of the SIC Representation Model 80the consultation to elicit SICs would be more efficient. Section 7.3.1 will describe indetail how to apply generic SICs for representing some SIC types that are identifiedduring conceptual modelling.Usefulness of the Representation Model It is necessary to identify the object andoperation type when applying SIC aggregation or specialization abstractions. The explicitcomponent separation in the SIC Representation model helps identify these componentseasily. In addition, the representation of generic SICs relies heavily on the preconditioncomponent. Most SIC types (e.g., two entity types are exclusive) only apply to somespecific object types. The precondition component of such a generic SIC indicates theconditions for the specific object type where the SIC type is applicable. A few SIC types(e.g., domain constraints) are common to all entity types. In that case, the preconditioncomponent of a generic SIC is used to identify the relevant object in this context so thatits variables can be instantiated with proper values when a specific object type inheritsthe SIC.4.3 Database ManagementIn this section, some possible applications of the SIC Representation model to databasemanagement rather than database design will be briefly mentioned.4.3.1 SIC ManagementA DBMS should include an integrity maintenance subsystem to enforce SICs. Someprevious research (e.g., [Hammer and McLeod, 1975] and [Bertino and Apuzzo, 1984])Chapter 4. The Application of the SIC Representation Model 81have proposed the functional structure of this integrity maintenance subsystem. Insteadof fully describing its structure, this dissertation briefly discusses its major functionsenforcing and maintaining (adding or removing) SICs. With regard to SIC enforcement,the focus is on the basic checking strategy (the enforcement schedule) and on the violationactions.SIC Enforcement If all accesses to the database are through transactions, the DBMSwould enforce SICs through pre- and post-conditions on transactions. The integritymaintenance subsystem would only maintain SICs, and would not be responsible forenforcing them. If it is possible to access the database using primitive operations or if no *pre- or post-conditions have been specified for a transaction, the integrity maintenancesubsystem would determine which SICs must be enforced and when.If the checking of a SIC is not at the end of the transaction, it should be done beforethe intended database operation is to be performed, not after it. While examining theassertions in the precondition and predicate components of a SIC, the effect of its specifiedoperation type should be taken into account. The Bertino and Apuzzo’s [1984] originalcriteria of deciding when a SIC must be enforced with regarding to a transaction aremodified as below:Basically, a SIC is enforced at the end of the transaction, if the set of objectsin all components of the SIC is affected (updated, inserted, or deleted) bymore than one update, insertion, or deletion statement in the transaction.Otherwise, the SIC is enforced on each occurrence update, insertion, or deletion, if it is a SIC involving only one entity/relationship occurrence or tuple;or on each update, insertion, or deletion-request, if it is a SIC involving one orChapter 4. The Application of the SIC Representation Model 82more than one set of occurrences or tuples belonging to the same or differententity, relationship types, or relations.The violation action of a SIC might become a part of a transaction. In determiningthe enforcement schedule of SICs, the related propagation actions should he consideredtoo. For example, suppose a transaction includes action-i, action-2, action-3, ..., andsuppose that SIG-i is only affected by action-i and has a violation action violation-action1. Therefore, SIC-i is enforced immediately. If action-i violates SIG-i, violation-action-ibecomes part of the transaction. If SIC-2 is affected by action-S and violation-action-i,it should be enforced at the end of transaction. If the transaction is rolled back for anyreason, the related propagation actions should also be undone and if a related propagationaction aborts, the transaction must be rolled back too.In most modern DBMS, there is a recovery mechanism that is responsible for recovering the database from many possible failures, e.g., transaction failures, system-widefailures, media failures ([Date, 1983]). We assume that the integrity maintenance subsystem cooperates with this recovery mechanism to keep information in a log requiredfor undoing an operation request The information needed in the log is similar to thatfor recovery, e g, (i) identification of each modification (update, insertion, or deletion)statement, (ii) identifers of the occurrences affected b each statement, and (iii) for eachoccurrence updated, the old occurrence value A special requuement on the log for deferring SIC checking until the end of transaction is the following. If the SIC contains new/oldfunctions, the old values of the objects should be logged at the beginning of the transaction even if the values are not referenced by any transaction statement. For example, wemay have a SIC to restrict old(snm(Employee.Salary)) new(snm(Employee.Salary)),but a transaction does not reference or operate on “siim(Employee.Salary)”.Chapter 4. The Application of the SIC Representation Model 83If the violation action of the related SIC is “warning”, the intended access operation isallowed, but a warning message would be issued. If the violation action is “propagate”,the access operation would also be allowed, but a related corrective action would beperformed (some message may also be issued). If the violation action is “reject”, theaccess operation would be rejected, an error code would be raised, and some explanationmessage describing the occurrences in violation of the SICs would also be given. In thislatter case, if the transaction has proper exception-handling procedures, this error wouldbe handled as planned (e.g., change the operation in some way and resubmit it, or skipthe operation). Otherwise, the integrity maintenance subsystem would cooperate withthe recovery mechanism to rollback the transaction automatically.The SIC Representation Model SIC representation in terms of the Representation model would facilitate their enforcement since the necessary operational semanticsare included. The checking is usually limited to the occurrence (that is currently manipulated by the operation in the operation component) of the object type (in the objectcomponent). The database state that makes the checking relevant is unambiguously described (in the precondition component). The necessary corrective action is also clearlyspecified (in the violation action component). The certainty factor component serves asthe ordering factor when checking a group of SICs in the same “family”. Other implementation mechanisms (e.g., keeping redundant minimum or maximum data value) canbe used to improve efficiency further.SIC Maintenance After a database is populated, there is a maintenance problema new SIC may he added or an old SIC may be removed. Suppose that no time-stampedpast generation of data is kept. In principle, deleting a SIC does not affect the currentChapter 4. The Application of the SIC Representation Model 84database state. Inserting a SIC can affect the current database state only if the SIC isrelevant to some object(s) on insertion. If the organization decides the new SIC will notbe applicable to the existing data, it should be explicitly stated in the SIC33. In thatcase, even if the new SIC is asserted on “insertion”, the current database state need notbe checked. Otherwise, a complicated procedure is needed to simulate insertions of alloccurrences of the related object (asserted by the new SIC) and settle possible violations.It may be easier to maintain SICs represented in the Representation model since theiroperation type components are explicitly expressed.4.3.2 Other Aspects of Database ManagementThe introduction of certainty factors in the Representation model brings some advantagesto other aspects of database management. The first is to support flexible management.The second is to facilitate intelligent query processing.Flexible Management By allowing uncertain SICs, we would decrease the seriousrigidity of SICs criticized by Brodiga [1985]. Although those SICs are uncertain, theirviolation might still convey some information. Usually warning messages are issued.The purpose of issuing warning messages is twofold. First, specific occurrences thatviolate an uncertain SIC deserve further examination either interactively or in batch toassure that the real organizational situation is reflected. Note that the enforcement ofa good uncertain SIC will generate warning messages for those, likely erroneous, (butcomparatively a few) cases without interfering with the processing of routine cases. Thiscorresponds to the idea of exception reporting. Second, if warning messages are often33For example, if a new restriction on employees salary is published on “1991/2/1” and is onlyapplicable to new employees, it should be attached the precondition: Employee.Hiredate “1991/2/1”.Chapter 4. The Application of the SIC Representation Model 85issued for a SIC34, the organization might need to reevaluate its organizational policy,which might invoke SIC maintenance.Knowledge-based Query Optimization and Deductive Capabilities One advantage of introducing certainty factors in the Representation model is to facilitate queryprocessing. An uncertain SIC is similar to a heuristic for directing query processingto the mostly likely objects first in an attempt to reduce response time. Furthermore,uncertain SICs can be applied to provide deductive capabilities under uncertainty. Forexample, from some sources we have the input data as “with 90% certainty, a ship is atanker, i.e., Ship. Type=tanker”. If we have a SIC “with 80% certainty factor, a tankercarries oil (i.e., if Ship. Type=tanker then Ship. Cargo=oil)” , we can conclude that with72% (i.e., 90% x 80%) certainty, the Ship.Cargo may be oil. Thus, this approach allowsthe Representation model to apply to expert database systems (i.e., the integration ofknowledge based systems and database systems) as well as traditional database systems.34For a SIC involving aggregate functions, it is possible that even if one warning message is issued onlyonce, the organizational policy needs to be re-evaluated because violating such a SIC might imply thata number of occurrence exceptions have happened so far owing to the nature of aggregate functions. Forexample, there might be two SICs such as SIc-4: with 100% certainty. avq(Employee.Saiary, < $25,000and SIC-5: with 90% certainty, avg(Ernployee.Saiary) $20,000. but no other SICs on Empioyee.Salary.Then, a single violation for SIC-5 might imply that many employees have too high salary, the salarypolicy may need to be reevaluated.Chapter 5An Extended E-R Model Incorporating Semantic Integrity ConstraintsThis chapter proposes a conceptual modelling tool, called the E-R-SIC model, forincorporating SICs. Section 5.1 describes the shortcomings of previous E-R models fordealing with SICs. Section 5.2 introduces the primitive constructs and data abstractionsof the E-R-SIC model and describes the basic properties of SICs. Sections 5.3 to 5.6explore SIC properties in more detail. Section 5.7 summarizes the E-R-SIC model insome figures.5.1 Problems with Previous E-R ModelsExisting database design methodologies based on the E-R model provide little supportfor incorporating SICs. Rather, they are treated as unessential accessories to conceptualmodelling.The E-R model was originally proposed by Chen [1976]. It has three primitive constructs: entity, relationship and attribute. More recently, researchers (e.g., [Smith andSmith, 1977a], [Smith and Smith, 1977b], [Teorey et al., 1986]) have extended Chen’soriginal E-R model to provide some powerful data abstractions, e.g., generalization, aggregation, association, etc. Very few SICs are discussed in the Chen’s original E-R model:86Chapter 5. An Extended E—R Model Incorporating Semantic Integrity Constraints 87• An incidence constraint requires that the existence of a relationship occurrencealways depends on the existence of the participating entity occurrences ([Furtado,et al., 1988]).• The existence of an occurrence of a weak entity type depends on some occurrence(s)of another entity type ([Chen, 1985]).• The maximum cardinality specifies the maximum number of occurrences of a relationship that can be related to one occurrence of an entity type.• Attributes cannot exist on their own, but are always attached to entities or relationships.• Primary keys must have unique values.The extended E-R models proposed by previous researchers introduce a few moreSICs, e.g., minimum cardinality. However, in all cases, SICs are only considered asaccessory properties of relationship or entity types (usually a single one). For example,researchers (e.g., Palmer[1978], Tsichritzis and Lochovsky [1982]) usually discuss theoptionality property of a relationship type (i.e., a relationship type, as a mapping, istotal or optional to an entity type). One problem with this modelling approach is thatSICs do not receive adequate attention in the conceptual design phase and are usually lostduring the transformation process from the E-R model to the relational model. SICs arenot treated as essential to conceptual modelling. Instead, they are considered mainly forselecting the appropriate logical data structure (e.g., relations). Because some semanticscan not be considered as “properties” of a single entity or relationship type, the traditionalapproach may not identify all necessary SICs. For example, consider the E-R diagramwhere an entity type E is related to entity types F and G via relationship types RX andChapter 5. An Extended E-R Model Incorporating Semantic Integrity Constraints 88RY, respectively. What SICs (if any) are implied by the adjacency of these relationshiptypes in an E-R diagram? There may he a SIC to require that an RX occurrence relatingto an E occurrence must exist if the corresponding RY occurrence exists, or conversely.Alternatively, there may be a SIC to require that RX he exclusive to RY. Such kindsof SICs are seldom discussed. Since SICs express restrictions on the logical meaning ofdata, it is worth considering the SIC as an distinct modelling construct.5.2 An Overview of the E-R-SIC ModelThe E-R-SIC model is an extended E-R model. It can be used for incorporating SICs ina database schema.5.2.1 Primitive Modelling ConstructsThe proposed model is application domain-independent. There are four primitive modelling constructs in the E-R-SIC model — entity, relationship, attribute and SIC. Thesefour constructs are all needed to model data semantics.Construct Description These primitive constructs are defined as follows:Definition 5.1 An entity is the database representation of a real-world object that canbe distinctly identified.Definition 5.2 A relationship is the database representation of an association amongreal-world objects.Chapter 5. An Extended E-R Model Incorporating Semantic Integrity Constraints 89Definition 5.3 An attribute is the database representation of a property of a real-worldobject or an association that is a function mapping an entity type or a relationship typeto values.Definition 5.4 A SIC is a logical, invariant restriction on the static state of a database(that is, a collection of attributes, entities and relationships), or on the database statetransition caused by an insertion, deletion, or update operation.Entity, relationship, and SIC occurrences are classified into different types accordingto some criteria. At a particular moment, certain groups of entity, relationship, or SICoccurrences can be considered as sets in the mathematical sense, and these sets mayhave some aggregate properties.Following the arguments on page 59, the E-R-SIC model does not allow the “notapplicable null”. A database designer should properly define entity or relationship typesto avoid that problem.A relationship type cannot be a participant in another relationship type. If a relationship type participates in another relationship type, it will be modelled as an entitytype and the original E-R diagram will be changed appropriately35.Since neither relationships nor attributes can exist without entities, and a relationshipcan be modelled as an entity, the followillg is clear:35That is, if a relationship type RX, which has two entity type participants E and F, is also a participantin another relationship type RY, then a new entity type, say G will be used to replace the originalrelationship type RX; and two new relationship types, say RE and RF, will be created to connectthis new entity type, G, with entity types E and F, respectively. The new entity type, G, becomes aparticipant in the relationship type RY.Chapter 5. An Extended E-R Model Incorporating Semantic Integrity Constraints 90Entity is the most fundamental construct among attribute, entity, and relationship.Weak Entity Type The existence of an occurrence of a weak entity type dependson the existence of some occurrence(s) of another entity type the “regular” entitytype. An existence subset [Webre, 1983] is the subset of occurrences of the regular entitytype upon which a weak entity occurrence is dependent. The E-R-SIC model requiresthat the above relationship be explicitly specified in order to make clear the “dependencesemantics” — the existence subset for each weak entity occurrence36.Time The E-R-SIC model does not model “time” as an entity type37. That is,time-stamped past generations of data are not modelled. However, there may be SICsapplied to some time-valued attributes, temporal sequences between object occurrences,or the special system variable, Current_time.Simplifying Assumptions For simplification, there are two assumptions on relationships.361n the literature there is a related discussion on another term, weak relationship. However, note thatresearchers use that term to imply diverse meanings. For example, Tsichritzis and Lochovsky [19821 useit to mean the relationship between a weak entity type and its related regular entity type. Scheuermannet al. [1980] classify weak relationships into two types. Their second type should be modelled byan association abstraction. Their first type (also see [Dogac and Chen, 1983]) implies a special SIC,which is called Critical_Relationship_Occurrence SIC in this research. That is, the existence ofan occurrence of an entity type E depends upon the existence of exactly one critical occurrence of arelationship type R, via which it related to another entity type F. An occurrence of E cannot exist ifits related critical relationship occurrence of R does not exist though it may still participate in otheroccurrences of R. For example, each employee should be assigned to at least two projects, and exactly oneof these would be critical. In order to model such a situation, the relationship type R should have a specialattribute, e.g., “Criticality”, which is a binary variable to indicate whether a relationship occurrence iscritical.371f we model “time” as an entity type, all other entities and relationships would connect with “time”via at least two special relationship types “has_creation time” and “has_deletion time” ([Studer,1988]). The E-R diagram would become very complicated.chapter 5. An Extended E-R Model Incorporating Semantic Integrity Constraints 91No Ternary or Higher Degree of Relationships Ternary relationships and relationships of higher degree are not discussed in this dissertation because of their additionalcomplexity38. Instead, ternary relationships or relationships of degree greater than twoare simulated using binary relationships ([Kent, 1977] )39.No Recursive Relationships The relationships discussed in this dissertation arebinary relationships involving two distinct entity types. Recursive relationships, whichare relationships involving only one entity type, are simulated by binary relationships.That is, one or two40 subtypes will be created for the entity type participating in arecursive relationship type. Such a simulation avoids the need to attach an explicit “role”when referencing an entity type participating in a relationship type. These subtypes alsomake the semantics clearer in most cases41. When an entity type is required to participatein both roles of a recursive relationship, such a distinct subtype, although it may beredundant, will be helpful for understanding the semantics. We may have some problemswhen an entity type is required to participate in both roles of a recursive relationship38For example, the cardinality specifications are more complicated. In a ternary relationship, thereare twelve pairs of minimum and maximum cardinalities according to the cardinality definition given byLenzerini and Santucci [1983].39That is, in the case of ternary relationships among entity types, the first binary relationship isformed between two entity types, the second binary relationship is then formed to connect the firstbinary relationship with the remaining entity type. Since the participants of a relationship type shouldbe entity types, the above means that the first binary relationship should become an entity type. Forexample, suppose that there is a ternary relationship Order among three entity types, Parts, Warehouses, and Suppliers. We will model the situation as four entity types, Parts, Warehouses, Allocation,Suppliers, and three binary relationship types connecting Allocation with Parts, Warehouses, Suppliers,respectively. Allocation is a new entity type converted from the original binary relationship betweenParts and Warehouses.400ne subtype is sufficient to replace a recursive relationship with a binary relationship although adatabase designer may prefer to have both. For example, in the relationship type Supervise, “Employee(0, *) Supervise Employee (0, *)‘ where * stands for some positive number other than 0, one subtype(e.g., Supervisor) is sufficient.41Compare to an extreme case where there is only one entity type “Thing” in the database and allrelationships are recursive.Chapter 5. An Extended E-R Model Incorporating Semantic Integrity Constraints 92and those two roles are not distinct42. However, such cases are very idiosyncratic. Forsimplification, this research is limited to handling binary relationships with two distinctentity types.5.2.2 Data AbstractionsA data abstraction is a simplified description of a system that suppresses specific detailswhile emphasizing those pertinent to the problem. Like other extended E-R models inthe literature, the E-R-SIC model provides three kinds of data abstractions: inclusion,aggregation, and association.Inclusion AbstractionThe inclusion abstraction concept ([Goldstein and Storey, 1990]) encompasses classification, generalization, and specialization. Classification is a form of abstractionin which a type is defined as a collection of occurrences with common properties. Specialization occurs when every occurrence of a type is also an occurrence of another type.Specialization is indicated by the term, is_a, that is, S is_a G, where S is a subtype and Gis a super-type. It is possible to have generalization or subset hierarchies ([Teorey, et al.,1986]). A generalization hierarchy occurs when a type is union of non-overlapping subtypes. A subset hierarchy occurs when a type is union of possibly overlapping subtypes.Property inheritance, which means that attributes, associated relationships and imposedSICs of the super-type are inherited by each subtype (or a type’s are inherited by itsoccurrences), is the most important characteristic of the inclusion abstraction. In the42That is the relationship is symmetric, e.g., “Persom Fri emdship Person”Chapter 5. An Extended E-R Model Incorporating Semantic Integrity Constraints 93case of the classification abstraction (related occurrences to types), the property inheritance principle has been enforced traditionally by any DBMS. This research assumes thatthe DBMS (either in the E-R model or the relational model) would also automaticallyimplement property inheritance for the specialization abstraction. That is, the principleof redundancy minimization. i.e., properties that can be inherited from some other entity type via an is_a relationship should not be stored explicitly ([Goldstein and Storey,1990]), has been followed. Otherwise, the inherited attributes, relationships and imposedSICs would need to be explicitly and redundantly stored for each specific subtype andother SICs would be required to assure that these properties have been really “inherited”! However, there is one exception to the use of redundancy minimization principle.Although the primary key of the super-type may not be chosen as the primary key of thesubtype, it is indeed an attribute and candidate key of the subtype. In order to make thesemantics of a subtype entity clearer and have the same related SIC(s) regardless of thechoice of the primary key, the primary key of the super-type is suggested to be stored inthe subtype.Aggregation AbstractionAggregation is an abstraction that allows a relationship between objects (i.e., attributes,entities, relationships) to be considered as a higher level object ([Smith and Smith,1977b]). The term, component_of, is used to indicate the aggregation, that is, C cornponeritof A, where C is a component and A is an aggregate. The E-R-SIC model provides composite entity aggregation that is an abstraction in which an entity containsother dependent entities and some attributes as real components43.The aggregate entity431n addition to composite entity aggregation, aggregation abstractions can be classified into fourother kinds based on discussions about aggregation in the literature. (1) Attribute aggregation is theabstraction in which an attribute may be defined as the aggregation of attributes. This research doesChapter 5. An Extended E-R Model Incorporating Semantic Integrity Constraints 94“owns” these other entities ([Lee and Lee, 1990]). That is, the existence of these otherentities is dependent on their aggregate and are owned by exactly one aggregate (i.e.,the cardinalities of any component in the component_of relationship are always (1,1)).Although some researchers argue that in aggregation the inheritance is upwards ([Brodie,1983, p. 576], [Mees and Put, 1987], [Potter and Kerschberg, 1988]), this research argues that it is probably more suitable to state that the aggregate “owns” componentsand components “own” their attributes, so the aggregate “owns”, rather than “inherits”, components’ attributes. For example, we may state “a car owns engine.weight,engine, brand, and brake. brand, etc. “.Association AbstractionAssociation is the abstraction in which a collection of member objects is consideredas a higher level (more abstract) object ([Brodie, 1983]). The term member_of isused to indicate the association, that is, M member_of 5, where M is a member and Sis a set. Brodie [1983] states “as with aggregation, the inheritance goes upward” andsome researchers (e.g., Mees and Put [1987]) even state that association may supportboth upward and downward inheritance. This research takes the position that there isno inheritance in association. That is, is distinguished from type and a “set” inassociation is considered to represent a pure mathematical set.not consider the hierarchical structure of attributes. (2) Simple entity aggregation is the abstractionin which an entity is defined by aggregation of attributes. This is the traditional entity concept. (3)Complex entity aggregation is the abstraction in which an entity is defined by aggregation of attributes,other independent entities, and probably sets. The aggregate object does not really “own” them ascomponents. That is, the cardinalities of these “components” in the component_of relationship maynot be (1,1). This aggregation may convey less semantics because these “relationships” between theaggregate object and its “components” are all implicitly named as “component-of”. Therefore, the E-RSIC model does not adopt it. (4) Relationship aggregation is the abstraction in which a “relationship” isobtained by aggregation of entities and some attributes. It is just another way to represent a relationship.chapter 5. An Extended E-R Model Incorporating Semantic Integrity Constraints 95Before Brodie mentioned association, dos Santos et al. [1980] had already proposed auseful data abstraction “correspondence”, which was later referred to by Furtado andNeuhold [1986] as “grouping”. Grouping is more general than association. It createsa group of sets, i.e., grouping is an abstraction that defines a new entity type in whicheach occurrence is a set formed from a collection of occurrences of the source entity type.According to the correspondence idea in [dos Santos et al.,1980}, a group of sets isformed by an indexing mechanism. Applying the idea of correspondence, the E-R-SICmodel provides three kinds of associations as below.1. Natural Set Association: A set, S, is defined to contain all occurrences of anentity Mtype. Classification is the indexing mechanism to form a set. For example,we have: “Employee member_of Employees” where Employees is a set consisting ofall Employee occurrences in the employee type; or “Department member_of Departments “.If these sets only have attributes that are derived from those of their members, theirexplicit representations may be redundant and may not be efficient after consideringthe enforcement of SICs. However, there may be a priori attributes, for example,Employees.Representative. In the whole database, there are a number of such sets,e.g., Employees, Departments. Since they may have different derived attributes anda priori attributes, they form different entity set types containing a single occurrence, respectively. This association abstraction relationship, memberof, shouldnot have its own attributes and need not be explicitly represented.2. Indexing Derived Set Association: This is the original correspondence abstraction discussed by dos Santos et al. [1980]. The indexing mechanism44 can be an44Formally, Furtado and Neuhold [1986] define grouping as below. “If T designates some entity setChapter 5. An Extended E-R Model Incorporating Semantic Integrity Constraints 96indexing attribute that is an attribute of the indexed entity type, or an indexingentity type that is related to the indexed entity type via an indexing relationshiptype. An example is cosets of employees of the same age, where Employee.Age is theindexing attribute for the indexed entity type Employee. If the indexing attributeis an attribute that disallows null (“unknown”), the cosets obtained by groupingwould form a partition of the indexed entity type occurrences. Another example isshown in Figure 5.1, where DSis the indexing derived set (e.g., cosets of employeeswho work_for the same project), M (e.g., Employee) is the indexed entity type, I(e.g., Project) is the indexing entity type, and R (e.g., Work_for) is the indexingrelationship. Grouping is a powerful abstraction. There may be more than oneindexing type, which can be combined with indexing attributes as the indexingmechanism. Although we can get a group of sets from the indexing mechanism, wemay only be interested in some of these (e.g., the set of employees who work_forproject p100).In general, this association abstraction relationship, member_of does not have itsown attributes and need not be explicitly represented. This kind of set has someattributes derived from the indexed entity type, some are defined a priori. Twokinds of attributes in a set are important for membership derivation: the indexingattribute (e.g., Employee.Age) and the primary key of the indexing entity type (e.g.,Project. Id).3. Enumerated Set Association: There is no indexing mechanism in this kind ofassociation because the database designer does not know or does not care aboutand T1,T2, ..., T, are either value sets associated with T or entity sets related via some relationship withT, then the Grouping operator denoted by T1,T2, ..., T {T} constructs a new (grouped) entity set TGwhere each element is a set of entities of T such that inside of one such set all entities have the samevalues and related entities from the entity sets T1 , T2 ..., T associated. The types T1,T2, ..., T will becalled indexing types, T the basis.”Chapter 5. An Extended E-R Model Incorporating Semantic Integrity Constraints 97_rof DSIFigure 5.1: Grouping: Member (M), Derived Set (DS), Indexing Entity (I), IndexingRelationship (R)the indexing entity types or attributes. The set membership can only be explicitlyenumerated by the member_of relationship.5.2.3 Basic Properties of SICsBased upon the ontological concepts of Bunge’s formalism, Wand and Weber ([1988];[1989]; [1990]) develop a formal model of the deep structure of an information system.Their work begins with a fundamental premise that is an adaptation of Newell andSimon’s [1976] physical symbol system hypothesis. Although as described in Section 1.3the SICs in this research are only parts of their “laws”, we can borrow this premise todevelop the SIC concept.Premise 5.1 A physical-symbol system has the necessary and sufficient properties torepresent real-world meaning.Chapter 5. An Extended E-R Model Incorporating Sernantic Integrity Constraints 98An information system is a physical-symbol system. A database is part of an information system and consists of attribute, entity and relationship occurrences. In orderto represent data integrity semantics, there may be constraints restricting the existenceor change of these occurrences. Some constraints may specify necessary conditions thatmust hold for an occurrence to exist, not exist or change in a database. Other constraintsspecify sufficient conditions, which if true, imply that an occurrence must exist, not exist,or change in a database. Thus, we can interpret a constraint as follows.A SIC is an assertion a sufficient or necessary condition for an occurrence of an attribute, entity, or relationship type to exist, not exist, or changein a database.What are these conditions?• Although these conditions are usually specified for an entity or relationship type orits attributes, they are actually restrictions on the insertion, deletion, or update ofits occurrences in a database, rather than on the addition or removal of the typefrom the database schema.• A SIC is defined intensionally in a database schema rather than extensionally. Theconditions apply to an occurrence by virtue of the fact it belongs to an entity,relationship, or attribute type. However, the conditions for entities or relationshipsmay only be relevant to some occurrences of the specified type. These occurrences,indeed, can be considered to belong to an “implicit subtype”.Chapter 5. An Extended E-R Model Incorporating Semantic Integrity Constraints 99Reasons of having implicit subtypes. By applying the classification abstraction([Schrefi et al., 1984]) those entity/relationship occurrences with common properties form an entity/relationship type; and by the principle of property inheritance every entity/relationship occurrence should have exactly all the attributesof the entity/relationship type and conform to the SICs associated with the entity/relationship type ([Knuth et al., 1988]). However, these occurrences may notbe “homogenous”. That is, there may be special SICs on some occurrences. Onecould create a subtype for only those special occurrences. By taking this approach(similar to [Dampney, 1988]) we could avoid a number of special types of SICs associated with “implicit subtypes”. However, since there may be a number of suchspecial SICs, there would have to be a corresponding number of such subtypes ina database. The organization may have no intrinsic interest in these explicit “subtypes”. In this research, the subtypes will be created only if they are meaningfulto the organization or if they are needed to eliminate recursive relationships. Thatis, we have the following premise:Premise 5.2 The occurrences of each entity or relationship type are not homogenous; that is, they have different attribute values, and are related to different occurrences of other objects.This implies that a database designer could not specify all entity and relationshiptypes such that all occurrences of each type satisfied the same set of SICs. That is,“implicit subtypes” with unique SICs always exist.An entity subtype may be implicitly defined by restricting attribute values of itsoccurrences or by requiring occurrences to participate in some relationship types45.451n another semantic data model, SDM, there are four ways to define “sub-classes”: attribute-defined,user-specified, set operator-defined, and existence-defined ([Hammer and McLeod, 1981], [Urban and Delcambre, 1986]). A user can decide what occurrences the subtype will contain. From the SIC perspective,Chapter 5. An Extended E-R Model Incorporating Semantic Integrity Constraints 100A relationship subtype may by implicitly defined by restricting attribute values ofits occurrences, by requiring its participating entity occurrences to participate insome other relationship types, or by restricting attribute values of its participatingentity occurrences. A condition for an entity or relationship type can be consideredas the definition of an implicit subtype if there are other conditions and this condition provides a criterion to decide whether an occurrence satisfies other conditions(that are special SICs for the “implicit subtype”).• These conditions may be positive or negative. A positive condition requires thatsomething must happen. A negative condition requires that something musthappen. For attributes, by simply reversing the comparison operator (e.g., reversing“not E.A> value” to “E.A < value”), we could consider only positive conditions.However, we must consider both positive (e.g., participate_in) and negative (e.g.,not participate_in) conditions for relationships.• These conditions may be simple assertions, or complicated arithmetic formulas.To be considered a single SIC, a condition should be “atomic”, i.e., indivisible.Otherwise, it is really two or more SICs. That is, usually we need not considerconjunctions of conditions. However, there may be some “further restrictions” thatare conditions on some basic restrictions. For example, there may be a SIC suchas, “if the salary of an employee is greater than $40,000, he (she) must participatein some project, which is projecti”, in which the last part of the statement is afurther restriction on the participation in a project.• These conditions may only require that any one occurrence of a specific type exist, or more strictly, may require that at least, at most, or some exact number ofthe semantics are embedded in the user’s mind and it is left to the user to maintain the database in orderto reflect real-world changes. If a subtype is defined by some set operation, the definition is explicit.The ideas of the “attribute-defined” and “existence-defined” “sub-classes” in SDM are adopted here.Chapter 5. An Extended E-R Model Incorporating Semantic Integrity Constraints 101occurrences of the specific type exist. That is, the restriction may be qualitativeor quantitative. Since an attribute type has exactly one occurrence per entity orrelationship occurrence, no quantitative requirement is placed on attributes. Thequantitative requirements are more important for a relationship type. They arethe relative cardinalities of a relationship type — restrictions on the minimum,maximum, or exact number of relationship occurrences in which an occurrence ofthe specified entity type participates; or requirements that for each occurrence ofthe specified entity type, all occurrences of the other entity type relate to it via therelationship type.• These conditions may directly restrict the value of an attribute of a single entity orrelationship occurrence, or may restrict the aggregate value of an attribute for a setof entity or relationship occurrences. All occurrences of an entity or relationshiptype (or its “implicit” subtype) naturally form a set (although the database designermay not explicitly define it). The most important set properties are the countingnumber, minimum, maximum, summing and average values.• Although a condition may be asserted for occurrences of one entity or relationshiptype, it might also be for occurrences of a group of types taken together.• These conditions may explicitly involve time to restrict or trigger the existence,nonexistence, or change of an occurrence, or may assert a temporal sequence between the existence of occurrences46.46Without time-stamped past data, the temporal requirement would imply “no time lag” between theexistence of adjacent occurrences in a sequence of objects.Chapter 5. An Extended E-R Model Incorporating Semantic Integrity Constraints 102Systematic Modelling A naive approach to modelling SICs would enumerate all necessary and sufficient conditions for the existence, nonexistence and change of the occurrences of each attribute, entity, and relationship type. However, if a SIC involves severalobjects, we need not specify it several times. For example, suppose that a SIC requiresthat if an RX occurrence exists for an occurrence of the specified entity occurrence, anRY occurrence must also exist. We would specify this SIC when we identify necessaryconditions for the existence of RX. Although the existence of RX is also a sufficient condition for the existence of RY, it need not be incorporated again as a general constraint ifwe properly represent the above SIC (i.e., by decomposing it into two sub-SICs in termsof the Representation model). It would be desirable to be able to guarantee that allconditions on all objects have been completely covered although we need not explicitlyconsider all kinds of conditions for each object. The systematic modelling procedureproposed in the remainder of this chapter and the decomposition algorithms proposedin Chapter 7 are used to achieve this goal. By applying the procedure described in Sections 5.3 to 5.6 a database designer can model SICs systematically by first consideringthe necessary and sufficient conditions for the existence, nonexistence, and change of eachattribute of each entity and each whole entity in isolation, and then examining the wholeE-R diagram by considering each explicit or implicit relationship.Representation Languages Although SICs incorporated by the E-R-SIC model candirectly be represented in terms of the SIC Representation model, it is suggested thatthey are first represented in a simplified format using rules or expressions. The purposeof using this simplified format language is to represent the precondition and predicatecomponents first in a language that is closer to natural language. The relevant objects andoperation types can be derived from the simplified format by applying the algorithmsChapter 5. An Extended E-R Model Incorporating Semantic Integrity Constraints 103introduced in Chapter 7. The certainty factor and violation action specifications areclosely related to the objects and operation types, and can be added later.The simplified format contains the preconditions, predicates, and some operationtype information (by the keywords, is_to_be_deleted, and is_to_be_updated for operation-dependent SICs), hut not the certainty factor and violation action. For example, we mayrepresent a requirement on the salary of any programmer first in the simplified formatas:if Employee. Title= “programmer”then Employee.Salary 18000.BNF descriptions of the simplified format are described in Appendix C. For convenience,this dissertation describes a SIC as being in rule format if it can be written by using thekeywords if and then (and the “if” condition part is not empty) in the simplified format.Since a relationship type name usually can be a verb phrase, when a relationship typeis discussed, its participating entity types will be mentioned too. Because a relationshiptype involves two entity types, we need a with_respect_to keyword to clarify to whichentity type we are referring47. An assertion may have a temporal modifier (before orpreviously) in a rule describing a temporal sequence requirement between the existenceof the object in this assertion and the existence of the object in another assertion. If a“before” modifier is attached to an assertion, the assertion must be true for its involvedobject at the time another assertion is becoming true48. If a “previously” modifier isattached to an assertion, at the time the assertion becomes false for its involved object,47For example, “if with_respect_to E, E RX F then with_respect_to E, -‘(E RY F)” is a SIC to assertthat RX and RY are mutually exclusive relative to E only.48For example, we may have: “if Dealer Own Car then Manufacturer Possess Car before”.Chapter 5. An Extended E-R Model Incorporating Semantic Integrity Constraints 104another assertion must become true49.SICs written in the simplified format should be later reformulated in terms of theSIC Representation model by applying the algorithms introduced in Chapter 7. Most ofdesigner-specified SICs are general SICs. These general SICs must then be decomposedinto several operation-dependent sub-SICs in terms of the Representation model. Forexample, the static constraint in the simplified format shown on the preceding page willbe represented as three operation-dependent SICs — one each on update of Salary, updateof Title, and insertion of Employee. At this time, the certainty factor and violation actionspecifications are added.SIC Types The E-R-SIC model can be used to classify SICs into a number of domain-independent SIC types. This SIC type classification facilitates development of SIC consistency and nonredundancy rules that can be applied to make SIC elicitation more efficient.It also allows us to present generic SICs for some common SIC types so that the numberof SICs that must be specified using the Representation model can be greatly reduced.Appendix D provides a detailed description of this classification.5.3 Entity Attribute SICsFirst examine necessary and sufficient conditions for the existence/nonexistence/changeof each attribute of each entity in isolation.Necessary Conditions An attribute cannot become nonexistent once it exists. Therefore, we only need to consider necessary conditions for its existence and change.49For example, we may have: “if Manufacturer Possess Car previously then Dealer Own Car”.Chapter 5. An Extended E-R Model Incorporating Scmantic Integrity Constraints 105Associated Entity Type SICs If an attribute exists, its value may be restrictedto be in a data type and in some range, and/or may be specified as not-null that meansthe “unknown” value is not’ allowed. In addition, its value must be expressed in somespecified format that is meaningful to the organization. If an attribute is allowed to beupdated, it must be specified as changeable. All attributes (including primary keys) ofrelationships or entities are updatable unless declared otherwise. If it can be updated,there may be special SICs restricting the pairs of before and after values.Associated Entity Set SICs Because an attribute type is defined as a functionfrom an entity type to value, it has only one occurrence per entity occurrence, i.e., itis single-valued, not multi-valued. We need not consider the attribute set. However,because an attribute belongs to an entity, there may be some SICs restricting entity setproperties. They may require that each attribute value he unique. The restrictions onminimum, maximum, summing, or average value of the attribute in an entity set mayalso be specified.Primary Key Problem In the E-R-SIC model, following the traditional E-Rmodel and relational model, there is no internal identifier (“surrogate”) to representan entity or relationship. Rather, some attribute or combination of attributes is used asthe primary key. Unfortunately, this overloading causes the semantics implied by an update of a primary key to be ambiguous it may imply a simple update of an attributeor it may imply the deletion of an “old” entity (relationship or tuple) followed by aninsertion of a “new” one. The update of a primary key is allowed in the E-R-SIC model.However, the SICs related to deletion and insertion must be enforced.Chapter 5. An Extended E-R Model Incorporating Semantic Integrity Constraints 106Time Restriction There may be some SICs restricting a time-valued attribute.The expression of such SICs may involve an explicit Current_time variable50.Sufficient Conditions An attribute type is included by a database designer to satisfyorganizational needs. Because the “not applicable null” is not allowed, for each entityoccurrence, there is at least one attribute occurrence. Therefore, no SICs can be expressedas sufficient conditions for the existence or nonexistence of an attribute considered inisolation.If we consider a single attribute in isolation, neither is there any sufficient conditionfor attribute change except time-triggering. That is, an increment of the Current_timemight trigger an update to an attribute. For example, at 0:00 on 1/1/1993, increase thesalary of each employee by $1,000.5.4 Entity SICsExamine necessary and sufficient conditions for the existence/nonexistence/change ofeach whole entity type in isolation.Necessary ConditionEntity Type SICs If an entity occurrence exists, there may be a formula5’amongits attributes. In general, any entity occurrence can be deleted to reflect the real world50For example, Current_time (Employee.FirstWorkDate + “2 years”) is an expression to assert arule that “each employee must have at least 2 years working experience”.511f there is more than one attribute appearing in an expression, we call that expression a formula.Chapter 5. An Extended E-R Model Incorporating Semantic Integrity Constraints 107situation. However, there may be SICs that require some of its attributes to have certainvalues before the deletion may take place.Entity Set SICs Because an entity occurrence belongs to an entity set, there maybe SICs restricting set properties. They may specify the maximum number of occurrencesof an entity type that are allowed to exist in the database. The concatenated values ofsome attributes may be required to be unique. Some aggregate values of attributesmay be required to be interdependent52,or more strictly, satisfy a formula. There is norestriction on the minimum number of occurrences that can exist in the database becausein the beginning there are no occurrences, and we should allow that an entity type mayhave no occurrences temporarily even after the database is already populated.There are some further complicating factors such as the following:The First Complicating Factor Implicit Subtype The single attribute SICsin Section 5.3 and the SICs mentioned above in this section may be conditional. That is,they may include conditions that can be thought of as defining “implicit subtypes”. Ifwe consider a single entity type in isolation, the only way to define “implicit subtypes” isby some range restrictions or formulas involving its attribute values. It is possible thata SIC (e.g., nonvolatility, value restrictions, etc.) applies to a single attribute only fora specified “implicit entity subtype”. It is also possible that a SIC (e.g., restriction onthe maximum number of occurrences, etc.) applies to an occurrence only because thisoccurrence belongs to a specified “implicit entity subtype”.specifications52For example, if min(E.A1)> vi then avg(E.A2) > v2.Chapter 5. An Extended E-R Model Incorporating Semantic Integrity Constraints 108The Second Complicating Factor Time Restriction There may be SICsthat restrict other attributes when time-valued attributes satisfy certain conditions. Astime passes, the restriction will either finally disappear or come into force53. Theseexplicit time restrictions may be added to all SICs mentioned in this section and inSection 5.3. It is possible to combine time restrictions and “implicit subtypes” factors ina single SIC.The Third Complicating Factor — Temporal Sequence among AttributesThere may be some temporal conditions among the attributes of an entity. They mayrequire that an entity occurrence’s attributes must satisfy a certain condition at thetime one of its other attributes is going to acquire some value54. It is also possible torequire that if its other attribute(s) satisfied a certain condition in the past (and no longersatisfies it now), one of its attribute must take some value.Sufficient Conditions An entity type exists in the database only because the databasedesigner includes it in order to satisfy the interests of the organization. It would be veryunusual to have a SIC requiring a specific entity occurrence to exist or not to existin a database. Therefore, we exclude the possibility of expressing a SIC as a sufficientcondition for the existence or nonexistence of an entity occurrence considered in isolation.If we consider a single entity type in isolation, neither is there any sufficient conditionfor entity change except time-triggering. That is, for an entity occurrence, an incrementof the Current_time might trigger an update to one of its attributes if its other attributes53For example, we may have a SIC like “an employee cannot receive a salary raise during his (her) first6 months in the company”, or “if an employee has worked for at least two years, he (she) must have atleast 10 vacation days”.54For example, we may have a SIC: “if E.A1=vl then E.A=v2 before”.Chapter 5. An Extended E-R Model Incorporating Semantic Integrity Constraints 109satisfy some condition(s).5.5 Relationship SICsNow traverse an E-R diagram and consider the necessary and sufficient conditions for theexistence/nonexistence/change of each relationship. In terms of topology, the importantlocal contexts describing how relationships and entities are connected together in an E-Rdiagram are55: line, star, loop-2, and loop-n. Each relationship type is in one or more ofthese contexts. When incorporating SICs for a relationship, the database designer needsto recognize its contexts.Definition 5.5 If an entity type participates in a group of relationship types, it is asharing entity type to them.Definition 5.6 If in a part of an E-R diagram, each entity type participates in at mosttwo relationship types and there is no cycle (loop) among these entity types, e.g., Figure 5.2, each of these relationships is in a line context.Definition 5.7 If there is a sharing entity type participating in three or more relationship types and there is no loop among these entity types, e.g., Figure 5.3, each of theserelationships is in a star context.Definition 5.8 If there is a loop between two entity types, i.e., two or more relationshipsexist between two common entity types, e.g., Figure 5., each of these relationships is ina ioop-2 context.551f we allowed explicit recursive relationships, they would be in the loop-i context.Chapter 5. An Extended E-R Model Incorporating Semantic Integrity Constraints 110£_____ _____F++__HiFigure 5.2: A Line Layout ContextChapter 5. An Extended E-R Model Incorporating Semantic Integrity Constraints 111Figure 5.3: A Star Layout ContextERZ RX RYFFigure 5.4: A Loop-2 Layout ContextChapter 5. An Extended E-R Model Incorporating Semantic Integrity Constraints 112E__________H__________FFigure 5.5: A Loop-n Layout ContextDefinition 5.9 If there is a loop among n3 entity types, e.g., Figtre 5.5, each of theserelationships forming the loop is in a loop-n context.5.5.1 Necessary ConditionsIt is necessary to examine the necessary conditions for one or multiple occurrences ofa single relationship type t.o exist (not exist/change), and for occurrences of a group ofrelationship types to co-exist.Single Relationship Type SICsFirst we focus on a single relationship type.Focus on Attributes A relationship is similar to an entity because it can have its ownattributes in addition to the primary keys from the participating entities. Therefore, arelationship can have single attribute SICs similar to those discussed in Sections 5.3 andChapter 5. An Extended E-R Model Incorporating Semantic Integrity Constraints 113single relationship SICs56 such as those of Section 5.4. Since a relationship’s attributes areusually few, such SICs are not common. However, it is possible that there may be somemore complicated SICs57 because the definition of an “implicit relationship subtype”could be based on some conditions on one or both participating entity occurrences (e.g.,having some attribute values, or participating in some other relationship types), or someformula involving the attributes of the relationship and its participating entities.Focus on Association In another sense, a relationship is much different from anentity because a relationship occurrence represents the association between two entityoccurrences.Inherent SIC First, an inherent SIC is that the relationship occurrence’s participating entity occurrences must exist.Restricted-connecting Set There may be some restrictions on constructing arelationship, i.e., the necessary conditions for the existence of a relationship occurrence.Definition 5.10 Suppose that the relationship type R connects the entity types E withF. The restricted-connecting set of an entity occurrence of E in the relationship typeR is those occurrences of F that the entity occurrence of E is allowed to be related to viaR.561n contrast to an entity type, absolute maximum cardinality for a relationship type need not beconsidered although some researchers mention it. As shown by [Lenzerini and Santucci, 1983], theabsolute maximum cardinaJity of a relationship is bounded by the absolute maximum cardinalities andrelative maximum cardinalities of participating entity types.57For example, there might be a SIC stating that “if Employee Work_for Project, Project_Id=plOOthen Work_for.Hours 50”.chapter ö. An Extended E-R Model Incorporating Semantic Integrity Constraints 114Normally, a DBMS would not know the restricted-connecting set of each entity occurrence in each relationship since a SIC is seldom specified extensionally. However, theremay be some general business rules that restrict intensionally the restricted-connectingsets. The restrictions may be positive or negative. The restrictions on the freedom toconstruct a relationship include:1. One-side condition. The restriction may be on one entity type. It would imply that some occurrences of each participating entity type are not allowed toparticipate in the relationship type. That is, their restricted-connecting sets areundefined. The relationship type R is in fact defined on some “implicit entity subtype(s)” of E or F. The “implicit entity subtypes” may be defined positively, i.e.,as occurrences having some specific attribute values, having attributes satisfyingsome formula, participating in some other relationship types; or negatively, i.e., occurrences not participating in some other relationship types. We need to considerpossible disjunctions of conditions in a star context because there are at least twoother relationship types58.2. Two-side condition. The restrictions may be coupling conditions on both entitytypes at the same time. If these are positive restrictions, they are stronger than theabove one-side restriction. If these are negative restrictions, they are weaker thanthe above one-side restriction. Although all occurrences of each participating entitytype, E or F, may be allowed to participate in the relationship type, they are notfreely connected together. An example of a positive restriction is that each occurrence of an “implicit subtype” of E can only connect with those occurrences of some“implicit subtype” of F. An example of a negative restriction is that each occurrence of an “implicit subtype” of E cannot connect with those occurrences of some58For example, in Figure 5.3, we may have: “if E RX F then (E RY G) V (E RZ H)”.Chapter 5. An Extended E-R Model Incorporating Semantic Integrity Constraints 115“implicit subtype” of F. The “implicit entity subtypes” on both sides of a relationship may be defined interdependently as occurrences having some specific attributevalues59,having attributes satisfying some formula, participating in some otherlationship types (e.g., occurrences of E participating in RX and their connectedoccurrences of F participating in RY at the same time)60. Some SICs described inthe literature are just special cases. For example, a Subset_Relationship SIC([Palmer, 1978]) in a loop-2 context61 and the necessary conditions for the composition of relationships ([Lenzerini and Santucci, 1983], [Azar and Pitchat, 1987])in a loop-n context62,are special cases requiring that the specific occurrences of Fconnecting with an occurrence of Evia other relationship types (e.g., RY, RZ) be inthe restricted-connecting sets of the occurrence of E. An Exclusive_OccurrenceSIC ([MaFadden and Hoffer, 1988]) in a loop-2 context63 is also a special case requiring that the specific occurrences of F connecting with an occurrence of E viathe other relationship type (e.g., RY) llQ be in the restricted-connecting set of theoccurrence of E.3. Intra-relationship condition. It is possible that we can define a restricted-connecting set based on some properties of the relationship type. For example, wemay define a restricted-connecting set of an occurrence of E by requiring that if someoccurrences (e.g., having some attribute values) of Fare in the restricted-connectingset, some other occurrences (e.g., having some other attribute values) must or mustnot be in the set. A relationship type may also be symmetric or transitive.59For example, in a relationship Drive connecting Driver with Vehicle, even if it is total on both sides,some drivers (e.g., with class 5 driver-licence) can only drive some vehicles (i.e., with gross weight10,900 kgs).°°For example, there may be a SIC that state: if Employee Is_Allocated Car then if Employee Work_forProject then Car IsJnsured Collision_[nsurarice61For example, in Figure 5.4, we may have: “if E RX F then E RY F’.62For example, in Figure 5.5, we may have: “if E RX F then (E RY H) A (H RZ F)”.63For example, in Figure 5.4, we may have: “if E R F then -(E RX F)”.Chapter 5. An Extended E-R Model Incorporating Semantic Integrity Constraints 116The fundamental properties of any relationship type are assumed by default tobe irreflexivity, asymmetry, and intransitivity. However, some relationships maybe reflexive, symmetric, or transitive. The reflexivity of a relationship will notbe discussed in this research because it is quite unusual in a traditional database.Symmetric64 or transitive65 relationships may occur in a specialization hierarchy,that is, when the two involved entity types belong to the same super-type; or oneis a super-type (e.g., Employee), and the other is its subtype (e.g., Manager).There are some further complicating factors such as the following:1. The first complicating factor temporal sequence conditions. A SICmay require that for a relationship occurrence of a specified type to exist, one ofits participating entity occurrences must have attributes with certain values, orparticipate (or not participate) in other relationship type(s) at the time it is goingto participate in this relationship. Those occurrences of other relationship typesare allowed to be deleted after insertion of this relationship.2. The second complicating factor— quantitative requirements. For a relationship occurrence to exist, there may be a quantitative restriction on the maximum number of relationship occurrences of the same relationship type, in whichone involved entity occurrence has participated— that is the relative maximumcardinality to the specified entity type. If a necessary condition for the existenceof a relationship involves other relationship type(s), there may be some specialquantitative requirements that are cardinalities of those other relationship types66.64Some examples of such relationship types are: Sibling_of, Married_to, Partner_of.65Some examples of such relationship types are: Sibling_of, Ancestorof, Supervise, Partner_of.66For example, there may be a SIC such as, “if RX, E RX F then exactly 3 RY, E RY G”.Chapter 5. An Extended E-R Model Incorporating Semantic Integrity Constraints 1173. The third complicating factor — multiple occurrences. The above onlyconsiders necessary coilditions for the existence of one relationship occurrence.However, there may be some necessary conditions for the existence of multiplerelationship occurrences of the same type relative to a sharing entity occurrence67.Group Relationship Type SICsThe above focuses on the necessary conditions for the occurrences of a single relationshiptype although sometimes several relationship types may be involved in the conditions.In a star context, there are several relationship types with a sharing entity type. Ifseveral occurrences of different relationship types co-exist, there may he some necessaryconditions to require that the sharing entity must participate in some other relationships,have some attribute values, or there may he a formula among some attributes of thesharing entity and those relationships.Relationships Involved in Defining “Implicit Entity Subtypes”It is necessary to review those single attribute SICs in Section 5.3 and Single Entity SICsin Section 5.4 in the presence of relationships. For example, it is possible that a SIC(e.g., nonvolatility, value restrictions, etc.) applies to a single attribute only because itsassociated entity participates in some relationship(s). It is also possible that a SIC (e.g.,restriction on the maximum number of occurrences, etc.) applies to an entity occurrenceonly because the occurrence participates in some relationship(s).67For example, there may be a SIC such as, “if 3 exactly 3 RX, B RX F then 3 RY, B RY G”.Chapter 5. An Extended E-R. Model Incorporating Semantic Integrity Constraints 1185.5.2 Sufficient ConditionsAs stated in Section 5.2.3, if a SIC requires the existence of an occurrence of one relationship type to depend on the existence of an occurrence of the other relationship type,it would be incorporated when identifying necessary conditions for the existence of theoccurrences of the first relationship type. At this step, we need not specially incorporateit as a sufficient condition for the existence of the occurrences of the other relationshiptype. However, other SICs may specify sufficient conditions for a relationship to existinvolving the existence of an entity occurrence, time-triggering, or temporal sequencerequirements. They are as follows.The Existence of an Entity There may be SICs to require that if an entity occurrenceexists, it must participate in a specified relationship type. In this case, the existence of arelationship occurrence might be thought of as a necessary condition for the existence ofan entity occurrence. However, since “entity” is a more fundamental construct it is morenatural to think that the existence of an entity occurrence is the sufficient condition forthe existence of a relationship occurrence. There are some further complicating factorssuch as the following:1. The first complicating factor implicit subtype. It is possible that a relationship type must exist only for an “implicit entity subtype”. In this case, the definition of an implicit entity subtype may be based on the values of its attributes68.68j this case, the only focus is the existence of an entity. We need not consider SICs related to an“implicit entity subtype” that is defined as occurrences participating in other relationship types. ThoseSICs have been covered when we consider necessary conditions for the existence of other relationshiptypes.Chapter 5. An Extended E-R Model Incorporating Semantic Integrity Constraints 1192. The second complicating factor — quantitative requirements. We mayrequire not only the existence of any occurrence of the specified relationship type,but also specify the number of such occurrences. These are two kinds of relativecardinalities69:the requirement of the minimum number of relationship occurrencesin which an occurrence of the specified entity type must participate; or the requirement that for an occurrence of the specified entity type, all occurrences of the otherentity type are required to connect with it via this relationship type. Combiningthis factor with the first factor, we would have cardinalities for an “implicit entitysubtype”7°3. The third complicating factor — further restrictions. It is possible thatthere may be further restrictions on the existence of a relationship in addition to thecardiiialities. A SIC may require that if an occurrence of one entity type exists, itmust participate in some minimum number of occurrences of a specified relationshiptype, and there are further restrictions placed on the relationship attribute value(s),or on the occurrence(s) of the other participating entity type. SICs related toa weak entity type7’ and a Critical_Relationship_Occurrence SIC72 are justspecial cases73.69Note that as described in the previous section, a relative maximum cardinality is a necessary condition for the existence of a relationship occurrence. The exact number of the cardinalities is theconjunction of the requirements of the minimum and maximum cardinalities.70Again note that some subtype cardinalities, which are related to the entity participating in otherrelationship types, have been considered in the previous section.71By the semantics of the weak entity, the relationship type R, via which the weak entity type isdependent upon a regular entity type, is total to the weak entity type and its key is fixed. The fixedretention restricts the key of relationship from updating. Note that the relative cardinalities of the weakentity in the relationship type R are not necessarily (1,1), it may be (c,d) where d is a number greaterthan c, and c greater than 1. For example, a child as a dependent may have both father and mother asemployees in a company.72A Critical_Relationship_Occurrence SIC requires the totality constraint to the specified entitytype and the existence of exactly one critical relationship occurrence.T3here are many other examples such as Stronger Totality Constraints (refer to Appendix D).Chapter 5. An Extended E-R Model Incoiporating Semantic Integrity Constraints 1204. The fourth complicating factor — a group of relationship types. We maynot require the existence of an occurrence of a specific relationship type for anyoccurrence of the specified entity type. Instead, we may require the existence ofone relationship occurrence among a group of relationship types. Combining thisfactor with the above factors, we might have a more complicated SIC.Time-Triggering A single relationship, like an entity in isolation, may have time-triggered SICs to trigger its update.Temporal Sequence Conditions It is possible to require that if an entity occurrencehad some attribute value(s) in the past and no longer has it now, it must participate insome relationship(s). In the case of a group of relationship types, if some relationship(s)existed in the past, once it is deleted, other relationship(s) may he required to exist now.5.6 SICs Implied by Implicit Relationships and Data AbstractionsIn general, the E-R-SIC model requires us to express explicitly all relationship typesin an E-R diagram. Even in cases involving an ID dependency74,is_a or component_ofrelationships where the candidate key of one entity provides information about a relatedentity, the relationship type is required to be specified to make the semantics clearer.There are some exceptions (e.g., the cases that two entity types are exclusive; an entitytype may be formed by set operations on other entity types; or member_of relationshipsexcept for in the enumerated set association) that there is a SIC between some entity74An entity type (e.g., E) has “ID dependency” on other entity types if this entity type cannot beuniquely identified by its own attributes and has to be identified by its relationship types to the otherentity types ([Chen, 1985]). The entity type E needs the key of some other entity type (e.g., F) as apart of its key.Chapter 5. An Extended E-R Model Incorporating Semantic Integrity Constraints 121types, but we need not express the relationship type explicitly because no suitable relationship type can be specified or the relationship is derivable. This section considers theSICs implied by these “implicit” relationship types. In addition, this section discussesthe SICs implied by data abstractions since they are special relationships.Specialization Abstraction The specification of a specialization abstraction, “S is_aG”, implies that the relative cardinality constraints of is_a must be S (1,1) and G (0,1)([Goldstein and Storey, 1990]). Because the primary key of G (say G. Gkey) is a candidatekey of 5, there is a necessary condition, in addition to the existence of participating entityoccurrences, for a relationship is_a to exist, i.e., S. Gkey=G. Gkey.Entity Types in a Specialization Hierarchy In a specialization hierarchy, twoentity types may be exclusive, or an entity type may be formed by set operations (intersection, union, or difference) on other entity types ([Biller and Neuhold, 1978]). It is notnatural to define a relationship to express the exclusion between two entity types. Neitheris there an explicit relationship to express the set operations although is_a relationshipsamong those entity types should be specified.• SICs implied by exclusion between entity types. Suppose that two entitytypes E, F are exclusive and Ekey is their common candidate key. It implies a SICwritten in the simplified format as: if E.Ekey= Value then -i(F.Ekey_—Value).• SICs implied by set operations on entity types. Suppose E is formed by theoperation on two entity types F, G and Ekey is the common candidate key among7It is not meaningful to discuss exclusion or set operations if those entity types are not in the samehierarchy. In addition, if those entity types formed by set operations do not have special attributes orparticipate in special relationships, they may be redundant.Chapter 5. An Extended E-R Model Incorporating Semantic Integrity Constraints 122E, Fand G.— F = F fl G. There are some SICs implied by the isa relationships betweenthe subtype (E and the super-types F, G, respectively. In addition, it implies‘if (F Ekey= Value) A (G Ekey= Value) then E Ekey— Value”— E = F U G There are some SICs implied by the is_a relationships between thethe subtypes F, G and the super-type F, respectively In addition, it implies“if E Ekey== Value then (F Ekey= Value) V (G Ekey= Value)”— E = F — G There are some SICs implied by the is_a ielationships between thesubtype (E) and the super-type (F), and the exclusion between E and G. In addition, it implies “if F.Ekey= Value then (E.Ekey= Value) V (G.Ekey= Value)”.Inheritance Conflict Problem An inheritance conflict may occur when two ormore super entity types of a single specialization type have some attribute(s) with thesame name(s). In such a case, there should be a further higher level super-type of whichthese super-types are subtypes. However, the organization may not be interested in thishigher level super-type. If these attributes are really semantically different, we just prefixtheir names with the super-type names and no SIC needs to be specified. However, iftheir semantics are the same, the subtype should inherit only one of these attributes andthere must be an integrity constraint to maintain the equality of those attributes’ values.In addition, the SICs associated with the super-types must not be inconsistent with eachother, otherwise, the multiple inheritance will cause the subtype to be empty.Aggregation Abstraction The specification of a composite entity aggregation, Ccomponent_of A, implies that the cardinalities of component_of should be either C (1,1)Chapter 5. An Extended E-R Model Incorporating Semantic Integrity Constraints 123component_of A (O,n) or C (1,1) component_of A (n,m), where 77 1 and m n76.Because a candidate key of C is formed by the primary key (say A.Akey) of A plusprobably some other attributes, there is a necessary condition, in addition to the existence of participating entity occurrences, for a relationship component_of to exist, i.e.,C.Akey=A.Akey.Association Abstraction Association (grouping) is a powerful abstraction, but itsimplied SICs are complicated. In all cases except for the enumerated set association, themember_of is implicit since it can be derived. Their aggregate SICs should be speciallydealt with77. Suppose that agg_fcn stands for an aggregate function, Derived_Att is anattribute of the set, Att is its corresponding attribute of the members.1. Natural Set Association: Suppose that in the M memberof Ms, the memberofis implicit. We would have “Ms.Derived_Att=aggfcn(M.Att,)”. In addition, anoccurrence of Ms cannot be deleted if it, as a set, is not empty.2. Indexing Derived Set Association;• Only involving one indexing type: Suppose that in Figure 5.1, for Mmemberof DS, the indexing entity type is I with the key I.Ikey, the indexing761f the component C (e.g., Automatic_window_controller) is just relevant to the aggregate A (e.g.,Car), the aggregate A may have 0 or up to n components C. If the component C (e.g., Engine) ischaracteristic or identifying to the aggregate A (e.g., Car), component_of is total to the aggregate A.(Reference to [dos Santos, et al., 1980] for the formal definitions of “relevant”, “characteristic”, and“identifying”.) In either case, the component_of relationship should always be total to the componentC. In some applications, some “components” can exist independently, e.g., an engine may be sold as aseparate part. It would be better to define a subtype of the original “component” type to include thosecomponents that cannot exist independently so that the semantics become clearer. For example, wemay have a type Engine and its subtype ComponenL engine that belongs to cars. Those components aredifferent from other occurrences in terms of behavioral constraints and structural attributes.77The cardinalities of member_of in the case of natural set association and indexed set association canbe derived from the relative cardinalities of other relationship and absolute maximum cardinalities ofentities. So, they need not be considered.Chapter 5. An Extended E-R Model Incorporating Semantic Integrity Constraints 124relationship is R, member_of is implicit. We would have:“DS.Derived_Att=agg_fcn({M.Att M R I, DS.Ikey=I.Ikey})”.• Only involving indexing attributes: Suppose that in M memberof DS,M.Iridex is the indexing attribute, member_of is implicit. We would have:“DS. Derived_Att=agg_fcrq’{ M. Att DS. Index=M. Index}J’.In both cases, an occurrence of DS cannot be deleted if it, as a set, is not empty.In addition, DS.Ikey (or DS.Index) cannot be updated.3. Enumerated Set Association: In this case, M member_of ES, the member_of isexplicit. That is, we have:“ES.Derived_Att—agg_fcn({M.Att) M M_member_of_ES ES})”.5.7 Summary of the E-R-SIC ModelFigures 5.6 to 5.10 give a summary of the E-R-SIC model.Chapter 5. An Extended E-R Model Incorporating Semantic Integrity Constraints 125E-R DiagramAttributes?DB Designer SICs?Entities?think: Relationships?±Construct Definition:Definition 5.1: An entity is the database representation of a real-world objectthat can be distinctly identified.Definition 5.2: A relationship is the database representation of an association amongreal-world objects.Definition 5.3: An attribute is the database representation of a property of a real-world objector an association that is a function mapping an entity type or a relationshiptype to values.Definition 5.4: A SIC is a logical, invariant restriction on the static state of a database (that is,a collection of attributes, entities, and relationships), or on the database statetransition caused by an insertion, deletion, or update operation.Observation:• Entity, relationship, and SIC occurrences are classified into different entity, relationship, andSIC types according to some criteria.• At a particular moment, certain groups of entity, relationship, or SIC occurrences can beconsidered as sets in the mathematical sense, and these sets may have some aggregate properties.• Entity is the most fundamental construct among attribute, entity, and relationship.Simplifying Assumptions:No ternary relationships or relationships of higher degree.No recursive relationships.Data Abstractions: inclusion (classification, generalization), aggregation, and association.Premise 5.1: A physical-symbol system has the necessary and sufficient propertiesto represent real-world meaning.Interpretation: A SIC is an assertion— a sufficient or necessary condition— for an occurrenceof an attribute, entity, or relationship type to exist, not exist, or change in a database.Premise 5.2: The occurrences in each entity or relationship type are not homogenous; that is, theyhave different attribute values, and are related to different occurrences of other objects.Implication: A database designer could not specify all entity and relationship types such thatall occurrences of each type satisfied the same set of SICs.That is, “implicit subtypes” with unique SICs always exist.What are SICs that should be incorporated?Figure 5.6: What is in the database?Chapter 5. An Extended E-R Model Incorporating Semantic Integrity Constraints 126DiagramDB Designerthink:Examine each attribute of each entity in isolation.(That is, imagine a hypothetical situation:examine each attribute, say E.A1, and suppose that it is the only “object”of interest at this moment.)Consider:Are there necessary conditions that must holdfor E.A1 to exist, not exist, or change in the database?What are these necessary conditions:• associated single entity type SICs?existence: value, null, etc.?— change: changeable, new/old value pair?• associated whole entity set SICs?— unique?— aggregate value?• because we choose it as the primary key?• because of time restriction?Are there sufficient conditions, which if true, imply thatE.A1 must exist, not exist, or change in the database?What are these sufficient conditions: none except for time-triggering?Figure 5.7: Single Entity Attribute SICsChapter 5. An Extended E-R Model Incorporating Semantic Integrity Constraints 127DiagramDB Designerthink:Examine each whole entity in isolation.(That is, imagine a hypothetical situation:examine each whole entity type, say E, and suppose that it is the only “object”of interest at this moment.)Consider:Are there necessary conditions that must holdfor an E occurrence to exist. not exist, or change in the database?What are these necessary conditions:• single entity type SICs?— existence: a formula among attributes?— change: deleted entity attributes?• whole entity SICs?— the restriction on the maximal number of occurrences?— concatenated values of some attributes are unique?— a formula involving aggregate value(s)?aggregate values are interdependent?• because of an implicit subtype?• because of time-restriction?• because of temporal sequence among attributes?Are there sufficient conditions, which if true, imply thatan E occurrence must exist, not exist, or change in the database?What are these sufficient conditions: None except for time-triggering?Figure 5.8: Single Entity SICsChapter 5. An Extended E-R Model Incorporating Semantic Integrity Constraints 128E-R DiagramDB Designerthink:Examine each relationship type, say R, and consider its context in the whole E-R diagram.What are the contexts that each relationship is in the E-R diagram: line, star, loop-2, and loop-n?Consider:What are necessary conditions that must holdfor an R occurrence to exist, not exist, or change in the database?What are these necessary conditions:• single relationship SICs?similarly to a single entity, focus only on its attributes:* each of its single attribute has conditions in Figure 5.7?* the whole relationship has conditions in Figure 5.8?but could be more complicated since an implicit relationship type may be defined based onits participating entities or there may be a formula among those attributes ofthe relationship and its participating entity?— focus on its role as an association between entities:* inherent SIC: its participant entity must exist?* restricted-connecting set:one-side, two-side of its participating entities, or intra-relationship condition?© complicating factors:1. temporal conditions on relationships or the values of entity attributes?2. quantitative requirements on occurrences of the same or different relationship types?3. existence of multiple relationship occurrences of the same type?• group relationship type SICs: conditions for occurrences of several relationship types to coexist?• relationships involved in defining “implicit entity subtypes”, so review Figures 5.7 and 5.8 again?Are there sufficient conditions, which if true, imply thatan R occurrence must exist, not exist, or change in the database?Some are covered as necessary conditions on other relationships, what are other sufficient conditions?• the existence of an entity?© complicating factors:1. because of implicit entity subtypes?2. quantitative requirements on the existence of the relationship occurrences?3. further restrictions on the existence of the relationship occurrences?4. the existence of one relationship occurrence among a group of relationship types?• Time-triggering?• Temporal sequence conditions?Figure 5.9: Relationship SICsChapter 5. An Extended E-R Model Incorporating Semantic Integrity Constraints 129E-R DiagramDB Designerthink:Is there any SIC implied by some “implicit” relationships?If there is a SIC between any two entity types,in general we should express the involved relationship type explicitly in the E-R diagram.In what situations would we have some “implicit” relationship types?• these entity types are in the same specialization hierarchy?— two entity types are exclusive?one entity type is the intersection of other entity types?— one entity type is the union of other entity types?one entity type is the difference between two entity types?— multiple inheritance conflict problem?What are SICs implied by data abstractions?• generalization: S (1,1) is_a G (0,1),a necessary condition for is_a S. Gkey=G. Gkey• aggregation: C (1,1) component_of A (0,n) or A (n,m), n 1, m na necessary condition for component_of— C.Akey=A.Akey• association and grouping: member_of may be implicitly represented except forthe enumerated member_of;the involved relationships and entities are tightly related;SICs are complicatedFigure 5.10: SICs Implied by Implicit Relationships and Data AbstractionsChapter 6The Application of the E-R-SIC ModelThis chapter begins with a hypothetical example to show how the E-R-SIC model canbe applied. Potential pitfalls of using the E-R-SIC model are then discussed. Finally, wediscuss the usefulness of the E-R-SIC model.6.1 An Example of Using the E-R-SIC ModelDescription Suppose that we have an example called car_dealer.database_design. TheE-R diagram in shown in Figure 6.1. The database is intended to keep track of eachcar from the time it is ordered from the factory through its sale to a customer andsubsequent service history as long as the car is serviced by that dealership. Informationabout a (potential) customer is also kept. For simplicity, the possibility that a customermight transfer his (her) car to other persons is not considered.Attributes The attributes are not shown in the first level of the E-R diagram.They are as below78.• Entity Types:ThThe property inheritance principle is assumed. Therefore, the inherited attributes, e.g., Salesperson.Salary, are not listed.130Chapter 6. The Application of the E-R-SIC Model 131is..aFigure 6.1: An Example: A Car Dealership DatabaseChapter 6. The Application of the E-R-SIC Model 132— Person: SIN (key, i.e., social insurance number), Name, Sex, Address, Phone.— Customer: SIN (key).— Employee: EmpJd (key, i.e., employee identification number), Salary, Hire_Date.— Salesperson: Empld (key), BasicSalary, Commissioll.— Mechanic: EmpJd (key).— Car: Engine_No (key), Model, Year, Regular_Price, Cost.— Part: Code (key), Name, Price, Cost, Qty_on_hand, Reorder_Point, Date_Last_Used.— Service: Sequence_No (key), Date, Charge, Part_Fee, Labor_Fee, Discount_Amt.• Relationship Types:— Pre_Sell: no attributes.— Own: no attributes.— Sell: Discount_Rate, Total_price, Sell_Date.— Maintained_by: no attributes.— Installed_in: Qty_Used (i.e., the number of the related part installed duringa service).— Work_on: Hours (i.e., how much hours a mechanic spends on a service).The example does not include all SIC types. In addition, only some of the relevantSICs are illustrated. The SICs are written in the simplified format according to the BNFin Appendix C.Entity Attribute SICs Each attribute of each entity may have some SICs requiringit to be not-null, to take some specified value range, data type and format. The followingtwo attributes are used for illustration.Chapter 6. The Application of the E-R-SIC Model 133• Person.SIN: there are SICs to assert that a person’s SIN must be known,is of type non-arithmetic (i.e., may not be used in conventional arithmeticoperations) and has a format such as 123-56-789, i.e.,is..null(Person.SIN)=nosatisfy_datatype( Person. SIN, non_arithmetic)satisfy_format(Person.SIN, 999!999!999)• Employee.Salary: there are SICs to assert that an employee’s monthly salarymust be known, is of type arithmetic (i.e., all of the normal arithmetic operations may be performed on it in the usual way), and is in the range $1,500and $10,000 inclusive, i.e.,is_nuil(Employee. Salary) =nosatisfy_datatype(Employee. Salary, arithmetic)satisfy_value(Employee.Salary, [1500 .. 10000])Some attributes are not allowed to be updated. For example, the social insurancenumber of a person cannot be updated, i.e.79if Person.SIN is_to_be_updated then falseSome attributes have SICs to restrict changes in value. For example, a salesperson’sbasic salary cannot decrease; and the recently used date of each part is kept up-to-date,i.e.,new(Salesperson .Basic..Salary) old( Salesperson .BasicSalary)new(Part .DateLastUsed) old(Part .DatelastUsed)791n this section, it is assumed that there is no automated database design aid. If there is such asystem, the system, rather than the database designer, would write the SICs in the simplified format forlater reformulating them in terms of SIC Representation model.Chapter 6. The Application of the E-R-SIC Model 134Some attribute SICs relate to the entire associated entity set, for example, eachperson’s SIN is unique; and the sum of the monthly salary of employees should be lessthan or equal to $1,000,000, i.e.,unique (Person. SIN)sum(Employee.Salary) 1000000Entity SICs There may be some SICs that require the values of several attributes ofan entity to follow some formulas. For example, the salary of a salesperson includes twoparts — basic_salary and commission; the regular price of a car is set to have at least25% gross profit; and for each part, its quantity on hand must be always greater than orequal to its reorder point, i.e.,Salesperson. Salary = Salesperson .Basic.Salary + Salesperson. CommissionCar.Cost Car.RegularYrice x 0.75Part. Qtyon±and Part .ReorderPointThere are some SICs to restrict entity set properties. For example, the dealer canhave at most 30 mechanics, i.e.,count(Mechanic) < 30There are some SICs restricting attribute values as a function of time. For example,for those mechanics who have worked for at least two years, their salaries should notdecrease and the average should be at least $3,000, i.e.,if (Current_time — Mechanic.Hire_Date) “2 years”then new(Mechanic . Salary) old(Mechanic. Salary)Chapter 6. The Application of the E-R-SIC Model 135if (Current_time — Mechanic.Hire_Date) “2 years”then avg(Mechanic.Salary) 3000Relationship SICs Attributes of relationships may have similar attribute SICs. However, some restrictions on attributes of one relationship are more complicated. For example, a salesperson cannot offer more than 80% of the gross profit rate of a car as adiscount rate to a customer and the total price of a sold car is equivalent to its regularprice minus discount, then adding 13% tax rate, i.e.,if Salesperson Sell Carthen Sell.Discount_Rate (Car.Regular_Price— Car.Cost) ÷ Car.Regular_Pricex 0.80if Salesperson Sell Carthen Sell.Total_Price Car.Regular_Price x (1—Sell.Discount_Rate) x 1.13There are some restrictions on constructing a relationship. Some restrictions are one-side conditions. For example, the dealer only services those cars that are first owned bysome customers that are in the database, i.e.,if Car Maintained_by Servicethen (Customer Own Car) beforeOther restrictions are two-side conditions. For example, examine the loop formedby the three relationships, Pre_Sell, Sell and Own, there are only three non-redundantSICs to assert their dependencies: (1) if a salesperson sells a car, he (she) must havecontacted a customer, who then buys the car — there must be a “customer” involved;Chapter 6. The Application of the E-R-SIC Model 136any other “entity occurrences”, which are not in the “customer type”, cannot own thatcar (2) a customer can own those cars that have been pre-sold by a salesperson and thenactually sold by the same salesperson; (3) if a car is sold by a salesperson and is ownedby a customer then that salesperson must have contacted that customer transfer of thecar among customers is not recorded, i.e.,if Salesperson Sell Carthen Salesperson Pre_Sell Customer, Customer Own Carif Customer Own Carthen Salesperson Pre_Sell Customer, Salesperson Sell Carif Salesperson Sell Carthen if Customer Own Carthen Salesperson Pre_Sell CustomerSome conditions are complicated. For example, a part_fee including 6% tax is placedon the parts installed in those cars that have been bought more than 5 years, i.e.,if Part Installed_in Service,Car Maintained_by Service,Salesperson Sell Car,Service.Date— Sell.Date > “5 years”thenService.Part_Fee = sum({MCharge Part Installedln Service,MCharge= Part.Price x InstalledJn.QtyUsed x 1.06})Chapter 6. The Application of the E-R-SIC Model 137There can also be SICs expressing sufficient conditions for relationships. Those arethe relative minimum cardinalities in Figure 6.1. For example, any service should haveat least one mechanic working on it, i.e.,V Service, Work_on, Mechanic Work_on ServiceSICs in a Specialization Hierarchy There are some SICs implied by implicit relationships. For example, an employee cannot be both a mechanic and a salesperson,i.e.,if Mechanic.EmpJd=Valuethen —i (Salesperson .EmpJd=Value)There are also some SICs implied by is_a relationships. For example, “each mechanicis an employee” implies that relative cardinalities— Mechanic (1,1) is_a Employee (0,1),and a necessary condition for an is_a occurrence to exist—Mechanic. EmpId—Employee. EmpId.6.2 Potential Pitfalls of Using the E-R-SIC ModelThere might be some potential pitfalls that need to be avoided when using the E-R-SICmodel.Pitfall 1: Inconsistent and Redundant SICs Inconsistent or redundant SICs mightbe specified. For instance, in the car_dealer_database_design example, if the database designer specified that “if Salesperson Pre_Sell Customer then (Salesperson Sell Car)”, itChapter 6. The Application of the E-R-SIC Model 138would be inconsistent with “if Customer Own Car then Salesperson Pre_Sell Customer,Salesperson Sell Car”. If the database designer specified: “Salesperson.Basic_Salary2000”, “Salesperson. Commission 500”, and “Employee.Salary < 2OO”, these SICswould be inconsistent because Salesperson.Salary would then have the lower bound $2,500that is greater than the upper bound of Employee.Salary. If the database designer specified “if Customer Own Car then Salesperson Sell Car”, it is redundant since it is subsumed by “if Customer Own Car then Salesperson Fre_Sell Customer, Salesperson SellCar”. It becomes more difficult to detect such inconsistencies and redundancies as thenumber of SICs gets larger. There needs to be some automated tools to help detect themor prevent them in advance although this issue may not he completely solvable.Pitfall 2: Imprecise or Incomplete SIC Representation The simplified representation format in the preceding section should only be used as a convenient way ofinitially incorporating SICs rather than as a formal way of eventually representing them.Although the simplified representation format might be more natural to a database designer, it is not precise. Recall that there are six components in the SIC Representationmodel. In most cases, the simplified representation only contains precondition and predicate components. For operation-dependent SICs, object and operation type componentsare also specified. Even in that case, other important specification information (the certainty factor and violation action) is still missing. In order to represent any SIC precisely,we need some algorithms to reformulate it (decompose it if necessary) in terms of theRepresentation model.We do not take the naive approach of enumerating all conditions for all objects explicitly. For example, a general SIC represented in the simplified format, “if E RX Fthen E RY 0”, would be incorporated when we consider necessary conditions for theChapter 6. The Application of the E-R-SIC Model 139existence of RX. We do not incorporate it again as a sufficient condition for the existenceof RY. However, if we decompose the original general SIC, there would be two sub-SICsrepresented in the Representation model:one for RX on insertion, whose violation action would be “reject” or“propagate(insert (RY))”— the other for RY on deletion, whose violation action would be “reject” or“propagate(delete(RX))”Note that the first sub-SIC in terms of the SIC Representation model is a necessarycondition for the insertion of RX, and is also a sufficient condition for the insertion ofRY if the violation action is “propagate(insert(RY))”. The second sub-SIC is a necessarycondition for the deletion of RY, and is also a sufficient condition for the deletion ofRX if the violation action is “propagate(delete(RX))”. Without decomposing a generalSIC into several sub-SICs represented in terms of the SIC Representation model, theSIC specifications would not be complete because the violation action may depend onthe object and operation causing violation. Therefore, we can conclude that the SICRepresentation model provides a complete and precise representation of SICs in the ER-SIC model.Pitfall 3: E-R Structure Orientation The E-R-SIC model helps a database designerincorporate SICs based on E-R structural descriptions. However, our logical data modelis the relational model. There should be some procedure for losslessly transforming SICsin the E-R-SIC model into corresponding ones in the relational model80.80The “losslessly” means that, for each SIC incorporated in the E-R-SIC model placing restrictions onoccurrences of entities, relationships, or attributes, there is (are) SIC(s) in the relational model placingcorresponding restrictions on relation tuples or attributes.Chapter 6. The Application of the E-R-SIC Model 140Pitfall 4: Inefficient Modelling In the absence of an automated database design aid,a database designer must specify all explicit and inherent SICs in terms of the simplifiedformats as described in Appendix C. For example, in addition to specifying the entitytypes, Employee and Mechanic, and the relationship type Mechanic_is_a_Employee, he(she) needs also to specify the relative cardinalities of is_a, a necessary condition for itsexistence — Mechanic.EmpId=Employee.EmpJd, and even the fundamental incidenceconstraints the existence of participating entity occurrences. In general, there mightbe many possible SICs given an E-R diagram. Since one SIC may involve more thanone object, the database designer needs to record what SICs have already been identifiedso that they would not be incorporated again. There would be a heavy burden for thedatabase designer if he (she) needs to figure out all the possible SICs, verify, reformulate, decompose, and transform them by himself (herself). It is desirable to have someautomated database design system to help the database designer model and representSICs.Proposals for avoiding these pitfalls will be introduced in the next chapter (pitfall 1is not completely avoided).6.3 Data Integrity Semantic CompletenessIn the E-R-SIC model, the SIC is the construct used to provide logical restrictions onthe structural schema, that is, on the attributes, entities and relationships. Completelyincorporating SICs depends on the correct specification of the structural schema. TheE-R-SIC model requires all relationships to be explicitly represented except for somespecial cases, such as exclusion between two entity types. The explicit representation ofrelationships makes the semantics clear.Chapter 6. The Application of the E-R-SIC Model 141Based on the assumed correct E-R structural schema, we could systematically modelSICs by considering a wide range of possible restrictions or requirements— positiveor negative, simple assertions or complicated formulas, qualitative or quantitative, time-restricted, time-triggering, or temporal sequence requirements, implicitly/explicitly type-related or set-related. Most SICs we consider are necessary conditions for the existenceof the occurrences of one or more attribute, entity, or relationship types. They are, bynature, static SICs restricting the possible states of the database. A few are inherentlydynamic SICs, e.g., new_old transitional constraints; and some others are consideredas sufficient conditions. However, after decomposing them into sub-SICs in terms of theSIC Representation model, all become dynamic and operation-dependent SICs specifyingconditions for the deletion, insertion, or update of the related objects. Since these threeoperations are the only ways the database can change, those necessary conditions forthese operations completely cover the necessary conditions for existence, nonexistenceand change of an object in a database.In addition, sufficient conditions (if any) are also incorporated because of the violationaction component of the SIC Representation model.Whenever we find that the structural schema cannot model the data integrity semantics as SICs, we would need to go back and modify the E-R structure. Therefore, theE-R-SIC model not only allows modelling the SICs completely but also forces the structural schema to reflect the data integrity semantics more faithfully. The structure andSIC parts of the resulting schema would be closely related. This would form a more suitable and fundamental base for higher level transaction or application program modelling.Therefore, we can conclude the following:With the support of the SIC Representation model, the E-R-SIC model wouldChapter 6. The Application of the E-R-SIC Model 142completely model the data integrity semantics.There is one limitation. Although some entities or relationships may have time-valuedattributes or temporal sequence requirements, time-stamped data integrity semantics(e.g., dealing with historical data at some specific time) in general may not he completelymodelled81.A related discussion concerns SICs in the relational model. Suppose that the SICs inthe E-R model have been losslessly transformed into corresponding ones in the relationalmodel. Are there any other new SICs, e.g., data dependencies, that must be consideredin the relational model?Data Dependency in the Relational Model Tile relational data model has well-formalized theories. Data dependency is one of its important concepts. Uliman [1982]defines a data dependency as “a constraint on the possible relations that can be thecurrent value for a relation scheme”. However, one should note that the usefulness ofdata normalization theory is for designing a database directly from the relation concept,not for capturing semantics. The partial and transitive functional dependencies implythe embedding of independent relationships ([Makowsky, et al., 1986]). Similarly, theexistence of non-trivial multi-valued dependencies occurs when a relation represents morethan one 1:N relationship ([Kent, 1983]). These problems can be avoided if entitiesand relationships are properly designed and carefully transformed into relations ([Storey,1988]). Some researchers try to apply other dependencies, e.g., inclusion dependencies,811n the literature, [Palmer, 1982] discusses time-dependent relationships and [Taiizovich, 1991] introduces lifetime cardinalities. It is not clear how the restrictions of lifetime cardinalities are applied overthe lifetime of the entities and relationships.Chapter 6. The Application of the E-R-SIC Model 143exclusion dependencies and co-exclusion dependencies82,to capture semantics. Exclusiondependencies have been covered in the E-R-SIC model. Inclusion dependencies comeabout because some relationships have not been explicitly represented83;and co-exclusiondependencies occur because the SICs are not precisely represented84.Therefore, we neednot add other SICs after transforming those SICs identified by the E-R-SIC model tocorresponding ones that reference the schema in the relational model.52Let E, F be two relations (possibly the same), and A, B, he attributes of E and F, respectively.E[A1,... ,Am] ç F[B1,... ,B] is called an inclusion dependency ([Casanova and Vidal, 1983]). IfE[A,... ,Am] is exclusive to F[B1,... , B,], it is called an exclusion dependency ([Casanova and Vidal,1983]). The negation of an exclusion dependency is called a co-exclusion dependency, ([Arisawa andMiura, 1986]).83Take an example of [Minnila and Räihâ, 1986]: Registered_Cars/Model] CarTypes/ModelJ. Thereshould be some relationship, e.g., Hasmodel, between Registered_Cars and Car_Types.84Arisawa and Miura [1986] state that what co-exclusion dependencies mean is to constrain databasewhen updating and sharing entities. However, there should a higher level of entity that is the super-typeof those entity types. If the SICs implied the is_a relationships between the sub-types and super-typeare precisely represented, the co-exclusion dependency constraints would be incorporated.Chapter 7A Proposed Database Design Aid for Eliciting SICsIn Figure 1.1 of Chapter 1 a proposed database design subsystem for eliciting SICsis sketched. Both the SIC Representation model and E-R-SIC model are application-domain independent. They are suitable for implementation as part of an automateddatabase design system. An automated database design system is needed because of thepotential heavy work load of a database designer when incorporating and representingSICs. This chapter describes conceptually how this subsystem for eliciting SICs shouldwork. In the following, the term “database designer” will be used for the user of theautomated database design subsystem who may be a professional database designer oran end-user who plays the role of the database designer to design his (her) system. Section 7.1 gives a brief overview of the elicitation subsystem. The three major functions ofthe subsystem — verifying elicited SICs for consistency and non-redundancy, reformulating and decomposing general SICs into sub-SICs in terms of the SIC Representationmodel, and transforming them into corresponding ones in the relational model are described in detail in Sections 7.2, 7.3 and 7.4, respectively.144Chapter 7. A Proposed Database Design Aid for Eliciting SICs 1457.1 An Overview of the SIC Elicitation SubsystemInterfaces From the view of the SIC elicitation subsystem, there are two interfaces.The first is an interface to elicit explicit constraints from the database designer in a dialogue or by menus, etc. The second is an interface between the SIC elicitation subsystemand the structure subsystem (e.g., the View Creation System in [Storey, 1988] and [Storeyand Goldstein, 1988]) for constructing the structural schema. From that interface, theSIC elicitation subsystem fetches the structural descriptions in terms of the E-R-SICmodel.Elicitation Based on these structure descriptions, the SIC elicitation subsystem willquery the database designer to obtain the attribute SICs for each attribute of each entity.Next, whole entity SICs would he obtained for each entity. The system then traversesthe E-R diagram to detect possible relationship SICs85 and data abstraction SICs. Ina dialogue, the database designer may need to add some further restrictions on a SICby using expressions such as those in the simplified format (Appendix C). In general,though, the database designer need not worry about the assertion representation syntax.In addition, the system, rather than the database designer, would specify those inherentSICs implied by special relationships (e.g., the is_a data abstraction).Elicitation Knowledge It is possible that the elicitation procedure may be lengthybecause the goal is to include the complete data integrity semantics and the system hasneither common sense nor application domain knowledge. However, it is expected that85The original structural specifications may include some traditional relative cardinalities of relationships. However, the exact numbers except for 0, 1 may not he known since they are irrelevant forconstructing normalized structures.Chapter 7. A Proposed Database Design Aid for Eliciting SICs 146the system should query the database designer based on the following knowledge:• the E-R-SIC model to capture the possible SICs from different objects in the(line, star, ioop-2, and loop-n) contexts of an E-R diagram;• consistency and nonredundancy rules for different SIC types;• heuristics from naming conventions and data types;• specification information that the database designer has provided so far.With consistency and nonredundancy rules for different SIC types, the subsystemwould not request, or could refuse immediately, some impossible SICs. This kind ofknowledge is introduced in Section 7.2.1 to show how it can alleviate the need for somedetailed verification.Since there may be a large number of possible SICs, heuristics may help the subsystemto be even more efficient in asking questions. The object data type — date, arithmetic ornon-arithmetic may suggest its volatility. For example, often a date is unchangeable. Inaddition, naming conventions may hint at some SICs. For example, attributes with thesame names in the entity types linked via some relationship types may suggest the needfor SICs involving some formula between them. Appendix E contains more examples.Verification Although full verification for consistency and non-redundancy is hard toachieve, the elicited SICs would be verified to some extent, especially for cardinalities.The cardinality information is important because a totality constraint might make manypossible SICs redundant or inconsistent. Section 7.2 discusses the related issues on verification in detail.Chapter 7. A Proposed Database Design Aid for Eliciting SICs 147Reformulation and Decomposition Based on the information obtained from thedatabase designer, the system would represent general SICs in terms of the simplifiedformat described in Appendix C. Then, the system should automatically rewrite eachof the above SICs, and decompose it into sub-SICs if necessary, in terms of the SICRepresentation model. All six components plus the SIC name in the model would bespecified. The reformulation and decomposition algorithms are introduced in Section 7.3.The default certainty factor is “certain”, but the database designer can change it to be“uncertain” or provide specific certainty factors.Without application domain knowledge, it is impossible for an automated databasedesign system to provide automatically complex and application-dependent violationactions. For example, if we allow arbitrary violation actions, even for a SIC on updateof E.A2 with a simple formula predicate “E.A1 = E.A2 + E.A3”, the violation actionmay be either a propagation to update E.A1, or update E.A3, or update both E.A1and E.A3 (this may imply a number of choices with different ratios of E.A1 and E.A3).The number of possible combinations of violation actions would grow rapidly as thenumber of involved objects increases. The “propagate” action is helpful and powerful,but is also costly if the set of SICs is not optimized86. In this research, it is envisionedthat the system would only provide some restricted predefined violation action choices(“reject”, “propagate”, “conditionallyreject”, “conditionally_propagate”, or “warning”).Most of the violation actions would be “reject”, “conditionally_reject”, or “warning” ifthe system could not decide how to propagate to fix the violation. If the system decidesthat a “propagate” action is feasible, it would query the database designer to determine86An example here consists of two SICs: “SIC-i: (B RX F) V (B RY G)”; “SIC-2: if B RX F thenB RY G” attached with “propagate” actions in their all sub-SICs. Suppose that the original databasestate is “RY exists, but RX does not”. A “deiete(RY)’ operation would cause the following operations tobe performed: “delete(RY)”, “propagate(insert(RX))” owing to SIC-i, “propagate(insert(RYJ)” owingto SIC-S. A RY is forced to be inserted back although now it may connect B with a different F.Chapter 7. A Proposed Database Design Aid for Eliciting SICs 148whether to “propagate” or “reject”.Generic SICs As described on page 78, we can represent some common SIC typesfor the generic object (Entity*, Relationship, Entity*.Attribute*. and Relatioriship*.Attributejto reduce the number of explicit SICs represented using the Representation model. Bydoing so, we can also alleviate the need for the invocation of the reformulation and decomposition algorithms during the database design consultation session. The subsystemshould have these pre-defined representations of generic SICs. If the database designerspecifies that occurrences of an object type must satisfy one of these common SIC types(e.g., two entity types are mutually exclusive), the related constraint information wouldonly be kept in a logical predicate, called an input predicate (refer to Appendix B.1,e.g., ex_ents(ExEntSet,) storing the information that the entity types in ExEntSet areexclusive). Such specific constraint information need not be directly reformulated or decomposed in terms of the SIC Representation model. It is expected that the specifiedobject type could inherit the related generic SICs. Section 7.3.1 describes these genericSICs in more detail.Transformation It is assumed that if a relationship type has (1,1) cardinalities relative to an entity type, it would be represented by a foreign key rather than a separaterelation in the relational model. \‘Vhen the structure subsystem transforms the E-R specifications into relations, the SIC elicitation subsystem should also transform the SICs inthe E-R-SIC model into corresponding ones in the relational model. The input predicatesneed not be transformed. There should also be pre-defined generic SICs in the relationalmodel corresponding to those in the E-R-SIC model. The principles of representing relationships in the relational model and the SIC transformation algorithms are introducedChapter 7. A Proposed Database Design Aid for Eliciting SICs 149in Section 7.4.Final Outputs The final results of the consultation session would be a listing of incorporated SIC specifications in a relational database schema. These specifications include:(1) specific SICs, which directly apply to specific object types and are represented interms of the Representation model; (2) input predicates describing specific object typesto which some common SIC types apply; (3) generic SICs, which serve as the “templates”for common SIC representations and are expected to be inherited by the specific objecttypes described in (2).7.2 SIC VerificationDifficulty in Verification As discussed in Section 2.3, full verification of SICs for consistency and non-redundancy would need application-domain and common-sense knowledge. Even those aspects of verification that do not require application-domain knowledgestill pose difficult research problems because of the potential complexities of SICs. Thisresearch discusses the issues, but does not offer general solution algorithms.Because the interactions among SICs might, in general, be complicated87,we wouldneed logic theorem proving techniques (e.g., the resolution refutation method [Nilsson,1980]). However, standard logic must be enhanced to handle the following problems.• Attribute Value Problem. In standard logic programming theory, objects are8TFor example, suppose that Ci, C2, CS are the preconditions of three SICs, SIC-i, SIC-2, SIC-3,respectively; and P1, P2, P3 are their predicates, respectively. SIC-i and SIC-S are asserted for thesame object. If Ci overlaps with P2 and C3 implies C2, Pi should be the same as PS. Otherwise, theyare inconsistent — because CS implies P2 and P2 overlaps with Ci, if some occurrences satisfy C3, theymay also satisfy Ci; so CS transitively implies Pi.Chapter 7. A Proposed Database Design Aid for Eliciting SICs 150represented as finite terms, and the set of terms is countable (within the contextof the Herbrand Universe) ([Lassez, 1987]). Suppose that the value domains of allattributes are discrete and finite. Then, consistency among SICs can be checkedby applying the standard logic proving technique although, for efficiency reasons,it may be accompanied by the standard CSP (constraint satisfaction problem)algorithms ([Mackworth, 1977; 1987]; [Mackworth and Freuder, 19851)88. However,suppose that the values of attributes are allowed to be continuous and infinite89.Standard logic cannot deal with the real number domain. It must be extendedto have other techniques to handle real number values and formulas. That isthe motivation for developing constraint logic programming (CLP) ([Jaffar andLassez, 1987]; [Cohen, 1990])°. We may apply linear programming (LP) algorithms(e.g., the simplex method [Hillier and Lieberman, 1986]) to check the consistencyof a set of linear formulas (i.e., whether a feasible solution exists) and removethe inconsistent value ranges (by finding the values of optimization functions—maximization and minimization for each involved attributes)91;apply integer linearprogramming algorithms if the values are restricted to be integer numbers92;and88There are three sets of algorithms node, arc, and path algorithms. In this case, we would applyarc algorithms to check the consistency between value constraints and a formula constraint; apply pathalgorithms to check the consistency among value constraints and several formula constraints. Theycan be checked in polynomial time ([Mackworth and Freuder, 1985]) although the path algorithms arecomplicated.89They may be numerical real numbers or integers; or may have date data type if date operations aresuitably defined.90For example, one CLP representative, CLP(R) provides both expressive and computational powersto solve linear equations and linear inequalities (by two-phase simplex algorithm) incrementally ([Jafi’arand Michaylov, 1987]). However, it is still a rudimentary experimental product and can only run underUNIX-based operating systems. Other CLP languages are Prolog III ([Colmerauer, 1990]), Trilogy, andCHIP ([Hentenryck, 1989]), etc.91A simple algorithm used by ALICE (A Language for Intelligent Combinatorial Exploration) ([Lauriere. 1978]) can also be applied to test the consistency between a single linear formula and the valuerange constraints, and further to remove the inconsistent value ranges.92It is not usual to have complex numbers in a database. Suppose we do have a complex numberapplication, Gröbner method would test whether a system of multivariate polynomial equations has asolution over the complex numbers ([Cohen, 1990]).Chapter 7. A Proposed Database Design Aid for Eliciting SICs 151apply non-linear programming algorithms to handle a set of non-linear formulas93.However, all existing methods have some limitations. For example, if any of theconnectives between linear formulas is a disjunction (“V”), e.g., “P1 V P2”, thesimplex method is no longer applicable since “P1 V P2” may represent a non-convexpolyhedron ([Cohen, 199O]). The general integer linear programming problemis NP-complete ([Papadimitriou and Steiglitz, 1982]). Furthermore, it is evendifficult to achieve nonredundancy. We may apply these algorithms to removethe inconsistent values from value constraints. However, given a set of arbitraryformulas, no existing algorithms can tell us whether a formula is redundant.• Aggregate Function Problem. There may be some attribute SICs related toentire entity or relationship set properties. We cannot verify these SIC specificationswithout actual data. Even if they are not in rule format, at the best we can onlyhave some simple tests (Appendix F.1), such as: the specified maximum value of anattribute value must be greater than or equal to its minimum value. Iii addition, asstated, the verification for cardinality constraints is important for other relationshipSICs. Standard logic proving should be supplemented with the algorithms describedin Appendix F.2 to check consistency and nonredundancy for cardinalities.• Incremental Verification Problem. Verification would be conducted for theSICs in the sequence of the incorporation. Since it is desirable to identify any inconsistencies each time a new SIC is added, the SIC elicitation subsystem would931f there are non-linear formulas and the functions are monotonically increasing or decreasing foreach involved attribute, the ALICE algorithm can be applied to check the consistency between a singlenon-linear formula and value constraints.941n addition, one should note that the simplex algorithm would take an exponential number of stepsin the worst case although in general it can be regarded as very efficient ([Papadimitriou and Steiglitz,1982]).9It is known that the theory of predicates with =, , , <, >, , +, x over real numbers is satisfactioncomplete, i.e., every constraint is either provably satisfiable or provably unsatisfiable. However, the theoryof predicates with =, +, x over integer numbers is not satisfaction-complete ([Cohen, 1990]).Chapter 7. A Proposed Database Design Aid for Eliciting SICs 152not verify the whole set of SICs oniy once, instead there would be an incremental constraint satisfaction problem ([Hentenryck, 1990]), in which constraints areincrementally added and dropped.• Feedback from the Database Designer. It is impossible for the SIC elicitationsubsystem to achieve complete verification without feedback from the databasedesigner. The following are two examples.— If the preconditions of two SICs overlap, the database designer is required tomodify these two SICs.— If the preconditions of two SICs are not related in the syntax (i.e., they areexpressions on different objects, for example, one is on Employee.Age, anotheris on Employee.Education), but their predicates are inconsistent, the systemshould query the database designer as to whether it is possible that an objectwould satisfy these preconditions at the same time. If it is, these two SICs areinconsistent.Optimization Problem Even if two constraints are neither inconsistent nor redundant, they might not be “optimized”— i.e., there might be a single constraint that canreplace the set of these two constraints and be enforced more efficiently. The followingare two examples.• Given two value range constraints that are neither inconsistent nor redundant,they may overlap. That is, there may be a single refined range to replace those tworanges.• A set of two constraints, “(E RX F) V (E RY G,,)”, and “if E RX F then E RY G”,is neither inconsistent nor redundant. However, after optimization, the above setChapter 7. A Proposed Database Design Aid for Eliciting SICs 153would be replaced by a single constraint, “V E, RY, E RY G” (i.e., RY total toE.To get a single refined range constraint may be easy. However, in general, optimizationis more difficult than consistency and non-redundancy checking even for a small andsimple set of constraints. For its related objects, each constraint allows a set of databasestates to exist. When several constraints relate to the same object, if they are consistent,the intersection of all those sets of states is not empty. One approach to optimization isto find this intersection set and describe it by some expressions. However, the number ofstates could he large and it would be very difficult to write expressions for those states.Issue of Unexpected Implicit Constraints Even if a set of SICs is not inconsistentor redundant, there may still exist some unexpected implicit constraints. For example,suppose that Supervise is a relationship type between entity types Manager and Project,the database designer has specified the absolute maximum cardinality of Project— therecan be no more than 8 projects; and some relative maximum and minimum cardinalitiesin the relationship Supervise there can be no more than one manager per projectand any manager must be assigned to a project. This set of three constraints implies animplicit constraint — the organization can hire at most 8 managers although the databasedesigner does not specify it explicitly or may think that there is no upper bound on theabsolute maximum cardinality of the entity type Manager. The elicitation subsystemneeds to recognize that some SIC types may be missing, and then check the current setof constraints to see whether such types of SICs can be derived from it. The databasedesigner would then be asked to judge as to whether he (she) has made some mistakes.Chapter 7. A Proposed Database Design Aid for Eliciting SICs 154Verification related to Certainty Factors The certainty factor component in theSIC Representation model is a source of possible inconsistency between SICs. Supposing that both SIG-i and SIC’-2 are for the same object on the same operation with thesame precondition, if the predicate of SIC-2 implies that of SIC-i (i.e., SIC-2 is morerestrictive than SIC-i), the SIC-2 should be less certain than SIC-i. Otherwise, theyare inconsistent. In addition, the certainty factor and violation action are closely related.An uncertain SIC can only have a predefined violation action of “warning”, “conditionally_reject”, or “conditionally_propagate”. A certain SIC can only have a predefinedviolation action of “reject” or “propagate”.Verification related to Violation Actions Furtado et al. [1988] describe a situationwhere the violation actions of two SICs may be inconsistent. However, if the SICs areprecisely defined and the transaction concept is applied when enforcing them, the problemdescribed by Furtado et al. could not occur96. The major consistency problem causedby the violation action component in the SIC Representation model is that the violationactions may cause an endless enforcement cycle a “flip-flopping” behaviour between aset of SICs described by Ceri and Widom [1990] if arbitrary actions are allowed. Ceri andWidom suggest using a triggering graph97 to detect potential cycles. However, they leavethe responsibility of determining whether infinite triggering would actually be possibleto the database designer. It is difficult, in general, to take an arbitrary violation action96The problem raised by Furtado et al. [1988] is as follows. Suppose that there is a relationship typeR and an entity type E, R is total to E. If the violation action of the incidence constraint, SIC-i, forE on deletion is “propagate(delete(R))” and the violation action of the totality constraint, SIC2, for Ron deletion is “reject”, Furtado et al. claim that they are incompatible. However, note that if the SICsare precisely defined, when the SIC-i has been violated and the deletion of the R is taken, the SIC-2becomes irrelevant because the B occurrence has become non-existent.971n their triggering graph, the nodes of the graph correspond to the SICs. There is a directed edgefrom node SIC-i to node SIC-2, if and only if the execution of SIC-i’s violation action is likely to violateSIC-2 (the likelihood is only in terms of objects and operation types).Chapter 7. A Proposed Database Design Aid for Eliciting SICs 155and decide whether it would violate other SICs. In this research, oniy some restrictedand predefined violation actions would be suggested by the SIC elicitation subsystem.Basically, if the violation action of a SIC is “propagate”, when the SIC is violated, it wouldallow the intended operation to be performed, but would also take a simple compensatoryaction to bring the database to another consistent state. With such a restriction, if thoseSICs are consistent (in terms of the components other than the violation action), whenan operation violates a SIC, the database would return to a consistent state althoughsome operations may be undone.Verification and Decomposition If a general SIC is not in rule format, it can beverified for consistency and nonredundancy before decomposing it. However, a generalSIC in rule format must first be decomposed into several sub-SICs before verifying it.These sub-SICs indicate the consequences of the general SIC that must be taken intoconsideration when verifying a set of SICs. For example, if there is a general SIC, “ifCl then P1”, where Cl and P1 are some assertions, the SIC elicitation subsystem wouldnot know that it is inconsistent with “if P1 then Cl” without first decomposing it.The decomposition algorithms described in the next section can be assured to representcorrectly the sub-SICs of a given single SIC. However, the verification for consistencyand nonredundancy among the representations for jj SICs is still needed.7.2.1 Consistency and Nonredundancy Rules for SIC TypesSince conceptually by using the E-R-SIC model, the SICs can be classified into a numberof types, some general problems of consistency and nonredundancy among those typesof SICs can be explored in advance. The results would be some rules about consistencyand nonredundancy for SIC types, which can be stored iii the knowledge base of theChapter 7. A Proposed Database Design Aid for Eliciting SICs 156elicitation subsystem in order to expedite the elicitation procedure.Take a simple example. In Figure 6.1 of the car_dealer_database_design example, thetwo relationship types, PreSell and Own, may be involved in a number of types of SICs.For example, they may be exclusive (i.e., “if Customer Own Car then —i(SalespersonPre_Sell Customer)”), or one may depend on the other (e.g., “if Customer Own Car thenSalesperson Pre_Sell Customer”). Suppose that the database designer has specified thatthe minimum cardinality of Pre_Sell relative to Customer is 1. With the built-in consistency and nonredundancy rules in Appendix G, the SIC elicitation subsystem wouldautomatically know that the first SIC type cannot occur and that the second SIC type isredundant. It might be confusing to the database designer if the SIC elicitation subsystem asks him (her) to confirm these SICs. It would be inefficient if the database designerinadvertently specified the above redundant or inconsistent SICs and the elicitation subsystem invoked the whole process to verify them during a consultation session.Note that this knowledge cannot totally replace the actual verification during theconsultation, but it would reduce the number of cases that need to be verified. Kung[1984, 1985] applies a sophisticated tableaux approach to verify a set of two SICs: “everyemployee earns more than $20, 000”, and “every manager is an employee”. Since the firstis a simple value constraint for a non-key attribute and the second implies some specialrelative cardinalities and the equality condition on key values of two sides of entities forthe existence of an is_a occurrence, a human database designer would know that theycannot be inconsistent. If the SIC elicitation subsystem had similar rules, it could alsoskip the detailed consistency checking process.Chapter 7. A Proposed Database Design Aid for Eliciting SICs 1577.3 SIC Reformulation and DecompositionAs discussed in Chapter 6, the E-R-SIC model uses the Representation model to specifySICs. SICs should be represented in terms of the Representation model. If a SIC is onlyrelevant to an object on an operation, we only need to reformulate it in the Representationmodel without changing its certainty. In most cases, because a SIC is relevant to severalobjects on various operations, the decomposition would get several sub-SICs written interms of the Representation model.The SIC elicitation subsystem would invoke the reformulation and decomposition algorithms, which are described in Appendix H, to reformulate and decompose SICs obtainedfrom the database designer. By using the E-R-SIC model, the subsystem should recognize all inherent SICs that are implied by the specifications (e.g., an is...a relationship)provided by the database designer. Before invoking the reformulation and decompositionalgorithms, the subsystem should first represent these explicit and inherent SICs in termsof the simplified formats in Appendix C, but without using the nested if. . . then rules.Algorithms previously proposed in the literature can deal only with some restrictedtypes of SICs and only consider the object and operation type components. The algorithms in Appendix H deal with all SICs that are recognized by the E-R-SIC model. Allcomponents in the SIC Representation model are considered.Correctness of the Reformulation and Decomposition Algorithms It is necessary to assure the correctness of the proposed reformulation and decomposition algorithms for representing a given general SIC. The following briefly examines the algorithms.Chapter 7. A Proposed Database Design Aid for Eliciting SICs 1581. Same Certainty Factor. The certainty of sub-SICs cannot be lower than thegeneral SIC; otherwise, the enforcement of these sub-SICs cannot achieve the certainty of the general SIC. Therefore. if the original general SIC is 100% certain,its sub-SICs should also have 100% certainty. Suppose that the original generalSIC is uncertain. Since its sub-SICs may have different violation actions, some ofthem may have higher certainty. Without further information, the elicitation subsystem can assume that they have at least the same certainty as the general SIC.(Later, the database designer can change the certainty of a sub-SIC to be higher ifnecessary.)2. Relevant Object and Operation Types. The search for the relevant objectand operation types is based on the following ideas.• An operation-dependent SIC is only relevant to one object on one operation.For example, “if E is_to_be_deleted then .. . “is only relevant to E on deletion.• It is desirable to remove obvious redundancies when checking objects mentioned in a SIC. By the nature of relationship, if a relationship occurrenceexists, its participating entity occurrences must exist too (the incidence constraint). If a general SIC is relevant to the insertion of an occurrence of arelationship type and we have a sub-SIC for the relationship, we need nothave other sub-SICs to restrict the insertion of the occurrences of its participating entity types. It is possible that some occurrences of these entity typesmay not participate in any occurrence of this relationship type. The checkingof the general SIC for any occurrence of these entity types can be delayed untilit really participates in an occurrence of this relationship type. For example,according to the algorithms, “if E RX F then E RY G” would be relevant toRX on insertion, but not to E, F or G.Chapter 7. A Proposed Database Design Aid for Eliciting SICs 159• Premise 4.1 in Section 4.1.2 is adopted to avoid some operation checking.Since by assumption, the current database state is semantically correct, wewould need to check an operation only because the database state transitioncaused by the operation may violate the SIC. For example, an insertion of anRX occurrence may violate “if E RX F then E RY G”, but a deletion of aRX could not. Similarly, an insertion of a.n RX occurrence may violate “if 3atieast 3 RX, E RX F then 3 RY, E RY G”, hut a deletion of a RX couldnot.• Current_time and some aggregate functions have special monotonicity properties. For example, a deletion of an E occurrence may violate “max(E.A) >Value “, but not an insertion of an E because, by the monotonicity of functionmax, an insertion of a new occurrence would never decrease the aggregatefunction value.• Since a primary key is used as a surrogate to represent an entity or a relationship in a traditional database, its update may imply a deletion of an “old”entity or relationship occurrence followed by an insertion of a “new” entity orrelationship occurrence. Therefore, sub-SICs related to deletion and insertionshould also be asserted on the update of a primary key.Note that the algorithms have limited capability to identify avoidable operationchecking for SICs involving aggregate function. In addition, it is possible to incorporate more consistency and nonredundancy rules for SIC types into the algorithmsto reduce the sub-SICs further. However, it can be claimed that the current algorithms correctly find the relevant object and operation types for a single SIC.3. Proper Precondition and Predicate Components. Basically, the algorithmsjust rewrite the original SIC so that it becomes more precise and suitable for eachChapter 7. A Proposed Database Design Aid for Eliciting SICs 160sub-SIC. The following are some brief ideas.• Subscripts and some special predicates (e.g., rship_occ_part) are used to makea sub-SIC precise.• If a sub-SIC is for an attribute on update, in general, its predicate componentcontains only an assertion on the attribute requiring it to have some specialvalue after update.• If a sub-SIC is for an entity or relationship and it does not only assert thevalues of its attributes, in general, the precondition component is importantto identify what is the entity or relationship occurrence to be checked.• If the constraint violation is because an entity occurrence is deleted, in general,because of the key uniqueness, it is impossible to find another occurrence ofthe same type to avoid the constraint violation (unless the constraint is oneasserting aggregate properties). However, it is possible to “reconnect” anoccurrence of one entity type with another occurrence of the other entitytype. That is, we might find another relationship occurrence to satisfy theSIC unless both sides of the relationship are bound.• Constraints that assert explicitly the equality of some key attributes of twoentity types, e.g.. SICs implied by an ID Dependency, is_a or componenLofrelationships, should be dealt with specially. Because the key attributes (e.g.,SIN) of the two entity occurrences (e.g., Person and Employee) in fact referencethe same physical entity occurrence (e.g., the same physical Person), the linkshould be permanent.4. Suggested Violation Actions. As stated, because of the verification problem,the SIC elicitation subsystem could only suggest some restricted and predefinedChapter 7. A Proposed Database Design Aid for Eliciting SICs 161violation actions appropriate to the certainty of the SIC. A propagation actioncan be automatically suggested by the SIC elicitation subsystem only in the casethat there are two simple assertions on two objects in the SIC. In that case, asimple propagation action is suggested to regain a consistent database state. If thedatabase designer has set a certainty threshold (say 75%), any uncertain SIC withcertainty less than the threshold will be given a “warning” as its violation actionautomatically.5. Given SIC Names. A SIC name is given automatically after applying the abovealgorithms and using some predefined SIC types.Examples Some examples are included in Appendix I for illustration.General SIC Information After decomposing a general SIC, we may still need tolink its decomposed sub-SICs together since the verification for consistency and nonredundancy may still be needed, and furthermore, a general SIC may later be deleted.In addition, for documentation purposes, it may be desirable to keep the original general SIC information, which contains only the certainty factor (if not default “certain”),precondition and predicate components. Therefore, the SIC elicitation subsystem wouldneed to link a general SIC and its decomposed sub-SICs together.7.3.1 Representation of Generic SICsThe principles of the reformulation and decomposition algorithms can be similarly appliedto produce the generic sub-SICs for the generic object types Entity*, Relationship*,Entity*.Attribute*, and Relationship* .Attribute*. However, the SIC representation forChapter 7. A Proposed Database Design Aid for Eliciting SICs 162the generic types is complicated because the data dictionary retrieval and manipulation,and the pre-coudition contexts must be explicitly stated. The input information andmanipulation logical predicates used in generic SICs are listed in Appendix B. This subsection introduces the representation of generic SICs for some common SIC types anddiscusses the related issues. In all of the following cases, the principle of SIC specializationallows us to omit the explicit SIC representations for specific object types if we have therepresentations for generic object types.Domain Constraint Representation SIC Aggregation and SpecializationBy the principle of SIC aggregation, domain constraints on insertion of an entity/relationshipoccurrence can be simulated by applying the domain constraints of all its attributes.There are three separate sub-SICs for restricting not-null, uniqueness, and nonvolatility, respectively, and another sub-SIC dealing with data-type, format, and value. Forrestricting the insertion of an entity, we need one sub-SIC, which would call the aboverelated sub-SICs (except for the nonvolatility) for asserting attribute domain constraints.In total, the five sub-SICs in Appendix J are sufficient to represent all domain constraintsregardless of the number of entity types and their attributes in a database. The numberof SICs that must be specified explicitly is dramatically reduced through using the SICabstraction concepts. Similarly, five sub-SICs are needed for relationship types and theirattributes in a database.Primary Key Constraint Representation— SIC Association and Specialization A number of SICs must be specified to capture the possible inconsistent states of adatabase when updating a primary key. The SIC association and specialization conceptswill be used to reduce the number of explicit SICs required. During the database designChapter 7. A Proposed Database Design Aid for Eliciting SICs 163phase, if a SIC is identified for the insertion (or deletion) of a relationship and thereshould be a sub-SIC for checking the update of a part of its key, its SIC name is addedinto an associated SIC.NameSet of the affected key attribute. The SICNameSeis ofkey attributes of a relationship type may be different. However, in the case of a SIC thatis relevant to the insertion (or deletion) of an entity type, all affected key attributes ofthe entity have the same SICiVame_Set.For example, suppose that we have a SIC such as “if an employee is assigned to aproject, he (she) must participate in an insurance plan”, and assume the key of the employee and project are Empld and Projld, respectively. The name of the sub-SIC restricting an insertion of the relationship Assigned_to would be inserted into the SIC_Name_Setof a special logical predicate (associated_PKSICs_1) for the key attribute Assigned_to.Empld.Note that since the non-sharing entities are not concerned, the update of another keyattribute Assigned_to.Projld of the relationship is not restricted. Similarly, the name ofanother sub-SIC restricting a deletion of the relationship Insure, is inserted into AssociatedPKSICsD for Insure.Empld.Two “set-SICs” in Appendix J are needed for the key attributes of Relationship*.(Similarly, there are two “set-SICs” for key attributes of Entity*.) By the principle ofSIC association, the enforcement of such a “set-SIC” is the same as the enforcement ofall of its “member-SICs”.Other SIC Representation SIC Specialization A number of other SIC types98can be similarly represented. Usually, these SICs types can be described in the “closedform” of predicates, that is, without further arbitrary restrictions. If a DBMS finds98These SIC types include: Composite_Attribute_Unique Constraint, Absolute MaximumCardinality Constraint of an Entity Type, “traditional” relative cardinality constraints (i.e., Totality Constraint, and Relative Maximum Cardinality Constraint), Incidence Constraint,Chapter 7. A Proposed Database Design Aid for Eliciting SIC’s 164the related information on a specific entity, relationship or attribute type, e.g., symmetric(Married_to) indicating its symmetry, the related sub-SICs would be automaticallyinherited from the generic types.The number of such generic SICs stored in the SIC maintenance subsystem woulddepend on the complexity of the application at hand. It is possible that in a specialdatabase application all SICs can be represented as generic SICs in advance so that theactual invocation of the reformulation and decomposition algorithms might be almosttotally avoided during the database design consultation session.Some examples of the above generic SICs are included in Appendix J.Some Improvements The generic SICs in Appendix J have two problems, which canbe solved as follows.• Uniform Certainty Factors It is assumed that the generic SICs are all “certain”by default If we consider the possibilities that a few SIC types for some specificobject types may be “uncertain”, the certainty factors need to be stored in theinput predicates for these specific object types• Uniform Violation Actions. Because of the SIC inheritance principle, specific object types would inherit the same violation actions from the generic objecttypes. The advantage of this is that the elicitation sub-system only needs to ask theSymmetry Property of a Relationship, Transitivity Property of a Relationship, Subset_Relationship SIC, Relationships_Union Special SIC, Exclusive Relationship SIC, Exclusive Occurrence SIC, Not_And_Relationships SIC, Either_Existence_Relationships SIC, Relationship_Before_Relationship SIC, Relationship_Not_Before_Relationship SIC, Relationships_Join SIC, Relationships_Depends_onLoopN_Relationships SIC, Weak_Entity SIC,ID_Dependency_Relationship SIC, Weak_Relationship SIC, Completeness_Mapping SIC,Relationships_Intersection Special SIC, Relationship_Trigger_Relationship SIC, Exclusionbetween Entity Types, Entities_Intersection Special SIC, and Entities_Union Special SIC.Chapter 7. A Proposed Database Design Aid for Eliciting SIC’s7.4 Transforming SICs to Relational FormThe final result of the consultation is a logical database specification implemented inthe relational model. These SICs may be later enforced by an integrity maintenaicesubsystem in a relational database system. Therefore, the incorporated SICs referencingentities and relationships in the E-R-SIC model must be automatically transformed by theelicitation subsystem into SICs referencing the corresponding relations in the relationalmodel.Relationship Representation in the Relational Model When constructing the relations in the relational model by using the E-R descriptions, each entity type is naturallyrepresented by a separate relation. However, there are two alternatives for representinga binary relationship type in the relational model. Some researchers favour always representing a relationship type by using a separate relation rather than a foreign key because165database designer the violation action once for each of these common SIC types.However, this rigidity may not be suitable for all applications. An improvementwould be to store a violation action “list” in the input predicate for each relatedspecific object type. The number of elements in the list corresponds to the numberof its sub-SICs. Each element is a violation action for each sub-SIC. Then the subSICs of a specific object type would not inherit the violation action from its genericobject type, but have its own “custom-made” violation action. Considering alsothe above problem of uniform certainty factors, we may allow both the certaintyfactor and the violation action in generic SICs to be variables. They would bebound with actual values when a specific object type satisfies the preconditions.Chapter 7. A Proposed Database Design Aid for Eliciting SICs 166they argue that foreign keys decrease the adaptability of database designs ([Wilmot,1984]) and “[E-R consistent relational schema] assures a greater adaptability to changesnot concerning the structure of the modeled information, such as the cardinality of relationships” ([Makowsky, et al., 1986, p. 321]). In addition, by adopting the separaterelation approach we would have the same SIC representations in both the E-R-SICmodel and the relational model.However, because of access efficiency considerations, the foreign key approach stillprevails in practice. This research allows the foreign key approach to representing arelationship type having (1,1) relative cardinalities. The cardinalities stability criterionis adopted if there is a tie to decide which entity key will become the foreign key. Supposethat a relationship type R relates to entity types E and F. If the relationship type R hasany attribute, it is represented by a separate relation in the relational model. Otherwise,its representation is based on the following rules99:1. Suppose only one of the involved entity types has (1,1) cardirialities. If E has the(1,1) cardinalities, add the primary key of F to the E relation as a foreign key.Otherwise, add the primary key of E to the F relation as a foreign key.2. Suppose both entity types, E and F, have (1,1) cardinalities.(a) The decision is based on the stability of the cardinalities. If the (1,1) cardinalities of F are more likely to change in the future, add the primary key ofF to the E relation as a foreign key. Otherwise, add the primary key of E tothe F relation as a foreign key.99There is an exception. If the relationship is a relationship via which an ID Dependency happens,an is_a or component_of relationship, the entity type with the (1,1) relative cardinalities already has theprimary key of the other entity type as its candidate key attributes. These key attributes can play boththe role of (part of) a candidate key and a foreign key. We need not add the key attributes of the otherentity type to it.Chapter 7. A Proposed Database Design Aid for Eliciting SICs 167(b) If the above cannot be decided, the choice is then based on the relative frequencies of “F of E’ or “E of F” type queries. If the query “F of E” wouldbe encountered more often, add the primary key of F to the E relation as aforeign key. Otherwise, add the primary key of F to the F relation as a foreignkey.(c) If neither of the above can be decided, the choice depends on the last resort— how the database designer specifies the relationship, “F R F” or “F R F”.It is based on the heuristic that future queries may be similar to the way thedatabase designer states the relationship although he (she) may not admit it.If originally the relationship is expressed by the database designer as “F RF”, add the primary key of F to the F relation as a foreign key. Otherwise,add the primary key of E to the F relation as a foreign key.3. In all other cases, construct a separate relation for the relationship.SIC Representation in the Relational Model Although we allow the foreign keyapproach to representing relationships in the special cases of (1,1) cardinalities, this kindof representation is only for query and processing efficiency. The semantics should bethe same as in the E-R-SIC model. Therefore, the entity and relationship descriptionswould still be stored in the data dictionary of a relational database. There would also bethe same classification of SIC types. However, the adoption of the foreign key approachwould cause some semantic confusion because now a relation could represent an “entity”,or “relationship”, or even both. In this research, if a relation represents both an entitytype and one or more relationship types, it is deemed a special entity type with someinformation kept in the data dictionary. The associated special information specifies therelationship types hidden in it (by adding the key of the other entity type(s) to it asChapter 7. A Proposed Database Design Aid for Eliciting SICs 168a foreign key) and the relative cardinalities of the other entity type(s) in this hiddenrelationship type’°°.The (1,1) cardinalities of one entity type in a relationship type would cause some SICsto be redundant or inconsistent. In addition, some SICs in the E-R representation neednot be transformed when constructing relations because either they do not mention anexplicit relationship or the relationship representation is not relevant to them’°1. TheseSIC’s representation in the relational model would be the same as corresponding ones inthe E-R-SIC model.The Algorithms The general transformation algorithms for transforming SICs inthe E-R-SIC model into SICs referencing the corresponding relations in the relationalmodel are described in Appendix K. It is relatively simple to transform the relationshipname and the manipulation predicates of its participating entities102. However, in addition to the primary key update problem, in the relational model we have a problemof the update of foreign key attributes owing to the well-known semantic overload issue.The update of any attributes of a foreign key would imply the deletion of an old relationship occurrence and the insertion of a new one. The SIC association and specializationconcepts can be applied here too.‘°°That is, two related “relationship_participant” predicates would be deleted and a new “relationship_hidden_entity” predicate will created by the SIC elicitation subsystem.‘°‘Some of these SIC types are: all single entity attribute SICs, all entity SICs, SICs on a singlerelationship’s attributes, Entitieslntersection Special SICs, Entities_Union Special SICs. Inaddition, if a relationship is declared to be complete, usually the relative cardinalities of any involvedentity type should not be (1,1). Otherwise, the other side of the entity type can only have exactly oneoccurrence, which is not practical. Therefore, the Completeness_Mapping SIC usually need not betransformed.‘°2By taking the foreign key approach to representing relationships, some SICs would become redundant. For example, we do not need a SIC requiring the relative maximum cardinality to be 1 for the“entity” relation in which the relationship is now hidden because the foreign key is single-valued. Thealgorithms would also remove them.Chapter 7. A Proposed Database Design Aid for Eliciting SICs 169Examples corresponding to the ones in the E-R-SIC model are included in Appendix Lfor illustration.Generic SICs It is also desirable to apply the SIC abstraction concept here torepresent generic SICs. In principle, we can apply the above transformation algorithms totransform the generic SICs in the E-R-SIC model into corresponding ones in the relationalmodel although the references to some new information from the data dictionary maybe needed. However, the foreign key approach would need additional sets of predefinedsub-SICs. Because now relationship representation is not uniform, we need to considereach of the three possible relation representations for each relationship. If a sub-SICinvolves two relationships, it would have nine possible combinations of representations;if it involves three relationships, the possible combinations would become twenty-sevenalthough some impossible cases can be excluded103.Such an analysis and preparation of pre-defined generic SICs could be carried outfor SIC types involving only a few relationships. However, it becomes impossible whena SIC type may involve an unknown number of relationships (e.g., the intersection ofrelationships). In those SIC types, we would be forced to pre-define generic SICs onlyfor the simple cases having at most three relationships involved.Some of these SICs representations in the relational model are included in Appendix Mfor illustration.103For example, suppose that we have an Exclusive_Relationship SIC, such as RXIIRY relative toE where RX type connects the entity types E with F, and BY connects the entity types E with G. Wewould know that E cannot have (1,1) cardinalities in either RX or RY. So we need not consider thoseimpossible representations for relationships RX and BY— adding the primary key of F or G as theforeign entity of E. However, F may have (1,1) cardinalities iii RX; and G may have (1,1) cardinalitiesin RY too. So we would need two more sets of representations for the predefined sub-SICs in additionto the original one in the E-R-SIC model.Chapter 8Conclusions and Further Research8.1 Conclusions and ContributionsConclusions This research has presented two models. The E-R-SIC model is a comprehensive modelling tool for helping the database designer systematically incorporatesemantic integrity constraints that are relevant to attributes, entities, and relationshipsin a database. The SIC Representation model is used to represent precisely the featuresof these SICs. The declarative and operational semantics of each SIC are specified. Byusing these two models, the data integrity semantic constraints on the allowable statesand state transitions can be completely modelled and properly represented in a databaseschema. The focus of database designers will be changed from traditionally emphasizingonly structure, functional dependencies, efficiency, etc. to describing data semantics.In a database, the number of explicit SICs specified using the Representation modelwould not be huge because of the application of SIC abstractions and the representationof generic SICs. These SICs could be efficiently enforced because of the characteristicsof the SIC Representation model.Both the Representation model and the E-R-SIC model are application-domain independent. They are suitable for implementation as part of an automated database design170Chapter 8. Conclusions and Further Research 171system. Conceptually, this research proposes a SIC elicitation subsystem. The SIC elicitation subsystem would detect where general SICs may he needed and prompt a databasedesigner to confirm or provide them. The subsystem would automatically decide for whatdata objects and on what database operations these SICs should be enforced, reformulate them (decompose them into sub-SICs if necessary) in terms of the Representationmodel, and suggest some violation actions. The subsystem would have some built-in consistency and nonredundancy rules for different SIC types, would verify the consistencyand nonredundancy of SICs to some extent and transform them into corresponding onesin the relational model. This kind of automated database design system would providemore assistance to a database designer in modelling the data semantics.Contributions Most previous SIC research concentrates on classifying and efficientlyenforcing a few types of SICs. Current languages do not represent all features of aSIC precisely. Existing automated database aids do not provide adequate facilities forincorporating SICs into a design. This research contributes to our understanding ofdatabase semantic integrity. On the theoretical side, there are the following contributions:1. The SIC Representation model is defined to represent precisely the necessary features for specifying declarative and operational semantics of a SIC.2. The E-R-SIC model is proposed to incorporate dynamic and static SICs in adatabase schema rather than in transactions and programs.3. Algorithms are provided to reformulate and decompose SICs elicited using the ER-SIC model, and to transform them into corresponding ones in the relational datamodel.chapter 8. Conclusions and Further Research 172On the practical side, there are the following contributions:1. A SIC elicitation szzbsystem has been proposed to help the database designer designSICs in addition to the structure part of a database schema.2. This research provides a foundation for overcoming the well-known problem ofrepresenting data integrity semantics in current relational database systems. Theresulting database would have the advantages of embedded SICs as described inChapter 1. The SIC representation would facilitate the efficient enforcement.3. Although not empirically tested, we may conjecture that the information systemconceptual design would more completely and systematically include data and information system semantics. The database designer would be able to model moredata integrity semantics, thereby reducing the need for this information to be included in application programs. This research provides a starting point for futureempirical tests.8.2 Future Research Extensions to this DissertationThis dissertation is part of a research program at the University of British Columbia toformalize the database design process and make databases more intelligeilt. There aremany areas for further research to extend this dissertation. Some of these are suggestedbelow.• Non-binary Relationships. Currently, the E-R-SIC model assumes all relationship types to be binary. It is assumed that a database designer would know howChapter 8. Conclusions and Further Research 173to use binary relationships to simulate non-binary ones (including recursive relationships, ternary relationships and relationships of higher degree). This restrictioncould be relaxed to allow the database designer to model non-binary relationshipsdirectly.• Development and Implementation of Efficient Algorithms to Assure theConsistency, Nonredundancy, and Even Optimization of SICs. To provethe consistency, nonredundancy and optimization of arbitrary SIC statements isstill an open issue. In addition, to design and implement those verification algorithms needs more research. These are challenges for researchers in the fields ofmanagement information system, computer science, and mathematics.• Integration of SICs Elicited from Multiple Sources. This dissertation isbased on a simplified assumption of a single database designer. View integration orsynthesis is an important topic in database design research. Previous research (e.g.,[Wagner, 1989]) have addressed this issue without considering SIC specifications. Itis also possible that the SICs need to be obtained from several database designers.Future researchers may address this complicated issue.• Programming Implementation. Future research would need to implement theproposed SIC elicitation subsystem. Then the produced SIC elicitation subsystemmust be merged with a system to construct the structure of a schema (e.g., theView Creation System [Storey, 1988]) to provide the complete database schemadesign assistance to a database designer. It is also desirable to incorporate it witha view integration system (e.g., AVIS [Wagner, 1989]) and a future SIC integrationsystem in the case that multiple sources for a database specification are needed. Acomplete automated database design system needs to be implemented.Chapter 8. Conclusions and Further Research 174• Add-on Knowledge, Capabilities and Features of the SIC Elicitation Subsystem. Based on the E-R-SIC model proposed in this research, future researchersmay add some common-sense, domain-dependent, and organization-dependent knowledge to the proposed automated database design system to make it more intelligentand provide more efficient and effective assistance for a database designer. Ideally,the system may have a learning capability so that it can accumulate knowledgeeach time it is used. It is also desirable to provide some natural language facilitiesand graphical interface so that a database designer can more easily describe entitiesand relationships, and directly design an E-R database on the screen.• Transaction Modelling. In order to capture transaction-driven semantics andcheck the SICs related to transactions more efficiently, specifications of transactions,i.e.. user-predefined operations, need to he specified. Future research may proposealgorithms to transform SICs identified by the E-R-SIC model to the pre-conditionsand post-conditions of transactions.• Application of SICs in an Expert Database System. As stated earlier, SICscan be used to facilitate intelligent query evaluation and provide deductive capabilities. Further research may explore how to include and apply SICs representedby the Representation model in an expert database system.• Design and Management of an Integrity Maintenance Subsystem in theRelational Database. Very few kinds of SICs are enforced in commercial relational database systems. One reason for this is probably concern for efficiency.Based on SIC specifications identified in this research, future researchers may design an integrity maintenance subsystem to enforce SICs efficiently in a relationaldatabase. The management of SICs, i.e., the insertion or deletion of SICs, after the database is populated should be carefully taken into consideration. SomeChapter 8. Conclusions and Further Research 175administrative procedures may need to be invented to handle the “change of SICs”.• Empirical Research for Testing “Usefulness”. Future empirical researchersmay test two kinds of usefulness. The first is the usefulness of using database embedded SICs versus the traditional approach of enforcing integrity via applicationsoftware. The second is the usefulness of the design approach adopting an automated database design system compared with the manual design approach (withoutany assistance of an automated design system) to incorporate the necessary SICsfor embedding in a database. The automated database design system proposed inthis dissertation can be taken as a tool.Bibliography[1] Abiteboul, S., and Vianu, \7, “Transactions and Integrity Constraints”, PTOC. ofthe Second ACM SIGA CT-SIGMOD Symposium of Principles of Database Systems,Portland, Oregon, May 1985, pp. 193-204.[2] Aho, A. V., Hopcroft, J. E., and Ulirnan, J. D., The Design and Analysis of Computer Algorithms, Addison-Wesley, Reading, Mass., 1974.[3] Arisawa, H., and Miura, T., “On the Properties of Extended Inclusion Dependencies”, Proc. of the Twelfth International Conference on Very Large Data Base,Kyoto, August 1986, pp. 449-457.[4] Azar, N., and Pichat, E., “Translation of an Extended Entity-Relationship Modelinto the Universal Relation with Inclusion Formalism”, in Entity-Relationship (theFifth International Conference on E-R Approach, France, 1986) edited by S. Spaccapietra, Elsevier Science Publishers B.V. (North-Holland), 1987, pp. 253-268.[5] Benci, E., Bodart, F., Bogaert, H., and Cabanes, A., “Concepts for the Design of aConceptual Schema”, in Modelling in Data Base Management Systems, edited byG.M. Nijssen, North-Holland Publishing Co., 1976, pp. 181-200.[6] Bernstein, P. A., Blaustein, B. T., and Clarke, E. M., “Fast Maintenance of Semantic Integrity Assertions Using Redundant Aggregate Data”, Proc. of the SixthInternational Conference on Very Large Data Base, 1980, pp. 126-136.[7] Bertino, E., and Apuzzo, D., “Integrity Aspects in Data Base Management Systems”, Proc. Trends é4 Applications, Gaithereshurg, Maryland, 1984, pp. 43-52.[8] Biller, H., and Neuhold, E. J., “Semantics of Data Bases: The Semantics of dataModels”, Information System, Vol. 1 No. 3, 1978, pp. 273-292.[9] Borgida, A., “Language Features for Flexible Handling of Exceptions in InformationSystems”, ACM Transactions on Database Systems, Vol. 10, No. 4, December 1985,pp. 565-603.[10] Bouzeghoub, M., and Gardarin, G., “The Design of An Expert System for DatabaseDesign” in New Applications of Data Bases, edited by G. Gardarin and E. Gelenbe,1984, pp. 203-223.176Bibliography 177[11] Bouzeghoub, M., Gardarin, G., and Metais, E., “Database Design Tools: An ExpertSystem Approach:”, Proc. of the Eleventh International Conference on Very LargeData Base, Stockholm, August 1985, pp. 82-95.[12] Bouzeghoub, M., and Metais, E., “SECSI: An Expert System Approach forDatabase Design”, in Information Processing 86, edited by H.J. Kugler, ElsevierScience Publishers B.V. (North-Holland), 1986, pp. 251-257.[13] Bracchi, G., Furtado, A., and Pelagatti, G., “Constraints Specification in Evolutionary Data Base Design”, in Formal Models and Practical Tools for InformationSystems Design, edited by H.-J. Schneider, North-Holland Publishing Co., 1979,pp. 149-165.[14] Brady, L. I., and Dompney, C. N. G., “Dynamics of Database Semantic Integrityor Managing the Meaning of Data”, Proc. of the Australian Computer Conference,Sydney, November 1984, pp. 82-96.[15] Bragger, R. P., Dudler, A., Rebsamen, J., and Zehnder, C. A., “Gambit: An Interactive Database Design Tool for Data Structures, Integrity Constraints and Transactions”, International Conference on Data Engineering, Los Angeles, California,August 1984, pp. 399-407.[16] Brodie, M. L., Specification and Verification of Database Semantic Integrity, Ph.D.Thesis, University of Toronto, 1978 (also as Technical Report CSRG-91, April1978).[17] Brodie, M. L., “Association: A Database Abstraction for Semantic Modelling”, inEntity-Relationship Approach to Information Modelling and Analysis (the SecondInternational Conference on E-R Approach, 1981), edited by P. P. Chen, ElsevierScience B.V. (North-Holland), 1983, pp. 577-602.[18] Brodie, M. L., “On the Development of Data Models”, in On Conceptual Modelling,edited by M.L. Brodie, J. Mylopoulus and J.W. Schmidt, Spring-Verlag, 1984, pp.19-47.[19] Brodie, M. L., “Database Management: A Survey”, in On Knowledge-Base Management Systems, edited by M.L. Brodie and J. Mylopoulus, Springer-Verlag, 1986,pp. 201-218.[20] Brodie, M. L., and Manola, F., “Database Management: A Survey”, in Foundations of Knowledge Base Management: Contributions from Logic, Databases,and Artificial Intelligence Applications, edited by J. W. Schmidt and C. Thanos,Springer-Verlag, 1989, pp. 205-234.Bibliography 178[21] Brodie, M. L., and Ridjanovic, D., “On the Design and Specification of DatabaseTransactions”, in On Conceptual Modelling, Edited by M.L. Brodie, J. Mylopoulos,and J. W. Schmidt, Spring-Verlag, 1984, pp. 277-306.[22] Bry, F., and Mauthey, R., “Checking Consistency of Database Constraints: a Logical Basis”, Proc. of the Twelfth International Conference on Very Large Data Base,Kyoto, August 1986, pp. 13-20.[23] Bubenko, J. A., “Information Modeling in the Context of System Development”,Information Processing 80, edited by S. H. Lavington, North-Holland PublishingCo., 1980, pp. 395-411.[24] Carsnell, J. L., and Navathe, B. “SA-ER: A Methodology that Links Structured Analysis and Entity-Relationship Modelling for Database Design”, in Entity-Relationship (the Fifth International Conference on E-R Approach, France, 1986)edited by S. Spaccapietra, Elsevier Science Publishers B.V. (North-Holland), 1987,pp. 381-397.[25] Casanova, M. A., and Furtado, A. L., “On the Description of Database TransitionConstraints Using Temporal Languages”, in Advances in Database Theory, Vol. 2,edited by H. Gallaire, J. Minker and J.M. Nicolas, Plenum Press, New York, 1984,pp. 211-236.[26] Casanova, M. A., and Tucherman, L., “Enforcing Inclusion Dependencies and Referential Integrity”, Proc. of the Fourteenth International Conference on Very LargeData Base, California, 1988, pp. 38-49.[27] Casanova, M. A., and Vidal, V. M. P., “Towards a Sound View Integration”, Proc.of the Second ACM SIGACT-SIGMOD Symposium of Principles of Database Systems, Atlanta, George, March 1983, pp. 36-46.[28] Cauvet, C., Proix, C., and Rolland C., “A Knowledge Base for an InformationSystem Design Tool”, in Methodologies for Intelligent Systems, edited by Z.W.Ras, and M. Zemankova, Elsevier Science Publishing Co. Inc., 1987, pp. 56-63.[29] Ceri, S., and Widom, J., “Deriving Production Rules for Constraint Maintenance”,IBM Research Report, RJ 7348, March 1, 1990. (A short version appears in Proc.of the 16th VLDB Conference, Australia, 1990, pp. 566-577).[30] Chakravarthy, U. S., Minker, J., and Grant, J., “Semantic Query Optimization:Additional Constraints and Control Strategies”, in Expert Database Systems, editedby L. Kerschberg, Benjamin/Cummings Pub. Co. Inc., 1987, pp. 345-377.Bibliography 179[31] Chen, P. P.. “The Entity-Relationship Model— Toward a Unified View of Data”,ACM Transactions on Database Systems, Vol. 1 No. 1, December 1976, pp. 9-36.[32] Chen, P. P., “Database Design Based on Entity and Relationship”, in Principles ofDatabase Design, Vol. 1: Logical Organization, edited by S.B. Yao, Prentice-Hall,1985, pp. 174-210.[33] Choobineh, J., “Form Driven Conceptual Data Modelling”, Ph.D. Dissertation,Dept. Management Information Systems, University of Arizona, 1985.[34] Choobineh, J., Mannino, M. V., Nunamaker, J.F., and Konsynski, B.R., “An Expert Database Design System Based on Analysis of Forms”, IEEE Transaction onSoftware Engineering, Vol. 14, No. 2, February 1988, pp. 242-253.[35] Codd, E. F., “Extending the Database Relational Model to Capture More Meaning”, ACM Transactions on Database Systems, Vol. 4, No. 4, December 1979, pp.397-434.[36] Cohen, J., “Constraint Logic Programming Languages”, Communications of theAGM, Vol. 33, No. 7, July 1990, pp. 52-68.[37] Colmerauer, A., “An Introduction to Prolog III”, Communications of the ACM,Vol. 33, No. 7, July 1990, pp. 69-90.[38] Cosmadakis, S. C., and Kanellakis, P. C., “Equation Theories and Database Constraints”, Proc. of the 17th Annual ACM symposium on Theory of Computing,Rhode Island, May 1985, pp. 273-284.[39] Dampney, C. N. G., “Specifying a Semantically Adequate Structure for informationSystems and Databases”, in Entity-Relationship Approach (the Sixth InternationalConference on E-R Approach, New York , Nov. 1987), edited by S. T. March,Elsevier Science Publishers, B.V. (North-Holland), 1988, pp. 165-188.[40] Date, C. J., An Introduction to Database System, Vol. II, Addison-Wesley Publishing Co., Reading, Massachusetts, 1983.[4].] Date, C. J., A Guide to the SQL Standard, Addison-Wesley Publishing Co., Reading, Massachusetts, 1987.[42] Davis, J. P., and Bonnell, R. D., “Modelling Semantic Constraints with Logic inthe EARL Data Model”, The Fifth International Conference on Data Engineering,Los Angeles, California, February 1989, pp. 226-233.Bibliography 180[43] de Castilho, J. M. V., Casanova, M. A., and Purtado, A. L., “A Temporal Framework for Database Specifications”, Proc. the Eighth International Conference onVery Large Data Base, Mexico City, Mexico, 1982. pp. 280-291.[44] Delobel, C., “Normalization and Hierarchical Dependencies in the Relational DataModel”, ACM Transactions on Database Systems, Vol. 3, No. 3, September 1978,pp. 201-222.[45] Dogac, A., and Chen P. P., “Entity-Relationship Model in the ANSI/SPARCFramework”, in Entity-Relationship Approach to Information Modelling and Analysis (the second International Conference on E-R Approach, 1981), edited by P.P.Chen, Elsevier Science B.V. (North-Holland), 1983, pp. 357-374.[46] Dogac, A., Chen P. P. and Erol, N., “the Design and Implementation of an IntegritySubsystem for the Relational DBMS RAP”, in the Fourth Conference on Entity-Relationship Approach, Chicago, Illinois, October 1985, pp. 295-302.[47] dos Santos, C. S., Neuhold, E. J., and Purtado, A. L., “A Data Type Approach tothe Entity-Relationship Model”, in Entity-Relationship Approach to System Analysis and Design (the First International Conference on E-R Approach), edited byP. P. Chen, North-Holland Publishing Co., 1980, Pp. 103-119.[48] Ehrich, H. D., Lipeck, U. W., and Gogolla, M., “Specification, Semantics, andEnforcement of Dynamic Database Constraints”, Proc. of the Tenth InternationalConference on Very Large Data Base, Singapore, August 1984, pp. 301-308.[49] Eswarn, K. P., and Chamberlin, D. D., “Functional Specifications of a Subsystemfor Data Base Integrity”, Proc. the Second International Conference on Very LargeData Base, Framingham, Massachusetts, September 1975, pp. 48-68.[50] Etzion, 0., “PARDES — a Model for Supporting Derivation Closure”, Working Paper, Computer and Information Sciences Department, Temple University, Philadelphia, Pennsylvania, 1989.[51] Fernandez, E. B., Summers, R. C., and Wood, C., Database Security and Integrity,Addison-Wesley Publishing Co., Reading, Massachusetts, 1981.[52] Fleming, C. C., and Halle, B. V.,Handbook of Relational Database Design, AddisonWesley Publishing Co., Reading, Massachusetts, 1989.[53] Fong, E., and Kimbleton, S. R., “Database Semantic Iitegrity for a Network DataVIanger”, AFIPS Proceedings of National Computer Conference, (Vol. 49), California, May 1980, pp. 261-268.Bibliography 181[54] Frost, R. A. (ed).,Database Management Systems, Granada Publishing Ltd., London, 1984.[55] Furtado, A. L., dos Santos, C. S., and de Castilho, J. M. V., “Dynamic Modellingof a Simple Existence Constraint”, Information Systems, Vol. 6, 1981, pp. 73-81.[56] Furtado, A. L., and Neuhold, E. J., Formal Techniques for Data Bases Design,Springer-Verlag, Berlin, 1986.[57] Furtado, A. L., Casanova, M. A., and Tucherman, L., “The CHRIS CONSULTANT”, in Entity-Relationship Approach (the Sixth International Conference onE-R Approach, New York , Nov. 1987), edited by S. T. March, Elsevier SciencePublishers, B.V. (North-Holland), 1988, pp. 515-532.[58] Gardarin, G., and Melkanoff, M., “Proving Consistency of Database Transactions”,Proc. of the Fifth International Conference on Very Large Data Base, Brazil, October 1979, pp. 291-298.[59] Goldstein, R. C., Database: Technology and Management, John Wiley & Sons,1985.[60] Goldstein, R. C., and Storey, V. C., “Data Abstraction: The Impact on DatabaseManagement”, Working Paper, Faculty of Commerce and Business Administration,The University of British Columbia, October 1990.[61] Goldstein, R. C., and Wagner, C., Manual of Instruction Database Software for Microcomputers, Faculty of Commerce and Business Administration, The Universityof British Columbia, December 1988.[62] Hammer, M. M., and McLeod, D. J., “Semantic Integrity in a Relational DataBase System”, Proc. the Second International Conference on Very Large Data Base,Framingham, Massachusetts, September 1975, pp. 25-47.[63] Hammer, M. M., and McLeod, D. J., “A Framework for Data Base Semantic Integrity”, Proc. the Second International Conference on Software Engineering, SanFranciso, California, October 1976, pp. 498-504.[64] Hammer, M. M., and McLeod, D. J., “Database Description with SDM: A SemanticDatabase Model”, ACM Transaction on Database Systems, Vol. 6, No. 3, September1981, pp. 351-386.[65] Hentenryck, P. V., Constraint Satisfaction in Logic Programming, MIT Press, Massachusetts, 1989.Bibliography 182[66] Hentenryck, P. V., “Incremental Constraint Satisfaction in Logic Programming”,in Logic Programming: Proc. of the Seventh International Conference, edited byD. H. D. Warren and P. Szeredi, 1990, PP. 189-202.[67] Henschen, 1. J., McCune, W. W., and Naqvi, S. A., “Compiling Constraint-Checking Programs from First-Order Formulas”, in Advances in Database Theory,Vol. , edited by H. Gallaire, J. Minker and J. M. Nicolas, Plenum Press, NewYork, 1984, pp. 145-169.[68] Heuser, C. A.. and Richter, G.. “On the Relationship between Conceptual Schemaand Integrity Constraints on Databases”, in Database Semantics (DS-1), editedby T. B. Steel, Jr. and R. Meersman, Elsevier Science Publishers, B.V. (NorthHolland), 1986, pp. 27-39.[69] Hillier, F. S., and Lieberman, G. J., Introduction to Operations Research, the fourthedition, Holden-Day, 1986.[70] Ho, H. C., “Integrity Control in a Relational Database”, Technical ReportS.O.C.S.828, School of Computer Science, McGill University, Montreal, Canada,March 1982.[71] Hohenstein, U. and Hülsmann, K., “A Language for Specifying Static and DynamicIntegrity Constraints”, in the Tenth International Conference on E-R Approach,San Mateo, California, October, 1991, (proceedings edited by T.J. Teorey), pp.389-416.[72] Holsapple, C., Shen, S., and Whinston, A., “A Consulting System for DatabaseDesign”, Information System, Vol. 7, No. 3., 1982, pp. 281-296.[73] Hsu, A., and Imielinski, T., “Integrity Checking for Multiple Updates”, Proc. ofACM SIGMOD International Management of Data, Austin, Texas, May 1985, pp.152- 167.[74] Hsu, C., Perry, A., Bouziane, M. and Cheung, W., “TSER: A Data ModellingSystem using the Two-Stage Entity-Relationship Approach” ,in Entity-RelationshipApproach (the Sixth International Conference on E-R Approach, New York, November 1987), edited by S. T. March, Elsevier Science Publishers, B.V. (NorthHolland), 1988, pp. 497-514.[75] Hull, R., and King, R., “Semantic Database Modelling: Survey, Applications, andResearch Issues”, ACM Computing Surveys, Vol. 19, No. 3, September 1987, pp.201-260.Bibliography 183[76] Jaffar, J., and Lassez, J-L., “Constraint Logic Programming”, 14th Annual SCMSymposium on Principles of Programming Languages, Munich, West Germany, January 1987, PP. 111-119.[77] Jaffar, J., and Michaylov, S., “Methodology and Implementation of a CLP system”,Proc. of 4th International Conference on Logic Programming, edited by J-L. Lassez,MIT Press, 1987, pp. 196-217.[78] Jarke, M., and Vassiliou, Y., “Databases and Expert Systems: Opportunities andArchitectures for Integration”, in New Applications of Databases, edited by G.Gardarin and E. Gelenhe, Academic Press London, 1984, pp. 185-201.[79] Jones, C. B., Systematic Software Development Using VDM, Prentice-Hall, 1986.[80] Kawaguchi, A., Taoka, N., Mizoguchi, R., Yamaguchi, T., and Kakusho, 0., “AnIntelligent Interview System for Conceptual Design of Database”, Proc. ECAI 1986.[81] Kennedy, A. J., and Yen, D. C., “Enhancing a DBMS Through the Use of anExpert System”, Journal of Information Management, Spring 1990, Pp. 55-61.[82] Kent, W., “Entities and Relationships in Information”, in Architecture and Models in Data Base Management Systems, edited by G.M. Nijssen, North-HollandPublishing Co., 1977, pp. 67-91.[83] Kent, W., “Limitations of Record-Based Information Models”, ACM Transactionson Database Systems, Vol. 4, No. 1, March 1979, pp. 107-131.[84] Kent, W., “A Single Guide to Five Normal Forms in Relational Database Theory”,Communications of the ACM, Volume 26, No. 2, February 1983, pp. 120-125.[85] Kerstern, M. L., Weigand, H., Dignum, F., Boom, J., “A Conceptual IViodellingExpert System”, in Entity-Relationship (the Fifth International Conference on ER Approach, France, 1986) edited by S. Spaccapietra, Elsevier Science PublishersB.V. (North-Holland), 1987, pp. 35-48.[86] Kim, M.-J., Lee, W.-U., and Derniame, J.-C., “Automatic Relational Data BaseDesigns by Transformation of the Entity-Relationship Model”, IEEE the SecondInternational Conference on Computer and Application, Beijing, China, 1987, pp.418-425.[87] Knuth, E., Hannák, L., Radó, P., “A Taxonomy of Conceptual Foundations”, inData and Knowledge (DS-2), edited by R.A. Meersrnan and A.C. Sernadas, ElsevierScience Publishers, B.V. (North-Holland), 1988, pp. 205-219.Bibliography 184[88] Kobayashi, I., “Validating Database Updates”, Information Systems, Vol. 9, No.1., 1984, pp. 1-17.[89] Kozaczynski, W., and Lilieri, L., “An Extended Entity-Relationship (E2R)Database Specification and its Automatic Verification and Transformation intothe Logical Relational Design”, in Entity-Relationship Approach (the Sixth International Conference on E-R Approach, New York , November, 1987), edited by S.T. March, Elsevier Science Publishers, B.V. (North-Holland), 1988, pp. 533-549.[90] Kung, C. H., “A Temporal Framework for Database Specification and Verification”,Proc. of the Tenth International Conference on Very Large Data Base, Singapore,August 1984, pp. 91-99.[91] Kung, C. H., “A Tableaux Approach for Consistency Checking”, Information Systems: Theoretical and Formal Aspects, edited by A. Sernadas, J. Bubenko, Jr., andA. Olive, 1985, pp. 191-207.[92] Lafue, G., “Semantic Integrity Dependencies and Delayed Integrity Checking”,Proc. the Eighth International Conference on Very Large Data Base, Mexico City,Mexico, 1982, pp. 292-299.[93] Lassez, C., “Constraint Logic Programming”, BYTE, August 1987, pp. 171-176.[94] Lauriere, J. L., “A Language and a Program for Stating and Solving CombinatorialProblems”, Artificial Intelligence, Vol. 10, No. 1, 1978, pp. 29-127.[95] Lee, R. M., “Logic, Semantics and Data Modelling: An Ontology”, in Data andKnowledge (DS-2), edited by R. A. Meersman and A. C. Sernadas, Elsevier SciencePublishers, B .V. (North-Holland), 1988, pp. 221-243.[96] Lee, K., and Lee, S., “An Object-Oriented Approach to Data/Knowledge Modelling Based on Logic”, The Sixth International Conference on Data Engineering,California, February 1990, pp. 289-294.[97] Leuzerini, M., and Nobili, P., “On the Satisfiahility of Dependency Constraintsin Entity-Relationship Schema”, Proc. the Thirtieth International Conference onVery Large Data Base, Brighton, 1987, pp. 147-154.[98] Lenzerini, v1., and Santucci, G., “Cardinality Constraints in the EntityRelationship Model”, in Entity-Relationship Approach to Software Engineering (theThird International Conference on E-R Approach, California, 1983), edited by C.G. Davis, S. Jajodia, P. A. Ng and R. T. Yeh, Elsevier Science Publishers B. V.(North-Holland), 1983, pp. 529-549.Bibliography 185[99] Leveson, N. G., Wasserman, A. I., and Berry, D. M., “BASIS: A Behavioral Approach to the Specification of Information Systems”, Information Systems, Vol. 8,No. 1, 1983, pp. 15-23.[100] Ling, T.-W., “Integrity Constraint Checking in Deductive Database using the Pro-log not-Predicate”, Tech. Report, DISCS Pub. No. NOTRA7/86, National University of Singapore, July 1986.[101] Ling, T.-W., and Rajagopalan, P., “A Method to Eliminate Avoidable Checkingof Integrity Constraints”, Proc. Trends é4 Applications, Gaithereshurg, Maryland,1984, pp. 60-68.[102] Lipeck, U. W., “Stepwise Specification of Dynamic Database Behaviour”, International Conference on Data Engineering, Washington, D.C., May 1986, pp. 387-397.[103] Lockemann, P. C., “Object-Oriented Information Management”, Decision SupportSystem, 5, 1989, pp.79-102.[104] Mackworth, A. K., “Consistency in Networks of Relations”, Artificial Intelligence,Vol. 8, 1977, pp. 99-118.[105] Mackworth, A. K., and Freuder, E. C., “The Complexity of Some PolynomialNetwork Consistency Algorithms for Constraint Satisfaction Problems”, ArtificialIntelligence, Vol. 25, No. 1, January 1985, pp. 65-74.[106] Mackworth, A. K., “Constraint Satisfaction”, Encyclopedia of Artificial Intelligence, edited by S. C. Shapiro, J. Wiley & Sons, N.Y., 1987, pp. 205-211.[107] MaFadden, F. R., and Hoffer, J. A., Data Base Management, the second edition,Benjamin/Cummings Publishing Co. Inc., 1988.[108] Makowsky, J. A., Markowitz, V. M., Rotics, N., “Entity-Relationship ConsistencyFor Relational Schemas”, Proc. International Conference on Database Theory,(Lecture Note in Computer Science V. 243) edited by G. Ausiello, and P. Atzeni,Springer-Verlag, 1986, pp. 306-322.[109] Maunila, H., and Räihä, K-J, “Inclusion Dependencies in Database Design”, TheSecond International Conference on Data Engineering, Los Angeles, California,February 1986, pp. 713-718.[110] Maryanski, F., Francis, S., Hong, S., and Peckham, J., “Generation of ConceptualData Models”, Working Paper, Computer Science and Engineering Department,University of Connecticut, 1984.Bibliography 186[111] Maryanski, F., and Hong, S., “A Tool for Generating Semantic Database Applications”, IEEE COMPSAC, Chicago, Illinois, October 1985, PP. 368-375.[112] Meersman, R., “Towards Models for Practical Reasoning about ConceptualDatabase Design”, in Data and Knowledge (DS-2), edited by R. A. Meersmanand A. C. Sernadas, Elsevier Science Publishers, B.V. (North-Hollalld), 1988, pp.245-263.[113] Mees, M., and Put, F., “ Extending a Dynamic vIodelling Methods using DataModelling Capabilities: The Case of JSD”, in Entity-Relationship (the Fifth International Conference on E-R Approach, France, 1986) edited by S. Spaccapietra,Elsevier Science Publishers B.V. (North-Holland), 1987, Pp. 399-418.[114] Missikoff, M., and Wiederhold, G., “Toward a Unified Approach for Expert andDatabase Systems”, in Expert Database Systems (Proceedings from the first International Workshop), edited by L. Kerschberg, Benjamin/Cummings Publishing,1986, pp-383-99.[115] Morgenstern, M., “Active Databases as a Paradigm for Enhanced Computing Environments”, Proc. the Ninth International Conference on Very Large Data Base,Italy, 1983, pp. 34-42.[116] Morgenstern, M., “CONSTRAINT EQUATIONS: Declarative Expression of Constraints with Automatic Enforcement”, Proc. the Tenth International Conferenceon Very Large Data Base, Singapore, August 1984a, pp. 291-300.[117] Morgenstern, M., “A Concise Compatible Representation for Quantified Constraints in Semantic Networks”, AAAI-84, Proc. of National Conference on Artificial Intelligence, Texas, August 1984b, pp. 255-259.[118] Morgenstern, M., “The Role of Constraints in Databases, Expert Systems, andKnowledge Representation”, in Expert Database Systems, edited by L. Kerschberg,Benjamin/Cummings Publishing Co., 1986, pp. 351-368.[119] Morgenstern, M., Borgida, A., Lassez, C., Maier, D., and Wiederhold, G.,“Constraint-Based Systems: Knowledge about Data”, in Expert Database Systems(Proc. from the Second International Conference on EDS), edited by L. Kerschberg,Benjamin/Cummings Publishing Co., 1989, pp. 23-43.[120] Nakano, R., “Integrity Checking in a Logic-Oriented ER Model”, in Entity-Relationship Approach to Software Engineering (the Third International Conference on E-R Approach, California, 1983), edited by C. G. Davis, S. Jajodia, P. A.Ng and R. T. Yeh, Elsevier Science Publishers B. V. (North-Holland), 1983, pp.551-564.Bibliography 187[121] Newell, A., and Simon H. A., “Computer Science as Empirical Inquiry: Symbolsand Search”, Communications of the ACM, Volume 19, March 1976, PP. 113-126.[122] Nicolas, J.-M., “Logic for Improving Integrity Checking in Relational Data Bases”,Acta Informatica, 18, 1982, pp. 227-253.[123] Nilssori, N. J., Principles of Artificial Intelligence, Tioga Publishing Company,1980.[124] Obretenov, D., Angelov, Z., Mihaylov, J., Dishlieva, P., and Kirova, N., “AKnowledge-Based Approach to Relational Database Design”, Data é4 KnowledgeEngineering, 3, 1988, pp. 173-180.[125] Oren, 0., “Integrity Constraints in the Conceptual Schema Language SYSDOC”,the Fourth Conference on Entity-Relationship Approach, Chicago, Illinois, October1985, pp. 288-294.[126] Palmer, I. R., “Practicalities in Applying a Formal Methodology to Data Analysis”,in Data Base Design Techniques I: Requirements and Logical Structures, edited byS. B. Yao, et al., Spring-Verlag, Berlin, 1982, pp. 147-171.[127] Papadirnitriou, C. H., and Steiglitz, K., Combinatorial Optimization: Algorithmsand Gomplexity, Prentice-hall, N.J., 1982.[128] Paulson, D., Reasoning Tools to Support System Analysis and Design, UnpublishedPh.D. Dissertation, The University of British Columbia, Vancouver, B.C., Canada,1989.[129] Peckham, J., and Maryanski, F., “Semantic Data Models”, ACM Computing Surveys, Vol. 20, No. 3, September 1988, pp. 153-189.[130] Potter, W. D., and Kerschberg, L., “A Unified Approach to Modelling Knowledgeand Data”, in Data and Knowledge (DS-), edited by R.A. Meersman and A.C.Sernadas, Elsevier Science Publishers, B.V. (North-Holland), 1988, pp. 265-291.[131] Proix, C., and Rolland, C., “A Knowledge Base for Information System Design”, inData and Knowledge (DS-2), edited by R.A. Meersman and A.C. Sernadas, ElsevierScience Publishers, B.V. (North-Holland), 1988, pp. 293-306.[132] Qian, X., and Wiederhold, G., “Knowledge-based Integrity Constraint Validation”,Proceedings of the Twelfth International Conference on Very Large Data Base,Kyoto, 1986, pp. 3-12.Bibliography 188[133] Qian, X., and Smith, D. R., “Integrity Constraint Reformulation for Efficient Validation”, Proceedings of the thirteenth International Conference on Very Large DataBase, Brighton, 1987, pp. 417-425.[1341 Raju, K., and Majumdar, A. K., “ Fuzzy Functional Dependencies and LosslessJoin Decomposition of Fuzzy Relational Database Systems”, AGM Transactionson Database Systems, Vol. 13, No. 2, June 1988, pp. 129-166.[135] Ram, S., “Automated Tools for Database Design: Sate of the Art”, Working Paper, Dept. of Management Information Systems, College of Business and PublicAdministration, University of Arizona, 1989.[136] Reiter, R., “ On the Integrity of Typed First Order Data Bases”, in Advancesin Database Theory, Vol. 1, edited by H. Gallaire, J. Minker and J. M. Nicolas,Plenum Press, New York, 1984, pp. 137-157.[137] Reiter, R., “On Integrity Constraints”, Proc. of the Second Conference on Theoretical Aspects of Reasoning about Knowledge, 1988, pp. 97-111.[138] Rolland, C., and Proix, C., “An Expert System Approach to Information SystemDesign”, in Information Processing 86, edited by H. J. Kugler, Elsevier SciencePublishers B.V. (North-Holland), 1986, pp. 241-250.[139] Sakai, H., “On the Optimization of an Entity-Relationship Model”, 3rd USA-JAPAN Computer Conference, San Francisco, California, October 1978, pp. 145-149.[140] Sakai, H., “A Method for Entity-Relationship Behaviour Modelling”, in Entity-Relationship Approach to Software Engineering (the Third International Conferenceon E-R Approach, California, 1983), edited by C. G. Davis, S. Jajodia, P. A. Ng andR. T. Yeh, Elsevier Science Publishers B. V. (North-Holland), 1983a, pp. 111-129.[141] Sakai, H., “E-R Approach to Logical Database Design”, in Entity-RelationshipApproach to Software Engineering (the Third International Conference on E-RApproach, California, 1983), edited by C. G. Davis, S. Jajodia, P. A. Ng and R. T.Yeh, Elsevier Science Publishers B. V., North Holland, 1983b, pp. 155-187.[142] Scheuermann, P., Schiffner, G., and Weber, H., “Abstraction Capabilities and Invariant Properties Modelling within the Entity-Relationship Approach”, in EntityRelationship Approach to System Analysis and Design (the First International Conference on E-R Approach), edited by P. P. Chen, North-Holland Publishing Co.,1980, pp. 121-140.Bibliography 189[143] Schrefl, M., Tjoa, A. M., and Wagner, R. R., “Comparison-Criteria for SemanticData Models”, International Conference on Data Engineering, 1984, pp. 120-124.[144] Segev, A., “Transitive Dependencies”. in the Surveyors’ Forum of AM ComputingSurveys, Vol. 19, No. 2, June 1987, pp. 191-193.[145] Shepherd, A., and Kerschberg, L., “Constraint Management in Expert DatabaseSystems”, in Expert Database Systems (Proc. form the First International Workshop), edited by Larry Kerschberg, Benjamin/Cummings Publishing Co., 1986, pp.309-331.[146] Simon, E., and Valduriez, P., “Design and Implementation of an Extendible Integrity Subsystem”, ACM-SIGMOD Proc. International Conference on Management of Data, Boston, Massachusetts, June 1984, pp. 9-17.[147] Smith, J. M., and Smith, D. C. P., “Database Abstractions: Aggregation and Generalization”, ACM Transactions on Database Systems, Vol. 2, No. 2, June 1977a,pp. 105-133.[148] Smith, J. M., and Smith, D. C. P., “Database Abstractions: Aggregation”, Communications of the ACM, Vol. 20, No. 6, June 1977b, pp. 405-413.[149] Solvberg, A., and Kung. C. H., “On Structural and Behavioral Modelling of Reality”, in Database Semantics (DS-1), edited by T.B. Steel, Jr. and R. Meersman,Elsevier Science Publishers, B.V. (North-Holland), 1986, pp. 205-221.[150] Spivey, J. M., Understanding Z, Cambridge University Press, 1988.[151] Stonebraker, M., “Implementation of Integrity Constraints and Views by QueryModification”, Proc. of ACM SIGMOD International Management of Data, SanJose, May 1975, pp. 65-78.[152] Storey, V. C., View Creation: An Expert View Creation System for DatabaseDesign, Ph.D. Dissertation, Faculty of Commerce and Business Administration,University of British Columbia, Vancouver, B.C., Canada, October 1986, ICITPress,1988.[153] Storey, V. C., and Goldstein, R. C., “A Methodology for Creating User Viewsin Database Design”, AC’M Transactions on Database Systems, Vol. 13, No. 3,September 1988, pp. 30.5-338.[154] Storey, V. C. and Goldstein, R. C., “Design and Development of an ExpertDatabase Design System”, International Journal of Expert Systems: Research andApplications, Vol.3, No.1, 1990, pp. 31-63.Bibliography 190[155] Storey, V. C., and Goldstein R. C., “Knowledge-Based Approaches to DatabaseDesign”, Working Paper, University of Rochester, 1991.[156] Studer, R., “A Conceptual Model for Physical and Logical time”, in Entity-Relationship Approach (the Sixth International Conference on E-R Approach, NewYork , Nov. 1987), edited by S. T. March, Elsevier Science Publishers, B.V. (North-Holland), 1988, pp. 223-235.[157] Su, S. Y. XV., and Raschid, L. “Incorporating Knowledge Rules in a SemanticData Model: An Approach to Integrated Knowledge Management”, the SecondConference on Artificial Intelligence Application, Miami, Florida, December 1985,pp. 250-256.[158] Tauzovich, B., “An Expert System for Conceptual Data Modelling”, in the EighthInternational Conference on E-R Approach, Toronto, Canada, October 1989.[159] Tauzovich, B., “Towards Temporal Extensions to the Entity-Relationship Model”,in the Tenth International Conference on E-R Approach, San Mateo, California,October. 1991, (proceedings edited by T. J. Teorey). pp. 163-179.[160] Teorey, T. J., Yang, D., and Fry, J. P., ‘ A Logical Design Methodology for Relational Databases Using the Extended Entity-Relationship Model”, ComputingSurveys, Vol. 18, No. 2, June 1986, pp. 197-222.[161] Thompson, J. P., Data with Semantics: Data Models and Data Management, VanNostrand Reinhold, N.Y., 1989.[162] Troyer, 0. D., RIDL*: A Tool for the Computer-Assisted Engineering of LargeDatabases in the Presence of Integrity Constraints”, Proc. of ACM SIGMOD International Management of Data, Oregon, June 1989, pp. 418-429.[163] Tsichritzis, D. C., and Lochovsky, F. H., Data Models, Prentice-Hall Inc., 1982.[164] Ullman, J. D., Principles of Database Systems, the second edition, Computer Science Press, Rockville. Maryland, 1982.[165] Urban, S. D., and Delcambre, L. M. L., “An Analysis of the Structural Dynamic,and Temporal Aspects of Semantic Data Models”, The Second International Conference on Data Engineering, Los Angeles, California, February, 1986, pp. 382-389.[166] Urban, S. D., and Delcambre, L. M. L., “Constraint Analysis for Specifying Perspectives of Class Objects”, The Fifth International Conference on Data Engineering, Los Angeles, California, February 1989, pp. 10-17.Bibliography 191[167] van der Lans, R. F., The SQL Standard — A Complete Reference, originally inDutch, translated into English by Andrea Gray, original Dutch published by Academic Science, Schoonhoven, 1988 (translated English version published by PrenticeHall, 1989).[168] Wagner, C., View Integration in Database Design Unpublished Ph.D. Dissertation,The University of British Columbia, Vancouver, B.C., Canada, April 1989.[169] Wand, Y., and Weber, R., “An Ontological Analysis of Some Fundamental Information Systems Concepts”, Proc. of the Ninth International Conference on Information Systems, Minneapolis, Mn., 1988, pp. 213-225.[170] Wand, Y., and Weher, R., “An Ontological Analysis of Some Systems Analysisand Design Methods”, in Information Systems Concepts— An In-Depth Analysis,edited by E. Falkenberg and P. Lindgreen, North-Holland Publishing Co., Amsterdam, 1989, pp. 79-107.[171] Wand, Y., and Weber, R., “Toward a Theory of the Deep Structure of InformationSystems”, Proc. of the Eleventh International Conference on Information Systems,Copenhagen. December 1990.[172] Weber, W., Stucky, W., and Karszt, J., “Integrity Checking in Data Base Systems”,Information Systems, Vol. 8, No. 2, 1983, pp. 125-136.[173] Webre, N.W., “An Extended Entity-Relationship Model and its Use on a DefenseProject”, in Entity-Relationship Approach to Information Modelling and Analysis(the second International Conference on E-R Approach, 1981), edited by P. P.Chen, Elsevier Science, (North-Holland Publishing Co.), 1983, pp. 173-193.[174] Wilmot, R. B., “Foreign Keys Decrease Adaptability of Database Designs”, Communications of the ACM, Vol. 27, No. 12, December 1984, pp. 1237-1243.[175] Wong, H. K.T., and Mylopoulos, J., “Two Views of Data Semantics: A Survey ofData Models in Artificial Intelligence and Database Management”, INFOR, Vol.15, No. 3, October 1977, pp. 344-383.[176] Yang, H.-L., and Goldstein, R. C., “Identification of Semantic Integrity Constraintsfor Database Design”, Working Paper, 89-MIS-021, Faculty of Commerce and Business Administration, The University of British Columbia, 1989.[177] Yang, H.-L., “Semantic Integrity Constraint Representation Model: Some Illustrative Examples”, Working Paper, Faculty of Commerce and Business Administration, The University of British Columbia, February 1992.Bibliography 192[178] Yasdi, R., and Ziarko, W., “Conceptual Schema Design: A Machine LearningApproach”, in Methodologies for Intelligent Systems, edited by Z. W. Ras andM. Zemankova, Elsevier Science Publishers Co., 1987, pp. 379-391.[179] Zvieli, A., and Chen, P. P., “Entity-Relationship Modelling and Fuzzy Databases”,The Second International Conference on Data Engineering, Los Angeles, California,February 1986, pp. 320-327.Appendix ABNF Descriptions of the SIC Representation ModelThis appendix provides a summary of the syntax of the SIC Representation model usedin this research. The following usual BNF meta-symbols are used: < > { } [If a terminal symbol happens to be identical to a meta-symbol, it will be written withina pair of escape symbols, e.g., :j representing the terminal<SIC statement> [<SIC name>]CERTAINTY <certainty factor>FOR <object type name>ON <operation type>[IF <assertions>]ASSERT <assertions>ELSE <violation action><SIC name> ::= <object type name> -< operation abbreviation>-<SIC type>[-< optional part>]SIC type is specified by the automated database design system or the databasedesigner using some conventions.<optional part> <related object type set> [-<sequence number>]<object type name> ::= <entity type name><relationship type name>I <relation type name><attribute type name><attribute type name> <entity type name>.< attribute name>I <relationship type name> .< attribute name><relation type name>.<attribute name>Entity type name, relationship type name, relation type name, and attribute nameare all specified by the database designer, and by convention, begin with a capitalletter.<operation abbreviation> ::= I D U193Appendix A. BNF Descriptions of the SIC Representation Model 194<related object type set> ::= (<related object> {, <related object>})<related object> :: <object type name> <Prolog variable>Prolog variable is any name beginning with a capital letter or an underline sign(following the Prolog convention). In this case, it is used to represent a singleobject type name or a set including some object type names.<sequence number> 1 2 I 3Note that they are positive integer numbers.<certainty factor> ::= <binary certainty factor>I <fuzzy certainty factor><certainty number><binary certainty factor> ::= certain uncertainNote that “certain” is equivalent to the ratio certainty number 100%, “uncertain”expresses an unknown ratio certainty number that is less than 100%.<fuzzy certainty factor> ::= usually I sometimesNote that the database designer would specify these fuzzy certainty factors to beequivalent to some certainty numbers (for example, “usually” may be set to 80%).<certainty number> ::= <ordinal certainty number><ratio certainty number><ordinal certainty number> 1 2 3Note that they are positive integer numbers.<ratio certainty number> ::= 1% I ... I 100%Note that they are positive real number from 1% to 100% inclusive and are represented in percentage.<operation type> ::= insertion deletion I update<assertions> :— [<quantifier> <variable> <and_connective>]<assertion_and_list> [<or_connective> <assertions>]<assertion_and_list> :: <expression> [<and_connective> <assertion_and_list>]<expression> ::= (<assertions>)<not> (<assertions>)<logical predicate><set expression><arithmetic expression><date expression>Appendix A. BNF Descriptions of the SIC Representation Model 195<logical predicate> ::= <predicate name>[(< argument> {,<argument>})]Predicate name is any name beginning with a lower-case letter following the usualProlog convention. The built-in logical predicates for storing or manipulating information in data dictionary are listed in Appendix B.<not> not j —<and_connective> ::= A<or_connective> ::= V<quantifier> ::= V I<argument> ::= <constant> <EntRshipRel variable value>I <arithmetic simple expression><date simple expression><constant> ::= <numeric constant><string><atom>I <date constant>Atom is the same as in Prolog, i.e. it is made up of letters and digits, and beginswith a lower-case letter; string is a character string.<numeric constant> ::= <integer number> <real number><date constant> ::= <time point> <time interval>Time point is a specific date following any date representation convention within apair of double quotation marks (“) (the system takes responsibility for decipheringthe input string and interpreting it as a date); time interval is a period representedby a real number following onetime unit, i.e., second(s), minute(s), hour(s), day(s),month(s), and year(s) within a pair of quotation marks (““); e.g., “3.5 hours”.<variable> : — <EntRshipRel variable> <other_variable_except_ERR><EntRshipRel variable> = <entity type name> [<subscript>]<relationship type name> [<subsipt>]<relation type name> [<subscript>]<subscript> ::= 0 1 2Note that they are non-negative integer numbers.<other_variable_except_ERR> : : = <attribute variable>I <Prolog variable>Appendix A. BNF Descriptions of the SIC Representation Model 196<attribute variable> ::= <entity type name> [<subsipt>J.< attribute name>I <relationship type name> [<subscript>] . <attribute name><relation type name> <subscript>]. <attribute name><set expression> : := [set] {< variable> :j: <expression>}<arithmetic simple expression> [<positive_negative_sign>] <arithmetic term>{ <arithmetic_plus_minus> <arithmetic term> }<arithmetic term> ::= <arithmetic subterm>{ <arithmetic_times_divide> <arithmetic subterm> }<arithmetic subterm> :: <numeric constant><other variable value>I <aggregate function> (<other variable value>)count(<EntRshipRel variable value>)<aggregate function>(<set expression>)(<arithmetic simple expression>)[<exponential operator> (<arithmetic simple expression>)]<other variable value> ::= <other_variable_except_ERR>new(< other_variable_except_ERR>)I old(< other_variable_except_ERR>)<EntRshipRel variable value> ::= <EntRshipRel variable>new(<EntRshipRel variable>)I old(<EntRshipRel variable>)<positive_negative_sign> ::= + —<arithmetic_plus_minus> ::= +—<arithmetic_times_divide> ::= x I /<exponential operator> ::=<aggregate function> ::= sum avg miii max count<user-defined aggregate function><arithmetic expression> ::= <arithmetic simple expression>[<not>] <comparison operator> <arithmetic simple expression><comparison operator> ::= < >= I<date simple expression> [<positive_negative_sign>] <date subterm>{<date operator> <date subterm>}Appendix A. BNF Descriptions of the SIC Representation Model 197<date subterm> ::= <date constant><other variable value><date function>.(<other variable value>)(<date simple expression>)<date operator> :: + —<date function> : : = year month I day minute second<date expression> ::= <date simple expression>[<not>] <comparison operator> <date simple expression><violation action> <rejection> <propagation> warning<rejection> ::= reject conditionallyreject<propagation> : : = propagate( <propagated_action>)I conditionally_propagate( <propagated_action>)<propagated_action> : : = insert (<variable>)delete(<variable>)insert_all(< variable>)I delete_all( <variable>)I insert(<entity role type>, <variable>)I delete(<entity role type>, <variable>)I insert_all(<entity role type>, <variable>)I delete_all(<entity role type>, <variable>)I update(<variable>, <arithmetic simple expression>)I update(<variable>, <date simple expression>)<entity role type> “<entity type name>”I <Prolog variable>Prolog variable is used to represent an entity type name in this case.Appendix BSummary of the Predicates used in this ResearchThis appendix provides a summary of the logical predicates used with the SIC Representation model. The listing is not intended to be comprehensive. All the “inputpredicates” (in Appendix B.1) and most of the “manipulation predicates” (e.g., ent_occ,in Appendix B.2) are used only in conjunction with generic SICs. Other manipulationpredicates (e.g., rship_occ_part, is_null) are represented as Prolog “sub-procedures”, whichare convenient for SIC representation. However, it is possible to represent a specific SICwithout using any of these predicates.B.1 Input PredicatesThe following predicates are suggested by this research for conveying semantic information about a database. They are used by a database designer to indicate that there aresome integrity constraints on specific objects, which would inherit relevant generic SICs.They are information provided by the database designer and stored in a database.• The predicate domain(Domain_Name, Data_Type, Format, Value_Range) is usedto define a user-defined domain, Domain_Name, e.g., money, location; its systempre-defined Data_Type, e.g., arithmetic, date, non_arithmetic; its format; and valuerange. Stating that a data type is of type arithmetic if all of the normal arithmeticoperations may be performed on it in the usual way. Specifying a data with anonarithmetic data type implies that the data may not be used in conventionalarithmetic operations. Following the idea of the EBASE system to simplify thedatabase designer’s work ([Goldstein and Wagner, 1988]), no format specificationis needed for date and arithmetic data type domains. For nonarithmetic datatypes, the format is comprised of four basic symbols enclosed in the pair of quotation marks:A for an alphabetic character, i.e., A to Z, a to z, period or blank;9 for a numeric character, i.e., 0 to 9;for any special character, e.g.,- / ,. [] { } ! etc. or blank;X for an alpha-numeric character (i.e., any of the above).If a format contains a string of one type of character, it can be expressed more198Appendix B. Summary of the Predicates used in this Research 199concisely as “7©X”, which means “XXXXXXX”. The range is expressed as a continuous range from Beginning_value to End_value in the format of [Beginning_valueEnd_value], or an enumerated set. Following the usual mathematic convention,the symbols are used here as follows:“[“ means the beginning of a closed range,i.e., including the Beginning_value;“]“ means the end of a closed range, i.e., including the End_value;“(“ means the beginning of an open range, i.e., excluding theBeginning_value;“)“ means the end of an open range, i.e., excluding the End_value.The special symbol ““ is used if there is no upper or lower bound.• The predicate attribute(Entity/Relationship_Type, Attribute_Name, Domain_Name,Special_Value_Range, Null?, Unique?, Candidate_Key_Attribute?, Changeable?) isused to specify information about an attribute of an entity or relationship type.Domain_Name is its value domain definition. Special_ Value_Range is its specificvalue range information (e.g., Employee.Salary has the domain of money with thespecific value range between $2,000 and $100,000). The four binary variables, Null?,Unique?, Candidate_Key_Attribute?, Changeable? indicate whether the attribute isallowed to he null, unique, a part of a candidate key, and changeable.• The predicate entity(Entity_Type, Primary_Key, Composite_Key_Set,Absolute_Max_Cardinality) is used to describe Entity_Type. Primary_Key specifiesits primary key. Composite_Key_Set is a set comprised of all composite keys (including primary and non-primary composite keys). Absolute_Max_Cardinality specifiesthe maximum number of occurrences that are allowed in the Entity_Type.• The predicate relationship_participant (Relationship_ Type, Entity_Type,Min_ Cardinality, Max_ Cardinality) specifies Relationship_Type’s participant, Entity_Type, and the usual relationship cardinalities relative to it. This predicate isused for each relationship in the E-R-SIC model. In the relational model, it is usedfor those relationships that are separately represented. If a relationship is hiddenin an entity relation, its two related relationship_participant predicates are deleted,and another relationship_hidden_entity predicate is created (see below).• The predicate relationship_hidden_entity(RelationshipType, Entity Type 1, ForeignEntType, MinCard2, MaxCard2) is used in the relational model to indicate that Relationship Type is represented via a foreign key in the EntityTypel relation. It alsoindicates that relative minimum and maximum cardinalities of ForeignEnt Type inthe Relationship Type are MinCard2 and MaxCard2, respectively.• The predicate symmetric(Relationship) is used to indicate that a relationship issymmetric.• The predicate transitive (Relationship) is used to indicate that a relationship istransitive.Appendix B. Summary of the Predicates used in this Research 200• The predicate subset_rship ((Rship Type 1, Ent Type 1, Ent Type2), (Rship Type2, EntType3, EntType4)) is used to indicate that the first relationship type is a subset ofthe second relationship type in the sense that Ent TypeS and EntType are the participants in the second relationship type corresponding respectively to EntTypeland Entlkjpe2 in the first relationship type. Usually, EntTypel is the same asEntType3, and EntType2 is the same as EntType. However, it is possible thatEntTypel and Ent TypeS are different subtypes of a super-type, and EntType2 andEntType are different subtypes of another super-type (these two super-types maybe the same). In that case, if an occurrence El of EntTypel connects with an occurrence E2 of EntType2 in RshipTypel, an EntType3 occurrence, corresponding tothe super-type occurrence of El, should also connect with an EntType$ occurrence,corresponding to the super-type occurrence of E2 in RshipType2.• The predicate rships_union_condition((SuperRship, EntType 1, Ent Type2),SubRshipSet) is used to indicate a necessary condition for the case that the firstrelationship type is the union of the other relationship types that are in the set,i.e., the second argument of the predicate is a set in the form of {(SubRshipi,EntTypei, EntTypej)}, i,j= 1,2,...,i j. Similarly to the case of the predicatesubset_rship, usually EntTypei’s are the same as EntTypel, and EntTypej’s are thesame as EntType2. It is also possible that EntTypei’s and EntTypel are differentsubtypes of a super-type, and EntTypej ‘s and EntType2 are different subtypes ofanother super-type (these two super-types may be the same). In that case, ifthere is an SnperRship occurrence connecting an occurrence El of EntTypel withan occurrence E2 of EntType2, there should be at least one S’ubRshipi occurrenceconnecting an EntTypei occurrence, corresponding to the super-type occurrence ofEl, with an EntTypej occurrence, corresponding to the super-type occurrence ofE2.• The predicate rships_intersect_condition ((SubRship, Ent Typel, EntType2),SuperRshipSet) is used to indicate that the first relationship type is the intersectionof the other relationship types that are in the set. The second argument of the predicate, SnperRshipSet, is a set in the form of { (SuperRshipi, EntTypei, EntTypej)},i,j= 1,2,...,i=/ j. Similarly to the case of the predicate rships_union.condition,this kind of SIC may occur in a specialization hierarchy. In that case, for each ofSuperRshipi to have an occurrence connecting an EntTypei occurrence, corresponding to a common super-type occurrence (say, Ek), with its EntTypej occurrence,also corresponding to a common super-type occurrence (say, Fk), there should bean SnbRship occurrence connecting an EntTypel occurrence, corresponding to Ek,with an EntType2 occurrence, corresponding to Fk.• The predicate ex_rships(ExRshipSet), where ExRshipSet has the form of { (Rshipl,SharingEntl), (Rship2, SharingEnt2)}. The SharingEntl and SharingEnt2 areusually the same. In this case, the predicate is used to indicate that an occurrence ofAppendix B. Summary of the Predicates used in this Research 201the sharing entity type is allowed to participate in only one of these two relationshiptypes. The SharingEnti and SharingEnt2 may be different subtypes of a super-typein a specialization hierarchy. In that case, this predicate is used to indicate that ifan occurrence E of a sharing entity type participates in one of these two relationshiptypes, the occurrence of the other sharing entity type, corresponding to the super-type occurrence of E, cannot participate in the other relationship type.• The predicate ex_occ(ExoccRshipSet), where ExoccRshipSet has the form of{(RshipTypel, EntTypel, EntType2), (RshipType2, EntType3, EntType4)}, is usedto indicate that there should not be any common occurrences of these two relationship types in the sense that EntType3 and EntType are the participants in thesecond relationship corresponding respectively to the EntTypel and EntType2 inthe first relationship. Similarly to the case of the predicate ex_rships(ExRshipSet,),EntTypel and EntType3 may be different subtypes of a super-type and EntType2and EntType4 may also be different subtypes of another super-type.• The predicate not_andrships(NotAndRshipSet), where NotAndRshipSet has theform of {(Rshipi, SharingEnti)}, i=1,2 Similarly to the case of the predicate ex_rships(ExRshipSet), the SharingEriti’s are usually the same for Rshipi’s.In this case, the predicate is used to indicate that an occurrence of the sharingentity type cannot participate in ll of these relationship types. The SharingEnti’smay also be different subtypes of a super-type in a specialization hierarchy. In thatcase, at least one SharingEnti occurrence, corresponding to the same super-typeoccurrence, cannot participate in its Rshipi.• The predicate either_rships(EitherRshipSet), where EitherRshipSet has the formof { (Rshipi, SharingEnti)}, i=1,2, Similarly to the case of the predicateex_rships(ExRshipSet), the SharingEnti’s are usually the same for Rshipi’s. In thiscase, the predicate is used to indicate that an occurrence of the sharing entity typemust participate in at least one of these relationship types. If the SharingEnti’sare different, at least one occurrence of SharingEnti, corresponding to the samesuper-type occurrence, must participate in its Rshipi. It is desirable to have twosub-types of these SICs with different violation actions; one for the case with onlytwo relationships, and one for the case with more than two relationships. However,the predicate can be the same.• The predicate before((Rshipl, SharingEnti), (Rship2, SharingEnt2)) is used toindicate that at the time an occurrence of SharingEnti is going to participatein Rshipl, the occurrence of SharingEnt2, corresponding to the same super-typeoccurrence, must participate in Rship2.• The predicate not_before((Rshipl, SharingEntl), (Rship2, SharingEnt2)) is used toindicate that at the time an occurrence of SharingEntl is going to participate inRshipl type, the occurrence of SharingEnt2, corresponding to the same super-typeoccurrence, must participate in Rship2.Appendix B. Summary of the Predicates used in this Research 202• The predicate rships_join(RshipTypei, EntTypei, EntTypeN), RshipList,) is usedto indicate that if there is a linking path via those relationship types in RshipListto connect two entity occurrences in EntTypei and EntTypeN together, these twoentity occurrences must be connected via RshipTypei. RshipList is an orderedset (list) in the form of (RshipTypei, EntTypej, EntTypek) where i, k= 2, 3,N and j= 1, 2, ..., N_i. It is a necessary condition for asserting that RshipTypei is the join of these RshipTypei’s, that is, RshipTypei[EntTypei,EntType=Rship Type2[Ent Type 1, Ent Type2J N Rship Type3[Ent Type2, Ent Type3j N . . . NRshipTypeN[EntTypeN_i,EntTypeI\. Since its representation is complicated, theunusual cases in a specialization hierarchy are not considered.• The predicate rship_dep_loopn_rships((RshipTypei, EntTypei, EntTypeN),RshipList) is used to indicate that if an occurrence of RshipTypei exists, thereshould be a linking path via other relationship types in RshipList to connect itsparticipating entity occurrences (in EntTypei and EntTypeN) together. RshipListis an ordered set (list) in the form of (RshipTypei, EntTypej, EntTypek) where i, k=2, 3, ..., N and j= 1, 2, ..., Ni. Both this and the above rshipsjoin predicatesare needed to guarantee that RshipTypei is the join of these RshipTypei’s.That is, Rship Type 1 [EntTypei,Ent TypeI\ Rship Type2[Ent Typei, Ent Type2] NRship Type3[EntType2, Ent Type3] N . . . N Rship TypeN[Ent TypeN..i , Ent TypeIV]. Sinceits representation is complicated, the unusual cases in a specialization hierarchy arenot considered.• The predicate weak_entity(EntityType, Relationship Type) is used to indicate thatthe entity type is weak and Relationship Type is the one that it depends on.• The predicate id_depend(EntityType, KeyAttSet, Relationship Type) is used to indicate that because of the semantics of Relationship Type (e.g., implying ID dependency or as an is_a relationship, etc.), EntityType incorporates the primary key ofthe other participating entity type as (part of) its candidate key attributes KeyAttSet.• The predicate critical_rship (Relationship Type, EntType, CriticalAtt) is used to indicate that not only Relationship Type is total to Ent Type, but also exactly onecritical relationship occurrence exists for each occurrence of Ent Type. CriticalAtt isa binary attribute in Relationship Type to indicate whether a relationship occurrenceis critical.• The predicate completeness_mapping(RelationshipType) is used to indicate that arelationship type is complete, which means that for each occurrence of one entitytype, all occurrences of the other entity type must relate to it via this relationshiptype.• The predicate rship_trigger_rship((Rshipi, SharingEnti), (Rship2, SharingEnt2))is used to indicate that if an occurrence of Rshipi, in which an occurrence E ofAppendix B. Summary of the Predicates used in this Research 203SharirigEriti participated, existed in the past (and no longer exist now), the SharingEnt2 occurrence corresponding to the super-type occurrence of E must participatein Rship2.• The predicate ex_ents(’ExEntSet, is used to indicate that the two entity types inthe set are exclusive. The ExEntSet contains only two entity types. If three ormore entity types are mutually exclusive, we can specify more than one exentspredicate.• The predicate ents_intersect_condition(SubEnt Type, SuperEntSet) is used to indicate a necessary condition for the case that SubEnt Type is the intersection of theentity types in the set, SuperEntSet. That is, for each of the entity types in SuperEntSet to have an occurrence with the same candidate key value, there should bea corresponding occurrence with this candidate key value in SubEnt Type.• The predicate ents_unioncondition(SuperEnt Type, SubEntSet) is used to indicatea necessary condition for the case that the SuperEntType is the union of the entitytypes in the set, SubEntSet. That is, for any occurrence in SuperEnt Type, thereshould be at least one corresponding occurrence with the same candidate key valuein one of those entity types that are in SubEntSet.B.2 Manipulation PredicatesThe following predicates are used to manipulate the information contained in the inputpredicates. In order to represent all generic SICs mentioned in this dissertation, an automated database design system should have at least these built-in predicates. In thefollowing, only the functioning of these predicates is described. The actual Prolog codeis not provided except for those complicated recursive predicates— join_list_oki andjoin_list_ok2. All predicates are applicable to the E-R representation and the relationalrepresentation unless explicitly stated otherwise. The Prolog representation of an occurrence would include its associated entity or relationship type and its primary key (e.g.,(“Person”, “123-456-789”)).• The predicate “rship_occ_part(R,RoleType,E,i’ is used to evaluate whether an entityoccurrence E participates in a relationship occurrence R with the Role Type. Notethat usually the entity type of E is the same as the Role Type. However, in aspecialization hierarchy the entity type of E may be E Type and the Role Type maybe FType, where EType and FType have a common super-type. In that case,the predicate means that the F occurrence in F_Type, which corresponds to the Eoccurrence in the E_Type, participates in the relationship occurrence R.For example, if a SIC originally obtained from the database designer refers to aAppendix B. Summary of the Predicates used in this Research 204relationship such as “E R F’ and E is the oniy sharing entity type, it will bewritten as rship_occ_part(R, “E”,E) in its sub-SICs.The exact functioning of this predicate “rship_occ_part(R,RoleType,E)” is as follows.Role Type must be instantiated.(1)Suppose that both Rand Eare instantiated. (la) Suppose E_Type is the same asRole Type. This predicate checks whether the primary key value of the occurrenceE and the corresponding key attributes of the occurrence R are equivalent. Itreturns false if they are not equivalent or if the corresponding key attributes of theoccurrence R are null. (ib) Suppose EType is different from Role Type. It traversesa specialization hierarchy to find the corresponding occurrence in Role Type andcheck that occurrence with R.(2) Suppose that E is uninstantiated, R is instantiated. It returns the occurrenceparticipating in R with Role Type as E. No traverse of a specialization hierarchywould be performed.(3) Suppose that Eis instantiated, R is not. It returns the R (Rs when backtracking)in which E participates with the Role Type.• The predicate ent_occ (Entity_Type, E) is used to evalilate whether an E is an occurrence of Entity_Type, or used to fetch any one occurrence E from Entity_Type.If E is null, the logical value of this predicate is undefined. It does not traverse aspecialization hierarchy to find the corresponding entity occurrence. For example,enLocc(”Manager”, Engineer) would be false though the mentioned engineer mayalso appear in the manager entitytype.• The predicate rship_occ(Relationship_Type, R) is used to evaluate whether R is anoccurrence of Relationship_Type, or used to fetch any one occurrence from Relationship_Type. If R is null, the logical value of this predicate is undefined.• The predicate att_occ(E, Att_Name, E.A) (or att_occ(R, Att_Name, R.A)) is usedto get the value of an attribute occurrence E.A (or R.A) of an entity occurrenceE (or relationship occurrence R). If Att_Name is a key attribute of a relationshiptype, att_occ can be used to reference its value.• The predicate comp_atts_occ(E, Comp_Att, E. C) is used to get the value of a composite attribute occurrence, that is, by applying a “concatenate” operator to theattribute values in the ordered set, Comp_Att.• The predicates satisfy_datatype(E.A, Data_Type), satisfy_format(E.A, Format), satisfy_val’ue(E.A, Range), are used to determine whether E.A satisfies the corresponding data_type, format, value, respectively.• The is_null(’x) predicate is evaluated to be true if and only if x is “null”.• The is_not_nullfr,,) predicate is evaluated to be true if an only if x is not “null”.Appendix B. Summary of the Predicates used in this R.esearch 205• The false predicate is used to indicate that the assertion is absolutely false. Thatis, it is used to indicate that the attempted data operation the operation Tcomponent in the SIC Representation model— is not allowed.• The predicate concatenate(Stringl, String2, Resulting_String) is used to constructResultingString by appending String2 after Stringi.• The predicate substring(String, Beginning_Position, End_Position,Resulting_Substring) is used to get Resztlting_Substring from Beginning_Position toEnd_Position of String.• The predicate concatenate_SlCname(Stringl, Variable, String2, Resulting_SlCname)is used to construct Resulting_SlCname by appending the value of Variable afterString], and then concatenating with String2.• The predicate checkcomSlC(Component_SIC_Name, Checked_Occurrence) calls aSIC named Component_SIC_Name to check whether Checked_Occurrence satisfiesit; if not, the violation action of the calling SIC (rather than Component_SIC_Name)will be taken. (The SIC aggregation concept is applied.)• The predicate checkmemSlC(Member_SIC_Name, Checked_Occurrence) calls a SICnamed Member_SIC_Name to check whether Checked_Occurrence satisfies it. Theviolation action of the Member_SICName will be taken if the Checked_Occurrenceviolates it. (The SIC association concept is applied.)• The predicate not_empty(Set,) is used to test whether Set is empty.• The predicate belongs_to(Element, Set) is used to test whether Element is in Set.• The predicate is_compatt(Att, Comp_Key_Set) is used to test whether an attributebelongs to a composite key. It is different from belongs_to because C’omp_Key_Setmay be a coset — its elements are also sets containing some attributes.• The predicate part_of_comp_key(Att, Comp_Key_Set, Comp_Keyl) is used to get acomposite key Comp_Keyl, which consists of the attribute Att, from Comp_Key_Set.• The predicate remove_from_set(Element, SourceSet, ResultingSet) is used to indicate that ResultingSet is the result of removing Element from SourceSet.• Some special predicates to process a list are needed in handling a Relationships_Join SIC or Relationship_depends_on_LoopN_Relationships SIC. Inthe following, RshipListis an ordered set (list) in the form of (RshipTypei, EntTypej,EntTypek) where i, j, k= 1, 2, ..., N.— The predicate precedes (RshipTypej, RshipTypei, RshipList) is used to find therelationship type, Rship Typei, which precedes the relationship type,Rship Typej, in RshipList.Appendix B. Summary of the Predicates used in this Research 206— The predicate follows (RshipTypej, RshipTypek, RshipList) is used to findRshipTypek, which follows the relationship type, RshipTypej, in RshipList.— The recursive predicate join_list_oki (FirstEntType, LastEnt Type, FirstEntOcc,LastEntOcc, RshipList) is used to test whether the relationship occurrenceconnecting the FirstEntOcc of the FirstEnt Type with the LastEntOcc of theLastEntType is equal to the join of a series of entity occurrences participatingiii RshipList.— The recursive predicate join_list_ok2(Rship Typej, EntTypeji, Ent Typej2,FirstEntOcc, EntOccjl, EntOccj2, LastEntOcc, RshipList) is another recursive predicate. One relationship occurrence of RshipTypej, j=1,2,... in RshipList,via which EntOccjl of EntTypejl connects with EntOccj2 of EntTypej2, isgiven. This predicate is used to test whether it is possible to produce a relationship occurrence, via which FirstEntOcc connects with LastEntOcc, andwhich is equal to the join of corresponding occurrences of these relationshiptypes in RshipList with the given relationship occurrence.Since these two recursive predicate are complicated, their detailed Prolog representations are given as below.%special recursive predicate in Prolog:% for R = Ri rxl R2 r1 R3...% joindist_oki is to test whether% the given occurrence in R connecting FirstEntOcc with LastEntOcc is equal to% the join of the corresponding occurrences in Ri, R2,%% join_list_ok is to test whether% for the given occurrence in Rj connecting Entoccji with EntOccj, j=i,2,...% it is possible to produce a pair of (FirstEntOcc,LastEntOcc)% that is equal to the join of the corresponding occurrences in Ri, R2,%% join_list_oki: all variables are inputsjoin_list_oki (FirstEntType, LastEntType, FirstEntOcc,LastEntOcc, [(RshipTypei, FirstEntType, EntTypei) IRestListi ) :-FirstEntType LastEntType,RshipOcci, rship_occ (RshipTypei, RshipOcci),rship.occpart (RshipOcci, FirstEntType, FirstEntOcc),ent.occ(EntTypei, EntOcci),rshipocc_part (RshipOcci, EntTypei, EntOcci),joinlistoki (EntTypei, LastEntType, EntOcci, LastEntOcc, [Restbist]).Appendix B. Summary of the Predicates used in this Research 207join_list_okl (LastEntType, LastEntType, LastEntOcc, LastEntOcc, [j).%% joirt_list_ok2: FirstEntOcc and LastEntOcc are output variables,% others are input variablesjoinlist_ok2 (RshipTypej, EntTypej 1, EntTypej2, FirstEntOcc. EntOccj 1, EntOccj 2,LastEntOcc, RshipList):join..first_lialf(RshipTypej, EntTypej 1, EntOccj 1. FirstEntOcc, Rshipbist),joinJastha1f(RshipTypej, EntTypej 2, EntOccj 2, LastEntOcc, RshipList).% join_first_half: FirstEntOcc is an output variable, others are input variables% precedes(+RshipTypej. -RshipTypei, +List) is the predicate to find% the Rship Typei that precedes Rship Typej in the List.join_first_half(RshipTypej, EntTypej 1, EntOccj 1, FirstEntOcc, RshipList)precedes(RshipTypej, RshipTypei, RshipList),RshipOcci, rship_occ(RshipTypei, RshipOcci),rshipocc_part (RshipOcci, EntTypej 1, EntOccj 1),ent_occ(EntTypei, EntOcci),rship_occ_part (RshipOcci, EntTypei, EntOcci),join_first_half(RshipTypei, EntTypei, EntOcci, FirstEntOcc, RshipList).join.first_half(RshipTypej, EntTypej 1, EntOccj 1, EntOccj 1, Rshipbist)-, precedes ( RshipTypej, RshipTypei, RshipList).% join_last_half: LastEntOcc is an output variable, others are input variables;% follows(+RshipTypej, -RshipTypek, +List) is the predicate to search for% the RshipTypek that follows RshipTypej in the List.join_last_half(RshipTypej, EntTypej 2, EntOccj2, LastEntOcc, RshipList) :-follows (RshipTypej, RshipTypek, Rshipbist),RshipOcck, rship_occ (RshipTypek, RshipOcck),rship_occ_part(RshipOcck, EntTypej2, EntOccj2),entocc(EntTypek, EntOcck),rship_occpart (RshipOcck, EntTypek, EntOcck),joinlast_half(RshipTypek, EntTypek, EntOcck, LastEntOcc, RshipList).join_last_half(RshipTypej, EntTypej2, EntOccj 2, EntOccj 2, RshipList) :-—, follows (RshipTypej, RshipTypek, RshipList).• The following four predicates are used to store related SICs for updating the primary key of Relationship_Type or Entity_Type. If a SIC is relevant to the insertionAppendix B. Summary of the Predicates used in this Research 208(or deletion) of Relationship_Type, its name is added into the SIC_Name_Set inthe associated_PKSICs_I (or associated_PKSICs_D) predicate of the affected keyattribute PKAtt. Each affected key attribute, PKAtt, of Relationship_Type has itsown associated_PKSICs_I (or associated_PKSICs_D) predicate. In the case that aSIC is relevant to the insertion (or deletion) of Entity_Type, there is only one associated_PKSICs_I (or associated_PKSICs_D) predicate, which is related to each ofits primary key attribute.— predicates used to store the related SICs for the key attributes of Relationship_Type:associated_FKSICs_I(Relationship_Type, PKA tt. SIC_Name_Set)associated_FKSICs_D(Relationship_Type, PKA tt, SIC_Name_Set)— predicates used to store the related SICs for the key attributes of Entity_Type:associated_PKSICs_I(Entity_ Type, SIC_Name_Set)associated_PKSICs_D(Entity_Type, SIC_Name_Set)• The predicate foreign_key(Rship Type, EntOcc, ForeignEntType, FKAtts_val’ue) isused in the relational model to get the foreign key value FKAtts_value of EntOccconnecting with an occurrence of ForeignEnt Type in the relationship Rship Type.• The predicate foreign_ent_occ(RshipType,E,F) is used in the relational model toevaluate whether the primary key of an F occurrence appears as a foreign key inan occurrence E because of the relationship type, Rship Type. The Rship Type isneeded because there may be more than one relationship type between two entitytypes. The entity types with which E and F participate in Rship Type need notbe explicitly included in this predicate. From the relationship name Rship Type,the reference to relationship_hidden_entity(Rship Type, EntTypel, EntType2, Cmin,Cmax) would show that E takes the role EntTypel and F takes the role of EntType2. The exact functioning of this predicate “foreign_ent_occ(RshipType,E,F)”is as follows.Rship Type must be instantiated.Suppose that we have relationship_hidden_entity(Rship Type, EntTypel, Ent Type2,Cmin, Cmax).(1)Suppose that both E and F are instantiated. (la) Suppose the entity type of E,EType, is the same as EntTypel and the entity type of F, FType, is the same asEntType2. This predicate checks whether the foreign key value of the occurrence Eand the corresponding key attributes of the occurrence F are equivalent. It returnsfalse if they are not equivalent or if the corresponding key attribute of the occurrence E is null. (ib) Suppose either EType is different from EntTypel or FType isdifferent from EntType2, or both are different. It traverses a specialization hierarchy to find the occurrence in EntTypel corresponding to E and the occurrence inEntType2 corresponding to F, and then checks them.(2) Suppose that F is uninstantiated, E is instantiated. It only returns F with theAppendix B. Summary of the Predicates used in this Research 209value of the occurrence in EntType2 whose primary key appears in E as a foreign keyowing to Rship Type. No traverse on specialization hierarchy would be performed.(3) Suppose that Fis instantiated, Eis not. It returns those E (Es when backtracking) taking the role of EntTypel connecting with F taking the role of EntType2.• The predicate which_foreign (EntType, RoleTypel, Role Type2, EntOcci, EntOcc2,EntOcci, FEntOcci) is used to check which of entity types, Role Typel or Role Type2,should be Ent Type where the relationship type is hidden, and assign these entity occurrence variables properly. This predicate is used to reduce the possible predefinedsub-SICs in the relational model since we handle the entity links in specializationhierarchy implicitly and for some SICs (e.g., Subset_Relationship SIC, Exclusive_Occurrence SIC) the information about the corresponding involved pairs ofsharing entity types are stored in some order.EntOcci and EntOcc2 are the occurrences playing Role Typel and Role Type2, respectively. EntType is the entity type where the relationship between Role Typeland Role Type2 is hidden. EntOcci is an occurrence of EntType, FEntOcci is thecorresponding foreign entity occurrence. When calling this predicate, Ent Type,Role Typel, RoleType2 should be already instantiated. Either the pair of EntOcci,EntOcc2 or the pair of EntOcci, FEntOcci is already instantiated, but not both.This predicate performs the following:(1) if EntType=RoleTypel, EntOcc2 should be the foreign entity occurrence, thenunify EntOcci with EntOcci, and unify FEntOcci with EntOcc2 (that is, (la) ifEntOcci, EntOcc2 are already instantiated, then set EntOcci=EntOccl,FEntOcci=EntOcc2, (ib) if instead, EntOcci, FEntOcci are already instantiated,then set EntOccl=EntOcci, EntOcc=FEntOcci);(2) conversely if EntType=RoleType2, EntOcci should be the foreign entity occurrence then unify EntOcci with EntOcc2, and unify FEntOeci with EntOcci;(3) otherwise, return false.• The predicates associated_FKSICs_I(Entity_ Type, FKAtt, SIC_Name_Set) and associated_FKSICs_D(Entity_Type, FKAtt, SIC_Name_Set) are used in the relationalmodel to store related SICs for updating the foreign key attribute FKAtt of Entity_ Type.Appendix CBNF Descriptions of the Simplified FormatThis appendix provides a summary of the syntax of the simplified format that is used inthis research to represent the preconditions and predicates of a general SIC elicited fromthe database designer. Since this description is similar to the precondition and predicatecomponents of the SIC Representation model, some definitions refer to Appendix A. TheBNF’ meta-symbols and escape symbols used here are the same as those of Appendix A.<SIC description> ::= if [ with_respect_to <focused entity type names>,]<assertions> [previously]then [with_respect_to <focused entity type names>,]<assertions> [before]if [ with_respect_to <focused entity type names>,]<assertions> [previously]then <SIC description>I <assertions><focused entity type names> ::= <entity type name>(<entity type name>, <entity type name>)<assertions> : : = [<quantifier> <variable> <and_connective>]<assertion_and_list> [<or_connective> <assertions>]Note that the BNF definitions of or_connective and and_connective are the same asin Appendix A.<quantifier> ::= VI 3 [<numerical modifier>] [different]<numerical modifier> ::= at_least <some number>at_most <some number>I exactly <some number><some number> ::= 1 2 3Note that they are non-negative integer numbers.210Appendix C. BNF Descriptions of the Simplified Format 211<assertion_and_list> : : = <expression> [<and_connective> <assertion_and_list>]Note that this BNF definition is the same as in Appendix A except for the underlying definition of expression, which is redefined as below.<expression> ::= (<assertions>)<not> (<assertions>)<entity type name> <relationship type name> <entity type name>I <EntRship type name> is.to_be_deleted<attribute type name> is_to_beiipdatedI <logical predicate><set expression><arithmetic expression><date expression>Entity type name, relationship type name and attribute type name have the samename convention — beginning with a capital letter — as those in Appendix A. Thedefinitions of not and date expression are the same as those in Appendix A. Thedefinitions of set expression and arithmetic expression are almost the same as thecorresponding ones in Appendix A except for their underlying definitions of variableand arithmetic subterm, which will be redefined below. The major differences arethat neither subscript nor relation type name is used; and the new and old functions are only applicable to <attribute type name>. Only some of the logical predicates in Appendix B are applicable here. These are, satisfy_datatype(<attributetype name>, Data_Type), satisfy_format(<attribute type name>, Format), satisfy_val’ae(<attribute type name>, Range), is_null(<attribute type name>),is_not_null(<attribute type name>), false, and belongs_to(Element, Set), etc. Because subscripts are not used, the simplified format uses a special predicatennique(<attribute type name>{,<attribute type narne>}), which is not used in theSIC Representation model (so, it is not in Appendix B). It checks whether a particular attribute or combination of attributes does not actually contain duplicatevalues in a database. When the SIC is represented in the SIC Representationmodel, this special predicate will be replaced with a count function and appropriate subscripts will be attached to its associated entity (by applying the algorithmsin Appendix H.2).<EntRship type name> ::= <entity type name><relationship type name><variable> <EntRship type name> <other_variable_except_ER><other_variable_except_ER> : : = <attribute type name><Prolog variable>Appendix C. BNF Descriptions of the Simplified Format 212<arithmetic subterm> : = <numeric constant><other variable value>I <aggregate function>(<attribute variable value>)count ( < EntRship variable value>)<aggregate function> (<set expression>)(<arithmetic simple expression>)[<exponential operator> (<arithmetic simple expression>)]The definitions of numeric constant, aggregate function, arithmetic simple expression, exponential operator are all the same as in Appendix A. The main difference ofthe definition of arithmetic subterm is that we use EntRship variable value insteadof EntRshipRel variable value.<other variable value> ::= <other_variable_except_ER>new(<attribute type name>)I old(<attribztte type name>)<EntRship variable value> ::= <EntRship type name>Note that the new and old functions are not applicable here. We can directly useEntRship type name instead of defining EntRship variable value. The reason fordefining this is to compare it with Appendix A.Appendix DSIC Type Classification in the E-R-SIC ModelThis appendix provides a classification of SIC types used in the E-R-SIC model. Although the listing is not comprehensive, it covers different SIC types that can occur invarious contexts of an E-R diagram, and also includes those SICs mentioned in the literature. We can use these SIC types to write generic SICs and store some consistency ornonredundancy rules for them in the elicitation subsystem.Notation The following notation will be used in examples.• E, F, G, H, I, ... are used to denote entities.• R, RX, RY, RZ, Ri, R2, R3, ... beginning with “R” are used to denote relationships.• E.A, E.Ai, E.A2, F.B, F.Bi, F.H2, R.A, ... are used to denote attributes ofentities or relationships. Gurrent_time is the system variable to monitor the currentclock.• v, vi, v2, ... are used to denote values.• comp_op, comp_opi, ... are used to denote=, , , , <,>.• ariop, ari_opi, ... are used to denote +, —, x,• agg_fcn, agg_fcnl, ... are used to denote the aggregate functions, e.g., sum, avg,mm, max, count.SIC Types There are four categories of SIC types.• SIC Types Focusing on Attributes of a Single Entity or RelationshipType.213Appendix D. SIC Type Classification in the E-R-SIC Model 214• SIC Types Focusing on a Relationship as an Association. These are necessary conditions for the existence of one or more relationships because relationshipsare “associations” linking entities. Different contexts, line, star, ioop-2, and loop-nshould be considered when identifying SIC types.• Sufficient Conditions for the Existence of Relationship(s). These are sufficient conditions for the existence of one or more relationships because relationshipsare “associations” linking entities.• SIC Types for Entities without Explicit Relationships. There are someSICs for a group of entities because of some “implicit relationships”.Each of them is described in detail as below.• SIC Types Focusing on Attributes of a Single Entity or RelationshipType.1. Domain Constraint: declares the value; extended format; nonvolatility104data type of an attribute of an entity or relationship type; whether it can benull; and if it should be unique.It is possible to specify this kind of constraint for an “implicit entity or relationship subtype”. For example, a Subtype_Nonvolatility Constraintrequires that an attribute of any occurrence of the specified “implicit entityor relationship subtype “ be not changeable. We may also require that anattribute have some value range if there are restrictions on the values of otherattribute(s), e.g., Conditional_Value SIC such as:if E.Ai compopi vi, . .. then E.A2 comp.op2 v2.Constraints on the attributes of an entity or relationship occurrence could bemore complicated. In that case, we would have the following.2. Formula SIC: requires that a formula105 exists among attributes of any occurrence of the specified entity or relationship type. For example, we mayhave:(E.Ai) comp_op ((E.A2) ari_opi (E.A3,) ari_op2 (scalar_value)),...If the formula involves non-numeric data-types, string predicates may beneeded.104“Nonvolatility” is interpreted in a strict sense. If an entity is inserted with some null attributevalues, they have to stay null if “nonvolatile” is declared for them.‘°5Recall that if there is more than one attribute appearing in an expression, we call that expressionas a formula.Appendix D. SIC Type Classification in the E-R-SIC Model 215There may be some conditions on a formula requiring that it hold only for an“implicit entity or relationship subtype”. For example, we may have:if (E.A1) comp_opi ((E.A) ari_opi (E.A3) ari_op2 (scalar_value)),...then (E.A4) compop2 (E.A5) ariop3 (E.A3)3. Aggregate_Attribute SIC: restricts the aggregate attributes of the wholeentity or relationship set, for example, agg_fcn(E.A) comp_op v.If the data type of the attribute E.A is numeric, we may have:aggfcn(E.A) comp_op E.A, such as, “any salary value of an employee mustnot be more than 5 percent greater than the average of all such values”.Some complicated cases may occur. For example, there may he a SIC (calledAggregate_Attribute_Formula SIC) requiring some formula among thevalues of aggregate attributes of the entity or relationship set. There may bea restriction (called Interdependent_Aggregate_Attributes SIC) on theaggregate value of an attribute if some other attributes have some aggregatevalues, e.g.,if (aggfcn1(E.A3) compop2 v2,), ... then (aggfcn2(’E.A3) compop3 uS)It is also possible only for occurrences of the specified “implicit entity or relationship subtype” that there is a restriction on an aggregate value (calledSubtype_Aggregate_Attribute SIC), an aggregate attribute formula (calledSubtype_Aggregate_Attribute_Formula SIC), or the requirement of thecoexistence of some aggregate attributes (called SubtypeJnterdependent_AggregateAttributes SIC).4. Composite_Attribute_Unique Constraint: requires the concatenated values of some attributes (which form a composite key) of an entity type to beunique.5. Old_New Transitional Constraint: restricts the pairs of values before and’after an update of an attribute of the specified entity or relationship type. Itis also possible to have this kind of constraint for a specified “implicit entityor relationship subtype”.6. Primary_Key SIC: requires enforcing a set of SICs related to deletion andinsertion of an entity or relationship when an attribute, which is (part of) theprimary key, is updated.7. Deleted_Object_Attribute SIC: restricts the values of attributes of anyoccurrence of the specified entity or relationship type before this entity orrelationship occurrence can be deleted. For example, “a project can be deletedonly if its budget is zero”. It is an operation-dependent SIC that is relevantonly to the deletion operation. It is also possible to have this kind of constraintfor a specified “implicit entity or relationship subtype”.8. Absolute Maximum Cardinality Constraint of an Entity Type: restricts the maximum number of occurrences of an entity type that can exist inAppendix D. SIC Type Classification in the E-R-SIC Model 216a database. If it is infinite, it is not a restriction. The notation ““ is used todenote either the infinite, or the case that it is not infinite in the mathematicalsense, but there is no restriction on the maximum cardinality.9. SIC Involving Current_Time: either restricts a time-valued attribute ofthe specified entity or relationship type within some range relative to Gurrent_Time; places restrictions on other attributes when time-valued attribute(s)satisfy some condition(s) relative to Current_Time; or requires the propagationto update the value(s) of some attribute(s) of the specified entity or relationship type while updating the Currenttime.It may only occur in an environment in which the event that causes manipulation of the involved database object(s) is processed in real time so thatthe Current_time in the computer matches the event time in the real world.Otherwise, Current_time in all of the above must be replaced by a time-valuedattribute that records the external event time, and these constraints becomeordinary data-driven constraints.— Attribute SIC with Current_Time Restriction:(a) Restriction on a time-valued attribute relative to Current_Time: Fourexamples are given here to illustrate the various cases.* Cases 1 and 2— the increase of Current_time will never violate theSIC under the assumption that the current database is semanticallycorrect:i. Case 1 the time-valued attribute represents a past fact relative to Current_time: If “Employee.FirstWorkDate” is the firstdate of Employee working on related jobs, Currenttime (Employee.First WorkDate + “2 years”) is an expression to assert that“each employee must have at least 2 years working experience”.ii. Case 2 the time-valued attribute represents a future expectancyrelative to Current_time: If the “Part.ExpectedArrDate” is the expected arrival date of a Part ordered from Suppliers, Current_time(Part.ExpectedArrDate— “10 days”) is an expression to assertthat “the expected delivery time of a part must be no more than10 days”.* Cases 3 and 4 — the increase of Current_time may violate the SIC:i. Case 3 the time-valued attribute represents a past fact relativeto CurrenLtime: We may have an expression Current_timeAccount_Receivable.Date + “10 years” to assert a rule that “anaccount_receivable cannot be older than 10 years”.ii. Case 4— the time-valued attribute represents a future expectancyrelative to Current_time: If Drug.Expiration_Date is the date thata Drug will expire, Current_time < (Drug.ExpirationDate— “2Appendix D. SIC Type classification in the E-R-SIC Model 217weeks”) is an expression to assert a rule that “a drug must bevalid for at least 2 weeks”.(b) Restriction on other attributes when time-valued attributes satisfysome conditions: When time passes, the restriction will either disappear or come into force. For example,if Current_time (Employee.HireDate + “6 months”)then old(Employee. Salary) > new(Employee. Salary)is an expression to assert a rule that “an employee cannot receive asalary raise during his(her) first 6 months in the company”.Another example is: “if an employee has worked for at least two years,he (she) must have at least 10 vacation days”.— Current_Time_Triggered SIC on Attributes: It triggers some operation while updating the Current_time. For example, at 0:00 on 1/1/1993,increase the salary of each employee by $1, 000.10. Temporal Conditions Among Attributes: requires that some attribute(s)of an occurrence of the specified entity or relationship type must satisfy acertain condition, at the time one of its other attribute is going to acquiresome value. For example, we may have: if E.A1 comp-op vi then E.A2 compop v2 before. Alternatively, this type of SIC may require that if its otherattribute(s) satisfied a certain condition in the past (and no longer satisfies itnow), one of its attributes must take some value. For example, we may have:if E.A1 comp-op vi previously then E.A2 comp-op v2.• SIC Types Focusing on a Relationship as an Association.1. Incidence Constraint: allows a relationship occurrence to exist only if theparticipating entity occurrences exist.2. One_Side_Necessary_Condition for a Relationship Type: restricts theexistence of any occurrence(s) of the specified relationship type because thereare conditions on one of its participating entity types. Different layouts of E-Rdiagrams (line, star, loop-2, or loop-n contexts) may have different conditions.Some examples include:— Relationship_Depends_on_Relationship SIC: requires that for a relationship occurrence to exist, one of its participating entity occurrencesmust participate in the other relationship type. It may happen in line,star, or loop-2 contexts. For example, in a star context, Figure 5.3 (onpage 111), we may have:if E RXFthen ERYGorif E RXFthen (ERYG) V (ERZH)Appendix D. SIC Type Classification in the E-R-SIC Model 218Note that stating “if RX then RY’ is equivalent to stating “RX only-ifRY’. An example is “Customers (F Use Products (E” (i.e., RX only-if“Factories (G) Make Products (E”(i.e., RY). Sakai ([1983b]) states thatRX is existentially dependent on RY and there is a hierarchical orderingof the entity types.— Relationship_Depends_on_Entity_Value SIC: requires that for a relationship occurrence to exist, one of its participating entity occurrencesmust have some attribute value(s) satisfying a specified restriction. Forexample, in Figure 5.2, we may haveif E RXFthen (E.Ai comp.opi vi) V (E.A2 comp.op2 v2)— Exclusive_Relationship SIC: requires that given two relationship types,for a relationship occurrence to exist, one of its participating entity occurrences must not participate in the other relationship type. These tworelationship types RX and RYare exclusive (denoted as RXIRY), and arecalled excluding relationships relative to the entity ([Palmer, 1978], [Kozaczynski and Lilien, 1988]). It may happen in any context. For example,in the context of Figure 5.2, we may have:if E RXFthen —i(E RY G)An example is where a storeroom is used either for storing raw materialsor for storing finished goods, but never for both.— Not_And_Relationships SIC: requires that given a group of relationship types with a sharing entity type, for a relationship occurrence toexist, its participating entity occurrence of the sharing entity type mustnot participate in all of the other relationship types. That is, no occurrence of the sharing entity type can participate in all the relationshiptypes. It can only happen in a star context, e.g., in Figure 5.3, we mayhave:if ERXFthen ((E RY G) A (E RZ H))If there are only two relationships involved, it reduces to anExclusive_Relationship SIC.— Some constraints are special because of temporal conditions:* Relationship_Before_Relationship SIC: requires that for a relationship occurrence to exist, one of its participating entity occurrencesmust participate in another relationship type at the time it is goingto participate in this relationship type. For example, in Figure 5.2,we may have:if ERX Fthen E RY G beforeAppendix D. SIC Type Classification in the E-R-SIC Model 219It requires that RY must exist at the time RX is going to be inserted. However, there is no restriction on the deletion of RY afterRX exists. A Relationship_Depends_on..Relationship SIC requires that when RX is inserted, RY exists (that is, RX and RY canbe generated concurrently) and if RY is deleted, RX must be deletedtoo.* Entity_Attributei3efore_Relationship SIC: requires that for arelationship occurrence to exist, the attributes of one of its participating entity occurrences must satisfy some conditions at the timeit is going to participate in this relationship type. For example, inFigure 5.2, we may have:if ERXFthen ((E.Ai compop vi) V (E.A2 comp_op2 v2)) beforeAn example is that “for a person to participate in a life-insurancerelationship, his (her) age must be less than 65”. Note that we donot care what attribute values this entity occurrence will have after itparticipates in the relationship.* Relationship_Not_Before_Relationship SIC: requires that for arelationship occurrence to exist, one of its participating entity occurrences must pç participate in another specific relationship type atthe time it is going to participate in this relationship type.3. Two_Side_Necessary_Condition for a Relationship Type: restricts theexistence of any occurrence(s) of the specified relationship type because thereare conditions on both participating entity types at the same time. Different layouts of E-R diagrams (line, star, ioop-2, or loop-n contexts) may havedifferent conditions. Some examples include:— Pair_Relationships SIC: requires that for a relationship occurrence toexist, if one of its participating entity occurrences participates or does notparticipate in some specific relationship type, the other participating entity occurrence must or must not participate in other specific relationshiptype(s). For example, in Figure 5.2, we may have:if ERXFthen if E RY Gthen FRZH— Pair_Values SIC: requires that for a relationship occurrence to exist,if one of its participating entity occurrences has certain attribute values,the other participating entity occurrence must have some specific attributevalues. For example, in Figure 5.2, we may have:if E RXFthen if (E.A compopi vi)Appendix D. SIC Type Classification in the E-R-SIC Model 220then (F.B compop2 v2)For example, if Employee Is_Allocated Car thenif Employee.Rank= “president” then Car.Brand= “BMW”.— ID_Dependency_Relationship SIC: requires that if an entity type Eincorporates the primary key (e.g., F.Fkey) of the other entity type F aspart of its candidate key (because of a relationship type implying an “IDdependency” or because of an is_a, or component_of special relationshiptype), for an occurrence of the relationship type to exist, its participatingentity occurrences should have a condition: E.Fkey=F.Fkey.— Pair_Condition SIC: requires that for a relationship occurrence to exist,there are some conditions involving the attributes of both of its participating entity occurrences. For example, in Figure 5.2, we may have:if E RX Fthen (E.A1 comp.opl (F.B ariop E.A2)),— Subset_Relationship SIC: requires that for an occurrence of the specified relationship type to exist, its two participating entity occurrencesmust be connected via another relationship type ([Palmer, 1978]). Thatis, RX C RY, implying that RX occurrences are included in RY occurrences, or stated in a different way, RX only-if RY. It can only happen ina ioop-2 context. For example, Owns C Entitled-to-Drive implies that ifa person owns a car, he (she) must be entitled to drive it.— Exclusive_Occurrence SIC: requires that for an occurrence of the specified relationship type to exist, its two participating entity occurrencesmust be connected together via any occurrence of another relationship type ([MaFadden and Hoffer, 1988]). It is a special case, but weakerthan Exclusive_Relationship SIC.— Relationships_Union Special SIC: requires that for an occurrence ofthe specified relationship type to exist, its two entity occurrences must beconnected together via any one of other relationship types. It can onlyhappen in a loop-2 context and is a necessary condition for the union ofrelationships, e.g., RX = RY U RZ. The semantics that one relationship(e.g., RX is the union of the remaining relationships imply this type ofSIC and Subset_Relationship SICs (e.g., RY RX, etc.).— Relationships_Intersection Special SIC: requires that given a groupof relationship types in a set SuperRshipSet and another relationship typeSnbRship, for each of the relationship types in SnperRshipSet to have anoccurrence connecting the same pair of entity occurrences, there shouldbe an SubRship occurrence connecting this pair of entity occurrences. Itcan only happen in a loop-2 context and is a necessary condition for theintersection of relationships, e.g., RX = RYnRZ. The semantics that onerelationship (e.g., RX) is the intersection of the remaining relationshipsAppendix D. SIC Type Classification in the E-R-SIC Model 221imply this type of SIC and Subset_Relationship SICs (e.g., RX C RY,etc.).— Relationships_Join SIC: requires that if there is a linking path viasome relationship types to connect two entity occurrences together, thosetwo entity occurrences must be connected via another relationship type.It can only happen in a loop_n context and is a necessary condition forrelationship composition, denoted by, e.g., RX = RY N RZ ([Lenzeriniand Santucci, 1983], [Azar and Pichat, 1987]). For example, in Figure 5.5,we may have:if(ERYH)A (HRZF)then ERXFIn fact, this is another special case of Pair_Relationships.— Relationship_Depends_on_LoopN_Relationships SIC: requires thatfor a relationship occurrence to exist, there is a linking path via other relationship types to connect its participating entity occurrences together.It can only happen in a loopn context and is another necessary conditionfor the composition of relationships. For example, in Figure 5.5, we mayhave:if E RXFthen(ERYH)A (HRZF,)Sakai [1978] states that if RX is transitively dependent on RY and RZand it lacks (nonkey) attributes, it is redundant and could be eliminated.However, Segev [1987] argues that some relationships that appear to beredundant may, in fact, carry semantic information and thus cannot beeliminated without losing information content, and even if it is redundant,there may be a choice of which relationships to eliminate and that thischoice should be made during physical design.4. Intra_Condition for a Relationship Type: restricts the existence of anyoccurrence(s) of the specified relationship type because of other occurrencesof the same relationship type. For example, we may have the following.— Symmetry and Transitivity Properties of a Relationship: In thefollowing, entity types E and F should be in a specialization hierarchy.A relationship type R is symmetric if the following condition is true: “ifany occurrence Occi of an entity type E is related via an R occurrence toan occurrence Occ2 of the other entity type F, then an occurrence of theE type, corresponding to the same super-type occurrence of Occ2, is alsorelated via another R occurrence to an occurrence of the F type, corresponding to the same super-type occurrence of Occi.” Some examples ofsuch relationship types are: Sibling_of, Married_to, Partner_of.A relationship type R is transitive if the following condition is true: “ifany occurrence Occi of an entity type E is related via an R occurrence toAppendix D. SIC Type Classification in the E-R-SIC Model 222an occurrence Occ2 of the other entity type F, and an occurrence of the Etype, corresponding to the same super-type occurrence of Occ2, is relatedvia a second R occurrence to an occurrence Occ3 of the F type, then theOcci is also related via a third R occurrence to the Occ3.” Some examples of such relationship types are: Sibling_of, Ancestor_of, Supervise,Partner_of.— Relative Maximum Cardinality Constraint: restricts the existenceof a relationship occurrence because an occurrence of one its participatingentity type can only participate in some maximum number of occurrencesof the same relationship type.This restriction may only apply to a specified “implicit entity subtype”(it is a SubtypeRelative_Maximum_Cardinality Constraint). Thisrestriction may become weaker because there are further conditions onthe other entity type that relates to the specified entity type via the relationship type (it is a Weaker_Relative_Maximum_Cardinality Constraint). There may also be a SIC (that can be calledS ubtype_WeakerJtelative_MaximumCardinality Constraint) including both the complicated cases.5. Group_Relationships SIC: requires that if occurrences of a group of relationship types exists, there must be some conditions on the values of attributesof those relationships or their sharing entity. For example, in Figure 5.2 or5.3, we may haveif(ERXF)A (ERYG)then (E.A1 compop (RY.B1 ariop RX.B2))Or, they may depend upon many other relationships, for example, we mayhave:if(ERXF)/\ (ERYG)then (E RZ H) V (E RS I) V• Sufficient Conditions for the Existence of Relationship(s): requires thatgiven some conditions, one or more occurrences of the specified relationship type(s)must exist. It includes some special cases that are often mentioned in the literature.1. Totality Constraint: requires that if an entity occurrence exists, it mustparticipate in some minimum number of occurrences of the specified relationship type. A relationship type is total to one entity type if each occurrenceof the entity type must participate in at least one relationship occurrence106.It is desirable to know an exact number (e.g., 1, 2, 5, etc.) as the relativeminimum cardinalities for entity types if there exist such SICs. The minimumcardinality 0 is not a SIC.‘°6Some researchers use by other terms, e.g., [Palmer, 1982] states that the relationship is “mandatory”,and [Kim et al., 1987] states that the relationship is “obligatory”.Appendix D. SIC Type Classification in the E-R-SIC Model 223A relationship type may be only total to some specified “implicit entity subtype”. That is a Subtype_Totality Constraint. A total constraint requirement may become stronger because there are some conditions on the otherentity type that relates to the specified entity type (it is a Stronger_TotalityConstraint). There may also be a SIC (that can be calledSubtype_Stronger_Totality Constraint) including both complicating cases.There may also some further restrictions on the relationship occurrences. Forexample, a Weak_Entity SIC requires that a relationship type, via which theweak entity type is dependent upon a regular entity type, is total to the weakentity type and its key is fixed. A Critical_Relationship_Occurrence SICrequires the totality constraint to the specified entity type and the existenceof exactly one critical relationship occurrence.2. Completeness_Mapping SIC: requires that if an entity occurrence exists,it must be related to occurrences of the other entity type via the specifiedrelationship type ([Webre, 1983]).A relationship type may be only complete to some specified “implicit entitysubtype”. That is, Subtype_Completeness_Mapping SIC. A completemapping requirement may also become weaker because there are some conditions on the other entity type that relates to the specified entity type (it isa Weaker_Completeness_Mapping SIC). There may also be a SIC (thatcan be called Subtype_Weaker_Completeness_Mapping SIC) includingboth complicating cases.3. Either_Existence_Relationship SIC: requires that if one entity occurrenceexists, it should participate in at least one (or some specified number of) occurrence(s) among a group of relationship types. For example, in a star context,Figure 5.3, each E occurrence may be required to participate in at least onerelationship occurrence among the relationship types RX, RY and RZ.There are some variants of this type, e.g., the cases considering an “implicit entity subtype”, different quantitative requirements on relationship occurrences,or having further restrictions. For example, in Figure 5.3, each E occurrencemay be required to participate in at least either one RX occurrence, two RYoccurrences or four RZ occurrences.4. Relationship_Trigger_Relationship SIC: requires that if some relationships existed in the past (and no longer exist now), other relationships mustexist. For example, in Figure 5.2, we may have:if E RY G previonslythen ERXF5. Entity_Value_Trigger_Relationship SIC: requires that if the value(s) someattribute(s) of an entity occurrence satisfied a certain condition in the pastAppendix D. SIC Type Classification in the E-R-SIC Model 224(and no longer satisfy now), the entity occurrence must participate in someminimum number of occurrences of the specified relationship type. For example, we may have:if E.Al comp-op vi previouslythen ERXF• SIC Types for Entities without Explicit Relationships.— SIC Types for Entities in a Specialization Hierarchy.1. Exclusion between Entity Types: requires two entity types to beexclusive.2. Entities_Intersection Special SIC: requires that given a group of entity types El, E2, ..., En having a common candidate key Ekey, for eachof El, E2, . . . EnJ entity types to have any occurrence with the samecandidate key value, there should be an occurrence with this candidatekey value in the specified entity type En. That is, “if (Ei.Ekey= Value) A(E2. Ekey= Value) A ... then En. Enkey= Value “. It is a necessary conditionfor En = El fl E2fl3. Entities_Union Special SIC: requires that given a group of entity typesEl, E2, ..., En having a common candidate key Ekey, for any occurrencein the specified entity type En, there should be at least one correspondingoccurrence with the same candidate key value in one of other entity types.That is, “if E.Ekey_— Value then (El.Ekey=Value) V (E2.Ekey= Value) V“. It is a necessary condition for En = El U E2U— Association Abstraction SIC: requires that a set occurrence cannot bedeleted if it is not empty; two kinds of derived attributes in a set— theindexing attribute and the key of the indexing entity type — cannot be updated; and some formulas involving aggregate functions on attribute(s) of themember entity type although there is no implicit relationship type member_of.Appendix EExamples of HeuristicsThis appendix presents some heuristics that can be used to reduce the effort required touse the SIC elicitation subsystem. The listing shows only some examples; it is far fromcomplete.1. Two heuristics may be applied to elicit oldnew transitional constraints.• If an attribute is numeric, the subsystem may inquire whether it is monotonically increasing or decreasing on update.• If an attribute has an enumerated value set, there may be update transitionsthat are not allowed. For example, the value of Marriage status cannot beupdated from “single” to “widowed”, or divorced”, nor conversely.2. An attribute with the “date” data type may be unchangeable. More precisely, weshould state that historical dates are usually unchangeable. However, the subsystemneeds to know the meaning of the date in order to know whether it is historical.3. Based on domains of attributes, the subsystem may suggest some possible entitySICs, e.g., Formula SICs, to the database designer. For example, if an entityhas several attributes with the same domain “money”, there may be a formulaSIC involving these attributes. However, without application domain knowledge,the subsystem would have to ask the database designer to confirm whether such aconstraint is needed and provide the necessary details.4. The subsystem may have a lexicon containing the names of common relationshiptypes with symmetry or transitivity property. However, the subsystem still needs toask the database designer to confirm it. For example, the relationship “Married_to”is usually symmetric. However, if it is between two entity types Woman and Man(or Wife and Husband), it would not be symmetric. Another example is that therelationship Supervise (or Manage) is usually transitive. However, if the databasedesigner in fact defines the relationship to be “directly supervise (or manage) “, itwould not be transitive.5. A heuristic to detect excluding relationships or time sequencing between relationships is:225Appendix E. Examples of Heuristics 226If two relationships, which are adjacent in an E-R diagram, are named by thedatabase designer using the same verb phrase, they usually imply exclusion oreven time sequence among them. For example, the names of the relationships,Customer..OwnsCar, Dealer.Owns..Car are created by using the same verb phrase“Owns “.If those relationships (e.g., Customer_Owns_SavingAccourit andCustomer_Owns_CheckingAccount) are not exclusive nor have time sequencing, itmay be the following case.6. A heuristic to detect a Groupd{elationships SIC is:Suppose that an entity type is connected to a group of relationship types. Examineattribute names, but omit their prefixed entity or relationship names. If there isa common attribute name other than Id or Name among either (1) the sharingentity type and these relationship types or (2) the sharing entity type and its related other entity types, it is likely that there is a formula between those attributes.An example is Inventory. Qty— sum (Supply. Qty)— sum(Sales. Qty) in the contextof two relationship types, Supply and Sales, sharing the same entity type, Inventory. Another example is Customer.AccountBalance = SavingAccount.Balance +CheckingAccount.Balance in the context of a Customer owning two entity types,SavingAccount and CheckingAccount.Appendix FVerification of Aggregate Attribute SICs and CardinalitiesThis appendix discusses the extent to which we can verify SICs involving aggregateattributes, and provides algorithms to verify absolute and relative cardinalities given bythe database designer.F.1 Simple Tests on Aggregate Attribute SICsSuppose that we have some aggregate attribute SICs, each of which is a simple assertionto restrict the aggregate value of an attribute. At best, we can only have some simpletests, such as:Suppose E.A is an attribute between vi and v2; avg(E.A) is specified to bebetween MinAvg and MaxAvg; sum(E.A) is specified to be between MinSumand MaxSum; the maximum absolute cardinality of E is specified as Abs C-max. The following should be true:vi MinAvg MaxAvg v2;vi MinAvg MinSum MaxSum (AbsCmax x MaxAvg) (AbsCmaxx v2).If there is a SubtypeAggregate_Attribute SIC on E.A, the followingshould be true:vi subtype’s mm subtype’s max v2;subtype’s MaxSum < MaxSum;subtype ‘s count(E) < Abs CmaxThe general Interdependent..Aggregate_Attributes, Aggregate_Attribute_Formula, SubtypeAggregate_Attribute..Formula, etc. cannot be verified conceptually.227Appendix F. Verification of Aggregate Attribute SICs and Cardinalities 228F.2 Algorithms for Verifying Cardinalities1. Verification of “traditional” relationship cardinalities:• A relative minimum cardinality of 0 is not a constraint. Therefore, if a SICexists, for each involved entity type in a relationship, its relative cardinalitiesmust be:1 relative minimum cardinality relative maximum cardinality.• Suppose that the relative maximum cardinality of entity type E through arelationship R to entity type F is Cmax, and absolute maximum cardinalityof F is Ab..max. Then, Cmax Ab.max.• A set of cardinality constraints may be inconsistent in the loop-2 and loop-ncontexts.Given a group of relationship and entity types with their relative cardinalities,do the following check.(a) Associate each entity type Ei with a special variable Ei#.(b) For each relationship type Ri, do the following. Suppose that Ri connectstwo entity types Ej, Ek; and the minimum and maximum cardinalities ofEj and Ek in Ri are, respectively, (Cminj, Cmaxj) and (Cmink, Cmaxk).That is,Ej(Gminj, Gmaxj) Ri Ek(Gmink, Gmaxk,).— If Cmaxj is not ““ and Cmink is not 0, construct the following inequality:Cmaxj x Ej// Cmink x Ek#— If Cmaxk is not “<“ and Cminj is not 0, construct the following inequality:Cmaxk x Ek# Cminj x Ej(c) Solve all the above inequalities for the special variables Ei#’s under therestriction of each Ei# > 0.(d) If there is no solution, the cardinalities are inconsistent07.107 Observe these inequalities. If the cardinaiities are consistent, the designed database can be populated. Then there is at least one solution for these inequalities: assigning to each Ei# the valuecorresponding to the number of occurrences of Ei in the database. Formally, Lenzerini and Nobili([1987]) have proved that there exist solutions for all those inequalities if and only if the cardinalitiesare consistent. Note that their inequalities contain another set of variables R# because they prove thatit is true for any degree of relationships (not restricted to binary relationships). However, in the case ofAppendix F. Verification of Aggregate Attribute SIGs and Cardinalities 229• Cardinality constraints can interact with other SICs. A special interaction iswith a Subset_SIC. Suppose we have relationship type “RX C RY” betweenthe entity types, E and F. Then, for either E or F, both minimum cardinalityand maximum cardinality in RX should he less than or equal to those in RY,respectively. If either E or F has (1,1) in both RX a.nd RY, RX and RY wouldcontain the exactly same occurrences108.They should be merged and a newrelationship type will be created.• Verification of Completeness_Mapping SIC: Suppose that a relationshiptype R is specified to be complete relative to an entity type E. Certainlyit is also complete relative to its other entity type F by the symmetry ofthis SIC. The relative maximum cardinality of E in R must be equal to theabsolute maximum cardinality of F. Otherwise, they are inconsistent. Thereis a similar condition between the relative maximum cardinality of F andabsolute maximum cardinality of E.2. Verification of Stronger, Weaker, and Sub-type Cardinalities: When weconsider cardinalities with regard to Subtype_Totality Constraint (SubGmin),SubtypeRelative..MaximumCardinality Constraint (Sub.. Cmax),Stronger_Totality Constraint (Stronger..Cmin),Weaker_Relative_Maximum_Cardinality Constraint (Weaker.. Cmax),Subtype_Stronger_Totality Constraint (S’ub..Stronger Cmin), andSubtypeWeakerJ{elativeMaximumCardinality Constraint (Sub.. WeakerC’max), verification becomes complicated.• The following should be true if these cardinality constraints are not redundantand are consistent with each other.— For the same “subtype”,1 Cmin < Sub..Cmin SubCmax < Cmax— For the same “further restriction”,1 Stronger_Cmin < Weaker_Cmax,Stronger..Cmin < Cmin,Weaker_Cmax < Cmax.binary relationships, if we can assure that each pair of the relative maximum cardinality is greater thanor equal to relative minimum cardinaiity, those inequalities containing the R# variables can be reducedto our inequalities. Lenzerini and Nobili have also proposed another way to detect inconsistency forcardinajities by discovering cycles with special “weights” in an E-R diagram.‘°8Prove it by contradiction as follows. We already know that RX C RY. Now suppose that entity typeB has (1,1) in both relationship types, RX and RY. If an RY occurrence, via which an occurrence el ofB type connects with an occurrence fi of F type, does not belong to RX type, then el must connectanother occurrence f2 of F type via the relationship type RX. Otherwise, it would violate the totalityconstraint of RX to B. Then, by RX C RY, el must also connect with f2 via the relationship type RY.It would violate the maximum constraint of RY relative to E unless fi = f2. So, we would also have RYçRX.Appendix F. Verification of Aggregate Attribute SICs and Cardinalities 230— For the same “subtype”, and the same “further restriction”,Stronger_ Cmin< Sub_Stronger_ Cmin Sub_ Weaker_ Cmax< Weaker_ Cmax,SubStronger..Cmin < SubCmin,SubWeakerCmax < Sub.Cmax.• In the contexts of loop-s, loop-n, if cardinalities have been specified for “implicit subtype(s)”, the Lenzerini and Nobili’s inequalities should be tested byusing the proper cardinality values.— For example, in the context of ioop-2 where two relationship types RXand RY exist between two entity types E and F, the database designermay consider an implicit subtype of E, and give the Sub.Cmin, Sub_Cmaxof RX and RY relative to F, StrongerCmin, StrongerCmax of RX andRY relative to F. There may be potential inconsistencies between thesecardinalities.— Another example in the context of loop-n has a loop linking entity typesF, F, G and H, through four relationship types RX, RY, RZ and RS,consecutively. The database designer may consider implicit subtypes ofadjacent entity types, F and F. Now the proper values for testing theabove inequalities would be:Sub_StrongerCmin, Sub_WeakerCmax of RX relative to F and F,SubCmin, SubCmax of RY relative to F,Stronger_Cmin, Weaker_Cmax of RY relative to G,Cmin, Cmax of RZ relative to G, and H,Stronger_Cmin, Weaker_Cmax of RS relative to H,Sub_Cmin, SubCmax of RS relative to F.Appendix GConsistency and Nonredundancy Rules for SIC Elicitation SubsystemThe following rules could be stored in an elicitation subsystem for capturing SICs. Theserules are used to expedite the elicitation and verification procedure. By using them,the subsystem could sometimes avoid the need to invoke a sophisticated logic verificationalgorithm. In other cases, they eliminate the need to ask the database designer to confirmsome SICs that are “obviously” inconsistent or redundant. The listing here is illustrativerather than complete. In the following, E, F, G, H, I, ... are used to denote entity types.The R, Ri, R2, R3, ... are used to denote relationship types.1. If an attribute is declared to be non-changeable (i.e., a Nonvolatility constraint)there should be no other update SICs asserted for it.2. A composite key constraint is consistent with domain constraints if each componentattribute of a composite key is declared to be not-null.3. Incidence constraints are always needed and unlikely to be inconsistent with otherSICs. We need not verify them for consistency and non-redundancy.4. A Relationship_Depends_on_Relationship SIC could be redundant or couldbe reduced to another SIC if any involved relationship is total. If there is a negativecondition asserted for the involved relationships, specifying this type of SIC wouldbe inconsistent.• If Ri is total to E, the SIC, “if E Ri F then F R2 G”, is subsumed by thetotality constraint R2 total to E. On the other hand, if R2 is total to F,this SIC is trivially true.• The SIC relative to F, “if E Ri F then E R2 G”, or “if F R2 G then E RiF”, is inconsistent with an Exclusive_Relationship SIC, R1IR2, that is “ifF Ri F then -i(’E R2 G,)”.• If we have two SICs relative to F, “if F Ri F then E R2 G” and “if F Ri Fthen F R3 H”, they are inconsistent with either an Exclusive_RelationshipSIC— R2IR3, or a NotAnd_Re1ationship SIC — “-((E Ri F) A (F R2G) A (E R3 H))”.231Appendix G. Consistency and Nonredundancy Rules for SIC Elicitation Subsystem 232• If we have two SICs, “if E Ri F then E R2 G” “E R2 G then E R3 H”,they are inconsistent with a Not_And_Relationship SIC, “—i((E Ri F) A(E R2 G) A (E R3 H))”.• Both “if with_respect_to E, B Ri F then with_respect_to E, B R2 F” and “ifwith_respect_to F, E Ri F then with_respect_to F, E R2 F” become redundantif a SubsetJtelationship SIC, Ri C R2, i.e., “if B Ri F then E R2 F” hasbeen specified.• If we have further restrictions, they may he redundant because they may beimplied by another SIC. For example, the further restriction “(F R3 H)” inthe SIC, “if E R2 G then (B Ri F) A (F R3 H)”, is redundant if there isanother SIC to require “if B Ri F then F R’ H”.• If Ri is total to B, the SIC, “if B Ri F then (B R2 G) V (B R3 H)”, reducesto an Either_Existence_Relationship SIC “(B R2 G) V (B R3 H) “. ThisSIC is also subsumed by either of the following:— R2 total to B (i.e., V E, R2, E R2 G),— R3 total to E (i.e., V E, R3, E R3 H),— “if B Ri F then E R2 G”,— “if ER1 FthenER3H”.This SIC is inconsistent with a set of two Exclusive_Relationship SICsR1IIR2 j, R1WR3.5. An Exclusive_Relationship SIC, R1IIR2, is inconsistent with the set of totalityconstraints— both Ri.4 R2 are total to B. If RiIR2 and only one (say Ri) ofthe relationships is total to E, the database designer needs to disconnect the otherrelationship (i.e., R2) type from E since it becomes unrelated to B; otherwise, theyare inconsistent.6. An Exclusive_Occurrence SIC becomes redundant when an Exclusive_Relationship SIC has been specified. However, it can exist when the set of totalityconstraints, both Ri R2 are total to E, has been specified.7. A NotAndRelationships SIC, ‘-i((E Ri F) A (B R2 G) A (B R3 H))”,becomes redundant when any pair of the involved relationships is exclusive, i.e.,R1WR2, R2IR3, or R3IRi. It is inconsistent with the set of totality constraintsall involved relationships Ri, R2, and R3 are total to E. If only one relationship,say R3, is total, it reduces to a Not_And_Relationship SIC among the remaining ones (or an Exclusive_Relationship SIC if there are two remaining). It isnot meaningful that only one of the relationship types (say, R3) is not total since“—R3” would imply that the R3 relationship type should be disconnected.8. An Either_Existence_Relationship SIC, “(B Ri F) V (B R2 G) V (E R3 H)”,becomes redundant when any one of these involved relationship is total to E. It isalso redundant when a proper subset of these relationships has been declared toAppendix G. Consistency and Nonredundancy Rules for SIC Elicitation Subsystem 233have an Either_Existence_Relationship SIC. A S ubtypeEither_ExistenceRelationship SIC, “if (E.Ai compop vi) then (ER1 F) V (ER2 G) V (ERSH)”,becomes redundant when the related sub-Cmin (the subtype minimum cardinality)of any one of these involved relationships to E is greater than or eqilal to 1. Itis inconsistent with the set of three Relationship_Depends_on_Entity_ValueSICs:“if E Ri F then —‘ (E.Ai comp_op vi) “,“if E R2 G then —‘ (E.Ai compop vi)”,and “if E R3 H then -i (E.Ai compop vi)”.9. A Pair_Relationships SIC, “if (E Ri F) then if ((E R2 G) then (F R3 H))”,is redundant when R3 is total to F. When R2 is total to E, it reduces to a Relationship_Depends_on_relationship SIC, “if (E Ri F) then (F R3 H)”. ThisSIC may be redundant or needs to be modified when there are some Relationship_Depends_on_Relationship SICs. For example, if we have “if (E Ri F)then (F R3 H)”, the above SIC becomes redundant; if we have “if (E Ri F) then(E R2 G)”, the above SIC reduces to “if (E Ri F) then (F RS H)”. However, incase that there are further restrictions on R2 and R3, the consistency needs to bechecked during the consultation and is also hard to verify.10. A Relationship_Before_Relationship SIC, “if E Ri F then (E R2 G) before”,is redundant when R2 is total to E. It is not meaningful when Ri is total to E.If the database designer specifies a time sequence cycle among some relationships(e.g., “if E Ri F then (E R2 G) before”, “if E R2 G then (E R3 H) before”, “ifE R3 H then (E Ri F) before”), the insertion of all of the relationships can onlyperformed in a single transaction. It is possible that the database designer hasmade a mistake to assert the logical meaning of these relationship types. Thistype of SIC can exist when either a Relationship_Depends_on_Relationship,Subset_Relationship SIC, or an Exclusive_Relationship SIC, etc. has beenspecified for the related relationships Ri and R2.11. A Relationship_Not_Before_Relationship, “if (E Ri F) then (E Rn Y) before” is inconsistent with the totality constraint that Rn is total to E, or “if E RiF then (E Rn Y) before”, or a set of SICs: “if E Ri F then (E R2 G) before”, “(ER2 G) then (E R3 H) before “, “if E R3 H... “, “if E Ri X then E Rn Y before “. Itis not meaningful when Ri is total to E. However, it is allowed to exist when eithera Relationship_Depends_onRelationship, Subset_Relationship SIC, or anExclusive_Relationship SIC, etc. has been specified for the related relationshipsRi and R2.12. If a Group_Relationships SIC is specified, the verification is complicated.• The SIC, “if (E Ri F) A (E R2 G) then (E R3 H)”, is redundant when R3 istotal to E. It reduces to “if (E R2 0) then (E R3 H)” when Ri is total to E. InAppendix G. Consistency and Nonredundancy Rules for SIC Elicitation Subsystem 234addition, when both Ri and R2 are total to E, it is subsumed by a totality constraint — R3 is total to E. It is inconsistent with a Not_And_RelationshipsSIC, “—((E Ri F) A (E R2 G) A (E R3 H))”.• Two SICs, “if (E Ri F) A (E R2 G) then E R3 H” and “if (E Ri F) A (ER2 G,) then E R4 I”, are inconsistent with R3IR4, or “—i((E Ri F,) A (E R2G) A (E R3 H) A (E R I))”.• Two SICs, “if (E Ri F) A (E R2 G) then (E R5 J)” and “if E R5 J then ER6 K”, are inconsistent with “-‘ ((E Ri F) A (E R2 G) A (E R5 J) A (E R6K))”.• If this type of SIC has the disjunction of assertions in the “then” part, it wouldbe further complicated and needs a case analysis to verify consistency.— The SIC, “if (E Ri F) A (E R2 G) then (E R3 H) V (E R7 L)”, isinconsistent with a set of two Not_And_Relationships SICs: “— ((ERi F) A (ER2 G)A (ER3H))” and “-((ERi F) A (ER2 G) A (ER7— Two SICs, “if (E Ri F) A (E R2 G) then (E R3 H) V (E R7 L)”, and “if(E Ri F) A (E .112 G) then (E R8 M) V (E R9 N)” are inconsistent with eitlier a set of four exclusive relationships: R311R8, R3WR9, R7IR8, R7IR9;or a set of four Not-And-Relationships SICs:“-((E Ri F) A (E R2 G) A (E R3 H) A (B R8 M))”,“—i((E Ri F) A (E R2 G) A (E R3 H) A (E R9 N))”,“-‘((ER] F) A (E R2 G) A (E R7 L) A (E R8 M))”and “—i((ER1 FA (ER2 G) A (ER7L)A (ER9N,))”.— Similarly, two SICs, “if (E Ri F) A (E R2 G) then (E R3 H) V (E R7L)”, and “if (B Ri F) A (E R2 G,) then (E R9 N)” are inconsistent witheither a set of two exclusive relationships: R3 jR9, R7IR9; or a set of twoNotAnd_Relationships SICs:“—i ((E Ri F) A (B ff2 0) A (E ff9 H) A (E R9 N))”,and “-i((E Ri F) A (B R2 G) A (E R7 L) A (B R9 N))”.— Three SICs, “if (ER1 F) A (E R2 G) then (E R3 H) V (E R’7 L)”, “if ER3 H then E R9 N”, and “if B R7 L then B R5 J”, are inconsistent witha set of two Not_And_Relationships:“-((ERJF)A(ER2G)A(ERSH)A(ER9N))”,and “—((E Ri F) A (E R G) A (E R7 L) A (E R5 J))”.13. A Relationships_Join SIC, “if (E Ri F) then (E R2 H) A (H R3 F)”, is notredundant even if either R2 or R3 is total to the related entities. It is inconsistentwith the Exclusive_Relationship SIC among any pair of Ri, R2, ff9.If we have this SIC, two Relationship_Depends_on_Relationship SICs, “if (BRi F) then (B R2 H)” and “if (B Ri F) then (H R3 F) “, are redundant.Appendix G. Consistency and Nonredundancy Rules for SIC Elicitation Subsystem 23514. The Symmetry, and Transitivity Properties of a relationship would not beinconsistent with other SICs in the above.15. If Ri is total to E, a Relationship_Depends_on_Entity_Value SIC, “if (E RiF) then (E.A comp_op v)” reduces to “(E.A comp_op v)”.16. A Weak Relationship SIC or a weak entity type requires the presence of thespecification totality constraints.Appendix HSIC Reformulation and Decomposition AlgorithmsThis appendix proposes algorithms for reformulating general SICs in the simplified formatof Appendix C and decomposing them, if necessary, into operation-dependent sub-SICsas defined by the Representation model. Before applying the algorithms, the subsystemtakes the responsibility of writing all general SICs in the simplified format of Appendix C.The if ... then ... rule format should be used if possible. For example, although theformat “‘—‘ P V Q”is equivalent to “if P then Q” in logic, the rule format should beused. If a general SIC is originally written by using a nested rule (i.e., if ... then if• . . then...), it should be rewritten by using only one pair of “if” and “then” keywords.Decomposed sub-SICs have the same certainty factor as their general SICs.H.1 Find the Relevant Object and Operation Components1. If the system variable Current_time appears in a general SIC, a sub-SIC for Current_time on update will usually be required. However, there are two followingexceptions because the increase of Current_time will never violate the SIC.(a) The SIC is in rule format and Current_time appears on the left hand side of“s” in the “if” part.(b) The SIC is not in rule format (or the SIC is in rule format but Current_timeis in the “then” part), and Current_time appears on the left hand side of ““.2. If an attribute (say E.A) is mentioned in a SIC, the SIC is usually relevant to iton update. However, if any of the following cases occurs, the SIC would j berelevant to it.(a) The attribute has been declared to be unchangeable (i.e., there is a SIC “ifE.A is_to_be_updated then false”).(b) The SIC contains a pair of new and old special functions with arguments ofother attributes, not this attribute.(c) The SIC contains the keyword is_to_be_deleted.(d) The attribute is in the “then” part and the modifier “before” is attached to it.236Appendix H. SIC Reformulation and Decoinposition Algorithms 237(e) The attribute is in the “then” part and there is a modifier “previously” in the“if” part.(f) The attribute is a candidate key and is referenced by a special formula in the“if” part of the SIC. This formula involves the same attribute name of anotherentity type (e.g., E.A=F.A) to link the specific mentioned entity type occurrences (e.g., E and F that corresponds to the same physical entity occurrence(e.g., G) in a specialization hierarchy.3. Suppose that a general SIC has been found by the above steps to be relevant to anattribute.(a) Suppose that the relevance of the SIC to the attribute is because theattribute is an argument of an aggregation function. In general, the SIC isalso relevant to its associated entity or relationship on insertion. However,if any of the following cases occurs, the SIC would be relevant to theassociated entity or relationship on insertion.i. The SIC is the one that declares the attribute to be unchangeable.ii. The SIC contains a pair of new and old functions.iii. The SIC contains an “previously” modifier in its “if” part or a “before”modifier in its “then” part.In addition, if the attribute belongs to an entity a.nd the SIC is also relevantto the insertion of one mentioned relationship in which the entity participate,the SIC is relevant to the entity on insertion.(b) Suppose that the SIC is relevant to the attribute because the attribute is anargument of an aggregation function. In general, the SIC is relevant to itsassociated entity or relationship on both insertion and deletion. However, ifthe assertion including the attribute is in the form of “agg_fcn(E.A) comp_oparithmetic_simple_expression”, where “agg_fcn” is an aggregate function and“comp_op” is a comparison operator (for a comp_op (e.g., “>“) prefixed witha “not” or “—i”, first properly replace the comparison operator (e.g., ““))the following would be exceptions:i. Suppose that the aggregate function is either max or count:• If the comparison operator is either “>,“, the SIC is only relevantto the mentioned entity/relationship on deletion, not insertion.• If the comparison operator is either “<,“, the SIC is only relevantto the mentioned entity/relationship on insertion, not deletion.ii. Suppose that the aggregate function is mm:• If the comparison operator is either “>,“, the SIC is only relevantto the mentioned entity/relationship on insertion, not deletion.• If the comparison operator is either “<,<“, the SIC is only relevantto the mentioned entity/relationship on deletion, not insertion.Appendix H. SIC Reformulation and Decomposition Algorithms 238iii. If the aggregate function is sum and if it can be assumed that the attributevalues to be summed are all positive, the sub-SICs are the same as thecase of either max or count.In all other cases (e.g., the comparison operator is “=“ or “f’, or the aggregatefunction is “avg”), the SIC is relevant to its associated entity or relationship on bothinsertion and deletion. However, similarly, if the attribute belongs to an entity andthe SIC is also relevant to the insertion of one mentioned relationship in which theentity participate, it is pj relevant to the entity on insertion.4. Suppose that a SIC contains the keyword “is_to_be_deleted”. The SIC is only relevant to the entity or relationship that is to be deleted.5. Suppose that a SIC does not contain the keyword “is_to_be_deleted”.(a) Consider its mentioned relationships.i. Suppose that in the “if” part of a SIC there is an assertion containing arelationship (say R). In general, the SIC is relevant to the relationship oninsertion. However, if any of the following cases occurs, the SIC would berelevant to it on deletion, not insertion.• A “not” or “—i” has been attached to the assertion.• An “d at_most” numerical quantifier has been attached to the relationship R or there is a “count(R) < Number” or “count(R) Number”assertion, but no “not” or• A modifier “previously” has been attached to the assertion.If there is an “d exactly” numerical quantifier or “count(R)=Number”assertion, the SIC would be relevant to the relationship on both insertionand deletion.ii. Suppose that the “then” part of a SIC or a SIC, which is not in rule format,contains an assertion referencing to a relationship. In general, the SIC isrelevant to the relationship on deletion. However, if there is a “before”modifier attached to it, or if there is an “previously” modifier appearingin the “if” part of the SIC, this SIC is not relevant to it. In addition,if any of the following cases occurs, the SIC would be relevant to it oninsertion, not deletion.• A “not” or “—i” has been attached to the assertion.• An “ at_most” numerical quantifier has been attached to the relationship or there is a “count(R) < Number” or “count(R) Number”assertion, but no “not” or• A “quantifier has been attached to the relationship, but no “not”or —Appendix H. SIC Reformulation and Decomposition Algorithms 239If there is an “ exactly” numerical quantifier or “count(R)—_ Number”assertion, the SIC would be relevant to the relationship on both insertionand deletion.(h) Consider its mentioned entities. If there are relationships involved in the formof “entity_typel relationship_type entity_type2” of a SIC, do not consider thenon-sharing entities. The sharing entities are those entities which:— appear in at least two assertions as a participant of relationships or as anowner of an attribute in either “if” or “then” part if there is no “with_respect_to”modifier; orare only those in the “with_respect_to” modifiers.For example, in the case of “if (E RX F) then (E RY G)”, E is a sharing entity,F and G are not. In the case of “if (E RX F) then (E RY H) A (F RZ H)”,all entities E, F, and H are sharing entities. In the case of “if with_respect_toE, (E RX F) then with_respect_to E, (E RY F)”, E is a sharing entity, but Fis not. If those involved entity types are in the same specialization hierarchy,the sharing entity types may be different in syntax. For example, assume thatsome mangers do not supervise employees, we may have“if with_respect_to Manager, (Manager Supervise Employee)then with_respect_to Employee, —l (Employee Participate Union)”It means that if a manager supervises employee(s), he (she) cannot, as anemployee, participate in a union. Here the “sharing” entity type Manager inthe relationship Supervise corresponds to Employee in the relationship Participate. A more complicated case would need a list of sharing entity typesin some order and their corresponding sharing entity types in the same order.For example,“if with_respect_to (Teacher, Student), (Teacher Instruct Student)then with_respect_to (Worker, Manager),—‘ (Manager Supervise Worker)This is an exclusive occurrence example which means that “if a teacher instructs a student, the student cannot, as a manager, supervise the teacher asa worker” (but the other way around may be permitted). Here the sharing entity types are: Teacher corresponding to Worker, and Student correspondingto Manager.i. Suppose that in the “if” part of a SIC there is an assertion referencingto an entity. In general, the SIC is relevant to the entity on insertion.However, if the SIC is relevant to the insertion of a relationship in whichthe entity participates, the SIC is relevant to the entity on insertion.ii. Suppose that the “then” part of a SIC or a SIC, which is not in rule format,contains an assertion referencing to an entity. In general, if the assertionis not on its attribute(s), the SIC is relevant to the entity on deletion.If any of the following cases occurs, the SIC would be relevant to it oninsertion, not deletion. However, if the SIC is relevant to the insertionof a relationship in which the entity participate, it is not relevant to theAppendix H. SIC Reformulation and Decomposition Algorithms 240insertion of the entity; similarly, if the SIC is relevant to the deletion ofa relationship in which the entity participate, it is not relevant to thedeletion of the entity.• A “not” or “-“ has been attached to the assertion.• An “ at_most” numerical quantifier has been attached to the entity orthere is a “count(R) < Number” or “count(R) Number” assertion,but no “not” or• A ‘V” quantifier has been attached to the entity, but no “not” or(c) If there is a SIC including an aggregate function because of the natural association or indexing derived set association, the SIC is also relevant to theentity, as a set object, on deletion.6. Suppose that by the above steps, a general SIC is found to be relevant to theinsertion or deletion of a relationship R or entity E. If its primary key is alsoexplicitly mentioned in the SIC, skip this step. Otherwise, do the following.(a) Suppose that the SIC only contains attributes of the relationship type R or entity type E. In general, we need not consider the primary key update problem.However, there is an exception. If there is a keyword “is_to_be_deleted”, theSIC is relevant to any of the key attributes of the relationship R or entity Eon update. The proper SIC name (refer to Appendix H.4) should be recordedin its “associated_PKSIC_D” predicate.(b) Suppose that the SIC also contains attributes of other than the relationshiptype R or entity type E; or contains assertions directly referencing R or B.i. If the SIC is relevant to the insertion or deletion of the relationship R,it would also be relevant to the update of any of its key attributes thatrelates to its sharing entities. The proper SIC name (see Appendix H.4)should be recorded in its “associated_PKSIC_I” or “associated_PKSIC_D”predicate (see Appendix B.2). It is not relevant to the update of therelationship’s key attributes that relate to non-sharing entities.ii. If the SIC is relevant to the insertion or deletion of the entity B, it wouldbe also relevant to the update of any of its primary key attributes. Theproper SIC name (see Appendix H.4) should be recorded in its “associated_PKSIC_I” or “associated_PKSIC_D” predicate.H.2 Write the Proper Precondition and Predicate Components1. If there is any aggregate function with the attribute E.A or R.A as an argument,properly attach subscripts to the entity or relationship owning the attribute. Thatis, suppose agg_fcn is an aggregate function, do the following (similarly, for theR.A).Appendix H. SIC Reformulation and Decomposition Algorithms 241• change “agg_fcn(E.A)” to “agg_fcn(Ei .A)“;• change “agg_fcn(E.A) comp_op arithmetic_simple_expression involving E.A“)to “agg_fcn({Ei.A E1.A E0.A}) comp_op arithmetic_simple_expression involving E0 . A “.2. Suppose that the special predicate unique is used.• If it contains only one argument, e.g., “unique(E.A)”, change it to“count({Ei I E1.A =• If it contains more than one argument, e.g., “unique(E.A1, E.A2)”, change itto “count({Ei I comp_atts_occ(Eo, Comp_Key, CurrentCompAtts Value),comp_atts_occ(F1, Comp_Key, AnyCompAtts Value),CurrentCompAtts Value=AnyCompAtts Value})=1 “,where Comp_Key (e.g.,{E.A1, E.A2}) is a set containing those argumeilts inthe predicate unique.3. If there is a relationship expression “Entity_Occi Rship_Occ Entity_Occ2”, e.g., “ER F”, rewrite it for sharing entities by using a special predicate“rship_occ_part(Rship_ 0cc, Role_Type, Sharing_Entity_ 0cc)” that is used to evaluatewhether a Sharing_Entity_Occ participates in a relationship occurrence Rship_Occwith the Role_Type.• If there is no “with_respect_to” modifier, the Role_Type is the same as the entitytype of F.• If there are “with_respect_to” modifiers with a common entity type, the Role-Type would be the entity type in these modifiers.• If those entity types in the “with_respect_to” modifiers are different, the relationship expressions would be written with the same entity variable, butdifferent Role Types. For example, supposing that we have “if withrespecttoF, E RX F then with_respect_to G, G RY H”, the first relationship expressionwould be written as “rship_occ_part(RX, “F”, F) “, the second relationshipexpression would be written as “rship_occ_part(RY, “G”, F) “,• If the entity types in “with_respect_to” modifiers are in ordered lists, therewould be two pairs of “rship_occ_part” assertions with same entity variablestaking their Role_Types in the corresponding order. For example, given “ifwith_respect_to (E.F), E RX F then with_respect_to (H, G), G RY H”, we wouldhave the following four assertions:“rship_occ_part(RX, “F”, F) “,“rship_occ_part(RY, “F”, F) “,“rship_occ_part(RY, “H”, E) “, and“rship_occ_part(RY, “G “, F) “.Appendix H. SIC Reformulation and Decomposition Algorithms 2424. Omit any quantifier “V” or “s” of the SIC unless there is an “at_least”, ‘at_most”,“exactly”, or “different” following an “3” quantifier.5. Rewrite numerical quantifiers. That is, for all numerical quantifiers on any relationship R, do the following.• change “3 at_least Number R” to “count(R) Number”;• change “3 at_most Number R” to “count(R) Number”;• change ‘S exactly Number R” to “count(R,) = Number”.A subscript may be attached to R in a later step.6. If a “before” modifier is attached to an assertion referencing an object, apply anold function to that object and delete the “before” modifier.7. Suppose that a general SIC is found to be relevant to an attribute (say E.A or R.A)by applying the algorithm in Appendix H.1.(a) If the sub-SIC declares the attribllte to be non-updateable, the predicate wouldonly include “false “. All other assertions are in the precondition componentto identify the attribute.(b) In other cases, the predicate component would usually only have the assertion(say Q) containing the attribute (E.A or R.A) for which we are reformulating.That is,i. If the original general SIC contains only an arithmetic expression Q, leaveit as the predicate.ii. Suppose that the original general SIC is in rule format.• If Q is in the “if” part, negate it and move it to the predicate component. Leave the further restrictions (if any) on Q in the preconditioncomponent. Negate all the original assertions in the “then” part ofthe general SIC, and move them to “AND” with the preconditioncomponent.• If Q is in the “then” part, leave it as the predicate component, move(but not negate) the further restrictions (if any) oii Q to “AND” withthe precondition component.(c) If the subscript 1 has been attached to its entity or relationship no matterwhether it is attached to this attribute (e.g., agg_fcn(E1.A)) or other attributes(e.g., agg_fcn(E1.B)) of the same entity or relationship owing to the aggregatefunctions mentioned in step 1, attach the subscript 0 (i.e., E0.A) to its assertions other than those involving aggregate functions.Appendix H. SIC Reformulation and Decomposition Algorithms 243(d) If the original general constraint is to assert explicitly the equality of somekey attributes of two entity types, it should be dealt with specially. In theprecondition component of the sub-SIC for the involved key attribute of oneentity on update, the old function is used to search for the correspondingoccurrence of the other entity before updating. Its predicate component shouldrequire the new values of the corresponding key attributes of the two entitytypes to be equivalent.8. Suppose that a general SIC is found to be relevant to a relationship R or entity Eby applying the algorithm in Appendix H.l.(a) If the SIC only contains assertions referencing to attributes of the relationshipR or entity E, keep the original format that the system obtains from thedatabase designer as the precondition and predicate components of the sub-SIC for the relationship R or entity E.(b) Suppose that the SIC also contains assertions referencing to attributes of otherthan the relationship R or entity E, or contains assertions directly on therelationship R or entity E.i. Suppose that the SIC is found to be relevant to the entity E on insertion.In general, the sub-SIC for the entity keeps the original format. However,if the assertion (say, Q) containing the entity is in the “then” part witha “not” or “-“, rewrite the original format — negate and move the Q tothe precondition component, negate and move all other assertions in theoriginal “if” part to the predicate component.Replace any “count(E)” with “ E1, count(E1}”.ii. Suppose that the SIC is found to be relevant to the entity B on deletion.Rewrite the original format so that the assertion containing the entity Eis in the precondition component.iii. Suppose that the SIC is found to be relevant to the relationship R oninsertion. In general, keep the original format. However, if the “if” partoriginally does not contain those “rship_occ.part”s of the sharing entitiesregarding to the relationship R, add them to the precondition componentto identify this specific relationship R. If a “rship_occ_part” assertion isoriginally in the “then” part with a “not” or “—‘“ attached to it, rewritethe original format so that the assertion is in the precondition component.Replace any “count(R)” with “ R1, count(Ri) “.iv. Suppose that the SIC is found to be relevant to the relationship R ondeletion.A. Suppose that there is “ different R” in the original format.Replace ‘8 different R” with ‘8 R1, R1 R0 “.In its precondition component, replace the relationship variable Rwith R0.Appendix H. SIC Reformulation and Decomposition Algorithms 244In its predicate component, replace the relationship variable R withR1.B. Suppose that it is not the above case A, and the assertion(s) rship_occ_partcontaining this relationship R is (are) in the “if” part.Keep the original format after removing any “previously” modifier.Replace any “count(R)” with “ R1, R1 R0, count(Ri)”.C. Suppose that it is not the above case A, and the assertion(s) (e.g., sayQ) containing this relationship R is (are) in the “then” part.• Suppose that for maintaining semantic integrity it is possible to findother occurrences to replace the one to be deleted. That is, originally the relationship expression does not contain an “ at_most”numerical quantifier; and the SIC is not relevant to the all key attributes of the relationship. (For example, a Subset_RelationshipSIC is relevant to the whole relationship key.)— The original assertions in “if” part are in the precondition component.— In addition, add one copy of the assertion(s) Q containing thisrelationship R variable with a subscript 0 at the beginning of theprecondition component to be in conjunction with other assertions.— If originally Q is a part of further restrictions on some assertions(say Qsome), move (but not negate) Qsome, after removing theirnumerical quantifiers or count aggregate functions, to the precondition component. Rearrange them in a reverse order and putthem behind the assertion Q.— Replace the assertion(s) Q (that may include the “count” aggregate function) in the “then” part with the one containing thisrelationship variable with a subscript 1, and add the assertion “— The further restrictions (if any) on Q are kept unchanged in thepredicate component.For example, suppose that Qo is the assertion(s) containing R0, andQi is the assertion(s) containing R1. If originally we have “if Q Wthen Q V QX V QY V ... “, the precondition component would be“Qo A Q W”, and the predicate component would be “Qi V QX VQY V ... “. If originally we have “if QW then QX A QY A Q AQZ A ... “, where Q Y contains a numerical quantifier “ at_least”,the precondition component would be “Qo A QY A QX A Q W”,where QY is the QYafter removing the quantifier, and the predicatecomponent would be “Qi A QZ A ... “.• Suppose that it is not possible to find other occurrences to replacethe one to be deleted.Appendix H. SIC Reformulation and Decomposition Algorithms 245— Move (but do not negate) Q and all its further restrictions to theprecondition component.— If after doing the above, we find that the predicate componentcontains only the assertion “false”, negate all the assertions inoriginal “if” part and move them to the predicate component.v. Although in the above we have mentioned placing an “a” quantifier ona variable with a subscript 1, an “a” quantifier on a variable in othercases is optional. Without an explicit quantifier, an “s” quantifier canbe automatically assumed on any variable in that assertion. We mayexplicitly express the quantifier just for clarity. No explicit quantifierneed be placed on the object for which we are writing precondition andpredicate components. We may place an explicit “s” quantifier on anunknown variable when it is its first time to be included in any assertion.For example, if we are writing precondition and predicate components forR and we need an assertion rship.occ...part(R, “E”, E), we may place anexplicit quantifier on E, i.e., ‘S E”.H.3 Suggest the Violation Action Component1. A sub-SIC with certainty less than 100% (in a ratio scale), not 10 (supposing thatthe database designer specifies 10 as the highest in an ordinal scale), “uncertain”(in 2 levels), or fuzzy terms (e.g., “sometimes”) would have at least two alternativeviolation actions: “warning” and “conditionally_reject”. However, if the databasedesigner has specified a certainty threshold, those sub-SICs with certainty less thanthe threshold would only have one violation action — “warning”.2. A sub-SIC with certainty 100% (in a ratio scale), 10 (if it is the highest in an ordinalscale), or “certain” (in 2 levels) would have at least one alternative violation action“reject”.3. Suppose that there are only two assertions on two objects (e.g., “if (E RX F) then(E RY G)”) in the SIC.(a) Suppose that these two assertions are both for relationships or entities (sayRX, RY, not attributes. Both its sub-SICs would have propagation as analternative violation action. That is, a certain sub-SIC could have an alternative— ‘propagate”; and an uncertain sub-SIC could have “conditionally_propagate “.i. In a sub-SIC for one object, the propagated operation type is oppositeto the relevant operation of the other object found by the algorithm inAppendix H.1.For example. supposing that the general SIC is “if (E RX F) then (EAppendix H. SIC Reformulation and Decomposition Algorithms 246RY G,) “, the violation action in the sub-SIC for RX on insertion could be“propagate(insert(RY))“; supposing that the general SIC is “if (E RX F)then — (E RY G,) “, the violation action in the sub-SIC for RX on insertioncould be “propagate (delete (RY))”.ii. If there is an “ at_least” numerical quantifier attached to the assertioncontaining the object to be propagated or there is a “courit(Rship Type)Number” assertion in the sub-SIC, use the “insert_all” or “delete_all” asthe propagated operation to indicate that there are explicit quantitativerequirements when propagating to insert or delete the object occurrences.iii. If the object to be propagated is an entity occurrence, specify into/fromwhich entity type this occurrence is to be inserted or deleted. That is, wewould need “propagate (insert(Ent Type, F,))” or “propagate (delete(EntType,F))” where the Ent Type is the entity type with which E participates in arelated relationship in the sub-SIC. This Ent Type is the one appeared in anassertion “rship_occpart(Rship pe,EntType,E)”or “ent_occ(EntType, E)”in the predicate or precondition component of the sub-SIC.(b) Suppose that one of the assertions is for an attribute. Similarly, the sub-SIC forthe attribute would have an alternative to propagate to insert or delete an entity or relationship. The other sub-SIC for the entity or relationship could have“propagate(update(E.A, arithmetic_simple_expression))” as an alternative onlyif the assertion of the attribute is “E.A = arithmetic_simple_expression”.(c) Suppose that two assertions are both for attributes, say E.Ai and E.A2. Thesub-SIC for E.A2 (or E.Al) could have an alternative: “propagate(update(E.Al,arithmetic simple expression_i))” (or “propagate (update (E.A2, arithmetic simple expression_2))“) only if those two assertions are“E. Al = arithmetic_simple_expression_i”and “E.A2 = arithmetic_simple_expression_2“;or there is a single assertion such as “E.Ai = E.A2”.4. In the case that there are two alternative violation actions in a sub-SIC, ask thedatabase designer to choose one.H.4 Generate the SIC Name1. The “ObjectType, OperationType” in the SIC name can be decided by the systemafter applying the above algorithms.2. The “SICType” and “RelatedObjectTypeSet” depend on the predefined SIC types(e.g., those in Appendix D). If there are some further restrictions on some predefined SIC types, in general, all the objects appeared in the further restrictionsare included in the “RelatedObjectTypeSet”. However, the entity type names needAppendix H. SIC Reformulation and Decomposition Algorithms 247not be included if the names of their attributes have been included. Neither wouldthe entity type names be included if a relationship type is mentioned and both ofits participant entity types are concerned (e.g., SubsetRelationship SIC). Noduplicate object names are included.3. For some complicated applications, if the system finds that it cannot distinguishtwo SIC names by the previous steps, a “SeqenceNo”is added.Appendix ISome Examples of SIC Reformulation and DecompositionThis appendix contains four examples of reformulating and decomposing general SICs.Conditional_Value SIC:Suppose that there are only two attributes involved, such as,if E.A1 comp_opi vithen E.A2 compop2 v2where comp_opi or comp_op2 denotes =, , , <, or>.This type of SIC is represented by the following three sub-SICs.E.A2- U-Conditional Value-(E.Ai)CERTAINTY certainFOR E.A2ON updateIF E.A1 comp.opi viASSERT E.A2 comp.op2 v2ELSE rejectE. A i-U-ConditionalValue- (E. A 2)CERTAINTY certainFOR E.AiON updateIF E.A2 -‘ compop2 v2ASSERT E.A1 -, compopi viELSE reject248Appendix I. Some Examples of SIC Reformulation and Decomposition 249E-I-CoriditionalVal’ue-(E.Al, E.A2)CERTAINTY certainFOR EON insertionIF E.Al compopl viASSERT E.A2 compop2 v2ELSE rejectAppendix I. Some Examples of SIC Reformulation and Decomposition 250Relationship_Depends_onltelationship involving 3 entity types:For example, in Figure 5.2 (o