UBC Theses and Dissertations

Incorporating semantic integrity constraints in a database schema Yang, Heng-li 1992-12-18

INCORPORATING SEMANTIC INTEGRITY CONSTRAINTS IN A DATABASE SCHEMA

by

Heng-Li Yang

B. Sc., National Chiao Tung University, 1978
M. Commerce, National Cheng Chi University, 1980
M. Sc. (Computer Science), Pennsylvania State University, 1985

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

in

THE FACULTY OF GRADUATE STUDIES
THE FACULTY OF COMMERCE AND BUSINESS ADMINISTRATION

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
August 1992
© Heng-Li Yang, 1992

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of [handwritten entry, partly illegible in source] Business Administration
The University of British Columbia
Vancouver, Canada
Date: [handwritten entry, illegible in source]

Abstract

A database schema should consist of structures and semantic integrity constraints. Semantic integrity constraints (SICs) are invariant restrictions on the static states of the stored data and the state transitions caused by the primitive operations: insertion, deletion, or update. Traditionally, database design has been carried out on an ad hoc basis and focuses on structure and efficiency. Although the E-R model is the popular conceptual modelling tool, it contains few inherent SICs. Also, although the relational database model is the popular logical data model, a relational database in fourth or fifth normal form may still represent little of the data semantics. Most integrity checking is distributed to the application programs or transactions.
This approach to enforcing integrity via the application software causes a number of problems.

Recently, a number of systems have been developed for assisting the database design process. However, only a few of those systems try to help a database designer incorporate SICs in a database schema. Furthermore, current SIC representation languages in the literature cannot be used to represent precisely the necessary features for specifying declarative and operational semantics of a SIC, and no modelling tool is available to incorporate SICs.

This research solves the above problems by presenting two models and one subsystem. The E-R-SIC model is a comprehensive modelling tool for helping a database designer incorporate SICs in a database schema. It is application domain-independent and suitable for implementation as part of an automated database design system. The SIC Representation model is used to represent precisely these SICs. The SIC elicitation subsystem would verify these general SICs to a certain extent, decompose them into sub-SICs if necessary, and transform them into corresponding ones in the relational model.

A database designer using these two modelling tools can describe more data semantics than with the widely used relational model.
The proposed SIC elicitation subsystem can provide the designer with more modelling assistance than current automated database design systems.

Table of Contents

Abstract
List of Figures
Acknowledgement

1 Introduction
  1.1 Database Design
  1.2 Semantic Integrity Constraints
  1.3 SICs, “Constraints” and Transactions
  1.4 Enforce Semantic Integrity Constraints via Application Software
  1.5 Embed Semantic Integrity Constraints in a Database
  1.6 The Research Questions, Objectives, Methodology and Scope
  1.7 Contributions
  1.8 The Dissertation Outline

2 Review of Previous Work
  2.1 SIC Classification
    2.1.1 Classification Based on Explicitness
    2.1.2 Classification Based on Applied Objects
    2.1.3 Classification Based on Operation Type
    2.1.4 Classification Based on Precondition
    2.1.5 Classification Based on Certainties of SICs
    2.1.6 Classification Based on Violation Action
    2.1.7 Classification Based on Enforcement Schedule
    2.1.8 Classification Based on Dynamics
    2.1.9 Summary of Comments on SIC Classification
  2.2 SIC Representation
  2.3 SIC Verification
  2.4 SIC Reformulation and Decomposition
  2.5 Automated Database Design Aids for Eliciting SICs

3 A Model for Representing Semantic Integrity Constraints
  3.1 An Overview of the Representation Model
  3.2 SIC Name
  3.3 Certainty Factor (F)
  3.4 Object (O)
  3.5 Operation Type (T)
  3.6 Precondition (C)
  3.7 Predicate (P)
  3.8 Violation Action (A)

4 The Application of the SIC Representation Model
  4.1 Completeness of a SIC Specification for Database Design
    4.1.1 Not Fewer Components
    4.1.2 No More Components
  4.2 SIC Abstractions
  4.3 Database Management
    4.3.1 SIC Management
    4.3.2 Other Aspects of Database Management

5 An Extended E-R Model Incorporating Semantic Integrity Constraints
  5.1 Problems with Previous E-R Models
  5.2 An Overview of the E-R-SIC Model
    5.2.1 Primitive Modelling Constructs
    5.2.2 Data Abstractions
    5.2.3 Basic Properties of SICs
  5.3 Entity Attribute SICs
  5.4 Entity SICs
  5.5 Relationship SICs
    5.5.1 Necessary Conditions
    5.5.2 Sufficient Conditions
  5.6 SICs Implied by Implicit Relationships and Data Abstractions
  5.7 Summary of the E-R-SIC Model

6 The Application of the E-R-SIC Model
  6.1 An Example of Using the E-R-SIC Model
  6.2 Potential Pitfalls of Using the E-R-SIC Model
  6.3 Data Integrity Semantic Completeness

7 A Proposed Database Design Aid for Eliciting SICs
  7.1 An Overview of the SIC Elicitation Subsystem
  7.2 SIC Verification
    7.2.1 Consistency and Nonredundancy Rules for SIC Types
  7.3 SIC Reformulation and Decomposition
    7.3.1 Representation of Generic SICs
  7.4 Transforming SICs to Relational Form

8 Conclusions and Further Research
  8.1 Conclusions and Contributions
  8.2 Future Research Extensions to this Dissertation

Bibliography

Appendices
A BNF Descriptions of the SIC Representation Model
B Summary of the Predicates used in this Research
  B.1 Input Predicates
  B.2 Manipulation Predicates
C BNF Descriptions of the Simplified Format
D SIC Type Classification in the E-R-SIC Model
E Examples of Heuristics
F Verification of Aggregate Attribute SICs and Cardinalities
  F.1 Simple Tests on Aggregate Attribute SICs
  F.2 Algorithms for Verifying Cardinalities
G Consistency and Nonredundancy Rules for SIC Elicitation Subsystem
H SIC Reformulation and Decomposition Algorithms
  H.1 Find the Relevant Object and Operation Components
  H.2 Write the Proper Precondition and Predicate Components
  H.3 Suggest the Violation Action Component
  H.4 Generate the SIC Name
I Some Examples of SIC Reformulation and Decomposition
J Generic SIC Representation in the E-R-SIC Model
K Algorithms Transforming SICs to Relational Form
  K.1 Transform the SIC Representation
  K.2 Construct SIC Name Sets for the Foreign Key Update
L Some Examples of Transforming SICs to Relational Form
M Generic SIC Representation in the Relational Model

List of Figures

1.1 A Proposed Automated Database Design Subsystem for Eliciting SICs
5.1 Grouping: Member (M), Derived Set (DS), Indexing Entity (I), Indexing Relationship (R)
5.2 A Line Layout Context
5.3 A Star Layout Context
5.4 A Loop-2 Layout Context
5.5 A Loop-n Layout Context
5.6 What is in the database?
5.7 Single Entity Attribute SICs
5.8 Single Entity SICs
5.9 Relationship SICs
5.10 SICs Implied by Implicit Relationships and Data Abstractions
6.1 An Example: A Car Dealership Database

Acknowledgement

I wish to take this opportunity to express my sincere gratitude to all members of my dissertation committee, Professor Alvin Fowler, Dr. Robert C. Goldstein, Dr. Veda C. Storey, Dr. Yair Wand, and Dr. Carson Woo. In particular, I am especially indebted to my research supervisor, Dr. R. C. Goldstein, for his patience, support and countless hours of valuable discussions. I am also grateful to Dr. V. Storey because she suggested this great research topic and gave her opinions all the way. Many thanks go to Dr. Y. Wand for his critical and stimulating comments. I thank Dr. C. Woo for his comments on the dissertation organization and Professor Al. Fowler for his practical suggestions. In addition, I would like to thank my friend, Dr. Hsueh-Ming Hang, and all staff of the inter-library loan division of UBC Library, for helping to collect the related literature in the early stage of this research. Finally, I thank my parents for their love and support all the time.

Chapter 1

Introduction

1.1 Database Design

Database management systems (DBMSs) have been available for more than two decades. However, database design has also been recognized as a task with a high level of complexity ([Obretenov, et al., 1988]) and has often been referred to as an art rather than a science ([Holsapple, et al., 1982]). Database design is the process of modelling the information requirements of a real-world application and mapping them onto an underlying DBMS.
Database design must go through the following phases: (1) information requirement elicitation, during which requirements for knowledge of a real-world application are determined; (2) conceptual database design, which produces a high-level representation of the requirements independent of the DBMS that will be used, e.g., the output might be expressed as an Entity-Relationship (E-R) model; (3) logical database design, which produces a logical schema that corresponds to the data model of the chosen DBMS, e.g., the output might be expressed as a relational data model; and (4) physical database design, which transforms the logical design into a form that is suitable for the given hardware and DBMS, and considers efficiencies of storage and processing.

Traditionally, database design has been carried out on an ad hoc basis ([Bouzeghoub et al., 1985], [Goldstein, 1985]). It has usually been performed by a human “database design expert”. The weaknesses of the traditional approach are that: (1) it is a difficult task and expert database designers are scarce; and (2) the design of a database is done by someone who is unfamiliar with the application domain, instead of end-users ([Storey and Goldstein, 1991]). Recently, a number of computerized systems have been developed for assisting the database design process ([Ram, 1989], [Storey and Goldstein, 1990]). Some of those can be classified as knowledge-based or expert systems; others are automation tools. Those systems are designed for assisting conceptual and/or logical design processes. Some even provide help for physical database design. Automation of the database design process can help overcome the problems of the traditional approach by codifying the database design methodology in a computer program.

However, some major problems remain. First, only a few of the above automated systems try to help a database designer incorporate semantic integrity constraints in a database schema ([Yang and Goldstein, 1989]).
Many types of semantic integrity constraints have never been identified by any system. Furthermore, as observed by Troyer [1989, p. 423], “constraints often considered as first class citizens in the conceptual modelling seem to become pariahs during the transformation [from a conceptual schema to a relational schema]”. Second, those automated systems usually only consider database states and static properties. They seldom consider the behaviour of a database, that is, the state transitions and dynamic properties. Some systems (e.g., E2R [Kozaczynski and Lilien, 1988] and EXIS [Yasdi and Ziarko, 1987]) model the behavioral semantics by transaction modelling or event modelling. Dynamic semantic integrity constraints, which place restrictions on database transitions, have not really been treated as constraints on data.

1.2 Semantic Integrity Constraints

Semantic integrity is concerned with the logical meaning (i.e., the intension) of stored data and with preserving the correctness of database contents even when users and application programs try to modify them incorrectly ([Fong and Kimbleton, 1980], [Fernandez, et al., 1981]). A database schema consists of structures and semantic integrity constraints (hereafter abbreviated as SICs) ([Tsichritzis and Lochovsky, 1982], [Frost, 1984]). The structure part tells us relatively little other than the basic structure, that is, what elementary items of data are grouped into what larger units; but the SIC part provides information about all allowable occurrences, current and future ([Morgenstern, et al., 1989]). These SICs express data integrity semantics, that is, the part of the meaning of stored data needed to determine correctness. They are invariant restrictions on the static states of the stored data and the state transitions caused by the primitive operations: insertion, deletion, or update.
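As a minimal sketch of the distinction just drawn between restrictions on static states and restrictions on state transitions, the following Python fragment models each kind of SIC as a predicate. The class names, the `violated` helper, the salary bounds and the dict-based rows are all assumptions made for illustration; they are not the SIC Representation model developed in this dissertation, and the dynamic case is simplified to updates only.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

Row = dict  # hypothetical stand-in for one stored record


@dataclass
class StaticSIC:
    """Restricts a single database state: only the resulting row is examined."""
    name: str
    predicate: Callable[[Row], bool]

    def allows(self, old: Optional[Row], new: Optional[Row]) -> bool:
        # After a deletion (new is None) there is no row left to violate the rule.
        return new is None or self.predicate(new)


@dataclass
class DynamicSIC:
    """Restricts a state transition: the old and new rows are compared."""
    name: str
    predicate: Callable[[Row, Row], bool]

    def allows(self, old: Optional[Row], new: Optional[Row]) -> bool:
        # Simplification: only an update has both an old and a new row.
        if old is None or new is None:
            return True
        return self.predicate(old, new)


def violated(sics, old: Optional[Row], new: Optional[Row]) -> List[str]:
    """Names of the SICs violated by an insert (old=None), delete (new=None) or update."""
    return [s.name for s in sics if not s.allows(old, new)]


# Hypothetical constraints on an employee record; the bounds are assumed values.
SICS = [
    StaticSIC("salary_in_range", lambda r: 0 <= r["salary"] <= 200_000),
    DynamicSIC("salary_never_decreases", lambda o, n: n["salary"] >= o["salary"]),
]
```

With these definitions, `violated(SICS, {"salary": 12_000}, {"salary": 10_000})` reports only `salary_never_decreases`: the new value lies within the static range, yet the transition itself is disallowed. Both kinds remain SICs in the sense defined above.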
They express what is and is not allowed in the part of the universe that is represented by the stored data (footnote 1).

Traditional database design techniques focus on structure and efficiency. The relational database model is the popular logical data model with a good theoretical foundation. Data dependency theory (e.g., functional dependencies and multivalued dependencies) has been well-formalized in the literature (e.g., [Delobel, 1978], [Ullman, 1982]). However, data dependencies only capture part of semantic integrity. A relational database in fourth or fifth normal form may still represent little of the data semantics, that is, the meaning of stored data. In addition, as observed by Kent [1979, p. 127], “The assumption tends to be that functional dependencies (if specified at all) have been used during the design phase of the database to insure that relations are in third normal form, and then discarded. They do not seem to be present at run time to explain the semantic structure of the data.”

Footnote 1: SICs also provide information that can be used to determine the most appropriate structure for the schema.

Limitation of SICs. Although a database with embedded SICs would be more correct than the same one without SICs, the absolute correctness of the database is still not guaranteed. For instance, a user might incorrectly update the salary of an employee but, as long as it was within the range allowed by the SICs, it would still be accepted by the DBMS (footnote 2). Also note that enforcement of a SIC is based on the assumption that all data already stored in the database are correct. If we know that some data were entered incorrectly, the related SICs may need to be turned off in order to correct these errors (footnote 3). In addition, SICs are passive restrictions on data. By incorporating violation actions to alter other objects when a SIC is violated, one can make the database more active ([Morgenstern, 1983]).
However, a SIC can only trigger some action when it is violated.

Footnote 2: Similarly, Thompson [1989, p. 95] points out that “a semantic database does not, and cannot, provide meaning, or strong-semantics, but it provides contexts within which it is possible for data to be meaningful”. He names these contexts as weak-semantics. This dissertation does not adopt that term, but the reader should be cautious of the limitation.

Footnote 3: For instance, if a SIC states “salary never decreases”, an update from $12,000 to $10,000 will be rejected. However, it is possible that the $12,000 was entered incorrectly the first time. The SIC must be turned off in order to correct the input error.

1.3 SICs, “Constraints” and Transactions

The reader should be cautious that in the literature the word “constraints” is sometimes used to include all abstract relationships between objects in an information system, e.g., the correctness of mapping to physical storage, and those preserving reliability, concurrent consistency, and security. The SICs discussed in this dissertation are a proper subset of these more general “constraints” (e.g., [Shepherd and Kerschberg, 1986]), “laws”, or “sub-laws” ([Paulson, 1989], [Wand and Weber, 1988; 1989; 1990]). The following are some kinds of “constraints” that are not dealt with in this research.

Other Database Constraints. In a database management system, there are at least four major aspects to the prevention of errors in a database environment: reliability, concurrent consistency, security, and semantic integrity ([Hammer and McLeod, 1975], [Eswaran and Chamberlin, 1975]). Reliability is concerned with errors due to the malfunctioning of system hardware or software. Concurrent consistency is the prevention of inconsistencies that may arise due to concurrent processing (in which multiple processes concurrently operate on shared data). Security deals with preventing users from accessing and manipulating the data in unauthorized ways.
Although unauthorized updates to the database are sometimes said to violate the “integrity” of the system, this type of error is a security problem and is not considered in this research.

Non-database Constraints. Database design is part of information system development, which is the process of modelling a portion of the real world and transforming it into an implemented artifact to deal with information processing functions in an organization. SICs are facts about the stored data ([Oren, 1985]) and are used to capture data integrity semantics ([Dampney, 1988]). However, data integrity semantics do not include all information system semantics, that is, knowledge represented in the information system. Some constraints are inherent to application programs or transactions rather than data ([Fleming and Halle, 1989, p. 140]).

Transaction Modelling. Data-oriented system modelling has long been criticized for focusing only on static properties of an information system. There is ongoing research on how to add dynamics to data-oriented modelling. For example, the ACM/PCM (Active and Passive Component Modelling) uses SHM+ (the Extended Semantic Hierarchy Model) to include both structural and behavioral properties ([Brodie and Ridjanovic, 1984]). The popular conceptual modelling tool, the E-R model, has also been extended to model behavioral aspects of the real world by defining transactions or events (e.g., applying the Petri Nets technique [Sakai, 1983a], [Solvberg and Kung, 1986]). A transaction, sometimes called an application-oriented operation (e.g., “hire”) ([Casanova and Furtado, 1984]), consists of one or more database querying and/or altering primitive operations that must be treated as an atomic unit, and reflects changes and “happenings” in the real world. In transaction modelling, transaction specifications often include pre-conditions and post-conditions. The pre-conditions specify what must be true before the transaction can be applied.
The post-conditions specify the actions to be taken and the test-conditions that should be true after the transaction.

Transaction-driven Constraints. An information system is an artifact that relies on transactions to track changes in the real world. SICs place logical restrictions on stored data and are independent of any particular transaction. Transaction-driven constraints are inherent to transactions rather than data, and assure consistency between the information system and the real world. A transaction-driven constraint requires that when a change or “happening” occurs in the real world, a transaction must be performed to modify the database faithfully. For example, pre-conditions of a transaction may include some procedural or manual checks (e.g., issuing a message to ask whether there are signed documents or calls from customers). Such constraints are more prone to change as an organization evolves. They might include rules on some objects (e.g., virtual fields) in the information system that are not modelled as stored data in the database. They are likely to require a great deal of human checking since they may involve non-data objects (e.g., the above mentioned signed documents).

Footnote 4: Similarly, there are some information system development methods, e.g., the Z approach ([Spivey, 1988]) and VDM (Vienna Development Method) ([Jones, 1986]), that express formal specifications of static and dynamic aspects of information systems by modelling transactions or events.

Note that because a transaction consists of primitive operations, it must also conform to all of the SICs on data. It might be inefficient to enforce SIC checking when a transaction is performed. It is possible to design an algorithm to transform SICs into the pre-conditions and post-conditions of transactions. For example, Lipeck [1986] describes some general rules for these transformations based on his temporal logic language for dynamic SICs. In fact,
most of the usual pre- or post-conditions of transactions discussed in the database literature are transformed SICs rather than true transaction-driven constraints.

To illustrate, consider the following example. A clerk receives a telephone call from a customer to place an order. A transaction-driven constraint stipulates that a transaction “new-order” must be performed. It may stipulate that the transaction must be performed exactly according to the memo written by the clerk or the cassette recording of the call. In addition, it may stipulate that the transaction should be performed immediately by the clerk or in batch during the night by a computer operator. If the transaction is to be performed, a record of the order is to be “inserted” into the database. SICs would then check the attributes of an order, the existence of the customer in the database, etc. However, since SICs are intensional expressions, they do not, in general, restrict when an order must be put into the database or to whom the order should be shipped, etc. In this example, in practice, it would also be impossible to use a SIC to increase automatically the customer’s account balance by the total price of the order unless all past orders and payments are kept in the database to allow evaluation of an invariant formula among these objects (footnote 5). Sending a bill to a customer may also involve a number of transaction-driven constraints in addition to SICs.

1.4 Enforce Semantic Integrity Constraints via Application Software

In traditional data modelling, database design is concerned with the structure of the data and most integrity checking is left to the application programs (procedures). Fernandez, et al.
[1981, p. 109] identify the problems of relying on application programs for integrity checking as follows.

• Checking is likely to be incomplete because the application programmer may not be aware of the semantics of the complete database;
• Each application program must trust other programs that modify the database: one rogue program could corrupt the whole database;
• Code to enforce the same SICs occurs in a number of programs, wasting programming effort and risking inconsistencies;
• The criteria for integrity are buried in procedures and are therefore hard to understand and control;
• Primitive operations (update, insert, and delete) performed by users of high-level query languages cannot be controlled.

Footnote 5: One should note that statement-1, “the new customer balance is equal to its old balance plus total price of the order”, is not an invariant assertion of a SIC for updating Customer.Balance. That statement is specific to the transaction New_Order, which updates Customer.Balance, and does not apply to other transactions (e.g., a new payment by the customer or an update of Order.TotalPrice) that also access these same data objects (i.e., Customer.Balance or Order.TotalPrice). The invariant formula in this example would be that Customer.Balance is equal to the sum of Order.TotalPrice minus the sum of Payment.Amount. However, this formula will normally be inefficient to check. For efficiency reasons, we may transform this formula into pre- or post-conditions on transactions. For example, statement-1 would be a post-condition of the New_Order transaction.

Fleming and Halle [1989, p. 140] state that traditional modelling methods typically leave the definition of SICs to application development rather than to database development.
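The distinction drawn in footnote 5 between an invariant formula and a transaction-specific post-condition can be sketched as follows. This is only an illustration under stated assumptions: the `CustomerAccount` class, the function names and the use of integer cents are hypothetical, and a real DBMS would evaluate the invariant over stored Order and Payment records rather than over in-memory lists.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CustomerAccount:
    """Hypothetical in-memory stand-in for Customer, Order and Payment data."""
    balance: int = 0                                          # Customer.Balance, in cents
    order_totals: List[int] = field(default_factory=list)     # Order.TotalPrice values
    payment_amounts: List[int] = field(default_factory=list)  # Payment.Amount values


def balance_invariant_holds(acct: CustomerAccount) -> bool:
    """Invariant SIC: Balance = sum(Order.TotalPrice) - sum(Payment.Amount).

    It holds regardless of which transaction ran, but re-reads every order
    and payment, which is why it is normally inefficient to check.
    """
    return acct.balance == sum(acct.order_totals) - sum(acct.payment_amounts)


def new_order(acct: CustomerAccount, total_price: int) -> None:
    """New_Order transaction with the post-condition called statement-1."""
    old_balance = acct.balance
    acct.order_totals.append(total_price)
    acct.balance += total_price
    # Post-condition specific to this transaction; constant-time to verify.
    assert acct.balance == old_balance + total_price


def new_payment(acct: CustomerAccount, amount: int) -> None:
    """A different transaction on the same data needs a different post-condition."""
    old_balance = acct.balance
    acct.payment_amounts.append(amount)
    acct.balance -= amount
    assert acct.balance == old_balance - amount
```

After `new_order(acct, 5000)` followed by `new_payment(acct, 2000)` on a fresh account, the balance is 3000 and `balance_invariant_holds(acct)` is true. Each post-condition covers only its own transaction, whereas the single invariant covers both.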
This application-driven method raises three dangers as outlined below:

• A user may define the SICs too narrowly, reflecting only the needs of the immediate application;
• Multiple users interested in different applications may have different perspectives on SICs. Thus they may define inconsistent or even conflicting SICs as part of the application specifications;
• Maintaining correctness and consistency across SICs implemented via multiple applications may be extremely difficult (or impossible) to accomplish as applications evolve and new applications arrive.

In summary, Fernandez et al., and Fleming and Halle identify the disadvantages of enforcing integrity via the application software as: incomplete, inconsistent, redundant, incorrect, hard to understand and control, and difficult to maintain. Using the database embedded SIC approach will overcome these disadvantages.

1.5 Embed Semantic Integrity Constraints in a Database

Database Approach. Compared to traditional information processing before the development of database concepts, the database approach is claimed to have the following advantages [Goldstein, 1985, p. 8]:

• Controlling data duplication and inconsistency;
• Facilitating the sharing of data among applications;
• Assisting the coherent management of data as a basic organization resource;
• Increasing programmer productivity;
• Increasing the reliability of applications;
• Enabling quick, economical response to ad hoc requests for information;
• Protecting data from damage or unauthorized access;
• Providing data independence.

It is not surprising to find that the disadvantages of enforcing integrity via the application software criticized by Fernandez et al., and Fleming and Halle are similar to the disadvantages of distributing stored data to separate files instead of a database. The full advantages of the “true database approach” are still not achieved if only the corporate-wide “data” are included in the database, but the corporate-wide SICs are distributed to separate
application programs.

Behavioral Modelling. As stated in the previous section, research has been ongoing to enhance data-oriented system modelling by incorporating some behavioral modelling methods to model the state transition and dynamic properties of an information system. Some researchers claim that transaction modelling is part of database design (e.g., [Brodie, 1986], [Brodie and Manola, 1989]). Some researchers even propose using transaction specifications to replace SIC specifications (e.g., [Abiteboul and Vianu, 1985], [Lipeck, 1986]). However, even if we agree that a “complete” database design includes transaction modelling, there would be some disadvantages of distributing SICs to separate transactions (e.g., redundancy and maintenance problems) that are similar to the case of distributing SICs to separate application programs. Both dynamic and static SICs are logical restrictions on data and should be embedded in a database.

In summary, this research assumes that transaction modelling is still valuable, but SIC specifications are fundamental to behavioral modelling. The position of this research is similar to the idea of having both specifications in the BASIS approach ([Leveson et al., 1983]), the idea of hierarchical levels of database specifications proposed by de Castilho, et al. [1982] and Casanova and Furtado [1984], and the hierarchical specification layers used in the CADDY design environment ([Hohenstein and Hülsmann, 1991]). That is, a “complete” database design will produce the two levels of specifications as follows.

1. First-level specifications form the database schema that consists of structures and SICs. These are data-driven and tend to be more fundamental (stable) since they are not affected by the addition or deletion of transactions;
2. Second-level specifications consist of the specifications of transactions. The pre-conditions and post-conditions provide an effective way of implementing SICs and may contain some transaction-driven constraints.

Knowledge-based Perspective. Databases have been criticized for lacking abstract knowledge (footnote 6) ([Wong and Mylopoulos, 1977], [Bubenko, 1980]), and inference capabilities ([Wong and Mylopoulos, 1977], [Brodie, 1986]). Recently, the development of knowledge-based systems (KBS) provides useful insights for database research. Some researchers (e.g., Jarke and Vassiliou [1984], Missikoff and Wiederhold [1986], and Kennedy and Yeh [1990]) have recognized that a KBS can provide a DBMS with better semantic modelling, reasoning ability, improved user interface, etc. A special type of system, expert database system, has been proposed to integrate a knowledge-based system (expert system) with a database system ([Missikoff and Wiederhold, 1986]). The approach to embedding SICs in the database fits this new trend. SICs are abstract knowledge that can be used to provide deductive capabilities to a database, that is, make a DBMS appear more “intelligent”. An obvious example is the case where SICs are used to reduce knowledge-based query evaluation costs ([Brodie, 1986]). For example, some queries can be answered using only the semantics expressed through SICs ([Chakravarthy, et al., 1987]).

1.6 The Research Questions, Objectives, Methodology and Scope

Recently, a number of semantic data models (e.g., [Hull and King, 1987], [Peckham and Maryanski, 1988]) have been proposed to overcome the weaknesses of traditional data models (i.e., the hierarchical, network, and relational models) in modelling the semantics of the real world. However, semantic integrity constraints have not received the attention they deserve. The relational model is still the most popular logical data model. Almost all research on semantic integrity constraints in the literature is confined to exploring
Almostall research on semantic integrity constraintsin the literature is confined to exploring6Bubenko [1980] defines two kinds of knowledge:(a) concrete knowledge is seen as facts, statementsconcerning individual phenomena, entities orrelationships in the model of the environment;(b) abstractknowledge denotes such information which augments our interpretationof concrete information and bywhich we can draw inferences, conclusions of other facts.Chapter 1. Introduction 13the efficient enforcement (checking) methods for a few kinds of SICs in a relationaldatabase or deductive database. No suitable language exists to represent the featuresof SICs precisely: certain or uncertain; for what data object; operation-independentoroperation-dependent; conditional or unconditional; strong,soft or self-correcting; andstatic or dynamic7.Research Questions Based upon the above observation, this researchaddresses thefollowing questions:Is it possible that the features of SICs can be precisely represented bysomelanguage when using an E-R model for conceptual modelling anda relationalmodel for logical modelling?Can we provide a model to incorporate the necessary SICs in adatabaseschema during conceptual modelling?Can we help a database designer capture these SICsby providing automatedtools?Research Objectives To answer the above researchquestions, the following specificobjectives were established.1. To develop a precise SIC Representation model forspecifying the componentsof a SIC. Since current languages are not sufficient to representthe features of a7These features are described in detail in Chapter 2. An operation-dependent SICis the one thatmust hold for some object upon some operations, but not upon other operations.Violating a strong SICwould cause errors — reject the operation. Violating a soft SIC would only get a warningmessage.Chapter 1. 
Introduction14SIC when the structure part of a schema is in the E-R model or in the relationalmodel, this research will first present a SIC Representation model that is extendedand modified from the model proposed by Fernandez et al. [1981], Date [1983],andBertino and Apuzzo [1984].2. To develop a comprehensive modelling tool, called the E-R-SIC model,for incorporating the necessary SICs during conceptual modelling. Sincethe traditionalE-R model contains only a few inherent SICs, this research proposes acomprehensive model, which is an extended E-R model, for incorporatingSICs in a databaseschema.3. To propose conceptually the SIC elicitation subsystem of anautomateddatabase design system for helping a database designer capturethe necessarySICs. The SIC Representation model and the E-R-SICmodel are complementary.The captured SICs are represented in terms ofthe Representation model. The SICRepresentation model is a language or representationtool to represent SICs. Itsfunction is similar to the E-R diagram that we useto represent the structural partwhen using an E-R model. Both themodels are application domain-independentand are suitable for implementation as part of anautomated database design system. Some problems, for example, the procedure toquery the database designerto identify SICs, verification of the capturedSICs for consistency and nonredundancy, reformulation and decomposition of a generalSIC into operation-dependentsub-SICs, and transformation of the SICs referencingentities and relationshipsinto corresponding ones referencing relations, needto be solved in order to have aworkable automated database design system. A SIC originally obtainedfrom thedatabase designer may be general, i.e., it may be relevant to severalobjects onvarious operations. Decomposing such a SIC is to rewrite it into severalsub-SIGsChapter 1. Introduction 15represented in terms of the SIC Representation model. 
Each of them is relevant for only one object on one access operation; that is, it is operation-dependent. The purpose of decomposition is not only to let the database designer know clearly what the precise implications of a general SIC would be, but also to reformulate the original SIC into several formats that can be efficiently enforced later.

Research Methodology: Methodologically, this research has two components. The first is model building to develop two models. The second is algorithm designing to develop algorithms to verify, decompose, and transform SICs.

Research Scope: Figure 1.1 illustrates the scope of this research. This research is primarily concerned with semantic integrity constraints in the conceptual design phase and how they are translated into SICs in the logical design phase. In particular, it focuses on the Entity-Relationship model and its transformation into a relational model because of the popularity and widespread adoption of these as conceptual and logical data modelling tools, respectively.

A complete automated database design system would include a structure subsystem for constructing the structural entities, relationships, and relations, and a SIC elicitation subsystem for eliciting the SICs. The construction of the structure part of the schema is not the focus of this research although the SIC elicitation subsystem would need the constructed E-R diagram as input.

The proposed approach in this research is different from Codd’s ([Codd, 1979]) although the purpose of capturing more meaning may be the same. Codd proposed a new data model RM/T to extend (indeed replace) the relational model.
This research retains the traditional relational model because of its popularity, but proposes to include the necessary SICs in addition to relation structures in a relational schema. The output of the proposed SIC elicitation subsystem would be SIC specifications, which are suitable for either the E-R model or the relational model. There should be an integrity maintenance subsystem in a traditional E-R DBMS [footnote 8] or relational DBMS. Some functional requirements (e.g., regarding SIC enforcement schedules and SIC inheritance, etc.) of this integrity maintenance subsystem are discussed in Chapter 4. However, in general, how this integrity maintenance subsystem would work is a future research topic beyond the scope of this dissertation. It is also assumed that the DBMS would support inheritance mechanisms.

[Figure 1.1: A Proposed Automated Database Design Subsystem for Eliciting SICs. Note in figure: some general SICs need to be first decomposed into precondition and predicate components before verifying them. Legend: dashed lines and boxes are beyond the scope of this research.]

This research does not address transaction modelling.
The database designer may later input transaction-driven constraints and SIC specifications to a transaction design tool to produce the pre-conditions and post-conditions of transactions, which would be performed in an E-R DBMS or relational DBMS.

This research neither empirically tests the “usefulness” of using database-embedded SICs versus the traditional approach of enforcing integrity via the application software nor tests the “usefulness” of using the proposed automated database design system. The empirical research is a future research topic.

Footnote 8. It is assumed that there are some (probably experimental) DBMSs to define and manipulate data objects directly in the E-R model.

1.7 Contributions

The contributions of this research are both theoretical and practical. The theoretical contributions include the following:

1. This research provides a model to represent precisely the features of a SIC.

2. This research also develops a model to incorporate the necessary SICs in a database schema. The approach is to model dynamic as well as static SICs in the database rather than in transactions or programs. The gap between traditional database modelling and application programming (transaction modelling) will be bridged by the two models presented in this research.

3. The work on reformulating and decomposing a general SIC into sub-SICs, and transforming them from an E-R schema into a relational schema may be interesting to the computer science discipline because the current literature has explored these problems for only a few kinds of SICs.

On the practical side, the contributions of this research include the following:

1. The proposed automated database design system would help a database designer not only design the database structure but also include the necessary SICs.

2. This research provides a foundation for overcoming the well-known problem of representing data integrity semantics in current relational database systems.
The resulting database would have the advantages of embedded SICs, e.g., greater consistency, deductive capabilities, etc. The SIC representation would facilitate efficient enforcement.

3. A database designer would find that modelling data integrity semantics becomes his (her) responsibility and right. Important business rules and even heuristics can enter the database schema in an early database design phase, namely conceptual modelling. Application programmers could focus on information system semantics other than data semantics, and need not worry about SIC checking in their individual programs.

4. This research provides a starting point for future empirical research to test the usefulness of using database-embedded SICs versus the traditional approach of enforcing integrity via application software.

1.8 The Dissertation Outline

This chapter has given the definition of SICs, described the motivations, methodology, scope and contributions of this research. Chapter 2 briefly reviews work on SICs in the literature. Chapter 3 introduces the SIC Representation model for representing SICs uniformly and precisely. Chapter 4 describes how the SIC Representation model can be applied. Chapter 5 proposes the E-R-SIC model for incorporating SICs in a database schema. Chapter 6 gives an example of the use of the E-R-SIC model and discusses related issues and usefulness. Chapter 7 conceptually proposes a SIC elicitation subsystem to help the database designer use the E-R-SIC model and the SIC Representation model. Finally, Chapter 8 offers conclusions and describes how future researchers can extend this dissertation.
A series of appendices is attached to provide related materials in detail and give some examples.

Chapter 2

Review of Previous Work

As briefly mentioned in Chapter 1, an automated database design system would first elicit general SICs from a database designer, then verify and reformulate (decompose if necessary) them in some representation language for the database structural schema represented in the E-R model, and finally transform them into corresponding ones in the relational model. This chapter reviews previous work on related aspects of producing SICs.

The literature on SICs is rich. Most research concentrates on classifying and efficiently enforcing SICs rather than capturing and incorporating them in a database schema. In addition, most research discusses SICs in the context of relational or deductive databases rather than databases at the conceptual level of an E-R schema. Therefore, transformation from the SIC representations in the E-R model into corresponding ones in the relational model has not been explicitly discussed in the literature although some existing automated database design systems may perform it for a few types of SICs that they identify. Section 2.1 first reviews the different ways to classify SICs because the classification can help understand the important SIC features so that we can incorporate, represent and enforce them properly. Section 2.2 reviews previous research on languages for representing SICs. Section 2.3 reviews how verification of SICs has been dealt with in the literature. Section 2.4 reviews previous research on reformulating SICs. Finally, Section 2.5 reviews how SICs are handled by existing automated database design systems.

2.1 SIC Classification

In the literature, there are a number of different ways to classify SICs.

2.1.1 Classification Based on Explicitness

One way to classify SICs is based on their explicitness from the perspective of the data model in use.
SICs can be inherent, explicit or implicit ([Tsichritzis and Lochovsky, 1982], [Brodie, 1983], [Brodie, 1984], [Davis and Bonnell, 1989]). Inherent constraints are integral parts of the structure of a data model (e.g., record relationships in the hierarchical data model are structured as trees; tuples in the relational data model are not duplicated and the ordering of tuples is not important). Explicit constraints are defined explicitly by some specification mechanism; for example, a database designer might explicitly specify that the salary of any employee must be less than $1,000,000. Finally, implicit constraints are logical consequences derived from inherent or explicit constraints; for example, the transitive closure of functional dependencies can be deduced from a subset of functional dependencies.

Discussion: A data model with rich inherent SICs would relieve the database designer from specifying many SICs explicitly. From this perspective, neither the E-R model nor the relational model is a good modelling tool for designing a database since both models, especially the relational model, contain very few inherent SICs. However, they are widely used for other reasons. It is one of the motivations of this research to help the database designer identify and precisely specify explicit SICs. A database designer and an automated database design system need to know the inherent SICs of the data model (e.g., the E-R model) that they use. Otherwise, it is likely that these inherent SICs would be lost when the structural schema of the conceptual model is transformed into the logical model (e.g., the relational model). Both inherent and explicit SICs should be represented properly at the conceptual design phase and transformed into a logical schema.
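The notion of an implicit constraint mentioned above (a logical consequence of explicit ones, such as a functional dependency obtained by transitivity) can be illustrated with a minimal sketch. It uses the standard attribute-closure computation for functional dependencies; the attribute names (Emp_No, Dept_No, Manager) and function names are hypothetical and are not taken from the thesis.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

# A functional dependency X -> Y, stored as (frozenset(X), frozenset(Y)).
FD = Tuple[FrozenSet[str], FrozenSet[str]]


def attribute_closure(attrs: Iterable[str], fds: Iterable[FD]) -> Set[str]:
    """Return every attribute functionally determined by `attrs` under `fds`."""
    closure: Set[str] = set(attrs)
    fd_list: List[FD] = list(fds)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fd_list:
            # If the left side is already determined, so is the right side.
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure


def is_implied(lhs: Iterable[str], rhs: Iterable[str], fds: Iterable[FD]) -> bool:
    """True if lhs -> rhs is a logical consequence of the explicit `fds`."""
    return set(rhs) <= attribute_closure(lhs, fds)


# Hypothetical explicit constraints: Emp_No -> Dept_No and Dept_No -> Manager.
explicit_fds: List[FD] = [
    (frozenset({"Emp_No"}), frozenset({"Dept_No"})),
    (frozenset({"Dept_No"}), frozenset({"Manager"})),
]

# Emp_No -> Manager was never stated, yet it follows from the two above.
print(is_implied({"Emp_No"}, {"Manager"}, explicit_fds))
```

A check of this kind is one way a designer could detect that a newly stated constraint is redundant because it is already implied by the explicit ones.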
Given a set of inherent and explicit SICs, the database designer should also be aware of the derived consequences, i.e., implicit SICs, to verify its consistency and non-redundancy.

2.1.2 Classification Based on Applied Objects

Classification of SICs based on applied objects is the most common classification and is concerned with the data objects to which a SIC applies ([Eswarn and Chamberlin, 1975], [Fong and Kimbleton, 1980], [Fernandez et al., 1981], [Tsichritzis and Lochovsky, 1982], [Date, 1983], [Weber, et al., 1983], [Simon and Valduriez, 1984]). There are three categories of these SICs:

1. Strong data type constraints: Strong data type constraints (also called domain constraints) are applied to a single data item, for example, a field or an attribute. They include the following:

(a) Value constraints specify the range of acceptable values of a data item (e.g., within some numerical bounds or an enumerated set) and whether a null value is allowed.

(b) Nonvolatility constraints declare whether a data item value can be changed ([Kozaczynski and Lilien, 1988]).

(c) Extended format constraints permit specifications of data type, length, and format (mask) pattern.

(d) Legal operation constraints limit the operations that can be performed on a given domain ([Eswarn and Chamberlin, 1975], [Fong and Kimbleton, 1980]); for example, two dates cannot be multiplied.

2. Record (tuple) constraints: Record constraints apply to an individual record (an occurrence in terms of the E-R model). For example, in each payroll record, Gross_Salary should be greater than Deductions. One interesting kind of record constraint is the case where the value of one field in a record is conditional on the values of other fields ([Benci et al., 1976]). For example, Salary is equal to Base_Salary plus Bonus.
In this example, regardless of whether the value of Salary is entered by a user or computed by the system itself, as long as this value is explicitly stored, the integrity constraint must hold. If the conditional attribute is not explicitly stored, it is a virtual invariant, derived item because one data item is defined as a function of another ([Etzion, 1989]).

3. Set constraints: Set constraints apply to a set of records (occurrences). They may be based on built-in aggregate functions (e.g., average, minimum, count). Therefore, they are sometimes called aggregate constraints ([Tsichritzis and Lochovsky, 1982]) or set function constraints ([Dogac et al., 1985]). They may be based on comparison (e.g., exclusion or inclusion) of one set to another. For example, the set of managers must be contained in the set of employees. These records (occurrences) may belong to the same relation (entity) type or different relation (entity) types. From this observation, some researchers (e.g., [Fong and Kimbleton, 1980]) further classify them into relation constraints and multi-relation constraints.

Discussion: This categorization reminds us that all objects in a database (attributes, entity and relationship occurrences and types in the E-R model; columns, relation tuples and types in the relational model) may have related SICs that need to be identified and represented. However, there are two controversial points.

• Should we capture legal operation constraints? Note that this kind of “constraint” borrows the notion of abstract data types ([Goldstein, 1985]), and the “operations” are basic arithmetic or string operations that may be performed on value domains, not the database primitive manipulation operations. This research will not consider these “constraints” because they do not conform to the SIC definition.
This kind of “constraint” is relevant only if the data object is manipulated by some restricted “operations” (e.g., arithmetic or date operations). It is not a restriction on static database states or state transitions.

• Should we capture extended format constraints? One may argue that the specification of data type, length, and format of an attribute is a syntactic rather than a semantic issue. However, this research includes extended format constraints as part of SICs for the following two reasons. (1) The semantics of an attribute are based on the syntax we agree to use. For example, the meaning of a salary range from $1,000 to $50,000 in integers is different from that of the same range of real numbers. (2) SICs in this research are attached to the stored data rather than real world objects. A database designer would specify a SIC in terms of the interests of the application and include the data in the format that is meaningful to it.

2.1.3 Classification Based on Operation Type

SICs that are concerned with database operations can be classified as operation-dependent or operation-independent constraints ([Eswarn and Chamberlin, 1975], [Fernandez et al., 1981], [Weber, et al., 1983]).

• operation-independent: A constraint is operation-independent if it must hold for some object on all access operations, although for efficiency reasons it may be enforced only on some operations.

• operation-dependent: A constraint is operation-dependent if it must hold for some object on some access operations, but not on other access operations.

Discussion: We only need to consider three kinds of operation types (insertion, deletion and update) because a retrieval (query) operation does not change any data. An update operation is on an attribute of an entity, relationship occurrence, or relation tuple, but not on the whole entity, relationship occurrence, or relation tuple. It is also the only applicable operation on an attribute.
Therefore, if a SIC is relevant to an attribute, it is operation-dependent. For an entity, relationship occurrence, or relation tuple, there are two possible operations: insertion and deletion. It appears that in the real world if a SIC applies to an entity, relationship occurrence, or relation tuple, it is likely to be operation-independent. Operation-dependent constraints (e.g., “a project can be deleted only if its budget is equal to zero”) seem to be relatively less common. This may explain why previous research pays little attention to them. However, the possible relevance to specific operations creates a special requirement on the SIC representation language.

2.1.4 Classification Based on Precondition

According to Fernandez et al. [1981], SICs based on the precondition for enforcement can be classified as either conditional or unconditional. Conditional constraints are enforced only when certain preconditions are met. For example, the salary of an employee who has worked for less than three years must not be over $30,000. Unconditional constraints are always enforced.

Discussion: Whether a SIC is conditional or unconditional is also relative to the object concerned. It seems that the research of Fernandez et al. [1981] is the only work that classifies SICs in this way. However, this categorization perspective places another requirement on the SIC representation language: the context under which the SIC is applicable should be precisely represented.

2.1.5 Classification Based on Certainties of SICs

SICs can be classified as certain or uncertain ([Oren, 1985], [Morgenstern et al., 1989]). A certain constraint specifies some fact about the data semantics that is assumed to be absolutely true (e.g., “the height of a person is greater than zero” [Oren, 1985]). An uncertain constraint is one that is generally true, but there is a slight probability that it can be violated (e.g., “the weight of a person is less than 200 kgs”).
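To make the features discussed so far concrete, the following minimal sketch records, for each constraint, the operations on which it must hold (Section 2.1.3), the precondition under which it applies (Section 2.1.4), and whether it is certain (Section 2.1.5). The class and field names are hypothetical illustrations only; they are not the SIC Representation model developed in this thesis, and the `certain` flag is kept purely as metadata.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet

Record = Dict[str, float]
ALL_OPERATIONS = frozenset({"insert", "delete", "update"})


@dataclass(frozen=True)
class Constraint:
    """Hypothetical record of one SIC; all field names are illustrative."""
    target: str                                   # data object the SIC applies to
    predicate: Callable[[Record], bool]           # what must be true
    operations: FrozenSet[str] = ALL_OPERATIONS   # operations on which it must hold
    precondition: Callable[[Record], bool] = lambda record: True
    certain: bool = True                          # False for an uncertain SIC

    def holds(self, operation: str, record: Record) -> bool:
        if operation not in self.operations:
            return True   # operation-dependent: not relevant to this operation
        if not self.precondition(record):
            return True   # conditional: the precondition is not met
        return self.predicate(record)


# Operation-dependent: "a project can be deleted only if its budget is equal to zero".
project_deletion = Constraint(
    target="Project",
    predicate=lambda r: r["budget"] == 0,
    operations=frozenset({"delete"}),
)

# Conditional: an employee with less than three years of service
# must not earn over $30,000.
junior_salary = Constraint(
    target="Employee",
    predicate=lambda r: r["salary"] <= 30000,
    precondition=lambda r: r["years_worked"] < 3,
)

# Uncertain: "the weight of a person is less than 200 kgs".
weight_limit = Constraint(
    target="Person",
    predicate=lambda r: r["weight_kg"] < 200,
    certain=False,
)

print(project_deletion.holds("delete", {"budget": 500}))
print(junior_salary.holds("update", {"years_worked": 5, "salary": 45000}))
```

Keeping the operations and the precondition as separate fields mirrors the point made in the two Discussion paragraphs above: a representation language needs a place for each of them.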
Certain and uncertain constraints lie on a continuum. Wiederhold [Morgenstern et al., 1989] identifies four levels with respect to the degree of certainty and absoluteness of constraints:

1. absolute truths that arise in the physical world and in the database; for example, an employee has a birth date.

2. rigid situational rules that do not change often; for example, each employee is assigned to one department.

3. business rules that may change often; for example, a manager’s salary is greater than the salary of his or her subordinates.

4. heuristics; for example, employees are usually assigned to projects that match their specialities.

A more sophisticated proposal such as defining and maintaining “measure(s) of accuracy” may be possible ([Eswarn and Chamberlin, 1975]). Violation of a certain constraint may be rejected as an error, whereas violation of an uncertain constraint may only cause a diagnostic message. Therefore, uncertain constraints are sometimes called soft constraints ([Eswarn and Chamberlin, 1975]).

Discussion: This classification is another perspective that has not drawn much attention. Since it is likely that an organization might have a number of uncertain but useful constraints, there should be some way to represent them.

Note that the four levels of certainty mentioned by Wiederhold ([Morgenstern et al., 1989]) are in fact classified by two kinds of “uncertainty”: exception and permanence. “Heuristics” may have exceptions. “Absolute truths”, “rigid situational rules”, and “business rules” have no exceptions. If there is any exception, the “rules” should be modified to accommodate the exception and the “modified rules” have no exceptions. These levels differ on their permanence, that is, the “uncertainty” of how often they may be changed.

Should SIC specifications include both “uncertainty” of exception and “uncertainty” of permanence?
The “permanence” information may be useful for SIC management during database usage. However, the organizational environment is turbulent and constantly changing. The permanence of a rule is not easily decided in advance when a database designer designs a database. Therefore, this research does not include this kind of information.

2.1.6 Classification Based on Violation Action

SICs based on the violation action, i.e., what would happen if the SIC is to be violated, can be classified as strong, soft, or self-correcting ([Weber et al., 1983]). If a strong constraint is violated, the operation is rejected and the user receives an error message. If a soft constraint is violated, the user only receives a warning. If a self-correcting constraint is violated, its error correcting action is executed.

Discussion: In the published literature, [Weber et al., 1983] is the only research that classifies SICs according to their violation actions although there is other research that discusses alternative violation actions. Violation actions are classified into the three categories given below, which are synthesized from previous research ([Hammer and McLeod, 1975; 1976], [Hammer and McLeod, 1976], [Casanova and Tucherman, 1988] and [Fleming and Halle, 1989]).

1. Reject: reject the requested database operation, signalling an error.

2. Warning: allow the requested database operation, but issue a warning.

3. Corrective Action: perform a corrective action; that is, an auxiliary procedure known as a triggered action ([Fernandez et al., 1981]). Usually, it causes the DBMS to insert, delete, or update other objects.

For referential integrity constraints, e.g., E.A ⊆ F.B, where E.A is a foreign key of a relation E and F.B is the primary key of another relation F, the corrective action can be further classified as:

(a) Propagate: for example, insert a referenced tuple or delete the referencing tuples.

(b) Nullify: for example, set the referencing attributes in the referencing tuple to null.

(c) Default: for example, set the referencing attributes in the referencing tuple to predefined default values.

(d) Others: for example, triggers or other procedures that are domain specific.

Note that “nullify” and “default” are special cases of “propagate”.

Traditionally, a SIC is strong by default. However, this classification is important to remind a database designer that the operation need not simply be rejected whenever a SIC is violated. There are other options that can be specified at database design time.

2.1.7 Classification Based on Enforcement Schedule

SICs based on the enforcement time, i.e., when the SIC is enforced, can be classified as immediate or deferred (e.g., [Eswarn and Chamberlin, 1975], [Fong and Kimbleton, 1980], [Fernandez et al., 1981], [Date, 1983], [Weber, et al., 1983]). Immediate constraints are enforced immediately after each database operation. Deferred constraints are not enforced until the end of a transaction. It may also be possible to permit the user to switch integrity checking ON or OFF, that is, to have user-invocable constraints ([Fong and Kimbleton, 1980], [Bertino and Apuzzo, 1984], [Weber, et al., 1983]).

Discussion: A number of researchers discuss this categorization. However, different specification levels should not be mixed up. This categorization is useful when considering the enforcement of SICs, or designing transaction specifications. Fernandez et al. [1981] allow the precondition component of their original model to specify whether the SIC is to be applied immediately or deferred to the end of a transaction or to a periodic audit. However, Bertino and Apuzzo [1984] treat the enforcement schedule separately as decided by an integrity maintenance subsystem because they believe that the enforcement schedule of a SIC depends upon the transactions or programs that are executing. Their example may be helpful to understand their arguments:

A SIC: “each employee working on project P125 must earn less than $3,000.”

Suppose that now the database contains an employee entity occurrence who is in department D55, and works on project P125.

Consider a transaction: first update the salary of all employees in D55 to $3,500; then re-assign the project of all employees in D55 to P200.

This transaction should be correct.

Based upon the above observation, Bertino and Apuzzo [1984] propose a criterion:

Basically, an integrity constraint is enforced at the end of the transaction, if attributes present in the constraint are modified by more than one update statement in the transaction.
Otherwise, the constraint is enforced after each tuple update, if it is in class C1 [i.e., tuple constraints], or after each update-request, if it is in class C2 [i.e., relation constraints] or C3 [i.e., multi-relation constraints].

Furthermore, the enforcement schedule can be more sophisticated and more efficient. Lafue [1982] proposes that constraint checking can be delayed until the dependent instances become of interest [footnote 9]. That strategy is called “propagation when used” by Morgenstern [1986]. Between immediate propagation and “propagation when used” is an intermediate strategy that has been referred to by Morgenstern [1986] as opportunistic propagation, both in the sense of doing the work of constraint propagation when the computer is idle, and in the sense of using priority ranking of the constraints to select the order in which they should be considered for propagation.

Footnote 9. A dependent variable is a variable that can be operated on by the constraint, i.e., by a violation action.

Thus, because the enforcement schedule is closely related to transaction modelling and enforcement implementation efficiency strategies, it should not be included in SIC specifications. Neither should the option to switch integrity checking ON or OFF.

2.1.8 Classification Based on Dynamics

SICs based on their dynamics can be classified as static or dynamic ([Eswarn and Chamberlin, 1975], [Bracchi et al., 1979], [Fong and Kimbleton, 1980], [Fernandez et al., 1981], [Date, 1983], [Bertino and Apuzzo, 1984], [Brady and Dampney, 1984], [Heuser and Richter, 1986]). Static constraints specify correct database states. Transitional (dynamic) constraints characterize valid state transitions, i.e., are concerned with the “admissibility” of a database state sequence.

There are two major kinds of transitional constraints mentioned in the literature ([Fong and Kimbleton, 1980]): (1) old/new transitional constraints that restrict an update of an attribute during which its “old” value is to be changed to a “new” value (e.g., “new salary must be greater than old salary”); (2) nonexistence/existence transitional constraints that restrict either a nonexistence to existence transition or an existence to nonexistence transition (e.g., “only if the account balance is zero, can the account be deleted”).

Some dynamic constraints (e.g., [de Castilho et al., 1982], [Casanova and Furtado, 1984], [Ehrich et al., 1984], [Kung, 1984] and [Lipeck, 1986]), which are often neglected by researchers, include:

• constraints on a sequence of operations: some operations must happen in a specific sequence or at the same time. For example, “ownership of a car must be passed from a manufacturer to a dealer first, before it may be passed to a purchaser”.

• constraints involving time explicitly:

1. SICs having some explicit time restriction or time-triggering condition. For example, we may have “an employee cannot receive a salary raise during his (her) first 6 months in the company”, or “at 0:00 on 1/1/1993, increase the salary of each employee by $1,000.”

2. Time-triggered or restricted SICs depending on past generations of data and thus referencing historical data at some specific time point. For example, we may have “the price of any product at any time cannot be more than 5% higher than its price one year ago.”

Discussion: Both static and dynamic SICs are important to preserve the logical meaning of stored data. A database designer should capture the necessary dynamic SICs and have some language to represent them. Constraints on a sequence of operations are not easily captured and represented as SICs. This may explain why researchers often either neglect them or model them by transactions or events.
However, they are theconsequences of enforcing some SICs if specified properly.Note that an operation-dependent SIC is also a dynamic SIC. Old/new transitionalconstraints and nonexistence/existence transitional constraints are important. However,one should not define them too narrowly. That is, old/new transitional constraints arespecial cases of the update transitional constraints, which do not necessarily involvethe old/new comparison. Nonexistence/existence transitional constraints are also specialcases of the deletion/insertion transitional constraints, which may involve more than oneobject.In order to capture and represent SICs having explicit time restriction, in a real timeenvironment, we would need a special system variable Current_time to registerthecurrent clock time’0,and explicit time-valued attributes in the entity or relationship. Ina non-real time environment, those become ordinary data-driven constraints involvingsome time-valued attributes. If we wish to capture and represent SICs using historicaldata in general, we would need time-stamped generations of data. This research doesriot explore the last kind of SIC since a database keeping time-stamped generationsofdata is both unusual and expensive to implement.101t is assumed that a DBMS has access to a clock that registers both current date and time. Current..time can be thought of as an attribute of a special system entity type. Because constraints maybe affected by the normal advance of time, we assume that the operating system can be instructed tosignal the DBMS integrity maintenance subsystem when a pre-specified time is reached.Chapter 2. 
2.1.9 Summary of Comments on SIC Classification

In the literature, researchers describe different SIC categorization schemes for diverse purposes: discussing data models, and incorporating, representing, and enforcing SICs.

For incorporating and representing SICs adequately, the most important categorization schemes are: (1) certain or uncertain, (2) the classification on applied objects, (3) operation-independent or operation-dependent, (4) unconditional or conditional, (5) strong, soft or self-correcting, and (6) static or dynamic. These categorizations are too rudimentary to serve as a modelling tool for incorporating SICs. However, they are very important for precisely representing SICs since they provide a feature listing of a SIC. Without any explicit description, one could only assume the given SIC to be certain, related to all objects mentioned in that SIC, operation-independent, unconditional, and strong.

2.2 SIC Representation

SQL is a generally accepted relational database language. However, only very few SICs are mentioned in the ISO SQL standard. The implied violation action of a SIC is to set SQLCODE negative. That is, it causes an error. Other SIC features, such as whether the SIC is operation-independent or operation-dependent, unconditional or conditional, certain or uncertain, are not specified.

The SQL standard has two levels and one addendum [van der Lans, 1989]. The specified SICs are as below:

1. SQL Level 1 Standard: It only specifies data types and length of data items. NOT NULL must be specified in every column definition in a CREATE TABLE statement.

2. SQL Level 2 Standard: In addition to data types, it allows the designer to specify whether a data item is UNIQUE and whether it can be NULL.

3. SQL with Addendum:

• In addition to the above, it allows specification of a value range by using the CHECK specification for a data item;

• By using the CHECK specification separately (e.g., CHECK (YEAR-OF-BIRTH < YEAR-JOINED)), tuple constraints can be specified;

• Few set constraints have been included:

- Referential constraint is provided by using FOREIGN KEY (column list) REFERENCES (table name);

- UNIQUE or PRIMARY KEY specification.

A number of researchers propose alternative languages to represent more SICs. Some are variants of first order logic (FOL) languages, for example, the many-sorted first-order predicate calculus applied by Furtado et al. [1981]; the constraint equation proposed by Morgenstern [1983; 1984a; 1984b; 1986]; the equation statements suggested by Cosmadakis and Kanellakis [1985]; the first order formulas used by Reiter [1984; 1988], Henschen et al. [1984], and Urban and Delcambre [1989]. These FOL family languages are purely declarative. The operations to be checked are treated as an implementation problem. No violation action is specified. The constraints are assumed to be certain. Furthermore, these FOL languages cannot represent some kinds of SICs, e.g., operation-dependent or dynamic SICs. Other researchers (e.g., [de Castilho et al., 1982], [Casanova and Furtado, 1984], [Ehrich et al., 1984], [Kung, 1984] and [Lipeck, 1986]) propose an extension of FOL, temporal logic, that may include explicit “state” or “time” parameters. They invent some special temporal quantification or a list of modalities (e.g., always, until, heretofore, sometime) to model dynamic SICs. One potential problem of these temporal logic languages is that they may not be easily understood and implemented.

Others just extend the original SQL proposals, but do not follow the SQL standard. For example, Hammer and McLeod [1975] state that the syntax of their language is “rather similar” to SEQUEL.
Bertino and Apuzzo [1984] propose a 5-component model to specify SICs and state that their language is a “simple extension” to SQL. Date [1983] applies a language of his own, very loosely based on the PL/I version of UDL (Unified Database Language), and describes [1987] some proposed extensions of the base SQL standard. These SQL extensions are more precise and powerful for representing SIC features than the SQL standard. Among these, the language model proposed by Bertino and Apuzzo [1984] and the similar one by [Fernandez et al., 1981] (footnote 11) may be capable of representing the SIC features mentioned above except for certainties. However, neither work carefully elaborates what would be in each component (footnote 12). In addition, SICs represented in their original model may not be efficiently enforced if we compare the representation to Date’s UDL [Date, 1983], which uses the idea of a cursor to facilitate SIC enforcement on occurrences.

Footnote 11: However, [Fernandez et al., 1981] does not describe what their language really is: first order logic or SQL.

Footnote 12: For example, the model proposed by Fernandez et al. [1981] allows the enforcement schedule to be specified as a part of the precondition of a SIC. The model proposed by Bertino and Apuzzo [1984] allows options to switch a SIC ON/OFF. However, as discussed in the previous section, these should not be in a SIC specification.

The certainty feature of SICs has not been properly represented. The term fuzzy integrity constraints has appeared in the fuzzy database literature (e.g., [Zvieli and Chen, 1986], [Raju and Majumdar, 1988]). However, its meaning may not have been fully explored. Raju and Majumdar [1988] classify fuzzy relations in relational databases into two categories. A type-1 fuzzy relation captures the impreciseness in the association among entities, e.g., the certainty of John liking the course AT is 80%.
A type-2 fuzzy relation produces further fuzzy semantics by allowing the domain of an attribute to be a set of fuzzy sets, e.g., “allowing salary of John to be in the range $40,000 to $50,000 and that of Mary to be a fuzzy set”. However, their fuzzy integrity constraints only attach some fuzzy modifiers, e.g., “very”, “more or less”, to an assertion by choosing some “fuzzy resemblance relation”, for example, “for any job, employees having approximately equal experience should have approximately equal salary” or “any items having approximately equal order-date should have more or less equal delivery date”. This kind of SIC does not capture the “impreciseness” of a SIC itself. For example, “any items having approximately equal order-date should have more or less equal delivery date” is a SIC that is true with certainty 75%. RESTRICT is another language that considers the certainties of SICs ([Oren, 1985]). However, it simply uses a special operator “?” to represent an uncertain assertion, e.g., “SALARY(PERSON) <? 150000”. This approach has only two levels of certainty: certain and uncertain.

In addition, all of the languages mentioned above represent SICs in the relational model, not in the E-R model. It is desirable to have a uniform language to represent all SICs in the E-R model and in the relational model. It would be easier for system analysts, database designers, and programmers to learn, comprehend and communicate with each other if there is a uniform language for both the conceptual modelling and logical modelling phases.
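The SQL-level SICs listed earlier in this section (NOT NULL, a value-range CHECK, a tuple-level CHECK, PRIMARY KEY and a referential constraint) can be tried out with a small sketch. It uses Python's sqlite3 module and the SQLite dialect rather than the ISO standard text reviewed here, so it is only an approximation; the table names, column names and the `try_insert` helper are hypothetical, and the hyphenated column names of the thesis example are written with underscores. In every case the only violation action available is an error, which matches the observation that the standard implies a single “reject” behaviour.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces referential constraints only when this pragma is on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE department (
    dept_no INTEGER PRIMARY KEY
);
CREATE TABLE employee (
    emp_no        INTEGER PRIMARY KEY,                          -- set constraint
    name          TEXT    NOT NULL,                             -- NOT NULL
    year_of_birth INTEGER NOT NULL CHECK (year_of_birth > 1900),-- value range
    year_joined   INTEGER NOT NULL,
    dept_no       INTEGER REFERENCES department (dept_no),      -- referential
    CHECK (year_of_birth < year_joined)                         -- tuple constraint
);
""")
conn.execute("INSERT INTO department VALUES (1)")

INSERT_EMP = "INSERT INTO employee VALUES (?, ?, ?, ?, ?)"


def try_insert(sql: str, params: tuple) -> bool:
    """Run one insertion; return False if any declared constraint rejects it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        # The DBMS offers no choice of violation action: the statement fails.
        return False
```

A row such as `(1, "Ann", 1960, 1985, 1)` is accepted, whereas a row with `year_of_birth` later than `year_joined`, an unknown `dept_no`, a duplicate `emp_no` or a missing `name` is rejected. None of the declarations says on which operation the constraint matters, how certain it is, or what else could be done on violation.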
2.3 SIC Verification

It is suggested ([Bracchi et al., 1979] and [Morgenstern et al., 1989]) that it is necessary to verify a set of SICs based on some criteria: completeness, correctness, consistency, nonredundancy, no unexpected implicit constraints, and insensitivity to order.

In the database literature, no substantial results have been published that assure that a set of SICs is complete and correct, which would need both application-domain and general world knowledge.

If the SIC representation is not precise enough to capture the restriction intention, the enforcement outcome of a set of SICs may depend upon the order in which particular constraints are executed. The remaining criteria (consistency, nonredundancy and no unexpected implicit constraints) are closely related, although their verification difficulty may be increasing. Redundancy occurs in a set of constraints if some constraints subsume other constraints. It would be an issue only if we are concerned with efficient enforcement of a set of constraints. To assure that a set of SICs has no unexpected implicit constraints would require a database designer to understand well the closure of the set, i.e., all those consequences derived from the specified SICs. It requires that the database designer make judgements as to whether an implicit constraint is unwanted. The fundamental and most important issue is the consistency problem. Constraints are consistent if there exists a database state or a state transition that is allowable with regard to all of the restrictions. Unfortunately, the consistency problem has been described (e.g., [Meersman, 1988]) as a difficult one. SICs are either tacitly assumed to be consistent, or only some of them are verified for consistency. For example, Kung [1984] [1985] presents a tableaux method to check the consistency of restricted first-order SICs.
Bry and Manthey [1986] describe two basic approaches to extending refutation methods into procedures to check consistency of closed and function-free first-order SICs. Brodie [1978] mainly relies on an actual database to verify the consistency of some static SICs. Lenzerini and Nobili [1987] have contributed much to the consistency problem for cardinality constraints. In summary, past researchers have only tried to verify very few types of SICs for consistency. Furthermore, the redundancy problem of SICs has never been directly addressed in the literature. Nor has the issue of unexpected implicit constraints been explored for arbitrary SICs.

The major reason for this is that it has been known (e.g., [Nilsson, 1980]) that even the “logical implication” of first order logic predicates is “undecidable” or only “semi-decidable” (footnote 13). In this research, in order to cover more types of SICs, expressions in the precondition and predicate of a SIC could include not only first order, but also higher order logic. The full verification of consistency and non-redundancy of all SICs relates to the fundamental issue of computer science and mathematics: constructing a Turing machine to decide whether an arbitrary language is acceptable. Such an issue would be NP-complete in nature (i.e., putatively having exponential time complexity) (footnote 14).

Based on current techniques, we cannot verify a set of arbitrary SICs for consistency and nonredundancy. However, it is possible to classify SICs into several different types and analyze at least some of them for consistency and non-redundancy.
Some researchers have tried this, but only for very few types of SICs, and have produced relatively rudimentary results (e.g., Troyer [1989] verified five types of SICs for consistency in the Binary Relationship model, and Furtado et al. [1988] verified two types of SICs for consistency in the E-R model).

Footnote 13: A property is semi-decidable if algorithms can be constructed that are guaranteed to report the respective property after finite (but indefinite) time if applied to a set that actually has this property, but possibly run forever otherwise.

Footnote 14: It is shown ([Aho et al., 1974], [Papadimitriou and Steiglitz, 1982]) that even the “satisfiability problem” of Boolean formulas, i.e., whether a Boolean formula can be made true by some truth assignment to its variables, is NP-complete.

2.4 SIC Reformulation and Decomposition

The reformulation and decomposition of a general SIC into several sub-SICs is related to previous work on efficiently checking SICs, an important research topic in the database literature. A naive approach is to perform the modification and then check whether the new database state satisfies all SICs. Such an approach is called full integrity constraint checking ([Ling, 1986]) or total integrity checking ([Nakano, 1983]).

Full SIC checking is time-consuming. Researchers have proposed a number of more efficient SIC checking techniques or algorithms. For example, Stonebraker [1975] considers monitoring immediate static SICs and elementary database updates by applying the “query modification” technique. Nicolas [1982] and Kobayashi [1984] each present a simplification algorithm that transforms static SICs into simplified forms. Hsu and Imielinski [1985] provide a simplification method for transactions, and also for SICs expressed in the prenex normal form of relational tuple calculus. Ling and Rajagopalan [1984] propose a method for eliminating avoidable checking of integrity constraints expressed in first order predicate calculus.
Ceri and Widom [1990] present a labelling algorithm to derive automatically the set of operations that may cause constraint violation for any given SIC expressed in a SQL-based language.

Basically, the above efficient algorithms are derived from the syntactic structure of a SIC specification. Some researchers (e.g., [Qian and Wiederhold, 1986], [Qian and Smith, 1987]) go further to propose transformational mechanisms that exploit knowledge about the application domain and even the physical organization of the database to reformulate SICs into semantically equivalent, but more efficient ones. Bernstein et al. [1980] have proposed improving SIC checking efficiency by maintaining some redundant data (e.g., minima and maxima of certain sets).

All the above work is based upon the assumption that the current database state satisfies all SICs that have been specified. This type of checking is called incremental integrity constraint checking ([Nakano, 1983], [Ling, 1986]).

The purpose of previous work on SIC decomposition is mainly to make SIC checking more efficient. Some approaches only attach to a SIC the set of operations that may cause constraint violation. Others rewrite the original SIC into a set of sub-SICs that is semantically equivalent to the original SIC. However, one should note that when a general SIC is decomposed into sub-SICs, it is possible that the sub-SICs might have different violation actions. Thus, these sub-SICs are also part of database specifications rather than just for implementation efficiency.

In addition, the previous algorithms are only suitable for few types of SICs in some restricted languages.

2.5 Automated Database Design Aids for Eliciting SICs

Yang and Goldstein [1989] survey twenty automated database design systems to investigate how SICs have been included. Many systems (e.g., 12S [Kawaguchi et al., 1986]; SA-ER [Carsnell and Navathe, 1987]; ACME [Kerstern et al., 1987]) do not attempt to identify SICs at all.
Some systems (e.g., SECSI [Bouzeghoub and Gardarin, 1984; Bouzeghoub et al., 1985; Bouzeghoub and Metais, 1986]; RIDL* [Troyer, 1989]; E2R [Kozaczynski and Lilien, 1988]; CHRIS [Furtado et al., 1988]; Gambit [Bragger et al., 1984]; EXIS [Yasdi and Ziarko, 1987]; TSER [Hsu et al., 1988]; PROEX [Obretenov et al., 1988]; Modeller [Tauzovich, 1989]; EDDS [Choobineh, 1985; Choobineh et al., 1988]; OICSI [Rolland and Proix, 1986; Cauvet et al., 1987; Proix and Rolland, 1988]; and Database Generation Tool [Maryanski et al., 1984; Maryanski and Hong, 1985]) do provide some mechanisms for eliciting and representing SICs. However, only a few types of SICs are identified. Most of them are the common SICs, e.g., incidence constraints, totality constraints, or those inherent to data abstractions. No system provides a model to guide the incorporation of SICs.

Some systems represent SICs as arcs in a semantic network (e.g., OICSI), in logic (e.g., Gambit), or as rules (e.g., CHRIS, EDDS). Some SICs are even represented as preconditions/post-conditions of transactions or events rather than as data SICs (e.g., E2R, EXIS). Some SIC features mentioned above are not explicitly represented. Gambit may be the one that represents them most explicitly. However, in Gambit, the responsibility for knowing the operation type on which a SIC must be checked rests with the database designer.

The discussions of SIC verification reported for these systems are either nonexistent or tend to be somewhat rudimentary.

Chapter 3
A Model for Representing Semantic Integrity Constraints

This chapter presents a language, called the SIC Representation model, to represent the features of a SIC that are mentioned in Section 2.1.9.

3.1 An Overview of the Representation Model

The SIC Representation model integrates and formalizes the ideas of Fernandez et al. [1981], Bertino and Apuzzo [1984], and Date [1983]. This model specifies precisely the features of a SIC.
The original model proposed by Fernandez et al. and by Bertino and Apuzzo has been extended to include a measure of certainty, and some of their original concepts have been modified. The cursor concept in UDL proposed by Date has also been incorporated into the model.

The model represents a constraint in terms of six components: Object (O), Operation Type (T), Precondition (C), Predicate (P), Certainty Factor (F) and Violation Action (A). In addition, each constraint is given a descriptive name. The justification for including these six components is provided in Chapter 4. This chapter only gives the description of the model.

Using this model, a SIC is represented as:

    SIC-Name
    CERTAINTY F (Certainty Factor)
    FOR O (Object)
    ON T (Operation Type)
    IF C (Precondition)
    ASSERT P (Predicate)
    ELSE A (Violation Action)

In terms of the usual production rule syntax, the above whole statement (except for SIC-Name) is interpreted as:

    with certainty F
    IF (O, T, C) THEN (IF NOT P THEN A)

SIC-Name is used as an identifier that conveys some meaning of a SIC. It is not essential to the Representation model.

The following example, named project_employee_minimum_salary for convenience, will be used for illustration. Suppose that there is an Employee entity and a Work_for relationship in the E-R model, and the corresponding Employee and Work_for relations in the relational model (footnote 15).

Footnote 15: That is, a separate relation, Work_for, is assumed in order to simplify the comparison of our model to other work in the literature, since that work deals only with the relational model.

A SIC might state “if an employee works for any project, his (her) salary should be greater than $10,000”. This SIC can be decomposed into several sub-SICs represented in the Representation model. Each sub-SIC is operation-dependent and only relevant to a single object. In addition to the related sub-SICs on the update of Work_for’s primary key, two sub-SICs are:
- one for Employee.Salary on update;
- one for Work_for on insertion.

The first one may be represented as the following.

    Employee.Salary-U-RshipDepEntVal-(Work_for)
    CERTAINTY certain
    FOR Employee.Salary
    ON update
    IF ∃ Work_for, rship_occ_part(Work_for, “Employee”, Employee)
    ASSERT Employee.Salary > 10000
    ELSE reject

Interpretation: The first line is the SIC name. It indicates that this is a SIC for Employee.Salary on update and that it is a “RshipDepEntVal” type because it asserts that the existence of a relationship (Work_for) depends on the value of an entity attribute (Employee.Salary). The rship_occ_part is an assertion that stands for “relationship occurrence participant”. In this example, it is used to specify the Work_for occurrence in which the currently checked Employee occurrence participates with the entity type “Employee”. This SIC states that with 100% certainty, when an Employee.Salary occurrence is to be updated, if the Employee participates in at least one Work_for, his or her Salary must be greater than $10,000. Otherwise, the update operation is rejected. Note that the database designer might choose “propagate(delete(Work_for))” as the violation action. In that case, the propagation action might imply that if the organization could not afford the minimum salary for an employee, it could not require that the employee be associated with any project (so the current Work_for occurrence must also be deleted).

The restriction intention of a SIC represented by this model is expressed separately as the predicate component (P). Its other components precisely specify the SIC features listed in Section 2.1.9. These are:

1. certain or uncertain: indicated by the component F, the certainty factor;

2. applied data: shown in the component O, the object;

3. operation-dependent: a SIC represented in this model is always operation-dependent; the operation is specified in the component T, the operation type;

4. unconditional or conditional: the conditions are listed in the component C, the precondition;

5. strong, soft or self-correcting: indicated by the component A, the violation action.

A SIC represented in this model is always dynamic in that it indicates a valid database state transition in which the object would be manipulated by the specified operation type. A static constraint is represented by rewriting it into one or more dynamic constraints for the related object(s) on the operation(s) that could cause unallowed database state(s). The reasons for doing this are as follows:

• Because some constraints are inherently dynamic, this allows uniform representations for all kinds of constraints.

• Enforcement efficiency can be greatly increased by knowing which operations can cause constraint violations.

• Violation actions may be different depending on the operation types causing constraint violations.

The usual approaches to representing a SIC mix up several components in one statement and do not explicitly describe the above features. Taking a SIC represented in a traditional language, one could only assume that it is 100% certain; related to all objects mentioned in that SIC; applicable to all primitive database access operations; unconditional in all contexts; and so strong that it would cause errors if violated. For instance, according to the language used by Ling and Rajagopalan [1984], the above example would be expressed in the prenex normal form of the first order predicates as follows.

    E ranges over Employee
    J ranges over Work_for
    ∀E ∀J (E.EmpNo ≠ J.EmpNo OR E.Salary > 10000)

Evaluation: This statement is neither precise nor powerful because of the following:

(1) It is not clear how certain this SIC would be. So, it can only be assumed to be 100% certain.

(2) Since E and J appear there and no operation type is explicitly specified, it would seem that the SIC is relevant to any operation on an Employee or a Work_for. However, in fact, it could never be violated by deletion or insertion of an Employee, or deletion of a Work_for.

(3) It does not clearly distinguish between the condition specifying those employees to whom the constraint would apply (those working for some project(s), i.e., the condition “E.EmpNo = J.EmpNo”) and the assertion on the salary of those employees (i.e., the expression “E.Salary > 10000”).

(4) Since no violation action is specified, it can only be assumed to be “reject”.

The representation in the SIC Representation model is also more precise than the original models presented by Fernandez et al. [1981], and Bertino and Apuzzo [1984]. According to the model presented by Bertino and Apuzzo [1984], the above example would be expressed by a single SIC in which the object component may include “Employee” and “Work_for” and the operation component may include “update” and “insert”. It implicitly means that the SIC is asserted on the insertion of an Employee or Work_for and on the update of any attribute of an Employee or Work_for.

In addition, the representations in terms of the Representation model have some advantages that will be described in detail in the next chapter. In short, the certainty factor introduces fuzzy semantics, facilitates knowledge-based query processing and provides deductive capabilities. The identified object and operation type are useful for efficient enforcement and, together with the violation action, they provide different perspectives for specifying and enforcing a general SIC. The separation of precondition from predicate allows natural and precise representation of a SIC.
The explicit representations of the object, operation type and precondition components are useful for applying SIC abstractions (which are introduced in Section 4.2).

These six components and the naming convention of the SIC Representation model are described in the following sections. Detailed BNF (Backus-Naur Form) descriptions of the model are given in Appendix A. The rest of this section describes briefly the notation and the allowed expressions in this language.

Expressions. First order logic and higher order logic expressions, arithmetic expressions and date expressions are allowed in the precondition and predicate components. They may consist of quantifiers ∀ (for all) and ∃ (there exists), and logical connectives “∧” (conjunction), “∨” (disjunction) and “¬” (negation), but not the “→” (implication) connective since it has been separated as the precondition. They may also contain aggregate functions, e.g., avg, count, max, min, sum, or other user-defined aggregation functions. In addition, a special pair of functions, old and new, are used. The new/old function in a SIC references the new/old value of the referenced object after/before checking the SIC. An implemented integrity maintenance subsystem of a DBMS must assure the correct functioning of the new and old functions (footnote 16). One set expression is also allowed: set{E1 | some restriction(s)}. This is read as “for all E1 satisfying some restriction(s)” (footnote 17). The special predicates used with the SIC Representation model are summarized in Appendix B (footnote 18).

Subscript Notation. This research uses a subscript to simulate the “cursor” in Date’s UDL to emphasize that a SIC is enforced on an occurrence although it is specified intensionally for an entity/relationship/relation type.
A cursor in UDL is an object whose value is (normally) the address of some specific record in the database ([Date, 1983]). Since a SIC is enforced on an occurrence, if other occurrences of the same entity, relationship or relation type are referred to in the precondition or predicate component, different subscripts will be used to distinguish them. The one to be checked is referenced by attaching a default subscript 0, e.g., E0. Variables with subscripts other than 0 (e.g., E1) represent any occurrence of the same object (e.g., E) type. In terms of programming languages, the one with subscript 0 refers to the particular inserted/deleted/updated “value” to be checked; those with subscripts other than 0 are “variables” of the same object type. If there is only one occurrence (i.e., the one to be checked) referred to in the precondition and predicate components, the default subscript 0 is omitted. In any case, we do not use a subscript in the object component because the variable in that component is used to indicate that the SIC is applicable to any occurrence of the specified type.

Footnote 16: If the SIC is checked immediately, it references the new/old value of the referenced object after/before the operation. However, if the SIC is deferred to be checked at the end of the transaction, it references the new/old value of the referenced object after/before the transaction.

Footnote 17: The keyword set can be omitted. This notation is adopted from an “integrity constraint definition language” [Gardarin and Melkanoff, 1979].

Footnote 18: In addition, this research follows the Prolog naming convention that a variable is written as a word beginning with a capital letter and an atom is written as a word beginning with a lower case letter. The only exceptions are keywords (CERTAINTY, FOR, ON, IF, ASSERT and ELSE) and the SIC name in the first line, which are character strings in nature, not variables.

An example from Date [1983] will help to clarify the above subscript idea.
Suppose that there is a relationship Supply in the context of “Supplier Supply Part”, and a SIC requiring that “any quantity value of a supply must not be more than 5 percent greater than the average of all such values”. One of the required SICs is represented by the Representation model as below, in which Supply1.Qty stands for any Supply.Qty occurrence. The assertion in its predicate component requires that each of the other Supply.Qty’s must satisfy the average property when an occurrence is to be deleted.

    Supply-D-AggFcn-(Supply.Qty)
    CERTAINTY certain
    FOR Supply
    ON deletion
    ASSERT ¬(Supply1.Qty > 1.05 × avg({Supply1.Qty | Supply1 ≠ Supply0}))
    ELSE reject

Legend: Supply0 indicates the value of the Supply occurrence currently checked.

3.2 SIC Name

The SIC name is not counted as a component in the Representation model since it is not essential. A database designer may freely use other naming conventions, such as the simple one I1, I2, ... used in the literature ([Bertino and Apuzzo, 1984] and [Date, 1983]), or some application dependent conventions. In this research a concatenated character string is used as a SIC name for conveying some of the meaning of a SIC and also for serving as its unique identifier. This convention allows the SIC elicitation subsystem of an automated database design system (introduced in Chapter 7) to generate a SIC name automatically by incorporating an abbreviated application domain-independent SIC type and some application information (object type, operation type and related object type set). A SIC name contains five parts. The general format is:

    ObjectType-OperationType-SICType-(RelatedObjectTypeSet)-SequenceNo

For example, from “Employee.Salary-U-RshipDepEntVal-(Work_for)” we know that this SIC should be checked on the update of Employee.Salary because the existence of a Work_for relationship occurrence depends on its value.

1. The Object Type (e.g., Employee.Salary) is the specific type name of an attribute, entity, relationship, or relation for which this SIC is asserted. It corresponds to the component O in the SIC.

2. The Operation Type is either U, I, or D, representing update, insertion, or deletion, respectively. Similarly, it corresponds to the component T in the SIC.

3. The SIC Type is the abbreviation of a SIC type conveying some application domain-independent meaning; e.g., “Totality” (if an entity occurrence exists, it must participate in some minimum number of relationship occurrences of the specified type), “RshipDepEntVal” (the existence of a relationship depends on an attribute value of an entity type). By applying the E-R-SIC model (introduced in Chapter 5), we can classify SICs into a number of domain-independent SIC types.

4. The Related Object Type Set includes those object types that appear in the precondition and predicate components of a SIC and can supplement the meaning of the SIC type. A SIC name may not have this part if there are no such objects. For example, there is no other object type related to an absolute cardinality constraint for an entity type, which restricts the maximum number of occurrences of the entity type that can exist in a database. However, in the above project_employee_minimum_salary example, it may be useful to know which relationship type (i.e., Work_for) depends on the attribute (i.e., Employee.Salary). For domain constraints on the insertion of an entity, although all the attributes of the entity are referenced in the SIC, they need not be included in its SIC name since this is implied by the SIC type “Domain”.

5. The Sequence No is a positive integer. In a complicated application, it is possible that the above parts would not be sufficient to distinguish one SIC from others. In that case, as a last resort, a sequence number starting with 1 will be added.
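The five-part naming convention of Section 3.2 can be sketched as a small helper. This is only an illustration under stated assumptions: the function name `sic_name`, its parameter names, and the use of a comma to separate several related object types are hypothetical and are not taken from the thesis; the U/I/D codes, the parenthesized related set and the optional trailing sequence number follow the description above.

```python
# Operation Type codes as described in Section 3.2.
OPERATION_CODES = {"update": "U", "insertion": "I", "deletion": "D"}


def sic_name(object_type, operation, sic_type, related=(), sequence_no=None):
    """Build 'ObjectType-OperationType-SICType-(RelatedObjectTypeSet)-SequenceNo'.

    The related-object part and the sequence number are optional,
    matching the text: a name may omit them when they are not needed.
    """
    parts = [object_type, OPERATION_CODES[operation], sic_type]
    if related:
        # Assumption: several related object types are joined with commas.
        parts.append("(" + ",".join(related) + ")")
    if sequence_no is not None:
        if sequence_no < 1:
            raise ValueError("sequence numbers start with 1")
        parts.append(str(sequence_no))
    return "-".join(parts)
```

With these assumptions, `sic_name("Employee.Salary", "update", "RshipDepEntVal", ["Work_for"])` yields `Employee.Salary-U-RshipDepEntVal-(Work_for)`, and `sic_name("Supply", "deletion", "AggFcn", ["Supply.Qty"])` yields `Supply-D-AggFcn-(Supply.Qty)`, the two names used in this chapter.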
3.3 Certainty Factor (F)

A certainty factor, F, which does not appear in Fernandez et al. [1981] and Bertino and Apuzzo [1984], is introduced in the Representation model. A certainty factor is closely related to the predicate (P) and violation action (A). It may be attached to SICs in the following alternative ways:

1. Ratio Scales: A ratio scale to measure the certainty of a SIC is needed if we would use the certainty factor to provide deductive capabilities under uncertainty. The certainty factor is defined to have a value between 0% and 100%. Any SIC with a certainty factor less than 100% can only have “warning”, “conditionally_reject”, or “conditionally_propagate” as its violation action. The value of the certainty factor is based on the experience of the database designer. If the value is too low, it implies that the SIC is very unlikely to hold in the database and is less useful in providing semantic information.

2. Ordinal Scales: If we are only interested in traditional databases and there are a number of uncertain SICs, an ordinal scale to measure the certainty of a SIC is needed. This ordinal scale would be used to rank the certainty of SICs in the same “family” (in terms of the same restriction intention, but different levels of stringency) when verifying and enforcing them. The certainty factor might be defined to have a value between 1 and 10 if there are at most 9 uncertain SICs for the same “family” of SICs. Any SIC with a certainty factor less than 10 is an uncertain SIC.

3. Two levels: It is possible that a database designer would like to specify at most one uncertain SIC for a restriction intention because an organization is mainly concerned with SICs with 100% certainty. In this case, two discrete levels of certainty are enough.
In this case, “uncertain” (i.e., certainty < 100%) would have a “warning”, “conditionally_reject”, or “conditionally_propagate” violation action, whereas “certain” (i.e., certainty = 100%) would have a corresponding “reject” or “propagate” violation action.

4. Fuzzy terms: It is also possible to attach a fuzzy term, e.g., “usually”, “sometimes”, as the certainty factor if that term has been properly specified to be equivalent to a specific certainty ratio number (e.g., “usually” may be equivalent to 80%).

3.4 Object (O)

O represents the data object to which the SIC applies. In the E-R model, it corresponds to any occurrence of the specified attribute, entity, or relationship type. In the relational model, it corresponds to any occurrence of an attribute or a relation type, i.e., a column, or tuple. The notation E.A or R.A is used to refer to the attribute A of entity E or relationship (relation in the relational model) R. Note that an attribute is treated as an object in this model since it would make the SIC clearer and it is possible that a constraint is relevant to an update of an attribute, but not the insertion/deletion of its related entity, relationship, or relation. For example, in the above project_employee_minimum_salary example, the constraint is not relevant to the insertion or deletion of an Employee although it is relevant to the update of an Employee.Salary.

3.5 Operation Type (T)

The operation type, T, (also called the access type [Bertino and Apuzzo, 1984]) specifies the types of database manipulation operations to which the SIC is applicable.
Only three operation types need to be considered: insertion, deletion, or update. Note that a SIC is never relevant to a retrieval (query) operation.

The data definition operations, i.e., “creation” or “destruction” of an entity, relationship, or relation type, are also excluded¹⁹. A database designer should design a database to include those necessary object types according to the information requirements. However, whether an object type should be in a database schema is not a SIC. Destroying an object type is concerned with the database reorganization. In the case of database reorganization, destroying an object type may have different implications in different situations. For example, destroying a type may occur because this type ceases to exist in the real world or just because the organization will no longer keep the super-type (e.g. Person) information, but is still interested in its subtypes (e.g. Employee). The database designer needs to be involved to discover the changed information requirements. The different situations cannot be foreseen and automated in the database design phase.

¹⁹ So the language represented by this model is not a relationally complete query language.

3.6 Precondition (C)

The precondition (C) specifies the implicit or explicit presuppositions of a SIC. If it is not satisfied, it would cause the predicate of the SIC to be either undefined or irrelevant. For example, in the above project_employee_minimum_salary example, the SIC for Employee.Salary on update is irrelevant to any employee who does not work for any project. The precondition also identifies the object (O) in the context of this SIC. In the above example, the SIC for Work_for on insertion would have the precondition as “∃ Employee, rship_occ_part(Work_for, “Employee”, Employee)” to indicate the Employee occurrence participating in the specific Work_for occurrence.
If a SIC has no precondition, the keyword IF may be omitted.

3.7 Predicate (P)

The predicate (P) is the assertion for the database state transition when the object (O) is to be manipulated by the operation (T). If it is true, the operation (T) on the object (O) is allowed to be performed. For an attribute other than primary key²⁰, the predicate should involve the attribute itself. For an entity, relationship, or relation, the predicate involves its attributes, aggregate attributes (e.g., average value), or related other entities and/or relationships (relations in the relational model). Since a SIC places logical restriction on data, its predicate component should contain invariant assertions on the related objects rather than procedural statements of programs.

The precondition and predicate are both expressions that control whether the violation action will be invoked. The precondition specifies the general state of the database that makes the constraint relevant. The predicate specifies something that must be true about the object after the operation. The justification of having both components is described in detail in Chapter 4.

²⁰ We have this exception because in the relational model and traditional E-R model, primary keys play the role of “surrogates”. The update of a primary key of an entity/relationship/relation may imply the deletion of an old entity/relationship/relation followed by the insertion of a new one.

3.8 Violation Action (A)

The violation action (A) specifies how the system is to behave if the predicate P is false when the SIC is checked. In order to have precise meaning for a single SIC, it is desirable to have a single violation action (or a deterministic series of well-defined actions) rather than a complex procedure with many choices. The violation action may be specified as follows.

1. “warning”: allow the access operation, but issue a warning.

2. “reject”: reject the access operation, signalling an error.

3. “propagate”: allow the access operation, but perform a related corrective action to restore the database to a correct state.

4. “conditionally_reject” or “conditionally_propagate”: request confirmation from the user to ignore the SIC; the processing of the access operation is suspended until the user indicates either to ignore the SIC or to take the violation action (reject the access operation, or allow the operation but propagate to perform a corrective action).

Discussion on Propagation. Traditionally, the violation action of a SIC is just a rejection. A warning action in effect turns off the enforcement of the SIC except for generating a message. A propagation action changes other objects in the database. Date [1983] indicates that integrity rules may be considered important special cases of triggered procedures²¹. In discussing semantic integrity constraints, Fernandez et al. [1981] mention the following problems raised by the trigger mechanism.

(1) Integrity or security violation within auxiliary procedures could result in a never-ending sequence of invoking procedures.

(2) The proper access rights of the invoked procedures are an issue: Should these be the rights of the invoker of the triggering maintenance operation or the rights of the DBA who specified the auxiliary procedure?

(3) Triggering of maintenance procedures results in a somewhat confusing division of responsibility between the user (or the programmer) and the DBA. For example, who should update the count of employees on inserting a new employee? The system or the programmer?

²¹ Other application areas for triggered procedures include virtual fields, security, etc. [Date, 1983]

The problem of security or access rights is beyond the scope of this research²². Besides, as stated before, the intention of this research is not to propose an approach to replacing the transaction or application programming.
If the intention of the violation action is just to maintain a correct database state, there is no reason that we would need a complex violation action. Therefore, although in principle a “violation action” could be very complex, in this research it is generally a single action or a deterministic series of well-defined actions. The responsibility of programmers to write proper application programs to update data is not changed²³.

²² However, this unsolved security problem raised by Fernandez et al. [1981] might also be one argument against complex violation actions.

²³ However, now a DBMS becomes more powerful to detect programming errors of programmers and takes violation actions. Programmers need not check SICs because the DBMS will do that.

If the violation action is to propagate to insert some other objects, establishing the attribute values of those objects might become a problem. It is not possible for a database designer to know all these values during database design. Those values that cannot be known by an automated database design system will be assumed to be “null” (i.e., “unknown”). This may violate other SICs if some unknown attribute values (e.g., part of the key of a relationship) are not allowed to be “null”. There are two approaches to dealing with this problem. In the first approach, the violation action “reject” rather than “propagate(insert(O1))” must be specified. In the second approach, the propagating insertion is allowed. Compared to the first approach, the second approach is more powerful. Even if the propagating insertion is finally “rejected” due to other SICs, it is easier for the user to understand why the original insertion operation is rejected.
It also leaves open the possibility of designing an enhanced integrity maintenance subsystem of DBMS that will prompt the user to supply the unknown attribute values or obtain them from some other sources ([Casanova and Tucherman, 1988]).

Two-valued Logic. Note that the violation action is taken only when the predicate, not the precondition, is false. What would happen if a database allows null values? A null value can have at least two interpretations: “unknown” and “not applicable”. In this research, “null” is only allowed to represent “unknown” because the “not applicable null” should not appear in a well-defined, properly-normalized database with clear semantics²⁴. In principle, some warning messages can be issued²⁵ if the predicate is evaluated to be unknown. However, since the “unknown” values will eventually become “known”, they will be checked at that time. If they violate SICs, appropriate actions will be taken then. Therefore, in this research, for simplification, it is assumed that if the predicate is evaluated to be unknown, no violation action will be taken immediately. By taking such a position, the verification of SIC consistency can be simplified from 3-valued (true, false, unknown) to 2-valued (true, false) logic.

²⁴ The existence of “not applicable null” in some attributes of an entity, relationship, or relation indicates that these attributes do not apply to all occurrences. It implies that some subtypes should be specified in order to clarify the semantics. In the literature (e.g., [Lee, 1988]) there are two other interpretations of “null”. Lee states that, for example, with regard to John's spouse, the null value may mean: (1) literal interpretation “null” (e.g., the spouse name is “null”); (2) sense of “none” (e.g., John is a person, but he has no spouse); (3) “not applicable” because spouse is not an attribute of the object (e.g., John is the name of a building, not a person); (4) “unknown”. Note that the first interpretation is the result of bad implementation of “null”. “Null” should be represented by an unambiguous symbol in a database. The second and third interpretations are also results of a bad database design: the semantics are not clear because some attributes are not defined for some occurrences of the specific “type”. In the above example, there should be two types, Person and Building, to avoid the third interpretation. The subtype Married_Person should be defined to avoid the second interpretation. In this research, the “not applicable null” interpretation includes both the above second and third interpretations.

²⁵ These are termed “weak violations” by Ho [1982].

Chapter 4

The Application of the SIC Representation Model

This chapter describes how we can apply the SIC Representation model for database design and management. In Section 4.1 it is claimed that by applying this model we can represent precisely the features of any SIC. In Section 4.2 SIC abstractions are introduced to facilitate SIC specification and management. The usefulness of the Representation model in SIC abstraction is also discussed. In Section 4.3 the application to database management is briefly described.

4.1 Completeness of a SIC Specification for Database Design

Su & Raschid [1985] and Shepherd & Kerschberg [1986] suggest that “constraints” must explicitly specify both declarative semantics as well as process oriented or operational semantics. Declarative semantics correspond to logical formulas describing relationships between objects in a database. The information needed to check that these relationships are true or to maintain these relationships is the operational semantics. It is also suggested that the “constraint” formalisms must provide information along the lines of WHAT, WHEN, WHERE and HOW if they are to be complete.
Although our SICs are only a proper subset of their “constraints” (refer to Section 1.3), the same formalism with some modifications can be applied to examine the completeness of a SIC specification.

1. Operational and Declarative Semantics. In the Representation model, the predicate (P) and certainty factor (F) components specify the invariant declarative semantics among data objects. The operation type (T) is relevant to the operational semantics to check these invariant facts. The violation action (A) specifies the operational semantics that are needed to maintain these assertions. In addition, the object (O) and precondition (C) components supplement both declarative and operational semantics to assert and check these invariant assertions. The certainty factor (F) also provides some operational semantics since the most certain SIC should be enforced first.

2. WHAT, WHEN, WHERE and HOW. The Representation model specifies:

• WHAT the SIC requires: its invariant assertions (predicates, P) and certainty (certainty factor, F).

• WHEN the SIC is to be checked: these assertions should be checked when some primitive access operation (T) (update, deletion, insertion) is performed on the data.

• WHERE the SIC occurs: these assertions are applicable to some data object (O) in some contexts (preconditions, C).

• HOW the system should behave in the case of these assertions being violated (violation action, A).

Therefore, the Representation model provides declarative and operational semantics of a SIC specification. It also provides a SIC specification along the lines of WHAT, WHEN, WHERE, and HOW. The rest of this section further explores whether we would need exactly these six components in order to represent completely the declarative and operational semantics of a SIC specification and express precisely its features: can we have fewer or do we need more?

4.1.1 Not Fewer Components

The strengths of the Representation model are that it is precise enough to express the features of a SIC; uniform for all kinds of SICs; and powerful enough to offer a corrective action rather than simply reject the operation violating the SIC. As mentioned in the previous chapter, traditional languages do not explicitly describe all features included in the Representation model. A SIC represented in a traditional language could only be assumed to be 100% certain; related to all objects mentioned in that SIC; applicable to all primitive database access operations; unconditional in all contexts; and so strong that its violation would cause errors. It is beyond question that the predicate (P) component is absolutely needed. Other components need to be further discussed.

Object (O). Depending on the language used to represent a SIC, a number of objects may be mentioned. However, the declarative and operational semantics of the constraint may not be relevant to some of these objects. In the project_employee_minimum_salary example of Chapter 3, the constraint is not relevant to the Project that may be mentioned if it is declared in a natural language. Neither is it relevant to the Employee if it is represented in a pure first order logic such as the one used by Ling and Rajagopalan [1984]. Consider another example. Suppose we have a constraint: “if an employee works for any project, his (her) salary cannot decrease”. This constraint is only relevant to the Employee.Salary, not Employee, Project or Work_for. In order to have precise declarative and operational semantics for a SIC, it is necessary to specify explicitly for which object this SIC is asserted. Since in this model a SIC is applied to a single operation, which can only manipulate a single object, it is sufficient to assert only for this object.
Operation Type (T). If a constraint is relevant to an entity, relationship, or relation, but not an attribute, and is operation-dependent, the explicit specification of its operation type is needed. If the constraint is operation-independent, there are still two advantages (in addition to improving enforcement efficiency) to specifying the operation type:

• First, explicitly expressing which operation could cause constraint violation provides the valuable operational semantics and clarifies the restriction intention (declarative semantics) of the SIC. It would be desirable to have a conceptual picture of the consequence of constraint enforcement even in the database design phase. It would also be helpful for later designing transaction specifications. For example, given the constraint “the total number of employees cannot exceed 200”, it is helpful to know that its restriction is only relevant to the insertion of an employee although the original constraint is operation-independent.

• Second, it is impossible to specify the violation action without referring to a specific operation type. When a general SIC is rewritten into several sub-SICs for separate objects in terms of the Representation model, their objects, operations, and possibly different violation actions provide useful declarative and operational semantics: necessary conditions for operations on the objects in the object component and sufficient conditions for propagated operations on related objects.

However, for a SIC asserted on an attribute, the explicit operation type specification is redundant since update is the only possible manipulation operation type. In this case, we explicitly specify it only for clear and uniform representation.

Precondition (C). One may argue that it is unnecessary to separate precondition from predicate in terms of pure logical expressiveness. However, without this separation, the declarative and operational semantics of the SIC are ambiguous. The precondition specifies the current database state that makes the constraint relevant; the predicate asserts the allowable state after the operation on the object. We have seen the project_employee_minimum_salary example in Chapter 3. Some constraints are more naturally expressed in the IF ... THEN ... format. By explicitly separating the predicate from the precondition, the restriction intention of the predicate becomes clearer and is closely related to the violation action that now can be simplified to be a deterministic series of well-defined actions (usually a single one).

Even if a constraint is not in the IF ... THEN ... format when we first write it in English, it may have some implicit “presuppositions”. For example, in a database containing Department, Employee, and Manager, a constraint might be stated in English as SIC-0²⁶: “each employee must earn less than his (her) department manager”. In that database, the above statement has two implicit presuppositions: (1) “the employee belongs to some department”; and (2) “the department has a manager”. If these presuppositions are only true for some employees and departments, SIC-0 is conditional and the presuppositions should be explicitly represented in its precondition component. If these are facts that are true for all employees and departments, we should have two other SICs, each of them asserting one of the above facts in its predicate component. In addition, these presuppositions should also be in the precondition component of SIC-0 so that the connection between a specific Employee and Manager is clear²⁷. Each SIC would then have a precise meaning.

²⁶ In this chapter, SIC-i, i=0,1,2,... are used as SIC names for simplicity.

²⁷ That is, by these preconditions, we can find the manager occurrence related to an employee or find the employee occurrences related to a manager. However, these are not the restriction intention of the above SIC.
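The separation argued for above can be illustrated with SIC-0. The sketch below is written in Python for illustration only and is not the thesis's SIC language: the Employee class, the evaluate_sic0 function, and the representation of managers as a mapping from department name to manager salary are all assumptions. The point it shows is that a false precondition makes the SIC irrelevant, while only a false predicate counts as a violation.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Employee:
    name: str
    salary: float
    department: Optional[str]   # None: the employee belongs to no department


def evaluate_sic0(employee: Employee, manager_salary: Dict[str, float]) -> str:
    """Evaluate "each employee must earn less than his (her) department manager".

    Returns "irrelevant" when a presupposition fails, otherwise
    "satisfied" or "violated" according to the predicate.
    """
    # Precondition (C), presupposition 1: the employee belongs to some department.
    if employee.department is None:
        return "irrelevant"
    # Precondition (C), presupposition 2: the department has a manager.
    if employee.department not in manager_salary:
        return "irrelevant"
    # Predicate (P): only now is the comparison well defined.
    if employee.salary < manager_salary[employee.department]:
        return "satisfied"
    return "violated"


managers = {"R&D": 90000.0}   # made-up data: department -> manager salary
print(evaluate_sic0(Employee("A", 50000.0, None), managers))
print(evaluate_sic0(Employee("B", 50000.0, "Sales"), managers))
print(evaluate_sic0(Employee("C", 50000.0, "R&D"), managers))
print(evaluate_sic0(Employee("D", 95000.0, "R&D"), managers))
```

A violation action would be attached only to the "violated" outcome; the "irrelevant" outcome corresponds to the case where the predicate would otherwise be undefined.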
The precise context specifications in the precondition are also needed to provide operational semantics when enforcing some related SICs. Suppose we have two SICs for Employee.Salary on update.

SIC-1
CERTAINTY certain
ASSERT Employee.Salary ≤ 100,000
ELSE propagate(update(Department.Budget))

SIC-2
CERTAINTY certain
ASSERT Employee.Salary ≤ 150,000
ELSE reject

When updating the salary of an employee to be $180,000, we would have an enforcement problem: which violation action should be taken? The problem is caused by the imprecise representation of SIC-1. The original restriction intention of these two SICs is: “the salary of an employee should be less than or equal to $100,000, otherwise, if it is not too high, we can increase the department budget; but if it is higher than $150,000, we can only reject it”. The precondition of SIC-1 should include an expression, “Employee.Salary ≤ 150,000”²⁸.

²⁸ Alternatively, we can use an expression, “checkpreSIC(SIC-2, Employee.Salary)”, that checks the pre-SIC, SIC-2, for Employee.Salary and returns true if the pre-SIC is satisfied. One SIC's (say SIC-1) precondition may be the precondition and predicate of other SICs (say, SIC-2, SIC-3). SIC-2 and SIC-3 may be called the pre-SICs of SIC-1. If a data object violates the pre-SICs, SIC-1 becomes irrelevant to further testing.

Violation Action (A). Without the violation action component, all SICs could only be assumed to be strong. The SIC specifications would be less powerful since they could not include self-correcting or soft SICs.

Certainty Factor (F). If a database designer only considers certain SICs, he or she would not need the certainty factor component. However, the purpose of introducing certainty factors in database specifications is to represent more data semantics, including fuzzy semantics.
If the certainty factor component was not included in the Representation model, all SICs would be assumed to be 100% certain, with no exceptions. The violation action of a SIC would be either a rejection or a corrective action. Systems with only such SICs contain less declarative semantics since the fuzzy semantics are lost. They have also been criticized as too rigid to “deal with unusual, atypical, or unexpected occurrences”, and “it is not possible to allow violations of integrity constraints to persist without turning off completely the checking of those constraints” ([Borgida, 1985]). Having considered such possibilities and in order to avoid such criticisms, a database designer might specify too few SICs or too “generous” ones. The inclusion of certainty factors in the Representation model provides a mechanism for accepting data violating the assertions of some uncertain constraints, that is, the introduction of “controlled inconsistency”. For example, a database designer may specify a SIC for employee's salary as “Employee.Salary < $1,000,000” even though few employees in an organization have salary greater than $100,000. Such a SIC would not catch common data entry errors, e.g., misplaced or omitted decimal points. Employing a certainty factor, he (she) might specify two or more SICs for the same object on the same operation. For example:

SIC-1: with 100% certainty, Employee.Salary < $1,000,000
SIC-2: with 90% certainty, Employee.Salary < $100,000
SIC-3: with 80% certainty, Employee.Salary < $50,000

SIC-2 and SIC-3 are fuzzy (uncertain) semantic integrity constraints. If they are violated, different warning messages might be issued.

One may wonder whether, since the certainty factor component relates closely to the violation action, we might only need the latter.
However, we may lose some semantics if we only have the violation action component:

• Suppose that we only have at most one uncertain SIC for each restriction intention. It might be acceptable to keep only the violation action component if we can assure that the violation actions would correctly be specified and be consistent with their “implicit” certainty factors.

• Suppose that we have a number of uncertain SICs. In the above example, SIC-2 and SIC-3 should have measures of certainty at least on an ordinal scale. Otherwise, we cannot know whether these SICs are inconsistent²⁹. In addition, when enforcing a set of uncertain SICs, some ordering is needed. The most certain SIC in the same “family” (in terms of the same restriction intention, but different levels of stringency) should be checked first since its violation is more likely to cause the database to be inconsistent; the second most certain one is then checked, etc. If an occurrence violates the more certain SIC, the remaining SICs with lower certainty in the same “family” need not be checked. Without the certainty factor, we would lose some declarative and operational semantics.

²⁹ If SIC-3 were with certainty 95%, SIC-2 and SIC-3 would become inconsistent. Note that SICs represented in this model are consistent if there exists a database state transition that is allowable with regard to all of the restrictions, i.e., considering all their components except for violation actions.

Conclusion. Based on the above arguments, we can conclude the following:

If we wish to have uniform and precise representations for all kinds of SICs, it is necessary to have all six components of the Representation model.

4.1.2 No More Components

Is it possible that more components are needed to represent declarative and operational semantics of a SIC or express its features?
In the following, some possible proposals are discussed.

Enforcement Schedule. Shepherd and Kerschberg [1986] suggest that a “constraint” formalism should include when a constraint is to be checked: after every transaction or at audit time. Note that their discussion corresponds to the classification based on the enforcement schedule described in Section 2.1.7. As discussed in that section, the enforcement schedule is a transaction-driven specification. It is impossible to specify the enforcement schedule of a SIC as another component at the level of the database schema.

Permanence. As also discussed in Section 2.1.5, the permanence of a SIC is not easily decided in advance since the organizational environment is changing. The permanence of a SIC is closely related to SIC maintenance rather than database design or SIC enforcement. If a SIC is still in a database schema, it should be enforced anyway. Its permanence information adds no more declarative or operational semantics. Therefore, this “permanence” information is not included in the Representation model.

Object Type, Set, or Occurrence. The object component in the Representation model means that the SIC is applicable to any occurrence of the specified object type. Occurrences are fundamental things in the database, and insertion, deletion, and update are primitive operations on them. The integrity problems caused by data definition operations (destruction or creation) on data object types are not the focus of this research. Any SIC that involves data manipulation operations on higher levels of objects, e.g., types, or sets, is finally enforced at the occurrence level. Therefore, the explicit indication of the level to which the SIC is applied would add no more declarative or operational semantics.

Static or Dynamic. As mentioned in Section 2.1.8, researchers have also discussed static and dynamic constraints.
Should we explicitly specify a SIC as static or dynamic? As described in Chapter 3, the Representation model is indeed transition-oriented; that is, all SICs represented in terms of the Representation model are basically dynamic. The fundamental premise is as below.

Premise 4.1 The current database state is semantically correct.

A database state is semantically correct if it can be constructed starting from an empty database by a sequence of valid database primitive operations (insertions, deletions, or updates). A database operation is valid if the state transition caused by it satisfies the SICs that exist at the time the operation is performed. We assume that a database is initially empty. Over time, object occurrences are inserted into the database, then updated, and finally probably deleted. Before an occurrence is manipulated, the old database state is semantically correct. A SIC is specified so that the database transition caused by its primitive operation would bring the database to a new semantically correct state. Any “static constraint” is represented by rewriting it into one or more dynamic constraints for the related object(s) on the operation(s) that could cause unallowed database state(s). By doing so, we do not lose its declarative semantics; rather, proper operational semantics are attached. It is unnecessary to indicate whether the original constraint is dynamic or static in this model.

In addition, a constraint on a sequence of operations is not a “real” SIC in terms of the SIC Representation model.
Instead, it is the consequence of enforcing several SICs. For example, in order to model “a car must be owned by a manufacturer before it can be owned by a dealer”, given that there are Manufacturer_Ownership and Dealer_Ownership relationships, and Car, Manufacturer, and Dealer entities, we would need four SICs:

Manufacturer_Ownership-I-RshipExclusive-(Dealer_Ownership)
CERTAINTY certain
FOR Manufacturer_Ownership
ON insertion
IF ∃ Car, rship_occ_part(Manufacturer_Ownership, Car)
ASSERT ¬∃ Dealer_Ownership, rship_occ_part(Dealer_Ownership, Car)
ELSE reject

Dealer_Ownership-I-RshipExclusive-(Manufacturer_Ownership)
CERTAINTY certain
FOR Dealer_Ownership
ON insertion
IF ∃ Car, rship_occ_part(Dealer_Ownership, Car)
ASSERT ¬∃ Manufacturer_Ownership, rship_occ_part(Manufacturer_Ownership, Car)
ELSE reject

Dealer_Ownership-I-RshipBeforeRship-(Manufacturer_Ownership)
CERTAINTY certain
FOR Dealer_Ownership
ON insertion
IF ∃ Car, rship_occ_part(Dealer_Ownership, Car)
ASSERT ∃ Manufacturer_Ownership, rship_occ_part(old(Manufacturer_Ownership), Car)
ELSE reject

Manufacturer_Ownership-D-RshipTriggerRship-(Dealer_Ownership)
CERTAINTY certain
FOR Manufacturer_Ownership
ON deletion
IF ∃ Car, rship_occ_part(Manufacturer_Ownership, Car)
ASSERT ∃ Dealer_Ownership, rship_occ_part(Dealer_Ownership, Car)
ELSE propagate(insert(Dealer_Ownership))

Interpretation: These SICs restrict the relationships for a single Car. The first two SICs specify that Dealer_Ownership and Manufacturer_Ownership are exclusive. The third SIC restricts Manufacturer_Ownership to have existed at the time Dealer_Ownership is being created. Thus, these first three SICs require that if a Dealer_Ownership exists, then Manufacturer_Ownership must have existed in the past (and must no longer exist now). The fourth SIC requires that when a Manufacturer_Ownership is to be deleted, a Dealer_Ownership must be created.
These four SICs together restrict Manufacturer_Ownership and Dealer_Ownership to exist in sequence. These SICs are independent of any transactions. However, because of them, the order of the related transactions is restricted. The sequence restriction on transactions is naturally guaranteed if we specify completely SICs on data. It need not be handled at the level of transaction specification.

For example, let us consider the following four possible transactions. Transaction-2 (which can be called the Transfer_Car transaction to convey the application meaning) can only be performed after Transaction-1 (which can be called the Produce_Car transaction)³⁰. Note that Transaction-3 and Transaction-4 are not allowed to be performed. Transaction-3 would be rejected because either (i) if Transaction-1 was not executed before, executing Transaction-3 would violate the third SIC; or (ii) if Transaction-1 was executed before, executing Transaction-3 would violate the second SIC. Transaction-4 would also be rejected due to the fourth SIC.

³⁰ However, Transaction-2 may never be performed; that is, a car may always be in the hands of a manufacturer. Whether Transaction-2 should be performed would depend upon the happenings in the real world. Also, note that since all these related SICs are enforced at the end of Transaction-2, we can switch the order of those two database operations inside it.

Transaction-1
Begin Transaction
    insert(Manufacturer_Ownership)
End Transaction

Transaction-2
Begin Transaction
    delete(Manufacturer_Ownership)
    insert(Dealer_Ownership)
End Transaction

Transaction-3
Begin Transaction
    insert(Dealer_Ownership)
End Transaction

Transaction-4
Begin Transaction
    delete(Manufacturer_Ownership)
End Transaction

A SIC with explicit time restriction is still a data-driven constraint though it may involve a special system variable Current_time that registers the current clock time.
For example, to model the SIC “an employee cannot receive a salary raise during his or her first 6 months in the company”, an Employee entity must have a HireDate attribute, and the SIC will be:

Employee.Salary-U-TimeRestrictTransition-(Current_time, Employee.HireDate)
    CERTAINTY certain
    FOR Employee.Salary
    ON update
    IF Current_time < Employee.HireDate + “6 months”
    ASSERT new(Employee.Salary) ≤ old(Employee.Salary)
    ELSE reject

This kind of SIC may only occur in an environment in which the event that causes manipulation of the involved objects is processed in real time, so that the Current_time in the computer matches the event time in the real world. Otherwise, Current_time must be replaced by a time-valued attribute that records the external event time, and the SIC becomes an ordinary data-driven constraint. In the above example, if the request to update Salary is not processed in real time, the expression in the precondition component would be: Salary_UpdateRequest.Date < Employee.HireDate + “6 months”.

Conclusion. Four possible proposals to incorporate more components in the Representation model have been discussed. They suggest either things that cannot be specified in a database schema, or things that add nothing to the declarative or operational semantics of a SIC. Given these six components, we can represent any kind of SIC that is mentioned in the literature and list its features precisely. These six components sufficiently provide the declarative and operational semantics of a SIC: what should be true in the database, and the information needed to check and maintain those assertions. Thus, we conclude the following:
It is sufficient to have these six components of the SIC Representation model to represent a SIC precisely.

With this SIC Representation model, the database designer can represent the data integrity semantics properly during conceptual modelling.

4.2 SIC Abstractions

One may be concerned that there would be a huge number of SICs represented in terms of the Representation model in an actual database. However, the explicit component separation in the Representation model allows us to apply abstraction concepts to reduce the number of SICs that must be specified using the full Representation model.

SIC Aggregation. Assume that O1 and T1 are the data object and operation type for SIC-1, respectively; and Oi and Ti, where i = 2, 3, ..., are the data object and operation type for SIC-i, respectively. SIC-1 (called the aggregate-SIC) is the aggregation of other SICs (called component-SICs) if

• O1 contains all Oi's as components;
• the operation T1 on O1 can be conceptually thought of as the combination of operations Ti on Oi;
• the component-SICs and the aggregate-SIC are sub-SICs decomposed from the same general SIC.

An aggregate-SIC may have its own assertions and violation action. The enforcement of an aggregate-SIC can be simulated by checking all of its component-SICs and its own assertions. If an object violates any component-SIC, the violation action of the aggregate-SIC will be taken. Thus, we can use a special logical predicate (checkcomSIC, see Appendix B.2) to refer to all its component-SICs by their names in the aggregate-SIC and avoid having to write the same assertions explicitly for O1 on T1. One example is that the domain constraint on insertion of an entity can be simulated by checking the domain constraint of updating its attributes (from unknown values to some values). We would have SICs asserting not-null, value, unique, etc., for each of its attributes on update.
These assertions need not be repeated for the entity on insertion.

SIC Specialization. Assume that O1 and T1 are, respectively, the object and operation type for SIC-1; and O2 and T2 are, respectively, the object and operation type for SIC-2. SIC-2 is the specialization of SIC-1 if O2 is a specialization (i.e., sub-type) of O1, and T1 = T2.

The specialized SIC would inherit assertions from its parent SIC with the proper variable substitution (O2 for O1). Thus, representation of the specialized SIC can be omitted unless it has special restrictions. The specialized SIC's own assertions can only refine its parent SIC's assertions, but not overwrite them. For example, suppose that the database designer defines SIC-1 for Employee.Salary on update: “(Employee.Salary > 1000) ∧ (Employee.Salary < 120000)”. The SIC representation for Manager.Salary on update is not needed unless there are special restrictions (e.g., “(Manager.Salary > 15000)”).

SIC Association. A SIC is a set of other SICs (a “set-SIC”) if conceptually the enforcement of the set-SIC is the same as the enforcement of all of its member-SICs and nothing more. The violation action of the “set-SIC” is a dummy because if an object violates any member-SIC, the violation action of that member-SIC will be taken. Thus, we can use a special logical predicate (checkmemSIC, see Appendix B.2) to refer to all its member-SICs by their names in the “set-SIC” and avoid having to write the same representations of the member-SICs explicitly for the object asserted by the “set-SIC”. The concept of SIC association may be useful for SIC enforcement in a DBMS. For example, all SICs for the same object on the same operation may be grouped as a “set-SIC” or several “set-SICs” according to their certainty factors and/or scheduling requirements. This kind of “set-SIC” is not a new type of SIC.
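The specialization and association abstractions described above can be illustrated with a minimal Python sketch. The class SIC, its fields, and the function violated_members are hypothetical names chosen for illustration; they are not the predicates of Appendix B.2. The sketch assumes that each assertion is a function over an occurrence (a dictionary of attribute values), so that the substitution of O2 for O1 reduces to evaluating the parent's assertions on the subtype's occurrence. It does not verify that O2 is actually a sub-type of O1.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

Assertion = Callable[[dict], bool]


@dataclass
class SIC:
    """A SIC reduced to its object, operation type, and assertions."""
    name: str
    object_type: str
    operation: str
    assertions: List[Assertion] = field(default_factory=list)
    parent: Optional["SIC"] = None  # parent SIC when this SIC is a specialization

    def __post_init__(self):
        # Specialization requires the same operation type (T1 = T2).
        if self.parent is not None and self.parent.operation != self.operation:
            raise ValueError("a specialized SIC must keep its parent's operation type")

    def effective_assertions(self):
        """Inherited assertions first, then this SIC's own refinements."""
        inherited = self.parent.effective_assertions() if self.parent else []
        return inherited + list(self.assertions)

    def holds(self, occurrence):
        # Own assertions are added to the inherited ones, so they can only
        # refine the parent's restrictions and never overwrite them.
        return all(check(occurrence) for check in self.effective_assertions())


def violated_members(member_sics, occurrence):
    """Enforce a 'set-SIC': report every member-SIC the occurrence violates."""
    return [sic.name for sic in member_sics if not sic.holds(occurrence)]


employee_salary = SIC(
    "SIC-1", "Employee.Salary", "update",
    [lambda occ: occ["Salary"] > 1000, lambda occ: occ["Salary"] < 120000],
)
manager_salary = SIC(
    "SIC-2", "Manager.Salary", "update",
    [lambda occ: occ["Salary"] > 15000],
    parent=employee_salary,
)
```

With these definitions, manager_salary.holds({"Salary": 5000}) is False because of the manager-specific refinement, while manager_salary.holds({"Salary": 200000}) is False because of the inherited upper bound. A set-SIC is represented here simply as a list of member-SICs, which reflects the point that it carries no assertions or violation action of its own.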
In SIC specifications, however, a SIC on updating the primary key of an entity (or relationship, or relation) can conceptually be simulated as two “set-SICs” that refer to the SICs for deleting the corresponding old entity (relationship, or relation) and inserting a corresponding new entity (relationship, or relation), respectively.

Generic SICs. By applying the above abstraction concepts, we can reduce the number of SICs that must be specified explicitly using the Representation model. The concept of generic SICs is introduced to reduce the number of required explicit representations even further. Assume we have the following generic object types:

(1) Entity* is the generalization (i.e., union) of all entity types that are defined by the database designer. Entity*.Attribute* is the generalization of all attributes of all entity types that are defined by the database designer.

(2) Relationship* is the generalization of all relationship types that are defined by the database designer. Relationship*.Attribute* is the generalization of all attributes of all relationship types that are defined by the database designer.

These generic object types are conceptual modelling objects. They do not actually exist in a database. That is, neither data definition operations nor data manipulation operations will actually be applied to them. However, we can write some common SIC types (e.g., domain constraints) for them. These SIC representations for generic object types can be called generic SICs (SICs for specific object types can be called specific SICs in contrast). The precondition component of a generic SIC includes logical predicates [31] (refer to Appendix B.1) to indicate clearly the contexts where the SIC type is applicable (e.g., the fact that two entity types are exclusive) and/or identify the related information (e.g., its primary key, etc.)
of the object type for which the SIC type applies.

Suppose that we keep the constraint information for specific object types as logical predicates in the database (e.g., in the data dictionary). Also suppose that the DBMS would support SIC inheritance properly. Since all object types defined by the database designer are subtypes of these generic object types, if they satisfy the preconditions of some generic SICs, these SICs would be applied to them by the principle of SIC specialization. Thus, these generic SICs serve as “templates” for common SIC representations and are expected to be inherited by specific object types [32]. We would need only one representation for each SIC type (e.g., two entity types are exclusive) in a database regardless of the number of specific object types to which the SIC type applies. The advantage of this approach is that it reduces the possibly huge number of explicit representations of SICs, so that the conceptual structure and future maintenance (management) of SICs become easier. In addition, since generic SICs can be pre-defined in an automated database design system, the consultation to elicit SICs would be more efficient. Section 7.3.1 will describe in detail how to apply generic SICs for representing some SIC types that are identified during conceptual modelling.

[31] These logical predicates in the precondition component of a generic SIC contain some uninstantiated variables. (For example, Primary_Key is a variable in the logical predicate entity(Entity*, Primary_Key).) These variables will be instantiated when a specific entity type (e.g., Employee) has the constraint information to satisfy the precondition and inherit the SIC. (For example, the above Primary_Key can be instantiated to be “EmpId” if the database designer has specified Employee as an entity type with the primary key “EmpId”, that is, the assertion entity(“Employee”, “EmpId”, ...) has been given.)

[32] The idea here is similar to the following simple case. In a database, there are entity types Employee, Customer, Supplier, etc. Though the Person entity type does not actually exist in the database, we can write some SICs for Person, which would be inherited by the specific entity types (e.g., Customer). The level of our Entity* type is still higher than Person.

Usefulness of the Representation Model. It is necessary to identify the object and operation type when applying SIC aggregation or specialization abstractions. The explicit component separation in the SIC Representation model helps identify these components easily. In addition, the representation of generic SICs relies heavily on the precondition component. Most SIC types (e.g., two entity types are exclusive) only apply to some specific object types. The precondition component of such a generic SIC indicates the conditions for the specific object type where the SIC type is applicable. A few SIC types (e.g., domain constraints) are common to all entity types. In that case, the precondition component of a generic SIC is used to identify the relevant object in this context so that its variables can be instantiated with proper values when a specific object type inherits the SIC.

4.3 Database Management

In this section, some possible applications of the SIC Representation model to database management rather than database design will be briefly mentioned.

4.3.1 SIC Management

A DBMS should include an integrity maintenance subsystem to enforce SICs. Some previous research (e.g., [Hammer and McLeod, 1975] and [Bertino and Apuzzo, 1984]) has proposed the functional structure of this integrity maintenance subsystem. Instead of fully describing its structure, this dissertation briefly discusses its major functions: enforcing and maintaining (adding or removing) SICs.
With regard to SIC enforcement, the focus is on the basic checking strategy (the enforcement schedule) and on the violation actions.

SIC Enforcement. If all accesses to the database are through transactions, the DBMS would enforce SICs through pre- and post-conditions on transactions. The integrity maintenance subsystem would only maintain SICs, and would not be responsible for enforcing them. If it is possible to access the database using primitive operations or if no pre- or post-conditions have been specified for a transaction, the integrity maintenance subsystem would determine which SICs must be enforced and when.

If the checking of a SIC is not at the end of the transaction, it should be done before the intended database operation is performed, not after it. While examining the assertions in the precondition and predicate components of a SIC, the effect of its specified operation type should be taken into account.

Bertino and Apuzzo's [1984] original criteria for deciding when a SIC must be enforced with regard to a transaction are modified as follows:

    Basically, a SIC is enforced at the end of the transaction if the set of objects in all components of the SIC is affected (updated, inserted, or deleted) by more than one update, insertion, or deletion statement in the transaction. Otherwise, the SIC is enforced on each occurrence update, insertion, or deletion, if it is a SIC involving only one entity/relationship occurrence or tuple; or on each update, insertion, or deletion request, if it is a SIC involving one or more than one set of occurrences or tuples belonging to the same or different entity types, relationship types, or relations.

The violation action of a SIC might become a part of a transaction. In determining the enforcement schedule of SICs, the related propagation actions should be considered too.
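The modified criteria above can be sketched as a small decision function. This is a minimal Python illustration under stated assumptions, not the enforcement algorithm of any particular DBMS: the names Schedule, SICInfo, Statement, and enforcement_schedule are hypothetical, each SIC is reduced to the set of object types mentioned in its components, and the caller is assumed to include any propagated violation actions in the list of statements.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, List, Optional


class Schedule(Enum):
    END_OF_TRANSACTION = "end of transaction"
    PER_OCCURRENCE = "each occurrence update, insertion, or deletion"
    PER_REQUEST = "each update, insertion, or deletion request"


@dataclass(frozen=True)
class SICInfo:
    name: str
    objects: FrozenSet[str]   # object types appearing in any component of the SIC
    single_occurrence: bool   # True if the SIC involves only one occurrence or tuple


@dataclass(frozen=True)
class Statement:
    operation: str            # "update", "insert", or "delete"
    objects: FrozenSet[str]   # object types affected by the statement


def enforcement_schedule(sic: SICInfo, transaction: List[Statement]) -> Optional[Schedule]:
    """Decide when a SIC is checked for a given transaction.

    Returns None when no statement affects the SIC's objects.
    """
    affecting = [stmt for stmt in transaction if stmt.objects & sic.objects]
    if not affecting:
        return None
    if len(affecting) > 1:
        # More than one statement affects the SIC's objects: defer the check.
        return Schedule.END_OF_TRANSACTION
    if sic.single_occurrence:
        return Schedule.PER_OCCURRENCE
    return Schedule.PER_REQUEST


# Hypothetical SICs and statements, used only to exercise the function.
salary_range = SICInfo("salary_range", frozenset({"Employee.Salary"}), True)
budget_cover = SICInfo(
    "budget_cover", frozenset({"Employee.Salary", "Department.Budget"}), False
)
transaction = [
    Statement("update", frozenset({"Employee.Salary"})),
    Statement("update", frozenset({"Department.Budget"})),
]
```

In this hypothetical transaction, salary_range is affected by one statement only and is checked per occurrence, whereas budget_cover is affected by both statements and is deferred to the end of the transaction.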
For example, suppose a transaction includes action-1, action-2, action-3, ..., and suppose that SIC-1 is only affected by action-1 and has a violation action violation-action-1. Therefore, SIC-1 is enforced immediately. If action-1 violates SIC-1, violation-action-1 becomes part of the transaction. If SIC-2 is affected by a later action of the transaction and by violation-action-1, it should be enforced at the end of the transaction. If the transaction is rolled back for any reason, the related propagation actions should also be undone, and if a related propagation action aborts, the transaction must be rolled back too.

In most modern DBMSs, there is a recovery mechanism that is responsible for recovering the database from many possible failures, e.g., transaction failures, system-wide failures, and media failures ([Date, 1983]). We assume that the integrity maintenance subsystem cooperates with this recovery mechanism to keep in a log the information required for undoing an operation request. The information needed in the log is similar to that for recovery, e.g., (i) identification of each modification (update, insertion, or deletion) statement, (ii) identifiers of the occurrences affected by each statement, and (iii) for each occurrence updated, the old occurrence value. A special requirement on the log for deferring SIC checking until the end of the transaction is the following. If the SIC contains new/old functions, the old values of the objects should be logged at the beginning of the transaction even if the values are not referenced by any transaction statement. For example, we may have a SIC that relates old(sum(Employee.Salary)) to new(sum(Employee.Salary)), but a transaction does not reference or operate on “sum(Employee.Salary)”.

If the violation action of the related SIC is “warning”, the intended access operation is allowed, but a warning message would be issued.
If the violation action is “propagate”, the access operation would also be allowed, but a related corrective action would be performed (some message may also be issued). If the violation action is “reject”, the access operation would be rejected, an error code would be raised, and an explanatory message describing the occurrences in violation of the SICs would also be given. In this latter case, if the transaction has proper exception-handling procedures, this error would be handled as planned (e.g., change the operation in some way and resubmit it, or skip the operation). Otherwise, the integrity maintenance subsystem would cooperate with the recovery mechanism to roll back the transaction automatically.

The SIC Representation Model. Representing SICs in terms of the Representation model would facilitate their enforcement since the necessary operational semantics are included. The checking is usually limited to the occurrence (that is currently manipulated by the operation in the operation component) of the object type (in the object component). The database state that makes the checking relevant is unambiguously described (in the precondition component). The necessary corrective action is also clearly specified (in the violation action component). The certainty factor component serves as the ordering factor when checking a group of SICs in the same “family”. Other implementation mechanisms (e.g., keeping a redundant minimum or maximum data value) can be used to improve efficiency further.

SIC Maintenance. After a database is populated, there is a maintenance problem: a new SIC may be added or an old SIC may be removed. Suppose that no time-stamped past generation of data is kept. In principle, deleting a SIC does not affect the current database state. Inserting a SIC can affect the current database state only if the SIC is relevant to some object(s) on insertion.
If the organization decides the new SIC will not be applicable to the existing data, this should be explicitly stated in the SIC [33]. In that case, even if the new SIC is asserted on “insertion”, the current database state need not be checked. Otherwise, a complicated procedure is needed to simulate insertions of all occurrences of the related object (asserted by the new SIC) and settle possible violations. It may be easier to maintain SICs represented in the Representation model since their operation type components are explicitly expressed.

[33] For example, if a new restriction on employees' salary is published on “1991/2/1” and is only applicable to new employees, it should be given the precondition: Employee.HireDate ≥ “1991/2/1”.

4.3.2 Other Aspects of Database Management

The introduction of certainty factors in the Representation model brings some advantages to other aspects of database management. The first is to support flexible management. The second is to facilitate intelligent query processing.

Flexible Management. By allowing uncertain SICs, we would decrease the serious rigidity of SICs criticized by Borgida [1985]. Although those SICs are uncertain, their violation might still convey some information. Usually warning messages are issued. The purpose of issuing warning messages is twofold. First, specific occurrences that violate an uncertain SIC deserve further examination, either interactively or in batch, to assure that the real organizational situation is reflected. Note that the enforcement of a good uncertain SIC will generate warning messages for those likely erroneous (but comparatively few) cases without interfering with the processing of routine cases. This corresponds to the idea of exception reporting. Second, if warning messages are often
issued for a SIC [34], the organization might need to reevaluate its organizational policy, which might invoke SIC maintenance.

Knowledge-based Query Optimization and Deductive Capabilities. One advantage of introducing certainty factors in the Representation model is to facilitate query processing. An uncertain SIC is similar to a heuristic for directing query processing to the most likely objects first in an attempt to reduce response time. Furthermore, uncertain SICs can be applied to provide deductive capabilities under uncertainty. For example, from some sources we have the input data “with 90% certainty, a ship is a tanker, i.e., Ship.Type = tanker”. If we have a SIC “with 80% certainty factor, a tanker carries oil (i.e., if Ship.Type = tanker then Ship.Cargo = oil)”, we can conclude that with 72% (i.e., 90% × 80%) certainty, the Ship.Cargo may be oil. Thus, this approach allows the Representation model to apply to expert database systems (i.e., the integration of knowledge-based systems and database systems) as well as traditional database systems.

[34] For a SIC involving aggregate functions, it is possible that even if a warning message is issued only once, the organizational policy needs to be re-evaluated, because violating such a SIC might imply that a number of occurrence exceptions have happened so far, owing to the nature of aggregate functions. For example, there might be two SICs such as SIC-4: with 100% certainty, avg(Employee.Salary) < $25,000; and SIC-5: with 90% certainty, avg(Employee.Salary) ≤ $20,000; but no other SICs on Employee.Salary. Then, a single violation of SIC-5 might imply that many employees have too high a salary, and the salary policy may need to be reevaluated.

Chapter 5
An Extended E-R Model Incorporating Semantic Integrity Constraints

This chapter proposes a conceptual modelling tool, called the E-R-SIC model, for incorporating SICs. Section 5.1 describes the shortcomings of previous E-R models for dealing with SICs.
Section 5.2 introduces the primitive constructs and data abstractions of the E-R-SIC model and describes the basic properties of SICs. Sections 5.3 to 5.6 explore SIC properties in more detail. Section 5.7 summarizes the E-R-SIC model in some figures.

5.1 Problems with Previous E-R Models

Existing database design methodologies based on the E-R model provide little support for incorporating SICs. Rather, SICs are treated as unessential accessories to conceptual modelling.

The E-R model was originally proposed by Chen [1976]. It has three primitive constructs: entity, relationship and attribute. More recently, researchers (e.g., [Smith and Smith, 1977a], [Smith and Smith, 1977b], [Teorey et al., 1986]) have extended Chen's original E-R model to provide some powerful data abstractions, e.g., generalization, aggregation, association, etc. Very few SICs are discussed in Chen's original E-R model:

• An incidence constraint requires that the existence of a relationship occurrence always depends on the existence of the participating entity occurrences ([Furtado et al., 1988]).
• The existence of an occurrence of a weak entity type depends on some occurrence(s) of another entity type ([Chen, 1985]).
• The maximum cardinality specifies the maximum number of occurrences of a relationship that can be related to one occurrence of an entity type.
• Attributes cannot exist on their own, but are always attached to entities or relationships.
• Primary keys must have unique values.

The extended E-R models proposed by previous researchers introduce a few more SICs, e.g., minimum cardinality. However, in all cases, SICs are only considered as accessory properties of relationship or entity types (usually a single one). For example, researchers (e.g., Palmer [1978], Tsichritzis and Lochovsky [1982]) usually discuss the optionality property of a relationship type (i.e., a relationship type, as a mapping, is total or optional to an entity type).
One problem with this modelling approach is that SICs do not receive adequate attention in the conceptual design phase and are usually lost during the transformation process from the E-R model to the relational model. SICs are not treated as essential to conceptual modelling. Instead, they are considered mainly for selecting the appropriate logical data structure (e.g., relations). Because some semantics cannot be considered as “properties” of a single entity or relationship type, the traditional approach may not identify all necessary SICs. For example, consider the E-R diagram where an entity type E is related to entity types F and G via relationship types RX and RY, respectively. What SICs (if any) are implied by the adjacency of these relationship types in an E-R diagram? There may be a SIC to require that an RX occurrence relating to an E occurrence must exist if the corresponding RY occurrence exists, or conversely. Alternatively, there may be a SIC to require that RX be exclusive to RY. Such kinds of SICs are seldom discussed. Since SICs express restrictions on the logical meaning of data, it is worth considering the SIC as a distinct modelling construct.

5.2 An Overview of the E-R-SIC Model

The E-R-SIC model is an extended E-R model. It can be used for incorporating SICs in a database schema.

5.2.1 Primitive Modelling Constructs

The proposed model is application domain-independent. There are four primitive modelling constructs in the E-R-SIC model: entity, relationship, attribute and SIC. These four constructs are all needed to model data semantics.

Construct Description. These primitive constructs are defined as follows:

Definition 5.1 An entity is the database representation of a real-world object that can be distinctly identified.

Definition 5.2 A relationship is the database representation of an association among real-world objects.
Definition 5.3 An attribute is the database representation of a property of a real-world object or an association that is a function mapping an entity type or a relationship type to values.

Definition 5.4 A SIC is a logical, invariant restriction on the static state of a database (that is, a collection of attributes, entities and relationships), or on the database state transition caused by an insertion, deletion, or update operation.

Entity, relationship, and SIC occurrences are classified into different types according to some criteria. At a particular moment, certain groups of entity, relationship, or SIC occurrences can be considered as sets in the mathematical sense, and these sets may have some aggregate properties.

Following the arguments on page 59, the E-R-SIC model does not allow the “not applicable null”. A database designer should properly define entity or relationship types to avoid that problem.

A relationship type cannot be a participant in another relationship type. If a relationship type participates in another relationship type, it will be modelled as an entity type and the original E-R diagram will be changed appropriately [35].

[35] That is, if a relationship type RX, which has two entity type participants E and F, is also a participant in another relationship type RY, then a new entity type, say G, will be used to replace the original relationship type RX; and two new relationship types, say RE and RF, will be created to connect this new entity type, G, with entity types E and F, respectively. The new entity type, G, becomes a participant in the relationship type RY.

Since neither relationships nor attributes can exist without entities, and a relationship can be modelled as an entity, the following is clear:
    Entity is the most fundamental construct among attribute, entity, and relationship.

Weak Entity Type. The existence of an occurrence of a weak entity type depends on the existence of some occurrence(s) of another entity type, the “regular” entity type. An existence subset [Webre, 1983] is the subset of occurrences of the regular entity type upon which a weak entity occurrence is dependent. The E-R-SIC model requires that the above relationship be explicitly specified in order to make clear the “dependence semantics”, namely the existence subset for each weak entity occurrence [36].

Time. The E-R-SIC model does not model “time” as an entity type [37]. That is, time-stamped past generations of data are not modelled. However, there may be SICs applied to some time-valued attributes, temporal sequences between object occurrences, or the special system variable, Current_time.

Simplifying Assumptions. For simplification, there are two assumptions on relationships.

[36] In the literature there is a related discussion on another term, weak relationship. However, note that researchers use that term to imply diverse meanings. For example, Tsichritzis and Lochovsky [1982] use it to mean the relationship between a weak entity type and its related regular entity type. Scheuermann et al. [1980] classify weak relationships into two types. Their second type should be modelled by an association abstraction. Their first type (also see [Dogac and Chen, 1983]) implies a special SIC, which is called the Critical_Relationship_Occurrence SIC in this research. That is, the existence of an occurrence of an entity type E depends upon the existence of exactly one critical occurrence of a relationship type R, via which it is related to another entity type F. An occurrence of E cannot exist if its related critical relationship occurrence of R does not exist, though it may still participate in other occurrences of R.
For example, each employee should be assigned to at least two projects, and exactly one of these would be critical. In order to model such a situation, the relationship type R should have a special attribute, e.g., “Criticality”, which is a binary variable to indicate whether a relationship occurrence is critical.

[37] If we model “time” as an entity type, all other entities and relationships would connect with “time” via at least two special relationship types, “has_creation time” and “has_deletion time” ([Studer, 1988]). The E-R diagram would become very complicated.

No Ternary or Higher Degree of Relationships. Ternary relationships and relationships of higher degree are not discussed in this dissertation because of their additional complexity [38]. Instead, ternary relationships or relationships of degree greater than two are simulated using binary relationships ([Kent, 1977]) [39].

No Recursive Relationships. The relationships discussed in this dissertation are binary relationships involving two distinct entity types. Recursive relationships, which are relationships involving only one entity type, are simulated by binary relationships. That is, one or two [40] subtypes will be created for the entity type participating in a recursive relationship type. Such a simulation avoids the need to attach an explicit “role” when referencing an entity type participating in a relationship type. These subtypes also make the semantics clearer in most cases [41]. When an entity type is required to participate in both roles of a recursive relationship, such a distinct subtype, although it may be redundant, will be helpful for understanding the semantics. We may have some problems when an entity type is required to participate in both roles of a recursive relationship and those two roles are not distinct [42]. However, such cases are very idiosyncratic. For simplification, this research is limited to handling binary relationships with two distinct entity types.

[38] For example, the cardinality specifications are more complicated. In a ternary relationship, there are twelve pairs of minimum and maximum cardinalities according to the cardinality definition given by Lenzerini and Santucci [1983].

[39] That is, in the case of ternary relationships among entity types, the first binary relationship is formed between two entity types, and the second binary relationship is then formed to connect the first binary relationship with the remaining entity type. Since the participants of a relationship type should be entity types, the above means that the first binary relationship should become an entity type. For example, suppose that there is a ternary relationship Order among three entity types, Parts, Warehouses, and Suppliers. We will model the situation as four entity types, Parts, Warehouses, Allocation, and Suppliers, and three binary relationship types connecting Allocation with Parts, Warehouses, and Suppliers, respectively. Allocation is a new entity type converted from the original binary relationship between Parts and Warehouses.

[40] One subtype is sufficient to replace a recursive relationship with a binary relationship, although a database designer may prefer to have both. For example, in the relationship type Supervise, “Employee (0,*) Supervise Employee (0,*)”, where * stands for some positive number other than 0, one subtype (e.g., Supervisor) is sufficient.

[41] Compare to an extreme case where there is only one entity type “Thing” in the database and all relationships are recursive.

5.2.2 Data Abstractions

A data abstraction is a simplified description of a system that suppresses specific details while emphasizing those pertinent to the problem.
Like other extended E-R models in the literature, the E-R-SIC model provides three kinds of data abstractions: inclusion, aggregation, and association.

[42] That is, the relationship is symmetric, e.g., “Person Friendship Person”.

Inclusion Abstraction

The inclusion abstraction concept ([Goldstein and Storey, 1990]) encompasses classification, generalization, and specialization. Classification is a form of abstraction in which a type is defined as a collection of occurrences with common properties. Specialization occurs when every occurrence of a type is also an occurrence of another type. Specialization is indicated by the term is_a, that is, S is_a G, where S is a subtype and G is a super-type. It is possible to have generalization or subset hierarchies ([Teorey et al., 1986]). A generalization hierarchy occurs when a type is the union of non-overlapping subtypes. A subset hierarchy occurs when a type is the union of possibly overlapping subtypes. Property inheritance, which means that attributes, associated relationships and imposed SICs of the super-type are inherited by each subtype (or that those of a type are inherited by its occurrences), is the most important characteristic of the inclusion abstraction. In the case of the classification abstraction (relating occurrences to types), the property inheritance principle has been enforced traditionally by any DBMS. This research assumes that the DBMS (either in the E-R model or the relational model) would also automatically implement property inheritance for the specialization abstraction. That is, the principle of redundancy minimization, i.e., properties that can be inherited from some other entity type via an is_a relationship should not be stored explicitly ([Goldstein and Storey, 1990]), has been followed.
Otherwise, the inherited attributes, relationships and imposed SICs would need to be explicitly and redundantly stored for each specific subtype and other SICs would be required to assure that these properties have been really “inherited”! However, there is one exception to the use of the redundancy minimization principle. Although the primary key of the super-type may not be chosen as the primary key of the subtype, it is indeed an attribute and candidate key of the subtype. In order to make the semantics of a subtype entity clearer and have the same related SIC(s) regardless of the choice of the primary key, the primary key of the super-type is suggested to be stored in the subtype.

Aggregation Abstraction

Aggregation is an abstraction that allows a relationship between objects (i.e., attributes, entities, relationships) to be considered as a higher level object ([Smith and Smith, 1977b]). The term component_of is used to indicate the aggregation, that is, C component_of A, where C is a component and A is an aggregate. The E-R-SIC model provides composite entity aggregation, that is, an abstraction in which an entity contains other dependent entities and some attributes as real components[43]. The aggregate entity “owns” these other entities ([Lee and Lee, 1990]). That is, the existence of these other entities is dependent on their aggregate and each is owned by exactly one aggregate (i.e., the cardinalities of any component in the component_of relationship are always (1,1)). Although some researchers argue that in aggregation the inheritance is upwards ([Brodie, 1983, p.576], [Mees and Put, 1987], [Potter and Kerschberg, 1988]), this research argues that it is probably more suitable to state that the aggregate “owns” components and components “own” their attributes, so the aggregate “owns”, rather than “inherits”, components’ attributes. For example, we may state “a car owns engine.weight, engine.brand, and brake.brand, etc.”.

[43] In addition to composite entity aggregation, aggregation abstractions can be classified into four other kinds based on discussions about aggregation in the literature. (1) Attribute aggregation is the abstraction in which an attribute may be defined as the aggregation of attributes. This research does not consider the hierarchical structure of attributes. (2) Simple entity aggregation is the abstraction in which an entity is defined by aggregation of attributes. This is the traditional entity concept. (3) Complex entity aggregation is the abstraction in which an entity is defined by aggregation of attributes, other independent entities, and probably sets. The aggregate object does not really “own” them as components. That is, the cardinalities of these “components” in the component_of relationship may not be (1,1). This aggregation may convey less semantics because these “relationships” between the aggregate object and its “components” are all implicitly named as “component-of”. Therefore, the E-R-SIC model does not adopt it. (4) Relationship aggregation is the abstraction in which a “relationship” is obtained by aggregation of entities and some attributes. It is just another way to represent a relationship.

Association Abstraction

Association is the abstraction in which a collection of member objects is considered as a higher level (more abstract) object ([Brodie, 1983]). The term member_of is used to indicate the association, that is, M member_of S, where M is a member and S is a set. Brodie [1983] states “as with aggregation, the inheritance goes upward” and some researchers (e.g., Mees and Put [1987]) even state that association may support both upward and downward inheritance. This research takes the position that there is no inheritance in association. That is, set is distinguished from type and a “set” in association is considered to represent a pure mathematical set.

Before Brodie mentioned association, dos Santos et al. [1980] had already proposed a useful data abstraction “correspondence”, which was later referred to by Furtado and Neuhold [1986] as “grouping”. Grouping is more general than association. It creates a group of sets, i.e., grouping is an abstraction that defines a new entity type in which each occurrence is a set formed from a collection of occurrences of the source entity type. According to the correspondence idea in [dos Santos et al., 1980], a group of sets is formed by an indexing mechanism. Applying the idea of correspondence, the E-R-SIC model provides three kinds of associations as below.

1. Natural Set Association: A set, S, is defined to contain all occurrences of an entity type M. Classification is the indexing mechanism to form a set. For example, we have: “Employee member_of Employees” where Employees is a set consisting of all Employee occurrences in the employee type; or “Department member_of Departments”. If these sets only have attributes that are derived from those of their members, their explicit representations may be redundant and may not be efficient after considering the enforcement of SICs. However, there may be a priori attributes, for example, Employees.Representative. In the whole database, there are a number of such sets, e.g., Employees, Departments. Since they may have different derived attributes and a priori attributes, they form different entity set types containing a single occurrence, respectively. This association abstraction relationship, member_of, should not have its own attributes and need not be explicitly represented.

2. Indexing Derived Set Association: This is the original correspondence abstraction discussed by dos Santos et al. [1980]. The indexing mechanism[44] can be an indexing attribute that is an attribute of the indexed entity type, or an indexing entity type that is related to the indexed entity type via an indexing relationship type. An example is cosets of employees of the same age, where Employee.Age is the indexing attribute for the indexed entity type Employee. If the indexing attribute is an attribute that disallows null (“unknown”), the cosets obtained by grouping would form a partition of the indexed entity type occurrences. Another example is shown in Figure 5.1, where DS is the indexing derived set (e.g., cosets of employees who work_for the same project), M (e.g., Employee) is the indexed entity type, I (e.g., Project) is the indexing entity type, and R (e.g., Work_for) is the indexing relationship. Grouping is a powerful abstraction. There may be more than one indexing type, which can be combined with indexing attributes as the indexing mechanism. Although we can get a group of sets from the indexing mechanism, we may only be interested in some of these (e.g., the set of employees who work_for project p100).

   In general, this association abstraction relationship, member_of, does not have its own attributes and need not be explicitly represented. This kind of set has some attributes derived from the indexed entity type; some are defined a priori. Two kinds of attributes in a set are important for membership derivation: the indexing attribute (e.g., Employee.Age) and the primary key of the indexing entity type (e.g., Project.Id).

3. Enumerated Set Association: There is no indexing mechanism in this kind of association because the database designer does not know or does not care about the indexing entity types or attributes. The set membership can only be explicitly enumerated by the member_of relationship.

[44] Formally, Furtado and Neuhold [1986] define grouping as below. “If T designates some entity set and T1, T2, ..., Tn are either value sets associated with T or entity sets related via some relationship with T, then the Grouping operator denoted by T1, T2, ..., Tn {T} constructs a new (grouped) entity set TG where each element is a set of entities of T such that inside of one such set all entities have the same values and related entities from the entity sets T1, T2, ..., Tn associated. The types T1, T2, ..., Tn will be called indexing types, T the basis.”

Figure 5.1: Grouping: Member (M), Derived Set (DS), Indexing Entity (I), Indexing Relationship (R)

5.2.3 Basic Properties of SICs

Based upon the ontological concepts of Bunge’s formalism, Wand and Weber ([1988]; [1989]; [1990]) develop a formal model of the deep structure of an information system. Their work begins with a fundamental premise that is an adaptation of Newell and Simon’s [1976] physical symbol system hypothesis. Although, as described in Section 1.3, the SICs in this research are only parts of their “laws”, we can borrow this premise to develop the SIC concept.

Premise 5.1 A physical-symbol system has the necessary and sufficient properties to represent real-world meaning.

An information system is a physical-symbol system. A database is part of an information system and consists of attribute, entity and relationship occurrences. In order to represent data integrity semantics, there may be constraints restricting the existence or change of these occurrences. Some constraints may specify necessary conditions that must hold for an occurrence to exist, not exist or change in a database.
Other constraints specify sufficient conditions, which if true, imply that an occurrence must exist, not exist, or change in a database. Thus, we can interpret a constraint as follows.

A SIC is an assertion of a sufficient or necessary condition for an occurrence of an attribute, entity, or relationship type to exist, not exist, or change in a database.

What are these conditions?

• Although these conditions are usually specified for an entity or relationship type or its attributes, they are actually restrictions on the insertion, deletion, or update of its occurrences in a database, rather than on the addition or removal of the type from the database schema.

• A SIC is defined intensionally in a database schema rather than extensionally. The conditions apply to an occurrence by virtue of the fact it belongs to an entity, relationship, or attribute type. However, the conditions for entities or relationships may only be relevant to some occurrences of the specified type. These occurrences, indeed, can be considered to belong to an “implicit subtype”.

  Reasons for having implicit subtypes. By applying the classification abstraction ([Schrefl et al., 1984]) those entity/relationship occurrences with common properties form an entity/relationship type; and by the principle of property inheritance every entity/relationship occurrence should have exactly all the attributes of the entity/relationship type and conform to the SICs associated with the entity/relationship type ([Knuth et al., 1988]). However, these occurrences may not be “homogenous”. That is, there may be special SICs on some occurrences. One could create a subtype for only those special occurrences. By taking this approach (similar to [Dampney, 1988]) we could avoid a number of special types of SICs associated with “implicit subtypes”. However, since there may be a number of such special SICs, there would have to be a corresponding number of such subtypes in a database.
The organization may have no intrinsic interest in these explicit “subtypes”. In this research, the subtypes will be created only if they are meaningful to the organization or if they are needed to eliminate recursive relationships. That is, we have the following premise:

  Premise 5.2 The occurrences of each entity or relationship type are not homogenous; that is, they have different attribute values, and are related to different occurrences of other objects.

  This implies that a database designer could not specify all entity and relationship types such that all occurrences of each type satisfied the same set of SICs. That is, “implicit subtypes” with unique SICs always exist.

  An entity subtype may be implicitly defined by restricting attribute values of its occurrences or by requiring occurrences to participate in some relationship types[45]. A relationship subtype may be implicitly defined by restricting attribute values of its occurrences, by requiring its participating entity occurrences to participate in some other relationship types, or by restricting attribute values of its participating entity occurrences. A condition for an entity or relationship type can be considered as the definition of an implicit subtype if there are other conditions and this condition provides a criterion to decide whether an occurrence satisfies other conditions (that are special SICs for the “implicit subtype”).

  [45] In another semantic data model, SDM, there are four ways to define “sub-classes”: attribute-defined, user-specified, set operator-defined, and existence-defined ([Hammer and McLeod, 1981], [Urban and Delcambre, 1986]). A user can decide what occurrences the subtype will contain. From the SIC perspective, the semantics are embedded in the user’s mind and it is left to the user to maintain the database in order to reflect real-world changes. If a subtype is defined by some set operation, the definition is explicit. The ideas of the “attribute-defined” and “existence-defined” “sub-classes” in SDM are adopted here.

• These conditions may be positive or negative. A positive condition requires that something must happen. A negative condition requires that something must not happen. For attributes, by simply reversing the comparison operator (e.g., reversing “not E.A > value” to “E.A ≤ value”), we could consider only positive conditions. However, we must consider both positive (e.g., participate_in) and negative (e.g., not participate_in) conditions for relationships.

• These conditions may be simple assertions, or complicated arithmetic formulas. To be considered a single SIC, a condition should be “atomic”, i.e., indivisible. Otherwise, it is really two or more SICs. That is, usually we need not consider conjunctions of conditions. However, there may be some “further restrictions” that are conditions on some basic restrictions. For example, there may be a SIC such as, “if the salary of an employee is greater than $40,000, he (she) must participate in some project, which is project1”, in which the last part of the statement is a further restriction on the participation in a project.

• These conditions may only require that any one occurrence of a specific type exist, or more strictly, may require that at least, at most, or some exact number of occurrences of the specific type exist. That is, the restriction may be qualitative or quantitative. Since an attribute type has exactly one occurrence per entity or relationship occurrence, no quantitative requirement is placed on attributes. The quantitative requirements are more important for a relationship type.
They are the relative cardinalities of a relationship type: restrictions on the minimum, maximum, or exact number of relationship occurrences in which an occurrence of the specified entity type participates; or requirements that for each occurrence of the specified entity type, all occurrences of the other entity type relate to it via the relationship type.

• These conditions may directly restrict the value of an attribute of a single entity or relationship occurrence, or may restrict the aggregate value of an attribute for a set of entity or relationship occurrences. All occurrences of an entity or relationship type (or its “implicit” subtype) naturally form a set (although the database designer may not explicitly define it). The most important set properties are the counting number, minimum, maximum, summing and average values.

• Although a condition may be asserted for occurrences of one entity or relationship type, it might also be for occurrences of a group of types taken together.

• These conditions may explicitly involve time to restrict or trigger the existence, nonexistence, or change of an occurrence, or may assert a temporal sequence between the existence of occurrences[46].

[46] Without time-stamped past data, the temporal requirement would imply “no time lag” between the existence of adjacent occurrences in a sequence of objects.

Systematic Modelling. A naive approach to modelling SICs would enumerate all necessary and sufficient conditions for the existence, nonexistence and change of the occurrences of each attribute, entity, and relationship type. However, if a SIC involves several objects, we need not specify it several times. For example, suppose that a SIC requires that if an RX occurrence exists for an occurrence of the specified entity type, an RY occurrence must also exist. We would specify this SIC when we identify necessary conditions for the existence of RX.
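As a rough illustration of the RX/RY example, the requirement can be read as a set inclusion over relationship occurrences. The sketch below is in Python, which the thesis itself does not use; the representation of occurrences as (entity, partner) pairs and the function name are assumptions made only for this illustration.

```python
# Hypothetical sketch: each relationship occurrence is an (entity, partner) pair.
def entities_missing_ry(rx_occurrences, ry_occurrences):
    """Return the entity occurrences that take part in RX but not in RY."""
    in_rx = {entity for entity, _ in rx_occurrences}
    in_ry = {entity for entity, _ in ry_occurrences}
    return in_rx - in_ry


rx = {("e1", "f1"), ("e2", "f2")}
ry = {("e1", "g1")}

# e2 has an RX occurrence but no RY occurrence, so it violates the SIC.
violations = entities_missing_ry(rx, ry)
print(sorted(violations))
```

The check is written once, from the side of RX, which mirrors the point that a SIC involving several objects need not be specified several times.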
Although the existence of RX is also a sufficient condition for the existence of RY, it need not be incorporated again as a general constraint if we properly represent the above SIC (i.e., by decomposing it into two sub-SICs in terms of the Representation model). It would be desirable to be able to guarantee that all conditions on all objects have been completely covered although we need not explicitly consider all kinds of conditions for each object. The systematic modelling procedure proposed in the remainder of this chapter and the decomposition algorithms proposed in Chapter 7 are used to achieve this goal. By applying the procedure described in Sections 5.3 to 5.6 a database designer can model SICs systematically by first considering the necessary and sufficient conditions for the existence, nonexistence, and change of each attribute of each entity and each whole entity in isolation, and then examining the whole E-R diagram by considering each explicit or implicit relationship.

Representation Languages. Although SICs incorporated by the E-R-SIC model can directly be represented in terms of the SIC Representation model, it is suggested that they are first represented in a simplified format using rules or expressions. The purpose of using this simplified format language is to represent the precondition and predicate components first in a language that is closer to natural language. The relevant objects and operation types can be derived from the simplified format by applying the algorithms introduced in Chapter 7. The certainty factor and violation action specifications are closely related to the objects and operation types, and can be added later.

The simplified format contains the preconditions, predicates, and some operation type information (by the keywords is_to_be_deleted and is_to_be_updated for operation-dependent SICs), but not the certainty factor and violation action.
For example, we may represent a requirement on the salary of any programmer first in the simplified format as:

if Employee.Title = “programmer”
then Employee.Salary 18000.

BNF descriptions of the simplified format are given in Appendix C. For convenience, this dissertation describes a SIC as being in rule format if it can be written by using the keywords if and then (and the “if” condition part is not empty) in the simplified format. Since a relationship type name usually can be a verb phrase, when a relationship type is discussed, its participating entity types will be mentioned too. Because a relationship type involves two entity types, we need a with_respect_to keyword to clarify to which entity type we are referring[47]. An assertion may have a temporal modifier (before or previously) in a rule describing a temporal sequence requirement between the existence of the object in this assertion and the existence of the object in another assertion. If a “before” modifier is attached to an assertion, the assertion must be true for its involved object at the time another assertion is becoming true[48]. If a “previously” modifier is attached to an assertion, at the time the assertion becomes false for its involved object, another assertion must become true[49].

[47] For example, “if with_respect_to E, E RX F then with_respect_to E, ¬(E RY F)” is a SIC to assert that RX and RY are mutually exclusive relative to E only.

[48] For example, we may have: “if Dealer Own Car then Manufacturer Possess Car before”.

SICs written in the simplified format should be later reformulated in terms of the SIC Representation model by applying the algorithms introduced in Chapter 7. Most designer-specified SICs are general SICs. These general SICs must then be decomposed into several operation-dependent sub-SICs in terms of the Representation model.
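To make the rule format concrete, the programmer salary rule can be sketched as a precondition paired with a predicate. This is only an illustration in Python, not the thesis's Representation model: the Rule class and its field names are hypothetical, and because the comparison operator in the rule is missing from the available text, the sketch assumes a lower bound (≥) and isolates that assumption in a single constant.

```python
import operator
from dataclasses import dataclass
from typing import Callable

# Assumption: the comparison in the salary rule is ">=" (a salary floor).
ASSUMED_COMPARISON = operator.ge


@dataclass
class Rule:
    """A rule-format SIC: if precondition(occurrence) then predicate(occurrence)."""
    precondition: Callable[[dict], bool]
    predicate: Callable[[dict], bool]

    def holds(self, occurrence: dict) -> bool:
        # The rule is satisfied vacuously when the precondition is false.
        return (not self.precondition(occurrence)) or self.predicate(occurrence)


programmer_salary = Rule(
    precondition=lambda e: e["Title"] == "programmer",
    predicate=lambda e: ASSUMED_COMPARISON(e["Salary"], 18000),
)

print(programmer_salary.holds({"Title": "programmer", "Salary": 20000}))
print(programmer_salary.holds({"Title": "programmer", "Salary": 15000}))
print(programmer_salary.holds({"Title": "clerk", "Salary": 15000}))
```

A static check like this says nothing about when it must be evaluated; that is what the decomposition into operation-dependent sub-SICs supplies.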
For example, the programmer salary constraint shown above in the simplified format will be represented as three operation-dependent SICs: one each on update of Salary, update of Title, and insertion of Employee. At this time, the certainty factor and violation action specifications are added.

[49] For example, we may have: “if Manufacturer Possess Car previously then Dealer Own Car”.

SIC Types. The E-R-SIC model can be used to classify SICs into a number of domain-independent SIC types. This SIC type classification facilitates development of SIC consistency and nonredundancy rules that can be applied to make SIC elicitation more efficient. It also allows us to present generic SICs for some common SIC types so that the number of SICs that must be specified using the Representation model can be greatly reduced. Appendix D provides a detailed description of this classification.

5.3 Entity Attribute SICs

First examine necessary and sufficient conditions for the existence/nonexistence/change of each attribute of each entity in isolation.

Necessary Conditions. An attribute cannot become nonexistent once it exists. Therefore, we only need to consider necessary conditions for its existence and change.

Associated Entity Type SICs. If an attribute exists, its value may be restricted to be in a data type and in some range, and/or may be specified as not-null, which means the “unknown” value is not allowed. In addition, its value must be expressed in some specified format that is meaningful to the organization. If an attribute is allowed to be updated, it must be specified as changeable. All attributes (including primary keys) of relationships or entities are updatable unless declared otherwise.
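The attribute-level restrictions just listed (data type, range, not-null, changeability) can be pictured as a small declarative record per attribute. The Python sketch below is illustrative only: the AttributeSIC class, its field names, and the Salary and BirthDate attributes are hypothetical, and the format restriction is left out.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class AttributeSIC:
    """Hypothetical bundle of the necessary conditions on one attribute."""
    name: str
    data_type: type
    minimum: Optional[float] = None
    maximum: Optional[float] = None
    not_null: bool = False
    changeable: bool = True  # updatable unless declared otherwise

    def check_value(self, value: Any) -> List[str]:
        """Return the violations for a value on insertion (empty if none)."""
        if value is None:
            return [f"{self.name}: null not allowed"] if self.not_null else []
        if not isinstance(value, self.data_type):
            return [f"{self.name}: wrong data type"]
        errors = []
        if self.minimum is not None and value < self.minimum:
            errors.append(f"{self.name}: below minimum")
        if self.maximum is not None and value > self.maximum:
            errors.append(f"{self.name}: above maximum")
        return errors

    def check_update(self, old: Any, new: Any) -> List[str]:
        """Return the violations for replacing old with new."""
        if not self.changeable and old != new:
            return [f"{self.name}: not changeable"]
        return self.check_value(new)


salary = AttributeSIC("Salary", int, minimum=0, not_null=True)
birth_date = AttributeSIC("BirthDate", str, changeable=False)

print(salary.check_value(None))
print(salary.check_value(25000))
print(birth_date.check_update("1960-05-01", "1961-05-01"))
```

Defaulting `changeable` to true follows the statement that attributes are updatable unless declared otherwise.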
If an attribute can be updated, there may be special SICs restricting the pairs of before and after values.

Associated Entity Set SICs. Because an attribute type is defined as a function from an entity type to a value, it has only one occurrence per entity occurrence, i.e., it is single-valued, not multi-valued. We need not consider the attribute set. However, because an attribute belongs to an entity, there may be some SICs restricting entity set properties. They may require that each attribute value be unique. The restrictions on minimum, maximum, summing, or average value of the attribute in an entity set may also be specified.

Primary Key Problem. In the E-R-SIC model, following the traditional E-R model and relational model, there is no internal identifier (“surrogate”) to represent an entity or relationship. Rather, some attribute or combination of attributes is used as the primary key. Unfortunately, this overloading causes the semantics implied by an update of a primary key to be ambiguous: it may imply a simple update of an attribute or it may imply the deletion of an “old” entity (relationship or tuple) followed by an insertion of a “new” one. The update of a primary key is allowed in the E-R-SIC model. However, the SICs related to deletion and insertion must be enforced.

Time Restriction. There may be some SICs restricting a time-valued attribute. The expression of such SICs may involve an explicit Current_time variable[50].

Sufficient Conditions. An attribute type is included by a database designer to satisfy organizational needs. Because the “not applicable null” is not allowed, for each entity occurrence, there is at least one attribute occurrence.
Therefore, no SICs can be expressed as sufficient conditions for the existence or nonexistence of an attribute considered in isolation.

If we consider a single attribute in isolation, neither is there any sufficient condition for attribute change except time-triggering. That is, an increment of the Current_time might trigger an update to an attribute. For example, at 0:00 on 1/1/1993, increase the salary of each employee by $1,000.

[50] For example, Current_time ≥ (Employee.FirstWorkDate + “2 years”) is an expression to assert a rule that “each employee must have at least 2 years working experience”.

5.4 Entity SICs

Examine necessary and sufficient conditions for the existence/nonexistence/change of each whole entity type in isolation.

Necessary Conditions

Entity Type SICs. If an entity occurrence exists, there may be a formula[51] among its attributes. In general, any entity occurrence can be deleted to reflect the real world situation. However, there may be SICs that require some of its attributes to have certain values before the deletion may take place.

[51] If there is more than one attribute appearing in an expression, we call that expression a formula.

Entity Set SICs. Because an entity occurrence belongs to an entity set, there may be SICs restricting set properties. They may specify the maximum number of occurrences of an entity type that are allowed to exist in the database.
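The entity-level necessary conditions described so far (a formula among attributes, a condition that must hold before deletion, and a maximum number of occurrences) can be sketched as guards on insertion and deletion. The Python below is a minimal illustration under stated assumptions: the EntitySet class is hypothetical, and the Project attributes Budget, Spent and Status are invented for the example rather than taken from the thesis.

```python
class EntitySet:
    """Hypothetical in-memory entity set guarded by three necessary conditions."""

    def __init__(self, max_occurrences, formula, deletable_if):
        self.max_occurrences = max_occurrences  # entity set SIC
        self.formula = formula                  # formula among attributes
        self.deletable_if = deletable_if        # condition required before deletion
        self.occurrences = {}

    def insert(self, key, occurrence):
        if len(self.occurrences) >= self.max_occurrences:
            raise ValueError("maximum number of occurrences reached")
        if not self.formula(occurrence):
            raise ValueError("formula among attributes violated")
        self.occurrences[key] = occurrence

    def delete(self, key):
        if not self.deletable_if(self.occurrences[key]):
            raise ValueError("attribute values do not permit deletion")
        del self.occurrences[key]


# Invented example: spending may not exceed the budget, and only closed
# projects may be deleted.
projects = EntitySet(
    max_occurrences=2,
    formula=lambda p: p["Spent"] <= p["Budget"],
    deletable_if=lambda p: p["Status"] == "closed",
)
projects.insert("p100", {"Budget": 500, "Spent": 100, "Status": "open"})

try:
    projects.delete("p100")  # rejected: the project is still open
except ValueError as err:
    print(err)
```

No minimum count is enforced, so an empty entity set is always acceptable in this sketch.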
The concatenated values of some attributes may be required to be unique. Some aggregate values of attributes may be required to be interdependent[52], or more strictly, satisfy a formula. There is no restriction on the minimum number of occurrences that can exist in the database because in the beginning there are no occurrences, and we should allow that an entity type may have no occurrences temporarily even after the database is already populated.

[52] For example, if min(E.A1) > v1 then avg(E.A2) > v2.

There are some further complicating factors such as the following:

The First Complicating Factor: Implicit Subtype. The single attribute SICs in Section 5.3 and the SICs mentioned above in this section may be conditional. That is, they may include conditions that can be thought of as defining “implicit subtypes”. If we consider a single entity type in isolation, the only way to define “implicit subtypes” is by some range restrictions or formulas involving its attribute values. It is possible that a SIC (e.g., nonvolatility, value restrictions, etc.) applies to a single attribute only for a specified “implicit entity subtype”. It is also possible that a SIC (e.g., restriction on the maximum number of occurrences, etc.) applies to an occurrence only because this occurrence belongs to a specified “implicit entity subtype”.

The Second Complicating Factor: Time Restriction. There may be SICs that restrict other attributes when time-valued attributes satisfy certain conditions. As time passes, the restriction will either finally disappear or come into force[53]. These explicit time restrictions may be added to all SICs mentioned in this section and in Section 5.3. It is possible to combine time restrictions and “implicit subtypes” factors in a single SIC.

The Third Complicating Factor: Temporal Sequence among Attributes. There may be some temporal conditions among the attributes of an entity.
They may require that an entity occurrence’s attributes must satisfy a certain condition at the time one of its other attributes is going to acquire some value[54]. It is also possible to require that if its other attribute(s) satisfied a certain condition in the past (and no longer satisfies it now), one of its attributes must take some value.

Sufficient Conditions. An entity type exists in the database only because the database designer includes it in order to satisfy the interests of the organization. It would be very unusual to have a SIC requiring a specific entity occurrence to exist or not to exist in a database. Therefore, we exclude the possibility of expressing a SIC as a sufficient condition for the existence or nonexistence of an entity occurrence considered in isolation.

If we consider a single entity type in isolation, neither is there any sufficient condition for entity change except time-triggering. That is, for an entity occurrence, an increment of the Current_time might trigger an update to one of its attributes if its other attributes satisfy some condition(s).

[53] For example, we may have a SIC like “an employee cannot receive a salary raise during his (her) first 6 months in the company”, or “if an employee has worked for at least two years, he (she) must have at least 10 vacation days”.

[54] For example, we may have a SIC: “if E.A1=v1 then E.A2=v2 before”.

5.5 Relationship SICs

Now traverse an E-R diagram and consider the necessary and sufficient conditions for the existence/nonexistence/change of each relationship. In terms of topology, the important local contexts describing how relationships and entities are connected together in an E-R diagram are[55]: line, star, loop-2, and loop-n. Each relationship type is in one or more of these contexts.
When incorporating SICs for a relationship, the database designer needs to recognize its contexts.

[55] If we allowed explicit recursive relationships, they would be in the loop-1 context.

Definition 5.5 If an entity type participates in a group of relationship types, it is a sharing entity type to them.

Definition 5.6 If in a part of an E-R diagram, each entity type participates in at most two relationship types and there is no cycle (loop) among these entity types, e.g., Figure 5.2, each of these relationships is in a line context.

Definition 5.7 If there is a sharing entity type participating in three or more relationship types and there is no loop among these entity types, e.g., Figure 5.3, each of these relationships is in a star context.

Definition 5.8 If there is a loop between two entity types, i.e., two or more relationships exist between two common entity types, e.g., Figure 5.4, each of these relationships is in a loop-2 context.

Figure 5.2: A Line Layout Context

Figure 5.3: A Star Layout Context

Figure 5.4: A Loop-2 Layout Context

Figure 5.5: A Loop-n Layout Context

Definition 5.9 If there is a loop among n ≥ 3 entity types, e.g., Figure 5.5, each of these relationships forming the loop is in a loop-n context.

5.5.1 Necessary Conditions

It is necessary to examine the necessary conditions for one or multiple occurrences of a single relationship type to exist (not exist/change), and for occurrences of a group of relationship types to co-exist.

Single Relationship Type SICs

First we focus on a single relationship type.

Focus on Attributes. A relationship is similar to an entity because it can have its own attributes in addition to the primary keys from the participating entities. Therefore, a relationship can have single attribute SICs similar to those discussed in Section 5.3 and single relationship SICs[56] such as those of Section 5.4. Since a relationship’s attributes are usually few, such SICs are not common. However, it is possible that there may be some more complicated SICs[57] because the definition of an “implicit relationship subtype” could be based on some conditions on one or both participating entity occurrences (e.g., having some attribute values, or participating in some other relationship types), or some formula involving the attributes of the relationship and its participating entities.

Focus on Association. In another sense, a relationship is much different from an entity because a relationship occurrence represents the association between two entity occurrences.

Inherent SIC. First, an inherent SIC is that the relationship occurrence’s participating entity occurrences must exist.

Restricted-connecting Set. There may be some restrictions on constructing a relationship, i.e., the necessary conditions for the existence of a relationship occurrence.

Definition 5.10 Suppose that the relationship type R connects the entity types E with F.
The restricted-connecting set of an entity occurrence of E in the relationship type R is those occurrences of F that the entity occurrence of E is allowed to be related to via R.

⁵⁶ In contrast to an entity type, absolute maximum cardinality for a relationship type need not be considered although some researchers mention it. As shown by [Lenzerini and Santucci, 1983], the absolute maximum cardinality of a relationship is bounded by the absolute maximum cardinalities and relative maximum cardinalities of participating entity types.
⁵⁷ For example, there might be a SIC stating that "if Employee Work_for Project, Project_Id=p100 then Work_for.Hours 50".

Normally, a DBMS would not know the restricted-connecting set of each entity occurrence in each relationship since a SIC is seldom specified extensionally. However, there may be some general business rules that restrict intensionally the restricted-connecting sets. The restrictions may be positive or negative. The restrictions on the freedom to construct a relationship include:

1. One-side condition. The restriction may be on one entity type. It would imply that some occurrences of each participating entity type are not allowed to participate in the relationship type. That is, their restricted-connecting sets are undefined. The relationship type R is in fact defined on some "implicit entity subtype(s)" of E or F. The "implicit entity subtypes" may be defined positively, i.e., as occurrences having some specific attribute values, having attributes satisfying some formula, participating in some other relationship types; or negatively, i.e., occurrences not participating in some other relationship types. We need to consider possible disjunctions of conditions in a star context because there are at least two other relationship types⁵⁸.

2. Two-side condition. The restrictions may be coupling conditions on both entity types at the same time.
If these are positive restrictions, they are stronger than the above one-side restriction. If these are negative restrictions, they are weaker than the above one-side restriction. Although all occurrences of each participating entity type, E or F, may be allowed to participate in the relationship type, they are not freely connected together. An example of a positive restriction is that each occurrence of an "implicit subtype" of E can only connect with those occurrences of some "implicit subtype" of F. An example of a negative restriction is that each occurrence of an "implicit subtype" of E cannot connect with those occurrences of some "implicit subtype" of F. The "implicit entity subtypes" on both sides of a relationship may be defined interdependently as occurrences having some specific attribute values⁵⁹, having attributes satisfying some formula, participating in some other relationship types (e.g., occurrences of E participating in RX and their connected occurrences of F participating in RY at the same time)⁶⁰. Some SICs described in the literature are just special cases. For example, a Subset_Relationship SIC ([Palmer, 1978]) in a loop-2 context⁶¹ and the necessary conditions for the composition of relationships ([Lenzerini and Santucci, 1983], [Azar and Pitchat, 1987]) in a loop-n context⁶² are special cases requiring that the specific occurrences of F connecting with an occurrence of E via other relationship types (e.g., RY, RZ) be in the restricted-connecting sets of the occurrence of E. An Exclusive_Occurrence SIC ([McFadden and Hoffer, 1988]) in a loop-2 context⁶³ is also a special case requiring that the specific occurrences of F connecting with an occurrence of E via the other relationship type (e.g., RY) not be in the restricted-connecting set of the occurrence of E.

⁵⁸ For example, in Figure 5.3, we may have: "if E RX F then (E RY G) ∨ (E RZ H)".

3. Intra-relationship condition.
It is possible that we can define a restricted-connecting set based on some properties of the relationship type. For example, we may define a restricted-connecting set of an occurrence of E by requiring that if some occurrences (e.g., having some attribute values) of F are in the restricted-connecting set, some other occurrences (e.g., having some other attribute values) must or must not be in the set. A relationship type may also be symmetric or transitive.

⁵⁹ For example, in a relationship Drive connecting Driver with Vehicle, even if it is total on both sides, some drivers (e.g., with class 5 driver-licence) can only drive some vehicles (i.e., with gross weight 10,900 kgs).
⁶⁰ For example, there may be a SIC that states: if Employee Is_Allocated Car then if Employee Work_for Project then Car Is_Insured Collision_Insurance.
⁶¹ For example, in Figure 5.4, we may have: "if E RX F then E RY F".
⁶² For example, in Figure 5.5, we may have: "if E RX F then (E RY H) ∧ (H RZ F)".
⁶³ For example, in Figure 5.4, we may have: "if E R F then ¬(E RX F)".

The fundamental properties of any relationship type are assumed by default to be irreflexivity, asymmetry, and intransitivity. However, some relationships may be reflexive, symmetric, or transitive. The reflexivity of a relationship will not be discussed in this research because it is quite unusual in a traditional database. Symmetric⁶⁴ or transitive⁶⁵ relationships may occur in a specialization hierarchy, that is, when the two involved entity types belong to the same super-type; or one is a super-type (e.g., Employee), and the other is its subtype (e.g., Manager).

There are some further complicating factors such as the following:

1. The first complicating factor: temporal sequence conditions.
A SIC may require that for a relationship occurrence of a specified type to exist, one of its participating entity occurrences must have attributes with certain values, or participate (or not participate) in other relationship type(s) at the time it is going to participate in this relationship. Those occurrences of other relationship types are allowed to be deleted after insertion of this relationship.

2. The second complicating factor: quantitative requirements. For a relationship occurrence to exist, there may be a quantitative restriction on the maximum number of relationship occurrences of the same relationship type in which one involved entity occurrence has participated, that is, the relative maximum cardinality to the specified entity type. If a necessary condition for the existence of a relationship involves other relationship type(s), there may be some special quantitative requirements that are cardinalities of those other relationship types⁶⁶.

3. The third complicating factor: multiple occurrences. The above only considers necessary conditions for the existence of one relationship occurrence. However, there may be some necessary conditions for the existence of multiple relationship occurrences of the same type relative to a sharing entity occurrence⁶⁷.

⁶⁴ Some examples of such relationship types are: Sibling_of, Married_to, Partner_of.
⁶⁵ Some examples of such relationship types are: Sibling_of, Ancestor_of, Supervise, Partner_of.
⁶⁶ For example, there may be a SIC such as, "if ∃ RX, E RX F then ∃ exactly 3 RY, E RY G".

Group Relationship Type SICs

The above focuses on the necessary conditions for the occurrences of a single relationship type although sometimes several relationship types may be involved in the conditions. In a star context, there are several relationship types with a sharing entity type.
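The star-context conditions can be made concrete with a small sketch. The Python below is only an illustration, not part of the E-R-SIC model: the class `StarSchema`, its method names, and the cardinality limit of 2 are hypothetical. It checks three necessary conditions before an occurrence of RX is inserted in a star such as Figure 5.3: the inherent SIC, a one-side condition with a disjunction of the form "if E RX F then (E RY G) ∨ (E RZ H)", and a relative maximum cardinality of RX with respect to E.

```python
from dataclasses import dataclass, field


@dataclass
class StarSchema:
    """Toy in-memory database for a star context (all names are illustrative)."""
    # Entity occurrences are stored as sets of keys, one set per entity type.
    entities: dict = field(
        default_factory=lambda: {"E": set(), "F": set(), "G": set(), "H": set()}
    )
    # Each relationship type is a set of (e_key, other_key) pairs sharing E.
    relationships: dict = field(
        default_factory=lambda: {"RX": set(), "RY": set(), "RZ": set()}
    )

    def participates(self, rel, e_key):
        """True if the E occurrence takes part in at least one occurrence of rel."""
        return any(e == e_key for e, _ in self.relationships[rel])

    def can_insert_rx(self, e_key, f_key, max_rx_per_e=2):
        """Return (ok, reason) for inserting the occurrence (e_key RX f_key)."""
        # Inherent SIC: both participating entity occurrences must exist.
        if e_key not in self.entities["E"] or f_key not in self.entities["F"]:
            return False, "participating entity occurrence missing"
        # One-side condition with a disjunction over the other relationship types.
        if not (self.participates("RY", e_key) or self.participates("RZ", e_key)):
            return False, "E participates in neither RY nor RZ"
        # Quantitative requirement: relative maximum cardinality of RX for E.
        current = sum(1 for e, _ in self.relationships["RX"] if e == e_key)
        if current >= max_rx_per_e:
            return False, "relative maximum cardinality exceeded"
        return True, "ok"
```

A caller would run `can_insert_rx` before adding a pair to `relationships["RX"]` and reject the insertion when the first element of the result is false. Temporal sequence conditions are not modelled here; they would need the state at the time of insertion rather than the current state alone.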
If several occurrences of different relationship types co-exist, there may be some necessary conditions to require that the sharing entity must participate in some other relationships, have some attribute values, or there may be a formula among some attributes of the sharing entity and those relationships.

Relationships Involved in Defining "Implicit Entity Subtypes"

It is necessary to review those single attribute SICs in Section 5.3 and Single Entity SICs in Section 5.4 in the presence of relationships. For example, it is possible that a SIC (e.g., nonvolatility, value restrictions, etc.) applies to a single attribute only because its associated entity participates in some relationship(s). It is also possible that a SIC (e.g., restriction on the maximum number of occurrences, etc.) applies to an entity occurrence only because the occurrence participates in some relationship(s).

⁶⁷ For example, there may be a SIC such as, "if ∃ exactly 3 RX, E RX F then ∃ RY, E RY G".

5.5.2 Sufficient Conditions

As stated in Section 5.2.3, if a SIC requires the existence of an occurrence of one relationship type to depend on the existence of an occurrence of the other relationship type, it would be incorporated when identifying necessary conditions for the existence of the occurrences of the first relationship type. At this step, we need not specially incorporate it as a sufficient condition for the existence of the occurrences of the other relationship type. However, other SICs may specify sufficient conditions for a relationship to exist involving the existence of an entity occurrence, time-triggering, or temporal sequence requirements. They are as follows.

The Existence of an Entity. There may be SICs to require that if an entity occurrence exists, it must participate in a specified relationship type.
In this case, the existence of a relationship occurrence might be thought of as a necessary condition for the existence of an entity occurrence. However, since "entity" is a more fundamental construct, it is more natural to think that the existence of an entity occurrence is the sufficient condition for the existence of a relationship occurrence. There are some further complicating factors such as the following:

1. The first complicating factor: implicit subtype. It is possible that a relationship type must exist only for an "implicit entity subtype". In this case, the definition of an implicit entity subtype may be based on the values of its attributes⁶⁸.

⁶⁸ In this case, the only focus is the existence of an entity. We need not consider SICs related to an "implicit entity subtype" that is defined as occurrences participating in other relationship types. Those SICs have been covered when we consider necessary conditions for the existence of other relationship types.

2. The second complicating factor: quantitative requirements. We may require not only the existence of any occurrence of the specified relationship type, but also specify the number of such occurrences. There are two kinds of relative cardinalities⁶⁹: the requirement of the minimum number of relationship occurrences in which an occurrence of the specified entity type must participate; or the requirement that for an occurrence of the specified entity type, all occurrences of the other entity type are required to connect with it via this relationship type. Combining this factor with the first factor, we would have cardinalities for an "implicit entity subtype"⁷⁰.

3. The third complicating factor: further restrictions. It is possible that there may be further restrictions on the existence of a relationship in addition to the cardinalities.
A SIC may require that if an occurrence of one entity type exists, it must participate in some minimum number of occurrences of a specified relationship type, and there are further restrictions placed on the relationship attribute value(s), or on the occurrence(s) of the other participating entity type. SICs related to a weak entity type⁷¹ and a Critical_Relationship_Occurrence SIC⁷² are just special cases⁷³.

⁶⁹ Note that as described in the previous section, a relative maximum cardinality is a necessary condition for the existence of a relationship occurrence. The exact number of the cardinalities is the conjunction of the requirements of the minimum and maximum cardinalities.
⁷⁰ Again note that some subtype cardinalities, which are related to the entity participating in other relationship types, have been considered in the previous section.
⁷¹ By the semantics of the weak entity, the relationship type R, via which the weak entity type is dependent upon a regular entity type, is total to the weak entity type and its key is fixed. The fixed retention restricts the key of relationship from updating. Note that the relative cardinalities of the weak entity in the relationship type R are not necessarily (1,1), it may be (c,d) where d is a number greater than c, and c greater than 1. For example, a child as a dependent may have both father and mother as employees in a company.
⁷² A Critical_Relationship_Occurrence SIC requires the totality constraint to the specified entity type and the existence of exactly one critical relationship occurrence.
⁷³ There are many other examples such as Stronger Totality Constraints (refer to Appendix D).

4. The fourth complicating factor: a group of relationship types. We may not require the existence of an occurrence of a specific relationship type for any occurrence of the specified entity type.
Instead, we may require the existence of one relationship occurrence among a group of relationship types. Combining this factor with the above factors, we might have a more complicated SIC.

Time-Triggering. A single relationship, like an entity in isolation, may have time-triggered SICs to trigger its update.

Temporal Sequence Conditions. It is possible to require that if an entity occurrence had some attribute value(s) in the past and no longer has it now, it must participate in some relationship(s). In the case of a group of relationship types, if some relationship(s) existed in the past, once it is deleted, other relationship(s) may be required to exist now.

5.6 SICs Implied by Implicit Relationships and Data Abstractions

In general, the E-R-SIC model requires us to express explicitly all relationship types in an E-R diagram. Even in cases involving an ID dependency⁷⁴, is_a or component_of relationships where the candidate key of one entity provides information about a related entity, the relationship type is required to be specified to make the semantics clearer. There are some exceptions (e.g., the cases that two entity types are exclusive; an entity type may be formed by set operations on other entity types; or member_of relationships except for in the enumerated set association) in which there is a SIC between some entity types, but we need not express the relationship type explicitly because no suitable relationship type can be specified or the relationship is derivable. This section considers the SICs implied by these "implicit" relationship types.

⁷⁴ An entity type (e.g., E) has "ID dependency" on other entity types if this entity type cannot be uniquely identified by its own attributes and has to be identified by its relationship types to the other entity types ([Chen, 1985]). The entity type E needs the key of some other entity type (e.g., F) as a part of its key.
In addition, this section discusses the SICs implied by data abstractions since they are special relationships.

Specialization Abstraction. The specification of a specialization abstraction, "S is_a G", implies that the relative cardinality constraints of is_a must be S (1,1) and G (0,1) ([Goldstein and Storey, 1990]). Because the primary key of G (say G.Gkey) is a candidate key of S, there is a necessary condition, in addition to the existence of participating entity occurrences, for a relationship is_a to exist, i.e., S.Gkey = G.Gkey.

Entity Types in a Specialization Hierarchy. In a specialization hierarchy, two entity types may be exclusive, or an entity type may be formed by set operations (intersection, union, or difference) on other entity types ([Biller and Neuhold, 1978]). It is not natural to define a relationship to express the exclusion between two entity types. Neither is there an explicit relationship to express the set operations although is_a relationships among those entity types should be specified.

• SICs implied by exclusion between entity types. Suppose that two entity types E, F are exclusive and Ekey is their common candidate key. It implies a SIC written in the simplified format as: if E.Ekey = Value then ¬(F.Ekey = Value).

• SICs implied by set operations on entity types. Suppose E is formed by the operation on two entity types F, G and Ekey is the common candidate key among E, F and G.

⁷⁵ It is not meaningful to discuss exclusion or set operations if those entity types are not in the same hierarchy. In addition, if those entity types formed by set operations do not have special attributes or participate in special relationships, they may be redundant.

- E = F ∩ G. There are some SICs implied by the is_a relationships between the subtype (E) and the super-types F, G, respectively.
In addition, it implies "if (F.Ekey = Value) ∧ (G.Ekey = Value) then E.Ekey = Value".

- E = F ∪ G. There are some SICs implied by the is_a relationships between the subtypes F, G and the super-type E, respectively. In addition, it implies "if E.Ekey = Value then (F.Ekey = Value) ∨ (G.Ekey = Value)".

- E = F − G. There are some SICs implied by the is_a relationships between the subtype (E) and the super-type (F), and the exclusion between E and G. In addition, it implies "if F.Ekey = Value then (E.Ekey = Value) ∨ (G.Ekey = Value)".

Inheritance Conflict Problem. An inheritance conflict may occur when two or more super entity types of a single specialization type have some attribute(s) with the same name(s). In such a case, there should be a further higher level super-type of which these super-types are subtypes. However, the organization may not be interested in this higher level super-type. If these attributes are really semantically different, we just prefix their names with the super-type names and no SIC needs to be specified. However, if their semantics are the same, the subtype should inherit only one of these attributes and there must be an integrity constraint to maintain the equality of those attributes' values. In addition, the SICs associated with the super-types must not be inconsistent with each other; otherwise, the multiple inheritance will cause the subtype to be empty.

Aggregation Abstraction. The specification of a composite entity aggregation, C component_of A, implies that the cardinalities of component_of should be either C (1,1) component_of A (0,n) or C (1,1) component_of A (n,m), where n ≥ 1 and m ≥ n⁷⁶. Because a candidate key of C is formed by the primary key (say A.Akey) of A plus probably some other attributes, there is a necessary condition, in addition to the existence of participating entity occurrences, for a relationship component_of to exist, i.e., C.Akey = A.Akey.

Association Abstraction. Association (grouping) is a powerful abstraction, but its implied SICs are complicated. In all cases except for the enumerated set association, the member_of is implicit since it can be derived. Their aggregate SICs should be specially dealt with⁷⁷. Suppose that agg_fcn stands for an aggregate function, Derived_Att is an attribute of the set, and Att is its corresponding attribute of the members.

1. Natural Set Association: Suppose that in the M member_of Ms, the member_of is implicit. We would have "Ms.Derived_Att = agg_fcn(M.Att)". In addition, an occurrence of Ms cannot be deleted if it, as a set, is not empty.

2. Indexing Derived Set Association:

• Only involving one indexing type: Suppose that in Figure 5.1, for M member_of DS, the indexing entity type is I with the key I.Ikey, the indexing relationship is R, member_of is implicit. We would have: "DS.Derived_Att = agg_fcn({M.Att | M R I, DS.Ikey = I.Ikey})".

• Only involving indexing attributes: Suppose that in M member_of DS, M.Index is the indexing attribute, member_of is implicit. We would have: "DS.Derived_Att = agg_fcn({M.Att | DS.Index = M.Index})".

In both cases, an occurrence of DS cannot be deleted if it, as a set, is not empty. In addition, DS.Ikey (or DS.Index) cannot be updated.

3. Enumerated Set Association: In this case, M member_of ES, the member_of is explicit. That is, we have: "ES.Derived_Att = agg_fcn({M.Att | M M_member_of_ES ES})".

⁷⁶ If the component C (e.g., Automatic_window_controller) is just relevant to the aggregate A (e.g., Car), the aggregate A may have 0 or up to n components C. If the component C (e.g., Engine) is characteristic or identifying to the aggregate A (e.g., Car), component_of is total to the aggregate A. (Refer to [dos Santos, et al., 1980] for the formal definitions of "relevant", "characteristic", and "identifying".) In either case, the component_of relationship should always be total to the component C. In some applications, some "components" can exist independently, e.g., an engine may be sold as a separate part. It would be better to define a subtype of the original "component" type to include those components that cannot exist independently so that the semantics become clearer. For example, we may have a type Engine and its subtype Component_engine that belongs to cars. Those components are different from other occurrences in terms of behavioral constraints and structural attributes.
⁷⁷ The cardinalities of member_of in the case of natural set association and indexed set association can be derived from the relative cardinalities of other relationship and absolute maximum cardinalities of entities. So, they need not be considered.

5.7 Summary of the E-R-SIC Model

Figures 5.6 to 5.10 give a summary of the E-R-SIC model.
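Before the summary figures, the aggregate SICs of the association abstraction in Section 5.6 can be sketched in code. This is a minimal illustration under stated assumptions: occurrences are plain dictionaries, and the function names `member_values`, `derived_att_holds`, and `can_delete_set` are hypothetical. It covers the indexing-attribute case, DS.Derived_Att = agg_fcn({M.Att | DS.Index = M.Index}), together with the rule that a set occurrence cannot be deleted while it is not empty.

```python
def member_values(ds, members):
    """Collect M.Att for every member whose Index matches the set's Index."""
    return [m["Att"] for m in members if m["Index"] == ds["Index"]]


def derived_att_holds(ds, members, agg_fcn=sum):
    """Aggregate SIC: DS.Derived_Att must equal agg_fcn over the members' Att values."""
    return ds["Derived_Att"] == agg_fcn(member_values(ds, members))


def can_delete_set(ds, members):
    """Deletion SIC: a set occurrence may be deleted only when it has no members."""
    return not member_values(ds, members)


# Example data (invented for illustration).
members = [
    {"Index": "north", "Att": 2},
    {"Index": "north", "Att": 3},
    {"Index": "south", "Att": 7},
]
north = {"Index": "north", "Derived_Att": 5}

print(derived_att_holds(north, members))  # True: 2 + 3 equals the stored value 5
print(can_delete_set(north, members))     # False: the set still has two members
```

The aggregate function is passed in as a parameter because the text leaves agg_fcn open; note that functions such as `max` raise an error on an empty member list, so an empty set would need separate handling in that case.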
E-R Diagram
DB Designer think: Attributes? Entities? Relationships? SICs?

Construct Definition:
Definition 5.1: An entity is the database representation of a real-world object that can be distinctly identified.
Definition 5.2: A relationship is the database representation of an association among real-world objects.
Definition 5.3: An attribute is the database representation of a property of a real-world object or an association that is a function mapping an entity type or a relationship type to values.
Definition 5.4: A SIC is a logical, invariant restriction on the static state of a database (that is, a collection of attributes, entities, and relationships), or on the database state transition caused by an insertion, deletion, or update operation.

Observation:
• Entity, relationship, and SIC occurrences are classified into different entity, relationship, and SIC types according to some criteria.
• At a particular moment, certain groups of entity, relationship, or SIC occurrences can be considered as sets in the mathematical sense, and these sets may have some aggregate properties.
• Entity is the most fundamental construct among attribute, entity, and relationship.

Simplifying Assumptions:
No ternary relationships or relationships of higher degree.
No recursive relationships.
Data Abstractions: inclusion (classification, generalization), aggregation, and association.

Premise 5.1: A physical-symbol system has the necessary and sufficient properties to represent real-world meaning.
Interpretation: A SIC is an assertion (a sufficient or necessary condition) for an occurrence of an attribute, entity, or relationship type to exist, not exist, or change in a database.

Premise 5.2: The occurrences in each entity or relationship type are not homogeneous; that is, they have different attribute values, and are related to different occurrences of other objects.
Implication: A database designer could not specify all entity and relationship types such
that all occurrences of each type satisfied the same set of SICs.
That is, "implicit subtypes" with unique SICs always exist.

What are SICs that should be incorporated?

Figure 5.6: What is in the database?

E-R Diagram
DB Designer think:
Examine each attribute of each entity in isolation.
(That is, imagine a hypothetical situation: examine each attribute, say E.A1, and suppose that it is the only "object" of interest at this moment.)

Consider:
Are there necessary conditions that must hold for E.A1 to exist, not exist, or change in the database?
What are these necessary conditions:
• associated single entity type SICs?
  - existence: value, null, etc.?
  - change: changeable, new/old value pair?
• associated whole entity set SICs?
  - unique?
  - aggregate value?
• because we choose it as the primary key?
• because of time restriction?

Are there sufficient conditions, which if true, imply that E.A1 must exist, not exist, or change in the database?
What are these sufficient conditions: none except for time-triggering?

Figure 5.7: Single Entity Attribute SICs

E-R Diagram
DB Designer think:
Examine each whole entity in isolation.
(That is, imagine a hypothetical situation: examine each whole entity type, say E, and suppose that it is the only "object" of interest at this moment.)

Consider:
Are there necessary conditions that must hold for an E occurrence to exist,
not exist, or change inthe database?What are these necessary conditions:• single entity type SICs?— existence: a formula among attributes?— change: deleted entity attributes?• whole entity SICs?— the restriction on the maximal number of occurrences?— concatenated values of some attributes are unique?— a formula involving aggregate value(s)?aggregate values are interdependent?• because of an implicit subtype?• because oftime-restriction?• because of temporal sequenceamong attributes?Are there sufficient conditions, whichif true, imply thatan E occurrence must exist, not exist, or change inthe database?What are these sufficient conditions: None exceptfor time-triggering?Figure 5.8: Single Entity SICsChapter 5. An Extended E-R Model Incorporating Semantic Integrity Constraints 128E-R DiagramDB Designerthink:Examine each relationship type, say R, and consider its context inthe whole E-R diagram.What are the contexts that each relationship is in the E-R diagram:line, star, loop-2, and loop-n?Consider:What are necessary conditionsthat must holdfor an R occurrence to exist, not exist, or change in the database?What are these necessary conditions:• single relationship SICs?similarly to a single entity, focus only on its attributes:*each of its single attribute has conditions in Figure5.7?*the whole relationship has conditions in Figure 5.8?but could be more complicated since an implicit relationshiptype may be defined based onits participating entities or there may be a formulaamong those attributes ofthe relationship and its participating entity?— focus on its role as an association between entities:*inherent SIC: its participant entity must exist?*restricted-connecting set:one-side, two-side of its participating entities, or intra-relationshipcondition?© complicating factors:1. temporal conditions on relationships or the values of entity attributes?2. quantitative requirements on occurrences of the same ordifferent relationship types?3. 
existence of multiple relationship occurrences of the sametype?• group relationship typeSICs: conditions for occurrences of several relationshiptypes to coexist?• relationships involved in defining “implicit entity subtypes”, soreview Figures 5.7 and 5.8 again?Are there sufficient conditions,which if true, imply thatan R occurrence must exist, not exist, or changein the database?Some are covered as necessary conditions on otherrelationships, what are other sufficient conditions?• the existence of an entity?© complicating factors:1. because of implicit entity subtypes?2. quantitative requirements on the existence of the relationshipoccurrences?3. further restrictions on the existence of the relationship occurrences?4. the existence of one relationship occurrence among a group of relationshiptypes?• Time-triggering?• Temporalsequence conditions?Figure 5.9: Relationship SICsChapter 5. An Extended E-R Model Incorporating Semantic Integrity Constraints 129E-R DiagramDB Designerthink:Is there any SIC implied by some “implicit” relationships?If there is a SIC between any two entity types,in general we should express the involved relationship type explicitlyin the E-R diagram.In what situations would we have some “implicit” relationship types?• these entity types are in the same specialization hierarchy?— two entity types are exclusive?one entity type is the intersection of other entity types?— one entity type is the union of other entity types?one entity type is the difference between two entity types?— multiple inheritance conflict problem?What are SICs implied by data abstractions?• generalization: S (1,1) is_a G (0,1),a necessary condition for is_a S. Gkey=G. 
Gkey• aggregation: C (1,1) component_of A (0,n) or A (n,m), n1, m na necessary condition for component_of— C.Akey=A.Akey•association and grouping: member_of may be implicitly representedexcept forthe enumerated member_of;the involved relationships and entities aretightly related;SICs are complicatedFigure 5.10: SICs Implied by Implicit Relationships and Data AbstractionsChapter 6The Application of the E-R-SIC ModelThis chapter begins with a hypothetical example to showhow the E-R-SIC model canbe applied. Potential pitfalls of using the E-R-SIC model arethen discussed. Finally, wediscuss the usefulness of the E-R-SICmodel.6.1 An Example of Using the E-R-SIC ModelDescription Suppose that we havean example called car_dealer.database_design.TheE-R diagram in shown in Figure 6.1.The database is intended to keep trackof eachcar from the time it is ordered from the factory throughits sale to a customer andsubsequent service history as long as thecar is serviced by that dealership. Informationabout a (potential) customer is also kept.For simplicity, the possibility that a customermight transfer his (her) car to other persons is notconsidered.Attributes The attributes are not shown inthe first level of the E-R diagram.They are as below78.• Entity Types:ThTheproperty inheritance principle is assumed. Therefore, theinherited attributes, e.g., Salesperson.Salary, are not listed.130Chapter 6. The Application of the E-R-SIC Model 131is..aFigure 6.1: An Example: A Car Dealership DatabaseChapter 6. 
The Application of the E-R-SIC Model 132— Person: SIN (key, i.e., social insurance number), Name, Sex, Address, Phone.— Customer: SIN (key).— Employee: EmpJd (key, i.e., employee identification number), Salary,Hire_Date.— Salesperson: Empld (key), BasicSalary, Commissioll.— Mechanic: EmpJd (key).— Car: Engine_No (key), Model, Year, Regular_Price, Cost.— Part: Code (key), Name, Price, Cost, Qty_on_hand, Reorder_Point,Date_Last_Used.— Service: Sequence_No (key), Date, Charge, Part_Fee, Labor_Fee,Discount_Amt.• Relationship Types:— Pre_Sell: no attributes.— Own: no attributes.— Sell: Discount_Rate, Total_price, Sell_Date.— Maintained_by: no attributes.— Installed_in: Qty_Used (i.e., the number of therelated part installed duringa service).— Work_on: Hours (i.e., how much hours a mechanic spendson a service).The example does not include all SIC types. In addition,only some of the relevantSICs are illustrated. The SICs are written in the simplifiedformat according to the BNFin Appendix C.Entity Attribute SICs Each attribute of each entitymay have some SICs requiringit to be not-null, to take some specified value range, data type and format.The followingtwo attributes are used for illustration.Chapter 6. The Application of the E-R-SIC Model133• Person.SIN: there are SICs to assert thata person’s SIN must be known,is of type non-arithmetic (i.e., may not be used in conventional arithmeticoperations) and has a format such as 123-56-789,i.e.,is..null(Person.SIN)=nosatisfy_datatype( Person. SIN, non_arithmetic)satisfy_format(Person.SIN, 999!999!999)• Employee.Salary: there are SICsto assert that an employee’s monthly salarymust be known, is of type arithmetic (i.e., all of the normal arithmeticoperations may be performed on it in the usual way),and is in the range $1,500and $10,000 inclusive, i.e.,is_nuil(Employee. Salary) =nosatisfy_datatype(Employee. Salary, arithmetic)satisfy_value(Employee.Salary, [1500 .. 
10000])

Some attributes are not allowed to be updated. For example, the social insurance number of a person cannot be updated, i.e. [79],
    if Person.SIN is_to_be_updated then false

[79] In this section, it is assumed that there is no automated database design aid. If there were such a system, the system, rather than the database designer, would write the SICs in the simplified format for later reformulation in terms of the SIC Representation model.

Some attributes have SICs to restrict changes in value. For example, a salesperson’s basic salary cannot decrease, and the date on which each part was last used is kept up to date, i.e.,
    new(Salesperson.Basic_Salary) ≥ old(Salesperson.Basic_Salary)
    new(Part.Date_Last_Used) ≥ old(Part.Date_Last_Used)

Some attribute SICs relate to the entire associated entity set. For example, each person’s SIN is unique, and the sum of the monthly salaries of employees should be less than or equal to $1,000,000, i.e.,
    unique(Person.SIN)
    sum(Employee.Salary) ≤ 1000000

Entity SICs  There may be some SICs that require the values of several attributes of an entity to follow some formulas. For example, the salary of a salesperson includes two parts, basic salary and commission; the regular price of a car is set to yield at least 25% gross profit; and for each part, its quantity on hand must always be greater than or equal to its reorder point, i.e.,
    Salesperson.Salary = Salesperson.Basic_Salary + Salesperson.Commission
    Car.Cost ≤ Car.Regular_Price × 0.75
    Part.Qty_on_hand ≥ Part.Reorder_Point

There are some SICs to restrict entity set properties. For example, the dealer can have at most 30 mechanics, i.e.,
    count(Mechanic) ≤ 30

There are some SICs restricting attribute values as a function of time. For example, for those mechanics who have worked for at least two years, their salaries should not decrease and the average should be at least $3,000, i.e.,
    if (Current_time - Mechanic.Hire_Date) ≥ “2 years”
    then new(Mechanic.Salary) ≥ old(Mechanic.Salary)
    if (Current_time - Mechanic.Hire_Date) ≥ “2 years”
    then avg(Mechanic.Salary) ≥ 3000

Relationship SICs  Attributes of relationships may have similar attribute SICs. However, some restrictions on attributes of one relationship are more complicated. For example, a salesperson cannot offer more than 80% of the gross profit rate of a car as a discount rate to a customer, and the total price of a sold car is equal to its regular price minus the discount, plus 13% tax, i.e.,
    if Salesperson Sell Car
    then Sell.Discount_Rate ≤ (Car.Regular_Price - Car.Cost) ÷ Car.Regular_Price × 0.80
    if Salesperson Sell Car
    then Sell.Total_Price = Car.Regular_Price × (1 - Sell.Discount_Rate) × 1.13

There are some restrictions on constructing a relationship. Some restrictions are one-side conditions. For example, the dealer only services those cars that are first owned by some customers that are in the database, i.e.,
    if Car Maintained_by Service
    then (Customer Own Car) before

Other restrictions are two-side conditions. For example, in the loop formed by the three relationships Pre_Sell, Sell and Own, there are only three non-redundant SICs to assert their dependencies: (1) if a salesperson sells a car, he (she) must have contacted a customer, who then buys the car; there must be a “customer” involved, and any other “entity occurrences” that are not in the “customer type” cannot own that car; (2) a customer can own only those cars that have been pre-sold by a salesperson and then actually sold by the same salesperson; (3) if a car is sold by a salesperson and is owned by a customer, then that salesperson must have contacted that customer (transfer of the car among customers is not recorded), i.e.,
    if Salesperson Sell Car
    then Salesperson Pre_Sell Customer, Customer Own Car
    if Customer Own Car
    then Salesperson Pre_Sell Customer, Salesperson Sell Car
    if Salesperson Sell Car
    then if Customer Own Car
         then Salesperson Pre_Sell Customer

Some conditions are complicated. For example, a part fee including 6% tax is placed on the parts installed in those cars that were bought more than 5 years ago, i.e.,
    if Part Installed_in Service,
       Car Maintained_by Service,
       Salesperson Sell Car,
       Service.Date - Sell.Sell_Date > “5 years”
    then Service.Part_Fee = sum({MCharge | Part Installed_in Service, MCharge = Part.Price × Installed_in.Qty_Used × 1.06})

There can also be SICs expressing sufficient conditions for relationships. Those are the relative minimum cardinalities in Figure 6.1. For example, any service should have at least one mechanic working on it, i.e.,
    ∀ Service, ∃ Work_on, Mechanic Work_on Service

SICs in a Specialization Hierarchy  There are some SICs implied by implicit relationships. For example, an employee cannot be both a mechanic and a salesperson, i.e.,
    if Mechanic.Emp_Id = Value
    then ¬(Salesperson.Emp_Id = Value)

There are also some SICs implied by is_a relationships. For example, “each mechanic is an employee” implies the relative cardinalities Mechanic (1,1) is_a Employee (0,1), and a necessary condition for an is_a occurrence to exist: Mechanic.Emp_Id = Employee.Emp_Id.

6.2 Potential Pitfalls of Using the E-R-SIC Model

There might be some potential pitfalls that need to be avoided when using the E-R-SIC model.

Pitfall 1: Inconsistent and Redundant SICs  Inconsistent or redundant SICs might be specified. For instance, in the car_dealer_database_design example, if the database designer specified “if Salesperson Pre_Sell Customer then ¬(Salesperson Sell Car)”, it would be inconsistent with “if Customer Own Car then Salesperson Pre_Sell Customer, Salesperson Sell Car”. If the database designer specified “Salesperson.Basic_Salary ≥ 2000”, “Salesperson.Commission ≥ 500”, and “Employee.Salary < 2000”, these SICs would be inconsistent because Salesperson.Salary would then have the lower bound $2,500, which is greater than the upper bound of Employee.Salary. If the database designer specified “if Customer Own Car then Salesperson Sell Car”, it would be redundant since it is subsumed by “if Customer Own Car then Salesperson Pre_Sell Customer, Salesperson Sell Car”. It becomes more difficult to detect such inconsistencies and redundancies as the number of SICs gets larger. Automated tools are needed to help detect them or prevent them in advance, although this issue may not be completely solvable.

Pitfall 2: Imprecise or Incomplete SIC Representation  The simplified representation format in the preceding section should only be used as a convenient way of initially incorporating SICs rather than as a formal way of eventually representing them. Although the simplified representation format might be more natural to a database designer, it is not precise. Recall that there are six components in the SIC Representation model. In most cases, the simplified representation only contains precondition and predicate components. For operation-dependent SICs, object and operation type components are also specified.
Even in that case, other important specification information (the certainty factor and violation action) is still missing. In order to represent any SIC precisely, we need some algorithms to reformulate it (and decompose it if necessary) in terms of the Representation model.

We do not take the naive approach of enumerating all conditions for all objects explicitly. For example, a general SIC represented in the simplified format, “if E RX F then E RY G”, would be incorporated when we consider necessary conditions for the existence of RX. We do not incorporate it again as a sufficient condition for the existence of RY. However, if we decompose the original general SIC, there would be two sub-SICs represented in the Representation model:
  - one for RX on insertion, whose violation action would be “reject” or “propagate(insert(RY))”
  - the other for RY on deletion, whose violation action would be “reject” or “propagate(delete(RX))”

Note that the first sub-SIC in terms of the SIC Representation model is a necessary condition for the insertion of RX, and is also a sufficient condition for the insertion of RY if the violation action is “propagate(insert(RY))”. The second sub-SIC is a necessary condition for the deletion of RY, and is also a sufficient condition for the deletion of RX if the violation action is “propagate(delete(RX))”. Without decomposing a general SIC into several sub-SICs represented in terms of the SIC Representation model, the SIC specifications would not be complete because the violation action may depend on the object and operation causing the violation. Therefore, we can conclude that the SIC Representation model provides a complete and precise representation of SICs in the E-R-SIC model.

Pitfall 3: E-R Structure Orientation  The E-R-SIC model helps a database designer incorporate SICs based on E-R structural descriptions. However, our logical data model is the relational model.
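Returning briefly to the decomposition described under Pitfall 2, the two sub-SICs of “if E RX F then E RY G” can be sketched as records carrying the six components named in the text (object, operation type, precondition, predicate, certainty factor, violation action). This is only an illustrative sketch: the class name `SubSIC`, the function `decompose_implication`, and the wording of the precondition and predicate strings are assumptions, not the thesis's reformulation algorithm (which is deferred to Section 7.3).

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SubSIC:
    """One operation-dependent sub-SIC (hypothetical record layout)."""
    obj: str                              # object the sub-SIC is attached to
    operation: str                        # "insert", "delete" or "update"
    precondition: str                     # illustrative wording only
    predicate: str                        # illustrative wording only
    certainty: str                        # default is "certain"
    violation_actions: Tuple[str, ...]    # choices left open to the designer


def decompose_implication(rx: str, ry: str) -> List[SubSIC]:
    """Decompose the general SIC 'if E RX F then E RY G' into two sub-SICs."""
    on_insert = SubSIC(
        obj=rx,
        operation="insert",
        precondition=f"E {rx} F is to be inserted",
        predicate=f"E {ry} G exists",
        certainty="certain",
        violation_actions=("reject", f"propagate(insert({ry}))"),
    )
    on_delete = SubSIC(
        obj=ry,
        operation="delete",
        precondition=f"E {ry} G is to be deleted",
        predicate=f"E {rx} F does not exist",
        certainty="certain",
        violation_actions=("reject", f"propagate(delete({rx}))"),
    )
    return [on_insert, on_delete]


for sub in decompose_implication("RX", "RY"):
    print(sub.obj, sub.operation, sub.violation_actions)
```

Keeping both candidate violation actions in the record reflects the point made above: which action applies depends on the object and operation that cause the violation, so it cannot be read off the simplified format alone.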
There should be some procedure for losslessly transforming SICs in the E-R-SIC model into corresponding ones in the relational model [80].

[80] “Losslessly” means that, for each SIC incorporated in the E-R-SIC model placing restrictions on occurrences of entities, relationships, or attributes, there is (are) SIC(s) in the relational model placing corresponding restrictions on relation tuples or attributes.

Pitfall 4: Inefficient Modelling  In the absence of an automated database design aid, a database designer must specify all explicit and inherent SICs in terms of the simplified formats as described in Appendix C. For example, in addition to specifying the entity types, Employee and Mechanic, and the relationship type Mechanic_is_a_Employee, he (she) also needs to specify the relative cardinalities of is_a, a necessary condition for its existence (Mechanic.Emp_Id = Employee.Emp_Id), and even the fundamental incidence constraints (the existence of participating entity occurrences). In general, there might be many possible SICs given an E-R diagram. Since one SIC may involve more than one object, the database designer needs to record what SICs have already been identified so that they are not incorporated again. It would be a heavy burden for the database designer to figure out all the possible SICs and to verify, reformulate, decompose, and transform them by himself (herself). It is desirable to have some automated database design system to help the database designer model and represent SICs.

Proposals for avoiding these pitfalls will be introduced in the next chapter (pitfall 1 is not completely avoided).

6.3 Data Integrity Semantic Completeness

In the E-R-SIC model, the SIC is the construct used to provide logical restrictions on the structural schema, that is, on the attributes, entities and relationships.
Completely incorporating SICs depends on the correct specification of the structural schema. The E-R-SIC model requires all relationships to be explicitly represented except for some special cases, such as exclusion between two entity types. The explicit representation of relationships makes the semantics clear.

Based on the assumed correct E-R structural schema, we could systematically model SICs by considering a wide range of possible restrictions or requirements: positive or negative, simple assertions or complicated formulas, qualitative or quantitative, time-restricted, time-triggering, or temporal sequence requirements, implicitly/explicitly type-related or set-related. Most SICs we consider are necessary conditions for the existence of the occurrences of one or more attribute, entity, or relationship types. They are, by nature, static SICs restricting the possible states of the database. A few are inherently dynamic SICs, e.g., new_old transitional constraints; and some others are considered as sufficient conditions. However, after decomposing them into sub-SICs in terms of the SIC Representation model, all become dynamic and operation-dependent SICs specifying conditions for the deletion, insertion, or update of the related objects. Since these three operations are the only ways the database can change, those necessary conditions for these operations completely cover the necessary conditions for existence, nonexistence and change of an object in a database.

In addition, sufficient conditions (if any) are also incorporated because of the violation action component of the SIC Representation model.

Whenever we find that the structural schema cannot model the data integrity semantics as SICs, we would need to go back and modify the E-R structure. Therefore, the E-R-SIC model not only allows modelling the SICs completely but also forces the structural schema to reflect the data integrity semantics more faithfully.
The structure and SIC parts of the resulting schema would be closely related. This would form a more suitable and fundamental base for higher level transaction or application program modelling.

Therefore, we can conclude the following:

    With the support of the SIC Representation model, the E-R-SIC model would completely model the data integrity semantics.

There is one limitation. Although some entities or relationships may have time-valued attributes or temporal sequence requirements, time-stamped data integrity semantics (e.g., dealing with historical data at some specific time) in general may not be completely modelled [81].

A related discussion concerns SICs in the relational model. Suppose that the SICs in the E-R model have been losslessly transformed into corresponding ones in the relational model. Are there any other new SICs, e.g., data dependencies, that must be considered in the relational model?

Data Dependency in the Relational Model  The relational data model has well-formalized theories. Data dependency is one of its important concepts. Ullman [1982] defines a data dependency as “a constraint on the possible relations that can be the current value for a relation scheme”. However, one should note that the usefulness of data normalization theory is for designing a database directly from the relation concept, not for capturing semantics. The partial and transitive functional dependencies imply the embedding of independent relationships ([Makowsky, et al., 1986]). Similarly, the existence of non-trivial multi-valued dependencies occurs when a relation represents more than one 1:N relationship ([Kent, 1983]). These problems can be avoided if entities and relationships are properly designed and carefully transformed into relations ([Storey, 1988]).
Some researchers try to apply other dependencies, e.g., inclusion dependencies, exclusion dependencies and co-exclusion dependencies [82], to capture semantics. Exclusion dependencies have been covered in the E-R-SIC model. Inclusion dependencies come about because some relationships have not been explicitly represented [83]; and co-exclusion dependencies occur because the SICs are not precisely represented [84]. Therefore, we need not add other SICs after transforming those SICs identified by the E-R-SIC model to corresponding ones that reference the schema in the relational model.

[81] In the literature, [Palmer, 1982] discusses time-dependent relationships and [Taiizovich, 1991] introduces lifetime cardinalities. It is not clear how the restrictions of lifetime cardinalities are applied over the lifetime of the entities and relationships.

[82] Let E, F be two relations (possibly the same), and A_i, B_i be attributes of E and F, respectively. E[A_1, ..., A_m] ⊆ F[B_1, ..., B_m] is called an inclusion dependency ([Casanova and Vidal, 1983]). If E[A_1, ..., A_m] is exclusive to F[B_1, ..., B_m], it is called an exclusion dependency ([Casanova and Vidal, 1983]). The negation of an exclusion dependency is called a co-exclusion dependency ([Arisawa and Miura, 1986]).

[83] Take an example of [Mannila and Räihä, 1986]: Registered_Cars[Model] ⊆ Car_Types[Model]. There should be some relationship, e.g., Has_model, between Registered_Cars and Car_Types.

[84] Arisawa and Miura [1986] state that what co-exclusion dependencies mean is to constrain the database when updating and sharing entities. However, there should be a higher level of entity that is the super-type of those entity types.
If the SICs implied by the is_a relationships between the sub-types and super-type are precisely represented, the co-exclusion dependency constraints would be incorporated.

Chapter 7
A Proposed Database Design Aid for Eliciting SICs

In Figure 1.1 of Chapter 1 a proposed database design subsystem for eliciting SICs is sketched. Both the SIC Representation model and E-R-SIC model are application-domain independent. They are suitable for implementation as part of an automated database design system. An automated database design system is needed because of the potentially heavy work load of a database designer when incorporating and representing SICs. This chapter describes conceptually how this subsystem for eliciting SICs should work. In the following, the term “database designer” will be used for the user of the automated database design subsystem, who may be a professional database designer or an end-user who plays the role of the database designer to design his (her) system. Section 7.1 gives a brief overview of the elicitation subsystem. The three major functions of the subsystem (verifying elicited SICs for consistency and non-redundancy, reformulating and decomposing general SICs into sub-SICs in terms of the SIC Representation model, and transforming them into corresponding ones in the relational model) are described in detail in Sections 7.2, 7.3 and 7.4, respectively.

7.1 An Overview of the SIC Elicitation Subsystem

Interfaces  From the view of the SIC elicitation subsystem, there are two interfaces. The first is an interface to elicit explicit constraints from the database designer in a dialogue or by menus, etc.
The second is an interface between the SIC elicitation subsystem and the structure subsystem (e.g., the View Creation System in [Storey, 1988] and [Storey and Goldstein, 1988]) for constructing the structural schema. From that interface, the SIC elicitation subsystem fetches the structural descriptions in terms of the E-R-SIC model.

Elicitation  Based on these structure descriptions, the SIC elicitation subsystem will query the database designer to obtain the attribute SICs for each attribute of each entity. Next, whole entity SICs would be obtained for each entity. The system then traverses the E-R diagram to detect possible relationship SICs [85] and data abstraction SICs. In a dialogue, the database designer may need to add some further restrictions on a SIC by using expressions such as those in the simplified format (Appendix C). In general, though, the database designer need not worry about the assertion representation syntax. In addition, the system, rather than the database designer, would specify those inherent SICs implied by special relationships (e.g., the is_a data abstraction).

[85] The original structural specifications may include some traditional relative cardinalities of relationships. However, the exact numbers except for 0, 1 may not be known since they are irrelevant for constructing normalized structures.

Elicitation Knowledge  It is possible that the elicitation procedure may be lengthy because the goal is to include the complete data integrity semantics and the system has neither common sense nor application domain knowledge. However, it is expected that the system should query the database designer based on the following knowledge:
• the E-R-SIC model to capture the possible SICs from different objects in the (line, star, loop-2, and loop-n) contexts of an E-R diagram;
• consistency and nonredundancy rules for different SIC types;
• heuristics from naming conventions and data types;
• specification information that the database designer has provided so far.

With consistency and nonredundancy rules for different SIC types, the subsystem would not request, or could refuse immediately, some impossible SICs. This kind of knowledge is introduced in Section 7.2.1 to show how it can alleviate the need for some detailed verification.

Since there may be a large number of possible SICs, heuristics may help the subsystem to be even more efficient in asking questions. The object data type (date, arithmetic or non-arithmetic) may suggest its volatility. For example, often a date is unchangeable. In addition, naming conventions may hint at some SICs. For example, attributes with the same names in the entity types linked via some relationship types may suggest the need for SICs involving some formula between them. Appendix E contains more examples.

Verification  Although full verification for consistency and non-redundancy is hard to achieve, the elicited SICs would be verified to some extent, especially for cardinalities. The cardinality information is important because a totality constraint might make many possible SICs redundant or inconsistent. Section 7.2 discusses the related issues on verification in detail.

Reformulation and Decomposition  Based on the information obtained from the database designer, the system would represent general SICs in terms of the simplified format described in Appendix C.
Then, the system should automatically rewrite each of the above SICs, and decompose it into sub-SICs if necessary, in terms of the SIC Representation model. All six components plus the SIC name in the model would be specified. The reformulation and decomposition algorithms are introduced in Section 7.3. The default certainty factor is “certain”, but the database designer can change it to be “uncertain” or provide specific certainty factors.

Without application domain knowledge, it is impossible for an automated database design system to provide automatically complex and application-dependent violation actions. For example, if we allow arbitrary violation actions, even for a SIC on update of E.A2 with a simple formula predicate “E.A1 = E.A2 + E.A3”, the violation action may be either a propagation to update E.A1, or update E.A3, or update both E.A1 and E.A3 (this may imply a number of choices with different ratios of E.A1 and E.A3). The number of possible combinations of violation actions would grow rapidly as the number of involved objects increases. The “propagate” action is helpful and powerful, but is also costly if the set of SICs is not optimized [86]. In this research, it is envisioned that the system would only provide some restricted predefined violation action choices (“reject”, “propagate”, “conditionally_reject”, “conditionally_propagate”, or “warning”). Most of the violation actions would be “reject”, “conditionally_reject”, or “warning” if the system could not decide how to propagate to fix the violation. If the system decides that a “propagate” action is feasible, it would query the database designer to determine whether to “propagate” or “reject”.

[86] An example here consists of two SICs, “SIC-1: (E RX F) ∨ (E RY G)” and “SIC-2: if E RX F then E RY G”, with “propagate” actions attached to all of their sub-SICs. Suppose that the original database state is “RY exists, but RX does not”. A “delete(RY)” operation would cause the following operations to be performed: “delete(RY)”, “propagate(insert(RX))” owing to SIC-1, and “propagate(insert(RY))” owing to SIC-2. An RY is forced to be inserted back, although now it may connect E with a different G.

Generic SICs  As described on page 78, we can represent some common SIC types for the generic objects (Entity*, Relationship*, Entity*.Attribute*, and Relationship*.Attribute*) to reduce the number of explicit SICs represented using the Representation model. By doing so, we can also alleviate the need for the invocation of the reformulation and decomposition algorithms during the database design consultation session. The subsystem should have these pre-defined representations of generic SICs. If the database designer specifies that occurrences of an object type must satisfy one of these common SIC types (e.g., two entity types are mutually exclusive), the related constraint information would only be kept in a logical predicate, called an input predicate (refer to Appendix B.1, e.g., ex_ents(ExEntSet) storing the information that the entity types in ExEntSet are exclusive). Such specific constraint information need not be directly reformulated or decomposed in terms of the SIC Representation model. It is expected that the specified object type could inherit the related generic SICs. Section 7.3.1 describes these generic SICs in more detail.

Transformation  It is assumed that if a relationship type has (1,1) cardinalities relative to an entity type, it would be represented by a foreign key rather than a separate relation in the relational model. When the structure subsystem transforms the E-R specifications into relations, the SIC elicitation subsystem should also transform the SICs in the E-R-SIC model into corresponding ones in the relational model.
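The foreign-key assumption stated above can be illustrated with a very small sketch. It is not the transformation algorithm of Section 7.4; the function name `foreign_key_holder`, the encoding of an unbounded maximum as `None`, and the reading that the relation of the (1,1) participant carries the key are all assumptions made for illustration.

```python
from typing import Dict, Optional, Tuple

# Relative cardinality (min, max); max=None stands for "n" (no fixed upper bound).
Cardinality = Tuple[int, Optional[int]]


def foreign_key_holder(cards: Dict[str, Cardinality]) -> Optional[str]:
    """Return the entity type whose relation would carry the foreign key.

    `cards` maps each participating entity type to the relationship's
    cardinalities relative to it. None means that no participant has
    (1,1) cardinalities, so a separate relation is assumed instead.
    """
    for entity, card in cards.items():
        if card == (1, 1):
            return entity
    return None


# From Section 6.1: Mechanic (1,1) is_a Employee (0,1).
print(foreign_key_holder({"Mechanic": (1, 1), "Employee": (0, 1)}))

# Hypothetical cardinalities for Work_on, chosen only for illustration.
print(foreign_key_holder({"Mechanic": (0, None), "Service": (1, None)}))
```

Whichever representation is chosen, every SIC that mentions the relationship type then has to be restated against the relation that actually holds it.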
The input predicates need not be transformed. There should also be pre-defined generic SICs in the relational model corresponding to those in the E-R-SIC model. The principles of representing relationships in the relational model and the SIC transformation algorithms are introduced in Section 7.4.

Final Outputs  The final results of the consultation session would be a listing of incorporated SIC specifications in a relational database schema. These specifications include: (1) specific SICs, which directly apply to specific object types and are represented in terms of the Representation model; (2) input predicates describing specific object types to which some common SIC types apply; (3) generic SICs, which serve as the “templates” for common SIC representations and are expected to be inherited by the specific object types described in (2).

7.2 SIC Verification

Difficulty in Verification  As discussed in Section 2.3, full verification of SICs for consistency and non-redundancy would need application-domain and common-sense knowledge. Even those aspects of verification that do not require application-domain knowledge still pose difficult research problems because of the potential complexities of SICs. This research discusses the issues, but does not offer general solution algorithms.

Because the interactions among SICs might, in general, be complicated [87], we would need logic theorem proving techniques (e.g., the resolution refutation method [Nilsson, 1980]). However, standard logic must be enhanced to handle the following problems.

[87] For example, suppose that C1, C2, C3 are the preconditions of three SICs, SIC-1, SIC-2, SIC-3, respectively; and P1, P2, P3 are their predicates, respectively. SIC-1 and SIC-3 are asserted for the same object. If C1 overlaps with P2 and C3 implies C2, P1 should be the same as P3. Otherwise, they are inconsistent: because C3 implies P2 and P2 overlaps with C1, if some occurrences satisfy C3, they may also satisfy C1; so C3 transitively implies P1.

• Attribute Value Problem. In standard logic programming theory, objects are represented as finite terms, and the set of terms is countable (within the context of the Herbrand Universe) ([Lassez, 1987]). Suppose that the value domains of all attributes are discrete and finite. Then, consistency among SICs can be checked by applying the standard logic proving technique although, for efficiency reasons, it may be accompanied by the standard CSP (constraint satisfaction problem) algorithms ([Mackworth, 1977; 1987]; [Mackworth and Freuder, 1985]) [88]. However, suppose that the values of attributes are allowed to be continuous and infinite [89]. Standard logic cannot deal with the real number domain. It must be extended with other techniques to handle real number values and formulas. That is the motivation for developing constraint logic programming (CLP) ([Jaffar and Lassez, 1987]; [Cohen, 1990]) [90]. We may apply linear programming (LP) algorithms (e.g., the simplex method [Hillier and Lieberman, 1986]) to check the consistency of a set of linear formulas (i.e., whether a feasible solution exists) and remove the inconsistent value ranges (by finding the values of optimization functions, maximization and minimization, for each involved attribute) [91]; apply integer linear programming algorithms if the values are restricted to be integer numbers [92]; and apply non-linear programming algorithms to handle a set of non-linear formulas [93]. However, all existing methods have some limitations. For example, if any of the connectives between linear formulas is a disjunction (“∨”), e.g., “P1 ∨ P2”, the simplex method is no longer applicable since “P1 ∨ P2” may represent a non-convex polyhedron ([Cohen, 1990]) [94]. The general integer linear programming problem is NP-complete ([Papadimitriou and Steiglitz, 1982]). Furthermore, it is even difficult to achieve nonredundancy. We may apply these algorithms to remove the inconsistent values from value constraints. However, given a set of arbitrary formulas, no existing algorithms can tell us whether a formula is redundant.

  [88] There are three sets of algorithms: node, arc, and path algorithms. In this case, we would apply arc algorithms to check the consistency between value constraints and a formula constraint; apply path algorithms to check the consistency among value constraints and several formula constraints. They can be checked in polynomial time ([Mackworth and Freuder, 1985]) although the path algorithms are complicated.
  [89] They may be numerical real numbers or integers; or may have date data type if date operations are suitably defined.
  [90] For example, one CLP representative, CLP(R), provides both expressive and computational power to solve linear equations and linear inequalities (by a two-phase simplex algorithm) incrementally ([Jaffar and Michaylov, 1987]). However, it is still a rudimentary experimental product and can only run under UNIX-based operating systems. Other CLP languages are Prolog III ([Colmerauer, 1990]), Trilogy, and CHIP ([Hentenryck, 1989]), etc.
  [91] A simple algorithm used by ALICE (A Language for Intelligent Combinatorial Exploration) ([Lauriere, 1978]) can also be applied to test the consistency between a single linear formula and the value range constraints, and further to remove the inconsistent value ranges.
  [92] It is not usual to have complex numbers in a database. Suppose we do have a complex number application; the Gröbner method would test whether a system of multivariate polynomial equations has a solution over the complex numbers ([Cohen, 1990]).
  [93] If there are non-linear formulas and the functions are monotonically increasing or decreasing for each involved attribute, the ALICE algorithm can be applied to check the consistency between a single non-linear formula and value constraints.
  [94] In addition, one should note that the simplex algorithm would take an exponential number of steps in the worst case although in general it can be regarded as very efficient ([Papadimitriou and Steiglitz, 1982]).
  [95] It is known that the theory of predicates with =, ≠, <, >, ≤, ≥, +, × over real numbers is satisfaction-complete, i.e., every constraint is either provably satisfiable or provably unsatisfiable. However, the theory of predicates with =, +, × over integer numbers is not satisfaction-complete ([Cohen, 1990]).

• Aggregate Function Problem. There may be some attribute SICs related to entire entity or relationship set properties. We cannot verify these SIC specifications without actual data. Even if they are not in rule format, at best we can only have some simple tests (Appendix F.1), such as: the specified maximum value of an attribute must be greater than or equal to its minimum value. In addition, as stated, the verification for cardinality constraints is important for other relationship SICs. Standard logic proving should be supplemented with the algorithms described in Appendix F.2 to check consistency and nonredundancy for cardinalities.

• Incremental Verification Problem. Verification would be conducted for the SICs in the sequence of their incorporation. Since it is desirable to identify any inconsistencies each time a new SIC is added, the SIC elicitation subsystem would not verify the whole set of SICs only once; instead there would be an incremental constraint satisfaction problem ([Hentenryck, 1990]), in which constraints are incrementally added and dropped.

• Feedback from the Database Designer.
It is impossible for the SIC elicitation subsystem to achieve complete verification without feedback from the database designer. The following are two examples.

- If the preconditions of two SICs overlap, the database designer is required to modify these two SICs.

- If the preconditions of two SICs are not related in the syntax (i.e., they are expressions on different objects, for example, one is on Employee.Age, another is on Employee.Education), but their predicates are inconsistent, the system should query the database designer as to whether it is possible that an object would satisfy these preconditions at the same time. If it is, these two SICs are inconsistent.

Optimization Problem. Even if two constraints are neither inconsistent nor redundant, they might not be “optimized”; i.e., there might be a single constraint that can replace the set of these two constraints and be enforced more efficiently. The following are two examples.

• Given two value range constraints that are neither inconsistent nor redundant, they may overlap. That is, there may be a single refined range to replace those two ranges.

• A set of two constraints, “(E RX F) ∨ (E RY G)” and “if E RX F then E RY G”, is neither inconsistent nor redundant. However, after optimization, the above set would be replaced by a single constraint, “∀E, ∃RY, E RY G” (i.e., RY is total to E).

To get a single refined range constraint may be easy. However, in general, optimization is more difficult than consistency and nonredundancy checking even for a small and simple set of constraints. For its related objects, each constraint allows a set of database states to exist. When several constraints relate to the same object, if they are consistent, the intersection of all those sets of states is not empty. One approach to optimization is to find this intersection set and describe it by some expressions.
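As a hedged illustration of the intersection idea, the second example above can be checked by brute force. The sketch below is not one of the thesis's algorithms; it models a database state, for one fixed E, as a pair of truth values, and the names `a`, `b`, `c1`, `c2` and `single` are illustrative assumptions. It compares the set of states allowed by both constraints with the set allowed by the single replacement constraint.

```python
from itertools import product

# A "state" for one fixed E is a pair of truth values (assumed encoding):
#   a = "E RX F holds", b = "E RY G holds".
states = list(product([False, True], repeat=2))


def c1(a, b):
    # (E RX F) or (E RY G)
    return a or b


def c2(a, b):
    # if E RX F then E RY G
    return (not a) or b


def single(a, b):
    # Candidate replacement constraint: E RY G
    return b


# Intersection of the state sets allowed by each of the two constraints.
allowed_by_both = {s for s in states if c1(*s) and c2(*s)}
# State set allowed by the single candidate constraint.
allowed_by_single = {s for s in states if single(*s)}

print(allowed_by_both == allowed_by_single)
```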
However, the number of states could be large and it would be very difficult to write expressions for those states.

Issue of Unexpected Implicit Constraints. Even if a set of SICs is not inconsistent or redundant, there may still exist some unexpected implicit constraints. For example, suppose that Supervise is a relationship type between entity types Manager and Project, and the database designer has specified the absolute maximum cardinality of Project (there can be no more than 8 projects) and some relative maximum and minimum cardinalities in the relationship Supervise (there can be no more than one manager per project, and any manager must be assigned to a project). This set of three constraints implies an implicit constraint: the organization can hire at most 8 managers, although the database designer does not specify it explicitly or may think that there is no upper bound on the absolute maximum cardinality of the entity type Manager. The elicitation subsystem needs to recognize that some SIC types may be missing, and then check the current set of constraints to see whether such types of SICs can be derived from it. The database designer would then be asked to judge whether he (she) has made some mistakes.

Verification related to Certainty Factors. The certainty factor component in the SIC Representation model is a source of possible inconsistency between SICs. Suppose that both SIC-1 and SIC-2 are for the same object on the same operation with the same precondition. If the predicate of SIC-2 implies that of SIC-1 (i.e., SIC-2 is more restrictive than SIC-1), then SIC-2 should be less certain than SIC-1. Otherwise, they are inconsistent. In addition, the certainty factor and violation action are closely related. An uncertain SIC can only have a predefined violation action of “warning”, “conditionally_reject”, or “conditionally_propagate”.
A certain SIC can only have a predefined violation action of “reject” or “propagate”.

Verification related to Violation Actions. Furtado et al. [1988] describe a situation where the violation actions of two SICs may be inconsistent. However, if the SICs are precisely defined and the transaction concept is applied when enforcing them, the problem described by Furtado et al. could not occur^96. The major consistency problem caused by the violation action component in the SIC Representation model is that the violation actions may cause an endless enforcement cycle, a “flip-flopping” behaviour between a set of SICs described by Ceri and Widom [1990], if arbitrary actions are allowed. Ceri and Widom suggest using a triggering graph^97 to detect potential cycles. However, they leave the responsibility of determining whether infinite triggering would actually be possible to the database designer. It is difficult, in general, to take an arbitrary violation action and decide whether it would violate other SICs.

^96 The problem raised by Furtado et al. [1988] is as follows. Suppose that there is a relationship type R and an entity type E, and R is total to E. If the violation action of the incidence constraint, SIC-1, for E on deletion is “propagate(delete(R))” and the violation action of the totality constraint, SIC-2, for R on deletion is “reject”, Furtado et al. claim that they are incompatible. However, note that if the SICs are precisely defined, when SIC-1 has been violated and the deletion of the R occurrence is taken, SIC-2 becomes irrelevant because the E occurrence has become non-existent.

^97 In their triggering graph, the nodes of the graph correspond to the SICs. There is a directed edge from node SIC-1 to node SIC-2 if and only if the execution of SIC-1’s violation action is likely to violate SIC-2 (the likelihood is only in terms of objects and operation types).
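The use of a triggering graph to flag potential enforcement cycles can be sketched as follows. This is a minimal illustration under stated assumptions, not Ceri and Widom's actual procedure: the graph is assumed to be given as a dictionary from a SIC name to the SICs its violation action may violate, and `find_cycle` and the sample graph `triggering` are hypothetical names. A depth-first search reports one cycle if any exists; whether such a cycle can really loop forever still has to be judged by the designer.

```python
def find_cycle(graph):
    """Return one cycle as a list of nodes (first node repeated at the end),
    or None if the directed graph has no cycle.

    graph: dict mapping a node to an iterable of its successor nodes.
    """
    WHITE, GREY, BLACK = 0, 1, 2          # unvisited, on current path, finished
    colour = {node: WHITE for node in graph}
    path = []                             # nodes on the current DFS path

    def visit(node):
        colour[node] = GREY
        path.append(node)
        for nxt in graph.get(node, ()):
            state = colour.get(nxt, WHITE)
            if state == GREY:
                # Back edge: nxt is already on the current path.
                return path[path.index(nxt):] + [nxt]
            if state == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        colour[node] = BLACK
        return None

    for node in list(graph):
        if colour[node] == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


# Hypothetical triggering graph: an edge SIC-1 -> SIC-2 means that the
# violation action of SIC-1 may violate SIC-2.
triggering = {
    "SIC-1": ["SIC-2"],
    "SIC-2": ["SIC-3"],
    "SIC-3": ["SIC-1"],
    "SIC-4": [],
}
print(find_cycle(triggering))
```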
In this research, only some restricted and predefined violation actions would be suggested by the SIC elicitation subsystem. Basically, if the violation action of a SIC is “propagate”, when the SIC is violated, it would allow the intended operation to be performed, but would also take a simple compensatory action to bring the database to another consistent state. With such a restriction, if those SICs are consistent (in terms of the components other than the violation action), when an operation violates a SIC, the database would return to a consistent state although some operations may be undone.

Verification and Decomposition. If a general SIC is not in rule format, it can be verified for consistency and nonredundancy before decomposing it. However, a general SIC in rule format must first be decomposed into several sub-SICs before verifying it. These sub-SICs indicate the consequences of the general SIC that must be taken into consideration when verifying a set of SICs. For example, if there is a general SIC, “if C1 then P1”, where C1 and P1 are some assertions, the SIC elicitation subsystem would not know that it is inconsistent with “if P1 then C1” without first decomposing it. The decomposition algorithms described in the next section can be assured to represent correctly the sub-SICs of a given single SIC. However, the verification for consistency and nonredundancy among the representations for all SICs is still needed.

7.2.1 Consistency and Nonredundancy Rules for SIC Types

Since, by using the E-R-SIC model, SICs can conceptually be classified into a number of types, some general problems of consistency and nonredundancy among those types of SICs can be explored in advance. The results would be some rules about consistency and nonredundancy for SIC types, which can be stored in the knowledge base of the elicitation subsystem in order to expedite the elicitation procedure.

Take a simple example.
In Figure 6.1 of the car_dealer_database_design example, the two relationship types, Pre_Sell and Own, may be involved in a number of types of SICs. For example, they may be exclusive (i.e., “if Customer Own Car then ¬(Salesperson Pre_Sell Customer)”), or one may depend on the other (e.g., “if Customer Own Car then Salesperson Pre_Sell Customer”). Suppose that the database designer has specified that the minimum cardinality of Pre_Sell relative to Customer is 1. With the built-in consistency and nonredundancy rules in Appendix G, the SIC elicitation subsystem would automatically know that the first SIC type cannot occur and that the second SIC type is redundant. It might be confusing to the database designer if the SIC elicitation subsystem asks him (her) to confirm these SICs. It would also be inefficient if the database designer inadvertently specified the above redundant or inconsistent SICs and the elicitation subsystem invoked the whole process to verify them during a consultation session.

Note that this knowledge cannot totally replace the actual verification during the consultation, but it would reduce the number of cases that need to be verified. Kung [1984, 1985] applies a sophisticated tableaux approach to verify a set of two SICs: “every employee earns more than $20,000”, and “every manager is an employee”. Since the first is a simple value constraint for a non-key attribute and the second implies some special relative cardinalities and the equality condition on key values of the two sides of entities for the existence of an is_a occurrence, a human database designer would know that they cannot be inconsistent. If the SIC elicitation subsystem had similar rules, it could also skip the detailed consistency checking process.

7.3 SIC Reformulation and Decomposition

As discussed in Chapter 6, the E-R-SIC model uses the Representation model to specify SICs. SICs should be represented in terms of the Representation model.
If a SIC is only relevant to one object on one operation, we only need to reformulate it in the Representation model without changing its certainty. In most cases, because a SIC is relevant to several objects on various operations, the decomposition would yield several sub-SICs written in terms of the Representation model.

The SIC elicitation subsystem would invoke the reformulation and decomposition algorithms, which are described in Appendix H, to reformulate and decompose SICs obtained from the database designer. By using the E-R-SIC model, the subsystem should recognize all inherent SICs that are implied by the specifications (e.g., an is_a relationship) provided by the database designer. Before invoking the reformulation and decomposition algorithms, the subsystem should first represent these explicit and inherent SICs in terms of the simplified formats in Appendix C, but without using the nested if ... then rules.

Algorithms previously proposed in the literature can deal only with some restricted types of SICs and only consider the object and operation type components. The algorithms in Appendix H deal with all SICs that are recognized by the E-R-SIC model. All components in the SIC Representation model are considered.

Correctness of the Reformulation and Decomposition Algorithms. It is necessary to assure the correctness of the proposed reformulation and decomposition algorithms for representing a given general SIC. The following briefly examines the algorithms.

1. Same Certainty Factor. The certainty of sub-SICs cannot be lower than that of the general SIC; otherwise, the enforcement of these sub-SICs cannot achieve the certainty of the general SIC. Therefore, if the original general SIC is 100% certain, its sub-SICs should also have 100% certainty. Suppose that the original general SIC is uncertain. Since its sub-SICs may have different violation actions, some of them may have higher certainty.
Without further information, the elicitation subsystem can assume that they have at least the same certainty as the general SIC. (Later, the database designer can change the certainty of a sub-SIC to be higher if necessary.)

2. Relevant Object and Operation Types. The search for the relevant object and operation types is based on the following ideas.

• An operation-dependent SIC is only relevant to one object on one operation. For example, “if E is_to_be_deleted then ...” is only relevant to E on deletion.

• It is desirable to remove obvious redundancies when checking objects mentioned in a SIC. By the nature of a relationship, if a relationship occurrence exists, its participating entity occurrences must exist too (the incidence constraint). If a general SIC is relevant to the insertion of an occurrence of a relationship type and we have a sub-SIC for the relationship, we need not have other sub-SICs to restrict the insertion of the occurrences of its participating entity types. It is possible that some occurrences of these entity types may not participate in any occurrence of this relationship type. The checking of the general SIC for any occurrence of these entity types can be delayed until it really participates in an occurrence of this relationship type. For example, according to the algorithms, “if E RX F then E RY G” would be relevant to RX on insertion, but not to E, F or G.

• Premise 4.1 in Section 4.1.2 is adopted to avoid some operation checking. Since by assumption the current database state is semantically correct, we would need to check an operation only because the database state transition caused by the operation may violate the SIC. For example, an insertion of an RX occurrence may violate “if E RX F then E RY G”, but a deletion of an RX occurrence could not.
Similarly, an insertion of an RX occurrence may violate “if ∃ at least 3 RX, E RX F then ∃ RY, E RY G”, but a deletion of an RX occurrence could not.

• Current_time and some aggregate functions have special monotonicity properties. For example, a deletion of an E occurrence may violate “max(E.A) > Value”, but an insertion of an E occurrence could not because, by the monotonicity of the function max, an insertion of a new occurrence would never decrease the aggregate function value.

• Since a primary key is used as a surrogate to represent an entity or a relationship in a traditional database, its update may imply a deletion of an “old” entity or relationship occurrence followed by an insertion of a “new” entity or relationship occurrence. Therefore, sub-SICs related to deletion and insertion should also be asserted on the update of a primary key.

Note that the algorithms have limited capability to identify avoidable operation checking for SICs involving aggregate functions. In addition, it is possible to incorporate more consistency and nonredundancy rules for SIC types into the algorithms to reduce the sub-SICs further. However, it can be claimed that the current algorithms correctly find the relevant object and operation types for a single SIC.

3. Proper Precondition and Predicate Components. Basically, the algorithms just rewrite the original SIC so that it becomes more precise and suitable for each sub-SIC.
The following are some brief ideas.

• Subscripts and some special predicates (e.g., rship_occ_part) are used to make a sub-SIC precise.

• If a sub-SIC is for an attribute on update, in general, its predicate component contains only an assertion on the attribute requiring it to have some special value after update.

• If a sub-SIC is for an entity or relationship and it does not only assert the values of its attributes, in general, the precondition component is important to identify which entity or relationship occurrence is to be checked.

• If the constraint violation occurs because an entity occurrence is deleted, in general, because of the key uniqueness, it is impossible to find another occurrence of the same type to avoid the constraint violation (unless the constraint is one asserting aggregate properties). However, it is possible to “reconnect” an occurrence of one entity type with another occurrence of the other entity type. That is, we might find another relationship occurrence to satisfy the SIC unless both sides of the relationship are bound.

• Constraints that assert explicitly the equality of some key attributes of two entity types, e.g., SICs implied by an ID Dependency, is_a or component_of relationships, should be dealt with specially. Because the key attributes (e.g., SIN) of the two entity occurrences (e.g., Person and Employee) in fact reference the same physical entity occurrence (e.g., the same physical Person), the link should be permanent.

4. Suggested Violation Actions. As stated, because of the verification problem, the SIC elicitation subsystem could only suggest some restricted and predefined violation actions appropriate to the certainty of the SIC. A propagation action can be automatically suggested by the SIC elicitation subsystem only in the case that there are two simple assertions on two objects in the SIC. In that case, a simple propagation action is suggested to regain a consistent database state.
If the database designer has set a certainty threshold (say 75%), any uncertain SIC with certainty less than the threshold will automatically be given “warning” as its violation action.

5. Given SIC Names. A SIC name is given automatically after applying the above algorithms and using some predefined SIC types.

Examples. Some examples are included in Appendix I for illustration.

General SIC Information. After decomposing a general SIC, we may still need to link its decomposed sub-SICs together, since the verification for consistency and nonredundancy may still be needed and, furthermore, a general SIC may later be deleted. In addition, for documentation purposes, it may be desirable to keep the original general SIC information, which contains only the certainty factor (if not the default “certain”), precondition and predicate components. Therefore, the SIC elicitation subsystem would need to link a general SIC and its decomposed sub-SICs together.

7.3.1 Representation of Generic SICs

The principles of the reformulation and decomposition algorithms can be similarly applied to produce the generic sub-SICs for the generic object types Entity*, Relationship*, Entity*.Attribute*, and Relationship*.Attribute*. However, the SIC representation for the generic types is complicated because the data dictionary retrieval and manipulation, and the precondition contexts, must be explicitly stated. The input information and manipulation logical predicates used in generic SICs are listed in Appendix B. This subsection introduces the representation of generic SICs for some common SIC types and discusses the related issues.
In all of the following cases, the principle of SIC specialization allows us to omit the explicit SIC representations for specific object types if we have the representations for generic object types.

Domain Constraint Representation: SIC Aggregation and Specialization. By the principle of SIC aggregation, domain constraints on insertion of an entity/relationship occurrence can be simulated by applying the domain constraints of all its attributes. There are three separate sub-SICs for restricting not-null, uniqueness, and nonvolatility, respectively, and another sub-SIC dealing with data type, format, and value. For restricting the insertion of an entity, we need one sub-SIC, which would call the above related sub-SICs (except for the nonvolatility) for asserting attribute domain constraints. In total, the five sub-SICs in Appendix J are sufficient to represent all domain constraints regardless of the number of entity types and their attributes in a database. The number of SICs that must be specified explicitly is dramatically reduced through using the SIC abstraction concepts. Similarly, five sub-SICs are needed for relationship types and their attributes in a database.

Primary Key Constraint Representation: SIC Association and Specialization. A number of SICs must be specified to capture the possible inconsistent states of a database when updating a primary key. The SIC association and specialization concepts will be used to reduce the number of explicit SICs required. During the database design phase, if a SIC is identified for the insertion (or deletion) of a relationship and there should be a sub-SIC for checking the update of a part of its key, its SIC name is added into an associated SIC_Name_Set of the affected key attribute. The SIC_Name_Sets of key attributes of a relationship type may be different.
However, in the case of a SIC that is relevant to the insertion (or deletion) of an entity type, all affected key attributes of the entity have the same SIC_Name_Set.

For example, suppose that we have a SIC such as “if an employee is assigned to a project, he (she) must participate in an insurance plan”, and assume the keys of the employee and project are EmpId and ProjId, respectively. The name of the sub-SIC restricting an insertion of the relationship Assigned_to would be inserted into the SIC_Name_Set of a special logical predicate (associated_PKSICs_I) for the key attribute Assigned_to.EmpId. Note that since the non-sharing entities are not concerned, the update of another key attribute Assigned_to.ProjId of the relationship is not restricted. Similarly, the name of another sub-SIC restricting a deletion of the relationship Insure is inserted into associated_PKSICs_D for Insure.EmpId.

Two “set-SICs” in Appendix J are needed for the key attributes of Relationship*. (Similarly, there are two “set-SICs” for key attributes of Entity*.) By the principle of SIC association, the enforcement of such a “set-SIC” is the same as the enforcement of all of its “member-SICs”.

Other SIC Representation: SIC Specialization. A number of other SIC types^98 can be similarly represented. Usually, these SIC types can be described in the “closed form” of predicates, that is, without further arbitrary restrictions. If a DBMS finds the related information on a specific entity, relationship or attribute type, e.g., symmetric(Married_to) indicating its symmetry, the related sub-SICs would be automatically inherited from the generic types.

The number of such generic SICs stored in the SIC maintenance subsystem would depend on the complexity of the application at hand. It is possible that in a special database application all SICs can be represented as generic SICs in advance so that the actual invocation of the reformulation and decomposition algorithms might be almost totally avoided during the database design consultation session.

Some examples of the above generic SICs are included in Appendix J.

Some Improvements. The generic SICs in Appendix J have two problems, which can be solved as follows.

• Uniform Certainty Factors. It is assumed that the generic SICs are all “certain” by default. If we consider the possibility that a few SIC types for some specific object types may be “uncertain”, the certainty factors need to be stored in the input predicates for these specific object types.

• Uniform Violation Actions. Because of the SIC inheritance principle, specific object types would inherit the same violation actions from the generic object types. The advantage of this is that the elicitation subsystem only needs to ask the database designer the violation action once for each of these common SIC types. However, this rigidity may not be suitable for all applications. An improvement would be to store a violation action “list” in the input predicate for each related specific object type. The number of elements in the list corresponds to the number of its sub-SICs. Each element is a violation action for each sub-SIC. Then the sub-SICs of a specific object type would not inherit the violation action from its generic object type, but have their own “custom-made” violation actions. Considering also the above problem of uniform certainty factors, we may allow both the certainty factor and the violation action in generic SICs to be variables. They would be bound with actual values when a specific object type satisfies the preconditions.

^98 These SIC types include: Composite_Attribute_Unique Constraint, Absolute Maximum Cardinality Constraint of an Entity Type, “traditional” relative cardinality constraints (i.e., Totality Constraint, and Relative Maximum Cardinality Constraint), Incidence Constraint, Symmetry Property of a Relationship, Transitivity Property of a Relationship, Subset_Relationship SIC, Relationships_Union Special SIC, Exclusive Relationship SIC, Exclusive Occurrence SIC, Not_And_Relationships SIC, Either_Existence_Relationships SIC, Relationship_Before_Relationship SIC, Relationship_Not_Before_Relationship SIC, Relationships_Join SIC, Relationships_Depends_onLoopN_Relationships SIC, Weak_Entity SIC, ID_Dependency_Relationship SIC, Weak_Relationship SIC, Completeness_Mapping SIC, Relationships_Intersection Special SIC, Relationship_Trigger_Relationship SIC, Exclusion between Entity Types, Entities_Intersection Special SIC, and Entities_Union Special SIC.

7.4 Transforming SICs to Relational Form

The final result of the consultation is a logical database specification implemented in the relational model. These SICs may later be enforced by an integrity maintenance subsystem in a relational database system. Therefore, the incorporated SICs referencing entities and relationships in the E-R-SIC model must be automatically transformed by the elicitation subsystem into SICs referencing the corresponding relations in the relational model.

Relationship Representation in the Relational Model. When constructing the relations in the relational model by using the E-R descriptions, each entity type is naturally represented by a separate relation. However, there are two alternatives for representing a binary relationship type in the relational model. Some researchers favour always representing a relationship type by using a separate relation rather than a foreign key because they argue that foreign keys decrease the adaptability of database designs ([Wilmot, 1984]) and “[E-R consistent relational schema] assures a greater adaptability to changes not concerning the structure of the modeled information, such as the cardinality of relationships” ([Makowsky, et al., 1986, p. 321]). In addition, by adopting the separate relation approach we would have the same SIC representations in both the E-R-SIC model and the relational model.

However, because of access efficiency considerations, the foreign key approach still prevails in practice. This research allows the foreign key approach to representing a relationship type having (1,1) relative cardinalities. The cardinality stability criterion is adopted to decide which entity key will become the foreign key if there is a tie. Suppose that a relationship type R relates to entity types E and F. If the relationship type R has any attribute, it is represented by a separate relation in the relational model. Otherwise, its representation is based on the following rules^99:

1. Suppose only one of the involved entity types has (1,1) cardinalities. If E has the (1,1) cardinalities, add the primary key of F to the E relation as a foreign key. Otherwise, add the primary key of E to the F relation as a foreign key.

2. Suppose both entity types, E and F, have (1,1) cardinalities.

(a) The decision is based on the stability of the cardinalities.
If the (1,1) cardinalities of F are more likely to change in the future, add the primary key of F to the E relation as a foreign key. Otherwise, add the primary key of E to the F relation as a foreign key.

(b) If the above cannot be decided, the choice is then based on the relative frequencies of “F of E” or “E of F” type queries. If the query “F of E” would be encountered more often, add the primary key of F to the E relation as a foreign key. Otherwise, add the primary key of E to the F relation as a foreign key.

(c) If neither of the above can be decided, the choice depends on the last resort: how the database designer specifies the relationship, “E R F” or “F R E”. It is based on the heuristic that future queries may be similar to the way the database designer states the relationship although he (she) may not admit it. If originally the relationship is expressed by the database designer as “E R F”, add the primary key of F to the E relation as a foreign key. Otherwise, add the primary key of E to the F relation as a foreign key.

3. In all other cases, construct a separate relation for the relationship.

^99 There is an exception. If the relationship is one via which an ID Dependency happens, or an is_a or component_of relationship, the entity type with the (1,1) relative cardinalities already has the primary key of the other entity type as its candidate key attributes. These key attributes can play both the role of (part of) a candidate key and a foreign key. We need not add the key attributes of the other entity type to it.

SIC Representation in the Relational Model. Although we allow the foreign key approach to representing relationships in the special cases of (1,1) cardinalities, this kind of representation is only for query and processing efficiency. The semantics should be the same as in the E-R-SIC model. Therefore, the entity and relationship descriptions would still be stored in the data dictionary of a relational database.
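The rules for representing a binary relationship type, listed earlier in this section, can be sketched as a small decision function. This is only an illustration under stated assumptions: the class `BinaryRelationship`, its fields, and the function names are hypothetical, rule 2(c) is read as "a relationship stated as E R F places the key of F in the E relation", and the exception for ID Dependency, is_a and component_of relationships is not modelled.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class BinaryRelationship:
    """Hypothetical description of a binary relationship type R between E and F."""
    name: str
    e: str                      # entity type stated first by the designer ("E R F")
    f: str                      # entity type stated second
    e_is_1_1: bool              # E has (1,1) relative cardinalities in R
    f_is_1_1: bool              # F has (1,1) relative cardinalities in R
    has_attributes: bool = False
    less_stable: Optional[str] = None      # entity whose (1,1) is more likely to change
    frequent_query: Optional[str] = None   # "F of E" or "E of F", if known


def foreign_key(host: str, referenced: str) -> Tuple[str, str, str]:
    # The primary key of `referenced` is added to the `host` relation.
    return ("foreign_key", host, referenced)


def choose_representation(r: BinaryRelationship) -> Tuple[str, ...]:
    if r.has_attributes:
        return ("separate_relation", r.name)
    if r.e_is_1_1 != r.f_is_1_1:                       # rule 1: exactly one side is (1,1)
        return foreign_key(r.e, r.f) if r.e_is_1_1 else foreign_key(r.f, r.e)
    if r.e_is_1_1 and r.f_is_1_1:                      # rule 2: both sides are (1,1)
        if r.less_stable == r.f:                       # 2(a): stability of cardinalities
            return foreign_key(r.e, r.f)
        if r.less_stable == r.e:
            return foreign_key(r.f, r.e)
        if r.frequent_query == "F of E":               # 2(b): query frequency
            return foreign_key(r.e, r.f)
        if r.frequent_query == "E of F":
            return foreign_key(r.f, r.e)
        return foreign_key(r.e, r.f)                   # 2(c): stated as "E R F"
    return ("separate_relation", r.name)               # rule 3: all other cases


# Hypothetical example: each Employee works in exactly one Department.
works_in = BinaryRelationship("Works_in", "Employee", "Department",
                              e_is_1_1=True, f_is_1_1=False)
print(choose_representation(works_in))
```

Returning a plain tuple keeps the sketch small; a real design aid would presumably also record the hidden relationship in the data dictionary.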
There would also be the same classification of SIC types. However, the adoption of the foreign key approach would cause some semantic confusion because now a relation could represent an “entity”, or a “relationship”, or even both. In this research, if a relation represents both an entity type and one or more relationship types, it is deemed a special entity type with some information kept in the data dictionary. The associated special information specifies the relationship types hidden in it (by adding the key of the other entity type(s) to it as a foreign key) and the relative cardinalities of the other entity type(s) in this hidden relationship type^100.

The (1,1) cardinalities of one entity type in a relationship type would cause some SICs to be redundant or inconsistent. In addition, some SICs in the E-R representation need not be transformed when constructing relations because either they do not mention an explicit relationship or the relationship representation is not relevant to them^101. The representation of these SICs in the relational model would be the same as that of the corresponding ones in the E-R-SIC model.

The Algorithms. The general algorithms for transforming SICs in the E-R-SIC model into SICs referencing the corresponding relations in the relational model are described in Appendix K. It is relatively simple to transform the relationship name and the manipulation predicates of its participating entities^102. However, in addition to the primary key update problem, in the relational model we have a problem of the update of foreign key attributes owing to the well-known semantic overload issue. The update of any attributes of a foreign key would imply the deletion of an old relationship occurrence and the insertion of a new one.
The SIC association and specialization concepts can be applied here too.

^100 That is, two related “relationship_participant” predicates would be deleted and a new “relationship_hidden_entity” predicate would be created by the SIC elicitation subsystem.

^101 Some of these SIC types are: all single entity attribute SICs, all entity SICs, SICs on a single relationship’s attributes, Entities_Intersection Special SICs, Entities_Union Special SICs. In addition, if a relationship is declared to be complete, usually the relative cardinalities of any involved entity type should not be (1,1). Otherwise, the other side of the entity type can only have exactly one occurrence, which is not practical. Therefore, the Completeness_Mapping SIC usually need not be transformed.

^102 By taking the foreign key approach to representing relationships, some SICs would become redundant. For example, we do not need a SIC requiring the relative maximum cardinality to be 1 for the “entity” relation in which the relationship is now hidden because the foreign key is single-valued. The algorithms would also remove them.

Examples corresponding to the ones in the E-R-SIC model are included in Appendix L for illustration.

Generic SICs. It is also desirable to apply the SIC abstraction concept here to represent generic SICs. In principle, we can apply the above transformation algorithms to transform the generic SICs in the E-R-SIC model into corresponding ones in the relational model, although references to some new information from the data dictionary may be needed. However, the foreign key approach would need additional sets of predefined sub-SICs. Because the relationship representation is now not uniform, we need to consider each of the three possible relation representations for each relationship.
If a sub-SIC involves two relationships, it would have nine possible combinations of representations; if it involves three relationships, the possible combinations would become twenty-seven although some impossible cases can be excluded¹⁰³.

Such an analysis and preparation of pre-defined generic SICs could be carried out for SIC types involving only a few relationships. However, it becomes impossible when a SIC type may involve an unknown number of relationships (e.g., the intersection of relationships). In those SIC types, we would be forced to pre-define generic SICs only for the simple cases having at most three relationships involved.

Some of these SICs' representations in the relational model are included in Appendix M for illustration.

¹⁰³ For example, suppose that we have an Exclusive_Relationship SIC, such as RX || RY relative to E, where RX connects the entity types E with F, and RY connects the entity types E with G. We would know that E cannot have (1,1) cardinalities in either RX or RY. So we need not consider those impossible representations for relationships RX and RY, i.e., adding the primary key of F or G as the foreign key of E. However, F may have (1,1) cardinalities in RX; and G may have (1,1) cardinalities in RY too. So we would need two more sets of representations for the predefined sub-SICs in addition to the original one in the E-R-SIC model.

Chapter 8
Conclusions and Further Research

8.1 Conclusions and Contributions

Conclusions. This research has presented two models. The E-R-SIC model is a comprehensive modelling tool for helping the database designer systematically incorporate semantic integrity constraints that are relevant to attributes, entities, and relationships in a database. The SIC Representation model is used to represent precisely the features of these SICs. The declarative and operational semantics of each SIC are specified.
By using these two models, the data integrity semantic constraints on the allowable states and state transitions can be completely modelled and properly represented in a database schema. The focus of database designers will be changed from traditionally emphasizing only structure, functional dependencies, efficiency, etc. to describing data semantics.

In a database, the number of explicit SICs specified using the Representation model would not be huge because of the application of SIC abstractions and the representation of generic SICs. These SICs could be efficiently enforced because of the characteristics of the SIC Representation model.

Both the Representation model and the E-R-SIC model are application-domain independent. They are suitable for implementation as part of an automated database design system. Conceptually, this research proposes a SIC elicitation subsystem. The SIC elicitation subsystem would detect where general SICs may be needed and prompt a database designer to confirm or provide them. The subsystem would automatically decide for what data objects and on what database operations these SICs should be enforced, reformulate them (decompose them into sub-SICs if necessary) in terms of the Representation model, and suggest some violation actions. The subsystem would have some built-in consistency and nonredundancy rules for different SIC types, would verify the consistency and nonredundancy of SICs to some extent and transform them into corresponding ones in the relational model. This kind of automated database design system would provide more assistance to a database designer in modelling the data semantics.

Contributions. Most previous SIC research concentrates on classifying and efficiently enforcing a few types of SICs. Current languages do not represent all features of a SIC precisely. Existing automated database aids do not provide adequate facilities for incorporating SICs into a design.
This research contributes to our understanding of database semantic integrity. On the theoretical side, there are the following contributions:

1. The SIC Representation model is defined to represent precisely the necessary features for specifying declarative and operational semantics of a SIC.
2. The E-R-SIC model is proposed to incorporate dynamic and static SICs in a database schema rather than in transactions and programs.
3. Algorithms are provided to reformulate and decompose SICs elicited using the E-R-SIC model, and to transform them into corresponding ones in the relational data model.

On the practical side, there are the following contributions:

1. A SIC elicitation subsystem has been proposed to help the database designer design SICs in addition to the structure part of a database schema.
2. This research provides a foundation for overcoming the well-known problem of representing data integrity semantics in current relational database systems. The resulting database would have the advantages of embedded SICs as described in Chapter 1. The SIC representation would facilitate efficient enforcement.
3. Although not empirically tested, we may conjecture that the information system conceptual design would more completely and systematically include data and information system semantics. The database designer would be able to model more data integrity semantics, thereby reducing the need for this information to be included in application programs. This research provides a starting point for future empirical tests.

8.2 Future Research Extensions to this Dissertation

This dissertation is part of a research program at the University of British Columbia to formalize the database design process and make databases more intelligent. There are many areas for further research to extend this dissertation. Some of these are suggested below.

• Non-binary Relationships. Currently, the E-R-SIC model assumes all relationship types to be binary.
It is assumed that a database designer would know how to use binary relationships to simulate non-binary ones (including recursive relationships, ternary relationships and relationships of higher degree). This restriction could be relaxed to allow the database designer to model non-binary relationships directly.

• Development and Implementation of Efficient Algorithms to Assure the Consistency, Nonredundancy, and Even Optimization of SICs. Proving the consistency, nonredundancy and optimization of arbitrary SIC statements is still an open issue. In addition, designing and implementing those verification algorithms needs more research. These are challenges for researchers in the fields of management information systems, computer science, and mathematics.

• Integration of SICs Elicited from Multiple Sources. This dissertation is based on a simplified assumption of a single database designer. View integration or synthesis is an important topic in database design research. Previous research (e.g., [Wagner, 1989]) has addressed this issue without considering SIC specifications. It is also possible that the SICs need to be obtained from several database designers. Future researchers may address this complicated issue.

• Programming Implementation. Future research would need to implement the proposed SIC elicitation subsystem. Then the produced SIC elicitation subsystem must be merged with a system to construct the structure of a schema (e.g., the View Creation System [Storey, 1988]) to provide the complete database schema design assistance to a database designer. It is also desirable to incorporate it with a view integration system (e.g., AVIS [Wagner, 1989]) and a future SIC integration system in the case that multiple sources for a database specification are needed. A complete automated database design system needs to be implemented.

• Add-on Knowledge, Capabilities and Features of the SIC Elicitation Subsystem.
Based on the E-R-SIC model proposed in this research, future researchers may add some common-sense, domain-dependent, and organization-dependent knowledge to the proposed automated database design system to make it more intelligent and provide more efficient and effective assistance for a database designer. Ideally, the system may have a learning capability so that it can accumulate knowledge each time it is used. It is also desirable to provide some natural language facilities and graphical interface so that a database designer can more easily describe entities and relationships, and directly design an E-R database on the screen.

• Transaction Modelling. In order to capture transaction-driven semantics and check the SICs related to transactions more efficiently, specifications of transactions, i.e., user-predefined operations, need to be specified. Future research may propose algorithms to transform SICs identified by the E-R-SIC model to the pre-conditions and post-conditions of transactions.

• Application of SICs in an Expert Database System. As stated earlier, SICs can be used to facilitate intelligent query evaluation and provide deductive capabilities. Further research may explore how to include and apply SICs represented by the Representation model in an expert database system.

• Design and Management of an Integrity Maintenance Subsystem in the Relational Database. Very few kinds of SICs are enforced in commercial relational database systems. One reason for this is probably concern for efficiency. Based on SIC specifications identified in this research, future researchers may design an integrity maintenance subsystem to enforce SICs efficiently in a relational database. The management of SICs, i.e., the insertion or deletion of SICs, after the database is populated should be carefully taken into consideration. Some administrative procedures may need to be invented to handle the “change of SICs”.

• Empirical Research for Testing “Usefulness”. Future empirical researchers may test two kinds of usefulness. The first is the usefulness of using database embedded SICs versus the traditional approach of enforcing integrity via application software. The second is the usefulness of the design approach adopting an automated database design system compared with the manual design approach (without any assistance of an automated design system) to incorporate the necessary SICs for embedding in a database. The automated database design system proposed in this dissertation can be taken as a tool.

Bibliography

[1] Abiteboul, S., and Vianu, V., “Transactions and Integrity Constraints”, Proc. of the Second ACM SIGACT-SIGMOD Symposium of Principles of Database Systems, Portland, Oregon, May 1985, pp. 193-204.
[2] Aho, A. V., Hopcroft, J. E., and Ullman, J. D., The Design and Analysis of Computer Algorithms, Addison-Wesley, Reading, Mass., 1974.
[3] Arisawa, H., and Miura, T., “On the Properties of Extended Inclusion Dependencies”, Proc. of the Twelfth International Conference on Very Large Data Base, Kyoto, August 1986, pp. 449-457.
[4] Azar, N., and Pichat, E., “Translation of an Extended Entity-Relationship Model into the Universal Relation with Inclusion Formalism”, in Entity-Relationship (the Fifth International Conference on E-R Approach, France, 1986) edited by S. Spaccapietra, Elsevier Science Publishers B.V. (North-Holland), 1987, pp. 253-268.
[5] Benci, E., Bodart, F., Bogaert, H., and Cabanes, A., “Concepts for the Design of a Conceptual Schema”, in Modelling in Data Base Management Systems, edited by G.M. Nijssen, North-Holland Publishing Co., 1976, pp. 181-200.
[6] Bernstein, P. A., Blaustein, B. T., and Clarke, E.
M., “Fast Maintenance of Semantic Integrity Assertions Using Redundant Aggregate Data”, Proc. of the Sixth International Conference on Very Large Data Base, 1980, pp. 126-136.
[7] Bertino, E., and Apuzzo, D., “Integrity Aspects in Data Base Management Systems”, Proc. Trends & Applications, Gaithersburg, Maryland, 1984, pp. 43-52.
[8] Biller, H., and Neuhold, E. J., “Semantics of Data Bases: The Semantics of Data Models”, Information Systems, Vol. 1 No. 3, 1978, pp. 273-292.
[9] Borgida, A., “Language Features for Flexible Handling of Exceptions in Information Systems”, ACM Transactions on Database Systems, Vol. 10, No. 4, December 1985, pp. 565-603.
[10] Bouzeghoub, M., and Gardarin, G., “The Design of An Expert System for Database Design” in New Applications of Data Bases, edited by G. Gardarin and E. Gelenbe, 1984, pp. 203-223.
[11] Bouzeghoub, M., Gardarin, G., and Metais, E., “Database Design Tools: An Expert System Approach”, Proc. of the Eleventh International Conference on Very Large Data Base, Stockholm, August 1985, pp. 82-95.
[12] Bouzeghoub, M., and Metais, E., “SECSI: An Expert System Approach for Database Design”, in Information Processing 86, edited by H.J. Kugler, Elsevier Science Publishers B.V. (North-Holland), 1986, pp. 251-257.
[13] Bracchi, G., Furtado, A., and Pelagatti, G., “Constraints Specification in Evolutionary Data Base Design”, in Formal Models and Practical Tools for Information Systems Design, edited by H.-J. Schneider, North-Holland Publishing Co., 1979, pp. 149-165.
[14] Brady, L. I., and Dampney, C. N. G., “Dynamics of Database Semantic Integrity or Managing the Meaning of Data”, Proc. of the Australian Computer Conference, Sydney, November 1984, pp. 82-96.
[15] Bragger, R. P., Dudler, A., Rebsamen, J., and Zehnder, C. A., “Gambit: An Interactive Database Design Tool for Data Structures, Integrity Constraints and Transactions”, International Conference on Data Engineering, Los Angeles, California, August 1984, pp. 399-407.
[16] Brodie, M.
L., Specification and Verification of Database Semantic Integrity, Ph.D. Thesis, University of Toronto, 1978 (also as Technical Report CSRG-91, April 1978).
[17] Brodie, M. L., “Association: A Database Abstraction for Semantic Modelling”, in Entity-Relationship Approach to Information Modelling and Analysis (the Second International Conference on E-R Approach, 1981), edited by P. P. Chen, Elsevier Science B.V. (North-Holland), 1983, pp. 577-602.
[18] Brodie, M. L., “On the Development of Data Models”, in On Conceptual Modelling, edited by M.L. Brodie, J. Mylopoulos and J.W. Schmidt, Springer-Verlag, 1984, pp. 19-47.
[19] Brodie, M. L., “Database Management: A Survey”, in On Knowledge-Base Management Systems, edited by M.L. Brodie and J. Mylopoulos, Springer-Verlag, 1986, pp. 201-218.
[20] Brodie, M. L., and Manola, F., “Database Management: A Survey”, in Foundations of Knowledge Base Management: Contributions from Logic, Databases, and Artificial Intelligence Applications, edited by J. W. Schmidt and C. Thanos, Springer-Verlag, 1989, pp. 205-234.
[21] Brodie, M. L., and Ridjanovic, D., “On the Design and Specification of Database Transactions”, in On Conceptual Modelling, edited by M.L. Brodie, J. Mylopoulos, and J. W. Schmidt, Springer-Verlag, 1984, pp. 277-306.
[22] Bry, F., and Manthey, R., “Checking Consistency of Database Constraints: a Logical Basis”, Proc. of the Twelfth International Conference on Very Large Data Base, Kyoto, August 1986, pp. 13-20.
[23] Bubenko, J. A., “Information Modeling in the Context of System Development”, Information Processing 80, edited by S. H. Lavington, North-Holland Publishing Co., 1980, pp. 395-411.
[24] Carswell, J. L., and Navathe, B., “SA-ER: A Methodology that Links Structured Analysis and Entity-Relationship Modelling for Database Design”, in Entity-Relationship (the Fifth International Conference on E-R Approach, France, 1986) edited by S. Spaccapietra, Elsevier Science Publishers B.V. (North-Holland), 1987, pp. 381-397.
[25] Casanova, M.
A., and Furtado, A. L., “On the Description of Database Transition Constraints Using Temporal Languages”, in Advances in Database Theory, Vol. 2, edited by H. Gallaire, J. Minker and J.M. Nicolas, Plenum Press, New York, 1984, pp. 211-236.
[26] Casanova, M. A., and Tucherman, L., “Enforcing Inclusion Dependencies and Referential Integrity”, Proc. of the Fourteenth International Conference on Very Large Data Base, California, 1988, pp. 38-49.
[27] Casanova, M. A., and Vidal, V. M. P., “Towards a Sound View Integration”, Proc. of the Second ACM SIGACT-SIGMOD Symposium of Principles of Database Systems, Atlanta, Georgia, March 1983, pp. 36-46.
[28] Cauvet, C., Proix, C., and Rolland, C., “A Knowledge Base for an Information System Design Tool”, in Methodologies for Intelligent Systems, edited by Z.W. Ras, and M. Zemankova, Elsevier Science Publishing Co. Inc., 1987, pp. 56-63.
[29] Ceri, S., and Widom, J., “Deriving Production Rules for Constraint Maintenance”, IBM Research Report, RJ 7348, March 1, 1990. (A short version appears in Proc. of the 16th VLDB Conference, Australia, 1990, pp. 566-577).
[30] Chakravarthy, U. S., Minker, J., and Grant, J., “Semantic Query Optimization: Additional Constraints and Control Strategies”, in Expert Database Systems, edited by L. Kerschberg, Benjamin/Cummings Pub. Co. Inc., 1987, pp. 345-377.
[31] Chen, P. P., “The Entity-Relationship Model — Toward a Unified View of Data”, ACM Transactions on Database Systems, Vol. 1 No. 1, December 1976, pp. 9-36.
[32] Chen, P. P., “Database Design Based on Entity and Relationship”, in Principles of Database Design, Vol. 1: Logical Organization, edited by S.B. Yao, Prentice-Hall, 1985, pp. 174-210.
[33] Choobineh, J., “Form Driven Conceptual Data Modelling”, Ph.D. Dissertation, Dept. Management Information Systems, University of Arizona, 1985.
[34] Choobineh, J., Mannino, M. V., Nunamaker, J.F., and Konsynski, B.R., “An Expert Database Design System Based on Analysis of Forms”, IEEE Transactions on Software Engineering, Vol. 14, No.
2, February 1988, pp. 242-253.
[35] Codd, E. F., “Extending the Database Relational Model to Capture More Meaning”, ACM Transactions on Database Systems, Vol. 4, No. 4, December 1979, pp. 397-434.
[36] Cohen, J., “Constraint Logic Programming Languages”, Communications of the ACM, Vol. 33, No. 7, July 1990, pp. 52-68.
[37] Colmerauer, A., “An Introduction to Prolog III”, Communications of the ACM, Vol. 33, No. 7, July 1990, pp. 69-90.
[38] Cosmadakis, S. C., and Kanellakis, P. C., “Equation Theories and Database Constraints”, Proc. of the 17th Annual ACM Symposium on Theory of Computing, Rhode Island, May 1985, pp. 273-284.
[39] Dampney, C. N. G., “Specifying a Semantically Adequate Structure for Information Systems and Databases”, in Entity-Relationship Approach (the Sixth International Conference on E-R Approach, New York, Nov. 1987), edited by S. T. March, Elsevier Science Publishers, B.V. (North-Holland), 1988, pp. 165-188.
[40] Date, C. J., An Introduction to Database System, Vol. II, Addison-Wesley Publishing Co., Reading, Massachusetts, 1983.
[41] Date, C. J., A Guide to the SQL Standard, Addison-Wesley Publishing Co., Reading, Massachusetts, 1987.
[42] Davis, J. P., and Bonnell, R. D., “Modelling Semantic Constraints with Logic in the EARL Data Model”, The Fifth International Conference on Data Engineering, Los Angeles, California, February 1989, pp. 226-233.
[43] de Castilho, J. M. V., Casanova, M. A., and Furtado, A. L., “A Temporal Framework for Database Specifications”, Proc. the Eighth International Conference on Very Large Data Base, Mexico City, Mexico, 1982, pp. 280-291.
[44] Delobel, C., “Normalization and Hierarchical Dependencies in the Relational Data Model”, ACM Transactions on Database Systems, Vol. 3, No. 3, September 1978, pp. 201-222.
[45] Dogac, A., and Chen P.
P., “Entity-Relationship Model in the ANSI/SPARC Framework”, in Entity-Relationship Approach to Information Modelling and Analysis (the Second International Conference on E-R Approach, 1981), edited by P. P. Chen, Elsevier Science B.V. (North-Holland), 1983, pp. 357-374.
[46] Dogac, A., Chen P. P. and Erol, N., “The Design and Implementation of an Integrity Subsystem for the Relational DBMS RAP”, in the Fourth Conference on Entity-Relationship Approach, Chicago, Illinois, October 1985, pp. 295-302.
[47] dos Santos, C. S., Neuhold, E. J., and Furtado, A. L., “A Data Type Approach to the Entity-Relationship Model”, in Entity-Relationship Approach to System Analysis and Design (the First International Conference on E-R Approach), edited by P. P. Chen, North-Holland Publishing Co., 1980, pp. 103-119.
[48] Ehrich, H. D., Lipeck, U. W., and Gogolla, M., “Specification, Semantics, and Enforcement of Dynamic Database Constraints”, Proc. of the Tenth International Conference on Very Large Data Base, Singapore, August 1984, pp. 301-308.
[49] Eswaran, K. P., and Chamberlin, D. D., “Functional Specifications of a Subsystem for Data Base Integrity”, Proc. the Second International Conference on Very Large Data Base, Framingham, Massachusetts, September 1975, pp. 48-68.
[50] Etzion, O., “PARDES — a Model for Supporting Derivation Closure”, Working Paper, Computer and Information Sciences Department, Temple University, Philadelphia, Pennsylvania, 1989.
[51] Fernandez, E. B., Summers, R. C., and Wood, C., Database Security and Integrity, Addison-Wesley Publishing Co., Reading, Massachusetts, 1981.
[52] Fleming, C. C., and Halle, B. V., Handbook of Relational Database Design, Addison-Wesley Publishing Co., Reading, Massachusetts, 1989.
[53] Fong, E., and Kimbleton, S. R., “Database Semantic Integrity for a Network Data Manager”, AFIPS Proceedings of National Computer Conference, (Vol. 49), California, May 1980, pp. 261-268.
[54] Frost, R. A.
(ed.), Database Management Systems, Granada Publishing Ltd., London, 1984.
[55] Furtado, A. L., dos Santos, C. S., and de Castilho, J. M. V., “Dynamic Modelling of a Simple Existence Constraint”, Information Systems, Vol. 6, 1981, pp. 73-81.
[56] Furtado, A. L., and Neuhold, E. J., Formal Techniques for Data Bases Design, Springer-Verlag, Berlin, 1986.
[57] Furtado, A. L., Casanova, M. A., and Tucherman, L., “The CHRIS CONSULTANT”, in Entity-Relationship Approach (the Sixth International Conference on E-R Approach, New York, Nov. 1987), edited by S. T. March, Elsevier Science Publishers, B.V. (North-Holland), 1988, pp. 515-532.
[58] Gardarin, G., and Melkanoff, M., “Proving Consistency of Database Transactions”, Proc. of the Fifth International Conference on Very Large Data Base, Brazil, October 1979, pp. 291-298.
[59] Goldstein, R. C., Database: Technology and Management, John Wiley & Sons, 1985.
[60] Goldstein, R. C., and Storey, V. C., “Data Abstraction: The Impact on Database Management”, Working Paper, Faculty of Commerce and Business Administration, The University of British Columbia, October 1990.
[61] Goldstein, R. C., and Wagner, C., Manual of Instruction Database Software for Microcomputers, Faculty of Commerce and Business Administration, The University of British Columbia, December 1988.
[62] Hammer, M. M., and McLeod, D. J., “Semantic Integrity in a Relational Data Base System”, Proc. the Second International Conference on Very Large Data Base, Framingham, Massachusetts, September 1975, pp. 25-47.
[63] Hammer, M. M., and McLeod, D. J., “A Framework for Data Base Semantic Integrity”, Proc. the Second International Conference on Software Engineering, San Francisco, California, October 1976, pp. 498-504.
[64] Hammer, M. M., and McLeod, D. J., “Database Description with SDM: A Semantic Database Model”, ACM Transactions on Database Systems, Vol. 6, No. 3, September 1981, pp. 351-386.
[65] Hentenryck, P.
V., Constraint Satisfaction in Logic Programming, MIT Press, Massachusetts, 1989.
[66] Hentenryck, P. V., “Incremental Constraint Satisfaction in Logic Programming”, in Logic Programming: Proc. of the Seventh International Conference, edited by D. H. D. Warren and P. Szeredi, 1990, pp. 189-202.
[67] Henschen, L. J., McCune, W. W., and Naqvi, S. A., “Compiling Constraint-Checking Programs from First-Order Formulas”, in Advances in Database Theory, Vol. , edited by H. Gallaire, J. Minker and J. M. Nicolas, Plenum Press, New York, 1984, pp. 145-169.
[68] Heuser, C. A., and Richter, G., “On the Relationship between Conceptual Schema and Integrity Constraints on Databases”, in Database Semantics (DS-1), edited by T. B. Steel, Jr. and R. Meersman, Elsevier Science Publishers, B.V. (North-Holland), 1986, pp. 27-39.
[69] Hillier, F. S., and Lieberman, G. J., Introduction to Operations Research, the fourth edition, Holden-Day, 1986.
[70] Ho, H. C., “Integrity Control in a Relational Database”, Technical Report S.O.C.S.828, School of Computer Science, McGill University, Montreal, Canada, March 1982.
[71] Hohenstein, U. and Hülsmann, K., “A Language for Specifying Static and Dynamic Integrity Constraints”, in the Tenth International Conference on E-R Approach, San Mateo, California, October 1991, (proceedings edited by T.J. Teorey), pp. 389-416.
[72] Holsapple, C., Shen, S., and Whinston, A., “A Consulting System for Database Design”, Information Systems, Vol. 7, No. 3, 1982, pp. 281-296.
[73] Hsu, A., and Imielinski, T., “Integrity Checking for Multiple Updates”, Proc. of ACM SIGMOD International Management of Data, Austin, Texas, May 1985, pp. 152-167.
[74] Hsu, C., Perry, A., Bouziane, M. and Cheung, W., “TSER: A Data Modelling System using the Two-Stage Entity-Relationship Approach”, in Entity-Relationship Approach (the Sixth International Conference on E-R Approach, New York, November 1987), edited by S. T. March, Elsevier Science Publishers, B.V.
(North-Holland), 1988, pp. 497-514.
[75] Hull, R., and King, R., “Semantic Database Modelling: Survey, Applications, and Research Issues”, ACM Computing Surveys, Vol. 19, No. 3, September 1987, pp. 201-260.
[76] Jaffar, J., and Lassez, J-L., “Constraint Logic Programming”, 14th Annual ACM Symposium on Principles of Programming Languages, Munich, West Germany, January 1987, pp. 111-119.
[77] Jaffar, J., and Michaylov, S., “Methodology and Implementation of a CLP system”, Proc. of 4th International Conference on Logic Programming, edited by J-L. Lassez, MIT Press, 1987, pp. 196-217.
[78] Jarke, M., and Vassiliou, Y., “Databases and Expert Systems: Opportunities and Architectures for Integration”, in New Applications of Databases, edited by G. Gardarin and E. Gelenbe, Academic Press London, 1984, pp. 185-201.
[79] Jones, C. B., Systematic Software Development Using VDM, Prentice-Hall, 1986.
[80] Kawaguchi, A., Taoka, N., Mizoguchi, R., Yamaguchi, T., and Kakusho, O., “An Intelligent Interview System for Conceptual Design of Database”, Proc. ECAI 1986.
[81] Kennedy, A. J., and Yen, D. C., “Enhancing a DBMS Through the Use of an Expert System”, Journal of Information Management, Spring 1990, pp. 55-61.
[82] Kent, W., “Entities and Relationships in Information”, in Architecture and Models in Data Base Management Systems, edited by G.M. Nijssen, North-Holland Publishing Co., 1977, pp. 67-91.
[83] Kent, W., “Limitations of Record-Based Information Models”, ACM Transactions on Database Systems, Vol. 4, No. 1, March 1979, pp. 107-131.
[84] Kent, W., “A Simple Guide to Five Normal Forms in Relational Database Theory”, Communications of the ACM, Volume 26, No. 2, February 1983, pp. 120-125.
[85] Kerstern, M. L., Weigand, H., Dignum, F., Boom, J., “A Conceptual Modelling Expert System”, in Entity-Relationship (the Fifth International Conference on E-R Approach, France, 1986) edited by S. Spaccapietra, Elsevier Science Publishers B.V.
(North-Holland), 1987, pp. 35-48.
[86] Kim, M.-J., Lee, W.-U., and Derniame, J.-C., “Automatic Relational Data Base Designs by Transformation of the Entity-Relationship Model”, IEEE the Second International Conference on Computer and Application, Beijing, China, 1987, pp. 418-425.
[87] Knuth, E., Hannák, L., Radó, P., “A Taxonomy of Conceptual Foundations”, in Data and Knowledge (DS-2), edited by R.A. Meersman and A.C. Sernadas, Elsevier Science Publishers, B.V. (North-Holland), 1988, pp. 205-219.
[88] Kobayashi, I., “Validating Database Updates”, Information Systems, Vol. 9, No. 1, 1984, pp. 1-17.
[89] Kozaczynski, W., and Lilien, L., “An Extended Entity-Relationship (E2R) Database Specification and its Automatic Verification and Transformation into the Logical Relational Design”, in Entity-Relationship Approach (the Sixth International Conference on E-R Approach, New York, November 1987), edited by S. T. March, Elsevier Science Publishers, B.V. (North-Holland), 1988, pp. 533-549.
[90] Kung, C. H., “A Temporal Framework for Database Specification and Verification”, Proc. of the Tenth International Conference on Very Large Data Base, Singapore, August 1984, pp. 91-99.
[91] Kung, C. H., “A Tableaux Approach for Consistency Checking”, Information Systems: Theoretical and Formal Aspects, edited by A. Sernadas, J. Bubenko, Jr., and A. Olive, 1985, pp. 191-207.
[92] Lafue, G., “Semantic Integrity Dependencies and Delayed Integrity Checking”, Proc. the Eighth International Conference on Very Large Data Base, Mexico City, Mexico, 1982, pp. 292-299.
[93] Lassez, C., “Constraint Logic Programming”, BYTE, August 1987, pp. 171-176.
[94] Lauriere, J. L., “A Language and a Program for Stating and Solving Combinatorial Problems”, Artificial Intelligence, Vol. 10, No. 1, 1978, pp. 29-127.
[95] Lee, R. M., “Logic, Semantics and Data Modelling: An Ontology”, in Data and Knowledge (DS-2), edited by R. A. Meersman and A. C. Sernadas, Elsevier Science Publishers, B.V.
(North-Holland), 1988, pp. 221-243.
[96] Lee, K., and Lee, S., “An Object-Oriented Approach to Data/Knowledge Modelling Based on Logic”, The Sixth International Conference on Data Engineering, California, February 1990, pp. 289-294.
[97] Lenzerini, M., and Nobili, P., “On the Satisfiability of Dependency Constraints in Entity-Relationship Schema”, Proc. the Thirteenth International Conference on Very Large Data Base, Brighton, 1987, pp. 147-154.
[98] Lenzerini, M., and Santucci, G., “Cardinality Constraints in the Entity-Relationship Model”, in Entity-Relationship Approach to Software Engineering (the Third International Conference on E-R Approach, California, 1983), edited by C. G. Davis, S. Jajodia, P. A. Ng and R. T. Yeh, Elsevier Science Publishers B. V. (North-Holland), 1983, pp. 529-549.
[99] Leveson, N. G., Wasserman, A. I., and Berry, D. M., “BASIS: A Behavioral Approach to the Specification of Information Systems”, Information Systems, Vol. 8, No. 1, 1983, pp. 15-23.
[100] Ling, T.-W., “Integrity Constraint Checking in Deductive Database using the Prolog not-Predicate”, Tech. Report, DISCS Pub. No. NOTRA7/86, National University of Singapore, July 1986.
[101] Ling, T.-W., and Rajagopalan, P., “A Method to Eliminate Avoidable Checking of Integrity Constraints”, Proc. Trends & Applications, Gaithersburg, Maryland, 1984, pp. 60-68.
[102] Lipeck, U. W., “Stepwise Specification of Dynamic Database Behaviour”, International Conference on Data Engineering, Washington, D.C., May 1986, pp. 387-397.
[103] Lockemann, P. C., “Object-Oriented Information Management”, Decision Support System, 5, 1989, pp. 79-102.
[104] Mackworth, A. K., “Consistency in Networks of Relations”, Artificial Intelligence, Vol. 8, 1977, pp. 99-118.
[105] Mackworth, A. K., and Freuder, E. C., “The Complexity of Some Polynomial Network Consistency Algorithms for Constraint Satisfaction Problems”, Artificial Intelligence, Vol. 25, No. 1, January 1985, pp. 65-74.
[106] Mackworth, A.
K., “Constraint Satisfaction”, Encyclopedia of Artificial Intelligence, edited by S. C. Shapiro, J. Wiley & Sons, N.Y., 1987, pp. 205-211.
[107] McFadden, F. R., and Hoffer, J. A., Data Base Management, the second edition, Benjamin/Cummings Publishing Co. Inc., 1988.
[108] Makowsky, J. A., Markowitz, V. M., Rotics, N., “Entity-Relationship Consistency For Relational Schemas”, Proc. International Conference on Database Theory, (Lecture Note in Computer Science V. 243) edited by G. Ausiello, and P. Atzeni, Springer-Verlag, 1986, pp. 306-322.
[109] Mannila, H., and Räihä, K-J, “Inclusion Dependencies in Database Design”, The Second International Conference on Data Engineering, Los Angeles, California, February 1986, pp. 713-718.
[110] Maryanski, F., Francis, S., Hong, S., and Peckham, J., “Generation of Conceptual Data Models”, Working Paper, Computer Science and Engineering Department, University of Connecticut, 1984.
[111] Maryanski, F., and Hong, S., “A Tool for Generating Semantic Database Applications”, IEEE COMPSAC, Chicago, Illinois, October 1985, pp. 368-375.
[112] Meersman, R., “Towards Models for Practical Reasoning about Conceptual Database Design”, in Data and Knowledge (DS-2), edited by R. A. Meersman and A. C. Sernadas, Elsevier Science Publishers, B.V. (North-Holland), 1988, pp. 245-263.
[113] Mees, M., and Put, F., “Extending a Dynamic Modelling Methods using Data Modelling Capabilities: The Case of JSD”, in Entity-Relationship (the Fifth International Conference on E-R Approach, France, 1986) edited by S. Spaccapietra, Elsevier Science Publishers B.V. (North-Holland), 1987, pp. 399-418.
[114] Missikoff, M., and Wiederhold, G., “Toward a Unified Approach for Expert and Database Systems”, in Expert Database Systems (Proceedings from the First International Workshop), edited by L. Kerschberg, Benjamin/Cummings Publishing, 1986, pp. 383-399.
[115] Morgenstern, M., “Active Databases as a Paradigm for Enhanced Computing Environments”, Proc.
the Ninth International Conference on Very Large Data Base, Italy, 1983, pp. 34-42.
[116] Morgenstern, M., “CONSTRAINT EQUATIONS: Declarative Expression of Constraints with Automatic Enforcement”, Proc. the Tenth International Conference on Very Large Data Base, Singapore, August 1984a, pp. 291-300.
[117] Morgenstern, M., “A Concise Compatible Representation for Quantified Constraints in Semantic Networks”, AAAI-84, Proc. of National Conference on Artificial Intelligence, Texas, August 1984b, pp. 255-259.
[118] Morgenstern, M., “The Role of Constraints in Databases, Expert Systems, and Knowledge Representation”, in Expert Database Systems, edited by L. Kerschberg, Benjamin/Cummings Publishing Co., 1986, pp. 351-368.
[119] Morgenstern, M., Borgida, A., Lassez, C., Maier, D., and Wiederhold, G., “Constraint-Based Systems: Knowledge about Data”, in Expert Database Systems (Proc. from the Second International Conference on EDS), edited by L. Kerschberg, Benjamin/Cummings Publishing Co., 1989, pp. 23-43.
[120] Nakano, R., “Integrity Checking in a Logic-Oriented ER Model”, in Entity-Relationship Approach to Software Engineering (the Third International Conference on E-R Approach, California, 1983), edited by C. G. Davis, S. Jajodia, P. A. Ng and R. T. Yeh, Elsevier Science Publishers B. V. (North-Holland), 1983, pp. 551-564.
[121] Newell, A., and Simon H. A., “Computer Science as Empirical Inquiry: Symbols and Search”, Communications of the ACM, Volume 19, March 1976, pp. 113-126.
[122] Nicolas, J.-M., “Logic for Improving Integrity Checking in Relational Data Bases”, Acta Informatica, 18, 1982, pp. 227-253.
[123] Nilsson, N.
J., Principles of Artificial Intelligence, Tioga Publishing Company, 1980.
[124] Obretenov, D., Angelov, Z., Mihaylov, J., Dishlieva, P., and Kirova, N., “A Knowledge-Based Approach to Relational Database Design”, Data & Knowledge Engineering, 3, 1988, pp. 173-180.
[125] Oren, O., “Integrity Constraints in the Conceptual Schema Language SYSDOC”, the Fourth Conference on Entity-Relationship Approach, Chicago, Illinois, October 1985, pp. 288-294.
[126] Palmer, I. R., “Practicalities in Applying a Formal Methodology to Data Analysis”, in Data Base Design Techniques I: Requirements and Logical Structures, edited by S. B. Yao, et al., Springer-Verlag, Berlin, 1982, pp. 147-171.
[127] Papadimitriou, C. H., and Steiglitz, K., Combinatorial Optimization: Algorithms and Complexity, Prentice-Hall, N.J., 1982.
[128] Paulson, D., Reasoning Tools to Support System Analysis and Design, Unpublished Ph.D. Dissertation, The University of British Columbia, Vancouver, B.C., Canada, 1989.
[129] Peckham, J., and Maryanski, F., “Semantic Data Models”, ACM Computing Surveys, Vol. 20, No. 3, September 1988, pp. 153-189.
[130] Potter, W. D., and Kerschberg, L., “A Unified Approach to Modelling Knowledge and Data”, in Data and Knowledge (DS-2), edited by R. A. Meersman and A. C. Sernadas, Elsevier Science Publishers, B.V. (North-Holland), 1988, pp. 265-291.
[131] Proix, C., and Rolland, C., “A Knowledge Base for Information System Design”, in Data and Knowledge (DS-2), edited by R. A. Meersman and A. C. Sernadas, Elsevier Science Publishers, B.V. (North-Holland), 1988, pp. 293-306.
[132] Qian, X., and Wiederhold, G., “Knowledge-based Integrity Constraint Validation”, Proceedings of the Twelfth International Conference on Very Large Data Base, Kyoto, 1986, pp. 3-12.
[133] Qian, X., and Smith, D. R., “Integrity Constraint Reformulation for Efficient Validation”, Proceedings of the Thirteenth International Conference on Very Large Data Base, Brighton, 1987, pp. 417-425.
[134] Raju, K., and Majumdar, A.
K., “Fuzzy Functional Dependencies and Lossless Join Decomposition of Fuzzy Relational Database Systems”, ACM Transactions on Database Systems, Vol. 13, No. 2, June 1988, pp. 129-166.
[135] Ram, S., “Automated Tools for Database Design: State of the Art”, Working Paper, Dept. of Management Information Systems, College of Business and Public Administration, University of Arizona, 1989.
[136] Reiter, R., “On the Integrity of Typed First Order Data Bases”, in Advances in Database Theory, Vol. 1, edited by H. Gallaire, J. Minker and J. M. Nicolas, Plenum Press, New York, 1984, pp. 137-157.
[137] Reiter, R., “On Integrity Constraints”, Proc. of the Second Conference on Theoretical Aspects of Reasoning about Knowledge, 1988, pp. 97-111.
[138] Rolland, C., and Proix, C., “An Expert System Approach to Information System Design”, in Information Processing 86, edited by H. J. Kugler, Elsevier Science Publishers B.V. (North-Holland), 1986, pp. 241-250.
[139] Sakai, H., “On the Optimization of an Entity-Relationship Model”, 3rd USA-JAPAN Computer Conference, San Francisco, California, October 1978, pp. 145-149.
[140] Sakai, H., “A Method for Entity-Relationship Behaviour Modelling”, in Entity-Relationship Approach to Software Engineering (the Third International Conference on E-R Approach, California, 1983), edited by C. G. Davis, S. Jajodia, P. A. Ng and R. T. Yeh, Elsevier Science Publishers B. V. (North-Holland), 1983a, pp. 111-129.
[141] Sakai, H., “E-R Approach to Logical Database Design”, in Entity-Relationship Approach to Software Engineering (the Third International Conference on E-R Approach, California, 1983), edited by C. G. Davis, S. Jajodia, P. A. Ng and R. T. Yeh, Elsevier Science Publishers B. V. (North-Holland), 1983b, pp. 155-187.
[142] Scheuermann, P., Schiffner, G., and Weber, H., “Abstraction Capabilities and Invariant Properties Modelling within the Entity-Relationship Approach”, in Entity-Relationship Approach to System Analysis and Design (the First International Conference on E-R Approach), edited by P.
P. Chen, North-Holland Publishing Co., 1980, pp. 121-140.
[143] Schrefl, M., Tjoa, A. M., and Wagner, R. R., “Comparison-Criteria for Semantic Data Models”, International Conference on Data Engineering, 1984, pp. 120-124.
[144] Segev, A., “Transitive Dependencies”, in the Surveyors’ Forum of ACM Computing Surveys, Vol. 19, No. 2, June 1987, pp. 191-193.
[145] Shepherd, A., and Kerschberg, L., “Constraint Management in Expert Database Systems”, in Expert Database Systems (Proc. from the First International Workshop), edited by Larry Kerschberg, Benjamin/Cummings Publishing Co., 1986, pp. 309-331.
[146] Simon, E., and Valduriez, P., “Design and Implementation of an Extendible Integrity Subsystem”, ACM-SIGMOD Proc. International Conference on Management of Data, Boston, Massachusetts, June 1984, pp. 9-17.
[147] Smith, J. M., and Smith, D. C. P., “Database Abstractions: Aggregation and Generalization”, ACM Transactions on Database Systems, Vol. 2, No. 2, June 1977a, pp. 105-133.
[148] Smith, J. M., and Smith, D. C. P., “Database Abstractions: Aggregation”, Communications of the ACM, Vol. 20, No. 6, June 1977b, pp. 405-413.
[149] Solvberg, A., and Kung, C. H., “On Structural and Behavioral Modelling of Reality”, in Database Semantics (DS-1), edited by T. B. Steel, Jr. and R. Meersman, Elsevier Science Publishers, B.V. (North-Holland), 1986, pp. 205-221.
[150] Spivey, J. M., Understanding Z, Cambridge University Press, 1988.
[151] Stonebraker, M., “Implementation of Integrity Constraints and Views by Query Modification”, Proc. of ACM SIGMOD International Management of Data, San Jose, May 1975, pp. 65-78.
[152] Storey, V. C., View Creation: An Expert View Creation System for Database Design, Ph.D. Dissertation, Faculty of Commerce and Business Administration, University of British Columbia, Vancouver, B.C., Canada, October 1986, ICIT Press, 1988.
[153] Storey, V. C., and Goldstein, R. C., “A Methodology for Creating User Views in Database Design”, ACM Transactions on Database Systems, Vol. 13, No.
3, September 1988, pp. 305-338.
[154] Storey, V. C., and Goldstein, R. C., “Design and Development of an Expert Database Design System”, International Journal of Expert Systems: Research and Applications, Vol. 3, No. 1, 1990, pp. 31-63.
[155] Storey, V. C., and Goldstein, R. C., “Knowledge-Based Approaches to Database Design”, Working Paper, University of Rochester, 1991.
[156] Studer, R., “A Conceptual Model for Physical and Logical time”, in Entity-Relationship Approach (the Sixth International Conference on E-R Approach, New York, Nov. 1987), edited by S. T. March, Elsevier Science Publishers, B.V. (North-Holland), 1988, pp. 223-235.
[157] Su, S. Y. W., and Raschid, L., “Incorporating Knowledge Rules in a Semantic Data Model: An Approach to Integrated Knowledge Management”, the Second Conference on Artificial Intelligence Application, Miami, Florida, December 1985, pp. 250-256.
[158] Tauzovich, B., “An Expert System for Conceptual Data Modelling”, in the Eighth International Conference on E-R Approach, Toronto, Canada, October 1989.
[159] Tauzovich, B., “Towards Temporal Extensions to the Entity-Relationship Model”, in the Tenth International Conference on E-R Approach, San Mateo, California, October 1991 (proceedings edited by T. J. Teorey), pp. 163-179.
[160] Teorey, T. J., Yang, D., and Fry, J. P., “A Logical Design Methodology for Relational Databases Using the Extended Entity-Relationship Model”, Computing Surveys, Vol. 18, No. 2, June 1986, pp. 197-222.
[161] Thompson, J. P., Data with Semantics: Data Models and Data Management, Van Nostrand Reinhold, N.Y., 1989.
[162] Troyer, O. D., “RIDL*: A Tool for the Computer-Assisted Engineering of Large Databases in the Presence of Integrity Constraints”, Proc. of ACM SIGMOD International Management of Data, Oregon, June 1989, pp. 418-429.
[163] Tsichritzis, D. C., and Lochovsky, F. H., Data Models, Prentice-Hall Inc., 1982.
[164] Ullman, J. D., Principles of Database Systems, the second edition, Computer Science Press, Rockville,
Maryland, 1982.
[165] Urban, S. D., and Delcambre, L. M. L., “An Analysis of the Structural, Dynamic, and Temporal Aspects of Semantic Data Models”, The Second International Conference on Data Engineering, Los Angeles, California, February 1986, pp. 382-389.
[166] Urban, S. D., and Delcambre, L. M. L., “Constraint Analysis for Specifying Perspectives of Class Objects”, The Fifth International Conference on Data Engineering, Los Angeles, California, February 1989, pp. 10-17.
[167] van der Lans, R. F., The SQL Standard: A Complete Reference, originally in Dutch, translated into English by Andrea Gray, original Dutch published by Academic Science, Schoonhoven, 1988 (translated English version published by Prentice Hall, 1989).
[168] Wagner, C., View Integration in Database Design, Unpublished Ph.D. Dissertation, The University of British Columbia, Vancouver, B.C., Canada, April 1989.
[169] Wand, Y., and Weber, R., “An Ontological Analysis of Some Fundamental Information Systems Concepts”, Proc. of the Ninth International Conference on Information Systems, Minneapolis, Mn., 1988, pp. 213-225.
[170] Wand, Y., and Weber, R., “An Ontological Analysis of Some Systems Analysis and Design Methods”, in Information Systems Concepts: An In-Depth Analysis, edited by E. Falkenberg and P. Lindgreen, North-Holland Publishing Co., Amsterdam, 1989, pp. 79-107.
[171] Wand, Y., and Weber, R., “Toward a Theory of the Deep Structure of Information Systems”, Proc. of the Eleventh International Conference on Information Systems, Copenhagen, December 1990.
[172] Weber, W., Stucky, W., and Karszt, J., “Integrity Checking in Data Base Systems”, Information Systems, Vol. 8, No. 2, 1983, pp. 125-136.
[173] Webre, N. W., “An Extended Entity-Relationship Model and its Use on a Defense Project”, in Entity-Relationship Approach to Information Modelling and Analysis (the Second International Conference on E-R Approach, 1981), edited by P. P. Chen, Elsevier Science (North-Holland Publishing Co.), 1983, pp. 173-193.
[174] Wilmot, R.
B., “Foreign Keys Decrease Adaptability of Database Designs”, Communications of the ACM, Vol. 27, No. 12, December 1984, pp. 1237-1243.
[175] Wong, H. K. T., and Mylopoulos, J., “Two Views of Data Semantics: A Survey of Data Models in Artificial Intelligence and Database Management”, INFOR, Vol. 15, No. 3, October 1977, pp. 344-383.
[176] Yang, H.-L., and Goldstein, R. C., “Identification of Semantic Integrity Constraints for Database Design”, Working Paper, 89-MIS-021, Faculty of Commerce and Business Administration, The University of British Columbia, 1989.
[177] Yang, H.-L., “Semantic Integrity Constraint Representation Model: Some Illustrative Examples”, Working Paper, Faculty of Commerce and Business Administration, The University of British Columbia, February 1992.
[178] Yasdi, R., and Ziarko, W., “Conceptual Schema Design: A Machine Learning Approach”, in Methodologies for Intelligent Systems, edited by Z. W. Ras and M. Zemankova, Elsevier Science Publishers Co., 1987, pp. 379-391.
[179] Zvieli, A., and Chen, P. P., “Entity-Relationship Modelling and Fuzzy Databases”, The Second International Conference on Data Engineering, Los Angeles, California, February 1986, pp. 320-327.

Appendix A
BNF Descriptions of the SIC Representation Model

This appendix provides a summary of the syntax of the SIC Representation model used in this research.
The following usual BNF meta-symbols are used: ::= | < > { } [ ]
If a terminal symbol happens to be identical to a meta-symbol, it will be written within a pair of escape symbols, e.g., :j representing the terminal.

<SIC statement> ::= [<SIC name>]
    CERTAINTY <certainty factor>
    FOR <object type name>
    ON <operation type>
    [IF <assertions>]
    ASSERT <assertions>
    ELSE <violation action>

<SIC name> ::= <object type name>-<operation abbreviation>-<SIC type>[-<optional part>]
SIC type is specified by the automated database design system or the database designer using some conventions.

<optional part> ::= <related object type set> [-<sequence number>]

<object type name> ::= <entity type name> | <relationship type name> | <relation type name> | <attribute type name>

<attribute type name> ::= <entity type name>.<attribute name> | <relationship type name>.<attribute name> | <relation type name>.<attribute name>
Entity type name, relationship type name, relation type name, and attribute name are all specified by the database designer and, by convention, begin with a capital letter.

<operation abbreviation> ::= I | D | U

<related object type set> ::= (<related object>{,<related object>})

<related object> ::= <object type name> | <Prolog variable>
Prolog variable is any name beginning with a capital letter or an underline sign (following the Prolog convention). In this case, it is used to represent a single object type name or a set including some object type names.

<sequence number> ::= 1 | 2 | 3 | ...
Note that they are positive integer numbers.

<certainty factor> ::= <binary certainty factor> | <fuzzy certainty factor> | <certainty number>

<binary certainty factor> ::= certain | uncertain
Note that “certain” is equivalent to the ratio certainty number 100%; “uncertain” expresses an unknown ratio certainty number that is less than 100%.

<fuzzy certainty factor> ::= usually | sometimes
Note that the database designer would specify these fuzzy certainty factors to be equivalent to some certainty numbers (for example, “usually” may be set to 80%).

<certainty number> ::= <ordinal certainty number> | <ratio certainty number>

<ordinal certainty number> ::= 1 | 2 | 3 | ...
Note that they are positive integer numbers.

<ratio certainty number> ::= 1% | ... | 100%
Note that they are positive real numbers from 1% to 100% inclusive and are represented as percentages.

<operation type> ::= insertion | deletion | update

<assertions> ::= [<quantifier> <variable> <and_connective>] <assertion_and_list> [<or_connective> <assertions>]

<assertion_and_list> ::= <expression> [<and_connective> <assertion_and_list>]

<expression> ::= (<assertions>) | <not> (<assertions>) | <logical predicate> | <set expression> | <arithmetic expression> | <date expression>

<logical predicate> ::= <predicate name>[(<argument> {,<argument>})]
Predicate name is any name beginning with a lower-case letter, following the usual Prolog convention. The built-in logical predicates for storing or manipulating information in the data dictionary are listed in Appendix B.

<not> ::= not | ¬

<and_connective> ::= ∧

<or_connective> ::= ∨

<quantifier> ::= ∀ | ∃

<argument> ::= <constant> | <EntRshipRel variable value> | <arithmetic simple expression> | <date simple expression>

<constant> ::= <numeric constant> | <string> | <atom> | <date constant>
Atom is the same as in Prolog, i.e., it is made up of letters and digits, and begins with a lower-case letter; string is a character string.

<numeric constant> ::= <integer number> | <real number>

<date constant> ::= <time point> | <time interval>
Time point is a specific date following any date representation convention within a pair of double quotation marks (the system takes responsibility for deciphering the input string and interpreting it as a date); time interval is a period represented by a real number followed by one time unit, i.e., second(s), minute(s), hour(s), day(s), month(s), and year(s), within a pair of quotation marks; e.g., “3.5 hours”.

<variable> ::= <EntRshipRel variable> | <other_variable_except_ERR>

<EntRshipRel variable> ::= <entity type name>[<subscript>] | <relationship type name>[<subscript>] | <relation type name>[<subscript>]

<subscript> ::= 0 | 1 | 2 | ...
Note that they are non-negative integer numbers.

<other_variable_except_ERR> ::= <attribute variable> | <Prolog variable>

<attribute variable> ::= <entity type name>[<subscript>].<attribute name> | <relationship type name>[<subscript>].<attribute name> | <relation type name>[<subscript>].<attribute name>

<set expression> ::= [set] {<variable> :j: <expression>}

<arithmetic simple expression> ::= [<positive_negative_sign>] <arithmetic term> {<arithmetic_plus_minus> <arithmetic term>}

<arithmetic term> ::= <arithmetic subterm> {<arithmetic_times_divide> <arithmetic subterm>}

<arithmetic subterm> ::= <numeric constant> | <other variable value> | <aggregate function>(<other variable value>) | count(<EntRshipRel variable value>) | <aggregate function>(<set expression>) | (<arithmetic simple expression>) [<exponential operator> (<arithmetic simple expression>)]

<other variable value> ::= <other_variable_except_ERR> | new(<other_variable_except_ERR>) | old(<other_variable_except_ERR>)

<EntRshipRel variable value> ::= <EntRshipRel variable> | new(<EntRshipRel variable>) | old(<EntRshipRel variable>)

<positive_negative_sign> ::= + | -

<arithmetic_plus_minus> ::= + | -

<arithmetic_times_divide> ::= x | /

<exponential operator> ::=

<aggregate function> ::= sum | avg | min | max | count | <user-defined aggregate function>

<arithmetic expression> ::= <arithmetic simple expression> [<not>] <comparison operator> <arithmetic simple expression>

<comparison operator> ::= < | > | = | ...

<date simple expression> ::= [<positive_negative_sign>] <date subterm> {<date operator> <date subterm>}

<date subterm> ::= <date constant> | <other variable value> | <date function>(<other variable value>) | (<date simple expression>)

<date operator> ::= + | -

<date function> ::= year | month | day | minute | second

<date expression> ::= <date simple expression> [<not>] <comparison operator> <date simple expression>

<violation action> ::= <rejection> | <propagation> | warning

<rejection> ::= reject | conditionally_reject

<propagation> ::= propagate(<propagated_action>) | conditionally_propagate(<propagated_action>)

<propagated_action> ::= insert(<variable>) | delete(<variable>) | insert_all(<variable>) | delete_all(<variable>) | insert(<entity role type>, <variable>) | delete(<entity role type>, <variable>) | insert_all(<entity role type>, <variable>) | delete_all(<entity role type>, <variable>) | update(<variable>, <arithmetic simple expression>) | update(<variable>, <date simple expression>)

<entity role type> ::= “<entity type name>” | <Prolog variable>
Prolog variable is used to represent an entity type name in this case.

Appendix B
Summary of the Predicates used in this Research

This appendix provides a summary of the logical predicates used with the SIC Representation model. The listing is not intended to be comprehensive. All the “input predicates” (in Appendix B.1) and most of the “manipulation predicates” (e.g., ent_occ, in Appendix B.2) are used only in conjunction with generic SICs. Other manipulation predicates (e.g., rship_occ_part, is_null) are represented as Prolog “sub-procedures”, which are convenient for SIC representation. However, it is possible to represent a specific SIC without using any of these predicates.

B.1 Input Predicates

The following predicates are suggested by this research for conveying semantic information about a database.
They are used by a database designer to indicate that there are some integrity constraints on specific objects, which would inherit relevant generic SICs. They are information provided by the database designer and stored in a database.
• The predicate domain(Domain_Name, Data_Type, Format, Value_Range) is used to define a user-defined domain, Domain_Name, e.g., money, location; its system pre-defined Data_Type, e.g., arithmetic, date, non_arithmetic; its format; and its value range. A data type is arithmetic if all of the normal arithmetic operations may be performed on it in the usual way. Specifying a non-arithmetic data type for a data item implies that the data may not be used in conventional arithmetic operations. Following the idea of the EBASE system to simplify the database designer’s work ([Goldstein and Wagner, 1988]), no format specification is needed for date and arithmetic data type domains. For non-arithmetic data types, the format is comprised of four basic symbols enclosed in a pair of quotation marks:
A for an alphabetic character, i.e., A to Z, a to z, period or blank;
9 for a numeric character, i.e., 0 to 9;
for any special character, e.g., - / , . [ ] { } ! etc. or blank;
X for an alpha-numeric character (i.e., any of the above).
If a format contains a string of one type of character, it can be expressed more concisely as “7©X”, which means “XXXXXXX”. The range is expressed either as a continuous range from Beginning_value to End_value in the format of [Beginning_valueEnd_value], or as an enumerated set.
Following the usual mathematical convention, the symbols are used here as follows:
“[” means the beginning of a closed range, i.e., including the Beginning_value;
“]” means the end of a closed range, i.e., including the End_value;
“(” means the beginning of an open range, i.e., excluding the Beginning_value;
“)” means the end of an open range, i.e., excluding the End_value.
The special symbol “” is used if there is no upper or lower bound.
• The predicate attribute(Entity/Relationship_Type, Attribute_Name, Domain_Name, Special_Value_Range, Null?, Unique?, Candidate_Key_Attribute?, Changeable?) is used to specify information about an attribute of an entity or relationship type. Domain_Name is its value domain definition. Special_Value_Range is its specific value range information (e.g., Employee.Salary has the domain of money with the specific value range between $2,000 and $100,000). The four binary variables, Null?, Unique?, Candidate_Key_Attribute? and Changeable?, indicate whether the attribute is allowed to be null, unique, a part of a candidate key, and changeable.
• The predicate entity(Entity_Type, Primary_Key, Composite_Key_Set, Absolute_Max_Cardinality) is used to describe Entity_Type. Primary_Key specifies its primary key. Composite_Key_Set is a set comprised of all composite keys (including primary and non-primary composite keys). Absolute_Max_Cardinality specifies the maximum number of occurrences that are allowed in the Entity_Type.
• The predicate relationship_participant(Relationship_Type, Entity_Type, Min_Cardinality, Max_Cardinality) specifies Relationship_Type’s participant, Entity_Type, and the usual relationship cardinalities relative to it.
This predicate is used for each relationship in the E-R-SIC model. In the relational model, it is used for those relationships that are separately represented. If a relationship is hidden in an entity relation, its two related relationship_participant predicates are deleted, and another relationship_hidden_entity predicate is created (see below).
• The predicate relationship_hidden_entity(RelationshipType, EntityType1, ForeignEntType, MinCard2, MaxCard2) is used in the relational model to indicate that RelationshipType is represented via a foreign key in the EntityType1 relation. It also indicates that the relative minimum and maximum cardinalities of ForeignEntType in the RelationshipType are MinCard2 and MaxCard2, respectively.
• The predicate symmetric(Relationship) is used to indicate that a relationship is symmetric.
• The predicate transitive(Relationship) is used to indicate that a relationship is transitive.
• The predicate subset_rship((RshipType1, EntType1, EntType2), (RshipType2, EntType3, EntType4)) is used to indicate that the first relationship type is a subset of the second relationship type in the sense that EntType3 and EntType4 are the participants in the second relationship type corresponding respectively to EntType1 and EntType2 in the first relationship type. Usually, EntType1 is the same as EntType3, and EntType2 is the same as EntType4. However, it is possible that EntType1 and EntType3 are different subtypes of a super-type, and EntType2 and EntType4 are different subtypes of another super-type (these two super-types may be the same).
In that case, if an occurrence E1 of EntType1 connects with an occurrence E2 of EntType2 in RshipType1, an EntType3 occurrence, corresponding to the super-type occurrence of E1, should also connect with an EntType4 occurrence, corresponding to the super-type occurrence of E2, in RshipType2.
• The predicate rships_union_condition((SuperRship, EntType1, EntType2), SubRshipSet) is used to indicate a necessary condition for the case that the first relationship type is the union of the other relationship types that are in the set, i.e., the second argument of the predicate is a set in the form of {(SubRshipi, EntTypei, EntTypej)}, i, j = 1, 2, ..., i ≠ j. Similarly to the case of the predicate subset_rship, usually the EntTypei’s are the same as EntType1, and the EntTypej’s are the same as EntType2. It is also possible that the EntTypei’s and EntType1 are different subtypes of a super-type, and the EntTypej’s and EntType2 are different subtypes of another super-type (these two super-types may be the same). In that case, if there is a SuperRship occurrence connecting an occurrence E1 of EntType1 with an occurrence E2 of EntType2, there should be at least one SubRshipi occurrence connecting an EntTypei occurrence, corresponding to the super-type occurrence of E1, with an EntTypej occurrence, corresponding to the super-type occurrence of E2.
• The predicate rships_intersect_condition((SubRship, EntType1, EntType2), SuperRshipSet) is used to indicate that the first relationship type is the intersection of the other relationship types that are in the set. The second argument of the predicate, SuperRshipSet, is a set in the form of {(SuperRshipi, EntTypei, EntTypej)}, i, j = 1, 2, ..., i ≠ j. Similarly to the case of the predicate rships_union_condition, this kind of SIC may occur in a specialization hierarchy.
In that case, if each SuperRshipi has an occurrence connecting an EntTypei occurrence, corresponding to a common super-type occurrence (say, Ek), with its EntTypej occurrence, also corresponding to a common super-type occurrence (say, Fk), there should be a SubRship occurrence connecting an EntType1 occurrence, corresponding to Ek, with an EntType2 occurrence, corresponding to Fk.
• The predicate ex_rships(ExRshipSet) takes a set ExRshipSet of the form {(Rship1, SharingEnt1), (Rship2, SharingEnt2)}. SharingEnt1 and SharingEnt2 are usually the same. In this case, the predicate is used to indicate that an occurrence of the sharing entity type is allowed to participate in only one of these two relationship types. SharingEnt1 and SharingEnt2 may also be different subtypes of a super-type in a specialization hierarchy. In that case, this predicate is used to indicate that if an occurrence E of a sharing entity type participates in one of these two relationship types, the occurrence of the other sharing entity type, corresponding to the super-type occurrence of E, cannot participate in the other relationship type.
• The predicate ex_occ(ExoccRshipSet), where ExoccRshipSet has the form of {(RshipType1, EntType1, EntType2), (RshipType2, EntType3, EntType4)}, is used to indicate that there should not be any common occurrences of these two relationship types in the sense that EntType3 and EntType4 are the participants in the second relationship corresponding respectively to EntType1 and EntType2 in the first relationship.
Similarly to the case of the predicate ex_rships(ExRshipSet), EntType1 and EntType3 may be different subtypes of a super-type, and EntType2 and EntType4 may also be different subtypes of another super-type.
• The predicate not_andrships(NotAndRshipSet) takes a set NotAndRshipSet of the form {(Rshipi, SharingEnti)}, i = 1, 2, ... Similarly to the case of the predicate ex_rships(ExRshipSet), the SharingEnti’s are usually the same for the Rshipi’s. In this case, the predicate is used to indicate that an occurrence of the sharing entity type cannot participate in all of these relationship types. The SharingEnti’s may also be different subtypes of a super-type in a specialization hierarchy. In that case, at least one SharingEnti occurrence, corresponding to the same super-type occurrence, cannot participate in its Rshipi.
• The predicate either_rships(EitherRshipSet) takes a set EitherRshipSet of the form {(Rshipi, SharingEnti)}, i = 1, 2, ... Similarly to the case of the predicate ex_rships(ExRshipSet), the SharingEnti’s are usually the same for the Rshipi’s. In this case, the predicate is used to indicate that an occurrence of the sharing entity type must participate in at least one of these relationship types. If the SharingEnti’s are different, at least one occurrence of SharingEnti, corresponding to the same super-type occurrence, must participate in its Rshipi. It is desirable to have two sub-types of these SICs with different violation actions: one for the case with only two relationships, and one for the case with more than two relationships.
However, the predicate can be the same.
• The predicate before((Rship1, SharingEnt1), (Rship2, SharingEnt2)) is used to indicate that at the time an occurrence of SharingEnt1 is going to participate in Rship1, the occurrence of SharingEnt2, corresponding to the same super-type occurrence, must participate in Rship2.
• The predicate not_before((Rship1, SharingEnt1), (Rship2, SharingEnt2)) is used to indicate that at the time an occurrence of SharingEnt1 is going to participate in Rship1 type, the occurrence of SharingEnt2, corresponding to the same super-type occurrence, must participate in Rship2.
• The predicate rships_join((RshipType1, EntType1, EntTypeN), RshipList) is used to indicate that if there is a linking path via those relationship types in RshipList to connect two entity occurrences in EntType1 and EntTypeN together, these two entity occurrences must be connected via RshipType1. RshipList is an ordered set (list) in the form of (RshipTypei, EntTypej, EntTypek) where i, k = 2, 3, ..., N and j = 1, 2, ..., N-1. It is a necessary condition for asserting that RshipType1 is the join of these RshipTypei’s, that is, RshipType1[EntType1, EntTypeN] = RshipType2[EntType1, EntType2] ⋈ RshipType3[EntType2, EntType3] ⋈ ... ⋈ RshipTypeN[EntTypeN-1, EntTypeN]. Since its representation is complicated, the unusual cases in a specialization hierarchy are not considered.
• The predicate rship_dep_loopn_rships((RshipType1, EntType1, EntTypeN), RshipList) is used to indicate that if an occurrence of RshipType1 exists, there should be a linking path via other relationship types in RshipList to connect its participating entity occurrences (in EntType1 and EntTypeN) together. RshipList is an ordered set (list) in the form of (RshipTypei, EntTypej, EntTypek) where i, k = 2, 3, ..., N and j = 1, 2, ..., N-1.
Both this and the above rships_join predicate are needed to guarantee that RshipType1 is the join of these RshipTypei’s. That is, RshipType1[EntType1, EntTypeN] = RshipType2[EntType1, EntType2] ⋈ RshipType3[EntType2, EntType3] ⋈ ... ⋈ RshipTypeN[EntTypeN-1, EntTypeN]. Since its representation is complicated, the unusual cases in a specialization hierarchy are not considered.
• The predicate weak_entity(EntityType, RelationshipType) is used to indicate that the entity type is weak and RelationshipType is the one that it depends on.
• The predicate id_depend(EntityType, KeyAttSet, RelationshipType) is used to indicate that because of the semantics of RelationshipType (e.g., implying ID dependency or as an is_a relationship, etc.), EntityType incorporates the primary key of the other participating entity type as (part of) its candidate key attributes KeyAttSet.
• The predicate critical_rship(RelationshipType, EntType, CriticalAtt) is used to indicate that not only is RelationshipType total to EntType, but also exactly one critical relationship occurrence exists for each occurrence of EntType. CriticalAtt is a binary attribute in RelationshipType to indicate whether a relationship occurrence is critical.
• The predicate completeness_mapping(RelationshipType) is used to indicate that a relationship type is complete, which means that for each occurrence of one entity type, all occurrences of the other entity type must relate to it via this relationship type.
• The predicate rship_trigger_rship((Rship1, SharingEnt1), (Rship2, SharingEnt2)) is used to indicate that if an occurrence of Rship1, in which an occurrence E of SharingEnt1 participated, existed in the past (and no longer exists now), the SharingEnt2 occurrence corresponding to the super-type occurrence of E must participate in Rship2.
• The predicate ex_ents(ExEntSet) is used to indicate that the two entity types in the set are exclusive. The ExEntSet contains only two entity types.
The ExEntSet contains only two entity types. If three or more entity types are mutually exclusive, we can specify more than one ex_ents predicate.
• The predicate ents_intersect_condition(SubEntType, SuperEntSet) is used to indicate a necessary condition for the case that SubEntType is the intersection of the entity types in the set SuperEntSet. That is, if every entity type in SuperEntSet has an occurrence with the same candidate key value, there should be a corresponding occurrence with this candidate key value in SubEntType.
• The predicate ents_union_condition(SuperEntType, SubEntSet) is used to indicate a necessary condition for the case that SuperEntType is the union of the entity types in the set SubEntSet. That is, for any occurrence in SuperEntType, there should be at least one corresponding occurrence with the same candidate key value in one of the entity types in SubEntSet.

B.2 Manipulation Predicates

The following predicates are used to manipulate the information contained in the input predicates. In order to represent all generic SICs mentioned in this dissertation, an automated database design system should have at least these built-in predicates. In the following, only the functioning of these predicates is described. The actual Prolog code is not provided except for the complicated recursive predicates join_list_ok1 and join_list_ok2. All predicates are applicable to both the E-R representation and the relational representation unless explicitly stated otherwise.
The Prolog representation of an occurrence would include its associated entity or relationship type and its primary key (e.g., ("Person", "123-456-789")).
• The predicate rship_occ_part(R, RoleType, E) is used to evaluate whether an entity occurrence E participates in a relationship occurrence R with the RoleType. Note that usually the entity type of E is the same as the RoleType.
However, in a specialization hierarchy the entity type of E may be E_Type and the RoleType may be F_Type, where E_Type and F_Type have a common super-type. In that case, the predicate means that the F occurrence in F_Type, which corresponds to the E occurrence in E_Type, participates in the relationship occurrence R. For example, if a SIC originally obtained from the database designer refers to a relationship such as "E R F" and E is the only sharing entity type, it will be written as rship_occ_part(R, "E", E) in its sub-SICs.
The exact functioning of the predicate rship_occ_part(R, RoleType, E) is as follows. RoleType must be instantiated.
(1) Suppose that both R and E are instantiated. (1a) Suppose E_Type is the same as RoleType. This predicate checks whether the primary key value of the occurrence E and the corresponding key attributes of the occurrence R are equivalent. It returns false if they are not equivalent or if the corresponding key attributes of the occurrence R are null. (1b) Suppose E_Type is different from RoleType. It traverses a specialization hierarchy to find the corresponding occurrence in RoleType and checks that occurrence against R.
(2) Suppose that E is uninstantiated and R is instantiated. It returns the occurrence participating in R with RoleType as E. No traversal of a specialization hierarchy is performed.
(3) Suppose that E is instantiated and R is not. It returns the R (Rs when backtracking) in which E participates with the RoleType.
• The predicate ent_occ(Entity_Type, E) is used to evaluate whether E is an occurrence of Entity_Type, or to fetch any one occurrence E from Entity_Type. If E is null, the logical value of this predicate is undefined. It does not traverse a specialization hierarchy to find the corresponding entity occurrence.
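The two calling modes of ent_occ just described can be illustrated with a minimal sketch. It assumes that occurrences are stored as hypothetical facts entity_occurrence(Type, Key); this fact name and the sample data are assumptions made for illustration and are not part of the original system. The null case, whose logical value is undefined, is not modelled.

```prolog
% Hypothetical fact base (an assumption): one fact per stored entity occurrence.
entity_occurrence("Engineer", "123-456-789").
entity_occurrence("Engineer", "222-333-444").
entity_occurrence("Manager",  "555-666-777").

% ent_occ(+EntityType, ?Occ)
% An occurrence is represented as the pair (EntityType, PrimaryKey).
% With Occ bound, the call checks membership; with Occ unbound, it
% fetches occurrences one at a time on backtracking.
% No specialization hierarchy is traversed: only facts stored under
% EntityType itself are consulted.
ent_occ(EntityType, (EntityType, Key)) :-
    entity_occurrence(EntityType, Key).
```

Under these assumptions, the query ent_occ("Engineer", E) enumerates the two engineer occurrences, while ent_occ("Manager", ("Engineer", "123-456-789")) fails because the occurrence is tagged with a different entity type and no hierarchy lookup is attempted.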
For example, ent_occ("Manager", Engineer) would be false even though the engineer in question may also appear in the Manager entity type.
• The predicate rship_occ(Relationship_Type, R) is used to evaluate whether R is an occurrence of Relationship_Type, or to fetch any one occurrence from Relationship_Type. If R is null, the logical value of this predicate is undefined.
• The predicate att_occ(E, Att_Name, E.A) (or att_occ(R, Att_Name, R.A)) is used to get the value of an attribute occurrence E.A (or R.A) of an entity occurrence E (or relationship occurrence R). If Att_Name is a key attribute of a relationship type, att_occ can be used to reference its value.
• The predicate comp_atts_occ(E, Comp_Att, E.C) is used to get the value of a composite attribute occurrence, that is, by applying a "concatenate" operator to the attribute values in the ordered set Comp_Att.
• The predicates satisfy_datatype(E.A, Data_Type), satisfy_format(E.A, Format), and satisfy_value(E.A, Range) are used to determine whether E.A satisfies the corresponding data type, format, and value range, respectively.
• The is_null(x) predicate is evaluated to be true if and only if x is "null".
• The is_not_null(x) predicate is evaluated to be true if and only if x is not "null".
• The false predicate is used to indicate that the assertion is absolutely false.
That is, it is used to indicate that the attempted data operation (the operation T component in the SIC Representation model) is not allowed.
• The predicate concatenate(String1, String2, Resulting_String) is used to construct Resulting_String by appending String2 after String1.
• The predicate substring(String, Beginning_Position, End_Position, Resulting_Substring) is used to get Resulting_Substring from Beginning_Position to End_Position of String.
• The predicate concatenate_SICname(String1, Variable, String2, Resulting_SICname) is used to construct Resulting_SICname by appending the value of Variable after String1, and then concatenating the result with String2.
• The predicate checkcomSIC(Component_SIC_Name, Checked_Occurrence) calls a SIC named Component_SIC_Name to check whether Checked_Occurrence satisfies it; if not, the violation action of the calling SIC (rather than that of Component_SIC_Name) will be taken. (The SIC aggregation concept is applied.)
• The predicate checkmemSIC(Member_SIC_Name, Checked_Occurrence) calls a SIC named Member_SIC_Name to check whether Checked_Occurrence satisfies it. The violation action of Member_SIC_Name will be taken if Checked_Occurrence violates it. (The SIC association concept is applied.)
• The predicate not_empty(Set) is used to test whether Set is non-empty.
• The predicate belongs_to(Element, Set) is used to test whether Element is in Set.
• The predicate is_compatt(Att, Comp_Key_Set) is used to test whether an attribute belongs to a composite key.
It is different from belongs_to because Comp_Key_Set may be a coset: its elements are themselves sets containing some attributes.
• The predicate part_of_comp_key(Att, Comp_Key_Set, Comp_Key1) is used to get from Comp_Key_Set a composite key Comp_Key1 that contains the attribute Att.
• The predicate remove_from_set(Element, SourceSet, ResultingSet) is used to indicate that ResultingSet is the result of removing Element from SourceSet.
• Some special predicates to process a list are needed in handling a Relationships_Join SIC or a Relationship_depends_on_LoopN_Relationships SIC. In the following, RshipList is an ordered set (list) of elements of the form (RshipTypei, EntTypej, EntTypek), where i, j, k = 1, 2, ..., N.
- The predicate precedes(RshipTypej, RshipTypei, RshipList) is used to find the relationship type RshipTypei that precedes the relationship type RshipTypej in RshipList.
- The predicate follows(RshipTypej, RshipTypek, RshipList) is used to find the relationship type RshipTypek that follows the relationship type RshipTypej in RshipList.
- The recursive predicate join_list_ok1(FirstEntType, LastEntType, FirstEntOcc, LastEntOcc, RshipList) is used to test whether the relationship occurrence connecting the FirstEntOcc of the FirstEntType with the LastEntOcc of the LastEntType is equal to the join of a series of entity occurrences participating in RshipList.
- The recursive predicate join_list_ok2(RshipTypej, EntTypej1, EntTypej2, FirstEntOcc, EntOccj1, EntOccj2, LastEntOcc, RshipList) is another recursive predicate. One relationship occurrence of RshipTypej, j = 1, 2, ..., in RshipList, via which EntOccj1 of EntTypej1 connects with EntOccj2 of EntTypej2, is given.
This predicate is used to test whether it is possible to produce a relationship occurrence, via which FirstEntOcc connects with LastEntOcc, and which is equal to the join of the corresponding occurrences of the relationship types in RshipList with the given relationship occurrence.
Since these two recursive predicates are complicated, their detailed Prolog representations are given below.

    % Special recursive predicates in Prolog:
    % for R = R1 ⋈ R2 ⋈ R3 ...
    % join_list_ok1 tests whether
    % the given occurrence in R connecting FirstEntOcc with LastEntOcc is equal to
    % the join of the corresponding occurrences in R1, R2, ...
    %
    % join_list_ok2 tests whether,
    % for the given occurrence in Rj connecting EntOccj1 with EntOccj2, j = 1, 2, ...,
    % it is possible to produce a pair (FirstEntOcc, LastEntOcc)
    % that is equal to the join of the corresponding occurrences in R1, R2, ...
    %
    % join_list_ok1: all variables are inputs.
    % RshipOcc1 and EntOcc1 are existentially quantified (implicit in Prolog).
    join_list_ok1(FirstEntType, LastEntType, FirstEntOcc, LastEntOcc,
                  [(RshipType1, FirstEntType, EntType1) | RestList]) :-
        FirstEntType \== LastEntType,
        rship_occ(RshipType1, RshipOcc1),
        rship_occ_part(RshipOcc1, FirstEntType, FirstEntOcc),
        ent_occ(EntType1, EntOcc1),
        rship_occ_part(RshipOcc1, EntType1, EntOcc1),
        join_list_ok1(EntType1, LastEntType, EntOcc1, LastEntOcc, RestList).
    join_list_ok1(LastEntType, LastEntType, LastEntOcc, LastEntOcc, []).

    % join_list_ok2: FirstEntOcc and LastEntOcc are output variables,
    % others are input variables.
    join_list_ok2(RshipTypej, EntTypej1, EntTypej2, FirstEntOcc, EntOccj1, EntOccj2,
                  LastEntOcc, RshipList) :-
        join_first_half(RshipTypej, EntTypej1, EntOccj1, FirstEntOcc, RshipList),
        join_last_half(RshipTypej, EntTypej2, EntOccj2, LastEntOcc, RshipList).

    % join_first_half: FirstEntOcc is an output variable, others are input variables;
    % precedes(+RshipTypej, -RshipTypei, +List) is the predicate to find
    % the RshipTypei that precedes RshipTypej in the List.
    join_first_half(RshipTypej, EntTypej1, EntOccj1, FirstEntOcc, RshipList) :-
        precedes(RshipTypej, RshipTypei, RshipList),
        rship_occ(RshipTypei, RshipOcci),
        rship_occ_part(RshipOcci, EntTypej1, EntOccj1),
        ent_occ(EntTypei, EntOcci),
        rship_occ_part(RshipOcci, EntTypei, EntOcci),
        join_first_half(RshipTypei, EntTypei, EntOcci, FirstEntOcc, RshipList).
    join_first_half(RshipTypej, EntTypej1, EntOccj1, EntOccj1, RshipList) :-
        \+ precedes(RshipTypej, RshipTypei, RshipList).

    % join_last_half: LastEntOcc is an output variable, others are input variables;
    % follows(+RshipTypej, -RshipTypek, +List) is the predicate to search for
    % the RshipTypek that follows RshipTypej in the List.
    join_last_half(RshipTypej, EntTypej2, EntOccj2, LastEntOcc, RshipList) :-
        follows(RshipTypej, RshipTypek, RshipList),
        rship_occ(RshipTypek, RshipOcck),
        rship_occ_part(RshipOcck, EntTypej2, EntOccj2),
        ent_occ(EntTypek, EntOcck),
        rship_occ_part(RshipOcck, EntTypek, EntOcck),
        join_last_half(RshipTypek, EntTypek, EntOcck, LastEntOcc, RshipList).
    join_last_half(RshipTypej, EntTypej2, EntOccj2, EntOccj2, RshipList) :-
        \+ follows(RshipTypej, RshipTypek, RshipList).

• The following four predicates are used to store related SICs for updating the primary key of Relationship_Type or Entity_Type. If a SIC is relevant to the insertion (or deletion) of Relationship_Type, its name is added into the SIC_Name_Set in the associated_PKSICs_I (or associated_PKSICs_D) predicate of the affected key attribute PKAtt. Each affected key attribute, PKAtt, of Relationship_Type has its own associated_PKSICs_I (or associated_PKSICs_D) predicate.
In the case that a SIC is relevant to the insertion (or deletion) of Entity_Type, there is only one associated_PKSICs_I (or associated_PKSICs_D) predicate, which is related to each of its primary key attributes.
- Predicates used to store the related SICs for the key attributes of Relationship_Type:
    associated_PKSICs_I(Relationship_Type, PKAtt, SIC_Name_Set)
    associated_PKSICs_D(Relationship_Type, PKAtt, SIC_Name_Set)
- Predicates used to store the related SICs for the key attributes of Entity_Type:
    associated_PKSICs_I(Entity_Type, SIC_Name_Set)
    associated_PKSICs_D(Entity_Type, SIC_Name_Set)
• The predicate foreign_key(RshipType, EntOcc, ForeignEntType, FKAtts_value) is used in the relational model to get the foreign key value FKAtts_value of EntOcc connecting with an occurrence of ForeignEntType in the relationship RshipType.
• The predicate foreign_ent_occ(RshipType, E, F) is used in the relational model to evaluate whether the primary key of an F occurrence appears as a foreign key in an occurrence E because of the relationship type RshipType. The RshipType is needed because there may be more than one relationship type between two entity types. The entity types with which E and F participate in RshipType need not be explicitly included in this predicate. From the relationship name RshipType, the reference to relationship_hidden_entity(RshipType, EntType1, EntType2, Cmin, Cmax) would show that E takes the role of EntType1 and F takes the role of EntType2.
The exact functioning of the predicate foreign_ent_occ(RshipType, E, F) is as follows. RshipType must be instantiated. Suppose that we have relationship_hidden_entity(RshipType, EntType1, EntType2, Cmin, Cmax).
(1) Suppose that both E and F are instantiated. (1a) Suppose the entity type of E, E_Type, is the same as EntType1 and the entity type of F, F_Type, is the same as EntType2.
This predicate checks whether the foreign key value of the occurrence E and the corresponding key attributes of the occurrence F are equivalent. It returns false if they are not equivalent or if the corresponding key attribute of the occurrence E is null. (1b) Suppose either E_Type is different from EntType1 or F_Type is different from EntType2, or both are different. It traverses a specialization hierarchy to find the occurrence in EntType1 corresponding to E and the occurrence in EntType2 corresponding to F, and then checks them.
(2) Suppose that F is uninstantiated and E is instantiated. It only returns F with the value of the occurrence in EntType2 whose primary key appears in E as a foreign key owing to RshipType. No traversal of a specialization hierarchy is performed.
(3) Suppose that F is instantiated and E is not. It returns those E (Es when backtracking) taking the role of EntType1 connecting with F taking the role of EntType2.
• The predicate which_foreign(EntType, RoleType1, RoleType2, EntOcc1, EntOcc2, EntOcci, FEntOcci) is used to check which of the entity types, RoleType1 or RoleType2, should be EntType, where the relationship type is hidden, and to assign these entity occurrence variables properly. This predicate is used to reduce the number of possible predefined sub-SICs in the relational model, since we handle the entity links in a specialization hierarchy implicitly and, for some SICs (e.g., Subset_Relationship SIC, Exclusive_Occurrence SIC), the information about the corresponding involved pairs of sharing entity types is stored in some order.
EntOcc1 and EntOcc2 are the occurrences playing RoleType1 and RoleType2, respectively. EntType is the entity type where the relationship between RoleType1 and RoleType2 is hidden. EntOcci is an occurrence of EntType, and FEntOcci is the corresponding foreign entity occurrence. When calling this predicate, EntType, RoleType1, and RoleType2 should already be instantiated.
Either the pair EntOcc1, EntOcc2 or the pair EntOcci, FEntOcci is already instantiated, but not both. This predicate performs the following:
(1) if EntType = RoleType1, EntOcc2 should be the foreign entity occurrence; then unify EntOcci with EntOcc1, and unify FEntOcci with EntOcc2 (that is, (1a) if EntOcc1, EntOcc2 are already instantiated, then set EntOcci = EntOcc1, FEntOcci = EntOcc2; (1b) if instead EntOcci, FEntOcci are already instantiated, then set EntOcc1 = EntOcci, EntOcc2 = FEntOcci);
(2) conversely, if EntType = RoleType2, EntOcc1 should be the foreign entity occurrence; then unify EntOcci with EntOcc2, and unify FEntOcci with EntOcc1;
(3) otherwise, return false.
• The predicates associated_FKSICs_I(Entity_Type, FKAtt, SIC_Name_Set) and associated_FKSICs_D(Entity_Type, FKAtt, SIC_Name_Set) are used in the relational model to store related SICs for updating the foreign key attribute FKAtt of Entity_Type.

Appendix C
BNF Descriptions of the Simplified Format

This appendix provides a summary of the syntax of the simplified format that is used in this research to represent the preconditions and predicates of a general SIC elicited from the database designer. Since this description is similar to the precondition and predicate components of the SIC Representation model, some definitions refer to Appendix A.
The BNF meta-symbols and escape symbols used here are the same as those of Appendix A.

<SIC description> ::= if [with_respect_to <focused entity type names>,] <assertions> [previously]
                      then [with_respect_to <focused entity type names>,] <assertions> [before]
                    | if [with_respect_to <focused entity type names>,] <assertions> [previously]
                      then <SIC description>
                    | <assertions>

<focused entity type names> ::= <entity type name>
                              | (<entity type name>, <entity type name>)

<assertions> ::= [<quantifier> <variable> <and_connective>]
                 <assertion_and_list> [<or_connective> <assertions>]

Note that the BNF definitions of or_connective and and_connective are the same as in Appendix A.

<quantifier> ::= ∀ | ∃ [<numerical modifier>] [different]

<numerical modifier> ::= at_least <some number>
                       | at_most <some number>
                       | exactly <some number>

<some number> ::= 1 | 2 | 3 | ...

Note that they are non-negative integer numbers.

<assertion_and_list> ::= <expression> [<and_connective> <assertion_and_list>]

Note that this BNF definition is the same as in Appendix A except for the underlying definition of expression, which is redefined as below.

<expression> ::= (<assertions>)
               | <not> (<assertions>)
               | <entity type name> <relationship type name> <entity type name>
               | <EntRship type name> is_to_be_deleted
               | <attribute type name> is_to_be_updated
               | <logical predicate>
               | <set expression>
               | <arithmetic expression>
               | <date expression>

Entity type name, relationship type name, and attribute type name follow the same naming convention (beginning with a capital letter) as those in Appendix A. The definitions of not and date expression are the same as those in Appendix A. The definitions of set expression and arithmetic expression are almost the same as the corresponding ones in Appendix A except for their underlying definitions of variable and arithmetic subterm, which will be redefined below.
The major differences are that neither subscript nor relation type name is used, and that the new and old functions are only applicable to <attribute type name>. Only some of the logical predicates in Appendix B are applicable here. These are satisfy_datatype(<attribute type name>, Data_Type), satisfy_format(<attribute type name>, Format), satisfy_value(<attribute type name>, Range), is_null(<attribute type name>), is_not_null(<attribute type name>), false, belongs_to(Element, Set), etc. Because subscripts are not used, the simplified format uses a special predicate unique(<attribute type name>{, <attribute type name>}), which is not used in the SIC Representation model (so it is not in Appendix B). It checks that a particular attribute or combination of attributes contains no duplicate values in a database. When the SIC is represented in the SIC Representation model, this special predicate will be replaced with a count function, and appropriate subscripts will be attached to its associated entity (by applying the algorithms in Appendix H.2).

<EntRship type name> ::= <entity type name>
                       | <relationship type name>

<variable> ::= <EntRship type name>
             | <other_variable_except_ER>

<other_variable_except_ER> ::= <attribute type name>
                             | <Prolog variable>

<arithmetic subterm> ::= <numeric constant>
                       | <other variable value>
                       | <aggregate function>(<attribute variable value>)
                       | count(<EntRship variable value>)
                       | <aggregate function>(<set expression>)
                       | (<arithmetic simple expression>)
                         [<exponential operator> (<arithmetic simple expression>)]

The definitions of numeric constant, aggregate function, arithmetic simple expression, and exponential operator are all the same as in Appendix A.
The main difference in the definition of arithmetic subterm is that we use EntRship variable value instead of EntRshipRel variable value.

<other variable value> ::= <other_variable_except_ER>
                         | new(<attribute type name>)
                         | old(<attribute type name>)

<EntRship variable value> ::= <EntRship type name>

Note that the new and old functions are not applicable here. We could directly use EntRship type name instead of defining EntRship variable value. The reason for defining it is to allow comparison with Appendix A.

Appendix D
SIC Type Classification in the E-R-SIC Model

This appendix provides a classification of SIC types used in the E-R-SIC model. Although the listing is not comprehensive, it covers different SIC types that can occur in various contexts of an E-R diagram, and also includes those SICs mentioned in the literature. We can use these SIC types to write generic SICs and store some consistency or nonredundancy rules for them in the elicitation subsystem.

Notation The following notation will be used in examples.
• E, F, G, H, I, ... are used to denote entities.
• R, RX, RY, RZ, R1, R2, R3, ... beginning with "R" are used to denote relationships.
• E.A, E.A1, E.A2, F.B, F.B1, F.B2, R.A, ... are used to denote attributes of entities or relationships. Current_time is the system variable that monitors the current clock.
• v, v1, v2, ... are used to denote values.
• comp_op, comp_op1, ... are used to denote =, ≠, ≤, ≥, <, >.
• ari_op, ari_op1, ... are used to denote +, −, ×, ÷.
• agg_fcn, agg_fcn1, ... are used to denote the aggregate functions, e.g., sum, avg, min, max, count.

SIC Types There are four categories of SIC types.
• SIC Types Focusing on Attributes of a Single Entity or Relationship Type.
• SIC Types Focusing on a Relationship as an Association. These are necessary conditions for the existence of one or more relationships because relationships are "associations" linking entities.
Different contexts (line, star, loop-2, and loop-n) should be considered when identifying SIC types.
• Sufficient Conditions for the Existence of Relationship(s). These are sufficient conditions for the existence of one or more relationships because relationships are "associations" linking entities.
• SIC Types for Entities without Explicit Relationships. There are some SICs for a group of entities because of some "implicit relationships".

Each of them is described in detail below.

• SIC Types Focusing on Attributes of a Single Entity or Relationship Type.
1. Domain Constraint: declares the value; extended format; nonvolatility[104]; data type of an attribute of an entity or relationship type; whether it can be null; and whether it should be unique. It is possible to specify this kind of constraint for an "implicit entity or relationship subtype". For example, a Subtype_Nonvolatility Constraint requires that an attribute of any occurrence of the specified "implicit entity or relationship subtype" be not changeable. We may also require that an attribute have some value range if there are restrictions on the values of other attribute(s), e.g., a Conditional_Value SIC such as:
    if E.A1 comp_op1 v1, ... then E.A2 comp_op2 v2.
Constraints on the attributes of an entity or relationship occurrence could be more complicated. In that case, we would have the following.
2. Formula SIC: requires that a formula[105] exist among attributes of any occurrence of the specified entity or relationship type. For example, we may have:
    (E.A1) comp_op ((E.A2) ari_op1 (E.A3) ari_op2 (scalar_value)), ...
If the formula involves non-numeric data types, string predicates may be needed. There may be some conditions on a formula requiring that it hold only for an "implicit entity or relationship subtype". For example, we may have:
    if (E.A1) comp_op1 ((E.A2) ari_op1 (E.A3) ari_op2 (scalar_value)), ...
    then (E.A4) comp_op2 (E.A5) ari_op3 (E.A3)
3. Aggregate_Attribute SIC: restricts the aggregate attributes of the whole entity or relationship set, for example, agg_fcn(E.A) comp_op v. If the data type of the attribute E.A is numeric, we may have agg_fcn(E.A) comp_op E.A, such as "any salary value of an employee must not be more than 5 percent greater than the average of all such values". Some complicated cases may occur. For example, there may be a SIC (called Aggregate_Attribute_Formula SIC) requiring some formula among the values of aggregate attributes of the entity or relationship set. There may be a restriction (called Interdependent_Aggregate_Attributes SIC) on the aggregate value of an attribute if some other attributes have some aggregate values, e.g.,
    if (agg_fcn1(E.A3) comp_op2 v2), ... then (agg_fcn2(E.A3) comp_op3 v3)
It is also possible that only for occurrences of the specified "implicit entity or relationship subtype" there is a restriction on an aggregate value (called Subtype_Aggregate_Attribute SIC), an aggregate attribute formula (called Subtype_Aggregate_Attribute_Formula SIC), or the requirement of the coexistence of some aggregate attributes (called Subtype_Interdependent_Aggregate_Attributes SIC).
4. Composite_Attribute_Unique Constraint: requires the concatenated values of some attributes (which form a composite key) of an entity type to be unique.
5. Old_New Transitional Constraint: restricts the pairs of values before and after an update of an attribute of the specified entity or relationship type. It is also possible to have this kind of constraint for a specified "implicit entity or relationship subtype".
6. Primary_Key SIC: requires enforcing a set of SICs related to deletion and insertion of an entity or relationship when an attribute that is (part of) the primary key is updated.
7. Deleted_Object_Attribute SIC: restricts the values of attributes of any occurrence of the specified entity or relationship type before this entity or relationship occurrence can be deleted. For example, "a project can be deleted only if its budget is zero". It is an operation-dependent SIC that is relevant only to the deletion operation. It is also possible to have this kind of constraint for a specified "implicit entity or relationship subtype".
8. Absolute Maximum Cardinality Constraint of an Entity Type: restricts the maximum number of occurrences of an entity type that can exist in a database. If it is infinite, it is not a restriction. The notation “ ” is used to denote either the infinite, or the case that it is not infinite in the mathematical sense, but there is no restriction on the maximum cardinality.
9. SIC Involving Current_Time: either restricts a time-valued attribute of the specified entity or relationship type within some range relative to Current_time; places restrictions on other attributes when time-valued attribute(s) satisfy some condition(s) relative to Current_time; or requires the propagation to update the value(s) of some attribute(s) of the specified entity or relationship type while updating the Current_time. It may only occur in an environment in which the event that causes manipulation of the involved database object(s) is processed in real time so that the Current_time in the computer matches the event time in the real world. Otherwise, Current_time in all of the above must be replaced by a time-valued attribute that records the external event time, and these constraints become ordinary data-driven constraints.
- Attribute SIC with Current_Time Restriction:
(a) Restriction on a time-valued attribute relative to Current_time: Four examples are given here to illustrate the various cases.
* Cases 1 and 2: the increase of Current_time will never violate the SIC under the assumption that the current database is semantically correct.
i. Case 1, the time-valued attribute represents a past fact relative to Current_time: If "Employee.FirstWorkDate" is the first date of Employee working on related jobs, Current_time ≥ (Employee.FirstWorkDate + "2 years") is an expression to assert that "each employee must have at least 2 years working experience".
ii. Case 2, the time-valued attribute represents a future expectancy relative to Current_time: If "Part.ExpectedArrDate" is the expected arrival date of a Part ordered from Suppliers, Current_time ≥ (Part.ExpectedArrDate − "10 days") is an expression to assert that "the expected delivery time of a part must be no more than 10 days".
* Cases 3 and 4: the increase of Current_time may violate the SIC.
i. Case 3, the time-valued attribute represents a past fact relative to Current_time: We may have an expression Current_time ≤ Account_Receivable.Date + "10 years" to assert a rule that "an account_receivable cannot be older than 10 years".
ii. Case 4, the time-valued attribute represents a future expectancy relative to Current_time: If Drug.Expiration_Date is the date that a Drug will expire, Current_time < (Drug.Expiration_Date − "2 weeks") is an expression to assert a rule that "a drug must be valid for at least 2 weeks".
(b) Restriction on other attributes when time-valued attributes satisfy some conditions: When time passes, the restriction will either disappear or come into force. For example,
    if Current_time ≤ (Employee.HireDate + "6 months")
    then old(Employee.Salary) ≥ new(Employee.Salary)
is an expression to assert a rule that "an employee cannot receive a salary raise during his (her) first 6 months in the company". Another example is: "if an employee has worked for at least two years, he (she) must have at least 10 vacation days".
- Current_Time_Triggered SIC on Attributes: It triggers some operation while updating the Current_time. For example, at 0:00 on 1/1/1993, increase the salary of each employee by $1,000.
10. Temporal Conditions Among Attributes: requires that some attribute(s) of an occurrence of the specified entity or relationship type satisfy a certain condition at the time one of its other attributes is going to acquire some value. For example, we may have:
    if E.A1 comp_op1 v1 then E.A2 comp_op2 v2 before.
Alternatively, this type of SIC may require that if its other attribute(s) satisfied a certain condition in the past (and no longer satisfy it now), one of its attributes must take some value. For example, we may have:
    if E.A1 comp_op1 v1 previously then E.A2 comp_op2 v2.

[104] "Nonvolatility" is interpreted in a strict sense. If an entity is inserted with some null attribute values, they have to stay null if "nonvolatile" is declared for them.
[105] Recall that if more than one attribute appears in an expression, we call that expression a formula.

• SIC Types Focusing on a Relationship as an Association.
1. Incidence Constraint: allows a relationship occurrence to exist only if the participating entity occurrences exist.
2. One_Side_Necessary_Condition for a Relationship Type: restricts the existence of any occurrence(s) of the specified relationship type because there are conditions on one of its participating entity types. Different layouts of E-R diagrams (line, star, loop-2, or loop-n contexts) may have different conditions. Some examples include:
- Relationship_Depends_on_Relationship SIC: requires that for a relationship occurrence to exist, one of its participating entity occurrences must participate in the other relationship type. It may happen in line, star, or loop-2 contexts. For example, in a star context, Figure 5.3 (on page 111), we may have:
    if E RX F then E RY G
or
    if E RX F then (E RY G) ∨ (E RZ H)
Note that stating "if RX then RY" is equivalent to stating "RX only-if RY". An example is "Customers (F) Use Products (E)" (i.e., RX) only-if "Factories (G) Make Products (E)" (i.e., RY). Sakai ([1983b]) states that RX is existentially dependent on RY and there is a hierarchical ordering of the entity types.
- Relationship_Depends_on_Entity_Value SIC: requires that for a relationship occurrence to exist, one of its participating entity occurrences must have some attribute value(s) satisfying a specified restriction. For example, in Figure 5.2, we may have:
    if E RX F then (E.A1 comp_op1 v1) ∨ (E.A2 comp_op2 v2)
- Exclusive_Relationship SIC: requires that given two relationship types, for a relationship occurrence to exist, one of its participating entity occurrences must not participate in the other relationship type. These two relationship types RX and RY are exclusive (denoted as RX|RY), and are called excluding relationships relative to the entity ([Palmer, 1978], [Kozaczynski and Lilien, 1988]). It may happen in any context.
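As a rough illustration of how the exclusivity requirement just stated might be checked at insertion time, the following sketch assumes a hypothetical store rship(RshipType, E, Partner) and a hypothetical declaration exclusive(RX, RY). Both names, the helper predicates, and the sample data are assumptions for illustration only and do not reproduce the SIC Representation model.

```prolog
:- dynamic rship/3.

% Hypothetical store (an assumption): rship(RshipType, E, Partner) records
% that entity occurrence E participates in RshipType together with Partner.
rship(ry, e1, g1).

% Hypothetical declaration: rx and ry are exclusive relative to E.
exclusive(rx, ry).

% may_insert(+RshipType, +E): E may join RshipType only if it takes part
% in no relationship type declared exclusive with RshipType.
may_insert(RshipType, E) :-
    \+ ( excluded_with(RshipType, Other),
         rship(Other, E, _) ).

% Exclusivity is symmetric, so the declaration is looked up in both directions.
excluded_with(R, Other) :- exclusive(R, Other).
excluded_with(R, Other) :- exclusive(Other, R).
```

With this data, may_insert(rx, e1) fails because e1 already participates in ry, whereas may_insert(rx, e2) succeeds for an occurrence e2 that has no ry participation.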
For example, in the context of Figure 5.2, we may have:
    if E RX F
    then ¬(E RY G)
An example is where a storeroom is used either for storing raw materials or for storing finished goods, but never for both.
- Not_And_Relationships SIC: requires that given a group of relationship types with a sharing entity type, for a relationship occurrence to exist, its participating entity occurrence of the sharing entity type must not participate in all of the other relationship types. That is, no occurrence of the sharing entity type can participate in all the relationship types. It can only happen in a star context, e.g., in Figure 5.3, we may have:
    if E RX F
    then ¬((E RY G) ∧ (E RZ H))
If there are only two relationships involved, it reduces to an Exclusive_Relationship SIC.
- Some constraints are special because of temporal conditions:
* Relationship_Before_Relationship SIC: requires that for a relationship occurrence to exist, one of its participating entity occurrences must participate in another relationship type at the time it is going to participate in this relationship type. For example, in Figure 5.2, we may have:
    if E RX F
    then E RY G before
It requires that RY must exist at the time RX is going to be inserted. However, there is no restriction on the deletion of RY after RX exists. A Relationship_Depends_on_Relationship SIC, by contrast, requires that when RX is inserted, RY exists (that is, RX and RY can be generated concurrently) and that if RY is deleted, RX must be deleted too.
* Entity_Attribute_Before_Relationship SIC: requires that for a relationship occurrence to exist, the attributes of one of its participating entity occurrences must satisfy some conditions at the time it is going to participate in this relationship type. For example, in Figure 5.2, we may have:
    if E RX F
    then ((E.A1 comp_op1 v1) ∨ (E.A2 comp_op2 v2)) before
An example is that "for a person to participate in a life-insurance relationship, his (her) age must be less than 65".
Note that we do not care what attribute values this entity occurrence will have after it participates in the relationship.
* Relationship_Not_Before_Relationship SIC: requires that for a relationship occurrence to exist, one of its participating entity occurrences must not participate in another specific relationship type at the time it is going to participate in this relationship type.
3. Two_Side_Necessary_Condition for a Relationship Type: restricts the existence of any occurrence(s) of the specified relationship type because there are conditions on both participating entity types at the same time. Different layouts of E-R diagrams (line, star, loop-2, or loop-n contexts) may have different conditions. Some examples include:
- Pair_Relationships SIC: requires that for a relationship occurrence to exist, if one of its participating entity occurrences participates or does not participate in some specific relationship type, the other participating entity occurrence must or must not participate in other specific relationship type(s). For example, in Figure 5.2, we may have:
    if E RX F
    then if E RY G
         then F RZ H
- Pair_Values SIC: requires that for a relationship occurrence to exist, if one of its participating entity occurrences has certain attribute values, the other participating entity occurrence must have some specific attribute values. For example, in Figure 5.2, we may have:
    if E RX F
    then if (E.A comp_op1 v1)
         then (F.B comp_op2 v2)
For example, if Employee Is_Allocated Car then if Employee.Rank = "president" then Car.Brand = "BMW".
- ID_Dependency_Relationship SIC: requires that if an entity type E incorporates the primary key (e.g., F.Fkey) of the other entity type F as part of its candidate key (because of a relationship type implying an "ID dependency" or because of an is_a or component_of special relationship type), then for an occurrence of the relationship type to exist, its participating entity occurrences should satisfy the condition E.Fkey = F.Fkey.
- Pair_Condition SIC: requires that for a relationship occurrence to exist, there are some conditions involving the attributes of both of its participating entity occurrences. For example, in Figure 5.2, we may have:
    if E RX F
    then (E.A1 comp_op1 (F.B ari_op E.A2))
- Subset_Relationship SIC: requires that for an occurrence of the specified relationship type to exist, its two participating entity occurrences must be connected via another relationship type ([Palmer, 1978]). That is, RX ⊆ RY, implying that RX occurrences are included in RY occurrences, or stated in a different way, RX only-if RY. It can only happen in a loop-2 context. For example, Owns ⊆ Entitled-to-Drive implies that if a person owns a car, he (she) must be entitled to drive it.
- Exclusive_Occurrence SIC: requires that for an occurrence of the specified relationship type to exist, its two participating entity occurrences must not be connected together via any occurrence of another relationship type ([McFadden and Hoffer, 1988]). It is a special case of, but weaker than, an Exclusive_Relationship SIC.
- Relationships_Union Special SIC: requires that for an occurrence of the specified relationship type to exist, its two entity occurrences must be connected together via any one of the other relationship types. It can only happen in a loop-2 context and is a necessary condition for the union of relationships, e.g., RX = RY ∪ RZ.
The semantics that one relationship (e.g., RX) is the union of the remaining relationships implies this type of SIC and Subset_Relationship SICs (e.g., RY ⊆ RX, etc.).
- Relationships_Intersection Special SIC: requires that given a group of relationship types in a set SuperRshipSet and another relationship type SubRship, whenever every relationship type in SuperRshipSet has an occurrence connecting the same pair of entity occurrences, there should be a SubRship occurrence connecting this pair of entity occurrences. It can only happen in a loop-2 context and is a necessary condition for the intersection of relationships, e.g., RX = RY ∩ RZ. The semantics that one relationship (e.g., RX) is the intersection of the remaining relationships implies this type of SIC and Subset_Relationship SICs (e.g., RX ⊆ RY, etc.).
- Relationships_Join SIC: requires that if there is a linking path via some relationship types to connect two entity occurrences together, those two entity occurrences must be connected via another relationship type. It can only happen in a loop-n context and is a necessary condition for relationship composition, denoted by, e.g., RX = RY ⋈ RZ ([Lenzerini and Santucci, 1983], [Azar and Pichat, 1987]). For example, in Figure 5.5, we may have:
    if (E RY H) ∧ (H RZ F)
    then E RX F
In fact, this is another special case of Pair_Relationships.
- Relationship_Depends_on_LoopN_Relationships SIC: requires that for a relationship occurrence to exist, there is a linking path via other relationship types to connect its participating entity occurrences together. It can only happen in a loop-n context and is another necessary condition for the composition of relationships.
For example, in Figure 5.5, we may have:
    if E RX F
    then (E RY H) ∧ (H RZ F)
Sakai [1978] states that if RX is transitively dependent on RY and RZ and it lacks (nonkey) attributes, it is redundant and could be eliminated. However, Segev [1987] argues that some relationships that appear to be redundant may, in fact, carry semantic information and thus cannot be eliminated without losing information content, and that even if a relationship is redundant, there may be a choice of which relationships to eliminate, a choice that should be made during physical design.
4. Intra_Condition for a Relationship Type: restricts the existence of any occurrence(s) of the specified relationship type because of other occurrences of the same relationship type. For example, we may have the following.
- Symmetry and Transitivity Properties of a Relationship: In the following, entity types E and F should be in a specialization hierarchy.
A relationship type R is symmetric if the following condition is true: "if any occurrence Occ1 of an entity type E is related via an R occurrence to an occurrence Occ2 of the other entity type F, then an occurrence of the E type, corresponding to the same super-type occurrence of Occ2, is also related via another R occurrence to an occurrence of the F type, corresponding to the same super-type occurrence of Occ1." Some examples of such relationship types are: Sibling_of, Married_to, Partner_of.
A relationship type R is transitive if the following condition is true: "if any occurrence Occ1 of an entity type E is related via an R occurrence to an occurrence Occ2 of the other entity type F, and an occurrence of the E type, corresponding to the same super-type occurrence of Occ2, is related via a second R occurrence to an occurrence Occ3 of the F type, then Occ1 is also related via a third R occurrence to Occ3." Some examples of such relationship types are: Sibling_of, Ancestor_of, Supervise, Partner_of.
- Relative Maximum Cardinality Constraint: restricts the existence of a relationship occurrence because an occurrence of one of its participating entity types can only participate in some maximum number of occurrences of the same relationship type.
This restriction may only apply to a specified "implicit entity subtype" (it is then a Subtype_Relative_Maximum_Cardinality Constraint). This restriction may become weaker because there are further conditions on the other entity type that relates to the specified entity type via the relationship type (it is then a Weaker_Relative_Maximum_Cardinality Constraint). There may also be a SIC (that can be called Subtype_Weaker_Relative_Maximum_Cardinality Constraint) combining both complicating cases.
5. Group_Relationships SIC: requires that if occurrences of a group of relationship types exist, there must be some conditions on the values of attributes of those relationships or their sharing entity. For example, in Figure 5.2 or 5.3, we may have
    if (E RX F) ∧ (E RY G)
    then (E.A1 comp_op (RY.B1 ari_op RX.B2))
Or, they may depend upon many other relationships. For example, we may have:
    if (E RX F) ∧ (E RY G)
    then (E RZ H) ∨ (E RS I) ∨ ...
• Sufficient Conditions for the Existence of Relationship(s): requires that given some conditions, one or more occurrences of the specified relationship type(s) must exist. It includes some special cases that are often mentioned in the literature.
1. Totality Constraint: requires that if an entity occurrence exists, it must participate in some minimum number of occurrences of the specified relationship type.
A relationship type is total to one entity type if each occurrence of the entity type must participate in at least one relationship occurrence [footnote 106]. It is desirable to know an exact number (e.g., 1, 2, 5, etc.) as the relative minimum cardinality for each entity type if such SICs exist. The minimum cardinality 0 is not a SIC.
Footnote 106: Some researchers use other terms, e.g., [Palmer, 1982] states that the relationship is "mandatory", and [Kim et al., 1987] states that the relationship is "obligatory".
A relationship type may be total only to some specified "implicit entity subtype". That is a Subtype_Totality Constraint. A totality constraint requirement may become stronger because there are some conditions on the other entity type that relates to the specified entity type (it is then a Stronger_Totality Constraint). There may also be a SIC (that can be called Subtype_Stronger_Totality Constraint) combining both complicating cases.
There may also be some further restrictions on the relationship occurrences. For example, a Weak_Entity SIC requires that a relationship type, via which the weak entity type is dependent upon a regular entity type, is total to the weak entity type and that its key is fixed. A Critical_Relationship_Occurrence SIC requires the totality constraint to the specified entity type and the existence of exactly one critical relationship occurrence.
2. Completeness_Mapping SIC: requires that if an entity occurrence exists, it must be related to occurrences of the other entity type via the specified relationship type ([Webre, 1983]).
A relationship type may be complete only to some specified "implicit entity subtype". That is a Subtype_Completeness_Mapping SIC. A complete mapping requirement may also become weaker because there are some conditions on the other entity type that relates to the specified entity type (it is then a Weaker_Completeness_Mapping SIC).
There may also be a SIC (that can be called Subtype_Weaker_Completeness_Mapping SIC) combining both complicating cases.
3. Either_Existence_Relationship SIC: requires that if one entity occurrence exists, it should participate in at least one (or some specified number of) occurrence(s) among a group of relationship types. For example, in a star context, Figure 5.3, each E occurrence may be required to participate in at least one relationship occurrence among the relationship types RX, RY and RZ.
There are some variants of this type, e.g., the cases considering an "implicit entity subtype", different quantitative requirements on relationship occurrences, or further restrictions. For example, in Figure 5.3, each E occurrence may be required to participate in at least either one RX occurrence, two RY occurrences or four RZ occurrences.
4. Relationship_Trigger_Relationship SIC: requires that if some relationships existed in the past (and no longer exist now), other relationships must exist. For example, in Figure 5.2, we may have:
    if E RY G previously
    then E RX F
5. Entity_Value_Trigger_Relationship SIC: requires that if the value(s) of some attribute(s) of an entity occurrence satisfied a certain condition in the past (and no longer satisfy it now), the entity occurrence must participate in some minimum number of occurrences of the specified relationship type. For example, we may have:
    if E.A1 comp_op v1 previously
    then E RX F
• SIC Types for Entities without Explicit Relationships.
- SIC Types for Entities in a Specialization Hierarchy.
1. Exclusion between Entity Types: requires two entity types to be exclusive.
2. Entities_Intersection Special SIC: requires that given a group of entity types E1, E2, ..., En having a common candidate key Ekey, whenever each of the entity types E1, E2, ..., En-1 has an occurrence with the same candidate key value, there should be an occurrence with this candidate key value in the specified entity type En. That is, "if (E1.Ekey = Value) ∧ (E2.Ekey = Value) ∧ ... then En.Ekey = Value". It is a necessary condition for En = E1 ∩ E2 ∩ ...
3. Entities_Union Special SIC: requires that given a group of entity types E1, E2, ..., En having a common candidate key Ekey, for any occurrence in the specified entity type En, there should be at least one corresponding occurrence with the same candidate key value in one of the other entity types. That is, "if En.Ekey = Value then (E1.Ekey = Value) ∨ (E2.Ekey = Value) ∨ ...". It is a necessary condition for En = E1 ∪ E2 ∪ ...
- Association Abstraction SIC: requires that a set occurrence cannot be deleted if it is not empty; that two kinds of derived attributes in a set (the indexing attribute and the key of the indexing entity type) cannot be updated; and it may involve some formulas using aggregate functions on attribute(s) of the member entity type, although there is no implicit relationship type member_of.

Appendix E

Examples of Heuristics

This appendix presents some heuristics that can be used to reduce the effort required to use the SIC elicitation subsystem. The listing shows only some examples; it is far from complete.
1. Two heuristics may be applied to elicit old-new transitional constraints.
• If an attribute is numeric, the subsystem may inquire whether it is monotonically increasing or decreasing on update.
• If an attribute has an enumerated value set, there may be update transitions that are not allowed. For example, the value of Marriage_status cannot be updated from "single" to "widowed" or "divorced", nor conversely.
2. An attribute with the "date" data type may be unchangeable. More precisely, we should state that historical dates are usually unchangeable. However, the subsystem needs to know the meaning of the date in order to know whether it is historical.
3.
Based on domains of attributes, the subsystem may suggest some possible entity SICs, e.g., Formula SICs, to the database designer. For example, if an entity has several attributes with the same domain "money", there may be a formula SIC involving these attributes. However, without application domain knowledge, the subsystem would have to ask the database designer to confirm whether such a constraint is needed and to provide the necessary details.
4. The subsystem may have a lexicon containing the names of common relationship types with the symmetry or transitivity property. However, the subsystem still needs to ask the database designer to confirm it. For example, the relationship "Married_to" is usually symmetric. However, if it is between two entity types Woman and Man (or Wife and Husband), it would not be symmetric. Another example is that the relationship Supervise (or Manage) is usually transitive. However, if the database designer in fact defines the relationship to be "directly supervise (or manage)", it would not be transitive.
5. A heuristic to detect excluding relationships or time sequencing between relationships is:
If two relationships, which are adjacent in an E-R diagram, are named by the database designer using the same verb phrase, they usually imply exclusion or even a time sequence among them. For example, the names of the relationships Customer_Owns_Car and Dealer_Owns_Car are created by using the same verb phrase "Owns".
If those relationships (e.g., Customer_Owns_SavingAccount and Customer_Owns_CheckingAccount) are neither exclusive nor time-sequenced, the following case may apply.
6. A heuristic to detect a Group_Relationships SIC is:
Suppose that an entity type is connected to a group of relationship types.
Examine attribute names, but omit their prefixed entity or relationship names. If there is a common attribute name other than Id or Name among either (1) the sharing entity type and these relationship types or (2) the sharing entity type and its related other entity types, it is likely that there is a formula between those attributes. An example is Inventory.Qty = sum(Supply.Qty) - sum(Sales.Qty) in the context of two relationship types, Supply and Sales, sharing the same entity type, Inventory. Another example is Customer.AccountBalance = SavingAccount.Balance + CheckingAccount.Balance in the context of a Customer owning two entity types, SavingAccount and CheckingAccount.

Appendix F

Verification of Aggregate Attribute SICs and Cardinalities

This appendix discusses the extent to which we can verify SICs involving aggregate attributes, and provides algorithms to verify absolute and relative cardinalities given by the database designer.

F.1 Simple Tests on Aggregate Attribute SICs

Suppose that we have some aggregate attribute SICs, each of which is a simple assertion to restrict the aggregate value of an attribute. At best, we can only have some simple tests, such as:
Suppose E.A is an attribute between v1 and v2; avg(E.A) is specified to be between MinAvg and MaxAvg; sum(E.A) is specified to be between MinSum and MaxSum; the maximum absolute cardinality of E is specified as AbsCmax. The following should be true:
    v1 ≤ MinAvg ≤ MaxAvg ≤ v2;
    v1 ≤ MinAvg ≤ MinSum ≤ MaxSum ≤ (AbsCmax × MaxAvg) ≤ (AbsCmax × v2).
If there is a Subtype_Aggregate_Attribute SIC on E.A, the following should be true:
    v1 ≤ subtype's min ≤ subtype's max ≤ v2;
    subtype's MaxSum < MaxSum;
    subtype's count(E) < AbsCmax.
The general Interdependent_Aggregate_Attributes, Aggregate_Attribute_Formula, Subtype_Aggregate_Attribute_Formula, etc. cannot be verified conceptually.

F.2 Algorithms for Verifying Cardinalities

1.
Verification of "traditional" relationship cardinalities:
• A relative minimum cardinality of 0 is not a constraint. Therefore, if a SIC exists, for each involved entity type in a relationship, its relative cardinalities must be:
    1 ≤ relative minimum cardinality ≤ relative maximum cardinality.
• Suppose that the relative maximum cardinality of entity type E through a relationship R to entity type F is Cmax, and the absolute maximum cardinality of F is Ab_max. Then, Cmax ≤ Ab_max.
• A set of cardinality constraints may be inconsistent in the loop-2 and loop-n contexts. Given a group of relationship and entity types with their relative cardinalities, do the following check.
(a) Associate each entity type Ei with a special variable Ei#.
(b) For each relationship type Ri, do the following. Suppose that Ri connects two entity types Ej, Ek; and the minimum and maximum cardinalities of Ej and Ek in Ri are, respectively, (Cminj, Cmaxj) and (Cmink, Cmaxk). That is,
    Ej(Cminj, Cmaxj) Ri Ek(Cmink, Cmaxk).
- If Cmaxj is not "∞" and Cmink is not 0, construct the following inequality:
    Cmaxj × Ej# ≥ Cmink × Ek#
- If Cmaxk is not "∞" and Cminj is not 0, construct the following inequality:
    Cmaxk × Ek# ≥ Cminj × Ej#
(c) Solve all the above inequalities for the special variables Ei# under the restriction that each Ei# > 0.
(d) If there is no solution, the cardinalities are inconsistent [footnote 107].
• Cardinality constraints can interact with other SICs. A special interaction is with a Subset_Relationship SIC. Suppose we have relationship types with "RX ⊆ RY" between the entity types E and F. Then, for either E or F, both the minimum cardinality and the maximum cardinality in RX should be less than or equal to those in RY, respectively. If either E or F has (1,1) in both RX and RY, RX and RY would contain exactly the same occurrences [footnote 108]. They should be merged and a new relationship type will be created.
• Verification of Completeness_Mapping SIC: Suppose that a relationship type R is specified to be complete relative to an entity type E. Certainly it is also complete relative to its other entity type F by the symmetry of this SIC. The relative maximum cardinality of E in R must be equal to the absolute maximum cardinality of F. Otherwise, they are inconsistent. There is a similar condition between the relative maximum cardinality of F and the absolute maximum cardinality of E.
2. Verification of Stronger, Weaker, and Sub-type Cardinalities: When we consider cardinalities with regard to Subtype_Totality Constraint (Sub_Cmin), Subtype_Relative_Maximum_Cardinality Constraint (Sub_Cmax), Stronger_Totality Constraint (Stronger_Cmin), Weaker_Relative_Maximum_Cardinality Constraint (Weaker_Cmax), Subtype_Stronger_Totality Constraint (Sub_Stronger_Cmin), and Subtype_Weaker_Relative_Maximum_Cardinality Constraint (Sub_Weaker_Cmax), verification becomes complicated.
• The following should be true if these cardinality constraints are not redundant and are consistent with each other.
- For the same "subtype",
    1 ≤ Cmin < Sub_Cmin ≤ Sub_Cmax < Cmax
- For the same "further restriction",
    1 ≤ Stronger_Cmin < Weaker_Cmax,
    Stronger_Cmin < Cmin,
    Weaker_Cmax < Cmax.

Footnote 107: Observe these inequalities. If the cardinalities are consistent, the designed database can be populated. Then there is at least one solution for these inequalities: assigning to each Ei# the value corresponding to the number of occurrences of Ei in the database. Formally, Lenzerini and Nobili ([1987]) have proved that there exist solutions for all those inequalities if and only if the cardinalities are consistent. Note that their inequalities contain another set of variables R# because they prove that it is true for any degree of relationships (not restricted to binary relationships). However, in the case of binary relationships, if we can assure that each relative maximum cardinality is greater than or equal to the corresponding relative minimum cardinality, those inequalities containing the R# variables can be reduced to our inequalities. Lenzerini and Nobili have also proposed another way to detect inconsistency of cardinalities by discovering cycles with special "weights" in an E-R diagram.
Footnote 108: Prove it by contradiction as follows. We already know that RX ⊆ RY. Now suppose that entity type E has (1,1) in both relationship types, RX and RY. If an RY occurrence, via which an occurrence e1 of E type connects with an occurrence f1 of F type, does not belong to RX type, then e1 must connect with another occurrence f2 of F type via the relationship type RX. Otherwise, it would violate the totality constraint of RX to E. Then, by RX ⊆ RY, e1 must also connect with f2 via the relationship type RY. It would violate the maximum constraint of RY relative to E unless f1 = f2. So, we would also have RY ⊆ RX.
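The consistency check of steps (a) to (d) in item 1 of Section F.2 can be sketched in Python. This is only an illustration under stated assumptions, not the procedure used in the thesis: the function name cardinalities_consistent, the tuple layout of a relationship, and the use of None for an unbounded maximum are all hypothetical. Instead of calling a general inequality solver, the sketch relies on the observation that every constructed inequality has the two-variable form Ea# ≥ ratio × Eb#, so a solution with every Ei# > 0 exists exactly when no cycle of such inequalities multiplies to a ratio greater than 1. It looks for such a cycle by repeatedly propagating lower bounds (a Bellman-Ford style relaxation).

```python
from fractions import Fraction


def cardinalities_consistent(entities, relationships):
    """Return True if the binary relative cardinalities can be satisfied
    with every Ei# > 0 (hypothetical helper, binary relationships only).

    entities:      list of entity-type names.
    relationships: list of (ej, (cmin_j, cmax_j), ek, (cmin_k, cmax_k));
                   a maximum of None means "unbounded". Finite maxima are
                   assumed to be at least 1.
    """
    # Step (b): cmax_a * Ea# >= cmin_b * Eb# is stored as Ea# >= ratio * Eb#.
    bounds = []
    for ej, (cmin_j, cmax_j), ek, (cmin_k, cmax_k) in relationships:
        if cmax_j is not None and cmin_k != 0:
            bounds.append((ej, ek, Fraction(cmin_k, cmax_j)))
        if cmax_k is not None and cmin_j != 0:
            bounds.append((ek, ej, Fraction(cmin_j, cmax_k)))

    # Steps (c) and (d): start every Ei# at 1 and push lower bounds along
    # the inequalities. Without a cycle of product > 1 this settles within
    # len(entities) rounds; otherwise the bounds keep growing forever.
    lower = {e: Fraction(1) for e in entities}
    for _ in range(len(lower)):
        changed = False
        for a, b, ratio in bounds:
            if ratio * lower[b] > lower[a]:
                lower[a] = ratio * lower[b]
                changed = True
        if not changed:
            return True   # the current bounds are a valid population ratio
    return False          # bounds never settle: no positive solution


# Loop-2 example: each E takes part in exactly two RX, each F in exactly one.
rx = ("E", (2, 2), "F", (1, 1))
# RY is (1,1) on both sides, which forces E# = F#.
ry = ("E", (1, 1), "F", (1, 1))
print(cardinalities_consistent(["E", "F"], [rx]))      # True
print(cardinalities_consistent(["E", "F"], [rx, ry]))  # False
```

In the example, RX alone requires F# = 2 × E#, which can be populated; adding RY forces E# = F# as well, and the two requirements together have no positive solution, so the second call reports an inconsistency.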
- For the same "subtype" and the same "further restriction",
    Stronger_Cmin < Sub_Stronger_Cmin ≤ Sub_Weaker_Cmax < Weaker_Cmax,
    Sub_Stronger_Cmin < Sub_Cmin,
    Sub_Weaker_Cmax < Sub_Cmax.
• In the contexts of loop-2 and loop-n, if cardinalities have been specified for "implicit subtype(s)", Lenzerini and Nobili's inequalities should be tested by using the proper cardinality values.
- For example, in the context of loop-2 where two relationship types RX and RY exist between two entity types E and F, the database designer may consider an implicit subtype of E, and give the Sub_Cmin, Sub_Cmax of RX and RY relative to E, and the Stronger_Cmin, Stronger_Cmax of RX and RY relative to F. There may be potential inconsistencies between these cardinalities.
- Another example in the context of loop-n has a loop linking entity types E, F, G and H, through four relationship types RX, RY, RZ and RS, consecutively. The database designer may consider implicit subtypes of adjacent entity types, E and F. Now the proper values for testing the above inequalities would be:
    Sub_Stronger_Cmin, Sub_Weaker_Cmax of RX relative to E and F,
    Sub_Cmin, Sub_Cmax of RY relative to F,
    Stronger_Cmin, Weaker_Cmax of RY relative to G,
    Cmin, Cmax of RZ relative to G and H,
    Stronger_Cmin, Weaker_Cmax of RS relative to H,
    Sub_Cmin, Sub_Cmax of RS relative to E.

Appendix G

Consistency and Nonredundancy Rules for SIC Elicitation Subsystem

The following rules could be stored in an elicitation subsystem for capturing SICs. These rules are used to expedite the elicitation and verification procedure. By using them, the subsystem could sometimes avoid the need to invoke a sophisticated logic verification algorithm. In other cases, they eliminate the need to ask the database designer to confirm some SICs that are "obviously" inconsistent or redundant. The listing here is illustrative rather than complete. In the following, E, F, G, H, I, ... are used to denote entity types. The symbols R, R1, R2, R3, ...
are used to denote relationship types.
1. If an attribute is declared to be non-changeable (i.e., a Nonvolatility constraint), there should be no other update SICs asserted for it.
2. A composite key constraint is consistent with domain constraints if each component attribute of a composite key is declared to be not-null.
3. Incidence constraints are always needed and unlikely to be inconsistent with other SICs. We need not verify them for consistency and non-redundancy.
4. A Relationship_Depends_on_Relationship SIC could be redundant or could be reduced to another SIC if any involved relationship is total. If there is a negative condition asserted for the involved relationships, specifying this type of SIC would be inconsistent.
• If R1 is total to E, the SIC, "if E R1 F then E R2 G", is subsumed by the totality constraint R2 total to E. On the other hand, if R2 is total to E, this SIC is trivially true.
• The SIC relative to E, "if E R1 F then E R2 G", or "if E R2 G then E R1 F", is inconsistent with an Exclusive_Relationship SIC, R1 ∥ R2, that is "if E R1 F then ¬(E R2 G)".
• If we have two SICs relative to E, "if E R1 F then E R2 G" and "if E R1 F then E R3 H", they are inconsistent with either an Exclusive_Relationship SIC, R2 ∥ R3, or a Not_And_Relationships SIC, "¬((E R1 F) ∧ (E R2 G) ∧ (E R3 H))".
• If we have two SICs, "if E R1 F then E R2 G" and "if E R2 G then E R3 H", they are inconsistent with a Not_And_Relationships SIC, "¬((E R1 F) ∧ (E R2 G) ∧ (E R3 H))".
• Both "if with_respect_to E, E R1 F then with_respect_to E, E R2 F" and "if with_respect_to F, E R1 F then with_respect_to F, E R2 F" become redundant if a Subset_Relationship SIC, R1 ⊆ R2, i.e., "if E R1 F then E R2 F", has been specified.
• If we have further restrictions, they may be redundant because they may be implied by another SIC.
For example, the further restriction "(F R3 H)" in the SIC, "if E R2 G then (E R1 F) ∧ (F R3 H)", is redundant if there is another SIC to require "if E R1 F then F R3 H".
• If R1 is total to E, the SIC, "if E R1 F then (E R2 G) ∨ (E R3 H)", reduces to an Either_Existence_Relationship SIC "(E R2 G) ∨ (E R3 H)". This SIC is also subsumed by either of the following:
- R2 total to E (i.e., ∀ E, ∃ R2, E R2 G),
- R3 total to E (i.e., ∀ E, ∃ R3, E R3 H),
- "if E R1 F then E R2 G",
- "if E R1 F then E R3 H".
This SIC is inconsistent with a set of two Exclusive_Relationship SICs: R1 ∥ R2, R1 ∥ R3.
5. An Exclusive_Relationship SIC, R1 ∥ R2, is inconsistent with the set of totality constraints in which both R1 and R2 are total to E. If R1 ∥ R2 and only one (say R1) of the relationships is total to E, the database designer needs to disconnect the other relationship type (i.e., R2) from E since it becomes unrelated to E; otherwise, they are inconsistent.
6. An Exclusive_Occurrence SIC becomes redundant when an Exclusive_Relationship SIC has been specified. However, it can exist when the set of totality constraints, in which both R1 and R2 are total to E, has been specified.
7. A Not_And_Relationships SIC, "¬((E R1 F) ∧ (E R2 G) ∧ (E R3 H))", becomes redundant when any pair of the involved relationships is exclusive, i.e., R1 ∥ R2, R2 ∥ R3, or R3 ∥ R1. It is inconsistent with the set of totality constraints in which all involved relationships R1, R2, and R3 are total to E. If only one relationship, say R3, is total, it reduces to a Not_And_Relationships SIC among the remaining ones (or an Exclusive_Relationship SIC if there are two remaining). It is not meaningful that only one of the relationship types (say, R3) is not total since "¬R3" would imply that the R3 relationship type should be disconnected.
8. An Either_Existence_Relationship SIC, "(E R1 F) ∨ (E R2 G) ∨ (E R3 H)", becomes redundant when any one of these involved relationships is total to E. It is also redundant when a proper subset of these relationships has been declared to
have an Either_Existence_Relationship SIC. A Subtype_Either_Existence_Relationship SIC, "if (E.A1 comp_op v1) then (E R1 F) ∨ (E R2 G) ∨ (E R3 H)", becomes redundant when the related Sub_Cmin (the subtype minimum cardinality) of any one of these involved relationships to E is greater than or equal to 1. It is inconsistent with the set of three Relationship_Depends_on_Entity_Value SICs:
    "if E R1 F then ¬(E.A1 comp_op v1)",
    "if E R2 G then ¬(E.A1 comp_op v1)",
    and "if E R3 H then ¬(E.A1 comp_op v1)".
9. A Pair_Relationships SIC, "if (E R1 F) then if (E R2 G) then (F R3 H)", is redundant when R3 is total to F. When R2 is total to E, it reduces to a Relationship_Depends_on_Relationship SIC, "if (E R1 F) then (F R3 H)". This SIC may be redundant or may need to be modified when there are some Relationship_Depends_on_Relationship SICs. For example, if we have "if (E R1 F) then (F R3 H)", the above SIC becomes redundant; if we have "if (E R1 F) then (E R2 G)", the above SIC reduces to "if (E R1 F) then (F R3 H)". However, in case there are further restrictions on R2 and R3, the consistency needs to be checked during the consultation and is also hard to verify.
10. A Relationship_Before_Relationship SIC, "if E R1 F then (E R2 G) before", is redundant when R2 is total to E. It is not meaningful when R1 is total to E. If the database designer specifies a time sequence cycle among some relationships (e.g., "if E R1 F then (E R2 G) before", "if E R2 G then (E R3 H) before", "if E R3 H then (E R1 F) before"), the insertion of all of the relationships can only be performed in a single transaction. It is possible that the database designer has made a mistake in asserting the logical meaning of these relationship types. This type of SIC can exist when a Relationship_Depends_on_Relationship SIC, a Subset_Relationship SIC, or an Exclusive_Relationship SIC, etc. has been specified for the related relationships R1 and R2.
11.
A Relationship_Not_Before_Relationship SIC, “if (E R1 F) then ¬(E Rn Y) before”, is inconsistent with the totality constraint that Rn is total to E, or with “if E R1 F then (E Rn Y) before”, or with a set of SICs: “if E R1 F then (E R2 G) before”, “if (E R2 G) then (E R3 H) before”, “if E R3 H ...”, “if E Ri X then E Rn Y before”. It is not meaningful when R1 is total to E. However, it is allowed to exist when a Relationship_Depends_on_Relationship SIC, a Subset_Relationship SIC, an Exclusive_Relationship SIC, etc. has been specified for the related relationships R1 and R2.
12. If a Group_Relationships SIC is specified, the verification is complicated.
• The SIC “if (E R1 F) ∧ (E R2 G) then (E R3 H)” is redundant when R3 is total to E. It reduces to “if (E R2 G) then (E R3 H)” when R1 is total to E. In addition, when both R1 and R2 are total to E, it is subsumed by a totality constraint: R3 is total to E. It is inconsistent with a Not_And_Relationships SIC, “¬((E R1 F) ∧ (E R2 G) ∧ (E R3 H))”.
• Two SICs, “if (E R1 F) ∧ (E R2 G) then E R3 H” and “if (E R1 F) ∧ (E R2 G) then E R4 I”, are inconsistent with R3 || R4, or with “¬((E R1 F) ∧ (E R2 G) ∧ (E R3 H) ∧ (E R4 I))”.
• Two SICs, “if (E R1 F) ∧ (E R2 G) then (E R5 J)” and “if E R5 J then E R6 K”, are inconsistent with “¬((E R1 F) ∧ (E R2 G) ∧ (E R5 J) ∧ (E R6 K))”.
• If this type of SIC has a disjunction of assertions in the “then” part, it is further complicated and needs a case analysis to verify consistency.
- The SIC “if (E R1 F) ∧ (E R2 G) then (E R3 H) ∨ (E R7 L)” is inconsistent with a set of two Not_And_Relationships SICs: “¬((E R1 F) ∧ (E R2 G) ∧ (E R3 H))” and “¬((E R1 F) ∧ (E R2 G) ∧ (E R7 L))”.
- Two SICs, “if (E R1 F) ∧ (E R2 G) then (E R3 H) ∨ (E R7 L)” and “if (E R1 F) ∧ (E R2 G) then (E R8 M) ∨ (E R9 N)”, are inconsistent with either a set of four exclusive relationships: R3 || R8, R3 || R9, R7 || R8, R7 || R9; or a set of four Not_And_Relationships SICs:
“¬((E R1 F) ∧ (E R2 G) ∧ (E R3 H) ∧
(E R8 M))”,
“¬((E R1 F) ∧ (E R2 G) ∧ (E R3 H) ∧ (E R9 N))”,
“¬((E R1 F) ∧ (E R2 G) ∧ (E R7 L) ∧ (E R8 M))”,
and “¬((E R1 F) ∧ (E R2 G) ∧ (E R7 L) ∧ (E R9 N))”.
- Similarly, two SICs, “if (E R1 F) ∧ (E R2 G) then (E R3 H) ∨ (E R7 L)” and “if (E R1 F) ∧ (E R2 G) then (E R9 N)”, are inconsistent with either a set of two exclusive relationships: R3 || R9, R7 || R9; or a set of two Not_And_Relationships SICs:
“¬((E R1 F) ∧ (E R2 G) ∧ (E R3 H) ∧ (E R9 N))”,
and “¬((E R1 F) ∧ (E R2 G) ∧ (E R7 L) ∧ (E R9 N))”.
- Three SICs, “if (E R1 F) ∧ (E R2 G) then (E R3 H) ∨ (E R7 L)”, “if E R3 H then E R9 N”, and “if E R7 L then E R5 J”, are inconsistent with a set of two Not_And_Relationships SICs:
“¬((E R1 F) ∧ (E R2 G) ∧ (E R3 H) ∧ (E R9 N))”,
and “¬((E R1 F) ∧ (E R2 G) ∧ (E R7 L) ∧ (E R5 J))”.
13. A Relationships_Join SIC, “if (E R1 F) then (E R2 H) ∧ (H R3 F)”, is not redundant even if either R2 or R3 is total to the related entities. It is inconsistent with an Exclusive_Relationship SIC among any pair of R1, R2, R3. If we have this SIC, two Relationship_Depends_on_Relationship SICs, “if (E R1 F) then (E R2 H)” and “if (E R1 F) then (H R3 F)”, are redundant.
14. The Symmetry and Transitivity properties of a relationship would not be inconsistent with the other SICs above.
15. If R1 is total to E, a Relationship_Depends_on_Entity_Value SIC, “if (E R1 F) then (E.A comp_op v)”, reduces to “(E.A comp_op v)”.
16. A Weak Relationship SIC or a weak entity type requires that the corresponding totality constraints be specified.

Appendix H
SIC Reformulation and Decomposition Algorithms

This appendix proposes algorithms for reformulating general SICs in the simplified format of Appendix C and decomposing them, if necessary, into operation-dependent sub-SICs as defined by the Representation model. Before applying the algorithms, the subsystem takes the responsibility of writing all general SICs in the simplified format of Appendix C. The if ... then ... 
rule format should be used if possible. For example, although the format “¬P ∨ Q” is equivalent to “if P then Q” in logic, the rule format should be used. If a general SIC is originally written by using a nested rule (i.e., if ... then if ... then ...), it should be rewritten by using only one pair of “if” and “then” keywords. Decomposed sub-SICs have the same certainty factor as their general SICs.

H.1 Find the Relevant Object and Operation Components

1. If the system variable Current_time appears in a general SIC, a sub-SIC for Current_time on update will usually be required. However, there are the following two exceptions, because the increase of Current_time will never violate the SIC.
(a) The SIC is in rule format and Current_time appears on the left-hand side of “≤” in the “if” part.
(b) The SIC is not in rule format (or the SIC is in rule format but Current_time is in the “then” part), and Current_time appears on the left-hand side of “≥”.
2. If an attribute (say E.A) is mentioned in a SIC, the SIC is usually relevant to it on update. However, if any of the following cases occurs, the SIC would not be relevant to it.
(a) The attribute has been declared to be unchangeable (i.e., there is a SIC “if E.A is_to_be_updated then false”).
(b) The SIC contains a pair of new and old special functions with arguments of other attributes, not this attribute.
(c) The SIC contains the keyword is_to_be_deleted.
(d) The attribute is in the “then” part and the modifier “before” is attached to it.
(e) The attribute is in the “then” part and there is a modifier “previously” in the “if” part.
(f) The attribute is a candidate key and is referenced by a special formula in the “if” part of the SIC. This formula involves the same attribute name of another entity type (e.g., E.A = F.A) to link the specific mentioned entity type occurrences (e.g., E and F) that correspond to the same physical entity occurrence (e.g., G) in a specialization hierarchy.
3. 
Suppose that a general SIC has been found by the above steps to be relevant to an attribute.
(a) Suppose that the relevance of the SIC to the attribute is not because the attribute is an argument of an aggregation function. In general, the SIC is also relevant to its associated entity or relationship on insertion. However, if any of the following cases occurs, the SIC would not be relevant to the associated entity or relationship on insertion.
i. The SIC is the one that declares the attribute to be unchangeable.
ii. The SIC contains a pair of new and old functions.
iii. The SIC contains a “previously” modifier in its “if” part or a “before” modifier in its “then” part.
In addition, if the attribute belongs to an entity and the SIC is also relevant to the insertion of one mentioned relationship in which the entity participates, the SIC is not relevant to the entity on insertion.
(b) Suppose that the SIC is relevant to the attribute because the attribute is an argument of an aggregation function. In general, the SIC is relevant to its associated entity or relationship on both insertion and deletion. However, if the assertion including the attribute is in the form “agg_fcn(E.A) comp_op arithmetic_simple_expression”, where “agg_fcn” is an aggregate function and “comp_op” is a comparison operator (for a comp_op (e.g., “>”) prefixed with a “not” or “¬”, first replace it with the proper comparison operator (e.g., “≤”)), the following would be exceptions:
i. Suppose that the aggregate function is either max or count:
• If the comparison operator is either “>” or “≥”, the SIC is only relevant to the mentioned entity/relationship on deletion, not insertion.
• If the comparison operator is either “<” or “≤”, the SIC is only relevant to the mentioned entity/relationship on insertion, not deletion.
ii. 
Suppose that the aggregate function is min:
• If the comparison operator is either “>” or “≥”, the SIC is only relevant to the mentioned entity/relationship on insertion, not deletion.
• If the comparison operator is either “<” or “≤”, the SIC is only relevant to the mentioned entity/relationship on deletion, not insertion.
iii. If the aggregate function is sum and if it can be assumed that the attribute values to be summed are all positive, the sub-SICs are the same as in the case of either max or count.
In all other cases (e.g., the comparison operator is “=” or “≠”, or the aggregate function is “avg”), the SIC is relevant to its associated entity or relationship on both insertion and deletion. However, similarly, if the attribute belongs to an entity and the SIC is also relevant to the insertion of one mentioned relationship in which the entity participates, it is not relevant to the entity on insertion.
4. Suppose that a SIC contains the keyword “is_to_be_deleted”. The SIC is only relevant to the entity or relationship that is to be deleted.
5. Suppose that a SIC does not contain the keyword “is_to_be_deleted”.
(a) Consider its mentioned relationships.
i. Suppose that in the “if” part of a SIC there is an assertion containing a relationship (say R). In general, the SIC is relevant to the relationship on insertion. However, if any of the following cases occurs, the SIC would be relevant to it on deletion, not insertion.
• A “not” or “¬” has been attached to the assertion.
• An “∃ at_most” numerical quantifier has been attached to the relationship R, or there is a “count(R) < Number” or “count(R) ≤ Number” assertion, but no “not” or “¬”.
• A modifier “previously” has been attached to the assertion.
If there is an “∃ exactly” numerical quantifier or a “count(R) = Number” assertion, the SIC would be relevant to the relationship on both insertion and deletion.
ii. 
Suppose that the “then” part of a SIC, or a SIC that is not in rule format, contains an assertion referencing a relationship. In general, the SIC is relevant to the relationship on deletion. However, if there is a “before” modifier attached to it, or if there is a “previously” modifier appearing in the “if” part of the SIC, this SIC is not relevant to it. In addition, if any of the following cases occurs, the SIC would be relevant to it on insertion, not deletion.
• A “not” or “¬” has been attached to the assertion.
• An “∃ at_most” numerical quantifier has been attached to the relationship, or there is a “count(R) < Number” or “count(R) ≤ Number” assertion, but no “not” or “¬”.
• A “∀” quantifier has been attached to the relationship, but no “not” or “¬”.
If there is an “∃ exactly” numerical quantifier or a “count(R) = Number” assertion, the SIC would be relevant to the relationship on both insertion and deletion.
(b) Consider its mentioned entities. If there are relationships involved in the form “entity_type1 relationship_type entity_type2” in a SIC, do not consider the non-sharing entities. The sharing entities are those entities which:
- appear in at least two assertions as a participant of relationships or as an owner of an attribute in either the “if” or the “then” part, if there is no “with_respect_to” modifier; or
- are only those in the “with_respect_to” modifiers.
For example, in the case of “if (E RX F) then (E RY G)”, E is a sharing entity; F and G are not. In the case of “if (E RX F) then (E RY H) ∧ (F RZ H)”, all entities E, F, and H are sharing entities. In the case of “if with_respect_to E, (E RX F) then with_respect_to E, (E RY F)”, E is a sharing entity, but F is not. If the involved entity types are in the same specialization hierarchy, the sharing entity types may be different in syntax. 
For example, assuming that some managers do not supervise employees, we may have
“if with_respect_to Manager, (Manager Supervise Employee)
then with_respect_to Employee, ¬ (Employee Participate Union)”
It means that if a manager supervises employee(s), he (she) cannot, as an employee, participate in a union. Here the “sharing” entity type Manager in the relationship Supervise corresponds to Employee in the relationship Participate. A more complicated case would need a list of sharing entity types in some order and their corresponding sharing entity types in the same order. For example,
“if with_respect_to (Teacher, Student), (Teacher Instruct Student)
then with_respect_to (Worker, Manager), ¬ (Manager Supervise Worker)”
This is an exclusive occurrence example which means that “if a teacher instructs a student, the student cannot, as a manager, supervise the teacher as a worker” (but the other way around may be permitted). Here the sharing entity types are: Teacher corresponding to Worker, and Student corresponding to Manager.
i. Suppose that in the “if” part of a SIC there is an assertion referencing an entity. In general, the SIC is relevant to the entity on insertion. However, if the SIC is relevant to the insertion of a relationship in which the entity participates, the SIC is not relevant to the entity on insertion.
ii. Suppose that the “then” part of a SIC, or a SIC that is not in rule format, contains an assertion referencing an entity. In general, if the assertion is not on its attribute(s), the SIC is relevant to the entity on deletion. If any of the following cases occurs, the SIC would be relevant to it on insertion, not deletion. However, if the SIC is relevant to the insertion of a relationship in which the entity participates, it is not relevant to the
insertion of the entity; similarly, if the SIC is relevant to the deletion of a relationship in which the entity participates, it is not relevant to the deletion of the entity.
• A “not” or “¬” has been attached to the assertion.
• An “∃ at_most” numerical quantifier has been attached to the entity, or there is a “count(R) < Number” or “count(R) ≤ Number” assertion, but no “not” or “¬”.
• A “∀” quantifier has been attached to the entity, but no “not” or “¬”.
(c) If there is a SIC including an aggregate function because of the natural association or indexing derived set association, the SIC is also relevant to the entity, as a set object, on deletion.
6. Suppose that by the above steps, a general SIC is found to be relevant to the insertion or deletion of a relationship R or entity E. If its primary key is also explicitly mentioned in the SIC, skip this step. Otherwise, do the following.
(a) Suppose that the SIC only contains attributes of the relationship type R or entity type E. In general, we need not consider the primary key update problem. However, there is an exception. If there is a keyword “is_to_be_deleted”, the SIC is relevant to any of the key attributes of the relationship R or entity E on update. The proper SIC name (refer to Appendix H.4) should be recorded in its “associated_PKSIC_D” predicate.
(b) Suppose that the SIC also contains attributes other than those of the relationship type R or entity type E, or contains assertions directly referencing R or E.
i. If the SIC is relevant to the insertion or deletion of the relationship R, it would also be relevant to the update of any of its key attributes that relate to its sharing entities. The proper SIC name (see Appendix H.4) should be recorded in its “associated_PKSIC_I” or “associated_PKSIC_D” predicate (see Appendix B.2). It is not relevant to the update of the relationship’s key attributes that relate to non-sharing entities.
ii. 
If the SIC is relevant to the insertion or deletion of the entity E, it would also be relevant to the update of any of its primary key attributes. The proper SIC name (see Appendix H.4) should be recorded in its “associated_PKSIC_I” or “associated_PKSIC_D” predicate.

H.2 Write the Proper Precondition and Predicate Components

1. If there is any aggregate function with the attribute E.A or R.A as an argument, properly attach subscripts to the entity or relationship owning the attribute. That is, supposing agg_fcn is an aggregate function, do the following (similarly for R.A).
• change “agg_fcn(E.A)” to “agg_fcn(E1.A)”;
• change “agg_fcn(E.A) comp_op arithmetic_simple_expression involving E.A” to “agg_fcn({E1.A E1.A E0.A}) comp_op arithmetic_simple_expression involving E0.A”.
2. Suppose that the special predicate unique is used.
• If it contains only one argument, e.g., “unique(E.A)”, change it to “count({E1 | E1.A = E0.A}) = 1”.
• If it contains more than one argument, e.g., “unique(E.A1, E.A2)”, change it to
“count({E1 | comp_atts_occ(E0, Comp_Key, CurrentCompAttsValue),
comp_atts_occ(E1, Comp_Key, AnyCompAttsValue),
CurrentCompAttsValue = AnyCompAttsValue}) = 1”,
where Comp_Key (e.g., {E.A1, E.A2}) is a set containing the arguments of the predicate unique.
3. 
If there is a relationship expression “Entity_Occ1 Rship_Occ Entity_Occ2”, e.g., “E R F”, rewrite it for sharing entities by using a special predicate “rship_occ_part(Rship_Occ, Role_Type, Sharing_Entity_Occ)” that is used to evaluate whether a Sharing_Entity_Occ participates in a relationship occurrence Rship_Occ with the Role_Type.
• If there is no “with_respect_to” modifier, the Role_Type is the same as the entity type of F.
• If there are “with_respect_to” modifiers with a common entity type, the Role_Type would be the entity type in these modifiers.
• If the entity types in the “with_respect_to” modifiers are different, the relationship expressions would be written with the same entity variable, but different Role_Types. For example, supposing that we have “if with_respect_to F, E RX F then with_respect_to G, G RY H”, the first relationship expression would be written as “rship_occ_part(RX, “F”, F)”, and the second relationship expression would be written as “rship_occ_part(RY, “G”, F)”.
• If the entity types in the “with_respect_to” modifiers are in ordered lists, there would be two pairs of “rship_occ_part” assertions with the same entity variables taking their Role_Types in the corresponding order. For example, given “if with_respect_to (E, F), E RX F then with_respect_to (H, G), G RY H”, we would have the following four assertions:
“rship_occ_part(RX, “E”, E)”,
“rship_occ_part(RX, “F”, F)”,
“rship_occ_part(RY, “H”, E)”, and
“rship_occ_part(RY, “G”, F)”.
4. Omit any quantifier “∀” or “∃” of the SIC unless there is an “at_least”, “at_most”, “exactly”, or “different” following an “∃” quantifier.
5. Rewrite numerical quantifiers. That is, for all numerical quantifiers on any relationship R, do the following.
• change “∃ at_least Number R” to “count(R) ≥ Number”;
• change “∃ at_most Number R” to “count(R) ≤ Number”;
• change “∃ exactly Number R” to “count(R) = Number”.
A subscript may be attached to R in a later step.
6. 
If a “before” modifier is attached to an assertion referencing an object, apply an old function to that object and delete the “before” modifier.
7. Suppose that a general SIC is found to be relevant to an attribute (say E.A or R.A) by applying the algorithm in Appendix H.1.
(a) If the sub-SIC declares the attribute to be non-updateable, the predicate would only include “false”. All other assertions are in the precondition component to identify the attribute.
(b) In other cases, the predicate component would usually only have the assertion (say Q) containing the attribute (E.A or R.A) for which we are reformulating. That is,
i. If the original general SIC contains only an arithmetic expression Q, leave it as the predicate.
ii. Suppose that the original general SIC is in rule format.
• If Q is in the “if” part, negate it and move it to the predicate component. Leave the further restrictions (if any) on Q in the precondition component. Negate all the original assertions in the “then” part of the general SIC, and move them to “AND” with the precondition component.
• If Q is in the “then” part, leave it as the predicate component, and move (but do not negate) the further restrictions (if any) on Q to “AND” with the precondition component.
(c) If the subscript 1 has been attached to its entity or relationship, no matter whether it is attached to this attribute (e.g., agg_fcn(E1.A)) or other attributes (e.g., agg_fcn(E1.B)) of the same entity or relationship, owing to the aggregate functions mentioned in step 1, attach the subscript 0 (i.e., E0.A) to its assertions other than those involving aggregate functions.
(d) If the original general constraint is to assert explicitly the equality of some key attributes of two entity types, it should be dealt with specially. 
In the precondition component of the sub-SIC for the involved key attribute of one entity on update, the old function is used to search for the corresponding occurrence of the other entity before updating. Its predicate component should require the new values of the corresponding key attributes of the two entity types to be equivalent.
8. Suppose that a general SIC is found to be relevant to a relationship R or entity E by applying the algorithm in Appendix H.1.
(a) If the SIC only contains assertions referencing attributes of the relationship R or entity E, keep the original format that the system obtains from the database designer as the precondition and predicate components of the sub-SIC for the relationship R or entity E.
(b) Suppose that the SIC also contains assertions referencing attributes other than those of the relationship R or entity E, or contains assertions directly on the relationship R or entity E.
i. Suppose that the SIC is found to be relevant to the entity E on insertion. In general, the sub-SIC for the entity keeps the original format. However, if the assertion (say, Q) containing the entity is in the “then” part with a “not” or “¬”, rewrite the original format: negate and move Q to the precondition component, and negate and move all other assertions in the original “if” part to the predicate component. Replace any “count(E)” with “∃ E1, count(E1)”.
ii. Suppose that the SIC is found to be relevant to the entity E on deletion. Rewrite the original format so that the assertion containing the entity E is in the precondition component.
iii. Suppose that the SIC is found to be relevant to the relationship R on insertion. In general, keep the original format. However, if the “if” part originally does not contain the “rship_occ_part” assertions of the sharing entities regarding the relationship R, add them to the precondition component to identify this specific relationship R. 
If a “rship_occ_part” assertion is originally in the “then” part with a “not” or “¬” attached to it, rewrite the original format so that the assertion is in the precondition component. Replace any “count(R)” with “∃ R1, count(R1)”.
iv. Suppose that the SIC is found to be relevant to the relationship R on deletion.
A. Suppose that there is “∃ different R” in the original format.
Replace “∃ different R” with “∃ R1, R1 ≠ R0”.
In its precondition component, replace the relationship variable R with R0.
In its predicate component, replace the relationship variable R with R1.
B. Suppose that it is not the above case A, and the rship_occ_part assertion(s) containing this relationship R is (are) in the “if” part.
Keep the original format after removing any “previously” modifier.
Replace any “count(R)” with “∃ R1, R1 ≠ R0, count(R1)”.
C. Suppose that it is not the above case A, and the assertion(s) (say Q) containing this relationship R is (are) in the “then” part.
• Suppose that for maintaining semantic integrity it is possible to find other occurrences to replace the one to be deleted. That is, originally the relationship expression does not contain an “∃ at_most” numerical quantifier, and the SIC is not relevant to all the key attributes of the relationship. (For example, a Subset_Relationship SIC is relevant to the whole relationship key.)
- The original assertions in the “if” part are in the precondition component.
- In addition, add one copy of the assertion(s) Q containing this relationship variable R with a subscript 0 at the beginning of the precondition component, in conjunction with the other assertions.
- If originally Q is a part of further restrictions on some assertions (say Qsome), move (but do not negate) Qsome, after removing their numerical quantifiers or count aggregate functions, to the precondition component. 
Rearrange them in reverse order and put them behind the assertion Q.
- Replace the assertion(s) Q (that may include the “count” aggregate function) in the “then” part with the one containing this relationship variable with a subscript 1, and add the assertion “R1 ≠ R0”.
- The further restrictions (if any) on Q are kept unchanged in the predicate component.
For example, suppose that Q0 is the assertion(s) containing R0, and Q1 is the assertion(s) containing R1. If originally we have “if QW then Q ∨ QX ∨ QY ∨ ...”, the precondition component would be “Q0 ∧ QW”, and the predicate component would be “Q1 ∨ QX ∨ QY ∨ ...”. If originally we have “if QW then QX ∧ QY ∧ Q ∧ QZ ∧ ...”, where QY contains a numerical quantifier “∃ at_least”, the precondition component would be “Q0 ∧ QY′ ∧ QX ∧ QW”, where QY′ is QY after removing the quantifier, and the predicate component would be “Q1 ∧ QZ ∧ ...”.
• Suppose that it is not possible to find other occurrences to replace the one to be deleted.
- Move (but do not negate) Q and all its further restrictions to the precondition component.
- If after doing the above, we find that the predicate component contains only the assertion “false”, negate all the assertions in the original “if” part and move them to the predicate component.
v. Although in the above we have mentioned placing an “∃” quantifier on a variable with a subscript 1, an “∃” quantifier on a variable in other cases is optional. Without an explicit quantifier, an “∃” quantifier can be automatically assumed on any variable in that assertion. We may explicitly express the quantifier just for clarity. No explicit quantifier need be placed on the object for which we are writing precondition and predicate components. 
We may place an explicit “∃” quantifier on an unknown variable the first time it is included in any assertion. For example, if we are writing precondition and predicate components for R and we need an assertion rship_occ_part(R, “E”, E), we may place an explicit quantifier on E, i.e., “∃ E”.

H.3 Suggest the Violation Action Component

1. A sub-SIC whose certainty is less than 100% (on a ratio scale), is not 10 (supposing that the database designer specifies 10 as the highest on an ordinal scale), is “uncertain” (with 2 levels), or is given in fuzzy terms (e.g., “sometimes”) would have at least two alternative violation actions: “warning” and “conditionally_reject”. However, if the database designer has specified a certainty threshold, those sub-SICs with certainty less than the threshold would only have one violation action: “warning”.
2. A sub-SIC with certainty 100% (on a ratio scale), 10 (if it is the highest on an ordinal scale), or “certain” (with 2 levels) would have at least one alternative violation action: “reject”.
3. Suppose that there are only two assertions on two objects (e.g., “if (E RX F) then (E RY G)”) in the SIC.
(a) Suppose that these two assertions are both for relationships or entities (say RX and RY), not attributes. Both its sub-SICs would have propagation as an alternative violation action. That is, a certain sub-SIC could have an alternative “propagate”, and an uncertain sub-SIC could have “conditionally_propagate”.
i. In a sub-SIC for one object, the propagated operation type is opposite to the relevant operation of the other object found by the algorithm in Appendix H.1.
For example, supposing that the general SIC is “if (E RX F) then (E RY G)”, the violation action in the sub-SIC for RX on insertion could be “propagate(insert(RY))”; supposing that the general SIC is “if (E RX F) then ¬ (E RY G)”, the violation action in the sub-SIC for RX on insertion could be “propagate(delete(RY))”.
ii. 
If there is an “∃ at_least” numerical quantifier attached to the assertion containing the object to be propagated, or there is a “count(RshipType) ≥ Number” assertion in the sub-SIC, use “insert_all” or “delete_all” as the propagated operation to indicate that there are explicit quantitative requirements when propagating to insert or delete the object occurrences.
iii. If the object to be propagated is an entity occurrence, specify into/from which entity type this occurrence is to be inserted or deleted. That is, we would need “propagate(insert(EntType, E))” or “propagate(delete(EntType, E))”, where EntType is the entity type with which E participates in a related relationship in the sub-SIC. This EntType is the one that appears in an assertion “rship_occ_part(RshipType, EntType, E)” or “ent_occ(EntType, E)” in the predicate or precondition component of the sub-SIC.
(b) Suppose that one of the assertions is for an attribute. Similarly, the sub-SIC for the attribute would have an alternative to propagate to insert or delete an entity or relationship. The other sub-SIC, for the entity or relationship, could have “propagate(update(E.A, arithmetic_simple_expression))” as an alternative only if the assertion of the attribute is “E.A = arithmetic_simple_expression”.
(c) Suppose that the two assertions are both for attributes, say E.A1 and E.A2. The sub-SIC for E.A2 (or E.A1) could have an alternative “propagate(update(E.A1, arithmetic_simple_expression_1))” (or “propagate(update(E.A2, arithmetic_simple_expression_2))”) only if those two assertions are “E.A1 = arithmetic_simple_expression_1” and “E.A2 = arithmetic_simple_expression_2”, or there is a single assertion such as “E.A1 = E.A2”.
4. In the case that there are two alternative violation actions in a sub-SIC, ask the database designer to choose one.

H.4 Generate the SIC Name

1. The “ObjectType, OperationType” in the SIC name can be decided by the system after applying the above algorithms.
2. 
The “SICType” and “RelatedObjectTypeSet” depend on the predefined SIC types (e.g., those in Appendix D). If there are some further restrictions on some predefined SIC types, in general, all the objects that appear in the further restrictions are included in the “RelatedObjectTypeSet”. However, the entity type names need not be included if the names of their attributes have been included. Neither would the entity type names be included if a relationship type is mentioned and both of its participant entity types are concerned (e.g., Subset_Relationship SIC). No duplicate object names are included.
3. For some complicated applications, if the system finds that it cannot distinguish two SIC names by the previous steps, a “SequenceNo” is added.

Appendix I
Some Examples of SIC Reformulation and Decomposition

This appendix contains four examples of reformulating and decomposing general SICs.

Conditional_Value SIC:
Suppose that there are only two attributes involved, such as
if E.A1 comp_op1 v1
then E.A2 comp_op2 v2
where comp_op1 or comp_op2 denotes =, ≠, ≤, ≥, <, or >.
This type of SIC is represented by the following three sub-SICs.

E.A2-U-ConditionalValue-(E.A1)
CERTAINTY certain
FOR E.A2
ON update
IF E.A1 comp_op1 v1
ASSERT E.A2 comp_op2 v2
ELSE reject

E.A1-U-ConditionalValue-(E.A2)
CERTAINTY certain
FOR E.A1
ON update
IF E.A2 ¬ comp_op2 v2
ASSERT E.A1 ¬ comp_op1 v1
ELSE reject

E-I-ConditionalValue-(E.A1, E.A2)
CERTAINTY certain
FOR E
ON insertion
IF E.A1 comp_op1 v1
ASSERT E.A2 comp_op2 v2
ELSE reject
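The Conditional_Value decomposition above can be sketched in Python as a small generator of the three sub-SICs. This is only an illustrative sketch, not the thesis's subsystem: the `SubSIC` record, the `decompose_conditional_value` function, and the `NEGATE` table are hypothetical names, and the negated assertions of the second sub-SIC are written with the complementary comparison operator rather than with an explicit “¬”.

```python
from dataclasses import dataclass

# Complement of each comparison operator, used to write the negated
# assertions of the contrapositive sub-SIC (an assumption of this sketch).
NEGATE = {"=": "≠", "≠": "=", "<": "≥", "≥": "<", ">": "≤", "≤": ">"}


@dataclass(frozen=True)
class SubSIC:
    name: str                   # SIC name
    target: str                 # FOR component
    operation: str              # ON component
    precondition: str           # IF component
    predicate: str              # ASSERT component
    action: str = "reject"      # ELSE component
    certainty: str = "certain"  # CERTAINTY component


def decompose_conditional_value(entity, a1, op1, v1, a2, op2, v2):
    """Decompose 'if E.A1 op1 v1 then E.A2 op2 v2' into three sub-SICs."""
    x1, x2 = f"{entity}.{a1}", f"{entity}.{a2}"
    cond, concl = f"{x1} {op1} {v1}", f"{x2} {op2} {v2}"
    not_cond = f"{x1} {NEGATE[op1]} {v1}"
    not_concl = f"{x2} {NEGATE[op2]} {v2}"
    return [
        # Updating A2 must keep the conclusion true whenever the condition holds.
        SubSIC(f"{x2}-U-ConditionalValue-({x1})", x2, "update", cond, concl),
        # Updating A1 is checked through the contrapositive of the rule.
        SubSIC(f"{x1}-U-ConditionalValue-({x2})", x1, "update", not_concl, not_cond),
        # Inserting an occurrence of E is checked with the rule as written.
        SubSIC(f"{entity}-I-ConditionalValue-({x1}, {x2})", entity, "insertion",
               cond, concl),
    ]


if __name__ == "__main__":
    for sub in decompose_conditional_value("E", "A1", ">", "v1", "A2", "=", "v2"):
        print(f"{sub.name}: ON {sub.operation} IF {sub.precondition} "
              f"ASSERT {sub.predicate} ELSE {sub.action}")
```

Keeping each sub-SIC as a plain record mirrors the CERTAINTY / FOR / ON / IF / ASSERT / ELSE layout of the listings, so a generated sub-SIC can be compared line by line with the hand-written ones.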
Relationship_Depends_on_Relationship involving 3 entity types:
For example, in Figure 5.2 (on page 110):
if E RX F
then E RY G
There are two decomposed sub-SICs to represent this type of SIC.

RX-I-RshipDepRship3E-(E, RY)
CERTAINTY certain
FOR RX
ON insertion
IF ∃ E, rship_occ_part(RX, “E”, E)
ASSERT ∃ RY, rship_occ_part(RY, “E”, E)
ELSE reject or propagate(insert(RY))

RY-D-RshipDepRship3E-(E, RX)
CERTAINTY certain
FOR RY
ON deletion
IF ∃ E, rship_occ_part(RY0, “E”, E),
∃ RX, rship_occ_part(RX, “E”, E)
ASSERT ∃ RY1, RY1 ≠ RY0,
rship_occ_part(RY1, “E”, E)
ELSE reject or propagate(delete(RX))

Relationship_Depends_on_Relationship involving 4 entity types:
In Figure 5.3, we may have
if E RX F
then (E RY G) ∨ (E RZ H)

RX-I-RshipDepRship4E-(E, RY, RZ)
CERTAINTY certain
FOR RX
ON insertion
IF ∃ E, rship_occ_part(RX, “E”, E)
ASSERT (∃ RY, rship_occ_part(RY, “E”, E)) ∨
(∃ RZ, rship_occ_part(RZ, “E”, E))
ELSE reject

RY-D-RshipDepRshipE-(E, RX, RZ)
CERTAINTY certain
FOR RY
ON deletion
IF ∃ E, rship_occ_part(RY0, “E”, E),
∃ RX, rship_occ_part(RX, “E”, E)
ASSERT (∃ RY1, RY1 ≠ RY0, rship_occ_part(RY1, “E”, E)) ∨
(∃ RZ, rship_occ_part(RZ, “E”, E))
ELSE reject

RZ-D-RshipDepRshipE-(E, RX, RY)
CERTAINTY certain
FOR RZ
ON deletion
IF ∃ E, rship_occ_part(RZ0, “E”, E),
∃ RX, rship_occ_part(RX, “E”, E)
ASSERT (∃ RY, rship_occ_part(RY, “E”, E)) ∨
(∃ RZ1, RZ1 ≠ RZ0, rship_occ_part(RZ1, “E”, E))
ELSE reject
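To make the operation-dependent checks concrete, the following Python sketch enforces the sub-SICs of “if E RX F then (E RY G) ∨ (E RZ H)” on an in-memory store. It is a simplified illustration under stated assumptions: the `Store` class, its `insert`/`delete` methods, and `IntegrityError` are hypothetical names, relationship occurrences are modelled as (E occurrence, other occurrence) pairs, and only the “reject” violation action is implemented (no propagation). Passing a single alternative, e.g. `Store(("RY",))`, gives the 3-entity-type case.

```python
class IntegrityError(Exception):
    """Raised when an operation is rejected by a sub-SIC."""


class Store:
    def __init__(self, alternatives=("RY", "RZ")):
        # Relationship types allowed to satisfy the "then" part.
        self.alternatives = tuple(alternatives)
        # Occurrences are pairs (e, other), keyed by relationship type.
        self.occ = {name: set() for name in ("RX", *self.alternatives)}

    def _participates(self, rel, e, excluding=None):
        # True if e takes part in some occurrence of rel other than `excluding`.
        return any(p[0] == e and p != excluding for p in self.occ[rel])

    def insert(self, rel, e, other):
        # RX-I sub-SIC: e must already participate in one of the alternatives.
        if rel == "RX" and not any(
            self._participates(a, e) for a in self.alternatives
        ):
            raise IntegrityError(f"insert {rel}({e}, {other}) rejected")
        self.occ[rel].add((e, other))

    def delete(self, rel, e, other):
        pair = (e, other)
        if pair not in self.occ[rel]:
            raise KeyError(pair)
        # RY-D / RZ-D sub-SICs: precondition is that e participates in RX;
        # predicate is that another occurrence (R1 != R0) of the deleted type,
        # or any occurrence of another alternative, still involves e.
        if rel in self.alternatives and self._participates("RX", e):
            remaining = any(
                self._participates(a, e, excluding=pair if a == rel else None)
                for a in self.alternatives
            )
            if not remaining:
                raise IntegrityError(f"delete {rel}({e}, {other}) rejected")
        self.occ[rel].remove(pair)


if __name__ == "__main__":
    store = Store()
    store.insert("RY", "e1", "g1")
    store.insert("RX", "e1", "f1")      # accepted: e1 participates in RY
    try:
        store.delete("RY", "e1", "g1")  # rejected: no replacement occurrence
    except IntegrityError as err:
        print(err)
```

Note that deletions of RX and insertions of RY or RZ are never checked, which matches the decomposition: the general SIC cannot be violated by those operations.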
Subtype_Stronger_Totality_Cardinality:

For example, in the context of Figure 5.2, we may have:

    if ∃ RY, E RY G
    then ∃ at least Sub_Cmin RX, E RX F,
         ∃ RZ, F RZ H

RY-I-SubStrongTotal-(E, F, RZ, RX)
    CERTAINTY certain
    FOR RY
    ON insertion
    IF ∃ E, rship_occ_part(RY, "E", E)
    ASSERT ∃ RX, rship_occ_part(RX, "E", E),
           ∃ F, rship_occ_part(RX, "F", F),
           ∃ RZ, rship_occ_part(RZ, "F", F),
           count(RX) ≥ Sub_Cmin
    ELSE reject

RX-D-SubStrongTotal-(E, F, RY, RZ)
    CERTAINTY certain
    FOR RX
    ON deletion
    IF ∃ E, rship_occ_part(RX0, "E", E),
       ∃ RY, rship_occ_part(RY, "E", E)
    ASSERT ∃ RX1, RX1 ≠ RX0, rship_occ_part(RX1, "E", E),
           ∃ F, rship_occ_part(RX1, "F", F),
           ∃ RZ, rship_occ_part(RZ, "F", F),
           count(RX1) ≥ Sub_Cmin
    ELSE reject

RZ-D-SubStrongTotal-(E, F, RY, RX)
    CERTAINTY certain
    FOR RZ
    ON deletion
    IF ∃ F, rship_occ_part(RZ0, "F", F),
       ∃ RX, rship_occ_part(RX, "F", F),
       ∃ E, rship_occ_part(RX, "E", E),
       ∃ RY, rship_occ_part(RY, "E", E)
    ASSERT ∃ RZ1, RZ1 ≠ RZ0, rship_occ_part(RZ1, "F", F)
    ELSE reject

Appendix J

Generic SIC Representation in the E-R-SIC Model

This appendix contains a partial listing of generic SIC representations in the E-R-SIC model for illustration. If there are two alternative violation actions, the system will query the database designer to choose one.
Please refer to [Yang, 1992] for other generic SICs mentioned in Section 7.3.1.

Domain Constraint:

Entity*-I-Domain
    CERTAINTY certain
    FOR Entity*
    ON insertion
    IF entity(Entity*, Prim_Key, Comp_Key_Set, Ab_Max_Card)
    ASSERT set{AttName | attribute("Entity", AttName, Domain,
                         SpecialVRange, Null?, Unique?, Key?, Changeable?)},
           att_occ(Entity*, AttName, EntAttOcc),
           concatenate_SICname("Entity.", AttName, "U-DomNull", AttDomainSIC1),
           checkcomSIC(AttDomainSIC1, EntAttOcc),
           concatenate_SICname(Entity*., AttName, "U-DomTypeForVal", AttDomainSIC2),
           checkcomSIC(AttDomainSIC2, EntAttOcc),
           concatenate_SICname(Entity*., AttName, "U-DomUnique", AttDomainSIC3),
           checkcomSIC(AttDomainSIC3, EntAttOcc)
    ELSE reject

Entity*.Attribute*-U-DomNull
    CERTAINTY certain
    FOR Entity*.Attribute*
    ON update
    IF attribute(Entity*, Attribute*, Domain, SpecialVRange,
                 Null?, Unique?, Key?, Changeable?),
       Null? = no
    ASSERT is_not_null(Entity*.Attribute*)
    ELSE reject

Entity*.Attribute*-U-DomTypeFormatVal
    CERTAINTY certain
    FOR Entity*.Attribute*
    ON update
    IF is_not_null(Entity*.Attribute*),
       attribute(Entity*, Attribute*, Domain, SpecialVRange,
                 Null?, Unique?, Key?, Changeable?),
       domain(Domain, DataType, Format, ValueRange)
    ASSERT satisfy_datatype(Entity*.Attribute*, DataType),
           satisfy_format(Entity*.Attribute*, Format),
           satisfy_value(Entity*.Attribute*, SpecialVRange),
           satisfy_value(Entity*.Attribute*, ValueRange)
    ELSE reject

Entity*.Attribute*-U-DomUnique
    CERTAINTY certain
    FOR Entity*.Attribute*
    ON update
    IF attribute(Entity*, Attribute*, Domain, SpecialVRange,
                 Null?, Unique?, Key?, Changeable?),
       Unique? = yes
    ASSERT count(set{Entity | Entity.Attribute* = Entity.Attribute*}) = 1
    ELSE reject

Entity*.Attribute*-U-DomChange
    CERTAINTY certain
    FOR Entity*.Attribute*
    ON update
    IF attribute(Entity*, Attribute*, Domain, SpecialVRange,
                 Null?, Unique?, Key?, Changeable?),
       Changeable? = no
    ASSERT false
    ELSE reject
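The per-attribute domain checks above can be illustrated with a short Python sketch. It is not the thesis's implementation: `AttributeSpec`, `violated_sics`, and the use of a regular expression for the format are assumptions made for illustration, the special value range is omitted, and the uniqueness test is applied only to non-null values (also an assumption). The function reports which of the four sub-SIC names would be violated by a new attribute value.

```python
import re
from dataclasses import dataclass
from typing import Any, Collection, List, Optional, Tuple


@dataclass(frozen=True)
class AttributeSpec:
    """Hypothetical stand-in for the attribute(...) and domain(...) facts."""
    data_type: type
    fmt: Optional[str] = None                      # regex; an assumption
    value_range: Optional[Tuple[Any, Any]] = None  # inclusive (low, high)
    nullable: bool = True                          # Null?
    unique: bool = False                           # Unique?
    changeable: bool = True                        # Changeable?


def violated_sics(spec: AttributeSpec,
                  new_value: Any,
                  other_values: Collection[Any],
                  is_update: bool) -> List[str]:
    """Return the names of the domain sub-SICs that new_value violates.

    other_values holds the attribute values of the other entity occurrences.
    """
    violations: List[str] = []
    if new_value is None:
        if not spec.nullable:
            violations.append("U-DomNull")
    else:
        ok = isinstance(new_value, spec.data_type)
        if ok and spec.fmt is not None:
            ok = re.fullmatch(spec.fmt, str(new_value)) is not None
        if ok and spec.value_range is not None:
            low, high = spec.value_range
            ok = low <= new_value <= high
        if not ok:
            violations.append("U-DomTypeFormatVal")
        if spec.unique and new_value in other_values:
            violations.append("U-DomUnique")
    # U-DomChange asserts false on any update of an unchangeable attribute.
    if is_update and not spec.changeable:
        violations.append("U-DomChange")
    return violations


# Hypothetical attribute: a non-null, unique, unchangeable integer in 0..150.
spec = AttributeSpec(data_type=int, value_range=(0, 150),
                     nullable=False, unique=True, changeable=False)
```

On insertion (`is_update=False`) only the null, type/format/value, and uniqueness checks apply, which matches the three sub-SICs invoked by the Entity*-I-Domain SIC; the change check is relevant only to updates.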
Primary_Key SIC:

Entity*.Attribute*-U-PrimKeySurroDel
    CERTAINTY certain
    FOR Entity*.Attribute*
    ON update
    IF entity(Entity*, Prim_Key, Comp_Key_Set, Ab_Max_Card),
       associated_PKSICs_D("Entity", SIC_Name_Set),
       belongs_to("Attribute*", Prim_Key)
    ASSERT set{SIC_Name | belongs_to(SIC_Name, SIC_Name_Set)},
           checkmemSIC(SIC_Name, old(Entity*))
    ELSE reject

Entity*.Attribute*-U-PrimKeySurroIns
    CERTAINTY certain
    FOR Entity*.Attribute*
    ON update
    IF entity(Entity*, Prim_Key, Comp_Key_Set, Ab_Max_Card),
       associated_PKSICs_I(Entity*, SIC_Name_Set),
       belongs_to(Attribute*, Prim_Key)
    ASSERT set{SIC_Name | belongs_to(SIC_Name, SIC_Name_Set)},
           checkmemSIC(SIC_Name, new(Entity*))
    ELSE reject

Absolute Maximum Cardinality Constraint of an Entity Type:

Entity*-I-AbsCard
    CERTAINTY certain
    FOR Entity*
    ON insertion
    IF entity(Entity*, Primary_Key, Composite_Key_Set, Abs_Max_Card),
       Abs_Max_Card
    ASSERT Entity, count(Entity) Abs_Max_Card
    ELSE reject

Incidence Constraint:

Relationship*-I-Incidence-(EntType)
    CERTAINTY certain
    FOR Relationship*
    ON insertion
    IF relationship_participant(Relationship*, EntType,
                                Min_Cardinality, Max_Cardinality)
    ASSERT ∃ EntOcc, ent_occ(EntType, EntOcc),
           rship_occ_part(Relationship*, EntType, EntOcc)
    ELSE reject or propagate(insert(EntType, EntOcc))

Entity*-D-Incidence-(RshipType)
    CERTAINTY certain
    FOR Entity*
    ON deletion
    IF relationship_participant(RshipType, Entity*,
                                Min_Cardinality, Max_Cardinality)
    ASSERT ¬(∃ RshipOcc, rship_occ(RshipType, RshipOcc),
             rship_occ_part(RshipOcc, Entity*, Entity*))
    ELSE reject or propagate(delete(RshipOcc))

Subset_Relationship SIC: There are two decomposed sub-SICs to represent this type of SIC, as below.

Relationship*-I-RshipSubset-(AnotherRshipType)
    CERTAINTY certain
    FOR Relation