UBC Theses and Dissertations


View integration in database design Wagner, Christian 1989


VIEW INTEGRATION IN DATABASE DESIGN

by

Christian Wagner

Diplom-Ingenieur, Technical University Berlin, 1984

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

in

THE FACULTY OF GRADUATE STUDIES

Faculty of Commerce and Business Administration

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA

April 1989

(c) Christian Wagner, 1989

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Faculty of Commerce and Business Administration
The University of British Columbia
Vancouver, Canada
Date: April 24, 1989

ABSTRACT

The purpose of this research is the formalization of a method of bottom up database design known as view integration. View integration is one of the main steps of an acknowledged database design procedure, the New Orleans Database Design Workshop procedure. This procedure develops a global database (global schema) for an organization from small partial databases (user views). Individual user views are representations of the data relevant to the users' organizational tasks. Views will overlap since users will share data to some extent.
View integration has to merge views without duplicating the information presented in multiple views. The task of merging views without duplication is complicated by the fact that users have different perceptions of the world, which lead them to represent the same data differently, the simplest form of differing perceptions being the occurrence of synonyms. Within the last 13 years a variety of approaches to solve the integration task has been reported. Many of the approaches have neglected the problem of conflicting views altogether, leaving its solution to the database designer. Integration methods that performed conflict resolution did so in an unsystematic and incomplete fashion. Often these methods dealt with conflict situations only if information for their resolution was conveniently available.

This research fills that gap. A conflict analysis procedure is outlined which considers all possible conflict conditions and transforms them into conditions that can be merged by means of previously developed techniques. The research proceeds in two steps. First, a conflict analysis procedure is developed assuming complete information. This simplification allows concentration on the completeness of the procedure, since one does not have to be concerned with the difficulties involved in gathering the required information. The second step relaxes the assumption of complete information: difficult information requirements are identified and replaced by more easily satisfied ones.

Main contributions to knowledge are (1) a complete understanding of the factors causing conflicts between views, and (2) detection of substitutes for difficult information requirements. Other contributions are (3) suggestions for the development of a semantic data dictionary, (4) an alternative method for the design of knowledge based systems, and (5) suggestions for efficient bottom up systems design strategies.

TABLE OF CONTENTS

ABSTRACT
TABLE OF CONTENTS
LIST OF FIGURES
ACKNOWLEDGMENT
1. OVERVIEW
2. VIEW INTEGRATION
2.1. Database Design Philosophies - Top Down vs. Bottom Up
2.2. Database Design based on the New Orleans Database Design Workshop Procedure
2.2.1. Syntactic Approaches
2.2.1.1. Bernstein's Relation Synthesis
2.2.1.2. Martin's Canonical Synthesis
2.2.1.3. Casanova's and Vidal's Method
2.2.1.4. Functional Data Model Based Integration
2.2.2. Semantic Approaches - View Integration Based on the E-R Model
2.2.2.1. Navathe's and Elmasri's Approach
2.2.2.2. Batini's Approach
2.3. View Integration Cases
2.4. Conclusion
3. SYSTEM FOR VIEW INTEGRATION
3.1. Research Question and Contribution to Knowledge
3.2. Approach to the Problem
3.2.1. Overview
3.2.2. Outline of the Problem with Available Information
3.2.3. Changes in the Integration Method when Necessary Information is not Directly Available
3.2.4. View Integration Conflict Cases
3.3. Expert System Methodology
4. RESULTS
4.1. Rules Guiding View Integration
4.2. Diagnosis Procedure
4.3. Conflict Therapy
4.4. The Impact of Heuristics
4.5. Generalization Hierarchy for Database Objects
4.6. Assessment of the Method
5. IMPLEMENTATION - THE AVIS PROGRAM
5.1. Overview
5.2. Function and Structure of the AVIS Program
5.3. Knowledge Representation
5.3.1. Representation of Views
5.3.2. Representation of View Integration Knowledge
5.4. The Impact of Domain Knowledge
6. SUMMARY AND EXTENSIONS
7. REFERENCES
APPENDIX
Appendix 1: Conflict Cases
Appendix 2: Conflict Solutions
Appendix 3: View Integration Session with AVIS

LIST OF FIGURES

Figure  Title
1  Object Comparison Matrix
2  Case Transformations during View Integration
3  Ordering of View Integration Steps
4  Conflict Recognition Procedure
5  Decision Table Illustration (abbreviated)
6  Test for Object Identity, Procedure without Heuristics
7  Test for Identity with Heuristic
8  Test for Relatedness of Objects
9  Relationship becomes an Entity
10  Relationship Attribute Becomes an Entity
11  Entity Attribute Becomes an Entity Relationship Construct
12  Association of an Entity to a Relationship
13  Relationship Relocation
14  Representation of Containment
15  Representation of Common Role
16  Representation of Common Superset without Common Subset
17  Representation of Common Superset and Common Subset
18  Sources of Evidence for Meaning Identity
19  Construct Mismatches Shown as Graph Contraction
20  Identical Meaning Query in Prolog Graph Notation
21  AVIS Program Structure
22  Representation of Views in AVIS
23  AVIS Hypotheses
24  AVIS "make agenda" Rule
25  Filtering Rule in AVIS
26  AVIS Object Assertion Rule
27  AVIS Meaning Identity Indicators
28  View Integration Sample Problem

ACKNOWLEDGMENT

I thank my supervisor, Professor Robert C. Goldstein, for his guidance as well as for his ongoing encouragement. My thanks go to Professor Yair Wand for his often very critical and always very stimulating comments. To Professor Wolfgang Bibel I am grateful for providing many new perspectives on the nature of this research. I also wish to acknowledge the funding given for this research by the World University Service of Canada, the Deutscher Akademischer Austauschdienst, and the University of British Columbia. Finally, I thank my parents, Helmuth and Iris Wagner, for their love and support.

1. OVERVIEW

The database design task, converting users' casual data descriptions into a logical database design, is time consuming and error prone, and requires substantial expertise. This argument is still valid, even though the separation of logical and physical design considerations has simplified the designer's effort (Curtice and Jones, 1982). Consequently, there exists strong interest in the development of techniques to improve the database design process, particularly the hardware independent logical database design process.

One approach that has been taken is the further decomposition of the design process. Frequently, database designers begin with a graphical representation of the database to be built, e.g. an entity-relationship model or Brown diagrams (Brown, 1982), before they design the actual database relations or record and set types. As DeMarco (1979) mentions in the context of structured analysis, graphical representations are a tool that provides a concise representation, allows easy consistency checking, and is very maintainable. Another form of design decomposition focuses on the development of individual user views for small task domains and the subsequent integration of user views into a complete schema. The rationale for this approach is simplification due to a narrower focus, as well as improved validity of the views. If every user describes only the data of her task domain, i.e. the data she is most familiar with, the resulting representation promises to be more correct than one produced by a person only remotely familiar with the domain. However, since each view describes data structures as perceived by the individual user, differences in user perceptions (conflicts between user views) are to be expected. These conflicts have to be settled before views can be aggregated to form a global database structure.

The purpose of this research is the formalization and solution of the conflict resolution problem. Even though a variety of view integration methods are presently available, existing integration methods are incomplete, frequently neglecting the conflict resolution problem (Batini et al., 1986, p. 348). Conflicts arise when different users model the same real world concepts differently, or different real world objects identically. This research bridges the gap by developing a conflict classification and resolution scheme and, based on this scheme, a computer program that integrates user views, grounded in rules and heuristics of database design.

2. VIEW INTEGRATION

2.1. Database Design Philosophies - Top Down vs. Bottom Up

Independent of any particular database design approach, there is the question of whether database design, like any other form of systems design, should proceed top down or bottom up. Bottom up and top down represent the extreme points in a spectrum of design alternatives.

In general, top down design has the advantage over bottom up design that it is oriented towards overall goals and that it allows stepwise refinement of those general goals. Bottom up design requires integration of the elements of the overall system and will almost certainly result in conflicts between the elements and in the necessity to redefine system elements. Despite this disadvantage, bottom up approaches are frequently used (Martin, 1984, McFadden and Hoffer, 1988).
Their major advantage is that they do not demand the existence of an overall design before the design of particular elements can take place. Thus, no overall understanding of the system is required, or at least not to the extent necessary for the top down approach. In addition, bottom up design facilitates the use of existing information from previous designs and thus is a better approach for incremental development.

Given that both approaches have advantages and disadvantages, designers will typically apply both: a top down focus for the initial design, to partition the system into manageable subsystems which are conflict-free, and thereafter a bottom up approach in the detailed design of these subsystems, accepting the necessity for conflict resolution in exchange for ease of design.

The major database design techniques described in this thesis, those using view integration, will appear to be bottom up approaches, since the integration process is based on individual user views. However, the procedure laid out at the New Orleans Database Design Workshop (New Orleans, 1979), which presents a framework for view integration approaches, recommends a database design procedure that introduces organizational goals and high level information requirements by means of Enterprise Modelling in the step preceding view integration. In other words, this widely accepted design strategy also applies a mixed top down and bottom up approach.

2.2. Database Design based on the New Orleans Database Design Workshop Procedure

In this section the focus will be on the common elements of all view integration procedures as well as on their differentiating characteristics. In short, all integration approaches can be perceived as procedures for view aggregation and schema optimization. One feature of all (comprehensive) approaches will be the ability to resolve differences between views. To permit this, the methods' data models will have to be able to represent objects and object associations. Dissimilarities among view integration procedures will arise primarily from differences in procedure, differences in the ability to deal with conflicting information, variations in information requirements, and the restrictions placed on the initial schema.

View integration is an element of any bottom up database design strategy. This strategy, whose initial input is user requirements and whose final outcome is the implemented (physical) database, has been segmented by various authors (New Orleans, 1979, Teorey and Fry, 1982) into the following steps:
1. Requirements Analysis, to obtain information from users on information and processing requirements, and to analyze this information in order to resolve conflicts and inconsistencies with the enterprise (global) view. The analysis and incorporation of business constraints adds a top down focus to this otherwise bottom up oriented technique.
2. View Modelling and Modification, to generate application views and information access requirements.
3. View Integration, to merge individual views into a global schema.
4. Implementation Design, to handle issues of integrity, consistency, security and recovery, and efficiency.
5. Physical Design, to ensure functioning and efficiency of the database with a particular database/file system.

In other words, view integration takes as its inputs individual user views (and possibly processing/query requirements) and produces as its output a global database schema.

The most trivial form of view integration is an aggregation of all individual views without alteration of any of them. However, instead of generating a system of interconnected database objects, this method merely creates a lump of individual views. View integration has to go beyond aggregation: it has to include the reorganization (optimization) of the global schema. The task is to eliminate redundancies and inconsistencies that result from combining overlapping views of users who may all have different conceptual models.

Reorganization of the global schema is intended to increase its descriptive adequacy.[1] In addition, it may include the consideration of query requirements, which has been a concern in some earlier studies, especially in non-relational database environments (for example Batini et al. (1984a) or Yao et al. (1982, 1985)).
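Returning to the contrast between mere aggregation and integration drawn above, a toy example may help. The sketch below assumes a hypothetical view format (a dictionary from entity names to attribute sets) and a designer-supplied synonym table; neither belongs to any of the methods discussed in this chapter, and the helper names are illustrative only. Plain aggregation keeps both views side by side, merging on identical names removes only exact duplicates, and the overlapping customer entities collapse only once the synonym is known.

```python
from __future__ import annotations

# Hypothetical view format for this sketch: entity name -> set of attribute names.
View = dict[str, frozenset[str]]


def aggregate(views: list[View]) -> list[tuple[str, frozenset[str]]]:
    """Trivial 'integration': keep every entity of every view unchanged."""
    return [(name, attrs) for view in views for name, attrs in view.items()]


def merge_by_name(views: list[View], synonyms: dict[str, str] | None = None) -> View:
    """Merge entities whose canonical names coincide, uniting their attributes.

    `synonyms` maps an alternative name to its canonical name. It stands in for
    knowledge the designer must supply; it cannot be read off the names themselves.
    """
    synonyms = synonyms or {}

    def canonical(name: str) -> str:
        return synonyms.get(name, name)

    merged: dict[str, set[str]] = {}
    for view in views:
        for name, attrs in view.items():
            merged.setdefault(canonical(name), set()).update(canonical(a) for a in attrs)
    return {name: frozenset(attrs) for name, attrs in merged.items()}


# Two overlapping user views that describe customers under different names.
sales_view: View = {
    "Customer": frozenset({"CustNo", "Name"}),
    "Order": frozenset({"OrderNo", "CustNo"}),
}
billing_view: View = {
    "Client": frozenset({"CustNo", "Address"}),
    "Invoice": frozenset({"InvNo", "OrderNo"}),
}

lump = aggregate([sales_view, billing_view])         # 4 entities, customer held twice
by_name = merge_by_name([sales_view, billing_view])  # still 4: the synonym goes undetected
integrated = merge_by_name([sales_view, billing_view], {"Client": "Customer"})  # 3 entities
```

Name-based merging is exactly where naming conflicts do their damage: without the synonym entry the two customer entities stay apart, and two unrelated entities that happened to share a name (homonyms) would be wrongly fused.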
For network or hierarchical databases, consideration of processing requirements might result in a trade-off that introduces duplication of database objects to improve processing efficiency.

Even though a variety of researchers choose the same approach to database design, namely view integration, differences exist in the data modelling language used to carry out the integration process. Tightly connected to the data model is the "integration philosophy", alternatives of which have been pointed out by Yao et al. (1982) as (1) view integration based on item level frequency information, (2) synthesis using functional dependencies among items, and (3) merging of object level structures.

[1] Descriptive adequacy is understood as the precision with which the data model describes the world it attempts to model.

The first category is a form of "statistical" view integration, in which frequency information serves as a substitute for cohesion or functional dependency of data items (Dyba, 1977, Sheppard, 1977).

The second category builds database objects, i.e. relational data structures, based on information on functional dependencies. Proponents of this category can be found, for instance, in Bernstein (1976), Raver and Hubbard (1977), Yao et al. (1982), Casanova and Vidal (1983), and Biskup and Convent (1986, 1985). Most of these approaches attempt to build databases purely based on functional dependencies (and possibly other forms of dependencies) and try to avoid considering the meaning of data objects as much as possible during integration. Later, these approaches will be referred to as syntactic approaches.

The third group of approaches is probably best represented by Batini et al. (for instance Batini and Lenzerini, 1984) and Navathe et al. (for instance Navathe and Elmasri, 1986). Both techniques are based on the E-R model, enhanced by some additional information (generalization/specialization). The fact that these techniques operate on an object level does not imply that functional relationships are not relevant for them. However, in E-R models, dependencies are represented in the association of attributes to entities or relationships.

Since the late seventies, the literature has moved away from statistical approaches to view integration. The main problem of statistical approaches is that they attempt to capture dependency information between data items by means of the relative frequency of common use in applications or of coexistence in the same file structure. This substitute may often be correct, since experienced file designers will have a good understanding of which data items should belong together (see for instance Weber, 1986 on "intuitive" normalization), but the technique is inferior to ones that concentrate on the actual dependencies. Thus, within this research, the focus will be on the latter two groups of integration methods only. For these two groups, prototypical integration methods (together with their data models) are presented in the following list.

SYNTACTIC (attribute-level) INTEGRATION

Based on Functional Dependencies only
* Martin (1983) - "Bubble Charting"
* Bernstein (1976) - Relational Model
* Yao et al. (1982) - Functional Data Model
* Raver and Hubbard (1977) - "Bubble Charting"
* Al-Fedaghi and Scheuermann (1981) - Relational Model

Based on FDs and other Dependencies
* Casanova and Vidal (1983) - Relational Model
* Biskup and Convent (1986) - Relational Model

SEMANTIC (object-level) INTEGRATION
* Batini et al. (1983) - Entity-Relationship Model
* Navathe et al. (1986)[2] - Entity-Category-Relationship Model
* Mannino and Effelsberg (1984) - Generalization Assertions
* Teorey and Fry (1982) - Semantic Hierarchical Data Model

Not all of these techniques shall be discussed in detail, since there exists considerable overlap among them. The following techniques will be discussed: Martin, Bernstein, Yao et al., Casanova and Vidal, Navathe et al., and Batini et al. Martin contributes a not particularly detailed, yet popular integration method. Bernstein presents the first algorithmic and purely syntactic view synthesis method. Casanova and Vidal introduce the first syntactic integration method that includes a richer set of dependencies. Navathe et al. put forward a semantic integration method with a large set of integration cases. Finally, Batini et al. present the (semantic) integration method that best deals with conflicting views.

[2] The method put forward by Navathe and others has gone through various stages and has involved various researchers. An earlier method is described by Navathe and Gadgil (1978) or Navathe and Schkolnick (1978). Other versions include Navathe, Elmasri, and Larson (1986). The method referenced here is the latest published form. It has been extended into database integration by Elmasri et al. (1986).

2.2.1. Syntactic Approaches

Syntactic approaches are design methods in which the view integration procedure does not rely on a designer's understanding of the data during the integration process[3] (nor on "understanding" by the algorithm). Instead, the algorithms reorganize the initial schema in a purely structural manner, independent of the meaning of the objects or attributes involved, once certain information requirements are satisfied. These information requirements are assumed to be satisfied at the outset of the integration procedure; they are not part of the technique.

The syntactic approaches introduced below give a complete algorithm for view integration and show the "optimality" (Casanova and Vidal's terminology) of the resulting design. Optimality is not a particularly well chosen term, since the design is not optimal in all criteria a database designer might think of. "Optimal" is meant as "achieving the goals set for the design at the start of the integration process", which in particular means the generation of a valid database, i.e. one that satisfies all previously established integrity constraints and is free of undesirable data dependencies. From now on we will call the resulting designs "feasible" rather than "optimal".

[3] Ideally the techniques do not rely at all on the designer's understanding. However, at least one method (Biskup and Convent) consults the designer when the integration algorithm is in a deadlock. Other methods (e.g., Yao et al.) require designer understanding for more complex integration cases, such as the removal of redundant functions.

Three main proponents of different syntactic approaches are Bernstein (1976), Casanova and Vidal (1983), and Biskup and Convent (1986). Two additional syntactic approaches shall also be mentioned in this context, although they differ from the above three in not being as purely synthetic, in not providing a complete algorithm, and in using other data models ("bubble charts" (Martin) and the Functional Data Model (Yao et al.)). All approaches other than Biskup's and Convent's will be discussed. Biskup and Convent's technique is rather similar to Casanova's and Vidal's; hence, a separate discussion will not be necessary.

Bernstein's approach does not particularly address the view integration problem, but instead the problem of synthesizing a minimal number of 3NF relations from a set of functional dependencies. Nevertheless, the approach is applicable to view integration, since the algorithm does not mind whether the schema descriptions used for relation synthesis stem from one view or from many views. However, the procedure obviously has no means to unify conflicting perceptions of the same data. Contrary to more recent integration approaches such as Casanova's and Vidal's, Bernstein's method relies only on functional dependencies to carry out the relation synthesis procedure.

Martin's approach, Canonical Synthesis, attempts to develop a "canonical data representation".[4] This method, like Bernstein's, has no formal means for dealing with conflicts between views, not even for naming conflicts. In addition, it is much less detailed and much less algorithmic than Bernstein's.

Casanova's and Vidal's technique assumes the existence of user views and complete knowledge of the dependencies (integrity constraints) for the collection of user views. It can be summarized by the following integration plan. Given a set of user views and a set of integrity constraints, define as a valid ("proper") database scheme (= global schema) one that satisfies all desirable integrity constraints. Then apply an algorithm that reorganizes the collection of user views into a valid schema by removing all undesirable data dependencies through changes in relation schemes.

Yao et al. require for their approach complete information on entities ("entity nodes"), on functional relationships between entity nodes, and on assertions describing true facts about the data model which are not represented in the form of entity nodes or relationships. All views are combined in one representation, which is thereafter subject to the removal of redundant functions and redundant nodes. A proof of correctness of the integration result is not given for this approach.

[4] The notion of a canonical representation in data models has been put forward by Raver and Hubbard (1977) and is used to describe schemata which are redundancy-free (no nonessential associations), complete, and correct. Thus a canonical synthesis technique not only integrates user views, but can also extend them to add necessary further details.

One major limitation of the syntactic strategies, especially of Casanova's and Vidal's, is their extensive information requirements. They assume the availability of information on at least functional dependencies, if not also on union functional dependencies, inclusion dependencies, and exclusion dependencies. It has to be questioned whether it is feasible to generate this information during the view integration process, and how reliable the information will be. With respect to the amount of information, one has to keep in mind that not only intra-view but also inter-view constraints have to be defined. This requirement can increase the number of constraints substantially; it also demands from the designer the comparison of each relation scheme from each view against all other relation schemes, to detect those dependencies. Any incorrect assessment by the designer will potentially result in an incorrect global schema.

A second limitation of these approaches is the restrictions they place on the initial views to make the integration a computationally solvable problem (i.e. only functional dependencies on the key for the initial collection of views).

A third limitation is caused by the purely syntactic treatment of data dependencies. The procedures cannot differentiate between dependencies that are of the same type and involve the same attributes, even if their meanings are different. For example, the functional dependency Employee# -> Department# might in fact represent two different relationships: first, every employee works for one particular department, and second, every employee is located in one particular department. Thus, for example, while employee 6750 works for the information systems department, he resides in the offices of the accounting department.
This difference in roles (here, roles of department) has to be incorporated into the attribute names to allow the syntactic approaches to differentiate between the two relationships, i.e., there has to exist a Located_in_Dept and an Employed_by_Dept.

2.2.1.1. Bernstein's Relation Synthesis

This method is described in Bernstein (1976). An implementation of Bernstein's algorithm can be found in Ceri and Gottlob (1986).

The goal of Bernstein's method is the creation of a schema containing the smallest number of 3NF relations for a given set of functional dependencies. Since the procedure does not concern itself with the origin of the functional dependencies, it does not object to the set of dependencies being taken from more than one schema. Therefore the method can be considered a view integration procedure. The method not only provides a synthesis algorithm, but also demonstrates that the set of resulting relations is minimal and provably in 3NF. The creation of 3NF relations is typically the goal and final outcome of a decomposition process, in which larger tables are split into smaller redundancy-free components (for example, Ullman, 1980 or Date, 1981). Bernstein, in contrast, generates 3NF relations by means of composition. This makes Bernstein's approach a view integration technique. The goal of Bernstein's integration procedure is to find the smallest set of 3NF relations that incorporates all pre-defined functional dependencies.

The algorithm developed by Bernstein consists of three main parts. The first part (involving steps 1 and 2 of "Algorithm 2", see below) has the purpose of generating a new set of functional dependencies (FDs) from an arbitrary set of functional dependencies characterizing the data relationships. These dependencies form the input to the synthesis part. Synthesis (steps 3 and 4 in Algorithm 2) first partitions the set of FDs into groups with identical left sides¹ and then merges the FDs in these groups. The last part of the procedure (steps 5 and 6 in Algorithm 2) constructs relations which are free of transitive dependencies, based on the FDs synthesized in the previous steps.

¹ "Left side" means the set of determining attributes. In contrast, the "right side" consists of the determined attributes.

Algorithm 2:

(1) Elimination of extraneous attributes to produce a new set F' of functional dependencies.
(2) Finding of a nonredundant covering C for the set F' of functional dependencies.
(3) Partitioning of the covering C into groups of functional dependencies with identical left sides.
(4) Merging of equivalent keys.
(5) Elimination of transitive dependencies.
(6) Construction of relations.
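The steps of Algorithm 2 can be illustrated with a small Python sketch. It is a simplified illustration written for this discussion, not Bernstein's (1976) published algorithm or the Ceri and Gottlob (1986) implementation: functional dependencies are assumed to be pairs of attribute-name sets, right sides are split into single attributes before step 1, all function names are hypothetical, and step 5 (elimination of transitive dependencies) is omitted, so the output may retain dependencies that the full algorithm would remove.

```python
def closure(attrs, fds):
    """Return the closure of the attribute set attrs under the FDs fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def split_rhs(fds):
    """Rewrite every FD X -> {A, B, ...} as single-attribute FDs X -> A, X -> B, ..."""
    return [(frozenset(lhs), frozenset({a})) for lhs, rhs in fds for a in rhs]


def remove_extraneous(fds):
    """Step 1: drop left-side attributes that are not needed to derive the right side."""
    reduced = []
    for lhs, rhs in fds:
        lhs = set(lhs)
        for a in sorted(lhs):
            if len(lhs) > 1 and rhs <= closure(lhs - {a}, fds):
                lhs.discard(a)
        reduced.append((frozenset(lhs), rhs))
    return reduced


def nonredundant_cover(fds):
    """Step 2: remove every FD that is implied by the remaining ones."""
    cover = list(dict.fromkeys(fds))  # drop exact duplicates, keep order
    for fd in list(cover):
        rest = [g for g in cover if g != fd]
        if fd[1] <= closure(fd[0], rest):
            cover = rest
    return cover


def synthesize(fds):
    """Simplified synthesis sketch; step 5 (transitive dependencies) is omitted."""
    cover = nonredundant_cover(remove_extraneous(split_rhs(fds)))  # steps 1 and 2
    groups = {}                                                     # step 3
    for lhs, rhs in cover:
        groups.setdefault(lhs, set()).update(rhs)
    merged = []                                                     # step 4
    for lhs, rhs in groups.items():
        for keys, attrs in merged:
            key = keys[0]
            if key <= closure(lhs, cover) and lhs <= closure(key, cover):
                keys.append(lhs)          # equivalent keys share one relation
                attrs |= lhs | rhs
                break
        else:
            merged.append(([lhs], set(lhs) | rhs))
    return [(keys, frozenset(attrs)) for keys, attrs in merged]      # step 6


# Usage: Emp# -> Mgr follows from Emp# -> Dept# and Dept# -> Mgr, so step 2 drops it.
fds = [({"Emp#"}, {"Name", "Dept#"}), ({"Dept#"}, {"Mgr"}), ({"Emp#"}, {"Mgr"})]
for keys, attrs in synthesize(fds):
    print(sorted(attrs))
# ['Dept#', 'Emp#', 'Name']
# ['Dept#', 'Mgr']
```

Because the sketch works purely on attribute names, it inherits exactly the uniqueness assumption discussed below: two FDs with the same attribute sets but different meanings collapse into one.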
Bernstein's approach does not differentiate among different cases of integration based on different dependencies within the data at hand. All functional dependencies are treated by the same integration procedure. This is a positive feature of Bernstein's approach, since it simplifies the procedure. In addition, this approach has fewer information requirements than the two following ones, which also require information on other forms of dependencies.

One major problem of the technique, pointed out by Bernstein himself, is the purely syntactic character of the approach, which is the source of the "uniqueness assumption". The uniqueness assumption says that only one functional dependency can exist between any two identical sets of attributes. In other words, if two FDs existed because of a difference in roles of either set of attributes, the technique would not be able to pick up the difference. In order to allow the technique to differentiate among different roles, role names have to be introduced as attribute names.

This point leads to another shortcoming of the technique, namely the significance of names. The purely syntactic technique operates on attribute names and is therefore subject to all problems caused by attribute name synonymy and homonymy. However, the technique was not conceived as a view integration technique, which justifies this weakness to some extent. Furthermore, Bernstein's approach does not rule out the development of a pre-integration procedure which could take care of such conflicts and then supply the integration procedure with conflict-free views.

2.2.1.2. Martin's Canonical Synthesis

See for example Martin (1983).

Canonical Synthesis is Martin's approach to view integration. Martin integrates views by first depicting all functional dependencies between data elements (attributes) and then overlaying any two views to generate a third, new one. The main focus of his approach is on the elimination of transitive dependencies generated by the integration process. Martin stresses the use of bubble charts, showing data items (attributes) and their functional dependencies. The procedure integrates views pairwise and consists of seven integration steps for the logical database design.

1. The designer is asked to eliminate any duplicate functional dependencies between any two data items.

2. The designer has to identify candidate keys.

3. All transitive dependencies have to be removed. The purpose of this step is to find any hidden primary keys and, by removing the transitive dependencies, finally to achieve a 3NF data structure.

4. Introduce so-called "concatenated keys". The purpose of this step is to extend the data model representation to allow data items that are dependent on more than one data item, i.e. on the key of an existing data structure and on further data items. For example, Price is dependent on Supplier# as well as on Part#.

5. Allocate intersection data. This step deals with data relationships that have attributes. If relationships have attributes, they are transformed into record structures.¹

6. Remove M:N relationships.²

7. The technique transforms structures in which one attribute is owned by two or more primary keys. If such an "intersecting" attribute exists, the data structure is changed to give the attribute a single owner.

¹ These record structures are similar to entities. Yet Martin does not use the terms entity or relationship to describe data constructs.

² Martin suggests that M:N relationships in databases, aside from being supported by only few DBMSs, are an unstable data construct, one that is typically replaced by two 1:M structures as part of the design or implementation process. His technique therefore decomposes any M:N structure into 1:M structures.

Martin's method has three major limitations. First, the method is not concerned with the removal of conflicts between views, and second, it uses attributes as the atomic building blocks of the global schema. Third, the "algorithm" presented is not precise and thus does not, contrary to Martin's statement, allow immediate automation of the process.

Conflict resolution is mentioned only briefly (Martin, 1983, p. 265), referring to the problem of homonyms. All other view conflict possibilities are ignored. For example, Martin is not concerned about relationships or entities modelled incorrectly as attributes. A consequence of neglecting conflict resolution is that Martin's approach cannot be automated, given that conflicts have to be expected in real world applications. Martin has to assume that all conflicts were eliminated by the database designer prior to the integration process. Thus, like Bernstein's method, this one is a view merging procedure, but not a conflict resolution procedure.

The use of attributes as the atomic building blocks generates at least two problems. First, the modeling process based on attributes operates at a very high level of detail. In fact, it might be viewed strictly as a bottom-up approach to database design. The resulting large amount of detail information in the view descriptions has to be processed by the designer. Even with a small number of views, an evaluation of the resulting schema becomes very complex and very difficult in terms of redundancies (transitive dependencies). The entity-relationship approach, in comparison, allows part of this information to be hidden, namely the associations between an entity and its attributes. In the E-R model, only entities or relationships are able to form relationships to other entities or relationships. In Martin's model, every attribute can be related to any other attribute prior to redundancy elimination. Secondly, the synthesis of attributes into higher level objects is not based on the user's semantic objects (objects meaningful to the user), but instead on functional dependency. The resulting higher level data structures (records, segments, or relations) are therefore expected to have less meaning for the user than data structures based on objects the user chooses to describe his data world (e.g. entity MANAGER). In other words, the results of canonical synthesis may lose some of their adequacy in describing real world objects and associations.

This comment is not meant to imply that database design based on functional dependencies is wrong. Yet, the aggregates should represent the real world view as faithfully as possible. Since there exists more than one possible way to describe a real world object in the data model, canonical synthesis might not allow a representation of this object in the form the user would prefer (i.e., semantic relativism, Brodie, 1984).

Finally, due to its lack of precision, this technique should only be viewed as a guideline to integration. It still will require substantial designer interaction and designer insight.

2.2.1.3. Casanova's and Vidal's Method

See Casanova and Vidal (1983) for a description of the method, as well as Bishop and Convent (1986, 1985) for extensions.

Casanova's view integration method is a formal approach to view integration based on four types of dependencies existent in the global database schema.
The goal of the integration process is the generation of an "optimised" (feasible) schema, optimised with respect to the elimination of redundant information and to reduction in size, as measured by the number of relations¹ in the global schema.

The four types of dependencies (also referred to as integrity constraints) in this approach are: functional dependencies (FDs), inclusion dependencies (INDs), exclusion dependencies (EXDs), and union functional dependencies (UFDs).

¹ In Casanova's language, which is based on Ullman (1980, p. 75), a "relation scheme" refers to the structure of a relational database object, while a relation is an instance of that structure, that is, the actual data. Ullman defines relation scheme as the list of attributes for a relation.

A functional dependency fd, expressed as $R: X \rightarrow Y$, is valid iff for any $t, u \in r$: if $t[X] = u[X]$ then $t[Y] = u[Y]$.² For example, in a relation scheme STUDENT[Stud#, Name], if t[Stud#] and u[Stud#] are identical student numbers, they both have to identify the exact same student name.

² R refers to a relation scheme, r is an instance of that relation scheme, X and Y are sets of one or more attributes, and t and u are tuples.

An inclusion dependency ind is expressed as $R_1[X] \subseteq R_2[Y]$, with X and Y being sequences of attributes of equal length. This dependency is valid iff $r_1[X] \subseteq r_2[Y]$. For example, UNDERGRAD[Stud#] ⊆ STUDENT[Stud#] means that the set of undergrad students is a subset of the set of all students.

An exclusion dependency exd is expressed as $R_1[X] \mid R_2[Y]$, X and Y again being sequences of attributes of the same length. This dependency is valid iff $r_1[X] \cap r_2[Y] = \emptyset$, i.e. the two projections are disjoint. For example, the set of graduate students and the set of undergrad students would be such disjoint sets of students.

A union functional dependency is a functional dependency stretching across the boundaries of a single relation. It is expressed in the form $\langle R_{i_1}: X_1 \rightarrow Y_1, \ldots, R_{i_m}: X_m \rightarrow Y_m \rangle$, as a set of functional dependencies over relation schemes $R_{i_j}$, where all $X_j$ and $Y_j$ are sequences of attributes of the same length. A UFD is valid iff the dependency holds not only within each relation but across all relations included in the UFD, i.e. over the union of their projections on $X_j Y_j$. For example, a UFD <STUDENT: Stud# -> Name, UNDERGRAD: Stud# -> Uname> means that a student number '83959818' occurring in STUDENT will identify the same student name 'Jones' as the student number '83959818' in UNDERGRAD.

The last example gives some indication of the purpose of the above dependencies: they will be used to identify and eliminate sources of redundancy. Given complete information on the above dependencies, a procedure is defined that will transform the combination of all views into an integrated global schema.

Complete information on dependencies necessitates complete information on all attributes in all relations of all views, plus complete information on the domains of attributes. Given this information, the problem of homonymy or synonymy does not arise, because the names of relations or attributes are almost irrelevant. All information is assumed to be unambiguous. In other words, there will be, for instance, no disputes between different views concerning the domains of attributes. Hence, conflicts are either captured by the above dependencies or ruled out by definition.

A view integration based on Casanova's and Vidal's method involves the following steps. First, for every view, define the dependencies described above. Second, combine the views by lumping them together and by defining additional constraints of the above types, to describe the relationships between elements (relations) of different views. Third, integrate ("optimize") this schema by removing the redundancies in the combination of views.

The first major problem of this integration method, as stated by the authors, is that it is computationally hard. The problem is PSPACE complete (it can be solved within a polynomial amount of memory, but no polynomial-time algorithm is known). Casanova points out that the optimization problem may not even be decidable if nothing but FDs and INDs are considered (see also Casanova and Fagin, 1982).

Another major problem concerns the information requirements of this technique. The approach requires large amounts of ambiguity-free information. Since it cannot deal with partially incorrect user views (wrong perceptions of data), it cannot be used to resolve conflicts caused by inconsistencies in user views.

A further limitation of Casanova's and Vidal's approach results from its applicability to only so-called "restricted" schemas. The following restrictions apply to the input of the view integration procedure.

(1) All functional dependencies apply only to the (single) key. Thus, there are no transitive dependencies.
(2) Any inclusion dependency applies only to the key attributes of the relations involved.
(3) Any union functional dependency must apply to the key attributes (as the left argument of the dependency) for all relations involved and can only describe a dependency of a single attribute on the key ("... if $\langle R_{i_1}: X_1 \rightarrow Y_1, \ldots, R_{i_m}: X_m \rightarrow Y_m \rangle$ is in $C$, then $X_1 = \cdots = X_m = K_{i_1} = \cdots = K_{i_m}$ and $|Y_j| = 1$, $j \in [1, m]$").
(4) Any attribute of any relation can appear in at most one union functional dependency ("... for any $R_i \in S$ and any attribute $A$ of $R_i$, $A$ occurs in at most one UFD in $C$"). Note that this restriction is violated in Casanova's example. The restriction may only refer to dependent attributes, not to key attributes.
(5) All exclusion dependencies apply to key attributes only.

Restriction (4) in particular seems like a significant limitation of the integration problem. Real world databases will have to serve as an indicator of how strong this limitation is.

2.2.1.4. Functional Data Model Based Integration

For references to the method, see Yao, Waddle and Housel (1985, 1982).

In contrast to many other syntactic integration methods, Yao et al. present a view integration approach based on Shipman's (1979) Functional Data Model. Within the Functional Data Model (FDM), data can be described in the form of two constructs: nodes (to represent entities and value sets) and functional relationships.
Nodes can be either simple nodes (value sets) or tuple nodes (cartesian product of n > 1 value sets). Functions, mappings from a domain into a range, can be functional (n:1), one-to-one, or identity (a 1:1 mapping into the identical value), and can be partial (lower degree 0) or total (lower degree 1). Assertions are added as a further means for describing data, to increase the descriptive power of the model. Assertions describe true facts about data, e.g. that one set of data is a subset of another.

Views are depicted in the form of nodes and relationship constructs (in a graphical representation). Therefore, complete information on entities and attributes, their domains and their relationships has to be available. Aside from this information, the approach also compiles information on the queries to be issued on the database. Database transactions, represented by means of a Transaction Specification Language (TASL), are kept together with the views and are updated whenever view updates require query modifications. One further piece of information is collected, namely physical data describing the quantities of members of a set, e.g. the number of students, professors, and courses in a university database. Quantity information is later used in heuristics to identify non-redundant functions in the model. The treatment of transaction and quantity information will not be the subject of the following discussion.

The technique incorporates two integration operations: the removal of redundant nodes, and the removal of redundant functions. According to Batini et al. (1986, p. 343), Yao's technique performs view integration on all views in parallel ("one-shot n-ary"). However, this is true for the integration of redundant functions only. Integration of nodes is performed on a single pair of nodes at any point in time.

A node is redundant if it represents the "same set of values" as some other node. Note that the "same set of values" (Yao et al., 1985, p. 338) does not mean the two sets are in fact identical. It is sufficient that one is a subset of the other or that they are overlapping. If two nodes represent the same set of values, they will be merged. The integration can only be performed if any existing functions between the two nodes are identity functions. Nodes A and B are merged by creating a new node C which is the union of A and B. All functions that had A or B as domain or range will be redefined to have C as domain or range.

In addition, if A and B are not identical, a separation node SEP will be created that stores information to differentiate between the two original nodes, given the new node C. If a separation node has to be created, a new functional dependency will also be created, with C as its domain and SEP as its range. A separation node can be viewed as a set of indices that indicates, by means of pointers to the new combined set C, the origin of each value in the new set.
The second integration operation removes redundant functions. The goal is to remove a functional relationship A -> C if it can be replaced by other functions, e.g. by the two functions A -> B and B -> C. The authors point out that the redundancy of a function can only be decided upon analysis of data semantics. In other words, the meaning of a functional relationship has to be known to decide on its redundancy. This is one of the criteria which differentiate Yao et al.'s technique from the previously discussed, completely syntactic approaches.

The method proposed by Yao et al. has a number of limitations. First, the method is incomplete. View integration is restricted to only three cases of node integration and one case of function integration. Hence, the technique will not be able to adequately represent all possible types of set relationships between view objects (for example, two nodes that do not overlap but have a common superset).

A second weakness concerns the integration procedure, which is not defined exactly. For example, does function removal always precede node removal? Does the procedure always perform node merges on single pairs of nodes, or on an arbitrary number of nodes at the same time?

Thirdly, the technique does not show the transformation from the FDM into database objects, i.e. relations, or more likely, network constructs.

Fourthly, the use of physical database information (i.e., logical record quantities) in logical database design is not particularly useful.
Finally, the method has no means for dealing with conflicting information, i.e. with naming conflicts or with type conflicts.

2.2.2. Semantic View Integration Approaches Based on E-R Model

Semantic approaches use data objects that are meaningful to the user. Since they require a higher level of understanding of the meaning of objects, these approaches are typically interactive, that is, they demand designer intervention during the integration process. Designer intervention is, for instance, necessary to settle certain naming or type conflicts and, even more important, to interpret the meaning of data objects or object relationships.

Since semantic integration approaches focus more on the meaning of the data objects than on purely structural information, the data models used to represent views have to be able to capture data semantics. In this section, integration techniques based on the Entity-Relationship (E-R) model will be introduced.¹ The E-R model itself is not particularly rich in its ability to represent data semantics. Therefore, the methods discussed below (both Navathe et al. and Batini et al.) use an extended E-R model which, for instance, provides the capability to model categories, which are generalizations of entities.

Interactive approaches take advantage of having access to the database designer during the integration process for conflict settlement or information clarification.
permit the  integration  of l e s s  restricted  In consequence, they data models and  to  perform a l a r g e r p o r t i o n o f the i n t e g r a t i o n p r o c e s s i . e . i n c l u d e conflict analysis. approaches  On the o t h e r hand, the r e p o r t e d i n t e r a c t i v e  t y p i c a l l y do not i n c l u d e a  1  complete  Not a l l i n t e g r a t i o n methods r e p r e s e n t i n g data semantics have t o be based on the E-R model. For example, Teory and Fry (1982) developed a method based on a semantic h i e r a r c h i c a l model. 35  algorithm specify  f o r the integration process  the r e s t r i c t i o n s  placed  and do n o t e x a c t l y  on t h e data model  (such as  consistency).  2.2.2.1.  be  Navathe's and E l m a s r i ' s  Approach  Description  a s p e c t s o f t h i s method can  of various  found i n Navathe, E l m a s r i ,  Elmasri  Larson  (IEEE 1986), Navathe and  (IEEE 1984), E l m a s r i and Navathe (1986), E l m a s r i e t a l .  (1987).  Navathe's and E l m a s r i ' s object  class  extended  approach c o n c e n t r a t e s  integration.  on t h e i d e a o f  The e n t i t y - r e l a t i o n s h i p model i s  t o an e n t i t y - c a t e g o r y - r e l a t i o n s h i p model where a  c a t e g o r y r e f e r s t o a c l a s s o r an o b j e c t  type  (common r o l e o r  subclass).  The atomic elements o f t h i s approach a r e e n t i t i e s ,  categories,  r e l a t i o n s h i p s , and a t t r i b u t e s .  Two t y p e s o f c a t e g o r i e s subclass represents sets,  categories.  a r e used, common r o l e c a t e g o r i e s and A common  a common p r o p e r t y  i . e . the category  role  category  i s one t h a t  o f two o r more otherwise d i f f e r e n t  OWNER r e p r e s e n t s  a common r o l e f o r  both PERSON and COMPANY, who may both be owners o f a v e h i c l e .  36  A s u b c l a s s i s a s p e c i a l i z a t i o n o f an e n t i t y s e t , i . e . 
the VEHICLE entity set has subclasses CAR and TRUCK. Common roles and specialization will have an impact on the inheritance of attributes.

The procedure consists of three steps: pre-integration, object integration, and relationship integration.

Within pre-integration three tasks are performed. First, naming correspondences are established, resolving the problem of inter-view homonymy and synonymy. Synonymy and homonymy refer to the problem of different names designating the same real world object, or of identical names designating different real world objects (concepts[2]). The second task is the identification of candidate keys for object classes. The third task is the definition of domains for object classes. Domains play an important role in Navathe's technique. The purpose of defining them within the pre-integration step is to gather information for the recognition of identical or related real world objects: if two objects have the same domain, it may be suspected that these objects are identical.

[2] Navathe uses the term "concept" to refer to a real world object, while Batini uses the term "concept" for a data model element such as an entity, attribute or relationship.

Integration of objects (entities or categories) is the second phase of Navathe's scheme. In this phase, information on domains is used to determine similarities or dissimilarities among view objects. Navathe analyses the following cases: identical domains, contained domains, overlapping domains, and disjoint domains.

[Figure: Integration of Objects]

The integration of relationships follows the object integration step. Navathe points out that for relationship integration both structural and semantic considerations are important. Relationships are classified according to three criteria: degree (which is not the mapping ratio, but the number of objects involved in the view construct), roles of object classes involved in the relationship, and structural constraints, such as mapping ratios.

The relationship integration process evaluates information in the following sequence of importance: domain information (same/different domain), degree information (same/different degree), role information (same/different roles), and structural vs. domain constraints, resulting in 8 integration cases.

The main points to be learnt from Navathe's approach are the strength of domain and category information for view integration, the possibility of simultaneous n-object integration (in some instances), and the relevance of particular pieces of information during the relationship integration phase.

2.2.2.2. Batini's Approach

For references see, for instance, Batini et al. (1984a, 1983) or Batini and Lenzerini (1983).

Batini's approach performs integration on the atomic elements of the entity-relationship model: entities, relationships, and attributes. View integration is presented as an iterative process which aggregates views pairwise. Whenever conflicts arise between the two views, a conflict resolution process is invoked and carried out interactively with a database designer.

The technique starts out with a name conflict analysis, identifying intra-view homonyms and synonyms and removing them. These can be naming conflicts for the same concepts (e.g. entity) or for different concepts (e.g. entity vs. relationship). This step is followed by a type conflict analysis, which results in the same real world object being represented by the same concept in different views (e.g. MARRIAGE always an entity) and in an adjustment of cardinalities (mapping ratios) and optionalities of attributes and relationships in different views to make them identical. Finally, merging superimposes the adjusted views, and redundancy analysis removes redundancies such as redundant cycles.[3]

[3] The technique also includes quantitative and procedural aspects to arrive at a procedurally more adequate schema where frequent database operations can be carried out more efficiently.

Batini's method builds a global schema iteratively, integrating two views into a temporary global schema and adding additional views to this schema until all views have been consolidated. The two main elements of the technique are Conflict Analysis (together with merging) and Redundancy Analysis, with the main focus on Conflict Analysis. Unlike other authors such as Martin, Batini et al. address the problem of inconsistencies between different users' perceptions of the world and different naming conventions systematically (but not completely).

The goal of Conflict Analysis is to detect and solve all existing conflicts between two representations (views) of the same classes of objects. Two types of conflicts are tackled: naming conflicts and type conflicts. Naming conflicts arise if the same data model concept (entity, attribute or relationship) is labelled differently (synonyms), or if different concepts are labelled with the same name (homonyms). Type conflict analysis determines whether objects have compatible concepts (types) and adjusts them if necessary.

To define homonymy and synonymy, Batini et al. refer to the view representation of real world objects. If a view S1 represents two different real world objects with the same concept (name), this is called an intra-view homonym.[4] Accordingly, synonymy refers to the same real world object being represented by two different concepts within one view. Given these view inconsistencies, Batini identifies a number of possible scenarios and solution alternatives. Interesting in Batini's procedure is the focus on intra-view inconsistencies only. Inter-view inconsistencies are, at least in this step, ignored.

A second step in the naming conflict analysis is the so called analysis of concept likeness or unlikeness.
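Batini's intra-view definitions can be made concrete with a small check. The sketch below is illustrative only and assumes a hypothetical representation that is not part of Batini's method: each view is a list of (concept name, real world object) pairs, with the real world object identifiers supplied by the designer. A homonym is then one name mapped to several objects, a synonym one object reached through several names.

```python
from collections import defaultdict

# Hypothetical representation (an assumption of this sketch, not Batini's
# data model): a view is a list of (concept name, real world object) pairs.
# Real views carry no explicit real-world identifiers; here the designer is
# assumed to have supplied them.
View = list[tuple[str, str]]


def intra_view_naming_conflicts(view: View) -> tuple[dict[str, set[str]], dict[str, set[str]]]:
    """Return (homonyms, synonyms) detected inside one view.

    homonyms: concept name -> the different real world objects it labels
    synonyms: real world object -> the different concept names used for it
    """
    objects_by_name = defaultdict(set)
    names_by_object = defaultdict(set)
    for name, real_object in view:
        objects_by_name[name].add(real_object)
        names_by_object[real_object].add(name)

    homonyms = {n: objs for n, objs in objects_by_name.items() if len(objs) > 1}
    synonyms = {o: names for o, names in names_by_object.items() if len(names) > 1}
    return homonyms, synonyms


if __name__ == "__main__":
    s1: View = [
        ("DATE", "order date"),
        ("DATE", "delivery date"),  # one name, two objects: homonym
        ("CLIENT", "customer"),
        ("CUSTOMER", "customer"),   # one object, two names: synonym
    ]
    homonyms, synonyms = intra_view_naming_conflicts(s1)
    print("homonyms:", homonyms)
    print("synonyms:", synonyms)
```

The hard part in practice is the second column of the input: deciding which real world object a name denotes is exactly the designer interaction that semantic approaches depend on.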
In this analysis of concept likeness or unlikeness, the attempt is to find out whether a concept that has the same name in two different views possesses different "neighbor properties" (concept unlikeness), or whether concepts have different names but some common neighbor properties (concept likeness).

[4] Usually one would expect inter-view homonymy to be the more important issue, i.e. two views supplying the same name to two different real world objects.

The next step in Batini's approach is the Type Conflict Analysis. Its purpose is to assign the same concepts to identical real world objects in different views. For instance, if MARRIAGE were a relationship in one view but an entity in the other one, at least one of these representations would have to change so that MARRIAGE is represented by only one concept. The conversion of concepts is restricted to atomic concepts only (entity, attribute, relationship) and results in two views using the same names and the same concepts to describe real world objects.

The second part of type conflict analysis is compatibility checking, a process which analyzes, among the now quite similar views, whether cardinalities (mapping ratios) are identical. Compatibility checking also discovers differences in the optionality of attributes and relationships. According to Batini et al., differences in cardinalities point to errors in one of the views, or alternatively to a containment relationship.

Once all conflicts have been resolved, Merging and Redundancy Analysis follow. In merging, the conflict-free views are superimposed. Redundancy analysis removes redundant alternate paths between objects. Redundancies can occur when multiple paths are semantically equivalent.

Batini's technique concludes with an update of the individual views to make them consistent with the newly generated global schema and with an alteration of the global schema to include procedural and quantitative aspects.

Batini's approach provides a procedure for the integration process together with some exact conflict resolution algorithms; yet, based on its description in the literature, it cannot be automated. The method does not clarify when a particular integration rule has to be applied, or which information has to be available (Navathe is more exact in this matter, basing his resolution scheme on information on class membership).

2.3. The View Integration Cases

The investigation of the above view integration techniques found considerable overlap among techniques with respect to their integration capabilities. Where techniques differ, they typically deviate in their conflict resolution capabilities and in aspects of the integration method related to their individual data models. The more recent techniques typically provide a richer set of cases for conflict resolution.
Consensus exists with respect to the integration cases for sets (of entities or relationships) whose connection to each other is known, as represented in the following eight cases.

Object Class Integration:
(1) Identical object classes
(2) Contained object class
(3) Overlapping object classes with a common superset
(4) Disjoint object classes with a common superset

Relationship Integration:
(5) Relationship identity
(6) Relationship containment
(7) Relationship overlap with a common superset relationship
(8) Disjoint relationships with a common superset relationship

The table below depicts which of the above cases are supported by the techniques presented in the chapter ('y' indicates the technique's ability to deal with the case; a blank indicates that no reference has been made to how this case would be solved).

Technique              Cases (1)-(8)
Martin                 cases do not apply
Bernstein              cases do not apply
Casanova and Vidal     y y y y
Yao et al.             y y y
Navathe et al.         y y y y y
Batini et al.          y y y

[Only the number of marks per row survives for cases 1 to 5, not their column positions. The columns for cases 6, 7 and 8 contain the marks "y y", "y y" and "y y y", whose row assignment is likewise lost.]

2.4. Conclusion

This section shall point out the comparative strengths and weaknesses of syntactic and semantic integration approaches.

Syntactic approaches

Restricted Data Models
Syntactic approaches place considerable restrictions on the data model with which views are represented. For example, Biskup's and Convent's model is restricted to proper database schemes only, which impose restrictions on the fields to which constraints can apply.
Typically, all dependencies have to involve the key or a key attribute. Bernstein refers in his technique to the uniqueness assumption, which dictates that only one functional dependency may exist between any pair of fields. He also points out that this restriction may lead to the necessity to bury semantics in data item names.[5]

[5] For instance, two fields Emp# and Dept# may be related by the functional dependency "employee is located in department" or by another dependency "employee is employed by department". Syntactic models require a renaming of at least one of the Dept# fields in this case.

No Conflict Analysis
The syntactic approaches operate under the assumption that the data required for integration is complete and correct. Therefore, conflict analysis is not part of the techniques. The techniques can deal with simple conflicts, for instance with synonymy, if identity is established by means of constraints.

No Ability to Deal with Incomplete or Inconsistent Data
Again, the ability to deal with incomplete or inconsistent data is outside the scope of syntactic integration techniques. At least one technique, Biskup's and Convent's, will, when an unresolvable problem is encountered, interact with the designer to resolve the problem in order to allow a continuation of the integration process.
However, this form of exception handling is not a planned form of conflict analysis, but a measure to let the technique continue when none of the integration cases is considered performable by the technique.

Extensive Information Requirements
The major information requirement of syntactic approaches is knowledge of dependencies between data items. Since all dependencies are defined on the attribute level, this information requirement exceeds that of semantic approaches, which represent dependencies on the entity level only. Furthermore, the requirement to also define inter-view constraints can lead to an exponential explosion of constraint definitions.

Computationally Hard
Casanova and Vidal, as well as Biskup and Convent, point out the computational requirements of their techniques.

Provide Integration Algorithm
One major advantage of syntactic approaches is the completeness of their procedures. The approaches, instead of outlining only particular integration cases, typically present a procedure that upon termination has produced an integrated database schema.

Show Optimality (Feasibility) of Design
Another major advantage of syntactic approaches is their ex-ante specification of design objectives and their proof of achievement of these design objectives.

Semantic approaches

Require Designer Interaction
Based on the fact that semantic approaches operate on objects meaningful to users but often not meaningful to the integration mechanism, these approaches require designer interaction for the interpretation of objects and for conflict analysis.

Cover Larger Portion of the Integration Process
In addition to the operations contained in syntactic approaches, semantic approaches also include conflict analysis procedures and pre-integration procedures (see Batini et al., 1986), which are concerned, among other factors, with data gathering.

State/Solve More Integration Cases
Semantic techniques identify and solve more integration cases, since they include not only the simple eight cases based on set inter-relationships as explained above, but also cases involving conflicts.

Allow Less Restricted Data Models (i.e., non-similar keys)
Semantic methods perform integration based on the meaning of objects, not (exclusively) based on structural similarities. Therefore, a semantic approach can possibly integrate two object classes in which one is a subset of the other, even when the object classes have different keys.

Less Complex
Semantic approaches simplify the integration process in two ways. First, the amount of detail is much less than that of syntactic approaches, since the focus is on entity-level items. Second, semantic data items are more meaningful to humans than arbitrary collections of fields held together only by dependencies.
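The four object class cases (1) to (4) of Section 2.3, and Navathe's four domain cases (identical, contained, overlapping, disjoint), rest on the same set comparison. A minimal sketch, assuming the extensions of both object classes (or both domains) are available as finite sets of real-world identifiers, which is precisely the information that is rarely available in practice. Because the comparison uses real-world identifiers rather than key values, it works even when the two classes have different keys. The `SUGGESTED_ACTION` table is a hypothetical reading of the case list, not a rule stated by any of the surveyed techniques.

```python
from collections.abc import Hashable, Set
from enum import Enum


class SetCase(Enum):
    IDENTICAL = "identical"
    CONTAINED = "contained"      # one extension is a proper subset of the other
    OVERLAPPING = "overlapping"  # common members, neither contains the other
    DISJOINT = "disjoint"        # no common members


# Hypothetical reading of the integration outcome for each case:
# merge, add a subclass link, or introduce a common superset.
SUGGESTED_ACTION = {
    SetCase.IDENTICAL: "merge into one object class",
    SetCase.CONTAINED: "keep both; the smaller class becomes a subclass",
    SetCase.OVERLAPPING: "keep both; introduce a common superset",
    SetCase.DISJOINT: "keep both; introduce a common superset",
}


def classify(a: Set[Hashable], b: Set[Hashable]) -> SetCase:
    """Compare two extensions (or two domains) and return the matching case."""
    if a == b:
        return SetCase.IDENTICAL
    if a <= b or b <= a:
        return SetCase.CONTAINED
    if a & b:
        return SetCase.OVERLAPPING
    return SetCase.DISJOINT


if __name__ == "__main__":
    # Extensions are given as real-world identifiers, so the two classes
    # may use different keys (e.g. employee number vs. manager code).
    employees = {"smith", "jones", "lee"}
    managers = {"jones"}
    case = classify(managers, employees)
    print(case, "->", SUGGESTED_ACTION[case])
```

The relationship cases (5) to (8) follow the same pattern if relationship extensions are compared instead of object class extensions.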
Deal with Database Objects Meaningful to Designers and Users
The outcome of the design process is also more profound for the database user, since the database objects are meaningful to database users. A syntactic integration, based purely on dependencies, may derive database objects that are not suggestive to the user. One of Batini et al.'s (1986) criteria for the goodness of a design is understandability.

Do not Provide Complete Procedures
One of the major weaknesses of the semantic approaches is the limited description of complete procedures for integration. Even though a variety of integration cases is outlined, the description of sequences of integration steps and possible re-iterations is, if not missing, at least very terse. In addition, when dealing with conflict analysis, semantic approaches are not complete in their analysis, nor do they show the missing elements of the analysis.

Do not Present Proof of Optimality of the Design
A consequence of the incompleteness of semantic integration procedures is their inability to demonstrate the optimality of the final design. No semantic procedure states a point at which the procedure terminates and has achieved a final design. Also, the objectives of semantic approaches involve the criterion of understandability, which cannot be measured as easily as, for instance, adherence to normal forms. Yet, even for criteria that can be shown more easily, semantic approaches typically do not provide any proof of optimality or feasibility.
Overall, conflict analysis and resolution is the common weak point in all integration techniques. Three causes of this deficiency are:

(1) syntactic techniques cannot deal with conflict analysis at all. They ignore conflicts in general.
(2) if conflict analysis is done, it is often done unsystematically. Batini et al. (1983) perform the most thorough analysis by separating naming conflicts from type conflicts and then analyzing them separately. This analysis is still not sufficient to identify, let alone solve, all possible causes of conflicts.
(3) conflict analysis is biased by information requirements considerations. Only those cases are considered for which information is easily available (e.g. mapping ratios), which are most prominent (e.g. synonyms), or which are of particular concern due to the data model chosen (e.g. semantic relativism, or mapping ratios). In contrast, a more systematic procedure should be aware of all possible conflict cases and then determine the information requirements to solve them. Thus, even if the technique is not able to resolve all conflicts due to lack of information, it is at least aware of the possible existence of a certain conflict, and thus of its own limitations!

Batini et al. (1986) summarize the lack of research in the area of conflict analysis as follows:

"... Simple renaming operations are used for solving naming conflicts by most methodologies. With regard to other types of conflicts, the methodologies do not spell out formally how the resolution process is carried out; however, an indication is given in several of them as to how one should proceed. ..." (p. 348)

And further:

"... It is interesting to note that among the methodologies surveyed, none provide an analysis or proof of the completeness of the schema transformation operations from the standpoint of being able to resolve any type of conflict that can arise. ..." (ibid.)

The solution to these problems will therefore form the core of this research project.

3. SYSTEM FOR VIEW INTEGRATION

3.1. Research Question and Contribution to Knowledge

Research question 1:

1.1 Can a view integration process be formalized which transforms any collection of conflicting views into a complete and consistent global schema?
1.2 Which conflict cases have to be solved in the process?

The purpose of this research question is to solve the conflict analysis problem, initially neglecting information requirements. Assuming sufficient information, a mechanism is to be developed that allows the detection and solution of all view conflicts. The view integration mechanism shall be able to convert a collection of views into a complete and consistent global schema, using the previously introduced group of 8 simple integration cases for set-subset relationships, as well as others to be defined later.

Based on the sufficient information assumption, conflict cases can be described and solved without concern for the difficulty of data gathering. Instead of mixing the conflict problem with the information requirements problem, question 1 deals only with the former.

The first step in answering this research question will be the identification and solution of a complete set of conflict cases. The second step will focus on the development of a procedure to carry out the integration, based on the set of cases.

Research question 2:

2.1 What information can be used for the integration of user views into a global database schema when the necessary information is not explicitly available?
2.2 How can this information be gathered in a process that limits designer interrogation to a feasible level?

The basis for the second question is the assumption that in all practical situations the necessary information about views is either unavailable, or too difficult or too costly to gather. Therefore, even though the answer to question 1 reveals which information is necessary to perform view integration, not all of this information can be expected to be present. Hence, substitutes have to be found for the missing information: substitutes that can either be known by the program (the program's knowledge base) or be easily gathered through a minimum of interaction with the database designer. The term "substitutes" may be better phrased as "operationalizations" of information on some database concept.
For example, given sufficient information, the system will know that two relationships have identical meaning, even if their names differ. A system with insufficient information has to rely on operationalizations of the "meaning" concept to assess the identity of such relationships. Domain identity and identity of neighbour entities may be such operationalizations.

The intention behind the second question is not to find "tricks" to solve the limited information problem, but to identify substitute information items that allow the assessment of concepts such as "meaning", which are difficult for a computer to grasp. The knowledge of these substitutes will also teach us about alternative information requirements of data modelling techniques.

Even though the availability of integration information is an important concern, the apparent lack of substitute information should not limit the comprehensiveness of the integration mechanism. Conflict analysis, at least in principle, should not be based on the convenience with which relevant information items can be produced. On the contrary, question 2 should ideally attempt to find information sources for all requirements raised in question 1. In other words, question 1 aims at stating and solving the integration problem in a sufficient information environment; question 2 aims at solving that integration problem in a limited information environment.
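The operationalization described above can be sketched as a surrogate test. This is a minimal illustration under stated assumptions: the `Relationship` record, its `neighbours` set and its `domain` label are hypothetical stand-ins for whatever domain and neighbour-entity information a real integration tool would hold. Names are ignored on purpose, so that synonymous relationships can still be matched, and a positive result is treated only as a suspicion to be confirmed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relationship:
    name: str
    neighbours: frozenset[str]  # entity classes the relationship connects
    domain: str                 # hypothetical label for the relationship's domain


def possibly_identical(r1: Relationship, r2: Relationship) -> bool:
    """Surrogate test for 'same meaning' when meaning itself is unknown.

    Names are ignored on purpose: two relationships may be synonyms.
    A True result is only a suspicion to be confirmed, e.g. by the designer.
    """
    same_neighbours = r1.neighbours == r2.neighbours
    same_domain = r1.domain == r2.domain
    return same_neighbours and same_domain


if __name__ == "__main__":
    emp_dept = frozenset({"EMPLOYEE", "DEPARTMENT"})
    works_for = Relationship("WORKS_FOR", emp_dept, "employment")
    employed_by = Relationship("EMPLOYED_BY", emp_dept, "employment")
    located_in = Relationship("LOCATED_IN", emp_dept, "location")
    print(possibly_identical(works_for, employed_by))  # True
    print(possibly_identical(works_for, located_in))   # False
```

The Emp#/Dept# example from Bernstein's uniqueness assumption shows where such a surrogate can fail: "located in" and "employed by" connect the same neighbour entities, so neighbour identity alone would wrongly suggest identity, and only the domain information separates them here.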
In order to decide on the best substitute information in the limited information environment, questions have to be raised on the suitability of certain pieces of information. The following list gives suggestions on which the selection should be based. The term "concept" refers to the information concept to be used as a substitute:

1. How well does the concept represent the underlying information that is necessary for database design?
2. When does the concept fail as a surrogate for the underlying information?
3. Can the user/database designer provide the information, or can it be gathered from some other source?
4. How easily can the information be gathered during the integration process?

The last point brings up the issue of developing a process for view integration which requires the least amount of interaction by using as much inferred information as possible. Given sufficient information, designer interaction is ideally not necessary.[6] Given limited information, designer interaction will be necessary. Therefore, the process developed to answer research question 1 may require redesign to increase its usefulness. For example, a useful design change would be a modification that enabled the technique to apply previously gathered information to later stages of the integration process. One has to keep in mind that a program will quickly lose its appeal as a productivity tool if it repeatedly asks the designer trivial questions.
Such redesign does not change the integration cases, but the sequence of the analysis, as will be demonstrated later in the context of heuristics. So, while the primary interest within this research is the discovery of an exhaustive set of conflict cases and resolution principles, the secondary interest is the development of an efficient integration procedure through the choice of surrogates for certain pieces of information and through the choice of a sequence that allows inferences to be made from the data already gathered.

¹ The integration mechanism which assumes information availability is implemented in the form of a programmed procedure that directs all questions concerning information requirements back to the designer (user of the mechanism).

Contribution to Knowledge:

A main result of the study is prescriptive knowledge, i.e., knowledge on how view integration should be carried out. The starting point for this knowledge is the set of integration cases identified by the consensus of previous integration approaches. This research develops a systematic framework which encompasses the available integration knowledge (see chapter 2) as well as a set of additional cases for conflicting views. The research also demonstrates the framework's completeness.

Another result of the study is a set of heuristics for efficient execution of the integration process with limited information. The assumptions underlying these heuristics will be clearly stated. For example, suppose the following heuristic is implemented: "IF object A is identical to object B and object A is an entity, THEN object B will have the same construct (i.e., both will be entities)." Heuristics are accompanied by explanations concerning their generalizability and the effects of their failure.

Prescriptive knowledge encompasses knowledge on integration laws and integration process rules, while descriptive knowledge encompasses process and information substitution rules. In the end, this research presents a set of information requirements and a set of integration rules which together are sufficient to perform the integration process, including conflict resolution, as well as an efficient integration process.

Another contribution to knowledge can be derived from this research. It is an extension of the relational data model regarding data semantics. It is well known that the relational data model in its current form is not well suited for capturing data semantics. One step towards capturing data semantics is the data dictionary, which keeps information on database items, either in computer- or human-interpretable form, i.e., on relations and data types, but not on the meaning of the data in the relation tuples. A large amount of the dictionary information can be generated virtually effort-free as part of the database design process. Thus, the outcome of the design process may be not only a set of relations, but also a data dictionary. The view integration approach suggests information that should be captured in data dictionaries but has not been captured yet. This information may include data concerning the meaning of database objects. Future database management systems could have facilities to interpret this data in order to support the users and the database system itself, for instance to improve the integrity of the database (fully integrated semantic dictionary), or at least to improve user understanding of database data. For example, the database could explain to the user that MANUFACTURER is a subclass of SUPPLIER which supplies parts and also manufactures these parts, or that SUPPLIER is a person or organization that is supplying parts in the present or has been supplying parts in the past.

3.2. Approach to the Problem

3.2.1. Overview

The problem solving approach chosen for this research is determined by the ill-structured nature of the view integration process and by the previous research in the area. Previous research has identified several conflict cases and their solutions without assuring us that the problem has been solved in its entirety. With the first research question, the attempt is made to develop a complete conflict resolution method. This task is simplified by the information availability assumption. To answer this research question, an analytical problem solving approach was chosen. This approach identifies all possible conflict cases for any pair¹ of objects from different views and shows that the list of conflict cases is complete. The list contains 17 general conflict cases with various subcases.
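A data dictionary that records the meaning of objects, and not only their types, could produce explanations like the MANUFACTURER/SUPPLIER example above. The following Python sketch assumes a hypothetical structure (`SemanticDictionary`, with a free-text meaning description and an optional superclass per object); it illustrates one possible representation and is not a structure specified by this research.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Entry:
    meaning: str               # free-text description captured during view integration
    isa: Optional[str] = None  # superclass, if a superset-subset relationship was found


@dataclass
class SemanticDictionary:
    """Hypothetical dictionary that stores meaning alongside each object name."""
    entries: Dict[str, Entry] = field(default_factory=dict)

    def add(self, name: str, meaning: str, isa: Optional[str] = None) -> None:
        # Refuse dangling superclass references to keep the dictionary consistent.
        if isa is not None and isa not in self.entries:
            raise KeyError(f"unknown superclass {isa}")
        self.entries[name] = Entry(meaning, isa)

    def explain(self, name: str) -> str:
        e = self.entries[name]
        if e.isa is None:
            return f"{name} is a {e.meaning}."
        return f"{name} is a subclass of {e.isa} which {e.meaning}."
```

Registering SUPPLIER first and then MANUFACTURER with `isa="SUPPLIER"` yields the two explanations given in the text.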
Completeness has to be shown for this list. The demonstration of completeness rests on the assumption that all criteria which differentiate any two views or parts thereof (e.g., different names for the same object type, different meaning of two object types) have been identified here. Once all criteria are known by which objects can be distinguished, all possible combinations of criteria can be easily generated. The latter part of the argument has to justify why some of the possible combinations are irrelevant or why they are similar to other, already identified ones.

¹ Pairwise integration has been the procedural choice for most previous integration methods (see Batini et al., 1986). Only recently have some researchers (e.g., Navathe) demonstrated parallel integration techniques for more than two views, applicable in certain conflict situations.

3.2.2. Outline of the Problem with Available Information

Even though some of the previous integration approaches have dealt with the conflict analysis (conflict recognition) problem in a systematic manner, their conflict classification schemes were not suitable to identify all possible combinations of object differences. Consequently, they have failed to identify some conflict cases. In this section, a categorization is presented which overcomes this weakness.

The cases discussed below represent an exhaustive list of possible conflicts between any two objects from different views. It will be argued that any possible conflict case is covered by the technique and that, after resolution of conflicts, views are in a form in which they can be merged. It will also be argued that there exists a "causal ordering" (compare Simon and Ando, 1963) of conflict resolution cases which determines the sequence of steps within the integration process. Hence, an integration procedure following this ordering will be outlined.

Object comparison

Object comparison focuses on the detection of identity or difference between two objects from different views. Objects may be of type entity, relationship, or attribute. For example, a designer arbitrarily picks one object from each of two views and wants to determine their identity or difference. To do this, he should choose four general criteria by which to evaluate objects:

(1) Name - are the objects' names identical?
(2) Construct - are both objects represented by the same construct?
(3) Meaning - do the objects have the same meaning?
(4) Context - are the objects associated with the same neighbor objects in both views?

The name criterion is a straightforward one and well described in the literature. In short, identical objects should have the same name, and different objects should have different names. Otherwise, the object pairs are synonyms or homonyms.

Construct refers to the object type, i.e., entity, relationship, or attribute.
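The four criteria can be recorded as one identical/different judgment per criterion, and the 2 x 2 x 2 x 2 = 16 combinations can be enumerated mechanically. In the Python sketch below, names, constructs and neighbor sets are compared directly, while meaning is represented by a hypothetical `meaning_id` that only the designer can assign, following the working definition of meaning as identity of the represented real world object. `ViewObject`, `compare` and `all_combinations` are illustrative names, not taken from the thesis.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, FrozenSet, List, Tuple

CRITERIA = ("name", "construct", "meaning", "context")


@dataclass(frozen=True)
class ViewObject:
    name: str
    construct: str          # "entity", "relationship" or "attribute"
    meaning_id: str         # designer-assigned: which real world object is represented
    context: FrozenSet[str] # meaning_ids of immediate neighbors (empty for entities)


def compare(o1: ViewObject, o2: ViewObject) -> Dict[str, bool]:
    """True means 'identical' on that criterion, False means 'different'."""
    return {
        "name": o1.name == o2.name,
        "construct": o1.construct == o2.construct,
        "meaning": o1.meaning_id == o2.meaning_id,
        "context": o1.context == o2.context,
    }


def all_combinations() -> List[Tuple[bool, ...]]:
    """Every identity/difference pattern over the four criteria (2**4 of them)."""
    return list(product((True, False), repeat=len(CRITERIA)))
```

Comparing CUSTOMER (entity) with BUYER (entity), both given the same `meaning_id`, yields a difference only in the name criterion, i.e., a synonym.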
Identical objects should have the same construct, to allow their merging. Previous research has given many examples of construct mismatches and their resolution.

Meaning is the most difficult criterion. Instead of a lengthy discussion about the interpretation of "meaning", the following working definition will be used at this point: two objects have the same meaning if they both represent the same real world object. Database design is a form of modelling. Real world objects are represented by database items. If two database items both model the same real world object, they have the same meaning. In previous research, meaning has not been explicitly discussed as a discriminating criterion, possibly because the meaning criterion is very difficult to assess. For instance, Navathe and Elmasri (for example, 1986) have frequently used domains or mapping ratios as discriminating criteria instead. We may think of domain information and mapping ratios as operationalizations capturing part of the meaning concept. Explicit representation of the meaning dimension will result in a simple and clear distinction of conflict cases¹.

¹ The main difficulties of meaning representation are completeness of the representation and differences in user perspective. For example, when asked about the meaning of "lion", most people may reply "dangerous animal", while a lion tamer would probably reply "livelihood". These are two different, incomplete interpretations which are both acceptable. For a discussion of the meaning concept consult Russell (1946).

Context refers to the objects that are immediate neighbors of an object. By definition, an entity will always have an empty context². A relationship's context consists of all entities it is associated with. An attribute's context is the entity or relationship it belongs to. Context also has not been explicitly represented in previous research, even though previous researchers were aware of the importance of context, as their conflict recognition and resolution examples show.

² Even though entities have no context by definition, it may be useful later to think of an entity's context as the relationships it is involved in.

Based on the four criteria and two states of each criterion (identity or difference), a 2 x 2 x 2 x 2 matrix with 16 general cases of identity and difference of object pairs can be constructed. To exemplify the principles of conflict recognition and resolution, only the first three criteria, name, construct, and meaning, will be discussed in more detail and represented graphically in this section (see Figure 1). For now, the conflict problem can be simplified by assuming that whenever two objects have identical meaning, their contexts will be identical. Whenever their meanings are different, their contexts may be different or identical. The subsequent sections will deal with the full integration problem, allowing variations in context even if meaning is identical.

[Figure 1: Object Comparison Matrix (Cases 1-8 by name, meaning, and construct)]

Each of the cases depicted in Figure 1 will be briefly presented below. The focus of this discussion shall be on the cases, not on their detailed solution. Unless solutions are necessary for the discussion, they will be postponed to subsequent chapters. Note that not all cases below describe conflicts. For instance, if two objects are identical (Case 1), they can be merged without modifications. Other cases, such as synonymy (Case 2), require an object change.

Case 1: [Name:same; Meaning:same; Construct:same]

This condition corresponds to cases 1 and 5 from previous research (see chapter 2). Two objects are identical in all factors.

Example:
V1: CUSTOMER (entity)
V2: CUSTOMER (entity)
both describing the same customer object.

The notion of identity is meaningful not only for entities, as exemplified, but also for identical relationships (i.e., linking identical entities) and for identical attributes of identical entities (identical context).

Case 2: [Name:different; Meaning:same; Construct:same]

This is the case of a synonym. Both objects are identical but carry different names. Note that both objects have the same construct type (i.e., entity).

Example:
V1: CUSTOMER (entity)
V2: BUYER (entity)
both describing the same real world customer object type.

Case 3: [Name:same; Meaning:same; Construct:different]

This case describes a situation where the same object is represented by different modelling constructs. This case will be referred to as construct mismatch. Brodie (1984) refers to this difference in construct as "semantic relativism", e.g., when the same object is represented as an entity in one view and as a relationship in another view.

Example:
V1: MARRIAGE (entity)
V2: Marriage (relationship)

Both views describe marriage objects. Both views use the same name, but a different construct. For view 1, a marriage is an entity (probably with husband and wife attributes); for view 2, a marriage is a relationship (probably linking two person entities). The solution to this case is a change in one of the constructs, either making the entity a relationship or vice versa. In the end, each object should be represented by the same construct in all views.

This example describes only one of many possible construct mismatch scenarios.

Case 4: [Name:different; Meaning:same; Construct:different]

This case is closely related to the previous one. Again, both objects have the same meaning, but this time they have not only different constructs, but also different names. Therefore, the identity of the objects is disguised even further, by name differences on top of construct differences.
Example:
V1: MARRIAGE (entity)
V2: Married_to (relationship)

While both views use almost similar names, to a syntactic processor the names will be different.

Case 5: [Name:same; Meaning:different; Construct:same]

This case marks homonyms. The objects carry the same name, but have different meaning. The objects have the same construct (i.e., entity).

Example:
V1: SUPPLIER (entity)
V2: SUPPLIER (entity)

Here the same name SUPPLIER is used both for suppliers (currently supplying the product) and for manufacturers (who manufacture the product and may be potential suppliers).

Case 6: [Name:different; Meaning:different; Construct:same]

This case may refer to a trivial situation in which two objects are different in meaning and name, but have the same construct. On the other hand, it may refer to a number of more complex situations of non-identical but related objects (i.e., a superset-subset relationship).

Example 1: trivial situation
V1: EMPLOYEE (entity)
V2: DEPARTMENT (entity)

Example 2: related objects
V1: STUDENT (entity)
V2: UNDERGRAD (entity)

The entities in the first example refer to two different real world objects which are not related¹. The objects represented in the second example are related, namely through a superset-subset relationship. Whenever such a connection exists between two items, they cannot be treated as independent. The eight cases extracted from previous research provide solutions for such non-identical but related sets.

¹ "Related" is used here to express that two object classes are either overlapping or are contained by a common object class.

Case 7: [Name:same; Meaning:different; Construct:different]

This case captures homonyms. Again, the name of the two objects is the same, but they differ both in meaning and in the construct used. Note that this case may also contain objects that have different meaning but are related to each other (as in Case 6).

Example:
V1: SUPPLIER (entity)
V2: Supplier (attribute)

The name supplier is used for both an entity and an attribute, and the attribute does not refer to the same supplier object (i.e., it refers to a manufacturer object).

Case 8: [Name:different; Meaning:different; Construct:different]

This case describes objects which are different in every respect: meaning, name, and construct.

Example:
V1: SUPPLIER (entity)
V2: Department (attribute)

Supplier and department are different objects altogether, with no similarities between them. Again, this exemplifies the trivial form of the case. But again, the objects may also be related.

The above eight cases fall into two main groups: objects that will ultimately be completely identical, and objects that are different. Whether an object pair belongs to the first group (identical items) or to the second group is determined by the meaning dimension. The first group consists of cases 1, 2, 3, and 4. The second group is represented by cases 5, 6, 7, and 8. In either group, certain cases describe stable states. In the first group, for example, Case 3 (semantic relativism) becomes a Case 1 (identical items) once different constructs are eliminated. Case 4 becomes a Case 3 once objects are renamed. Within the group of different objects there exist two stable states. If objects are related (i.e., one is a subset of the other), they ultimately belong to Case 6, i.e., after renaming from Case 5. If they are unrelated, they will belong to Case 8 or Case 6¹.

The complete pattern of transformations into stable states is shown in Figure 2. The figure depicts the comparison cases; the arrows show the direction of transformation from one case into another during the integration process.

2 -> 1  convert true synonyms into identical items through renaming.
3 -> 1  convert construct mismatch into identical items through change of different constructs.
4 -> 3  convert construct mismatch and synonym into just semantic relativism through renaming, or
4 -> 2  convert construct mismatch and synonym into synonym through construct change.
5 -> 6  convert homonyms into different items (possibly related) through renaming.
8 -> 6  convert different items with different constructs into different items with the same construct (only if items are different but related) through construct changes.
7 -> 5  convert homonymy with different constructs into homonymy through construct change (only if objects are related).
7 -> 8  convert homonyms into different items through renaming.

¹ If the objects are unrelated, Case 8 is a stable state, requiring no changes during conflict resolution. If objects are related, the objects will ultimately be transformed into state 6.

[Figure 2: Case Transformations during View Integration]

The transformation sequences have three end points: Case 1, Case 6, and Case 8. Case 1 is the end point for all objects with the same meaning. It is captured by cases 1 and 5 extracted from previous research. Case 8 is the end point for all items which are different in all aspects and not related. Its solution is trivial: all these non-identical, unrelated items will be included in the global schema. Case 6 is the end point for non-identical items with the same construct and for different but related objects. If objects are related, cases 2 to 4 and 6 to 8 from previous research (chapter 2) will apply.

The case transformations (Figure 2) are free of circularities. This makes it possible to postulate an ordering of conflict recognition and resolution. Figure 3 illustrates one possible ordering. The operations to be carried out first are construct changes (4->2, 3->1, 7->5, 8->6) for identical and for related objects. These are followed by name changes for synonyms (2->1) and homonyms (5->6 for related objects, 7->8 for unrelated objects). The termination points of the procedure are cases 1, 6, and 8. The other possible ordering would attend to name changes prior to construct changes. For now, both sequences are equally good, even though the first one is preferable, as will be explained later.

[Figure 3: Ordering of View Integration Steps (construct changes, then name changes, ending in the stable Cases 1, 6, and 8)]

3.2.3. Changes in the Integration Method when Necessary Information is not Directly Available

The integration method discussed so far is based on the assumption that the information necessary to carry out the integration process is directly available. For the required information to be available, it either has to be specified ex-ante or has to be elicited during the view integration process. Since information specification requires designer effort and represents a cost, it is desirable to reduce information specification requirements for the database designers. Hence, while previously the focus was on the design of a complete method for integration, the focus will now be on a human-oriented complete method for view integration.

The new goal will be to determine object identity, difference, and relatedness with a small number of intelligent (i.e., non-redundant) questions. Obviously, the method should base future questions on the answers to previous ones. This is a minimum requirement. The following list of questions outlines other areas in which the procedure can be improved:

1. How many objects shall be included in the object comparison?
2. Which objects should be compared?
3. What is the sequence of conflict diagnosis and therapy?
4. How shall identity or difference be decided?

How many objects?
The previously outlined procedure always compared object pairs, i.e., "is entity E1 identical in meaning to entity E2?" This form of questioning can always be answered with either "yes" or "no", but for n objects it requires n questions. By asking "is E1 identical to one of {E2, E3, ..., Em} in view 2?", the number of questions can be reduced to one. The question can be answered either with the object's identifier or with "no". This form of questioning drastically reduces the questioning effort. The questioning format will always be 1:n instead of 1:1. An m:n format will not be used, since the answers become awkward (a list of tuples of identical objects).

Which objects?
The procedure would not behave intelligently if it included objects in the comparison that should not be included. For instance, if E21 from view V2 was found to be identical to E11 from view V1, the procedure should never again inquire whether E21 is identical to any other object from V1. Other rules, which are described in the results chapter, reduce the group of relevant objects even more. Furthermore, heuristics (also rules, but not always true) were found to reduce the group of objects even further. For example, once two entities are found to be identical, and both participate in relationships, one may expect to find identical pairs of relationships within these smaller groups.

Which sequence?
So far, sequences of object modifications have been outlined which resulted in stable state cases: (Case 1) identical objects, (Case 6) different but related objects, and (Case 8) different and unrelated objects. For instance, a case of construct mismatch (Case 3) was transformed into Case 1 through a construct change. The question is whether the method should operate by actively searching for conflict cases such as Case 3 or Case 4. The answer is "no". A human-oriented integration procedure will alter the sequence of tests. Following the assumption that, in the absence of information to the contrary, two views are assumed to be identical, the procedure will always attempt first to find matching objects, not object mismatches.
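The transformations of Figure 2 can be read as condition-action rules that fire until one of the stable cases (1, 6, or 8) is reached. The Python sketch below is a hypothetical encoding, not the thesis implementation: an object pair is reduced to three assumed fields (`name_same`, `meaning`, `construct_same`), context is held constant as in Figure 1, and listing construct changes before renamings reproduces the ordering of Figure 3.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Pair:
    """Comparison state of two objects from different views (hypothetical encoding)."""
    name_same: bool
    meaning: str          # "same", "related" or "unrelated"
    construct_same: bool


def case_of(p: Pair) -> int:
    """Map a state onto the eight cases of Figure 1 (context held constant)."""
    same_meaning = p.meaning == "same"
    return {
        (True, True, True): 1, (False, True, True): 2,
        (True, True, False): 3, (False, True, False): 4,
        (True, False, True): 5, (False, False, True): 6,
        (True, False, False): 7, (False, False, False): 8,
    }[(p.name_same, same_meaning, p.construct_same)]


def is_stable(p: Pair) -> bool:
    c = case_of(p)
    return c in (1, 6) or (c == 8 and p.meaning == "unrelated")


# Rules in priority order: construct changes first, then renamings (Figure 3).
RULES: List[Tuple[str, Callable[[Pair], bool], Callable[[Pair], Pair]]] = [
    ("construct change",                       # 4->2, 3->1, 7->5, 8->6
     lambda p: not p.construct_same and p.meaning != "unrelated",
     lambda p: replace(p, construct_same=True)),
    ("rename synonym",                         # 2->1
     lambda p: p.meaning == "same" and not p.name_same,
     lambda p: replace(p, name_same=True)),
    ("rename homonym",                         # 5->6, 7->8
     lambda p: p.meaning != "same" and p.name_same,
     lambda p: replace(p, name_same=False)),
]


def resolve(p: Pair) -> List[int]:
    """Fire the first applicable rule until a stable case is reached; return the case path."""
    path = [case_of(p)]
    while not is_stable(p):
        for _label, condition, action in RULES:
            if condition(p):
                p = action(p)
                break
        else:
            raise RuntimeError(f"no rule applies to case {path[-1]}")
        path.append(case_of(p))
    return path
```

Under this encoding, a construct mismatch with synonym follows the path 4 -> 2 -> 1, related homonyms with different constructs follow 7 -> 5 -> 6, and unrelated ones follow 7 -> 8. Because the transformations contain no cycles, every one of the twelve possible states terminates. Moving the renaming rules ahead of the construct-change rule would give the alternative ordering, e.g., 4 -> 3 -> 1.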
For example, the typical assumption at the outset of the object comparison will be that for an object O1 in view V1 there exists an object O2 in view V2 with an identical construct (e.g., both are relationships). Figure 4 briefly outlines the basic sequence of tests.

[Figure 4: Conflict Recognition Procedure (abbreviated)]

For any object O1 from view V1 and any set of objects {O2} from view V2, the first test is a test for identity of meaning. If it fails, a test for construct mismatch follows. If there is no construct mismatch, an object is assumed to be missing. Note that name and context difference or identity are ignored at first. The test for relatedness, which begins with the assumption that objects are related, is separated from the test for identity of objects. Tests for relatedness are postponed until all tests for identity have been carried out.

How to decide on identity or difference?
For all object characteristics, identity or difference has to be asserted. While this is simple for construct and name, it is not for meaning and context. Only people can ultimately judge whether two objects have the same meaning, but an intelligent integration procedure should help as much as possible in making this decision. In short, the procedure will help by filtering out objects that are not identical to the object in question. Rules to filter out these non-corresponding objects are defined.

3.2.4. View Integration Conflict Cases

Previously, only 8 of the 16 general types of cases were discussed, with context held constant. The case list below describes all possible cases for the comparison of two arbitrary objects from different views. Cases are identified by the name (N), construct (object type, T), meaning (M), and context (C), <N,T,M,C>, of the involved objects. Object O1 is denoted by <N1,T1,M1,C1>, object O2 by <N2,T2,M2,C2>. For every case, the equality or non-equality of the parameters is stated.

The overview table below shows, for each case, identity or difference along the four dimensions. For example, a '=' under N means that both objects have identical names, and an 'x' means they are different. For the meaning dimension, 'r' means the meanings are different but related.
Case  N    T    M    C      Description
1     =    =    =    =      Identical objects
2     =    =    =    x      Identical objects with different context
3     x    =    =    =      Synonym
4     x    =    =    x      Synonym with different context
5     =    x    =    x      Construct mismatch (semantic relativism)
6     x    x    =    x      Construct mismatch and synonym
7     x    =    x    =/x    Different and unrelated objects
8     =    =    x    =/x    Homonym
9     x    x    x    x      Different objects with different constructs
10    =    x    x    x      Different objects with different constructs, but homonyms
11    x    =    r    =      Different but related objects
12    =    =    r    =      Different but related homonyms
13    x    =    r    x      Different but related objects with different context
14    =    =    r    x      Different but related homonyms with different context
15    x    x    r    x      Different but related objects of different type
16    =    x    r    x      Different but related homonyms of different type
17    -    -    -    -      Missing object O2

('=' identical, 'x' different, 'r' different but related, '=/x' identity or difference irrelevant, '-' not applicable.)

Note that if two objects are of different type, their context will be different, due to the definition of context. Note also that identity or difference of context is irrelevant for objects with different (unrelated) meaning.

A more detailed list of view conflicts can be found in the Appendix. The list in the Appendix breaks each general case down into subcases according to the constructs of the participating objects. For example, a construct mismatch can exist between an entity and a relationship as well as between an entity and an attribute. The extended list has been left out here for readability. The Appendix also provides a brief description of the solution for all case scenarios.

The general conflict resolution rule for object identity and difference is to have all other dimensions follow the meaning dimension. If two objects have identical meaning, all other dimensions have to be made identical. If two objects have different meaning, the name dimension has to reflect this. Cases of object relatedness are solved through the representation of superset-subset relationships. Omitted from this description is the technique for re-allocating attributes when relatedness is detected. The general rule is to allocate the attributes that belong to both the superset and the subset to the superset, and to allocate to the subset only the attributes that are unique to it (see for instance Navathe and Elmasri (1986)).

3.3. Expert System Methodology

An implemented solution for the view integration problem requires an adequate problem representation and solution mechanism. So far, cases of potential integration problems and a procedure have been identified, yet no implementation mechanism has been suggested. Before discussing an adequate mechanism, a short reminder of the problem situation is in order.

Correcting the conflicts in a set of user views is clearly a problem solving task. Within this research, view integration is treated as a diagnosis and therapy task (note that Hayes-Roth et al. mention diagnosis and therapy ("repair") as generic applications of knowledge engineering). Characteristic of a typical diagnosis task is the goal to find out "what's wrong" in the actual state. Thus, the purpose of the diagnosis part of view integration is to identify the discrepancy or mismatch between a pair of views. Once the conflict has been identified, the therapy or "fixing" phase adjusts one or both views to remove the existing conflict. Therapy creates the new, desired structure. Diagnosis and therapy tasks are prototypical tasks for expert systems or knowledge based systems. The integration method discussed here was not built by extracting diagnosis and therapy rules from expert designers; hence it is not truly an expert system. However, it does represent conflict recognition and conflict resolution knowledge.

Database design rules for conflict recognition and conflict resolution can easily be formulated as sets of production rules. In simplified form, one may think of each production rule as describing one of the cases. For each object comparison, the rule matching the case situation would fire and transform the conflict situation into another one, until a stable state is reached (for a description of the production system reasoning mechanism see for instance Barr and Feigenbaum, 1981).

The most appealing property of the production system mechanism is the modularity of the resulting systems: rules can be added, deleted or changed without directly affecting other rules. Figure 5 illustrates this fact. It depicts a decision table with cooking instructions for vegetables (taken from Vessey and Weber, 1986) to exemplify the convenience of production rule editing. Each instruction (column) corresponds to one production rule. The table can easily be expanded by adding new columns. By the same token, deleting a column (or rule) does not affect any other column in the table. Furthermore, each column can be changed, thereby affecting only the instructions for one particular dish.

The cause of this simplicity lies in the design of the condition list. Each condition stub is specified in the utmost detail and never refers to aggregates of more than one fact. I.e., the decision table does not create intermediate results (aggregates of truth values) such as "juicy and crispy and leafy but not tall", which could otherwise appear as a single condition in the condition list for both the rules "fry" and "steam". In other words, the condition items are decoupled as much as possible. Consequently, the rules (i.e., the dishes) are also decoupled.

Figure 5: Decision Table Illustration (conditions: Juicy, Tall, Crispy, Leafy, Red, Hard; actions: Fry, Steam, Grill, Peel, Boil, Chop, Roast)

The modularity of production rules makes their implementation very forgiving. If a case is left out in the beginning, or is incompletely specified at first, additions can be made with very little effect on the already existing rules.

One disadvantage of the production system approach is its inefficiency, due to the duplication of identical condition elements. This is the cost of completely decoupling the conditions: every condition list has to be created and tested in detail without being able to make use of established intermediate results. A more sensible design approach would compromise between complete decoupling of conditions and processing efficiency. A heuristic for aggregating conditions would group together those conditions that form a meaningful unit (are highly cohesive). Meaningful stands in contrast to a purely accidental coincidence of conditions. For example, "juicy and crispy and leafy, but not tall" is not a particularly meaningful grouping, because it does not identify a certain well-known group of food items. Therefore, this aggregate should not be chosen as a grouping, even though it could simplify the decision table in the example.

A second disadvantage of production systems is that they disguise the control flow. It is difficult for a designer to understand the control flow in a production system. In so-called "procedural" programming languages, e.g., Pascal or Fortran, the control flow is determined by the ordering of the language statements, if branching statements are neglected for the moment.
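The production-rule formulation of conflict resolution described in this section (one rule per conflict case, firing until a stable state is reached) can be sketched as a small recognize-act loop. The sketch is hypothetical and is not the AVIS program: `Obj`, `RULES` and `run` are illustrative names, meaning is reduced to a label the designer is assumed to have supplied, and only three resolution rules are shown (synonym, differing context, homonym), each following the general rule that the other dimensions follow meaning. Construct mismatches are deliberately left out. Renaming a homonym by appending "_2" is an arbitrary placeholder for a name the designer would choose. Rules are tried in list order and the first match fires, a simple conflict-resolution strategy chosen here for brevity.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Obj:
    name: str
    construct: str                 # "entity", "relationship" or "attribute"
    meaning: str                   # hypothetical: a meaning label supplied by the designer
    context: Tuple[str, ...] = ()  # associated objects; empty for entities


# A rule is (name, condition on the object pair, action returning a new pair).
Rule = Tuple[str, Callable[[Obj, Obj], bool], Callable[[Obj, Obj], Tuple[Obj, Obj]]]

RULES: List[Rule] = [
    ("synonym",   # same meaning, different names: the name follows the meaning
     lambda a, b: a.meaning == b.meaning and a.name != b.name,
     lambda a, b: (a, replace(b, name=a.name))),
    ("context",   # same meaning and construct, different context: adopt O1's context
     lambda a, b: (a.meaning == b.meaning and a.construct == b.construct
                   and a.context != b.context),
     lambda a, b: (a, replace(b, context=a.context))),
    ("homonym",   # different meanings, same name: the names must differ
     lambda a, b: a.meaning != b.meaning and a.name == b.name,
     lambda a, b: (a, replace(b, name=b.name + "_2"))),  # placeholder new name
]


def run(a: Obj, b: Obj, rules: List[Rule], max_cycles: int = 20):
    """Fire the first matching rule, repeatedly, until no rule matches."""
    trace: List[str] = []
    for _ in range(max_cycles):
        for name, condition, action in rules:
            if condition(a, b):
                a, b = action(a, b)
                trace.append(name)
                break
        else:  # no rule matched: a stable state has been reached
            return a, b, trace
    raise RuntimeError("no stable state within max_cycles")
```

Because each rule is an independent list entry, a rule can be added (for example `RULES + [new_rule]`) or removed without editing the others, which is the property the decision table of Figure 5 is used to illustrate.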
In production systems, the sequence of rules has much less importance. I.e., the rule "chop" will not be applied first, even though it is the first rule in the decision table in Figure 5, unless its conditions are the first to become true. If the last rule in the system is the one whose conditions become true first, it will be the first rule to fire. Hence, production systems in general require substantial re-thinking by designers who are used to procedural languages. In a Prolog implementation this problem is alleviated to some extent, since the language's interpreter still interprets rules in sequential order.

In conclusion, even though it has some disadvantages, a production system mechanism seems to be a suitable representation for the implementation of this research. The case description already provides many guidelines for the definition of conflict resolution rules. Also, the maintainability advantage of production systems becomes important when heuristics subsequently have to be added to the integration method to improve its operation with insufficient information.

A different, apparently more elegant approach to view integration would perform the integration process as a theorem proving task. Similar to other theorem proving tasks (see for instance Nilsson, 1980), the program would be given a set of views and the question "does there exist a conflict-free global schema which contains all the information of the individual conflicting views?"
If the answer to that question were "yes", the global schema would be produced as a "by-product". Using Robinson's resolution principle (1965), the program would solve this problem by creating the new goal "there exists no global schema" and by falsifying this statement through a counterexample. This approach is elegant because it is based on a very general problem solving mechanism, the theorem proving mechanism. However, the definition of the integration rules, especially the procedural rules of conflict recognition and resolution, is more difficult than in the production system approach.

Two other reasonable representations for the task are frames and semantic networks (Waterman, 1986). They are discussed below.

Frames (Minsky, 1975) are complex data structures containing both factual and procedural knowledge. Frames have slots which can contain data concerning frame properties. Slots can have related procedures which are invoked when the slot is filled. Slots that are not filled can take initially defined default values. This default capability is one of the advantageous features of frame based knowledge representations; Mylopoulos and Levesque (1984), for instance, stress their ease of dealing with incomplete knowledge. Frames have been used as knowledge representations in a variety of expert systems (see Waterman, 1986 or Hayes-Roth et al., 1983). Barr and Feigenbaum (1981) state that frames "have problems", yet do not mention where these problems lie.

Semantic nets represent knowledge in a network in which properties are inherited from other objects along the arcs of the network. Waterman states that semantic nets have also been used in expert systems; in fact, he argues that semantic nets and frames are similar. Mylopoulos and Levesque (1984) emphasize as qualities of semantic nets their data organization and the provision of good access methods. As a disadvantage, they state the lack of formal semantics and standard terminology. The problem of formal semantics becomes clear when the interpretation mechanism for semantic nets is investigated.

All approaches are feasible. However, for its forgivingness in the maintenance of the knowledge base, the production system approach has been chosen for this research. The integration method has been implemented in Prolog. The program is called AVIS, for Automatic View Integration System.

4. RESULTS

4.1. Rules Guiding View Integration

View integration as a problem solving task is guided by a set of rules which allow the problem solver to define the problem environment, identify the particular problems (conflicts), and solve them. In this section, the rules are presented and exemplified, and the support underlying them is justified. The rules are divided into two major groups: base rules and heuristics. Base rules are general rules that are always expected to be true. Heuristics express beliefs that are known to be wrong sometimes but are expected to be true in most cases.

Especially in its conflict recognition part, the view integration method relies to a large extent on asking the user questions. If the method can ask the right questions, it can perform a large segment of the integration without user interaction. When user interaction cannot be avoided, the right selection of questions can simplify the user's answering task. Furthermore, the method will not appear to be stupid if it can avoid asking trivial or redundant questions. To help in the question formulation process, heuristics were included which, for instance, change the content and sequence of questions.

Base rules are separated into four groups. The first three groups are static modelling rules. The fourth group contains process rules:

1. General Modelling Rules
2. Rules of the Modelling Language
3. Rules of Database Design/View Integration
   3.1 General Database Design Rules
   3.2 Rules Concerning the Test for Identity of Objects (Conflict Recognition and Reconciliation Rules)
   3.3 Rules Concerning the Relatedness¹ of Objects (Rules for Recognition and Modelling of Inter-Schema Relationships)
4. Process Rules
   4.1 Process Rules for Conflict Recognition and Reconciliation
   4.2 Process Rules for the Recognition and Modelling of Inter-Schema Relationships

General modelling rules are valid not only in the database context. For example, "each relevant real world object² shall be represented by exactly one object in the model" is such a rule. Rules of the modelling language, here the E-R modelling language, describe true statements about the E-R language that are relevant to the view integration task. Rules of database design are separated into rules that guide the database designer's (or the method's) test for the identity of objects and rules that guide the uncovering of inter-schema (superset-subset) relationships. Process rules describe the sequence in which tests (i.e., conflict recognition) and corrective measures (i.e., conflict resolution) shall be carried out.

¹ The term "relatedness" is used to signify superset-subset relationships such as "all managers are employees", MANAGER-Isa-EMPLOYEE. The term "relationship", unless occurring in the form "subset/superset/containment relationship", is used to denote associations between entities.

² Throughout the chapter, the terms object and object type will be used interchangeably to describe object types. Particular instances are referred to as object instances or object occurrences.

The discussion will begin with a description and explanation of the base rules, followed by an analysis of the heuristics.

Base Rules

General Modelling Rules:

1. Each relevant real world object type shall be represented by exactly one object type in the model (redundancy-free representation).

All model building tries to create a representation of the real world that contains all relevant information in the most concise form. Not all real world information can be represented, and most of the detail may not even be required for the tasks at hand. Hence, some real world object types will not find their way into the model. If a real world object type is represented more than once in the data model, update anomalies can occur. Each new instance of the real world object type has to be inserted more than once into the data model. Should the real world object itself cease to exist, the corresponding instance has to be removed from more than one data model object type. This creates extra processing effort and the possibility of inconsistency. One of the purposes of database design is to avoid exactly these problems.

2. An integration of multiple models shall not result in the loss of information from any of the models.

Any bottom-up modelling approach attempts to build a large global model through the combination of smaller models. Each of the small models represents the real world facts that one model-builder perceives as relevant. Omitting any of these facts from the global model would result in an incomplete global model. Hence, the rule demands that all individual models are correct and that the collection of models is in itself consistent (Biskup and Convent, 1986).

Rules of the Modelling Language:

3. Every object in a view is represented with exactly one of the following three constructs: Entity, Relationship, Attribute.
The view integration method models databases based on Chen's Entity-Relationship model, in which only Entities, Relationships and Attributes exist. Categories, which are represented in some extended forms of the E-R model, will be depicted as special (Is-a) relationships.

4. Entities are autonomous objects. They can exist without the existence of Relationships and without the definition of Attributes.

Entities are things or individuals. As things or individuals can exist even if they have no associations with other things or individuals, so can entities. For example, an entity SUPPLIER can exist without an association to another entity, such as BUYER.

5. A Relationship cannot exist without the existence of at least one Entity.

Relationships represent associations between entities. They map instances of one entity to instances of some other entity. In the most restricted case, one entity is associated with itself. For example, the entity PERSON is associated with itself through a Supervisor or a Parent relationship. Typically, more than one entity will be involved in a relationship, but never fewer than one.

6. An Attribute cannot exist without the existence of the Entity or Relationship it belongs to.

Attributes represent associations between an entity or a relationship and a value set. For example, the Person_name attribute associates the PERSON entity with a value set of names, which itself is a set of valid strings containing person names. The attribute cannot exist without the entity or relationship it refers to (value sets are not part of the E-R model).

General Database Design Rules:

7. Two types of Attributes exist. "Property" Attributes describe the object (Entity or Relationship) in more detail (e.g., color, name), and "Interconnection" Attributes describe the association of the object with some other object (Entity or Relationship).

Attributes are always associations between entities or relationships and value sets. However, sometimes they are not used to describe an innate property of the entity or relationship they belong to, but instead the association between that entity or relationship and some other object. For example, the attribute Person_name describes a property of a PERSON entity, namely its name. PERSON could also have an attribute Savings_acct_no. This attribute, even though associated with PERSON, is not a property of a person. In fact, the attribute implicitly states that things called savings accounts exist and that a person is or may be related to such a savings account (PERSON possesses SAVINGS_ACCT, i.e., PERSON is_associated_with SAVINGS_ACCT). Thus, the attribute describes not a property, but an association. While the difference between a property attribute and an interconnection attribute was distinct in this example, it will not be as clear in all cases.

8. Interconnection Attributes are shortened forms of Entities (if the Attribute is a Relationship-Attribute), or of Entity-Relationship constructs (if the Attribute is an Entity-Attribute).

In the above example, PERSON had a Savings_acct_no attribute which indicated the existence of savings accounts and a person's possession of such an account. Obviously, savings account could become an entity, since it is a thing in the real world. In that case, a relationship such as Has_account would represent a person's possession of such an account. Also, the savings account, being an Entity itself, could have attributes such as Account_balance or Date_opened. The model builder may not need all this extra information. If the account number is sufficient, there is no reason to describe savings accounts, or other real world objects, in more detail. After all, a model should contain only the relevant information about the system it is modelling.

In the example, an entity attribute (Savings_acct_no) was an association between an entity (PERSON) and a value set (of account numbers), which took the role of a relationship (Has_account) between PERSON and another entity, SAVINGS_ACCT. The attribute thus represented both a relationship (Has_account) and an entity (SAVINGS_ACCT) through the account number value. All attributes of SAVINGS_ACCT other than its number, as well as any potential non-key attributes of Has_account, are not represented. Hence, interconnection attributes are a compressed form of information representation. This compression has the undesirable side effects of deletion and insertion anomalies: savings accounts do not exist until people exist that possess the accounts, and accounts cease to exist with the person owning them.

9. If Attributes are multi-valued, they are interconnection Attributes.

This rule helps in the detection of interconnection attributes. If a multi-valued attribute is found, it is considered to be an interconnection attribute. For example, if the Address attribute of an EMPLOYEE requires multiple entries, it is better represented by a new entity RESIDENCE, related to EMPLOYEE through a relationship such as Resides_at. Storey (1988) deals with multi-valued attributes in this manner during view creation.

10. A Relationship is a less fundamental object than an Entity.

Since relationships cannot exist without the existence of at least one entity, their continuing existence is based on two factors: first, on the existence of the underlying entities, and second, on the existence of the association between the underlying real world objects. Should either one not exist, the relationship has to be removed. For entities, on the contrary, it is unimportant whether any formerly existing association between them is still in place. They will only disappear once the real world objects underlying them disappear. The same is true for entity and relationship instances. For example, if a database contains the entities EMPLOYEE and DEPARTMENT as well as the relationship Employed_by, individual instances of Employed_by, such as [1005, Manufacturing], are only meaningful if employee 1005 still exists, the manufacturing department exists, and the employee in fact still works for the manufacturing department (referential integrity).

11. Each object has four relevant dimensions: Name, Construct (Entity/Relationship/Attribute), Meaning, and Context.

One of the basic assumptions underlying this view integration method is that there exist only four relevant differentiation criteria for objects in a view: the name of an object, such as SUPPLIER; its construct or object type, such as relationship; its meaning; and its context. Meaning encompasses all the relevant knowledge conveyed by the object. For example, meaning includes all the information that is known once it is known that a particular entity is a SUPPLIER, i.e., that it supplies parts and will be paid for parts. Meaning is the most important of all four dimensions and has absolute precedence over the other dimensions. If two objects have the same meaning, they refer to the same real world object, and therefore all other dimensions have to be adjusted accordingly. Context identifies the set of objects an object is associated with. An attribute's context is the entity or relationship it belongs to. A relationship's context is the set of entities associated by it. Entities are defined as having no context; they are the only objects able to exist without any other type of object.

12. Along each dimension, any two objects can be either "same" or "different", i.e., same or different name, same or different construct.

Another major assumption of the view integration method concerns the variations in each dimension. It is more important to find out whether two objects are identical (same) or different in each of the relevant dimensions than to find out the actual values for each dimension. In order to merge two objects, they have to match, i.e., they have to be completely identical. If they are even slightly different, a change is required. The magnitude of dissimilarity does not matter, since a change is required nevertheless. For example, the entity names SUPPLIER and SUPPLIERS are only slightly different. Nevertheless, they are different and will require a name change if the entities are to be merged. The same is true for the other dimensions. Two relationships may have "almost" the same context, that is, most of the entities associated by them are the same. Despite this fact, these relationships have different contexts and cannot be merged unless the context of one or both of them is changed.

13. Two objects with different meanings can be related in meaning.

Meaning is the only dimension where identity and difference are not the only two relevant values.
EMPLOYEE  and PART_TIME_EMPLOYEE  meaning,  y e t they  refers  t o a type  have  a r e not completely  obviously  different  independent.  EMPLOYEE  o f i n d i v i d u a l s which i n c l u d e s t h e type o f  individuals  referred  two  are d i f f e r e n t  objects  F o r example, t h e e n t i t i e s  t o by PART_TIME_EMPLOYEE. i n meaning,  any  Hence,  superset-subset  r e l a t i o n s h i p s between them a r e n e v e r t h e l e s s r e l e v a n t .  Objects  w i t h such r e l a t i o n s h i p s w i l l be c a l l e d r e l a t e d i n meaning.  101  when  14.  Two r e l a t e d o b j e c t s 01 and 02 w i l l d i s p l a y one o f t h e f o l l o w i n g s e t r e l a t i o n s h i p s between them: 1.  01 and 02 have a common subset  (yes/no); and  2.  01 and 02 have a common s u p e r s e t  (yes/no);  r e s u l t i n g i n t h e f o l l o w i n g p o s s i b l e combinations: (a) (b)  one o b j e c t c o n t a i n s t h e o t h e r o b j e c t ; b o t h o b j e c t s have a ( m e a n i n g f u l ) common s u p e r s e t and a common subset, y e t t h e s u p e r s e t i s n o t one o f 01 o r 02; b o t h o b j e c t s have a common s u p e r s e t , but they do not o v e r l a p ; both o b j e c t s have no common s u p e r s e t and do not i n t e r s e c t ; v i r t u a l l y no r e l a t e d n e s s , no need f o r r e p r e s e n t a t i o n i n a database.  (c) (d)  Set  r e l a t i o n s h i p s and t h e i r treatment w i t h i n view i n t e g r a t i o n  have been d i s c u s s e d a t d i f f e r e n t l e v e l s o f completeness by a l l p r e v i o u s l y reviewed i n t e g r a t i o n techniques, Navathe and  Elmasri  This The  and c o l l e a g u e s ,  and Navathe  qualifier  a l l r e l e v a n t r e l a t i o n s h i p s between two s e t s . "meaningful"  any such superset  f o r supersets  o r subset  from t h e p o i n t o f t h e u s e r s .  
For example, the entities EMPLOYEE and CUSTOMER have a common superset PERSON, an entity requiring implementation. Consequently, both EMPLOYEE and CUSTOMER would inherit the attributes of PERSON, and all instances of EMPLOYEE and CUSTOMER would be instances of PERSON. Another, less meaningful superset would be an entity EMPLOYEE&CUSTOMER. The choice of an appropriate common superset, i.e., EMPLOYEE&CUSTOMER vs. PERSON, has to remain with the user¹. While there are no fixed rules as to what constitutes a "good" entity, there are indicators for less good entity choices. For instance, if the user cannot provide a good name for the object, it may not be a (good) entity: EMPLOYEE&CUSTOMER is not a good object name, hence the object is not expected to be very meaningful. Or, if the object's attributes are identical to an already existing object's attributes, the object may not be a (good) entity.

¹ Throughout the text, the term "user" refers to a database designer who employs the integration method. This "designer user" represents the interests of the end users of the database. The end users are assumed to have provided the original views.

Examples for the forms of relatedness are:
(a) EMPLOYEE contains PART_TIME_EMPLOYEE;
(b) PRODUCT_TEAM_MEMBER and PROJECT_TEAM_MEMBER are both subsets of EMPLOYEE; their intersect is PRODUCT&PROJECT_TEAM_MEMBER;
(c) PART_TIME_EMPLOYEE and FULL_TIME_EMPLOYEE are both subsets of EMPLOYEE, but they do not overlap;
(d) CUSTOMER and DEPARTMENT do not intersect.

The relatedness in (d) is so weak that it shall be ignored. Even though it represents some extra knowledge about the world, the knowledge is negative knowledge. Since negative knowledge is so much more abundant than positive knowledge, its representation typically becomes infeasible.

15. Two unrelated objects O1 and O2 may share a common role.

Two entities, for example PERSON and COMPANY, can be different and unrelated, but they can still have a common role, such as the role of shareholder. Neither view may contain a shareholder object, even though both may contain a STOCK entity. Goldstein and Storey (1988) discuss unrelated objects sharing a common role ("W-relationship") and the proper representation of this situation in a generalization lattice.

16. Two objects are identical if they are identical in all dimensions.

Only the previously discussed four dimensions are relevant to judge whether objects are identical. Objects have to correspond in all dimensions. For example, an entity EMPLOYEE and an entity WORKER are known to mean the same. Thus they are identical in meaning, construct (entity), and context (empty). Nevertheless, the objects are identical only after their names have been made identical too.

17. Each object is related to itself (contains itself and is contained by itself). This relatedness shall not be represented in any view.

This rule guides and limits the search for between-view set relationships. For example, if an entity EMPLOYEE has been found to be identical to another object EMPLOYEE from some other view, each of the entities is also a superset of the other one. They also share a common subset, the entity set itself. The representation of this finding bears no extra information. It would also result in an infinite expansion of the global database: if every object is related to itself, the object expressing this relatedness is also related to itself, which has to be expressed through yet another object, and so on.

18. An object can be related to between 0 and n other objects.

It is important to remember that one object can be related to more than one other object. The search for related objects from another view is not completed after one related object has been found. However, it is also possible that no related objects can be found in another view.

19. Each object in one view can have a maximum of one identical object in another view (call this the "corresponding" object).

This rule follows from the general rule of modelling that a real world object shall be represented no more than once in a model. A view is a model. Hence, if two objects of one view are identical to one object from some other view, the two objects must be identical. This rule implies that once a pair of identical objects has been found, there is no need to search for further identical objects.

20. Two views are the same if all their objects are identical.
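A minimal Python sketch of the comparisons behind rules 12, 16 and 20, reusing the EMPLOYEE/WORKER example of rule 16. The `SchemaObject` record and the function names are hypothetical, and the `meaning` field is assumed to be a label the designer has already assigned, since the method leaves judgments of meaning to the user. Each dimension is compared strictly as same or different, as rule 12 prescribes.

```python
from dataclasses import dataclass, replace

# Hypothetical record for a schema object described along the four
# dimensions used by the rules. "meaning" is a label assumed to be
# assigned by the designer; the method leaves judgments of meaning to the user.
@dataclass(frozen=True)
class SchemaObject:
    name: str
    construct: str                    # "entity", "relationship" or "attribute"
    meaning: str
    context: frozenset = frozenset()  # entities a relationship associates; empty for entities


DIMENSIONS = ("name", "construct", "meaning", "context")


def compare(o1: SchemaObject, o2: SchemaObject) -> dict:
    """Rule 12: each dimension is either the same (True) or different (False)."""
    return {d: getattr(o1, d) == getattr(o2, d) for d in DIMENSIONS}


def identical(o1: SchemaObject, o2: SchemaObject) -> bool:
    """Rule 16: two objects are identical only if they agree in every dimension."""
    return all(compare(o1, o2).values())


def views_same(v1: list, v2: list) -> bool:
    """Rule 20: every object of each view has an identical object in the other view."""
    return (all(any(identical(a, b) for b in v2) for a in v1)
            and all(any(identical(a, b) for a in v1) for b in v2))


employee = SchemaObject("EMPLOYEE", "entity", "employee")
worker = SchemaObject("WORKER", "entity", "employee")
print(compare(employee, worker))             # only the name dimension differs

renamed = replace(worker, name="EMPLOYEE")   # resolve the name conflict
print(identical(employee, renamed), views_same([employee], [renamed]))
```

Because `compare` yields only booleans, SUPPLIER vs. SUPPLIERS counts exactly like SUPPLIER vs. DEALER: either difference blocks identity until it is resolved.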
The goal of the conflict recognition and resolution procedure is to correct omissions and conflicts so that at the end two previously different views are identical. Then they do not have to be merged; one of them can be removed, since all its information is also contained in the other view. Rule 20 thus states when the identity condition is achieved.

21. Each individual view is complete, consistent and minimal.

A view is complete if it represents all the individuals, things, and associations between them relevant to the user. A view is consistent if none of the facts stated in the view concerning the relatedness of sets are contradicted by others in the view. For example, if the view states that the entity PART_TIME_EMPLOYEE is a subset of the entity EMPLOYEE, no other fact in the view may present contrary information, such as that PART_TIME_EMPLOYEE and EMPLOYEE have no members in common (see Casanova and Vidal (1982), Biskup and Convent (1983)). Minimality of a view entails that each real world object is represented only once in a view. For example, if one view contains two entities, SUPPLIER and DEALER, these entities have to be different; they have to refer to different objects in the real world.

The completeness assumption clarifies the role of the integration method as a method that finds omissions or conflicts in views based not on within-view (intra-view) analysis but on between-view (inter-view) comparison.

22. The collection of views before integration is consistent.
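The method assumes rules 21 and 22 rather than checking them, so the following is not part of it; it is a small sketch of the kind of contradiction the assumptions rule out, using the PART_TIME_EMPLOYEE example of rule 21. The tuple encoding of set facts and the function names are hypothetical, and every named set is assumed to be non-empty.

```python
from itertools import product

# Hypothetical encoding of set facts stated in a view:
#   ("subset", A, B)   every member of A is a member of B
#   ("disjoint", A, B) A and B have no members in common


def subset_closure(pairs):
    """Close subset facts under transitivity: A in B and B in C give A in C."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for (a, b), (c, d) in product(list(closure), repeat=2):
            if b == c and (a, d) not in closure:
                closure.add((a, d))
                changed = True
    return closure


def contradictions(facts):
    """Return (A, B) pairs stated or implied to be subset-related but declared disjoint.

    Assumes every named set is non-empty; an empty A would make both facts true.
    """
    subsets = subset_closure((a, b) for kind, a, b in facts if kind == "subset")
    disjoint = {frozenset((a, b)) for kind, a, b in facts if kind == "disjoint"}
    return sorted((a, b) for a, b in subsets if frozenset((a, b)) in disjoint)


view1 = [("subset", "PART_TIME_EMPLOYEE", "EMPLOYEE")]
view2 = [("disjoint", "EMPLOYEE", "PART_TIME_EMPLOYEE")]
print(contradictions(view1))           # rule 21: one view on its own
print(contradictions(view1 + view2))   # rule 22: the collection of views
```

Each view passes on its own, but the collection does not, which is exactly the situation rule 22 excludes.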
The view integration method assumes not only that views are individually consistent, but also that the collection of views is consistent as a whole. In other words, facts concerning the relatedness of sets stated in one view cannot contradict facts stated in another view.

This rule clarifies the purpose of the conflict recognition and resolution method as a method that corrects omissions and conflicts (i.e., differences in opinion on name, context) but not contradictions. For instance, if view V1 states that all managers have to be full-time employees, while view V2 states that part-time employees can also be managers, the views contradict. Both statements cannot be true at the same time. The method assumes that such contradictions do not exist.

Rules Concerning the Test for Identity of Objects (Conflict Recognition and Resolution Rules):

23. If for an object O1 from view V1 an identical object O2 cannot be found in view V2, then O2 is either missing or represented through an object that has the same meaning but is different along its other dimensions.

Ideally, an identical object O2 from V2 exists for each object O1 from V1. Both objects are identical if they are identical in all relevant dimensions: name, construct, meaning, and context. The most crucial dimension is the meaning dimension. If two objects have the same meaning, they refer to the same object in the real world. Hence, if an object O2 with the same meaning as O1 exists, there may remain a name, construct or context conflict between O1 and O2 to be taken care of, but O2 is not missing.
If no O2 exists that refers to the same real world object as O1 does, then that O2 is truly missing.

24. No change of a view during integration shall result in the loss of information.

This rule provides a guideline to the direction of change in cases of construct mismatch, as described by one of the following alternatives:

Object in view 1    Object in view 2
Entity              Relationship
Entity              Attribute
Relationship        Attribute

Mismatches between an attribute on the one hand and an entity or a relationship on the other hand will result in a change of the object with the attribute construct. This follows from the rule on interconnection of attributes. A mismatch between an entity and a relationship results in a change of the object with the relationship construct, based on the rule concerning object permanence. Relationships are less fundamental than entities. Relationship instances cease to exist when the entity instances they refer to cease to exist (referential integrity), as illustrated below:

View 1: SUPPLIER—Sup_con—CONTRACT—Cus_con—CUSTOMER
View 2: SUPPLIER—Contract—CUSTOMER

Both views have suppliers in a contract situation with customers, yet in view 1 the contract itself is an entity, while in view 2 it is a relationship. In view 2, a disappearing customer (instance) destroys all records of a contractual agreement between him and the supplier. No historic data remains. In view 1, contracts have a life of their own and survive the disappearance of a customer instance. Hence, the less permanent character of a relationship potentially leads to information loss in the database extension. Consequently, a construct mismatch between an entity and a relationship should result in a change of the relationship construct into an entity construct.

25. If two unrelated objects share a common role, the common role object and specific role objects have to be represented, as well as Isa relationships between the original objects and the specific roles and between the specific roles and the common role.

This rule is based on Goldstein and Storey (1988). For example, in

V1: PERSON—Holds—STOCK
V2: COMPANY—Holds—STOCK

PERSON and COMPANY have the same role. Therefore, a common role object SHAREHOLDER is needed to describe the situation. Furthermore, the specific role objects PERSON_SHAREHOLDER and COMPANY_SHAREHOLDER are needed. Then, PERSON_SHAREHOLDER is a PERSON as well as a SHAREHOLDER. SHAREHOLDER will be the object associated with STOCK through the Holds relationship.

Rules Concerning the Test for Relatedness of Objects (Recognition and Modelling of Inter-Schema Relationships):

26. Any object O1 from V1 which is not an entity and which is related to an object O2 from V2 shall become an entity.

Any object O1 that is not an entity is either a relationship or an attribute. Neither of the two may be related to other objects by means of relationships: relationships involving relationships are not permitted, nor are Isa relationships involving attributes. However, if two objects are related, they have to be connected by a relationship. Thus, this construct change is necessary. For example, an attribute Supplier belonging to the entity PART in view 1 is related to the entity DEALER from view 2. The relatedness is such that all suppliers are dealers but not all dealers are suppliers. In this case, the attribute Supplier will become an entity, which will be associated with PART in view 1 through a Supplies relationship. The attribute Supplier from view 1 is now more adequately represented through an entity. For a more detailed illustration of construct changes, compare section 4.3 on conflict therapy.

27. If an object O1 contains an object O2, the containment shall be represented by an Isa relationship. If the Isa relationship does not exist, it must be added. The contained object will possess all attributes of the containing object.

This rule on the E-R representation of containment is taken from Elmasri and Navathe (1984). For example, if EMPLOYEE contained PART_TIME_EMPLOYEE, the connection between the two would have to be represented by an Isa relationship, stating that PART_TIME_EMPLOYEE is an EMPLOYEE. PART_TIME_EMPLOYEE would inherit all attributes of EMPLOYEE.

28. If two objects O1 and O2 overlap, and neither object contains the other, the overlap shall be represented by an overlap object O3. If the overlap object does not exist, it must be added. The overlap object O3 will inherit the union of the attributes of O1 and O2. The connections O1-O3 and O2-O3 shall be represented by one Isa relationship each. If either of the Isa relationships does not exist, it must be added.

This rule states how the method handles relatedness of the form common subset (overlap). The following example will illustrate the rule:

View 1: PROJECT_EMPLOYEE[Emp#,Proj#,Yrs_experience,Title]
View 2: PRODUCT_EMPLOYEE[Emp#,Prodname,Function,Title]

The common subset PROJECT&PRODUCT_EMPLOYEE inherits the attributes Emp#, Proj#, Yrs_experience, Prodname, Function and Title, and contains all instances of employees contained in both PROJECT_EMPLOYEE and PRODUCT_EMPLOYEE (intersect). Furthermore, the following relationships are added:

PROJECT&PRODUCT_EMPLOYEE—Isa—PROJECT_EMPLOYEE
PROJECT&PRODUCT_EMPLOYEE—Isa—PRODUCT_EMPLOYEE

The creation of overlap objects is explained in detail in Yao et al. (1982).

29. If two objects O1 and O2 have a common superset, and neither object contains the other, the superset shall be represented by a superset object O3. If the superset object does not exist, it must be added. The superset object O3 will possess the intersect of the attributes of O1 and O2. If they are not identifier attributes, these attributes will have to be removed from O1 and O2. The connections O1-O3 and O2-O3 shall be represented by one Isa relationship each. If either of the Isa relationships does not exist, it must be added.
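The attribute bookkeeping of rules 28 and 29 can be sketched with plain Python lists, using the PROJECT_EMPLOYEE and PRODUCT_EMPLOYEE example from rule 28. The function names are hypothetical, and the identifier set `{"Emp#"}` is an assumption inferred from the text calling Title the non-key attribute.

```python
def overlap_object(attrs1, attrs2):
    """Rule 28: the overlap (common subset) object inherits the union of both attribute lists."""
    return list(attrs1) + [a for a in attrs2 if a not in attrs1]


def superset_object(attrs1, attrs2, identifiers):
    """Rule 29: the superset receives the shared attributes; shared attributes that are
    not identifiers are then removed from both subsets.

    Returns (superset attributes, new attrs1, new attrs2).
    """
    common = [a for a in attrs1 if a in attrs2]
    moved = {a for a in common if a not in identifiers}
    return (common,
            [a for a in attrs1 if a not in moved],
            [a for a in attrs2 if a not in moved])


project_employee = ["Emp#", "Proj#", "Yrs_experience", "Title"]
product_employee = ["Emp#", "Prodname", "Function", "Title"]

overlap = overlap_object(project_employee, product_employee)
employee, project_rest, product_rest = superset_object(
    project_employee, product_employee, identifiers={"Emp#"})

print("PROJECT&PRODUCT_EMPLOYEE", overlap)
print("EMPLOYEE", employee)
print("PROJECT_EMPLOYEE", project_rest)
print("PRODUCT_EMPLOYEE", product_rest)
```

The resulting lists agree with the attribute sets given for both examples in the text; the Isa relationships the two rules add are not modelled here.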
Rule 29 states how the method handles relatedness of the form common superset. The following example will illustrate the rule:

View 1: PROJECT_EMPLOYEE[Emp#,Proj#,Yrs_experience,Title]
View 2: PRODUCT_EMPLOYEE[Emp#,Prodname,Function,Title]

The common superset EMPLOYEE receives the attributes Emp# and Title. The non-key attribute Title is removed from PROJECT_EMPLOYEE and from PRODUCT_EMPLOYEE:

EMPLOYEE[Emp#,Title]
PROJECT_EMPLOYEE[Emp#,Proj#,Yrs_experience]
PRODUCT_EMPLOYEE[Emp#,Prodname,Function]

EMPLOYEE contains all instances of employees included in PROJECT_EMPLOYEE or in PRODUCT_EMPLOYEE (union). Furthermore, the following relationships are added:

PRODUCT_EMPLOYEE—Isa—EMPLOYEE
PROJECT_EMPLOYEE—Isa—EMPLOYEE

The creation of overlap objects and attribute relocation is explained, for instance, in Navathe et al. (1986).

30. If two objects exclude each other, the exclusion shall be represented through an integrity constraint.

No new objects are added in the case of an exclusion. However, an integrity constraint can be added to prevent the accidental insertion of an instance into both of the non-overlapping sets. For example,

View 1: FULLTIME_EMPLOYEE
View 2: PARTTIME_EMPLOYEE

describe two non-overlapping sets. An integrity constraint could be formulated to permit insertion of instances into either object only if, after the insertion, a join of both objects still returns the empty set. If the model (and the DBMS) can support integrity constraints, this restriction can improve data quality. The representation of exclusion by integrity constraints is suggested by Casanova and Vidal (1983) and Biskup and Convent (1986).

31. Containment is transitive. If A contains B and B contains C, then A contains C. The transitivity shall not be explicitly represented in any view. An Isa relationship between A and C is assumed to exist if an Isa relationship exists between A and B and between B and C.

This rule prevents the generation of new redundant Isa relationships in multi-level hierarchies. If, for example, the entities PERSON, EMPLOYEE, and PART_TIME_EMPLOYEE exist in a view, and EMPLOYEE—Isa—PERSON as well as PART_TIME_EMPLOYEE—Isa—EMPLOYEE has been expressed, there is no need to also express PART_TIME_EMPLOYEE—Isa—PERSON.

32. If an Isa relationship hierarchy implies another Isa relationship because of transitivity, the implied Isa relationship shall be removed.

This rule assures the removal of already existing redundant Isa relationships in multi-level hierarchies. If, for example, view 1 states that PART_TIME_EMPLOYEE—Isa—EMPLOYEE—Isa—PERSON, while view 2 expresses that PART_TIME_EMPLOYEE—Isa—PERSON, the transitive Isa in view 2 spans both Isa's in view 1 and is redundant. It has to be removed.

33. Creation of a new superset or subset object will result in relocation of relationships if these relationships were previously linked to entities at an incorrect level of generalization.
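Rules 31 and 32 together keep the Isa hierarchy transitively reduced. The sketch below, with hypothetical function names, stores Isa edges as (subset, superset) pairs, answers whether an Isa is implied by walking upward through the hierarchy (rule 31), and drops an edge whose superset is already reachable through another direct superset (rule 32). It assumes the hierarchy contains no cycles; the data is the PART_TIME_EMPLOYEE example of rule 32.

```python
def build_parents(isa):
    """Map each object to the set of its direct supersets; edges are (subset, superset)."""
    parents = {}
    for child, parent in isa:
        parents.setdefault(child, set()).add(parent)
    return parents


def ancestors(parents, node):
    """All supersets reachable from node through one or more Isa edges."""
    seen, stack = set(), list(parents.get(node, ()))
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(parents.get(p, ()))
    return seen


def isa_holds(isa, sub, sup):
    """Rule 31: an Isa relationship is assumed to exist if it follows by transitivity."""
    return sup in ancestors(build_parents(isa), sub)


def remove_redundant(isa):
    """Rule 32: drop an edge whose superset is reachable through another direct superset."""
    parents = build_parents(isa)
    return [(child, parent) for child, parent in sorted(set(isa))
            if not any(parent in ancestors(parents, other)
                       for other in parents[child] - {parent})]


view1 = [("PART_TIME_EMPLOYEE", "EMPLOYEE"), ("EMPLOYEE", "PERSON")]
view2 = [("PART_TIME_EMPLOYEE", "PERSON")]

print(isa_holds(view1, "PART_TIME_EMPLOYEE", "PERSON"))  # implied, never stored
print(remove_redundant(view1 + view2))                   # view 2's edge is dropped
```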
Whenever a new superset-subset relationship is introduced into a view, the possibility exists that existing relationships may have to be relocated. Consider the following example:

V1: DEPARTMENT—Employs—FULLTIME_EMPLOYEE
V2: FULLTIME_EMPLOYEE—Isa—EMPLOYEE

In V1, Employs refers to FULLTIME_EMPLOYEE, because no more general EMPLOYEE object exists. Once the new EMPLOYEE becomes part of V1, the Employs relationship will be relocated to associate DEPARTMENT with EMPLOYEE.

V1/V2: DEPARTMENT—Employs—EMPLOYEE—Isa—FULLTIME_EMPLOYEE

Process Rules:

34. In view integration, the test for identity (conflict recognition and reconciliation) shall precede the test for relatedness.

The test for identity and the test for relatedness are two independent phases of view integration. The test for identity detects or creates identical pairs of objects in the involved views so that finally for each object in view V1 exactly one identical object exists in view V2. The test for relatedness has the purpose to detect currently missing forms of relatedness (set relationships) between views. Its purpose is not to detect within-view relatedness. All occurrences of within-view relatedness are supposed to be already represented in the individual views (completeness assumption). An example may illustrate this fact. V1 has employees working in departments, V2 assigns employees to projects.

View 1: EMPLOYEE—Works_in—DEPARTMENT
View 1:  i n the  EMPLOYEE—Works_in—DEPARTMENT  View 2: EMPLOYEE—Assigned_to—PROJECT  118  The  completeness  assumption  relatedness exist within either explicitly  stated  knowledge).  postulates  that  no forms o f  o f t h e views, because none a r e  (no knowledge  i s interpreted  as n e g a t i v e  F o r example, i t i s known t h a t EMPLOYEE i s not a  subset o f DEPARTMENT.  Consequently, t h e s e a r c h f o r i n t e r - v i e w  r e l a t e d n e s s has t o focus o n l y on those o b j e c t s t h a t o r i g i n a l l y exist  i n one v i e w but not i n t h e o t h e r .  I.e., i f EMPLOYEE  were i d e n t i c a l t o EMPLOYEE, Works_in i d e n t i c a l t o A s s i g n e d _ t o , and DEPARTMENT view  relatedness  originally test  i d e n t i c a l t o PROJECT, then no u n d e t e c t e d i n t e r could  exist.  In o r d e r t o know which  views  e x i s t e d o n l y i n one view but not i n t h e other, the  for identity  has t o be c a r r i e d  out f i r s t .  Thus, the  sequence o f t h e two independent view comparisons, f o r i d e n t i t y and f o r r e l a t e d n e s s , i s determined by t h e f a c t t h a t a p r e v i o u s test  for identity  c a n r e d u c e t h e number o f comparisons f o r  relatedness.  Process R u l e s f o r C o n f l i c t R e c o g n i t i o n and R e c o n c i l i a t i o n :  35.  F o r each  object  01 from view V I , t r y t o f i n d an  i d e n t i c a l o b j e c t 02 i n view V2. 119  The purpose o f t h e method i s t o e i t h e r identical,  o r t o make  identical,  one o f them  information  i s represented  earlier,  two v i e w s  identical.  them  f i n d t h a t two views a r e  identical.  Once two views a r e  c a n be e l i m i n a t e d  because  i n t h e remaining view.  
are identical,  a l l  its  As d e f i n e d  i f a l l their  objects are  Hence, t h e t e s t f o r i d e n t i t y b e g i n s w i t h an attempt  t o f i n d an i d e n t i c a l o b j e c t 02 i n V2 f o r each o b j e c t 01 from V I .  36.  I f no i d e n t i c a l  o b j e c t 02 from V2 can be found f o r  01 from V I , t r y t o f i n d an o b j e c t t h a t has t h e same meaning as 01 and change t h e d i s s i m i l a r  dimensions  o f 01 and 02 so t h a t they become i d e n t i c a l .  Earlier,  complete  identity  o f o b j e c t s was d e f i n e d .  describes  the action  t o be t a k e n  partially  identical,  i f t h e y have  i f two o b j e c t s  o f change.  the  same meaning — r e f e r s  the  attribute  I f the e n t i t y  120  The  determines  world o b j e c t —  2, both o b j e c t s  the same name and t h e same c o n s t r u c t .  a r e only  SUPPLIER i n view 1 has  t o t h e same r e a l  Dealer_no i n view  rule  t h e same meaning.  meaning dimension as t h e most important dimension the d i r e c t i o n  This  finally  as  have  37.  I f no o b j e c t 02 w i t h same meaning can be found, add a new o b j e c t 02 t o V2 where 02 i s i d e n t i c a l t o O l from VI.  I f no o b j e c t 02 w i t h same meaning as 01's can be found, then 01 has no c o r r e s p o n d i n g o b j e c t i n V2. Hence an o b j e c t i d e n t i c a l t o 01 has t o be added t o V2.  38.  F o r each  object  02 from V2 which  i s different i n  meaning t o 01 from VI but has t h e same name, change the name so t h a t no two o b j e c t s w i t h d i f f e r e n t meaning c a r r y t h e same name.  This  rule  forbids the existence  o f homonyms i n t h e database.  I f a homonym i s found, a name change i s r e q u i r e d based on t h i s rule. If  Again, name f o l l o w s the more important dimension meaning.  meanings  are different,  names have t o be d i f f e r e n t . 
The other dimensions, construct and context, can remain as they are.

39. For each O2 in V2 that remains without an identical object from V1, after all objects in V1 have been matched with an identical object in V2, add a new object O1 to V1 which is identical to O2.

View V2 may contain objects that are not part of V1. Hence, after all of V1's objects have been assigned an identical object in V2, some of the objects in V2 may be left without an identical object in V1. Consequently, these objects have to be added to V1.

Process Rules for the Recognition and Modelling of Inter-Schema Relationships:

40. Compare each object O1 from V1 which was originally unique to V1 (before the addition of missing objects during the identity test) against all objects {O2} formerly unique to V2, to find out whether O1 contains O2, or O2 contains O1. Represent each identified containment.

The purpose of the analysis is only the addition of missing inter-view superset-subset relationships. Therefore, the containment test applies only to objects that were originally unique to one of the two views. For example:

View 1: PART—Last_ordered_from—SUPPLIER
View 2: PART—Carried_by—DEALER

Here PART is the same in both views and therefore is not unique. Hence, only Last_ordered_from, SUPPLIER, Carried_by, and DEALER are potentially related to objects from the other view. I.e., DEALER could be related to Last_ordered_from or to SUPPLIER, and Last_ordered_from could be related to DEALER or to Carried_by. If, for instance, all SUPPLIERS are DEALERS but not all DEALERS are SUPPLIERS, then DEALER contains SUPPLIER.
Consequently, an Isa relationship between SUPPLIER and DEALER would have to be created.

The comparison summarized in this rule is the first test for relatedness, because it is the most special case of relatedness and requires the least change in the existing views. The comparison A contains B is a special case of common containment (A contains A and A contains B), as well as a special case of common subset (B is a subset of A and B is a subset of B). In this special case, only an Isa relationship is added to the views. In the general case, the common superset and the common subset have to be added too. Thus, if this test is the first one, the subsequent steps are simplified.

41. For all pairs of originally unique objects O1, O2 in which neither object contains the other, investigate whether O1 and O2 are contained by a common object different from O1 and O2. Represent the common containment.

This rule summarizes the procedure for a common containment where the containing object is different from O1 and O2. Only those objects are compared that were originally represented in one view only. All object pairs in which one object contains the other are not considered.

42. For all pairs of originally unique objects O1, O2 in which neither object contains the other and which have a common superset, also investigate whether O1 and O2 intersect. Represent any existing common subsets. Represent the lack of a common subset through an integrity constraint.

This rule summarizes the procedure for a common subset where the intersect object is different from O1 and O2. Only those objects are compared that were originally represented in one view only. Also, only objects that have a common superset (different from O1 and O2) are compared. Objects without a meaningful common superset cannot have a meaningful common subset.

43. For all object pairs O1, O2 originally unique to one view, investigate the existence of a W-relationship (common role). Represent any existing W-relationships.

Even though the test for relatedness may not find any related objects, the objects themselves can have a common role, which requires the addition of objects to represent the common role and the special roles. I.e., a company and a person can both be car owners. Even though company and person are not related (i.e., have no meaningful common superset in the database), their common role car owner requires representation, as do their special roles person-car-owner and company-car-owner.

Heuristics

Heuristics are rules that are generally true, but not true in all cases. The use of these rules during the view integration process will simplify the process for the user in cases where the heuristics are true, and will slightly inconvenience or prolong the process when a heuristic fails. The incorrect use of heuristics will not result in an incorrect database design, but it may prolong the database design process.
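The claim that a failing heuristic prolongs but does not corrupt the design holds, for instance, if a heuristic only decides the order in which candidates are shown to the user rather than discarding any. One way to obtain that property is sketched below; this ordering strategy is an assumption for illustration, not necessarily how the method applies heuristics, and the hypothetical `user_confirms` callback stands in for the designer's judgment. The example reuses the SUPPLIER/Dealer_no pair from rule 36, where a same-construct heuristic fails.

```python
def ordered_candidates(o1, candidates, heuristic):
    """Show candidates the heuristic favours first; nothing is discarded."""
    favoured = [c for c in candidates if heuristic(o1, c)]
    others = [c for c in candidates if not heuristic(o1, c)]
    return favoured + others


def find_match(o1, candidates, heuristic, user_confirms):
    """Return (confirmed candidate or None, number of candidates shown to the user)."""
    for shown, c in enumerate(ordered_candidates(o1, candidates, heuristic), start=1):
        if user_confirms(o1, c):
            return c, shown
    return None, len(candidates)


def same_construct(a, b):
    """Hypothetical heuristic on (name, construct) pairs."""
    return a[1] == b[1]


o1 = ("SUPPLIER", "entity")
view2 = [("Dealer_no", "attribute"), ("PART", "entity"), ("ORDER", "entity")]

# The designer judges that Dealer_no means the same as SUPPLIER, so the heuristic fails
print(find_match(o1, view2, same_construct, lambda a, b: b[0] == "Dealer_no"))
```

When the heuristic fails, the match is still found, only after both favoured entities have been shown; when it holds, the user reaches the match sooner.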
Heuristics improve the integration process by helping the user to find objects with similar or related meaning. If object O1 is compared to a set of objects {O2} from view 2 and that set is large and diverse (a large number of objects including entities, relationships and attributes), the selection problem may be difficult for the user. If the set {O2} is small, the selection problem becomes simple or even trivial. Heuristics help to simplify the selection problem by including in the set only those objects that are likely to be identical or related to the object O1.

The list below shows only some heuristics; it cannot be complete. It is always possible to formulate further assumptions to simplify the search procedure. Furthermore, some of the heuristics shown may be too stringent for a particular design, while others may be too loose. Heuristics that are too stringent are a particular problem, since they can result in decision errors which require lengthy recovery procedures. This problem is exemplified in the next section, which shows two alternative view integration procedures: one without any heuristics, and one with only one heuristic implemented.

The following heuristics have been identified:

1. Two objects with identical or related meaning will have some common context.

This rule says that identical or related objects will be found in the vicinity of identical objects.
For example, suppose it has been found that there exists an entity EMPLOYEE in views V1 and V2, and that EMPLOYEE in V1 participates in a relationship Employment. It is then reasonable to assume that EMPLOYEE will participate in a similar association in V2, even though that association may not be called Employment in V2 and even though it may not be a relationship.

The heuristic is based on the assumption that people describing the same environment will have the same perception of the environment. Since both views have common elements, both views describe at least partially the same real world environment. In the absence of information to the contrary, the method therefore assumes that all users regard the same real world objects and associations as relevant. In the example, the heuristic fails if the Employment association is not relevant in V2 and is therefore missing. Note, however, that the association may not be missing but merely more difficult to find, if in V2 it is represented not as a relationship but as an entity attribute or as an entity.

Even though entities are defined to have no context, it is useful to treat the relationships they are involved in as their context, to permit the application of this valuable heuristic.

2. Two objects with identical or related meaning will have the same construct.

This rule states that even before conflict resolution, two objects with identical or related meaning will be of the same object type. Thus, the rule leads the integration method to look for a matching object only among those with the same construct.
If EMPLOYEE is an entity in V1, the matching object in V2 will also be an entity.

This heuristic is based on the assumption that if two people describe the same object or association from the real world, they will agree in their assessment of the construct that the object or association should be represented with. Depending on the real world item, this assumption is more or less reasonable. One would assume that almost anyone considers an employee or a customer to be an individual. A customer's order, however, may be perceived either as a thing (entity) or as an association (relationship) between a customer and a company.

The heuristic fails in all cases of construct mismatch (semantic relativism), i.e., where one real world object is represented as an entity in one view and as a relationship in the other view. For cases in which the rule fails, the integration procedure has to backtrack and look at objects with different constructs to find a match.

3. If no two objects with identical or related meaning and identical construct can be found, the construct mismatch will be of the following type:
- If O1 is an entity or a relationship, then O2 will be an entity attribute.

This heuristic suggests which construct mismatch to investigate first. Storey (1988) found that a very common error in database design was the representation of an entity-relationship construct as an interconnection attribute.
Since this "mistake" is made very frequently, it is useful to check for it when an identical object was not found. In combination with the common context heuristic, this heuristic is expected to reduce the set {O2} to a manageable size.

Some attributes can under no circumstance be interconnection attributes, while others are more likely to be interconnection attributes. Two supporting rules help in identifying these groups:
- A single-attribute object key cannot be an interconnection attribute. For example, Employee# is the single-attribute key of an employee. It does not represent a relationship between EMPLOYEE and some other object.
- Attributes in a multi-attribute object key (composite key) are assumed to be interconnection attributes. In contrast to Employee#, the key Customer#+Product# of an ORDER entity identifies links to two other objects: a customer object and a product object. Both attributes are potential interconnection attributes.

Since forms of mismatch other than the interconnection attribute exist, this heuristic can fail. To recover from the failure, the system will then search according to the following rules:
- If O1 is an entity and O2 is not an entity attribute, then O2 will be a relationship.
- If O1 is a relationship and O2 is not an entity attribute, then O2 will be an entity.

Aside from the interconnection attribute assumption, these are the only other alternatives for a construct mismatch. However, any of these rules may also fail if an object is missing.
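Heuristics 1 to 3 act as successive filters on the candidate set {O2}. The Python sketch below shows one possible encoding. The `SchemaObject` record, the `matches` mapping of already identified objects and all function names are assumptions made for illustration. They are not the data structures of AVIS.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SchemaObject:
    name: str
    construct: str                    # "entity", "relationship" or "entity attribute"
    context: frozenset = frozenset()  # names of associated objects in its own view

def shares_context(o1, candidates, matches):
    """Heuristic 1: keep candidates associated with the V2 counterpart of at
    least one of O1's context objects (`matches` maps V1 names to V2 names)."""
    mapped = {matches[c] for c in o1.context if c in matches}
    return [o2 for o2 in candidates if o2.context & mapped]

def same_construct(o1, candidates):
    """Heuristic 2: keep only candidates with the same construct as O1."""
    return [o2 for o2 in candidates if o2.construct == o1.construct]

def mismatch_order(o1):
    """Heuristic 3 plus its recovery rules (entities and relationships only):
    constructs to try for O2 when no same-construct match was found."""
    fallback = {"entity": "relationship", "relationship": "entity"}
    return ["entity attribute", fallback[o1.construct]]

def interconnection_likelihood(attr, owner_key):
    """Supporting rules for heuristic 3, given the key of the attribute's entity."""
    if tuple(owner_key) == (attr,):
        return "impossible"   # single-attribute key, e.g. Employee#
    if len(owner_key) > 1 and attr in owner_key:
        return "likely"       # part of a composite key, e.g. Customer#+Product#
    return "unknown"          # the supporting rules say nothing about this case

employment = SchemaObject("Employment", "relationship", frozenset({"EMPLOYEE", "COMPANY"}))
v2 = [SchemaObject("WorksFor", "relationship", frozenset({"EMPLOYEE", "FIRM"})),
      SchemaObject("Supplies", "relationship", frozenset({"SUPPLIER", "PART"})),
      SchemaObject("Employer", "entity attribute", frozenset({"EMPLOYEE"}))]
near = shares_context(employment, v2, {"EMPLOYEE": "EMPLOYEE"})
```

In this example, heuristic 1 keeps WorksFor and Employer because both are attached to EMPLOYEE. Heuristic 2 then discards the attribute Employer, even though it might be the V2 representation of the Employment association. The fallback order of heuristic 3, which tries the entity attribute first, is meant to recover exactly this case.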
4. Objects with identical meaning will have identical names (a name in the singular is considered identical to its plural).

This heuristic assumes that a particular application uses a standardized language to label its objects. In the absence of information to the contrary, members of the same organization are expected to use identical terms to label the same objects. For instance, terms such as "department", "job classification" or "account" are expected to be used consistently. If this were always true, synonyms and homonyms would not exist. Hence, this assumption is expected to have very limited reliability. Nevertheless, it provides a good starting point in the search for matching pairs of objects at the outset of the integration procedure.

When this heuristic is applied, two objects are treated as having the same name even if one is in singular form while the other is in the plural (e.g., employee vs. employees). If the heuristic fails, the search for a matching object has to continue among all objects with different names.

5. Objects with related meaning will have names with identical word stems.

In the search for related objects, the word stem can be a very strong filter to identify those objects that are unrelated. For example, FULLTIME_EMPLOYEE and EMPLOYEE have the same stem employee, and GRADUATE_STUDENT and UNDERGRADUATE_STUDENT have the same stem student. Thus, they are likely to be related.
An even stronger interpretation of the word stem phenomenon may conclude that if one object's name is the word stem, that object will be the superset of the other object, while two objects with different prefixes have a common superset.

Again, since synonyms and homonyms are frequent, this rule will be of only limited use. Nevertheless, in a computerized procedure it requires no user effort and is therefore a desirable feature, even if its benefits may be marginal.

6. Two objects with identical or related meaning will have some attributes with identical names (for entities and relationships only).

Especially in the search for identical objects, this rule can be used to eliminate those objects that are very unlikely candidates for identity. Two different views describing the same EMPLOYEE entity are expected to use at least some identical attributes to specify employee properties. In particular, identical or related objects are assumed to have the same key attributes (with the same key attribute names).

Obviously, synonymy is a problem in this context: attributes may be identical, but their names may not be.
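Heuristics 4 to 6 only compare names, so they can be automated without user effort. The sketch below is a deliberately crude, hypothetical Python implementation. Plural handling simply drops a trailing "s", and the word stem is assumed to be the last underscore-separated component of a name, as in FULLTIME_EMPLOYEE.

```python
def normalize(name: str) -> str:
    """Heuristic 4 (crude assumption): case-insensitive, and a trailing plural
    's' is ignored, so EMPLOYEES matches employee. Words such as STATUS are
    mangled by this rule; a real system would need a dictionary."""
    n = name.lower()
    if n.endswith("s") and not n.endswith("ss"):
        n = n[:-1]
    return n

def same_name(a: str, b: str) -> bool:
    return normalize(a) == normalize(b)

def word_stem(name: str) -> str:
    """Heuristic 5: assume the stem is the last '_'-separated component."""
    return normalize(name.split("_")[-1])

def stem_relation(a: str, b: str):
    """Stronger reading of heuristic 5: a bare stem is the superset of a prefixed
    name; two differently prefixed names have a common superset."""
    sa, sb = word_stem(a), word_stem(b)
    if sa != sb:
        return None                 # probably unrelated
    a_bare, b_bare = normalize(a) == sa, normalize(b) == sb
    if a_bare and b_bare:
        return "SAME_NAME"
    if a_bare:
        return "A_SUPERSET"
    if b_bare:
        return "B_SUPERSET"
    return "COMMON_SUPERSET"

def shares_attribute_names(attrs1, attrs2) -> bool:
    """Heuristic 6: identical or related entities/relationships share some
    attribute names (synonymous attribute names defeat this test)."""
    return bool({normalize(a) for a in attrs1} & {normalize(a) for a in attrs2})
```

With these definitions, EMPLOYEE is proposed as the superset of FULLTIME_EMPLOYEE. GRADUATE_STUDENT and UNDERGRADUATE_STUDENT are proposed to have a common superset. These are only suggestions, to be confirmed by the user.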
7. Objects with identical or related meaning will belong to the same pre-defined meaning category.

In a subsequent section, a hierarchy of object categories will be introduced which provides a structure for the categorization of database objects according to their meaning, e.g., as an "animate object". If each object's meaning is pre-defined in terms of the category it belongs to, then two objects from different categories cannot be identical. Again, this heuristic provides a filter to eliminate non-identical objects.

4.2. Diagnosis Procedure

The conflict and omission recognition procedure consists of two parts: the test for identity of objects (object types), and the test for relatedness of objects. The test for identity is concerned with the identification of identical objects in the observed views. The test for relatedness is concerned with the detection of inter-view set relationships (object relatedness).

Even though an object from one view can have at most one corresponding object in any other view, more than one object of another view can be related to it. Relatedness means that there exists a set relationship between the objects. The question of identity and the question of relatedness have to be approached independently. It is impossible to conclude the relatedness or non-relatedness of objects from the existence of a pair of identical objects, or vice versa.

The question of identity has to be answered first. Inter-view relationships can only exist between objects that are originally unique to one view. To restrict the test for relatedness to these objects, it has to be known which objects have no corresponding object in the other view. To find this out, the test for object identity has to be performed first.[1]
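The ordering argument can be summarized in a short Python sketch: run the identity test first, then apply the relatedness test only to objects left unmatched. The callbacks `find_identical` and `find_related` are hypothetical stand-ins for the user's answers to the identity and relatedness questions.

```python
def diagnose(v1, v2, find_identical, find_related):
    """Identity test first; relatedness only for objects left unmatched.
    find_identical(o1, candidates) -> matching V2 object or None
    find_related(o1, o2)           -> description of a set relationship or None"""
    identical_pairs = []
    unmatched_v2 = list(v2)
    unique_v1 = []
    for o1 in v1:
        o2 = find_identical(o1, unmatched_v2)
        if o2 is None:
            unique_v1.append(o1)
        else:
            identical_pairs.append((o1, o2))
            unmatched_v2.remove(o2)      # at most one counterpart per object
    # Only originally unique objects can take part in inter-view set relationships.
    related = []
    for o1 in unique_v1:
        for o2 in unmatched_v2:
            r = find_related(o1, o2)
            if r is not None:
                related.append((o1, o2, r))
    return identical_pairs, related
```

Restricting the nested relatedness loop to unmatched objects is what keeps the number of relatedness questions put to the user small.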
Test for Identity of Objects

The purpose of this test is to answer the question "does there exist an object O2 in V2 which is identical to O1 from V1?" For example, if view 1 contains an entity SUPPLIER, does view 2 also contain an entity with the same name and the same meaning? Again, "same meaning" can be interpreted as "both objects refer to the same object in the real world". Obviously, finding a perfect match will be the exception. It is more likely that objects will be found that are somewhat similar, but not identical. In such cases, adjustments have to be made. The general rule is to make objects completely identical if they refer to the same real world objects (have the same meaning). In such cases, possible mismatches in name, construct or context will be adjusted. If objects refer to different real world objects, then a possible, but undesirable, match in their names (homonym) has to be corrected.

The test for identity is carried out incrementally, comparing the involved objects along one dimension at a time. All tests compare one object from view 1 to a set of objects from view 2, to find the ones that fulfill the condition of the test.

[1] The test procedures will frequently mention therapy procedures to resolve conflicts or to reflect inter-view relationships, without going into much detail. Detailed solution descriptions will be given in the subsequent section.
Objects are identical if they are identical in all four dimensions. Since the meaning dimension is the most important one (the other dimensions are adjusted accordingly), it is a good starting point for the analysis. The main problem with this approach is that an object O1 from view V1 is compared to all objects O2 from V2, independent of their name, construct or context, even though only one object from V2 can be identical to O1. This may require the user to check a long list of irrelevant objects. The heuristics introduced in the previous section can be used to alleviate the problem. Therefore, a second procedure will be shown which includes the heuristic "objects with identical meaning will have identical constructs", to exemplify the effect of heuristics. This second procedure begins with a search for objects with constructs identical to that of O1.

While it is important to begin with the meaning dimension in the first procedure, the analysis sequence for the other dimensions may vary. The order chosen here is: construct, context, name. Construct analysis has to precede context analysis, because every test for identity may result in a change in the tested dimension. For example, a test for identity of construct will cause a construct change if the constructs are not identical. But a construct change will also result in a context change, whereas context changes do not affect the construct. Thus, no test for identity of context should be executed until the constructs have become identical. Name identity analysis should follow construct analysis, because the user may decide to give objects different names based on their construct.
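The dimension sequence can be illustrated with a small Python sketch. Each object carries the four dimensions. The `meaning` field is a hypothetical label standing in for the user's judgement of the real-world referent. Contexts are assumed to be already expressed in terms of matched objects. The function reports which dimensions must be adjusted, in the order construct, context, name.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ViewObject:
    name: str
    construct: str       # entity, relationship or attribute
    context: frozenset   # associated objects, already mapped to a common vocabulary
    meaning: str         # hypothetical label for the real-world referent

ADJUST_SEQUENCE = ("construct", "context", "name")   # applied after the meaning test

def required_adjustments(o1, o2):
    """None if O1 and O2 differ in meaning (not the same object); otherwise
    the dimensions to adjust, in the order construct, context, name."""
    if o1.meaning != o2.meaning:
        return None
    return [dim for dim in ADJUST_SEQUENCE if getattr(o1, dim) != getattr(o2, dim)]

supplier = ViewObject("SUPPLIER", "entity", frozenset(), "current suppliers")
manufacturer = ViewObject("MANUFACTURER", "entity", frozenset(), "current suppliers")
order_e = ViewObject("ORDER", "entity", frozenset(), "customer order")
order_r = ViewObject("ORDER", "relationship", frozenset({"CUSTOMER", "COMPANY"}), "customer order")
```

Here SUPPLIER and MANUFACTURER need only a name adjustment. The two ORDER objects need a construct change, which brings a context change with it. This illustrates why construct analysis precedes context analysis.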
Name i d e n t i t y  different  names, w h i c h  The complete procedure  illustrate  be e x e c u t e d  until  analysis  a n a l y s i s , because t h e u s e r may  form i n F i g u r e 6 (with a b b r e v i a t e d  To  should  decide  a r e based on t h e i r  i s depicted  i n flowchart  notation).  t h e whole procedure w i t h  an example, i t w i l l be  assumed t h a t an o b j e c t 01 from view VI i s s e l e c t e d a t random, i . e . , t h e e n t i t y type SUPPLIER which denotes t h e s e t o f c u r r e n t suppliers  o f a company.  With  this  object  held  f i x e d , the  f o l l o w i n g t e s t s a r e c a r r i e d out:  The  procedure begins w i t h  identical  meaning t o 01.  generates t h e h y p o t h e s i s  the goal To f i n d  to find  an o b j e c t 02 with  the object,  HI "there e x i s t s an o b j e c t 02 from V2  such t h a t 02 i s i d e n t i c a l i n meaning t o 01". the VI  user,  i t r e s u l t s i n the question  i s identical  i n meaning t o 01?"  describe  D i r e c t e d towards  "which o b j e c t  from view  The use can then e i t h e r  i d e n t i f y an o b j e c t , o r r e p l y with a "none". V2 may c o n t a i n  t h e procedure  F o r example, view  an e n t i t y MANUFACTURER which i s used i n V2 t o  a l l suppliers.  system s t a t e s'=sl  I f a matching o b j e c t  i s reached.  I f not,  s'=s5. test  i s found, t h e In contrast t o  the  s u b s e q u e n t hypotheses H2-H4, t h i s  compares 01 t o a  set  {02} from view V2 r a t h e r than t o a s i n g l e o b j e c t .  c o n t a i n s a l l o b j e c t s from V2 which so f a r have not been  137  {02}  s-sO  S'"S4  Pick next object 01  F i g u r e 6:  Test  for  Object  Identity,  Heuristics  138  Procedure  without  matched up w i t h  an o b j e c t from VI.  one  of t h e s e o b j e c t s w i l l  the  remaining n-1  from  {02}  will  s5.  
Thus, i n the  objects  in  If  a matching  hypothesis  H2  construct, the  the  i n s t a t e s5, In o t h e r  i n Figure  outcome o f  object w i l l  object  01  while  or a l l o b j e c t s  i s found,  and  02  will  the  be  the  path,  continues  02  will  entities.  The  have the  "no"  path.  method  and  state  f o r most i f not a l l  f o l l o w the "yes"  i . e . , t h a t both are "do  i n VI,  words, f o r most, i f  6,  HI  which s t a t e s t h a t 01  question,  either  the r e s u l t of t h i s t e s t w i l l be  flowchart  {02},  w h i l e a t most one  be  i n s t a t e s5.  not a l l o b j e c t s from V2,  a r e s u l t of HI,  f i n d a matching o b j e c t  objects w i l l  be  As  with  have the method  same  issues  same c o n s t r u c t ? "  In  a  computerized view i n t e g r a t i o n system, the i n t e g r a t i o n procedure will  l o o k up  the  information  view d e f i n i t i o n s . (s'=s6),  a  constructs  Should both o b j e c t s have d i f f e r e n t  construct are  change would  identical,  example, SUPPLIER and  state  have t o  s'=s2  constructs.  Subsequent  s2,  the  system  checks  answer t o t h i s  context  i s an  question  i s reached.  for identical  empty s e t .  i s always p o s i t i v e ,  I f 01  and  a t t r i b u t e s and not a l l t h e i r context  139  02  are  the  constructs  occur.  Are O l and 02 a s s o c i a t e d with i d e n t i c a l o b j e c t s ? the  from  If  the  In  the  MANUFACTURER are both e n t i t i e s and  have i d e n t i c a l  to  t o answer t h i s q u e s t i o n  thus  context.  For e n t i t i e s , since  their  r e l a t i o n s h i p s or  o b j e c t s have been matched  to  o b j e c t s i n t h e o t h e r view y e t , then t h e i d e n t i t y  01  and 02 i s suspended, u n t i l t h e c o n t e x t o b j e c t s a r e matched  to  o b j e c t s i n t h e o t h e r view.  
If the result of the context test is that O1 and O2 have different contexts (s'=s7), the contexts have to be made identical (s'=s3). In the example, both objects are entities. Thus, both have identical (empty) contexts.

If state s3 has been reached, the remaining test is the test for name identity of the objects (H4). The method's hypothesis is that both objects have identical names. If they do not share the same name (s'=s8), their names are made identical (s'=s4) by changing at least one of the names. The new name will have to be different from the names of all other objects in V1 and V2 to avoid homonymy. In the example, at least one of the entities would require a name change. The name chosen should not be identical to the name of any other object.

Once the pair of objects is identical in all four dimensions, the identity test is completed for this pair. The method continues by selecting a new object O1 from view V1 and subjecting it to the same analysis. The procedure terminates when all objects have a matching object in the other view.

The set of all objects {O2} from V2 that, as a result of H1, are known to be different in meaning from O1 (s'=s5) is subject to further analysis. H5 tests whether all of these objects have names different from O1's name. All objects with the same name (s10) require renaming to make their names unique (s9). In addition, if none of the objects in {O2} was identical in meaning to O1, a new object O2, completely identical to O1, has to be added to achieve state s4.
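The procedure without heuristics can be condensed into a few lines for one fixed O1. In this hypothetical Python sketch, the callback `same_meaning` stands in for the user's answer to H1. Contexts are assumed to be already expressed in terms of matched objects, so the suspension of the context test is not modelled. The returned state labels follow the text.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Obj:
    name: str
    construct: str
    context: frozenset = frozenset()   # assumed already mapped to matched objects

def identity_test(o1, v2_unmatched, same_meaning):
    """Procedure without heuristics for a fixed O1; returns (state, action) steps."""
    steps = []
    o2 = same_meaning(o1, v2_unmatched)              # H1 against the whole set {O2}
    for o in v2_unmatched:                           # objects left in state s5
        if o is not o2 and o.name == o1.name:        # H5: homonym of O1
            steps.append(("s10", f"rename {o.name} in V2 to a unique name"))
    if o2 is None:                                   # no object with O1's meaning
        steps.append(("s4", f"add an object identical to {o1.name} to V2"))
        return steps
    if o2.construct != o1.construct:                 # H2
        steps.append(("s6", f"change construct of {o2.name}"))
    if o2.context != o1.context:                     # H3
        steps.append(("s7", f"make context of {o2.name} identical"))
    if o2.name != o1.name:                           # H4
        steps.append(("s8", f"give {o1.name} and {o2.name} a common new name"))
    steps.append(("s4", "pair identical in all four dimensions"))
    return steps
```

Because H1 is asked against the whole unmatched set, the user must scan every remaining V2 object for each O1. This is the cost that the heuristic variant tries to reduce.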
In  {02} was i d e n t i c a l i n meaning  t o 01, a new o b j e c t 02, completely added t o achieve  same names  i d e n t i c a l t o 01, has t o be  t h e s t a t e s4.  The use o f h e u r i s t i c s r e s u l t s i n changes t o t h e view i n t e g r a t i o n procedure. discussed with  To e x e m p l i f y below  that  includes  i d e n t i c a l meaning w i l l  heuristic  such changes, a procedure w i l l only  one h e u r i s t i c :  be  "objects  have i d e n t i c a l c o n s t r u c t s . "  This  i s i n f a c t one o f t h e h e u r i s t i c s implemented i n the  view i n t e g r a t i o n program AVIS.  Again, t h e procedure b e g i n s by  p i c k i n g one o b j e c t 01 from view V I .  I t again w i l l  attempt t o  f i n d an o b j e c t i n view V2 t h a t i s i d e n t i c a l t o O l .  The  procedure  set  of objects  o b j e c t 01". same meaning consider  (see F i g u r e  the goal  " f i n d the  {02} from V2 t h a t have t h e same c o n s t r u c t as  S i n c e t h e procedure assumes t h a t a l l o b j e c t s have  those  t h e same c o n s t r u c t ,  objects  have t h e same c o n s t r u c t will  7) begins w i t h  i t decides  02 f o r f u r t h e r i d e n t i t y as 01.  141  t o only  t e s t i n g that  A number o f o b j e c t s  q u a l i f y and t h u s be i n s t a t e sO, w h i l e  with  from V2  the objects of  different  type w i l l  be i n s t a t e  s5.  Since  i n t h e example  SUPPLIER i s an e n t i t y , a l l e n t i t i e s from V2 would be c o n s i d e r e d for  further identity  testing.  use  o f c o n s t r u c t as a " f i l t e r "  o b j e c t s t o be c o n s i d e r e d ,  One may want  to think  o f the  which can reduce t h e number o f  h o p e f u l l y without b e i n g t o o s t r i n g e n t  a condition.  For  those  objects  with  same c o n s t r u c t ,  i n v e s t i g a t e s whether t h e r e  e x i s t s an o b j e c t  same meaning as 01 from V I . 
That is, it is looking for an entity in V2 identical in meaning to SUPPLIER. Again, at most one object of V2 is allowed to fulfill this condition. That object will be in state s1. All objects with different meaning will be in state s6. If an object with the same meaning is found, the procedure continues to verify the context (H3) and name (H4), similar to the tests above. However, if no object in V2 is found that has the same meaning as O1, the procedure continues differently. There are two possible interpretations of the situation. The first possibility is that the heuristic is wrong, so that an object O2 with the same meaning but a different construct exists in V2. The second possibility is that no object with identical meaning exists in V2, regardless of construct. The procedure has to find out which alternative is true, to avoid the creation of a non-minimal global schema.

[Figure 7: Test for Identity with Heuristic]

Thus, after taking care of homonyms (H5), the procedure continues with a test to identify those objects whose constructs differ from O1's construct. In the figure, this test is shown in abbreviated notation as c2<>c1. Its correct interpretation is "are there any objects in V2 that have a different construct?"
This question may appear redundant for the objects in s5, because they failed the "same construct" test. However, the set of objects in state s5 may be the empty set. In that case, the answer to question H6 is "no" (s13), requiring the addition of a new object.

If there are objects in V2 with constructs different from O1's, the procedure checks whether any of them have the same meaning as O1 (H7). If an object with the same meaning is found (s11), its construct has to be changed. If no such object exists (s14), a test for homonymy follows (H8), resulting in a name change for all homonyms. Subsequently, the missing object is added.

In this procedure variant, the main effect is a change in the sequence of the tests for meaning identity and construct identity. If the heuristic is wrong, the procedure is prolonged.

The procedure could be varied further, for instance by switching the sequence of the meaning identity and context identity tests. The test for meaning identity would then follow the tests for construct identity and context identity. Consequently, only those objects with the same construct and the same context would initially be considered for the meaning identity test. This procedure change would reflect the heuristic "identical objects are in the vicinity of identical objects." The procedure would look in the neighborhood of matching objects to find further matching objects. This heuristic is also implemented in AVIS, in modified form: AVIS requires only part of the context to be identical.
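The Figure 7 variant can be sketched in Python as follows. `same_meaning` is again a hypothetical stand-in for the user's answer, and the homonym handling (H5, H8) is reduced to an action label. The sketch shows how the construct filter comes first and how the procedure backtracks to objects with different constructs when the heuristic fails.

```python
from collections import namedtuple

Obj = namedtuple("Obj", "name construct")

def identity_with_construct_heuristic(o1, v2, same_meaning):
    """Figure 7 variant for a fixed O1; returns (state, O2 or None, actions)."""
    same = [o for o in v2 if o.construct == o1.construct]       # s0; the rest are s5
    o2 = same_meaning(o1, same)
    if o2 is not None:
        return ("s1", o2, [])               # continue with context (H3) and name (H4)
    different = [o for o in v2 if o.construct != o1.construct]  # test c2<>c1 (H6)
    if not different:
        return ("s13", None, ["add new object"])
    o2 = same_meaning(o1, different)        # H7: was the heuristic wrong?
    if o2 is not None:
        return ("s11", o2, ["change construct"])
    return ("s14", None, ["rename homonyms (H8)", "add new object"])
```

When the heuristic holds, the user only inspects objects with the same construct. When it fails, as for an ORDER entity whose counterpart in V2 is a relationship, the user is asked a second time. This is the prolongation described above.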
The test for meaning identity could even be moved past the test for name identity, to reflect the heuristic that objects with the same meaning will have the same names. Since this heuristic is expected to be frequently wrong, it has not been implemented in AVIS.

Test for Relatedness of Objects

The purpose of this test is to find out whether, aside from identity, objects from one view are related to objects from another view through set relationships. For example, an entity (type) SUPPLIER in V1 may be a subset of an entity DEALER in V2. Such a case would exist in a situation where SUPPLIER referred to all current suppliers of the company, while the entity DEALER referred to all present and all potential suppliers of the company. If those relationships are not made explicit, anomalies can occur: if a member is dropped from the entity set DEALER, it should also be automatically dropped from the entity set SUPPLIER. Furthermore, attribute inheritance can be derived from set relationships.

The procedure described below is a generic procedure without the use of heuristics (see Figure 8). It begins with a test for containment (H1 and H2). The subject of the test is whether one of the objects is contained by the other, i.e., whether SUPPLIER is contained by DEALER. The procedure first determines the set {O2} of objects contained by O1, and then, among the objects not contained by O1, the set {O2'} of objects containing O1.
The question is put to the user as "which of the objects (in V2) are contained by O1?" and, vice versa, "which of the objects (in V2) contain O1?" It is possible that O1 contains some objects in V2 while itself being contained by others. For example, SUPPLIER (V1) is contained by DEALER (V2) but may contain another object SMALL_QTY_SUPPLIER from V2. In such a situation, an Isa relationship between SMALL_QTY_SUPPLIER and DEALER would have existed, and it would now have to be removed because it is a transitive Isa.

[Figure 8: flowchart not recoverable from the extraction; legible node labels include "Change construct" and "Represent relationship"]

The containment test is issued first because it is the most specialized form of common containment and common superset, requiring the least amount of additions to the existing views. Only one Isa relationship has to be established between the objects. The insertion of an Isa between the objects requires, however, that both objects are entities. If they are not, those that are not entities have to be converted into entities. The test H6.1 is executed to determine whether both objects are entities.

The entity test (H6) is issued for each pair of objects after their relatedness has been discovered. There is no need to test for object type earlier, since only related objects that are not entities will require construct changes. Unrelated objects will keep their original constructs. Since the object type test (H6) is identical for all forms of relatedness (H6.1-H6.4), it will not be discussed further in the procedure.
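Removing transitive Isa relationships, as in the SMALL_QTY_SUPPLIER example, amounts to a transitive reduction of the Isa graph. The following Python sketch assumes that the Isa relationships are given as (subset, superset) pairs forming an acyclic graph. The function name is hypothetical.

```python
def remove_transitive_isa(isa):
    """Keep only Isa pairs (sub, sup) that are not implied by a longer Isa path.
    Assumes the Isa graph is acyclic."""
    succ = {}
    for sub, sup in isa:
        succ.setdefault(sub, set()).add(sup)

    def reachable_without(start, edge):
        # Supersets reachable from `start` when the direct edge `edge` is ignored.
        seen, stack = set(), [start]
        while stack:
            node = stack.pop()
            for nxt in succ.get(node, ()):
                if (node, nxt) != edge and nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    return {(sub, sup) for sub, sup in isa
            if sup not in reachable_without(sub, (sub, sup))}

isa = {("SMALL_QTY_SUPPLIER", "DEALER"),     # existed before integration
       ("SUPPLIER", "DEALER"),               # found by the containment test
       ("SMALL_QTY_SUPPLIER", "SUPPLIER")}   # found by the containment test
reduced = remove_transitive_isa(isa)
```

After the two new Isa relationships are inserted, the direct edge from SMALL_QTY_SUPPLIER to DEALER is implied by the path through SUPPLIER and is dropped.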
type t e s t  that  Unrelated  Since the object  (H6) i s i d e n t i c a l f o r a l l forms o f r e l a t e d n e s s (H6.1  - H6.4), i t w i l l not be d i s c u s s e d  f u r t h e r i n t h e procedure.  Should n e i t h e r o b j e c t c o n t a i n t h e o t h e r one ( s 8 ) , t h e procedure i n q u i r e s whether both o b j e c t s have a common s u p e r s e t  (H3).  they  a common  subset  do, t h e p r o c e d u r e  further  e x i s t s between them (H4).  precedes t h e common subset a  (meaningful)  common  i n q u i r e s whether The common s u p e r s e t  question,  subset  question  because o b j e c t s t h a t have  and a r e themselves  s e t s have t o have a (meaningful) common s u p e r s e t .  148  If  meaningful Although i t  is  possible t o construct  sets  such as t h e s e t o f " a l l green  t h i n g s " and t h e s e t o f " a l l e d i b l e t h i n g s " which have a common subset  i n t h e s e t o f " a l l green e d i b l e t h i n g s " , w h i l e  no m e a n i n g f u l nevertheless In  superset  valid  t h e example,  other  than  " a l l things",  when o n l y meaningful  having  the rule i s  sets are considered.  e s p e c i a l l y t h e s e t "green t h i n g s "  i s not a  meaningful s e t as i t has no c l e a r l y d e f i n e d a t t r i b u t e s ( r a t h e r than green c o l o r ) which we expect f o r an e n t i t y o r r e l a t i o n s h i p type.  If new  objects  o b j e c t s w i l l be c r e a t e d t o r e p r e s e n t  subset. the  have both a common s u p e r s e t  and subset  ( s l O ) , two  the superset  and t h e  A l s o , new I s a r e l a t i o n s h i p s w i l l be c r e a t e d t o r e p r e s e n t  relatedness.  no common subset  I f t h e o b j e c t s have a common s u p e r s e t but (sl4),  o n l y a common s u p e r s e t  corresponding  Isa relationships w i l l  an  c o n s t r a i n t may be d e f i n e d  integrity  e n t i t y and t h e  be added.  
In a d d i t i o n ,  to identify  that the  objects are not overlapping.  Objects  without  existence If  a common superset  o f a W-relationship  no common s u p e r s e t  exists,  (sl3) are tested  ( G o l d s t e i n and Storey, the objects  Yet t h e o b j e c t s may s t i l l  inter-view  r e l a t i o n s h i p s i f they have a common r o l e .  e n t i t y may be car-owners,  1988) .  a r e i n f a c t not  related.  o b j e c t s have a common r o l e ,  require  f o r the  the creation of I f the  i . e . , both a PERSON and a COMPANY a new o b j e c t d e s c r i b i n g t h e common  149  role  (CAR_OWNER) , p l u s  (PERSON_CAR_OWNER, Furthermore,  objects  describing the special roles  COMPANY_CAR_OWNER)  have  t o be  created.  I s a r e l a t i o n s h i p s have t o be added t o r e p r e s e n t  the a s s o c i a t i o n s between t h e o b j e c t s .  I f n o t even a W - r e l a t i o n s h i p  e x i s t s between t h e o b j e c t s ,  they  are u n r e l a t e d and r e q u i r e no a d d i t i o n o f i n t e r - v i e w r e l a t i o n s h i p objects.  150  C o n f l i c t Therapy  4.3.  As the  soon as a c o n f l i c t  i s d e t e c t e d by t h e d i a g n o s i s  i n t e g r a t i o n method w i l l  there there  exists exists  a diagnosis no therapy  c o r r e c t t h e problem.  procedure, Thus,  procedure t o recognize  procedure p e r s e .  c o n f l i c t case, a case s o l u t i o n i s d e f i n e d .  while  conflicts,  Instead,  f o r each  A l l case s o l u t i o n s  are based on a s e t o f 11 elementary s o l u t i o n o p e r a t i o n s which were formulated  e a r l i e r as r u l e s g u i d i n g view i n t e g r a t i o n :  1.  R e l a t i o n s h i p becomes an e n t i t y .  2.  R e l a t i o n s h i p a t t r i b u t e becomes an e n t i t y .  3.  E n t i t y a t t r i b u t e becomes an E-R c o n s t r u c t .  4.  A s s o c i a t i o n o f an e n t i t y t o a r e l a t i o n s h i p .  5.  
R e l o c a t i o n o f a r e l a t i o n s h i p a f t e r c r e a t i o n o f new s u p e r s e t o r subset c l a s s e s .  6.  Representation  o f containment.  7.  Representation  o f a common r o l e  8.  Representation  o f common s u p e r s e t without  9.  Representation  o f common s u p e r s e t w i t h  10.  Renaming o f homonyms and synonyms.  11.  Addition of missing objects.  One be  o r more o f these carried  out d u r i n g  elementary therapy conflict  (W-relationship). overlap.  overlap.  measures may have t o  reconciliation.  Each o f them  will  be d e s c r i b e d  groups  in detail.  of elementary  solutions  Appendix will  2 will  be a p p l i e d  show  which  to specific  c o n f l i c t cases and t h e i r sub-cases.  Relationship  becomes an e n t i t y (SI)  Whenever  necessary,  a relationship  entity.  I f a relationship  i s transformed  becomes an e n t i t y ,  the linkages  between t h e r e l a t i o n s h i p and t h e e n t i t i e s i t a s s o c i a t e d relationships  themselves  (see F i g u r e 9 ) .  CUSTOMER  F i g u r e 9: R e l a t i o n s h i p  Becomes an E n t i t y 152  i n t o an  become  The an  e n t i t y c o n s t r u c t i s the more fundamental one. entity  can  relationship,  be  associated  to  o t h e r e n t i t i e s by  i . e . an I s a r e l a t i o n s h i p .  newly c r e a t e d e n t i t y s e t r e l a t i o n s h i p s represented within in  the  figure,  the  the  CUSTOMER becomes an D e a l e r _ c o n t r a c t and  Relationship  and  the  and  entity  and  two  new  relationship  a linkage  relationships,  Relationship  are  Attribute  153  addition.  converted  i s expressed between the Figure  and  fS2)  attributes  (see  be  example  C o n t r a c t between DEALER  Customer-contract are c r e a t e d i n  newly c r e a t e d e n t i t y  F i g u r e 10:  itself  In the  a the  t o o t h e r o b j e c t s can  m o d e l l i n g language.  
relationship  means of  Consequently, f o r  a t t r i b u t e becomes an e n t i t y  When n e c e s s a r y , entities  E-R  Furthermore,  relationship  10).  Becomes an  into  Entity  Relationship entities  attributes  that  are interconnection  have t o be t r a n s f o r m e d attributes.  into  Interconnection  a t t r i b u t e s r e p r e s e n t e n t i t i e s (or E-R c o n s t r u c t s ) i n shortened form.  I f t h e database r e q u i r e s t h a t an i n t e r c o n n e c t i o n  be a s s o c i a t e d into  w i t h another o b j e c t ,  an e n t i t y  i t f i r s t has t o be converted  (or an E-R c o n s t r u c t ) .  SUPPLIER i s a s s o c i a t e d  attribute  In t h e  illustration,  w i t h PART through t h e Supply r e l a t i o n s h i p  which has an a t t r i b u t e  Project.  This  attribute  subsequently  becomes an e n t i t y .  E n t i t y a t t r i b u t e becomes an E-R c o n s t r u c t (S3)  S i m i l a r t o r e l a t i o n s h i p a t t r i b u t e s , e n t i t y a t t r i b u t e s may have to  be t r a n s f o r m e d ,  objects,  i f they  require  association  with  o r i f another view r e p r e s e n t s them d i f f e r e n t l y .  e n t i t y a t t r i b u t e which i s an i n t e r c o n n e c t i o n  will  (see  be c o n v e r t e d  into  Therefore,  an e n t i t y - r e l a t i o n s h i p  structure  F i g u r e 11).  Typically,  t h e newly  real  object  world  created  that  entity will  refer  the original attribute  However, t h e u s e r may t h i n k o f t h e newly c r e a t e d as  An  a t t r i b u t e represents  an e n t i t y - r e l a t i o n s h i p c o n s t r u c t i n shortened form. it  other  the object that  t o the same referred to. relationship  corresponds t o the o r i g i n a l a t t r i b u t e . 154  In  fact,  the a t t r i b u t e  relationship. 
S u p p l i e r which  c o r r e s p o n d s t o both t h e e n t i t y  and the  In t h e example, t h e PART e n t i t y has an a t t r i b u t e i n f a c t r e p r e s e n t s a Supply r e l a t i o n s h i p  and a  SUPPLIER e n t i t y i n shortened form.  F i g u r e 11; E n t i t y  Attribute  Becomes an  Construct 155  Entity-Relationship  Association  o f an e n t i t y t o a r e l a t i o n s h i p  A c o n f l i c t s i t u a t i o n may r e q u i r e existing new  entity  w i t h an a l r e a d y  (S4)  the a s s o c i a t i o n existing  relationship.  element added t o t h e view i s t h e a s s o c i a t i o n  between t h e e n t i t y and t h e r e l a t i o n s h i p  o f an a l r e a d y  link  (see F i g u r e 12).  View 1  View 2  PART  SUPPLIER  PROJECT  Global Schema  F i g u r e 12: A s s o c i a t i o n  o f an E n t i t y t o a  156  Relationship  The (role)  Such  a s i t u a t i o n arises  even  though  associated the  other  one i n v o l v e s  a ternary  relationship.  involving  relationship.  are s i m i l a r ,  o n l y a subset o f t h e e n t i t y  by t h e o t h e r r e l a t i o n s h i p ,  relationship, first  when two r e l a t i o n s h i p s  only  i.e.  one i s a  types binary,  The f i g u r e shows a Supply  t h e SUPPLIER and PART i n the  Subsequently, t h e PROJECT e n t i t y i s a l s o  t i e d into the r e l a t i o n s h i p .  Relocation  of a relationship after creation  subset c l a s s e s  o f new s u p e r s e t o r  (S5)  Whenever a new s u p e r s e t - s u b s e t r e l a t i o n s h i p i s i n t r o d u c e d a view, t h e p o s s i b i l i t y e x i s t s t h a t e x i s t i n g r e l a t i o n s h i p s have t o be r e l o c a t e d . VI  F i g u r e 13 shows such a case.  every  FULLTIME_EMPLOYEE  i s an EMPLOYEE.  
may  In view  DEPARTMENT Employs FULLTIME_EMPLOYEE, w h i l e view V2  that  into  reveals  Once t h e views  are combined, i t becomes e v i d e n t t h a t t h e Employs r e l a t i o n s h i p should  associate  DEPARTMENT w i t h EMPLOYEE  rather  than  with  FULLTIME_EMPLOYEE. Hence, t h e Employs r e l a t i o n s h i p i s r e l o c a t e d .  R e l o c a t i o n becomes necessary whenever t h e o r i g i n a l r e l a t i o n s h i p , i.e.  Employs,  object,  should have r e f e r r e d  i . e . EMPLOYEE  more s p e c i f i c  instead  object.  157  to either  a more g e n e r a l  o f FULLTIME_EMPLOYEE,  or to a  FULLTIME-  View 1  View 2  EMPLOYEE  FULLTIMEEMPLOYEE  EMPLOYEE  Isa  F i g u r e 13: R e l a t i o n s h i p R e l o c a t i o n  R e p r e s e n t a t i o n o f containment (S6)  Whenever one o b j e c t ( c l a s s ) r e p r e s e n t s t h e s u p e r s e t o f another o b j e c t and t h i s s u p e r s e t - s u b s e t r e l a t i o n s h i p i s meaningful f o r  158  the database,  i t has t o be r e p r e s e n t e d by an I s a r e l a t i o n s h i p  between t h e two o b j e c t s (see F i g u r e 14).  View 1  FULLTIME.  View 2  EMPLOYEE  EMPLOYEE  F i g u r e 14: R e p r e s e n t a t i o n o f Containment  The  illustration  relationship  i n the figure  between  shows t h e c r e a t i o n  an EMPLOYEE  entity.  159  o f an I s a  and a FULLTIME_EMPLOYEE  R e p r e s e n t a t i o n o f a common r o l e  ( W - r e l a t i o n s h i p ) (S7)  Two o b j e c t s can be u n r e l a t e d but n e v e r t h e l e s s have some a f f i n i t y to  each o t h e r ,  Storey  (1988)  i f they identify  assume a common r o l e . this  affinity  G o l d s t e i n and  as a W - r e l a t i o n s h i p .  F i g u r e 15 d e p i c t s two e n t i t i e s , COMPANY and PERSON, as u n r e l a t e d but  both  assuming t h e r o l e  o f a c a r owner.  companies can be c a r owners.  
GLOBAL SCHEMA  160  Both people and  In such a s i t u a t i o n , new o b j e c t s have t o be c r e a t e d t o r e p r e s e n t the common r o l e ,  i . e . STOCKHOLDER, as w e l l as t o r e p r e s e n t t h e  s p e c i f i c r o l e s , i . e . , COMPANY_STOCKHOLDER and PERSON_STOCKHOLDER. Each o b j e c t r e p r e s e n t i n g one o f t h e o r i g i n a l as  by t h e o b j e c t  common  role  Figure  i . e . COMPANY o r PERSON, as w e l l  representing  t h e common r o l e .  Whenever a  r e l o c a t i o n o f r e l a t i o n s h i p s may  place.  Representation  other,  objects,  i s represented,  have t o take  A Superset  a s p e c i f i c r o l e w i l l be c o n t a i n e d by  o f common superset without  o v e r l a p (S8)  b u t no o v e r l a p d e s c r i b e s o b j e c t s t h a t exclude  such  a s FULLTIME_EMPLOYEE  16 i l l u s t r a t e s  o f a new s u p e r s e t  each  and PARTTIME_EMPLOYEE.  such a s c e n a r i o  and shows t h e c r e a t i o n  o b j e c t EMPLOYEE, connected t o t h e o r i g i n a l  o b j e c t s through two I s a r e l a t i o n s h i p s .  The example  i n Figure  EMPLOYEE e n t i t y view.  16 i s based on t h e assumption t h a t t h e  has not p r e v i o u s l y e x i s t e d i n e i t h e r o f t h e  Whenever a common superset  of r e l a t i o n s h i p s may have t o occur.  161  i s represented,  relocation  View 1  FULLTIME.  View 2  PARTTIME.  EMPLOYEE  EMPLOYEE  GLOBAL SCHEMA  PARTTIME.  FULLTIME. 
EMPLOYEE  EMPLOYEE  F i g u r e 16: R e p r e s e n t a t i o n o f a Common Superset w i t h o u t Common Subset  162  R e p r e s e n t a t i o n o f common s u p e r s e t w i t h o v e r l a p (S9)  In s i t u a t i o n s where two o b j e c t s not o n l y have a common s u p e r s e t but  also  subset  a common subset  (overlap) both  have t o be r e p r e s e n t e d  relationships  t h e s u p e r s e t and t h e  by a d d i t i o n a l  between t h e o r i g i n a l  o b j e c t s and I s a  o b j e c t s and t h e s u p e r s e t  and subset o b j e c t s (see F i g u r e 17).  View 1  PRODUCT. TEAMMEMBER  View 2  PROJECT. TEAMMEMBER  GLOBAL SCHEMA PRODUCTPROJECT.  TEAM-  TEAM-  MEMBER  MEMBER  PROJECT-SPRODUCT. TEAMMEMBER  F i g u r e 17: R e p r e s e n t a t i o n o f Common Superset  163  and Common  Subset  Figure  17 d e p i c t s PROJECT_TEAM_MEMBER and  entities.  Both  have t h e common  superset  common subset PROJECT&PRODUCT_TEAM_MEMBER. represent  PRODUCT_TEAM_MEMBER EMPLOYEE and t h e  The I s a r e l a t i o n s h i p s  t h a t a l l team members a r e employees and t h a t t h e  members o f t h e p r o j e c t & p r o d u c t team belong t o both t h e p r o j e c t and t h e product team.  Again, any p r e v i o u s l y e x i s t i n g s u p e r s e t ,  subset o r I s a r e l a t i o n s h i p s w i l l not be r e d u p l i c a t e d . Whenever a common s u p e r s e t o r a common subset i s r e p r e s e n t e d , r e l o c a t i o n o f r e l a t i o n s h i p s may have t o occur.  Renaming o f homonyms and synonyms (S10)  Renaming carry  becomes  different  necessary names  when otherwise  identical  objects  (synonym), o r when d i f f e r e n t  objects  c a r r y t h e same name (homonym). objects  Once synonyms a r e t r e a t e d , t h e  s h o u l d have t h e same name.  different  from  That  name should a l s o be  t h e name o f any o t h e r o b j e c t i n e i t h e r  view.  
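The two S10 renaming rules can be sketched in Python: the synonym rule just stated, and the homonym rule in the next sentence. This is a minimal sketch under stated assumptions. The naming policy is hypothetical: a synonym pair adopts the view-1 name when it is free, a homonym keeps its name in view 1, and its view-2 counterpart gets a `_V2` suffix. `fresh` and `rename` are illustrative helper names.

```python
def fresh(name: str, taken: set[str]) -> str:
    """Return name, or name_2, name_3, ... if the name is already taken."""
    candidate, k = name, 2
    while candidate in taken:
        candidate = f"{name}_{k}"
        k += 1
    return candidate


def rename(
    view1: set[str],
    view2: set[str],
    synonyms: list[tuple[str, str]],
    homonyms: set[str],
) -> tuple[dict[str, str], dict[str, str]]:
    """Return renaming maps (old name -> new name) for view 1 and view 2.

    synonyms: pairs (a, b), a in view 1 and b in view 2, with the same meaning.
    homonyms: names used in both views for objects of different meaning.
    """
    names1, names2 = set(view1), set(view2)
    ren1: dict[str, str] = {}
    ren2: dict[str, str] = {}
    for a, b in synonyms:
        # The shared name must differ from every *other* object in either view.
        others = (names1 - {a}) | (names2 - {b})
        common = fresh(a, others)          # hypothetical policy: prefer view-1 name
        ren1[a], ren2[b] = common, common
        names1 = (names1 - {a}) | {common}
        names2 = (names2 - {b}) | {common}
    for h in sorted(homonyms):
        # Keep the view-1 name; give the view-2 object a name used nowhere else.
        new = fresh(f"{h}_V2", names1 | names2)
        ren2[h] = new
        names2 = (names2 - {h}) | {new}
    return ren1, ren2


r1, r2 = rename(
    {"EMPLOYEE", "DEPT", "PROJECT"},
    {"EMPLOYEE", "DEPARTMENT", "PROJECT"},
    synonyms=[("DEPT", "DEPARTMENT")],
    homonyms={"PROJECT"},
)
print(r1, r2)  # {'DEPT': 'DEPT'} {'DEPARTMENT': 'DEPT', 'PROJECT': 'PROJECT_V2'}
```

In the sketch, identity of objects is given as input (the synonym and homonym lists). Deciding identity is the hard part, which the method obtains through diagnosis and user interrogation.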
Once homonyms are treated, the objects involved should carry names that differ from each other and from all objects they are not known to be identical to.

Addition of missing objects (S11)

Objects can be missing, since most views overlap only partially. Hence, for any two views, every object that exists in one view but not in the other has to be added to the other view to make the views identical. Adding missing objects is part of the "view completion" strategy used in this integration method. During integration, both views that take part in the integration process are altered until they are finally identical. This strategy differs from those that create a third, "integrated" view during the conflict resolution process.

Many conflict cases require a combination of several elementary therapy procedures. For instance, the case of construct mismatch paired with synonymy (Case 6) requires a name change and a construct change, i.e., therapy S10 combined with S1, S2, or S3. Appendix 2 presents all conflict cases and the applicable therapy procedures. Case 6 is shown below for illustration.

CONSTRUCT MISMATCH AND SYNONYM
N1 <> N2; T1 <> T2; M1 = M2; C1 <> C2;

6.1. Entity is Relationship. Solution: S10 and S1.
6.2. Entity Attribute is Entity-Relationship construct. Solution: S10 and S3.
  6.2.1. Attribute is Entity.
  6.2.2. Attribute is Relationship.
6.3. Relationship Attribute is Entity. Solution: S10 and S2.

4.4. The Impact of Heuristics

The main goal of this research is the development of a complete view integration method. The secondary goal is to adapt this method so that it can operate with insufficient information.

In the form described so far, the integration method does not consider where the information it requires comes from. For example, if the method has to know whether EMPLOYEE in view 1 and DEALER in view 2 are of the same object type (construct), it simply expects this information to be available. The source of the information is of no concern. Each object has four relevant dimensions: name, construct, meaning, and context. Of these, name and construct are the most easily assessed. Does EMPLOYEE have the same name as DEALER? Obviously not. The object type is also observable, because object types are explicitly stated in E-R models. Assessing meaning identity, and therefore also context identity, is a much more difficult problem: it asks whether two view objects refer to the same real world object. Recognizing or interpreting real world objects is a task beyond most computer systems and not a concern of this research. Nevertheless, recognizing meaning identity or difference is the most crucial recognition task, since the other dimensions follow the meaning dimension. If two objects have the same meaning, their names will ultimately be the same; if they have different meanings, their names will ultimately differ.
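A small Python sketch shows which of the four dimensions the method can read directly from the views and which it must obtain elsewhere. It compares two objects and looks up the resulting pattern in a conflict-case table. Two assumptions apply. First, T in the Case 6 notation is read as the object type (construct) and C as context; this is an interpretation of the notation. Second, only Case 6 is entered, since the full table is in Appendix 2. The `same_meaning` callback stands in for the user, and comparing contexts as raw neighbour-name sets is a deliberate simplification.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class ViewObject:
    name: str
    construct: str            # "entity", "relationship" or "attribute"
    context: frozenset[str]   # names of the objects it is connected to


def signature(o1: ViewObject, o2: ViewObject,
              same_meaning: Callable[[ViewObject, ViewObject], bool]
              ) -> tuple[bool, bool, bool, bool]:
    """(N1 = N2, T1 = T2, M1 = M2, C1 = C2) for two objects.

    Name and construct are read directly from the E-R views; meaning
    identity must be supplied from outside (in the simplest case, the user).
    Context is compared naively as equal neighbour-name sets.
    """
    return (
        o1.name == o2.name,
        o1.construct == o2.construct,
        same_meaning(o1, o2),
        o1.context == o2.context,
    )


# Only Case 6 is reproduced in this section; the full table is in Appendix 2.
CASES = {
    (False, False, True, False):
        "Case 6: construct mismatch and synonym (S10 plus S1, S2 or S3)",
}


def diagnose(o1: ViewObject, o2: ViewObject,
             same_meaning: Callable[[ViewObject, ViewObject], bool]
             ) -> Optional[str]:
    return CASES.get(signature(o1, o2, same_meaning))


contract = ViewObject("Contract", "relationship", frozenset({"DEALER", "CUSTOMER"}))
agreement = ViewObject("AGREEMENT", "entity", frozenset({"Signed_by", "Issued_to"}))
# Suppose the user states that both objects have the same meaning.
print(diagnose(contract, agreement, lambda a, b: True))
```

The sketch makes the asymmetry discussed above concrete. Three of the four comparisons are computed mechanically, while the one that decides the case, meaning, is delegated.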
There are three ways to satisfy the meaning information requirement:

1. user interrogation;
2. advance meaning specification;
3. method "guesses".

The first alternative is user interrogation. Every time two objects are compared, the system could ask the user, "Are these two objects identical in meaning?" This form of operation demands that the user answer a substantial number of questions, especially since, for any object O1 in view 1, at most one object O2 in view 2 with the same meaning is allowed to exist.

Advance meaning specification requires an ex-ante definition of the meaning of each object, in a form that allows the method to compare it with other objects and to decide on identity or difference. This requirement causes two main problems. First, meaning descriptions may have to be very detailed to differentiate between objects that are quite similar yet not completely identical, so the up-front effort is very high. Second, meaning definitions have to be formulated so that no misinterpretation is possible, and the terms used to define meaning have to be consistent across all object definitions. These two problems virtually rule out a complete prior definition of each object's meaning.

Method "guesses" require that the integration method have strong evidence on which to base its guesses. "Guessing" implies that whenever the method compares two objects, it decides whether to believe that the objects are identical or not. This is the way humans operate. When we say "I know", we mean "I believe, based on evidence for the fact and little or no evidence against it". If evidence is not available, the method is bound to make mistakes. Unfortunately, there is ample opportunity for mistakes, since the amount of positive information (any O1 is identical to at most one O2) is much smaller than the amount of negative information. Hence, relying on guesses is not a desirable alternative.

None of the alternatives by itself provides a reasonable solution to the information requirement problem. The first alternative, interrogation, provides the information, but at high cost to the user. The second alternative, up-front definition, does not necessarily provide all the information, and it requires much user effort as well as an unambiguous representation. The third alternative requires no user effort but does not guarantee that the information requirements are satisfied correctly.

Consequently, the best strategy is to combine the good aspects of the three alternatives. User interrogation is the only method that satisfies the information requirements, so it is the dominant approach. If the user says that in his world two objects are identical, they are identical, unless this conflicts with a previous statement. The other two approaches can be used to alleviate the weakness of direct interrogation, because they can limit or prioritize the questions asked. Most questions of the type "is object O1 identical to ..." will be answered "no". If O1 has to be compared to all objects O2 at once, the question demands comparison with a large number of other objects. The user then has to deal with a large amount of information, which may make it difficult to answer correctly.

An improved method should therefore reduce the number of objects O1 has to be compared to. If object identity is the goal, O1 should be compared only to those objects O2 that could potentially be identical to it. In other words, a filter would be used to reduce the number of objects in the comparison. Ex-ante meaning definitions of objects, if unambiguous, can serve this purpose.

If ex-ante meaning definitions serve only to allow an automatic assessment of difference, they can become much shorter. For example, the meaning definition of each object could contain just one fact, with the value "animate object" or "inanimate object". This would separate all E-R model objects describing living creatures from those describing things. If all database objects were correctly classified, the method could automatically decide that EMPLOYEE and DEPARTMENT are different, because the former is a living object and the latter is not.
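The one-fact meaning definition works as a filter in front of user interrogation, and a few lines of Python can sketch it. The category assignments are hypothetical, entered ex ante by a designer. One design choice is an assumption worth stating: an object without a category is never eliminated, since only two differing known categories prove non-identity.

```python
# Hypothetical one-fact meaning definitions (ex-ante, entered by the designer).
CATEGORY = {
    "EMPLOYEE": "animate",
    "WORKER": "animate",
    "DEPARTMENT": "inanimate",
    "PART": "inanimate",
    "PROJECT": "inanimate",
}


def comparison_candidates(o1: str, view2: list[str],
                          category: dict[str, str]) -> list[str]:
    """View-2 objects the user still has to be asked about for o1.

    An object is eliminated only if both objects are classified and the
    categories differ; unclassified objects cannot be ruled out.
    """
    c1 = category.get(o1)
    return [
        o2 for o2 in view2
        if c1 is None or category.get(o2) is None or category[o2] == c1
    ]


view2 = ["DEPARTMENT", "WORKER", "PART", "PROJECT", "SKILL"]
print(comparison_candidates("EMPLOYEE", view2, CATEGORY))  # ['WORKER', 'SKILL']
```

Here five potential questions shrink to two. The user is still asked about WORKER and about the unclassified SKILL.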
A few general categories can allow sufficient specification and differentiation of meaning without an excessive up-front definition effort. Ein-Dor (1987) discusses the use of "common sense knowledge" in reasoning. Based on such a classification, a common sense knowledge based integration method could quickly eliminate those objects O2 that are not identical to object O1. The user would then only have to decide among the remaining objects.

Other available information can reduce the number of objects in the comparison further, in combination with the heuristics discussed previously. Instead of guessing which objects are identical, the method could use any additional evidence to narrow the set of objects under consideration. The following two views illustrate this approach, which uses context information:

View 1: EMPLOYEE - Employed_by - DEPARTMENT
View 2: EMPLOYEE - Works_in - XYZ - Engaged_in - PROJECT

Suppose it is already known that EMPLOYEE in view 1 and EMPLOYEE in view 2 are identical. The next task would be to find out whether the relationship Employed_by is identical in meaning to any object in view 2. One reasonable assumption is that an object identical to Employed_by would also be a relationship in view 2. This need not be the case, but it is quite likely (hence, a heuristic). This simple assumption reduces the contenders in view 2 to the objects Works_in and Engaged_in.
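Applying the construct heuristic to view 2 of the example takes one list comprehension. This Python sketch assumes view 2 is held as a mapping from object name to construct, and that XYZ is an entity, since it sits between two relationships in the chain. The filter only narrows the questions; the user still confirms any match.

```python
# View 2 of the example, object -> construct.  XYZ is assumed to be an entity.
VIEW2 = {
    "EMPLOYEE": "entity",
    "Works_in": "relationship",
    "XYZ": "entity",
    "Engaged_in": "relationship",
    "PROJECT": "entity",
}


def same_construct(construct: str, view: dict[str, str]) -> list[str]:
    """Heuristic filter: keep only objects of the same construct as O1."""
    return [name for name, c in view.items() if c == construct]


# Employed_by in view 1 is a relationship.
print(same_construct("relationship", VIEW2))  # ['Works_in', 'Engaged_in']
```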
Another reasonable assumption is that the object sought in view 2 is also associated with that view's EMPLOYEE entity. Again, this need not be the case, since information could be missing in view 2, but the assumption is likely to be true. The second assumption leaves only Works_in as a potential candidate to have the same meaning as Employed_by. Without the filters, the method would have to ask, "Is the relationship Employed_by identical in meaning to one of the following: Works_in, XYZ, Engaged_in, PROJECT?" With them, it can ask more intelligently, "Is the relationship Employed_by identical in meaning to the relationship Works_in?", which simplifies the decision for the user.

Context and construct are not the only bases for assumptions about the identity of objects. Other available information, such as names, can be used too. Figure 18 gives an overview of potential sources of evidence for meaning identity. The first aspect, meaning representation, has already been discussed.

[Figure 18: Sources of Evidence for Meaning Identity]

The second aspect, context, is broken down into three observable facts: related objects, cardinalities, and roles of entities in a relationship. "Related objects" denotes the general definition of context. Cardinalities refer to the context of relationships. If two relationships not only associate the same entities but also do so with the same mapping ratios, the evidence for their identity is even stronger. When a view contains multiple relationships associating the same set of entities, differentiating them by cardinalities can be useful. Roles apply only when they are defined. If names are given to the links between an entity and a relationship, these role names can be used for comparison.

Third, attributes can serve as an indicator of identity. The problem is that attributes are objects in themselves, and are therefore subject to the same difficulties in identity assessment. One aspect of attributes, however, is easily determined: their names. Two objects may thus be presumed identical if their attributes have identical names. As with all previous indicators, there has to be room for interpretation. The requirement should not be that all attributes are identical, but that at least some are. Alternatively, attention could focus on the key attribute(s), since identical objects are likely to have identical key attributes.

Fourth, identical domains can indicate identical meaning, provided the domains can be defined unambiguously. For attributes, domains are the value sets from which the attribute values are drawn, e.g., "Social Security Number". For other objects, an object's superset defines its domain. For example, EMPLOYEE - Isa - PERSON specifies the domain of EMPLOYEE as being a person. If the other view also contains the PERSON entity, the counterpart of EMPLOYEE could exist only among the subsets of PERSON.

Finally, the name of an object can be another relevant piece of evidence for its meaning. This holds especially if name identity is not defined as strict identity of the character strings but also allows for singular/plural variation, as in EMPLOYEE vs. EMPLOYEES. These two objects could be expected to be the same, even though their names, strictly interpreted, differ. For the analysis of relatedness of objects, this flexibility could be widened to compare objects whose names differ only in their prefixes. For example, PART_TIME_EMPLOYEE, EMPLOYEE, and FULL_TIME_EMPLOYEE could be expected to be identical or at least related, since all their names contain the root word employee.

It is unlikely that, for any given object, all these aspects point in the same direction, that is, identify the same object. Often the context of a particular object may not be known, naming preferences will differ, and different tasks may require different object attributes. The approach to be taken is to use these indicators as a filter of variable density.
At first, the filter should be tight, suggesting only the most likely candidate(s) for a meaning match, e.g., only objects of the same type, with the same context, and of the same meaning category. This filter may still be too wide, e.g., for a database with many entities in the people category. In that case, partial overlap of attribute names, or identity of key attribute names, can be used to restrict the number of objects further. Upon failure, i.e., if none of the suggested objects results in a proper match, the technique could remove one or more of the earlier restrictions. For example, it could look for all objects of the same meaning category, regardless of object type and context.

There is no single best rule for applying meaning indicators. The only indicator that is always applicable and correct in its prediction, provided the information is available, is the meaning category. By definition, two objects cannot be identical in meaning unless their meanings belong to the same category. For example, EMPLOYEE and DEPARTMENT cannot have the same meaning, because one is an animate object and the other an inanimate object. Hence, this indicator is the only one that can eliminate objects with certainty. The other indicators can only suggest that an object may have a different (or the same) meaning.

Only empirical data generated under a variety of conditions can provide stronger evidence about which meaning indicators work better than others.
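The tight-to-loose search can be sketched as a cascade of filters in Python. This is a minimal sketch, not the thesis's procedure. The three stages follow the text: type and context and category, then type and category, then category only. Two points are assumptions. First, "context" is simplified to adjacency to an already matched view-2 object (the `anchors` set). Second, relationships are given the category "inanimate". The attribute-name narrowing step is omitted. The meaning category is never relaxed, since it is the only indicator that eliminates objects with certainty.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Obj:
    name: str
    construct: str               # "entity" or "relationship"
    category: str                # ex-ante meaning category
    neighbors: frozenset[str]    # names of directly connected objects


def find_match(o1: Obj, view2: list[Obj], anchors: set[str],
               confirm: Callable[[Obj, Obj], bool]) -> Optional[Obj]:
    """Ask the user about view-2 candidates for o1, from tight to loose filters.

    anchors: view-2 objects already known to match o1's neighbours (context).
    The meaning-category test is applied at every stage.
    """
    filters: list[Callable[[Obj], bool]] = [
        lambda o: o.construct == o1.construct and bool(o.neighbors & anchors),
        lambda o: o.construct == o1.construct,
        lambda o: True,                      # same meaning category only
    ]
    asked: set[str] = set()
    for keep in filters:
        for o2 in view2:
            if o2.name in asked or o2.category != o1.category or not keep(o2):
                continue
            asked.add(o2.name)               # never ask the same question twice
            if confirm(o1, o2):              # the user's answer is decisive
                return o2
    return None


def obj(name: str, construct: str, category: str, *neighbors: str) -> Obj:
    return Obj(name, construct, category, frozenset(neighbors))


VIEW2 = [
    obj("EMPLOYEE", "entity", "animate", "Works_in"),
    obj("Works_in", "relationship", "inanimate", "EMPLOYEE", "XYZ"),
    obj("XYZ", "entity", "inanimate", "Works_in", "Engaged_in"),
    obj("Engaged_in", "relationship", "inanimate", "XYZ", "PROJECT"),
    obj("PROJECT", "entity", "inanimate", "Engaged_in"),
]
EMPLOYED_BY = obj("Employed_by", "relationship", "inanimate", "EMPLOYEE", "DEPARTMENT")

questions: list[str] = []


def user(o1: Obj, o2: Obj) -> bool:
    questions.append(o2.name)
    return o2.name == "Works_in"


print(find_match(EMPLOYED_BY, VIEW2, {"EMPLOYEE"}, user).name, questions)
# Works_in ['Works_in']
```

If the user rejected every suggestion, the same call would ask about Works_in, then Engaged_in, then XYZ and PROJECT, in that order. EMPLOYEE is never offered, because its category differs.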
For instance, if the same systems analyst produces all views (based on different users' information requirements), one may expect object type to be a reasonable indicator (filter) for meaning identity. The underlying assumption is that a single database designer will be more consistent than a multiplicity of designers in what he models as a relationship vs. an entity or attribute. If all view specifications are done by the same person (user and designer), one should expect names to be used consistently throughout the views. Hence, names could provide a good basis for judging meaning identity.

4.5. Generalization Hierarchy for Database Objects

The previous section introduced the idea of ex-ante meaning definitions according to predefined meaning categories. Here, the concept of a generalization hierarchy is introduced to facilitate the categorization.

The difficulty in developing such a classification scheme is that it has to be acceptable to all people involved in the database design process. To fulfill this goal, the generalization hierarchy should be:

1. complete;
2. consistent;
3. discriminative;
4. concise.

Criteria 1 and 2 are minimum criteria. First, a classification scheme that does not allow the user to classify all his objects in accordance with his knowledge is insufficient to capture that user's knowledge. Second, if the scheme induces the user to classify the same object under different categories, it violates the purpose of the scheme, namely to identify similarity or difference of object meanings.

Criteria 3 and 4 are based on Leibniz's Minimality Principle (Leibniz, 1956, pp. 198-199). This principle postulates that a representation is superior to another one if it requires a shorter explanation of the same phenomena. Correspondingly, a generalization hierarchy that can differentiate among a larger number of object classes than another one with the same number of differentiation criteria is superior. What is undesirable is a classification scheme that is very fine-grained for a subset of object classes but very coarse for the remaining object classes. Like an unbalanced binary tree, such a too fine/too coarse generalization hierarchy would waste too many levels of specialization on too few phenomena.

Unfortunately, the choice of the "right" generalization hierarchy will depend on the knowledge domain in which it is applied and on the way the person who classifies objects differentiates among them. For example, a generalization hierarchy that contains only one class for all "people objects" will deal poorly with a database that stores data for different people roles (e.g., employee, investor, saver, tax payer). Consequently, the quality of a generalization hierarchy can be validated only within the context of a particular knowledge domain and a specific person who classifies objects. Hence, it is necessary to include the creation of such a generalization hierarchy in the requirements analysis effort.
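Once several people have classified the same objects, the two minimum criteria can be checked mechanically. The following Python sketch is illustrative only. Treating "no fitting category" as a completeness failure, and disagreement between classifiers as a consistency failure, is an operationalization assumed for this example rather than one given in the text.

```python
def check_minimum_criteria(objects, classifications, categories):
    """Check criteria 1 (complete) and 2 (consistent) of a classification scheme.

    objects:         names of the view objects
    classifications: one dict per person, mapping object -> chosen category
                     (None if that person found no fitting category)
    categories:      the categories offered by the hierarchy
    Returns (incomplete, inconsistent) as sorted lists of object names.
    """
    # Criterion 1: somebody could not place the object in any offered category.
    incomplete = sorted({o for c in classifications for o in objects
                         if c.get(o) not in categories})
    # Criterion 2: the same object ended up under different categories.
    inconsistent = sorted(o for o in objects
                          if len({c[o] for c in classifications
                                  if c.get(o) in categories}) > 1)
    return incomplete, inconsistent


# Hypothetical classifications by two users.
categories = {"person", "organization", "thing"}
user1 = {"EMPLOYEE": "person", "DEPARTMENT": "organization", "INVOICE": None}
user2 = {"EMPLOYEE": "person", "DEPARTMENT": "thing", "INVOICE": "thing"}
incomplete, inconsistent = check_minimum_criteria(
    ["EMPLOYEE", "DEPARTMENT", "INVOICE"], [user1, user2], categories)
```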
The database designer has to develop a hierarchy which can represent the application domain and has the desirable properties mentioned above. If no such specialized categorization hierarchy exists, a domain-independent categorization hierarchy could be used. The hierarchy created as part of this project is rather flat, incorporating only a few levels of specialization.

A flat generalization hierarchy has the obvious disadvantage of limited discriminative ability. However, object classifications are used to identify difference in meaning, not identity. Object classification is only one of the meaning identifiers used by the integration method, and the method will always interrogate the user if in doubt. Since the focus is on difference in meaning, even a flat generalization hierarchy has reasonable discriminative ability, as the following example may illustrate.

Consider a generalization hierarchy that can differentiate among 20 classes, such as Person, Animal, and Organization. Object EMPLOYEE is classified as a Person. The question to be answered is "Is object XYZ different in meaning from object EMPLOYEE?". Without further knowledge about XYZ, XYZ is equally likely to belong to each class and thus has a .05 chance of belonging to the class Person. Thus, there is a .05 chance that the classification mechanism will suggest that EMPLOYEE and XYZ are not different in meaning. In this situation (1 out of 20 cases), the user would have to be consulted, unless other indicators were able to answer the question. Increasing the number of classes to 40 would reduce the probability to .025. An increase to 200 classes would result in a .005 probability, requiring user interrogation in only 1 out of 200 cases. These reductions in probability have to be weighed against the classification effort, which is an ex-ante investment.

A generalization hierarchy for the categorization of object classes shows similarities with the attempts to represent common sense knowledge in artificial intelligence. The classification hierarchy discussed here is, however, less ambitious. Its task, judging whether two objects are different in meaning, is simpler than the tasks presented in artificial intelligence applications (e.g., Schank's and Rieger's restaurant scripts (1974) or Hayes' naive physics (1979)). Ein-Dor (1987) suggests concept clusters for common knowledge in the business environment. His categories are:

1. exchange,
2. time,
3. location,
4. measurement,
5. media of exchange,
6. obligations and commitments,
7. types of businesses,
8. behaviors,
9. naive economics,
10. employment,
11. people who engage in business.

This classification clarifies the difference between a common knowledge representation and a generalization hierarchy. Ein-Dor's classes are not mutually exclusive. For example, the employment situation can be classified under group 10 as well as group 6. These classes represent areas in which a common sense computer program should have knowledge.
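The arithmetic of the EMPLOYEE/XYZ example can be checked directly. The sketch below assumes, as the example does, that an unknown object is equally likely to fall into each of the n classes, and that no other indicator resolves the question. The function name is hypothetical.

```python
from fractions import Fraction


def no_difference_chance(n_classes: int) -> Fraction:
    """Chance that an unknown object XYZ lands in EMPLOYEE's class, assuming
    XYZ is equally likely to belong to each of the n_classes classes."""
    return Fraction(1, n_classes)


for n in (20, 40, 200):
    p = no_difference_chance(n)
    print(n, float(p), f"user consulted in 1 of {p.denominator} cases")
# 20 0.05 user consulted in 1 of 20 cases
# 40 0.025 user consulted in 1 of 40 cases
# 200 0.005 user consulted in 1 of 200 cases
```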
The categorization that can be used in the absence of more domain-oriented hierarchies is structured as follows:

1. Objects
1.1. Living objects (even if now dead)
1.1.1. Plants (flora)
1.1.2. Animals (fauna)
1.1.3. Persons
1.1.3.1. Person (generic, not person roles)
1.1.3.2. Person roles
1.1.3.2.1. Person roles in person-person interaction (e.g., parent)
1.1.3.2.2. Person roles in person-thing association (e.g., car owner)
1.1.3.2.3. Person roles in person-person-thing interactions (e.g., manager)
1.2. Inanimate objects
1.2.1. Abstract objects
1.2.1.1. Abstract objects that are organized (have structure)
1.2.1.1.1. Hierarchies (e.g., a business company)
1.2.1.1.2. Markets (e.g., the real estate market)
1.2.1.1.3. Other structures
1.2.1.2. Heaps, lumps and atomic abstract objects (e.g., a dream, a theory)
1.2.2. Concrete objects ("things")
2. Object characteristics (e.g., color, size)

According to this categorization scheme, each view object can have a meaning list containing up to 5 elements, such as [object, living, person, role, person-thing] for category 1.1.3.2.2.

Objects classified as belonging to different categories cannot be identical in meaning. If the meaning list for an object is incompletely specified, e.g., category 1.1.3., the object may not be different from an object classified as 1.1.3.2.3., and therefore user interrogation may be necessary. Objects belonging to different categories but to the same higher category may be related in meaning.
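The three rules for meaning lists (different categories, incompletely specified category, shared higher category) amount to a prefix comparison. The Python sketch below is a hypothetical illustration, not AVIS code. The labels "person-person-thing" and "inanimate" are assumed list elements, since the text spells out only one example list.

```python
def compare_meaning(m1, m2):
    """Compare two meaning lists, e.g. ["object", "living", "person"].

    Returns (verdict, shared_prefix):
      "different"      the lists diverge at some level: cannot be identical;
      "ask user"       one list is a prefix of the other (incompletely specified);
      "same category"  the lists are equal: identity is possible, and the
                       remaining indicators have to decide.
    A non-empty shared prefix means both objects belong to the same higher
    category and may be related in meaning.
    """
    common = 0
    for a, b in zip(m1, m2):
        if a != b:
            break
        common += 1
    if common < min(len(m1), len(m2)):
        verdict = "different"
    elif len(m1) != len(m2):
        verdict = "ask user"
    else:
        verdict = "same category"
    return verdict, list(m1[:common])


manager = ["object", "living", "person", "role", "person-person-thing"]  # 1.1.3.2.3
car_owner = ["object", "living", "person", "role", "person-thing"]       # 1.1.3.2.2
```

For example, `compare_meaning(manager, car_owner)` reports "different" with the shared prefix [object, living, person, role], so the two may still be related. Comparing [object, living, person] with `manager` yields "ask user".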
More domain-specific categorization schemes will have more and better fitting categories but will use the same reasoning mechanism to interpret the results of categorization.

4.6. Assessment of the Method

In an earlier chapter, the strengths and weaknesses of previous integration methods were assessed. The same evaluation criteria will now be used to highlight the capabilities and limitations of the method presented here.

Like previous semantic integration methods, the method introduced in this research requires designer interaction during the integration process. The designer has to be consulted to settle questions concerning identity or difference in meaning. However, the method employs heuristics to reduce the number of questions that must be asked.

View integration, as discussed here, covers a larger part of the integration problem than most other techniques. It performs conflict resolution, view merging, and addition of inter-set relationships. Batini et al. (1983) cover additional aspects of the conceptual design process, including correctness and completeness tests for individual views before the integration process (pre-integration). These tests, however, are not an essential part of the integration process; rather, they are elements of the view creation task.

This research exceeds all preceding approaches in the number of conflict cases covered. Less important than the number of cases, however, is the fact that the conflict list is exhaustive, based on all relevant object differentiation criteria.
Like other semantic methods, this one reduces the complexity of the integration task by focusing on high-level objects, i.e., entities and relationships. The method also separates the test for relatedness from the test for identity. Heuristics further reduce the task complexity. The question "Is object O1 identical in meaning to one of the objects {O2}?" can be simplified by reducing the size of the set {O2}. Heuristics are used to eliminate unlikely candidates from {O2}.

This research also investigated whether the integration problem could be described by an even smaller set of conflict categories than the 17 general cases identified in section 4.1. To simplify the description of conflicts, a graph notation was chosen. Every object, whether entity, relationship, or attribute, is represented as a node, and every association between objects (entity role, attribute association) as an edge. In this notation, view conflicts take the form of missing nodes or edges, or inconsistently labelled nodes (name mismatch). A mismatch between types of nodes, e.g., entity-attribute vs. entity-relationship construct, can be characterized as a graph contraction. A graph contraction is the removal of an edge, which merges the two objects linked by that edge into one new object. For example, an E-R construct is merged into one new object, an entity attribute. Similarly, a relationship replaces a relationship-entity-relationship structure when two edges are contracted in the latter. Both types of contraction are depicted in Figure 19.

Figure 19: Construct Mismatch Shown as Graph Contraction
(panels: "Entity attribute is E-R construct"; "Relationship represents E-R-E construct")

The examples illustrate that the graph notation is able to describe the construct mismatch conflict and the context conflict, in addition to the missing object conflict, based on only two criteria: missing nodes and missing edges. A missing object conflict translates into a missing node. A context mismatch conflict translates into missing edges (plus, potentially, missing nodes). A construct mismatch translates into missing edges and graph contraction. Since the notation can describe the same conflict phenomena as the E-R model using fewer mechanisms, it is a more powerful description tool.

The AVIS view integration program, developed as part of this research, employs the graph approach. In AVIS, views are described in the form of nodes and edges. Nodes represent objects, and edges represent roles. Each object (node) is defined by the same set of properties: object type (entity, relationship, or attribute), view, object identifier, name, and meaning (plus one property for explanation). Each role (edge) contains the identifiers of the two objects it connects. Both are explained in more detail in the subsequent chapter.
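A graph contraction, as defined above, can be sketched on a plain set-based graph. This is a hypothetical Python illustration, not AVIS's Prolog representation. It ignores the node properties (type, view, meaning) and uses the PART, SUPPLY and SUPPLIER names only as example labels.

```python
def contract(nodes, edges, a, b, merged):
    """Contract the edge {a, b}: drop it and merge a and b into one new node.

    nodes: set of node names; edges: set of frozenset({x, y}) pairs.
    Edges that touched a or b are re-attached to `merged`.
    """
    if frozenset((a, b)) not in edges:
        raise ValueError("only an existing edge can be contracted")
    new_nodes = (nodes - {a, b}) | {merged}
    new_edges = {
        frozenset(merged if x in (a, b) else x for x in e)
        for e in edges
        if e != frozenset((a, b))
    }
    return new_nodes, new_edges


# One view models the supplier as an E-R construct: PART - SUPPLY - SUPPLIER.
v2_nodes = {"PART", "SUPPLY", "SUPPLIER"}
v2_edges = {frozenset(("PART", "SUPPLY")), frozenset(("SUPPLY", "SUPPLIER"))}
# Contracting SUPPLY-SUPPLIER yields the shape of a view in which SUPPLIER
# is merely an attribute of PART.
n, e = contract(v2_nodes, v2_edges, "SUPPLY", "SUPPLIER", "PART.SUPPLIER")
```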
Even though the graph notation is a more powerful description tool than the E-R model, integration cases have been discussed in this research using E-R concepts. The E-R model is widely used as a conceptual modelling language in database design, while the above graph notation is not. Thus, conflict cases and solutions described by means of the E-R model are more easily understood, and presumably more useful to the database designer, than ones based on a graph notation. The differences between the internal graph representation in AVIS and the external E-R representation require AVIS to translate frequently between these two representation forms. Nevertheless, the internal representation in the form of graphs is very useful because it allows the system to compare objects of different types easily along their relevant dimensions. For instance, the question "Do object O1 and object O2 have identical meaning?" can easily be phrased in the graph notation; Figure 20 shows its Prolog equivalent. This simple example illustrates that the integration method can compare objects of any type in the same manner. For example, T1 may be "attribute", while T2 is "entity".¹

    % object(Type, View, ObjectId, Name, Meaning)
    identical_meaning(O1, O2) :-
        object(T1, V1, O1, N1, M),
        object(T2, V2, O2, N2, M).

Figure 20: Identical Meaning Query in Prolog Graph Notation

An additional strength of the method discussed in this research is the use of meaningful data objects.
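The translation from the external E-R form into the internal graph form can be sketched as follows. This is a hypothetical Python illustration of the idea, not AVIS's translation routine. The input format and the convention of qualifying attribute nodes with their owner's name are assumptions.

```python
def er_to_graph(entities, relationships):
    """Translate an E-R view into nodes (name -> type) and role edges.

    entities:      {entity_name: [attribute names]}
    relationships: {relationship_name: ([connected entity names], [attribute names])}
    Attribute nodes are qualified with their owner ("OWNER.ATTR") to keep
    node names unique across the view.
    """
    nodes, edges = {}, set()
    for ent, attrs in entities.items():
        nodes[ent] = "entity"
        for a in attrs:
            nodes[f"{ent}.{a}"] = "attribute"
            edges.add(frozenset((ent, f"{ent}.{a}")))
    for rel, (ents, attrs) in relationships.items():
        nodes[rel] = "relationship"
        for ent in ents:
            edges.add(frozenset((rel, ent)))        # entity role
        for a in attrs:
            nodes[f"{rel}.{a}"] = "attribute"
            edges.add(frozenset((rel, f"{rel}.{a}")))
    return nodes, edges


nodes, edges = er_to_graph(
    {"PART": ["NUMBER"], "SUPPLIER": []},
    {"SUPPLY": (["PART", "SUPPLIER"], ["QUANTITY"])},
)
```

After translation, entities, relationships and attributes are all plain nodes. This is what allows objects of different types to be compared in the same manner.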
The E-R model allows the description of objects that are meaningful to database users. The integration method further allows the representation of some data semantics.

¹ However, the example in the figure oversimplifies the meaning comparison problem. AVIS does not use Prolog's pattern matching mechanism in this simple form to assess meaning identity. Meaning comparison is described in more detail in the subsequent implementation chapter.

Unlike other semantic integration methods, this research includes an algorithm which explicitly specifies the steps of the procedure. For example, the identity test without heuristics consists of a four-step procedure in which the identity or difference of one object is assessed for each of the four relevant identity criteria. The identity and relatedness questions are stated as 1:N comparisons ("Is object O1 identical to one of {O2}?"). As a result, the computational effort of each question grows linearly with the number of objects. The procedure terminates when the initially different views have become identical. To be identical, both views have to contain the same objects. Objects are identical if they are identical in all four relevant dimensions (meaning, context, construct, and name).

To judge the value of the method, the questions of correctness and completeness of the resulting views have to be addressed.
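The four-dimension identity test, the 1:N meaning question, and the termination condition described above can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not AVIS code. In particular, meaning is reduced here to a comparable label, whereas AVIS compares meaning in a more elaborate way.

```python
from typing import NamedTuple


class ViewObject(NamedTuple):
    meaning: str          # meaning identifier (assumed to be a comparable label)
    context: frozenset    # names of connected objects
    construct: str        # "entity", "relationship" or "attribute"
    name: str


DIMENSIONS = ("meaning", "context", "construct", "name")


def differing_dimensions(o1, o2):
    """Assess identity or difference dimension by dimension (four steps)."""
    return [d for d in DIMENSIONS if getattr(o1, d) != getattr(o2, d)]


def match_in(o1, others):
    """1:N question 'is o1 identical in meaning to one of {O2}?'.
    A single pass over `others`, so the cost grows linearly with its size."""
    return next((o2 for o2 in others if o2.meaning == o1.meaning), None)


def views_identical(v1, v2):
    """Termination test: both views contain the same objects."""
    return set(v1) == set(v2)


# Hypothetical objects from two views.
emp1 = ViewObject("employee", frozenset({"DEPT"}), "entity", "EMPLOYEE")
emp2 = ViewObject("employee", frozenset(), "entity", "STAFF")
dept = ViewObject("department", frozenset(), "entity", "DEPT")
```

Here `match_in(emp1, [dept, emp2])` finds `emp2`. `differing_dimensions(emp1, emp2)` then reports a context conflict and a name conflict that still have to be resolved before the views can become identical.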
(The working prototype demonstrates the workability of the method only for specific cases.) From the earlier description of the integration algorithm, it is known that the procedure always terminates if the initial views contain a finite number of objects. The procedure performs the integration task by adjusting both initially different views. When the procedure terminates, for each object in one view an identical object exists in the other view. Hence, the completeness question depends on whether objects can be "lost" during integration, so that the final views do not contain all objects from the initial views. The correctness question concerns whether objects from the initial views may be misrepresented in the final view. Furthermore, it has to address whether the order in which views are integrated, and/or the sequence in which objects within a view are considered, has any impact on the outcome of the integration process.

In this integration method, objects cannot be lost. Every object represented in at least one initial view will also be represented in the global schema. This does not imply that each object will appear in its original form. The object meaning will be preserved, but the object name, construct, and context may change. A relationship may be relocated, a name may be changed, or an object's construct may be changed. After a construct change, an object will in most cases be represented by more than one new object, e.g., a relationship attribute will become a relationship-entity-relationship group. The only exception is the change of a relationship into an entity, where the construct change replaces one old object by one new object. Because of the direction of change in cases of construct mismatch, an old object is always replaced by at least one new object. Hence, objects cannot be lost during the integration process.

Although objects cannot be lost, the resulting view may still be incorrect if objects are misrepresented or added arbitrarily. An object is misrepresented if the knowledge represented in its post-integration form contradicts the knowledge represented in its pre-integration form. This includes name changes that result in names which do not convey the meaning of the object, construct changes which compress the information content of an object, meaning changes which result in incorrect meaning descriptions, and context changes which connect objects to objects they should not be connected to. The integration method performs none of these invalid operations, nor does it add objects arbitrarily. Objects are added only if this addition is suggested by one of the views, that is, if at least one of the views contains an object that is not part of the other views. Name changes occur only when synonyms or homonyms are detected. The choice of suitable names to overcome these conflicts is a task for the designer who uses the method. Construct changes never result in a loss of information, since the construct chosen is always the one which is able to convey the most information.

Meaning changes are never made by the system (database designer). Meaning is specified by the users of the system and can only be changed by them. Context changes occur for three reasons. First, construct changes cause context changes, as depicted in Figure 10 in the conflict therapy section. Second, an association of an entity to a relationship results in a context change (exemplified in Figure 12). Third, relationship relocation results in a context change (shown in Figure 13). All of these changes only make the representation of a data object in one view compatible with its representation in another view. In the first two cases, an object O1 will be connected to an object O2 only if at least one view states that the two objects should be connected. If all views are correct prior to integration, this operation cannot result in an incorrect context. Relationship relocation takes place only if, during the integration process, the database designer identifies that the relationship applies to the superset object rather than to the subset object (Figure 13).

Finally, we must consider whether the same outcome, that is, the same global structure, will be achieved independent of the sequence in which views are integrated. In a two-view integration problem, sequence refers to the order in which objects are compared. For example, is O1 from V1 compared to all objects from V2 first, followed by O7 from V1, or does O7 precede O1? In a multi-view integration problem, sequence also addresses the order in which views are compared. For example, if three views, V1, V2, and V3, have to be integrated, will V1 first be integrated with V2 and the result of this integration then be integrated with V3, or will the integration begin with V2 and V3?

In both the two-view and the multi-view integration problems, the following operations are performed: objects existing in all views become part of the global schema; objects existing in at least one view become part of the global schema; objects represented differently in different views are adjusted and become part of the global schema. In addition, inter-view relationships are added to the global schema. Objects that exist in all views will not be affected by the sequence of the integration process; they will appear in the same form in the global schema. Objects that originally did not exist in all views will also be added to the global schema, independent of the integration sequence. Inter-view set relationships, which are missing in all views, will likewise be added independent of sequence. In fact, they are added after all tests for identity of objects are completed. The critical element for this assessment of the view integration procedure is the adjustment of views when conflicts are detected.
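The argument that adjusting and merging object representations does not depend on the integration order can be illustrated with a small fold over all view orders. This is a Python sketch under assumptions, not the thesis's procedure. It models only construct and context, leaving naming to the designer. The numeric RICHNESS ranking is assumed; the text states only that the most information-rich construct is kept and that a relationship may become an entity. Because `max` over a fixed ranking and set union are both commutative and associative, every order gives the same result.

```python
from functools import reduce
from itertools import permutations

# Assumed ranking of constructs by information content (higher = richer).
RICHNESS = {"attribute": 0, "relationship": 1, "entity": 2}


def merge(rep1, rep2):
    """Merge two representations (construct, context) of the same object."""
    construct = max(rep1[0], rep2[0], key=RICHNESS.__getitem__)
    return construct, rep1[1] | rep2[1]


def integrate(representations):
    """Integrate one object's representations pairwise, in the given order."""
    return reduce(merge, representations)


# Relationship R with contexts {E1} in V1, {E1, E3} in V2 and {E1, E2} in V3.
r_views = [("relationship", frozenset({"E1"})),
           ("relationship", frozenset({"E1", "E3"})),
           ("relationship", frozenset({"E1", "E2"}))]
r_outcomes = {integrate(order) for order in permutations(r_views)}

# An object that is an attribute in two views and an entity in the third.
x_views = [("attribute", frozenset()), ("attribute", frozenset()),
           ("entity", frozenset())]
x_outcomes = {integrate(order) for order in permutations(x_views)}
```

Each outcome set contains exactly one element: R ends up with context {E1, E2, E3}, and the second object becomes an entity, whichever view is processed first.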
In the two-view situation, the sequence in which objects are compared may vary. Does this change affect the outcome of the integration? This question translates into two more basic questions. First, does the sequence in which objects are compared result in differences in the diagnosis of conflicts? Second, does a potentially different diagnosis result in a different global schema?

The conflict diagnosis procedure uses the meaning dimension as its most important criterion. Once objects with identical meaning are found, conflicts are detected based on differences in the remaining dimensions: name, construct, and context. For each object in each of the views, at most one object with identical meaning can exist in the other view. Furthermore, with the exception of name changes for homonyms, the remaining dimensions of an object are not changed before meaning identity with another object has been established. Therefore, for any two objects from different views, the object comparison will yield the same result, independent of the sequence of comparisons. This holds unless the database designer using the method is inconsistent in renaming objects when homonyms are found.

One other potential source of error exists, but it also lies in the domain of the database designer. In certain situations, the designer may find it difficult to decide whether two objects are identical in meaning. Therefore, if both objects O1 and O2 from view V1 appear to the designer as if they could match the meaning of object O3 from V2, then the order of comparison may bias the designer to decide for O1 in one situation and for O2 in another. This is a particular problem in cases of construct mismatch, where, for instance, an entity attribute in one view corresponds to an entity-relationship construct in the other view (see Figure 11). In this example, the database designer has to decide whether the attribute Supplier corresponds to the entity Supplier or to the relationship Supply. But even though the designer may have some discretion in deciding which of the objects is the matching one (entity or relationship), the conflict will be resolved in exactly the same way: the attribute will be replaced by an E-R construct. The same is true for other forms of construct mismatch.

In summary, as long as the designer is consistent in his assessment of the meaning identity of objects, the diagnosis will always be the same, independent of sequence. If the designer is inconsistent in his assessment of meaning identity, the procedure will still produce identical outcomes for cases of construct mismatch.

In the multi-view situation, the concern is whether the outcome (the global schema) is invariant to changes in the order of view comparisons. Can objects end up in the global schema with different names, different constructs, or different contexts, depending on the order in which views are processed? Again, this is not the case. The integration method prevents those variations for all but naming decisions, which are in the designer's domain. For construct changes, there is only one direction of change, which avoids loss of information. For example, if out of n views, n-1 represent an object as a relationship attribute and only one view represents it as an entity, the object will still become an entity in the global structure. In all cases, the most information-rich object representation will be the one chosen for the global structure. Context changes are dealt with in a similar manner. For example, suppose a relationship R has as its context the set of entities {E1} in view V1, the set {E1, E3} in view V2, and the set {E1, E2} in view V3. The global schema will then show {E1, E2, E3} as R's context, independent of the sequence in which the views were integrated. The same is true for attributes. Entities and relationships in the global view have attribute sets which are the union of the attribute sets of the corresponding objects from the original views (except, of course, when an attribute is converted to another construct).

In conclusion, even in a multi-view situation, the method will produce the same global schema, independent of sequence, if the designer is consistent in his decisions on meaning identity.

5. IMPLEMENTATION - THE AVIS PROGRAM

5.1. Overview

An implementation of the view integration method is available in the form of the AVIS (Automatic View Integration System) program.
AVIS is written in Prolog. The purpose of the program is not to show the correctness of the conflict resolution method. The correctness of the method should be judged based on its underlying assumptions, the rules guiding view integration, and the conclusions drawn from them concerning the diagnosis and therapy procedure. The program can only serve as a testbed to show mistakes or omissions in the details of the resolution procedure. Furthermore, it can show the feasibility of an automated view integration procedure. Appendix 3 contains the screen displays of a view integration session with AVIS to illustrate the operation of the system and its role as a testbed.

5.2. Function and Structure of the AVIS Program

To fulfill its purpose as a testbed and an indicator of feasibility, the program is an implementation of the diagnosis and therapy procedure outlined in earlier sections. The program always operates on a set of two views which are to be integrated. Such a set of two views has to be loaded into the system at the outset of the integration session. The program proceeds by checking conflict hypotheses. For each hypothesis that is checked, one eligible object from view 1 is chosen and compared to all eligible objects from view 2. Hypothesis tests are carried out in the sequence established by the integration rules and heuristics. Depending on the outcome of a test, an appropriate therapy activity is performed, followed by another test. A therapy can be "do nothing" if objects do not have to be changed, or any of the other therapy actions discussed previously. The program terminates when both views have become identical. The program structure which achieves this function is depicted in Figure 21.

Following the typical architecture of knowledge-based systems, the program is designed in a highly decoupled form. For instance, the sequence in which hypotheses are tested is not fixed (programmed), but determined by the sequence in which they occur on the OBJECT COMPARISON AGENDA (box 8 in the figure). Therefore, an "urgent" hypothesis test (typically performed during a therapy operation consisting of more than one therapy action) can pre-empt tests that would normally have occurred next. Another form of decoupling separates the step which recognizes that an object is missing (box 4) from the step that actually adds the object to the view (box 5).

Figure 21: AVIS program structure (boxes: Integrate; Pick next Object Comparison Agenda item; Test Hypothesis; Add items to Object Assertion Agenda; Assert objects; Add items to Object Comparison Agenda; Object Assertion Agenda; Object Comparison Agenda)

Hence, if the program realizes that an object is missing, it reports this fact in the OBJECT ASSERTION AGENDA (box 7). Then, in an independent step, the program will try to assert (add) the missing object. If this is not yet possible because some other preconditions are not fulfilled, the object will simply remain in the OBJECT ASSERTION AGENDA until those preconditions are satisfied.

Overall, the program operates as follows. The predicate INTEGRATE (box 1) repeatedly initiates comparisons. The comparisons are carried out as specified by the entries in the OBJECT COMPARISON AGENDA. Every entry will consist of the (generic) hypothesis to be tested, together with the objects involved in the test, for instance the hypothesis SIMILAR ENTITY (same meaning), and the objects to be tested, i.e., SUPPLIER for view 1 and DEALER, BUYER, INVENTORY for view 2. This generic hypothesis is then tested in the form of a specific hypothesis for the first object specified. The program will first attempt to find an answer to such a specific hypothesis on its own, before asking the user. To find an answer without user interaction, the program will check whether the results of previous tests can help in deciding the question. For example, if all entities in view 2 already had corresponding entities in view 1, the program could answer the question "no", because each object can have only one matching object in the other view. If previous tests cannot help, structural information may help in deciding. For example, a necessary condition for two relationships to be related is that they have at least two common entities. If no common entities exist, the program can assert that the relationships in question are not related. If structural knowledge cannot help, the program may be able to use any semantic knowledge it possesses concerning the application domain. Currently this option is not implemented in a form where the program is able to make such inferences (the information is only passively available). If the program cannot decide by itself whether a hypothesis is true or false, it will ask the user.

Following the hypothesis test, the program will place an entry into the OBJECT ASSERTION AGENDA if objects have to be added as a consequence of the test. In a next step, objects are added to a view if all preconditions for their creation are fulfilled (box 5). Finally, and also based on the outcome of the hypothesis test, new specific hypotheses may be placed on the OBJECT COMPARISON AGENDA (boxes 6 and 8).

At some points during the integration procedure, the OBJECT COMPARISON AGENDA may be empty even though the integration has not been completed. Such a point occurs, for instance, at the outset of the integration process. To "boot-strap" itself in these situations, the program will activate the SEED predicate (box 9), which places a first entry on the agenda. The integration process ultimately terminates when the agenda is empty and no more seeds can be generated.

5.3. Knowledge Representation

5.3.1. Representation of views

To allow operation on arbitrary views, the program stores views separately from the procedural knowledge. A set of two small views is shown in Figure 22.
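As a hypothetical illustration only (AVIS itself is written in Prolog, and none of the class or function names below come from it), the view-1 part of Figure 22 can be mirrored with plain Python records. Following the description that accompanies the figure, each record carries a type, view number, object number, name, meaning list and replace list, and each role pair links an attribute or relationship to an object it is associated with. The point of the sketch is that, once views are stored as data, a lookup such as an object's context needs no view-specific code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ViewObject:
    # Hypothetical mirror of object(Type, View#, Object#, Name, Meaninglist, Replacelist).
    kind: str            # "entity", "relationship" or "attr"
    view: int
    number: int
    name: str
    meaning: tuple = ()  # thesaurus-like meaning terms
    replaces: tuple = ()  # numbers of objects replaced by this one


# View-1 facts taken from Figure 22 (view 2 omitted for brevity).
OBJECTS = {
    3: ViewObject("entity", 1, 3, "dealer", ("sells", "supplies")),
    4: ViewObject("entity", 1, 4, "branch", ("alternate_location", "subsidiary")),
    502: ViewObject("relationship", 1, 502, "supply", ("delivery", "goods_transfer")),
    600: ViewObject("attr", 1, 600, "contract", ("identifier",)),
}

# Mirror of role(Object#, Object#): (object, object it is associated with).
ROLES = [(502, 3), (502, 4), (600, 3)]


def context(number):
    """Names of the objects that object `number` is associated with."""
    return [OBJECTS[other].name for obj, other in ROLES if obj == number]


def roles_are_well_formed():
    """Only attributes and relationships may have a non-empty context."""
    return all(OBJECTS[obj].kind in ("attr", "relationship") for obj, _ in ROLES)


print(context(502))             # ['dealer', 'branch']
print(roles_are_well_formed())  # True
```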
object("entity",1,3,"dealer",["sells","supplies"],[])
object("entity",1,4,"branch",["alternate_location","subsidiary"],[])
object("entity",2,1003,"dealer",["sells","supplies"],[])
object("entity",2,1004,"customer",["buys","pays","orders"],[])
object("entity",2,1005,"contract",["agreement"],[])
object("relationship",1,502,"supply",["delivery","goods_transfer"],[])
object("relationship",2,1502,"dealer_contract",["dealer_contract"],[])
object("relationship",2,1503,"customer_contract",["customer_contract"],[])
object("attr",1,600,"contract",["identifier"],[])

role(502,3)
role(502,4)
role(1502,1003)
role(1502,1005)
role(1503,1004)
role(1503,1005)
role(600,3)

Figure 22: Representation of Views in AVIS

Each object is stored as an atom of the form object(Type, View#, Object#, Name, Meaninglist, Replacelist). Type is one of entity, relationship, or attr(ibute).¹ View numbers are arbitrary, but objects from the same view carry the same view number and objects from different views carry different view numbers. Object numbers are unique identifiers for objects within the view they belong to. The object name is the user-defined name for the object. Meaninglist is a list of strings that give some indication of the nature of the real-world object represented by the database object. Replacelist is a list of numbers for objects that have been replaced by the object at hand. For example, if a relationship attribute becomes an entity, the new entity retains a reference to the former relationship attribute through the number in the Replacelist.

¹ "attr" is used instead of attribute because attribute is a restricted term in the programming language.

One way to think of the meaning list is to view it as a list of thesaurus terms for the object name. Each of the terms in the list may describe some facets of the object's meaning, through slightly different labelling of the object. The meaning list can also be used to identify the categories the object belongs to in a generalization hierarchy. This list of categories does not have to be a central pool of base terms from which all object sets have to be defined; it may simply be a set of category terms which capture the language used in the organization under study. In either form, the meaning list helps to simplify the identification of dissimilar (or even of identical) objects.

The context of an object is stored by means of role(Object#, Object#) atoms. The first object number indicates the object, the second one the object it is associated with. By definition, only attributes and relationships have a non-empty context. Thus, roles exist only for these two object types.

5.3.2. Representation of View Integration Knowledge

The knowledge contained in the program consists mostly of hypotheses, rules for the selection of objects for subsequent tests, rules for the elimination of irrelevant tests or irrelevant test objects, and rules for the therapy of conflict cases.

Hypothesis atoms serve mainly to control the sequence of the integration procedure. Selected hypotheses are depicted in Figure 23.

hypothesis(3,["n","n"],[4,1],[12,701,24,5],"Similar Entity","same_meaning","different_meaning")
hypothesis(4,["o","o"],[],[],"Synonym","synonym","same")
hypothesis(12,["o","o"],[],[],"Homonyms","homonyms","nothomonyms")
hypothesis(701,["n","n"],[8],[24],"Entity is Relationship Attribute","relationship_attribute","not_relationship_attribute")

Figure 23: AVIS Hypotheses

Hypothesis 3 formulates the test for "Similar Entity". This test investigates whether, for an entity in view 1, there exists an entity in view 2 with the same meaning but possibly with a different name. The lists of integers which are part of the hypothesis atom (e.g., [4,1]) indicate subsequent activities depending on the outcome of the test. For instance, if a similar entity is found, the next hypothesis to be tested is hypothesis 4, which tests whether both entities have the same names. Thereafter, hypothesis 1 would follow. If the test result were negative, a number of other hypotheses would be invoked, i.e., 12, 701, 24, and 5. Each hypothesis also shows which knowledge will be added to the knowledge base as a consequence of the test outcome. For instance, if hypothesis 3 becomes true, the objects involved will be memorized as having the same meaning. If the outcome is negative, they will be stored as having different meanings.

Rules that select objects for subsequent tests are "make agenda" rules. One example is shown in Figure 24.

m_a(3,_,[O1],[O2],H) :-
    H = 1, !,
    find_r(O1,R11),
    find_r(O2,R12),
    filter(H,R12,R12n),
    m_a(0,b,R11,R12n,H), !.

Figure 24: The AVIS "make agenda" Rule

The "make agenda" rule shown in the figure is invoked after the test for a similar entity has succeeded. Once two identical entities have been found, AVIS searches locally, in the vicinity of these entities, for identical relationships. The rule finds all relationships that entity O1 is associated with, as well as all relationships O2 is associated with. It then filters out relationships that have been investigated previously, and formulates a new test in which all relationships R11 will be compared to all relationships R12n, to find matching pairs. If R11 contains more than one relationship, the agenda item will later be decomposed into as many items as there are elements in list R11. This is necessary, since all tests are carried out in a 1:n mode, where one object of view 1 is compared to n objects of view 2.

Rules to filter out irrelevant tests or irrelevant test objects are exemplified in Figure 25.

/* the attribute O1 is a key */
test_hypo([O1],O12,H,_,O12) :-
    H = 14,
    is_key([O1],[O1]),
    make_agenda(H,t,[O1],O12,HN),
    do_ao(H,O1,0,'n'), !.

Figure 25: Filtering Rule in AVIS

The rule depicted in Figure 25 refers to the test of hypothesis 14. Hypothesis 14 states the possibility that an entity attribute may correspond to an entity-relationship construct.
The test_hypo rule shown here states that if the entity attribute O1 is a key (identifier) attribute, then it cannot correspond to the entity-relationship construct O12. Entity attributes can correspond to entity-relationship constructs only if they are interconnection attributes, e.g., the attribute Supplier of a PART entity. If the attribute is a singular identifier (not part of a compound key), it refers to the entity it belongs to, i.e., Part# refers to PART itself. Such objects can be excluded from the test. By using filtering rules, the AVIS program can reduce information requests to the user.

Rules for the therapy of conflict cases typically become rules to create new objects. Figure 26 illustrates such an "assert object" rule.

asso(H,O1,O2,'y',New) :-
    H = 14,
    object(relationship,_,O2,_,_,_),
    find_e(O2,E12),
    object(attr,V1,O1,_,_,_),
    role(O1,E1),
    fct(same,E1,E2),
    member(E2,E12,E1r),
    single(same_meaning,E1r,E1s),
    dup(H,E1s,V1,E1s1),
    dup(H,[O2],V1,O1n),
    append(E1s1,O1n,New),
    retract(object(attr,V1,O1,_,_,_)),
    retract(role(O1,E1)), !.

Figure 26: The AVIS Object Assertion Rule

The figure shows one of the rules dealing with the situation where an entity attribute in one view corresponds to an entity-relationship construct in the other view. This rule replaces the attribute O1 with a relationship O1n, by simply duplicating the relationship O2 from view 2 and subsequently eliminating the attribute from view 1. Furthermore, from all entities (E12) in view 2 that are associated by relationship O2, those that have no corresponding objects in view 1 are identified (E1s) and duplicated in view 1.

5.4. The Impact of Domain Knowledge

One of the biggest problems for knowledge-based systems is the requirement to contain knowledge about a wide variety of problem domains. "Deep" knowledge is much more easily implemented than "wide" knowledge. This is also true for the view integration program, which already has to contain deep knowledge on diagnosis and therapy. This weakness limits the program's ability to assess the identity of object meanings, an ability the method requires. How can a program judge that two objects are identical in meaning if it contains no domain knowledge?

If the "true" meaning of an object cannot be assessed, then at least a number of indicators exist to help in the assessment of true meaning (see Figure 18, in the previous chapter). Obviously, each object could carry with it a meaning representation, such as a list of symbols describing the meaning of the object. Meaning comparison would then involve the comparison of such meaning lists. Problems could arise from homonyms and synonyms in these lists. A second indicator could be context. If two objects are similar, their immediate neighbors are likely to be similar too. Thus, the finding of similar neighbors would provide some evidence for the assumption that two objects are similar. Other forms of context comparison involve the analysis of relationship cardinalities and, if defined, the roles of entities in a relationship. Similar roles and similar cardinalities are evidence for object similarity. Third, similar attributes (or at least similar attribute names or similar key attributes) can be another indicator of similarity. Fourth, if value sets have been defined, these can be compared. Finally, the name of an object itself can be an indicator of meaning similarity.

Most of the above-mentioned indicators are plagued by the problem of ambiguous representation. If names of objects, due to homonymy and synonymy problems, are not a reliable indicator of similarity, the same will be true for other indicators such as attribute names or meaning lists. The use of context may be viewed merely as a recursive restatement of the problem. For example, to know whether entities E1 and E2 are identical, one has to know whether their contexts R1 and R2 are identical.¹ To find out whether R1 and R2 are identical, one has to find out whether the contexts of R1 and R2 are identical, and so on.

¹ Note that the AVIS program also recognizes context for entities, in order to make use of local search for identical objects.

Nevertheless, comparisons are possible. For instance, partial overlap of meaning representations, or partial context similarity, can be indicated. The AVIS program operates in this manner, however in passive form. The program never decides whether two objects are identical, yet the user can ask the program for the values of similarity indicators. So far, only the indicators meaning (comparing meaning lists), context (comparing immediate neighbors) and name are implemented. Figure 27 shows the system's response to a user inquiry on the value of the meaning indicators.

Testing for hypothesis: SIMILAR ENTITY,
Involving the entity DEALER (3) and one of the following objects;

-- Response --
Meaning Match
Match between entity DEALER (3) ["sells","supplies"] and objects below:

ID      NAME        Match of:   NAME    MEANING    CONTEXT
1003    dealer                  y       y          unknown
1004    customer                n       n          unknown
1005    contract                n       n          unknown

Press <spacebar> to continue

Figure 27: AVIS Meaning Identity Indicators

A more advanced form of meaning indicator is based on the meaning representation (meaning categorization) feature. While meaning lists for objects currently have no form restrictions, thereby allowing the use of any symbol to define the meaning of an object, future meaning lists will be more restricted in the choice of terms. Terms will have to be elements of a categorization hierarchy and will therefore be unambiguous.

6. SUMMARY AND EXTENSIONS

The main contribution of this research is the development of a complete view integration procedure. The research went beyond the problem of inter-view constraint representation (relatedness of objects). It systematically categorized inter-view conflicts into conflict types, based on an analysis of conflict sources. The source of all conflicts is mismatches between the meaning dimension on the one hand and all other object dimensions (name, construct, and context) on the other hand. Whenever two objects are identical in meaning, they also have to be identical in their other relevant dimensions. If they are not, a conflict arises. Similarly, if two objects have different meanings, they also have to differ in the name dimension to be conflict-free. The method presented in this research can diagnose all possible combinations of mismatches and has therapy rules for all of them. In addition to rules for the recognition and resolution of conflicts, an algorithmic view integration procedure was described. It specifies the sequence of tests for object identity and relatedness. At the termination of this procedure, two initially different views become identical and contain all relevant inter-view constraints. Thus, either of the views represents a global schema containing the two original views.

The integration procedure developed here begins with a test for object identity. At the end of this step, both views contain the same objects. The subsequent test for relatedness determines all inter-view constraints for all originally unique objects (existing in only one view). The test for relatedness may result in the addition of entities to represent superset and subset objects and in the addition of Isa relationships.

Furthermore, the research provided heuristics to simplify the integration problem for the user. Heuristics were developed to ease the user's task of identifying object pairs with identical meaning. Assumptions such as "(in the absence of information to the contrary) two objects with identical meaning will have identical constructs" reduce the number of objects among which the user has to look for a matching object. In case of information to the contrary, i.e., if no pair of objects with the same meaning were found, the heuristic would fail and a more painstaking search for a match would be required. The research exemplified how the introduction of heuristics alters the integration procedure.

The method was designed for use as a view integration tool, through implementation as a knowledge-based system (i.e., the AVIS system). Implementation in the form of a computer program assures adherence to the sequence of conflict analysis and resolution steps. It also eases the designer's task as much as possible. Nevertheless, the recognition and resolution rules which form the core of the research are valid independent of any implementation. The rules have been developed based on rules of modelling, on the E-R model and on database design principles, rather than through tracing of database design expert behavior.

Future extensions to the research will focus on at least two areas. First, more heuristics will be developed. This will not only simplify the user's task further, it will also shed more light on the question of how we can assess when two objects are identical in meaning. The assessment of meaning identity is the most difficult part of the integration process. Currently, the integration method does not decide on the identity of two objects without user consultation. It would be desirable to have the method decide, at least in some cases, whether two objects have the same meaning. One possible approach to extending the method in this direction is the development of categorization hierarchies for particular application areas. In this research, a very coarse categorization hierarchy has been introduced, one which facilitates deciding whether two objects have different meanings. More elaborate, as well as more domain-specific, hierarchies would allow a sharper distinction between concepts and thus allow for better judgment on identity or difference in meaning. This measure would require that users be very precise and explicit in their choice of names for entities, relationships, and attributes in the pre-integration stage. Hence, use of a categorization hierarchy may be one good source of evidence, but may not be sufficient. Ultimately, a procedure will have to use more sources of evidence and will have to be tolerant of user specification errors, in order to make judgments on meaning identity that are as good as human judgments.

A second area of extension to focus on is the detection of errors in user views.
The integration method in its current form assumes that views are complete (all relevant objects included), consistent (no conflicting knowledge), and minimal (each object represented only once).¹ If views are incorrect, inconsistent, or not minimal, the global schema will be incorrect, inconsistent, or not minimal. For example, if one view stated (incorrectly) that "all EMPLOYEEs are FULLTIME_EMPLOYEEs", while another view stated (correctly) that "every FULLTIME_EMPLOYEE is an EMPLOYEE", the method would represent both constraints in the global schema (inconsistency), not recognizing that the only logically correct interpretation of these two statements would require EMPLOYEE and FULLTIME_EMPLOYEE to be identical. Mistakes like this one could be detected and corrected during the integration process. To permit recognition of such errors, a set of error scenarios and correction rules would have to be developed.

¹ The constraints on input views may seem rather stringent. However, we can expect views to be in consistent and minimal form if they have been created with a view creation system such as Storey's (1988). Completeness has to be assumed, unless evidence to the contrary exists. All previously discussed integration approaches make similar demands on the inputs to their integration methods.
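The EMPLOYEE example suggests what one such error scenario could look like. The following Python sketch is a hypothetical illustration, not part of AVIS: it treats each stated subset (Isa) constraint from any view as a directed (sub, super) pair, closes the set of pairs transitively, and reports pairs of objects that end up as subsets of each other. As argued above, such a pair can only be read consistently if the two objects are identical. The function names and the pair format are assumptions made for this example.

```python
from itertools import combinations


def isa_closure(constraints):
    """Transitively close (sub, sup) pairs meaning "every sub is a sup"."""
    closure = set(constraints)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                # a <= b and b <= d imply a <= d; self-pairs are not recorded.
                if b == c and a != d and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def mutual_isa_pairs(constraints):
    """Pairs of objects that are (directly or indirectly) subsets of each other.

    Such objects denote the same set of instances, so a correction rule would
    merge them instead of recording two Isa relationships in the global schema.
    """
    closure = isa_closure(constraints)
    objects = sorted({name for pair in closure for name in pair})
    return [
        (a, b)
        for a, b in combinations(objects, 2)
        if (a, b) in closure and (b, a) in closure
    ]


view_1 = [("EMPLOYEE", "FULLTIME_EMPLOYEE")]  # "all EMPLOYEEs are FULLTIME_EMPLOYEEs"
view_2 = [("FULLTIME_EMPLOYEE", "EMPLOYEE")]  # "every FULLTIME_EMPLOYEE is an EMPLOYEE"
print(mutual_isa_pairs(view_1 + view_2))  # [('EMPLOYEE', 'FULLTIME_EMPLOYEE')]
```

Closing the pairs first also catches longer chains, such as A Isa B, B Isa C, C Isa A, which force all three objects to be identical even though no two views contradict each other directly.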
Another possible extension, which goes substantially beyond the scope of this research, is to translate the findings on database integration to knowledge base integration. Databases contain facts, while knowledge bases contain facts and rules, and knowledge bases are therefore much more difficult to integrate. Nevertheless, knowledge-based systems are increasingly being developed, and there are corresponding efforts to improve knowledge acquisition. Such a project may therefore become a fruitful endeavour in the future.

7. REFERENCES

Al-Fedaghi, S. and P. Scheuermann. Mapping Considerations in the Design of Schemas for the Relational Model. IEEE Transactions on Software Engineering, Vol. SE-7, No. 1, 1981.

Armstrong, W.W. Dependency Structures of Database Relationships. Proc. 1974 IFIP Congress. Amsterdam: North Holland, pp. 580-583.

Atzeni, P., C. Batini, M. Lenzerini, and F. Villanelli. INCOD: A System for Conceptual Design of Data and Transactions in the Entity-Relationship Model. Proceedings of the Second Int'l Conference on the Entity-Relationship Approach, Washington, D.C., October 1981, pp. 379-414.

Bachman, Charles W. and Manilal Daya. The Role Concept in Data Models. VLDB 77, pp. 464-476.

Barr, A. and E. Feigenbaum. The Handbook of Artificial Intelligence. London: Pitman, 1981.

Batini, C., M. Lenzerini, S.B. Navathe. A Comparative Analysis of Methodologies for Database Schema Integration. ACM Computing Surveys, Vol. 18, No. 4, 1986, pp. 323-364.

Batini, C., V. De Antonellis, A. Di Leva. Database Design Activities within the DATAID Project. Quarterly Bulletin of the IEEE Computer Society Technical Committee on Database Engineering, Vol. 7, No. 4, 1984, pp. 16-21. (1984a)

Batini, C., B. Demo, A. Di Leva. A Methodology for Conceptual Design of Office Databases. Information Systems, Vol. 9, No. 4, 1984. (1984b)

Batini, C., M. Talamo, and R. Tamassia. Computer Aided Layout of Entity Relationship Diagrams. Journal of Software and Systems, 1984. (1984c)

Batini, C., M. Lenzerini. A Methodology for Data Schema Integration in the Entity Relationship Model. IEEE Transactions on Software Engineering, Vol. 10, No. 6, 1984, pp. 650-663.

Batini, C., M. Lenzerini, M. Moscarini. Views Integration. In: Methodology and Tools for Data Base Design by S. Ceri (ed.). Amsterdam: North-Holland, 1983.

Batini, C. and M. Lenzerini. A Conceptual Foundation for View Integration. Proceedings of IFIP Working Conference, Budapest, Hungary, May 1983.

Batini, C., M. Lenzerini, and G. Santucci. Computer-Aided Methodology for Conceptual Database Design. Information Systems, Vol. 7, No. 3, 1982, pp. 265-280.

Beeri, C. and P.A. Bernstein. Computational Problems Related to the Design of Third Normal Form Schemas. ACM TODS, Vol. 4, No. 1, 1979, pp. 30-59.

Bernstein, P. Synthesizing Third Normal Form Relations from Functional Dependencies. ACM Transactions on Database Systems, Vol. 1, No. 4, December 1976, pp. 277-298.

Bernstein, Philip A., J.R. Swenson, and D.C. Tsichritzis. A Unified Approach to Functional Dependencies and Relations. Proc. ACM 1975 SIGMOD Conf., San Jose, California, pp. 237-245.

Biskup, Joachim and Bernhard Convent. A Formal View Integration Method. Int'l ACM SIGMOD Conf. 1986, pp. 398-407.

Biskup, Joachim and Bernhard Convent. A Formal View Integration Method. Forschungsbericht 208, Universität Dortmund, 1985.

Brodie, Michael. On the Development of Data Models. In On Conceptual Modelling by Michael Brodie, John Mylopoulos, Joachim Schmidt (eds.). New York: Springer, 1984.

Brown, Robert. Logical Database Design Techniques. Mountain View, CA: The Database Design Group, 1982.

Casanova, Marco. Theory of Data Dependencies over Relational Expressions. Proc. ACM SIGACT/SIGMOD Symp. on DB Systems, 1982, pp. 189-198.

Casanova, Marco and Ronald Fagin. Inclusion Dependencies and their Interaction with Functional Dependencies. Proc. ACM SIGACT/SIGMOD Symp. on DB Systems, 1982, pp. 171-176.

Casanova, M. and V. Vidal. A Sound Approach to View Integration. Proceedings of the ACM Conference on Principles of Database Systems, March 1983, pp. 36-47.

Ceri, S. and G. Gottlob. Normalization of Relations and Prolog. Communications of the ACM, Vol. 29, No. 1, 1986, pp. 524-544.

Chen, Peter. The Entity-Relationship Model: Towards a Unified View of Data. ACM TODS, Vol. 1, No. 1, 1976, pp. 9-36.

Curtice, Robert M. and Paul E. Jones, Jr. Logical Database Design. New York: Van Nostrand Reinhold Co., 1982.

Date, Chris. An Introduction to Database Systems. Reading: Addison-Wesley, 1981.

DeMarco, Tom. Structured Analysis and Systems Specification. Englewood Cliffs: Prentice-Hall, 1979.

Dyba, E. Principles of Data Element Identification. Auerbach Data Base Management Services, Portfolio No. 23-01-03, 1977.

Ein-Dor, Phillip. Commonsense Business Knowledge Representation: A Research Proposal. Working Paper, Tel-Aviv University, February 1987.

Elmasri, R., J. Larson and S. Navathe. Schema Integration Algorithms for Federated Databases and Logical Database Design. Technical Report, Honeywell Corporate Research Center, 1987.

Elmasri, Ramez and Sham Navathe. Object Integration in Logical Database Design. IEEE International Conference on Data Engineering, Los Angeles, 1984, pp. 426-433.

Elmasri, Ramez, A. Hevner, and J. Weeldreyer. The Category Concept: An Extension to the Entity-Relationship Model. Data and Knowledge Engineering, Vol. 1, No. 1, June 1985, pp. 75-116.

Elmasri, Ramez, James A. Larson, Sham Navathe, and T. Sashidar. Tools for View Integration. Quarterly Bulletin of the IEEE Computer Society Technical Committee on Database Engineering, Vol. 7, No. 4, 1984.

Elmasri, Ramez and G. Wiederhold. Properties of Relationships and their Representations. Proceedings of the National Computer Conference, AFIPS, Vol. 49, 1980, pp. 319-326.

Fagin, R. The Decomposition versus the Synthetic Approach to Relational Database Design. Proceedings of the 3rd VLDB, 1977, pp. 441-446.

Goldstein, Robert C. and Veda Storey. Unravelling Is-A Networks in Database Design. Working Paper, University of British Columbia, November 1988.

Hayes, Patrick. The Naive Physics Manifesto. In Expert Systems in the Micro Electronic Age by Donald Michie (ed.). Edinburgh: Edinburgh University Press, 1979, pp. 242-270.

Hayes-Roth, Frederick, Donald Waterman, Douglas Lenat. Building Expert Systems. Reading: Addison-Wesley, 1983.

Housel, Barron C., Vance E. Waddle, and S. Bing Yao. The Functional Model for Logical Database Design. Proceedings of the 5th VLDB, 1979, pp. 194-203.

Hubbard, G. and N. Raver. Automating Logical File Design. Proceedings 1st VLDB, 1975, pp. 227-253.

Leibniz, Gottfried Wilhelm. Philosophical Letters and Papers, Vol. 1 (English translation). Chicago: The University of Chicago Press, 1956.

Mannino, M. and W. Effelsberg. Matching Techniques in Global Schema Design. Proceedings IEEE COMPDEC, Los Angeles, 1984, pp. 418-425.

Martin, James. Managing the Database Environment. Englewood Cliffs: Prentice Hall, 1983.

McFadden, Fred and Jeffrey Hoffer. Data Base Management. Menlo Park: Benjamin Cummings, 1988.

Minsky, Marvin. A Framework for Representing Knowledge. In The Psychology of Computer Vision by P. Winston (ed.). New York: McGraw-Hill, 1975.

Mylopoulos, J. and H. Levesque. An Overview of Knowledge Representation. In On Conceptual Modelling by Brodie, Mylopoulos and Schmidt (eds.). New York: Springer, 1984, pp. 3-17.

Navathe, Shamkant, Ramez Elmasri, James Larson. Integrating User Views in Database Design. IEEE Computer, 1986, pp. 50-62.

Navathe, S. and S. Gadgil. A Methodology for View Integration in Logical Database Design. In Proc. ACM SIGMOD, Austin, 1978.

Navathe, Shamkant and Mario Schkolnick. View Representation in Logical Database Design. Proceedings Int'l ACM SIGMOD Conference, 1978, pp. 144-156.

New Orleans Database Design Workshop Report (Summary). VLDB, Rio (1979).

Nilsson, Nils. Principles of Artificial Intelligence. Palo Alto: Tioga Press, 1980.

Raver, N. and G.U. Hubbard. Automated Logical Database Design: Methodology and Techniques. IBM Systems Journal, Vol. 16, No. 3, 1977.

Robinson, J. A Machine-oriented Logic Based on the Resolution Principle. JACM, Vol. 12, No. 1, 1965, pp. 23-41.

Russell, Bertrand. A History of Western Philosophy. London: George Allen & Unwin, 1946.

Schank, Roger and Charles Rieger. Inference and the Computer Understanding of Natural Language. Artificial Intelligence, Vol. 5, No. 4, 1974, pp. 373-412.

Sheppard, D. Principles of Data Structure Design. Auerbach Data Base Management Series, Portfolio No. 23-01-04, 1977.

Shipman, D. The Functional Data Model and Data Language DAPLEX. ACM TODS, Vol. 6, No. 1, March 1980, pp. 140-173.

Simon, Herbert and A. Ando. Aggregation of Variables in Dynamic Systems. In Essays on the Structure of Social Science Models by Ando, Fisher, and Simon. Cambridge: MIT Press, 1963.

Storey, Veda. View Creation: An Expert System for Database Design. Washington: ICIT Press, 1988.

Teorey, T.J. and J.P. Fry. Design of Database Structures. Englewood Cliffs: Prentice Hall, 1982.

Ullman, Jeffrey. Principles of Database Systems. Stanford: Computer Science Press, 1980.

Vessey, Iris and Ron Weber. Structured Tools and Conditional Logic: An Empirical Investigation. Communications of the ACM, Vol. 29, No. 1, January 1986, pp. 48-57.

Vetter, M. Database Design by Implied Data Synthesis. VLDB 77, pp. 428-440.

Waterman, Donald A. A Guide to Expert Systems. Reading: Addison-Wesley, 1986.

Weber, Ron. Data Models Research in Accounting: An Evaluation of Wholesale Distribution Software. The Accounting Review, Vol. 61, No. 3, July 1986, pp. 498-518.

Yao, S. Bing, Vance E. Waddle, Barron C. Housel. View Modeling and Integration Using the Functional Data Model. IEEE Transactions on Software Engineering, Vol. SE-8, November 1982, pp. 544-553.

Yao, S. Bing, Vance E. Waddle, Barron C. Housel. An Interactive System for Database Design and Integration. In Principles of Database Design, Vol. 1, S. Bing Yao (ed.). Englewood Cliffs, N.J.: Prentice Hall, 1985.

APPENDIX

Appendix 1: Conflict Cases

1. IDENTICAL OBJECTS
    N1 = N2; T1 = T2; M1 = M2; C1 = C2;
    Solution: do nothing.
    1.1. Entity is Entity.
    1.2. Relationship is Relationship.
    1.3. Attribute is Attribute.

2. IDENTICAL OBJECTS WITH DIFFERENT CONTEXT
    N1 = N2; T1 = T2; M1 = M2; C1 <> C2;
    2.1. Relationship is Relationship of different degree or associating different entities.
        Solution: tie entities that are not yet associated to the relationship(s). If the entities cannot be found, test for construct mismatch (5.2.1. or 6.2.1.) and missing entity (17.1.).
    2.2. Attribute is Attribute of a different entity or relationship (both are possession attributes).
        Solution: convert both attributes into E-R constructs or entities, similar to 6.2. or 6.3.

3. TRUE SYNONYMS (SAME OBJECT TYPE)
    N1 <> N2; T1 = T2; M1 = M2; C1 = C2;
    Solution: rename at least one object so that N1 = N2.
    3.1. Entity/Entity.
    3.2. Relationship/Relationship.
    3.3. Attribute/Attribute.

4. TRUE SYNONYMS WITH DIFFERENT CONTEXT
    N1 <> N2; T1 = T2; M1 = M2; C1 <> C2;
    Solution: rename and make contexts identical (combine solutions 3. and 2.).
    4.1. Relationship/Relationship.
    4.2. Attribute/Attribute.

5. CONSTRUCT MISMATCH
    N1 = N2; T1 <> T2; M1 = M2; C1 <> C2;
    5.1. Entity is Relationship.
        Solution: convert the relationship into an entity. Create new relationships that associate the new entity with the entities the original relationship associated.
    5.2. Entity Attribute is Entity-Relationship construct.
        Solution: convert the attribute into an E-R construct (entity and relationship).
        5.2.1. Attribute is Entity.
        5.2.2. Attribute is Relationship.
    5.3. Relationship Attribute is Entity.
        Solution: convert the attribute into an entity.

6. CONSTRUCT MISMATCH AND SYNONYM
    N1 <> N2; T1 <> T2; M1 = M2; C1 <> C2;
    Solution: rename objects to make names identical and deal with construct mismatches as in 5.
    6.1. Entity is Relationship.
    6.2. Entity Attribute is Entity-Relationship construct.
        6.2.1. Attribute is Entity.
        6.2.2. Attribute is Relationship.
    6.3. Relationship Attribute is Entity.

7. DIFFERENT AND UNRELATED OBJECTS
    N1 <> N2; T1 = T2; M1 <> M2; not(related(M1,M2)); C1 = C2 or C1 <> C2;
    7.1. Objects are different, unrelated and have no common role.
        Solution: do nothing.
        7.1.1. Entity/Entity.
        7.1.2. Relationship/Relationship.
        7.1.3. Attribute/Attribute.
    7.2. Object 1 and Object 2 in same role (Wrelationship).
        Solution: create a common role object, special role objects, and Isa relationships between the role objects and objects O1 and O2. If the objects are not entities, transform them into entities first.
        7.2.1. Entity/Entity.
        7.2.2. Relationship/Relationship.
        7.2.3. Attribute/Attribute.

8. TRUE HOMONYM
    N1 = N2; T1 = T2; M1 <> M2; C1 = C2 or C1 <> C2;
    Solution: rename at least one object, giving it a name that is not assigned to any other object in the view. Thereafter treat common role occurrences as in 7.
    8.1. Objects are different, unrelated and have no common role.
        8.1.1. Entity/Entity.
        8.1.2. Relationship/Relationship.
        8.1.3. Attribute/Attribute.
    8.2. Object 1 and Object 2 in same role (Wrelationship).
        8.2.1. Entity/Entity.
        8.2.2. Relationship/Relationship.
        8.2.3. Attribute/Attribute.

9. DIFFERENT OBJECTS WITH DIFFERENT CONSTRUCTS
    N1 <> N2; T1 <> T2; M1 <> M2; C1 <> C2;
    9.1. Objects are different, unrelated and have no common role.
        Solution: do nothing.
        9.1.1. Entity/Relationship.
        9.1.2. Relationship/Attribute.
        9.1.3. Entity/Attribute.
    9.2. Object 1 and Object 2 in same role (Wrelationship).
        Solution: create a common role object, special role objects, and Isa relationships between the role objects and objects O1 and O2. If the objects are not entities, transform them into entities first.
        9.2.1. Entity/Relationship.
        9.2.2. Relationship/Attribute.
        9.2.3. Entity/Attribute.

10. DIFFERENT OBJECTS WITH DIFFERENT CONSTRUCTS, BUT HOMONYMS
    N1 = N2; T1 <> T2; M1 <> M2; C1 <> C2;
    Solution: treat objects like true homonyms. Change the name of at least one object to make it different from all other object names in the same view. Treat common role objects as in 9.
    10.1. Objects are different, unrelated and have no common role.
        10.1.1. Entity/Relationship.
        10.1.2. Relationship/Attribute.
        10.1.3. Entity/Attribute.
    10.2. Object 1 and Object 2 in same role (Wrelationship).
        10.2.1. Entity/Relationship.
        10.2.2. Relationship/Attribute.
        10.2.3. Entity/Attribute.

11. DIFFERENT BUT RELATED OBJECTS
    N1 <> N2; T1 = T2; M1 <> M2; related(M1,M2); C1 = C2;
    11.1. One object contains the other (Object 1 contains Object 2 or vice versa).
        Solution: create an Isa relationship between the two objects.
        11.1.1. Entity/Entity.
        11.1.2. Relationship/Relationship.
        11.1.3. Attribute/Attribute.
            Solution: before creating an Isa relationship, convert attributes into entities (for relationship attributes) or into E-R constructs (for entity attributes).
    11.2. Object 1 and Object 2 have a common superset (but do not overlap).
        Solution: create a superset object and Isa relationships from objects O1 and O2 to the superset object.
        11.2.1. Entity/Entity.
        11.2.2. Relationship/Relationship.
        11.2.3. Attribute/Attribute.
            Solution: precede the general solution by transformation into entities or E-R constructs.
    11.3. Object 1 and Object 2 have a common superset and overlap.
        Solution: combine solutions for 11.2. and 11.3.
        11.3.1. Entity/Entity.
        11.3.2. Relationship/Relationship.
        11.3.3. Attribute/Attribute.

12. DIFFERENT BUT RELATED HOMONYMS
    N1 = N2; T1 = T2; M1 <> M2; related(M1,M2); C1 = C2;
    Solution: rename and solve as in 11.
    12.1. One object contains the other (Object 1 contains Object 2 or vice versa).
        12.1.1. Entity/Entity.
        12.1.2. Relationship/Relationship.
        12.1.3. Attribute/Attribute.
    12.2. Object 1 and Object 2 have a common superset (but do not overlap).
        12.2.1. Entity/Entity.
        12.2.2. Relationship/Relationship.
        12.2.3. Attribute/Attribute.
    12.3. Object 1 and Object 2 have a common superset and overlap.
        12.3.1. Entity/Entity.
        12.3.2. Relationship/Relationship.
        12.3.3. Attribute/Attribute.

13. DIFFERENT BUT RELATED OBJECTS WITH DIFFERENT CONTEXT
    N1 <> N2; T1 = T2; M1 <> M2; related(M1,M2); C1 <> C2;
    13.1. Entity Attribute related to Entity Attribute of a different entity.
        Solution: transform attributes into E-R constructs and solve relatedness as in case 11.
        13.1.1. Attribute 1 contains Attribute 2 (or vice versa).
        13.1.2. Common superset.
        13.1.3. Common subset and superset.
    13.2. Entity Attribute related to Relationship Attribute.
        Solution: transform the entity attribute into an E-R construct and the relationship attribute into an entity, then solve relatedness as in 11.
        13.2.1. Attribute 1 contains Attribute 2 (or vice versa).
        13.2.2. Common superset.
        13.2.3. Common subset and superset.
    13.3. Relationship Attribute related to Relationship Attribute.
        Solution: transform attributes into entities and solve relatedness as in 11.
        13.3.1. Attribute 1 contains Attribute 2 (or vice versa).
        13.3.2. Common superset.
        13.3.3. Common subset and superset.
    13.4. Relationship related to Relationship.
        Solution: transform relationships into entities and solve relatedness as in 11.
        13.4.1. Relationship 1 contains Relationship 2 (or vice versa).
        13.4.2. Common superset.
        13.4.3. Common subset and superset.

14. DIFFERENT BUT RELATED HOMONYMS WITH DIFFERENT CONTEXT
    N1 = N2; T1 = T2; M1 <> M2; related(M1,M2); C1 <> C2;
    Solution: rename to avoid the homonym and solve as in 13.
    14.1. Entity Attribute related to Entity Attribute of a different entity.
        14.1.1. Attribute 1 contains Attribute 2 (or vice versa).
        14.1.2. Common superset.
        14.1.3. Common subset and superset.
    14.2. Entity Attribute related to Relationship Attribute.
        14.2.1. Attribute 1 contains Attribute 2 (or vice versa).
        14.2.2. Common superset.
        14.2.3. Common subset and superset.
    14.3. Relationship Attribute related to Relationship Attribute.
        14.3.1. Attribute 1 contains Attribute 2 (or vice versa).
        14.3.2. Common superset.
        14.3.3. Common subset and superset.
    14.4. Relationship related to Relationship.
        14.4.1. Relationship 1 contains Relationship 2 (or vice versa).
        14.4.2. Common superset.
        14.4.3. Common subset and superset.

15. DIFFERENT BUT RELATED OBJECTS OF DIFFERENT TYPE
    N1 <> N2; T1 <> T2; M1 <> M2; related(M1,M2); C1 <> C2;
    15.1. Entity Attribute related to Entity-Relationship construct.
        Solution: transform the entity attribute into an E-R construct and solve relatedness as in 11.
        15.1.1. Entity Attribute related to Entity.
            15.1.1.1. One object contains the other.
            15.1.1.2. Common superset.
            15.1.1.3. Common subset and superset.
        15.1.2. Entity Attribute related to Relationship.
            15.1.2.1. One object contains the other.
            15.1.2.2. Common superset.
            15.1.2.3. Common subset and superset.
    15.2. Relationship Attribute related to Entity.
        15.2.1. One object contains the other.
        15.2.2. Common superset.
        15.2.3. Common subset and superset.
    15.3. Entity related to Relationship.
        15.3.1. One object contains the other.
        15.3.2. Common superset.
        15.3.3. Common subset and superset.

16. DIFFERENT BUT RELATED HOMONYMS OF DIFFERENT TYPE
    N1 = N2; T1 <> T2; M1 <> M2; related(M1,M2); C1 <> C2;
    Solution: rename at least one object to avoid the homonym and solve as in 15.
    16.1. Entity Attribute related to Entity-Relationship construct.
        16.1.1. Entity Attribute related to Entity.
            16.1.1.1. One object contains the other.
            16.1.1.2. Common superset.
            16.1.1.3. Common subset and superset.
        16.1.2. Entity Attribute related to Relationship.
            16.1.2.1. One object contains the other.
            16.1.2.2. Common superset.
            16.1.2.3. Common subset and superset.
    16.2. Relationship Attribute related to Entity.
        16.2.1. One object contains the other.
        16.2.2. Common superset.
        16.2.3. Common subset and superset.
    16.3. Entity related to Relationship.
        16.3.1. One object contains the other.
        16.3.2. Common superset.
        16.3.3. Common subset and superset.

17. MISSING OBJECT
    Object 2 does not exist.
    Solution: add the missing object.
    17.1. Entity missing.
    17.2. Relationship missing.
    17.3. Attribute missing.

Appendix 2: Conflict Solutions

1. IDENTICAL OBJECTS
    N1 = N2; T1 = T2; M1 = M2; C1 = C2;
    Solution: do nothing.
    1.1. Entity is Entity.
    1.2. Relationship is Relationship.
    1.3. Attribute is Attribute.

2. IDENTICAL OBJECTS WITH DIFFERENT CONTEXT
    N1 = N2; T1 = T2; M1 = M2; C1 <> C2;
    2.1. Relationship is Relationship of different degree or associating different entities.
        Solution: S4, possibly S1 or S2 or S11.
    2.2. Attribute is Attribute of a different entity or relationship (both are possession attributes).
        Solution: S2 or S3.

3. TRUE SYNONYMS (SAME OBJECT TYPE)
    N1 <> N2; T1 = T2; M1 = M2; C1 = C2;
    Solution: S10.
    3.1. Entity/Entity.
    3.2. Relationship/Relationship.
    3.3. Attribute/Attribute.

4. TRUE SYNONYMS WITH DIFFERENT CONTEXT
    N1 <> N2; T1 = T2; M1 = M2; C1 <> C2;
    4.1. Relationship/Relationship.
        Solution: S10 and S4, possibly S1, S2, or S11.
    4.2. Attribute/Attribute.
        Solution: S10 and S2 or S3.

5. CONSTRUCT MISMATCH
    N1 = N2; T1 <> T2; M1 = M2; C1 <> C2;
    5.1. Entity is Relationship.
        Solution: S1.
    5.2. Entity Attribute is Entity-Relationship construct.
        Solution: S3.
        5.2.1. Attribute is Entity.
        5.2.2. Attribute is Relationship.
    5.3. Relationship Attribute is Entity.
        Solution: S2.

6. CONSTRUCT MISMATCH AND SYNONYM
    N1 <> N2; T1 <> T2; M1 = M2; C1 <> C2;
    6.1. Entity is Relationship.
        Solution: S10 and S1.
    6.2. Entity Attribute is Entity-Relationship construct.
        Solution: S10 and S3.
        6.2.1. Attribute is Entity.
        6.2.2. Attribute is Relationship.
    6.3. Relationship Attribute is Entity.
        Solution: S10 and S2.

7. DIFFERENT AND UNRELATED OBJECTS
    N1 <> N2; T1 = T2; M1 <> M2; not(related(M1,M2)); C1 = C2 or C1 <> C2;
    7.1. Objects are different, unrelated and have no common role.
        Solution: do nothing.
        7.1.1. Entity/Entity.
        7.1.2. Relationship/Relationship.
        7.1.3. Attribute/Attribute.
    7.2. Object 1 and Object 2 in same role (Wrelationship).
        7.2.1. Entity/Entity.
            Solution: S7.
        7.2.2. Relationship/Relationship.
            Solution: S1 and S7.
        7.2.3. Attribute/Attribute.
S o l u t i o n : S2 or S3 followed by S7.  8. TRUE HOMONYM N l = N2; T l = T2;  Ml <> M2;  CI = C2 o r CI <> C2;  8.1.  O b j e c t s a r e d i f f e r e n t , u n r e l a t e d and have no common r o l e . S o l u t i o n : S10. 8.1.1. Entity/Entity. 8.1.2. Relationship/Relationship. 8.1.3. Attribute/Attribute. 8.2. O b j e c t 1 a n d O b j e c t 2 i n same r o l e (Wrelationship). 8.2.1. Entity/Entity. S o l u t i o n : S10 followed by S7. 8.2.2. Relationship/Relationship. 232  8.2.3.  S o l u t i o n : S10 and SI followed by S7. Attribute/Attribute. S o l u t i o n : S10 and S2 or S3 followed by S7.  9. DIFFERENT OBJECTS WITH DIFFERENT CONSTRUCTS N l <> N2; T l <> T2; Ml <> M2; CI <> C2; 9.1.  O b j e c t s a r e d i f f e r e n t , u n r e l a t e d and have no common r o l e . S o l u t i o n : do nothing. 9.1.1. Entity/Relationship. 9.1.2. Relationship/Attribute. 9.1.3. Entity/Attribute. 9.2. O b j e c t 1 a n d O b j e c t 2 i n same r o l e (Wrelationship). 9.2.1. Entity/Relationship. S o l u t i o n : SI followed by S7. 9.2.2. Relationship/Attribute. S o l u t i o n : SI and S2 or S3 followed by S7. 9.2.3. Entity/Attribute. S o l u t i o n : S2 or S3 followed by S7. 10. DIFFERENT OBJECTS WITH DIFFERENT CONSTRUCTS. BUT HOMONYMS N l = N2; T l <> T2; Ml <> M2; CI <> C2; 10.1.  O b j e c t s a r e d i f f e r e n t , u n r e l a t e d and have no common r o l e . S o l u t i o n : S10. 10.1.1. Entity/Relationship. 10.1.2. Relationship/Attribute. 10.1.3. Entity/Attribute. 10.2. O b j e c t 1 a n d O b j e c t 2 i n same r o l e (Wrelationship). 10.2.1. Entity/Relationship. S o l u t i o n : S10 and SI followed by S7. 10.2.2. Relationship/Attribute. S o l u t i o n : S10, SI and S2 or S3, followed by S7. 10.2.3. Entity/Attribute. S o l u t i o n : S10 and S2 or S3, followed by S7. 11. DIFFERENT BUT RELATED OBJECTS N l <> N2; T l = T2; Ml <> M2; related(Ml,M2); 233  CI = C2;  11.1.  
One o b j e c t c o n t a i n s t h e o t h e r ( O b j e c t contains Object 2 o r v i c e v e r s a ) .  11.1.1. 11.1.2. 11.1.3. 11.2.  by S6.  Object 1 and O b j e c t 2 have a common s u p e r s e t (but do not o v e r l a p ) .  11.2.1. 11.2.2. 11.2.3. 11.3.  Entity/Entity. S o l u t i o n : S6. Relationship/Relationship. S o l u t i o n : SI and S6. Attribute/Attribute. S o l u t i o n : S2 or S3, followed  1  Entity/Entity. S o l u t i o n : S8. Relationship/Relationship. S o l u t i o n : SI and S8. Attribute/Attribute. S o l u t i o n : S2 or S3, followed  by S8.  Object 1 and Object 2 have a common s u p e r s e t and o v e r l a p  11.3.1. 11.3.2. 11.3.3.  Entity/Entity. S o l u t i o n : S9. Relationship/Relationship. S o l u t i o n : SI and S9. Attribute/Attribute. S o l u t i o n : S2 or S3, followed  12. DIFFERENT BUT RELATED HOMONYMS N l = N2; T l = T2; Ml <> M2; r e l a t e d ( M l , M 2 ) ; 12.1.  by S9.  CI = C2;  One o b j e c t c o n t a i n s t h e o t h e r ( O b j e c t 1 c o n t a i n s Object 2 o r v i c e v e r s a ) . 12.1.1. Entity/Entity. S o l u t i o n : S10 and S6. 12.1.2. Relationship/Relationship. S o l u t i o n : S10 and SI and S6. 12.1.3. Attribute/Attribute. S o l u t i o n : S10 and S2 or S3, followed by S6. 12.2. Object 1 and Object 2 have a common s u p e r s e t (but do not o v e r l a p ) . 12.2.1. Entity/Entity. S o l u t i o n : S10 and S8. 12.2.2. Relationship/Relationship. S o l u t i o n : S10 and SI and SB. 12.2.3. Attribute/Attribute. S o l u t i o n : S10 and S2 or S3, followed by S8. 234  12.3.  Object 1 and Object 2 have a common s u p e r s e t and o v e r l a p . 12.3.1. Entity/Entity. S o l u t i o n : S10 and S9. 12.3.2. Relationship/Relationship. S o l u t i o n : S10 and SI and S9. 12.3.3. Attribute/Attribute. S o l u t i o n : S10 and S2 or S3, followed by S9.  13. DIFFERENT BUT RELATED OBJECTS WITH DIFFERENT CONTEXT N l <> N2; T l = T2; Ml <> M2; r e l a t e d ( M l , M 2 ) ; CI <> C2; 13.1.  
Entity Attribute related to Entity Attribute of a different entity.
13.1.1. Attribute 1 contains Attribute 2 (or vice versa). Solution: S3 and S6.
13.1.2. Common superset. Solution: S3 and S8.
13.1.3. Common subset and superset. Solution: S3 and S9.
13.2. Entity Attribute related to Relationship Attribute.
13.2.1. Attribute 1 contains Attribute 2 (or vice versa). Solution: S2 and S3 and S6.
13.2.2. Common superset. Solution: S2 and S3 and S8.
13.2.3. Common subset and superset. Solution: S2 and S3 and S9.
13.3. Relationship Attribute related to Relationship Attribute.
13.3.1. Attribute 1 contains Attribute 2 (or vice versa). Solution: S2 and S6.
13.3.2. Common superset. Solution: S2 and S8.
13.3.3. Common subset and superset. Solution: S2 and S9.
13.4. Relationship related to Relationship.
13.4.1. Relationship 1 contains Relationship 2 (or vice versa). Solution: S1 and S6.
13.4.2. Common superset. Solution: S1 and S8.
13.4.3. Common subset and superset. Solution: S1 and S9.

14. DIFFERENT BUT RELATED HOMONYMS WITH DIFFERENT CONTEXT
N1 = N2; T1 = T2; M1 <> M2; related(M1,M2); C1 <> C2;
14.1. Entity Attribute related to Entity Attribute of a different entity.
14.1.1. Attribute 1 contains Attribute 2 (or vice versa). Solution: S10 and S3 and S6.
14.1.2. Common superset. Solution: S10 and S3 and S8.
14.1.3. Common subset and superset. Solution: S10 and S3 and S9.
14.2. Entity Attribute related to Relationship Attribute.
14.2.1. Attribute 1 contains Attribute 2 (or vice versa). Solution: S10 and S2 and S3 and S6.
14.2.2. Common superset. Solution: S10 and S2 and S3 and S8.
14.2.3. Common subset and superset. Solution: S10 and S2 and S3 and S9.
14.3. Relationship Attribute related to Relationship Attribute.
14.3.1. Attribute 1 contains Attribute 2 (or vice versa). Solution: S10 and S2 and S6.
14.3.2. Common superset. Solution: S10 and S2 and S8.
14.3.3. Common subset and superset. Solution: S10 and S2 and S9.
14.4. Relationship related to Relationship.
14.4.1. Relationship 1 contains Relationship 2 (or vice versa). Solution: S10 and S1 and S6.
14.4.2. Common superset. Solution: S10 and S1 and S8.
14.4.3. Common subset and superset. Solution: S10 and S1 and S9.

15. DIFFERENT BUT RELATED OBJECTS OF DIFFERENT TYPE
N1 <> N2; T1 <> T2; M1 <> M2; related(M1,M2); C1 <> C2;
15.1. Entity Attribute related to Entity-Relationship construct.
15.1.1. Entity Attribute related to Entity.
15.1.1.1. One object contains the other. Solution: S3 and S6.
15.1.1.2. Common superset. Solution: S3 and S8.
15.1.1.3. Common subset and superset. Solution: S3 and S9.
15.1.2. Entity Attribute related to Relationship.
15.1.2.1. One object contains the other. Solution: S3 and S6.
15.1.2.2. Common superset. Solution: S3 and S8.
15.1.2.3. Common subset and superset. Solution: S3 and S9.
15.2. Relationship Attribute related to Entity.
15.2.1. One object contains the other. Solution: S2 and S6.
15.2.2. Common superset. Solution: S2 and S8.
15.2.3. Common subset and superset. Solution: S2 and S9.
15.3. Entity related to Relationship.
15.3.1. One object contains the other. Solution: S1 and S6.
15.3.2. Common superset. Solution: S1 and S8.
15.3.3. Common subset and superset. Solution: S1 and S9.

16. DIFFERENT BUT RELATED HOMONYMS OF DIFFERENT TYPE
N1 = N2; T1 <> T2; M1 <> M2; related(M1,M2); C1 <> C2;
16.1. Entity Attribute related to Entity-Relationship construct.
16.1.1. Entity Attribute related to Entity.
16.1.1.1. One object contains the other. Solution: S10 and S3 and S6.
16.1.1.2. Common superset. Solution: S10 and S3 and S8.
16.1.1.3. Common subset and superset. Solution: S10 and S3 and S9.
16.1.2. Entity Attribute related to Relationship.
16.1.2.1. One object contains the other. Solution: S10 and S3 and S6.
16.1.2.2. Common superset. Solution: S10 and S3 and S8.
16.1.2.3. Common subset and superset. Solution: S10 and S3 and S9.
16.2. Relationship Attribute related to Entity.
16.2.1. One object contains the other. Solution: S10 and S2 and S6.
16.2.2. Common superset. Solution: S10 and S2 and S8.
16.2.3. Common subset and superset. Solution: S10 and S2 and S9.
16.3. Entity related to Relationship.
16.3.1. One object contains the other. Solution: S10 and S1 and S6.
16.3.2. Common superset. Solution: S10 and S1 and S8.
16.3.3. Common subset and superset. Solution: S10 and S1 and S9.

17. MISSING OBJECT
Object 2 does not exist. Solution: S11.
17.1. Entity missing.
17.2. Relationship missing.
17.3. Attribute missing.
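The solution entries for cases 13 to 17 follow a regular pattern: one or two steps fixed by the pair of object types (S1, S2, S3), one step fixed by how the two objects overlap (S6 when one contains the other, S8 for a common superset, S9 for a common subset and superset), S10 prepended in the homonym cases 14 and 16, and S11 alone when object 2 is missing. The Python sketch below encodes that pattern as a lookup table. The type and overlap labels are hypothetical names chosen for illustration, and the meaning of each solution step S1 to S11 is defined in the main text and is not modelled here.

```python
from typing import List, Optional

# Step fixed by how the two related objects overlap (hypothetical labels).
OVERLAP_STEP = {
    "contains": "S6",                    # one object contains the other
    "common_superset": "S8",
    "common_subset_and_superset": "S9",
}

# Steps fixed by the pair of object types, read off cases 13-16.
# Keys are sorted so that the order of object 1 and object 2 does not matter.
TYPE_STEPS = {
    ("entity_attribute", "entity_attribute"): ["S3"],   # attributes of different entities
    ("entity_attribute", "relationship_attribute"): ["S2", "S3"],
    ("relationship_attribute", "relationship_attribute"): ["S2"],
    ("relationship", "relationship"): ["S1"],
    ("entity", "entity_attribute"): ["S3"],
    ("entity_attribute", "relationship"): ["S3"],
    ("entity", "relationship_attribute"): ["S2"],
    ("entity", "relationship"): ["S1"],
}


def solution(type1: str, type2: Optional[str], overlap: Optional[str] = None,
             homonym: bool = False) -> List[str]:
    """Return the solution steps for one conflict, in the order listed above."""
    if type2 is None:                    # case 17: object 2 does not exist
        return ["S11"]
    key = tuple(sorted((type1, type2)))
    # Unlisted type pairs or overlap labels raise KeyError.
    steps = TYPE_STEPS[key] + [OVERLAP_STEP[overlap]]
    return ["S10"] + steps if homonym else steps


# Case 14.2.1: homonymous entity and relationship attributes, one contains the other.
print(solution("entity_attribute", "relationship_attribute", "contains", homonym=True))
# prints ['S10', 'S2', 'S3', 'S6']
```

Sorting the key lets a single table entry cover both "Entity Attribute related to Entity" and its mirror image; a table keyed on ordered pairs would need every entry twice.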
Appendix 3: View Integration Session with AVIS

A view integration session with AVIS is illustrated through a set of 22 screen displays. The problem "c34" consists of two small views which have to be integrated. Figure 28 depicts the structure of the views.

Figure 28: View Integration Sample Problem (diagram; View 1: BRANCH, Contract; View 2: DEALER, CONTRACT)

The screens shown below exemplify questions asked by the AVIS system as well as AVIS' support functions. These support functions indicate, for instance, what the program already knows or what the current contents of each view are. Example screens which display system questions to the user do not depict user replies. The short summary description of each screen below, however, states the user's answers and documents the purpose of each screen.

Screen    Purpose
1         AVIS title screen; asks the user to choose a problem file. Chosen here: "c34".
2         First system question. The user answers "1003".

The following screens 3 - 8 exemplify support functions which can be activated at any time during the integration session when the system is ready to accept input. Some of the screens may initially have little or no content, e.g., screen 4. They are shown here to demonstrate the system status at the beginning of an integration session and to allow a comparison with later system states. Screens 3 - 8 show the system status before the user's answer "1003"; the user gave this answer after seeing screen 8.

3         Shows the "Agenda", consisting of present and future object comparison tasks (preview).
4         Shows the "Old Agenda", consisting of current and previous object comparison tasks (history log).
5 & 6     Show the contents of views 1 and 2 (at the outset of the integration session).
7         Shows the list of "facts", the knowledge about the set of views based on previous object comparisons. Here the list is still empty.
8         Meaning comparison screen. Shows what the system knows about the match between objects. Here, the best match is with "1003 - dealer".
9         System question 2. The user answers "n".
10        System question 3. The user answers "n", but not until seeing screen 11.
11        The "Old Agenda" now shows the previous four system questions. Note that the system never asked the user about Synonym (agenda item 2), because it can assess without user help whether objects carry different names (simple string comparison).
12        System question 4. The user answers "0", but not until seeing screen 14.
13        The "Meaning match" support function suggests no similarity.
14        The fact list shows the knowledge asserted at this point in time; for instance, objects 3 and 1003 are identical (same).
15        System question 5. The user answers "0".
16        System question 6. The user answers "1005". Note that the system reports in the lower right window that in the meantime a new object, 2013 - branch, has been added.
17        System question 7. The user answers "n". The number 18 in the upper right-hand corner of the screen shows that the system has internally created 18 questions but has asked the user only 7. The remaining ones were answered by the system.
18        "THE AGENDA IS EMPTY". The system has created two identical views without further questions to the user. Note the internal count of 30 questions (upper right corner).
19        The "Old Agenda" shows the last 12 questions, answered by the system without user interaction.
20        First part of the fact list.
21 & 22   The adjusted views 1 and 2. All newly created objects can be identified by their object identifiers (>2000).

SCREEN 1

A V I S - Automated View Integration Systems
by Christian Wagner, 1988.

Response: What is the name of the file containing the views? (TEST.PRO) c34

SCREEN 2

Testing for hypothesis: SIMILAR ENTITY, involving the entity DEALER (3) and one of the following objects:
1003 - dealer
1004 - customer
1005 - contract
Make Agenda / New Objects / Hypo Test / Assert Objects: (empty)
Response: Please type in the number of the applicable object (or 0 if none).

SCREEN 3

Testing for hypothesis: SIMILAR ENTITY, involving the entity DEALER (3) and one of the following objects:
Agenda
Current Agenda Item: 3 Similar Entity - [3][1003,1004,1005]
Response: Press <spacebar> to continue

SCREEN 4

A V I S
Testing for hypothesis: SIMILAR ENTITY, involving the entity DEALER (3) and one of the following objects:
Old Agenda
-1-> 3 - Similar Entity - [3][1003,1004,1005]
Response: Press <spacebar> to continue

SCREEN 5

Objects
VIEW 1 - RELATIONSHIPS
502 - supply["dealer","branch"]
VIEW 1 - ENTITIES
3 - dealer["contract"]
4 - branch["branch_no"]
Response: Press <spacebar> to continue

SCREEN 6

Objects
VIEW 2 - RELATIONSHIPS
1502 - dealer_contract["dealer","contract"]
1503 - customer_contract["customer","contract"]
VIEW 2 - ENTITIES
1003 - dealer["dealer_no"]
1004 - customer["customer_no"]
1005 - contract["contract_no"]
Response: Press <spacebar> to continue

SCREEN 7

Testing for hypothesis: SIMILAR ENTITY, involving the entity DEALER (3) and one of the following objects:
1003 - dealer
1004 - customer
1005 - contract
Facts: (empty)
Make Agenda / Hypo Test: (empty)
Response: Press <spacebar> to continue

SCREEN 8

Testing for hypothesis: SIMILAR ENTITY, involving the entity DEALER (3) and one of the following objects:
Meaning Match
Match between entity DEALER (3) ["sells","supplies"] and objects below:
ID      NAME        Match of: NAME   MEANING   CONTEXT
1003    dealer                y      y         unknown
1004    customer              n      n         unknown
1005    contract              n      n         unknown
Response: Press <spacebar> to continue

SCREEN 9

Testing for hypothesis: SIMILAR RELATIONSHIP, involving relationship SUPPLY (502) and relationship DEALER_CONTRACT (1502)
Make Agenda: 3 -> agenda(similar_meaning,[3],[1003])
New Objects: (empty)
Hypo Test: TO BE EXECUTED: similar_meaning([502],[1502])
Assert Objects: ao(4,3,1003,n)
Response: Please answer with y or n to indicate whether the hypothesis is true or false.
test_hypo(7)

SCREEN 10

Testing for hypothesis: RELATED RELATIONSHIP, involving relationship SUPPLY (502) and relationship DEALER_CONTRACT (1502)
Make Agenda: 1 -> agenda(homonyms,[502],[1502])
New Objects: (empty)
Hypo Test: TO BE EXECUTED: related([502],[1502])
Assert Objects: ao(1,502,1502,n)
Response: Please answer with y or n to indicate whether the hypothesis is true or false.
test_hypo(7)

SCREEN 11

Testing for hypothesis: RELATED RELATIONSHIP, involving relationship SUPPLY (502) and relationship DEALER_CONTRACT (1502)
Old Agenda
1: -1-> 3    Similar Entity - [3][1003,1004,1005]
2:  3-> 4    Synonym - [3][1003]
3:  o-> 1    Similar Relationship - [502][1502]
4:  1-> 13
Response: Press <spacebar> to continue
test_hypo(7)

SCREEN 12

Testing for hypothesis: ENTITY ATTRIBUTE IS RELATIONSHIP CONSTRUCT, involving the attr DEALER_NO (2001) and one of the following objects:
4 - branch
502 - supply
Make Agenda: 13 -> agenda(ea_is_rc,[502],[1502])
New Objects: (empty)
Hypo Test: TO BE EXECUTED: ea_is_rc([2001],[4,502])
Assert Objects: ao(13,502,1502,n)
Response: Please type in the number of the applicable object (or 0 if none).
test_hypo(7)

SCREEN 13

Testing for hypothesis: ENTITY ATTRIBUTE IS RELATIONSHIP CONSTRUCT, involving the attr DEALER_NO (2001) and one of the following objects:
Meaning Match
Match between attr DEALER_NO (2001) ["key"] and objects below:
ID      NAME        Match of: NAME   MEANING   CONTEXT
4       branch                n      n         none
502     supply                n      n         none
Response: Press <spacebar> to continue
test_hypo(7)

SCREEN 14

Testing for hypothesis: ENTITY ATTRIBUTE IS RELATIONSHIP CONSTRUCT, involving the attr DEALER_NO (2001) and one of the following objects:
4 - branch
502 - supply
Facts
similar_meaning(3,1003)
same(3,1003)
dissimilar_meaning(502,1502)
unrelated(502,1502)
Make Agenda: 13 -> agenda(ea_is_rc,[502],[1502])
Hypo Test: TO BE EXECUTED: ea_is_rc([2001],[4,502])
Response: Press <spacebar> to continue
test_hypo(7)

SCREEN 15

Testing for hypothesis: SIMILAR ENTITY, involving the entity BRANCH (4) and one of the following objects:
1004 - customer
1005 - contract
Make Agenda / New Objects / Hypo Test / Assert Objects: (empty)
Response: Please type in the number of the applicable object (or 0 if none).

SCREEN 16

Testing for hypothesis: ENTITY ATTRIBUTE IS RELATIONSHIP CONSTRUCT, involving the attr CONTRACT (600) and one of the following objects:
1005 - contract
1502 - dealer_contract
Make Agenda: (empty)
New Objects: H-similar_meaning added objects: 2013 - branch
Hypo Test: TO BE EXECUTED: ea_is_rc([600],[1005,1502])
Assert Objects: ao(301,4,0,n)
Response: Please type in the number of the applicable object (or 0 if none).
test_hypo(7)

SCREEN 17

Question count: 18
Testing for hypothesis: ENTITY ATTRIBUTE IS RELATIONSHIP CONSTRUCT, involving attr DEALER_NO (2001) and relationship SUPPLY (502)
Make Agenda: 13 -> agenda(ea_is_rc,[1502],[502])
New Objects: H-missing added objects: 2023 - customer_contract
Hypo Test: TO BE EXECUTED: ea_is_rc([2001],[502])
Assert Objects: ao(13,1502,502,n)
Response: Please answer with y or n to indicate whether the hypothesis is true or false.
test_hypo(7)

SCREEN 18

Question count: 30
A V I S
Make Agenda: 13 -> agenda(ea_is_rc,[2027],[2017])
New Objects: H-missing added objects: 2027 - supply
Hypo Test: PRECONDITION FAILED: related([2027],[2017])
Assert Objects: ao(13,2027,2017,n)
Response: THE AGENDA IS EMPTY
asso(1301)

SCREEN 19

Question count: 30
A V I S
Old Agenda
19:  0 -> 19    Missing Relationship - [1502][]
20: -5 -> 1     Similar Relationship - [2023][1503,1502]
21: -5 -> 1     Similar Relationship - [502][2027]
22: -5 -> 1     Similar Relationship - [2017][2027]
23: -6 -> 1     Similar Relationship - [1502][2023]
24: -6 -> 1     Similar Relationship - [1503][2023]
25: -6 -> 1     Similar Relationship - [2027][2017,502]
26: -7 -> 13    Related Relationship - [2017][2027]
27: -7 -> 13    Related Relationship - [2023][1502]
28: -8 -> 13    Related Relationship - [1502][2023]
29: -8 -> 13    Related Relationship - [1503][2017]
30: -8 -> 13    Related Relationship - [2027][2017]
Response: Press <spacebar> to continue
asso(1301)

SCREEN 20

Question count: 30
A V I S
Make Agenda: 13 -> agenda(ea_is_rc,[2027],[2017])
Hypo Test: PRECONDITION FAILED: related([2027],[2017])
Facts
similar_meaning(3,1003)
same(3,1003)
dissimilar_meaning(502,1502)
unrelated(502,1502)
ea_is_not_rc(2001,4)
ea_is_not_rc(2001,502)
dissimilar_meaning(4,1004)
dissimilar_meaning(4,1005)
similar_meaning(4,2013)
same(4,2013)
dup(4,2013)
...                                   more >
Response: THE AGENDA IS EMPTY
asso(1301)

SCREEN 21

Objects
VIEW 1 - RELATIONSHIPS
502 - supply["dealer","branch"]
2017 - dealer_contract["dealer","contract"]
2023 - customer_contract["customer","contract"]
VIEW 1 - ENTITIES
3 - dealer["contract"]
4 - branch["branch_no"]
2015 - contract["contract_no"]
2021 - customer["customer_no"]
Response: Press <spacebar> to continue
asso(1301)

SCREEN 22

Objects
VIEW 2 - RELATIONSHIPS
1502 - dealer_contract["dealer","contract"]
1503 - customer_contract["customer","contract"]
2027 - supply["dealer","branch"]
VIEW 2 - ENTITIES
1003 - dealer["dealer_no"]
1004 - customer["customer_no"]
1005 - contract["contract_no"]
2013 - branch["branch_no"]
Response: Press <spacebar> to continue
asso(1301)
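The screen summaries describe an agenda-driven dialogue: hypotheses about pairs of objects wait on the "Agenda", each one is answered either by the system itself or by the designer, and the outcome is logged on the "Old Agenda" and kept in the fact list. The screens report that by screen 17 AVIS had answered 11 of its first 18 questions itself. The Python sketch below illustrates such a control loop under simplifying assumptions. The class, field and hypothesis names are hypothetical, the only automatic rules are reuse of an already known fact and the name string comparison mentioned for screen 11, and it is not AVIS's own implementation.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Hypothesis:
    kind: str    # hypothetical labels, e.g. "similar_meaning", "synonym"
    obj1: int    # object identifier in view 1
    obj2: int    # object identifier in view 2


@dataclass
class Session:
    names: Dict[int, str]                       # object identifier -> object name
    ask_user: Callable[[Hypothesis], bool]      # stands in for a y/n screen question
    agenda: deque = field(default_factory=deque)                            # present and future tasks
    old_agenda: List[Tuple[Hypothesis, bool]] = field(default_factory=list)  # history log
    facts: Dict[Hypothesis, bool] = field(default_factory=dict)             # knowledge so far
    questions_asked: int = 0

    def answer_automatically(self, h: Hypothesis) -> Optional[bool]:
        """Return an answer without the user, or None if the user must be asked."""
        if h in self.facts:                      # already settled by an earlier comparison
            return self.facts[h]
        if h.kind == "synonym":                  # do the names differ? plain string test
            return self.names[h.obj1] != self.names[h.obj2]
        return None

    def run(self) -> None:
        while self.agenda:
            h = self.agenda.popleft()
            answer = self.answer_automatically(h)
            if answer is None:                   # only now is the designer consulted
                self.questions_asked += 1
                answer = self.ask_user(h)
            self.facts[h] = answer
            self.old_agenda.append((h, answer))


names = {3: "dealer", 1003: "dealer", 502: "supply", 1502: "dealer_contract"}
session = Session(names, ask_user=lambda h: h.obj1 == 3)   # scripted designer replies
session.agenda.extend([
    Hypothesis("similar_meaning", 3, 1003),
    Hypothesis("synonym", 3, 1003),
    Hypothesis("similar_meaning", 502, 1502),
    Hypothesis("similar_meaning", 3, 1003),    # repeated task, answered from the facts
])
session.run()
print(session.questions_asked, len(session.old_agenda))    # prints 2 4
```

A fuller version would also derive new facts (screen 14, for example, lists same(3,1003) alongside similar_meaning(3,1003)) and generate follow-up agenda items and new objects; both are omitted from this sketch.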
