Open Collections

UBC Theses and Dissertations

A framework for managing information from heterogeneous, distributed, and autonomous sources in the architecture,… Kosovac, Branka 2007

A FRAMEWORK FOR MANAGING INFORMATION FROM HETEROGENEOUS, DISTRIBUTED, AND AUTONOMOUS SOURCES IN THE ARCHITECTURE, ENGINEERING, CONSTRUCTION, AND FACILITIES MANAGEMENT DOMAIN

by

BRANKA KOSOVAC

A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

in

THE FACULTY OF GRADUATE STUDIES (Civil Engineering)

THE UNIVERSITY OF BRITISH COLUMBIA

January 2007

©Branka Kosovac, 2007

Abstract

This dissertation proposes a framework that allows different efforts aiming to enhance information management in the architecture, engineering, construction, and facilities management (AEC/FM) industries to coexist and support each other by sharing resources, services, and outputs. The main motivation for this research was the lack of support for non-routine tasks and business agility in information systems serving the domain. An extensive analysis of information needs and available solutions identified the domain heterogeneity and complexity as key challenges for successful information management, and efficient communication between a wide range of human and machine participants as a missing link. Suggesting that such communication needs to involve all components of human-to-human communication: syntax, semantics, and pragmatics, the existing information-management resources and approaches were analyzed within the semiotic framework, in order to identify shared simple elements that can be used to relate them.
The proposed framework identifies three basic types of assertions: senses, relationships, and information, and their two properties: category and scope, as a set of basic elements that can be used to relate all kinds of semantic resources as well as information-management approaches based on linguistics, information-retrieval theory and practice, document structure, and knowledge representation. The framework enables consistent management of different types of information at any level of granularity and correlation of assertions involving information, its subject- and context-domains. A pilot implementation demonstrated on a small scale how the proposed framework can be used in practice. The envisioned system consists of numerous and diverse components that share their content via Web services using the proposed framework and a set of shared resources that include registries and specialized services offering senses (i.e. terminology mapping and resolution) and relationships (i.e. conceptualizations). The research uses a combination of constructive and exploratory methods. The basic framework was validated by the ability to express all types of semantic resources and the pilot implementation by the comparison to a set of predefined requirements. However, the real benefits of the proposed framework can be proven only when it is used in combination with a variety of existing, emerging, and future techniques in complex real-world environments, as intended.
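As a rough illustration of the elements named in the abstract, the three assertion types (senses, relationships, information) and their two properties (category and scope) could be sketched as a small data structure. This is only a sketch under stated assumptions, not the data model of the thesis or of its pilot implementation: the class names, the `assertions_about` helper, the string identifiers such as `subj:window`, and the sample values are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class AssertionType(Enum):
    SENSE = "sense"                # associates a token with a subject
    RELATIONSHIP = "relationship"  # relates one subject to another subject
    INFORMATION = "information"    # associates a subject with an information item


@dataclass(frozen=True)
class Assertion:
    kind: AssertionType
    subject: str                    # hypothetical identifier of the subject
    value: str                      # token, related subject, or information item
    category: Optional[str] = None  # property 1: category assigned to the assertion
    scope: Optional[str] = None     # property 2: scope in which the assertion holds


def assertions_about(assertions: List[Assertion], subject: str,
                     scope: Optional[str] = None) -> List[Assertion]:
    """Return assertions about a subject, optionally restricted to one scope."""
    return [
        a for a in assertions
        if a.subject == subject and (scope is None or a.scope == scope)
    ]


# Hypothetical sample data: two senses, one relationship, one information item.
store = [
    Assertion(AssertionType.SENSE, "subj:window", "window",
              category="preferred term", scope="English"),
    Assertion(AssertionType.SENSE, "subj:window", "Fenster",
              category="preferred term", scope="German"),
    Assertion(AssertionType.RELATIONSHIP, "subj:window", "subj:building-element",
              category="is-a"),
    Assertion(AssertionType.INFORMATION, "subj:window", "doc:window-spec",
              category="specification"),
]

english = assertions_about(store, "subj:window", scope="English")
everything = assertions_about(store, "subj:window")
```

Keeping all three assertion types in one record shape is a design choice made here for brevity; it mirrors the idea that category and scope apply uniformly to every assertion, but a real system might well model each type separately.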
Table of Contents

Abstract
Table of Contents
List of Tables
List of Figures
Glossary
Acknowledgements
Dedication
1 Introduction
  1.1 Taming the Information Jungle
  1.2 Document Overview
2 Problem Analysis
  2.1 Information Need
  2.2 Information Flow Channels
  2.3 Information Acquiring Modes
  2.4 Information Management Disciplines
  2.5 Making Information Findable
  2.6 Matching
  2.7 Types of Information Management Systems
  2.8 AEC/FM Domain Challenges
    2.8.1 Abundance
    2.8.2 Heterogeneity
    2.8.3 Different views
    2.8.4 Change
    2.8.5 Multiple information spaces
  2.9 Problem Definition
3 Research Objective and Scope of Work
  3.1 Hypothesis
  3.2 Statement of objectives
  3.3 Scope of Work
  3.4 Assumptions and Delimitations
  3.5 Methodology
4 Communication
5 Existing Components of Semiotic Infrastructure
  5.1 Information Organization
    5.1.1 Metadata
    5.1.2 Classification
  5.2 Lexical Resources
  5.3 Knowledge Representation
    5.3.1 Networks
    5.3.2 Frames
    5.3.3 Ontologies
  5.4 Data Modeling
    5.4.1 Schemas
  5.5 Standardization vs. Diversity
6 Point of Departure
  6.1 General Research and Development Efforts
    6.1.1 Interoperability
    6.1.2 Context
    6.1.3 Sharing Mechanisms
    6.1.4 Automated Extraction of Semantics
  6.2 Supporting Technologies and Related Frameworks
    6.2.1 XML and Associated Standards
    6.2.2 Semantic Web
      6.2.2.1 RDF
      6.2.2.2 RDFS
      6.2.2.3 OWL
      6.2.2.4 Topic Maps
      6.2.2.5 Topic Maps Compared to RDF
    6.2.3 Technologies and Architectures for Sharing and Customization
      6.2.3.1 Web Services
      6.2.3.2 Service-Oriented Architecture (SOA)
      6.2.3.3 Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH)
      6.2.3.4 Web Feeds
      6.2.3.5 Peer to Peer
      6.2.3.6 Grid Computing
      6.2.3.7 Agent Technologies
      6.2.3.8 Smart Clients
      6.2.3.9 Web 2.0
    6.2.4 Research Projects
  6.3 AEC/FM-Specific Efforts
    6.3.1 Domain-Specific Standards and Semantic Resources
      6.3.1.1 Classification schemes
      6.3.1.2 ISO 12006-3
      6.3.1.3 Lexicon
      6.3.1.4 BARBi
      6.3.1.5 SDC
      6.3.1.6 CONNET Web Thesaurus
      6.3.1.7 Industry Foundation Classes
      6.3.1.8 Metadata
      6.3.1.9 IfcXML
      6.3.1.10 BLIS-XML
      6.3.1.11 aecXML
      6.3.1.12 bcXML
      6.3.1.13 Promotion and Mapping of Standards
    6.3.2 AEC Research Projects
      6.3.2.1 eConstruct
      6.3.2.2 ISTforCE
      6.3.2.3 DocMo service
      6.3.2.4 OSMOS
      6.3.2.5 e-Cognos
7 Proposed Approach
  7.1 Requirements
  7.2 General Principles
  7.3 Semiotic Infrastructure
    7.3.1 Basic Concepts—Shared Factors
    7.3.2 Reduction of Semantic Resources to Shared Factors
    7.3.3 Summary
  7.4 System Functionality and Architecture
8 Pilot Implementation
  8.1 Implementation Overview
  8.2 Pilot Implementation Requirements
    8.2.1 Users
    8.2.2 Use Cases and Scenarios
      8.2.2.1 General User
      8.2.2.2 Company Information Officer
      8.2.2.3 Project Information Officer
      8.2.2.4 System Component
    8.2.3 Summary of Functional Requirements for the Pilot Implementation
      8.2.3.1 Client Application
      8.2.3.2 Web Service
    8.2.4 User's View of the Application
  8.3 Pilot Implementation Description
    8.3.1 Data
    8.3.2 Logic
    8.3.3 User Interface
    8.3.4 Architecture
    8.3.5 Choice of technologies
    8.3.6 Implementation Process Overview
    8.3.7 System Operation
  8.4 Discussion
    8.4.1 Token service
    8.4.2 Relationship Service
9 Conclusion
  9.1 Evaluation
  9.2 Contributions
  9.3 Potential Benefits/Impact
  9.4 Future Research Directions and Development Steps
10 References

List of Tables

Table 1: Differences in approaches to achieving interoperability
Table 2: Requirements for the solution
Table 3: Key
Table 4: Comparison of Dublin Core (DCMES) and IFC metadata schemas
Table 5: Associations needed for information management
Table 6: Comparison of test cases in current environments, pilot, and potential full implementation
Table 7: Comparison of entries for the term "companies" in TC/CS, AAT, and WordNet
Table 8: Comparison of entries for the term "French windows" in TC/CS, AAT, and WordNet

List of Figures

Figure 1. Information flow environments
Figure 2. Current communication flow
Figure 3. Plane of signifiers: syntagms and paradigms
Figure 4. Sign system
Figure 5. Types of signifiers
Figure 6. Ogden and Richards' meaning triangle
Figure 7. Representing "real world" in electronic environment
Figure 8. Types of approaches for achieving semantic interoperability
Figure 9. RDF graph
Figure 10. Semantic Web Layers
Figure 11. Three inter-related worlds
Figure 12. Three layers of semantics
Figure 13. Example of an IFS fractal: Koch's snowflake
Figure 14. Symbol for subject
Figure 15. Symbol for token
Figure 16. Code: association of a token with a subject
Figure 17. Reified token
Figure 18. Symbol for information item
Figure 19. Information: subject associated with an information item telling something about it
Figure 20. Reified information item
Figure 21. Symbol for relationship between subjects
Figure 22. Reified relationship between two subjects
Figure 23. Representation of a subject and three basic types of assertions about it
Figure 24. Category relationship
Figure 25. Typed relationship: category assigned to an assertion
Figure 26. Scoped relationship: relationship of an assertion to a subject representing a scope
Figure 27. Different representations of the same relationship
Figure 28. Representation of the same relationship in this document
Figure 29. Data model
Figure 30. Simplified representation of a classification scheme
Figure 31. Representation of a classification scheme distinguishing information-realm and subject-domain classes
Figure 32. Sample assertions about information-realm classes in a classification scheme
Figure 33. Simplified representation of a faceted classification
Figure 34. Generic representation of a declaration
Figure 35. IfcRoot
Figure 36. Repeatable element of a monolingual dictionary
Figure 37. Formal definition per genus proximum and differentia specifica
Figure 38. Reified token
Figure 39. Reified token sense
Figure 40. Repeatable element of a multilingual dictionary
Figure 41. Repeatable element of a synonyms thesaurus
Figure 42. Repeatable element of a controlled vocabulary
Figure 43. Relationships between terms in a thesaurus
Figure 44. Thesaurus representation within the proposed framework
Figure 45. Basic element of a thesaurus
Figure 46. Hidden semantic resources
Figure 47. Complex subject created by aggregation
Figure 48. Comparison of related portions of AEC/FM domain-specific semantic resources
Figure 49. Diagram of the envisioned system
Figure 50. TRI-TACS architecture
Figure 51. High-level use cases
Figure 52. Overview of use cases for system components
Figure 53. Data model of the company directory
Figure 54. Data model of the company database
Figure 55. Data model of the content source registry
Figure 56. Graphical representation of the sample BLIS-XML file
Figure 57. Data model of the pilot implementation
Figure 58. "Find" activity diagram
Figure 59. "Explore" activity diagram
Figure 60. "Associate" activity diagram
Figure 61. "Create" activity diagram
Figure 62. "Share" activity diagram

Glossary

AAT: Art & Architecture Thesaurus Online. Full reference under [J. Paul Getty Trust 2006].
access point: Property of information items (i.e. metadata element) that can be used for searching and retrieving them. For example, author's name or subject are useful access points; the number of pages in a document is not.
aecXML: An XML-based language used to represent different types of information in the AEC industry. See Section 6.3.1.11.
application profile: A set of metadata elements used by a particular application. It can include subsets of elements from multiple standard metadata schemas.
application schema: A conceptual schema for data required by one or more applications.
associationism: A theory that association is the basic principle of mental activity.
BARBi: A Norwegian common object-oriented reference data library. See Section 6.3.1.4.
basic-level category: One of the most inclusive categories for which a concrete image of the category as a whole can be formed. Basic-level categories are the first categorizations made during perception of the environment and the categories most codable, most coded, and most necessary in language (e.g. chair as opposed to furniture or dining chair, dog as opposed to mammal or Irish Setter) [Rosch et al. 1976].
bcXML: Building-Construction extensible Mark-up Language. See Section 6.3.1.12.
berrypicking: A model of information seeking where a user is involved in an iterative process of moving from one patch of information to another and gathering relevant information resources [Bates 1989].
BLIS-XML: A methodology for encoding EXPRESS based information in XML format. See Section 6.3.1.10.
business requirement: A statement about an activity that an actor needs to perform and that can be supported by different solutions or performed in non-automated environments. See also functional requirement.
code: A system of related conventions for associating signifiers to signifieds (or tokens to subjects) in a certain context.
compound term: A term that consists of more than one word.
concept: A unit of thought constituted through mental representation of a phenomenon or abstraction on the basis of characteristics common to a set of phenomena.
conceptualization: An abstract, simplified view of a portion of the world.
conceptual formalism: A set of modelling constructs needed to describe a conceptualization. Examples: UML meta model, EXPRESS meta model. One conceptual formalism can be expressed in several conceptual schema languages.
conceptual schema: Explicit representation of types of subjects, their properties, and relationships that exist in a portion of the world, which are needed to represent that world for some purpose.
conceptual schema language: Formal language based on a conceptual formalism and used for representing conceptual schemas.
  Examples: UML, EXPRESS, IDEF1X. A conceptual schema language may be lexical or graphical. Several conceptual schema languages can be based on the same conceptual formalism.
connectionism: An approach to studying cognition and intelligence that emphasizes the connections among concepts, rather than their symbolic meaning.
construct: A theoretical concept introduced and defined within a theory as needed to analyze the universe of discourse and explicate the theory.
corpora: Large collections of texts in machine readable form.
CPV: Common Procurement Vocabulary, a classification system for public procurement aimed at standardizing the references used by contracting authorities and entities to describe the subject of procurement contracts in the European Union. Full reference under [European Commission 2002].
crossdisciplinary research: Research focusing on one discipline using views and principles of another.
crosswalk: A table that maps the relationships and equivalencies between two or more metadata formats. Crosswalks or metadata mapping support the ability of search engines to search effectively across heterogeneous databases, i.e. crosswalks help promote interoperability [Dublin Core Metadata Initiative 2001].
data: Re-interpretable representation of information in a formalized manner suitable for communication, interpretation, or processing [ISO 1993].
DAML+OIL: DARPA Agent Markup Language developed as an extension to XML and the Resource Description Framework (RDF) providing a set of constructs with which to create ontologies and to markup information so that it is machine readable and understandable. Full reference under [W3C 2001].
dataset: An identifiable collection of data [ISO 2006b].
DCMES: Dublin Core Metadata Element Set. Full reference under [DCMI 2006b].
descriptor: A term chosen as the preferred expression of a concept in a thesaurus. Also called "preferred term" [NISO 2005].
EPIC: The common grouping of construction products used as a common reference system by the European construction industry for access to product information across national boundaries. Fully referenced under [Electronic Product Information Co-Operation 1999].
facet: One of clearly defined, mutually exclusive, and collectively exhaustive aspects, properties or characteristics of a class or specific subject [Taylor 1992]. A grouping of concepts of the same inherent category. Examples of categories that may be used for grouping concepts into facets are: activities, disciplines, people, materials, places, etc. [NISO 2005].
functional requirement: A statement that specifies a function that a system or component must be able to perform in order to satisfy a business requirement. See also business requirement.
generic posting: In thesauri, the treatment of narrower terms as near-synonyms [NISO 2005].
gloss: An explanation of the meaning of a term's sense, that may include a definition, explanation, and/or examples of usage (based on [WordNet Glossary]).
guide term: See node label.
head: The word in a compound term that plays the same grammatical role as the whole compound term.
hidden Web (or invisible Web or deep Web): Publicly accessible pages on the World Wide Web that are not indexed by search engines, primarily pages dynamically generated as responses to database queries.
heteronym: Two words are heteronyms if they are spelled the same way but differ in pronunciation [Princeton University 2006].
holonym: A word that names the whole of which a given word is a part [Princeton University 2006].
homograph: Two words are homographs if they are spelled the same way but differ in meaning (e.g. fair) [Princeton University 2006].
homonym: One of two or more words that are pronounced or spelled the same way but have different meanings. Includes homographs and homophones.
homophone: One of two or more words pronounced alike but different in meaning or derivation or spelling (as the words to, too, and two).
hypernym: A word that is more generic than a given word [Princeton University 2006].
hyponym: A word that has a narrower meaning than a given word [Princeton University 2006].
IFCs: Industry Foundation Classes, an AEC/FM domain schema developed to support interoperability of software applications in the domain. Full reference under [IAI 2006].
information management: Processes that involve capturing, collection, storage, organization, channelling, synchronization, reuse, archiving, and weeding of information required to conduct business activities.
interdisciplinary research: Research involving a topic that is too broad or complex to be dealt with adequately by a single discipline or profession, by coordinating discipline-specific views around higher level concepts that may be differently defined within each individual discipline (see also multidisciplinary, crossdisciplinary, transdisciplinary).
ISO 12006-2: An ISO standard that provides a faceted framework to be used as a basis for development of national and regional AEC/FM classification systems. Full reference under [ISO 2001a].
ISO 12006-3: An ISO standard that provides a framework and conceptual formalism for defining concepts, their relationships, and naming, intended to reconcile classification and object-oriented approaches to structuring AEC/FM information. Full reference under [ISO 2001b].
knowledge base: A collection of information about a certain portion of the world, correlated to a particular conceptual schema. As opposed to a model, it can also include unstructured information and information structured according to different conceptual schemas, but correlated to its own schema.
Lexicon: An ISO 12006-3-compliant vocabulary that defines AEC/FM domain terminology. Full reference under [STABU 2004].
meronym: A word that names a part of a larger whole.
metadata: Structured information about information resources.
metadata element: A property of information resources defined in a metadata schema.
metadata element set: A collection of metadata element definitions that can be used in profiles.
metadata schema: A conceptual schema according to which metadata is structured.
model: A representation of a portion of the real world structured according to a particular conceptual schema.
modifier: A word or a phrase in a compound term that modifies and limits the extent of the meaning of the term's head.
MRR1: Mean reciprocal rank of the first correct answer. One of the TREC Web track measures for assessing effectiveness of Web search engines.
multidisciplinary research: Research that involves juxtaposition of views on a topic from a variety of disciplinary perspectives without integration of these views into a novel approach.
name: The primary means of identification of objects and concepts for humans.
near-synonym: A term whose meaning is not exactly synonymous with that of another term, yet which may nevertheless be treated as its equivalent in a controlled vocabulary [NISO 2005].
neural network: A computational technique based on the model of the brain, machine learning, and parallel processing. It uses an interconnected set of simple processing elements that start out connected in a random pattern and are trained by being presented with examples of input and the corresponding desired output. The processing ability of the network is stored in the inter-unit connection strengths, or weights, obtained by a process of adaptation of connections to achieve the desired outcome.
node label: A "dummy" term, often a phrase, that is not assigned to documents when indexing, but which is inserted into the hierarchical section of some controlled vocabularies to indicate the logical basis on which a class has been divided.
  Node labels may also be used to group categories of related terms in the alphabetic section of a controlled vocabulary [NISO 2005]. Called "guide term" in AAT.
non-functional requirement: A statement that captures a required property of a system or a criterion that can be used to judge its operation and that constrains its design, but does not describe a function that the system is to provide.
OAI-PMH: Open Archives Initiative Protocol for Metadata Harvesting.
ontology: A formal, explicit specification of a shared conceptualization [Gruber 1993]. In this document, the meaning of the term is restricted to mean only those specifications of a shared conceptualization that include definitions and axioms expressed in a logic-based language.
ostensive definition: A definition that conveys the meaning of a signifier by pointing out or exhibiting instances of the signified.
P@n (Precision at n): Proportion of the top n documents retrieved by a Web search engine which are satisfactory. P@10 = 0.5 means that 5 of the top 10 search results were satisfactory. A run of multiple queries is measured using mean P@n. This is one of the TREC Web track measures for assessing effectiveness of Web-search algorithms. The other measures include S@n (Success at n), the proportion of queries for which a correct answer was within the top n results; MRR1, the mean reciprocal rank of the first correct answer; and MAP, the mean average precision.
parallel distributed processing (PDP): Computing based on connectionist and neural network models that involves a network of simple computational elements, working in parallel, each of which influences the other elements and is influenced by them.
pearl-growing: A technique used in information seeking in which, after one good document is located, it is used as a source of natural-language or controlled-vocabulary terms to be used in queries for additional similar content.
percept: A mental impression of something perceived by the senses, viewed as the basic component in the formation of concepts; a sense datum.
phenomenon: Anything that has definite, individual existence, as opposed to its mental representation or generalization.
precision: The ratio of retrieved information items assessed as relevant to the total number of retrieved items.
profile: In this document, profile means a subset of a particular metadata element set needed to describe a particular type of subjects.
qualifier: A defining term, also known as 'gloss,' used in a controlled vocabulary to distinguish homographs. A qualifier is considered part of a descriptor, subject heading, or entry term, but is separated from it by punctuation (the qualifier is generally enclosed in parentheses) [NISO 2005].
recall: The ratio of relevant items retrieved by a query to the total number of relevant items available.
relevance: A subjective measure of the degree to which retrieved content satisfies a given information need.
schema: A formal description of a model.
semantic neighbourhood: A set of nodes that are related to a given node within a semantic network either directly or via no more than a specified number of edges.
semantic network: A knowledge representation formalism which describes objects and their relationships using a directed graph consisting of nodes (or vertices) and labelled arcs (or edges).
semantic resource: In this document the term is used to encompass all types of resources that make any type of assertions comprising the framework explicit (e.g. dictionaries, classifications, metadata records, schemas etc.).
sense: The meaning of a token or the way in which it can be interpreted. Used in this work to mean an assertion that associates a token with a subject.
session: A series of events that occur from the moment a user connects to a system to the moment that connection is terminated, either by the user or a predefined time limit. These events reflect user interaction with the system and can be captured and used for system adaptation and improvement.
sign: In linguistic theory introduced by de Saussure, a combination of a signifier and signified. See sense.
signified: The concept referred to by a signifier in the context of a particular sign system.
signifier: A word, phrase, image, gesture, sound or anything else that refers to a particular signified. See token.
stemming: The use of an algorithm that automatically searches for all of the words that come from the same "stem" or "root" (e.g. swim, swam, swimming, swimmer).
STEP: Standard for the Exchange of Product Model Data, ISO 10303 series of standards that describe how to represent and exchange digital product information.
stop words: Words, such as conjunctions, prepositions, articles, or words central to a particular domain that are so frequently used in a collection that they cannot contribute to relevancy and are therefore ignored by search algorithms.
subject: "Anything whatsoever, regardless of whether it exists or has any other specific characteristics, about which anything whatsoever may be asserted by any means whatsoever" [Garshol and Moore 2005]. In relation to tokens, the term is used in this document to mean associated signifieds and in relation to information items to mean their topics.
syndication: The process and practice of making content available to other parties.
synset (also synonym set or synonym ring): A set of two or more terms that have the same or very similar meaning. A synset can be used to identify a subject or sense of a token.
system: Used in this document to mean a set of interrelated components functioning as a whole. It can mean anything from a single functional unit within a particular software application to the totality of information and applications connected via the Internet.
Talo 90: Building 90: The Finnish building classification system. Full reference under [Building 90 Group and The Finnish Building Centre Ltd. 1999].
TC/CS: Canadian Thesaurus of Construction Science and Technology. Full reference under [University of Montreal 1978].
term: One or more words designating a concept [NISO 2005].
thesaurus.forAEC.com: Thesaurus for Architecture, Engineering, and Construction, a database of terms and concepts used in the AEC industries developed within the CONNET Initiative with the purpose to support indexing and retrieval of AEC "objects". See Section 6.3.1.6.
token: The term used in this document to mean a word, phrase, image, gesture, sound or anything else that denotes something other than itself (i.e. a particular subject).
top term: The broadest descriptor in a thesaurus hierarchy [NISO 2005]. A counterpart to the root node in other types of semantic resources.
transdisciplinary research: Research that involves development of holistic, overarching conceptual frameworks that transcend the narrow scope of disciplinary worldviews.
TREC: Text REtrieval Conference, an annual conference sponsored by the U.S. national agencies with the purpose of encouraging research in information retrieval from large text collections and focusing on several information-retrieval research areas, or "tracks". The Web track focuses on information retrieval on the World Wide Web.
troponym: A verb that indicates an action more precisely than another more general verb.
use case: A description of an activity or set of activities that involve a user's interaction with a system and lead to a tangible result from the user's perspective.
universe of discourse: A view of the real or hypothetical world that includes everything of interest.
WordNet: WordNet 3.0, a lexical database for the English language developed by the Princeton University Cognitive Science Laboratory. Full reference under [Princeton University 2006].

Acknowledgements

I would like to express my immense and sincere gratitude to my advisor Dr. Thomas Froese for all his help, support, and guidance that he has been generously and unselfishly providing over the long years. Very, very special thanks go to Dr. Dana Vanier, my co-advisor, who persuaded me to pursue the degree, thus profoundly changing my life in the most positive way, and provided continuous encouragement and support. I consider myself very fortunate to have these two outstanding people as mentors.
I would also like to thank the members of examination and reading committees who invested their time and expertise at different times and provided useful feedback: Dr. Tamer El-Diraby, Dr. Eddie Rasmussen, Dr. Grahame Cooper, Dr. Barbara Lence, Dr. Jerzy Wojtowicz, Dr. Husein Alnuweiri, and especially Dr. Alan Russell, whose comments always impressed me by the delicate balance of sharpness and helpfulness and who, together with my advisors, made the time that I have spent in this program a very pleasant and nurturing experience. My thanks go as well to National Research Council of Canada and J.K. Zee Memorial Fellowship Fund for financial support.

I am eternally grateful to my parents who instilled into me the love for learning and taught me the real values, to my sister and other family who have always been there for me and provided support at difficult times of my life. I also want to acknowledge friendship and support of my numerous friends scattered across the world due to the horrible Bosnian war, especially my friends in Vancouver and generation 1980 of the University of Sarajevo School of Architecture. And finally, I want to thank my husband and children for all the sacrifices they had to make due to the long days and nights that I have spent working on the dissertation; to them I am dedicating this work.

To Davor, Ivo, and Sara

1 Introduction

1.1 Taming the Information Jungle

This section introduces the general problem of information management, its significance and challenge in general and in the AEC/FM domain in particular. It presents motivation for this research and a high-level ideal followed by an outline of the document.
The crucial role of information in all areas of business activity is hardly disputable. Research studies are remarkably consistent in demonstrating a clear relationship between the information environment of an organization and its productivity [Koenig 2000]. The domain of Architecture, Engineering, Construction, and Facilities Management (AEC/FM) is by no means an exception. Successful outcome of any task within a construction project requires relevant, timely, correct, comprehensive, consistent, and accessible input information: blueprints, specifications, regulations, interest rates, market trends, or precedent designs. The e-Cognos team states that on average "15% of a contractor's turnover is spent on rework, much of which could be avoided by having the right project information at the right time" [Zarli et al. 2001].

At the same time, each task on its part also generates new or modifies existing information that, either obviously and directly or not, may be of relevance for other tasks. Tasks performed in an architectural office will produce drawings, which are used as input information for estimating and scheduling of the project. Estimating and scheduling will also be affected by information resulting from the task of building a particular brick wall in another project, but only indirectly, as it will be one of the factors in the calculation of productivity rates. All output information needs to be captured and made available where needed.
Capturing, storing, organizing, channelling, finding, filtering, synchronizing, reusing, archiving, and weeding information required to conduct business activities absorbs major efforts and investments in organizations of all sorts at the outset of the third millennium. Related themes have been on the forefront of research agendas in both academia and corporate R&D (research and development) departments, and a myriad of different types of applications and technologies promising the solution has appeared on the market, along with a host of buzzwords and much confusion among target users. Despite many fascinating research results and sophisticated technologies, the field still seems to be in search of a comprehensive set of efficient, mutually compatible, and widely applicable solutions.

Information and information flow in AEC/FM are frequently identified as a special problem, more intricate and challenging to manage than in other domains [e.g. Choi and Ibbs 1995; Wilson et al. 2001; Rezgui 2001; Wilson and Rezgui 2003; Pekericli et al. 2003; Breuer and Fischer 1994]. Hardly any construction project goes smoothly, closely following its original plans, schedule, and budget. Problems encountered during and after a project lead to direct increases in construction costs and to litigation, which brings both additional expenses and the adversarial relationships that are pervading the industry. Most problems can be attributed to missing, wrong, late, inconsistent, or, more and more often, superfluous information [e.g. Abou-Zeid et al. 1995; Howell and Ballard 1997; Thorpe and Murray 1996; Thomas et al. 1997].
For this reason, information management in its various forms has been an extremely popular research topic in the domain of information technology in construction as well [Betts and Amor 2000], and a number of commercial tools and solutions targeting the domain have been developed and offered on the market, including domain-specific electronic and Web-based document management systems, workflow management systems, collaboration and project management solutions, interoperable and intelligent software applications, and supporting standards, each of them having a potential to improve a particular aspect of information management in the industry.

The motivation for this research comes from the observation that those activities that can contribute the most to the increased efficiency and other enhancements of the domain are least supported by the existing tools. It comes from a vision of an ideal information system that is able to support the work of all construction project participants by helping them at all times to do the best thing in the best way. Such a system should provide all and only that information which is relevant to a particular person at a particular time in a particular context, present that information, and allow the user to re-present it in the way most suitable for the particular purpose. It should support not only routine tasks but also tasks such as decision making, problem solving, design, identification of business opportunities, and invention; not only lower standards of performance such as "satisfying," "correct," "good," but also "better-cheaper-faster," "creative," and "innovative." It should bring together the best of two worlds: the immediacy, semantics, and context-sensitivity of interpersonal information flows with the power, speed, and consistency of computers.

Setting such a vision of a fully tamed information jungle as the goal of a single thesis would certainly be overly ambitious, but, being considered paramount, the vision has been used as the ideal that should be strived for. This work is trying to identify and accomplish one single manageable step that can bring this vision closer to reality.

1.2 Document Overview

The following section (Section 2) analyzes different aspects of the general problem in order to identify an appropriate manageable step to be addressed by this research. Section 3 formalizes the results of that analysis into a statement of thesis objectives and describes the scope of work and the methodology followed in this research. It is followed in Section 4 by the discussion of human-to-human communication, which was used as a guiding idea for the solution. Section 5 provides an overview of the types of existing semantic resources and Section 6 reviews specific related research, development, and standardization efforts, both general and domain-specific. Section 7 presents the proposed solution, while Section 8 describes how the proposed framework was implemented in the prototype. Validation starts in Section 7.3.2 with the reduction of the existing semantic resources to the elements of the framework, followed by discussion and evaluation of this process in Section 7.3.3, and continues with the evaluation of the entire framework in Section 8.4 after the pilot implementation is described.
The final section (Section 9) evaluates the deliverables by comparing them to the stated objectives, states the contributions of this research by comparing its results to the point of departure, predicts its impact on the observed problem, and provides suggestions for future research and development towards the stated ideal.

Shaded boxes throughout the document include examples that clarify the meaning of abstract concepts or illustrate how generic discussion relates to the AEC/FM practice. The bold typeface is used in the body text to emphasize key concepts discussed, with the intention to provide a visual cue supporting orientation within the document. Underline denotes terms that are defined in the glossary and/or discussed in detail elsewhere in the document. In the electronic version of the document, the underlined terms are hyperlinked to glossary entries or other document fragments clarifying denoted concepts. Being hyperlinks, in-text references to other document sections are also underlined.

The glossary serves two basic purposes. First, it includes terms important for the proper understanding of the text that have been inconsistently used in practice and in research literature; it refines and disambiguates the meaning of these terms as used in this thesis. Second, it includes definitions of acronyms and some less-known terms, primarily those pertaining to domains other than AEC/FM. It may be noted that some key terms, in particular "information" and "knowledge", are not defined in the glossary.
Despite the proliferation and diversity of definitions for these terms in recent literature, it has been deemed that these terms have a long history of use and that adoption of various definitions by different people does not lead to misunderstanding. Therefore, this work adopts a pragmatic approach and lets the meanings of these terms emerge through their use in natural language while analyzing tasks and elements of information use in the AEC/FM domain, trying to allow the reader to keep definitions of their own.

2 Problem Analysis

This section analyzes the broad problem of taming information in the domain. The concept of information need is analyzed and general approaches currently used to satisfy it are briefly reviewed. The difficulties involved are outlined, leading to the identification of research goals in the subsequent section.

2.1 Information Need

People working in the AEC/FM industries need information in order to do their jobs and to contribute towards the realization of business goals; they need to get instructed or to make informed decisions on what should be done and how. Rare are the cases when people need comprehensive explicit instructions or can act without any information inputs; most often people need pieces of information, which get processed, to a varying degree, by their previous knowledge and complement that knowledge¹.

The concept of information need has been extensively investigated in the theory of information retrieval. Mackay described it as "inadequacy in what we may call [someone's] 'state of readiness' to interact purposefully with the world around [them] in a particular area of interest" [Mackay 1960]. Belkin introduced the concept of Anomalous States of Knowledge (ASK).
An ASK occurs when someone decides to use their knowledge or image of the world for whatever purpose and realizes that there is an anomaly: a gap, lack, uncertainty, or incoherence [Belkin 1980; Belkin et al. 1982]. Starting from Belkin's theory, Mizzaro defined information as a difference between two knowledge states [Mizzaro 1996]. Ingwersen, who sees information seeking as action undertaken to resolve doubts that cannot be resolved by thinking alone, distinguishes three fundamental types of information need. Verificative need refers to the problem of locating a specific information resource about which at least some information is known (e.g. author of a document, name of a phone number's owner); conscious topical need arises where the user needs "to clarify, review or pursue aspects of known subject matter"; and muddled topical need covers ill-defined information problems where the user wants to explore a new concept or concept relations outside known subject matter [Ingwersen 1992, 117].

¹ It should be noted that an additional reason for creating information in the domain is the need to document everything and have protection in case of potential litigation. This aspect, however, is not central to the scope of this work and will not be discussed in much detail.

The first, frequently overlooked challenge in the process of obtaining and managing information is to recognize the existence of an information need. Several researchers investigated the process of information need formation.
In her constructivist approach, Kuhlthau noticed that, contrary to the assumption of typical information systems and theories, the early stages of information seeking are commonly fraught with uncertainty and confusion and that the entire process involves the interaction of thinking, acting and feeling [Kuhlthau 1993, p.344]. Taylor proposed that the information need is initially visceral (actual but unexpressed), then turns into a conscious, internally conceptualized need before it can be externalized and clearly formulated (formalized), and finally adapted to the available resources (compromised). He proposed that a query, which is usually considered as the expression of information need, is "a description of an area of doubt in which the question is open ended, negotiable and dynamic" [Taylor 1968]. Belkin proposed that an information retrieval system should discover and represent the user's knowledge of the problem, not asking the user for the question but trying to find out the question from the user [Belkin 1980]. The artificial intelligence community looks at the information need from a problem solving viewpoint and distinguishes domain information (e.g. known scientific facts), problem information (e.g. project-specific data), and problem solving information (i.e. expertise in problem solving) [e.g. Barr and Feigenbaum 1981].
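Taylor's four levels of need formation described above can be read as an ordered progression. The following is only a minimal illustrative sketch of that reading; the names NeedStage, next_stage, and stages_remaining are hypothetical and come neither from Taylor nor from this work.

```python
from enum import IntEnum
from typing import List


class NeedStage(IntEnum):
    """Taylor's four levels of information need, in order of formation (illustrative names)."""
    VISCERAL = 1     # actual but unexpressed need
    CONSCIOUS = 2    # internally conceptualized need
    FORMALIZED = 3   # externalized and clearly formulated need
    COMPROMISED = 4  # need adapted to the available resources


def next_stage(stage: NeedStage) -> NeedStage:
    """Return the following stage, or the same stage if it is already the last one."""
    if stage is NeedStage.COMPROMISED:
        return stage
    return NeedStage(stage + 1)


def stages_remaining(stage: NeedStage) -> List[NeedStage]:
    """List the stages a need still has to pass through after the given one."""
    return [s for s in NeedStage if s > stage]
```

Under this reading, a query submitted to a retrieval system corresponds to the last, compromised stage, which is why it may differ considerably from the need as it was first felt.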
The Gartner Group typology of information workers, widely accepted and used in the IT domain, distinguishes between four types of information workers:
• High-performance workers perform high-value, mission-critical tasks in highly specialized areas, such as engineering, 3D-modelling, or architecture, and make up approximately 1% of the workforce².
• Knowledge workers (20%) gather, add value to, and communicate information in a decision support process. Their tasks are flexible and project-driven, and their information needs are ad-hoc. They make their own decisions on what to work on and how to accomplish the task. Examples of knowledge workers include project managers and executives.
• Structured-task workers (74%) add new information, but they perform repetitive tasks driven by procedures and processes that can be clearly described. Structured tasks include claims processing, accounts payable, accounts receivable, high-end construction work, maintenance, and repair.
• Data entry workers (5%) act based on input data but do not add value to information. This category includes clerical and lower-end construction workers and draftspersons.

² The percentages are provided for the workforce in general and not adjusted to the AEC/FM domain.

Information needs of these distinct groups of workers vary widely in terms of the amount, type, source, and variety of information required. High-performance workers need comprehensive input data, continuous updates to professional information, and reference materials.
As the excellence of these workers is often dependent on creativity, they will also ideally need information not obviously related to their jobs, which may spur their creativity through revelation of remote links, usually found by non-goal-driven browsing of books or observation of nature. While structured-task workers may be easily served by task input information and documents detailing procedures and guidelines, knowledge workers will need large amounts of information from very diverse sources, which will widely differ from task to task, and may need to be represented in multiple and non-standard ways to enable insight. Their tasks roughly correspond to human activities that transform information into knowledge, identified by Davenport and Prusak:
• Comparison: how does information about this situation compare to other situations?
• Consequences: what implications does the information have for decision and actions?
• Connections: how does this bit of knowledge relate to others?
• Conversation: what do other people think about this information?
[Davenport and Prusak 1998]

Tasks using standard types of information from known sources, no matter how information-intensive or sensitive to error, are well served by the existing information systems. The centuries-long tradition of building led to the establishment of well-defined information-flow patterns for core processes and curricula for educating and training various participants. Many studies identified standard types of content required by specific groups and roles in the domain at different levels of granularity [e.g. Wix and Liebich 2000; Bowden 2002; Shahid 1996; Tenah 1986; British Telephone 1995; Clayton et al. 1999]. In automated environments, standard workflows, known-item search support, and subscriptions are commonly included in the 20% of features satisfying 80% of the need, following "the 80:20 rule" or Pareto principle, widely used in information system development. Tasks with vague, changing information requirements and non-routine tasks (i.e. high-performance and knowledge work), however, can often represent those "vital few": the 20% of tasks bringing 80% of the value to the business.

An attempt to identify information needs in the domain leads to a series of questions: who can identify the ideal state of knowledge in AEC/FM practice, and how; who can assess the difference between the existing and the ideal knowledge state, and how; and finally, what is the right scope within which these questions should be asked. These questions are very similar to those raised by O'Connor, who contested the overuse of the term "information need", pointing to the obscurity of the underlying concept [O'Connor 1968]. In business environments these questions have recently been addressed by the use of two basic knowledge management techniques: needs analysis and information audit. In needs analysis, information users are asked to specify what information resources they need for performing their jobs, in order to rationalize acquisition and to manage distribution of information resources. The goal of the information audit is to find out how those information resources are actually used.
The audit looks at how activities of different actors relate to organizational objectives, assesses the degree of their strategic significance, and identifies information required to support each activity and information generated by each. It reveals information flows within an organization and between an organization and its external environment and "identifies the existing formal and informal communication channels that are used to transfer information as well as highlighting inefficiencies such as bottlenecks, gaps and duplications" [Henczel 2000].

However, the application of these techniques to the AEC/FM domain is not straightforward. The primary reason is the modus operandi of the industries, which is based on multiple-party projects, that is, ad-hoc alliances of organizations of various specialties, sizes, cultures, and business philosophies [Kazi and Charoenngam 2003]. This entails that information audits should ideally be carried out in a consistent and compatible manner and effectively synchronized within and across different scopes: organizational, project, and industry-wide. This is a daunting endeavour, starting from the identification of appropriate parties to initiate and execute it. In project contexts, information audits will typically focus on information transactions between different project participants [e.g. Wix and Liebich 2000], while organizational audits may rather concentrate on information needed to support particular job roles [Goodman and Chinowsky 2000].
In both cases, however, a task, together with the information needed to support it or generated by it, represents the primary unit of interest, whether it is observed as a part of a business process or part of a job description.

Construction projects involve a wide range of very diverse tasks, from pouring concrete or drafting, to problem solving, decision-making, or designing, to identifying business opportunities and inventing. There are tasks that are repetitive, standardized, and inherent to particular job roles and business processes and there are those that are occasional, or even one-of-a-kind, arising from changing circumstances and newly acquired information. Information requirements for different types of tasks vary widely in terms of the amount, nature, content, form, and quality of required information. Certain tasks require the same limited set of information from known, standard sources for each repetition; others have very intensive but quite vague or constantly changing information requirements; some are more sensitive to erroneous or missing information than others.

An impediment to task-based information audits is the fact that the definition of a task is quite elusive in terms of its granularity; most tasks can be broken down into a series of subtasks as well as subsumed under an unlimited set of summary tasks at different levels of specificity. These complex relationships, including multiple inheritance and synergies, make the identification of information requirements for each specific task, as used in task-modelling, hard-computing approaches, very difficult [Byström and Järvelin 1995].
Furthermore, information is not, or should not be, used only to perform assigned, predefined tasks, but also to shape tasks, define new ones, or to decide whether they should be performed at all. Therefore, identification of the information need must consider many additional intricately interrelated factors, a task being only one among equals.

Information need for the same type of task will differ between projects, based on the project type, complexity, uniqueness, organization, and a host of other variables. It may also differ between organizations, based on their goals, size, culture, procedures, standards, guidelines and so on.

An extremely important factor is the person who needs the information, or information user. Although all project participants work with information that describes or is related to the same project, depending on their job role, every participant in a project:
• needs a different view of project information: a different subset of interest, primary focus, level of detail and precision, aspect of particular entities or topics, and mode of representation,
• uses that subset in combination with different sets of external (professional, corporate, and personal) information, and
• interprets information based on a more or less different conceptualization of the domain.

However, an individual's needs do not vary only in terms of their job role and profession, but also in terms of their personal characteristics: previous knowledge, experience, cultural background, preferences, cognitive capabilities, goals, priorities, standards of excellence, curiosity, and habits.
User needs and behaviours have been a topic of interest of multiple disciplines, including cognitive science, information science, and market research. The wide variety of user models that have been used in practice vary in terms of whether they are models of typical users or models of individual users, are specified explicitly by the user or inferred by the system based on the user's behaviour, and contain longer-term general information (e.g. job role or general interests) or highly specific short-term information (e.g. session data) [Rich 1979, 330].

Standards of performance represent yet another modifier of the information need. These standards can be explicitly set or implicit and based on either long- or short-term business goals or on personal criteria. They depend on the nature of the task, job role, type of project, and on all other elements of context and can range from satisfying, correct, or good to better, cheaper, faster, to perfect, creative, and innovative.

Location can affect information requirements through differences in legislation, language, local general expectations of quality, climate and other environmental factors, and so on.

Cost is an important but frequently unrecognized factor. It involves costs of creating, obtaining, and consuming information, either direct monetary costs or those caused by the time and effort required. If the costs are too high, an information requirement is typically ignored or reduced, although often subconsciously. As Stibic noted, "everyone uses only the information that he can effortlessly find, and ignores sources and types of information that are not easily accessible" [Stibic 1980, 2].

External environment is a very broad concept that may include economic, market, or technology conditions and trends, competition, budgetary constraints and so on. This is an especially challenging modifier since its extent, content, patterns, and rules are difficult to grasp and capture.

Information requirements also vary with time, chiefly due to the inevitable change in other elements of the context. The set of information that will satisfy a need must reflect the current state with respect to both past and future.

A precise definition of the information need within any scope (project, organization, task, person) is not an easy task, due to its context sensitivity and the diversity, complexity, and inter-relatedness of context factors. Any potential efforts on modelling or developing ontologies of information needs should take into account their very finely grained multi-faceted nature. The extent to which information needs of AEC/FM actors are currently satisfied can only be assessed in terms of the ability of the existing systems to adapt to the variety of specific contexts.

The act of satisfying the information need involves its recognition, formulation, matching to available information, and obtaining access to information deemed relevant. This process comprises a number of steps involving the information user, the information provider, and any potential intermediaries: information professionals or, increasingly, computer programs. While in some domains the roles are strictly separated, in AEC/FM industries the same person typically plays multiple roles.
The subsections that follow will describe different aspects and steps of this process.

2.2 Information Flow Channels

In traditional practice, as shown in Figure 1, information flows through various combinations of delivery modes within three basic types of environment: socialization, mediated communication, and information storage and retrieval. The differences are in the co-occurrence of participants in time (synchronous vs. asynchronous communication) and space, in the degree of immediacy, interaction, and contextualization, senses involved, persistence, retrievability, and availability of the information exchanged.

Figure 1. Information flow environments.

In direct-communication environments, people observe each other working, ask questions, give and receive directions and advice, and exchange experiences. Information flows through ad hoc patterns of push and pull (described in the following section) in an iterative process of defining and satisfying information need. The information is always received in its full context but all information exchange remains recorded only in personal memories of the participants.

Mediated communication can be more or less formalized, synchronous or asynchronous, involving ad hoc or standardized request-response patterns. It can use a variety of media, such as paper, telephone, e-mail, fax, or video-conferencing. Both the need and response are externalized and can be persisted. However, what will be persisted, where, and how, is left to the discretion of the participants or imposed and enforced by organizational procedures.
In information storage and retrieval environments, information considered relevant for a certain community is persisted and organized in such a way that it can later be retrieved by anyone who might need it. Both the definition of the information need and its externalization are the responsibility of the information user, although human or machine assistance may be available. Implementations vary in terms of who decides what information to store, who makes it retrievable, and how.

The advent of new technologies is dramatically changing the communication environment and information flow patterns. It has blurred the distinction between the classical communication environments; telepresence or automated workflow, for example, are difficult to fit into any single category. Both live and mediated exchange of information is increasingly captured and stored for future retrieval. On the other hand, participants in a construction project rarely store information in and retrieve it from a single system. Information entered into one system is passed into another, modified, pruned, or enriched, and passed further. An increasingly complex network of intermediary systems (Figure 2) usually exists between creators and consumers of information, performing any of the steps comprising the process of satisfying information need and providing additional, value-adding services. The increased number of intermediaries, while substantially increasing communication capabilities and the availability and quality of information, at the same time impoverishes human-to-human communication.
As a result, there is a lack of the context awareness, flexibility, easy communication of meaning, and resolution of misunderstanding that characterize the process of socialization and some more direct forms of communication.

Figure 2. Current communication flow.

The ideal environment would feature continuous collocation or telepresence involving all human senses, with the possibility of both synchronous and asynchronous two-way communication, and all information persisted and retrievable within its full context. It would allow continuous, direct interaction between information users and providers through all steps of the information-acquiring process, along with all the benefits brought about by new technologies.

2.3 Information Acquiring Modes

Two basic models for acquiring relevant information, pull and push, can be defined based on the party performing the basic steps of defining and refining the information need. In the "pull" alternative, the information user both specifies the information need and locates information resources satisfying it. "Push" involves the information provider defining the need and delivering corresponding information resources. Push technologies have roots in traditional education and mass media. In traditional business environments, the other party can be a mentor, manager, participant in the previous step of the business process, or an information officer or a corporate librarian implementing the process of information dissemination. Examples of push technologies in networked environments include e-mail lists, content subscriptions, and Webcasts.
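The difference between the two modes can be illustrated with a minimal sketch. It is not taken from any system discussed here: the `Resource` and `Provider` classes, the topic sets, and the stored interest profiles are all hypothetical names introduced for illustration. In `pull`, the user supplies the statement of need; in `publish`, the provider decides who needs a new resource.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """Hypothetical information resource tagged with topics."""
    title: str
    topics: set


@dataclass
class Provider:
    """Hypothetical provider holding resources and per-user interest profiles."""
    resources: list = field(default_factory=list)
    profiles: dict = field(default_factory=dict)  # user name -> set of topics

    def pull(self, query_topics):
        """Pull: the user states the need and asks for matching resources."""
        return [r for r in self.resources if r.topics & query_topics]

    def publish(self, resource):
        """Push: the provider defines who needs the resource and delivers it.

        Returns the sorted names of the users the resource is pushed to.
        """
        self.resources.append(resource)
        return sorted(
            user
            for user, interests in self.profiles.items()
            if interests & resource.topics
        )
```

In this sketch the only difference between the modes is which party supplies the topics that define the need: the caller of `pull`, or the profile the provider keeps for each user.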
A detailed discussion of delivery mechanisms in networked environments is available in [Franklin and Zdonik 1998]. Their taxonomy of these mechanisms, however, does not elaborate the aspect most relevant in the context of this work, i.e. which party (the user, the provider, or the system) performs which step of the process: recognizes, formulates, refines, modifies, and updates the information need, initiates information retrieval, locates potential sources of information, and identifies and acquires appropriate information resources. This aspect encompasses a continuum from the user's precise request for a specific piece of information, through the definition of a continuous information need by more or less granular user profiling, to the very broad statement of general interests expressed by subscribing to a mailing list, e-zine, or other source of content. While one end of the continuum puts the user in control, the other opens the door for overload with irrelevant information, spam, and "mind control", but also provides for serendipity.

In construction projects, traditional information flows still prevail: direct or mediated (primarily fax or e-mail) exchange of information between two parties involving both push and pull, and publishing to a central repository (e.g. electronic document management system or shared project database) from which interested parties can pull information as needed. Workflow management systems, which are slowly penetrating the domain, use the push mode with the definition of the information need based on the user's job role.
Aggregation and subscription services are used for acquiring external information, such as industry news (e.g. AECCafe.com).

2.4 Information Management Disciplines

A few basic approaches to information management can be distinguished, corresponding to traditional disciplines from which they originate. Principal differences are due to the nature of information they are handling and the context in which that information is created and used.

Librarianship is concerned with providing access to the content of information items rather than to the original items themselves. The primary arrangement of items is by subject and organizational schemes are based on disciplines, although many additional access points, such as author or date, are provided. Information handled by libraries is very diverse but typically created for the very purpose of creating information and not as a side-product of another activity. Online databases maintained by commercial information providers, digital libraries, and the large majority of Web sites are based on this traditional model. Information architecture and associated disciplines currently used in Web content management all evolved from this traditional discipline and are based on its principles and techniques.

Information items handled by archives, however, are created and assembled in the natural course of human activity and represent evidence of what has happened. The context in which the items were created is more important than their content.
Therefore, the arrangement is always "by provenance" and in the same series and chronological sequence in which the items were originally assembled, although additional author or subject indices may be provided. For the same reason, archivists describe entire collections of related materials rather than individual items within them. Unlike in library systems, the actual items and their authenticity are important. Although, in this highly litigious industry, archiving is performed primarily for use in potential legal cases, some of these principles have found (or could find) use in developing and managing corporate and project memories, in capturing "intention", and in context-based information management.

Records management deals with information generated in the conduct of the affairs of an organization while this information is still in routine use and not ready to be archived or disposed of. Records managers distinguish three basic kinds of information: transactional, reference, and housekeeping records. The organization scheme needs to be devised for each particular organization, starting from the top-level classification by function, several basic types of filing methods, and established evaluative principles. The only universal rule is "file for retrieval." The discipline is primarily occupied with logistical aspects of handling information through its lifecycle, rather than with its content, and information storage and retrieval is only one of its components. Principles and techniques of records management are used in logistical aspects of information management, and especially in workflow management systems.
Information Resources Management (IRM) concerns itself with managing information produced by an organization throughout its life cycle (like records management) as well as with the collection and use of information useful to the organization in meeting its missions, goals, and objectives (like corporate librarianship). In addition, IRM regards information as a resource that needs to be managed together with other related resources, such as human resources, funds, equipment, and technologies. It can be said that IRM is becoming an obsolete concept that has been integrated in the broader discipline of Knowledge Management (KM).

In addition to these general disciplines, some domain-specific approaches to information management have been developed. Medical and legal information systems are probably the best known and most sophisticated examples. Two other types of systems handling specific types of information, however, are of more interest for the AEC/FM domain.

Museum documentation tackles challenging problems of describing and organizing a wide variety of real-world objects: artefacts, works of art, and specimens. As both objects and information about these objects are diverse, substantially different organizational schemes have been used for different collections. Historically, collection management systems have been geared to internal users (e.g., curators, registrars) and only recently started targeting external, particularly Internet, audiences. The change from local focus to sharing led to the need to standardize collection and item descriptions.
Probably due to the recency of these developments, no systems used in other fields are based on this model. However, as the complexity of issues involved in organizing and making retrievable very diverse real-world objects and their surrogates at multiple levels (i.e. object itself, its picture, digitized version of that picture, item vs. collection etc.) triggered a substantial amount of research, it can be expected that some of the solutions developed through that research will be re-purposed for different domains. The most likely application area is management of product information on the Web.

Disciplines and industries using geospatial information have many problems in common with the AEC/FM domain, first of all the need to manage spatially referenced information. Although information in these domains is fully structured and referenced to the Earth, basic principles of spatial referencing and integration, as well as data structures, can easily be applied to other kinds of information and to any spatial referencing system. As location, geometry, topology, proximity, connectedness, and visual representations are essential elements of AEC/FM information, techniques developed for managing geospatial information may enable capturing and presentation of some important relationships and some convenient ways to manage information, especially with the increased use of sensing technologies in the domain. To some extent, these systems also involve management of temporal information, i.e. change of spatially referenced information over time, which is yet another problem shared with the AEC/FM domain.
The strong need for global sharing of heterogeneous geospatial information led to early and intensive development of interoperability solutions, such as data and information registries, metadata, and shared information infrastructure, which can certainly be considered for reuse.

2.5 Making Information Findable

The two basic techniques used to make information findable are sorting and description.

Sorting can involve arrangement of information resources or their "surrogates" (e.g. descriptions or titles) in a specific linear order (e.g. alphabetical or temporal) or classifying them into groups based on a certain kind of similarity (e.g. subject or format). The main purpose of classification is to reduce cognitive overload and facilitate access to information by reducing the detail and diversity of data by grouping similar content items together [Brücher et al. 2002, 1]. Examples of sorting include ordering of books on shelves, filing of company records, assignment of codes from a classification scheme, and development of navigation structure in Web sites. Large amounts of information may require sorting into deep and complex hierarchies.

More recent automated approaches to sorting include content categorization and content clustering. Automatic categorization involves assignment of content items to a predefined set of categories. This set is typically static, and a set of content items is manually assigned to the categories and used for training categorization algorithms. Clustering algorithms explore similarities in the content of information resources and arrange them in groups based on the discovered similarities.
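The contrast between the two automated approaches can be sketched in a few lines. This is a deliberately simplified illustration under stated assumptions, not one of the algorithms surveyed in the cited literature: word overlap stands in for a trained classifier, a greedy single-pass grouping with a Jaccard similarity threshold stands in for a clustering algorithm, and all function names are hypothetical.

```python
def tokens(text):
    """Represent a content item as its set of lower-cased words."""
    return set(text.lower().split())


def jaccard(a, b):
    """Share of words two items have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def train(labelled):
    """Categorization, step 1: build a vocabulary for each predefined
    category from items that were manually assigned to it."""
    vocab = {}
    for text, category in labelled:
        vocab.setdefault(category, set()).update(tokens(text))
    return vocab


def categorize(text, vocab):
    """Categorization, step 2: pick the predefined category sharing the
    most words with the item (ties go to the first category trained)."""
    words = tokens(text)
    return max(vocab, key=lambda c: len(words & vocab[c]))


def cluster(texts, threshold=0.3):
    """Clustering: no categories or examples are given in advance.
    An item joins the first group whose first member is similar enough,
    otherwise it starts a new group."""
    groups = []
    for text in texts:
        for group in groups:
            if jaccard(tokens(text), tokens(group[0])) >= threshold:
                group.append(text)
                break
        else:
            groups.append([text])
    return groups
```

Here `categorize` can only ever return one of the labels seen in training, whereas `cluster` returns unnamed groups whose number depends on the data and the chosen threshold.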
Unlike categorization, clustering is not based on a predefined structure of knowledge: classes are not predefined and no examples are given [Brücher et al. 2002, 2].

Description involves capturing different information item attributes that are considered potentially useful access points (e.g. author, subject, title, date, content type). Descriptions can be persisted in catalogue cards, database records, file headers, or XML files. Over a century ago, Cutter proposed the three basic "objects" of the catalogue: (1) to enable a person to find a book, (2) to show what the library has, and (3) to assist in the choice of a work [Cutter 1904, 12]. More recently, the International Federation of Library Associations identified four basic functions of bibliographic records: to enable users to (1) find, (2) identify, (3) select, and (4) access information of interest [IFLA 1998]. Svenonius identified as the fifth requirement the need to collocate information matching certain criteria (e.g. same subject, author, or format) [Svenonius 2000]. Although research on automated extraction of metadata gave some acceptable results, and some attributes' values are already routinely extracted by software, the full automation of metadata extraction is still not an option, especially for attributes that require human judgement (e.g. subject) [Greenberg et al. 2005].

In addition to these two methods, cross-referencing, which was traditionally used from within information items in the form of footnotes and citations, is increasingly applied externally in the form of "related resources", "more like this" etc.
It involves relating information resources in different ways; it can be based on matching particular information item attributes, on capturing user behaviour (e.g. Amazon's "Customers who bought this book also bought..."), or on intellectually relating different pieces of content.

2.6 Matching

A successful matching process requires (a) proper definition and externalization (representation) of the need and (b) suitable representation of available information. In true pull environments, users access information in two basic ways: by searching and by browsing. Which one will be used in a particular case depends on the initial clarity of the information need, on the user's preferences, on features offered by the system, and on the user's familiarity with the system and information-locating techniques in general.

In the case of browsing, the user externalizes, and potentially refines, the information need by navigating through available virtual categories, bookshelves, or file folders and exploring their content. Browsing requires prior sorting of information items according to a certain classification scheme. The best results are achieved if either of these conditions is met: (a) the user is familiar with the scheme, (b) items are classified following clear, easily understood, and well-defined rules. In addition to hierarchical browsing, users often use "side-jumping" while looking for information. This can include footnote chasing, citation following [Bates 1989], following hyperlinks, "more like this," or "related" links, as well as random browsing of books while walking through a library.
This technique is used in the process of refining, clarifying, or re-defining the information need.

In searching, whether supported by a search engine, database query utility, or card catalogue, the user externalizes the information need by formulating or selecting appropriate search terms. Searching requires convenient representations or descriptions of information resources against which user queries can be matched. The representations can be full text indexes or information resource descriptions discussed in Section 2.5. Successful searching requires that terms used by the user match terms found in system representations of information items. Various implementations and variations of thesauri can be used to enhance outcomes of the matching process.

It is important to note that in practice, the processes of searching and browsing are rarely separated. First of all, the information-seeking process involves continuous refinement of the information need through "sense-making" [Dervin 1999] and "berrypicking" [Bates 1989]. Different stages of the process may require switching to a different technique, or the user may need to adapt the externalization of the information need using techniques such as "pearl-growing" (i.e. using descriptions of the first located useful item to search for others). In addition, many information systems feature hybrids of the two basic approaches.

2.7 Types of Information Management Systems

Business environments have provided a fertile soil for many new types of information systems.
While some of these systems represent distinct types in terms of functionality and types of information they handle, many more are simply variations, combinations, and modifications of those distinct types, appearing on the market under different names.

Management Information Systems (MIS), Executive Information Systems (EIS), Decision Support Systems (DSS), and other similar tools used to support decision-making, in their original form, dealt primarily with internally generated and centrally controlled structured data. As such, they did not involve significant information management challenges; their functionality evolved from a set of predefined queries generating reports on the status of a business to more interactive applications dynamically generating custom reports and result presentations based on user's input. Over time, however, these systems grew more complex because of the intrinsic need to consider external and non-structured information in strategic decision-making and evolved into today's Business Intelligence (BI) solutions. This broad category of applications uses a variety of advanced technologies for extracting information from multiple and heterogeneous sources of data and for sophisticated data analysis and visualization to support situation awareness, risk assessment, and decision making.

Product Data Management (PDM) systems, which first appeared in the late 1980s, are bringing together structured and unstructured information about products, along with process management functionality.
Specific products vary widely in classification and search capabilities but, in general, product and component names grouped into often complex, company-specific hierarchies are used as primary access points. PDM systems typically use predefined document types for grouping product documentation related to particular products.

Electronic Document Management (EDM) or Document Management Systems (DMS) can have very diverse features but, in general, draw on techniques developed within librarianship and records management. Content is typically classified based on some combination of disciplines, subjects, and content types. Along with standard access control and version management features, these systems often provide for some type of resource description, possibility to integrate controlled vocabularies, and workflow management. Some more recent types of information management systems evolved from EDM.

Content Management Systems (CMS) are optimized for the Web and support a variety of processes related to development, management, and distribution of content. Content can be entire documents, document fragments, or pieces of data originating from one or more similar or diverse applications and can be reused and recombined for various purposes. Workflow management is almost always integrated in these systems.

Workflow Management Systems (WfMS) are used for definition, management, and execution of computer facilitated or automated business processes or parts thereof.
Tasks are assigned and passed on, along with required information, from one participant to another, and their progress is tracked by a computer program according to pre-defined rules and procedures: a computer representation of the workflow logic [Hollingsworth 1995; Plesums 2002].

Collaboration solutions combine some of the above-mentioned systems with various tools and technologies that provide support for both synchronous and asynchronous collaborative work. They enable document or content management, workflow automation, various types of real-time interaction (e.g. video conferencing, whiteboards, chat, file and application sharing), group scheduling capabilities, discussion groups, redlining, and so on.

Personal Information Management (PIM) solutions include a variety of tools, applications, and technologies that help individuals manipulate personal information: organize, excerpt, annotate, group, and link it.

Numerous project information management solutions targeting the AEC/FM industries (e.g. Autodesk Buzzsaw, Constructware, e-Builder, PrimeContract, ProjectEdge, Project-Talk), marketed under different names, typically include the functionality of all the afore-described systems.

2.8 AEC/FM Domain Challenges

2.8.1 Abundance

Tardif, Murray & Associates Inc.'s analysis showed that an average US$10 million construction project generates approximately 56,000 pages of documents [Hendrickson and Au 2003].
The amount of information a project participant is coping with is several times larger, since documents make up only one portion of project information and people typically work on more than one project at a time and use a considerable amount of external information. This order of magnitude demands solutions for efficient locating of relevant information. An IDC research study cited in [Global Learning Alliance 2004] found that:

• "Knowledge workers spend 15% to 20% of their time looking for specific information.
• Less than 50% of these searches are successful.
• The non-successful actions cost a company with 1,000 knowledge workers $6,000,000 (six million US dollars) in loss of time annually."

Another, related implication is the need to fight "infoglut" or information overload [Wurman 1989; Lewis 1996]. A report prepared for the Associated General Contractors of America [Deloitte and Touche 1996] pointed to the huge amount of letters, memos, logs, and other project communications inundating construction project managers. As Herbert Simon noted, "What information consumes is rather obvious: it consumes the attention of its recipients. Hence, a wealth of information creates a poverty of attention and a need to allocate that attention efficiently among the overabundance of information sources that might consume it." [Varian 1995]

The large amount of information in the domain, together with the pace at which new information is created and needs to be made available, makes a perfect case for extensive use of automation.
Many sophisticated and efficient techniques for making information findable without extensive human efforts are available; however, they are not necessarily applicable to construction projects. One reason is that many of these techniques (e.g. Web search engines) require substantially larger corpora in order to be efficient, such as billions of pages and the immeasurable "hidden Web" available on the Internet. The other is the differing degree of error tolerance across domains: a misplaced piece of information is perfectly acceptable on the Internet; in a construction project, consequences can be serious enough to justify the increased costs of human effort on organizing information.

2.8.2 Heterogeneity

Information used in construction projects is extremely heterogeneous along several axes. First of all, it is multi-modal; textual, numerical, graphic, and, increasingly, multi-media formats are equally important. In each of these modes information can be structured to a varying degree and in various ways, ranging from unstructured text, photographs, and video to semi-structured CAD drawings and well-formatted text to databases, "smart" CAD, layered multimedia, and XML documents with very diverse schemas. Furthermore, only a portion of information is generated in digital form; telephone conversations, faxes, and cocktail-napkin sketches are still prevailing modes of communication. A single person performing a particular task will typically need information from any combination of these categories.
An important requirement for efficient information management in the domain is the possibility to relate and seamlessly integrate all different types of information. Many advanced technologies, such as natural language processing (NLP) [Lewis 1991], content-based image retrieval (CBIR) [Long, Zhang and Feng 2003], or business intelligence [Dhar and Stein 1997], are format-specific and can solve only a small portion of the overall problem.

Information needed for one person's work resides in a variety of places: in standard repositories such as file systems, databases, and libraries, but also in various applications, from e-mail systems to engineering analysis software. Information systems should ideally be able to communicate with all these applications, select and access desired information, and integrate it with information from other sources.

Information also originates from various sources. Not only do different companies use different applications, they also have different information management practices and resources. It is not realistic to expect that everything will be consistently described or structured at the source, for example that all project participants will consistently assign high-quality metadata to information they generate using the same schemas, data types, and sets of controlled values. The domain includes a network of distributed, autonomous information systems with no central control.

In addition, each project is unique and different projects carried out by the same company may have substantially different information pools and requirements.
Specific resources used for organizing information, such as classification schemes, metadata schemas, content types, or controlled vocabularies, may not be suitable for all kinds of projects. Seamless semantic reconciliation and the ability to switch from one to another would be useful even within a company.

2.8.3 Different views

As discussed in Section 2.1, depending on their roles, affiliations, and personal characteristics, different project participants require different views of a project: they need different subsets of project information, care about different properties of the same entities and different levels of detail, and want to organize their subset of interest in different ways.

Several recent research projects developed solutions for discipline-specific views of shared project information. Researchers from Eindhoven University of Technology, for example, focused on the specific needs of architects [van Leeuwen and Fridqvist 2002], those from the Technical University of Dresden on the needs of engineers [Gehre and Katranuschkov 2002]. [Amor and Faraj 2001] pointed to the difficulties of reconciling different views and mapping between different models in integrated project databases.

Existing information retrieval/delivery systems typically deal with a simpler aspect of the problem, that of deriving different views by filtering and re-organizing. Different customization, personalization, localization, and filtering mechanisms are used for this purpose. However, these techniques do not provide for reconciliation of different foci, levels of detail, and conceptualizations.
2.8.4 Change

Over the lifecycle of a project, the amount of information grows, the content is modified, and the information need changes. Changes in requirements, design, specifications, and legislation, as well as unexpected conditions, represent major causes of delays, cost increases, and resulting litigation in the industry [Ahmed 2003]. In addition, as [Perry et al. 1999] point out, different project phases require different organization of information. Each project is unique and different projects require different organizational structures [Walker and Hughes 1987], which implies different information flows and the need for different organization of information.

Version control and change notification are already standard features of most information management systems, thus supporting the changing content. However, support for changing information need is currently in its infancy, still only explored in research projects dealing with adaptive information systems.

2.8.5 Multiple information spaces

Participants in a construction project always work at the intersection of multiple information spaces: personal, project, organizational, professional, industry, and global information spaces can be distinguished. The implications are multiple.

Security issues are discussed in detail in [Damm and Schindler 2002]. As security is currently a major concern in all domains, numerous technologies and mechanisms have been developed and implemented in practically all information management systems.

[Howes 2000 as cited in Amor 2001] claims that legal ownership of data is the key obstacle for the use of IT in construction.
Ownership issues are related to how much and what part of its knowledge an individual company should share, and who has the right and the need to keep what portion of project information upon its conclusion. This set of issues is primarily resolved by business and legal procedures and agreements.

The implication of primary interest in the context of this work is synchronization of diverse information management practices carried out by multiple parties.

2.9 Problem Definition

This subsection pins down the main problem to be addressed by this thesis. Information and information flow in the domain are identified as a complex system that cannot be solved by reductionist decomposition into manageable components. The section suggests that complexity and heterogeneity are the key features of that system and must be preserved in any abstraction.

Based on the analysis of the problem, it can be stated that information management and use in the AEC/FM domain has to deal with:

Large amounts of diverse types of information resources,
which are related in different ways
providing continuously changing information
about numerous and diverse entities
which are involved in diverse and changing relationships between themselves and with entities in other domains
need to be used by diverse people
using different terminologies
conceptualizing the domain in different ways
having different backgrounds, capabilities, and preferences
for diverse purposes
requiring different aspects
of different sets of entities
at different levels of granularity/abstraction
represented in different ways
in different applications
on different devices
allowing different displays.
The foregoing statement points to diversity along multiple axes, complex interrelationships of information, its subjects, and various elements of the context in which it is used, and an all-encompassing change. Information used and generated in construction projects and its flow represent a system that is not merely complicated but truly complex. It is an open network with a large number of heterogeneous elements involved in rich and dynamic interaction which involves transfer of information. This network, characterized by distributedness, asymmetrical relationships, non-linear interactions, lack of central control, recurrence, and continual change, has all the features of complex systems.

Merely complicated systems can be analyzed and decomposed into manageable parts or modules, which can easily be solved when isolated and subsequently integrated into a whole. Bidding, design, cost, and facilities management information, for example, have traditionally been managed separately, each having a filing system or computer application of its own. The introduction of product data management, document management, and workflow management systems came out of the recognized need to integrate these distinct modules. However, although each of these systems acknowledges its own incompleteness by including other elements of the whole to some extent, each of them does it from its own perspective, clearly showing its specific focus and origin; they are all clearly document-centric, product-centric, or process-centric and unable to effectively handle the entire spectrum of information in the domain.
True complexity, however, means that elements of a system are interrelated in such a way that simple reductionist decomposition and re-amalgamation cannot provide a suitable solution. The focus will therefore not be on parts of the system, but rather on their relationships [Froese and Staub 2003]. If an abstraction of this portion of the real world is necessary in order to understand and manage it, it is the complexity, heterogeneity, and change that will be preserved rather than ignored in that abstraction.

3 Research Objective and Scope of Work

This section states the objective of this dissertation and delimits its scope within the overall problem area.

This work strives to bring the AEC/FM domain a step closer to the ideal described in Section 1.1. It aims to combine the quality of information flow characterizing socialization environments with the benefits and potentials of information technologies, within a highly heterogeneous complex system. The search for the solution starts with the premise that: (a) the key to the solution is efficient communication between the numerous and diverse information systems that lie between equally numerous and diverse creators and consumers of information, and between humans and information systems at each point of interaction, and (b) all communication within that complex system needs to preserve the essential qualities of human-to-human communication, while enhancing it by the power of new technologies.

In that effort, the thesis does not aim to replace the existing approaches with a new and better approach, nor to build a final solution on top of the existing efforts.
Rather, in recognizing both the value and insufficiency of the existing approaches, the goal is to provide a framework that will: a) allow the existing efforts to coexist and work together towards the shared ideal, without deeming any single one right or wrong, b) allow AEC/FM domain participants to better manage information using any or all of the existing approaches, and c) provide the infrastructure for further efforts towards achieving the ideal.

3.1 Hypothesis

It is possible to develop a high-level framework based on characteristics of human-to-human communication that can leverage a variety of existing efforts aiming to enhance information use in the AEC/FM domain.

3.2 Statement of objectives

• Identify the set of factors that is common to the resources and approaches currently available for managing AEC/FM information and can therefore be used to relate them.
• Develop a framework in which these resources and approaches can work together.
• Develop a prototype that uses the framework and demonstrates its benefits in addressing the problem identified in Section 2.9.

3.3 Scope of Work

This section summarizes the work done in the research described in this document.

The work described in this document builds upon prior research and existing practices in a number of different fields by:

• Analyzing in depth and breadth the problem of information needs in the AEC/FM domain.
• Suggesting that a prerequisite for efficient information management in the domain is efficient communication between the diverse human and machine participants forming a truly complex system, and analyzing the communication problem in electronic environments by comparing it to human-to-human communication and considering all three components of semiotics: syntax, semantics, and pragmatics.
• Analyzing existing information-management resources and mechanisms within the semiotic framework and identifying their common factors.
• Proposing a high-level framework that can be used to relate a variety of existing semantic resources and information management approaches.
• Proposing a system architecture in which the proposed framework should be used.
• Presenting an application that uses the proposed framework for managing information.

3.4 Assumptions and Delimitations

Despite the fact that the AEC/FM domain is not yet fully computerized and networked and much of the information is still exchanged on paper, via fax, or by telephone [Rivard 2000; Rivard et al. 2004; Björk 2001; Björk 2002], this research will focus exclusively on electronic information environments, for three reasons. First, the increased adoption of information technologies in the domain, however much it lags behind other industries, is an undeniable trend; second, management of non-electronic information is to a great extent handled by computers; and finally, even if neither information nor information management is computerized, the key challenges and techniques are basically the same.
Partly because of this focus, the research does not refer specifically to knowledge management, despite numerous references to knowledge-management-related projects and concepts throughout the document. The processes of learning, capture, and transfer of tacit knowledge through internalization, externalization, and socialization [Nonaka and Takeuchi 1995] are out of the scope of this work; however, the research is strongly related to the use, management, and retrieval of captured explicit knowledge.

The research starts from the assumption that some other current trends will continue in the future as well, specifically:

• continuing and increasing use of the Internet as the primary information environment,
• continuing increase in data storage and processing capabilities of computers, accompanied by decreasing costs,
• development of sophisticated machine-intelligence techniques enabled by these trends,
• further development of technical infrastructure and standards facilitating information sharing,
• diversification of devices and communication technologies,
• development of security and trustworthiness-determining mechanisms.

This work proposes only a framework (a set of constructs, guiding ideas, and principles), not a complete system. It does not offer a new domain schema, ontology, knowledge representation technique, conceptual formalism, or notation, nor does it endorse any particular implementation of the proposed framework. Domain schemas and ontologies are discussed throughout the document, but only as they relate to other types of semantic resources.
The purpose of the work is not to enable full interoperability of applications using different schemas, or reasoning and problem solving across disparate knowledge bases, but to enhance management of domain information (only a small portion of which is structured according to standard schemas and ontologies).

The work focuses on the use of information in the AEC/FM domain. Although the proposed framework is generic, i.e. derived from the analysis of general communication, semantic resources, and technologies, the requirements are based on the specific features and needs of the AEC/FM domain, and these needs and AEC/FM-specific semantic resources were used for evaluating the deliverables. Consequently, although the proposed framework can be used in any domain, it will be particularly useful in the AEC/FM industries or in other domains that have a similar modus operandi and information use patterns.

3.5 Methodology

As this research is strongly interdisciplinary, it faced the challenge of choosing the research methodology most suitable for the particular purpose among the very dissimilar methodologies typically used in the areas of engineering, social sciences, and computer science. The epistemological paradigm used can be best described as a combination of postmodernism and pragmatism, and the research methodology as eclectic.
As the topic and the domain have least to do with natural sciences, or more precisely, because the goal of the research was not to observe the existing world in order to predict and control it, positivism and its methods were deemed unsuitable, and the methodology can be best explained by referring to research methods used in computer science.

The research process was to a considerable degree, although not exclusively, exploratory in nature. It started with a vision of the ideal state of information management in the AEC/FM domain and proceeded with a very broad, multi-perspective analysis of the general problem area. As the exploratory method inherently does not allow detailed specification of a work plan in advance, the problem and a general solution idea were gradually clarified through the process of exploration, resulting in the problem definition. From that point on, the constructive portion started, which involved the development of the framework and the pilot implementation. However, the exploratory-method component ran in parallel until the very end, meaning that the research problem was further clarified and specified during the process.

The work is based on an extensive and multi-perspective analysis of the problem and existing approaches for addressing it. The conceptual framework was built by synthesizing the results of the analysis and interweaving them with the guiding idea translated into a series of conjectures. As the research is non-experimental, the results are validated against predefined requirements, rather than by empirical evidence.
The framework is validated by demonstrating that it can be used to relate different semantic resources and other approaches to information management, and through a prototype that satisfies the requirements set for the framework.

Business requirements were derived from the literature on general information-related behaviour and on functions of different types of semantic resources (Section 7.1). These requirements were later related to sample work tasks, in generic use cases associated with domain-specific scenarios from very different contexts (Section 8.2.2). Functional requirements needed to support the use cases have been identified and expressed in terms of the framework (Section 8.2.3). For practical reasons, a set of test cases, corresponding to the use cases but involving a limited dataset, was developed and tested in the pilot (Section 8.3.7).

The pilot was used to prove on a small scale that this framework can be implemented in a working system, which can be used to successfully accomplish a small set of tasks that is representative of the overall information-related needs in the domain. Due to the largely exploratory nature of the research, only testing of potential future systems implementing the framework may provide the opportunity to fully assess the extent to which the results of this research can affect the practice in the AEC/FM domain.

4 Communication

This section analyzes direct human-to-human communication and compares it with communication in electronic environments.
As specified in Section 3, the premise of this thesis is that successful information management necessitates efficient communication between the numerous information systems that lie between creators and consumers of information, and between humans and information systems at each point of interaction, without loss of the essential qualities of human-to-human communication. Human communication entails the ability of participating parties to convey and understand meaning, as opposed to mere data transmission, which is the main concern of communication technologies. This section will provide an overview of how meaning is conveyed in direct interpersonal communication, with the goal of identifying factors necessary for efficient exchange of meaning in electronic information environments.

Any communication necessitates the use of some kind of a shared language. A language can be seen as a system of signs that are used according to a set of specified rules in order to convey information. A sign is a discrete unit of meaning that consists of at least two elements: signifier and signified.³

A signifier is a word, phrase, image, gesture, sound, or anything else⁴ that denotes something other than itself: a particular signified. The words "door" and "porte", the CPV code "28122200", and graphical symbols for a door are all signifiers for the same signified.

Signifieds can be seen as anything that can be thought of and may therefore need an agreed-upon signifier in order to enable any related communication.
The notion of a signified comes closest to the concept of "thing" used in high-level ontologies ("the universal collection: the collection which, by definition, contains everything there is" [Cycorp 2002]) or "subject" in topic maps ("anything whatsoever, regardless of whether it exists or has any other specific characteristics, about which anything whatsoever may be asserted by any means whatsoever" [Garshol and Moore 2005]). To communicate, two parties must know both what signifiers stand for and how they can be combined to describe a concept, tell a story, or ask a question.

3 The dichotomy of signified and signifier comprising a sign and the terminology are adopted from de Saussure [Saussure 1966]. Although there are many more theories of signs, this particular one is used here simply because it is considered suitable for introducing the concepts.
4 de Saussure himself focused on linguistic signs and, even more specifically, considered spoken words and utterances (images acoustiques) as signifiers.

Examples of signifieds include: CEME building, Twin Towers, the fictional Dreamland resort used in this document's examples, Atlantis, Utopia, phenomenology, system theory, project start date, Olympic Games, Vancouver 2010 Olympics, risk, change orders, letter sent on May 12, 2001 from Joe Doe to Jane Doe, fire rating, Industry Foundation Classes. Each of these signifieds can be represented by multiple signifiers of different kinds.

A syntagm is a systematic arrangement of signifiers that forms a meaningful whole, such as a sentence or paragraph in a natural language. Signifiers are concatenated or combined in other ways to form a syntagm following syntactic rules.
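The many-to-one relationship between signifiers and signifieds described above can be sketched as a pair of lookups. This is only an illustrative sketch: the "door" signifiers are the ones named in the text, while the identifiers `DOOR` and `WINDOW`, the "kind" labels, and the function names are assumptions introduced for the example.

```python
from collections import defaultdict

# Illustrative sign inventory: (signifier, kind of signifier, signified).
# The "door" signifiers come from the text; the identifiers DOOR and WINDOW,
# the kind labels, and the "window" entry are hypothetical.
SIGNS = [
    ("door", "English word", "DOOR"),
    ("porte", "French word", "DOOR"),
    ("28122200", "CPV code", "DOOR"),
    ("window", "English word", "WINDOW"),
]


def index_signs(signs):
    """Build lookups in both directions between signifiers and signifieds."""
    signified_of = {}
    signifiers_of = defaultdict(set)
    for signifier, _kind, signified in signs:
        signified_of[signifier] = signified
        signifiers_of[signified].add(signifier)
    return signified_of, dict(signifiers_of)


def same_signified(a, b, signified_of):
    """True if both signifiers are known and stand for the same signified."""
    return a in signified_of and signified_of.get(a) == signified_of.get(b)


signified_of, signifiers_of = index_signs(SIGNS)
```

Under these assumptions, `same_signified("door", "porte", signified_of)` holds while `same_signified("door", "window", signified_of)` does not, which mirrors the point that one signified can be reached through several signifiers of different kinds.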
A paradigm is the set of all items that can be substituted into the same position or slot in a syntagm. The choice of the signifier from a certain paradigm for a specific slot within a syntagmatic arrangement will influence the meaning of the syntagm. Paradigms exist at the level of signifiers and at the level of signifieds, or more precisely, spanning the two levels.

Examples of paradigms in the realm of signifiers, or substitution classes, as they will be called hereafter, include parts of speech in natural languages and constructs such as variables, classes, or methods in programming languages. Figure 3 shows a set of signifiers from the English language. Each subset aligned vertically (along the paradigmatic axis) represents one part of speech (article, noun, verb, adverb), which makes a paradigm or substitution class in natural languages. Sentences (syntagms) are formed by combination (AND) of signifiers from different substitution classes along the syntagmatic axis (article + noun + verb + adverb). If any signifier within a syntagm is substituted by another signifier from the same substitution class, the sentence will remain a valid English syntagm, but its meaning will be changed: "The window fits tightly." vs. "A window fits loosely."

The sentence "The risk fits tightly." is a valid English syntagm, but does not make sense, because the signifiers "window" and "risk", although belonging to the same paradigm on the plane of signifiers (substitution class), belong to different paradigms within the plane of signifieds.
Figure 3. Plane of signifiers: syntagms and paradigms. (The figure arranges example signifiers, including "the", "swan", "window", "risk", "swims", "bears", "breaks", "fits", "loosely", "tightly", "graciously", and "happily", into substitution classes: alternatives within a class are joined by OR, and classes are combined by AND along the syntagmatic axis into a syntagm.)

However, valid syntactic combinations with signifiers selected from the correct substitution classes for the correct slots in a syntagm are not necessarily meaningful. Paradigms also exist in the realm of signifieds, or more precisely, spanning the two levels (Figure 4). Syntagms will be meaningful only when their slots are filled with selections from the correct paradigms of this type.

Figure 4. Sign system. (Labels in the figure: plane of signifieds; pragmatic influences; syntagmatic axis; example paradigms of signifieds: animals, building elements, abstract concepts.)

In most languages, signifiers are related to what they stand for (their signifieds) by convention. This type of signifier is called symbolic and needs to be distinguished from iconic signifiers, which are related to signifieds by some kind of resemblance (e.g. icons used in user interfaces), and from indexical signifiers, which are directly related to signifieds through some physical or causal connection (e.g. a crack in a wall as a sign of a structural defect) (Figure 5). Being arbitrary, relationships between symbolic signifiers and signifieds have to be learned. A system of related conventions for associating signifiers with signifieds in a certain domain is called a code. A particular code is not universally valid or accepted; it is shared only among members of the same "interpretative community" [Fish 1980].

Figure 5. Types of signifiers. (Panels: symbolic signifier, iconic signifier, indexical signifier.)
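The difference between a syntactically valid syntagm and a meaningful one can be sketched with two small checks: one against substitution classes on the plane of signifiers, and one against paradigms on the plane of signifieds. The lexicon, the category assignments, and the table of which verbs accept which categories are toy assumptions made for this sketch, not part of the framework.

```python
# Substitution classes (paradigms on the plane of signifiers): toy lexicon.
SUBSTITUTION_CLASS = {
    "the": "article", "a": "article",
    "wall": "noun", "swan": "noun", "window": "noun", "risk": "noun",
    "swims": "verb", "breaks": "verb", "fits": "verb",
    "loosely": "adverb", "tightly": "adverb", "happily": "adverb",
}

# Syntactic rule: the slots of the syntagm, in order.
SYNTAGM_PATTERN = ("article", "noun", "verb", "adverb")

# Paradigms on the plane of signifieds (assumed category assignments).
CATEGORY = {
    "wall": "building element",
    "window": "building element",
    "swan": "animal",
    "risk": "abstract concept",
}

# Assumed constraint: which categories each verb accepts as its subject.
VERB_ACCEPTS = {
    "fits": {"building element"},
    "breaks": {"building element"},
    "swims": {"animal"},
}


def tokenize(sentence):
    """Lower-case the sentence, drop the final period, split into words."""
    return sentence.lower().rstrip(".").split()


def is_valid_syntagm(sentence):
    """Syntax only: every slot is filled from the right substitution class."""
    classes = tuple(SUBSTITUTION_CLASS.get(w) for w in tokenize(sentence))
    return classes == SYNTAGM_PATTERN


def is_meaningful(sentence):
    """Syntax plus a check against the paradigms of signifieds."""
    if not is_valid_syntagm(sentence):
        return False
    _article, noun, verb, _adverb = tokenize(sentence)
    return CATEGORY.get(noun) in VERB_ACCEPTS.get(verb, set())
```

With these assumed tables, "The window fits tightly." passes both checks, whereas "The risk fits tightly." passes only the syntactic one, matching the example in the text.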
Even within the same interpretative community, associations between signifiers and signifieds are not fixed. For example, the meaning of the words "I", "hot", or "now" is always determined by context. Communication context is a very complex concept that can include different characteristics of the parties involved in communication, their intentions, the state of their knowledge, the physical environment, the task at hand, and so on. Context can give additional meaning to, or alter to a varying degree, the literal meaning of signifiers and their arrangements. The study of the aspects of meaning and language use that depend on the context of communication is called pragmatics. Pragmatics is one of three branches of semiotics, a discipline that studies sign systems used in human communication. The other two branches are syntax, the study of rules governing the arrangement of signifiers into syntagms, and semantics, the study of codes.

Efficient communication requires participating parties to share all of these: syntactic rules, paradigms at both levels, codes, and knowledge of how different context factors can influence meaning. This thesis advocates a semiotic approach to information management in the AEC/FM domain, meaning that all these factors need to be considered in the development of information systems.

It is important to note that signifieds are not real-world entities. What signifiers stand for are actually mental constructs, not things or phenomena [Saussure 1966, 66].
The Ogden and Richards meaning triangle shown in Figure 6 illustrates the indirect relationship between signifiers (symbols) and real-world things (referents); they are related through a "thought of reference", a mental representation of the real-world thing, which is symbolized by the symbol and refers to the referent [Ogden and Richards 1946, 9-12]. An important implication is that mental representations of the same referent can vary widely among people and therefore the meaning of signifiers is hardly ever fully shared. This holds true for mental representations of both particular phenomena (instances, individuals, or singleton sets) and their generalizations (categories). Mental representations of both "a building" and "the building at 123 Main Street" will be very different for an architect, a child, and an investor.

Figure 6. Ogden and Richards' meaning triangle. (Vertices: thought of reference, symbol, referent.)

Human-to-human communication relies on the overlap of individuals' conceptualizations: abstract and simplified mental representations of the world that are necessary for human thinking and understanding and are developed through different types of learning. These representations can exist only in individuals' heads or they can be conveyed or externalized to some extent. Speaking in semiotic terms, conceptualizations are definitions of paradigms and their relationships in the realm of signifieds, formed by observing the world, learning, and relating to what is already known.

Categorization is the basic cognitive process used in human thinking.
Recognition of similarities between different signifieds and their subsequent grouping into categories (paradigms at the level of signifieds) enables humans to discover order in the complex world. Without categorization, human experience of each particular thing (e.g. each blade of grass or light switch) would be unique and information or knowledge acquired at the first encounter could not be extended and applied in subsequent encounters with other instances [Markman 1989]. As [Barsalou 1987] points out, definition of categories in human thinking is subjective, flexible, and context-dependent. On the other hand, any communication requires the ability to capture categories using labels understandable to all communicating parties, and the development of disciplinary domains relies on the development of well-defined, formalized categories [Jacob 1994]. This process of formalization necessitates a loss of flexibility, plasticity, and ability to respond to new patterns of similarity [Jacob 2004].

The classical theory of cognitive models, founded by Aristotle and expounded in his Categories [Aristotle 2002], is still highly influential. According to this theory, categories are predetermined, defined by a set of features shared by all category members, and easily apprehended and shared by all members of the linguistic community understanding the category label.

More recent thinking and research contradict the classical theory of categories. Wittgenstein suggested that categories are based on "family resemblances", rather than on clearly marked definitions [Wittgenstein 1953].
Through a series of experiments, Rosch proved the theory of culture-dependent categories with graded structure and often unclear boundaries, formed and/or mentally represented around prototypes against which other items are judged and included in the category as better or worse examples, as well as the existence of basic-level categories in human thinking, as opposed to subordinate and superordinate abstraction levels [Rosch 1973; Rosch 1975; Rosch 1981; Rosch and Mervis 1975; Rosch et al. 1976]. Another more recent view of categories, used in probabilistic approaches and cluster analysis, sees categories as sets of characteristics and preponderance of characteristics from the set as a condition for category membership [Smith and Medin 1981]. However, the classical paradigm of categories with rigid boundaries and defined by necessary and sufficient conditions is still the ineradicable foundation of most conceptual schemas.

Categorization is only one among many cognitive processes involving creation of connections or associations. All human perception, thinking, and learning involve creation of associations. Associations are formed between new and old information, between signifiers and signifieds, instances and categories, and among signifiers, signifieds, and categories. Associationists and connectionists see conceptualizations as networks of flexible, dynamic associations of different type and strength. This view is the basis of neural networks and parallel distributed processing (PDP) information-processing models as well as of different network-based conceptual formalisms.
Miscommunication is most often caused by differences in conceptualization used by different parties or by terminological differences. In direct human-to-human communication, the differences are usually easily resolved thanks to the human capability to relate different terminology and create new relationships and to the ease of human-to-human interaction in clarifying recognized misunderstandings. A mismatch that does not get recognized can lead to more-or-less serious problems; in construction projects such problems are frequently reflected in increased project costs. Yet, in general, human-to-human communication can rely on partial overlap of conceptualizations and partial understanding, and it is exactly the existence of diverse incomplete views of the world and the human ability to relate them, to rearrange categories and create new distant relationships, that constitutes human creativity. As Berners-Lee notes as he proposes the Semantic Web:

"Working together is the business of finding shared understanding but being careful not to label them as absolute. They may be shared, but often arbitrary in the larger picture." [Berners-Lee 1999, 206]

"We have to be prepared to find that the 'absolute' truth we had been comfortable with in one group is suddenly challenged when we meet another. Human communication scales up only if we can be tolerant of the differences while we work with partial understanding. The new Web must allow me to learn by crossing boundaries. It has to help me reorganize the links in my own brain so I can understand those in another person's. It has to enable me to keep the frameworks I already have, and relate them to new ones.
Meanwhile, we as people will have to get used to viewing as communication rather than argument the discussions and challenges that are a necessary part of this process. When we fail, we will have to figure out whether one framework or another is broken, or whether we just aren't smart enough yet to relate them." [ibid., 207]

Human-to-computer interaction (or any indirect, mediated human-to-human communication which lacks direct interaction between the sender and receiver of information) requires additional efforts for preventing misunderstandings. In information storage and retrieval systems, for example, human indexing and thesauri have been used to bring together terminologies used by different people (author, indexer, and user); most user interfaces redundantly use both icons and labels to facilitate human understanding. Anderson [2003] summarizes the "vocabulary problem" in information retrieval as follows:

"Research in psychology and information science has repeatedly demonstrated the enormous variability in the use of language by humans describing or seeking information [Collantes 1995]. This is true for searchers as well as for indexers [Saracevic et al. 1988]. Study after study indicates that indexers and searchers agree on terms about 23 per cent of the time. Most of the variability appears to be due to differences in choice for terms, and the remaining to differences in perception or conceptualizations of topics or features [Iivonen 1994]. Furnas et al.
[1987] suggest that a good information retrieval system needs to provide as many as fifteen ways to express a topic or feature in order to accommodate up to 80 per cent of the search statements submitted by users". [Anderson 2003, 474]

Different conceptualizations are reconciled in human-computer interfaces only to a limited extent, for example, through the use of faceted classification in Web site architecture or through the provision of multiple ways to accomplish a particular task in an application. Human reasoning is still available at the receiving end of these systems, to interpret and understand the meaning and possible differences and to map conceptualization and terminology built into the system to the personal ones.

Machine-to-machine communication, however, cannot rely on human reasoning to reconcile any differences. Both conceptualization and terminology need to be made explicit in a machine-understandable form and agreed upon or committed to by all participating parties. Conceptualizations are made explicit in the form of conceptual schemas. Schemas are specified using a particular conceptual formalism and an associated conceptual schema language. A conceptual formalism provides basic constructs, rules, constraints, inheritance mechanisms, events, functions, processes, and other elements necessary to represent a conceptualization. A conceptual schema language is a formal language parsable by computers and/or understandable to humans that contains all constructs (signifiers, rules for their use, senses, and substitution classes) necessary to express a conceptual schema.
In this document, a populated conceptual schema, i.e. an arrangement of signifiers using the vocabulary and rules of a particular conceptual schema language and structured using a particular conceptual schema, representing a certain portion of the real world, will be called a model. It should be distinguished from the term knowledge base, which will be used here to denote a collection of information about a certain portion of the world, correlated to a particular conceptual schema, meaning that it can include unstructured information and information structured according to different conceptual schemas. Figure 7 shows relationships between these basic modelling constructs. It is based on constructs defined in ISO 19101 [ISO 2002] and extended and adapted for the purposes of this work.

[Figure 7. Representing "real world" in electronic environment. Diagram elements: real world (domain of interest), conceptualization, conceptual schema, conceptual formalism, conceptual schema language, model, knowledge base. Relationship labels: "is abstracted and internally represented in", "is formally and externally represented in", "provides constructs for describing", "is structured according to", "is organized using", "provides language for communicating", "uses one or more".]

Adopting the terminology used in the related AEC/FM project called FUNSIEC, the term semantic resources will be used here to refer to all types of resources that make conceptualizations and senses explicit in a form parsable by computers and/or understandable by humans. Semantic resources have been developed in many disciplines including information retrieval, computational linguistics, data management and modelling, knowledge representation, and artificial intelligence.
It is important to note here that not all semantic resources are necessarily intended to facilitate communication; some have as their primary goal organization of information or data consistency, for example. However, as they nevertheless make some kind of conceptualization or sense explicit, they can be considered for this additional purpose.

5 Existing Components of Semiotic Infrastructure

This section describes types of semantic resources used for enhancing information management in networked environments, in terms of the above-explained framework and the existing ways to allow their coexistence. Specific semantic resources used in the AEC/FM domain are reviewed in Section 6.3.1.

5.1 Information Organization

5.1.1 Metadata

One of the two basic information management techniques, information resource description, traditionally implemented in the form of library catalogue cards or bibliographic records, has in recent years found application far beyond library walls. The computerized and, later, networked environment gave it a qualitatively new context and it is currently implemented as "metadata", mostly in the context of corporate data warehouses, geospatial datasets, Web content management, e-science, and e-learning. As information in construction projects involves very diverse types of information, including structured data stored in corporate databases, spatial information, Web content, paper-based documents, and learning resources, metadata as used in all afore-mentioned contexts needs to be considered.
Although in different communities the term has slightly different meanings, it can be generalized that metadata is structured information about information resources. An information resource can be anything that contains any kind of information, regardless of content, media, or format. It corresponds to #$InformationBearingThing (IBT) in the Cyc Upper Ontology [Cycorp 2002]: "an item that contains information (for an agent who knows how to interpret it)". The purpose of metadata is to support discovery, understanding, administration, and access to information.

Metadata for an information resource (also known as a metadata record) is structured as a set of attribute-value pairs. These individual attributes are known as metadata elements, aspects, or facets. Each attribute-value pair represents a simple assertion about the information resource. Attribute values may be literals or references to other signifieds.

Metadata will vary widely in different environments, depending on the type of the information resource described and the purpose and use of the resource and its description. As Burnette et al. point out [1999], the primary purpose of metadata in the library community is to support resource discovery, while information technology focuses on data use, including data access, management, and analysis.

Numerous typologies of metadata have been proposed [e.g. Lagoze et al. 1996; Gilliland 2000; Caplan 2001; Greenberg 2001c]. Listed below are some common types of elements that appear in different classifications:

1. Descriptive metadata elements, used to describe the information resource itself,
2. Semantic or content metadata elements, describing content of the information resource,
3. Administrative elements, used for managing and administering information resources,
4. Use metadata elements, which relate an information resource to a particular audience and use context,
5. Technical metadata elements, which capture technical information about information items and ensure that they can be correctly displayed, exported, and preserved,
6. Structure metadata, defining relationships within and among information resources.

Translated into semiotic terms, metadata represents a set of assertions about instances of a specific category of signifieds: information resources of all kinds, including structured, semistructured, and unstructured resources.

5.1.2 Classification

Classification is the oldest and most common mechanism for organizing information and knowledge (as well as just about anything else). The British Standards Institution defines classification as "the grouping together of like objects and their separation from unlike objects" [BS 1000C 1963], which corresponds to the afore-described concept of categorization. Although the terms "categorization" and "classification" are used interchangeably in both everyday parlance and professional literature, this document will distinguish the former as a flexible cognitive mechanism used for personal understanding and handling of the complexity of the world from the latter as a more rigid formalism used for breaking a certain domain into well-defined, hierarchically organized classes.
Enumerative classification, in its pure form, is a specialization-based hierarchy of mutually exclusive (or disjoint) and collectively exhaustive classes: well-defined groupings of signifieds that share particular distinguishing characteristics. Classes are formed by intension, i.e. definition of necessary and sufficient conditions for membership (just as classical categories), rather than extension, i.e. listing of members. The only type of relationships captured in this type of conceptual schema is hierarchical, more specifically genus-species (or subsumption), and characteristics of higher classes are inherited by all subordinates. Enumerative classification schemes are used for grouping signifieds of all kinds, both information resources (documentary classification; see footnote 5) and domain concepts (taxonomies), and can widely vary in scope, from general classification systems, such as Dewey Decimal Classification (DDC), for example, to classifications covering very narrow specialized domains, such as biological taxonomies or classifications of products on an e-commerce Web site.

Depending on its primary purpose, a classification scheme can deviate to a varying extent from the archetype defined in the paragraph above. While, for example, mutual exclusivity of classes may be necessary for arranging books on library shelves or classifying products in a financial reporting database, this requirement may be relaxed in an electronic version of the same library or in a Web catalog including the same set of products.
Similarly, the definition of necessary and sufficient conditions for inclusion may be simply implied by the class name (e.g. "metal sash windows") in classification schemes intended exclusively for human use, or classes may be defined by intension if used as training sets in applications using machine-learning technologies for the classification process. Thus, although classification schemes as defined above may be needed in systems using some kinds of machine reasoning, many other needs will be satisfied with less formal and less "semantically deep" hierarchical arrangements of signifieds.

With the availability of advanced technologies, different externalizations of categorization can also find their use and should be considered as potential semantic resources. This group includes the so-called folksonomies or ethnoclassifications (results of collaborative categorization using freely chosen keywords that recently started appearing in a variety of social software [Merholz 2004]) as well as folder structures used for organizing files on personal computers or shared drives and navigation hierarchies used on amateurs' Web sites and blogs. However, the distinction between categorization and classification has to be considered in understanding the potential, benefits, and limitations of different semantic resources in the continuum between free personal categorization and full-blown classification schemes.

Footnote 5: It is worth noting that documentary classification schemes are actually classifying knowledge (i.e. disciplines), but they are used for classifying information resources based on their subject, more precisely the aspect of the subject covered by the resource. For example, books about petroleum may be classified under Geology, Economics, Chemistry, or Politics, depending on the authors' perspective.

Faceted classification can be used for organizing any kind of signifieds. It was first introduced by the Indian librarian Ranganathan in the early 1930s [Ranganathan 1957]. Facets are clearly defined, mutually exclusive, and collectively exhaustive aspects, properties, or characteristics of a signified. Each aspect of a signified is analyzed and, in the original system, a facet formula is used to combine facets into a composite concept and position it into the classification scheme. In modern implementations, faceted classification is often used to allow alternative ways to access the same set of information resources [Sacco 2000; Hearst 2000]. It combines two classic information management techniques, resource description (e.g. metadata) and classification, and allows the latter to be derived from the former. In other words, each signified is described using a set of assertions relating it to other signifieds; these signifieds belong to different facets and signifieds within each facet are arranged into more-or-less deep hierarchies, thus forming a network of hierarchies that can be traversed in multiple ways.

5.2 Lexical Resources

A dictionary is an alphabetical list of terms (words and phrases), accompanied by information about them.
The information may include: meaning(s) of the term in the form of definitions (monolingual dictionaries) or corresponding terms in other languages (bilingual or multilingual dictionaries), pronunciation, etymology, word derivations, usage guidance, and examples of use in sentences. In other words, dictionaries provide: (a) senses that associate signifiers (i.e. terms) with signifieds, the latter being represented by human-readable definitions or by sets of corresponding terms in another language and (b) additional information about the signifier itself. Dictionaries are intended for human use, but computational-linguistics technologies have been successful in extracting lexical semantics from machine-readable dictionaries (MRD).

Glossaries are alphabetically organized lists of terms used in a particular domain with more or less detailed definitions. They typically do not provide any additional information about the terms and rarely have to handle homonymy (i.e. to provide multiple unrelated senses for a single term) as the inclusion in a domain-specific glossary in most cases delimits the meaning of the term.

Synonym thesauri are intended to be used as aids in writing. They provide sets of synonyms organized by semantic rather than lexical similarity. It can be said that each set of synonyms, along with its position in the arrangement, represents a signified.

Controlled vocabularies are collections of terms used in a particular scope (e.g.
journal article database, enterprise) that identify preferred terms which should be used for a particular purpose (subject indexing, naming of database tables and columns) and point to them from a number of entry terms: terms that should be avoided for that particular purpose but can be expected to be used by different people for the same concept.

Information retrieval thesauri are the most sophisticated type of controlled vocabularies. They are organized collections of terms (primarily nouns and noun phrases) which are used for describing information resources stored in information retrieval systems and for facilitating search. A thesaurus captures equivalency relationships (synonyms and lexical variants) between the terms as well as hierarchical and associative relationships between the signifieds they denote. The standard that governs their construction, format, and management [ANSI/NISO Z39.19-2005] allows three types of hierarchical relationships: genus-species, part-whole, and class-instance; however, it does not recognize any sub-types of associative relationships. Preferred terms from a thesaurus are used as values of metadata elements (originally only of the subject field). Thesauri allow information users to locate information of interest even if terms they use in their search strings differ from the terminology used in information resources or from terms used by indexers, the individuals describing information resources in order to make them retrievable (i.e. assigning metadata values). They also assist with the disambiguation of homonyms, e.g. bridge (construction), bridge (dentistry), bridge (card game).
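
The relationships captured by an information retrieval thesaurus can be illustrated with a small data structure. The following is a minimal sketch only, not an implementation of ANSI/NISO Z39.19: the class names `Concept` and `Thesaurus`, their fields, and all sample terms are assumptions made for illustration. It shows entry terms being resolved to preferred terms, a homonym surfacing as several candidate preferred terms, and a hierarchical relationship being traversed.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """One signified, labelled by its preferred term (hypothetical structure)."""
    preferred: str
    entry_terms: set = field(default_factory=set)  # equivalence: synonyms, lexical variants
    broader: set = field(default_factory=set)      # hierarchical: preferred terms of parents
    related: set = field(default_factory=set)      # associative: undifferentiated links


class Thesaurus:
    def __init__(self, concepts):
        self.concepts = {c.preferred: c for c in concepts}
        # Index every preferred and entry term, case-insensitively.
        self.index = {}
        for c in concepts:
            for term in {c.preferred} | c.entry_terms:
                self.index.setdefault(term.lower(), set()).add(c.preferred)

    def preferred_terms(self, term):
        """Map a searcher's term to preferred terms; several results signal a homonym."""
        return sorted(self.index.get(term.lower(), set()))

    def narrower(self, preferred):
        """Preferred terms of concepts that list `preferred` as a broader term."""
        return sorted(p for p, c in self.concepts.items() if preferred in c.broader)


# Illustrative entries only; not taken from any published thesaurus.
thesaurus = Thesaurus([
    Concept("bridge (construction)", entry_terms={"bridge", "bridges"},
            related={"viaducts"}),
    Concept("bridge (dentistry)", entry_terms={"bridge", "dental bridge"}),
    Concept("bridge (card game)", entry_terms={"bridge", "contract bridge"}),
    Concept("footbridges", entry_terms={"pedestrian bridges"},
            broader={"bridge (construction)"}),
    Concept("viaducts", related={"bridge (construction)"}),
])

print(thesaurus.preferred_terms("Pedestrian bridges"))  # entry term -> preferred term
print(thesaurus.preferred_terms("bridge"))              # homonym -> three candidates
print(thesaurus.narrower("bridge (construction)"))      # hierarchical traversal
```

In this sketch the parenthetical qualifier is part of the preferred term, so the three senses of "bridge" remain distinct signifieds even though they share an entry term; an indexer or searcher would then choose among the candidates returned.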
Unlike other controlled vocabularies, such as authority lists or subject headings, whose purpose is only to standardize terms used for indexing and searching, thesauri capture a certain conceptualization of a domain through the use of hierarchical and associative relationships. Over time, thesauri started converging with other types of semantic resources and found additional uses in information and knowledge management. New types of semantic resources, somewhat similar to thesauri, that are worth mentioning include taxonomies and lexical nets.

The term taxonomy was originally used for classification schemes intended for organizing signifieds other than information resources, especially living organisms. Although its original meaning is still retained in biology, the term is currently extensively used in the world of corporate information management with a very broad spectrum of variations in meaning. In [Gilchrist et al. 2001] a corporate taxonomy is described as:

• "a correlation of the different functional languages used by the enterprise
• to support a mechanism for navigating, and gaining access to the intellectual capital of the enterprise
• by providing such tools as portal navigation aids, authority for tagging documents and other information objects, support for search engines, and knowledge maps
• and possibly, a knowledge base in its own right."

However, due to variations in corporate goals, culture, IT, and information environments and strategies, corporate taxonomies have very different forms.
Sometimes the term refers to a pure classification scheme or a pure thesaurus, but in other cases it stands for either an enumerative or faceted classification scheme enriched by synonyms or a thesaurus extended by named types of associative relationships. Therefore, taxonomies will be considered as variations and combinations of other types of semantic resources and distinguished from them by their purpose and domain rather than by their structure and semantic depth.

Lexical databases, lexical nets, or terminological ontologies (e.g. WordNet and EDR Electronic Dictionary) represent a new type of semantic resources. They capture lexical semantics extracted from machine-readable dictionaries and text corpora in a form usable by both humans and machines. Terms or synonym sets are connected with rich sets of semantic relationships (e.g. hypernyms, hyponyms, meronyms, troponyms, etc.). These resources have been widely used in automatic text analysis and artificial intelligence applications.

5.3 Knowledge Representation

Knowledge representation originates from the field of Artificial Intelligence with the purpose of enabling machine reasoning.

5.3.1 Networks

Various forms of conceptual formalisms based on associationist and connectionist theories and typically represented as graphs vary in terms of representational languages, terminology, and semantic depth. They are known as associative, correlational [Ceccato 1961], semantic [Sowa 1991], connectionist, conceptual, or cognitive networks or nets. Some of them capture only undifferentiated connections among signifieds or signifiers; in others, associations are labeled and defined more-or-less formally.
Associations are often called "edges," "arcs," or "roles" and can be reciprocal (i.e. symmetrical) or directed. Some networks focus on a single type of relationships, as for example inheritance networks (which correspond to enumerative classification), while in connectionist networks associations are dynamic; they can change over time based on different influences and learning.

5.3.2 Frames

Originally proposed by Minsky [Minsky 1975], this conceptual formalism has further been developed and used in many frame-based systems, such as KRL [Bobrow and Winograd 1977], KL-ONE [Brachman and Schmolze 1985], and many others [Fikes and Kehler 1985]. Frames consist of slots (i.e. attributes) with specified or computed fillers (i.e. values). Fillers can include: literal values, references to other frames, or procedure calls. Frames are organized in hierarchies and subframes in the hierarchy inherit properties (default fillers, restriction on fillers, etc.) from superframes, according to some inheritance strategy. Object-oriented data modeling is based on this formalism.

5.3.3 Ontologies

The term ontology has been used to mean many different things. It had been used for centuries to mean "a branch of metaphysics that studies the nature of existence or being" or "a systematic account of Existence". Currently the most cited definition of ontology in the information technology community is probably the one that defines it as an "explicit specification of a conceptualization" [Gruber 1993].
However, as there is no shared understanding of the precise meaning of "explicit specification", this definition allows different communities to interpret it in different ways and the term is used to mean any type of semantic resources, from a database schema to a lexical database. In this document, the term "ontology" will be used to mean a formal or axiomatized ontology only.

Formal or axiomatized ontologies include definitions and axioms expressed in a logic-based language (a.k.a. description logics, terminological systems, or concept languages) or another type of language that can be translated into a logic-based one. These languages can express constraints that allow detailed, accurate, consistent, unambiguous, computer-understandable specifications of classes, relationships that can exist among them, and properties that they may have. Such ontologies typically have associated sets of inference rules that support computer reasoning.

In terms of scope, two types of ontologies can be distinguished: high-level, upper, or common ontologies that capture general knowledge about the world and are valid across domains, and domain ontologies, which cover only a specific domain and can be associated with a specific high-level ontology.

5.4 Data Modeling

5.4.1 Schemas

To be usable in networked and automated environments, metadata needs to be structured according to a certain metadata schema which explicitly and unambiguously declares what metadata elements are needed to describe an information resource and how metadata is to be represented and interpreted, thus making its semantics comprehensible to humans and computers.
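
As an illustration of what such a declaration makes possible, the sketch below checks a metadata record (a set of attribute-value pairs) against a small schema. It is a hypothetical example under stated assumptions: the element names, the `SCHEMA` layout, the sample controlled values, and the `validate` function are invented for illustration and do not reproduce any published metadata standard.

```python
# Hypothetical schema: element name -> (required?, allowed values or None for free text).
SCHEMA = {
    "title":    (True,  None),
    "creator":  (True,  None),
    "subject":  (False, {"bridges", "foundations", "roofing"}),  # assumed controlled vocabulary
    "language": (False, {"en", "fr"}),
}


def validate(record, schema=SCHEMA):
    """Return a list of problems found in a metadata record (empty list means valid)."""
    problems = []
    # Structure: every required element must be present.
    for name, (required, _allowed) in schema.items():
        if required and name not in record:
            problems.append(f"missing required element: {name}")
    # Structure and values: only declared elements, only allowed values.
    for name, value in record.items():
        if name not in schema:
            problems.append(f"undeclared element: {name}")
            continue
        allowed = schema[name][1]
        if allowed is not None and value not in allowed:
            problems.append(f"value {value!r} not allowed for element: {name}")
    return problems


good = {"title": "Footbridge inspection report", "creator": "J. Smith", "subject": "bridges"}
bad = {"title": "Untitled", "subject": "bridge", "format": "PDF"}

print(validate(good))
print(validate(bad))
```

Because the schema is itself plain data, it can be published and shared, so that a record produced by one application can be checked and interpreted by another without human mediation.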
Integrated interoperable environments require the use of shared, agreed upon, and publicly available metadata schemas. Numerous standards developed for that purpose belong to a few basic types:

• Standards defining metadata structure, i.e. elements required to describe a certain type of resource (e.g. Dublin Core),
• Standards defining allowed values for metadata elements (e.g. literal values from controlled vocabularies or different types of rules, conventions, or constraints).

These standards are complemented with technical standards regulating encoding and exchange of metadata. Most of the existing standards involve a combination of the afore-mentioned basic types.

Generic metadata schemas are also called metadata element sets. The use of the term "element set" normally indicates that it is not necessary to use each of the specified elements. Specific types of information resources or specific applications will need only a subset of these schemas or may need a combination of subsets from different element sets. These subsets are specified in corresponding document profiles and application profiles.

As mentioned earlier, a metadata record represents a structured information resource. Schemas are also used to define (describe and specify) restrictions for other types of structured information resources, such as XML files, database records, or electronic representations of different signifieds in object-oriented applications. Just as in the case of metadata schemas, these definitions cover: the structure of information resources (components and arrangement thereof); their allowed content, e.g.
data types, sets of allowed values, or any additional constraints; and, either implicitly or explicitly, the meaning of individual components. At the same time, schemas thereby specify a conceptualization of signifieds about which those information resources provide information and can be seen as sets of assertions about these signifieds. When used in combination with metadata, structured information resources describing categories of signifieds other than information resources are usually called "profiles". Profiles are typically created for classes from the business context, such as customers, tasks, applications, or display devices. Profiles are matched to metadata by matching values of corresponding elements. For example, values of role, personal interest, and language preference elements in a user profile are matched to values of audience, subject, and language metadata elements in order to make appropriate information available to a specific audience. It can be noticed that the term "profile" has been used to refer to both: (a) sets of elements needed for describing a particular type of information resources or information resources used in a particular application (i.e. document or application profile) and (b) populated schemas describing signifieds other than information resources (e.g. user profile). Unfortunately, the term has been widely accepted in both senses and there are no alternative terms used for either purpose. However, in order to distinguish between the two different concepts, and in accordance with the terminology adopted in Section 4, the term "model" will be used for the latter sense of the term.
The same term will be used for other descriptions of individual signifieds structured according to a schema, for example for records describing instances of activities or events used in some types of case-based reasoning (CBR), that are usually called "cases", for structured product descriptions etc. In addition to schemas that describe a single type of signifieds, there are schemas that describe more complex information resources and interrelated sets of signifieds that are subjects of these resources. This category includes, for example, database schemas. It is important to note here that database schemas are normally used locally to enable manageability and consistency of data; only records within a particular database are structured according to that schema. Sharing and interoperability will require metadata that describes the database and that metadata will be described using a certain metadata schema. Yet another type of schemas covering multiple signifieds is domain schemas. A domain schema is a shared conceptualization of a certain domain of interest developed by a community of stakeholders in order to allow exchange of information between diverse applications used in the domain. These conceptual schemas formally describe paradigms and relationships that exist in the domain, as agreed upon by its participants. Although these schemas are most frequently called "domain models", the term "schema" is used in this document, in order to distinguish schemas from models.
Domain schemas can use very different formalisms and languages, from modeling to logic-based ones, and include constraints and rules to a different extent, thus ranging from conceptual database schemas to object-oriented data models and semantic networks to ontologies.

5.5 Standardization vs. Diversity

Semantic resources can vary widely in terms of their domain of coverage and semantic depth. Their domain of coverage can vary along multiple dimensions, such as domain extent, domain type, and semantic focus. Semantic resources can cover very narrow subject domains or the entire universe, documents, products, processes, or conceptualization constructs. They can focus on details of a single category, on relationships of a single type (e.g. hierarchical) between numerous categories without specifying categories themselves, or provide very detailed definitions of very broad domains including categories, diverse relationships, and codes, along with constraints and inheritance mechanisms. The "semantic depth" of a semantic resource, or the degree to which it makes the conceptualization explicit, i.e. its level of detail, reflects the purpose of the resource, the availability of human interpretation at either the originating or receiving end, and the level of "intelligence" expected from computers.
While, for example, the use of ethnoclassification assumes human participants at both ends of the information system and expects no machine reasoning, a thesaurus assumes human participants as well as a machine intermediary capable of retrieving information resources about subjects that are not explicitly requested in a query string, and an ontology is expected to enable complex machine reasoning without direct human involvement. In [Dorr et al. 2001] the term "semantic depth" is used in a slightly different sense. The authors distinguish different types of information standards based on their content type and semantic depth. Along the content-type dimension, they distinguish: general standards and ontologies, process, product, information media, conceptual modeling and representation standards. Based on semantic depth, they distinguish: interchange formats and protocols, common data dictionaries, thesauri, reference models, and axiomatic ontologies. Their notion of semantic depth is split in this document into semantic focus and semantic depth as a "degree of machine-understandable explicitness". Its meaning is similar to what Uschold uses as a basis for distinguishing four kinds of semantics: (1) implicit, (2) explicit and informal, (3) explicit and formal for human processing, and (4) explicit and formal for machine processing [Uschold 2003]. Yet another distinguishing aspect of different semantic resources is their intended scope of adoption or the extent of the committing community.
They can vary from resources used to organize personal files or structure information in a single database, to resources intended to be used throughout an enterprise, or to international standards expecting commitment from the global community. Practically all efforts aiming specifically to enable or enhance communication are based on commitment to a shared semantic resource, i.e. on standardization. Standardization enables sharing and exchange of information as well as enhanced information retrieval and processing, quality and consistency of data, and preservation of data over time. The AEC/FM domain, which features both extensive and intensive information exchange between multiple and changing participants, is certainly a strong candidate for the development and standardization of shared semantic resources, since agreed upon rules, codes, and schemas are a prerequisite for efficient communication in electronic environments. However, this variety of parties and their information needs can hinder both the development of such standards and their usefulness. A single domain schema is not likely to be in harmony with conceptualizations used by different participants in different contexts. This is reflected in the existence of multiple semantic resources in the domain, in their history and insufficient adoption, which are all reviewed in Section 6.3.1. As Phipps warns, standardization may create even more difficulties "by hiding complexities behind superficial agreements" [Phipps 2000].
While necessary for communication and knowledge development, standardization of conceptual schemas at the same time hinders free rearrangement of categories and creation of new relationships, processes that form the basis of creative thinking, as well as partial understanding and pragmatics, which characterize human-to-human communication.

6 Point of Departure

This section reviews specific efforts aiming to improve information management, including relevant research directions, enabling technologies, standards, frameworks, and commercial software, both general and domain-specific. Focus is on efforts that have information sharing and interoperability for their goal but recognize, to at least some extent, the need for diversity of conceptualizations, terminologies, representations, and contexts. It also covers domain-specific semantic resources and standardization efforts to be leveraged, as well as existing and emerging technologies that can be used towards that goal.

6.1 General Research and Development Efforts

6.1.1 Interoperability

The majority of efforts related to the management of electronic information focus on interoperability of computer systems. Interoperability is the ability of two or more computer systems to exchange information and to correctly interpret and process information from other systems. Main hindrances to interoperability are typically grouped into two categories: (a) representational, syntactic, structural, or schema heterogeneity and (b) semantic heterogeneity.
However, as Goh points out, the distinction between these two types is often blurry, since the logical organization of data often conveys semantic information [Goh 1997]. In addition, the very concept of semantics can substantially differ across different communities, as detailed in the discussion of differences in semantic depth. Both types of heterogeneity can exist at the data level (i.e. differences in the representation or interpretation of instance data values, such as differences in units, precision, or terminology), at the schema level (e.g. different naming of classes, use of same names for different concepts, representation of classes, elements, or entities as attributes and vice versa), and across levels (i.e. different notions of instances: what is a data value in one system can be an entity, element, class, or attribute in another). Semantic heterogeneity is described and classified in different ways. For example, [Fileto and Medeiros 2003] distinguishes confounding, scaling, and naming conflicts, while [Naiman and Ouksel 1995] classifies semantic conflicts along three dimensions: naming, abstraction, and levels of heterogeneity. A number of solutions have been developed over time for integrating heterogeneous information. Some of the solutions focus on integrating diverse systems of the same type; examples include data warehousing [Object Management Group 2003] and integrated library catalogues [ANSI/NISO Z39.50-2003]. Other solutions, such as the IBM Garlic project [Cody et al.
1995] or Zaragoza University Observer [Mena et al. 2000], integrate very diverse types of information, including structured, semi-structured, and unstructured information in different formats. Commercial offerings focus on different scopes; most frequent are those for personal information management (e.g. Enfish products⁶) and Enterprise Information Integration (e.g. MetaMatrix⁷ or iWay Software⁸). [Fileto and Bauzer Medeiros 2003; Raghavan and Garcia-Molina 2001] provide overviews of different technical solutions used for data integration, including gateways, wrappers and mediators, extension modules, data warehouses, layered, and view-based approaches. Chan and Zeng's review of methodological approaches used in efforts on establishing interoperability among controlled vocabularies and classification schemes identified the following basic types: derivation/modeling, translation/adaptation, satellite and leaf node linking, direct mapping, co-occurrence mapping, switching, linking through a temporary union list, and linking through a thesaurus server protocol [Chan and Zeng 2002; Zeng and Chan 2004]. Different business problems require different types of semantic resource reconciliation. Merging of data sources requires schema integration, data warehousing and data mining require mapping to a shared target schema, data sharing using a uniform query interface to multiple data sources may be best served by crosswalks, querying and retrieval of data in peer-to-peer environments by a network of bilateral schema mappings [Halevy et al. 2003], and interoperability of software applications by derivation.
In this review, the following basic approaches for achieving semantic interoperability will be distinguished: (1) use of a shared a priori developed semantic resource to which all participating systems commit at design time (derivation/modelling) and (2) enabling interoperability of independently developed schemas (everything else). The second approach includes two subtypes: (a) bilateral mappings between each pair of semantic resources and (b) reconciliation of independently developed semantic resources using a shared semantic resource (Figure 8). All other approaches can be considered variations of the second approach (2).

⁶ Enfish [http://www.enfish.com/]
⁷ MetaMatrix [http://www.metamatrix.com/]
⁸ iWay Software [http://www.iwaysoftware.com/products/eii.htm]

Figure 8. Types of approaches for achieving semantic interoperability: (1) local application schemas derived from a shared schema, fully or partially; (2a) local, individually developed application schemas mapped to each other; (2b) local application schemas independently developed and then mapped to a shared schema.

An example of a shared semantic resource intended for derivation of local schemas is Dublin Core Metadata Element Set (DCMES) [DCMI 2006b]. This approach has all benefits and drawbacks of standardization discussed in Section 5.5. One option for avoiding the drawbacks is the adoption of what Behrman calls a "minimalist approach" to the development of standards [Behrman 2002]. In the minimalist approach, standard developers standardize no more than necessary; they start small and bottom up, focusing on rapid adoption and continuous testing, and further develop the standard only as needed.
This is opposed to the structuralist approach, which is top-down, aiming to develop a complete and comprehensive standard, starting with a high-level model, and then elaborating and detailing. Recognizing the need for different views and evolution, standards increasingly allow committing systems to use only subsets (views) of the shared semantic resource and provide different mechanisms for extending required subsets with local elements: by using a specified formalism, by qualifying/adapting specified elements, or by mixing and matching elements of different semantic resources using the same conceptual formalism. Some of these mechanisms are briefly reviewed below. With the availability of semantic resources of all kinds in practically every domain, it is rarely necessary to develop a semantic resource needed for a particular purpose from scratch. Recommended practice is to always: identify a suitable standard semantic resource; use it as-is if possible; if not, adapt it for the specific purpose as needed; and only if these two options are not acceptable, develop a completely new semantic resource. Profiles are developed by taking and combining, adapting and complementing elements from one or more standard schemas or element sets and possibly from other profiles. Different standards and guidelines differ in prescribing what can be used in defining profiles. Some allow only the use of subsets of a standard schema, either freely formed or including a set of mandatory elements (e.g. ISO 19106 [ISO 2004]), others include so-called qualifiers that allow adaptation or redefinition of standard elements to match local needs (e.g.
DCMES [DCMI 2006b]), while the most liberal ones allow free mixing, matching, adaptation, and extension of multiple schemas and profiles [Heery 2002]. The mechanism used by the IFCs [IAI 2006] involves use of predefined properties and property sets. In cases where a class that is not available in the schema is required, the standard recommends the following process: reuse an existing predefined property set; if none is suitable, create a new one reusing predefined properties; if a predefined property is not available, create a new one. New properties and property sets need to be modelled according to a prescribed schema and submitted to the Model Support Group as proposals [Adachi 2001]. In addition to serving as a mechanism for semantic disambiguation (i.e. codes), XML namespaces also provide for modification of standard schemas as well as for correlating (linking) elements used in different schemas. Namespaces allow XML files to use a mixture of globally standardized elements/terms and those locally defined. An XML namespace is a collection of terms, identified by a URI reference, which are used in XML documents as element types and attribute names [W3C 1999]. In XML documents, reused element and attribute names are preceded by a namespace prefix, which is associated with a namespace URI reference in a namespace declaration at the beginning of the file, thus allowing software modules to recognize elements and attributes that they are designed to process.
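The way a namespace declaration lets a software module pick out only the elements it is designed to process can be illustrated with a minimal sketch. It assumes Python's standard xml.etree.ElementTree parser; the Dublin Core namespace URI is the published one, while the local namespace URI, the `projectPhase` element, and the helper function name are hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"   # published Dublin Core element namespace
LOCAL_NS = "http://example.org/aec/local#"   # hypothetical locally defined namespace

# A record mixing globally standardized elements with a locally defined one.
DOCUMENT = f"""
<record xmlns:dc="{DC_NS}" xmlns:loc="{LOCAL_NS}">
  <dc:title>Structural drawings, level 2</dc:title>
  <dc:language>en</dc:language>
  <loc:projectPhase>design development</loc:projectPhase>
</record>
""".strip()


def elements_in_namespace(xml_text, namespace_uri):
    """Return {local name: text} for child elements belonging to one namespace."""
    root = ET.fromstring(xml_text)
    qualified_prefix = "{" + namespace_uri + "}"  # ElementTree stores tags as {uri}name
    return {
        child.tag[len(qualified_prefix):]: child.text
        for child in root
        if child.tag.startswith(qualified_prefix)
    }


dublin_core = elements_in_namespace(DOCUMENT, DC_NS)
local_only = elements_in_namespace(DOCUMENT, LOCAL_NS)
```

A module that understands only Dublin Core would work with `dublin_core` and ignore the rest; the prefix strings `dc` and `loc` are arbitrary, since only the URI identifies the namespace.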
An XML file can include elements and attributes defined in an unlimited number of XML schemas and each XML schema can offer new elements to be reused by other schemas by declaring its target namespace. The approaches that focus on semantic reconciliation of independently developed semantic resources differ along multiple dimensions. They differ in terms of the type and diversity of resources they are trying to make interoperable, degree of human effort required, the strength and depth of interoperability they are trying to achieve. Efforts of this kind can be tracked in many different fields, such as federated databases and enterprise data integration [e.g. Bottcher and Groppe 2003; Miller et al. 2001], inter-ontology mapping [e.g. Rodriguez and Egenhofer 2003; Sushama et al. 2002], and interoperability of metadata schemas and controlled vocabularies in information retrieval [e.g. Day 2002; ADL 1997]. A number of surveys comparing these efforts in different fields are available [Batini et al. 1986; Bernstein and Rahm 2001; Doan and Halevy 2005; Kalfoglou and Schorlemmer 2003; Noy 2004; Rahm and Bernstein 2001; Shvaiko and Euzenat 2005]. Table 1 summarizes differences in approaches to semantic reconciliation and interoperability.

Degree of content heterogeneity: Same type; Diverse type
Basic approach: Direct local schema mapping; Derivation from shared schema; Lifting to shared schema
Matching method: Manual; Automated; Machine-assisted
Strength of matching: Full; Partial; Related via anchors
Mapping time: Design time; Post-design; Run time
Depth: Syntax; Terminology; Human-understandable semantics; Machine-processable semantics

Table 1: Differences in approaches to achieving interoperability.
Mappings, crosswalks, and mediated schemas are resources that capture semantic equivalence or correlation between elements from two or more semantic resources. They can be created by human intellectual efforts (e.g. ADL and Getty's crosswalks, Microsoft BizTalk Mapper), automatically [see Rahm and Bernstein 2001], or using a combination of these two methods [Bossung 2004]. Issues involved in creating and using mappings are numerous and include: differences in scope and granularity, fuzzy, one-to-many, and many-to-one matches between elements, logical mismatches in relationships, and the need for change management [Woodley 1998; St. Pierre and LaPlant 1998]. The use of a central schema generated by merging schemas of different data sources that need to be merged has been a continuous research topic since the early 1980s [Batini, Lenzerini, and Navathe 1986; Seth and Larson 1990; Parent and Spaccapietra 1998; Pottinger and Bernstein 2003]. Mapping of source schemas to a central schema came into focus in the early 1990s with the development of data warehousing and data mining [Miller et al. 2000]. The central schema can be fixed or it can evolve by a continuous addition of new semantic resources, as in, e.g., the Unified Medical Language System (UMLS) [National Library of Medicine 2006]. Bilateral mappings are used for exchange of data between two software applications or databases, for sharing data in different information retrieval systems, and since recently, for peer-to-peer information exchange.
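As a rough illustration of what a crosswalk captures, and of the one-to-many, many-to-one, and missing matches mentioned above, the following sketch represents a crosswalk as a simple lookup table. The local element names, the sample record, and the function name are hypothetical; only the target names (creator, publisher, subject) are Dublin Core elements. It is a sketch under these assumptions, not a description of any of the cited tools.

```python
# Hypothetical crosswalk from a local record schema to Dublin Core element names.
CROSSWALK = {
    "author": ["creator"],
    "keywords": ["subject"],
    "discipline": ["subject"],                       # many-to-one with "keywords"
    "responsible_party": ["creator", "publisher"],   # one-to-many
    "drawing_scale": [],                             # no counterpart in the target schema
}


def apply_crosswalk(record, crosswalk):
    """Translate a record; return (translated record, source elements left unmapped)."""
    translated = {}
    unmapped = []
    for element, value in record.items():
        targets = crosswalk.get(element, [])
        if not targets:
            unmapped.append(element)  # information that the target schema cannot carry
            continue
        for target in targets:
            translated.setdefault(target, []).append(value)
    return translated, unmapped


local_record = {
    "author": "A. Designer",
    "keywords": "curtain wall",
    "discipline": "architecture",
    "drawing_scale": "1:50",
}
translated, unmapped = apply_crosswalk(local_record, CROSSWALK)
```

Returning the unmapped elements separately makes the loss of information explicit, which is one of the practical issues with differences in scope and granularity between the two schemas.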
It is interesting to note that modelling approaches to interoperability started as a reaction to drawbacks of bilateral mappings, but in recent years, there is a revived interest in this approach. Due to the observed inadequacy of standardized a priori semantic structures for dynamic and evolving environments, an increasing number of researchers believe that global semantic interoperability can emerge from a multiplicity of pair-based, local interactions and that a large network of existing local, one-to-one mappings, hyperlinks between Web resources, and users' browsing paths can result in a self-stabilizing semantic infrastructure ("emergent semantics") [e.g. Aberer et al. 2004; Grosky et al. 2002; also Halevy et al. 2003]. Both bilateral and many-to-one approaches can vary in terms of the strength of mapping; some aim for complete matching, while others accept partial understanding and only correlate shared elements (e.g. switching and linking). Loose matching can be done by matching leaf nodes in hierarchical structures or by identifying corresponding nodes/elements at any level and then either merging their children or allowing switching from one resource to the other as needed. Frameworks that promote partial understanding and automated matching rely on mechanisms for unique identification of signifieds of all kinds and on merging or linking rules. Some of these mechanisms are described and discussed in Section 6.2.2.5.

6.1.2 Context

Context as an element of communication in electronic environments came into focus of research projects more recently. The notion of context and the reason for handling it differ across domains.
The knowledge representation (KR) domain is concerned with context as the scope of validity of statements captured in ontologies, models, or knowledge bases. The importance of context in rule-based artificial intelligence (AI) efforts was first recognized by McCarthy [McCarthy 1987]. In the AI community, context is regarded as the body of world knowledge to which participating parties have access while communicating. Guha defines it as everything that is not in a statement that is needed to make it a meaningful statement, representing what it is intended to state. This encompasses all assumptions, when these assumptions are reasonable and the theory applicable, how it might relate to statements in other contexts [Guha 1991, 8]. Modelling context for this purpose, which requires ultimate semantic depth, is a major challenge with no final solution. As McCarthy pointed out while discussing the problem of generality in AI, "it is probably not correct to regard contexts as equivalent to sets of assumptions, not even infinite sets of assumptions" [McCarthy 1987]. Dervin came to a similar conclusion and compared context to an "unruly beast" in her review of treatments of the term in social science that pointed to the difficulties and inevitable chaos involved in efforts to capture context in information seeking and use [Dervin 1997]. Kaenampornpan and Eamonn [2004] summarized different definitions and classifications of context used in context-modelling.
They found that different research streams consider different subsets of the following elements as context: location, conditions, infrastructure (computing environment), information on user, social context, user activity, time, and device characteristics and classify them in very different ways under the following categories: physical environment, cultural context, human factor, user environment, information context, and identity. Some of the differences stem from different purposes for which context-modelling is used. According to [Becker and Nicklas 2004], context is used in applications for four basic purposes: context-based selection, context-based presentation, context-based action, and context-based tagging. [Ouksel and Sheth 1999] identified the following benefits of modeling and representing context: economy of representation, economy of reasoning, management of inconsistent information, flexible semantics. It is worth noting that in information management, unlike in AI, context is used to specify not only the scope in which an assertion or set of assertions is valid but more often the scope in which it is relevant. Scoping is a mechanism for adding pragmatics to electronic information environments. It is used to allow distinction between universally and locally valid or relevant assertions, i.e. filtering of content valid in or relevant to a specific context while keeping links to other related information and other possible "truths".
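Scoping as described here can be sketched as a filter over assertions that carry an explicit scope. The sketch below is only an illustration: the `Assertion` structure, the scope labels, and the sample values are hypothetical and are not taken from any system reviewed in this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assertion:
    subject: str
    predicate: str
    value: str
    scope: str = "universal"  # context in which the assertion is valid or relevant


# Hypothetical assertions about one signified, two of them valid only locally.
ASSERTIONS = [
    Assertion("beam-12", "material", "steel"),
    Assertion("beam-12", "length", "6.0 m", scope="metric"),
    Assertion("beam-12", "length", "19 ft 8 in", scope="imperial"),
]


def in_context(assertions, context):
    """Keep universally scoped assertions plus those scoped to the given context."""
    return [a for a in assertions if a.scope in ("universal", context)]


def alternatives(assertions, context):
    """Assertions filtered out for this context, kept reachable as other 'truths'."""
    return [a for a in assertions if a.scope not in ("universal", context)]


metric_view = in_context(ASSERTIONS, "metric")
other_truths = alternatives(ASSERTIONS, "metric")
```

Nothing is deleted by the filter: the assertions outside the chosen context remain available through `alternatives`, which corresponds to keeping links to other related information.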
Assertions that may need to be scoped in information management systems include: sense specifications, assignment of values to schema elements or attributes, relationships between signifieds and so on. Context can be a specific language, theory, community, time period, measurement system, display device or any other more-or-less complex concept.

6.1.3 Sharing Mechanisms

One of the primary reasons for making conceptualizations and senses explicit is to support sharing of information between different participants. Seamless exchange, re-use, and merging of information require either a commitment to a shared conceptualization and senses or an understanding of conceptualizations and senses used by other parties. This section describes different approaches currently used for sharing information in networked environments. The process and practice of making content available to other parties is often referred to as syndication. Different types of repositories and registries make content available to other users and systems; they can store information resources themselves (information repositories, data servers, learning object repositories), metadata records describing information resources (metadata repositories and "thick" registries), schemas according to which either information resources themselves or metadata records describing them are structured (schema repositories, ontology libraries), pointers to any of these types of content (registries, "thin" registries, or directories), or any combination thereof.
Metadata repositories and registries⁹ focus on documentation or description of content with the goal to enable access and interpretation, while schema repositories and registries foster harmonization and standardization, encouraging either full or partial re-use of standard schemas. Schema repositories store only actual schemas (e.g. OASIS XML Industry Portal), while registries provide more or less detailed information about the semantics and structure of each element, any local extensions, related vocabularies, mappings to other schemas, commentaries, endorsements, translations etc. Examples include: Dublin Core Metadata Initiative's (DCMI) Metadata Registry, registries developed in successive European projects: DESIRE, SCHEMAS, MEG Registry, CORES, JISC, or ULIS Open Metadata Registry, metadata registries (MDRs) conforming to ISO/IEC 11179 [ISO 2004c], such as the Australian Institute of Health and Welfare's (AIHW) METeOR. Ontology and RDF schema repositories and libraries (e.g. WebOnto, Ontolingua, DAML Ontology library, SHOE, Ontology Server, IEEE Standard Upper Ontology, OntoServer, ONIONS, SchemaWeb) can provide additional querying, reasoning, or merging/harmonization services [Ding and Fensel 2001].

⁹ It should be noted that the terminology related to repositories and registries varies widely across different communities. Although the international standard [ISO/IEC 11179-1] defines a metadata registry (MDR) as "a database of metadata that supports the functionality of registration", in other words as (what is called here) a metadata repository with the functionality of registration, this definition has not been accepted in this work, as it makes it difficult to distinguish between different types of facilities. The nomenclature used in this document distinguishes repositories, as facilities storing actual content, from registries, as facilities storing pointers to (and descriptions of) content stored in repositories.

Some domains have developed very sophisticated information sharing infrastructure. Probably the best example is the environmental information infrastructure that consists of local, national, regional, and international data sources, metadata repositories and registries, and co-ordinating nodes (e.g. Global Resource Information Database (GRID) [United Nations Environmental Programme]). Significant efforts of this kind also exist in the scientific community (e.g. Object Oriented Data Technology (OODT) [NASA Jet Propulsion Laboratory] and e-Science [Research Councils UK]).

Harvesting is the process of collecting information from distributed repositories. Each repository or data provider makes its content and schemas available to agents called harvesters. The other parties in the system, the service providers, operate harvesters that issue requests to data providers. If the schemas offered by the data provider are understandable to the harvester, it will retrieve the data, either complete or filtered using specific parameters. The service provider will use data retrieved from multiple providers as a basis for providing value-added services.
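The harvesting interaction described above can be illustrated with a minimal in-memory sketch. It is not taken from any particular harvesting protocol: the class names, the schema labels ("simple-dc", "local-schema") and the sample records are all hypothetical, and a real harvester would issue network requests instead of calling methods directly.

```python
from dataclasses import dataclass, field


@dataclass
class DataProvider:
    """A repository exposing records under one or more named schemas."""
    name: str
    # schema name -> list of records (each record is a plain dict)
    records_by_schema: dict = field(default_factory=dict)

    def list_schemas(self):
        return set(self.records_by_schema)

    def get_records(self, schema, predicate=None):
        records = self.records_by_schema.get(schema, [])
        if predicate is None:
            return list(records)          # complete retrieval
        return [r for r in records if predicate(r)]  # filtered retrieval


class Harvester:
    """Collects records from providers whose schemas it understands."""

    def __init__(self, understood_schemas):
        self.understood_schemas = set(understood_schemas)

    def harvest(self, providers, predicate=None):
        harvested = []
        for provider in providers:
            # Only schemas understood by both sides can be retrieved.
            shared = self.understood_schemas & provider.list_schemas()
            for schema in sorted(shared):
                for record in provider.get_records(schema, predicate):
                    harvested.append((provider.name, schema, record))
        return harvested


# Hypothetical providers and records, for illustration only.
providers = [
    DataProvider("repo-a", {"simple-dc": [{"title": "Site plan", "year": 2005},
                                          {"title": "Old survey", "year": 1998}]}),
    DataProvider("repo-b", {"local-schema": [{"name": "Spec sheet"}]}),
]
harvester = Harvester(understood_schemas={"simple-dc"})
recent = harvester.harvest(providers, predicate=lambda r: r["year"] >= 2000)
```

In this sketch the second provider is skipped entirely because the harvester does not understand its schema, which is the situation the shared-schema requirement is meant to avoid.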
6.1.4 Automated Extraction of Semantics

Numerous technologies are currently available for automated extraction of semantics. They use computational linguistics and natural language processing (NLP), machine learning, and statistical methods for metadata extraction, content clustering/categorization, content summarization and aggregation, taxonomy development or maintenance, unstructured-to-structured transformation, and question answering. Examples of companies in this field include: Autonomy, Entrieva (formerly Semio), Inxight, Semagix, Metacode (bought by Interwoven), Applied Semantics (acquired by Google), Ask Jeeves etc.

All these different tools are used to relate or transform unstructured content to explicit semantics. For that purpose they all need some prior knowledge.

Prior knowledge can be of syntactic or semantic nature (or a combination of both). The former refers to knowledge on morphology, word frequencies in a certain language, or other meaning independent information. The latter refers to techniques relating words based on similarity of meaning (f.e. WordNet: Miller, 1995), relates words according to their meaning in a certain context, describes information as spatial or temporal relations etc. [Engels and Bremdal 2000, 57]

Many applications effectively combine declarative semantics (i.e. semantic resources) with implicit semantics available in unstructured content and with statistical methods, probabilistic and fuzzy reasoning.
Explicit semantics can be both their input and output, as semantic resources are used to support these applications and, on the other hand, these applications are generating or updating explicit semantic resources. For example, the Applied Semantics' CIRCA technology is based on an extensive "ontology" that includes millions of words, their meanings and relationships to other meanings, developed by a team of computational linguists, artificial intelligence experts and database architects. Software modules that mimic human processing and understanding of language use the "ontology" to categorize, summarize, and retrieve information, but also to generate metadata and taxonomies and to update the "ontology".

6.2 Supporting Technologies and Related Frameworks

6.2.1 XML and Associated Standards

Extensible Markup Language (XML)¹⁰ is a widely used standard syntax for description, delivery, and exchange of rich, structured data. Its extensibility allows different communities to define standard schemas and add semantics to data and documents that they need to exchange. Domain-specific XML schemas/languages are reviewed in Section 6.3.1. Below is a brief overview of the accompanying XML-based standard languages that are relevant to the topic of this thesis.

The latest version of the mark-up language for the World Wide Web, XHTML¹¹, stems from the need to adapt display of information for an increasingly diverse array of devices, to allow both very rich and very simple displays of the same content.
It is an XML reformulation of HTML which defines its minimum subset and possible extensions to be used by diverse XML-based user agents, thus allowing adaptation of display and content to a particular context. While earlier versions of HTML support semantics only through the use of meta-tags, which can be used to provide metadata for a Web page as a whole, XHTML allows machine-usable semantics to be built into pages at a finer level of granularity, primarily using the class attribute [Dumbill 2000].

¹⁰ Extensible Markup Language (XML) [http://www.w3.org/XML/]
¹¹ XHTML [http://www.w3.org/TR/xhtml1/]

The Extensible Stylesheet Language Family (XSL)¹² is a powerful tool for selecting, filtering, transforming, and re-representing XML content. A very wide range of its actual and potential uses also includes extraction of RDF statements by "screen-scraping" XHTML pages [Connolly 2000].

Scalable Vector Graphics (SVG)¹³ is a language for describing two-dimensional graphics and graphical applications in XML consisting of two parts: an XML-based file format and a programming API for graphical applications. It supports scripting through languages such as ECMAScript and has comprehensive support for animation. Many design tools support import and export of SVG. Thanks to the extensibility of XML, SVG diagrams can have metadata embedded in proprietary formats without affecting the presentation.

X3D¹⁴, a successor to Virtual Reality Modelling Language (VRML), is an XML-based standard file format for high quality, real-time, interactive, 3D graphics.
It enables interchange of 3D data between applications and can be incorporated into Web services architectures and distributed environments. Its Scene Authoring Interface allows integration of real time 3D content and controls into different Web and non-Web applications.

Synchronized Multimedia Integration Language (SMIL)¹⁵ is an XML-based language intended for creation of interactive multimedia presentations, the only one that handles temporal behaviour. SMIL components can be integrated into other XML-based languages, primarily XHTML and SVG, to integrate timing and synchronization.

6.2.2 Semantic Web

Semantic Web (SW) is a vision of the future World Wide Web, more precisely, of its extension where "the day-to-day mechanisms of trade, bureaucracy, and our daily lives will be handled by machines talking to machines, leaving humans to provide the inspiration and intuition" [Berners-Lee 1999, 158]. It is expected to enrich the Web by adding logic: "the means [enabling machines] to use rules to make inferences, choose courses of action and answer questions" [Berners-Lee et al. 2001].

¹² Extensible Stylesheet Language Family (XSL) [http://www.w3.org/Style/XSL/]
¹³ Scalable Vector Graphics (SVG) [http://www.w3.org/Graphics/SVG/Overview.htm]
¹⁴ X3D [http://www.web3d.org/]
¹⁵ Synchronized Multimedia Integration Language (SMIL) [http://www.w3.org/TR/smil20/cover.html]

6.2.2.1 RDF

Resource Description Framework (RDF) is one of the enabling standards/technologies for the Semantic Web endorsed by the World Wide Web Consortium (W3C).
It is an assertional language for representing information about resources on the World Wide Web, a shared framework that allows the exchange of that information between applications without loss of meaning [W3C 2004b]. RDF is primarily intended for representing metadata about Web resources: Web pages as well as parts and collections thereof. However, the concept of a "Web resource" can be extended to include all entities, both physical objects and concepts, that can be identified on the Web, even when they cannot be directly retrieved over the Web, such as information users, user agents, or products that can be ordered on the Web.

RDF is based on the idea that everything is identifiable using Uniform Resource Identifiers (URI) and can be described using simple statements. The use of URIs is intended to ensure that concepts are not just strings but pointers to publicly available unique identifications of subjects that are referred to from multiple source definitions, although they do not have to actually point to anything. However, the URI mechanism does not provide clarity on whether a URI identifies a signified or an information resource about that signified [Berners-Lee 2003; Booth 2003; Clark 2002; Hawke 2002; Pepper and Schwab 2003].

An RDF statement associates a resource ("subject") with a named property ("predicate") and its value for the given resource ("object"). RDF statements can be represented as triples of the form: {predicate, subject, object} or as directed graphs of nodes and arcs (Figure 9).

[Figure 9. RDF graph. Recoverable labels: predicate, object]

The Semantic Web is woven out of such assertions that can be freely contributed by anybody.
The power is provided by additional layers that are built on top of RDF or orthogonal to it (Figure 10).

[Figure 10. Semantic Web Layers. Source: http://www.w3.org/2000/Talks/0906-xmlweb-tbl/. Copyright © 2000 World Wide Web Consortium, by permission. Recoverable labels: Moz, P3P, Dublin Core element set, KR data, KR rules, Web of Trust, Proof, Logic, Ontology support, RDF Schema, RDF syntax in XML, Resource Description Framework: basic ER-like model, Namespaces, XML-Schema, XML - Structured documents, Universal Resource Identifiers (Unicode), XML Dig. Sig.]

6.2.2.2 RDFS

While RDF provides a way to express simple statements about resources, it does not provide mechanisms for describing properties and their relationships with other resources. However, RDF users and applications frequently need the ability to indicate that they are describing specific kinds or classes of resources (e.g. people), and will use a specific set of properties to describe them (e.g. personal profile schemas). RDF's vocabulary description language, RDF Schema (RDFS), is an extension of RDF, which specifies mechanisms needed to name and describe properties and the classes of resource they describe [W3C 2004c]. It allows resources to be defined as instances of one or more classes and allows classes to be organized into hierarchies. However, "RDF class and property descriptions do not create a straightjacket into which information must be forced, but instead provide additional information about the RDF resources they describe" [W3C 2004b].
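The statement model of Section 6.2.2.1 and the class hierarchies of RDFS can be illustrated with a small sketch that uses plain Python tuples rather than an RDF toolkit. The "ex:" identifiers are hypothetical, the tuples are written in (subject, predicate, object) order rather than the {predicate, subject, object} order quoted above, and the traversal is only a simplified stand-in for the inference an RDFS-aware system would perform.

```python
RDF_TYPE = "rdf:type"
RDFS_SUBCLASS_OF = "rdfs:subClassOf"

# Each statement is a (subject, predicate, object) tuple.
# All "ex:" names are hypothetical and used only for illustration.
triples = {
    ("ex:Engineer", RDFS_SUBCLASS_OF, "ex:Person"),
    ("ex:Person", RDFS_SUBCLASS_OF, "ex:Agent"),
    ("ex:alice", RDF_TYPE, "ex:Engineer"),
    ("ex:alice", "ex:worksOn", "ex:bridgeProject"),
}


def superclasses(cls, graph):
    """Return every class reachable from cls via rdfs:subClassOf."""
    found, stack = set(), [cls]
    while stack:
        current = stack.pop()
        for s, p, o in graph:
            if s == current and p == RDFS_SUBCLASS_OF and o not in found:
                found.add(o)
                stack.append(o)
    return found


def classes_of(resource, graph):
    """Return the asserted classes of a resource plus their superclasses."""
    direct = {o for s, p, o in graph if s == resource and p == RDF_TYPE}
    inferred = set()
    for cls in direct:
        inferred |= superclasses(cls, graph)
    return direct | inferred


alice_classes = classes_of("ex:alice", triples)
```

Because anyone may contribute further statements, adding a triple such as a second rdf:type for the same resource simply enlarges the result; nothing in the existing graph has to change.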
In order to support one of the basic principles of the Semantic Web, that anyone can say anything about anything, RDF, unlike most conceptual formalisms, is property-centric: instead of defining classes as sets of properties that they share, it defines properties in terms of classes they may describe. An RDF property (or predicate) is a specific aspect, characteristic, attribute, or relation with defined meaning that may be used to describe resources. Its value can be another resource, specified by a URI, or a literal. In the former case, the RDF statement represents a relationship between two resources. A property definition may specify its permitted values (rdfs:range) as well as the types of resources that may be described with this property (rdfs:domain).

RDF has been accepted by numerous and diverse research and development (R&D) communities. A number of tools and toolkits for developing RDF applications are freely available to developers [W3C].

6.2.2.3 OWL

As RDF Schema allows definition of classes with multiple subclasses and superclasses, definition of properties, their sub-properties, domains, and ranges, it can be used for information discovery, navigation, filtering, personalization, and other types of matching. However, interoperability of multiple independently developed and maintained schemas and the full realization of the Semantic Web vision require the ability to develop true, full-blown ontologies with detailed, accurate, and consistent definitions of classes, properties, and relationships in diverse domains.
This is the role of Web Ontology Language (OWL), a language derived from DAML+OIL [W3C 2001] and intended for defining Web ontologies and their associated knowledge bases. An OWL ontology is a Web document consisting of a sequence of axioms, facts, and inclusion references to other ontologies. It can also have annotations used to record information about the ontology (e.g. authorship). An OWL knowledge base (KB) is a set of OWL assertions loaded into a reasoning system and may be based on a single ontology or on multiple distributed ontologies that have been combined using import mechanisms defined by the OWL specification [W3C 2004a].

6.2.2.4 Topic Maps

Topic maps are sets of interrelated subject proxies associated with pertinent information. A subject proxy or topic is an aggregate of statements made about the subject it represents. These statements or characteristics assignments include assignments of names, roles played in associations with other subjects/topics, occurrences, i.e. pertinent information, and specifications of subjects that they stand for.

The topic map framework is designed to allow unconstrained expressivity and to be ontology neutral. Conceptualization of a domain is expressed locally through associations and use of topic and characteristic types. Types can be reified as topics, but not formally defined to enable machine inference. Topic Maps Constraint Language (TMCL) [ISO 2005b], which should allow users to constrain any aspect of the topic map data model, is still under development.
Topic Maps Query Language (TMQL) [ISO 2005c], which should provide a syntax to form query expressions necessary to retrieve and manipulate content of topic map repositories, is in an even earlier development stage.

The topic map framework aims to allow both semantic diversity and interoperability. Semantic diversity is achieved through independent development of topic maps by diverse individuals and communities and the use of scope, an attribute that can be specified for any assignment of a characteristic, meaning that the topic characteristic assignment is valid only within the specified context. Scope is composed of a set of topics that together define some context. Interoperability is achieved by the possibility to merge independently developed topic maps.

The subject collocation objective refers to the goal that everything that is known about a given subject is available from a single location within a particular topic map, which may be a product of merging multiple topic maps. This goal requires unambiguous subject identification. Directly addressable subjects (i.e. information resources available on the Internet) are easily identified by so-called subject locators. All other types of subjects require the use of so-called subject indicators: information resources that provide an unambiguous human-readable indication of the identity of a subject, sufficient to distinguish it from other subjects. A subject indicator can have the form of description, definition, name, visual, audio, or other representation, or any combination thereof.
Published subjects [ISO 2006a] are a mechanism intended to enable global interoperability and merging of different topic maps, as well as interoperability with other frameworks, such as RDF and OWL. Published subject indicators (PSI) are publicly available subject indicators. A subject identifier is a machine-processable address of a subject indicator. If a system encounters multiple topics pointing to the same subject using the same subject identifier, it can deduce that they represent the same subject and merge them into one.

Topic maps may be represented in a variety of ways: in files using one of the topic map syntaxes (e.g. XTM, HyTM), in databases, as internal data structures in different applications, and even mentally in human minds. There are a number of mostly proprietary tools for developing, maintaining, and implementing topic maps, and many successful applications. However, the adoption of this framework is far behind the adoption of RDF. Possible reasons are discussed in [Freese 2002].

6.2.2.5 Topic Maps Compared to RDF

Both Topic Maps and RDF aim to support both semantic diversity and interoperability and hence, the two frameworks have many similar features. However, they were originally intended for different purposes. The main purpose of topic maps is management of large and complex bodies of information and they are primarily intended for human use. RDF was developed to enable realization of the Semantic Web by providing structured information about resources to be used for machine inference.
A number of mapping approaches to allow interoperability of the two frameworks have been proposed [Lacher and Decker 2001; Moore 2001; Ogievetsky 2001]. Similarities and differences are discussed in more detail in [Pepper 2002; Garshol 2003].

Unfortunately, the key mechanism needed to support the interoperability, unique subject identification, has not been satisfactorily resolved in either framework. The RDF community realized through experience with implementing the framework that the URI mechanism does not distinguish between real-world subjects and information resources about them [Berners-Lee 2003; Clark 2002]. Topic Maps, which are handling this problem by distinguishing between subject locators and subject identifiers, still have to cope with general issues associated with unique subject identification, such as situational or relative identity, identity evolution, identifiability, versions and copies [Kent 1978; Kent 2003] as well as mechanisms for enforcing globally unique and easily findable published subject indicators. Some efforts on automated matching of subjects based on properties other than Subject Identity Property (SIP) have been explored [Maicher and Witschel 2004].

Yet another feature relevant for this work has not been satisfactorily resolved. Handling of the scope of validity of statements or context is missing in RDF, while Topic Maps leave it open to interpretation and insufficiently specified at the global-interoperability level.
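The merging rule for topics that share a subject identifier, and the role of scope, can be made concrete with a small sketch. It is not an implementation of the Topic Maps data model: the Topic class, the example PSI addresses and the scope labels are hypothetical, and the rule used in names_in_scope (a name applies when every topic in its scope belongs to the current context) is only one possible reading, chosen here precisely because the standard leaves the interpretation open.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    # Addresses of subject indicators (hypothetical PSIs in the example below).
    subject_identifiers: set = field(default_factory=set)
    # (name, scope) pairs; scope is a frozenset of context labels.
    names: set = field(default_factory=set)


def merge_topics(topics):
    """Merge topics that share at least one subject identifier."""
    merged = []
    for topic in topics:
        target = Topic(set(topic.subject_identifiers), set(topic.names))
        remaining = []
        for existing in merged:
            if existing.subject_identifiers & target.subject_identifiers:
                # Same subject: fold the existing topic into the new one.
                target.subject_identifiers |= existing.subject_identifiers
                target.names |= existing.names
            else:
                remaining.append(existing)
        remaining.append(target)
        merged = remaining
    return merged


def names_in_scope(topic, context):
    """Names whose scope is contained in the given context (assumed rule)."""
    return {name for name, scope in topic.names if scope <= context}


beam_en = Topic({"http://psi.example.org/beam"}, {("beam", frozenset({"lang:en"}))})
beam_de = Topic({"http://psi.example.org/beam"}, {("Balken", frozenset({"lang:de"}))})
column = Topic({"http://psi.example.org/column"}, {("column", frozenset())})

result = merge_topics([beam_en, beam_de, column])
german_names = names_in_scope(result[0], frozenset({"lang:de"}))
```

Two systems that adopted different scope rules would return different names from the same merged map, which illustrates why under-specified scope handling limits global interoperability.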
6.2.3 Technologies and Architectures for Sharing and Customization

6.2.3.1 Web Services

XML Web services are a recent technology worth mentioning because of the capability to enable communication between distributed heterogeneous systems which have no prior knowledge of each other. They are self-contained, reusable, self-describing software components that can be located, invoked and used by other applications across the Web. Web services enable the exchange of data and the remote invocation of application logic using XML messages for transferring data between loosely coupled heterogeneous systems written in any programming language, running on any operating system. Web services are enabled by a series of standards/specifications, all based on XML, with SOAP, UDDI, and WSDL being the most important ones.

SOAP (Simple Object Access Protocol)¹⁶ is a protocol that enables different programs to communicate using HTTP and XML for data exchange. It consists of three parts: an envelope defining a framework for describing what is in a message and how to process it, a set of encoding rules for expressing instances of application-defined data types, and a convention for representing remote procedure calls and responses.

WSDL (Web Services Description Language)¹⁷ describes the interface offered by a Web service: what it can do, where it resides, and how it can be invoked.

UDDI (Universal Description, Discovery and Integration)¹⁸ is a mechanism that enables clients to dynamically find and connect to Web services. It is a platform-independent, open framework for describing and discovering Web service providers, services that they offer, and technical interfaces that can be used to access the services.
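As an illustration of the SOAP envelope and remote-procedure-call convention described above, the following sketch builds and reads a minimal SOAP 1.1 message with the Python standard library. The application namespace, the operation name GetDrawing and its parameter are hypothetical; the sketch omits headers, faults, encoding rules and the HTTP transport, so it shows only the shape of a message, not a working Web service client.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
APP_NS = "http://example.org/drawings"  # hypothetical application namespace


def build_rpc_request(operation, params):
    """Wrap a remote procedure call in a SOAP envelope and return it as text."""
    ET.register_namespace("soap", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{APP_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{APP_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")


def read_rpc_request(xml_text):
    """Extract the operation name and parameters from a SOAP request."""
    envelope = ET.fromstring(xml_text)
    body = envelope.find(f"{{{SOAP_ENV}}}Body")
    call = body[0]  # first child of Body is the call element
    operation = call.tag.split("}", 1)[1]
    params = {child.tag.split("}", 1)[1]: child.text for child in call}
    return operation, params


message = build_rpc_request("GetDrawing", {"drawingId": "A-101"})
operation, params = read_rpc_request(message)
```

Because both sides rely only on the XML message, the sender and receiver could be written in different languages and run on different platforms, which is the loose coupling the section refers to.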
The UDDI specification provides an API for registries of available Web services.

¹⁶ SOAP (Simple Object Access Protocol) [http://www.w3.org/TR/SOAP/]
¹⁷ WSDL (Web Services Description Language) [http://www.w3.org/TR/wsdl]
¹⁸ UDDI (Universal Description, Discovery and Integration) [http://www.oasis-open.org/committees/uddi-spec/]

6.2.3.2 Service-Oriented Architecture (SOA)

A service-oriented architecture (SOA) is a system for linking interacting software agents on demand. A service is a unit of work done by a service provider to achieve desired end-results for a service consumer. Provider and consumer are roles played by software agents on behalf of their owners. Service interactions are defined using a description language. Each interaction is self-contained and loosely coupled, independent of any other interaction. Web services are currently the most common, but not the only implementation of SOA. Ideally, there is a management layer between the providers and consumers to ensure complete flexibility regarding implementation protocols.

6.2.3.3 Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH)

The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH)¹⁹ provides an interoperability framework based on metadata harvesting. The Open Archives Initiative started as an effort to enhance access to e-print archives and improve the availability of scholarly communication. However, the technological framework and standards developed to support this goal are generally applicable and independent of the type of content and related economic mechanisms.
In the OAI-PMH framework data providers are making metadata available for harvesting by exposing it in OAI-PMH compliant repositories: network accessible servers that can process OAI-PMH requests. Repositories can expose metadata using different schemas; however, each must make it available in unqualified Dublin Core. The OAI-PMH allows various repository configurations by distinguishing between three distinct entities:

"Resource" is the object described by metadata. It can be physical or digital, stored in the repository or in a remote database.

"Item" is a constituent of a repository, a container that stores or dynamically generates metadata about a single resource in multiple formats, each of which can be harvested as records via the OAI-PMH. The metadata can be disseminated on-the-fly from the associated resource, cross-walked from some canonical form, actually stored in the repository, etc. Each item has an identifier that is unique within the scope of the repository of which it is a constituent.

"Record" is metadata in a specific metadata format returned in response to an OAI-PMH request for metadata from an item. A record is identified unambiguously by the combination of the unique identifier of the item from which the record is available, the metadataPrefix identifying the metadata format of the record, and the datestamp of the record.

¹⁹ Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) [http://www.openarchives.org/OAI/openarchivesprotocol.html]

Items in a repository can be grouped into sets. Sets can be organized as flat lists or as single or multiple hierarchies. Sets are used to allow selective harvesting.
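A harvesting request restricted to one set and a date range can be sketched as a URL built from OAI-PMH arguments. The verb ListRecords, the arguments metadataPrefix, set, from and until, and the prefix oai_dc for unqualified Dublin Core are defined by the protocol; the repository address, the set name and the item identifier below are hypothetical, and the sketch does not send the request or parse the XML response.

```python
from urllib.parse import parse_qs, urlencode, urlparse

BASE_URL = "http://repository.example.org/oai"  # hypothetical repository address


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None,
                     from_date=None, until_date=None):
    """Build a ListRecords request, optionally limited by set and datestamp."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    if from_date:
        params["from"] = from_date
    if until_date:
        params["until"] = until_date
    return f"{base_url}?{urlencode(params)}"


def record_key(identifier, metadata_prefix, datestamp):
    """The three values that together identify a record unambiguously."""
    return (identifier, metadata_prefix, datestamp)


url = list_records_url(BASE_URL, set_spec="drawings", from_date="2006-01-01")
query = parse_qs(urlparse(url).query)
key = record_key("oai:repository.example.org:item-42", "oai_dc", "2006-03-15")
```

The same item harvested under a different metadataPrefix, or after its metadata changed, yields a different record key, which is how a harvester can tell records of one item apart.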
Selective harvesting allows harvesters to limit requests to portions of the metadata available from a repository by filtering on datestamps, set membership, or a combination thereof.

The other parties in the framework, service providers, use metadata harvested via the OAI-PMH as a basis for building value-added services. They operate harvesters, client applications that issue OAI-PMH requests to repositories. There are six types of OAI-PMH requests, which allow harvesters to retrieve: information about a repository, a list of available records (i.e. record headers), the available metadata formats, the repository set structure, an individual known metadata record from a repository, or a complete or filtered set of records. OAI-PMH requests use the HTTP protocol and responses are returned as XML-encoded byte streams.

6.2.3.4 Web Feeds

In networked environments, content is usually shared in the form of feeds, XML documents offered by content providers using a standard schema. Currently there are several standard schemas used for this purpose on the Web. RSS stands for several related concepts. Really Simple Syndication is an umbrella term that encompasses multiple versions of Rich Site Summary (0.9, 0.91, 0.92, 0.93, 0.94 and 2.0) and RDF Site Summary (RSS 1.0). Currently used versions include RSS 0.91 used for basic syndication, RSS 1.0 for RDF-based applications and RSS 2.0 for general-purpose, metadata-rich syndication. Another standard feed format is Atom, formerly known as Echo. Users or systems can subscribe to a number of feeds from diverse sources.
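Reading several subscribed feeds can be sketched with the Python standard library as follows. The two feeds are hypothetical RSS 2.0 documents embedded as strings, and the sketch assumes that every item carries a pubDate element, which RSS 2.0 does not actually require; a real subscriber would fetch the documents over HTTP and handle missing or malformed elements.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Hypothetical RSS 2.0 feeds, for illustration only.
FEED_A = """<rss version="2.0"><channel><title>Site News</title>
<item><title>Foundation poured</title><link>http://example.org/a/1</link>
<pubDate>Mon, 08 Jan 2007 09:00:00 GMT</pubDate></item>
<item><title>Permit issued</title><link>http://example.org/a/0</link>
<pubDate>Mon, 01 Jan 2007 09:00:00 GMT</pubDate></item>
</channel></rss>"""

FEED_B = """<rss version="2.0"><channel><title>Code Updates</title>
<item><title>Revised fire code</title><link>http://example.org/b/7</link>
<pubDate>Wed, 03 Jan 2007 12:00:00 GMT</pubDate></item>
</channel></rss>"""


def read_rss_items(xml_text):
    """Extract title, link and publication date of each item in one feed."""
    channel = ET.fromstring(xml_text).find("channel")
    source = channel.findtext("title", default="")
    items = []
    for item in channel.findall("item"):
        items.append({
            "source": source,
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            # Assumes pubDate is present and in the RFC 822 style used by RSS.
            "published": parsedate_to_datetime(item.findtext("pubDate")),
        })
    return items


def newest_first(feeds):
    """Combine items from several feeds, most recent first."""
    merged = [item for xml_text in feeds for item in read_rss_items(xml_text)]
    return sorted(merged, key=lambda item: item["published"], reverse=True)


titles = [item["title"] for item in newest_first([FEED_A, FEED_B])]
```

Only three elements per item are used here; the optional elements of the syndication schemas would be read in the same way.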
Software or a remotely hosted service that periodically reads a set of news sources, in one of the syndication formats, finds new items, and displays them, typically in reverse-chronological order, on a single page is called an aggregator. There are desktop news aggregators that can display feed content in a browser or in an e-mail client, online news aggregators, and server-side aggregators.

The principal use of feeds is to provide news and updates. Although most feeds use only the extremely limited set of required metadata elements, optional elements available in standard syndication schemas allow very sophisticated management of content provided in feeds.

6.2.3.5 Peer to Peer

Peer-to-peer (or P2P) computer networks rely on the storage space, computing power, and bandwidth of all participating nodes. In a "pure" P2P network, all nodes are equal, all function as both clients and servers, and connect to each other as needed, with no services provided by central servers or routers. To resolve bottlenecks and difficulties associated with finding content in P2P networks, most P2P networks deviate from this pure model by including a central server that keeps and provides information about peers or by designating nodes that have more capacity to act as indexing nodes.

6.2.3.6 Grid Computing

Grid is a system of autonomous resources that can be shared, selected, and aggregated as needed, to solve a common problem.
The autonomous resources, which are geographically distributed and independently managed and can include computers, data, storage systems, special devices, and services, are selected and aggregated at run time, based on their capability, availability, cost, and user requirements. The original and still prevailing application domain of grid computing is science, but it is starting to find use in other areas, such as finance or collaborative engineering. The need for efficient selection and aggregation of resources led to the birth of the Semantic Grid²⁰, which aims to empower grid computing by enabling semantic interoperability of participating autonomous resources.

6.2.3.7 Agent Technologies

Agents are self-contained programs able to sense or understand the environment and to make decisions and take actions towards satisfying their goals, based upon their perception of the environment. Agents in multi-agent systems are able to mutually interact by making changes in the shared environment or by passing messages using an agent communication language (ACL) and protocols such as the FIPA (Foundation for Intelligent Physical Agents) series of standards²¹ or KQML (Knowledge Query and Manipulation Language)²². In order to understand each other, agents need to commit to a shared ontology.

²⁰ Semantic Grid [http://www.semanticgrid.org/]
²¹ FIPA (Foundation for Intelligent Physical Agents) series of standards [http://www.fipa.org/repository/standardspecs.html]
FIPA Ontology Service Specification specifies services provided by Ontology Agents (OA); these may include:
• "discovery of public ontologies in order to access them,
• maintain (for example, register with the DF, upload, download, and modify) a set of public ontologies,
• translate expressions between different ontologies and/or different content languages,
• respond to query for relationships between terms or between ontologies, and
• facilitate the identification of a shared ontology for communication between two agents." [FIPA 2000b]

²² Specification of the KQML Agent-Communication Language [http://www.cs.umbc.edu/kqml/kqmlspec/spec.html]

6.2.3.8 Smart Clients

Smart clients are adaptive client applications that can interact with distributed applications via Web services when they are connected to the Internet, or work offline and use local resources for processing. They can run on almost any type of device with Internet connectivity, including desktop, laptop, and tablet personal computers (PCs), mobile phones, and Personal Digital Assistants (PDAs), and can be deployed and updated in real time over the network from a centralized server.

6.2.3.9 Web 2.0

Web 2.0 is a meme, a term originally coined in 2004 to mean the Web as a platform, that soon became widely popular but has no clear meaning. It may best be described by examples of technologies and Web applications, such as Google, Flickr, BitTorrent, Napster, blogs, wikis, Web services, and folksonomies, or by the seven Web 2.0 principles identified by O'Reilly [2005]:
• the Web as platform
• harnessing collective intelligence (value-adding users' contributions)
• data is the next Intel inside (data-driven applications, unique, hard-to-recreate data sources that get richer as more people use them, aggregating user data as a side-effect of their use of the application)
• end of the software release cycle (perpetual betas, features added on a regular basis, users as real-time testers)
• lightweight programming models (cooperation, rather than control, a network of cooperating data services, web services interfaces and content syndication, re-use of the data services of others, allow for loosely-coupled systems)
• software above the level of a single device (services integrated across a variety of handheld devices, PCs, and internet servers)
• rich user experiences (e.g. AJAX).

6.2.4 Research Projects

A number of interrelated European projects investigated the use of semantics and context for enhancing information management in different environments, all under the umbrella of knowledge management.

The German project KnowMore (1997-1999)²³ involved application-oriented basic research, which served as a starting point for several current efforts in the area of knowledge management. It focused on context-sensitive, task-based active information supply using ontologies: an information ontology focusing on meta properties and context, an integrated domain thesaurus/ontology covering the content of information, and an enterprise ontology modeling the context in which it is used. A similarity thesaurus providing correlations between terms was automatically generated, based on the frequency and co-occurrence of terms in documents, but identification of concepts and creation of links were done manually.
The ontologies used a custom-developed representation language. An automatic categorization tool was used for associating documents with ontology concepts. Relevant information was retrieved by traversing the ontologies using task-specific search heuristics.

²³ KnowMore (1997-1999) [http://www.dfki.uni-kl.de/frodo/KnowMore-Slideshow/KnowMore-alle/KnowMore-alle.pps]

A successor project of KnowMore, FRODO (A Framework for Distributed Organizational Memories)²⁴ (2000-2003), explored methods and tools for building and maintaining distributed organizational memories in enterprise environments. It developed an agent-based framework for integrating organizational memories through mapping [van Elst et al. 2004], applied document analysis and understanding techniques to facilitate ontology construction and evolution and to incorporate heterogeneous information sources, and introduced weakly-structured workflows.

Another EU project, Know-Net²⁵, developed an integrated knowledge management solution encompassing a framework, software tools, and consulting methodologies and techniques, which brings together the knowledge-as-a-product (content) and knowledge-as-a-process (context) perspectives on knowledge management. The Know-Net architecture is based on a central knowledge server storing "knowledge objects," which are managed (indexed, organized, re-organized, retrieved) using ontology-based metadata and the Know-Net Ontology Editor. The system offers flexible categorization of documents along multiple indexing dimensions, visual browsing of metadata, and iterative search refinement.
Knowledge objects are created, managed, and used through a suite of modular applications built on top of the server. The user can interact with the tool through three different "navigators," tailored to specific organizational roles. The first round of real-world validation revealed a strong need for customization and linking of external applications to the knowledge server. This suggested a need for a knowledge-management reference model, along with predefined knowledge assets, objects and their attributes, and templates for business processes, in order to satisfy the needs of smaller companies unable to afford substantial custom solutions and consulting services [Mentzas et al. 2001].

²⁴ FRODO (A Framework for Distributed Organizational Memories) 2000-2003 [http://www.dfki.uni-kl.de/KM//content/e179/e506/index_eng.html]
²⁵ Know-Net [http://www.know-net.org/]

The European Commission IST program Corporate Memory Management through Agents (CoMMA), 2000-2002, developed a platform for the management of a corporate memory based on multiple cooperating agents with inference mechanisms operating on semantic RDF annotations of documents using RDFS-based ontologies. Agents with the role of User Profile Manager are able to adapt to the user and context by using machine learning techniques [Gandon et al. 2002; Perez et al. 2001; Gandon 2002; Kiss and Quinqueton 2001]. The parameters used for this purpose included user role, stated or implied preferences, usage statistics, and user or group rating of documents.
Another European project building upon KnowMore and Know-Net, DECOR (DElivery of Context-sensitive ORganizational knowledge)²⁶ (2000-2002), investigated integrated methods and tools for Business-Process Oriented Knowledge Management (BPOKM), i.e. active, context-sensitive, and self-adaptive delivery of organizational knowledge within business processes. It developed an intelligent information assistant which observes the running workflow and delivers context-sensitive information from the archives structured around domain ontologies and process models, thus helping people to perform knowledge-intensive business tasks. The application domain was the health care industry. The project used conceptual formalisms of its own, based on three central concepts (kinds, characteristics, and relations) mapped to the elements (structure units, definable attributes, and links) used in CognoVision, the commercial product used in the project as a document management system [Abecker et al. 2001; Papavassiliou 2002].

The current project and successor of FRODO, Evolving Personal to Organizational Knowledge Spaces (EPOS)²⁷, investigates an evolutionary bottom-up approach to resolve the discrepancy between individuals' needs for individualized structures and flexibility in processes and work organization on one side and an organization's need for standardized persistent structure and structured processes on the other.
A society of interacting agents from different workspaces synchronizes information needs, balances structures, ontologies, and process models, and exchanges context-specific relevant information; the agents thus reach a common and shared understanding of the information and knowledge used in their realm and, finally, contribute to the organizational memory [Aschoff et al. 2004].

The current Italian project Enabling Distributed and Autonomous Management of Knowledge (EDAMOK)²⁸ is promoting a distributed approach to knowledge management, which allows each organizational unit autonomy in managing knowledge in its own context and enables knowledge sharing across organizational units through coordination of multiple contexts, rather than imposing a shared knowledge structure. To this end it is using a combination of multi-agent systems, machine learning, case-based reasoning, linguistic analysis, and formal techniques for knowledge representation and reasoning.

²⁶ DECOR (DElivery of Context-sensitive ORganizational knowledge) 2000-2002 [http://www.dfki.uni-kl.de/decor/facts.htm]
²⁷ FRODO, Evolving Personal to Organizational Knowledge Spaces (EPOS) [http://www3.dfki.uni-kl.de/epos/]
²⁸ Enabling Distributed and Autonomous Management of Knowledge (EDAMOK) [http://edamok.itc.it/]

The EU project SWAP²⁹ used Semantic Web and peer-to-peer technologies to allow users both individual views and effective knowledge sharing. It used emergent semantics, exploiting users' everyday actions in order to provide bottom-up semantic structures.
The German project Knowledge Sharing in Heterogeneous Expert Communities (AWAKE)³⁰ investigates how implicit knowledge structures in different communities of experts can be captured, visualized, and used for semantic navigation of information spaces and construction of new knowledge. It develops an agent-based approach for creation and collaborative use of personalized learning knowledge maps that uses semantic text analysis, machine learning, and interfaces for visualizing relationships and creating new knowledge structures [Novak et al. 2002; Novak et al. 2003].

Two projects, Gnowsis³¹ and Haystack³², focus on personal information management based on Semantic Web technology.

Several research projects at the LSDIS Lab at the University of Georgia have been focusing on semantic information management. InfoQuilt investigated ways to achieve semantic interoperability through terminological transparency, context-sensitive information processing, and semantic correlation. Semantic Discovery: Discovering Complex Relationships in Semantic Web (SemDIS), a joint project with the University of Maryland, Baltimore County (UMBC), focuses on the development of a system that supports indexing and querying of complex semantic relationships. The system is driven by notions of information trust and provenance and models of hypotheses and arguments [Aleman-Meza et al. 2005; Anyanwu and Sheth 2002]. METEOR-S: Semantic Web Services and Processes aims to extend Web services standards with Semantic Web technologies in order to enable semantic annotation and discovery of Web services [Rajasekaran et al. 2005; Patil et al. 2004; Sivashanmugam et al.
2003; Cardoso and Sheth 2003].

²⁹ SWAP [http://swap.semanticweb.org/public/index.htm]
³⁰ Knowledge Sharing in Heterogeneous Expert Communities (AWAKE) [http://awake.imk.fraunhofer.de/]
³¹ Gnowsis [http://www.gnowsis.org/]
³² Haystack [http://haystack.lcs.mit.edu/]

The Stanford University Knowledge Systems, AI Laboratory (KSL) developed Chimaera, a software system that supports users in creating and maintaining distributed ontologies on the Web. The two major functions it supports are merging multiple ontologies together and diagnosing individual or multiple ontologies. It supports users in tasks such as loading knowledge bases in differing formats, reorganizing taxonomies, resolving name conflicts, browsing ontologies, editing terms, etc.

6.3 AEC/FM-Specific Efforts

6.3.1 Domain-Specific Standards and Semantic Resources

The history of efforts to enable communication between diverse and dispersed parties, as well as information integration and reuse, follows the history of information technology. The earliest efforts, which focused on classification and structuring of documents, were followed by classification of structured database information and exchange of data between cost accounting or CAD applications. The introduction of object-oriented approaches brought about product modelling initiatives, while the advent of Web technologies shifted the focus to XML-based data exchange and further intensified the need to enable communication between diverse participants.

Two general observations can be made. One is the disparity between the intensive development of standards and related research on one side and the poor adoption thereof in practice on the other. The second is the strong trend towards the support of diversity.
6.3.1.1 Classification schemes

Construction Specifications Canada (CSC) and the Construction Specifications Institute (CSI) have been developing a number of standards that have been widely used throughout North America for decades. MasterFormat [CSI/CSC 2004] is a specifications-writing standard for building design and construction projects, first published in 1963. It standardizes the structure and sequencing of information and documentation with the goal of facilitating communication among architects, specifiers, contractors, and suppliers. SectionFormat standardizes the presentation of subjects and article headings within each section, while PageFormat specifies presentation of the text within construction documents. UniFormat is a classification of building elements (systems and assemblies that perform a particular function), intended for the description, economic analysis, and management of a building over its life cycle. It is primarily used for cost estimates and cost analysis.

Construction classification in Europe reflects the differences in construction practices across different countries. The oldest and best known classification system is the Swedish SfB, a construction indexing system intended to facilitate storage and retrieval of information generated or utilized in construction activities.
Originally called SfB after the committee that started it in 1947, Samarbetskommittén för Byggnadsfrågor (SfB), and later renamed BSAB in Sweden, the system was authorized by the International Council for Building Research Studies and Documentation (CIB) for the structuring and filing of construction industry information, and was translated and adapted for use in many other countries, such as England, the Netherlands, Belgium, and Singapore, under slightly different names (e.g. NL/SfB, CI/SfB). The Dutch building community also uses the locally developed STABU specification for building works, while in the United Kingdom the Common Arrangement of Work Sections (CAWS), a standardized set of detailed work section definitions, is used for organization of specifications and bills of quantities. The Finnish Talo 90 and its predecessors, Talo 70 and Talo 80, reflect the Finnish tradition of controlling the quality and the costs of a building project by construction elements, not by work sections.

Recognizing the need for different classifications in different countries and regions and for different purposes, identified in ISO/TR 14177 [ISO 1994], ISO 12006-2 [ISO 2001a] provides a faceted framework to be used as a basis for development of national and regional classification systems, rather than a single standard hierarchy. The framework is based on the following simple conceptualization of the construction domain: construction resources are used in or required for construction processes, the output of which is construction results.
Construction resources, processes, and results are considered basic categories encompassing several core classes, which are further divided into subclasses by one or more principles of specialization, each resulting in a classification table. The standard defines titles and definitions for the tables, but not their detailed content. It is not prescriptive and it is very loose by design, in order to allow the continuing use of already well-established country-specific classification systems.

EPIC (Electronic Product Information Cooperation) [Electronic Product Information Co-Operation 1999] is an international standard for sharing information between construction product databases that conforms to the ISO 12006-2 [ISO 2001a] framework. It was developed as a result of a 1990 agreement between representatives from ten European countries. EPIC uses the function facet as the first classification criterion; further subdivision is based on shape and material.

Uniclass [NBS Services 1997] is a British classification system developed with the goal of integrating CI/SfB with other classification systems used in the UK, including CAWS and EPIC. It includes a series of faceted tables intended for organizing library materials and for structuring product literature and project information. The latest version of Talo, Talo 90, also uses the faceted approach and the ISO 12006-2 [ISO 2001a] framework.

OmniClass Construction Classification System (OCCS)³³ is a faceted classification system designed to encompass the entire AEC domain throughout the entire life cycle of the built environment and to classify it at a much greater level of granularity than other, similar systems.
It has been developed through unified efforts of numerous domain participants across North America. It is intended to provide shared terminology and values to populate industry standards, to serve as the basis for organizing, storing, and retrieving information and deriving relational applications, and to provide a backbone for Building Information Models (BIMs). The system is modeled after Uniclass and incorporates classification systems of narrower scope (e.g. MasterFormat [CSI/CSC 2004]) wherever possible.

The shift from the initial enumerative approach towards flexible, faceted, bottom-up systems, following the realization that a single fixed hierarchy is not suitable for construction practice in different countries, for organizing different types of information, and for diverse purposes, is evident in all more recent classification efforts in the domain. All of the standards claim to be classifications of information about construction works; however, what they actually classify are construction domain entities, i.e. subjects of that information. None of them mentions relationships to metadata standards.

6.3.1.2 ISO 12006-3

ISO 12006-3 [ISO 2001b] is a standard intended to reconcile classification and object-oriented approaches to structuring AEC/FM information. It provides a framework and conceptual formalism for defining concepts, their relationships, and naming. Concepts are objects defined by properties. Both objects and properties can have relationships and can be grouped. The set of properties associated with an object makes up its formal definition.
Properties have values that form the semantic content of a concept and can be expressed in units. The intended role of an object can be specified, which provides the capability to define the context within which the object is used. The formalism permits multiple names for each object, thus allowing addition of synonyms and multiple languages. Objects can be related to external documents and formal classification systems through references.

³³ OmniClass Construction Classification System (OCCS) [http://www.occsnet.org/]

6.3.1.3 Lexicon

Lexicon [STABU 2004] is a large ISO 12006-3-compliant vocabulary that defines AEC/FM domain terminology. Each concept is described by names, short names, and textual descriptions in multiple languages (currently Dutch and English only), and optional images. Concepts are also formally defined by associated properties and, where possible, by typical composition. Properties have values that are associated with measures and units. Actual values are not stored within the vocabulary but, where appropriate, sets of allowed values are provided. In addition, concepts are arranged in a specialization hierarchy that can be rearranged by changing the order of properties used for grouping [Woestenenk 2002].

Developed by the Dutch STABU foundation, Lexicon is interconnected with many domain projects and initiatives. It was partly developed within the CONCUR project and used in eConstruct; it is compatible with IFCs, STEP, bcXML, and BARM, and is supported by the International Construction Information Society (ICIS) and the International Alliance for Interoperability (IAI).
However, the remaining effort required to populate it was estimated at 50 man-years in 2002 [Stephens et al. 2002, 42].

6.3.1.4 BARBi

BARBi, which was originally intended to be a new classification system, is the Norwegian version of a common object-oriented reference data library based on ISO DIS 12006-3. It focuses on definitions of individual concepts, regardless of their roles in any specific context, and describes their properties and relationships to other concepts. In that way, it enables a concept to be included in multiple classification hierarchies.

6.3.1.5 SDC

The Standard Dictionary for Construction (SDC)³⁴ is a French initiative aiming to harmonize and unify the existing French construction-product vocabularies, languages, and classification systems in order to support e-commerce in the construction sector. SDC does not impose a single syntax but allows implementers to use EDIFACT, XML, or any other syntax.

³⁴ Standard Dictionary for Construction (SDC) [http://www.sdc.biz/]

6.3.1.6 CONNET Web Thesaurus

CONNET is a portal for construction practitioners that allows them to access information resources across Europe. The CONNET Initiative also developed the Web Thesaurus³⁵, a database of terms and concepts used in the AEC industries, with the purpose of supporting indexing and retrieval of AEC "objects". In addition to hierarchical and generic associative relationships, synonyms, homonyms, definitions, explanatory notes, and graphic illustrations, the thesaurus includes terms in other languages and categories from different AEC classification systems, which are treated as synonyms.
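The idea of treating foreign-language terms and classification categories as synonyms can be sketched as a simple lookup index in which every label of a concept resolves to the same entry. This is only an illustrative sketch under stated assumptions: the `ThesaurusEntry` structure, the `build_index` and `lookup` helpers, and all sample terms and codes (including the placeholder schemes `SCHEME-A` and `SCHEME-B`) are hypothetical and are not taken from the CONNET thesaurus itself.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional, Set


@dataclass
class ThesaurusEntry:
    """One concept with the kinds of labels and links a thesaurus may hold."""
    preferred: str
    synonyms: Set[str] = field(default_factory=set)
    translations: Set[str] = field(default_factory=set)  # terms in other languages
    class_codes: Set[str] = field(default_factory=set)   # classification categories
    broader: Set[str] = field(default_factory=set)       # hierarchical relationship
    related: Set[str] = field(default_factory=set)       # associative relationship

    def labels(self) -> Set[str]:
        """All labels that should retrieve this concept."""
        return {self.preferred} | self.synonyms | self.translations | self.class_codes


def build_index(entries: Iterable[ThesaurusEntry]) -> Dict[str, ThesaurusEntry]:
    """Map every label, case-insensitively, to its concept."""
    index: Dict[str, ThesaurusEntry] = {}
    for entry in entries:
        for label in entry.labels():
            index[label.casefold()] = entry
    return index


def lookup(index: Dict[str, ThesaurusEntry], label: str) -> Optional[ThesaurusEntry]:
    """Return the concept for a term, translation, or classification code."""
    return index.get(label.casefold())


# Hypothetical sample data; the codes below are placeholders, not real categories.
DOOR = ThesaurusEntry(
    preferred="door",
    synonyms={"doorset"},
    translations={"deur", "porte"},
    class_codes={"SCHEME-A:31.5", "SCHEME-B:D20"},
    broader={"opening component"},
    related={"door frame"},
)
INDEX = build_index([DOOR])
```

With such an index, a query by the Dutch term, by the English synonym, or by a category code from either placeholder scheme retrieves the same concept, which is what allows material indexed under different classification systems to be found through a single search.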
6.3.1.7 Industry Foundation Classes

Another line of efforts aiming to enable communication and integration of information in the domain has been focusing on the interoperability of software applications. The early efforts that looked for the solution in file formats for data exchange between CAD applications were soon replaced by product- and domain-modelling initiatives that emerged with object-oriented software development. The most important domain schema, Industry Foundation Classes (IFC), has been developed since 1994 through joint efforts of major industry players organized into the International Alliance for Interoperability (IAI). The purpose is to enable software interoperability in the AEC/FM industries, i.e. sharing and exchange of data between different software applications without translation or human intervention. The interoperability is achieved by specifying content and structure for electronic representations of objects, properties, and relationships in the domain to be used in software development. IFCs handle unstructured information by referencing it using either IfcDocumentReference or IfcDocumentInformation, i.e. an entity can reference either only the location of a related document or a set of metadata describing it.

Following the definition of ontology as an "explicit specification of a conceptualization" [Gruber 1993], which is currently widely accepted in the information technology domain, the IFCs can be seen as a shared domain ontology to which developers of IFC-compliant applications have agreed to commit.
However, it is important to keep in mind that this definition does not coincide with the original, philosophical notion of ontology [Partridge 2002], that the IFCs do not actually specify a segment of the real world, but electronic representations of its entities, and that the specification is strongly tinted by the use of the EXPRESS language, specified by ISO 10303-11 [ISO 2004b]. EXPRESS is a modelling language, rather than a logic-based one. Although it can be argued that EXPRESS has the expressive power of first-order logic, which enables specification and checking of constraints, it can hardly be considered for full-blown inference [SEMIS Bulletin].

³⁵ Currently available as ThesaurusforAEC.com [http://thesaurus.foraec.com/]

Another issue related to the use of EXPRESS is compatibility with the current use of XML as the primary means for exchanging information. Part 28 of the STEP standard, ISO/TS 10303-28 [ISO 2003], specifies means by which schemas expressed using the EXPRESS language and data governed by EXPRESS schemas can be represented as an XML document without loss of information. However, these no-loss representations result in files that are very large and not typical for XML usage, thus undermining the goal of using the XML infrastructure [Behrman 2002]. In addition, EXPRESS schemas allow polyhierarchy that is not compatible with the hierarchical structure of XML. The XML translations of IFC models described in subsequent sub-sections therefore have to deal with tradeoffs between completeness and usability of data.
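The mismatch between polyhierarchy and the tree structure of XML can be made concrete with a small sketch. It nests a type hierarchy as XML elements and counts how often a type with two supertypes appears. The type names and helper functions are hypothetical, chosen only for illustration; they are not IFC entities, and the nesting shown is not the encoding defined by ISO 10303-28.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Hypothetical schema: each type maps to its direct supertypes.
# "Wall" has two supertypes, so the hierarchy is a graph, not a tree.
SUPERTYPES: Dict[str, List[str]] = {
    "Element": [],
    "LoadBearing": ["Element"],
    "SpaceBounding": ["Element"],
    "Wall": ["LoadBearing", "SpaceBounding"],
}


def children_of(supertypes: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Invert the supertype map into a map from each type to its subtypes."""
    children: Dict[str, List[str]] = {name: [] for name in supertypes}
    for name, parents in supertypes.items():
        for parent in parents:
            children[parent].append(name)
    return children


def nest(name: str, children: Dict[str, List[str]]) -> ET.Element:
    """Build a nested <type> element for a type and all of its subtypes."""
    node = ET.Element("type", {"name": name})
    for child in children[name]:
        node.append(nest(child, children))
    return node


def to_tree(supertypes: Dict[str, List[str]]) -> ET.Element:
    """Serialize the hierarchy by nesting, starting from types without supertypes."""
    children = children_of(supertypes)
    root = ET.Element("schema")
    for name, parents in supertypes.items():
        if not parents:
            root.append(nest(name, children))
    return root


def occurrences(root: ET.Element, name: str) -> int:
    """Count how many <type> elements in the tree carry the given name."""
    return sum(1 for node in root.iter("type") if node.get("name") == name)
```

Calling `occurrences(to_tree(SUPERTYPES), "Wall")` shows that the type with two supertypes has to be written once under each of them. A translation must either accept such duplication, flatten the structure and link items by identifiers, or drop one of the parent links, which is one way to read the tradeoff between completeness and usability noted above.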
Benefits of model-based systems are indisputable; however, it is becoming obvious that not all aspects of a project can and should be structured according to a single mega-model [cf. Bjork 2002]. In a review of AEC building models, [Liebich et al. 2002] point to the change of thinking about product modeling, based on years of experience:

"As software that use information models as the basis for information exchange and sharing become more widely used, limitations in the basic technology are becoming apparent. Whilst it is possible to define very detailed models that fully describe construction ontology, it is reasonable to ask whether this should be done. Both the IFC model and the POSC/CAESAR model have already recognized that it may be better to allow flexibility in object specification through reference data or property sets. By doing this, they focus the attention of the model on the relationships that exist between models and therefore on the behavioral characteristics of the model. With the developing emphasis on property sets and reference data, the need for a common semantic dictionary that can define the meanings of individual words has emerged. This can provide a consistent basis for specification of properties/attributes together with units used, synonyms, language based expression etc. and therefore minimize inconsistency in the way in which property sets are developed." [Liebich et al. 2002, 58]

[Rezgui and Debras 1996] and [Rezgui and Cooper 1998] looked at possibilities for bridging product modelling and document management approaches.
Bjork suggests the use of the same conceptual schema modelling methods for both document metadata and structure, as well as for the product model information, which would enable at least migration of data from product model applications to document-based systems [Bjork 2002].

6.3.1.8 Metadata

Standardization of information resource descriptions started receiving attention in the AEC/FM domain more recently. [Bjork 1993] suggested the use of metadata for managing construction documents, but standardization efforts started only in the 2000s with ISO/TR 19033 [ISO 2000]. The survey of research papers in the proceedings of the annual conferences of the CIB Working Commission W78 (IT in Construction) between 1992 and 2002 [Amor et al. 2002] found that document management was a substantially less popular research topic than domain modeling.

The main metadata standardization activity has been concentrated around the 82045 series of international standards. Part 1 [IEC 2001] specifies principles and methods for the management of technical documents and provides the framework for subsequent parts of the standard. Part 2 [IEC 2004] provides a collection of metadata intended as a general basic resource for all application fields and an implementation-independent information model, supplying the context of the metadata. It provides a mapping to DCMES [DCMI 2006b] and a document-management application reference model (Section 11.6.3) that relates documents to the context in which they are created and used, using subsets of application reference models of ISO 10303 wherever possible.
Part 5 [ISO 2005a] specifies elements and methods for sharing and exchange of document metadata in the AEC/FM domain. It prescribes that project teams define a common metadata set that must be used throughout the project, by selection from the elements specified in the standard. Where not stated by the standard, allowed values for metadata elements can be specified on the national level or the project level. Each system involved has to be able to produce metadata in a standardized exchange format, but no specific format is imposed. The standard provides for correlation of document metadata to IFCs through the Context.ProductObject element.

Industry Foundation Classes introduced metadata in version 2x. The element set captured in the IfcDocumentInformation entity is much smaller than the one in ISO 82045-5 [ISO 2005a] and is not related to any metadata standards, and the individual elements are insufficiently defined (e.g. Scope is defined as "Scope for this document", Purpose as "Purpose for this document"). The focus is on administrative metadata; content can be described through the IfcRelAssociatesDocument from IfcObject, the optional Description element, and possibly through Name.

Although the use of metadata has been a recurring theme of research projects over a long period of time [e.g. Turk et al. 1994; Rezgui and Cooper 1998; Underwood 2003], there is no evidence of significant use of any of these standards in construction practice. Although the majority of information management systems used in practice (Rezgui [2001] reviewed tools for knowledge management, Lakka et al. [2001] for collaboration, and Bjork [2002] for document management systems) use some basic searchable metadata [Lakka et al. 2001], no standard metadata sets are used [Bjork 2002; Katranuschkov et al. 2004]. Fewer than 70% of the systems search file names, while fewer than 40% use full text search. All systems reviewed in [Lakka et al. 2001] use single, fixed file folder structures for browsing.

A survey performed within the prodAEC project [Woestenenk et al. 2002] identified the following barriers to a wider use of metadata standards:

• "most users do not know what it is
• the required manual input of meta data substantially hinders standards usage
• data exchange is still handled mainly by e-mail; services offered by Internet companies do not promote meta data use
• there is a large number of standards, and most are not sufficiently stable
• support of meta data standards by typical software application in building construction is quite low; proprietary formats where meta data used are not open to public are still dominant." [Katranuschkov et al. 2004]

Metadata standards are not used in AEC/FM research projects either, and are rarely even referenced in research papers. The recent Digital Handover DACaPo project, a part of the Danish Det Digitale Byggeri (DDB) program (http://www.detdigitalebyggeri.dk), is a rare example that implements the ISO 82045-5 [ISO 2005a] element set [Christiansson and Carlsen 2005].

Metadata that describes content, context, and structure of data stored in relational databases, which has been receiving extensive attention in corporate and geospatial information management, is not a major topic in the AEC/FM domain.
The classification systems mentioned earlier provide values for metadata elements describing content of relational databases but do not recommend the set of elements to be used nor values for elements describing the context or structure of the data.

A number of research projects, such as CONDOR [Rezgui and Cooper 1998] and eSM@RT [Schapke et al. 2002], explored and proposed ways for integrated management of different types of information; however, there have been no related standardization efforts or real world systems applying them.

6.3.1.9 IfcXML

IfcXML³⁶ is developed by the International Alliance for Interoperability. It has two representations. The first one follows Part 28 and generates equally cumbersome and atypical XML files. The second one is condensed but with unavoidable loss of the original information.

6.3.1.10 BLIS-XML

Building Lifecycle Interoperable Software (BLIS)³⁷ was a project that coordinated implementation efforts of vendors trying to support the IFCs. One of the deliverables is BLIS-XML, a methodology for encoding EXPRESS based information in XML format. BLIS-XML supports three different scenarios of data exchange:

• Conversion of a full EXPRESS data model,
• Conversion of a valid subset (view) of an EXPRESS data model,
• Conversion of a single, self sufficient concept from an EXPRESS data model that can be used independently in data exchange or embedded into other XML models.

In order to reduce the complexity of the schema and the size of exchange files, BLIS-XML does not contain the full inheritance of the EXPRESS model but only those inherited attributes that are relevant to each model view context.
It also includes relevant INVERSE relationships as, unlike EXPRESS-based tools, XML parsers are not able to recalculate them. The polyhierarchy of EXPRESS models is handled in BLIS-XML by the use of "idref" references instead of the typical hierarchical XML structure, which makes manipulation of files with XSLT more challenging. However, the BLIS-XML methodology is still much simpler than the approaches defined in STEP Part 28.

³⁶ IfcXML [http://www.iai-international.org/iai international/Technical Documents/ IfcXML2.html]
³⁷ Building Lifecycle Interoperable Software (BLIS) [http://www.blis-project.org/index2.htm]

6.3.1.11 aecXML

aecXML³⁸ is an XML-based language used to represent different types of information in the AEC industry. The North American chapter of the IAI, which originally started developing the schema, soon changed its goal to developing a framework [IAI 2001] and accepting schemas submitted by others. The Framework envisioned that schemas get submitted and placed into a Common Object Repository (COR). Common objects from the COR would then be consolidated and put in a Common Object Schema (COS) as foundation objects for other aecXML schemas, either Domain Specific Schemas (DSS) or Business Process Schemas (BPS), to build upon. This framework, which tried to use constructs of object-oriented technologies in XML, never came into real use; only two schemas were submitted.
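The "idref" style of referencing noted for BLIS-XML in Section 6.3.1.10 can be illustrated with a minimal sketch. The XML fragment is hypothetical: the element names (Wall, Window, RelVoids) and attributes (id, host, filler) are assumptions made for illustration and are not taken from the BLIS-XML or ifcXML schemas. The sketch shows why a flat, reference-based file needs an explicit lookup step, and why an inverse relationship has to be found by scanning the file unless it is stored explicitly.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: names and attributes are illustrative only.
SAMPLE = """
<model>
  <Wall id="w1" name="North wall"/>
  <Window id="win1" name="French window"/>
  <RelVoids id="r1" host="w1" filler="win1"/>
</model>
"""


def build_index(root):
    """Map every id attribute to its element so references can be resolved."""
    return {el.get("id"): el for el in root.iter() if el.get("id") is not None}


def resolve(index, ref):
    """Return the element a reference points to, or raise if it dangles."""
    if ref not in index:
        raise KeyError(f"dangling reference: {ref}")
    return index[ref]


def inverse(root, attr, target_id):
    """Recompute an inverse relationship: elements whose attr refers to target_id."""
    return [el for el in root.iter() if el.get(attr) == target_id]


root = ET.fromstring(SAMPLE.strip())
index = build_index(root)
relation = index["r1"]
host = resolve(index, relation.get("host"))
voids_in_wall = inverse(root, "host", "w1")
```

Here the relation element points at the wall and the window by identifier instead of nesting them, so a consumer has to build an index before it can follow any link, which is the extra work that hierarchical XML tooling does not expect.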
6.3.1.12 bcXML

Building-Construction extensible Mark-up Language (bcXML)³⁹ was developed within the eConstruct project with the purpose to support e-commerce and e-business in the European building and construction industry, more specifically, to enable creation of "requirement messages" that can be interpreted by computer applications able to find suitable products meeting those requirements. The language specification allows development of bcXML compliant taxonomies, meaning taxonomies able to return information in bcXML format and implementing the interface defined by eConstruct. One of the bcXML compliant taxonomies is Lexicon. bcXML is linked to IFCs through the jointly developed Common Object Schema (COS).

6.3.1.13 Promotion and Mapping of Standards

In recent years, a number of projects started addressing the problem of both proliferation and insufficient adoption of AEC/FM standards. prodAEC⁴⁰ is a European Network supporting the best practice, harmonization, implementation, and use of standards for data exchange, e-work and e-business in the AEC sector. In addition to its primary role to promote and educate, and thus encourage the use of standards, it also encourages harmonization of standards with overlapping topics.

³⁸ aecXML [http://www.iai-na.org/aecxml/mission.php]
³⁹ Building-Construction extensible Mark-up Language (bcXML) [http://www.econstruct.org/2003/index.html]
⁴⁰ prodAEC [http://www.prodaec.com/]

thesaurus.forAEC.com⁴¹, developed in the CONNET project, on the other hand, allows for the existence of multiple standards by providing mappings between Talo 90, Uniclass, EPIC, and NAICS 97.
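The idea of letting several classification systems coexist by mapping between them can be sketched as a small crosswalk structure. This is only an illustration under stated assumptions: the codes are placeholders invented for the example, not real Talo 90, Uniclass, EPIC, or NAICS 97 entries, and treating mappings as symmetric and transitive is a simplification. How thesaurus.forAEC.com actually stores its mappings is not described in the text.

```python
from collections import defaultdict


class Crosswalk:
    """Symmetric links between (system, code) pairs from different classifications."""

    def __init__(self):
        self._links = defaultdict(set)

    def add(self, a, b):
        """Record that entry a and entry b denote roughly the same concept."""
        self._links[a].add(b)
        self._links[b].add(a)

    def equivalents(self, start, system=None):
        """Follow links transitively from start; optionally keep one target system."""
        seen, stack = {start}, [start]
        while stack:
            for nxt in self._links.get(stack.pop(), ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        seen.discard(start)
        return sorted(e for e in seen if system is None or e[0] == system)


# Placeholder codes, invented for illustration only.
xw = Crosswalk()
xw.add(("Talo 90", "T-001"), ("Uniclass", "U-001"))
xw.add(("Uniclass", "U-001"), ("EPIC", "E-001"))
epic_matches = xw.equivalents(("Talo 90", "T-001"), system="EPIC")
```

In practice, mappings between classifications are rarely exact equivalences, so a real crosswalk would likely need typed links (broader, narrower, related) rather than the single link type assumed here.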
Feasibility Study for a Unified Semantic Infrastructure in the European Construction Sector (FunSIEC)⁴² plays both roles. It is a project that promotes the use of standards and studies the feasibility of building and maintaining an Open Semantic Infrastructure for the European Construction Sector (OSIECS) at a technical, organizational, and business level. The goal of the infrastructure is to enable interoperability between tools using different semantic resources by providing semantic mappings between these resources. The main deliverable is the OSIECS triad, composed of the OSIECS Kernel, the OSIECS Meta-model, and the OSIECS Model. The Kernel is a semi-automatic software tool that relies on human intervention to produce both the Meta-model and the Model, which are tables providing semantic mappings between semantic resources. Semantic resources mapped in FunSIEC include: the eConstruct bcBuildingDefinitions taxonomy, the e-COGNOS ontology, ISO 12006-3, and the IFC model. These resources are initially syntactically homogenized by conversion to OWL. Future plans include: addition of new semantic resources, including those in different languages, non-structured, semi-structured, and relational sources, as well as development of an Application Programming Interface to offer services to third party applications [Lima et al. 2005; Lima et al. 2005a].

6.3.2 AEC Research Projects

A number of projects in the 1990s investigated solutions to the problem of integrating structured construction project information by using model-based approaches, e.g. ATLAS [Bohms et al. 1994], COMBI [LAP and GCC 1995], COMBINE [Augenbroe 1993], ICON [Aouad et al. 1994], MOB [MOB 1994], RATAS [Bjoerk 1994], SPACE [Faraj and Alshawi 1999]. At the same time, another line of research explored the use of electronic document management (EDM) systems in the domain [Turk et al. 1994; Bjoerk et al. 1993; Lownertz 1998]. A third group of projects started looking at integration of document-based and model-based approaches.

⁴¹ Thesaurus for Architecture, Engineering, and Construction [http://thesaurus.foraec.com/]
⁴² Feasibility Study for a Unified Semantic Infrastructure in the European Construction Sector (FunSIEC) [http://www.funsiec.org/common index2.php?id linqua=l]

The DOCCIME project investigated the generation of construction documents using both project and document models [Rezgui and Debras 1996]. CONDOR aimed to provide a migration path from document-based to model-based approaches to information representation and structuring and to allow coexistence of different document management systems through the use of APIs, adaptors, and integration services. The proposed model used internal structure of text documents and CAD-drawing layers and linking of information elements to "product objects" (i.e. IFCs) [Racine and Cooper 1998]. However, the vision developed in these projects required a change of culture and reengineering of business processes in the industry.

GEN Intelligent Access Libraries (GENIAL) (1994-1999) was a key project within the GEN initiative, an open co-operation of industry and academia with the mission to provide a global electronic marketplace for users and suppliers of engineering products and services.
The objective of the project was to define an open architecture and to establish a Common Semantic Infrastructure that would enable enterprises from different sectors to combine internal and external knowledge. The solution was based on the indexing of content, including documents and information about products, companies and services, using multiple existing classifications that are mapped to each other and enriched by synonyms and keywords [GENIAL 1999].

ToCEE⁴³ (1996-1998) was concerned with data sharing between actors and life cycle stages in concurrent engineering. One of its deliverables was a document model that was related to product and process models [Amor and Clift 1997].

6.3.2.1 eConstruct

Electronic Business in the Building and Construction Industry: Preparing for the new Internet (eConstruct)⁴⁴ aimed to develop a communication technology to support e-commerce in the AEC/FM industries. The main deliverable is Building and Construction extensible Mark-up Language (bcXML), an XML vocabulary for structuring product catalogues and requirements messages, matching requirements to products and allowing import of product data into procurement systems. After a year of development, the project gave up its original goal to develop a single mega bcTaxonomy and decided to provide for development of multiple distributed bcXML compliant taxonomies instead [Tolman et al. 2001].
It provided a meta-schema and a building and construction specific taxonomy following the schema. The taxonomy, which can be used for developing catalogues and specification messages, is Lexicon, originally developed within the CONCUR project. However, the final report states that the completion of the taxonomy would take an additional 50 man-years and recognizes the need to standardize it via international standardization bodies [eConstruct 2002].

⁴³ ToCEE (1996-1998) [http://wwwcib.bau.tu-dresden.de/tocee/]
⁴⁴ Electronic Business in the Building and Construction Industry: Preparing for the new Internet (eConstruct) [http://www.econstruct.org/]

One of the organizational end-results of the project is the CEN/ISSS Workshop for eConstruction⁴⁵, which aims to be the platform for harmonization of all semantic electronic communication and information integration efforts in the European Construction Industry. The goal is the definition of five interrelated specifications (CWAs) covering framework, architecture, ontology, meta-schema, and software toolset for eConstruction. It is organized as a series of face-to-face and virtual meetings with major plenary sessions accompanied by the supporting eEurope pilot project Specifications for Integrated Construction E-standards (SPICE)⁴⁶. The group tasked with the development of the standardized ontology came to the conclusion that the unique ontology for the sector will never exist and that the use of the existing "semantic resources" should be promoted instead [Lima 2004].
6.3.2.2 ISTforCE

One of the deliverables of the European project Intelligent Service Tools for Concurrent Engineering (ISTforCE) was a knowledge-based Model Access Service (MAS) that allows engineers to interact with IFC product models via an ontology service that bridges the terminological gap between the IFC model specification and engineers [Gehre and Katranuschkov 2002]. Another deliverable was a process matrix [Wix and Liebich 2000], which was further refined and synchronized with other related efforts in the ICCI project [Wix and Katranuschkov 2002]. Based on input from industry organizations, and accompanied by information type classification, the matrix has the potential to serve as a starting point for defining project workflows, designing information systems, or undertaking organizational information audits.

6.3.2.3 DocMo service

DocMo service was developed to address the missing pieces necessary for knowledge management and organizational learning: contextualization of information to support integration of new knowledge with individual mental models; flexible adaptive presentation; information presentation and structuring to support learning, e.g. by providing topics, sequencing, and background information; personalization of knowledge repositories to support archiving of personal retrieval paths and organizing information according to individual mental models; and sensing user needs to allow for proactive provision of appropriate information relevant to the user's current task and situation.

⁴⁵ CEN/ISSS Workshop for eConstruction [http://www.nen.nl/wseconstruction/]
⁴⁶ Specifications for Integrated Construction E-standards (SPICE) [http://spice.bouw.tno.nl/]

The DocMo service uses a document repository implemented as a relational database with standardized metadata descriptions based on the IFC 2x IfcExternalReference schema. The IFC schema was extended to handle keywords as well as document fragments. The use of the schema allows integration with IFC-based process and workflow management systems, thus enabling both content- and context-providing services [Schapke et al. 2002].

6.3.2.4 OSMOS

Open System for Inter-Enterprise Information Management in Dynamic Virtual Environments (OSMOS)⁴⁷ (2000-2002) developed, among other deliverables, an Information Management Model that includes metadata, relationships between information resources, and classification (which practically includes metadata with enumeration values that can be used for grouping information resources) [Wilson et al. 2001].

6.3.2.5 e-Cognos

The goal of eCognos (Methodology, tools and architectures for electronic consistent knowledge management across projects and between enterprises in the construction domain)⁴⁸ (2001-2003) was to develop a Web services-based and ontology-enabled Knowledge Management infrastructure (e-CKMI) and a set of tools to enable consistent knowledge management within collaborative construction environments.
The project involved development of the eCognos ontology, made available via the eCognos ontology server eCOS⁴⁹. The ontology, expressed in DAML+OIL, uses IFCs as a backbone but keeps them independent in order to provide for changes in IFCs. It has a taxonomy (including only "isa" relationships) as a cornerstone but also includes equivalent terms collected from users and additional relationships, some based on IFC relationship types and some suggested by end users. The ontology can evolve: end users submit ontology suggestions to an Ontology Manager, who can either reject them or add them to the ontology. A "knowledge item", which can be a document, drawing, personal expertise, organizational experience etc., is represented as a collection of ontological concepts that define it. This model serves as a basis for services that allow creation, capturing, indexing, retrieval, and dissemination of knowledge, using adaptive mechanisms for organizing knowledge items and personalized information delivery [Wetherill 2002].

⁴⁷ Open System for Inter-Enterprise Information Management in Dynamic Virtual Environments (OSMOS) 2000-2002 [http://cic.vtt.fi/projects/osmos/]
⁴⁸ eCognos (Methodology, tools and architectures for electronic consistent knowledge management across projects and between enterprises in the construction domain) (2001-2003) [http://www.e-cognos.org/default.html]
⁴⁹ eCOS [http://www.e-cognos.org/Downloads/Publications/eSMART2002-IFC-Ontology.pdf]
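The representation of a "knowledge item" as a collection of ontological concepts, organized around an "isa" taxonomy, can be sketched as follows. This is a hedged illustration, not the eCognos implementation: the class names, the concept names (FrenchWindow, Window, BuildingElement, Door), and the item identifiers are all hypothetical, and the taxonomy is assumed to be acyclic with a single parent per concept for brevity.

```python
class Taxonomy:
    """A minimal 'isa' hierarchy with one parent per concept (a simplification)."""

    def __init__(self):
        self._parent = {}

    def add_isa(self, child, parent):
        self._parent[child] = parent

    def with_ancestors(self, concept):
        """Return the concept followed by all of its ancestors, nearest first."""
        chain = [concept]
        while chain[-1] in self._parent:
            chain.append(self._parent[chain[-1]])
        return chain


class KnowledgeIndex:
    """Knowledge items described by sets of concepts, retrievable by concept."""

    def __init__(self, taxonomy):
        self._taxonomy = taxonomy
        self._items = {}

    def add_item(self, item_id, concepts):
        self._items[item_id] = set(concepts)

    def find(self, concept):
        """Items tagged with the concept or with any of its descendants."""
        return sorted(
            item_id
            for item_id, concepts in self._items.items()
            if any(concept in self._taxonomy.with_ancestors(c) for c in concepts)
        )


# Hypothetical concepts and items, for illustration only.
tax = Taxonomy()
tax.add_isa("FrenchWindow", "Window")
tax.add_isa("Window", "BuildingElement")
items = KnowledgeIndex(tax)
items.add_item("drawing-A12", {"FrenchWindow"})
items.add_item("spec-07", {"Window"})
items.add_item("memo-3", {"Door"})
window_items = items.find("Window")
```

A query for a broader concept also returns items indexed under narrower ones, which is the basic retrieval benefit of indexing against a taxonomy rather than against free-text keywords.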
A research stream at the Dresden University of Technology Institute for Construction Informatics has been focusing on context-sensitive information management in mobile networked environments. Agent Enabled Environment for Mobile e-Work on Construction Site (AMECS) investigated the use of intelligent information gathering and retrieval agents for automatic, context-aware information retrieval. The context recognition and sensitivity framework is based on the knowledge captured in Context Models, Context Patterns and Rules. User interface agents use that knowledge and the information gathered by device sensors for context recognition and context-sensitive presentation [Scherer et al. 2002].

In more recent work, context is defined as a set of aspects of a 'work scenario', which is defined as: "Within one specific WORK SCENARIO some ACTOR is using a specific IT INFRASTRUCTURE to obtain, enter, view or modify INFORMATION that he/she requires to successfully accomplish his/her ACTIVITY at a specific LOCATION and TIME under specific ENVIRONMENTAL CONDITIONS." [Menzel et al. 2004] These context aspects are used as dimension hierarchies of a data warehouse star schema. The so-called Process Pattern Libraries (PPL), consisting of concept, structures, life cycle, ICT, and facilities parts and including homogeneous process descriptions, are used along with the three-layer pattern-management framework, consisting of strategic, tactical, and operational layers and intended for managing different levels of granularity, for the setup of new virtual project organizations [ibid.].
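The use of work-scenario aspects as dimensions of a star schema can be sketched in a few lines. The sketch assumes flat dimensions with surrogate integer keys; the cited work speaks of dimension hierarchies, which are not modelled here. The class names and all sample values are hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class WorkScenario:
    """One observed context; field names follow the aspects in the quoted definition."""
    actor: str
    it_infrastructure: str
    information: str
    activity: str
    location: str
    time: str
    environmental_conditions: str


class StarSchema:
    """One dimension table per context aspect plus a fact table of key tuples."""

    def __init__(self):
        self.dimensions = {f.name: {} for f in fields(WorkScenario)}
        self.facts = []

    def record(self, scenario):
        """Store a scenario as a fact row of surrogate keys, growing dimensions as needed."""
        row = {}
        for f in fields(scenario):
            table = self.dimensions[f.name]
            value = getattr(scenario, f.name)
            if value not in table:
                table[value] = len(table) + 1
            row[f.name] = table[value]
        self.facts.append(row)
        return row


# Hypothetical sample values, for illustration only.
schema = StarSchema()
first = schema.record(WorkScenario(
    "site engineer", "handheld device", "floor plan", "inspection",
    "level 2", "morning", "rain"))
second = schema.record(WorkScenario(
    "site engineer", "handheld device", "floor plan", "inspection",
    "level 3", "morning", "rain"))
```

Because each aspect is a separate dimension, facts can be grouped or filtered by any single aspect (for example, all scenarios at one location) without touching the others.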
The interest in the use of the Semantic Web in the domain is surprisingly late compared to other domains. However, related early-stage research at multiple research centers, including Technical University Delft [van Rees 2004; van Rees 2004a], the Aalborg University Building Informatics group [Lai et al. 2003; Lai 2004; Christiansson 2003; Lai et al. 2002], the Loughborough University Centre for Innovative Construction Engineering (CICE) [Zeeshan et al. 2004; Anumba et al. 2003; Anumba et al. 2004], and the University of Toronto [El-Diraby 2004], along with the recognition of the Semantic Web as one of the key technologies by the CEN/ISSS Workshop for eConstruction, indicates that research and development in this area is likely to flourish in the near future. Semantic Grid research is at an even earlier stage in the domain [Anumba et al. 2004a; Turk et al. 2004]; however, it can be expected that it will gain more breadth and depth.

7 Proposed Approach

After setting forth the requirements that the proposed solution needs to fulfill and the basic high-level principles followed in its development, this section identifies shared elements of semantic resources and patterns of their arrangements, illustrates and validates these findings by reducing generalized and specific semantic resources to the common framework, and provides an overview of the proposed system, its functionality, and architecture.

As stated in Section 2.9, this work aims to enhance information use and management in the AEC/FM domain while accepting and supporting its complexity.
The search for the solution starts with the premise that a prerequisite for achieving this goal is efficient communication between very diverse parties, including both humans and machines, within a complex system. The discussion in Section 4 revealed that such communication necessitates reconciliation of different conceptualizations along with terminological transparency and contextualization. The review of existing semantic resources in Section 5 and Section 6.3.1 showed that they are equally diverse and intricately interrelated, making a complex system of their own, and that a solution should not be sought in a single, new and better semantic structure, but rather in enabling participants, including both individuals and participating systems, using the existing structures to communicate.

Joe wants to add a French window to the building he is designing. He needs to find a product to install, examples of French windows in architectural literature, relevant building sense requirements. His Web search retrieves pages about the French version of the Windows operating system. His search on First Source and Sweets retrieves no hits as, in the former, related records are classified under "Stile and Rail Wood Doors", while in the latter they are called "French doors". Different databases index related articles with "casement doors", "door windows." Not only different names are used but different conceptualizations of the domain. In Talo, they are classified under Structure completion products > Windows > Special windows; in CPV, under Wood, wood products, cork products, basketware and wickerwork > Builders' wood joinery and carpentry > Builders' wood joinery > Wooden windows.
And he is looking for one made out of metal. Is he going to miss some building sense requirements if he looks up under "windows"? Is he going to select IfcWindow or IfcDoor in his drawings? Include it in the door or window schedule? It may not make much difference if he sends the documents to Jane, but to what extent can such differences escalate if a number of computer systems are used between the two of them, some looking at literal strings and others using ontologies, taxonomies, schemas, rules, and mappings built by Peter, Paul, and Mary?

In developing the solution to the problem, it is necessary to define a set of requirements that the solution needs to fulfil and against which its success can be validated and measured. The sections that follow will outline requirements that the proposed solution should meet and high-level requirements, which are used as guiding principles in its development.

7.1 Requirements

Domain-specific studies focusing on information-related activities and behaviour were found to be collectively insufficient, in terms of both coverage and level of detail, to provide a reasonably complete set of requirements for an AEC/FM information-management framework and system. Specifically, most of these studies focus on information behaviour of architects [e.g. Mackinder and Marvin 1982; Goodey and Matthew 1971; Lera et al. 1984; Newland et al. 1981; Newland 1990] or engineers [e.g. Anderson et al. 2001; Cool and Xie 2000; Leckie et al. 1996; Mueller et al. 2005; Tenopir and King 2004], the latter group not even being specific to engineers working in the AEC/FM domain. Shaaban et al. [2003] analyzed behaviour of different AEC professionals but only when using online technical information resources.

Therefore, the requirements were derived from sources analyzing non-domain specific activities related to the creation, seeking, use, and management of different types of information, all of which are used within the domain: databases, documents, Web content, software applications. It is important to note that the activities identified in that process are different from work tasks; these are instrumental information-related activities that are necessary for completion of a much broader range of work tasks and will certainly be strongly influenced by work-task context in practice [e.g. Belkin et al. 1982; Ingwersen 1992; Bystrom and Jarvelin 1995; Jarvelin and Ingwersen 2004]. Due to the general purpose of the proposed solution, it would be practically impossible to define a reasonably complete list of work tasks to be used for defining the system requirements.

Another source used for identifying the requirements includes functions of different semantic resources. As the general purpose of these resources is to support information-related activities, it is reasonable to assume that their functions can be a good indicator of related requirements. The basic functionality of any database or persistence layer in a software system includes the ability to: (1) create, (2) retrieve, (3) update, and (4) delete (CRUD) its content. As the amount and heterogeneity of content increase, additional functionality is required to allow and enhance its management, retrieval, integration, and use.
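The four CRUD operations, together with one example of the additional functionality that descriptive attributes make possible, can be sketched as a small in-memory store. This is an illustrative sketch only: the class name ResourceStore, its method names, and the sample attribute values are assumptions, not part of the proposed system.

```python
import itertools


class ResourceStore:
    """In-memory store offering create, retrieve, update, delete, and attribute search."""

    def __init__(self):
        self._records = {}
        self._ids = itertools.count(1)

    def create(self, content, **metadata):
        rid = next(self._ids)
        self._records[rid] = {"content": content, "metadata": dict(metadata)}
        return rid

    def retrieve(self, rid):
        return self._records[rid]

    def update(self, rid, content=None, **metadata):
        record = self._records[rid]
        if content is not None:
            record["content"] = content
        record["metadata"].update(metadata)

    def delete(self, rid):
        del self._records[rid]

    def find(self, **criteria):
        """Beyond CRUD: select records whose descriptive attributes match all criteria."""
        return sorted(
            rid for rid, record in self._records.items()
            if all(record["metadata"].get(k) == v for k, v in criteria.items())
        )


# Hypothetical sample data, for illustration only.
store = ResourceStore()
spec = store.create("window specification", kind="document", phase="design")
plan = store.create("floor plan", kind="drawing", phase="design")
store.update(spec, phase="construction")
design_items = store.find(phase="design")
store.delete(plan)
```

The first four methods are sufficient while content is small and uniform; the find method hints at why heterogeneous content calls for descriptions of the content in addition to the content itself.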
A s m u c h of that addi t iona l funct ional i ty is ach ieved by in t roduc ing metada ta [Hunter and S p r i n g m e y e r 1994 ] , fur ther funct ional requ i rements were sough t in sources - 94 -ana lyz ing me tada ta funct ions. [S t rebe l , M e e s o n , and Fr i thesen 1994] propose that the three ma in funct ions of me tada ta are to enab le and e n h a n c e : (1) data m a n a g e m e n t , (2) da ta access , and (3) da ta ana lys is . Burnet te et a l . [1999] sugges t that the funct ions of me tada ta can be d iscussed at the s y s t e m level and the e n d - u s e r leve l . A t the s y s t e m leve l , me tada ta can be used to (1) faci l i tate in teroperabi l i ty and (2) the abi l i ty to share a m o n g resource d iscovery too ls , whi le at the e n d - u s e r leve l , it can suppor t the abi l i ty to de te rm ine : (I) what da ta is ava i lab le ; (II) whe the r it mee ts speci f ic needs ; (III) how to acqu i re it; and (IV) how to t ransfer it to a local s y s t e m . The bas ic funct ions of me tada ta used by Da tabase Des ign So lu t i ons ' for thei r t r a d e m a r k e d 5-Quest ion™ Methodo logy inc lude: (1) ident i fy da ta (What da ta do we have? ) , (2) def ine da ta (What does it m e a n ? ) , (3) locate data (Where is i t?) , (4) source da ta (How did it get t he re? ) , and (5) access da ta (How do I get it? — go get it for me ! ) [Tannenbaum 2 0 0 1 , 18 , 9 3 - 4 ] . A s ment ioned in Sec t ion 5 . 1 . 1 . the l ibrary c o m m u n i t y has been us ing metada ta and o ther resources pr imar i ly to improve access to in fo rmat ion . The requ i rements inc lude the funct ions of in format ion descr ip t ions l isted in Sec t ion 2 . 5 . as wel l as the funct ions of me tada ta ident i f ied by Lagoze et a l . 
[1996]: (1) resource discovery/information retrieval, (2) resource management, (3) resource usage, (4) resource use by appropriate audiences, (5) resource authentication and other provenance-related activities, (6) resource linking with related resources, and (7) resource hardware and software needs. These functions are mapped to the different classifications of metadata discussed in Section 5.1.1 in [Greenberg 2005].

According to ANSI/NISO Z39.19-2005, four principal purposes served by a thesaurus are: "(a) Translation. To provide a means for translating the natural language of authors, indexers, and users into a controlled vocabulary used for indexing and retrieval. (b) Consistency. To promote consistency in the assignment of index terms. (c) Indication of Relationships. To indicate semantic relationships among terms. (d) Retrieval. To serve as a searching aid in retrieval of documents." [ANSI/NISO Z39.19-2005, 1]

Another source used for identifying requirements is the research related to activities involved in information seeking. Ellis [1989] recognizes the following "features" in information-seeking behaviours: (1) starting, (2) chaining, (3) browsing, (4) differentiating, (5) monitoring, and (6) extracting. Marchionini [1995] distinguished information-seeking activities at four levels of granularity: patterns, strategies, tactics, and moves. Two basic strategies are (1) browsing and (2) analytical searching. Moves include basic specific actions such as: (a) print, (b) download, (c) follow link, (d) order results, and so on.
Examples of tactics include: (I) search widening, (II) search narrowing, (III) search refinement, and activities identified in Bates' earlier work [1979; 1989], such as (1) different berrypicking techniques: (a) footnote chasing or "backward chaining", (b) citation searching or "forward chaining", (c) journal run, (d) area scanning, (e) subject searches in bibliographies and abstracting and indexing, and (f) author searching. More recently, Bates [2002] identified four different modes of information seeking based on the level of user effort (active vs. passive) and clarity of their information need (directed vs. undirected): (a) searching, (b) monitoring, (c) browsing, and (d) "being aware".

Additional requirements were derived from more recent research focusing on user tasks in the World Wide Web environment. Shneiderman [1997] lists basic Web tasks as follows: (1) specific fact-finding (known item search), (2) extended fact-finding, (3) open-ended browsing, and (4) exploration of availability. In their taxonomy of Web activities, Morrison et al. [2001] distinguish Web search activities based on the purpose of the search, the method used, and the content sought. Based on purpose, they distinguish the following types of activities: (1) find: (a) download information, (b) get a fact, (c) get a document, (d) find out about a product; (2) compare/choose; and (3) understand. Basic activities based on method are: (I) explore, (II) monitor, (III) find, and (IV) collect. Broder [2002] distinguished three types of Web search tasks: (1) navigational, (2) informational, and (3) transactional, while Byrne et al.
[1999] developed a taxonomy of tasks in which users engage while browsing the Web, grouped under the following categories: (1) use information, (2) locate on page, (3) go to page, (4) provide information, (5) configure browser, (6) react to environment.

Bjork's [1995] key requirements for building product data models were also included: (1) be capable of modelling all possible types of information used for building descriptions, (2) cover the information needs common to applications in all phases and subdisciplines of the design and construction process, (3) be non-redundant in its definition of information items, (4) be capable of supporting alternative presentation formats, (5) be independent of restrictions on permissible information structures posed by the limitations of application software.

Finally, the requirements list was complemented by the previously cited Davenport and Prusak's list of human activities that transform information into knowledge: (1) comparison, (2) consequences, (3) connections, and (4) conversation [Davenport and Prusak 1998].

The analysis, merging (which included identification of equivalents and bringing to a manageable level of granularity), and cleanup (stripping activities related to the functionality of the system, rather than the framework, e.g. print, copy and paste) of these sources resulted in the requirements listed in Table 2 below. Each requirement in the list refers to the sources from which it has been derived. It should be noted, however, that neither the sources used for this purpose nor the resulting requirements are exhaustive.

REQUIREMENT | SOURCE
R1 Add/modify/delete information | [Broder 2002 (3); Byrne et al. 1999 (4)]
R1.1 Add new information | [CRUD (1)]
R1.2 Annotate information | [Byrne et al. 1999 (6); Davenport and Prusak 1998 (4)]
R1.3 Relate newly added information to existing information | [Lagoze et al. 1996 (6)]
R1.4 Modify existing information | [CRUD (3); Broder 2002 (3); Byrne et al. 1999 (6)]
R1.5 Delete information | [CRUD (4); Broder 2002 (3); Byrne et al. 1999 (6)]
R2 Access information | [CRUD (2); Strebel et al. 1994 (2); Lagoze et al. 1996 (1); Cutter 1904 (1); IFLA 1998 (1); Morrison et al. 2001 (1)]
R2.1 <by method> |
R2.1.1 Browsing | [Ellis 1989 (3); Marchionini 1995 (1); Bates 2002 (c); Shneiderman 1997 (3); Morrison et al. 2001 (I)]
R2.1.1.1 Get overview of all information available in a certain scope | [Burnette et al. 1999 (I); Tannenbaum 2001 (1); Cutter 1904 (2); Shneiderman 1997 (4)]
R2.1.2 Searching | [ANSI/NISO Z39.19-2005 (d); Marchionini 1995 (2); Bates 2002 (a); Morrison et al. 2001 (III)]
R2.1.3 Monitoring | [Ellis 1989 (5); Bates 2002 (b); Morrison et al. 2001 (II)]
R2.1.4 "Being aware" | [Bates 2002 (d)]
R2.1.5 Automated resource discovery | [Burnette et al. 1999 (2)]
R2.2 <by target> |
R2.2.1 Known information resource | [Shneiderman 1997 (1); Broder 2002 (1)]
R2.2.2 Information about a specific subject | [Bates 1989 (e); Shneiderman 1997 (2); Morrison et al. 2001 (1c); Broder 2002 (2); Svenonius 2000]
R2.2.3 Information about related subjects | [ANSI/NISO Z39.19-2005 (c); Marchionini 1995 (I, II, III); Davenport and Prusak 1998 (3)]
R2.2.4 Information relevant to a specific context (any combination of subject, task, person, time, space) | [Bates 1989 (c, d, f); Davenport and Prusak 1998 (1, 2)]
R2.2.5 Fact or answer to a question | [Shneiderman 1997 (1); Morrison et al. 2001 (1b); Byrne et al. 1999 (2)]
R2.2.6 Information resource with a particular property (e.g. content type, source, author, date) | [Burnette et al. 1999 (2)]
R2.2.7 Information resource related to other information resource (citations, structure, hyperlinks) | [Ellis 1989 (2); Bates 1989 (a, b); Byrne et al. 1999 (3); Davenport and Prusak 1998 (4); Lagoze et al. 1996 (6)]
R2.3 <regardless of> |
R2.3.1 terminology used | [ANSI/NISO Z39.19-2005 (a)]
R2.3.2 information source/system | [Bjork 1995 (5)]
R2.3.3 level of granularity | [Bjork 1995 (2)]
R3 Use information | [Lagoze et al. 1996 (3); Byrne et al. 1999 (1)]
R3.1 Examine information about information items | [Burnette et al. 1999 (II, III, IV); Tannenbaum 2001 (2, 3, 4); Lagoze et al. 1996 (5); Cutter 1904 (3); IFLA 1998 (2)]
R3.2 Analyze | [Strebel et al. 1994 (3); Morrison et al. 2001 (3)]
R3.2.1 compare | [Ellis 1989 (4); IFLA 1998 (3); Morrison et al. 2001 (2)]
R3.2.2 match information based on different criteria | [Svenonius 2000]
R3.2.3 group information based on different criteria | [Morrison et al. 2001 (IV)]
R3.2.4 sort information based on different criteria | [Marchionini 1995 (d)]
R3.3 Get information resources of interest | [Tannenbaum 2001 (5); IFLA 1998 (1); Marchionini 1995 (b); Morrison et al. 2001 (1a, 1c); Broder 2002 (1)]
R3.4 Represent located information in a way suitable for a particular context (task, person, device etc.) | [Bjork 1995 (4)]
R3.5 Extract information of interest | [Ellis 1989 (6)]
R4 Manage information | [Strebel et al. 1994 (1); Lagoze et al. 1996 (2)]
R4.1 Make information findable | [Lagoze et al. 1996 (1); Cutter 1904 (1); IFLA 1998 (1)]
R4.2 Provide information to appropriate audiences | [Lagoze et al. 1996 (4)]
R4.3 Create and manage workflow | [Strebel et al. 1994 (1); Lagoze et al. 1996 (2)]
R4.4 Implement record management policies | [Strebel et al. 1994 (1); Lagoze et al. 1996 (2, 5); Gilliland 2000]
R4.5 Match information to elements of context | [Strebel et al. 1994 (1); Lagoze et al. 1996 (5)]
R4.6 Enable preservation of information | [Gilliland 2000; Lagoze et al. 1996 (7)]
R4.7 Share and exchange information | [Burnette et al. 1999 (1); Lagoze et al. 1996 (7)]

Table 2: Requirements for the solution.

7.2 General Principles

The intention of this work is not to completely automate information flow in the domain but to support it, using machines for those tasks in which they are superior to humans, such as data storage and processing, and letting humans do what they are better at: thinking, creating, inventing, and teaching computers how to do for them what they do not enjoy doing or do not have time to do. The high-level requirements for the solution coincide with the goals set out by Berners-Lee for the Semantic Web. The solution strives for: "a place where the whim of a human being and the reasoning of a machine coexist in an ideal, powerful mixture" [Berners-Lee 1999, 158] and "processes in which the people do the creative work and the machine does the administration." [ibid, 172] It envisions an information environment "to represent and support the web of life, [...] enable us to operate in different ways with different groups of different sizes and scopes at different places every day[...]
It must also transcend levels, because creative people are always crossing boundaries. That is how people solve problems and innovate." [ibid, 164] That environment has to allow capturing of "half-baked ideas" and "feedback from the people who've made new intuitive links." After all, what brings innovation, invention, and creativity is nothing else but the connection of remote, independently developed half-baked ideas. It has to provide "a democratic and ecological knowledge space where different concepts and definitions can co-exist. One that survives is the one that is most referenced" [Arumugam et al. 2002].

The solution subscribes to most of the principles set for the Semantic Web. These principles, gathered from multiple sources [e.g. Miller 2001; Koivunen and Miller 2001], are paraphrased below:

• Partial information is tolerated. Anybody can say anything and no one should expect global consistency of all information.
• There is no need for absolute truth. Trustworthiness is evaluated by agents (applications or humans) that process the information and can decide what to trust based on the context of the assertions, e.g. who said what and when.
• Evolution needs to be sustained, including support for effective combination of the independent work of diverse communities with the ability to add new information without insisting that the old be modified and to resolve ambiguities and clarify inconsistencies. Descriptive conventions to be used must be able to expand as human understanding expands.
• Minimalist design: "Make the simple things simple, and the complex things possible" [Attributed to Alan Kay talking on Smalltalk. (Exact reference unknown.)], enable simple applications now that plan for future complexity, and standardize no more than is necessary.

The only exception is the first principle of the Semantic Web, which states that everything identifiable is on the Semantic Web. This framework requires neither that everything existing or thinkable is represented in the system nor that everything that is represented is unambiguously identified.

The solution also subscribes to the design principles for the World Wide Web, namely: simplicity, modularity, decentralization, tolerance, least power, and test of independent invention [Berners-Lee 1998]. It will strive to capitalize on the benefits of networked environments and use loose coupling to support changing needs.

7.3 Semiotic Infrastructure

One of the stated goals of this work is to develop a framework that allows coexistence of different approaches used for managing information and its flow in the AEC/FM domain. The framework should enable a coherent system for discovering, accessing, directing, representing, and otherwise manipulating information in any form and at any level of granularity: a system in which all different semantic resources can be at least correlated and made available to and usable by alternative approaches to information management. This framework consists of common elements of semantic resources, mechanisms for relating and combining these elements, and a high-level architecture of a system that uses and complements the resulting "semiotic infrastructure".
As detailed in Section 5, semantic resources are used to represent all of the following: phenomena of very different kinds (or, more precisely, their mental representations and generalizations), their associations with signifiers and information, and their relationships to other signifieds. Information is intricately related to the "rest of the world", i.e. to everything other than information that exists, has existed, or could exist in the real world or in the human mind. The relevant "rest of the world" can be either the subject of information (that which the information is about) or the context in which it is generated and used. Looking from the information management perspective, it is possible to distinguish three interrelated worlds: information realm, subject realm, and context realm (Figure 11).

The boundaries between these three realms are not always crisp. Certain information can have other information as its subject, for example, and some signifieds can belong to both subject and context realms. The blurriness of the boundaries varies across domains. In astrophysics, for example, subject, context, and information can be easily separated, while in publishing, the three worlds have substantial overlaps and can be quite difficult to tell apart. In the AEC/FM domain, it is fairly easy to distinguish a portion of the subject world, the one involving buildings and equipment, from the remaining two realms. However, as all processes in the domain are information-heavy and information-dependent, the other part of the subject realm, the one involving processes, people, and documentation, has big overlaps with information and context realms.
When the domain gets further specialized to AEC/FM information management, the distinctions get very difficult to make.

Figure 11. Three inter-related worlds.

However, as detailed in Section 4, what is used in communication and what exists in the information realm are not the actual entities that comprise these three worlds, but some signifiers (words, strings, sounds, pictures) that stand for them and can have many-to-many relationships to real-world entities. Furthermore, signifiers and real-world entities are related not directly but through a certain mental representation of the entity (Figure 6), which can differ from person to person and system to system and over time. Thus, in addition to the three inter-related worlds, semantics also spans three different layers (Figure 12).

What different semantic resources are capturing is entities and connections within and between the three worlds and the three layers. Different semantic resources capture different subsets of these connections, and do so in different ways and for different purposes. In order to make the resources work together, it is necessary to homogenize and correlate all these different connections that they are capturing.
Figure 12. Three layers of semantics.

The framework should enable synchronization of all these different representations and be infinitely extensible in more than one direction: horizontally, to accommodate representation of different types of signifieds and provide for type-agnosticism where needed; vertically, to allow uniform handling of different levels of specificity/abstraction, different notions of instances, and different "meta levels"; and in yet another dimension, to provide for terminological and, more generally, representational transparency. At the same time, the framework needs to allow filtering and rearrangement of representations, thus allowing full manageability of different views in terms of scope, focus, level of abstraction, organization, and representation.

Such a general framework must be utterly high-level and minimalistic: very simple, comprised of the fewest and barest essentials, but allowing infinitely complex and diverse structures and applications. This requirement calls to mind a parallel with the concept of fractals. Fractals are geometric objects with an infinite amount of detail, irregular at any scale, and not easily explained by classical geometry. Examples of fractals include very complex natural shapes, such as mountains, trees, or clouds. Fractals have some infinitely repeating pattern and can be generated by the infinite iteration of a certain rule: by repeatedly substituting certain geometric shapes with other shapes (i.e. IFS fractals), by repeatedly applying geometric transformations (e.g. rotation, translation, or reflection) to points, by repeating one or more mathematical formulas, or by stochastic processes.

Fractals are self-similar at any level of detail, but the degree and type of self-similarity can vary. Perfect or exact self-similarity means that the fractal appears identical at any scale. Approximately or quasi self-similar fractals contain copies of themselves across scales but in distorted form. Brownian or statistical self-similarity is the weakest: only numerical or statistical measures are preserved at different scales.

Figure 13 shows an example of the most perfect type of fractal. IFS (Iterated Function System) fractals are geometric shapes that can be repeatedly subdivided into smaller parts, each of which is a smaller copy of the whole. They are perfectly self-similar at any level of detail and are generated by recursion, starting from an utterly simple shape called the initiator and an equally simple rule using another simple shape called the generator.

Figure 13. Example of an IFS fractal: Koch's snowflake. (Panels: initiator or iteration 0; generator; 1, 2, 3, and 4 iterations. Rule: replace each straight line of the initiator by the generator.)

Fractals have been used as an inspiration in this work, which tries to identify some primitives that are common to different semantic resources and a set of more or less strong rules that can be used to generate very complex and diverse varieties thereof. While the use of recursion by itself would provide for vertical extensibility, the extensibility in other directions would necessitate identification of very simple common elements to be used as initiators and generators.
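The rule stated for Koch's snowflake (replace each straight line of the initiator by the generator) can be sketched in a few lines of Python. This is only an illustration of the initiator/generator idea under stated assumptions: 2D points are represented as complex numbers, the initiator is a unit equilateral triangle listed counter-clockwise, and the function names `koch_step` and `koch_snowflake` are hypothetical.

```python
import cmath
from typing import List

Point = complex  # assumption: a 2D point is stored as x + y*1j


def koch_step(points: List[Point]) -> List[Point]:
    """Apply the generator once: each segment of the closed polyline
    is replaced by four segments, each one third of its length."""
    # Rotating by -60 degrees makes the bump point outward
    # when the polyline is listed counter-clockwise.
    rot = cmath.exp(-1j * cmath.pi / 3)
    out: List[Point] = []
    n = len(points)
    for i in range(n):
        a, b = points[i], points[(i + 1) % n]
        d = (b - a) / 3
        out.extend([a, a + d, a + d + d * rot, a + 2 * d])
    return out


def koch_snowflake(iterations: int) -> List[Point]:
    """Start from the initiator (a triangle) and iterate the rule."""
    initiator = [0 + 0j, 1 + 0j, 0.5 + (3 ** 0.5 / 2) * 1j]
    points = initiator
    for _ in range(iterations):
        points = koch_step(points)
    return points


if __name__ == "__main__":
    for k in range(5):
        print(k, "iterations:", len(koch_snowflake(k)), "segments")
```

The whole shape is determined by one simple initiator and one simple rule, while the number of segments grows by a factor of four with every iteration. This is the property the framework borrows as a metaphor: a small set of primitives and rules yielding arbitrarily complex structures.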
The following sections aim to identify common basic elements used in different semantic resources that can be used to relate them, to allow communication between systems using them, and to provide a shared "semiotic infrastructure" that can be used by a variety of efforts aiming to enhance information management in the AEC/FM domain. It is expected that such a shared semiotic infrastructure, supporting and relating a variety of efforts towards the same goal, would provide for flexibility, extensibility, correlation, reconciliation, customization, and contextualization, identified as generally missing in the existing systems serving the domain.

It is important to include some additional delimitations here. This work has no aspirations to apply complexity theory in its strict mathematical sense to information management in the domain. This theory has been used to understand the problem and as a source of ideas for its solution. IFS fractals are only one, ideal type of fractals. Fractals appearing in nature are never perfectly self-similar, nor are they generated by deterministic processes. They always involve a certain randomness that characterizes Brownian self-similarity. Following this metaphor, it is not expected that it is possible to develop a finite set of initiators, generators, and rules to be used for generating schemas that could capture the full complexity of semiotics. The only expectation is that it is possible to identify and relate simple elements that are common to all the diverse, but in some way similar, semantic resources identified in previous chapters.
7.3.1 Basic Concepts: Shared Factors

This section identifies elements that are shared across semantic resources, often under different names. Each element will be assigned a graphic symbol that will represent it in visual representations of the structure of specific semantic resources. As the intention of this framework is to relate systems using these resources and not to replace them, the goal will not be to reduce all elements of all semantic resources, but only those that are common and directly relevant to information management. The remaining elements will in most cases make a useful superstructure to the framework, but never its integral part. The shared elements will allow them to add value for the rest of the system.

In accordance with the topic of this work, the focus will not be on signifieds but on information and, for the same reason, the term "subject", as defined in topic maps [Garshol and Moore 2005], will be used hereafter instead of "signified" to emphasize this focus. This will also bring the terminology closer to the common usage, as this term has been used in related frameworks (RDF and Topic Maps) and in specific semantic resources (e.g. Lexicon, Dublin Core). The symbol used to represent subjects in this document will be a horizontal diamond (Figure 14).

Figure 14. Symbol for subject.

In the information realm, subjects per se do not exist (see footnote 50); they are present only through signifiers denoting them, information telling something about them, and through relationships between signifiers, between information items, and between signifiers and information items.
Therefore, these constructs and their associations with subjects need to form a basis of the framework. The concept of French windows lives in our minds, their specific instances in the real world. In computer systems, on the Internet, and in document archives, they are present only as words, symbols, and sounds standing for them; as documents, images, paragraphs, and tables telling something about them; and as relationships between these two, such as hyperlinks, references, proximity, table relationships etc.

Footnote 50: The exceptions are information and tokens when treated as subjects, as discussed further in the document.

Signifiers indicate mentions of a subject or occurrence of some information about it in unstructured information and in values and element names of structured information, including metadata. A signifier can be a word or phrase in any natural or artificial language or any other alphanumeric string, sound, or image: any representation of a subject that stands as its proxy in a particular context. To emphasize this role of a proxy, signifiers will hereafter be referred to as tokens. A token will be represented in this document by a circle (Figure 15) and its assignment to a subject, i.e. code (e.g. € stands for Euro, the currency of the European Economic and Monetary Union), as shown in Figure 16.

Figure 15. Symbol for token.

Figure 16. Code: association of a token with a subject.

In current everyday information environments, the overwhelming majority of tokens belong to the category of symbolic signifiers: arbitrary representations of a subject associated with it by convention.
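The notions of subject, token, and code introduced above can be sketched as a small data structure. This is a minimal illustration and not the thesis's formal model: the class names `Subject`, `Token`, and `CodeTable`, the string identifiers, and the choice to store codes as a many-to-many lookup (following the earlier remark that signifiers can have many-to-many relationships to what they stand for) are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass(frozen=True)
class Subject:
    """Anything one may want to say something about (hypothetical local id)."""
    sid: str


@dataclass(frozen=True)
class Token:
    """A signifier standing as a proxy for a subject in some context."""
    form: str


class CodeTable:
    """Stores codes, i.e. assignments of tokens to subjects, in both directions."""

    def __init__(self) -> None:
        self._subjects_by_token: Dict[Token, Set[Subject]] = {}
        self._tokens_by_subject: Dict[Subject, Set[Token]] = {}

    def assign(self, token: Token, subject: Subject) -> None:
        self._subjects_by_token.setdefault(token, set()).add(subject)
        self._tokens_by_subject.setdefault(subject, set()).add(token)

    def subjects_for(self, token: Token) -> Set[Subject]:
        # A copy is returned so callers cannot alter the table by accident.
        return set(self._subjects_by_token.get(token, set()))

    def tokens_for(self, subject: Subject) -> Set[Token]:
        return set(self._tokens_by_subject.get(subject, set()))


if __name__ == "__main__":
    code = CodeTable()
    euro = Subject("euro")
    code.assign(Token("€"), euro)    # the example code from the text
    code.assign(Token("EUR"), euro)  # a second token for the same subject
    print(sorted(t.form for t in code.tokens_for(euro)))
```

Keeping the lookup bidirectional reflects the two questions the framework has to answer: which subjects a token found in some content may stand for, and under which tokens a given subject may appear.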
However, iconic and indexical signifiers (Figure 5) will not be excluded from the discussion, in order to provide for pattern recognition and other existing or potential machine-intelligence techniques that may be more widely used in the future. On the other hand, as indexical signifiers are typically subjects themselves, they will often need to be treated as such and their signifying function captured through relationships between the two causally connected subjects.

There is often a need to provide some information about tokens. For example, a certain graphical symbol used in plan drawings may need to be named, related to the corresponding symbol used in section drawings, and its use explained in a document or a help file. In such cases, tokens can be treated as subjects, having tokens, information, and relationships of their own (Figure 17).

Figure 17. Reified token.

Tokens that, in addition to functioning as surrogates, provide information about a subject, such as long descriptive names, renderings, or photographs, can, in this context, be treated as information. For example, in the Industry Foundation Classes, IfcRepresentation and its constituent IfcRepresentationItems serve as proxies of IfcProduct, with which they are associated through its Representation attribute. However, they also provide geometric and topological information about that subject. Furthermore, there may be a need to say something about the IfcRepresentation, to state its type or to relate it to other representations or to specific documents, for example.
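The observation that one and the same element can act as a token, as information, and as a subject can be illustrated with a short sketch in which the role is derived from the element's links rather than from its type. The link kinds `stands_for` and `tells_about`, the function `roles_of`, and the element labels (such as `IfcRepresentation#1`) are hypothetical names invented for this example; they are not identifiers from the IFC specification or from the framework itself.

```python
from typing import List, Set, Tuple

# (source element, link kind, target element); all labels are illustrative only.
Link = Tuple[str, str, str]

EXAMPLE_LINKS: List[Link] = [
    # The representation serves as a proxy of the product: token role.
    ("IfcRepresentation#1", "stands_for", "IfcProduct#1"),
    # It also carries geometry describing the product: information role.
    ("IfcRepresentation#1", "tells_about", "IfcProduct#1"),
    # Something is stated about the representation itself: subject role.
    ("RepresentationType:Brep", "tells_about", "IfcRepresentation#1"),
]


def roles_of(element: str, links: List[Link]) -> Set[str]:
    """Derive the roles an element plays from its links to other elements."""
    roles: Set[str] = set()
    for source, kind, target in links:
        if source == element and kind == "stands_for":
            roles.add("token")
        if source == element and kind == "tells_about":
            roles.add("information")
        if target == element:
            roles.add("subject")
    return roles


if __name__ == "__main__":
    print(sorted(roles_of("IfcRepresentation#1", EXAMPLE_LINKS)))
    print(sorted(roles_of("IfcProduct#1", EXAMPLE_LINKS)))
```

Deriving roles from links, instead of fixing them per element type, is one possible way to let the same element be reified when needed without duplicating it.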
Therefore, a particular IfcRepresentation or IfcRepresentationItem may need to be represented in different contexts as a subject, as a token, or as information. This implies that the way something is represented in this framework will depend not only on what it is per se, but on its relationship to other elements.

As mentioned earlier, information exists in many different forms, structured and unstructured, in different formats, and on different media. In this document, the term "information item" will be used to mean not only information resources (see footnote 51), but also individual pieces of information, such as subject attributes and their values.

Footnote 51: Although the definition of "information resource" is very broad, the one for "information item" is even broader. An information item can be a simple value which cannot be interpreted as information by any agent, without any context, i.e. related elements in the framework.

In the example of a particular French window as a subject, its detailed drawing, a sentence stating that it is made out of metal and clear glass, a record in the windows schedule, and a single attribute-value pair "height: 180cm" will all be considered information items. This will allow handling of information about a subject in the same way regardless of its type and level of granularity (e.g. attribute-value pair, record, table, database, data warehouse). Association of an information item with a subject will be referred to as information. An information item will be represented by a hexagon (Figure 18) and information will be represented by its association with a subject symbol (Figure 19). In cases when something needs to be stated about an information item, i.e.
to provide metadata, the information item can also be considered and treated as a subject (Figure 20). For example, many e-mail messages, letters, and faxes may have a particular change order as their subject, or a particular drawing may be referenced from a number of other documents. Once again, relationships to other elements are as important as the nature of an entity. It is worth noting that, unlike other subjects, information items are directly present in the electronic space and are directly addressable.

Figure 18. Symbol for information item.
Figure 19. Information: subject associated with an information item telling something about it.
Figure 20. Reified information item.

In addition to the assignment of tokens and information items, the framework also needs to include relationships between subjects. Relationships are one of the elements that explicitly represent subjects in all structured information, but they are also implied in relations between tokens (e.g. co-occurrence and proximity), between information items (e.g. hyperlinks), and between tokens and information items (occurrences). Relationships will be represented as a line segment connecting two subject symbols (Figure 21). Sometimes, it is sufficient to represent employment, contract, or adjacency, for example, as a simple relationship between two subjects, but, in other cases, there may be a need to provide additional information about these relationships. Relationships can in such cases be reified, i.e. treated as subjects, having tokens, information, and relationships of their own (Figure 22). For example, information about Joe Doe's employment relationship to Cool Builders Inc.
can include that it started on 20/02/2002, is permanent, full-time, subject to clauses defined in Contract #2002 etc.

Figure 22. Reified relationship between two subjects.

Semantic resources, more precisely their underlying conceptual formalisms, differ in the way they represent relationships. Some simply state the existence of an unspecified relationship between two subjects; others use labelled, typed, or reified relationships. Many formalisms use directed relationships, and the relationship direction entails important semantics in all relationships that are not commutative (i.e. symmetrical). Yet another variation is the use of "roled" relationships, in which only participating subjects and their roles in the relationship are specified without specifying the relationship itself. For example, topic maps would represent a relationship between Company A and Company B as an association of two members with the roles of "defendant" and "plaintiff". The same relationship will in other formalisms be represented as a directed "sues" relationship, as a set of unspecified or named relationships between Company A, defendant and plaintiff classes, Company B, and the particular lawsuit, or as a reified relationship "lawsuit 1" with attribute values referencing Company A and Company B. The IFCs use objectified directed relationships distinguishing relating object and related object(s), with instantiable relationship sub-types providing more detailed definitions and specific constraints. Semantic resources intended for machine use need to distinguish relationships that are important for machine inference, e.g.
transitive relationships that indicate inheritance, or to capture reciprocity of specific relationships. The proposed framework needs the ability to homogenize relationships between subjects, captured in participating systems using all these different types of representation. Following the method used in defining this framework, this will be achieved by analyzing elements of these different representations and identifying those that are common and those that are reducible, but this can be done only after additional elements of the framework are introduced.

To summarize the foregoing discussion, a subject is represented in the information world by any set of associated tokens, information items conveying something about it, and relationships to other subjects (Figure 23). Hereafter these concepts will be collectively referred to as subject "properties". Any assignment of a property to a subject will be called an "assertion". In an assertion, the property has the role of a predicate, i.e. it is the part of the assertion other than its subject. Semantic resources make these associations explicit via three types of assertions that are in this document called: codes, information, and relationships. Values of these assertions are tokens, information items, or other subjects, respectively.

Figure 23. Representation of a subject and three basic types of assertions about it. [Diagram labels: Subject, Code, Information, Relationship.]

It is necessary to make a digression here and point to a few potential points of confusion. First, it can rightfully be argued that an assertion stating that a person's name is Joe Doe or that he is in an employment relationship to the company Cool Builders represents information about that person.
However, due to the lack of an appropriate term that would encompass all types of information items and their associations to subjects, and distinguish them from other types of assertions as defined above, the term "information" is used here in that narrower sense. To capture the broader meaning of that term, the one inclusive of all kinds of assertions, the term "content" will be used hereafter in the document. It can also be argued that any of the assertions are actually relationships or that information and relationships also represent a subject in some way (e.g. a database record) and can therefore be considered as its tokens. The facts that:
• all tokens, information items, and relationships can be subjects,
• all assertions can be seen as information or as relationships, and
• information and relationships can be seen as tokens,
can be quite confusing and may seem to indicate a weakness of the framework. The point is that relationships between these constructs are extremely intricate and the distinctions very subtle, which is why they are often confused in practice. One of the implied goals of this work is to point to this intricacy and try to shed some light on it. A big obstacle to this goal is the lack of precise and standardized terminology. In order to understand the framework, it is very important to keep the definitions provided within the document in mind while reading the text, as the choice of terms had in many cases to deviate from common usage.

Yet another common element encountered in most semantic resources is "type" or "class". The term "category" will be used in this work to emphasize the relative nature of this construct, as discussed in Section 4.
According to the broad definition of subject used in this work, a subject category is just a subject associated with other subjects and can be represented as such. However, as this type of relationship is the only one represented in some types of semantic resources and is clearly distinguished from all other relationship types in many others, this framework will adopt its special status. It will be visually represented here with a vertical line segment connecting an element to its category subject, topped by a horizontal bar crossing the category subject (Figure 24). Another reason for the decision to treat categories in a special way is that categories are commonly assigned not only to subjects but also to any of their properties (e.g. Figure 25) and are unavoidable in the explanation of this framework (as manifested in the number of occurrences of related terms in this document and the reference to "relationship type" in this very paragraph). This construct will be used to denote the relationship that is in different formalisms referred to as "subsumption", "specialization", "generic", "AKO", or "genus-species", but also the one known as "instantiation" or "IsA" relationship, i.e. all relationships that imply inheritance of properties. The third type of hierarchical relationships, used in thesauri and data modeling and known as "partitive", "aggregation", or "inclusion" relationships, will not be encompassed by this construct. It will be discussed in Section 7.3.2.

Figure 24. Category relationship.
Figure 25. Typed relationship: category assigned to an assertion.
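The constructs introduced so far (subjects, the three basic types of assertions, and categories assigned to subjects or to assertions) can be illustrated with a minimal Python sketch. It is not the pilot implementation: the names `Subject`, `Assertion`, `assert_property`, and `inherited_assertions` are assumptions made for this illustration, information items are reduced to plain strings, and inheritance of properties is simplified to a walk up the category links.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union

KINDS = ("code", "information", "relationship")  # the three basic assertion types


@dataclass(eq=False)  # eq=False keeps identity-based equality and hashing
class Subject:
    label: str  # human-readable label, used only in this sketch
    categories: List["Subject"] = field(default_factory=list)  # categories are subjects too
    assertions: List["Assertion"] = field(default_factory=list)


@dataclass(eq=False)
class Assertion:
    kind: str                           # one of KINDS
    value: Union[str, Subject]          # token, information item, or other subject
    category: Optional[Subject] = None  # optional category of the assertion itself


def assert_property(subject: Subject, kind: str, value, category=None) -> Assertion:
    """Assign a property to a subject and return the resulting assertion."""
    if kind not in KINDS:
        raise ValueError(f"unknown assertion kind: {kind}")
    assertion = Assertion(kind, value, category)
    subject.assertions.append(assertion)
    return assertion


def inherited_assertions(subject: Subject) -> List[Assertion]:
    """Collect the subject's own assertions and those of all its categories."""
    seen, stack, found = set(), [subject], []
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        found.extend(current.assertions)
        stack.extend(current.categories)
    return found


# Illustrative data only.
window = Subject("window")
french_window = Subject("French window", categories=[window])
w1 = Subject("a particular French window", categories=[french_window])
wall = Subject("a wall")
adjacency = Subject("adjacency")  # a category used to type a relationship

assert_property(french_window, "information", "made out of metal and clear glass")
assert_property(w1, "information", "height: 180cm")
typed = assert_property(w1, "relationship", wall, category=adjacency)
```

Because a category is itself a `Subject`, the same structure covers both subsumption-like and instantiation-like links, and `typed` shows a category assigned to an assertion rather than to a subject.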
In order to accommodate the relative nature of categories, the framework should provide for an important optional property of assertions: scope. According to Rota, the famous mathematician Stanislaw Ulam once said:

"[...] 'What makes you so sure that mathematical logic corresponds to the way we think? Logic formalizes only a very few of the processes by which we actually think. The time has come to enrich formal logic by adding to it some other fundamental notions. What is it that you see when you see? You see an object as a key, a man in a car as a passenger, some sheets of paper as a book. It is the word 'as' that must be mathematically formalized. ... Until you do that, you will not get very far with your AI problem.'" [Rota 1997, 59]

"As-ness" of assertions is important not only in artificial intelligence (AI), but also for contextualization and filtering of information in all types of applications. Category relationships, like all other types of assertions, are not necessarily universally valid or relevant. The same person can be a lead architect in project A, employee #1234 in Company X, employee of the year in 2002, a guitarist in band N, and called "Daddy" in family Doe. Scopes can be of very different natures and, just as categories, can be treated as subjects. They will be represented as a relationship of an assertion to a subject (Figure 26).

Figure 26. Scoped relationship: relationship of an assertion to a subject representing a scope.

Categories and scopes can be used to represent all different types of relationship conceptualizations. Figure 27 shows examples of different conceptualizations of the same relationship mentioned as examples earlier in this section. For simplicity, the relationships will be represented as potentially bidirectional typed and scoped relationships between two subjects (Figure 28). The bidirectionality will be indicated by a pair of category labels; where one value is missing, its existence can be implied. It is important to note that this does not mean that redundancy of conceptualizations will be discouraged, only that categories and scopes will be used to reduce their representations to the simplest possible form.

Figure 27. Different representations of the same relationship.
Figure 28. Representation of the same relationship in this document.

As already mentioned, subjects are present in the information realm only as tokens, related information, and their relationships. However, automated information management requires some way to refer to a particular subject and to ensure that what is referred to is that specific subject and nothing else. To properly process the statement "Le Corbusier designed Ronchamp.", a computer needs to be instructed that "Le Corbusier" refers to the famous architect Charles-Edouard Jeanneret (1887-1965) and not a type font or furniture company named after him, and that "Ronchamp" stands for the chapel of Notre Dame du Haut that is located in Ronchamp, a town in northeastern France, not the town itself.

The purpose of all tokens is to enable referring to what they stand for; however, many different tokens may be used to refer to the same subject (e.g. "French windows" and "French doors") and the same token (e.g. "French Windows") may be used to refer to many different subjects (e.g.
a category of building products and a version of an operating system). For that reason, all information management systems use some kind of identifiers to ensure unambiguous identification. In the environment on which this research is focusing, i.e. a complex system consisting of distributed, heterogeneous, and autonomous subsystems, it is very likely that the same subject will have more than one identifier. In each particular scope, it will have one token as a local identifier used for manipulation by local code that may or may not be usable for global identification, i.e. for interoperability with other systems. In many applications, there will also be a "preferred" local token intended for human use, potentially accompanied by a graphical representation used for its handling in the user interface. In the sample BLIS-XML file used in the pilot implementation, one of the IfcWindow elements has XMLID="i970", which is used to refer to it within that particular XML file (i.e. in its relationships with other entities/elements), GlobalId="a@aeOk/qfw=slF/,,=!", which is used to refer to it in any other file or system, and Label="Window #11", which is intended for human use and display in user interfaces. A number of mechanisms for unique global identification in distributed environments have been developed with slightly different purposes.
Globally Unique Identifiers (GUIDs) are used for machine-handling of electronic representations of subjects in integrated software applications and databases, Uniform Resource Identifiers (URIs) for identifying and addressing information resources accessible over the Internet, and published subjects for identifying any kind of subject, especially those that are not information resources. In practice, there is often confusion about what an identifier identifies: the subject itself, its particular electronic representation, or an information resource about it (e.g. [Berners-Lee 2003; Clark 2002]). Although in many cases this failure to distinguish does not cause any major damage, this framework will encourage a clear distinction, as in complex systems such mix-ups may have disproportionately large consequences, especial