UBC Theses and Dissertations

IfcXMLExplorer: a visualization tool for exploring and understanding ifcXML data. Duttachoudhury, Nayantara, 2015


IfcXMLExplorer: A Visualization Tool for Exploring and Understanding IfcXML Data

by

Nayantara Duttachoudhury

B.Tech. in Computer Science and Engineering, VIT University, 2013

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

Master of Science

in

THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES

(Computer Science)

The University of British Columbia

(Vancouver)

November 2015

© Nayantara Duttachoudhury, 2015

Abstract

XML is a markup language widely used for exchanging data across different applications. Its flexibility and simplicity have made it easy to use. However, this same flexibility makes large XML files difficult to comprehend. Most XML files have complex schemas, and these schemas differ across domains. In this work, we focus on one specific type of XML file: ifcXML. IfcXML files are domain-specific XML files generated from building information models (BIM). The organization of ifcXML files is hard to follow. Elements in an ifcXML file are identified by unique identifiers, which are used to connect one element to another. This results in long chains of connections. Currently there is no effective method of extracting and understanding these connections. The only way a user can see how one element is connected to another is by following the path of connections through the ifcXML file. We address this gap by introducing IfcXMLExplorer. IfcXMLExplorer is a visualization tool that helps users better understand the different systems in a BIM model, along with the connections between elements of each system, by extracting the necessary information from the ifcXML file.

Preface

This dissertation is an original intellectual product of the author, N. Duttachoudhury. The user study reported in Chapter 3 was covered by a UBC Ethics Certificate. The approval was given by the UBC Behavioural Research Ethics Board on June 23rd, 2015.
The project title is ‘Understanding of metadata’ and the certificate number is H14-02544.

Table of Contents
Abstract
Preface
Table of Contents
List of Tables
List of Figures
Acknowledgments
1 Introduction
1.1 Contributions
2 Background
2.1 BIM: building information models
2.2 IFC: industry foundation classes
2.2.1 IFC visualization
2.3 Specific type of XML files: ifcXML
2.3.1 Problems with ifcXML files
3 Case study: ifcXML data
3.1 Part 1: information retrieval from ifcXML files
3.1.1 XML database: BaseX
3.1.2 Querying and parsing ifcXML files
3.2 Part 2: IFC quality validation from ifcXML files
3.2.1 Validation questions: how and why were these chosen?
3.3 Part 3: IfcXMLExplorer: a visual interface for exploring and understanding ifcXML data
3.3.1 Overview
3.3.2 Problem description
3.3.3 Solution
3.3.4 Implementation
3.3.5 User study
3.3.6 Discussion and future work
3.3.7 Concluding example of IfcXMLExplorer
3.4 Case study conclusions
4 XML in other fields
4.1 Linguistic XML
4.2 Genomic visualization of XML data
4.3 XML data visualization software
5 Conclusion
Bibliography
A User Study Questions
B User Study Information Form

List of Tables
Table 3.1 Task and data abstraction: this table summarizes our analysis of what, why and how questions based on Munzner’s model [33].

List of Figures
Figure 2.1 Building Information Models in Autodesk Revit
Figure 2.2 Solibri
Figure 2.3 Hierarchical structure of ifcXML files [42]
Figure 2.4 Comparing 3D model with ifcXML representation [42]
Figure 2.5 Finding the length of a wall from an ifcXML file
Figure 3.1 Complete System of IfcXML Processing
Figure 3.2 Illustration of port connectivity
Figure 3.3 Illustration of hanging port
Figure 3.4 Element Connections
Figure 3.5 Space boundaries
Figure 3.6 Object that spans multiple stories
Figure 3.7 Trim operation resulting in a Null body
Figure 3.8 Object placements
Figure 3.9 Overlapping geometries
Figure 3.10 Duplicate types
Figure 3.11 Parametric vs Explicit geometries
Figure 3.12 Incomplete space modelling
Figure 3.13 Object floating in space
Figure 3.14 Opening that does not cut anything
Figure 3.15 Object in the wrong container
Figure 3.16 Screenshot of complete System
Figure 3.17 Structure of ifcXML Schema
Figure 3.18 Relational and Non-relational Elements
Figure 3.19 Different data instances connected through reference paths
Figure 3.20 Overview View
Figure 3.21 Search View
Figure 3.22 History View
Figure 3.23 Network View
Figure 3.24 Colour Gradient
Figure 3.25 User comparison between IfcXML and IfcXMLExplorer. Participants 1-6 are CS students. Participants 7-10 are AEC students
Figure 3.26 User comparison between IfcXML and IfcXMLExplorer. Participants 1-6 are CS students. Participants 7-10 are AEC students
Figure 3.27 User comparison between IfcXML and IfcXMLExplorer. Participants 1-6 are CS students. Participants 7-10 are AEC students
Figure 3.28 Comparison between time taken using IfcXMLExplorer and IfcXML file for both groups of participants
Figure 3.29 Schematic representation of the subsystem - Mechanical Supply Air 2.
Elements excluding ports are shown in rectangular boxes. Distribution ports are shown in diamond-shaped boxes.
Figure 3.30 Network view in IfcXMLExplorer which shows the entire subsystem Mechanical Supply Air 2
Figure 3.31 Connection of Flow Fitting i25739 to Port i142622
Figure 3.32 Connection of Port i142622 to Flow Fitting i25739
Figure 3.33 Connection of Port i142622 to Port i142613
Figure 3.34 Connection of Port i142613 to Port i142622
Figure 3.35 Connection of Port i142613 to Flow Segment i16931

Acknowledgments

I would like to thank my supervisor, Dr. Rachel Pottinger, for her guidance and inspiration. She has supported me in many ways by providing invaluable advice throughout my research and while writing my thesis. This work would not have been possible without her supervision. I would also like to thank Dr. Sheryl Staub-French and her students for their constant help and feedback.

In addition, I am grateful to Dr. Tamara Munzner for being my second reader and providing insight to make this thesis more complete. I am also thankful to all the members of the Data Management and Mining Lab. They provided a great atmosphere to learn as well as to have fun. Many thanks to all my friends in both Vancouver and elsewhere for their support.

Lastly, and most importantly, I wish to thank my parents for their unconditional love and support. I would not have been where I am today without them. I dedicate this thesis to them.

Chapter 1
Introduction

Extensible Markup Language (XML) is a simple and flexible text format for exchanging a wide variety of data [18, 29, 38]. For an XML document to be valid, it needs to conform to a schema. Schemas simplify the process of data exchange by applying constraints to XML documents. Organizations such as the World Wide Web Consortium (W3C) created the XML DTD (Document Type Definition) to describe the structure of XML data sources [2, 13].
This was followed by XSD (XML Schema Definition), which was created to overcome the limitations of DTD [6]. These languages define a schema for XML data sources. The schema consists of metadata in the form of schema elements that follow a set of predefined rules [18]. The more data an XML standard needs to support, the more complex its schema gets. Because XML is text, its raw size is large even when it carries small amounts of data. This makes exchanging large amounts of data inefficient and time-consuming. Users can also keep creating their own tags and defining schema elements at will. This increases the size and complexity of XML files. As a result, XML data and schemas are hard to understand. Even though XML is popular for exchanging data, many applications choose not to store their information in this format. Relational databases are still preferred most of the time. In cases where the information is better stored in XML, transitioning to relational databases is difficult.

Despite these problems, XML remains an important data storage format, and research has been done to make it more usable. Many query languages for XML have been developed, each querying in a different way. XML querying technologies such as [15, 17, 34] expect structured queries, whereas keyword querying has been explored in [26, 31, 40]. Graph querying has been addressed in [26, 40, 41]. These papers assume one of two things: either the user knows exactly what to look for when writing queries, or the user can narrow the search space through keywords.

We address the important question of understanding XML schemas when the user has no prior information about the data or its schema. This means we are concerned with two groups of people. The first group consists of people who have never interacted with the data. The second group consists of domain experts who understand the meaning of the data but do not interact with its structure and have no idea about the schema.
This is mostly because tools exist that can extract the information they need. Sometimes, however, these tools are not enough, and they need to look into the raw XML data.

One such example, which we use as a case study for this thesis, is architectural, engineering and construction data. Such data is typically stored in CAD models or in Building Information Models (BIM) [1], a newer and richer way of storing such data. The widespread use of BIM [1] has made it easier to plan, manage, construct and design buildings and infrastructure. Software such as Autodesk Revit [4] can be used to create these 3D models. The data from Revit can be exported in IFC format [3] and converted to ifcXML [11]. An ifcXML file is an Industry Foundation Classes XML file used by Building Information Modelling (BIM) programs. IfcXML has been successfully standardized by the International Alliance for Interoperability [11]. These ifcXML files suffer from the same problems as XML files in general, and the BIM information embedded in them is complex. As ifcXML is an XML format, its primary type of data is the element. Elements can be related to other elements in two ways. They can be nested as sub-elements, or they can be connected through references to identifiers, using id-ref pairs. When individual elements are defined in an ifcXML file, they contain some basic information about the element.

There are four different types of BIM models (architectural, mechanical, electrical and structural), and each of them has a separate ifcXML file. Of these types, the mechanical model shows the clearest examples of relationship information. After discussions with civil engineers, we realized that the information they wanted to extract from ifcXML files was primarily based on relationships in mechanical models. As a result, we chose to work with mechanical models.
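The following Python sketch makes the id-ref mechanism concrete. It indexes every element by its id attribute and then follows refs. The toy fragment is hypothetical and heavily simplified, because real ifcXML files use XML namespaces and extra wrapper elements. It mirrors the wall-length chain from the motivating example in Section 2.3.1: wall i51, then IfcRelDefinesByProperties i1560, then IfcPropertySet i1775, then IfcPropertySingleValue i2051. The value 4500 is an arbitrary placeholder, and the function names are illustrative only.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified fragment. Real ifcXML files use XML namespaces and
# extra wrapper elements, so tag names and paths would need adjusting.
TOY_IFCXML = """<ifcXML>
  <IfcWallStandardCase id="i51">
    <Name>Basic Wall</Name>
  </IfcWallStandardCase>
  <IfcRelDefinesByProperties id="i1560">
    <RelatedObjects><IfcWallStandardCase ref="i51"/></RelatedObjects>
    <RelatingPropertyDefinition><IfcPropertySet ref="i1775"/></RelatingPropertyDefinition>
  </IfcRelDefinesByProperties>
  <IfcPropertySet id="i1775">
    <HasProperties><IfcPropertySingleValue ref="i2051"/></HasProperties>
  </IfcPropertySet>
  <IfcPropertySingleValue id="i2051">
    <Name>Length</Name>
    <NominalValue>4500</NominalValue>
  </IfcPropertySingleValue>
</ifcXML>"""


def build_id_index(root):
    """Map each id attribute to the element that defines it."""
    return {el.get("id"): el for el in root.iter() if el.get("id") is not None}


def referrers(root, target_id):
    """Yield top-level elements that contain a ref pointing at target_id."""
    for el in root:
        if any(child.get("ref") == target_id for child in el.iter()):
            yield el


def follow_refs(element, index):
    """Resolve every ref nested inside element to the element it points at."""
    return [index[c.get("ref")] for c in element.iter() if c.get("ref") in index]


def element_property(root, element_id, prop_name):
    """Walk element <- IfcRelDefinesByProperties -> IfcPropertySet -> property."""
    index = build_id_index(root)
    for rel in referrers(root, element_id):
        if rel.tag != "IfcRelDefinesByProperties":
            continue
        for pset in follow_refs(rel, index):
            if pset.tag != "IfcPropertySet":
                continue
            for prop in follow_refs(pset, index):
                if prop.findtext("Name") == prop_name:
                    return prop.findtext("NominalValue")
    return None


if __name__ == "__main__":
    root = ET.fromstring(TOY_IFCXML)
    print(element_property(root, "i51", "Length"))  # prints 4500
```

The wall element itself contains no ref to its properties. Only the relationship element points back at the wall. This is why `referrers` has to scan the whole document, and why following refs outward from the wall alone leads to a dead end.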
Relationships in the mechanical model can be defined as physical connections between different objects. For example, building designers or estimators may want answers to questions such as ‘which floors will be affected if a flow terminal is removed?’. Using existing software, they need to take three steps. They must find the flow terminal, manually traverse the 3D representation of the system that contains it, and record the different floors that the system spans. Many such simple questions cannot be answered easily in Revit and require designers to export the BIM data into ifcXML. Unfortunately, trying to understand this raw data is problematic. For example, even for a flow terminal spanning only two floors, at least six id-ref pairs need to be accessed to answer this simple question.

The first problem we encountered was querying the ifcXML files successfully to extract relevant information. Section 3.1 expands on this. Second, we need to check the information in ifcXML files for consistency and correctness. According to Eastman [39], an expert in the field of civil engineering, much of the information in ifcXML files is semantically incorrect. The quality of ifcXML files is therefore very important, and Section 3.2 addresses it. Finally, once we have extracted information from the ifcXML files and checked its correctness, we need to present this information in a user-friendly manner. Section 3.3 introduces IfcXMLExplorer, a visual interface for exploring and understanding ifcXML data. Using IfcXMLExplorer, users can easily retrieve relationship information from ifcXML files without having to deal with the complexities of raw ifcXML data.

In the remainder of this work, we first list our contributions (Section 1.1). We provide some background in Section 2. Then we introduce the ifcXML case study in Section 3. Section 4 discusses the usage of XML in other domains.
Finally, in Section 5 we conclude and discuss future work.

1.1 Contributions

We created IfcXMLExplorer, a visual interface for exploring and understanding ifcXML data. The interface performs the following functions:
• Summarize all common non-relationship-specifying element types in terms of the subsystems they belong to.
• Locate an element whose identifier is already known.
• Browse and find an element from the tables generated from the different segments in the overview view.
• View the relationship network of a particular element, to see which other elements it is connected to and through what relationship.

To build IfcXMLExplorer, we first extracted the necessary information from ifcXML files by running XQuery queries and parsing the query results. We then checked the correctness and completeness of the extracted data. To do this, we chose appropriate tests to validate the semantic information in ifcXML files and applied them.

Chapter 2
Background

2.1 BIM: building information models

In BIM, everything starts with a 3D digital model of the building. However, BIM is much more than pure geometry with some textures cast over it for visualization. A true BIM model consists of the virtual equivalents of the actual parts and pieces used to build a building. These elements have all the characteristics (both physical and logical) of their real counterparts. These intelligent elements are digital prototypes of physical building elements such as walls, columns, windows, doors and stairs. They allow us to simulate the building and understand its behavior in a computer environment well before actual construction starts. The US National Building Information Model Standard Project Committee defines BIM as follows:

Building Information Modeling (BIM) is a digital representation of physical and functional characteristics of a facility.
A BIM is a shared knowledge resource for information about a facility forming a reliable basis for decisions during its life-cycle; defined as existing from earliest conception to demolition.

Current BIM software is used by individuals, businesses and government agencies who plan, design, construct, operate and maintain diverse physical infrastructures, such as water, wastewater, electricity, gas, refuse and communication utilities. Autodesk Revit (shown in Figure 2.1) [4] is one such BIM application. BIM models created in Revit can be saved in IFC form [3].

Figure 2.1: Building Information Models in Autodesk Revit

2.2 IFC: industry foundation classes

IFC (Industry Foundation Classes) is a data model developed by buildingSMART [9] (formerly known as the International Alliance for Interoperability, IAI). It is used for interoperability in the architecture, engineering and construction (AEC) industry. The IFC model specification is open and freely available. IfcXML is its XML specification [42]. IFC files can be exported from building models created in Autodesk Revit [4]. Converters based on the IFC/ifcXML specifications can be used to convert IFC to ifcXML. Besides IFC, building data can be exported into other formats, such as Microsoft Access, gbXML or DWL. However, ifcXML contains information not supported by these other formats [42]. Therefore, people in the AEC industry who want to find information by examining the data use ifcXML, which is why we have also concentrated our research on it.

Figure 2.2: Solibri

2.2.1 IFC visualization

A building project can have more than one IFC file, with each IFC file defining different parts of the same project. IFC files can be separated by model or by content. Model-based separation means that the different BIM models for the architectural, mechanical, structural and electrical components have separate IFC files. IFC checkers have been introduced to integrate these IFC files effectively.
These IFC checkers perform two checks. They verify that an IFC file is defined correctly, and that all IFC files related to the same project are appropriately integrated. A few IFC checkers on the market also act as viewers. State-of-the-art IFC viewers like Solibri [5] extract information from IFC files and display a 3D model of the building. These viewers make it possible for a user to navigate through the virtual building and explore its spatial structure. Information such as the dimensions of an air terminal and other spatial details about objects in a building can be found through Solibri (Figure 2.2).

However, existing approaches fail to answer many questions whose answers are deeply embedded in the data and not directly related to spatial details. One such situation is understanding how different objects are related to each other, and by what relationship. For example, building designers or estimators may want to know ‘which floors will be affected if a flow terminal is removed?’. To answer this, they need to manually find the flow terminal, traverse the 3D representation of the entire system and record the different floors that the system spans. These kinds of questions cannot be answered easily in Revit and require designers to export the BIM data into ifcXML. Unfortunately, trying to understand this raw data is problematic. In particular, it requires looking at all the id-ref pairs of the flow terminal in order to answer this simple question.

2.3 Specific type of XML files: ifcXML

IfcXML is the IFC data file using the XML document structure [9]. It can be generated directly from the model creator or from an IFC file. The ifcXML representation is an implementation of the ISO 10303 Part 28 Edition 2 (Part 28) standard. The mapping is guided by a configuration file that controls the specifics of the translation process.
For ifcXML, this configuration file is standardized and published for each version of the corresponding IFC schema.

2.3.1 Problems with ifcXML files

The schema of ifcXML is very complex. Properties are often not attached directly to an element but are related to it indirectly through refs. Some of these reference paths are very long. For example, Figure 2.3, taken from [42], shows part of the basic hierarchical representation of an exported ifcXML file. To find a single opening relationship between a door and a wall, five different elements need to be identified and referenced. Similarly, to find basic properties of a wall such as its dimensions or length, four refs need to be connected. This makes it very difficult to identify even simple elements in ifcXML data. Figure 2.4, also taken from [42], helps illustrate the problem. It shows a 3D wall component, its corresponding ifcXML elements in an XML viewer, and the actual ifcXML representation. Under the element tag IfcWallStandardCase, which represents a standard wall, only limited information can be found explicitly in the ifcXML data. Most properties are represented implicitly by refs. The biggest challenge with ifcXML data is analyzing how different objects and their properties are linked through refs.

Figure 2.3: Hierarchical structure of ifcXML files [42]

Figure 2.4: Comparing 3D model with ifcXML representation [42]

Motivating example

The ifcXML schema is so complex that answering simple questions such as ‘What is the length of a wall?’ becomes extremely difficult. In Figure 2.5, we assume the user knows the id of the wall (let id = i51). A search for this id directs the user to the element type IfcWallStandardCase. Here the user learns the name and object type of the wall. Other information about the wall can be found by following the refs connected to the wall. However, if the user wants to know the length of the wall, following these refs leads to a dead end.
To get the length of the wall, the user needs to go to the element IfcRelDefinesByProperties (id = i1560), which references the wall (id = i51). IfcRelDefinesByProperties references IfcPropertySet (id = i1775). This in turn references IfcPropertySingleValue (id = i2051), which has the attribute ‘length’. This attribute gives us the length of the wall.

Figure 2.5: Finding the length of a wall from an ifcXML file

The problem described above arises from the flexibility of XML, which allows data to be represented in many different ways. This flexibility is good because people can express their data in a way that makes sense to them. It is problematic, however, for understanding data that someone else has created. IfcXML suffers from these problems too, even though it has been standardized by buildingSMART [9]. While these standards help ensure that machines can read the data, they are not enough to ensure that humans can understand it. Specifically, in ifcXML files, information about elements is spread over the file through identifiers. One of the biggest challenges is tracking relevant information through this web of connections.

Information about elements in the BIM mechanical model

Our work is based on the BIM mechanical model. Below we explain the IFC nomenclature, to show what each element in the IFC file corresponds to in the real world. These are the standard definitions given by buildingSMART [9]. The flow distribution system refers to the system in the real world.
• IfcFlowFitting: IfcFlowFitting defines the occurrence of a junction or transition in a flow distribution system.
• IfcFlowSegment: IfcFlowSegment defines the occurrence of a segment in a flow distribution system, such as a duct or pipe.
• IfcBuildingElementProxy: IfcBuildingElementProxy serves as a proxy for building elements that cannot be classified.
Elements that do not have a semantic definition in the current IFC release fall under this category.
• IfcDistributionPort: IfcDistributionPort defines the occurrence of a specialized port that connects two or more elements in the system.
• IfcFlowTerminal: A flow terminal is a point at which the system interacts with an external environment. It marks the end or beginning of a distribution system (such as an air outlet, drain or sink).

Chapter 3
Case study: ifcXML data

This work on ifcXML files consists of three major sections, as shown in Figure 3.1. The first is extraction, where we took the ifcXML files and extracted the information we required. This step is crucial, as not all the information in the files is important for the purpose of this research. We obtained the required information and stored it in intermediate XML files, using the XML database management system BaseX [7]. The next step was to check the quality of the information in the ifcXML files. The biggest problem with IFC files is the quality of their data. Standards have been applied to ensure a minimum quality for IFC files. Even with these measures, however, quality checking of IFC files is tedious, as it requires manual effort by experts. There is also no standard definition of what constitutes a good IFC file. Defining and validating data quality is difficult and involves considering multiple characteristics of the data. These include syntactic well-formedness, consistency across multiple redundant representations, integrity of translation from the sources, the accuracy of derived data, and others. Organizations that depend on model data need a reliable and automated method of assessing all these aspects. In the paper ‘Toward Robust and Quantifiable Automated IFC Quality Validation’ by Eastman et al.
[39], formal definitions of IFC quality testing have been framed. Using the rules in Eastman’s paper [39], we checked the completeness and correctness of the data we extracted. Finally, the information was visualized in IfcXMLExplorer, which was created using JavaScript libraries such as D3.js and Bootstrap.

Figure 3.1: Complete System of IfcXML Processing

3.1 Part 1: information retrieval from ifcXML files

The structure of ifcXML files makes it extremely difficult to extract information from them directly. For our case study, we chose two different BIM mechanical models. We first conducted our preliminary work on a BIM model small enough to be understood by hand. In particular, we looked at a portion of the Centre for Interactive Research on Sustainability (CIRS) building at UBC. The file is 21MB with 517,098 lines. It consists of 5 small systems, the biggest of which has a total of 146 objects. Objects are entities that exist in the real world, such as flow terminals. Flow terminals are objects that define the beginning of a distribution system. They can be of different types, such as air terminals and stack terminals. The small size of this file enabled us to generate appropriate queries and check the query results against the file. Since no ground truth was available, we explored the file to cross-check the results of the querying process and determine whether a query was correct. This was done in collaboration with our civil engineering colleagues. Most of the time, querying an ifcXML file alone did not provide the exact information needed. Instead, it generated intermediate XML files that required further processing to obtain the necessary information. We tackled this by parsing these intermediate XML files in Python. The querying and parsing of intermediate XML files was initially done on the smaller CIRS dataset so that we could check the results manually.
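The thesis does not specify the layout of its intermediate XML files. The record format below is therefore hypothetical: `<object>` elements with `id`, `type` and `system` attributes. The function name is hypothetical too. The sketch shows the kind of Python post-processing step described above, together with a sanity check of the sort used on the CIRS file: counting how many objects each system contains. It uses `xml.etree.ElementTree.iterparse` so that records are processed as they are read rather than after the whole tree is built.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical layout for an intermediate result file: one <object> record
# per extracted element, tagged with the system it belongs to.
SAMPLE = b"""<results>
  <object id="i101" type="IfcFlowTerminal" system="Supply Air 1"/>
  <object id="i102" type="IfcFlowSegment" system="Supply Air 1"/>
  <object id="i103" type="IfcFlowFitting" system="Supply Air 1"/>
  <object id="i201" type="IfcFlowSegment" system="Return Air 1"/>
</results>"""


def count_objects_per_system(source):
    """Stream <object> records from source and count them per system."""
    counts = Counter()
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "object":
            counts[elem.get("system")] += 1
            elem.clear()  # release the processed record's attributes and children
    return counts


if __name__ == "__main__":
    for system, n in count_objects_per_system(io.BytesIO(SAMPLE)).most_common():
        print(f"{system}: {n}")
```

Calling `elem.clear()` reduces memory use for long result files. For inputs in the gigabyte range, the parent element would also need to be pruned periodically.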
After confirming that the hand check was correct, we moved on to a bigger dataset: the mechanical model of the Royal Alberta Museum. The museum is 215,533 sq. ft, and its ifcXML file is more than 2GB. This data is too large for us to view and test directly. We therefore conducted tests on the smaller file first and then applied them to the bigger file.

From these files we need to extract ‘relationship’ information. Relationship information can be defined as real-world connections that have been projected and saved into ifcXML files. For example, a flow terminal may be connected to a flow fitting in the real world through distribution ports. This can be identified in the ifcXML file by following the series of id-ref pairs through each of these elements.

3.1.1 XML database: BaseX

We use the XML database BaseX to query the ifcXML files. We chose this database after trying other databases such as Oracle Berkeley DB XML and exist-db. We rejected these for the following reasons.

Oracle Berkeley DB

Oracle Berkeley DB XML required many other supporting packages to be downloaded. It also requires an outdated version of Java to be installed. We could not install all of this software successfully, and the database would not work unless all of the supporting software was working properly, so we rejected Berkeley DB.

Exist-db

Exist-db was easy to install and run. However, after testing it on small documents and successfully querying small sample XML files, we found that exist-db does not support large XML files.
Exist-db can only support a maximum file size of 100MB. Unfortunately, the ifcXML file we work with is much larger than that, so we were unable to use the 2GB ifcXML file with exist-db.

Final choice: BaseX

We finally settled on BaseX for the following reasons:
• BaseX does not depend on any supporting programs being installed.
• It is extremely lightweight.
• It has an interactive and user-friendly GUI which makes it easy to view and explore XML documents.
• It has a scalable XML database engine and an XPath/XQuery 3.1 processor which includes full support for the W3C Update and Full Text extensions.
• Due to its scalability, BaseX can successfully load the 2GB ifcXML file.
• BaseX can query the large ifcXML file in a few seconds.
Thus BaseX was the best XML database choice for our ifcXML data.

3.1.2 Querying and parsing ifcXML files

Once a suitable XML database had been chosen, our next step was to extract interesting information from these ifcXML files. For the purpose of this research, interesting information is information based on real-world relationships. For example, a pipe in the real world may be connected to an air terminal. In IFC nomenclature, the pipe is categorized as a flow segment. Flow segments are sections of a distribution system, such as pipes and ducts. Similarly, an air terminal is categorized as a flow terminal. Flow terminals are elements that act as the beginning of a distribution system. To find information about this connection in an ifcXML file, we therefore need to start by finding a relationship between a flow segment and a flow terminal. Most of this information cannot be retrieved directly by querying ifcXML files. Instead, querying is used to narrow down our search space. This gives us smaller, intermediate XML files to work with.
Finally, these intermediate XML files are parsed to give us our final results.

The queries we selected were driven by how their results would be used, and they were of two types. The first set of queries was based on a paper by Eastman [39] that checks the semantic consistency of information in ifcXML files; we expand on this in Section 3.2. The second set of queries was framed to extract information for the visual interface we later built to better understand the information in ifcXML files; this is discussed in Section 3.3.

Queries that we used from the Eastman paper

The questions that we selected from Eastman’s paper [39] to check the semantic consistency of the information in ifcXML files are listed below. Details about the selection are given in Section 3.2.
• Connection information: Connectivity information between ports should be consistent. A distribution port with flow direction SINK should connect to a distribution port with flow direction SOURCE, and vice versa. Similarly, distribution ports with flow direction SOURCEANDSINK should be connected to another SOURCEANDSINK port.
• Junction information: For connectivity involving more than two ports, at least one of the ports should be SINK, SOURCE or SOURCEANDSINK. This condition ensures that every junction has at least one ‘source’ and one ‘sink’; otherwise there would be no flow.
• Objects that span multiple levels: An object that spans multiple levels shall have appropriate IfcRelContainedInSpatialStructure or IfcRelReferencedInSpatialStructure information.

Connection information

This type of information can be found in the distribution ports in the ifcXML file. Each distribution port has a ‘flow direction’, which can be SINK, SOURCE or SOURCEANDSINK. The possible connections can be found under the relationship elements ifcRelConnectsPortToElement and ifcRelConnectsPorts. A SINK type distribution port connects to a SOURCE type and vice versa.
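The pairing rule for connected distribution ports (SINK with SOURCE, and SOURCEANDSINK with SOURCEANDSINK, as stated in the bullet list above) can be checked mechanically. This is a minimal sketch that assumes the port ids and flow directions have already been extracted, for instance from the RelatingPort and RelatedPort entries of ifcRelConnectsPorts; the dictionary layout and the function name `check_connections` are illustrative, not part of any IFC toolkit.

```python
# Allowed flow-direction pairings for two connected distribution ports.
VALID_PAIRS = {
    frozenset({"SINK", "SOURCE"}),  # SINK <-> SOURCE, in either order
    frozenset({"SOURCEANDSINK"}),   # SOURCEANDSINK <-> SOURCEANDSINK
}


def check_connections(directions, connections):
    """Return the port pairs whose flow directions break the pairing rule.

    directions  -- dict: port id -> flow direction string
    connections -- list of (relating_port_id, related_port_id) tuples
    """
    invalid = []
    for relating, related in connections:
        # Normalise case, since enumeration values may be stored in lower case.
        pair = frozenset({directions[relating].upper(), directions[related].upper()})
        if pair not in VALID_PAIRS:
            invalid.append((relating, related))
    return invalid


if __name__ == "__main__":
    directions = {"p1": "SINK", "p2": "SOURCE", "p3": "SOURCEANDSINK",
                  "p4": "SOURCEANDSINK", "p5": "SINK"}
    connections = [("p1", "p2"), ("p3", "p4"), ("p5", "p3")]
    print(check_connections(directions, connections))  # [('p5', 'p3')]
```

Using unordered `frozenset` pairs means the check does not depend on which port is recorded as relating and which as related.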
A SOURCEANDSINK type distribution port connects to another SOURCEANDSINK type distribution port. As shown in Figure 3.2, every SINK is connected to a corresponding SOURCE, and the flow direction is always from the SOURCE to the SINK. To retrieve the information needed to validate connection information in ifcXML files, we need to find the ids and flow directions of all distribution ports under the ‘RelatingPort’ or ‘RelatedPort’ XML tags in the relationship ifcRelConnectsPorts.

Junction information

A junction is defined as a connection with more than two distribution ports. For a junction to be valid, the following must be true:
• One of the ports must be SINK or SOURCEANDSINK.
• One of the ports must be SOURCE or SOURCEANDSINK.
To retrieve the information needed to validate junctions in ifcXML files, we first check whether the ifcRelConnectsPorts relationship can have more than two ports. If it cannot, how is it possible to find junctions in an ifcXML file? We formulated a few checks and applied them to both of our ifcXML files:
• Do ifcXML files have any kind of ifcRelConnects relationship besides ifcRelConnectsPorts and ifcRelConnectsPortToElement?
• Can ifcRelConnectsPorts have more than two ports? First count the ‘RelatingPort’ entries and then the ‘RelatedPort’ entries. If these counts are the same, the relationship ifcRelConnectsPorts can have only two ports.
• Can junctions be saved as ‘RelatedElement’ in the relationship ifcRelConnectsPortToElement?
To perform these checks, we need information about the different elements of the relationship specifying types ifcRelConnectsPorts and ifcRelConnectsPortToElement.

Objects that span multiple levels

To find whether an object spans more than one level, we need to explore the relationship specifying element types ifcRelContainedInSpatialStructure and ifcRelReferencedInSpatialStructure. Such a relationship is connected to an individual element called ifcBuildingStorey, which gives us the storey that the object belongs to.
If the object has more than one ifcRelContainedInSpatialStructure element, each with a different ifcBuildingStorey, it is said to span multiple levels. As shown in Figure 3.6, element 11 (ifcCurtainWall) has more than one ifcRelContainedInSpatialStructure or ifcRelReferencedInSpatialStructure, connected to the different ifcBuildingStoreys 8, 9 and 10. This means that the curtain wall spans three floors: the ground floor, the first floor and the second floor. To find objects that span multiple levels, we need to extract the following information:
• Find all ifcRelContainedInSpatialStructure elements and extract the set of their ‘RelatedElements’ and the id of the ifcBuildingStorey under ‘RelatingStructure’.
• For each object in the ‘RelatedElements’ set, check whether the object appears in another ifcRelContainedInSpatialStructure with a different value.
The actual checks and results are discussed in Section 3.2.

Queries that we used for the visualization

As with the validation queries, most of the interesting information, that is, information based on real-world relationships, cannot be retrieved directly by querying ifcXML files, so querying is used to narrow down our search space. We used the standard XML query language XQuery [15] to query the ifcXML files. This gives us intermediate XML files, which are parsed to give us our final results.

To build our visualization, IfcXMLExplorer, we needed to extract a fixed set of information from ifcXML files. For the purpose of this research, we were interested in the different systems, their corresponding subsystems and the elements in each subsystem. For each element, we were also interested in which other elements it may be connected to and through what relationship.
For example, a Flow Terminal is connected to a Distribution Port element through the relationship ifcRelConnectsPortToElement. To help us extract information and formulate queries, we divided the information we needed into the following questions. Answering these questions gives us the information we need.
• Which systems and subsystems are present in the ifcXML file?
• What individual elements does each subsystem contain?
• What other elements is a particular element connected to, and through which relationship?
The first two questions can be answered by picking out all the systems present in the ifcXML file and finding all the individual elements that belong to these systems. Each system has its own instance of the relationship type ifcRelAssignsToGroup, which gives us a set of sub-elements for RelatedObjects and the RelatingGroup. The RelatedObjects set consists of elements, referenced through ref values, that are part of the same system. The RelatingGroup gives us the ref value of the system. The third question is more complicated. To find all the elements a particular element is connected to, and through which relationships, we need to track all the references to the id of that element in the file. We start with the id value of the element we are interested in and then find all the references to this id value. In an ifcXML file, a reference is specified by giving the id value of the element as a ref value in an XML tag, for example <IfcFlowTerminal xsi:nil="true" ref="i2229"/>. In plain XML nomenclature, IfcFlowTerminal is the XML tag and ref="i2229" is an attribute giving us information about the tag. In ifcXML nomenclature, this translates to an element of type flow terminal identified through the reference identifier ref="i2229".
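The first two questions can be sketched in a few lines of Python. The sketch assumes a simplified, namespace-free ifcXML layout in which each ifcRelAssignsToGroup holds its members under RelatedObjects and a reference to the system under RelatingGroup, as described above. The sample document, the Name child of IfcSystem and the function name `system_members` are hypothetical; in the thesis this extraction is done with XQuery queries.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical, namespace-free fragment with one system and its group relationship.
SAMPLE = """<uos>
  <IfcSystem id="i100"><Name>Mechanical Supply Air 2</Name></IfcSystem>
  <IfcRelAssignsToGroup id="i200">
    <RelatedObjects>
      <IfcFlowTerminal ref="i2229"/>
      <IfcDistributionPort ref="i3001"/>
    </RelatedObjects>
    <RelatingGroup><IfcSystem ref="i100"/></RelatingGroup>
  </IfcRelAssignsToGroup>
</uos>"""


def system_members(root):
    """Map each system name to the (element type, ref) pairs assigned to it."""
    # Only IfcSystem elements carrying an id are definitions; the others are refs.
    names = {s.get("id"): s.findtext("Name") for s in root.iter("IfcSystem") if s.get("id")}
    members = defaultdict(list)
    for rel in root.iter("IfcRelAssignsToGroup"):
        group = rel.find("RelatingGroup/*")  # the ref to the system
        if group is None:
            continue
        system = names.get(group.get("ref"), group.get("ref"))
        # RelatedObjects lists the member elements by ref value.
        for obj in rel.findall("RelatedObjects/*"):
            members[system].append((obj.tag, obj.get("ref")))
    return dict(members)


if __name__ == "__main__":
    print(system_members(ET.fromstring(SAMPLE)))
```

Counting the element types in each member list then gives, per subsystem, the individual elements it contains.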
By tracing all the occurrences of the id value i2229 as reference values, we can generate the relationship network of the element.

3.2 Part 2: IFC quality validation from ifcXML files

The use of IFC (Industry Foundation Classes) has been increasing with the need for interoperability. The quality of an IFC model in an interoperability scenario has been discussed for many years. In [28], a well-ordered historical overview of the development of IFC as a standard, along with its different issues, is given. Amor [14] discussed these issues and compared them to similar issues in other industries such as healthcare and STEP-based manufacturing. He recommended a set of processes, best practices and tools to accomplish what other industries had achieved in managing the challenges of interoperability. Similarly, Lipman [32] examined these issues in detail, picking out examples from various IFC conformance tests and comparing them with conformance testing in other domains. Kiviniemi [27] and a report from IAI Denmark [19] highlighted issues with the IFC certification process and the demand for a more robust certification regime that is comprehensible to end users and more meaningful in the real world through realistic test cases.

Testable definition of well-formedness

Eastman [39] has taken some steps to define the well-formedness of an IFC model. The aim is to allow consistent, predictable and confident checking of model quality, regardless of model size. The paper proposes the confidence C as a function of a model:

$$C(\text{model}) = \frac{\sum T(\text{locally valid})}{\sum \left( T(\text{locally valid}) + T(\text{locally invalid}) \right)} \tag{3.1}$$

where:
C is a function that reflects confidence in the quality of the model.
T is a function over all the relevant tests that discriminate model correctness from inaccuracies or errors in data integrity. Such tests employ a logical process rather than a purely Boolean one.
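Equation (3.1) reduces to simple counting once each local test has produced an outcome. The sketch below assumes outcomes are recorded as the strings 'valid', 'invalid' and 'not_testable' (labels chosen here for illustration) and, because Eq. (3.1) only sums locally valid and locally invalid results, leaves not-testable results out of both the numerator and the denominator.

```python
from collections import Counter


def confidence(outcomes):
    """Confidence C(model) following Eq. (3.1).

    outcomes -- iterable of per-test results: 'valid', 'invalid' or
                'not_testable' (labels assumed for this sketch).
    Only locally valid and locally invalid results enter the ratio.
    """
    counts = Counter(outcomes)
    tested = counts["valid"] + counts["invalid"]
    if tested == 0:
        raise ValueError("no locally valid or invalid results to score")
    return counts["valid"] / tested


if __name__ == "__main__":
    results = ["valid"] * 18 + ["invalid"] * 2 + ["not_testable"] * 5
    print(confidence(results))  # 0.9
```

In this example the five not-testable results do not lower the score, so C = 18 / (18 + 2) = 0.9.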
The local tests may result in three outcomes: locally valid, locally invalid, and missing data (not testable).

The confidence should be a high percentage value, generally increasing as the model goes through many iterations of testing. Ideally it should be 100%, but in practice a small tolerance is accepted. There are two main processes related to IFC where validation is required: export and import. The main question that needs to be answered for export validation is:

    Given a drawing or other representation of a building model, can the system export that model to IFC, accurately representing its intent?

Export cases are more direct than import cases, since an export model has a publicly defined mapping representation. This allows a common set of tests to be applied across all application translators. Import model testing needs to be aware of the varied native model structure of each BIM application. There are two main groups of rules: one group is applicable entirely inside the IFC file, i.e. all test rules work with information contained in the IFC file (self-contained), and the other requires information beyond the IFC file (external dependency). Eastman [39] has defined a set of rule categories and sub-categories. The different categories of rules are:
• Correctness tests: These tests validate that the mappings between a (supposedly correct) native instance model and IFC are correct. These rules include syntactic, semantic and geometry tests.
• Conformance tests: These tests are validation steps beyond the schema that help determine whether an IFC instance file conforms to a specific exchange requirement.
• Additional tests with external dependency: These tests require additional information outside of the IFC model.
The tests can also be categorized according to their source, such as schema validation, implementer’s agreements, model view definition (MVD) requirements and data consistency.
Another method of classification is based on sub-domains such as architectural, structural, construction, facility management, fabrication, civil engineering and geometry.

3.2.1 Validation questions: how and why were these chosen?

We are mostly interested in the relationship information extracted from ifcXML files. Before we can use this extracted information, we need to know that it is correct; otherwise the output of the IfcXMLExplorer tool will be incorrect too. This makes the quality and consistency of the information extremely important. We start by selecting the set of semantic tests. Semantic tests are defined as all tests of the semantic correctness of the data. These checks are done especially to maintain the consistency of relations that may be topologically correct but do not make sense semantically, for example an object that is contained in a space but is spatially disjoint from it.

All the tests below can also be classified (as done in [39]) according to the difficulty of obtaining the information needed to apply them: easy (E), medium (M) or hard (H). We are interested in ‘M’ and ‘H’ tests. ‘E’ tests can be validated simply by observation and require no additional processing, so they do not pose an interesting problem and we decided to ignore them. We were also only interested in tests based on relationships in the IFC file, which is mostly topological information. This narrowed our space down to four tests. One of these four is not a mechanical model test, so we removed it and ended up with three tests, which are used to validate information in the ifcXML file.

We first explain the set of semantic tests, which can be divided into five classes, and then narrow down to the three tests that we selected.

Consistent data associated with topology

All tests related to data associated with topology.
For example, flow direction information on connected ports must come in fitting pairs (SINK and SOURCE). The specific types of tests are:
• System connectivity test: All components in the system should be connected.
• Hanging port test: An object should not have a hanging port. As seen in Figure 3.2, all distribution ports are connected in pairs. When a distribution port is not part of a pair, it is called a hanging port. The red dot in Figure 3.3 shows a distribution port that is not connected to any other distribution port.
• Connection information: Connectivity information between ports should be consistent. SINK should be connected to SOURCE and vice versa, and SOURCEANDSINK should be connected to SOURCEANDSINK (Figure 3.2). We have selected this test.
• Junction connection: For connectivity involving more than two ports, at least one shall be SINK or SOURCE or SOURCEANDSINK (Figure 3.2). We have selected this test.
• Wall connection test: Wall connection information shall be consistent. ATPATH shall be paired only with ATSTART or ATEND (Figure 3.4).

Figure 3.2: Illustration of port connectivity
Figure 3.3: Illustration of hanging port
Figure 3.4: Element Connections

Well-formedness of topology

All tests that ensure the completeness of a topology, for example whether a space is completely bounded by space bounding objects. The specific types of tests are:
• Object not in any container: The object has not been assigned a container.
• Space boundary check: Space boundaries must be complete. They must be ‘watertight’ when viewed from all horizontal and vertical directions (Figure 3.5).
• Object that spans multiple levels: An object that spans multiple levels must have appropriate IfcRelReferencedInSpatialStructure or IfcRelContainedInSpatialStructure information (Figure 3.6). We have selected this test.

Figure 3.5: Space boundaries
Figure 3.6: Object that spans multiple stories

Geometric well-formedness

All tests that make sure the geometry adheres to certain elementary geometric criteria, such as all edges of a window being defined. The specific types of tests are:
• Results of Boolean operations shall not be NULL: Evaluate consistent geometry operations, e.g. an IfcBooleanResult that leaves nothing behind, or a trim or opening that does not cut anything (Figure 3.7).
• Well-formed door or window: Check specific properties essential to the objects, for example door local placement. The same applies to window objects (Figure 3.8).

Figure 3.7: Trim operation resulting in a Null body
Figure 3.8: Object placements

Model integrity

All tests that check for the existence of entities that make a model useful, e.g. every IFC file must have one IfcProject. The specific types of tests are:
• Check essential information: Check the essential entities that make an IFC file useful, e.g. IfcProject, header information, and other IfcProject-related setup such as units, GeometricContext, etc.
• Overlapping objects: Solid objects should not overlap each other, except in well-defined cases such as embeds in concrete (Figure 3.9).
• Duplicated types: The same type should not be duplicated. This also applies to PropertySet and MaterialLayer (Figure 3.10). In Figure 3.10, the two air terminal types are exactly the same but have different unique ids, which is not allowed in IFC.
• Parametric and explicit geometries: Check the consistency of parametric geometry information against the explicit geometry, e.g. an IfcWallStandardCase formed by its Axis + MaterialLayerSetUsage vs. the explicit geometry by extrusion, or Door and Window (Figure 3.11).
• Well-formed spaces: For proper space calculations, all floor or slab space within the exterior walls of a building should be covered by either permanent construction, habitable space or vertical shafts (Figure 3.12).
• Door operation type: Doors and windows require the correct type to be defined.
• MEP (Mechanical, Electrical and Plumbing) system check: MEP objects shall be part of a system.

Figure 3.9: Overlapping geometries
Figure 3.10: Duplicate types
Figure 3.11: Parametric vs Explicit geometries
Figure 3.12: Incomplete space modelling

Spatial integrity

All tests related to spatial relations between objects and other related entities. The specific types of tests are:
• Object floating in space: Identify all objects that are spatially disjoint and not touching another set of objects (Figure 3.13).
• Badly-formed opening: An opening that does not cut anything (Figure 3.14).
• Object in wrong container: An object assigned to the wrong container (Figure 3.15).

Out of all the above tests, we selected the following three tests:
• Connection information
• Junction information
• Objects that span multiple levels

Once the validity, quality and consistency of the data have been checked, we need to create our application to understand this data better.

3.3 Part 3: IfcXMLExplorer: a visual interface for exploring and understanding ifcXML data

3.3.1 Overview

To better understand users’ needs and current challenges, we consulted experts in the Civil Engineering department. Given the complexity of ifcXML files and users’ difficulty in understanding them, we focused on designing and implementing a simple, easy-to-use visualization tool that shows the different relationships between elements in an ifcXML file. We especially considered users who are not necessarily familiar with the complex ifcXML schema.
In the following sections we describe our task abstraction, followed by the tool design and the different views of the visualization tool. To design our tool, we closely followed Munzner’s abstraction framework [33]. Munzner reduces complex visualization problems to basic what, why and how questions. This helped us map our set of tasks onto the visualization methods that best represented them. Table 3.1 summarizes our analysis of the what, why and how questions based on Munzner’s model; it is a detailed analysis of the data, the tasks involved and the encodings used. Based on this analysis, we came up with the following tasks.

Figure 3.13: Object floating in space
Figure 3.14: Opening that does not cut anything
Figure 3.15: Object in the wrong container

Tasks

Our visualization system helps the user perform these tasks:
• Summarize all common individual elements in terms of the subsystems they belong to.
• Locate an element whose identifier is already known.
• Browse and find a particular element from the tables generated from the different segments in the overview view.
• View the relationship network of a particular element to see which other elements it is connected to and through what relationship.

Figure 3.16: Screenshot of complete System

Based on our task and data abstraction, we designed four views to enable users to perform different tasks. The first thing a user sees is the overview view, which shows common individual element types and their corresponding elements in different systems. Each segment in this view has its own set of elements in tabular form. This table is generated in the search view along with a search bar. Entering the value of an identifier in the search bar displays the graph of the element that the identifier refers to. This graph is known as a relationship network and is one of the most important parts of the visualization.
The relationship network displays how the selected element is connected to other individual elements through relationships described by relationship specifying elements.

3.3.2 Problem description

IfcXML files are XML files that adhere to the ifcXML [11] standard, which was designed to store BIM [1] information. By analogy with relational databases, element types in ifcXML can be seen as different tables and individual elements as rows in a table. For example, the element type Flow Terminal can have the element identified by the id value i2229 under it. Each individual element has values for a fixed set of attributes (the columns of a table, with a separate value for each individual element or row). However, the constraints that describe the relationships between elements are different from, and more verbose than, those in a relational database. In a relational database, a reference is typically declared one table at a time; e.g. one could declare that all EmployeeIDs in a payroll table reference employees in the Employee table. In XML this has to be done with an individual ref for each employee, which results in a much more verbose, complex and difficult-to-understand representation.

Because the relationships between ifcXML schema elements are hard to understand, ifcXML uses the keyword Rel to mark elements that exist solely to describe relationships. One example is ifcRelAssignsToGroup, which connects a set of elements to their corresponding group. In ifcXML nomenclature, a group refers to a system, so this relationship tells us which elements belong to the same system. This essentially creates two classes of elements in ifcXML: individual elements and relationship specifying elements. We use these terms throughout this thesis.

Individual element types give us new information about an object, and relationship specifying element types connect two individual element types through a relationship.
For example, ifcRelContainedInSpatialStructure is a relationship specifying element type connecting two individual elements such as ifcFlowTerminal and ifcDistributionPort. This connection tells us that an element of type ifcFlowTerminal is in the same spatial structure as an element of type ifcDistributionPort. In Figure 3.18, A and C are individual element types, while B is a relationship specifying element type.

Figure 3.17: Structure of ifcXML Schema
Figure 3.18: Relational and Non-relational Elements

Unfortunately, in ifcXML files not all elements belonging to a specific element type are defined together. As a result, elements of the same type may be spread across the ifcXML file. This is seen in Figure 3.17, where the element type ifcDerivedUnitElement is defined twice, each time with a different element under it. This is similar to having tables in a relational database with different row values but the same table name. Each element has attributes defined in its XML tag. One attribute is reserved for the id, which is the global identifier for each element. Many elements have a ref value as an attribute in their XML tag. This means that the element has been defined elsewhere but is being referenced at that point. This creates explicit connections in the ifcXML file through id-ref pairs. For example, the flow terminal identified by the id value i2229 is defined in the ifcXML file as <IfcFlowTerminal id="i2229"/>. In this XML tag, the portion id="i2229" tells us that the flow terminal identified through the id value i2229 is defined at this point. When it is referenced in another part of the ifcXML file, its XML tag looks like <IfcFlowTerminal ref="i2229"/>. Here, instead of id="i2229", the tag contains ref="i2229", which shows that the flow terminal is not defined at this point but only referenced. References are made to explicitly connect different elements, which creates reference paths, and some of these reference paths are very long.
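To make the id-ref mechanism concrete, the sketch below indexes a small, hypothetical ifcXML-like fragment and follows reference paths between defined elements. It assumes namespace-free tags and treats each ref as an undirected link between the element that contains it and the element it names. The function names `build_graph` and `reference_path` are illustrative. Even in this seven-element fragment, the path from the flow terminal i2229 to the flow fitting i4001 passes through five intermediate ids.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict, deque

# Hypothetical fragment: terminal -> port -> port -> fitting, linked only by refs.
SAMPLE = """<uos>
  <IfcFlowTerminal id="i2229"/>
  <IfcDistributionPort id="i3001"/>
  <IfcDistributionPort id="i3002"/>
  <IfcFlowFitting id="i4001"/>
  <IfcRelConnectsPortToElement id="i5001">
    <RelatingPort><IfcDistributionPort ref="i3001"/></RelatingPort>
    <RelatedElement><IfcFlowTerminal ref="i2229"/></RelatedElement>
  </IfcRelConnectsPortToElement>
  <IfcRelConnectsPorts id="i5002">
    <RelatingPort><IfcDistributionPort ref="i3001"/></RelatingPort>
    <RelatedPort><IfcDistributionPort ref="i3002"/></RelatedPort>
  </IfcRelConnectsPorts>
  <IfcRelConnectsPortToElement id="i5003">
    <RelatingPort><IfcDistributionPort ref="i3002"/></RelatingPort>
    <RelatedElement><IfcFlowFitting ref="i4001"/></RelatedElement>
  </IfcRelConnectsPortToElement>
</uos>"""


def build_graph(root):
    """Undirected adjacency lists linking each defined id to the ids it references."""
    graph = defaultdict(list)

    def link(a, b):
        if b not in graph[a]:
            graph[a].append(b)
        if a not in graph[b]:
            graph[b].append(a)

    def collect(owner, elem):
        for child in elem:
            if child.get("ref") is not None:
                link(owner, child.get("ref"))  # reference to an element defined elsewhere
            elif child.get("id") is not None:
                link(owner, child.get("id"))   # nested definition: linked here, walked on its own
            else:
                collect(owner, child)          # plain wrapper tag, e.g. RelatingPort

    for elem in root.iter():
        if elem.get("id") is not None:
            collect(elem.get("id"), elem)
    return graph


def reference_path(graph, start, goal):
    """Shortest chain of ids connecting start to goal (breadth-first search), or None."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in graph.get(node, []):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


if __name__ == "__main__":
    graph = build_graph(ET.fromstring(SAMPLE))
    print(reference_path(graph, "i2229", "i4001"))
```

Breadth-first search returns the shortest such chain; a reader following the same path by hand has to jump between seven separate places in the file.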
Analyzing how different elements and their attributes are linked, in order to get information about objects and their properties, is the biggest challenge with ifcXML data.

The purpose of IfcXMLExplorer is to help users understand ifcXML data better by visualizing it. The dataset used is the ifcXML file of the mechanical model of the CIRS building at UBC. Here is a recap of the terminology used to describe the data.

Terminology
• Schema: The structure of the ifcXML file.
• Objects: Real-world objects in the building, e.g. a flow terminal in room 007 in the basement.
• Properties: Information about objects, e.g. the type of a flow terminal.
• Elements: Similar to tables in relational databases. They are of two types: relationship specifying and individual. Relationship specifying elements connect individual elements to each other through id-ref pairs.
• Attributes: Columns in a table, with a separate value for each data instance or row.

Figure 3.19: Different data instances connected through reference paths.

3.3.3 Solution

Our visualization is divided into four views. The first thing a user sees is the OVERVIEW view, which shows common individual elements and their corresponding elements across different systems. Each segment in this view has its own set of elements in tabular form. This table is generated in the SEARCH view along with a search bar. Entering the value of an identifier in the search bar displays the graph of the element that the identifier refers to in the NETWORK view. This graph is known as a relationship network and is one of the most important parts of the visualization. Every identifier whose relationship network has been generated is listed in the HISTORY view, which helps the user keep track of all the identifiers that have already been visited. The relationship network displays how the selected element is connected to other elements through relationships described by relationship specifying elements.
The analysis table (Table 3.1) gives a detailed analysis of the data, the tasks involved and the encodings used in the visualization.

Overview

IfcXML files contain a special individual element type called ifcSystem. Elements in ifcSystem are defined through attributes such as Name and Type. All elements in ifcSystem can be classified into system types, e.g. Mechanical Exhaust Air. The CIRS dataset has two types of systems, Exhaust Air and Supply Air, which are divided into five subsystems: Mechanical Exhaust Air 1, Mechanical Exhaust Air 2, Mechanical Supply Air 2, Mechanical Supply Air 3 and Mechanical Supply Air 4. Each element in an ifcSystem is connected to a set of individual elements through the relationship specifying element ifcRelAssignsToGroup. Individual element types such as ifcFlowSegment, ifcFlowFitting, ifcBuildingElementProxy, ifcDistributionPort and ifcFlowTerminal are common to all subsystems. The overview categorizes these subsystems according to the common individual elements. The Y axis shows the different subsystems defined in the ifcXML file, and the X axis shows all the common individual element types. Each rectangular segment in the view is colour coded according to its number of elements. Figure 3.20 shows the overview view of the ifcXML file. Hovering over a rectangle shows the number of elements that fall under that category. For example, hovering over the segment defined by Mechanical Exhaust Air 2 on the Y axis and B on the X axis shows a tooltip with the number 21, which is the number of flow fittings belonging to this subsystem.

This view makes it much easier for users to obtain system-specific information. For example, if users want to see which system has the maximum number of distribution ports, they can simply hover over the distribution port element type under each system.
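The counts behind the overview could be derived as sketched below, assuming the subsystem membership lists (for instance produced from ifcRelAssignsToGroup) are already available as (element type, id) pairs. The colour binning follows the encoding described in Section 3.3.4 (a range of 0 to 100 in bands of 20 items); which band a count of exactly 20 falls into is an assumption of this sketch. The function names and sample ids are illustrative.

```python
from collections import Counter


def overview_matrix(members):
    """Count elements per (subsystem, element type) for the types shared by all subsystems.

    members -- dict: subsystem name -> list of (element_type, element_id) pairs
    Returns (common_types, counts) with counts[subsystem][element_type] -> int.
    """
    per_system = {name: Counter(t for t, _ in elems) for name, elems in members.items()}
    # X axis of the heat map: element types present in every subsystem.
    common_types = sorted(set.intersection(*(set(c) for c in per_system.values())))
    counts = {name: {t: c[t] for t in common_types} for name, c in per_system.items()}
    return common_types, counts


def colour_band(count, band_size=20, max_count=100):
    """Colour band index for a cell: 0-19 -> 0, 20-39 -> 1, ..., 80-100 -> 4."""
    top_band = max_count // band_size - 1
    return min(count // band_size, top_band)


if __name__ == "__main__":
    members = {
        "Mechanical Exhaust Air 1": [("IfcFlowFitting", "i1"), ("IfcFlowTerminal", "i2"),
                                     ("IfcFlowFitting", "i3")],
        "Mechanical Supply Air 2": [("IfcFlowFitting", "i4"), ("IfcFlowTerminal", "i5"),
                                    ("IfcFlowSegment", "i6")],
    }
    types, counts = overview_matrix(members)
    print(types)   # ['IfcFlowFitting', 'IfcFlowTerminal']
    print(counts)
```

Once the matrix exists, answering "which system has the most distribution ports?" is a single lookup per row rather than a manual search through the file.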
To answer the same question from the raw ifcXML file, the user would need to manually search for every system and count the number of distribution ports in each. For large ifcXML files this is an extremely tedious task, as there can be many different systems and a large number of distribution ports in each system.

Figure 3.20: Overview View

Search

The Search view consists of the search bar and a table of elements. When a segment in the overview is clicked, a table of the elements belonging to that segment is generated, showing their ids and names. An id can be selected from this table and entered into the search box, which generates the network view of the selected element. A known id can also be entered directly into the search box, without finding it in the table, to get its network view. The search box and table are shown in Figure 3.21.

Figure 3.21: Search View

History

The history view gives a list of all identifiers whose networks have been generated, helping the user keep track of all visited elements. Figure 3.22 shows that the relationship networks for the identifiers ‘i2229’ and ‘i140134’ have already been generated.

Figure 3.22: History View

Network

The network view shows the user the relationship network of the element selected by its unique identifier. The root node shows the selected element. Second-level nodes show the different relationship specifying element types in which the root node is referenced by its ref value. Third-level nodes show the individual element types connected to the relationship specifying element types in the second level. Finally, leaf nodes show the elements connected to the root node through refs across the second-level and third-level nodes. As seen in Figure 3.23, the root node is coloured red and specifies the identifier whose relationship network is being shown. Relationship specifying element types are shown as blue nodes, individual element types as black nodes, and leaf nodes in green.
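The four-level structure of the network view can be produced from id-ref pairs roughly as follows. This is a sketch under assumptions: the thesis stores each element's network as a .json file read by D3.js, but its exact JSON layout is not given here, so the nested name/children form below (the default shape expected by `d3.hierarchy`) is an assumption, as are the sample fragment, the `IfcRel` tag prefix test and the function name `relationship_network`.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical fragment: two relationships that both reference flow terminal i2229.
SAMPLE = """<uos>
  <IfcRelConnectsPortToElement id="i5001">
    <RelatingPort><IfcDistributionPort ref="i3001"/></RelatingPort>
    <RelatedElement><IfcFlowTerminal ref="i2229"/></RelatedElement>
  </IfcRelConnectsPortToElement>
  <IfcRelContainedInSpatialStructure id="i6001">
    <RelatedElements><IfcFlowTerminal ref="i2229"/><IfcFlowFitting ref="i4001"/></RelatedElements>
    <RelatingStructure><IfcBuildingStorey ref="i8"/></RelatingStructure>
  </IfcRelContainedInSpatialStructure>
</uos>"""


def relationship_network(root, element_id):
    """Build a four-level tree: element -> relationship types -> element types -> ids."""
    tree = {}
    for rel in root:
        if not rel.tag.startswith("IfcRel"):
            continue  # only relationship specifying elements
        refs = [e for e in rel.iter() if e.get("ref") is not None]
        if not any(e.get("ref") == element_id for e in refs):
            continue  # this relationship does not reference the selected element
        by_type = tree.setdefault(rel.tag, {})
        for e in refs:
            if e.get("ref") != element_id:
                by_type.setdefault(e.tag, set()).add(e.get("ref"))
    return {
        "name": element_id,  # root node
        "children": [
            {"name": rel_type, "children": [  # relationship specifying element types
                {"name": el_type,             # individual element types
                 "children": [{"name": i} for i in sorted(ids)]}  # leaf elements
                for el_type, ids in sorted(types.items())]}
            for rel_type, types in sorted(tree.items())],
    }


if __name__ == "__main__":
    print(json.dumps(relationship_network(ET.fromstring(SAMPLE), "i2229"), indent=2))
```

In this sample, expanding ifcRelContainedInSpatialStructure under i2229 leads to the ifcBuildingStorey leaf i8, which is the path used to answer the "which floors are affected" question in the walkthrough of the network view.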
Hovering over a green leaf node shows the identifier of the element. For example, once we type the identifier value ‘i2229’ for a flow terminal into the search bar, we generate its relationship network, which initially shows just the root node. Clicking this node expands the network to show the first level of nodes below the root. These are relationship specifying element types such as ifcRelAssignsToGroup and ifcRelConnectsPortToElement. The network can be expanded further by clicking on one of these relationships. Clicking on ifcRelConnectsPortToElement reveals the individual element types that the flow terminal (i2229) is connected to; in this case, the individual element type is a distribution port. Finally, clicking on this node shows the name of the distribution port that the flow terminal is connected to.

With the relationship network of a specific element (e.g. flow terminal i2229), we can easily find out which other elements it is connected to and through what relationships. This makes it much easier to answer the question ‘which floors will be affected if a flow terminal is removed?’. From the relationship network, we can see that the flow terminal is connected to the individual element type ifcBuildingStorey through the relationship specifying element type ifcRelContainedInSpatialStructure. Clicking on the ifcBuildingStorey node gives us the names of the floors that will be affected if the flow terminal is removed.

Figure 3.23: Network View

3.3.4 Implementation

The main purpose of this project is to help domain and non-domain experts extract relationship information from ifcXML files, so the system needs to be easy to install and run. We chose to design a browser-based system that only requires users to open the .html file; it does not need any additional downloads or installations.

Encoding

The tasks are encoded using two main visualization idioms. The overview view is visualized using a heat map in which each segment is colour coded.
The colour gradient runs from light yellow to dark red, as seen in Figure 3.24. Counts range from 0 to 100 items, with each colour band covering 20 items. Node-link diagrams encode the relationship network: red and green nodes are elements, blue nodes are relationship-specifying element types, and black nodes are individual element types. The data is manipulated by selecting elements in the different views.

Figure 3.24: Colour Gradient

Data
Our original data consists of ifcXML files in which real-world objects are described as elements with different attributes. Each element type has multiple elements, each identified by a unique identifier value. The following data is derived from the ifcXML files:
• A matrix for the overview, whose x axis shows the individual element types common to all subsystems and whose y axis shows the subsystems defined in the ifcXML file as ifcSystem.
• For each segment in the overview, a table of elements identified by their unique identifiers.
• For each element, its relationship network, saved as a .json file. This file is read by a JavaScript file, written with D3.js, that converts the JSON data into a rendered relationship network.

Tools used
The visualization is browser-based and developed using a combination of HTML, CSS and JavaScript [22]. D3.js [10] is a JavaScript library that uses data to drive the creation and control of dynamic, interactive graphics that run in web browsers. Bootstrap [8], a web framework, was used to build the layout of the visualization.

3.3.5 User study
We conducted a user study to validate the usefulness of IfcXMLExplorer. The civil engineers we consulted had already confirmed that the specific set of questions we address is difficult to answer with standard IFC viewers. The 3D views provided by these viewers make it time-consuming to answer questions such as ‘which objects will be affected by the removal of a flow fitting?’.
With an IFC viewer, finding the answer to this question takes at least 15-20 minutes. The user must navigate manually through the 3D representation, following the connections and noting down every object. This is similar to exploring the branches of a tree: the user needs to keep track of every intersection and every path that branches out from it. In addition, 3D representations suffer from occlusion. Together, these impose a heavy cognitive load on the user. As a result, users currently do not try to answer these questions with IFC viewers, and instead prefer to extract the information from raw ifcXML files.

To verify that our tool helps users answer relationship-based questions better, we conducted a user study with two types of participants: those with and those without domain knowledge. For participants with domain knowledge, we selected students from the civil engineering, construction and architecture departments (AEC). For participants without domain knowledge, we selected students with a computer science background (CS). The rationale for this grouping was that one group should have domain knowledge but no knowledge of XML data, while the other should have experience with XML data but no domain knowledge. This let us observe how the participants' background affected the way they interacted with the tool and with the ifcXML file.

Participants in both groups were given the same set of questions and asked to answer them using both the tool and the raw ifcXML file. To make the ifcXML files easier to work with, we allowed participants to use XML viewers and to create basic visualizations such as a tree view. For each task in both groups, we recorded the time taken. We also noted the assumptions and common mistakes made by participants in each group.
After completing the tasks, each participant filled out the information form included in Appendix B. This form consists of basic questions that help us understand the participant; in particular, it asked whether they had prior experience with BIM models and whether they had seen an ifcXML file before, which helped us judge potential biases. Below we present the set of questions (Appendix A) given to participants in both groups. Participants answered the following questions using both IfcXMLExplorer and the ifcXML file, and their screens were recorded to provide video documentation of each session.

Task set
• Which is the biggest subsystem? Write down the name or id. HINT: The biggest subsystem will have the maximum number of elements. Under the relationship ifcRelAssignsToGroup, elements are defined under RelatedObjects, while the subsystem they are related to is defined as an ifcSystem under RelatingGroup.
• Which subsystem has the least number of distribution ports? Write down the name or id. HINT: In the ifcXML file, under RelatedObjects, distribution ports are defined as ifcDistributionPort.
• Find any three id values and their corresponding types of Flow Terminals belonging to the subsystem mechanical supply air 4. HINT: Flow Terminals are defined in ifcXML as ifcFlowTerminal. The type of a flow terminal is defined as ObjectType under its definition.
• For the flow terminal identified by id = ‘i2229’, find the names or ids of all the different relationships it is a part of. HINT: Flow terminals are defined as ifcFlowTerminal. Relationships are any ifcRel elements that have this particular flow terminal in their RelatedElement or RelatedObjects.
• Which relationships contain the flow terminal i2229 and some distribution ports?
• Out of the above relationships, which one do you think is a direct connection between the flow terminal and a distribution port? HINT: A different relationship besides ifcRelAssignsToGroup and ifcRelContainedInSpatialStructure.
• Which distribution ports is it connected to? Note down the id values.

High-level use case: Assume that you know the identifier value of a particular data instance, i18788, which is a Flow Fitting. How will the removal of this particular flow fitting affect the entire system? You need to find the following information:
• Which distribution ports will be directly affected? Write the id values.
• Find two distribution ports that will be affected indirectly. Write the id values. HINT: These distribution ports will belong to the same system as the flow fitting.
• Find the ids of the other elements that will be affected by the removal of this flow fitting.
• Which floor gets disrupted? Write the name or id value. HINT: Floor information can be found defined as ifcBuildingStorey under the relationship ifcRelContainedInSpatialStructure, as a RelatingStructure.

Details about the user study
• Number of participants: The user study had 10 participants in total: 4 in the AEC group and 6 in the CS group.
• Training time was fixed for each participant: for both IfcXMLExplorer and the ifcXML file, we gave a 1-minute introduction explaining the tool and where to find information, followed by 2 minutes to try out each tool.
• The order of the two tools was counterbalanced within each group (AEC and CS students). For example, in the AEC group, 2 participants used the tool first and then the ifcXML file, while the other 2 followed the reverse order. The same was done with the CS students.
• Participants were told when they gave an incorrect answer and asked to look again to find the correct solution.

Results of user study
The first part of this section addresses each individual query in the task set for each participant.
For each query below, the corresponding graph first shows the 6 CS participants, followed by the 4 AEC participants. We then summarize our results by combining similar types of queries.

Figure 3.25: User comparison between IfcXML and IfcXMLExplorer. Participants 1-6 are CS students. Participants 7-10 are AEC students.
Figure 3.26: User comparison between IfcXML and IfcXMLExplorer. Participants 1-6 are CS students. Participants 7-10 are AEC students.
Figure 3.27: User comparison between IfcXML and IfcXMLExplorer. Participants 1-6 are CS students. Participants 7-10 are AEC students.

• Which is the biggest subsystem? Write down the name or id. Figure 3.25(a) shows that participants consistently performed much better with IfcXMLExplorer than with IfcXML. Participant #2 took substantially less time with IfcXML because, unlike the other participants, he did not count the elements in each subsystem.
• Which subsystem has the least number of distribution ports? Write down the name or id. Participants #2 and #6 took less time because they estimated, rather than counted, the distribution ports of each subsystem (Figure 3.25(b)). All other participants counted the distribution ports manually to find the subsystem with the fewest.
• Find any three id values and their corresponding types of Flow Terminals belonging to the subsystem mechanical supply air 4. Participants #7, #8, #9 and #10 (AEC students) took more time than their CS counterparts with the IfcXML file, as they found it difficult to navigate. Being more familiar with XML files and their structure, the CS students found it easier to get the correct answer (Figure 3.25(c)).
• For the flow terminal identified by id = ‘i2229’, find the names or ids of all the different relationships it is a part of. All participants consistently performed well using IfcXMLExplorer. AEC participants took more time than CS participants when using IfcXML (Figure 3.25(d)).
• Which relationships contain the flow terminal i2229 and some distribution ports? Interestingly, participant #5 took less time using the IfcXML file than using IfcXMLExplorer. While using IfcXMLExplorer, he started experimenting with the network view and lost track of the task at hand; he commented later that the network view was well done and engaging (Figure 3.26(a)).
• Out of the above relationships, which one do you think is a direct connection between the flow terminal and a distribution port? Participants #7 and #10 took more time using IfcXMLExplorer because they did not immediately understand the meaning of the phrase ‘direct connection’ (Figure 3.26(b)).
• Which distribution ports is it connected to? Note down the id values. Participants #3 and #5 confused the id value of the relationship with the id value of the distribution port, so in this case they took more time with IfcXML than some AEC students (Figure 3.26(c)).
• Which distribution ports will be directly affected? Write the id values. This high-level use case question is similar to the previous one, which also asks participants to find directly connected distribution ports, and the results (Figure 3.26(d)) are very similar to those in the previous graph.
• Find two distribution ports that will be affected indirectly. Write the id values. Here, the hint confused the CS students, who took more time with IfcXML because they did not understand the meaning of the phrase ‘belong to the same system’. Once told that it had the same meaning as defined earlier, they found the solution easily. AEC students already had this knowledge, so they took less time (Figure 3.27(a)).
• Find the ids of the other elements that will be affected by the removal of this flow fitting. On average, participants in both groups took about the same time with either tool (Figure 3.27(b)).
• Which floor gets disrupted? Write the name or id value. CS students did not know where to find floor information in IfcXMLExplorer, so they took more time than AEC students (Figure 3.27(c)).

From the above results, we can summarize our conclusions as follows.

Participants using IfcXMLExplorer took substantially less time to complete the tasks than when using the ifcXML file. The user study shows that IfcXMLExplorer makes it much easier for users to find relationship information than extracting it from the ifcXML file. This kind of information is difficult to obtain from ifcXML files even with an IFC viewer such as Solibri [5]. Our collaborators in the civil engineering department confirmed this by demonstrating the difficulty of answering relationship-based questions with IFC viewers. To answer a question such as ‘which floors are affected if flow terminal i2229 is removed?’, a user of an IFC viewer must first find the flow terminal and then follow the various connections in the 3D representation to see which other floors the subsystem containing i2229 spans. This is similar to navigating through the branches of a tree. Starting with flow terminal i2229, the user needs to explore every path connected to it. Every time a new intersection is encountered, the user must choose a path and also keep track of the other possible paths from that intersection. Once a path is exhausted, the user must go back and explore the remaining paths. This is a tedious task with a heavy cognitive load. According to civil engineering experts, the above question requires them to navigate through each path and note down each intersection to keep track of the explored paths. With an IFC viewer, answering this question takes at least 15-20 minutes.
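The manual procedure described above follows a fixed pattern. The user picks one path at each intersection, remembers the others, and backtracks when a path is exhausted. This is an iterative depth-first search with an explicit stack. The Python sketch below makes that concrete under two simplifying assumptions. First, every element reachable through port connections counts as "affected". Second, connections are given as plain id pairs rather than parsed from ifcXML. Three of the pairs follow the chain traced in the concluding example of Section 3.3.7; the pair i900/i901 is hypothetical.

```python
from collections import defaultdict

# Connection chain from the concluding example (Section 3.3.7):
# flow fitting i25739 -> OutPort i142622 -> InPort i142613 -> flow segment i16931.
# The pair ("i900", "i901") is a hypothetical, unrelated connection.
CONNECTIONS = [
    ("i25739", "i142622"),   # via IfcRelConnectsPortToElement
    ("i142622", "i142613"),  # via IfcRelConnectsPorts
    ("i142613", "i16931"),   # via IfcRelConnectsPortToElement
    ("i900", "i901"),
]


def build_graph(pairs):
    """Undirected adjacency sets: a connection can be followed in either direction."""
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def reachable_from(graph, start):
    """Iterative depth-first search; the stack holds branches not yet explored."""
    visited = set()
    stack = [start]
    while stack:
        node = stack.pop()  # resume the most recently noted branch
        if node in visited:
            continue
        visited.add(node)
        # each neighbour not yet seen is a branch to come back to later
        stack.extend(n for n in graph[node] if n not in visited)
    visited.discard(start)
    return visited


if __name__ == "__main__":
    graph = build_graph(CONNECTIONS)
    print(sorted(reachable_from(graph, "i25739")))
```

The stack stands in for the engineer's notes on unexplored intersections, and the visited set ensures no element is explored twice. A mistake in either is what makes the manual version error-prone.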
Moreover, the answer is not guaranteed to be correct: if the user makes a mistake during the process, they may lose track of paths and miss part of the solution. As a result, users currently do not try to answer these questions with IFC viewers; instead, they prefer to extract information from raw ifcXML files. IfcXMLExplorer makes this problem much easier. As seen in Figure 3.28, participants using IfcXMLExplorer took on average less than 2 minutes to answer each question, whereas participants using the ifcXML file took around 8 minutes. This suggests that, in general, IfcXMLExplorer makes it much easier for users to complete the given task set.

Participants from the AEC group took less time than CS participants when using IfcXMLExplorer. We attribute this to their additional domain knowledge: we observed that these participants made many correct assumptions without requiring additional help, and they knew the meaning of terms defined in ifcXML, which CS students did not. As a result, they were much faster with IfcXMLExplorer, as Figure 3.28 shows.

Figure 3.28: Comparison between time taken using IfcXMLExplorer and IfcXML file for both groups of participants.

Participants from the CS group took less time than AEC participants when using the ifcXML file. We assume this is because they were familiar with XML files and their structure, which made it easier for them to navigate an ifcXML file and find the necessary information. This result can also be seen in Figure 3.28.

Task time distribution. The questions in the task set can be divided according to the views that need to be explored to find the solutions.
The first two questions in the task set can be answered from the overview in IfcXMLExplorer, the third requires the search view, and the last four need the network view. All four questions in the high-level use case can only be answered with the network view. Observations on these three types of questions are discussed below.
• Overview View: Participants could easily obtain the number of elements (such as flow terminals) without manually counting them in the ifcXML file, and reported that this made it much easier to compare subsystems. With the overview, questions such as ‘Which subsystem has the least number of distribution ports?’ were answered in less than two minutes on average.
• Search View: Participants from both groups agreed that the search view made it easier to find an element whose location is unknown: they could browse through all the subsystems to see which one contained the element they were looking for.
• Network View: The network view helped participants keep track of relationships between elements. Of the four AEC participants, both of those who used the ifcXML file before IfcXMLExplorer gave incorrect answers to the first two network view questions. This suggests that relationship information is hard to find in ifcXML files even for AEC participants with domain expertise.

Results from information form.
• All participants in the AEC group (who had, on average, more than 2 years of experience with BIM models) reported that they did not understand the structure of ifcXML files.
• Only 1 of the 6 participants in the CS group reported understanding ifcXML files.
• Every participant in the AEC group reported that the biggest problem with ifcXML files is their lack of conciseness: information is spread across the file.
• CS students mentioned that, despite understanding XML files, they found ifcXML difficult to work with.

3.3.6 Discussion and future work
IfcXMLExplorer primarily shows relationships between different ifcXML elements. These relationships are hard to extract from the ifcXML data, because finding which elements are connected to each other involves following many id-ref pairs. Our main goal was to design an easy-to-use, easy-to-understand visualization that helps users make sense of complex ifcXML files. The system was designed and iteratively refined based on feedback from experts in civil engineering. Currently, IfcXMLExplorer concentrates only on the relationships between elements; it would be interesting to see what other information can be extracted from ifcXML files. IFC viewers like Solibri [5] already deal with the orientation and spacing of objects in a building, but other aspects may also be worth visualizing, although not all extractable information needs to be visualized. Future work could improve the details of the interface design and extend the functionality of the tool.

To validate the usefulness of our tool, we conducted a user study with two groups. The first group consisted of people with prior knowledge of the structure of XML files but no domain knowledge; the second consisted of domain experts without any XML knowledge. Both groups were given the same set of tasks to perform, once using the visualization tool and once with the ifcXML file. To be fair, we allowed users to use any basic XML viewer and to generate basic visualizations to better understand the ifcXML file. Users from both groups reported that IfcXMLExplorer was much easier to work with. Most users got entangled in the mess of connections in the ifcXML file and had to explore the same elements over and over again. On average, finding answers with the tool took less than one-third of the time needed to find the same answers in the ifcXML file.
We received positive feedback from both groups about the features and design of IfcXMLExplorer.

3.3.7 Concluding example of IfcXMLExplorer
In this section, we take a sample subsystem from the ifcXML file of the CIRS building and describe, step by step, how IfcXMLExplorer helps us find each individual element in this subsystem and understand the connections between them. This demonstrates how IfcXMLExplorer gives the user relationship information that is difficult to find in the ifcXML file or with standard IFC viewers. As seen in Figure 3.29, this subsystem has four elements of three different types: flow fitting, flow segment and building element proxy. The subsystem also has six distribution ports: three InPorts and three OutPorts. Every element is connected to the subsystem through ports. Between two connected elements there are always two ports, one OutPort and one InPort; this is required for flow through the system. Each element is connected to a port through the relationship ifcRelConnectsPortToElement, and ports are connected to each other through the relationship ifcRelConnectsPorts.

In Figure 3.30, we use IfcXMLExplorer to see all the elements present in the system, which are listed under the relationship-specifying element type ifcRelAssignsToGroup. However, this does not tell us how the elements in the subsystem are connected to each other. To find this, we need to generate the relationship network of each individual element and track the connections.

Let us start from the first element, flow fitting i25739 (Figure 3.31). When we enter its identifier value in the search bar of IfcXMLExplorer, the relationship network of this flow fitting is shown.
Clicking on the relationship-specifying element type ifcRelConnectsPortToElement shows that it is connected to the distribution port OutPort-994050. Next, we enter the identifier value of OutPort-994050 (i142622) to generate its relationship network. Under the relationship-specifying element type ifcRelConnectsPortToElement, we now see a connection between flow fitting i25739 and port i142622 (Figure 3.32). In the same relationship network (Figure 3.33), clicking on the ifcRelConnectsPorts node shows that the port is connected to another distribution port, InPort-994048: since i142622 is an OutPort, it is connected to an InPort. Next, we generate the relationship network of InPort-994048 (i142613) and see that under the relationship-specifying element type ifcRelConnectsPorts (Figure 3.34) it is connected to OutPort-994050, and that through ifcRelConnectsPortToElement it is connected to the Flow Segment Round Duct (Figure 3.35). By following the rest of the connections in the system, we can find how all the elements in the system are connected to each other.

Figure 3.29: Schematic representation of the subsystem Mechanical Supply Air 2. Elements excluding ports are shown in rectangular boxes; distribution ports are shown in diamond-shaped boxes.
Figure 3.30: Network view in IfcXMLExplorer which shows the entire subsystem mechanical supply air 2

3.4 Case study conclusions
We addressed how a visualization tool can help solve a real-world problem that is not well addressed by current database techniques. First, we investigated how to retrieve and extract information from ifcXML files. Then we checked the completeness and correctness of the information in the ifcXML files, following the rules in Eastman’s paper [39], "Toward Robust and Quantifiable Automated IFC Quality Validation".
Finally, we designed and developed a visualization tool for exploring and understanding ifcXML data. Our visualization helps users summarize all common individual elements in terms of their subsystems, locate an element whose identifier value is already known, explore the data to find a particular element, and finally view the relationship network of an element to see its relationships to other elements. The main design objective was to make it easier for users to find relationships and connections in ifcXML files without manually navigating through the data file.

Figure 3.31: Connection of Flow Fitting i25739 to Port i142622
Figure 3.32: Connection of Port i142622 to Flow Fitting i25739
Figure 3.33: Connection of Port i142622 to Port i142613
Figure 3.34: Connection of Port i142613 to Port i142622
Figure 3.35: Connection of Port i142613 to Flow Segment i16931

• What (Data): IfcXML files (domain-specific XML files) with real-world objects described in terms of elements with different attributes. Each of these elements has multiple unique data instances with values for each attribute.
• What (Derived): A matrix for the heat map, whose X axis shows the individual element types common to all subsystems and whose Y axis shows the systems (divided into subsystems) defined in the ifcXML file. Each section of the heat map has a table of elements, identified through a global identifier. Each element generates its own node-link diagram in the form of a tree, whose root node is the selected element and whose leaf nodes are the other elements it is connected to through reference identifiers; the middle nodes describe the relationships between the elements.
• Why (Tasks): Actions: summarize and identify from heat maps; search (locate and explore). Target: network topology (relationships between root and leaf nodes).
• How (Encode): Heat maps, node-link diagrams.
• How (Manipulate): Select (elements across the different views).
• How (Facet): Juxtaposed linked views: overview and small multiples.
• How (Reduce): Item filtering.
• Scale: For the CIRS dataset: tens of thousands of elements; 50 element types in total; 5 subsystems.
Table 3.1: Task and data abstraction: this table summarizes our analysis of the what, why and how questions based on Munzner’s model [33].

Chapter 4
XML in other fields
XML visualization is not a new concept. Many online XML viewers [12] extract the hierarchical form of XML data and visualize it as a tree. However, each XML file is different, and XML files from different domains differ greatly. Domain-specific visualizations of XML data have been built, and they are distinct from each other even though they all take XML data as input. This indicates that XML data visualization is highly domain specific.

4.1 Linguistic XML
In XCES [25], the authors created an XML-based standard encoding for linguistic corpora, which shows that the need to standardize XML data arises in other domains as well. Dipper et al. [20] applied visualization to linguistic XML data to find patterns in annotated data. Later, an OWL- and XQuery-based mechanism [35] was used to retrieve linguistic patterns from XML corpora. Finally, in 2008, an ontology of linguistic annotations [16] was created, based on existing standards, for integrating linguistic data in XML form.

4.2 Genomic visualization of XML data
CGView [36] is a Java application and library for generating high-quality, zoomable maps of circular genomes. It converts XML data to a graphical map.
In some cases, XML languages such as PhyloXML [23] were created to store and exchange the structures of evolutionary trees and associated data according to a complex schema described by an XSD. PhyloXML data can in turn be visualized with the Interactive Tree of Life [30] by Letunic and Bork.

4.3 XML data visualization software
GGobi [37] is an interactive and dynamic tool for data visualization; one of the data types it visualizes is XML data. Software architectures based on XML data have also been visualized [24]. XML-based static type checking and dynamic visualization for TCOZ [21] develops a type checker for detecting static semantic errors in the TCOZ specification. There are many more XML data visualization tools, but they all cater to different kinds of XML data.

Chapter 5
Conclusion
In this thesis, we addressed how a visualization tool can help solve a real-world problem that is not well addressed by current database techniques. First, we investigated how to retrieve and extract information from ifcXML files. Then we chose a set of tests to validate the semantic information in the ifcXML files and applied them to our ifcXML files to check their correctness. Once the information was validated, we designed and developed a visualization tool for exploring and understanding ifcXML data. Our visualization, called IfcXMLExplorer, helps users summarize all common non-relationship-specifying elements in terms of their subsystems, locate an element whose identifier value is already known, browse and find a particular element, and finally view the relationship network of an element to see its relationships to other elements.
The main design objective was to make it easier for users to find relationships and connections in ifcXML files without manually navigating through the data file.

Bibliography
[1] BIM: Building Information Model. URL http://www.autodesk.com/solutions/building-information-modeling/overview. Accessed: 2013-07-14.
[2] XML specification DTD. W3C, 1998. URL http://www.w3.org/XML/.
[3] Industry Foundation Classes. URL http://www.buildingsmart.org/standards/ifc. Accessed: 2011-01-07.
[4] Autodesk Revit. URL http://www.autodesk.com/products/revit-family/overview. Accessed: 2010-06-30.
[5] Solibri. URL http://www.solibri.com/. Accessed: 2013-12-01.
[6] XML Schema Definition Language (XSD). W3C, 2012. URL http://www.w3.org/TR/xmlschema11-1/.
[7] BaseX. URL http://basex.org/. Accessed: 2012-05-13.
[8] Bootstrap. URL http://getbootstrap.com/. Accessed: 2012-10-23.
[9] buildingSMART: International home of openBIM. URL http://www.buildingsmart-tech.org/. Accessed: 2011-04-29.
[10] D3.js. URL http://d3js.org/. Accessed: 2013-09-16.
[11] ifcXML. URL http://www.buildingsmart-tech.org/specifications/ifcxml-releases. Accessed: 2012-02-14.
[12] XMLGrid. URL http://xmlgrid.net/. Accessed: 2014-05-10.
[13] S. Agreste, P. De Meo, E. Ferrara, and D. Ursino. XML matchers: approaches and challenges. Knowledge-Based Systems, 66:190–209, 2014.
[14] R. Amor. A better BIM: Ideas from other industries. In Proceedings of the 2008 CIB W78 Conference. Citeseer, 2008.
[15] D. Chamberlin. XQuery: An XML query language. IBM Systems Journal, 41(4):597–615, 2002.
[16] C. Chiarcos. An ontology of linguistic annotations. In LDV Forum, volume 23, pages 1–16, 2008.
[17] J. Clark, S. DeRose, et al. XML Path Language (XPath) version 1.0, 1999.
[18] P. De Meo, G. Quattrone, G. Terracina, and D. Ursino. Integration of XML schemas at various severity levels. Information Systems, 31(6):397–434, 2006.
[19] I. Denmark. IFC exchange test between 3D CAD applications, 2006.
[20] S. Dipper and M. Götze. Accessing heterogeneous linguistic data: generic XML-based representation and flexible visualization. In Proceedings of the 2nd Language & Technology Conference 2005, pages 23–30, 2005.
[21] J. S. Dong, Y. F. Li, J. Sun, J. Sun, and H. Wang. XML-based static type checking and dynamic visualization for TCOZ. In Formal Methods and Software Engineering, pages 311–322. Springer, 2002.
[22] D. Flanagan. JavaScript: The Definitive Guide. O’Reilly Media, Inc., 2006.
[23] M. V. Han and C. M. Zmasek. phyloXML: XML for evolutionary biology and comparative genomics. BMC Bioinformatics, 10(1):356, 2009.
[24] D. I. Houlding. Method and system for providing visualization of underlying architecture of a software system, Aug. 19 2008. US Patent 7,415,697.
[25] N. Ide, P. Bonhomme, and L. Romary. An XML-based encoding standard for linguistic corpora. In Proceedings of the Second International Conference on Language Resources and Evaluation, pages 825–830, 2000.
[26] G. Kasneci, F. M. Suchanek, G. Ifrim, M. Ramanath, and G. Weikum. NAGA: Searching and ranking knowledge. In Data Engineering, 2008. ICDE 2008. IEEE 24th International Conference on, pages 953–962. IEEE, 2008.
[27] A. Kiviniemi. IFC certification process and data exchange problems. In Proceedings of the 2008 ECCPM Conference, page 6, 2009.
[28] M. Laakso and A. Kiviniemi. The IFC standard: A review of history, development, and standardization, information technology. ITcon, 17(9):134–161, 2012.
[29] M. L. Lee, L. H. Yang, W. Hsu, and X. Yang. XClust: clustering XML schemas for effective integration. In Proceedings of the Eleventh International Conference on Information and Knowledge Management, pages 292–299. ACM, 2002.
[30] I. Letunic and P. Bork. Interactive Tree of Life v2: online annotation and display of phylogenetic trees made easy. Nucleic Acids Research, page gkr201, 2011.
[31] G. Li, B. C. Ooi, J. Feng, J. Wang, and L. Zhou. EASE: an effective 3-in-1 keyword search method for unstructured, semi-structured and structured data. In Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, pages 903–914. ACM, 2008.
[32] R. Lipman, M. Palmer, and S. Palacios. Assessment of conformance and interoperability testing methods used for construction industry product models. Automation in Construction, 20(4):418–428, 2011.
[33] T. Munzner. Visualization Analysis and Design. A K Peters Visualization Series. Taylor and Francis / CRC Press, 2014.
[34] E. Prud’hommeaux, A. Seaborne, et al. SPARQL query language for RDF. 2006.
[35] G. Rehm, R. Eckart, and C. Chiarcos. An OWL- and XQuery-based mechanism for the retrieval of linguistic patterns from XML-corpora. Corpus, 2(3):1, 2007.
[36] P. Stothard and D. S. Wishart. Circular genome visualization and exploration using CGView. Bioinformatics, 21(4):537–539, 2005.
[37] D. F. Swayne, D. T. Lang, A. Buja, and D. Cook. GGobi: evolving from XGobi into an extensible framework for interactive data visualization. Computational Statistics & Data Analysis, 43(4):423–444, 2003.
[38] H. S. Thompson, D. Beech, M. Maloney, et al. XML Schema Part 1: Structures second edition, 2004.
[39] W. Solihin, C. Eastman, and Y. Lee. Toward robust and quantifiable automated IFC quality validation, 2014.
[40] H. Wang and C. C. Aggarwal. A survey of algorithms for keyword search on graph data. In Managing and Mining Graph Data, pages 249–273. Springer, 2010.
[41] S. Yang, Y. Wu, H. Sun, and X. Yan. Schemaless and structureless graph querying. Proceedings of the VLDB Endowment, 7(7):565–576, 2014.
[42] J. Zhang. Evaluations on XML standards for actual applications. Master's thesis, University of British Columbia, 2013.

Appendix A
User Study Questions

The participants are required to answer the following questions with the help of the technology provided to them.

• Which is the biggest subsystem? Write down the name or id. HINT: The biggest subsystem will have the maximum number of elements. Under the relationship ‘ifcRelAssignsToGroup’, the elements are defined under RelatedObjects, while the subsystem they are related to is defined as an ‘ifcSystem’ under RelatingGroup.
• Which subsystem has the fewest distribution ports? Write down the name or id. HINT: In the ifcXML file, distribution ports are defined under RelatedObjects as ‘ifcDistributionPort’.
• Find the id values and corresponding types of any three Flow Terminals belonging to the subsystem ‘mechanical supply air 4’. HINT: Flow Terminals are defined in ifcXML as ‘ifcFlowTerminal’. The type of a flow terminal is defined as ‘ObjectType’ within its definition.
• For the flow terminal identified by id = ‘i2229’, find the names or ids of all the different relationships it is a part of. HINT: The flow terminal will be defined as ‘ifcFlowTerminal’. Relationships are any ‘ifcRel’ elements that have this particular flow terminal in their RelatedElement or RelatedObjects.
• Which relationships contain the flow terminal i2229 and some distribution ports?
• Of the above relationships, which one do you think is a direct connection between the flow terminal and a distribution port? HINT: A relationship other than ‘ifcRelAssignsToGroup’ and ‘ifcRelContainedInSpatialStructure’.
• Which distribution ports is it connected to? Note down the id values.

High-level use case: Assume that you know the identifier value of a particular data instance. Let this be i18788, which is a Flow Fitting. How will the removal of this particular flow fitting affect the entire system? You need to find the following information:

• Which distribution ports will be directly affected? Write id values.
• Find two distribution ports that will be affected indirectly. Write id values. HINT: These distribution ports will belong to the same system as the flow fitting.
• Find the ids of the other elements that will be affected by the removal of this flow fitting.
• Which floor gets disrupted? Write name or id value. HINT: Floor information can be found defined as ‘ifcBuildingStorey’ under the relationship ‘ifcRelContainedInSpatialStructure’ as a RelatingStructure.

Appendix B
User Study Information Form

• In what age group are you?
  – 18 and under
  – 19 - 25
  – 26 - 35
  – 36 - 45
  – 46 - 55
  – 55 and above
• Gender?
  – Male
  – Female
  – I prefer to not disclose
• Which department do you belong to?
• What program are you part of?
• Which year of your program are you currently in?
• Do you have experience with BIM Models? If yes, how many years?
• Before today, have you seen an ifcXML file?
  – Yes
  – No
• Do you understand the structure of ifcXML files?
  – Yes
  – No
• What would you like to change about them?
• What would you like to change about the visualization?

