Probabilistic Reasoning with Undefined Properties inOntologically-Based Belief NetworksbyChia-Li KuoB.Sc., The University of British Columbia, 2011A THESIS SUBMITTED IN PARTIAL FULFILLMENTOF THE REQUIREMENTS FOR THE DEGREE OFMASTER OF SCIENCEinTHE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES(Computer Science)THE UNIVERSITY OF BRITISH COLUMBIA(Vancouver)October 2013c? Chia-Li Kuo, 2013AbstractThis thesis concerns building probabilistic models with an underlying ontology thatdefines the classes and properties used in the model. In particular, it considers theproblem of reasoning with properties that may not always be defined. Furthermore,we may even be uncertain about whether a property is defined for a given individ-ual. One approach is to explicitly add a value ?undefined? to the range of randomvariables, forming (what we call) extended belief networks. Adding an extra valueto a random variable?s range, however, has a large computational overhead. In thiswork, we propose an alternative, ontologically-based belief networks, where prop-erties are only used when they are defined. We show how probabilistic reasoningcan be carried out without explicitly using the value ?undefined? during inference.This, in general, requires that we perform two probabilistic queries to determine(1) the probability that the hypothesis is defined and (2) the probabilities of thehypothesis given it is defined. We prove this is equivalent to reasoning with thecorresponding extended belief network and empirically demonstrate on syntheticmodels that inference becomes more efficient.iiPrefaceThis thesis is largely based on the published work, Kuo et al. [18], by myselfwith collaborators David Buchman, Arzoo Katiyar, and David Poole. The paperwas the result of many active discussions among all the co-authors and numeroussubsequent refinements. Specific contributions of each co-author are listed below:? As the main contributor and co-author, I proposed and formalized the ideasin detail, wrote the original draft of the paper, and conducted all the exper-iments to compare the performance of different approaches. Additionally,I was responsible for all the proofs in Appendix A. I have worked in closeconsultation with David Poole and been frequently involved in discussionswith David Buchman.? David Buchman actively engaged in discussions and contributed fresh per-spectives and skepticism toward our work. He was the first to propose somekey insights into the problem we considered. He also revised, proofread, andgave comments on drafts of the paper.? Arzoo Katiyar, during her summer internship at UBC, mainly helped withthe implementation of a Java program to test several preliminary ideas at theearly stage of the work.? David Poole introduced this research topic and supervised the work throughits entirety. He constantly advised, corrected, criticized, and discussed thework. Several portions of the paper have been revised and modified by DavidPoole after its initial draft by myself. The experiments used part of the open-source code that was originally developed by Poole and Zhang [28].iiiTable of ContentsAbstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iiPreface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iiiTable of Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ivList of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viAcknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.1 Thesis Contribution . . . . . . . . . . . . . . . . . . . . . . . . . 21.2 Thesis Organization . . . . . . . . . . . . . . . . . . . . . . . . . 22 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32.1 Ontologies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32.2 Belief Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . 62.2.1 Representation . . . . . . . . . . . . . . . . . . . . . . . 72.2.2 Probabilistic Inference . . . . . . . . . . . . . . . . . . . 92.2.3 Context-Specific Independence . . . . . . . . . . . . . . 112.3 Ontologies and Probabilistic Models . . . . . . . . . . . . . . . . 123 Ontologically-Based Belief Networks (OBBNs) . . . . . . . . . . . . 173.1 Representation . . . . . . . . . . . . . . . . . . . . . . . . . . . 173.1.1 Preliminary Definitions and Notation . . . . . . . . . . . 173.1.2 Structural and Probabilistic Constraints . . . . . . . . . . 18iv3.2 Semantics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213.3 Inference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223.4 Equivalence of Representations . . . . . . . . . . . . . . . . . . . 254 Empirical Comparison . . . . . . . . . . . . . . . . . . . . . . . . . 275 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 335.1 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 335.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35A Lemmas and Proofs . . . . . . . . . . . . . . . . . . . . . . . . . . . 39A.1 Correctness of 3Q-INFERENCE . . . . . . . . . . . . . . . . . . . 42vList of FiguresFigure 2.1 The DAG structure of a simple belief network for Example 2. 7Figure 2.2 Tabular representation of the CPDs for the variables in the be-lief network for Example 2. . . . . . . . . . . . . . . . . . . 8Figure 2.3 Two confactors representing the set of CPDs for admitted inthe belief network in Example 2. . . . . . . . . . . . . . . . . 12Figure 2.4 A tree-structured representation of the CPDs for causeDamagegiven its parents isAnimal, isHuman, and education. The leafnodes specify probability distributions over {true, f alse}. . . 14Figure 2.5 The graph structure of a belief network using the extendedvariants of the random variables in Example 6. . . . . . . . . 15Figure 2.6 The graph structure of an ontologically-based belief networkfor Example 6. . . . . . . . . . . . . . . . . . . . . . . . . . 15Figure 3.1 An enumeration of all possible worlds represented by the OBBNin Example 6. Every path from the root to a leaf correspondsto a maximal well-defined conjunction of variable assignments. 22Figure 4.1 A pictorial visualization of a TAF-structured ontology with 13properties, namely p0, p1, . . . , p12. . . . . . . . . . . . . . . . 27Figure 4.2 A comparison of running times for probabilistic queries withthe OBBNs and their corresponding EBNs, using the three dif-ferent approaches. The x-axis is linearly spaced, whereas they-axis is spaced using the log scale. . . . . . . . . . . . . . . 29viFigure 4.3 A comparison of running times for probabilistic queries usingdifferent modified OBBNs and EBNs. The x-axis is linearlyspaced, whereas the y-axis is spaced using the log scale. . . . 30Figure 4.4 Two ?side-by-side? boxplots of 30 data of running times forregular OBBNs (in red) and extra-edged OBBNs (in blue). Thelower and upper sides of the boxes are the first and third quar-tiles, whereas the middle green bars correspond to the medi-ans. The whiskers extending from the boxes corresponding tonon-outlier extreme points. . . . . . . . . . . . . . . . . . . . 32viiAcknowledgmentsI owe to all who have helped make this thesis possible.Foremost, I am deeply indebted to my research supervisor David Poole. Thisthesis would not have seen the light of day without his continuing guidance andsupport.I thank all STAR-AI members, my thesis second examiner Giuseppe Carenini,and many other colleagues along my study. Every piece of their comments andcritiques on my work has made me a better researcher.I am particularly grateful to my friends in the department. Over the two years,they have always been a source of welcoming distractions from academic life. Theymade my experience at UBC more enjoyable and memorable.Finally, I would like to express my sincerest gratitude to my brother Chia-TungKuo, who has added more meaning to my life. Those moments we shared wereconstant reminders to me that there was more to life than my graduate study.viiiChapter 1IntroductionIn many fields, we want ontologies to define the vocabulary and probabilistic mod-els to make predictions. For example, in geology the need to define standardizedvocabularies becomes clear when one looks at detailed geological maps, whichdo not match at political boundaries because jurisdictions use incompatible vo-cabularies. There has been much recent effort to define ontologies for geologyand use them to describe data (e.g., http://onegeology.org/). Geolo-gists need to make decisions under uncertainty and so need probabilistic modelsthat use ontologies. Similarly in the biomedical domain, huge ontologies (e.g.,http://obofoundry.org/) are being developed and need to be integratedinto decision making under uncertainty.In an ontology, the domain of a property specifies the individuals for whichthe property is defined. One of the problems in the integration of ontologies andreasoning under uncertainty arises when properties have non-trivial domains andthus are not always defined. Properties applied to individuals correspond to randomvariables in the probabilistic model, giving rise to random variables that are notalways defined. For example, property education may be applicable to peoplebut not to dogs or rocks. However, we may not know if an individual is a person.Performance on certain task may depend on his/her education level if the individualis a person, such as when some damage may have been caused by a crafty person(depending on his/her education level) or by a natural phenomenon. Similarly,hardness measure (in Mohs scale [22]) may be applicable to rocks but not to people.1When modelling and learning, we do not want to think about undefined values, andwe will never observe an undefined value in a dataset that obeys the ontology; forinstance, we do not want to consider the education level of rocks, and no datawould contain such information.1.1 Thesis ContributionWe propose a simple framework, ontologically-based belief networks (OBBNs),which integrates belief networks [25] with ontologies. OBBNs do not explicitlyuse an undefined value in the construction of random variables, but by leveragingthe ontology we can compute the posterior distribution of any random variable, in-cluding the probability that it is undefined. We define three inference approaches tocomputing the posterior distribution of a random variable. The first approach con-structs a regular belief network, which we call an extended belief network (EBN),that explicitly includes an extra value ?undefined? in the range of the random vari-ables. Then standard belief network inference algorithms are applied. The secondapproach does not include ?undefined? but involves two separate inference stepswith the OBBN: (1) determining the probability that the query is well-defined and(2) querying a random variable as if the well-definedness of the query were ob-served evidence. The third is a mix of the first two approaches and only adds?undefined? to the range of the target variable at query time.1.2 Thesis OrganizationChapter 2 provides background on ontologies, belief networks, and the interplaybetween them. Chapter 3 starts with preliminary definitions and notation, and thenformally defines OBBNs and specifies how probabilistic reasoning is carried out inOBBNs without using any undefined value in the model, whereas the probabilitythat a random variable is undefined can still be obtained as the result of inference.In Chapter 3, we also present the coherence of OBBNs as well as the validity of theproposed inference scheme by comparing OBBNs to the equivalent EBNs. Someempirical results are presented in Chapter 4. Proofs are included in Appendix A.Chapter 5 concludes the work and discusses possible future research directions.2Chapter 2BackgroundThis chapter presents some background material for the rest of this thesis. Sec-tion 2.1 and 2.2 review the basics of ontologies and belief networks, respectively.Section 2.3 discusses the interplay between ontologies and belief networks, andintroduces the approach that we take to building ontologically-based probabilisticmodels.2.1 OntologiesAn ontology is a formal specification of meanings of symbols in information sys-tems [31]. Ontologies provide researchers and practitioners with standardized vo-cabulary in which to describe the world. An ontology defines any terminology thatneeds to be shared among datasets.Modern ontologies, such as those written in the Web Ontology Language (OWL)[16, 23], are defined in terms of classes, properties, and individuals. The semanticsof OWL is defined in terms of sets [24] ? a class is the set of all possible individualsit may contain, and a property is a binary relation, which is the set of all orderedpairs for which the property holds.Properties have domains and ranges. The domain of a property is the class ofindividuals for which the property is defined. Thus a property is only defined forindividuals in its domain; for other individuals, the property is not defined. Therange of a property specifies the set of possible values for the property.3It has been advocated that ontologies be written in terms of Aristotelian def-initions [1, 3, 27], where each explicitly specified class is defined in terms of asuperclass, the genus, and restrictions on properties (e.g., that some property has aparticular value), the differentia, that distinguish this class from other subclasses ofthe genus. This is not a restrictive assumption, as we can always define a propertythat is true of members of a class. It allows us to just model properties, with classesbeing defined in terms of property restrictions.T hing is the single top-level class, of which every other class is a subclass, andT hing contains all possible individuals.Example 1. The functional property education is defined for humans, and its rangeis {low,high}. This can be specified in OWL-2 functional syntax:Declaration(ObjectProperty(:education))FunctionalObjectProperty(:education)ObjectPropertyDomain(:education :Human)ObjectPropertyRange(:educationObjectUnionOf(:low :high))A human is any individual in the class Animal for which the property isHumanis true:Declaration(Class(:Human))EquivalentClasses(:HumanObjectIntersectionOf(DataHasValue(:isHuman "true"^^xsd:boolean):Animal))Declaration(DataProperty(:isHuman))FunctionalDataProperty(:isHuman)DataPropertyDomain(:isHuman :Animal)DataPropertyRange(:isHuman xsd:boolean)Here we stated that isHuman is only defined for animals. This makes the ex-ample later more interesting. An animal is an individual for which the propertyisAnimal is true and can be defined in a similar way. The domain of isAnimal isT hing and its range is a Boolean value.4xsd in the above OWL-2 specification stands for the XML Schema Definitionand refers to the schema where boolean is defined.In Example 1, the genus and differentia of Animal are T hing and isAnimal =true, respectively, whereas those of Human are Animal and isHuman = true. Theproperty education has Human as its domain and is thus undefined when appliedto an individual for which isHuman is not true. Likewise, isHuman has domainAnimal and so is only defined for individuals of the class Animal.Definition 1. An ontology O is a set of logical assertions about a set C of classes,a set P of properties, and a set I of individuals. Each property p ? P has a domainand a range, where the domain is the class for which p is only defined, and therange is the set of values that p can take.We use O.C and O.P to refer to the classes and properties, respectively, ofontology O. We denote by range(p) the range of property p.Definition 2. An Aristotelian ontology is an ontology O where every class C ?O.C is either T hing or is defined as C? ? O.C conjoined property restrictions.In this work, we only consider property restrictions of the conjunctive formp1 = v1? ?? ?? pk = vk, where pi ? O.P, vi ? range(pi), and pi is defined for indi-viduals in C? for which p1 = v1??? ?? pi?1 = vi?1.In an Aristotelian ontology O, we can reduce any class C ? O.C to a conjunc-tion of property restrictions (with T hing omitted), which we refer to as the prim-itive form of class C. For property p ? O.P, we denote by dom(p) the primitiveform of the domain of p.Definition 3. An acyclic Aristotelian ontology is an Aristotelian ontology thatdoes not contain cyclic definitions of property domains in the sense that any classthat defines the domain of a property cannot be defined in terms of that property.The acyclic restriction avoids the possibility of defining two classes in terms ofeach other. It ensures that to determine whether a property is defined for an individ-ual cannot involve applying the property to that individual. An acyclic Aristotelianontology O thus induces a partial ordering of the properties: for every property5p ?O.P, any property p? appearing in dom(p) precedes p, written as p? ? p. Aris-totelian definitions with acyclicity give rise to a class hierarchy and reduce classdefinitions to property restrictions. In Example 1, dom(education) ? isAnimal =true? isHuman = true.Definition 4. A formula is logical sentence written in the language (e.g., OWL)used for an ontology.A formula is entailed by a ontology if the formula is always true given the setof logical assertions in the ontology. As a simple example, if an ontology specifiesthat Student is a subclass of Human and Human is a subclass of Animal, then theontology entails that Student is a subclass of Animal, i.e., any individual in theclass Student is also in the class Animal. In Example 1, the ontology entails theformula: education = low? isHuman = true. There exist ontology reasoners forOWL (e.g., Pellet1 and HermiT2) that determine such entailments. We write O |=?if formula ? is entailed by ontology O.2.2 Belief NetworksProbabilistic graphical models [21, 25] compactly represent factorizations of jointprobability distributions in terms of graphs that encode conditional (in)dependencies.A belief network (as known as Bayesian network) is a directed probabilistic graph-ical model where the factorization corresponds to the conditional probability dis-tributions (CPDs) of the random variables. This chapter briefly reviews the basicsof belief networks, probabilistic inference, and the notion of context-specific inde-pendence to the levels of detail that are relevant to the rest of this thesis.We use capital, italicized letters to denote random variables and calligraphicletters for logical formulae that are built by instantiations of random variables.Theset of possible values for random variable X is denoted range(X). Boldface let-ters represent sets of random variables. Our work assumes discrete, finite randomvariables, where |range(X)| is finite for every variable X .1http://clarkparsia.com/pellet2http://hermit-reasoner.com/62.2.1 RepresentationA belief network consists of a qualitative graph structure that encodes conditional(in)dependences among the random variables and a quantitative part that specifiesthe CPDs of each random variable given each possible instantiation of its parentvariables. A belief network defines a joint probability distribution over a set ofvariables in terms of the CPDs for each individual variable.Definition 5. A belief networkN is a tuple (G,?) where G = (X,E) is a directedacyclic graph (DAG); X = {X1, . . . ,Xn} is a set of nodes that correspond to randomvariables, and E ? X?X is a set of directed edges between the nodes. For eachvariable Xi ?X, the set of variables X j such that?X j,Xi??E is called the parents ofXi, denoted pa(Xi). ? is a set of conditional probability distributions that includes,for each variable Xi, one CPD for each instantiation PXi of pa(Xi), written Pr[Xi |PXi ].Example 2. Whether a university will admit an applicant for graduate study de-pends on the applicant?s undergraduate GPA and whether he/she has strong ref-erences. Whether the applicant can obtain strong references depends on his/herundergraduate GPA. Figure 2.1 gives the graph structure of a belief network thatmodels the relationships among the three random variables: gpa, re f erences, andadmitted, where range(gpa) = {low,high}, range(admitted) = {true, f alse} andrange(re f erences) = {strong,weak}. The associated conditional probabilities aregiven in Figure 2.2.admittedre f erencesgpaFigure 2.1: The DAG structure of a simple belief network for Example 2.Definition 6. In a directed graph G = (X,E), a topological ordering is an orderingX1, . . . ,Xn of the nodes such that Xi ? X j if?Xi,X j?? E.7gpa re f erences admittd Valuelow weak true 0.1low weak f alse 0.9low strong true 0.7low strong f alse 0.3high weak true 0.4high weak f alse 0.6high strong true 0.7high strong f alse 0.3(a) CPDs for admitted.gpa Valuelow 0.8high 0.2(b) CPD for gpa.gpa re f erences Valuelow strong 0.2low weak 0.8high strong 0.7high weak 0.3(c) CPD for re f erences.Figure 2.2: Tabular representation of the CPDs for the variables in the beliefnetwork for Example 2.The DAG implies the existence of a topological ordering of the random vari-ables in a belief network. Given a topological ordering X1, . . . ,Xn of the variables,a belief network satisfies the local Markov property: for any variable Xi ? X,Xi is conditionally independent of its non-descendants given its parents. That is,Pr[Xi | X1, . . . ,Xi?1] = Pr[Xi | pa(Xi)].By the chain rule for probability and the local Markov property in an beliefnetwork, the probability of any instantiation X ? X1 = x1 ? ?? ? ?Xn = xn of thevariables can be computed using only the conditional probabilities asPr[X1 = x1??? ??Xn = xn] =n?i=1Pr[Xi = xi | X1 = x1??? ??Xi?1 = xi?1] (2.1)=n?i=1Pr[Xi = xi | PXi ], (2.2)where PXi denotes the instantiation of the parent variables of Xi in X .8Example 3. In Example 2,Pr[gpa = high? re f erences = strong?admitted = true]= Pr[gpa = high]Pr[re f erence = strong]Pr[admitted = true | gpa = high? re f erences = strong].2.2.2 Probabilistic InferenceOne of the most common inference problems with belief networks is conditionalprobabilistic queries ? asking the posterior distribution of a subset of the randomvariables given some observation. When there is no observation, such queriesamount to finding the marginal distribution of a subset of variables with respectto the joint distribution. We give a high-level description of arguably the simplestalgorithm for such probabilistic queries ? variable elimination (VE) [11, 34] ?which is a dynamic programming algorithm [2, 4] based on the simple principle ofdistributing sums over products.Consider querying some variable X j given the observation Y ? Y1 = y1? ?? ??Yk = yk, where {Y1, . . . ,Yk} ? X and X j /? {Y1, . . . ,Yk}. Let Z = {Z1, . . . ,Zl} bethe non-queried, non-observed variables (i.e., Z = X?Y?{X j}). The posteriorprobabilities of X j can be computed by first setting the values for the observedvariables and successively summing out (i.e., marginalizing out) the variables in Z,and then normalizing the result:Pr[X j | Y1 = y1??? ??Yk = yk]=Pr[X j ?Y1 = y1??? ??Yk = yk]Pr[Y1 = y1??? ??Yk = yk](2.3)? Pr[X j ?Y1 = y1??? ??Yk = yk] (2.4)=?Z1. . .?ZlPr[X j,Z1, . . . ,Zl ?Y1 = y1??? ??Yk = yk] (2.5)=?Z1. . .?Zln?i=1Pr[Xi | pa(Xi)Y ], (2.6)where pa(Xi)Y incorporates the observation Y if pa(Xi) contains any observed vari-9able in Y.In a tabular implementation (e.g., Figure 2.2), the CPDs of each variable arestored as a factor, which is a table that has an entry for each combination ofvalue assignments to the variables. Mathematically, a factor on a set of vari-ables X1, . . . ,Xk is a function from range(X1)? ?? ? ? range(Xk) to non-negativereal numbers. For each variable X in a belief network, there is an initial factor on{X}?pa(X) that stores the conditional probabilities of X (or prior probabilities inthe case when X has no parents). Factors are also used to represent intermediateresults of multiplying factors and summing out variables in factors.VE exploits the fact that when summing out some variable Zi in Equation 2.6,the CPDs that do not involve Zi can be distributed out of the summation for Zi, andthus only necessary CPDs are multiplied together before summing out Zi. Givenan elimination (or summation) ordering for Z1, . . . ,Zl , VE iteratively eliminateseach Zi by multiplying the terms that involve Zi and marginalizing out Zi from theproduct. A high-level layout of the variable elimination algorithm is presented inAlgorithm 1, where we do not delve into the details of the primitive factor opera-tions such as eliminating a variable and normalizing the values.Algorithm 1: VARIABLE-ELIMINATIONInput: A set F of factors, a set Q of query variables, and observationE ? Y1 = y1??? ??Yk = yk.Output: A factor Fpos representing posterior distribution P(Q | E).Let X denote the set of all random variablesZ? X?Q?{Y1, . . . ,Yk}for factor F ? F such that F involves some Yi doSet every entry of F with Yi 6= yi to 0endwhile there is a factor in F involving a variable in Z doSelect some variable Z ? ZF? result of eliminating Z from factors in FZ? Z?{Z}endFpos? multiplying all factors in F and normalizingreturn Fpos102.2.3 Context-Specific IndependenceWe introduce the notion of context-specific independence [5, 35] and present itsformal definition. Context-specific independence captures additional finer inde-pendence relations among random variables not revealed on the topological level ofbelief networks. By using more compact, structured representations for the CPDs,the variable elimination algorithm can be adapted to exploit context-specific inde-pendence in a belief network, making inference more efficient [28].Definition 7. [5] Let X, Y, Z, and C be pairwise disjoint sets of random variables,X and Y are contextually independent given Z and context C ? range(C) if Pr[X |Y ?Z ?C] = Pr[X | Z ?C] for any instantiations X , Y , and Z .Example 4. In Example 2, admitted is contextually independent of gpa given thecontext re f erences = strong since the CPDs for admitted satisfy thatPr[admitted | re f erences = strong?gpa = high]= Pr[admitted | re f erences = strong?gpa = low].(2.7)In this particular example, given the context that an applicant has strong references,his/her GPA does not influence the probability of admission.Poole and Zhang [28] devised a structure called contextual factor (or confac-tor) and adapted VE to use confactors to exploit context-specific independence. Aconfactor is a combination of a context and a factor, where the factor is a table andthe context specifies when this factor is applicable.Example 5. The confactor representation of the CPDs for admitted in Example 2is shown in Figure 2.3. Compared to the purely tabular representation in Fig-ure 2.2a, fewer parameters need to be specified in the confactors.While the high-level algorithmic procedure stays the same as in Algorithm 1,the original primitive operations in VE are modified, and additional operations areintroduced to manipulate confactors and to ensure that confactors are only multi-plied when their contexts do not conflict with each other. The resulting algorithm iscalled contextual variable elimination (CVE) and has been empirically demon-strated to be much more efficient when the belief network contains context-specific11?re f erences = weak,gpa admitted Valuelow true 0.1low f alse 0.9high true 0.4high f alse 0.6??re f erences = strong,admitted Valuetrue 0.7f alse 0.3?Figure 2.3: Two confactors representing the set of CPDs for admitted in thebelief network in Example 2.independence to exploit [28]. CVE reduces to VE when there is no context-specificindependence. In such case, every confactor is simply a factor with an empty con-text.2.3 Ontologies and Probabilistic ModelsThere has been a body of research on probabilistic ontologies [e.g., 6, 7, 20], whichattempts to add probabilities into ontology constructions and develop the syntax forprobabilities in ontology languages such as OWL. Our work takes a fundamentallydifferent approach. We adopt the approach of Poole et al. [26], where ontologiesform logical constraints that come logically prior to data and to probabilistic mod-els. For example, as we know (as part of our ontology) that humans are mammals,we do not expect any dataset to say explicitly that some person is a human and amammal (if a person is said to be an animal, it is a different sense of animal). Thedata and the models are written taking into account the underlying ontology.We follow the integration of ontologies and (relational) probabilistic modelsof Poole et al. [26, 27], where random variables are constructed from propertiesand individuals. When modelling uncertainty, a functional property applied to anindividual becomes a random variable, where the range of the random variableis the range of the property. A non-functional property applied to an individualbecomes a Boolean random variable for each value in the range of the property.We can arrange these random variables into a DAG which specifies (conditional)dependencies.12A relational probabilistic model [10, 14], also referred to as a template-basedmodel [17], extends a probabilistic graphical model with the relational (or first-order) component and can be defined in terms of parametrized random variables.Given a population of individuals, such models can be grounded into standardgraphical models.While an ontologically-based probabilistic model is inherently relational (asthe ontology is about properties of individuals), the issues considered in this thesisdo not depend on the relational structure. For the rest of the thesis, we discussthe propositional version, which can be considered as either being about a singledistinguished individual or about the grounding of a relational network, with theindividuals implicit. Our work concerns directed graphical models.The corresponding property of a random variable is undefined for individualsnot in its domain. We need a way to represent that a random variable is definedonly in some contexts. As Poole et al. [27] noticed, the idea of using a randomvariable only in a context in which it is defined is reminiscent of context-specificindependence [5, 35]. Here instead of context-specific independence, a randomvariable is simply not defined in some contexts.Definition 8. For each random variable X , the extended variant of X is a randomvariable X+ with range(X+) = range(X)?{?}, where X+ = X when X is defined,and X+ =? when X is undefined.The alternatives are best given in terms of a motivating example:Example 6. Continuing Example 1, suppose the ontology contains the propertycauseDamage, whose domain is T hing and range is {true, f alse}, which specifieswhether an individual is capable of causing some particular damage.Suppose we have the following conditional probabilities for causeDamage.For non-animals, there is a small probability (e.g., 0.1) that causeDamage = true.For animals that are not human, the probability of causeDamage = true is higher(e.g., 0.3). For humans, the distribution of causeDamage depends on education.When education is high, the probability of causeDamage = true is 0.9, and wheneducation is low, the probability of causeDamage = true is 0.5. A tree-structuredrepresentation of the CPDs for causeDamage given its parents isAnimal, isHuman,and education is given in Figure 2.4.13isAnimalisHumaneducation(0.1, 0.9)(0.9, 0.1) (0.5, 0.5)(0.3, 0.7)true f alsetrue f alsehigh lowFigure 2.4: A tree-structured representation of the CPDs for causeDamagegiven its parents isAnimal, isHuman, and education. The leaf nodesspecify probability distributions over {true, f alse}.Note that the conditional probabilities in Example 6 obey the constraint thata property is only used in a context where it is defined. In Figure 2.4, the CPDtree cannot split on education in any other context. The conditional probabilitiesnaturally exhibit context-specific independence.A straightforward approach to dealing with potentially undefined variables is tobuild probabilistic models and perform inference using the extended random vari-ables and standard inference methods. In Example 1 and 6, the ontology impliesthe probabilities:Pr[isHuman+ =? | isAnimal+ = v] = 1, v ? { f alse,?}Pr[education+ =? | isHuman+ = v] = 1, v ? { f alse,?}To define a joint distribution over all random variables, we also need the fol-lowing probabilities:Pr[isAnimal+ = true]Pr[isHuman+ = true | isAnimal+ = true]Pr[education+ = high | isHuman+ = true]The topology of a belief network using the extended random variables is shown inFigure 2.5, where causeDamage+ depends on all three other random variables.14isAnimal+isHuman+education+causeDamage+Figure 2.5: The graph structure of a belief network using the extended vari-ants of the random variables in Example 6.A tabular version of the CPDs for causeDamage+ would define the probabilitydistribution of causeDamage+ in 27 contexts (33). A tree-structured version of theCPDs of causeDamage+ is more complicated than that in Figure 2.4, because itneeds to model the undefined cases that are implied by the ontology.In this work, we explore representing and reasoning in terms of the non-extendedrandom variables. Exact inference methods such as variable elimination [11, 34]and recursive conditioning [8, 9] have time complexity O(|variables? range|T ),where T is the network treewidth. Reducing the ranges of random variables canhave a dramatic impact on inference speed.A simpler representation is to build a model with the non-extended randomvariables as shown in Figure 2.6. In the topology of this model, the constraints ofisAnimalisHumaneducationcauseDamageFigure 2.6: The graph structure of an ontologically-based belief network forExample 6.15the ontology are not represented. In particular, there is no edge from isHuman toeducation ? the probability of education does not depend on isHuman; only thedefinedness of education does, and that is represented by the ontology. Not onlydo the random variables have smaller ranges, but the network structure is simpler(with fewer edges).In this thesis, we address two questions: (1) how to construct a model andspecify probabilities and (conditional) independencies that obey the constraints im-plied by the underlying ontology, and (2) how to carry out probabilistic inference,by leveraging the ontology, so that undefined values do not need to be explicitlymodelled in the random variables.16Chapter 3Ontologically-Based BeliefNetworks (OBBNs)In this chapter, we define ontologically-based belief networks (OBBNs), whichrepresent probabilistic dependencies separately from the logical constraints froman ontology. We also specify how to perform probabilistic inference with OBBNs.3.1 Representation3.1.1 Preliminary Definitions and NotationIn an OBBN, random variables are constructed from properties in the ontology1.For a random variable, the domain of its corresponding property signifies whenthe random variable is defined. The range of a random variable is simply the rangeof the corresponding property in the ontology.For our purpose of defining the semantics, we are concerned with conjunctionsof variable assignments. Moreover, we want to only consider those conjunctionsthat are consistent with the ontology.1Since every random variable corresponds to a property in the ontology, we use the same symbolto refer to both the variable and the property. Thus, the individuals are left implicit. The context willmake clear to which of the two constructs the symbol is referring.17Definition 9. A conjunction of variable assignments is well-defined in contextCon, where Con is also a conjunction, with respect to ontology O if:? the conjunction is of the form X = x, and O |=Con? dom(X); or? the conjunction is of the form ? ? ? , ? is well-defined in Con, and ? iswell-defined in Con?? .When a context is not specified, we assume the empty context, denoted True.Definition 9 defines the notion of well-definedness in terms of a non-commutativelogic using the conjunctive operator.We use the notation vars(?), for any conjunction ? , to denote the set of ran-dom variables that appear in ? . Furthermore, for any conjunction ? of variableassignments, we use ?+ to denote the corresponding conjunction with variablesXi ? vars(?) replaced by their extended variants X+i but assigned the same valueas Xi in ? .3.1.2 Structural and Probabilistic ConstraintsAn OBBN, similar to a belief network, is specified in terms of conditional prob-ability distributions (CPDs) of random variables. We start with a total orderingof the random variables X1, . . . ,Xn that is consistent with the partial order of theproperties in the ontology. For each random variable Xi, there is a set of parentvariables pa(Xi) ? {X1, . . . ,Xi?1} that have direct probabilistic influence on Xi. Aset of CPDs given the values of its parent variables is then specified such thatPr[Xi | X1, . . . ,Xi?1] = Pr[Xi | pa(Xi)].A parent context [28] for a random variable X is a conjunction P such thatvars(P)? pa(X) and P is well-defined in the context dom(X). For every randomvariable X , there is a covering set of parent contexts ?X such that the parent con-texts in ?X are mutually exclusive and their disjunction is implied by dom(X) in theontology. For every Pi ? ?X , there is a probability distribution pi over range(X),which represents Pr[X | Pi].In an OBBN, it is required that for any pair of random variables X and Y ?vars(dom(X)), Y is not a descendent of X . A random variable Y , whose instantia-tion may possibly render X undefined, cannot thus be probabilistically influenced18by the value of X . This is a consequence of the existence of a total ordering of therandom variables that is consistent with the partial ordering of the properties in theontology.Note that the random variables involved in the primitive form of the domain ofa random variable X , vars(dom(X)), need not be parents of X in an OBBN. TheCPDs of X are simply irrelevant in the cases when its domain does not hold.Definition 10. An ontologically-based belief network (OBBN) is defined by a5-tuple (Ont,X,?,pa,?), where? Ont is an acyclic Aristotelian ontology;? X ? Ont.P is a set of random variables;? ? is a total ordering of the random variables that is consistent with the partialordering of the properties in Ont;? pa : X? 2X specifies, for each variable Xi ? X, the set of its parent variablespa(Xi)? {X ? X : X ? Xi}; and? ? is a mapping from X that maps X ? X into {(Pi, pi) : Pi ? ?X}, wherePi is well-defined in dom(X) with respect to Ont, and pi is a probabilitydistribution over range(X).The definition of an OBBN requires that CPDs of a random variable only bespecified in well-defined parent contexts. This suggests a natural semantic repre-sentation of the CPDs as a tree structure, like that for context-specific independence[5]. In the tree-structured representation of the CPDs for X , the internal nodes cor-respond to parent variables, and the edges from a node correspond to values forthe random variable at that node. The ontology entails that dom(X) conjoined withthe path to a node must imply the domain of the random variable at that node. ?Xcorresponds to the set of paths from the root down the tree, with the correspondingconditional probability distributions at the leaves.Example 7. Figure 2.4 depicts a tree-structured representation of the CPDs forcauseDamage. Note that the path isAnimal = f alse cannot split on any other parentvariable because any more splitting results in undefined conjunctions of variableassignments, and similarly for the path isAnimal = true? isHuman = f alse.19Example 8. The tree-structured CPD for education consist of only one single dis-tribution (e.g., (0.2,0.8)) over {low,high}, since education does not need a parentvariable in an OBBN. We will only use education in the contexts where the indi-vidual under consideration is a human.Given an OBBN M = (Ont,X,?,pa,?), the corresponding extended beliefnetwork (EBN) M+ is a belief network that has random variable X+ for X ?X, where range(X+) = range(X)?{?}, and the CPDs of each variable are alsodefined in terms of parent contexts.Let dom(X)Y , where Y is a set of variables, denotes the part of dom(X) thatinvolves the variables in Y. In the EBN, the parents of X+ are the parents of Xtogether with the minimal subset Y of vars(dom(X)) such that for every randomvariable Y ? vars(dom(X)), either Y ? Y or M.Ont |= dom(X)Y ? dom(X){Y}.In the corresponding EBN for Example 6, Y = {isHuman} for education (eventhough dom(education) ? isAnimal = true? isHuman = true) since M.Ont |=isHuman = true? isAnimal = true. In the worst case, Y= vars(dom(X)).For any parent context PX for X inM,M+ has PrM+ [X+ | dom(X)+Y ?P+X ] =PrM[X | PX ] (hence PrM+ [X+ = ? | dom(X)+Y ?P+X ] = 0). We enumerate otherparent contexts (involving variables in Y) for X+ so that they form a coveringset. For any of these parent contexts P+X for X+, PrM+ [X+ = ? | P+X ] = 1 (hencePrM+ [X+ = x | P+X ] = 0 for any x 6=?).Example 9. Consider education+ in the EBN in Example 6 (as shown in Fig-ure 2.5). In the parent context isHuman+ = true, Pr[education+ | isHuman+ =true] has the same distribution as Pr[education] in the OBBN (with Pr[education+ =? | isHuman+ = true] = 0). Note that in the OBBN, education does not haveparents in the graph structure. For the other two parent contexts in the EBN,we have Pr[education+ = ? | isHuman+ = f alse] = 1 and Pr[education+ = ? |isHuman+ =?] = 1.Example 10. To specify the CPDs for causeDamage+ in the EBN for Example 6,we have three other parent contexts in addition to those in the OBBN case:? isAnimal+ =?? isAnimal+ = true? isHuman+ =?20? isAnimal+ = true? isHuman+ = true? education+ =?where the probability that causeDamage+ =? is 1 in any of the three contexts.3.2 SemanticsThe standard semantics for probability is in terms of possible worlds, which cor-respond to assignments of values to all random variables. A joint probability dis-tribution is a distribution over the possible worlds, and probabilities of individualpropositions can be defined in terms of this. When there is an underlying ontology,an assignment of a value to each random variable is often undefined. If we onlyconsider the well-defined assignments, we can get a smaller possible world struc-ture. The analogous notion to an assignment of a value to each random variable isa maximal well-defined conjunction of variable assignments:Definition 11. A well-defined conjunction X?(1) = x?(1)??? ??X?(k) = x?(k) (where? is a mapping from {1, . . . ,k} to {1, . . . ,n} such that k ? n and ?(i) < ?(i+1))is maximal with respect to an ontology if for any variable X ? /? {X?(1), . . . ,X?(k)}and any value x? ? range(X ?), X?(1) = x?(1) ? ?? ? ?X?(k) = x?(k) ?X? = x? is notwell-defined.To avoid considering equivalent conjunctions that only differ in the order ofvariable assignments, we assume that variables in a conjunction follow a total or-dering consistent with the ordering of the ontology. For conjunction ? , we use ?<kto denote the preceding part of ? that contains variables Xi ? vars(?) with i < k.Whereas an EBN encodes a joint probability distribution over all conjunctionsof extended variable assignments that involve all the random variables, an OBBNrepresents a joint probability distribution only over maximal well-defined conjunc-tions of the non-extended random variables. For an OBBNMwith a total orderingX1, . . . ,Xn of random variables, the probability of a maximal well-defined conjunc-tion X ? X?(1) = x?(1)??? ??X?(k) = x?(k) is:PrM[X ] =k?i=1PrM[X?(i) = x?(i) | P?(i)],where P?(i) is the parent context for X?(i) such that X<?(i) |= P?(i).21By not representing the undefined worlds, an OBBN can largely reduce the sizeof the set of possible worlds.Example 11. For the EBN in Example 6, there are 4 random variables, each with3 values (including ?), and so there are 34 = 81 possible worlds. The possibleworlds for the OBBN in Example 6 are shown in Figure 3.1. Out of the 81 possibleworlds for the EBN, only 8 have a non-zero probability. The others are impossibledue to the ontology. There is an isomorphism (as shown in Section 3.4) betweenthe non-zero worlds and the worlds for the OBBN in Figure 3.1.isAnimalisHuman causeDamagecauseDamageeducationcauseDamage causeDamagew1 w2 w3 w4w5 w6w7 w8true f alsetrue f alselow hightrue f alse true f alsetrue f alsetrue f alseFigure 3.1: An enumeration of all possible worlds represented by the OBBNin Example 6. Every path from the root to a leaf corresponds to a maxi-mal well-defined conjunction of variable assignments.3.3 InferenceThe problem of inference is to determine the posterior probability distributionPr[Q+ | E ] for a random variable Q, given some well-defined evidence E . Wecompare three approaches:EBN-Q Compute Pr[Q+ | E+] using the EBN.3Q (1) Query the ontology and (2) compute Pr[dom(Q) | E ] and Pr[Q | E ?dom(Q)] using the OBBN.22MQ (1) Query the ontology and (2) replace Q in the OBBN with Q+, add explicit(probabilistic) dependencies from vars(dom(Q)) to Q+, and then computePr[Q+ | E ] using the modified OBBN.We first describe 3Q. The inference starts with querying the ontology to ex-tend the given evidence with the domain of each variable in the evidence. Theontological query simply conjoins the evidence E with dom(X) for every variableX ? vars(E), so that the extended evidence E ? is a well-defined conjunction forthe subsequent probabilistic queries. This relatively cheap computation (which in-volves only one simple form of logical deduction) can simplify or even circumventthe subsequent probabilistic queries.For instance, in Example 6, if the given evidence is E ? isHuman = true, theextended evidence E ? after the ontological query incorporates dom(isHuman) andbecomes isAnimal = true? isHuman = true.There are three possible outcomes after the ontological query:1. The extended evidence E ? entails ?dom(Q):E ? |= ?dom(Q).The answer to the query is then: Pr[Q+ =?] = 1. No probabilistic inferenceis needed.2. The extended evidence E ? entails ?dom(Q):E ? |= dom(Q).In this case, Pr[Q+ =?] = 0, and for any other value q 6=?, Pr[Q+ = q | E ]is the same as Pr[Q = q | E ?] in the OBBN (which does not contain undefinedvalues).3. Otherwise, it remains uncertain whether an assignment Q = q, for any q ?range(Q), is well-defined given E . We proceed with two separate probabilis-tic inference tasks, which can be run in parallel:23(a) Determining the probability that Q = q is well-defined given E ?:? = Pr[dom(Q) | E ?].(b) Computing the distribution of Q when it is well-defined:D = Pr[Q | E ??dom(Q)].3a needs not deal with the potential issue of undefinedness, as dom(Q) iswell-defined. Likewise, 3b incorporates dom(Q) into the evidence and be-comes an instance of Case 2, where the query variable is always defined.The query returns Pr[Q+ = ? | E ] = 1? ? and Pr[Q+?? | E ] = ? ? D, wherePr[Q+?? | E ] denotes the probabilities over range(Q+)?{?}.Leveraging the underlying ontology, the entire inference process of 3Q boilsdown to an ontological query, followed by up to two probabilistic queries. The on-tological query is assumed to be a relatively cheaper computation compared to theprobabilistic queries, yet in some cases, it lets us skip one or both of the probabilis-tic queries. We can then use any belief network inference technique to compute theprobabilistic queries, if needed. Although the graph structure and the associatedconditional probabilities of the OBBN do not capture ontological information, wewill show (in the following section) that we still obtain the correct results. Thisinference scheme is summarized in Procedure 3Q-INFERENCE.MQ is a straightforward mix of the two approaches. In MQ, after the onto-logical query, we modify the model to incorporate the necessary ontological infor-mation into the topological structure and the CPDs for Q in the OBBN. We startby expanding the range of the query variable Q to include ???, and then we makevars(dom(Q)) parent variables of Q.We create a new set of CPDs for Q in terms of parent contexts. Since nowvars(dom(Q)) ? pa(Q), we can create a set of parent contexts such that for anyparent context Pi, either Pi |= dom(Q) or Pi |= ?dom(Q). The CPDs are definedso that Pr[Q = ? | Pi] = 1 for any parent context Pi |= ?dom(Q), and Pr[Q | P j]stays unchanged for any parent context P j |= dom(Q). We can then run any beliefnetwork inference method on the modified model to compute Pr[Q | E ?].24Procedure 3Q-INFERENCEInput: OBBNM, query variable Q, and evidence E .Output: Pr[Q+ | E ], where range(Q+) = range(Q)?{?}.E ?? extending E with dom(X) for every X ? vars(E) usingM.Ont.if E ? |= ?dom(Q) thenreturn Pr[Q+ =? | E ] = 1else if E ? |= dom(Q) thenD? Pr[Q | E ?]return {Pr[Q+ =? | E ] = 0,Pr[Q+?? | E ] =D}else? ? Pr[dom(Q) | E ?]D? Pr[Q | E ??dom(Q)]return {Pr[Q+ =? | E ] = 1? ?,Pr[Q+?? | E ] = ? ?D}endCompared with 3Q, approach MQ allows for a single probabilistic query andso may possibly avoid some repeated computation (e.g., the same multiplicationsin computing Pr[dom(Q) | E ?] and Pr[Q | E ??dom(Q)]), at the expense of needingto dynamically modify the OBBN structure for each query.3.4 Equivalence of RepresentationsIn this section, we present the results showing that an OBBN represents a coherentprobability distribution (i.e., the probabilities of the possible worlds actually sum to1), and by comparing 3Q-INFERENCE to inference using the corresponding EBN,we show the correctness of 3Q-INFERENCE.Theorem 1. LetM be an OBBN andM+ be the corresponding EBN. Every max-imal well-defined conjunction S of variable assignments can be extended to a fullconjunction S+f ull (involving all variables) by adding an assignment X+ = ? forevery random variable X /? vars(S) such that PrM[S] = PrM+ [S+f ull]. Any otherpossible world inM+ has probability 0.See Appendix A for a proof of this and other theorems. This shows that anOBBN represents a probability distribution over all maximal well-defined conjunc-tions of variable assignments.25This result is surprising because although some conjunctions may not be well-defined, as long as we follow the structure of an OBBN and never use a variable ina context where the variable is not defined, we can define a semantics that does notexplicitly represent the value ?undefined? and avoid considering the distribution ofa variable when it is undefined.That an OBBN represents a coherent probabilistic distribution and the corre-spondence between the OBBN and the EBN directly lead to the consistency ofprobabilistic query results between the two models.Theorem 2. LetM be an OBBN andM+ be the corresponding EBN. Let S be awell-defined conjunction of variable assignments. Then PrM[S] = PrM+ [S+].Theorem 3. Let M be an OBBN and M+ be the corresponding EBN. For anyrandom variable Q and evidence E with PrM[E ] > 0, the posterior distributionPrM[Q+ | E ] returned by 3Q-INFERENCE is the same as PrM+ [Q+ | E+].26Chapter 4Empirical ComparisonTo empirically evaluate the proposed inference methods for OBBNs, we constructan ontology that expands parametrically with the number of properties. We definea TAF (for ?true, always, false?) ontology as follows: every property has a Booleanrange (i.e., {true, f alse}). For each property, there are classes that are defined onlywhen the property is true, classes that are defined only when the property is f alse,and there are classes whose definitions do not depend on the value of the property.We construct one instance of each of these classes in a recursive tree structure in abreadth-first manner. For each class, we define a property which has that class asits domain.p0p1 p2 p3p4 p5 p6 p10 p11 p12. . .true always f alsetrue always f alse true always f alseFigure 4.1: A pictorial visualization of a TAF-structured ontology with 13properties, namely p0, p1, . . . , p12.Such an ontology has a rich hierarchical class structure, where for every classthere is a property whose domain is that class. A class associated with a property27assignment gives rise to a new subclass, and there is another property only definedfor the subclass. On top of the ontology, we also want to have complex probabilisticdependencies in our model.Figure 4.1 provides an illustration of the synthetic ontology. Classes are leftanonymous and do not show up in the figure, as they are reduced to property val-ues. Property p0 is always defined (e.g., the domain of p0 could be T hing in anOWL ontology). Property p1 is only defined when p0 = true; p2 is always definedregardless of the value p0 takes; and p3 is defined when p0 = f alse. p4 is definedonly when p1 = true (so dom(p4)? p0 = true? p1 = true). Note that p5 is definedno matter what value p1 takes, but it still requires p0 = true. Additional propertiesare constructed in the same fashion.We build an OBBN using the ontology such that each random variable pi prob-abilistically depends on every predecessor variable whenever the predecessor vari-able is defined, except for those variables whose values are determined when piis defined. This essentially introduces the largest number of arcs into the proba-bilistic graph structure without violating the structural constraint of OBBNs. Asan example, for the ontology in Figure 4.1, p12 has probabilistic dependencies onp2, p7, p8, p9, and p11. We query the final random variable with no evidence so thatthe complex probabilistic dependencies have the most impact on the queries, as weexpect evidence to simplify the queries.We use an open-source implementation of the CVE algorithm [28], which usesthe confactor representation for the CPDs to exploit context-specific independence.We note that context-specific independence is exploited in both the OBBNs andthe EBNs. The difference only comes from that the EBN uses extended randomvariables and has to model the ontological constraints in its topological structureand associated CPDs.Figure 4.2 presents a comparison of the running times for probabilistic querieswith the constructed OBBNs and their corresponding EBNs, using the three differ-ent approaches, EBN-Q, 3Q, and MQ. These running times are the averages of 500runs of the same queries for 15 ? n ? 30, 100 runs for 31 ? n ? 45, and 10 runsfor 46 ? n, where n is the number of random variables1. The significant differ-1For the small models, we could not get meaningful measures of the running times with onesingle run of each query, due to the issue of machine-recorded numerical precision. The inference28ence in the running times demonstrates that inference with OBBNs can be ordersof magnitude more efficient2 than with their corresponding EBNs.Figure 4.2: A comparison of running times for probabilistic queries with theOBBNs and their corresponding EBNs, using the three different ap-proaches. The x-axis is linearly spaced, whereas the y-axis is spacedusing the log scale.In an attempt to analyse and isolate the sources of inference speedup withthe OBBNs, we add to the OBBNs extra redundant edges (i.e., the CPDs repre-senting the edges) that represent ontological dependencies and perform the samequeries. Furthermore, we execute 3Q-Inference on the EBNs by carrying out twoprobabilistic queries (without modifying the models): Pr[dom(Q)+] and Pr[Q+ |procedures are deterministic, and the variation in our implementation?s running times is negligiblefor our purpose of demonstrating the large differences between using OBBNs and EBNs.2In our experiments, we ran the two probabilistic queries sequentially. However, they could easilybe executed in parallel.29dom(Q)+]. Figure 4.3 gives a plot of the running times for querying these modi-fied models, where ?EBN-2Queries? refers to the case of running two probabilisticqueries on the EBNs, and ?3Q-Xedges? refers to the case where extra edges areadded to the OBBNs.Figure 4.3: A comparison of running times for probabilistic queries usingdifferent modified OBBNs and EBNs. The x-axis is linearly spaced,whereas the y-axis is spaced using the log scale.To our surprise, in addition to the reduced range sizes, the pure fact of runningtwo (simpler) probabilistic queries plays an important role in reducing the inferencetimes. One explanation could be sought from the relationship:Pr[Q+] = Pr[Q+ | dom(Q)+] Pr[dom(Q)+] (4.1)+Pr[Q+ | ?dom(Q)+] (1?Pr[dom(Q)+]). (4.2)30When carrying out the two probabilistic queries for Pr[dom(Q)+] and Pr[Q+ |dom(Q)+], we do not perform the computation for Pr[Q+ | ?dom(Q)+] (which weknow is 0) in Equation 4.2, which would otherwise be included in the intermediateresults of computing Pr[Q+] with a single query.It should be emphasized that the CVE algorithm used in all our experimentsdoes not try to detect zero values. If we had employed an inference algorithm thatdetects zeros in the intermediate computations and saves all the multiplicationsinvolving zeros, we would likely have obtained different results for the EBNs, asmany of the multiplications are expected to include zeros. The OBBN frameworkcan be thought as explicitly extracting the determinism and allowing a genericinference algorithm to work more efficiently.On the other hand, for our particular TAF ontologies and models, adding extraedges to the OBBN turns out to have little effect in inference speed. To furtherinvestigate into this result, we repeat the experiment 30 times for each OBBN andthe corresponding OBBN with redundant edges to see whether there exist small yetsystematic differences between the two models.For each model, we apply the Shapiro-Wilk normality test [29, 30] to these 30data points with ? = 0.05. The p values for almost all models are much smallerthan our chosen ? value, which suggests that the variations of running times arenot normally distributed. We therefore plot the medians, the first and third quartilesof the 30 data points for each OBBN and each extra-edged OBBN. In Figure 4.4,two boxplots are placed ?side-by-side? such that for each model size, there are onered box and one blue box adjacent to each other. The red boxplot corresponds todata for the regular OBBNs, whereas the blue boxplot is for those OBBNs withextra redundant edges. For several models (e.g., models of size 32 and 33), thevariations in the data are so small that the first quartiles, the medians, and the thirdquartiles all appear overlapping in the plots, and only the medians are visible. Inthese cases, the few outliers have accounted for the non-normality.Figure 4.4 reflects that the effect of removing edges is not significant and seemsto be no more than that of the random variations in the machine running times. Onepossible justification for such results lies in our use of the contextual factors [28].When adding an extra parent to a variable, the size of the table(s) representing theCPDs only increases by a relatively small amount independent of the existing table31Figure 4.4: Two ?side-by-side? boxplots of 30 data of running times for reg-ular OBBNs (in red) and extra-edged OBBNs (in blue). The lower andupper sides of the boxes are the first and third quartiles, whereas themiddle green bars correspond to the medians. The whiskers extendingfrom the boxes corresponding to non-outlier extreme points.sizes. If purely factor representations of the CPDs were used, then adding oneextra parent variable would double the size of the factor. The insignificant effect ofremoving edges might also be an artifact of the structure of the TAF ontologies, butlarge real-world examples do not exist. How our results would generalize remainsto be explored.All of the measured times are for a Java implementation (under JDK 1.7) run-ning on a 64-bit Windows 7 machine with Intel i5-2520 processor and 16 GB ofRAM. For all experiments, 12 GB of RAM were allocated to the Java Virtual Ma-chine. Garbage collector was executed using a separate parallel thread.32Chapter 5Conclusion5.1 SummaryWe have presented ontologically-based belief networks (OBBNs), a simple frame-work for building probabilistic models that use formal ontologies and avoidingexplicit modelling of undefined values in the random variables. The proposedframework frees domain modellers from having to think about undefined values bydecoupling the probabilistic dependencies and the logical constraints implied bythe ontology. By exploiting the determinism that exists in the ontology, an OBBNlargely simplifies the probabilistic network structure. We showed that probabilisticreasoning involving variables whose corresponding properties are not always well-defined can still be effectively carried out by transforming a given query into twosimpler queries. Furthermore, we have demonstrated that, leveraging the ontology,probabilistic inference can become more efficient by reducing the ranges of therandom variables as well as executing two simpler queries.5.2 Future WorkWe see several distinct research directions to pursue in the future. First, it is inter-esting to investigate how approximate inference methods such as statistical sam-pling can make use of the ontology to be more efficient. Markov chain MonteCarlo (MCMC) sampling [15, 33] in an EBN would encounter the problem of ?033probabilities? as many, if not most, worlds would have a probability of 0. Straight-forwardly plugging an MCMC sampler (e.g., Gibbs sampler [12, 13]) into 3Q-INFERENCE may generate samples that correspond to conjunctions of variable as-signments that are well-defined for an OBBN. While it appears plausible that thesamples corresponding to well-defined conjunctions could still make up the cor-rect proportions to estimate those probabilities of interest (as only well-definedconjunctions are queried), it would be desirable to utilize the ontology to avoidsampling a variable when the current assignment to the other variables makes itundefined. This might also reduce the number of samples needed to make goodestimates.Another direction is to extend the ontologically-based modelling approach tothe relational domain. Kuo and Poole [19] extended this work by considering con-structing relational ontologically-based probabilistic models and discussed variouskinds of properties in an ontology that could give rise to representational and rea-soning issues; e.g., random variables may have a priori unknown ranges.Finally, a real-world application of probabilistic modelling based on a formalontology is definitely an exciting avenue. There has been a growing interest inbuilding (or reconstructing from existing data) large-scale ontologies [32]. Onenext step is to build ontologically-based probabilistic models so that, not only data,but predictive models can also be shared. This requires that domain modellersagree on an ontology and build probabilistic systems that use such formal vocabu-lary and structure. Our work has tackled a relevant sub-problem in realizing thesepractical systems.34Bibliography[1] Aristotle (350 B.C.). Categories. Translated by E. M. Edghill. URL http://www.classicallibrary.org/aristotle/categories/. ?pages 4[2] Bellman, R. (1957). Dynamic Programming. Princeton University Press,Princeton, NJ, USA, 1 edition. ? pages 9[3] Berg, J. (1982). Aristotle?s theory of definition. In ATTI del ConvegnoInternazionale di Storia della Logica, pages 19?30. San Gimignano. URLhttp://ontology.buffalo.edu/bio/berg.pdf. ? pages 4[4] Bertel?, U. and Brioschi, F. (1972). Nonserial dynamic programming,volume 91 of Mathematics in Science and Engineering. Academic Press. ?pages 9[5] Boutilier, C., Friedman, N., Goldszmidt, M., and Koller, D. (1996).Context-specific independence in Bayesian networks. In Proceedings of theTwelfth Conference on Uncertaintyin Artificial Intelligence (UAI-96), pages115?123. ? pages 11, 13, 19[6] Carvalho, R. N., Laskey, K. B., and da Costa, P. C. G. (2013). Pr-owl 2.0 -bridging the gap to owl semantics. In URSW (LNCS Vol.), pages 1?18. ?pages 12[7] da Costa, P. C. G., Laskey, K. B., and Laskey, K. J. (2008). Pr-owl: ABayesian ontology language for the semantic web. In URSW (LNCS Vol.),pages 88?107. ? pages 12[8] Darwiche, A. (2001). Recursive conditioning. Artificial Intelligence, volume126(1-2):pages 5?41. ? pages 15[9] Darwiche, A. (2009). Modeling and Reasoning with Bayesian Networks.Cambridge University Press. ? pages 1535[10] De Raedt, L., Frasconi, P., Kersting, K., and Muggleton, S. H., editors(2008). Probabilistic Inductive Logic Programming. Springer. ? pages 13[11] Dechter, R. (1996). Bucket elimination: A unifying framework forprobabilistic inference. In E. Horvitz and F. Jensen, editors, UAI-96, pages211?219. Portland, OR. ? pages 9, 15[12] Gelfand, A. E. and Smith, A. F. (1990). Sampling-based approaches tocalculating marginal densities. Journal of the American statisticalassociation, volume 85(410):pages 398?409. ? pages 34[13] Geman, S. and Geman, D. (1984). Stochastic relaxation, Gibbs distributions,and the Bayesian restoration of images. IEEE Transactions on PatternAnalysis and Machine Intelligence, volume 6(6):pages 721?741. ? pages 34[14] Getoor, L. and Taskar, B., editors (2007). Introduction to StatisticalRelational Learning. MIT Press, Cambridge, MA. ? pages 13[15] Gilks, W. R., Richardson, S., and Spiegelhalter, D. J. (1996). Markov chainMonte Carlo in practice, volume 2. CRC press. ? pages 33[16] Grau, B. C., Horrocks, I., Motik, B., Parsia, B., Patel-Schneider, P., andSattler, U. (2008). Owl 2: The next step for owl. Web Semant.,volume 6(4):pages 309?322. URLhttp://dx.doi.org/10.1016/j.websem.2008.05.001. ?pages 3[17] Koller, D. and Friedman, N. (2009). Probabilistic Graphical Models -Principles and Techniques. MIT Press. ? pages 13[18] Kuo, C.-L., Buchman, D., Katiyar, A., and Poole, D. (2013). Probabilisticreasoning with undefined properties in ontologically-based belief networks.In Proceedings of the Twenty-Third International Joint Conference onArtificial Intelligence (IJCAI-13). ? pages iii[19] Kuo, C.-L. and Poole, D. (2013). On integrating ontologies with relationalprobabilistic models. In Proceedings of the Third International Workshop onStatistical Relational Artificial Intelligence (StaRAI-13), AAAI-13Workshops. ? pages 34[20] Laskey, K. B. (2008). MEBN: A language for first-order Bayesianknowledge bases. Artificial Intelligence, volume 172(2?3):pages 140?178.URL http://www.sciencedirect.com/science/article/pii/S0004370207001312. ? pages 1236[21] Lauritzen, S. L. (1996). Graphical models, volume 17. Oxford UniversityPress. ? pages 6[22] Mohs, F. (1812). Versuch einer Elementar-Methode zur naturhistorischenBestimmung und Erkennung der Fo?ilien, volume 1. Camesina. ? pages 1[23] Motik, B., Patel-Schneider, P. F., and Grau, B. C., editors (2012). OWL 2Web Ontology Language Direct Semantics, volume W3C Recommendation11 December 2012. W3C. URLhttp://www.w3.org/TR/owl-direct-semantics. ? pages 3[24] Motik, B., Patel-Schneider, P. F., and Parsia, B., editors (2012). OWL 2 WebOntology Language Structural Specification and Functional-Style Syntax,volume W3C Recommendation 11 December 2012. W3C. URLhttp://www.w3.org/TR/owl2-syntax/. ? pages 3[25] Pearl, J. (1988). Probabilistic Reasoning in Intelligent Systems: Networks ofPlausible Inference. Morgan Kaufmann, San Mateo, CA. ? pages 2, 6[26] Poole, D., Smyth, C., and Sharma, R. (2008). Semantic science: Ontologies,data and probabilistic theories. In Uncertainty Reasoning for the SemanticWeb I, LNAI/LNCS. Springer. URL http://cs.ubc.ca/~poole/papers/SemSciChapter2008.pdf. ?pages 12[27] Poole, D., Smyth, C., and Sharma, R. (2009). Ontology design for scientifictheories that make probabilistic predictions. IEEE Intelligent Systems,volume 24(1):pages 27?36. URL http://www2.computer.org/portal/web/computingnow/2009/0209/x1poo. ? pages 4, 12,13[28] Poole, D. and Zhang, N. L. (2003). Exploiting contextual independence inprobabilistic inference. Journal of Artificial Intelligence Research,volume 18:pages 263?313. Code is available fromhttp://www.cs.ubc.ca/~poole/code.html. ? pages iii, 11, 12,18, 28, 31[29] Razali, N. M. and Wah, Y. B. (2011). Power comparisons of shapiro-wilk,kolmogorov-smirnov, lilliefors and anderson-darling tests. Journal ofStatistical Modeling and Analytics, volume 2(1):pages 21?33. ? pages 31[30] Shapiro, S. S. and Wilk, M. B. (1965). An analysis of variance test fornormality (complete samples). Biometrika, volume 52(3/4):pages 591?611.? pages 3137[31] Smith, B. (2003). Ontology. In L. Floridi, editor, Blackwell Guide to thePhilosophy of Computing and Information, pages 155?166. Oxford:Blackwell. URL http://ontology.buffalo.edu/smith/articles/ontology_pic.pdf. ? pages 3[32] Staab, S. and Studer, R. (2009). Handbook on ontologies. Springer. ?pages 34[33] Tanner, M. (1996). Tools for Statistical Inference: Methods for theExploration of Posterior Distributions and Likelihood Functions. SpringerSeries in Statistics. Springer. ? pages 33[34] Zhang, N. L. and Poole, D. (1994). A simple approach to Bayesian networkcomputations. In Proc. 10th Canadian Conference on AI, pages 171?178.? pages 9, 15[35] Zhang, N. L. and Poole, D. (1999). On the role of context-specificindependence in probabilistic reasoning. In Proceedings of the SixteenthInternational Joint Conference on Artificial Intelligence (IJCAI-99), pages1288?1293. Stockholm, Sweden. ? pages 11, 1338Appendix ALemmas and ProofsProposition 1. LetM be an OBBN and S be a maximal well-defined conjunctionof variable assignments. For any variable X /? vars(S), S |= ?dom(X).Proof. Assume the statement were not true. Let Xk /? vars(S) be the first variablesuch that S<k 6|= ?dom(Xk). Let D = vars(dom(Xk)) \ vars(S). We could find anassignment D to the variables in D such that S ?D?Xk = xk is well-defined, andso S were not maximal.Lemma 1. LetM be an OBBN with the total ordering X1, . . . ,Xn of variables, andM+ be the corresponding EBN. Let S+ ? X+?(1) = x?(1)? ?? ??X+?(k) = x?(k) be aconjunction of variable assignments. If there exists some X+i ? vars(S+) such thatS+ |= dom(Xi)+ and S+ |= X+i =?, then PrM+ [S+] = 0.Proof. It suffices to show that for any full conjunction S ?f ull such that S?f ull |= S+,PrM+ [S ?f ull] = 0.By construction of the EBN, we have the conditional probability PrM+ [X+i =? | S ?pa(X+i )] = 0, where S ?pa(X+i )denotes the part of S ?f ull involving the variables inpa(X+i ). The chain rule for belief networks thus givesPrM+ [S?f ull] =n?j=1PrM+ [S?{X+j }| S ?pa(X+j )] = 0.39Lemma 2. Let M be an OBBN with the total ordering X1, . . . ,Xn of variables,andM+ be the corresponding EBN. Let S+ ? X+?(1) = x?(1) ? ?? ? ?X+?(k) = x?(k)be a conjunction of variable assignments and S be the conjunction such that S |=Xi = xi iff S+ |= X+i = xi and xi 6= ?. If there exists some X j ? vars(S) such thatM.Ont |= S< j??dom(X j), then PrM+ [S+] = 0.Proof. It suffices to show that for any full conjunction S ?f ull such that S?f ull |= S+,PrM+ [S ?f ull] = 0.Let Xu ? vars(S) be the first variable such thatM.Ont |= S<u ? ?dom(Xu).By construction of the EBN, we have the conditional probability PrM+ [X+u = xu |S ?pa(X+u )] = 0 (as xu 6= ?). The chain rule for belief networks gives PrM+ [S?f ull] =0.Corollary 1. LetM be an OBBN with the total ordering X1, . . . ,Xn of variables,andM+ be the corresponding EBN. Let S be a maximal well-defined conjunctionof variable assignments, and S+f ull be the full conjunction where X+i ? vars(S f ull)is assigned the same value as Xi if Xi ? vars(S) and ? otherwise. PrM+ [S+] =PrM+ [S+f ull].Proof. The probability of S+ inM+ can be computed asPrM+ [S+] = ?S ?f ull |=S+PrM+ [S?f ull],where S ?f ull is any full conjunction of variable assignments that entails S+. ByProposition 1 and Lemma 2, since S is maximal, PrM+ [S ?f ull] = 0 for any S?f ull 6?S+f ull . Thus, PrM+ [S+] = PrM+ [S+f ull].Lemma 3. Let M be an OBBN with the total ordering X1, . . . ,Xn of variables,andM+ be the corresponding EBN. Let S be a maximal well-defined conjunctionof variable assignments, and S+f ull be the full conjunction where X+i ? vars(S f ull)is assigned the same value as Xi if Xi ? vars(S) and ? otherwise. PrM[S] =PrM+ [S+f ull].40Proof. By definition of the OBBN, PrM[S] is computed asPrM[S] =k?i=1PrM[X?(i) = x?(i) | P?(i)],where P?(i) is the parent context for X?(i) such that S |= P?(i).Similarly, for the corresponding EBN,PrM+ [S+f ull] =n?i=1PrM+ [X+i = xi | P?i ],where P?i ? dom(Xi)+?P+i is the corresponding parent context inM+.By definition of the EBN, PrM+ [X+i = xi | P?i ] = PrM[Xi = xi | Pi] for Xi ?vars(S), and PrM+ [X+j =? |P?j ] = 1 for X j /? vars(S). Thus, the equality PrM[S] =PrM+ [S+f ull] follows.Corollary 2. LetM be an OBBN andM+ be the corresponding EBN. Let S ?X?(1) = x?(1)??? ??X?(k) = x?(k) be a maximal well-defined conjunction of variableassignments. PrM[S] = PrM+ [S+].Proof. The result follows immediately from Corollary 1 and Lemma 3.Proof of Theorem 1. Lemma 3 gives PrM[S] =PrM+ [S+f ull]. Lemma 1 and Lemma 2together state that any other world inM+ has probability 0. SinceM+ is knownto represent a coherent probability distribution, it follows that M also encodes acoherent probability distribution (i.e., the probabilities of all maximal well-definedconjunctions sum to 1).Proof of Theorem 2. The probability of S is the sum of probabilities of all maximalwell-defined conjunctions S ? that entail S.PrM[S] = ?S ?|=SPrM[S?],where S ? is maximal well-defined. Likewise,PrM+ [S+] = ?S ?f ull |=S+PrM+ [S?f ull],41where S ?f ull is any full conjunction that entails S+f ull .By Theorem 1, PrM+ [S ?f ull] = 0 except for any S?f ull that corresponds to someS ? in the OBBN. Thus PrM[S] = PrM+ [S+].Corollary 3. LetM be an OBBN andM+ be the corresponding EBN. Considerany two conjunctions of variable assignments, S1 and S2, such that S1 is well-defined, S1?S2 is well-defined, and PrM[S1]> 0. PrM[S2 | S1] = PrM+ [S+2 | S+1 ].Proof. Probability theory statesPrM+ [S+2 | S+1 ] =PrM+ [S+1 ?S+2 ]PrM+ [S+1 ].By Theorem 2, PrM[S1] = PrM+ [S+1 ] and PrM[S1?S2] = PrM+ [S+1 ?S+2 ]. ThusPrM[S2 | S1] = PrM+ [S+2 | S+1 ].Corollary 4. LetM be an OBBN andM+ be the corresponding EBN. Considerany variable Q and well-defined evidence E such that E |= dom(Q) and PrM[E ] >0. PrM[Q = q | E ] = PrM+ [Q+ = q | E+] for any q 6=?, and thus PrM+ [Q+ =? |E+] = 0.Proof. For any q 6=?, the conjunction E?Q= q is well-defined since E |= dom(Q).By Corollary 3, PrM[Q = q | E ] = PrM+ [Q+ = q | E+]. Because PrM[Q | E ] is aprobability distribution over range(Q), it immediately follows that PrM+ [Q+ = q |E+] = 0.A.1 Correctness of 3Q-INFERENCEProposition 2. LetM be an OBBN andM+ be the corresponding extended beliefnetwork. For any variable Q and well-defined evidence E such that E |= ?dom(Q)and PrM[E ] > 0, PrM+ [Q+ =? | E+] = 1.Proof. Consider any q 6=?. By Lemma 2, PrM+ [E+?Q+ = q] = 0 sinceM.Ont |=42E ? ?dom(Q). It follows thatPrM+ [E+] =?vPrM+ [E+?Q+ = v]= PrM+ [E+?Q+ =?].Substituting the above result gives:PrM+ [Q+ =? | E+] =PrM+ [E+?Q+ =?]PrM+ [E+]=PrM+ [E+]PrM+ [E+]= 1.An immediate consequence of Proposition 2 is that, when E |= ?dom(Q),PrM+ [Q+ = q | E+] = 0 for any q 6=?.Lemma 4. Let M be an OBBN and M+ be the corresponding EBN. For anyvariable Q and well-defined evidence E such that E 6|= ?dom(Q) and PrM[E ] > 0,PrM+ [Q+ = q | E+] = PrM+ [dom(Q)+?Q+ = q | E+] for any q 6=?.Proof. It suffices to show that PrM+ [?dom(Q)+?Q+ = q | E+] = 0.If E |= dom(Q), it is an immediate result that PrM+ [?dom(Q)+ ?Q+ = q |E+] = 0. Otherwise,PrM+ [?dom(Q)+?Q+ = q | E+] =PrM+ [E+??dom(Q)+?Q+ = q]PrM+ [E+].By Lemma 2, PrM+ [E+??dom(Q)+?Q+ = q] = 0, and so the result follows.Lemma 5. Let M be an OBBN and M+ be the corresponding EBN. For anyvariable Q and well-defined evidence E such that PrM[E ] > 0, PrM+ [Q+ = ? |E+] = 1?PrM+ [dom(Q)+ | E+].Proof. If E |= dom(Q), then PrM+ [dom(Q)+ | E+] = 1. By Corollary 4, PrM+ [Q+ =? | E+] = 0. If E |= ?dom(Q), then PrM+ [dom(Q)+ | E+] = 0. By Proposition 2,43PrM+ [Q+ =? | E+] = 1. Otherwise, if E 6|= dom(Q) and E 6|= ?dom(Q), thenPrM+ [Q+ =? | E+] (A.1)=?S+PrM+ [Q+ =? | E+?S+] ?PrM+ [S+ | E+] (A.2)= ?S?:E+?S?|=?dom(Q)+PrM+ [Q+ =? | E+?S?] ?PrM+ [S? | E+] (A.3)= ?S?:E+?S?|=?dom(Q)+PrM+ [S? | E+] (A.4)= PrM+ [?dom(Q)+ | E+] (A.5)= 1?PrM+ [dom(Q)+ | E+], (A.6)where S+ is any conjunction such that vars(S+)= vars(dom(Q)+)\vars(E+), andS? is any S+ such that E+?S? |= ?dom(Q)+. Equation A.3 follows because, byCorollary 4, PrM+ [Q+ =? | E+?S?] = 0; Equation A.4 follows since, by Propo-sition 2, PrM+ [Q+ = ? | E+?S?] = 1. (Note that E+?S+ includes all variablesin vars(dom(Q)+), and so E+?S+ 6|= dom(Q)+ iff E+?S+ |= ?dom(Q)+.)Proof of Theorem 3. We consider the three distinct cases.Case 1:M.Ont |= E ??dom(Q). By Proposition 2, PrM+ [Q+ =? | E+] = 1.Case 2:M.Ont |= E ? dom(Q). By Corollary 3, for q 6=?, PrM[Q = q | E ] =PrM+ [Q+ = q | E+] and PrM+ [Q+ =? | E+] = 0.Case 3:M.Ont 6|= E ??dom(Q) andM.Ont 6|= E ? dom(Q). By Corollary 3and Lemma 5,1?PrM[dom(Q) | E ] = 1?PrM+ [dom(Q)+ | E+]= PrM+ [Q+ =? | E+].Since dom(Q) ? E is well-defined, Corollary 4 states that, for any q 6= ?,PrM[Q = q | dom(Q)?E ] = PrM+ [Q+ = q | dom(Q)+?E+]. Together with Corol-44lary 3, it follows thatPrM[dom(Q) | E ] ?PrM[Q = q | dom(Q)?E ]= PrM+ [dom(Q)+ | E+] ?PrM+ [Q+ = q | dom(Q)+?E+]= PrM+ [dom(Q)+?Q+ = q | E+]= PrM+ [Q+ = q | E+],where the final equality follows from Lemma 4.We now show that any inference algorithm for belief networks can be used forOBBNs to compute the correct probabilities by proving the following theorem.Theorem 4. LetM be an OBBN. Suppose we treated the graphical structure andassociated CPDs ofM as for a regular belief network (i.e., ignoring whether thevariables are well-defined and the missing possible value ?undefined?), calledM?.Let S be a well-defined conjunction forM, PrM? [S] = PrM[S].Proof. In bothM andM?, the probability of S is the sum of the probabilities ofall maximal well-defined conjunctions that entail S . It therefore suffices to showthat, for any maximal well-defined conjunction Smax, PrM? [Smax] = PrM[Smax].Let Smax ? X?(1) = x?(1)??? ??X?(k) = x?(k). PrM? [Smax] can be computed by?marginalizing out? the variables not in vars(Smax) from the initial segment of abelief network:PrM? [Smax] = ?{x j?range(X j):X j /?vars(Smax)}PrM? [X1 = x1??? ??X?(k) = x?(k)], (A.7)where Xi is assigned the same value if Xi ? vars(Smax), and the other variables are45marginalized out. By the factorization of belief networks,PrM? [Smax] (A.8)= ?{xi?range(Xi):Xi /?vars(Smax)}PrM? [Xi = xi | Pi]??{x j}?X j /?vars(Smax)PrM? [X j = x j | pa(X j)Smax ] (A.9)= PrM[Smax]. (A.10)where Pi is the parent context for Xi such that Smax |= Pi. Equation A.9 followsbecause, by construction, Pi does not involve any X j /? vars(Smax) (even if X j isa parent variable of Xi). Equation A.10 follows since conditional probabilities ofevery X j sum to 1.This means that although M? can encode arbitrary probabilities for conjunc-tions that are not well-defined, we can still apply any belief network algorithm in3Q-INFERENCE, as we only make probabilistic queries that are well-defined withrespect to the ontology.46
- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- Probabilistic reasoning with undefined properties in...
Open Collections
UBC Theses and Dissertations
Featured Collection
UBC Theses and Dissertations
Probabilistic reasoning with undefined properties in ontologically-based belief networks Kuo, Chia-Li 2013
pdf
Page Metadata
Item Metadata
Title | Probabilistic reasoning with undefined properties in ontologically-based belief networks |
Creator |
Kuo, Chia-Li |
Publisher | University of British Columbia |
Date Issued | 2013 |
Description | This thesis concerns building probabilistic models with an underlying ontology that defines the classes and properties used in the model. In particular, it considers the problem of reasoning with properties that may not always be defined. Furthermore, we may even be uncertain about whether a property is defined for a given individual. One approach is to explicitly add a value "undefined" to the range of random variables, forming (what we call) extended belief networks. Adding an extra value to a random variable's range, however, has a large computational overhead. In this work, we propose an alternative, ontologically-based belief networks, where properties are only used when they are defined. We show how probabilistic reasoning can be carried out without explicitly using the value "undefined" during inference. This, in general, requires that we perform two probabilistic queries to determine (1) the probability that the hypothesis is defined and (2) the probabilities of the hypothesis given it is defined. We prove this is equivalent to reasoning with the corresponding extended belief network and empirically demonstrate on synthetic models that inference becomes more efficient. |
Genre |
Thesis/Dissertation |
Type |
Text |
Language | eng |
Date Available | 2013-10-22 |
Provider | Vancouver : University of British Columbia Library |
Rights | Attribution-NonCommercial-NoDerivatives 4.0 International |
DOI | 10.14288/1.0052177 |
URI | http://hdl.handle.net/2429/45338 |
Degree |
Master of Science - MSc |
Program |
Computer Science |
Affiliation |
Science, Faculty of Computer Science, Department of |
Degree Grantor | University of British Columbia |
Graduation Date | 2013-11 |
Campus |
UBCV |
Scholarly Level | Graduate |
Rights URI | http://creativecommons.org/licenses/by-nc-nd/4.0/ |
Aggregated Source Repository | DSpace |
Download
- Media
- 24-ubc_2013_fall_kuo_chiali.pdf [ 333.82kB ]
- Metadata
- JSON: 24-1.0052177.json
- JSON-LD: 24-1.0052177-ld.json
- RDF/XML (Pretty): 24-1.0052177-rdf.xml
- RDF/JSON: 24-1.0052177-rdf.json
- Turtle: 24-1.0052177-turtle.txt
- N-Triples: 24-1.0052177-rdf-ntriples.txt
- Original Record: 24-1.0052177-source.json
- Full Text
- 24-1.0052177-fulltext.txt
- Citation
- 24-1.0052177.ris
Full Text
Cite
Citation Scheme:
Usage Statistics
Share
Embed
Customize your widget with the following options, then copy and paste the code below into the HTML
of your page to embed this item in your website.
<div id="ubcOpenCollectionsWidgetDisplay">
<script id="ubcOpenCollectionsWidget"
src="{[{embed.src}]}"
data-item="{[{embed.item}]}"
data-collection="{[{embed.collection}]}"
data-metadata="{[{embed.showMetadata}]}"
data-width="{[{embed.width}]}"
async >
</script>
</div>
Our image viewer uses the IIIF 2.0 standard.
To load this item in other compatible viewers, use this url:
http://iiif.library.ubc.ca/presentation/dsp.24.1-0052177/manifest