UBC Theses and Dissertations

UBC Theses Logo

UBC Theses and Dissertations

A Markov random fields approach to modelling habitat Sánchez-Ordoñez, Andrés E. 2015

Your browser doesn't seem to have a PDF viewer, please download the PDF to view this item.

Item Metadata


24-ubc_2015_september_sanchezordonez_andres.pdf [ 341.44kB ]
JSON: 24-1.0165758.json
JSON-LD: 24-1.0165758-ld.json
RDF/XML (Pretty): 24-1.0165758-rdf.xml
RDF/JSON: 24-1.0165758-rdf.json
Turtle: 24-1.0165758-turtle.txt
N-Triples: 24-1.0165758-rdf-ntriples.txt
Original Record: 24-1.0165758-source.json
Full Text

Full Text

A Markov Random Fields Approach to Modelling HabitatbyAndre´s E. Sa´nchez-Ordon˜ezA THESIS SUBMITTED IN PARTIAL FULFILLMENTOF THE REQUIREMENTS FOR THE DEGREE OFMaster of ScienceinTHE FACULTY OF GRADUATE AND POSTDOCTORALSTUDIES(Statistics)The University of British Columbia(Vancouver)August 2015c© Andre´s E. Sa´nchez-Ordon˜ez, 2015AbstractHabitat modelling presents a challenge due to the variety of data available andtheir corresponding accuracy. One option is to use Markov random fields as away to incorporate these distinct types of data for habitat modelling. In this work,I provide a brief overview of the intuition, mathematical theory, and applicationconsiderations behind modelling habitat under this framework. In particular, anauto-logistic model is built and applied to modelling sea lion habitat using syntheticdata. First, we explore modelling one sample of data. Afterwards, the frameworkis extended to the multi-sample scenario. Finally, the theory for the methodologyis presented, the results of the applied implementation are presented.iiPrefaceThis dissertation is original, unpublished, independent work by the author Andre´sE. Sa´nchez-Ordon˜ez under the supervision of Prof. Matı´as Salibia´n-Barrera.iiiTable of ContentsAbstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iiPreface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iiiTable of Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ivList of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viList of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viiGlossary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ixAcknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . x1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52.1 Markov Random Fields . . . . . . . . . . . . . . . . . . . . . . . 52.2 The Negative Potential Function . . . . . . . . . . . . . . . . . . 82.3 Auto-logistic Model . . . . . . . . . . . . . . . . . . . . . . . . . 103 Current Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133.1 Auto-logistic Model for Tracking Data . . . . . . . . . . . . . . . 133.2 Identifiability of The Likelihood . . . . . . . . . . . . . . . . . . 163.3 Likelihood as an Auto-logistic Model . . . . . . . . . . . . . . . 173.4 Estimation of the Joint Likelihood . . . . . . . . . . . . . . . . . 21iv3.5 Multi-sample Joint Likelihood . . . . . . . . . . . . . . . . . . . 243.6 Interpretation of Model Parameters . . . . . . . . . . . . . . . . . 274 Implementation and Numerical Optimisation . . . . . . . . . . . . . 304.1 Gibbs Sampler Implementation . . . . . . . . . . . . . . . . . . . 304.2 Numerical Optimisation in the Multi-sample Scenario . . . . . . . 305 Computer Experiments . . . . . . . . . . . . . . . . . . . . . . . . . 356 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48A Appendix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51A.1 Standard Notation . . . . . . . . . . . . . . . . . . . . . . . . . . 51vList of TablesTable 3.1 Example of how the different configurations of occupied vs. non-occupied sites shown in Fig. 3.1 yield the same spatial auto-correlation parameter value for the auto-logistic model in Eq. 3.1. 29Table 5.1 Summary statistics for estimated parameter βˆ1 for six of thenine simulations conducted. Since only 3, 53, and 8 samplesconverged in the 4th, 6th, and 9th simulations, respectively, theresults are not shown here. . . . . . . . . . . . . . . . . . . . . 40Table 5.2 Summary statistics for estimated parameter θˆ1 for six of thenine simulations conducted. Since only 3, 53, and 8 samplesconverged in the 4th, 6th, and 9th simulations, respectively, theresults are not shown here. . . . . . . . . . . . . . . . . . . . . 41Table 5.3 Summary statistics for estimated parameter θˆ0 for six of thenine simulations conducted. Since only 3, 53, and 8 samplesconverged in the 4th, 6th, and 9th simulations, respectively, theresults are not shown here. . . . . . . . . . . . . . . . . . . . . 42viList of FiguresFigure 3.1 Two configurations of absence-presence locations on a latticethat yield equivalent spatial auto-correlation parameter valuefor the auto-logistic model in Eq. 3.1. The observed data y1(top) and y2 (bottom) are used in the example shown in Table 3.1. 15Figure 5.1 A realisation of an observed data grid generated in the simula-tions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36Figure 5.2 A typical realisation of the distribution of the covariate used inthe simulations. . . . . . . . . . . . . . . . . . . . . . . . . . 37Figure 5.3 Distribution of estimated parameter vector ηˆ throughout fivecycles of the Gibbs (MCMC) and Newton-Raphson optimisa-tion for the 1st simulation experiment. Horizontal line indi-cates the true value of the parameter boxplot of the correspond-ing colour. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38Figure 5.4 Distribution of estimated parameter vector ηˆ throughout fivecycles of the Gibbs (MCMC) and Newton-Raphson optimisa-tion for the 2nd simulation experiment. Horizontal line indi-cates the true value of the parameter boxplot of the correspond-ing colour. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39Figure 5.5 Distribution of estimated parameter vector ηˆ throughout fivecycles of the Gibbs (MCMC) and Newton-Raphson optimisa-tion for the 3rd simulation experiment. Horizontal line indi-cates the true value of the parameter boxplot of the correspond-ing colour. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43viiFigure 5.6 Distribution of estimated parameter vector ηˆ throughout fivecycles of the Gibbs (MCMC) and Newton-Raphson optimisa-tion for the 5th simulation experiment. Horizontal line indi-cates the true value of the parameter boxplot of the correspond-ing colour. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44Figure 5.7 Distribution of estimated parameter vector ηˆ throughout fivecycles of the Gibbs (MCMC) and Newton-Raphson optimisa-tion for the 7th simulation experiment. Horizontal line indi-cates the true value of the parameter boxplot of the correspond-ing colour. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45Figure 5.8 Distribution of estimated parameter vector ηˆ throughout fivecycles of the Gibbs (MCMC) and Newton-Raphson optimisa-tion for the 8th simulation experiment. Horizontal line indi-cates the true value of the parameter boxplot of the correspond-ing colour. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46viiiGlossaryMCMC Markov chain Monte CarloMLE Maximum Likelihood EstimatorMPLE Maximum Pseudo-Likelihood EstimatorixAcknowledgmentsSpecial gratitude is owed to my parents and my brother without whose continuedsupport this work would not be possible. I offer great gratitude to Prof. Matı´asSalibia´n-Barrera for his immense and continued help and advice from conceptionto completion of the work presented here. Finally, I want to thank the faculty, staff,and my fellow colleagues for all of their encouragement.xChapter 1MotivationThe determination of animal habitat is an open problem in the ecological sciences.For ecologists habitat can have a variety of interpretations. One common interpre-tation is that of critical habitat, which is of particular importance for governmentalregulation and sustainability of animal life [14]. Another interpretation is that ofhabitat as the place where animals reside the majority of their time [2]. This latterinterpretation has scientific significance as it allows ecologists to understand thebehaviour and habits of the animals under study. We focus on exploring this latterdefinition with regards to marine mammals and, in particular, sea lions.In order to comprehend animal habitat, a variety of distinct data has been gath-ered by researchers. Broadly, the data can be categorised into two types: systematicsurveys and opportunistic sightings.The first of these types of data, namely, systematic surveys consist of a metic-ulous combing of a target area believed to be of significance for animal habitat.Either through aerial or marine surveys, ecologists discretise and travel the area ofpotential habitat in a pre-specified order. Afterwards, the number of animals, aswell as characteristics of these (e.g., gender or approximate age), are recorded. In-cluded within systematic surveys are counts of animal populations on land. Sincesystematic surveys are usually carried out by experts on the field, the data recordedcan often be assumed to have high accuracy. Furthermore, the discretised natureof the data facilitates the modelling of spatial auto-correlation in a region [19] andthe incorporation of environmental covariates [13].1As opposed to systematic surveys, opportunistic sightings data, also knownas platform of opportunity data, are gathered in a less rigorous manner. Oppor-tunistic sightings data consists of approximate locations at which an animal wasobserved by an individual passing by a specified area [7]. The data are usuallygathered by non-experts including members of fishing vessels or members of thegeneral public. For example, members of commercial fishing vessels may volun-tarily record any sea lion sighted during their daily routine. Given that this type ofdata is gathered by non-experts voluntarily, the data is usually not believed to behighly accurate as it is difficult to verify that the individual who observed the ani-mal is sufficiently knowledgeable to distinguish similar looking animal species [7].Furthermore, the location at which the animal is observed is generally not recordedwith high precision by non-experts as a consequence of the voluntary aspect ofthe data gathering mechanism. More problematic still, opportunistic sightings donot record the effort exerted into observing the animal. That is, the records do notcommunicate whether the animal was observed after spending a day looking for ananimal at the location or whether the animal was observed in the few minutes takenfor a vessel to cross the location. This also leads to an over-abundance of zeroes(i.e., absence observations) in platform-of-opportunity data, generally denoted aspseudo-absences to account for the limitations in the data gathering mechanism[23]. Some studies have incorporated ad-hoc measures of effort when predictinghabitat from opportunistic sightings [7]. Other studies, however, have opted to useopportunistic sightings as a tool for model validation rather than for model build-ing [14] [20]. A third viable alternative, which we later consider, is to aggregateopportunistic sightings over time and use these as covariates in a larger frameworkas a way to dampen the noise in the data.A major drawback of both systematic surveys and opportunistic sightings isthat the data gathered only accounts for the presence of an animal at the specifiedlocation at the specified time. More specifically, systematic surveys may recordan animal at one location on the first day and may also record the same animal atanother location a week later. Since without some sort of tagging, it may be difficultto identify whether the animal recorded during a survey has been recorded before,systematic surveys can lead to biased data on animal habitat due to over-estimationof the animals in the area and a misrepresentation of their habitat preferences.2Similarly, since opportunistic sightings may be recorded by distinct groups of non-synchronised individuals, the records on a certain day may show a multitude ofanimals observed at a location while in reality it may just be the same animalobserved by multiple individuals at the location. Finally, it is important to note thatboth systematic surveys and opportunistic sightings cannot account with certaintythe absence of an animal. That is, if an animal is not observed at a location, anobserver cannot state with certainty that the location is unfavourable habitat for theanimal [23] [22]. All that can be stated is that at the specified time an animal wasnot observed at the location. This issue is more problematic for the opportunisticsightings data since there is no indication of where the animals may or may nothave travelled.An additional drawback of both systematic surveys and opportunistic sightingsis that neither of these types of data are driven by the behaviour of the animal itself.Systematic surveys sample locations believed to be pertinent to animal habitat, andopportunistic sightings record by chance observations of animals. However, eco-logical theory suggests that habitat is driven by the availability of resources [3].In this setting resources refers to characteristics of the environment that benefit orfacilitate the survival and growth of an animal population [4] [8]. Generally, thesemay refer to the availability of food or lack of predators. Quantifying these, how-ever, can be rather difficult. Therefore, resources are usually inferred by physicalenvironmental variables such as topography, bathymetry, temperature, chlorophyllabundance, etc. [20] [15] [21]. The correlation between physical environmentalvariables and resources is general justified by the observed or assumed correlation,which is based on past literature and scientific theory [14].Lately, a new type of data has gained popularity that can alleviate some of thedrawbacks faced by systematic surveys and opportunistic sightings. Using tags at-tached to animals, ecologists are now recording telemetry or tracking data. Theseconsist of satellite-based locations of an animal, usually sampled multiple times aday, through an extended period of time. In addition to recording the animal’s loca-tion, several of the tags currently in use also record environmental variables at thelocation such as temperature, bathymetry, light intensity, etc. Since telemetry dataoriginates from the location travelled by the animal itself, the data is believed toprovide a less biased view of the animal’s habitat preferences. However, there are3a few limitations to telemetry data. First, the tags are usually attached to a specificsubset of the animal population giving a potentially biased data sample. For ex-ample, in the case of Steller sea lions, tags are usually attached to females or pupssince attaching the tags to the male sea lions is a dangerous task [21]. Second, dueto the cost of the tags, only a few animals in a colony are tagged providing a samplethat may not be representative of the population. Nevertheless, the abundance ofinformation sampled by the telemetry data tags has allowed certain studies to useoff-the-shelf methods to model habitat [21]. While these models incorporate theanimals’ trajectory and environmental variables, the spatial auto-correlation of thedata is absent.In order to get a more complete understanding of animal habitat, it would beideal to be able to incorporate distinct types of data into a single model. We proposethe use of an auto-logistic model with covariates as a reasonable habitat model. Theresponse variable in the model is an indicator for whether the animal occupied adiscrete spatial location based on the telemetry data. The joint probability of thetelemetry data locations are modelled by an exponential function of the covariates.The covariates, in this case, could consist of environmental data or come fromopportunistic sightings or systematic surveys. Moreover, spatial auto-correlationis also incorporated into the model by way of a Markov neighbourhood structureof the telemetry data. Furthermore, by using an auto-logistic model, importantcovariates could be reasonably determined within a hypothesis testing framework.4Chapter 2BackgroundThis section constitutes a review of the literature necessary for deriving and under-standing the auto-logistic model that is implemented in our work. The backgroundpresented here draws mainly on the results found in [5] and [9]. First, the theo-retical foundation of Markov random fields is introduced including the definitionsof the spatial neighbourhood structure and potential function utilised for buildingan auto-logistic model. Afterwards, the framework of conditional dependence ofthe model is briefly discussed. Finally, using the foundations discussed, an auto-logistic model is formally defined.2.1 Markov Random FieldsIn order to define a Markov random field, it is necessary to first define the con-cept of a neighbour with respect to a distribution as opposed to a topology. Let Ybe a discrete random variable defined over a squared lattice D on R2. The sites{s1,s2, . . . ,sN} ⊂D denote the N spatial locations at which observations are avail-able. Then, following the notation used in [9], write a conditional probability pro-cess P(Y (s1) = y(s1), . . . ,Y (sI) = y(sI)|Y (sI+1) = y(sI+1), . . . ,Y (sI+J) = y(sI+J))as P(y(s1), . . . ,y(sI)|y(sI+1), . . . ,y(sI+J)).Definition 1 (Neighbour) A site k is a neighbour of site i if the conditional dis-tribution of Y (si), given all other site values, depends functionally on Y (sk) fori 6= k.5Definition 2 (Clique) A clique denotes a set of sites consisting of either a singlesite or sites that are neighbours to each other.Having now defined a neighbourhood structure for a site, we can now formallydefine a Markov random field as follows:Definition 3 Any probability measure whose conditional distributions define a neigh-bourhood structure {Ni : i = 1, . . . ,N} is defined to be a Markov random field.In other words, a Markov random field is a set of random variables each ofwhose conditional distribution depends on each other based on a Markov neigh-bourhood structure. In other words, a Markov random field is an undirected graphwith the Markov property.Given the conditional dependence of a Markov random field, we make use ofthe factorisation theorem [5] below to specify a joint probability mass or densityfunction as the product of conditional probabilities. This enables us to computedifficult joint probabilities as the product of simpler conditional probabilities.Theorem 2.1.1 (Factorisation Theorem) Let P(·) denote the joint probability massfunction of a random variable {Y (si) : i = 1, . . . ,N} whose support ζ satisfies thepositivity condition (see below). Then,P(y)P(z)=N∏i=1P(y(si)|y(s1), . . . ,y(si−1),z(si+1), . . . ,z(sN))P(z(si)|z(s1), . . . ,z(si−1),y(si+1), . . . ,y(sN)), (2.1)for all y,z ∈ ζ where y = (y(s1), . . . ,y(sN))T and z = (z(s1), . . . ,z(sN))T are allpossible realisations of Y .In order for the factorisation theorem to be true, however, the positivity condi-tion below must be satisfied.Definition 4 (Positivity Condition) Let {y(si) : i = 1, . . . ,N} be the values of adiscrete random variable Y observed at the locations on a lattice. Define ζ ≡{y : P(y) > 0} and ζi ≡ {y(si) : P(y(si)) > 0} for i = 1,2, . . . ,N. The positivitycondition is satisfied if ζ = ζ1×ζ2,×·· ·×ζN .6In essence, the positivity condition ensures that when the joint probability func-tion is factorised as in Theorem 2.1.1, none of the individual conditional probabil-ities are zero and, thus, the product of the conditionals will be non-zero. Althoughthe necessity of the positivity condition makes intuitive sense, it is neverthelessworth illustrating the condition through an example adapted from [9] in order tosee possible complications in the specification of a model based on a Markov ran-dom field.Let y(si) = I(si is occupied), that is, an indicator for a presence being observedat site i, and assume that site y(s j) can only be occupied if one of its immediateneighbours in the four cardinal directions is occupied. Under these conditions, thefollowing arrangement would need to have a probability of zero.0 0 00 1 00 0 0(2.2)This is because the only way the centre site could be occupied is through con-tact with an occupied neighbour in the four cardinal directions. Hence, unless thisis the starting state for the occupancy process and assuming that occupied sitescannot revert back to being unoccupied, the probability of the arrangement aboveoccurring should be zero by the positivity condition.Now, assume that the initial occupied site is somewhere on the lattice otherthan the observed occupied site and that we arrive at the following configuration:y(s1) 0 y(s2) y(s3)0 1 y(s4) y(s5)y(s6) 0 y(s7) y(s8)Note that under this configuration, we should observeP(y(s4) = 0|{y(s j) : j 6= 4}) = 0,as otherwise we would get the same situation as in the Eq. 2.2 above where thepositivity condition is violated. However, note that given the neighbours of s4in the four cardinal directions, namely, N(s4) : {y(s2),1,y(s5),y(s7)}, we have7P(y(s4)= 0|N(s4))> 0 since we only observe one occupied neighbour for s4. Sincewe cannot ensure that P(y(s4) = 0|{y(s j) : j 6= 4}) = 0, the positivity condition isviolated in this case. Therefore, when specifying conditional probability modelsbased on Markov random fields, it is important to check that the framework of thestochastic process allows the positivity condition to be satisfied.2.2 The Negative Potential FunctionHaving now covered the necessary foundation for Markov random fields, we nowintroduce the (negative) potential function which will play an important role in thespecification of the auto-logistic model.Assumption 1 Without loss of generality, assume zero can occur at any site [9];that is, 0 ∈ ζ .Definition 5 Let Q be the negative potential function whereQ(y)≡ log{P(y)P(0)}for y ∈ ζ .It follows from Def. 5 thatexp(Q(y)) =P(y)P(0)for y ∈ ζ ,such that,P(y) =exp(Q(y))∑z∈ζexp(Q(z)),where the normalising constant ∑z∈ζ exp(Q(z)) is commonly called the partitionfunction.Now, we make use of the factorisation theorem to derive some important prop-erties of the negative potential function Q that allow a conditional probabilitymodel to be later derived.Proposition 1 The negative potential function Q satisfies the following two equa-8tions:P(y(sk)|{y(s j) : j 6= k})P(0(sk)|{y(s j) : j 6= k})=P(y)P(y0(sk))= exp{Q(y)−Q(y0(sk))}, (2.3)where 0(sk)≡ {y(sk) = 0} and y0(sk) ≡ (y(s1), . . . ,y(sk−1),0,y(sk+1), . . . ,y(sN))T .Furthermore, Q can be expanded uniquely on ζ asQ(y) = ∑1≤i≤Ny(si)Gi(y(si))+ ∑1≤i< j≤Ny(si)y(s j)Gi j(y(si),y(s j))+ ∑1≤i< j<l≤Ny(si)y(s j)y(sl)Gi jl(y(si),y(s j),y(sl))+ · · ·+ y(s1) · · ·y(sN)G1...N(y(s1), . . .y(sN)),(2.4)where Gi(·) is defined as y(si)Gi(y(si))≡ Q(0, . . . ,0,y(si),0, . . . ,0) andy(si)y(s j)Gi j(y(si),y(s j))≡ Q(0, . . . ,0,y(si),0, . . . ,0,y(s j), . . .)−Q(0, . . . ,0,y(si),0, . . . ,0)−Q(0, . . . ,0,y(s j),0, . . . ,0),where Q is a unique function on ζ if Gi j(·) = 0 whenever y(si) = 0 or y(s j) = 0.Therefore, from Eq. 2.3 and Eq. 2.4, we can specify Q(y) in terms of the con-ditional probabilities:y(sk)Gk(y(sk)) = log{P(y(sk)|{0(s j) : j 6= k})P(0(sk)|{0(s j) : j 6= k})}. (2.5)The equations described in Proposition 1 are important in that they allow oneto precisely define forms of the negative potential function for which the partitionfunction can be defined as a closed form of the parameters of Q. This result isproved in an unpublished paper by Hammersley and Clifford [17] as follows:Theorem 2.2.1 (Hammersley-Clifford Theorem) SupposeY is distributed accord-ing to a Markov random field ζ that satisfies the positivity condition. Then, the neg-ative potential function Q(·) must satisfy the following: if sites (si,s j, . . . ,sl) do notform a clique, then Gi, j,...,l(·)≡ 0, where cliques are defined by the neighbourhood9structure {Ni : i = 1, . . . ,N}.In other words, the Hammersley-Clifford theorem states that conditional spatialmodels can be specified in terms of a few non-zero functions Gi, j,...,l(·). The re-sults of the theorem are perhaps clearer when we can specify the negative potentialfunction as follows:Let κ be a clique and define yκ ≡ (y(si) : i∈ κ)T and Vκ(yκ)= {∏i∈κ y(si)}Gκ(yκ).Then,Q(y) = ∑κ∈CVκ(yκ).where C is the set of all cliques. Thus, from Eq. 2.3 we find thatP(y(si)|{y(s j) : j 6= i}) ∝ exp(∑κ:i∈κVκ(yκ))for i = 1,2, . . . ,N.A corollary of the above result is obtained if ζ is countable in Rd or ζ is aLebesgue measurable subset of Rd , and a set of well-defined functions G can bespecified from the conditional probabilities and neighbourhoods Ni : i = 1, . . . ,N.Then, the resulting negative potential function yields a unique well-defined jointprobability function provided that ∑y∈ζ exp(Q(y))< ∞ holds [9]. In general, how-ever, the partition function cannot be written in closed form as a function of theparameters of the negative potential function which makes maximising the exactlikelihood computationally prohibitive [9].2.3 Auto-logistic ModelWe now have the appropriate foundation to specify the auto-logistic model thatforms the basis of our proposed model. Since the probability mass function of Yhas the form P(y) = exp(Q(y))/∑z∈ζ exp(Q(z)), exponential family models canbe trivially expressed in terms of the negative potential function. Furthermore, thisallows us to specify pairwise-only dependence models for conditional exponentialdistributions [5]. First, note that, as shown in [9], the conditional distribution of10exponential family models can written as:P(y(si)|{y(s j) : j 6= i}) =exp{Ai({y(s j) : j ∈Ni})Bi(y(si))+Di({y(s j) : j ∈Ni})+Ci(y(si))},(2.6)where for i = 1,2, . . . ,N we have Ai and Di which depend on the neighbours of theith site and Bi and Ci which do not.Now, assuming Eq. 2.6 and assuming pair-wise only dependence, we haveAi({y(s j) : j ∈Ni}) = αi +N∑j=1θi jB j(y(s j)), (2.7)where θi j = θ ji, θii = 0, and θik = 0 for k 6∈Ni and i = 1,2, . . . ,N.Using the above, an auto-logistic model is defined as follows [5] [9]. LetGi(1)≡ αi and Gi j(1,1)≡ θi j. Thus, the negative potential function in terms of αiand θi j is:Q(y) =N∑i=1αiy(si)+ ∑1≤i< j≤Nθi jy(si)y(s j),where θi j = 0 if j 6∈Ni and θii = 0 to maintain identifiability. Consequently, fromEq. 2.3 it follows thatQ(y)−Q(y0(sk)) = αky(sk)+N∑j=1θk jy(sk)y(s j).Under the same assumptions, we can apply the above general result to thebinary case where y(si) = 0 or 1. First, note that we can express:P(y(sk)|{y(s j) : j 6= k})P(0(sk)|{y(s j) : j 6= k})= exp{αky(sk)+N∑j=1θk jy(sk)y(s j)}.And similarly,1−P(0(sk)|{y(s j) : j 6= k})P(0(sk)|{y(s j) : j 6= k})= exp{αk +N∑j=1θk jy(s j)}.11After a bit of algebra, we reach the following form of the conditional probability:P(y(sk)|{y(s j) : j 6= k}) =exp{αky(sk)+∑Nj=1 θk jy(sk)y(s j)}1+ exp{αk +∑Nj=1 θk jy(s j)} , (2.8)which is defined to be an auto-logistic model due to its similarity to the classicalform of the logistic model.12Chapter 3Current WorkIn this section, we derive our proposed model. First, we adapt the model specifiedby Huffer and Wu [18] to more reasonably model telemetry data. Next, we dealwith issues of identifiability in the proposed likelihood and adjust the model toalleviate these. We then show that the proposed likelihood is an auto-logistic modelas defined in Section 2.3. After this, we derive the usual elements of the modelneeded for maximum likelihood estimation, namely, the log-likelihood, gradient,and information matrix. Following this, we extend the model to the multi-samplecase where more than one lattice of spatial location data is available. Finally, wetouch upon numerical issues and methods to resolve these in the multi-sample case.3.1 Auto-logistic Model for Tracking DataOur proposed method has a foundation in the unnormalised auto-logistic likelihoodfunction put forth in [18] shown below (Eq. 3.1). In order to simplify the notation,we let y = (y1, . . . ,yN)T , where yi = y(si) = I(presence was observed at site i) forcells on a lattice or grid indexed by i = 1,2, . . . ,N. Furthermore, xi denotes a vectorof the value of the explanatory variables observed at the ith site. Finally, j ∈Nidenotes sites neighbouring site i based on a pre-defined neighbourhood structure.In our model, the first-order neighbours in any direction are considered neighbours.13f (y) ∝ exp{ N∑i=1[yiβ0 + yixiβ 1]+ (γ/2) ∑j∈Ni[yiy j +(1− yi)(1− y j)]}. (3.1)Although the model above is a viable method for modelling binary data in alattice, we want a more fine-tuned model which differentiates between occupiedsites, where an animal has traversed, from unoccupied sites (i.e., unvisited by ananimal) assigning a separate parameter to each as this may fit telemetry data moreclosely. Initial experimentation showed that the parameter specifying the spatialstructure of the data(γ/2)N∑i=1∑j∈Ni[yiy j +(1− yi)(1− y j)] = (γ/2)N∑i=1yi ∑j∈Niy j +(1− yi) ∑j∈Ni(1− y j),did not differentiate between zones of spatially auto-correlated occupied sites andthose of empty sites. For example, using Eq. 3.1, the two data grids shown inFig. 3.1 would have the same estimated coefficient since the termN∑i=1∑j∈Ni[yiy j +(1− yi)(1− y j)] ,equals 14 in both cases. Table 3.1 breaks down the process with y1 calculations inthe left column and y2 in the right column.Given that our aim is to use telemetry data as the source of our presence/ab-sence response variable, the difference between a track of occupied sites and atrack of empty sites is critical. In order to make this distinction clear in the estima-tion procedure, we choose to split the spatial structure parameter into two distinctparameters. The first, θ1, tracks series of occupied sites surrounded by empty sites.The second, θ2, tracks zones composed entirely of empty sites.Note that given our specification of the spatial parameters, we still meet thepositivity condition since a site can turn to one or zero so long as there is at least aone (presence) or a zero (absence) surrounding it.With our new parametrisation of the spatial structure parameters, the likelihood140 0 01 1 11 0 0longitudelatitude0.000.250.500.751.00valuey11 1 10 0 00 1 1longitudelatitude0.000.250.500.751.00valuey2Figure 3.1: Two configurations of absence-presence locations on a lattice thatyield equivalent spatial auto-correlation parameter value for the auto-logistic model in Eq. 3.1. The observed data y1 (top) and y2 (bottom)are used in the example shown in Table 3.1.becomes:f (y) ∝ exp{N∑i=1yiβ0 + yixiβ 1 +θ1 ∑j∈Niyi(1− y j)+θ2 ∑j∈Ni(1− yi)(1− y j)}.(3.2)The likelihood shown in Eq. 3.2 has an issue of identifiability where the in-15tercept term is not identifiable due to the form of the spatial structure parameters.Dropping the intercept term, we are left with the following likelihood whose iden-tifiability we show next:f (y) = exp{N∑i=1yixiβ +θ1 ∑j∈Niyi(1− y j)+θ2 ∑j∈Ni(1− yi)(1− y j)}/z(η ),(3.3)where η = (β ,θ1,θ2)T .3.2 Identifiability of The LikelihoodWe now verify the identifiability of the likelihood. It is clear that under someweak assumptions on the xi’s, such as all of the xi not being equal to each other,terms depending on the parameter β are identifiable due to their dependence onthe particular covariate being modelled. Hence, to show the identifiability of themodel, we must show that the spatial auto-correlation parameters are identifiable.Showing identifiability of the model is trivial and involves a simple expansionof the spatial auto-correlation parameters.θ1∑i∑j∈Ni(yi− yiy j)+θ2∑i∑j∈Ni(1− yi)(1− y j)= θ1∑iyi|Ni|+(θ2−θ1)∑i∑j∈Niyiy j +θ2[∑i|Ni|−∑i∑j∈Niy j−∑iyi|Ni|]= (θ2−θ1)∑i∑j∈Niyiy j +(θ1−θ2)∑iyi|Ni|+θ2∑i|Ni|−θ2∑i∑j∈Niy jwhere |Ni| denotes the cardinality of the neighbours of the ith site, and ∑i is ashorthand notation for ∑Ni=1. Note that the number of neighbours depends on thelocation of the site and the given neighbourhood structure. For example, for thefirst-order neighbourhood structure we use, we have eight neighbours for sites noton the edge of the lattice, five neighbours for sites on the edge but not on a corner,and three neighbours for sites on the corners. Finally, from the expansion above,we can see that there are four terms which contribute to the likelihood in addition16to the term containing the covariate. Since each of the four terms has a distinctsufficient statistic associated each with its own parameter (or a difference betweenparameters), it is clear that the model is identifiable for distinct parameter values.3.3 Likelihood as an Auto-logistic ModelHaving shown the identifiability of the model, we now show that the proposed like-lihood is an auto-logistic model as previously specified and, thus, can be expressedas the product of conditional probabilities.First, recall from Def. 5 that the probability mass function of a Markov randomfield can be written in terms of the negative potential function and the partitionfunction. We denote the (exponentiated) negative potential function as:exp{Q(y)}= exp{N∑i=1yixiβ +θ1 ∑j∈Niyi(1− y j)+θ2 ∑j∈Ni(1− yi)(1− y j)}.(3.4)In our case, the partition function is just a normalising constant defined as thesum over all possible configurations of the data. Consequently, we denote thenormalising constant z(η ) as a function of the parameters of the model η . Notethat computation of z(η ) is usually intractable due to the dimension of the data.Next, fix k and let y0k = (y1, . . . ,yk−1,0,yk+1, . . . ,yN)T and using Eq. 2.3, wetake the difference of the negative potential functions at y and y0k and exponentiatethese to find:P(y)P(y0k)= exp{∑iyixiβ −∑i6=kyixiβ +θ1∑i∑jyi(1− y j)−θ1∑i 6=k∑jyi(1− y j)+θ2∑i∑j(1− yi)(1− y j)−θ2∑i6=k∑j(1− yi)(1− y j)−θ2 ∑j∈Nk(1− y j)}.We now simplify the equation by parts. For the terms containing the parameter17β we have:∑iyixiβ −∑i 6=kyixiβ = ykxkβ . (3.5)Next, for the terms containing the parameter θ1, we break the summation intothose sites not neighbouring site k, those which have site k as a neighbour and thecontribution from the neighbours of these, and the neighbours of site k.θ1∑i∑jyi(1− y j)−θ1∑i 6=k∑jyi(1− y j)= θ1 ∑i6∈Nk∑j∈Niyi(1− y j)+θ1 ∑i∈Nk∑j∈Niyi(1− y j)+θ1 ∑j∈Nkyk(1− y j)−θ1 ∑i6∈Nk∑j∈Niyi(1− y j)−θ1 ∑i∈Nk∑j∈Ni, j 6=kyi(1− y j)−θ1 ∑i∈Nkyi= θ1 ∑i∈Nk∑j∈Niyi(1− y j)+θ1 ∑j∈Nkyk(1− y j)−θ1 ∑i∈Nk∑j∈Ni, j 6=kyi(1− y j)−θ1 ∑i∈Nkyi.(3.6)Note that for sites in the neighbourhood of the kth site, we must take into ac-count the value of the kth site. Moreover, note that:θ1 ∑i∈Nk∑j∈Niyi(1− y j) = θ1 ∑i∈Nkyi(∑j∈Ni, j 6=k(1− y j)+(1− yk)).Hence, Eq. 3.6 simplifies toθ1 ∑i∈Nk∑j∈Ni, j 6=kyi(1− y j)+θ1 ∑i∈Nkyi(1− yk)+θ1 ∑j∈Nkyk(1− y j)−θ1 ∑i∈Nk∑j∈Ni, j 6=kyi(1− y j)−θ1 ∑i∈Nkyi=−θ1 ∑i∈Nkyiyk +θ1 ∑j∈Nkyk(1− y j) = θ1 ∑j∈Nk(yk−2y jyk).For the terms containing the parameter θ2, we similarly break down the sum-mation based on the neighbours of the kth site.18θ2∑i∑j(1− yi)(1− y j)−θ2∑i6=k∑j(1− yi)(1− y j)= θ2 ∑i 6∈Nk∑j∈Ni(1− yi)(1− y j)−θ2∑i 6∈k∑j∈Ni(1− yi)(1− y j)+θ2 ∑i∈Nk∑j∈Ni(1− yk)(1− y j)+θ2 ∑j∈Nk(1− yk)(1− y j)−θ2 ∑i∈Nk∑j∈Ni(1− yi)(1− y j)−θ2 ∑j∈Nk(1− yk)(1− y j).(3.7)As was the case with θ1, we can simplify sites which have site k as a neighbouras:θ2 ∑i∈Nk∑j∈Ni(1−yi)(1−y j)= θ2 ∑i∈Nk∑j∈Ni, j 6=k(1−yi)(1−y j)+θ2 ∑i∈Nk(1−yi)(1−yk).Hence, Eq. 3.7 simplifies toθ2 ∑i∈Nk∑j∈Ni, j 6=k(1− yi)(1− y j)+θ2 ∑i∈Nk(1− yi)(1− yk)+θ2 ∑j∈Nk(1− yk)(1− y j)−θ2 ∑i∈Nk∑j∈Ni, j 6=k(1− yi)(1− y j)−θ2 ∑i∈Nk(1− yi)−θ2 ∑j∈Nk(1− y j)= 2θ2 ∑j∈Nk(1− yk)(1− y j)−2θ2 ∑j∈Nk(1− y j).Therefore, taking the simplified forms of Eq. 3.5, Eq. 3.6, and Eq. 3.7, we findthe ratio of the conditional probabilities at the kth site given all other sites to be:P(y)P(y0k)= exp{ykxkβ +θ1(∑j∈Nkyk(1− y j)− y jyk)+2θ2(∑j∈Nk(1− yk)(1− y j)− (1− y j))}=P(yk|{y j : j 6= k},xk)P(0k|{y j : j 6= k},xk),where 0k ≡ 0(sk).19Furthermore, since our data are binary we note that1−P(0k|{y j : j 6= k},xk)P(0k|{y j : j 6= k},xk)= exp{xkβ +θ1 ∑j∈Nk(1−2y j)−2θ2 ∑j∈Nk(1− y j)},so that we arrive at the conditional likelihood of the kth site given all other sites tobe:P(y(sk)|{y(s j) : j 6= k},xk) =exp{ykxkβ +θ1(∑ j∈Nk yk(1− y j)− y jyk)+2θ2(∑ j∈Nk(1− yk)(1− y j)− (1− y j))}1+ exp{xkβ +θ1∑ j∈Nk(1−2y j)−2θ2∑ j∈Nk(1− y j)} ,which is an auto-logistic model as previously defined. Furthermore, it is worthnoting that the conditional probability of observing presence, that is, yk = 1 at anysite k, is simply an inverse-logit function of a linear function gk(·) of the covariatesxk at a location yk:P(yk = 1|{y j : j 6= k},xk) =exp{gk(xk)}1+ exp{gk(xk)}, (3.8)wheregk(xk) = xkβ +θ1 ∑j∈Nk(1−2y j)−2θ2 ∑j∈Nk(1− y j) (3.9)By the factorisation theorem (Theorem 2.1.1), it follows that we can estimatethe parameters of the joint likelihood as the product of the conditional probabilitiesat each site. Since each conditional probability is essentially a logistic function,estimation of the parameters in the factorised likelihood can be trivially achievedvia logistic regression. This estimate, known as the Maximum Pseudo-LikelihoodEstimate (MPLE), has the advantage of not requiring one to compute the normal-ising constant of the joint likelihood. Therefore, the MPLE serves as the startingpoint in our MCMC-MLE optimisation procedure described later in Section Estimation of the Joint LikelihoodWe now examine the estimation of the joint likelihood via Monte Carlo meth-ods. Since in order to estimate model parameters a tractable form of the likelihoodfunction is needed, we follow the material presented in [18] and [12] in examin-ing available methods to obtain, or at least estimate, a normalised joint likelihoodfunction to optimise.By definition, one can decompose the normalised likelihood function into thetwo components shown below: the unnormalised likelihood, hη(y), and the nor-malising constant, z(η ):f (y,η ) = 1z(η )hη(y),Here, the unnormalised likelihood is the exponentiated negative potential functionfrom Eq. 3.4:hη(y) = exp{Q(y)}= exp{N∑i=1yixiβ +θ1 ∑j∈Niyi(1− y j)+θ2 ∑j∈Ni(1− yi)(1− y j)},(3.10)and the normalising constant isz(η ) =∫hη(y)dµ,where µ is the Lebesgue measure on R.As is common in the field of spatial statistics, direct computation of the nor-malising constant is intractable. In our case, computing the normalising constantis particularly difficult due to the constant being a function of our parameter to beestimated.In estimating the normalising constant, we proceed as suggested in [18]. Ratherthan estimating the normalising constant on its own, we introduce an importancedistribution and estimate the ratio of the normalising constant of the original like-lihood over that of the importance distribution. Therefore, we estimate the ratioof the likelihood with the parameters to be estimated η over a likelihood with ar-21bitrarily chosen importance parameters η ∗ = (β ∗,θ ∗1 ,θ ∗2 )T . Under this construct,maximising the likelihood f (y,η ) is equivalent to maximising the ratio of the twolikelihoods:argmaxηf (y,η ) = argmaxηf (y,η )f (y,η ∗) ,wheref (y,η )f (y,η ∗) =hη(y)hη∗(y)·z(η ∗)z(η ) =hη(y)z(η )z(η ∗)hη∗(y). (3.11)Now, instead of estimating the normalising constant directly, we estimate theratio of the normalising constants for the two likelihoods. Estimating the ratio ofthe normalising constant is easier sincez(η )z(η ∗) =∫hη(y)z(η ∗) fη∗(y)dµ(y) = Eη∗[hη(Y )hη∗(Y )]≈1MM∑m=1hη(y∗m)hη∗(y∗m), (3.12)where y∗m is a sample from f (·,η ∗). Substituting Eq. 3.12 into Eq. 3.11, we find:f (y,η )f (y,η ∗) ≈hη(yobs)/hη∗(yobs)1MM∑m=1hη(y∗m)/hη∗(y∗m). (3.13)Note that by making use of the ergodic theorem [10], the expected value wewish to find can be computed from a non i.i.d.sample y∗m. Therefore, for our esti-mation we sample pseudo-data generated via a Gibbs-sampler (i.e., MCMC) fromY ∗ ∼ f (·,η ∗). The arbitrarily chosen η ∗ ought to be close to the true parameterη so that the maximisation procedure works adequately, and the obtained estimateηˆMLE is accurate. Hence, as previously mentioned, we set the MPLE as the initialvalue for η ∗.For conciseness, we can summarise the observed data into a vector of sufficientstatistics, t(y), such thatt(y) =(∑iyixi,∑i∑j∈Niyi(1− y j),∑i∑j∈Ni(1− yi)(1− y j))T.22Using t(y), we can approximate our proposed likelihood in the form of Eq. 3.13 asa function of y,y∗,η , and η ∗:L(η ;y,η ∗)≈ exp{(η −η∗)T t(yobs)}1M ∑mexp{(η −η ∗)T t(y∗m)}, (3.14)where yobs are the observed data and y∗m is the mth sample of pseudo-data generatedvia MCMC for m = 1, . . . ,M. Similarly, we can derive the log-likelihood as:`(η ;y,η ∗)≈ (η −η ∗)T t(yobs)− log(1M∑mexp{(η −η ∗)T t(y∗m)}).Now that we have an estimate for the log-likelihood function in closed form,we can compute the maximum likelihood estimates as usual. First, we compute thegradient of the log-likelihood function. That is, we compute the vector of partialderivatives where the partial derivative of the log-likelihood with regards to the jthparameter is:∂`(η )∂η j= t j(yobs)−M∑m=1exp{(η −η ∗)T t(y∗m)}t j(y∗m)M∑m=1exp{(η −η ∗)T t(y∗m)}= t j(yobs)−M∑m=1wmt j(y∗m).(3.15)Note that we usewm =exp{(η −η ∗)T t(y∗m)}M∑m=1exp{(η −η ∗)T t(y∗m)},to denote the weight of the mth MCMC sample.23Similarly, we compute the Hessian matrix of second-order partial derivatives:∂ 2`(η )∂ηiη j=−M∑m=1exp{(η −η ∗)T t(y∗m)}ti(y∗m)t j(y∗m)TM∑m=1exp{(η −η ∗)T t(y∗m)}+(M∑m=1exp{(η −η ∗)T t(y∗m)}ti(y∗m))(M∑m=1exp{(η −η ∗)T t(y∗m)}t j(y∗m)T)(M∑m=1exp{(η −η ∗)T t(y∗m)})2=−wmti(y∗m)t j(y∗m)T +w2mti(y∗m)t j(y∗m)T .(3.16)The MLE is found by maximising the log-likelihood, which is equivalent tofinding the value of η where the gradient is zero and the Hessian is negative defi-nite.Finally, standard errors of the MLE estimates can then be computed via theFisher information defined as the expectation of the negative Hessian of the log-likelihood:I(η 0) =−E[∂ 2`(η )∂ηη T].However, we estimate the Fisher information by taking the expectation of the neg-ative Hessian evaluated at the MLE. Thus, the observed Fisher information is:Iˆ(η 0) =−∂ 2`(η )∂ηη T∣∣∣∣η=θˆ= Eθˆ[wmti(y∗m)t j(y∗m)T −w2mti(y∗m)t j(y∗m)T ] . (3.17)3.5 Multi-sample Joint LikelihoodThus far, we have focused on the case described in [18], where we have only asingle lattice or grid of data representing the sites y where there is either an ob-served presence or absence. Now, we move on to the multi-sample case where thedata consists of a set of grids. Each of these grids can vary in terms of presence orabsence at a site, the covariate values at sites, etc. However, we can consider allthese grids as having been generated from a common true stochastic process.24In the multi-sample case, we have K grids of available data each of which weassume to have been discretised to the same resolution of d×d sites. We assumethat the grids, y1, . . . ,yK , are all independent and identically distributed and taketheir multi-sample joint likelihood to be the product of the individual K likelihoods.L(η ;y1, . . . ,yK ,η ∗) =K∏k=1L(η ;yk,η ∗). (3.18)Recalling Eq. 3.11, one can re-write the multi-sample joint likelihood as:K∏k=1L(η ;yk,η ∗) =K∏k=1hη(yobs,k)/hη∗(yobs,k)z(η )/z(η ∗) .When approximating the ratio of the normalising constants z(η )/z(η ∗) in Eq. 3.12,one estimates the integration over the entire sample space by making use of Mgenerated pseudo-grids (each of dimension d× d). It is important to note, how-ever, that in the multi-sample case estimating the ratio of the normalising constantsjointly requires an integration over M random vectors of K grids each. There-fore, in computing the multi-sample joint likelihood, the integration in the measureµ(y1, . . . ,yK) no longer corresponds to integrating over all possible configurationsof a single grid y but rather to all possible configurations of a joint set of grids. Thisjoint set, however, is itself a function of the possible configurations for each of theK individual grids t(y1, . . . ,yK) so thatz(η )z(η ∗) =∫. . .∫hη(y1, . . . ,yK)z(η ∗) fη∗(y1, . . . ,yK)dµ.In particular, note that the function of sufficient statistics t(·) is applied to eachof the K observed samples as well as the M pseudo-data samples generated.Under our parametrisation of the likelihood, where the likelihood is propor-tional to an exponentiated linear term, the joint ratio of the normalising constantscan be estimated as follows. First, we add the sufficient statistics across all Kdistinct grids for the mth pseudo-sample. Next, each of these summed sufficientstatistics is used to evaluate the ratio of the unnormalised likelihoods at a given ηand η ∗. Finally, we take the average of these evaluated ratios of the unnormalised25joint likelihood as an approximation for their expectation.z(η )z(η ∗) = Eη∗[hη(Y 1, . . . ,Y K)hη∗(Y 1, . . . ,Y K)]≈1M∑mexp{(η −η ∗)T ∑kt(y∗k,m)}. (3.19)Having derived an estimate for the ratio of the normalising constants in themulti-sample case, the multi-sample joint likelihood can be expressed as:K∏k=1L(η ;yk,η ∗)≈K∏k=1exp{(η −η ∗)T t(yk,obs)}1M ∑mexp{(η −η ∗)TK∑k=1t(y∗k,m)}=exp{(η −η ∗)TK∑k=1t(yk,obs)}1M ∑mexp{(η −η ∗)TK∑k=1t(y∗k,m)} .(3.20)Naturally, the multi-sample joint log-likelihood is:`(η ;y1, . . . ,yK ,η ∗)≈(η −η ∗)TK∑k=1t(yk,obs)− log(1M∑mexp{(η −η ∗)TK∑k=1t(y∗k,m)}).(3.21)Since the multi-sample joint likelihood has the same form as the univariatejoint likelihood, the gradient and Fisher information matrix necessary for maxi-mum likelihood estimation using the multi-sample joint likelihood are almost iden-tical to the univariate case except that the sufficient statistics t(yobs) are replacedby the sum across the K grids ∑Kk=1 t(yk,obs). Besides this difference, derivativesare identical to those shown in Eq. 3.15 and Eq. 3.17 and, hence, are not presentedhere.Although the analytical form of the multi-sample joint likelihood derived abovecan be used during the estimation procedure, experimentation showed this partic-ular analytical form to be numerically unstable given the magnitude of the termsbeing exponentiated. Consequently, forms of the multi-sample joint likelihoodwhich are more numerically stable were used in our implementation. These forms26are presented in Section 4.23.6 Interpretation of Model ParametersTo close this section, the interpretation of the model parameters is discussed. Thereare three broad types of parameters in the model, namely, parameters that relateto the covariates β , the spatial auto-correlation parameter for occupied sites sur-rounded by empty sites θ1, and the spatial auto-correlation parameter for trackingempty sites surrounded by empty sites θ2. We examine each of these below by wayof examining their effect on the conditional likelihood in Eq. 3.8.First, note that since each parameter has an inverse-logit relationship to theprobability of observing presence at a site k, interpretation of all the model param-eters can be made in terms of the log-odds as in logistic regression. More specif-ically, given the value of its neighbours, the probability of observing presence atthe kth site is an inverse-logit transform of the function gk(xk) shown in Eq. 3.9.We can observe that the covariate parameters β are encompassed within theterm xkβ of the function gk(·) in the conditional likelihood. Since xk refers to thevector of covariate values observed at the kth site, an individual parameter β isinterpreted as the expected change in the log-odds of observing a presence at thekth site due to a one-unit increase in the value of the covariate x accounting for thepresence or absence of observations at locations neighbouring the kth site.Although interpretation of the spatial auto-correlation parameters follows simi-larly, parameter interpretation depends on the neighbouring structure. First, chang-ing one of the neighbours of the kth site from absent to present (i.e., changing oney j = 0 to y j = 1) denotes an increase of 2(θ2− θ1) in the log-odds of observinga presence at the kth site. On the other hand, changing one of the neighbours ofthe kth site from present to absent denotes an increase of 2(θ1− θ2) in the log-odds of observing a presence at the kth site. Note that since changing the valueof one neighbour affects the statistics associated with both spatial auto-correlationparameters θ1 and θ2, we must interpret a change in the log-odds as resulting fromthe difference between these two parameters at any one time. That is, we cannotinterpret a change in the log-odds associated with one auto-correlation parameterwithout accounting for the change produced by the other.27Finally, we note that the spatial-auto correlation parameters can be interpretedas being auto-regressive [16]. That is, although two sites may not be neighbour-ing, the strength of the spatial auto-correlation among sites may lead the two non-neighbouring sites to affect one another by way of a diffusion process transferringfrom site to site. This diffusion is auto-regressive by way of the Markov process in-herent in the model construction. Hence, the larger the magnitude of the differencebetween the parameters θ1 and θ2, the stronger the long-range dependence of themodel. Consequently, when there is a strong underlying spatial auto-correlation,although the correlation between two sites may decrease as the distance betweenthese sites increases, the correlation between the sites does not reach zero [16].28Table 3.1: Example of how the different configurations of occupied vs. non-occupied sites shown in Fig. 3.1 yield thesame spatial auto-correlation parameter value for the auto-logistic model in Eq. 3.1.Site y1 ∑ j∈Ni yiy j ∑ j∈Ni(1− yi)(1− y j) y2 ∑ j∈Ni yiy j ∑ j∈Ni(1− yi)(1− y j)1 0 0 1 1 1 02 0 0 2 1 2 03 0 0 1 1 1 04 1 1 0 0 0 15 1 2 0 0 0 36 1 2 0 0 0 27 0 0 1 1 1 08 0 0 1 1 1 09 1 2 0 0 0 2Total 8 6 6 829Chapter 4Implementation and NumericalOptimisationIn this section we discuss implementation details for our estimation procedure.First, we walk through the implementation for the Gibbs sampling algorithm usedfor the purpose of generating the pseudo-samples of observations. Afterwards, wediscuss methods for maintaining numerical stability in estimating parameters forthe multi-sample scenario.4.1 Gibbs Sampler ImplementationThe implementation of the Gibbs sampling algorithm for generating our pseudo-samples is shown below (Algo. 1).4.2 Numerical Optimisation in the Multi-sampleScenarioIn estimating parameters for the multi-sample scenario, a straight-forward imple-mentation of the likelihood shown in Eq. 3.20 yielded numerical stability issues.Even with moderately sized data grids (e.g., lattices of dimensions 10× 10 cells)and small parameter values (‖η‖ ≈ 10−1), the magnitude of the negative potentialtended to be quite large. Consequently, calculation of the exponentiated negativepotential function and the likelihood resulted in either over-flow or under-flow er-30Algorithm 1 One iteration of the Gibbs sampler algorithm for the auto-logisticmodel.Input: Spatial locations y1, . . . ,yN , importance parameters η ∗, and (optionally)covariates x1, . . . ,xN .1: for i = 1 to N do2: Compute sufficient statistics t(yi) ←(yixi,∑ j∈Ni yi(1− y j),∑ j∈Ni(1− yi)(1− y j)).3: Compute negative potential Q(yi)← t(yi)η ∗.4: Compute acceptance probability αc←min{1, logit−1(Q(yi))}.5: Sample uc ∼ Unif(0,1).6: if uc < αc then7: y∗i ← 1.8: else9: y∗i ← 0.10: end if11: if y∗i 6= yi then12: t(y∗i )←(y∗i xi,∑ j∈Ni y∗i (1− y j),∑ j∈Ni(1− y∗i )(1− y j)).13: for s in Ni in the proposed space (y1, . . . ,y∗i , . . . ,yN) do14: t(y∗s )←(ysxs,∑ j∈Ns ys(1− y j),∑ j∈Ns(1− ys)(1− y j)).15: end for16: for s in Ni the current space (y1, . . . ,yi, . . . ,yN) do17: t(ys)←(ysxs,∑ j∈Ns ys(1− y j),∑ j∈Ns(1− ys)(1− y j)).18: end for19: Let t(yi,s)← (t(yi), t(ys∈Ni)) and t(y∗i,s)← (t(y∗i ), t(y∗s∈Ni)).20: Current neighbourhood-extended negative potential Q(yi,s) ←t(yi,s)η ∗.21: Proposed neighbourhood-extended negative potential Q(y∗i,s) ←t(y∗i,s)η ∗.22: Compute joint acceptance ratio r← exp{Q(y∗i,s)}/exp{Q(yi,s)}.23: Compute acceptance probability α ←min{1,r}.24: Sample u∼ Unif(0,1).25: if u < α then26: Accept proposal yi← yi∗.27: end if28: end if29: end forOutput: Posterior pseudo-samples y∗.31rors. It is worth noting that these issues could also have arised from phase transitionas has been observed in the Markov random field [11], network modelling [1], andstatistical physics literature [6]. We used the methods described in this section toalleviate some of these issues. Although the methods discussed aid numerical sta-bility to an extent, scaling the multi-sample estimation to very large or very finespatial areas or to parameters which are greater than 1 in magnitude remains anissue.First, since maximising the likelihood with respect to the parametersη is equiv-alent whether or not the importance η ∗ parameters are included, we stabilise themulti-sample joint log-likelihood Eq. 3.21 by dropping the importance parametersin the un-normalised likelihood. Although not beneficial in all cases, dropping theimportance parameters in the un-normalised likelihood can alleviate overflow is-sues arising when the importance parameters are far away from the true parameters.The stabilised likelihood then has the form:`(η ;y1, . . . ,yK ,η ∗)≈η TK∑k=1t(yk,obs)− log(1M∑mexp{(η −η ∗)TK∑k=1t(y∗k,m)}).(4.1)Next, rather than optimising the log-likelihood directly, we opt to use Newton-Raphson’s method to minimise the gradient of the numerically stable version of thenegative log-likelihood. That is, we find a point where the gradient is zero and theHessian is positive definite. The Newton-Raphson algorithm is shown in Algo. 2.In order to further stabilise the optimisation procedure at this point, we further ap-proximate the log-likelihood as follows. Since we have K distinct data grids, whichwe have assumed to be i.i.d., we exponentiate the ratio of the normalising constantsfor a single grid, denoted as z1(η )/z1(η ∗) below, rather than approximating the ra-32tio of the normalising constants across the entire K-dimensional space.`(η ;y1, . . . ,yK ,η ∗) = log(K∏k=1hη(yobs,k)/hη∗(yobs,k)z(η )/z(η ∗))≈ log(∏Kk=1 hη(yobs,k)/hη∗(yobs,k)[z1(η )/z1(η ∗)]K)=η TK∑k=1t(yk,obs)−K log(1M∑mexp{(η −η ∗)T t(y∗m)}).(4.2)Hence, within the optimisation procedure, the first-order partial derivative of themulti-sample log-likelihood can be computed as:∂`(η ;y1, . . . ,yK ,η ∗)∂η j=η T 1KK∑k=1t j(yk,obs)−M∑m=1wmt j(y∗m),We note here that since the accuracy of the estimated parameters depends heav-ily on the chosen importance parameters, we found that there is often a need to it-erate through the entire estimation procedure multiple times. That is, using the pa-rameter estimates obtained from the Newton-Raphson optimisation, an additionalset of pseudo-samples may need to be generated via MCMC followed by anotherNewton-Raphson optimisation. This process may be repeated multiple times untilconvergence. Although there is no set method to determine convergence of the al-gorithm, we follow [12] and assume convergence after a few cycles of the MCMCand Newton-Raphson optimisation procedure. Our assumption is supported by theresults of our computer experiments, where we see that, in general, the estima-tion of the MLE improves significantly between the first and second MCMC andNewton-Raphson optimisation cycles (see Fig. 5.3 – 5.8). Improvements in theestimation of the parameters after the second cycle of the MCMC and Newton-Raphson optimisation are much smaller with decreases in variance being accom-panied with increases in bias.33Algorithm 2 Newton-Raphson algorithm used for optimisation. The threshold εand step-size λ were set at 10−7 and 1, respectively.Input: Data sets y1, . . . ,yK , importance parameter vector η ∗ (same as ηMPLE atinitialisation), threshold ε , and step-size λ .1: Initialize:η (0)←η ∗; i← 1; imax← 202: while i < imax and δ (i−1) > ε do3: η (i)←η (i−1)+λ([∇2`(η (i−1);y1, . . . ,yK ,η ∗)]−1∇`(η (i−1);y1, . . . ,yK ,η ∗))4: δ (i)←‖∇`(η (i−1);y1, . . . ,yK ,η ∗)‖5: i← i+16: end while7: ηMLE ←η (i)8: η ∗←ηMLE , i← 1, and repeat until the end of time.Output: MLE estimate ηMLE .34Chapter 5Computer ExperimentsWe now discuss the computer experiments carried out to empirically validate thetheoretical results presented thus far. For these experiments we set β to be a uni-variate parameter β1 whose value is kept constant at 0.25. Next, for all possiblecombinations of three distinct values of the spatial auto-correlation parameters θ1and θ0, a simulation experiment was carried out. Each experiment consisted ofgenerating sample data sets under the true parameters and then estimating theseparameters using the theory discussed in previous sections.In each simulation, 150 samples each containing K = 20 data grids of dimen-sions 10× 10 were generated via Gibbs sampling. An example of one realisationof the data generated is shown in Fig. 5.1. In each simulation the distribution of thesingle covariate x was generated as follows. The lower triangular elements of thedata grid were assigned a value uniformly between 600 and 1200 while the uppertriangular elements were assigned a value uniformly between 0 and 10 (i.e., the di-mension of the data grid). Finally, all of the values in the data grid were divided bythe maximum value to yield x∈ [0,1]. An example of the distribution of the covari-ates is shown in Fig. 5.2. For each sample, the MCMC chain was allowed to runfor 1020 iterations, the last 20 of which were taken to be the observed data. After-wards, the MPLE for the 20 observed data sets taken jointly was estimated. Usingthe joint MPLE as the initial value for the importance parameter η ∗, Gibbs sam-pling was used to generate 3000 pseudo-samples. The first M = 1000 of these werediscarded as burn-in while the latter M = 2000 were used in estimating the ratio351 0 1 1 0 0 1 1 1 11 1 1 1 1 1 0 0 0 00 1 1 0 1 0 1 1 1 11 1 1 0 1 0 1 1 0 10 1 1 0 1 0 1 0 0 01 0 0 1 1 0 1 1 1 11 0 1 0 1 1 1 0 0 01 1 1 0 1 1 1 1 1 10 0 1 0 1 0 1 0 0 11 0 1 1 1 0 1 1 1 1longitudelatitude0.000.250.500.751.00valueExample of Grid of DataFigure 5.1: A realisation of an observed data grid generated in the simula-tions.of the normalising constants as in Eq. 3.12. Using the pseudo-samples, the multi-sample joint log-likelihood was maximised via Newton-Raphson with a maximumof 20 optimisation iterations, where if the Newton-Raphson optimisation exceeded20 iterations we denoted the MLE as non-convergent. Finally, the Gibbs samplingand Newton-Raphson steps were repeated four more times. The value at this final5th cycle of the optimisation procedure was taken to be the estimated MLE ηˆMLE .Summary statistics for ηˆMLE at the 5th cycle of the optimisation procedure forsix of the simulations are presented in Tables 5.1 – 5.3 and their distributions areshown in the last panel of Fig. 5.3 – Fig. 5.8. The distribution of the parametersduring the first four cycles of the optimisation procedure are shown in these figuresas well. For simulations 4, 6, and 9, only 3, 53, and 8 samples converged and, there-fore, the results for these may not be reliable and are not shown here. We note thatalthough the estimation improves considerably between the first and second cycle360.817 0.728 0.904 0.982 0.526 0.742 0.568 0.811 0.771 0.0030.899 0.719 0.676 0.678 0.583 0.848 0.542 0.629 0.004 0.0080.987 1 0.629 0.808 0.607 0.872 0.9 0 0.008 0.0040.856 0.722 0.667 0.737 0.672 0.731 0.002 0.001 0.001 0.0020.761 0.803 0.808 0.697 0.621 0.004 0.001 0.002 0.005 0.0080.679 0.613 0.665 0.564 0.006 0.007 0.008 0.001 0.008 0.0020.687 0.513 0.846 0.005 0.007 0.005 0.008 0.001 0.001 0.0050.68 0.796 0.002 0.006 0.005 0.004 0.004 0.003 0 0.0030.872 0.006 0.004 0.005 0.002 0.001 0.004 0.007 0.007 0.0040.002 0.002 0.005 0.004 0 0.005 0.005 0 0.002 0.008longitudelatitude0.250.500.751.00valueExample of Distribution of Covariate ValuesFigure 5.2: A typical realisation of the distribution of the covariate used inthe simulations.of the MCMC and Newton-Raphson optimisation procedure, later cycles providemuch more moderate improvements to the estimation. Furthermore, the improve-ments in these later cycles are driven by a bias-variance trade-off where a decreasein variance is accompanied by an increase in bias. This behaviour is perhaps mostclearly illustrated in Fig. 5.3 and Fig. 5.4 below. In the tables, Asymptotic SD de-notes the standard deviation obtained using the the observed Fisher informationaveraged across the 150 samples while MCMC SD denotes the standard devia-tion of the 150 estimated parameters obtained in the simulation. In some of thesimulations, the Newton-Raphson optimisation reached the maximum allowed 20iterations for optimisation or the likelihood yielded a singular Hessian. If this oc-curred, we denoted the algorithm as not converging and labelled the MLE of thesample as unknown. The column Num. Converged denotes the number of samplesthat converged and yielded an MLE out of the possible 150 in each simulation.37Although some of the simulations yielded quite reasonable results, we observedthat in some simulations the number of samples that converged was drasticallysmall. We attribute this mainly to poor pseudo-samples being generated and tonumerical issues remained even after implementing the methods discussed in Sec-tion 4.2.llllllllllllllllllllllOptimisation Cycle #1 Optimisation Cycle #2 Optimisation Cycle #3Optimisation Cycle #4 Optimisation Cycle #5−−β1 θ1 θ0 β1 θ1 θ0Parameter Valueparameterβ1θ1θ0Optimisation Cycles for Simulation # 1Figure 5.3: Distribution of estimated parameter vector ηˆ throughout five cy-cles of the Gibbs (MCMC) and Newton-Raphson optimisation for the1st simulation experiment. Horizontal line indicates the true value ofthe parameter boxplot of the corresponding colour.38lllllllllllllllOptimisation Cycle #1 Optimisation Cycle #2 Optimisation Cycle #3Optimisation Cycle #4 Optimisation Cycle #5−−β1 θ1 θ0 β1 θ1 θ0Parameter Valueparameterβ1θ1θ0Optimisation Cycles for Simulation # 2Figure 5.4: Distribution of estimated parameter vector ηˆ throughout five cy-cles of the Gibbs (MCMC) and Newton-Raphson optimisation for the2nd simulation experiment. Horizontal line indicates the true value ofthe parameter boxplot of the corresponding colour.39Table 5.1: Summary statistics for estimated parameter βˆ1 for six of the nine simulations conducted. Since only 3, 53,and 8 samples converged in the 4th, 6th, and 9th simulations, respectively, the results are not shown here.True (β1,θ1,θ0) Mean Asymptotic SD MCMC SD Bias MSE Num. Converged(0.25,0.4,−0.15) 0.17 0.17 0.15 −0.08 0.03 150(0.25,0.01,−0.15) 0.22 0.13 0.11 −0.03 0.01 150(0.25,−0.35,−0.15) 0.28 0.15 0.35 0.03 0.12 86(0.25,0.4,0.3) − − − − − −(0.25,0.01,0.3) 0.18 0.24 0.23 −0.07 0.06 81(0.25,−0.35,0.3) − − − − − −(0.25,0.4,−0.05) 0.15 0.18 0.18 −0.10 0.04 148(0.25,0.01,−0.05) 0.23 0.12 0.08 −0.02 0.01 150(0.25,−0.35,−0.05) − − − − − −40Table 5.2: Summary statistics for estimated parameter θˆ1 for six of the nine simulations conducted. Since only 3, 53,and 8 samples converged in the 4th, 6th, and 9th simulations, respectively, the results are not shown here.True (β1,θ1,θ0) Mean Asymptotic SD MCMC SD Bias MSE Num. Converged(0.25,0.4,−0.15) 0.40 0.03 0.05 −0.00 0.00 150(0.25,0.01,−0.15) 0.01 0.02 0.03 0.00 0.00 150(0.25,−0.35,−0.15) −0.34 0.02 0.05 0.01 0.00 86(0.25,0.4,0.3) − − − − − −(0.25,0.01,0.3) −0.05 0.16 0.16 −0.06 0.03 81(0.25,−0.35,0.3) − − − − − −(0.25,0.4,−0.05) 0.40 0.03 0.05 −0.00 0.00 148(0.25,0.01,−0.05) 0.01 0.02 0.02 0.00 0.00 150(0.25,−0.35,−0.05) − − − − − −41Table 5.3: Summary statistics for estimated parameter θˆ0 for six of the nine simulations conducted. Since only 3, 53,and 8 samples converged in the 4th, 6th, and 9th simulations, respectively, the results are not shown here.True (β1,θ1,θ0) Mean Asymptotic SD MCMC SD Bias MSE Num. Converged(0.25,0.4,−0.15) −0.15 0.02 0.04 −0.00 0.00 150(0.25,0.01,−0.15) −0.15 0.01 0.03 −0.00 0.00 150(0.25,−0.35,−0.15) −0.19 0.02 0.04 −0.04 0.00 86(0.25,0.4,0.3) − − − − − −(0.25,0.01,0.3) 0.26 0.07 0.12 −0.04 0.02 81(0.25,−0.35,0.3) − − − − − −(0.25,0.4,−0.05) −0.06 0.02 0.04 −0.01 0.00 148(0.25,0.01,−0.05) −0.05 0.01 0.02 −0.00 0.00 150(0.25,−0.35,−0.05) − − − − − −42llllll lllll l ll l llllllllllOptimisation Cycle #1 Optimisation Cycle #2 Optimisation Cycle #3Optimisation Cycle #4 Optimisation Cycle #5−10123−10123β1 θ1 θ0 β1 θ1 θ0Parameter Valueparameterβ1θ1θ0Optimisation Cycles for Simulation # 3Figure 5.5: Distribution of estimated parameter vector ηˆ throughout five cy-cles of the Gibbs (MCMC) and Newton-Raphson optimisation for the3rd simulation experiment. Horizontal line indicates the true value ofthe parameter boxplot of the corresponding colour.43llllllllllllllllllllllll lOptimisation Cycle #1 Optimisation Cycle #2 Optimisation Cycle #3Optimisation Cycle #4 Optimisation Cycle #5−6−4−202−6−4−202β1 θ1 θ0 β1 θ1 θ0Parameter Valueparameterβ1θ1θ0Optimisation Cycles for Simulation # 5Figure 5.6: Distribution of estimated parameter vector ηˆ throughout five cy-cles of the Gibbs (MCMC) and Newton-Raphson optimisation for the5th simulation experiment. Horizontal line indicates the true value ofthe parameter boxplot of the corresponding colour.44lllllllllllllllllllllOptimisation Cycle #1 Optimisation Cycle #2 Optimisation Cycle #3Optimisation Cycle #4 Optimisation Cycle #5012012β1 θ1 θ0 β1 θ1 θ0Parameter Valueparameterβ1θ1θ0Optimisation Cycles for Simulation # 7Figure 5.7: Distribution of estimated parameter vector ηˆ throughout five cy-cles of the Gibbs (MCMC) and Newton-Raphson optimisation for the7th simulation experiment. Horizontal line indicates the true value ofthe parameter boxplot of the corresponding colour.45lllllllllllllllOptimisation Cycle #1 Optimisation Cycle #2 Optimisation Cycle #3Optimisation Cycle #4 Optimisation Cycle #β1 θ1 θ0 β1 θ1 θ0Parameter Valueparameterβ1θ1θ0Optimisation Cycles for Simulation # 8Figure 5.8: Distribution of estimated parameter vector ηˆ throughout five cy-cles of the Gibbs (MCMC) and Newton-Raphson optimisation for the8th simulation experiment. Horizontal line indicates the true value ofthe parameter boxplot of the corresponding colour.46Chapter 6ConclusionThe use of Markov random fields is a viable method for habitat modelling. Bybuilding an auto-logistic model within the Markov random fields framework, dis-tinct types of ecological and biological data can be incorporated into a mathemati-cal model for inference and habitat prediction. Although this model can be appliedto the data from a single animal, it also generalisable for use with the data frommultiple animals. However, particularly in the multi-sample case, there exist somecomputational limitations with the method which future work may explore froman implementation or a theory approach. This may include further work in op-timisation procedures which are more cost effective or yield better convergenceresults than the MCMC-MLE procedure presented. Despite these issues, the cur-rent implementation of the model has the advantages of relying on inference tech-niques that are understandable by ecologists, of being intuitively interpretable viathe model parameters, and of allowing a compromise among the accuracy and va-riety of the ecological and environmental data that may be used to build the model.47Bibliography[1] A. Agaskar and Y. M. Lu. Alarm: A logistic auto-regressive model forbinary processes on networks. In Global Conference on Signal andInformation Processing (GlobalSIP) IEEE, pages 305 – 308, 2013. → pages32[2] S. Ban and A. W. Trites. Quantification of terrestial haul-out and rookerycharacteristics of steller sea lions. Marine Mammal Science, 23(3):496 –507, 2007. → pages 1[3] K. J. Benoit-Bird et al. Prey patch patterns predict habitat use by top marinepredators with diverse foraging strategies. PLoS ONE, 8(1):e53348, 2013.→ pages 3[4] K. J. Benoit-Bird et al. Foraging behavior of northern fur seals closelymatches the hierarchical patch scales of prey. Marine Ecology ProgressSeries, 479:283 – 302, 2013. → pages 3[5] J. Besag. Spatial interaction and the statistical analysis of lattice systems.Journal of the Royal Statistical Society, 36(2):192 – 236, 1974. → pages 5,6, 10, 11[6] R. J. Birgeneau, R. A. Cowley, G. Shirane, and H. Yoshizawa. Phasetransitions in diluted magnets: Critical behavior, percolation, and randomfields. Journal of Statistical Physics, 34(5 - 6):817 – 848, 1984. → pages 32[7] G. K. H. Boor and R. J. Small. Steller sea lion spatial-use patterns derivedfrom a bayesian model of opportunistic observations. Marine MammalScience, 28(4):375 – 403, October 2012. → pages 2[8] C. J. A. Bradshaw et al. At-sea distribution of female southern elephantseals relative to variation in ocean surface properties. Journal of MarineScience, 61:1014 – 1027, 2004. → pages 348[9] N. Cressie. Statistics for Spatial Data. Wiley-Interscience, 1993. → pages5, 7, 8, 10, 11[10] R. Durrett. Probability: Theory and Examples. Cambridge University Press,4th edition, 2010. → pages 22[11] A. E. Gelfand, P. Diggle, P. Guttorp, and M. Fuentes, editors. Handbook ofSpatial Statistics. CRC Press, 2010. → pages 32[12] C. J. Geyer and E. A. Thompson. Constrained monte carlo maximumlikelihood for dependent data. Journal of the Royal Statistical Society. SeriesB, 54(54):657 – 699, 1992. → pages 21, 33[13] K. T. Goetz, R. A. Montgomery, J. M. V. Hoef, R. C. Hobbs, and D. S.Johnson. Identifying essential summer habitat of the endangered belugawhale delphinapterus leucas in cook inlet, alaska. Endangered SpeciesResearch, 16:135 – 147, 2012. → pages 1[14] E. J. Gregr and A. W. Trites. A novel presence-only validation technique forimproved steller sea lion eumetopias jubatus critical habitat descriptions.Marine Ecology Progress Series, 365:247 – 261, 2008. → pages 1, 2, 3[15] C. Guinet et al. Spatial distribution of foraging in female antarctic fur sealsarctocephalus gazella in relation to oceanographic variables: ascale-dependent approach using geographic information systems. MarineEcology Progress Series, 219:251 – 264, 2001. → pages 3[16] M. L. Gumpertz, J. M. Graham, and J. B. Ristaino. Autologistic model ofspatial pattern of phytophthora epidemic in bell pepper: effects of soilvariables on disease presence. Journal of Agricultural, Biological, andEnvironmental Statistics, 2(2):131 – 156, 1997. → pages 28[17] J. Hammersley and P. Clifford. Markov fields on finite graphs and lattices.Unpublished, 1971. → pages 9[18] F. Huffer and H. Wu. Markov chain monte carlo in spatial binary datamodels with application to the distribution of plant species. Biometrics, 54(2):509 – 524, 1998. → pages 13, 21, 24[19] D. S. Johnson, P. B. Conn, M. B. Hooten, J. C. Ray, and B. A. Pond. Spatialoccupancy models for large data sets. Ecology, 94(4):801 – 808, 2013. →pages 149[20] K. Kaschner, R. Watson, A. W. Trites, and D. Pauly. Mapping world-widedistributions of marine mammal species using a relative environmentalsuitability (res) model. Marine Ecology Progress Series, 316:285 – 310,2006. → pages 2, 3[21] C. A. Nordstrom, B. C. Battaile, C. Cott, and A. W. Trites. Foraging habitatsof lactating northern fur seals are structured by thermocline depths andsubmesoscale fronts in the eastern bering sea. Deep-Sea Research II, 88 -89:78 – 79, 2012. → pages 3, 4[22] J. L. Pearce and M. S. Boyce. Modelling distribution and abundance withpresence-only data. Journal of Applied Ecology, 43:405 – 412, 2006. →pages 3[23] D. Warton and G. Arts. Advancing our thinking in presence-only andused-available analysis. Journal of Animal Ecology, 82:1125 – 1134, 2013.→ pages 2, 350Appendix AAppendixA.1 Standard NotationThis section serves as a reference for some of the notation commonly used through-out this document. Notation is listed in order of appearance.51Notation Descriptions Vector of sites indexed i = 1, . . . ,Ny(si) Observed response at the ith siteNi Neighbourhood around the ith site.P(·) Probability mass function with support ζ0(si) Absence/zero at the ith siteI(·) Indicator functionQ(·) Negative potential functionyi Short-hand notation for y(si)x Model covariate(s)β Model covariate parameter(s)θ Model spatial auto-correlation parameter(s)z(η ) Partition function/normalising constantη Model parameter vector0i Short-hand notation for 0(si)hη(·) Un-normalised likelihood functionη ∗ Importance parameter vectorf (·) Likelihood functionyk,obs Observed data of spatial locations for kth animaly∗k,m mth pseudo-data sample generated for kth animalt(·) Function of sufficient statisticsL(·) Likelihood function`(·) Log-likelihood function52


Citation Scheme:


Citations by CSL (citeproc-js)

Usage Statistics



Customize your widget with the following options, then copy and paste the code below into the HTML of your page to embed this item in your website.
                            <div id="ubcOpenCollectionsWidgetDisplay">
                            <script id="ubcOpenCollectionsWidget"
                            async >
IIIF logo Our image viewer uses the IIIF 2.0 standard. To load this item in other compatible viewers, use this url:


Related Items