UBC Theses and Dissertations


Accounting for preferential sampling in the statistical analysis of spatio-temporal data

Watson, Joe


Spatio-temporal statistical methods are widely used to model natural phenomena across both space and time. Example phenomena include the concentrations of airborne pollutants and the distributions of endangered species. A spatio-temporal process is said to have been preferentially sampled when the locations and/or times chosen to observe it depend stochastically on the values of the process at the chosen locations and/or times. When standard statistical methodologies are used, predictions of a preferentially sampled spatio-temporal process into unsampled regions and times may be severely biased. Preferential sampling within spatio-temporal data may be the rule rather than the exception in practice. The work presented in this dissertation addresses the issue of preferential sampling. We develop the first general framework for modelling preferential sampling in spatio-temporal data and apply it to historical UK black smoke measurements. We demonstrate that existing estimates of population-level black smoke exposures may be highly inaccurate due to preferential sampling. By leveraging the information contained in the chosen sampling locations, we can adjust estimates of black smoke exposure for the presence of preferential sampling. Next, we develop a fast, intuitive, powerful, and general test for preferential sampling. We provide a user-friendly R package that performs the test. We demonstrate its utility both in a thorough simulation study and by successfully replicating previously published results on preferential sampling. Finally, we adapt our ideas on preferential sampling to the setting of spatio-temporal point patterns. By considering the observed point pattern as a spatio-temporally thinned, marked log-Gaussian Cox process, we show that preferential sampling can be directly accounted for within the model. Under certain assumptions, the true distribution of locations can then be obtained.
Using these ideas, we develop a framework for combining multiple data sources to estimate the spatio-temporal distribution of an animal. We then apply our framework to estimate effort-corrected space-use of an endangered ecotype of killer whales. Ultimately, we hope that investigations into preferential sampling will become an essential component within spatio-temporal analyses, akin to model diagnostics. The methods developed in this dissertation are widely applicable, allowing researchers to routinely perform such investigations.
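
The definition of preferential sampling given above can be illustrated with a toy simulation. The sketch below is not taken from the dissertation and does not implement its framework or its test. It assumes a one-dimensional grid of sites, a made-up latent field, and sampling weights proportional to exp(beta * value). The function name `simulate` and all parameters are hypothetical, and sites are drawn with replacement for simplicity. Under these assumptions, a naive sample mean should overstate the true field mean when beta > 0.

```python
import math
import random


def simulate(beta, n_sites=2000, n_samples=100, seed=0):
    """Return (true_mean, sample_mean) for a toy preferentially sampled field.

    All names and the sampling rule are illustrative assumptions.
    """
    rng = random.Random(seed)
    # Toy latent field: a smooth signal plus Gaussian noise on a 1-D grid.
    field = [
        math.sin(2 * math.pi * i / n_sites) + rng.gauss(0, 0.3)
        for i in range(n_sites)
    ]
    # Sampling weights depend on the field value itself.
    # beta = 0 gives uniform sampling; beta > 0 favours high values.
    weights = [math.exp(beta * value) for value in field]
    # Draw monitoring sites (with replacement) according to those weights.
    chosen = rng.choices(range(n_sites), weights=weights, k=n_samples)
    sample_mean = sum(field[i] for i in chosen) / n_samples
    true_mean = sum(field) / n_sites
    return true_mean, sample_mean


if __name__ == "__main__":
    for beta in (0.0, 2.0):
        true_mean, sample_mean = simulate(beta, n_samples=500)
        print(f"beta={beta}: true mean={true_mean:.3f}, "
              f"naive sample mean={sample_mean:.3f}")
```

Comparing the two runs shows how the gap between the naive estimate and the true mean depends on how strongly site selection follows the field. That gap is the kind of bias the abstract describes.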



Attribution-NoDerivatives 4.0 International