UBC Theses and Dissertations

Hidden at the root: statistical methods for population size estimation on trees

Flynn, Mallory J.

Abstract

In many fields, populations of interest are hidden from data for a variety of reasons, yet their size remains important in determining resource allocation and appropriate policy. One popular approach to population size estimation, the multiplier method, is a back-calculation tool requiring only a marginal subpopulation size and an estimate of the proportion of the target population belonging to this subgroup. Another approach is to use Bayesian methods, which are inherently well-suited to incorporating multiple data sources. However, both methods have their drawbacks. A framework for applying the multiplier method which combines information from several known subpopulations has not yet been established; Bayesian models, though able to incorporate complex dependencies and various data sources, are difficult for researchers in less technical fields to design and implement. Increasing data collection and linkage across diverse fields suggest that accessible methods of estimating population size with synthesized data are needed. In public health and epidemiology, these linkages often admit a tree structure, with the target population represented by the root, and root-to-leaf paths representing pathways of care after a health event. In this thesis, we propose an extension to the well-known multiplier method which is applicable to tree-structured data, where multiple subpopulations and corresponding proportions combine to generate a population size estimate via the minimum variance estimator. The estimates given by this methodology are compared to those from a Bayesian hierarchical model, for both simulated and real-world data, the latter provided by BC's opioid overdose cohort, a tree-like data structure which tracks individuals along pathways of care after an overdose. Subsequent analysis elucidates which data are key to estimation in each method, and examines the robustness and feasibility of the methods.
Finally, two R packages have been developed to facilitate the use of these methods on similar applications. The first provides a straightforward method of estimating population size on tree-structured data with the modified multiplier methodology. The second provides functionality to automatically generate Bayesian model code for tree-structured data intended to be used for estimation with the MCMC sampler, JAGS, lowering the technical barrier of implementation.
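
To make the back-calculation described in the abstract concrete, the following is a minimal Python sketch, not the thesis's implementation and not the API of either R package. It assumes each known subpopulation gives an independent estimate of the root population as count / proportion, approximates each estimate's variance with a first-order delta method while treating the count as fixed, and combines the estimates by inverse-variance weighting, which is the minimum variance linear combination under those assumptions. All names (`Branch`, `multiplier_estimate`, `delta_variance`, `combine_min_variance`) and the numbers in the usage note are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Branch:
    """One known subpopulation (hypothetical structure for illustration)."""
    count: float       # observed size of the subpopulation
    proportion: float  # estimated share of the target population in it
    prop_var: float    # variance of that proportion estimate


def multiplier_estimate(count: float, proportion: float) -> float:
    """Classic multiplier back-calculation: N = n / p."""
    return count / proportion


def delta_variance(count: float, proportion: float, prop_var: float) -> float:
    """First-order delta-method variance of n / p, treating n as fixed.

    Assumption: Var(n / p) is approximated by n^2 * Var(p) / p^4.
    """
    return count ** 2 * prop_var / proportion ** 4


def combine_min_variance(branches: List[Branch]) -> Tuple[float, float]:
    """Inverse-variance weighted combination of per-branch estimates.

    Assumes the per-branch estimates are independent and unbiased.
    Returns the combined estimate and its approximate variance.
    """
    estimates = [multiplier_estimate(b.count, b.proportion) for b in branches]
    variances = [delta_variance(b.count, b.proportion, b.prop_var) for b in branches]
    weights = [1.0 / v for v in variances]
    total_weight = sum(weights)
    combined = sum(w * e for w, e in zip(weights, estimates)) / total_weight
    return combined, 1.0 / total_weight
```

As a usage illustration with made-up inputs, `combine_min_variance([Branch(100, 0.10, 0.0001), Branch(300, 0.25, 0.0004)])` blends per-branch estimates of roughly 1000 and 1200 into a single value between them, with a variance smaller than either branch's alone. Dependence between branches along a shared path of the tree would violate the independence assumption made here.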

Rights

Attribution-NonCommercial-NoDerivatives 4.0 International