Hidden at the root : statistical methods for population size estimation on trees

UBC Theses and Dissertations

Flynn, Mallory J.

Abstract

In many fields, populations of interest are hidden from data for a variety of reasons, though their magnitude remains important in determining resource allocation and appropriate policy. One popular approach to population size estimation, the multiplier method, is a back-calculation tool requiring only a marginal subpopulation size and an estimate of the proportion belonging to this subgroup. Another approach is to use Bayesian methods, which are inherently well suited to incorporating multiple data sources. However, both methods have their drawbacks. A framework for applying the multiplier method that combines information from several known subpopulations has not yet been established; Bayesian models, though able to incorporate complex dependencies and various data sources, are difficult for researchers in less technical fields to design and implement. Increasing data collection and linkage across diverse fields suggests that accessible methods of estimating population size with synthesized data are needed. In public health and epidemiology, these linkages often admit a tree structure, with the target population represented by the root, and root-to-leaf paths representing pathways of care after a health event. In this thesis, we propose an extension to the well-known multiplier method which is applicable to tree-structured data, where multiple subpopulations and corresponding proportions combine to generate a population size estimate via the minimum variance estimator. The estimates given by this methodology are compared to those from a Bayesian hierarchical model, for both simulated and real-world data, the latter provided by BC's opioid overdose cohort, a tree-like data structure which tracks individuals along pathways of care after an overdose. Subsequent analysis elucidates which data are key to estimation in each method, and examines the robustness and feasibility of the methods.
Finally, two R packages have been developed to facilitate the use of these methods on similar applications. The first provides a straightforward method of estimating population size on tree-structured data with the modified multiplier methodology. The second provides functionality to automatically generate Bayesian model code for tree-structured data intended to be used for estimation with the MCMC sampler, JAGS, lowering the technical barrier of implementation.
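
As a rough illustration of the back-calculation the abstract describes, the classic multiplier method divides a known subpopulation count by the estimated proportion of the target population belonging to that subgroup. The sketch below is not taken from the thesis or its R packages: the function names, the delta-method variance, and the inverse-variance weighting used to combine estimates are all assumptions chosen for illustration. The weighting gives the minimum variance linear combination only when the individual estimates are unbiased and independent, which may not hold for subpopulations on the same tree.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class MultiplierEstimate:
    """Population size estimate derived from one known subpopulation."""
    n_hat: float     # estimated size of the hidden population
    variance: float  # approximate variance of n_hat


def multiplier_estimate(count: int, proportion: float, sample_size: int) -> MultiplierEstimate:
    """Back-calculate population size as count / proportion.

    Assumption (hypothetical setup): `count` is known exactly, and `proportion`
    is a sample proportion estimated from `sample_size` individuals, so the
    delta method gives Var(count / p) ~= count^2 * Var(p) / p^4.
    """
    if not 0.0 < proportion < 1.0:
        raise ValueError("proportion must lie strictly between 0 and 1")
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    n_hat = count / proportion
    var_p = proportion * (1.0 - proportion) / sample_size
    variance = count ** 2 * var_p / proportion ** 4
    return MultiplierEstimate(n_hat, variance)


def combine_inverse_variance(estimates: Sequence[MultiplierEstimate]) -> MultiplierEstimate:
    """Combine estimates with weights proportional to 1 / variance.

    Assumption: the estimates are independent and unbiased.
    """
    if not estimates:
        raise ValueError("at least one estimate is required")
    weights = [1.0 / e.variance for e in estimates]
    total = sum(weights)
    n_hat = sum(w * e.n_hat for w, e in zip(weights, estimates)) / total
    return MultiplierEstimate(n_hat, 1.0 / total)


# Illustrative numbers only; they are not data from the thesis.
first = multiplier_estimate(count=500, proportion=0.25, sample_size=400)
second = multiplier_estimate(count=300, proportion=0.20, sample_size=200)
combined = combine_inverse_variance([first, second])
print(first.n_hat, second.n_hat, combined.n_hat)
```

The combined estimate always falls between the individual estimates and sits closer to whichever has the smaller variance, which is why a precise proportion estimate contributes more to the final figure than an imprecise one.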

Item Metadata

Title	Hidden at the root : statistical methods for population size estimation on trees
Creator	Flynn, Mallory J.
Supervisor	Gustafson, Paul, 1968-
Publisher	University of British Columbia
Date Issued	2023
Genre	Thesis/Dissertation
Type	Text
Language	eng
Date Available	2024-10-31
Provider	Vancouver : University of British Columbia Library
Rights	Attribution-NonCommercial-NoDerivatives 4.0 International
DOI	10.14288/1.0437150
URI	http://hdl.handle.net/2429/86174
Degree (Theses)	Doctor of Philosophy - PhD
Program (Theses)	Statistics
Affiliation	Science, Faculty of; Statistics, Department of
Degree Grantor	University of British Columbia
Graduation Date	2023-11
Campus	UBCV
Scholarly Level	Graduate
Rights URI	http://creativecommons.org/licenses/by-nc-nd/4.0/
Aggregated Source Repository	DSpace
