UBC Faculty Research and Publications

Does size really matter? A sensitivity analysis of number of seeds in a respondent-driven sampling study… Lachowsky, Nathan J; Sorge, Justin T; Raymond, Henry F; Cui, Zishan; Sereda, Paul; Rich, Ashleigh; Roth, Eric A; Hogg, Robert S; Moore, David M Nov 16, 2016

RESEARCH ARTICLE Open Access

Does size really matter? A sensitivity analysis of number of seeds in a respondent-driven sampling study of gay, bisexual and other men who have sex with men in Vancouver, Canada

Nathan John Lachowsky1,2*, Justin Tyler Sorge1, Henry Fisher Raymond3,4, Zishan Cui1, Paul Sereda1, Ashleigh Rich1, Eric A. Roth5, Robert S. Hogg1,6 and David M. Moore1,7

Lachowsky et al. BMC Medical Research Methodology (2016) 16:157 DOI 10.1186/s12874-016-0258-4

Abstract

Background: Respondent-driven sampling (RDS) is an increasingly used peer chain-recruitment method to sample “hard-to-reach” populations for whom there are no reliable sampling frames. Implementation success of RDS varies; one potential negative factor is the number of seeds used.

Methods: We conducted a sensitivity analysis on estimates produced using data from an RDS study of gay, bisexual and other men who have sex with men (GBMSM) aged ≥16 years living in Vancouver, Canada. Participants completed a questionnaire on demographics, sexual behavior and substance use. For analysis, we used increasing seed exclusion criteria, starting with all participants and subsequently removing unproductive seeds, chains of ≤1 recruitment waves, and chains of ≤2 recruitment waves. We calculated estimates for three different outcomes (HIV serostatus, condomless anal intercourse with HIV discordant/unknown status partner, and injecting drugs) using three different RDS weighting procedures: RDS-I, RDS-II, and RDS-SS. We also assessed seed dependence with bottleneck analyses and convergence plots. Statistical differences between RDS estimators were assessed through simulation analysis.

Results: Overall, 719 participants were recruited, which included 119 seeds and a maximum of 16 recruitment waves (mean chain length = 1.7). The sample of >0 recruitment waves removed unproductive seeds (n = 50/119, 42.0%), resulting in 69 chains (mean length = 3.0). The sample of >1 recruitment waves removed 125 seeds or recruits (17.4% of overall sample), resulting in 37 chains (mean length = 4.8).
The final sample of >2 recruitment waves removed a total of 182 seeds or recruits (25.3% of overall sample), resulting in 25 chains (mean length = 6.1). Convergence plots and bottleneck analyses of condomless anal intercourse with HIV discordant/unknown status partner and injecting drugs outcomes were satisfactory. For these two outcomes, regardless of seed exclusion criteria used, the crude proportions fell within 95% confidence intervals of all RDS-weighted estimates. Significant differences between the three RDS estimators were not observed.

Conclusions: Within a sample of GBMSM in Vancouver, Canada, this RDS study suggests that when equilibrium and homophily assumptions are met, analysis is not negatively affected by large numbers of unproductive or lowly productive seeds, although recruiting them is potentially costly and time consuming.

Keywords: HIV/AIDS, Men who have sex with men, Respondent-driven sampling, Sensitivity analysis

* Correspondence: nlachowsky@cfenet.ubc.ca
1 Epidemiology & Population Health, British Columbia Centre for Excellence in HIV/AIDS, 608-1081 Burrard Street, Vancouver V6T 1Y6, Canada
2 School of Public Health and Social Policy, Faculty of Human and Social Development, University of Victoria, Victoria, Canada
Full list of author information is available at the end of the article

© The Author(s). 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Background

Several populations are considered “most at risk” of human immunodeficiency virus (HIV) infection and contribute disproportionately to the epidemic. Such populations include sex workers, injection drug users and gay, bisexual and other men who have sex with men (GBMSM) [1]. Internationally, GBMSM are at a disproportionate risk of HIV infection when compared with other men [2, 3]. In Canada, 2011 prevalence estimates indicated that 33,330 GBMSM were living with HIV (47% of all prevalent cases), with an HIV incidence rate 71 times greater than other men [3, 4]. The HIV epidemic amongst GBMSM is centered within urban contexts. For example, in British Columbia, HIV prevalence amongst GBMSM in Metro Vancouver was estimated at 18% in 2009 [4]. Rigorous bio-behavioral surveillance and research with GBMSM is needed, but is hindered by limitations of probability sampling with this population.

Due to a lack of systematic/institutional data collection on relevant behaviors or identities, as well as potential legal barriers and stigma, these GBMSM populations are widely considered to be “hidden” or “hard-to-reach” [1]. Although a population estimate of Vancouver’s GBMSM population has been calculated [5], a complete sampling frame or list of sampling units does not exist for this population. Consequently, it is difficult to generate an unbiased and generalizable sample.
While some researchers have found success sampling most at-risk populations through time-location sampling [6], previous research among Vancouver’s GBMSM population identified sub-populations that may not frequent the venues used for sampling [7] and these sub-populations may be underrepresented in time-location sampling. Respondent-driven sampling (RDS) is an increasingly used peer chain-recruitment framework to sample and analyze data from these “hard-to-reach” populations [8, 9]. Globally, there have been over 120 bio-behavioral HIV surveillance studies using RDS methodology, with almost 40 studies focused exclusively on GBMSM [10].

Respondent-driven sampling theory and methodology has been well described in the literature; while not an exhaustive list, the curious reader is referred to the sources cited within this article [8–17]. Currently, there are three RDS-adjustment weighting approaches [8, 9, 13, 17–19]. The first group of estimators, RDS-I (SH), developed in 1997 and later refined in 2004, uses data to make inferences about network characteristics, and then uses those estimates to make inferences about a population parameter point estimate [8, 9]. A second group of estimators, RDS-II (VH), was developed in 2008. These estimators use a Markov chain model to make probability-based calculated estimates directly from the data. As such, these estimators assume that sampling is with-replacement. Additionally, RDS-II (VH) estimation allows for analytical calculation of variance and considers homophily and network size, and not just the latter as for RDS-I (SH) [18]. Using computational simulations to compare these estimators, RDS-II (VH) estimators were found to outperform RDS-I (SH) estimators overall [14, 17].
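As a rough illustration of how network size enters an RDS-II (VH) style estimate, the sketch below computes a proportion in which each respondent is weighted by the inverse of their reported network size (degree). This is the commonly cited form of the Volz-Heckathorn estimator, not the RDS Analyst implementation used in this study; the function name and the toy data are hypothetical.

```python
from typing import Sequence


def rds_ii_proportion(outcomes: Sequence[int], degrees: Sequence[float]) -> float:
    """Inverse-degree weighted proportion (Volz-Heckathorn style sketch).

    outcomes: 1 if the respondent has the trait, 0 otherwise.
    degrees:  self-reported network size of each respondent (must be > 0).
    """
    if len(outcomes) != len(degrees):
        raise ValueError("outcomes and degrees must have the same length")
    if any(d <= 0 for d in degrees):
        raise ValueError("every reported network size must be positive")
    # Respondents with large networks are more likely to be recruited,
    # so each one is down-weighted by 1 / degree.
    weights = [1.0 / d for d in degrees]
    weighted_positive = sum(w * y for w, y in zip(weights, outcomes))
    return weighted_positive / sum(weights)


# Hypothetical toy data: four respondents, two with the trait.
toy_outcomes = [1, 0, 1, 0]
toy_degrees = [10, 2, 5, 4]
estimate = rds_ii_proportion(toy_outcomes, toy_degrees)
crude = sum(toy_outcomes) / len(toy_outcomes)
print(f"crude = {crude:.3f}, inverse-degree weighted = {estimate:.3f}")
```

In this toy example the respondents with the trait report larger networks, so the weighted estimate falls below the crude proportion; when all reported network sizes are equal, the two coincide.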
To prevent the introduction of bias from re-sampling subjects, respondents can only be sampled once in RDS, and as such the assumption of sampling with-replacement is never strictly met with RDS. However, when the target population is large relative to the sample size (that is, when the sample fraction is small), sampling is said to approximate with-replacement sampling. Finally, a third estimator, RDS-SS, has been developed to address the bias introduced when the assumption of with-replacement sampling is violated, specifically when a large sample fraction exists. The RDS-SS estimator uses successive sampling methodology to approximate parameters and outperforms RDS-II (VH) when sampling is without-replacement. For successive sampling estimation, the target population size must be known. When the sample fraction is small, RDS-II (VH) and RDS-SS estimates converge [13]. If certain assumptions are met, these analyses are said to be asymptotically unbiased [8].

Although the theoretical strengths of RDS are well known, implementation success of RDS varies [10, 12, 20, 21]. Accurate and precise RDS data estimation requires effective implementation of RDS sampling processes [8, 9, 13, 14, 18]. When applied effectively, one particularly important consequence of the RDS process is that the final estimate is not influenced by biases in the initial sampling design; that is, results are not dependent on seed selection. In order for this to occur, there must be enough successive waves for stability on the measured parameter to occur [22]. This can be accomplished by using a small number of seeds, relative to the desired sample size, allowing for enough waves of recruitment before the sample size is met. When larger numbers of seeds are sampled, the desired sample size may be reached with a smaller number of waves and recruitment may be ended before stability of parameters is reached [9, 14, 15, 23].
If this is the case, the use of data provided by unproductive seeds, chosen through biased convenience sampling, may have undue impact on final analysis.

The number of waves required to reach equilibrium is also influenced by the level of homophily, or segregation of sub-populations within the target population. If recruits tend to sample from within the same group based on various factors (e.g., age, gender, ethnicity), this indicates a higher level of homophily, which will necessitate more waves to reach stability as there will be a lower probability of recruits sampling from outside their group. Furthermore, point-estimate variance increases with increased homophily [8].

It has been suggested that equilibrium will occur within no more than the fourth to sixth wave [8, 16]. While a diagnostic formula to assess if equilibrium has occurred has been developed, it has received some criticism [9, 22].

As an alternative, the use of graphical diagnostics to assess for parameter equilibrium has been proposed. Convergence plots depict a population’s parameter proportion on the y-axis by the number of recruits on the x-axis. As recruitment continues, values will converge on the population estimate with equilibrium indicated by a stabilization of values over remaining recruits, indicating that the sample is not biased by the purposeful selection of seeds over the parameter. Examples of convergence plots are widely available [22]. Convergence plots may hide the effect that individual seeds and their subsequent trees may have on the sample estimate [22]. Bottleneck plots superimpose convergence plots for each individual seed and are useful in assessing if homophily is present. Examples of bottleneck plots are also available [22]. Seed tree plots that converge on or near the population estimate (i.e., one “bottleneck”) are indicative of low homophily. Conversely, different seed tree plots that stabilize on different estimates (i.e., two or more “bottlenecks”) are evidence of homophily [22].
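The idea behind these two diagnostics can be sketched in a few lines. The example below tracks a running proportion as recruits accrue (the quantity a convergence plot displays) and then repeats the calculation separately for each seed's tree (the quantity a bottleneck plot superimposes). It is a simplified sketch under stated assumptions: it uses the crude, unweighted proportion rather than an RDS-weighted estimate, and the function names and toy records are hypothetical.

```python
from collections import defaultdict


def running_proportion(outcomes):
    """Cumulative proportion of 'yes' outcomes after each successive recruit."""
    total = 0
    series = []
    for k, y in enumerate(outcomes, start=1):
        total += y
        series.append(total / k)
    return series


def running_by_seed(records):
    """Running proportion computed separately within each seed's tree.

    records: iterable of (seed_id, outcome) pairs in order of enrolment.
    """
    by_seed = defaultdict(list)
    for seed_id, y in records:
        by_seed[seed_id].append(y)
    return {seed: running_proportion(ys) for seed, ys in by_seed.items()}


# Hypothetical toy records: two seed trees, six participants in enrolment order.
toy_records = [("A", 1), ("B", 0), ("A", 0), ("B", 0), ("A", 1), ("B", 1)]

overall_series = running_proportion([y for _, y in toy_records])
per_seed_series = running_by_seed(toy_records)

print("overall:", [round(p, 2) for p in overall_series])
for seed, series in per_seed_series.items():
    print(f"seed {seed}:", [round(p, 2) for p in series])
```

Plotting the overall series against the number of recruits gives a convergence-style curve; plotting the per-seed series on shared axes gives a bottleneck-style view, where trees settling on clearly different values would hint at homophily.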
While these plots are not statistical hypothesis tests of assumptions, they can be used to assess visually the properties of population stability and homophily, much as visual assessment of QQ-plots to check normality in regression diagnostics, or of trace plots to check convergence of Markov chains, has become commonplace [24, 25]. These graphical diagnostics can be easily visualized at any stage of an RDS study to examine its success or shortcomings.

While it is analytically desirable to have a small number of seeds and long recruitment chains, this may not always be practical. For example, successive purposeful sampling of unique seeds to access identified sub-populations may be necessary after initial seeds have been selected. Additionally, if recruitment slows, new seeds may be required in order to reach a particular sample size that would have sufficient statistical power to address particular research questions. Indeed, this was our experience implementing an RDS study, and we were therefore curious about the effect of having a larger number of seeds in our RDS study.

Using data collected from a cross-sectional study of GBMSM in Vancouver, British Columbia (BC), we conducted a sensitivity analysis on key study RDS-adjustment weighted point estimates. Our analysis examined the effect that implementing increasingly strict seed exclusion criteria had on point estimates using three different RDS estimators. We hypothesized that when equilibrium and low homophily are graphically observed for a given outcome, RDS point estimates (using any RDS estimator) would remain robust against seed selection bias.

Methods

The Momentum Health Study of GBMSM in Metro Vancouver, BC, is a cross-sectional RDS study with subsequent semi-annual prospective follow-up.
The scale-up of highly active antiretroviral therapy (HAART) in BC through a policy of Treatment as Prevention may affect HIV sexual risk behaviour as mediated by increasing use of soft and hard drugs (including injection and non-injection drugs) [26]. If this risk compensation is substantial, then the HAART scale-up might not bring about a decline in HIV incidence in the GBMSM population. The overall study therefore aims to detect significant but small changes in HIV sexual risk and drug-taking behaviour over the course of the 4 years of follow-up.

Study population

Participants were recruited into the Momentum Health Study. To be eligible, participants had to identify as a man, report recent sex (past 6 months) with another man, be aged ≥16 years, live in Metro Vancouver, and be able to complete a questionnaire in English. Baseline cross-sectional data were collected between February 2012 and February 2014.

Recruitment and study procedures

After conducting formative research using community mapping to identify GBMSM characteristics in Vancouver [7], participants were recruited using RDS. Initially, 30 seeds were selected purposively from our formative work, community agency and study team contacts, and our community advisory board with consideration to diversity in terms of age, ethnicity, and HIV status. Seeds were trained in peer recruitment in-person by a research assistant and provided with up to 6 paper or electronic coupons to recruit other GBMSM from their networks. An additional 89 seeds were added to help reach sample size targets; these seeds were recruited using advertisements on online social/sexual networking platforms popular amongst GBMSM.

All study subjects were asked to complete a computer-assisted, self-administered questionnaire collecting data on demographics, sexual behavior and substance use.
A nurse-administered questionnaire and clinical visit were also conducted, which included a rapid point-of-care HIV test. Participants received a $50 Canadian dollar (CAD) honorarium as a participation incentive. The participation incentive could be either paid in cash or redeemed for a semi-annual prize draw entry for travel ($2,000 value) or a monthly prize draw entry for an electronics gift card ($250 value). Participants were also provided a $10 CAD recruitment incentive for each successful participant that completed the study protocol.

Outcome variables

To describe our sample we include basic demographic variables. Sexual identity was determined as gay or bisexual/other. Age was categorized as 18–29, 30–44 or ≥45 years old. Race/ethnicity was self-identified as Caucasian, Asian, Indigenous, or other. Participant annual income was categorized as <$30,000, $30,000–$60,000 or ≥$60,000 CAD. We also determined if participants had a regular partner at the time of survey, and the number of male anal sex partners respondents had in the past six months, categorized by quartiles as 0, 1–2, 3–6, or ≥7 partners.

Our sensitivity analysis focuses on three key variables: 1) HIV serostatus (HIV-negative or HIV-positive); 2) any “high risk sex”, which was defined as any condomless anal intercourse in the past 6 months with an HIV-discordant or status unknown partner; and 3) any injection drug use (excluding steroids) in the past six months.
HIV serostatus was determined using a nurse-administered point-of-care HIV test (Insti™ Rapid HIV-1/HIV-2 test, Biolytical Laboratories, Richmond, Canada) with subsequent typical confirmatory testing for reactive or indeterminate results at the local public health laboratory, or for study participants who self-reported as being HIV positive, confirmation of their HIV status with a previous laboratory report. All newly diagnosed participants were referred to care.

Sample size

Prior research in Vancouver approximated a prevalence of condomless anal intercourse with a sero-discordant or unknown HIV status partner of 20% among GBMSM, and a prevalence of any hard or soft drug use within the two hours prior to anal intercourse, a possible predictor of high-risk sex, of 26% [4]. In order to detect a significant difference of +/− 8.6% with a power of 0.9 at p = 0.05 in condomless anal intercourse with a sero-discordant or unknown HIV status partner, and an odds ratio of 1.52 or larger for the effect of drug use around sex on having risky sex with a power of 0.8 at p = 0.05, we calculated a minimum required sample size of 560 after excluding a planned 30 seeds.

Statistical analysis

Descriptive analysis was used to calculate crude and RDS-weighted point estimates and 95% confidence intervals (CI) of outcome variables. Any missing data were treated as non-response and coded as such for all analyses. RDS estimates were conducted with functions RDS-I (SH) [8, 9], RDS-II (VH) [18] and RDS-SS [13].
For RDS-II (VH) weights, a participant’s network size was determined using the following question asked on the computer-assisted self-interview questionnaire: “Of the GBMSM you know in the Vancouver area and whom you have seen or spoken to in the past month, how many do you know comfortably enough to give a study voucher inviting their participation (in the study)?” RDS-SS estimates, which require the population size be known, assumed a population size estimate of 33,960 GBMSM in Vancouver as previously described [5]. To conduct sensitivity analysis, we used various sample cuts, starting with all participants and subsequently removing unproductive seeds (0 recruitment waves), chains of ≤1 recruitment waves, and chains of ≤2 recruitment waves.

A simulation analysis was performed to compare estimates and variances of outcome variables between the three RDS-weighting functions used. At random, a sub-sample of seed trees was selected from the overall sample and estimates were calculated using the three chosen RDS-weighting functions. This process was repeated for 100 sub-samples. Pair-wise comparisons between each estimate were performed using a level of significance of α = 0.05. A difference in estimates that was consistently >0 or <0 would indicate that one method tended to produce greater or smaller estimates than the other.

Data were cleaned using SAS 9.4 and analysis and plots were done using RDS Analyst 0.52 [27]. Diagnostics were assessed visually with convergence plots and bottleneck plots, with functions ‘convergence.plot’ and ‘bottleneck.plot’ respectively [22].

Results

RDS characteristics

Overall, 719 participants were recruited, which included 119 seeds (16.6% of overall sample) and a maximum of 16 recruitment waves (mean recruitment chain length = 1.75 waves). The recruitment tree depicting sample networks is presented in Fig. 1.
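The sample cuts described in the statistical analysis can be sketched as a filter on recruitment chains. The example below assumes each participant record carries the identifier of the seed that started their chain and their recruitment wave (0 for the seed itself); a chain's length is taken as the highest wave it reached, and a cut keeps only participants whose chain exceeded a given number of waves. The record layout, function names and toy data are hypothetical, and this is not the procedure as implemented in the study software.

```python
def chain_lengths(participants):
    """Map each seed to the highest recruitment wave reached in its chain.

    participants: list of (participant_id, seed_id, wave) tuples, wave 0 = seed.
    """
    lengths = {}
    for _pid, seed_id, wave in participants:
        lengths[seed_id] = max(lengths.get(seed_id, 0), wave)
    return lengths


def sample_cut(participants, more_than_waves):
    """Keep participants whose chain reached more than `more_than_waves` waves."""
    lengths = chain_lengths(participants)
    return [p for p in participants if lengths[p[1]] > more_than_waves]


# Hypothetical toy sample: S1 is an unproductive seed, S2 reached one wave,
# S3 reached three waves.
toy_sample = [
    ("s1", "S1", 0),
    ("s2", "S2", 0), ("a", "S2", 1), ("b", "S2", 1),
    ("s3", "S3", 0), ("c", "S3", 1), ("d", "S3", 2), ("e", "S3", 3),
]

cut_sizes = {k: len(sample_cut(toy_sample, k)) for k in (0, 1, 2)}
print("overall:", len(toy_sample))
for k, size in cut_sizes.items():
    print(f">{k} waves: {size} participants remain")
```

Note that each cut drops whole chains, seeds and recruits together, which is why the numbers removed in the study are reported as "seeds or recruits".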
The removal of unproductive seeds (n = 50, 42.0% of sample seeds, 7.0% of overall sample), those that did not recruit any participants, left 69 productive seeds that resulted in at least one recruitment wave. For this sub-sample, the mean recruitment chain length increased to 3.01 waves. By removing seeds that only produced one recruitment wave (cumulative n = 125 seeds or recruits, 17.4% of overall sample), 37 moderately productive seeds remained, with the mean recruitment chain length increasing to 4.76 waves. Finally, removal of seeds that only produced two recruitment waves (cumulative n = 182 seeds or recruits, 25.3% of overall sample) left 25 highly productive seeds, with a mean recruitment chain length of 6.08 waves. The wave-length characteristics of each sub-sample are summarized in Table 1.

Sample demographics

Our sample of 719 GBMSM contained 612 (85.1%) men that identified as gay. There was a reasonably even distribution of age among our sample, ranging from 18 to 74 years with a median of 33 years (interquartile range 26–47 years). The majority of our sample consisted of respondents who identified their race/ethnicity as Caucasian (n = 539, 75.0%). Most respondents reported earning less than $30,000/year (n = 457, 63.6%). Our sample consisted of a majority of respondents that did not report having a current partner (n = 446, 62.3%) and 629 (87.6%) respondents reported more than one anal sex partner in the past 6 months. Descriptive crude variables of our sample can be found in Table 2.

Diagnostic plots

Diagnostic plots were produced at the end of recruitment and data collection. Convergence plots of high-risk sex and injection drug use showed that both variables converged on the population estimate (Fig. 2). Additionally, the bottleneck plots for both variables appeared to converge on the point estimate, suggesting low homophily (Fig. 2).
Contrastingly, the convergence plot of the HIV-positive serostatus variable showed very late convergence of sample results on the population estimate. Furthermore, this bottleneck plot showed two divergent estimates, suggestive of sample homophily. Analytically, convergence was found to have occurred by the 10th wave (n = 691 including seeds) at a level of 0.01 homophily for all three key outcome variables (data not shown).

Fig. 1 Momentum Health Study recruitment tree. Each node represents one study recruit. Seeds are represented by superior terminus nodes

Sensitivity analysis

Based on the various sample restrictions excluding seeds and their recruitment chains stepwise by productivity, Table 3 provides estimates for three key study outcomes. For high risk sex and injection drug use, within each sample cut, crude, RDS-I (SH), RDS-II (VH) and RDS-SS adjusted estimates of proportions fell within each estimate’s confidence interval. Additionally, we find that for the outcome HIV-positive serostatus, within each sample cut, crude estimates did not fall within the RDS-I (SH) confidence interval, but did fall within RDS-II (VH) and RDS-SS confidence intervals.

Simulation analysis did not find any significant differences between RDS estimators using paired comparisons. Differences between RDS-II (VH) and RDS-SS estimators were 0.0, while the absolute differences between RDS-II (VH) and RDS-I (SH) and RDS-SS and RDS-I (SH) were ≥1.2. Table 4 shows the results of these analyses; simulation samples and estimates are provided in the supplementary material (Additional file 1).

Discussion

Using RDS methodology for a cross-sectional study of 719 GBMSM in Vancouver, BC, our results suggest that point estimates for parameters upon which our sample reached equilibrium with low homophily (e.g., high risk sex and injection drug use) were not affected by the inclusion of unproductive seeds or short recruitment chains in analysis.
This was assessed through visualization of diagnostic plots and by confirming that the calculated point estimates fell within the 95% CIs of overall estimates across all RDS-adjustment weighting approaches for various sample restrictions. That crude sample proportions of these parameters fell within all RDS-adjustment weighting approaches’ 95% CIs, within each sample cut based on seed productivity, strengthens our conclusion that our estimates are not influenced by potential biases in seed selection. Analytically, we did not find any statistically significant differences between RDS estimators using pair-wise comparison. We conclude that within our sample, when parameter stability and low homophily are met, our analysis is not affected by using large numbers of unproductive seeds, as has been suggested [9, 14, 15, 23].

Although this may seem a costly and time-consuming method of recruitment, we found that we were able to reach our desired sample size by introducing additional seeds into the sample; this allowed us to maintain our sample size, thus limiting variance and preserving statistical power. Although our analysis already used a relatively large number of seeds, these conclusions may not generalize to studies using more seeds, where data from non- and lowly-productive seeds may in fact contribute bias to calculated point estimates. We encourage researchers that depend on a larger proportion of seeds than we present to assess whether those seeds will inflict undue bias upon the point estimate.

Contrastingly, when the assumptions of parameter stability and low homophily are violated, such as with our HIV serostatus variable, as determined by very late convergence and evidence of two bottlenecks on diagnostic plots, we find some key differences. We find that crude estimates fell within RDS-II (VH) and RDS-SS calculated 95% CIs but not within the RDS-I (SH) calculated 95% CIs across all sample cuts.
However, in the context of a low number of waves (i.e., with the inclusion of unproductive and less productive seeds) and lower homophily, RDS-II (VH) has been found to outperform RDS-I (SH) [14, 17], and we therefore lend more trust to RDS-II (VH) estimates when seed bias is present.

Table 1 Wave-length characteristics of four differing sample restrictions based on seed productivity

| | Overall (n = 719) | >0 Wave (n = 669) | >1 Wave (n = 594) | >2 Waves (n = 537) |
|---|---|---|---|---|
| Seeds(a), n (% of sample remaining) | 119 (16.6%) | 69 (10.3%) | 37 (6.2%) | 25 (4.7%) |
| Wave length, mean (median) | 1.75 (1.00) | 3.01 (2.00) | 4.76 (3.00) | 6.08 (5.00) |
| Wave length, range | 0–16 | 1–16 | 2–16 | 3–16 |

(a) Number of seeds corresponds with number of chains

Table 2 Momentum Health Study sample demographics

| Variable | Category | n | Crude % |
|---|---|---|---|
| Sexual identity | Gay | 612 | 85.1 |
| | Bisexual/Other | 107 | 14.9 |
| Age (years) | 18–29 | 275 | 38.3 |
| | 30–44 | 233 | 32.4 |
| | ≥ 45 | 211 | 29.5 |
| Race/Ethnicity | White | 539 | 75.0 |
| | Asian | 72 | 10.0 |
| | Indigenous | 50 | 7.0 |
| | Other | 58 | 8.1 |
| Income (CAD) | < $30,000 | 457 | 63.6 |
| | $30,000–60,000 | 182 | 25.3 |
| | > $60,000 | 80 | 11.1 |
| Regular sex partner | Yes | 273 | 37.0 |
| | No | 446 | 62.3 |
| Number of male anal sex partners in past 6 months | 0 | 89 | 12.4 |
| | 1–2 | 226 | 31.5 |
| | 3–6 | 208 | 30.0 |
| | ≥ 7 | 195 | 27.2 |

Fig. 2 Convergence and bottleneck plots of three key outcome variables of the Momentum Health Study. Plot y-axes represent proportions of participants that answered “yes” to the parameter; x-axes are number of participants. P6M = past six months, IDU = injection drug use, excluding steroids

When comparing RDS-II (VH) with RDS-SS estimates we found that point estimates, as well as 95% CIs, all fell within 0.1% of each other across all sample cuts for all variables. It has been suggested that RDS-SS estimates can be used to validate the with-replacement assumption of RDS-II (VH).
Assuming that our Vancouver GBMSM population estimate is robust [5], we conclude that the with-replacement sampling assumption is met and that global exhaustion or finite population effects have not introduced bias into our estimates [13].

Finally, our results support previous suggestions that convergence and bottleneck plots are an effective way to determine sample stability and level of homophily [22]. We believe that our estimates are robust for outcome variables that reached equilibrium as evidenced by diagnostic plots, which is supported by prior research [22]. Although we produced these diagnostics at the end of recruitment, we believe that these plots can be easily created and assessed during any stage of sampling to determine if further recruitment is required to reach stability or if further addition of specific unique seeds is required to address sample bottlenecks accounting for low homophily. We feel that our study contributes an empirical “proof of concept” of the diagnostics presented by Gile and colleagues where observational evidence is lacking [17, 19, 22].

Notably, analytical homophily was observed on all three key outcome variables. This is in contrast to the observed homophily on the HIV serostatus variable determined through the diagnostic bottleneck plot. This may suggest that graphical determination of equilibrium and homophily is better suited to empirical data than simulated data.

To our knowledge, this is the first study to report a sensitivity analysis of varying levels of seed productivity within an RDS study in the literature. Further, we believe this to be the first study within Canada to successfully apply RDS to a GBMSM population.
We believe our study contributes empirical evidence to a somewhat novel and increasingly used sampling and analysis methodology where a relative paucity exists.

Limitations

Respondent-driven sampling with GBMSM populations has been used extensively in non-Western settings, which have unique community-level and societal-level factors in terms of connectedness, acceptance, and stigma. One limitation of our findings is that inferences should not be made to other regions or populations that demonstrate characteristics not consistent with those of our sampled population. Indeed, prior research suggests that population characteristics may vary country to country based on underlying network structure, psychology, behavior, cultural practices, etc. [28, 29]. Additional characteristics of our study further limit its generalizability: for example, our findings represent a sample with low homophily, but the inference of these findings to populations with greater homophily on key outcomes may be limited.

Table 3 Three key study outcomes using various sample cuts and RDS-weights

| Outcome | Sample cut | n | Crude % | RDS I % (95% CI) | RDS II % (95% CI) | RDS SS % (95% CI) |
|---|---|---|---|---|---|---|
| HIV-positive serostatus (vs. HIV-negative) | Overall | 199 | 27.7 | 22.5 (19.4–25.6) | 26.7 (20.7–32.7) | 26.7 (20.7–32.7) |
| | >0 Wave | 189 | 28.3 | 22.9 (19.7–26.2) | 27.7 (21.5–33.9) | 27.7 (21.4–34.1) |
| | >1 Wave | 178 | 30.0 | 24.2 (20.6–27.7) | 29.3 (22.6–36.0) | 29.4 (22.6–36.1) |
| | >2 Wave | 161 | 30.0 | 23.8 (20.1–27.5) | 29.1 (22.1–36.1) | 29.1 (22.1–36.2) |
| Any high risk sex in past 6 months (vs. none) | Overall | 262 | 37.3 | 34.9 (31.1–38.8) | 33.6 (27.6–39.6) | 33.7 (27.7–39.7) |
| | >0 Wave | 251 | 38.4 | 35.5 (31.5–39.5) | 35.2 (28.9–41.5) | 35.3 (29.0–41.6) |
| | >1 Wave | 221 | 38.0 | 36.0 (31.7–40.3) | 35.8 (29.3–42.4) | 35.8 (29.2–42.4) |
| | >2 Wave | 203 | 38.7 | 37.0 (32.4–41.5) | 36.4 (29.5–43.3) | 36.5 (29.6–43.3) |
| Injected any drugs in past 6 months (vs. none) | Overall | 61 | 7.1 | 8.8 (5.5–12.1) | 7.3 (3.9–10.7) | 7.3 (3.9–10.7) |
| | >0 Wave | 51 | 7.6 | 9.1 (5.8–12.4) | 8.1 (4.4–11.8) | 8.1 (4.3–11.9) |
| | >1 Wave | 48 | 8.1 | 8.7 (5.7–11.7) | 8.0 (4.7–11.3) | 8.0 (4.7–11.3) |
| | >2 Wave | 43 | 8.0 | 8.7 (5.5–11.9) | 8.3 (4.8–11.8) | 8.3 (4.8–11.9) |

Table 4 Paired comparison of RDS weights for three key study outcomes

| Outcome | Statistic | Crude | RDS-I | RDS-II | RDS-SS | RDS-II – RDS-I | RDS-SS – RDS-I | RDS-SS – RDS-II |
|---|---|---|---|---|---|---|---|---|
| HIV-positive serostatus | Mean % (95% CI) | 27.0 (12.3–39.3) | 22.2 (10.3–31.9) | 25.7 (10.7–36.3) | 25.7 (10.7–36.3) | 3.4 (−0.9–7.5) | 3.4 (−0.9–7.5) | 0.0 (−) |
| | p-value | | | | | 0.11 | 0.11 | 0.22 |
| Any high risk sex in past 6 months | Mean % (95% CI) | 37.0 (32.2–41.9) | 34.3 (26.7–41.4) | 33.1 (25.3–39.4) | 33.1 (25.3–39.4) | −1.2 (−2.9–0.3) | −1.2 (−2.9–0.3) | 0.0 (0.0–0.1) |
| | p-value | | | | | 0.12 | 0.12 | 0.28 |
| Injected any drugs in past 6 months | Mean % (95% CI) | 6.6 (2.6–11.3) | 9.0 (2.3–14.0) | 6.7 (1.8–11.7) | 6.7 (1.8–11.6) | −2.3 (−3.0–0.0) | −2.3 (−3.0–0.0) | 0.0 (−) |
| | p-value | | | | | 0.06 | 0.06 | 0.18 |

Our sensitivity analysis of increasingly removing respondents based on recruitment productivity has led to a reduction in sample size. Inherently, variance will be increased. This limits our use of overlapping 95% CIs, or lack thereof, as an adequate measure of whether or not the differing seed exclusion criteria and choice of estimator made a meaningful impact on estimation. Indeed, RDS variances are shown to be relatively wide already [12, 20].
This limitation was the impetus for us to carry out the simulation analysis for this study.

Additionally, interpretation of our HIV serostatus results should be made with caution due to the potential dependence of our overall analysis on seed selection on this parameter, as assessed through diagnostic plots. Indeed, we suggest interpretation of this variable be limited to our sample cut including only the most productive seeds. We therefore suggest that, when stability or homophily is not assessed on a given parameter, the final analysis exclude unproductive and lowly productive seeds. This will likely reduce sample size, thus increasing variance and limiting power to detect differences when comparing differing groups or methods of analysis. A particularly large limitation of all RDS studies is increased variance compared with more traditional data analysis methods, and this applies to the study presented here [12, 20].

As with all observational studies, our analysis may be limited by unobserved selection bias and confounding. Particular to chain-referral sampling methods in general, subpopulations that are not penetrated, or recruited, may exist. While formative assessment attempts to address this by identifying these subpopulations [7, 11], such isolated “out-groups” that are unknown to researchers, based on cultural differences or discriminatory behaviors or perhaps because of a different parameter prevalence, will have led to a form of selection bias. This form of selection bias may still be at play within our study.

The authors view this work, previous analyses [7, 30], and future analyses of the Momentum Health Study as examples of the successful implementation of RDS to derive inferential information of a population without a comprehensive sampling frame. This work, and the body of knowledge cited within, provides support of an emerging method to obtain valid inferences from a non-probability sample, while remaining cautious of its limitations. We encourage those considering the use of RDS to proceed with an understanding of the number of assumptions that must be met for unbiased analysis, and we offer this sensitivity analysis as an example of how to empirically assess some of these assumptions.

Conclusions

Using diagnostic methods suggested by Gile, Johnston and Salganik [22], for outcomes that have reached parameter stability and within each sample cut, the crude proportions fell within 95% confidence intervals of all RDS-weighted estimates. All RDS-weighted estimates were similar and fell within the 95% confidence intervals of each other on these outcomes. We did not find significant differences between RDS estimators analytically. Furthermore, we find that diagnostic plots are a useful method to assess for equilibrium and homophily within an RDS sample and this is a useful predictor of the validity of descriptive estimates. RDS studies, although potentially costly and time consuming, are not negatively affected by large numbers of unproductive or lowly productive seeds when equilibrium has occurred. These conclusions may not hold true in instances of instability and/or low homophily, as evidenced by the HIV serostatus variable of this study.

Additional file

Additional file 1: Supplemental Material. (XLSX 823 kb)

Abbreviations

BC: British Columbia; CAD: Canadian dollar; CI: Confidence interval; GBMSM: Gay, bisexual and other men who have sex with men; HAART: Highly active antiretroviral therapy; HIV: Human immunodeficiency virus; RDS: Respondent-driven sampling

Acknowledgements

The authors would like to thank our study participants, office staff and community advisory board, as well as our community partner agencies, Health Initiative for Men, YouthCO HIV & Hep C Society, and Positive Living Society of BC.

Funding

The study is funded through the National Institute on Drug Abuse (R01DA031055-01A1) and the Canadian Institutes for Health Research (MOP-107544, 143342). NJL was supported by a CANFAR/CTN Postdoctoral Fellowship Award. DMM is supported by a Scholar Award from the Michael Smith Foundation for Health Research (#5209).

Availability of data and materials

The data used and analyzed during the current study are available from the corresponding author on reasonable request.

Authors’ contributions

HFR, EAR, RSH, DMM conceptualized and designed the overall Momentum Health Study. AR was responsible for data collection. NJL and HFR conceptualized the sensitivity analysis study. ZC and PS undertook all data preparation and statistical analysis. All authors evaluated the results. The paper was drafted by NJL and JTS, and all authors contributed to subsequent writing and review. All authors approved this version of the paper submitted for review.

Competing interests

The authors declare that they have no competing interests.

Consent for publication

Not applicable.

Ethics approval and consent to participate

This study protocol was approved by Research Ethics Boards of Simon Fraser University (2011 s0691), University of British Columbia/Providence Health (H11-00691), and the University of Victoria (11–459). During enrollment all participants, regardless of age, provided written informed consent to participate in this study; no parental/guardian consent was required for participants aged 16 or 17.

Author details

1 Epidemiology & Population Health, British Columbia Centre for Excellence in HIV/AIDS, 608-1081 Burrard Street, Vancouver V6T 1Y6, Canada.
2 School of Public Health and Social Policy, Faculty of Human and Social Development, University of Victoria, Victoria, Canada.
3 University of California San Francisco, San Francisco, USA.
4 San Francisco Department of Public Health, San Francisco, USA.
5 Department of Anthropology, University of Victoria, Victoria, Canada.
6 Faculty of Health Science, Simon Fraser University, Burnaby, Canada.
7 Faculty of Medicine, University of British Columbia, Vancouver, Canada.

Received: 16 May 2016 Accepted: 4 November 2016

References

1. Magnani R, Sabin K, Saidel T, Heckathorn D. Review of sampling hard-to-reach and hidden populations for HIV surveillance. AIDS. 2005;19 Suppl 2:S67–72.
2. Beyrer C, Baral SD, Van Griensven F, Goodreau SM, Chariyalertsak S, Wirtz AL, et al. Global epidemiology of HIV infection in men who have sex with men. Lancet. 2012;380(9839):367–77. Available from: http://dx.doi.org/10.1016/S0140-6736(12)60821-6.
3. Centre for Communicable Disease and Infection Control, Public Health Agency of Canada. HIV/AIDS Epi Updates Chapter 1: National HIV Prevalence and Incidence Estimates for 2011. Canada: Centre for Communicable Disease and Infection Control, Public Health Agency of Canada; 2014.
4. Trussler T, Banks P, Marchand R, Robert W, Gustafson R, Hogg R, et al. ManCount Sizes-up the Gaps: a sexual health survey of gay men in Vancouver. Vancouver: Vancouver Coastal Health; 2010.
5. Lachowsky N, Rich A, Cui Z, Oliveira N, Colley G, Sereda P, et al. Estimating the Size of the MSM Population Using Multiple Methods and Data Sources in Vancouver, British Columbia. Can J Infect Dis Med Microbiol. 2015;26(Suppl B):78. Available from: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4427862/.
6. Valleroy LA, MacKellar DA, Karon JM, Rosen DH, McFarland W, Shehan DA, et al. HIV prevalence and associated risks in young men who have sex with men. Young Men’s Survey Study Group. JAMA. 2000;284(2):198–204. Available from: http://www.ncbi.nlm.nih.gov/pubmed/10889593.
7. Forrest JI, Stevenson B, Rich A, Michelow W, Pai J, Jollimore J, et al. Community mapping and respondent-driven sampling of gay and bisexual men’s communities in Vancouver, Canada. Cult Health Sex [Internet]. 2014;(May 2015):288–301. Available from: http://www.ncbi.nlm.nih.gov/pubmed/24512070.
8. Heckathorn DD. Respondent-driven sampling: A new approach to the study of hidden populations. Soc Probl. 1997;44(2):174–99.
9. Heckathorn DD. Respondent-Driven Sampling II: Deriving Valid Population Estimates from Chain-Referral Samples of Hidden Populations. Soc Probl. 2002;49(1):11–34.
10. Malekinejad M, Johnston LG, Kendall C, Kerr LRFS, Rifkin MR, Rutherford GW. Using respondent-driven sampling methodology for HIV biological and behavioral surveillance in international settings: A systematic review. AIDS Behav. 2008;12 Suppl 1:105–30.
11. Johnston L, Whitehead S, Simic-Lawson M, Kendall C. Formative research to optimize respondent-driven sampling surveys among hard-to-reach populations in HIV behavioral and biological surveillance: lessons learned from four case studies. AIDS Care. 2010;22(6):784–92.
12. Goel S, Salganik MJ. Assessing respondent-driven sampling. Proc Natl Acad Sci U S A. 2010;107(15):6743–7.
13. Gile KJ. Improved Inference for Respondent-Driven Sampling Data with Application to HIV Prevalence Estimation. J Am Stat Assoc. 2011;106(493):135–46. Available from: http://arxiv.org/abs/1006.4837.
14. Gile KJ, Handcock MS. Respondent-Driven Sampling: An Assessment of Current Methodology. Sociol Methodol. 2010;40(1):285–327. Available from: http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=3437336&tool=pmcentrez&rendertype=abstract.
15. Salganik MJ, Heckathorn DD. Sampling and estimation in hidden populations using respondent-driven sampling. Sociol Methodol. 2004;34(1):193–240.
16. Heckathorn D, Semaan S. Extensions of respondent-driven sampling: a new approach to the study of injection drug users aged 18–25. AIDS Behav. 2002;6(1):55–67. Available from: http://link.springer.com/article/10.1023/A:1014528612685.
17. Wejnert C. An empirical test of respondent-driven sampling: Point estimates, variance, degree measures, and out-of-equilibrium data. Sociol Methodol. 2009;39(1):73–116.
18. Volz E, Heckathorn DD. Probability based estimation theory for respondent driven sampling. J Off Stat. 2008;24(1):79–97.
19. Wirtz AL, Mehta SH, Latkin C, Zelaya CE, Galai N, Peryshkina A, et al. Comparison of respondent driven sampling estimators to determine HIV prevalence and population characteristics among men who have sex with men in Moscow, Russia. PLoS One. 2016;11(6):e0155519.
20. Salganik MJ. Variance estimation, design effects, and sample size calculations for respondent-driven sampling. J Urban Health. 2006;83(6 Suppl):i98–112. Available from: http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=1705515&tool=pmcentrez&rendertype=abstract.
21. McCreesh N, Frost SDW, Seeley J, Katongole J, Tarsh MN, Ndunguse R, et al. Evaluation of respondent-driven sampling. Epidemiology. 2012;23(1):138–47. Available from: http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=3277908&tool=pmcentrez&rendertype=abstract.
22. Gile K, Johnston LG, Salganik MJ. Diagnostics for respondent‐driven sampling. J R … [Internet]. 2014;241–69. Available from: http://onlinelibrary.wiley.com/doi/10.1111/rssa.12059/full.
23. Kuhns LM, Kwon S, Ryan DT, Garofalo R, Phillips G, Mustanski BS. Evaluation of respondent-driven sampling in a study of urban young men who have sex with men. J Urban Health. 2014;92(1):151–67. Available from: https://www.ncbi.nlm.nih.gov/pubmed/25128301.
24. Fox J. Diagnosing Non-Normality, Nonconstant Error Variance, and Nonlinearity. In: Applied Regression Analysis and Generalized Linear Models. 3rd ed. Thousand Oaks: SAGE Publishing; 2016.
25. Cowles MK, Carlin BP. Markov Chain Monte Carlo Convergence Diagnostics: A Comparative Review. J Am Stat Assoc. 1996;91(434):883–904. Available from: http://www.jstor.org/stable/2291683.
26. Ramadanovic B, Vasarhelyi K, Nadaf A, Wittenberg RW, Montaner JSG, Wood E, et al. Changing Risk Behaviours and the HIV Epidemic: A Mathematical Analysis in the Context of Treatment as Prevention. PLoS One. 2013;8(5):e62321.
27. Handcock MS, Fellows IE, Gile KJ. RDS Analyst: Software for the Analysis of Respondent-Driven Sampling Data, Version 0.52 [Internet]. 2015. Available from: http://hpmrg.org.
28. Johnston LG, Chen Y-H, Silva-Santisteban A, Raymond HF. An empirical examination of respondent driven sampling design effects among HIV risk groups from studies conducted around the world. AIDS Behav. 2013;17:2202–10. Available from: http://www.ncbi.nlm.nih.gov/pubmed/23297082.
29. Henrich J, Heine SJ, Norenzayan A. The weirdest people in the world? Behav Brain Sci. 2010;33(2–3):61–83; discussion 83–135.
30. Moore DM, Cui Z, Lachowsky N, Raymond HF, Roth E, Rich A, Sereda P, Howard T, McFarland W, Lal A, Montaner J. HIV Community Viral Load and Factors Associated With Elevated Viremia Among a Community-Based Sample of Men Who Have Sex With Men in Vancouver, Canada. J Acquir Immune Defic Syndr. 2016;72(1):87–95.

