UBC Research Data

Data from: Recurrent selection explains parallel evolution of genomic regions of high relative but low absolute differentiation in a ring species

Irwin, Darren E.; Alcaide, Miguel; Delmore, Kira E.; Irwin, Jessica H.; Owens, Gregory L.

Description

Abstract
Recent technological developments allow investigation of the repeatability of evolution at the genomic level. Such investigation is particularly powerful when applied to a ring species, in which spatial variation represents changes during the evolution of two species from one. We examined genomic variation among three subspecies of the greenish warbler ring species, using genotypes at 13 013 950 nucleotide sites along a new greenish warbler consensus genome assembly. Genomic regions of low within-group variation are remarkably consistent between the three populations. These regions show high relative differentiation but low absolute differentiation between populations. Comparisons with outgroup species show the locations of these peaks of relative differentiation are not well explained by phylogenetically conserved variation in recombination rates or selection. These patterns are consistent with a model in which selection in an ancestral form has reduced variation at some parts of the genome, and those same regions experience recurrent selection that subsequently reduces variation within each subspecies. The degree of heterogeneity in nucleotide diversity is greater than explained by models of background selection, but is consistent with selective sweeps. Given the evidence that greenish warblers have had both population differentiation for a long period of time and periods of gene flow between those populations, we propose that some genomic regions underwent selective sweeps over a broad geographic area followed by within-population selection-induced reductions in variation. An important implication of this ‘sweep-before-differentiation’ model is that genomic regions of high relative differentiation may have moved among populations more recently than other genomic regions.

Usage notes

Greenish Warbler consensus genome v1.0
Contains a greenish warbler genome assembly built as the consensus of the genome assemblies of three greenish warbler (Phylloscopus trochiloides) individuals (viridanus TL2; trochiloides LN10; and plumbeitarsus BK2). The file contains sequences for 31 chromosomes and chromosome fragments, based on mapping of greenish warbler contigs to the zebra finch genome assembly (version 3.2.4; Warren et al. 2010). The file is in standard FASTA format. It needs to be uncompressed first by running the command:
    gunzip Phylloscopus_trochiloides.Greenish_warbler.3sample.consensus.wrapped.toplevel.fa.gz
The three whole-genome shotgun assemblies have been deposited at DDBJ/ENA/GenBank under the accessions LXPA00000000 (viridanus), LXOZ00000000 (trochiloides), and LXOY00000000 (plumbeitarsus).
File: Phylloscopus_trochiloides.Greenish_warbler.3sample.consensus.wrapped.toplevel.fa.gz
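As an alternative to running gunzip, the compressed assembly can be read directly. The sketch below is a minimal, hypothetical helper (fasta_lengths is not part of this package) that lists each sequence in a FASTA file with its length, which is one way to check the chromosome and fragment sequences described above. It assumes only standard FASTA conventions: header lines start with ">" and sequences may be wrapped over several lines.

```python
import gzip


def fasta_lengths(path):
    """Return {sequence_name: length} for a FASTA file, gzipped or plain text."""
    # gzip.open in text mode ("rt") decompresses on the fly, so no separate gunzip step is needed
    opener = gzip.open if path.endswith(".gz") else open
    lengths = {}
    name = None
    with opener(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # Header line: use the first whitespace-delimited token as the sequence name
                tokens = line[1:].split()
                name = tokens[0] if tokens else ""
                lengths[name] = 0
            elif name is not None:
                # Sequence line (FASTA sequences may be wrapped): add its length to the current record
                lengths[name] += len(line)
    return lengths
```

For example, fasta_lengths("Phylloscopus_trochiloides.Greenish_warbler.3sample.consensus.wrapped.toplevel.fa.gz") returns a dictionary whose size can be compared against the 31 sequences described above.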

Greenish Warbler VCF files: invariant and SNPs
This folder contains 31 files in VCF format. Each contains genotypic information for a single chromosome (or chromosome fragment) for all individuals on the two GBS (genotyping-by-sequencing) plates in the study. The file names follow the pattern "GW_Lane5plusLiz.GWref.genotypes.allSites.chr*.infoSites.vcf", where * is the chromosome name. All sites, both variant and invariant, with a total of more than 10 sequencing reads across all individuals are included. The chromosomes refer to those in the zebra finch assembly (version 3.2.4; Warren et al. 2010). For details of how these VCF files were produced and how they were then used, see the associated paper (Irwin et al., in review, Molecular Ecology) and the file "GW_islands_of_diff_processing_scripts.txt" in this same Dryad package.
File: GW_infoSites_vcf_chromosome_files.tar.gz
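Because these files hold both variant and invariant sites, a quick sanity check is to count each kind per chromosome. The following is a sketch under two assumptions: that the chromosome name can be taken from the file-name pattern above, and that invariant sites carry "." in the VCF ALT column (the standard VCF convention for "no alternate allele"; the package's actual encoding is documented in "GW_islands_of_diff_processing_scripts.txt"). The helper names are hypothetical.

```python
import gzip
import re

# The * in "GW_Lane5plusLiz.GWref.genotypes.allSites.chr*.infoSites.vcf" is the chromosome name
VCF_NAME = re.compile(r"allSites\.chr(.+)\.infoSites\.vcf(?:\.gz)?$")


def chromosome_from_filename(filename):
    """Extract the chromosome name from a per-chromosome VCF file name, or None if it does not match."""
    match = VCF_NAME.search(filename)
    return match.group(1) if match else None


def count_sites(path):
    """Count variant and invariant sites in a VCF file (plain or gzipped)."""
    opener = gzip.open if path.endswith(".gz") else open
    counts = {"variant": 0, "invariant": 0}
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue  # skip meta-information and header lines
            alt = line.split("\t")[4]  # the fifth VCF column is ALT
            # Assumption: an ALT of "." means no alternate allele was called, i.e. an invariant site
            counts["invariant" if alt == "." else "variant"] += 1
    return counts
```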

Greenish Warbler genotypes: 45-sample analysis
This folder contains the genotypic information used in the 45-sample analysis comparing the taxa viridanus, trochiloides, and plumbeitarsus (15 individuals each). For each chromosome there are three files: the genotypes (ending in "012NA"), the list of individuals (ending in "012.indv"), and the list of positions on the chromosome (ending in "012.pos"). For details of how these files were produced, see the file "GW_islands_of_diff_processing_scripts.txt", also provided in this Dryad package. These files are ready for subsequent analysis and presentation with the R scripts supplied in this package. The R processing also requires the metadata file "GW_Lane5_plus_Liz.45samples.Fst_groups.txt" (also in this package).
File: GW_45samples_012NA_files.tar.gz
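The three files per chromosome fit together as a matrix (one row per individual, one column per position). The sketch below loads them outside R. Its layout is an assumption modeled on vcftools --012 output: the genotype file has a leading row-index column followed by 0/1/2 genotypes, with missing values written as "NA" (or -1); the .indv file has one name per line; the .pos file has chromosome and position per line. The function name read_012na is hypothetical, and the supplied R scripts remain the reference for how the package reads these files.

```python
def read_012na(genotype_path, indv_path, pos_path):
    """Load one chromosome's genotypes, individual names and site positions.

    Returns (individuals, positions, genotypes), where genotypes[i][j] is the
    genotype of individuals[i] at positions[j] (0, 1 or 2), or None if missing.
    """
    with open(indv_path) as f:
        individuals = [line.strip() for line in f if line.strip()]
    positions = []
    with open(pos_path) as f:
        for line in f:
            if line.strip():
                chrom, pos = line.split()[:2]
                positions.append((chrom, int(pos)))
    genotypes = []
    with open(genotype_path) as f:
        for line in f:
            fields = line.split()
            if not fields:
                continue
            # Assumed layout: the first column is a row index, the rest are genotypes;
            # "NA" (or -1, as in vcftools --012 output) marks a missing genotype
            genotypes.append([None if g in ("NA", "-1") else int(g) for g in fields[1:]])
    # Consistency checks between the three files
    if len(genotypes) != len(individuals):
        raise ValueError("number of genotype rows does not match the .indv file")
    if any(len(row) != len(positions) for row in genotypes):
        raise ValueError("number of genotype columns does not match the .pos file")
    return individuals, positions, genotypes
```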

Genotypes: outgroup analysis
This folder contains the genotypic information used in the 9-sample analysis comparing greenish warbler and outgroup taxa (1 individual for each of 9 taxa). For each chromosome there are three files: the genotypes (ending in "012NA"), the list of individuals (ending in "012.indv"), and the list of positions on the chromosome (ending in "012.pos"). For details of how these files were produced, see the file "GW_islands_of_diff_processing_scripts.txt", also provided in this Dryad package. These files are ready for subsequent analysis and presentation with the R scripts supplied in this package. The R processing also requires the metadata file "GW_Lane5_plus_Liz.nine_taxa.Fst_groups.txt" (also in this package).
File: GW_nine_taxa_012NA_files.tar.gz
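With only one individual per taxon, within-taxon variation at a site can only show up as heterozygosity. The sketch below computes, for each individual, the proportion of called sites at which it is heterozygous, assuming genotypes have already been loaded as one row per individual with 0/1/2 codes and None for missing data (a hypothetical in-memory layout, not a function of this package). Whether this approximates genome-wide diversity depends on whether these files include invariant sites, which the description above does not state.

```python
def observed_heterozygosity(individuals, genotypes):
    """Proportion of called sites at which each individual is heterozygous (coded 1)."""
    result = {}
    for name, row in zip(individuals, genotypes):
        called = [g for g in row if g is not None]  # ignore missing genotypes
        het = sum(1 for g in called if g == 1)
        # NaN when an individual has no called sites in this file
        result[name] = het / len(called) if called else float("nan")
    return result
```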

Metadata for the 45-sample analysis
This file is required for conducting the 45-sample analysis using the R script provided in this package. For each individual it gives the name, the location code, the "group" (used for coloring points in various plots), the "Fst_group" (used for defining groups in the Fst analysis), and the "plot_order" (not used in the present paper).
File: GW_Lane5_plus_Liz.45samples.Fst_groups.txt
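The "Fst_group" column is what splits individuals into the populations being compared. The sketch below groups individual names by that column and maps them to row positions in a genotype matrix. It assumes a tab-delimited file with a header row containing a column literally named "Fst_group"; the individual-name column is passed in as id_column, and its default "ID" is a hypothetical placeholder, so check the actual header before use.

```python
import csv
from collections import defaultdict


def individuals_by_fst_group(metadata_path, id_column="ID", group_column="Fst_group"):
    """Map each Fst_group value to the list of individual names in that group.

    Assumes a tab-delimited file with a header row; id_column ("ID") is a
    hypothetical name for the column holding individual names.
    """
    groups = defaultdict(list)
    with open(metadata_path, newline="") as f:
        for row in csv.DictReader(f, delimiter="\t"):
            groups[row[group_column]].append(row[id_column])
    return dict(groups)


def row_indices(individuals, names):
    """Row positions of the named individuals in a genotype matrix's list of individuals."""
    index = {name: i for i, name in enumerate(individuals)}
    return [index[name] for name in names if name in index]
```

Matching by name rather than by file order keeps the grouping correct even if the metadata rows and the genotype rows are sorted differently.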

Metadata for the outgroup analysis
This file is required for conducting the 9-sample analysis using the R script provided in this package. For each individual it gives the name, the location code, the "group" (used for coloring points in various plots), the "Fst_group" (used for defining groups in the Fst analysis), and the "plot_order" (not used in the present paper).
File: GW_Lane5_plus_Liz.nine_taxa.Fst_groups.txt

Main R analysis script
This file contains the main R script used to conduct the analysis and produce the figures. Its input is the "012NA" files (and associated files) produced as described in the file "GW_islands_of_diff_processing_scripts.txt". Note that two metadata files (also contained in this Dryad package) are also needed. Also crucial is a file of R functions ("genomics_R_functions.R") written especially for this analysis but designed to work more generally; this script calls those functions.
File: GW_GBS_R_analysis_script_for_Dryad.R

Custom R functions file
This R file contains functions that Darren Irwin wrote for the analysis of greenish warbler GBS variation; they are designed to work more broadly for any dataset in a similar input format. To reproduce the analysis in the paper, run the main R script file ("GW_GBS_R_analysis_script_for_Dryad.R"), which calls the functions in this file.
File: genomics_R_functions.R

SiteStats and WindowStats R files
This folder contains, for each chromosome, files of locus-based statistics ("SiteStats") and window-based statistics ("WindowStats"). The R script in this package can both produce and load these files; whether it saves and/or loads them is controlled by the "calculate_or_load_stats" setting and the related settings below it in that script. Producing these files can take days of processing time, so I have included them here: with them, you can produce most of the figures in the paper by running the part of the R script below the heading "GENOME-WIDE plots". That part of the script loads the appropriate files, as long as your path/folder structure matches the one designated in the R script. Files are included for both the "45-sample" analysis and the "nine_taxa.one_per_taxon" analysis.
File: SitesStats_and_WindowStats_files_GW_archived.tar.gz
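Window statistics are what reveal the pattern in the paper's title: windows of high relative but low absolute differentiation. The sketch below is an illustration, not the package's method: the window definition (a fixed number of consecutive sites), the per-site inputs (mean within-population diversity and between-population divergence, dxy), and the Hudson-style estimator Fst = 1 - pi_within / dxy, computed as a ratio of window averages, are all assumptions chosen for clarity. The actual WindowStats columns and estimators are defined in the R script and "genomics_R_functions.R".

```python
def window_stats(sites, window_size):
    """Summarise per-site statistics in consecutive, non-overlapping windows of window_size sites.

    sites: list of (position, pi_within, dxy) tuples sorted by position, where
    pi_within is mean within-population diversity at the site and dxy is the
    between-population (absolute) divergence. A trailing partial window is dropped.
    """
    windows = []
    for start in range(0, len(sites) - window_size + 1, window_size):
        chunk = sites[start:start + window_size]
        pi_w = sum(s[1] for s in chunk) / window_size
        dxy = sum(s[2] for s in chunk) / window_size
        # Hudson-style relative differentiation: a ratio of window averages,
        # not an average of per-site ratios
        fst = 1 - pi_w / dxy if dxy > 0 else float("nan")
        windows.append({"start": chunk[0][0], "end": chunk[-1][0],
                        "pi_within": pi_w, "dxy": dxy, "fst": fst})
    return windows
```

With made-up values, a window with pi_within = 0.001 and dxy = 0.002 gives Fst = 0.5, while a window with pi_within = 0.004 and dxy = 0.005 gives Fst = 0.2. The first window is more differentiated in relative terms yet less divergent in absolute terms, because its low within-population diversity alone inflates Fst.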

SNPs only across the whole genome
This folder contains genotypic information for variant sites across the whole genome, for all greenish warbler individuals in the study. This information was used to produce Figure S1. The folder contains a file of genotypes (ending in "012NA"), a file listing the individuals (ending in "012.indv"), and a file listing the genomic positions (ending in "012.pos"). For details of how these files were produced, see the file "GW_islands_of_diff_processing_scripts.txt", also provided in this Dryad package. These files are ready for subsequent analysis and presentation with the R scripts supplied in this package. The R processing also requires the metadata file "GW_Lane5_plus_Liz.GW_only.adults_only.Fst_groups.txt" (also in this package).
File: GW.SNPs_only.whole-genome_archived.tar.gz
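Figure S1 is a principal component analysis (PCA) of these genotypes. The sketch below shows a generic genotype PCA for the first axis only: missing genotypes are replaced by the site mean, each site is centered, and the leading eigenvector of the individual-by-individual matrix is found by power iteration. It is a dependency-free illustration, not necessarily the procedure in the package's R script, and its pure-Python loops are far too slow for a real whole-genome matrix, where an optimized SVD routine (for example numpy.linalg.svd) would be the usual choice. The input layout (rows = individuals, 0/1/2 or None) is an assumption.

```python
import math
import random


def pc1_scores(genotypes, iterations=200, seed=1):
    """PC1 scores for each individual (row) of a 0/1/2/None genotype matrix."""
    n, m = len(genotypes), len(genotypes[0])
    # Per-site mean over called genotypes, used for both imputation and centering
    means = []
    for j in range(m):
        called = [row[j] for row in genotypes if row[j] is not None]
        means.append(sum(called) / len(called) if called else 0.0)
    # Mean-impute missing genotypes (imputed entries become 0 after centering)
    X = [[(row[j] if row[j] is not None else means[j]) - means[j] for j in range(m)]
         for row in genotypes]
    # n x n matrix of dot products between centered individuals
    K = [[sum(a * b for a, b in zip(X[i], X[k])) for k in range(n)] for i in range(n)]
    # Power iteration converges to the eigenvector with the largest eigenvalue
    rng = random.Random(seed)
    v = [rng.random() for _ in range(n)]
    eigenvalue = 0.0
    for _ in range(iterations):
        w = [sum(K[i][k] * v[k] for k in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            return [0.0] * n  # no variation left after centering
        v = [x / norm for x in w]
        eigenvalue = norm
    # Scale the unit eigenvector by the singular value to get PC scores
    return [x * math.sqrt(eigenvalue) for x in v]
```

The sign of a principal component is arbitrary, so only the relative placement of individuals along PC1 is meaningful; coloring the points would use the "group" column of the Figure S1 metadata file.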

Metadata for Figure S1 (whole-genome PCA)
This file contains metadata for the greenish warbler individuals used in the whole-genome PCA (Figure S1). The R script in this package uses this information to color points in the PCA.
File: GW_Lane5_plus_Liz.GW_only.adults_only.Fst_groups.txt

Scripts: converting GBS reads to genotypes
This text file contains scripts and notes for the steps used to convert raw Illumina GBS sequencing reads to individual genotypes (at both variant and invariant sites) across the genome. The resulting genotype files (in "012NA" format) were then used as input to R for the rest of the analysis and for producing the figures.
File: GW_islands_of_diff_processing_scripts.txt
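The last step of that pipeline turns VCF genotype calls into the 0/1/2 coding. The sketch below illustrates that coding only, under the assumption that 0, 1 and 2 count alternate alleles at biallelic diploid sites and that missing calls become "NA"; the actual commands are those in "GW_islands_of_diff_processing_scripts.txt", and the helper names here are hypothetical.

```python
def gt_to_012(gt):
    """Convert a diploid VCF GT string (e.g. "0/1", "1|1", "./.") to an alternate-allele count."""
    alleles = gt.replace("|", "/").split("/")
    if any(a == "." for a in alleles):
        return None  # missing call, written out as NA
    # Count non-reference alleles; assumes biallelic sites (0 = REF, 1 = ALT)
    return sum(1 for a in alleles if a != "0")


def vcf_line_to_012(line):
    """Return the 0/1/2/None genotypes of all samples on one VCF data line."""
    fields = line.rstrip("\n").split("\t")
    gt_index = fields[8].split(":").index("GT")  # position of GT within the FORMAT column
    calls = []
    for sample in fields[9:]:
        values = sample.split(":")
        calls.append(gt_to_012(values[gt_index]) if gt_index < len(values) else None)
    return calls


def format_012na(calls):
    """Join one site's calls as tab-separated text, writing missing genotypes as NA."""
    return "\t".join("NA" if g is None else str(g) for g in calls)
```

A VCF data line describes one site across all individuals, so producing a matrix with one row per individual (the layout assumed here for "012NA" files) requires transposing the per-site results.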



Licence

CC0 Waiver
