UBC Faculty Research and Publications

A locally funded Puerto Rican parrot (Amazona vittata) genome sequencing project increases avian data… Oleksyk, Taras K; Pombert, Jean-Francois; Siu, Daniel; Mazo-Vargas, Anyimilehidi; Ramos, Brian; Guiblet, Wilfried; Afanador, Yashira; Ruiz-Rodriguez, Christina T; Nickerson, Michael L; Logue, David M; Dean, Michael; Figueroa, Luis; Valentin, Ricardo; Martinez-Cruzado, Juan-Carlos Sep 28, 2012

Full Text

A locally funded Puerto Rican parrot (Amazona vittata) genome sequencing project increases avian data and advances young researcher education

Oleksyk et al. GigaScience 2012, 1:14
http://www.gigasciencejournal.com/content/1/1/14

DATA NOTE Open Access

Taras K Oleksyk1*, Jean-Francois Pombert2, Daniel Siu3, Anyimilehidi Mazo-Vargas1, Brian Ramos1, Wilfried Guiblet1, Yashira Afanador1, Christina T Ruiz-Rodriguez1,4, Michael L Nickerson4, David M Logue1, Michael Dean4, Luis Figueroa5, Ricardo Valentin6 and Juan-Carlos Martinez-Cruzado1

Abstract

Background: Amazona vittata is a critically endangered Puerto Rican endemic bird, the only surviving native parrot species in the United States territory, and the first parrot in the large Neotropical genus Amazona to be studied on a genomic scale.

Findings: In a unique community-funded project, DNA from an A. vittata female was sequenced using an Illumina HiSeq platform, resulting in a total of ~42.5 billion nucleotide bases. This provided approximately 26.89x average coverage depth at the completion of this funding phase. Filtering followed by assembly resulted in 259,423 contigs (N50 = 6,983 bp, longest = 75,003 bp), which were further scaffolded into 148,255 fragments (N50 = 19,470 bp, longest = 206,462 bp). This provided ~76% coverage of the genome based on an estimated size of 1.58 Gb. The assembled scaffolds allowed basic genomic annotation and comparative analyses with other available avian whole-genome sequences.

Conclusions: The current data represent the first genomic information from, and work carried out with, a unique source of funding. This analysis further provides a means for directed training of young researchers in genetic and bioinformatics analyses and will facilitate progress towards a full assembly and annotation of the Puerto Rican parrot genome.
It also adds extensive genomic data to a new branch of the avian tree, making it useful for comparative analyses with other avian species. Ultimately, the knowledge acquired from these data will contribute to an improved understanding of the overall population health of this species and aid in ongoing and future conservation efforts.

Keywords: Amazona vittata, Puerto Rican parrot, Genome sequence, Annotation, Assembly, Local funding, Education

* Correspondence: taras.oleksyk@upr.edu
1University of Puerto Rico at Mayagüez, Mayagüez, Puerto Rico
Full list of author information is available at the end of the article

© 2012 Oleksyk et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Data description

A locally funded genomic sequencing project provided the first phase of genome sequencing of the Puerto Rican parrot (Amazona vittata) (see Developing of the Local Community Involvement in Additional file 1). DNA was purified from a female A. vittata blood sample (see Additional file 2: Table S1), and sequencing was initiated with the construction of two genome libraries: the majority of sequencing used a short fragment library (~300 bp inserts), and scaffolds were generated using a long fragment library (~2.5 kb inserts). Raw Illumina HiSeq reads were processed and filtered using the Genome Analyzer Pipeline software (as per the manufacturer's instructions, at default parameters). Of the 309,060,168 paired-end reads and the 180,079,956 mate-pair reads, 86.48% and 85.14%, respectively, passed QC, using the condition that if one read from a pair failed the QC, the entire pair was filtered out.
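The pair-level filtering rule described above can be illustrated with a minimal sketch. This is not the Genome Analyzer Pipeline itself: the `Read` record, its `passed_qc` flag, and both function names are hypothetical stand-ins chosen for illustration. The last lines assume that each read count listed in Table 1 is the post-filter count for one end of a library; under that assumption the reported pass rates can be re-derived from the totals quoted in the text.

```python
from typing import Iterable, Iterator, NamedTuple, Tuple


class Read(NamedTuple):
    """Hypothetical read record; passed_qc stands in for the pipeline's filter flag."""
    name: str
    sequence: str
    passed_qc: bool


def keep_passing_pairs(
    pairs: Iterable[Tuple[Read, Read]]
) -> Iterator[Tuple[Read, Read]]:
    """Yield a pair only when both mates passed QC; one failure drops the whole pair."""
    for first, second in pairs:
        if first.passed_qc and second.passed_qc:
            yield first, second


def pass_rate_percent(reads_kept: int, reads_total: int) -> float:
    """Percentage of individual reads retained after filtering."""
    return 100.0 * reads_kept / reads_total


# Toy input: only the first pair survives, because the other two contain a failed mate.
toy_pairs = [
    (Read("p1/1", "ACGT", True), Read("p1/2", "TTGA", True)),
    (Read("p2/1", "GGCA", True), Read("p2/2", "NNNN", False)),
    (Read("p3/1", "NNNN", False), Read("p3/2", "NNNN", False)),
]
kept = list(keep_passing_pairs(toy_pairs))

# Assumption: Table 1 read counts are per-end counts after filtering.
short_rate = pass_rate_percent(2 * 133_631_138, 309_060_168)
mate_rate = pass_rate_percent(2 * 76_663_415, 180_079_956)
print(len(kept), round(short_rate, 2), round(mate_rate, 2))
```

Under the stated assumption, the two rates round to 86.48% and 85.14%, in line with the values given in the text.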
Based on the total number of base pairs generated (see Additional file 3: Table S2) and the predicted genome size of 1.58 Gb [1], we calculated a total genome coverage of 26.89x depth, with 17.08x coverage for short fragment reads and 9.8x for mate pairs (Table 1 and Additional file 3: Table S2) (see Sample Collection and Genome Sequencing in Additional file 1).

Table 1 Average coverage of the Puerto Rican parrot genome in the current study based on the predicted genome size of 1.58 Gb [1]

| Sample | Sequence information | Total bases | Read count | Coverage | Total |
|---|---|---|---|---|---|
| Pa9a (~300 bp inserts) | Pa9a_1 | 13,496,744,938 | 133,631,138 | | |
| | Pa9a_2 | 13,496,744,938 | 133,631,138 | 17.08X | |
| Pa9a (~2.5 kbp inserts) | Pa9a-MP_1 | 7,743,004,915 | 76,663,415 | | |
| | Pa9a-MP_2 | 7,743,004,915 | 76,663,415 | 9.90X | 26.89X |

We carried out two separate de novo assemblies, using Ray [2] (Table 2) and SOAPdenovo [3] (Additional file 4: Table S3), and selected the Ray assembly for use in all further analyses. Our genome coverage was approximately 76%, which might be slightly overestimated, given that some of the scaffolds may overlap and could not be properly assembled (see Assembly in Additional file 1). We evaluated the assembly by comparing it against the entire collection of transcripts listed for G. gallus in the NCBI Entrez Gene database using local BLAST [4], and found that > 70% of the chicken transcripts were present, and as many as 11% of scaffolds shared similarity with at least one G. gallus sequence, at an average density of 1.39 genes/kbp (Table 3; Additional file 5: Figure S1).

Table 2 Results of the genome assembly by Ray [2]

| Category | | ≥ 100 nt | ≥ 500 nt |
|---|---|---|---|
| Contigs | Number | 358,398 | 259,423 |
| | Total length | 1,137,438,369 | 1,116,807,713 |
| | Average | 3,173 | 4,304 |
| | Largest | 75,003 | 75,003 |
| | Median | 1,637 | 2,774 |
| | N50 | 6,841 | 6,983 |
| Scaffolds | Number | 245,947 | 148,255 |
| | Total length | 1,184,594,388 | 1,164,566,833 |
| | Average | 4,816 | 7,855 |
| | Largest | 206,462 | 206,462 |
| | Median | 1,048 | 2,913 |
| | N50 | 19,033 | 19,470 |

RepeatMasker software (http://www.repeatmasker.org) was used to search scaffolds for the presence of the known repeat classes, with known repeats found on 59% of the scaffolds (see Annotation in Additional file 1). In addition, we used manual annotation, both by annotating scaffolds for gene and repeat elements and by annotating known genes, to validate high-throughput annotation, and using this, we designed and carried out a student development program (see Genome Annotation and Education in Additional file 1).

Table 3 Annotation summary

| Scaffolds mapped to: | Scaffolds N | Scaffolds (%)# | mRNAs+ N | mRNAs+ (%)* | mRNAs+ % of the scaffold | Repeats N | Repeats (%)* | Repeats % of the scaffold |
|---|---|---|---|---|---|---|---|---|
| G. gallus genome only | 53,345 | 22% | 1,256 | 5% | 8% | 88,157 | 76% | 7.7% |
| Unmapped | 105,030 | 43% | 1,429 | 2% | 22% | 125,470 | 48% | 19.4% |
| T. guttata genome only | 26,078 | 11% | 4,206 | 21% | 7% | 87,592 | 93% | 2.1% |
| Mismatched | 54,621 | 22% | 12,030 | 27% | 2% | 266,478 | 98% | 1.0% |
| G. gallus and T. guttata | 6,873 | 3% | 1,426 | 26% | 3% | 32,994 | 98% | 1.2% |
| Total | 245,947 | 100% | 20,347 | 11% | 4% | 600,691 | 59% | 4.3% |

+ mRNAs are from G. gallus.
# Percentage values are from total number of scaffolds.
* Percentage values are from the number of scaffolds in that category.

Comparative analyses of the A. vittata scaffolds against the chicken (Gallus gallus) [5] and zebra finch (Taeniopygia guttata) [6] genomes using local BLAST [4] resulted in 93.4 Mbp of total length of alignments to the chicken genome with 82.7% identity on average (average bit score 577.3), and 41.7 Mbp of total length of alignments to the zebra finch genome with 84.5% identity on average (average bit score 431.1).

The top BLAST alignments were sorted by the average of their locations, and their frequencies were calculated in 1 Mbp bins and plotted along all of the chromosomes for both G. gallus and T. guttata genomes using Circos [7] (Figure 1). The chicken genome coverage was higher (109 scaffolds per Mbp in chicken on average vs. 72 in zebra finch), but the chicken genome also had more locations with higher genome coverage. As many as 57% of the scaffolds could be partially aligned to one or both of the genomes: 21.7% aligned only to G. gallus, and 10.6% aligned exclusively to T. guttata, while 25% aligned to both genomes (Figure 2). These data are presented and summarized for chicken in Additional file 6: Table S4.A, for zebra finch in Additional file 7: Table S4.B, and the complete information in Additional file 8: Table S4.C.

Figure 1 Density of the A. vittata scaffolds that shared similarity with fragments of the chicken and zebra finch genomes. (Top) Chicken (G. gallus) genome (per Mbp) and (Bottom) zebra finch (T. guttata) genome (per Mbp). Different chromosomes are represented by different colors as shown in the legend on the right. Chromosomal locations, lengths and quality of alignments to the two genomes by BLAST are presented in Additional file 6: Table S4.

Figure 2 Proportion of sequences with some similarity across the two avian genomes (G. gallus and T. guttata). A. vittata scaffolds are classified into five categories: (A) unmapped: those for which no similar sequence was found; (B) chicken only: those that shared similarity only with a fragment of the G. gallus genome; (C) finch only: those that shared similarity only with the T. guttata genome; (D) mismatched: those scaffolds that shared similarity with sequences of the G. gallus and T. guttata genomes but mapped to different chromosomes in the two species; (E) matched: those that mapped to the same chromosome in the reference genomes of the two avian species. Proportions are represented as totals (pie chart), absolute numbers (top) and proportions per chromosome (bottom). The associated data are provided in Additional file 9: Table S5.

Although a large proportion of scaffolds shared some similarity with the two avian genomes, there was also discordance, as only 12.6% of the scaffolds (2.8% of the total number of scaffolds) aligned to the same chromosome in both species (Figure 2, top and Additional file 9: Table S5), and the proportion of discordance varied across chromosomes, with the lowest value on chromosome 11 (Figure 2, bottom and Additional file 9: Table S5). While this lack of synteny could point to extensive rearrangements during the evolutionary history, the proportions of scaffolds discordantly aligned between chromosomes seemed to be distributed similarly relative to chromosome lengths, indicating a significant random component (Figure 3). To test this, we selected the 200 longest scaffolds and independently queried their 500 bp ends against the chicken genome. Of these, only 10 scaffolds (5%) showed discordance, with opposite ends aligning to two or more different chicken chromosomes (see Comparative Analysis in Additional file 1).

Figure 3 Synteny of alignment of the A. vittata scaffolds to two avian reference genomes (G. gallus and T. guttata). The connecting lines show the proportion of scaffolds that mapped to T. guttata chromosomes on the left side and to G. gallus chromosomes on the right side. The chromosomes are shown in order from top to bottom and designated in the same color for both species. For simplicity, different colors are used only for the three largest chromosomes. Chromosome 1 in G. gallus corresponds to chromosomes 1, 1A and 1B in T. guttata, shown in different shades of blue.

In summary, these data represent the first assembly of a genome sequence for a parrot endemic to the United States, and also the first genome of a species from the diverse and ecologically important genus Amazona, native to South America and the Caribbean. The assembled sequence provides a starting point towards completing and annotating a draft genome sequence. The data available at this coverage will be helpful in designing future sequencing efforts, and can also be used for annotation and comparative genomic studies across the growing body of avian genome data [5,6,8], which is essential given the growing rate of extinction among avian species worldwide.

Availability of supporting data

The raw reads are available at the ENA (accession# PRJEB225). Scaffolds and the assembly parameters have been submitted to GenBank (accession# PRJNA171587), and all data, including FASTA files of contigs, scaffolds, corresponding assembly parameters, and annotation data, are available in GigaDB [9].
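For readers who want to re-derive the summary statistics from the deposited data, the sketch below recomputes the coverage depths from the base totals listed in Table 1 and shows one common way to compute N50. The function names and the toy length list are illustrative assumptions, not part of the authors' pipeline, and the N50 definition used here (the length at which the running total of sorted lengths first reaches half of all bases) is the conventional one rather than one confirmed by the text. With these inputs, the mate-pair depth comes out at about 9.80x, which agrees with the 9.8x quoted in the text, whereas Table 1 prints 9.90X for that row.

```python
from typing import Iterable

# Predicted genome size used in the paper (1.58 Gb, reference [1]).
GENOME_SIZE_BP = 1_580_000_000

# Total bases per read file, copied from Table 1.
TABLE1_BASES = {
    "Pa9a_1": 13_496_744_938,
    "Pa9a_2": 13_496_744_938,
    "Pa9a-MP_1": 7_743_004_915,
    "Pa9a-MP_2": 7_743_004_915,
}


def coverage_depth(total_bases: int, genome_size_bp: int = GENOME_SIZE_BP) -> float:
    """Average coverage depth: sequenced bases divided by genome size."""
    return total_bases / genome_size_bp


def n50(lengths: Iterable[int]) -> int:
    """Length L such that sequences of length >= L hold at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("n50 needs at least one sequence length")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return ordered[-1]  # not reached for positive lengths; keeps the return type total


short_cov = coverage_depth(TABLE1_BASES["Pa9a_1"] + TABLE1_BASES["Pa9a_2"])
mate_cov = coverage_depth(TABLE1_BASES["Pa9a-MP_1"] + TABLE1_BASES["Pa9a-MP_2"])
total_cov = coverage_depth(sum(TABLE1_BASES.values()))
print(f"short: {short_cov:.2f}x  mate-pair: {mate_cov:.2f}x  total: {total_cov:.2f}x")

# Toy lengths only; real contig and scaffold lengths would come from the FASTA files.
toy_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
print("toy N50:", n50(toy_lengths))
```

Applied to real data, `n50` would take the sequence lengths parsed from the contig or scaffold FASTA files; the parsing step is left out here because the file layout is not described in the text.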
The links to all the supplementary tables and databases are listed in Additional files 2, 3, 4, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, and 16, and can also be accessed at http://genomes.uprm.edu/gigascience/Supplementary Tables/.

Additional files

Additional file 1: Supplementary materials.

Additional file 2: Table S1. Quality and volume of four DNA samples extracted from whole blood of two Amazona vittata parrots selected for the genome sequencing.

Additional file 3: Table S2. Results of the genome sequencing (Illumina HiSeq, Axeq Technologies). Pa9a_1 and Pa9a_2 represent the opposite ends of the 300 bp short reads, and Pa9a-MP_1 and Pa9a-MP_2 are the 2,500 bp mate pairs (MP). All sequences were 101 bp long.

Additional file 4: Table S3. Results of the genome assembly by SOAPdenovo [3].

Additional file 5: Supplementary figures.
Figure S1. Venn diagram of the overlap between the number of A. vittata scaffolds and the G. gallus transcripts from GenBank that were mapped to them by BLAST.
Figure S2. A single example of a chimera detected on scaffold-74754 after visual inspection of reads mapped to the 100 largest scaffolds.
Figure S3. Percentage of scaffolds containing fragments with > 95% similarity to GenBank sequences.
Figure S4. Comparison between categories of A. vittata scaffolds (described earlier in Figure 2). The box plots show the medians, Q1, Q3 and the extreme values. The means are shown in Table 3. A. Distribution of scaffold lengths. B. Distribution of densities of genes mapped per kbp of scaffold length. C. Differences in the distribution of the proportion of the length of the scaffold mapped to a G. gallus transcript from the NCBI Entrez Gene database. D. Differences in the distribution of the proportion of the length of the scaffold mapped to a known repeat class using RepeatMasker software (http://www.repeatmasker.org).
Figure S5. Distribution of major classes of repetitive sequences found on A. vittata scaffolds.
Figure S6. Relationship between the quality scores of the alignments of the parrot scaffolds to the chicken and zebra finch genomes. A. All scaffolds. B. Mismatched scaffolds only (those scaffolds that shared similarity with sequences of the G. gallus and T. guttata genomes but mapped to different chromosomes in the two species; see classification in Figure 2). C. Matched sequences only (those that mapped to the same chromosome in the reference genomes of the two avian species).
Figure S7. Relationship between the size of a scaffold and the quality of its alignment to the T. guttata and/or G. gallus genome sequence. A. All scaffolds aligned to the T. guttata genome. B. All scaffolds aligned to the G. gallus genome. C. Scaffolds from T. guttata that mismatched (scaffolds mapped to different chromosomes in G. gallus; see classification in Figure 2). D. Scaffolds from G. gallus that mismatched (scaffolds mapped to different chromosomes in T. guttata). E. Matched sequences from T. guttata only (those that mapped to the same chromosome in the reference genomes of the two avian species). F. Matched sequences from G. gallus only (those that mapped to the same chromosome in the reference genomes of the two avian species).
Figure S8. Small fragments are repeat-rich and gene-rich. A. Relationship between the length of the scaffolds and the proportion of their length matched to the G. gallus sequences from the NCBI Entrez Gene database. B. Relationship between the length of the scaffolds and the proportion of their length designated by RepeatMasker as repetitive sequence.

Additional file 6: Table S4A. Summary of the alignment of A. vittata sequences to the G. gallus genome sequence, containing only the top alignment for each scaffold, its chromosomal position and quality scores.

Additional file 7: Table S4B. Summary of the alignment of A. vittata sequences to the T. guttata genome sequence, containing only the top alignment for each scaffold, its chromosomal position and quality scores.

Additional file 8: Table S4C. The database of the alignment information of A. vittata sequences to the G. gallus and T. guttata genome sequences by BLAST.

Additional file 9: Table S5. Proportions of sequences with some similarity that mapped to chromosomes of two reference avian genomes (G. gallus and T. guttata).

Additional file 10: Table S6A. The summary of the database of GenBank sequences with more than 95% similarity with the parrot scaffolds.

Additional file 11: Table S6B. The database of GenBank sequences with more than 95% similarity with the parrot scaffolds found by BLAST.

Additional file 12: Table S7A. A map of G. gallus transcripts from the NCBI Entrez Gene database that mapped to one of the A. vittata scaffolds.

Additional file 13: Table S7B. The database of alignments between G. gallus transcripts from the NCBI Entrez Gene database and A. vittata scaffolds by BLAST.

Additional file 14: Table S8. Distribution of different cases of repetitive elements among different classes of A. vittata scaffolds.

Additional file 15: Table S9. Bioinformatics tools and outputs for scaffold and gene annotation.

Additional file 16: Table S10. An example of annotation output produced by a student in the Genome annotation class using the A. vittata genome.

Competing interests

Oleksyk TK, Pombert JF, Mazo A, Ramos B, Guiblet W, Afanador Y, Ruiz-Rodriguez CT, Nickerson ML, Logue D, Dean M, Figueroa L, Valentin R, and Martinez-Cruzado JC do not have competing interests. Siu D is employed by Axeq Technologies, the company which carried out the DNA sequencing.

Authors' contributions

TKO, LF, RV, MD, MLN, DL and JCMC conceived the idea and designed the experiments. TKO, WG, YA, CTRR and JCMC organized public support and raised the funds. TKO, AMV, BR, YA, CTRR and RV collected, extracted and quantified DNA. DS performed sequencing and assembly by SOAPdenovo. JFP performed assembly by Ray.
TKO and WG designed the data browser webpage. TKO, JFP, MLN, DL, MD and JCMC wrote the paper. All authors read and approved the final manuscript.

Note from the editors

A related commentary by Stephen O'Brien on the issues surrounding this work is published alongside this article [10].

Acknowledgements

First, we want to thank the people of Puerto Rico for their generous support of our initiative in the form of hundreds of individual donations to the Puerto Rican Parrot Genome Project. Additional support came from U.S. Fish and Wildlife Service (US FWS) grant #F11AP00196, and from a donation by Fundación Toyota de Puerto Rico. We thank the US FWS and the Compañía de Parques Nacionales de Puerto Rico for their assistance in obtaining samples. We thank the College of Arts and Sciences of the University of Puerto Rico at Mayagüez for supporting the project, and dozens of undergraduate students from the Biology Department for contributing their time. We thank Stephen J O'Brien, Juan A Rivero, Juan Lopez-Garriga, Steven E Massey, Fernando Bird, Nanette Diffoot, Susan Soltero, Jennifer Bae, Mathew Landers, April Matisz, and Audrey J Majeske for helpful ideas, discussions, and help at different stages of the project. Finally, we thank the business community of Rincon, Puerto Rico, especially Mr. Jim Behr and Ms. Rhea Maxwell, for help with promoting the collection of funds.

Author details

1University of Puerto Rico at Mayagüez, Mayagüez, Puerto Rico. 2University of British Columbia, Vancouver, BC, Canada. 3Axeq Technologies, Seoul, South Korea. 4Cancer and Inflammation Program, National Cancer Institute, NIH, Frederick, MD, USA. 5Compañía de Parques Nacionales de Puerto Rico, San Juan, Puerto Rico. 6Department of Natural and Environmental Resources, San Juan, Puerto Rico.

Received: 14 November 2011 Accepted: 14 September 2012 Published: 28 September 2012

References

1. Tiersch TR, Wachtel SS: On the evolution of genome size of birds. J Hered 1991, 82(5):363–368.
2. Boisvert S, Laviolette F, Corbeil J: Ray: simultaneous assembly of reads from a mix of high-throughput sequencing technologies. J Comput Biol 2010, 17(11):1519–1533.
3. Li R, Zhu H, Ruan J, Qian W, Fang X, Shi Z, Li Y, Li S, Shan G, Kristiansen K, et al: De novo assembly of human genomes with massively parallel short read sequencing. Genome Res 2010, 20(2):265–272.
4. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. J Mol Biol 1990, 215(3):403–410.
5. International Chicken Genome Sequencing Consortium: Sequence and comparative analysis of the chicken genome provide unique perspectives on vertebrate evolution. Nature 2004, 432(7018):695–716.
6. Warren WC, Clayton DF, Ellegren H, Arnold AP, Hillier LW, Kunstner A, Searle S, White S, Vilella AJ, Fairley S, et al: The genome of a songbird. Nature 2010, 464(7289):757–762.
7. Krzywinski MI, Schein JE, Birol I, Connors J, Gascoyne R, Horsman D, Jones SJ, Marra MA: Circos: An information aesthetic for comparative genomics. Genome Res 2009, 19(9):1639–1645.
8. Koren S, Schatz MC, Walenz BP, Martin J, Howard JT, Ganapathy G, Wang Z, Rasko DA, McCombie WR, Jarvis ED, et al: Hybrid error correction and de novo assembly of single-molecule sequencing reads. Nat Biotechnol 2012, 30(7):693–700.
9. Oleksyk TK, Guiblet W, Pombert JF, Valentin R, Martinez-Cruzado JC: Genomic data of the Puerto Rican Parrot (Amazona vittata) from a locally funded project. GigaScience 2012. http://dx.doi.org/10.5524/100039.
10. O'Brien SJ: Genome empowerment for the Puerto Rican parrot – Amazona vittata. GigaScience 2012, 1:13.

doi:10.1186/2047-217X-1-14

Cite this article as: Oleksyk et al.: A locally funded Puerto Rican parrot (Amazona vittata) genome sequencing project increases avian data and advances young researcher education. GigaScience 2012 1:14.

