Open Collections

UBC Theses and Dissertations


Genomics of sunflower improvement : from wild relatives to a global oil seed Baute, Gregory Joseph 2015



Full Text

Genomics of sunflower improvement
From wild relatives to a global oil seed

by

Gregory Joseph Baute

B.Sc., The University of Guelph, 2009
M.Sc., The University of British Columbia, 2011

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY
in
The Faculty of Graduate and Postdoctoral Studies
(Botany)

THE UNIVERSITY OF BRITISH COLUMBIA
(Vancouver)

September 2015

© Gregory Joseph Baute 2015

Abstract

Genetic diversity is a critical component of global food security. We depend on crop wild relatives as a genetic resource for the continued improvement and diversification of many crops. In this dissertation, I use population genomic approaches to investigate aspects of crop evolution in sunflower, which is a widely grown hybrid oilseed. In Chapter 1, the relevant aspects of crop improvement are reviewed, the sunflower system is introduced, and the research chapters are briefly described. In Chapter 2, I used transcriptome sequencing data to scan the genomes of a panel of cultivated and wild sunflowers and identified genes involved in domestication and improvement. Using data from additional wild sunflower species, I also identified widespread introgression of wild alleles into the modern crop gene pool. Chapter 3 describes a genomic survey of a diverse set of about 300 wild sunflower samples using genotyping by sequencing (GBS). The GBS data allowed me to determine evolutionary relationships and detect gene flow among taxa in the sunflower genus. I selected a subset of these wild samples to develop pre-bred lines, which I describe in Chapter 4. Pre-bred lines act as a bridge through which wild alleles may be introduced into breeding programs. Each of the circa 400 pre-bred lines I generated contains different components of its wild parent’s genome, which were identified using GBS. Evaluation of these lines in Uganda revealed they could be excellent sources of alleles for disease resistance and drought tolerance.
In Chapter 5, I examined relationships among cultivated lines, specifically the two heterotic groups that have been developed for hybrid crop production. Using whole genome sequencing data, I found many genomic regions were targets of selection during the development of these populations, and that patterns of diversity do not support an overdominance model of heterosis. Surprisingly, only two regions, which correspond to introduced wild alleles, are highly differentiated between these two populations. As I conclude in Chapter 6, the history of the use of wild relatives in sunflower breeding is found in the genomes of these plants, and permeates and defines modern lines.

Preface

Although I was responsible for the majority of the design, execution, analysis and writing of this work, several other people and institutions made important contributions.

Loren Rieseberg made intellectual contributions throughout these projects, as well as careful edits to earlier versions of this document and its components.

Components of the introduction and discussion have been published in a book chapter:

Gregory J. Baute, Hannes Dempewolf and Loren H. Rieseberg (2015) Using genomic approaches to unlock the potential of CWR for crop adaptation to climate change. In Crop Wild Relatives and Climate Change. Wiley-Blackwell.

Hannes Dempewolf contributed to revisions of this book chapter.

Chapter 2 has been published:

Gregory J. Baute, Nolan C. Kane, Christopher J. Grassa, Zhao Lai, and Loren H. Rieseberg (2015) Genome scans reveal candidate domestication and improvement genes in cultivated sunflower, as well as post-domestication introgression with wild relatives. New Phytologist, 206(2):830–838.

Nolan Kane developed the idea for the FST scan component of this work. Chris Grassa constructed the genetic map that is described in this paper.
I carried out all of the analyses and wrote this chapter with their input.

Dan Ebert and I worked together to extract the DNA used for genotyping by sequencing (GBS) in Chapter 3. I prepared the libraries for sequencing of the annual species, and Dan Bock provided data from the perennial species. I carried out all of the analyses with the exception of the ABBA:BABA analysis, which Gregory Owens conducted.

The development of the pre-bred lines described in Chapter 4 took place over several years and involved the participation of several research assistants in both the field and greenhouse. SOLTIS, an international plant breeding organization, extracted the DNA from these lines, and the National Agricultural Research Organization (NARO) of Uganda carried out the phenotypic evaluation of the lines under the supervision of Walter Anyanga. I prepared the GBS libraries, carried out the majority of the field and greenhouse work at UBC, evaluated the United States Department of Agriculture (USDA) lines, and did all of the analyses and writing.

The genotyping of the lines described in Chapter 5 was a large effort involving many participants. The Rieseberg Lab and collaborators had generated the data previously. With the guidance of Sariel Hubner, a software company, SAP AG, produced genotype calls. I conceived and carried out the downstream analyses.

Table of Contents

Abstract
Preface
Table of Contents
List of Tables
List of Figures
Acknowledgements
Dedication
1 Introduction
  1.1 Our global food ecosystem
    1.1.1 Domestication
    1.1.2 Improvement and diversification
    1.1.3 Crop wild relatives
  1.2 Genomics
    1.2.1 Cataloging germplasm
    1.2.2 Unlocking the potential of CWRs
    1.2.3 Detecting historic selection
  1.3 Sunflower, the global oil-seed and its wild relatives
  1.4 Dissertation
    1.4.1 Chapter 2: Domestication, improvement and admixture
    1.4.2 Chapter 3: Assessing wild germplasm for continued use in improvement programs
    1.4.3 Chapter 4: Introducing wild germplasm into the cultivated sunflower gene pool
    1.4.4 Chapter 5: Selection during the last 40 years of sunflower breeding: building the heterotic groups
    1.4.5 Chapter 6: Conclusion
2 Genome scans reveal candidate domestication and improvement genes in cultivated sunflower, as well as post-domestication introgression with wild relatives
  2.1 Introduction
  2.2 Materials and methods
    2.2.1 Selection during domestication and improvement
    2.2.2 Placement of contigs onto the genetic map
    2.2.3 Introgression during improvement
  2.3 Results
    2.3.1 Selection during domestication
    2.3.2 Selection during improvement
    2.3.3 Introgression during improvement
  2.4 Discussion
3 A genomic survey of wild Helianthus germplasm clarifies phylogenetic relationships and identifies population structure and interspecific gene flow
  3.1 Introduction
  3.2 Materials and methods
  3.3 Results
  3.4 Discussion
4 New pre-bred lines have desirable agronomic traits and give insight into the permeability of the modern sunflower genome
  4.1 Introduction
  4.2 Materials and methods
    4.2.1 Building the pre-bred lines
    4.2.2 Phenotypic evaluation
    4.2.3 Genotyping of the pre-bred lines
  4.3 Results
  4.4 Discussion
5 The genomic profile of a new hybrid crop: 40 years of sunflower breeding
  5.1 Introduction
  5.2 Materials and methods
  5.3 Results
  5.4 Discussion
6 Conclusion
Bibliography
Appendices
  A Chapter 2 supplementary materials
  B Chapter 3 supplementary materials
  C Chapter 4 supplementary materials
  D Chapter 5 supplementary materials

List of Tables

2.1 Extent of introgression from wild relatives into modern H. annuus cultivars
2.2 Number of genes subject to selection during domestication and/or introgression during improvement
A.1 Sample information
A.2 Description of domestication candidate contigs from FST genome scan
A.3 Description of improvement candidate contigs from FST genome scan
B.1 Description of wild Helianthus samples sequenced with GBS
C.1 USDA pre-bred lines evaluated at UBC
C.2 Description of pre-bred lines developed at UBC
D.1 Sunflower association population used for WGS

List of Figures

1.1 Satellite images from our cultivated planet
1.2 Archetypes of H. annuus
2.1 Differentiation during domestication of H. annuus
2.2 Genetic locations of domestication and improvement outliers
2.3 Introgression of wild H. annuus alleles into modern cultivars
3.1 Collection localities of samples used in this study
3.2 Genetic diversity (Hs) of wild Helianthus species
3.3 The inbreeding coefficient (FIS) of wild Helianthus species
3.4 Phylogenetic network of Helianthus germplasm
3.5 Reconstructed phylogeny of Helianthus species
3.6 Assignment of H. annuus samples to fastSTRUCTURE populations
3.7 Geographic location of H. annuus colored by fastSTRUCTURE
3.8 Interspecific gene flow between sympatric H. annuus and local annual Helianthus
4.1 Geographic locations of the wild donors used in the creation of the pre-bred lines
4.2 Evaluation locations for pre-bred lines in Uganda
4.3 Agronomic traits of previously pre-bred lines
4.4 Evaluation of pre-bred lines at Kitgum, Uganda
4.5 BLUPs for days to 50% flowering
4.6 Disease tolerance of pre-bred lines
4.7 Genetic position of introgressions in pre-bred lines
4.8 The length of wild introgressions in pre-bred lines
4.9 Number of lines with introgression for each region of the genome
5.1 Principal component analysis of cultivated samples by group
5.2 The size of regions selected during the development of the heterotic groups
5.3 Genomic regions putatively subject to selection during the development of the B-lines and R-lines
5.4 He of theoretical crosses between B-line and R-line populations
5.5 Differentiation of B- and R-lines along chromosome 10
5.6 Differentiation of B- and R-lines along chromosome 13
5.7 Principal component analysis of cultivated samples by date of release
A.1 Genetic diversity of domestication outliers in wild H. annuus and landraces
A.2 Tajima’s D in H. annuus landraces for all contigs and domestication outliers
A.3 Introgression of wild alleles into modern cultivars on linkage group 1
A.4 Introgression of wild alleles into modern cultivars on linkage group 2
A.5 Introgression of wild alleles into modern cultivars on linkage group 3
A.6 Introgression of wild alleles into modern cultivars on linkage group 4
A.7 Introgression of wild alleles into modern cultivars on linkage group 5
A.8 Introgression of wild alleles into modern cultivars on linkage group 6
A.9 Introgression of wild alleles into modern cultivars on linkage group 7
A.10 Introgression of wild alleles into modern cultivars on linkage group 8
A.11 Introgression of wild alleles into modern cultivars on linkage group 9
A.12 Introgression of wild alleles into modern cultivars on linkage group 10
A.13 Introgression of wild alleles into modern cultivars on linkage group 11
A.14 Introgression of wild alleles into modern cultivars on linkage group 12
A.15 Introgression of wild alleles into modern cultivars on linkage group 13
A.16 Introgression of wild alleles into modern cultivars on linkage group 14
A.17 Introgression of wild alleles into modern cultivars on linkage group 15
A.18 Introgression of wild alleles into modern cultivars on linkage group 16
A.19 Introgression of wild alleles into modern cultivars on linkage group 17
B.1 Neighbor joining and parsimony phylogenies of Helianthus
C.1 Seed weight and branching of previously pre-bred lines
C.2 Days to flowering for previously pre-bred lines

Nomenclature

BLUP   Best linear unbiased predictor
cM     centiMorgan
GBS    Genotyping By Sequencing
LG     Linkage group
Mb     Megabase
NARO   National Agricultural Research Organisation
NSERC  Natural Sciences and Engineering Research Council
OPV    Open pollinated variety
QTL    Quantitative trait locus
SNP    Single nucleotide polymorphism
USDA   United States Department of Agriculture
WGS    Whole genome sequencing

Acknowledgements

This thesis would not have been possible without the support of many incredible people and organizations.

I would like to extend my sincere thanks to:

My advisor Loren Rieseberg for the tremendous opportunity to do this work. I would not have developed into the scientist I am today without the freedom he granted me, and the ideas and resources he generously shared.

My committee Mike Whitlock and Sally Aitken, as well as other faculty at the Biodiversity Research Center, who were always there with hard questions and helpful comments.
Hannes Dempewolf for his admirable drive to do this type of research. Nolan Kane for his infectious excitement for all things sunflower. Dan Ebert and Dan Bock, who helped me find my way in the lab. Chris Grassa for always being there with the perfect (bioinformatics) one liner. Greg Owens, Kate Ostevik, Kieran Samuk, Brook Moyers and Kathryn Turner for their eagerness to talk through new ideas and politely re-align my understanding of some topics.

My sunflowers would not have survived in the field and greenhouse without the assistance of Rebecca Seifert, Teale Dunsford and Winnie Cheung. They would not have thrived without the facilities so carefully managed by Sean Trehearne and Melina Biron. The Rieseberg lab, the ‘Biodiv’ research community, and the Department of Botany not only made this work possible, but they also enriched my experience greatly.

The broader sunflower research community was immensely helpful and approachable. Brent Hulke was always quick with key advice: my sunflowers would not have gone as far without him. Glenn Cole gave me a window into the big world of sunflower breeding. Gerald Seiler and Laura Marek helped me navigate the wild diversity of sunflower germplasm. Seeing my sunflower lines in the hands of Walter Anyanga, researchers at NARO, as well as Elena Albrecht and the rest of the team at Soltis gave me an exciting glimpse at their future potential.

This work would have been much more difficult without a huge amount of freely available computational tools and resources. Although these tools, including Linux, R, LaTeX, and Perl, are wonderful themselves, it is the generosity of their online communities that unlock their power.

I am very grateful for the public funding that made my work possible, including a fellowship from Natural Sciences and Engineering Research Council of Canada, a Four-Year Fellowship from UBC, and the Frances Chave Memorial Graduate Scholarship.
Much of this work was undertaken as part of the initiative “Adapting Agriculture to Climate Change: Collecting, Protecting and Preparing Crop Wild Relatives”, which is supported by the Government of Norway. The project is managed by the Global Crop Diversity Trust with the Millennium Seed Bank of the Royal Botanic Gardens, Kew and implemented in partnership with national and international gene banks and plant breeding institutes around the world.

The support of my friends and family gave me confidence and a useful sense of calm. My parents, Brenda and Dave, have always encouraged my curiosity and made it clear from the beginning that they would be behind me 100% no matter what path I chose to walk in life.

Finally, I could not be happier to have Kasia on this adventure with me.

Dedication

This work is dedicated to the memory of Nikolai Ivanovich Vavilov, the early 20th century Russian botanist. Vavilov recognized the critical importance of preserving crop diversity for crop improvement and ultimately for food security. His insights into the genetic basis of agronomic traits and crop domestication still ring true today and his legacy lives on in the germplasm he safeguarded for future generations.

Chapter 1

Introduction

1.1 Our global food ecosystem

We live on a cultivated planet. Nearly 40% of earth’s ice-free land is used for food production (Foley et al., 2011), a fact that can readily be observed from space (Fig. 1.1). This manufactured landscape currently provides the calories needed to feed the world’s expanding population. However, there is a large ecological cost associated with this food production; modern agriculture relies on irrigation, fertilizer, and pesticides. The estimated human population of 9.5 billion in 2050 (Gerland et al., 2014), along with changing diets, means that food production must double over the next 35 years (Foley et al., 2011).
During this same time period, climate change is expected to increase inter-annual variability in temperature and precipitation, as well as the number of extreme weather events (Hansen et al., 2012). Given these three issues - the need for environmental stewardship, a growing population, and climate change - gains from crop improvement are expected to have an especially positive impact over the next three decades. Although a formidable task, increasing food production at this scale is not unprecedented. The green revolution of the 1960s doubled the yield of many cereal crops. This was accomplished by applying new technologies, including improved varieties, propagation techniques and external inputs. Further improvement of our crops will be critical to meeting the challenges of the coming decades. Genetic diversity, such as that found in crop wild relatives (CWRs), is a crucial component of crop improvement, and the application of genomic approaches may hold the key to unlocking the great potential of available wild germplasm. Genomic data can provide valuable insights into the genetic history of our crops, as well as accelerate their improvement through genomic and/or marker-assisted selection.

Figure 1.1: Satellite images of our cultivated planet. Top) Southern Ontario, Canada. Center) Near Playa del Mar, Argentina. Bottom) Near Kitgum, Uganda. White bars are 1 km, images from Google Earth.

This thesis describes my research into aspects of the domestication and improvement of sunflower and the role of its wild relatives in these processes. In this introductory chapter, I discuss each of the following topics in turn: domestication, improvement, wild relatives, genomics, and sunflowers. The next four chapters of my dissertation describe a series of interconnected population genetic and genomic research projects, which are also summarized briefly at the end of the current chapter.
Lastly, a brief concluding chapter ties the results from the research chapters together and synthesizes emerging ideas.

1.1.1 Domestication

The plants that dominate our landscape are the products of millennia of selection. The process of domestication began at the end of the last ice age independently in several locations around the world. Many crops were domesticated in one of a handful of centers of origin or Vavilov centers, named after the Russian botanist who conceived the idea that the geographic origin of a crop should coincide with its center of genetic diversity. These centers of origin include what are now Mexico, the Middle East, and China (Vavilov, 1935). There are, however, domestication events that occurred outside these major centers such as the domestication of squash and sunflower in what is now the eastern region of the United States of America (Sanjur et al., 2002; Harter et al., 2004). Whether by intentional or unintentional means, early farmers selected for a common set of traits in many crop species. These traits, now referred to as a “domestication syndrome”, include larger fruit or seed size, altered flowering times, and fewer stems or flowers. This process also resulted in the development of reproductive barriers between some crops and their wild ancestors, apparently as a byproduct of selection favouring the domestication syndrome (Dempewolf et al., 2012). Most crops appear to be the product of a single domestication event, although many have been subject to gene flow from wild relatives during or following domestication (Meyer & Purugganan, 2013).

1.1.2 Improvement and diversification

Once domesticated, many crops were further diversified through selection for adaptation to local conditions and/or for specific food or non-food uses. Maize, for example, has been adapted to the tropics, high altitude regions, and regions of low rainfall, and has varieties bred for popping, silage, flour and more.
This process of diversification continues today with new varieties being developed for specific geographic regions and/or for specific markets. Some crops now leverage the phenomenon of heterosis or hybrid vigour, which requires special breeding considerations and has resulted in hybrid or ‘F1’ crops. Not only has plant breeding greatly benefited from our growing understanding of genetic processes, but it has also been a driving force in the characterization of many of these processes. Most modern breeding relies heavily on variation found within a crop’s existing gene pool. Often, breeders will return to early cultivars such as heirloom varieties or landraces to re-examine them for useful traits. If desired alleles are not found within cultivated germplasm, however, then breeders must turn to the wild relatives of these crops or explore mutagenesis or genetic engineering approaches such as transgenics and genome editing (Urnov et al., 2010; Shalem et al., 2014).

1.1.3 Crop wild relatives

Crops were domesticated from their wild relatives, and CWRs appear to have been involved in crop improvement ever since. For example, there is evidence of gene flow between cultivars and wild populations in our three major cereals. Maize landraces have acquired alleles from a high altitude relative as they colonized high altitude regions (Hufford et al., 2013). Cultivated rice experienced massive introgression as its range expanded across Southeast Asia (Huang et al., 2012). Finally, durum wheat engulfed the whole genome of a wild relative, resulting in the allohexaploid bread wheat (Kihara, 1944). This process continues today with the intentional use of wild relatives in breeding programs worldwide. Wild relatives have been the source of numerous traits ranging from flood tolerance in rice (Hattori et al., 2009) to yield in tomato (Eshed & Zamir, 1995). Many crops have access to huge amounts of genetic diversity in their wild relatives.
Some crops, such as sunflower, are at least partially cross-compatible with dozens of other species (Rogers et al., 1982). Plant breeders have made intentional use of wild relatives since at least the 1800s (Knight, 1806), and many modern breeding programs make substantial use of wild relatives. Often, however, useful alleles are introgressed without knowledge of their molecular basis or genomic location. Although CWRs are known to hold great genetic potential, they are typically employed as a last resort in breeding programs. There are several reasons for this, including (1) there are often pre- or post-zygotic barriers to creating interspecific hybrids; (2) the creation of hybrids or backcrosses may require the use of tissue culture techniques and ploidy manipulation, and can involve high rates of failure; and (3) beneficial alleles in CWRs are often genetically linked to unwanted alleles (i.e., linkage drag). Given these difficulties, most breeding programs employ a small number of CWRs or none at all.

Selection of germplasm must be done with care if CWRs are to be employed for breeding. Some breeding programs will first evaluate CWRs for traits of interest before incorporating them into cultivated genetic backgrounds. However, cryptic variation will be missed with such an approach. CWRs can also be selected on the basis of geographic distribution or genetic diversity (Brown, 1989; Mackay & Street, 2004). Often several sources of information are used together to select a CWR for use in breeding, including the environmental and ecological conditions of the location from which it was collected. However, this ‘passport’ information is often unavailable and valuable collections may go unused.
In fact, in a survey of plant breeders at international breeding institutions, lack of quality data was the most often cited reason for not using CWRs (Chelsea Smith, personal communication, Sept 5 2014).

1.2 Genomics

Ultimately, it is the genome of a CWR that will be utilized to improve our crops, not its phenotype, collection locality, or its history of local adaptation. With this in mind, I argue that germplasm resources in gene banks or in the wild may eventually be best explored, surveyed, or mapped, at the level of the genome as we become better at predicting the breeding value of individual accessions and/or alleles. A growing array of genomic technologies is becoming more widely available and affordable. Genome-wide surveys of large numbers of genetic markers in large populations are starting to allow us to truly understand what diversity is available to breeders within crop wild relatives. Faced with limited resources, plant breeders must choose carefully which germplasm to evaluate and incorporate into their improvement programs. This selection is especially difficult with wild relatives. Germplasm curators are also faced with difficult issues regarding CWRs, including decisions about conservation priorities and where to focus germplasm acquisition efforts. Genomic data offer solutions to a number of practical issues surrounding the conservation and use of exotic and wild germplasm, as well as more effective and efficient ways to use these resources.

1.2.1 Cataloging germplasm

Gene banks hold material of immense economic value (Smale & Koo, 2003) as they contain germplasm that is key to future global food production. In spite of this great value, the exact number of accessions held in gene banks remains unknown. Estimates of the number of unique accessions vary widely, but the figure most commonly quoted is circa 2 million (FAO, 2010). The natural history of the germplasm can vary greatly, with different collections capturing different amounts of diversity.
Many accessions have complicated histories of collection, rounds of regeneration, involvement in breeding cycles and sometimes movement among institutions around the globe. Complicating the situation further, handling errors are nearly unavoidable when dealing with large collections. Genomic data offer an unprecedented level of accounting of a gene bank’s holdings, thereby allowing questions pertaining to the diversity and nature of accessions to be more confidently addressed. The uniqueness and diversity of accessions can be identified using phylogenetic approaches, population genetic statistics (Huang et al., 2012), and multivariate analyses (Cavanagh et al., 2013; Romay et al., 2013). These analyses may reveal that the diversity within accessions justifies splitting or lumping, or may identify material that has been duplicated, mislabeled or misidentified. Various molecular markers have been used to identify and remove redundant lines from germplasm collections (van Treuren et al., 2010). Genomic technologies offer the marker density required to screen germplasm for rare alleles, which may not have been identified previously. Genomic data from CWRs can also help to resolve taxonomic questions, possibly resolving species relationships or identifying cryptic species, sub-populations or ecotypes of a taxon.

The acquisition of new CWR germplasm through field collecting is an important process that can also be aided by genomic information. Collections are often incomplete and the diversity of the CWRs of many crops is not captured to a sufficient extent in current gene bank holdings (FAO, 2010). Resource limitations and plant life histories often lead to small windows of opportunity for collecting, mandating maximum efficiency. Effective and efficient collecting missions are particularly important for diversity that is threatened in the wild or in the field.
Genomic information can be combined with information on the geographic and eco-geographic distribution of a species to maximize the effectiveness of collecting. Current ‘gap analyses’ use occurrence records from herbaria and other sources to generate models of where a species is likely to occur. The distribution of existing germplasm collections is then overlaid on these distribution models and gaps (areas where a species is likely to occur but there are no collections) are identified (Ramírez-Villegas et al., 2010). However, accession abundance and collection locality alone are not truly representative of species diversity and distribution, as the genetic diversity of a species is rarely spatially homogeneous. Species frequently undergo range changes and population size fluctuations. Some areas, such as glacial refugia (Petit & Excoffier, 2009), may also contain more diversity, or they could contain unique alleles not found elsewhere. It can therefore be expected that the integration of genomic information into gap analyses will yield more precise and accurate estimates of current ‘gaps’ in gene bank collections.

1.2.2 Unlocking the potential of CWRs

Phenotyping is one of the biggest bottlenecks in crop improvement programs, especially when CWRs are involved, because multiple generations of introgression into a cultivated genomic background may be necessary before evaluation. Collections of CWRs are often larger than can be realistically evaluated and many have limited or incomplete passport information. Breeders can only select a subset of the material for evaluation and incorporation into a breeding program. Germplasm selection by breeders is typically based on natural history records, previous phenotypic and genotypic evaluations, curator knowledge about a given accession’s characteristics, prior use in breeding programs and the breeders’ needs.
Candidates for use are sometimes drawn from core collections, which are subsets of accessions designed to capture the greatest amount of genetic diversity in as few accessions as possible (Brown, 1989). Core collections are generally made by dividing accessions into phenotypic, life history, taxonomic or ecogeographic groups and then selecting representatives from each group. Often several metrics are used together to establish a core collection, but molecular markers are particularly useful for this process. Genetic diversity can be assessed with many different kinds of molecular markers, but the marker density obtained from high-throughput sequencing can be several orders of magnitude greater than other methods and will likely give the clearest picture of the patterns of diversity within a collection. With genotypic data, selections can be made in an automated fashion with software that aims to maximize the number of alleles in a selected group of samples (Gouesnard et al., 2001). Another approach for identifying germplasm that may contain alleles conferring tolerance to a given abiotic stress, herbivore, or pathogen is the Focused Identification of Germplasm Strategy (FIGS). The FIGS approach uses collection locality information to select accessions most likely to have desired traits (Mackay & Street, 2004). However, a crucial assumption underlying FIGS and similar approaches is that the material is locally adapted, which is not always the case, especially for crop wild relatives where populations can be small and/or migration rates are high (Rieseberg & Willis, 2007; Mallet, 2007). Genomic tools can be used to identify signals of selection (Morrell et al., 2012), so incorporating them into FIGS and similar approaches may be beneficial.

Although genomic approaches may allow the detection of locally adapted alleles, there are several further issues that need to be considered before they can be utilized effectively in a breeding program.
First, the traits may have a complex genetic basis, making introgression into elite material difficult. Second, even if the responsible alleles are successfully introduced, linked maladaptive alleles may be a problem. In addition, novel genetic interactions may prevent the expression of the locally adapted traits. Lastly, locally adapted phenotypes must be useful in the context of cultivation. There are many ways in which plants can evolve tolerances to a given environmental factor. For example, it is possible that a population that is adapted to dry environments harbors alleles for more efficient photosynthetic machinery, better membrane transporters or tolerance to reactive oxygen species that arise during drought stress, all of which may benefit cultivars (Cruz de Carvalho, 2008; Schroeder et al., 2013). Alternatively, the population may respond to drought by closing its stomata, shutting down and waiting for water to become available, or by escaping drought by flowering early in the season. Although the latter strategies may help wild populations survive, and leave a corresponding signature of selection in the genome, they are not useful in the context of breeding for drought tolerance because they may not lead to yield advantages, creating a pitfall for both FIGS and genomics-enabled approaches. Leveraging genomic information from existing cultivated material can help address some of these issues.

As we will see in Chapter 2 and references therein, genomic data can be used to detect past use of wild relatives. This includes (1) corroboration of pedigrees; (2) confirmation of the use of specific CWRs; and (3) quantification of the number of introgressions and their genetic and physical size. Such information could, for example, make the under- or over-representation of particular wild taxa apparent, highlighting them for future use. The size of the introduced chromosomal segments may also yield information regarding the importance of linkage drag.
The presence of a wild allele in one or more cultivated lines should not be viewed as evidence of its general usefulness. Many crops have complicated histories of selection and contain multiple functional groups, so these genomic signals should be put in the context of the broader cultivated gene pool when possible.

1.2.3 Detecting historic selection

Crop plants were selected from wild plants over millennia, leaving a signature of selection in their genomes, which can be detected using genome scan approaches. Many such scans examine variation among loci in FST, which is one of the main statistics used to study population differentiation. Loci with exceptionally high FST values may be the targets of divergent selection, as they are more highly differentiated between populations than the rest of the genome. This “genome scan” method of detecting selection by identifying outlier high-FST markers has been gaining popularity with the advent of high-throughput sequencing technologies (Roesti et al., 2012; Stölting et al., 2012). Numerous candidate genes have been identified in a variety of crops, including rice (Huang et al., 2012), maize (Hufford et al., 2012), soybeans (Lam et al., 2010), tomato (Koenig & Jiménez-Gómez, 2013) and watermelon (Guo et al., 2012). Measures of differentiation or diversity can be used independently, or in combination with metrics such as the cross-population composite likelihood ratio test (Chen et al., 2010), to detect these genes. Groups working with CWRs may consider exploiting this information to rapidly remove undesirable wild traits during pre-breeding.

Figure 1.2: Archetypes of H. annuus. Left) Multiheaded highly branched wild H. annuus. Center) Late flowering large landrace. Right) Semi-dwarf modern inbred line. Illustrations courtesy of Kasia Stepien.

Many crops have been diversified into numerous groups with different functional roles or uses. Functional groups could include heterotic gene pools for the creation of F1 hybrids or cultivars developed for specific growing regions or types of cultivation practices. Maintaining such groups is critical to successful breeding programs, so understanding what differentiates them is of great interest. Genome scan approaches can be used here as well. For example, in tomato a genome scan showed that processing varieties are defined by a large segment of chromosome 5 that contains multiple QTL relevant to the processing phenotype (Lin et al., 2014). In maize, clear evidence of divergent selection was detected in comparisons of different heterotic groups that have been maintained over the past several decades (van Heerwaarden et al., 2012). These analyses have yet to be carried out in other crops with similar functional groups, such as sunflower, so few generalizations of these processes can be made.

1.3 Sunflower, the global oil-seed and its wild relatives.

Starting over 4000 years ago, Native Americans began collecting and cultivating H. annuus, exposing it to a novel selective regime (Harter et al., 2004; Smith, 2006). The resulting landraces are differentiated from their wild ancestors by the presence of one or a few flower heads, large leaves, large seeds, self-compatibility, and late flowering (Fig 1.2). Following this long period of selection, sunflowers were introduced to the Old World as a novelty and curiosity (Putt, 1997). Over the last century, sunflowers have entered into a phase of intensified and intentional improvement. Selection during this period resulted in single-headed varieties with increased oil content.
Further breeding has created distinct heterotic groups for hybrid seed production as well as lines with distinct oil profiles. Sunflower is now one of the world’s main sources of plant-based oil. It is also remarkably widely grown; in 2013, 25 million hectares of sunflowers were grown worldwide (FAO, 2014). Various traits have been introduced into the elite lines from wild or exotic germplasm. These traits include disease resistance (Feng et al., 2008), herbicide tolerance and genetic systems for hybrid seed production (cytoplasmic male sterility and restorer alleles) (Seiler & Jan, 1997). Public breeding efforts have also made a variety of ‘pre-bred’ lines available (Seiler & Jan, 1997; Feng et al., 2006; Seiler, 1991a,b,c, 1993, 2000). Pre-bred lines are cultivated lines with wild or exotic introgressions. They have not been developed and refined enough to be released as cultivars, but represent a bridge through which wild alleles may be introduced into breeding programs more easily. Wild sunflower germplasm is also being mined by several large private breeding efforts. Despite such efforts, much of the genetic diversity contained in wild sunflowers is still not accessible to sunflower breeders because of the investment and time required to move from wild material to a high performance cultivar.

Wild Helianthus continues to be a useful model for exploring the evolutionary processes of gene flow and adaptive introgression. Across North America, 52 species of Helianthus are found in a wide variety of habitats (Kane et al., 2012), with a large range of population sizes (Strasburg et al., 2011). There are a number of species that contain sub-species or ecotypes that appear to be adapted to different environments. There is also a growing set of genomic resources available to sunflower researchers, including a reference genome for the cultivated sunflower (Kane et al., 2011), genomic sequence data for numerous other genotypes and a high-density sequence-based genetic map (Kane et al., 2012; Baute et al., 2015; Renaut et al., 2013). The majority of the research in Helianthus has focused on members of the annual clade, which contains the crop’s wild progenitor H. annuus.

1.4 Dissertation

My dissertation research employs genomic approaches to investigate various aspects of sunflower domestication and improvement, with a focus on wild relatives. Given the global importance of sunflower, an independent international organization, the Global Crop Diversity Trust, has made it one of the focal crops of a large project aimed at harnessing the diversity of CWRs for developing environmentally resilient crop varieties. This project is large in scope and scale and aims to conserve, collect and carry out pre-breeding for a number of food security crops. The work described here, particularly Chapters 3 and 4, is a component of this larger initiative. The goal of these projects was not only to broaden our understanding of the underlying genetic basis of crop improvement, but also to produce useful information and germplasm for the sunflower research and breeding community.

1.4.1 Chapter 2: Domestication, improvement and admixture

In this chapter I describe the genetic changes that have accompanied the domestication and improvement of sunflower based on the transcriptome sequencing of 80 Helianthus samples. These samples include both wild and domesticated H. annuus as well as two other cross-compatible species. Using both single nucleotide polymorphism (SNP) data and a dense genetic map, I scanned the genome to identify likely targets of selection during domestication and improvement, and identified several domestication gene candidates. I then employed a Bayesian analysis to identify post-domestication introgressions from two wild relative species. I found extensive evidence for wild relatives having been used in the creation of modern sunflower lines.
These introgressions are found on every linkage group and in each of the lines sampled. One large introgression corresponds to the branching allele that is known to have been re-introduced into the cultivated sunflower in the 1960s to facilitate hybrid seed production.

1.4.2 Chapter 3: Assessing wild germplasm for continued use in improvement programs.

This chapter describes a genomic survey of wild Helianthus based on genotyping by sequencing (GBS) data from circa 290 wild collections. This germplasm comes primarily from the United States Department of Agriculture’s (USDA) germplasm collection and focuses on wild H. annuus. I used these data to reconstruct the phylogeny for the 27 taxa I surveyed. I identified a strong signal of population structure within H. annuus consistent with its geographic distribution across North America, with clear geographically isolated groups. The origin of one of these geographically and genetically distinct groups may have been driven by introgression from a locally sympatric Helianthus species as revealed by an ABBA:BABA analysis. The information gained here is not only useful to those interested in using this germplasm for crop improvement, but it also gives insights into the role of introgression in shaping genotypic and phenotypic variation in wild H. annuus.

1.4.3 Chapter 4: Introducing wild germplasm into the cultivated sunflower gene pool

In this chapter I describe the development of new pre-bred lines for sunflower, as well as the phenotypic and genotypic characterization of these lines. I created circa 400 pre-bred lines, each of which contains introgressions from a wild Helianthus genotype selected from the germplasm surveyed in Chapter 3. The wild samples were selected to encompass as much genetic diversity as possible, and the rounds of backcrossing and selfing did not include any intentional selection. This approach was taken to maximize the wild diversity introduced into the crop gene pool for evaluation.
All lines are freely available to the sunflower community under the standard material transfer agreement. Collaborators at the Ugandan National Agriculture Research Organization (NARO) evaluated these lines, and analyses show that they contain a great deal of promising variation for valuable traits such as drought tolerance, disease resistance and flowering time. Additionally, 55 previously developed pre-bred lines were evaluated at UBC. The multi-locus genotypes of each line were assessed with GBS, and the number, size and locations of the wild introgressions were identified. Patterns of introgression across the genome are discussed.

1.4.4 Chapter 5: Selection during the last 40 years of sunflower breeding: building the heterotic groups.

Sunflower is a hybrid crop, and heterotic groups are a prerequisite for effective hybrid breeding. In this chapter, I used whole genome sequencing (WGS) data from circa 280 cultivated samples to detect signals of divergent selection during the creation of the male and female heterotic groups. This was accomplished by comparing each of the heterotic groups to the open pollinated varieties (OPVs) from which they were derived. I found hundreds of genomic regions that appear to have been targeted by selection in one or both heterotic groups. To determine if shared loci could be the product of selection for overdominance, a possible genetic mechanism for heterosis, I asked if the same or different alleles had been the focus of selection in the two heterotic groups. The same allele was found to be the target of selection at shared loci, suggesting that these lines have been shaped by selection for generally useful improvement alleles rather than overdominant genetic combinations.
Surprisingly, only two regions are highly differentiated in these populations; these correspond to the branching locus and to the locus responsible for restoring fertility in F1 plants.

1.4.5 Chapter 6: Conclusion

Finally, I bring together the main results of this research and discuss their broader implications for crop improvement. The main ideas presented are that crop improvement is a more complex process than generally depicted and that the application of genomic tools can yield useful insights into the evolutionary history, genetic relationships and genomic composition of our crops and their wild relatives.

Chapter 2

Genome scans reveal candidate domestication and improvement genes in cultivated sunflower, as well as post-domestication introgression with wild relatives.

2.1 Introduction

Domesticated plants form the basis of the global food supply. These plants are the product of millennia of selection for traits that increase their utility and productivity relative to their wild progenitors (Meyer & Purugganan, 2013; Olsen & Wendel, 2013). The genetic basis of these changes has been of great interest as it may inform our understanding of early human societies and may facilitate the continued improvement of our crops. Morphologically, the modern domesticated sunflower provides a classic example of the domestication syndrome in plants, having a single large head and large seeds that do not disperse by shattering, in contrast to its wild progenitor, which has many small heads and small seeds that disperse when mature (Burke et al., 2002). Furthermore, cultivated sunflower has altered flowering time and oil content relative to its wild progenitor (Blackman et al., 2011a; Burke et al., 2005). Sunflower provides the strongest evidence for an independent origin of agriculture in eastern North America (Harter et al., 2004; Blackman et al., 2011b). As a consequence, its domestication has long been a subject of broad international interest.
It was domesticated >4000 years ago (Smith, 2006), serving roles as a source of calories, as a medicinal plant, and even as a dye. Sunflower is now one of the top oil seeds globally and in 2013 was grown on over 25 million hectares (FAO, 2014). It has also been one of the crops with the largest growth in abundance in the global food supply in the last 50 years (Khoury et al., 2013).

While the history of plant domestication and improvement is typically viewed in terms of recurrent rounds of selection and genetic bottlenecks, for many crops this may be a simplification of the process that has taken place (Meyer & Purugganan, 2013). For example, recent genetic and genomic studies have revealed evidence of prehistoric introgressions from wild relatives into some domesticated plants and animals. Rice experienced such massive introgression from local wild relatives as it moved away from its center of domestication that there was a long debate over whether the two main subspecies of cultivated rice actually represented independent domestication events (Huang et al., 2012). Prehistoric introgression has also been found to occur on a much smaller scale, both genomically and geographically. High altitude maize landraces have alleles incorporated from a high altitude wild relative (Hufford et al., 2013). Likewise, there is evidence for the incorporation of wild alleles following domestication in grapes and soybeans (Lam et al., 2010; Myles et al., 2011). For many crops the process of incorporating genetic material from wild relatives continues today in modern breeding programs.
Indeed, major international wild relative conservation and pre-breeding efforts are ongoing for dozens of the world’s most valuable food-security crops, including sunflower.

Early domesticated lines of sunflower (referred to here as landraces) were introduced to Europe by naturalists in 1510 (Putt, 1997; Olsen & Wendel, 2013; Meyer & Purugganan, 2013), where they experienced a phase of intensified improvement, being bred for high oil yield beginning in the 19th century. The bulk of our modern cultivated germplasm is derived from material from this period (Korell et al., 1992; Burke et al., 2002). Modern lines are generally dwarfed, single headed, and have been selected to have specific oil profiles. One of the major shifts in sunflower breeding was to accommodate hybrid seed production beginning in the 1970s. This transition involved establishing heterotic groups and creating a pollination system with which F1 seed could be produced at a commercial scale. The needs for seed production tools were met with a system of cytoplasmic male sterility and restoration, as well as the re-introduction of branching to increase the window of time in which pollination can occur. The responsible alleles were sourced from H. petiolaris and wild H. annuus (Leclercq, 1969; Kinman, 1970; Fick et al., 1975), although subsequent breeding has found many more sources for these traits in wild Helianthus. Based on breeding registrations, introduction of genetic material from other wild relatives appears to have been prevalent in sunflower improvement (Korell et al., 1992). Although for some of these traits there have been efforts to map the underlying genes (Harter et al., 2004; Burke et al., 2005; Yue et al., 2008; Blackman et al., 2011b; Qi et al., 2012), we do not know where many of the introgressions have occurred in the genomes of particular cultivated lines, or how much of the modern cultivar genomes is of wild and interspecific origin.

Here we present an analysis of the genomic changes associated with domestication and improvement in sunflowers, based on transcriptome sequences of early and improved domesticated sunflowers, as well as of wild H. annuus, H. argophyllus and H. petiolaris. We identify alleles associated with domestication and/or improvement, determine the extent and parentage of introgressions into modern lines, and discuss the likely phenotypic effects of domestication alleles, improvement alleles, and introgressed regions. Lastly, we ask if changes in the genetic background during domestication affected genomic patterns of subsequent introgression from wild species. Given that modern breeders typically select against wild phenotypes for most traits, we predict that introgression should be reduced in genomic regions containing domestication alleles, as has been observed in maize (Hufford et al., 2013) and tomato (Lin et al., 2014).

2.2 Materials and methods

We analyzed sequence variation in 80 transcriptomes, representing 38 genotypes of wild and cultivated (landrace and modern lines) H. annuus, 21 genotypes of H. petiolaris and 21 genotypes of H. argophyllus (Table A.1). Transcriptome sequencing of the wild species and one of the cultivars is described in Lai et al. (2012), Renaut et al. (2013) and Rowe et al. (2013), whereas transcriptomes for the remaining 18 cultivated genotypes are reported here for the first time (see Table A.1 for details). RNA extractions, library preparation, and sequencing using Illumina and 454 sequencing platforms were carried out following Lai et al. (2012).
Reads were aligned against a reference transcriptome composed of 16,312 contigs (Renaut et al., 2012). Raw Illumina reads were aligned with the Burrows-Wheeler Aligner (BWA) (Version: 0.6.1-r104) using the bwa-short algorithm with a quality trim parameter of 20. 454 reads were cleaned and trimmed using SnoWhite (Smith, 2006; Dlugosch et al., 2013) and aligned using the BWA-SW algorithm (Li & Durbin, 2009; Khoury et al., 2014). Alignments were converted to binary format, sorted, and indexed using SAMtools (Li et al., 2009; Meyer & Purugganan, 2013) (Version: 0.1.15 (r949:203)) and headers were added using Picard (Version: 1.45). Local realignment near indels and genotyping were performed using the GATK (McKenna et al., 2010) (Version: 1.1-31-gdc8398e).

For all analyses, the following filtering criteria were employed: (1) sites where fewer than 80% of the individuals were scored were removed; (2) invariant sites or sites with a minor allele frequency of less than 0.05 were removed; (3) sites with observed heterozygosity greater than 0.5 were removed to minimize false positives due to paralogous alignments; and (4) sites with alleles exclusive to 454 sequence reads were assumed to be artifactual and discarded. Phylogenetic network analyses were carried out using the program SplitsTree with default settings (Huson, 2005).

2.2.1 Selection during domestication and improvement

In order to study selection during domestication we searched for outlier SNPs between the landrace transcriptomes and the transcriptomes of wild H. annuus from its native range. The latter transcriptome set includes all of the wild populations of H. annuus sampled, except those from Texas, which are thought to represent recent colonists (Heiser, 1951) and have been classified as a distinct subspecies, H. annuus subsp. texanus.
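As an illustration only, the four site filters listed above can be written as a single predicate. This is a minimal sketch and not the pipeline used in this study: the genotype representation (a pair of allele characters per individual, None for an unscored individual), the function name passes_site_filters and the optional illumina_alleles argument are all assumptions introduced here, and sites are assumed to be biallelic.

```python
from collections import Counter


def passes_site_filters(genotypes, min_call_rate=0.8, min_maf=0.05,
                        max_het=0.5, illumina_alleles=None):
    """Return True if a site survives filters (1) to (4).

    genotypes: one entry per individual, either a (allele, allele) tuple
               or None when the individual was not scored (hypothetical layout).
    illumina_alleles: optional set of alleles seen in at least one Illumina
               read; alleles outside this set are treated as 454-only.
    """
    called = [g for g in genotypes if g is not None]
    # (1) drop sites where fewer than 80% of individuals were scored
    if not genotypes or len(called) / len(genotypes) < min_call_rate:
        return False

    allele_counts = Counter(a for g in called for a in g)
    # (2) drop invariant sites ...
    if len(allele_counts) < 2:
        return False
    # ... and sites whose minor (second most common) allele is rarer than 0.05
    total = sum(allele_counts.values())
    minor_count = sorted(allele_counts.values(), reverse=True)[1]
    if minor_count / total < min_maf:
        return False

    # (3) drop sites with observed heterozygosity above 0.5 (likely paralogs)
    het = sum(1 for a, b in called if a != b) / len(called)
    if het > max_het:
        return False

    # (4) drop sites carrying an allele that is exclusive to 454 reads
    if illumina_alleles is not None:
        if any(a not in illumina_alleles for a in allele_counts):
            return False
    return True
```

In this sketch a site with eight homozygous and two heterozygous individuals passes, whereas the same site fails once three of the ten individuals are unscored or once one of its alleles is supported only by 454 reads.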
FST was calculated using Bayescan with default settings (Foll & Gaggiotti, 2008; Lam et al., 2010; Myles et al., 2011), and outlier SNPs were selected on the basis of having a q-value of less than 0.05 and an FST value above the mean (so as to exclude targets of balancing selection). Contigs containing one or more outlier SNPs, hereafter referred to as “domestication genes”, were used to generate a phylogenetic network. Additionally, we calculated genetic diversity (HS), Tajima’s D and the number of synonymous and non-synonymous substitutions in these candidate domestication genes. To calculate Tajima’s D we included all sites that had sufficient sampling over 200 bp. A similar genome scan was carried out with landraces and modern lines to investigate selection during improvement. Gene Ontology (GO) enrichment analysis was carried out on all contigs using the highest and average FST score with the program ErmineJ (Lee et al., 2005), which uses gene rankings and is not dependent on cutoffs. This transcriptome had been previously annotated (Renaut et al., 2012). The GSR method in ErmineJ was used with a minimum gene set size of 10 and a maximum of 100, and up to 100,000 iterations.

2.2.2 Placement of contigs onto the genetic map

The transcript contigs were placed on a high-density sequence-based genetic map of sunflower using BLAST to find the best hit and requiring 90% identity over 500 base pairs. The data and methods employed to generate the genetic map have been described previously (Renaut et al., 2013), except that an updated assembly of the sunflower genome (HA412 v0.2 assembly) was employed for mapping. The total length of the map is 1358 cM.

2.2.3 Introgression during improvement

To identify introgressed regions in the genomes of the improved cultivars (i.e. modern lines) we used the linkage admixture model in STRUCTURE (Pritchard et al., 2000) (burnin = 100,000; numreps = 100,000) on each linkage group separately.
For each analysis, SNPs were selected based on the same criteria described above for the domestication and improvement genome scans. With the exception of the modern lines, samples were assigned to the following potential donor populations: H. argophyllus, H. petiolaris, native H. annuus, H. a. texanus, and landraces, based on a priori knowledge of their origins. Note that these groups also appear to represent coherent genetic clusters (Fig. 2.1). The admixture analysis was first carried out with the H. annuus samples only, with the goal of identifying introgressions from H. a. texanus that accompanied the re-introduction of branching into modern lines. We specifically compared introgressions into RHA 274, which is branched, with Sunrise and VNIIMK8931, which are older, unbranched modern lines. The admixture analysis was then expanded to include all samples to identify interspecific introgressions. Introgression into the modern gene pool was declared using the conservative criterion of <0.05 ancestry to landrace germplasm.

The total map length of an introgression was determined by the number of adjacent map bins that met our ancestry criteria. For map bins that contained introgression breakpoints, the position of the breakpoint was arbitrarily estimated according to the fraction of introgressed SNPs within it. For example, if a 1 cM map bin contained 60 introgressed SNPs out of 100 total, then the introgression breakpoint was inferred at 0.6 cM within the map bin. Lastly, we tested whether continued selection on domestication traits during the improvement phase affected patterns of introgression from the wild species. This was accomplished by using a contingency test to determine whether the locations of putative introgressions were independent of regions identified as targets of selection during domestication.

2.3 Results

2.3.1 Selection during domestication

For the domestication genome scan, 146,815 single nucleotide polymorphisms (SNPs) met our filtering criteria on 9,768 contigs.
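The breakpoint rule described in Section 2.2.3 can be made concrete with a short sketch. The code below is one reading of that rule, offered under stated assumptions rather than as the script used in this study: the function names and the (bin length, introgressed SNPs, total SNPs) tuple layout are hypothetical, and each bin is assumed to contribute its length multiplied by its fraction of introgressed SNPs.

```python
def breakpoint_within_bin(n_introgressed, n_total, bin_length_cm=1.0):
    """Position (in cM from the bin start) of a breakpoint inside one map bin,
    taken as the fraction of introgressed SNPs times the bin length."""
    if n_total <= 0:
        raise ValueError("a bin needs at least one SNP")
    if not 0 <= n_introgressed <= n_total:
        raise ValueError("introgressed SNP count must lie in [0, n_total]")
    return bin_length_cm * n_introgressed / n_total


def introgression_length_cm(bins):
    """Total map length of one introgression.

    bins: adjacent map bins as (bin_length_cm, n_introgressed, n_total);
          interior bins are fully introgressed, edge bins may be partial.
    """
    return sum(breakpoint_within_bin(k, n, length) for length, k, n in bins)


# Worked example from the text: 60 of 100 SNPs in a 1 cM bin
example = breakpoint_within_bin(60, 100, 1.0)
```

Applied to the example given in the methods, a 1 cM bin with 60 introgressed SNPs out of 100 yields a breakpoint at 0.6 cM, and a run of bins is summed in the same way to give the length of the whole introgression.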
Using Bayescan to test for highly differentiated SNPs, we identified 184 outlier SNPs from 122 contigs (Table A.3). The average FST for all SNPs was 0.14, whereas outlier SNPs had an average FST of 0.43 and are found in the extreme tail of the FST distribution (Fig 2.1). The wild and cultivated germplasm are clearly distinct in the phylogenetic network generated using these domestication genes (Fig. 2.1). These outliers were less diverse in the landraces than in the wild samples (t-test, p-value < 2.2e-16, Fig. A.1) but were not enriched for non-synonymous substitutions. The majority (84/88) of the domestication genes have a Tajima’s D of < -2; however, the majority of the full set of genes fit this pattern, indicating the whole genome was subject to a severe bottleneck (Fig. A.2).

Two outlier contigs were especially strongly differentiated, containing 19 and 12 outlier SNPs out of their 24 and 18 total SNPs, respectively. These putative domestication genes exhibit sequence homology to Arabidopsis gene AT5G49460, which encodes ATP citrate lyase subunit B2, and AT5G52840, which has a function related to NADH-ubiquinone oxidoreductase. Both are involved in basic metabolism and likely have a role in oil production. For example, ATP citrate lyase is an important enzyme that links carbohydrate metabolism to fatty acid biosynthesis (Fatland, 2005). Another oil synthesis-related domestication gene encodes a homolog of AT5G52920, a subunit of pyruvate kinase. Pyruvate kinase is a glycolytic enzyme that, like ATP citrate lyase, plays a key role in the conversion of carbohydrates to seed oil (Flugge et al., 2011).

High average FST contigs are enriched for several gene ontology (GO) categories at p < 0.05 and an FDR-corrected p-value of < 0.1.
In the molecular function ontology, nutrient reservoir activity (GO:0045735) and protein binding transcription factor activity (GO:0000988) are enriched; for the biological processes ontology, immune system processes (GO:0002376) and locomotion (GO:0040011) are enriched; and for the cellular components ontology, cell junction (GO:0030054) is enriched. At lower levels in the GO hierarchy there are many more significant groups, including the two with the lowest p-values, coenzyme biosynthetic process (GO:0009108) and transferase activity (GO:0016746).

Slightly less than half of the contigs (7677/16312) used in this study, including 59/122 outlier loci (i.e., putative domestication genes), could be placed onto the genetic map (Fig. 2.2). To further evaluate the possible phenotypic effects of the putative domestication genes, we compared their positions on the genetic map to domestication-related QTL from previous studies (Table A.2). Many of the domestication candidate genes were found to fall within domestication-related QTL. We discuss several of the more promising associations below, with the caveat that we currently lack functional data to validate these associations. Additionally, many of these mapping populations did not include wild parents, making it difficult to infer the effects of wild versus cultivated alleles.

A transcription factor, a homolog of ATMYB59, is within QTL for achene weight, disc diameter, and flowering time (Wills & Burke, 2006; Dechaine et al., 2009) on linkage group one. MYB family transcription factors are known to regulate many aspects of plant growth and development (Dubos et al., 2010). Two domestication genes are found on linkage group ten, which contains several domestication-related QTL including branching and flowering time. One of these domestication genes is a homolog of AT5G62430, a cycling DOF factor, and could be involved in the shift in flowering time in the landraces through regulation of CONSTANS (Imaizumi, 2005).
In a previous genome scan in sunflower, a DOF-like gene was identified as under selection during domestication based on reductions in diversity of a closely linked microsatellite locus (Chapman et al., 2008). The other domestication gene on linkage group ten is a homolog of AT3G54610, named HaGNAT, a histone acetyltransferase found in the GNAT gene family, and maps near the major branching QTL (Mandel et al., 2013). The GNAT gene family has been implicated in regulation of WUSCHEL and AGAMOUS (Bertrand et al., 2003), which play a critical role in floral meristem development.

Figure 2.1: Differentiation during domestication of H. annuus. Sample groups: H. annuus, domesticated (n = 19); H. annuus, wild (n = 11); H. a. texanus (n = 8); H. argophyllus (n = 21); H. petiolaris (n = 21). (a) Phylogenetic network using entire data set, samples coloured according to taxonomic group. (b) Phylogenetic network based on FST (fixation index) outlier-containing (domestication) genes. (c) FST distribution with inset for values above

2.3.2 Selection during improvement

For the improvement analysis, 146,903 SNPs on 10,944 contigs were detected that passed our filters. Average differentiation between the landrace and modern lines (FST = 0.14) was essentially identical to that for the domestication contrast described above. Using the same genome scan approach, we identified 25 outlier SNPs with an average FST of 0.44 in 15 contigs (Table A.3). Six of these improvement genes were placed on the genetic map (Fig. 2.2). The improvement genes identified include components of photosystem II, a heat shock protein, and an osmotically responsive gene. One improvement gene has homology to LORELEI (AT5G56170), which is involved in pollen tube reception and fertilization as well as seed development (Capron et al., 2008; Tsukamoto et al., 2010).
The transition from landraces to modern inbred lines involved selection for self-compatibility, and LORELEI may have contributed to this process.

2.3.3 Introgression during improvement

Using the linkage model in STRUCTURE, we identified numerous putative introgressions from wild relatives into modern sunflower lines. First, focusing on the ancestry of two unbranched lines and one branched line, we identified large chromosomal regions in the branched line that have ancestry from wild H. annuus from Texas (H. a. texanus). The largest signals of introgression are found on chromosomes 8, 10, and 12, spanning 18, 49, and 22 cM of those chromosomes, respectively. In RHA 274, the domestication gene HaGNAT is subject to introgression from H. a. texanus. This gene has landrace ancestry in the remainder of the lines sampled. Similarly, evidence for introgression in the other modern lines is largely absent in these regions (Fig. 2.3).

Figure 2.2: Genetic locations of domestication and improvement outliers on the genetic map of H. annuus. Positions are given in centiMorgans. Horizontal lines denote genetic map bins.

Expanding the analysis to include a larger number of modern lines and two other wild relative species, we see that all of the modern lines contain putative introgressions and that every chromosome appears to have experienced introgression in at least one of the modern lines (Fig. A.3-19). VNIIMK8931 and Kosim have the smallest amount of introgression among our modern line samples and RHA 274 has the most (Table 2.1). Scoring each region of the genome cumulatively over all of the modern lines, we calculate that a total of 154.8 cM of the genome has been subject to introgression. Given a total map length of 1358 cM, it appears that > 10% of the genome of the modern lines (154.8/1358, or about 11.4%) has been subject to wild introgressions. These introgressions contain 12 of the 57 domestication genes that were included in the ancestry analysis, which is more than we would expect by chance (Table 2.2; Fisher's exact test, p = 0.0356).
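The contingency test reported here can be checked from the counts in Table 2.2. The following is a minimal sketch in Python using only the standard library. It assumes a two-sided test (the text does not state which alternative was used), and the function and variable names are illustrative rather than taken from the original analysis.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def pmf(k):
        # Hypergeometric probability of k in the top-left cell, margins fixed.
        return comb(row1, k) * comb(n - row1, col1 - k) / comb(n, col1)

    k_min = max(0, col1 - (n - row1))
    k_max = min(row1, col1)
    p_obs = pmf(a)

    # Sum the probabilities of all tables no more likely than the observed one.
    total = 0.0
    for k in range(k_min, k_max + 1):
        pk = pmf(k)
        if pk <= p_obs * (1 + 1e-7):
            total += pk
    return min(total, 1.0)


# Counts from Table 2.2 (rows: introgressed yes/no; columns: selected yes/no).
both, introgressed_only = 12, 525
selected_only, neither = 45, 4030

odds_ratio = (both * neither) / (introgressed_only * selected_only)
p_value = fisher_exact_two_sided(both, introgressed_only, selected_only, neither)
print(f"odds ratio = {odds_ratio:.2f}, two-sided p = {p_value:.4f}")
```

An odds ratio above 1 corresponds to domestication genes falling within introgressed regions more often than the remaining genes do. If a two-sided test was indeed used, the p-value from this sketch should be close to the reported value.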
This implies that introgression is not independent of selection during domestication. However, seven of the domestication genes are subject to introgression in only one line, RHA 274, apparently due to the purposeful re-introduction of branching. The remaining five domestication genes affected by introgression are found in a single introgression on linkage group 5 in Mammoth and HA89.

Figure 2.3: Introgression of wild H. annuus alleles into RHA 274, a modern branched cultivar, as well as into two unbranched cultivars (Sunrise and VNIIMK8931) on linkage groups 8, 9 and 10. The x-axis is genetic position (cM). The y-axis indicates the amount of admixture identified from the potential parental populations (landrace, H. annuus, and H. a. texanus).

Table 2.1: Extent of introgression from wild relatives into modern H. annuus cultivars (<0.05 assignment to landrace ancestry), estimated total genetic distance (in centiMorgans) by linkage group.

Linkage group  VNIIMK8931  Mammoth   Kosim  Sunrise    HA89   HA369   HA384   HA412  RHA274  Cumulative
1                    0.33     0.63       0     4.33    1.09    2.07    0.11    5.18     1.4       11.09
2                       0        0       0     0.13    0.26    0.07    0.35    0.17    0.73        1.43
3                     0.1     1.46    0.79     2.91    1.55       0    1.06    0.92    3.59        7.46
4                     0.2     0.72    0.09      3.9     2.9     0.2    2.37    1.89   13.62       19.91
5                       0     7.48    0.33     0.43    5.88    0.92    0.22    0.22    1.04       10.58
6                       0     0.34       0     1.23    1.02    0.18    0.96     1.7    0.99           4
7                    0.43      0.2    0.33     0.05       1     0.2    1.06     1.1     0.3        3.01
8                    0.32      0.6    0.49     0.38    0.23     0.7    0.15    2.87   14.89       18.66
9                       0     0.03       0     2.45    0.45    0.65    1.81     1.7    1.03        5.63
10                   0.09     0.16    0.15     1.99    1.17    1.41    1.42    0.32   26.59       28.49
11                   0.92        0     0.5     0.52    0.34    0.09    1.16    1.28    4.07           5
12                      0        0    0.34     3.67    1.26    0.18       0       0   15.47       18.14
13                      0     0.02    0.05     1.68       0    0.24    0.21    0.49    2.22        3.77
14                    0.7     1.35    0.29     1.58    1.96     0.9    1.26    1.04    1.38        4.92
15                   0.41     0.64     0.4      1.5     0.1    1.45       0    0.81       0         4.4
16                   0.09     0.75    0.47     0.52    0.94    0.06    0.24       0    1.11        2.21
17                   0.11     0.25    0.14     2.93       0    0.25    0.32       0    3.37        6.12
Total                3.72    14.63    4.36     30.2   20.15    9.59   12.71   19.66   91.81      154.79

Table 2.2: Number of genes subject to selection during domestication, introgression in at least one modern H. annuus line during improvement, or a combination of the two.

                               Subject to selection
                               Yes       No
Subject to introgression  Yes   12      525
                          No    45     4030

2.4 Discussion

Despite recent advances in genotyping technologies, little is known about the identity of genes underlying domestication and improvement in sunflower. To identify targets of selection during these two stages of sunflower domestication, we used a bottom-up approach based on analyses of transcriptome sequence data. This approach allowed us to identify the targets of selection across the genome with fewer biases (although see discussion below) than are inherent to top-down approaches such as QTL mapping or genome-wide association studies, which typically target visible phenotypes.
Using a common measure of population differentiation (FST), we identified highly differentiated SNPs and their corresponding contigs between wild and landrace accessions (Fig. 2.1), and between landrace and modern lines. These outlier contigs represent candidate domestication and improvement genes, respectively, and can be placed onto a sequence-based genetic map of sunflower (Fig. 2.2).

The two most extreme outlier contigs in the domestication scan, in terms of number of highly differentiated SNPs, both have homology to genes that are involved in basic metabolism and presumably oil biosynthesis. This may represent selection for different oil and nutritional properties or for more general modifications to growth habit and vigour. Previous work in sunflower has found evidence of selection acting on oil synthesis genes during domestication (Chapman & Burke, 2012), as well as on a small number of genes investigated on the basis of homology to domestication genes in other species (Blackman et al., 2011a; Mandel et al., 2014), although we find little overlap with our candidate genes. Chapman et al. (2008) identified 19 candidate domestication genes from a genome scan based on changes in the diversity patterns of microsatellites associated with expressed sequence tags. However, only five of these candidates had sufficient data to be included in our study and only one corresponds to a domestication gene we identified (c1533) (Chapman et al., 2008). The low correspondence is most likely due to the more conservative correction for multiple testing (necessitated by the larger number of markers) employed in the present study.

Chapman et al. (2008) also reported that several GO categories were enriched for domestication genes, including amino acid metabolic processes. This pattern of enrichment was similar to reports in other crops (Wright, 2005) and raised the question of whether this may be a commonality amongst domestication events (Chapman et al., 2008).
Here, with a larger sample of genes and a robust method for finding enriched GO categories (Gillis et al., 2010), we observed a different set of GO categories enriched in domestication genes. One GO category we identify, protein binding transcription factor activity (GO:0000988), fits our expectations well and is consistent with a recent review reporting that the majority of domestication and improvement genes currently identified are transcription factors (Meyer & Purugganan, 2013). Differentiated genes annotated with nutrient reservoir activity and coenzyme biosynthetic process may contribute to the changes in oil production and storage that took place during sunflower domestication. So, although we are aware that spurious associations can be made with GO analyses (Pavlidis et al., 2012), these results fit well with our expectations for a morphologically differentiated oil seed. A caveat is that our transcriptome data are mainly from seedling tissue rather than from tissues more obviously targeted by selection during domestication (e.g., achenes). This may account for why our data set does not include some of the candidate domestication genes targeted by previous studies and may also explain differences in our GO results relative to previous reports.

Although genome scans can be useful for identifying targets of selection under domestication, such approaches may be less appropriate for analyses of selection arising from improvement because of the more complex demographic histories typically associated with the latter. Modern sunflower lines, for example, do not represent a single cohesive population, as they have been selected for different uses (oil and confection) and have been divided into different heterotic groups. This generates population structure, which can lead to false positives or lower power (Excoffier et al., 2009).
In addition, the intentional introduction of novel genetic material into modern lines, as we provide evidence for here, may make this measure of differentiation less effective. In line with these pitfalls, we are only able to identify a small number of outliers in a landrace to modern line comparison, and even these should be viewed with caution. Outlier scans may be more appropriate when investigating the history of selection in a single population of modern lines, for example oil B-lines (females), but our current sampling is not broad enough to accommodate this type of analysis.

Breeding registrations and pedigrees indicate that the improvement of sunflower involved the introduction of alleles from numerous wild species (Korell et al., 1992). Using a method previously employed to investigate crop-wild introgression along the genome (Hufford et al., 2013), we investigated a case where much is known: the reintroduction of branching in the early 1970s. To make hybrid seed production viable at a commercial scale in sunflower, a cytoplasmic male sterility system was developed by introgressing a sterility-inducing cytoplasm from the prairie sunflower H. petiolaris (Leclercq, 1969) into sunflower B-lines, which act as females for large-scale seed production. Sunflower R-lines, the pollen donors for hybrid production, such as RHA 274, contain a restorer allele for the H. petiolaris (pet1) cytoplasm and have recessive branching. The branching was incorporated to increase the duration of pollen availability and maximize seed production. Thus, the RHA 274 line, as well as the majority of other public R-lines, is expected to have the alleles responsible for branching derived from wild H. annuus, specifically H. a. texanus, which is indicated to have been used for these traits (Kinman, 1970; Fick et al., 1975; Korell et al., 1992).

As expected, we found several major introgressions from H. a. texanus in RHA 274 that are not found in the non-branching lines Sunrise and VNIIMK8931 (Fig. 2.3). The large size of these introgressions is consistent with pedigree information, which suggests RHA 274 was generated via the equivalent of a few rounds of backcrossing and selfing (Kinman, 1970; Fick et al., 1975; Korell et al., 1992). The genomic locations of the introgressed regions correspond to the major QTL for branching on linkage group 10 and a second QTL for branching on linkage group 8 (Mandel et al., 2013). Subsequent breeding of advanced R-lines is likely to have refined these introgressions to reduce linkage drag; future studies will address this possibility. The restorer allele, Rf1, is also present in RHA 274, likely having been introduced simultaneously from H. a. texanus. This allele has been the focus of several mapping studies and has been placed on linkage group 13 (Horn et al., 2003; Yue et al., 2010), which has evidence of introgression in RHA 274 as expected, although it contains no obvious candidates for the Rf1 gene. A large introgression on linkage group 12 does not correspond to any known QTL that we would expect to find from H. a. texanus and may have been introduced unintentionally.

Expansion of the introgression analyses to include all of the modern lines and two additional wild species, H. argophyllus and H. petiolaris, revealed widespread intra- and interspecific wild introgressions in all of the modern lines studied and on every chromosome (Table 2.1; Fig. A.3-19). The introgressions range in size from a single contig to several cM; e.g., HA 89 and Mammoth contain ~5 cM introgressions from wild H. annuus on linkage group five (Fig. A.3-19). A few of the introgressions appear to be common to many of the lines, such as one on linkage group 2 that appears to derive from H. argophyllus, the silverleaf sunflower. In this case, the signal of introgression originates from a single gene, a subunit of photosystem II (having homology to AT1G44575).
The majority of the introgressions identified appear to be from wild H. annuus. Wild H. annuus germplasm has been used for improvement of numerous traits; however, it is possible that some of these signals are from Native American landrace genotypes that were lost prior to the USDA collection trips done in the early 20th century and so could not be included in our diversity panel. In particular, no landrace material is available from peoples in the southeastern and south central US. Furthermore, several other species that have been used in sunflower improvement are not in our analysis; for example, H. tuberosus has been used extensively for decades (Korell et al., 1992; Seiler, 1992). Their absence may result in incorrect ancestry estimates and/or may correspond to regions with poor assignment to all groups.

Based on a conservative criterion of <0.05 landrace ancestry, we found that ~10% of the cultivated sunflower genome contains a signal of introgression in at least one of the modern lines sampled here. Overall, our analysis may underestimate the amount of interspecific introgression due to the use of an alignment-based approach, which may be biased against divergent alleles. With more thorough sampling of wild and cultivated germplasm, we expect that an even larger fraction of the genome will show evidence of introgression.

Aggressive selection for a crop-like phenotype is usually applied following the use of wild relatives in breeding. Such selection is expected to favour domestication alleles, leading to the prediction that domestication genes should be under-represented in successful introgressions. Although this has been observed in other crops (Hufford et al., 2013; Lin et al., 2014), we failed to find support for this prediction in the present study. This appears to be due to nuances of the history of sunflower breeding, including the intentional re-introduction of branching into the restorer line RHA 274, and not to a unique trait genetic architecture in sunflower.
We are currently gathering whole genome shotgun sequence data for a much larger number of cultivars and wild species, which will allow us to more rigorously test this prediction, as well as to provide a more comprehensive and unbiased list of domestication and improvement genes in sunflower.

Chapter 3

A genomic survey of wild Helianthus germplasm clarifies phylogenetic relationships and identifies population structure and interspecific gene flow.

3.1 Introduction

Capitalizing on the biodiversity maintained in the world's seed banks is expected to be a critical component of obtaining and maintaining food security into the future. Historically, wild progenitors and closely related congeners of modern crops have been an important source of high-value traits. These include, but are not limited to, increased yield, disease resistance, and tolerance to unfavourable abiotic conditions. In spite of these past successes, wild germplasm remains underutilized for two main reasons. First, high quality genotypic and evaluation data for wild germplasm accessions are typically unavailable. Second, incorporation of wild germplasm into high-yielding varieties is a high-risk and long-term project (McCouch et al., 2013). The availability of genetically characterized germplasm resources can reduce risks and timelines.

The availability of relatively low-cost high-throughput sequencing is making genomic surveys of crop wild relatives (CWRs) possible even for crops with modest genomic resources. Numerous DNA library preparation methods, such as RAD-seq (Baird et al., 2008) and GBS (Elshire et al., 2011), have been developed to leverage new sequencing technologies to genotype large numbers of samples in a cost-effective manner. Additionally, methods for analyzing these data are now available that do not require a pre-existing reference sequence (Lu et al., 2013; Catchen et al., 2011).
The resulting genotypic information can be used to assess phylogenetic relationships (Wagner et al., 2013), population structure and gene flow (Andrew et al., 2013). For a plant breeder, such information provides a basis for classifying accessions, establishing core collections (subsets of accessions designed to capture the greatest amount of genetic diversity) and detecting admixed populations.

As we have seen in Chapter 2, wild sunflowers have long been used in sunflower improvement. However, progress has been hampered by a lack of high-resolution characterization of genotypic diversity maintained in public seed banks. The genus contains circa 12 annual and 37 perennial species distributed across North America in habitats that range from open plains to salt marshes (Heiser et al., 1969). Reconstructing phylogenetic relationships among these species has been a formidable challenge, due to the group's recent origin (Schilling, 1997), high incidence of interspecific hybridization (Kane et al., 2009), and occurrence of multiple rounds of whole-genome duplication (Barker et al., 2008). These same factors have also made it difficult to resolve species boundaries among some Helianthus taxa. Although recent work suggests that next generation sequence data can be used to resolve some of these relationships (Bock et al., 2013), phylogenomic-scale information at the genus level is currently lacking.

Detailed genetic characterization is needed at the intraspecific level as well. The wild progenitor of the cultivated sunflower, H. annuus, occurs across much of North America (Kane et al., 2009) and is known to have a large effective population size (Strasburg et al., 2011). Previous work has identified some subpopulation structure within H. annuus, corresponding to the divergence of populations in California (Dorado et al., 1992) and Texas (Rieseberg et al., 1990). Other than these populations, little genetic structure has been reported previously within H. annuus (Mandel et al., 2011). The range of H. annuus overlaps with several other cross-compatible annual Helianthus species (Rogers et al., 1982). However, the extent of interspecific gene flow between wild H. annuus and its sympatric congeners is unclear.

Here, we use GBS to survey genome-wide genetic variation in ~290 accessions of wild Helianthus from 27 taxa. We use these data to reconstruct phylogenomic relationships among annual and perennial sunflowers. We then investigate patterns of genetic diversity and population structure within a geographically diverse panel of wild H. annuus accessions. Lastly, we use the ABBA:BABA (Green et al., 2010; Durand et al., 2011; Kulathinal et al., 2009) approach to quantify gene flow between H. annuus and its sympatric annual relatives across its range.

3.2 Materials and methods

The USDA gene bank contains more than 2000 accessions of wild annual Helianthus (Kane et al., 2012). I selected approximately 15% of these accessions for genetic analyses. This selection focused on covering the indigenous geographic range of H. annuus. In addition, I included representatives from each species of the annual clade of sunflowers, as well as several cross-compatible perennial species (Table B.1). Lastly, in order to survey species that were not available from the USDA at the time, for example H. winteri, I included some Rieseberg lab collections (Table B.1). These samples cover a large geographic area (Fig. 3.1). DNA was isolated from leaf tissue using a CTAB protocol (Doyle & Doyle, 1987). I then genotyped these samples using a modified GBS protocol (Elshire et al., 2011), in which a gel electrophoresis / isolation step is incorporated to exclude dimers amplified during the polymerase chain reaction (PCR). Ninety-six samples were sequenced per lane on an Illumina HiSeq 2000 using paired-end sequencing. The reads were demultiplexed using an in-house Perl script that also trims off adapter read-through.
Reads shorter than 50 bp following this trimming step were removed. The remaining reads were aligned to a genome assembly of H. annuus (version 0.2; Kane et al., 2011) using BWA (Li & Durbin, 2009). The SNPs were then called using GATK (DePristo et al., 2011).

For all analyses, SNPs were selected on the basis of passing the same three filtering criteria described in Chapter 2: minor allele frequency > 0.05, missing data < 20%, and observed heterozygosity < 0.5. A phylogenetic network was generated with Splitstree (Huson, 2005) using default settings (Fig. 3.4). From this network, a subset of samples was selected to (1) exclude the samples that were clear outliers with respect to their relationship with other samples from that taxon, and (2) homogenize the number of samples for each taxon. We also considered phenotypic and collection information when removing samples. PAUP (Swofford, 2001) was used to create neighbor-joining and parsimony trees with this refined data set, and 1,000 bootstraps were performed to evaluate the level of support for individual branches (Fig. B.1). The program fastSTRUCTURE was used to identify population structure within H. annuus (Fig. 3.6) (Raj et al., 2014). A Mantel test with 10,000 permutations was carried out using the R packages ade4 (Dray et al., 2007), SNPRelate (Zheng et al., 2012), and fossil (Vavrek, 2011) to determine whether there is a relationship between geographic and genetic distance in H. annuus.

To test for localized gene flow between H. annuus and other annual species, we ran ABBA:BABA tests (Green et al., 2010; Durand et al., 2011; Kulathinal et al., 2009) with H. annuus samples divided into sympatric and allopatric samples based on the distribution range of sympatric congeners. This test uses a simple phylogenetic framework to determine if a population's genetic composition has been influenced by interspecific gene flow. We divided H. petiolaris into its two subspecies, H. petiolaris subsp. fallax and H. petiolaris subsp. petiolaris, for all ABBA:BABA analyses. As the fastSTRUCTURE analysis of H. annuus revealed several geographically isolated population groupings, we also used the ABBA:BABA test to see if these structure groupings correlated with increased gene flow from a sympatric congener. For the ABBA:BABA analysis we filtered the SNP data separately for each comparison. We carried out this analysis with all of the annual species, except the known hybrid species (H. deserticola, H. paradoxus and H. anomalus). To ensure that we could accurately assess which allele was ancestral, we used a pooled sample of the nine perennial sunflower species as an outgroup and only used loci where no alternate alleles were found in the outgroup. Additionally, we selected only bi-allelic sites where each of the populations used had at least one representative sequence. Standard errors were calculated using a block jackknife bootstrap with the genomic contigs as the blocks. We also calculated fd (Martin et al., 2013) to estimate the proportion of the genome shared through introgression and determined standard errors in the same way as for the D-statistic.

Population genetic statistics were calculated for each species using the R package hierfstat (Goudet, 2004) and a second SNP set generated with a reference-free approach similar to the UNEAK method (Lu et al., 2013), but developed in-house. The rationale for this second approach was to avoid any biases caused by the H. annuus reference, as well as to more confidently detect heterozygous genotypes. Briefly, this reference-free method first trims all reads to 67 base pairs after the removal of the barcodes and the restriction enzyme site sequence. The trimmed tags are then compared to find single base pair mismatches. Following this, a strict network filter was applied to select only tags found in networks with one or two other tags.
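The tag comparison and network filter described above can be sketched as follows. This is an illustrative Python reconstruction, not the in-house pipeline: the function name, the toy four-base tags (real tags are 67 bp), and the use of connected components to define a "network" are all assumptions made for the example.

```python
from collections import defaultdict


def one_mismatch_networks(tags, max_size=3):
    """Group equal-length tags that differ by one base; keep small networks.

    A network here is a connected component of the graph whose edges join
    tags with exactly one mismatch. Networks of 2..max_size tags are kept,
    i.e. a tag together with one or two other tags.
    """
    tags = sorted(set(tags))
    parent = {t: t for t in tags}

    def find(t):
        # Union-find lookup with path halving.
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    # Tags sharing a (position, sequence-without-that-position) key differ
    # only at that position, so they are one mismatch apart.
    buckets = defaultdict(list)
    for t in tags:
        for i in range(len(t)):
            buckets[(i, t[:i] + t[i + 1:])].append(t)

    for group in buckets.values():
        for other in group[1:]:
            parent[find(other)] = find(group[0])

    networks = defaultdict(list)
    for t in tags:
        networks[find(t)].append(t)
    return sorted(n for n in networks.values() if 2 <= len(n) <= max_size)


# Hypothetical toy tags: one clean pair, one chain of four, one singleton.
toy_tags = ["AAAA", "AAAT", "CCCC", "CCCG", "CCGG", "CGGG", "GGTT"]
print(one_mismatch_networks(toy_tags))
```

In this toy input only the AAAA/AAAT pair survives: the chain of four tags forms a network that is too large, which is the kind of repetitive or paralogous signal a strict network filter is meant to discard, and the singleton has no partner and so is not polymorphic.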
Once acceptable networks were identified, each sample was genotyped based on the presence or absence of each tag. A sample only needed a single read to be scored for that allele. This data set was filtered for 50% missing data as it inherently filters for polymorphic sites. Additionally, species with fewer than five samples were removed.

Figure 3.1: Collection localities of samples used in this study, plotted by longitude and latitude. Coloured by broad taxonomic groups (H. annuus, annuus clade, hybrid species, perennial clade, petiolaris clade) for illustrative purposes.

3.3 Results

We sequenced 292 samples representing 27 taxa of Helianthus. Following de-multiplexing, seven of these samples were removed due to insufficient sequencing depth (Table B.1). A total of 3,891 SNPs passed our filtering criteria using the reference genome based approach, whereas 326 SNPs passed our quality thresholds using our more conservative 'UNEAK-like' approach. This latter data set was used for calculation of the basic population genetic statistics described below.

Widespread species such as H. annuus, H. petiolaris and H. divaricatus are highly variable genetically, as measured by expected heterozygosity (Hs), whereas the geographically restricted species H. argophyllus, H. exilis, and H. grosseserratus are less diverse (Fig. 3.2). Helianthus neglectus is an exception to this general rule in that it exhibits high diversity despite its limited geographic distribution. Earlier molecular genetic studies have made the same observation and have suggested that H. neglectus is part of the gene pool of H. petiolaris and might be more appropriately treated as a subspecies (Raduski et al., 2010). In the present study, with the large number of markers used, we can differentiate these two taxa.

Figure 3.2: Genetic diversity (Hs) of wild Helianthus species. Estimated using 326 SNPs genotyped using a de novo UNEAK-like approach.

Figure 3.3: The inbreeding coefficient (FIS) of wild Helianthus species. Estimated based on 326 SNPs genotyped using a de novo UNEAK-like approach.

Figure 3.4: Phylogenetic network of Helianthus germplasm. Red dots denote samples not adhering to labeled species; from the top these are samples annotated as H. petiolaris, H. maximiliani, H. annuus, and H. niveus. Scale bar: 0.01. Taxa and sample sizes: H. annuus (n = 101), H. anomalus (n = 8), H. argophyllus (n = 15), H. bolanderi (n = 6), H. debilis (n = 15), H. decapetalus (n = 6), H. deserticola (n = 14), H. divaricatus (n = 5), H. exilis (n = 11), H. giganteus (n = 5), H. grosseserratus (n = 6), H. hirsutus (n = 4), H. maximiliani (n = 11), H. neglectus (n = 18), H. niveus (n = 12), H. nuttallii (n = 3), H. paradoxus (n = 2), H. petiolaris subsp. petiolaris (n = 13), H. petiolaris subsp. fallax (n = 6), H. praecox (n = 7), H. tuberosus (n = 6), H. winteri (n = 4), H. strumosus (n = 1).

The Helianthus species that were included in this study are self-incompatible outcrossers. Thus, we expected that the inbreeding coefficient, FIS, would approach zero, as previously reported for species in the genus (e.g., Rieseberg et al., 1988). Indeed, this was what we observed for all species in the genus except for H. annuus and H. petiolaris (Fig. 3.3).
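For reference, the inbreeding coefficient compares observed heterozygosity with the heterozygosity expected under random mating, FIS = 1 - Ho/Hs. The sketch below is a simplified single-SNP illustration in Python with made-up genotype counts. It omits the small-sample corrections that dedicated packages typically apply, and the function name is hypothetical. It shows how pooling two differentiated groups, each in Hardy-Weinberg proportions, yields a positive FIS (the Wahlund effect).

```python
def f_is(genotypes):
    """FIS = 1 - Ho/He for one biallelic SNP.

    genotypes holds the alternate-allele count (0, 1 or 2) per individual.
    """
    n = len(genotypes)
    h_obs = sum(1 for g in genotypes if g == 1) / n  # observed heterozygosity
    p = sum(genotypes) / (2 * n)                     # alternate-allele frequency
    h_exp = 2 * p * (1 - p)                          # expected under random mating
    return 1 - h_obs / h_exp


# Two hypothetical groups of 100 individuals in Hardy-Weinberg proportions,
# with alternate-allele frequencies of 0.2 and 0.8 respectively.
group_a = [0] * 64 + [1] * 32 + [2] * 4
group_b = [0] * 4 + [1] * 32 + [2] * 64

for label, sample in [("A", group_a), ("B", group_b), ("pooled", group_a + group_b)]:
    print(label, f"{f_is(sample):+.3f}")
```

Each group on its own gives an FIS of essentially zero, whereas the pooled sample gives about 0.36 even though no inbreeding is present in either group.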
However, theapparent evidence of inbreeding in these species may be an artifact of combining samples fromthe distinct populations of H. annuus described below and from both H. petiolaris subspecies inour calculations. F IS does decrease when the taxa are divided into subspecies. Note, that whenwe calculated inbreeding coefficients using the larger SNP set from the reference-guided approach,393.3. ResultsH. annuusH. anomulusH. argophyllusH. bolanderiH. debilisH. decapetalusH. deserticolaH. divaricatusH. exilisH. giganteusH. grosseserratusH. hirsutusH. maximillianiH. neglectusH. niveusH. paradoxusH. petiolarisH. praecoxH. winteriH. tuberosusFigure 3.5: Reconstructed phylogeny of Helianthus species drawn using information from splitstree,parsimony and neighbour joining analyses. Each node shown is supported by at least 90% of thebootstrapped parsimony and neighbor joining trees generated.403.3. ResultsF IS was above zero for essentially all species, suggesting that heterozygotes are under-called inthe reference based SNP set.Phylogenetic relationships were reconstructed using two different subsets of the data. First,I generated a phylogenetic network based on 285 samples with adequate sequence coverage and3,891 SNPs that were called using our reference-guided approach (Fig. 3.4). I found that themajority of samples clustered with other samples of the same taxon as expected, but there wereseveral exceptions. These “outlier” samples typically correspond to accessions with collection notessuggestive of questionable origins. For example, the collection notes for a H. petiolaris samplefound in the H. annuus cluster states that “Helianthus annuus plants mixed with H. petiolaris onboth sides of road”. An outlier sample annotated as a H. maximiliani that also clusters with H.annuus was selected from a wild collection for being “bigger” than the rest of the H. 
maximiliani.For the remaining phylogenetic analyses, these outlier samples were excluded and five sampleswere selected from each species and subspecies when possible. These analyses were carried out us-ing 4,304 SNPs from the reference-guided SNP set. The topologies for phylogenetic trees/networksmade with Splitstree, parsimony, and neighbor-joining methods are essentially identical with mostbranches having strong bootstrap support (Fig. B.1). Most named taxa are supported as uniquelineages by these trees, with the exception of H. bolanderi/H.exilis, H. winteri/H. annuus, H.hirsutus/H. divaricatus and H. tuberosus (Fig. 3.5).Analyses of population structure within H. annuus were based on 101 samples from acrossthe geographic range of the species (Fig. 3.1) and 5,863 SNPs. The chooseK algorithm offastSTRUCTURE indicates that a model with four subpopulations within H. annuus best explainsthese data (Fig. 3.6). However, there is similar support for 5 or 6 subpopulations, and this numberis sensitive to changes in methodology, such as the exclusion of some samples. This populationstructure appears to be correlated with geography, suggesting that there may be some geographicstructure within H. annuus (Fig. 3.7). The California subpopulation appears distinct in eachmodel of 4 to 6 subpopulations. A Mantel’s test supports this relationship between geographyand genotype (r = 0.28, p = 9.999e-05). Testing for localized introgression between H. annuusand sympatric populations of other annual sunflowers detected a significant signal for H. bolanderi413.4. DiscussionSampleQ−valueFastStructurePopulation 1Population 2Population 3Population 4Figure 3.6: Assignment of H. annuus samples to fastSTRUCTURE populations. Each vertical bardenotes a sample.only; 6.0±1.5% of the genome was shown to be involved in introgression. All other comparisonswere did not reach our threshold for significance (Fig. 3.8). Introgression between H. bolanderiand California H. 
annuus was also supported by the STRUCTURE-based classification of the H. annuus samples (data not shown).

3.4 Discussion

Genomic data from a panel of wild Helianthus samples provide insights into the nature of these germplasm collections, as well as a broader view of the relationships among the taxa surveyed. Analysis of circa 290 samples genotyped for thousands of markers with the GBS approach revealed several instances where genotypic and collection information support each other in highlighting specific collections that appear to have been misidentified. For example, a H. petiolaris sample collected near a population of H. annuus was found to cluster genetically with H. annuus (Fig. 3.4). Another example involves several accessions that were originally identified as H. bolanderi but appear genotypically to be H. annuus. Upon further investigation these samples were found to resemble H. annuus phenotypically as well (data not shown). This information has already been incorporated into the USDA germplasm information database and the collections have been re-classified (Table B.1). Although the sampling described here is far from comprehensive, even for the subset of taxa investigated, it does demonstrate the power of using genomic tools for addressing practical issues concerning germplasm curation.

Figure 3.7: Geographic location of H. annuus colored by fastSTRUCTURE population assignment. Each sample is colored according to its majority Q-value. [Axis labels: Longitude, Latitude; legend: Populations 1 to 4.]

The population statistics generated here may also be useful in prioritizing future collection efforts, especially as sampling and technical improvements are made in genotyping efforts.
While the amount of genetic diversity in each of the sampled species generally appears to be in line with their effective population sizes, as estimated in Strasburg et al. (2011) (Fig. 3.2), levels of genetic diversity found in H. grosseserratus are exceptionally low and suggest a need for new collection efforts.

All of the phylogenetic trees generated with these GBS data agree in topology and are largely congruent with previously published molecular phylogenies (Timme et al., 2007; Rieseberg et al., 1991). As found in the previous studies, the annual sunflowers occur as a monophyletic lineage that is distinct from the perennial clade. In addition, two main lineages are recovered within the annual clade: one corresponds to H. annuus and allied species, and the other to H. petiolaris and its relatives (Fig. 3.5). However, unlike the ribosomal DNA-based phylogeny of Timme et al. (2007), I find that H. petiolaris shares a more recent common ancestor with H. debilis than with H. niveus (Fig. 3.5, B.1).

Figure 3.8: Interspecific gene flow between sympatric H. annuus and local annual Helianthus determined by ABBA:BABA tests. The D-values indicate the proportion of the genome involved in introgression. The red dot indicates contrasts with a significant difference from zero (p<0.05); error bars indicate standard error. [Axis labels: Sympatric Species (H. argophyllus, H. bolanderi, H. debilis, H. petiolaris, H. praecox), D-value.]

With a few notable exceptions, the named taxa are well supported as independent lineages (Fig. B.1) and can be represented in a complete phylogeny for all of the annual species (Fig. 3.5). The hybrid species, H. paradoxus, is found in an expected polytomy between the clades of its parental species. The other two hybrid species, H. deserticola and H. anomalus, are placed within the H. petiolaris clade.
Consistent with this placement, a recent transcriptomic analysis has found that these hybrid species have unequal parentage and owe the majority of their gene space to H. petiolaris (G. Owens, personal communication, April 10 2015). Despite considerable genetic distance from the reference sequence, and variation in ploidy, the perennial samples are well resolved and most taxa are supported. There is no differentiation between the autopolyploid H. hirsutus and its diploid progenitor, H. divaricatus. As expected, the placement of H. tuberosus, an autoallohexaploid, is unresolved but falls between its tetraploid and diploid parents, H. hirsutus and H. grosseserratus (Bock et al., 2013). Thus, while I did not specifically attempt to address genotyping problems that might arise from polyploidy, I was able to correctly reconstruct the relationships of the polyploid taxa (Bock et al., 2013).

The progenitor of the cultivated sunflower, H. annuus, contains a striking amount of diversity (Fig. 3.2). Although the species was thought to be largely homogeneous across its range (Mandel et al., 2011), I found distinct subpopulations within H. annuus (Fig. 3.6). The number of subpopulations identified in various iterations of the fastSTRUCTURE analysis presented here, or in earlier STRUCTURE analyses (not shown), was somewhat volatile. Nonetheless, between 4 and 6 subpopulations were always supported and the California population was distinct in all analyses (Fig. 3.7). With k = 6 a subpopulation in Texas was also found that likely corresponds to H. a. texanus, but power to detect this subspecies may have been limited by inadequate sampling from Texas. Physical and genetic distances within H. annuus are correlated (Mantel test, r = 0.28, p = 9.999e-05), suggesting that the observed population structure is at least partially due to isolation by distance. However, selection or interspecific gene flow (introgression) could also be playing roles.
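Interspecific gene flow of this kind is what the ABBA:BABA tests summarized in Fig. 3.8 are designed to detect. As a minimal sketch, assuming the commonly used frequency-based form of Patterson's D and a hypothetical population assignment (P1 and P2 as two H. annuus groups, P3 as the sympatric annual species, plus an outgroup), the statistic can be computed from derived-allele frequencies as shown below. The function name and input layout are illustrative assumptions and are not the analysis code used in this chapter.

```python
from typing import Iterable, Tuple

# One site = derived-allele frequencies in (P1, P2, P3, outgroup).
Site = Tuple[float, float, float, float]


def patterson_d(sites: Iterable[Site]) -> float:
    """Return Patterson's D = (sum ABBA - sum BABA) / (sum ABBA + sum BABA).

    Hypothetical helper for illustration only.
    """
    abba = 0.0
    baba = 0.0
    for p1, p2, p3, p_out in sites:
        # ABBA: derived allele shared by P2 and P3 but absent from P1.
        abba += (1 - p1) * p2 * p3 * (1 - p_out)
        # BABA: derived allele shared by P1 and P3 but absent from P2.
        baba += p1 * (1 - p2) * p3 * (1 - p_out)
    total = abba + baba
    if total == 0:
        raise ValueError("no informative ABBA or BABA sites")
    return (abba - baba) / total


# Toy input: two ABBA sites and one BABA site (made-up values).
example_sites = [(0.0, 1.0, 1.0, 0.0), (0.0, 1.0, 1.0, 0.0), (1.0, 0.0, 1.0, 0.0)]
print(patterson_d(example_sites))
```

A positive D indicates an excess of derived alleles shared between P2 and P3. Standard errors for D are typically obtained with a block jackknife over genomic regions, which this sketch omits.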
Using the ABBA:BABA approach we found strong evidence for substantial introgression between H. annuus from California and H. bolanderi (Fig. 3.8). Although this test is not directional, the strong signal of population structure in H. annuus may partly be a consequence of migration of H. bolanderi alleles into H. annuus. We were not able to detect interspecific gene flow between H. annuus and H. petiolaris despite previous reports of rampant introgression (Strasburg & Rieseberg, 2008; Rieseberg et al., 1999, 1998). Possibly the long history of sympatry between these species, followed by recent range expansion by H. annuus, has reduced the power of the ABBA:BABA approach. That is, allopatric H. annuus populations might have inherited alleles from ancestral gene flow events that would not be detectable with this method, highlighting a potential pitfall of the ABBA:BABA test.

Understanding patterns of introgression in the wild could be of use to crop improvement programs. For example, introgressions associated with the colonization of new habitats could be adaptive and contain alleles of interest. Conversely, genomic regions that resist introgression could harbor locally adapted alleles and thus represent potential targets for further investigation. Admixed wild genotypes could also be used as a bridge to help breeders access a larger amount of genetic diversity with less effort. For example, it may be possible to introgress H. bolanderi alleles into cultivated H. annuus via wild H. annuus from California without incurring the cost of reduced hybrid fertility. In another example, modern cultivars of H. annuus appear to contain alleles from the Texas subpopulation of H. annuus, H. a. texanus (Chapter 2). This subspecies is thought to have formed via introgression with a Texas endemic, H. debilis subsp. cucumerifolius, so modern cultivars of sunflowers may already contain interspecific introgressions of H. debilis alleles captured by H. annuus in the wild.
It is not uncommon for breeders to employ such a bridge process to move alleles into cultivars of interest (Jansky & Hamernik, 2009). Using genomic data to identify introgression in the wild could allow breeders to access more distant wild relatives more rapidly and efficiently.

Chapter 4

New pre-bred lines have desirable agronomic traits and give insight into the permeability of the modern sunflower genome

4.1 Introduction

The world is faced with a growing population and a less certain climate. Droughts and heat waves cause massive yield loss and threaten food security (Beddington, 2009). Our staple crops must be improved to face this challenge. A new global initiative coordinated by the Global Crop Diversity Trust is underway to improve 26 priority food security crops by harnessing the genetic diversity of their wild relatives (Dempewolf et al., 2014). Sunflower is one of the crops targeted by this initiative and, along with rice, is the focus of pilot pre-breeding projects that are intended to provide guidelines and advice for subsequent projects involving the other crop species. Sunflower is an attractive crop for this project because its wild relatives are adapted to a diverse array of environments and likely harbour numerous alleles for abiotic and biotic tolerances. Additionally, wild Helianthus germplasm is relatively well collected and introgression with many of the wild species is straightforward compared to some other crops.

The amount of genetic diversity available in the wild relatives of crop plants is immense, far greater than what can be practically evaluated for use in food production. Also, because of life history and architectural differences (e.g.
branching), it can be difficult to judge the potential usefulness of a particular wild accession without moving its alleles into a cultivated background. Thus, multiple generations of crosses between wild and domesticated plants, combined with extensive evaluations under different environmental conditions, are typically required to determine which CWRs can provide useful alleles for a given trait. Breeders typically have limited resources, so the need for pre-breeding represents a significant impediment to the use of wild germplasm. Additionally, practical issues such as variation in flowering time and reduced fertility limit use of CWRs. For these reasons, “pre-breeding” has been added to the crop improvement continuum. Pre-breeding attempts to centralize these first difficult generations of CWR use, with the goal of releasing pre-bred lines to plant breeders rather than finished lines to growers.

As can be seen from our analysis of the genomes of elite lines (Chapter 2), wild sunflowers have already proven to be useful in breeding programs. These earlier endeavors, which involved both public and private breeding programs, largely focused on qualitative traits, especially disease resistance traits, with an estimated contribution of $269.5 million per year to the US sunflower industry alone (Phillips & Meelleur, 1998). As part of previous public efforts to make wild germplasm more accessible to sunflower breeders, a number of pre-bred lines were developed and made publicly available (Seiler & Jan, 1997; Feng et al., 2006; Seiler, 1991b, 1993, 2000; Jan & Chandler, 1988; Jan & Vick, 2006; Jan et al., 2004; Stelkens & Seehausen, 2009). These lines are a valuable resource, but lack genotypic and comprehensive evaluation data. In this chapter I describe a series of crossing experiments, phenotypic evaluations, and genotypic analyses, with the goal of making wild sunflower germplasm resources more accessible and useful to breeders.
Specifically, I developed 426 new pre-bred lines and gathered phenotypic and genotypic data for 318 and 364 of these lines, respectively, in addition to phenotypic and genotypic data for 55 pre-bred lines previously developed by the USDA.

4.2 Materials and methods

4.2.1 Building the pre-bred lines

From the wild samples sequenced in Chapter 3, a subset of 28 accessions was selected for backcrossing into cultivated H. annuus (Table C.2). Accessions were selected to represent as many species as possible and, for H. annuus, each of the subpopulations identified in an earlier analysis of the data described in Chapter 3. I also attempted to maximize coverage of the geographic range of wild H. annuus (Fig. 4.1). Pollen from each wild individual was used to fertilize a cytoplasmic male sterile (CMS) HA89 plant at the UBC Farm in the summer of 2011. The F1 seeds resulting from these crosses were germinated and grown in the greenhouse during the winter of 2011-2012 for backcrossing into either CMS HA89, or HA89 if the F1 plant lacked pollen. A row of 15 plants derived from a single BC1 plant was propagated in the summer of 2012 at the UBC Botanical Gardens nursery for a second round of backcrossing. Restorer alleles are frequent in the wild, and many of the F1 and BC1 plants had full or partial pollen fertility as expected. For those that did not, a restorer allele was introduced using RHA391, which is the most closely related restorer line available for HA89.

Figure 4.1: Geographic locations of the collection sites of the wild donors used in the creation of the pre-bred lines developed at UBC. Germplasm donors were selected from a subset of the lines genotyped in Chapter 3. [Axis labels: Longitude, Latitude; legend (Use): Genotyping only, Germplasm donor.]
The lines were then subjected to two rounds of self-pollination, first in the greenhouse during the winter of 2012-2013, then again at Totem Field in 2013. Seed from 426 pre-bred lines was then sent to a commercial nursery in Chile to bulk up sufficient quantities of seed for evaluation.

4.2.2 Phenotypic evaluation

From the currently available public pre-bred material, 55 lines containing intraspecific and interspecific introgressions (mostly in an HA89 background) were obtained from the USDA (Table C.1). These efforts involved many perennial species, as well as several wild annual species (Seiler & Jan, 1997; Feng et al., 2006; Seiler, 1991b, 1993, 2000; Jan & Chandler, 1988; Jan & Vick, 2006; Jan et al., 2004; Stelkens & Seehausen, 2009). The USDA pre-bred lines, along with HA89 as a control, were grown at the UBC Farm during the summer of 2012 using a randomized complete block design, with one individual from each genotype in each block, and 15 blocks in total. They were phenotyped for the following traits: stem density, weight per seed, number of days to flower, number of branches in the upper and lower half of the plant, number of flower heads, height, head diameter, stem width, number of leaves on the main stem, number of leaves on the side branches, and the angle of flowering heads.

Figure 4.2: Evaluation locations for pre-bred lines in Uganda. Image from Google Earth.

Of the pre-bred lines developed at UBC, 318 lines with adequate seed were evaluated by collaborators at NARO, Uganda. Evaluations took place in three locations in Uganda: the main arid crop research center, NaSARRI; a location in the center of their sunflower growing region, Ngetta; and a dry northern location, Kitgum (Fig. 4.2). An alpha lattice design (15 x 22) with three replicates was used to evaluate the 318 lines, along with 12 released varieties as control lines.
Days to first flower, days to 50% flowering and days to maturity were recorded for lines at the NaSARRI location. Ordinal measurements were made at all sites for Alternaria prevalence, leaf crinkle prevalence, branching type, branching length, petiole length, petiole color, stem color, disc color, stem pubescence, Sclerotinia wilt prevalence, Sclerotinia leaf rot prevalence, Sclerotinia stem rot prevalence, number of leaves, plant height, maturity and drought tolerance. For each line, Best Linear Unbiased Predictors (BLUPs) were estimated using the lme4 package in R, with line and block as random effects and location as a fixed effect.

Figure 4.3: Agronomic traits of previously pre-bred lines. Height (in cm) and days to first mature floret presented as BLUPs from evaluation at UBC in 2012. The HA89 control line contains no wild introgressions. [Axis labels: DaysToFlower, Height; legends: Wild Donor (species), HeadDiam.]

4.2.3 Genotyping of the pre-bred lines

I extracted DNA from leaf tissue of the 55 USDA lines, plus HA89, using a Qiagen DNeasy 96 kit. DNA from the UBC-generated lines was extracted by collaborators at SOLTIS using a similar column-based DNA extraction protocol. After normalizing DNA concentration levels, I prepared GBS libraries as described in Chapter 3 but with some modifications. I used a second restriction enzyme, MspI, and a second set of barcoded adapters. This second set of adapters ligates to the cut site created by the MspI enzyme and facilitates 2D multiplexing; in our case up to ~2300 samples may be sequenced per lane. Using this two-enzyme method and the additional barcodes,
the pre-bred lines were sequenced with ~250 samples per lane on an Illumina HiSeq 2000 using 115bp paired-end sequencing with an average insert size of 300bp. One to five individuals were sequenced per line.

The sequence reads for the pre-bred lines, nine HA89 samples and the wild samples described in Chapter 3 were demultiplexed and then filtered for adapter read-through contamination; cases where the forward and reverse reads were not the same length were removed. For each sample, the reads were trimmed with Trimmomatic (Bolger et al., 2014) using a window size of 4bp and a required quality of 15. Trimmed reads had to be at least 36 base pairs to be retained. Both paired and orphaned reads were aligned using the mem algorithm of bwa version 0.7.12 to a newer reference sequence of the genome (HA412 version 1.1). The alignments were then merged and cleaned using Picard tools. Samples that had BAM files smaller than 1 megabyte were removed. Samtools and bcftools version 1.2 were used to call SNPs using mpileup (Li et al., 2009). The VCF was filtered to remove sites with an observed heterozygosity of > 0.5 and sites with < 50% of the samples scored.

To determine the genomic locations of wild introgressions in the pre-bred lines, I used a window-based permutation analysis that was implemented with a custom Perl script. Loci were selected for this analysis based on two criteria. First, they must be monomorphic in the cultivated parent (HA89) samples that I tested. Second, a wild allele must be present at that locus; the presence of a wild allele was inferred from the wild donor genotype and/or from the presence of an alternative allele at the locus in the pre-bred lines themselves. In the latter case, alternative alleles with a frequency of > 0.05 and with a frequency difference of 0.5 relative to the cultivar allele were considered to be wild alleles. Then, for each sample, the numbers of wild and cultivar alleles were scored in 10 Mb windows.
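The window scoring and permutation comparison can be pictured with a short sketch. It is written in Python rather than the custom Perl script used for the analysis, and it assumes alleles coded as 1 (wild) or 0 (cultivar), random draws with replacement from the pooled alleles of all pre-bred lines in the window, and a hypothetical function name. None of these details are taken from the original script; only the 1,000 permutations and the 99% cut-off come from the text.

```python
import random
from typing import Sequence


def window_is_wild(sample_alleles: Sequence[int],
                   pooled_alleles: Sequence[int],
                   n_perm: int = 1000,
                   threshold: float = 0.99,
                   seed: int = 0) -> bool:
    """Score one 10 Mb window of one sample as wild (True) or cultivated (False).

    sample_alleles: alleles scored in this sample and window (1 = wild, 0 = cultivar).
    pooled_alleles: alleles from all pre-bred lines in the same window, same coding.
    Illustrative helper only; sampling with replacement is an assumption.
    """
    n = len(sample_alleles)
    if n == 0 or len(pooled_alleles) == 0:
        return False  # nothing to score in this window
    rng = random.Random(seed)
    observed = sum(sample_alleles)
    exceeded = 0
    for _ in range(n_perm):
        # Draw as many alleles as were observed, at random from the pooled population.
        permuted = sum(rng.choice(pooled_alleles) for _ in range(n))
        if observed > permuted:
            exceeded += 1
    # Wild only if the sample carries more wild alleles than 99% of the random draws.
    return exceeded / n_perm >= threshold


# Toy window (made-up values): 10% wild alleles in the pooled population.
pooled = [1] * 10 + [0] * 90
print(window_is_wild([1] * 20, pooled), window_is_wild([0] * 20, pooled))
```

Fixing the random seed makes the call reproducible. The per-line consensus across replicate individuals would be a separate step applied to these window calls.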
For each window, the observed count was compared to 1,000 permutations of randomly selected alleles from the full population of pre-bred lines in that window. To be scored as a wild introgression, a window had to contain more wild alleles than 99% of the random samples. The replicates for each line were then combined into a consensus score for that line using the best-supported allele and scoring ties as wild. The glm function in R was used to model the importance of family, chromosome, position and wild species using a binomial distribution. Likelihood ratio tests were used to compare different models. Additionally, reshape2 (Wickham, 2007) and ggplot2 (Wickham, 2009) were used for data manipulation and plotting.

Figure 4.4: Evaluation of pre-bred lines at Kitgum, Uganda. The potentially drought tolerant line on the left has been selected for further characterization (credit: W. Anyanga).

4.3 Results

The wild donors employed for pre-breeding were taxonomically, genetically, and phenotypically diverse (Table C.2, Fig. 4.1), which created a number of challenges for line development. These included (1) the extreme variance in flowering time among lines, which made it difficult to correctly time the planting of such a large number of plants; (2) the failure of many crosses to produce viable seed despite the copious application of pollen; and (3) the failure of many lines to self-pollinate. There was considerable variability in the expression of hybrid incompatibilities and self-pollination. This, combined with unforeseeable field and greenhouse issues (e.g., herbivory by squirrels, fungal infections, etc.), makes it difficult to quantify the effects of genetic incompatibilities on seed production versus that of other factors.
Nonetheless, 426 lines, representing 28 original wild donors and 11 species, were sent to the nursery in Chile, from which I recovered enough seed to evaluate 318 lines with NARO in Uganda.

Evaluation of the USDA lines at UBC identified several lines with promising agronomic characteristics (Fig. 4.3, C.1). Head diameter (our proxy for yield) varied considerably across the lines, with several pre-bred lines having much larger heads than the common elite parent, HA89. Contrary to our expectations, there is no significant correlation between head diameter and height (Pearson correlation, p = 0.38), indicating there may not necessarily be a trade-off between these traits. There are several lines that do have the desired phenotype of short stature, early flowering, and large head size (Fig. 4.3). However, low seed weight may be an issue for many of these lines; despite having large heads, most lines have smaller seeds than HA89 (Fig. C.1). The majority of the pre-bred lines had few or no branches. However, we did observe some branching in HA89 (Fig. C.1) and the UBC environment has been observed to affect branching number and architecture in other populations (Nambeesan et al., 2015). Thus, we suspect that an even smaller fraction of the pre-bred lines would exhibit branching in other environments. Additionally, the whole population flowered nearly a month later than expected (Fig. C.2), possibly because of the cool spring that is typical in Vancouver. Such late flowering has been observed in subsequent nursery experiments.

The evaluations in Uganda experienced several challenges, including a severe drought and heavy predation by variegated grasshoppers and termites at the Kitgum evaluation site (A. Walter, personal communication, April 2 2015). Based on performance across the three sites, 96 lines were selected for test crossing, further evaluation and possible incorporation into the sunflower breeding program at NARO. These lines still segregate for male sterility and branching.
Thus, additional generations of selfing, or of selection within lines, may be required prior to test crossing and further evaluation. The number of days to 50% flowering at the NaSARRI site shows that wild species donor and family significantly influence flowering time (Fig. 4.5) (ANOVA, p = 9.2e-10 and p = 3.15e-10 respectively). These lines also have considerable variation for disease resistance (Fig. 4.6). We were unable to obtain data for yield, and some of the ordinal measurements made by NARO are unsuitable for further analyses.

Figure 4.5: BLUPs for days to 50% flowering for pre-bred lines developed at UBC and evaluated at NaSARRI. Checks include 14 released varieties. Each family corresponds to lines derived from one wild donor. [Axis labels: Family, Maturity.]

A total of 1,337 samples from the UBC pre-bred lines and 89 samples from the USDA pre-bred lines were submitted for sequencing. Unfortunately, technical problems during sequencing resulted in poor quality scores and high rates of uncalled bases in the second half of the second read for each paired-end read. This did not affect the second inline barcode, so reads could be successfully demultiplexed. Following alignment, and after samples with insufficient data were removed, 1,261 samples representing 339 UBC pre-bred lines, 89 samples representing 55 USDA pre-bred lines, 9 samples of HA89 and 116 wild parental samples remained. After SNP calling and filtering, 78,813 SNPs were recovered. On average 142 SNPs were used to score a window as either cultivated or wild for each pre-bred line.

Introgression was found throughout the genome of each of these pre-bred lines (Fig. 4.7). On average, across all of the lines, 15.89% of the genome was introgressed from a wild donor.
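For context, the wild fraction expected under neutral Mendelian inheritance can be worked out directly: the F1 carries half of the donor genome, each backcross to the cultivated parent halves that share, and selfing leaves the expected share unchanged. A minimal sketch, assuming no selection, ignoring the extra cross to RHA391 used for some lines, and using a hypothetical helper name:

```python
def expected_donor_fraction(n_backcrosses: int) -> float:
    """Expected donor (wild) genome fraction after an F1 plus n backcrosses.

    Assumes neutral inheritance; subsequent selfing does not change the expectation.
    Hypothetical helper for illustration only.
    """
    if n_backcrosses < 0:
        raise ValueError("n_backcrosses must be non-negative")
    return 0.5 ** (n_backcrosses + 1)


for generation, n in (("F1", 0), ("BC1", 1), ("BC2", 2)):
    print(generation, expected_donor_fraction(n))
```

With two backcrosses the expectation is 12.5%, which is the natural point of comparison for the 15.89% observed here.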
Numerically there are more small introgressions; however, many lines contain large wild chromosome segments and some introgressions span entire chromosomes (Fig. 4.8). Based on likelihood ratio tests, models containing genetic position and an interaction between family and chromosome fit the observed patterns of introgression better than a model with only wild donor and chromosome (chi-squared, p < 2.2e-16). Additionally, the identity of the wild donor species appears to affect the amount of introgression observed across the different chromosomes (chi-squared, p < 2.2e-16).

Figure 4.6: BLUPs for disease prevalence of pre-bred lines developed at UBC based on evaluations across three locations in Uganda. [Axis labels: Alternaria, Leaf Crinkle; legend: Wild Donor (species) and Checks.]

The amount of introgression varies across the genome. For example, no introgression was observed in a 10 Mb region on chromosome 5 across all UBC and USDA lines, and introgression was greatly reduced across parts of chromosome 17. Other regions have experienced introgression in a large number of lines. For example, several windows spanning 30 Mb on chromosome 9 appear to have been introgressed in over 30% of the lines and a 10 Mb subset of this region appears to contain wild alleles in 33% of the UBC-generated lines (Fig. 4.9).

4.4 Discussion

Easily accessible wild germplasm may be critical to future crop improvement, especially in the context of an increasingly variable climate.
Although considerable wild genetic diversity is housed in the world’s gene banks, numerous barriers limit its use in breeding. Pre-breeding programs attempt to reduce these barriers by centralizing this early stage of the breeding process. In developing pre-bred sunflower lines I did indeed encounter many of the issues that hinder the use of wild germplasm in breeding programs. Variation in flowering time, hybrid incompatibilities, and the absence of domestication traits such as self-compatibility resulted in unpredictable and poor nursery outcomes for many crosses, as well as in low seed production. Largely unforeseeable circumstances, ranging from equipment failures to road closures to herbivory, further hindered line development and evaluation. Nonetheless, this work has considerably increased the number of publicly available pre-bred lines for sunflower. Sunflower breeders interested in this wild germplasm can now access it without repeating the several years of pre-breeding required to move wild alleles into a background suitable for evaluation. In addition, I have comprehensively genotyped these lines, so that sunflower breeders can avoid lines with large or otherwise undesirable introgressions.

Evaluation of previously developed pre-bred lines revealed extensive phenotypic diversity. Although only the common cultivated parent, HA89, was included in this evaluation, and not the wild donors, it is certain that many traits are transgressive because the wild parents are known to have much lower values (e.g., head diameter is known to be much smaller in wild genotypes) (Fig. 4.3). The pre-bred lines I developed also display a wide range of variation in key agronomic traits such as flowering time (Fig. 4.5). Although each of the lines contains different introgressions, both wild donor genotype and wild donor species significantly affect flowering times.
This finding implies that flowering time is highly polygenic in sunflower and that it might be possible to combine desired traits from different pre-bred lines to achieve even more extreme phenotypes.

Many of the pre-bred lines exhibit promising abiotic and biotic tolerance characteristics. Drought stress at one of the evaluation sites was severe enough to greatly impact yield potential in some of the lines (Fig. 4.4). However, several lines appeared to be highly drought tolerant and represent possible sources for drought resistance alleles. Other lines were observed to have less damage from Alternaria, a fungal pathogen, as well as from a virus which causes leaf crinkle, than the common released varieties used as controls throughout the evaluations (Fig. 4.6). Some of these lines will be incorporated into the sunflower breeding program at NARO and will be made available to other interested research groups.

Using GBS I identified the genomic locations of the wild introgressions in these pre-bred lines (Fig. 4.7). Although the pre-bred lines are not well suited for QTL or association mapping, the genotypic information is a useful resource for plant breeders and can give insights into the genetic processes that occur during pre-breeding. The amount of introgression roughly meets our expectations for a BC2 population (~16% vs. an expected 12.5%). This small discrepancy may be explained by recombination events within a window resulting in the whole 10 Mb window being scored as wild. More stringent criteria for calling a window as wild may also be needed. Some of the introgressions are large, spanning considerable portions of chromosomes or in some cases whole chromosomes (Fig. 4.8). Chromosome segment substitution lines (CSSLs) have become an important genetic resource in some crop communities, such as rice (Xi et al., 2006; Ebitani et al., 2005) and peanut (Fonceka et al., 2012). These CSSLs each carry a different wild chromosome or chromosome segment.
It may be possible to rapidly create a set of CSSLs for sunflower by selecting a subset of these pre-bred lines, backcrossing them again to HA89 and using marker-assisted selection to capture the desired segments or chromosomes.

There is considerable variation in the amount of introgression that occurred between families and across the different chromosomes. There are significant interactions between chromosome and wild donor for many of the combinations, but sample size and population structure are complicating factors in the interpretation of these results. Lines derived from one wild donor, ann029, appear to have experienced much more introgression than the other families. This wild sample appears to be genotypically H. annuus (Chapter 3), and the collections were made near both cultivated H. annuus and a population of H. pauciflorus. It is not clear whether introgression of cultivated or perennial wild alleles into the wild parent would affect the amount of introgression that occurred during pre-breeding. The identity of the parental species is a significant factor in determining the profile of introgression in these lines, but it is difficult to draw conclusions on, for example, the importance of chromosomal rearrangements in restricting introgression, due to the small number of lines derived from some of the interspecific crosses. Although it may be possible to address these questions using seed from earlier generations of these populations, work may continue to be hindered by poor seed set and viability.

Several regions of the genome appear to have experienced more introgression than expected (Fig. 4.9). Three of the five most highly introgressed regions are found on chromosome 9 (Fig. 4.9) even though this chromosome contains numerous domestication related QTL (Wills & Burke, 2007) and candidate genes (Chapter 2). This pattern might be a consequence of unintentional selection at UBC since it is not seen in the USDA lines (Fig. 4.9).
There was considerable Sclerotinia pressure in the field, and mildew in the greenhouse, at UBC, which might account for this pattern. Whether such unintended selection is responsible for this pattern, and the ultimate utility of these lines, may become more apparent with further evaluation.

Only one region, found on chromosome 5, is monomorphic for the cultivated allele in all UBC and USDA pre-bred lines. This chromosome contains QTL for seed size (Tang et al., 2006) as well as several domestication genes (Chapter 2); however, it is not clear how such traits/genes could result in the observed dominance of the cultivated allele. Meiotic drive represents a possible explanation for this observation. Exploring the diversity in this region, for example, with the sunflower association mapping population (Mandel et al., 2013), could further illuminate this possibility. Chromosome 17 as a whole, and several specific regions within it, has experienced a lower rate of wild introgression than the rest of the genome. Chromosome 17 is known to contain the self-incompatibility (SI) locus (Burke et al., 2002) and several candidate domestication genes (Chapter 2). The cultivated allele confers self-compatibility, and it would have been strongly favoured during the multiple rounds of selfing these populations experienced. Valuable wild alleles linked to the SI locus may have been lost in this process. It might be possible in the future to select lines containing rare recombination events adjacent to the SI locus, or to evaluate lines prior to selfing, in order to capture that wild diversity.

Figure 4.7: Genetic position of introgressions in pre-bred lines developed at UBC based on GBS. Permutation tests were used to determine if each 10 Mb window along the genome is derived from a cultivated parent (in grey) or one of the wild donors. [Panels: Ha1 to Ha17; x-axis: Position; y-axis: Line; colour key: Wild Donor (ann011, ann025, ann029, ann031, ann041, ann049, ann051, ann098, ann100, ann103, ann106, ann118, ann121, ann125, ann132, ano061, arg019, bol023, deb012, exi077, exi083, neg086, neg154, niv097, pet091, pra096, win038, win039).]

Figure 4.8: The length of wild introgressions in pre-bred lines determined by GBS. Red bars indicate the total length of each chromosome, rounded up to the nearest 10 Mb. [Panels: Ha1 to Ha17; x-axis: Length (MB); y-axis: Count.]

Figure 4.9: Number of lines with introgression from a wild donor for each region of the genome determined using GBS. Genomic positions have been re-ordered by the amount of introgression observed in the UBC developed lines. [Panels: UBC, USDA; x-axis: Genetic position (re-ordered); y-axis: Number of Lines; key: Wild, Cultivar.]

Chapter 5

The genomic profile of a new hybrid crop: 40 years of sunflower breeding

5.1 Introduction

Heterosis, or hybrid vigour, is a critical component of modern sunflower production. As with other hybrid crops, yield gains from heterosis are considerable (up to 30% in sunflower (Fick & Swallers, 1972)). Over the past half century, sunflower breeders have developed heterotic groups to maximize heterosis. These heterotic groups are composed of individual lines that, when crossed within a group, do not generate large heterotic effects, but when crossed between groups yield pronounced hybrid vigour.
Breeders rely on these heterotic gene pools to predictably produce high yielding hybrids.

The boost in yield from heterosis, combined with legal and biological protection provided by F1 seeds (i.e., F1 seeds do not breed true), has resulted in a large hybrid seed production industry. Sunflower is the second most valuable hybrid crop, with a global seed market of circa 1 billion USD annually. In addition, maize, tomato, cotton, canola and sorghum are all primarily grown as hybrid crops. It is noteworthy that heterosis may not be exploitable in all crops. For example, despite considerable investment into its development, hybrid wheat may not experience commercial success in the near future because of the absence of an efficient hybrid seed production system as well as the genetic architecture of key traits (Whitford et al., 2013). Given potential yield increases from heterosis, and the associated success of the hybrid seed industry, there is considerable incentive to investigate the mechanisms of heterosis and to maximize yield gains in hybrids.

Three general genetic models have been put forward to account for heterosis: dominance, overdominance and epistasis. In the dominance model of heterosis, the enhanced performance of hybrids is thought to result from genetic complementation (i.e., the masking of deleterious recessive alleles from one parent by dominant alleles from the other parent). The overdominance model posits that increased hybrid vigour is due to favourable interactions of alleles from different lineages at a single locus. The third mechanism, epistasis, presumes that the superior performance of hybrids results from beneficial interactions of parental alleles at different loci. In addition to considerable empirical support for each of these genetic models (see below), an increasing number of studies have characterized the underlying molecular mechanisms (Chen, 2013), which are largely consistent with one or more of the genetic models described above.
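The three models can be made concrete with a toy two-locus cross between two inbred parents. The following Python sketch is an illustration only: the scoring functions, the genotype encoding (upper case for a favourable dominant allele) and the unit effect sizes are assumptions introduced here, not quantities estimated in this thesis.

```python
# Toy genotypes: one (allele, allele) pair per locus.
# Upper case marks a favourable dominant allele (hypothetical encoding).
parent_1 = [("A", "A"), ("b", "b")]
parent_2 = [("a", "a"), ("B", "B")]
# Each inbred parent contributes one allele per locus to the F1.
f1 = [(p1[0], p2[0]) for p1, p2 in zip(parent_1, parent_2)]

def dominance(genotype):
    """Each locus scores 1 if at least one favourable dominant allele is present."""
    return sum(1 for a, b in genotype if a.isupper() or b.isupper())

def overdominance(genotype):
    """Each heterozygous locus scores 1; homozygous loci score 0."""
    return sum(1 for a, b in genotype if a != b)

def epistasis(genotype):
    """Scores 1 only when favourable alleles are present at both loci."""
    return int(all(a.isupper() or b.isupper() for a, b in genotype))

for model in (dominance, overdominance, epistasis):
    print(model.__name__, model(parent_1), model(parent_2), model(f1))
```

In this toy cross the F1 outscores both parents under every model, so hybrid performance alone does not reveal which model applies.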
Molecular processes commonly associated with heterosis include changes in gene expression (Swanson-Wagner et al., 2006; Krieger et al., 2010), protein metabolism (Goff, 2011), and epigenetic modification of key regulatory genes (Chen, 2013).

While experimental data have been found that support each of the genetic models for heterosis, determining the relative importance of these different explanations has been much more challenging. Early genetic studies of heterosis often relied on QTL mapping, but detection of responsible QTLs can be affected by statistical and technical considerations (Schnable & Springer, 2013). For example, unless carefully designed, QTL studies often have limited power to detect epistasis. An additional issue is a phenomenon called pseudo-overdominance, which can make it difficult to distinguish between the dominance and overdominance models (Schnable & Springer, 2013). Pseudo-overdominance is caused by tight linkage between a pair of dominant alleles in repulsion phase, giving the appearance of overdominance. Limited recombination in cultivated populations could mean that pseudo-overdominance plays a big role in heterosis, as has been suggested in maize (Gore et al., 2009). Most QTL studies investigating heterosis have reported that a large number of small effect loci are involved (Schnable & Springer, 2013). However, several studies have found a single gene with large overdominance effects, for example, in tomato (Krieger et al., 2010) and in Arabidopsis (Ni et al., 2008).

In sunflower, hybrid based production began in the 1970s. Before this, all production was based on open-pollinated varieties (OPVs). These OPVs were not subject to severe inbreeding, as are modern inbreds, but instead were maintained as small populations. From these OPVs ‘male’ R-line and ‘female’ B-line gene pools were developed (Korell et al., 1992).
The critical attribute that divided these groups was the presence of cytoplasmic male sterility (CMS) in the B-lines and complementary restorer alleles in the R-lines. Additionally, as we have seen in Chapter 2, the R-lines have had branching re-introduced. Both CMS (Leclercq, 1969), and the restorer and branching alleles, were brought into cultivated sunflowers from wild relatives (Fick et al., 1975; Kinman, 1970). Breeders have selected, and will continue to select, for heterosis in the development of these gene pools. Although much phenotypic evaluation of new potential inbred lines is done on the inbred lines themselves, test crossing is usually a critical component of line selection, and combining ability is a top priority in inbred line release and use. Thus, breeders have selected for heterosis in these sunflower gene pools for several decades. Here I use whole genome sequence (WGS) data from a diverse panel of OPV, R-line and B-line varieties to investigate the genomic impact of this selection. The profile of selection on these genomes may provide clues regarding the genetic mechanism responsible for heterosis in sunflower, as well as useful information for the continued improvement of sunflower.

5.2 Materials and methods

To investigate the impact of selection during the creation of the heterotic gene pools, I leveraged sequence information developed for a public sunflower association mapping population. This population was developed as a community effort and attempts to capture as much of the diversity in cultivated sunflower as possible (Mandel et al., 2013). Lines were selected from numerous gene pools, including landraces, OPVs and modern high oil lines, and purified further using self pollination when possible. Most relevant to this study are a large sampling of B-line and R-line varieties and the OPVs from which they have been derived.
This whole population was sequenced to 5-20x coverage (Table D.1) on the Illumina platform as a collaborative effort involving UBC, the University of Georgia, the Institut national de la recherche agronomique (INRA), and the South African Research Council. The large WGS data set of raw reads, with a combined hard drive footprint of over 10 Terabytes, was delivered to collaborators at the database company SAP for alignment and SNP calling. There the reads were cleaned using trimmomatic (Bolger et al., 2014) and aligned to the ‘HA412Bronze’ genome assembly (, using an alignment algorithm developed by SAP. Following alignment, genotypes were called using a proprietary maximum posterior probability algorithm. Simulations indicate that the aligner and variant caller out-perform BWA and GATK, respectively (S. Hubner, personal communication, February 2 2015). The resulting variant call information was delivered to us and formatted and filtered for missing data, minor allele frequency and observed heterozygosity as in previous chapters. Principal component analyses were carried out with an arbitrary subset of 5,000 SNPs representing each chromosome, to explore relationships among cultivated lines.

The genomic signature of selection for heterosis should vary depending on its genetic basis. If overdominance is the cause of heterosis, then selection for heterosis or combining ability should favour different alleles at the same locus. This would lead to a signature in which the same locus would show evidence of selection in both the male and female populations, but for different alleles. In contrast, different regions of the genome should be targeted by selection if genetic complementation (dominance) and/or epistasis underlies hybrid vigour. Note that a different signature is expected for general improvement traits that are favoured in all cultivated lines.
Here we would expect to see evidence of selection at the same locus and same allele in both populations.

Figure 5.1: Principal component analysis of the sunflower association mapping population samples based on a random subset of 5,000 WGS derived SNPs. [Axes: PC1, PC2; key: B-lines (n=127), OPV (n=13), Other (n=42), R-lines (n=92).]

To find putative targets of selection, highly differentiated regions in the genome (i.e., outlier regions) were identified using an FST genome scan approach. I calculated FST using a custom Perl script (Weir & Cockerham, 1984), for three different contrasts: OPV vs. B-lines, OPV vs. R-lines and B-lines vs. R-lines. In each case, FST was calculated for every SNP, and the top 1% highest FST sites were considered to be outliers. Then, in non-overlapping 1 Mb windows, I asked whether each window contained more outliers than 95% of random sets of SNPs in a permutation analysis. I combined adjacent windows in order to determine the size of the regions putatively affected by selection. Available gene annotations were queried to assess possible functional roles of genes in the selected windows. I then categorized each enriched window as either being unique to the B-lines or R-lines or found to be under selection in both. To compare these findings to the domestication genes identified in Chapter 2, I used blastn (Altschul et al., 1990) and selected hits with e-values of less than 1e-40. For all of the SNPs in these regions I calculated the expected heterozygosity for a population derived from crosses between B-lines and R-lines using the following equation: He = (p1 × q2) + (p2 × q1), where p and q are the frequencies of the two alleles at a SNP and the subscripts denote the two parental populations (B-lines and R-lines).
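The window-enrichment step and the heterozygosity equation above can be sketched in a few lines. The analyses reported here used a custom Perl script, so the Python below is a stand-in under stated assumptions: the function names, the label-shuffling permutation scheme, the number of permutations and the toy data are all hypothetical, and per-SNP FST outlier status is taken as already computed.

```python
import random

def enriched_windows(window_of_snp, is_outlier, n_perm=1000, alpha=0.05, seed=1):
    """Return windows holding more outlier SNPs than expected by chance.

    window_of_snp: window index of each SNP (e.g. position_bp // 1_000_000)
    is_outlier:    True for SNPs in the top 1% of FST values
    ASSUMPTION: chance is modelled by shuffling outlier labels across all SNPs.
    """
    rng = random.Random(seed)
    windows = sorted(set(window_of_snp))

    def count(labels):
        counts = dict.fromkeys(windows, 0)
        for w, flag in zip(window_of_snp, labels):
            counts[w] += int(flag)
        return counts

    observed = count(is_outlier)
    at_least_as_many = dict.fromkeys(windows, 0)
    labels = list(is_outlier)
    for _ in range(n_perm):
        rng.shuffle(labels)
        permuted = count(labels)
        for w in windows:
            at_least_as_many[w] += int(permuted[w] >= observed[w])
    # A window is enriched if few permutations match or exceed its observed count.
    return [w for w in windows if at_least_as_many[w] / n_perm < alpha]

def hybrid_he(p1, p2):
    """He = (p1 x q2) + (p2 x q1), with q = 1 - p in each parental population."""
    return p1 * (1 - p2) + p2 * (1 - p1)

# Toy data: three windows of 100 SNPs; all ten outliers fall in window 1.
window_of_snp = [w for w in range(3) for _ in range(100)]
is_outlier = [w == 1 and i < 10 for w in range(3) for i in range(100)]
flagged = enriched_windows(window_of_snp, is_outlier)
print(flagged, hybrid_he(1.0, 0.0), hybrid_he(1.0, 1.0))
```

Fixed differences between the two pools give He = 1, whereas selection for the same allele in both pools drives He towards 0, which is the contrast examined when shared and unique outlier windows are compared.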
Comparisons of expected heterozygosity for unique enriched windows versus that for shared outlier windows were used to infer whether or not the same allele was putatively targeted by selection in each population.

5.3 Results

A total of 784,441 SNPs passed our filtering criteria. Principal component analyses revealed that much of the differentiation among cultivars could be explained by genetic group (B- vs. R-lines) and year of release (Fig. 5.1 and 5.7).

The number of outlier regions detected in the B- and R-lines by our FST scan was surprisingly similar – approximately 190 in each population. However, the total size of these regions varies, with circa 250 Mb enriched for outliers in the B-lines and 314 Mb in the R-lines. The R-lines contain several large regions of outlier enrichment, whereas the B-lines primarily have smaller enriched regions (Fig. 5.3). These enriched windows are dispersed across the genome in both populations (Fig. 5.3). On chromosome 10, the R-lines have numerous enriched windows, spanning a large fraction of the chromosome. There is significant overlap in the outlier windows between the B- and R-lines (chi squared, X-squared = 7617.1, p < 2.2e-16), with 108 Mb enriched in both. Analyses of He of a hypothetical hybrid population in the outlier windows failed to detect an increase in the shared windows relative to outlier windows that were unique to the B- or R-lines. Thus, our data do not support the overdominance model of heterosis (Fig. 5.4).

Figure 5.2: The sizes of genomic regions putatively subject to selection during the development of the B-lines and R-lines. Region size was determined by the number of consecutive 1 Mb windows that were enriched for high FST outliers in OPV vs B-line and OPV vs R-line comparisons, respectively. [Panels: B-line, R-line; x-axis: Length (MB); y-axis: Count.]

The final genome scan comparison (B- versus R-lines) detected 19 windows enriched for high FST outliers.
The outlier windows are restricted to chromosomes 10 and 13. On chromosome 13, the region putatively affected by selection is small, between the physical locations of 214-215 Mb, 216-220 Mb and 223-225 Mb. Because it was not clear whether there was a single locus targeted by selection or multiple separate loci, the entire region, spanning from 214 Mb to 225 Mb, was investigated further. Our genome annotation reveals that this region is significantly enriched for pentatricopeptide repeat (PPR) genes (chi squared, p<0.0001). On chromosome 10, we found a total of 25 Mb targeted by selection, including one nearly continuous block spanning 14-35 Mb. Of the four domestication genes on linkage group 10 identified in Chapter 2 (Table A.2), two have their best hits in this region, including HaGNAT. Best hits for the other two domestication outliers on chromosome 10 are outside of the putatively selected regions. However, both have additional hits inside a selected region.

Figure 5.3: Genomic regions putatively subject to selection during the development of the B-lines and R-lines. These regions contain more high FST outliers than 95% of random sets of SNPs in a permutation analysis. [x-axis: Position (bp); y-axis: Linkage Group (1 to 17); key: OPV vs. B-lines, OPV vs. R-lines.]

5.4 Discussion

Understanding the genetic architecture of heterosis will contribute to the efficient leveraging of hybrid vigour for continued crop improvement. The public sunflower association mapping population (Mandel et al., 2013) captures the history of the development of hybrid cultivars, as it contains both heterotic groups and the ancestral population from which they were bred (Fig. 5.1). With dense genotyping of this population by WGS, I was able to identify genomic regions that may have been targeted by selection during this phase of improvement – information that allowed me to distinguish between alternative genetic models of heterosis in sunflowers.

Figure 5.4: He of hypothetical crosses between the B-line and R-line populations in 1 Mb regions that were putatively subject to selection during the development of B-lines only (n = 142), R-lines only (n = 206) or both (n = 108). [x-axis: Population (B-lines, R-lines, Both); y-axis: He of B-line x R-line cross.]

Figure 5.5: Differentiation of B- and R-lines along chromosome 10. Each point represents a single SNP and vertical red bars indicate 1 Mb regions that are enriched for high FST outliers. [x-axis: Position (bp); y-axis: Fst.]

My analyses revealed that profiles of selection during the creation of the B and R gene pools are similar, with an equivalent number of windows targeted by selection in the two populations. The majority of selected regions are small – only a few Mb in size. The R-lines have been subject to more large sweeps, with major sweeps found on chromosomes 8, 10, 13 and 16. The sweeps on 8 and 10 likely correspond to the introgressions from H. a. texanus identified in Chapter 2. These introgressions are responsible for the recessive branching found in nearly all of the R-lines. Interestingly, the introgression on LG12 reported for the R-line in Chapter 2 (Fig. A.14) does not appear to have been subject to selection, possibly suggesting it has been removed during subsequent breeding in the R-lines. In the B-lines there are several putatively selected regions clustered on both chromosome 6 and 15. Chromosome 6 contains oil QTLs (Burke et al., 2005) and chromosome 15 contains QTLs for height and head size (Wills & Burke, 2007). Although there are a number of regions with unique histories of selection in each population, the number of loci putatively targeted by selection in both populations is significantly greater than expected by chance (chi-squared, X-squared = 7617.1, p < 2.2e-16).
It is possible that the regions that only appear to have experienced selection in one population could be involved in overdominance. This would be the case for loci where one allele was already at a high frequency. However, we expect that overdominant loci would have been maintained in these populations at a high heterozygosity.

Analyses of expected heterozygosity imply that for regions targeted by selection in both the B and R populations, it was the same allele that was under selection (Fig. 5.4). This result fails to support the overdominance model for heterosis; rather, these overlapping regions likely contain alleles that were favoured in both populations during improvement. Such regions occur throughout the genome and include a large genomic region on chromosome 16, as well as smaller regions on chromosomes 2, 3, and 4. Chromosome 16 contains QTLs for seed size and oil content (Mokrani et al., 2002; Tang et al., 2006), whereas chromosomes 3 and 4 contain QTLs for linoleic and oleic acid content and total percent oil, respectively (Burke et al., 2005; Tang et al., 2006). One of the main traits pursued by breeders over this time period was oil content, and the oil content QTL on linkage group 4 has a recessive mode of inheritance (Burke et al., 2005), so it would have been critical to select for this allele in both populations.

Figure 5.6: Differentiation of B- and R-lines along chromosome 13. Each point represents a single SNP and vertical red bars indicate 1 Mb regions that are enriched for high FST outliers. [x-axis: Position (bp); y-axis: Fst.]

Given the absence of evidence of selection for overdominance in this population, dominance and epistasis remain as potential mechanisms for heterosis in sunflower. Unfortunately, my approach is unable to differentiate between these mechanisms and there is insufficient information regarding modes of inheritance in existing QTL studies to assist with this task.
Additional mapping studies in large populations with high heterozygosity may be required to assess the contribution of these mechanisms to heterosis in sunflower. There is significant variation in the amount of hybrid vigour found in different crosses between these two gene pools; however, existing evaluation data are too limited to make effective use of this information. In addition to the distinct clusters within each of the heterotic groups (Cheres & Knapp, 1998), there is a clear signal of continued improvement since the initial creation of these lines (Fig. 5.7). Some of the changes that took place over these last decades may result in more pronounced heterosis. It is also likely that some of this signal is caused by the continued use of wild relatives in line development. If dominance is the cause of heterosis in sunflower, then wild introgressions contributing to heterosis need be introduced into one population only, reducing time to release and easing logistical constraints.

Figure 5.7: Principal component analysis of the sunflower association mapping population samples by year of release based on a random subset of 5,000 WGS derived SNPs. [Axes: PC1, PC2; colour key: year of release (1970, 1980, 1990, 2000).]

Surprisingly, I find only two regions that are strongly differentiated between the two main sunflower gene pools. These regions correspond to the re-introduced branching allele on chromosome 10, described in Chapter 2, and the restorer allele that was introduced at the same time on chromosome 13. Both of these traits were isolated from the CWR H. a. texanus and their introgression coincides with the creation of these gene pools (Chapter 2).
As the majority of previously characterized CMS restorer alleles are PPR genes (Chen & Liu, 2014), the array of PPR genes in the highly differentiated region of chromosome 13 represent good candidates for the restorer of fertility allele in sunflower. The domestication candidates for branching (including HaGNAT) described in Chapter 2 occur in differentiated regions on chromosome 10, indicating that they also deserve further study. Although many breeders expressed great interest in the pre-bred materials described in Chapter 4, a recurring concern was the integration of these pre-bred lines into existing heterotic gene pools. This is indeed a major concern for breeders in all F1 crops. Here, I have identified strong candidates for the two most critical components of these heterotic groups. This information will allow marker-assisted selection to be applied early in future pre-breeding programs, thereby enabling the development of male and female pre-bred lines. Such lines would address breeder concerns and facilitate more rapid uptake of wild germplasm.

Chapter 6

Conclusion

Crop domestication and improvement are complex processes, involving more than just repeated rounds of selection and genetic bottlenecks. As I have shown in Chapter 2, there is substantial post-domestication introgression from wild relatives into sunflower elite lines. Introducing novel alleles into crops from their wild relatives, as I describe in Chapter 4, shows that continued gains could be made with the use of wild germplasm. As with cultivated lines, wild species do sometimes experience interspecific gene flow, which I discuss in Chapter 3. However, this introgression is restricted geographically and involves a handful of sympatric congeners.
Because of human intervention, crops typically have access to a much broader gene pool than their wild progenitors. Additionally, crops are not necessarily homogenous, and distinct genotypic groups can be created and change over time (Chapter 5).

The sunflower heterotic groups, and the two CWR introgression events responsible for their creation, are an important focus of this dissertation. Interestingly, these introgressions involve the loss of domestication alleles for apical dominance and restoration of cytoplasmic male sterility. Future pre-breeding in sunflower can use my results to select for the appropriate alleles at these loci so that lines (and the traits they carry) are more readily integrated into existing heterotic groups.

Looking beyond the heterotic groups of sunflower, the methods I employed here could be applied to identify the alleles that have been of historic importance in other crops. Breeders have developed cultivars for specific uses and/or that are adapted to local climatic conditions. The signals of these selection events can be detected using genome scans and used to inform breeding plans. In this way we can build on past successes. A potential pitfall associated with this approach is that breeders might become too reliant on a narrow genetic base for a given trait. For example, as far as I am aware the majority of public sunflower lines used in production are based on the use of a single CMS cytotype and restorer system, possibly. There are, however, numerous alternative CMS cytotypes and restorer alleles available for sunflower (Seiler & Jan, 1994; Jan & Vick, 2006).

Although the main results presented here are based on independent populations and several different sequencing methodologies and integrate well with previous work in sunflower, there are significant shortcomings. Most importantly, the many candidate genes described in Chapter 2 have not been validated via fine mapping and functional analyses.
Likewise, we have yet to connect many of the putatively selected regions identified in Chapter 5 to phenotypic variation. Another issue concerns the quality of the phenotyping conducted at NARO, which was largely done using ordinal measurements. Although this method greatly reduces the equipment and personnel requirements for such large populations, it makes it more difficult to compare the data to other studies or even between different locations in Uganda. In addition, more traits (specifically yield-related traits) must be phenotyped to fully understand the value of these lines.

Work is underway to address these issues. Reverse genetic tools have been developed for sunflower, with the goal of validating several of the candidate domestication genes reported here. Likewise, work is ongoing to link genomic regions of interest to phenotypic variation. Results from two types of approaches, population and quantitative genetics, can be combined to associate genotype with phenotype and identify candidate genes. For example, without previous QTL and association studies, connecting HaGNAT to the branching phenotype would have been extremely speculative. Of course, additional molecular genetic work may ultimately be required to confirm the functional role of HaGNAT and other candidate genes. With respect to evaluations of the pre-bred lines, we have provided details on best practice approaches to phenotyping of sunflowers to our colleagues at NARO, and the Global Crop Diversity Trust is funding additional evaluations there. In addition, the sunflower breeding company, SOLTIS, is conducting evaluations of the new pre-bred lines this summer and has agreed to make the data public.

I envision a future where gene bank curators and plant breeders will employ a genomics parts catalog to facilitate the use of CWRs and all other available germplasm, and thereby accelerate the improvement and diversification of our crops.
This catalog would include an accession’s genome sequence(s), information on unique and shared variants (SNPs, indels, rearrangements, etc.) and their potential functional effects, environmental information from the original collection locale, exhaustive phenotypic data, and information on its previous use in breeding. It would quantify the genetic and phenotypic diversity within and between accessions of a particular species, sub-species or ecotype. Such information would allow breeders to make selections based on the presence of desired alleles for particular QTLs. A breeder could also select improved material that already contains particular wild alleles of interest. If an allele of interest is only found in wild material, they could select an accession with a genetic background most amenable to introgression. With appropriate population genomic analyses, it should be possible to select accessions for breeding based on both phenotypic and genotypic associations with environments of interest. Genomic information is attractive not only because it can provide an increasingly useful estimate of the value for crop improvement of individual accessions, but also because it can be obtained for every sample in a collection, including both existing and future collections. Genomic data could become the common ‘currency’ of germplasm curation and use.

As the climate changes and our population grows, crop improvement must become more effective and efficient. This will certainly involve the cooperation of many parties. Indeed, just the research described here would not have been possible without contributions of people from several countries spread over four continents. Genetic diversity, such as that found in the wild relatives of crops, will play a critical role in crop improvement, and genomics may be key to tapping into its potential.

Bibliography

Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. (1990). Basic local alignment search tool.
Journal of molecular biology, 215(3), 403–410.

Andrew, R. L., Kane, N. C., Baute, G. J., Grassa, C. J. & Rieseberg, L. H. (2013). Recent nonhybrid origin of sunflower ecotypes in a novel habitat. Molecular ecology, 22(3), 799–813.

Baird, N. A., Etter, P. D., Atwood, T. S., Currey, M. C., Shiver, A. L., Lewis, Z. A., Selker, E. U., Cresko, W. A. & Johnson, E. A. (2008). Rapid SNP discovery and genetic mapping using sequenced RAD markers. PLoS ONE, 3(10), e3376.

Barker, M. S., Kane, N. C., Matvienko, M., Kozik, A., Michelmore, R. W., Knapp, S. J. & Rieseberg, L. H. (2008). Multiple paleopolyploidizations during the evolution of the Compositae reveal parallel patterns of duplicate gene retention after millions of years. Molecular Biology and Evolution, 25(11), 2445–2455.

Baute, G. J., Kane, N. C., Grassa, C. J., Lai, Z. & Rieseberg, L. H. (2015). Genome scans reveal candidate domestication and improvement genes in cultivated sunflower, as well as post-domestication introgression with wild relatives. New Phytologist, 206(2), 830–838.

Beddington, J. (2009). Food security: contributions from science to a new and greener revolution. Philosophical Transactions of the Royal Society B: Biological Sciences, 365(1537), 61–71.

Bertrand, C., Bergounioux, C., Domenichini, S., Delarue, M. & Zhou, D. X. (2003). Arabidopsis histone acetyltransferase AtGCN5 regulates the floral meristem activity through the WUSCHEL/AGAMOUS pathway. Journal of Biological Chemistry, 278(30), 28246–28251.

Blackman, B. K., Rasmussen, D. A., Strasburg, J. L., Raduski, A. R., Burke, J. M., Knapp, S. J., Michaels, S. D. & Rieseberg, L. H. (2011a). Contributions of flowering time genes to sunflower domestication and improvement. Genetics, 187(1), 271–287.

Blackman, B. K., Scascitelli, M., Kane, N. C., Luton, H. H., Rasmussen, D. A., Bye, R. A., Lentz, D. L. & Rieseberg, L. H. (2011b). Sunflower domestication alleles support single domestication center in eastern North America.
Proceedings of the National Academy of Sciences of the United States of America, 108(34), 14360–14365.
Bock, D. G., Kane, N. C., Ebert, D. P. & Rieseberg, L. H. (2013). Genome skimming reveals the origin of the Jerusalem Artichoke tuber crop species: neither from Jerusalem nor an artichoke. New Phytologist, 201(3), 1021–1030.
Bolger, A. M., Lohse, M. & Usadel, B. (2014). Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics, 30(15), 2114–2120.
Brown, A. (1989). Core collections: a practical approach to genetic resources management. Genome, 31(2), 818–824.
Burke, J. M., Knapp, S. J. & Rieseberg, L. H. (2005). Genetic consequences of selection during the evolution of cultivated sunflower. Genetics, 171(4), 1933–1940.
Burke, J. M., Tang, S., Knapp, S. J. & Rieseberg, L. H. (2002). Genetic analysis of sunflower domestication. Genetics, 161(3), 1257–1267.
Capron, A., Gourgues, M., Neiva, L. S., Faure, J. E., Berger, F., Pagnussat, G., Krishnan, A., Alvarez-Mejia, C., Vielle-Calzada, J. P., Lee, Y. R., Liu, B. & Sundaresan, V. (2008). Maternal control of male-gamete delivery in Arabidopsis involves a putative GPI-Anchored protein encoded by the LORELEI gene. The Plant Cell Online, 20(11), 3038–3049.
Cruz de Carvalho, M. H. (2008). Drought stress and reactive oxygen species: Production, scavenging and signaling. Plant signaling & behavior, 3(3), 156–165.
Catchen, J. M., Amores, A., Hohenlohe, P., Cresko, W. & Postlethwait, J. H. (2011). Stacks: building and genotyping loci de novo from short-read sequences. G3 (Bethesda, Md.), 1(3), 171–182.
Cavanagh, C. R., Chao, S., Wang, S., Huang, B. E., Stephen, S., Kiani, S., Forrest, K., Saintenac, C., Brown-Guedira, G. L. & Akhunova, A. (2013). Genome-wide comparative diversity uncovers multiple targets of selection for improvement in hexaploid wheat landraces and cultivars. Proceedings of the National Academy of Sciences of the United States of America, 110(20), 8057–8062.
Chapman, M. A. & Burke, J. M. (2012).
Evidence of selection on fatty acid biosynthetic genes during the evolution of cultivated sunflower. Theoretical and Applied Genetics, 125(5), 897–907.
Chapman, M. A., Pashley, C. H., Wenzler, J., Hvala, J., Tang, S., Knapp, S. J. & Burke, J. M. (2008). A genomic scan for selection reveals candidates for genes involved in the evolution of cultivated sunflower (Helianthus annuus). The Plant Cell Online, 20(11), 2931–2945.
Chen, H., Patterson, N. & Reich, D. (2010). Population differentiation as a test for selective sweeps. Genome Research, 20(3), 393–402.
Chen, L. & Liu, Y.-G. (2014). Male sterility and fertility restoration in crops. Annual review of plant biology, 65, 579–606.
Chen, Z. J. (2013). Genomic and epigenetic insights into the molecular bases of heterosis. Nature Reviews Genetics, 14(7), 471–482.
Cheres, M. T. & Knapp, S. J. (1998). Ancestral origins and genetic diversity of cultivated sunflower: Coancestry analysis of public germplasm. Crop Science, 38(6), 1476–1482.
Dechaine, J. M., Burger, J. C., Chapman, M. A., Seiler, G. J., Brunick, R., Knapp, S. J. & Burke, J. M. (2009). Fitness effects and genetic architecture of plant-herbivore interactions in sunflower crop-wild hybrids. New Phytologist, 184(4), 828–841.
Dempewolf, H., Eastwood, R. J., Guarino, L., Khoury, C. K., Müller, J. V. & Toll, J. (2014). Adapting agriculture to climate change: A global initiative to collect, conserve, and use crop wild relatives. Agroecology and Sustainable Food Systems, 38(4), 369–377.
Dempewolf, H., Hodgins, K. A., Rummell, S. E., Ellstrand, N. C. & Rieseberg, L. H. (2012). Reproductive Isolation during Domestication. The Plant Cell Online, 24(7), 2710–2717.
DePristo, M. A., Banks, E., Poplin, R., Garimella, K. V., Maguire, J. R., Hartl, C., Philippakis, A. A., del Angel, G., Rivas, M. A., Hanna, M. et al. (2011). A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nature genetics, 43(5), 491–498.
Dlugosch, K.
M., Lai, Z., Bonin, A., Hierro, J. & Rieseberg, L. H. (2013). Allele identification for transcriptome-based population genomics in the invasive plant Centaurea solstitialis. G3 (Bethesda, Md.), 3(2), 359–367.
Dorado, O., Rieseberg, L. H. & Arias, D. M. (1992). Chloroplast DNA introgression in southern California sunflowers. Evolution, 46(2), 566–572.
Doyle, J. J. & Doyle, J. L. (1987). A rapid DNA isolation procedure for small quantities of fresh leaf tissue. Phytochemical Bulletin, 19, 11–15.
Dray, S., Dufour, A.-B. et al. (2007). The ade4 package: implementing the duality diagram for ecologists. Journal of statistical software, 22(4), 1–20.
Dubos, C., Stracke, R., Grotewold, E., Weisshaar, B., Martin, C. & Lepiniec, L. (2010). MYB transcription factors in Arabidopsis. Trends in Plant Science, 15(10), 573–581.
Durand, E. Y., Patterson, N., Reich, D. & Slatkin, M. (2011). Testing for ancient admixture between closely related populations. Molecular biology and evolution, 28(8), 2239–2252.
Ebitani, T., Takeuchi, Y., Nonoue, Y., Yamamoto, T., Takeuchi, K. & Yano, M. (2005). Construction and evaluation of chromosome segment substitution lines carrying overlapping chromosome segments of indica rice cultivar ‘Kasalath’ in a genetic background of japonica elite cultivar ‘Koshihikari’. Breeding Science, 55(1), 65–73.
Elshire, R. J., Glaubitz, J. C., Sun, Q., Poland, J. A., Kawamoto, K., Buckler, E. S. & Mitchell, S. E. (2011). A robust, simple genotyping-by-sequencing (GBS) approach for high diversity species. PLoS ONE, 6(5), e19379.
Eshed, Y. & Zamir, D. (1995). An introgression line population of Lycopersicon pennellii in the cultivated tomato enables the identification and fine mapping of yield-associated QTL. Genetics, 141(3), 1147–1162.
Excoffier, L., Hofer, T. & Foll, M. (2009). Detecting loci under selection in a hierarchically structured population. Heredity, 103(4), 285–298.
FAO (2010).
Second report on the state of the world’s plant genetic resources for food and agriculture. Rome.
FAO (2014). Accessed: 2014-09-26.
Fatland, B. L. (2005). Reverse genetic characterization of cytosolic acetyl-CoA generation by ATP-citrate lyase in Arabidopsis. The Plant Cell Online, 17(1), 182–203.
Feng, J., Seiler, G., Gulya, T. & Jan, C. (2006). Development of Sclerotinia stem rot resistant germplasm utilizing hexaploid Helianthus species. In 28th Sunflower research workshop, Fargo, ND, USA.
Feng, J., Seiler, G. J., Gulya, T. J., Cai, X. & Jan, C. C. (2008). Incorporating Sclerotinia stalk rot resistance from diverse perennial wild Helianthus species into cultivated sunflower. In 30th Sunflower research workshop, Fargo, ND, USA.
Fick, G. & Swallers, C. (1972). Higher yields and greater uniformity with hybrid sunflowers. N Dak Farm Res, 29(6), 7–9.
Fick, G. N., Kinman, M. L. & Zimmer, D. E. (1975). Registration of ‘RHA 273’ and ‘RHA 274’ sunflower parental lines (Reg. No. PL 7 and 8). Crop Science, 15(1), 106–106.
Flugge, U. I., Hausler, R. E., Ludewig, F. & Gierth, M. (2011). The role of transporters in supplying energy to plant plastids. Journal of Experimental Botany, 62(7), 2381–2392.
Foley, J. A., Ramankutty, N., Brauman, K. A., Cassidy, E. S., Gerber, J. S., Johnston, M., Mueller, N. D., O’Connell, C., Ray, D. K., West, P. C. et al. (2011). Solutions for a cultivated planet. Nature, 478(7369), 337–342.
Foll, M. & Gaggiotti, O. (2008). A genome-scan method to identify selected loci appropriate for both dominant and codominant markers: A Bayesian perspective. Genetics, 180(2), 977–993.
Fonceka, D., Tossim, H.-A., Rivallan, R., Vignes, H., Lacut, E., de Bellis, F., Faye, I., Ndoye, O., Leal-Bertioli, S. C., Valls, J. F. et al. (2012). Construction of chromosome segment substitution lines in peanut (Arachis hypogaea L.) using a wild synthetic and QTL mapping for plant morphology. PLoS ONE, 7(11), e48642.
Gerland, P., Raftery, A.
E., Ševčíková, H., Li, N., Gu, D., Spoorenberg, T., Alkema, L., Fosdick, B. K., Chunn, J., Lalic, N. et al. (2014). World population stabilization unlikely this century. Science, 346(6206), 234–237.
Gillis, J., Mistry, M. & Pavlidis, P. (2010). Gene function analysis in complex data sets using ErmineJ. Nature Protocols, 5(6), 1148–1159.
Goff, S. A. (2011). A unifying theory for general multigenic heterosis: energy efficiency, protein metabolism, and implications for molecular breeding. New Phytologist, 189(4), 923–937.
Gore, M. A., Chia, J. M., Elshire, R. J., Sun, Q., Ersoz, E. S., Hurwitz, B. L., Peiffer, J. A., McMullen, M. D., Grills, G. S., Ross-Ibarra, J., Ware, D. H. & Buckler, E. S. (2009). A first-generation haplotype map of maize. Science, 326(5956), 1115–1117.
Goudet, J. (2004). hierfstat, a package for R to compute and test hierarchical F-statistics. Molecular Ecology Notes, 5(1), 184–186.
Gouesnard, B., Bataillon, T. M., Decoux, G., Rozale, C., Schoen, D. J. & David, J. L. (2001). MSTRAT: An algorithm for building germ plasm core collections by maximizing allelic or phenotypic richness. The Journal of heredity, 92(1), 93–94.
Green, R. E., Krause, J., Briggs, A. W., Maricic, T., Stenzel, U., Kircher, M., Patterson, N., Li, H., Zhai, W., Fritz, M. H.-Y. et al. (2010). A draft sequence of the Neandertal genome. Science, 328(5979), 710–722.
Guo, S., Zhang, J., Sun, H., Salse, J., Lucas, W. J., Zhang, H., Zheng, Y., Mao, L., Ren, Y., Wang, Z., Min, J., Guo, X., Murat, F., Ham, B.-K., Zhang, Z., Gao, S., Huang, M., Xu, Y., Zhong, S., Bombarely, A., Mueller, L. A., Zhao, H., He, H., Zhang, Y., Zhang, Z., Huang, S., Tan, T., Pang, E., Lin, K., Hu, Q., Kuang, H., Ni, P., Wang, B., Liu, J., Kou, Q., Hou, W., Zou, X., Jiang, J., Gong, G., Klee, K., Schoof, H., Huang, Y., Hu, X., Dong, S., Liang, D., Wang, J., Wu, K., Xia, Y., Zhao, X., Zheng, Z., Xing, M., Liang, X., Huang, B., Lv, T., Wang, J., Yin, Y., Yi, H., Li, R., Wu, M., Levi, A., Zhang, X., Giovannoni, J.
J., Wang, J., Li, Y., Fei, Z. & Xu, Y. (2012). The draft genome of watermelon (Citrullus lanatus) and resequencing of 20 diverse accessions. Nature Genetics, 45(1), 51–58.
Hansen, J., Sato, M. & Ruedy, R. (2012). Perception of climate change. Proceedings of the National Academy of Sciences, 109(37), E2415–E2423.
Harter, A. V., Gardner, K. A., Falush, D., Lentz, D. L., Bye, R. A. & Rieseberg, L. H. (2004). Origin of extant domesticated sunflowers in eastern North America. Nature, 430(6996), 201–205.
Hattori, Y., Nagai, K., Furukawa, S., Song, X.-J., Kawano, R., Sakakibara, H., Wu, J., Matsumoto, T., Yoshimura, A., Kitano, H. et al. (2009). The ethylene response factors SNORKEL1 and SNORKEL2 allow rice to adapt to deep water. Nature, 460(7258), 1026–1030.
van Heerwaarden, J., Hufford, M. B. & Ross-Ibarra, J. (2012). Historical genomics of North American maize. Proceedings of the National Academy of Sciences, 109(31), 12420–12425.
Heiser, C. B. (1951). Hybridization in the annual sunflowers: Helianthus annuus × H. debilis var. cucumerifolius. Evolution, 5(1), 42–51.
Heiser, C. B., Smith, D., Clevenger, S. & Martin, W. (1969). The North American sunflower (Helianthus). Memoirs of the Torrey Botanical Club, 22, 1–218.
Horn, R., Kusterer, B., Lazarescu, E., Prüfe, M. & Friedt, W. (2003). Molecular mapping of the Rf1 gene restoring pollen fertility in PET1-based F1 hybrids in sunflower (Helianthus annuus L.). Theoretical and Applied Genetics, 106(4), 599–606.
Huang, X., Kurata, N., Wei, X., Wang, Z.-X., Wang, A., Zhao, Q., Zhao, Y., Liu, K., Lu, H., Li, W., Guo, Y., Lu, Y., Zhou, C., Fan, D., Weng, Q., Zhu, C., Huang, T., Zhang, L., Wang, Y., Feng, L., Furuumi, H., Kubo, T., Miyabayashi, T., Yuan, X., Xu, Q., Dong, G., Zhan, Q., Li, C., Fujiyama, A., Toyoda, A., Lu, T., Feng, Q., Qian, Q., Li, J. & Han, B. (2012). A map of rice genome variation reveals the origin of cultivated rice. Nature, 490(7421), 497–501.
Hufford, M. B., Lubinksy, P., Pyhäjärvi, T., Devengenzo, M. T., Ellstrand, N.
C. & Ross-Ibarra, J. (2013). The genomic signature of crop-wild introgression in maize. PLoS Genetics, 9(5), e1003477.
Hufford, M. B., Xu, X., van Heerwaarden, J., Pyhäjärvi, T., Chia, J.-M., Cartwright, R. A., Elshire, R. J., Glaubitz, J. C., Guill, K. E., Kaeppler, S. M., Lai, J., Morrell, P. L., Shannon, L. M., Song, C., Springer, N. M., Swanson-Wagner, R. A., Tiffin, P., Wang, J., Zhang, G., Doebley, J., McMullen, M. D., Ware, D., Buckler, E. S., Yang, S. & Ross-Ibarra, J. (2012). Comparative population genomics of maize domestication and improvement. Nature Genetics (pp. 1–6).
Huson, D. H. (2005). Application of phylogenetic networks in evolutionary studies. Molecular Biology and Evolution, 23(2), 254–267.
Imaizumi, T. (2005). FKF1 F-Box protein mediates cyclic degradation of a repressor of CONSTANS in Arabidopsis. Science, 309(5732), 293–297.
Jan, C. C. & Chandler, J. M. (1988). Registration of a powdery mildew resistant sunflower germplasm pool, PM. Crop Science, 28(6), 1040–1040.
Jan, C. C., Quresh, Z. & Gulya, T. J. (2004). Registration of seven rust resistant sunflower germplasms. Crop Science, 44(5), 1887–1888.
Jan, C. C. & Vick, B. A. (2006). Registration of seven cytoplasmic male-sterile and four fertility restoration sunflower germplasms. Crop Science, 46(4), 1829–1830.
Jansky, S. & Hamernik, A. (2009). The introgression of 2× 1EBN Solanum species into the cultivated potato using Solanum verrucosum as a bridge. Genetic resources and crop evolution, 56(8), 1107–1115.
Kane, N. C., Burke, J. M., Marek, L., Seiler, G., Vear, F., Baute, G., Knapp, S. J., Vincourt, P. & Rieseberg, L. H. (2012). Sunflower genetic, genomic and ecological resources. Molecular Ecology Resources, 13(1), 10–20.
Kane, N. C., Gill, N., King, M. G., Bowers, J. E., Berges, H., Gouzy, J., Bachlava, E., Langlade, N. B., Lai, Z., Stewart, M., Burke, J. M., Vincourt, P., Knapp, S. J. & Rieseberg, L. H. (2011). Progress towards a reference genome for sunflower. Botany, 89(7), 429–437.
Kane, N.
C., King, M. G., Barker, M. S., Raduski, A., Karrenberg, S., Yatabe, Y., Knapp, S. J. & Rieseberg, L. H. (2009). Comparative genomic and population genetic analyses indicate highly porous genomes and high levels of gene flow between divergent Helianthus species. Evolution, 63(8), 2061–2075.
Khoury, C. K., Bjorkman, A. D., Dempewolf, H., Ramirez-Villegas, J., Guarino, L., Jarvis, A., Rieseberg, L. H. & Struik, P. C. (2014). Increasing homogeneity in global food supplies and the implications for food security. Proceedings of the National Academy of Sciences, 111(11), 4001–4006.
Khoury, C. K., Greene, S., Wiersema, J., Maxted, N., Jarvis, A. & Struik, P. C. (2013). An inventory of crop wild relatives of the United States. Crop Science, 53(4), 1496.
Kihara, H. (1944). Origin of spelta wheat. Agriculture and Horticulture (Tokyo), 19, 889–890.
Kinman, M. (1970). New developments in the USDA and state experiment station sunflower breeding programs. Proceedings of the 4th International Sunflower Conference.
Knight, T. (1806). Observations on the method of producing new and early fruit. Transactions of the Horticultural Society, 2, 30–39.
Koenig, D. & Jiménez-Gómez, J. M. (2013). Comparative transcriptomics reveals patterns of selection in domesticated and wild tomato. In Proceedings of the National Academy of Sciences.
Korell, M., Mosges, G. & Friedt, W. (1992). Construction of a sunflower pedigree map. Helia, 15, 7–16.
Krieger, U., Lippman, Z. B. & Zamir, D. (2010). The flowering gene SINGLE FLOWER TRUSS drives heterosis for yield in tomato. Nature Genetics, 42(5), 459–463.
Kulathinal, R. J., Stevison, L. S. & Noor, M. A. (2009). The genomics of speciation in Drosophila: diversity, divergence, and introgression estimated using low-coverage genome sequencing. PLoS Genetics, 5(7), e1000550.
Lai, Z., Kane, N. C., Kozik, A., Hodgins, K. A., Dlugosch, K. M., Barker, M. S., Matvienko, M., Yu, Q., Turner, K. G., Pearl, S. A. et al. (2012).
Genomics of Compositae weeds: EST libraries, microarrays, and evidence of introgression. American journal of botany, 99(2), 209–218.
Lam, H.-M., Xu, X., Liu, X., Chen, W., Yang, G., Wong, F.-L., Li, M.-W., He, W., Qin, N., Wang, B., Li, J., Jian, M., Wang, J., Shao, G., Wang, J., Sun, S. S.-M. & Zhang, G. (2010). Resequencing of 31 wild and cultivated soybean genomes identifies patterns of genetic diversity and selection. Nature Genetics, 42(12), 1053–1059.
Leclercq, P. (1969). Une stérilité male cytoplasmique chez le Tournesol (Helianthus annuus L.). CR Acad. Sci. Paris (pp. 2385–2387).
Lee, H. K., Braynen, W., Keshav, K. & Pavlidis, P. (2005). ErmineJ: tool for functional analysis of gene expression data sets. BMC Bioinformatics, 6, 269.
Li, H. & Durbin, R. (2009). Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics, 25(14), 1754–1760.
Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., Marth, G., Abecasis, G., Durbin, R. & 1000 Genome Project Data Processing Subgroup (2009). The Sequence Alignment/Map format and SAMtools. Bioinformatics, 25(16), 2078–2079.
Lin, T., Zhu, G., Zhang, J., Xu, X., Yu, Q., Zheng, Z., Zhang, Z., Lun, Y., Li, S., Wang, X., Huang, Z., Li, J., Zhang, C., Wang, T., Zhang, Y., Wang, A., Zhang, Y., Lin, K., Li, C., Xiong, G., Xue, Y., Mazzucato, A., Causse, M., Fei, Z., Giovannoni, J. J., Chetelat, R. T., Zamir, D., Städler, T., Li, J., Ye, Z., Du, Y. & Huang, S. (2014). Genomic analyses provide insights into the history of tomato breeding. Nature Genetics, 46(11), 1220–1226.
Lu, F., Lipka, A. E., Glaubitz, J., Elshire, R., Cherney, J. H., Casler, M. D., Buckler, E. S. & Costich, D. E. (2013). Switchgrass genomic diversity, ploidy, and evolution: Novel insights from a network-based SNP discovery protocol. PLoS Genetics, 9(1), e1003215.
Mackay, M. & Street, K. (2004). Focused identification of germplasm strategy – FIGS.
Proceedings of the 54th Australian Cereal Chemistry Conference and the 11th Wheat Breeders’ Assembly (pp. 138–141).
Mallet, J. (2007). Hybrid speciation. Nature, 446(7133), 279–283.
Mandel, J. R., Dechaine, J. M., Marek, L. F. & Burke, J. M. (2011). Genetic diversity and population structure in cultivated sunflower and a comparison to its wild progenitor, Helianthus annuus L. Theoretical and Applied Genetics, 123(5), 693–704.
Mandel, J. R., McAssey, E. V., Nambeesan, S., Garcia-Navarro, E. & Burke, J. M. (2014). Molecular evolution of candidate genes for crop-related traits in sunflower (Helianthus annuus L.). PLoS ONE, 9(6), e99620.
Mandel, J. R., Nambeesan, S., Bowers, J. E., Marek, L. F., Ebert, D., Rieseberg, L. H., Knapp, S. J. & Burke, J. M. (2013). Association mapping and the genomic consequences of selection in sunflower. PLoS Genetics, 9(3), e1003378.
Martin, S. H., Dasmahapatra, K. K., Nadeau, N. J., Salazar, C., Walters, J. R., Simpson, F., Blaxter, M., Manica, A., Mallet, J. & Jiggins, C. D. (2013). Genome-wide evidence for speciation with gene flow in Heliconius butterflies. Genome research, 23(11), 1817–1828.
McCouch, S., Baute, G. J., Bradeen, J., Bramel, P., Bretting, P. K., Buckler, E., Burke, J. M., Charest, D., Cloutier, S., Cole, G. et al. (2013). Agriculture: feeding the future. Nature, 499(7456), 23–24.
McKenna, A., Hanna, M., Banks, E., Sivachenko, A., Cibulskis, K., Kernytsky, A., Garimella, K., Altshuler, D., Gabriel, S., Daly, M. & DePristo, M. A. (2010). The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data. Genome Research, 20(9), 1297–1303.
Meyer, R. S. & Purugganan, M. D. (2013). Evolution of crop species: genetics of domestication and diversification. Nature Reviews Genetics, 14(12), 840–852.
Mokrani, L., Gentzbittel, L., Azanza, F., Fitamant, L., Al-Chaarani, G. & Sarrafi, A.
(2002). Mapping and analysis of quantitative trait loci for grain oil content and agronomic traits using AFLP and SSR in sunflower (Helianthus annuus L.). Theoretical and Applied Genetics, 106(1), 149–156.
Morrell, P. L., Buckler, E. S. & Ross-Ibarra, J. (2012). Crop genomics: advances and applications. Nature Reviews Genetics, 13(2), 85–96.
Myles, S., Boyko, A. R., Owens, C. L., Brown, P. J., Grassi, F., Aradhya, M. K., Prins, B., Reynolds, A., Chia, J.-M. & Ware, D. (2011). Genetic structure and domestication history of the grape. Proceedings of the National Academy of Sciences of the United States of America, 108(9), 3530–3535.
Nambeesan, S. U., Mandel, J. R., Bowers, J. E., Marek, L. F., Ebert, D., Corbi, J., Rieseberg, L. H., Knapp, S. J. & Burke, J. M. (2015). Association mapping in sunflower (Helianthus annuus L.) reveals independent control of apical vs. basal branching. BMC plant biology, 15(1), 84.
Ni, Z., Kim, E.-D., Ha, M., Lackey, E., Liu, J., Zhang, Y., Sun, Q. & Chen, Z. J. (2008). Altered circadian rhythms regulate growth vigour in hybrids and allopolyploids. Nature, 457(7226), 327–333.
Olsen, K. M. & Wendel, J. F. (2013). A bountiful harvest: Genomic insights into crop domestication phenotypes. Annual Review of Plant Biology, 64(1), 47–70.
Pavlidis, P., Jensen, J. D., Stephan, W. & Stamatakis, A. (2012). A critical assessment of storytelling: Gene ontology categories and the importance of validating genomic scans. Molecular Biology and Evolution, 29(10), 3237–3248.
Petit, R. J. & Excoffier, L. (2009). Gene flow and species delimitation. Trends in Ecology & Evolution, 24(7), 386–393.
Phillips, O. L. & Meilleur, B. A. (1998). Usefulness and economic potential of the rare plants of the United States: a statistical survey. Economic Botany, 52(1), 57–67.
Pritchard, J. K., Stephens, M. & Donnelly, P. (2000). Inference of population structure using multilocus genotype data. Genetics, 155(2), 945–959.
Putt, E. (1997). Early history of sunflower.
In Sunflower Technology and Production (pp. 1–19).
Qi, L. L., Seiler, G. J., Vick, B. A. & Gulya, T. J. (2012). Genetics and mapping of the R11 gene conferring resistance to recently emerged rust races, tightly linked to male fertility restoration, in sunflower (Helianthus annuus L.). Theoretical and Applied Genetics, 125(5), 921–932.
Raduski, A. R., Rieseberg, L. H. & Strasburg, J. L. (2010). Effective population size, gene flow, and species status in a narrow endemic sunflower, Helianthus neglectus, compared to its widespread sister species, H. petiolaris. International journal of molecular sciences, 11(2), 492–506.
Raj, A., Stephens, M. & Pritchard, J. K. (2014). fastSTRUCTURE: variational inference of population structure in large SNP data sets. Genetics, 197(2), 573–589.
Ramírez-Villegas, J., Khoury, C., Jarvis, A., Debouck, D. G. & Guarino, L. (2010). A gap analysis methodology for collecting crop genepools: A case study with Phaseolus beans. PLoS ONE, 5(10), e13497.
Renaut, S., Grassa, C., Moyers, B., Kane, N. & Rieseberg, L. (2012). The population genomics of sunflowers and genomic determinants of protein evolution revealed by RNAseq. Biology, 1(3), 575–596.
Renaut, S., Grassa, C., Yeaman, S., Moyers, B., Lai, Z., Kane, N., Bowers, J., Burke, J. & Rieseberg, L. (2013). Genomic islands of divergence are not affected by geography of speciation in sunflowers. Nature Communications, 4, 1827.
Rieseberg, L. H., Baird, S. J. & Desrochers, A. M. (1998). Patterns of mating in wild sunflower hybrid zones. Evolution (pp. 713–726).
Rieseberg, L. H., Beckstrom-Sternberg, S. & Doan, K. (1990). Helianthus annuus ssp. texanus has chloroplast DNA and nuclear ribosomal RNA genes of Helianthus debilis ssp. cucumerifolius. Proceedings of the National Academy of Sciences of the United States of America, 87(2), 593–597.
Rieseberg, L. H., Beckstrom-Sternberg, S. M., Liston, A. & Arias, D. M. (1991).
Phylogenetic and systematic inferences from chloroplast DNA and isozyme variation in Helianthus sect. Helianthus (Asteraceae). Systematic Botany (pp. 50–76).
Rieseberg, L. H., Soltis, D. E. & Palmer, J. D. (1988). A molecular reexamination of introgression between Helianthus annuus and H. bolanderi (Compositae). Evolution (pp. 227–238).
Rieseberg, L. H., Whitton, J. & Gardner, K. (1999). Hybrid zones and the genetic architecture of a barrier to gene flow between two sunflower species. Genetics, 152(2), 713–727.
Rieseberg, L. H. & Willis, J. H. (2007). Plant Speciation. Science, 317(5840), 910–914.
Roesti, M., Salzburger, W. & Berner, D. (2012). Uninformative polymorphisms bias genome scans for signatures of selection. BMC evolutionary biology, 12(1), 94.
Rogers, C., Thompson, T. & Seiler, G. (1982). Sunflower Species of the United States. National Sunflower Association.
Romay, M. C., Millard, M. J., Glaubitz, J. C., Peiffer, J. A., Swarts, K. L., Casstevens, T. M., Elshire, R. J., Acharya, C. B., Mitchell, S. E., Flint-Garcia, S. A., McMullen, M. D., Holland, J. B., Buckler, E. S. & Gardner, C. A. (2013). Comprehensive genotyping of the USA national maize inbred seed bank. Genome Biology, 14(6), R55.
Rowe, H. C. & Rieseberg, L. H. (2013). Genome-scale transcriptional analyses of first-generation interspecific sunflower hybrids reveals broad regulatory compatibility. BMC genomics, 14(1), 342.
Sanjur, O. I., Piperno, D. R., Andres, T. C. & Wessel-Beaver, L. (2002). Phylogenetic relationships among domesticated and wild species of Cucurbita (Cucurbitaceae) inferred from a mitochondrial gene: Implications for crop plant evolution and areas of origin. Proceedings of the National Academy of Sciences, 99(1), 535–540.
Schilling, E. (1997). Phylogenetic analysis of Helianthus (Asteraceae) based on chloroplast DNA restriction site data. Theoretical and applied genetics, 94(6-7), 925–933.
Schnable, P. S. & Springer, N. M. (2013).
Progress toward understanding heterosis in crop plants. Annual review of plant biology, 64, 71–88.
Schroeder, J. I., Delhaize, E., Frommer, W. B., Guerinot, M. L., Harrison, M. J., Herrera-Estrella, L., Horie, T., Kochian, L. V., Munns, R., Nishizawa, N. K. et al. (2013). Using membrane transporters to improve crops for sustainable food production. Nature, 497(7447), 60–66.
Seiler, G. J. (1991a). Registration of 13 downy mildew tolerant interspecific sunflower germplasm lines derived from wild annual species. Crop Science, 31(6), 1714–1716.
Seiler, G. J. (1991b). Registration of 15 interspecific sunflower germplasm lines derived from wild annual species. Crop Science, 31(5), 1389–1390.
Seiler, G. J. (1991c). Registration of six interspecific sunflower germplasm lines derived from wild perennial species. Crop Science, 31(4), 1097–1098.
Seiler, G. J. (1992). Utilization of wild sunflower species for the improvement of cultivated sunflower. Field Crops Research, 30(3), 195–230.
Seiler, G. J. (1993). Registration of six interspecific germplasm lines derived from wild perennial sunflower. Crop Science, 33(5), 1110–1111.
Seiler, G. J. (2000). Registration of ten interspecific germplasms derived from wild perennial sunflower. Crop Science, 40(2), 587–588.
Seiler, G. J. & Jan, C. C. (1994). New fertility restoration genes from wild sunflowers for sunflower PET1 male-sterile cytoplasm. Crop Science, 34(6), 1526–1528.
Seiler, G. J. & Jan, C. C. (1997). Registration of 10 interspecific germplasm fertility restorer populations for sunflower PET1 male-sterile cytoplasm. Crop Science, 37(6), 1989–1991.
Shalem, O., Sanjana, N. E., Hartenian, E., Shi, X., Scott, D. A., Mikkelsen, T. S., Heckl, D., Ebert, B. L., Root, D. E., Doench, J. G. et al. (2014). Genome-scale CRISPR-Cas9 knockout screening in human cells. Science, 343(6166), 84–87.
Smale, M. & Koo, B. (2003). Biotechnology and Genetic Resource Policies: What is a Genebank Worth? IFPRI: IPGRI: SGRP.
Smith, B. D. (2006).
Eastern North America as an independent center of plant domestication. Proceedings of the National Academy of Sciences of the United States of America, 103(33), 12223–12228.
Stelkens, R. & Seehausen, O. (2009). Genetic distance between species predicts novel trait expression in their hybrids. Evolution, 63(4), 884–897.
Stölting, K. N., Nipper, R., Lindtke, D., Caseys, C., Waeber, S., Castiglione, S. & Lexer, C. (2012). Genomic scan for single nucleotide polymorphisms reveals patterns of divergence and gene flow between ecologically divergent species. Molecular Ecology, 22(3), 842–855.
Strasburg, J. L., Kane, N. C., Raduski, A. R., Bonin, A., Michelmore, R. & Rieseberg, L. H. (2011). Effective population size is positively correlated with levels of adaptive divergence among annual sunflowers. Molecular Biology and Evolution, 28(5), 1569–1580.
Strasburg, J. L. & Rieseberg, L. H. (2008). Molecular demographic history of the annual sunflowers Helianthus annuus and H. petiolaris—large effective population sizes and rates of long-term gene flow. Evolution, 62(8), 1936–1950.
Swanson-Wagner, R. A., Jia, Y., DeCook, R., Borsuk, L. A., Nettleton, D. & Schnable, P. S. (2006). All possible modes of gene action are observed in a global comparison of gene expression in a maize F1 hybrid and its inbred parents. Proceedings of the National Academy of Sciences of the United States of America, 103(18), 6805–6810.
Swofford, D. L. (2001). Paup*: Phylogenetic analysis using parsimony (and other methods) 4.0.b5.
Tang, S., Leon, A., Bridges, W. C. & Knapp, S. J. (2006). Quantitative trait loci for genetically correlated seed traits are tightly linked to branching and pericarp pigment loci in sunflower. Crop Science, 46(2), 721–734.
Timme, R. E., Simpson, B. B. & Linder, C. R. (2007). High-resolution phylogeny for Helianthus (Asteraceae) using the 18S-26S ribosomal DNA external transcribed spacer. American Journal of Botany, 94(11), 1837–1852.
van Treuren, R., de Groot, E. C., Boukema, I.
W., van de Wiel, C. C. M. & van Hintum, T. J. L. (2010). Marker-assisted reduction of redundancy in a genebank collection of cultivated lettuce. Plant Genetic Resources, 8(02), 95–105.
Tsukamoto, T., Qin, Y., Huang, Y., Dunatunga, D. & Palanivelu, R. (2010). A role for LORELEI, a putative glycosylphosphatidylinositol-anchored protein, in Arabidopsis thaliana double fertilization and early seed development. The Plant Journal, 62(4), 571–588.
Urnov, F. D., Rebar, E. J., Holmes, M. C., Zhang, H. S. & Gregory, P. D. (2010). Genome editing with engineered zinc finger nucleases. Nature Reviews Genetics, 11(9), 636–646.
Vavilov, N. I. (1935). Theoretical basis for plant breeding, Vol. 1. Moscow. Origin and geography of cultivated plants. In The Phytogeographical Basis for Plant Breeding (D. Love, transl.). Cambridge, UK: Cambridge Univ. Press, Cambridge, UK.
Vavrek, M. J. (2011). fossil: palaeoecological and palaeogeographical analysis tools. Palaeontologia Electronica, 14(1). R package version 0.3.0.
Wagner, C. E., Keller, I., Wittwer, S., Selz, O. M., Mwaiko, S., Greuter, L., Sivasundar, A. & Seehausen, O. (2013). Genome-wide RAD sequence data provide unprecedented resolution of species boundaries and relationships in the Lake Victoria cichlid adaptive radiation. Molecular ecology, 22(3), 787–798.
Weir, B. S. & Cockerham, C. C. (1984). Estimating F-statistics for the analysis of population structure. Evolution, 38(6), 1358–1370.
Whitford, R., Fleury, D., Reif, J. C., Garcia, M., Okada, T., Korzun, V. & Langridge, P. (2013). Hybrid breeding in wheat: technologies to improve hybrid wheat seed production. Journal of experimental botany, 64(18), 5411–5428.
Wickham, H. (2007). Reshaping data with the reshape package. Journal of Statistical Software, 21(12), 1–20.
Wickham, H. (2009). ggplot2: elegant graphics for data analysis. Springer New York.
Wills, D. M. & Burke, J. M. (2006). Chloroplast DNA variation confirms a single origin of domesticated sunflower (Helianthus annuus L.).
Journal of Heredity.
Wills, D. M. & Burke, J. M. (2007). Quantitative Trait Locus Analysis of the Early Domestication of Sunflower. Genetics, 176(4), 2589–2599.
Wright, S. I. (2005). The Effects of Artificial Selection on the Maize Genome. Science, 308(5726), 1310–1314.
Xi, Z.-Y., He, F.-H., Zeng, R.-Z., Zhang, Z.-M., Ding, X.-H., Li, W.-T. & Zhang, G.-Q. (2006). Development of a wide population of chromosome single-segment substitution lines in the genetic background of an elite cultivar of rice (Oryza sativa L.). Genome, 49(5), 476–484.
Yue, B., Radi, S. A., Vick, B. A., Cai, X., Tang, S., Knapp, S. J., Gulya, T. J., Miller, J. F. & Hu, J. (2008). Identifying quantitative trait loci for resistance to Sclerotinia head rot in two USDA sunflower germplasms. Phytopathology, 98(8), 926–931.
Yue, B., Vick, B. A., Cai, X. & Hu, J. (2010). Genetic mapping for the Rf1 (fertility restoration) gene in sunflower (Helianthus annuus L.) by SSR and TRAP markers. Plant Breeding, 129(1), 24–28.
Zheng, X., Levine, D., Shen, J., Gogarten, S. M., Laurie, C. & Weir, B. S. (2012). A high-performance computing toolset for relatedness and principal component analysis of SNP data. Bioinformatics, 28(24), 3326–3328.

Appendix A

Chapter 2 supplementary materials

Table A.1: Sample information.

Sample Name | Taxa | Description | Source | Original Name | Sequencing technology | Number of reads mapped | SRA Number
14TB2 | H. a. texanus | wild | Renaut et al., 2012 | 14TB-2 | Illumina paired-end | 5311300 | SRX264868
20TB7 | H. a. texanus | wild | Renaut et al., 2012 | 2OTB-7 | Illumina paired-end | 7295124 | SRX264869
Alberta | H. annuus | wild | Renaut et al., 2013 | ALB | Illumina paired-end | 11513420 | SRX264905
Ames449 | H. argophyllus | wild | Renaut et al., 2013 | Ames449 | Illumina paired-end | 7027341 | SRX264858
Ames695 | H. argophyllus | wild | Renaut et al., 2013 | Ames695 | Illumina paired-end | 6179471 | SRX264859
ARG11B | H. argophyllus | wild | Renaut et al., 2013 | arg11B-11 | Illumina paired-end | 4077526 | SRX264836
ARG14B7 | H. argophyllus | wild | Renaut et al., 2013 | arg14B-7 | Illumina paired-end | 6722467 | SRX264837
ARG1805 | H. argophyllus | wild | Renaut et al., 2013 | ARG1805 | Illumina paired-end | 12360624 | SRX264860
ARG1820 | H. argophyllus | wild | Renaut et al., 2013 | ARG1820 | Illumina paired-end | 14942649 | SRX264861
ARG1834 | H. argophyllus | wild | Renaut et al., 2013 | ARG1834 | Illumina paired-end | 5639634 | SRX264862
ARG2B | H. argophyllus | wild | Renaut et al., 2013 | arg2B-4 | Illumina paired-end | 9283770 | SRX264838
ARG4B | H. argophyllus | wild | Renaut et al., 2013 | arg4B-8 | Illumina paired-end | 8159151 | SRX264839
ARG6B | H. argophyllus | wild | Renaut et al., 2013 | arg6B-1 | Illumina paired-end | 13130920 | SRX264840
Arikara | H. annuus | landrace | current study | PI 369357 | Illumina paired-end | 10351815 | SRX790687
BTM10 | H. argophyllus | wild | Renaut et al., 2013 | btm10-5 | Illumina paired-end | 7248063 | SRX264844
BTM13 | H. argophyllus | wild | Renaut et al., 2013 | btm13-4 | Illumina paired-end | 6098828 | SRX264845
BTM17 | H. argophyllus | wild | Renaut et al., 2013 | btm17-4 | Illumina paired-end | 8992366 | SRX264846
BTM19 | H. argophyllus | wild | Renaut et al., 2013 | btm19-1 | Illumina paired-end | 11067369 | SRX264847
BTM20 | H. argophyllus | wild | Renaut et al., 2013 | btm20-8 | Illumina paired-end | 10190174 | SRX264848
BTM21 | H. argophyllus | wild | Renaut et al., 2013 | btm21-4 | Illumina paired-end | 5968340 | SRX264849
BTM22 | H. argophyllus | wild | Renaut et al., 2013 | btm22-8 | Illumina paired-end | 6989636 | SRX264850
BTM25 | H. argophyllus | wild | Renaut et al., 2013 | btm25-2 | Illumina paired-end | 12474280 | SRX264851
BTM26 | H. argophyllus | wild | Renaut et al., 2013 | btm26-4 | Illumina paired-end | 4772713 | SRX264852
BTM27 | H. argophyllus | wild | Renaut et al., 2013 | btm27-3 | Illumina paired-end | 10368905 | SRX264853
BTM3 | H. a. texanus | wild | Renaut et al., 2012 | btm3-2 | Illumina paired-end | 6949675 | SRX264864
BTM30 | H. argophyllus | wild | Renaut et al., 2013 | btm30-6 | Illumina paired-end | 5702191 | SRX264854
BTM35 | H. a. texanus | wild | Renaut et al., 2012 | btm35-4 | Illumina paired-end | 7560569 | SRX264867
BTM6 | H. a. texanus | wild | Renaut et al., 2012 | btm6-1 | Illumina paired-end | 8048792 | SRX264865
Colorado | H. annuus | wild | Renaut et al., 2013 | CON2 | 454 single-end | 425154 | SRX264553
HA369 | H. annuus | modern cultivar | current study | PI 534655 | Illumina paired-end | 6641058 | SRX790688
HA384 | H. annuus | modern cultivar | current study | PI 578873 | Illumina paired-end | 4753467 | SRX790689
HA412 | H. annuus | modern cultivar | current study | PI 642777 | Illumina paired-end | 20244870 | SRX790690
HA89 | H. annuus | modern cultivar | Rowe and Rieseberg, 2013 | PI 599773 | Illumina paired-end | 10815764 | SRX264826
Havasuapi | H. annuus | landrace | current study | PI 369358 | Illumina paired-end | 8737471 | SRX790691
Hidatsa | H. annuus | landrace | current study | PI 600721 | Illumina paired-end | 11159166 | SRX790692
Hopi | H. annuus | landrace | current study | PI 369359 | 454 single-end | 278614 | SRX790693
Hopi.Dye | H. annuus | landrace | current study | PI 369359 | Illumina paired-end | 10088537 | SRX790694
HTAI | H. a. texanus | wild | Renaut et al., 2013 | HTAI | Illumina paired-end | 373561 | SRX264556
Iowa | H. annuus | wild | Renaut et al., 2013 | IOW | 454 single-end | 759756 | SRX264558
ISS19 | H. petiolaris | wild | Renaut et al., 2013 | ISS19 | Illumina paired-end | 7715472 | SRX264886
K111 | H. a. texanus | wild | Renaut et al., 2013 | K111 | Illumina paired-end | 25367944 | SRX264906
Kansas | H. annuus | wild | Renaut et al., 2013 | KSN | 454 single-end | 493766 | SRX264561
Kosim | H. annuus | modern cultivar | current study | PI 650781 | Illumina paired-end | 5948346 | SRX790695
KSG54 | H. petiolaris | wild | Renaut et al., 2013 | KSG54 | Illumina paired-end | 9140849 | SRX264887
Maiz.Negro | H.
annuus landrace current study PI 650761 Illuminapaired-end3684143 SRX790696103AppendixA.Chapter2supplementarymaterialsSample Name Taxa Description Source Original Name SequencingtechnologyNumber ofreads mappedSRA H. annuus landrace current study PI 650646 Illuminapaired-end5688052 SRX790697Mammoth H. annuus moderncultivarcurrent study PI 476853 Illuminapaired-end1548351 SRX790698Mandan H. annuus landrace current study PI 600717 Illuminapaired-end9560933 SRX790699Missouri H. annuus wild Renaut et al.,2013MOW 454 single-end 383934 SRX264562Nebraska H. annuus wild Renaut et al.,2013NEW Illuminapaired-end12350250 SRX264909New.Mexico H. annuus wild Renaut et al.,2013NMN 454 single-end 350394 SRX264564North.Dakota H. annuus wild Renaut et al.,2013NDW 454 single-end 456746 SRX264563Oklahoma H. annuus wild Renaut et al.,2013OKW 454 single-end 446909 SRX264565104AppendixA.Chapter2supplementarymaterialsSample Name Taxa Description Source Original Name SequencingtechnologyNumber ofreads mappedSRA NumberPET_2 H. petiolaris wild Renaut et al.,2012PET-2 Illuminapaired-end13013983 SRX264824PET_3 H. petiolaris wild Renaut et al.,2012PET-3 Illuminapaired-end10837143 SRX264825PET2119 H. petiolaris wild Renaut et al.,2013pet2119 Illuminapaired-end16650411 SRX264888PET2152 H. petiolaris wild Renaut et al.,2013Pet2152 Illuminapaired-end8323481 SRX264889PET2341 H. petiolaris wild Renaut et al.,2012PET2341 Illuminapaired-end17793928 SRX264891PET2342 H. petiolaris wild Renaut et al.,2012PET2342 Illuminapaired-end16650540 SRX264892PET2343 H. petiolaris wild Renaut et al.,2012PET2343 Illuminapaired-end12578956 SRX264894PET2344 H. petiolaris wild Renaut et al.,2012PET2344 Illuminapaired-end14028606 SRX264893105AppendixA.Chapter2supplementarymaterialsSample Name Taxa Description Source Original Name SequencingtechnologyNumber ofreads mappedSRA NumberPET489 H. petiolaris wild Renaut et al.,2012pet489 Illuminapaired-end15506825 SRX264895PI468805 H. 
petiolaris wild Renaut et al.,2012PI 468805 Illuminapaired-end4840463 SRX264896PI468812 H. petiolaris wild Renaut et al.,2012PI 468812 Illuminapaired-end15076081 SRX264897PI468815 H. petiolaris wild Renaut et al.,2012PI 468815 Illuminapaired-end5885463 SRX264898PI503232 H. petiolaris wild Renaut et al.,2012PI 503232 Illuminapaired-end12883809 SRX264899PI531058 H. petiolaris wild Renaut et al.,2012PI 531058 Illuminapaired-end12946737 SRX264900PI547210 H. petiolaris wild Renaut et al.,2012PI 547210 Illuminapaired-end10238504 SRX264901PI586932 H. petiolaris wild Renaut et al.,2012PI 586932b Illuminapaired-end2312079 SRX264902106AppendixA.Chapter2supplementarymaterialsSample Name Taxa Description Source Original Name SequencingtechnologyNumber ofreads mappedSRA NumberPI613767 H. petiolaris wild Renaut et al.,2012PI 613767 Illuminapaired-end15444453 SRX264903PI649907 H. petiolaris wild Renaut et al.,2012PI 649907 Illuminapaired-end13573136 SRX264904PL109 H. petiolaris wild Renaut et al.,2012PL109 Illuminapaired-end1213774 SRX264890RHA274 H. annuus moderncultivarcurrent study PI 599759 Illuminapaired-end20978271 SRX790700Seneca H. annuus landrace current study PI 369360 454 single-end 424379 SRX790701Sunrise H. annuus moderncultivarcurrent study PI 162454 Illuminapaired-end34229408 SRX790702andSRX790703Tenessee H. annuus wild Renaut et al.,2013TEW Illuminapaired-end10736919 SRX264911TEX H. a. texanus wild Renaut et al.,2013TEX Illuminapaired-end11087280 SRX264912107AppendixA.Chapter2supplementarymaterialsSample Name Taxa Description Source Original Name SequencingtechnologyNumber ofreads mappedSRA NumberUtah H. annuus wild Renaut et al.,2013UTN1 454 single-end 379276 SRX264567VNIIMK8931 H. annuus moderncultivarcurrent study PI 340790 Illuminapaired-end32460462 SRX790704Zuni H. 
annuus landrace current study PI432515 Illuminapaired-end6602952 SRX790705108AppendixA.Chapter2supplementarymaterialsTable A.2: Description of domestication candidate contigs fromF ST genome scan.Contig Number ofSNPs incontigAverageF STNumber of F STOutliers(q<0.05)Hs Landrace HsWild TajimasDLinkageGroupcMBigSet000129|Contig50270|consensus| ann13 0.17821 1 0.0961 0.2776 -2.5604 4 61.3122BigSet000428|Contig48614|consensus| ann1 0.42769 1 0.1889 0.0000 -2.4836 17 53.5728BigSet001317|Contig47760|consensus| ann18 0.34166 12 0.0364 0.2742 na na naBigSet001458|Contig5380|consensus| ann9 0.17220 1 0.1080 0.2747 -3.9308 16 86.7283BigSet001624|lrc117225 0.17832 2 0.1595 0.2739 -3.5979 14 55.7125109AppendixA.Chapter2supplementarymaterialsContig Number ofSNPs incontigAverageF STNumber of F STOutliers(q<0.05)Hs Landrace HsWild TajimasDLinkageGroupcMBigSet002030|Contig38417|consensus| ann30 0.16175 1 0.3719 0.1991 -4.3404 na naBigSet002136|Contig52634|consensus| ann25 0.16828 2 0.1314 0.3018 -3.7469 5 56.9785BigSet002407|Contig58156|consensus| ann12 0.16872 1 0.2574 0.2240 -3.5979 na naBigSet002555|Contig25148|consensus| ann30 0.15776 1 0.0556 0.4242 na na naBigSet002613|Contig60453|consensus| ann19 0.16435 1 0.3486 0.1457 -3.7469 na na110AppendixA.Chapter2supplementarymaterialsContig Number ofSNPs incontigAverageF STNumber of F STOutliers(q<0.05)Hs Landrace HsWild TajimasDLinkageGroupcMBigSet002710|Contig37563|consensus| ann31 0.15149 1 0.2785 0.2681 -4.6292 na naBigSet002711|Contig56856|consensus| ann11 0.18435 1 0.1558 0.2296 -2.5604 16 39.4784BigSet002717|Contig2564|consensus| ann9 0.21556 1 0.0000 0.4007 -2.3135 5 57.2473BigSet002761|Contig11910|consensus| ann3 0.34407 1 0.0000 0.3029 na na naBigSet002773|Contig66556|consensus| ann19 0.16122 1 0.3106 0.1935 -2.8904 na na111AppendixA.Chapter2supplementarymaterialsContig Number ofSNPs incontigAverageF STNumber of F STOutliers(q<0.05)Hs Landrace HsWild TajimasDLinkageGroupcMBigSet002778|Contig36366|consensus| ann1 0.53417 1 
0.0000  0.0000  na  na  na
BigSet002925|lrc901213  0.17162  1  0.2764  0.2057  -4.2356  11  84.2895
BigSet003235|Contig1420|Contig27094|consensus| ann|c153276  0.23396  1  0.1713  0.2481  -3.2262  12  34.7025
BigSet003265|c278811  0.17460  1  0.0576  0.3822  na  10  34.8963
BigSet003281|Contig18542|consensus| ann  5  0.19849  1  0.1167  0.3113  na  na  na
BigSet003422|Contig43945|consensus| ann  31  0.15931  1  0.1560  0.3716  -4.2475  na  na
BigSet004407|Contig1351|Contig24728|consensus| ann|c348316  0.16889  1  0.1231  0.3019  -3.2262  11  25.8401
BigSet004413|Contig57915|consensus| ann  33  0.15055  1  0.3214  0.2272  -3.8396  na  na
BigSet004482|Contig38069|consensus| ann  44  0.15215  1  0.2298  0.2985  -3.7820  na  na
BigSet004543|Contig43996|consensus| ann  17  0.16506  1  0.2149  0.3200  -3.2471  1  16.6803
BigSet004597|HA89.CCFT7546.r  12  0.19005  2  0.2836  0.2864  -4.3144  na  na
BigSet004698|Contig28059|consensus| ann  12  0.19591  2  0.0810  0.3394  na  na  na
BigSet004876|Contig637|Contig2319|consensus| ann|HA89.CCFT8054.f  50  0.14886  1  0.2227  0.3003  -3.7020  17  61.1021
BigSet004931|Contig64008|consensus| ann  13  0.16583  1  0.2662  0.1856  -3.5979  3  54.9389
BigSet005014|Contig242| c4828|Contig16517|consensus| ann|Contig27720|consensus| ann  34  0.17921  3  0.1828  0.3219  -4.3788  12  46.4079
BigSet005141|Contig29807|consensus| ann  6  0.19533  1  0.0893  0.3621  na  1  10.8957
BigSet005202|Contig56323|consensus| ann  17  0.17418  2  0.2171  0.2066  -3.2963  na  na
BigSet005244|Contig43255|consensus| ann  16  0.16149  1  0.0794  0.2755  na  na  na
BigSet005296|Contig1805|consensus| ann  7  0.21508  2  0.1953  0.2375  -2.5604  na  na
BigSet005436|Contig46201|consensus| ann  14  0.18005  1  0.3210  0.2138  -4.0098  9  65.9556
BigSet005449|Contig2183|Contig64031|consensus| ann|Contig57135|consensus| ann  22  0.15876  1  0.1465  0.2833  -3.8393  5  55.9020
BigSet005546|Contig68271|consensus| ann  32  0.16173  1  0.2195  0.2618  -2.6486  na  na
BigSet005604|Contig49724|consensus| ann  11  0.20140  1  0.0779  0.3456  na  na  na
BigSet005672|Contig424|HA89.CCFS6939.f|Contig43324|consensus| ann  17  0.16414  1  0.3581  0.2413  -4.3144  5  56.9777
BigSet005691|Contig908|Contig11231|consensus| ann|Contig8105|consensus| ann|c55369  0.17741  1  0.1020  0.3310  -2.5604  13  23.4055
BigSet005841|Contig30589|consensus| ann  4  0.24456  1  0.0313  0.3299  na  na  na
BigSet005892|Contig31598|consensus| ann  14  0.24020  3  0.0677  0.3220  na  na  na
BigSet006455|Contig46990|consensus| ann  8  0.21485  1  0.1993  0.2388  -3.1023  na  na
BigSet006491|Contig10403|consensus| ann  14  0.26943  5  0.1748  0.2011  0.0150  na  na
BigSet006643|Contig897|Contig10955|consensus| ann|HA89.FSISY1C16JPTVP3  0.22915  1  0.2143  0.2602  na  10  36.6444
BigSet006701|Contig18409|consensus| ann  4  0.25204  1  0.2098  0.1281  -1.1117  na  na
BigSet006873|Contig460|HA89.CCFT9077.r|Contig28879|consensus| ann|Contig55354|consensus| ann  37  0.15551  1  0.2097  0.2725  -4.0098  9  27.7500
BigSet006946|Contig43256|consensus| ann  21  0.18658  1  0.1501  0.2790  -3.6270  na  na
BigSet007035|Contig56537|consensus| ann  24  0.41020  19  0.0422  0.1038  na  na  na
BigSet007120|Contig1487|Contig30146|consensus| ann|Contig5268|consensus| ann  13  0.16361  1  0.2405  0.3040  -3.8393  14  76.1648
BigSet007231|c956727  0.15610  1  0.0725  0.3753  -1.4009  9  65.5520
BigSet007547|Contig22739|consensus| ann  9  0.18538  1  0.0735  0.2934  na  4  48.9412
BigSet007670|c430820  0.15698  1  0.2352  0.2904  -4.2098  na  na
BigSet007707|Contig16908|consensus| ann  27  0.17425  1  0.2773  0.3180  -4.3038  5  50.6572
BigSet007731|Contig266| c5540|Contig12429|consensus| ann|c627823  0.15766  1  0.2309  0.2512  -3.2777  1  10.8957
BigSet007792|Contig30152|consensus| ann  4  0.22061  1  0.1232  0.1528  -2.5604  1  3.0974
BigSet007923|Contig32280|consensus| ann  4  0.24697  1  0.1000  0.2992  na  na  na
BigSet007950|Contig1820|Contig46727|consensus| ann|RHA280.gi_22453236_gb_BU017716.1_BU017716  11  0.18198  1  0.2224  0.2940  na  na  na
BigSet008111|c877231  0.15128  1  0.1585  0.4773  -3.2262  na  na
BigSet008178|c207724  0.15523  1  0.2192  0.2129  -4.2356  na  na
BigSet008341|Contig607|Contig1580|consensus| ann|Contig33545|consensus| ann  39  0.15398  1  0.1759  0.3159  -3.7041  14  11.8320
BigSet008657|Contig43302|consensus| ann  23  0.15376  1  0.2612  0.2606  -3.0541  na  na
BigSet008709|Contig24329|consensus| ann  11  0.17597  1  0.0366  0.3464  na  na  na
BigSet009001|Contig2034|Contig60403|consensus| ann|c1077420  0.15999  1  0.2549  0.2991  -3.1582  na  na
BigSet009044|Contig38527|consensus| ann  36  0.17536  2  0.0451  0.3545  -2.5604  4  54.3187
BigSet009065|Contig12544|consensus| ann  53  0.14742  1  0.3480  0.2329  -4.7010  4  56.2019
BigSet009186|Contig12013|consensus| ann  5  0.21528  1  0.1000  0.2403  na  5  50.6572
BigSet009454|Contig12902|consensus| ann  21  0.15551  1  0.2526  0.2427  -4.1370  14  19.6370
BigSet009465|Contig66197|consensus| ann  11  0.17974  1  0.2507  0.1510  -2.5604  na  na
BigSet009849|Contig63323|consensus| ann  25  0.21847  6  0.0164  0.3185  na  14  16.1410
BigSet009862|Contig35195|consensus| ann  14  0.19095  1  0.0738  0.3551  -2.9193  na  na
BigSet010068|c527416  0.17993  1  0.0554  0.3752  -3.5979  4  51.2266
BigSet010145|Contig3707|consensus| ann  72  0.14940  1  0.1991  0.4350  -3.4435  17  39.5907
BigSet010266|Contig46266|consensus| ann  8  0.18541  1  0.0639  0.3367  na  na  na
BigSet010363|Contig35728|consensus| ann  3  0.24197  1  0.1435  0.1761  -2.5604  na  na
BigSet010508|Contig16538|consensus| ann  47  0.15195  1  0.1659  0.3504  -3.7041  16  43.7818
BigSet010603|Contig3229|consensus| ann  1  0.46428  1  0.0000  0.0909  na  na  na
BigSet010609|Contig5651|consensus| ann  19  0.16163  1  0.2307  0.2588  -3.5979  na  na
BigSet010836|Contig48475|consensus| ann  7  0.22650  1  0.1595  0.2391  na  na  na
BigSet010838|Contig39589|consensus| ann  15  0.16790  1  0.2361  0.2092  -4.2356  5  58.0541
BigSet010889|Contig62840|consensus| ann  18  0.20541  1  0.2068  0.2842  na  12  63.5155
BigSet010926|Contig21227|consensus| ann  31  0.15716  1  0.2230  0.2580  -4.2356  na  na
BigSet010983|Contig1049|Contig14903|consensus| ann|c217618  0.15998  1  0.1804  0.3238  -3.2262  na  na
BigSet011333|Contig43783|consensus| ann  15  0.17129  1  0.2215  0.3077  -3.7719  6  36.3127
BigSet011397|Contig31468|consensus| ann  16  0.15815  1  0.3361  0.3858  -4.1381  na  na
BigSet011502|Contig17057|consensus| ann  11  0.18764  1  0.0000  0.4001  na  16  39.7472
BigSet011503|Contig4890|consensus| ann  27  0.15581  1  0.1634  0.3190  -4.3788  14  28.5178
BigSet011796|Contig700|Contig4895|consensus| ann|HA89.FSISY1C16JQB4V61  0.14505  1  0.1241  0.2742  -2.7532  na  na
BigSet012116|Contig8694|consensus| ann  20  0.16027  1  0.2292  0.2484  -3.5979  na  na
BigSet012177|Contig8622|consensus| ann  2  0.27998  1  0.0000  0.2854  na  4  51.0922
BigSet012762|Contig29138|consensus| ann  5  0.18842  1  0.1686  0.2180  -3.5979  2  45.7873
BigSet012802|Contig39802|consensus| ann  40  0.16054  2  0.1639  0.4145  na  na  na
BigSet013418|c50249  0.26144  3  0.0831  0.2734  -2.5604  13  47.2149
BigSet013559|Contig1329|Contig24389|consensus| ann|c7241|Contig18288|consensus| ann  26  0.15428  1  0.3080  0.2329  -2.6870  14  22.8635
BigSet013672|Contig18958|consensus| ann  5  0.32581  2  0.0628  0.1549  na  na  na
BigSet013854|HA89.CCFT8711.f  8  0.18505  1  0.1310  0.3369  na  4  53.1087
BigSet013858|Contig43421|consensus| ann  29  0.15864  1  0.1596  0.2809  -3.5979  5  53.6159
BigSet013869|Contig7051|consensus| ann  24  0.16145  1  0.2720  0.2908  -3.8393  na  na
BigSet013891|Contig29228|consensus| ann  13  0.25323  4  0.0671  0.2678  -2.5604  16  45.5293
BigSet013967|Contig5765|consensus| ann  20  0.15881  1  0.1394  0.3567  -3.2777  na  na
BigSet014098|Contig28162|consensus| ann  9  0.17754  1  0.1573  0.1854  -3.2262  na  na
BigSet014180|Contig13211|consensus| ann  15  0.19272  1  0.3180  0.1393  -2.5604  13  10.2182
BigSet014287|Contig63252|consensus| ann  19  0.18132  1  0.2142  0.2019  -4.2356  9  11.5822
BigSet014295|Contig4075|consensus| ann  64  0.14658  1  0.1900  0.4529  na  12  61.6335
BigSet014317|Contig11476|consensus| ann  87  0.14920  2  0.2152  0.4573  -4.5935  na  na
BigSet014318|Contig12802|consensus| ann  24  0.15519  1  0.2312  0.2430  -4.3144  na  na
BigSet014330|Contig26175|consensus| ann  7  0.17852  1  0.3301  0.2286  -4.0098  na  na
BigSet014408|Contig37063|consensus| ann  62  0.15820  1  0.1408  0.3325  -1.1117  na  na
BigSet014484|Contig580|consensus| ann  20  0.15808  1  0.2882  0.2703  -4.1816  13  47.7526
BigSet015073|Contig33470|consensus| ann  13  0.19570  1  0.3050  0.3117  -2.3916  na  na
BigSet015147|Contig54587|consensus| ann  7  0.18081  1  0.0000  0.3596  na  na  na
BigSet015258|Contig21787|consensus| ann  22  0.15909  1  0.1731  0.3715  -3.2963  12  61.0957
BigSet015461|Contig46107|consensus| ann  12  0.21675  3  0.0800  0.3151  -2.3135  10  45.1140
BigSet015596|Contig73| c13898|Contig25110|consensus| ann  4  0.25995  1  0.2709  0.2083  na  na  na
BigSet015615|ANN1312.gi_90460204_gb_DY922083.1_DY922083  28  0.15333  1  0.2857  0.3011  -4.4952  na  na
BigSet015728|Contig62461|consensus| ann  7  0.18536  1  0.0729  0.2996  na  na  na
BigSet015821|Contig6804|consensus| ann  86  0.14591  1  0.1784  0.4476  -3.2262  na  na
BigSet015851|Contig9367|consensus| ann  6  0.19673  1  0.1713  0.3322  -3.5979  12  54.4893
BigSet015906|Contig61207|consensus| ann  7  0.19597  1  0.1496  0.2510  -2.5604  na  na
BigSet015961|Contig66354|consensus| ann  45  0.16268  2  0.3230  0.2429  -4.4324  7  26.6285
c9329  42  0.15236  1  0.2698  0.2769  -3.1086  5  50.7908
lrc5740  10  0.24296  3  0.0339  0.3135  na  10  44.1730

Table A.3: Description of improvement candidate contigs from FST genome scan.

Contig  Number of SNPs in contig  Average FST  Number of FST outliers (q<0.05)  Linkage group  cM
BigSet001733|Contig45538|consensus| ann  95  0.1498  1  na  na
BigSet003684| c5343  16  0.1605  1  na  na
BigSet003946|Contig56| c12804|Contig27007|consensus| ann  25  0.1678  2  11  57.8588
BigSet004681|Contig820| Contig8532|consensus| ann| c1253523  0.1588  1  na  na
BigSet005700|Contig1465|Contig29295|consensus| ann| c1231627  0.2046  5  na  na
BigSet005777|Contig1744|Contig44388|consensus| ann|Contig36396|consensus| ann  26  0.1620  1  5  50.9261
BigSet006359|Contig16311|consensus| ann  22  0.1571  1  na  na
BigSet006946|Contig43256|consensus| ann  1  0.3854  1  na  na
BigSet008544|Contig41475|consensus| ann  24  0.2012  4  14  57.0593
BigSet009448|Contig15432|consensus| ann  19  0.1613  1  na  na
BigSet011693|Contig48054|consensus| ann  30  0.1649  1  8  45.3237
BigSet011768|Contig57622|consensus| ann  2  0.2881  1  5  50.9261
BigSet012883|Contig66664|consensus| ann  13  0.1741  1  14  58.6752
BigSet013961|HA89.FSISY1C15JDR4Z44  0.1608  2  na  na
BigSet014800|Contig18570|consensus| ann  19  0.1881  2  na  na

Figure A.1: Genetic diversity of domestication outliers in wild H. annuus and landraces.

Figure A.2: Tajima's D in H.
annuus landraces for all contigs and domestication outliers.

Figure A.3: Introgression of wild Helianthus alleles into modern H. annuus cultivars on linkage group 1. The y-axis indicates the amount of admixture identified from the potential parental populations.

Figure A.4: Introgression of wild Helianthus alleles into modern H. annuus cultivars on linkage group 2. The y-axis indicates the amount of admixture identified from the potential parental populations.

Figure A.5: Introgression of wild Helianthus alleles into modern H. annuus cultivars on linkage group 3. The y-axis indicates the amount of admixture identified from the potential parental populations.

Figure A.6: Introgression of wild Helianthus alleles into modern H. annuus cultivars on linkage group 4. The y-axis indicates the amount of admixture identified from the potential parental populations.

Figure A.7: Introgression of wild Helianthus alleles into modern H. annuus cultivars on linkage group 5. The y-axis indicates the amount of admixture identified from the potential parental populations.

Figure A.8: Introgression of wild Helianthus alleles into modern H. annuus cultivars on linkage group 6. The y-axis indicates the amount of admixture identified from the potential parental populations.

Figure A.9: Introgression of wild Helianthus alleles into modern H. annuus cultivars on linkage group 7. The y-axis indicates the amount of admixture identified from the potential parental populations.

Figure A.10: Introgression of wild Helianthus alleles into modern H. annuus cultivars on linkage group 8. The y-axis indicates the amount of admixture identified from the potential parental populations.

Figure A.11: Introgression of wild Helianthus alleles into modern H. annuus cultivars on linkage group 9. The y-axis indicates the amount of admixture identified from the potential parental populations.

Figure A.12: Introgression of wild Helianthus alleles into modern H. annuus cultivars on linkage group 10. The y-axis indicates the amount of admixture identified from the potential parental populations.

Figure A.13: Introgression of wild Helianthus alleles into modern H. annuus cultivars on linkage group 11. The y-axis indicates the amount of admixture identified from the potential parental populations.

Figure A.14: Introgression of wild Helianthus alleles into modern H. annuus cultivars on linkage group 12. The y-axis indicates the amount of admixture identified from the potential parental populations.

Figure A.15: Introgression of wild Helianthus alleles into modern H. annuus cultivars on linkage group 13. The y-axis indicates the amount of admixture identified from the potential parental populations.

Figure A.16: Introgression of wild Helianthus alleles into modern H. annuus cultivars on linkage group 14. The y-axis indicates the amount of admixture identified from the potential parental populations.

Figure A.17: Introgression of wild Helianthus alleles into modern H. annuus cultivars on linkage group 15. The y-axis indicates the amount of admixture identified from the potential parental populations.

Figure A.18: Introgression of wild Helianthus alleles into modern H. annuus cultivars on linkage group 16. The y-axis indicates the amount of admixture identified from the potential parental populations.

Figure A.19: Introgression of wild Helianthus alleles into modern H. annuus cultivars on linkage group 17. The y-axis indicates the amount of admixture identified from the potential parental populations.

Appendix B
Chapter 3 supplementary materials

Table B.1: Description of samples sequenced using GBS in this chapter. In the Notes column, RD stands for removed due to insufficient data, MU stands for mislabeled by the USDA (these have now all been corrected) and PA denotes samples used in the phylogenetic analyses.

Name  Alias  Collection  Species  Number of reads  Number aligned  Notes
ann01  GB011  PI 613783  H. annuus  7077009  4994645
ann02  GB013  IAF 54-46  H. annuus  423777  345206
ann03  GB015  PI 592317  H. annuus  2971888  2476825
ann04  GB016  PI 613727  H. annuus  1304960  1088777
ann05  GB020  PI 468556  H. annuus  7126077  6248054
ann06  GB025  PI 413021  H. annuus  4522383  4004476
ann07  GB029  PI 586809  H. annuus  3825298  3311965
ann08  GB031  PI 613752  H. annuus  3939970  3053977
ann09  GB032  PI 468580  H. annuus  5340616  4265102
ann10  GB034  PI 547167  H. annuus  4550235  3957362
ann11  GB035  PI 435612  H. annuus  3616772  3115208
ann12  GB036  PI 413130  H. annuus  7387815  6489894
ann13  GB037  PI 592318  H. annuus  2853124  1924796
ann14  GB041  PI 435368  H. annuus  7521461  5411862
ann15  GB042  PI 613737  H. annuus  6600627  4779173
ann16  GB043  PI 435406  H. annuus  3333506  2725130
ann17  GB044  PI 435410  H. annuus  3271597  2812165
ann18  GB047  PI 468615  H. annuus  5372433  4539927
ann19  GB048  PI 435589  H. annuus  10807194  9471310
ann20  GB049  PI 468571  H. annuus  4168117  3446731
ann21  GB050  PI 468545  H. annuus  4078456  3312620
ann22  GB051  PI 435531  H. annuus  6655311  5580373  PA
ann23  GB052  PI 468476  H.
annuus  1841553  1612873
ann24  GB053  PI 468463  H. annuus  2049729  1670425
ann25  GB054  PI 413157  H. annuus  3587173  2963491
ann26  GB098  PI 649814  H. annuus  2281050  1943669
ann27  GB099  PI 435471  H. annuus  2437077  2086032
ann28  GB100  PI 592312  H. annuus  1029069  883481
ann29  GB101  PI 586887  H. annuus  763126  634065
ann30  GB102  PI 435414  H. annuus  370936  308535
ann31  GB103  PI 435850  H. annuus  3935007  3325118
ann32  GB104  PI 468562  H. annuus  2555817  2213534
ann33  GB105  PI 435598  H. annuus  2555817  2213534
ann34  GB106  PI 435557  H. annuus  2373418  2034561
ann35  GB107  PI 586864  H. annuus  5740838  4709203
ann36  GB110  PI 468613  H. annuus  3286962  2833641
ann37  GB111  PI 468616  H. annuus  4318591  3649335
ann38  GB113  PI 649854  H. annuus  692312  558697
ann39  GB114  PI 413173  H. annuus  3813039  3291500
ann40  GB115  PI 613749  H. annuus  2254468  1964958
ann41  GB116  PI 435359  H. annuus  2934372  2386051  PA
ann42  GB117  PI 653547  H. annuus  2521385  2129236
ann43  GB118  PI 586879  H. annuus  1389716  1129383
ann44  GB119  PI 413097  H. annuus  3785107  3284248
ann45  GB120  PI 413103  H. annuus  7078967  5957977
ann46  GB121  PI 413131  H. annuus  5857620  5217191
ann47  GB122  PI 413155  H. annuus  5476381  4642530
ann48  GB123  PI 413080  H. annuus  4060571  3475011
ann49  GB124  PI 413079  H. annuus  3500458  2952445
ann50  GB125  PI 413095  H. annuus  5073307  4269540
ann51  GB126  PI 468542  H. annuus  378487  265095
ann52  GB127  PI 413120  H. annuus  3768807  3094854
ann53  GB128  PI 435456  H. annuus  2277678  1814250
ann54  GB129  PI 586853  H. annuus  4038089  3483928  PA
ann55  GB130  PI 586860  H. annuus  2412034  2017861
ann56  GB131  PI 586818  H. annuus  5672918  4901938
ann57  GB132  PI 586819  H. annuus  2846704  2431183
ann58  GB133  PI 613787  H. annuus  445713  332887
ann59  GB134  PI 435442  H. annuus  2927222  2480214
ann60  GB135  PI 435448  H. annuus  3113008  2647705
ann61  GB169  PI 435457  H. annuus  819437  673059
ann62  GB170  PI 468494  H. annuus  2715245  2388907  PA
ann63  GB171  PI 468456  H. annuus  876814  751547
ann64  GB172  PI 468512  H. annuus  1253199  1084238
ann65  GB173  PI 597901  H. annuus  3799562  3311086
ann66  GB174  PI 435841  H. annuus  1896843  1658727
ann67  GB175  PI 468457  H. annuus  1542766  1306721
ann68  GB176  PI 597890  H. annuus  2681717  2313839
ann69  GB177  PI 468596  H. annuus  3727982  3140556
ann70  GB178  PI 435534  H. annuus  1049765  905926
ann71  GB182  PI 435397  H. annuus  1715724  1503079
ann72  GB183  PI 468548  H. annuus  1609070  1212214
ann73  GB184  PI 468583  H. annuus  5759175  5054150
ann74  GB185  PI 468536  H. annuus  1717154  1270620
ann75  GB186  PI 432524  H. annuus  1128971  940297
ann76  GB187  PI 649806  H. annuus  1516969  1296430
ann77  GB188  PI 435598  H. annuus  2485725  2168307  PA
ann78  GB189  PI 413088  H. annuus  957125  839055
ann79  GB190  PI 413088  H. annuus  5762355  5116657
ann80  GB191  PI 413088  H. annuus  1730416  1531936
ann81  GB192  PI 413079  H. annuus  3105248  2727019
ann82  GB193  PI 413079  H. annuus  1000233  867525
ann83  GB194  PI 413088  H. annuus  4133253  3671222
ann84  GB195  PI 413088  H. annuus  4912596  4231261
ann85  GB198  PI 435442  H. annuus  6285152  5347245
ann86  GB199  PI 586853  H. annuus  704309  574010
ann87  GB200  PI 586853  H. annuus  4078081  3511119
ann88  GB201  PI 435442  H. annuus  1003631  827369
ann89  GB202  PI 435442  H. annuus  2941522  2526732
ann90  GB203  PI 468542  H. annuus  547  508  RD
ann91  GB205  PI 468580  H. annuus  8959020  7581333
ann92  GB206  PI 468580  H. annuus  4188487  3465669
ann93  GB014  PI 649867  H. annuus  2911929  2331896
ann94  GB026  PI 649869  H. annuus  6268420  5435536  MU
ann95  GB027  PI 649868  H. annuus  7379616  5091829  MU
ann96  GB028  PI 649867  H. annuus  4553722  3814037  MU
ann97  GB249  PI 649869  H. annuus  3857401  3309232
ann112  GB112  PI 649850  H. annuus  1254316  1085452
ann204  GB204  PI 468542  H. annuus  4533927  3864025
ann225  GB225  PI 435400  H. annuus  951498  864867
ann250  GB250  PI 649869  H. annuus  3734701  3328258  MU
ann255  GB255  PI 435483  H. annuus  5862563  5325149
ano01  GB061  PI 468638  H. anomalus  3419928  2912255  PA
ano02  GB279  PI 649860  H. anomalus  3982229  3297629  PA
ano03  GB263  PI 468642  H. anomalus  1720660  1455411  PA
ano04  GB056  B1 Goblin Valley  H. anomalus  3612131  3094524  PA
ano05  GB059  Rose B2  H. anomalus  605935  502244
ano144  GB144  PI 649860  H. anomalus  4204824  3664271
ano1495  ANO1495  PI 468638  H. anomalus  974351  852192
ano1506  ANO1506  PI 468642  H. anomalus  1075905  933672
arg01  GB004  PI 494582  H. argophyllus  2501547  2090223  PA
arg02  GB005  PI 649862  H. argophyllus  498871  431216  PA
arg03  GB006  PI 435623  H. argophyllus  5154171  4404571  PA
arg04  GB007  PI 494569  H. argophyllus  832247  717234  PA
arg05  GB008  PI 494579  H. argophyllus  1925365  1664416  PA
arg06  GB009  PI 494570  H. argophyllus  2560530  2230647
arg07  GB010  PI 435625  H. argophyllus  3870126  3343742
arg08  GB018  PI 435627  H. argophyllus  2710164  2138849
arg09  GB019  PI 435629  H. argophyllus  1790927  1425475
arg10  GB021  PI 435630  H. argophyllus  3363949  1857346
arg11  GB022  PI 664729  H. argophyllus  1988640  1554820
arg12  GB073  PI 649865  H. argophyllus  2147526  1732379
arg13  GB075  PI 664730  H. argophyllus  3796051  3116054
arg14  GB245  PI 649865  H. argophyllus  2683139  2364491
arg15  GB246  PI 649865  H. argophyllus  720839  607118
bol01  GB023  PI 435641  H. bolanderi  1385579  1185961  PA
bol02  GB074  Ames 7109  H. bolanderi  682531  561410  PA
bol03  GB145  PI 649895  H. bolanderi  2835638  2361264  PA
bol04  GB150  PI 435641  H. bolanderi  4178716  3469736  PA
bol06  GB276  Ames 27237  H. bolanderi  5994540  5197603  PA
bol330  GB330  Ames 7109  H. bolanderi  164724  143394  RD
bol338  GB338  PI 435641  H. bolanderi  247275  204725
bol339  GB339  PI 435641  H. bolanderi  8906  7278  RD
bol352  GB352  Ames 7109  H. bolanderi  17472  14274  RD
deb01  GB072  PI 435651  H. debilis  1130147  979195  PA
deb02  GB166  PI 469690  H. debilis  858017  702492  PA
deb03  GB012  PI 653609  H. debilis subsp. cucumerifolius  2270354  1636749  PA
deb04  GB024  PI 653610  H. debilis subsp. cucumerifolius  4929029  4294177  PA
deb05  GB159  PI 435673  H. debilis subsp. cucumerifolius  3392232  2626328  PA
deb06  GB163  PI 649870  H. debilis subsp. cucumerifolius  2561900  2201345
deb07  GB168  PI 435654  H. debilis subsp. cucumerifolius  6905965  5740602
deb08  GB141  PI 468671  H. debilis subsp. debilis  3872103  3276867
deb09  GB161  PI 468671  H. debilis subsp. debilis  1672475  888110
deb10  GB165  PI 649871  H. debilis subsp. debilis  569143  464742
deb11  GB162  PI 468679  H. debilis subsp. silvestris  5021295  4283083
deb12  GB164  PI 468686  H. debilis subsp. silvestris  431401  313176
deb13  GB167  PI 468680  H. debilis subsp. silvestris  298659  226319
deb14  GB140  PI 468694  H. debilis subsp. vestitus  5247209  4554926
deb15  GB160  PI 468695  H. debilis subsp. vestitus  597910  420070
dec64  DB64  PI 547170  H. decapetalus  2842894  2352426  PA
dec78  DB78  PI 468697  H. decapetalus  1375958  1139305
dec89  DB89  PI 503244  H. decapetalus  400093  319134  PA
dec186  DB186  PI 547169  H. decapetalus  2006325  1652032  PA
dec222  DB222  PI 503243  H. decapetalus  419598  348774  PA
dec314  DB314  PI 649972  H. decapetalus  1546847  1276131
des01  GB057  PI 468702  H. deserticola  4034962  3242546  PA
des02  GB066  PI 664663  H. deserticola  2777430  2331998  PA
des03  GB067  PI 649882  H. deserticola  2301238  1973583  PA
des04  GB068  PI 649880  H. deserticola  1530035  1270693  PA
des05  GB069  PI 649881  H.
deserticola 1534824 1248854 PAdes06 GB070 PI 468703 H. deserticola 2574951 2194017des07 GB071 PI 649874 H. deserticola 1892069 1564879des08 GB179 PI 649879 H. deserticola 1323979 1090995des09 GB299 PI 649878 H. deserticola 1152883 903078des10 GB275 PI 468702 H. deserticola 2368060 1939660des1476 DES1476 PI 468702 H. deserticola 601848 288496des1484 DES1484 PI 468703 H. deserticola 1299870 1111712des2463 DES2463 PI 664663 H. deserticola 3254776 731844des2526 DES2526 PI 649880 H. deserticola 284666 240988div320 DB320 PI 503218 H. divaricatus 3444068 2717039 PAdiv322 DB322 PI 664604 H. divaricatus 3029188 2462632 PAdiv324 DB324 PI 503209 H. divaricatus 3522475 1795501 PAdiv325 DB325 PI 664645 H. divaricatus 1861081 1331862 PAdiv329 DB329 PI 547174 H. divaricatus 5943051 4773358 PAexi01 GB033 PI 649898 H. exilis 5531324 3996793 PAexi02 GB060 PI 664631 H. exilis 1645618 1401110 PA167AppendixB.Chapter3supplementarymaterialsName Alias Collection Species Number ofreadsNumberalignedNotesexi03 GB077 PI 649895 H. exilis 1801404 1489259 PAexi04 GB078 PI 649889 H. exilis 1751411 1527420exi05 GB080 PI 664632 H. exilis 1306748 1078037 PAexi06 GB081 PI 649899 H. exilis 721817 479379 PAexi07 GB082 PI 649896 H. exilis 555654 464159exi08 GB083 PI 664629 H. exilis 265093 187208exi09 GB084 PI 649900 H. exilis 2810603 1629314exi10 GB158 PI 468662 H. exilis 231905 180311exi242 GB242 PI 649898 H. exilis 160953 137938gig38 DB38 PI 503223 H. giganteus 375481 301345 PAgig94 DB94 PI 547178 H. giganteus 611587 499133 PAgig291 DB291 PI 664647 H. giganteus 1002109 837903 PAgig295 DB295 PI 664710 H. giganteus 2195876 1829197 PAgig297 DB297 PI 468719 H. giganteus 1738806 1437159 PAgro114 DB114 PI 547195 H. grosseserratus 1072114 887915 PAgro118 DB118 PI 586890 H. grosseserratus 737802 612729 PAgro124 DB124 PI 547192 H. grosseserratus 846289 691875 PA168AppendixB.Chapter3supplementarymaterialsName Alias Collection Species Number ofreadsNumberalignedNotesgro129 DB129 PI 547202 H. 
grosseserratus 1698759 1361495gro209 DB209 PI 468726 H. grosseserratus 489681 399560 PAgro302 DB302 PI 468725 H. grosseserratus 670643 561595 PAhir134 DB134 PI 547204 H. hirsutus 1539062 1235583 PAhir146 DB146 PI 468735 H. hirsutus 3482074 2742391 PAhir197 DB197 PI 468739 H. hirsutus 2067972 1727208 PAhir238 DB238 PI 495610 H. hirsutus 1602433 1319384 PAmax01 GB062 PI 468747 H. maximilliani 3074873 2533477 PAmax02 GB063 PI 592333 H. maximilliani 1076185 828811 PAmax03 GB064 PI 650010 H. maximilliani 1702024 1378225 PAmax04 GB065 PI 613794 H. maximilliani 1900494 1406028 PAmax05 GB142 PI 613757 H. maximilliani 4327561 3549848max06 GB143 PI 531041 H. maximilliani 1775425 1410393 PAmax07 GB146 PI 531041 H. maximilliani 2034103 1638037max08 GB278 PI 531041 H. maximilliani 7310267 5922007max09 GB282 PI 531041 H. maximilliani 12465693 9997542max148 GB148 PI 601812 H. maximilianii 2467716 2099387169AppendixB.Chapter3supplementarymaterialsName Alias Collection Species Number ofreadsNumberalignedNotesmax277 GB277 PI 531041 H. maximilliani 410885 337155neg01 GB058 PI 468779 H. neglectus 3111217 2649508 PAneg02 GB085 PI 468770 H. neglectus 2766251 2249657 PAneg03 GB086 PI 597916 H. neglectus 1798588 1532579 PAneg04 GB151 PI 468772 H. neglectus 1751410 1499446 PAneg05 GB152 PI 468777 H. neglectus 2446212 1985450 PAneg06 GB153 PI 468767 H. neglectus 5415590 4534370neg07 GB154 PI 435768 H. neglectus 2846411 2391809neg08 GB155 PI 468764 H. neglectus 4699950 3585099neg09 GB156 PI 435769 H. neglectus 4372848 3700221neg10 GB264 PI 468770 H. neglectus 5950498 5096556neg267 GB267 PI 468776 H. neglectus 1832971 1622538neg271 GB271 PI 468780 H. neglectus 485596 427882neg273 GB273 PI 468779 H. neglectus 1440319 1286556neg301 GB301 PI 468779 H. neglectus 1143516 1005936neg307 GB307 PI 468768 H. neglectus 939428 574852neg309 GB309 PI 468780 H. 
neglectus 1087536 957995170AppendixB.Chapter3supplementarymaterialsName Alias Collection Species Number ofreadsNumberalignedNotesneg311 GB311 PI 468781 H. neglectus 644783 571302neg327 GB327 PI 435762 H. neglectus 782802 686829niv01 GB097 PI 650016 H. niveus 2244166 1831737 PAniv02 GB157 PI 613758 H. niveus 8043008 4053298 PAniv03 GB303 PI 650019 H. niveus 995593 843541 PAniv04 GB304 PI 650021 H. niveus 1771156 1517807 PAniv05 GB293 PI 435770 H. niveus 2840922 2402715 PAniv06 GB287 PI 650019 H. niveus subsp. tephrodes 5932734 5115833niv07 GB180 PI 468788 H. niveus subsp. canescens 3381431 2662096 PAniv08 GB030 PI 650017 H. niveus subsp. tephrodes 3672246 2307510niv09 GB076 PI 650021 H. niveus subsp. tephrodes 4568437 4001098niv10 GB181 PI 650018 H. niveus subsp. tephrodes 4650579 3645101niv19 GB289 PI 650021 H. niveus subsp. tephrodes 13595 6537 RSniv286 GB286 PI 650019 H. niveus subsp. tephrodes 2677306 2341113niv291 GB291 PI 435776 H. niveus 2075695 1828034niv292 GB292 PI 649905 H. niveus 345633 295907niv294 GB294 PI 435770 H. niveus 2630738 2309386171AppendixB.Chapter3supplementarymaterialsName Alias Collection Species Number ofreadsNumberalignedNotesniv295 GB295 PI 435770 H. niveus 522122 447632niv296 GB296 PI 468785 H. niveus 997906 876484niv302 GB302 PI 649905 H. niveus 254484 220134niv334 GB334 PI 649905 H. niveus 388566 330135nut01 GB001 King 140-38 H. nutallii 4899512 4034318 PAnut02 GB002 King 140-32 H. nutallii 1731352 1442899 PAnut03 GB003 King 140-32 H. nutallii 1848784 1499808 PApar04 GB055 King 141-3 H. paradoxus 3070172 2354678 PApar07 GB300 King 144-28 H. paradoxus 1572462 1302620 PApar350 GB350 King 143-1 H. paradoxus 123 111 RDpet01 GB017 IPL 54-34 H. petiolaris fallax 5075265 3547045 PApet02 GB045 PI 451978 H. petiolaris 3959219 2634652 PApet03 GB046 PI 586922 H. petiolaris subsp. petiolaris 4372462 3313186 PApet04 GB079 PI 592355 H. petiolaris subsp. petiolaris 5516296 4786509 PApet05 GB087 PI 586918 H. petiolaris subsp. 
petiolaris 3362778 2733840 PApet06 GB089 PI 613769 H. petiolaris subsp. petiolaris 2150072 1799506pet07 GB090 PI 613762 H. petiolaris subsp. petiolaris 1361706 1125369172AppendixB.Chapter3supplementarymaterialsName Alias Collection Species Number ofreadsNumberalignedNotespet08 GB091 PI 435836 H. petiolaris subsp. petiolaris 3068700 2472784pet09 GB092 PI 586932 H. petiolaris subsp. petiolaris 1025899 807505pet10 GB093 PI 435825 H. petiolaris subsp. petiolaris 551652 431316pet11 GB094 PI 592356 H. petiolaris subsp. petiolaris 3493328 2762594pet88 GB088 PI 586911 H. petiolaris subsp. petiolaris 1435545 1095316pet248 GB248 PI 435825 H. petiolaris subsp. petiolaris 976888 863587pet256 GB256 PI 613769 H. petiolaris subsp. petiolaris 1208543 1072036pet331 GB331 PI 468817 H. petiolaris fallax 489134 429146 PApet333 GB333 PI 435843 H. petiolaris 1509690 1335485 PApet346 GB346 PI 435816 H. petiolaris fallax 2485535 2172116pet349 GB349 PI 468817 H. petiolaris fallax 1414717 1227997 PApet354 GB354 PI 435843 H. petiolaris 856063 765185pra01 GB270 PI 435849 H. praecox 2819038 2414281 PApra02 GB269 PI 435849 H. praecox 5754001 4952615 PApra03 GB297 PI 435849 H. praecox 2160315 1822447 PApra04 GB095 PI 468847 H. praecox subsp. hirtus 1387621 1171925 PApra05 GB147 PI 468849 H. praecox subsp. hirtus 4571870 3923991 PA173AppendixB.Chapter3supplementarymaterialsName Alias Collection Species Number ofreadsNumberalignedNotespra06 GB096 PI 468851 H. praecox subsp. praecox 6257894 5332777pra07 GB149 PI 468851 H. praecox subsp. praecox 1376463 1185016pra270 GB326 PI 468851 H. praecox 162850 139502 RDstr52 DB52 PI 435888 H. strumosus 3555929 2849514 PAtub179 DB179 PI 547248 H. tuberosus 7421585 6171701tub242 DB242 PI 547243 H. tuberosus 4083851 2998899tub254 DB254 PI 650105 H. tuberosus 3468123 2809383tub261 DB261 PI 503279 H. tuberosus 6328054 5161185tub32 DB32 PI 547230 H. tuberosus 7270596 5832232tub69 DB69 PI 613795 H. tuberosus 2338973 1945832win01 GB038 Hwe3 H. 
winterii 6153204 3374628win02 GB039 Hwb1 H. winterii 8672124 5329407win03 GB040 Hwb12 H. winterii 2179274 1895935win04 GB136 HWE-13 H. winterii 3002645 2549942174Appendix B. Chapter 3 supplementary materials      Figure B.1: Phylogenies of Heliathus, numbers shown for branches with support in 80% bootstrapsa) neighbor joining b) parsimony.175Appendix CChapter 4 supplementary materialsTable C.1: USDA pre-bred lines evaluated at UBC.Evaluation Number Accession Number AliasE1 PI 539882 PET-PET-1741-1E2 PI 539884 PRA-RUN-417-3E3 PI 539885 PRA-RUN-417-2E4 PI 
539886 PRA-RUN-417-1E5 PI 539887 ARG-1575-4E6 PI 539888 ARG-1575-3E7 PI 539889 ARG-1575-2E8 PI 539890 ARG-1575-1E9 PI 539892 BOL-774E10 PI 539893 ANO-1509-2E11 PI 539894 ANO-1509-1E12 PI 539895 RES-834-3E13 PI 539896 RES-834-2E14 PI 539897 RES-834-1E15 PI 539899 PAR-1673-1E16 PI 539900 PAR-1673-2E17 PI 539901 PAR-1084-1E18 PI 539902 PAR-1671-2E19 PI 539903 PAR-1671-1176Appendix C. Chapter 4 supplementary materialsEvaluation Number Accession Number AliasE20 PI 539904 NEG-1255E21 PI 539905 HIR-1734-3E22 PI 539906 HIR-1734-2E23 PI 539907 HIR-1734-1E24 PI 539909 DEB-SIL-367-1E25 PI 539911 DEB-CUC-1810E26 PI 539912 DES-1474-1E27 PI 539913 DES-1474-2E28 PI 539914 DES-1474-4E29 PI 543744 cmsHA89(MAX1)E30 PI 564520 TUB-1789E31 PI 564515 TUB-365E32 PI 564517 TUB-1709-1E33 PI 564518 TUB-1709-2E34 PI 564519 TUB-1709-3E35 PI 564549 TUB-346E36 PI 596741 Rf ANN-19E37 PI 596742 Rf ANN-48E38 PI 596743 Rf ANN-783E39 PI 596744 Rf ANN-892E40 PI 596745 Rf ANN-1064E41 PI 596746 Rf ANN-1742E42 PI 596747 Rf ARG-420E43 PI 596748 Rf ARG-1575E44 PI 596749 Rf PRA-417E45 PI 596750 Rf TUB-346177Appendix C. Chapter 4 supplementary materialsEvaluation Number Accession Number AliasE46 PI 610782 GIG -1616-1E47 PI 610783 GIG -1616-2E48 PI 610784 HIR -828-1E49 PI 610785 HIR -828-2E50 PI 610786 HIR -828-3E51 PI 610787 HIR -828-4E52 PI 610788 STR -1622-1E53 PI 610789 STR -1622-2E54 PI 610790 TUB -825-1E55 PI 610791 TUB -825-2Table C.2: Description of pre-bred lines developed at UBC. Wilddonors are described in Table B.1. Lineage describes the indi-vidual plants each line was derived from, first is a short handfor the wild donor then dashes seperate the row positions of theBC2 and BC2S1 plants used, resepectively. 
The asterisk denotesselfing that took place in the greenhouse so the numbers do notrepresent actual row positions.Name Alias Wild Donor Restorerallele sourceLineage Evaluatedat NAROGenotypedUBC1 ann011-A GB011 wild donor AN011-1-1 yes yesUBC2 ann011-B GB011 wild donor AN011-1-2 yes yesUBC3 ann011-C GB011 wild donor AN011-1-3 yes yesUBC4 ann011-D GB011 wild donor AN011-1-4 yes yesUBC5 ann011-E GB011 wild donor AN011-2-1 no noUBC6 ann011-F GB011 wild donor AN011-2-2 yes yes178Appendix C. Chapter 4 supplementary materialsName Alias Wild Donor Restorerallele sourceLineage Evaluatedat NAROGenotypedUBC7 ann011-G GB011 wild donor AN011-2-3 yes yesUBC8 ann011-H GB011 wild donor AN011-2-4 yes yesUBC9 ann011-I GB011 wild donor AN011-3-1 yes yesUBC10 ann011-J GB011 wild donor AN011-3-2 yes yesUBC11 ann011-K GB011 wild donor AN011-3-3 yes yesUBC12 ann011-L GB011 wild donor AN011-4-2* yes yesUBC13 ann011-O GB011 wild donor AN011-4-3* yes yesUBC14 ann011-M GB011 wild donor AN011-4-4* yes yesUBC15 ann011-N GB011 wild donor AN011-4-1 yes yesUBC16 ann011-P GB011 RHA391 AN011-5-1 yes yesUBC17 ann011-Q GB011 RHA391 AN011-5-2 no yesUBC18 ann011-R GB011 RHA391 AN011-5-3 yes yesUBC19 ann011-S GB011 RHA391 AN011-5-4 yes yesUBC20 ann011-T GB011 RHA391 AN011-5-5 yes yesUBC21 ann025-A GB025 wild donor AN025-1-1 no noUBC22 ann025-B GB025 wild donor AN025-1-2 yes yesUBC23 ann025-C GB025 wild donor AN025-1-3 yes yesUBC24 ann025-D GB025 wild donor AN025-1-4 yes noUBC25 ann025-E GB025 wild donor AN025-1-5 yes yesUBC26 ann025-F GB025 wild donor AN025-2-2 no noUBC27 ann025-G GB025 wild donor AN025-2-3 yes yesUBC28 ann025-H GB025 wild donor AN025-2-4 no yesUBC29 ann025-I GB025 wild donor AN025-2-5 yes yesUBC30 ann025-J GB025 wild donor AN025-3-1 no noUBC31 ann025-K GB025 wild donor AN025-3-2 yes yes179Appendix C. 
UBC32  ann025-L  GB025  RHA391  AN025-4-1  yes  yes
UBC33  ann025-O  GB025  RHA391  AN025-4-2  yes  yes
UBC34  ann025-M  GB025  RHA391  AN025-4-3  yes  yes
UBC35  ann025-N  GB025  RHA391  AN025-4-4  yes  yes
UBC36  ann025-P  GB025  RHA391  AN025-4-5  yes  yes
UBC37  ann025-Q  GB025  RHA391  AN025-5-1  yes  yes
UBC38  ann025-R  GB025  RHA391  AN025-5-2  no  yes
UBC39  ann025-S  GB025  RHA391  AN025-5-3  yes  yes
UBC40  ann025-T  GB025  RHA391  AN025-5-4  yes  yes
UBC41  ann029-A  GB029  RHA391  AN029-1-1  yes  yes
UBC42  ann029-B  GB029  RHA391  AN029-1-2  yes  yes
UBC43  ann029-C  GB029  RHA391  AN029-10-1*  no  yes
UBC44  ann029-D  GB029  RHA391  AN029-10-2*  yes  yes
UBC45  ann029-E  GB029  RHA391  AN029-10-3*  no  yes
UBC46  ann029-F  GB029  wild donor  AN029-2-2  no  no
UBC47  ann029-G  GB029  wild donor  AN029-2-3  no  yes
UBC48  ann029-H  GB029  wild donor  AN029-3-1  yes  yes
UBC49  ann029-I  GB029  wild donor  AN029-3-2  yes  yes
UBC50  ann029-J  GB029  wild donor  AN029-3-3  yes  no
UBC51  ann029-K  GB029  wild donor  AN029-4-1  yes  yes
UBC52  ann029-L  GB029  RHA391  AN029-5-1  yes  yes
UBC53  ann029-O  GB029  RHA391  AN029-5-2  yes  yes
UBC54  ann029-M  GB029  RHA391  AN029-5-3  yes  yes
UBC55  ann029-N  GB029  -  AN029-6-1*  no  no
UBC56  ann029-P  GB029  RHA272  AN029-8-1*  yes  yes
UBC57  ann029-Q  GB029  RHA272  AN029-8-2*  yes  yes
UBC58  ann029-R  GB029  RHA272  AN029-8-3*  yes  yes
UBC59  ann029-S  GB029  -  AN029-9-1*  yes  yes
UBC60  ann029-T  GB029  -  AN029-9-2*  yes  yes
UBC61  ann031-A  GB031  RHA391  AN031-1-1  no  no
UBC62  ann031-B  GB031  RHA391  AN031-1-2  yes  yes
UBC63  ann031-C  GB031  RHA391  AN031-1-3  yes  yes
UBC64  ann031-D  GB031  RHA391  AN031-1-4  yes  yes
UBC65  ann031-E  GB031  RHA391  AN031-2-1  yes  yes
UBC66  ann031-F  GB031  RHA391  AN031-2-2  yes  yes
UBC67  ann031-G  GB031  RHA391  AN031-2-3  no  no
UBC68  ann031-H  GB031  RHA391  AN031-2-4  no  no
UBC69  ann031-I  GB031  RHA391  AN031-3-2  yes  yes
UBC70  ann031-J  GB031  RHA391  AN031-3-3  no  no
UBC71  ann031-K  GB031  RHA391  AN031-3-4  yes  yes
UBC72  ann031-L  GB031  RHA391  AN031-3-5  yes  yes
UBC73  ann031-O  GB031  RHA391  AN031-4-1  no  yes
UBC74  ann031-M  GB031  RHA391  AN031-4-2  no  no
UBC75  ann031-N  GB031  RHA391  AN031-4-3  yes  yes
UBC76  ann031-P  GB031  RHA391  AN031-4-5  yes  yes
UBC77  ann031-Q  GB031  RHA391  AN031-5-1  yes  yes
UBC78  ann031-R  GB031  RHA391  AN031-5-2  yes  yes
UBC79  ann031-S  GB031  RHA391  AN031-5-3  yes  yes
UBC80  ann031-T  GB031  RHA391  AN031-5-4  no  yes
UBC81  ann041-A  GB041  RHA391  AN041-2-1  yes  yes
UBC82  ann041-B  GB041  RHA391  AN041-2-2  no  yes
UBC83  ann041-C  GB041  RHA391  AN041-3-1*  yes  yes
UBC84  ann041-D  GB041  RHA391  AN041-3-2  yes  yes
UBC85  ann041-E  GB041  RHA391  AN041-3-4  yes  yes
UBC86  ann041-F  GB041  RHA391  AN041-4-1  yes  yes
UBC87  ann041-G  GB041  RHA391  AN041-4-2  yes  yes
UBC88  ann041-H  GB041  RHA391  AN041-4-3  yes  yes
UBC89  ann041-I  GB041  RHA391  AN041-5-2  no  yes
UBC90  ann041-J  GB041  RHA391  AN041-5-3  yes  yes
UBC91  ann041-K  GB041  RHA391  AN041-5-4  yes  yes
UBC92  ann041-L  GB041  RHA391  AN041-5-5  yes  yes
UBC93  ann041-O  GB041  RHA391  AN041-6-1*  yes  yes
UBC94  ann041-M  GB041  RHA391  AN041-6-2*  yes  yes
UBC95  ann041-N  GB041  RHA391  AN041-6-3*  yes  yes
UBC96  ann041-P  GB041  RHA391  AN041-7-1*  yes  yes
UBC97  ann041-Q  GB041  RHA391  AN041-7-2*  yes  yes
UBC98  ann041-R  GB041  RHA391  AN041-7-3*  yes  yes
UBC99  ann041-S  GB041  RHA391  AN041-8-1*  yes  yes
UBC100  ann041-T  GB041  RHA391  AN041-8-2*  yes  yes
UBC101  ann049-A  GB049  wild donor  AN049-1-1  yes  yes
UBC102  ann049-B  GB049  wild donor  AN049-1-2  no  no
UBC103  ann049-C  GB049  wild donor  AN049-1-3  yes  yes
UBC104  ann049-D  GB049  wild donor  AN049-1-4  no  yes
UBC105  ann049-E  GB049  RHA391  AN049-2-1  yes  yes
UBC106  ann049-F  GB049  RHA391  AN049-2-3  yes  yes
UBC107  ann049-G  GB049  RHA391  AN049-2-4  no  yes
UBC108  ann049-H  GB049  RHA391  AN049-3-1  yes  yes
UBC109  ann049-I  GB049  RHA391  AN049-3-2  yes  yes
UBC110  ann049-J  GB049  RHA391  AN049-4-1  yes  yes
UBC111  ann049-K  GB049  RHA391  AN049-4-2  yes  yes
UBC112  ann049-L  GB049  RHA391  AN049-4-3  yes  yes
UBC113  ann049-O  GB049  RHA391  AN049-5-1  yes  yes
UBC114  ann049-M  GB049  RHA391  AN049-5-2  yes  yes
UBC115  ann049-N  GB049  RHA391  AN049-5-3  no  no
UBC116  ann049-P  GB049  RHA391  AN049-5-4  yes  yes
UBC117  ann049-Q  GB049  RHA391  AN049-6-1  yes  yes
UBC118  ann049-R  GB049  RHA391  AN049-6-2  yes  yes
UBC119  ann049-S  GB049  RHA391  AN049-6-3  yes  yes
UBC120  ann049-T  GB049  RHA391  AN049-6-4  yes  yes
UBC121  ann051-A  GB051  wild donor  AN051-1-1  yes  yes
UBC122  ann051-B  GB051  wild donor  AN051-1-2  no  no
UBC123  ann051-C  GB051  wild donor  AN051-1-3  no  yes
UBC124  ann051-D  GB051  wild donor  AN051-1-4  no  no
UBC125  ann051-E  GB051  wild donor  AN051-2-1  yes  yes
UBC126  ann051-F  GB051  wild donor  AN051-2-10  yes  yes
UBC127  ann051-G  GB051  wild donor  AN051-2-11  yes  yes
UBC128  ann051-H  GB051  wild donor  AN051-2-12  yes  yes
UBC129  ann051-I  GB051  wild donor  AN051-2-2  yes  yes
UBC130  ann051-J  GB051  wild donor  AN051-3-1  yes  yes
UBC131  ann051-K  GB051  wild donor  AN051-3-4  no  yes
UBC132  ann051-L  GB051  wild donor  AN051-3-5  no  no
UBC133  ann051-O  GB051  RHA391  AN051-4-1  yes  yes
UBC134  ann051-M  GB051  wild donor  AN051-5-1  no  no
UBC135  ann051-N  GB051  wild donor  AN051-5-2  yes  yes
UBC136  ann051-P  GB051  wild donor  AN051-5-3  yes  yes
UBC137  ann098-A  GB098  wild donor  AN098-1-5*  yes  yes
UBC138  ann098-B  GB098  wild donor  AN098-1-4*  yes  yes
UBC139  ann098-C  GB098  wild donor  AN098-1-1  no  no
UBC140  ann098-D  GB098  wild donor  AN098-1-2  yes  yes
UBC141  ann098-E  GB098  wild donor  AN098-2-1  no  no
UBC142  ann098-F  GB098  wild donor  AN098-2-2  yes  yes
UBC143  ann098-G  GB098  wild donor  AN098-2-3  yes  yes
UBC144  ann098-H  GB098  wild donor  AN098-2-4  yes  yes
UBC145  ann098-I  GB098  wild donor  AN098-3-1  yes  yes
UBC146  ann098-J  GB098  RHA391  AN098-4-1  yes  yes
UBC147  ann098-K  GB098  RHA391  AN098-4-2  yes  yes
UBC148  ann098-L  GB098  RHA391  AN098-4-3  yes  yes
UBC149  ann098-O  GB098  RHA391  AN098-4-4  yes  yes
UBC150  ann098-M  GB098  RHA391  AN098-4-5  yes  yes
UBC151  ann098-N  GB098  RHA391  AN098-4-6  no  no
UBC152  ann098-P  GB098  RHA391  AN098-4-7  yes  yes
UBC153  ann098-Q  GB098  RHA391  AN098-4-8  yes  yes
UBC154  ann098-R  GB098  RHA391  AN098-5-1  yes  yes
UBC155  ann098-S  GB098  RHA391  AN098-5-3  yes  yes
UBC156  ann098-T  GB098  RHA391  AN098-5-4  yes  yes
UBC157  ann100-A  GB100  wild donor  AN100-1-1  no  yes
UBC158  ann100-B  GB100  wild donor  AN100-1-2  yes  yes
UBC159  ann100-C  GB100  wild donor  AN100-1-3  yes  yes
UBC160  ann100-D  GB100  RHA391  AN100-13-1*  yes  yes
UBC161  ann100-E  GB100  RHA391  AN100-13-2*  no  no
UBC162  ann100-F  GB100  RHA391  AN100-13-3*  yes  yes
UBC163  ann100-G  GB100  RHA391  AN100-14-1*  no  yes
UBC164  ann100-H  GB100  RHA391  AN100-14-2*  yes  yes
UBC165  ann100-I  GB100  RHA391  AN100-14-3*  yes  yes
UBC166  ann100-J  GB100  wild donor  AN100-2-1  yes  yes
UBC167  ann100-K  GB100  wild donor  AN100-2-2  yes  yes
UBC168  ann100-L  GB100  wild donor  AN100-2-4  yes  yes
UBC169  ann100-O  GB100  wild donor  AN100-3-2  no  yes
UBC170  ann100-M  GB100  wild donor  AN100-3-3  no  yes
UBC171  ann100-N  GB100  wild donor  AN100-3-4  no  no
UBC172  ann100-P  GB100  wild donor  AN100-3-6  no  no
UBC173  ann100-Q  GB100  wild donor  AN100-4-3  yes  yes
UBC174  ann100-R  GB100  wild donor  AN100-5-1  yes  yes
UBC175  ann100-S  GB100  wild donor  AN100-5-3  yes  yes
UBC176  ann100-T  GB100  wild donor  AN100-5-4  no  yes
UBC177  ann103-A  GB103  wild donor  AN103-1-1  yes  yes
UBC178  ann103-B  GB103  wild donor  AN103-1-3  no  no
UBC179  ann103-C  GB103  wild donor  AN103-1-4  no  no
UBC180  ann103-D  GB103  wild donor  AN103-2-3  no  yes
UBC181  ann103-E  GB103  wild donor  AN103-2-5  no  yes
UBC182  ann103-F  GB103  wild donor  AN103-3-1  yes  yes
UBC183  ann103-G  GB103  wild donor  AN103-3-3  yes  yes
UBC184  ann103-H  GB103  wild donor  AN103-3-4  yes  yes
UBC185  ann103-I  GB103  wild donor  AN103-4-1  no  no
UBC186  ann103-J  GB103  wild donor  AN103-4-3  no  yes
UBC187  ann103-K  GB103  wild donor  AN103-4-5  no  no
UBC188  ann103-L  GB103  wild donor  AN103-4-7  yes  yes
UBC189  ann103-O  GB103  wild donor  AN103-5-1  no  no
UBC190  ann103-M  GB103  wild donor  AN103-5-3  yes  yes
UBC191  ann106-A  GB106  wild donor  AN106-1-1  yes  yes
UBC192  ann106-B  GB106  RHA391  AN106-11-1*  yes  yes
UBC193  ann106-C  GB106  RHA391  AN106-11-2*  yes  yes
UBC194  ann106-D  GB106  RHA391  AN106-11-3*  yes  yes
UBC195  ann106-E  GB106  RHA391  AN106-11-4*  yes  yes
UBC196  ann106-F  GB106  RHA391  AN106-11-5*  yes  yes
UBC197  ann106-G  GB106  RHA391  AN106-2-4*  yes  yes
UBC198  ann106-H  GB106  RHA391  AN106-2-5*  no  yes
UBC199  ann106-I  GB106  RHA391  AN106-2-1  yes  yes
UBC200  ann106-J  GB106  RHA391  AN106-2-2  yes  yes
UBC201  ann106-K  GB106  RHA391  AN106-2-3  yes  yes
UBC202  ann106-L  GB106  RHA391  AN106-3-1  yes  yes
UBC203  ann106-O  GB106  RHA391  AN106-3-2  no  no
UBC204  ann106-M  GB106  RHA391  AN106-3-3  no  yes
UBC205  ann106-N  GB106  RHA391  AN106-3-4  yes  yes
UBC206  ann106-P  GB106  RHA391  AN106-4-2  no  no
UBC207  ann106-Q  GB106  RHA391  AN106-4-3  yes  yes
UBC208  ann106-R  GB106  RHA391  AN106-4-4  no  no
UBC209  ann106-S  GB106  RHA391  AN106-4-5  yes  yes
UBC210  ann106-T  GB106  RHA391  AN106-4-7  no  no
UBC211  ann121-A  GB121  RHA391  AN121-1-1  yes  yes
UBC212  ann121-B  GB121  RHA391  AN121-1-2  no  no
UBC213  ann121-C  GB121  RHA391  AN121-1-3  no  yes
UBC214  ann121-D  GB121  RHA391  AN121-1-4  yes  yes
UBC215  ann121-E  GB121  RHA391  AN121-2-1  yes  yes
UBC216  ann121-F  GB121  RHA391  AN121-2-2  yes  yes
UBC217  ann121-G  GB121  RHA391  AN121-2-3  no  no
UBC218  ann121-H  GB121  RHA391  AN121-2-4  yes  yes
UBC219  ann121-I  GB121  RHA391  AN121-3-1  yes  yes
UBC220  ann121-J  GB121  RHA391  AN121-3-2  yes  yes
UBC221  ann121-K  GB121  RHA391  AN121-3-3  no  no
UBC222  ann121-L  GB121  RHA391  AN121-3-4  no  yes
UBC223  ann121-O  GB121  RHA391  AN121-4-1  yes  yes
UBC224  ann121-M  GB121  RHA391  AN121-4-2  no  no
UBC225  ann121-N  GB121  RHA391  AN121-4-3  yes  yes
UBC226  ann121-P  GB121  RHA391  AN121-4-4  yes  yes
UBC227  ann121-Q  GB121  RHA391  AN121-5-11  yes  yes
UBC228  ann121-R  GB121  RHA391  AN121-5-3  yes  yes
UBC229  ann121-S  GB121  RHA391  AN121-5-5  no  no
UBC230  ann121-T  GB121  RHA391  AN121-5-6  yes  yes
UBC231  ann125-A  GB125  wild donor  AN125-1-2  yes  yes
UBC232  ann125-B  GB125  wild donor  AN125-1-3  yes  yes
UBC233  ann125-C  GB125  wild donor  AN125-1-5  yes  yes
UBC234  ann125-D  GB125  RHA391  AN125-10-1*  yes  yes
UBC235  ann125-E  GB125  RHA391  AN125-10-2*  yes  yes
UBC236  ann125-F  GB125  RHA391  AN125-10-3*  yes  yes
UBC237  ann125-G  GB125  RHA391  AN125-11-1*  yes  yes
UBC238  ann125-H  GB125  RHA391  AN125-11-2*  yes  yes
UBC239  ann125-I  GB125  RHA391  AN125-11-3*  no  yes
UBC240  ann125-J  GB125  RHA391  AN125-2-1  yes  yes
UBC241  ann125-K  GB125  RHA391  AN125-2-3  yes  yes
UBC242  ann125-L  GB125  RHA391  AN125-2-5  yes  yes
UBC243  ann125-O  GB125  wild donor  AN125-3-1  yes  yes
UBC244  ann125-M  GB125  wild donor  AN125-3-3  yes  yes
UBC245  ann125-N  GB125  wild donor  AN125-3-6  no  yes
UBC246  ann125-P  GB125  RHA391  AN125-4-1  no  no
UBC247  ann125-Q  GB125  RHA391  AN125-4-2  no  no
UBC248  ann125-R  GB125  RHA391  AN125-4-4  yes  yes
UBC249  ann125-S  GB125  RHA391  AN125-4-5  yes  yes
UBC250  ann125-T  GB125  wild donor  AN125-5-1  yes  yes
UBC251  ann132-A  GB132  wild donor  AN132-1-2  no  yes
UBC252  ann132-B  GB132  wild donor  AN132-1-3  no  no
UBC253  ann132-C  GB132  wild donor  AN132-1-4  yes  yes
UBC254  ann132-D  GB132  wild donor  AN132-1-6  yes  yes
UBC255  ann132-E  GB132  wild donor  AN132-2-4  yes  yes
UBC256  ann132-F  GB132  wild donor  AN132-2-5  no  yes
UBC257  ann132-G  GB132  wild donor  AN132-3-1  yes  yes
UBC258  ann132-H  GB132  wild donor  AN132-3-2  yes  yes
UBC259  ann132-I  GB132  wild donor  AN132-3-3  no  yes
UBC260  ann132-J  GB132  wild donor  AN132-4-1  no  yes
UBC261  ann132-K  GB132  wild donor  AN132-4-2  no  no
UBC262  ann132-L  GB132  wild donor  AN132-4-3  yes  yes
UBC263  ann132-O  GB132  wild donor  AN132-4-4  no  yes
UBC264  ann132-M  GB132  wild donor  AN132-5-1  yes  yes
UBC265  ann132-N  GB132  wild donor  AN132-5-2  yes  yes
UBC266  ann132-P  GB132  wild donor  AN132-5-3  no  no
UBC267  ann132-Q  GB132  wild donor  AN132-7-1*  yes  yes
UBC268  ann132-R  GB132  wild donor  AN132-7-2*  yes  yes
UBC269  ann132-S  GB132  wild donor  AN132-8-1*  no  yes
UBC270  ann132-T  GB132  wild donor  AN132-8-2*  yes  yes
UBC271  ano061-A  GB061  wild donor  AO061-2-1*  yes  yes
UBC272  ano061-B  GB061  wild donor  AO061-2-2*  yes  yes
UBC273  ano061-C  GB061  wild donor  AO061-2-3*  yes  yes
UBC274  arg019-A  GB019  wild donor  AR019-1-1  yes  yes
UBC275  arg019-B  GB019  wild donor  AR019-1-2  yes  yes
UBC276  arg019-C  GB019  wild donor  AR019-1-3  yes  yes
UBC277  arg019-D  GB019  wild donor  AR019-1-4  no  yes
UBC278  arg019-E  GB019  wild donor  AR019-2-2  yes  yes
UBC279  arg019-F  GB019  wild donor  AR019-2-3  yes  yes
UBC280  arg019-G  GB019  wild donor  AR019-2-5  yes  yes
UBC281  arg019-H  GB019  wild donor  AR019-3-1  yes  yes
UBC282  arg019-I  GB019  wild donor  AR019-3-2  yes  yes
UBC283  arg019-J  GB019  wild donor  AR019-3-3  yes  yes
UBC284  arg019-K  GB019  wild donor  AR019-3-4  yes  yes
UBC285  arg019-L  GB019  wild donor  AR019-4-1  yes  yes
UBC286  arg019-O  GB019  wild donor  AR019-4-2  yes  yes
UBC287  arg019-M  GB019  wild donor  AR019-4-3  no  no
UBC288  arg019-N  GB019  wild donor  AR019-4-4  no  yes
UBC289  arg019-P  GB019  wild donor  AR019-5-1  no  yes
UBC290  arg019-Q  GB019  wild donor  AR019-5-5  yes  yes
UBC291  arg019-R  GB019  wild donor  AR019-5-6  yes  yes
UBC292  arg019-S  GB019  wild donor  AR019-6-1  yes  yes
UBC293  arg019-T  GB019  wild donor  AR019-6-2  yes  yes
UBC294  ann118-A  GB118  wild donor  AR118-1-1  yes  yes
UBC295  ann118-B  GB118  wild donor  AR118-1-3  yes  yes
UBC296  ann118-C  GB118  wild donor  AR118-1-4  no  yes
UBC297  ann118-D  GB118  wild donor  AR118-2-2  yes  yes
UBC298  ann118-E  GB118  wild donor  AR118-2-3  no  no
UBC299  ann118-F  GB118  wild donor  AR118-2-4  yes  yes
UBC300  ann118-G  GB118  wild donor  AR118-2-5  no  no
UBC301  ann118-H  GB118  wild donor  AR118-2-6  yes  yes
UBC302  ann118-I  GB118  wild donor  AR118-3-2  no  no
UBC303  ann118-J  GB118  wild donor  AR118-3-3  no  no
UBC304  ann118-K  GB118  wild donor  AR118-3-4  no  no
UBC305  ann118-L  GB118  wild donor  AR118-3-5  yes  yes
UBC306  ann118-O  GB118  wild donor  AR118-3-6  yes  yes
UBC307  ann118-M  GB118  wild donor  AR118-4-1  yes  yes
UBC308  ann118-N  GB118  wild donor  AR118-4-2  yes  yes
UBC309  ann118-P  GB118  wild donor  AR118-4-3  yes  yes
UBC310  ann118-Q  GB118  wild donor  AR118-5-2  yes  yes
UBC311  ann118-R  GB118  wild donor  AR118-5-4  yes  yes
UBC312  ann118-S  GB118  wild donor  AR118-5-6  no  yes
UBC313  ann118-T  GB118  wild donor  AR118-5-7  yes  yes
UBC314  bol023-A  GB023  RHA391  BL023-1-1*  yes  yes
UBC315  bol023-B  GB023  RHA391  BL023-1-2*  yes  yes
UBC316  bol023-C  GB023  RHA391  BL023-1-3*  yes  yes
UBC317  bol023-D  GB023  RHA391  BL023-1-4*  yes  yes
UBC318  bol023-E  GB023  RHA391  BL023-1-5*  yes  yes
UBC319  deb012-A  GB012  RHA391  DB012-1-1  no  yes
UBC320  deb012-B  GB012  wild donor  DB012-1-2  yes  yes
UBC321  deb012-C  GB012  wild donor  DB012-1-3  yes  yes
UBC322  deb012-D  GB012  wild donor  DB012-1-4  yes  yes
UBC323  deb012-E  GB012  wild donor  DB012-1-5  yes  yes
UBC324  deb012-F  GB012  RHA391  DB012-13-1*  yes  yes
UBC325  deb012-G  GB012  RHA391  DB012-13-2*  yes  yes
UBC326  deb012-H  GB012  RHA391  DB012-14-1*  no  no
UBC327  deb012-I  GB012  wild donor  DB012-2-1  yes  yes
UBC328  deb012-J  GB012  wild donor  DB012-2-2  yes  yes
UBC329  deb012-K  GB012  wild donor  DB012-3-1  yes  yes
UBC330  deb012-L  GB012  wild donor  DB012-3-2  no  yes
UBC331  deb012-O  GB012  wild donor  DB012-3-4  yes  yes
UBC332 | deb012-M | GB012 | wild donor | db012-3-4 | yes | yes
UBC333 | deb012-N | GB012 | wild donor | DB012-4-3* | no | no
UBC334 | deb012-P | GB012 | wild donor | DB012-4-1 | yes | yes
UBC335 | deb012-Q | GB012 | wild donor | DB012-4-2 | yes | yes
UBC336 | deb012-R | GB012 | wild donor | DB012-5-1 | yes | yes
UBC337 | exi077-A | GB077 | wild donor | EX077-1-1 | yes | yes
UBC338 | exi077-B | GB077 | wild donor | EX077-1-2 | yes | yes
UBC339 | exi083-A | GB083 | wild donor | EX083-1-1* | yes | yes
UBC340 | exi083-B | GB083 | wild donor | EX083-1-2* | yes | yes
UBC341 | exi083-C | GB083 | wild donor | EX083-1-3* | yes | yes
UBC342 | exi083-D | GB083 | wild donor | EX083-1-4* | yes | yes
UBC343 | exi083-E | GB083 | wild donor | EX083-2-1* | yes | yes
UBC344 | exi083-F | GB083 | wild donor | EX083-2-2* | yes | yes
UBC345 | exi083-G | GB083 | wild donor | EX083-2-3* | yes | yes
UBC346 | exi083-H | GB083 | wild donor | EX083-2-4* | no | yes
UBC347 | ano061-D | GB061 | wild donor | AO061-3-1* | yes | yes
UBC348 | ano061-E | GB061 | wild donor | AO061-3-2* | yes | yes
UBC349 | ano061-F | GB061 | wild donor | AO061-3-3* | yes | yes
UBC350 | HA89M | N/A | N/A | N/A | yes | yes
UBC351 | neg086-A | GB086 | wild donor | NG086-1-1 | yes | yes
UBC352 | neg086-B | GB086 | wild donor | NG086-1-2 | yes | yes
UBC353 | neg086-C | GB086 | wild donor | NG086-1-4 | no | yes
UBC354 | neg086-D | GB086 | wild donor | NG086-2-1* | yes | yes
UBC355 | neg086-E | GB086 | wild donor | NG086-2-2* | yes | yes
UBC356 | neg086-F | GB086 | wild donor | NG086-2-3* | yes | yes
UBC357 | neg086-G | GB086 | wild donor | NG086-3-1* | yes | yes
UBC358 | neg086-H | GB086 | wild donor | NG086-3-2* | yes | yes
UBC359 | neg154-A | GB154 | wild donor | NG154-2-1 | yes | yes
UBC360 | neg154-B | GB154 | wild donor | NG154-2-2 | yes | yes
UBC361 | neg154-C | GB154 | wild donor | NG154-2-3 | yes | yes
UBC362 | neg154-D | GB154 | wild donor | NG154-2-4 | yes | yes
UBC363 | neg154-E | GB154 | wild donor | NG154-2-5 | yes | yes
UBC364 | neg154-F | GB154 | RHA391 | NG154-3-1 | yes | yes
UBC365 | neg154-G | GB154 | RHA391 | NG154-3-2 | no | yes
UBC366 | neg154-H | GB154 | RHA391 | NG154-3-3 | no | no
UBC367 | neg154-I | GB154 | RHA391 | NG154-3-4 | yes | yes
UBC368 | neg154-J | GB154 | RHA391 | NG154-4-1 | yes | yes
UBC369 | neg154-K | GB154 | RHA391 | NG154-4-2 | yes | yes
UBC370 | neg154-L | GB154 | RHA391 | NG154-4-3 | yes | yes
UBC371 | neg154-O | GB154 | RHA391 | NG154-4-4 | yes | yes
UBC372 | neg154-M | GB154 | RHA391 | NG154-4-5 | no | no
UBC373 | neg154-N | GB154 | RHA391 | NG154-5-1 | yes | yes
UBC374 | neg154-P | GB154 | RHA391 | NG154-5-2 | yes | yes
UBC375 | neg154-Q | GB154 | RHA391 | NG154-5-3 | yes | yes
UBC376 | neg154-R | GB154 | RHA391 | NG154-5-4 | no | no
UBC377 | neg154-S | GB154 | RHA391 | NG154-5-5 | yes | yes
UBC378 | neg154-T | GB154 | RHA391 | NG154-5-6 | yes | yes
UBC379 | niv097-A | GB097 | RHA391 | NV097-1-1 | yes | yes
UBC380 | niv097-B | GB097 | RHA391 | NV097-1-2 | yes | yes
UBC381 | pra096-A | GB096 | RHA391 | PR096-1-1 | no | no
UBC382 | pra096-B | GB096 | RHA391 | PR096-1-2 | no | no
UBC383 | pra096-C | GB096 | RHA391 | PR096-1-3 | yes | yes
UBC384 | pra096-D | GB096 | RHA391 | pr096-1-4 | yes | yes
UBC385 | pra096-E | GB096 | RHA391 | PR096-1-5 | yes | yes
UBC386 | pet091-A | GB091 | wild donor | PT091-1-1 | no | yes
UBC387 | pet091-B | GB091 | wild donor | PT091-1-2 | yes | yes
UBC388 | pet091-C | GB091 | wild donor | PT091-2-1 | no | no
UBC389 | pet091-D | GB091 | wild donor | PT091-2-3 | yes | yes
UBC390 | pet091-E | GB091 | wild donor | PT091-3-1 | yes | yes
UBC391 | pet091-F | GB091 | RHA391 | PT091-4-1 | yes | yes
UBC392 | pet091-G | GB091 | RHA391 | pt091-4-2 | yes | yes
UBC393 | pet091-H | GB091 | RHA391 | PT091-4-4 | yes | yes
UBC394 | pet091-I | GB091 | RHA391 | PT091-4-5 | yes | yes
UBC395 | pet091-J | GB091 | RHA272 | PT091-5-1 | yes | yes
UBC396 | pet091-K | GB091 | RHA272 | PT091-5-2 | yes | yes
UBC397 | pet091-L | GB091 | RHA272 | PT091-5-3 | yes | yes
UBC398 | pet091-O | GB091 | RHA272 | PT091-5-4 | yes | yes
UBC399 | pet091-M | GB091 | RHA272 | PT091-5-6 | yes | yes
UBC400 | RHA391 | N/A | N/A | N/A | yes | yes
UBC401 | win038-A | GB038 | RHA391 | WN038-2-1* | no | yes
UBC402 | win038-B | GB038 | RHA391 | WN038-2-2* | yes | yes
UBC403 | win038-C | GB038 | RHA391 | WN038-3-1* | yes | yes
UBC404 | win038-D | GB038 | RHA391 | WN038-3-2* | no | no
UBC405 | win038-E | GB038 | RHA391 | WN038-5-1* | no | yes
UBC406 | win038-F | GB038 | RHA391 | WN038-5-2* | yes | yes
UBC407 | win038-G | GB038 | RHA391 | WN038-6-1* | yes | yes
UBC408 | win038-H | GB038 | RHA391 | WN038-6-3* | yes | yes
UBC409 | win039-A | GB039 | RHA391 | WN039-1-1 | yes | yes
UBC410 | win039-B | GB039 | RHA391 | WN039-1-2 | yes | yes
UBC411 | win039-C | GB039 | RHA391 | WN039-1-3 | yes | yes
UBC412 | win039-D | GB039 | RHA391 | WN039-1-4 | no | no
UBC413 | win039-E | GB039 | RHA391 | WN039-3-1 | yes | yes
UBC414 | win039-F | GB039 | RHA391 | WN039-3-2 | yes | yes
UBC415 | win039-G | GB039 | RHA391 | WN039-3-3 | yes | yes
UBC416 | win039-H | GB039 | RHA391 | WN039-3-4 | yes | yes
UBC417 | win039-I | GB039 | RHA299 | WN039-4-1 | yes | yes
UBC418 | win039-J | GB039 | RHA299 | WN039-4-3 | yes | yes
UBC419 | win039-K | GB039 | RHA299 | WN039-4-5 | yes | yes
UBC420 | win039-L | GB039 | RHA299 | WN039-5-1 | no | no
UBC421 | win039-O | GB039 | RHA299 | WN039-5-2 | yes | yes
UBC422 | win039-M | GB039 | RHA299 | WN039-5-3 | yes | yes
UBC423 | win039-N | GB039 | RHA299 | WN039-5-4 | yes | yes
UBC424 | win039-P | GB039 | RHA391 | WN039-6-1 | yes | yes
UBC425 | win039-Q | GB039 | RHA391 | WN039-6-2 | no | no
UBC426 | win039-R | GB039 | RHA391 | WN039-6-3 | yes | yes
UBC427 | win039-S | GB039 | RHA391 | WN039-6-5 | no | no
UBC428 | win039-T | GB039 | RHA391 | WN039-7-1 | no | no

[Figure C.1 image omitted. Axis labels: TotalBranches, SeedWeight, HeadDiam. Legend "Wild Donor": H. annuus, H. anomalus, H. argophyllus, H. bolanderi, H. debilis, H. deserticola, H. giganteus, H. hirsutus, H. maximiliani, H. neglectus, H. paradoxus, H. petiolaris, H. praecox, H. resinosus, H. strumosus, H. tuberosus, HA89.]

Figure C.1: Seed weight and branching of previously pre-bred lines. Each trait is presented as a BLUP from evaluation at UBC in 2012. The HA89 depicted contains no wild introgressions.

[Figure C.2 image omitted. Panels: UBC, NaSARRI. Axis labels: Days to flowering, Count.]

Figure C.2: Days to flowering for previously pre-bred lines at UBC and NaSARRI. Evaluation at UBC included 55 previously pre-bred lines and NaSARRI evaluations included the pre-bred lines developed at UBC. Days to flowering was recorded for individual plants at UBC, whereas it was scored at NaSARRI when 50% of the plants in a row flowered.

Appendix D
Chapter 5 supplementary materials

Table D.1: Sunflower association population used for WGS.

SAM name | Accession number | Source | Name | Group | Approximate release date | Number of reads aligned | Estimated sequencing depth
Hopi | PI 432505 | USDA | Hopi | Landrace | N/A | 57,263,565,118 | 16
PPN001 | NSL 202853 | USDA | HA 851 | B-line | 1985 | 13,310,867,128 | 4
PPN002 | NSL 202855 | USDA | HA 853 | B-line | 1985 | 43,852,272,636 | 12
PPN003 | NSL 208771 | USDA | HA 323 | B-line | 1985 | 45,183,730,895 | 13
PPN004 | PI 509061 | USDA | HA 351 | B-line | 1987 | 16,795,360,999 | 5
PPN005 | PI 509062 | USDA | HA 352 | B-line | 1988 | 13,702,270,789 | 4
PPN006 | PI 552932 | USDA | HA 286 | B-line | 1974 | 40,452,990,933 | 11
PPN007 | PI 552940 | USDA | HA 302 | B-line | 1979 | 14,784,022,699 | 4
PPN008 | PI 560144 | USDA | RHA 376 | R-line | 1992 | 24,601,874,597 | 7
PPN009 | PI 560145 | USDA | RHA 377 | R-line | 1992 | 40,710,356,739 | 11
PPN010 | PI 561920 | USDA | HA 380 | B-line | 1993 | 41,248,054,607 | 11
PPN011 | PI 561921 | USDA | RHA 381 | R-line | 1993 | 38,799,173,640 | 11
PPN012 | PI 578008 | USDA | RHA 386 | R-line | 1994 | 40,068,720,164 | 11
PPN013 | PI 578009 | USDA | RHA 387 | R-line | 1994 | 25,248,626,832 | 7
PPN014 | PI 578010 | USDA | RHA 388 | R-line | 1994 | 14,981,208,896 | 4
PPN015 | PI 578011 | USDA | RHA 389 | R-line | 1994 | 17,162,860,384 | 5
PPN016 | PI 578872 | USDA | HA 383 | B-line | 1995 | 16,616,907,367 | 5
PPN017 | PI 578873 | USDA | HA 384 | B-line | 1995 | 14,899,738,499 | 4
PPN018 | PI 597367 | USDA | HA 403 | B-line | 1997 | 15,628,245,113 | 4
PPN020 | PI 597368 | USDA | HA 404 | B-line | 1997 | 40,365,307,442 | 11
PPN021 | PI 597369 | USDA | HA 405 | B-line | 1997 | 39,350,317,195 | 11
PPN022 | PI 597373 | USDA | RHA 396 | R-line | 1997 | 29,100,545,692 | 8
PPN023 | PI 597374 | USDA | RHA 397 | R-line | N/A | 41,070,919,192 | 11
PPN024 | PI 599758 | USDA | RHA 273 | R-line | 1975 | 15,932,796,900 | 4
PPN025 | PI 599766 | USDA | RHA 298 | R-line | 1979 | 37,709,723,062 | 10
PPN026 | PI 599775 | USDA | HA 124 | B-line | 1971 | 35,765,856,839 | 10
PPN027 | PI 599783 | USDA | HA 314 | B-line | 1985 | 35,653,741,976 | 10
PPN028 | PI 599976 | USDA | HA 306 | B-line | 1981 | 16,063,738,676 | 4
PPN029 | PI 599983 | USDA | HA 313 | B-line | 1985 | 14,204,857,435 | 4
PPN030 | PI 607920 | USDA | R-185 | R-line | 2000 | 12,443,472,218 | 3
PPN031 | PI 607921 | USDA | R-188 | R-line | 2000 | 12,486,027,741 | 3
PPN032 | PI 607923 | USDA | R-201 | R-line | 2000 | 17,842,156,855 | 5
PPN033 | PI 633746 | USDA | RHA 436 | R-line | 2004 | 34,650,885,682 | 10
PPN034 | PI 633747 | USDA | RHA 437 | R-line | 2004 | 33,466,933,541 | 9
PPN035 | PI 650575 | USDA | HA 112 | B-line | 1985 | 40,058,733,413 | 11
PPN036 | PI 650579 | USDA | HA 116 | B-line | 1985 | 23,234,337,640 | 6
PPN037 | PI 650582 | USDA | HA 133 | B-line | 1985 | 14,634,300,786 | 4
PPN038 | PI 650612 | USDA | HA 113 | B-line | 1971 | 10,321,599,627 | 3
PPN039 | PI 509060 | USDA | HA 350 | B-line | 1987 | 10,321,599,627 | 3
PPN040 | PI 534656 | USDA | HA 370 | B-line | 1990 | 24,132,639,906 | 7
PPN041 | PI 543746 | USDA | RGIG1 | R-line | 1991 | 15,479,027,203 | 4
PPN042 | PI 599768 | USDA | RHA 801 | R-line | 1981 | 15,145,549,170 | 4
PPN043 | PI 650754 | USDA | HA-R3 | B-line | 1985 | 13,249,732,189 | 4
PPN045 | NSL 208772 | USDA | ND-BLPL2 | Other | N/A | 13,340,100,926 | 4
PPN046 | PI 600717 | USDA | Mandan#1 | Landrace | N/A | 21,722,644,027 | 6
PPN047 | PI 642771 | USDA | HA 452 | B-line | 2006 | 14,781,022,504 | 4
PPN048 | Ames 22511 | USDA | Klein Casares | OPV | N/A | 12,974,919,588 | 4
PPN049 | PI 642774 | USDA | RHA 455 | R-line | 2006 | 13,148,345,056 | 4
PPN050 | PI 642775 | USDA | HA 456 | B-line | 2006 | 18,363,366,075 | 5
PPN051 | PI 340790 | USDA | Ames 1138 | OPV | N/A | 15,583,874,199 | 4
PPN052 | PI 642776 | USDA | HA 457 | B-line | 2006 | 16,521,459,612 | 5
PPN053 | PI 642777 | USDA | HA 412 HO | B-line | 2006 | 29,627,706,212 | 8
PPN054 | PI 655014 | USDA | RHA 463 | R-line | N/A | 30,283,875,722 | 8
PPN055 | PI 655015 | USDA | RHA 464 | R-line | N/A | 15,441,454,709 | 4
PPN056 | NSL 202282 | USDA | RHA 324 | R-line | 1986 | 36,571,616,337 | 10
PPN057 | NSL 202859 | USDA | RHA 855 | R-line | 1987 | 28,574,795,402 | 8
PPN059 | PI 509059 | USDA | HA 349 | B-line | 1987 | 14,708,747,702 | 4
PPN060 | PI 531072 | USDA | RHA 359 | R-line | 1989 | 29,773,554,934 | 8
PPN061 | PI 534650 | USDA | RHA 364 | R-line | 1990 | 13,391,971,240 | 4
PPN062 | PI 534657 | USDA | HA 371 | B-line | 1990 | 35,094,641,692 | 10
PPN063 | PI 549001 | USDA | HA germ. pool III-G | Other | N/A | 32,991,068,280 | 9
PPN064 | PI 549011 | USDA | HA germ. pool III-Q | Other | N/A | 18,671,240,173 | 5
PPN065 | PI 552931 | USDA | RHA 296 | R-line | 1979 | 19,178,031,324 | 5
PPN066 | PI 552948 | USDA | DM-3 | Other | 1985 | 30,746,720,177 | 9
PPN067 | PI 561918 | USDA | HA 378 | B-line | 1993 | 18,622,154,032 | 5
PPN068 | PI 599753 | USDA | HA 154 | B-line | 1985 | 34,019,220,003 | 9
PPN069 | PI 639165 | USDA | HA 442 | B-line | 2006 | 32,387,985,269 | 9
PPN070 | PI 599780 | USDA | HA 285 | B-line | 1974 | 32,690,929,542 | 9
PPN071 | PI 599781 | USDA | HA 289 | B-line | 1975 | 31,909,287,981 | 9
PPN072 | PI 599980 | USDA | RHA 309 | R-line | 1983 | 33,478,106,425 | 9
PPN073 | PI 601368 | USDA | PHA009 | Other | N/A | 13,214,895,166 | 4
PPN074 | PI 650588 | USDA | HA 228 | B-line | 1985 | 32,801,508,570 | 9
PPN075 | NSL 166209 | USDA | Ha germ. Pool I | Other | N/A | 15,300,320,367 | 4
PPN076 | PI 599982 | USDA | HA 312 | B-line | 1985 | 14,960,113,622 | 4
PPN077 | NSL 202852 | USDA | HA 850 | B-line | 1985 | 14,229,634,791 | 4
PPN078 | NSL 208770 | USDA | HA 322 | B-line | 1985 | 32,509,701,531 | 9
PPN079 | PI 548996 | USDA | HA germ. pool III-B | Other | N/A | 27,276,845,012 | 8
PPN080 | PI 552933 | USDA | HA 287 | B-line | 1974 | 19,499,964,977 | 5
PPN081 | PI 560143 | USDA | RHA 375 | R-line | N/A | 15,799,726,616 | 4
PPN082 | NSL 202275 | USDA | ND-NONOIL B2 | Other | N/A | 24,127,832,536 | 7
PPN083 | PI 432504 | USDA | Hopi dye | Landrace | N/A | 15,121,024,392 | 4
PPN084 | NSL 166210 | USDA | Ha germ. Pool II | Other | N/A | 17,735,415,930 | 5
PPN086 | NSL 176425 | USDA | HA germ. pool V-1I | Other | N/A | 20,225,856,736 | 6
PPN087 | NSL 176427 | USDA | HA germ. pool V-1K | Other | N/A | 20,156,303,092 | 6
PPN088 | NSL 176432 | USDA | HAGPPV.2 | Other | N/A | 22,041,299,993 | 6
PPN089 | NSL 202272 | USDA | NDNONOIL2 | Other | N/A | 15,902,198,590 | 4
PPN090 | PI 655009 | USDA | HA 458 | B-line | N/A | 18,088,675,658 | 5
PPN091 | NSL 202273 | USDA | ND NONOIL 3 | Other | N/A | 33,816,502,326 | 9
PPN092 | NSL 202276 | USDA | ND NONOIL B3 | Other | N/A | 21,190,842,065 | 6
PPN093 | PI 386230 | USDA | VIR 847 | OPV | N/A | 19,726,681,318 | 5
PPN094 | PI 476853 | USDA | Mammoth | OPV | N/A | 11,121,243,529 | 3
PPN095 | NSL 202278 | USDA | ND NONOIL B5 | Other | N/A | 26,371,360,432 | 7
PPN096 | PI 655013 | USDA | RHA 462 | R-line | N/A | 33,946,541,280 | 9
PPN097 | NSL 202281 | USDA | ND NONOIL M3 | Other | N/A | 29,484,848,645 | 8
PPN098 | NSL 202283 | USDA | RHA 326 | R-line | 1986 | 21,459,772,557 | 6
PPN099 | NSL 202285 | USDA | RHA 329 | R-line | 1986 | 27,766,897,646 | 8
PPN100 | NSL 202286 | USDA | RHA 330 | R-line | 1986 | 15,753,463,656 | 4
PPN101 | NSL 202287 | USDA | RHA 331 | R-line | 1986 | 25,660,279,092 | 7
PPN102 | NSL 202288 | USDA | RHA 332 | R-line | 1986 | 27,278,525,570 | 8
PPN103 | NSL 202289 | USDA | RHA 333 | R-line | 1986 | 12,221,211,883 | 3
PPN104 | NSL 202854 | USDA | HA 852 | B-line | 1985 | 24,923,144,210 | 7
PPN105 | NSL 202856 | USDA | ND-BLOS | Other | N/A | 28,093,608,812 | 8
PPN106 | NSL 202858 | USDA | RHA 854 | R-line | 1987 | 35,140,062,762 | 10
PPN107 | NSL 202861 | USDA | RHA 857 | R-line | 1987 | 30,608,073,472 | 9
PPN108 | NSL 202862 | USDA | RHA 858 | R-line | 1987 | 16,312,156,174 | 5
PPN109 | NSL 206234 | USDA | ND-EBLYS | Other | N/A | 32,387,798,613 | 9
PPN110 | NSL 208766 | USDA | HA 318 | B-line | 1985 | 19,876,631,601 | 6
PPN111 | NSL 208767 | USDA | HA 319 | B-line | 1985 | 28,768,965,647 | 8
PPN112 | NSL 208768 | USDA | HA 320 | B-line | 1986 | 15,576,285,637 | 4
PPN113 | NSL 208769 | USDA | HA 321 | B-line | N/A | 33,899,739,357 | 9
PPN114 | NSL 208774 | USDA | RHA 325 | R-line | 1986 | 11,384,717,139 | 3
PPN115 | PI 509057 | USDA | RHA 347 | R-line | 1987 | 13,215,217,541 | 4
PPN116 | PI 509063 | USDA | HA 353 | B-line | 1987 | 16,939,195,912 | 5
PPN117 | PI 509064 | USDA | RHA 354 | R-line | 1987 | 32,331,523,388 | 9
PPN118 | PI 531074 | USDA | RHA 361 | R-line | 1989 | 32,983,516,555 | 9
PPN119 | PI 531075 | USDA | RHA 362 | R-line | 1989 | 35,257,957,557 | 10
PPN120 | PI 534649 | USDA | RHA 363 | R-line | 1990 | 33,950,046,112 | 9
PPN121 | PI 534653 | USDA | RHA 367 | R-line | 1990 | 11,869,139,539 | 3
PPN122 | PI 543745 | USDA | RPET2 | R-line | N/A | 16,533,494,524 | 5
PPN123 | PI 548997 | USDA | HA germ. pool III-C | Other | N/A | 14,730,316,021 | 4
PPN124 | PI 548998 | USDA | HA germ. pool III-D | Other | N/A | 17,383,696,116 | 5
PPN125 | PI 549003 | USDA | HA germ. pool III-I | Other | N/A | 15,345,552,544 | 4
PPN126 | PI 549006 | USDA | HA germ. pool III-L | Other | N/A | 15,098,658,568 | 4
PPN127 | PI 549009 | USDA | HA germ. pool III-O | Other | N/A | 20,647,838,901 | 6
PPN128 | PI 549015 | USDA | HA germ. pool III-U | Other | N/A | 13,176,047,009 | 4
PPN129 | PI 552934 | USDA | HA 288 | B-line | 1974 | 16,312,081,182 | 5
PPN130 | PI 552935 | USDA | HA 290 | B-line | 1979 | 20,526,470,491 | 6
PPN131 | PI 552936 | USDA | HA 291 | B-line | 1979 | 19,583,972,453 | 5
PPN132 | PI 552937 | USDA | HA 292 | B-line | 1979 | 20,692,269,910 | 6
PPN133 | PI 552937 | USDA | HA 292 | B-line | 1979 | 33,965,161,481 | 9
PPN134 | PI 552938 | USDA | HA 300 | B-line | 1979 | 34,443,277,706 | 10
PPN135 | PI 552942 | USDA | HA 305 | B-line | 1979 | 14,270,275,092 | 4
PPN137 | PI 560142 | USDA | RHA 374 | R-line | 1992 | 31,937,688,083 | 9
PPN138 | PI 578874 | USDA | HA 385 | B-line | 1995 | 33,680,834,435 | 9
PPN139 | PI 597364 | USDA | HA 393 | B-line | 1997 | 20,014,769,511 | 6
PPN140 | PI 597365 | USDA | HA 394 | B-line | 1997 | 13,899,134,258 | 4
PPN141 | PI 597370 | USDA | HA 406 | B-line | 1997 | 18,295,517,585 | 5
PPN142 | PI 597375 | USDA | RHA 398 | R-line | 1997 | 16,539,683,164 | 5
PPN143 | PI 599757 | USDA | RHA 270 | R-line | 1975 | 16,986,366,698 | 5
PPN144 | PI 599770 | USDA | HA 60 | B-line | 1971 | 13,597,996,515 | 4
PPN145 | PI 599779 | USDA | HA 277 | B-line | 1975 | 17,151,382,702 | 5
PPN146 | PI 599788 | USDA | RHA 293 | R-line | 1979 | 30,623,724,599 | 9
PPN147 | PI 599978 | USDA | HA 308 | B-line | 1981 | 10,134,916,147 | 3
PPN148 | PI 603987 | USDA | RHA 391 | R-line | 1999 | 12,921,615,824 | 4
PPN149 | PI 607510 | USDA | HA-R7 | B-line | 2001 | 14,684,715,850 | 4
PPN150 | PI 607927 | USDA | IMISUN-1 | Other | N/A | 14,843,066,697 | 4
PPN151 | PI 617098 | USDA | HA 425 | B-line | 2002 | 13,999,201,600 | 4
PPN152 | PI 632338 | USDA | HA 429 | B-line | 2003 | 34,858,771,711 | 10
PPN153 | PI 632342 | USDA | HA 433 | B-line | 2003 | 20,895,428,601 | 6
PPN154 | PI 633748 | USDA | RHA 438 | R-line | 2004 | 16,897,243,260 | 5
PPN155 | PI 639169 | USDA | HA 446 | B-line | 2006 | 19,680,072,063 | 5
PPN156 | PI 650570 | USDA | HA 65 | B-line | N/A | 18,316,709,526 | 5
PPN157 | PI 650592 | USDA | HA 236 | B-line | 1985 | 17,058,795,711 | 5
PPN158 | PI 650599 | USDA | HA 249 | B-line | 1985 | 21,502,254,008 | 6
PPN159 | PI 650763 | USDA | HA-R5 | B-line | 1984 | 17,289,614,082 | 5
PPN160 | PI 597371 | USDA | HA 407 | B-line | 1996 | 17,780,908,188 | 5
PPN161 | PI 597372 | USDA | RHA 395 | R-line | 1997 | 16,563,184,805 | 5
PPN162 | PI 597376 | USDA | RHA 399 | R-line | 1997 | 34,579,363,764 | 10
PPN163 | PI 597377 | USDA | RHA 400 | R-line | 1997 | 34,684,301,617 | 10
PPN164 | PI 597378 | USDA | RHA 401 | R-line | 1997 | 34,729,962,097 | 10
PPN165 | PI 599759 | USDA | RHA 274 | R-line | 1975 | 32,277,560,544 | 9
PPN166 | PI 599763 | USDA | RHA 279 | R-line | 1975 | 34,319,294,809 | 10
PPN167 | PI 599764 | USDA | RHA 294 | R-line | 1979 | 14,726,942,990 | 4
PPN168 | PI 599765 | USDA | RHA 297 | R-line | 1979 | 29,197,990,834 | 8
PPN169 | PI 599767 | USDA | RHA 299 | R-line | 1979 | 32,885,095,567 | 9
PPN170 | PI 599769 | USDA | HA 8 | B-line | 1985 | 31,910,555,843 | 9
PPN171 | PI 599771 | USDA | HA 61 | B-line | 1968 | 36,583,693,409 | 10
PPN172 | PI 599772 | USDA | HA 64 | B-line | 1970 | 33,631,752,168 | 9
PPN173 | PI 599773 | USDA | HA 89 | B-line | 1971 | 15,064,592,674 | 4
PPN174 | PI 599774 | USDA | HA 99 | B-line | 1971 | 69,651,517,022 | 19
PPN175 | PI 599776 | USDA | HA 224 | B-line | 1975 | 38,560,562,808 | 11
PPN176 | PI 599778 | USDA | HA 234 | B-line | 1971 | 17,831,208,163 | 5
PPN177 | PI 599782 | USDA | HA 304 | B-line | 1979 | 15,548,804,985 | 4
PPN178 | PI 599785 | USDA | HA 822 | B-line | 1986 | 38,273,221,305 | 11
PPN179 | PI 599786 | USDA | RHA 271 | R-line | 1975 | 17,122,609,692 | 5
PPN180 | PI 599787 | USDA | RHA 272 | R-line | 1975 | 32,146,791,656 | 9
PPN181 | PI 599789 | USDA | RHA 311 | R-line | 1983 | 17,490,172,591 | 5
PPN182 | PI 599977 | USDA | HA 307 | B-line | 1981 | 35,190,222,800 | 10
PPN183 | PI 599979 | USDA | HA 207 | B-line | 1983 | 34,610,697,146 | 10
PPN184 | PI 599981 | USDA | RHA 310 | R-line | 1983 | 37,776,309,576 | 10
PPN185 | PI 599984 | USDA | HA 821 | B-line | 1986 | 14,506,102,920 | 4
PPN186 | PI 600000 | USDA | RHA 417 | R-line | 2002 | 36,224,752,597 | 10
PPN187 | PI 600723 | USDA | BRS-1 | Other | 1995 | 14,009,463,901 | 4
PPN188 | PI 600725 | USDA | BRS-3 | Other | 1995 | 16,271,645,998 | 5
PPN189 | PI 603986 | USDA | HA 390 | B-line | 1999 | 11,999,544,952 | 3
PPN190 | PI 603988 | USDA | RHA 392 | R-line | 1999 | 40,517,784,318 | 11
PPN191 | PI 603989 | USDA | RHA 408 | R-line | 1999 | 13,056,090,707 | 4
PPN192 | PI 603990 | USDA | RHA 409 | R-line | 1999 | 18,698,054,543 | 5
PPN193 | PI 603991 | USDA | HA 410 | B-line | 1998 | 22,163,381,781 | 6
PPN194 | PI 607504 | USDA | HA 413 | B-line | 1999 | 40,148,719,671 | 11
PPN195 | PI 607508 | USDA | RHA 418 | R-line | 2002 | 12,477,487,210 | 3
PPN196 | PI 607509 | USDA | HA-R6 | B-line | 2001 | 29,437,427,842 | 8
PPN197 | PI 607511 | USDA | HA-R8 | B-line | 2001 | 15,644,767,331 | 4
PPN198 | PI 607922 | USDA | R-190 | R-line | 2000 | 15,493,470,453 | 4
PPN199 | PI 607925 | USDA | R-206 | R-line | 2000 | 19,646,045,510 | 5
PPN200 | PI 607928 | USDA | IMISUN-2 | Other | N/A | 35,251,848,917 | 10
PPN201 | PI 607929 | USDA | IMISUN-3 | Other | N/A | 27,875,088,843 | 8
PPN202 | PI 607930 | USDA | IMISUN-4 | Other | N/A | 65,222,241,653 | 18
PPN203 | PI 617099 | USDA | RHA 426 | R-line | 2002 | 17,240,708,387 | 5
PPN204 | PI 617100 | USDA | RHA 427 | R-line | 2002 | 15,783,686,269 | 4
PPN205 | PI 618725 | USDA | HA 421 | B-line | 1999 | 16,953,343,441 | 5
PPN206 | PI 618726 | USDA | HA 422 | B-line | 2002 | 36,219,270,192 | 10
PPN207 | PI 619204 | USDA | RHA 419 | R-line | 2002 | 14,318,856,927 | 4
PPN208 | PI 619206 | USDA | RHA 428 | R-line | 2002 | 14,799,948,508 | 4
PPN209 | PI 632339 | USDA | HA 430 | B-line | 2003 | 16,594,389,649 | 5
PPN210 | PI 632340 | USDA | HA 431 | B-line | 2003 | 19,416,456,893 | 5
PPN211 | PI 632341 | USDA | HA 432 | B-line | 1986 | 13,950,432,631 | 4
PPN212 | PI 633744 | USDA | HA 434 | B-line | 2004 | 13,891,429,173 | 4
PPN213 | PI 633745 | USDA | HA 435 | B-line | 2004 | 19,929,059,518 | 6
PPN214 | PI 639162 | USDA | RHA 439 | R-line | 2006 | 28,898,341,570 | 8
PPN215 | PI 639163 | USDA | RHA 440 | R-line | 2006 | 19,545,520,116 | 5
PPN216 | PI 639164 | USDA | HA 441 | B-line | 2006 | 20,147,286,199 | 6
PPN217 | PI 639166 | USDA | RHA 443 | R-line | 2006 | 35,887,305,633 | 10
PPN218 | PI 649793 | USDA | 1972R | R-line | 1972 | 33,141,559,722 | 9
PPN219 | PI 650358 | USDA | HA 1 | B-line | 1985 | 18,378,171,044 | 5
PPN220 | PI 650361 | USDA | HA 15 | B-line | N/A | 35,631,262,736 | 10
PPN221 | PI 650571 | USDA | HA 66 | B-line | N/A | 19,361,934,589 | 5
PPN222 | PI 650586 | USDA | HA 211 | B-line | N/A | 15,578,292,464 | 4
PPN223 | PI 650594 | USDA | HA 243 | B-line | 1985 | 16,154,957,913 | 4
PPN224 | PI 650597 | USDA | HA 248 | B-line | N/A | 19,938,357,394 | 6
PPN225 | PI 650603 | USDA | HA 253 | B-line | N/A | 43,780,394,310 | 12
PPN226 | PI 650605 | USDA | HA 259 | B-line | N/A | 36,890,741,600 | 10
PPN227 | PI 650753 | USDA | HA-R2 | B-line | 1985 | 15,704,158,372 | 4
PPN228 | PI 650755 | USDA | HA-R4 | B-line | 1985 | 17,765,302,524 | 5
PPN229 | PI 650842 | USDA | HA germ. pool VII HMO BULK | Other | N/A | 16,859,710,069 | 5
PPN230 | NSL 202290 | USDA | RHA 334 | R-line | 1986 | 20,168,294,731 | 6
PPN231 | PI 509051 | USDA | HA 341 | B-line | 1987 | 39,585,920,350 | 11
PPN232 | PI 534654 | USDA | RHA 368 | R-line | 1990 | 75,260,957,245 | 21
PPN233 | PI 561919 | USDA | HA 379 | B-line | 1993 | 20,768,667,117 | 6
PPN234 | NSL 176426 | USDA | HA germ. pool V-1J | Other | N/A | 40,103,167,332 | 11
PPN235 | NSL 202271 | USDA | ND NONOIL 1 | Other | N/A | 19,528,126,455 | 5
PPN236 | NSL 202279 | USDA | ND NONOIL M1 | Other | N/A | 79,798,625,552 | 22
PPN237 | NSL 202284 | USDA | RHA 328 | R-line | 1986 | 19,981,389,547 | 6
PPN238 | NSL 202857 | USDA | ND-RL0S | Other | N/A | 12,206,263,880 | 3
PPN239 | NSL 202863 | USDA | RHA 859 | R-line | 1987 | 71,971,898,306 | 20
PPN240 | NSL 208764 | USDA | HA 316 | B-line | 1986 | 17,126,163,063 | 5
PPN241 | PI 509053 | USDA | HA 343 | B-line | 1987 | 37,046,987,569 | 10
PPN242 | PI 509065 | USDA | RHA 355 | R-line | 1987 | 13,476,456,272 | 4
PPN243 | PI 534652 | USDA | RHA 366 | R-line | 1990 | 22,289,669,591 | 6
PPN244 | PI 534658 | USDA | HA 372 | B-line | 1990 | 19,635,374,169 | 5
PPN245 | PI 549002 | USDA | HA germ. pool III-H | Other | N/A | 15,396,857,330 | 4
PPN246 | PI 549014 | USDA | HA germ. pool III-T | Other | N/A | 22,527,322,174 | 6
PPN247 | PI 552939 | USDA | HA 301 | B-line | 1979 | 15,252,359,293 | 4
PPN248 | PI 552944 | USDA | RHA 282 | R-line | 1974 | 35,962,304,353 | 10
PPN249 | PI 597366 | USDA | HA 402 | B-line | 1997 | 35,962,304,353 | 10
PPN250 | PI 599762 | USDA | RHA 278 | R-line | 1975 | 12,659,700,356 | 4
PPN252 | PI 618727 | USDA | HA 423 | B-line | 2002 | 15,862,751,188 | 4
PPN253 | PI 650359 | USDA | HA 1 | B-line | 1985 | 18,589,751,378 | 5
PPN254 | SF 145 | INRA | N/A | B-line | N/A | 17,474,758,644 | 5
PPN255 | PI 650794 | USDA | Manchurian | OPV | N/A | 39,043,505,185 | 11
PPN256 | SF 281 | INRA | N/A | R-line | N/A | 42,238,154,668 | 12
PPN257 | PI 650353 | USDA | VIR 101 | OPV | N/A | 24,469,925,159 | 7
PPN258 | SF 179 | INRA | N/A | B-line | N/A | 16,302,607,657 | 5
PPN259 | SF 92 | INRA | N/A | B-line | N/A | 35,402,119,774 | 10
PPN260 | PI 650541 | USDA | Charata | OPV | N/A | 16,640,584,151 | 5
PPN261 | SF 193 | INRA | N/A | B-line | N/A | 34,215,303,304 | 10
PPN262 | SF 230 | INRA | N/A | B-line | N/A | 78,314,304,164 | 22
PPN263 | SF 232 | INRA | N/A | B-line | N/A | 18,336,377,262 | 5
PPN264 | SF 233 | INRA | N/A | B-line | N/A | 38,377,802,334 | 11
PPN265 | Ames 20073 | USDA | Hemus | OPV | N/A | 13,828,912,298 | 4
PPN266 | SF 293 | INRA | N/A | R-line | N/A | 38,242,647,410 | 11
PPN268 | SF 63 | INRA | N/A | B-line | N/A | 17,517,691,460 | 5
PPN269 | SF 210 | INRA | N/A | B-line | N/A | 40,142,908,423 | 11
PPN270 | SF 60 | INRA | N/A | B-line | N/A | 17,009,324,542 | 5
PPN273 | SF 76 | INRA | N/A | B-line | N/A | 16,340,727,269 | 5
PPN275 | SF 169 | INRA | N/A | B-line | N/A | 36,329,758,208 | 10
PPN276 | SF 306 | INRA | N/A | R-line | N/A | 40,271,842,452 | 11
PPN277 | PI 650437 | USDA | Saturn | OPV | N/A | 17,970,475,297 | 5
PPN278 | SF 295 | INRA | N/A | R-line | N/A | 39,115,700,017 | 11
PPN280 | SF 322 | INRA | N/A | R-line | N/A | 33,227,355,164 | 9
PPN284 | SF 70 | INRA | N/A | B-line | N/A | 16,046,901,197 | 4
PPN286 | SF 23 | INRA | BU BCR.1760.3.1. | B-line | N/A | 13,714,477,730 | 4
PPN289 | N/A | N/A | N/A | N/A | N/A | 16,821,466,756 | 5
SF_33 | N/A | INRA | N/A | N/A | N/A | 38,703,018,947 | 11
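The relationship between the last two columns of Table D.1 can be illustrated with a short sketch. It rests on two assumptions that the table itself does not state: that the "Number of reads aligned" column counts aligned bases, and that the genome size is roughly 3.6 Gb. The function name `estimated_depth` and the constant `ASSUMED_GENOME_SIZE_BP` are hypothetical and chosen only for illustration. Under these assumptions, dividing the aligned count by the genome size and rounding gives the tabulated depth for the rows checked here.

```python
# Assumption (not stated in the table): the "Number of reads aligned" column
# holds aligned bases, and the genome is roughly 3.6 Gb.
ASSUMED_GENOME_SIZE_BP = 3_600_000_000


def estimated_depth(aligned_bases: int,
                    genome_size_bp: int = ASSUMED_GENOME_SIZE_BP) -> float:
    """Return mean depth as aligned bases divided by genome size."""
    if genome_size_bp <= 0:
        raise ValueError("genome_size_bp must be positive")
    return aligned_bases / genome_size_bp


# Three rows copied from Table D.1: SAM name -> (aligned count, tabulated depth).
rows = {
    "Hopi": (57_263_565_118, 16),
    "PPN001": (13_310_867_128, 4),
    "PPN236": (79_798_625_552, 22),
}

for name, (aligned, tabulated) in rows.items():
    # Print the unrounded estimate next to the tabulated value for comparison.
    print(name, round(estimated_depth(aligned), 2), tabulated)
```

If a different genome size estimate is preferred, it can be passed as `genome_size_bp`; the tabulated depths would then no longer necessarily match after rounding.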

