Open Collections

UBC Theses and Dissertations

Global investigation into the population genetic structure of Cryptosporidium hominis based on a whole… Williamson, Jill Marie 2009

Global Investigation into the Population Genetic Structure of Cryptosporidium hominis Based on a Whole Genome Multi-locus SNP-typing Scheme; Inferences about the Existence of Biogeographical Partitions

by Jill Marie Williamson
B.Sc., University of British Columbia, 2000

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY in THE FACULTY OF GRADUATE STUDIES (Pathology and Laboratory Medicine)

THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver)

April 2009

© Jill Marie Williamson, 2009

ABSTRACT

Previously considered a disease of importance strictly to veterinary medicine, Cryptosporidium has emerged as a highly successful opportunistic parasitic protozoan posing a significant threat to public health. Intricate transmission dynamics, a complex epidemiology, and parasite robustness and persistence have all hampered efforts for the prevention and control of Cryptosporidium. An understanding of genetic diversity is a prerequisite to better understanding the role of parasite variation in disease etiology and pathobiology. The extent of genetic structure among C. hominis and C. parvum, the two most prevalent species of Cryptosporidium, is insufficiently understood, and the population structure remains largely unresolved. We report on the distribution of genetic diversity and the possible existence of geographic partitions among C. hominis subpopulations from Australia, Kenya, Peru and Scotland. We studied C. hominis population genetic structure using a multi-locus SNP-type (MLSt) established from 45 single nucleotide polymorphic loci covering 13 bio-functionally relevant proteins. A total of 77 isolates from four intercontinental subpopulations were genetically typed. Twenty-four unique MLSts were identified, 25% of which were found to be located within one or more subpopulations.
Statistical tests of diversity, measuring the degree of intra-population and inter-population diversity, genetic distance, and genetic identity, were used to examine the population genetic structure. Within-population differences account for 69.6% of genetic variation; differentiation among subpopulations constitutes 30.4%. Genetic distances among subpopulations averaged 0.048 and varied from 0.034 between the Australian and Scotland subpopulations to 0.061 between Scotland and Kenya. More broadly, our results argue that too wide a geographic boundary can impede rather than advance population genetic studies and that the practice of sampling more regional subpopulations should be adopted.

A fifth subpopulation, a combination of C. hominis and C. parvum isolates, was drawn upon to determine whether a pre-defined allelic profile of single nucleotide polymorphisms (SNPs) was an efficient and reliable means of species-specific identification. Results showed that the SNP-typing approach can distinguish between species and can uncover potential novel SNPs within an individual isolate. We propose that the patterns of genetic variation are influenced by geography and that the identification of host-adapted, geographically conserved sub-genotypes within a defined geographic cohort, versus widespread dissemination of genetically stable isolates, could ultimately provide a valuable basis for the predictive epidemiology of Cryptosporidium infection. Our findings provide an alternative method for species detection, a crucial element of epidemiological investigations.
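The apportionment of diversity and the genetic distances summarized above rest on Nei's gene-diversity measures (H(s), H(t), D(st) and Gst in the abbreviations list) and on Nei's genetic identity and distance. As a minimal sketch, assuming Nei's standard formulas, the following Python shows how such quantities might be computed from allele frequencies. The function names and the toy frequencies in the usage note are illustrative assumptions; they are not the software or the data used in this study.

```python
import math


def gene_diversity(freqs):
    """Gene diversity at one locus: 1 minus the sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def nei_gst(pops):
    """Partition diversity at one locus.

    pops: one allele-frequency list per subpopulation (same allele order).
    Returns (Hs, Ht, Dst, Gst), weighting every subpopulation equally.
    """
    n = len(pops)
    hs = sum(gene_diversity(f) for f in pops) / n            # mean within-population diversity
    mean_freqs = [sum(f[i] for f in pops) / n for i in range(len(pops[0]))]
    ht = gene_diversity(mean_freqs)                           # total diversity
    dst = ht - hs                                             # among-population diversity
    gst = dst / ht if ht > 0 else 0.0                         # proportion among populations
    return hs, ht, dst, gst


def nei_identity_distance(x, y):
    """Nei's genetic identity I and distance D = -ln(I) for two populations.

    x, y: lists of per-locus allele-frequency lists, loci in the same order.
    """
    jxy = sum(sum(a * b for a, b in zip(fx, fy)) for fx, fy in zip(x, y))
    jx = sum(sum(a * a for a in fx) for fx in x)
    jy = sum(sum(b * b for b in fy) for fy in y)
    identity = jxy / math.sqrt(jx * jy)
    distance = math.inf if identity == 0 else -math.log(identity)
    return identity, distance
```

For two hypothetical subpopulations fixed for different alleles at a biallelic SNP, `nei_gst([[1.0, 0.0], [0.0, 1.0]])` places all diversity among populations (Gst = 1), whereas two subpopulations with identical frequencies give Gst = 0 and a genetic distance of 0.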
TABLE OF CONTENTS

Abstract
Table of Contents
List of Tables
List of Figures
Abbreviations
Glossary
Acknowledgements
Dedication

CHAPTER 1  Introduction
  1.1 Parasitology
    1.1.1 Parasitism
    1.1.2 The Emergence of Cryptosporidium
    1.1.3 Cryptosporidium Biology: Life Cycle & Propagation
  1.2 Clinical Pathogenesis
    1.2.1 Host Colonization
    1.2.2 Clinical Manifestation
    1.2.3 Human Immune System Response
    1.2.4 Diagnosis
    1.2.5 Chemotherapy & Treatment
  1.3 Epidemiology
    1.3.1 Routes of Exposure
    1.3.2 Transmission Dynamics
    1.3.3 Prevalence
    1.3.4 Demographics & Sociological Influence
    1.3.5 Prevention & Control

CHAPTER 2  Genomics of Cryptosporidium
  2.1 Phylum Apicomplexa
  2.2 Genus Cryptosporidium
  2.3 C. hominis vs. C. parvum
  2.4 Biogeographical Genetic Diversity
  2.5 Theory of Clonal Population

CHAPTER 3  Experimental Design
  3.1 Aim
    3.1.1 Central Research Question
    3.1.2 Hypothesis
  3.2 Study Objectives
  3.3 Experimental Rationale: the Use of SNPs

CHAPTER 4  Experimental Platform
  4.1 Workflow
  4.2 Materials & Methods
    4.2.1 Data Mining
    4.2.2 Bioinformatics Analyses: ORF Analysis & Biophysical Properties
    4.2.3 Target Loci for Multi-locus SNP-type
    4.2.4 Primer Design
    4.2.5 Multiplex PCR
    4.2.6 Amplicon Purification
    4.2.7 SNaPshot Chemistry
    4.2.8 Capillary Electrophoresis & Electropherogram Analysis
    4.2.9 Allele Discrimination & Scoring
    4.2.10 Statistical Analysis
    4.2.11 Geographic Boundaries

CHAPTER 5  Results
  5.1 Whole Genome Data Mining; Comparative Genomics
  5.2 Target Gene Panel: Qualitative Characterization
  5.3 Multi-locus SNP-type Assembly
  5.4 Bioinformatics Computations: Protein Topology & Biophysical Properties
  5.5 Multi-locus SNP-typing: Intercontinental Subpopulations
    5.5.1 Australia
    5.5.2 Kenya
    5.5.3 Peru
    5.5.4 Scotland
  5.6 Mixed Genotypes
  5.7 Distribution of Multi-locus SNP-types
  5.8 Descriptive Statistics: Measures of Genetic Variability
  5.9 Genetic Data Analysis
    5.9.1 Distribution Genetic Variation
    5.9.2 Genetic Identity Measures
  5.10 Canada Multi-locus SNP-typing
  5.11 Species Distinction

CHAPTER 6  Discussion
  6.1 Comparative & Computational Whole Genome Analysis
  6.2 Population Genetic Substructuring

CHAPTER 7  Recommendations for Future Work
  7.1 Study Extensions: Current Work
  7.2 Study Extensions: Future Work
    7.2.1 Continuation of Multi-locus SNP Dataset
    7.2.2 Inferring Patterns of Evolutionary Descent
    7.2.3 Implications to the Range of Vaccines or Chemotherapies

CHAPTER 8  Executive Summary

Literature Cited

APPENDICES
  Appendix 1  C. hominis genome summary
  Appendix 2  Spatial structure of species
  Appendix 3  ABI 3130xl automated sequencer
  Appendix 4  Standard Liz120 fragment sizer
  Appendix 5  Example protean profiles; Cp23 & Gp60
  Appendix 6  Alleles sampled per SNP marker per subpopulation
  Appendix 7  Multi-plex PCR gel electrophoresis; future target proteins
  Appendix 8  Genomiphi whole genome amplification
  Appendix 9  eBURST; inferring patterns of evolutionary descent

LIST OF TABLES

Table 1.1   Target populations for Cryptosporidium exposure
Table 1.2   Deactivation of Cryptosporidium oocysts
Table 2.1   Recognized Cryptosporidium species
Table 2.2   Cryptosporidium genome summary; C. hominis, C. parvum, P. falciparum
Table 4.1   MLSt reaction sets
Table 4.2   PCR primer pairs
Table 4.3   SBE capture probes; fragment analysis
Table 4.4   Australian subpopulation
Table 4.5   Kenyan subpopulation
Table 4.6   Peruvian subpopulation
Table 4.7   Scotland subpopulation
Table 4.8   Canadian subpopulation
Table 5.1   Target gene library
Table 5.2   Gene panel; multi-locus SNP-typing
Table 5.3   Genetic organization target genes; descriptive genetics
Table 5.4   Target gene protein sequences; blast queries
Table 5.5   Assembled SNP marker panel
Table 5.6   Bio-physical properties; target SNP loci
Table 5.7   Australia subpopulation MLSts; C. hominis
Table 5.8   Australia subpopulation MLSts; C. parvum
Table 5.9   Kenya subpopulation MLSts
Table 5.10  Peru subpopulation MLSts
Table 5.11  Scotland subpopulation MLSts
Table 5.12  Diversity Indices; intercontinental C. hominis subpopulations
Table 5.13  Allelic frequencies; intercontinental C. hominis subpopulations
Table 5.14  Gene Diversity; intercontinental C. hominis subpopulations
Table 5.15  Apportionment genetic diversity; C. hominis intra- & inter-population relationships
Table 5.16  Nei's genetic identity & distance measures
Table 5.17  Physical geographic distance among intercontinental C. hominis populations
Table 5.18  Canada subpopulation MLSts; C. hominis & C. parvum
Table A.1   Alleles sampled per SNP marker per subpopulation

LIST OF FIGURES

Figure 1.1   Cryptosporidium life cycle
Figure 1.2   Brush border of the gastrointestinal tract
Figure 1.3   Cryptosporidium detection; acid-fast stain
Figure 1.4   Cryptosporidium therapeutics; macrolide antibiotics
Figure 1.5   Cryptosporidium transmission dynamics; pictorial illustration of their interlacing
Figure 4.1   Experimental platform workflow
Figure 4.2   Multi-plex PCR; gel electrophoresis of Cp23, COWP & β-tubulin
Figure 4.3   SNaPshot fragment analysis
Figure 4.4   Electropherogram; fragment analysis of Cp23, COWP & β-tubulin
Figure 4.5   Geographical representation of 5 subpopulations
Figure 5.1   Phylogenetic relationship Apicomplexa; MDH & LDH
Figure 5.2   Hydrophobicity profile; Gp60 locus
Figure 5.3   Electropherogram; Australia, HSP70 locus
Figure 5.4   Electropherogram; Kenya, COWP locus
Figure 5.5   Electropherogram; Peru, Cp23 & 18srRNA loci
Figure 5.6   Electropherogram; Peru, COWP locus, marker COWP3, novel allele
Figure 5.7   Electropherogram; Scotland, Cp23 & 18srRNA loci
Figure 5.8   Electropherogram; mixed genotype
Figure 5.9   Distribution mixed alleles; by individual SNP marker
Figure 5.10  Distribution mixed alleles; by geography
Figure 5.11  MLSt distribution; by geography
Figure 5.12  Phylogenetic relationship; intercontinental subpopulations
Figure 5.13  Electropherogram; Canada
Figure A.1   C. hominis genome summary
Figure A.2   Spatial structure of species
Figure A.3   ABI 3130xl automated sequencer
Figure A.4   Standard Liz120 fragment sizer
Figure A.5   Example Protean profiles; Cp23 & Gp60
Figure A.6   Gel electrophoresis; future target proteins
Figure A.7   Schematic of Genomiphi protocol
Figure A.8   eBURST representation of evolutionary descent

ABBREVIATIONS

A        Australia
aa       amino acid
AIDS     Acquired Immunodeficiency Syndrome
APR      Apoptosis Related Protein
BC       British Columbia
bp       basepair
BT       β-tubulin
COWP     Cryptosporidium oocyst wall protein
Cp23     Cryptosporidium protein
DHFR     Di-hydrofolate Reductase
D(st)    average among populations diversity
EID      emerging infectious disease
EMAAg    erythrocyte membrane associated antigen
Fst      Wright's F statistics
Gst      Nei's analog for Wright's (Fst) statistics
H(s)     average within population diversity
H(t)     total genetic diversity
HIV      human immunodeficiency virus
HSP      heat shock protein
GKO      genetic knockout
Gp60     glycoprotein 60
K        Kenya
LDH      lactate dehydrogenase
MDH      malate dehydrogenase
MLG      multi-locus genotype
MLSt     multi-locus SNP-type
NIAID    National Institute of Allergy and Infectious Diseases
NJM      neighbour joining method
NS-SNP   non-synonymous SNP
nt       nucleotide
ORF      open reading frame
P        Peru
PAGE     poly-acrylamide gel electrophoresis
PCR      polymerase chain reaction
S        Scotland
SAAP     single amino acid polymorphism
SBE      single base extension
SNP      single nucleotide polymorphism
S-SNP    synonymous SNP
θst      Weir & Cockerham analog for Wright's statistics
UPGMA    unweighted pair group method with arithmetic mean
UPRTase  uracil phosphoribosyltransferase
WHO      World Health Organization

GLOSSARY

Allele - alternative form of a gene.
Allopatric - an organism whose range is entirely separate.
In regards to speciation, it refers to biological populations that are physically isolated by an extrinsic barrier and evolve genetic reproductive isolation.
Antigen - any substance that causes the immune system to produce antibodies against it; it may be a foreign substance from the environment such as chemicals, bacteria, viruses, parasites or pollen.
B cell - lymphocytes that play a role in the humoral immune response and whose principal function is to make antibodies against antigens.
Bioinformatics - collection of methods utilized for the analysis of molecular biology data through a computer.
Brush border - microvilli-covered surface of epithelial cells found in the intestinal tract of the body.
ClustalW - multiple sequence alignment program.
Codon - a group of three adjacent nucleotides that encode an amino acid.
Comparative genomics - analysis based on the comparisons of whole genomes.
Differentiation (genetic) - differences in allele frequencies among populations.
Distance Matrix - a pairwise 'distance' between taxa; for molecular data, it could be the observed number of nucleotide differences between the pairs of taxa.
Enterocyte - intestinal absorptive cells, simple columnar epithelial cells found in the small intestine and colon.
Eukaryote - organisms with a complex cell structure and cell nucleus.
Exon - coding part of a gene/protein.
Extension - process in PCR by which nucleotides are extended from the initial binding site of a primer by the action of a polymerase.
Gene flow - also known as migration; any movement of genes from one population to another, the result of which is decreased inter-population variation and increased intra-population variation.
Genetic drift - random fluctuations in allele frequency which occur by chance, particularly in small subpopulations, as a result of sampling error.
Genome - full set of chromosomes carried by a given organism.
Genotype - genetic characteristics of a cell or organism according to its entire genome or a specific set of genetic loci (allele).
Gnotobiotic - animals born in aseptic conditions (removed by Caesarean section) and exposed only to the microorganisms researchers wish to be present in the animal.
Haploid - possessing only one copy of each chromosome in a genome, in contrast to diploid where two copies of each chromosome are present.
Homolog - gene or morphological character that shares a common ancestry with a different gene or morphological character.
Humoral - the part of immunity or the immune response that involves antibodies secreted by B cells and circulating in bodily fluids.
Intra-population - within a population.
Inter-population - among or between different populations.
Intron - non-coding part of a gene/protein.
Isolate - single sample of a species from a given population.
Locus (pl. loci) - specific location on a chromosome.
Macrodiversity - genotype diversity of an organism revealed by a strain-typing method designed to detect changes throughout the genome.
Meiosis - cell division in sexually reproducing organisms that reduces the amount of genetic information by half.
Metapopulation - group of populations connected by some level of gene flow, also referred to as population.
Microdiversity - genotype diversity of an organism revealed by a strain-typing method designed to detect nucleotide changes in a restricted part of the genome.
Microvilli - microscopic cellular membrane protrusions that increase the surface area of the cell.
Migration - see gene flow.
Mitosis - simple cell division without a reduction in chromosome number.
Monophyletic group - group of organisms with the same taxonomic title that are shown phylogenetically to share a common ancestor that is exclusive to these organisms.
Multi-locus SNP type (MLSt) - genetic typing of an isolate based on the allelic profile of all molecular markers examined. Most studies discuss similar results in terms of a multi-locus genotype (MLG); we chose the acronym MLSt based on the SNP foundation of the study.
Mutation - change in the nucleotide sequence of the genetic material of an organism.
Neighbour-Joining (NJ) - algorithm for inferring a branching tree diagram from a distance matrix, by successively clustering pairs of taxa together.
Non-synonymous substitutions (NS) - substitutions that occur in protein-coding genes that result in a change at the amino acid level.
Oocyst - thick-walled spore phase of Cryptosporidium; the infectious form of the parasite.
Opportunistic microorganism - pathogenic organism that exploits a compromised immune system to establish infection.
ORF - open reading frame; DNA sequence without stop codons, thus allowing for the translation of a protein sequence.
Ortholog - homologous genes found in two different taxa that are performing the same function in each taxon.
Outbreak - acute appearance of a cluster of illnesses caused by a single pathogen that occurs in numbers in excess of what is expected for that time and place.
Pathogenesis - development of a disease; the origin of a disease and the chain of events leading to that disease.
Phenotype - observable characteristics expressed by an organism, including drug resistance, virulence, and morphology.
Potable water - water considered to be of sufficiently high quality that it can be consumed or utilized without risk of immediate or long term harm.
Prepatent period - period between the time of exposure to a parasite and the time when the parasite can be detected in the blood or stool.
Primer - short piece of nucleic acid that binds to a complementary target or template DNA strand, serving as the starting point for the addition or extension of complementary nucleotides along the rest of the template strand.
Purine - Adenine (A) or Guanine (G) nucleotide.
Pyrimidine - Cytosine (C) or Thymine (T) nucleotide.
Subpopulation - subgroup of a metapopulation or total population, commonly annotated as subpopulation, local populations, or demes.
Symbiotic - a relationship that is mutualistic, parasitic, or commensal in nature.
Synonymous substitutions (S) - substitutions that do not change the identity of the encoded amino acid.
T cell - cells belonging to a group of white blood cells known as lymphocytes that play a central role in cell-mediated immunity and are produced in the thymus.
Taxonomy - scientific discipline of naming organisms.
Transition - substitution of a purine for a purine or a pyrimidine for a pyrimidine (like-for-like).
Transversion - substitution of a purine for a pyrimidine or vice versa.

ACKNOWLEDGMENTS

I offer my enduring gratitude to the faculty, staff and fellow students at UBC who have inspired me to pursue my work in this field throughout both my undergraduate and graduate careers. As I moved into post-graduate studies I was beyond blessed with the exemplary mentoring, guidance and enduring support, both emotional and academic, of Dr. David Walker, without whom I am not sure I would be where I am today. My sincerest gratitude goes to him for all his hard work within the Department of Pathology as a whole and for his more personal dedication to me; it is more appreciated than I could ever express. I further give great credit for my success to one of the most efficient, knowledgeable and patient people I know, Penny Woo. Without her the Department of Pathology and Laboratory Medicine would be at a tremendous loss.
I was very fortunate to be situated within University of British Columbia's research division of the British Columbia Center for Disease Control. This state of the art facility provided me access to superior technologies, world-class scientists and some of the best trained technicians and resource staff there are. In particular I must extend many thanks for all their guidance, technical expertise and resource support to Kathy Adie, Ruben Chen, Alan McNabb, Diane Eisler, Glenna Geddes, Theresa Low, Yvonne Santa-Cruz, and Quantine Wong.

This study would not have been possible without the incredible privilege I was afforded through the collaborations with five of the top scientists in the field of Cryptosporidium research. I had the incredible opportunity to work alongside them, have them available as a resource, receive their contributions as co-authors and use them as generous sources of advice and guidance. This is something most people would hope to achieve in their entire careers, let alone their graduate work. With great thanks to Dr. Huw Smith (Scottish Parasite Diagnostic Laboratory, Stobhill Hospital, Glasgow, Scotland), Dr. Una Ryan (Murdoch University of Western Australia, Perth, Australia), Dr. Lihua Xiao and Dr. Vitaliano Cama (Center for Disease Control, Atlanta, Georgia, USA) and Dr. Wangeci Gatei (Kenya Medical Research Institute; Nairobi, Kenya).

I was again beyond privileged with the make-up of my supervisory committee, which included some of the most accredited and renowned researchers in their respective fields: Dr. Judy Isaac-Renton, Dr. Martin Adamson, Dr. Craig Stephens, and Dr. David Walker. As a whole their different perspectives and suggestions regarding my studies provided a real robustness to my academic career and really allowed me to look outside the narrowness of one research topic and think of how I might apply my knowledge to other aspects of disease research or to different concepts altogether.
Their questions and ideas not only challenged me but allowed me the opportunity to mature my own thinking to a level where I became more and more confident and proud of the progress I had made. Each in their own way, their contributions to my work will have a profound effect on all that I have yet to accomplish in my career and I have such gratitude for that.

Last, but certainly not least, I have to give the greatest of thanks and accolades to my supervisor, Dr. Corinne Ong. She taught me to think for myself, work for myself and expand myself to a new level of academic standards, yet at all times I never doubted that she was right there alongside me to guide me, challenge me and encourage me. I cannot thank her enough for her patience with my work, her respect for my work methods and her enduring support through thick and thin. She is a true scientist, with expertise not only in her specific field but also in how good, reputable research should be conducted. She was not just a mentor for my work but also for what I strive to achieve as an all around medical professional in regards to work ethic, professionalism, reputation and contributions to and collaborations in science. After all she has done for me, how hard she has fought for me and stood up for me and my work, there are not enough words to acknowledge what all her dedication to me has meant. What she has taught me will be the foundation of what I hope to achieve in my career. Anyone would be so blessed to have the chance to work with her.

DEDICATION

From elementary school to secondary school, and from undergraduate to graduate academics, there have been countless people alongside me without whom I am sure I would not have been able to achieve what I have thus far in my career. I give profound thanks for all their support and inspiration and I continue to be in awe that I have been so blessed with such incredible people in both my personal and academic lives.
The encouragement and support from my two amazing older sisters is one of the greatest gifts I could ever have been given. From as early as I can remember I have always looked up to them, been inspired by them and challenged myself to be more like them. Their courageous spirits continue to awe me and I am so humbled to have such incredible people in my life for me to look up to. I have learnt so much from them and I am so proud of the people they have become and all they have accomplished. Furthermore I am indebted to my Auntie Trish, who was always there with her unconditional love and endless supply of funny mail when all I had to read were "boring" papers and journals.

The greatest thanks of all must go to my mother. No words will ever be powerful enough to impress upon her how much I love her and how much I appreciate all she has done for me and sacrificed for me. She cheered me on when I was already up and lifted me up in times I was down. I have no doubt that I would not have gone into the field I have if it weren't for her. My mom is a highly accomplished and respected nurse. As a small child I would wait with bated breath for her to get home from work. Though she was exhausted from a long 12 hour shift and likely just wanting 10 minutes of peace in a hot bath before resuming her life as wife and mother of three active daughters, I would pounce on her the minute she was in the door. I would pepper her with questions about her patients, their illnesses, the causes, their prognoses, what she did to treat them, how she helped them and so on and so on and so on. Most days I would draw the bath for her ahead of time so I would not have to wait any longer than the minutes it would take her to get in the door and draw it herself. While she tried to unwind I chatted non-stop and yet her patience never wavered, not once; she never turned me away, she never asked me to be quiet or to leave.
It is those stories, which came at 7:30 pm post-day shift or 7:30 am post-morning shift, that created and nurtured my interest in the medical field. It is the compassion with which she told those stories that inspired me, drove me, and created the conviction within me to be involved in this field that strives to create a greater good for people of all walks of life; to help all of mankind no matter who you are, where you live or what you have. I am eternally grateful to have had the life and upbringing that enabled me to hear these stories. It is to these stories and their storyteller that I dedicate my past and future career. It is to these stories and their storyteller that I dedicate this dissertation.

CHAPTER 1 - INTRODUCTION TO Cryptosporidium

Chapter Summary - Biological Concepts of Cryptosporidium

Parasitism is a symbiotic relationship between organisms of different species where the parasite benefits at the expense of the host. Parasitic diseases account for a large proportion of both human and animal morbidity and mortality. Long thought to be a disease of importance strictly to veterinary medicine, Cryptosporidium has recently emerged as an important group of parasitic protozoa posing a serious threat to public health. This chapter presents a comprehensive review of the biology, pathogenesis, and epidemiology of Cryptosporidium.

1.1 Parasitology

1.1.1 Parasitism

According to the latest WHO estimates [240], infectious disease accounts for nearly 30% of the 56 million annual deaths worldwide. Even in the face of amazing advancements in medicine, science, and technology in the last twenty years, modern medicine has seen a spectacular resurgence or novel emergence of several pathogens thought to have been eradicated or contained, parasites included.
This is likely due to a combination of factors: human demographics and behaviour, technology, economic development and land use, international travel and commerce, microbial adaptation, and changes to and the breakdown of public health measures. One of the clearest examples is the increased resistance to drugs and insecticides, which has proven to be a major cause of the resurgence of malarial diseases [46]. The changing global climate and ecology could create new environments more favourable to pathogenic organisms in addition to hosts and vectors of pathogenic organisms. Furthermore, the rising global population is forcing the expansion of human habitats into the niches of potentially virulent organisms.

Parasitism is defined by the relationship between two organisms: a parasite, most often the smaller of the two, and a host upon which the parasite is physiologically dependent [46, 154]. The host-parasite interaction is a dynamic balance between the two organisms; the virulence of the pathogen and the resistance of the host are constantly changing [46]. Equilibrium between host and parasite is necessary to ensure the survival of both partners and thus sustain the relationship. Hosts and parasites therefore co-evolve. Pathogens constitute selective pressures in the evolution of the host, just as hosts do in the evolution of pathogens.

Globally, eukaryotic parasitic infections account for a significantly higher incidence of morbidity and mortality than disease produced by any other group of organisms [46]. In the developing world this is further exacerbated by additional factors such as other diseases rampant within a given population, poor diet, socioeconomic status, age, war, and other similar stressors. Opportunistic parasites, minimally pathogenic to immunocompetent organisms, can exploit decreased or compromised cellular immunity to induce serious host damage.
While perhaps the most well known example is the immunosuppression brought on by the HIV/AIDS epidemic centred on Africa and Asia, compromised immunity is also common in individuals undergoing chemotherapy or organ transplantation, those with auto-immune diseases, and those in poor health due to malnutrition or starvation. Examples of such opportunistic parasites include Plasmodium spp., Toxoplasma spp., Leishmania spp., Giardia spp., Pneumocystis spp. and Cryptosporidium spp., the focus of this study [20, 22, 48, 60, 89, 92, 115, 137, 154, 183, 222].

1.1.2 The Emergence of Cryptosporidium

Primarily because of the growth rate of the global human population, the expanding appetite for resources of all types has led to the dissolution of many ecological barriers important to the natural control of disease. More specifically, for parasitic zoonoses acquired from wildlife habitats and vice versa, a shift in the interface between wildlife and people from often sporadic and fragile to more permanent and substantial provides significant opportunities for parasite transmission [87, 123, 131, 150, 167, 176, 224]. Daszak et al. (2000) reviewed an array of emerging infectious diseases (EIDs) affecting people and discussed some of the underlying factors mediating their emergence. They went on to address the consequences such EIDs have for human, domestic animal and wildlife health, in addition to the consequences EIDs pose for biodiversity as a whole. Of the 18 EIDs identified, only cryptosporidiosis is named as a zoonotic pathogen of major importance to public health.

Long thought to be a disease restricted to veterinary medicine, the Cryptosporidium genus with its various species has now been established as a significant emerging opportunistic global enteropathogen to humans. Tyzzer (1907) first published evidence on the one-celled organism known as Cryptosporidium muris, found in the gastric glands of mice [226]. Five years later, in 1912, he reported a second species, Cryptosporidium parvum, from laboratory mice, which differed from the type species in both the localization of infection and the developmental morphology of the organism [63]. It was not until the late 1970s and early 1980s, a period coinciding with the early years of the AIDS epidemic, that the host switching by Cryptosporidium to humans was recognized and its pathogenic potential appreciated. With the advent of new detection methods, human cases of Cryptosporidium infection became more apparent. In 1976 Cryptosporidium was diagnosed in a previously healthy three year old child [63]. Two months later a second case arose in an immunosuppressed individual undergoing drug therapy [29]. From 1976 to 1982 the disease was rarely reported and primarily occurred in immunocompromised persons. In 1982, the number of reported cases began to increase dramatically, though still relatively limited to immunocompromised persons. With the aid of newly developed laboratory diagnostic techniques, outbreaks in immunocompetent persons began to be recognized [56]. In the 1990s, the application of molecular techniques to the identification of isolates brought both clarification and complexity to our understanding of Cryptosporidium spp. and host specificity. The WHO [240] now defines Cryptosporidium as the most notable example of an emerging disease caused by a protozoan parasite in the last 25 years.

It has been put forth by some that the catalyst in the emergence of human Cryptosporidium infections was the altered host environment to be found in a new population of HIV (+) individuals [174]. As the demographic and temporal conditions became suitable for the propagation of an organism such as Cryptosporidium, natural selection would likely have facilitated its evolution.
Cryptosporidium has caused many recent large-scale outbreaks in both the developed and the developing world. The pathogenic success of Cryptosporidium within the host is attributed in part to its low infectious dose: as few as one oocyst, the infectious form of Cryptosporidium, is capable of establishing infection [171]. The intracellular but extra-cytoplasmic location and monoxenous life cycle also contribute to the parasite's success. Oocysts are omnipresent, highly stable, and occur in diverse ecological situations, having shown an environmental persistence of up to six months. Coupled with the fact that Cryptosporidium is now classified by the NIAID as a Category B bioterrorism agent, a re-examination of the public health threat from Cryptosporidium spp. is warranted, and this has generated a major shift in research focus towards the organism [37].

1.1.3 Cryptosporidium Biology: Life Cycle & Propagation

Cryptosporidium's life cycle involves two asexual stages (merogony) and a sexual stage (gametogony), all three of which occur intracellularly [164, 230]. This has impaired efforts to clarify the exact mechanisms of the developmental stages of Cryptosporidium maturation [238]. Throughout the majority of its life cycle Cryptosporidium exists as a haploid organism, multiplying asexually. A transient sexual stage results in a diploid state followed by meiotic division. Genetic recombination between C. parvum and C. hominis has been shown, but its exact contribution to genetic diversity within and between each species remains unclear. The oocyst contains four sporozoites, unusual considering that most Apicomplexans contain eight, which are released inside the host [230]. The sporozoite differentiates into a spherical trophozoite and asexual development begins, forming two types of meronts. Typically type one meronts contain 6-8 nuclei, which are eventually incorporated into 6-8 first generation merozoites as the meront matures.
Each merozoite is capable of infecting a new host cell and subsequently developing into either a new type one meront or a type two meront. Upon maturation, type two meronts contain four second generation merozoites, which also invade new host cells [6, 230]. It is these second generation merozoites that initiate sexual reproduction by differentiating into either male (microgametocyte) or female (macrogamont) stages. Microgametocytes mature and release microgametes which, upon leaving their host cell, go on to invade another host cell containing a female macrogamont. Fertilization ensues and a zygote develops, eventually forming an oocyst which undergoes meiosis (sporogony) within the host. Once this is complete each oocyst contains four potentially infectious sporozoites. Oocysts either sporulate in situ and release sporozoites for autoinfection or are expelled from the body in the feces [6, 230].

This life cycle produces two types of oocysts, thick walled and thin walled. Thick walled oocysts make up about 80% of those produced and, with their tough, hardy outer shell, are the ones released into the environment via the intestinal tract. These oocysts are therefore responsible for the long-term survival of Cryptosporidium in the environment. In the environment each oocyst is in its infectious form, is haploid, and contains eight chromosomes. The remaining 20% of oocysts are thin walled; these excyst endogenously and infect new host cells, thus autoinfecting the host and perpetuating the life cycle. It is this autoinfection that can cause Cryptosporidium to develop into a chronic disease in immunosuppressed individuals [6, 230]. It takes approximately 12-14 hours for a generation of parasites to develop and mature [6, 230]. The rapid life cycle, exacerbated by its multiple modes of autoinfection and monoxenous nature, creates a heavy burden upon the host.
This parasite burden can lead to the development of secondary infection sites within the intestinal tract, creating the chronic, relentless infections that are often seen in the immunocompromised, the elderly and the young. Such a high propagation rate further contributes to Cryptosporidium's pathogenic success by flooding the natural world with oocysts. As many as 20 billion oocysts have been collected over a 24-hour period from experimentally infected cattle, making apparent the impact these oocysts can have on the environment.

Figure 1.1 Cryptosporidium life cycle.

Figure 1.1. Depiction of the basic life cycle of Cryptosporidium within and outside of the human host [37].

1.2 Clinical Pathogenesis

1.2.1 Host Colonization

Oocysts are encountered through a fecal-oral transmission dynamic. Once ingested, oocysts excyst, or break open, releasing four individual parasites known as sporozoites. Sporozoites have a predilection for the gastrointestinal tract, principally the ileum or jejunum of the lower small intestine, where they invade the luminal surface of the epithelial cells at the microvillous brush border (Figure 1.2) [16, 56, 105, 119, 133, 138]. Cryptosporidium, an Apicomplexan, is considered an intracellular but extra-cytoplasmic organism. This positioning could enable it to evade the host's standard immune surveillance, giving it time to colonize and establish disease. Studies have shown that these parasites can also colonize other tissues such as respiratory tissues, different regions of the digestive tract or the conjunctiva. Some research indicates that the clinical presentation of cryptosporidiosis can vary in severity depending on precisely where the parasite has localized [171]. Pereira et al. (2002) used a gnotobiotic piglet model of C. hominis and C. parvum infection and found that the parasite consistently invaded the ileum and colon of piglets challenged with C. hominis, versus the jejunum, duodenum and ileum of those challenged with C. parvum [171]. The clinical outcome of each infection was significantly different, leading to the hypothesis that the colonization site within the host could be a determinant of clinical manifestation in that host.

Figure 1.2 Brush border of human intestinal tract.

Figure 1.2. Scans of the microvilli lining the gastrointestinal tract, illustrating the predominant site of human host colonization by Cryptosporidium once ingested.

1.2.2 Clinical Manifestation

Asymptomatic infections of Cryptosporidium have been reported. Clinical cryptosporidiosis presents as an acute gastroenteritis with classic symptoms such as nausea, cramping, fever, weight loss, fatigue and, most notably, profuse secretory diarrheal episodes [153, 156, 191]. The volume of diarrhea can be extreme, with 3 L/day being common and with reports of up to 17 L/day. When infected, humans can excrete up to 10^10 oocysts per gram of feces [63]. The prepatent period, the time from ingestion of oocysts to the excretion of oocysts following completion of the life cycle, can be anywhere from 3-5 days up to two weeks. Duration of infection is largely dependent on the immune status of the patient, though symptoms typically remit within about 30 days.
While self-limiting in the immunocompetent host, infection can result in chronic disease with intractable diarrhea in the immunocompromised host, making Cryptosporidium an opportunistic pathogen. The greatest impact of Cryptosporidium is seen in the HIV(+) community, where infection has become an AIDS-defining diagnosis with a more than 2-fold greater hazard of death than other AIDS-defining diagnoses. Livestock and domesticated or companion animals exhibit the same prominent clinical sign of infection as humans, voluminous watery diarrhea, but severe cases more often result in mortality.

The molecular basis for pathogenicity is not well understood and no specific virulence factors have been unequivocally shown to cause direct or indirect damage to host tissues. Cell death comes about as a direct result of the parasite's invasion, multiplication and extrusion [157, 158]. Pathologically, the increased replacement of mature intestinal epithelial cells by immature cells results in a loss of absorptive capacity of the epithelium, reducing the ability of the host to digest and absorb fats and fat-soluble vitamins [78]. The subsequent release of inflammatory cell mediators stimulates electrolyte secretion and diarrhea. Morphologically, cell damage is a result of villous atrophy, lengthening of the crypt, mitochondrial changes and an upsurge of lysosomal activity in infected cells due to T-cell mediated inflammation [227].

1.2.3 Immune System Response

The human host immune response to Cryptosporidium has not been extensively studied and is still for the most part poorly understood. The parasite appears to make little effort to evade the immune system of the host. Many of Cryptosporidium's surface proteins, glycoproteins and phospholipids are strongly immunogenic and antigenic [72, 75, 230]. It seems plausible that this high immunologic profile may represent a survival strategy of the organism.
The primary mechanism of host defence appears to be cellular immunity, though to some degree humoral immunity is also known to be involved. Mouse models of infection have suggested that IL-12, an important interferon-γ inducer, could play a critical role in determining resistance to C. parvum infection [21]. Hunter et al. (2002) confirmed T cell, specifically CD4 T-lymphocyte, involvement in the mechanisms of immunity and demonstrated that T cell deficient or impaired mice consistently presented with increased susceptibility and a more severe course of disease. In contrast, studies on human cases of Cryptosporidium have shown that primary exposure is not sufficient to protect against future bouts of disease [157, 158]. Taken together, these findings suggest a predominant T-lymphocyte involvement and minimal involvement of B-lymphocytes or of immunological memory.

1.2.4 Diagnosis

When C. parvum was first recognized as a human pathogen, diagnosis was made by a biopsy of intestinal tissue. Methodologies for the routine monitoring of Cryptosporidium are only semi-quantitative as they do not provide information on viability or human infectivity. A variety of tests such as ELISA (enzyme-linked immunosorbent assay) and IFA (immunofluorescence assay) can detect anti-Cryptosporidium IgM, IgG and IgA antibodies, but they are unable to distinguish between different pathogenic species or distinct genotypes of a given species. In recent years the advancement of PCR and related techniques has proven reliable for accurate species identification and distinction [15, 62].

Medical laboratory diagnosis of cryptosporidiosis is relatively simple. Stool samples are collected and sugar flotation or a comparable technique is used to concentrate the organism. Acid-fast staining methods, with or without stool concentration, are most frequently used in clinical laboratories.
For greatest sensitivity and specificity, immunofluorescence microscopy is the method of choice, followed closely by enzyme immunoassays. Ziehl-Neelsen acid-fast staining or IFA staining of oocysts in fecal smears is sufficient to indicate the presence of parasites (Figure 1.3). Histological sections from a biopsy of intestinal epithelium showing any stage of the organism can also provide a positive identification of the parasite. However, a majority of cases are not confirmed and/or reported to health officials, either because not all patients seek treatment or because health professionals fail to submit specimens for Cryptosporidium-specific diagnosis. The result, especially in an outbreak situation, is a skewed representation of actual cases, as few are definitively confirmed through a diagnostic laboratory and many are simply resolved with the speculation of Cryptosporidium infection.

Figure 1.3 Cryptosporidium detection; acid-fast staining method.

Figure 1.3. Micrograph of a direct fecal smear stained to detect Cryptosporidium using modified cold Kinyoun acid-fast staining technique. Cryptosporidium oocysts are stained red. Source: CDC/Dr. Pearl Ma.

1.2.5 Chemotherapy & Treatment

There is no definitive cure for cryptosporidiosis, and efficacious chemotherapies or vaccines have yet to be described in the literature [107, 108, 110, 118]. Despite more than 120 drugs having been tested against Cryptosporidium, effective anti-microbial treatments for the disease are still lacking. Some have proven toxic to the patient at the doses required to reduce parasite multiplication, others have shown efficacy only in animal models, and most have shown no efficacy at all. Though limited in scope, some studies have indicated moderate success with the use of macrolide antibiotics (Figure 1.4). Currently the treatment of choice is symptom based.
Oral rehydration, anti-diarrhea medications and electrolyte replacement are all recommended [63, 65, 240].

Figure 1.4. Macrolide antibiotics recently explored as Cryptosporidium therapeutics [63]

Nitazoxanide - treatment demonstrates some effectiveness when administered to immunocompetent patients with a dramatically more severe clinical manifestation of disease. Nitazoxanide has recently been approved by the United States Food and Drug Administration for treatment of cryptosporidiosis in 1-11 year old children (USFDA, 2002).

Azithromycin - another drug that has demonstrated some improvement in diarrhea symptoms when given to immunosuppressed children.

Octreotide - though it has no effect against the organism itself, octreotide also appears to help control diarrhea symptoms, as alternative uses have shown it effective against watery diarrhea and in the reduction of flushing.

Paromomycin - studies have shown a mild effectiveness against the actual parasite when used in HIV(+) individuals. A substantial decline in the number of infectious oocysts, a decrease in intensity of the disease and improvements in intestinal function and morphology were reported [78].

Figure 1.4 Major macrolide antibiotics used as chemotherapeutic agents against Cryptosporidium infection with varying degrees of success.

1.3 Epidemiology

1.3.1 Exposure

Person-to-person contact, agriculture, livestock, wildlife and drinking or recreational water sources are all principal points of exposure to Cryptosporidium [90, 125, 165, 181]. Cryptosporidiosis is considered to be a highly contagious and communicable disease. By some accounts Cryptosporidium is one of the most important parasitic causes of diarrheal disease [154]. The success of Cryptosporidium is attributed to multiple factors. Cryptosporidium oocysts, the infectious form of Cryptosporidium, have a very resistant nature.
Oocysts are omnipresent, highly stable and have been isolated from a diverse array of ecological situations: lakes, rivers, streams, ponds, marshlands and saline-rich coastlines. These tiny spore-like bodies are surrounded by a tough protective wall and can remain in their infectious state outside of the host, whereas many parasites are only in their infectious state once inside their host. A thick-walled oocyst measures just 4-5 µm in diameter, about half that of a normal red blood cell [15, 17]. Oocysts are shed in the feces and have been shown to have an environmental persistence of up to 6 months [62, 63, 65]. Also contributing to the parasite's success is the fact that only a few oocysts are required to establish infection [62, 64]. As many as 100 different mammals can serve as a reservoir host for infectious Cryptosporidium species, making it even more universal. Humans and livestock, particularly livestock neonates, are considered to be the most significant source of oocysts.

1.3.2 Transmission Dynamics

With the host range of the organism being so broad, the transmission routes of Cryptosporidium oocysts become multi-faceted and very complex. This is further complicated by the ability of different transmission routes to interlace with one another (Figure 1.5), which makes tracking infection sources and subsequent transmission routes arduous. Cryptosporidium is transferred through zoonotic transmission (animal to human), anthroponotic transmission (human to human), contact with fecally contaminated surfaces, the ingestion of fecally contaminated food (foodborne) [143, 180] and water (waterborne) and, though rarely, via aerosol transmission. Regardless of the mode of transmission, the transfer of parasitic oocysts from host to host is mediated by the fecal-oral route.

Zoonotic Transmission

C. parvum is capable of infecting most species of mammals [6, 63, 230]. In many zoonotic diseases a vector agent is required; however, Cryptosporidium can pass from animal to human through direct fecal contact with infected animals. Most of these situations arise on farms, at petting zoos or through direct contact with wildlife.

Bovines are the primary reservoirs for C. parvum and they play a central role in maintaining and disseminating oocysts because of their high susceptibility to disease and extensive diarrheal episodes [10, 11, 208]. The disease is most prevalent in neonates, but the definitive source of infection and the direction of transmission between calf and adult are still unclear. Calves can excrete 10^10 oocysts per day. Outbreaks of cryptosporidiosis have been associated with both beef and dairy cattle. The evidence for C. parvum transmission from calves to humans is unequivocal, particularly during the calving season. Besides direct contact with livestock, human cases may also arise as agricultural or farm personnel and equipment become vehicles for transmission. Furthermore, the exceedingly high prevalence of infected calves on dairy farms raises additional questions about the prudence of handling and drinking unpasteurized milk. While the human threat from Cryptosporidium within the public sector is of greatest concern, the indirect correlation to the prevalence of Cryptosporidium on farm and agricultural lands cannot be ignored.

Many infectious agents, mostly parasites, are carried by wild animals [176]. Direct fecal-oral transmission between people and wildlife on farms or at petting zoos can facilitate zoonotic transfer of infectious oocysts [173, 176, 197]. The precise significance of wildlife as a reservoir for farm animal or human cases still needs to be elucidated.
More recently, Cryptosporidium has been reported in mice, feral pigs, wild rabbits, foxes, squirrels, chipmunks and muskrats. Some of these animals can often be found in urban areas, increasing the opportunity for transmission to humans and domestic animals. Wild animals often share their habitats with farm animals and agricultural lands, providing an additional source of environmental and livestock contamination which could ultimately carry on to man [181].

Humans have close interaction with companion animals. Sharing living spaces also means sharing microorganisms that can cause disease. Common pets include dogs, cats and birds, and the more exotic fish, snakes, lizards and ferrets. Though transfer is suspected, reports of cases transferred from household pets such as cats and dogs to humans are limited in the literature. The prospect of acquiring Cryptosporidium from a household pet is typically more serious for children, the elderly and the immunocompromised [111, 131].

Anthroponotic Transmission

Human to human contact involving an infected individual also facilitates spread of disease [107]. Typically these cases spread to the immediate family or other household members who are likely to be exposed to the organism. Cryptosporidium transmission occurs with very high frequency in children's facilities such as daycares and schools [111]. Infants and young children are clustered in classrooms, share toilets and common play areas, or necessitate frequent diaper changing. Nosocomial, or hospital acquired, infection is another major opportunity for anthroponotic transmission, making both staff and patients vulnerable [35]. The housing of multiple patients in close quarters increases the chance of cross-infection, as do staff circulating about the hospital from patient to patient and ward to ward.
Foodborne Transmission

Occasionally food sources, such as raw meat, unpasteurized products and fruits or vegetables, may serve as vehicles for transmission. This is presumably because of contamination through fecal matter in untreated water used to wash, irrigate or spray crops [133]. Foodborne transmission can also result from improper food preparation and/or inadequate food safety measures. A Cryptosporidium outbreak in Maine was traced to children who drank fresh-pressed apple cider contaminated by animal feces at a county fair [143]. This is thought to be the first documented outbreak involving this transmission mode. The handling of food in unsanitary conditions or by unsanitary workers is of great concern, as workers may unwittingly transfer oocysts to foods that are not cooked after handling, creating the potential for large-scale outbreaks [180, 198].

Waterborne Transmission

In the last two decades enteric protozoa have become the leading cause of waterborne disease outbreaks for which an etiologic agent can be determined. Waterborne transmission of Cryptosporidium is the most significant route of exposure in both sporadic and outbreak situations. The most common sources are contaminated drinking and recreational water. Contaminated drinking water has been shown to be responsible for many outbreaks [130, 165, 239]. General causes include inadequate treatment practices, contamination at treatment plants, and direct sewage contamination through pipe leakage, breakage, backsiphoning and cross-connections. Most municipalities throughout North America acquire drinking water from surface and groundwater resources [59]. Recent studies indicate that Cryptosporidium oocysts are present in 65-97% of surface waters (i.e. rivers, lakes, streams) tested throughout North America [59]. Concentrations of oocysts as high as 5800 per litre of surface water have been found. Groundwater is also impacted, with estimates of 9.5%-22% of United States groundwater samples testing positive for Cryptosporidium [59]. The largest waterborne disease outbreak for any pathogen ever recorded resulted in approximately 403,000 cases of cryptosporidiosis in Milwaukee, Wisconsin in the early 1990s [130]. Runoff from nearby dairy farms, drainage from an abattoir and other sources were all suspected, creating an indirect component of zoonotic transmission.

An association with agriculture or wildlife in close proximity to water resource facilities is thought to be a major contributor to this transmission mode [95]. Water intake facilities located near agricultural or pasture land provide an opportunity for oocysts to enter public water systems through run-off waters. During a confirmed waterborne outbreak in the British Columbia interior, oocysts were detected in 70% of the cattle fecal specimens collected in the watershed close to the reservoir intake [163].

Non-domesticated wildlife and livestock can also initiate drinking water contamination by gaining access to, and releasing infectious oocysts in, regions designated as protected water resources for human consumption. Graczyk et al. (1997) found that feces from migratory Canada geese collected at seven of nine sites in Chesapeake Bay contained Cryptosporidium oocysts [80]. Oocysts from three of the sites were infectious to mice and identified as C. parvum. Based on this it would appear that waterfowl can pick up infectious Cryptosporidium spp. from their habitat and deposit them into the environment, including drinking water supplies, where they become accessible to humans.

Recreational water amenities are a second route for encounter, particularly those visited by small children. Unintentional fecal release from infected babies or toddlers could contaminate a pool, wading pool or hot tub enough that others would be exposed upon ingestion of the water.
This, combined with the oocysts' resistance to chlorine, low infectious dose and a high bather density, creates optimal conditions for outbreak situations. In both the United States and Canada numerous outbreaks associated with swimming pools, waterslides and water parks have been documented [200]. According to the Natural Resources Defense Council of the United States, at least 33% of rivers and over 50% of lakes in North America are unfit for swimming, fishing and other activities. Contamination from uncontrollable animal and human sources contributes to the fecal burden, making safeguards in these environments difficult or impossible to implement.

Figure 1.5. Interlacing of Cryptosporidium transmission dynamics.

Figure 1.5. Simplified schematic of Cryptosporidium transmission dynamics, illustrating sources of exposure and pathways of spread. Produced by author; JMW.

1.3.3 Prevalence

Infectious diarrheal diseases are the second leading cause of morbidity and mortality in the world. An estimated 200 to 375 million episodes occur each year in the U.S. alone, resulting in 73 million physician consultations, 1.8 million hospitalizations and 3,100 deaths [240]. Worldwide, there are 3.1 million deaths associated with diarrhea each year, more than 8,400 per day, mostly among children in developing countries [240]. Some estimates claim that 3-7% of all reported diarrheal diseases in the third world can be traced to Cryptosporidium species [19]. In industrialized nations it is estimated that around 0.4% of the population is passing oocysts in the feces at any one time. Unfortunately most countries in the world have no testing protocols for Cryptosporidium, either on a routine basis or as a cause of death when diarrhea is implicated. The result is that many cases go unrecognized or are settled with speculation, skewing the actual rates of incidence for the disease.
Most experts agree that the actual number of cases is much higher than the number reported or documented. At least nine molecularly different types of Cryptosporidium have been found to infect humans, whether immunocompetent or immunocompromised [62, 63, 65]. The vast majority of cases are caused by C. hominis and C. parvum, making them the biggest concern from a public health standpoint. In the United States and Australia, C. hominis is responsible for greater than 75% and 85-92%, respectively, of all human infections. In contrast, it is reported that in the United Kingdom human cases are mainly the result of C. parvum infection (61.5%), while C. hominis accounts for approximately 37.8% of all human cryptosporidiosis cases [139]. The difference is likely a result of the more pronounced separation between urban and rural populations in the US and Australia compared with the UK, where agriculture plays a more significant role. In developing countries, diarrheal disease is much harder to contain and is worsened by the increase in migration and movement of populations in the last two decades, which has enabled national boundaries to disappear as far as the transmission of disease is concerned [98, 240]. While the prevalence in these regions is likely extremely high, the lack of infrastructure in place to document suspected cases prevents precise estimates. According to the WHO, the three most common causes of protozoan diarrhea are Cryptosporidium parvum, Giardia intestinalis and Entamoeba histolytica.

Temporal circumstances also play an elemental role in Cryptosporidium epidemiology. Cryptosporidium has been shown to have a seasonal distribution that depends on geography, with temporal trends or patterns leading to an increase in parasite burden in the environment. Many outbreaks have been dated to post-rainy seasons, as increased rainfall and run-off events are major factors affecting the total microorganism load in water sources.
In the Northern hemisphere Cryptosporidium generally becomes a problem from March to June [59, 62, 63]. Typically during the spring season rains increase the run-off and the population of neonate animals is higher. In both Great Britain and the West Coast of Canada this season tends to be a little longer, beginning in February and lasting until mid-May [59, 62, 63]. Earth is experiencing increasingly severe weather systems, in turn leading to more frequent and more extensive flooding. To appreciate the effects of climate change and weather patterns on engineered water systems we only have to look at the recent major flooding events of the Asian tsunami and Hurricane Katrina. During massive flooding events water systems become inundated, resulting in their collapse and contamination with human and agricultural wastes, which ultimately leads to a lack of potable water for human consumption. Flooding can also lead to significant population displacements which compromise normal hygiene standards, creating the perfect dynamics for large-scale outbreaks.

1.3.4 Demographics & Sociological Influences

Cryptosporidium is cosmopolitan in its distribution. All humans are presumed susceptible to infection regardless of age, race and gender [19]. Cryptosporidium infection has been reported in persons from three days of age to ninety-five years of age [63]. The severity of cryptosporidiosis typically varies according to age, immune status and socioeconomic circumstances. There are extraneous factors that can increase the chances of an encounter with the organism or that will dictate the course of disease, thus creating pockets of target populations for Cryptosporidium incidence (Table 1.1).

While age is not a defining factor in Cryptosporidium epidemiology, some of the most critical cases tend to appear at the age extremes of the elderly and the young [20]. The immune system is either deteriorating with age or just beginning to mature, both of which create a vulnerability to opportunistic pathogens. Age can also affect the degree of personal hygiene among these populations, as the elderly may not be physically able to tend to themselves properly and young children are unaware of the importance of proper sanitation.

Cryptosporidiosis is a serious illness in patients with suppressed immune systems, making them a high risk target population for infection [32, 45]. Primary illnesses such as HIV(+) status, immune system deficiencies and autoimmune disorders create a favourable niche for the parasite to colonize. Infection rates for AIDS patients are reported to be 4% and 2.5% in the United States and Canada respectively [240]. Patients undergoing immunosuppressive chemotherapies for cancers and organ transplants are also in a weakened immune state and have a greater susceptibility to the disease.

Since transmission is dominated by the fecal-oral route, areas of inadequate sanitation and poor hygiene standards create another target population [8, 20, 169]. This renders cryptosporidiosis a disease of great socioeconomic significance, as these conditions are more likely to be found in poverty-stricken regions or the developing world. According to the WHO, approximately 1.1 billion people worldwide lack access to improved water sources and 2.4 billion have no basic sanitation. The scope of the problem is enormous, as each year close to 4 billion cases of diarrhea occur globally. In Southeast Asia and Africa diarrhea is responsible for as much as 8.5% of all deaths. The differences in diarrheal disease incidence between developed and underdeveloped countries can be attributed to sanitation, access to potable water, and personal and domestic hygiene [8].

A review of the geographic distribution and prevalence of Cryptosporidium, based on oocyst detection and seroprevalence studies in humans from forty countries, was compiled by Ungar et al (1990). Based on detection of oocysts in fecal specimens, the prevalence of human infection in African countries (2.6-21.3%), Central and South American countries (3.2-31.5%), Asian countries (1.3-13.1%) and others in the Pacific and Caribbean areas is considerably greater than that of Europe (0.1-14.1%) or North America (0.3-4.3%) [62, 144].

Table 1.1 Target populations for Cryptosporidium exposure

Source                                       Likely Target Population                                  Rural/Urban
Daycares/schools                             Infants, young kids, employees                            either
Unfiltered/untreated drinking water          Small communities, farms, those using well based water    rural
Lambing, calving, muck spreading             Farmers, ranchers, ranch hands                            rural
Sexual practices                             Young to older adults, gay males                          either
Nosocomial/health facilities                 Elderly, patients, staff, visitors                        either
Farm & zoo animals                           Veterinarians, children, employees                        either
Regions without water treatment standards    Travelers                                                 rural
Household pets (rare)                        Household members                                         either

Table 1.1. Populations considered to be at a slightly elevated risk of Cryptosporidium exposure.

1.3.5 Prevention & Control

Knowledge of the clinical and ecological aspects of a pathogen is important if public health measures to control it are to be effective. Pathogens that have more complex, interlacing transmission dynamics demand a comprehensive approach to control and prevention strategies. Cryptosporidium is acquired through the ingestion or inhalation of infectious oocysts, so control efforts are aimed at limiting host contact with the organism. Management of outbreaks calls for a multi-tiered approach encompassing scientific, medical, economic, political and educational solutions.
For Cryptosporidium this is three dimensional, with efforts focused on public water safety, agricultural practices, and hygiene standards.

Cryptosporidium & Water Quality

It stands to reason that the role of water in the transmission of waterborne pathogens may increase substantially in importance and complexity as human and animal populations grow and the demand for potable water escalates. Contaminated water is commonly considered to be the most likely source of Cryptosporidium exposure, making water purification the most important single measure available for ensuring public health [133]. In North America drinking water contaminated with oocysts has been blamed in a number of gastroenteritis outbreaks, bringing about some apprehension about the safety of public water [86, 90, 130, 163]. The 1993 Milwaukee outbreak, where over 400,000 people were affected, brought the issue of drinking water safety and standards to the forefront. The sheer magnitude of the outbreak, and its association with water obtained from a municipal water plant that was operating within existing state and federal guidelines, raised questions as to the validity of current regulations. Not only did this outbreak emphasize the need for improved surveillance by public health agencies, it also stimulated efforts to develop regulatory standards specific to Cryptosporidium. Here in Canada authorities have been forced to re-evaluate water treatment and monitoring practices following the tragic consequences of the waterborne outbreaks of E. coli O157:H7 in Walkerton, Ontario and C. parvum in North Battleford, Saskatchewan [59, 90]. Both have awakened health officials here in Canada to our own vulnerability to such diseases despite our more “urbanized” utilities in place. Ignorance is no longer an option.

Water contamination can occur at any of the three major steps in water systems: source water, water treatment and water distribution [28]. The first link in the chain of providing access to clean, safe water is ensuring the source water quality. Many municipalities in North America extract water from surface waters such as rivers, lakes and streams or from groundwater resources. Many of the “supposedly” protected sources are susceptible to contamination from wildlife, accidents or contaminated groundwater flows [100]. Older water distribution systems are rapidly deteriorating. In any given city there are thousands of miles of piping, and not only are the replacement costs extreme but the process is very slow, likely taking years. Although technologies are available to treat even the most contaminated water source, it is an ongoing challenge for many communities to incur these costs and implement the more current or advanced methods.

Treatment of municipal drinking water is commonly done in two ways: through chemical treatments and through filtration. Chemically, chlorination is the most frequently used method for disinfecting water, killing most viruses, bacteria and protozoa such as Giardia. Research shows that Cryptosporidium is 240,000 times more resistant to chlorination than Giardia, and the actual amount needed to effectively kill Cryptosporidium oocysts would render water too toxic for consumption [101]. Filtration, using ultra-fine membranes, is a better bet for removing Cryptosporidium oocysts, though the expense of such a system is a major issue [29]. Listed below are the contaminant removal parameters for biological agents in municipal water systems [28, 29]:

• 5-100 microns, conventional filtration: removes human hair, the smallest particles visible to the naked eye and red blood cells.
• 0.1-5 microns, micro filtration: removes the smallest yeast cells, tobacco smoke and the smallest bacteria.
• 0.01-0.1 microns, ultra filtration: removes carbon black.
• 0.001-0.01 microns, reverse osmosis: removes ionic particles such as polio virus, aqueous salts and metal ions.
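As a rough illustration of the size classes listed above, the following Python sketch picks the coarsest filtration class that would still retain a particle of a given diameter. The function name and table layout are assumptions made for illustration, as is the idealized rule that a filter retains every particle at least as large as the lower limit of its nominal range. The 4-6 micron oocyst size is the figure quoted in this chapter.

```python
# Nominal size ranges in microns, taken from the list above, coarse to fine.
FILTRATION_CLASSES = [
    ("conventional filtration", 5.0, 100.0),
    ("micro filtration", 0.1, 5.0),
    ("ultra filtration", 0.01, 0.1),
    ("reverse osmosis", 0.001, 0.01),
]


def coarsest_sufficient_class(smallest_particle_um):
    """Return the coarsest class whose lower size limit does not exceed
    the smallest particle that must be removed.

    Assumption (idealized): a filter retains every particle at least as
    large as the lower limit of its nominal range.
    """
    for name, lower_um, _upper_um in FILTRATION_CLASSES:
        if lower_um <= smallest_particle_um:
            return name
    return None  # smaller than anything covered by the table


# Oocysts are quoted as 4-6 microns, so the 4 micron end decides the choice.
print(coarsest_sufficient_class(4.0))
```

Under these assumptions the smallest oocysts fall just below the conventional filtration range, which is consistent with micro filtration being treated as the minimum requirement.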
With a size range of 4-6 microns for Cryptosporidium oocysts, a minimum of micro filtration is required, but preferable treatments would entail the use of ultra filtration or reverse osmosis [28]. Most current water systems are aging and in desperate need of upgrades to newer, more technical systems such as filtration. The use of ozone and ultra-violet (UV) light has also been shown to disinfect water sources successfully with respect to Cryptosporidium [59, 63, 90]. Although ozonation of water demonstrates the ability to kill Cryptosporidium oocysts, the appropriate amounts of ozone needed to disinfect water at various temperatures and pH levels have not been clearly defined. In general, the amount of ozone needed to kill Cryptosporidium species is hundreds of times greater than that needed to kill bacterial contaminants [107].

For immunocompromised individuals, avoiding contaminated water is particularly important [107]. The risks involved with tap water are still not clearly defined but are considered to be high enough that these people are advised to use properly filtered bottled water or to boil water intended for drinking for a minimum of one minute. In-home purification and filtration systems can reduce the risk of exposure provided they can remove particles 0.1-1 micron in size, filter via reverse osmosis or have an absolute 1 micrometer filter [107]. In an effort to determine how best to kill Cryptosporidium oocysts, a number of different techniques and chemicals have been tested (Table 1.2) [63].

Table 1.2 Deactivation of Cryptosporidium Oocysts
Very effective | Somewhat effective | Not effective
Boiling, >73°C, >1 min | Ammonia | Phenol
Freezing, <(-)2°C, >24 hrs | Chlorine | Formaldehyde
Methyl bromide | UV light | Ethanol
Ethylene oxide | Iodine | Isopropyl alcohol
Hydrogen peroxide | Lysol

Table 1.2. Efficacy of various methods for deactivation of infectious Cryptosporidium oocysts [56].

Despite all the advancements in understanding Cryptosporidium and how best to approach it in public water resources, the relentless nature of the organism still creates problems. Most waterborne outbreaks of Cryptosporidium in North America have occurred in communities whose water facilities were compliant with governmental and health regulations. Although utility companies may adhere to guidelines, the guidelines themselves may not be sufficiently stringent for public protection. Recent surveys in the United States for the presence of Cryptosporidium oocysts in fully treated (disinfected and filtered) municipal water showed that a small number of oocysts breached the barriers and could be isolated from tap water in nearly half of the communities evaluated [121, 122]. This gave rise to many questions. Does a small number of oocysts in drinking water constitute a large enough infectious dose to cause illness? Are immunosuppressed persons more susceptible to lower doses? Are there strains of Cryptosporidium that vary in infectious dose and infectivity?

Agriculture Approaches

Farming and agricultural lands may also play a role in introducing Cryptosporidium into water systems, most often because of their proximity to waterways and intake facilities. Because rainfall or snowmelt can transport contaminated fecal material from grazing fields, cattle farms and feedlots should be located away from surface water sources such as rivers, lakes and streams. Stream bank fencing is recommended for landowners who pasture their livestock along these waters, as it not only improves the overall water quality but also protects the wildlife, fish and vegetation.
Other practices encouraged are the reduction of stock density, separation of neonates from adult populations, minimal contact between personnel and calves, and maintaining a relatively short calving period.

Hygiene

With enteropathogens, prevention centres on hygiene measures in any setting in an attempt to interrupt fecal-oral transmission. Sanitation and personal hygiene standards are critical in the home and public places [45]. Epidemiological evidence suggests that hygiene and sanitation are at least as effective in preventing disease as are improved water supplies [133]. Regular hand washing is the number one recommended practice to prevent exposure to fecally transmitted microorganisms. The simple act of washing hands with soap and water can decrease diarrheal disease transmission by one third [240]. Vigilance is especially required after visits to hospitals, nursing homes and daycares or zoos. Avoidance of public pools and aquatic centers frequented by diapered or young children is encouraged. The amount of chlorine and types of filters used in public swimming pools do not prevent transmission from swimmers shedding infectious oocysts. The safe disposal of children’s feces is critical, as children are not only more likely to acquire diarrheal disease but are also the most likely source of infection.

Implementation of the strategies for prevention and control of cryptosporidiosis is a more daunting challenge for the developing world [9, 20]. The World Health Organization has listed Cryptosporidium as a “reference pathogen” for the monitoring of global water quality. A lack of fresh, clean water and poor sanitation conditions are a catalyst for the spread of disease, making Cryptosporidium and similar pathogens endemic to these regions. The water supply and sanitation sectors, or lack of them, will face enormous challenges over the coming decades as the urban populations of Africa, Asia, and Latin America are all expected to increase dramatically [20]. This will put great strain on an already failing system. In rural Africa, Asia and Latin America alone, just fewer than 2 billion people, one third of the total global population, are without access to improved sanitation. Approximately 1.1 billion are without improved water supply. Some believe these countries would be more appropriately referred to as “Thirst World” countries rather than third world. Innovative and cost-effective source and treatment options, public initiatives, and monitoring programs directed towards the needs of these countries are essential if the global incidence of waterborne disease is to be curbed.

CHAPTER 2 GENOMICS OF Cryptosporidium
- Phylum, Genus, Species -

Chapter Summary - The Cryptosporidium genus, phylum Apicomplexa, has been undergoing constant revision and is the subject of great debate within the field. The accurate identification of a species or genotype is fundamental to the diagnosis, treatment, and prevention or control strategies of cryptosporidiosis in both humans and animals. The burden of disease attributable to a specific species is still elusive, which hampers efforts to clarify the transmission dynamics and epidemiology of cryptosporidiosis. Herein the genomics of Cryptosporidium spp. as they are currently known are described.

2.1 Phylum Apicomplexa

Cryptosporidium has been classified as a genus within the Apicomplexa. Phylum Apicomplexa, previously known as Sporozoa, is large, complex, and consists of protists characterized by the presence of an apical complex. They are unicellular, spore-forming, and most often parasites of animals. All members are parasitic, have multifaceted life cycles involving both asexual and sexual reproduction and, since most are intracellular, lack any visible means of locomotion. Apicomplexan parasites are eukaryotes and therefore share many metabolic pathways with their hosts.
Because of this, therapeutic target development becomes extremely difficult. An efficacious drug that harms an Apicomplexan parasite is also likely to cause harm or damage in its human or animal host. Biomedical research on these parasites is challenging because it is difficult, if not impossible, to maintain live parasite cultures in the laboratory and to genetically manipulate these organisms. This has impaired efforts to secure purified samples at different developmental stages. Research is forced to focus on intensive molecular studies to clarify the pathobiology. The most medically important and notorious Apicomplexan genus is Plasmodium, which includes the causative agents of malarial disease. Listed below are four of the classes within the Apicomplexan phylum and some genera they contain.

• Coccidia: Cryptosporidium, Eimeria, Sarcocystis, Toxoplasma
• Gregarinia: Gregarina, Monocystis, Pseudomonocystis
• Haemosporidians: Hepatocystis, Plasmodium
• Piroplasmids: Babesia, Theileria

In spite of its medical and veterinary importance, Cryptosporidium has not been studied to the extent that other protozoan parasites, like Plasmodium spp., Giardia spp., and Toxoplasma spp., have. Cryptosporidium has the traditional hallmark features of an Apicomplexan organism; however, the differences between Cryptosporidium and other Apicomplexa have prevented the application of parallel scientific and therapeutic tactics.

2.2 Genus Cryptosporidium

Cryptosporidium is classified as a genus of protozoan parasites with multiple species capable of infecting mammals, reptiles, birds, fish and amphibians. The taxonomic status of this genus is rapidly changing as new molecular data are published. The most widely recognized species, C. parvum, was once thought to be a single species with a broad host range, whereas now several species have been identified. Presently there are 20 pathogenic species/genotypes of Cryptosporidium recognized (Table 2.1).
With the exception of C. parvum, capable of infecting over 150 mammals, each species appears to have a more restricted host specificity or range. Initially, studies indicated that species were limited to a single host. Molecular investigations have since challenged this concept. Cryptosporidium muris, C. meleagridis, C. baileyi, C. canis, and C. felis, considered to be confined to mice, turkeys, chickens, dogs, and cats respectively, have now all been documented in humans [63, 230]. In the case of C. parvum, morphologically identical genotypes could eventually be accepted as separate species as biological and molecular data amass, as was the case with C. hominis. The recent acceptance of C. hominis as a distinct human-specific species has already been challenged. Though still incapable of establishing infection in mice, C. hominis has now been shown to infect lambs, gnotobiotic pigs and higher primates under the appropriate laboratory conditions. Host-parasite co-evolution is also common in Cryptosporidium, as closely related hosts usually have related Cryptosporidium parasites. The issue of host specificity is clearly multifarious and is therefore likely a fallible means for determining a species and its host range. Understanding the evolution of Cryptosporidium species is important not only for clarification of the taxonomy of the parasites but also for the assessment of the public health significance of Cryptosporidium parasites from animals. Until the full extent of intra-specific allelic variation in Cryptosporidium is resolved, numerous genotypes will likely continue to be described. The vagueness of the ancestral relationships of the Cryptosporidium genus as a whole has fuelled the need for extensive molecular and biological research into the organism.

Table 2.1 Recognized Cryptosporidium spp.
Species/Genotype | Predominant Host | Reference
C. andersoni | Cattle; bovine (Bos taurus) | Lindsay et al., 2000
C. baileyi | Birds; chicken (Gallus gallus) | Current et al., 1986
C. bovis | Cattle; bovine (Bos taurus) | Xiao et al., 2001
C. canis | Dog (Canis familiaris) | Fayer et al., 2001
C. felis | Cat (Felis catus) | Iseki, 1979; Pedraza-Diaz et al., 2000; Morgan et al., 2000
C. galli | Birds (Spermestidae, Fringillidae, G. gallus) | Pavlasek, 1999
C. hominis | Humans, higher primates (Homo sapiens) | Morgan-Ryan et al., 2002; Xiao et al., 2001; McLaughlin et al., 1999, 2000
C. meleagridis | Birds; turkey (Meleagris gallopavo) | Slavin, 1955; Morgan et al., 2000; Xiao et al., 2001; Pedraza et al., 2001; Streter et al., 2000
C. molnari | Fish (Sparus aurata) | Alvarez-Pellitero et al., 2002
C. muris | Rodents; mouse (Mus musculus) | Katsumata et al., 2000; Gatei et al., 2003; Palmer et al., 2000; Tyzzer, 1907
C. nasorum | Fish (Naso lituratus) | Hoover et al., 1981
C. parvum | >100 mammals; humans, livestock, wildlife | Xiao et al., 2001 & 2002; Tyzzer, 1912
C. saurophilum | Reptiles; lizards | Koudela et al., 1998
C. serpentis | Reptiles; snakes (Elaphe guttata) | Levine, 1980
C. suis | Pigs | Xiao et al., 2002
C. varanii | Reptiles; emerald monitor lizard | Pavlasek, 1995
C. wrairi | Guinea pig (Cavia porcellus) | Vetterling et al., 1971
Cervine genotype | Deer | Ong et al., 2002 & 2006
Rabbit genotype | Rabbits, hares | Xiao et al., 2002
Marsupial genotype | Marsupials | Morgan et al., 1999

Table 2.1. Accepted Cryptosporidium species and/or genotypes. The genus is undergoing constant review and revision, and many of the species listed here were only recently identified, classified and accepted as true species.

Abrahamsen et al. (2004) successfully sequenced and published the C. parvum genome in its entirety, a first in the Cryptosporidium research field [1]. This uncovered new genes/proteins and components of Cryptosporidium’s biology, all of which could potentially be targeted or exploited in downstream studies.
The genome is only 9.1 Mbp with 8 chromosomes, all highly compacted and extremely gene dense [1]. This makes the Cryptosporidium genome about 2.5 times smaller than that of Plasmodium falciparum but with about 1.8 times greater gene density (Table 2.2, Appendix 1). Genome reduction has occurred predominantly through the shortening of intergenic regions, the loss and/or shortening of introns, and a reduction in the mean length of the genes themselves [1]. The C. parvum genome was shown to have highly streamlined metabolic pathways, rendering the organism dependent on the host for nutrient supply [1]. Of particular note was that, in contrast to other Apicomplexans, Cryptosporidium lacks a plastid and mitochondrial genome. In addition, Cryptosporidium species demonstrate some peculiarities when compared to other Apicomplexa, including an endogenous phase of development in microvilli of epithelial surfaces, two morpho-functional types of oocysts (thin-walled and thick-walled), and the smallest number of sporozoites per oocyst [1].

Table 2.2 Cryptosporidium hominis genome summary
Feature | C. hominis | C. parvum | P. falciparum
The genome
Size (Mb) | 9.16 | 9.11 | 22.85
G + C content (%) | 32.3 | 31.9 | 23.7
Coding regions
Coding size (Mb) | 6.29 | 6.80 | 12.03
Percentage coding | 69 | 74 | 53
No. of genes | 3,994 | 3,952 | 5,268
Mean gene length (bp) | 1,576 | 1,720 | 2,283
Genes w/ introns (%) ± | 5-20% | 5% | 54%
Intergenic regions
Non-coding size (Mb) | 2.87 | 2.32 | 10.83
% non-coding | 31 | 25 | 47
Mean length (bp) | 716 | 585 | 1,694
Gene Ontology
Biological processes | 1,239 | n.d. | 1,613
Cellular component | 1,265 | n.d. | 1,586
Molecular function | 1,235 | n.d. | 1,625

Table 2.2. C. hominis genome summary in comparison to C. parvum and P. falciparum genomes, adapted from Xu et al., 2004 [245]. Full table available in Appendix 1, Figure A.1. + Excluding introns. ± Estimated intron content from expressed sequence tags.

2.3 Cryptosporidium hominis versus Cryptosporidium parvum

Though C. parvum has long been thought to be the principal species, molecular investigations revealed that one genotype of C. parvum was distinct in its molecular characteristics, host specificity, and transmission. This genotype is now known as C. hominis, which appears to be restricted to human hosts. Publication of the C. hominis genome following that of the C. parvum genome shows a highly similar gene complement with approximately 97% genetic identity between the two [1, 245]. The phenotypic variation of Cryptosporidium protein repertoires is likely due to these subtle genotypic differences. Parasite transmission, infectivity and host resistance could all be subject to key genetically determined variations between two closely related species or within the species itself.

Molecular evidence for the existence of two separate species has rapidly accumulated [26, 27, 30, 52, 170, 228, 241, 243]. Both are able to infect humans and have occasionally been identified simultaneously in the same host, but a clear absence of recombinants is suggestive of reproductive incompatibility, a distinguishing requirement of species. In areas where both species are endemic, genetic dimorphism is conserved. Considering the sexual stage of its life cycle, the lack of genetic recombinants is of interest. The development of the gnotobiotic pig model able to sustain C. hominis infection is highly significant, as it is the first of its kind and has opened the door for comparative studies on the differences in biological behaviour between the two species. Using this model, Pereira et al. showed that the prepatent period, disease severity and site of colonization all differed, giving further evidence for the idea of two distinct species. Akiyoshi (2003) used a gnotobiotic pig model system to demonstrate the inability of C. hominis to compete with C. parvum. When both were administered concurrently, C. parvum consistently dominated or displaced C. hominis, implying type-specific factors associated with virulence, transmission, and disease severity. Medically, these differences could have considerable importance as indicators of specific risk factors for disease and patterns of epidemiology. Genetic variations correlated with clinically important parameters, specifically within attachment and invasion proteins, could explain why C. hominis preferentially infects humans.

2.4 Biogeographical Genetic Diversity of Cryptosporidium

C. hominis and C. parvum are responsible for greater than 90% of cryptosporidiosis cases in humans in most areas [244]. Geographic differences have been shown to exist among the species and the burden of disease attributable to them. In the United Kingdom, other parts of Europe and New Zealand, C. parvum is responsible for slightly more infections than C. hominis. In contrast, C. hominis is responsible for far more infections in North America, Australia, Japan, and developing countries where genotyping studies have been conducted [71, 120, 146, 163, 169, 170, 212]. The geographic prevalence of one species over another is likely a factor of transmission dynamics.

With the now common acceptance that C. hominis and C. parvum are in fact distinct species, research has shifted to dissecting the intra-species pathogenomics. Evaluating intra-species variation will help elucidate the population structure, which in turn will facilitate tracing outbreak sources and transmission routes. Such investigations can identify sub-species or sub-genotypes inhabiting a geographic subdivision of the range of a species [127]. In contrast to the original idea of limited intra-genotypic variation within C. hominis and C. parvum, sizeable intra-genotypic diversity has been demonstrated and sub-genotypes identified. Some sub-genotypes have revealed a wide-spread geographical distribution whereas others appear to be limited to a restricted geography.
A study of three outbreaks in Northern Ireland, all attributed to drinking water contamination, demonstrated the value of genotyping analysis, especially when performed in a timely manner [75]. The researchers isolated and identified a sub-genotype of C. hominis in 2 of the 3 outbreaks. The same sub-genotype had been isolated in the United States, Canada, the United Kingdom, Portugal, and Peru, suggesting there may be no correlation between strain and point of geographic origin. A second study focusing on the DHFR gene sequence in both C. hominis and C. parvum isolates from the United Kingdom, the United States, Canada and Guinea Bissau showed DNA sequence conservation, indicating that geography has no effect on intra-genotypic variation [232]. A third study supporting the concept of independence from geography was conducted on the 18S rRNA locus using C. hominis and C. parvum isolates from patients with and without HIV, living in Kenya, Malawi, Brazil, the United Kingdom or Vietnam [71]. Supported by phylogenetic analysis, the results revealed a lack of specific variation in correlation with geography.

While geographically independent C. hominis and C. parvum sub-genotypes have been documented, other studies give evidence of genetic variation confined to a defined geography. In Italy, Caccio et al. (2001) used 2 microsatellite loci in zoonotic isolates of C. parvum from all over the country to examine sequence polymorphisms [27]. At the ML1 locus three alleles were discovered, two of which proved to be widespread and a third that was restricted to Southern Italy. This could be interpreted as a non-random distribution, which in turn could be indicative of clonal populations adapted to a particular climate or environment. A similar study on C. parvum, done in Scotland using RFLP analysis of three different loci, revealed little to no evidence for geographical sub-structuring. It should be considered whether the conclusions reached from these studies are applicable to other geographic locales or are particular to the country studied. The analysis of samples from more diverse geographies could determine whether a lack of geographical sub-structuring is due to a relatively small geographic area and the frequent movement of hosts between areas [135].

In an attempt to better define the geographic components of species distribution, Gp60 sub-typing tools have been used. Results have revealed the complexity of Cryptosporidium transmission and could potentially have great significance because the Gp60 locus encodes two glycoproteins thought to be involved with attachment to and invasion of host cells. Numerous Gp60 sub-types of C. parvum and C. hominis have been seen in endemic areas of disease. In developing regions, the complexity of transmission is often reflected by the strong presence of many sub-types within C. hominis. One study focusing on the highly polymorphic Gp60 locus recognized a geographically limited allele, Ie. This allele was one sub-type among many, found predominantly in isolates from South Africa even though isolates from the United States, Brazil, Peru, Guatemala, and Zaire were also genotyped [120]. High heterogeneity is likely an indicator of stable cryptosporidiosis transmission in the area, while conversely in developed or industrialized nations the degree of heterogeneity is typically smaller with fewer sub-types predominating, suggestive of a more unstable transmission of cryptosporidiosis.
2.5 Theory of Clonal Population in Cryptosporidium

For many pathogenic organisms that utilize mainly asexual reproduction methods, it is often unclear whether epidemics are the result of the emergence of pathogenic clones or of environmentally determined increases in the population size of the organism [67, 88]. Descriptions of the genetic structures of epidemic populations can help distinguish between these competing ideas. Examinations of intra- and inter-genotype diversity have raised the question of whether or not Cryptosporidium is in fact a clonal population. The occurrence of widespread genetically identical isolates, apparent association between unlinked loci (linkage disequilibrium) and infrequent genetic recombination suggests that it is. Whether or not these same conditions occur in nature and can occur between C. hominis and C. parvum has not yet been proven. A clonal population implies that meiotic recombination is rare but does not preclude the existence of complete meiosis or the occurrence of sexual reproduction [223]. Meiotic recombination bears great consequence for the generation and maintenance of variation within a species and can have long-term evolutionary impact. Genetic variation can also be generated by intragenic recombination. While Cryptosporidium population structure is almost certainly highly clonal and dominated by the C. parvum and C. hominis lineages, interlineage recombination can occur naturally, producing mixed-genotype progeny that are viable and infectious [217, 223]. There are no stable genetic differences among the lineages, and this could hamper efforts to characterize molecularly diverse individual natural genotypes and sub-genotypes. The clones, not the species as wholes, are the distinctive evolving units; the hypothesis of clonality therefore has great genetic and medical implications.
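One of the signals of clonality mentioned above, non-random association between loci, is commonly quantified with the linkage disequilibrium coefficient D = p_AB - p_A p_B and its normalized form r² = D² / (p_A(1 - p_A) p_B(1 - p_B)). The Python sketch below computes both for two biallelic loci from a list of haploid two-locus genotypes. The isolate data and the function name are hypothetical, and this is a generic textbook calculation rather than the analysis used in this study.

```python
def linkage_disequilibrium(haplotypes, allele_a, allele_b):
    """Return (D, r_squared) for two biallelic loci.

    haplotypes: list of (allele_at_locus_1, allele_at_locus_2) tuples,
                one tuple per haploid isolate.
    allele_a:   reference allele at locus 1.
    allele_b:   reference allele at locus 2.
    """
    n = len(haplotypes)
    p_a = sum(1 for a, _ in haplotypes if a == allele_a) / n
    p_b = sum(1 for _, b in haplotypes if b == allele_b) / n
    p_ab = sum(1 for a, b in haplotypes
               if a == allele_a and b == allele_b) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    # r^2 is undefined when either locus is monomorphic in the sample.
    r_squared = d * d / denom if denom > 0 else float("nan")
    return d, r_squared


# Hypothetical samples: only two genotypes circulate in the first one,
# while all four allele combinations are equally common in the second.
clonal_like = [("A", "C")] * 5 + [("G", "T")] * 5
freely_mixing = [("A", "C"), ("A", "T"), ("G", "C"), ("G", "T")] * 2

print(linkage_disequilibrium(clonal_like, "A", "C"))
print(linkage_disequilibrium(freely_mixing, "A", "C"))
```

In the first sample the alleles at the two loci are completely associated, as would be expected under strict clonal propagation. In the second sample the loci are independent, as would be expected under frequent recombination.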
Advances in our understanding of the population structure and pathogenic properties of Cryptosporidium will come from examining those polymorphisms capable of differentiating among isolates belonging to the same genotypic group. Examining the genetic variation among isolates of the protozoan Giardia lamblia, one study showed a range from a virtual lack of genetic variation to extensive genetic variation [221]. The level at which variation does exist was left unclear, meaning the heterozygosity within an individual versus the polymorphism within a population. Some authors have argued for close relatedness of isolates throughout the world while others emphasize clonal lineages that are evolutionarily independent. Similarly, molecular evidence and arguments for both a genetically sub-structured and a clonal population structure of Cryptosporidium have accumulated. Frequent transmission from environmental sources could increase the probability of coinfections with genetically heterogeneous parasites, thus favouring recombination. In countries where the sanitary conditions are better and diseases like HIV less prevalent, coinfections with heterogeneous parasites originating from environmental sources may be less frequent, and clonal propagation may prevail.

CHAPTER 3 STUDY DESIGN
- Research Concepts, Goals, Hypothesis & Significance -

Chapter Summary - Studies on population genetics provide insight into the genetic relationship of “difficult” complexes and taxonomic representations of a genus. The distribution of genetic variation in intercontinentally disjunct subpopulations may provide important information about the transmission dynamics, epidemiological behaviours and evolutionary patterns of C. hominis. The aim and design of the study enabled us to address research questions regarding the distribution of genetic variation in global C. hominis subpopulations, whether or not such subpopulations are genetically differentiated, and the efficacy of such a typing system for species-specific identification.

3.1 Aim

3.1.1 Central Research Questions

The specific aims for this study are threefold. (1) What is the relationship of genetic diversity and genetic differentiation among international C. hominis subpopulations when partitioned into intra-population and inter-population genetic structures? (2) Is the distribution pattern of genetic variation at the SNP level within a particular gene and/or genotype geographically conserved or geographically widespread? (3) Are single nucleotide polymorphisms (SNPs) mapped to the C. hominis genome a sound approach for molecular typing applications in epidemiological investigations, considering both established and novel genotypes in addition to sporadic and outbreak situations?

3.1.2 Hypothesis

We hypothesize that, on the basis of the allelic profile of a panel of single nucleotide polymorphisms distributed throughout the Cryptosporidium genome, the degree and partitioning of genetic diversity within and among C. hominis subpopulations will be influenced geographically. Secondly, we hypothesize that through the use of a pre-defined pattern of single nucleotide polymorphisms mapped to multiple loci throughout the Cryptosporidium genome we can establish an efficient and reliable molecular tool for species distinction of Cryptosporidium isolates.

3.2 Experimental Objectives

To address our research questions, three sequential experimental objectives had to be achieved. The initial objective was to identify, map and characterize SNPs from biologically relevant genes/proteins throughout the Cryptosporidium genome, allowing for the establishment of a data set of molecular markers. This provided the baseline foundation of the study, setting the tone for all downstream processes.
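To make the idea of an allelic profile concrete, the Python sketch below shows one way a multi-locus SNP-type could be derived: the alleles observed at an ordered panel of SNP loci are joined into a profile, and each distinct profile is given a type number. The locus names, isolate labels, country assignments and numbering scheme are all hypothetical and are not taken from this study.

```python
from collections import defaultdict

# Hypothetical ordered SNP panel; real marker names would differ.
SNP_PANEL = ["locus1_pos120", "locus1_pos388", "locus2_pos45"]


def assign_types(isolates):
    """Give every isolate a type number; identical profiles share a number.

    isolates: dict mapping isolate id -> dict of {locus name: allele}.
    Numbers are handed out in order of first appearance.
    """
    type_of_profile = {}
    assignments = {}
    for isolate_id, alleles in isolates.items():
        profile = tuple(alleles[locus] for locus in SNP_PANEL)
        if profile not in type_of_profile:
            type_of_profile[profile] = len(type_of_profile) + 1
        assignments[isolate_id] = type_of_profile[profile]
    return assignments


def types_by_population(assignments, population_of):
    """Collect the set of type numbers observed in each subpopulation."""
    seen = defaultdict(set)
    for isolate_id, type_number in assignments.items():
        seen[population_of[isolate_id]].add(type_number)
    return dict(seen)


# Hypothetical isolates and origins.
isolates = {
    "KE01": {"locus1_pos120": "A", "locus1_pos388": "C", "locus2_pos45": "T"},
    "KE02": {"locus1_pos120": "A", "locus1_pos388": "C", "locus2_pos45": "T"},
    "PE01": {"locus1_pos120": "A", "locus1_pos388": "T", "locus2_pos45": "T"},
    "SC01": {"locus1_pos120": "A", "locus1_pos388": "C", "locus2_pos45": "T"},
}
population_of = {"KE01": "Kenya", "KE02": "Kenya",
                 "PE01": "Peru", "SC01": "Scotland"}

assignments = assign_types(isolates)
print(assignments)
print(types_by_population(assignments, population_of))
```

Grouping the types by origin in this way gives a first, purely descriptive view of whether a type is confined to one subpopulation or shared among several.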
A panel of mutations needed to be assembled, creating a multi-locus SNP-type (MlSt) or haplotype, differing from the more commonly used and reported approach of multi-locus genotyping for molecular epidemiology studies. To address different kinds of variation, a combination of silent and expressed molecular markers, potential antigenic markers, and markers more likely to be under positive or diversifying selection pressures was used. Our second objective was the design and development of a SNP-based molecular typing assay. The technological design had to be usable with the crude fecal specimens from which parasite DNA is isolated. Competition from other naturally or invasively occurring organisms is likely, so it was essential that the research design be capable of detecting and isolating Cryptosporidium DNA amongst that of other microorganisms. An efficient, standardized methodology was crucial, as the isolates that were kindly donated originated from facilities around the world and had all gone through various processing applications, depending on each lab’s internal procedures and protocols. To assess the full impact of geography on mutation profiles, the third objective was to establish geographic boundaries that were as well defined as possible. Infectious agents do not obey national boundaries and, given the opportunity, an organism will always spread. In light of this, one application of genetic typing techniques is to track pathogens geographically. To allow for more rigorous epidemiological analyses, geographic boundaries with minimal overlap need to be established. This is particularly important when considering the globalization of the modern world. The more restricted or isolated the geography, the lower the likelihood that similar genotypes arise from human travel or urbanization.

3.3 Experimental Rationale

The purpose of this study was to evaluate the biogeographical distribution of genetic variation of C. hominis subpopulations.
While we know Cryptosporidium populations have shown genetic substructuring, we were interested in asking whether or not global C. hominis subpopulations have the same genetic structure. Examination of many isolates from different geographic origins, from both outbreak and sporadic cases, using unlinked informative genetic markers is crucial to gaining a better understanding of the transmission route of disease and probable outbreak sources. Natural geographic barriers can directly impact the dispersion of the organism, thus affecting the exchange of genetic material between species. The relative importance of zoonotic transmission patterns has already been shown to vary according to geography. The cohesive characteristics that discriminate between species of Cryptosporidium could be good predictors of range and potential for transmission. The extent or degree of genetic differentiation could be reflective of transmission intensities within restricted geographic boundaries, having a direct impact on the design and development of chemotherapies.

The Use of SNPs

Phylogenetic analyses are best done with neutral data, but identifying genes that might characterize a transmission mode might well involve genes that are under selection. Allelic variation arises in many ways, including random nucleotide mutations, diversifying selection, horizontal gene transfer and intragenic recombination events. Single nucleotide polymorphisms may be synonymous or non-synonymous. Non-synonymous (NS) SNPs result in amino acid replacements and hence are targets for evolutionary selection. NS SNPs are also excellent markers when evaluating pathogenic species and the differences between them 61, 85. In contrast, synonymous SNPs (S SNPs) do not alter the chemistry or structure of a protein and are therefore functionally and evolutionarily neutral, or nearly so.
Synonymous SNPs occur with higher frequency and are therefore more accessible targets for genetic population studies examining evolutionary relationships between species or sub-species 185. SNPs provide an efficient tool for making associations between whole genome comparisons and epidemiology. The use of both antigenically relevant SNPs and more neutral molecular markers allows for deeper insight into the molecular epidemiology of C. hominis.

The worldwide threat from Cryptosporidium to human health emphasizes the need to develop rapid methods for the identification of genetic relationships among infectious strains, especially those responsible for mass outbreaks or more clinically severe disease. Restricted allelic variation can limit the utility of multi-locus sequence analysis for estimating phylogenetic relationships among strains or genotypes. High-throughput SNP genotyping is an efficient way of assigning closely related strains to specific lineages, either identical or related by descent. This removes a critical barrier to population and geographically based studies on the relationships between Cryptosporidium genotypes, where genetic variation is limited. Compared with other molecular markers, SNPs exhibit extremely low mutation rates, making them rare in recently emerged pathogens and therefore extremely valuable from a phylogenetic standpoint 166, 185. It is likely that the subtle genetic differences between Cryptosporidium species account for the variation observed in pathogenicity, host range, clinical presentation and response to therapies. If a clearer understanding of the biology and epidemiology of this emerging pathogen is to develop, complete characterization of the few genetic polymorphisms that do exist is crucial. Given the many advantages of using SNPs for estimating and examining the taxonomy of Cryptosporidium, SNP-typing could become a tool routinely used to categorize species and strains of the genus.
High-throughput SNP-typing is an attractive method for analysis of this type, as hundreds to thousands of SNPs can be catalogued in relatively short time periods. It would be reasonable to assume that in an outbreak situation this magnitude of SNP genotyping would be a valuable commodity. To date, studies focused on biogeographical SNP-typing of Cryptosporidium have been limited, and further study into the use of SNPs in Cryptosporidium genomics is required. While some have addressed the issue from a single locus, whole genome SNP analyses with a global perspective are so far grossly under-reported in the literature. A multi-locus, whole genome approach can better define the Cryptosporidium population structure.

CHAPTER 4 EXPERIMENTAL PLATFORM - Methodology, Techniques, Instrumentation

Chapter Summary - The experimental platform of this study involved a multiple method approach, uniting qualitative data with quantitative data, conducted in a sequential manner. A combination of genetic data mining, comparative genomics between the C. hominis and C. parvum genomes, and bioinformatics analyses of target loci led to the establishment of a data set of 394 SNPs. A panel of 45 SNPs distributed throughout 13 loci was assembled, establishing a multi-locus SNP-type (MlSt). A total of 77 international C. hominis isolates were processed and scored using the SNaPshot single base extension assay coupled to fragment analysis. Inferences about population structure were made using the genetic data analysis software GDA and Fstat. The ability of the SNP-typing scheme to discriminate between species was also assessed.

4.1 Experimental Workflow

Figure 4.1 Experimental Platform; sequential workflow.

In silico data mining: comparative genomics, C. hominis (TU502) & C. parvum (Iowa II)
Establishment of gene library: prospective target genes
SNP data set: targeted molecular markers, biophysical profile & characterization
Assembled multi-locus SNP-type: whole genome multi-locus SNP allele profile
Single base extension: design of SNP capture probes
Genomic isolation: sample processing & multi-plex PCR
Amplicon purification
SNaPshot: single base extension chemistry & capillary electrophoresis
Fragment analysis: allele scoring & multi-locus SNP-type designation
Data analysis: descriptive genetic diversity, genetic identity & diversity measures, species distinction

4.2 Materials & Methods

4.2.1 Data Mining

From the published C. hominis and C. parvum genomes, based on reference strains TU502 and Iowa Type “II” respectively, exhaustive data mining was done to target genes for SNP-typing. Criteria were bio-functionality, genome location and indicators of selection pressures. A library of 161 candidate loci, covering all 8 chromosomes, was compiled. Each gene was catalogued and compared to its ortholog with regard to chemical and physical properties and genetic identity. For each locus all available sequences were used in alignment comparisons. Nucleotide sequences were aligned, followed by amino acid translation. Open reading frames (ORF) were determined by the presence of an in-frame methionine residue. Hyper-variable genes and those that are completely conserved were discounted due to primer constraints and/or lack of genetic diversity. Gene sequences were aligned for primary sequence analysis and screened for SNPs using the SeqMan2 module of Lasergene V5. Clustal W slow/accurate alignment at both the nucleotide and amino acid level was performed for each gene using the MegAlign module of Lasergene V5. The alignment reports generated were used to highlight single nucleotide polymorphisms conferring a single amino acid polymorphism (SAAP) within the peptide arrangement.
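As an illustration of how a SNP in an aligned coding sequence can be checked for an amino acid change, the following is a minimal sketch, not the Lasergene procedure used in the study. It assumes the standard genetic code, an in-frame coding sequence, and a 0-based SNP position; the function names are hypothetical.

```python
# Minimal sketch: does a SNP change the encoded amino acid?
# Assumptions (not from the thesis): standard genetic code, in-frame CDS,
# 0-based SNP position. Function names are illustrative only.
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
# Codons enumerated in T, C, A, G order at each position.
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_snp(cds, position, alt_base):
    """Return 'S' (synonymous) or 'NS' (non-synonymous) for one SNP."""
    cds = cds.upper()
    start = position - (position % 3)          # first base of the affected codon
    ref_codon = cds[start:start + 3]
    if len(ref_codon) < 3:
        raise ValueError("SNP falls in an incomplete trailing codon")
    offset = position - start
    alt_codon = ref_codon[:offset] + alt_base.upper() + ref_codon[offset + 1:]
    same_residue = CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon]
    return "S" if same_residue else "NS"


if __name__ == "__main__":
    # ATG GCT encodes Met-Ala; GCT -> GCC keeps Ala, GCT -> ACT gives Thr.
    print(classify_snp("ATGGCT", 5, "C"))
    print(classify_snp("ATGGCT", 3, "A"))
```

A third-position change such as GCT to GCC leaves the residue unchanged, whereas a first-position change such as GCT to ACT replaces alanine with threonine and would be flagged as a SAAP.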
On the basis of the inferred protein sequences, SNPs were assigned as either synonymous (S) or non-synonymous (NS). Each gene or gene fragment was submitted as a BLAST query to ensure there was no significant similarity to other microorganisms that may be naturally or invasively occurring in the fecal samples used in this study.

4.2.2 Bioinformatics Analyses: ORF Analysis & Biophysical Properties of Target Genes

ORF analysis of candidate genes/proteins consisted of profiling chemical, structural, and positional characteristics, with particular attention paid to those regions containing target SNPs. The Protean module of Lasergene V5 was used to generate graphical and numerical representations of hydropathic character, surface probability, antigenic indices, and predicted secondary structure (Appendix 4) 43,70,06, 103, 116.

The Kyte-Doolittle hydrophobicity plot is a graphical representation of the hydropathic score of a sequence of amino acid side chains in a protein 116. Each possible amino acid is assigned a number or score based on its ability to repel or attract water, and to what degree. Scores are biologically significant in that they are indicative of possible transmembrane domains. Proteins passing through the phospholipid bilayer of a cell interact with a region inside or outside of the cell, where they will find water, and will therefore have a hydrophobic region correlating to the hydrophobic region of the bilayer. Non-globular proteins, those without transmembrane domains, will be strictly hydrophilic in nature. On a scale running from -4.5 to +4.5, values greater than zero are suggestive of hydrophobic character, while a value of two or more indicates a strongly hydrophobic region. Emini Surface Probability indicates the probability of finding an amino acid residue on the surface of the protein molecule. With a threshold scale of 1-6, a value of 1 or greater increasingly predicts the probability of protein surface location.
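The hydropathy scoring described above can be sketched as a sliding-window average over the published Kyte-Doolittle residue values. This is only an illustration and not the Protean implementation; the window size of 9 and the function name are assumptions, since the thesis does not state the window used.

```python
# Minimal sketch of a Kyte-Doolittle hydropathy profile.
# Assumptions (not from the thesis): window size of 9, function name.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=9):
    """Mean Kyte-Doolittle score for every full window along the sequence."""
    protein = protein.upper()
    if window < 1 or window > len(protein):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KYTE_DOOLITTLE[residue] for residue in protein]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


if __name__ == "__main__":
    # A hydrophobic stretch followed by a charged stretch.
    for index, value in enumerate(hydropathy_profile("IIIIIRRRRR", window=5)):
        print(index, round(value, 2))
```

Windows whose mean is above zero lean hydrophobic, and sustained values of two or more would correspond to the strongly hydrophobic regions mentioned in the text.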
The antigenic index, developed by Jameson and Wolf, is a multi-disciplinary index integrating hydrophilicity, surface probability, backbone flexibility, and secondary structure parameters to predict possible antigenic sites 103. On a scale of -1.7 to +1.7, values approaching +1.7 are indicative of antigenicity. The complete Protean profile also examines the flexibility of the peptide backbone as predicted by the method of Karplus and Schulz (1985). The propensity of the peptide chain to form various secondary structure conformations such as α-helix, β-sheet, and β-turn is calculated by both the Chou-Fasman and Garnier-Osguthorpe-Robson (GOR) algorithms 43. Quantitative evidence of diversifying selection for each locus was evaluated using the straightforward mathematical model that examines the ratio of non-synonymous to synonymous divergence. This ratio analysis is the most widely used criterion for detecting natural selection. A disproportionate number of NS:S substitutions would be indicative of regions under positive diversifying selection.

4.2.3 Target Loci: Multi-locus SNP-typing (MlSt)

From the 161 initially considered target genes, 25 genes or families of genes were isolated for genetic typing (Table 4.1). Twenty-three of these were organized in reaction sets (RS) comprised of two or three loci each to facilitate high-throughput and multiplexed PCR amplification. The remaining two loci used a single-plex PCR platform. Of the 25 protein loci targeted, 13 were established as initial targets. A pre-defined panel of 45 SNPs from these 13 was assembled for allele discrimination. The remaining 12 loci, RS's 6-9, were successfully amplified and prepared for multi-locus SNP-typing at a later date.
Table 4.1 MlS-typing Reaction Sets

Reaction Set  Gene Annotation                            Gene Abbreviation  PCR Platform  Chromosomal Position
1             Malate Dehydrogenase                       MDH                Multi-plex    7
1             Lactate Dehydrogenase                      LDH                Multi-plex    7
1             Uracil Phosphoribosyl Transferase          UPRTase            Multi-plex    1
2             Erythrocyte Membrane Associated Ag         EMAAg              Multi-plex    1
2             Apoptosis Related Protein                  APR                Multi-plex    4
3             Cryptosporidium Oocyst Wall Protein        COWP               Multi-plex    4
3             Beta-tubulin                               B-tubulin          Multi-plex    6
4             Acetyl coA synthetase                      ACoA               Multi-plex    8
4             Mucin-1                                    Mucin-1            Multi-plex    6
5             Cp23                                       Cp23               Multi-plex    4
5             18S rRNA                                   18S rRNA           Multi-plex    multi-copy
6             Cell cycle Regulator                       CCR                Multi-plex    1
6             CTCL Tumor Ag                              CTCL               Multi-plex    2
6             Aldehyde-Alcohol Dehydase                  AAD                Multi-plex    8
7             CLL Associated Ag-KW-2                     CLL                Multi-plex    2
7             Sexual Stage Specific Kinase               SSSk               Multi-plex    3
7             FLJ31812/DHHC palmitoyl transferase        FLJ                Multi-plex    7
8             Transmembrane amino acid Transporter       TMaaT              Multi-plex    3
8             ABC multi-drug or ion efflux               ABC                Multi-plex    4
8             Thiolproteinase                            Thiol              Multi-plex    7
9             Extracellular protein w/ 8 kazal repeats   Epro               Multi-plex    4
9             Seroreactive Ag BMN-19B related protein    SeroAg             Multi-plex    7
9             RIK protein w/? WD40 repeats               RIK                Multi-plex    8
10            Heat Shock Protein 70                      HSP70              Single-plex   2
11            Gp60                                       Gp60               Single-plex   6

Table 4.1. The 25 genes targeted at the outset of the study, their abbreviated identifiers, and chromosomal position.

4.2.4 Primer Design

Gene sequences of regions spanning target SNPs were drawn from GenBank. Primers for multiplex PCR reactions were designed using the Primer Select module of Lasergene V5 with criteria stipulated as: melting temperature of >52°C, length of 18-24 nucleotides, and GC content within the range of 40-60% (Table 4.2). Primers were synthesized by Sigma-Genosys (Cambridge, UK) at a scale of 0.5 µl and were RP1 purified.
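The three stated design criteria can be expressed as a simple screen. The sketch below is illustrative only: it estimates melting temperature with the basic Wallace rule (2°C per A/T, 4°C per G/C), which is an assumption and is unlikely to match the calculation Primer Select performs, and it does not handle degenerate bases. Function names are hypothetical.

```python
# Minimal sketch of a primer screen against the stated criteria:
# Tm > 52 C, length 18-24 nt, GC content 40-60%.
# Assumption (not from the thesis): Tm is approximated with the Wallace
# rule; the design software used in the study likely computes Tm differently.

def gc_percent(primer):
    """Percentage of G and C bases in the primer."""
    primer = primer.upper()
    return 100.0 * (primer.count("G") + primer.count("C")) / len(primer)


def wallace_tm(primer):
    """Rough melting temperature in degrees C: 2*(A+T) + 4*(G+C)."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def meets_criteria(primer, min_tm=52.0, length_range=(18, 24),
                   gc_range=(40.0, 60.0)):
    """True if the primer satisfies all three design criteria."""
    length_ok = length_range[0] <= len(primer) <= length_range[1]
    gc_ok = gc_range[0] <= gc_percent(primer) <= gc_range[1]
    tm_ok = wallace_tm(primer) > min_tm
    return length_ok and gc_ok and tm_ok


if __name__ == "__main__":
    candidate = "AGGCTATTCAGGTGGATGCT"  # 20-mer used purely as an example
    print(len(candidate), gc_percent(candidate), wallace_tm(candidate))
    print(meets_criteria(candidate))
```

A screen of this kind only filters candidates; secondary structure and primer-dimer checks, which matter for multiplexing, would still need a dedicated design tool.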
Primers were validated by single-plex amplification using standard HotStar Taq (Qiagen, Mississauga, ON) PCR conditions: 1X PCR master mix, 800 µM dNTPs, 1.5 mM MgCl2 and 0.5 µM each primer. Reactions consisted of an initial denaturation at 94°C for 15 min followed by 40 cycles of denaturation for 1 min at 94°C, annealing for 1 min at 55°C, and extension for 1 min at 72°C, with a final extension at 72°C for 10 min. Single base extension (SBE) primer design was done manually using parameters set out in the corresponding SNaPshot protocol (Applied Biosystems, Foster City, CA). An individual capture primer for each viable SNP target was developed. Primers containing neighbouring SNPs within 15 nucleotides upstream (5’) of the target SNP were excluded to ensure primer conservation. All primers met the following stipulations: minimum melting temperature of 50°C, GC content of less than 70-80%, and as devoid as possible of secondary structure interactions. All SBE primers were PAGE purified. A non-annealing GACT nucleotide repeat tail was added to the 5’ end of the oligomer in different size ranges to ensure separation in fragment analysis. Primers were designed to a minimum length, including the GACT tail, of 36 nucleotides. All original primer sequences were modified with an initial GACT repeat sequence to total 36 nucleotides, followed by varying increments of GACT repeats, thus ensuring fragment separation between target SNPs and allowing for mobility shift due to different dye weights. SBE primers used in this study are listed in Table 4.3.

4.2.5 Multi-plex PCR Amplification

Genomic DNA was extracted from fecal C. hominis specimens. Target gene regions were amplified using the Qiagen Multiplex PCR Kit (catalogue #206143). Primers and expected amplicon sizes are listed in Table 4.2. To ensure optimal PCR conditions, the protocol stipulating the use of Q-solution was followed.
Q-solution changes the melting temperature of DNA and improves sub-optimal PCR caused by templates that have a high degree of secondary structure or are GC-rich 141. Reaction mixes were made of 2x Qiagen Multiplex PCR Master Mix (final concentration of 3 mM MgCl2) at a 25 µl volume, 10x primer mix (final concentration of 2 µM per primer) at a 5 µl volume, 5x Q-solution at a 5 µl volume and RNase-free water at a 5 µl volume. Ten microliters of template DNA (<1 µg DNA/50 µl) was added to give a final reaction volume of 50 µl. Using an Applied Biosystems 2720 thermal cycler, an initial activation step of 95°C for 15 min was followed by 40 cycles of: 94°C, 30 s; 60°C, 90 s; 72°C, 60 s. After a final extension at 72°C for 10 min, samples were used immediately or stored at 4°C for up to 24 hr or at -20°C for longer than 24 hr. Samples were verified by 1.8% agarose gel electrophoresis at 110 volts for 1 hr 15 min alongside a 100 bp molecular weight marker supplied by Invitrogen (Figure 4.2).

Figure 4.2 Multi-plex PCR; gel electrophoresis of Cp23, COWP & β-tubulin loci.

Band labels: 720 bp, COWP; 478 bp, β-tubulin; 400 bp, Cp23.

Figure 4.2. Gel electrophoresis (1.8%) of Qiagen multiplex PCR for Cp23, COWP and β-tubulin loci. Lanes are designated along the top. Isolates are all Canadian: lane 1, BC1; lane 2, BC2; lane 3, BC3; lane 4, BC4; lane 5, BC12; lane 6, empty; lane 7, 100 bp molecular marker ladder; lane 8, empty; lane 9, BC13; lane 10, BC14.
44  Table 4.2 PCR Primer Pairs Rm  s  Gene MDH  1 I  2 3 3  LDH UPRT  APR  COWP B-tubulm ACoA  4  Mucin I Cp23  5  MS rRNA  6  CCR  6  1  6 7  (IL  7  SSSk  original/  F Primer  F primer (5’-3’)  R Primer  R Primer (5’-3’)  bp  I4DH-f  TrCCAATGTrrGTYrCT  MDH-r  GTrGATAAATCITGTAACTG  796  original  LDH-f  mCGAGAACAAAAA  LDI-{-r  CACAAAAATCTAACCATrA  510  original  UPRT-f  CAACTrCTATTTKtGCrIGCGATrG  UPRT-r  TGCTITrGTTATrGTAATrGTCCAAA  460  original  EMJi-f  TGGTITrCAGTrGCAT  EMA-r  CAAGGGATAAATCCGCAGT  988  original  APR-f  AAATCTCAAAGCAAGAGA  APR-r  CAACTGTGGAACATACCCAACT  297  original  COWPF  GACTCAATrATTGATCG  COWPR  CAGAGTACCAGCTITrGT  720  original  B-tub-5  AGGAACCCATGTGAATITAAGAGA  B-tub-4  TGGC1’CTGCAACAAGCTG  478  published  ACOAFI  GGACCrATI’GAAITrGTCAAGG  AcoA-R1  GAGTAATfCI’GTGTCTCTCCAC  298  published  Mucl-F2  T1’GATGATrCAGAATCATCTGACT  Mucl-R2  GTGAGTICflCfl’CATCTGTATAG  650-900  published  8170  AGGAACCCATGTGAATITAAGAGA  8167  GAGTAATrCrGTGTCTCTCCAC  400  published  CDC-F  GGACCrArrGAATrrGTCAAGG  CDC-R  GTGAGTrCYI’CrrCATCrGTATAG  435  published  CCreg-F  GATGATATBDTGcrAGACCATrCAA  CCreg-R  CrrrCrTCTGTSGTYITrGGTrG  953  original  CTCL AgF  AGGCTATTCAGGTGGATGCT  CTCL Ag-R  CAAAAAIGTI’AAAGAGCGCAAT  678  original  AA DaseF  TrCCAATGTITGTGGCFrCr  PA DaseR  GTrGATAATCCrCC1TPTAACTG  817  original  CLLaAg-F  mCGAGAATGAATGCAPAA  CLLaAgR  CACAATCrPAAATATCGCCATI’A  983  original  SSSkin-F  CAACTTCTATn’MtCYfGCGATrG  SSSkin-R  TGC1T]GTrATrGTAA1TGTCCAPA  507  original  FLJ-F  TrTGCGAAGTGCATGGATAG  FLJ-R  GAAAAACAAGTrCTGATGGTATrCAA  849  original  TMaaT-F  TGGCIGGTITrCAGTrGCAT  TMaaT-R  CAAGGGATAAATCGACGCAGT  850  original  ABCmd-F  AAACCTITrCTCPAAGCAAGAGA  ABCmd-R  CAACTGTGGAACATACCCAACT  762  original  Thiol-F  GACTCAATrATCGCYTGATCG  Thiol-R  CAGGCAGTACCAGCTITrGT  398  original  J-{ypo-F  AGGAACCCATGTGPAmPAGAGA  J4ypo-R  TGGCTCTGCAACPAGCTGTA  949  original  SeroAg-F  GCAATI’AAGAACATCGGGTIT  SeroAg-R  TrACAATCACAGGGGCAAAT  777  original  RIK-F  
GCAAATACTI’CATCGAACACCA  RIK-R  TCCATGTGGGACITCATCAGA  573  original  HSP7O-4F  AATrCTCAAPGCAAGAGA  HSP7O-4R  CAGGCAACCAGCTITrGT  745  published  Gp60-F1  GACTCAATrATCrGATCG  Gp60-R1  CAGGCAGTACCAGCTrGT  934  published  8175/IntF  AGGAACCCATMAGAGA  8174/IntR  TGGGCAACAAGCTGTA  800-850  published  published  ‘  8  TMaaT ABC  8  Thiol  9  Epro  9  6 SeroA  9  RIK  solo  14SP70  solo  Gp60  solo  Gp60

Table 4.2. Multi-plex PCR primer pairs used for gene-specific amplification from fecal specimens and the expected fragment size of the amplified product. F primer; forward primer. R primer; reverse primer. Bp; approximate expected fragment size from PCR. Original/Published; original primers were designed within the lab by the author.

4.2.6 Amplicon Purification

PCR amplicons were purified using Invitrogen’s ChargeSwitch PCR Clean-Up Kit (Catalogue# CS12000). Using the supplied protocol, amplicons were bound using 10 µl of magnetic beads, 50 µl of supplied purification buffer, 50 µl of PCR product and the MagnaRack. Each reaction was washed twice with the supplied buffer and subsequently eluted using the supplied elution buffer E5 (10 mM Tris-HCl, pH 8.5). Purified PCR product was quantified with a Pharmacia Biotech GeneQuant 2 spectrophotometer.

4.2.7 SNaPshot: Single Base Extension (SBE) Chemistry

The SNaPshot primer extension method (Applied Biosystems, Foster City, CA) was used to analyze the 45 SNP-marker set with the capture probes listed in Table 4.3. The SNaPshot technique is based on the addition of a single fluorescently labelled ddNTP to the 3’ end of an unlabeled specific oligonucleotide primer that hybridizes to its target DNA immediately adjacent to the SNP of interest (Figure 4.3). A total volume of 0.5 µl of purified PCR amplicon was used for dideoxy single base extension of the unlabeled oligonucleotide primer, following the conditions recommended by the manufacturer.

Figure 4.3 SNaPshot fragment analysis sequential procedure.
Figure 4.3. Sequential methodology for single base extension chemistry for targeting specific SNP markers with a capture probe designed uniquely for each marker. Probes combined into the same reaction differed in size due to the presence of a non-annealing poly-GACT tail, not shown here. Image from Applied Biosystems, Foster City, California.

Mini-sequencing primers were designed so that the 3’ end was situated one base upstream from the relevant polymorphism. Primers were tailed with a non-annealing poly-GACT tail to produce fragments of differing sizes. Extension was performed for 25 cycles at 96°C for 10 s, 50°C for 5 s and 60°C for 30 s in a GeneAmp 2720 thermal cycler (Applied Biosystems). Inactivation of dNTPs and removal of primers from extension products were performed by incubation with 1 unit Calf Intestinal Phosphatase (CIP) for 1 hr at 37°C followed by enzyme inactivation for 15 min at 75°C.

Table 4.3 Single Base Extension Probes for Fragment Analysis
Gene  SNP Marker+  .  B-tubulin  SBE Capture Probe  Length(nt)  I  TGAACTGAATAATrGACT  47  4  AAGATACGYrCCAAGAGC ITrCTACAATGAAGC1TC CTGATAACITCA’rrrrl’GG  20 18 18 19 17  49 69 83  18 18 20  94 23 20  18  22 34  3 5 7 8 5  COWP  6 I 3 7  Cp23  4 3 I 5  18S rENA  HSP7O  ACoA  .
Mucin-1  G p 60  LDH MDH UPRTase  EMAAg APR  6 I 3 14 17 19 20  GAGGGAGCTGAACTCTP GCAGAATCITGTGATGC TCAATATCTCCCTGCAAA CCCACCAGGATATACAGA ATGTCCCCCAGGATrCGT TCCAGAATGTCCTCCAGG  18 18 18  CAT1TTACAAGGCCTCCA TCCTCCAGCTGCTGATGC AAGAATCCAGCTCCAAT  21  AAAGAATCCAGCTCCAAT  48  46 22 31 41  CCAGCAGCCCAAGCTCCT ACAGGATAAGCCAGCTGA  16 16 18  TATATAATAYrAACATAATrCATAITACTAT GAATAATATrAAAGATIITIATCTIT  16 26  76 78  TGCTAATGGTATCTTGAATGT AGGGTGAGGATGAGCA GAGAACTACCTGTATACATGAG TGAACATCAACAA.AAGGA  21 16 23  19 36 41  52 62  22 I  GAATGCCAGGTGG TATATACCTCAGGTAGTACTGG  18 13 22  58 75 42  4 6  AGGATACTrACTITATGCTGC TCAI’ICATrCATCCTGG  21 16  19 48  7 13 16  GTGCTGGAGATATGG GACCTGATAACTrCA1111IGG CAAACCTGAFAGGAG  16 22 17  60 66  80  CTGGTGAAGTrACATCTGTA  20  108 126  GATTTGITrGCCITTAC CGGCGCAAACAG TCGTCTATGCACCTATAAAAGA  17 12 22  98 115 10  ATCPIAGATCAAGAAGATCACTC AGAATFGAAGTGGCTGT TTGTGTGCCATACCAGA&  22 17 19  AACATI’CATI’GCACAACA AITACTCAITCACAAATC  16 20  95 32 40 21  T1’CAGTTGCAGAAAATGT ACCTTCTrCCTI’ATGATr TAAAACCCAAATGGAAT  17 20  29 41  3 8 7 2 3 29 27 2  TATAACAAACTCCCATAC TrCTATTGATGAGC1TGC ACGTTGGCCCGATGAAA  3  TCAfAGAGAATAXflGA  Table 4.3. Markers labelled according to internal lab database.  16 22  +  83 20 35 52 68 78  50  18 15  39 48 36  20  36  Fragment(nt) 67 66 67 88 100 112 41 42 40 52 64 40 52 57 68 80 92 104 40 52 64 76 88 64 40 64 76 88 100 40 52 64 90 100 112 51 56 41 46 61 66 61 66 51 56  Numbered according to internal lab data.  annealing poly-GACT tail to create size discrepancies for fragment separation and sizing.  ±  Non-  Expected fragment size  including original capture probe, non-annealing poly-GACT tail, and single base extension.  48  4.2.8 Capillary Electrophoresis & Electropherogram Analysis  Extension products, 0.5u1, were mixed with 0.5u1 of internal size standard Liz 120 (Applied Biosystems) and 9u1 of Hi-Di formamide (Applied Biosystems), heated for 5m at 95°C, and immediately quenched in an ice bath. 
SNP detection was carried out on an automated 3130xl 5-color sequencer (Applied Biosystems) with a capillary length of 36 cm (Appendix 6). Performance optimized polymer-6 (POP-6) was injected into the capillary to serve as a dynamic coating for the capillary walls and to provide a sieving medium for fragment analysis. Electrophoresis was performed for 15 min with an injection time of 5 s, a voltage of 15 kV and a run temperature of 60°C. Total run time for each sample was 24 minutes. Fluorescently labelled fragment data were exported to GeneMapper analysis software v4.0 (Applied Biosystems). Peaks were scored and analyzed based on size and color using the Local Southern method.

4.2.9 Allele Discrimination & Scoring

It can be difficult to predict the mobility of an oligonucleotide, since final mobility is determined not only by length but also by the molecular weight of the labelling dye for each base. Fragment sizes can be skewed to a slightly larger size than that expected from the unique primer sequence alone. Primer dimers and secondary structures can also skew the expected fragment size. These factors were all considered during fragment analysis. Resulting electropherograms consisted of peaks corresponding to the predetermined fragment sizes, and allele discrimination was assigned according to the DS-03 dye set (Figure 4.4). Nucleotides were identified by their specific fluorescence: green for adenine (A), black for cytosine (C), blue for guanine (G), red for thymine (T). An internal Liz120 size standard (Applied Biosystems, Foster City, CA) with size markers at 15, 20, 25, 35, 60, 80, 100, and 120 nucleotides was used to analyze and size electropherogram data (Appendix 3). The Liz120 standard fluoresces orange at each of the above-mentioned pre-sized fragments. To verify the efficiency and accuracy of the SBE protocol and subsequent allele scoring, a collection of random SNPs from random samples was typed in single-plex reactions before being multi-plexed.
Allele scoring was done via GeneMapper v4.0 software (Applied Biosystems, Foster City, CA). Alleles were scored based on fluorescent labels and size ranges for each SNP marker individually.

Figure 4.4 Fragment analyses; example electropherogram representation of allele discrimination & scoring.

Figure 4.4. Electropherogram representation of Canadian C. hominis isolate, BC12, scored at SNP markers COWP 5, COWP 6, Cp23 3, Cp23 1, and BT 1 with an expected C. hominis allelic profile of C-T-T-T-C respectively. Alleles discriminate as black for C, red for T, green for A, and blue for G. Sizes for each fragment are listed below the electropherogram and are compared against the Liz120 dye standard, fluorescing orange with a pre-inscribed set of fragments. Sample peaks correspond as: Y, 3 as COWP 5 (C, black); R, 1 as COWP 6 (T, red); R, 2 as Cp23 3 (T, red); R, 3 as Cp23 1 (T, red); Y, 5 as BT1 (C, black).

In the event of two alleles typed to a single SNP locus, additional peaks present at >20% of the height of the main peak were scored as mixed or double allele calls. The main or predominant peak was used for MlS-typing under the assumption that the predominant peak at each SNP locus represents the actual genotype. The cut-off was applied to prevent the inclusion of possible false-positive peaks resulting from artifactual stutter peaks or adjacent pull-up peaks.
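The scoring rule above (predominant peak as the call, secondary peaks above 20% of its height flagged as mixed) can be sketched as follows. This is a hypothetical illustration and not GeneMapper's algorithm; the input format, a list of (dye colour, peak height) pairs already restricted to one SNP marker's size range, is an assumption.

```python
# Minimal sketch of the allele-calling rule for one SNP marker.
# Assumptions (not from the thesis): peaks arrive as (dye_colour, height)
# pairs already binned to the marker's expected size range.
DYE_TO_BASE = {"green": "A", "black": "C", "blue": "G", "red": "T"}


def call_allele(peaks, mixed_fraction=0.20):
    """Return (main_allele, secondary_alleles) for one SNP marker."""
    if not peaks:
        return None, []
    ranked = sorted(peaks, key=lambda peak: peak[1], reverse=True)
    main_colour, main_height = ranked[0]
    # Secondary peaks count only if they exceed the cut-off relative
    # to the predominant peak; smaller peaks are treated as artefacts.
    secondary = [
        DYE_TO_BASE[colour]
        for colour, height in ranked[1:]
        if height > mixed_fraction * main_height
    ]
    return DYE_TO_BASE[main_colour], secondary


if __name__ == "__main__":
    example = [("black", 3000), ("red", 900), ("green", 300)]
    allele, mixed = call_allele(example)
    print(allele, mixed)
```

In this example the red peak (900) exceeds 20% of the main black peak (3000) and is reported as a secondary allele, while the green peak (300) falls below the cut-off and is ignored. Only the main allele would feed into the MlSt.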
The combination of alleles from all 45 SNP markers isolated from the 13 targeted loci was used to determine a multi-locus SNP-type (MlSt) for each isolate from the Australia, Kenya, Peru, and Scotland subpopulations. SNP patterns from Western Canada were more varied across the 45 SNP markers and were therefore reserved for assessing the viability of the assay as a species distinction tool.

4.2.10 Diversity Statistical Analysis

MlSts determined for each isolate were used to make species distinctions and identify variant or novel alleles at any given SNP position. Allele frequencies were tabulated and used for analysis of genetic diversity and structure. Based on the MlSts, the number of unique SNP profiles was determined, as was their geographic distribution. Qualitative SNP data were converted to quantitative data by creating appropriate mathematical input files, which were processed through a combination of software programs designed for genetic population studies. Using the genetic data analysis (GDA) program, five standard genetic diversity parameters were estimated: percent polymorphic loci (P), mean number of alleles per polymorphic locus (AP), mean number of alleles per locus (A), effective number of alleles per locus (Ae), and expected heterozygosity or gene diversity (He). Intercontinental differentiation was compared among subpopulations and estimated by calculating Nei’s genetic identity and distance. Dendrogram representation based on genetic distances using the Neighbour-Joining method was done using TreeViewX version 0.5.0. The Fstat statistics software package was used to partition genetic diversity into within- and between-subpopulation components based on fixation indices.
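For orientation, the following sketch shows the textbook forms of several of the quantities named above: gene diversity (He = 1 - Σp²), effective number of alleles (Ae = 1/Σp²), and Nei's standard genetic identity and distance (I = Jxy/√(Jx·Jy), D = -ln I). It is an assumption that these simple, uncorrected formulas approximate what GDA reports; the program may apply sample-size corrections, and the function names here are hypothetical.

```python
# Minimal sketch of per-locus diversity and Nei's identity/distance.
# Assumptions (not from the thesis): uncorrected textbook formulas,
# haploid allele calls, illustrative function names.
import math
from collections import Counter


def allele_frequencies(alleles):
    """Allele frequencies at one locus from a list of allele calls."""
    counts = Counter(alleles)
    total = sum(counts.values())
    return {allele: count / total for allele, count in counts.items()}


def gene_diversity(freqs):
    """Expected heterozygosity (He) at one locus: 1 - sum(p^2)."""
    return 1.0 - sum(p * p for p in freqs.values())


def effective_alleles(freqs):
    """Effective number of alleles (Ae) at one locus: 1 / sum(p^2)."""
    return 1.0 / sum(p * p for p in freqs.values())


def nei_identity(pop_x, pop_y):
    """Nei's genetic identity; inputs are per-locus frequency dicts in the same locus order."""
    jx = jy = jxy = 0.0
    for fx, fy in zip(pop_x, pop_y):
        jx += sum(p * p for p in fx.values())
        jy += sum(p * p for p in fy.values())
        jxy += sum(p * fy.get(allele, 0.0) for allele, p in fx.items())
    return jxy / math.sqrt(jx * jy)


def nei_distance(pop_x, pop_y):
    """Nei's standard genetic distance, D = -ln(I); undefined when I is 0."""
    return -math.log(nei_identity(pop_x, pop_y))


if __name__ == "__main__":
    locus = allele_frequencies(list("CCTT"))
    print(gene_diversity(locus), effective_alleles(locus))
    pop_a = [allele_frequencies(list("CCCC")), allele_frequencies(list("AAGG"))]
    pop_b = [allele_frequencies(list("CCTT")), allele_frequencies(list("AAGG"))]
    print(nei_identity(pop_a, pop_b), nei_distance(pop_a, pop_b))
```

Summing the J terms over loci before taking the ratio is equivalent to using their per-locus means, since the number of loci cancels.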
In addition to descriptive statistics, total genetic diversity (Ht), genetic diversity within subpopulations (Hs), average genetic diversity among subpopulations (Dst), and the proportion of genetic diversity found among subpopulations (Gst) were calculated following Nei's (1973, 1977) gene diversity formulae.

4.2.11 Geographic Boundaries & Study Subpopulations

In any phylogeographic study, populations from opposite sides of obvious or suspected barriers are sampled. Parasite isolates were obtained from leading facilities around the world. With great appreciation, isolates were donated by Drs. L. Xiao, W. Gatei, H. Smith, and U. Ryan from Peru, Kenya, Scotland and Australia respectively. Including isolates secured here in Canada, our study currently covers 5 distinct global localities, climates and ecologies from 5 continents (Figure 4.5).

Figure 4.5 Geographical representation of intercontinental subpopulations used in study.

Figure 4.5. Pictorial illustration of the five subpopulations used in this study: Australia (blue), Canada (red), Kenya (black), Peru (yellow), Scotland (green).

Isolates were genetically identified as C. hominis in their home labs based on host, oocyst morphology and sequence data. All donated samples were isolated from human hosts of Cryptosporidium. Given their varied histories, isolates from each geographic region were treated as a subpopulation. Isolates are referred to by their country code throughout the remainder of this document. A total of 108 samples from the five subpopulations were used in this study: Australia, 15; Kenya, 20; Peru, 22; Scotland, 20; W. Canada, 31. Subpopulations for all five regions are listed below in Tables 4.4, 4.5, 4.6, 4.7, and 4.8, with a simplified schematic of subpopulations versus metapopulation in Appendix 2.

Table 4.4 Australia Subpopulation, n=15
Country Code   Isolate Code   Species      Sample Type
A1             H131           C. hominis   fecal, whole DNA
A2             H139           C. hominis   fecal, whole DNA
A3             H140           C. hominis   fecal, whole DNA
A4             H141           C. hominis   fecal, whole DNA
A5             H142           C. hominis   fecal, whole DNA
A6             H158           C. hominis   fecal, whole DNA
A7             H160           C. hominis   fecal, whole DNA
A8             H161           C. hominis   fecal, whole DNA
A9             H162           C. hominis   fecal, whole DNA
A10            H163           C. hominis   fecal, whole DNA
A11            H164           C. hominis   fecal, whole DNA
A12            H165           C. hominis   fecal, whole DNA
A13            H166           C. hominis   fecal, whole DNA
A14*           H167           C. parvum    fecal, whole DNA
A15*           H168           C. parvum    fecal, whole DNA
* Confirmed C. parvum isolates in originating lab, U. Ryan.

Table 4.5 Kenya Subpopulation, n=20
Country Code   Isolate Code   Species      Sample Type
K1             10962          C. hominis   fecal, whole DNA
K2             10963          C. hominis   fecal, whole DNA
K3             10965          C. hominis   fecal, whole DNA
K4             10966          C. hominis   fecal, whole DNA
K5             10967          C. hominis   fecal, whole DNA
K6             10972          C. hominis   fecal, whole DNA
K7             10973          C. hominis   fecal, whole DNA
K8             10974          C. hominis   fecal, whole DNA
K9             10975          C. hominis   fecal, whole DNA
K10            10976          C. hominis   fecal, whole DNA
K11            11144          C. hominis   fecal, whole DNA
K12            11145          C. hominis   fecal, whole DNA
K13            11146          C. hominis   fecal, whole DNA
K14            11148          C. hominis   fecal, whole DNA
K15            11149          C. hominis   fecal, whole DNA
K16            11150          C. hominis   fecal, whole DNA
K17            11151          C. hominis   fecal, whole DNA
K18            11152          C. hominis   fecal, whole DNA
K19            11153          C. hominis   fecal, whole DNA
K20            11154          C. hominis   fecal, whole DNA

Table 4.6 Peru Subpopulation, n=22
Country Code   Isolate Code   Species      Sample Type
P1             11122          C. hominis   fecal, whole DNA
P2             11123          C. hominis   fecal, whole DNA
P3             11124          C. hominis   fecal, whole DNA
P4             11125          C. hominis   fecal, whole DNA
P5             11126          C. hominis   fecal, whole DNA
P6             11127          C. hominis   fecal, whole DNA
P7             11128          C. hominis   fecal, whole DNA
P8             11129          C. hominis   fecal, whole DNA
P9             11130          C. hominis   fecal, whole DNA
P10            11131          C. hominis   fecal, whole DNA
P11            11132          C. hominis   fecal, whole DNA
P12            11133          C. hominis   fecal, whole DNA
P13            11134          C. hominis   fecal, whole DNA
P14            11135          C. hominis   fecal, whole DNA
P15            11136          C. hominis   fecal, whole DNA
P16            11137          C. hominis   fecal, whole DNA
P17            11138          C. hominis   fecal, whole DNA
P18            11139          C. hominis   fecal, whole DNA
P19            11140          C. hominis   fecal, whole DNA
P20            11141          C. hominis   fecal, whole DNA
P21            11142          C. hominis   fecal, whole DNA
P22            11143          C. hominis   fecal, whole DNA

Table 4.7 Scotland Subpopulation, n=20
Country Code   Isolate Code   Species      Sample Type
S1             2023           C. hominis   fecal, whole DNA
S2             2026           C. hominis   fecal, whole DNA
S3             2068           C. hominis   fecal, whole DNA
S4             2091           C. hominis   fecal, whole DNA
S5             2096           C. hominis   fecal, whole DNA
S6             2106           C. hominis   fecal, whole DNA
S7             2114           C. hominis   fecal, whole DNA
S8             2118           C. hominis   fecal, whole DNA
S9             2122           C. hominis   fecal, whole DNA
S10            2132           C. hominis   fecal, whole DNA
S11            2133           C. hominis   fecal, whole DNA
S12            2140           C. hominis   fecal, whole DNA
S13            2177           C. hominis   fecal, whole DNA
S14            2181           C. hominis   fecal, whole DNA
S15            2203           C. hominis   fecal, whole DNA
S16            2224           C. hominis   fecal, whole DNA
S17            2234           C. hominis   fecal, whole DNA
S18            2242           C. hominis   fecal, whole DNA
S19            2249           C. hominis   fecal, whole DNA
S20            2267           C. hominis   fecal, whole DNA

Table 4.8 Canada Subpopulation, n=31
Country Code   Isolate Code*   Species      Sample Type
BC1            nr              C. parvum    fecal, whole DNA
BC2            nr              C. parvum    fecal, whole DNA
BC3            nr              C. parvum    fecal, whole DNA
BC4            nr              C. parvum    fecal, whole DNA
BC5            nr              C. parvum    fecal, whole DNA
BC6            nr              C. parvum    fecal, whole DNA
BC7            nr              C. parvum    fecal, whole DNA
BC8            nr              C. parvum    fecal, whole DNA
BC9            nr              C. parvum    fecal, whole DNA
BC10           nr              C. parvum    fecal, whole DNA
BC11           nr              C. hominis   fecal, whole DNA
BC12           nr              C. hominis   fecal, whole DNA
BC13           nr              C. hominis   fecal, whole DNA
BC14           nr              C. hominis   fecal, whole DNA
BC15           nr              C. hominis   fecal, whole DNA
BC16           nr              C. hominis   fecal, whole DNA
BC17           nr              C. hominis   fecal, whole DNA
BC18           nr              C. hominis   fecal, whole DNA
BC19           nr              C. hominis   fecal, whole DNA
BC20           nr              C. hominis   fecal, whole DNA
BC21           nr              C. hominis   fecal, whole DNA
BC22           nr              C. hominis   fecal, whole DNA
BC23           nr              C. hominis   fecal, whole DNA
BC24           nr              C. hominis   fecal, whole DNA
BC25           nr              C. parvum    fecal, whole DNA
BC26           nr              C. hominis   fecal, whole DNA
BC27           nr              C. hominis   fecal, whole DNA
BC28           nr              C. hominis   fecal, whole DNA
BC29           nr              C. hominis   fecal, whole DNA
BC30           nr              C. hominis   fecal, whole DNA
BC31           nr              C. hominis   fecal, whole DNA
* Not reportable by law.

CHAPTER 5 RESULTS: Descriptive Genomics, Multi-locus SNP-typing, Genetic Diversity Measures & Partition

Summary

We report on genetic variation both within and between C. hominis subpopulations from Australia, Kenya, Peru, and Scotland. We examined ~18 500 bp and assembled a data set of 394 SNPs. Following comparative genomics and bio-physical profiling, an expected haplotype representing a set of 45 single nucleotide polymorphisms at individual loci was established. Molecular typing of 77 international isolates based on this haplotype, or multi-locus SNP-type, was done; twenty-four unique MLSts were identified. Inferences about genetic relationships were made using genetic data analysis software programs to quantify and partition the genetic diversity into intra- and inter-population diversity and to discern genetic distances among subpopulations. Differences within subpopulations account for 69.6% of genetic variation; differentiation among subpopulations constitutes 30.4%. Genetic distances among subpopulations averaged 0.048 and varied from 0.034 between the Australia and Scotland subpopulations to 0.061 between Scotland and Kenya. The potential use of a DNA-typing scheme based on SNPs to resolve Cryptosporidium epidemiology was examined. Established C. hominis and C.
parvum subpopulations from Western Canada, together with the international SNP-typing results, were used successfully to assess the efficacy of a mutation-based platform for genetic typing and species distinction.

5.1 Comparative Genomics: Global Patterns of Single Nucleotide Polymorphisms (SNPs)

Of 3956 total genes annotated in the C. hominis genome [245], 25 were targeted for bioinformatics analysis in the hope of identifying SNPs with the greatest research potential (Table 5.1). The genes examined fall into four bio-functional groups: enzymatic or bio-synthesis related genes, structural or cellular related genes, virulence or antigenic determinant genes, and intron-housing genes. Using genes of varied function allows more robust conclusions to be drawn from the compiled data sets.

Table 5.1 Target Gene Library
No.  Gene Annotation                            Abbreviation  Bio-functionality       Chromosome
1    Malate Dehydrogenase                       MDH           enzymatic               7
2    Lactate Dehydrogenase                      LDH           enzymatic               7
3    Uracil Phosphoribosyl Transferase          UPRT          enzymatic               1
4    Erythrocyte Membrane Associated Ag         EMAAg         antigenic               1
5    Apoptosis Related Protein                  APR           bio-synthesis           4
6    Cryptosporidium Oocyst Wall Protein        COWP          structurally-related    4
7    Beta-tubulin                               BT            intron-containing       6
8    Acetyl coA synthetase                      AcoA          enzymatic               8
9    Mucin-1                                    Muc-1         structurally-related    6
10   Cp23                                       Cp23          antigenic               4
11   18S rRNA                                   18S rRNA      ribosomal (structural)  multi-copy
12   Cell cycle Regulator                       CCR           enzymatic               1
13   CTCL Tumor Ag                              CTCL          antigenic               2
14   Aldehyde-Alcohol Dehydrogenase             AAD           enzymatic               8
15   CLL Associated Ag-KW-2                     CLLAg         antigenic               2
16   Sexual Stage Specific Kinase               SSK           bio-synthesis           3
17   FLJ31812/DHHC palmitoyl transferase        FLJ           enzymatic               7
18   Transmembrane amino acid Transporter       TMaaT         bio-synthesis           3
19   ABC multi-drug or ion efflux               ABC           bio-synthesis           4
20   Thiolproteinase                            Thiol         enzymatic               7
21   Extracellular protein w/ 8 kazal repeats   ExP           unknown                 4
22   Seroreactive Ag BMN-19B related protein    SeroAg        antigenic               7
23   RIK protein w/ ? WD40 repeats              RIK           unknown                 8
24   Heat Shock Protein 70                      HSP70         bio-synthesis           2
25   Glycoprotein 60                            Gp60          antigenic               6
Table 5.1. Genes targeted for genetic typing; abbreviated identifiers, putative molecular function and chromosomal position are all listed.

Thirteen of these (Table 5.2) were targeted for further characterization and profiled based on numerous criteria, as discussed in the materials and methods section. This initial baseline analysis was crucial to pinpointing those SNPs that would best represent each gene for downstream SNP-typing of isolates. Each of the 13 genes used in the study is identified by its C. hominis annotation as described in the CryptoDB database. Clustal alignments generated from the Mercator and MAVID programs at the nucleotide level, between the compiled C. hominis and C. parvum databases within CryptoDB, were used to elucidate single base differences between the C. hominis reference strain, TU502, and the C. parvum reference strain, Iowa II. These two strains are the parent strains of all molecular studies to date on Cryptosporidium and the only two strains to have had their genomes sequenced in their entirety.

Table 5.2 Gene Panel for Multi-locus SNP-typing
Gene      CryptoDB ID (Ch.#####)  Contig # (GenBank)  Nucleotide Position  MW (Da)  Coding Seq. (bp)  Protein Seq. (aa)  C. parvum Ortholog ID
AcoA      10418                   AAEL01000346        468-2555             77945    2088              695                cgd1_3710
APR       40253                   AAEL01000018        38706-39098          15042    393               130                cgd4_2220
β-tub     50050                   AAEL01000450        4053-4892            31939    840               279                cgd5_3220
COWP      40378                   AAEL01000002        6200-8425            84131    2226              741                cgd4_3340
Cp23      40414                   AAEL01000216        5348-5683            11262    336               111                cgd4_3620
EMAAg     30018                   AAEL01000207        2119-4338            84923    2220              739                cgd3_90
Gp60      60138                   AAEL01000025        26940-27974          35768    1035              344                cgd6_1080
HSP70     20010                   AAEL01000158        12080-14113          73706    2034              677                cgd2_20
LDH       70063                   AAEL01000175        8305-9270            33866    966               321                cgd7_480
MDH       70062                   AAEL01000175        9686-10636           33589    951               316                cgd7_470
Muc-1     60622                   AAEL01000244        4857-7199            85466    2343              780                cgd6_5400
18S rRNA  rm013                   AAEL01000005        6112-7866            66497    1689              563                cgd2_1375
UPRT      80328                   AAEL01000030        7647-9020            51885    1374              457                cgd8_2810
Table 5.2. Thirteen genes used herein for SNP-typing of international subpopulations and their NCBI databank identification codes. Molecular weight (MW) in Daltons, coding sequence and peptide length listed.

To identify SNPs, we examined 18 495 bp representing the 13 target loci, which included 13 coding regions and one intron. A total of 394 SNPs were mapped, creating a comprehensive data set; the density of SNPs was approximately 1 in 47 bp. Of the 394 SNPs, 392 were mapped to coding regions, while the remaining 2 were found in the non-coding intron region of the β-tubulin gene. In total our SNP data set comprises 274 synonymous (silent) single base mutations and 120 non-synonymous (expressed) mutations (Table 5.3). Overall there were almost half as many non-synonymous (NS) changes as synonymous (S) changes, producing a ratio of NS SNPs/S SNPs of 0.44.

Table 5.3 Genetic Organization of Gene Panel
Gene       # SNPs  Transitions+  Transversions+  # SAAPs*  ω Ratio
AcoA       35      27            8               6         6:29
APR        9       7             2               3         3:6
β-tubulin  9       8             1               0         0:9
COWP       38      30            8               11        11:27
Cp23       7       5             2               1         1:6
EMAAg      40      36            4               19        19:21
Gp60       153     88            65              72        72:81
HSP70      32      26            6               1         1:31
LDH        20      16            4               1         1:19
MDH        15      8             7               2         2:13
Muc-1      2       0             2               0         0:2
18S rRNA   5       1             4               0         0:5
UPRT       29      23            6               4         4:25
Table 5.3. Descriptive makeup of SNPs mapped to target genes. + Indicates number of transitions (purine to purine or pyrimidine to pyrimidine) or transversions (purine <> pyrimidine) present. * SAAPs are single amino acid polymorphisms introduced by a mutation conferring a change to the original peptide make-up of the protein.
ω ratio represents the ratio of non-synonymous (expressed) to synonymous (silent) mutations and can be an indicator of selection pressure.

To ensure genomic comparisons were made between orthologous genes, a blastp search was conducted against over 50 eukaryotic genomes from multiple phyla, an important step considering the prospect of other competing microorganisms within the gastrointestinal space (Table 5.4). A tblastn search followed, to infer any changes at the amino acid level between each target protein and its closest ortholog. A tblastn search is a BLAST variant that compares a given protein sequence against a nucleotide sequence database dynamically translated in all 6 reading frames, covering both strands. C. hominis protein sequences were searched against both the complete C. parvum and C. muris databases. In all cases the closest ortholog was from the C. parvum Iowa II genomic database (Table 5.4). Results of the blast searches were used to determine whether particular SNPs mediated a single amino acid change to the peptide configuration of a given protein. Positions of SAAPs were determined relative to the in-frame start (methionine) codon according to the blast alignment.

Table 5.4 blastp Protein Queries
Gene      Length   Score (bit)  E value   Frame  Closest Ortholog+
AcoA      8117     1310.7       0.0000    (+) 3  C. parvum Iowa II
APR       39459    233.9        1.3e-63   (+) 3  C. parvum Iowa II
β-tub     844015   509.5        1.4e-145  (+) 3  C. parvum Iowa II
COWP      579568   1278.0       0.0000    (+) 2  C. parvum Iowa II
Cp23      579568   90.6         1.9e-20   (-) 3  C. parvum Iowa II
EMAAg     84326    1219.2       0.0000    (+) 2  C. parvum Iowa II
Gp60      1213436  289.1        1.9e-85   (-) 2  C. parvum Iowa II
HSP70     97996    1056.2       0.0000    (+) 1  C. parvum Iowa II
LDH       1278458  520.8        1.4e-148  (-) 3  C. parvum Iowa II
MDH       1278458  533.1        1.2e-153  (-) 1  C. parvum Iowa II
Muc-1     10947    312.4        1.5e-200  (+) 3  C. parvum Iowa II
18S rRNA  887873   1150.2       0.0000    (+) 3  C. parvum Iowa II
UPRT      1156729  795.7        1.1e-229  (-) 1  C. parvum Iowa II
Table 5.4. Blast queries of thirteen target genes for sequence similarity. E value is the expectation value, an estimate of how many times the result could be expected by chance alone; the lower the better. + Based on comparative search encompassing over 50 genomes from 9 different eukaryotic phyla.

5.2 Target Genes: Qualitative Characterization

Malate & Lactate Dehydrogenase

Malate dehydrogenase (MDH) and lactate dehydrogenase (LDH) are NAD-dependent enzymes located adjacent to one another on chromosome 7. Both belong to the 2-ketoacid:NAD(P)-dependent dehydrogenase superfamily responsible for the catalytic conversion of 2-hydroxyacids to the corresponding 2-ketoacids. MDH is found across all three domains of life (Eukarya, Bacteria and Archaea), while LDH is exclusive to the Eukarya and Bacteria. Both genes are reported to be syntenic single-copy genes within the Cryptosporidium genome. In contrast, genetic evidence for the closely related Apicomplexans Plasmodium falciparum and Toxoplasma gondii points to the presence of two LDH loci. Genetic investigations indicate that, independent of other Apicomplexans, Cryptosporidium's LDH gene evolved from an LDH-like MDH gene [132]. This supports the concept that Cryptosporidium has a more distinct molecular and evolutionary divergence from its fellow Apicomplexans. Zhu et al. (2001) showed that MDH and LDH are extremely substrate specific for oxaloacetate and pyruvate respectively [246]. Mainly responsible for this specificity are the residues at amino acid position 102. For both the LDH and MDH loci this is a conserved region between C. hominis and C. parvum.
The divergence of Cryptosporidium's LDH and MDH loci from their counterparts in other Apicomplexans, animals and humans supports the idea that these enzymes should be further investigated as rational Cryptosporidium-specific prophylactic targets.

Figure 5.1 Phylogenetic relationship of Apicomplexan parasites based on the MDH and LDH enzymatic genes. Taxa shown: C. hominis LDH, C. parvum LDH, C. hominis MDH, C. parvum MDH, P. vivax LDH, P. reichenowi LDH, T. gondii LDH, T. gondii MDH, P. yoelii MDH, P. falciparum MDH, P. falciparum LDH, E. tenella LDH. Scale: nucleotide substitutions (x100).

Figure 5.1. The lack of strong genetic similarity is clearly evident, as Cryptosporidium clades distinctly within its own genus. Phylogeny based on GenBank sequence accessions: C. hominis-LDH, AAEL01000175; C. parvum-LDH, AF274310; C. hominis-MDH, AAEL01000175; C. parvum-MDH, AY334274; P. vivax-LDH, AY486060; P. reichenowi-LDH, AB122147; T. gondii-LDH, U23207; T. gondii-MDH, AY650028; P. yoelii-MDH, AABL01000969; P. falciparum-MDH, AY324107; P. falciparum-LDH, AF323520; E. tenella-LDH, AY143389. J.M. Williamson, unpublished.

Lactate Dehydrogenase: Two biological processes are attributed to the LDH locus specifically. It is thought to play a major role in glycolysis, in addition to being involved in the tricarboxylic acid cycle. In terms of molecular function this gene is considered to have both L-lactate dehydrogenase activity and oxidoreductase activity. Sequence alignment of a 965 bp fragment of the LDH gene showed 20 SNPs, only one of which appears to confer an amino acid polymorphism. The crucial 102nd amino acid position remained conserved with an aspartic acid residue, suggesting no effect on substrate specificity. All 19 silent SNPs were located in the third or wobble position of the codon.
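The distinction drawn above between silent SNPs in the wobble position and SNPs that confer an amino acid polymorphism can be sketched as follows. This is an illustrative example only, assuming the standard genetic code and an in-frame coding sequence; the function name and the short test sequence are hypothetical and are not taken from the LDH alignment.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_snp(ref_cds, pos, alt_base):
    """Classify a single-base change in an in-frame coding sequence.

    ref_cds  : reference coding sequence, starting at the first codon position
    pos      : 0-based index of the SNP within ref_cds
    alt_base : the alternative nucleotide observed at that position
    """
    offset = pos % 3                      # 0, 1 or 2 within the codon
    start = pos - offset
    ref_codon = ref_cds[start:start + 3].upper()
    alt_codon = ref_codon[:offset] + alt_base.upper() + ref_codon[offset + 1:]
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    return {
        "codon_position": offset + 1,     # 3 is the wobble position
        "ref_aa": ref_aa,
        "alt_aa": alt_aa,
        "synonymous": ref_aa == alt_aa,   # False means a SAAP is introduced
    }


# Hypothetical 3-codon sequence: ATG GAT GTT (Met-Asp-Val).
cds = "ATGGATGTT"
wobble_change = classify_snp(cds, 5, "C")   # GAT -> GAC, Asp -> Asp
first_pos_change = classify_snp(cds, 3, "A")  # GAT -> AAT, Asp -> Asn
print(wobble_change)
print(first_pos_change)
```

Positions of SAAPs in this sketch are counted from the first base of the supplied sequence, so the input would need to begin at the in-frame start codon.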
Malate Dehydrogenase: Gene ontology for the MDH locus also includes glycolysis, malate metabolic processes, and tricarboxylic acid cycle involvement as biological processes, with L-malate dehydrogenase and oxidoreductase activity as molecular functions. A 957 bp ClustalW alignment of the C. hominis and C. parvum MDH genes indicated 15 single nucleotide polymorphisms. Of the 15 total SNPs, 11 occur in the third codon position with the remaining 4 occurring in the first codon position. Seven of the 15 SNPs produced an amino acid change, none of which occur at the critical valine residue at amino acid position 102 previously mentioned.

Uracil PhosphoRibosyl Transferase (UPRT)

Located on chromosome 1, UPRTase is an operative enzyme in Cryptosporidium nucleotide metabolism. It is the key component of the pyrimidine salvage pathway. Its molecular functions include uridine kinase activity, ATP binding, and kinase activity. In theory, hampering the expression of UPRTase could decrease the amount of functional transcript produced, hence diminishing the parasite's ability to assimilate pyrimidines. A smaller 711 bp fragment of the C. hominis UPRTase gene was aligned with its C. parvum ortholog. Within this targeted gene fragment there were 2 transitions and 7 transversions, and only 1 SNP out of a total of 10 mediated an amino acid change in the primary peptide sequence. Of the 9 silent mutations, 8 occur in the wobble position and the ninth occurs in the first codon position. The lone non-synonymous SNP is situated in the third codon position.

Apoptosis Related Protein (APR)

Apoptosis is a controlled cellular self-digestive process that functions to eliminate pathogen-invaded host cells during the development of infectious organisms. Apoptosis, or programmed cell death, is crucial with respect to host-parasite interactions. Apoptosis of infected host cells has a parasiticidal effect but can also kill neighbouring uninfected host cells.
Cryptosporidium colonizes the gastrointestinal tract, where other histological consequences such as the loss of absorptive or goblet cells and crypt hyperplasia can result. Some studies suggest that the differences exerted on host cell apoptosis, whether the cell is parasite infected or not, are mediated by the developmental stages of the parasite. By halting apoptosis in a host cell the parasite is able to complete its development. Once fully formed and functional, the parasites can exit the host cell, re-establishing apoptosis. The transcription factor NF-kappa B regulates many host-derived genes that encode apoptosis inhibitor proteins. One theory is that by upregulating the activation pathway for NF-kappa B, Cryptosporidium can prolong the life of its host cells, giving it sufficient time to fully develop. The exact balance of apoptotic benefits and exploitations between host and parasite is still largely unclear. For C. hominis, Tzipori et al. (2004) annotated a 393 bp ORF on chromosome 4 as an apoptosis related protein. Little is known about this protein and its exact functions. To date there is extremely limited genetic documentation or literature on this specific protein, making it a point of interest within this study. It is considered to be a smaller, more conserved protein hypothesized to have a double-stranded DNA-binding region. Upon comparison to the C. parvum ortholog, 9 potential target SNPs were identified at the nucleotide level. Further analysis of the amino acid sequence for the gene indicated the presence of 3 SAAPs.

Acetyl Co-enzyme A (AcoA)

Acetyl-CoA is an important molecule in metabolism, used in many biochemical reactions in species of all types. Its main use is to convey the carbon atoms within the acetyl group to the Krebs cycle to be oxidized for energy production. In chemical structure, acetyl-CoA is the thioester between coenzyme A (a thiol that acts as an acyl group carrier) and acetic acid.
In Cryptosporidium, AcoA is a single-copy gene which has been reported to have a preference for thymidine and adenosine nucleotides in the third codon position. As is the case with most other Cryptosporidium genes, the gene is extremely dense, containing no intron regions. Gene ontology at the biological level is metabolic, while at the molecular level it is thought to have catalytic activity as well as AMP binding functions. Previously, Upton et al. reported on the degree of low-usage codons within this protein and found a relatively high number of them within the AcoA open reading frame (ORF). In all species, proteins containing a high percentage of low-usage codons could be characterized as cases where an excess of the protein could be detrimental, thus making this gene a suitable target for further molecular characterization. For the purpose of this study, a smaller 344 bp fragment was aligned between the C. hominis and C. parvum parent strains. This led to the identification of 7 potential single base mutation targets.

Heat Shock Protein 70 (HSP70)

The HSP70 protein is classified as a cytoplasmic protein that helps mediate ATP-binding processes. Previous literature has suggested the usefulness of this locus for phylogenetic analysis of the genus Cryptosporidium. The heat shock protein belongs to a multi-gene family that is highly conserved across prokaryotes and eukaryotes. These proteins appear to function as molecular chaperones, facilitating the folding of proteins in secretion and transport. Previous studies have shown their up-regulation under environmental stresses and their involvement in cellular protection. The high incidence of synonymous or silent mutations suggests that this locus is more permissive of nucleotide changes and may be under selection pressure.
The fact that these nucleotide mutations are spread over the entire sequence, rather than clustered close to one another as in some proteins, indicates that this gene is a more robust target for molecular marking as well as genotyping and phylogenetic studies. Thirty-two single point mutations were mapped within the HSP70 domain: 26 transitions and 6 transversions. In all, only one SNP introduced a single amino acid polymorphism (SAAP).

Cryptosporidium Oocyst Wall Protein (COWP)

The durability of the outer oocyst membrane is largely attributed to a family of Cryptosporidium oocyst wall proteins (COWP). Their expression is significantly upregulated during the later stages of the life cycle, a time point coinciding with the oocyst development stage. A great deal of the parasite's resistance to chemical or environmental stresses comes from this outermost double-layered membrane. The 2 layers are tightly connected through a protein-lipid-carbohydrate matrix. COWP-1 was the first locus to be discovered and analyzed. With the completion of both the C. hominis and C. parvum genomes, 8 additional COWP genes have been uncovered. Supporting the discovery of these other COWP loci was post-sequence analysis of the COWP-1 locus showing a pattern of cysteine residues every 10-12 amino acids. The COWP genes are scattered throughout the genome on multiple chromosomes and are therefore unlinked. Of particular future interest is the COWP-7 gene. Molecular analysis has shown that it houses intronic regions within its open reading frame, a rarity in Cryptosporidium. Using a smaller 552 nt fragment of the better known and more thoroughly researched COWP-1 gene, positioned on chromosome 4, alignments and analysis highlighted 8 potential target single nucleotide polymorphisms. Translation into a 184 aa sequence indicated that only 1 of these introduced a change to the amino acid makeup. The remaining 7 SNPs involved a redundant codon change that occurred in the third or wobble position.
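The transition and transversion counts reported for each locus (for example, 26 transitions and 6 transversions for HSP70) follow from the purine/pyrimidine rule given in the caption of Table 5.3. A minimal sketch of that classification is shown below; the function names and the two short aligned sequences are hypothetical, and gaps or ambiguous bases are simply skipped.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_type(ref, alt):
    """Return 'transition' or 'transversion' for a single-base substitution."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt:
        raise ValueError("ref and alt are identical; not a substitution")
    pair = {ref, alt}
    # Purine <-> purine or pyrimidine <-> pyrimidine is a transition.
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"
    return "transversion"


def count_substitutions(seq_a, seq_b):
    """Count transitions and transversions between two aligned sequences."""
    counts = {"transition": 0, "transversion": 0}
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        # Skip matches, gaps and any non-ACGT symbols.
        if a != b and a in "ACGT" and b in "ACGT":
            counts[substitution_type(a, b)] += 1
    return counts


# Hypothetical aligned fragments (not taken from the thesis alignments).
counts = count_substitutions("ACGTAC", "GCTTAT")
print(counts)
```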
18S rRNA

A central component of the ribosome in eukaryotic organisms is the small subunit, the 18S rRNA subunit. Ribosomal RNAs (rRNAs) are targets for many clinically relevant antibiotics, including paromomycin, which has been used with varying success against Cryptosporidium infection. Typically the ribosomal subunits are among the most conserved genes in cells, making them among the most studied and most used in clarifying the taxonomic status of an organism. Due to its genetic stability, 18S rRNA is one of the earliest and most reliable genes used for the identification and diagnosis of Cryptosporidium. The locus has the most molecular data accrued within all the major genetic databases, giving increased confidence in the true presence of the mutations that exist within it. Molecular markers isolated from the multi-copy Cryptosporidium 18S rRNA gene are expected to be useful both for species confirmation and as information for population genetic studies. Alignment of multiple partial C. hominis and C. parvum fragments ranging from 540 bp to 745 bp revealed only 3 single base mutations or SNPs, all of which were silent.

Cp23

The Cp23 gene encodes an immunodominant surface protein. The Cp23 locus has mainly been used as an investigative tool in immunoreactivity and serology studies. Characteristic serum immunoglobulin G (IgG) antibody responses to this 27 kDa antigen have been shown to develop post-infection. It has also been studied at the molecular level for the characterization of gene fragments encoding epitopes to which sporozoite-neutralizing antibodies were directed. Between C. hominis and C. parvum the Cp23 locus is relatively stable genetically. This, coupled with its functional properties, makes it another excellent genotypic and clinical marker for investigating Cryptosporidium epidemiology. Alignment of the C. hominis and C. parvum Cp23 locus revealed 7 nucleotide differences, including 5 transitions and 2 transversions.
Only 1 of these mutations conferred a single amino acid polymorphism upon translation into the peptide sequence. Of the 6 silent mutations, 3 were found in the wobble position, with 2 in the first position and the remaining 1 in the middle codon position.

Erythrocyte Membrane Associated Antigen (EMAAg)

Scattered throughout the Cryptosporidium genome is a gene family encoding a protein termed the erythrocyte membrane associated antigen. These loci range in size and in degree of genetic variation between species. The least known of the three antigenic proteins under investigation here, the erythrocyte membrane associated antigen was only recently annotated upon the publication of the C. hominis genome by Abrahamsen et al. (2004); hence literature on molecular investigations into its genetic structure and organization is almost nil. A 2037 nt fragment of the erythrocyte membrane associated antigen from chromosome one was aligned. In the C. parvum homologue there was a 4 nt deletion at positions 520-524 and a 14 nt deletion at positions 525 through 539. In all, 40 single nucleotide changes were observed, 19 of which led to a single amino acid change.

Glycoprotein 60 (Gp60)

The Cryptosporidium glycoprotein 60, Gp60, also commonly known as Cpgp40/15, is a single-copy surface protein that previous molecular studies have shown to be highly polymorphic in intra-species and inter-species population studies. Previous research focused on clarifying the high degree of genetic differentiation within this protein has led to the identification and classification of five allelic subgroups termed Ia, Ib, Ic, Id and more recently Ie. Specifically, the Gp60 locus encodes a precursor protein that upon proteolytic cleavage yields two mature cell surface glycoproteins, gp40 and gp15. These two mature proteins are implicated in zoite attachment to and invasion of enterocytes, thus in part mediating host cell invasion.
Such an observation could imply that polymorphisms in this gene contribute to differences in the host specificities and infection patterns of C. hominis versus C. parvum. The high degree of genetic polymorphism within this protein supports the notion that its gene products, gp40 and gp15, are in fact surface-associated proteins that are hypothesized to be virulence determinants, likely under host immune selection. It stands to reason that further investigation into the unprecedented degree of genetic polymorphism within this protein, particularly from a geographic standpoint, could prove very helpful in understanding the molecular epidemiology of cryptosporidiosis. Comparative genomics for the Gp60 protein revealed extensive polymorphism, in line with other studies. Sequences from both the C. hominis and C. parvum genomes were isolated and aligned for visual comparison and SNP position identification. Mapped to Gp60 were 153 SNPs, 72 of which conferred amino acid polymorphisms to the primary protein sequence. This locus is the most variable of all the target proteins involved in the study and is considered to be under the most selective pressure.

β-tubulin

The β-tubulin gene encodes a single-copy structural protein and is of particular interest as it is one of the few open reading frames in the Cryptosporidium genome containing an intron region. A single intron is sandwiched between two exons, exons 1 and 2. This intron is 84 nt long, with a 2 bp deletion in the C. hominis gene. A 457 nt alignment between C. hominis and C. parvum showed 2 SNPs within the intron and 9 in the 5' region of the adjacent exon 2. Translation into the 153 aa sequence revealed all 9 of these to be silent. Both intron SNPs were identical, involving a cytosine in C. hominis versus a thymine in C. parvum. Genetically the gene is variable but stable.
SNPs within this gene are hypothesized to be invaluable markers for differentiating unique or novel isolates.

5.3 Multi-locus SNP-type (MLSt) Assembly

Compilation of the comparative genomics and bioinformatics criteria as outlined resulted in a pre-defined SNP-type and allelic profile of the Cryptosporidium genome based on target SNPs. Collectively, 45 SNP loci from the 13 target genes were used to type isolates (Table 5.5).

Table 5.5 Assembled SNP Marker Panel
Columns: Gene; SNP locus+; C. hominis allele; C. parvum allele; Effect

Genes (in table order): AcoA, APR, β-tubulin, COWP, Cp23, EMAAg, Gp60, Hsp70
SNP locus+: 1, 4, 6, 7, 2, 3, 1, 4, 3, 5, 7, 8, 5, 6, 1, 3, 7, 4, 3, 1, 5, 6, 29, 27, 80, 108, 126, 79, 98, 115, 14, 17, 19, 20, 22
C. hominis allele: C, C, C, A, T, T, C, A, T, C, G, C, C, T, C, T, G, G, T, T, C, G, G, G, T, T, T, T, G, G, A, A, A, A, A
C. parvum allele: A, T, G, T, C, G, T, G, G, T, A, T, T, C, T, C, A, A, C, C, G, T, A, C, A, C, C, C, T, A, G, G, G, G, T
Effect: silent, silent, silent, silent, expressed/silent, silent, silent, silent, silent, expressed, expressed, intronic, silent, silent, silent, silent, silent, silent, silent, silent, silent, silent, silent, silent, expressed, silent, expressed, expressed, expressed, silent, silent, expressed, expressed, expressed, expressed, expressed, silent

Gene      SNP locus+  C. hominis allele  C. parvum allele  Effect
LDH       10          C                  T                 expressed
LDH       3           C                  T                 silent
MDH       8           G                  A                 expressed
MDH       7           G                  C                 silent
Mucin-1   13          T                  G                 expressed
Mucin-1   16          A                  T                 expressed
UPRTase   2           T                  A                 expressed
UPRTase   3           T                  C                 silent
18S rRNA  1           T                  A                 silent
18S rRNA  3           T                  C                 silent
Table 5.5. + SNP loci identified by internal lab molecular data set. Expected allele as annotated in the C. hominis TU502 and C. parvum Iowa II reference genomes. The allelic profiles for the C. hominis and C. parvum genomes from 45 molecular markers and 13 gene loci for DNA subtyping of international subpopulations.

5.4 Bioinformatics Computations: Protein Topology & Biophysical Predictions in Relation to Polymorphism

The distribution of a pathogen in a community of hosts may be determined by the pathogen's own genetic content, which can in turn be influenced by the genetic make-up of the host as well as by the environment in which the pathogen resides.
Ascertaining those genes that may undergo selective pressures from these niches is one approach to identifying genes putatively involved in a pathogen's unique epidemiological behaviour. In genetics, the ratio of non-synonymous (expressed) substitutions to synonymous (silent) substitutions can be used to infer the degree or likelihood of positive selection on any given protein-coding gene. A disproportionately high ratio of NS:S substitutions would be indicative of regions under positive diversifying selection. Of the 13 protein-encoding genes examined herein, only 2 loci demonstrate a slight propensity towards positive selection, both of which are antigenic proteins (Table 5.3). The first is the Gp60 gene and the second is the EMAAg gene, both considered to be antigenic determinants in Cryptosporidium infection.

The behaviour of amino acids can be dramatically altered depending on their immediate surroundings. Multiple criteria were applied to each of the target genes in the hope of better elucidating which SNPs within them have the potential to be of greater research value (Table 5.6). ORF analysis consisted of profiling their chemical, structural and positional characteristics, with particular attention paid to those regions containing target SNPs. Considered collectively, this multi-faceted approach enabled better decisions about which SNPs were best as molecular targets.

Hydrophobicity is a popular analysis tool because it is a good biological indicator of transmembrane segments or core regions within a protein (Figure 5.2, Appendix 5). The Kyte-Doolittle hydropathy score was used to assess whether, and to what degree, an individual amino acid repels or attracts water. Finding one transmembrane segment at the N-terminus of a sequence may imply the protein is secreted, whereas many transmembrane domains in one protein may indicate a channel.
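The hydropathy profiling described above can be illustrated with a minimal sketch. It is not the Protean implementation used in this study; it simply averages the published Kyte-Doolittle values over a sliding window. The function names, the window length of 9 residues and the example peptide are assumptions made for illustration only.

```python
# Kyte-Doolittle hydropathy values per amino acid (Kyte & Doolittle, 1982).
KD_SCALE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def kd_profile(seq, window=9):
    """Mean hydropathy of every full window along a protein sequence.

    The window length is an assumption; non-standard residue codes
    raise KeyError rather than being silently skipped.
    """
    seq = seq.upper()
    if window < 1 or window > len(seq):
        return []
    scores = [KD_SCALE[aa] for aa in seq]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


def hydrophobic_windows(seq, window=9, cutoff=0.0):
    """0-based start positions of windows whose mean score exceeds cutoff."""
    return [i for i, s in enumerate(kd_profile(seq, window)) if s > cutoff]


if __name__ == "__main__":
    # Hypothetical peptide, not a Cryptosporidium sequence.
    peptide = "MKLLIVLALLAVSSDDKEKR"
    for start, score in enumerate(kd_profile(peptide)):
        print(start, round(score, 2))
```

Windows scoring above zero correspond to the hydrophobic stretches discussed in the text; a SNP position can then be checked against the windows that contain it.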
SNPs positioned within such domains could be theorized to be involved in biological processes such as molecule transport. Protein motifs, and any mutations within them, that are positioned on either side of the cellular membrane could potentially contribute to biological processes such as cell signalling, phosphorylation or invasion. Just as hydrophobic regions can be membrane-spanning segments that anchor a protein in a membrane, hydrophilic stretches could be looping out at the surface of the protein. When coupled with the Emini algorithm, which predicts the probability of finding an amino acid residue on the surface of the protein molecule, we were better able to hypothesize the location of a particular SNP within a protein.

Figure 5.2 Example hydrophobicity plot for the Gp60 ORF returned using the Kyte-Doolittle algorithm.

Figure 5.2. Hydrophobicity profile returned by Protean using the Kyte-Doolittle scale for the Gp60 protein locus (orientated 5' to 3'); values greater than zero are indicative of hydrophobic, membrane segments, with values of 2 indicating strong hydrophobic character. Regions of hydrophobic nature are highlighted beneath blue bars, with a clear indication that both termini of the protein appear to be membrane anchored.

Further influencing molecular marker or target SNP selection were scores computed by the Jameson-Wolf antigenic index. SNPs evaluated as likely to be antigenic are considered to be of greater research potential when considering the pathogenesis model of Cryptosporidium. On a threshold scale of -1.7 to +1.7, values approaching +1.7 are suggestive of antigenicity (Table 5.6, Figure A.5). During SNP selection the secondary structure characteristics of a given protein were taken into consideration.
This is the intermediary structure between the primary structure, the protein sequence, and the tertiary structure, the 3-D folded conformation that is the final shape of the protein. The propensity of the peptide chain to form various secondary structure conformations is basic protein chemistry: α-helices, where protein residues follow the shape of a spring; β-sheets or strands, where residues are in line and successive residues turn their backs to each other; and random coils or turns, where the amino acid chain is neither helical nor extended. A mutation located on the inside fold of a turn versus one on the outside of a turn may be under different selective pressures from the external or internal biological environments.

Table 5.6 BioPhysical Properties of Targeted SNP Loci

Values are given as C. hominis/C. parvum.

Gene (SNP)       SNP marker   NS/S*   Class                    KD score      JW score      Emini
AcoA (1)         C/A          S       enzymatic                0.52/0.52     1.20/1.09     1.45/1.45
AcoA (4)         C/T          S       enzymatic                -0.10/0.01    0.00/0.00     1.35/1.35
AcoA (6)         C/G          S       enzymatic                -0.28/-0.22   -0.10/-0.10   0.12/0.90
AcoA (7)         A/T          S       enzymatic                -0.14/-0.14   -0.60/-0.44   0.44/0.84
APR (2)          T/C          NS      bio-synthesis            0.47/0.47     0.45/0.45     0.94/0.94
APR (3)          T/G          NS      bio-synthesis            2.03/2.03     1.58/0.90     1.09/1.13
β-tubulin (1)    C/T          S       introns (molecular)      0.32/0.32     0.30/0.30     0.51/0.51
β-tubulin (4)    A/G          S       structural (molecular)   0.36/0.36     -0.60/-0.60   0.33/0.33
β-tubulin (3)    T/G          S       structural (molecular)   0.26/0.22     0.45/0.45     0.43/0.43
β-tubulin (5)    C/T          S       structural (molecular)   1.00/0.84     0.85/0.85     0.58/0.55
β-tubulin (7)    G/A          S       structural (molecular)   0.43/0.43     0.10/0.01     0.85/0.85
β-tubulin (8)    C/T          S       structural (molecular)   -0.24/-0.23   -0.60/-0.60   0.60/0.72
Cp23 (4)         G/A          S       antigenic                0.13/0.13     0.30/0.30     0.62/0.73
Cp23 (3)         T/C          NS      antigenic                1.68/1.77     1.43/1.39     1.92/2.21
Cp23 (1)         T/C          S       antigenic                0.71/0.71     1.28/0.65     0.94/0.94
Cp23 (5)         C/G          NS      antigenic                0.13/0.13     -0.30/0.01    0.57/0.57
Cp23 (6)         G/T          NS      antigenic                0.77/0.77     0.75/0.75     0.94/1.02
COWP (5)         C/T          S       structural (molecular)   -0.08/-0.08   0.50/0.50     0.43/0.43
COWP (6)         T/C          S       structural (molecular)   1.74/1.74     1.70/1.70     1.92/1.92
COWP (1)         C/T          S       structural (molecular)   0.22/0.22     -0.60/-0.58   0.29/0.34
COWP (3)         T/C          S       structural (molecular)   0.06/0.00     -0.05/0.06    0.45/0.45
COWP (7)         G/A          S       structural (molecular)   -0.37/-0.37   0.25/0.78     0.39/0.39
EMAAg (29)       G/A          NS      antigenic                1.09/1.09     1.02/0.59     1.40/1.35
EMAAg (27)       G/C          S       antigenic                -0.41/-0.44   0.45/0.30     0.91/0.38
Gp60 (80)        T/A          S       antigenic                0.77/0.78     0.90/0.90     1.53/1.53
Gp60 (108)       T/C          NS      antigenic                1.09/1.09     0.75/1.01     0.68/0.60
Gp60 (126)       T/C          NS      antigenic                1.29/1.54     1.58/1.42     1.27/1.30
Gp60 (79)        T/C          NS      antigenic                2.33/2.10     1.50/1.66     2.31/2.45
Gp60 (98)        G/T          NS      antigenic                0.32/0.30     1.00/1.06     1.19/1.19
Gp60 (115)       G/A          NS      antigenic                2.90/2.85     1.30/1.30     3.60/3.61
HSP70 (14)       A/G          S       bio-synthesis            0.33/0.27     0.45/0.45     1.03/1.03
HSP70 (17)       A/G          S       bio-synthesis            1.96/1.78     0.90/0.85     1.45/1.44
HSP70 (19)       A/G          S       bio-synthesis            1.24/1.24     1.00/0.62     1.45/1.45
HSP70 (20)       A/G          S       bio-synthesis            0.42/0.44     -0.73/-0.71   0.59/0.59
HSP70 (22)       A/T          S       bio-synthesis            0.29/0.29     0.45/0.45     0.34/0.56
LDH (10)         C/T          NS      enzymatic                1.46/0.88     0.60/-0.30    1.99/0.97
LDH (3)          C/T          S       enzymatic                -0.66/-0.66   -0.30/-0.30   0.93/0.93
MDH (8)          G/A          NS      enzymatic                0.54/0.54     0.45/-0.30    0.40/0.38
MDH (7)          G/C          S       enzymatic                -0.01/0.01    0.30/0.33     1.47/1.34
Mucin-1 (13)     T/G          NS      structural (molecular)   1.68/1.40     0.80/0.62     1.92/2.21
Mucin-1 (16)     A/T          NS      structural (molecular)   0.71/0.42     1.28/0.98     0.94/0.94
UPRTase (2)      T/A          NS      enzymatic                1.19/0.73     0.45/0.45     2.28/1.26
UPRTase (3)      T/C          S       enzymatic                0.59/0.61     0.95/0.95     0.66/0.66
18S rRNA (1)     T/A          S       ribosomal (structural)   0.44/0.44     1.09/1.09     0.59/0.59
18S rRNA (3)     T/C          S       ribosomal (structural)   0.29/0.33     0.45/0.45     0.34/1.03

[The Chou-Fasman and GOR secondary structure columns are illegible in the extracted text and are not reproduced here.]

Table 5.6. Kyte-Doolittle hydropathy score: regions > 0.0 are indicative of hydrophobicity while regions < 0.0 are indicative of hydrophilicity. Jameson-Wolf algorithm for antigenicity: domains at or approaching 1.7 are typically considered antigenic. The Emini surface probability indicates the probability of finding an amino acid residue on the surface of the protein molecule; on a threshold scale of 1-6, a value of 1 or greater increasingly predicts protein surface location. Chou-Fasman secondary structures are predicted based on helices (H), sheets (-) or turns (T). GOR predictions for secondary structure are based on helices (H), sheets (E), turns (T) or coils (C). NS, non-synonymous; S, synonymous.

5.5 Multi-locus SNP-typing

A panel of 45 SNP markers was assembled for the purpose of MlS-typing (Table 5.5). For Australia, Kenya, Peru and Scotland, 43 of these were consistently used to type samples from each subpopulation. Though well amplified through routine and multiplex PCR methods, the SNPs within the APR protein were not scored because of poor activity and/or resolution of the SNP markers upon downstream typing. This reduced the panel from 45 to 43 SNP markers. The SNaPshot method relies on single base extension (SBE) reactions using dye-labelled, mobility-modified detection probes to discriminate alleles. MlSts were scored by peak size, peak color and peak height ratio. When scoring SNP alleles multiple features were taken into account, such as mobility shift of labelling dyes, the presence of artifactual stutter peaks, fragment shift due to hairpins or secondary structures within SNP detection probes, and the presence of mixed or double allele calls. In the case of double allele calls a predominant peak of significant and consistent height for the same marker was typically present.
Mixed SNP-types were scored as those having a secondary allele call of at least 20% of the predominant allele call. For data analysis in a haploid organism such as Cryptosporidium, the predominant allele call was used for MlSt-based genotyping.

5.5.1 Australia: Multi-locus SNP-typing

Thirteen C. hominis isolates and two C. parvum reference isolates (A14, A15) were collected from South Western Australia. Of the possible 559 (13*43) SNP loci, 422, or 75.4%, were successfully typed (Table 5.7). Eighty-eight allele variants were uncovered from all 12 genes. There were 31 novel allele variants, that is, those of neither the C. hominis nor the C. parvum expected MlSt. These were uncovered at three SNP positions: position 3 of 18S rRNA, identified in 13 isolates, and positions 98 and 115 of Gp60, identified in 11 and 7 isolates respectively. For SNP locus 3 of the 18S rRNA gene, isolates A1-A13 were novel. At SNP position 98 of the Gp60 locus, samples A1-6, 8, 9, 10, 11 and 13 were novel variants, differing from both of the species-specific alleles. Also at SNP marker 115 of the Gp60 locus, samples A1, 2, 3, 5, 6, 7 and 10 revealed unique allele types. These results are not surprising since the Gp60 antigenic protein is one of the most hyper-variable studied. The EMAAg protein was the only gene to show complete genetic stability in all SNPs successfully typed; no C. parvum or novel allele variants were seen. The most obvious observation in the Australian subpopulation is the variability of A6, which shows an almost complete C. parvum MlSt (Figure 5.3, C). This brings forth the concept of a possible mixed infection within this isolate. Also to be considered is the possibility that it is in fact a C. parvum sample displaying C. hominis allele variants, keeping in mind that samples A1-A13 were designated as confirmed C. hominis isolates by the donor country.
With the exception of sample A6, the Australian subpopulation had four genetically stable proteins: COWP, Cp23, HSP70 and EMAAg.

Table 5.7 Australia Subpopulation MlSt & Allele Variants

[Allele matrix not reproduced: the per-isolate calls (rows Ch, Cp and A1-A13; 43 SNP marker columns grouped as β-tubulin, COWP, Cp23, 18S, HSP70, AcoA, Muc-1, Gp60, LDH, MDH, EMA, UPRT) are illegible in the extracted text.]

Table 5.7. Ch, C. hominis; Cp, C. parvum. (.) denotes alleles scored in agreement with the expected C. hominis SNP subtype (top). (-) denotes markers unsuccessfully typed or scored. Allele designations A, C, T, G represent allele variants that deviate from the expected C. hominis MlSt.

In the Australian subpopulation, 2 known C. parvum isolates were received as reference samples from the donating colleague (Table 5.8). These proved beneficial in providing a comparison against the C. hominis data. They allowed for the testing of species distinction via fragment analysis and point mutations on international isolates. Both C. parvum isolates showed the presence of a novel allele at SNP position 3 of the 18S rRNA locus.
This molecular marker therefore gave the same novel allele call in all Australian samples, whether C. hominis or C. parvum. Also of note, though not completely unexpected given the hyper-variable nature of the locus, differential allele calls were seen in 4 of the Gp60 SNP markers: marker 126 for isolate A14, and markers 79, 98, 126 and 115 for isolate A15.

Table 5.8 Australia MlSt of C. parvum Isolates

[Allele matrix not reproduced: the per-isolate calls (rows Ch, Cp, A14 and A15; 43 SNP marker columns) are illegible in the extracted text.]

Table 5.8. Ch, C. hominis; Cp, C. parvum. (.) denotes alleles scored in agreement with the expected C. hominis SNP subtype (top). (-) denotes markers unsuccessfully typed or scored. Allele designations A, C, T, G represent allele variants that deviate from the expected C. hominis MlSt.

Figure 5.3 Australia, electropherogram representations; HSP70 locus SNP markers 14, 17, 19, 20, 22 (left to right).

A. Isolate A9

Figure 5.3, A. Expected C. hominis profile as seen by 5 adenine alleles scored to each marker (A, fluoresces green).

B. Isolate A3

Figure 5.3, B. Expected C. hominis profile as seen by 5 adenine alleles scored to each marker (A, fluoresces green).

C. Isolate A6

Figure 5.3, C. HSP70 profile for A6, illustrating the C. parvum profile within a C. hominis isolate, shown by the 4-guanine stretch followed by a single thymine allele (G, fluoresces blue; T, fluoresces red).

D. Isolate A14

Figure 5.3, D. HSP70 profile for A14, illustrating the C. parvum profile within C. hominis isolate, shown by the 4-guanine stretch followed by a single thymine allele (G, fluoresces blue; T, fluoresces red).

5.5.2 Kenya: Multi-locus SNP-typing

Of a possible 860 SNP positions to be typed (20*43), 583, or 67.8%, were successfully scored for a specific allele; 74 were variant, that is, not of the expected C. hominis SNP-type (Table 5.9). Six genes demonstrated complete genetic stability: COWP, Cp23, LDH, MDH, EMAAg and UPRT. The EMAAg gene was also completely stable in the Australian subpopulation. Apart from A6 in Australia and an exception in Kenya (Table 5.9), the HSP70 locus was genetically stable at all 5 SNP loci. Across all the differential alleles there is a slight predilection towards novel alleles over those of the closely related C. parvum MlSt.

There were 39 novel alleles in total for the Kenyan subpopulation. The remaining 35 allele scores were those of the C. parvum genotype. The same novel allele variant, adenine, was seen at SNP marker 3 of the 18S rRNA gene in isolates K3-5, 7-16 and 18-20. There was a novel allele, in a sole Kenyan sample, at SNP marker 7 of the β-tubulin locus. This was not seen in any of the other 3 intercontinental subpopulations. The remaining novel alleles were typed to the hyper-variable Gp60 locus at positions 79, 98 and 115 (Table 5.9). Although variant, alleles scored at SNP markers 80 and 108 for Gp60 were consistently those of the C. parvum MlSt.
A result of note for this particular protein, also mimicked in the C. hominis isolates of the Australian subpopulation, is the lack of genetic differentiation at SNP position 126. In such a variable gene, this could provide early indications of a specific biofunctionality, its expression or lack thereof, or even its position within the primary folding conformation of the Gp60 protein.

Table 5.9 Kenya Subpopulation MlSt & Allele Variants

[Allele matrix not reproduced: the per-isolate calls (rows Ch, Cp and K1-K20; 43 SNP marker columns grouped as β-tubulin, COWP, Cp23, 18S, HSP70, AcoA, Muc-1, Gp60, LDH, MDH, EMA, UPRT) are illegible in the extracted text.]

Table 5.9. Ch, C. hominis; Cp, C. parvum. (.) denotes alleles scored in agreement with the expected C. hominis SNP subtype (top). (-) denotes markers unsuccessfully typed or scored. Allele designations A, C, T, G represent allele variants that deviate from the expected C. hominis MlSt.

Figure 5.4 Kenya electropherogram representation; isolate K9, COWP locus SNP markers 1, 3, 7.

Figure 5.4. Electropherogram of Kenyan isolate K9, COWP SNP markers 1, 3, 7, alleles C, T and G respectively (left to right). Liz120 size standard seen in orange.

5.5.3 Peru: Multi-locus SNP-typing

With a subpopulation of 22 isolates there is the potential for 946 (22*43) SNP markers to be scored. At a success rate of 71.5%, 676 were typed; of these only 10.8%, or 73 in total, were variant. Variant alleles were only seen in 4 loci: β-tubulin, COWP, 18S rRNA and Gp60. Six of the remaining loci, Cp23, HSP70, LDH, MDH, EMAAg and UPRT, showed complete genetic stability, while AcoA and Muc-1 had too few typing results on which to base any conclusive inferences. The Peruvian subpopulation provides the first report of a novel allele being detected in the β-tubulin locus at SNP position 3 (Table 5.10), as demonstrated in isolates P5, 8, 9, 14 and 21. Variants at the same SNP locus seen in Australia and Kenya were of the C. parvum MlSt. This is also the first report of 2 novel alleles in the COWP locus, which can be seen at SNP marker COWP6 (in samples including P5 and P21) and again in P21 at marker COWP3. As was the case in Australia and Kenya, the novel allele, A, was again seen at 18S rRNA position 3. If in fact a true novel allele, it appears to be stable despite geography, although Kenya had two incidences of the expected C. hominis allele type, K2 and K17. Multiple allele differences can be seen throughout the hyper-variable Gp60 gene. Variant alleles scored at SNP positions 80 and 108 for Gp60 were those of the C. parvum expected genotype. For SNP marker 126 all isolates are again stable except for P2, which shows a C. parvum allele call. As will be discussed, this is also the case for the Scottish subpopulation. The stability of Gp60 marker 126 in all four subpopulations could suggest it would be a suitable SNP marker, from such a variable protein, for species distinction, for phylogenetic studies, or as one to monitor for possible mutations evolving in the future.
The other variant alleles for the Gp60 protein are spread throughout markers 98 and 115. Only 9 samples were successfully typed for Gp60 position 79 and no variants were seen. As also seen in Kenya, loci Cp23, LDH, MDH, EMAAg and UPRTase are all genetically stable for all markers typed. Markers mapped to the HSP70 locus gave a complete C. hominis MlSt profile, with 95 of a possible 110 SNPs successfully typed. The only exceptions result from a failure to type samples P4, P5 and P15 for all five HSP70 SNP loci.

Table 5.10 Peru Subpopulation MlSt & Allele Variants

[Allele matrix not reproduced: the per-isolate calls (rows Ch, Cp and P1-P22; 43 SNP marker columns grouped as β-tubulin, COWP, Cp23, 18S, HSP70, AcoA, Muc-1, Gp60, LDH, MDH, EMA, UPRT) are illegible in the extracted text.]

Table 5.10. Ch, C. hominis; Cp, C. parvum. (.) denotes alleles scored in agreement with the expected C. hominis SNP subtype (top). (-) denotes markers unsuccessfully typed or scored. Allele designations A, C, T, G represent allele variants that deviate from the expected C. hominis MlSt.

Figure 5.5 Peru electropherogram representations; isolate P6, Cp23 and 18S rRNA loci SNP markers.

Figure 5.5. Peruvian isolate P6, reaction set 5 marker profile comprised of Cp23 markers 4, 3, 1, 5, 6 and 18S rRNA markers 1 and 3 (left to right). The allelic profile for the expected C. hominis genotype would be G-T-T-C-G for Cp23 and T-T for 18S rRNA (Table 5.10). Shown is the presence of a strong A allele at 18S rRNA marker 3, far right. G (fluoresces blue), T (fluoresces red), C (fluoresces black), A (fluoresces green). Liz120 size standard seen in orange.

Figure 5.6 Peru electropherogram representation; isolate P21, COWP locus SNP 3 novel allele variant, G.

Figure 5.6. Electropherogram depicting the G (fluoresces blue) variant allele at SNP marker COWP 3; the expected C. hominis and C. parvum alleles would be T and C respectively. Liz120 size standard seen in orange.

5.5.4 Scotland: Multi-locus SNP-typing

With a sample subpopulation of 20 there is the potential for 860 SNP markers to be typed. We were able to achieve 622 of these, a success rate of 71.1% (Table 5.11). Immediate observations reveal genetic stability for the COWP, Cp23, HSP70, LDH, MDH, EMAAg and UPRTase loci. This is akin to the results of SNP typing for both the Australian and Kenyan subpopulations and, with the exception of the COWP gene, the Peruvian subpopulation as well. Only a handful of markers could be reliably typed to the AcoA and Mucin-1 genes; those that were typed displayed genetic homology to the expected C. hominis MlSt. Ninety-three different alleles were observed, 49 of which were novel. A novel allele variant seen in 5 Peruvian samples (P5, 8, 9, 14, 21) at SNP locus 3 of the β-tubulin gene is also seen in the Scotland subpopulation. The novel allele seen at position 3 in 18S rRNA in the previous three subpopulations is also clearly present in Scotland.
All four subpopulations had the same variant allele in 71 of the 73 total samples successfully scored for this particular marker in the 18S rRNA locus. The SNP position and allelic profile were re-examined and confirmed for the expected C. hominis or C. parvum genotype. That A6, an isolate that gave a very dominant C. parvum profile, and both of the confirmed C. parvum isolates, A14 and A15, also scored the same novel allele as the C. hominis isolates could imply that it is a true novel allele.

The Gp60 gene marker 126 shows a genetically stable C. hominis MlSt. Markers 80 and 108 were genetically stable, though of the C. parvum genotype. SNP marker 98 was stable as well, albeit of a novel allele variant, A, which was also seen in 11 of the 13 Australian isolates. There were no mixed allele calls for this particular protein in any of the Scotland samples. In the Australian, Kenyan and Peruvian subpopulations this protein had numerous double allele calls, of which the dominant one was used.

Table 5.11 Scotland Subpopulation MlSt & Allele Variants

[Allele matrix not reproduced: the per-isolate calls (rows Ch, Cp and S1-S20; 43 SNP marker columns grouped as β-tubulin, COWP, Cp23, 18S, HSP70, AcoA, Muc-1, Gp60, LDH, MDH, EMA, UPRT) are illegible in the extracted text.]

Table 5.11. Ch, C. hominis; Cp, C. parvum. (.) denotes alleles scored in agreement with the expected C. hominis SNP subtype (top). (-) denotes markers unsuccessfully typed or scored. Allele designations A, C, T, G represent allele variants that deviate from the expected C. hominis MlSt.

Figure 5.7 Scotland electropherogram representations; Cp23 and 18S rRNA loci SNP markers.

A. Isolate S3

Figure 5.7, A. Scotland isolate S3, reaction set 5 marker profile comprised of Cp23 locus markers 4, 3, 1, 5, 6 and 18S rRNA locus markers 1 and 3 (left to right). The allelic profile for the expected C. hominis genotype would be G-T-T-C-G for Cp23 and T-T for 18S rRNA (Table 5.11). Shown is the presence of a strong A allele at 18S rRNA marker 3, far right.

B. Isolate S6

Figure 5.7, B. Scotland isolate S6, reaction set 5 marker profile comprised of Cp23 locus markers 4, 3, 1, 5, 6 and 18S rRNA locus markers 1 and 3 (left to right). The allelic profile for the expected C. hominis genotype would be G-T-T-C-G for Cp23 and T-T for 18S rRNA (Table 5.11). Shown is the presence of a strong A allele at 18S rRNA marker 3, far right. G (fluoresces blue), T (fluoresces red), C (fluoresces black), A (fluoresces green). Liz120 size standard (15 nt to 120 nt, left to right) seen in orange.

5.6 Mixed Genotypes

In recent years the concept of mixed genotypes in Cryptosporidium has gained considerable momentum and is becoming increasingly accepted. In light of this, the presence of true mixed alleles was addressed.
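The scoring convention used in this chapter (the predominant peak is taken as the allele, and a secondary peak of at least 20% of its height is scored as mixed) can be sketched as follows. This is an illustrative sketch only, not the fragment-analysis software used in the study; the function name, the input layout (one peak height per dye-labelled base within the expected size window) and the example heights are assumptions.

```python
def call_allele(peaks, mixed_ratio=0.20):
    """Call the predominant allele at one SNP marker and flag a mixed call.

    peaks: dict mapping base ("A", "C", "G", "T") to peak height within
           the expected fragment-size window (hypothetical input layout).
    mixed_ratio: secondary/predominant height ratio at or above which the
           marker is scored as mixed (20% in this study).
    Returns (predominant, secondary); secondary is None when not mixed.
    """
    observed = {base: h for base, h in peaks.items() if h > 0}
    if not observed:
        return None, None  # marker not typed
    ranked = sorted(observed.items(), key=lambda kv: kv[1], reverse=True)
    top_base, top_height = ranked[0]
    if len(ranked) > 1 and ranked[1][1] >= mixed_ratio * top_height:
        return top_base, ranked[1][0]
    return top_base, None


if __name__ == "__main__":
    # Hypothetical heights: secondary C peak at 28% of the T peak.
    print(call_allele({"T": 1000, "C": 280}))
    # Hypothetical heights: secondary peak below the 20% threshold.
    print(call_allele({"A": 1000, "G": 150}))
```

Only the first element of the returned pair would feed into the MlSt of a haploid isolate; the second element records that the marker was scored as mixed.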
Multi-locus SNP-types were defmed for each sample based on the allele calls (i.e., allele scored) within range of an expected fragment size. To conduct population genetic analysis it is imperative that each molecular marker can be assigned an allele to construct either a Multi-locus genotype (MLG) or an M1St (Multi-locus SNP-type). The presence of mixed alleles is a common occurrence in any population genetics study on microparasites. In studies of a haploid organism it is essential that each sample can be assigned a single dominate allele in order to assemble a MLG or M1St for individual isolates. To facilitate this, the assumption that the predominant peak (i.e. allele scored) at each SNP locus represents the actual genotype must be made. It is these alleles that are used to establish the MlSts described above. The genotypic data suggests a high proportion of mixed SNP-types at numerous SNP positions within the 43 marker assembled SNP panel. Because Cryptosporidium is haploid lacking multicopy genes (except for the ribosomal genes); the presence of more than one peak fluorescing at a given fragment size could imply a mixed population which may reflect the transmission intensity within a given population. For the purpose of this study mixed alleles were scored based on peak height ratio. Those having a secondary peak within the expected fragment size range of at least 20% the height of the predominant allele call were scored as mixed. While peak height ratio changed between individual SNP markers, it remained constant for a given marker.  83  Figure 5.8 Electropherogram representation; mixed alleles scored isolates A 1 for Cp23 markers 3 and 1.  0  20  40  $0  -1 COO 1400 1200 1000 COO 800 400 200  A  A  A  LA  Figure 5.8. Electropherogram for isolate A . Cp23 SNP markers 1 3 and 1 were among those that consistently displayed the presence of mixed allelic profiles. In both markers the expected C. hominis allele type is T (red). 
In the case of Cp23 marker 3 (left peak) a secondary allele, C (black), was present at approximately 28% of the height of the predominant T allele. This variant allele type is that of the closely related C. parvum genotype. In the case of Cp23 marker 1 (right) the secondary allele present, at approximately 40% of the predominant T allele, is a novel variant, A (green).

The distribution of these mixed alleles by individual SNP marker across the Australia, Kenya, Peru and Scotland populations collectively is shown in Figure 5.9. Of the total 43 SNP markers in this study, 13 displayed a propensity for mixed alleles. The SNP marker with the most mixed alleles scored is marker 3 in the COWP locus. Sixty-two (82.7%) of a total of 75 C. hominis isolates (Australia, 13; Kenya, 20; Peru, 22; Scotland, 20) had mixed alleles scored at this particular SNP position. This was closely followed by marker 22 of the HSP70 locus, at which 53 samples (70.7%) had mixed alleles.

Figure 5.9 Mixed allele distribution based on individual SNP marker.

[Bar chart: y axis, distribution (%) of mixed alleles among Australia, Kenya, Peru and Scotland; x axis, SNP marker.]

Figure 5.9. The x axis represents the 13 SNP markers that were observed as having mixed alleles scored at their position and the total percentage of their distribution throughout all four subpopulations combined.

Fragment analysis of the AcoA and Mucin-1 proteins resulted in a disproportionately high number of bi-allelic or mixed allele calls for almost all SNP markers and isolates tested. Markers for which a dominant allele could be scored were scored, but for most, resolving one allele over another was not possible; these were left unscored for multi-locus SNP-typing. Data from these two genes were also omitted from downstream quantitative analysis to prevent a skew based on unreliable predominant allele calls. 
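The peak-height rule used for scoring mixed alleles can be sketched in a few lines of Python. This is a minimal illustration, not the scoring software used in the study: the `Peak` class, the `score_marker` function and the absolute peak heights are hypothetical, and only the 20% threshold and the approximate 28% and 40% secondary-to-predominant ratios come from the text and Figure 5.8.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

MIXED_THRESHOLD = 0.20  # secondary peak >= 20% of the predominant peak height


@dataclass
class Peak:
    allele: str    # "A", "C", "G" or "T"
    height: float  # peak height within the expected fragment size range


def score_marker(
    peaks: List[Peak], threshold: float = MIXED_THRESHOLD
) -> Tuple[Optional[str], bool, List[str]]:
    """Return (predominant allele, scored as mixed?, secondary alleles)."""
    if not peaks:
        return None, False, []  # marker unsuccessfully typed
    ranked = sorted(peaks, key=lambda p: p.height, reverse=True)
    dominant = ranked[0]
    secondary = [
        p.allele for p in ranked[1:] if p.height >= threshold * dominant.height
    ]
    return dominant.allele, bool(secondary), secondary


# Hypothetical heights chosen to give the ratios quoted for isolate A1.
cp23_marker3 = score_marker([Peak("T", 1000.0), Peak("C", 280.0)])  # ratio 0.28
cp23_marker1 = score_marker([Peak("T", 1000.0), Peak("A", 400.0)])  # ratio 0.40
clean_call = score_marker([Peak("T", 1000.0), Peak("C", 150.0)])    # ratio 0.15
```

In every case the predominant allele is still the one carried into the MLSt; the mixed flag is recorded alongside it rather than replacing the call.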
Despite repeating the same samples in triplicate, using both identical and varied conditions, the same result ensued. A higher number of genotypically mixed MLSts seen in one subpopulation versus another could imply a less stable population structure. A prerequisite for genetic exchange is met by the concept of mixed infections. When considering the distribution of mixed alleles among the four subpopulations of Australia, Kenya, Peru, and Scotland there are no significant differences (Figure 5.10). Scotland ranks highest, with 12.67% of all successfully typed SNPs having mixed alleles, though it is closely followed by Australia at 11.35%, Peru at 10.18% and Kenya at 8.12%. In the context of our study these figures appear to refute the notion of a less stable genetic subpopulation in one country versus another. As sample sizes increase and more molecular markers are examined this could change.

Figure 5.10 Distribution of mixed alleles according to geographic boundary.

[Bar chart: y axis, percentage of mixed alleles; x axis, population (Australia, Kenya, Peru, Scotland).]

Figure 5.10. Distribution of mixed alleles when partitioned by geography; Scotland shows (slightly) the most, with the fewest incidences of mixed alleles seen in Kenya.

5.7 Distribution of Multi-locus SNP-types

A multi-locus SNP-type was determined based on the allelic profile of each isolate from all four international geographies. Twenty-four unique MLSt profiles were encountered from 72 of a total of 75 C. hominis isolates across the 4 international C. hominis subpopulations surveyed. The MLSts of the remaining three isolates (isolates K3, K5 and K17) were deemed too incomplete to designate with significance. A small number of highly abundant MLSts and a large number of singletons were observed, consistent with previous data from Scotland [34]. Of the 72 isolates scored for a quasi-complete MLSt, sixteen (22.2%) belonged to the most abundant MLSt, MLSt1, which is closely followed by MLSt15 with thirteen (18.1%) isolates. The MLSt most frequent within a single geography was found in Scotland, with 9 isolates scored. This MLSt was also identified in one Kenyan and two Peruvian isolates (Figure 5.11), though not in the Australian subpopulation. Of the 24 MLSts identified, 25% were found in more than one geography. The most widespread MLSt, MLSt1, occurred within all four geographies and contains novel SNPs. Each geography contained at least one unique MLSt having private alleles, that is, alleles detected only within one subpopulation. Kenya had the most, followed closely by Peru, at 7 and 6 respectively. There were 3 MLSts unique to Australia, though only 1 to Scotland. Each geographically diverse subpopulation displayed a wide variety of MLSts, whether shared or unique. Within the boundary of Australia there were 6 different MLSts identified, with 11 in Kenya, 11 in Peru, and 6 in Scotland. Pairwise comparison of all four geographically distinct subpopulations shows that Peru and Scotland have the most MLSts in common, sharing four of the 24 identified. Australia and Scotland, Kenya and Peru, and Kenya and Scotland each share 3 common MLSts. Australia shares only 1 and 2 with Kenya and Peru respectively. The results suggest that the repertoires of MLSts circulating among the four subpopulations show only a slight overlap with one another. The broad range of MLSts contained in varying degrees within all four geographies suggests that intra-population genetic diversity plays a more significant role in population sub-structuring.

Figure 5.11 Geographic distribution of MLSts identified.
[Bar chart: x axis, designated MLSt (1 to 24); bars keyed by geography (A, K, P, S).]

Figure 5.11. The 24 multi-locus SNP-types identified (numbered 1-24) and their distribution based on geography; Australia (A) seen in blue, Kenya (K) in red, Peru (P) in green, and Scotland (S) in purple.

5.8 Descriptive Statistics & Measures of Genetic Variability

For analysis of genetic diversity at a single point mutation level, a locus was considered polymorphic if two or more alleles were detected, regardless of their frequencies. Of a possible 37, Australia had 28 SNP loci that scored more than one allele type. This is the most of all four geographies, suggesting that the most intra-population diversity exists in Australia when compared to the other three subpopulations, though the allelic profile of isolate A6 must be kept in mind. In terms of allele frequency, Scotland was the most stable of the four geographies; only three SNP markers displayed more than one allele type: BT 3, BT 8, and Gp60 115. All remaining SNP markers, whether novel, variant or of the expected genotype, were genetically stable for this subpopulation. Nine separate marker loci were multi-allelic in the Peru subpopulation and 8 in the Kenyan subpopulation. Standard genetic diversity parameters were estimated. The mean number of alleles per locus (A), frequency of polymorphic loci (P), observed heterozygosity (Ho) and expected heterozygosity (He) (Table 5.12) varied among the 4 subpopulations, with values of A ranging from 1.135 in Scotland to 1.778 in Australia; P from 8.10% in Scotland to 77.8% in Australia; Ho from 0.097 in Scotland to 0.182 in Australia; and He from 0.086 in Scotland to 0.165 in Australia. The four international subpopulations contained genetic diversity of varying degrees. Given the extremely polymorphic profile of isolate A6 relative to the expected C. hominis allele type, the relationship between Australia and the other three international subpopulations may be skewed; if this isolate were omitted, the Australian values would likely be reduced to levels comparable to those of the other three subpopulations. Scotland clearly showed the smallest level of variation.

Table 5.12 Diversity Indices for 4 International Subpopulations of C. hominis.

Subpopulation   N    # MLSts   P(a)    PD      A       AP      He(b) (±)        Ho (±)
Australia       13   6         0.778   0.462   1.778   2.000   0.165 (0.079)    0.182 (0.049)
Kenya           20   11        0.243   0.550   1.297   2.222   0.071 (0.015)    0.068 (0.045)
Peru            22   11        0.243   0.550   1.297   2.222   0.085 (0.001)    0.109 (0.004)
Scotland        20   6         0.081   0.462   1.135   1.667   0.086 (0.000)    0.097 (0.016)
Mean                 8.5       0.336   0.506   1.376   2.278   0.086            0.113

Table 5.12. (a) A locus is considered polymorphic if the frequency of the most common allele does not exceed 0.99. (b) Unbiased estimate (Nei 1978). N, sample size; P, frequency of polymorphic loci; A, mean number of alleles per locus; AP, mean number of alleles per polymorphic locus; He, expected heterozygosity; Ho, observed heterozygosity. PD is the proportion of distinguishable MLSts. (±) Standard deviation.

Clonal diversity, measured as the proportion of distinguishable genotypes (PD), was relatively uniform across all four subpopulations, with a range of only 0.088. Private alleles (Table 5.13) at low frequency were encountered in all four subpopulations, though under different circumstances. With the high degree of variability isolated to A6, at 28 of 37 SNP loci, this isolate can be considered to have a high proportion of private alleles, though all are of the C. parvum genotype. Excluding this isolate, there are 6 private alleles seen in isolate A7. Though each variant allele is of the C. parvum genotype, they are confined to the LDH, MDH, and UPRT loci. There was genetic stability in the Kenya, Peru, and Scotland subpopulations for these three specific genes. 
Kenya had two discrete alleles reserved to two different SNP loci. The first was a cytosine at marker BT 7 in K11, which was not only novel; no other subpopulation had any genetic diversity at this position (excluding A6). Secondly, while not variant, isolates K2 and K17 had what may be considered a private allele at locus 18S rRNA 3. This is the only subpopulation to show the expected C. hominis allele type at this particular marker; all other isolates and subpopulations had the novel adenine allele variant. Allelic variants reserved to the Kenyan subpopulation were also seen at Gp60 locus marker 79, but this SNP was not successfully typed in any of the other three subpopulations, so this finding must be treated with some reserve. In three different isolates Peru had the same private allele at the COWP 6 locus, which was distinct from either the C. hominis or the C. parvum genotype. The Peruvian subpopulation also carried a novel SNP type, guanine, at SNP marker COWP 3. In contrast to these variant alleles, a number of SNP markers demonstrate genetic consistency across all four international subpopulations: two from the β-tubulin locus (markers 4, 5), one from the COWP locus (COWP 7) and both EMAAg markers (EMAAg 29, 27). If the hyper-variable isolate A6 is removed there are 12 additional genetically stable SNP loci, and if A7 is also excluded there are a further 6 SNP loci with genetic constancy. Based on the typing results, gene diversity per locus and subpopulation was computed, with zero indicating a complete lack of differentiation, that is, complete genetic stability. Table 5.14 describes the level of gene diversity per locus and subpopulation. Increasing values toward a threshold of one indicate greater genetic variation at that particular SNP position within individual subpopulations. 
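The notion of a private allele used above (an allele detected in only one subpopulation) amounts to a set difference per locus. The following sketch is illustrative only: the function name `private_alleles` and the allele sets are hypothetical toy data, not the study's typing results, apart from the two features taken from the text (a cytosine at BT 7 seen only in Kenya and a guanine at COWP 3 seen only in Peru).

```python
from typing import Dict, Set, Tuple


def private_alleles(
    observed: Dict[str, Dict[str, Set[str]]]
) -> Dict[Tuple[str, str], str]:
    """Map (locus, allele) to the single subpopulation in which it occurs."""
    result: Dict[Tuple[str, str], str] = {}
    for locus, by_pop in observed.items():
        for pop, alleles in by_pop.items():
            # Union of the alleles seen at this locus in every other subpopulation.
            others = set().union(*(a for p, a in by_pop.items() if p != pop))
            for allele in alleles - others:
                result[(locus, allele)] = pop
    return result


# Toy allele sets (hypothetical apart from the BT7 and COWP3 private alleles).
toy = {
    "BT7": {"Australia": {"T"}, "Kenya": {"T", "C"}, "Peru": {"T"}, "Scotland": {"T"}},
    "COWP3": {"Australia": {"A"}, "Kenya": {"A"}, "Peru": {"A", "G"}, "Scotland": {"A"}},
    "BT4": {"Australia": {"A"}, "Kenya": {"A"}, "Peru": {"A"}, "Scotland": {"A"}},
}

found = private_alleles(toy)
```

A locus such as the toy BT4, where every subpopulation carries the same single allele, contributes nothing to the result, which mirrors the genetically stable markers described above.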
Scotland had the greatest number of genetically stable SNP-markers at 32 followed by Kenya and Peru at 58 and 27 respectively. Australia was only genetically stable at 8 of a possible 37 SNP markers. Four SNP markers were shown to have no genetic diversity across all four subpopulations; BT4, 1 8S rRNA1, EMA29 and EMA27. All other SNP markers were variable at least once in one or more subpopulations. SNP marker 18S rRNA 3 does show a lack of differentiation among the four subpopulations of Australia, Kenya, Peru, and Scotland in tenns of the allele scored but as it is for a novel allele variant. If it is indeed a true novel allele it is one that is stable among globally diverse parasite subpopulations. SNP markers Gp60 98 and 115 showed the greatest amount of diversity in all four subpopulations, at least 2 or 3 different alleles were scored to this SNP position.  89  Table 5.13 Allele Frequencies at 3lPolymorphic SNP Loci of International Populations of C. hominis SNP marker Australia Kenya Peru Scotland 2 BT1-A  3 BT1-A  0.924 0.076  BT3-A 1 3 BT3-A 4 BT3-A  0.846 0.154  BT7-A 1 2 BT7-A BT7-Az  0.924  BT8-AZ 3 BT8-A  0.924 0.076  1 COWP5-A COWP5-A  0.924 0.076  COWP6-Ai COWP6-Az 3 COWP6-A  0.076 0.924  COWPI-Az COWPI-Az  0.924 0.076  COWP3-A 2 3 COWP3-A 4 COWP3-A  0.076 0.924  Cp(23)4-Ai 4 Cp(23)4-A  0.076 0.924  Cp(23)3-Az Cp(23)3-Az  0.076 0.924  Cp(23)1-Az Cp(23)1-A3  0.076 0.924  Cp(23)5-Az Cp(23)5-Az  0.924 0.076  Cp(23)6-A 3 Cp(23)6-Az  0.076 0.924  18S rRNA3-Ai 18S rRNA3-A 3  1.000  HSP(70) 14-Ai HSP(70) 14.A  0.924 0.076  HSP(70) 1 17-A HSP(70) 17-Az  0.924 0.076  0.846 0.154  0.227 0.773  0.105 0.842 0.053  0.076 0.067 0.933 0.944 0.056  Table 5.13 Allele Frequencies at 31 Polymorphic SNP Loci of International Populations of C. 
hominis SNP marker Australia Kenya Peru Scotland HSP(70) 19-A 1 HSP(70) 19-A 4  0.924 0.076  HSP(70) 20-Ai HSP(70) 2-A 4  0.924 0.076  HSP(70) 22-Al HSPQI0) 22-A3  0.924 0.076  Gp(60) 1 80-A Gp(60) 80-A 3  0.924 0.076  0.824 0.176  0.444 0.556  1.000  Gp(60) 108-Az Gp(60) 108-A 3  0.924 0.076  0.883 0.117  0.500 0.500  1.000  Gp(60) 2 126-A Gp(60) 126-Az  0.045 0.955  0.150 Gp(60) 79-Az Gp(60) 79-Az Gp(60) 79-Az  0.850  0.950 0.050  0.895 0.105  1.000  1.000  Gp(60) 1 98-A Gp(60) 98-Az Gp(60) 98-Az  0.111 0.556 0.333 0.846  0.500 0.111 0.389  1.000  0.154  0.187 0.063 0.750  Gp(60) 1 115-A Gp(60) 115-Az Gp(60) 115-Az  0.583 0.417  0.053 0.737 0.210  0.136 0.454 0.410  0.650 0.300 0.050  LDHIO-AZ LDHIO-A 3  0.750 0.250  LDH3-Az 3 LDH3-A  0.858 0.142  1 MDH8-A MDH8-Az  0.250 0.750  3 MDH7-A MDFI7-Az  0.286 0.714  1 UPRT2-A UPRT2-A 3  0.250 0.750  2 UPRT3-A 0.100 UPRT3-AI 0.900 Table 5.13. Allelic designation Ai, Adenine; A , Cysteine; A 2 , Thysnine; Az, Guanine. 3 0.947 0.053  C C’  5.9 Genetic Data Analysis  The ultimate goal in quantifying population genetic structure is to understand variation among species and to determine whether there are patterns within or among different populations of organisms and biogeography. Determining how genetic variation is distributed within versus among populations provides insight into genetic population structure, gene flow, historic population parameters and hints of speciation. Genetic differentiation is essentially defined as the level of differences in inter-population allele frequencies, the differences among populations. Intra-population, or within population variation is essentially a measure of heterozygosity for a given population. The two are not mutually exclusive as both can be influenced by shifts in one or another.  5.9.1 Distribution of Genetic Variation  Diversity measures were calculated by Nei’s (1973) index and ranged from H0 to H0.642 (Table 5.14). 
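As a worked illustration of the per-locus diversity index, the sketch below computes H = 1 - Σ p_i² from allele counts, with an optional small-sample correction of n/(n - 1). Two things here are assumptions rather than statements from the text: that the frequencies of roughly 0.924 and 0.076 in Table 5.13 correspond to 12 and 1 of the 13 Australian isolates, and that the reported values include the small-sample correction. The latter is inferred only because the corrected form gives 0.154, the value Table 5.14 lists for many Australian markers. The function name is hypothetical.

```python
from typing import Sequence


def nei_gene_diversity(counts: Sequence[int], unbiased: bool = True) -> float:
    """Gene diversity at one haploid locus from allele counts.

    H = 1 - sum(p_i^2); with unbiased=True the estimate is multiplied by
    n / (n - 1), where n is the number of isolates typed at the locus.
    """
    n = sum(counts)
    if n < 2:
        raise ValueError("at least two typed isolates are required")
    h = 1.0 - sum((c / n) ** 2 for c in counts)
    return h * n / (n - 1) if unbiased else h


# Assumed counts: 12 and 1 of 13 Australian isolates (about 0.92 / 0.08).
h_australia = nei_gene_diversity([12, 1])       # 2/13, about 0.154
# Assumed counts for Gp60 115 in Scotland: 13, 6 and 1 of 20 isolates,
# matching the frequencies 0.650, 0.300 and 0.050 in Table 5.13.
h_scotland = nei_gene_diversity([13, 6, 1])     # about 0.511
h_fixed = nei_gene_diversity([20])              # monomorphic locus, 0.0
```

A monomorphic marker returns zero, and the value rises as alleles become more numerous and more evenly distributed, which is the scale described for Table 5.14.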
Averaged over all markers Scotland was found to be the least diverse. Australia showed the highest level of diversity while Kenya and Peru revealed intermediate diversity values (Table 5.14). The low diversity seen in Scotland is largely a result of the fact that although six different SNP markers were polymorphic when compared to the expected C. hominis SNP-type three of these were genetically stable in that polymorphism; 18S rRNA marker 3, and Gp60 markers 80 and 108. The allelic profile of that polymorphism was consistent throughout all isolates typed. The other three SNP markers had bi allelic profiles. Australia, the most diverse subpopulation, was polymorphic at almost all SNP loci except for BT4, COWP7, Gp60 126, and EMA 27 and 29. This is largely result of the A 6 and A 7 typing results and upon their omission the values would likely be more akin to that of Kenya and Peru. In respect of this both isolates were confirmed with the originating laboratory regarding species distinction, typing results were repeated in triplicate and standard gene sequencing methods within our lab were done to confirm their C. hominis status to be true.  91  Table 5.14  SNP locus  Gene diversity within Intercontinental C. hominis Populations estimated by Nei’s (1973) Diversity Measure for 37 SNP Markers. Australia Kenya Peru Scotland  BT1  0.154  0  0  0  BT4  0  0  0  BT3  0.282  0.282  0.368  0 0.199  BT5  NA  0  0.00  0  BT7  0.154  0.133  0.000  0.000  0.154 0.154 0.167 0.154 0.154 0.000 0.154 0.154 0.154 0.154 0.154 0.000 0.000 0.154 0.154 0.154 0.154 0.154 0.167 0.182 0.000 0.000 0.389 0.53 0.429 0.286 0.429 0.476 0.000 0.000 0.429 0.200  0.000  0.000  0.111  BT8 COWP5 COWP6 COWP1 COWP3 COWP7 23Cp4 23Cp3 23Cp1 23Cp5 23Cp6 18S rRNAI 18S rRNA3 HSPI4 HSP17 HSPI9 HSP2O HSP22 60Gp80 60Gp108 60Gp126 60Gp79 60Gp98 60Gp115 LDH1O LDH3 MDH8 MDH7 EMAg29 EMAg27 UPRTase UPRTase  Means 0.177 Table 5.14. 
Gene diversity scaled from 0  0.000  0.000  0.000  0.000  0.209  0.000  0.000  0.000  0.000  0.000  0.105 0.100 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.105 0.000 0.000 0.000 0.515 0.529 0.091 0.000 0.642 0.636 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000  0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.282 0.000 0.105 0.000 0.000 0.000 0.343 0.385 0.000 0.500 0.425  0.433  0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000  0.511 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.078 0.089 0.022 1, with a zero value telling of a total lack of diversity and 1 representing 0.000  —  0.000 0.000  complete diversity.  92  Genetic diversity is a measurement of differentiation among closely related taxa by using a mathematical measure to understand the degree of genetic separation between species at the molecular level. The long-standing method of deciphering the amount of genetic differentiation that exists has been the use of F-statistics, as described by Wright (1943, 1951, and 1965). Wright’s F-statistics (Fst) provides an integrated view of genetic variation at three hierarchal levels of population structure: within subpopulations, among subpopulations and the total variation in the metapopulation. All of the measures made under the guidelines of F-statistics are based on losses of heterozygosity and work to partition heterozygote deficiency into a within and among population component. Fst is a measurement value of the amount of genetic variation in the total samples that is due to differences among populations comprising that sample. This proportion can range from zero indicating genetically identical populations to one, indicating completely isolated populations. There are some constraints on Wrights original fixation indices resulting in several analogs being introduced to help circumvent such limitations. 
Nei’s analog (1972, 1973) averaged Fst over alleles and pairs of populations and enabled its application to any population without many key assumptions. It disregards patterns of evolutionary forces, sexual or asexual reproduction, and ploidy, as long as allele frequencies can be estimated. Nei’s functional equivalent, Gst, is essentially the ratio of inter-subpopulational gene diversity (Dst) to the total gene diversity (Ht). Nei’s statistics Hs, Ht, Dst and Gst were estimated for each locus and overall, based on the SNP-typing results; Hs represents the within-sample gene diversity, Ht the overall gene diversity, and Dst the amount of gene diversity among samples (i.e., the average genetic diversity among populations). The quantity Dst has been refined and is independent of the number of samples used. Gst is an estimator of the proportion of total gene diversity partitioned among populations and again is independent of the number of samples used. While it is often argued that Gst cannot be negative, later, more refined versions allow for this [235].

Table 5.15 Apportionment of Genetic Diversity into Within and Between Intercontinental C. 
hominis Populations Marker Hs Ht Dst Gst BTI 0.038 0.038 0.001 0.018 BT4 0.000 0.000 0.000 0.000 BT3 0.283 0.286 0.004 0.015 BT5  0.000  0.667  1.000  1.000  BT7 BT8 COWP5 COWP6 COWP1 COWP3 COWP7 23Cp4 23Cp3 23Cp1 23Cp5 23Cp6 18S rRNA1 18S rRNA3 HSP14 HSP17 HSP19 HSP2O HSP22 60Gp80 6OGplO 60Gp126 60Gp79 60Gp98 60Gp115 LDH1O LDH3 MID’H8  0.071  0.071  0.000  0.000  0.066  0.065  0.001  -0.023  0.038  0.038  0.000  0.013  0.094  0.095  0.002  0.020  0.038  0.038  0.001  0.019  0.065  0.064  -0.000  -0.007  0.025  0.025  -0.000  -0.009  0.038  0.038  0.001  0.015  0.038  0.038  0.001  0.015  0.038  0.038  0.001  0.015  0.038  0.038  0.001  0.015 0.015  0.038  0.038  0.001  0.000  0.000  0.000  0.000 0.003 0.001 0.002 0.038 0.001 0.019 0.064 -0.001 -0.012 0.038 0.001 0.019 0.038 0.001 0.019 0.038 0.001 0.019 0.42 0.116 0.253 0.331 0.076 0.218 0.023 -0.000 -0.011 0.174 -0.028 -0.171 0.519 0.209 0.365 0.621 0.124 0.191 0.120 0.020 0.162 0.071 0.003 0,037 0.120 0.020 0.162 IvEDH7 0.135 0.031 0.214 EMAg29 0.000 0.000 0.000 EMAg27 0.000 0.000 0.000 UPRTase 0.120 0.021 0.170 UPRTase 0.050 0.002 0.032 Overall 0.092 0.122 0.040 0.304 Table 5.15. Gene diversities calculated; average diversity within population (Hs), total diversity (Ht), mean level of genetic differentiation (Gst), and average among population diversity (Dst).  0.000 0.038 0.064 0.038 0.038 0.038 0.343 0.274 0.023 0.195 0.363 0.500 0.105 0.069 0.105 0.112 0.000 0.000 0.104 0.048  94  The average diversity within subpopulation (Hs) was 0.092 and the total diversity (Ht) amounted to 0.122 (Table 5.15). The mean level of genetic differentiation (Gst), diversity between subpopulations overall loci was 0.304. This indicates that almost a third or 30.4% proportion of the total genetic variation existed among subpopulations, compared to diversity within subpopulations at 69.6%. According to Wright Fst values for most organisms is typically 0.15 or less, though values upward of 0.7 have been recorded. 
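The overall figures can be checked with a short calculation. The sketch below assumes that the reported Dst and Gst are the sample-number-corrected forms (often written Dst' and Gst'), in which Dst' = Dst × n/(n - 1) for n sampled populations, Ht' = Hs + Dst', and Gst' = Dst'/Ht'. That assumption is an inference: the plain difference Ht - Hs is 0.030, whereas the corrected form reproduces the reported 0.040 and comes within rounding of the reported 0.304. The function name is hypothetical.

```python
from typing import Tuple


def nei_differentiation(
    hs: float, ht: float, n_pops: int, corrected: bool = True
) -> Tuple[float, float]:
    """Return (Dst, Gst) from within-population (Hs) and total (Ht) diversity.

    With corrected=True the among-population component is rescaled by
    n / (n - 1) so that it does not depend on the number of populations.
    """
    dst = ht - hs
    if not corrected:
        return dst, dst / ht
    dst_c = dst * n_pops / (n_pops - 1)
    ht_c = hs + dst_c
    return dst_c, dst_c / ht_c


# Overall values reported in Table 5.15; four subpopulations were sampled.
dst_overall, gst_overall = nei_differentiation(hs=0.092, ht=0.122, n_pops=4)
within_share = 1.0 - gst_overall  # share of diversity within subpopulations
```

Under these assumptions roughly 30% of the total gene diversity lies among subpopulations and roughly 70% within them, in line with the 30.4% and 69.6% quoted above; the small discrepancy is attributable to the three-decimal rounding of Hs and Ht.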
Using the established and accepted, but arbitrary, guidelines for Fst values of Wright’s statistics, which also apply to subsequent analogs, values of Fst or Gst greater than 0.25 are considered significant [47, 235]. Each SNP marker contributed differently to the observed degree of subpopulation differentiation, varying from a low of 0 for BT4, BT7, 18S rRNA, and both EMAAg markers to a high of 36.5% for Gp60 marker 98. The average within-subpopulation diversity (Hs) is greater than the average among-population genetic diversity (Dst), at 0.092 and 0.040 respectively. This indicates that intra-population diversity is greater than inter-population diversity. The Weir and Cockerham (1984) method (θst) is another estimator of Fst/Gst. The main difference between the two methods is that Nei’s approach weights all samples equally regardless of sample size, whereas Weir & Cockerham weight samples according to sample size. Because sample sizes ranged from 13 to 22, we also calculated the heterozygote deficit using the Weir & Cockerham (1984) parameters for F-statistics, which weight allele frequencies according to sample size. The results for Gst and θst were in agreement, at 0.304 and 0.319 respectively.

5.9.2 Genetic Identity Measures

Using the genetic data analysis program, genetic identity values and distances based on Nei’s (1978) gene diversity index were calculated between each pair of subpopulations. Genetic identity values measure the degree of closeness based on allele frequencies between pairs of populations and range from 0, indicating no shared alleles between populations, to 1, indicating that the two populations have the same alleles in identical frequencies. Nei’s unbiased genetic identities were computed to alleviate any bias caused by small sample size, for example, fewer than 50 individuals. Genetic identity values (Table 5.16) ranged from 0.942 between Kenya and Scotland to 0.984 between Australia and Scotland. 
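The summary figures quoted for the pairwise comparisons follow directly from the six identity and six distance values reported in Table 5.16. The sketch below simply recomputes them; the dictionary layout and variable names are illustrative choices, and no identity or distance is estimated from allele frequencies here.

```python
from statistics import mean

# Pairwise values as reported in Table 5.16.
identity = {
    ("Australia", "Kenya"): 0.976,
    ("Australia", "Peru"): 0.977,
    ("Australia", "Scotland"): 0.984,
    ("Kenya", "Peru"): 0.955,
    ("Kenya", "Scotland"): 0.942,
    ("Peru", "Scotland"): 0.945,
}
distance = {
    ("Australia", "Kenya"): 0.044,
    ("Australia", "Peru"): 0.045,
    ("Australia", "Scotland"): 0.034,
    ("Kenya", "Peru"): 0.047,
    ("Kenya", "Scotland"): 0.061,
    ("Peru", "Scotland"): 0.058,
}

mean_identity = mean(identity.values())                             # about 0.963
identity_range = max(identity.values()) - min(identity.values())    # about 0.042
mean_distance = mean(distance.values())                             # about 0.048
closest_pair = min(distance, key=distance.get)
farthest_pair = max(distance, key=distance.get)
```

The pair with the smallest distance is also the pair with the highest identity, and likewise for the largest distance and lowest identity, so the two measures rank the extreme comparisons consistently.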
From the minimum to the maximum the difference amounts to 4.2%. The mean identity between all pairwise comparisons is 0.963; on average there is a genetic identity of 96.3% between all four global subpopulations. Genetic distances averaged 0.048 and varied from 0.034 between Scotland and Australia to 0.061 between Scotland and Kenya. A dendrogram (Figure 5.12) based on Nei’s genetic distance was constructed using the Neighbour-Joining method (NJM).

Table 5.16 Matrix of Nei’s Unbiased Genetic Identity/Distance Measures Based on 37 Loci among 4 Global Subpopulations

             Australia   Kenya    Peru     Scotland
Australia    -           0.976    0.977    0.984
Kenya        0.044       -        0.955    0.942
Peru         0.045       0.047    -        0.945
Scotland     0.034       0.061    0.058    -

Table 5.16. Nei (1978) identity above the diagonal, Nei (1972) distance below the diagonal.

The two subpopulations with the smallest genetic distance, which cluster with one another, are two of the farthest apart in physical geographic distance (Table 5.17). In contrast, the two subpopulations closest in terms of geography showed the greatest genetic distance.

Table 5.17 Approximate Distances Between Intercontinental Populations (km)

               Australia    Kenya        Peru (Lima)
Kenya          8 896.15
Peru (Lima)    14 934.84    12 565.73
Scotland       14 740.90    7 352.15     10 072.83

Table 5.17. Distances are only estimates and are calculated according to the location of the originating laboratory of the samples, which may or may not be the exact location of an isolate's origin (data unknown).

Figure 5.12 Neighbour-joining phylogenetic analyses of 4 intercontinental subpopulations of C. hominis; Australia, Kenya, Peru, and Scotland, based on genetic distance.

[Dendrogram; tip labels: Australia, Scotland, Peru, Kenya.]

Figure 5.12. Dendrogram provides a visual account of how closely related one subpopulation is to another. The more alleles in common, the closer they are related. 
Dendrogram representation is on the basis that the shorter the distance, the greater the number of shared alleles, whereas a longer distance represents fewer shared alleles.

5.10 Canada: Multi-locus SNP-typing

SNP-typing results for Canadian isolates are more scattered than those previously seen in Australia, Kenya, Peru and Scotland. Though the sample size is the largest (N=31), only 22 of a possible 45 SNP markers were typed with confidence. The reasons for this are three-fold. First, samples were more readily available and dispensable since Canada is the home location for the study; Canadian isolates were therefore used to test the design of our experimental platform. Samples received from other countries were limited in supply and reserved until the methodology was reliably confirmed. Second, because of the supply of both C. hominis and C. parvum isolates in our lab we were able to put more effort into testing both species. Aside from the two C. parvum isolates received from Australia, all isolates donated in kind were of the C. hominis genotype. Third, laboratory and financial resources, compounded by time limitations, prevented us from going back to test the Canadian isolates in more depth for the remaining 23 SNP markers. In light of these factors Canadian isolates are unlikely to make significant contributions to genetic population data analysis, so they were omitted from the genetic diversity indices. The importance of this subpopulation can be seen in other aspects of the study. First, as mentioned, we had the freedom to test, manipulate and finesse our experimental approach before moving on to the more indispensable samples of our global populations. Next, the use of reliable and, even more importantly, confirmed and documented samples of the C. hominis and C. parvum genotypes enabled us to address the question of whether MLS-typing would be a dependable and efficient tool for species distinction. 
Lastly, while limited, molecular marker data are available for the APR locus for the first time. This indicates to us that our primers were well designed, a welcome indication considering the time and financial costs of SBE primer construction. SNP-typing results for the samples and markers tested are shown below in Table 5.18. Isolates BC1 through BC10, and BC25, are confirmed C. parvum genotypes. BC1 demonstrated a complete C. parvum MLSt profile with the exception of two particular markers. First, SNP marker 4 located in the β-tubulin locus reveals a C. hominis allele call; recall that SNP marker 4 is located within the coding region of the β-tubulin locus. No variant alleles are seen at this particular SNP position in any of the other subpopulations studied. The second allele variant is located at SNP position 126 in the Gp60 locus, the same allele consistently seen in the other four subpopulations and again in multiple BC isolates (Table 5.18). Another confirmed C. parvum genotype, sample BC5, showed two variant allele calls of the C. hominis genotype at markers 1 and 4 in the β-tubulin locus. With the exception of A6, this is the only other example of a variant allele within the intron region of the β-tubulin locus.

Samples BC11-24 and BC26-31 are confirmed C. hominis isolates. A collection of allele variants, novel or of the C. parvum genotype, is seen at various markers throughout these isolates typed at the Gp60 gene; these variants are also present in the other four subpopulations. In contrast, results for BC14 show the same allele variant (C) as that seen in only one other population and isolate, A6, at marker 6 in the COWP locus. In the APR locus 3 C. parvum alleles are seen in three confirmed C. hominis genotypes, BC14, BC20 and BC21. In the case of BC14 this is the second case of a C. parvum allele variant in its MLSt, the first being COWP marker 6.
98  Table 5.18 Canada Subpopulation MlSt & Allele Variants SNP  Ch Cp BCI BC2 BC3 BC4 BC5  B-tubulin COW? 1 4 5 6 ACT C T G T C TAT C T G T C T G T C CAT  C  Cp23 3 1 T T C C C C C C C C  80 T A A A A A  C  BC6  BC7 BC8 BC9 BC1O BC1I BC12 BC13 BCI4 BCI5 BCI6 BCI7 BC18  A  -  -  -  C  T T T T T  -  -  -  C C C C C C -  -  T ACT ACT T ACT T A CT C ACT T ACT T CT CT  T T T T T T  C  BC2O BC21  A  LDH  98 G T T  -  -  -  -  -  -  -  -  -  -  -  C C C C C  -  -  -  -  -  A  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  T T  T  A  -  -  C C  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  C C C C  T T C T T T T T T  -  -  C C C  Table 5.18. Ch; C. hominis. Cp; C. parvum.  (-)  -  -  G -  -  -  -  EMAAg  UPRT  8 G A A  7 G C C  29 G A  27 G C  2 T A A  G  G  G  G  G  G  G G G  G G  MDH  -  A C -  APR  2 T C C  3 T C G  -  T  T  C  C C  G G G  G G G  T T  T T  C C  -  T  -  A A T A C  3 T C C  -  -  G  3 C T  -  C  -  10 C T T  C C C C  -  -  -  115 G A  T  -  -  BCI9  BC22 BC23 BC24 BC25 BC26 BC27 BC28 BC29 BC3O BC3I  Gp60 126 79 T T C C T T T T T  108 T C C C C C C  G C A G C C C C C  denotes markers unsuccessfully typed or scored. Allele  designations, A, C, T, G, represent alleles scored at that particular SNP loci.  99  Figure 5.13 Canada electropherogram representations; BC 1, of COWP, Cp23 and 13-tubulin loci molecular markers.  T  CCCTA  300 500 300 200 500 800 100  A  .. V  I .  Figure 5.13. Electropherogram of Western Canada isolate, BC 1, a confirmed C. parvum isolate, including from left  to right molecular markers 5 (T, red) and 6 (C, black) for the COWP locus, molecular markers 3 (C, black) and 1 (C, black) for Cp23 and molecular marker 1 (T, red) for 13-tubulin. Expected at SNP marker f3-tubulin 4 for C. parvum is an allele type G, (blue) however this C. parvum isolate indicates the presence of a C. hominis allele (A) as the fluorescing green peak denotes (far right). Lizl2O size standard (lSnt-l2Ont, left to right) seen in orange.  
5.11 Species Distinction

Identifying relationships between organisms involves grouping them according to a defined set of characteristics. In epidemiological studies, just as crucial is the ability to exclude one organism from another for diagnostic purposes and for tracking transmission routes and infection sources. Typing systems based on genomic material are designed to compare differences at the nucleotide level, either in designated regions of the genome (microdiversity) or across the entire genome itself (macrodiversity). The optimal comparisons are those done by typing and analysing the entire genome sequence of every strain or isolate. Whole genome DNA sequencing is still not a widely accessible or affordable option for many countries and laboratories. In lieu of this, high-throughput SNP-typing is an attractive alternative. With the ability to multi-locus SNP-type Canadian isolates of both species using our methodology, we were able to reliably examine whether or not MLS-typing can be used as a species distinction tool. The results of the isolates tested, when compounded by the results from the international subpopulations, support this finding. The high-throughput, time- and cost-efficient protocol of our experimental platform makes our method a very attractive alternative to standard genotyping practices. MLS-typing results of the Australian, Kenyan, Peruvian, and Scottish sample populations allow us to make further inferences about MLS-typing as an acceptable methodology for species identification. The genetic stability of certain proteins and SNP markers among four globally diverse populations points to those that may have the best potential for species classification. The complete or almost complete absence of allele variants at SNP position 1 in 18S rRNA and at all 5 SNP positions in HSP70 indicates that they would be dependable markers for species differentiation. 
The stability of the enzymatic proteins LDH, MDH, and UPRT also suggests their usefulness as markers for species differentiation. As enzymatic proteins essential to Cryptosporidium's biosynthesis processes, and therefore to its survival, these proteins make excellent targets for phylogenetic studies as well as for potential downstream applications focused on the development of chemotherapeutics. The immunodominant/antigenic yet genetically stable proteins Cp23 and EMAAg make for a sixth and seventh gene that should be explored for rapid genotyping via MlS-typing.

CHAPTER 6 DISCUSSION

Summary

In this study a battery of C. hominis isolates collected from 4 intercontinental regions was genetically typed based on a mutation or SNP profile, allowing inferences to be made on the global population structure of the parasite. The aim of the study at large was to ascertain whether or not C. hominis populations are partitioned based on geography. The within-population component of genetic variation exceeds the average proportion of genetic differences among populations of C. hominis. Thus far our data do little to indicate population substructuring. Previous studies have argued for more globally diverse biogeographic investigations into genetic variation of the Cryptosporidium genome. Our results argue that too wide a geographic boundary can impede rather than advance such population studies. Furthermore, the high-throughput, time- and cost-efficient protocol of our experimental platform makes our method a very attractive alternative to standard genotyping practices. Such an approach could ultimately help bridge the gap between Cryptosporidium detection and specific genotype or strain identification.

6.1 Comparative & Computational Whole Genome Analysis

Ideally, comparisons of similarity or diversity between organisms are made using entire genomes, which to date are available for only a limited number of organisms.
The ability to sequence entire genomes of pathogens has engendered a new discipline termed comparative genomics. Though most often used in phylogenetic studies, its potential for application in epidemiological studies is becoming more evident as new assumptions can be made about the nucleic acid sequences used to type and classify such pathogens. In Cryptosporidium research, efforts have intensified in this respect, but there are still insufficient data available from which to draw robust genomic comparisons. In the post-genomic era attention has shifted to comparative genomics focused on the differences between genomes. Identifying genetic relationships between populations involves grouping organisms according to a defined set of characteristics. Cryptosporidium species are phenotypically very similar, making it difficult to distinguish species based on morphology; molecular markers are therefore in demand. At the lowest genetic level are single point mutations, or SNPs, affecting individual nucleotides. Compared with other molecular markers, single-nucleotide polymorphisms exhibit extremely low mutation rates, making them rarer in recently emerged pathogens. A prerequisite for this study was mapping SNPs throughout the Cryptosporidium genome, which involved comparing the reference strains TU502 for C. hominis and Iowa II for C. parvum. The construction of the genomic and SNP library described yielded hundreds of initial targets for whole genome SNP-typing of Cryptosporidium. Thirteen genes were targeted from this group and used to generate a multi-locus SNP-type, representing a set of SNPs at 45 individual loci, which was subsequently used to partition the genetic relationships among and within globally distinct C. hominis subpopulations. Multiple criteria were used to evaluate genes, and the SNPs within them, for the study. The initial focus was on 2 major classes of gene.
First were those hypothesized to be under positive or diversifying selection pressures, most often antigenic determinant genes. Second, we looked for putative genes thought to be bio-functionally relevant to the success of Cryptosporidium. Comparison of the 9 Mb Cryptosporidium genome sequences led to 13 targeted open reading frames. Target genes from the reference genomes of C. hominis and C. parvum were aligned to identify polymorphic sites at the nucleotide level. Results showed the presence of 394 single point mutations. On the basis of the inferred protein sequences, each polymorphic site was classified as either synonymous (silent) or non-synonymous (resulting in a single amino acid polymorphism). While mutations conferring an amino acid change to the primary protein sequence are more likely to be clinically relevant genetic markers, synonymous mutations were used as molecular markers for their phylogeographic implications. Mutations that result in an amino acid substitution provide a substrate for evolutionary selection and have a greater chance than silent ones of having a profound effect on the protein's function. They may be harmful, with a greater chance of causing a deleterious effect on the function of the protein, and as a result most species evolve to eliminate them from the population through selection processes. Alternatively, they may improve protein function, in which case advantageous selection plays a major role. Because synonymous mutations have greater potential of being neutral, a larger proportion of them will become fixed in the population, making them excellent molecular targets for deciphering the genetic structure of parasite populations. Mutations may affect organisms in multiple ways.
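The synonymous versus non-synonymous classification described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: it assumes the standard genetic code, a coding sequence that begins in reading frame 0, and a hypothetical function name (`classify_snp`) and example sequence chosen purely for illustration.

```python
from itertools import product

# Standard genetic code, built from the conventional T, C, A, G ordering.
# Assumption: the standard code applies to the sequence being examined.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_snp(ref_cds: str, position: int, alt_base: str) -> str:
    """Classify a single-base change in a coding sequence.

    ref_cds must start at the first base of a codon; position is a
    0-based index into ref_cds; alt_base is the variant nucleotide.
    """
    codon_start = position - position % 3
    ref_codon = ref_cds[codon_start:codon_start + 3].upper()
    offset = position - codon_start
    alt_codon = ref_codon[:offset] + alt_base.upper() + ref_codon[offset + 1:]
    if CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon]:
        return "synonymous"
    return "non-synonymous"


# Hypothetical 3-codon sequence: ATG GCT GAA (Met, Ala, Glu).
example_cds = "ATGGCTGAA"
print(classify_snp(example_cds, 5, "C"))  # GCT -> GCC, both alanine
print(classify_snp(example_cds, 6, "A"))  # GAA -> AAA, Glu -> Lys
```

A third-position change such as GCT to GCC leaves the amino acid unchanged, whereas a first-position change such as GAA to AAA replaces it, which is the distinction drawn in the text.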
A complete range of altered phenotypes, from mild or phenotypically silent effects to minor advantageous traits to detrimental effects, can be seen [36]. If the global populations of Cryptosporidium do in fact share recent ancestry, the occurrence and ratio of silent versus non-silent polymorphisms need to be accounted for. Most organisms, including many bacteria, viruses and eukaryotes, carry synonymous variation in abundance. Only rarely does replacement variation exceed synonymous variation. Such an excess can arise by strong diversifying selection, an event that might be highly anticipated in the antigenic or virulence determinants of a pathogen. The most straightforward way to assess this is to examine the ratio of non-synonymous to synonymous mutations. In our initial comparative genome analysis, 70% of the 394 SNPs mapped were synonymous substitutions and 30% were non-synonymous, suggesting a high level of conservation of protein sequences in Cryptosporidium. In this SNP data set there were almost half as many non-synonymous (NS) SNPs as synonymous (S) SNPs, with a ratio of 0.44 found at the 13 target gene loci. This indicates a higher level of conservation of protein sequences than that found in the Apicomplexans P. falciparum (NS/S ratio of 2.34) and P. vivax (NS/S ratio of 1.75), and even in human populations (NS/S ratio of 0.89) [221]. This implies a predilection for functional constraint, as was recently shown for G. lamblia [221]. The genetic stability of the Cryptosporidium genome is clear, which underlines the importance of examining the subtle genetic diversity that does exist in order to better understand a specific species' host range and transmission dynamics.

When each gene was considered alone, the observations were diverse with regard to the concepts of natural selection pressure.
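The NS/S comparison above is simple arithmetic, sketched here in Python under stated assumptions. Only the rounded proportions (30% and 70%) and the reported ratios come from the text; the exact SNP counts are not given, so the rounded proportions yield approximately 0.43 rather than the reported 0.44. The function name `ns_to_s_ratio` is hypothetical.

```python
def ns_to_s_ratio(non_synonymous: float, synonymous: float) -> float:
    """Return the ratio of non-synonymous to synonymous SNPs.

    Accepts either counts or proportions, as long as both arguments
    use the same unit.
    """
    if synonymous <= 0:
        raise ValueError("synonymous must be positive")
    return non_synonymous / synonymous


# Rounded proportions quoted in the text (30% NS, 70% S of 394 SNPs).
approx_ratio = ns_to_s_ratio(0.30, 0.70)
print(f"Ratio from rounded proportions: {approx_ratio:.2f}")

# Ratios as reported in the text, for comparison.
reported = {
    "Cryptosporidium (13 target loci)": 0.44,
    "human populations": 0.89,
    "P. vivax": 1.75,
    "P. falciparum": 2.34,
}
for name, ratio in sorted(reported.items(), key=lambda item: item[1]):
    excess = "synonymous" if ratio < 1 else "non-synonymous"
    print(f"{name}: NS/S = {ratio:.2f} (excess of {excess} SNPs)")
```

A ratio below 1 means synonymous changes outnumber amino acid replacements, which is the pattern the text interprets as functional constraint.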
Proteins involved in interactions with the host milieu are often rapidly evolving and can be identified by comparing silent versus expressed mutations. EMAAg and Gp60, two genes thought to be antigenically relevant, were the only two that came close to having a predisposition to non-synonymous SNPs. While the hyper-variable nature of Gp60 was consistent with the allelic profiles seen in the SNP-typing results for all four international subpopulations, the EMAAg locus was highly conserved, or genetically stable. Alternatively, the location of a protein, and consequently its exposure to the host environment, can also influence the selective pressure exerted upon it. The COWP gene, which due to its positioning is constantly exposed to external pressures from the host, showed only a slight propensity for expressed over silent mutations. This may be a result of the COWP protein being integral to the hardy nature of the oocyst wall, ensuring its environmental persistence. Alternatively, it may reflect the different niches occupied by the parasites within their respective hosts. Conversely, genes representing biological processes such as cell growth, maintenance and metabolism have a much lower occurrence of expressed SNPs. The SNPs mapped to such genes for this study are in agreement with this. In contrast to earlier research, our study investigates the sequence diversity between the two Cryptosporidium species at both the nucleic acid and amino acid levels, in addition to the biochemical and biophysical impact of such mutations. Scientific research has evolved drastically with the development and implementation of highly specific software programs. Computational methods are now widely used to make inferences about the coding regions of a gene or genome, polymorphic sites and their subsequent impact on genotype and phenotype, and ultimately evolutionary relationships.
The use of a bioinformatics approach, applying computational methods to more comprehensive studies on the polymorphic nature of such proteins, provides valuable groundwork for more extensive downstream applications and studies. Such an approach is under-reported in the field of Cryptosporidium research and is therefore warranted. The specific aim here was to provide a comprehensive account of the molecular and biochemical properties of the genes and SNPs targeted for molecular typing. Similarities or dissimilarities within and among such genes could be beneficial in the design of therapeutic approaches.

Bio-synthesis & Enzymatic Proteins

The ability to accurately determine the genetic relatedness of isolates is fundamental to molecular epidemiological and evolutionary studies. The use of nucleotide variation at multiple housekeeping loci is an excellent approach to strain characterization, as it has advantages for inferring levels of relatedness between strains and for reconstructing evolutionary events. Housekeeping genes often include those crucial to the biological processes of an organism, such as metabolic and cellular pathways. Examined herein were six proteins suspected to be part of major bio-synthetic pathways within the Cryptosporidium machinery: APR, AcoA, HSP70, LDH, MDH, and UPRT. Unfortunately the resolution of SNP-typing results for the APR and AcoA proteins was insufficient to allow inferences to be made. However the remaining proteins involved in major biosynthesis processes of Cryptosporidium, acetyl coenzyme A (AcoA), lactate dehydrogenase (LDH), malate dehydrogenase (MDH), and uracil phosphoribosyl transferase (UPRT), were genetically stable amongst all four international populations. SNPs within these genes were typed with great reliability and proved to be excellent identifying molecular markers for species determination.
Functionally crucial enzymes for parasite survival are unlikely to show significant interspecies variation. SNPs within these genes, as was the case here, could be utilized as suitable species-discriminating or SNP-typing markers. Nucleotide biosynthetic pathways provide the precursors for DNA and RNA synthesis, processes essential to any pathogen, and are therefore an excellent source of drug and/or vaccine targets. The metabolic machinery of Cryptosporidium is highly streamlined and is unique in that both mitochondrial and chloroplast DNA appear to be missing [9, 109, 161, 246]. Whereas most parasitic protozoa salvage purines from their host and synthesize pyrimidines de novo, Cryptosporidium is dependent on salvage for both purines and pyrimidines, lacking the ability to synthesize them de novo [205, 245]. Crucial to the import of nucleosides and amino acids are the transmembrane transporter proteins [7]. Located in the parasite's plasma membrane, they provide substrate-specific permeation routes for preferred amino acids from the host. It is likely that these transporters are required for parasite viability at most, if not all, of its life cycle stages. Transport proteins are vital not only for providing the parasite with the necessary nutrients but also for the generation of electrochemical gradients, cell signalling pathways and the maintenance of ion homeostasis. Inhibition of transporter proteins would severely impair the metabolic and energy-producing pathways of the organism, leading to the parasite's starvation and eventual death. Enzymatic proteins such as kinases and those involved in self-induced apoptosis are also ideal candidates in this respect.

Cryptosporidium's metabolic processes are distinct from those of other Apicomplexan parasites, making it a phylogenetic enigma within the phylum. The organism lacks a functional mitochondrion and a chloroplast. It is now apparent that C. hominis and C.
parvum rely solely upon their host for the provision of nucleic acid precursors [205]. Besides being the primary units of nucleic acids, nucleotides contribute to many other crucial cellular processes such as cell signalling, replication and transcription. The biosynthetic pathways of Cryptosporidium are drastically simplified, suggesting that the metabolic function of each protein or enzyme involved is critical. Inhibition or deactivation of these proteins could severely hamper the biofunctionality of the organism.

Structural Proteins

The study of parasite and host cell-tissue interactions is focused on the identification of structural and/or surface proteins that contribute to infection and disease pathogenesis as well as parasite propagation. Such proteins are of fundamental importance to the success of an organism as they are responsible for attachment, invasion and interactions with the host's cellular niche [73, 31, 136]. Cryptosporidium invasion of the intestinal epithelium microvillus involves the apical complex of the organism and results in a parasitophorous vacuole. Cryptosporidium localizes to the intracellular but extra-cytoplasmic region of the epithelial cells making up the brush border of the microvillus lining the intestine. Proteins involved in this process are ideal targets for immunoprophylaxis, as their inhibition could greatly hamper or prevent host colonization. Polymorphisms in these genes may reflect host receptor specificities. Collectively, genetic variation within attachment and invasion proteins may reveal the factors underlying the differences in the host ranges of C. hominis and C. parvum. A major part of Cryptosporidium's pathogenic success is due to its complex monoxenous (single host) life cycle. Research is hampered by the inability to isolate its specific stages in vitro. The life cycle consists of two asexual stages and a sexual stage, resulting in mature infectious oocysts.
In the case of Cryptosporidium there is no requirement for an external period of sporulation, which makes direct fecal-oral transmission feasible. In other words, oocysts are infective immediately upon leaving the host and remain so. Ingested oocysts break open and release four sporozoites, which invade new host cells. Asexual maturation ensues and produces four second-generation merozoites that excyst and invade new host cells. Sexual reproduction gives rise to zygotes that go through a second asexual stage to develop into mature oocysts, containing four haploid sporozoites, which are passed into the environment or auto-infect the host. Loci encoding proteins expressed at the surface of the merozoites or sporozoites would be expected to be more diverse than those expressed during the sexual stages or inside the parasite. Surface-associated proteins shared by, or unique to, the sporozoites and merozoites make attractive targets for drugs based on the interception of attachment to host cells. Three proteins considered to be involved in maintaining the structural integrity of Cryptosporidium were targeted for allelic profiling at mutation positions: COWP, Mucin-1, and 18S rRNA. The COWP (Cryptosporidium oocyst wall protein) locus encodes a protein crucial to the tough exterior shell of the infectious oocyst and hence helps it maintain its environmental persistence in a range of severe ecological conditions. Though the locus was genetically conserved within Australia, Kenya, and Scotland, two different molecular markers, COWP 6 and 1, revealed novel allele types matching neither the C. hominis nor the C. parvum standard in the Peruvian subpopulation. Whether these are a product of adaptation to varying temporal or environmental circumstances is not known and should be pursued. SNP-typing results for Mucin-1 were left unresolved across all four international parasite populations and were hence unavailable for interpretation.
The 18S rRNA gene has long been the standard of genetic typing for Cryptosporidium due to its reliability and ease of amplification for genetic testing. We looked at two SNP loci within this gene, and the results displayed elements of both similarity and drastic difference. The first SNP position, 18S rRNA SNP 1, was stable across all isolates in all four subpopulations for the expected C. hominis allele type. Similarly, the second SNP, 18S rRNA SNP 3, was genetically stable across all four international subpopulations; however, the discriminating allele was a novel allele, neither C. hominis nor C. parvum. Most variant alleles were not shared among isolates, either within or between subpopulations. Therefore, before we could consider this a fixed variant, we first had to consider the multi-copy nature of the gene. A fixed difference, in particular a common fixed difference, implies a nucleotide change that would be present in each sequence from isolates of each subpopulation. After the SNP location and the SNP-specific probe design had been confirmed, typing results consistently showed the same novel allele variant. Since the genes of interest were amplified from specimens containing whole genome DNA, we accept that in all likelihood the multi-copy nature of the gene is a factor, though we argue that this particular SNP position should be explored further.

Putative Antigenic Determinants

Basic immunology dictates that microorganisms exploit and manipulate their host to prevent recognition and attack by its pathogen defence mechanisms. Their ability to do so successfully likely emerges from the proteins involved having conserved amino acid sequences throughout their evolutionary history, as selective pressures have eliminated genetic variations that proved unsuccessful. It would be logical to expect that nucleotide changes introduced between C. hominis and C.
parvum, especially within pathogenically crucial proteins, were a result of positive selection and could explain the phenotypic differences between the two species. Differences between host populations have a definite impact on this natural positive selection. Using SNPs to investigate the genetic basis of Cryptosporidium virulence is of great value to the identification and design of potential immunotherapy targets. Surface proteins with antigenic potential are ideal candidates, as they are vital to establishing and maintaining infection within the hostile environment of the host. It is possible that SNPs or patterns of SNPs within genes encoding antigenic determinants could help better define the host-parasite interplay.

Intron-containing Proteins

A fourth type of protein considered comprises those encoded by genes that contain intronic regions. The Cryptosporidium genome is highly compact and gene dense [245]. C. hominis and C. parvum have considerably fewer introns than Apicomplexa such as Plasmodium falciparum [245]. The Cryptosporidium genome is only 9.1 Mbp, with 8 chromosomes, all highly compacted and extremely gene dense. In comparison with that of P. falciparum, the Cryptosporidium genome is 2.5 times smaller but has 1.8 times greater gene density. Genome reduction is thought to have occurred predominantly through the shortening of intergenic regions, the loss and shortening of introns, and a reduction in the mean length of genes [245]. Intron regions have great potential for playing a role in gene transfer, exon shuffling and ultimately genetic drift. SNPs or other mutations in these intragenic regions could serve as reliable genetic markers for genotyping unknown or novel isolates. The idea of introns and their origin, purpose or effect on gene organization has been heavily debated for years.
To establish and retain intronic regions within a genome is difficult in the face of opposing evolutionary forces [29, 68, 231]. For an intron to persist and evolve with an emerging organism such as Cryptosporidium, sufficient positive selection pressures that favour its presence must exist, especially if it is to avoid further mutational challenges. Cells are programmed to splice out non-coding DNA (introns) through the recognition of identifiable nucleotide sequences that dictate proper excision. Mutation within these specific regions, or at any other site critical to intron processing, could affect the splicing of an intron. Mutations within an intron can affect the neighbouring coding exons by introducing a loss or gain of amino acids, which alters the reading frame of the gene. This could change the function of the gene or affect the regulatory factors exerted upon it.

Examination of SNPs within intronic regions is also useful because they could be crucial when examining the evolutionary relationships of Cryptosporidium. Silent SNPs are often completely, or nearly, adaptively neutral and not subject to direct natural selection. Because of this they have great potential to give an honest reflection of the mutation rate and of the time elapsed since the organism diverged from its most recent common ancestor. Moreover, polymorphisms located in untranslated regions could be suitable for genotype tagging, allowing for faster and more efficient means of identifying isolates.

Bioinformatics Characterization

In studies investigating the association between epidemiology and a genome, it is inefficient and impractical to genotype every single nucleotide polymorphism [24, 225]. It is therefore necessary to target those mutations expected to be relevant in terms of pathology, taxonomy or epidemiology. Comparative genomics is greatly supported by the use of bioinformatics applications as an essential data mining tool for molecular studies.
This is especially appreciated when only limited, or potentially erroneous, sequence data are available for isolates from different sources, as is the case with Cryptosporidium. Formally, SNPs can be defined as alleles in a population. Through the use of multiple bioinformatics algorithms we are able to identify those SNPs of greatest potential research value, allowing inferences to be made about the genetic relationships between C. hominis populations. SNPs can exhibit a range of altered phenotypes, from mild and phenotypically silent effects to minor advantageous traits to detrimental effects [68, 185]. The likelihood of a SNP having an impact on a protein depends on where it occurs within the protein and on the nature of the phenotype [36, 54, 68]. To address this we employed multiple predictive bioinformatics algorithms to enable a more explicit evaluation of prospective SNPs. The profile of a protein's hydrophobic character can be useful in predicting membrane-spanning domains, potential antigenic sites and regions that are likely exposed on the protein's surface. Hydrophilic regions are more likely to be exposed on the surface of a protein and are therefore potentially antigenic. Conversely, such analysis can be used to predict membrane-spanning segments, which have a strong hydrophobic character. Proteins passing through the phospholipid bilayer of a cell interact with a region inside or outside of the cell, where they will encounter water, and will therefore have a hydrophobic region corresponding to the hydrophobic region of the bilayer. Non-globular proteins and/or those without transmembrane domains will be strictly hydrophilic in nature. On a scale ranging from -4.5 to +4.5, a value greater than zero is suggestive of hydrophobic character, while a value of two or more indicates a strongly hydrophobic region.
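The hydropathy profile described above can be sketched as a sliding-window average over the Kyte-Doolittle scale. This is a minimal illustration, not the software used in this study: the window size of 9 residues, the function names and the example peptide are assumptions, and the thresholds (greater than 0, and 2 or more) are taken from the text.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence: str, window: int = 9) -> list:
    """Return the mean hydropathy of every window along the sequence.

    The window size is an illustrative assumption. A residue that is not
    one of the 20 standard amino acids raises KeyError.
    """
    sequence = sequence.upper()
    if window < 1 or window > len(sequence):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KYTE_DOOLITTLE[residue] for residue in sequence]
    return [
        sum(scores[start:start + window]) / window
        for start in range(len(scores) - window + 1)
    ]


def classify_hydropathy(value: float) -> str:
    """Apply the thresholds quoted in the text to one window average."""
    if value >= 2.0:
        return "strongly hydrophobic"
    if value > 0.0:
        return "hydrophobic"
    return "hydrophilic"


# Hypothetical peptide: a charged stretch followed by an aliphatic stretch.
peptide = "KKDEERRLLIVVAILLV"
for start, value in enumerate(hydropathy_profile(peptide, window=9)):
    print(start, round(value, 2), classify_hydropathy(value))
```

Windows that average 2 or more along a sufficiently long stretch are the candidates for membrane-spanning segments, while strongly negative windows mark regions more likely to be surface exposed.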
The analysis of properties such as secondary structure can suggest that disease-causing mutations are associated with extreme changes in the value of parameters relating to protein stability [66]. Secondary structure prediction methods use the statistical preference of amino acid residues for particular secondary structures, together with the sequence, to predict the secondary structure of each residue. The Kyte-Doolittle method of using hydrophobicity plots to assess topology predictions or transmembrane domains is an earlier method that lacks the robustness of more recent and computationally more advanced methods such as TOPPRED or MEMSAT. For the purposes of this study, whose main focus was not the prediction of membrane protein topology, it was considered sufficient. Membrane proteins are those proteins that span a lipid bilayer. The exterior surface that is in contact with the lipid hydrocarbon tails is highly enriched in hydrophobic residues. Water-exposed surfaces on either side of the membrane are dominated by polar and charged residues, while the residues in the membrane-water interface region often tend to be aromatic. When evaluating SNPs it is worth considering where the affected amino acids are positioned within the structure of the protein, as this will dictate what external pressures they are exposed to. The primary structure of a protein is the sequence of amino acids that compose it. Different regions of this sequence then form local secondary structures, such as alpha helices and beta strands. The packing of these secondary structural elements into one or several compact globular units, or domains, defines a protein's tertiary structure. The location of a given amino acid within this tertiary structure could again dictate what outside pressures it is exposed to, which in turn will influence potential SNP mutations. The secondary structure level of a folded protein is the most important. All downstream folding conformations are based on this level.
Amino acid residues prefer certain structures. For example, some amino acid residues have a propensity to form helices while others do not [51, 152]. SNP mutations in genes that code for a particular globular domain and that confer a more drastic amino acid change, i.e., from a highly hydrophobic residue to a strongly hydrophilic residue, could impact the folding conformation of a protein [152]. Ultimately this could influence the protein's bio-functionality and/or stability.

6.2 Population Geographical Substructuring

SNP-typing & Allelic Discrimination

Genetic variation within a population occurs when there is more than one allele present in the population at a given locus [47]; in our case this locus is a SNP molecular marker. Just as there are variable alleles at a given locus, there are fixed alleles. These can be useful for species designation, and if a fixed allele is unique to a specific geography, the identification of such a marker would be highly beneficial to any epidemiological study. Genetic variation that occurs between populations or sub-populations is considered genetic differentiation, which can be defined as the differences in allele frequencies among populations. Many studies have employed multiple methods of comparative genomics to score allelic variation at various loci throughout a given organism's genome [35, 58, 69, 182, 204, 221]. In particular, micro- and mini-satellites, allozyme analysis and oligonucleotide arrays have been used extensively in work with Apicomplexans, providing a great increase in the understanding of the population genetic structure and epidemiology of these parasites [72, 233]. The biological differences between Cryptosporidium and other Apicomplexans prevent direct comparisons with such studies from being made. There is little doubt that similar genetic markers will be important tools for clarifying the population structure of Cryptosporidium.
We have used a novel set of such molecular markers to analyse the population structure of C. hominis in four geographical subpopulations. The methodology utilized single base extension (SBE) chemistry for SNP-typing. SBE is one of the best biochemistries for genetic variation studies as it allows for large-scale association studies and population studies on the ever-growing SNP databases being developed for organisms of all types [44]. From all the SNP-typing data accumulated thus far, using the baseline SNP panel established through comparative genomics and bioinformatics analysis, a MlSt defined by the combination of SNP alleles at the 43 SNP loci was generated for each isolate from the 4 international populations. In total 24 distinct MlSts were found: 6, 11, 11, and 6 for Australia, Kenya, Peru, and Scotland respectively. Only one, MlSt-1, was found in all four subpopulations, while 6 were shared by two or more subpopulations; 75% of the MlSts are unique to one subpopulation. This argues for limited gene flow between such distant populations. What heterogeneity does exist in parasite populations is likely occurring at a more regional level, implying that investigations into the levels of intra-specific genetic variation among more localised populations would better help elucidate the intensity of geographic barriers in Cryptosporidium population structure. In environments where the frequency of transmission is high and/or conditions are more conducive to exposure, competitive interactions may select for particular genotypes or SNP-types. Genetically isolated populations have an increased potential for local adaptation to specific environmental conditions as well as to temporal or ecological circumstances.
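The MlSt tally reported above (distinct types, types shared between subpopulations, and types unique to one subpopulation) can be sketched as follows. The isolate data here are a toy example with three loci, not data from this study, and the function name `summarize_mlsts` is hypothetical; a MlSt is simply represented as the tuple of alleles scored across the loci.

```python
from collections import defaultdict


def summarize_mlsts(isolates):
    """Tally distinct, shared and unique multi-locus SNP-types.

    isolates is an iterable of (population, allele_profile) pairs, where
    allele_profile is a string or sequence of the alleles scored at each
    SNP locus, in a fixed locus order.
    """
    populations_by_type = defaultdict(set)
    for population, profile in isolates:
        populations_by_type[tuple(profile)].add(population)

    distinct = len(populations_by_type)
    shared = sum(1 for pops in populations_by_type.values() if len(pops) > 1)
    unique = distinct - shared
    return {
        "distinct": distinct,
        "shared": shared,
        "unique": unique,
        "fraction_unique": unique / distinct if distinct else 0.0,
    }


# Toy data only: three SNP loci per isolate, four populations.
toy_isolates = [
    ("Australia", "ACG"),
    ("Kenya", "ACG"),
    ("Kenya", "ATG"),
    ("Peru", "CCG"),
    ("Scotland", "ACG"),
    ("Scotland", "ACT"),
]
print(summarize_mlsts(toy_isolates))
```

Applied to the figures in the text, 24 distinct MlSts of which 6 are shared leaves 18 unique to a single subpopulation, that is 18/24 = 75%.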
Differences such as access to potable water supplies, agriculture, hygiene standards or practices, and diminished immune status of the population due to higher incidences of primary infections such as HIV, tuberculosis and malaria could all play a role in higher rates of transmission or more opportunities for exposure. One could argue this would certainly be the case in Kenya and Peru, where the nation's infrastructure for water supply, in addition to socioeconomic circumstances, likely differs from that of the more developed regions of Australia and Scotland. In a population of 20, Scotland had 9 occurrences of MlSt15 and 7 of MlSt1, the highest of any one MlSt among all four populations (Figure 5.10). In fact Scotland has had one of the greatest histories of cryptosporidiosis. However, this may be accounted for by more vigilant reporting and documentation of such cases when compared to other nations. Following this, there were 6 cases of MlSt12 in Peru and 5 of MlSt10 in Kenya, both of which are found in greater numbers there than in any other geographical location. Variant alleles at the same SNP loci (18S rRNA, BT 3) were found in more than one population and could suggest a global distribution. Where these alleles originated would be of particular interest from an epidemiological standpoint but is beyond the scope of this study. Perhaps most important is the identification of novel SNPs within the COWP protein in the Peruvian subpopulation. This could be suggestive of a genetic profile unique to Peru, or even South America, in a protein that is crucial to the attachment and invasion of host cells. A larger sample size would be needed to investigate this further, as alternatively it could be a rare allele that occurs globally. SNP-typing results showed the genetic stability of six of the 13 proteins investigated. First, Cp23 and EMAAg, two antigenic proteins, showed distinct genetic stability both within and between populations.
This could potentially be of great research value due to their suspected roles as immunodominant or antigenic proteins, that is, proteins that elicit an immune response from the host. Both proteins and their respective SNPs were genetically stable and therefore reliable for molecular studies, particularly in monitoring the species complex of Cryptosporidium for evolutionary events that may occur due to selective or immune pressures. In contrast, the lack of variation within these two genes may suggest they are under selection, though it does not conclusively prove it. If they are eventually proven to be truly genetically conserved regardless of geography, one could hypothesize that a drug or vaccine developed to target either Cp23 or EMAAg could be universally applicable. Four other proteins, MDH, LDH, UPRTase and HSP70, showed significant genetic stability both within and between populations. This is not particularly surprising in the case of the enzymatic proteins, as they are products of housekeeping genes crucial to metabolic processes and therefore a necessity to the pathobiology of Cryptosporidium. A potential benefit of this is their candidacy as neutral molecular targets for tracking evolutionary events or further species delineations that may occur within the phylum. It could also be inferred that the lack of variation in a global context would render them excellent molecular targets for the disruption of essential Cryptosporidium biological processes.

Visual examination of SNP-typing and allele discrimination results resolves two major goals of this study. The first is the usefulness of our experimental platform in the identification of allelic diversity from sample to sample. This proved to be the case whether the allele was that of the very closely related C. parvum genotype or a novel allele altogether. 
This can be further extended to the investigation of suspected species co-infections, a concept once ignored in the field but gaining momentum as perhaps being more common than originally thought. Secondly, we have brought to light the usefulness of certain proteins and their mutation profiles as a genetic typing tool for species-specific distinction. Not only is the methodology more rapid than standard gene sequencing, it is more cost effective and has higher throughput than most molecular methods. Theoretically, up to 12 SNPs in 96 different samples could be processed in less than 8 hours. Collectively, these observations would be of great benefit to epidemiological studies into Cryptosporidium behaviour, particularly in a large-scale outbreak situation.

In contrast to the use of stable allelic profiles, sequence or MlSt comparison revealed that polymorphisms were in many cases not shared between individual isolates, let alone within or between subpopulations. If SNPs unique to one isolate are fixed within that isolate they could ultimately be used to define it. For an allele variant to be strictly unique to one isolate it would have to be present in every sequence from that isolate and absent in sequences from other isolates. Ultimately this could lead to the identification of a novel allelic structure of an isolate within a group of isolates.

Mixed Infections

The presence of mixed alleles is a common occurrence in population genetics studies on microparasites. In the case of mixed allele calls at SNP markers the assumption was made that the predominant peak at each locus represents the actual genotype or SNP-type. This assumption is a necessity when dealing with haploid organisms and genetic data analysis. It is commonly cited in the literature for studies based on other related Apicomplexa [36]. 
Other than isolating individual oocysts and genotyping or SNP-typing them, the research community commonly accepts that there is no other way around the issue of mixed alleles. Based on our experimental approach the decision was made to use 20% as a threshold value, though this can be left open for interpretation. The presence of such alleles or mixed genotypes is an issue that anyone dealing with microparasite infections runs into. To some extent this threshold can be considered arbitrary, and it most certainly depends on the reliability of the detection system used. Tait et al. (2003) conducted a study on the geographical sub-structuring of C. hominis and C. parvum genotypes in the United Kingdom using a combination of micro- and mini-satellite markers. Having encountered the problem of deciphering multi-locus genotypes in the presence of mixed alleles, they reasoned that any secondary peak at least 10% the height of the main peak could be scored as real. A second study by Anderson et al. (2005), using microsatellite regions to investigate the population structure of Plasmodium falciparum, argued for a 33% cut-off because of their experimental platform. Microsatellite loci often have stutter peaks one repeat length away from the main peak. If a cut-off of less than 33% were used, these artifactual stutter peaks could be confused with additional alleles. In contrast to satellite markers, when dealing with SNPs, minor peaks can typically be detected at a lower threshold. The presence of multiple peaks at certain SNP molecular markers could result from multiple infections within a given sample. If this is the case, we would expect to see multiple peaks at multiple SNP positions within a single sample. The sample source must also be considered. Multiple infections are more likely to occur in geographic areas where cryptosporidiosis is highly endemic. 
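The threshold rule discussed above can be illustrated with a short sketch. It assumes that each SNP call is available as a set of peak heights per allele, which is a simplification of whatever the detection platform actually reports; the function names, locus names and peak values are hypothetical. The predominant peak is taken as the SNP-type, and a secondary peak is scored as real only if it reaches the chosen fraction of the main peak (20% here, with 10% and 33% being the alternatives cited from the microsatellite literature).

```python
from typing import Dict, List, Tuple

# Assumption: a locus is represented as allele -> peak height.
# The 20% ratio mirrors the threshold chosen in this study.
SECONDARY_PEAK_RATIO = 0.20


def call_locus(peaks: Dict[str, float],
               ratio: float = SECONDARY_PEAK_RATIO) -> Tuple[str, List[str]]:
    """Return the predominant allele and any secondary alleles scored as real."""
    if not peaks:
        raise ValueError("no peaks recorded for this locus")
    main_allele = max(peaks, key=peaks.get)
    main_height = peaks[main_allele]
    secondary = sorted(
        allele for allele, height in peaks.items()
        if allele != main_allele and height >= ratio * main_height
    )
    return main_allele, secondary


def summarise_sample(sample: Dict[str, Dict[str, float]],
                     ratio: float = SECONDARY_PEAK_RATIO
                     ) -> Tuple[Dict[str, str], List[str]]:
    """Return the SNP-type (predominant allele per locus) and the mixed loci."""
    snp_type: Dict[str, str] = {}
    mixed_loci: List[str] = []
    for locus, peaks in sample.items():
        main_allele, secondary = call_locus(peaks, ratio)
        snp_type[locus] = main_allele
        if secondary:
            mixed_loci.append(locus)
    return snp_type, mixed_loci


if __name__ == "__main__":
    # Hypothetical peak heights for three loci in one sample.
    example = {
        "locus_a": {"A": 1000.0, "G": 250.0},
        "locus_b": {"C": 800.0, "T": 100.0},
        "locus_c": {"T": 500.0},
    }
    print(summarise_sample(example))
```

Counting the mixed loci per sample gives a rough way to separate the two scenarios described in this section: many mixed loci would point towards a multiple infection, whereas one or two would be more in keeping with mutation within a clonal infection.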
Frequent transmission from environmental sources increases the probability of coinfections occurring with genetically heterogeneous parasites, favouring recombination. Though cryptosporidiosis is ubiquitous on the global scale, certain localities within a given country may be more susceptible to outbreaks. This could be due to many different factors, as discussed in chapter one, such as agricultural practices, potable water resources, hygiene standards and host immune status. For example, in developed countries where sanitary practices are more stringent and HIV incidence rates are lower, coinfections with heterogeneous parasites originating from environmental sources may be less frequent, thus allowing clonal propagation to prevail. Samples used in this study were crude fecal specimens known to contain genomic Cryptosporidium DNA. While they were typed or diagnosed as C. hominis, the presence of other species cannot be ruled out unless individual oocysts are isolated and typed for a specific species. Given the argument of clonality in Cryptosporidium genetic populations, a potential alternative explanation could be mutation within clonal infections. In this case we hypothesize that just one or perhaps two SNPs would have multiple peaks in any infection. Our data showed 19 SNP markers with bi-allelic typing results. The results of the study presented here show mixed alleles at 13 (19 if the AcoA and Mucin1 loci are considered) different SNP loci within 5 (or 7) genes. These 5 genes are located on four different chromosomes of the Cryptosporidium genome. In the case of mixed genotypes, we can hypothesize that a higher number of such genotypically mixed MlSts seen in one population versus another could be suggestive of a less stable population structure. This could be the result of multiple factors. 
Such analysis would be hampered, as most of these can only be resolved by detailed retrospective studies within other public health districts. In the case of Peru and Kenya these are likely very primitive, if established at all. Since the majority of the secondary allele calls are those of the C. parvum genotype, it is possible that humans cohabiting with, or living relatively near to, bovine hosts could cause this. Also worth examining would be the exact etiological source of infection, that is, whether the infection came from water, food, a swimming pool, a petting zoo and so forth. Because the samples used were isolated from human hosts, human-to-human transmission of C. parvum may be more important than has been previously assumed. Alternatively, as temporal, ecological or host immune status conditions vary from country to country, different conditions may favour the appearance of certain genotypes. It is here that data on the exact location at which each isolate was obtained within its territory of origin would be especially useful. The preliminary results of our work show no predilection for mixed genotypes in one subpopulation over another. This may imply that Australia, Kenya, Peru and Scotland have similar epidemiological circumstances that would affect the prevalence of mixed genotypes. Further examination, and more precise and stringent species-specific molecular tools, would be necessary to resolve this. The implication of the evidence of mixed infections in the isolates tested is that, rather than conforming to a strict paradigm of either a clonal or a panmictic species, the data are suggestive of the co-occurrence of both pathways, which is consistent with other studies [34, 35].

Genetic Diversity Indices

Understanding and quantifying the genetic structure of a natural population of any organism has been a long-standing objective in evolutionary biology. 
Only by defining how genetic diversity is distributed within (intra) versus among (inter) populations can insight be gained into genetic population structure, levels of gene flow, historic population parameters and even the early periods of speciation. These studies are of particular importance when considering either a re-emerging or a newly emerging infectious disease, as is the case with Cryptosporidium. One of the ultimate goals in quantifying population genetic structure is not only to understand variation among species, or sub-populations of a given species, but to determine whether or not any patterns exist among different populations and life zones or geographies. At the genetic level, barriers to dispersal and subsequent genetic exchange between populations allow for their divergence because of local adaptation, restricted gene flow or random genetic drift [47, 88, 236]. Spatial or geographic population structure is most readily estimated by evaluating the degree of genetic differentiation of genetic markers, neutral or expressed, among geographically separated populations. Herein the genetic relationships between four international populations of Cryptosporidium, from Australia, Kenya, Peru and Scotland, were investigated using such DNA markers. The molecular typing data indicate relatively higher levels of genetic variability within populations. Gene flow, or the movement of genes between populations, can be a potent force in the reduction of genetic differentiation among populations. Excessive gene migration among populations can convert genetic differentiation into an increase in genetic variation within a population. If gene migration is restricted, which can be the case in isolated geographic regions, genetic differentiation will increase as allelic frequencies within a population become more fixed or stable. 
In contrast, genetic differentiation can be created through the process of random genetic drift, which refers to fluctuations in allele frequency occurring by chance. This is especially true within smaller sub-populations. As time goes by these allele frequencies become fixed or stable within a population, leading to an increase in population differentiation. While changes in allele frequencies within populations, most likely caused by natural selection, can lead to adaptation, it is genetic differentiation that can ultimately lead to major evolutionary events.

Multilocus sequence analysis of a panel of 43 molecular markers (SNPs) from Cryptosporidium isolates revealed that, on a global scale, genetic diversity is greater within a given population than it is among populations. The identification of such a biogeographical trend has implications when investigating the population structure of a microorganism. Previous studies into the biogeography of Cryptosporidium have uncovered evidence of both geographically restricted genetic sequences and more ubiquitous ones. However, the geographic scope of these studies was quite limited, leaving open the question of whether similar patterns would be seen in other locales and how large an impact the movement of parasite hosts within the respective locale made. This study is one of the first to approach the pathogenomics of Cryptosporidium from such a global perspective. Considering that all populations are situated on different continents, separated by the obvious geographic boundaries of oceans and mountain ranges, it would stand to reason that the exchange of genetic information, or gene flow, would be more restricted. If this were a true factor, one would expect to see a high degree of genetic differentiation between populations. 
Because mutation is the ultimate source of all genetic variation, it increases variation within subpopulations [47, 235]. Mutation also leads to an increase in differentiation (inter-population diversity), because the chance of the same mutations occurring within the same subpopulations is low. Gene flow works to convert inter-population genetic variation (differentiation) to intra-population variation, while genetic drift tends to work in the opposite direction, converting intra-population into inter-population variation [47]. With the physical geographic boundaries of Australia, Kenya, Peru and Scotland, limited gene flow would be expected, resulting in a higher degree of inter-population differentiation. The Australian population was considerably more variable for all gene diversity measures when compared to the others, Hs = 0.177 (Table 5.13), in particular Scotland, which was revealed to have the least genetic diversity, averaging Hs = 0.022. Despite this pronounced difference, the clonal diversity, PD (Table 5.12), was quite uniform across all four international populations. An estimate of Gst = 0.304 indicates that 30.4% of the total genetic variation exists among the four international C. hominis populations. The inter-population differentiation estimated at 30.4% is suggestive of limited but still present gene flow between populations. In other words, gene flow was not sufficient to erase genetic divergence amongst these geographically separated subpopulations. Because the probability of gene flow between populations would be expected to be lower given the relatively isolated habitats of the four intercontinental populations, the practice of sampling more individuals within a given population should be adopted. If the degree of variation among populations were estimated to be significantly higher than variation within a population, it would then be more prudent to sample more populations or geographies, with less focus on the number of individuals. 
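To make the relationship between Hs and Gst concrete, the following is a minimal sketch of Nei's gene diversity statistics. It assumes the standard definitions, H = 1 - sum(p_i^2) per locus, Hs as the mean within-population diversity, Ht as the diversity of the mean allele frequencies, and Gst = (Ht - Hs) / Ht, with all populations weighted equally. The estimator and software actually used to obtain the values in Tables 5.12 and 5.13 may differ (for example by applying sample-size corrections), and the function names and frequencies here are hypothetical.

```python
from typing import Dict, List

AlleleFreqs = Dict[str, float]  # allele -> frequency within one population


def gene_diversity(freqs: AlleleFreqs) -> float:
    """Nei's gene diversity for one locus: H = 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs.values())


def mean_frequencies(populations: List[AlleleFreqs]) -> AlleleFreqs:
    """Unweighted mean allele frequencies across populations (an assumption)."""
    alleles = set().union(*populations)
    n = len(populations)
    return {a: sum(pop.get(a, 0.0) for pop in populations) / n for a in alleles}


def gst(loci: List[List[AlleleFreqs]]) -> float:
    """Gst over several loci; loci[k] lists per-population frequencies at locus k."""
    hs_per_locus = []
    ht_per_locus = []
    for populations in loci:
        within = sum(gene_diversity(p) for p in populations) / len(populations)
        hs_per_locus.append(within)
        ht_per_locus.append(gene_diversity(mean_frequencies(populations)))
    hs = sum(hs_per_locus) / len(hs_per_locus)
    ht = sum(ht_per_locus) / len(ht_per_locus)
    if ht == 0.0:
        return 0.0  # no variation at all, so nothing to partition
    return (ht - hs) / ht


if __name__ == "__main__":
    # Hypothetical frequencies at one bi-allelic SNP in two populations.
    locus = [{"A": 0.9, "G": 0.1}, {"A": 0.5, "G": 0.5}]
    print(round(gst([locus]), 3))
```

Under these definitions Gst is the share of total diversity that lies among populations, which is how the value of 0.304 reported above is read: roughly 30% of the variation among populations and the remainder within them.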
This finding of limited gene flow is in agreement with a more recent study by Tanriverdi et al. (2008) [218] that undertook the assessment of population structure on a global scale. The study involved both C. hominis and C. parvum isolates and also showed insufficient gene flow to erase genetic divergence. Considering the degree of travel occurring in the modern world, however, the author must point out that precisely defined and isolated geographic boundaries are, from an epidemiological standpoint, almost non-existent, thus creating opportunity for gene flow through travel and the import and export of foods. Taking into account the results here, we put forth that the degree of genetic variation partitioned among populations may be better examined by sampling more populations within the given territories of Australia, Kenya, Peru and Scotland respectively. It must also be taken into account that while the donated parasite populations are documented to originate from within each country as confirmed C. hominis specimens, information on their exact location within the country of origin is not known. This fact could be key when considering the vastness of a country such as Australia. Through the use of very geographically separated populations we hoped to negate issues surrounding the travel of hosts among the four localities, though we cannot ignore the intensive nature of human travel in the modern world. It is therefore of great interest that in the future we hope to be able to access data on the exact circumstances of each isolate in terms of host and point of origin. In doing so we hypothesize that we could limit a biased epidemiological structure by omitting isolates from patients reporting travel. This would perhaps result in a more regional assessment of inter-population differentiation, which in turn may help better determine the true impact of genetic drift at the subpopulation level. 
The severe isolation of a finite population will cause random genetic drift to become relatively more important than gene flow [47]. With the clonality of Cryptosporidium populations still being debated this would be worth exploring, as in many clonal species the majority of genetic variation is often found among populations. Further support for the idea that more “mini-subpopulations” within a locality should be evaluated comes from the genetic distances measured among all four international populations. Despite the populations being so geographically removed from one another, the highest genetic distance was only 0.061, between Kenya and Scotland. Another factor considered was the potential for selection for local adaptation of the four intercontinental sub-populations. The environments our four subpopulations inhabit, as is often the case, differ in terms of light, temperature, agriculture, population immunity, host density and so on. As local adaptation occurs, an increase in differentiation occurs [47, 185]. Conversely, selection events that do not differ between subpopulations, due to similar environments or fundamental features of a species, will lead to a decrease in differentiation [47]. Though overall results indicate that differentiation is low in comparison to intra-population variation, implying a minimal effect from local adaptation, selection does not act on the genome as a whole. While genetic drift and gene flow affect all loci, selection can be more targeted. As mentioned, the novel allele variants in the Peruvian subpopulation may, upon further examination, prove to be evidence of such local adaptation.

A common obstacle in genetic investigations into any new pathogen is the establishment of baseline genetics from which inferences, comparisons and differences can be ascertained. Some argue for close relatedness of C. hominis isolates throughout the world; others emphasize that clonal lineages within C. hominis are evolutionarily independent. 
While there appears to be a monophyly of C. hominis, there is extensive substructure, so Cryptosporidium should be considered a species complex. Though data are starting to accumulate, we do not know the baseline measures of genetic diversity or population differentiation for Cryptosporidium, especially at the bio-geographic level [218, 219]. Our data complement those of previous studies on Cryptosporidium parasite populations that used various methods and different loci, and were less global in scope [219]. To date most studies have focused on a few loci, rather than taking a multi-locus whole genome approach, which limits the ability to accurately capture the true genetic structure of a population. An earlier study conducted by Smith et al. (2003) showed, through the analysis of a combination of micro- and mini-satellite markers, that there was no evidence to support geographic or temporal substructuring of C. parvum populations within Scotland. A lack of geographical sub-structuring was evident from both Wright’s Fst values and Nei’s genetic distance values. While the results of the Smith study are limited to the geographic boundaries of Scotland, a similar study done in Italy by Caccio et al. (2000) showed evidence for the non-random geographical distribution of specific alleles within one protein, the ML1 protein. Their study involved Italian and other Northern European samples and indicates that geographical sub-structuring is perhaps more evident when samples from a wider area are used. A third, more recent study aimed at geographic linkage and variation in C. hominis was done by Hunter et al. in the UK [40]. They assessed the geographic population structure with a standard genotyping approach using the Gp60 locus. This marker is especially appealing due to its functional relevance and extensive sequence polymorphism. There are some differences between their study and ours. 
One major difference between the two studies is that instead of using distinct geographic populations they were looking into the transmission dynamics arising from the movement of hosts out of and into the UK. Their conclusion on the relationship between travel outside of Europe and Gp60 subtypes was 37.08%, with no other epidemiological associations present. That is, differentiation or inter-population diversity constitutes 37.08% of genetic variation. Even though it was conducted through a different methodology and perspective of geography, this was somewhat complementary to our finding of inter-population diversity averaging 30%.

Thus far it appears from our data that there is little to indicate population substructuring, with part of the problem being that we do not have specifics on how the samples used relate to the geographies they represent. If the samples from a given country all came from the same location they would likely underestimate the diversity. Migration is an influential mover of genetic change. Whenever it is involved it is hard to maintain population sub-structuring. In contrast, selection and drift move much more slowly and can be easily swamped by minimal migration. The minimal amount of global baseline data akin to this study in the literature leaves the researcher having to take the results at face value until further studies are completed. In the interim it is likely that increasing the number of SNP loci and samples, using well defined sub-populations, and improving estimates of allele frequencies and divergence with more sophisticated data analysis methods can improve upon the study at hand. Ultimately, the patterns of modern C. hominis population structure discussed here could be used to guide the construction of historical models of migration and admixture, which would be useful in inferential studies of Cryptosporidium genetic history.  
CHAPTER 7 FUTURE DIRECTIONS - Current Work, Study Extensions

7.1 Current Work

As alluded to, we investigated a total of 25 target proteins for SNP-based genetic typing of globally distinct C. hominis subpopulations. While 13 of these were addressed in the study at hand, 12 are awaiting multi-locus SNP-typing. These 12 include: Cellcycle Regulator, CTCL Tumor Ag, Aldehyde-Alcohol Dehydrogenase, CLL Associated Ag-KW-2, Sexual Stage Specific Kinase, FLJ31812/DHHC palmitoyl transferase, Transmembrane amino acid Transporter, ABC multi-drug or ion efflux, Thiolproteinase, Extracellular protein w/ 8 kazal repeats, Seroreactive Ag BMN-19B related protein and RIK protein w/? WD40 repeats. These 12 proteins represent 6 of the total 8 chromosomes of the Cryptosporidium genome: 1 is situated on chromosome 1, 2 each on chromosomes 2, 3, 4 and 8, and 3 on chromosome 7. Their bio-functionalities range across biosynthetic, enzymatic, metabolic and antigenic properties. Of most interest is that, with the exception of the genome sequencing project, there is either limited molecular data or none at all on any of the 12 above-mentioned proteins, making them attractive novel target proteins for multi-locus SNP-typing. Using the restricted sequence details that were available through in silico data mining we designed original, unpublished PCR amplification primers. All 12 of these proved to be highly successful in generating large amounts of DNA amplicons from crude fecal specimens (Appendix 6). To date there are no published data on successful PCR amplification of any of these 12 proteins, indicating that our primers will be a valued addition to the molecular field of Cryptosporidium research. MlS-typing of these target proteins, using isolates from all five subpopulations, is currently awaiting completion. 
It is believed that the addition of 12 new proteins and a minimum of 24 new SNP loci would add great robustness to the study at hand and allow for more conclusive correlations to be inferred.

7.2 Future Study Extensions

The success of efforts to design and develop efficacious vaccines or chemotherapies for Cryptosporidium is contingent on characterizing the extent and nature of genetic diversity within its genome. Just as important is the identification of the mechanisms by which such diversity is generated and able to persist in parasite populations. Our study is a preliminary investigation that could be extended to address this. Keeping in mind the high degree of genetic similarity between the C. hominis and C. parvum genomes (~97%), it stands to reason that differences in their pathogenic behaviours, from host specificity to mode of transmission to disease severity, are most likely due to those subtle genetic differences that do exist. Further, more complete characterization and evaluation of the genetic make-up and organization of the two at the mutation level is necessary. Molecular studies are vital for refining the host specificity, interlaced transmission dynamics and infection sources of Cryptosporidium. More studies need to be undertaken to put these into an ecological context. Such studies could provide important insights into the effects that anthropogenic activities like waste treatment, water supply treatment, farming and agricultural practices, and public health or hygiene issues have on the overall epidemiology of Cryptosporidium.

7.2.1 Continuation of Current Multi-locus SNP Data: Examination of More Genes & Molecular Markers

Our data suggest that, through the evaluation of isolates from more regional areas within the territories of Australia, Kenya, Peru, Scotland and Canada, the exact degree of intra-population diversity could be better defined. 
It is important that a balance be struck between the geographic diversity of populations and the conclusions drawn about population structure. If the populations examined are too close, results may be skewed by too narrow a geographic boundary and the potential for increased movement of hosts within it.

In molecular research, the more data accumulated the more robust the conclusions that can be made. The logical immediate extension of this study would be to continue to examine more proteins and SNPs. The greater the number of SNPs and proteins examined, the stronger the associations to geography that can be made. A comparable study was done on the human genome, estimated as having 20,000 to 25,000 genes; it covered three distinct populations and was specifically designed to determine the number of SNP loci needed to infer population structure [225]. Results showed that just over 65 random SNP loci were required for identifying distinct geographically separated populations. The Cryptosporidium genome, estimated to contain approximately 10,000 genes, is just 0.17 that of the human genome, and we are currently using 43 SNP loci. While our results are interesting, it remains that the use of either more SNPs overall, or particular SNPs not yet identified that are crucial to population studies, could make a considerable impact. Also to be considered would be incorporating more intron regions, and possible SNPs within them, into the study. At the initial design of the study, 5 years ago, there were only 6 hypothesized genes containing intron regions. With the completion of the C. hominis and C. parvum genomes and the increasing accumulation of molecular data, there are now an estimated 200-800 genes with putative intron regions (Figure A.1). Non-coding regions, like introns, are expected to have fewer functional constraints compared to coding regions. The levels of genetic variation within these regions could have significant implications. 
Low genetic variation could imply that these regions are influenced by selection operating upon a linked gene, whereby a gene’s frequency changes because proximity on a chromosome allows it to be dragged through the selection process by an advantageous gene nearby. Alternatively, low genetic variation could be suggestive of the conserved functional roles usually associated with introns, such as the splicing machinery. With respect to Cryptosporidium this is very likely, as the splicing machinery of the genome has been shown to be drastically reduced or streamlined [245]. The examination of allelic variation within these regions could be very informative.

7.2.2 Inferring Patterns of Evolutionary Descent

To obtain further ideas about the nature of population structure for Cryptosporidium the authors would like to extend the current multi-locus SNP data, and future data generated, to alternative analysis methods or approaches. The use of multi-locus molecular marker data for the accurate cataloguing of isolates of parasitic pathogens has a marked impact on both routine epidemiological surveillance and population biology. In both fields, a requirement for exploiting this resource is the ability to differentiate the relatedness and patterns of evolutionary descent among isolates with similar genotypes. Though valuable in their own right, most clustering methods, such as dendrograms, tend to provide a poor representation of recent evolutionary events, as they are inclined to rebuild relationships in the absence of a realistic model of how parasite populations emerge and diversify. Dendrograms typically represent multi-locus genetic data on the basis of a matrix of pairwise differences in the allelic profiles of the isolates studied. While a dendrogram is a convenient means of identifying isolates that may be identical or closely related, its topology can be arbitrary, providing little information on the patterns of evolutionary descent of the isolates. 
In view of the cosmopolitan distribution of Cryptosporidium spp., which may easily travel between countries, a more detailed account of the host from which each isolate was obtained would be attractive. This type of data would be crucial in determining whether or not an epidemic structure or bias can be ruled out due to the presence of imported, thus reproductively isolated, MlSts in C. hominis isolates from the subpopulations studied. This would require a retrospective approach to the study, as patient information regarding travel behaviours would have to be obtained in order to discern those isolates that may have come from hosts who had travelled outside of the geographic boundaries. In theory the ecology or environment of cryptosporidiosis in different geographies may have selected for phenotypes best adapted to each environment. If true, one would expect that imported parasites would be unlikely to spread if the environmental factors or transmission patterns are hostile.

In consideration of clonality for Cryptosporidium, of future interest for study may be the single locus variants, those allelic profiles or MlSts that differ by just one molecular marker. In the simplest of terms, a clonal population emerges when an initial genotype increases in frequency in the population. This is likely a result of a fitness advantage or random genetic drift, enabling it to become a predominant genotype. As its frequency increases over time, this genotype will gradually diversify. Ultimately variants in the allelic profile of descendants of this genotype will arise, by point mutation or recombination. This may start with a single allele variant but in time can lead to multiple allele variants as further diversification occurs. 
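The notion of a single locus variant can be made concrete with a small sketch. It assumes each MlSt is stored as a tuple of allele calls in a fixed locus order; the profile names and alleles below are hypothetical and are not taken from the study data. The code lists every pair of profiles that differ at exactly one locus and counts how many such neighbours each profile has, which is one simple way of seeing which profile sits at the centre of a group of close relatives.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Profile = Tuple[str, ...]  # one allele call per SNP locus, in a fixed order


def locus_differences(a: Profile, b: Profile) -> int:
    """Number of loci at which two allelic profiles differ."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same loci")
    return sum(1 for x, y in zip(a, b) if x != y)


def single_locus_variants(profiles: Dict[str, Profile]) -> List[Tuple[str, str]]:
    """All pairs of profiles that differ at exactly one locus."""
    return [
        (first, second)
        for first, second in combinations(sorted(profiles), 2)
        if locus_differences(profiles[first], profiles[second]) == 1
    ]


def slv_counts(profiles: Dict[str, Profile]) -> Dict[str, int]:
    """How many single locus variants each profile has among the others."""
    counts = {name: 0 for name in profiles}
    for first, second in single_locus_variants(profiles):
        counts[first] += 1
        counts[second] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical four-locus profiles, for illustration only.
    example = {
        "type_1": ("A", "C", "G", "T"),
        "type_2": ("A", "C", "G", "C"),
        "type_3": ("A", "T", "G", "T"),
        "type_4": ("G", "T", "A", "C"),
    }
    print(single_locus_variants(example))
    print(slv_counts(example))
```

In this toy example the first profile has two single locus variants and the last has none, so the first would be the natural candidate for the ancestral genotype of its group, while the last stands apart as unrelated at this level.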
To address these concepts, the authors would like to evaluate the allelic profiles of isolates from each of the subpopulations in this study using a more recent program known as eBURST. The eBURST algorithm identifies mutually exclusive groups of related genotypes in the population and attempts to identify the founding genotype, or sequence type (ST), of each group. The algorithm then predicts the descent from the predicted founding genotype to the other genotypes in the group and displays the output as a radial diagram centered on the predicted founding genotype. The primary founder of a group is defined as the ST that differs from the largest number of other STs at only a single locus. The eBURST diagrams display the patterns of descent within each group from the predicted founding ST (Appendix 9). The assignment of the founding ST does not take into account the number of isolates of each ST; this makes the procedure relatively robust to sampling bias.

7.2.3 The Potential Implications to the Range of Vaccines or Chemotherapies Targeted to Specific Mutations within the Cryptosporidium Genome

Ultimately, an attractive downstream approach to SNP diversification among Cryptosporidium populations would be to obtain a more precise picture of the stability of SNPs known to be under intense immune or diversifying pressure, in particular those in surface antigens of the parasite. This would require epidemiological settings for Cryptosporidium that are suitable for testing whether polymorphisms evolve rapidly because limited human movement among defined geographic regions and low transmission levels limit the diversity of parasite populations. While this was addressed in this study, we cannot ignore the “global population” of the modern world created by transportation and travel practices.
To get a true representation of SNP stability under such epidemiological conditions, the study would likely have to move to a relatively remote geography that supports the settings mentioned and extend over a significant period of time. In theory, SNPs that showed no sequential or stepwise changes among alleles, and/or alleles found in more than one geography, could imply SNPs originating outside of such a locale. Furthermore, this would hint that these SNPs did not evolve as novel variants within that geography, thus indicating stable SNPs. In contrast, if SNPs varied stepwise over time, inferences about the age of the observed SNPs could be made. In the bigger picture of future Cryptosporidium research, if an efficacious vaccine or drug regimen targeted to known antigenic molecular markers were to be developed, the presence of these stable SNPs would suggest that it will be more effective where there is a limited gene pool, as in heavily isolated populations. Studies could then be conducted radiating outwards to increasingly larger geographic boundaries to determine the range of usefulness of such therapies.

CHAPTER 8 EXECUTIVE SUMMARY

By some estimates water is responsible for approximately 80% of all infectious disease. Among the most prevalent causative agents are apicomplexan organisms such as Cryptosporidium. Morbidity and mortality due to infectious apicomplexan protozoa are of growing concern, especially in the era of AIDS. Mounting rates of infection and numerous large-scale outbreaks, coupled with ineffective therapies, their side effects, and emerging resistance among organisms, demonstrate a tangible need for the development of novel therapeutics targeting these protozoan parasites.
Cryptosporidium is a globally ubiquitous enteropathogen of great importance to public health, and its impact is exacerbated by factors such as socioeconomic status, access to potable water, proximity to agricultural practices and wildlife, and personal immune status. The environmental stability of this pathogenic parasite, coupled with its complex, interlacing transmission dynamics, renders Cryptosporidium a highly successful microorganism for which there is currently no efficacious vaccine or prophylactic treatment. A detailed analysis and characterization of the subtle differences between the emerging infectious species of Cryptosporidium is a crucial step towards the rational design of novel therapies and more effective intervention policies. It is foreseeable that the widespread occurrence of similar genomic regions that are considered potential vaccine targets but have high rates of mutability will affect the probability of success of protective vaccines. There is a need for longitudinal studies that link population-based genetics with clinical end points. A potential implication is that, to be more effective, prevention or treatment strategies may need to differ among geographical areas where genetic variations are conserved. Extensive research into the genetics and etiopathogenesis of Cryptosporidium is being conducted in facilities all over the world. Even with all this progress, the pathobiology of Cryptosporidium is still largely unclear. As analysis of the completed genomes proceeds, new genes and proteins will be discovered. Inevitably, questions will also arise about their degree of variability, how this variation is generated and maintained, and in what way genetic diversification can affect intervention efforts.
Cryptosporidium, with its growing number of newly identified species, is an excellent example of how the parallels between wildlife or ecological circumstances and an emerging infectious disease in the human population can be associated with increased interactions with zoonotic pathogens coupled to the host-parasite paradigm. When investigating an organism like Cryptosporidium, whose pathobiology is directly linked to the environment, identifying any correlations between the emergence of disease and causal factors such as microbial adaptation and the degree of genetic diversity from a biogeographical perspective is crucial if a better understanding of the epidemiology of Cryptosporidium is to evolve. We report on genetic variation both within and between C. hominis subpopulations from Australia, Kenya, Peru, and Scotland. We examined approximately 18,500 bp and assembled a data set of 394 SNPs. Employing comparative genomics and bio-physical profiling, we established an expected haplotype representing a set of 45 single nucleotide polymorphisms at individual loci. Molecular typing of 77 international isolates based on this haplotype, or multi-locus SNP-type (MlSt), was performed, and twenty-four unique MlSts were identified. Inferences about genetic relationships were made using genetic data analysis software to quantify and partition the genetic diversity into intra- and inter-population diversity and to discern genetic distances among subpopulations. Our aim was to answer the question of what level of genetic variation exists within geographically distinct populations. The existence of exclusive “geo-types” would suggest that Cryptosporidium parasites harbour substantially greater biodiversity and species richness than current estimates imply. Our data offer little support for population substructuring. Depending on the locus and isolate studied, the results ranged from a virtual lack of genetic variation to more extensive variation.
Differences within subpopulations account for 69.6% of genetic variation; differentiation among subpopulations constitutes 30.4%. Genetic distances among subpopulations averaged 0.048 and varied from 0.034 between the Australia and Scotland subpopulations to 0.061 between Scotland and Kenya. The potential use of a DNA-typing scheme based on SNPs to resolve Cryptosporidium epidemiology was examined. Our experimental methodology enabled us to demonstrate the ability to genotype an isolate based on a particular mutation profile. Rapid and reliable species distinction is crucial to any epidemiological outbreak investigation. We identified four genetically stable SNP profiles within four different proteins that would be excellent candidates for such studies. We also showed the ability to clarify the presence of standard or novel allelic variation at a specific SNP locus. We identified private allele variants unique to one population, Peru, within a protein crucial to the invasion and attachment strategy of Cryptosporidium. The concept that mixed infections may be more common than once thought has garnered more attention. Implications for using SNPs as a molecular tool to reveal the presence of mixed infections could be drawn from the data generated. This study is one of the first to report on international biogeographical diversity using a SNP profile as a DNA-typing scheme. Some of the proteins and SNPs are discussed for the first time within the field, offering some excellent baseline possibilities. Further molecular research is certainly warranted to clarify more precisely the species complex as a whole, the evolutionary forces behind the emergence of new species, and the subsequent consequences for human population health.

LITERATURE CITED

1. Abrahamsen M, Templeton T, Kapur V, et al. 2004. Complete genome sequence of the Apicomplexan, Cryptosporidium parvum. Science 304: 441-444.
2.
Akiyoshi D, Feng X, Tzipori S, et al. 2002. Genetic Analysis of a Cryptosporidium parvum Human Genotype 1 Isolate Passaged through Different Host Species. Infection and Immunity 70(10): 5670-5675.
3. Akiyoshi D, Siobahn M, Tzipori S. 2003. Rapid Displacement of Cryptosporidium parvum Type 1 by Type 2 in Mixed Infections in Piglets. Infection and Immunity 71(10): 5765-5771.
4. Amar C, Dear P, McLaughlin J. 2003. Detection and identification by real time PCR/RFLP analyses of Cryptosporidium species from human feces. Society for App. Microbiology 38: 217-222.
5. Anderson T, Nair S, Nosten F, et al. 2005. Geographical Distribution of Selected and Putatively Neutral SNPs in Southeast Asian Malaria Parasites. Molecular Biology and Evolution 22(12): 2362-2374.
6. Applebee A, Thompson A, Olson M. 2005. Giardia and Cryptosporidium in mammalian wildlife - current status and future needs. TRENDS in Parasitology 21(8): 370-375.
7. Arrowood M. 1997. Diagnosis in Cryptosporidium and Cryptosporidiosis. Fayer R. CRC Press: 43-64.
8. Ashbolt N. 2004. Microbial contamination of drinking water and disease outcomes in developing regions. Toxicology 198: 231-238.
9. Atreya C, Anderson K. 2004. Kinetic Characterization of Biofunctional Thymidylate synthase dihydrofolate reductase from C. hominis. J. of Biol. Chemistry 279(18): 18314-18322.
10. Atwill E, Johnson D, Frost W, et al. 1999. Age, geographic and temporal distribution of fecal shedding of Cryptosporidium parvum oocysts in cow-calf herds. American J. of Veterinary Research 60: 420-425.
11. Atwill E, Johnson D, Pereira M. 1999. Association of herd composition, stocking rate and duration of calving season with fecal shedding of Cryptosporidium parvum oocysts in beef herds. J. American Veterinary Med. Association 215: 1833-1838.
12. Atwill E, Sweitzer R, Boyce W, et al. 1997.
Prevalence of and associated risk factors for shedding Cryptosporidium parvum oocysts and Giardia cysts within feral pig populations in California. Appl. and Environmental Microbiology 63: 3946-3949.
13. Awad-El-Kariem F. 1999. Does Cryptosporidium parvum have a Clonal Population Structure? Parasitology Today 15(12): 502-504.
14. Awad-El-Kariem F, Robinson H, Casemore D, et al. 1998. Differentiation between human and animal isolates of Cryptosporidium parvum using molecular and biological markers. Parasitology Research 84: 297-301.
15. Balabat A, Jordan G, Tang Y, Silva J. 1996. Detection of Cryptosporidium DNA in human feces by nested PCR. J. of Clinical Microbiology 34: 1769-1772.
16. Barnes D, Bonnin A, Huang J, et al. 1998. A novel multi-domain mucin like glycoprotein of C. parvum mediates invasion. Mol. & Bioch. Parasitology 96: 93-110.
17. Beck J, Davies J. 1981. Medical Parasitology; 3rd Edition. C. V. Mosby Company.
18. Bell A, Meeds D, Farley J, et al. 1993. A swimming pool-associated outbreak of Cryptosporidiosis in British Columbia Canada. Canadian J. Public Health 84: 334-337.
19. Black. 1996. Lecture in Infectious Disease Epidemiology. Johns Hopkins School of Public Health.
20. Bogitsh B, Carter C, Oeltmann T. 2005. Human Parasitology; 3rd Edition. Elsevier Academic Press.
21. Bonafonte M, Smith L, Mead J. 2000. A 23-kDa recombinant antigen of Cryptosporidium parvum induces a cellular immune response on in vitro stimulated spleen and mesenteric lymph node cells from infected mice. Exp. Parasitology 96(1): 32-41.
22. Bourgon R, Delorenzi M, Sargeant T, et al. 2004. The serine repeat antigen gene family phylogeny in Plasmodium: the impact of GC content and reconciliation of gene and species trees. Molecular Biol. & Evolution 21(11): 2161-2171.
23. Brookes A. 1999. The essence of SNPs. Gene 234(2): 177-186.
24. Butler J, Bishop T, Barrett J. 2005. Strategies for selecting subsets of single-nucleotide polymorphisms to genotype in association studies. BMC Genetics 6(Suppl 1): S72.
25. Caccio S, Homan W, Camilli R, Traldi G, Kortbeek T, Pozio E. 2000. A microsatellite marker reveals population heterogeneity within human and animal genotypes of Cryptosporidium parvum. Parasitology 120(Pt. 3): 237-244.
26. Caccio S, Homan W, van Dijk K, Pozio E. 1999. Genetic polymorphism at the b-tubulin locus among human and animal isolates of C. parvum. FEMS Microbiology Letters 170(1): 173-179.
27. Caccio S, Spano F, Pozio E. 2001. Large sequence variation at two microsatellite loci among zoonotic (genotype C) isolates of Cryptosporidium parvum. International J. for Parasitology 31: 1082-1086.
28. Camp, Dresser and McKee. 1995. “Summary of the Mt. Vernon, Ohio, Membrane Softening Pilot Plant.” December 14, 1995.
29. Campbell I, Tzipori S, Hutchinson G, Angus K. 1982. Effect of disinfectants on survival of Cryptosporidium oocysts. Veterinary Research 111: 414-415.
30. Carroway M, Tzipori S, Widmer G. 1996. Identification of genetic heterogeneity in the Cryptosporidium parvum ribosomal repeat. App. and Env. Microbiology 62(2): 712-716.
31. Casedevall A, Pirofski L. 2000. Host-pathogen Interactions: Basic concepts of microbial commensalisms, colonization, infection and disease. Infection & Immunity 68(12): 6511-6518.
32. Casedevall A, Pirofski L. 1999. Host-pathogen Interactions: redefining the basics of concepts of virulence and pathogenicity. Infection & Immunity 67(8): 3703-3713.
33. Casemore D. Molecular and Antigenic aspects of Cryptosporidium and Cryptosporidiosis, a brief review. Public Health Laboratory Service, Cryptosporidium Reference Unit, Wales, UK. Appendix 7: 137-142.
34. Casemore D, Armstrong M, Sands R. 1985. Laboratory Diagnosis of Cryptosporidium. J. of Clinical Microbiology 38: 1337-1341.
35. Casemore D, Garder C, O’Mahony C. 1994. Cryptosporidial infection, with special reference to Nosocomial transmission of C. parvum: a review.
Folia Parasitology 41(1): 17-21.
36. Cavallo A, Martin A. 2004. Mapping SNPs to protein sequence and structure data. Bioinformatics 21(8): 1443-1450.
37. Center for Disease Control; Atlanta, United States.
38. Cevallos A, Bhat N, Verdon R, et al. 2000. Mediation of Cryptosporidium parvum infection in vitro by mucin-like glycoproteins defined by a neutralizing monoclonal antibody. Infection & Immunity 68(9): 5167-5175.
39. Cevallos A, Zhang X, Waldor M, et al. 2000. Molecular cloning and expression of a gene encoding Cryptosporidium parvum glycoproteins gp40 and gp15. Infection & Immunity 68(7): 4108-4116.
40. Chalmers R, Hadfield S, Jackson C, Elwin K, Xiao L, Hunter P. 2008. Geographic Linkage and Variation on Cryptosporidium hominis. Emerging Infectious Disease 14(3): 496-498.
41. Chalmers R, Sturdee A, Bull S, Miller A, Wright E. 1997. The prevalence of Cryptosporidium parvum and C. muris in Mus domesticus, Apodemus sylvaticus and Clethrionomys glareolus in agricultural system. Parasitology Res. 83: 478-482.
42. Chappell C, Okhuysen P, Sterling R, DuPont H. 1995. Cryptosporidium parvum: intensity of infection and oocyst excretion patterns in healthy volunteers. J. Infect Dis. 173(1): 232-6.
43. Chou, P.Y. & Fasman, G.D. 1974. Prediction of protein conformation. Biochemistry 13: 222-245.
44. Che Y, Chen X. 2003. A multiplexing single nucleotide polymorphism typing method based on restriction-enzyme-mediated single-base extension and capillary electrophoresis. Analytical Biochemistry 329(2): 220-229.
45. Chin J. 2000. Control of Communicable Diseases Manual; 17th Edition. United Book Press.
46. Combes C. 2001. Parasitism; the Ecology and Evolutions of Intimate Interactions. University of Chicago Press.
47. Conner J, Hartl D. 2004. A Primer of Ecological Genetics. Sinauer Associates. Sunderland, Massachusetts, U.S.A.
48. Cortes A, Mellombo M, Mueller I, Benet A, Reeder J, Anders R. 2003.
Geographical Structure of Diversity and Differences between Symptomatic and Asymptomatic Infections for Plasmodium falciparum Vaccine Candidate AMA1. Infection & Immunity 71(3): 1416-1426.
49. Culley T, Wallace L, Gengler-Nowak K, Crawford D. 2001. A comparison of two methods of calculating Gst, a genetic measure of population differentiation. American J. of Botany 89: 460-465.
50. Current W, Reese N, Weinstein W, et al. 1983. Human cryptosporidiosis in immunocompetent and immunodeficient persons. Studies of an outbreak and experimental transmission. New England J. of Medicine 308: 1252-1257.
51. Dawson J, Weinger J, Engelman D. 2002. Motifs of Serine and Threonine can Drive Association of Transmembrane Helices. J. of Molecular Biology 316: 799-805.
52. Deng M, Templeton T, Abrahamsen J, et al. 2002. C. parvum genes containing thrombospondin type-1 domains. Infection & Immunity 70(12): 6987-6995.
53. Denton H, Brown S, Coombs G, et al. 1996. Comparison of the phosphofructokinase and pyruvate kinase activities of C. parvum, E. tenella and T. gondii. Mol. and Biochem. Parasitology 76: 23-29.
54. Deonier R, Tavare S, Waterman M. 2005. Computational Genome Analysis. Springer Science.
55. Dronamraju K. 2004. Infectious Disease and Host-Pathogen Evolution. Cambridge University Press. Cambridge, United Kingdom.
56. Dubey J, Speer C, Fayer R, et al. 1990. Cryptosporidiosis of man and animals. Boston: CRC Press, 1990: 1-199.
57. Dupont H, Chappell C, Sterling C, Jakubowski W, et al. 1995. The infectivity of Cryptosporidium parvum in healthy volunteers. New England J. Medicine 332: 855-926a.
58. El-Sayed N, Myler P, Blandin G, Hall N, et al. 2005. Comparative Genomics of Trypanosomatid Parasitic Protozoa. Science 309: 408-409.
59. Environment Canada.
60. Etkin N. 2003. The co-evolution of people, plants, and parasites: biological and cultural adaptations to malaria. Proceedings of the Nutrition Society 62: 311-317.
61. Fayer R, Andrews C, Ungar B, Blagburn B.
1989. Efficacy of hyper immune bovine colostrums for prophylaxis of cryptosporidiosis in neonatal calves. J. of Parasitology 75: 393-397.
62. Fayer R, Morgan U, Upton S. 2000. Epidemiology of Crypto: transmission, detection and identification. International J. of Parasitology 30: 1305-1322.
63. Fayer R, Speer C, Dubey J. 1997. The general biology of Cryptosporidium: 1-41. In R. Fayer (ed.), Cryptosporidium and cryptosporidiosis. CRC Press, Boca Raton, Fla.
64. Fayer R, Unger B. 1986. Cryptosporidium spp and Cryptosporidiosis. Microbiology Reviews 50: 458-483.
65. Feng X, Rich S, Tzipori S, Widmer G. 2002. Experimental evidence for genetic recombination in the opportunistic pathogen Cryptosporidium parvum. Mol. and Bio. Parasitology 119: 55-62.
66. Ferrer-Costa C, Orozco M, de le Cruz X. 2001. Characterization of disease-associated single amino acid polymorphisms in terms of sequence and structure properties. J. of Molecular Biology 315(4): 771-786.
67. Fisher M, Koenig G, White T, Taylor J. 2000. Pathogenic Clones versus Environmentally Driven Population Increase: Analysis of an Epidemic of the Human Fungal Pathogen Coccidioides immitis. J. of Clinical Microbiology 38(2): 807-813.
68. Forsdyke D. 2006. Evolutionary Bioinformatics. Springer Science. New York, New York, U.S.A.
69. Gao L, Ge S, Hong D. 2000. Low Levels of Genetic Diversity within Populations and High Differentiation Among Populations of Wild Rice, Oryza Granulata Nees et Ar Ex. Watt., From China. International J. of Plant Science 161(4): 691-697.
70. Garnier, Osguthorpe and Robson. 1978. J. of Molecular Biology 120: 97-120.
71. Gatei W, Greensill J, Hart A, et al. 2003. Molecular Analysis of the 18S rRNA Gene of Cryptosporidium Parasites from Patients with or without Human Immunodeficiency Virus Infections Living in Kenya, Malawi, Brazil, the United Kingdom, and Vietnam. J. of Clinical Microbiology 41(4): 1458-1462.
72. Gasser R, Abs EL-Osta Y, Chalmers R. 2003.
Electrophoretic Analysis of Genetic Variability within Cryptosporidium parvum from Imported and Autochthonous Cases of Human Cryptosporidiosis in the United Kingdom. App. and Environmental Microbiology 69(5): 2719-2730.
73. Gaur D, Mayer G, Miller L. 2004. Parasite ligand-host receptor interactions during invasion of erythrocytes by Plasmodium merozoites. International J. for Parasitology 34(13, 14): 1413-1429.
74. Gavrilescu C, Denkers E. 2003. Apoptosis and the balance of homeostatic and pathologic responses to protozoan infection. Infection & Immunity 71(11): 6109-6115.
75. Glaberman S, Moore J, Xiao L, et al. 2002. Three Drinking Water-Associated Cryptosporidiosis Outbreaks, Northern Ireland. Emerging Infectious Diseases 8(6): 631-633.
76. Glaser C, Safrin S, Reingold A, Newman T. 1998. Association between Cryptosporidium infection and animal exposure in HIV-infected individuals. J. Acquired Immune Deficiency Syndrome. Hum. Retrovirology 17: 79-82.
77. Gibbons C, Gazzard B, Awad-El-Kariem F, et al. 1998. Correlation between markers of strain variation in Cryptosporidium parvum: Evidence of clonality. Parasitology International 47: 139-147.
78. Goodgame R. 1996. Understanding intestinal spore forming protozoa: cryptosporidia, microsporidia, Isospora and Cyclospora. Ann Intern. Medicine 124(4): 429-441.
79. Goudet J. FSTAT (Version 1.2): A Computer Program to Calculate F-Statistics. www2.unil.ch/popgen/softwares/fstat.htm
80. Graczyk TK, Fayer R, Cranfield MR, Owens R. 1997. Infectivity of Cryptosporidium parvum oocysts is retained upon intestinal passage through a migratory water-fowl species (Canada goose, Branta canadensis). J. Parasitology 83(1): 111-4.
81. Graur D, Li W. 2000. Fundamentals of Molecular Evolution; 2nd Edition. Sinauer Associates.
82. Grinberg A, Lopez-Villalobos N, Pomroy W, Widmer G, Smith H, Tait A. 2008. Host-shaped segregation of the Cryptosporidium parvum multilocus genotype repertoire.
Epidemiology and Infection 136: 273-278.
83. Gut I. 2001. Automation in genotyping of single nucleotide polymorphisms. Human Mutations 17: 475-492.
84. Gutacker M, Smoot J, Musser J, et al. 2002. Genome-wide Analysis of Synonymous Single Nucleotide Polymorphisms in Mycobacterium tuberculosis Complex Organisms: Resolution of Genetic Relationship among Closely Related Strains. Genetics 162: 1533-1543.
85. Guyot K, Follet-Dumoulin E, Dei-Cas E, et al. 2001. Molecular Characterization of Cryptosporidium Isolates Obtained from Humans in France. J. of Clinical Microbiology 39(10): 3472-3480.
86. Haas C, Rose J. 1994. Reconciliation of microbial risk assessment and epidemiology: the case of the Milwaukee outbreak. In: Proceedings of the 1994 Conference of the American Water Works Association water quality: 517-523.
87. Harrus S, Baneth G. 2005. Drivers for the emergence and re-emergence of vector-borne protozoal and bacterial diseases. International J. for Parasitology 35(11-12): 1309-1318.
88. Hanski I, Gaggiotti O. 2004. Ecology, Genetics, and Evolutions of Metapopulations. Elsevier Academic Press.
89. Hay S, Guerra C, Tatem A, Noor A, Snow R. 2004. The global distribution and population at risk of malaria: past, present and future. The Lancet, Infectious Disease 4: 327-336.
90. Health Canada. 2001. Waterborne Cryptosporidiosis Outbreak, North Battleford, Saskatchewan, Spring 2001. Canadian Communicable Disease Report 27(22): 185-192.
91. Heuser V, Kuenzi P, Rottenberg S. 2001. Inhibition of apoptosis by intracellular protozoan parasites. International J. for Parasitology 31(11): 1166-1172.
92. Hey J. 1999. Parasite populations: The puzzle of Plasmodium. Current Biology 9(15): R565-R566.
93. Hoar B, Atwill E, Elmi C, Farver T. 2001. An examination of risk factors associated with beef cattle shedding pathogens of potential zoonotic concern. Epidemiology Infection 127: 147-155.
94. Hojlyng N, Holten-Anderson W, Jepsen S. 1987.
Cryptosporidiosis: a case of airborne transmission. Lancet 2: 271-272.
95. Hooda P, Edwards A, Miller A. 2000. A review of water quality concerns in livestock farming areas. Sci. Total Environment 250: 143-167.
96. Hopp, T. P., K. R. Woods. 1981. Prediction of protein antigenic determinants from amino acid sequences. Proc. Natl. Acad. Sci. USA 78: 3824.
97. Horton R, Moran L, Ochs R, Rawn D, Scrimgeour G. 1996. Principles of Biochemistry; 2nd Edition. Prentice Hall.
98. Hunter P, Nichols G. 2002. Epidemiology and Clinical Features of Cryptosporidium Infection in Immunocompromised Patients. Clinical Microbiology Reviews 15(1): 145-154.
99. Hunter P, Quigly C. 1998. Investigation of an outbreak of cryptosporidiosis associated with treated surface water finds limits to the value of case control studies. Communicable Disease and Public Health 1(4): 234-238.
100. Isaac-Renton J, Blatherwick J, Robertson W, et al. 1999. Epidemic and endemic seroprevalance of antibodies to Cryptosporidium and Giardia in residents of three communities with different drinking water supplies. American J. of Tropical Medicine Hygiene 60(4): 578-583.
101. Jakubowski W. 1995. Crypto and Giardia: the details. Safe drinking water seminar, US EPA.
102. Jackson J, Tinsley R. 2005. Geographic and within population structure in variable resistance to parasite species and strains in a vertebrate host. International J. for Parasitology 35: 29-37.
103. Jameson, BA and Wolf, H. 1988. The antigenic index: a novel algorithm for predicting antigenic determinants. Bioinformatics 4: 181-186.
104. Joce R, Bruce J, Kiely D, et al. 1991. An outbreak of Cryptosporidiosis associated with a swimming pool. Epidemiology Infection 107: 497-508.
105. Joe A, Verdon R, Ward H, et al. 1998. Attachment of Cryptosporidium parvum Sporozoites to Human Intestinal Epithelial Cells. Infection & Immunity 66(7): 3429-3432.
106. Jolly C, Vourch C, Robert-Nicoud M, Morimoto R. 1999.
Intron-dependent association of splicing factors with active genes. J. of Cell Biology 145(6): 1133-1143.
107. Juranek D. 1995. Cryptosporidiosis: sources of infection and guidelines for prevention. Clinical Infectious Disease 21 Supplement 1: S57-S61.
108. Kaiser A, Gottwald A, Maier W, Seitz H. 2003. Targeting enzymes involved in spermidine metabolism of parasitic protozoa: a possible new strategy for anti-parasitic treatment. Parasitology Research 91: 508-516.
109. Keeling, P. 2004. Reduction and compaction in the Genome of the Apicomplexan Parasite Cryptosporidium parvum. Developmental Cell: 614-616.
110. Keithly J, Zhu G, Upton S, et al. 1997. Polyamine biosynthesis in C. parvum and its implications for chemotherapy. Molecular and Biochemical Parasitology 88: 35-42.
111. Keusch G, Joe A, Hamer D, Ward H, et al. 1995. Cryptosporidia who is at risk? Schweiz Med Wochenschr 125(18): 899-908.
112. Khan, O. 2003. A review of cryptosporidiosis. Johns Hopkins University.
113. Kimura M. 1983. The Neutral Theory of Molecular Evolution. Cambridge University Press.
114. Kimura M. 1980. A simple method for estimating the evolutionary rates of base substitutions through comparative studies of sequence analysis. J. of Molecular Evolution 16(2): 111-120.
115. Koji Lum J, Kaneko A, Tanabe K, Takahashi N, Bjorkman A, Kobayakawa T. 2003. Malaria dispersal among islands: human mediated Plasmodium falciparum gene flow in Vanuatu, Melanesia. Acta Tropica 90: 181-185.
116. Kyte, J. and Doolittle, R. 1982. A simple method for displaying the hydropathic character of a protein. J. Molecular Biology 157: 105-132.
117. Landfear S, Ullman B, Carter N, Sanchez M. 2004. Nucleoside and nucleobase transporters in parasitic protozoa. Eukaryotic Cell 3(2): 245-254.
118. LaGier M, Keithly J, Zhu G. 2002. Characterization of a novel transporter from C. parvum. International J. for Parasitology 32: 877-887.
119. Langer R, Riggs M. 1999. C.
parvum apical complex CSL contains a sporozoites ligand for intestinal epithelial cells. Infection & Immunity 67(10): 5282-5291.
120. Leav B, Mackay M, Ward H, et al. 2002. Analysis of Sequence Diversity at the Highly Polymorphic Cpgp40/15 Locus among Cryptosporidium Isolates from Human Immunodeficiency Virus-Infected Children in South Africa. Infection & Immunity 70(7): 3881-3890.
121. LeChevallier, M.W. et al., Occurrence of Giardia and Cryptosporidium spp. in surface water supplies. Applied and Environmental Microbiology 57(9): 2610-2616 (1991).
122. LeChevallier M.W. et al., Giardia and Cryptosporidium spp. in filtered drinking water supplies. Applied and Environmental Microbiology 57(9): 2617-2621 (1991).
123. Lederberg J. 1998. Emerging Infections: An Evolutionary Perspective. Emerging Infectious Disease 4(3): 366-370.
124. Leng X, Mosier D, Oberst R. 1996. Differentiation of Cryptosporidium parvum, C. muris and C. baileyi by PCR-RFLP analysis of the 18S rRNA gene. Veterinary Parasitology 62(1 and 2): 1-7.
125. Leoni F, Mallon M, Smith H, Tait A, McLauchlin J. 2007. Multilocus Analysis of Cryptosporidium hominis and Cryptosporidium parvum Isolates from Sporadic and Outbreak-Related Human Cases and C. parvum Isolates from Sporadic Livestock Cases in the United Kingdom. J. of Clinical Microbiology 45(10): 3286-3294.
126. Lewis, P. O., and Zaykin, D. 2001. Genetic Data Analysis: Computer program for the analysis of allelic data. Version 1.0 (d16c). Free program distributed by the authors over the internet from
127. Li J, Collins W, McCutchan T, et al. 2001. Geographic Subdivision of the Range of the Malaria Parasite, Plasmodium vivax. Emerging Infectious Diseases 7(1): Synopsis. 6123.
128. Liberles D, Wayne. 2002. Tracking adaptive evolutionary events in genomic sequences. Genome Biology 3(6): 1018.1-1018.4.
129. Lynch M. 2002. Intron evolution as a population genetic process. Proceedings National Academy Science USA 99(9): 6118-6124.
130.
MacKenzie W, Hoxie N, Davis J, et al. 1994. A Massive Outbreak in Milwaukee of Cryptosporidium Infection Transmitted through the Public Water Supply. New England J. of Medicine 331(3): 161-167.
131. MacPherson C. 2005. Human behavior and the epidemiology of parasitic zoonoses. International J. for Parasitology 35(11-12): 1319-1331.
132. Madern D, Cai X, Abrahamsen M, Zhu G. 2003. Evolution of C. parvum lactate dehydrogenase from malate dehydrogenase by a very recent event of gene duplication. Molecular Biology Evolution 21(3): 489-497.
133. Madigan M, Martinko J, Parker J. 2003. Brock Biology of Microorganisms. Prentice Hall.
134. Mallon M, MacLeod A, Wastling J, Smith H, Reilly B, Tait A. 2003. Population structures and the role of genetic exchange in the zoonotic pathogen Cryptosporidium parvum. J. Molecular Evolution 56: 407-417.
135. Mallon M, MacLeod A, Tait A, et al. 2003. Multi-locus genotyping of Cryptosporidium parvum Type 2: population genetics and sub-structuring. Infect. Genetic Evolution 3: 207-218.
136. Marr J, Nilsen T, Komuniecki R. 2003. Molecular Medical Parasitology. Elsevier Academic Press.
137. Marshall M, Naumovitz D, Ortega Y, Sterling C. 1997. Waterborne Protozoan Pathogens. Clinical Microbiology Reviews 10: 67-85.
138. McCole D, Eckman L, Laurent F, Kagnoff M. 2000. Intestinal epithelial cell apoptosis following Cryptosporidium parvum infection. Infection & Immunity 68(3): 1710-1713.
139. Mclaughlin J, Amar C, Pedraza-Diaz S, Nichols G. 2000. Molecular Epidemiological Analysis of Cryptosporidium spp. in the United Kingdom: Results of Genotyping Cryptosporidium spp. in 1,705 Fecal Samples from humans and 105 Fecal Samples from Livestock Animals. J. of Clinical Microbiology 38(11): 3984-3990.
140. McLaughlin J, Pedraza-Diaz S, Amar-Hoetzeneder C, Nichols G. 1999. Genetic Characterization of Cryptosporidium Strains from 218 Patients with Diarrhea Diagnosed as Having Sporadic Cryptosporidiosis. J.
of Clinical Microbiology 37(10): 3153-3158.
141.  McPherson M, Moller S. 2000. PCR. BIOS Scientific Publishers.
142.  Mele R, Morales M, Tosini F, Pozio E. 2004. C. parvum at different developmental stages modulates host cell apoptosis in vitro. Infection & Immunity 72(10):6061-6067.
143.  Millard P, Gensheimer K, Addiss D, et al. 1994. An outbreak of Cryptosporidiosis from fresh-pressed apple cider. JAMA 272:1592-1596.
144.  Morgan U. 2000. Detection and characterization of parasites causing emerging zoonoses. International J. for Parasitology 30:1407-1421.
145.  Morgan-Ryan U, Fall A, Xiao L, et al. Cryptosporidium hominis n. sp. (Apicomplexa: Cryptosporidiidae) from Homo sapiens. J. of Eukaryotic Microbiology 49(6): 433-440.
146.  Morgan U, Sargent K, Thompson R, et al. 1998. Molecular characterization of Cryptosporidium from various hosts. Parasitology 117: 31-37.
147.  Morgan U, Weber R, Deplazes P, et al. Molecular characterization of Cryptosporidium Isolates Obtained from Human Immunodeficiency Virus-Infected Individuals Living in Switzerland, Kenya and the United States. J. of Clinical Microbiology 38(3): 1180-1183.
148.  Morgan U, Xiao L, Thompson A, et al. 1999. Variation in Cryptosporidium: towards a taxonomic revision of the genus. International J. for Parasitology 29: 1733-1751.
149.  Mu J, Awadalla P, Su X, et al. 2007. Genome-wide variation and identification of vaccine targets in the Plasmodium falciparum genome. Nature Genetics 39(1): 126-130.
150.  Musser J. 1996. Molecular population genetic analysis of emerged bacterial pathogens: selected insights. Emerging Infectious Disease 2:1-17.
151.  Navin TR, Juranek DD. 1984. Cryptosporidiosis: clinical, epidemiologic, and parasitological review. Reviews Infectious Disease 6(3):313-27.
152.  Nei M. 1987. Molecular Evolutionary Genetics. Columbia University Press.
153.  Nime F, Burek D, Yardley J, et al. 1976. Acute enterocolitis in a human being infected with the protozoan Cryptosporidium.
Gastroenterology 70: 592-598.
154.  Nelson K, Williams C, Graham N. 2001. Infectious Disease Epidemiology; Theory and Practice. Aspen Publications.
155.  Ngouanesavanh T, Guyot K, Banuls A, et al. 2006. Cryptosporidium population genetics: evidence of clonality in isolates from France and Haiti. J. Eukaryotic Microbiology 53:S33-S36.
156.  O’Donoghue P. 1995. Cryptosporidium and cryptosporidiosis in man and animals. International J. of Parasitology 25: 139-195.
157.  Okhuysen P, Chappell C, Crabb J, Sterling C, DuPont H. 1999. Virulence of three distinct Cryptosporidium parvum isolates for healthy adults. J. Infectious Disease 180:1275-1281.
158.  Okhuysen P, Chappell C, Sterling C, Jakubowski W, DuPont H. 1998. Susceptibility and serologic response of healthy adults to re-infection with Cryptosporidium parvum. Infection & Immunity 66(2):441-3.
159.  Oleksiak M, Churchill G, Crawford D. 2002. Variation in gene expression within and among natural populations. Nature Genetics 32:261-266.
160.  Ohno S. 1984. Birth of a unique enzyme from an alternative reading frame of the preexisted, internally repetitious coding sequence. Proceedings National Academy Science USA 81: 2421-2425.
161.  O’Neil R, Lilien R, Donald B, et al. 2003. The crystal structure of DHFR-thymidylate synthase from C. hominis reveals a novel architecture for the bi-functional enzyme. J. of Eukaryotic Microbiology 50(s1): 555-556.
162.  Ong C, Eisler D, Isaac-Renton J, et al. 2002. Novel Cryptosporidium Genotypes in Sporadic Cryptosporidiosis Cases: First Report of Human Infections with a Cervine Genotype. Emerging Infectious Diseases 8(3): 263-268.
163.  Ong C, Isaac-Renton J, Fyfe M, et al. 1999. Molecular epidemiology of Cryptosporidiosis Outbreaks and Transmission in British Columbia, Canada. American J. Tropical Medicine Hygiene 61(1): 63-69.
164.  Oura C, Asiimwe B, Weir W, Lubega G, Tait A. 2005. Population genetic analysis and substructuring of Theileria parva in Uganda. Mol.
and Biochemical Parasitology 140(2): 229-239.
165.  Patel S, Pedraza-Diaz S, McLauchlin J, Casemore D. 1997. Molecular Characterization of Cryptosporidium parvum from two large suspected waterborne outbreaks. Communicable Disease and Public Health 1(4): 231-233.
166.  Pearson T, Busch J, Keim P. 2004. Phylogenetic discovery bias in Bacillus anthracis using single-nucleotide polymorphisms from whole-genome sequencing. Proceedings National Academy Science 101(37):13536-13541.
167.  Patz J, Graczyk T, Geller N, Vittor A. 2000. Effects of environmental change on emerging parasitic diseases. International J. for Parasitology 30:1395-1405.
168.  Pedraza-Diaz S, Amar C, Nichols G, McLauchlin J. 2001. Nested Polymerase Chain Reaction for Amplification of the Cryptosporidium Oocyst Wall Protein Gene. Emerging Infectious Diseases 7(1): 49-56.
169.  Peng M, Matos O, Gatei W, Xiao L. 2001. A Comparison of Cryptosporidium Sub-genotypes from several Geographic Regions. J. Eukaryotic Microbiology Supp.:28-29.
170.  Peng M, Xiao L, Beard C, et al. 1997. Genetic Polymorphism among Cryptosporidium parvum Isolates: Evidence of Two Distinct Human Transmission Cycles. Emerging Infectious Diseases 3(4):567-573.
171.  Pereira S, Ramirez N, Xiao L, Ward L. 2002. Pathogenesis of Human and Bovine Cryptosporidium parvum in Gnotobiotic Pigs. J. of Infectious Diseases 186: 715-718.
172.  Perryman L, Jasmar D, Riggs M, et al. 1996. A cloned gene of the Cryptosporidium parvum encodes neutralization-sensitive epitopes. Molecular and Biochemical Parasitology 80:137-147.
173.  Perz J, Le Blancq S. 2001. Cryptosporidium parvum infection involving novel genotypes in wildlife from lower New York State. App. and Environmental Microbiology 67: 1154-1162.
174.  Petersen C. 1992. Cryptosporidiosis in patients infected with the human immunodeficiency virus. Clinical Infectious Disease 15: 903-909.
175.  Petersen C, Gut J, Leech J. 1992. Characterization of a >900,000-Mr C.
parvum sporozoite glycoprotein recognized by protective hyperimmune bovine colostral immunoglobulin. Infection & Immunity 60(12):5132-5138.
176.  Polley L. 2005. Navigating parasite webs and parasite flow: Emerging and re-emerging parasitic zoonoses of wildlife origin. International J. for Parasitology 35(11-12): 1279-1294.
177.  Pozio E, Gomez M, Barbieri F, La Rosa G. 1992. Cryptosporidium: different behavior in calves of isolates of human origin. Transactions Royal Society, Tropical Medicine Hygiene 86: 636-638.
178.  Priest J, Li A, Khan M, Arrowood M, Lammie P, Ong C, Roberts J, Isaac-Renton J. 2001. Enzyme immunoassay detection of antigen-specific immunoglobulin G antibodies in longitudinal serum samples from patients with cryptosporidiosis. Clinical Diagnostic Lab Immunology (2):415-23.
179.  Priest J, Kwon J, Lammie P, et al. 1999. Detection by Enzyme Immunoassay of Serum Immunoglobulin G Antibodies That Recognize Specific Cryptosporidium parvum Antigens. J. of Clinical Microbiology 37(5): 1385-1392.
180.  Quiroz E, Bern J, Lal A, et al. 2000. An outbreak of cryptosporidiosis linked to a food handler. J. of Infectious Disease 181: 695-700.
181.  Ramirez N, Ward L, Sreevatsan S. 2004. A review of the biology and epidemiology of cryptosporidiosis in humans and animals. Microbes and Infection 6: 773-785.
182.  Reid S, Hoe N, Smoot M, Musser J. 2001. Group A streptococcus: allelic variation, population genetics, and host pathogen interactions. J. Clinical Investigations 107:393-399.
183.  Rich S, Hudson R, Ayala F. 1997. Plasmodium falciparum antigenic diversity: Evidence of clonal population structure. Proceedings National Academy Science 94:13040-13045.
184.  Rickard L, Siefker C, Boyle C, Gentz E. 1999. The prevalence of Cryptosporidium and Giardia spp. in fecal samples from free-ranging white-tailed deer (Odocoileus virginianus) in the southeastern United States. J. Veterinary Diagn. Investigations 11:65-72.
185.  Riley L. 2004.
Molecular Epidemiology of Infectious Disease. American Society for Microbiology Publishing. Washington, DC, U.S.A.
186.  Rochelle P, Jutras E, Atwill E, De Leon R, Stewart M. 1999. Polymorphisms in the beta-tubulin gene of Cryptosporidium parvum differentiate between isolates based on animal host but not geographic origin. J. of Parasitology 85: 986-989.
187.  Roderic D. 2005. TreeViewX Version 0.5.0. www.darwin.zoology.
188.  Roe A, Sperling F. 2007. Population structure and species boundary delimitation of cryptic Dioryctria moths: an integrative approach. Molecular Ecology 16:3617-3633.
189.  Ryan M, Sundberg J, Sauerschell R, Todd K. 1986. Cryptosporidium in wild cottontail rabbit (Sylvilagus floridanus). J. Wildlife Disease 22:267.
190.  Sasahara T, Maruyama H, Inoue M, et al. 2003. Apoptosis of intestinal crypt epithelium after C. parvum infection. J. of Infectious Chemotherapy 9:278-281.
191.  Schaechter M, Engleberg C, Eisenstein B, Medoff G. 1999. Mechanisms of Microbial Disease; 3rd Edition. Lippincott, Williams and Wilkins.
192.  Schork N, Fallin D, Lanchbury J. 2000. Single nucleotide polymorphism and the future of genetic epidemiology. Clinical Genetics 58:250-264.
193.  Sestak K, Ward L, Sheoran A, Feng X, Akiyoshi D, Tzipori S. 2002. Variability among Cryptosporidium parvum genotype 1 and 2 immunodominant surface glycoproteins. Parasite Immunology 24:213-219.
194.  Shankhar S, Park Y. 2002. Genetic Structure of Six Korean Tea Populations as Revealed by RAPD-PCR Markers. Plant Genetic Resources 42:594-601.
195.  Shirley M, Harvey D. 2000. A Genetic Linkage Map of the Apicomplexan Protozoan Parasite Eimeria tenella. Genome research,
196.  Simpson V. 2002. Wild animals as reservoirs of infectious diseases. Veterinary Journal 163:128-146.
197.  Sischo W, Atwill E, Lanyon L, George J. 2000. Cryptosporidia on dairy farms and the role these farms may have in contaminating surface water supplies in the northeastern United States. Prev.
Veterinary Medicine 43:253-267.
198.  Slifko T, Smith H, Rose J. 2000. Emerging parasite zoonoses associated with water and food. International J. for Parasitology 30:1379-1393.
199.  Smith LM, Priest JW, Lammie PJ, Mead JR. 2001. Human T and B cell immunoreactivity to a recombinant 23-kDa Cryptosporidium parvum antigen. J. Parasitol. 87(3):704-7.
200.  Sorvillo F, Fujioka K, Mascola R, et al. 1992. Swimming-associated Cryptosporidiosis. American J. Public Health 82(5):742-744.
201.  Spano F, Casemore D, Cristani A, et al. 1997. PCR-RFLP analysis of the Cryptosporidium oocyst wall protein (COWP) gene discriminates between C. parvum isolates of human and animal origin. FEMS Microbiology Letters 150: 209-217.
202.  Spano F, Putignani L, Widmer G, et al. 1998. Multilocus Genotypic Analysis of Cryptosporidium parvum Isolates from Different Hosts and Geographical Origins. J. of Clinical Microbiology 36(11): 3255-3259.
203.  Sreter T, Kovacs G, Varga I, et al. 2000. Morphologic, Host Specificity, and Molecular Characterization of a Hungarian Cryptosporidium meleagridis Isolate. App. and Environmental Microbiology 66(2): 735-738.
204.  Straub T, Daly D, Chandler D, et al. 2002. Genotyping Cryptosporidium parvum with an hsp70 Single-Nucleotide Polymorphism Microarray. App. and Environmental Microbiology 68(4): 1817-1826.
205.  Striepen B, Pruijssers A, Kissinger J, et al. 2004. Gene transfer in the evolution of parasite nucleotide biosynthesis. Proceedings National Academy Science 101(9):3154-3159.
206.  Strong W, Gut J, Nelson R. 2000. Cloning and Sequence Analysis of a Highly Polymorphic Cryptosporidium parvum Gene Encoding a 60-Kilodalton Glycoprotein and Characterization of Its 15- and 45-Kilodalton Zoite Surface Antigen Products. Infection & Immunity 68(7): 4117-4134.
207.  Sturbaum G, Jost H, Sterling C. 2003. Nucleotide changes within three Cryptosporidium parvum surface protein encoding genes differentiate genotype 1 from genotype 2 isolates.
Molecular and Biochemical Parasitology 128: 87-90.
208.  Sturdee A, Bodley-Tickell, Archer A, Chalmers R. 1993. Long-term study of Cryptosporidium prevalence on a lowland farm in the United Kingdom. Vet. Parasitology 45: 209-213.
209.  Sturdee A, Chalmers R, Bull S. 1999. Detection of Cryptosporidium oocysts in wild mammals of mainland Britain. Veterinary Parasitology 80:273-280.
210.  Sulaiman I, Morgan U, Xiao L, et al. 2000. Phylogenetic Relationships of Cryptosporidium Parasites Based on the 70-Kilodalton Heat Shock Protein (HSP70) Gene. App. and Environmental Microbiology 66(6): 2385-2391.
211.  Sulaiman I, Xiao L, Lal A. 1999. Evaluation of Cryptosporidium parvum genotyping techniques. App. and Environmental Microbiology 65: 4431-4435.
212.  Sulaiman I, Xiao L, Lal A, et al. 1998. Differentiating human from animal isolates of Cryptosporidium parvum. Emerging Infectious Disease 4: 681-685.
213.  Sundberg J, Hill D, Ryan M. 1982. Cryptosporidiosis in a gray squirrel. J. American Veterinary Med. Association 181:1420-1422.
214.  Tamura K, Nei M, Kumar S. 2004. Prospects for inferring very large phylogenies by using the neighbor-joining method. Proceedings National Academy Science 101(30): 11030-11035.
215.  Tanabe K, Sakihama N, Kaneko A. 2004. Stable SNPs in Malaria Antigen Genes in Isolated Populations. Science 303: 493.
216.  Tanriverdi S, Arslan M, Akiyoshi D, Tzipori S, Widmer G. 2003. Identification of genotypically mixed Cryptosporidium parvum populations in humans and calves. Molecular Biochemical Parasitology 130:13-22.
217.  Tanriverdi S, Blain J, Deng B, Ferdig M, Widmer G. 2007. Genetic crosses in the apicomplexan parasite Cryptosporidium parvum define recombination parameters. Molecular Microbiol. 63:1432-1439.
218.  Tanriverdi S, Grinberg A, Chalmers M, Widmer G, et al. 2008. Inferences about the Global Population Structures of Cryptosporidium parvum and Cryptosporidium hominis. App. and Environmental Microbiology 74(23):7227-7234.
219.  Tanriverdi S, Markovics A, Arslan M, Itik A, Shkap V, Widmer G. 2006. Emergence of distinct genotypes of Cryptosporidium parvum in structured host populations. App. and Environmental Microbiology 72:2507-2513.
220.  Templeton T, Lancto C, Abrahamsen M, et al. The Cryptosporidium oocyst wall protein is a member of a multigene family and has homology in Toxoplasma. Infection & Immunity 72(2): 980-987.
221.  Teodorovic S, Braverman J, Elmendorf H. 2007. Unusually Low Levels of Genetic Variation among Giardia lamblia Isolates. Eukaryotic Cell 6(8): 1421-1430.
222.  Thompson A. 2004. The zoonotic significance and molecular epidemiology of Giardia and giardiasis. Veterinary Parasitology 126(1-2): 15-35.
223.  Tibayrenc M, Kjellberg F, Ayala F. 1990. A clonal theory of parasitic protozoa: The population structures of Entamoeba, Giardia, Leishmania, Naegleria, Plasmodium, Trichomonas, and Trypanosoma and their medical and taxonomical consequences. Proceedings National Academy Science 87: 2414-2418.
224.  Traub R, Monis P, Robertson I. 2005. Molecular epidemiology: a multidisciplinary approach to understanding parasitic zoonoses. International J. for Parasitology 35(11-12): 1295-1307.
225.  Turakulov R, Easteal S. 2003. Number of SNPs Loci Needed to Detect Population Structure. Human Heredity 55:37-45.
226.  Tyzzer E. 1907. A sporozoan found in the peptic glands of the common mouse. Proceedings Soc. Experimental Biology Medicine: 12-13.
227.  Tzipori S. 1988. Cryptosporidiosis in perspective. Advances in Parasitology 27:63-119.
228.  Umejiego N, Li C, Riera T, et al. 2004. C. parvum IMP dehydrogenase. J. of Biological Chemistry 279(39): 40320-40327.
229.  Ungar B, Ward D, Fayer R, Quinn C. 1990. Cessation of Cryptosporidium-associated diarrhea in an acquired immunodeficiency syndrome patient after treatment with hyperimmune bovine colostrum. Gastroenterology 98(2):486-9.
230.  Upton S. 2003. Basic biology of Cryptosporidium.
Parasitology Laboratory, Kansas State University.
231.  VanLin L, Pace T, et al. 2001. Interspecies conservation of gene order and intron-exon structure in a genomic locus of high gene density and complexity in Plasmodium. Nucleic Acids Research 29(10): 2059-2068.
232.  Vasquez J, Gooze L, Nelson C, et al. 1996. Potential antifolate resistance determinants and genotypic variation in the bifunctional dihydrofolate reductase-thymidylate synthase gene from human and bovine isolates of Cryptosporidium parvum. Molecular and Bio. Parasitology 79: 153-165.
233.  Volkman S, Hartl D, Nielsen K, Winzeler E, et al. 2002. Excess Polymorphisms in Genes for Membrane Proteins in Plasmodium falciparum. Science 298:216-218.
234.  University of Lethbridge, Alberta, British Columbia.
235.  Weir B. 1990. Genetic Data Analysis. Sinauer Associates Press.
236.  Whitaker R, Grogan D, Taylor J. 2003. Geographic Barriers Isolate Endemic Populations of Hyperthermophilic Archaea. Science 301(5635):976-978.
237.  Widmer G, Tchack L, Chappell C, Tzipori S. 1998. Sequence Polymorphism in the β-tubulin Gene Reveals Heterogeneous and Variable Population Structures in Cryptosporidium parvum. App. and Environmental Microbiology 64(11): 4477-4481.
238.  Widmer G, Lin L, Kapur, Feng X, Abrahamsen M. 2002. Genomics and genetics of Cryptosporidium parvum: the key to understanding cryptosporidiosis. Microbes and Infection 4: 1081-1090.
239.  Willocks, Crampin A, Lightfoot N, et al. 1998. A large outbreak of cryptosporidiosis associated with a public water supply from a deep chalk borehole. Communicable Disease and Public Health 1(4): 239-243.
240.  World Health Organization (WHO).
241.  Xiao L, Sulaiman IM, Ryan UM, Zhou L, Atwill ER, Tischler ML, Zhang X, Fayer R, Lal A. Host adaptation and host-parasite co-evolution in Cryptosporidium: implications for taxonomy and public health. International J. for Parasitology 32(14):1773-85.
242.  Xiao L, Bern H, Checkley J, et al. 2001.
Identification of 5 types of Cryptosporidium parasites in children in Lima, Peru. J. of Infectious Disease 183: 492-497.
243.  Xiao L, Morgan U, Altaf L, et al. 1999. Genetic diversity within Cryptosporidium parvum and Related Cryptosporidium Species. App. and Environmental Microbiology 65(8): 3386-3391.
244.  Xiao L, Morgan U, Lal A, et al. 2000. Cryptosporidium systematics and implications for public health. Parasitology Today 16: 287-292.
245.  Xu P, et al. 2004. The genome of Cryptosporidium hominis. Nature 431: 1107-1112. Letters to Nature.
246.  Zhu G, Marchewka M, Keithly J. 2000. Cryptosporidium parvum appears to lack a plastid genome. Microbiology 146: 315-321.

APPENDICES

Appendix 1. Cryptosporidium hominis genome characterized in comparison to that of C. parvum and P. falciparum (245).

Figure A.1

Table 1. C. hominis genome summary

                                   C. hominis   C. parvum     P. falciparum
(a) The genome
  Size (Mb)                        9.16         9.11          22.85
  No. of physical gaps             246          5             93
  No. of contigs*                  1,413        [illegible]   n.a.
  (G+C) content (%)                31.7         30.3          19.4
 Coding regions†
  Coding size (Mb)                 6.29         6.80          12.03
  Percentage coding                69           74            53
  (G+C) content (%)                32.3         31.9          23.7
  No. of genes                     3,994        3,952         5,268
  Mean gene length (bp)            1,576        1,720         2,283
  Gene density (bp per gene)       2,293        2,305         4,338
  Genes with introns (%)‡          5-20         5             54
  Hits nr§                         2,331        2,483         n.d.
  Percentage hits nr               58           63            n.d.
 Intergenic regions
  Non-coding size (Mb)             2.87         2.32          10.83
  Percentage not coding            31           25            47
  (G+C) content (%)                30.3         25.6          14.6
  No. of intergenic regions        4,003        3,960         6,392
  Mean length (bp)                 716          585           1,694
 RNAs
  No. of tRNA genes                45           45            43
  No. of 5S rRNA genes             6            6             3
  No. of 5.8S, 18S and 28S         5            5             7
(b) The proteome
  Total predicted proteins         3,994        3,952         5,268
  Hypothetical proteins            2,779        2,567         3,208
 Gene ontology
  Biological process               1,239        n.d.          1,613
  Cellular component               1,265        n.d.          1,586
  Molecular function               1,235        n.d.          1,625
 Structural features
  Transmembrane domain             786          n.d.          1,631
  Signal peptide                   421          n.d.          544
  Signal anchor                    221          n.d.          367

* An additional 673 very short contigs are not assembled and probably include contaminant sequences. † Excluding introns. ‡ Estimated intron content from expressed sequence tags. § Hits, or putative proteins in the non-redundant protein database. Hypothetical proteins, proteins without sufficient similarity to any other gene to permit functional assignment; n.a., not applicable; n.d., not determined; physical gaps, those that no existing clone closes; transmembrane domains, TMHMM, Trans Membrane Hidden Markov Model (for prediction of transmembrane domains in proteins); signal peptide and signal anchor, SignalP-2.0. C. parvum and C. hominis genomes were annotated with identical strategies to permit comparison.

Figure A.1. C. hominis genome summary, obtained from: Ping Xu, Giovanni Widmer, Yingping Wang, Luiz S. Ozaki, Joao M. Alves, Myrna G. Serrano, Daniela Puiu, Patricio Manque, Donna Akiyoshi, Aaron J. Mackey, William R. Pearson, Paul H. Dear, Alan T. Bankier, Darrell L. Peterson, Mitchell S. Abrahamsen, Vivek Kapur, Saul Tzipori and Gregory A. Buck. The genome of Cryptosporidium hominis. Nature 431, 1107-1112 (28 October 2004). doi: 10.1038/nature02977

Appendix 2. Simplified sketch of spatially structured species.

Figure A.2

Figure A.2. Simplified sketch of spatially structured species, adapted from Connor and Hartl (47). Shaded areas, denoted B, are where the organisms live and are considered subpopulations of the metapopulation. The area within which some gene flow occurs is denoted by A and can range greatly in size.

Appendix 3. Applied Biosystems 3130xl automated sequence analyzer.

Figure A.3

Figure A.3. Automated DNA sequencer, model 3130xl, used for fragment analysis of Cryptosporidium hominis subpopulations.
With the SNaPshot protocol, a full 96-well plate (96 individual samples) could be run in less than 90 minutes, with each sample genetically typed for up to 8-12 SNPs.

Appendix 4. LIZ120 size standard profile for sizing fragments using SNaPshot single base extension chemistry.

Figure A.4

[Electropherogram of the LIZ120 size standard, with an accompanying table listing dye/sample peak, minutes, size, peak height, peak area and data point for each peak.]

Figure A.4. Electropherogram of LIZ120 size standard, Dye Set E, with predefined peaks ranging in size from 15 nt to 120 nt (shown in table below electropherogram). Acted as the reference against which Cryptosporidium fragments were sized.

Appendix 5. Example Protean profiles: Cp23 and Gp60 loci.

Figure A.5 A, Cp23 profile and B, Gp60 profile.

A. Cp23 ORF bio-physical profile.

[Protean plot. Tracks: alpha, beta, turn and coil regions (Garnier-Robson and Chou-Fasman); Kyte-Doolittle hydrophilicity plot; alpha and beta amphipathic regions (Eisenberg); flexible regions (Karplus-Schulz); Jameson-Wolf antigenic index; Emini surface probability plot.]

Figure A.5, A. Complete depiction of Cp23 open reading frame in regards to biophysical properties including Jameson-Wolf antigenic index (pink), Emini surface probability (bottom: yellow), Kyte-Doolittle hydrophilicity plot (blue, middle), and multiple secondary structure predictors (top: red, blue, yellow).
Within the Protean program, an individual SNP locus could be targeted, generating a mathematical output for its position.

Appendix 5, Figure A.5 continued.

B. Gp60 ORF bio-physical profile.

[Protean plot. Tracks: alpha, beta, turn and coil regions (Garnier-Robson and Chou-Fasman); Kyte-Doolittle hydrophilicity plot; alpha and beta amphipathic regions (Eisenberg); flexible regions (Karplus-Schulz); Jameson-Wolf antigenic index; Emini surface probability plot.]

Figure A.5, B. Complete depiction of Gp60 open reading frame in regards to biophysical properties including Jameson-Wolf antigenic index (pink), Emini surface probability (bottom: yellow), Kyte-Doolittle hydrophilicity plot (blue, middle), and multiple secondary structure predictors (top: red, blue, yellow). Within the Protean program, an individual SNP locus could be targeted, generating a mathematical output for its position.

Appendix 6. Alleles scored per SNP marker per subpopulation.

Table A.1 Number of alleles sampled per SNP locus and population.

SNP locus   Australia   Kenya   Peru   Scotland   Total
BT1         2           1       1      1          2
BT4         1           1       1      1          1
BT3         2           2       2      2          3
BT5         n.d.        1       1      1          3
BT7         2           2       1      1          3
BT8         2           1       1      2          2
COWP5       2           1       1      1          2
COWP6       2           1       2      1          3
COWP1       2           1       1      1          2
COWP3       2           1       2      1          3
COWP7       1           1       2      1          2
23Cp4       2           1       1      1          2
23Cp3       2           1       1      1          2
23Cp1       2           1       1      1          2
23Cp5       2           1       1      1          2
23Cp6       1           2       2      1          1
18sRNA      1           1       1      1          1
18sRNA      1           1       1      1          1
HSP14       2           1       1      1          2
HSP17       2           2       1      1          2
HSP19       1           2       2      1          1
HSP20       2           1       1      1          2
HSP22       2           1       1      1          2
60Gp80      2           2       2      n.d.       2
60Gp108     2           2       2      1          2
60Gp126     2           1       2      1          1
60Gp79      1           2       1      1          2
60Gp98      2           3       3      1          3
60Gp115     2           3       3      3          3
LDH10       2           1       1      1          2
LDH3        2           1       1      1          2
MDH8        2           1       1      1          2
MDH7        2           1       1      1          2
EMAg29      1           1       1      1          1
EMAg27      1           1       1      1          1
UPRTase     2           1       1      1          2
UPRTase     2           1       1      1          2

Table A.1. Number of different alleles scored at each SNP molecular marker for each subpopulation as a whole; the possible range is 1 to 4 (A, C, T or G). The most scored for any marker was 3.

Appendix 7. Future target protein loci for SNP-typing; multi-plex PCR gel electrophoresis, run against 100 bp ladder.

Figure A.6

A. Top Gel: Isolates , 115 reaction set 6; CCR, CTCL, AAD at 953, 678, and 817 bp respectively K
Bottom Gel: Isolates , 115 reaction set 7; CLL, SSK, FLJ at 983, 507, and 849 bp respectively. K

B. Isolates , 114 reaction set 9; Exp, SeroAg, RIK at 949, 777, and 573 bp respectively. P

Appendix 8. GenomiPhi; whole genome amplification of Cryptosporidium DNA from fecal specimens.

An initial objective of the present study was to amplify genomic DNA in its entirety to assess the potential of isolating molecular markers without having to amplify gene-specific regions. This would increase the throughput of the typing system almost 2-fold and be extremely cost-effective, since it negates the need for gene-specific primers and accompanying reagents and materials. In addition, it was designed to generate large amounts of DNA, which can be especially useful when the amount of a given sample is small.
The system used was the GenomiPhi kit by GE Healthcare. The GenomiPhi kit utilizes bacteriophage Phi29 DNA polymerase to exponentially amplify single- or double-stranded linear DNA templates via a strand displacement reaction, so thermal cycling is not required. The genomic DNA template is combined with a sample buffer containing random hexamer primers. The mixture is heat denatured and cooled to allow random priming of the hexamers. Then the remaining reaction components, including Phi29 DNA polymerase, deoxynucleotide triphosphates, and buffer components optimized for linear DNA synthesis, are added. This reaction mixture is incubated overnight at 30°C, during which time the available nucleotides are consumed and converted into high molecular weight fragment copies of the template DNA. The DNA replication is extremely accurate because of the proofreading activity of Phi29 DNA polymerase. Once genome amplification is completed, various genotyping assays can be undertaken from a large base of synthetic DNA copies.

We spent much time with the system attempting to evaluate it, finesse the procedure and obtain reproducibility. While the system design is beautiful in its theory and simplicity, and amplification results were positive in those samples that did amplify, there was little confidence that downstream typing results were specific to Cryptosporidium genomic DNA. This is largely because isolates were collected and processed from fecal specimens of patients under different protocols in facilities around the world, so the likelihood that genetic material from other microorganisms, naturally occurring or invasive, was amplified as well was considered too high.

Appendix 8 continued.

Figure A.7 Simplified schematic of GenomiPhi protocol.

[Schematic: input DNA (isolated DNA or cell lysate) is combined with sample buffer, heated, then cooled to 4°C on ice; reaction buffer and enzyme are added and mixed; the reaction is incubated and the enzyme is then heat-inactivated for 10 min.]

Figure A.7. The GenomiPhi protocol is engineered on the basis of amplification of genomic DNA material using the whole genome as a template.

Appendix 9. eBURST; inferring patterns of evolutionary descent.

Figure A.8 eBURST representation of evolutionary descent.

[Diagram, panels A and B: sequence types (STs) linked to a founding genotype, with single-locus variants (SLVs) and double-locus variants (DLVs) marked.]

Figure A.8. The primary founder of a group is defined as the ST that differs from the largest number of other STs at only a single locus (i.e. the ST that has the greatest number of single-locus variants; SLVs). This method of assigning the primary founder takes account of the way in which clones emerge and diversify (Figure A); the initial diversification of the founding genotype of a clonal complex will result in variants of the founder that differ at only one of the seven loci (i.e., SLVs of the founder). The eBURST diagrams display the patterns of descent within each group from the predicted founding ST (Figure B).

END OF DISSERTATION; JMW, 2009
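Supplementary sketch for Appendix 9. The founder rule stated in the Figure A.8 caption (the primary founder is the ST with the greatest number of single-locus variants) can be illustrated with a few lines of Python. This is a minimal sketch under stated assumptions, not the eBURST program itself: the seven-locus allelic profiles and the ST names below are hypothetical, and ties are broken simply by name, whereas eBURST applies further criteria.

```python
from itertools import combinations


def locus_differences(a, b):
    """Count the loci at which two allelic profiles differ."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same loci")
    return sum(x != y for x, y in zip(a, b))


def slv_counts(profiles):
    """Map each ST name to its number of single-locus variants (SLVs)."""
    counts = {name: 0 for name in profiles}
    for (n1, p1), (n2, p2) in combinations(profiles.items(), 2):
        if locus_differences(p1, p2) == 1:
            counts[n1] += 1
            counts[n2] += 1
    return counts


def primary_founder(profiles):
    """Return the ST with the most SLVs (ties broken by name; a simplification)."""
    counts = slv_counts(profiles)
    return max(sorted(counts), key=lambda name: counts[name])


# Hypothetical seven-locus profiles, for illustration only.
EXAMPLE_PROFILES = {
    "ST1": (1, 1, 1, 1, 1, 1, 1),
    "ST2": (2, 1, 1, 1, 1, 1, 1),  # SLV of ST1
    "ST3": (1, 3, 1, 1, 1, 1, 1),  # SLV of ST1
    "ST4": (1, 1, 1, 4, 1, 1, 1),  # SLV of ST1
    "ST5": (2, 1, 1, 1, 1, 1, 5),  # SLV of ST2, double-locus variant of ST1
}

if __name__ == "__main__":
    print(slv_counts(EXAMPLE_PROFILES))
    print(primary_founder(EXAMPLE_PROFILES))
```

In this hypothetical group ST1 has three SLVs (ST2, ST3, ST4) and is therefore returned as the predicted founder; ST5 is linked to it only through ST2.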

