UBC Theses and Dissertations

The role of defensins and C-X-C chemokines in mammalian innate immunity Rehaume, Linda Marie 2010

THE ROLE OF DEFENSINS AND C-X-C CHEMOKINES IN MAMMALIAN INNATE IMMUNITY

by

Linda Marie Rehaume

B.Sc., University of Victoria (with distinction), 2001

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY in The Faculty of Graduate Studies (Microbiology & Immunology)

THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver)

November 2010

© Linda Marie Rehaume, 2010

ABSTRACT

In humans, defensins constitute the largest group of host defence peptides that are evolutionarily conserved components of innate immunity. Defensins share many structural and functional characteristics with C-X-C chemokines, including a C-X-C amino acid motif, net positive charge, disulphide bonding, three-dimensional shape and chemokine activity. Deficiencies in α-defensins and C-X-C chemokines have been correlated with susceptibility to infection and chronic inflammatory diseases. However, the genetics and diversity of defensins and the mechanisms underlying these disorders were not well understood. This thesis comprises three separate but overlapping approaches to address these issues. The genomic content of murine α-defensins within the reference C57BL/6J strain was characterized. Novel α-defensin (11) and defensin-related cryptdin (3) genes were found, as were gene duplications and differences in genomic content between strains of mice. A next-generation sequencing method was developed for the quantitative analysis of α-defensin and defensin-related cryptdin gene expression. The α-defensin DEFA1 induced interleukin (IL) 8 and IL10 release from human PBMCs. The mechanism(s) of action of defensins, which appear to involve induction of chemokines and anti-inflammatory cytokines, need further elucidation in vivo. Consequently, novel murine models of inflammation and immunosuppression were developed.
The IL8 and Il10 genes were separately cloned, behind an intestine-specific promoter, into eukaryotic expression vectors, which were used to transfect murine embryonic stem cells. Correct targeting was confirmed for both constructs and germline transmission was achieved for the IL8 mice. Conditional homozygous mice were generated, which, upon breeding with Cre-expressing mice, will express IL8, a C-X-C chemokine, in an intestine-specific manner. This will enable analyses of the effects of chemokine overexpression on intestinal infection, and on peptide efficacy in the resolution of infection. In other studies to address innate immune mechanisms, the transcriptional profiles of patients susceptible to Salmonella and mycobacterial infections due to immunodeficiencies in IL12- and interferon-γ-mediated immunity were generated. These data indicated that the chemokines CXCL9 and CXCL10 might mediate immunity to Mycobacteria, whereas additional defects in TLR4 responses appeared to underlie susceptibility to Salmonella. The data presented here strengthen our understanding of the murine defensin repertoire and provide tools that enable sophisticated systems-level studies of in vivo function.

PREFACE

A version of Chapter 2 has been published in: Amid, C.*, Rehaume, L.M.*, Brown, K.L., Gilbert, J.G.R., Dougan G., Hancock, R.E.W. and Harrow J.L. 2009. Manual annotation and analysis of the defensin gene cluster in the C57BL/6J mouse reference genome. BMC Genomics 10:606. Authors retain copyright. * Equal contribution.

Manual annotation of the defensin gene cluster in the C57BL/6J mouse reference genome was performed by Clara Amid in the Human and Vertebrate Analysis and Annotation (HAVANA) group at the Wellcome Trust Sanger Institute (WTSI). Subsequent analyses and writing of the manuscript were shared by Linda Rehaume and Clara Amid, with overall revisions being performed by our supervisors. The text has been significantly revised in this thesis.
James Gilbert originally created Figure 2.1, which was further modified in collaboration with Clara Amid. Figures 2.10 and 2.11 were generated by Clara Amid. The defensin transcriptional and functional experiments were performed by Linda Rehaume, and have not been published.

Ethical approval was obtained for the work in the thesis involving human material. The research in Chapter 2 is covered by The University of British Columbia Clinical Research Ethics Board Certificate H04-70232. The research in Chapter 4 is covered by The Royal Free Hampstead National Health Service Trust Research Ethics Committee Certificate 04/Q0501/119. Approval for the research carried out in Chapter 4 was also obtained from the Wellcome Trust Sanger Institute Human Material and Data Management Committee.

Throughout the thesis, genes and proteins are referred to using approved nomenclature. This means that references to previous studies, where non-approved nomenclature was used, have been changed accordingly, although the non-approved nomenclature is also indicated in brackets for clarity. The distinction between human and mouse gene and protein symbols is made by capitalization. Human symbols contain all upper-case letters and mouse symbols begin with an upper-case letter followed by lower-case letters.

TABLE OF CONTENTS

ABSTRACT
PREFACE
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
ABBREVIATIONS
ACKNOWLEDGEMENTS
DEDICATION
1 INTRODUCTION
1.1 MODULATORS OF THE IMMUNE SYSTEM
1.2 DEFENSINS
1.3 PEPTIDE THERAPEUTICS IN MODELS OF INFLAMMATION
1.4 INTERLEUKIN 8
1.5 INTERLEUKIN 10
1.6 CYTOKINES IN GASTROINTESTINAL DISEASE
1.7 PROJECT GOALS AND HYPOTHESES
2 α-DEFENSIN GENE ANNOTATION, QUANTITATIVE TRANSCRIPT EXPRESSION AND FUNCTIONAL ACTIVITY
2.1 INTRODUCTION
2.2 METHODS AND MATERIALS
2.3 RESULTS
2.4 DISCUSSION
3 TARGETED KNOCK-IN OF INTERLEUKIN 8 AND INTERLEUKIN 10 INTO MURINE EMBRYONIC STEM CELLS TO GENERATE MICE WITH CONDITIONAL INTESTINAL-SPECIFIC GENE EXPRESSION
3.1 INTRODUCTION
3.2 METHODS AND MATERIALS
3.3 RESULTS
3.4 DISCUSSION
4 CHARACTERIZATION OF TRANSCRIPTIONAL PROFILES INVOLVED IN INTERLEUKIN 12- AND INTERFERON-γ-MEDIATED PRIMARY IMMUNODEFICIENCIES
4.1 INTRODUCTION
4.2 METHODS AND MATERIALS
4.3 RESULTS
4.4 DISCUSSION
5 DISCUSSION AND CONCLUSIONS
REFERENCES
APPENDICES
A. SUPPLEMENTARY INFORMATION FOR CHAPTER 2
B. SUPPLEMENTARY INFORMATION FOR CHAPTER 3
C. SUPPLEMENTARY INFORMATION FOR CHAPTER 4

LIST OF TABLES

TABLE 2.1. PRIMER SEQUENCES FOR UNIVERSAL DEFENSIN EXPRESSION PROFILING BY CAPILLARY SEQUENCING
TABLE 2.2. PRIMER SEQUENCES FOR UNIVERSAL DEFENSIN EXPRESSION PROFILING BY 454 AMPLICON SEQUENCING
TABLE 2.3. GENES ANNOTATED WITHIN THE α-DEFENSIN AND β-DEFENSIN CLUSTERS ON C57BL/6 MOUSE CHROMOSOME 8
TABLE 2.4. PSEUDOGENES ANNOTATED WITHIN THE α-DEFENSIN AND β-DEFENSIN CLUSTERS ON C57BL/6 MOUSE CHROMOSOME 8
TABLE 2.5. PFAM QUERY RESULTS FOR THE DEFENSIN-RELATED CRS PEPTIDE SEQUENCES
TABLE 2.6. SUMMARY OF THE BEST NON-MOUSE BLAST HITS FOR CRS PEPTIDE SEQUENCES
TABLE 2.7. GENOME BROWSER COMPARISON OF MOUSE α-DEFENSIN GENES
TABLE 2.8. DEFENSIN GENES CURRENTLY ‘MISSING’ FROM THE MOUSE REFERENCE GENOME
TABLE 2.9. UNIVERSAL DEFENSIN CAPILLARY SEQUENCING SUMMARY
TABLE 2.10. UNIVERSAL DEFENSIN TRANSCRIPT EXPRESSION DETERMINED BY CAPILLARY SEQUENCING OF TOPO CLONES
TABLE 2.11. UNIVERSAL DEFENSIN 454 AMPLICON DEEP SEQUENCING SUMMARY
TABLE 2.12. UNIVERSAL DEFENSIN TRANSCRIPT EXPRESSION DETERMINED BY 454 AMPLICON DEEP SEQUENCING
TABLE 2.13. BLASTX ANALYSIS OF 454 TRANSCRIPTS AGAINST THE MATURE CRP PEPTIDES
TABLE 2.14. SPECIES-SPECIFIC DEFENSIN NOMENCLATURE
TABLE 3.1. PRIMER SEQUENCES FOR THE GENERATION AND CONFIRMATION OF IL8 AND IL10 KNOCK-IN MICE AT THE GPA33 GENOMIC LOCUS
TABLE 3.2. IL8_PCR-BLUNTII-TOPO CLONE SEQUENCING RESULTS SUMMARY
TABLE 3.3. IL10_PCR-BLUNTII-TOPO CLONE SEQUENCING RESULTS SUMMARY
TABLE 3.4. IL8_PA33LSL CLONE SEQUENCING RESULTS SUMMARY
TABLE 3.5. IL10_PA33LSL CLONE SEQUENCING RESULTS SUMMARY
TABLE 3.6. A33P_PCR-BLUNTII-TOPO CLONE SEQUENCING RESULTS SUMMARY
TABLE 3.7. IL8_GPA33_JM8N4 AND IL10_GPA33_JM8N4 MICROINJECTION AND CHIMERA GENERATION SUMMARY
TABLE 3.8. THE IL8 LRHE MOUSE COLONY
TABLE 3.9. THE IL10 LRIT MOUSE COLONY
TABLE 3.10. THE IL10 LRMT MOUSE COLONY
TABLE 4.1. NUMBERS OF GENES WITHIN THE ENRICHED DATA SET IN PBMCS FROM IL12RB1 DEFICIENT PATIENTS COMPARED TO HEALTHY CONTROL PBMCS
TABLE 4.2. DIFFERENTIALLY EXPRESSED GENES IN PBMCS FROM IL12RB1 DEFICIENT PATIENTS COMPARED TO HEALTHY CONTROL PBMCS
TABLE 4.3. NUMBERS OF GENES WITHIN THE ENRICHED DATA SET IN PBMCS FROM IFNGR1 DEFICIENT PATIENTS COMPARED TO HEALTHY CONTROL PBMCS
TABLE 4.4. DIFFERENTIALLY EXPRESSED GENES IN PBMCS FROM IFNGR1 DEFICIENT PATIENTS COMPARED TO HEALTHY CONTROL PBMCS
TABLE 4.5. PATHWAY ANALYSIS OF IL12RB1 DIFFERENTIALLY EXPRESSED GENES FOLLOWING IL12/IL18 TREATMENT

LIST OF FIGURES

FIGURE 1.1. CANONICAL DEFENSIN STRUCTURAL CHARACTERISTICS
FIGURE 2.1. OVERVIEW OF THE CHROMOSOME 8 DEFENSIN GENE CLUSTER REGION IN MOUSE AND HUMAN REFERENCE GENOMES
FIGURE 2.2. SCHEMATIC FOR SIZE ESTIMATION OF THE GAP IN MURINE C57BL/6J CHROMOSOME 8 LOCATED BETWEEN GENOMIC POSITION 20 AND 22 MB
FIGURE 2.3. MURINE α-DEFENSIN PEPTIDES
FIGURE 2.4. THE POLYMORPHIC DEFCR5 PEPTIDES
FIGURE 2.5. DUPLICATED DEFCR23 TRANSCRIPT SEQUENCES
FIGURE 2.6. DUPLICATED DEFCR3 AND DEFCR20 TRANSCRIPT SEQUENCES
FIGURE 2.7. CRS-DEFENSIN PEPTIDES
FIGURE 2.8. DUPLICATED NOVEL CRS1C TRANSCRIPT SEQUENCES
FIGURE 2.9. ALIGNMENT OF MOUSE CRS-DEFENSIN AND RAT α-DEFENSIN PEPTIDES
FIGURE 2.10. NOVEL CODING AND NON-CODING MURINE β-DEFENSIN SPLICE VARIANTS
FIGURE 2.11. TATA-BOXES ANNOTATED WITHIN THE POTENTIAL PROMOTER REGION OF MURINE DEFENSIN GENES
FIGURE 2.12. NOVEL MURINE PREPROPEPTIDE RELATED TO α- AND CRS-DEFENSINS
FIGURE 2.13. NUCLEOTIDE ALIGNMENT OF ALL MURINE α- AND CRS-DEFENSIN CODING SEQUENCES
FIGURE 2.14. MURINE α- AND CRS-DEFENSIN CODING SEQUENCE START WITH UDEFF1 PRIMER SEQUENCE
FIGURE 2.15. GENERATION OF C57BL/6J MURINE UNIVERSAL α- AND CRS-DEFENSIN POOLS FOR CAPILLARY AND 454 AMPLICON SEQUENCING
FIGURE 2.16. UNIVERSAL DEFENSIN TRANSCRIPT EXPRESSION DETERMINED BY 454 AMPLICON DEEP SEQUENCING
FIGURE 2.17. THE HUMAN α-DEFENSIN DEFA1 INDUCED CYTOKINE AND CHEMOKINE RELEASE IN VITRO
FIGURE 3.1. PA33LSL TARGETING VECTOR AND GENOMIC LOCUS OF RECOMBINATION
FIGURE 3.2. SCHEMATIC AND NUCLEOTIDE SEQUENCE OF THE IL8 AND IL10 INSERTS
FIGURE 3.3. GENERATION OF THE IL8 INSERT BY PCR AMPLIFICATION
FIGURE 3.4. GENERATION OF THE IL10 INSERT BY PCR AMPLIFICATION
FIGURE 3.5. CONFIRMATION OF IL8 INSERT CLONING INTO PCR-BLUNTII-TOPO
FIGURE 3.6. CONFIRMATION OF IL10 INSERT CLONING INTO PCR-BLUNTII-TOPO
FIGURE 3.7. DIRECTIONAL CLONING OF THE IL8 AND IL10 INSERTS INTO THE PA33LSL MURINE EMBRYONIC STEM CELL TARGETING VECTOR FACILITATED BY ASCI AND XHOI RESTRICTION DIGESTION
FIGURE 3.8. VALIDATION OF IL8 AND IL10 INSERT LIGATION WITH PA33LSL FOR THE GENERATION OF FINAL ESC TARGETING VECTOR
FIGURE 3.9. FINAL SALI-DIGESTED IL8_PA33LSL-1 AND IL10_PA33LSL-1 TARGETING VECTORS
FIGURE 3.10. PCR OPTIMIZATION OF THE SOUTHERN BLOT PROBE FOR ESC TARGETING CONFIRMATION
FIGURE 3.11. CONFIRMATION OF SOUTHERN BLOT PROBE PRIMER SPECIFICITY
FIGURE 3.12. CONFIRMATION OF SOUTHERN BLOT PROBE CLONING INTO PCR-BLUNTII-TOPO
FIGURE 3.13. PURIFIED A33 AMPLICONS FOR USE AS SOUTHERN BLOT PROBE
FIGURE 3.14. SOUTHERN BLOT ANALYSIS OF BSABI-DIGESTED IL8_GPA33_JM8N4 AND IL10_GPA33_JM8N4 CLONE DNA
FIGURE 3.15. CONFIRMATION OF HOMOLOGOUS RECOMBINATION OF THE IL8_A33 TARGETING VECTOR AND JM8N4 ESC DNA AT THE EXPECTED GENOMIC LOCUS BY PCR AMPLIFICATION
FIGURE 3.16. CONFIRMATION OF HOMOLOGOUS RECOMBINATION OF THE IL10_A33 TARGETING VECTOR AND JM8N4 ESC DNA AT THE EXPECTED GENOMIC LOCUS BY PCR AMPLIFICATION
FIGURE 3.17. IN VITRO CRE-MEDIATED DELETION OF THE IL8 AND IL10 TARGETED ALLELE NEOMYCIN RESISTANCE CASSETTE
FIGURE 3.18. IL8 F1 (LRHE) OFFSPRING GENOTYPING
FIGURE 3.19. IL10 F0 CHIMERA (LRMT) OFFSPRING GENOTYPING
FIGURE 4.1. KEY SIGNALLING MOLECULES INVOLVED IN IL12-DEPENDENT IFNG PRODUCTION
FIGURE 4.2. CYTOKINE PRODUCTION AS A DIAGNOSTIC INDICATOR OF TH1 PRIMARY IMMUNODEFICIENCIES
FIGURE 4.3. LOCATION OF KNOWN GENETIC MUTATIONS IN IFNGR1 AND IL12RB1

ABBREVIATIONS

A33p, A33 probe
BAC, bacterial artificial chromosome
BLAST, basic local alignment search tool
CCL, chemokine (C-C motif) ligand
CCR, chemokine (C-C motif) receptor
CNV, copy number variation
Crp, cryptdin
CRS, cryptdin-related sequence
CXCL, chemokine (C-X-C motif) ligand
CXCR, chemokine (C-X-C motif) receptor
DC, dendritic cell
DEFA, α-defensin
DEFB, β-defensin
Defcr, defensin-related cryptdin sequence
Defcr-rs, defensin-related cryptdin-related sequence
D-PBS, Dulbecco’s phosphate buffered saline
ELISA, enzyme linked immunosorbent assay
ELR, glutamic acid-leucine-arginine amino acid motif
ESC, embryonic stem cell
Fabp, fatty acid binding protein
Gpa33, glycoprotein A33 (transmembrane)
GSMID, genome sequencer multiplex identifier
HGNC, Human Genome Organisation Gene Nomenclature Committee
HNP, human neutrophil peptide
IFNG, interferon-γ
IFNGR, interferon-γ receptor
IL, interleukin
IL10R, interleukin 10 receptor
IL12R, interleukin 12 receptor
IRF, interferon regulatory factor
JAK, Janus activating kinase
KI, knock-in
KO, knock-out
LPS, lipopolysaccharide
LRHE, Linda Rehaume human interleukin eight mouse colony
LRIE, Linda Rehaume interleukin eight mouse colony
LRIT, Linda Rehaume murine interleukin ten mouse colony
LRMT, Linda Rehaume murine interleukin ten mouse colony
MCS, multiple cloning site
MGI, mouse genome informatics
MGNC, Mouse Genomic Nomenclature Committee
NCBI, National Center for Biotechnology Information
NET, neutrophil extracellular trap
NF-H2O, nuclease free water
NF-κB, nuclear factor of kappa-light-chain-enhancer of activated B cells
NK, natural killer
NTM, non-tuberculosis mycobacteria
OTTID, Vega database identifier
PBMC, peripheral blood mononuclear cell
Pfam, Protein family database
RACE, rapid amplification of cDNA ends
RefSeq, NCBI Reference Sequence database
RGNC, Rat Genome Nomenclature Committee
RSF, research support facility
RT, reverse transcription
SFF, standard flowgram format
SOC, super optimal broth with catabolite repression
SPAG, sperm-associated antigen
SPI, Salmonella pathogenicity island
STAT, signal transducer and activator of transcription
Th1, Type 1 helper T cell
Th2, Type 2 helper T cell
TLR, Toll-like receptor
TNF, tumor necrosis factor
Treg, regulatory T cell
TTSS, type III secretion system
TYK, tyrosine kinase
UBC, University of British Columbia
VEGA, Vertebrate Genome Annotation genome browser
WT, wild type
WTSI, Wellcome Trust Sanger Institute

ACKNOWLEDGEMENTS

I would like to start by thanking my supervisor, Bob Hancock, for his unwavering support. I appreciated the freedom he allowed me during my research, especially for the opportunity to work in Cambridge UK, but also the guidance he gave me when I needed to refocus and see the larger picture. I am grateful to Gordon Dougan for the opportunity to work in his lab at the Wellcome Trust Sanger Institute in Cambridge UK, as well as David Adams for tissue culture space and helpful discussions. I would like to thank the Human and Vertebrate Analysis and Annotation group, especially Clara Amid for the defensin annotation and the friendship that has developed along the way. Additionally, I have appreciated working with Rainer Doffinger and Dinakantha Kumararatne at the Addenbrooke’s Hospital Department of Clinical Biochemistry & Immunology in Cambridge, UK. I would especially like to thank the immune deficient patients and their families for their selfless participation in this research. I would like to thank my committee members, Megan Levings, David Speert and Ninan Abraham, for their feedback on my research, as well as the members of the Hancock, Dougan and Adams labs for intellectual and technical support. I have enjoyed working with such hardworking, talented and enthusiastic people and am grateful for all I have learned.
I would especially like to thank my parents, Eleanor and Len Rehaume, and sister, Vicki Rehaume, for their steadfast belief in me always. I have leaned on them for emotional support through good and bad times, and on my Mum and Vicki during the years of this degree. My parents gave me the best start in life and to them I will always be grateful. Finally, I would like to thank my partner, Greg Baillie, for all of his love and support, especially for ‘doing everything’ while I was writing my thesis. Thank you.

DEDICATION

This work is dedicated to my Dad. He taught me so many things and was one of the smartest people I know. I was inspired to pursue this degree in the field of immunology because of what he went through. It is my hope that each new piece of information will not only advance our knowledge of how the immune system works, but lead to novel therapies and improve patient health. I know my Dad would have been proud of me.

1 INTRODUCTION

1.1 Modulators of the Immune System

Cationic host defence peptides are evolutionarily conserved across a wide range of species. They constitute a major component of the defensive repertoire in lower organisms, whereas in higher species they are part of the complex immune system involved in protecting against infection. These peptides were first described as antimicrobial peptides, but it is now apparent they play a multi-functional role in the immune system. There are several classes of peptides, based on their secondary structure, but they all share several key features. They often require proteolytic cleavage for release of the mature, active peptide, which contains 12-50 amino acids and is positively charged and amphipathic (1, 2). The secondary structure of these peptides can be α-helical (e.g. cathelicidins), looped or cyclized with one disulphide bond (e.g. bactenecin), β-stranded with two or more disulphide bonds (e.g. defensins), or extended (e.g. indolicidin) (1, 2).
The expression of host defence peptides occurs in a wide range of cell types in either a constitutive or an inducible manner. As such, many host defence peptides are commonly found at increased concentrations within the inflammatory milieu in humans. Additionally, several human immune deficiencies are associated with decreased host defence peptide expression, which is often correlated with increased susceptibility to infection, implicating these peptides in disease. Whether these peptides are the cause or effect of such conditions has been difficult to ascertain, due to the often redundant nature of their expression.

The immune system has evolved layers of protection against invading pathogens. Often several molecules or pathways perform the same or similar functions. This becomes apparent in loss-of-function assays if the observed phenotype is milder than expected, or no phenotype is observed. The redundancy of host defence peptide expression and apparent function is analogous to that of cytokines, especially for chemokine function, although cytokines have long been regarded as crucial in the functioning of all aspects of the immune system. Within the chemokine family, often more than one member performs the same function, many bind to more than one receptor, and conversely chemokine receptors usually bind more than one chemokine. The receptor-ligand interaction is not strictly a one-to-one relationship. In vitro experiments have been important for determination of their common and distinctive functions. Within each of the cytokine and host defence peptide families there are members that possess overlapping functions. Cytokines are responsible for cellular differentiation, mobilization and activation, as well as regulation of homeostasis, tolerance and repair (3). It is becoming apparent that host defence peptides also play an important role in many of these processes.
The interaction between cytokines and host defence peptides within the inflammatory milieu has not been extensively studied, and whether these molecules display synergy with respect to any of these functions is unknown. Defensins and chemokines have direct chemotactic activities for immune cells (although much higher concentrations of defensins are needed), and despite their differing primary sequences, defensins and chemokines have related tertiary structures and cationic nature (4). Both contain three β-strands, an amino terminal α-helix and disulphide bridges (4). It has been proposed that the chemotactic activities of human β-defensin (DEFB) 1, DEFB4 (DEFB2) and chemokine (C-C motif) ligand (CCL) 20 (MIP-3α) are due to their binding to the chemokine (C-C motif) receptor, CCR6 (5-7). Similar to Toll-like receptor (TLR) agonists (8), CCL20 binding to dendritic cells (DCs) via CCR6 leads to increased CCR7 expression and reduced CCR6 expression (4). Additionally, murine Defb2, Defb3 and Ccl20 (Mip-3α) demonstrate adjuvant antitumour activity (9). Mice vaccinated with lymphoma-specific nonimmunogenic antigens fused with murine Defb2, Defb3 or Ccl20 generated tumour-specific antibodies and were protected upon tumour challenge (9). Furthermore, increased survival was observed when mice bearing tumours were administered the fusion constructs (9). This adjuvant activity was dependent on the targeting of murine Defb2, Defb3 or Ccl20 to immature dendritic cells, presumably through the chemotactic receptor, CCR6 (9). Therefore, both host defence peptides and chemokines link the innate and adaptive immune systems through a continuum of cell signalling and activation.

1.2 Defensins

Defensins are the largest family of cationic host defence peptides in humans and are produced by many cell types including neutrophils, Paneth cells, epithelial cells and keratinocytes (10).
Defensin genes typically have a two-exon structure; however, there are exceptions within the α-defensin family, some of which have three exons, while members of the β-defensin family can have between two and four exons, e.g. fish and birds have three-exon β-defensins (11). The differences between α- and β-defensins have been proposed to be a consequence of gene duplication and subsequent divergence selected during evolution (12). Extensive analysis has provided insight into the evolution of mammalian β-defensins (12-14). β-defensins are evolutionarily older than α-defensins, which are thought to have arisen by repeated gene duplication of β-defensins and positive diversifying selection (12, 15). This is especially apparent for rodent, and in particular murine, α-defensins, which have most likely “lost” one or more of these exons during evolution, and subsequent gene/chromosome duplication events led to their two-exon structure. The high similarity of mouse α-defensin genes and the repetitive nature of their chromosomal position lend support to this model. As a rapidly evolving gene family, defensins provide a means through which to study mammalian evolution. Defensins are produced as 94-95 amino acid prepropeptides, and sequential cleavage of the signal and pro-peptides releases the 29-34 amino acid mature peptides (Figure 1.1A). Mature defensin peptides are characterized by six canonical cysteine residues at defined positions, and adopt a β-stranded secondary structure stabilized by three disulphide bonds. The spacing between the cysteine residues and the arrangement of the three disulphide bonds divides the defensins into α- and β-defensin subfamilies (Figure 1.1B) (16-18).

[Figure 1.1, panel A: gene structure (5’ upstream sequence, Exon 1, Intron, Exon 2, 3’ downstream sequence) and peptide structure (Signal, Pro-peptide, Mature peptide). Panel B: consensus sequences:
(i) X1-2 C X C R X2-3 C X3 E X3 G X C X3 G X5 C C X1-4
(ii) X2-10 C X5-6 G/A X C X3-4 C X9-13 C X4-7 C C Xn
(iii) GXCRCXCXR RXCXCRCXG]

Figure 1.1. Canonical defensin structural characteristics. Schematic of murine α- and CRS-defensin gene and peptide structures (A). The defensin primary sequence, and linkage pattern of the six cysteine residues (B), allowing formation of the three disulphide bonds, divides them into α-defensin (i), β-defensin (ii) and θ-defensin (iii) subfamilies.

β-defensins have a broad tissue expression pattern and are found in most vertebrates and some invertebrate species, whilst α-defensins are specific to certain mammals and are mainly produced by leukocytes of myeloid origin and Paneth cells of the small intestine (16-18). θ-Defensins are believed to be derived by cyclization of α-defensins and seem to be restricted to the leukocytes of Old World monkeys (19). The role of defensins is debated, and there is limited evidence to support the hypotheses regarding their biological function. In vitro, under low salt and serum-free conditions, defensins have antimicrobial activity against Gram-positive and Gram-negative bacteria, fungi, parasites, yeast and viruses; however, under physiological salt conditions and in the presence of serum, the killing activity of most defensins is abrogated or greatly reduced (2). Under these conditions, DEFB103 (DEFB3) is the only human defensin that retains significant antimicrobial function. Direct killing activity of defensins may be limited to sites at which concentrations are high, such as neutrophil phagosomes and intestinal crypts, or to the skin epithelium, which has lower concentrations of salts and serum proteins. The discovery of a mouse Paneth cell α-defensin peptide, termed cryptdin (Crp) due to its expression in the Crypts of Lieberkühn (20), was the first report of defensin expression in a nonmyeloid cell lineage (21, 22). 
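
The consensus patterns in Figure 1.1 (B) can be read as spacing rules between the six cysteines. The following is a minimal sketch, not part of the original analysis, that transcribes patterns (i) and (ii) into regular expressions. It assumes that each X stands for any single non-cysteine residue (the figure does not state this explicitly), and the function name and the test peptides are hypothetical, built only to satisfy the spacing rules rather than taken from real defensins.

```python
import re

# Assumption: "X" in Figure 1.1 (B) means any one non-cysteine amino acid.
X = "[ADEFGHIKLMNPQRSTVWY]"

# (i) alpha-defensin: X1-2 C X C R X2-3 C X3 E X3 G X C X3 G X5 C C X1-4
ALPHA = re.compile(
    f"{X}{{1,2}}C{X}CR{X}{{2,3}}C{X}{{3}}E{X}{{3}}G{X}C{X}{{3}}G{X}{{5}}CC{X}{{1,4}}"
)

# (ii) beta-defensin: X2-10 C X5-6 G/A X C X3-4 C X9-13 C X4-7 C C Xn
BETA = re.compile(
    f"{X}{{2,10}}C{X}{{5,6}}[GA]{X}C{X}{{3,4}}C{X}{{9,13}}C{X}{{4,7}}CC{X}*"
)


def classify_mature_peptide(seq):
    """Return 'alpha', 'beta' or None for a mature peptide sequence.

    The whole sequence must fit the consensus (fullmatch), so this is a
    strict spacing check and not a similarity search.
    """
    seq = seq.upper()
    if ALPHA.fullmatch(seq):
        return "alpha"
    if BETA.fullmatch(seq):
        return "beta"
    return None


# Hypothetical peptides assembled from the spacing rules (not real sequences).
toy_alpha = "AA" + "CACR" + "AA" + "C" + "AAA" + "E" + "AAA" + "GAC" + "AAA" + "G" + "AAAAA" + "CC" + "A"
toy_beta = "AA" + "C" + "AAAAA" + "GA" + "C" + "AAA" + "C" + "A" * 9 + "C" + "AAAA" + "CC"

print(classify_mature_peptide(toy_alpha))
print(classify_mature_peptide(toy_beta))
```

A strict full-length match like this would be expected to miss real peptides that deviate from the consensus by even one residue, so in practice it would serve only as a first filter before alignment-based annotation.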
The gene coding for cryptdin, defensin-related cryptdin (Defcr), was subsequently mapped to mouse Chromosome 8 (23, 24) and has since been discovered to be part of a larger gene family including additional Defcr α-defensin and Defcr-related sequence (Defcr-rs) genes. Defcr-rs were so named due to their sequence similarity and genetic linkage to Defcr (21-25). The Defcr-rs genes and peptide products are also referred to as cryptdin-related sequences (CRS). Additional Defcr/Defcr-rs loci have been discovered in different mouse strains, some of which may be polymorphic or involved in copy number variation (CNV) (23, 24, 26-29). The confusion around gene names, variable copy numbers and polymorphisms has made the study of mouse defensins quite complex. In humans, α-defensins are most abundant in neutrophils and intestinal Paneth cells (30). The human neutrophil-specific α-defensins, DEFA1, DEFA2 and DEFA3, constitute 30-50% of the protein content in the azurophilic granules (10). A DEFA2 gene does not exist within the human genome; however, the DEFA2 (HNP-2) peptide is presumed to be derived from either DEFA1 or DEFA3 through post-translational cleavage of the N-terminal alanine or aspartic acid residue, respectively (31). Excluding the N-terminal amino acid, the DEFA1 and DEFA3 primary sequences are identical (31). The human intestinal-specific α-defensins DEFA5 and DEFA6 are the most highly expressed Paneth cell components (32). Both neutrophil and Paneth cell stimulation can lead to degranulation and α-defensin release, either into the surrounding tissues or intestinal crypts and lumen, respectively. There are rare human disorders, Chediak-Higashi syndrome and specific granule deficiency, associated with decreased or absent neutrophil α-defensins; however, other neutrophil granule components are also deficient, which makes it difficult to attribute these disorders specifically to defensins (33). 
Murine neutrophils do not contain α-defensins, therefore Paneth cells are the only source of α-defensins in mice (34). These murine α-defensins comprise 70% of the killing activity of Paneth cells (35). Due to their secondary structure, defensins are relatively resistant to proteolysis; consistent with this, murine Paneth cell α-defensins have been purified from the lumen of the large intestine, confirming this property in vivo (36). This characteristic of α-defensins may result in their effects being propagated throughout the intestinal tract, in addition to within the crypts of the small intestine (36). Reduced DEFA5 and DEFA6 expression has been linked with the inflammatory bowel disorder Crohn’s disease and, from experiments with transgenic mice expressing DEFA5 in the intestine, is hypothesized to result in alterations within the intestinal microbiota, which might contribute to the inflammatory pathology of this disease (32, 37). Interestingly, the severity of disease is also correlated with increased IL8 production (32), which may be either a marker of inflammation or a factor in the disease progression.

1.3  Peptide Therapeutics in Models of Inflammation

There is increasing interest in host defence peptides for use as novel therapeutics in the treatment of infectious diseases (38-40). One goal of this work was to develop novel mouse models that would enable elucidation of the in vivo mode of action of host defence peptides in intestinal infection under conditions of hyper- and hypo-inflammation. 
Characterization of the in vitro functions of host defence peptides, in particular defensins, on a variety of cell types, including primary cells, has revealed their wide range of functions, including direct chemokine activity, induction of chemokine production and secretion (including IL8), suppression of induced pro-inflammatory responses (in part reflected in increased IL10 production), enhancement of phagocytosis, inhibition of complement, adjuvant activity, co-stimulatory molecule induction, enhancement of cellular growth and stimulation of wound healing (41, 42). Direct antimicrobial activity in vivo under physiological conditions remains to be determined (2); however, the release of defensins upon neutrophil or Paneth cell degranulation would favor high local concentrations and suggests that, under certain conditions, defensins might contribute to local immunity through direct antimicrobial action. The in vivo peptide mechanism of action is harder to determine due to redundancy in peptides, especially in the case of defensins; however, the transgenic mice expressing DEFA5 in the intestine were slightly more resistant to oral Salmonella typhimurium infection (43). There is interest in generating synthetic derivatives of host defence peptides that have enhanced function(s). Administration of a synthetic peptide, IDR-1, prophylactically or therapeutically to mice improves survival upon infection with both Gram-positive and Gram-negative bacteria (39).

1.4  Interleukin 8

IL8 is a chemokine that demonstrates low basal expression and can be markedly increased in response to various stimuli, e.g. bacteria, viruses, stress factors, cytokines and, pertinent to this thesis, host defence peptides, including defensins and LL-37 (44-47). LL-37 and DEFB1 inhibit lipopolysaccharide (LPS)-induced IL8 production (44, 48). 
A synthetic peptide, IDR-1002, induces robust chemokine production, including IL8, from human peripheral blood mononuclear cells (PBMCs), chemoattracts murine neutrophils and monocytes in vivo and protects mice against infection (49). It is therefore important to consider the in vivo context of host defence peptides with respect to IL8 production/inhibition. IL8 is a member of the ELR+ C-X-C motif family of chemokines, which are primarily selective for neutrophils (50). These chemokines contain the amino acid motif glutamic acid-leucine-arginine (E-L-R) before the first cysteine of the C-X-C motif (50). IL8 chemoattracts and activates neutrophils; activation leads to phagocytosis or degranulation, and can also prepare for the oxidative burst through phosphorylation events needed for nicotinamide adenine dinucleotide phosphate (NADPH) oxidase complex activation (51). There are conflicting results as to whether IL8 inhibits neutrophil apoptosis, which may be beneficial for bacterial clearance but could also promote excessive tissue damage (51). Neutrophil production of chemokines is dependent on the inflammatory milieu, as varying combinations of stimuli produce variable chemokines (51). Similarly, neutrophil activation can result from the engagement of the chemokine (C-X-C motif) receptors (CXCR) 1 or CXCR2, or both. However, these receptors, although potentially redundant in function due to overlapping ligand specificity, appear to have unique roles in neutrophil activation (52). The contents released by neutrophils upon degranulation, secreted cytokines and chemokines, or cell-cell contact mediate the interaction of neutrophils with endothelial cells, monocytes, dendritic cells and T cells (53). It is now understood that neutrophils play a more significant role in the immune response through their ability to coordinate the innate and adaptive immune systems. 
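
The ELR+ definition given above (an E-L-R tripeptide immediately before the first cysteine of the C-X-C motif) can be expressed as a simple sequence check. The sketch below is illustrative only: the function name is hypothetical, it assumes the input is a mature chemokine sequence in one-letter code, and the example strings are short toy fragments rather than curated database sequences.

```python
import re

# C-X-C motif: two cysteines separated by exactly one non-cysteine residue.
CXC = re.compile(r"C[^C]C")


def elr_status(seq):
    """Classify a sequence as 'ELR+' or 'ELR-', or None if no C-X-C motif.

    Only the first C-X-C occurrence is considered, and the three residues
    directly preceding its first cysteine are compared with 'ELR'.
    """
    seq = seq.upper()
    motif = CXC.search(seq)
    if motif is None:
        return None  # not a C-X-C sequence under this simple definition
    start = motif.start()
    preceding = seq[max(0, start - 3):start]
    return "ELR+" if preceding == "ELR" else "ELR-"


# Toy fragments (assumptions for illustration, not verified sequences).
print(elr_status("SAKELRCQCIK"))  # E-L-R directly before C-Q-C
print(elr_status("AAKPVSCKCAA"))  # C-X-C present, no E-L-R before it
print(elr_status("AAAA"))         # no C-X-C motif at all
```

A check of this kind only encodes the motif definition; it says nothing about receptor binding, which, as discussed in this section, differs between CXCR1 and CXCR2 even among ELR+ ligands.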
IL8 mediates inflammation through neutrophil activation but also promotes neutrophil release (mobilization) from the bone marrow, which is the primary site of neutrophil precursor differentiation (54). Additionally, IL8 is a potent inducer of angiogenesis (46). The mouse genome does not contain an Il8-encoding gene although chemokine (C-X-C motif) ligand (Cxcl) 1 (KC) and Cxcl2 (Mip-2β) have both been proposed as functional equivalents due to their ability to chemoattract neutrophils (55). However, the protein alignments, and thus identities, of Cxcl1 and Cxcl2 with IL8 do not reflect true orthology, and they likely have other distinct functions based on their differing receptor binding capabilities (discussed below). Of the rodents with sequenced genomes that are accessible through the Ensembl genome browser, the ones that contain homologues of human IL8 are guinea pig, kangaroo rat, pika, rabbit, squirrel and tree shrew, but homologues are absent from mouse or rat. This indicates that both mouse and rat have likely lost the Il8 gene, rather than species outside of rodents, e.g. primates, having gained the gene. In humans, IL8 binds two G protein-coupled receptors, CXCR1 and CXCR2. CXCR1 and CXCR2 both bind the chemokines IL8 and CXCL6, whereas CXCR2 also binds CXCL1, CXCL2, CXCL3, CXCL5 and CXCL7 (52). The roles of CXCR1 and CXCR2 are largely thought to be redundant since both bind ELR+ chemokines. In particular they both bind IL8 with comparable high affinity, although CXCR1 is relatively selective for IL8 compared to CXCR2, which does not display any ligand selectivity (50). However the difference in the ligand-binding repertoires of CXCR1 and CXCR2 indicates that engagement of these two receptors by IL8 may lead to different outcomes. This is in agreement with the expanding role of neutrophils, beyond phagocytosis and cell-mediated killing, although any distinct roles of CXCR1 and CXCR2 need elucidating. 
It was initially thought that since mice lack an Il8 gene they would also lack a homologue of human CXCR1; however, mouse Cxcr1 was recently identified. Conversely, the ligand specificity and neutrophil chemoattractant properties of Cxcr2 have been well characterized and orthology between mouse Cxcr2 and human CXCR2 is generally accepted. Cxcr2 binds ELR+ chemokines, in particular Cxcl1 and Cxcl2, the human homologues CXCL1 and CXCL2, as well as CXCL3 and IL8 (52). The identification and cloning of mouse Cxcr2 was reported by several groups (56-58), but initial characterizations were somewhat conflicting. Cxcr2 bound Cxcl1 and Cxcl2 with high affinity and IL8 with low affinity. Nevertheless IL8 promoted murine neutrophil chemotaxis, although higher concentrations of IL8 were required to elicit cell numbers equivalent to Cxcl1, in vitro (58). The responsiveness of murine neutrophils to IL8 led to speculation that a second IL8 receptor exists in mouse, similar to CXCR1 (58). This was corroborated by Southern blot analysis when a human CXCR2 probe was shown to hybridize to two similarly-sized fragments of murine DNA (58). However Cxcr2-transfected cells bound IL8 in a dose-dependent manner with similar affinity to that of murine neutrophils, and a single band was observed by Southern blot using a conserved CXCR2 probe against DNA from a murine cell line (56). The explanation for the apparent discrepancy of receptor number is likely the use of different Southern blot probes. In the first study, probes derived from either the 5’ or 3’ end of the human CXCR2 gene both hybridized twice but one of the two bands that were observed with the 5’ probe was much weaker compared to similar intensity bands using the 3’ probe. 
It was therefore interpreted that the 3’ end of the human CXCR2 gene was more similar to the second receptor than the 5’ end (58); alignment of the nucleotide and protein Consensus Coding Sequences (CCDS) of Cxcr1 and Cxcr2 supports this observation (alignment not shown). In the second study, the probe encompassed the internal region of the CXCR2 gene, which apparently only hybridized to Cxcr2; however a Cxcr2 probe hybridized twice to mouse neutrophil mRNA (56), again supporting the presence of another IL8 receptor. A putative CXCR1 homologue, Cxcr1, was discovered in mice (59-61), although it is still not clear whether this is a true orthologue of CXCR1, due in part to a lack of murine Il8. The first reports of the cloning and tissue expression of Cxcr1 from BALB/c and C57BL/6 mice described its genomic position 30 kb downstream of Cxcr2, through 5’ and 3’ rapid amplification of cDNA ends (RACE) experiments (59, 60); the gene and protein sequences, as well as its genomic position, are preserved in the current murine C57BL/6J reference genome assembly (NCBIM37). More recently, the putative Cxcr1 was also identified through computational searches of mouse expressed sequence tag (EST) data using the CXCR1 cDNA sequence, and subsequently cloned and expressed (61). Cxcr1 is highly expressed on neutrophils, and in tissues from lung, spleen, caecum, thymus, placenta, testis and stomach, as well as in the mesenteric and peripheral lymph nodes (59-61). Cxcr1 is also expressed in the bone marrow and peripheral blood leukocytes (59, 61); specifically, CD3+ T cells and macrophages express both Cxcr1 and Cxcr2, and CD4+ and to a smaller extent CD8+ T cells express Cxcr1; Cxcr2 expression by CD4+ or CD8+ T cells was not tested (60). LPS-induced inflammation in the lung causes an influx of neutrophils, with increased Cxcr1 but decreased Cxcr2 expression (59). 
This may be the result of receptor desensitization, since lower concentrations of IL8 are needed for Cxcr2 internalization and Cxcr2 recycling is slower than that of Cxcr1 (52). Similarly, the expression of both Cxcr1 and Cxcr2 is induced in collagen-induced arthritis (61). It has been reported that Cxcr1 does not bind IL8, CXCL1, Cxcl1 or Cxcl2, in contrast to Cxcr2 and CXCR2, which both bind, to some degree, to the aforementioned chemokines (60). However, assay conditions may not have fulfilled requirements for Cxcr1 activation by these ligands; Sf9 (Spodoptera frugiperda) insect cells were infected with baculoviruses expressing either Cxcr1, Cxcr2 or CXCR2 as well as the heterotrimeric G protein αi2β1γ3 (60). Neutrophil chemotaxis mediated by CXCR1 or CXCR2 can be inhibited by pertussis toxin but the two receptors may couple with different heterotrimeric G protein subunits for activation (62). This may also be the case for Cxcr1 and Cxcr2. In contrast, Cxcr1 was reported to bind IL8 and CXCL6, analogous to CXCR1, as well as Cxcl6, as determined by binding assays, GTP exchange and chemotaxis of Cxcr1-expressing cells (61). Moreover Cxcr1 did not bind other CXCR2 ligands (61), indicating that this may indeed be the true CXCR1 orthologue. Further experimentation is required to verify this relationship.

1.5  Interleukin 10

IL10 is a major anti-inflammatory cytokine, produced by a variety of leukocyte and non-leukocyte cell types, which include monocytes, macrophages, myeloid DCs, T cells, B cells, mast cells, natural killer (NK) cells, eosinophils and neutrophils, as well as epithelial cells, endothelial cells and keratinocytes, respectively (63-66). IL10 was first described for its ability to polarize T cell responses. Production of IL10 by Th2 cells inhibits cytokine production by Th1 cells; however, similar inhibition of Th2 cytokine production was not observed (67). 
Indeed IL10 inhibits IL12 production, but both CD4+ and CD8+ T cells produce IL10, and in particular the T cell subclasses Th1, Th2, Tr1 and Th17 (64, 66). IL10 suppresses T cells and NK cells by inhibiting cytokine, co-stimulatory molecule and MHC class II production from macrophages and DCs, in addition to upregulating anti-inflammatory molecules (63, 64). IL10 can be expressed constitutively but also induced in response to a variety of stimuli such as microbial pathogen-associated molecular patterns (PAMPs), including superantigens, and cytokines (66, 68). Engagement, by appropriate ligands, of TLR2, TLR4, TLR9, C-type lectin domain family 7 (CLEC7A) (DECTIN1) or CD209 (DC-SIGN) can induce IL10 production (66). Additionally DEFB2 and IDR-1002 induce IL10 production from human PBMCs and murine bone marrow-derived macrophages, respectively (49, 69). An increase in Il10 in vivo upon IDR-1002 administration during infection was not observed (49). The control of IL10 production involves several levels of complexity including type, strength and number of stimuli, activation of enhancing or silencing transcription factors, chromatin remodeling, histone modifications (acetylation and phosphorylation), as well as post-transcriptional regulation (66). The regulation and timing of intestinal IL10 production in vivo with respect to either commensal or pathogen stimulation needs further elucidation (66). It is not clear which cells are key IL10 producers in the intestine and whether those that maintain homeostasis are different from those that are involved in the inflammatory response (66). Host defence peptides appear to selectively modulate the immune response, thus it is important to determine their role in immune regulation. The functional IL10 receptor is composed of two subunits, IL10RA and IL10RB. IL10RA is expressed at low levels on most hematopoietic cells. 
Expression patterns are cell-type specific and can be up- or down-regulated during activation, although IL10RA is constitutively expressed by human and mouse intestinal epithelial cells (63). IL10RB is also expressed by a wide variety of cells; however, its expression appears to be constitutive in all of these (63). IL10RA binds IL10 at the surface of the cell and IL10RB transduces the intracellular signal through phosphorylation of Janus kinase (JAK) 1 and tyrosine kinase (TYK) 2, which can then activate signal transducer and activator of transcription (STAT) 1, STAT3 or STAT5. STAT3 seems to mediate the majority of IL10-induced responses through transcription of suppressor of cytokine signalling (SOCS) 3 (63). IL10 has been implicated in a number of human diseases, particularly those involving intracellular pathogens including viruses, autoimmune disorders and sepsis (63). In some cases it is not clear whether IL10 is the cause or a consequence of the disease, although high IL10 levels may predispose individuals to systemic lupus erythematosus (63). IL10 protects from endotoxin and death due to sepsis and may be involved in endotoxin tolerance in vitro (63). Interestingly, certain viruses (EBV, herpes, poxvirus) encode an IL10 homologue that shares high similarity with human IL10 (63); this is likely one mechanism these viruses use to reduce the inflammatory response and aid survival within the host.

1.6  Cytokines in Gastrointestinal Disease

Gastrointestinal diseases pose a significant threat to human health worldwide. These include bacterial and viral infections, due to contaminated water and food, and direct person to person spread (70, 71), as well as autoimmune inflammatory bowel disorders and cancer (72, 73). While the pathology of these diseases is distinct, they share a common underlying etiology, the dysregulation of inflammation. 
Either excessive or insufficient inflammation can be detrimental, and cytokines, including both IL8 and IL10, have been implicated in inflammatory diseases (74, 75). In humans, shigellosis or bacillary dysentery is caused by several Gram-negative Shigella species, S. flexneri, S. dysenteriae, S. sonnei and S. boydii. Shigella are transmitted via the fecal-oral route and are a serious threat to people in the developing world who do not have access to clean water and proper sanitation. The World Health Organization estimates Shigella spp. cause between 80 and 165 million cases of dysentery and 600,000 deaths annually; approximately 60% of the deaths are of children under five years old (70). There is also a significant burden on developed countries from Shigella infections in travellers to developing countries (70). The virulence of S. flexneri is associated with a 140 MDa plasmid because strains lacking this plasmid are non-invasive (76). Gene expression of the human intestinal epithelial cell line, Caco-2, was analyzed by microarray following infection with wild type S. flexneri strain M90T or a mutant strain, BS176, lacking the 140 MDa plasmid (77). The gene most highly induced by the invasive strain, but not the mutant non-invasive strain, was IL8 (~300-fold increase) (77). IL8 protein induction by the invasive strain, but not the non-invasive strain, was confirmed (77). A vaccine protecting against Shigella infection has not been developed, in part because current animal models do not mimic the human disease (78). It has been postulated that the lack of murine Il8 may account for this discrepancy in disease severity because administration of recombinant human IL8 to mice infected with S. flexneri induces a disease which better resembles the human disease (79). 
The lack of neutrophil influx in the intestinal tract cannot be explained by differing neutrophil extravasation in human and mouse because a large neutrophil infiltrate is observed in pulmonary Shigella infection (80). This difference may be due to a more protective barrier in the intestine due to the barrage of microbes in the gastrointestinal tract compared to that of the lungs. Shigella is phagocytosed by resident macrophages and DCs in the follicle following transport through the follicular-associated epithelium via M cells. It then escapes from the vacuole and induces caspase-1 mediated apoptosis of the macrophage or DC (78). Caspase-1 activation increases IL1B production, which leads to neutrophil infiltration, increased inflammation and increased bacterial invasion (78). IL8 is therefore not the only mediator of neutrophil influx but its absence in mice may be one reason why less neutrophil infiltrate is observed in many mouse models compared to that observed in the human disease, especially those of the gastrointestinal tract. Therefore it has been hypothesized that IL8 leads to increased neutrophil recruitment and inflammation, and subsequent Shigella infection. However this may not be the same for all enteric diseases as neutrophil recruitment could increase clearance of the microorganism. It is unlikely that the same mechanism is exploited by all bacteria, just as IL1B can be detrimental in terms of Shigella infection but is beneficial in others, e.g. Escherichia coli (78), and Candida albicans (81). Shigella is a mucosal pathogen whereas Salmonella can cause systemic infection and thus comparison of these two infections will lead to a better understanding of how IL8 mediates disease pathology. Salmonella enterica serovar Typhi (S. typhi) is the causative agent of typhoid fever in humans. An estimated 17 million cases of typhoid fever occur annually and account for 600,000 deaths (70). S. 
typhi is a human pathogen and the disease is modeled in mice by use of Salmonella enterica serovar Typhimurium (S. typhimurium), which causes the human enteric disease gastroenteritis. This disease can be modeled in mice by administration of antibiotics before infection with S. typhimurium (82). There are significant differences in the human diseases caused by S. typhimurium and S. typhi; in particular, patients with S. typhimurium have a large intestinal neutrophil infiltrate which is not observed in the disease caused by S. typhi (83). This, combined with the host-restricted nature of S. typhi, has led to substantial research comparing these two organisms, both at the genetic and host-interaction levels (82-84). Mice orally infected with S. typhimurium develop a disease similar to typhoid fever in humans, as the bacteria invade the intestinal mucosa without neutrophil infiltration or epithelial damage, and then disseminate to the spleen and liver, where granulomas form (82). Mice treated with antibiotics prior to S. typhimurium infection develop colitis, which is characterized by a large neutrophil infiltrate consistent with the human disease (82). S. typhimurium can also disseminate in this model depending on the genetic background of mice (82). The Salmonella pathogenicity islands (SPI) encode virulence factors, which contribute to colonization and invasion of the host (82). S. typhimurium contains at least two type III secretion systems (TTSS), SPI-1 and SPI-2 (84). The SPI-1 TTSS is necessary for the translocation of virulence factors into intestinal epithelial cells, whereby S. typhimurium promotes its own invasion of these cells (82). Following successful translocation the SPI-2 TTSS is required for invasion of macrophages, replication and dissemination (82). Investigations of S. typhimurium virulence should therefore encompass both epithelial cells and leukocytes. 
Novel murine models of intestinal inflammation are needed to accurately assess the mechanism by which host defence peptides protect against infection in vivo. Additionally there are human immunodeficiencies that increase the susceptibility to Salmonella infection. The study of leukocytes from these patients ex vivo can be used to complement the in vivo murine models of infection, thereby providing a better understanding of the disease. IL10 is important for immune regulation but during infection, the source of IL10 and the timing of its production are not well understood (64). IL10 production, whether by epithelial cells or intestinal lymphocytes, in the intestinal tract likely plays an important role in the early stages of the immune response, and in general immune surveillance for differentiation between enteric pathogens and commensal organisms. As bacterial-induced inflammation can promote epithelial barrier destruction leading to translocation and successful infection, it is hypothesized that reducing this inflammation through the overexpression of Il10 in the mouse intestinal tract would limit the epithelial damage induced by enteric pathogens and prevent infection and widespread dissemination, thus increasing survival following bacterial challenge. This is consistent with increased survival of mice with reduced lung inflammation following intranasal infection with Il10-secreting Shigella flexneri (85). However if the bacteria are able to invade the epithelial barrier in high enough numbers, it is possible that mice will succumb to bacterial infection due to the inhibitory effects of Il10. Novel murine models of immune suppression are also needed to further address the role of host defence peptides in immune homeostasis. 

1.7  Project Goals and Hypotheses

The aim of this project was to further elucidate the role of host defence peptides, in particular defensins, in the innate immune system through in vitro stimulation of human peripheral blood mononuclear cells. Originally the generation of an α-defensin deficient mouse was proposed in order to study the in vivo role of α-defensins within the murine intestinal tract. This highlighted the need for a complete characterization of murine α-defensins within the C57BL/6J reference strain, as well as the development of alternative models with which to study the function of defensins.

The key hypotheses pursued here were:
1. Human α-defensins induce the production of chemokines and cytokines from human peripheral blood mononuclear cells.
2. The discrepancy between genomic and peptide α-defensin repertoire indicates murine α-defensin expression is tightly regulated.
3. Murine models of inflammation and immunosuppression can be used to test the efficacy of host defence peptides against gastrointestinal disease.
4. Cells of patients with immunodeficiencies can be used to dissect the mechanisms of susceptibility to Salmonella and mycobacterial infections.

The specific research questions addressed in this thesis are:
1. How extensive and diverse is the α-defensin locus in mouse and is there evidence in the DNA sequence signatures of how it evolved?
2. Do the different α-defensin genes represent a repository of silent genes for generation of diversity or are they actually expressed?
3. Since defensins work in part through induction of chemokines and cell recruitment (which are also exceptionally diverse and functionally redundant), are there genetic approaches to permit this property to be explored in infection models?

2  α-DEFENSIN GENE ANNOTATION, QUANTITATIVE TRANSCRIPT EXPRESSION AND FUNCTIONAL ACTIVITY

2.1  Introduction

In humans, α-defensins are most abundant in neutrophils and Paneth cells of the small intestine; however, the neutrophils of mice do not contain α-defensins (30, 34), despite the fact that mice have the largest known repertoire of defensin-encoding sequences. Therefore the Paneth cell α-defensins are the only option when considering a murine model with which to study their function. However the fact that murine α-defensins are contained within discrete compartments within the small intestine, i.e. Paneth cell granules, is an advantage for the generation of such a model. Knocking out α-defensins in mice would allow the study of their function in the tolerance of normal flora, upon bacterial infection and during chronic inflammatory diseases. At the beginning of this project, there was increasing evidence of the importance of defensins but their mechanism of action was, and still is, not clear. Additionally no α-defensin deficient mouse models had been generated for the in vivo study of defensin function. It was suggested that human neutrophil α-defensins chemoattract immature dendritic cells and T cells in vitro (86), and subsequently macrophages and mast cells but with conflicting results for previous observations (87). In vitro, in standard low phosphate buffer (10 mM), murine α-defensins have varying antimicrobial activities (88). In vivo, β-defensin knock-out mice are more susceptible to certain bacterial infections but it might be difficult to determine the effect of knocking out only one defensin because of the apparent redundancy of these peptides, which in itself is an indication of their importance. Mice deficient in Defb1, which encodes the constitutively expressed β-defensin 1, had similar bacterial load to wild type mice but higher numbers of Staphylococcus bacteria in the bladder; bacterial clearance from the lung was not altered (89). 
The redundancy of these peptides is apparent from the delayed, but eventual, ability of Defb1 deficient mice to clear Haemophilus influenzae from the lung (90). In support of the immunomodulatory role of defensins, administration of DEFA1 (HNP-1) not only decreased the bacterial load following peritoneal administration of Klebsiella pneumoniae to mice, but also increased the numbers of macrophages, granulocytes and lymphocytes in the peritoneal cavity, both in a dose-dependent manner (91). The antibacterial activity of DEFA1 was abolished in mice rendered leukocytopenic by treatment with cyclophosphamide (91). Thus it can be argued that the killing activity of DEFA1 is secondary to the recruitment of effector cells of the immune system. Paneth cell DEFA5 transgene knock-in mice are less susceptible to S. typhimurium infection and death as compared to wild type mice (43). DEFA5 was knocked-in because it possesses the greatest antimicrobial activity of defensins in vitro, but this was determined under low salt concentration (10 mM sodium phosphate) (92). The authors argue that the differences seen in the transgenic mice compared to the wild type, with respect to bacterial load and survival, must be due to direct antimicrobial activity of DEFA5 because of the short time frame in which the differences were seen (6-12 hours) (43). However, in vitro studies show that in response to defensins, chemotaxis of a variety of cell types (monocytes, mast cells, immature dendritic cells, naïve T cells) can occur in as little as 1-3 hours, and peptides can cause significant increases in cytokine production in 4 hours (86, 93, 94). Recently, β-chemokines have been shown to induce neutrophil chemotaxis in one hour (95), and given the similar tertiary structure and some shared functions, β-defensins may also be able to chemoattract neutrophils. 
If other defensins have the ability to chemoattract various immune effector cells, it is possible that DEFA5 could have similar abilities and that these events could occur within the timeframe indicated for its in vivo activity. This mouse model and these experiments elegantly show the important and often underestimated role that defensins play in the proper functioning of the immune system. Direct antimicrobial activity by DEFA5 was not shown definitively, although DEFA5 alters the microbiota composition within the mouse small intestine (96). It is possible that DEFA5, and indeed other defensins, have other functions that are as important as direct antimicrobial activity. It is also difficult to determine whether the effects seen are an additive effect of the increased expression of defensins, endogenous plus DEFA5, in the mouse intestine. Matrix metalloproteinase (MMP) 7 deficient mice were available; however, in addition to α-defensin processing, MMP7 is also involved in matrix remodeling (97), wound healing (98) and neutrophil migration (99), all of which could significantly affect bacterial-induced inflammatory responses. The approach of knocking out endogenous α-defensins in mice, combined with complementation of the knock-outs either genetically or by the addition of exogenous peptide, may be a better indication of their function.

The choice of gene for deletion is complicated by the fact that there is not a clear orthologous relationship between human and mouse Paneth cell α-defensins. Whereas humans express two Paneth cell α-defensins, DEFA5 and DEFA6 (100), over twenty α-defensin genes or defensin-related genes have been described in several strains of mouse. Six of these, Crp1-6, which correspond to Defcr1-6, have been purified as peptides from Outbred Swiss mice (101). In this strain, Crp1 is most abundant in adult mice (29), whereas Defcr6 is most abundant in newborn mice (102).
Crp2 and Crp3 differ from Crp1 in primary sequence at only three amino acids (29). Murine α-defensins are also differentially expressed throughout the small intestine. Defcr1 and Defcr5 expression is equivalent in duodenum, jejunum, and ileum, whereas the others show regional differences (103). Taking all of these factors into consideration, the generation of Defcr1 and Defcr5 deficient mice was proposed. In retrospect, for reasons discussed below, these two genes were the least suitable for knock out in the C57BL/6J mouse, but serendipitously this choice drew attention to annotation and assembly problems within the α-defensin region of the C57BL/6J reference genome, as well as differences in α-defensin genomic content between different strains of mice. It also highlighted discrepancies between genome databases, which, until their resolution, prevented the generation of any α-defensin deficient mouse. A collaboration was established between the Centre for Microbial Disease and Immunity Research at the University of British Columbia (Vancouver, BC, Canada) and the Wellcome Trust Sanger Institute (Hinxton, Cambridge, UK) to investigate the genomic structure of α-defensins, with the ultimate goal of generating an α-defensin deficient mouse with which to study their in vivo function.

2.2  Methods and Materials

2.2.1 Genomic analysis and annotation pipeline

Prior to the process of manual annotation, an automated analysis for similarity searches and ab initio predictions is run in an extended Ensembl analysis pipeline system (104). All search results are stored in an Ensembl MySQL database. Following genomic sequence masking of interspersed repeats and tandem repeats by RepeatMasker and Tandem Repeats Finder (105), a WU-BLASTN search against the nucleotide databases is performed. Significant hits are then realigned to the unmasked genomic sequence using est2genome (106). The Uniprot protein database is then searched with WU-BLASTX.
To predict protein domains, Genewise (107) is used to align hidden Markov models for Pfam (Protein family database) protein domains against the genomic sequence. Finally, a number of different ab initio algorithms are used: Genscan predicts genes (108), tRNAscan predicts tRNA genes (109), and Eponine TSS predicts transcription start sites (110).

After completion of the automated analysis, manual annotation starts using a Perl/Tk based graphical interface, called 'otterlace', to edit annotation data stored in a separate MySQL database system (111). Otterlace provides tools for changing exon coordinates, adding gene names and remarks, assigning genes to different categories or adding genomic features such as polyA sites and signals. The annotation of gene objects requires a visual representation of the genomic region and features such as CpG islands, repeats and polyA sites, gene predictions, evidence to support the annotation of gene structures (EST/cDNA/protein), and all transcript variants created by annotators. This representation is provided by a graphical user interface called ZMap, which was written in the C programming language to give the high performance required to display large numbers of features (personal communication, R. Storey). An alignment viewer called 'Blixem' allows gapped alignments of nucleotide and protein blast hits to be compared with the genomic sequence (112). Furthermore, a 'Dotplot' tool called 'Dotter' is used to show pair-wise alignments of unmasked sequences, revealing the location of exons that are occasionally missed by the automated blast searches due to their small size and/or match to repeat-masked sequence (112). All annotation is publicly available in the Vertebrate Genome Annotation (VEGA) browser (113). Definitions of Vega gene and transcript types are also available.
Subsequent to the first annotation of the region, the C57BL/6J genomic assembly changed slightly and some of the gene names have been changed. Figure 2.1 represents the current version at the time of publication and writing of this thesis. Vega will be updated accordingly.

2.2.2 Mouse genomic assembly

The manual annotation of the defensin gene cluster region was based on NCBI Build 36 (NCBIM36). However, at the time of writing the new build NCBIM37 was released, and the two assemblies show several differences. The most crucial one is that a new clone, AC161189, was added to the new assembly; it partially overlaps with, and has replaced, clone AC140205. Although most genes initially annotated in AC140205 are present in AC161189, there are seven loci missing. One of these codes for β-defensin 33 (Defb33) and the remaining ones are pseudogenes. In order to preserve these data we propose that the clone AC140205 should be trimmed from the point where it is unique and be returned to the new assembly. The Genome Reference Consortium (114), which is a collaborative effort between NCBI, WTSI, EMBL-EBI and the Genome Center at Washington University, aims to close remaining gaps in the human and mouse genomes and remove discrepancies in clones observed by research groups. The issue described here has been submitted to the Genome Reference Consortium.

2.2.3 Size estimation of the 2 Mb gap of C57BL/6J mouse Chromosome 8

The genomic sequence of Chromosome 8:18,508,450-24,203,501 was downloaded from Vega (v. 23, NCBIM36), and N’s in the sequence, which represent gap regions, were replaced by a space. The sequence was then digested in silico at SwaI-predicted sites using RestrictionMapper (v. 3) (115). The ordered digest fragments were aligned with SwaI-digested DNA fragments from the same region of the Optical Map for murine C57BL/6J Chromosome 8, which was kindly provided by Steve Goldstein (Genome Center of Wisconsin, University of Wisconsin-Madison).
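The in silico digest can be illustrated with a short Python sketch. It is not the RestrictionMapper implementation; it assumes the SwaI recognition sequence ATTTAAAT with a blunt cut between ATTT and AAAT, and the function name and toy sequence are hypothetical, chosen only for illustration.

```python
import re

# SwaI recognition site; the enzyme is assumed to cut blunt between ATTT and AAAT.
SWAI_SITE = "ATTTAAAT"
CUT_OFFSET = 4  # bases from the start of the site to the cut position


def swai_fragment_sizes(sequence):
    """Return the ordered fragment lengths of a linear sequence cut at every SwaI site."""
    seq = sequence.upper()
    # A lookahead is used so that overlapping sites are also found.
    cuts = [m.start() + CUT_OFFSET for m in re.finditer("(?=" + SWAI_SITE + ")", seq)]
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


# Hypothetical toy sequence containing a single SwaI site.
print(swai_fragment_sizes("CCATTTAAATGG"))  # prints [6, 6]
```

The ordered list of predicted sizes is the kind of output that could then be compared against the fragment sizes of an optical map.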
The alignment and agreement of fragment sizes between the predicted and in vitro digests were determined manually and the results displayed in Excel. The presence of the inverted tandem repeat (Contig AC152164.14) was determined by S. Goldstein (personal communication).

2.2.4 Genome browser/database gene set comparison

The Vega mouse α-defensin genes annotated here were used to query the gene sets contained within the Ensembl (v. 50), MGI (v. 4.11) and NCBI (Build 37.1) databases. Both Vega gene symbols and database identifiers (OTTIDs) were used because, depending on the database and gene, different results were sometimes returned with either search. As well, not all databases recognize the Vega accession number if a gene name has not been assigned. A gene linked to Vega indicates that there is a reference to the Vega Gene/OTTID as being the same gene (i.e. mapped to the same position on the chromosome), but that the gene has been given a database-specific name. If the database had referenced another database to obtain the OTTID but there was no acknowledgement of the mapping in Vega, the gene was not considered linked and was put into the Additional Gene column. Genes that were identified in the searches but were different to the Vega gene were also put into the Additional Gene column. NCBI was searched with OTTIDs and gene name because some Vega accession numbers return bacterial artificial chromosome (BAC) clones in the results. NCBI does not cross-reference Vega; rather, it obtains its information from MGI, so none of the genes are directly linked to Vega; genes linked to the Vega-linked Ensembl genes are therefore listed.

2.2.5 Genetic analysis of α- and CRS-defensins

Nucleotide and protein sequences of α- and CRS-defensins were aligned using ClustalW2 multiple sequence alignment software, with default parameters (116).
Nucleotide ClustalW2 alignment files (.aln) were visualized using the GeneDoc software (117); protein ClustalW2 alignment files were visualized in .aln format. Basic local alignment search tool (BLAST) searches were performed using NCBI BLAST 2.2.18+ and WU-BLAST2, with default and variable parameters, with both CRS-defensin genomic and peptide sequences. BLASTP (2.2.18+) searches only returned annotated defensin peptides; therefore, unannotated nucleotide databases (whole genome shotgun, expressed sequence tag, and draft genomes) were queried using TBLASTN. CRS-defensin peptide sequences were also used to query Pfam (v. 22.0) using the batch search option, with default parameters.

2.2.6 Transcriptional expression profiling of α- and CRS-defensins in C57BL/6J mice

2.2.6.1 Universal defensin primer design

Primers designed to amplify all annotated α- and CRS-defensin cDNAs in a single PCR reaction can be found in Table 2.1; these primers are denoted as universal defensin primers. The 26 mouse α- and CRS-defensin proteins were aligned using ClustalW2 (v. 2.0.8) with default parameters (116), and then PAL2NAL (v. 11) was used to align the corresponding coding sequences according to their relative codon alignments (118). The output PAL2NAL alignment was visualized with the GeneDoc software (117). Additionally, the α- and CRS-defensin 5’-untranslated region (UTR) sequences were downloaded from Ensembl (v. 50) with BIOMART using Ensembl Gene IDs, corresponding to their Vega Gene IDs, and then aligned using ClustalW2 (v. 2.0.8).

Table 2.1. Primer sequences for universal defensin expression profiling by capillary sequencing.

Primer Name       Sequence (5'-3')
UDefF1            ATGAAGACAYTWGTCCTCCTCTCTG
UDefR             GCAGCACAGATACGACTCACG
UDefR-oligo-dT    GCAGCACAGATACGACTCACGTTTTTTTTTTTTTTTTTTVN

The consensus sequence for the start of the coding sequence was identified as the longest stretch of 100% identity between all genes.
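The idea of locating the longest stretch of 100% identity in a multiple alignment can be sketched in Python as follows. This is only an illustration of the principle, not the procedure used here (the alignments were inspected in GeneDoc); the function name and the toy sequences are hypothetical and are not real defensin coding sequences.

```python
def longest_conserved_run(aligned):
    """Return (start, length) of the longest run of alignment columns
    in which every sequence carries the same, non-gap character."""
    if not aligned:
        return (0, 0)
    width = min(len(s) for s in aligned)
    best_start, best_len = 0, 0
    run_start = None
    for col in range(width):
        column = {s[col].upper() for s in aligned}
        conserved = len(column) == 1 and "-" not in column
        if conserved:
            if run_start is None:
                run_start = col
            run_len = col - run_start + 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_start = None
    return (best_start, best_len)


# Hypothetical aligned toy sequences of equal length.
toy = ["ATGAAGACAC", "ATGAAGACAT", "ATGAAGTCAC"]
start, length = longest_conserved_run(toy)
print(toy[0][start:start + length])  # prints ATGAAG
```

Positions outside the fully conserved run are the ones at which a primer would need either the consensus base or an ambiguity code.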
OligoCalc was used to calculate GC content, melting temperature and secondary structure formation of potential primers (119). The UDefF1 primer comprised the nucleotide sequence for the first 25 bases following the start of each defensin coding sequence, which are all, except four positions, conserved across all defensin sequences. At these positions, the base included in the primer sequence was either taken from the consensus sequence or left as an ambiguous base. The choice depended on the relative number of genes with a substitution at that particular position. The first nine and last ten bases of the UDefF1 primer have 100% identity across all defensin genes, which should ensure accurate and specific amplification regardless of the intervening ambiguities. Reverse transcription was performed using the UDefR-oligo-dT hybrid primer. The oligo-dT sequence, based on literature searches (120, 121) and commercially available kits, contains 18 thymidines, as well as [AGC][N] to anchor the primer at the end of the transcript. The non-dT or anchor part of the hybrid primer, denoted UDefR, was designed to be equivalent to the UDefF1 forward primer in length, GC content and melting temperature so that, following reverse transcription, amplification was straightforward using the UDefF1 and UDefR primers. The UDefR sequence was derived from the 3’ RACE Adaptor sequence, 5’-GCGAGCACAGAATTAATACGACTCACTATAGGT12VN-3’, from the FirstChoice® RLM-RACE Kit (Ambion, Inc.). This sequence was modified according to BLAST results to obtain a sequence with no significant homology to the mouse genome or transcriptome, whilst still having a similar melting temperature to that of the forward UDefF1 primer (119).

2.2.6.2 Animals

Naïve C57BL/6J mice, three female and three male denoted as M1, M2, M3 and M4, M5, M6, respectively, were used at 4-8 weeks of age.
The mice were killed by carbon dioxide asphyxiation, and the small intestine removed and divided into seven equal segments. Faeces were removed from each segment, which was then cut open along its length and flattened. A tissue culture cell scraper was used to scrape along the entire segment to remove epithelial and Paneth cells. The scrapings were immersed in 500 μl RNAlater (Qiagen) in a 1.5 ml Eppendorf tube, and stored at 4°C for two days, after which RNA was extracted.

2.2.6.3 RNA extraction

RNA was extracted using the RNeasy Mini kit with QIAshredder columns (Qiagen), as per the manufacturer’s instructions. Briefly, the samples in RNAlater were microcentrifuged at maximum speed for 10 minutes to pellet the tissue, and the RNAlater supernatant removed. RLT lysis buffer (600 μl) containing 143 mM β-mercaptoethanol (Sigma Aldrich) was added to the sample and vortexed well. Tissues difficult to lyse were also pipetted to aid lysis. The entire lysate was transferred to a QIAshredder column, microcentrifuged at maximum speed for 2 minutes, added to a new tube containing 600 μl 70% ethanol (EtOH), and mixed by pipetting. The sample (600 μl) was transferred to an RNeasy spin column and microcentrifuged at maximum speed for 30 seconds; this step was repeated for each sample using the same column. The rest of the RNeasy Mini Handbook was followed, with on-column DNase digestion for 30 minutes at room temperature using 15 μl RNase-Free DNaseI and 115 μl RDD buffer (Qiagen) per sample. Each column was eluted twice with 50 μl nuclease-free water (NF-H2O) (Ambion, Inc.) giving a final volume of approximately 100 μl per sample. The RNA quality was assessed using the Agilent 2100 Bioanalyzer with 2100 Expert software (version B.02.05.SI360) (Agilent Technologies), and the concentration determined using the Thermo Scientific NanoDrop™ ND-1000 Spectrophotometer (Thermo Fisher Scientific Inc.), both as per the manufacturers’ instructions.
2.2.6.4 Reverse transcription

First strand synthesis reverse transcription (RT) was performed using the Qiagen QuantiTect Reverse Transcription kit, as per the manufacturer’s instructions. Briefly, 1000 ng total small intestinal RNA per mouse (samples M1-M6) was pooled from the individual extractions in a final volume of 12 μl. A further DNase digestion was carried out with 2 μl gWipeout buffer at 42°C for 2 minutes. The RT reaction contained 5 μl RT buffer, 4 μl 10 mM UDefR-oligo-dT primer (0.5 μg) and 1 μl RT enzyme, and was carried out at 42°C for 60 minutes, followed by 3 minutes at 95°C; the cDNA was stored at -20°C. Four RT reactions were performed for each sample, as well as quarter-sized no RT enzyme reactions, making up the volume with NF-H2O, as the control for DNA contamination within the RNA extractions. Aliquots of cDNA were then pooled for each sample.

2.2.6.5 PCR

The mouse small intestinal cDNA from samples M1-M6 was amplified using the Platinum Taq DNA Polymerase High Fidelity system (Invitrogen). Each reaction contained 1X High Fidelity PCR buffer, 200 μM each dNTP, 2 mM MgSO4, 200 nM each primer (UDefF1, UDefR), 1 U Platinum Taq High Fidelity polymerase and 2.5 μl pooled cDNA in a volume of 50 μl. Cycling conditions were as follows: 94°C for 2 minutes, 35 cycles of 94°C for 30 seconds, 68°C for 30 seconds, 68°C for 60 seconds, a final extension of 68°C for 10 minutes, then 4°C. PCR products (5 μl) were visualized by agarose gel electrophoresis (1% agarose, 80 V, 70 min).

2.2.6.6 TOPO cloning

The PCR products from each sample (M1-M6) were cloned into the pCR®4Blunt-TOPO® plasmid, according to the Zero Blunt® TOPO® PCR Cloning Kit for Sequencing instructions (Invitrogen). Ligation reactions performed at room temperature for 30 minutes contained 4 μl PCR product, 1 μl pCR4BluntII-TOPO vector and 1 μl dilute (1/4) salt solution. A pCR4BluntII-TOPO vector-only control was also included, making up the volume with 4 μl NF-H2O.
Each ligation reaction (2 μl) was then added to 50 μl One Shot® TOP10 Electrocomp™ E. coli cells (Invitrogen). The entire volume was added to a 0.1 mm cuvette (Cell Projects Limited, UK) and electroporated using the Bio-Rad Gene Pulser Xcell Electroporation System (Bio-Rad Laboratories, Inc.) with the pre-set E. coli bacterial electroporation program (25 μF, 200 Ω, 1800 V, exponential decay). For the recovery, 250 μl super optimal broth with catabolite repression (SOC, Invitrogen) was added and the entire volume transferred to a 15 ml Falcon tube and incubated at 37°C with shaking at 200 rpm, for 90 minutes. A positive control (1 μl pUC19 plasmid, Invitrogen) was also electroporated under the same conditions to ensure TOP10 cells were transformation competent. The E. coli cells (20, 50 or 80 μl replicates) were spread onto low salt Luria Bertani (LB-Luria) agar (1.5%) plates containing 50 μg/ml ampicillin (Roche Applied Science) and 50 μg/ml kanamycin (Gibco) for TOPO ligations, and 100 μg/ml ampicillin for pUC19, and incubated at 37°C overnight. For each sample, 96 individual colonies were picked and grown overnight at 37°C and 200 rpm in 500 μl LB-Luria broth containing 50 μg/ml ampicillin and 50 μg/ml kanamycin in a 96-well deep well plate (BD Biosciences). 2.2.6.7 Capillary sequencing of TOPO clone inserts Plasmid extraction from the overnight cultures (96-well format), and capillary sequencing of the inserts using standard T3 and T7 primers was performed by the WTSI core sequencing facility. Capillary sequencing (Applied Biosystems 3730XL capillary sequencer) was performed using BigDye Terminator BDTv3.1 sequencing chemistry. The sequencing facility (Team 56) processed the sequence trace files using the sequencing production software Asp and then deposited those that passed quality control into a central repository. I performed further analyses of the sequences.  
2.2.6.8 Capillary sequencing analysis

The T3 and T7 sequences were converted from experimental file format to FASTA file format using the WTSI program exp-piece and the following command, where n = 1-6 for each M1, M2, M3, M4, M5, M6 sample, the prefix for each experimental file is T15_lr5_090929UDefMn and the prefix for each resulting FASTA file is 100429_UDEF_Mn_exp-piece_FASTA. The exp-piece option (left right) clips the sequences, providing the first and last useful bases.

    for read in `indir | grep "T15_lr5_090929UDefMn"`; do exp-piece left right $read >> ~lr5/1793864-1793875/100429_UDEF_Mn_exp-piece_FASTA; done

The α- and CRS-defensin transcript FASTA sequences were downloaded from Vega (v. 37) using Ensembl Biomart. A database was created from the text file of the sequences using the following command, where -i is the input file, -p indicates whether the sequences are protein (F, false, for nucleotide), and -o T parses the sequence identifiers in the FASTA headers and creates the indices.

    formatdb -i 100428_defensin_transcript_OTTIDs_FASTA_Vega37.txt -p F -o T

BLAST was then used to batch query each Mn_exp-piece_FASTA file (n=1-6) against the defensin transcript database using the following commands, where blastall is the program, -p is the BLAST program to run (blastn, nucleotide), -d is the database name, -i is the input query file name, -o is the name of the output file, -b is the number of matches to retrieve (5), -m is the alignment output format (0 is pairwise, 8 is tabular, 9 is tabular with headings), -e is the expected value for significant alignments, and -F indicates any sequence filtering (F(alse) turns off filtering to prevent masking of repetitive sequences).
    blastall -p blastn -d 100428_defensin_transcript_OTTIDs_FASTA_Vega37.txt -i 100430_UDEF_Mn_exp-piece_clipped_FASTA -o 100430_UDEF_Mn_exp-piece_clipped_v_transcript_5_alignments -b 5 -m 0 -e 1e-05 -F F

    blastall -p blastn -d 100428_defensin_transcript_OTTIDs_FASTA_Vega37.txt -i 100430_UDEF_Mn_exp-piece_clipped_FASTA -o 100430_UDEF_Mn_exp-piece_clipped_v_transcript_5_tabular -b 5 -m 0 -e 1e-05 -F F

All sequences with significant BLAST results were inspected manually using the trace viewing software Trev (v. 1.9), which is part of the Staden Package and displays both the base calls and confidence values. If a base-calling error was observed, the sequence position was noted in the tabular output file and the sequence corrected in the BLAST alignment file. As the BLAST analysis was performed against the database of full-length defensin transcript sequences, any mismatches in the UDefF1 primer region, either due to primer synthesis defects or ambiguous base incorporation, were ignored. The percentage identity of the best BLAST hit for each (edited) sequence read was noted and the results tabulated.

2.2.6.9 Universal defensin 454 primer design and multiplex identifier tag incorporation

The primers used for defensin cDNA amplification prior to 454 sequencing were the same as those used prior to capillary sequencing, except that sequences for 454-specific adaptors and GS multiplex identifier (GSMID) tags were incorporated (Table 2.2). The Adaptor A and B sequences were added 5’ to the UDefF1 and UDefR primer sequences, respectively, and the sequences for the GSMID tags were added between the adaptor and UDefF1 sequences. This allowed 454 sequencing of the defensin PCR products in the forward direction, with respect to transcript orientation, beginning at the start codon. Sequencing of the reverse strand is not possible because of the homopolymeric thymidine region, due to the transcript polyA tail.
The GSMID sequences were added to each forward primer for multiplex sequencing of pooled PCR products. Standard Adaptor and GSMID sequences were obtained from Roche.

Table 2.2. Primer sequences for universal defensin expression profiling by 454 amplicon sequencing. Each sequence is shown as the adaptor sequence, followed by the GSMID sequence (forward primers only), followed by the UDefF1 or UDefR sequence for the UDefF1-454 and UDefR-454 primers, respectively; the parts are separated by spaces. The Roche standard adaptor and MID nomenclature are indicated.

Primer Name     Adaptor  GSMID  Sequence (5'-3')
UDefF1-454-1    A        1      CCATCTCATCCCTGCGTGTCTCCGACTCAG ACGAGTGCGT ATGAAGACAYTWGTCCTCCTCTCTG
UDefF1-454-2    A        2      CCATCTCATCCCTGCGTGTCTCCGACTCAG ACGCTCGACA ATGAAGACAYTWGTCCTCCTCTCTG
UDefF1-454-3    A        3      CCATCTCATCCCTGCGTGTCTCCGACTCAG AGACGCACTC ATGAAGACAYTWGTCCTCCTCTCTG
UDefF1-454-4    A        4      CCATCTCATCCCTGCGTGTCTCCGACTCAG AGCACTGTAG ATGAAGACAYTWGTCCTCCTCTCTG
UDefF1-454-5    A        5      CCATCTCATCCCTGCGTGTCTCCGACTCAG ATCAGACACG ATGAAGACAYTWGTCCTCCTCTCTG
UDefF1-454-6    A        6      CCATCTCATCCCTGCGTGTCTCCGACTCAG ATATCGCGAG ATGAAGACAYTWGTCCTCCTCTCTG
UDefR-454       B        n/a    CCTATCCCCTGTGTGCCTTGGCAGTCTCAG GCAGCACAGATACGACTCACG

2.2.6.10 454 sequencing amplicon PCR

The conditions for the 454 sequencing amplicon PCR were the same as previously described, except the UDefF1 primer was replaced with the UDefF1-454 primers (1-6), and each sample (M1-M6) amplified with the corresponding numbered UDefF1-454 primer. The same UDefR primer was used in all reactions. Four PCR reactions were set up for each sample, and the products (5 μl) visualized by agarose gel electrophoresis (1% agarose, 80 V, 35 min). For each sample 10 μl of each PCR reaction was pooled, and then 2 μl of each sample pool was pooled. The final M1-M6 PCR pool was quantified using the Quant-iT™ PicoGreen dsDNA Reagent Kit (Qiagen), as per the manufacturer’s instructions.
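The composition of the fusion primers in Table 2.2 (adaptor, then GSMID tag, then the universal defensin primer) can be expressed as a short Python sketch. The sequences are those listed in Table 2.2; the function names are hypothetical, and the read-assignment helper assumes that a read begins directly with its GSMID tag, which is an illustrative simplification rather than a description of the Roche software that split the data by tag.

```python
# Sequences as listed in Table 2.2.
ADAPTOR_A = "CCATCTCATCCCTGCGTGTCTCCGACTCAG"
ADAPTOR_B = "CCTATCCCCTGTGTGCCTTGGCAGTCTCAG"
UDEF_F1 = "ATGAAGACAYTWGTCCTCCTCTCTG"
UDEF_R = "GCAGCACAGATACGACTCACG"
GSMID = {
    1: "ACGAGTGCGT",
    2: "ACGCTCGACA",
    3: "AGACGCACTC",
    4: "AGCACTGTAG",
    5: "ATCAGACACG",
    6: "ATATCGCGAG",
}


def forward_primer(mid_number):
    """Adaptor A + GSMID tag + UDefF1, i.e. primer UDefF1-454-<mid_number>."""
    return ADAPTOR_A + GSMID[mid_number] + UDEF_F1


def reverse_primer():
    """Adaptor B + UDefR, i.e. primer UDefR-454 (no GSMID tag)."""
    return ADAPTOR_B + UDEF_R


def sample_for_read(read):
    """Return 'M<n>' if the read starts with GSMID tag n, otherwise None.

    Assumption for illustration: the adaptor has already been removed,
    so the tag is the first ten bases of the read."""
    for number, tag in GSMID.items():
        if read.upper().startswith(tag):
            return "M%d" % number
    return None


print(forward_primer(1))
print(sample_for_read(GSMID[4] + "ATGAAGACACTTGTC"))  # prints M4
```

Because sample Mn was amplified with primer UDefF1-454-n, the tag number maps directly onto the mouse sample.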
2.2.6.11 454 amplicon sequencing

The final M1-M6 PCR pool (18 ng/μl, 11 μl) was submitted to the WTSI core sequencing facility (Team 130) for GS FLX Titanium Series Amplicon 454 Sequencing (Roche). All subsequent sequencing protocols, data generation and quality control were performed by staff of Team 130. Briefly, the M1-M6 PCR pool was denatured to give single-stranded fragments, which were then immobilized onto DNA Capture Beads (Roche). The beads were subjected to emulsion PCR (emPCR) and then loaded onto a PicoTiter Plate (Roche). Sequencing was performed with the Genome Sequencer FLX Instrument (Roche). The data were then split into six files based on their GSMID tags (i.e. M1-M6), and deposited into a central repository as Standard Flowgram Format (SFF) files. I performed all subsequent analyses.

2.2.6.12 454 amplicon sequence analysis

The SFF files contain the flowgram, base-called sequences and quality scores for each sequencing read (Roche). FASTA files were extracted from each MIDn.sff file, where n=1-6, and saved in a new file using the following command (shown for MID1).

    sffinfo -s Region8.MID1.sff > 100512_Region8.MID1.sff.fa

All sequences in each FASTA file were counted with the following command, where n=1-6, and the results tabulated.

    grep ">" 100512_Region8.MIDn.sff.fa | wc -l

The FASTA files were filtered for sequences over 300 nucleotides using a perl script kindly provided by Gregory Baillie (WTSI), and the following command, where n=1-6. The options for the get_seqs_within_length_range.pl script are minimum and maximum length, to which 300 and, arbitrarily, 10000 were assigned.

    ~gb7/bioinformatics/perl_scripts/get_seqs_within_length_range.pl 100512_Region8.MIDn.sff.fa 100512_Region8.MIDn_over300.sff.fa 300 10000

All sequences in each filtered FASTA file were counted with the following command, where n=1-6, and the results tabulated.

    grep ">" 100512_Region8.MIDn_over300.sff.fa | wc -l

A database of α- and CRS-defensin sequences was created using the following command, and the same FASTA sequences as for the capillary sequencing analysis, except that any 5’-UTR and the UDefF1 primer sequences were removed (trimmed transcripts).

    formatdb -i 100512_defensin_transcript_trimmed_OTTIDs_FASTA_Vega37.txt -p F -o T

Nucleotide BLAST analysis was then performed with each 300 nucleotide filtered MID FASTA file against the trimmed transcript database, for the top hit in tabular form using the following command, where MIDn=MID1-6.

    blastall -p blastn -d 100512_defensin_transcript_trimmed_OTTIDs_FASTA_Vega37.txt -i 100512_Region8.MIDn_over300.sff.fa -o 100512_Region8.MIDn_over300.sff.fa_v_trimmed_transcript -b 1 -m 8 -e 1e-05 -F F

For each MID, new files of all sequencing reads with 100% identity to any of the reference sequences, over an alignment length of greater than 300 nucleotides, were created and sorted, and then counted using the following commands, where n=1-6.

    awk '{if ($3 == 100 && $4 > 300) {print $2 "\t" $3}}' 100512_Region8.MIDn_over300.sff.fa_v_trimmed_transcript | sort -k 1,1 > 100512_Region8.MIDn_over300.sff.fa_v_trimmed_transcript_100hits

    wc -l 100512_Region8.MIDn_over300.sff.fa_v_trimmed_transcript_100hits

The number of sequencing reads matching each reference sequence with 100% identity was quantified for all MID files with the following command. The MID1-6 variable was first defined, and the filtered BLAST files were then sorted according to reference sequence and the number of unique entries counted.

    for mid in 1 2 3 4 5 6; do echo "Mid $mid"; sort 100512_Region8.MID${mid}_over300.sff.fa_v_trimmed_transcript_100hits | uniq -c; done

The BLAST analysis was repeated with the previous blastall command with alignments as the output (-m 0); however, this output also contains alignments under the 300 nucleotide cut-off used for the previous analyses.
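The filtering and counting steps above (awk, sort and uniq -c on the tabular BLAST output) can also be sketched in Python. This is an equivalent illustration, not the method that was run; it assumes the standard tabular (-m 8) column order of query, subject, percent identity and alignment length in the first four columns, and the function name, read names and hit lines in the example are hypothetical.

```python
from collections import Counter


def count_perfect_hits(lines, min_length=300):
    """Count reads per reference sequence for hits with 100% identity
    over an alignment longer than min_length (columns 3 and 4 of -m 8 output)."""
    counts = Counter()
    for line in lines:
        fields = line.split()  # whitespace-delimited, as awk splits by default
        if len(fields) < 4 or fields[0].startswith("#"):
            continue  # skip blank, truncated or comment lines
        subject = fields[1]
        identity = float(fields[2])
        alignment_length = int(fields[3])
        if identity == 100.0 and alignment_length > min_length:
            counts[subject] += 1
    return counts


# Hypothetical tabular BLAST lines, for illustration only.
example = [
    "read1\tDefcr1\t100.00\t310\t0\t0\t1\t310\t1\t310\t1e-170\t600",
    "read2\tDefcr1\t99.68\t312\t1\t0\t1\t312\t1\t312\t1e-168\t590",
    "read3\tDefcr5\t100.00\t305\t0\t0\t1\t305\t1\t305\t1e-165\t580",
    "read4\tDefcr1\t100.00\t250\t0\t0\t1\t250\t1\t250\t1e-130\t480",
]
for reference, n in sorted(count_perfect_hits(example).items()):
    print(reference, n)
```

With the example lines, only read1 and read3 pass both filters, giving one read each for the two hypothetical references.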
2.2.6.13 Identification of unannotated C57BL/6J defensin transcripts

The 454 amplicon sequencing reads that did not match any of the defensin reference transcript sequences with 100% identity were further analyzed for putative novel variants. Peptide analysis of the C57BL/6J murine α- and CRS-defensins has revealed an α-defensin, termed Crp27, whose corresponding cDNA sequence, termed Defcr27, has not been annotated to the reference genome (personal communication, Andre Ouellette and Michael Shanahan). BLAST analysis was therefore performed to search the 454 sequences for matches to the Crp27 peptide. A database of peptide sequences, including Crp27, and the six other α-defensin peptides found to be present in the C57BL/6J mouse as controls (personal communication, A. Ouellette and M. Shanahan), was created using the following command.

    formatdb -i 100527_Crp_protein_7_FASTA -p T -o T

Each extracted MID FASTA file, without 300 nucleotide filtering, was then blasted against the peptide database using the following command, where n=1-6.

    blastall -p blastx -d 100527_Crp_protein_7_FASTA -i 100512_Region8.MIDn.sff.fa -o 100527_Region8.MIDn.sff.fa_v_100527_Crp_protein_7_FASTA -m 8 -e 1e-05 -F F

The output files were filtered and counted for those that translate with 100% identity to the full length of Crp27 (35 amino acids) using the following command.

    awk '{if ($2 ~ /Crp27/ && $3 == 100 && $4 >= 35) {print}}' 100527_Region8.MIDn.sff.fa_v_100527_Crp_protein_7_FASTA | wc -l

BLAST analysis was repeated for each MID FASTA with 300 nucleotide filtering, and the results tabulated for 100% identity for each of the seven peptides using the following command, where $4 = amino acid length of each peptide (Crp3=35, Crp5=36, Crp20=42, Crp21=36, Crp23=35, Crp24=35, Crp27=35).

    for mid in 1 2 3 4 5 6; do echo "Mid $mid"; awk '{if ($2 ~ /Crp3/ && $3 == 100 && $4 == 35) {print}}' 100601_Region8.MID${mid}_over300.sff.fa_v_100601_Crp_protein_7_FASTA | wc -l; done

2.2.7 In vitro activity of the human defensin DEFA1

2.2.7.1 Reagents

DEFA1 (ACYCRIPACIAGERRYGTCIYQGRLWAFCC) was synthesized by N-(9-fluorenyl)methoxycarbonyl (Fmoc) chemistry, and folded, as previously described (122), at the Biomedical Research Centre at the University of British Columbia (UBC). LL-37 (LLGDFFRKSKEKIGKEFKRIVQRIKDFFRNLVPRTES) was synthesized by Fmoc chemistry at the Nucleic Acid Protein Service Unit at UBC. The lyophilized peptides were resuspended in endotoxin-free water, and their concentrations determined by amino acid analysis. LPS was purified from Pseudomonas aeruginosa strain H103 as per the Darveau-Hancock method (123).

2.2.7.2 Cell culture

Human venous blood (100 ml) was collected from healthy volunteers in Vacutainer collection tubes containing sodium heparin (BD Biosciences) in accordance with UBC ethical approval and guidelines. The blood was mixed with an equal volume of complete RPMI media, and 20 ml layered over 10 ml Ficoll-Paque® Plus (Amersham Biosciences) or Lymphoprep (Cedar Lane Labs) in 50 ml Falcon Tubes. PBMCs in the buffy coat were isolated by gradient centrifugation. The tubes were centrifuged at 1450 rpm for 20 minutes in a Beckman Coulter Allegra 6 Centrifuge, and the buffy coat removed. The PBMCs were washed twice with Dulbecco’s PBS (D-PBS), spinning in between at 1600 rpm for 8 minutes. The cells were resuspended in complete RPMI and 1 × 10^6 cells in 1 ml seeded into 24-well tissue culture plates (BD Falcon), and rested at 37°C with 5% CO2 for 1-2 hours prior to stimulation. PBMCs were then treated for 24 hours with DEFA1, LL-37 or LPS at the indicated concentrations. Endotoxin-free water served as the vehicle control.
2.2.7.3 Detection of cytokines and chemokines

Following incubation with the various stimuli, the culture supernatants were removed, centrifuged at 1000 x g for 10 minutes, and then stored at -20°C. The concentrations of IL8, IL10 and TNF in the supernatants were measured by enzyme-linked immunosorbent assay (ELISA), as per the manufacturers’ instructions (IL8: BioSource International, Camarillo, CA; TNF and IL10: eBioscience, San Diego).

2.3 Results

2.3.1 Annotation of the defensin clusters of the mouse C57BL/6 reference genome

2.3.1.1 Genomic overview of the annotated region on mouse Chromosome 8

Manual annotation was performed for the genomic region on mouse Chromosome 8:18.9-23.2 Mb, which consists of 18 finished BAC clones that span approximately 2.4 Mb of the NCBIM36 assembly (and subsequently NCBIM37) reference sequence (Figure 2.1A). To date, this is the only known region associated with murine α-defensins, and it was therefore chosen as the starting point for defensin annotation in the mouse genome. This region also contains two β-defensin gene clusters, which flank the α-defensin gene cluster. Known MGI nomenclature was used to name genes only if there was a 100% cDNA match. Otherwise, genes were referred to by an interim Vega database identifier such as OTTMUSG00000018259. The Vega database identifier is stable and versioned, and will remain unchanged after further naming or assembly updates. In total, annotation of this genomic region revealed 54 and 44 loci in the α-defensin and β-defensin gene clusters, respectively, including both predicted genes and pseudogenes (Tables 2.3 and 2.4). The entire defensin gene cluster is flanked by Xkr5 (X Kell blood group precursor-related family, member 5) and Ccdc70 (coiled-coil domain containing 70). This region also contains three gaps, ranging in size from 50 kb to 2 Mb, where additional defensin genes could be located.
Following alignment of the in silico SwaI-digested genomic sequence with the fragment sizes for the corresponding region of the optical map of Chromosome 8, the intervening gap between the defensin clusters was estimated at approximately 0.76 Mb (Figure 2.2 and Appendix A.1), which is considerably smaller than the current 2 Mb size. The publication of the ‘finished’ C57BL/6J genome in 2009 included the same gap alignment analysis for NCBIM36, with a size estimate of 0.86 Mb (124).

Figure 2.1. Overview of the Chromosome 8 defensin gene cluster region in mouse and human reference genomes. A clone tiling path is shown for the corresponding regions in mouse (A) and human (B). Clones are displayed in yellow but regions overlapping with adjacent clones are shown in black. Genes are indicated by blue arrows and pseudogenes by green arrows. The font colour of the gene identifier classifies the genes according to their defensin subfamily or non-defensin status (see colour legend for more details). Genes in shadowed boxes are duplicated and the colour indicates the pairs. A -'- highlights all potential Defcr5 genes. The mouse assembly is based on NCBIM37, in which three gaps currently exist; two gaps are represented by grey bars and the biggest gap between the two clusters is joined by a 'V'. During writing of this thesis, MGNC implemented our suggestion of changing the α- and CRS-defensin gene symbol root from Defcr to Defa. This figure represents the most current version of the annotation; however, all other analyses, and the subsequent figures and tables, were generated using the Defcr root.

Table 2.3. Genes annotated within the α-defensin and β-defensin clusters on C57BL/6 mouse Chromosome 8. The Vega (v. 38) genomic coordinates of the region between Xkr5 and Ccdc70 are 8:18,932,729-21,221,369 bp. The order of genes in the table corresponds to their 5’-3’ order within the genome.
Proposed gene symbols are included, some of which have been implemented by the Mouse Genomic Nomenclature Committee (MGNC), as well as analyses concerning polymorphisms and orthology, where known or predicted. Approved gene symbols are shown in bold. “AC” gene symbols are provisional and based on their position on the clone. Nine α-defensins in rat do not have mouse orthologues; n/a, corresponding rat gene not found. Chr, Chromosome.  defensin beta 40  2  Rat TATA-Box TATA-Box Polymorphic/ Human Orthologue Orthologue CNV/ Genomic (potential) Coordinates Duplication OTTMUSG00000020772 Defb40  defensin beta 37  2  OTTMUSG00000020771 -  Defb37  defensin beta 38  2  OTTMUSG00000020768 -  Defb38  defensin beta 39  2  OTTMUSG00000020770 -  Defb39  defensin beta 12  3  defensin beta 34  2  OTTMUSG00000020595 TATAAATG 19114860 19114865 (strand -1) OTTMUSG00000020594 -  2 and 3 OTTMUSG00000020593  2  OTTMUSG00000020599 TATAAATG 19157855 19157862  SPAG11B  Defb14  spermassociated antigen 11c/h (Spag11c/h) sperm associated antigen 11 defensin beta 14  2  OTTMUSG00000020600 -  DEFB103 Defb14 (MGI & ref)  Defb4  defensin beta 4  2  OTTMUSG00000020596 -  DEFB4 (ref)  Defb6  defensin beta 6  2  OTTMUSG00000020598  n/a  2  OTTMUSG00000020619 -  n/a  Organism/ Gene Species/ Symbol Chr Mouse Defb40 C57BL/6J Chr8 Mouse Defb37 C57BL/6J Chr8 Mouse Defb38 C57BL/6J Chr8 Mouse Defb39 C57BL/6J Chr8 Mouse Defb12 C57BL/6J Chr8 Mouse Defb34 C57BL/6J Chr8 Mouse AC116394.3 C57BL/6J Chr8 Mouse C57BL/6J Chr8 Mouse C57BL/6J Chr8 Mouse C57BL/6J Chr8 Mouse C57BL/6J Chr8 Mouse C57BL/6J Chr8 Mouse C57BL/6J Chr8 Mouse C57BL/6J Chr8 Mouse C57BL/6J Chr8 Mouse C57BL/6J Chr8 Mouse C57BL/6J Chr8  Spag11  Gene Name  AC163997.1 beta-defensin 53 (Defb53)  Proposed Gene Exon Vega-Gene ID Symbol Count  Defb#  DEFB105 Defb12 (MGI & ref) n/a NP_0010329 41.1 (20 aa shorterdeletion) Spag11  Defb4  Defb5  defensin beta 5  2  OTTMUSG00000020669 -  n/a  Defb3  defensin beta 3  2  OTTMUSG00000020668 -  Defb3  Defb8  
defensin beta 8  2  OTTMUSG00000020716 -  n/a  Defb7  defensin beta 7  2  OTTMUSG00000020722 -  n/a  Defa-rs#  2  OTTMUSG00000018344 TATAAATG 22084456 22084463 (strand -1)  n/a  Defb#  3  OTTMUSG00000018889 -  Defb51  AC140205.1 CRS4C-6 (cryptdinrelated sequence peptide) Mouse AC140205.2 Beta-defensin C57BL/6J 51 (Defb51) Chr8  32  Organism/ Gene Species/ Symbol Chr Mouse AC140205.3 C57BL/6J Chr8 Mouse AC140205.4 C57BL/6J Chr8 Mouse Defcr21 C57BL/6J Chr8 Mouse Defcr23 C57BL/6J Chr8 Mouse AC129197.1 C57BL/6J Chr8  Gene Name  Proposed Gene Exon Vega-Gene ID Symbol Count  beta-defensin 52 (Defb52)  Defb#  4  TATA-Box TATA-Box Polymorphic/ Human Rat Genomic CNV/ Orthologue Orthologue Coordinates Duplication (potential) OTTMUSG00000018888 Defb52  beta-defensin 33 (Defb33)  Defb#  3  OTTMUSG00000018925 -  Defb33  defensin related Defa21 cryptdin 21  2  OTTMUSG00000019489 TATAAATA 22165193 22165200  n/a  defensin related Defa23 cryptdin 23  2  OTTMUSG00000019488 TATAAATG 22194686 22194693  n/a  2  OTTMUSG00000018258 TATAAATG 22204671 22204678  n/a  2  OTTMUSG00000019700  n/a  2  OTTMUSG00000018260 TATAAATG 22235796 22235803 (strand -1)  n/a  2  OTTMUSG00000019742 TATAAAGG 22274268 22274275  n/a  2  OTTMUSG00000019763 TATAAATA 22301925 22301932  n/a  2  OTTMUSG00000019762 TATAAATG 22331219 - Yes 22331226  n/a  2  OTTMUSG00000019786 TATAAATG 22341209 22341216  n/a  2  OTTMUSG00000019784 TATACATA 2238936722389374  n/a  2  OTTMUSG00000019782 TATAAATG 22427048 22427055  n/a  2  OTTMUSG00000019785 TATAAATG 22437044 22437051  n/a  2  OTTMUSG00000019792 TATAAATG 22466721 22466728 (strand -1) OTTMUSG00000019857 -  n/a  novel protein Defa5[suffix] similar to defensin related cryptdin 5 Mouse defensin related Defa25 (Defcr2 Defcr25 C57BL/6J cryptdin 25 name should be Chr8 removed) Mouse AC129197.2 novel defensin Defa-rs2 C57BL/6J related Chr8 sequence cryptdin peptide CRS1C Mouse AC166039.1 novel defensin Defa# C57BL/6J related cryptdin Chr8 Mouse defensin related Defa22 Defcr22 
C57BL/6J cryptdin 22 Chr8 Mouse AC166039.3 novel defensin Defa23[suffix] C57BL/6J related cryptdin Chr8 identical to Defcr23 Mouse AC129174.1 novel defensin Defa# C57BL/6J related cryptdin Chr8 Mouse AC129174.4 novel defensin Defa# C57BL/6J related cryptdin Chr8 Mouse defensin related Defa3 Defcr3 C57BL/6J cryptdin 3 Chr8 Mouse AC129174.7 novel protein Defa5[suffix] C57BL/6J similar to Chr8 defensin related cryptdin 5 Defensin related Mouse Defcr-rs1 sequence C57BL/6J (alias Defa-rs1 cryptdin peptide Chr8 CRS1C-2) Mouse AC133094.3 novel defensin Defa# C57BL/6J related cryptdin Chr8 Mouse AC133094.5 novel defensin Defa-rs4 C57BL/6J related Chr8 sequence cryptdin peptide CRS1C Mouse defensin related Defa20 Defcr20 C57BL/6J cryptdin 20 Chr8 Mouse AC133094.9 novel defensin Defa20[suffix] C57BL/6J related cryptdin Chr8 identical to Defcr20 Mouse AC133094.1 novel protein C57BL/6J similar to Chr8 defensin related cryptdin 5 Mouse AC133094.11 novel defensin C57BL/6J related cryptdin Chr8 Mouse AC133094.13 novel defensin Defa-rs5[suffix] C57BL/6J related Chr8 sequence cryptdin peptide CRS1C  2  n/a  2  OTTMUSG00000019859 TATAAATG 22567056 - Yes, identical 22567063 to (strand -1) AC133094.13  n/a  2  OTTMUSG00000019856 TATAAATG 22619699 22619706  n/a  2  OTTMUSG00000019860 TATAAATG 22639473 - Yes 22639480  n/a  2  OTTMUSG00000018259 TATAAATG 22675812 - Yes 22675819  n/a  2  OTTMUSG00000019896 -  n/a  2  OTTMUSG00000019893 TATAAATG 22705004 - Yes, identical 22705011 to (strand -1) AC133094.5  n/a  33  Organism/ Gene Species/ Symbol Chr Mouse Defcr26 C57BL/6J Chr8 Mouse AC134533.2 C57BL/6J Chr8  Gene Name  Proposed Gene Exon Vega-Gene ID Symbol Count  defensin related Defa26 cryptdin 26  novel defensin Defa3[suffix] related cryptdin identical to Defcr3 Mouse AC134533.3 novel protein Defa5[suffix] C57BL/6J similar to Chr8 defensin related cryptdin 5 Mouse AC134533.6 novel defensin Defa-rs3 C57BL/6J related cryptdin Chr8 (CRS1C-3) defensin related Defa24 Mouse Defcr24 cryptdin 
24 C57BL/6J Chr8 defensin beta 1 Mouse Defb1 C57BL/6J Chr8 defensin beta 50 Mouse Defb50 C57BL/6J Chr8 defensin beta 2 Mouse Defb2 C57BL/6J Chr8 defensin beta 10 Mouse Defb10 C57BL/6J Chr8  2  TATA-Box TATA-Box Polymorphic/ Human Rat Genomic CNV/ Orthologue Orthologue Coordinates Duplication (potential) OTTMUSG00000019889 TATACATA 22728581n/a 22728588  2  OTTMUSG00000019892 TATAAATG 22766199 - Yes 22766206  n/a  2  OTTMUSG00000019924 TATAAATG 22776195 - Yes 22776202  n/a  2  OTTMUSG00000019927 TATAAATG 22814141 22814148 (strand -1) OTTMUSG00000019980 TATAAATG 22844935 22844942  n/a  2  Defcr24  2  OTTMUSG00000019983 TATAAAAA 22887029 22887036  2  OTTMUSG00000019981 TATAAATC 22933984 22933991  Defb50  2  OTTMUSG00000020785 -  Defb2  2  OTTMUSG00000020783 -  Mouse Defb9 C57BL/6J Chr8 Mouse Defb11 C57BL/6J Chr8  defensin beta 9  2  OTTMUSG00000020782 -  defensin beta 11  2  OTTMUSG00000020784 -  Mouse Defb15 C57BL/6J Chr8  defensin beta 15  2  OTTMUSG00000020830 TATAAAGG 23057269 23057262 (strand -1)  Mouse Defb35 C57BL/6J Chr8 Mouse Defb13 C57BL/6J Chr8  defensin beta 35  2  OTTMUSG00000020829 -  defensin beta 13  2  OTTMUSG00000020827  DEFB1 Defb1 (MGI & ref)  Defb9 53% Defb10 53% Defb11 53% questionable Defb9 50% Defb10 50% Both questionable DEFB106 Defb15 (MGI & ref) (syntenic) NP_0010326 09.1 (12 aa longer; not syntenic) Both questionable n/a DEFB107 (ref)  Defb13 75% questionable  34  Table 2.4. Pseudogenes annotated within the α-defensin and β-defensin clusters on C57BL/6 mouse Chromosome 8. The Vega (v. 38) genomic coordinates of the region between Xkr5 and Ccdc70 are 8:18,932,729-21,221,369 bp. The order of pseudogenes in the table corresponds to their 5’-3 order within the genome. 
Organism/ Species/ Chr Mouse C57BL/6J Chr8 Mouse C57BL/6J Chr8 Mouse C57BL/6J Chr8 Mouse C57BL/6J Chr8 Mouse C57BL/6J Chr8 Mouse C57BL/6J Chr8 Mouse C57BL/6J Chr8 Mouse C57BL/6J Chr8 Mouse C57BL/6J Chr8 Mouse C57BL/6J Chr8 Mouse C57BL/6J Chr8 Mouse C57BL/6J Chr8 Mouse C57BL/6J Chr8 Mouse C57BL/6J Chr8 Mouse C57BL/6J Chr8 Mouse C57BL/6J Chr8 Mouse C57BL/6J Chr8 Mouse C57BL/6J Chr8 Mouse C57BL/6J Chr8 Mouse C57BL/6J Chr8 Mouse C57BL/6J Chr8 Mouse C57BL/6J Chr8  Clone  Vega-Gene ID  AC140205.9  AC140205.9  defensin pseudogene similar to Defa7  Unprocessed  AC129197  OTTMUSG00000019818 Defa-ps3  defensin related cryptdin pseudogene  Unprocessed  AC166039  OTTMUSG00000019815 Defa-ps4  pseudogene similar to part of defensin related cryptdin  AC129174  OTTMUSG00000019783 Defa-ps5  novel defensin related cryptdin pseudogene  Unprocessed  OTTMUSG00000019780 Defa-ps6  defensin, alpha, pseudogene 1  Transcribed_unprocessed  OTTMUSG00000019817 Defa-ps7  pseudogene similar to part of defensin related cryptdin  OTTMUSG00000019781  novel defensin related cryptdin pseudogene  Unprocessed  OTTMUSG00000019794  novel defensin related cryptdin pseudogene  Unprocessed  OTTMUSG00000019793  novel defensin related cryptdin pseudogene  Unprocessed  OTTMUSG00000019795  pseudogene similar to part Pseudogene of defensin related cryptdin  OTTMUSG00000019855  defensin related cryptdin pseudogene  OTTMUSG00000019858  pseudogene similar to part Pseudogene of defensin related cryptdin  OTTMUSG00000019890  defensin alpha pseudogene Unprocessed  OTTMUSG00000019894  defensin alpha pseudogene Unprocessed  OTTMUSG00000019925 Defa-ps9  defensin alpha pseudogene Unprocessed  OTTMUSG00000019923 Defa-ps1  defensin alpha pseudogene Transcribed_unprocessed identical to Defa-ps1  OTTMUSG00000019929 Defa-ps10  pseudogene similar to part Pseudogene of defensin related cryptdin  OTTMUSG00000019982 Defa-ps11  defensin alpha pseudogene Unprocessed  AC116394  OTTMUSG00000020597 Defa-ps12  defensin beta 46 
pseudogene  Unprocessed  AC113099  OTTMUSG00000020671 Defa-ps13  defensin beta 54 pseudogene (Defb54-ps)  Unprocessed  OTTMUSG00000020719 Defa-ps14  novel defensin alpha pseudogene  Unprocessed  OTTMUSG00000020718 Defa-ps15  novel defensin pseudogene Unprocessed  AC133094  AC134533  AC121131  MGI Name Pseudogene Name  Pseudogene Type  Unprocessed

Figure 2.2 key: ~0.76 Mb; fragments approximately the same size (same restriction site); gap in sequence alignment; inverted contig AC152164.14; inserted fragments (small, so most likely minor changes in restriction sites); different restriction sites (same bp size overall).

Figure 2.2. Schematic for size estimation of the gap in murine C57BL/6J Chromosome 8 located between genomic positions 20 and 22 Mb. The genomic sequence for Chr8:18,508,450-24,203,501 bp was downloaded from Vega (v. 23), and then digested in silico at SwaI-predicted sites. The ordered digest fragments were aligned with SwaI-digested DNA fragments from the same region of the Optical Map for Chromosome 8. See Appendix A.1 for the original spreadsheet alignment.

2.3.1.2 α-Defensin gene cluster

Nineteen apparently intact α-defensin genes, eleven of which are novel, and 22 defensin-related pseudogenes were observed within the mouse α-defensin cluster (Figure 2.1A and 2.3). The α-defensin peptide alignment (Figure 2.3A) shows increased sequence variability of the mature peptide compared to the most recent publication (Figure 2.3B) (13). Additionally, a conserved threonine (T) residue has been identified at the position preceding the fourth cysteine, and the glycine (G) at the position 2-3 residues following the fourth cysteine is not conserved within all murine α-defensins.

A.
OTTMUSG00000019784(noveldef)     MKTLVLLSALFLLAFQVQADPIQKTDEETNTEVQPEEEEQAMSVSFGNPE  50
Defcr26(OTTMUSG00000019889)      MKTLVLLSALFLLAFQVQADPIQNTDEETNTEVQPQEEDQAVSVSFGNPE  50
Defcr25(OTTMUSG00000019700)      MKTLVLLSALALLAFQVQADPIQNRDEESKIDEQPGKEDQAVSVSFGDPE  50
OTTMUSG00000019896(noveldef)     MKTLVLLSALALLAFQVQADPIQNRDEESKIDEQPGKEDQAVSVSFGDPE  50
Defcr24(OTTMUSG00000019980)      MKTLILLSALVLLAFQVQADPIQNTDEETKTEEQPGEEDQAVSVSFGDPE  50
OTTMUSG00000019742(noveldef)     MKTLILLSALVLLAFQVQADPIQNTDEETKTEEQPGEEDQAVSVSFGDPE  50
Defcr3(OTTMUSG00000019782)       MKTLVLLSALVLLAFQVQADPIQNTDEETKTEEQPGEDDQAVSVSFGDPE  50
OTTMUSG00000019892(Defcr3dupl)   MKTLVLLSALVLLAFQVQADPIQNTDEETKTEEQPGEDDQAVSVSFGDPE  50
Defcr23(OTTMUSG00000019488)      MKTLVLLSALILLAFQVQADPIQNTDEETKTEEQPGKEDQAVSVSFGDPE  50
OTTMUSG00000019762(Defcr23dupl)  MKTLVLLSALILLAFQVQADPIQNTDEETKTEEQPGKEDQAVSVSFGDPE  50
OTTMUSG00000019785(novelDefcr5)  MKTFVLLSALVLLAFQAQADPIHKTDEETNTEEQPGEEDQAVSISFGGQE  50
OTTMUSG00000018258(novelDefcr5)  MKTFVLLSALVLLAFQVQADPIHKTDEETNTEEQPGEEDQAVSISFGGQE  50
OTTMUSG00000018259(novelDefcr5)  MKTFVLLSALVLLAYQVQADPIHKTDEETNTEEQPGEEDQAVSISFGGQE  50
OTTMUSG00000019924(novelDefcr5)  MKTIVLLSALVLLAFQVQADPIQKTDEETNTEEQPGEEDQAVSISFGGQE  50
OTTMUSG00000019786(noveldef)     MKTFVLLSALVLLAFQAQADPIHKTDEETNTEEQPGEEDQAVSISSGGQE  50
Defcr21(OTTMUSG00000019489)      MKTLVLLSALILLAYQVQTDPIQNTDEETNTEEQPGEDDQAVSVSFGGQE  50
Defcr22(OTTMUSG00000019763)      MKTLVLLSALILLAYQVQTDPIQNTDEETNTEEQPGEEDQAVSVSFGGQE  50
Defcr20(OTTMUSG00000019856)      MKTLVLLSALVLLAFQVQADPIQNTDEETNTEEQPGEEDQAVSVSFGDPE  50
OTTMUSG00000019860(Defcr20dupl)  MKTLVLLSALVLLAFQVQADPIQNTDEETNTEEQPGEEDQAVSVSFGDPE  50
                                 ***::***** ***:*.*:***:: ***:: : ** :::**:*:* *. *

OTTMUSG00000019784(noveldef)     GSDLQEES---LRDLGCYCRKRGCTRRERINGTCRKGHLMYTLCCL----  93
Defcr26(OTTMUSG00000019889)      GSDLQEES---LRDLGCYCRKRGCTRRERINGTCRKGHLMYTLCCL----  93
Defcr25(OTTMUSG00000019700)      GSSLQEEC----EDLICYCRTRGCKRRERLNGTCRKGHLMYMLWCC----  92
OTTMUSG00000019896(noveldef)     GSSLQEESSSALRDRICYCRTS-CKKRERLNGTCRKGHLMYKLCCR----  95
Defcr24(OTTMUSG00000019980)      GASLQEES---LRDLVCYCRARGCKGRERMNGTCSKGHLLYMLCCR----  93
OTTMUSG00000019742(noveldef)     GSSLQEES---LRDLVCYCRARGCKGRERMNGTCSKGHLMYMLCCR----  93
Defcr3(OTTMUSG00000019782)       GSSLQEES---LRDLVCYCRKRGCKRRERMNGTCRKGHLMYTLCCR----  93
OTTMUSG00000019892(Defcr3dupl)   GSSLQEES---LRDLVCYCRKRGCKRRERMNGTCRKGHLMYTLCCR----  93
Defcr23(OTTMUSG00000019488)      GSSLQEES---LRDLVCYCRTRGCKRRERMNGTCRKGHLIYTLCCR----  93
OTTMUSG00000019762(Defcr23dupl)  GSSLQEES---LRDLVCYCRTRGCKRRERMNGTCRKGHLIYTLCCR----  93
OTTMUSG00000019785(novelDefcr5)  GSALHEEL---SKKLICYCRIRGCKRRERVFGTCRNLFLTFVFCCS----  93
OTTMUSG00000018258(novelDefcr5)  GSALHDEL---SKKLICYCRIRGCKRRERVFGTCRNLFLTFVFCCS----  93
OTTMUSG00000018259(novelDefcr5)  GSALHEEL---SKKLICYCRIRGCKRRERVFGTCRNLFLTFVFCCS----  93
OTTMUSG00000019924(novelDefcr5)  GSALHEEL---SKKLICYCRIRGCKRRERVFGTCRNLFLTFVFCCS----  93
OTTMUSG00000019786(noveldef)     GSALHEEL---SKKLICYCRIRGCKRRECVFGTCRNLFLTFVFCCS----  93
Defcr21(OTTMUSG00000019489)      GSALHEKL---SRDLICLCRNRRCNRGELFYGTCAGPFL---RCCRRRR-  93
Defcr22(OTTMUSG00000019763)      GSALHEKL---SRDLICLCRKRRCNRGELFYGTCAGPFL---RCCRRRR-  93
Defcr20(OTTMUSG00000019856)      GSALHEKS---SRDLICYCRKGGCNRGEQVYGTCSGRLL---FCCRRRHRH  95
OTTMUSG00000019860(Defcr20dupl)  GSALHEKS---SRDLICYCRKGGCNRGEQVYGTCSGRLL---FCCRRRHRH  95

B.
(i)  X_{5-10} C X C R X_{2-3} C X_{3} E X_{3} G T C X_{1-3} (G) X_{4-7} C C X_{0-6}
(ii) X_{1-2} C X C R X_{2-3} C X_{3} E X_{3} G X C X_{3} G X_{5} C C X_{1-4}

Figure 2.3. Murine α-defensin peptides. A multiple protein alignment of the murine α-defensin prepropeptides (A). Protein sequences were deduced from the nucleotide sequences.
The six canonical cysteine residues are highlighted in yellow. Mature peptides that have been isolated from C57BL/6J mice are in blue (personal communication, A. Ouellette and M. Shanahan). Genes identified for the first time in this study are tagged as noveldef and duplicated genes are tagged as dupl; for clarity, duplicated gene peptide products are not shown in blue. The peptide alignment shows increased sequence variability (B) of the murine α-defensin mature peptide (i) compared to the most recent publication (ii) (13). N-terminal variability is based on the sequences of the mature peptides isolated by Ouellette and Shanahan.

Furthermore, six MYM-Type zinc finger protein pseudogenes and three ribosomal protein pseudogenes are also located in this region. Within the α-defensin gene cluster there is a region containing several genes very similar to Defcr5, but none is an identical match to the Swiss-Prot entry P28312.2 for Defcr5, which is derived from the genomic sequence of the 129 mouse strain (Figure 2.4). Two of these loci, OTTMUSG00000019785 and OTTMUSG00000018259, each show only one amino acid difference in their signal peptides compared to the Defcr5 Swiss-Prot entry P28312.2. Locus OTTMUSG00000018258 shows one amino acid difference from P28312.2 in its pro-segment, and locus OTTMUSG00000019924 differs by one amino acid in the signal peptide and one in the pro-segment. The mature peptides of all these genes are identical to that of the P28312.2 Defcr5 sequence, and the genes have therefore been denoted ‘novel protein similar to defensin related cryptdin 5’. Questions arise as to whether a common sequence for the mature peptide qualifies these genes to be named the same as a published sequence, whether they have the same functionality and how differences in the signal- and/or pro-segment might affect their expression.
These Defcr5 loci might be the result of chromosomal duplications or be involved in copy number variation, similar to a number of defensin genes for which we observed 100% identity throughout the entire coding sequence (see below).

Signal peptide / Pro-peptide
P28312|DEF5_mouse   MKTFVLLSALVLLAFQVQADPIHKTDEETNTEEQPGEEDQAVSISFGGQE  50
OTTMUSG00000019785  MKTFVLLSALVLLAFQAQADPIHKTDEETNTEEQPGEEDQAVSISFGGQE  50
OTTMUSG00000018258  MKTFVLLSALVLLAFQVQADPIHKTDEETNTEEQPGEEDQAVSISFGGQE  50
OTTMUSG00000018259  MKTFVLLSALVLLAYQVQADPIHKTDEETNTEEQPGEEDQAVSISFGGQE  50
OTTMUSG00000019924  MKTIVLLSALVLLAFQVQADPIQKTDEETNTEEQPGEEDQAVSISFGGQE  50
                    ***:**********:*.*****:***************************

P28312|DEF5_mouse   GSALHEELSKKLICYCRIRGCKRRERVFGTCRNLFLTFVFCCS  93
OTTMUSG00000019785  GSALHEELSKKLICYCRIRGCKRRERVFGTCRNLFLTFVFCCS  93
OTTMUSG00000018258  GSALHDELSKKLICYCRIRGCKRRERVFGTCRNLFLTFVFCCS  93
OTTMUSG00000018259  GSALHEELSKKLICYCRIRGCKRRERVFGTCRNLFLTFVFCCS  93
OTTMUSG00000019924  GSALHEELSKKLICYCRIRGCKRRERVFGTCRNLFLTFVFCCS  93
                    *****:*************************************
Mature peptide

Figure 2.4. The polymorphic Defcr5 peptides. A protein alignment of the deduced prepropeptides of all potential Defcr5 copies and the Swiss-Prot sequence of Defcr5 P28312.2. Variations in amino acids are highlighted in red. The mature peptide cleavage site is based on the Swiss-Prot annotation; however, the preceding leucine residue was found in the mature Crp5 isolated by Ouellette and Shanahan.

Locus OTTMUSG00000019786 also has a best match to Defcr5, but there are three amino acid differences compared to P28312.2: one in the signal peptide, one in the pro-segment and one in the mature peptide. Therefore, this locus has been annotated as a novel defensin related cryptdin without commenting on any similarity to Defcr5, since there are clear precedents for applying different names to defensins with small sequence changes.

Three genes were observed to have coding sequences with 100% identity to known genes.
One example of this is represented by two copies of Defcr23, which are identical in their coding and 3’-UTR regions (Figure 2.5). To clarify this situation, one copy has been tagged as Defcr23 and the other as 'novel defensin related cryptdin identical to Defcr23' or Defcr23dupl.

OTTMUSG00000019488  TCCTGCTCACCAATCCTCCAGGTGACTCCCAGCCATGAAGACACTAGTCCTCCTCTCTGC  60
OTTMUSG00000019762  TCCTGCTCACCAATCCTCCAGGTGACTCCCAGCCATGAAGACACTAGTCCTCCTCTCTGC  26
                    ************************************************************

OTTMUSG00000019488  CCTCATCCTGCTGGCCTTCCAGGTCCAGGCTGATCCTATCCAAAACACAGATGAAGAGAC  120
OTTMUSG00000019762  CCTCATCCTGCTGGCCTTCCAGGTCCAGGCTGATCCTATCCAAAACACAGATGAAGAGAC  86
                    ************************************************************

OTTMUSG00000019488  TAAAACTGAGGAGCAGCCAGGGAAAGAGGACCAGGCTGTGTCTGTCTCTTTTGGAGACCC  180
OTTMUSG00000019762  TAAAACTGAGGAGCAGCCAGGGAAAGAGGACCAGGCTGTGTCTGTCTCTTTTGGAGACCC  146
                    ************************************************************

OTTMUSG00000019488  AGAAGGCTCTTCTCTTCAAGAGGAATCGTTGAGAGATCTGGTATGCTATTGTAGAACAAG  240
OTTMUSG00000019762  AGAAGGCTCTTCTCTTCAAGAGGAATCGTTGAGAGATCTGGTATGCTATTGTAGAACAAG  206
                    ************************************************************

OTTMUSG00000019488  AGGCTGCAAAAGAAGAGAACGCATGAATGGGACCTGCAGAAAGGGTCATTTAATATACAC  300
OTTMUSG00000019762  AGGCTGCAAAAGAAGAGAACGCATGAATGGGACCTGCAGAAAGGGTCATTTAATATACAC  266
                    ************************************************************

OTTMUSG00000019488  GCTCTGCTGTCGCTGAACATGGAGACCACAGAGGACAAGACGAGCATGAGTACTGAGGCC  360
OTTMUSG00000019762  GCTCTGCTGTCGCTGAACATGGAGACCACAGAGGACAAGACGAGCATGAGTACTGAGGCC  326
                    ************************************************************

OTTMUSG00000019488  ACTGATGCTGGTGCCTGATGACCACTTCTCAATAAATTGTTCGCAATATGC  411
OTTMUSG00000019762  ACTGATGCTGGTGCCTGATGACCACTTCTCAATAAATTGTTCGCAATATGC  360
                    ***************************************************

Figure 2.5. Duplicated Defcr23 transcript sequences.
These genes have been annotated as duplicated because their coding sequences, indicated in blue, are identical. Moreover, the transcripts cannot be differentiated in either the capillary or 454 sequencing assays, as their 3’-UTR sequences are also identical. The sequence highlighted in yellow has not been annotated as part of the 5’-UTR; however, it lies in the 5’ upstream genomic sequence and is included to confirm the duplicated nature of these two genes. Defcr23 is OTTMUSG00000019488 and Defcr23dupl is OTTMUSG00000019762. Three bases differ across the entire sequence of these two genes, all within Intron1-2.

Two other α-defensin genes, Defcr3 and Defcr20, also have apparent duplications in the mouse genome; however, each of these pairs has a one base pair difference in the 3’-UTR region (Figure 2.6). The duplicated genes have been denoted Defcr3dupl and Defcr20dupl because their predicted peptides would be indistinguishable from Defcr3 and Defcr20, respectively. Genes with duplicated copies are ideal candidates for copy number variation.

A.
OTTMUSG00000019782  TCCTGCTCACCAATCCTCCAGGTGACTCCCAGCCATGAAGACACTAGTCCTCCTCTCTGC  60
OTTMUSG00000019892  TCCTGCTCACCAATCCTCCAGGTGACTCCCAGCCATGAAGACACTAGTCCTCCTCTCTGC  47
                    ************************************************************

OTTMUSG00000019782  CCTCGTCCTGCTGGCCTTCCAGGTCCAGGCTGATCCTATCCAAAACACAGATGAAGAGAC  120
OTTMUSG00000019892  CCTCGTCCTGCTGGCCTTCCAGGTCCAGGCTGATCCTATCCAAAACACAGATGAAGAGAC  107
                    ************************************************************

OTTMUSG00000019782  TAAAACTGAGGAGCAGCCAGGGGAAGACGACCAGGCTGTGTCTGTCTCTTTTGGAGACCC  180
OTTMUSG00000019892  TAAAACTGAGGAGCAGCCAGGGGAAGACGACCAGGCTGTGTCTGTCTCTTTTGGAGACCC  167
                    ************************************************************

OTTMUSG00000019782  AGAAGGCTCTTCTCTTCAAGAGGAATCGTTGAGAGATCTGGTATGCTATTGTAGAAAAAG  240
OTTMUSG00000019892  AGAAGGCTCTTCTCTTCAAGAGGAATCGTTGAGAGATCTGGTATGCTATTGTAGAAAAAG  227
                    ************************************************************

OTTMUSG00000019782  AGGCTGCAAAAGAAGAGAACGCATGAATGGGACCTGCAGAAAGGGTCATTTAATGTACAC  300
OTTMUSG00000019892  AGGCTGCAAAAGAAGAGAACGCATGAATGGGACCTGCAGAAAGGGTCATTTAATGTACAC  287
                    ************************************************************

OTTMUSG00000019782  ACTCTGCTGTCGCTGAACATGGAGACCACAGAGGACAAGACGAACATGAGTACTGAGGCC  360
OTTMUSG00000019892  ACTCTGCTGTCGCTGAACATGGAGACCACAGAGGACAAGACGAACATGAGTACTGAGGCC  347
                    ************************************************************

OTTMUSG00000019782  ACTGATGCTGGTGCCTGATGACTACCTCGCAATAAATTGTTCGCAATATG  410
OTTMUSG00000019892  ACTGATGCTGGTGCCTGATGACCACCTCGCAATAAATTGTTCGCAATATG  397
                    ********************** ***************************

B.
OTTMUSG00000019860  ACATTGGGCTCCTGCTCACCAATTCTCCAGGTGACTCACAGCCATGAAGACACTTGTCCT  60
OTTMUSG00000019856  ACATTGGGCTCCTGCTCACCAATTCTCCAGGTGACTCACAGCCATGAAGACACTTGTCCT  60
                    ************************************************************

OTTMUSG00000019860  CCTCTCTGCCCTCGTCCTGCTGGCCTTCCAGGTCCAGGCTGATCCTATCCAAAACACAGA  120
OTTMUSG00000019856  CCTCTCTGCCCTCGTCCTGCTGGCCTTCCAGGTCCAGGCTGATCCTATCCAAAACACAGA  120
                    ************************************************************

OTTMUSG00000019860  TGAGGAGACTAATACTGAGGAGCAGCCAGGGGAGGAGGACCAGGCTGTGTCTGTCTCCTT  180
OTTMUSG00000019856  TGAGGAGACTAATACTGAGGAGCAGCCAGGGGAGGAGGACCAGGCTGTGTCTGTCTCCTT  180
                    ************************************************************

OTTMUSG00000019860  TGGAGACCCAGAAGGATCTGCTCTTCATGAAAAATCGTCGAGAGATCTGATATGCTATTG  240
OTTMUSG00000019856  TGGAGACCCAGAAGGATCTGCTCTTCATGAAAAATCGTCGAGAGATCTGATATGCTATTG  240
                    ************************************************************

OTTMUSG00000019860  TAGAAAAGGAGGCTGCAATAGAGGAGAACAAGTTTATGGGACCTGCTCAGGACGACTTTT  300
OTTMUSG00000019856  TAGAAAAGGAGGCTGCAATAGAGGAGAACAAGTTTATGGGACCTGCTCAGGACGACTTTT  300
                    ************************************************************

OTTMUSG00000019860  GTTCTGCTGCCGCCGCCGCCACCGCCACTGAACATGCAGATGGCAGAGATATGACAACCA  360
OTTMUSG00000019856  GTTCTGCTGCCGCCGCCGCCACCGCCACTGAACATGCAGATGACAGAGATATGACAACCA  360
                    ****************************************** *****************

OTTMUSG00000019860  TCAGCTCTGAGGCTACTGATGCTGGGGCCTGATAACCACTTCTCAATAAATTGTTTGCAA  420
OTTMUSG00000019856  TCAGCTCTGAGGCTACTGATGCTGGGGCCTGATAACCACTTCTCAATAAATTGTTTGCAA  420
                    ************************************************************

OTTMUSG00000019860  TATGC  425
OTTMUSG00000019856  TATGC  425
                    *****

Figure 2.6. Duplicated Defcr3 and Defcr20 transcript sequences. Defcr3/Defcr3dupl (A) and Defcr20/Defcr20dupl (B) have been annotated as duplicated because their coding sequences, indicated in blue, are identical.
However, the transcripts can be differentiated in both the capillary and 454 sequencing assays due to a one base difference, highlighted in red, within the 3’-UTR. The sequence highlighted in yellow has not been annotated as part of the 5’-UTR; however, it lies in the 5’ upstream genomic sequence. In (A), Defcr3 is OTTMUSG00000019782 and Defcr3dupl is OTTMUSG00000019892, and in (B), Defcr20 is OTTMUSG00000019856 and Defcr20dupl is OTTMUSG00000019860. For each pair, two additional bases differ within the entire gene sequence, both within Intron1-2.

2.3.1.3 β-Defensin clusters

The mouse β-defensin cluster contains 28 β-defensin genes, including sperm-associated antigen 11 (Spag11) and OTTMUSG00000020593 (Spag11c/h). Spag11 genes encode β-defensin-like peptides and have tissue- and species-specific alternative splicing in primate species (125). Furthermore, three β-defensin pseudogenes and nine other pseudogenes, as well as a gene coding for a novel protein with tubby-like domains, have been annotated. This manual annotation confirms the current mouse β-defensin repertoire (13, 126). In the human genome, most β-defensin genes have been recently duplicated, but in the mouse genome the annotation did not reveal any 100% identical β-defensin genes. This analysis is limited by the current mouse genome assembly, as the most recent duplications may not be apparent. Finishing of this region may reveal duplicated β-defensin genes similar to those in the α-defensin gene set.

2.3.1.4 Cryptdin-related sequences

Within the α-defensin gene cluster, genes were identified that show similarity to the pro-segment of α-defensins but have a different number and spacing pattern of cysteines within the mature peptide region (21, 22, 127). Such genes are referred to as cryptdin-related sequences (CRS) or Defcr-related sequences (Defcr-rs).
Six genes, three of which are novel, belonging to two groups of cryptdin-related sequences, CRS1C and CRS4C, have been annotated in the C57BL/6J mouse (Figure 2.7) (25, 127).

Defcr-rs1(OTTMUSG00000019792)   MKTLVLLSALVLLAFQVQADPIQNTDEETKTEEQPEEEDQAVSVSFGGTE  50
OTTMUSG00000018260(novelCRS1C)  MKTLVLLSALALLAFQVQADPIQNTDEETKTEEQPEEEDQAVSVSFGGTE  50
OTTMUSG00000019927(CRS1C-3)     MKTLVLLSALALLAFQVQADPIKNTDEETKTGEQPEEEDQAVSVSFGGTE  50
OTTMUSG00000019893(novelCRS1C)  MKTLVLLSALALLALQVQADPIQNTDEETKTQEQPGEEDQAVSVSFGGTE  50
OTTMUSG00000019859(novelCRS1C)  MKTLVLLSALALLALQVQADPIQNTDEETKTQEQPGEEDQAVSVSFGGTE  50
OTTMUSG00000018344(CRS4C-6)     MKTLVLLSALVLLAFYVQADSTQETDEETKTDDQPGEEDQGVSVSFEDPE  50
                                **********.***: ****. ::******* :** ****.***** ..*

Defcr-rs1(OTTMUSG00000019792)   GSALQDVAQRRFPWCRKCRVCQKCQVCQKCPVCPTCPQCPKQPLCEERQN  100
OTTMUSG00000018260(novelCRS1C)  GSALQDVAQRRFLWCRKCPVCQKCQVCQKCPVCPTCPQCPKQPLCEERQN  100
OTTMUSG00000019927(CRS1C-3)     GSALQYVAQRRFPWCRKCPVCQKCQVCQKCPVCPTCPQCPKLPLCKERQN  100
OTTMUSG00000019893(novelCRS1C)  GSALQDVAQRRFPWCRKCRVCQKCEVCQKCPVCPTCPQCPKQPLCKERQN  100
OTTMUSG00000019859(novelCRS1C)  GSALQDVAQRRFPWCRKCRVCQKCEVCQKCPVCPTCPQCPKQPLCKERQN  100
OTTMUSG00000018344(CRS4C-6)     RYVLQVSGLGKPPQCPKCPVCSKCPQCPQCPQCPGCPRCN----CMTK-  94

Defcr-rs1(OTTMUSG00000019792)   KTAITTQAPNTQHKGC  116
OTTMUSG00000018260(novelCRS1C)  KTAITTQAPNTQHKGC  116
OTTMUSG00000019927(CRS1C-3)     KSAITTQAPNTQHKGC  116
OTTMUSG00000019893(novelCRS1C)  KTAITTQAPNTHHKGC  116
OTTMUSG00000019859(novelCRS1C)  KTAITTQAPNTHHKGC  116
OTTMUSG00000018344(CRS4C-6)     ----------------

Figure 2.7. CRS-defensin peptides. A multiple protein alignment of the deduced CRS-defensin prepropeptides. Cysteine residues are highlighted in yellow. Members of the CRS1C family contain eleven conserved cysteines; CRS4C-6 belongs to the CRS4C family but contains ten instead of the usual nine cysteines for this group.
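The cysteine counts and prepropeptide lengths stated for Figure 2.7 can be checked directly from the aligned sequences. The following is a minimal sketch, not part of the original analysis: the function name count_cysteines is hypothetical, and the two sequences are the Defcr-rs1 and CRS4C-6 rows of Figure 2.7 with alignment gaps removed. The first 50 residues of both sequences contain no cysteine, so counting over the whole prepropeptide gives the same result as counting over the mature region.

```shell
#!/usr/bin/env bash
# Hypothetical helper: for each "name sequence" line on stdin, print the name,
# the sequence length, and the number of cysteine (C) residues.
count_cysteines() {
  # gsub returns the number of replacements, i.e. the number of C residues;
  # it is applied to a copy so that length($2) is unaffected.
  awk '{ seq = $2; n = gsub(/C/, "", seq); print $1, length($2), n }'
}

# Ungapped prepropeptides copied from Figure 2.7
count_cysteines <<'EOF'
Defcr-rs1 MKTLVLLSALVLLAFQVQADPIQNTDEETKTEEQPEEEDQAVSVSFGGTEGSALQDVAQRRFPWCRKCRVCQKCQVCQKCPVCPTCPQCPKQPLCEERQNKTAITTQAPNTQHKGC
CRS4C-6 MKTLVLLSALVLLAFYVQADSTQETDEETKTDDQPGEEDQGVSVSFEDPERYVLQVSGLGKPPQCPKCPVCSKCPQCPQCPQCPGCPRCNCMTK
EOF
```

Run as written, this should report 116 residues and 11 cysteines for Defcr-rs1, and 94 residues and 10 cysteines for CRS4C-6, in agreement with the figure legend.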
Nine cysteine residues characterize CRS peptides of the CRS4C class purified from C3H/HeN murine gastrointestinal mucosa (101). CRS4C-6, annotated here as OTTMUSG00000018344, belongs to the CRS4C subfamily; however, CRS4C-6, which harbors ten predicted cysteine residues, has not been included in prior studies (21, 22, 127). The remaining five cryptdin-related sequences annotated here (OTTMUSG00000019792, OTTMUSG00000018260, OTTMUSG00000019927, OTTMUSG00000019893, OTTMUSG00000019859) contain eleven predicted cysteine residues. OTTMUSG00000018260 shows three amino acid differences, one in the signal sequence and two in the mature peptide, compared to Defcr-rs1 (OTTMUSG00000019792). Two identical genes have been assigned as coding for novel CRS1C peptides, OTTMUSG00000019859 and OTTMUSG00000019893, which show 100% identity to each other throughout their transcript sequences (Figure 2.8).

OTTMUSG00000019893  ACATTGAGCTCCTGCTCACTAATCTTCCAGGTGACTCCCAGTCATGAAGACACTTGTCCT  60
OTTMUSG00000019859  ACATTGAGCTCCTGCTCACTAATCTTCCAGGTGACTCCCAGTCATGAAGACACTTGTCCT  60
                    ************************************************************

OTTMUSG00000019893  CCTCTCTGCCCTTGCCCTGCTGGCCTTACAGGTCCAGGCTGATCCTATCCAAAACACAGA  120
OTTMUSG00000019859  CCTCTCTGCCCTTGCCCTGCTGGCCTTACAGGTCCAGGCTGATCCTATCCAAAACACAGA  120
                    ************************************************************

OTTMUSG00000019893  TGAAGAGACTAAAACTCAGGAGCAGCCAGGGGAAGAGGACCAGGCTGTTTCTGTCTCCTT  180
OTTMUSG00000019859  TGAAGAGACTAAAACTCAGGAGCAGCCAGGGGAAGAGGACCAGGCTGTTTCTGTCTCCTT  180
                    ************************************************************

OTTMUSG00000019893  TGGAGGCACAGAAGGCTCTGCTCTTCAAGATGTAGCCCAAAGAAGATTTCCGTGGTGCCG  240
OTTMUSG00000019859  TGGAGGCACAGAAGGCTCTGCTCTTCAAGATGTAGCCCAAAGAAGATTTCCGTGGTGCCG  240
                    ************************************************************

OTTMUSG00000019893  GAAGTGCCGAGTGTGCCAGAAGTGCGAAGTGTGCCAGAAGTGCCCTGTGTGCCCGACATG  300
OTTMUSG00000019859  GAAGTGCCGAGTGTGCCAGAAGTGCGAAGTGTGCCAGAAGTGCCCTGTGTGCCCGACATG  300
                    ************************************************************

OTTMUSG00000019893  CCCCCAGTGCCCAAAGCAGCCATTGTGCAAAGAAAGGCAAAATAAAACTGCTATCACCAC  360
OTTMUSG00000019859  CCCCCAGTGCCCAAAGCAGCCATTGTGCAAAGAAAGGCAAAATAAAACTGCTATCACCAC  360
                    ************************************************************

OTTMUSG00000019893  CCAAGCTCCAAATACACATCATAAAGGCTGTTGAGCTGAATGTGGAATCTGGGTTGAGAT  420
OTTMUSG00000019859  CCAAGCTCCAAATACACATCATAAAGGCTGTTGAGCTGAATGTGGAATCTGGGTTGAGAT  420
                    ************************************************************

OTTMUSG00000019893  GACCATTTGCCTTTGGTCTCCACGATCTCTTTGTGCTTAGCCTCAATTGCAATTCCTTCT  480
OTTMUSG00000019859  GACCATTTGCCTTTGGTCTCCACGATCTCTTTGTGCTTAGCCTCAATTGCAATTCCTTCT  480
                    ************************************************************

OTTMUSG00000019893  CTCATAAACTCCTTGCTGAAAAATCA  506
OTTMUSG00000019859  CTCATAAACTCCTTGCTGAAAAATCA  488
                    **************************

Figure 2.8. Duplicated novel CRS1C transcript sequences. These genes have been annotated as duplicated because their coding sequences, indicated in blue, are identical. However, the transcripts cannot be differentiated in either the capillary or 454 sequencing assays, as their 3’-UTR sequences are also identical. The sequence highlighted in yellow has not been annotated as part of the 3’-UTR; however, it lies in the 3’ downstream genomic sequence and is included to confirm the duplicated nature of these two genes. Three bases differ across the entire sequence of these two genes, all within Intron1-2.

All CRS1C sequences known to date encode 116-amino-acid proteins, in contrast to α-defensins, which encode proteins of 92-95 amino acids. Purification of the predicted CRS peptides annotated here from C57BL/6J mice will reveal whether processing occurs as expected when compared to other strains of mice (101).
Additionally, CRS4C peptides purified from C3H/HeN mice form covalent homo- and heterodimers in vitro and in vivo, and kill commensal and pathogenic bacteria in vitro (101). The function of the novel putative C57BL/6J CRS peptides, and indeed of the novel putative C57BL/6J α-defensin peptides, remains to be determined.

As three novel CRS genes were annotated, and the expression of this family of defensin-like peptides has only been reported in mice, it was of interest to determine whether other species, rat in particular, have homologous peptides. The CRS family clusters separately from other mammalian α-defensins, and it has been argued that the mouse CRS family and rat α-defensins evolved separately from a common ancestral gene present prior to speciation (12). Databases were searched for annotated or experimental evidence of novel rat gene or peptide sequences homologous to the CRS family; however, for all of the CRS peptides the only significant hit (based on E-value) was the defensin_propep domain, which is general and not species-specific. Individual searches returned specific defensins, but none aligned at the C-terminal end of the CRS peptides. Extensive BLAST searches using both genomic and peptide CRS sequences suggested rat α-defensins as potential homologues. However, all BLAST searches, including those against unannotated databases, showed significant hits to the 5’ region of the genomic sequences and the N-terminal region of the peptide sequences, but poor matches to the 3’ and C-terminal regions (Tables 2.5 and 2.6). An additional protein alignment of mouse CRS and rat α-defensin peptides confirmed this lack of homology (Figure 2.9). At this time, it appears the CRS family is unique to mouse.

Table 2.5. Pfam query results for the defensin-related CRS peptide sequences. Pfam (v. 22.0) was queried using the batch search option with default parameters, and the results returned were tabulated.
Defensin_propep, defensin propeptide; VirB, VirB/Tra/Trw family (Type IV secretion system); GASA, Gibberellin regulated protein; Conotoxin, small neurotoxic peptides with disulfide bridges; MucB_RseB, regulators of the anti-sigma E protein RseD

Sequence ID | Sequence Start | Sequence End | Pfam Accession | HMM Start | HMM End | Alignment Mode | Bit Score | E-value | Pfam ID
OTTMUSG00000018260 | 1 | 52 | PF00879.9 | 1 | 54 | ls | 97.6 | 3.90E-26 | Defensin_propep
OTTMUSG00000018260 | 1 | 14 | PF08139.3 | 1 | 15 | fs | 5.2 | 0.59 | VirB
OTTMUSG00000018260 | 1 | 81 | PF02704.5 | 1 | 114 | ls | -52.6 | 0.64 | GASA
OTTMUSG00000018344 | 1 | 52 | PF00879.9 | 1 | 54 | ls | 76.6 | 8.00E-20 | Defensin_propep
OTTMUSG00000018344 | 1 | 14 | PF08139.3 | 1 | 15 | fs | 5.6 | 0.45 | VirB
OTTMUSG00000018344 | 1 | 81 | PF02950.8 | 1 | 84 | ls | -5.6 | 0.63 | Conotoxin
OTTMUSG00000019792 | 1 | 52 | PF00879.9 | 1 | 54 | ls | 102.8 | 1.10E-27 | Defensin_propep
OTTMUSG00000019792 | 1 | 81 | PF02704.5 | 1 | 114 | ls | -51.2 | 0.45 | GASA
OTTMUSG00000019792 | 1 | 14 | PF08139.3 | 1 | 15 | fs | 5.6 | 0.45 | VirB
OTTMUSG00000019792 | 1 | 21 | PF03888.5 | 1 | 26 | fs | 3.8 | 0.54 | MucB_RseB
OTTMUSG00000019859 | 1 | 52 | PF00879.9 | 1 | 54 | ls | 104.5 | 3.30E-28 | Defensin_propep
OTTMUSG00000019859 | 1 | 81 | PF02704.5 | 1 | 114 | ls | -48.1 | 0.2 | GASA
OTTMUSG00000019859 | 1 | 14 | PF08139.3 | 1 | 15 | fs | 5.2 | 0.59 | VirB
OTTMUSG00000019893 | 1 | 52 | PF00879.9 | 1 | 54 | ls | 104.5 | 3.30E-28 | Defensin_propep
OTTMUSG00000019893 | 1 | 81 | PF02704.5 | 1 | 114 | ls | -48.1 | 0.2 | GASA
OTTMUSG00000019893 | 1 | 14 | PF08139.3 | 1 | 15 | fs | 5.2 | 0.59 | VirB
OTTMUSG00000019927 | 1 | 52 | PF00879.9 | 1 | 54 | ls | 91.5 | 2.70E-24 | Defensin_propep
OTTMUSG00000019927 | 1 | 14 | PF08139.3 | 1 | 15 | fs | 5.2 | 0.59 | VirB
OTTMUSG00000019927 | 1 | 81 | PF02704.5 | 1 | 114 | ls | -53 | 0.71 | GASA

Table 2.6. Summary of the best non-mouse BLAST hits for CRS peptide sequences. All BLAST searches gave similar information regarding potential homologues in rat, of which there appeared to be none. From these tests it was decided that WU-BLAST2 nucleotide and protein searches (default parameters), using the BLASTN and BLASTP algorithms, respectively, would give the most useful information with regard to full coverage of the gene and peptide sequences in the output. The best matches for both the genomic and peptide sequence for each gene are indicated.

Otter ID | Gene | Peptide | Score | % Identity | % Positive | E-Value
OTTMUSG00000018260 | Defa9 | n/a | 1167 | 81 | 81 | 2.3e-84
OTTMUSG00000018260 | n/a | Defa9 | 90 | 37 | 45 | 3.5e-06
OTTMUSG00000019792 | Defa9 | n/a | 1299 | 80 | 80 | 2.5e-104
OTTMUSG00000019792 | n/a | Defa9 | 98 | 39 | 46 | 5.3e-07
OTTMUSG00000019927 | Defa9 | n/a | 1410 | 81 | 81 | 2.0e-104
OTTMUSG00000019927 | n/a | Defa9 | 109 | 40 | 50 | 6.3e-08
OTTMUSG00000018344 | Defa6 | n/a | 1533 | 69 | 69 | 4.3e-90
OTTMUSG00000018344 | n/a | Defa9 | 98 | 41 | 45 | 5.4e-04
OTTMUSG00000019893 | Defa9 | n/a | 1395 | 81 | 81 | 1.2e-106
OTTMUSG00000019893 | n/a | Defa6 | 151 | 48 | 54 | 5.8e-12
OTTMUSG00000019859 | Defa9 | n/a | 1395 | 81 | 81 | 3.3e-96
OTTMUSG00000019859 | n/a | Defa6 | 151 | 48 | 54 | 5.8e-12

Figure 2.9. Alignment of mouse CRS-defensin and rat α-defensin peptides. The alignment shows that, despite shared homology in their signal and pro-peptide regions, mouse CRS-defensin and rat α-defensin peptides were significantly different, and none seemed to be orthologous even though they most likely shared a common ancestor. Mouse sequences are denoted by the last five digits of their OTTID (e.g. 18344 – OTTMUSG00000018344) and rat sequences are denoted by their gene name preceded by an r (e.g. rDefa11); E and N denote sequences in Ensembl and NCBI, respectively, when the peptides shared the same name in both databases but had different peptide sequences. Residues that showed 100% conservation across all sequences are highlighted in black. Classical cysteine residues of rat α-defensins are highlighted in grey and cysteine residues of the mouse CRS peptides are highlighted in red.

2.3.1.5 Identification of new splice variants within the mouse defensin genes

To complete the defensin gene set in the C57BL/6J mouse reference genome, all other defensin loci on Chromosomes 1, 2 and 14 were also annotated. These loci contain only β-defensin genes and as such were not subjected to in-depth analysis. Interestingly, however, novel splice variants were annotated for Defb30 and Defb42 on Chromosome 14, in contrast to the family members on Chromosome 8.
Defb30 had four different splice variants, one of which was previously known; three variants were tagged as “putative coding” as they have a different first exon compared to the known variant. Two pairs of variants shared the same 5' exon but differed in their 3' exons. In each pair, one variant consisted of three exons and the other of two (Figure 2.10). For Defb42, two coding and two non-coding variants were identified and annotated. One of the transcripts that seemed to lack coding potential was tagged as likely to be subject to nonsense-mediated mRNA decay (NMD). All four Defb42 variants had differentially spliced 5' first exons and only one had previously been identified in other gene sets. Tissue-specific and species-specific alternative splicing was previously implicated for primate SPAG11 (125). The β-defensin Defb42 has been discovered and characterized in mice, and its expression has been shown to be epididymis-specific (128). Looking at the origin of the manually annotated splice variants for Defb42, it was noticeable that all cDNA clones representing the main coding variant were derived from the adult male reproductive tract, specifically the epididymis. However, there was one coding cDNA with an alternative 5’ UTR exon compared to the main variant that was derived from the spleen of a four-week-old male mouse. The potential NMD splice variant is based on a two-cell egg cDNA, and another overlapping non-coding transcript is based on an 11-day embryo whole-body cDNA. This observation suggests that alternative splicing of Defb42 is likely to also be developmental stage-specific.

An unusual feature was observed for Defb17 and Defb41 on Chromosome 1. These genes shared the same start and first exon but differed in their second exon, which was crucial since it encoded the mature peptide.
According to the general Vega annotation guidelines, these two genes would normally be merged and the two transcripts would represent splice variants of the same gene, since they share the first coding exon. Differential splicing seems to be a rare event for defensin genes; however, the examples observed here indicate potential functional differences for the affected transcripts.

Figure 2.10. Novel coding and non-coding murine β-defensin splice variants. Vega screenshot of Defb30 and Defb42, where three new variants per locus were annotated. Defb30: variant 1 was a known variant with known CDS; variant 2 was a novel variant with the same CDS as variant 1 but an alternative 3' UTR; variants 3 and 4 were novel variants with putative CDS and different 3' UTRs. Defb42: variant 1 represented a non-coding transcript; variant 2 was a novel variant with the same CDS as the known transcript (3) but with an alternative 5' UTR; variant 3 was a known variant with known CDS; and variant 4 was an NMD candidate.

2.3.2 Genomic structures of annotated defensins on Chromosome 8

2.3.2.1 TATA boxes

Annotation of TATA boxes was based on motifs verified experimentally and published previously for five defensin genes in mouse (26, 27) and two defensin genes in human (129). The positions of TATA box motifs for several more defensin genes, 28 in total including four β-defensin, Spag11, 17 α-defensin and all six CRS genes, can be found in Table 2.3. Unsurprisingly, duplicated defensin genes contained the same TATA box sequence, which suggested recent duplication events. A strong TATA box motif was identified for the majority of the α- and CRS-defensins. However, for a novel α-defensin gene, OTTMUSG00000019784, and for Defcr26, a TATA box with a weaker consensus was identified (Figure 2.11), which would be predicted to reduce the expression of these genes.
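The distinction between strong and weak TATA box motifs can be illustrated with a simple motif scan. The sketch below is an illustration only and does not reproduce the annotation procedure used here: it assumes the strong motif is the hexamer TATAAA and the weak motif is TATACA, as quoted for the pseudogene analysis in Section 2.3.2.2, and it uses two of the 90-base promoter fragments shown in Figure 2.11, each ending in the ATG start codon. The function name is hypothetical.

```python
# Promoter fragments copied from Figure 2.11; each ends with the ATG start codon.
PROMOTERS = {
    "OTTMUSG00000019784": (
        "CTTTTCTCTGCCATATACATATGGGCTGACTAATCACACTCCACACATTGGGCTCCTGTT"
        "CCCCAATCCCCCAGGTGACTCCCAGCCATG"
    ),
    "Defcr3": (
        "CCTTTCTCTGTCCTATAAATGCAGGCTGGATATTCACTCTCCACACATTGGGCTCCTGCT"
        "CACCAATCCTCCAGGTGACTCCCAGCCATG"
    ),
}

STRONG_MOTIF = "TATAAA"  # assumed strong consensus
WEAK_MOTIF = "TATACA"    # assumed weaker variant


def classify_tata(upstream):
    """Classify a promoter fragment whose last three bases are the start codon.

    Returns (label, distance), where distance is the number of bases from the
    first base of the motif to the A of the ATG, or ("none", None).
    """
    for label, motif in (("strong", STRONG_MOTIF), ("weak", WEAK_MOTIF)):
        index = upstream.find(motif)
        if index != -1:
            return label, len(upstream) - 3 - index
    return "none", None


for gene, fragment in PROMOTERS.items():
    print(gene, classify_tata(fragment))
```

An exact-match scan of this kind only separates the two motif classes; it says nothing about expression level, which would have to be measured.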
TATA box-containing genes are significantly more likely to change in expression and are biased towards spontaneous mutations (130).

A.
OTTMUSG00000019784           CTTTTCTCTGCCATATACATATGGGCTGACTAATCACACTCCACACATTGGGCTCCTGTT  60
Defcr26(OTTMUSG00000019889)  CTTTTCTCTGCCATATACATATGTGCTGACTAATCACACTCCACACATTGGGCTCCTGCT  60
                             *********************** ********************************** *

OTTMUSG00000019784           CCCCAATCCCCCAGGTGACTCCCAGCCATG  90
Defcr26(OTTMUSG00000019889)  CCCCAATCCCCCAGGTGACTCCCAGCCATG  90
                             ******************************

B.
Defcr3(OTTMUSG00000019782)   CCTTTCTCTGTCCTATAAATGCAGGCTGGATATTCACTCTCCACACATTGGGCTCCTGCT  60
OTTMUSG00000019785           CCTTTCTCTGCCCTATAAATGCAAGTTGGCTACTCACTCTCCACACATTGGGCTCCTGCT  60
                             ********** ************ * *** ** ***************************

Defcr3(OTTMUSG00000019782)   CACCAATCCTCCAGGTGACTCCCAGCCATG  90
OTTMUSG00000019785           CAACAATTCTCCAGGTGACCCCCAGCCATG  90
                             ** **** *********** **********

Figure 2.11. TATA boxes annotated within the potential promoter region of murine defensin genes. A weak TATA box motif could be identified for two genes, OTTMUSG00000019784 and Defcr26 (A). A strong TATA box motif was found for 27 defensin genes; an example is shown for Defcr3 and OTTMUSG00000019785, a novel defensin gene (B). TATA box motifs are shown in red/blue and start codons are underlined.

Several TATA box-less genes (Defcr25, OTTMUSG00000019857 and OTTMUSG00000019896) and also two pseudogenes (OTTMUSG00000019793 and OTTMUSG00000019923) had a 5' UTR/promoter region identical to that of the previously reported 'Crypi' (26, 27), which is presumed to be non-functional because of a premature stop codon. The question that arose was whether these loci represented new pseudogenes or whether their expression was regulated by an alternative promoter. The gene OTTMUSG00000019857 had a divergent C-terminus/mature peptide compared to all other defensin genes (Figure 2.12).
It had a coding potential of 112 amino acids, but three of the consensus cysteine residues were missing. These data suggested that this gene was possibly pseudogenic in the reference genome.

OTTMUSG00000019857(noveldef)  MKTLVLLSALALLAFQVQADPIQNRDEESKIDEQPGKEDQAVSVSFGDPE  50
                              GSSLQEECEDRICYCRTSCKKKRTPDWDLQKGSFNVQALLPLNMETTEDK  98
                              TAMSTEATDAGA  112

Figure 2.12. Novel murine prepropeptide related to α- and CRS-defensins. A novel sequence (OTTMUSG00000019857) was annotated within the defensin gene cluster region, which shared homology with the signal and pro-region of α- and CRS-defensin peptides. OTTMUSG00000019857 contained four cysteine residues, highlighted in yellow, but lacked all the canonical cysteines in any known number and spacing pattern.

2.3.2.2 Pseudogenes

A total of 22 defensin pseudogenes were annotated in the major mouse defensin gene cluster region on mouse Chromosome 8, whereas ten pseudogenes were annotated in the corresponding human defensin cluster. Approximately 10% of the pseudogenes annotated in the Encyclopedia of DNA Elements (ENCODE) project were from genes involved in the immune response (131); thus, the high frequency of defensin pseudogenes (>25%) was notable. The Vega annotation guidelines divide pseudogenes into two categories, processed and unprocessed, each with two subcategories, transcribed or untranscribed. A locus is annotated as a pseudogene when it shows clear homology to proteins but its coding sequence is disrupted by frameshifts or in-frame stop codons. A locus can also be tagged as a pseudogene when one or more parent genes with a spliced gene structure can be found elsewhere in the genome while the pseudogene locus is a single exon encoding the corresponding protein. Pseudogenes of the defensin gene cluster are unprocessed, as they have likely evolved through duplication of functional genes and have accumulated mutations over time and become non-functional.
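The disruption criterion described above (frameshifts or in-frame stop codons) can be sketched as a simple check on a candidate coding sequence. This is a deliberate simplification and not the Vega procedure, which relies on homology to known proteins; the function names and the three short test sequences are hypothetical, constructed for illustration from the first codons of a defensin-like open reading frame.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_premature_stop(cds):
    """Return the 0-based codon index of the first in-frame stop codon that
    occurs before the final codon, or None if there is none."""
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    for index, codon in enumerate(codons[:-1]):
        if codon in STOP_CODONS:
            return index
    return None


def looks_disrupted(cds):
    """Flag a coding sequence that lacks a start codon, has a length that is
    not a multiple of three, or contains a premature in-frame stop codon."""
    if not cds.startswith("ATG"):
        return True
    if len(cds) % 3 != 0:
        return True
    return first_premature_stop(cds) is not None


# Hypothetical toy sequences (not taken from the annotated gene set).
INTACT = "ATGAAGACACTTGTCCTCTGA"          # ATG AAG ACA CTT GTC CTC TGA
PREMATURE_STOP = "ATGAAGTGACTTGTCCTCTGA"  # third codon mutated to TGA
FRAMESHIFT = "ATGAAGACCTTGTCCTCTGA"       # one base deleted

for name, seq in (("intact", INTACT), ("premature stop", PREMATURE_STOP), ("frameshift", FRAMESHIFT)):
    print(name, looks_disrupted(seq))
```

A length check of this kind only detects frameshifts that change the total length by a non-multiple of three; compensating insertions and deletions would need an alignment against the parent gene.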
Of the 22 pseudogenes annotated in this region, five were only partial and one, Defa-ps1, was tagged as a transcribed_unprocessed pseudogene. Here, protein homologies suggested this locus is a pseudogene, but overlapping locus-specific transcription evidence (cDNAs) indicated expression. There is recent evidence that regulatory interdependency exists between transcribed pseudogenes and their parent genes. For example, targeted knockdown of the transcribed ABC transporter pseudogene ABCC6P1 results in a significant reduction in the expression level of the parent gene ABCC6 (132).

Annotation of TATA box-like motifs was also performed for the defensin-related pseudogenes. Two defensin pseudogenes contained a strong TATA box motif (TATAAA TG), and four defensin pseudogenes contained a weak TATA box motif (TATACA TA/G), but for the majority nothing similar could be identified. For the transcribed pseudogene, Defa-ps1, the homology broke down 48 bp upstream of the start codon and no TATA box-like motif could be identified. Generally, the TATA box-lacking defensin pseudogenes had 5' upstream sequences similar to the potential promoter regions of TATA box-lacking active defensin genes.

2.3.3 Comparative analysis of murine defensin gene sets

To illustrate the difficulties in naming the defensins, four major gene sets were cross-referenced against the data assembled from databases and the literature, and the gene symbols and gene IDs for the mouse α-defensin genes annotated herein were compiled (Table 2.7). These included the manually curated data set from Vega (v.30), automatic gene annotation from Ensembl (v.49), cDNA evidence from NCBI RefSeq, and a merged gene set encompassing UCSC, RefSeq and Ensembl from MGI (4.01) (133). The gene Defcr25 had a cross-reference to gene Defcr2 in NCBI's Entrez Gene, indicating that this gene was also known as Defcr2.
However, the protein sequences for Defcr25 (MGI:3630385; Swiss-Prot:Q5G864.1) and Defcr2 (MGI:94882; Swiss-Prot:P28309.2) were different and were derived from different mouse strains. Therefore, this locus was annotated as Defcr25, since the sequence in the reference mouse genome was identical to this gene. Another example of ambiguities between databases are Defcr16 and Defcr17, which have been associated by MGI with OTTMUSG00000019742 and OTTMUSG00000019892 respectively; in case of Defcr16 evidence for its expression was derived from C3H/HeJ strain only while the association of Defcr17 with the Vega model is incorrect as OTTMUSG00000019892 was a duplicate of Defcr3. Additional examples are listed in Table 2.7.  49  Table 2.7. Genome browser comparison of mouse α-defensin genes. See Sections 2.2.4 and 2.3.3 in the text for the complete description of the comparison. Ensembl Annotation Gene Linked to Vega Vega Gene Symbol  Vega OTTER Gene ID Gene Symbol  AC140205.1  OTTMUSG00000018344  Defcr21  OTTMUSG00000019489  Defcr23 AC129197.1  MGI Annotation  Additional Gene(s)  Gene Linked to Vega  Gene ID  Gene Symbol  Gene ID  Gene Symbol  Gene ID  AY761185  ENSMUSG00000079120  -  -  AY761185  MGI:3630303  Defcr21  ENSMUSG00000074447  -  -  -  -  OTTMUSG00000019488  Defcr23  ENSMUSG00000074446  Defcr23 Defcr-rs7  ENSMUSG00000074442 OTTMUSG00000019762  Defcr23  MGI:3630381  OTTMUSG00000018258  Defcr5  ENSMUSG00000061845  -  -  -  Defa-ps1 OTTMUSG00000019780 ENSMUSG00000071164 AC129174.9 OTTMUSG00000019793 AC134533.5 OTTMUSG00000019923 ENSMUSG00000061958  Defcr25  OTTMUSG00000019700  Defcr25  AC129197.2  OTTMUSG00000018260  Defcr-rs1  -  -  NCBI Annotation  Additional Gene(s) Gene Symbol  Gene Linked to Ensembl  Additional Gene(s)  Gene ID  Gene Symbol  Gene ID  Gene Symbol  Gene ID  -  -  AY761185  503556  -  -  Defcr21  MGI:1913548  Defcr21  66298  -  -  -  -  Defcr23  497114  -  -  -  OTTMUSG00000018258  MGI:3711900  OTTMUSG00000018258  100041688  Defcr5  13239  -  -  
Defcr25  MGI:3630385  Defcr25 (aka Defcr2)  13236  -  -  -  -  OTTMUSG00000018260  MGI:3709605  OTTMUSG00000018260  634825  Defcr-rs1  13218  13235 100038927  AC166039.1  OTTMUSG00000019742  Defcr16  ENSMUSG00000074444  Defcr24  ENSMUSG00000064213  Defcr16  MGI:99585  -  -  Defcr16 LOC100038927  -  -  Defcr22  OTTMUSG00000019763  Defcr22  ENSMUSG00000074443  -  -  Defcr22  MGI:3639039  -  -  Defcr22  382059  -  -  13226  -  -  AC166039.3  OTTMUSG00000019762  Defcr-rs7  OTTMUSG00000019762  Defcr23  ENSMUSG00000074442  Defcr-rs7  MGI:102509  -  -  Defcr-rs7 (aka CRS4C-2, CRS4C2b, CRS4C2c, CRS4C2d, CRS4C3a, CRS4C3b, CRS4C3c, CRS4C3d, CRS4C3e, CRS4C3a2)  AC129174.1  OTTMUSG00000019786  novel  ENSMUSG00000079116  -  -  OTTMUSG00000019786  MGI:3705230  -  -  OTTMUSG00000019786  100041759  -  -  AC129174.4  OTTMUSG00000019784  novel  ENSMUSG00000074441  -  -  OTTMUSG00000019784  MGI:3708769  -  -  OTTMUSG00000019784  100041787  -  -  Defcr3  OTTMUSG00000019782  Defcr3  ENSMUSG00000074440  Defcr3  ENSMUSG00000060208  Defcr3  MGI:94883  -  -  Defcr3  13237  -  -  AC129174.7  OTTMUSG00000019785 NP_031877.2 ENSMUSG00000074439 AC134533.3 OTTMUSG00000019924  AC129174.10 OTTMUSG00000019792  Defcr5  MGI:99583  -  -  Defcr5  13239  -  Defcr-rs1  ENSMUSG00000074437  -  -  Defcr-rs1  MGI:94881  -  -  Defcr-rs1  13218  -  -  novel  ENSMUSG00000074436  -  -  OTTMUSG00000019857  MGI:3642785  -  -  -  -  -  -  AC133094.3  OTTMUSG00000019857  AC133094.5  OTTMUSG00000019859  novel  ENSMUSG00000079114  novel  ENSMUSG00000079113  EG665927  MGI:3645033  -  -  EG665927  665927  -  -  Defcr20  OTTMUSG00000019856  Defcr20  ENSMUSG00000065958  Defcr20  ENSMUSG00000065957  Defcr20  MGI:1915259  -  -  Defcr20 (aka cryptdin 4)  68009  -  -  AC133094.9  OTTMUSG00000019860  Defcr20  ENSMUSG00000065957  -  -  OTTMUSG00000019860  MGI:3709042  -  -  OTTMUSG00000019860  100041890  Defcr20 (aka cryptdin 4)  68009  AC133094.1  OTTMUSG00000018259  Defcr5  ENSMUSG00000065956  Defcr5  MGI:3705236  -  -  
OTTMUSG00000018259  100041895  Defcr5  13239  AC133094.11 OTTMUSG00000019896  novel  ENSMUSG00000074434  -  ENSMUSG00000061845 OTTMUSG00000018259 -  EG626682  MGI:3646688  -  -  EG626682  626682  -  -  AC133094.13 OTTMUSG00000019893  novel  ENSMUSG00000079113  -  -  EG665956  MGI:3648003  -  -  EG665956  665956  -  -  50  Defcr26  OTTMUSG00000019889  Defcr26  ENSMUSG00000060070  -  -  Defcr26  MGI:3630390  -  -  Defcr26  626708  -  -  AC134533.2  OTTMUSG00000019892  Defcr3  ENSMUSG00000060208  Defcr17  OTTMUSG00000019892  Defcr17 ENMUSG00000060208  MGI:1345152  -  -  Defcr17  23855  Defcr3  13237  AC134533.3  OTTMUSG00000019924  novel  ENSMUSG00000063206 NP_031877.2 ENSMUSG00000074439 OTTMUSG00000019924  MGI:3709048  -  -  OTTMUSG00000019924  100041952  -  -  AC134533.6  OTTMUSG00000019927  AY761184  ENSMUSG00000058618  -  -  AY761184  MGI:3611585  -  -  AY761184  382000  -  -  Defcr24  OTTMUSG00000019980  Defcr24  ENSMUSG00000064213  -  -  Defcr24  MGI:3630383  -  -  Defcr24  503491  -  -

This analysis has highlighted the necessity of accounting for strain differences when deriving gene annotation from cDNAs aligned to the C57BL/6J reference genome. The differences become very obvious when looking at the α-defensin region in the MGI Genome Browser. The number of α-defensin genes there is higher than the number annotated here, but many of these genes are derived from strains distinct from the reference strain. An example is the Crp4 peptide, which was first isolated from outbred Swiss mice (134) and corresponds to Defcr4 in MGI; this gene has been annotated in the 129X1/SvJ strain but not in the reference strain (Table 2.8). The reference strain contained two presumed Crp4 peptide variants, termed Crp4-B6a and Crp4-B6b, both of which lack three codons between the fourth and fifth cysteine residues (135).
MGNC has named these variants Defcr20 and Defcr21, respectively; however, the relationship of these peptides between the two mouse strains is not obvious.

Table 2.8. Defensin genes currently ‘missing’ from the mouse reference genome. α- and CRS-defensin genes that did not map to the C57BL/6J reference genome, most likely because of mouse strain differences or as a result of the incomplete genome sequence within the defensin cluster of C57BL/6J. The strain origins of genomic, transcript and peptide sequences are indicated, where applicable. n/a, not applicable; n/s, not specified.

Gene Symbol | Genomic Strain | Transcript Strain | Peptide Strain | Cross-reference Validation | Entrez Gene
Defa1 | 129X1/SvJ | Swiss albino | n/a | missing in Vega and no 100% BLAST match | no annotation info
Defcr2 | 129 | C3H/HeJ | n/a | missing in Vega and no 100% BLAST match | no annotation info
Defcr4 | 129X1/SvJ | 129/SvJ | n/a | missing in Vega and no 100% BLAST match | not annotated
Defcr6 | 129X1/SvJ | 129 | n/a | missing in Vega and no 100% BLAST match | not annotated
Defcr7 | n/a | C3H/HeJ | n/a | missing in Vega and no 100% BLAST match | not annotated
Defcr8 | n/a | C3H/HeJ | n/a | missing in Vega and no 100% BLAST match | not annotated
Defcr9 | n/a | C3H/HeJ | n/a | missing in Vega and no 100% BLAST match | not annotated
Defcr10 | n/a | C3H/HeJ | n/a | missing in Vega and no 100% BLAST match | not annotated
Defcr11 | n/a | C3H/HeJ | n/a | missing in Vega and no 100% BLAST match | not annotated
Defcr12 | n/a | C3H/HeJ | n/a | missing in Vega and no 100% BLAST match | not annotated
Defcr13 | n/a | C3H/HeJ | n/a | missing in Vega and no 100% BLAST match | not annotated
Defcr14 | n/a | C3H/HeJ | n/a | missing in Vega and no 100% BLAST match | not annotated
Defcr15 | n/a | C3H/HeJ | n/a | missing in Vega and no 100% BLAST match | not annotated
Defcr-rs2 | n/a | Swiss albino | n/a | missing in Vega and no 100% BLAST match | no annotation info
Defcr-rs4 | 129X1/SvJ | C3H/HeN | C3H/HeN | missing in Vega and no 100% BLAST match | not annotated
Defcr-rs5 | n/a | 129X1/SvJ | n/a | missing in Vega and no sequence entry | not annotated
Defcr-rs6 | n/a | 129X1/SvJ | n/a | missing in Vega and no sequence entry | not annotated
Defcr-rs8 | n/a | 129X1/SvJ | n/a | missing in Vega and no sequence entry | not annotated
Defcr-rs9 | n/a | 129X1/SvJ | n/a | missing in Vega and no sequence entry | not annotated
Defcr-rs10 | 129 | n/s | n/a | missing in Vega and no 100% BLAST match | not annotated
Defcr-rs11 | n/a | 129X1/SvJ | n/a | missing in Vega and no sequence entry | not annotated
Defcr-rs12 | 129 | 129 | n/a | missing in Vega and no 100% BLAST match | no annotation info

Data displayed in the MGI Genome Browser are a combination of information downloaded from the UCSC Genome Browser (136) and that generated at MGI. MGI and UCSC do not filter any strain-specific data, and mapping of defensin genes has not been carried out stringently, with the aim of displaying a comprehensive set of all existing genes. This is a valuable resource; however, as a result, several genes that share the same nucleotide sequence coding for the signal peptide and the pro-segment have been mapped together. As the region encoding the mature peptide showed some differences, these genes cannot be considered the same.

To determine the ease of cross-species comparison, genomic alignments and putative orthologues of the mouse α-defensins were searched for in both the human and rat genomes. There are only six defensin genes with defined orthologues between human and mouse on the Mouse Chromosome 8 Linkage Map (137). For the mouse α-defensin family, orthology was especially hard to predict because of the high intraspecies similarity of these genes. The same human and rat genes aligned with most of the mouse genes and/or were predicted to be orthologues for the mouse peptides. In particular, DEFA7P was predicted by Ensembl to be the human orthologue for the majority of the mouse cryptdins, but this gene lacked a start codon (12) and was therefore designated a pseudogene by manual annotation.
These alignments and orthologues were predicted by Ensembl and may be an artefact of Ensembl’s naming scheme or of the high similarity between members of this gene family.

2.3.4 Conserved synteny within defensin clusters

Interspecies comparison is important not only from a biological perspective, but also to ascertain whether the human and/or rat genome assemblies can be used to facilitate closing of the estimated 0.76 Mb gap on mouse Chromosome 8 that separates the two α-defensin loci. Ensembl (v. 49) was used to display regions of conserved synteny between the mouse, human and rat α-defensin regions. The conserved syntenic region of the human genome is Chromosome 8p23.1. Figure 2.2B shows the arrangement of the human defensin genes within this region (8: 6.6 – 7.9 Mb), which includes one gap of approximately 100 kb in size. This region is also flanked at the 5’ end by XKR5, the homologue of the mouse Xkr5. However, whereas in the mouse Ccdc70, Atp7b and Alg11 were found 3' to the defensin gene cluster on Chromosome 8, CCDC70 did not flank the 3’ end in human, but rather is located on human Chromosome 13q14.3, where the two telomeric gene homologues, ATP7B and ALG11, have also been mapped. There are no defensin genes in this region of human Chromosome 13, which indicates that the breakpoint occurred telomeric to the defensin cluster, and a survey of this region shows a complete assembly.

The quality of the human genome assembly near the defensin regions appears to be better than that of mouse. This may be due to the larger number of defensin genes within the mouse genome as compared to the human genome, as well as the high similarity between the mouse α-defensin genes in particular. To investigate this further, we used Ensembl to analyze the conserved syntenic regions of the rat genome compared to the mouse (Chr16: 73.7 – 75.8 Mb). Nine rat α-defensin genes and one defensin-related gene were located on Chromosome 16q12.5.
There were a few gaps in the region adjacent to the defensin genes; one in particular of about 120 kb in the middle of the α- and β-defensin cluster, and another of about 200 kb 5’ to the defensin region. The assembly of the rat reference sequence appears to be more similar to that of the human sequence than to that of the mouse sequence with respect to the level of completion of defensin-rich regions. However, it is important to note that the rat genome sequence is a draft sequence, in contrast to the finished sequences of both the human and mouse genomes (138). The rat genome was sequenced using a combination of whole genome shotgun (WGS) and BAC sequencing, and the authors argue that this approach has generated sequence with quality near to that of finished sequence (138). Another caveat is that the human and mouse genomes aided the assembly of the rat genome in difficult regions (138); therefore, any errors in either the human or mouse assemblies within the defensin regions could translate into errors in the rat genome. A coordinated effort has been undertaken to generate a new rat genome build, and its release is imminent (139). Re-examination of the defensin clusters will determine whether our analyses and observations hold true for these regions in particular.

Similar to the annotation of mouse α-defensins only on Chromosome 8, human and rat α-defensins have only been identified on Chromosomes 8 and 16, respectively. This is in contrast to the presence of human, rat and mouse β-defensins on multiple chromosomes (human – Chromosomes 6p21, 8p23.1, 20q11.1 and 20p13, rat – 3q41, 9q13, 15p12 and 16q12.5 and mouse – 1A4, 2H1, 8A3 and 14D1).
The assemblies of human Chromosomes 6p21 and 20p13 are complete with no gaps, but there is a gap 5’ to the β-defensin cluster on 20q11.1; however, this gap is also near the centromere, which was not targeted by the genome projects due to the difficulty in sequencing highly repetitive α-satellite DNA in heterochromatic regions (140, 141). Additional rat β-defensins are located on Chromosomes 3q41, 9q13, 15p12 and 16q12.5; these regions appear complete, with the exception of 3q41, which has an 11 kb gap within the β-defensin cluster, and 16q12.5, which contains both α- and β-defensin genes, as previously described. The assemblies of the mouse, rat and human genomes are more complete near regions of β-defensins compared to α-defensins, since β-defensins are not as similar to one another as α-defensins are. β-Defensins have had more time for movement associated with chromosomal rearrangements and multiple duplication events as compared to α-defensins; however, mouse α-defensins have undergone a rapid expansion that has not occurred to the same extent in human and rat. Rapidly changing regions are interesting in evolutionary terms but are difficult to assemble into finished sequence (138), and additional defensin genes may be present within gaps in the assembly. These factors reinforce the biological importance of these regions and the need for their further characterization.

2.3.5 Defensin transcriptional expression profiling

There are advantages and disadvantages to both capillary and 454 sequencing. Capillary sequencing generates longer sequence reads, whereas 454 sequencing is free from clonal biases and generates greater coverage depth. Therefore both technologies were used in parallel to determine α-defensin expression in C57BL/6J mice.
2.3.5.1 Genetic analysis of α- and CRS-defensins necessitates universal primer design

The nucleotide alignment of all α- and CRS-defensin coding sequences (Figure 2.13) was used to design primers with which to investigate gene expression following the genomic annotation. The alignment showed high conservation at the 5’ end of the coding sequences between all defensins, which ends approximately 160 nucleotides following the start codon. The 5’-UTR region of the 26 defensins was also inspected for primer design in order to encompass full transcript length in the amplification, but the conservation was lower than that of the start of the coding sequence. Additionally, there was variability in the length and composition of the 5’-UTR sequence when present (ten genes did not have the 5’-UTR annotated), so this region was not considered for design of the forward primer. There was also nucleotide conservation at the 3’ end; however, it was found only within defensin subgroups and did not occur throughout all defensins. The 3’ ends of the genes were also quite repetitive with respect to nucleotide arrangement (e.g. large stretches of Ts), which would also hinder primer design. The variability of the 3’ region would be lost when designing optimal primers by moving further in the 5’ direction, and this region was needed for differentiation between expressed defensin genes. Ideally the primers needed to be designed outside of this region to facilitate sequencing of the mature peptide coding region.

Interestingly, the nucleotide alignment of the α- and CRS-defensin intronic regions also showed a high degree of similarity (Appendices A.2 and A.3), which is consistent with their recent duplication.
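The reasoning above, that a universal primer must sit in a stretch conserved across every aligned sequence, can be sketched in Python. This is a minimal illustration under stated assumptions and not the procedure used in this work: the function names, the toy sequences and the window width are hypothetical, and gaps are simply treated as non-conserved.

```python
from collections import Counter

GAP = "-"


def column_identity(aligned):
    """Return, for each alignment column, the fraction of sequences
    that share the most common (non-gap) base in that column."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must all have the same length")
    fractions = []
    for column in zip(*aligned):
        bases = Counter(base.upper() for base in column if base != GAP)
        top = bases.most_common(1)[0][1] if bases else 0
        fractions.append(top / len(column))
    return fractions


def conserved_windows(aligned, width, min_identity=1.0):
    """Return 0-based start positions of windows of the given width in
    which every column reaches at least min_identity."""
    fractions = column_identity(aligned)
    return [
        start
        for start in range(len(fractions) - width + 1)
        if all(f >= min_identity for f in fractions[start:start + width])
    ]


if __name__ == "__main__":
    # Toy alignment (hypothetical): identical except for the last column.
    toy = ["ATGAAGACAC", "ATGAAGACAT", "ATGAAGACAA"]
    print(conserved_windows(toy, width=5))
```

In practice the window width would be set to the intended primer length, and min_identity could be lowered to tolerate a small number of degenerate positions.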
[Figure 2.13: multiple sequence alignment of 26 sequences, shown in the following order: Defcr21 (OTTMUSG00000019489), Defcr22 (OTTMUSG00000019763), OTTMUSG00000019860 (Defcr20dupl), Defcr20 (OTTMUSG00000019856), Defcr25 (OTTMUSG00000019700), OTTMUSG00000019857 (noveldef), OTTMUSG00000019896 (noveldef), OTTMUSG00000019742 (noveldef), Defcr24 (OTTMUSG00000019980), Defcr23 (OTTMUSG00000019488), OTTMUSG00000019762 (Defcr23dupl), Defcr3 (OTTMUSG00000019782), OTTMUSG00000019892 (Defcr3dupl), Defcr26 (OTTMUSG00000019889), OTTMUSG00000019784 (noveldef), OTTMUSG00000019786 (noveldef), OTTMUSG00000019785 (novelDefcr5), OTTMUSG00000018259 (novelDefcr5), OTTMUSG00000018258 (novelDefcr5), OTTMUSG00000019924 (novelDefcr5), OTTMUSG00000019927 (CRS1C-3), OTTMUSG00000018260 (novelCRS1C), Defcr-rs1 (OTTMUSG00000019792), OTTMUSG00000019859 (novelCRS1C), OTTMUSG00000019893 (novelCRS1C) and OTTMUSG00000018344 (CRS4C-6).]

Figure 2.13. Nucleotide alignment of all murine α- and CRS-defensin coding sequences. Defensin gene sequences downloaded from Vega v.37 (Biomart) and aligned using ClustalW2.
Alignment file visualized with GeneDoc (Multiple Sequence Alignment Editor and Shading Utility Version 2.7.000) software to view percentage conservation by shading (100% black, 80% dark grey, 60% light grey).

Designing defensin-specific primers for interrogation of transcriptional expression was not possible, either for each transcript individually or as a population. Potentially, primers for a few of the subgroups of defensins could have been designed, but this would have omitted the remainder, and also prevented the opportunity for novel defensin discovery. An alternative method was to use the hybrid UDefR-oligo-dT primer in the reverse transcription. Due to the high conservation of all α- and CRS-defensins at the start of the coding sequence, the forward universal primer UDefF1 was designed at the start of the defensin coding sequences (Figure 2.14).

OTTID      Nucleotides 1-50
19785      ATGAAGACATTTGTCCTCCTCTCTGCCCTTGTCCTGCTGGCCTTCCAGGC
19786      ATGAAGACATTTGTCCTCCTCTCTGCCCTTGTCCTGCTGGCCTTCCAGGC
18258      ATGAAGACATTTGTCCTCCTCTCTGCCCTTGTCCTGCTGGCCTTCCAGGT
18259      ATGAAGACATTTGTCCTCCTCTCTGCCCTTGTCCTGCTGGCCTACCAGGT
19924      ATGAAGACAATTGTCCTCCTCTCTGCCCTTGTCCTGCTGGCCTTCCAAGT
19489      ATGAAGACACTTGTCCTCCTCTCTGCCCTCATCCTGCTGGCCTACCAGGT
19763      ATGAAGACACTTGTCCTCCTCTCTGCCCTCATCCTGCTGGCCTACCAGGT
19856      ATGAAGACACTTGTCCTCCTCTCTGCCCTCGTCCTGCTGGCCTTCCAGGT
19860      ATGAAGACACTTGTCCTCCTCTCTGCCCTCGTCCTGCTGGCCTTCCAGGT
19980      ATGAAGACACTAATCCTCCTCTCTGCCCTCGTCCTGCTGGCCTTCCAGGT
19742      ATGAAGACACTAATCCTCCTCTCTGCCCTCGTCCTGCTGGCCTTCCAGGT
19488      ATGAAGACACTAGTCCTCCTCTCTGCCCTCATCCTGCTGGCCTTCCAGGT
19762      ATGAAGACACTAGTCCTCCTCTCTGCCCTCATCCTGCTGGCCTTCCAGGT
19892      ATGAAGACACTAGTCCTCCTCTCTGCCCTCGTCCTGCTGGCCTTCCAGGT
19782      ATGAAGACACTAGTCCTCCTCTCTGCCCTCGTCCTGCTGGCCTTCCAGGT
19700      ATGAAGACACTAGTCCTCCTCTCTGCCCTTGCCCTGCTGGCCTTCCAAGT
19857      ATGAAGACACTAGTCCTCCTCTCTGCCCTTGCCCTGCTGGCCTTCCAAGT
19896      ATGAAGACACTAGTCCTCCTCTCTGCCCTTGCCCTGCTGGCCTTCCAAGT
19784      ATGAAGACACTTGTCCTCCTCTCTGCCCTTTTCCTGCTGGCCTTCCAAGT
19889      ATGAAGACACTTGTCCTCCTCTCTGCCCTTTTCCTGCTGGCCTTCCAGGT
18260      ATGAAGACACTTGTCCTCCTCTCTGCCCTTGCCCTGCTGGCCTTCCAGGT
19927      ATGAAGACACTTGTCCTCCTCTCTGCCCTTGCCCTGCTGGCCTTCCAGGT
19792      ATGAAGACACTAGTCCTCCTCTCTGCCCTCGTTCTGCTGGCCTTCCAGGT
19893      ATGAAGACACTTGTCCTCCTCTCTGCCCTTGCCCTGCTGGCCTTACAGGT
19859      ATGAAGACACTTGTCCTCCTCTCTGCCCTTGCCCTGCTGGCCTTACAGGT
18344      ATGAAGACACTCGTTCTCCTCTCTGCCCTTGTCCTGCTGGCCTTCTATGT
Consensus  ATGAAGACAcT gTcCTCCTCTCTGCCCT   cCTGCTGGCCTtccA Gt

Figure 2.14. Murine α- and CRS-defensin coding sequence start with UDefF1 primer sequence. The GeneDoc display of the PAL2NAL alignment of the start of the defensin coding sequences with the consensus sequence shown below the alignment. Upper case letters indicate 100% identity between all sequences, lower case letters indicate the consensus of all sequences and spaces indicate no consensus between sequences. The universal UDefF1 primer is highlighted in grey, and the non-consensus positions coloured according to base identity. The defensin sequences are named by their OTTIDs, and should be preceded by OTTMUSG000000 for any database queries.

Following reverse transcription, amplification with the UDefF1 and UDefR primers resulted in distinct α- and CRS-defensin PCR product pools, with approximate sizes of 400 and 500 bp, respectively (Figure 2.15A). Amplification with the UDefF1-454 and UDefR-454 primers also resulted in distinct α- and CRS-defensin PCR product pools; however, the products were larger due to the additional length of the 454 primers (Figure 2.15B).
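The consensus convention described in the Figure 2.14 caption (upper case for 100% identity, lower case for a majority consensus, a space where there is no consensus) can be sketched in Python. This is an illustrative sketch only: the function name is hypothetical, and the 0.8 cut-off for lower-case positions is an assumption made for the example, not a documented GeneDoc setting.

```python
from collections import Counter


def consensus(aligned, threshold=0.8):
    """Build a consensus string for equal-length aligned sequences.

    Upper case: every sequence has the same base at that position.
    Lower case: the most common base reaches `threshold` (assumed value).
    Space:      no base reaches the threshold, or the column is mostly gaps.
    """
    n = len(aligned)
    result = []
    for column in zip(*aligned):
        base, count = Counter(b.upper() for b in column).most_common(1)[0]
        if base == "-":
            result.append(" ")
        elif count == n:
            result.append(base)
        elif count / n >= threshold:
            result.append(base.lower())
        else:
            result.append(" ")
    return "".join(result)


if __name__ == "__main__":
    # Toy alignment (hypothetical): one column at 4/5 and one at 3/5 identity.
    toy = ["ATGAC", "ATGAC", "ATGAT", "ATGTC", "ATCTC"]
    print(repr(consensus(toy)))
```

Positions rendered in lower case or as spaces mark where a universal primer would need a degenerate base or would mismatch some templates.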
[Figure 2.15: agarose gel images. Panel A lane annotations: UDefF1 primer, UDefR primer, defensin cDNA template (M1-M6), RT enzyme, 1 Kb DNA ladder (0.5 μg). Panel B lane annotations: UDefF1-454-1 to UDefF1-454-6 primers, UDefR primer, defensin cDNA template (M1-M6), 1 Kb DNA ladder (0.5 μg).]

Figure 2.15. Generation of C57BL/6J murine universal α- and CRS-defensin pools for capillary and 454 amplicon sequencing. RNA was extracted from the small intestines of 3 female and 3 male mice and reverse transcribed using the UDefR-oligo-dT primer. PCR amplification was performed using either the UDefF1 and UDefR primers or UDefF1-454 and UDefR primers for capillary sequencing (A) and 454 sequencing (B), respectively, and products visualized by agarose gel electrophoresis. The no RT enzyme control was performed using the UDefF1 and UDefR primers (A).

2.3.5.2 Capillary sequencing of universal defensin cDNA

Unlike subsequent efforts, there was a high proportion of failed capillary sequences per sample. Additionally, some clones had only one good quality read, as it was common for reads initiated at the polyA tail to have messy sequences with higher background levels. Only reads greater than 350 nucleotides were included in the analysis, as these were considered full-length. Reads with short alignment length were excluded from the analysis because gaps in the PCR product were detected, which suggests bacterial-mediated deletion. Table 2.9 summarizes the capillary sequencing read quality and number of clones analyzed for each sample.

Table 2.9. Universal defensin capillary sequencing summary. The number of sequencing reads obtained from sequencing 96 TOPO clones per sample (M1-M6) with T3 and T7 primers. The number of reads that passed Asp (WTSI quality control), corresponding to the number of individual clones, is indicated. Manual inspection of all the sequencing reads that passed Asp further reduced the number of high quality reads for analysis due to messy sequence, high background or less than 350 nucleotide alignment length.

Sample  Reads Passed  Reads with Significant  Clones with Significant  Clones Analyzed  Clones Analyzed
        Asp (/192)    BLAST Hits (/192)       BLAST Hits (/96)         (/96)            T3 and T7 Reads
M1      178           35                      20                       15               11
M2      183           65                      36                       20               15
M3      187           56                      28                       20               15
M4      189           53                      27                       17               11
M5      185           72                      37                       22               9
M6      190           65                      34                       31               12

The significant BLAST matches for the capillary sequences shared 98-100% identity with the 26 known α- and CRS-defensin transcripts (Table 2.10); however, only those with 100% identity were further considered. None of the transcripts were detected in all samples M1-M6; however, all but eleven transcripts, (OTTMUSG000000)18258, 18344, 19700, 19784, 19786, 19792, 19857, 19860, 19889, 19896 and 19924, were detected in at least one sample. Eight of these, 18258, 18344, 19700, 19786, 19792, 19857, 19896 and 19924, were also not detected by 454 sequencing (see below).

Table 2.10. Universal defensin transcript expression determined by capillary sequencing of TOPO clones. Relative murine α- and CRS-defensin transcript expression in six naïve C57BL/6J mice (M1-M6), determined by PCR amplification of small intestinal cDNA using universal defensin primers, cloning of PCR products into TOPO4 and capillary sequencing.
TOPO clone sequencing reads are expressed as percentage identity with respect to each transcript reference sequence. > indicates identity was greater than the percentage listed but less than the next highest category (e.g. > 99.5% ≡ 100% < x ≥ 99.5%). Duplicated genes that could not be differentiated are listed together. * indicates an ability to differentiate between duplicated genes, and ** indicates an inability to differentiate between very similar genes. 100% Identity Sample  OTTID  No. Reads (%)  >99.5% Identity OTTID  No. Reads (%)  >99.0% Identity OTTID  No. Reads (%)  >98.5% Identity OTTID  No. Reads (%)  >98% Identity OTTID  No. Reads (%)  3 (20.0) 19892 * 1 (6.7) 1 (6.7) 1 (6.7) 1 (6.7) 1 (6.7)  2 (13.3) 19980 19892 * 19488/ 19762  1 (6.7) 19763 1 (6.7) 19892 * 1 (6.7)  1 (6.7) n/a 1 (6.7)  n/a  M1  19892 * 19489 19763 19782 * 19856 * 19859/ 19893  2 (10.0) 18259 2 (10.0) 19489 2 (10.0) 19856 * 2 (10.0) 1 (5.0) 1 (5.0) 2 (10.0) 1 (5.0)  1 (5.0) 19763 1 (5.0) 1 (5.0)  1 (5.0) 19892 *  3 (15.0) n/a  n/a  M2  19489 19763 19856 * 19892 * 18259 19782 * 19859/ 19893 19488/ 19762  2 (10.0) 19856 * 2 (10.0) 19742 1 (5.0) 19489 1 (5.0) 1 (5.0)  5 (25) 19742 2 (10.0) 19892 * 1 (5.0)  1 (5.0) 19860 * 1 (5.0) n/a 1 (5.0) 19762 19488 ** 2 (10.0)  n/a  M3  19489 19856 * 18259 18260 19488/ 19762  2 (11.8) 2 (11.8) 1 (5.9) 1 (5.9) 1 (5.9)  18259 19860 * 19892 * 19927 19489/ 19763 **  1 (5.9) 19856 * 1 (5.9) 19489 1 (5.9) 19892 * 1 (5.9) 2 (11.8)  1 (5.9) 19489 1 (5.9) 1 (5.9)  1 (5.9) n/a  n/a  M4  19489 19763 18259 19856 * 19892 *  M5  18259 18260 19742 19763  1 (4.5) 1 (4.5) 1 (4.5) 1 (4.5)  19856 * 18259 19763 19859/ 19893  3 (13.6) 2 (9.1) 1 (4.5) 1 (4.5)  1 (4.5) 19860 * 1 (4.5) 19856 * 1 (4.5) 19980 1 (4.5)  2 (9.1) 19782 * 1 (4.5) 1 (4.5) 18260 1 (4.5) 1 (4.5) 19742/ 19980 ** 1 (4.5)  2 (6.5) 19856 * 2 (6.5) 19892 * 2 (6.5) 19488/ 19762 2 (6.5) 2 (6.5) 2 (6.5) 2 (6.5) 1 (3.2) 1 (3.2) 1 (3.2) 1 (3.2) 1 (3.2)  1 (3.2) 19782 * 1 (3.2) 19489 1 (3.2) 19856 *  2 (6.5) 
n/a 1 (3.2) 1 (3.2)  M6  18259 19742 19763 19785 19856 * 19892 * 19927 18260 19489 19980 19488/ 19762 19859/ 19893  19489 18259 19856 * 19489/ 19763 **  1 (3.2) 18259 1 (3.2) 19856 * 3 (9.7) 19892 *  n/a

2.3.5.3 Deep sequencing by 454 of universal defensin PCR products

The 454 sequencing of the universal defensin amplicons generated a total of 101,525 sequences for the M1-M6 samples, based on their GSMID tag sequence identifier. The minimum and maximum numbers of reads for any individual sample were 13,671 and 20,145, respectively. The number of reads for each sample decreased by about half once they were filtered for those over 300 nucleotides, which was considered full-length, taking into account the lengths of the trimmed transcript reference sequences (minus the UDefF1 primer sequence and any 5’-UTR). One transcript (OTTMUSG00000019700) was shorter than this cut-off following trimming (254 nt), so the analysis was repeated with a filter for reads over 250 nucleotides; however, there were still no 100% matches to this transcript. All other results are reported as per the 300 nucleotide filtering. Overall, approximately 3% of the filtered reads matched at least one of the trimmed transcripts with an alignment length of at least 300 nucleotides and 100% identity. Sequencing reads that did not match any defensin reference sequence were candidates for novel, unannotated genes. Pyrosequencing is prone to base calling errors in repetitive and homopolymeric regions of sequence, and reads containing these errors would also fall within the non-100% identity category. The 454 sequencing results summary, including the numbers of reads generated for each sample and the number of identical matches to the trimmed defensin transcripts, can be found in Table 2.11.

Table 2.11. Universal defensin 454 amplicon deep sequencing summary. The total numbers of 454 sequencing reads for each C57BL/6J mouse (M1-M6).
Unfiltered sequences that passed quality control processing steps (WTSI sequencing facility) were then filtered for those at least 300 nucleotides in length. The filtered sequences were further subdivided based on their identity to the defensin reference transcript sequences, trimmed to exclude any 5’-UTR sequence as well as the UDefF1 primer sequence. The sequencing reads with 100% identity (ID) and an alignment length of at least 300 nucleotides were quantitated for relative defensin transcript numbers. The percentage of unfiltered reads was calculated relative to the total run number, the percentage of filtered reads relative to the total unfiltered reads, and the percentages of 100% and non-100% ID reads relative to the total filtered reads.

            Number of Sequence Reads
Sample      Unfiltered  Filtered  100% ID  Non-100% ID
M1               13732      7436      345         7091
M2               13671      7358      456         6902
M3               19753     10268      181        10087
M4               20145     10123      152         9971
M5               16867      8792      141         8651
M6               17357      8467      148         8319
Total           101525     52444     1423        51021
Percentage        99.6      51.7      2.7         97.3
TOTAL RUN       101928

Throughout the six samples, variations within the numbers of each of the 26 transcripts were observed (Table 2.12 and Figure 2.16); however, there were overall trends in the degree of transcript expression. The α-defensin genes found to have the highest expression included Defcr3dupl (OTTMUSG00000019892), Defcr21 (OTTMUSG00000019489), Defcr23/Defcr23dupl (OTTMUSG00000019488/OTTMUSG00000019762), and Defcr20 (OTTMUSG00000019856). Moderate α-defensin gene expression was observed for Defcr22 (OTTMUSG00000019763), Defcr20dupl (OTTMUSG00000019860), Defcr24 (OTTMUSG00000019980), and Defcr3 (OTTMUSG00000019782). α-Defensin genes with low expression included OTTMUSG00000018259 (novelDefcr5), Defcr26 (OTTMUSG00000019889), OTTMUSG00000019785 (novelDefcr5) and OTTMUSG00000019784 (novelDef).
α-Defensin genes for which no expression was detected were OTTMUSG00000018258 (novelDefcr5), OTTMUSG00000019924 (novelDefcr5), Defcr25 (OTTMUSG00000019700), OTTMUSG00000019786 (novelDef), and OTTMUSG00000019896 (novelDef). The CRS-defensin genes expressed at moderate to low levels were OTTMUSG00000018260 (novelCRS1C), OTTMUSG00000019859/OTTMUSG00000019893 (novelCRS1C/novelCRS1Cdupl), and CRS1C-3 (OTTMUSG00000019927). The expression of CRS4C-6 (OTTMUSG00000018344) and Defcr-rs1 (OTTMUSG00000019792) was not detected. The expression of OTTMUSG00000019857 (novelDef), which has a predicted mature peptide with only three cysteines, was also not detected.

Table 2.12. Universal defensin transcript expression determined by 454 amplicon deep sequencing. Relative murine α- and CRS-defensin transcript expression in six naïve C57BL/6J mice (M1-M6), determined by PCR amplification of small intestinal cDNA using 454-adapted universal defensin primers and 454 sequencing, expressed as a percentage of the total number of sequencing reads with 100% identity to each transcript. Defensin reference transcripts are denoted by the last five digits of their gene OTTIDs, which are preceded by OTTMUSG000000 in the Vega database genome browser. * and ** indicate genes annotated as duplications, which can be distinguished based on their 3’-UTR sequence. Duplicated genes that could not be differentiated are listed together.
OTTID (gene)                                  M1          M2          M3          M4          M5          M6       TOTAL
                                     Reads     %  Reads     %  Reads     %  Reads     %  Reads     %  Reads     %  Reads     %
19892 * (Defcr3dupl)                    76  22.0    124  27.2     71  39.2     43  28.3     33  23.4     18  12.2    365  25.7
19489 (Defcr21)                         13   3.8     20   4.4     40  22.1     60  39.5     69  48.9     79  53.4    281  19.7
19488/19762 (Defcr23/Defcr23dupl)      126  36.5    118  25.9      3   1.7      3   2.0      4   2.8      4   2.7    258  18.1
19856 ** (Defcr20)                      63  18.3     78  17.1      6   3.3      6   3.9      3   2.1     12   8.1    168  11.8
19763 (Defcr22)                         26   7.5     41   9.0      2   1.1      3   2.0      2   1.4      2   1.4     76   5.3
19860 ** (Defcr20dupl)                  22   6.4     39   8.6      2   1.1      1   0.7      2   1.4      1   0.7     67   4.7
19980 (Defcr24)                          0   0.0      0   0.0     32  17.7     18  11.8     11   7.8      4   2.7     65   4.6
19782 * (Defcr3)                         4   1.2     21   4.6      9   5.0      4   2.6      5   3.5      0   0.0     43   3.0
18260 (novelCRS1C)                       3   0.9      3   0.7      4   2.2      7   4.6      5   3.5     16  10.8     38   2.7
18259 (novelDefcr5)                      4   1.2      1   0.2      3   1.7      2   1.3      2   1.4      2   1.4     14   1.0
19859/19893 (novelCRS1C)                 1   0.3      0   0.0      3   1.7      2   1.3      4   2.8      3   2.0     13   0.9
19742 (novelDefcr)                       2   0.6      8   1.8      2   1.1      0   0.0      0   0.0      0   0.0     12   0.8
19889 (Defcr26)                          3   0.9      1   0.2      3   1.7      2   1.3      0   0.0      2   1.4     11   0.8
19927 (CRS1C-3)                          0   0.0      0   0.0      1   0.6      0   0.0      1   0.7      4   2.7      6   0.4
19784 (novelDefcr)                       2   0.6      1   0.2      0   0.0      0   0.0      0   0.0      0   0.0      3   0.2
19785 (novelDefcr5)                      0   0.0      1   0.2      0   0.0      1   0.7      0   0.0      1   0.7      3   0.2
19792 (Defcr-rs1)                        0   0.0      0   0.0      0   0.0      0   0.0      0   0.0      0   0.0      0   0.0
18344 (CRS4C-6)                          0   0.0      0   0.0      0   0.0      0   0.0      0   0.0      0   0.0      0   0.0
19857 (novelDefcr)                       0   0.0      0   0.0      0   0.0      0   0.0      0   0.0      0   0.0      0   0.0
19896 (novelDefcr)                       0   0.0      0   0.0      0   0.0      0   0.0      0   0.0      0   0.0      0   0.0
19786 (novelDefcr)                       0   0.0      0   0.0      0   0.0      0   0.0      0   0.0      0   0.0      0   0.0
19700 (Defcr25)                          0   0.0      0   0.0      0   0.0      0   0.0      0   0.0      0   0.0      0   0.0
19924 (novelDefcr5)                      0   0.0      0   0.0      0   0.0      0   0.0      0   0.0      0   0.0      0   0.0
18258 (novelDefcr5)                      0   0.0      0   0.0      0   0.0      0   0.0      0   0.0      0   0.0      0   0.0
TOTAL                                  345   100    456   100    181   100    152   100    141   100    148   100   1423   100

[Figure 2.16: stacked bar chart. Vertical axis, 0-100% of reads with 100% identity; horizontal axis, samples M1-M6 and ALL; the legend lists the transcripts in the same order as Table 2.12.]

Figure 2.16. Universal defensin transcript expression determined by 454 amplicon deep sequencing. Relative murine α- and CRS-defensin transcript expression in six naïve C57BL/6J mice (M1-M6), determined by PCR amplification of small intestinal cDNA using 454-adapted universal defensin primers and 454 sequencing, expressed as a percentage of the total number of sequencing reads with 100% identity to each transcript. Defensin reference transcripts are denoted by the last five digits of their gene OTTIDs, which are preceded by OTTMUSG000000 in the Vega database genome browser. * and ** indicate genes annotated as duplications, which could be distinguished based on their 3’-UTR sequence. Duplicated genes that could not be differentiated are listed together. Table 2.12 lists the exact numbers and percentages of 454 sequencing reads for each transcript reference sequence. Each set of duplicated genes had identical mature peptide regions, which therefore could not be differentiated upon translation.

The 454 sequencing confirmed the gene expression of six out of the seven α-defensin peptides recently purified from the small intestine of C57BL/6J mice (personal communication, A. Ouellette and M. Shanahan). These included Crp3, Crp5, Crp20, Crp21, Crp23, and Crp24, which are coded for by genes Defcr3, Defcr5, Defcr20, Defcr21, Defcr23 and Defcr24, respectively. These genes comprise the most highly and moderately expressed transcripts identified through 454 sequencing, including the duplicated genes. One exception was the highly expressed transcript from Defcr22, for which the corresponding predicted peptide Crp22 was not identified by Ouellette and Shanahan.
However, the mature peptide regions of Crp21 and Crp22 differ by only one amino acid at position 11, which corresponds to position 68 in the prepropeptide. At this position Crp21 contains an asparagine (polar, neutral), whereas Crp22 contains a lysine (polar, positive), both of which have similar hydropathy indices (142). The peptides were separated by reverse-phase high performance liquid chromatography (RP-HPLC) with an increasing acetonitrile gradient, thus Crp21 and Crp22 may have eluted together. The defensin transcript data generated in this work confirm the genomic annotation for those genes detected, and their relative expression suggests that the polymorphisms observed between strains are dependent not only on their genomic content but also on the regulation of their transcription. None of the translated 454 sequences for any of the six mouse samples matched with 100% identity to the putative novel Crp27 peptide purified from C57BL/6J mice (Table 2.13). In contrast, the 454 analysis identified two other moderately expressed transcripts, OTTMUSG00000019742 (novelDef) and Defcr26 (OTTMUSG00000019889), which were not identified as peptides (personal communication, A. Ouellette and M. Shanahan).

Table 2.13. Blastx analysis of 454 transcripts against the mature Crp peptides. The amino acid sequences for the mature peptides purified from C57BL/6J small intestine were obtained from A. Ouellette and M. Shanahan. A database of the peptide sequences was created, against which the translated filtered 454 sequences (over 300 nt) for each sample (M1-M6) were blasted. The translated unfiltered sequences were also blasted against the Crp27 peptide sequence, but no matches were detected (not shown). The lengths of the mature peptides are as follows: Crp3, 35 a.a.; Crp5, 36 a.a.; Crp20, 42 a.a.; Crp21, 36 a.a.; Crp23, 35 a.a.; Crp24, 35 a.a.; Crp27, 35 a.a.
            Number of 100% Identity Reads
Peptide       M1     M2     M3     M4     M5     M6
Crp3         702    772    580    415    298    168
Crp5          40     48     62     75     41     60
Crp20       1858   1937    430    460    444    465
Crp21        107    104    174    322    292    288
Crp23        640    517    243    144    122     80
Crp24         21     19    179     76     52     15
Crp27          0      0      0      0      0      0
TOTAL       3368   3397   1668   1492   1249   1076

There are several possible explanations for these apparent discrepancies. It seems possible that the Defcr27 cDNA sequence is incorrect, since the GenBank accession is derived from expressed sequence tag evidence. Conversely, the predicted and actual mass spectrometry atomic mass units (A.M.U.) were not as close for Crp27 as for the other six peptides detected (personal communication, A. Ouellette and M. Shanahan). It is also possible that the Defcr27 transcript was produced at levels below the limits of detection of the 454 sequencing or, conversely, that the transcripts identified by 454 sequencing were produced as peptides but at quantities below the limits of detection by RP-HPLC. If the cDNA sequence is incorrect, then it is possible that the 454 sequencing results could aid in the identification of the correct sequence. The predicted molecular weights of the deduced mature peptides of OTTMUSG00000019742 (novelDef) and Defcr26 (OTTMUSG00000019889) are 4085.0 and 4150.1 A.M.U., respectively (143). The molecular weight of OTTMUSG00000019742 was similar to the predicted (4086.7 A.M.U.) and actual (4077.0 A.M.U.) values obtained for the putative novel Crp27 peptide (personal communication, A. Ouellette and M. Shanahan). Thus, OTTMUSG00000019742 could potentially be a novel α-defensin identified at the genomic, transcript and peptide levels.

2.3.6 The human α-defensin DEFA1 induces IL8 and IL10 release in vitro

DEFA1 induced the release of IL8 and TNF, as well as IL10, from human PBMCs (Figure 2.17A). This release was dose-dependent for IL10 and TNF (IL8 not tested) (Figure 2.17B).
The specificity of peptide action was verified by proteinase K digestion of DEFA1, which significantly reduced the TNF production by human PBMCs (Appendix A.4). Proteinase K digestion of LPS under the same conditions did not significantly reduce the TNF production, except at the highest concentration of proteinase K tested, which was likely due to excessive residual enzymatic activity in the tissue culture supernatant or to a contaminant in the proteinase K. Cytotoxicity of the peptide was not observed by WST-1 cellular proliferation assay, and no endotoxin contamination was detected by LAL assay (Hycult Biotechnology). Preliminary experiments suggested the involvement of NF-κB and mitogen-activated protein kinase (MAPK) 14 signalling, as DEFA1-induced IL8, IL10 and TNF release from human PBMCs was inhibited by one hour pretreatment with 10 μM of the IκBα phosphorylation inhibitor, Bay 11-7085, or 10 μM of the MAPK14 inhibitor, SB208530 (Biomol International) (Appendix A.5).

[Figure 2.17: (A) bar chart of IL8 (pg/ml, left-hand axis) and IL10 or TNF (pg/ml, right-hand axis) for the Control, LL-37, DEFA1 and LPS treatments; (B) bar chart of IL10 and TNF (pg/ml) for Control, DEFA1 (5, 10 and 20 μg/ml) and LL-37 (5 and 20 μg/ml).]

Figure 2.17. The human α-defensin DEFA1 induced cytokine and chemokine release in vitro. Human PBMCs treated with DEFA1 (20 μg/ml) for 24 hours released IL8, IL10 and TNF into the cell culture supernatant, which was measured by ELISA (A). The quantity of IL8 produced is reported on the left-hand axis and the quantities of IL10 and TNF are reported on the right-hand axis. The production of IL10 and TNF, measured by ELISA, by PBMCs treated for 24 hours with 5-20 μg/ml DEFA1 was dose-dependent (B). The release of IL8 by PBMCs in a dose-dependent manner was not tested. LPS and LL-37 were included as controls.
Results are representative of two independent experiments for A and four for B, each with duplicate wells per treatment.

2.4  Discussion

Historically the organization of mouse α-defensin genes has been poorly defined, which is due in part to the difficulties in assembling the reference genome in these regions. Additionally, inconsistent and non-standardized naming of genes and peptides has been an issue between research groups, journals and genome browsers, and is most likely due to a combination of factors. Defensins were first discovered as peptides or from cDNA. With the completion of large-scale genome sequencing projects, it has become possible to mine these genomes for defensin genes by scanning translated genomic sequence for the six conserved cysteine residues. Some defensins have only been identified at the genomic level, without subsequent peptide or RNA expression data. Searching the literature gives the impression that the naming of murine α-defensins is orderly and systematic, although searching major databases for the corresponding information is difficult. The problem arises because most of the experimental data comes from mouse strains that are different from the C57BL/6J reference genome. I have found evidence to confirm that there are strain and possibly CNV differences within the defensin gene family, and therefore annotating and mapping genes and peptides discovered in non-reference strains remains a challenge. Since defensin genes are subject to copy number variation and show polymorphisms between strains, a one-to-one orthology between species is hard to predict. Prior to this analysis, there was no reliable reference gene set available for the defensin genes of mouse strain C57BL/6J, demonstrating that manual intervention is still critical for the annotation of complex gene families and heavily duplicated regions.
This manual annotation of the defensin gene cluster enabled the identification of novel α- and CRS-defensin genes, and pseudogenes. Comparison of the mouse α-defensins in the three main mouse reference gene sets Ensembl, MGI, and NCBI RefSeq revealed significant inconsistencies in annotation and nomenclature. Accurate gene annotation is facilitated by the annotation of pseudogenes and regulatory elements. The manually curated gene models described here will be incorporated into the Ensembl and Consensus Coding Sequence (CCDS) reference sets. This work has highlighted the need to establish a standardized defensin nomenclature system, applicable to all organisms, which can then be implemented by the major genome centres, genome browsers and journals. For mouse, human and many other species, the abbreviations DEFB/Defb and DEFA/Defa have been used to tag β- and α-defensins, respectively. Both the HUGO (Human Genome Organisation) Gene Nomenclature Committee (HGNC) and Mouse Genomic Nomenclature Committee (MGNC) groups have approved the DEFB/Defb and DEFA/Defa naming schemes for human and mouse (144). This system is preferable to taxonomy-based naming (Table 2.14), which only complicates matters further.

Table 2.14. Species-specific defensin nomenclature. Examples of the diverse names and symbols historically used to describe defensins within several species including mouse, most of which can be found in current literature. n/a, no symbol associated with name.
Species      Name                                          Symbol
human        human neutrophil peptide                      HNP-
             human defensin                                HD-
             alpha-defensin                                DEFA-
             human beta-defensin                           HBD-
             beta-defensin                                 DEFB-
             corticostatin                                 CS
rabbit       neutrophil peptide                            NP-
             microbicidal cationic protein                 MCP-
             corticostatin                                 CS
             "rabbit kidney"                               RK-
mouse        crypi                                         crypi
             cryptidin                                     Cryp-
             cryptdin                                      Cryp-
             defensin-related cryptdin                     Defcr-
             cryptdin-related sequence                     CRS-
             defensin-related cryptdin-related sequence    Defcr-rs-
             alpha defensin                                Defa-
             beta defensin                                 Defb-
rat          corticostatin                                 CS
             rat neutrophil peptide                        RatNP-
             "rat peptide from bone marrow"                R-
             alpha-defensin                                Defa-
             beta-defensin                                 Defb-
guinea pig   guinea pig neutrophil peptide                 GPNP-
             guinea pig cationic peptide                   GNCP-
             corticostatin                                 CS
cow          tracheal antimicrobial peptide                TAP
             neutrophil beta-defensin                      BNBD-/BNDB-
             beta-defensin                                 DEFB-/DB-/BD-
             lingual antimicrobial peptide                 LAP
             bovine beta-defensin                          BBD-
chicken      chicken heterophil peptide                    CHP1/AMP1_CHICK
             gallinacin                                    GAL-/GLL-
             beta-defensin                                 n/a
             avian beta-defensin                           AvBD-
ostrich      ostricacins                                   OSP-
             ostrich gallinacin                            n/a
             ostrich avian beta-defensin                   ostrich AvBD-
turkey       turkey heterophil peptide                     THP-
             turkey beta-defensin                          n/a
             turkey avian beta-defensin                    turkey AvBD-

If possible, any naming scheme should encompass copy number variants and define relationships between peptides in different strains of mice (e.g. Crp4 and Defcr20/Defcr21). Suggestions for a clearer and unified naming scheme have been submitted by us to MGNC, and are being discussed by both HGNC and the Rat Genome Nomenclature Committee (RGNC). MGNC has started to assign symbols to most defensin pseudogenes identified here and has implemented some of the proposed nomenclature changes (Tables 2.3 and 2.4), e.g. the Defcr root has been changed to Defa for mouse α-defensins and defensin-related cryptdin sequences.
Elucidation of the genomic structure of this complex gene cluster on the mouse reference sequence, and adoption of a clear and unambiguous naming scheme, will provide a valuable tool to support studies on the evolution, regulatory mechanisms and biological functions of defensins in vivo. The human α- and β-defensin genes are subject to copy number variation. Evolution, duplication and allelic variation of defensin genes are currently under investigation (145-147). Since many diseases and disorders appear to be associated with copy number variation, the review and standardization of CNV nomenclature is critical to future studies. Several analyses have shown that large-scale copy number polymorphisms are a major source of genetic variation (148-150). One of these polymorphisms involves the human β-defensin cluster on 8p23.1. Whereas carriers of a euchromatic variant that is cytogenetically visible have nine to twelve copies of the region (151, 152), most people have two to seven copies (145). Correlation of β-defensin copy number with expression levels suggests that variable expression levels could cause different predisposition and susceptibility to infectious diseases. A recent genetic mapping approach confirms two distinct β-defensin CNV loci, approximately 5 Mb apart on human Chromosome band 8p23.1 (146). The authors state that this contradicts the current genome assembly. As a follow-up, we analyzed the region surrounding the genomic coordinates indicated in the aforementioned study and found five known β-defensin genes (126) and one β-defensin pseudogene (two copies of Defb130, Defb134, Defb136 and Defb137, data not shown). The relationship between this cluster and the duplicated CNV region is unclear. In a study analyzing the expression levels of human α-defensins DEFA1 and DEFA3, the relative proportions of DEFA1:DEFA3 mRNA were associated with the respective gene copy numbers (153).
However, combined levels of DEFA1 and DEFA3 were not correlated with gene copy number, and distinct transcription factor regulation was suggested to explain the differential expression of these genes (153). Conversely, total DEFA1 and DEFA3 copy numbers were correlated with the amount of DEFA1-3 peptides purified from human neutrophils (147), which might explain the apparent constitutive nature of human neutrophil α-defensin expression. Total DEFA1 and DEFA3 copy numbers have been reported to vary between 4 and 11 in 111 individuals (153) and between 5 and 14 in 27 individuals (147), whilst the DEFA3 allele was absent in 10% (153) or 26% (147) of the individuals tested. The proportion of individuals lacking DEFA3 appears to be population-dependent and has been shown to vary between 10% and 37% (154). It has been speculated that populations with distinct ecological histories carry different defensin gene copy numbers derived from the selective pressures present in their historical geographic regions (147). However, a direct correlation between the copy number variation in the defensin region and geographic origin has not been established. Detection of copy number variation and structural variation in mouse, similar to what has been observed in human, has proven to be difficult because of the limited sequence information on various mouse strains as well as the large gap in Chromosome 8. There has also been speculation that the current assembly may be collapsed due to its highly repetitive nature, and the putative tandem repeat of Contig AC152164.14 is one such example. Current mouse tile-path arrays do not have the capability to resolve individual gene copy numbers and their design is limited by the current mouse genome assembly. Genes that have been annotated as 100% identical are obvious candidates for copy number variants, but sequence information from additional mouse strains is needed for verification.
Among the existing α-defensins, we have observed apparent polymorphisms in different mouse strains. A project currently being undertaken at the Wellcome Trust Sanger Institute aims to sequence the genomes of 17 common mouse strains. However, it is reasonable to assume that there will be difficulties in assembling Chromosome 8 proximal to α-defensin gene regions. Nevertheless, comparisons between sequence available for these strains will start to define the copy number and polymorphic variation of mouse α-defensins. The difficulties in characterizing mouse α-defensins have hindered exploration of their in vivo function. Designing unique vectors for individual knockouts of α-defensin genes is impossible. However, in order to circumvent the potential redundancy in defensin function in vivo, I explored the rationale for generating an α-defensin deficient mouse. The Mutagenic Insertion and Chromosome Engineering Resource (MICER) utilizes hypoxanthine/aminopterin/thymidine (HAT) resistance mediated by hypoxanthine guanine phosphoribosyl transferase (Hprt) for murine embryonic stem cell gene targeting (155). Chromosomal deletions can also be engineered; however, two rounds of microinjection with the MICER targeting vectors are required: one vector contains exons 1-2 of Hprt, a loxP site, and a neomycin resistance cassette, and the second vector contains exons 3-9 of Hprt, a loxP site (in the same orientation as the first), and a puromycin resistance cassette. Cre-mediated recombination between the loxP sites allows for deletion of the intervening DNA. Serendipitously, there are MICER clones in the correct location and orientation to delete the entire α-defensin cluster (~0.7 Mb). However, the need for multiple targeting events and, even if these were successful, the low frequency of the deletion event made this approach impractical.
Additionally, until the assembly issues have been addressed, manipulation of this region of the genome is likely to come under a lot of scrutiny, as the exact complement of the α-defensin genes within the mouse reference genome is still unknown. Therefore it was decided that any attempts to generate an α-defensin deficient mouse would not be prudent at this time. At present only a small fraction of defensin genes have peptide products that have been purified from any strain of laboratory mouse. The small size, redundancy and variable expression levels of these peptides may be the reasons for the difficulty in isolating them. Transcriptional profiling enables differentiation between those genes expressed and those for which peptides have been detected. Traditional cloning and capillary sequencing methods have provided solid evidence of murine α- and CRS-defensin expression in strains other than C57BL/6J (20-22, 25-27, 88, 127, 156), although the antimicrobial nature of defensins is likely to bias this method. Single nucleotide polymorphisms were detected in the capillary sequences obtained here that have 98-99.9% identity to reference sequences. Additionally, capillary sequencing only allows for presence/absence determination of gene expression or at most a very rough estimate of transcript numbers. In-depth analyses of α- or CRS-defensin transcription have not been previously performed in any mouse strain. This is therefore the first study to provide quantitative gene expression data. The 454 sequencing analysis was performed in order to validate the genomic annotation, especially for novel genes, as well as to quantify the relative numbers of transcripts produced. The results confirmed what is known about the expression levels of defensin peptides in strains of mice other than C57BL/6J, including Swiss Outbred, 129 and C3H (28, 29, 88, 156), and also give a rational explanation for the apparent discrepancy between genomic content and peptide repertoire.
The control and degree of defensin gene expression appear crucial to their detection as peptides. The annotation of TATA boxes provides the starting point for promoter analysis; however, this does not appear to be a main regulatory factor, as the transcripts expressed in high quantities had either strong or weak TATA boxes. There is a trend, however, that transcripts expressed in low amounts or those not detected did not have a TATA box. This is likely due to the rapid expansion and gene duplication events that have occurred in mouse defensins, leading to loss or gain of regulatory elements. A more in-depth analysis of the defensin promoter regions is required, as well as experimental validation of those transcription factors involved in defensin gene expression. Copy number variation also appears to be a defining factor in the detection of the defensin genes as peptides. However, 454 sequencing has also shown that CNV does not necessarily correlate with peptide expression levels. CNV has been implicated in many human diseases, and defensin polymorphisms could indeed play a significant role in immunity and disease susceptibility. However, a better understanding of the regulation of defensins, especially with respect to CNV, is needed in order to address their role in disease or disease susceptibility. The Blastx analysis showed that a higher proportion of the filtered reads matched known peptides than matched the known transcripts (Table 2.13). This is not surprising when considering that the peptide alignment was less than a third of the length of the full-length transcript alignment, as only the sequence for the mature peptide region, and not the full-length pro-peptide, has been reported (personal communication, A. Ouellette and M. Shanahan).
This indicates that the high proportion of less-than-100% matches in the filtered sequences could be the result of base-calling errors, which are inherent in 454 sequencing of those sequences with homopolymeric regions. The identification of novel variants amongst the non-100% matches remains to be determined. Regardless, 454 sequencing appeared to be the most appropriate of the next generation sequencing technologies for α- and CRS-defensins. The reads obtained from Illumina sequencing are not long enough to differentiate the defensin transcripts, due to their high similarity, while the very large number of reads obtained by 454 sequencing allowed for relative quantification of the α- and CRS-defensins in the C57BL/6J mouse. In vitro tissue culture experiments revealed that DEFA1 induces IL8, IL10 and TNF release from human PBMCs. DEFA1 may mediate cytokine production through the activation of NF-κB signalling pathways. Similarly, in response to DEFA1, human bronchial epithelial cells produce IL1B and IL8, stimulate IL8 release, and promote NF-κB-DNA binding (157). Both pro- and anti-inflammatory functions have been proposed for DEFA1 due to its ability to selectively induce cytokine production, chemoattract monocytes, naïve T cells, and immature dendritic cells, and act as an adjuvant, but also to inhibit the release of IL1B from LPS-treated human monocytes, and inhibit complement activation (42). These experiments provided evidence for the function of DEFA1. However, the in vivo role of DEFA1, and indeed other defensins, needs further elucidation. Due to the potential of host defence peptides, including defensins, as novel therapeutics against infection (158), in vivo models need to be developed in order to test their function amidst the sea of peptide and cytokine redundancy.
The development of knock-in mice expressing IL8 or Il10 will allow for the testing of peptide mechanism in the treatment of infection under conditions of inflammation or immunosuppression.

3  TARGETED KNOCK-IN OF INTERLEUKIN 8 AND INTERLEUKIN 10 INTO MURINE EMBRYONIC STEM CELLS TO GENERATE MICE WITH CONDITIONAL INTESTINAL-SPECIFIC GENE EXPRESSION

3.1  Introduction

Many different stimuli induce IL8 production in vitro from human primary cells or cell lines, and as such IL8 is a common output for inflammatory assays. Additionally, IL8 has been implicated in a number of human enteric diseases, in particular shigellosis, salmonellosis and chronic inflammation (e.g. Crohn's disease) (78, 159, 160), as well as in certain cancers due to its angiogenic and tumorigenic properties (46). However, the lack of Il8 in the common laboratory model organisms, mouse and rat, has made it difficult to study its role in vivo. In addition, the lack of Il8 is hypothesized to be one reason why models of intestinal disease do not necessarily mimic the pathology seen in the corresponding human disease, particularly with regard to the lack of neutrophil infiltration (78). Alternative models have been developed to address this issue, but their use has not been widespread. Guinea pigs, which express an IL8 orthologue, were the first rodents used to study infectious diseases, and for some diseases provide the most comparable model to the human disease (161). However, mice are most often used for experimentation due to their relatively low cost of maintenance and genetic tractability, as well as the large number of laboratory reagents (e.g. antibodies) and bioinformatics tools (e.g. genome sequence) available for their study. The completion of the mouse reference genome sequence and its annotation has made it possible to generate specific gene knock-outs in murine embryonic stem cells, which can then be used to generate gene deficient mice in a targeted manner (162).
There has been considerable interest in administering recombinant IL8 to mice in order to induce neutrophil infiltration and the pathology associated with intestinal bacterial disease (79), as well as in generating transgenic mice expressing human IL8 (163, 164). The human apolipoprotein (APO) E promoter/hepatocyte control region (HCR) enhancer, or the rat fatty acid binding protein (Fabp) promoter, were used to create mice expressing IL8 in the liver or intestine, respectively (163). However, for both constructs, considerable variability of gene expression was observed in different F1 heterozygous lines (163). Random transgene integration occurred for both constructs, and the number of insertions likely varied between the F1 lines, since insertion of a similar APOE/HCR construct resulted in 1-90 copies of the transgene (165). Additionally, there was no external control over the expression of IL8 in either of these constructs, and while constitutive expression resulted in IL8 in the serum and subsequently higher numbers of neutrophils, further neutrophil movement was inhibited (163), potentially due to receptor desensitization. Another IL8 transgenic mouse has been generated, by random transgene insertion, whereby IL8 is expressed in the distal ileum, caecum, and proximal colon of the intestine under the control of the rat Fabp promoter and the reverse tetracycline transactivator (164). IL8 expression can therefore be induced using doxycycline, although expression does not occur throughout the length of the intestine. This model is not appropriate for bacterial infection studies, as doxycycline is likely to affect the normal flora of the mouse as well as the course of infection (166).
Additionally, all of these models were created by random insertion of an engineered transgene; therefore genetic elements in proximity to the site(s) of insertion could affect expression of the transgene, and conversely insertion of the transgene could disrupt expression of endogenous genetic elements by insertional mutagenesis (166). Mice deficient in either the Cxcr1 or Cxcr2 genes have been generated. There have been numerous studies using the Cxcr2 deficient mouse; however, there have been only limited studies on the Cxcr1 deficient mouse, and these suggest that homozygous mutants do not show any abnormal phenotype (167). In general, Cxcr2 deficient mice have decreased neutrophil recruitment, which is partially tissue- and time-dependent, but cell numbers vary according to the type of inflammation, whether it be induced by infection or chemical (52). In some cases neutrophil chemotaxis is almost completely abrogated, which would suggest that Cxcr2 is solely responsible for neutrophil chemotaxis under certain conditions (52). Murine neutrophils are capable of in vivo chemotaxis following administration of recombinant IL8 (79), indicating sufficient binding of Cxcr1 or Cxcr2 to IL8, or conversely indicating that in vivo cross-talk occurs between Cxcr1 and Cxcr2 upon IL8 engagement. These experiments provide a way of determining the in vivo function of Cxcr1 and Cxcr2, which may be extrapolated in part to define that of IL8. However, the fact that both Cxcr1 and Cxcr2 bind other ligands, in addition to IL8, makes this an indirect method for determining specific ligand function as a whole. The importance of IL10 in the regulation of the immune system becomes apparent upon loss of the gene product in various knockout mouse models. Il10-deficient mice, created either by gene knockout or administration of recombinant anti-Il10 antibodies, are generally more resistant to bacterial infection, especially intracellular bacteria (63, 168).
However, these mice are also prone to excessive immune stimulation and can develop spontaneous colitis, which is mediated largely by CD4+ T cells (169). Interestingly, the Il10 produced by some CD4+ T cell subsets, Th1 and Treg, inhibits activity of these cells as well as macrophages and DCs (64, 170). The normal flora of the mice also plays an important role, as spontaneous colitis does not develop in germ-free Il10 deficient mice (171). IL10 is also critical for tolerance to self, which if broken, can lead to other autoimmune disorders (172). The recognition of and differentiation between self, commensal organism and pathogen is crucial to mount the proper immune response and limit damage to the host. A transgenic Il10-Foxp3 reporter mouse with CD90.1 and green fluorescent protein (GFP) knocked-in at the endogenous Il10 and Foxp3 loci, respectively, has been developed to monitor Treg differentiation in vivo (173). The lung and liver have Tregs expressing Foxp3 but not Il10, whereas the small intestine has Treg expression of Il10 but not Foxp3 and the large intestine has Tregs expressing both Il10 and Foxp3 (173). Tregs can develop in the absence of Il10 in vivo, which contrasts with many in vitro studies (173); therefore it is important to develop models with which to study Il10 function. Although Il10 production was disrupted in the above described reporter mouse, another Il10 reporter mouse has been generated wherein GFP was knocked-in downstream of Il10 (174). This model was used to elegantly show Il10 production by intestinal lymphocytes in the small and large intestine following T cell receptor (TCR) stimulation (174). Il10 production therefore occurs normally in vivo, which is advantageous for the study of its induction. However, study of its function following such induction is not as straightforward.
A transgenic model in which Il10 is expressed only in the intestinal tract was generated by introducing the rat Fabp intestinal (Fabpi) promoter upstream of endogenous Il10 and then crossing with an Il10 deficient mouse (175). Constitutive Il10 expression was expected in enterocytes from the small intestine and the proximal region of the large intestine, although expression could only be demonstrated in the small intestine (175). Regardless, there was an increase in CD3+ T cells and CD4+CD25+ Treg cells, and a decrease in tumor necrosis factor (Tnf) and interferon-γ (Ifng) production, but increased transforming growth factor-β1 (Tgfb1) production (175). Importantly, the authors noted that lymphocyte populations in the spleen were not affected, indicating that Il10 acted locally in the small intestine (175). A caecal ligation and puncture model of sepsis demonstrated improved survival in the case of these Il10 transgenic mice. However, most physiological parameters tested were equivalent for both transgenic and wild type mice, except that the transgenic mice had lower Ccl2 and Il6 production from splenocytes stimulated with anti-CD3/CD28 but not LPS, and lower numbers of peripheral neutrophils (176). Critically, this Il10 transgenic model was limited by the constitutive expression of Il10 in the small intestine, which could affect cellular differentiation and normal flora composition throughout development and in subsequent experimentation. The existing knock-in mutant IL8 or Il10 mice have been very informative in investigations regarding the function of these important cytokines and, particularly with IL8, instrumental in showing proof of principle that the expression of IL8 results in neutrophil influx in vivo. The proposed critical involvement of chemokines and Il10 in the in vivo action of immunomodulatory host defence peptides, including defensins, would make these models a useful adjunct to attempts to define mechanism in vivo.
However, these models have significant limitations. For this reason, it was decided here to build on these initial studies to develop new transgenic mice expressing either IL8 or Il10 in a controllable manner throughout the intestinal tract. Figure 3.1 depicts the targeting strategy employed in this work.

[Figure 3.1 schematic. Panels: A, genomic Gpa33 locus; B, pA33LSL targeting vector; C, recombined genomic Gpa33 locus (after homologous recombination); D, recombined and deleted genomic Gpa33 locus (after Cre-mediated deletion). Labelled features: exons 2-7, STOP, 3’-UTR, loxP, IRES, NeoR, pA, IL8 or Il10 insert, Southern blot probe, and the AscI, XhoI, NotI and BsaBI restriction sites.]

Figure 3.1. pA33LSL targeting vector and genomic locus of recombination. The targeting strategy employed for the generation of mice with conditional, intestinal-specific IL8 or Il10 expression [reproduced and modified with permission from (177)]. The murine Gpa33 genomic locus (A) is shown from exon 2 to the 3’-UTR. The BsaBI-linearized pA33LSL targeting vector (B) contains the 5’ and 3’ arms of homology with intervening loxP flanked neomycin resistance cassette (NeoR) and internal ribosome entry site (IRES) to ensure distinct translation from the Gpa33 mRNA. An additional IRES was cloned into pA33LSL (David Adams, WTSI) to allow direct cloning of the desired insert. Cloning of the IL8 and Il10 inserts into pA33LSL produced the final targeting vectors used for transfection of JM8.N4 murine embryonic stem cells. Homologous recombination resulted in the creation of either IL8 or Il10 conditional alleles (C) and Cre-mediated deletion of the neomycin resistance cassette (D) allows for bicistronic Gpa33-IL8 or Gpa33-Il10 transcription and subsequent translation of the IL8 or Il10.
Restriction enzyme sites for insert cloning, AscI/XhoI, and Southern blot confirmation of the targeted locus, BsaBI, are indicated. The Southern blot radiolabelled probe would bind to the region boxed in black; the probe position allowed for discrimination between wildtype and recombined alleles.

The pA33LSL vector targets the murine glycoprotein A33 (transmembrane) (Gpa33) gene (177) located on Chromosome 1: 168,060,548-168,096,640 (178). Gpa33 is a transmembrane glycoprotein expressed during development in the inner cell mass of the blastocyst and in the endoderm cell layer (179). Strong expression of Gpa33 occurs throughout both the small and large intestine, in the crypts and villi, and elsewhere a very small amount of expression has been detected from stomach and bladder epithelial cells; Gpa33 has been designated a marker for the basolateral intestinal epithelium (179-181). Other intestinal-specific genes, in particular intestinal and liver Fabp, are differentially expressed throughout the small and large intestine as well as in the crypts and villi (181), which is likely the reason for variable expression in the IL8 transgenic mice previously described. Expression of IL8 or Il10 under control of the Gpa33 promoter is therefore advantageous in the intestinal-specific models developed here. The function of Gpa33 has not been well characterized but it is a member of the immunoglobulin superfamily (179), and as such may play a role in the immune system. Mice deficient in Gpa33 lose the ability to repair chemically-induced intestinal epithelial damage, which is concurrent with decreased cellular infiltration of polymorphonuclear cells despite similar mononuclear cell migration (182). It is therefore important that the pA33LSL targeting vector does not disrupt the Gpa33 gene upon homologous recombination (177), as this would severely hamper any infection studies carried out with the resulting mice.
It was necessary to use the same methodology in the generation of both mouse strains to permit direct comparison of experiments involving them. In addition to probing the role of IL8 and Il10 in vivo, these mice will provide models of hyper- and hypo-inflammation, respectively, for testing potential therapeutics.

3.2 Methods and Materials

3.2.1 Human interleukin-8 and mouse interleukin-10 cDNA source

The human IL8 TrueClone™ cDNA plasmid (OriGene, Rockville, MD, USA; catalog number TC119807) contains human IL8 cDNA within the multiple cloning site (MCS) between the EcoRI and SalI restriction sites, as well as an ampicillin resistance gene (β-lactamase) for selection. The plasmid, supplied as 1 mg lyophilized pCMV6-XL5 vector, was resuspended in 15 μl nuclease-free water (NF-H2O) (Ambion/ Applied Biosystems) to give a concentration of 66 ng/μl, as per the manufacturer’s instructions. See Appendix B.1 for the vector map of pCMV6-XL5 (a), pCMV6-XL5 MCS for IL8 insertion (b) and the IL8 NCBI reference cDNA sequence NM_000584.2 with the coding sequence indicated (c). The mouse cytokine plasmid pUMVC3-mIL10 (Aldevron, Fargo, ND, USA; catalog number 4032) contains mouse Il10 cDNA, cloned into the MCS of pUMVC3 between SalI and BamHI, as well as a kanamycin resistance gene (aminoglycoside 3'-phosphotransferase) for selection. The plasmid, supplied at 3.1 mg/ml in TE buffer, was diluted to a working stock of 10 ng/μl in NF-H2O. See Appendix B.2 for the vector map of pUMVC3-mIL10 (a) and the Il10 NCBI reference cDNA sequence NM_010548.1 with the coding sequence indicated (b).

3.2.2 Primer design to engineer IL8 and Il10 inserts for directional cloning and expression detection

Primers were designed to amplify the IL8 and Il10 coding sequence from their respective cDNA plasmids, incorporating a 5’ FLAG-tag as well as two flanking restriction enzyme sites.
AscI (GGCGCGCC) and XbaI (TCTAGA) restriction sites were added at the 5’ end, and XhoI (CTCGAG) and AscI restriction sites were added at the 3’ end of the coding sequence. The 24 base pair (bp) FLAG-tag sequence, 5’-GACTACAAGGATGACGATGACAAG-3’, corresponding to the N-DYKDDDDK-C protein sequence (183), was preceded by a start codon (ATG). The final engineered coding sequences are referred to as IL8 and Il10 inserts. Primer sequences for IL8F-Flag and IL8R-Flag, used to PCR amplify the IL8 insert, can be found in Table 3.1. Amplification of the Il10 insert was optimized using the forward primer, IL10F-Flag, paired with five different reverse primers, IL10R-Flag2-6; all primer sequences for Il10 insert amplification can be found in Table 3.1. The initial IL10R-Flag1 primer was redesigned, as the original sequence obtained from Vega was shorter than that in NCBI’s RefSeq; the Il10 sequence was amended in the subsequent Vega (v.29) release. Primer design took into account GC content (50-60%), presence of 3’ GC handles (G, C, GC or CG at 3’ end) and nearest neighbour melting temperature (Tm), and eliminated repetitive sequences or bases, if possible. An additional base (A) was added to the 5’ end of each primer to account for any defect in primer synthesis due to their unusual length. For each primer, the Tm and potential for secondary structure formation were calculated using OligoCalc (119), and specificity against the Mus musculus reference assembly genome was verified using NCBI Primer-BLAST (184), with default parameters. Lyophilized primers (Sigma-Genosys) were reconstituted at 100 μM and diluted to a working stock of 10 μM in NF-H2O. Annealing temperatures were calculated by subtracting 5°C from the Tm, and then optimizing accordingly, if necessary.

Table 3.1. Primer sequences for the generation and confirmation of IL8 and Il10 knock-in mice at the Gpa33 genomic locus.
* denotes the original reverse primer designed against the Il10 transcript downloaded from Vega. A longer transcript, which is in agreement with Ensembl v.48, was subsequently downloaded from the Vega v.29 update, therefore IL10R-Flag2-6 were designed. ** denotes these primers and their sequences were provided by the WTSI core sequencing facility.

Primer Name      Sequence (5'-3')
IL8F-Flag        AGGCGCGCCTCTAGAATGGACTACAAGGATGACGATGACAAGATGACTTCCAAGCTGGCCG
IL8R-Flag        AGGCGCGCCCTCGAGTTATGAATTCTCAGCCCTCTTC
IL10F-Flag       AGGCGCGCCTCTAGAATGGACTACAAGGATGACGATGACAAGATGCCTGGCTCAGCACTGC
IL10R-Flag1 *    AGGCGCGCCCTCGAGGCCTGGGGCATCACTTCTACCAGG
IL10R-Flag2      AGGCGCGCCCTCGAGTTAGCTTTTCATTTTGATCATCATGTATGCTTC
IL10R-Flag3      AGGCGCGCCCTCGAGTTAGCTTTTCATTTTGATCATCATGTATGC
IL10R-Flag4      AGGCGCGCCCTCGAGTTAGCTTTTCATTTTGATCATCATG
IL10R-Flag5      AGGCGCGCCCTCGAGTTAGCTTTTCATTTTGATC
IL10R-Flag6      AGGCGCGCCCTCGAGTTAGCTTTTCATTTTG
M13F **          TGTAAAACGACGGCCAGT
M13R **          CAGGAAACAGCTATGACC
A33F             GGACGTGGTTTTCCTTTGAA
A33R             TTTCATTGGAAAGGCTGGTC
IL10-1           GGCCATGCTTCTCTGCCT
IL10-2           CCAGCTGGACAACATACTGCT
IL8-5            GCAGCTCTGTGTGAAGGT
IL8-6            CCTTGGGGTCCAGACAGA
IL8-7            CGCGCCTCTAGAATGGACTAC
IL8-8            GTTATGAATTCTCAGCCCTCTTC
IL8-9            GTGCAGTTTTGCCAAGGAGT
IL8-10           TAATTTCTGTGTTGGCGCAG
A33p-F2          CAGTGTGACCTTGACATGGG
A33p-R2          CTGCATCTTGAAAGGCAACA
NeoF             CTGAATGAACTGCAGGACGA
NeoR             AATATCACGGGTAGCCAACG
A33-E7-F         AGGATCGGTGGAGCTCGGGG
A33-3U-R         AGGCCCTGCCAGCCGTAAGA

3.2.3 PCR amplification of IL8 and Il10 cDNA plasmids to incorporate the FLAG-tag and restriction sites

The IL8 and Il10 inserts were amplified from pCMV6-XL5 and pUMVC3-mIL10, respectively, using the Expand High Fidelity PCR System (Roche Applied Science), as per the manufacturer’s instructions. The final PCR reaction mix contained 1X Expand High Fidelity buffer containing 1.5 mM MgCl2, 200 μM each dNTP (PCR Nucleotide Mix 10 mM each dNTP, Roche Applied Science), 1 μM each forward and reverse primer, 2.6 U Expand High Fidelity enzyme mix and 20 ng template plasmid in a final volume of 50 μl.
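Before amplification, the tagged primers listed in Table 3.1 can be checked against the layout described in section 3.2.2. The following is a minimal sketch of that check, not the procedure used in this work: it rebuilds the forward and reverse primers from their parts (5’ A, AscI, XbaI or XhoI, ATG, FLAG-tag, gene-specific sequence) and translates the FLAG-tag. All function and variable names are illustrative, and the gene-specific portions are simply the remainders of the Table 3.1 sequences after the engineered 5’ elements.

```python
# Sequence elements as stated in section 3.2.2 (names are illustrative).
PAD = "A"                          # extra 5' base added to each primer
ASCI = "GGCGCGCC"                  # AscI recognition site
XBAI = "TCTAGA"                    # XbaI recognition site
XHOI = "CTCGAG"                    # XhoI recognition site
START = "ATG"                      # start codon preceding the FLAG-tag
FLAG = "GACTACAAGGATGACGATGACAAG"  # 24 bp FLAG-tag

# Only the four codons that occur in the FLAG-tag are needed here.
CODONS = {"GAC": "D", "GAT": "D", "TAC": "Y", "AAG": "K"}


def forward_primer(gene_specific):
    """5' A + AscI + XbaI + ATG + FLAG + gene-specific 5' coding sequence."""
    return PAD + ASCI + XBAI + START + FLAG + gene_specific


def reverse_primer(gene_specific_rc):
    """5' A + AscI + XhoI + reverse complement of the 3' coding sequence."""
    return PAD + ASCI + XHOI + gene_specific_rc


def translate(seq):
    """Translate a sequence made only of the codons in CODONS."""
    return "".join(CODONS[seq[i:i + 3]] for i in range(0, len(seq), 3))


def gc_percent(seq):
    """Percentage of G and C bases in a sequence."""
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


IL8F_FLAG = forward_primer("ATGACTTCCAAGCTGGCCG")
IL8R_FLAG = reverse_primer("TTATGAATTCTCAGCCCTCTTC")
IL10F_FLAG = forward_primer("ATGCCTGGCTCAGCACTGC")
IL10R_FLAG6 = reverse_primer("TTAGCTTTTCATTTTG")

if __name__ == "__main__":
    print(IL8F_FLAG)
    print(translate(FLAG))
```

Comparing the rebuilt strings with the table entries is a quick way to catch a dropped base in a restriction site before ordering or reusing a primer.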
Cycling conditions were as follows: an initial denaturation step at 95°C for 2 minutes, followed by 10 cycles of 94°C for 30 seconds, 65°C for 30 seconds, 72°C for 60 seconds, and then 20 cycles of 94°C for 30 seconds, 65°C for 30 seconds, 72°C for 60 seconds adding 5 seconds onto the extension for each successive cycle, and a final elongation at 72°C for 10 minutes. PCR amplification was performed using a DNA Engine Tetrad 2 Peltier Thermal Cycler (MJ Research). The IL8 PCR products (5 μl) were analyzed by agarose gel electrophoresis (0.5% agarose, 100 V, 60 min) to confirm successful amplification. Unless otherwise indicated, agarose gels contained ethidium bromide (EtBr) dispensed from a 10 mg/ml stock solution (Sigma-Aldrich) by dipping approximately 0.5 cm of the end of a pipette tip into the solution and then swirling in the melted and slightly cooled agarose. Electrophoresis was performed using the MGU-502T gel tank apparatus (C.B.S. Scientific, Co., CA, USA) in 1X Tris-Acetate-EDTA (40 mM Tris acetate, 1 mM EDTA, TAE) buffer pH 8.0, and 1X TAE agarose gels were visualized under ultraviolet (UV) light using the UVItec gel doc system and UVIproMV version 11.03 software (UVItec Limited, Cambridge, UK). The 1 Kb DNA ladder (0.5 μg in 1X BlueJuice® Gel Loading Buffer) (Invitrogen) was run in all agarose gels for size determination. The remaining PCR product (45 μl) was gel extracted using the TOPO® XL gel purification kit (Invitrogen), as per the manufacturer’s instructions. Samples prepared in 1X final crystal violet loading buffer were subjected to agarose gel electrophoresis (0.4% agarose containing 90 μl crystal violet, 90 V for 60 min, 60 V for 40 min). Gel slices from the two PCR products were pooled and the PCR product extracted using the QIAquick Gel Extraction Kit (Qiagen), eluting in a final volume of 50 μl Buffer EB (10 mM Tris-Cl, pH 8.5); the purified product (5 μl) was visualized by agarose gel electrophoresis (0.5% agarose, 110 V, 45 min).
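The PCR set-up described in this section (reaction mix and cycling program) involves some simple arithmetic that can be sketched as follows. This is an illustrative calculation only: it uses the stock concentrations given in the text (10 mM dNTP mix, 10 μM primer working stocks, 10 ng/μl Il10 plasmid working stock), and it assumes that the first of the 20 incremented cycles already carries one 5 second increment, which the text does not specify. Function names are hypothetical.

```python
def stock_volume_ul(final_conc, stock_conc, final_volume_ul):
    """Solve C1*V1 = C2*V2 for V1; both concentrations must share a unit."""
    return final_conc * final_volume_ul / stock_conc


REACTION_UL = 50.0

# 200 uM each dNTP from a 10 mM (10,000 uM) stock
dntp_ul = stock_volume_ul(200.0, 10_000.0, REACTION_UL)
# 1 uM each primer from a 10 uM working stock
primer_ul = stock_volume_ul(1.0, 10.0, REACTION_UL)
# 20 ng template from the 10 ng/ul Il10 plasmid working stock
il10_template_ul = 20.0 / 10.0


def extension_times_s(base_s=60, fixed_cycles=10, ramp_cycles=20, step_s=5):
    """Extension time (s) per cycle: fixed cycles, then incremented cycles.

    Assumption: the first incremented cycle is base_s + step_s.
    """
    fixed = [base_s] * fixed_cycles
    ramp = [base_s + step_s * (i + 1) for i in range(ramp_cycles)]
    return fixed + ramp


if __name__ == "__main__":
    times = extension_times_s()
    print(dntp_ul, primer_ul, il10_template_ul)
    print(len(times), times[-1], sum(times))
```

Under the stated assumption the last cycle has a 160 second extension; if the increment instead starts from the second incremented cycle, every incremented cycle is 5 seconds shorter.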
Small-scale centrifugation steps were carried out in a Micro 240A microfuge (Denville Scientific, Inc., NJ, USA). The Il10 PCR products (5 μl) were visualized by agarose gel electrophoresis (0.8% agarose, 1X TAE buffer, 100 V, 60 min) to confirm amplification. The IL10F-Flag and IL10R-Flag6 primers gave the cleanest PCR product, which was therefore purified using the QIAquick gel extraction kit, eluting in a final volume of 50 μl Buffer EB; the purified product (2.5 μl) was visualized by agarose gel electrophoresis (1% agarose, 80 V, 90 min).

3.2.4 Cloning of IL8 and Il10 PCR products into pCR®-BluntII-TOPO® and confirmation by PCR, restriction digestion and capillary sequencing

3.2.4.1 IL8 insert TOPO® cloning and confirmation

The purified IL8 PCR product was cloned into the pCR®-BluntII-TOPO® vector (Invitrogen) as per the manufacturer’s instructions; the ligated vector will be referred to as IL8_pCR-BluntII-TOPO, with registered trademark symbols assumed. Ligation reactions performed at room temperature for 20 minutes consisted of 1 μl dilute (1:4) salt solution, 1 μl pCR-BluntII-TOPO vector and either 1, 2 or 4 μl purified PCR product in a final volume of 6 μl made up with NF-H2O, if necessary; 2 μl of ligated product was then added to 50 μl One Shot® TOP10 Electrocomp™ E. coli cells (Invitrogen). The entire volume was added to a 0.1 mm cuvette (Cell Projects Limited, UK) and electroporated using the Bio-Rad Gene Pulser Xcell Electroporation System (Bio-Rad Laboratories, Inc.) with the pre-set E. coli bacterial electroporation program (25 μF, 200 Ω, 1800 V, exponential decay). For the recovery, 250 μl super optimal broth with catabolite repression (SOC, Invitrogen) was added and the entire volume transferred to a 15 ml Falcon tube and incubated at 37°C with shaking at 200 rpm, for 160 minutes.
A positive control (1 μl pUC19 plasmid, Invitrogen) was also electroporated under the same conditions to ensure TOP10 cells were transformation competent. The E. coli cells were diluted 1:10 in SOC and 100 μl each of the neat and diluted cells were spread onto two low salt Luria Bertani (LB-Luria) agar (1.5%) plates containing 50 μg/ml kanamycin (Gibco) for TOPO ligations, and 100 μg/ml ampicillin (Roche Applied Science) for pUC19, and incubated at 37°C overnight. Thirty of the resulting colonies were screened by colony PCR, using a rapid boil DNA extraction method. Briefly, each colony was added to 10 μl NF-H2O and heated for 10 minutes at 100°C. The final 28.5 μl reaction contained 5 μl of the boiled colony, 0.9X Platinum PCR Supermix (Invitrogen) and 175 nM each of IL8F-Flag and IL8R-Flag primers; amplification was performed as previously described, and the products (10 μl) visualized by agarose gel electrophoresis (0.5% agarose, 100 V, 60 min). Ten positive IL8_pCR-BluntII-TOPO clones (1-10) were each inoculated into 25 ml sterile universal tubes (Sarstedt) containing 5 ml LB-Luria broth with 50 μg/ml kanamycin and incubated overnight at 37°C with shaking at 200 rpm. Glycerol stocks (25%) were made by adding 500 μl of the overnight culture to 500 μl of 50% glycerol; high quality plasmid was extracted from the remainder of the overnight culture using the QIAprep® Miniprep kit (Qiagen), eluting in a final volume of 50 μl Buffer EB. The large-scale centrifugation step to pellet the overnight culture was carried out in a Sorvall Legend RT (Thermo Scientific Inc.) with rotor 6446. The plasmids were quantified using the Thermo Scientific NanoDrop™ ND-1000 Spectrophotometer (Thermo Fisher Scientific Inc.) as per the manufacturer’s instructions; unless otherwise indicated all concentrations were determined using the NanoDrop.
IL8_pCR-BluntII-TOPO plasmids were subjected to restriction digestion with either AscI or EcoRI (New England Biolabs, NEB) for verification of correct insert size. For each final 25 μl enzyme mix, 87-269 ng of plasmid (1 μl) was digested with 10 U of enzyme in the appropriate 1X NEB buffer at 37°C for 60 minutes, followed by 65°C for 20 minutes to heat inactivate the enzymes. Agarose gel electrophoresis (0.5% agarose, 100 V, 70 min) of 15 μl of each digest and 1 μl uncut plasmid for each IL8_pCR-BluntII-TOPO clone was performed to confirm the correct insert size for further sequence confirmation. Negative mastermix controls without plasmid template were also run to ensure no cross-sample contamination.

3.2.4.2 IL8 insert capillary sequencing

Capillary sequencing of the IL8 inserts was performed by the WTSI core sequencing facility using IL8F-Flag and IL8R-Flag primers, as well as standard M13F and M13R primers (Table 3.1) supplied by the facility. IL8_pCR-BluntII-TOPO-1-10 plasmids were submitted at 100 ng/μl, diluting where necessary with NF-H2O; IL8F-Flag and IL8R-Flag primers were submitted at 5 μM concentration. Capillary sequencing (Applied Biosystems 3730XL capillary sequencer) was performed using BigDye Terminator BDTv3.1 sequencing chemistry. The sequencing facility processed the sequence trace files using the sequencing production software Asp and then deposited those that passed quality control into a central repository. I performed further analyses of the sequences. For each IL8_pCR-BluntII-TOPO clone a file of filenames (fofn) was created to include all the names of the sequence traces from the four primers as well as the name of the text file containing the 355 bp sequence of the IL8 insert. The sequence traces for each clone and the IL8 insert sequence text file were assembled with the Genome Assembly Program, Gap4, using the normal shotgun assembly function with default parameters (185); a separate Gap4 database was created for each clone.
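The per-clone file of filenames described above is a plain text list with one file name per line. A minimal sketch of how such a file could be generated is shown below. The trace naming scheme (clone name, underscore, primer name, `.ab1` extension), the clone label and the reference file name are all hypothetical; the actual names assigned by the sequencing facility are not given here.

```python
from pathlib import Path


def write_fofn(clone, primers, reference_file, out_dir):
    """Write one file name per line: a trace per primer, then the reference.

    The '<clone>_<primer>.ab1' pattern is a hypothetical naming scheme.
    """
    names = [f"{clone}_{primer}.ab1" for primer in primers]
    names.append(reference_file)
    fofn_path = Path(out_dir) / f"{clone}.fofn"
    fofn_path.write_text("\n".join(names) + "\n")
    return fofn_path


if __name__ == "__main__":
    path = write_fofn(
        "IL8_clone10",                                  # hypothetical clone label
        ["IL8F-Flag", "IL8R-Flag", "M13F", "M13R"],     # the four primers
        "IL8_insert.txt",                               # hypothetical reference file
        ".",
    )
    print(path.read_text())
```

Keeping one list per clone mirrors the one-database-per-clone arrangement, so each assembly contains only the reads and reference that belong together.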
In each database, the IL8 insert sequence (5’-3’) was set as the reference. The assemblies were checked manually for sequence quality and the identification of any discrepancies between the trace sequences and the reference sequence. Any discrepancies between bases in the reference insert and those within the sequence traces were verified by two or more sequence reads or re-sequenced. The positive IL8_pCR-BluntII-TOPO-10 plasmid was cultured overnight at 37°C with shaking at 200 rpm in a 250 ml Erlenmeyer flask (Corning, Inc.) containing 100 ml LB-Luria broth and 50 μg/ml kanamycin. A larger-scale plasmid preparation was carried out using the Qiagen Midi Plasmid kit, following the manufacturer’s instructions. The final plasmid pellet was resuspended in 150 μl 10 mM Tris-Cl pH 8.0 and the concentration determined.

3.2.4.3 Il10 insert TOPO® cloning and confirmation

The purified Il10 PCR product was cloned into the pCR®-BluntII-TOPO® vector as per the manufacturer’s instructions; the ligated vector will be referred to as Il10_pCR-BluntII-TOPO, again with registered trademark symbols assumed. The pCR-BluntII-TOPO ligation reactions were set up using 1 μl and 3 μl volumes of Il10 PCR product in a final volume of 6 μl at room temperature for 15 minutes. A pCR-BluntII-TOPO alone control and Il10 PCR product alone control (3 μl) were also set up under the same conditions. One Shot® TOP10 Electrocomp™ E. coli cells were electroporated in a 0.1 mm cuvette using the Bio-Rad pre-set E. coli bacterial program, as previously described. The cells were allowed to recover in 250 μl SOC for 90 minutes at 37°C with shaking at 200 rpm, and then diluted 1:10 in SOC. The neat and diluted transformations (100 μl) were then spread onto LB-Luria agar plates containing 25 μg/ml kanamycin and 25 μg/ml zeocin (100 mg/ml stock, Invitrogen). A pUC19 control was also transformed, as previously described.
Fourteen isolated colonies were screened by colony PCR as previously described, except the colony was added straight to the reaction instead of pre-boiling. Final reaction conditions consisted of 1X Platinum PCR Supermix and 200 nM of each M13F and M13R primer in a volume of 25 μl with an annealing temperature of 55°C. The PCR products (5 μl) were visualized by agarose gel electrophoresis (1% agarose, 80 V, 90 min). The same colonies were also inoculated into 25 ml universal tubes containing 5 ml LB-Luria with 50 μg/ml kanamycin and 25 μg/ml zeocin and grown overnight at 37°C with shaking at 200 rpm; 25% glycerol stocks were prepared as previously described. Plasmids were extracted using the QIAprep Miniprep kit eluting in 50 μl Buffer EB and their concentrations determined. The Il10_pCR-BluntII-TOPO plasmids (5 μl 1:10 dilution; 56-169 ng) were digested with HindIII and XmnI in a final digest reaction of 25 μl consisting of 1X NEB buffer 2, 1X BSA and 20 U each of HindIII and XmnI (NEB). Digestion was performed for 4 hours at 37°C and fragments visualized by agarose gel electrophoresis (0.8% agarose, 100 V, 40 min); the entire 25 μl digest volume and 1 μl (111.5-338.06 ng) of uncut plasmid was loaded for each Il10_pCR-BluntII-TOPO clone.

3.2.4.4 Il10 insert capillary sequencing

Capillary sequencing of the Il10 inserts from PCR- and digest-positive plasmids was performed by the WTSI core sequencing facility, as previously described. Il10_pCR-BluntII-TOPO plasmids and the IL10F-Flag, IL10R-Flag6 primers were submitted at 100 ng/μl and 5 μM, respectively. M13F and M13R primers were supplied by the facility. Additional primers, IL10-1 and IL10-2 (Table 3.1), were designed in order to get full coverage of the Il10 insert. Samples were processed by the WTSI core facility and I assembled the sequences with the 592 bp Il10 insert sequence text file and performed subsequent analysis, as previously described.
The positive Il10_pCR-BluntII-TOPO-10 plasmid was cultured overnight at 37°C with shaking at 200 rpm in a 250 ml Erlenmeyer flask containing 100 ml LB-Luria broth, 50 μg/ml kanamycin and 25 μg/ml zeocin. A larger-scale plasmid preparation was carried out using the Qiagen Midi Plasmid kit, following the manufacturer’s instructions. The final plasmid pellet was resuspended in 300 μl Buffer EB and the concentration determined.

3.2.5 Digestion of the pA33LSL embryonic stem cell targeting vector for IL8 and Il10 insert ligation

The pA33LSL targeting vector (Figure 3.1) (177, 186), already transformed into E. coli, was kindly provided by David Adams (WTSI). The culture was streaked onto LB-Luria agar containing 100 μg/ml ampicillin and grown overnight at 37°C; a 25% glycerol stock was prepared. An isolated colony was inoculated into a 25 ml universal tube containing 5 ml LB-Luria broth and 100 μg/ml ampicillin and grown overnight at 37°C with shaking at 200 rpm. The pA33LSL plasmid was isolated using the QIAprep Miniprep kit eluting with 50 μl Buffer EB, and the concentration determined. Separate pA33LSL plasmid preparations were used for the IL8 and Il10 cloning reactions; inoculation into LB-Luria broth was always with a single colony from a streak plate of the glycerol stock. For the IL8 and Il10 ligations, 0.9-5 μg pA33LSL was digested for 4 hours at 37°C in a final volume of 50 μl containing 1X NEB buffer 4, 25 U of AscI (NEB) and 50 U of XhoI (NEB), then dephosphorylated for one hour at 37°C by adding 10 U calf intestinal alkaline phosphatase (NEB) directly to the reaction mix. Subsequently the digested pA33LSL was gel purified after agarose gel electrophoresis of the digest reaction (IL8 ligation: 1% agarose gel, 110 V, 40 min; Il10 ligation: 1% agarose, 90 V, 90 min) using the QIAquick gel extraction kit [as per the manufacturer’s instructions with the following modification: 750 μl QG buffer was used as the standard volume regardless of gel slice weight].
The final purified plasmids were eluted with 50 μl Buffer EB, and quantified.

3.2.6 Digestion of IL8_pCR-BluntII-TOPO and Il10_pCR-BluntII-TOPO for IL8 and Il10 insert ligation with the pA33LSL targeting vector

3.2.6.1 IL8_pCR-BluntII-TOPO digest

IL8_pCR-BluntII-TOPO-10 (50 μg) was digested for 4 hours at 37°C in a final volume of 75 μl containing 1X NEB buffer 4, 200 U AscI (NEB) and 400 U XhoI (NEB). The entire digest was subjected to agarose gel electrophoresis (2% agarose, 100 V, 75 min), and the 351 bp IL8_AscI_XhoI insert gel extracted, as previously described, eluting with 50 μl Buffer EB. The purified product was quantified, and 5 μl analyzed by agarose gel electrophoresis (1% agarose, 80 V, 90 min).

3.2.6.2 Il10_pCR-BluntII-TOPO digest

Il10_pCR-BluntII-TOPO-10 (5 μg) was digested for 4 hours at 37°C in a final volume of 50 μl containing 1X NEB buffer 4, 25 U AscI and 50 U XhoI. The entire digest was subjected to agarose gel electrophoresis (1% agarose, 90 V, 90 min) and the 588 bp Il10_AscI_XhoI insert gel extracted, as previously described, eluting in a final volume of 50 μl Buffer EB. The purified product was quantified, and 5 μl visualized by agarose gel electrophoresis (1% agarose, 90 V, 120 min).

3.2.7 Ligation of the AscI- and XhoI-digested pA33LSL vector with each of the AscI- and XhoI-digested IL8 and Il10 inserts

3.2.7.1 IL8_AscI_XhoI insert ligation

The ligation reaction for IL8 with pA33LSL consisted of 18 ng of digested pA33LSL, 46 ng digested IL8 insert, 1X T4 DNA ligase reaction buffer and 400 CEU (cohesive end unit) T4 DNA ligase (NEB) in a final volume of 10 μl, and was carried out at 4°C for 4 hours. Vector- and insert-only control ligations were also set up, using NF-H2O to make up the volume. Transformations were performed using 1 μl of each ligation and 20 μl ElectroMAX™ DH10B™ competent E. coli (Invitrogen) in a 0.1 mm cuvette using the Bio-Rad pre-set E. coli bacterial program, as previously described.
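The masses used in the IL8 ligation (18 ng vector, 46 ng of the 351 bp insert) can be converted to molar amounts to estimate the insert:vector ratio. The sketch below is an illustration under stated assumptions rather than a calculation reported in this work: it uses the common approximation of 650 g/mol per base pair of double-stranded DNA, and the length of the digested pA33LSL backbone is not given in this section, so it is left as a parameter that the reader must supply.

```python
AVG_BP_MASS = 650.0  # g/mol per base pair of dsDNA (approximation, not from the text)


def dsdna_pmol(mass_ng, length_bp):
    """Convert a mass of double-stranded DNA (ng) to picomoles."""
    return mass_ng * 1000.0 / (length_bp * AVG_BP_MASS)


def insert_to_vector_ratio(insert_ng, insert_bp, vector_ng, vector_bp):
    """Molar insert:vector ratio for a ligation.

    vector_bp must be supplied by the user; it is not stated in the text.
    """
    return dsdna_pmol(insert_ng, insert_bp) / dsdna_pmol(vector_ng, vector_bp)


if __name__ == "__main__":
    # 46 ng of the 351 bp IL8_AscI_XhoI insert
    print(round(dsdna_pmol(46.0, 351), 4))
```

Because the insert is much shorter than a typical targeting vector, equal masses correspond to a large molar excess of insert, which is why the vector length matters for interpreting the 18 ng to 46 ng proportions.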
One ml SOC was added to each transformation and 100 μl of undiluted culture plated out onto LB-Luria agar containing 100 μg/ml ampicillin and incubated overnight at 37°C. The rest of the transformation was incubated at room temperature overnight and the following day 200 μl was plated onto three LB-Luria agar plates containing 100 μg/ml ampicillin. Six individual colonies were analyzed for the correct ligation event; each colony was inoculated into a 25 ml universal tube containing 5 ml LB-Luria broth and 100 μg/ml ampicillin and grown overnight at 37°C with shaking at 200 rpm. Plasmids were isolated using the QIAprep Miniprep kit eluting with 50 μl Buffer EB, and their concentrations determined. The presence and size of the insert were confirmed by PCR amplification of approximately 20 ng plasmid with 1X Platinum PCR Supermix and 200 nM each of A33F and A33R primers (Table 3.1), which amplify the insert from within the pA33LSL vector, in a final volume of 25 μl. The products were visualized by agarose gel electrophoresis (1% agarose, 100 V, 55 min).

3.2.7.2 Il10_AscI_XhoI insert ligation

The ligation reaction for Il10 into pA33LSL was carried out at 4°C for 4 hours using 50 ng of digested pA33LSL, 50 ng digested Il10 insert, 1X T4 DNA ligase reaction buffer and 400 CEU (cohesive end unit) T4 DNA ligase in a final volume of 10 μl. Vector- and insert-only negative control ligations were also performed, making up the extra volume with NF-H2O. Each ligation was transformed into DH10B E. coli cells made competent in-house. Briefly, the 25% glycerol stock (made from ElectroMAX™ DH10B™ competent cells, Invitrogen) was streaked onto an LB-Luria agar plate containing 50 μg/ml streptomycin (Sigma-Aldrich), as this strain is streptomycin resistant due to a chromosomal mutation in rpsL150 (187), and incubated overnight at 37°C.
An isolated colony was inoculated into two 25 ml universal tubes each containing 5 ml LB-Luria broth and 50 μg/ml streptomycin and incubated overnight at 37°C with shaking at 200 rpm. The next day the 5 ml culture was added to 200 ml LB-Luria broth in a one litre Erlenmeyer flask (Corning, Inc.) and grown at 37°C with shaking at 200 rpm for approximately 2 hours until the E. coli reached mid-log phase (optical density approximately 0.5). The culture was split into four 50 ml Falcon tubes and put on ice. All remaining steps were carried out at 4°C and all reagents and the centrifuge were pre-chilled to 4°C. The tubes were centrifuged at 4000 rpm for 10 minutes. The supernatant was poured off and each pellet resuspended in 1 ml MilliQ H2O (Synthesis A10, Millipore) and then topped up to 50 ml with MilliQ H2O, and centrifuged at 4000 rpm for 10 minutes. The pellets were resuspended in 1 ml MilliQ H2O and transferred to 2 ml Eppendorf tubes. The tubes were centrifuged at 13000 rpm for 2 minutes and the cells washed five times with MilliQ H2O; the first wash was resuspended in 1 ml, the second in 0.5 ml (two tubes pooled), the third in 1 ml, the fourth in 0.5 ml (two tubes pooled) and the fifth in 1 ml, giving one tube containing 1 ml cells. After a final spin at 13000 rpm for 2 minutes, the cells were resuspended in 100 μl MilliQ H2O. Transformations were performed immediately using 50 μl competent cells and 1 μl ligation reaction in 0.1 mm cuvettes and the Bio-Rad pre-set E. coli bacterial program, as previously described. One ml SOC was added and the cells recovered at 37°C with shaking at 200 rpm for 2 hours. The cells were plated onto LB-Luria plates containing 100 μg/ml ampicillin, two with 100 μl neat culture and one with 200 μl neat culture, and incubated overnight at 37°C. One colony was tested using the rapid boil DNA extraction method with PCR amplification using IL10F-Flag and IL10R-Flag6 primers, as previously described.
Plasmids were then extracted from a further six colonies, using the QIAprep Miniprep kit, and inserts confirmed by PCR amplification using 1X Platinum PCR Supermix and A33F/ A33R and IL10F-Flag/ IL10R-Flag6 primer pairs, as previously described; products were visualized by agarose gel electrophoresis (0.8% agarose, 90 V, 45 min). 3.2.8 Capillary sequencing of the IL8_pA33LSL and Il10_pA33LSL targeting vectors  PCR-positive IL8_pA33LSL and Il10_pA33LSL plasmids were capillary sequenced by the WTSI core facility, as previously described, for confirmation of the insert sequences of the final murine embryonic stem cell (ESC) targeting vectors. Plasmids were submitted at 100 ng/μl and primers at 5 μM. IL8_pA33LSL plasmids were sequenced using IL8F-Flag and IL8R-Flag, as well as the additional IL8 insert-specific primers IL8-5, IL8-6, IL8-7, IL8-8, IL8-9 and IL8-10 (Table 3.1), and Il10_pA33LSL plasmids were sequenced using IL10F-Flag, IL10R-Flag6, IL10-1 and IL10-2 primers. The additional primers circumvented difficulties encountered in obtaining full sequence coverage of the inserts. I performed all subsequent sequence analyses using Gap4 assembly and trace viewing software, as previously described. 3.2.9 Linearization and preparation of the IL8 and Il10 final targeting vectors for embryonic stem cell transfection  Cultures of the positive IL8_pA33LSL-1 and Il10_pA33LSL-1 DH10B clones were grown overnight from single colonies at 37°C with 200 rpm shaking in one litre Erlenmeyer flasks containing 200 ml of LB-Luria broth and 100 μg/ml ampicillin. Plasmids were isolated using the Qiagen Plasmid Midi kit, following the manufacturer’s instructions, and plasmid pellets resuspended in 100 μl NF-H2O. The concentration of a 1:10 dilution of each plasmid was determined, from which the concentration of the undiluted plasmid was extrapolated. Plasmid DNA was prepared prior to ESC electroporation by linearization with SalI restriction enzyme (NEB). 
In a final volume of 100 μl, 100 μg of vector was digested with 200 U of SalI in 1X NEB buffer 3 and 1X BSA at 37°C overnight, after which the enzyme was inactivated at 65°C for 20 minutes. The linear vectors were precipitated by adding 2.5 times the volume of 100% EtOH (250 μl), vortexing and incubating on ice for 5 minutes. The precipitate was centrifuged at 14000 rpm for two minutes at room temperature and the supernatant discarded. The resulting pellet was washed twice with 70% EtOH (750 μl each), centrifuged at 14000 rpm for two minutes between washings and the supernatant removed. The pellets were resuspended in 100 μl of sterile tissue culture-grade PBS (Invitrogen) at 4°C overnight, giving a final vector concentration of one μg/μl. Digests and uncut plasmids (1 μg) were visualized by agarose gel electrophoresis (0.6% agarose, 90 V, 60 min). 3.2.10 JM8.N4 murine embryonic stem cell culture reagents  A 100X β-mercaptoethanol (BME) solution was prepared by adding 72 μl β-mercaptoethanol stock (Sigma-Aldrich) to 100 ml Dulbecco’s phosphate buffered saline (DPBS) without magnesium chloride or calcium chloride (Invitrogen) and filter sterilizing (0.2 µm); BME was stored at 4°C and used within two weeks of preparation. A 1X trypsin solution was prepared by adding 0.1 g EDTA (Sigma-Aldrich) to 500 ml DPBS and filter sterilizing (0.2 µm). To this, 5 ml chicken serum (Invitrogen) and 10 ml 2.5% trypsin (Invitrogen) were added. 2X trypsin was prepared by adding 4 ml 2.5% trypsin to 200 ml of 1X solution. Aliquots of both 1X and 2X trypsin were stored at -20°C, and then at 4°C once thawed. Trypsin solutions and aliquots were prepared by Theodore Whipp (WTSI). A glutamine, penicillin and streptomycin (GPS) mix was prepared by adding 58.4 g glutamine (Amresco), 10 g streptomycin and 6 g penicillin (Sigma-Aldrich) to two litres of MilliQ water and filter sterilizing (0.2 µm). Single-use aliquots (6 ml) were stored at -20°C and thawed just prior to media preparation. 
GPS solution and aliquots were prepared by Theodore Whipp (WTSI). 0.1% gelatin was prepared by adding 25 ml of 2% gelatin (Sigma-Aldrich) to 500 ml DPBS and stored at 4°C. M15 media contains 500 ml Knockout DMEM (Invitrogen), 90 ml fetal bovine serum (Invitrogen), 6 ml GPS, 6 ml 100X β-mercaptoethanol and 120 μl leukemia inhibitory factor (LIF); LIF was prepared in-house by Theodore Whipp (WTSI) using a modified protocol obtained from Janet Rossant (Hospital for Sick Children, Toronto, Canada). M15 was stored at 4°C and used within 2-3 weeks of preparation. Freezing medium was prepared with 90% M15 media and 10% DMSO (Sigma-Aldrich) and filter sterilized (0.2 µm); freezing medium was stored at 4°C and used within 48 hours. 3.2.11 JM8.N4 murine embryonic cell line culture  The JM8.N4 mouse ESC line was kindly provided by David Adams (WTSI). Throughout the entire procedure, feeder-free JM8.N4 ESC were cultured on 0.1% gelatin at 37°C with 5% CO2 in M15 media. For selection steps, JM8.N4 cells were cultured in M15 media containing 175 μg/ml Geneticin or 3 μg/ml puromycin (Sigma-Aldrich), as indicated. All centrifugations for ESC work were performed in an Eppendorf 5702 microfuge, as indicated. 3.2.12 JM8.N4 ESC transfection and selection  Prior to electroporation, the JM8.N4 cells were cultured in a T150 flask (Corning, Inc.) until 90% confluent, washed once with D-PBS and trypsinized with 5 ml of 1X trypsin for 5 minutes at 37°C and 5% CO2, after which time 15 ml of M15 media was added to neutralize the trypsin. A volume of 6 ml of JM8.N4 cells, corresponding to 1.75 x 10^7 cells, was added to each of two 25 ml universal tubes, which were centrifuged at 1200 rpm for 3 minutes. The supernatant was aspirated and each pellet resuspended in 700 μl D-PBS. For each vector, 20 μl (i.e. 20 μg) of the linear plasmid was added to a new 1.5 ml Eppendorf tube, to which the 700 μl cell suspension was added. 
The cells/ DNA were mixed twice with a P1000 pipettor and the entire contents added to a 0.4 cm cuvette (Bio-Rad). The mixture was electroporated at 800 V, 25 μF with infinite resistance and then incubated at room temperature for 10 minutes. The cells were washed out of the cuvette with 2 ml of selection media (M15 with Geneticin) and the entire volume added to a 50 ml Falcon tube containing 47 ml selection media. The cuvette was rinsed with another 1 ml of selection media, which was added to the Falcon tube, bringing the final volume to approximately 50 ml. Following thorough mixing by inversion, 10 ml was added to each of five 10 cm tissue culture (TC) dishes (Corning, Inc.); the dishes were swirled to ensure even distribution of cells upon settling and adherence. The final number of cells per dish was approximately 3.5 x 10^6. Media changes were made daily for the entire duration of the Geneticin selection period. 3.2.13 Colony picking and clone expansion  Following 10 days of Geneticin selection, colonies of JM8.N4 ESC, which had presumably undergone homologous recombination and clonal expansion, were distinctly visible to the naked eye. The media was aspirated from the dishes and replaced with 1X D-PBS. Well-spaced colonies were first identified by eye and then their quality checked under an Olympus TH4-20 microscope to ensure even, uniform edges. Each colony was removed from the dish in one motion by sliding a pipette tip underneath it and then gently drawing it up in 54 μl D-PBS (to account for the volume of the colony) without breaking up the colony. The entire volume including the colony was ejected into one well of a pre-gelatinized 48-well plate (BD Falcon). This picking procedure was repeated to fill two 48-well plates, giving 96 colonies for each targeting construct. To each colony 50 μl 2X trypsin was added and the plates incubated at 37°C with 5% CO2 for 7 minutes. 
M15 media, without Geneticin selection, was added to fill each well approximately three-quarters full and mixed thoroughly (6-8 times) with a P1000 pipettor, changing tips between each well; the plate was then returned to the incubator. The trypsinizing procedure was repeated with the remaining three 48-well plates. The media was changed the following day (Day 11) and every day thereafter until Day 14, by which time the majority of clones were 60-90% confluent. Media was aspirated and the cells washed once with D-PBS. The cells were trypsinized using 100 μl of 1X trypsin and incubated at 37°C with 5% CO2 for 4 minutes. M15 media was added to fill the wells three-quarters full, and each well mixed by pipetting up and down 5-6 times with a P1000 pipettor. The volume of cells/ media was split approximately evenly between two 24-well plates (BD Falcon), one for freezing (F1-8) and the other for DNA extraction (DNA1-8). Where necessary, additional M15 media was added so all wells were approximately three-quarters full. The media was changed every day until most of the wells were confluent enough for freezing (80-90%) or for extracting DNA (~100%). On Day 17, plates F1-8 were frozen since the majority of clones were ~90% confluent; some clones were less confluent, but had the plates been left any longer the others would have become overgrown and risked losing their pluripotency. The media was aspirated and the cells washed once with D-PBS. A volume of 100 μl of 1X trypsin was added to each well and incubated at 37°C with 5% CO2 for 4 minutes. Freezing media (500 μl) was added to each well and the plates sealed and stored at -80°C until positive clones were identified by Southern blotting and PCR of the targeted allele. The DNA1-8 plates were cultured until Day 19, when the maximum confluency for any one clone was ~100%, at which time the DNA was extracted. 
3.2.14 Embryonic stem cell DNA extraction and quantification  The media was aspirated from plates DNA1-8 and the cells washed once with D-PBS, which was then flicked out of the plates. A volume of 500 μl of tail extraction buffer, consisting of 50 mM Tris pH 8, 50 mM EDTA pH 8 and 0.5% sodium dodecyl sulphate (Ultrapure 10% SDS Solution, Gibco) and containing 1 mg/ml fungal proteinase K (Invitrogen), was added to each well and the plates incubated at room temperature with shaking at 40 rpm for 5 minutes to detach the cells. Cell lysates were added to 1.5 ml Eppendorf tubes and incubated for 22 hours at 55°C and then stored at 4°C for 24 hours. The entire lysate was added to 900 μl of 100% EtOH and mixed by inversion. The DNA was swirled onto a glass rod (sealed capillary tube made of soda glass, Samco), washed by dunking in 500 μl 70% EtOH and then in 500 μl 100% EtOH, and inverted to dry for one hour at room temperature. The DNA was resuspended in 150 μl 10 mM Tris-HCl pH 8.0 for three days at 4°C, and then stored at -20°C. The samples were thawed at room temperature; in some tubes the DNA had not resuspended, so all samples were incubated at 55°C for an additional two hours. The DNA concentrations were determined, and then the samples were stored at 4°C to avoid freeze-thaw cycles until the restriction digest was performed. Wild type (WT) JM8.N4 DNA was extracted as described above for probe generation and for use as the Southern blot WT control. 3.2.15 Restriction digestion of ESC DNA for Southern blotting  The genomic sequence surrounding the Gpa33 targeting locus (Exon 4 to 3’-UTR plus 6260 bp; Chromosome 1:168087644-168102900 bp) was downloaded through Ensembl v.51 (NCBIM37 Assembly). Appendix B.3 is a screenshot of this region taken from Ensembl v.51. The genomic sequence (Appendix B.4) was subjected to in silico digestion using NEBcutter (188) to identify a restriction enzyme(s) that could be used to differentiate between the WT and targeted knock-in (KI) alleles by Southern blotting. 
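The in silico digestion step can be illustrated with a short script. The following is a minimal sketch and not the NEBcutter algorithm: it scans a linear sequence for a degenerate recognition site and reports the resulting fragment lengths. The function names and the toy sequence are hypothetical, and the site GATNN^NNATC (blunt cut after the fifth base) is assumed here for BsaBI, the enzyme used for the digests in this section; it should be checked against the supplier's documentation.

```python
import re

# Subset of IUPAC codes needed for the assumed site (N = any base).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]"}


def site_regex(site):
    """Compile a degenerate site into a lookahead regex so overlapping sites are found."""
    return re.compile("(?=" + "".join(IUPAC[base] for base in site.upper()) + ")")


def cut_positions(seq, site="GATNNNNATC", cut_offset=5):
    """Return 0-based cut coordinates on a linear sequence.

    The assumed site is palindromic, so scanning one strand is sufficient.
    """
    pattern = site_regex(site)
    return [match.start() + cut_offset for match in pattern.finditer(seq.upper())]


def fragment_lengths(seq, cuts):
    """Return fragment lengths, in 5' to 3' order, for a linear molecule."""
    bounds = [0] + sorted(cuts) + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


# Hypothetical toy sequence containing two sites (not the Gpa33 locus).
toy = "AAAA" + "GATACGTATC" + "CCCCCC" + "GATTTTTATC" + "GG"
cuts = cut_positions(toy)
print(cuts, fragment_lengths(toy, cuts))
```

Running the same functions on the WT sequence and on the sequence carrying the targeting cassette, and comparing the fragment that spans the probe-binding region, would give the band sizes expected for each allele.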
ESC DNA (10 μg) was digested with 20 U BsaBI in 1X NEB buffer 4 in a final volume of 50 μl for 16.5 hours at 60°C, followed by a 20 minute heat inactivation at 80°C. JM8.N4 WT DNA was also digested under the same conditions to serve as the positive control (+/+ genotype) and the BsaBI master mix alone served as the negative control. The human IL8 KI and mouse Il10 KI alleles are denoted as hIL8:c and mIL10:c, respectively, indicating that the KI allele is conditional; human (h) and mouse (m) designations are made for the alleles so that the species origin of the gene is clear on the WTSI Mouse Database. 3.2.16 Gel electrophoresis and transfer for Southern blot analysis  BsaBI-digested DNA (10 μg) was subjected to gel electrophoresis (0.6% agarose gel, 60 V, 6 hours, 0.25 mg/ml EtBr). The gel was washed twice, with gentle agitation, in one litre depurination buffer (0.242 M HCl) for 10 minutes each, and once in one litre neutralization buffer (0.4 M NaOH, 1 M NaCl) for 15 minutes; the DNA was then transferred overnight (14.5 hours) to a positively charged nylon membrane (Hybond-XL, Amersham/ GE Healthcare) by capillary action. The membrane was agitated in wash buffer (0.5 M Tris-HCl, 1 M NaCl) for 15 minutes and then baked for three hours at 80°C between Whatman filter paper. 3.2.17 Southern blot probe labelling and hybridization  The transferred and baked Hybond-XL nylon membrane was pre-hybridized in a large glass bottle (300 x 35 mm, Thermo Scientific) containing 20 ml Rapid-hyb™ buffer (Amersham/ GE Healthcare) at 65°C for 2 hours, rotating in a Hybaid Shake and Stack hybridization chamber (Thermo Scientific). One aliquot of the A33 probe (described below) was labelled with α-32P-dCTP (Redivue deoxycytidine 5'-[α-32P] triphosphate, triethylammonium salt 3000 Ci/mmol 10 mCi/ml EasyTide, 250 µCi; Perkin Elmer) using the Prime-It® II Random Primer Labeling Kit (Agilent Technologies, Stratagene Products) according to the manufacturer’s instructions. 
Briefly, 10 μl of Random 9-mer primers was added to the 25 μl probe aliquot (50 ng), boiled at 100°C for 5 minutes and then placed on ice for 1 minute. On ice, 10 μl of 5X dCTP buffer (containing 0.1 mM of each of dATP, dGTP and dTTP) and 1 μl of Klenow polymerase were added. In the radiation room, 5 μl (50 μCi) of α-32P-dCTP was added and the reaction incubated at 37°C for 1 hour 22 minutes. The probe was then cleaned through an Illustra ProbeQuant™ G-50 Micro column (GE Healthcare) to remove unincorporated nucleotides, primers and the Klenow polymerase. The column was prepared by snapping off the bottom and spinning in a collection tube for 2 minutes at 4000 rpm. The column was placed in a new 1.5 ml tube, the entire probe reaction was added to the gel resin in the column and the column was spun again for 2 minutes at 4000 rpm. The cleaned probe (eluate) was denatured for 5 minutes at 100°C and then placed on ice. The entire labelled probe was then added to the pre-hybridized membrane and hybridized by rotating at 65°C for 5 hours and 15 minutes. Low stringency wash (LSW) and high stringency wash (HSW) buffers were pre-warmed to 65°C. Following hybridization the membrane was washed by rotation for 10 minutes with each of LSW (2X SSC, 0.1% SDS) and HSW (0.5X SSC, 0.1% SDS) buffers at 65°C. The membrane was wrapped in cling film and exposed to Hyperfilm MP (Amersham/ GE Healthcare) at -80°C for 15 hours 15 minutes. The film was developed using a Compact X4 Automatic X-ray Film Processor (Xograph Healthcare Ltd, UK). 3.2.18 Southern blot probe design and amplification  The genomic sequence downloaded for the Gpa33 targeting locus encompassed 6260 bp past the 3’-UTR, and consequently past the end of the 3’ arm of homology of pA33LSL, to ensure that the Southern blot probe hybridized to genomic sequence outside the region of homologous recombination. This allowed the same probe to both detect and distinguish between the WT and KI alleles. 
Within the genomic sequence, exons 4-7, 3’-UTR, 5’ and 3’ arms of homology, repeat elements and BsaBI restriction sites were identified (Appendix B.4). The genomic sequence following the 3’-UTR contains four repeat elements (Type 1 Transposons) ranging from 55-157 bp in length. The sequence after this was free of repeats for a long enough region to design primers to amplify a probe of optimal size for Southern blot hybridization (500-1000 bp). The primer sequences for A33pF2 and A33pR2 can be found in Table 3.1 and the probe amplicon is highlighted in Appendix B.4. The Tm and potential for secondary structure formation for each primer were calculated using OligoCalc (119) and specificity against the Mus musculus genomic reference assembly determined using NCBI Primer-BLAST (184). The A33 probe (A33p) was amplified from 50 ng WT JM8.N4 DNA in a final volume of 50 μl under the following final reaction conditions: 1X Pfx reaction buffer, 300 μM dNTPs, 300 nM each of A33pF2 and A33pR2 primers, 5 U Pfx50 DNA polymerase (Invitrogen). The cycling conditions were as follows: 94°C for 2 minutes, 35 cycles of 94°C for 15 seconds, 58°C for 30 seconds, 68°C for 30 seconds, with a final extension at 68°C for 5 minutes. The PCR product (10 μl) was analyzed by agarose gel electrophoresis (0.8% agarose, 90 V, 55 min), and purified using the QIAquick PCR purification kit (Qiagen), according to the manufacturer’s instructions, and eluted using 50 μl Buffer EB. The pure product (1 μl) was then ligated into pCR-BluntII-TOPO for 20 minutes, as previously described, and then 2 μl of a 1/10 dilution transformed into 50 μl electrocompetent TOP10 E. coli cells, as previously described. A vector-only negative control was also ligated and transformed under the same conditions. 
SOC (250 μl) was added and the transformation allowed to recover at 37°C with shaking at 200 rpm for one hour, after which time 20, 50 and 100 μl were plated onto LB-Luria agar with 50 μg/ml kanamycin and incubated overnight at 37°C. Twelve individual colonies were each inoculated into 25 ml universal tubes containing 5 ml LB-Luria and 50 μg/ml kanamycin and grown overnight at 37°C, with shaking at 200 rpm. The plasmids were extracted using the QIAprep Miniprep kit, according to the manufacturer’s instructions, eluted in 50 μl Buffer EB, and quantified. The A33p_pCR-BluntII-TOPO plasmid insert size was confirmed by restriction digestion using BamHI and EcoRV, which both cut once in the pCR-BluntII-TOPO MCS to release the insert, as well as EcoRI, which cuts twice in the pCR-BluntII-TOPO MCS to release the insert and also cuts once within the A33p amplicon. Both restriction digests were carried out using 1 μg of each A33p_pCR-BluntII-TOPO plasmid in a final volume of 50 μl for 16.5 hours at 37°C, followed by 65°C for 20 minutes to inactivate the enzymes. The BamHI/ EcoRV double digest reaction consisted of 1X NEB buffer 3, 0.1 mg/ml BSA and 10 U each of BamHI and EcoRV (NEB); the EcoRI single digest consisted of 1X NEB EcoRI buffer and 20 U of EcoRI (NEB). The digests (10 μl) and 1 μl of their respective parent A33p_pCR-BluntII-TOPO plasmids (54-305 ng) were subjected to agarose gel electrophoresis (0.8% agarose, 90 V, 60 min). The A33p_pCR-BluntII-TOPO plasmid insert, and therefore probe identity, was confirmed by PCR amplification using 1X Platinum PCR Supermix, 300 nM each of A33pF2 and A33pR2 primers and 5-30 ng plasmid in a final volume of 50 μl. WT JM8.N4 DNA (25 ng) was included as the positive control. Amplification was carried out as described above for A33p TOPO cloning with Pfx50 DNA polymerase, except that the annealing temperature was decreased to 55°C. Products (10 μl) were visualized by agarose gel electrophoresis (0.8% agarose, 80 V, 120 min).  
As a final confirmation of primer specificity, the A33p_pCR-BluntII-TOPO plasmids confirmed to be restriction digest- and PCR-positive were capillary sequenced using A33pF2 and A33pR2 primers as well as standard M13F and M13R primers by the WTSI core sequencing facility. The plasmids and primers were diluted to 100 ng/μl and 5 μM, respectively, in NF-H2O, prior to sample submission. I performed all subsequent sequence analysis using Gap4 assembly and trace viewing software, as previously described. A33p PCR products were amplified from seven A33p_pCR-BluntII-TOPO positive plasmids using the A33pF2 and A33pR2 primers and Pfx50 DNA polymerase, as previously described for TOPO cloning, except that 100-600 pg of plasmid was used as template. The resulting PCR products were column purified using the QIAquick PCR purification kit, as per the manufacturer’s instructions, with a final elution of 50 μl Buffer EB, to remove unincorporated nucleotides, primers and Pfx50 DNA polymerase. Each purified product (5 μl) was visualized by agarose gel electrophoresis (0.8% agarose, 80 V, 90 or 120 min). A volume of 5 μl was pooled from each reaction and the pool concentration determined. The A33p PCR pool was diluted to 2 ng/μl with NF-H2O and aliquots of 25 μl made for a final amount of 50 ng/ aliquot, the amount required for labelling in the Southern blot analysis. 3.2.19 PCR confirmation of Southern blot-positive clones  Clones determined to be positive by Southern blot analysis were further confirmed by PCR amplification using several primer combinations for maximum confidence in the targeting process. For each gene construct, insert-specific primers were used to amplify the IL8 insert (IL8-7/ IL8-8) or the Il10 insert (IL10F-Flag/ IL10R-Flag6), as well as the neomycin resistance cassette (aminoglycoside phosphotransferase) using NeoF/ NeoR primers (Table 3.1). 
The NeoF and insert-specific forward primers were each paired with the reverse A33p primer, A33pR2 (Table 3.1), which is downstream of the 3’ arm of homology. Due to the nature of the targeting vector, design of a primer upstream of the 5’ arm of homology is difficult owing to the need for very long-range PCR. The 5’ arm of homology is 4979 bp and ends within intron 4-5 of Gpa33. To ensure specificity the primer would have to be positioned within exon 4, the start of which is 7767 bp away from the conditional allele insertion site. The final 50 μl reactions consisted of 1X High Fidelity PCR buffer, 200 μM dNTPs, 2 mM MgSO4, 200 nM each of forward and reverse primers, 1 U Platinum Taq High Fidelity DNA polymerase (Invitrogen) and 10 or 25 ng genomic DNA. The amplification conditions were as follows: 94°C for 2 minutes, 35 cycles of 94°C for 30 seconds, 55°C for 30 seconds, 68°C for 6.5 minutes, a final extension of 68°C for 10 minutes, followed by 4°C. Products (5 μl) were analyzed by agarose gel electrophoresis (0.6% agarose, 90 V, 1-2 hours). 3.2.20 Expansion of correctly targeted ESC clones  For each gene construct, ESC clones identified as positive by Southern blot and PCR were cultured as potential candidates for microinjection, as previously described. Frozen stocks were thawed and cells cultured in 6-well plates (BD Falcon) for 10 days prior to the microinjection target date. M15 media was changed daily and cells were passaged at a 1/5 dilution upon reaching ~80% confluency, approximately every second day. Cells were also passaged the day before microinjection at two different dilutions, regardless of the schedule, to ensure maximum cell health on the day of microinjection. 3.2.21 ESC clone microinjection and chimera F0 mouse generation  On the morning of microinjection the best of the positive clones for each gene construct (IL8 or Il10) was chosen based on visual inspection by microscopy. 
This was judged by percentage confluency (70-80%), clone morphology (evenly-sized and spaced), and the fewest dead or differentiating cells. Media was aspirated from the chosen clone and the cells washed once with D-PBS. Trypsin (1X) was added to just cover the bottom of the well (<500 μl) and the cells were incubated for 4 minutes at 37°C with 5% CO2. M15 media was added and the entire volume transferred to a 25 ml universal tube, topped up to 10 ml with M15 and pipetted well to mix. The tube was centrifuged for 4 minutes at 1500 rpm and the supernatant aspirated. The ESC pellet was resuspended in 1 ml M15: HEPES solution (sterile-filtered cell culture grade, Sigma-Aldrich) (50:1) and the cells mixed well. An aliquot of 500 μl of the cell suspension was added to a cryovial and placed immediately on ice. The remainder of the cells were added to one well of a 6-well plate and checked under an Olympus CKX41 microscope. The cryovial was submitted to the WTSI Research Support Facility (RSF), and Team 121 (Generating Mouse Mutants) performed the microinjection and surgical implantation; I was allowed to observe the procedures, although not on a day when any of my constructs were microinjected. See Appendix B.5 for my observations. Briefly, ESC were microinjected into 30-40 blastocysts (5-15 ESC/ blastocyst); the microinjected blastocysts were then surgically implanted into pseudo-pregnant recipient females (3-4/ MI depending on total number of embryos microinjected). 3.2.22 Cre plasmid transfection for functional analysis of the targeted ESC loxP sites  A Cre-expressing plasmid containing puromycin selection was kindly provided by Kosuke Yusa and Wei X. Wang (WTSI). The plasmid was transformed into E. coli TOP10 electrocompetent cells, plated onto LB-Luria agar and incubated overnight at 37°C, as previously described. 
An isolated colony was inoculated into a 250 ml Erlenmeyer flask containing 50 ml LB-Luria broth and cultured overnight at 37°C, with shaking at 200 rpm. The Cre-puromycin plasmid was purified using the Qiagen Plasmid Midi kit, as per the manufacturer’s instructions. The plasmid was resuspended in 150 μl 10 mM Tris-HCl pH 8.0, quantified and diluted to one μg/μl in NF-H2O. The first set of microinjected ESC clones (i.e. IL8_Gpa33_JM8N4-12 and Il10_Gpa33_JM8N4-24) was validated for functional Cre-mediated neomycin resistance cassette deletion. ESC clones were thawed and cultured in T25 flasks (Corning, Inc.). To prepare the ESC for transfection, old M15 media was aspirated from the T25 flasks and the cells washed once with D-PBS. One ml of 1X trypsin was added to each flask and incubated for 4 minutes at 37°C with 5% CO2. Fresh M15 (9 ml) was added to each flask and the entire contents transferred to a 25 ml universal tube and mixed to create a single cell suspension. For each gene construct, two transfections were performed with 3 x 10^6 ESC resuspended in 700 μl D-PBS and 20 μg Cre-puromycin plasmid. ESC were added to 20 μl pre-aliquoted plasmid, mixed twice, transferred to a 0.4 cm cuvette and electroporated at 230 V, 500 μF and infinite resistance. After 20 minutes recovery, ESC from each transfection were plated out onto a pre-gelatinized 10 cm TC dish in selection-free M15 media. The following day puromycin selection was started and continued for nine days, changing the media daily. On day 10 after transfection, 48 colonies were picked for each gene construct into 48-well plates, as previously described. These clones were expanded in selection-free media until 80-90% confluent and then split in half into 24-well plates. These plates were grown again until approximately 90% confluent, and then one plate was frozen and DNA was extracted from the other for PCR confirmation of the Cre-mediated neomycin resistance cassette deletion. 
Positive clones were thawed, expanded for one passage and then frozen in one ml freezing media in liquid nitrogen for archiving. Clone DNA (25 ng) for each insert was tested by amplification using Platinum PCR Supermix, as previously described, with gene-specific or neomycin resistance cassette primers; the parent ESC clone (25 ng) and WT JM8.N4 (25 ng) were included as controls. The PCR products (5 μl) were analyzed by agarose gel electrophoresis (1% agarose, 90 V, 45 min). 3.2.23 Chimera percentage determination and F0 matings  The genetic background of the embryos (blastocyst) used for microinjection is C57BL/6J-Tyr<c-Brd> (100%) (prefix CALB) and that of the recipient pseudo-pregnant mice is CBA (50%) and C57BL/6J (50%) (F1) (prefix BXCB). Both CALB and BXCB strains are albino whereas JM8.N4 ESC are derived from a black C57BL/6N strain; this allows the degree of chimerism of the resulting F0 mice to be estimated from coat colour, expressed as a percentage, which was performed by RSF staff. Chimera F0 mice were then mated with C57BL/6N Taconic USA (prefix CBLN) wild type mice and subsequent F1 pups genotyped for germline transmission of the respective knock-in allele by PCR. All mouse husbandry and matings were performed by staff of the WTSI RSF. 3.2.24 Genotyping F1 offspring to detect germline transmission of hIL8:c or mIL10:c alleles  Ear clips from F1 pups were taken by WTSI RSF staff and stored at -20°C until DNA extraction. I performed subsequent processes. DNA was extracted from ear clips using the DNeasy Blood & Tissue Kit (Qiagen), as per the manufacturer’s instructions, with the following specifications: Proteinase K treatment was carried out for at least 5 hours and up to approximately 16 hours (overnight) at 55°C, DNA was eluted twice into the same tube with 100 μl Buffer AE, and the concentration determined. 
DNA from the first F1 litters (LRHE5.1, 6.1, 7.1, 8.1 and LRIT6.1) was amplified using the neomycin resistance cassette primers (NeoF/ NeoR) as well as IL8 or Il10 gene-specific primers (IL8-7/ IL8-8 or IL10F-Flag/ IL10R-Flag6). A final reaction volume of 25 μl consisting of 1X Platinum PCR Supermix (Invitrogen), 200 nM each forward and reverse primer and 12.5 ng DNA was subjected to the following amplification conditions: 94°C for 2 minutes, 35 cycles of 94°C for 30 seconds, 55°C for 30 seconds, 72°C for 30 seconds, a final extension of 72°C for 5 minutes. Amplicons (5 μl) were visualized by agarose gel electrophoresis (1% agarose, 90 V, 45 min). F1 litters LRHE5.2 and LRHE8.2 were genotyped using a multiplex PCR to amplify both the WT and hIL8:c alleles in one reaction, which was confirmed using the neomycin resistance cassette and IL8-specific primers, as described above. The multiplex PCR contains a forward primer within exon 7 of Gpa33 (A33-E7) and a reverse primer within the 3’-UTR of Gpa33 (A33-3U) (Table 3.1), as well as either an IL8- or Il10-specific forward primer. A final reaction of 25 μl consisting of 1X Platinum PCR Supermix (Invitrogen), 100 nM each of A33-E7 and gene-specific forward primers, 200 nM A33-3U reverse primer and 2 μl undiluted DNA (28-60 ng) was amplified using the same conditions as above. The PCR products (5 μl) were visualized by agarose gel electrophoresis (1% agarose, 90 V, 40 min). WT JM8.N4 and Southern blot-confirmed targeted IL8_Gpa33_JM8N4 ESC DNA were included as wild type and heterozygous controls, respectively. All subsequent LRHE, LRIT and LRMT litters were genotyped using the multiplex PCR to discriminate between wild type, heterozygous and homozygous mice. The PCR conditions were further standardized and optimized: the DNA was diluted to 10 ng/μl for a final amount of 20 ng in each PCR reaction, the annealing temperature was increased to 57°C and the number of cycles decreased to 30. 
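The interpretation of the multiplex PCR can be summarized as a simple decision rule. The sketch below is illustrative only: it assumes that the A33-E7/ A33-3U primer pair yields a product from the WT allele only, and that the gene-specific forward primer paired with A33-3U yields a product from the knock-in allele only. The function names are hypothetical and band sizes are not encoded. A second helper reproduces the template arithmetic for the standardized reaction (20 ng from a 10 ng/μl dilution).

```python
def call_genotype(wt_band: bool, ki_band: bool) -> str:
    """Map presence/absence of the two multiplex PCR products to a genotype.

    Assumption: wt_band is the A33-E7/A33-3U product (WT allele) and
    ki_band is the gene-specific forward/A33-3U product (knock-in allele).
    """
    if wt_band and ki_band:
        return "heterozygous"
    if wt_band:
        return "wild type"
    if ki_band:
        return "homozygous"
    return "no call"  # neither product seen: failed reaction, repeat the PCR


def template_volume_ul(target_ng: float, conc_ng_per_ul: float) -> float:
    """Volume of template (in microlitres) that delivers target_ng of DNA."""
    if conc_ng_per_ul <= 0:
        raise ValueError("concentration must be positive")
    return target_ng / conc_ng_per_ul


# Example: both products present; 20 ng of template from a 10 ng/ul dilution.
print(call_genotype(wt_band=True, ki_band=True), template_volume_ul(20, 10))
```

Treating the absence of both products as "no call" rather than as a genotype keeps failed reactions from being scored as homozygous or wild type.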
3.2.25 F1 pairings for colony expansion and homozygous mouse generation  F1 wild type mice, to which the hIL8:c or mIL10:c allele had not been transmitted, were culled by staff of the RSF. F1 hIL8:c/+ heterozygous mice were paired, and the resulting F2 offspring genotyped, as previously described, for the generation of homozygous mice, hIL8:c/-. The homozygous genotype is denoted as hIL8:c/-, indicating that the mice carry two copies of the conditional allele and no WT allele. Homozygous pairings were set up. Once these matings have produced 2-3 litters they will be paired with Cre-expressing mice. 3.3  Results  3.3.1 PCR amplification of engineered IL8 and Il10 inserts from cDNA plasmids  The IL8 and Il10 coding sequences were successfully amplified from their respective cDNA plasmids using the primers designed to incorporate two restriction sites at both the 5’ and 3’ ends, as well as a 5’ FLAG-tag. Figure 3.2 depicts the final gene inserts with relative positions of the modifications indicated (A), the IL8 coding sequence (B), the engineered IL8 insert (C), the Il10 coding sequence (D) and engineered Il10 insert (E). Optimization of PCR amplification was performed using different high fidelity DNA polymerases; the Expand High Fidelity PCR system gave the optimal amplification for both the IL8 insert (Figure 3.3A) and the Il10 insert (Figure 3.4A), and the gel extracted IL8 (Figure 3.3B) and Il10 (Figure 3.4B) insert products were used for subsequent TOPO cloning.  A. [Schematic of the insert, 5’ to 3’: AscI site, XbaI site, FLAG-tag, IL8 or Il10 coding sequence, XhoI site, AscI site]  B. ATGACTTCCAAGCTGGCCGTGGCTCTCTTGGCAGCCTTCCTGATTTCTGCAGCTCTGTGTGAAGGTGCAGTTTTGCCAAGGAGTGC TAAAGAACTTAGATGTCAGTGCATAAAGACATACTCCAAACCTTTCCACCCCAAATTTATCAAAGAACTGAGAGTGATTGAGAGTG GACCACACTGCGCCAACACAGAAATTATTGTAAAGCTTTCTGATGGAAGAGAGCTCTGTCTGGACCCCAAGGAAAACTGGGTGCAG AGGGTTGTGGAGAAGTTTTTGAAGAGGGCTGAGAATTCATAA  C.  
GG▼CGCGCCT▼CTAGAATGGACTACAAGGATGACGATGACAAGATGACTTCCAAGCTGGCCGTGGCTCTCTTGGCAGCCTTCCTGAT TTCTGCAGCTCTGTGTGAAGGTGCAGTTTTGCCAAGGAGTGCTAAAGAACTTAGATGTCAGTGCATAAAGACATACTCCAAACCTT TCCACCCCAAATTTATCAAAGAACTGAGAGTGATTGAGAGTGGACCACACTGCGCCAACACAGAAATTATTGTAAAGCTTTCTGAT GGAAGAGAGCTCTGTCTGGACCCCAAGGAAAACTGGGTGCAGAGGGTTGTGGAGAAGTTTTTGAAGAGGGCTGAGAATTCATAAC▼T CGAGGG▼CGCGCC  D. ATGCCTGGCTCAGCACTGCTATGCTGCCTGCTCTTACTGACTGGCATGAGGATCAGCAGGGGCCAGTACAGCCGGGAAGACAATAA CTGCACCCACTTCCCAGTCGGCCAGAGCCACATGCTCCTAGAGCTGCGGACTGCCTTCAGCCAGGTGAAGACTTTCTTTCAAACAA AGGACCAGCTGGACAACATACTGCTAACCGACTCCTTAATGCAGGACTTTAAGGGTTACTTGGGTTGCCAAGCCTTATCGGAAATG ATCCAGTTTTACCTGGTAGAAGTGATGCCCCAGGCAGAGAAGCATGGCCCAGAAATCAAGGAGCATTTGAATTCCCTGGGTGAGAA GCTGAAGACCCTCAGGATGCGGCTGAGGCGCTGTCATCGATTTCTCCCCTGTGAAAATAAGAGCAAGGCAGTGGAGCAGGTGAAGA GTGATTTTAATAAGCTCCAAGACCAAGGTGTCTACAAGGCCATGAATGAATTTGACATCTTCATCAACTGCATAGAAGCATACATG ATGATCAAAATGAAAAGCTAA  E.  GG▼CGCGCCT▼CTAGAATGGACTACAAGGATGACGATGACAAGATGCCTGGCTCAGCACTGCTATGCTGCCTGCTCTTACTGACTGG CATGAGGATCAGCAGGGGCCAGTACAGCCGGGAAGACAATAACTGCACCCACTTCCCAGTCGGCCAGAGCCACATGCTCCTAGAGC TGCGGACTGCCTTCAGCCAGGTGAAGACTTTCTTTCAAACAAAGGACCAGCTGGACAACATACTGCTAACCGACTCCTTAATGCAG GACTTTAAGGGTTACTTGGGTTGCCAAGCCTTATCGGAAATGATCCAGTTTTACCTGGTAGAAGTGATGCCCCAGGCAGAGAAGCA TGGCCCAGAAATCAAGGAGCATTTGAATTCCCTGGGTGAGAAGCTGAAGACCCTCAGGATGCGGCTGAGGCGCTGTCATCGATTTC TCCCCTGTGAAAATAAGAGCAAGGCAGTGGAGCAGGTGAAGAGTGATTTTAATAAGCTCCAAGACCAAGGTGTCTACAAGGCCATG AATGAATTTGACATCTTCATCAACTGCATAGAAGCATACATGATGATCAAAATGAAAAGCTAAC▼TCGAGGG▼CGCGCC  Figure 3.2. Schematic and nucleotide sequence of the IL8 and Il10 inserts. The IL8 and Il10 coding sequences were engineered by PCR to contain two sets of flanking restriction enzyme recognition sites as well as a 5’-FLAG tag; schematic in (A). The restriction enzyme sites facilitate directional cloning into two murine embryonic stem cell targeting vectors, pA33LSL (AscI/ XhoI) and pBigT/pROSA26PA (XbaI/ XhoI). 
The final IL8 and Il10 inserts cloned into the pA33LSL eukaryotic expression vector therefore do not contain sequence for the 3’-AscI recognition site. The 300 bp IL8 coding sequence, NCBI accession number NM_000584.2, (B) was amplified from the pCMV6-XL5 cDNA plasmid to give the final 355 bp IL8 insert (C) with the forward IL8F-Flag primer (underlined) incorporating 5’ sequence for AscI and XbaI restriction enzyme recognition sites and the FLAG-tag, with start codon, and the reverse IL8R-Flag primer (underlined) incorporating 3’ sequence for XhoI and AscI restriction enzyme sites. The 537 bp Il10 coding sequence, NCBI accession number NM_010548.1, (D) was amplified from the pUMVC3-mIL10 cDNA plasmid to give the final 592 bp Il10 insert (E) with the forward IL10F-Flag primer (underlined) incorporating 5’ sequence for AscI and XbaI restriction enzyme recognition sites and the FLAG-tag, with start codon, and the reverse IL10R-Flag6 primer (underlined) incorporating 3’ sequence for XhoI and AscI restriction enzyme sites. In (C and E), the AscI recognition site is highlighted in teal, the XbaI recognition site is highlighted in pink, the FLAG-tag start codon is highlighted in yellow, the FLAG-tag sequence is highlighted in red and the XhoI recognition site is highlighted in green. Each restriction enzyme cut site is denoted by ▼.

[Figure 3.3 gel images not reproduced. Lane key rows, panel A: IL8 insert amplification; Il10 insert amplification; mastermix alone (negative); DNA polymerase (HiFi, HiFi Plus, Pwo); 1 Kb DNA ladder (0.5 μg). Panel B: IL8 Flag HiFi PCR product (column purified); IL8 Flag HiFi PCR product (gel extracted); blank; 1 kb ladder (0.5 μg).]

Figure 3.3. Generation of the IL8 insert by PCR amplification. The final 355 bp IL8 insert was amplified from the pCMV6-XL5 cDNA plasmid with the IL8F-Flag and IL8R-Flag primers.
Optimization of IL8 insert amplification with HiFi, HiFi Plus and Pwo DNA polymerase systems was performed and each product visualized by agarose gel electrophoresis (A); the gel contains both IL8 and Il10 products, but Il10 insert amplification was repeated following primer re-design. The large primer-dimer products in some reactions are most likely due to the unusual length of the primers. The IL8 HiFi PCR products were gel extracted and the purified IL8 inserts analyzed by agarose gel electrophoresis (B).

[Figure 3.4 gel images not reproduced. Lane key rows, panel A (lanes 1-17): HiFi DNA polymerase; HiFi Plus DNA polymerase; IL10F-Flag primer; IL10R-Flag(n) primer (n = 1-6); PCR mastermix; 5X orange G loading buffer; 1 kb ladder (0.5 μg).]

Figure 3.4. Generation of the Il10 insert by PCR amplification. The final 592 bp Il10 insert was amplified from the pUMVC3-mIL10 cDNA plasmid. Optimization of Il10 insert amplification with HiFi and HiFi Plus DNA polymerase systems was performed with five primer sets; the forward primer IL10F-Flag was paired with the reverse primers IL10R-Flag2-6 as well as the original IL10R-Flag1 for confirmation of correct amplicon size; n denotes primer number. Each product was visualized by agarose gel electrophoresis (A). The HiFi Il10 insert PCR product with IL10F-Flag and IL10R-Flag6 primers was gel extracted and the purified Il10 insert analyzed by agarose gel electrophoresis (B).

3.3.2 Confirmation of IL8 and Il10 TOPO cloning

Successful TOPO cloning of the IL8 insert was verified in a set of 30 colonies screened by PCR amplification (Figure 3.5A), and AscI or EcoRI restriction digestion of the plasmids isolated from the first ten of these yielded eight putatively positive clones (Figure 3.5B).
AscI was predicted to cut on either side of the IL8 insert, producing a 351 bp fragment, and EcoRI should cut once on either side of the pCR-BluntII-TOPO cloning site to give a 373 bp fragment. Based on these criteria, IL8_pCR-BluntII-TOPO-1, 3, 4, 5, 7, 8, 9, 10 were both PCR- and digest-positive clones.

[Figure 3.5 gel images not reproduced. Lane key rows, panel A: IL8_pCR-BluntII-TOPO clone (1-30); pCMV6-XL5 cDNA plasmid; PCR mastermix alone; 1 kb ladder (0.5 μg). Panel B: IL8_pCR-BluntII-TOPO clone (1-10); AscI enzyme; EcoRI enzyme; 5X orange G loading buffer; 1 kb ladder (0.5 μg).]

Figure 3.5. Confirmation of IL8 insert cloning into pCR-BluntII-TOPO. Thirty IL8_pCR-BluntII-TOPO clones were screened by colony PCR using IL8F-Flag/ IL8R-Flag primers (A); the pCMV6-XL5 cDNA plasmid was included as the positive control and the PCR mastermix alone was the negative control. Each reaction was subjected to agarose gel electrophoresis. All clones were PCR-positive; therefore the first ten clones were grown overnight for plasmid isolation (miniprep). Restriction digestion of IL8_pCR-BluntII-TOPO-1-10 clones with each of AscI and EcoRI was performed for confirmation of insert size (B). The digests were analyzed by agarose gel electrophoresis; uncut plasmids were also analyzed. AscI releases a 351 bp fragment and EcoRI releases a 373 bp fragment; positive clones for both digests are IL8_pCR-BluntII-TOPO-1, 3, 4, 5, 7, 8, 9, 10.
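The insert lengths quoted for Figure 3.2 (355 and 592 bp) and the AscI/XhoI fragment sizes used later for cloning into pA33LSL (340 and 577 bp) follow from the primer design and can be checked in silico. The sketch below is illustrative only: the function names are hypothetical, a placeholder coding sequence of the correct length stands in for the real IL8 and Il10 sequences, and fragment length is counted on the top strand between cut positions, which is an assumption about the counting convention.

```python
ASCI = "GGCGCGCC"   # AscI recognition site, top-strand cut GG^CGCGCC
XBAI = "TCTAGA"     # XbaI recognition site
XHOI = "CTCGAG"     # XhoI recognition site, top-strand cut C^TCGAG
FLAG_WITH_START = "ATGGACTACAAGGATGACGATGACAAG"  # start codon + FLAG-tag (Figure 3.2)


def build_insert(cds: str) -> str:
    """Assemble the engineered insert: AscI, XbaI, FLAG-tag, CDS, XhoI, AscI."""
    return ASCI + XBAI + FLAG_WITH_START + cds + XHOI + ASCI


def cut_positions(seq: str, site: str, offset: int) -> list:
    """Top-strand cut coordinates for every occurrence of a recognition site."""
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start + offset)
        start = seq.find(site, start + 1)
    return positions


def asci_xhoi_fragment_length(insert: str) -> int:
    """Top-strand length between the 5' AscI cut and the 3' XhoI cut."""
    five_prime_cut = cut_positions(insert, ASCI, 2)[0]
    three_prime_cut = cut_positions(insert, XHOI, 1)[-1]
    return three_prime_cut - five_prime_cut


def placeholder_cds(length: int) -> str:
    """Hypothetical stand-in CDS (ATG ... TAA) containing no AscI or XhoI site."""
    return "ATG" + "GCA" * ((length - 6) // 3) + "TAA"


for name, cds_length in (("IL8", 300), ("Il10", 537)):
    insert = build_insert(placeholder_cds(cds_length))
    print(name, len(insert), asci_xhoi_fragment_length(insert))
```

The primer-encoded sequence adds 55 bp to each coding sequence, and an AscI/XhoI double digest removes 15 bp of that from the two ends. Replacing the placeholder with the sequences in Figure 3.2 would additionally reveal any internal recognition sites.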
Fourteen colonies from the Il10 insert TOPO cloning were screened by PCR amplification (Figure 3.6A), and restriction digestion of the isolated plasmids with HindIII and XmnI yielded eight putatively positive clones (Figure 3.6B). HindIII was predicted to cut once in the pCR-BluntII-TOPO vector only and XmnI should cut once in the Il10 insert only, giving fragments of 259 and 3853 bp or 451 and 3661 bp, depending on whether the insert is in forward or reverse orientation, respectively, with respect to the reference Il10 coding sequence. From these criteria Il10_pCR-BluntII-TOPO-1, 2, 5, 6, 8, 9, 10 and 12 were both PCR- and digest-positive clones.

[Figure 3.6 gel images not reproduced. Lane key rows, panel A: Il10_pCR-BluntII-TOPO clone (1-14); PCR mastermix alone; 5X orange G loading buffer; 1 kb ladder (0.5 μg). Panel B: Il10_pCR-BluntII-TOPO clone (1-14); HindIII and XmnI enzymes; 5X orange G loading buffer; 1 kb ladder (0.5 μg).]

Figure 3.6. Confirmation of Il10 insert cloning into pCR-BluntII-TOPO. Fourteen Il10_pCR-BluntII-TOPO clones were screened by colony PCR using standard M13F and M13R primers (A). The pCR-BluntII-TOPO plasmid contains M13F/ M13R primer binding sites which are absent from the original pUMVC3-mIL10 cDNA plasmid. Each reaction was subjected to agarose gel electrophoresis. All colonies were grown overnight for plasmid isolation. Restriction digestion of the Il10_pCR-BluntII-TOPO-1-14 plasmids with HindIII and XmnI (B) confirmed positive clones. The digests were analyzed by agarose gel electrophoresis; uncut plasmids were also analyzed.
The sizes of digest fragments correspond to insert orientation within pCR-BluntII-TOPO; 259 bp indicates 5’ orientation and 451 bp indicates 3’ orientation. PCR- and digest-positive clones are Il10_pCR-BluntII-TOPO-1, 2, 5, 6, 8, 9, 10, 12.

3.3.3 Capillary sequencing of IL8 and Il10 TOPO inserts

Capillary sequencing of the putatively positive IL8_pCR-BluntII-TOPO and Il10_pCR-BluntII-TOPO clones, using gene-specific and M13 forward and reverse primers, identified those clones containing IL8 or Il10 inserts with the correct coding sequence and without any PCR-introduced mutations. The trace files within the assemblies were checked manually for discrepancies between the assemblies and reference sequence; some discrepancies were bona fide base pair changes (i.e. mutations) in the insert sequence, whereas others were errors in base calling by the Gap4 software. Simply taking the assembly consensus sequence as the final sequence would not distinguish true base pair changes from base calling errors by Gap4. Table 3.2 summarizes the eight IL8_pCR-BluntII-TOPO clones and Table 3.3 summarizes the eight Il10_pCR-BluntII-TOPO clones, indicating the number of discrepancies between the assembly consensus sequence and insert reference sequence, and from this the number of passes or fails as determined following manual checking. Two IL8_pCR-BluntII-TOPO clones (3, 10) were deemed acceptable for subsequent cloning into the ESC targeting vector, pA33LSL. Six Il10_pCR-BluntII-TOPO clones (2, 5, 6, 9, 10, 12) were deemed suitable for pA33LSL cloning. Of the clones with correct sequences, IL8_pCR-BluntII-TOPO-10 and Il10_pCR-BluntII-TOPO-10 had high base calling confidence values throughout the reference sequence, and as such were used in subsequent cloning steps.

Table 3.2. IL8_pCR-BluntII-TOPO clone sequencing results summary.
Sequence traces were assembled using the Gap 4 software and checked manually for any discrepancies with the IL8 insert reference sequence. A discrepancy was scored as a fail, and prevented further action, when it was not resolved by at least two other sequencing reads. The overall suitability took into account the lack of fails as well as high sequencing quality throughout the insert length.

Sample                     Contigs  Discrepancies  Number Fails  Number Passes  Overall Suitability  Subsequent Cloning
IL8_pCR-BluntII-TOPO-1     1        4              1             3              No                   NO
IL8_pCR-BluntII-TOPO-3     1        1              0             1              Yes                  NO
IL8_pCR-BluntII-TOPO-4     1        5              1             4              No                   NO
IL8_pCR-BluntII-TOPO-5     1        4              3             1              No                   NO
IL8_pCR-BluntII-TOPO-7     1        6              1             5              No                   NO
IL8_pCR-BluntII-TOPO-8     1        4              2             2              No                   NO
IL8_pCR-BluntII-TOPO-9     1        10             6             4              No                   NO
IL8_pCR-BluntII-TOPO-10    1        6              0             6              Yes                  YES

Table 3.3. Il10_pCR-BluntII-TOPO clone sequencing results summary. Sequence traces were assembled using the Gap 4 software and checked manually for any discrepancies with the Il10 insert reference sequence. A discrepancy failed when it was not resolved by at least two other sequencing reads. The overall suitability took into account the lack of fails as well as high sequencing quality throughout the insert length.

Sample                     Contigs  Discrepancies  Number Fails  Number Passes  Overall Suitability  Subsequent Cloning
Il10_pCR-BluntII-TOPO-1    1        16             2             14             No                   NO
Il10_pCR-BluntII-TOPO-2    1        15             0             15             Yes                  NO
Il10_pCR-BluntII-TOPO-5    1        13             0             13             Yes                  NO
Il10_pCR-BluntII-TOPO-6    1        24             0             24             Yes                  NO
Il10_pCR-BluntII-TOPO-8    1        17             2             15             No                   NO
Il10_pCR-BluntII-TOPO-9    1        19             0             19             Yes                  NO
Il10_pCR-BluntII-TOPO-10   1        15             0             15             Yes                  YES
Il10_pCR-BluntII-TOPO-12   1        39             0             39             Yes                  NO

3.3.4 IL8_pCR-BluntII-TOPO and Il10_pCR-BluntII-TOPO digestion and cloning into the pA33LSL targeting vector

To permit cloning into the pA33LSL targeting vector, the modified IL8 and Il10 inserts were excised from their respective pCR-BluntII-TOPO vectors using AscI and XhoI restriction enzymes.
Figure 3.7 shows the agarose gels confirming the 340 bp IL8 (A) and 577 bp Il10 (B) digest fragments and gel extracted products, as well as the AscI- and XhoI-digested pA33LSL plasmids. The gel extracted insert fragments and the digested pA33LSL vector were then ligated together.

[Figure 3.7 gel images not reproduced. Lane key rows, panel A: pA33LSL; IL8_pCR-BluntII-TOPO-10; AscI and XhoI enzymes; purified IL8 insert; 1 Kb DNA ladder (0.5 μg). Panel B: Il10_pCR-BluntII-TOPO-10; pA33LSL; AscI and XhoI enzymes; purified Il10 insert; purified pA33LSL digest; 1 Kb DNA ladder (0.5 μg).]

Figure 3.7. Directional cloning of the IL8 and Il10 inserts into the pA33LSL murine embryonic stem cell targeting vector facilitated by AscI and XhoI restriction digestion. The confirmed insert-positive IL8_pCR-BluntII-TOPO-10 and Il10_pCR-BluntII-TOPO-10 plasmids were digested with AscI and XhoI restriction enzymes to release the final 340 bp IL8 and 577 bp Il10 inserts, respectively, for ligation with pA33LSL, also digested with AscI and XhoI, to create the final targeting vectors. Agarose gel electrophoresis of the IL8_pCR-BluntII-TOPO-10 (A) and Il10_pCR-BluntII-TOPO-10 (B) digests confirmed the insert sizes, and the inserts were then gel extracted prior to ligation. The linearized pA33LSL plasmid was also gel extracted in concert with each insert (A, B).

Successful amplification across the pA33LSL plasmid AscI/ XhoI cloning site, as well as with insert-specific primers for the Il10 clones, identified four putatively positive IL8 clones, IL8_pA33LSL-1, 2, 4, 5 (Figure 3.8A), and five putatively positive Il10 clones, Il10_pA33LSL-1, 2, 3, 4, 6 (Figure 3.8B).

A.  [Figure 3.8, panel A gel image not reproduced. Lane key rows: IL8_pA33LSL clone (1-6); PCR mastermix; 1 Kb DNA ladder (0.5 μg).]

B.
[Figure 3.8, panel B gel image not reproduced. Lane key rows: Il10_pA33LSL clone (1-6, amplified with each primer pair); A33F/ A33R primers; IL10F-Flag/ IL10R-Flag6 primers; PCR mastermix; 1 Kb DNA ladder (0.5 μg).]

Figure 3.8. Validation of IL8 and Il10 insert ligation with pA33LSL for the generation of final ESC targeting vector. Six of each of the IL8_pA33LSL (A) and Il10_pA33LSL (B) plasmids were amplified with Platinum PCR Supermix and primers targeting the pA33LSL cloning site (A33F/ A33R), as well as insert-specific primers for the Il10 clones (IL10F-Flag/ IL10R-Flag6). Successful amplification identified four positive IL8 clones, IL8_pA33LSL-1, 2, 4, 5, and five positive Il10 clones, Il10_pA33LSL-1, 2, 3, 4, 6.

The capillary sequencing analysis for the PCR-positive IL8_pA33LSL and Il10_pA33LSL plasmid inserts is found in Tables 3.4 and 3.5, respectively. Two of the four IL8_pA33LSL and four of the five Il10_pA33LSL clones had the correct sequences and were suitable for use as the ESC targeting vector. The final vectors were chosen based on the overall base calling confidence scores for the insert sequences. IL8_pA33LSL-1 and Il10_pA33LSL-1 were selected as the final ESC targeting vectors.

Table 3.4. IL8_pA33LSL clone sequencing results summary. Sequence traces were assembled using the Gap 4 software and checked manually for any discrepancies with the reference sequence of the IL8 insert digested in silico with AscI and XhoI. A discrepancy failed when it was not resolved by at least two other sequencing reads. The overall suitability took into account the lack of fails as well as high sequencing quality throughout the insert length. The IL8_pA33LSL-5 clone was not analyzed because the two sequences that passed Asp (WTSI quality control) were very short and did not assemble with the reference sequence.
Clone            Contigs  Discrepancies  Number Fails  Number Passes  Overall suitability  ESC Targeting
IL8_pA33LSL-1    1        3              0             3              Yes                  YES
IL8_pA33LSL-2    1        5              0             5              Yes                  NO
IL8_pA33LSL-4    2        2              0             2              Maybe                NO
IL8_pA33LSL-5    3        n/a            n/a           n/a            No                   NO

Table 3.5. Il10_pA33LSL clone sequencing results summary. Sequence traces were assembled using the Gap 4 software and checked manually for any discrepancies with the reference sequence of the Il10 insert digested in silico with AscI and XhoI. A discrepancy failed when it was not resolved by at least two other sequencing reads. The overall suitability took into account the lack of fails as well as high sequencing quality throughout the insert length; the sequences for Il10_pA33LSL-6 were of lower quality than those of the other clones and therefore not deemed suitable.

Clone            Contigs  Discrepancies  Number Fails  Number Passes  Overall suitability  ESC Targeting
Il10_pA33LSL-1   1        8              0             8              Yes                  YES
Il10_pA33LSL-2   1        8              0             8              Yes                  NO
Il10_pA33LSL-3   1        6              0             6              Yes                  NO
Il10_pA33LSL-4   1        6              0             6              Yes                  NO
Il10_pA33LSL-6   1        7              0             7              No                   NO

3.3.5 Transfection of JM8.N4 murine embryonic stem cells with the IL8_pA33LSL and Il10_pA33LSL targeting vectors

Following transfection of JM8.N4 ESC with either the SalI-linearized IL8_pA33LSL or Il10_pA33LSL targeting vector (Figure 3.9), Geneticin selection and ESC colony formation, microscopic inspection revealed that most colonies had round, well-defined edges. Lysed cells were visible at the surface of the cuvette, which indicated successful electroporation. A few colonies were star-burst in appearance, indicative of differentiating cells. In general, colonies with differentiating cells were not picked for expansion, in order to maintain ESC pluripotency.

[Figure 3.9 gel image not reproduced. Lane key rows: IL8_pA33LSL-1; Il10_pA33LSL-1; SalI enzyme; 5X orange G loading buffer; 1 Kb DNA ladder (0.5 μg).]

Figure 3.9. Final SalI-digested IL8_pA33LSL-1 and Il10_pA33LSL-1 targeting vectors.
Each plasmid was linearized by SalI restriction digestion for JM8.N4 ESC transfection and vector targeting at the murine Gpa33 genomic locus. Agarose gel electrophoresis confirmed the single cut site for both IL8_pA33LSL-1 and Il10_pA33LSL-1 compared to uncut plasmids.

During colony expansion for recombination-positive clones, it was necessary to find a balance between maximum desired confluency (80-90%) and the maximum number of wells at this confluency, so that there would be the highest number of cells possible for freezing and DNA extraction without compromising ESC health. Of the 96 clones picked for each gene construct, 66, 52 and 0 of the IL8_Gpa33_JM8N4 ESC clones and 54, 29 and 2 of the Il10_Gpa33_JM8N4 ESC clones had at least 5, 10 and 20 μg of DNA, respectively, following extraction. Clones with concentrations sufficiently high to obtain 10 μg DNA in the required volume were further subjected to restriction digestion.

3.3.6 Southern blot probe PCR and sequencing confirmation

Optimization of the amplification conditions for the A33 probe (A33p) using Pfx50 DNA polymerase was performed at variable annealing temperatures (50, 55, 58°C), primer sets (A33pF1-3, A33pR1-3) and JM8.N4 DNA template amounts (25, 50, 100 ng) (Figure 3.10). An annealing temperature of 58°C, using A33pF2/ A33pR2 primers and 50 ng template, gave the cleanest amplification.

[Figure 3.10 gel images not reproduced. Lane key rows: annealing temperature (50, 55, 58°C); primers (F1/R1, F2/R2, F3/R3); JM8.N4 ESC DNA (25, 50, 100 ng); 5X orange G loading buffer; 1 Kb ladder (0.5 μg).]

Figure 3.10.
PCR optimization of the Southern blot probe for ESC targeting confirmation. Three sets of forward and reverse primers were designed to amplify a region downstream of the Gpa33 genomic locus and 3’ arm of homology of the pA33LSL targeting vector to act as the Southern blot probe. Amplification was performed with Pfx50 DNA polymerase using variable primer pairs (F1/R1, F2/R2, F3/R3), JM8N4 ESC DNA template (25, 50, 100 ng), and annealing temperatures (50, 55, 58°C). The PCR products were visualized by agarose gel electrophoresis. Expected amplicon sizes from the F1/R1, F2/R2 and F3/R3 primer pairs were 356, 520 and 248 bp, respectively.

Amplification with Platinum PCR Supermix using the same annealing temperatures and primer combinations, but with 12.5 and 50 ng JM8.N4 DNA, confirmed the primer specificity (Figure 3.11). However, the Pfx50 product was used for TOPO cloning due to the high fidelity of the Pfx50 DNA polymerase.

[Figure 3.11 gel images not reproduced. Lane key rows: annealing temperature (50, 55, 58°C); primers (F1/R1, F2/R2, F3/R3); JM8.N4 ESC DNA (12.5, 50 ng); 5X orange G loading buffer; 1 Kb ladder (0.5 μg).]

Figure 3.11. Confirmation of Southern blot probe primer specificity. The Southern blot probe was amplified with each primer pair using Platinum PCR Supermix to test primer specificity, because amplification with Pfx50 DNA polymerase gave smeared products. Because the smearing could also have been due to too much genomic DNA template, the same primer combinations (F1/R1, F2/R2, F3/R3; 300 nM) were tested at the three different annealing temperatures (50, 55, 58°C) with 12.5 and 50 ng JM8N4 ESC DNA as the template in a final volume of 50 μl.
The PCR products (10 μl) were visualized by agarose gel electrophoresis (0.8% agarose, 80 V, 90 min). The F1/R1, F2/R2 and F3/R3 primers all amplified the expected amplicons of 356, 520 and 248 bp, respectively.

Successful cloning of the A33p PCR product into pCR-BluntII-TOPO was confirmed by restriction digestion with BamHI/ EcoRV (581 bp) and EcoRI (451 and 87 bp) and analysis of undigested clones (Figure 3.12A). Nine of the twelve clones analyzed gave fragments of expected size and therefore contained the A33p insert. Undigested clones demonstrated ligation of the linear A33p_pCR-BluntII-TOPO vector upon cloning. A33p_pCR-BluntII-TOPO clones 7, 8 and 10 did not give restriction fragments of expected size so these were not used in further analyses. The identity of the insert of the nine positive A33p_pCR-BluntII-TOPO plasmids was confirmed by PCR amplification with A33pF2 and A33pR2 primers (Figure 3.12B). A33p_pCR-BluntII-TOPO clones 8 and 10 did not give amplicons of expected size and thus were not used in subsequent analyses. A further two A33p_pCR-BluntII-TOPO clones (3, 11) did not have sufficient DNA for further analysis and were consequently excluded. Putatively positive A33p_pCR-BluntII-TOPO clones were numbers 1, 2, 4, 5, 6, 9 and 12.

[Figure 3.12 gel images not reproduced. Lane key rows, panel A: A33p_pCR-BluntII-TOPO clone (1-12; uncut, BamHI/EcoRV and EcoRI lanes for each); BamHI and EcoRV enzymes; EcoRI enzyme; 1 Kb ladder (0.5 μg). Panel B: A33p_pCR-BluntII-TOPO clone (1-12); JM8.N4 WT DNA; PCR mastermix; 1 Kb ladder (0.5 μg).]

Figure 3.12. Confirmation of Southern blot probe cloning into pCR-BluntII-TOPO.
Twelve isolated colonies were grown overnight for plasmid isolation (miniprep) and screened for correct insert size by restriction digestion (A) and PCR amplification (B). Restriction digestion was carried out with BamHI and EcoRV, which both cut once in the pCR-BluntII-TOPO multiple cloning site (MCS) to release the insert (expected size 581 bp), as well as EcoRI, which cuts twice in the pCR-BluntII-TOPO MCS to release the insert and once within the A33 amplicon, giving two digest fragments (expected sizes 87 and 451 bp). Digest-positive clones were A33p_pCR-BluntII-TOPO-1, 2, 3, 4, 5, 6, 9, 11, 12. PCR-positive clones were A33p_pCR-BluntII-TOPO-1, 2, 3, 4, 5, 6, 7, 9, 11, 12.

Capillary sequencing confirmed the identity of the seven A33p_pCR-BluntII-TOPO plasmids and a summary of sequence analysis is found in Table 3.6. Any discrepancies between sequence trace and A33p reference sequence were checked manually. In all but one plasmid, all questionable bases were verified to be those of the A33p sequence; A33p_pCRIIBlunt-5 contained an A to G mutation at position 520 with respect to the A33p reference sequence. This mutation is unlikely to affect hybridization of the probe and therefore all seven plasmids were deemed suitable for use in the generation of the Southern blot probe. Figure 3.13 shows the final purified A33p products amplified from the A33p_pCR-BluntII-TOPO clones, which were then pooled in order to maintain the composition of the Southern blot probe over several hybridizations, if necessary.

Table 3.6. A33p_pCR-BluntII-TOPO clone sequencing results summary. Sequence traces were assembled using the Gap 4 software and checked manually for any discrepancies with the Gpa33 probe (A33p) reference sequence. A discrepancy failed when it was not resolved by at least two other sequencing reads. The overall suitability took into account the lack of fails as well as high sequencing quality throughout the insert length.
A33p_pCRIIBlunt-5 was included in the probe despite having one discrepancy because this was an A to G mutation at the last position of the sequence that would not affect hybridization in the Southern blot assay.

Clone                Contigs  Discrepancies  Number Fails  Number Passes  Overall suitability  Southern Blot Probe
A33p_pCRIIBlunt-1    1        6              0             6              Yes                  YES
A33p_pCRIIBlunt-2    1        5              0             5              Yes                  YES
A33p_pCRIIBlunt-4    1        8              0             8              Yes                  YES
A33p_pCRIIBlunt-5    1        5              1             4              Yes                  YES
A33p_pCRIIBlunt-6    1        4              0             4              Yes                  YES
A33p_pCRIIBlunt-9    1        7              0             7              Yes                  YES
A33p_pCRIIBlunt-12   1        4              0             4              Yes                  YES

[Figure 3.13 gel images not reproduced. Lane key rows: A33p_pCR-BluntII-TOPO clone (1, 1, 2, 4, 5, 6, 9, 12); Platinum PCR Supermix; Pfx50 DNA Polymerase; 1 Kb ladder (0.5 μg).]

Figure 3.13. Purified A33 amplicons for use as Southern blot probe. The Southern blot probe was amplified from sequence-positive A33p_pCR-BluntII-TOPO plasmids with Pfx50 DNA polymerase and A33pF2/ A33pR2 primers, then column purified and pooled. The second lane in the left gel is the A33p_pCR-BluntII-TOPO-1 PCR product amplified with Platinum PCR Supermix and column purified as a test for what appeared to be degradation in the confirmation PCR (Figure 3.12B). No further use was made of the Platinum PCR Supermix products beyond insert confirmation.

3.3.7 Southern blot analysis of IL8 and Il10 ESC clone BsaBI-digested DNA

Figure 3.14 is the agarose gel (A) and film image (B) for the Southern blot analysis of the first eleven ESC clones of each gene construct. Of the IL8 blotted clones, three had high molecular weight bands (approximately 23 kb in size) indicative of undigested DNA and seven were positive, with the hIL8:c allele (13.1 kb) and WT allele (10.9 kb) bands relatively equal in intensity. The identity of one clone was undetermined as it had both hIL8:c and WT alleles, but
the WT allele was stronger in intensity than the hIL8:c allele, potentially indicative of a heterogeneous cell population and incomplete hIL8:c penetrance. Of the Il10 blotted clones, five had undigested DNA and five were positive, with the mIL10:c allele (13.3 kb) and WT allele (10.9 kb) bands relatively equal in intensity. The identity of one clone was undetermined as it also had a much stronger intensity WT band compared to the mIL10:c band.

[Figure 3.14: agarose gel (A) and Southern blot film (B) images not reproduced. Lane key, in lane order; all clone and WT lanes contain BsaBI-digested DNA:]

Gene insert   Clone ID   Genotype
IL8           1          H
IL8           12         H
IL8           15         H
IL8           20         ND
IL8           21         H
IL8           22         H
IL8           26         U
IL8           28         U
IL8           31         U
IL8           34         H
Il10          3          ND
Il10          4          H
Il10          9          U
Il10          11         H
Il10          14         U
Il10          15         H
Il10          22         U
Il10          23         H
Il10          24         H
Il10          26         U
Il10          27         U
IL8           35         H
JM8.N4 WT DNA            WT
BsaBI mastermix alone (negative control)
λ DNA/ HindIII ladder

Figure 3.14. Southern blot analysis of BsaBI-digested IL8_Gpa33_JM8N4 and Il10_Gpa33_JM8N4 clone DNA. Each clone DNA was digested with the restriction enzyme BsaBI. JM8.N4 DNA was also digested under the same conditions for the wildtype (WT) genotype/ positive control and the BsaBI digest mastermix alone was the negative control. The λ DNA/ HindIII ladder (25 μg) was included to approximate the sizes of the hybridized digestion fragments; the sizes of the three visible λ DNA/ HindIII bands are 23130, 9416 and 6557 bp, from the largest at the top of the agarose gel (A) and Southern blot (B). The expected size of the WT allele (+/+) is 10.9 kb, the IL8 knock-in allele (hIL8:c) is 13.1 kb and the Il10 knock-in allele (mIL10:c) is 13.3 kb. The genotype of each clone is noted. Correctly targeted hIL8:c/+ or mIL10:c/+ clones are labelled H to denote their conditional heterozygous nature, clones labelled ND were not determined as they had a weaker knock-in allele compared to WT, and clones labelled U were undigested (>23 kb single band) and therefore their genotype was also not determined. Both the BsaBI mastermix alone and blank lane negative controls were clear.
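The genotype calls in Figure 3.14 follow a simple decision rule based on band size and relative intensity. The sketch below illustrates that rule and tallies the calls reported in the figure; it is not the procedure used to score the blot, which was read by eye. The function name, the size tolerance and the intensity ratio threshold are hypothetical choices made for illustration, and the per-clone calls are transcribed from the figure lane key and should be checked against the original figure.

```python
from collections import Counter

WT_KB = 10.9
KNOCK_IN_KB = {"IL8": 13.1, "Il10": 13.3}
UNDIGESTED_KB = 23.0  # single band at or above the largest ladder band


def call_lane(bands, construct, size_tol_kb=0.5, min_ratio=0.8):
    """Call one lane from a list of (size_kb, intensity) bands.

    size_tol_kb and min_ratio are illustrative thresholds, not thesis values.
    """
    if len(bands) == 1 and bands[0][0] >= UNDIGESTED_KB:
        return "U"   # undigested DNA
    wt = [i for s, i in bands if abs(s - WT_KB) <= size_tol_kb]
    ki = [i for s, i in bands if abs(s - KNOCK_IN_KB[construct]) <= size_tol_kb]
    if wt and ki:
        # Roughly equal bands suggest a clonal heterozygote; a weak
        # knock-in band is left undetermined.
        return "H" if max(ki) / max(wt) >= min_ratio else "ND"
    if wt:
        return "WT"
    return "ND"


# Calls as reported in the Figure 3.14 lane key (clone ID -> genotype).
IL8_CALLS = {1: "H", 12: "H", 15: "H", 20: "ND", 21: "H", 22: "H",
             26: "U", 28: "U", 31: "U", 34: "H", 35: "H"}
IL10_CALLS = {3: "ND", 4: "H", 9: "U", 11: "H", 14: "U", 15: "H",
              22: "U", 23: "H", 24: "H", 26: "U", 27: "U"}

print(call_lane([(13.1, 1.0), (10.9, 1.0)], "IL8"))
print(Counter(IL8_CALLS.values()))
print(Counter(IL10_CALLS.values()))
```

The tallies agree with the counts given in the text: seven heterozygous, three undigested and one undetermined IL8 clone, and five heterozygous, five undigested and one undetermined Il10 clone.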
The Southern blot revealed BsaBI-digest-generated DNA fragments of expected size for both hIL8:c and mIL10:c alleles, and as such it can be assumed that homologous recombination occurred in the correct genomic loci since the Southern blot probe hybridizes to the genomic sequence outside of the targeting vector. A second method was needed for confirmation of this assumption and thus PCR amplifications in and around the locus of recombination were performed. The remaining BsaBI-digested clone DNA samples were not subjected to further analysis as the first Southern blot gave sufficient positive clones for PCR confirmation and subsequent ESC culture.

3.3.8 PCR confirmation of hIL8:c and mIL10:c alleles

Concomitant to Southern blot analysis, the locus of recombination of the first eleven clones was also analyzed by PCR. Primers used to amplify the knock-in allele and the neomycin resistance cassette, as well as a primer downstream of the pA33LSL vector arm of homology, confirmed the clones identified as positive by Southern blot analysis. Seven IL8_Gpa33_JM8N4 and five Il10_Gpa33_JM8N4 ESC clones were thus identified as positive for correct homologous recombination at the genomic locus after the seventh exon of the Gpa33 gene. Figures 3.15 and 3.16 show the PCR results for the IL8_Gpa33_JM8N4 and Il10_Gpa33_JM8N4 ESC clones, respectively.

[Figure 3.15, panels A and B gel images not reproduced. Lane key rows, panel A: IL8_Gpa33_JM8N4 clone (1, 12, 15, 20, 21, 22, 26, 28, 31, 34, 35); WT JM8.N4 DNA; IL8-7/ IL8-8 primers; 1 Kb DNA ladder (0.5 μg). Panel B: IL8_Gpa33_JM8N4 clone (same clones); WT JM8.N4 DNA; NeoF/ NeoR primers; 1 Kb DNA ladder (0.5 μg).]

C.
IL8_Gpa33_JM8N4 clone WT JM8.N4 DNA PCR mastermix alone NeoF/ A33pR2 primers IL8-7/ A33pR2 primers IL8-7/ IL8-8 primers 1 Kb DNA ladder (0.5 μg)  1 12 - - + - - - - + + + - - - - -  + + -  1 12 - - + - - - - - - + + + - - -  + + -  Figure 3.15. Confirmation of homologous recombination of the IL8_A33 targeting vector and JM8N4 ESC DNA at the expected genomic locus by PCR amplification. IL8_Gpa33_JM8N4 clone DNA was amplified with Platinum Taq DNA polymerase High Fidelity using several primer combinations to confirm the correct locus of recombination, as well as the identity of IL8 itself. These include (A, C) IL8-specific primers (IL8-7/ IL8-8), (B) neomycin resistance cassette primers (NeoF/ NeoR), (C) neomycin resistance cassette or IL8 insert to past the 3’ recombination locus (NeoF/ A33pR2 and IL8-7/ A33pR2).  120  A.  Il10_Gpa33_JM8N4 clone 3 WT JM8.N4 DNA IL10F-Flag/ IL10R-Flag6 primers + 1 Kb DNA ladder (0.5 μg)  4 + -  9 11 14 15 22 23 24 26 27 - - - - - - - - - + + + + + + + + + + + - - - - - - - - - -  3 + -  4 + -  +  B.  Il10_Gpa33_JM8N4 clone WT JM8.N4 DNA NeoF/ NeoR primers 1 Kb DNA ladder (0.5 μg)  +  9 11 14 15 22 23 24 26 27 - - - - - - - - - + + + + + + + + + + + - - - - - - - - - -  C.  Il10_Gpa33_JM8N4 clone WT JM8.N4 DNA PCR mastermix alone NeoF/ A33pR2 primers ILl0F-Flag/ A33pR2 primers IL10F-Flag/ IL10R-Flag6 primers 1 Kb DNA ladder (0.5 μg)  +  3 + -  4 + -  + + -  + + -  3 + -  4 + -  + + -  + + -  3 + -  4 + -  + + -  + + -  +  Figure 3.16. Confirmation of homologous recombination of the Il10_A33 targeting vector and JM8N4 ESC DNA at the expected genomic locus by PCR amplification. Il10_Gpa33_JM8N4 clone DNA was amplified with Platinum Taq DNA polymerase High Fidelity using several primer combinations to confirm the correct locus of recombination, as well as the identity of Il10 itself. 
These include (A) Il10-specific primers (IL10F-Flag/IL10R-Flag6), (B) neomycin resistance cassette primers (NeoF/NeoR) and (C) primers spanning from the neomycin resistance cassette or Il10 insert to past the 3’ recombination locus (NeoF/A33pR2 and IL10F-Flag/A33pR2).

3.3.9 Confirmation of functional loxP sites within the IL8_Gpa33_JM8N4 and Il10_Gpa33_JM8N4 clones in vitro

Following transformation of the Cre-puromycin plasmid into IL8_Gpa33_JM8N4 and Il10_Gpa33_JM8N4 ESC clones, 48 colonies were picked for selection. Compared to the IL8_pA33LSL or Il10_pA33LSL targeting vector transformations, there were fewer high quality colonies for picking; some colonies with differentiating cells around the edges were picked to make up the numbers and give the best chance of finding at least one with a deleted neomycin resistance gene. As a result, only 11 and 22 of the colonies picked from the IL8_Gpa33_JM8N4 and Il10_Gpa33_JM8N4 ESC transformations, respectively, grew to the desired confluency following clonal expansion; this is expected as differentiating cells lose their pluripotency. Figure 3.17 shows the agarose gel electrophoresis of PCR products using gene-specific (A, B) and neomycin resistance cassette primers (C, D) for the IL8 and Il10 alleles, respectively. A clone was deemed positive if amplification occurred with the gene-specific but not the neomycin resistance cassette primer pair. All five IL8_Gpa33_JM8N4_Cre ESC clones had amplification of the IL8 insert and three of these had no amplification of the neomycin resistance cassette. Three out of five Il10_Gpa33_JM8N4_Cre ESC clones had amplification of the expected gene insert and all of these had no amplification of the neomycin resistance cassette. For the two Il10_Gpa33_JM8N4_Cre ESC clones that amplified neither the gene insert nor the neomycin resistance cassette, either there was not enough DNA for amplification or, more likely, something in the reaction inhibited amplification.
A WT JM8.N4 positive reaction would have been beneficial for this determination. Further optimization of these two samples was not performed because the other three clones demonstrated the functionality of the Cre/loxP system. The parent IL8_Gpa33_JM8N4 and Il10_Gpa33_JM8N4 ESC clones, as well as WT JM8N4 DNA, were included as controls; the parent clones had amplification of both the gene insert and the neomycin resistance cassette, and the WT DNA did not amplify with either set of primers. The loxP sites are therefore intact in both the IL8_Gpa33_JM8N4 and Il10_Gpa33_JM8N4 ESC and Cre-mediated deletion of the neomycin resistance cassette functions as expected. It was therefore assumed that this process would occur in the same manner in vivo upon breeding of IL8 or Il10 knock-in mice with Cre-expressing mice and the subsequent expression of Cre recombinase.

[Figure 3.17 image: agarose gels of PCR products, panels A to D. Lanes: IL8_A33_JM8N4-12_Cre clones 1, 13, 20, 27 and 30; parent IL8_A33_JM8N4-12; Il10_A33_JM8N4-24_Cre clones 6, 14, 15, 16 and 18; parent Il10_A33_JM8N4-24; JM8.N4 WT DNA; PCR mastermix alone; 1 Kb DNA ladder (0.5 μg).]

Figure 3.17. In vitro Cre-mediated deletion of the IL8 and Il10 targeted allele neomycin resistance cassette. The initial IL8_Gpa33_JM8N4-12 and Il10_Gpa33_JM8N4-24 clones microinjected were transformed with a Cre-expressing plasmid, which also contains puromycin resistance for selection. Five colonies from each clone were tested by amplification using Platinum PCR Supermix with gene-specific or neomycin resistance cassette primers; the parent ESC clone, WT JM8.N4 and PCR mastermix alone were included as controls. All five of the IL8_Gpa33_JM8N4-12_Cre clones had amplification of the IL8 insert with IL8-7 and IL8-8 primers (A), but only three had full deletion of the neomycin resistance cassette, shown by the lack of product with NeoF and NeoR primers (C).
Three of the Il10_Gpa33_JM8N4-24_Cre clones had amplification of the Il10 insert with IL10F-Flag and IL10R-Flag6 primers (B), and all three had deletion of the neomycin resistance cassette, shown by the lack of product with NeoF and NeoR primers (D).

3.3.10 IL8_Gpa33_JM8N4 and Il10_Gpa33_JM8N4 ESC clone microinjection

The single-cell suspension submitted for microinjection contained very round and bright ESC. The WTSI technician who performed each microinjection additionally confirmed the quality of the ESC. IL8_Gpa33_JM8N4 ESC underwent two rounds of microinjection with two different positive clones to ensure the best chance of germline transmission (i.e. the microinjections were carried out on separate days using different clones with different passage numbers). Table 3.7 summarizes the two microinjections performed for the IL8 gene construct, including the ESC clone, number of ESC microinjected, the number of embryos implanted and outcome for each recipient mouse. The first microinjection (MI2049) did not result in any of the three recipient mice getting pregnant. The repeat IL8_Gpa33_JM8N4 ESC microinjection (MI2098, Colony name LRHE: Linda Rehaume Human Interleukin-Eight) resulted in three male F0 offspring, LRHE3.1a, LRHE4.1a and LRHE4.1b, having 90%, 90% and 50% chimerism, respectively, as evaluated by staff of the WTSI RSF. LRHE3.1a developed a growth/grey film over its left eye, but was otherwise healthy. LRHE4.1a developed the same growth/grey film over both eyes and a bloated abdomen, and was culled at 4.9 weeks of age due to the enlarged abdomen. A wildtype, age-matched CBLN mouse was also culled as a control. For both mice, the eye, kidney, spleen and portions of the small and large intestine were removed, fixed in formaldehyde and embedded in wax. Upon dissection, the kidneys of the LRHE4.1a mouse were obviously swollen and full of liquid.
Haematoxylin and eosin stained sections showed relatively normal kidney cellular integrity in areas surrounding the cavity that contained the liquid. The left eye of LRHE4.1a showed abnormal formation with loss of symmetry and extension of the sclera over the cornea. No further analysis of LRHE4.1a tissue was performed. It is known that injecting too many ESC can sometimes overwhelm the embryo, resulting in abnormalities (personal communication, D. Adams), and this was therefore considered a sporadic finding.

Table 3.7. IL8_Gpa33_JM8N4 and Il10_Gpa33_JM8N4 microinjection and chimera generation summary. The embryo (3.5 dpc blastocyst) microinjection and implantation were performed by members of Team 121, WTSI. The embryo strain is C57BL/6J-Tyr<c-Brd> (100%), and the recipient strain is CBA (50%) and C57BL/6J (50%). dpc, days post coitus.

Colony prefix   ID     MI date      ESC clone injected     ESC injected/embryo   Embryos implanted/recipient   Recipient status   Chimeras (M/F)   % chimera
LRIE            2049   2009/05/19   IL8_Gpa33_JM8N4-21     5-15                  11, 10, 10                    none pregnant      n/a              n/a
LRHE            2097   2009/07/06   IL8_Gpa33_JM8N4-12     5-20                  11, 11, 11, 11                all pregnant       3M               90, 90, 50
LRIT            2050   2009/05/20   Il10_Gpa33_JM8N4-24    5-15                  11, 11, 11                    all pregnant       2F               70, 50
LRMT            2098   2009/09/10   Il10_Gpa33_JM8N4-4     5-15                  11, 10, 10                    all pregnant       3M               90, 90, 30

The remaining two F0 IL8 chimera males (LRHE3.1a, 90%; LRHE4.1b, 50%) were each mated with two CBLN females, and each fathered two litters. LRHE3.1a fathered fifteen F1 pups in the LRHE5.1 and LRHE6.1 litters and LRHE4.1b fathered sixteen F1 pups in the LRHE7.1 and LRHE8.1 litters. None of the pups from LRHE4.1b were heterozygous for the IL8 gene. However, six of the pups from the first two LRHE3.1a litters were heterozygous, giving a germline transmission rate of 40%, which is close to the theoretical 50%, but slightly lower than the 62% reported for JM8.N4 ESC (189).
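The 40% figure follows directly from the litter counts given above: six heterozygous pups out of the fifteen F1 pups in the LRHE5.1 and LRHE6.1 litters. A minimal sketch of the calculation is shown below; the function name is illustrative only, and the theoretical 50% assumes that the chimera's germline is entirely ESC-derived and that the single targeted allele segregates in a Mendelian manner.

```python
def transmission_rate(heterozygous, total):
    """Fraction of genotyped F1 pups that carry the knock-in allele."""
    if total <= 0:
        raise ValueError("total must be positive")
    return heterozygous / total


# Counts from the text: 6 heterozygous of 15 F1 pups sired by LRHE3.1a
# in the LRHE5.1 and LRHE6.1 litters.
rate = transmission_rate(6, 15)
print(f"{rate:.0%}")  # 40%
```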
Subsequently, LRHE3.1a fathered eight pups in LRHE5.2, of which only one was heterozygous; unfortunately this pup was found dead at 2.5 weeks of age (w). LRHE3.1a also fathered litters LRHE6.2b and LRHE6.2c, which produced eight and three pups, respectively, of which five were heterozygous. LRHE4.1b fathered another litter, LRHE8.2, producing ten more F1 pups, but none of these were heterozygous; germline transmission was therefore not successful with the 50% IL8 chimera. Pairings of F1 heterozygous hIL8:c/+ offspring were set up to generate mice homozygous for the IL8 insert, hIL8:c/-, and subsequently homozygous pairings have been set up. Figure 3.18 is an example of the multiplex genotyping for the litters LRHE15.1 and LRHE13.1, which is representative of all LRHE offspring. Table 3.8 summarizes the LRHE colony, which includes all mice regardless of fate, as of 2010/04/20.

[Figure 3.18 image: agarose gel of multiplex PCR products. Genotypes called: LRHE15.1a, Hom; b, Het; c, Hom; d, Hom; e, Het. LRHE13.1a, Het; b, Hom; c, WT; d, Hom; e, Het; f, Hom; g, Het; h, WT. Controls: IL8_A33_JM8N4 ESC DNA (Het), JM8.N4 WT DNA (WT), tweezer control and NF-H2O; 1 Kb DNA ladder (0.5 μg).]

Figure 3.18. IL8 F1 (LRHE) offspring genotyping. Genomic DNA was amplified with Platinum PCR Supermix in a multiplex reaction consisting of an IL8 insert forward primer (IL8-7), a Gpa33 exon-7 forward primer (A33-E7) and a reverse Gpa33 3’-UTR primer (A33-3U). The PCR product obtained from the wildtype allele is 331 bp and that of the hIL8:c allele is 664 bp. Both products are amplified for heterozygous mice, only the 664 bp product for homozygous mice and only the 331 bp product for wildtype mice. This genotyping is representative of all LRHE offspring.

Table 3.8. The IL8 LRHE mouse colony. Three male chimeras were produced out of the five F0 pups born to the four recipient mice (M00297577-9, M00297590).
One of the 90% chimeras was culled due to sickness and the 50% chimera was culled due to unsuccessful germline transmission. The remaining 90% chimera (LRHE3.1a) had successful germline transmission, determined by PCR, upon mating with CBLN females resulting in an established breeding colony. The age of each mouse or age at time of death is given in weeks. The colony is current as of 2010/04/20. U, unsexed; F, female; M, male. Name LRHE3.1a LRHE5.1b LRHE5.1e LRHE5.1c LRHE6.1d LRHE6.1b LRHE6.1e LRHE6.2a LRHE6.2f LRHE6.2b LRHE6.2h LRHE10.1g LRHE10.1a LRHE10.1b LRHE10.1f LRHE9.2d LRHE9.2c LRHE9.2e LRHE13.1b LRHE13.1f LRHE13.1d LRHE15.1c LRHE15.1a LRHE15.1d LRHE10.3a LRHE10.3b LRHE10.3c LRHE10.3d LRHE10.3e LRHE15.1b LRHE6.4b LRHE6.4d LRHE15.2a LRHE15.2b LRHE15.2c LRHE15.2d LRHE15.2e LRHE15.2f LRHE15.2g LRHE6.4a LRHE6.4c LRHE10.1e LRHE10.2b LRHE10.2h LRHE10.2i LRHE13.1c LRHE13.1h LRHE5.1a LRHE5.1d LRHE5.1f LRHE5.1g LRHE5.2a LRHE5.2b  Generation F0 F1 F1 F1 F1 F1 F1 F1 F1 F1 F1 F2 F2 F2 F2 F2 F2 F2 F2 F2 F2 F3 F3 F3 F2 F2 F2 F2 F2 F3 F1 F1 F3 F3 F3 F3 F3 F3 F3 F1 F1 F2 F2 F2 F2 F2 F2 F1 F1 F1 F1 F1 F1  Sex M M F M F M F M F M F F M M F F M F M F M F M F M M M F F M M F U U U U U U U M F F M M F M F M M F F M M  Coat Albino/Black (90%) Black Black Black Black Black Black Black Black Black Black Black Black Black Black Black Black Black Black Black Black Black Black Black Black Black Black Black Black Black Black Black Black Black Black Black Black Black Black Black Black Black Black Black Black Black Black Black Black Black Black Black Black  Genotype hIL8:chimera 90% hIL8:c/+ hIL8:c/+ hIL8:c/+ hIL8:c/+ hIL8:c/+ hIL8:c/+ hIL8:c/+ hIL8:c/+ hIL8:c/+ hIL8:c/+ hIL8:c/+ hIL8:c/hIL8:c/hIL8:c/+ hIL8:c/hIL8:c/hIL8:c/hIL8:c/hIL8:c/hIL8:c/hIL8:c/hIL8:c/hIL8:c/hIL8:c/+ hIL8:c/+ hIL8:c/+ hIL8:c/+ hIL8:c/+ hIL8:c/+ hIL8:c/+ hIL8:c/+ hIL8:untested hIL8:untested hIL8:untested hIL8:untested hIL8:untested hIL8:untested hIL8:untested +/+ +/+ +/+ +/+ +/+ +/+ +/+ +/+ +/+ +/+ +/+ +/+ +/+ +/+  
Status Alive Alive Alive Alive Alive Alive Alive Alive Alive Alive Alive Alive Alive Alive Alive Alive Alive Alive Alive Alive Alive Alive Alive Alive Alive Alive Alive Alive Alive Alive Alive Alive Alive Alive Alive Alive Alive Alive Alive To be culled To be culled Culled Culled Culled Culled Culled Culled Culled Culled Culled Culled Culled Culled  Age 38.7 28.4 28.4 28.4 25.9 25.9 25.9 18.7 18.7 18.7 18.7 15.7 15.7 15.7 15.7 11.1 11.1 11.1 6.6 6.6 6.6 6.9 6.9 6.9 3.4 3.4 3.4 3.4 3.4 6.9 5.6 5.6 1.1 1.1 1.1 1.1 1.1 1.1 1.1 5.6 5.6 3 6.1 6.1 6.1 5.9 5.9 10.6 6.9 10.6 6.9 3.1 3.1  Mating LRHE6 LRHE9 LRHE9 LRHE10 LRHE10 LRHE11 LRHE11 LRHE12 LRHE12 LRHE13 LRHE13 LRHE14 LRHE14 LRHE15 LRHE15 LRHE16 LRHE17 LRHE17 LRHE18 LRHE18 LRHE19 LRHE19 LRHE20 LRHE20  Full Strain Genetic Background C57BL/6N(100%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) 
C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%)  126  Name LRHE5.2c LRHE5.2d LRHE5.2e LRHE5.2f LRHE5.2h LRHE6.1a LRHE6.1c LRHE6.1f LRHE6.1g LRHE6.1h LRHE6.2c LRHE6.2e LRHE6.2g LRHE7.1a LRHE7.1b LRHE7.1c LRHE7.1d LRHE7.1e LRHE7.1f LRHE7.1g LRHE8.1a LRHE8.1b LRHE8.1c LRHE8.1d LRHE8.1e LRHE8.1f LRHE8.1g LRHE8.1h LRHE8.1i LRHE8.2a LRHE8.2b LRHE8.2c LRHE8.2d LRHE8.2e LRHE8.2f LRHE8.2g LRHE8.2h LRHE8.2i LRHE8.2j LRHE10.1c LRHE10.1d LRHE10.2a LRHE10.2c LRHE10.2f LRHE10.2g LRHE10.2j LRHE13.1a LRHE13.1e LRHE13.1g LRHE15.1e LRHE5.2g LRHE6.2d LRHE9.2a LRHE9.2b LRHE10.2d LRHE10.2e LRHE3.1b LRHE4.1c LRHE4.1b LRHE4.1a LRHE11.1a LRHE11.1b LRHE11.1c LRHE11.1d LRHE11.2a LRHE14.1a LRHE14.1b LRHE15.1f LRHE15.1g  Generation F1 F1 F1 F1 F1 F1 F1 F1 F1 F1 F1 F1 F1 F1 F1 F1 F1 F1 F1 F1 F1 F1 F1 F1 F1 F1 F1 F1 F1 F1 F1 F1 F1 F1 F1 F1 F1 F1 F1 F2 F2 F2 F2 F2 F2 F2 F2 F2 F2 F3 F1 F1 F2 F2 F2 F2 F0 F0 F0 F0 F2 F2 F2 F2 F2 F3 F3 F3 F3  Sex F F F F F M F F F F M F F M M M M M M F M M M M M M M F F M M M M M F F F F F M M M M M M F M F F F F M M M M M U U M M U U U U U U U U U  Coat Black Black Black Black Black Black Black Black Black Black Black Black Black Black Black Black Black Black Black Black Black Black Black Black Black Black Black Black Black Black Black Black Black Black Black Black Black Black Black Black Black Black Black Black Black Black Black Black Black Black Black Black Black Black Black Black Albino 
Albino Albino/Black (50%) Albino/Black (90%) Black Black Black Black Black Black Black Black Black  Genotype +/+ +/+ +/+ +/+ +/+ +/+ +/+ +/+ +/+ +/+ +/+ +/+ +/+ +/+ +/+ +/+ +/+ +/+ +/+ +/+ +/+ +/+ +/+ +/+ +/+ +/+ +/+ +/+ +/+ +/+ +/+ +/+ +/+ +/+ +/+ +/+ +/+ +/+ +/+ hIL8:c/+ hIL8:c/+ hIL8:c/+ hIL8:c/+ hIL8:c/+ hIL8:c/+ hIL8:c/+ hIL8:c/+ hIL8:c/+ hIL8:c/+ hIL8:c/+ hIL8:c/+ hIL8:c/+ hIL8:c/+ hIL8:c/+ hIL8:c/hIL8:c/hIL8:chimera hIL8:chimera hIL8:chimera 50% hIL8:chimera 90% hIL8:untested hIL8:untested hIL8:untested hIL8:untested hIL8:untested hIL8:untested hIL8:untested hIL8:untested hIL8:untested  Status Culled Culled Culled Culled Culled Culled Culled Culled Culled Culled Culled Culled Culled Culled Culled Culled Culled Culled Culled Culled Culled Culled Culled Culled Culled Culled Culled Culled Culled Culled Culled Culled Culled Culled Culled Culled Culled Culled Culled Culled Culled Culled Culled Culled Culled Culled Culled Culled Culled Culled Found Dead Culled Culled Culled Culled Culled Culled Culled Culled Sick Culled Sick Missing Missing Missing Missing Missing Found Dead Found Dead Missing Missing  Age Mating 3.1 3.1 3.1 3.1 3.1 4 4 4 4 4 5.9 5.9 5.9 6.7 6.7 6.7 6.7 6.7 6.7 7.4 6.7 6.7 6.7 6.7 6.7 6.7 6.7 7.4 7.4 3.1 3.1 3.1 3.1 3.1 3.1 3.1 3.1 3.1 3.1 12 12 6.1 6.1 6.1 6.1 6.1 5.9 5.9 5.9 6.1 2.4 15 7.4 7.4 6.1 6.1 3.9 2.6 32.1 4.9 0.4 0.4 0.4 0.4 1.7 0.3 0.3 1.1 1.1  Full Strain Genetic Background C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) 
C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6NTac/USA(50%);C57BL/6N(50%) C57BL/6NTac/USA(50%);C57BL/6N(50%) C57BL/6NTac/USA(50%);C57BL/6N(50%) C57BL/6NTac/USA(50%);C57BL/6N(50%) C57BL/6NTac/USA(50%);C57BL/6N(50%) C57BL/6NTac/USA(50%);C57BL/6N(50%) C57BL/6NTac/USA(50%);C57BL/6N(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(100%) C57BL/6N(100%) C57BL/6N(100%) C57BL/6N(100%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%)  127  Name LRHE16.1a LRHE6.3a LRHE6.3b LRHE6.3c LRHE9.1a LRHE9.1b LRHE9.1c LRHE9.1d LRHE9.1e M00297577 M00297578 M00297579 M00297590 CBLN907.2e CBLN907.2f CBLN907.2h CBLN907.2i  Generation F3 F1 F1 F1 F2 F2 F2 F2 F2  Sex U 
U U U U U U U U F F F F F F F F  Coat Black Black Black Black Black Black Black Black Black  Black Black Black Black  Genotype hIL8:untested hIL8:untested hIL8:untested hIL8:untested hIL8:untested hIL8:untested hIL8:untested hIL8:untested hIL8:untested +/+ +/+ +/+ +/+ +/+ +/+ +/+ +/+  Status Found Dead Missing Missing Missing Missing Missing Found Dead Found Dead Found Dead Culled Culled Culled Culled Alive Alive Culled Culled  Age Mating 0.9 0.9 0.9 0.9 0.6 0.6 0 0 0  37.9 LRHE5 37.9 LRHE6 34.1 34.1  Full Strain Genetic Background C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) CBA/Wtsi;C57BL/6JIco CBA/Wtsi;C57BL/6JIco CBA/Wtsi;C57BL/6JIco CBA/Wtsi;C57BL/6JIco C57BL/6NTac/USA(100%) C57BL/6NTac/USA(100%) C57BL/6NTac/USA(100%) C57BL/6NTac/USA(100%)  Il10_Gpa33_JM8N4 ESC also underwent two rounds of microinjection with two different positive clones. Table 3.7 summarizes the two microinjections performed for the Il10 gene construct including the number of ESC microinjected, the number of embryos implanted and outcome for each recipient mouse. The first microinjection (ID 2050, Colony name LRIT: Linda Rehaume Interleukin-Ten) resulted in two female F0 offspring LRIT2.1a and LRIT2.1b having 70% and 50% chimerism, respectively. LRIT2.1a and LRIT2.1b were both mistakenly paired with C57BL6c-c- (prefix CALB) males, which are albino, and each had a litter of five and seven offspring, respectively, but in addition to them not being the desired genetic background, they were all albino and therefore culled. The chimera females were then both paired with male CBLN males. 
LRIT2.1a produced an F1 litter, LRIT6.1, containing seven pups, but none of these were heterozygous for the exogenous Il10 gene; six of these pups were genotyped, but one went missing so genotyping was not possible. LRIT2.1a produced two more litters, LRIT6.2 and LRIT6.3, with four and two pups, respectively, but none of those were heterozygous for the mIL10:c allele. Unfortunately LRIT2.1a was found dead at 30.8w and was thought to have died during littering. This colony was terminated for several reasons: the chimeric mice were seven months old, which is getting too old for breeding; there was no germline transmission of the mIL10:c allele from the 70% chimera (LRIT2.1a), which then died during what had already been intended as her final littering; and the 50% chimera (LRIT2.1b) did not produce any litters with the CBLN male. Table 3.9 summarizes the LRIT colony.

Table 3.9. The Il10 LRIT mouse colony. Two female chimeras were produced out of the eleven F0 pups born to the three recipient mice (M00267240-2). The chimeras were first mistakenly mated with CALB males, producing only albino F1 offspring, which were not genotyped by PCR. Mating of the chimeras with CBLN males produced only wildtype (+/+) offspring, determined by PCR. The age at death of each mouse is given in weeks. U, unsexed; F, female; M, male.
Name LRIT1.1a LRIT1.1b LRIT1.1c LRIT1.1d LRIT1.1e LRIT1.1f LRIT2.1a LRIT2.1b LRIT2.1c LRIT2.1d LRIT2.1e LRIT4.1a LRIT4.1b LRIT4.1c LRIT4.1d LRIT4.1e LRIT5.1a LRIT5.1b LRIT5.1c LRIT5.1d LRIT5.1e LRIT5.1f LRIT5.1g LRIT6.1a LRIT6.1b LRIT6.1c LRIT6.1d LRIT6.1e LRIT6.1f LRIT6.1g LRIT6.2a LRIT6.2b LRIT6.2c LRIT6.2d LRIT6.3a LRIT6.3b M00267240 M00267241 M00267242 CALB2596.1d CALB2596.1e CBLN779.2c CBLN830.3a  Generation F0 F0 F0 F0 F0 F0 F0 F0 F0 F0 F0 F1 F1 F1 F1 F1 F1 F1 F1 F1 F1 F1 F1 F1 F1 F1 F1 F1 F1 F1 F1 F1 F1 F1 F1 F1  F22 F22  Sex U U U U U U F F U U U U U U U U U U U U U U U M M M M F F F M M M M F F F F F M M M M  Coat Albino Albino Albino Albino Albino Albino Albino/Black (70%) Albino/Black (50%) Albino Albino Albino Albino Albino Albino Albino Albino Albino Albino Albino Albino Albino Albino Albino Black Black Black Black Black Black Black Black Black Black Black Black Black  Albino Albino Black Black  Genotype mIL10:chimera mIL10:chimera mIL10:chimera mIL10:chimera mIL10:chimera mIL10:chimera mIL10:chimera 70% mIL10:chimera 50% mIL10:chimera mIL10:chimera mIL10:chimera mIL10:untested mIL10:untested mIL10:untested mIL10:untested mIL10:untested mIL10:untested mIL10:untested mIL10:untested mIL10:untested mIL10:untested mIL10:untested mIL10:untested +/+ +/+ +/+ +/+ +/+ +/+ mIL10:untested +/+ +/+ +/+ +/+ +/+ +/+ +/+ +/+ +/+ +/+ +/+ +/+ +/+  Fate Culled Culled Culled Culled Culled Culled Found Dead Culled Culled Culled Culled Culled Culled Culled Culled Culled Culled Culled Culled Culled Culled Culled Culled Culled Culled Culled Culled Culled Culled Missing Culled Culled Culled Culled Culled Culled Culled Culled Culled Culled Culled Culled Culled  Age 1.4 1.4 1.4 1.4 1.4 1.4 30.7 31.3 3.6 3.6 3.6 1.4 1.4 1.4 1.4 1.4 1.3 1.3 1.3 1.3 1.3 1.3 1.3 4.3 4.3 4.3 4.3 4.3 4.3 2.3 6.3 6.3 6.3 6.3 2.3 2.3  Full Strain Genetic Background C57BL/6N(100%) C57BL/6N(100%) C57BL/6N(100%) C57BL/6N(100%) C57BL/6N(100%) C57BL/6N(100%) C57BL/6N(100%) C57BL/6N(100%) C57BL/6N(100%) 
C57BL/6N(100%) C57BL/6N(100%) C57BL/6J-Tyr<c-Brd>(50%);C57BL/6N(50%) C57BL/6J-Tyr<c-Brd>(50%);C57BL/6N(50%) C57BL/6J-Tyr<c-Brd>(50%);C57BL/6N(50%) C57BL/6J-Tyr<c-Brd>(50%);C57BL/6N(50%) C57BL/6J-Tyr<c-Brd>(50%);C57BL/6N(50%) C57BL/6J-Tyr<c-Brd>(50%);C57BL/6N(50%) C57BL/6J-Tyr<c-Brd>(50%);C57BL/6N(50%) C57BL/6J-Tyr<c-Brd>(50%);C57BL/6N(50%) C57BL/6J-Tyr<c-Brd>(50%);C57BL/6N(50%) C57BL/6J-Tyr<c-Brd>(50%);C57BL/6N(50%) C57BL/6J-Tyr<c-Brd>(50%);C57BL/6N(50%) C57BL/6J-Tyr<c-Brd>(50%);C57BL/6N(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) C57BL/6N(50%);C57BL/6NTac/USA(50%) CBA/Wtsi;C57BL/6JIco CBA/Wtsi;C57BL/6JIco CBA/Wtsi;C57BL/6JIco 11 C57BL/6J-Tyr<c-Brd>(100%) 11 C57BL/6J-Tyr<c-Brd>(100%) 22.3 C57BL/6NTac/USA(100%) 23.1 C57BL/6NTac/USA(100%)

The second Il10_Gpa33_JM8N4 ESC microinjection (ID 2098) was more successful, producing three males, LRMT1.1a, LRMT1.1b and LRMT1.1c, having 90%, 90% and 30% chimerism, respectively. LRMT1.1a and LRMT1.1b were put with two CBLN females each, but the chances of LRMT1.1c transmitting the targeted allele to offspring are low, as observed for the 50% IL8 chimera, so this mouse was only paired with one CBLN female. Interestingly, only the CBLN female paired with the 30% chimera (LRMT1.1c) produced any offspring from these initial matings; one pup was born but found dead at 0.8w. All three chimeras were put with younger CBLN females, again two each for the 90% chimeras and one for the 30%, on two different occasions, but again only the 30% chimera produced any offspring and thus the two 90% chimeras were presumed sterile.
Figure 3.19 is an example of the multiplex genotyping for the litter LRMT18.1, which is representative of both LRIT and LRMT offspring. Table 3.10 summarizes the LRMT colony, which has been terminated. Repeat microinjections were carried out on 2010/04/23 for two more Il10_Gpa33_JM8N4 ESC clones.

[Figure 3.19 image: agarose gel of multiplex PCR products. Genotypes called: LRMT18.1a to LRMT18.1h, all WT. Controls: Il10_A33_JM8N4 ESC DNA (Het), JM8.N4 WT DNA (WT), tweezer control, NF-H2O and 5X orange G loading buffer; 1 Kb DNA ladder (0.5 μg).]

Figure 3.19. Il10 F0 chimera (LRMT) offspring genotyping. Genomic DNA was amplified with Platinum PCR Supermix in a multiplex reaction consisting of an Il10 insert forward primer (IL10-2), a Gpa33 exon-7 forward primer (A33-E7) and a reverse Gpa33 3’-UTR primer (A33-3U). The PCR product obtained from the wildtype allele is 331 bp and that of the mIL10:c allele is 686 bp. All eight mice from this litter (LRMT18.1a-h) were wildtype. This genotyping is representative of all LRIT and LRMT offspring.

Table 3.10. The Il10 LRMT mouse colony. Three male chimeras were produced out of the eleven F0 pups born to the three recipient mice (M00320092, M00320120, M00320122). The two 90% chimeras were each trio’d with two CBLN females three separate times, but did not produce any offspring and were therefore presumed sterile. The 30% chimera was paired with one CBLN female three times but only produced wildtype (+/+) offspring, determined by PCR. The age at death of each mouse is given in weeks. U, unsexed; F, female; M, male.
Name | Generation | Sex | Coat | Genotype | Fate | Age | Full Strain
LRMT1.1a | F0 | M | Albino/Black (90%) | mIL10:chimera 90% | To be culled | 29 | C57BL/6N(100%)
LRMT1.1b | F0 | M | Albino/Black (90%) | mIL10:chimera 90% | To be culled | 29 | C57BL/6N(100%)
LRMT1.1c | F0 | M | Albino/Black (30%) | mIL10:chimera 30% | To be culled | 29 | C57BL/6N(100%)
LRMT1.1d | F0 | U | Albino | mIL10:chimera | Culled | 1.4 | C57BL/6N(100%)
LRMT1.1e | F0 | U | Albino | mIL10:chimera | Culled | 1.4 | C57BL/6N(100%)
LRMT1.1f | F0 | U | Albino | mIL10:chimera | Culled | 1.4 | C57BL/6N(100%)
LRMT1.1g | F0 | U | Albino | mIL10:chimera | Culled | 1.4 | C57BL/6N(100%)
LRMT8.1a | F1 | U | Black | mIL10:untested | Found Dead | 0.7 | C57BL/6N(50%);C57BL/6NTac/USA(50%)
LRMT18.1a | F1 | M | Black | +/+ | Culled | 3.3 | C57BL/6N(50%);C57BL/6NTac/USA(50%)
LRMT18.1b | F1 | M | Black | +/+ | Culled | 3.3 | C57BL/6N(50%);C57BL/6NTac/USA(50%)
LRMT18.1c | F1 | M | Black | +/+ | Culled | 3.3 | C57BL/6N(50%);C57BL/6NTac/USA(50%)
LRMT18.1d | F1 | M | Black | +/+ | Culled | 3.3 | C57BL/6N(50%);C57BL/6NTac/USA(50%)
LRMT18.1e | F1 | M | Black | +/+ | Culled | 3.3 | C57BL/6N(50%);C57BL/6NTac/USA(50%)
LRMT18.1f | F1 | F | Black | +/+ | Culled | 3.3 | C57BL/6N(50%);C57BL/6NTac/USA(50%)
LRMT18.1g | F1 | F | Black | +/+ | Culled | 3.3 | C57BL/6N(50%);C57BL/6NTac/USA(50%)
LRMT18.1h | F1 | F | Black | +/+ | Culled | 3.3 | C57BL/6N(50%);C57BL/6NTac/USA(50%)
LRMT19.1a | F1 | M | Black | +/+ | Culled | 3.1 | C57BL/6N(50%);C57BL/6NTac/USA(50%)
LRMT19.1b | F1 | F | Black | +/+ | Culled | 3.1 | C57BL/6N(50%);C57BL/6NTac/USA(50%)
LRMT19.1c | F1 | F | Black | +/+ | Culled | 3.1 | C57BL/6N(50%);C57BL/6NTac/USA(50%)
LRMT19.1d | F1 | F | Black | +/+ | Culled | 3.1 | C57BL/6N(50%);C57BL/6NTac/USA(50%)
LRMT19.1e | F1 | F | Black | +/+ | Culled | 3.1 | C57BL/6N(50%);C57BL/6NTac/USA(50%)
LRMT19.1f | F1 | F | Black | +/+ | Culled | 3.1 | C57BL/6N(50%);C57BL/6NTac/USA(50%)
LRMT19.1g | F1 | F | Black | +/+ | Culled | 3.1 | C57BL/6N(50%);C57BL/6NTac/USA(50%)
LRMT19.1h | F1 | F | Black | +/+ | Culled | 3.1 | C57BL/6N(50%);C57BL/6NTac/USA(50%)
M00320092 | F1 | F | Agouti | +/+ | Culled | 11.1 | C57BL/6JIco(50%);CBA/Wtsi(50%)
M00320120 | F1 | F | Agouti | +/+ | Culled | 11.1 | CBA/Wtsi(50%);C57BL/6JIco(50%)
M00320122 | F1 | F | Agouti | +/+ | Culled | 13.3 | CBA/Wtsi(50%);C57BL/6JIco(50%)
CBLN1039.2h | | F | Black | +/+ | Culled | 22.3 | C57BL/6NTac/USA(100%)
CBLN1039.2i | | F | Black | +/+ | Culled | 18 | C57BL/6NTac/USA(100%)
CBLN1039.2k | | F | Black | +/+ | Culled | 18 | C57BL/6NTac/USA(100%)
CBLN1208.1k | | F | Black | +/+ | Culled | 17.4 | C57BL/6NTac/USA(100%)
CBLN1370.3h | | F | Black | +/+ | To be culled | 14.6 | C57BL/6NTac/USA(100%)
CBLN1421.1c | | F | Black | +/+ | Culled | 12.7 | C57BL/6NTac/USA(100%)
CBLN1421.1d | | F | Black | +/+ | Culled | 12.7 | C57BL/6NTac/USA(100%)
CBLN1421.1e | | F | Black | +/+ | Culled | 12.7 | C57BL/6NTac/USA(100%)
CBLN1421.1f | | F | Black | +/+ | Culled | 12.7 | C57BL/6NTac/USA(100%)
CBLN1659.1a | | F | Black | +/+ | To be culled | 15.2 | C57BL/6NTac/USA(100%)
CBLN1659.1b | | F | Black | +/+ | To be culled | 15.2 | C57BL/6NTac/USA(100%)
CBLN1659.1c | | F | Black | +/+ | To be culled | 15.2 | C57BL/6NTac/USA(100%)
CBLN1659.1d | | F | Black | +/+ | To be culled | 15.2 | C57BL/6NTac/USA(100%)
CBLN1659.1e | | F | Black | +/+ | To be culled | 15.2 | C57BL/6NTac/USA(100%)
CBLN918.4b | | F | Black | +/+ | Culled | 13 | C57BL/6NTac/USA(100%)
CBLN931.3c | | F | Black | +/+ | Culled | 17.7 | C57BL/6NTac/USA(100%)

All data from microinjection to colony maintenance is freely available on the WTSI Mouse Database. I have included all relevant information in the Figures, Tables, and the Appendix. Additionally the ESCs and mice are freely available to the research community as part of the WTSI data release policy.

3.4  Discussion

To generate mouse models for the study of the roles of IL8 and Il10 in intestinal infection, I set out to create four knock-in mouse strains: two containing IL8 and two containing Il10. Human intestinal epithelial cells are a major source of IL8 during infection by some enteric pathogens (78, 83), therefore it was the intent of this project to generate mice expressing IL8 from these cells. Il10 is also produced by murine intestinal epithelial cells (190), and even though it is still not clear which cells are the most important source of Il10 (65), there is also strong evidence of Il10 expression in intestinal lymphocytes (174).
Regardless of the main cellular sources of IL8 and Il10 during intestinal infection, their presence in the inflammatory milieu basolateral to the epithelial barrier has been confirmed (78, 173, 174). Many events, including bacterial adhesion and translocation as well as host recognition and signalling, must occur before intestinal epithelial cells produce cytokines. Generating a mouse in which the production of IL8 or Il10 occurs in ‘all’ cells simultaneously would not only fail to mimic events in man, but would also likely lead to overwhelming inflammation or immunosuppression, respectively, and overshadow the course of infection. As well, the production of either of these cytokines from birth would likely skew cellular differentiation and development, again making interpretation of the results difficult. Therefore, both the source of IL8 or Il10 and the timing of their production were factors over which control was desirable.

The generation of mice expressing IL8 or Il10 in the intestinal tract was approached using two different methods. The first was to generate a knock-in behind a ubiquitously expressed promoter and then cross with an intestinal-specific Cre-expressing mouse, and the second was to generate the knock-in behind an intestinal-specific promoter and then cross with a general Cre-expressing mouse. The advantage of the first method is that the original transgenic mouse is more versatile for the wider research community and can be used to generate mice expressing IL8 or Il10 in the tissue of interest, depending on which Cre-expressing mouse the transgenic is crossed with. The advantage of the second method is that the intestinal-specific nature of IL8 or Il10 expression is created in the original mouse and there is no need to rely on the availability of an appropriate intestinal-specific Cre-expressing mouse.
Both of these mouse models are applicable to the study of enteric diseases and of the mechanism of action of defensins and other host defence peptides. All of the above factors were considered in the design of the knock-in mice; even though it might not be possible to directly mimic the production of either cytokine during infection, there is still the opportunity to probe their function through controlled experimentation.

Two targeting vector systems were chosen for each gene: the pBigT/pROSA26PA two-vector system targets the ROSA26 locus, which allows ubiquitous expression of the inserted gene (191, 192); and the pA33LSL vector targets the Gpa33 gene (177), which is located on Chromosome 1 and is constitutively expressed in the mouse intestine. Both IL8 and Il10 were successfully cloned into the pA33LSL vector, and the integrity of the genes was confirmed by restriction enzyme digestion, PCR, and sequencing. However, cloning of both IL8 and Il10 into the pBigT/pROSA26PA plasmids was unsuccessful, and this approach will not be discussed further.

The IL8- and Il10-containing pA33LSL constructs were transfected into mouse embryonic stem cells from the JM8 line, and cells in which targeting events had occurred were selected by growth in Geneticin, resistance to which was conferred by the targeting vector. For both IL8 and Il10, numerous colonies were subsequently isolated and expanded, and successful targeting events were confirmed by Southern blotting and PCR. A subset of the targeted clones was selected for microinjection into mouse blastocysts, for the generation of knock-in mice. Two rounds of microinjection were performed with both the IL8- and Il10-targeted ESC, and independent ESC clones were used for each round. The second round of microinjections, LRHE, produced three male chimeras, one of which transmitted the hIL8:c allele to its offspring.
A successful breeding colony has been established, producing both conditional heterozygous and conditional homozygous mice; colony expansion will continue until sufficient numbers have been generated to start breeding with vil-Cre-ERT2 mice. Germline transmission did not occur for the Il10-targeted ESC, but these have been microinjected again in a final attempt to obtain germline transmission.

The purpose of the Cre-recombinase-conditional system was to enable control over the context of expression of my genes of interest. The constitutive nature of Gpa33 expression throughout the intestine is a potential disadvantage of this construct if there is also concomitant IL8 or Il10 expression, which would likely overwhelm the intestinal mucosa and skew cellular differentiation. Any further immune system challenge (e.g. bacterial infection) may be difficult to assess because of the ensuing chronic hyper- or hypo-inflammation, respectively. However, the advantage of using the pA33LSL targeting vector is that the resulting IL8 and Il10 mice are conditional knock-in mutants, and full expression of either cytokine occurs only with deletion of the neomycin resistance cassette. Read-through of the neomycin resistance cassette polyA tail may occur, resulting in transcription of IL8 or Il10, but this should occur only at a very low frequency. The neomycin resistance cassette was utilized during ESC transfection with the pA33LSL vector to ensure survival of those cells that take up the vector and undergo homologous recombination. Following the expansion of surviving ESC and the identification of correctly targeted clones, the neomycin resistance cassette is no longer necessary for general growth. It does, however, play a role in the conditional nature of the resulting mice; deletion of the neomycin resistance cassette in vivo, facilitated by the position and orientation of the loxP sites of pA33LSL, permits transcription of Gpa33-IL8 or Gpa33-Il10.
I verified that the loxP sites of the targeted alleles were intact and that the Cre-loxP system functioned in vitro to delete the neomycin resistance cassette. It is therefore assumed that these sites will be intact in the resulting mice and that Cre-mediated neomycin resistance cassette deletion will occur in vivo following breeding with vil-Cre-ERT2 mice. The ESC colonies that formed following transfection with the Cre-puromycin plasmid had an increased number of differentiating cells compared to the pA33LSL transfection, which is an indication of lower ESC health. In addition to creating conditional knock-in mice, in vivo Cre-mediated neomycin resistance cassette deletion negates the need for a second electroporation, thereby increasing the rate of germline transmission, since the percentage of pluripotent ESC can decrease with each manipulation.

Expression of IL8 or Il10 in vivo requires breeding of the conditional IL8 and Il10 mice with those expressing Cre recombinase. In order to have full control over IL8 and Il10 expression, the expression of Cre recombinase has to be inducible and ideally restricted to the intestine, to prevent any Gpa33-IL8 or Gpa33-Il10 transcription in the stomach or bladder, however negligible this might be. A transgenic inducible C57BL/6N Cre-expressing mouse, vil-Cre-ERT2, has been generated that expresses Cre recombinase under the control of a tamoxifen-dependent villin (vil) promoter (193). Administration of tamoxifen, which is a selective estrogen receptor modulator (SERM), induces Cre expression throughout the intestinal tract, with high expression in the small intestine and lower expression in the large intestine (193). However, when these mice were crossed with Cre-dependent β-galactosidase reporter mice, X-gal (5-bromo-4-chloro-3-indolyl-β-D-galactopyranoside) staining was visible throughout both the small and large intestine (193).
Additionally, Cre expression was observed in epithelial cells in both the crypt and villus (193). The use of tamoxifen to induce expression of Cre recombinase is preferable to more traditional tetracycline-inducible systems, since the antibiotic properties of tetracycline would interfere with bacterial infection challenges. Crossing of the conditional IL8 and Il10 mice with vil-Cre-ERT2 mice will allow for controlled IL8 or Il10 expression.

Future experiments to study the role of IL8 in enteric Shigella and Salmonella infections will be discussed; those involving Il10 specifically will not be discussed since the mice have not been generated. However, the Il10 conditional mice would also have been studied in parallel with the IL8 conditional mice in the described infections, so that the roles of these cytokines could be compared in enteric infection as well as in the administration of peptide therapeutics. The ability to control IL8 or Il10 expression allows examination of cell activation kinetics, cellular recruitment and disease pathology following infectious challenge. Additionally, the effects of IL8 or increased Il10 production on the efficacy of host defence peptides will be evaluated.

There are two ways to control IL8 or Il10 expression in this system. The first is the breeding of conditional mice with vil-Cre-ERT2 mice for tamoxifen-induced, Cre-mediated neomycin resistance cassette deletion, as previously described. The second is the infection of conditional mice with Cre-expressing bacteria. S. typhimurium harbouring the plasmid pSB1881 expresses a SopE-Cre recombinase fusion protein (194). SopE, as part of the SPI-1 TTSS, is injected into epithelial cells upon infection (82, 194). The background of the S. typhimurium strain plays a significant role in the success of this system, as the strain must be virulent enough to replicate following invasion but must not kill the epithelial cells before IL8 or Il10 is expressed (194).
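The two routes to cytokine expression described above can be summarized as a small truth-table model. This is only an illustrative sketch: the class and field names are hypothetical, and the model assumes that any Cre activity deletes the neomycin resistance cassette, ignoring low-frequency read-through of the cassette and the tissue restriction of the villin promoter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CellState:
    """Hypothetical description of one cell in a conditional knock-in mouse."""
    gpa33_promoter_active: bool   # True where Gpa33 is expressed (intestinal epithelium)
    has_conditional_allele: bool  # carries the loxP-flanked knock-in allele (e.g. hIL8:c)
    has_vil_cre_ert2: bool        # inherited the vil-Cre-ERT2 transgene
    tamoxifen_given: bool         # tamoxifen administered to the animal
    sope_cre_delivered: bool      # SopE-Cre injected by S. typhimurium carrying pSB1881


def cassette_deleted(cell: CellState) -> bool:
    """Assumption: Cre activity from either route removes the neomycin cassette."""
    tamoxifen_route = cell.has_vil_cre_ert2 and cell.tamoxifen_given
    infection_route = cell.sope_cre_delivered
    return cell.has_conditional_allele and (tamoxifen_route or infection_route)


def cytokine_expressed(cell: CellState) -> bool:
    """The knock-in is transcribed only from an active Gpa33 promoter after deletion."""
    return cell.gpa33_promoter_active and cassette_deleted(cell)


if __name__ == "__main__":
    infected_epithelial_cell = CellState(
        gpa33_promoter_active=True,
        has_conditional_allele=True,
        has_vil_cre_ert2=False,
        tamoxifen_given=False,
        sope_cre_delivered=True,
    )
    print(cytokine_expressed(infected_epithelial_cell))
```

In this sketch the infection route switches expression on only in cells that actually received SopE-Cre, whereas the tamoxifen route applies to every Gpa33-expressing cell of a mouse carrying both alleles.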
The wild type strain SL1344 is hypothesized to be too virulent for the in vivo efficacy of this system (194), so pSB1881 will be transformed into S. typhimurium strain M525, which has moderate virulence compared to SL1344 or C5 strains, for a trial infection of the conditional IL8 mice. (NB. Between submission and defence of this thesis, the experiments described above were performed. The results are shown in Appendices B.6 and B.7.)

The SopE-Cre reporter system was developed to study the SPI-1 TTSS and the possibility of its use in vaccine development; however, it is an ideal system for Cre-mediated neomycin resistance cassette deletion in vivo. The expression of either cytokine will only occur in infected cells, thus more closely resembling the process of natural infection. Administration of peptide, either prophylactically or therapeutically, and the characterization of intestinal cellular populations and induction of chemokines and cytokines will aid in determining the mechanism of action of host defence peptides in infection in vivo. The S. typhimurium M525 strain expressing Cre recombinase will also be used to infect the IL8-vil-Cre-ERT2 mice. The comparison of IL8-vil-Cre-ERT2 mice infected with wild type Salmonella and IL8 conditional mice infected with Cre-expressing Salmonella could highlight subtleties in host-pathogen interactions, in addition to controlling for potential immune modulating effects mediated by tamoxifen-dependent estrogen receptor engagement.

In summary, I have generated two embryonic stem cell lines harbouring insertions of the human IL8 or mouse Il10 gene behind the intestinal-specific Gpa33 gene. Additionally, this work has led to the generation of conditional heterozygous and homozygous mice carrying the hIL8:c allele(s), and efforts to generate conditional Il10 mice are ongoing. These cell lines and mouse strain will enable valuable studies into the roles of IL8 and Il10 in intestinal infection and peptide mechanism of action.
4  CHARACTERIZATION OF TRANSCRIPTIONAL PROFILES INVOLVED IN INTERLEUKIN 12- AND INTERFERON-γ-MEDIATED PRIMARY IMMUNODEFICIENCIES

4.1  Introduction

The ability of Salmonella species to penetrate the epithelial barrier (e.g. intestine, lung) is only the first step in their strategy for host invasion. As facultative intracellular pathogens, the survival of Salmonella species also depends on their ability to invade and replicate within phagocytic cells (82). Conversely, the ability of the host to defend itself against invasion depends on cell-mediated immunity and type 1 cytokines. IFNG and TNF are key Th1 cytokines involved in the activation of macrophages for cell-mediated killing of Salmonella species and other intracellular pathogens such as mycobacterial species (82). The DEFA5 transgenic mouse has also highlighted the importance of defensins in immunity to S. typhimurium infection (43), which might act partially through regulation of the intestinal microbiota composition (96). However, S. typhimurium reduces Defa1 expression and α-defensin peptide release in FvB mice, in a SPI-1-dependent manner (195). S. typhimurium has developed mechanisms for resistance to host defence peptides, including defensins. These include the PhoP-PhoQ two-component system (196), the yejABEF operon (197), and the sigma factor RpoE (198). This suggests that defensins have additional functions in protecting against infection. Consistent with their adjuvant antitumor activity (discussed previously), human neutrophil α-defensins augment IFNG and antigen-specific antibody production following immunization (199). Additionally, human neutrophil α-defensins can promote macrophage IFNG and TNF production (200), and a chicken β-defensin can induce IL12B (IL12p40) production by a TLR4-NF-κB-dependent mechanism (201). Interestingly, recombinant IL12 and IFNG also induce human β-defensin expression (202, 203).
Finally, IL12B- and IFNG-deficient mice have dramatically reduced Defb3 expression following infection with the intestinal pathogen Citrobacter rodentium (204). These examples further support a relationship between defensins and cytokines in shaping the immune response against intestinal pathogens.

Mutations in molecules within the IL12-dependent IFNG signalling pathways (Figure 4.1) increase susceptibility to infection by the Bacillus Calmette-Guérin (BCG) tuberculosis vaccine strain, nontuberculous mycobacteria (NTM) and Salmonella (205, 206).

Figure 4.1. Key signalling molecules involved in IL12-dependent IFNG production. The primary immunodeficiencies, IL12RB1 and IFNGR1, of patients involved in this work are boxed in red. Stimuli used for ex vivo stimulation of PBMCs isolated from patients are boxed in green. Modified from (206) (permission obtained from BMJ Publishing Group Ltd.).

The clinical phenotype of these primary immunodeficiencies, which are normally autosomal recessive and present in early childhood, is largely dependent not only on the molecule involved but also on the genetic location of the mutation (205). The disease associated with complete deficiency is generally more severe than that of partial deficiency, and additionally people with partial deficiency can respond to high dose IFNG treatment (205). Genes in which mutations have been identified within the IL12-dependent IFNG pathway are IFNGR1, IFNGR2, IL12RB1, IL12B, signal transducer and activator of transcription 1 (STAT1), tyrosine kinase 2 (TYK2) and NF-κB essential modulator (NEMO). Even though NEMO mutations can predispose to NTM infection, there are other complicating factors that differentiate them from mutations in the other genes, and thus this group will not be discussed further.
The diagnostic assays used to putatively identify such mutations are well established; they involve treating peripheral blood mononuclear cells (PBMCs) or whole venous blood with various stimuli and measuring the cytokines produced (206). The output of the cytokine assay, in conjunction with the clinical phenotype, is used to predict the gene harbouring the mutation. Confirmation by gene sequencing is required. Figure 4.2 summarizes the treatments and expected outcomes for the aforementioned gene deficiencies.

[Figure 4.2 grid: cytokine responses from PBMCs or blood of individuals with Th1 deficiencies. Columns: normal controls, IFNGR1/2, IL12RB1, IL12B, STAT1. Stimuli by readout: IFNG (Media, IL12, PHA, PHA + IL12, LPS, LPS + IL12); IL12A/B (IL12p70) (Media, IFNG, LPS, LPS + IFNG); TNF (Media, IFNG, LPS, LPS + IFNG); IL12B (Media, IFNG, LPS, LPS + IFNG).]

Figure 4.2. Cytokine production as a diagnostic indicator of Th1 primary immunodeficiencies. The relative levels of each cytokine are indicated by shading. Black is the highest output set at 100%. Dark and light grey denote approximately 50% and 25% output, respectively, compared to the maximum. Modified from (206) (permission obtained from BMJ Publishing Group Ltd.).

The primary immunodeficiencies associated with mycobacterial and Salmonella infections and the genetic mutations within IFNGR1, IFNGR2, IL12RB1, IL12B, STAT1 and TYK2 have been well described (205-209). The signalling cascades involved in IFNG and TNF production are also well characterized, as is the synergy required between IL12 and IL18 for efficient IFNG production (205-209). However, the transcriptional responses that mediate these or other events, and the expression of other genes involved in the resulting immunodeficiencies, are not well defined. Additionally, synergies between LPS/IL12 or LPS/IFNG, if any, warrant further investigation.
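The shading convention in the Figure 4.2 caption (output expressed relative to the maximum, then shown as black, dark grey or light grey) can be sketched as follows. The function names, the nearest-level binning rule, the extra "none" level for no detectable output, and the example concentrations are all assumptions made for illustration; they are not taken from (206) or from patient data.

```python
# Reference levels from the Figure 4.2 caption, plus an assumed "none" level at 0%.
SHADES = {100: "black", 50: "dark grey", 25: "light grey", 0: "none"}


def relative_output(values):
    """Express each cytokine measurement as a percentage of the highest one."""
    peak = max(values.values())
    if peak <= 0:
        return {stimulus: 0.0 for stimulus in values}
    return {stimulus: 100.0 * amount / peak for stimulus, amount in values.items()}


def shade(percent):
    """Assumed rule: map a relative output to the nearest reference level."""
    nearest = min(SHADES, key=lambda level: abs(level - percent))
    return SHADES[nearest]


if __name__ == "__main__":
    # Hypothetical TNF measurements (arbitrary units) for one sample.
    tnf = {"Media": 0.0, "LPS": 400.0, "LPS + IFNG": 800.0}
    for stimulus, percent in relative_output(tnf).items():
        print(stimulus, round(percent), shade(percent))
```

Normalizing within each readout keeps the comparison relative, which matches the way the figure reports responses as a fraction of the maximum output rather than as absolute concentrations.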
Therefore, the aim of this project was to compare the gene expression of PBMCs, treated with various stimuli, from people with IFNGR1 and IL12RB1 deficiencies to that of PBMCs from healthy controls.

IFNGR1 is a 489 amino acid protein coded by 1470 nucleotides comprising seven exons. IFNGR1 forms a heterodimer with IFNGR2, which contains 337 amino acids coded by 1014 nucleotides also comprising seven exons. IFNGR1 binds IFNG, and both IFNGR1 and IFNGR2 are required to transduce the signal intracellularly through JAK1/2 and STAT1 phosphorylation events (206). STAT1 homodimers translocate to the nucleus, where they bind to gamma activation sequences and induce transcription (206). The two people included in this study with IFNGR1 deficiency, denoted as IFNGR1(1) and IFNGR1(2), both had a heterozygous four nucleotide deletion (TTAA) at position 818 (coding sequence) within Exon 6 of IFNGR1 (Figure 4.3A) (208). This deletion results in a frameshift mutation and a premature stop codon just downstream of the receptor transmembrane domain (208). As a result, the truncated IFNGR1 is expressed at the surface of the cell. However, because it lacks the intracellular signalling and receptor recycling domains yet binds IFNG normally, the truncated IFNGR1 accumulates at the cell surface and exerts a dominant negative effect (208). IL12RB1 is a 381 amino acid protein coded by 1146 nucleotides comprising 16 exons. IL12RB1 forms a heterodimer with IL12RB2, which contains 862 amino acids coded for by 2