UBC Theses and Dissertations

Harnessing natural diversity for the discovery of glycoside hydrolases and design of new glycosynthases Armstrong, Zachary 2018

Harnessing Natural Diversity for the Discovery of Glycoside Hydrolases and Design of New Glycosynthases

by

Zachary Armstrong

B.Sc., The University of British Columbia, 2009

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

DOCTOR OF PHILOSOPHY

in

The Faculty of Graduate and Postdoctoral Studies
(Genome Science and Technology)

THE UNIVERSITY OF BRITISH COLUMBIA
(Vancouver)

May 2018

© Zachary Armstrong 2018

The following individuals certify that they have read, and recommend to the Faculty of Graduate and Postdoctoral Studies for acceptance, the dissertation entitled:

Harnessing Natural Diversity for the Discovery of Glycoside Hydrolases and Design of New Glycosynthases.

Submitted by Zachary Armstrong in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Genome Science and Technology

Examining Committee:

Professor Stephen G. Withers, Co-supervisor
Professor Steven J. Hallam, Co-supervisor
Professor Lindsay D. Eltis, Supervisory Committee Member
Professor Harry Brumer, University Examiner
Professor Martin Tanner, University Examiner

Abstract

Plant biomass offers a sustainable source for energy and materials and an alternative to fossil fuels. However, the industrial-scale production, or biorefining, of fermentable sugars from plant biomass is currently limited by the lack of cost-effective and efficient biocatalysts. Microbes, the earth’s master chemists, which employ biocatalytic solutions to harvest energy and transform this energy into useful molecules, offer a potential solution to this problem. However, a majority of microbes remain uncultured, limiting our access to the genetic potential encoded within their genomes. This has spurred the development of culture-independent methods, termed metagenomics.

In this thesis I harnessed high-throughput functional metagenomic screening to discover biomass-deconstructing biocatalysts from uncultured microbial communities. Towards this goal, twenty-two clone libraries containing DNA sourced from diverse microbial communities inhabiting terrestrial and aquatic ecosystems were screened with 4-methylumbelliferyl cellobioside to detect glycoside hydrolase activity. This revealed 178 active clones containing glycoside hydrolases, often in gene clusters. This set of active clones was consolidated and further characterized through sequencing and rapid, plate-based biochemical assays. Additionally, libraries sourced from beaver fecal and gut microbiomes were screened with four fluorogenic probes (6-chloro-4-methylumbelliferyl derivatives of cellobiose, xylobiose, xylose and mannose) for glycoside hydrolase activity. This revealed a total of 247 active fosmid-harbouring clones that encoded many polysaccharide-degrading genes and gene cassettes. Specific candidate genes from the fecal library were sub-cloned, and the resulting purified enzymes were shown to be involved in synergistic degradation of arabinoxylan oligomers. The clone libraries that were generated through functional metagenomic screening were then employed to reveal the promiscuity of glycoside hydrolases towards unnatural azido- and aminoglycosides. Promiscuous enzymes identified from metagenomic and synthetic clone libraries were then used as a starting point for the generation of new glycosynthases capable of incorporating modified glucosides and galactosides. The resulting eight new glycosynthases are capable of synthesizing di- and trisaccharides, glycolipids and inhibitors such as 2,4-dinitrophenyl 4’-amino-2,4’-dideoxy-2-fluoro-cellobioside. Taken together, this work has exploited the power of functional metagenomics to reveal new modes of biocatalysis and develop new synthetic tools.

Lay Summary

Microbes are ubiquitous; they are in soil, air, water and inside our bodies. They also make enzymes to promote chemical transformations in our environment, including plant matter degradation.
Some microbes are difficult to study as they can’t be grown in the laboratory. For these microbes we use a collection of growth-free methods, termed metagenomics. This thesis investigates plant-degrading genes (the DNA that codes for enzymes) present in soil, ocean water, bioreactors, coal beds, and beaver digestive tracts using metagenomics. Many of the uncovered genes had not been seen before. I also investigated how some of these enzymes work together to degrade specific sugars present in plants. Additionally, I made some of these enzymes capable of creating unnatural molecules that would be difficult to create otherwise. This work used metagenomics to discover catalysts, including those that break down plants, and to create catalysts that synthesize unnatural molecules.

Preface

A number of sections of this work are partly or wholly published in press. Much of this research was conducted as a collaborative effort and contributions to each section are detailed below.

• Portions of Chapter 1 and Chapter 5 drew references and ideas from previous publications but contain wording original to this thesis. These publications, for which I am an author, follow below:

  Zachary Armstrong, Keith Mewis, Cameron Strachan, and Steven J. Hallam. "Biocatalysts for biomass deconstruction from environmental genomics." Current Opinion in Chemical Biology 29: 18-25. (2015)

  Zachary Armstrong, Stephen G Withers. "Synthesis of glycans and glycopolymers through engineered enzymes." Biopolymers 99(10): 666-674

  Zachary Armstrong, Peter Rahfeld, Stephen G Withers. "Discovery of New Glycosidases From Metagenomic Libraries." Methods in Enzymology 597, 3-23

• Chapter 2: The functional screening method was developed and applied for study of the anaerobic bioreactor and forest soils by Dr. Keith Mewis. Other screening efforts were undertaken by Sam Kheirandish under supervision of Dr. Keith Mewis and Zachary Armstrong. Fosmid libraries for environments screened were created by Dr. Marcus Taupp, Dr. Sangwon Lee, Payal Sipahimalani, and Melanie Scofield.

• Chapter 3: Sampling of beaver feces was performed by Zachary Armstrong and Dr. Kevin Mehr, and DNA isolation and purification was performed by Zachary Armstrong. The fosmid library was created by Zachary Armstrong with assistance from Melanie Scofield. Screening and hit validation was performed by Zachary Armstrong. Cell lysate initial rate characterization was performed by Zachary Armstrong and Dr. Feng Liu. Beaver intestinal samples were collected by Dr. Keith Mewis and Zachary Armstrong, and DNA isolation and purification was performed by Zachary Armstrong. DNA and fosmid sequencing was performed by Dr. Keith Mewis at the UBC Pharmaceutical Sciences Sequencing Center (PSSC) with help from Dr. Sunita Sinha and Jennifer Chiang. Molecular cloning and protein expression, purification and characterization was performed by Zachary Armstrong.

• Chapter 4: All molecular cloning, protein purification and characterization was performed by Zachary Armstrong. Both Zachary Armstrong and Dr. Feng Liu were responsible for large-scale purifications. All NMR assignment was performed by Dr. Feng Liu.

The UBC Office of Research Ethics was consulted regarding the work with dissected beavers in Chapter 3, but no ethics application or approval was required.

Throughout this work, the term "we" refers to Zachary Armstrong, unless otherwise stated.

Table of Contents

Abstract
Lay Summary
Preface
Table of Contents
List of Tables
List of Figures
List of Abbreviations
Acknowledgements
1 Introduction
  1.1 Plant Biomass
    1.1.1 Structure of Polysaccharides
    1.1.2 Cellulose
    1.1.3 Hemicellulose
    1.1.4 Pectins
    1.1.5 Lignin
  1.2 Carbohydrate Active Enzymes
    1.2.1 Glycoside Hydrolases
    1.2.2 Polysaccharide Utilization Loci
    1.2.3 Glycosynthases
  1.3 Metagenomics
    1.3.1 Functional Metagenomic Screens
    1.3.2 16S Ribosomal RNA Profiling
  1.4 Dissertation Overview
2 Large-Scale Functional Metagenomic Screening for Glycoside Hydrolases
  2.1 Summary
  2.2 Background
  2.3 Results and Discussion
    2.3.1 In-Silico Screening
    2.3.2 Functional Screening
    2.3.3 High-throughput Characterization of Fosmids
    2.3.4 Fosmid Sequencing and Gene Annotation
  2.4 Limitations and Future Directions
  2.5 Conclusions
3 Functional Screening of the Castor canadensis Fecal and Gut Metagenomes
  3.1 Summary
  3.2 Background
  3.3 Beaver Fecal Metagenome
    3.3.1 16S Ribosomal RNA Profiling
    3.3.2 Metagenome Sequencing
    3.3.3 Functional Screening
    3.3.4 Fosmid Sequencing and Gene Annotation
    3.3.5 Gene Characterization
    3.3.6 Presence of Hemicellulose Targeting Loci
  3.4 Beaver Gut Metagenome
    3.4.1 16S Ribosomal RNA Profiling
    3.4.2 Metagenome Sequencing
    3.4.3 Functional Screening
    3.4.4 Fosmid Sequencing and Gene Annotation
    3.4.5 Presence of Polysaccharide Utilization Loci
  3.5 Limitations and Future Directions
  3.6 Conclusions
4 Harnessing Natural Diversity to Profile Promiscuity and Create New Glycosynthases
  4.1 Summary
  4.2 Background
  4.3 Fosmid Hit Libraries
    4.3.1 Screening with Modified Glycosides
    4.3.2 Kinetic Characterization of Hydrolases
    4.3.3 Acceptor Specificity
    4.3.4 Nucleophile Mutant Creation and Glycosynthase Tests
    4.3.5 Product Characterization
  4.4 Glycoside Hydrolase Family 1 Library
    4.4.1 Screening with Modified Glycosides
    4.4.2 Kinetic Characterization of Hydrolases
    4.4.3 Acceptor Specificity
    4.4.4 Nucleophile Mutant Creation and Glycosynthase Tests
    4.4.5 Product Characterization
  4.5 Discussion and Future Directions
  4.6 Conclusions
5 Conclusions
  5.1 Relevant Research
  5.2 Limitations and Future Directions
    5.2.1 Diverse Searching
    5.2.2 Enzyme Profiling
  5.3 Closing
6 Methods
  6.1 General Methods
  6.2 Data Accessioning
  6.3 Chapter 2 Experimental
    6.3.1 Sampling
    6.3.2 Library Creation
    6.3.3 Fosmid End-Sequencing
    6.3.4 Annotation of End-Sequences
    6.3.5 Functional Screening
    6.3.6 Fosmid DNA Isolation and Sequencing
    6.3.7 Fosmid Annotation
    6.3.8 GH Family Trees
    6.3.9 Fosmid-Encoded Activity Characterization
  6.4 Chapter 3 Experimental
    6.4.1 Sample Collection
    6.4.2 DNA Extraction
    6.4.3 PCR Amplification of Ribosomal SSU Gene Sequences
    6.4.4 Sequencing and Assembly
    6.4.5 Analysis of Pyrotag Data
    6.4.6 Analysis of Metagenomic Sequences
    6.4.7 Fosmid Library Creation
    6.4.8 Functional Screening
    6.4.9 Fosmid Preparation and Sequencing
    6.4.10 Fosmid Annotation
    6.4.11 Fosmid Encoded Enzyme Specificities
    6.4.12 Sub-Cloning of Genes
    6.4.13 Mutagenesis
    6.4.14 Protein Expression and Purification
    6.4.15 Protein Characterization
  6.5 Chapter 4 Experimental
    6.5.1 Screening: Metagenomic Hit Library
    6.5.2 Sub-Cloning of Genes
    6.5.3 Protein Expression and Purification: Metagenome Hit Library
    6.5.4 Wild-Type Enzyme Kinetics: Metagenomic Hit Library
    6.5.5 Production of Mutants: Metagenomic Hits
    6.5.6 Acceptor Specificity: Metagenomic Hits
    6.5.7 Glycosynthase Reactions: Metagenomic Hits
    6.5.8 Multi-milligram Scale Reactions: Metagenomic Hits
    6.5.9 Screening: GH1 library
    6.5.10 Protein Purification: GH1 Library
    6.5.11 Acceptor Specificity Screening: GH1 Library
    6.5.12 Wild-Type Enzyme Kinetics: GH1 Library
    6.5.13 Production of Mutants: GH1 Library
    6.5.14 Glycosynthase Reactions: GH1 Library
    6.5.15 Multi-milligram Scale Reactions: GH1 Library
    6.5.16 Mass Spectrometry and NMR Spectroscopy of Products
Bibliography
Appendices
A Chapter 2 Supplemental Material
  A.1 Supplemental Tables
B Chapter 3 Supplemental Material
  B.1 Supplemental Figures
C Chapter 4 Supplemental Material
    C.0.1 NMR Assignments of Glycosynthase Products

List of Tables

1.1 Plant Cell Wall Composition, Amount of Polysaccharide (% w/w)
2.1 Fosmid Libraries
2.2 End Sequences Interrogated From Each Library
2.3 Highly Repetitive Short ORFs from PWCG7
2.4 Functional Screening Hits
2.5 GH3 and GH5 Recovery Rates
3.1 GH43 Subfamilies Identified on Functionally Active Fosmids
3.2 Kinetic Rates Determined for Purified GH43 Enzymes with CMU-X
3.3 Activity of Purified GH43 Enzymes on Aryl-glycosides
3.4 OTU Counts from Beaver Fecal and Gut Samples
3.5 CAZyme Relative Abundance (% of All ORFs) in Beaver Gut Samples
3.6 Beaver Gut Hits
4.1 Number of Fosmid Hits for Each Modified Substrate (Robust Z-Score >10)
4.2 Selected Fosmids with Activity on Modified Glycosides and the Genes Selected for Sub-Cloning and Expression
4.3 Kinetic Constants for Fosmid Sourced Hydrolases
4.4 Acceptor Specificity of Selected Wild-Type Hydrolases
4.5 Stereochemical Outcome and Yield of C11 E354S Glycosynthase Reactions
4.6 Selected GH1 Genes and Their Activities
4.7 Kinetic Parameters for Selected GH1s
4.8 Product Yields From Small Scale Glycosynthase Reactions
4.9 Characterized GH1 Glycosynthase Products
4.10 Glycosynthase Activity with Azido and Amino Donor Sugars
4.11 Comparison of Enzyme Reactivation
5.1 The Ten Most Recently Defined Glycoside Hydrolase Families
6.1 Sequences Used To Generate Trees
6.2 Beaver Pyrotag Counts
6.3 Beaver Intestinal Metagenomes
6.4 Sub-Cloning Primers
6.5 Mutagenesis Primers
6.6 Primers Used for Sub-Cloning Fosmid Derived Genes
6.7 Primers Used for Mutagenesis of Metagenome Sourced Hydrolases
6.8 Primers Used in QuikChange Mutagenesis of Selected GH1 Enzymes
A.1 Relative Initial Rates

List of Figures

1.1 Polymer Constituents of Lignocellulose
1.2 Forms of D-Glucose
1.3 Monosaccharides Present in Plant Biomass
1.4 Polymer Constituents of Lignocellulose
1.5 Glycoside Hydrolase Mechanisms
1.6 Enzymatic Degradation of Plant Cellulose and Hemicelluloses
1.7 Enzymatic Degradation of Plant Pectins
1.8 Starch Utilization System (SUS) Operon in B. thetaiotaomicron
1.9 Glycosynthase Mechanisms
1.10 Functional Metagenomic Screening Workflow
2.1 In-Silico Screening
2.2 Fluorogenic Reporter 4-Methylumbelliferyl Cellobioside
2.3 Functional Screening Results
2.4 Fosmid Substrate Preference
2.5 pH Optima of Fosmid Clone Activity
2.6 Thermal Stability of Fosmid Clone Activity
2.7 Distribution of Fosmid Insert Length
2.8 Predicted GH Abundance on Fosmid Hits and on End Sequences
2.9 Hydrolase Distribution with Optimal Substrate
2.10 Percent Identities of Best Blast Hits to Putative Hydrolases
2.11 Phylogenetic Tree Containing Discovered GH1s
2.12 Phylogenetic Tree Containing Discovered GH3s
2.13 Phylogenetic Tree Containing Discovered GH5s
2.14 Phylogenetic Tree Containing Discovered GH8s
2.15 Phylogenetic Tree Containing Discovered GH9s
2.16 PUL Containing Fosmids
2.17 Multiple GH containing Fosmids
3.1 Beaver Fecal Community Composition
3.2 Substrates Used in Multiplex Screening
3.3 Functional Screening of Beaver Fecal Library
3.4 Fosmids Identified from High-Throughput Screening of Fecal Library
3.5 Gene Organization of Multi-Domain Proteins Identified on Functional Fosmids
3.6 Synergistic Degradation of Arabinoxylooligosaccharides by H03-13 GH43 Enzymes
3.7 Gene Organization of Putative Hemicellulose Targeting Fosmids and SusC/SusD-like Encoding Fosmids
3.8 Beaver Gut Sampling Sites
3.9 Bubble Plot of Beaver Gut Pyrotags
3.10 Beaver Gut Pyrotags
3.11 Beaver Gut CAZyme Clustering
3.12 Abundance of Plant Polysaccharide Degrading CAZymes in Beaver Gut Metagenomes
3.13 Functional Screening of Beaver Gut Libraries
3.14 Distribution of Beaver Gut Fosmid Insert Length
3.15 Relative Abundance of Glycoside Hydrolases in Sequenced Fosmids and Metagenomes
3.16 Gene Organisation of Beaver Gut Fosmids Containing SusC/SusD-like Proteins and a Two Domain GH10-GH43
3.17 Gene Organisation of Beaver Gut Fosmids Containing SusC/SusD-like Proteins With Highest Activity on CMU-X2
3.18 Gene Organisation of Beaver Gut Fosmids Containing SusC/SusD-like Proteins With Highest Activity on CMU-C
4.1 Modified Sugars Used for Metabolic Labelling
4.2 Screening Methodology
4.3 Modified Glucosides and Galactosides Used for Screening
4.4 Functional Screening of Hit Libraries with Modified Glycosides
4.5 Gene Organisation of Selected Fosmids With Activity on Modified Glycosides
4.6 GH1 Enzyme Library β-Glucosidase Activity
4.7 GH1 Azido- and Aminoglucoside Screening Results
4.8 Pha GH1 in Complex With Gluconolactone and Substrate-Protein Bond Distances
4.9 GH1 Acceptor Specificity
B.1 Unabridged Comparison of Beaver Fecal Metagenome with Other Sequenced Mammal Microbiomes

List of Abbreviations

Abbreviations

ΔΔG°‡  Change in the Gibbs energy of activation
AG  Arabinogalactan
Ara  Arabinan
BLAST  Basic Local Alignment Search Tool
bp  base pair
CAZy  Carbohydrate Active enZyme
CE  Carbohydrate Esterase
Cel  Cellulose
DNA  Deoxyribonucleic acid
GH  Glycoside Hydrolase
GM  Glucomannan
GX  Glucuronoxylan
HG  Homogalacturonan
Kbp  thousand base pairs
LPMO  Lytic Polysaccharide Mono-Oxygenase
Mbp  Million base pairs
ORF  Open reading frame
OTU  Operational Taxonomic Unit
PL  Polysaccharide Lyase
PUL  Polysaccharide Utilization Loci
RG  Rhamnogalacturonan
RNA  Ribonucleic acid
SSU rRNA  Small Subunit Ribosomal Ribonucleic Acid
Tm  Denaturation midpoint
UPGMA  Unweighted Pair Group Method with Arithmetic Mean
XG  Xyloglucan
XGUL  Xyloglucan Utilization Loci

Amino acids

Ala  A  Alanine
Arg  R  Arginine
Asn  N  Asparagine
Asp  D  Aspartic Acid
Cys  C  Cysteine
Glu  E  Glutamic Acid
Gln  Q  Glutamine
Gly  G  Glycine
His  H  Histidine
Ile  I  Isoleucine
Leu  L  Leucine
Lys  K  Lysine
Met  M  Methionine
Phe  F  Phenylalanine
Pro  P  Proline
Ser  S  Serine
Thr  T  Threonine
Trp  W  Tryptophan
Tyr  Y  Tyrosine
Val  V  Valine

Substrates

αF-3-N3-Glc  3-azido-3-deoxy-α-D-glucopyranosyl fluoride
αF-3-NH2-Glc  3-amino-3-deoxy-α-D-glucopyranosyl fluoride
αF-4-N3-Glc  4-azido-4-deoxy-α-D-glucopyranosyl fluoride
αF-4-NH2-Glc  4-amino-4-deoxy-α-D-glucopyranosyl fluoride
αF-6-N3-Gal  6-azido-6-deoxy-α-D-galactopyranosyl fluoride
αF-6-N3-Glc  6-azido-6-deoxy-α-D-glucopyranosyl fluoride
αF-6-NH2-Glc  6-amino-6-deoxy-α-D-glucopyranosyl fluoride
αF-Gal  α-D-galactopyranosyl fluoride
αF-Glc  α-D-glucopyranosyl fluoride
CMU  6-chloro-4-methylumbelliferyl
CMU-C  6-chloro-4-methylumbelliferyl cellobioside
CMU-X  6-chloro-4-methylumbelliferyl β-D-xylopyranoside
CMU-X2  6-chloro-4-methylumbelliferyl xylobioside
DNP  2,4-dinitrophenol
DNP 2-F-Gal  2,4-dinitrophenyl 2-deoxy-2-β-D-fluoro-galactoside
DNP 2-F-Glc  2,4-dinitrophenyl 2-deoxy-2-β-D-fluoro-glucoside
DNP-C  2,4-dinitrophenyl cellobioside
Gal  Galactose
Glc  Glucose
MU  4-methylumbelliferone
MU-3-N3-Glc  4-methylumbelliferyl 3-azido-3-deoxy-β-D-glucopyranoside
MU-4-N3-Glc  4-methylumbelliferyl 4-azido-4-deoxy-β-D-glucopyranoside
MU-6-N3-Gal  4-methylumbelliferyl 6-azido-6-deoxy-β-D-galactopyranoside
MU-6-N3-Glc  4-methylumbelliferyl 6-azido-6-deoxy-β-D-glucopyranoside
MU-3-NH2-Glc  4-methylumbelliferyl 3-amino-3-deoxy-β-D-glucopyranoside
MU-4-NH2-Glc  4-methylumbelliferyl 4-amino-4-deoxy-β-D-glucopyranoside
MU-6-NH2-Glc  4-methylumbelliferyl 6-amino-6-deoxy-β-D-glucopyranoside
MU-3-O-Me-Gal  3-methoxy-β-D-galactopyranoside
MU-3-O-Me-Glc  3-methoxy-β-D-glucopyranoside
MU-Ara  4-methylumbelliferyl α-L-arabinofuranoside
MU-C  4-methylumbelliferyl cellobioside
MU-Gal  4-methylumbelliferyl β-D-galactoside
MU-Glc  4-methylumbelliferyl β-D-glucopyranoside
MU-Lac  4-methylumbelliferyl Lactoside
MU-Man  4-methylumbelliferyl mannoside
MU-X  4-methylumbelliferyl β-D-xylopyranoside
MU-X2  4-methylumbelliferyl xylobioside
pNP  p-nitrophenol
pNP 6-PO4-Glc  p-nitrophenyl 6-phospho-β-D-glucopyranoside
pNP-Ara  p-nitrophenyl α-L-arabinofuranoside
pNP-Gal  p-nitrophenyl β-D-galactopyranoside
pNP-Glc  p-nitrophenyl β-D-glucopyranoside
pNP-X  p-nitrophenyl β-D-xylopyranoside
Xyl  Xylose

Acknowledgements

Firstly, I would like to thank my two Ph.D. supervisors, Dr. Steven Hallam and Dr. Stephen Withers, for their years of support, mentorship, enthusiasm and vision. I would also like to thank all the past and present members of both the Hallam and Withers labs for their support and friendship. In particular I would like to thank my partner on many of these projects, Dr. Keith Mewis, for his drive and enthusiasm.
I would also like to thank Sam Keirandish for his robotic expertise; Dr. Feng Liu for his determined NMR assignments and robot wrangling; lab managers Emily Kwan, Diane Fairley, Melanie Scofield, and Jade Schiller, without whom nothing would ever get done; Spence Macdonald, Dr. Peter Rahfield, Dr. David Kwan and Dr. Ethan Goddard-Bodger for helpful conversations, perspective and inspiration; Connor Morgan-Lang, Dr. Aria Hahn, and Dr. Niels Hanson for bioinformatic support; and Dr. Harry Brumer for the use of his HPAEC.

Finally, and most gratefully, I would like to thank my wife Sarah for her unending patience and support.

Chapter 1

Introduction

Nature is replete with a diversity of biocatalysts possessing the potential to solve current industrial and medical challenges. To convert this potential into tangible solutions, new biocatalysts must be discovered, characterized and adapted to the particular application. Discovery of new biocatalysts has historically relied on the screening of cultured organisms for the activity of interest. This, however, neglects the majority of microbes which belong to phyla lacking a cultured representative [133, 256]. To pass beyond the limitations of culture dependence, metagenomic techniques have been developed which allow us to investigate the genomic information of an environmental sample without prior culturing of the microorganisms present [112, 255, 256]. Furthermore, functional metagenomic screening allows us to tap into the catalytic power of these microbes without prior sequence annotation, enabling the discovery of catalysts which may have little or no sequence similarity to previously known catalysts.

Ready access to sustainable energy and materials is one challenge which may be solved through biocatalytic means. Plant biomass is a renewable resource that can be converted into energy and materials as an alternative to fossil fuel [47, 100].
The structure of plant biomass has, however, evolved to be highly recalcitrant, thus complicating the realization of its potential value [122]. To harvest the fermentable sugars and aromatics provided in plant biomass, mechanical, chemical and biological processes have been developed [199, 232]. However, a major limitation for the industrial deconstruction of plant biomass polymers continues to be a lack of cost-effective and efficient biocatalysts. For over 3.5 billion years, cooperative microbial communities have been driving energy and material transformations that create and sustain planetary living conditions. As a result, although the vast majority of microbes in nature remain uncultured, they represent an almost unbounded reservoir of genetic information and metabolic potential [256, 293]. High-throughput functional metagenomic screening offers a method to tap into this reservoir and identify new catalysts for the deconstruction of plant biomass.

Within this introductory chapter I review existing literature and motivate the creation of active clone libraries through functional metagenomic screening and the development of biocatalysts from these libraries. Firstly, the molecular structure of plant biomass (with particular emphasis on the polysaccharide component), and its variation within primary and secondary cell walls, is reviewed. Next, carbohydrate active enzymes are reviewed, with particular emphasis placed on classes of enzymes and enzyme systems that degrade plant biomass. This chapter then reviews a class of engineered enzymes (glycosynthases) which can be generated from plant biomass-degrading enzymes and used for synthesis. Functional metagenomic methods for the discovery of biomass-degrading enzymes are then reviewed.
Finally, the structure of this dissertation is detailed.

Figure 1.1: Polymer Constituents of Lignocellulose. Cellulose, hemicellulose and lignin form structures which are organized into macrofibrils that mediate the structural stability of plant cell walls. Pectic fraction is not shown. Adapted from Rubin [259].

1.1 Plant Biomass

One of the hallmarks of plant cells is the presence of a cell wall composed of the polysaccharides cellulose, hemicelluloses and pectin and the polyaromatic lignin. This cell wall can be divided into three layers: the primary and secondary cell walls and the middle lamella (Figure 1.1). Primary cell walls are synthesized during growth and typically are relatively thin, flexible, highly hydrated structures [66]. Secondary cell walls provide strength and rigidity in plant tissues that have ceased growing [66]. The middle lamella is a thin pectin-rich region between adjacent primary cell walls [334]. The concentrations and structures of cell wall polysaccharides vary between the primary and secondary cell walls and with the plant taxa.

Marine algae have similar cell walls to land plants, containing crystalline cellulose, hemicellulose and matrix polysaccharides. However, algae contain several hemicelluloses and matrix polysaccharides that are not found within the land plants, including the sulfated glucan and glucuronan hemicelluloses found in red and brown algae [?]. Furthermore, algae do not contain pectins; instead, the green algae contain ulvans, brown algae contain fucans and red algae contain agars, carrageenans and porphyrans [?].
Although algal polysaccharide structures are undoubtedly important to understanding biomass degradation, the following description of plant polysaccharides will be confined to those found within terrestrial plants.

1.1.1 Structure of Polysaccharides

The majority of plant cell wall polymers are polysaccharides. The functional role of these polysaccharides is dictated by the monosaccharides present (see Figure 1.3) and their linkages, which may be highly repetitive or extremely diverse. In order to determine the structure of a polysaccharide the following must be determined:

• Which monosaccharides are present. The most common monosaccharides contain either five or six carbons and are known as pentoses and hexoses respectively. These monosaccharides contain either an aldehyde or ketone at one end and typically hydroxyls at each of the other carbons. For a hexose this means that in the linear form there are four stereocentres (carbons 2-5 in Figure 1.2). Instead of referring to monosaccharides by their absolute R- and S-configuration, as this would be cumbersome, each of the possible monosaccharides has a trivial name.

• Whether the sugar present is the D- or L- isomer. These isomers are mirror images of each other and can be determined by the configuration of the carbon furthest from the aldehyde or ketone functionality. This is carbon 5 for glucose, shown in Figure 1.2. This naming is based on the analogy to D- and L-glyceraldehyde.

• Whether the sugar is in the pyranose or furanose form.
Free sugars often exist as a mixture of the linear, 5-membered ring (furanose) and 6-membered ring (pyranose) structures. When present as acetals or ketals in polysaccharides the monomers are no longer able to interconvert between these isomers, unless they are at the reducing end of a polysaccharide. See Figure 1.2 for cyclic forms of glucose.

Figure 1.2: Forms of D-Glucose. D-Glucose is shown in its open chain form as a Fischer projection. The cyclic forms of glucose are shown as Haworth projections for the furanose forms and in the chair conformation for the pyranose forms. Carbons are numbered.

• Whether the anomeric position (C1 in the ring form, in most cases) is in the α- or β- configuration. The anomeric configuration of a sugar is determined by reference to the carbon that determines the D- or L- configuration. In a Fischer projection, if the substituent off the anomeric centre is on the same side as the oxygen of the configurational (D- or L-) carbon, then it is the α-anomer. If it is directed in the opposite direction it is the β-anomer. See Figure 1.2 for anomeric configurations of glucose.

• Which hydroxyls form the linkage between two monosaccharides. Typically, glycosidic linkages are formed between the anomeric center of one monosaccharide and a non-anomeric hydroxyl in another monosaccharide. For two β-D-glucopyranose residues this would mean four separate linkages could be formed (1,2-, 1,3-, 1,4- or 1,6-linkages).

• The position and presence of any non-carbohydrate modifications.
Common modifications include acetylations and methylations of hydroxyls.

1.1.2 Cellulose

Approximately 35 to 50 % of dry plant matter is composed of cellulose [184], a polymer of repeating 1,4-linked β-D-glucopyranose subunits, making it the most abundant terrestrial polymer. Within plants, macromolecular complexes synthesize several cellulose strands simultaneously [65]. Hydrogen bonding and hydrophobic interactions between these strands cause them to form an insoluble crystalline cellulose microfibril [279]. These microfibrils may be 3-5 nm in diameter, several micrometers in length, and contain several hundred glucose molecules [65]. The insoluble nature of cellulose confers structural stability and causes practical problems for microbial degradation [121]. Many taxa, other than the land plants, also synthesize cellulose, including: green algae (Chlorophyta and Charophyta) [79], red algae (Rhodophyta) [297], brown algae (Phaeophyceae) [185], Oomycetes [118], animals (Urochordates which are marine invertebrates) [212], Amoebozoa [85] and Cyanobacteria [221].

1.1.3 Hemicellulose

Hemicellulose is an overarching term used to describe the non-ionic polysaccharides, other than cellulose, which are present in the plant cell wall. This includes mixed-linkage glucans, xylans, xyloglucans, glucomannan and mannans. Hemicelluloses are thought to play roles as signalling molecules [335] and in strengthening cell walls through interactions with cellulose and lignin [264].
The composition and abundance of each of these polysaccharides is variable between species, and often differs between primary and secondary cell walls within the same species [264].

Figure 1.3: Monosaccharides Present in Plant Biomass. Cellulose, hemicelluloses and pectin are composed of the variety of monosaccharides shown. [Structures shown: β-D-glucopyranose, β-D-galactopyranose, β-D-mannopyranose, β-D-glucopyranuronic acid, β-D-galactopyranuronic acid, β-D-xylopyranose, α-L-rhamnopyranose, α-L-fucopyranose, α-L-arabinopyranose, α-L-arabinofuranose, α-L-galactopyranose, β-D-apiofuranose, 3-deoxy-D-manno-oct-2-ulosonic acid (KDO), 3-deoxy-D-lyxo-2-heptulosaric acid (DHA) and 3-C-carboxy-5-deoxy-α-L-xylose (aceric acid).]

Mixed linkage glucans

Like cellulose, mixed linkage glucans (MLGs) consist entirely of β-glucose residues; however, unlike cellulose, MLGs contain 1,3-linkages in addition to the 1,4-linkages seen in cellulose. Typically, the 1,3-linkages are located every 3 to 4 residues, linking cellotriosyl and cellotetraosyl subunits [141]. MLGs are considered to be limited to grasses (Poales) and one isolated land plant genus (Equisetum) [96] and are thought to be more abundant in the primary cell wall than in the lignified secondary cell wall [305]. MLGs are also present in many food sources including oats and barley [195].

Xylans

Xylans are characterized by a backbone chain of β-1,4-linked D-xylose residues. This backbone is the site for several decorations, the most common of these being: acetylation at the 2 or 3 position, attachment of α-D-glucuronic acid or 4-O-methyl-α-glucuronic acid at the 2-position and attachment of α-L-arabinofuranose at the 2- or 3-position [264].
Additionally, a majority of xylans have a characteristic reducing end sequence consisting of (xylose-(1,3)-α-L-rhamnopyranose-(1,2)-α-D-galacturonate-(1,4)-D-xylopyranose) [142]. Further decorations on the α-1,2-linked glucuronic acid residues are present in the monocot orders Alismatales, Asparagales and the dicot Eucalyptus grandis [234]. Within these taxa α-L-arabinopyranose is attached via 1,2-linkage to glucuronic acid, which may or may not be methylated [234]. The xylans of Eucalyptus also contain β-galactose units 1,2-linked to glucuronic acid [234, 298]. Xylans present in grasses (family Poaceae) may also have ferulic acid esters attached to the C-5 hydroxyl [75] and further decorations on the C-2 hydroxyl of the α-1,3-linked arabinofuranose substituents [234]. Corn bran xylan is one such polymer, having multiple separate tetrasaccharides containing D-galactose, L-galactose, ferulic acid and D-xylose branching from an arabinofuranose sidechain [6, 11].

The decorations observed on the xylan backbone change with taxonomy [42, 234]. Dicots, including hardwood trees, contain primarily glucuronoxylan and lack arabinofuranose decorations. On the other hand, monocots of the order Poales, which includes grasses, contain higher concentrations of 2- and 3-linked arabinofuranose decorations in addition to α-glucuronyl sidechains [234]. Xylans from softwoods (gymnosperms) such as Douglas fir [Pseudotsuga menziesii] or spruce [Picea abies] are also decorated with both glucuronic acid and 1,3-linked arabinofuranose, but lack 1,2-linked arabinofuranose [42, 86]. Grasses (Poaceae) have higher concentrations of xylans than do either dicots or gymnosperms.

Xyloglucan

Xyloglucans are mainly present in the primary cell walls of plants, and form strong interactions with cellulose microfibrils through hydrogen bonding [233]. The backbone of this polysaccharide is a chain of β-1,4-linked glucose residues.
The most common decorations of this polymer are xylosyl residues attached via α-linkages at the 6-hydroxyl position of the glucose backbone. Both the backbone glucosyl and the xylosyl residues can be further substituted with D- and L-galactosyl, L-fucosyl, D-galacturonosyl, L-arabinopyranosyl, L-arabinofuranosyl and acetyl moieties at specific locations in specific linkages, resulting in the 24 unique structures identified to date [233].

As for xylans, the diversity of xyloglucan substitutions varies with taxonomy. The most common form of xyloglucan contains a repeating structural unit consisting of three xylose-decorated glucoses followed by one undecorated glucose. This sequence is often galactosylated and fucosylated, resulting in fucogalactoxyloglucan, which is observed to be present in most tissues of most dicots [233]. The majority of xyloglucans in the primary cell walls of gymnosperms are also fucogalactoxyloglucans with similar structures to those of the xyloglucans in the primary walls of most dicots [130]. Monocots have diverse xyloglucan structures, with non-grass monocots having structures similar to dicots, whereas grasses have fewer decorations and reduced xyloglucan concentrations [129].

Mannan and Glucomannan

Mannans and glucomannans contain β-1,4-linked D-mannose residues in their structural backbone, with glucomannans also containing backbone β-1,4-linked D-glucose residues. This backbone can be decorated with acetyl groups at the 2- and 3-positions or with α-linked galactosyl groups at the 6-positions, forming galactoglucomannans [207]. Acetylated galactoglucomannans are present in gymnosperms, such as conifers, as the main hemicellulose, although xylans are also present [44]. Dicots and grasses also contain glucomannans, though in smaller amounts [264]. Mannans are also highly abundant in seeds as a storage polymer [37].
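The most common xyloglucan repeat described in the xyloglucan subsection above (three xylose-decorated backbone glucoses followed by one undecorated glucose) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the one-letter codes (`X` for a backbone glucose carrying an α-1,6-linked xylose, `G` for an undecorated backbone glucose) and the function names are labels chosen here, not taken from the cited work.

```python
from collections import Counter

# Hypothetical one-letter codes for backbone residues (assumed for illustration):
#   "G" = undecorated beta-1,4-linked glucose
#   "X" = beta-1,4-linked glucose carrying an alpha-1,6-linked xylose
REPEAT_UNIT = "XXXG"  # three xylose-decorated glucoses, then one undecorated glucose


def xyloglucan_backbone(n_repeats: int) -> str:
    """Return the backbone of an idealised xyloglucan with n_repeats repeat units."""
    if n_repeats < 0:
        raise ValueError("n_repeats must be non-negative")
    return REPEAT_UNIT * n_repeats


def xylosylated_fraction(backbone: str) -> float:
    """Fraction of backbone glucoses that carry a xylose decoration."""
    if not backbone:
        return 0.0
    counts = Counter(backbone)
    return counts["X"] / len(backbone)


backbone = xyloglucan_backbone(3)
print(backbone)                        # XXXGXXXGXXXG
print(xylosylated_fraction(backbone))  # 0.75
```

Real xyloglucans deviate from this idealised pattern through the galactosyl, fucosyl and other substitutions discussed above, and grasses carry fewer decorations.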
1.1.4 Pectins

Pectins, used as the gelling agent in the preparation of jams and jellies, form gel-like structures in cell walls which help hold the layers of the cell wall together [139]. Pectins contain a wider variety of monomers and linkages than those seen in either cellulose or hemicellulose. The identifying feature of pectins is a backbone containing the charged sugar galacturonic acid. This monomer is the sole backbone sugar in homogalacturonan (HG), rhamnogalacturonan II (RG-II) and xylogalacturonan, where it is linked via α-1,4-bonds. Rhamnogalacturonan I (RG-I), on the other hand, contains alternating α-L-rhamnose and α-D-galacturonic acid monomers with galactose attached to the 2-position and rhamnose attached to the 4-position of galacturonic acid.

The types of branches and decorations linked to the backbone polymer vary with the type of pectin. In HG, decorations of the backbone consist of methylesterifications of the C-6 carboxylate and acetylations at the O-2 and O-3 positions [43]. Xylogalacturonan also contains β-linked xylose at the 3-hydroxyl and occasionally the 4-hydroxyl of the galacturonan backbone [205]. In RG-I, branching polysaccharides occur at the 4-position of the rhamnose backbone residues. Arabinans, galactans and arabinogalactans have all been observed as branches from RG-I. The arabinans branching from RG-I contain a polymer of α-L-arabinose residues with a 1,5-linkage, which may be further decorated at the 3-position with additional α-L-arabinose residues. The galactans branching from RG-I consist of β-1,4-linked D-galactose residues, which may contain β-D-galactose decorations at the 6-position. The branching arabinogalactans contain a backbone of either β-1,4-linked D-galactose (type I) or β-1,3-linked D-galactose (type II).
These arabinogalactan backbones serve as further branching points that can contain a variety of decorations [43].

RG-II is the most complex of the pectin polymers, as it contains 12 different sugar monomers with 20 different linkages (see Figure 1.7 for structure). These include the following rare sugars: D-apiose, L-aceric acid, 2-O-methyl L-fucose, 2-O-methyl D-xylose, L-galactose, 2-keto-3-deoxy-D-lyxo-heptulosaric acid (DHA) and 2-keto-3-deoxy-D-manno-octulosonic acid (Kdo) [228]. Additionally, RG-II is bound to borate through D-apiose residues, causing cross-linkages between RG-II strands.

In terms of abundance, HG is the major component of pectins, constituting approximately 65 % of pectin [205]. The next most abundant pectin polymer, RG-I, represents 20-35 %, while RG-II represents approximately 10 % of the pectin present in primary cell walls [205]. Xylogalacturonan concentrations are typically low as these polymers are mainly found in reproductive cells [205]. Pectins are abundant in the growing primary cell walls and middle lamella, but are present at much lower levels in secondary cell walls. RG-II is also present in the primary cell wall, but not detected in the middle lamella [228]. RG-II presence also varies with taxonomy, making up between 1 and 4 % of primary cell walls of dicots and gymnosperms but less than 0.1 % in grasses [228].

1.1.5 Lignin

Lignin, which after cellulose is the most abundant terrestrial biopolymer [31], can constitute up to 30 % of secondary cell walls [264]. It is an aromatic polymer created through radical oxidative coupling of monolignols. The three most common lignin monomers are p-coumaryl alcohol, sinapyl alcohol and coniferyl alcohol, and their relative abundance in lignin varies with species.
Dicots contain lignin derived from sinapyl and coniferyl alcohol, while grasses incorporate higher amounts of p-coumaryl alcohol and gymnosperms (such as conifers) lack sinapyl alcohol [31].

The radical, oxidative mechanism of lignin formation causes highly diverse structural linkages to be formed between monomers and between monomers and carbohydrates present in the cell wall. In fact, the linkages are so diverse it has been hypothesized that no two lignin molecules are identical [3]. Lignin stiffens the cell wall by cross-linking with the polysaccharide fraction and it provides a barrier between potential pathogens and the energy-rich polysaccharides [304].

Table 1.1: Plant Cell Wall Composition, Amount of Polysaccharide (% w/w)

                          Dicot Walls       Grass Walls       Conifer Walls
Polymer                   1°       2°       1°       2°       1°       2°       Reference
Cellulose                 15-30    45-50    20-30    35-45    20-30    40-50    [39, 78, 252, 309]
Hemicelluloses
  β-Glucans               -        -        2-15     Minor    -        -        [264]
  Xyloglucan              20-25    Minor    2-5      Minor    10       -        [264]
  Glucuronoxylan          -        20-30    -        -        -        -        [264]
  Glucuronoarabinoxylan   5        -        20-40    40-50    2        5-15     [264]
  Glucomannan             3-5      2-5      2        0-5      -        -        [264]
  Galactoglucomannan      -        0-3      -        -        Present  10-30    [264]
Pectins                   20-35    Minor    5        Minor    20-35    Minor    [309, 334]

*Values vary between different species and tissue types.

Figure 1.4: Polymer Constituents of Lignocellulose. Cellulose (A) solely contains glucose subunits linked through β-1,4 bonds. Polygalacturonan (B) is the main constituent of pectin and forms the backbone of rhamnogalacturonan II and homogalacturonan. Hemicellulose contains, among others, the polymers xyloglucan (C) and glucuronoarabinoxylan (D). Lignin (E) is a polyaromatic, with extremely heterogeneous structure.

1.2 Carbohydrate Active Enzymes

The enzymes that synthesize, modify and degrade carbohydrates are termed carbohydrate active enzymes. The Carbohydrate Active enZymes (CAZy) database has emerged as an integral clearing-house for functional annotation [181].
CAZy categorizes polysaccharide degradation genes, such as glycoside hydrolases (GHs), polysaccharide lyases (PLs), carbohydrate esterases (CEs), carbohydrate-binding modules (CBMs) and, more recently, lytic polysaccharide mono-oxygenases (LPMOs) [172], into sequence-defined families.

1.2.1 Glycoside Hydrolases

Glycoside hydrolases (EC 3.2.1.x) are enzymes that catalyse the hydrolytic cleavage of glycosidic bonds, either between saccharides or between a saccharide and a non-sugar molecule (aglycone). Classification within the EC framework fails to take into account enzyme mechanism, and many enzymes with the same EC number have unrelated sequences. Sequence-based classification by the CAZy database has delineated over 150 glycoside hydrolase (GH) families [181]. Categorization into sequence-based families gives insight into the conserved mechanisms and active-site residues within families. Several families with multiple activities, including both GH43 and GH5, have also been further classified into subfamilies which provide finer details of the evolution and substrate specificity of specific families [15, 202].

Figure 1.5: Glycoside Hydrolase Mechanisms. Catalytic mechanisms of a retaining glycoside hydrolase (A), a substrate-assisted N-acetylglucosaminidase (B), and an inverting glycoside hydrolase (C).

Glycoside hydrolysis can occur with either retention or inversion of stereochemistry at the anomeric centre (see Figure 1.5). Retaining glycosidases progress through a double displacement mechanism involving a covalent intermediate.
In the first step the nucleophilic active site residue (generally an aspartate or glutamate) attacks the anomeric center concomitant with protonation of the aglycone by the active site acid residue (often an aspartate or glutamate as well). This results in cleavage of the glycosidic bond and formation of the covalent glycosyl-enzyme intermediate. The anomeric center of this intermediate is subsequently attacked by a water molecule, with base catalytic assistance from the same active site acid/base residue. Retaining enzymes that hydrolyse 2-acetamido sugars can alternatively employ a substrate-assisted mechanism in which the active site nucleophile is absent [186]. In this case the acetamide oxygen attacks the anomeric centre producing an oxazolinium ion intermediate, which is in turn attacked by water to release the hydrolysis product with net retention of the anomeric stereochemistry (Figure 1.5 B). Inverting enzymes utilise an acid and a base residue to catalyse the direct attack of water at the anomeric centre, facilitating release of the aglycone with inversion of anomeric stereochemistry (Figure 1.5 C).

GHs can be further categorized as either exo- or endo-acting. Exo-acting enzymes cleave monosaccharides from either the reducing (the terminal anomeric center is not involved in bonding) or non-reducing (terminal anomeric center is involved in bonding) termini of a polysaccharide. Endo-acting enzymes, on the other hand, cleave glycosidic bonds within a polysaccharide, releasing two polysaccharide fragments and creating new termini which can be targeted by exo-cleaving enzymes. Within the context of plant biomass degradation both endo- and exo-acting enzymes are required for efficient degradation of plant polysaccharides.
Furthermore, the complex nature of polysaccharides such as xyloglucan and rhamnogalacturonan II dictates the requirement for many different GH families to catalyse the complete degradation of these polymers (see Figures 1.6 and 1.7).

Figure 1.6: Enzymatic Degradation of Plant Cellulose and Hemicelluloses. Glycoside hydrolase and polysaccharide lyase families with the required activities to cleave plant polysaccharides. Monosaccharides are abbreviated as symbols and the linkages between them are labeled. Methylations and acetylations are abbreviated as Me and Ac respectively. GH and PL families with the required activity are given within the dashed boxes, and corresponding EC numbers are also indicated.
CE families are excluded for simplicity.

Figure 1.7: Enzymatic Degradation of Plant Pectins. Glycoside hydrolase and polysaccharide lyase families with the required activities to cleave plant polysaccharides. Monosaccharides are abbreviated as symbols and the linkages between them are labeled. Methylations and acetylations are abbreviated as Me and Ac respectively. GH and PL families with the required activity are given within the dashed boxes, and corresponding EC numbers are also indicated. The RG-II degradation genes shown have been limited to those families identified by Ndeh et al. [215]. CE families are excluded for simplicity.

1.2.2 Polysaccharide Utilization Loci

The study of carbohydrate metabolism has resulted in foundational achievements in molecular biology. Study of the lac operon and the L-arabinose operon has revealed mechanisms of gene expression and provided powerful molecular tools [107, 137]. More recently, research has focused on co-localized gene clusters in Bacteroidetes genomes that target plant biomass.
These carbohydrate-targeting gene clusters have been termed Polysaccharide Utilization Loci (PULs) [28]. As Bacteroidetes are found in many diverse environments, including gut microbiomes [14, 208], both marine [89] and fresh water [290], and soils [166], PULs play a significant role in planetary carbohydrate degradation.

The first PUL to be identified, by Salyers et al. [289], was the starch utilization system (SUS); see Figure 1.8. This archetypical PUL contains the outer-membrane binding proteins which bind starch, and a surface-bound hydrolase that produces starch oligomers. These oligomers are then transported via a TonB-dependent transporter into the periplasmic space where they are further degraded by two additional hydrolases. The products of this saccharification can then enter the cell and central metabolism [90]. Not only are all these genes co-expressed, but they are also co-localized within the genome [90].

The hallmark of all PULs is the presence of a sequential pair of SusC-like and SusD-like proteins, encoding a TonB-dependent transporter and a surface binding protein. Otherwise, the variety of enzymes encoded within PULs varies in complexity with the polysaccharide being acted on. PULs have been shown to contain catalytic PLs, CEs, sulfatases and phosphorylases in addition to both endo- and exo-acting GHs [48, 187, 215]. Furthermore, the discovery of PULs that target the components of the plant cell wall (including mixed-linkage glucans [287], xyloglucan [165], xylan [257], galactomannan [16], RG-I [183] and RG-II [215]) has led to the identification of several new GH families and the identification of new activities within previously known families.

Figure 1.8: Starch Utilization System (SUS) Operon in B. thetaiotaomicron.
A: Extracellular starch is bound by the outer-membrane lipoproteins SusDEF and hydrolysed by SusG (GH13). These starch oligomers are then transported to the periplasm via the TonB-dependent transporter SusC. The starch oligos are further degraded by the hydrolases SusA (GH13) and SusB (GH97) to dimers and monomers, which then enter the cell. SusR senses maltose and drives expression of susABCDEFG. B: Genomic organization of the SUS operon; genes are not shown to scale. Figure adapted from Koropatkin et al. [156].

1.2.3 Glycosynthases

The reactions catalysed by glycoside hydrolases are reversible, and these enzymes therefore have the capability to be used in the synthesis of glycans. Reversal by altering the equilibrium position, however, is challenging and requires the use of very high sugar concentrations to counteract the presence of 55 M water [116]. Attempts to perform transglycosylations in non-aqueous solutions are not generally useful since the sugars themselves typically become insoluble, though in certain cases worthwhile products can be obtained [164]. More fruitful has been the formation of products under kinetic control via transglycosylation. This necessarily uses a retaining glycosidase, typically with an activated donor sugar to form a high steady state concentration of glycosyl enzyme, allowing efficient transglycosylation [98]. However, the products formed from activated glycosides can subsequently be hydrolysed by the glycosidase, limiting yields.

Mutation of the active site nucleophile drastically decreases the hydrolytic activity of a retaining glycoside hydrolase [313]. This also prevents transglycosylation as the necessary covalent glycosyl-enzyme intermediate is no longer formed.
If, however, a donor substrate possessing an activated leaving group at the anomeric center with the opposite anomeric stereochemistry relative to the hydrolysis product (a mimic of the glycosyl-enzyme intermediate) is employed, transfer to a suitable acceptor can be catalysed without subsequent hydrolysis (Figure 1.9 A). Enzymes of this class have been termed glycosynthases. This method for glycan synthesis was first demonstrated 20 years ago with the E358A nucleophile variant of Agrobacterium sp. β-glucosidase (Abg) [188]. This enzyme was chosen to create a glycosynthase as the wild-type enzyme normally catalyses efficient transglycosylation [245] and the substitution of alanine for the glutamate nucleophile resulted in a stable enzyme with severely decreased hydrolysis rates [312]. The use of either α-galactosyl fluoride or α-glucosyl fluoride as donors and para-nitrophenyl glycoside acceptors with this enzyme enabled the production of several different glycans with yields of up to 92 % [188].

A similar method has also been developed for retaining glycoside hydrolases employing substrate-assisted catalysis [303]. By utilising an activated oxazoline glycan as a donor, transglycosylation of 2-acetamido-glycans can be catalysed by a wild-type glycosidase [98]. However, the product may still be hydrolysed. Yamamoto and colleagues were able to circumvent this problem by introducing active site mutations which reduced hydrolysis rates without substantially compromising transglycosylation rates, thereby improving yields [303] (Figure 1.9 B).

Though most glycosynthases are derived from retaining glycoside hydrolases, inverting glycosidases have also been converted into glycosynthases by mutating the catalytic base and using an activated glycan with the same anomeric stereochemistry as the normal hydrolysis product.
Efficient transglycosylation can be achieved without subsequent hydrolysis by reversal of the normal reaction since fluoride requires no acid activation for departure, yet the normal hydrolytic process is substantially slowed (Figure 1.9 C). An example of this type of glycosynthase is an inverting glycosynthase from GH19 [227]. In that case, the S102A variant of the Bryum coronatum chitinase can catalyse the synthesis of chitooligosaccharides from α-chitobiosyl fluoride, which acts as both donor and acceptor molecule.

Figure 1.9: Glycosynthase Mechanisms. The mechanism of a glycosynthase developed from a retaining glycosidase (A), a glycosynthase utilising an oxazoline donor sugar (B), and an inverting glycosynthase (C).

One could envisage using glycosynthases to synthesize almost any polysaccharide with defined regiospecificity and without need for chemical protection. However, for this vision to become a reality, the range of available glycosynthases must be expanded. Glycosynthases have thus far been developed from 17 GH families [60, 227], but this represents only a small fraction of the over 140 active glycoside hydrolase families currently known. Expansion of the range of GH families that have been converted to glycosynthases will enable the production of new glycan linkages. Also, the exploration of hydrolases within a family that may act on similar glycans but with different protein or lipid specificity is a worthwhile goal.
The creation of functional gene libraries should enable the rapid screening for enzymes with specific hydrolytic activities (including non-natural activities), which can then be converted into glycosynthases with a cognate synthase activity.

1.3 Metagenomics

A vast majority of the estimated 10^30 prokaryotic cells [317] belong to species which have never been cultured in isolation. This confounds the central questions of microbial ecology, namely “who is there?” and “what are they doing?” [314]. To address these questions a number of techniques have been employed to analyse all the genetic material within an environment as a whole. To access the metagenome, a term first coined in 1998 [113], DNA is often isolated directly from the environment, thus bypassing the need for culturing.

Metagenomic research has taken advantage of massively parallel, high-throughput DNA sequencing techniques to provide insight into environmental DNA. To analyse the functional role of these sequences and their corresponding genes within an environment, a functional prediction must be made. However, this is limited by the number of genes that have been functionally characterized and the reliability of prediction. Furthermore, these predictions are unable to assign new functionalities to novel genes; sequence annotation can only operate within the current paradigm of gene functions. For example, it has been estimated that only 6% of CAZy enzymes have been characterized and that the function of only 20% of the proteins in the sequence database can be predicted with confidence [105].

There is a clear need for the functional characterization of metagenomic DNA. This can be accomplished by functional metagenomic activity screens, coupled with high-throughput enzyme characterization. Functional screens have the ability to provide a direct link between metagenomes and their functional activities.
They can also provide the ability to discover enzymes with activities that exist outside the current paradigms of gene annotation, which in turn can better inform in silico approaches.

1.3.1 Functional Metagenomic Screens

Functional metagenomic screens involve the construction and screening of environmental DNA expression libraries. These libraries require a suitable vector for heterologous expression in a compatible host system such as E. coli (Figure 1.10). Identifying a suitable source of environmental DNA is a critical consideration when designing a screening strategy. Potential DNA sources include soil [73], water [135], feces [148] and bioreactors [201], all of which present different challenges in their processing. Soil and feces typically contain contaminants that interfere with downstream enzymatic processes, necessitating additional DNA purification steps. Water samples, on the other hand, may be too dilute in terms of the number of cells per liter, and require the filtration of a large volume to obtain enough cells. Additionally, the choice of environment will likely dictate the viability and method of functional screening. If the targeted activity is known to be abundant in the environmental sample, a small insert library is potentially sufficient. Conversely, if the suspected activities are likely to be rare, a small insert library will not be sufficient and a large insert, or fosmid, library will potentially be the better choice [293].

The choice of host strain is another factor that must be considered when designing a screen. Engineered E. coli strains are the most commonly used screening hosts for functional metagenomics, as they grow rapidly and are easy to transform with exogenous DNA. However, there are limitations when dealing with exogenous promoters, initiation factors, codon usage or protein folding. Gabor et al. [99] estimated that, across a diverse subset of genomes, the expression potential for an E. coli host system ranges widely, from only 7% to up to 73% of the genes. Additionally, it is important to select a host strain that lacks endogenous activity against the screening substrate.

Resulting libraries are screened for activity on agar [88] or in microtiter plates [200, 201], using a reporter (e.g. a substrate or transgene) or another form of phenotypic selection (e.g. growth). Screening libraries sourced from a range of environmental conditions (e.g. pH, temperature, metal ion concentrations) enables recovery of active clones with alternative substrate specificities and tolerances [19, 24, 27, 147, 193, 223, 306, 318, 328, 332]. Similarly, libraries sourced from xylotrophic or wood-feeding organisms can provide insight into biomass deconstruction. Recently, Ruegg and colleagues screened an isolate fosmid library sourced from the lignocellulolytic bacterium Enterobacter lignolyticus to identify genes conferring ionic liquid (IL) tolerance under biorefining conditions in an E. coli host [260]. They recovered an active clone encoding a membrane transporter and transcriptional regulator enabling a 20% increase in biofuel production in the presence of 68 mM 1-ethyl-3-methylimidazolium chloride. Similarly, Bastien and colleagues screened fosmid libraries sourced from the gut and fecal combs of the termite Pseudacanthotermes militaris [19]. This species cultivates a termite-specific Basidiomycete fungus, Termitomyces sp., which thrives upon combs made of termite feces. Functional screening recovered 101 clones acting on a range of model substrates containing arabinoxylan and xylan moieties and identified differences in biomass deconstruction potential between microbial communities inhabiting the gut and comb milieus.
Functional metagenomic screening has allowed the discovery of biocatalysts from a number of environments; however, many remain to be explored.

1.3.2 16S Ribosomal RNA Profiling

To address the question of which species are present within an environment, molecular methods have been developed. This is necessary as it is difficult to determine the taxonomy of prokaryotic cells based on morphology alone. By examining the sequence of marker genes encoded within the genome, a systematic framework for bacterial taxonomy has been developed. The specific marker gene that is typically used is the small sub-unit ribosomal RNA, also known as 16S rRNA. This is an ideal choice as this gene is ubiquitous and functionally conserved, and its different regions change at different rates [321]. The 16S rRNA contains nine (9) variable regions [331] which can be targeted with primers, facilitating the amplification of these regions from the genetic background. Amplified DNA can then be sequenced with short read sequencing technology, producing many thousands of reads. The resulting sequences are then processed with the use of a bioinformatic pipeline, such as QIIME [46]. This pipeline removes low quality sequences and clusters the remaining sequences (typically at 97% sequence similarity). The resulting bins, referred to as Operational Taxonomic Units (OTUs), can then be assigned a taxonomy based on identity with known 16S sequences.

Figure 1.10: Functional Metagenomic Screening Workflow. Microbial communities can be interrogated for biological activities through functional metagenomic screening. Environmental DNA can be extracted directly from natural and engineered ecosystems and used to construct screening libraries.
A workflow for constructing large insert fosmid libraries and small insert libraries is depicted. Fosmid library production involves high molecular weight environmental DNA preparation, ligation into a vector backbone and head-full packaging of ligated DNA into a phage delivery system. Small insert libraries can similarly be ligated with a variety of vector backbones, which are then used to transform the host via electroporation. Host cells are then transfected, plated and arrayed in 384-well plate libraries and can be interrogated with a variety of functional screens.

1.4 Dissertation Overview

The aim of this thesis is to analyze the functional aspects of microbial communities that degrade plant polysaccharides and to investigate unexamined environments with high-throughput functional screens. This will lead to the creation of a library of functional clones which can be rapidly interrogated under a variety of conditions with a variety of substrates. Additionally, mutation of these catalysts can produce enzymes that are capable of synthesizing defined glycans containing chemically modified sugars. This thesis contributes a better understanding of the enzymatic conversion of plant polysaccharides, and new catalysts for both the deconstruction of plant biomass and the synthesis of chemically modified polysaccharides.

Chapter 2 details the use of high-throughput functional metagenomics to screen 22 different environments for cellobioside-degrading activities. This enabled the creation of a panel of active fosmid-harbouring clones, which were further characterized by rapid, plate-based assessment of their biochemical parameters and by sequencing. This has revealed hundreds of glycoside hydrolases, many of which show low identity to any previously discovered gene.

Chapter 3 describes the application of functional metagenomic screening to the Castor canadensis fecal and gut microbiomes.
Four fosmid libraries were created from different sites within the digestive tract and fecal matter. These were subjected to functional screening with new and highly activated substrates specific for cellulose- and hemicellulose-cleaving enzymes. This resulted in the identification of many previously unknown PULs and the characterization of enzymes that synergistically degrade arabinoxylans.

Chapter 4 uses the clone libraries generated in Chapters 2 and 3 and a synthetic gene library to detail the promiscuity of glycoside hydrolases. Genes identified with activity towards modified glycosides were then mutated in the hopes of creating glycosynthases that could use modified acceptor sugars. The efficiency of the resulting glycosynthases and the products they form are described. This has led to the generation of eight new synthetically useful glycosynthases.

Chapter 5 gives an overall analysis and integration of the research and conclusions of the thesis in light of current research in the field. This chapter also comments on strengths and limitations of the thesis research and presents possible future research directions in the field drawing on the work of this thesis.

Finally, Chapter 6 details the materials and methods used to conduct the research contained within this thesis.

Chapter 2
Large-Scale Functional Metagenomic Screening for Glycoside Hydrolases

2.1 Summary

This chapter presents the high-throughput functional screening of 22 large insert metagenomic libraries and the characterization of active clones. Screening was performed in 384-well plate format with a model substrate (4-methylumbelliferyl cellobioside) that releases a fluorescent molecule when cleaved by β-glucosidases or cellulases, and resulted in 178 verifiably active clones. The substrate specificity, thermal stability and optimal pH of the glycosidase(s) expressed by these clones were investigated in a high-throughput, plate-based format.
The insert DNA harboured within each of these clones was sequenced, and functional annotation revealed a cornucopia of carbohydrate-degrading enzymes. The discovered genes were compared to those of previously characterized glycoside hydrolases, which revealed several genes belonging to clades that have not previously been characterized. The large insert sequences were investigated for the presence of operons and gene clusters, which revealed synteny between fosmids. This well characterized collection of clones serves as a future resource for the development of optimized biocatalysts, whether it be for the degradation of biomass or for other specialized functions.

2.2 Background

Plant biomass offers a sustainable source for energy and materials and an alternative to fossil fuels. However, the industrial scale production or biorefining of fermentable sugars from plant biomass is currently limited by the lack of cost effective and efficient biocatalysts [57]. Microbes, the earth’s master chemists, employ biocatalytic solutions to harvest energy and transform it into useful molecules, and so offer a potential solution to this problem. Microbial degradation of carbohydrates involves the use of glycoside hydrolases (GHs), which offer some of the greatest catalytic rate enhancements among enzymes [336]. GHs catalyse the degradation of a profuse variety of polysaccharides, including cellulose (the most abundant terrestrial biopolymer [31]), pectins and hemicelluloses. They are also important industrial catalysts [76, 239, 267] and therapeutic targets [151]. Clearly, the identification of new GH genes has the potential to improve upon both the efficacy of current biocatalysts and the generation of new catalysts for new chemistries.

The Carbohydrate-Active Enzymes database (CAZy) is an expertly curated resource which classifies GH genes into over 140 families based on sequence similarity [181].
The genes within a family often display catalytic specificity towards the same broad category of substrate, which enables the predictive annotation of genes that have not been functionally analyzed [315]. However, this predictive ability often breaks down when a diverse range of substrates is cleaved by enzymes within a family. It has been estimated that the function of only 20 % of the proteins in the sequence database can be predicted with confidence [105]. Additionally, only a small subset of the GH genes within the CAZy database have been functionally characterized; as of 2013 only 6 % of GH genes had had any form of functional characterization [181], and this percentage is surely decreasing with the influx of new genomes that are deposited into the database.

Metagenomic research has taken advantage of massively parallel, high-throughput DNA sequencing techniques to provide insight into the function of environmental DNA [300]. Several studies have focused on the discovery of GH genes from environmental DNA [120, 176, 315]. This approach serves as a promising avenue for the discovery of new catalysts; however, typically only very few enzymes from a metagenome are functionally characterized. This lack of functional characterization further expands the gap between the total number of genes sequenced and those gene products that have been functionally characterized.

Most efforts to increase the diversity of functionally characterized GH genes have focused on studying one or a few enzymes at a time. More recently, efforts utilizing large-scale gene synthesis have enabled the exploration of phylogenetic branches within a family that have not been well characterized [117]. This is a worthwhile method that one could envision being applied to many enzyme families.
However, until the cost of gene synthesis comes down this type of study remains out of reach for a majority of research groups.

Function-based metagenomic activity screens, coupled with high-throughput enzyme characterization, can enable the functional annotation of genes without the bias introduced when annotation is done by sequence comparison and without the need for costly gene synthesis. Functional screens have the ability to provide a direct link between metagenomes and their functional activities. They can also provide the ability to discover enzymes with activities that exist outside the current paradigms of gene annotation, which in turn can better inform in silico approaches.

The aim of this study was to produce a library of fosmid clones containing environmental DNA encoding cellobiohydrolase activity, as this function is key to the degradation of plant polysaccharides [105]. Furthermore, we hoped to profile how the presence of cellobiohydrolase genes varied across environments expected to be either enriched or depleted in plant biomass. To this end, 309,504 clones containing DNA extracted from 22 diverse sites were interrogated with a fluorogenic activity probe. The resulting resource, a panel of 178 clones, enabled us to rapidly investigate the substrate specificity, acid tolerance and thermal tolerance of enzymes expressed by these clones and revealed a diverse set of genes and activities.

2.3 Results and Discussion

A set of twenty-two (22) fosmid libraries was chosen for functional metagenomic screening. These libraries were sourced from a variety of natural and engineered ecosystems, as described in Table 2.1. Ocean water samples were sourced from the North-Eastern sub-Arctic Pacific Ocean at depths ranging from surface to 2000 m [326]. Soil samples were collected from four different depths from disturbed and undisturbed test plots in Skulow Lake, British Columbia [114]. Coal bed samples were produced from coal bed core cuttings or water withdrawn from the coal beds [8].
Bioreactor samples were sourced from an anaerobic mining bioreactor [201], a methanogenic naphtha-degrading culture or a methanogenic toluene-degrading culture [288]. As these DNA sources varied drastically in their physicochemical properties (Table 2.1) and microbial community composition, we hoped that this diversity would potentiate the discovery of new catalysts.

Table 2.1: Fosmid Libraries
Name | Project | Sample Type | Ref. | Depth (m) | Temp. (°C) | pH | Clones
12010 | Ocean | Water from Station P12 | [326] | 10 | 8.4 | 7.8 | 7,680
12200 | Ocean | Water from Station P12 | [326] | 500 | 4.5 | 7.3 | 7,680
12500 | Ocean | Water from Station P12 | [326] | 2000 | 1.9 | 7.4 | 7,680
40010 | Ocean | Water from Station P4 | [326] | 10 | 9.9 | 7.8 | 7,680
40500 | Ocean | Water from Station P4 | [326] | 500 | 5.6 | 7.4 | 7,680
41000 | Ocean | Water from Station P4 | [326] | 1000 | 3.6 | 7.3 | 7,680
41300 | Ocean | Water from Station P4 | [326] | 1300 | 2.9 | 7.3 | 7,680
NO | Soil | Natural; Organic horizon | [114] | 0 | 4.1 | 5.0 | 10,752
NA | Soil | Natural; Mineral (eluviation) | [114] | 0.1 | 4.1 | 5.7 | 13,440
NB | Soil | Natural; Mineral (transition) | [114] | 0.3 | 4.1 | 6.0 | 9,984
NR | Soil | Natural; Mineral (accumulation) | [114] | 0.55 | 4.1 | 6.7 | 23,040
CO | Soil | Clearcut; Organic horizon | [114] | 0 | 4.1 | 6.0 | 16,512
CA23 | Soil | Clearcut; Mineral (eluviation) | [114] | 0.1 | 4.1 | 5.7 | 9,216
CB | Soil | Clearcut; Mineral (transition) | [114] | 0.3 | 4.1 | 6.2 | 21,888
SCR | Soil | Clearcut; Mineral (accumulation) | [114] | 0.55 | 4.1 | 6.7 | 10,752
FOS62 | Bioreactor | Bioreactor core sample | [201] | 0 | 18.0 | 6.9 | 18,432
TolDC | Bioreactor | Toluene degrading culture | [288] | 1.5 | 25.0 | 7.5 | 23,040
NapDC | Bioreactor | Naphtha degrading culture | [288] | 31 | 28.0 | 7.5 | 20,736
CG23A | Coal Bed | Coal bed produced water | [8] | 300-500 | 32.1 | 7.9 | 9,600
CO182 | Coal Bed | Coal bed cutting | [8] | 686 | 22.0 | N.D. | 23,040
CO183 | Coal Bed | Coal bed cutting | [8] | 730 | 22.0 | N.D. | 23,040
PWCG7 | Coal Bed | Coal bed produced water | [8] | 300-500 | 32.4 | 7.7 | 22,272
Total | | | | | | | 309,504
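As a quick consistency check on Table 2.1, the clone counts can be tallied by project. The sketch below is illustrative only and is not part of the screening pipeline: the tuple layout and the function name `clones_by_project` are assumptions made for this example, and the numbers are transcribed from the table.

```python
from collections import defaultdict

# (library, project, clones) transcribed from Table 2.1.
# The tuple layout is an assumption made for this illustration.
TABLE_2_1 = [
    ("12010", "Ocean", 7680), ("12200", "Ocean", 7680),
    ("12500", "Ocean", 7680), ("40010", "Ocean", 7680),
    ("40500", "Ocean", 7680), ("41000", "Ocean", 7680),
    ("41300", "Ocean", 7680),
    ("NO", "Soil", 10752), ("NA", "Soil", 13440),
    ("NB", "Soil", 9984), ("NR", "Soil", 23040),
    ("CO", "Soil", 16512), ("CA23", "Soil", 9216),
    ("CB", "Soil", 21888), ("SCR", "Soil", 10752),
    ("FOS62", "Bioreactor", 18432), ("TolDC", "Bioreactor", 23040),
    ("NapDC", "Bioreactor", 20736),
    ("CG23A", "Coal Bed", 9600), ("CO182", "Coal Bed", 23040),
    ("CO183", "Coal Bed", 23040), ("PWCG7", "Coal Bed", 22272),
]


def clones_by_project(rows):
    """Sum the clone counts for each project type."""
    totals = defaultdict(int)
    for _library, project, clones in rows:
        totals[project] += clones
    return dict(totals)


totals = clones_by_project(TABLE_2_1)
grand_total = sum(totals.values())
for project, clones in totals.items():
    print(f"{project}: {clones:,} clones")
print(f"All libraries: {grand_total:,} clones")  # 309,504, as in Table 2.1
```

The per-project sums are the denominators behind the pooled hit rates reported for each environment.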
2.3.1 In-Silico Screening

All 22 of the chosen libraries have had a portion of their clones end-sequenced, meaning that the ends of the insert DNA were sequenced using Sanger sequencing technology (Table 2.2). To preliminarily assess the potential of these libraries to catalyse the degradation of cellulosic biomass we turned to these end-sequences, as being representative of genes within the library. A total of 176,472 clones were end-sequenced, 57 % of all clones, producing 235 Mbp of sequence data. Open reading frames (ORFs) were predicted from these end-sequences using Prodigal [134] implemented within the MetaPathways bioinformatic pipeline [155], resulting in a total of 400,561 predicted ORFs. These predicted ORFs were then annotated using LAST [150] implemented in the MetaPathways pipeline based on queries of the CAZy database [181], revealing a total of 3,953 predicted glycoside hydrolases (GHs).

Table 2.2: End Sequences Interrogated From Each Library
Library | Project | End Sequences | Predicted ORFs | GH Genes
12010 | Ocean | 12,477 | 12,769 | 50
12200 | Ocean | 14,740 | 17,472 | 106
12500 | Ocean | 14,886 | 17,495 | 96
40010 | Ocean | 14,275 | 15,771 | 90
40500 | Ocean | 14,705 | 16,715 | 107
41000 | Ocean | 14,701 | 16,935 | 116
41300 | Ocean | 14,488 | 16,601 | 111
CO | Soil | 15,360 | 17,086 | 235
CA | Soil | 15,360 | 16,903 | 166
CB | Soil | 15,360 | 17,441 | 185
SCR | Soil | 15,360 | 16,303 | 188
NO | Soil | 15,360 | 17,577 | 164
NA | Soil | 15,360 | 17,288 | 198
NB | Soil | 15,360 | 17,413 | 212
NR | Soil | 15,360 | 17,259 | 174
FOS62 | Bioreactor | 37,632 | 40,255 | 837
TolDC | Bioreactor | 15,360 | 16,618 | 131
NapDC | Bioreactor | 15,360 | 17,126 | 143
CG23A | Coal Bed | 15,360 | 16,779 | 195
CO182 | Coal Bed | 15,360 | 17,329 | 225
CO183 | Coal Bed | 15,360 | 17,678 | 209
PWCG7 | Coal Bed | 15,360 | 23,748 | 15
Total | | 352,944 | 400,561 | 3,953

Of the predicted GHs, 320 (0.080 % of all predicted ORFs) were found to belong to families that have β-glucosidase activity but not cellulase activity (GH1, GH3, GH30, GH116), with GH3 being the most abundant (246 ORFs, 0.061 % of predicted ORFs).
With respect to cellulases, 256 (0.064 % of predicted ORFs) were found to belong to families that contain members with cellulase activity (GH5, GH6, GH7, GH8, GH9, GH10, GH12, GH26, GH44, GH45, GH48, GH51, GH74 and GH124), with GH5 being the most abundant (105, 0.026 % of predicted ORFs). The distribution of cellulases and β-glucosidases varied greatly between libraries (Figure 2.1). The library derived from an anaerobic mining bioreactor (FOS62) had the highest abundance of predicted cellulases and β-glucosidases (n = 169, 0.42 % of predicted ORFs), likely a reflection of the feed stock for the bioreactor (bacterial biomass and partially degraded and composted cellulose). The soil libraries had the next highest abundance of cellulases and β-glucosidases (24.6 ± 6.5 ORFs, 0.14 % ± 0.04 % of predicted ORFs), which one would expect due to the presence of cellulose in the form of decaying plant matter. The ocean and coal bed samples were relatively depleted in cellulases and β-glucosidases (0.08 % ± 0.03 % and 0.08 % ± 0.05 % of predicted ORFs, respectively), which in turn may be attributed to the paucity of cellulose in these environments. Additionally, coal bed samples show a lack of diversity in the number of cellulase families found; they contain almost exclusively GH5 enzymes. This dearth is also reflected by the counts of cellulases and β-glucosidases predicted for the naphtha- and toluene-degrading enrichment cultures (0.08 % and 0.06 %, respectively); furthermore, the end-sequences of these libraries contain almost no enzymes predicted to belong to cellulase families (TolDC = 1, NapDC = 2).

Table 2.3: Highly Repetitive Short ORFs from PWCG7
ORF | Length | Counts | Blast Hit | Identity
1 | 124 | 5838 | holin [Pseudomonas stutzeri] | 98%
2 | 174 | 5096 | phage terminase [Pseudomonas stutzeri] | 99%
3 | 112 | 3459 | pyocin R2, holin [Pseudomonas stutzeri] | 100%

Of the sequences investigated, the PWCG7 library had the fewest predicted cellulases and β-glucosidases (n = 1, 0.004 % of all ORFs).
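The percentages quoted in this section are family counts divided by the number of predicted ORFs. A minimal sketch of that arithmetic follows, using counts taken from the text and Table 2.2; the helper name `percent_of_orfs` and the dictionary layout are hypothetical and chosen only for illustration.

```python
TOTAL_ORFS = 400_561  # all predicted ORFs across the 22 libraries (Table 2.2)


def percent_of_orfs(count, total=TOTAL_ORFS):
    """Express a gene count as a percentage of the predicted ORFs."""
    return 100.0 * count / total


# (count, denominator) pairs quoted in the text and in Table 2.2.
examples = {
    "beta-glucosidase families (GH1, GH3, GH30, GH116)": (320, TOTAL_ORFS),
    "GH3": (246, TOTAL_ORFS),
    "cellulase families": (256, TOTAL_ORFS),
    "GH5": (105, TOTAL_ORFS),
    "FOS62 cellulases and beta-glucosidases": (169, 40_255),
    "PWCG7 cellulases and beta-glucosidases": (1, 23_748),
}

for label, (count, total) in examples.items():
    print(f"{label}: {percent_of_orfs(count, total):.3f} % of predicted ORFs")
```

Note that the library-specific values (FOS62, PWCG7) use that library's own ORF count as the denominator rather than the pooled total.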
An additional peculiarity is that this library has a substantially higher number of predicted ORFs when compared to other libraries of a similar size (Table 2.2). Further investigation revealed a redundancy among the predicted ORFs; there were only 3,807 unique ORFs within the 23,748 predicted ORFs. Three ORFs in particular were found over 3,000 times within the end-sequences (Table 2.3) and each of these is a small phage protein. This is suggestive of either phage contamination within this portion of the PWCG7 library or that the environment was sampled during a period of viral bloom.

Figure 2.1: In-Silico Screening of Fosmid End-sequences. Bubble plot of CAZymes which are predicted to have activity on cellulose or cellooligosaccharides. Families GH7 and GH124 were omitted as there were no predicted genes belonging to these families. Bubble area is proportional to the percentage of all predicted ORFs within a specific family.

The presence of lytic polysaccharide mono-oxygenases (LPMOs) was also investigated. These enzymes are classified as Auxiliary Activities (AA) in the CAZy database [171], and AA9 and AA10 have been observed to oxidatively cleave cellulose chains and act synergistically with other cellulases [94, 126]. However, only 4 AA9 or AA10 proteins were predicted from the fosmid end-sequences (2 AA9s in the NO library and 1 AA10 each in the CA and CO183 libraries). One can speculate that this dearth of LPMOs may be caused by the anoxic or anaerobic nature of a majority of the samples that were used to create libraries.

Taking this information together, the expectation for functional screening would be that the FOS62 library is likely to produce the greatest number of hits, followed by the soil libraries. Screening of the ocean, coal bed, TolDC and NapDC libraries, on the other hand, is likely to result in a smaller number of functional hits, and in the case of the coal bed libraries I would expect a low diversity in the number of families that are functionally identified.

2.3.2 Functional Screening

All 22 libraries were screened by Sam Kheirandish with a fluorogenic substrate, 4-methylumbelliferyl cellobioside (MU-C), designed to detect cellulase, cellobiohydrolase and β-glucosidase activity. This substrate releases the fluorophore 4-methylumbelliferone (MU), which can be detected at concentrations in the nanomolar range (Figure 2.2). MU-C offers an increase in sensitivity over the previously employed 2,4-dinitrophenyl cellobioside (DNP-C), a chromogenic substrate which had been used to screen a third (6,144 of 18,432) of the FOS62 clones [201], as fluorescent detection is inherently more sensitive. The use of this substrate was adapted to the screening paradigm employed by Mewis et al. [201], which enabled the rapid screening of over 300,000 clones.

Figure 2.2: Fluorogenic Reporter 4-Methylumbelliferyl Cellobioside. Cleavage of the reducing end acetal linkage releases a fluorescent molecule, facilitating the detection of glycoside hydrolases.

Screening revealed 256 hits with a plate-based z-score (the number of standard deviations above the mean) above 10, a hit rate of 1 in 1,209 clones, although this varied drastically between libraries (Figure 2.3). Of these hits, 178 were verified after re-streaking and triplicate validation (Table 2.4). As expected from the annotation of fosmid ends, the FOS62 library had the highest hit rate of any of the libraries (1 in 222). The library with the next highest hit rate was the TolDC library (1 in 768), followed by the Soil libraries (average hit rate of 1 in 2,627).
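The plate-based hit calling described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the code used for the screen: the function names are hypothetical, the use of the sample standard deviation is an assumption (the text only defines the z-score as the number of standard deviations above the plate mean), and the plate readings are invented.

```python
from statistics import mean, stdev


def plate_z_scores(readings):
    """Z-score of each well relative to the mean of its own plate."""
    mu = mean(readings)
    sigma = stdev(readings)  # sample standard deviation: an assumption
    if sigma == 0:
        return [0.0 for _ in readings]
    return [(r - mu) / sigma for r in readings]


def call_hits(readings, threshold=10.0):
    """Indices of wells whose z-score exceeds the threshold."""
    return [i for i, z in enumerate(plate_z_scores(readings)) if z > threshold]


def clones_per_hit(n_clones, n_hits):
    """Hit rate expressed as 'one hit per this many clones screened'."""
    return n_clones / n_hits


# Hypothetical 384-well plate: near-constant background, one bright well.
plate = [100 + (i % 5) for i in range(384)]
plate[37] = 5000
print(call_hits(plate))                       # [37]

# Hit rates quoted in the text.
print(round(clones_per_hit(309_504, 256)))    # 1209 (all libraries, initial hits)
print(round(clones_per_hit(18_432, 83)))      # 222 (FOS62, verified hits)
```

Because each plate is scored against its own mean and standard deviation, a plate carrying several strongly active clones inflates its own standard deviation, which can lower the z-scores of the weaker hits on that plate.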
The coal bed libraries were comparatively poor, with an average hit rate less than half that seen for the soil libraries (average hit rate of 1 in 5,996), while the ocean libraries only produced a combined total of 3 hits from the over 50,000 clones screened (average hit rate of 1 in 17,920).

Figure 2.3: Functional Screening of All Libraries with MU-C. Z-score values for fluorescence were calculated for each plate. Clones above a z-score of 10 were chosen for further validation.

The general trend observed for the number of hits fit well with the expectations from fosmid end-annotation (FOS62 > soil > coal bed, ocean, methanogenic bioreactors). However, there were a few notable exceptions. Firstly, it was unexpected that the TolDC library would have the second highest hit rate of all the libraries screened. This enrichment culture was supplied with toluene as its sole carbon source, so it is quite surprising that the library obtained from this source shows such a capacity to degrade cellobiosides. Additionally, the absence of any hits from the CA23 soil library was unexpected, a result which may underline the inherent stochasticity of screening. Another unexpected result was the presence of any hits from the PWCG7 library. As this library had the lowest frequency of annotated cellulases and β-glucosidases, it was expected to have the fewest hits. However, PWCG7 had more hits than all 7 Ocean libraries combined.

Table 2.4: Functional Screening Hits
Library | Source | Verified Hits | Clones per Hit
12010 | Ocean | 0 | -
12200 | Ocean | 1 | 7,680
12500 | Ocean | 1 | 7,680
40010 | Ocean | 0 | -
40500 | Ocean | 1 | 7,680
41000 | Ocean | 0 | -
41300 | Ocean | 0 | -
NO | Soil | 14 | 768
NA | Soil | 8 | 1,680
NB | Soil | 5 | 1,997
NR | Soil | 3 | 7,680
CO | Soil | 5 | 3,302
CA | Soil | 0 | -
CB | Soil | 7 | 3,127
SCR | Soil | 2 | 5,376
FOS62 | Bioreactor | 83 | 222
TolDC | Bioreactor | 31 | 743
NapDC | Bioreactor | 4 | 5,184
CG23A | Coal Bed | 3 | 3,200
CO182 | Coal Bed | 4 | 5,760
CO183 | Coal Bed | 2 | 11,520
PWCG7 | Coal Bed | 4 | 5,568
Total | Ocean | 3 | 17,920
Total | Soil | 45 | 2,569
Total | Bioreactor | 118 | 527
Total | Coal Bed | 13 | 5,996
Total | All Libraries | 178 | 1,738

Access to both end-sequences and functional screening results also allowed us to empirically estimate the recovery rates for each of the libraries screened. Using the most frequently seen β-glucosidase and cellulase families from the end-sequences (GH3 and GH5), we can estimate the expected number of ORFs that belong to these families on all of the clones within each of the libraries (Table 2.5). Comparison with the number of GH3s and GH5s recovered from the fosmids gives us some insight into the hydrolase recovery rates and how these change across environments. The average recovery rate was approximately 2.5 % for both GH3 and GH5 families; however, it was highly variable between libraries. The ocean libraries had the lowest recovery rates (GH3 = 0.17 %, GH5 = 0.98 %), while the TolDC library had the highest recovery rates seen (GH3 = 9.71 %, GH5 = 13.87 %). The differences in recovery percentages seen are likely due to multiple factors, including regulation and expression of the genes, the ability of E. coli to properly translate the genes, and whether the protein products are active under the screening conditions used. One caveat of interpreting these data is that not all GH3s and GH5s are active on glucosides, with some family members targeting other substrates, such as β-N-acetylglucosaminides or xylosides in the case of GH3s.

As the FOS62 library had been previously screened with a different, chromogenic, substrate (DNP-C), the performance of MU-C could be compared to this benchmark.
For all FOS62 clones screened (n = 18,432), 90 colonies were determined to be hits with DNP-C (z-score threshold of 6), while 83 were uncovered with MU-C (z-score threshold of 10), with 35 of these clones found in both screens. These two leaving groups appear to access somewhat different sets of enzymes, as 103 of the total 138 fosmids recovered (75 %) were identified with only a single substrate. DNP-C is more reactive, as the pKa of the 2,4-dinitrophenyl leaving group (pKa = 4.09) is substantially lower than that of MU (pKa = 7.79), resulting in a reduced activation energy for bond cleavage. The DNP-C probe, however, lacks the sensitivity of fluorogenic MU-C. A chemical activity probe bearing a fluorescent leaving group with a low pKa may afford a larger number of clones and offer an improved hit rate over either DNP-C or MU-C.

Table 2.5: GH3 and GH5 Recovery Rates.

             Expected           Recovered (%)
Library      GH3      GH5       GH3      GH5
Ocean          577      307     0.17     0.98
Soil         1,861      679     2.58     1.77
Coal Beds      912      510     0.77     0.98
Fos62          762      345     6.04     6.95
TolDC          216       36     9.71    13.87
NapDC          315        0     0.64      N/A
Total        4,942    2,109     2.53     2.42

N/A: Could not be calculated.

2.3.3 High-throughput Characterization of Fosmids

To gain further insight into the discovered hits, high-throughput characterization of the fosmid clones was performed by Dr. Feng Liu and Tanya Duo. This characterization used a Biomek FX workstation (Beckman Coulter) and plate-based assays to gain insight into the substrate specificity, pH dependence and thermal stability of identified clones without the need for enzyme sub-cloning and purification. It should be noted, however, that as more than one gene may be expressed, this characterization may reflect the activity of more than one enzyme.

Substrate Preference

The 178 fosmid clones identified were assayed against a panel of eight different glycosides bearing a MU leaving group.
This panel of substrates consisted of the cellobioside, lactoside, β-D-glucopyranoside, β-D-galactopyranoside, β-D-xyloside, α-L-arabinofuranoside, β-D-mannopyranoside and β-D-N-acetylglucosaminide. Many of these monosaccharides and disaccharides are present in the hemicellulosic and pectic fractions of wood. A majority of clones were most active against either the glucoside or cellobioside substrate; however, a substantial number of clones had higher activity against other substrates (see Figure 2.4 and Table A.1). Clones with optimal activity against MU α-L-arabinofuranoside and MU β-D-xyloside were the next most abundant, with counts of 34 and 10 clones respectively. These sugar monomers are essential components of hemicellulose [264], thus fosmids active on these substrates may be active against hemicellulose in addition to their activity on glucosides. The presence of either multifunctional enzymes or multiple genes located in gene clusters such as PULs is a likely explanation for the multiple activities seen.

Figure 2.4: Fosmid Substrate Preference. Each fosmid-containing clone was assayed against eight substrates: MU cellobioside (Cel), MU lactoside (Lac), MU β-D-glucopyranoside (Glc), MU β-D-galactopyranoside (Gal), MU β-D-xyloside (Xyl), MU α-L-arabinofuranoside (Arab), MU β-D-mannopyranoside (Man) and MU β-D-N-acetylglucosaminide (GlcNAc). Initial rates were determined using crude cell lysate to determine the optimal substrate for each clone.
[Histogram: number of fosmids against optimal substrate, coloured by library source (bioreactor, coal bed, ocean, soil).]

Optimal pH determination

To ascertain the optimal pH for each fosmid clone, assays were performed with the optimal substrate for that clone in a number of solutions buffered at pH values ranging between 4.0 and 9.8. The average pH optimum was 5.6 ± 0.7, with the largest number of clones having an optimal pH between 5 and 6 (142 of 178 clones), Figure 2.5. Of the clones with pH optima greater than 7.5, a disproportionate number were derived from the ocean environment, likely reflecting the slightly alkaline pH of the open ocean. A total of 5 clones shared the lowest pH optimum (CB006_04_L11, FOS62_34_K14, NO001_13_N07, PWCG7_49_G20 and TolDC_59_K14), being most active in solutions buffered at pH 4. No clear correlation between the sample pH and the optimal pH of the fosmid clone was observed. One possible explanation for this lack of correlation may be the intracellular use of a subset of these enzymes, causing the pH optima to reflect intracellular pH rather than that of the environment.

Figure 2.5: pH Optima of Fosmid Clone Activity. Initial rates were used to determine the pH at which the fosmid-harbouring clones best catalysed the degradation of the optimal substrate.
[Histogram: number of fosmids against pH optimum, coloured by library source (bioreactor, coal bed, ocean, soil).]

The pH range observed for the fosmid clones assayed is typical of most β-glucosidases and cellulases which have been studied to date. There are, however, some notable exceptions. For example, alkaline cellulase K from Bacillus sp. strain KSM-635 [273] has optimal activity at a pH of 9.5, almost two units away from the most alkaline-tolerant clone found here. At the other end of the spectrum, the endo-glucanase SSO1949 from the archaeon Sulfolobus solfataricus has a very low pH optimum of 1.8 [132], substantially lower than that of the most acidic fosmid uncovered here. This low optimal pH is likely a reflection of the extremely low pH optimum (pH of 2-4) of this species [271].
Extremely acidic or alkaline activity is, however, not necessary for the successful implementation of cellulases or β-glucosidases in commercial cellulase cocktails. The most commonly used cellulase cocktail, Cellic® CTec3 (Novozymes, Copenhagen, Denmark), has a pH optimum of 5.0 - 5.5, a range in which nearly 30 % of the fosmid clones had optimal activity and in which an even greater number were active.

Thermal Stability

Further characterization was performed by Tanya Duo and Dr. Feng Liu to determine the thermal stability of the activity seen for each clone. Assays were performed with the optimal substrate for each clone at its optimal pH, after preheating at a range of temperatures between 37 °C and 90 °C. The resulting rates were used to determine the denaturation midpoint temperature (Tm). The Tm values determined spanned a range from 38 °C to 74 °C and had an average value of 50.7 ± 6.4 °C, Figure 2.6. One noteworthy observation was that three of the four highest Tm values determined were for fosmids from the PWCG7 library (PWCG7_33_K24, PWCG7_19_J20 and PWCG7_19_I21, with Tm values of 69, 69 and 74 °C respectively). The PWCG7 library was sourced from coal bed produced water that was at a temperature of 32.4 °C, the highest temperature for any environment screened here (Table 2.1), consistent with this library producing the clones with the highest Tm.

Taken in the context of the scientific literature, the Tm values of the recovered clones are modest. Extremophilic organisms such as Pyrococcus furiosus are much more likely to have extremely thermotolerant enzymes. In fact, the endoglucanase from P. furiosus has a temperature optimum of 100 °C and a Tm of 112 °C [20]. However, the current mixtures of hydrolytic enzymes such as Cellic® CTec3 exhibit optimal activity at moderate temperatures (50 - 55 °C).
A total of 27 % of the fosmid clones had Tm values at or above 55 °C, signifying that although the Tm values for recovered clones were not extreme, quite a few are acceptably stable for use as enzyme cocktail additives.

Figure 2.6: Thermal Stability of Fosmid Clone Activity. Denaturation midpoints were determined for all clones by first pre-incubating lysate over a range of temperatures and then assaying the clones with the optimal substrate to determine the initial rates of hydrolysis.
[Histogram: number of fosmids against denaturation midpoint (°C), coloured by library source (bioreactor, coal bed, ocean, soil).]

2.3.4 Fosmid Sequencing and Gene Annotation

Validated and characterized fosmids were then fully sequenced and assembled by Dr. Keith Mewis, Sam Kheirandish and myself to reveal the active genes present on each fosmid insert. Sequencing of 178 clones produced 6.2 Mbp of assembled data with an average fosmid insert size of 35 ± 5 kbp, Figure 2.7. Comparison between sequences identified 123 non-redundant clones, with redundancy defined as greater than 95 % similarity across more than 90 % of insert length. The redundant clones were most prevalent in the Fos62 and TolDC libraries (60 and 18 redundant clones respectively), while no clones meeting the redundancy criteria were identified within any of the soil libraries. This suggests that more sequence diversity is captured in the soil libraries.

Figure 2.7: Distribution of Fosmid Insert Length. Histogram showing the number of sequenced fosmids with a specified length; bars are coloured by the library source.
[Histogram: number of fosmids against insert length in base pairs, coloured by library source (bioreactor, coal bed, ocean, soil).]

GH abundance

Across all fosmids 4,653 ORFs were predicted, an average of 26.1 ± 5.5 per fosmid. These ORFs were queried against the CAZy database [181] with LAST [150] as implemented in the MetaPathways pipeline [155].
This revealed 516 ORFs annotated as glycoside hydrolases, Figure 2.8. All of the identified fosmids contained a GH belonging to a known β-glucosidase or cellulase family. The annotated GHs spanned 48 families, including all 7 families with β-glucosidase activity (GH1, GH3, GH5, GH9, GH30, GH39 and GH116) and 8 of 14 cellulase families (GH5, GH8, GH9, GH10, GH12, GH26, GH44 and GH51). Of the six cellulase families that were not found (GH6, GH7, GH45, GH48, GH74 and GH124), neither GH7 nor GH124 was identified in the fosmid end-sequences, thus their absence is unsurprising. Although GH45s were identified in the FOS62 library end-sequences, the majority (92 %) of GH45 sequences in CAZy are eukaryotic, which may explain the inability to detect any with E. coli as a host. The remaining cellulase families that escaped detection were GH6, GH48 and GH74. Of these, GH6 and GH48 are both thought to act processively from the non-reducing end, which in turn may require tighter binding of the cellulose polymer in the positive enzyme subsites. As MU-C lacks glucose residues in the "+" subsites, this is a feasible justification for the absence of these families. The family GH74, on the other hand, is largely composed of endoxyloglucanases, and only one enzyme is seen to have better activity on glucans than on xyloglucans [55]; the absence of this family from the observed hits can be justified by the scarcity of its action on undecorated glucans.

The two most abundant hydrolase families recovered from the fosmid inserts, GH3 and GH5 (2.7 and 1.1 % of fosmid ORFs, respectively), were also the most abundant β-glucosidase and cellulase families identified in the fosmid end-sequences. A further 6 hydrolase families were found at rates greater than 0.5 % of all ORFs (GH16, GH43, GH10, GH30 and GH1, in order from most to least abundant), though not all of these families contain cellulases or β-glucosidases.
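The family abundances quoted above are percentages of all predicted ORFs. As a rough illustration of that bookkeeping, the sketch below counts GH labels and applies the 0.5 % cut-off used in the text. The function names and the toy labels are hypothetical and are not taken from the thesis or from MetaPathways; only the idea of dividing family counts by the total ORF count comes from the text.

```python
from collections import Counter


def family_percentages(orf_families, total_orfs):
    """Percentage of all predicted ORFs assigned to each GH family.

    orf_families: one label per GH-annotated ORF (e.g. "GH3"); ORFs without
    a GH annotation are simply left out of this list.
    total_orfs: total number of predicted ORFs (the denominator).
    """
    counts = Counter(orf_families)
    return {family: 100.0 * n / total_orfs for family, n in counts.items()}


def families_above(percentages, threshold=0.5):
    """Families whose share of ORFs exceeds `threshold` percent, most abundant first."""
    kept = [(fam, pct) for fam, pct in percentages.items() if pct > threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)


# Hypothetical toy annotation (NOT thesis data): 1,000 ORFs, 40 with a GH label.
toy_labels = ["GH3"] * 27 + ["GH5"] * 11 + ["GH2"] * 2
toy_pct = family_percentages(toy_labels, total_orfs=1000)

for family, pct in families_above(toy_pct, threshold=0.5):
    print(f"{family}: {pct:.1f} % of ORFs")
```

Applied to the real annotation table, the same calculation with a denominator of 4,653 ORFs would be expected to give the per-family percentages plotted in Figure 2.8.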
The third most abundant family (GH16, 0.8 % of predicted fosmid ORFs) is not annotated as containing cellulases or β-glucosidases, but a portion of its characterized members do have activity on glucan polymers with mixed 1,3- and 1,4-linkages. All 30 of the fosmids annotated as containing a GH16 also have either a GH3 or GH5 present. The large number of GH16s recovered is therefore likely due to their association with cellulases or β-glucosidases in clusters of genes that work together to degrade glucans.

Figure 2.8: Predicted GH Abundance on Fosmid Hits and on End Sequences. Bubbles show the relative abundances of each GH family recovered from positive fosmid clones for each library source. The bioreactor results are shown for each library. Fosmid-encoded GHs are compared to those recovered from end-sequencing of fosmids. Bubbles are coloured by library.
[Bubble plot: GH family against library source (hits and end-sequences for coal bed, FOS62, NapDC, ocean, soil, TolDC and total); bubble size shows percentage of ORFs.]

GH43 enzymes have also not been described as cleaving β-glucans; rather, they are known to act on β-xylosides, α-L-arabinofuranosides and β-galactans, which are key components of hemicellulose [202], and are often found in hemicellulose- and pectin-degrading loci [215, 257]. As with the GH16 family, the high abundance of GH43 genes can be ascribed to their genomic co-localization with cellulases or β-glucosidases. Furthermore, the abundance of GH43s is likely the cause of the high percentage of hits with arabinosidase and xylosidase activity.

The distribution of predicted fosmid hydrolases was consistent across environments, barring a few exceptions.
One aberration was the low number of GH1s predicted from the FOS62 library (5 genes, 0.23 % of predicted ORFs) when compared to all other environments (22 genes, 0.86 % of predicted ORFs). The FOS62 library was also quite diverse in the range of cellulases recovered. Hydrolases belonging to families GH8, GH12, GH44 and GH51 were only recovered from the FOS62 library, even though all of these were seen in the end-sequences from the soil libraries. One further surprise was that the GH30 family was predicted on fosmids from coal bed libraries. End-sequencing of these libraries revealed a paucity in the diversity of hydrolase families, with only the GH1, GH3 and GH5 families expected to be recovered.

I also sought to investigate how the activities seen in the high-throughput characterization related to the genes present on the recovered fosmids. To gain insight into which families are likely responsible for these promiscuous activities, the fosmids were divided into sets based on their optimal substrate, Figure 2.9. All fosmids were active on MU-C, thus the presence of GH1s, GH3s and GH5s within each of the subsets was unsurprising. However, there were substantial differences between the percentages of certain hydrolase families seen in each subset. The sets with highest activity on monosaccharide substrates all had greater numbers of GH3s, as compared to the set most active on the cellobioside. Fosmids with the highest activity on cellobiosides, unsurprisingly, had a higher percentage of ORFs assigned to the cellobiohydrolase-containing families GH5 and GH8. The set with highest activity on MU-Glc had the highest diversity of GHs of any of the sets, which is likely a reflection of the greater number of fosmids within this group. The fosmids which were optimally active on the xyloside had a higher percentage of GH43s, a family containing β-xylosidases, than any other group of fosmids.
This set of fosmids was also highly enriched in GH67 genes, which is surprising as no members of this family have been identified with β-xylosidase activity. The overabundance of this family may, however, be due to its frequent incorporation into glucuronoxylan-cleaving gene cassettes, rather than its activity on xylosides.

Figure 2.9: Hydrolase Distribution with Optimal Substrate. Bubble area is proportional to the percentage of ORFs that belong to each hydrolase family. Fosmids with optimal activity on MU β-D-mannopyranoside, MU lactoside or MU β-D-N-acetylglucosaminide are excluded, as there were fewer than 5 fosmids within each of these categories.
[Bubble plot: GH family against optimal substrate (arabinose, cellobiose, galactose, glucose, xylose, total).]

To assess the distinctness of recovered glucanases, ORFs belonging to families containing β-glucosidase or cellulase activity were queried against the National Center for Biotechnology Information (NCBI) non-redundant protein database (accessed April 2017) using BLAST [7]. The proteins recovered were quite distinct, with an average maximum identity of 65.6 ± 12.5 %, Figure 2.10. Of the 331 proteins queried, only 15 had a homologous protein with greater than 90 % identity. The lowest maximum identity was that of the GH3 FOS62_47_P19-8, at 40.5 %. These results highlight the ability of functional metagenomic screening not only to find functional proteins, but also to reveal a large number of novel proteins.

Phylogenetic Tree Construction

We also sought to gain insight into how the recovered proteins relate to previously characterized members of the same family.
Towards this goal we constructed maximum likelihood phylogenetic trees for the glucanase families that were found most frequently on the recovered fosmids (GH1, GH3, GH5, GH8, GH9, GH10 and GH30). Trees were constructed with GH proteins identified through fosmid sequencing and those denoted as characterized in the CAZy database. These two sets of proteins were clustered separately at 95 % to remove duplicates or highly similar sequences. The two sets of proteins were then combined and aligned with COBALT [230], the alignment was trimmed with trimAl [45], and a tree was generated using RAxML [284].

GH1

Many of the recovered GH1s clustered together, even though they were from different libraries, Figure 2.11. The majority of recovered GH1s (15 of 18) clustered within one clade (the group of all proteins originating from a single branch point), which was almost entirely populated with proteins with β-glucosidase activity. Surprisingly, two of the identified GH1s (TolDC_30_A19-17 and CG23A_01_C20-5) clustered with the phospho-β-glucosidases, which generally do not display activity against non-phosphorylated glucosides. Additionally, neither of these fosmids contains another hydrolase belonging to a different glucanase family, though the coal bed-derived clone also contained a GH4, a family with phospho-β-glucosidase activity. However, both of these fosmids also contain predicted cellobiose PTS systems (transporters that couple the phosphorylation of sugars to their uptake [198]) in the same reading frame as, and in close proximity to, the identified GH1 genes. This suggests that these clones first phosphorylate MU-C during uptake, before degradation.

Figure 2.10: Percent Identities of Best BLAST Hits to Putative Hydrolases. All putative hydrolase genes predicted to belong to β-glucosidase or cellulase families were queried against the blast-nr protein database.
[Histogram: counts against percent identity, coloured by GH family.]
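The statistics reported alongside Figure 2.10 (mean and spread of the best-hit identity, number of proteins above 90 % identity, and the lowest value) can be derived from a simple list of best-hit identities. The sketch below is a minimal illustration under stated assumptions: the function name and the five identity values are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not state which form was used.

```python
from statistics import mean, stdev


def summarize_best_hits(max_identities, high_cutoff=90.0):
    """Summarize the best-hit percent identity of each query protein."""
    return {
        "n": len(max_identities),
        "mean": mean(max_identities),
        # Sample SD is an assumption; the thesis does not say which SD was used.
        "sd": stdev(max_identities),
        "n_above_cutoff": sum(1 for x in max_identities if x > high_cutoff),
        "lowest": min(max_identities),
    }


# Hypothetical identities (NOT thesis data), one value per query protein.
toy_identities = [40.5, 55.0, 65.0, 72.5, 92.0]
summary = summarize_best_hits(toy_identities)

print(
    f"{summary['mean']:.1f} ± {summary['sd']:.1f} % identity, "
    f"{summary['n_above_cutoff']} of {summary['n']} above 90 %, "
    f"lowest {summary['lowest']:.1f} %"
)
```

In practice the input list would hold one value per queried ORF, taken from the top-scoring hit in the BLAST output.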
GH3

The discovered GH3 proteins appeared more widely distributed than the GH1s, Figure 2.12, and tended to be more similar to each other than to previously characterized proteins. Cluster A was deeply branching, with a bootstrap value of 100, and was dominated by GH3s uncovered in this study. The previously characterized proteins within cluster A were the β-glucosidase gluA from Dictyostelium discoideum; BoGH3B, a β-glucosidase present in a xyloglucan-degrading PUL from Bacteroides ovatus; and A1 9, a protein that had been previously discovered when the FOS62 library was screened with DNP-C. Cluster B, also deeply branching, contained a large number of metagenome-derived genes; however, this cluster also had a larger number of previously characterized GH3s than seen in cluster A. The coal bed hits within this group cluster closely with characterized GH3s from the gammaproteobacteria Cellvibrio japonicus and Saccharophagus degradans [40, 216]. The soil and bioreactor hits within this group, however, cluster separately from any previously characterized protein. Cluster C contains 12 GH3s found within this study and a number of proteins from uncultured sources. Two of these proteins (D1 14 and H1 5) were identified in the previous screening of the FOS62 library [201] and three were identified from a termite gut [340]. This cluster also contains BglX from E. coli [330] and Lin1840 from Listeria innocua, which has been crystallized [274].

A number of fosmid-derived proteins (NA002_01_B04-2, SCR03_04_B15-17, NO001_03_P09-1, FOS62_38_D22-5, NO001_04_B04-0, NO002_11_N21-0, TolDC_31_E21-18 and CO003_01_D22-3) clustered with enzymes that have activity on xylosides, rather than the expected glucosides (Figure 2.12, cluster D). Of these fosmids, five (NA002_01_B04, SCR03_04_B15, NO001_03_P09, NO001_04_B04 and NO002_11_N21) had more than one predicted GH3, with the additional GH3(s) clustering in a β-glucosidase group.
The remaining three fosmids had many GH genes, each with a GH gene from another family likely to be responsible for activity. Finally, cluster E was entirely made up of clones from this study (n = 15), and fell within a clade that has almost entirely β-glucosidase activity.

Figure 2.11: Phylogenetic Tree Containing Discovered GH1s. The inner ring of squares represents the library from which each protein was retrieved. The outer ring of coloured circles signifies the activity that each characterized protein has annotated in the CAZy database. Branch points with bootstrap values > 70 % are signified with a small black dot.

GH5

The tree generated from GH5 proteins had clustering which was in agreement with both the subfamily designations [15] and the activities of characterized proteins, Figure 2.13. Of the 29 predicted GH5s identified from screening, 11 clustered with previously characterized proteins (2 each in subfamilies 4, 26 and 27, and one each in subfamilies 2, 7, 25, 28 and 36). There were 3 clusters of fosmid-derived GH5s that appeared at branches between well-clustered subfamilies. The first of these was between subfamilies 12 and 52. This group of soil library GH5s had the highest similarity to UmCel5F, a cellulase within subfamily 39 which was derived from a pulp effluent metagenome [327].
The second cluster of discovered GH5s that did not fit well within a defined subfamily lies between subfamilies 36 and 22. This group contains proteins identified from all four types of libraries screened and one previously characterized protein, SARM 0025. This protein, the sole characterized protein in subfamily 46, was identified from metagenomic sequencing of a cow rumen and displayed activity on 1 % carboxymethyl cellulose agar [120].

The final cluster of discovered GH5s sits between subfamilies 9 and 15, and contains 3 proteins from the CO182 and NapDC libraries. The closest characterized gene to this cluster, CW-EG1, is a gene from an uncultured bacterium present in the gut of a cutworm (Agrotis ipsilon) [254], and belongs to subfamily 44. However, all three of the GH5s within this cluster have highest sequence identity with subfamily 45 proteins, a subfamily which currently lacks a characterized representative. Furthermore, two of these three fosmids (CO182_24_J12 and CO182_11_I14) have no other predicted glucanases besides the GH5 gene, making these targets for future characterization.

Figure 2.12: Phylogenetic Tree Containing Discovered GH3s. The inner ring of squares represents the library from which each protein was retrieved. The outer ring of coloured circles signifies the activity that each characterized protein has annotated in the CAZy database. Arc segments (A to E) represent clusters of proteins containing metagenome-derived GH3s. Branch points with bootstrap values > 70 % are signified with a small black dot.
Figure 2.13: Phylogenetic Tree Containing Discovered GH5s. The inner ring of squares represents the library from which each protein was retrieved. Coloured circles signify the activity that each characterized protein has annotated in the CAZy database. Subfamilies are identified with the outermost arc segments [15]. Branch points with bootstrap values > 70 % are signified with a small black dot.

GH8

A total of seven GH8s were identified from the fosmid metagenomic libraries, and all of these were found in the FOS62 library. The identified GH8s fell into two groups when clustered at 95 %. These two clusters group separately in the constructed phylogenetic tree, Figure 2.14.
The first of these proteins, FOS62_40_E07-2, groups closely with Cel8B from Fibrobacter succinogenes, an endoglucanase whose highest observed activity is on carboxymethyl cellulose [247]. The second, FOS62_26_K06-32, branches from a deeply positioned node (bootstrap value = 98) and belongs to a clade which is dominated by enzymes with cellulase activity.

GH9

The GH9 family is almost exclusively composed of enzymes with cellulase or cellobiohydrolase activity, Figure 2.15. The four discovered GH9s within this tree fall into two clades, both with bootstrap values greater than 95. The first clade contains two putative GH9s, FOS62_30_N01-10 and SCR03_04_B15-0. The bioreactor-sourced FOS62_30_N01-10 clusters closest to Cel9B, a multidomain protein from Ruminococcus champanellensis that also contains a GH16 domain. This protein is annotated within the CAZy database as a licheninase; however, only the GH16 domain was characterized [206], and this annotation cannot be assigned to the GH9 domain. SCR03_04_B15-0 groups with Cel9U from Clostridium cellulolyticum and CelD from Ruminiclostridium thermocellum, both with activity on insoluble celluloses [102, 249]. The second clade, containing FOS62_25_H06-9 and CO182_36_O01-5, was also almost entirely occupied by bacterial cellulases. The characterized GH9s EgC, Cel9B and SARM 0002, from Fibrobacter succinogenes BL2, Fibrobacter succinogenes subsp. succinogenes S85 and an uncultured organism from a cow rumen, respectively, were most similar to FOS62_25_H06-9. All three of these GH9s were active on a variety of soluble and insoluble celluloses [23, 120, 247]. Lastly, CO182_36_O01-5 was most similar to UmCel9B, from a compost soil metagenome, which displayed activity on carboxymethyl cellulose amongst other glucans [229].
Figure 2.14: Phylogenetic Tree Containing Discovered GH8s. The squares placed at branch tips represent the library from which each protein was retrieved. Coloured circles signify the activity that each characterized protein has annotated in the CAZy database. Branch points with bootstrap values > 70 % are signified with a small black dot.

Figure 2.15: Phylogenetic Tree Containing Discovered GH9s. The inner ring of squares represents the library from which each protein was retrieved. The outer ring of coloured circles signifies the activity that each characterized protein has annotated in the CAZy database. Branch points with bootstrap values > 70 % are signified with a small black dot.

Presence of Gene Clusters

The wealth of fosmids encoding unexpected activities prompted an investigation of the Polysaccharide Utilization Loci (PULs) and gene clusters present on the fosmid hits. PULs have been reported in several cases to synergistically target complex plant polysaccharides, and their composition can give insight into the targeted polymers [71, 145, 165, 215, 257, 294]. A total of 17 fosmids were found to contain PULs, as indicated by the presence of the hallmark SusD/SusC-like protein pairing, Figure 2.16.
Several of the identified PULs shared nucleotide identity: FOS62_08_G04, FOS62_08_J18 and FOS62_10_O15 were identical at the nucleotide level over the PUL region, as were FOS62_29_F15 and FOS62_38_A06. Furthermore, fosmids FOS62_37_N04 and FOS62_38_C16 had 74 % nucleotide identity over 97 % of the identified PUL. Synteny was also seen for the identified PULs, with the SusC/SusD pair being followed closely by a GH3 in all identified PULs and the frequent inclusion of one or more GH16s. This motif has been seen previously in the PUL from the marine bacterium Gramella forsetii KT0803 [145] that degrades laminarin (a β-glucan containing 1,3- and 1,6-linkages) and in the mixed-linkage glucan degrading PUL from B. ovatus [287]. This co-occurrence of GH3s and GH16s within the same gene cluster implies that many of these loci target glucans other than cellulose for degradation.

Figure 2.16: Gene Organisation of SusC/SusD-like Encoding Fosmids. Putative glycoside hydrolases and SusC/SusD-like proteins are coloured by family. ORFs not annotated as a glycoside hydrolase, SusC-like, or SusD-like are shown in grey. Fosmids identical to those shown here have been omitted for simplicity. Fosmids have been aligned to highlight synteny. Fosmid pairs or sets within brackets share greater than 95 % identity over the PUL region.
[Fosmids shown: FOS62_08_G04, FOS62_08_J18, FOS62_10_O15, FOS62_29_C04, FOS62_37_N04, FOS62_29_F15, FOS62_38_A06, FOS62_38_C16, NapDC_52_E10, PWCG7_19_I21, TOLDC_06_L02, TOLDC_41_A17. Legend: GH3, GH5, GH16, GH23, GH30, GH35, GH43, SusC, SusD, Other; scale bar 5 kbp.]

Many fosmids that lack PULs contain clusters of multiple GH genes. Almost 15 % (26 of 178) of the fosmids contained 5 or more GH genes, Figure 2.17. There was one set of two clones (FOS62_37_N12 and FOS62_38_G18) and an additional set of three clones (FOS62_38_D22, FOS62_41_N11 and FOS62_46_E02) with near complete identity over an overlapping region.
Additionally, a set of four clones (NO001_07_A13, NO001_01_I19, NA004_04_B18 and NR003_09_O07) share between 80 and 90 % identity over a region containing a GH2 and two GH3 genes. Surprisingly, there were a number of clones with gene clusters that appeared to target xylans. These clones contained carbohydrate esterases, GH43s (which have activity towards xylosides and arabinosides), GH67s and GH115s. The last two families (GH67 and GH115) both play a role in the removal of glucuronic acids from glucuronoxylan [196, 210]. Specifically, clones CO182_36_O01, TOLDC_20_J14, FOS62_41_N11, FOS62_46_E02 and NO002_11_N21 all contained either a GH67 or a GH115, while clones FOS62_37_N12, FOS62_38_D22 and FOS62_38_G18 harboured carbohydrate esterases predicted to target acetylations present on xylan. All of these potential xylan-targeting fosmids have a GH3, GH30 or GH10 enzyme present, families which can have members with xylosidase or xylanase activity, and we speculate that these are the proteins which have activity towards MU-C. Future characterization of the PULs and gene clusters has the potential to shed light on synergistic mechanisms of degradation occurring within the sampled communities.

[Figure 2.17 image not reproduced. Fosmids shown: CO182_36_O01, FOS62_25_L08, FOS62_37_N12, FOS62_38_D22, FOS62_38_G18, FOS62_41_N11, FOS62_46_E02, NA004_04_B18, NO001_01_I19, NO001_03_P09, NO001_04_B04, NO001_07_A13, NO002_11_N21, NR003_09_O07, SCR03_04_B15, TOLDC_20_J14, 12200_16_F10, CB004_07_C21, CO002_07_L07. Legend: GH1, GH2, GH3, GH5, GH10, GH15, GH16, GH20, GH26, GH30, GH31, GH39, GH43, GH65, GH67, GH95, GH97, GH109, GH115, GH127, GH130, SusC, SusD, other. Scale bar: 5 kbp.]

Figure 2.17: Gene Organization of Fosmids Containing More Than 5 GH Genes. Putative glycoside hydrolase proteins are coloured by family. ORFs not annotated as a glycoside hydrolase are shown in grey. Fosmids identical to those shown here have been omitted for simplicity. Fosmids have been aligned to highlight synteny. Fosmid pairs or sets within brackets share greater than 95 % identity over the overlapping region.
2.4 Limitations and Future Directions

The combined use of high-throughput screening and high-throughput characterization of the recovered hits has allowed us rapid access to hundreds of active clones, each with one or more specific activities. This approach allows for the rapid discovery of catalytic function; however, as with most high-throughput approaches, compromises have to be made. The first of these lies in the choice of screening host. E. coli is a workhorse of biotechnology, but it undoubtedly cannot express every protein, especially those requiring extensive post-translational modification. Functional metagenomic screening has been successfully performed using multiple hosts [34]; however, a commercial system for other species comparable to that used to generate libraries in E. coli remains elusive.

The use of a soluble fluorogenic reporter molecule has allowed the rapid screening of hundreds of thousands of clones. However, the choice of reporter molecule certainly introduces bias into the hits recovered. This is evidenced by the comparison of hits recovered from the FOS62 library when screened with DNP-C or MU-C. This has prompted our development of probes with a fluorogenic leaving group with a lower pKa, the 6-chlorocoumarin-containing probes described in Chapter 3. Additionally, the use of glycoside reporters with a reporter functionality at the reducing end may also limit the range of hits discovered. Enzymes that require binding in the +1 subsite for activity may be missed by the use of such substrates. The complementary use of chromogenic hydrogels derived from plant polysaccharides, such as those developed by Willats et al. [157], may provide further access to such enzymes.

Additional screening conditions would likely reveal a greater number of clones and provide further access to the diversity of degradative enzymes.
Assay conditions, such as temperature, pH and concentrations of metal ions and enzyme co-factors, undoubtedly affect both which hits and how many hits are recovered. Re-screening libraries under various conditions would result in a larger number of hits. That said, it often comes down to a question of resource management: will it be more fruitful to screen the same library with different conditions or a different library with the same conditions? Ultrahigh-throughput technologies such as those utilizing droplet-based microfluidics [63, 127] may enable exploration of larger numbers of screening conditions.

2.5 Conclusions

This study has revealed a diversity of cellobioside-degrading activities from a wide assortment of metagenomes. The coupling of liquid-based high-throughput functional screening, plate-based clone characterization, and fosmid sequencing and annotation has allowed us access to the catalytic potential encoded in these metagenomes. This has revealed hundreds of glycoside hydrolases, many of which show low identity to any previously discovered gene. Comparison of these gene sequences with those of characterized glycoside hydrolases also revealed many genes within clades lacking a characterized representative. The use of large-insert libraries also revealed PULs and clusters of multiple GHs, many of which appear to target hemicelluloses. This collection of clones provides a wide range of genes and gene cassettes which may be useful for biomass deconstruction and carbohydrate modification schemes.

Chapter 3

Functional Screening of the Castor canadensis Fecal and Gut Metagenomes

3.1 Summary

Beavers have been described as nature's engineers due to their prolific capacity to reshape forest ecosystems into ponds and meadows, raising groundwater levels and creating new habitats for diverse plants and animals.
Beyond their capacity to transform landscapes, beavers are extremely efficient consumers of woody biomass, relying on bark, shoots, leaves, and other fibers from hardwood deciduous trees as primary nutritional resources. Despite this conspicuous efficiency, the underlying mechanisms enabling beavers to deconstruct woody biomass into soluble sugars have remained a mystery. Here we chart the community structure and metabolic problem-solving power of the beaver fecal and gut microbiomes using a combination of metagenomic sequencing, functional metagenomics and carbohydrate biochemistry. This has revealed hundreds of functional clones from the beaver fecal and gut microbiomes which are active on model cellulose and xylan substrates. A subset of these fosmids contained GH43 enzymes belonging to previously uncharacterized subfamilies. Cloning and expression revealed the substrate specificity of three previously uncharacterized subfamilies and revealed a mechanism involving two of these family 43 hydrolase domains and an appended family 8 glycoside hydrolase domain which synergistically degrade arabinoxylan oligomers. Fosmid clones recovered in this study also revealed novel assortments of genes clustered into polysaccharide utilization loci (PULs) with the potential to synergistically enhance biomass deconstruction in support of host nutrition.

3.2 Background

Microbial communities inhabiting the mammalian digestive tract, so-called gut microbiomes, affect host health and mediate essential services including dietary access to recalcitrant glycans or polysaccharides such as starches and fibers [192]. With respect to digestion, the taxonomic composition of these communities correlates with host diet and nutrient acquisition strategies across different mammalian lineages [208].
Multiple studies indicate that mammalian gut microbiomes consist of specialized communities that respond to complex glycans derived from specific dietary sources such as lignocellulosic biomass and release products that can be absorbed into the digestive tract [74, 173, 208, 261, 281]. Consistent with these observations, the digestive tracts of herbivores and wood-feeding (xylotrophic) organisms harbor microbial communities enriched in genes or gene cassettes encoding the corresponding biocatalysts and polysaccharide utilization systems [19, 120, 226, 240, 268, 315]. These communities provide a frame of reference for understanding how lignocellulosic biomass is converted into dietary macronutrients, as well as a deep reservoir of genomic information with potential biotechnological applications [13].

The North American beaver, Castor canadensis, provides a useful animal model for the study of xylotrophic microbiomes, as its diet is largely composed of bark, shoots, leaves, and other fibers from hardwood deciduous trees such as poplar, aspen, and cottonwood, which have commercial value in the forestry sector [41, 308]. Hardwoods such as poplar typically have a total polysaccharide content comprising 60-80 % of the dry mass of the wood [319]. Some 35-50 % of this dry mass consists of cellulose, followed by hemicellulose (20 %, primarily glucuronoxylan) and pectin. Previous studies have shown that beavers are capable of digesting up to 32 % of the available cellulose in consumed hardwoods [70]. However, little is known about the utilization of the hemicellulose component by the beaver microbiome, and much less about the enzyme repertoire effecting its deconstruction.

Gruninger and colleagues recently used small subunit ribosomal RNA (SSU rRNA) gene sequencing to profile the microbial community structure of beaver cecum and rectal samples, indicating a typical mammalian hindgut community that is dominated by Bacteroidetes and Firmicutes [109].
It is also worth noting that a recent paper by Wong and colleagues examined the metagenomes of cultures inoculated with beaver droppings and propagated with cellulose or poplar hydrolysate over a three-year period [324]. This was done to enrich for species capable of degrading either cellulose or the carbohydrates present in poplar. As the community compositions observed were substantially different from the inoculum (which itself was atypical for a gut sample, as it was dominated by Proteobacteria) [323], investigation of the beaver microbiome in its native state may reveal further insight.

Because the beaver is a hindgut fermenter, commensal microbes in its lower digestive tract are expected to mediate the degradation and fermentation of complex sugars to provide short chain fatty acids that provision host nutrition [217]. Given that the proclivity of beavers to consume wood differentiates them from other hindgut-fermenting herbivores, several questions arise from the initial microbiome study: What is the population structure of the beaver gut microbiome and how does it change throughout the digestive tract? Does the microbiome encode specialized genes or gene cassettes mediating conversion of lignocellulosic biomass? Are some components of the lignocellulose targeted for digestion more than others? Could analysis of these differences reveal new insight into sequential biomass deconstruction of wood-based fibers transferable to industrial process streams?
To begin answering these questions, we used a combination of SSU rRNA gene sequencing, shotgun metagenomics and functional screening to evaluate the community structure and metabolic potential of the beaver microbiome and to recover activities mediating lignocellulosic biomass deconstruction.

3.3 Beaver Fecal Metagenome

3.3.1 16S Ribosomal RNA Profiling

To profile the microbial community composition of the beaver fecal microbiome, I performed 454 pyrotag sequencing of the V6-V8 region of the SSU rRNA gene with three-domain resolution on composite fecal samples from 2 captive beavers. A total of 12,579 rRNA pyrotag sequences, recovered from the fecal sample, were clustered with a 97 % similarity cut-off into 404 operational taxonomic units (OTUs) after singleton removal. Of the OTUs identified, only two could not be affiliated with described microbial taxa based on comparison to sequences in the Silva database [248]. One of these OTUs showed 99 % identity to the Castor canadensis mitochondrial DNA sequence [125], while the other had at most 75 % sequence identity to SSU rRNA sequences from uncultured bacteria. The majority of sequences were affiliated with the bacterial phyla Firmicutes (214 OTUs, 58.4 %) and Bacteroidetes (93 OTUs, 24.4 %) (Figure 3.1). Within the Firmicutes, 200 OTUs were affiliated with the class Clostridia (55.9 %), with 143 OTUs (43.6 %) affiliated with the family Lachnospiraceae, which is known to harbor xylanotrophic, butyric acid-producing members [68]. Within the Bacteroidetes, 68 of the 93 OTUs (21.3 %) were affiliated with the class Bacteroidia, with 39 OTUs (15.5 %) affiliated with the uncultured S24-7 group. In a recent culture-dependent study, S24-7 comprised approximately 4 % of the beaver fecal microbiome prior to methanogenic enrichment on different lignocellulosic biomass substrates [323].
Overall, these results are consistent with the observations of Gruninger and colleagues, although the proportions of Firmicutes and Bacteroidetes did vary between the studies [109]. As a high proportion of identified OTUs lacked cultured representatives (354 of 370 bacterial OTUs), specific metabolic roles could not be inferred with confidence. To this end we used shotgun metagenome sequencing to predict metabolic functions encoded in the beaver microbiome.

3.3.2 Metagenome Sequencing

Shotgun metagenomic sequencing was conducted on the 454 platform using DNA from the same preparations used in SSU rRNA gene pyrotag analysis. This resulted in the production of 469.2 million base pairs (Mbp) of total sequence information (616,811 reads with an average length of 761 bp). Raw sequences were trimmed to a Q30 quality score using prinseq lite+ [265] and assembled using MIRA [54] by Dr. Keith Mewis. This resulted in 75,523 contigs with an N50 of 1,787 bp and 130.5 Mbp of consensus sequence. To explore potential bias in community structure based on pyrotag analysis, we examined SSU rRNA gene sequences recovered from the metagenome by comparing unassembled reads to the Silva SSU database using MetaPathways [155]. Of the 616,811 unassembled reads, 1,812 were annotated as having SSU rRNA genes. The majority of these sequences were affiliated with either Firmicutes (890) or Bacteroidetes (438), consistent with pyrotag results (Figure 3.1). A notable exception was the relative abundance of Tenericutes within the metagenome, which was greater than that seen in the pyrotag data (5.8 % in the metagenome versus 0.02 % in the pyrotag data). This may be due to amplification bias, as has previously been observed for Mycoplasma, the dominant Tenericutes genus identified in the beaver fecal metagenome [159]. In addition to bacteria, we detected SSU rRNA gene sequences affiliated with Archaeplastida (predominant class Liliopsida) and Opisthokonta (predominant class Insecta) at low abundance. The presence of these eukaryotic taxa in the beaver microbiome could reflect captive dietary intake or colonization post-defecation.

[Figure 3.1 image not reproduced. Horizontal axis: SSU rRNA gene abundance (%), 0 to 60. Categories: Betaproteobacteria, other Proteobacteria, Bacteroidetes, Firmicutes, Tenericutes, other Bacteria, Archaeplastida, Opisthokonta, SAR, other Eukaryota, other. Series: metagenome, pyrotags.]

Figure 3.1: Beaver Fecal Community Composition. The relative abundance of 16S rDNA genes found in the metagenome is compared to that identified by pyrotags. Both methods reveal a metagenome dominated by the phyla Firmicutes (green), Bacteroidetes (red) and Proteobacteria (purple).

To investigate the abundance of CAZymes within the fecal metagenome, the unassembled reads were queried against the CAZy database [181] using LAST [150] implemented in MetaPathways [155]. This revealed 28,107 ORFs (3.85 %) annotated as belonging to a CAZy family. GH genes were the most numerous CAZyme category identified, comprising 2.14 % of all annotated ORFs. The five most abundant hydrolase families identified were GH13, GH2, GH3, GH43 and GH31, which were present at 0.35, 0.20, 0.17, 0.12 and 0.07 % of all ORFs, respectively (Figure 3.4, panel B). All of these top families, GH13 excluded, are known to be involved in the degradation of various plant polysaccharides. GH13, the most abundant GH found within the beaver gut, is often associated with α-amylase and α-glucosidase activity. Although starch is present in most plants as a form of energy storage, the abundance of this family may also be due to the pervasive use of glycogen by bacteria for energy storage [17]. A few GH families were conspicuous in their absence; neither GH6 nor GH12, which are involved in the degradation of cellulose, was found within the fecal metagenome. The absence of these families, however, has been noted in other herbivorous mammals [241].
Furthermore, hierarchical clustering analysis of CAZyme profiles of mammalian gut microbiomes, done by Dr. Keith Mewis, revealed that the beaver sample clusters well with other herbivores (Figure, Appendix B.1).

3.3.3 Functional Screening

[Figure 3.2 image not reproduced. Reaction scheme: a cellulase, xylanase or xylosidase hydrolyzes 6-chloro-4-methylumbelliferyl cellobioside, 6-chloro-4-methylumbelliferyl xylobioside or 6-chloro-4-methylumbelliferyl xyloside, releasing 6-chloro-4-methylumbelliferone and the corresponding sugar.]

Figure 3.2: Substrates Used in Multiplex Screening. 6-Chloro-4-methylumbelliferyl cellobioside (CMU-C), 6-chloro-4-methylumbelliferyl xyloside (CMU-X) and 6-chloro-4-methylumbelliferyl xylobioside (CMU-X2) were used for functional screening. Hydrolysis of these substrates liberates 6-chloro-4-methylumbelliferone, which can be detected through fluorescence spectroscopy at a neutral pH.

A fosmid library containing over 4,500 clones was constructed from the same DNA used in shotgun metagenome sequencing using the pCC1 copy control system expressed in E. coli EPI300 (Epicentre). Activity assays were based on methods described by Mewis and colleagues [200, 201], but instead of chromogenic substrates we utilised cellobioside, xyloside, and xylobioside fluorogenic substrates bearing the 6-chloro-4-methylumbelliferyl aglycone, resulting in greater sensitivity with an improved signal-to-noise ratio [52]. The incorporation of a chlorine atom lowers the pKa of the released aglycone to 5.9 ± 0.1, increasing substrate reactivity when compared to the parent 4-methylumbelliferone (pKa = 7.8), and allowing direct sensitive detection at neutral pH values [52]. This is a further improvement on the screening conditions employed in Chapter 2. We combined the three substrates in a multiplex format to reduce both the time required and materials costs (Figure 3.2).
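The effect of the lower pKa on detection at neutral pH can be illustrated with the Henderson-Hasselbalch relationship. The sketch below estimates the fraction of released aglycone present in its deprotonated form at a given pH, on the assumption that this is the form giving the stronger fluorescence signal. The function name and the choice of pH 7.0 are illustrative assumptions, not details taken from the screening protocol.

```python
def fraction_deprotonated(pka: float, ph: float) -> float:
    """Fraction of a monoprotic acid present as its conjugate base at a given pH."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


# pKa values quoted in the text; pH 7.0 is an assumed assay pH.
PKA_CMU = 5.9  # 6-chloro-4-methylumbelliferone
PKA_MU = 7.8   # 4-methylumbelliferone

if __name__ == "__main__":
    for name, pka in (("CMU", PKA_CMU), ("MU", PKA_MU)):
        frac = fraction_deprotonated(pka, 7.0)
        print(f"{name}: {frac:.2f} deprotonated at pH 7.0")
```

Under these assumptions roughly nine tenths of the chlorinated aglycone is deprotonated at pH 7.0, against about one seventh of the parent coumarin, which is consistent with the improved sensitivity at neutral pH described above.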
Multiplex screening identified 51 validated fosmids that hydrolyzed at least one of the three fluorogenic substrates (z-score > 3), a hit rate of 1.1 % (Figure 3.3).

Figure 3.3: Functional Screening of Beaver Fecal Library. Z-score values for fluorescence were calculated for each plate. Clones above the z-score threshold of 3 were chosen for further validation.

To further characterize active clones recovered in multiplex screening, initial rates of hydrolysis were assessed against a panel of nine separate fluorogenic substrates by Dr. Feng Liu. A majority of clones were most active against either β-glucosides or β-xylosides. However, six clones displayed higher activities against alternative substrates including arabinosides (06_E19, 09_O03, 09_O15), galactosides (10_J12), lactosides (09_I18), and mannosides (05_B01). This suggests either that the active enzymes encoded on these fosmids possess broad substrate specificities, or that multiple functions are encoded and expressed from individual clones, consistent with gene cassettes such as cellulosomes or polysaccharide utilization loci (PULs), which are involved in the extracellular deconstruction of insoluble biomass [91] and the utilization of soluble carbohydrates within the cell [191], respectively.

3.3.4 Fosmid Sequencing and Gene Annotation

To identify individual genes or gene cassettes mediating substrate conversion, we fully sequenced the 51 active clones. Reads were assembled using ABySS [275] and ORFs were predicted and annotated using the MetaPathways pipeline [155]. Comparison between sequences identified 38 non-redundant clones based on a threshold of > 95 % similarity across > 90 % of insert length (Figure 3.4). Additional queries against the CAZy database identified 135 GH genes from 28 GH families, encompassing 11.13 % of the annotated ORFs (Figure 3.4).
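The redundancy filter described above (more than 95 % similarity across more than 90 % of insert length) can be sketched as single-linkage grouping of clones. This is a minimal illustration under stated assumptions: the pairwise identity and coverage values would come from an alignment step that is not specified here, the clone names and numbers below are invented for the example, and single-linkage grouping is one plausible reading of the threshold rather than the procedure actually used.

```python
from typing import Dict, Iterable, List, Tuple


def dereplicate(
    clones: Iterable[str],
    pairs: Iterable[Tuple[str, str, float, float]],
    min_identity: float = 95.0,
    min_coverage: float = 90.0,
) -> List[List[str]]:
    """Group clones linked by identity > min_identity over > min_coverage of the insert."""
    parent: Dict[str, str] = {c: c for c in clones}

    def find(x: str) -> str:
        # Follow parent links to the group root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, identity, coverage in pairs:
        if identity > min_identity and coverage > min_coverage:
            parent[find(a)] = find(b)  # merge the two groups

    groups: Dict[str, List[str]] = {}
    for c in parent:
        groups.setdefault(find(c), []).append(c)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical comparisons: (clone A, clone B, % identity, % of insert aligned)
example_pairs = [
    ("clone_A", "clone_B", 99.8, 97.0),
    ("clone_B", "clone_C", 99.5, 93.0),
    ("clone_C", "clone_D", 99.0, 40.0),  # short overlap, so not merged
]
groups = dereplicate(["clone_A", "clone_B", "clone_C", "clone_D"], example_pairs)
print(groups)
```

One representative per group would then be carried forward as a non-redundant clone. Single linkage merges clones transitively, so a stricter rule (for example, requiring every pair in a group to pass the threshold) would give more groups.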
Unexpectedly, active fosmids harbored only 5 GH genes from families with annotated cellobiohydrolases or endoglucanases: one each from GH families 5 and 51, and three from GH8, with none from the cellulolytic GH families 6, 7, 9, 12, 26, 44, 45, 48, 74 or 124. As the metagenome encodes a total of 1,085 cellulases from 9 of 14 cellulase families (0.15 % of all predicted metagenome ORFs), it may be that we have only captured the most abundant taxa within the fosmid library and that cellulose is being degraded by rarer taxa within the community.

[Figure 3.4 image not reproduced. Panel a columns: fosmid ID, activity (C, X, X2), length (kbp), GHs, PULs, regions of > 90 % identity. Panel b axis: predicted ORF count (percent of total ORFs) for fosmids and metagenome. Legend: GH1, GH2, GH3, GH5, GH8, GH10, GH13, GH16, GH25, GH28, GH29, GH31, GH32, GH35, GH36, GH42, GH43, GH50, GH51, GH67, GH78, GH88, GH94, GH95, GH97, GH120, GH130, SusC, SusD.]

Figure 3.4: Fosmids Identified from High-Throughput Screening of Fecal Library. (a) Schematic representing fosmid hits, gene presence and similarity. Grey bars represent each fosmid and are proportional to their length. Fosmids sharing 100 % identity with another fosmid were removed. Connections in the center represent areas of 90 % or greater nucleotide identity between fosmids. The inner track represents the locations of identified PULs. The outer coloured track represents activities identified for each fosmid from functional screening (C: CMU-cellobiose, X: CMU-xylose, X2: CMU-xylobiose). Coloured bars within each fosmid represent GH domains as predicted by BLASTP against the CAZy database. (b) Histogram displaying the colour encoding of GH gene families and the relative abundance of each family in the complete fosmid dataset compared to the abundance of the same gene families in the unassembled metagenomic dataset. Figure generated by Dr. Keith Mewis.

In contrast, there was an abundance of genes encoding hemicellulose-converting enzymes, particularly those targeting xylan. The most abundant GH family recovered was GH3, which contains both β-xylosidases and β-glucosidases. This was followed in abundance by GH43, a family containing both β-xylosidases and xylanases [202]. Within the unassembled metagenome, 912 GH43 genes (0.12 % of all metagenomic genes) from 23 subfamilies were identified. A total of 27 GH43 genes (1.8 % of all fosmid genes) from 10 subfamilies were identified on active clones (Table 3.1). Five GH43 genes identified on fosmids belonged to subfamilies containing no previously characterized members (subfamilies 2, 7, and 28, see Table 3.1) and thus were of unknown specificity. Genes encoding xylan side-chain removing enzymes (α-glucuronidase from GH67) were also present. Interestingly, we identified a number of genes encoding multiple GH domains (Figure 3.5). Four of these encoded predicted endo-acting and exo-acting domains. These include 04_C21-10 and 12_B18-19, which contain both GH43 and GH10 domains and are likely involved in xylan degradation, as well as 10_G11-03 and 12_H03-13, which both contain GH43 and GH8 domains, consistent with a role in xyloglucan or xylan degradation.
The presence of both endo- and exo-glycosidase domains within the same protein can lead to synergism in efficient deconstruction of these xylan substrates, as has been observed previously for other polysaccharides [36].

Table 3.1: GH43 Subfamilies Identified on Functionally Active Fosmids.

GH43 Subfamily    ORFs
1                 04_C21-11, 11_G02-25
2                 10_G11-3, 12_H03-13
7                 12_H03-12
10                05_H01-3, 09_K06-1, 11_G03-16, 12_E14-11
11                05_D18-17, 06_E19-1, 12_B18-18
12                04_C21-10, 11_K01-12, 12_A10-9, 12_B18-19, 12_H03-3
19                04_O22-25
24                04_M22-11, 10_J12-7, 12_J03-15
28                10_J12-4, 12_J03-18
29                10_G11-4, 10_G11-8, 11_K01-19, 12_H03-10

3.3.5 Gene Characterization

[Figure 3.5 image not reproduced. ORFs shown: 04_C21_ORF10, 03_F13_ORF21, 03_F13_ORF13, 07_E13_ORF07, 10_G11_ORF04, 10_G11_ORF03, 10_J09_ORF25, 11_K01_ORF19, 11_K01_ORF20, 11_K01_ORF21, 12_B18_ORF19, 12_H03_ORF10, 12_H03_ORF11, 12_H03_ORF12, 12_H03_ORF13. Domains in legend: GH5, GH8, GH10, GH13, GH29, GH42, GH43, GH77, CBM4/9, CBM6, CBM13, CBM20, CBM48, BACON.]

Figure 3.5: Gene Organization of Multi-Domain Proteins Identified on Functional Fosmids. Proteins containing more than one domain with a CAZy annotation are shown. The colouring of domains is consistent with Figure 3.4.

To better understand the substrate specificities and activities of the enzymes present, we focused our attention on the uncharacterized GH43 subfamilies identified in functional screening. To this end we generated constructs that were used to overexpress and purify recombinant proteins (12_H03-13 from subfamily 2, 12_H03-12 from subfamily 7, and 12_J03-18 from subfamily 28). Since 12_H03-13 also contained a GH8 domain, we created two additional constructs in which the GH8 and GH43 domains were inactivated independently by mutation of the catalytic acid residue (GH8 domain variant H03-13 E507A and GH43 domain variant H03-13 E209A).
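The specificity constants reported for these constructs in Table 3.2 can be cross-checked from the fitted KM and kcat values. The sketch below assumes that the two parameters carry independent errors that combine in quadrature. The text does not state how the uncertainties on kcat/KM were derived, so this is an illustrative check rather than the method used, and the function name is hypothetical.

```python
import math


def specificity_constant(km, km_err, kcat, kcat_err):
    """Return kcat/KM and its uncertainty, assuming independent errors."""
    ratio = kcat / km
    # Relative errors of a quotient add in quadrature when uncorrelated.
    rel_err = math.hypot(km_err / km, kcat_err / kcat)
    return ratio, ratio * rel_err


# KM (mM) and kcat (s^-1) with their errors, as listed in Table 3.2
table_3_2 = {
    "12_J03-18": (0.48, 0.06, 0.22, 0.02),
    "12_H03-13 WT": (0.19, 0.03, 0.80, 0.06),
    "12_H03-13 E507A": (0.14, 0.01, 0.73, 0.02),
}

results = {}
for enzyme, params in table_3_2.items():
    value, err = specificity_constant(*params)
    results[enzyme] = (value, err)
    print(f"{enzyme}: {value:.2f} +/- {err:.2f} s^-1 mM^-1")
```

Under this assumption the computed values agree with the tabulated specificity constants to within rounding, which suggests (but does not prove) that a comparable propagation was applied.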
Of the three wild-type enzymes, two had detectable cleavage activity on aryl-monosaccharides (Table 3.3); both 12_H03-13 and 12_J03-18 cleaved CMU-xyloside. The specificity constant of the 12_H03-13 wild-type enzyme was the same, within error, as that of the H03-13 E507A variant, in which the GH8 activity was abolished (Table 3.2), indicating that the GH43 domain is responsible for the hydrolysis of CMU-X. Surprisingly, none of the enzymes cleaved any of the other aryl glycosides tested (Table 3.3), reinforcing the utility of the inherently more reactive chlorocoumarin glycosides for detection of previously unknown activities.

Table 3.2: Kinetic Parameters Determined for Purified GH43 Enzymes with CMU-X.

Enzyme             KM (mM)        kcat (s⁻¹)     kcat/KM (s⁻¹ mM⁻¹)
12_J03-18          0.48 ± 0.06    0.22 ± 0.02    0.45 ± 0.07
12_H03-13 WT       0.19 ± 0.03    0.80 ± 0.06    4.2 ± 0.7
12_H03-13 E507A    0.14 ± 0.01    0.73 ± 0.02    5.2 ± 0.4

Table 3.3: Activity of Purified GH43 Enzymes on Aryl-glycosides.

Substrate    12_J03-18    12_H03-12    12_H03-13 WT    12_H03-13 E507A    12_H03-13 E209A
pNP-X        ND           ND           ND              ND                 ND
MU-X         ND           ND           ND              ND                 ND
CMU-X        Yes          ND           Yes             Yes                ND
pNP-Ara      ND           ND           ND              ND                 ND
MU-Ara       ND           ND           ND              ND                 ND

ND: None Detected

The activities of enzymes 12_H03-12, 12_H03-13 and its variants were also tested on a set of arabinoxylan oligosaccharides. This revealed a synergistic degradation mechanism in which the GH43 domain of 12_H03-13 (subfamily 2) releases undecorated xylose from the non-reducing end of the oligosaccharides, while the GH8 domain of 12_H03-13 (a reducing end xylose-releasing exo-oligoxylanase [Rex]) releases xylose from the reducing end of decorated oligosaccharides (Figure 3.6). The activity displayed by 12_H03-13 is further complemented by the GH43 12_H03-12 (subfamily 7), which cleaves arabinose decorations from arabinoxylans, releasing arabinose and xylobiose, an activity which is only observed in the presence of 12_H03-13.
This raises the intriguing possibility that 12_H03-12 is activated by 12_H03-13, which should be the subject of future work. The xylobiose generated by these two enzymes appears to be resistant to further degradation. As GH8 Rex enzymes typically require at least a trimer for activity, this domain is not expected to hydrolyse xylobiose [124, 163]. The GH43 domain of 12_H03-13 was expected to further degrade xylobiose, yet this is not the case, suggesting that the presence of an arabinose sidechain may be important for the xylosidase activity of this domain. This represents, to my knowledge, the first multi-domain protein containing both a GH43 and a GH8 domain to be characterized, and the first description of how these two domains function synergistically on arabinoxylan oligosaccharides, converting them into arabinose and xylobiose. Collectively, these results illuminate the substrate specificity and activity of GH43 subfamilies 2, 7 and 28 within the context of the beaver fecal microbiome, with direct relevance to lignocellulosic biomass conversion and host nutrition.

[Figure 3.6 image not reproduced. Panel A: schematic of xylose and arabinose (α-1,2- and α-1,3-linked) residues in substrates (1), (2) and (3), with the sites cleaved by GH43 12_H03-12 and by the GH43 and GH8 domains of 12_H03-13. Panel B: PAD response against retention time (min) for each substrate and enzyme combination.]

Figure 3.6: Synergistic Degradation of Arabinoxylooligosaccharides by H03-13 GH43 Enzymes. (A) Schematic of the activities of the individual domains of 12_H03-13 and 12_H03-12 on arabinoxylans. These two enzymes were tested for activity on a mixture of 2³-α-L-arabinofuranosyl-xylotetraose and 3³-α-L-arabinofuranosyl-xylotetraose (1), 2³-α-L-arabinofuranosyl-xylotriose (2) and 3²-α-L-arabinofuranosyl-xylobiose (3). The GH8 domain of 12_H03-13 releases xylose from the reducing end of (1) and (2). The GH43 domain of 12_H03-13 releases xylose from the non-reducing end of (1). 12_H03-12, a GH43 belonging to subfamily 7, is able to release arabinose from the oligomers containing an arabinose α-1,3-linkage. (B) High performance anion exchange chromatography with pulsed amperometric detection (HPAEC-PAD) analysis of the degradation of (1), (2) and (3) catalyzed by H03-12, H03-13, H03-13 E209A (GH43 domain mutant, denoted with an X on the GH43 domain) or H03-13 E507A (GH8 domain mutant, denoted with an X on the GH8 domain) and their combinations.

3.3.6 Presence of Hemicellulose Targeting Loci

In addition to providing a route toward functional validation of predicted GH genes, active clone sequences contained information about the structural organization of GH gene cassettes. This has revealed several GH gene clusters that appear to target plant cell wall hemicelluloses (Figure 3.7a). The fosmid 04_C03 contains a motif (GH16 and GH3 adjacent to SusC and SusD-like proteins) which has synteny with a PUL recently shown to be active against mixed-linkage glucans [287]. Several fosmids (04_C21, 10_G11, 11_G02, 12_H03 and 11_K01) also appear to target xylans. The four fosmids 04_C21, 11_G02, 12_H03 and 11_K01 all harbor GH10 genes, which often act as endo-xylanases, and GH43s, which may have exo-xylanase activity. Furthermore, fosmid 04_C21 contains a motif (a GH10-GH43 [subfamily 12] protein followed by an additional GH43 [subfamily 1] and a GH67) which has synteny with a gene cluster identified in Bacteroides intestinalis [311]. The GH10-GH43 homolog from B. intestinalis has endo-xylanase and arabinofuranosidase activity, and is able to release xylose, xylooligosaccharides and arabinose from arabinoxylans [311]. Although fosmid 10_G11 lacks a GH10, it does possess a two-domain GH43-GH8 gene, which we speculate may target arabinoxylan oligomers with an activity similar to that of H03-13.
The presence of a GH29 (a family with α-L-fucosidases), a GH42 (a family with β-galactosidases) and a GH31 (a family with α-xylosidases) on the three fosmids 09_O03, 12_H03 and 11_K01 leads us to speculate that these fosmids may target fucogalactoxyloglucan, which is present in most dicots and gymnosperms [130, 233].

Moreover, using an automated PUL prediction tool [295], we identified 15 fosmids spanning 5 identity groups that contain Sus-like genes (SusC or SusD), the leading indicators of PULs (Figure 3.7b). Eight of these clones exhibited near complete nucleotide identity (05_O06, 05_O07d, 05_O08, 05_P05, 05_P06, 05_P07, 05_P08, and 05_P12) and 4 clones shared near complete nucleotide identity specifically in the PUL interval (12_E14, 11_G03, 05_H01, and 09_K06). The remaining 3 clones (04_C03, 09_C22 and 09_G01) contained unique PUL intervals. Representative PULs from each identity group were compared to the RefSeq database to see if they are also found in sequenced microbial genomes. PULs from 09_C22 and 09_G01 exhibited 99 % and 98 % nucleotide identity, respectively, to distinct regions of the Alistipes senegalensis JC50 genome, whereas the most common PUL, represented by 05_P08, exhibited 99 % nucleotide identity to the genome of Alistipes finegoldii DSM17242. The remaining identity groups exhibited less than 7 % nucleotide identity to reference genomes, indicating previously unrecognized architectures. In addition to the fosmids mentioned above which appear to target hemicelluloses, 05_H01 and its homologs appear to target pectic polymers, as they contain a GH28 (a family containing polygalacturonases) and a GH88, which may target the unsaturated non-reducing ends generated by pectate lyases.
The substrates targeted by the PULs present on fosmids 05_P08, 09_G01 and 09_C22 are not immediately apparent, and further biochemical characterization will be needed to reveal their activity.

Taken together, the PULs and gene clusters identified on fosmids appear to target many of the hemicellulosic components of the plant cell wall, including glucuronylxylan, xyloglucan and pectins, which would be present in hardwoods. Some of the polymers which these gene cassettes likely act on, however, are not present in hardwoods, such as mixed-linkage glucans, which are mainly found in grasses, and arabinoxylans, which are present in grasses and softwoods. The ability to degrade these polymers may provide for host nutrition when a preferred food source is scarce. Future characterization of these PULs has the potential to shed light on combinatorial biomass deconstruction within the beaver microbiome.

Figure 3.7: Gene Organization of Putative Hemicellulose Targeting Fosmids and SusC/SusD-like Encoding Fosmids. (A) Fosmids with gene clusters that may target the hemicellulosic portion of plant biomass within the beaver diet. (B) SusC/SusD-like encoding fosmids. Putative glycoside hydrolases and SusC/SusD-like proteins are coloured with the same scheme as Figure 3.4. ORFs not annotated as a glycoside hydrolase, SusC-like, or SusD-like are shown in grey. Fosmids identical to 05_P08 have been omitted for simplicity.

3.4 Beaver Gut Metagenome

Investigation of the beaver fecal metagenome left several unanswered questions that we intended to address by applying similar methods to samples taken along the beaver digestive tract.
We hoped to determine whether the microbes and genes found within the fecal sample were reflective of those found within the internal digestive compartments of the beaver. We also aimed to gain insight into the variability of the gut microbiome, both along the gut transect and between individual beaver. To approach these goals, six beaver were dissected and chyme (partially digested food matter) and feces were collected from five sites within the digestive tract (Figure 3.8). These samples were then subjected to the same interrogative methods used for the feces: 16S rRNA sequencing, shotgun metagenome sequencing, and high-throughput functional screening of fosmid libraries.

3.4.1 16S Ribosomal RNA Profiling

To ascertain the microbial community structure throughout the beaver gut, the V6-V8 region of the SSU rRNA gene was subjected to 454 pyrotag sequencing. Primers with the same sequence (excluding the bar-coding region) as those used for the beaver fecal DNA were employed to facilitate comparison. This yielded 142,453 rRNA pyrotag sequences after quality control and singleton removal, which were clustered with a 97 % similarity cut-off into 1,115 OTUs (Table 3.4). Of the OTUs identified from the gut sequences, 1009 were bacterial, 12 were archaeal and 93 were eukaryotic. The majority of eukaryotic sequences were either mammalian, likely host associated (29,639 of 142,453, 20.8 %), or from likely food sources (6,199 of 142,453, 4.4 %), though there were also sequences belonging to the digestive parasite class Trematoda, also known as flukes, found in the small intestine of beaver 3 and the cecum of beaver 3, 4 and 5 (335 of 142,453, 0.2 %). The stomach and small intestine samples are particularly dominated by eukaryotic sequences, with the stomach averaging 64.3 ± 38 % and the small intestine averaging 65.5 ± 35 % of sequences assigned to a eukaryotic OTU, reflecting both the decreased concentration of bacterial cells and the increase in partially digested plant matter (Figure 3.9).
There was a surprisingly large variability in the ratio of eukaryotic to bacterial OTUs within the stomach and small intestine, which I speculate could be a result of fecal matter present in the stomach due to coprophagy, which is exhibited by beaver [38].

Figure 3.8: Beaver Gut Sampling Sites. Stars denote the location of sampling sites throughout the digestive tract. Figure adapted from Vispo and Hume [308].

The large intestine, which is composed of the cecum, proximal colon and rectum, is dominated by Firmicutes and Bacteroidetes (Figure 3.9). These two phyla account for 88 ± 13 % of all counts within these compartments. One outlier was the beaver 3 proximal colon sample, which had substantial fusobacterial counts. The most abundant Firmicutes family was, as seen for the feces, Lachnospiraceae, which comprised 30 ± 11 % of all counts within the cecum, proximal colon and rectum samples. This proportion of Lachnospiraceae is somewhat lower than in the fecal sample (43.6 %); however, it is consistent with the relative proportions seen by Gruninger [109] (cecal samples: 25.4 %, rectal samples: 28.3 %). The most abundant Bacteroidetes families within the cecum, proximal colon and rectum were Bacteroidaceae and S24-7 (20.4 ± 8.3 % and 12.8 ± 4.4 % of all counts, respectively).
While the S24-7 family was seen at similar levels in the fecal sample, the relative number of counts for the Bacteroidaceae family was greatly increased in almost all of the cecum, proximal colon and rectum samples (20.4 ± 8.3 %) when compared to the fecal sample (3.9 %).

Figure 3.9: Bubble Plot of Beaver Gut Pyrotags. Phyla representing greater than 0.5 % in at least one sample are shown; all other taxa are binned into a higher taxonomic group or other categories. 100 % of the total pyrotags clustered at 97 % in OTUs are represented in this plot. Sample names are abbreviated as follows: ST: Stomach, SI: Small Intestine, CE: Cecum, PC: Proximal Colon, RE: Rectum, FE: Feces. Bubbles are coloured by sample source.

To investigate the similarity of samples, hierarchical clustering was performed with all OTU counts (Figure 3.10). As the Beaver 2 cecum sample had a much lower count number than any other sample (n = 241), this sample was excluded from the analysis. Clustering was performed using the Unweighted Pair Group Method with Arithmetic mean (UPGMA) [278], with dissimilarity between samples calculated using the Bray-Curtis statistic [22].
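The clustering procedure described here can be sketched in a few lines. The following is a minimal illustration only, assuming a small table of made-up OTU counts (the sample names and values are hypothetical, not data from this thesis) and a plain-Python implementation rather than whatever software was actually used. It computes Bray-Curtis dissimilarities and then merges clusters by average linkage, which is what UPGMA does.

```python
from itertools import combinations


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two equal-length count vectors."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return num / den if den else 0.0


def upgma(names, counts):
    """Average-linkage (UPGMA) clustering on Bray-Curtis dissimilarities.

    Returns a list of merges (cluster_a, cluster_b, height); each cluster
    is a tuple of sample names.
    """
    profile = dict(zip(names, counts))
    # Pairwise dissimilarity between individual samples.
    d = {frozenset((a, b)): bray_curtis(profile[a], profile[b])
         for a, b in combinations(names, 2)}

    def avg(c1, c2):
        # UPGMA distance: mean over all between-cluster sample pairs.
        total = sum(d[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    clusters = [(n,) for n in names]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda p: avg(*p))
        merges.append((c1, c2, avg(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges


# Hypothetical OTU counts (rows: samples, columns: OTUs).
names = ["ST", "SI", "CE", "RE"]
counts = [
    [90, 5, 5, 0],
    [80, 10, 10, 0],
    [5, 50, 40, 5],
    [0, 40, 50, 10],
]
merges = upgma(names, counts)
for left, right, height in merges:
    print(left, right, round(height, 3))
```

In this toy input the two foregut-like samples join first and the two hindgut-like samples join next, before the two groups merge at a much larger dissimilarity. For real data, SciPy offers the same calculation through `scipy.spatial.distance.pdist(X, metric="braycurtis")` followed by `scipy.cluster.hierarchy.linkage(y, method="average")`.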
Clustering revealed, firstly, that the fecal sample is quite distinct and clusters separately from all other samples. Furthermore, the majority of small intestine and stomach samples form a cluster, while the sites forming the large intestine (cecum, proximal colon and rectum) form another. Within the large intestine cluster, the samples also cluster more frequently by organism than by chamber. This is not entirely surprising when taken in the context of other microbiome studies, which have seen large inter-animal/inter-personal variation in the species-level abundance of phylotypes in the gut microbiota [21, 67].

Figure 3.10: Hierarchical Clustering Analysis of Gut Pyrotags. Hierarchical cluster analysis of pyrotags (V6-V8 SSU rRNA) from the table of OTU counts. A Bray-Curtis distance matrix was used to determine dissimilarity. Approximately unbiased p-values for each branch in the dendrogram were determined through bootstrap resampling (n = 10,000). Sample names are abbreviated as follows: ST: Stomach, SI: Small Intestine, CE: Cecum, PC: Proximal Colon, RE: Rectum, FE: Feces. The Beaver 2 cecum sample is excluded due to low total counts.

At least one third of the bacterial OTUs present in the fecal sample were found in at least one of the other samples; however, the fecal sample was a clear outlier. The fecal sample had the highest number of unique bacterial OTUs (n = 204) and was distinct in the clustering analysis. Post-defecation colonization of the fecal sample may be, in part, responsible for this distinctness. However, there are many other variables which are likely to have an effect on the microbial community present in the fecal sample.
All of the gut samples were taken from wild beaver, for which food choice is dictated by self-grazing on local species. In contrast, the beaver fecal sample was taken from beaver that were supplied with plant material by humans rather than foraging for it. The discrepancy between the plant species chosen by beaver and by humans may account for some of the gut community variability. Additionally, the close proximity of other mammalian species, including raccoons, deer, and otters, may have had an effect on the community composition, possibly through cross-species microbial exchange, such as that seen by Song et al. [280]. Moreover, captivity has been observed to alter the microbial community of several mammalian species [59, 77, 197] and may have had an effect in this case also.

3.4.2 Metagenome Sequencing

To gain insight into the genetic potential encoded within the beaver gut, we sought to sequence extracted DNA from all five compartments from three beaver. Extracted DNA, the same as used for pyrotag analysis, was sequenced by Dr. Keith Mewis on the MiSeq platform (Illumina) using individual barcodes for each sample. This resulted in 4.1 GB of raw sequence data, which was trimmed to a Q30 quality score using prinseq lite+ [265] by Dr. Keith Mewis. Unassembled sequence data was used for further analysis, as attempts to assemble the sequences proved unsuccessful. ORF prediction was performed as for the fecal sample and resulted in a total of 4,910,871 ORFs, an average of 350,777 ± 213,140 per sample.

We next sought to investigate the genetic capacity of the beaver gut microbiome to degrade complex polysaccharides. To this end, the predicted ORFs were annotated as for the fecal sample using the MetaPathways pipeline [155]. Of the 4,910,871 predicted ORFs, 156,933 (3.2 %) were annotated as CAZymes (Table 3.5).
The relative abundances of GHs, CEs and PLs were all significantly increased (p-value = 0.006, 0.031 and 0.011, respectively) in the large intestine samples when compared to the stomach and small intestine samples.

Table 3.4: OTU Counts from Beaver Fecal and Gut Samples

Site             Beaver  All OTU Counts  All OTUs  Bacterial OTU Counts  Bacterial OTUs  Unique OTUs  Unique Bacterial OTUs
Stomach          1       3659            431       3560                  421             4            3
Stomach          2       3594            288       2350                  265             23           18
Stomach          3       3580            53        931                   41              8            6
Stomach          4       3564            36        424                   23              7            5
Stomach          5       4482            120       305                   95              85           62
Stomach          6       3927            104       195                   82              27           21
Small Intestine  1       2115            299       1349                  295             3            3
Small Intestine  2       5178            357       4038                  349             4            3
Small Intestine  3       7718            272       4378                  264             6            6
Small Intestine  4       4339            27        218                   14              4            4
Small Intestine  5       2620            21        13                    10              6            5
Small Intestine  6       8066            31        198                   24              3            2
Cecum            1       2593            370       2582                  368             3            3
Cecum            2       241             105       241                   105             0            0
Cecum            3       1264            178       1218                  176             2            2
Cecum            4       4818            281       4596                  276             3            0
Cecum            5       5050            392       4990                  379             6            5
Cecum            6       2028            276       2018                  271             0            0
Proximal Colon   1       5222            478       5205                  473             7            7
Proximal Colon   2       16367           512       16345                 505             13           12
Proximal Colon   3       3163            165       3078                  159             3            3
Proximal Colon   4       4300            280       4298                  278             3            2
Proximal Colon   5       4290            372       4255                  367             4            4
Proximal Colon   6       4156            370       4063                  362             5            4
Rectum           1       15015           588       14799                 582             24           23
Rectum           2       3690            344       3598                  340             3            2
Rectum           3       4033            271       3940                  266             3            3
Rectum           4       4130            294       4092                  289             3            3
Rectum           5       5412            373       4855                  366             7            7
Rectum           6       3839            319       3657                  314             4            4
Feces            0       11575           355       10930                 318             259          204
Total Counts     -       154028          1374      116713                1213            532          426
Mean             -       4969            332       3871                  326             17           14

Table 3.5: CAZyme Relative Abundance (% of All ORFs) in Beaver Gut Samples

Site             Beaver  AAs   GHs   GTs   CBMs  CEs   PLs   Total
Stomach          1       0.01  1.61  0.95  0.09  0.15  0.07  2.87
Stomach          2       0.32  1.36  1.84  0.11  0.20  0.05  3.88
Stomach          3       0.00  0.31  0.19  0.01  0.03  0.01  0.54
Small Intestine  1       0.00  0.87  0.58  0.04  0.11  0.04  1.64
Small Intestine  2       0.00  0.28  0.20  0.01  0.03  0.02  0.55
Small Intestine  3       0.00  0.08  0.10  0.01  0.00  0.00  0.19
Cecum            1       0.00  2.16  1.22  0.09  0.20  0.09  3.76
Cecum            2       0.00  2.99  1.49  0.10  0.28  0.14  5.01
Cecum            3       0.00  2.10  1.01  0.09  0.20  0.10  3.50
Proximal Colon   1       0.00  2.17  1.29  0.09  0.19  0.08  3.83
Proximal Colon   2       0.00  2.74  1.35  0.10  0.27  0.13  4.59
Proximal Colon   3       0.00  0.62  0.49  0.03  0.07  0.02  1.23
Rectum           1       0.00  1.95  1.14  0.09  0.18  0.07  3.44
Rectum           2       0.00  1.32  0.71  0.06  0.13  0.06  2.28
Rectum           3       0.00  1.33  0.78  0.06  0.13  0.04  2.34

To further compare the presence of CAZymes within the different compartments of the beaver gut, hierarchical clustering of gene abundance was performed. The relative family abundance for the fecal sample was also included in this analysis as a point of comparison. As we were focused on the presence of genes responsible for the degradation of polysaccharides, only the CE, PL and GH families were used for clustering. The results of hierarchical clustering (Figure 3.11) show, as for the OTU clustering, a split between the stomach and small intestine samples and the samples taken from the large intestine. Furthermore, the fecal sample clustered with the large intestine group (cecum, proximal colon and rectum). This suggests that although the community composition of the fecal sample was substantially different from the gut samples, the carbohydrate degradative capabilities encoded by these communities are similar. The Beaver 3 proximal colon sample was a clear outlier from the observed separation of the large intestine samples from the stomach and small intestine samples. This sample was also distinct in the pyrotag analysis, as it contained a higher proportion of Fusobacteria and Erysipelotrichia counts and fewer Clostridia and Bacteroidia counts. This intimates that the presence of Fusobacteria and Erysipelotrichia has shifted the polysaccharide degrading capacity of the sample.

To further investigate the degradative potential of the fecal samples, we examined the specific families known to be responsible for plant polysaccharide degradation (Figure 3.12).
The most abundant families across all samples were GH43, GH2, GH3, and GH5 (0.097, 0.084, 0.087, 0.054 % of all ORFs, respectively). These families are all able to catalyse the degradation of a number of different components of holocellulose (Figure 3.12), suggesting that their proliferation is a result of the multiple members within a family, each with distinct polymer specificities. The majority of plant biomass-degrading CAZy families were more abundant in the large intestine compartments, with the average fold increase being 3.0 ± 2.2. The beaver fecal sample also displayed a similar profile of biomass targeting enzymes (Figure 3.12), mirroring what was seen with hierarchical clustering.

Figure 3.11: Hierarchical Cluster Analysis of Beaver Gut and Feces CAZyme Abundance. Hierarchical cluster analysis of the relative abundance of ORFs annotated as GHs, CEs, and PLs. A Manhattan distance matrix was used to determine distance. Approximately unbiased p-values for each branch in the dendrogram were determined through bootstrap resampling. Sample names are abbreviated as follows: ST: Stomach, SI: Small Intestine, CE: Cecum, PC: Proximal Colon, RE: Rectum, FE: Feces.

3.4.3 Functional Screening

DNA from the intestinal samples was also used to construct fosmid libraries for functional screening. To this end, DNA from all five sites collected from beaver 2 was used in an attempt to create fosmid libraries. Unfortunately, libraries could not be created for the stomach and small intestine. The DNA from these samples was much more fragmented than that from the cecum, colon and rectum samples, making library creation lower yielding. The small intestine and stomach certainly have fewer bacterial colonies than sites further down the digestive tract and higher concentrations of DNA from food sources, which is likely to be partially degraded. Other hindgut-fermenting mammals, humans for example, have approximately 10^7-fold fewer bacterial cells in the stomach and small intestine than they do in the large intestine [270].

Figure 3.12: Abundance of Plant Polysaccharide Degrading CAZymes in Beaver Gut Metagenomes. CAZy families with annotated activities against plant polysaccharides are shown. Only families present in at least one sample are displayed. Sample names are abbreviated as follows: ST: Stomach, SI: Small Intestine, CE: Cecum, PC: Proximal Colon, RE: Rectum, FE: Feces. Bubbles are coloured by sample source. The polymer targets of each family are indicated by the presence of a black box. Polymers are abbreviated as follows: AG: Arabinogalactan, Ara: Arabinan, Cel: Cellulose, GM: Glucomannan, GX: Glucuronoxylan, HG: Homogalacturonan, RG: Rhamnogalacturonan backbone, XG: Xyloglucan.

The three libraries made from beaver 2 intestinal DNA contained a total of 43,776 clones. Of these clones, 6,528 were derived from the cecum, 14,976 from the proximal colon and 22,272 from the rectum sample. Together these libraries contain nearly 10 times the number of clones created from the fecal DNA.
The generated fosmid libraries were then subjected to functional metagenomic screening as described for the fecal library in order to identify active clones. The only alteration to the screening protocol was the use of a mixture of CMU-X2, CMU-C and CMU-Man as screening substrates instead of the mixture of CMU-X2, CMU-C and CMU-X, which was used for the fecal library. This was done for two reasons: we were able to achieve better signal-to-noise ratios when the mannoside was used instead of the xyloside, and almost all clones active on the xyloside were also active on the xylobioside.

Figure 3.13: Functional Screening of Beaver Gut Libraries. Robust z-score values for fluorescence were calculated for each plate. Clones above the robust z-score threshold of 40 were chosen for further validation.

Functional screening identified a total of 374 clones that had plate-based robust z-scores above 40 (Figure 3.13). Validation and deconvolution of these hits revealed 196 active clones (z-score > 10), of which 138 were active on the xylobioside, 104 on the cellobioside and 5 on the mannoside (Table 3.6). Quite a few clones (n = 48) were active towards both the xylobioside and the cellobioside, suggesting the presence of gene clusters, as seen in the beaver fecal fosmids.
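The plate-based hit calling described above can be illustrated with a short sketch. This section does not restate the formula used, so the code assumes the common definition of a robust z-score, (x - median) / (1.4826 × MAD), where MAD is the median absolute deviation; the fluorescence values are hypothetical and the function name is illustrative only. The threshold of 40 is the one quoted in the text.

```python
from statistics import median


def robust_z_scores(values):
    """Robust z-scores for one plate: (x - median) / (1.4826 * MAD).

    The 1.4826 factor scales the MAD so that it matches the standard
    deviation for normally distributed data (an assumed convention).
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    scale = 1.4826 * mad
    if scale == 0:
        raise ValueError("MAD is zero; robust z-score is undefined")
    return [(v - med) / scale for v in values]


# Hypothetical fluorescence readings for nine wells of a single plate.
plate = [100, 102, 98, 101, 99, 103, 97, 100, 5000]
scores = robust_z_scores(plate)

# Wells whose score exceeds the primary screening threshold of 40.
hits = [i for i, z in enumerate(scores) if z > 40]
print(hits)
```

Because the median and MAD are barely affected by a handful of very bright wells, a strongly active clone does not inflate the plate's own baseline, which is the usual reason for preferring this statistic to a mean-based z-score in plate screens.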
The number of hits varied greatly between libraries, with the cecum-derived library yielding the highest percentage of hits (1.29 %), while the rectum (0.32 %) and proximal colon (0.27 %) libraries had much lower hit rates.

Table 3.6: Beaver Gut Hits (active clones, z-score > 10)

Library         Clones  Verified Hits  CMU-Cellobiose  CMU-Xylobiose  CMU-Mannose
Cecum           6,528   84 (1.29%)     40 (0.61%)      61 (0.93%)     2 (0.03%)
Proximal Colon  14,976  40 (0.27%)     23 (0.15%)      26 (0.17%)     1 (0.01%)
Rectum          22,272  72 (0.32%)     41 (0.18%)      51 (0.23%)     2 (0.01%)
Total           43,776  196 (0.45%)    104 (0.24%)     138 (0.32%)    5 (0.01%)

3.4.4 Fosmid Sequencing and Gene Annotation

To reveal the gene(s) responsible for activity, the DNA from fosmid hits was sequenced, assembled, and annotated. Of the 196 fosmids identified and validated, 168 have been sequenced, all with greater than 20 kbp of sequence (Figure 3.14). The average insert length on these fosmids was 35,880 ± 7,064 bp, for a total of 5.57 Mbp of DNA sequence data. ORFs were predicted as for the fecal fosmids, revealing an average of 21 ± 5.8 ORFs per fosmid. Out of the 168 sequenced fosmids, 119 were non-redundant. The threshold for redundancy was set at > 95 % identity over 90 % of the sequenced fosmid. Several fosmids were redundant with fosmids obtained from other compartments; for example, B2Rectum_12_G19 was redundant with three cecal fosmids (B2Cecum_01_M24, B2Cecum_02_L22, B2Cecum_06_M06), a proximal colon fosmid (B2PC_42_E09) and three rectal fosmids (B2Rectum_02_O10, B2Rectum_13_C08, and B2Rectum_19_N19). This cross-compartmental redundancy indicates that at least a subset of the active members within the metagenome can be found throughout the cecum to rectum transect.

Figure 3.14: Distribution of Beaver Gut Fosmid Insert Length. Histogram showing the number of sequenced fosmids with a specified length; bars are coloured by the library source.

To assess the presence of carbohydrate-degrading enzymes, the sequenced fosmids were annotated, as for the fecal fosmids, using a LAST [150] comparison to the CAZy database [181]. This revealed a total of 430 GHs spanning 29 different families. GH genes accounted for 11.08 % of all predicted fosmid ORFs, a relative frequency nearly identical to that seen for the fecal fosmids (11.13 %) (Figure 3.15). The most abundant GHs found on the gut fosmids were GH43, GH3, GH2, GH67 and GH10 (1.86 %, 1.78 %, 1.48 %, 1.39 % and 1.29 % of all ORFs, respectively). This is somewhat different from the proportions of GHs found on the fecal fosmids, where the five most abundant families were GH3, GH43, GH1, GH2, and GH130 (2.68 %, 2.13 %, 0.71 %, 0.71 %, and 0.47 % of all ORFs, respectively). The most apparent difference between these two sets of fosmids was the substantial increase in xylan α-1,2-glucuronidases, from GH67 and GH115, in the gut fosmids. These two families made up 1.38 %, for GH67, and 0.46 %, for GH115, of all ORFs within the gut fosmids, whereas only one GH67 (0.08 % of all ORFs) and no GH115s were identified on the fecal fosmids. Neither of these families has previously shown activity on xylosides, mannosides, or cellobiosides, thus their presence is likely due to frequent incorporation into glucuronylxylan-cleaving loci. The stark contrast in the presence of α-1,2-glucuronidases may be partially explained by the differing frequencies of these two GH families within the different metagenomes screened. Both families are found more predominantly (3.9-fold and 2.3-fold greater for GH67 and GH115, respectively) in the beaver 2 average gut metagenome than in the fecal metagenome.

Another difference between the beaver fecal fosmid genes and gut fosmid genes was the decreased recovery of GH1 genes within the gut libraries.
Although the beaver 2 gut had on average a greater percentage of ORFs assigned to the GH1 family (0.046 % for feces and 0.069 % for the average of gut compartments), the beaver feces fosmids had a far greater proportion of GH1 genes (0.71 % of ORFs for feces fosmids, 0.05 % for gut fosmids). This 14-fold increase over the gut libraries is likely due to the substrates used for screening. The GH1 family, which contains members with β-xylosidase activity, is likely more enriched in the fecal fosmids as this library was screened with CMU-X, which was absent from the assay mix used to screen the gut libraries.

An additional aberration from expectations was the absence of any GH8 from the recovered fosmids. The GH43-GH8 gene recovered from the fecal fosmids showed an intriguing synergistic mechanism tuned for the degradation of arabinoxylans. I would have expected to find similar genes within the fosmids recovered from the beaver gut microbiomes, or GH8 genes present within xylan degrading gene clusters. One contributing factor to the absence of recovered GH8s may be that the relative abundance of GH8 genes was substantially lower in beaver 2 large intestine metagenomes (2.8-fold decrease from the fecal metagenome).

Analysis of the sequences revealed that two of the fecal fosmids displayed homology to fosmids recovered from screening of the beaver gut. The first of these feces-sourced fosmids, 04_C21, had greater than 95 % identity to B2Cecum_08_E22 over a 25 kb region containing five GH genes (GH43, GH67, GH95, GH29, GH25). The fecal fosmid 04_C03 also had > 95 % similarity to B2Rectum_42_O23 over a total of 22,305 bp, containing a GH3, a GH16 and a GH32. The repeated discovery of fosmids containing nearly identical functional genetic regions from distinct samples differing in multiple aspects (digestive compartment, sampling procedure and host organism) alludes to the presence of a core functional membership within the beaver microbiome.

Figure 3.15: Relative Abundance of Glycoside Hydrolases in Sequenced Fosmids and Metagenomes. Bubble plot shows the relative abundances of each GH family recovered from positive fosmid clones for each library source, including the beaver fecal library. Bubble area is proportional to the relative abundance. Bubbles are coloured by library source. Only the GH families found within at least one sequenced fosmid are shown.

3.4.5 Presence of Polysaccharide Utilization Loci

As the fosmids recovered from the fecal library contained a multitude of PULs, including those with novel organizations, I sought to identify PUL-containing fosmids among the gut fosmids. As previously, PUL-containing fosmids were identified by their signature tandem SusC-like/SusD-like pairing, which is a hallmark of the presence of PULs [295]. In total, 69 of the 168 fosmids, 41 % of all sequenced beaver gut fosmids, contained PULs. Within this set of PUL-containing fosmids, 50 were non-redundant (< 95 % similarity over 90 % of the fosmid).

Figure 3.16: Gene Organisation of Beaver Gut Fosmids Containing SusC/SusD-like Proteins and a Two-Domain GH10-GH43. Putative glycoside hydrolases and SusC/SusD-like proteins are coloured with the same scheme as Figure 3.4, with the exception of the two-domain GH10-GH43.
ORFs not annotated as a glycoside hydrolase, SusC-like, or SusD-like are shown in grey. Fosmid pairs or sets within brackets share greater than 95 % identity over their overlapping region. Fosmids have been aligned to highlight synteny.

Inspection of this set of all PUL-containing beaver gut fosmids revealed multiple syntenic groups (Figures 3.16, 3.17 and 3.18). The first of these groups is recognized by a conserved motif of a dual-domain GH10-GH43 (GH43 subfamily 12) protein followed by an additional GH43 (subfamily 1) and a GH67. All of these GH10-GH43 containing PUL fosmids had optimal activity on the xylobioside (CMU-X2). This motif is present in 12 non-redundant fosmids, of which 3 sets have nucleotide identity greater than 98 % over their overlapping regions. This motif, including the two-domain protein, is also present on the fecal fosmid 04_C21. This cluster of genes has synteny with the xylan degrading cluster of genes identified within Bacteroides intestinalis [311]. Specifically, BACINT_04202 from Bacteroides intestinalis, which contains a GH10 fused to a GH43 subfamily 12 domain, has 55-57 % amino acid identity with the GH10-GH43 proteins identified from the beaver gut. The BACINT_04202 protein has endo-xylanase and arabinofuranosidase activity and is able to release xylose, xylooligosaccharides and arabinose from cereal arabinoxylans [311].

Several of the fosmids identified with the GH10-GH43 gene cluster also have additional hydrolase genes. A set of 9 fosmids also code for a GH10 and GH2 downstream of the GH10-GH43 cluster, two with an additional GH115. Another set of four GH10-GH43 containing fosmids had a GH31, GH2, GH3 motif upstream, which resembles a portion of the characterized xyloglucan degrading XyGUL PUL from B. ovatus [165]. This GH31, GH2, GH3 motif is also seen on two other fosmids, B2Cecum_01_K09 and B2Cecum_13_L24, which lack the GH10-GH43, GH43 and GH67 genes.
Furthermore, all fosmids containing this XyGUL-like motif also had activity against CMU-C, suggesting that this region is responsible for the observed activity.

A second subset of fosmids was active on CMU-X2, yet lacked the GH10-GH43 region (Figure 3.17). Many of these fosmids contained GH genes that were present in the downstream region of the first cluster; GH10, GH2, GH115 and GH43s were commonly seen to be present on these fosmids. In fact, a GH10 protein was seen in all 17 fosmids within this group, highlighting the importance of this family, which is well known for the degradation of xylans [105]. The high abundance of α-glucuronidases within the beaver gut fosmids is in part due to their presence in these loci, as over half of the fosmid GH67 and GH115 genes were found on PUL-containing fosmids.

The final set of 11 PUL-containing fosmids was most active on CMU-C and showed limited activity against CMU-X (Figure 3.18). This set of fosmids bore a resemblance to the PUL-containing fosmids in Chapter 2, which were identified with the substrate MU-C. All of these fosmids had a GH3 present, likely the active protein, and many of these contained an additional GH16 domain within the PUL. As noted earlier, two of these fosmids, B2Cecum_01_K09 and B2Cecum_13_L24, had synteny with GH10-GH43 containing fosmids and likely target xyloglucans.

Figure 3.17: Gene Organisation of Beaver Gut Fosmids Containing SusC/SusD-like Proteins With Highest Activity on CMU-X2. Putative glycoside hydrolases and SusC/SusD-like proteins are coloured with the same scheme as Figure 3.4. ORFs not annotated as a glycoside hydrolase, SusC-like, or SusD-like are shown in grey.
Fosmid pairs or sets within brackets share greater than 95 % identity over their overlapping region. Fosmids have been aligned to highlight synteny.

Figure 3.18: Gene Organisation of Beaver Gut Fosmids Containing SusC/SusD-like Proteins With Highest Activity on CMU-C. Putative glycoside hydrolases and SusC/SusD-like proteins are coloured with the same scheme as Figure 3.4. ORFs not annotated as a glycoside hydrolase, SusC-like, or SusD-like are shown in grey. Fosmid pairs or sets within brackets share greater than 95 % identity over their overlapping region. Fosmids have been aligned to highlight synteny.

3.5 Limitations and Future Directions

This multifaceted analysis of beaver fecal and gut microbial communities has revealed both the community members present within these microbiomes and the molecular mechanisms they implement to degrade plant polysaccharides. There are, however, some constraints which limit our interpretation of the data presented. The first of these limitations is the under-sampling of clone libraries. The number of clones needed to ensure a library is representative can be estimated using the Clarke and Carbon formula [58], equation 3.1.

N = ln(1 - P) / ln(1 - f)    (3.1)

where N is the number of clones needed, P is the probability that a given sequence is present in the library, and f is the insert size expressed as a fraction of the total genome size. Assuming that there are 1,000 species of bacteria present within the beaver gut, that the average fosmid insert is 40 kbp and that the average bacterial genome is 5 Mbp in length, a library of over 370,000 clones would be needed to have a 95 % chance of finding a given genetic locus.
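As a check on the arithmetic, equation 3.1 can be evaluated directly. The sketch below uses only the assumptions stated in the text (1,000 species, 5 Mbp genomes, 40 kbp inserts, all species equally abundant); the function names are illustrative. It also inverts the formula to estimate the coverage probability for a library of a given size, which is a rearrangement added here rather than a calculation reported in the thesis.

```python
import math


def clones_needed(p, insert_bp, metagenome_bp):
    """Equation 3.1: N = ln(1 - P) / ln(1 - f), with f = insert / metagenome."""
    f = insert_bp / metagenome_bp
    return math.log(1 - p) / math.log1p(-f)


def coverage_probability(n_clones, insert_bp, metagenome_bp):
    """Equation 3.1 solved for P: P = 1 - (1 - f)^N."""
    f = insert_bp / metagenome_bp
    return 1 - math.exp(n_clones * math.log1p(-f))


# Assumptions taken from the text: 1,000 species x 5 Mbp, 40 kbp inserts.
metagenome_bp = 1_000 * 5_000_000
insert_bp = 40_000

n_required = clones_needed(0.95, insert_bp, metagenome_bp)
p_largest = coverage_probability(22_272, insert_bp, metagenome_bp)

print(f"clones for 95 % coverage: {n_required:,.0f}")
print(f"coverage with 22,272 clones: {p_largest:.1%}")
```

Under these idealised assumptions the first value comes out at roughly 3.7 × 10^5 clones, in line with the figure quoted above, while a library the size of the largest one in this study (22,272 clones) would have only around a 16 % chance of containing any given locus.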
This number of clones becomes significantly higher when the low abundance of rare taxa is taken into account. The largest library generated within this study contained 22,272 clones, which is certainly an under-sampling of the environment. As the number of clones that must be screened for a diverse environment nears the technical limits of plate-based screening, other screening technologies may offer the ability to more fully screen metagenomic libraries. Recent advances in using droplet-based microfluidics for functional metagenomics are particularly alluring [211] and should allow for a more complete sampling of microbial communities.

The beaver diet, like that of many mammals, is seasonal. During the spring and summer months herbaceous aquatic vegetation constitutes a higher proportion of their diet than in the winter months, when they become more dependent on tree species [5, 35]. As such, one might expect that the microbial communities shift throughout the seasons. Indeed, this seasonal variability of gut community composition has been observed for several other mammalian species, including wild mice [194], reindeer [92], bison [190], cattle [276] and humans [277]. This study is limited in the assessment of seasonal variation as a majority of the samples were obtained in the spring, and no samples were collected in the summer or fall. Seasonal bias in sampling is difficult to avoid, as the British Columbia Ministry of Environment currently restricts beaver trapping to the period between October 15th and April 30th in the lower mainland region. Further examination of the beaver gut at different times throughout the year could reveal a shift in community which coincides with a shift in diet from herbaceous material to woody plant matter.

Another source of variation in the beaver diet originates from the wide range of habitats in which it can be found.
North American beaver have been found to inhabit Arctic tundra and range as far south as northern Mexico, and are an invasive species in Tierra del Fuego, Argentina [56], and Finland [231]. As the plant species within these ecozones vary widely, one would expect the diet of the various beaver sub-populations to also vary. As the glycan component of plant matter is also variable between plant species [42], it stands to reason that different beaver populations would be exposed to plants with variable carbohydrate compositions and substitution patterns. Investigation of the gut community composition as it varies with diet could lead to a better understanding of the microbial degradation of specific plant polymers from defined species.

This thesis chapter has examined the genetic potential encoded within the fecal and gut microbiomes. However, the genes present within the community are surely expressed at different levels. Metatranscriptomic analysis, such as that performed for Arctic ground squirrels [115], cattle [64] or humans [95, 238], would reveal which degradative enzymes within the beaver gut are actively being transcribed. Moreover, this could illuminate transcriptional differences as digesta transit through the gut. Integration of metaproteomic data could further refine the mechanistic details involved in the degradation of plant matter within the beaver gut. Metaproteomic studies have looked at plant polysaccharide digestion in the termite [315] and shipworm [226]. The application of this method to the beaver gut could reveal not just which proteins are being expressed, but also which subset is being secreted into the gut environment to degrade plant polymers.

3.6 Conclusions

This chapter opens a functional metagenomic window on the capacity of the beaver fecal and gut microbiomes to deconstruct woody plants into soluble sugars supporting host nutrition.
Although beavers are considered xylotrophic organisms, their microbiome composition is most similar to that of hindgut-fermenting mammals dominated by Bacteroidetes and Firmicutes. Multiplexed high-throughput functional metagenomic screening was applied to recover active glycoside hydrolase (GH) enzymes from these metagenomes. The functional screening of fosmid libraries sourced from the beaver fecal and gut microbiomes with model cellulose and xylan substrates revealed 247 fosmids with an array of carbohydrate degradation profiles. A subset of these fosmids contained GH43 enzymes belonging to previously uncharacterized subfamilies. Cloning and expression revealed the substrate specificity of three previously uncharacterized subfamilies and revealed a mechanism involving two of these family 43 hydrolase domains and an appended family 8 glycoside hydrolase domain, which synergistically degrade arabinoxylan oligomers. Fosmid clones recovered in this study also revealed novel assortments of genes clustered into polysaccharide utilization loci (PULs) with the potential to synergistically enhance biomass deconstruction in support of host nutrition. Interestingly, we did not identify an abundance of fosmid genes encoding cellulases, indicating the potential for overexpression by specialized strains.

The panoply of genes encoding enzymes with hemicellulose degrading activities is presumably required to tackle the complex structures of hardwood biomass in which glucan and xylan backbones are extensively decorated with appended sugars. The presence of two-domain enzymes endows the beaver fecal and gut microbiomes with synergistic capacity to efficiently degrade the specific linkages present within the hemicellulose. Not only does this liberate monosaccharides, but the degradation of hemicellulose scaffolds also exposes the underlying cellulose fibers for digestion by the cellulase repertoire, enhancing conversion efficiency.
Taken together, these results provide a unique perspective on the modular domain architecture and functional specialization driving combinatorial biomass deconstruction in the beaver fecal and gut microbiomes.

Chapter 4

Harnessing Natural Diversity to Profile Promiscuity and Create New Glycosynthases

4.1 Summary

The use of chemical probes bearing unnatural functional groups has enabled the discovery and characterization of enzyme activity. However, we do not generally know how well specific modifications on the sugar ring are tolerated by glycoside hydrolases. Functional screening performed in Chapters 2 and 3 produced a library of functional clones containing a diverse set of glycoside hydrolase genes. Access to both this fosmid hit library and a synthetic GH1 gene library has enabled us to address the issue of tolerance of azido-, amino- and methoxy- functional groups by glycoside hydrolases. Additionally, it would be valuable to exploit any promiscuous activities identified to create variant hydrolases with synthetic capacity. To this end, I performed high-throughput plate-based screening of two separate libraries, followed by kinetic analysis to identify clones and their expressed enzymes with promiscuous hydrolase activity. I then characterized the acceptor specificity of these enzymes and used site-directed mutagenesis to create active glycosynthases. This revealed which modification positions and functional groups are best tolerated. Eight new glycosynthases, with a variety of activities and ability to incorporate modified glucosides and galactosides, were created from the selected promiscuous enzymes.

4.2 Background

Enzymes are often thought to be highly specific, yet many enzymes have minor activities on substrates for which they were never specialized [149, 220]. Termed promiscuous activities, these secondary functions are often starting points for the evolution of new activities, or holdovers from previous ancestral functions [149, 224].
Furthermore, promiscuous enzyme activities can be exploited to create new biocatalysts [33]. These latent secondary activities, however, are difficult to predict and are often overlooked when enzymes are characterized.

Investigation of promiscuous activities has particular relevance to chemical glycobiology, as a number of studies have relied on promiscuous enzymatic activity towards chemical probes containing unnatural functional groups to reveal functions in both in vitro and in vivo experiments. Several studies have used azido sugars to enable the metabolic labelling and visualization of glycans on cell surfaces and in organisms [51, 167, 310, 341]. Alkynyl sugars, modified at several different positions, have similarly been used to label cancer cell glycans [131, 140], animal glycans [50] and plant glycans [343]. Examples of both alkynyl and azido sugars are given in Figure 4.1. Despite their apparent utility in interrogating biochemical activities, we do not know how well specific modifications are tolerated by CAZymes, including the glycoside hydrolases.

Assessment of hydrolase promiscuity can also help us to address another long-standing problem in the biological sciences: ready access to synthetic oligosaccharides. The development of facile methods for the synthesis of peptides and oligonucleotides revolutionised the biological sciences. However, despite excellent progress in automated glycan synthesis [111, 214] we still lack a universal and reliable method for glycan assembly, largely because of the enormously greater regio- and stereochemical challenges posed. The alternative to chemical synthesis involves the use of enzymes, most likely either the natural nucleotide phosphosugar-using glycosyltransferases or synthetically useful variant forms of glycoside hydrolases, termed glycosynthases [12]. These latter enzymes are created by mutating the catalytic nucleophile residue of a retaining glycosidase (Glu or Asp in almost all cases), typically to Ala, Gly or Ser.
Such variants are hydrolytically incompetent with natural substrates, but when presented with a glycosyl fluoride donor sugar of inverted anomeric configuration, will typically transfer that sugar to an appropriate acceptor, often in near stoichiometric yield, without subsequent hydrolysis [12, 72]. The first glycosynthase was generated from the GH1 β-glucosidase of Agrobacterium sp. [188]; many have been developed since and are used to create a variety of glycans, including glycolipids [253], glycosidase inhibitors [106] and defined glycoproteins [61, 72, 103, 110, 160].

[Figure 4.1 diagram. Chemical structures arranged in three columns (Naturally Occurring Sugar, Azido-sugar, Alkynyl-sugar) for N-acetyl-mannosamine, N-acetyl-galactosamine, N-acetyl-glucosamine, L-Fucose and Glucose.]

Figure 4.1: Modified Sugars Used for Metabolic Labelling. Examples of per-acetylated azido- and alkynyl-glycosides used in metabolic labelling experiments and their unmodified parent.

This has been accomplished through the creation of glycosynthases from new families [110], and through directed evolution [153]. Our intent in this work was to use libraries of fosmids and large synthetic gene libraries as a means of identifying useful catalysts for the formation of specific glycosides that could not be assembled with currently known enzymes. Of particular interest was the ability to assemble oligosaccharides containing amino- or azido- substituents at the 3, 4 or 6 positions. These would be useful not only in the degradation/assembly of aminosugar-containing glycans, such as antibiotics or bacterial surface polysaccharides, but also as a way of incorporating modifiable glycans into biomolecules under mild conditions for subsequent tagging.
More broadly, this served as a test system for generating a pipeline for the discovery and generation of custom biocatalysts for glycan assembly.

In this chapter I aimed to harness the diverse set of enzymes encoded on the fosmid hits and within a synthetic gene library to evaluate their capacity to hydrolyse modified synthetic glycosides. The research performed in Chapters 2 and 3 provided us with a panel of hydrolases with which we could begin to explore the distribution of promiscuous hydrolase activity. Additionally, we have obtained a synthetic gene library, produced by the JGI, which contains a diverse set of 175 GH1 family genes [117]. By similarly interrogating this synthetic gene library we hoped to reveal the promiscuity within this specific family. By assaying these two libraries with azido-, amino- and methoxy-glycosides we hoped to gain insight into which promiscuous activities are more prevalent among glycoside hydrolases.

I have furthermore exploited the promiscuous enzymes identified to generate new biocatalysts. A set of 15 glycoside hydrolases, from both the metagenomic and GH1 libraries, were selected to be further investigated and transformed into glycosynthases. The acceptor (+1 site) specificities of all selected enzymes were then explored against a panel of acceptors in a second, plate-based screen, identifying optimal candidates to utilize as acceptors in subsequent glycosynthase reactions (Figure 4.2). The alanine, serine and glycine variants of the nucleophilic glutamate residue were created for each of the fifteen GHs and the activities of the corresponding 45 variants were evaluated.
Variant forms of one of the seven fosmid-derived genes and seven of the eight chosen synthetic GH1s acted as competent glycosynthases, yielding the desired glycans containing azido or amino substituents. Finally, the utility of these glycosynthases in the assembly of taggable activity-based probes was demonstrated.

[Figure 4.2 diagram. Reaction schemes in panels labelled Choice of Donor (Modified Glycosidase Screening), Choice of Acceptor (Acceptor Specificity Screening) and Glycosynthase Activity.]

Figure 4.2: Screening Methodology. The -1 subsite specificity is evaluated through glycosidase screening with fluorogenic substrates. Acceptor specificity is based on stimulation of reactivation of a trapped 2-fluoroglycosyl enzyme through transglycosylation to a competent acceptor. The extent of reactivation is assessed with a chromogenic substrate, giving an indication of +1 subsite specificity. The corresponding glycosynthase can then be employed with cognate acceptor and donor substrates.

4.3 Fosmid Hit Libraries

4.3.1 Screening with Modified Glycosides

In the search to identify glycoside hydrolases with activity on modified glycosides we harnessed the functional clone collections generated in Chapters 2 and 3. Active clones that were identified with DNP-C or CMU-C from the libraries described in Chapter 2 were also screened. In total 653 active clones, from soil, ocean, bioreactor, coal bed and beaver fecal libraries, were screened with 10 different modified glucosides and galactosides, see Figure 4.3. The fluorogenic probes used in the screens contained either an amino group at the 3-, 4-, or 6-position, an azido group at the 3-, 4-, or 6-position, or a methoxy group at the 3- or 4-position, Figure 4.3.
Screening was performed in the same manner used to originally identify the clones.

[Figure 4.3 diagram. Chemical structures labelled MU-3-NH2-Glc, MU-4-NH2-Glc, MU-6-NH2-Glc, MU-3-N3-Glc, MU-4-N3-Glc, MU-6-N3-Glc, MU-6-N3-Gal, MU-3-O-Me-Glc, MU-4-O-Me-Glc and MU-3-O-Me-Gal.]

Figure 4.3: Modified Glucosides and Galactosides Used for Screening. The fluorogenic substrates used contained an amino group (4-methylumbelliferyl 3-amino-3-deoxy-β-D-glucopyranoside (MU-3-NH2-Glc), 4-methylumbelliferyl 4-amino-4-deoxy-β-D-glucopyranoside (MU-4-NH2-Glc), 4-methylumbelliferyl 6-amino-6-deoxy-β-D-glucopyranoside (MU-6-NH2-Glc)), an azido group at the 3-, 4-, or 6-position (4-methylumbelliferyl 3-azido-3-deoxy-β-D-glucopyranoside (MU-3-N3-Glc), 4-methylumbelliferyl 4-azido-4-deoxy-β-D-glucopyranoside (MU-4-N3-Glc), 4-methylumbelliferyl 6-azido-6-deoxy-β-D-glucopyranoside (MU-6-N3-Glc), 4-methylumbelliferyl 6-azido-6-deoxy-β-D-galactopyranoside (MU-6-N3-Gal)) or a methoxy group at the 3-position (3-methoxy-β-D-galactopyranoside (MU-3-O-Me-Gal) and 3-methoxy-β-D-glucopyranoside (MU-3-O-Me-Glc)).

Screening revealed a total of 264 clones that hydrolysed at least one of the ten compounds tested, as determined by a robust z-score greater than 10, see Figure 4.4 and Table 4.1. The most frequently accepted modification was the incorporation of an amino group, with the 6-, 4- and 3-amino glucosides being hydrolysed by 87, 164 and 17 clones respectively. The azido modification was also accepted by a number of clones, though substitution at the 6-position was substantially better tolerated than at the 4- or 3-position for the glucosides.
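The hit-calling criterion used above (a robust z-score greater than 10, calculated per plate) can be illustrated with a minimal Python sketch. This section does not spell out the exact formula, so the sketch assumes the common median/MAD definition with the 1.4826 scaling constant; the function names and the example plate values are hypothetical and are not data from the screen.

```python
import statistics

# Assumption: scales the median absolute deviation (MAD) so that it is
# comparable to a standard deviation for normally distributed data.
MAD_TO_SD = 1.4826


def robust_z_scores(plate_rfu):
    """Return (x - median) / (1.4826 * MAD) for every well on one plate."""
    values = list(plate_rfu)
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        raise ValueError("MAD is zero; robust z-score is undefined for this plate")
    return [(v - med) / (MAD_TO_SD * mad) for v in values]


def call_hits(plate_rfu, cutoff=10.0):
    """Indices of wells whose robust z-score exceeds the cutoff."""
    return [i for i, z in enumerate(robust_z_scores(plate_rfu)) if z > cutoff]


# Hypothetical fluorescence readings (RFU): eight background wells, one active clone.
example_plate = [100, 102, 98, 101, 99, 103, 97, 100, 5000]
hits = call_hits(example_plate)
print(hits)
```

Because the median and MAD are insensitive to a small number of very bright wells, a handful of strong hits on a plate does not inflate the threshold for the remaining clones, which is the usual motivation for a robust score over a mean/SD z-score.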
Few clones were able to hydrolyse the methoxy-modified glycosides (only 13 hits were identified for MU-3-O-Me-Gal and none for MU-3-O-Me-Glc and MU-4-O-Me-Glc), though substrates with methoxy groups at the 6-position were not tested.

The presence of multiple GH families within this library, and of fosmids with multiple GHs, complicates the rationalization of the observed substrate preferences. However, one initial point of comparison is the GH3 family, as this was the most abundant GH family found on fosmids from Chapter 2 and Chapter 3. Inspection of two GH3 β-glucosidase structures containing bound substrates (PDB: 1IEW, 2X41) reveals that the 3- and 4-hydroxyl groups both have a greater number of amino acid residues positioned within hydrogen bonding distance than does the 6-hydroxyl [128, 242]. Additionally, the 6-position appears to be pointing out of the active site, whereas the 3- and 4-hydroxyls are facing the enzyme interior. This corresponds well with our hit rates, as the amino group (which is small and able to maintain hydrogen bonding) has a higher hit rate for the 3- and 4-positions than the azido- or methoxy- modified sugars, while modifications at the 6-position seem to be accommodated regardless of their size.
It is difficult to extend these justifications to the modified galactosides, as GH3 family members do not typically exhibit β-galactosidase activity.

| Modification | Parent | 3-Position | 4-Position | 6-Position |
|---|---|---|---|---|
| Azido | Glucose | 4 | 8 | 158 |
| Amino | Glucose | 17 | 164 | 87 |
| Methoxy | Glucose | 0 | 0 | N/A |
| Azido | Galactose | N/A | N/A | 79 |
| Methoxy | Galactose | 13 | N/A | N/A |

N/A: Not tested

Table 4.1: Number of Fosmid Hits for Each Modified Substrate (Robust Z-Score >10). A total of 653 active clones were screened, of which 203 cleaved MU-Glc.

Figure 4.4: Functional Screening of Hit Libraries with Modified Glycosides. Robust z-score values for fluorescence were calculated on a per plate basis.

4.3.2 Kinetic Characterization of Hydrolases

Sequenced hits with the highest fluorescence were selected for further characterization, Table 4.2. In total seven fosmid hits were selected, which together cleaved eight of the ten modified glycosides. Several of the selected hits cleaved more than one modified glycoside, see Table 4.2. Additionally, six of the seven selected hits had more than one glycoside hydrolase gene encoded on the fosmid; TolDC_15_C08 was the exception, as it encoded only one GH, from family 3, see Figure 4.5. To limit the number of genes for downstream analysis, I selected only those genes from families with known activities that corresponded with the parent compound, i.e. β-glucosidase families for modified glucosides and β-galactosidase families for modified galactosides. These genes were further limited to the set that had a retaining mechanism, in the hope that these could eventually be transformed into glycosynthases. A total of 10 genes were selected for sub-cloning and expression tests, see Table 4.2 and Figure 4.5. Only one fosmid, CA233_02_C24, had multiple genes which fit the aforementioned criteria. This fosmid contained four GH3 enzymes, any of which may have been responsible for the detected activities.

Table 4.2: Selected Fosmids with Activity on Modified Glycosides and the Genes Selected for Sub-Cloning and Expression

| Clone | Substrate (MU-) | Fluorescence (RFU) | Robust Z-score | Proteins Selected | GH Family | Expected Activity |
|---|---|---|---|---|---|---|
| CG23A_23_I01 | 3-N3-Glc | 91 | 13.9 | I01 GH1 | GH1 | 6-phospho-β-glucosidase |
| | 3-NH2-Glc | 3,518 | 55.8 | | | |
| TolDC_15_C08 | 3-NH2-Glc | 1,718 | 21.1 | C08 GH3 | GH3 | β-glucosidase |
| Beaver_09_O03 | 3-O-Me-Gal | 2,574 | 465 | O03 GH42 | GH42 | β-galactosidase |
| NapDC_14_D08 | 4-N3-Glc | 379 | 44.9 | D08 GH3 | GH3 | β-glucosidase |
| | 4-NH2-Glc | 24,832 | 163.5 | | | |
| CA233_02_C24 | 4-N3-Glc | 317 | 36 | C24 GH3-1, C24 GH3-2, C24 GH3-3, C24 GH3-4 | GH3 (all four) | β-glucosidase (all four) |
| | 4-NH2-Glc | 24,505 | 168.5 | | | |
| | 6-N3-Glc | 2,892 | 111.3 | | | |
| | 6-NH2-Glc | 13,237 | 180.3 | | | |
| FOS62_41_C11 | 6-N3-Gal | 4,793 | 467 | C11 GH1 | GH1 | β-glucosidase |
| FOS62_40_O22 | 6-N3-Glc | 2,734 | 104.9 | O22 GH3 | GH3 | β-glucosidase |
| | 6-NH2-Glc | 9,117 | 118.8 | | | |

The gene selected from CG23A_23_I01 belongs to the GH1 family, which contains many members with β-glucosidase activity. However, inspection of the gene sequence revealed a SKY motif, identified by Heins et al. [117] as indicative of 6-phospho-β-glucosidase activity. Furthermore, this fosmid also codes for genes annotated as PTS IIA, IIB and IIC, essential components of the phosphotransferase system, which concomitantly imports and phosphorylates sugars [198]. Therefore it may be the case that the observed activity is a result of the hydrolysis of the screening compounds once they have been phosphorylated.

All ten selected genes were sub-cloned into a pET28 vector backbone with a C-terminal hexahistidine tag. As all four CA233_02_C24 GH3 genes had N-terminal signal peptides (as determined by the SignalP server [218]), N-terminally truncated genes were used. All ten sub-cloned genes were then transformed into E. coli BL21(DE3) for expression.
Nine of the ten genes were expressed at sufficient levels for purification, the exception being C24-GH3-3. Kinetic characterization of the wild-type proteins was performed to confirm the activities observed from fosmid clones, see Table 4.3.

[Figure 4.5 diagram. Fosmids: Beaver_09_O03, CA233_02_C24, FOS62_40_O22, FOS62_41_C11, NapDC_14_D08, TolDC_15_C08, CG23A_23_I01. Legend: GH1, GH2, GH3, GH4, GH16, GH29, GH30, GH31, GH39, GH42, GH44, GH51, GH78. Scale bar: 3 kbp.]

Figure 4.5: Gene Organisation of Selected Fosmids With Activity on Modified Glycosides. Putative glycoside hydrolases are coloured with the same scheme as Figure 3.4. ORFs not annotated as a glycoside hydrolase are shown in grey. ORFs selected for sub-cloning and further characterization are underlined. As there were multiple GH3s selected for sub-cloning from fosmid CA233_02_C24, these were numbered 1-4.

Table 4.3: Kinetic Constants for Fosmid Sourced Hydrolases.

| Fosmid | Enzyme | Substrate | kcat (s⁻¹) | KM (mM) | kcat/KM (mM⁻¹ s⁻¹) |
|---|---|---|---|---|---|
| FOS62_41_C11 | C11-GH1 | MU-Glc | 63±5 | 0.20±0.03 | 320±50 |
| | | MU-6-N3-Glc | 14±1 | 0.05±0.01 | 280±60 |
| | | MU-Gal | 0.9±0.2 | 0.05±0.02 | 20±9 |
| | | MU-6-N3-Gal | 3.9±0.5 | 0.10±0.02 | 40±10 |
| NapDC_14_D08 | D08-GH3 | MU-Glc | 0.07±0.005 | 0.064±0.009 | 1.1±0.2 |
| | | MU-4-N3-Glc | - | - | 0.013±0.002 |
| | | MU-4-NH2-Glc | - | - | 32.1±0.4 |
| Beaver_09_O03 | O03-GH42 | MU-Gal | - | - | 95±3 |
| | | MU-6-N3-Gal | - | - | 3.2±0.1 |
| | | MU-3-O-Me-Gal | - | - | 4.4±0.2 |
| TolDC_15_C08 | C08-GH3 | MU-Glc | 2.7±0.2 | 0.039±0.005 | 70±10 |
| | | MU-3-NH2-Glc | 0.017±0.005 | 0.5±0.2 | 0.03±0.01 |
| FOS62_40_O22 | O22-GH3 | MU-Glc | 1.10±0.05 | 0.050±0.005 | 20±2 |
| | | MU-6-N3-Glc | 1.4±0.2 | 0.24±0.05 | 5±1 |
| | | MU-6-NH2-Glc | - | - | No Activity |
| CA233_02_C24 | C24-GH3-1 | MU-Glc | 0.025±0.001 | 0.0012±0.0003 | 20±5 |
| | | MU-6-N3-Glc | 0.37±0.03 | 0.06±0.01 | 6±1 |
| | | MU-6-NH2-Glc | 0.00161±0.00004 | 0.0029±0.0006 | 0.6±0.1 |
| | | MU-4-N3-Glc | - | - | 0.01±0.0007 |
| | | MU-4-NH2-Glc | 2.5±0.2 | 0.32±0.04 | 8±1 |
| CA233_02_C24 | C24-GH3-2 | MU-Glc | 0.00068±0.00006 | 0.004±0.002 | 0.15±0.08 |
| | | MU-6-N3-Glc | 0.012±0.004 | 0.13±0.07 | 0.09±0.06 |
| | | MU-4-NH2-Glc | - | - | No Activity |
| | | MU-4-N3-Glc | - | - | No Activity |
| CA233_02_C24 | C24-GH3-4 | MU-Glc | - | - | 0.83±0.02 |
| | | MU-6-N3-Gal | - | - | 0.0018±0.0003 |
| | | MU-4-NH2-Glc | - | - | 0.0048±0.0001 |
| | | MU-4-N3-Glc | - | - | No Activity |
| CG23A_23_I01 | I01-GH1 | pNP 6-PO4-Glc | 16±2 | 0.04±0.01 | 500±100 |
| | | pNP-Glc | - | - | No Activity |
| | | MU-Glc | - | - | No Activity |
| | | MU-3-N3-Glc | - | - | No Activity |
| | | MU-3-NH2-Glc | - | - | No Activity |

A majority of the activities detected in the initial screening were confirmed to be a result of the sub-cloned and expressed proteins. The GH42 from Beaver_09_O03 was confirmed to cleave MU-3-O-Me-Gal; however, this activity was an order of magnitude lower than that seen for the unmodified galactoside. The GH1 from FOS62_41_C11 had the expected activity against MU-6-N3-Gal, with a specificity constant (kcat/KM) on the same order of magnitude as that for the unmodified glycoside. Additionally, C11-GH1 cleaved MU-6-N3-Glc and the unmodified glucoside, with specificity constants an order of magnitude greater than for the corresponding galactosides. D08-GH3 had a detectable but low activity on the 4-azido glucoside; however, the specificity constant for the 4-amino glucoside was, surprisingly, an order of magnitude greater than that of the unmodified glucoside. The GH3 expressed from TolDC_15_C08 was able to hydrolyse the 3-amino glucoside; however, the KM value for this hydrolysis was much larger than that seen for the unmodified glucoside.

As CA233_02_C24 contained several genes which may have been responsible for the observed activity, each of the purified proteins was assayed against all four of the modified glucosides that the fosmid clone cleaved. C24-GH3-2 and C24-GH3-4 both cleaved MU-6-N3-Glc; however, specificity constants were substantially lower than those observed for C24-GH3-1. The C24-GH3-2 enzyme cleaved none of the other modified glycosides with which it was interrogated. C24-GH3-4 cleaved the 4-amino glucoside; however, this activity again paled in comparison to the activity displayed by C24-GH3-1.
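For rows of Table 4.3 that report both kcat and KM, the specificity constant and its uncertainty can be cross-checked with a short Python sketch. It assumes the two errors are independent and combine in quadrature as relative errors; this is a standard approximation and an assumption here, not necessarily how the tabulated uncertainties were derived. The function name `specificity_constant` is illustrative.

```python
import math


def specificity_constant(kcat, kcat_err, km, km_err):
    """Return (kcat/KM, propagated error), assuming independent errors.

    Relative errors are added in quadrature, the usual first-order
    propagation for a quotient.
    """
    value = kcat / km
    rel_err = math.sqrt((kcat_err / kcat) ** 2 + (km_err / km) ** 2)
    return value, value * rel_err


# C11-GH1 with MU-Glc from Table 4.3: kcat = 63 ± 5 s^-1, KM = 0.20 ± 0.03 mM.
value, err = specificity_constant(63, 5, 0.20, 0.03)
print(f"kcat/KM = {value:.0f} ± {err:.0f} mM^-1 s^-1")
```

For this row the sketch gives a value close to 315 with an uncertainty in the low 50s, which rounds to the tabulated 320±50. Rows that list only kcat/KM cannot be checked this way, since the individual constants are not reported.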
Furthermore, C24-GH3-1 also cleaved the 4-azido glycoside, albeit slowly, as well as the 6-amino glucoside, indicating that this enzyme alone is sufficient to explain the activity seen in the initial screen.

Two of the expressed proteins did not hydrolyse the expected modified glycosides. The first of these, the GH3 from FOS62_40_O22, was expected to hydrolyse both the 6-amino and 6-azido glucosides. However, O22-GH3 only cleaved the azido-substituted glucoside and not the amino-substituted glucoside. This fosmid also contains a GH30, a family known to contain β-glucosidases, which may have been responsible for the activity on the 6-amino glycoside. The GH1 from CG23A_23_I01 was expected to cleave both the 3-amino and 3-azido glucosides. However, neither of the 3-modified glycosides nor MU-Glc was hydrolysed by I01-GH1, even after a prolonged (18 hour) incubation. As mentioned earlier, this fosmid also contains PTS IIA, IIB and IIC genes directly upstream of, and on the same DNA strand as, the I01 gene, suggesting that these genes are co-expressed. We then decided to test I01-GH1 with pNP 6-phospho-β-D-glucoside. Indeed, I01-GH1 was able to catalyze hydrolysis of the 6-phospho-β-D-glucoside. It is therefore likely that the amino- and azido-glucosides must first be phosphorylated before hydrolysis. Attempts were made to phosphorylate MU-3-N3-Glc and MU-3-NH2-Glc with an ATP-dependent β-glucoside kinase (BglK) [296] from Klebsiella pneumoniae; however, neither was phosphorylated by this enzyme. The presence of a GH4 on the CG23A_23_I01 fosmid also complicates the interpretation of the observed activity. The GH4 family contains members with activity on 6-phospho-β-glucosides; however, this family employs an unusual mechanism involving reduction and elimination steps, which is initiated by oxidation of the 3-hydroxyl [333].
Since this is not possible for the 3-azido-glucoside, it is difficult to ascribe the activity on the 3-azido-glucoside to the GH4.

4.3.3 Acceptor Specificity

As we hoped to use the selected enzymes for synthesis, through the generation of glycosynthases, it was pertinent to identify the range of possible acceptors. To probe the enzyme acceptor specificity we followed the method developed by Blanchard et al. [29]. This method is based upon screening of relative rates of reactivation, through transglycosylation, of a stabilised, but catalytically competent, glycosyl enzyme intermediate. It employs a mechanism-based inactivator, a 2-deoxy-2-fluoro-glucoside (2-F-Glc) or 2-deoxy-2-fluoro-galactoside (2-F-Gal) bearing an activated leaving group, which forms a covalent intermediate with the enzyme of interest. Once the enzyme is inactivated, excess inactivator is removed and the inactivated enzyme is subsequently incubated in the presence of several different potential reactivators. After a set period of time, reactivation is assessed by assaying the enzyme with a chromogenic or fluorogenic substrate. The reactivated enzyme is then compared to both the un-inhibited enzyme and a control in which no reactivator was used to assess the ability of a molecule to reactivate the enzyme, see Figure 4.2.

Acceptor specificity assays were performed with 6 of the 9 purified enzymes; C24-GH3-2 and C24-GH3-4 were excluded as C24-GH3-1 had a wider range of activity on modified substrates, and I01-GH1 was excluded as it was not inactivated with 2,4-dinitrophenyl 2-deoxy-2-fluoro-glucoside. The six enzymes that were interrogated were assayed in a plate-based format and incubated with a set of 87 potential reactivators, which included thiols, alcohols, glycosides, free sugars, and amino acids.
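The comparison described above, in which reactivated enzyme is scored against both the un-inhibited enzyme and a no-reactivator control, can be sketched as follows. Table 4.4 reports rates as a percentage of un-inhibited enzyme and lists acceptors with a z-score above 3 relative to the no-acceptor control; how that z-score was computed is not stated in this section, so the sketch assumes a mean/standard-deviation z-score over replicate control wells. All names and numbers in the example are hypothetical.

```python
import statistics


def percent_of_uninhibited(rate, uninhibited_rate):
    """Express a measured rate as a percentage of the un-inhibited enzyme rate."""
    return 100.0 * rate / uninhibited_rate


def significant_reactivators(acceptor_rates, control_rates, z_cutoff=3.0):
    """Return {acceptor: z} for acceptors exceeding the no-acceptor control.

    Assumption: z = (rate - mean(control)) / stdev(control), computed from
    replicate no-acceptor wells.
    """
    mean = statistics.mean(control_rates)
    sd = statistics.stdev(control_rates)
    scores = {name: (rate - mean) / sd for name, rate in acceptor_rates.items()}
    return {name: z for name, z in scores.items() if z > z_cutoff}


# Hypothetical rates, already expressed as % of un-inhibited enzyme.
control = [9.0, 10.0, 11.0]  # replicate no-acceptor wells
acceptors = {"acceptor_A": 18.6, "acceptor_B": 10.5}
significant = significant_reactivators(acceptors, control)
print(significant)
```

In this toy example only `acceptor_A` clears the cutoff. Scoring against the no-acceptor control rather than against zero matters because water alone reactivates some enzymes appreciably.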
As O03-GH42 displayed a preference for galactosides as opposed to glucosides, this enzyme was inhibited with DNP 2-F-Gal, as opposed to DNP 2-F-Glc, and reactivation was assessed with MU-3-O-Me-Gal as opposed to pNP-Glc, which was used for the 5 other enzymes.

Table 4.4: Acceptor Specificity of Selected Wild-Type Hydrolases.

| Acceptor | C08-GH3 | O22-GH3 | C24-GH3-1 | D08-GH3 | C11-GH1 | O03-GH42 |
|---|---|---|---|---|---|---|
| No Inhibitor | 100 | 100 | 100 | 100 | 100 | 100 |
| No Acceptor | 10.0 | 26.5 | 17.3 | 8.7 | 0.4 | 52.7 |
| Phenyl β-D-galactoside | - | **54.0** | 22.0 | - | - | - |
| Phenyl β-D-glucoside | - | - | - | - | 0.8 | - |
| pNP α-D-galactoside | 69.5 | - | - | 14.7 | 1.0 | - |
| pNP α-D-xyloside | 30.5 | - | 24.9 | 12.0 | 2.4 | - |
| pNP β-D-fucoside | 30.3 | - | - | - | 1.1 | - |
| pNP β-D-galactoside | 49.2 | 36.9 | 22.2 | 13.0 | 1.9 | - |
| pNP β-D-glucoside | **86.6** | 30.2 | 42.2 | **18.6** | **17.2** | - |
| pNP β-D-glucuronide | 75.6 | - | - | 11.7 | - | - |
| pNP β-D-mannoside | - | - | 21.5 | - | - | - |
| pNP β-D-xyloside | 28.6 | - | - | - | 3.7 | - |
| Cellobiose | - | - | - | - | 0.7 | - |
| Gentiobiose | - | - | - | - | 0.8 | - |
| Xylose | - | - | - | - | 0.7 | - |
| 2-Mercaptoethanol | - | - | 23.0 | 10.4 | 0.7 | 62.9 |
| 1-Hexanol | - | - | - | - | - | 62.7 |
| 1-Pentanol | - | - | - | - | - | 62.9 |
| 2-Methoxyethanol | - | - | 21.3 | 10.1 | - | 71.0 |
| 2-Phenylphenol | - | - | - | - | - | 70.4 |
| 3-Mercapto-1-propanol | - | - | 35.1 | 13.6 | 2.1 | **77.4** |
| Ethanediol | - | 28.2 | 21.7 | - | - | 60.6 |
| Galactal | - | - | **80.5** | - | - | - |
| Methanol | - | - | 24.5 | - | - | - |
| Phenethyl alcohol | - | 37.5 | 32.5 | 12.8 | - | 61.1 |

Rates are as a % of un-inhibited enzyme and only acceptors with a z-score > 3 in comparison to the no-acceptor control are shown. The top reactivator for each enzyme is bolded.
- : not a significant reactivator

Screening revealed distinct acceptor profiles for each of the enzymes assayed, see Table 4.4. In total twenty-four molecules were able to reactivate at least one inhibited enzyme faster than water alone. Many of the top reactivators were aryl-glycosides, with pNP-Glc being the top reactivator for C08-GH3, D08-GH3 and C11-GH1, while phenyl galactoside was the top reactivator for O22-GH3.
Although C24-GH3-1 was also reactivated by aryl glycosides, including pNP-Glc and pNP-Gal, this enzyme was reactivated by a number of alcohols and its best reactivator was galactal. O03-GH42 had a reactivator profile that was quite different from the other enzymes, as thiols and alcohols appeared to be preferred to aryl glycosides. Additionally, as this enzyme displayed rapid reactivation without any added acceptors, I also observed hydrolysis of p-nitrophenyl β-D-galactoside, p-nitrophenyl α-L-arabinoside and p-nitrophenyl β-D-fucoside.

4.3.4 Nucleophile Mutant Creation and Glycosynthase Tests

With both the donor and acceptor specificity information in hand, we next sought to create nucleophile variants of each enzyme and test for glycosynthase activity. The nucleophile residue of each of the wild-type enzymes was identified through multiple sequence alignment with well-characterized enzymes from the same family. The codon coding for the nucleophilic aspartate (in the case of C08-GH3, C24-GH3-1, D08-GH3 and O22-GH3) or glutamate (in the case of C11-GH1, I01-GH1 and O03-GH42) was mutated to a codon for either an alanine, serine or glycine. The 21 mutant genes were then transformed into an expression strain and expressed as for their cognate wild-type enzymes.

Initial glycosynthase tests were performed with the top three hits from the acceptor specificity test. The donors used were α-F-Glc (C08-GH3, D08-GH3, O22-GH3 and C24-GH3-1), α-F-Gal (O03-GH42 and C11-GH1) or α-F-6-PO4Glc for I01-GH1, which was generated in situ.
Of all nucleophile variants tested, only the C11-GH1 variants had any observable glycosynthase activity. The C11-GH1 E354S variant in particular was capable of transferring galactose from α-F-Gal onto pNP-Glc, pNP-β-Xyl and pNP-α-Xyl, as determined by thin layer chromatography and mass spectrometry. C11-GH1 E354S was also capable of using both 6-N3-α-F-Glc and 6-N3-α-F-Gal as glycosynthase donors.

The lack of glycosynthase activity for a majority of the nucleophile variants led us to question how to improve the catalysts to obtain active glycosynthases. To date two GH3 enzymes have been successfully transformed into glycosynthases. The first of these, EryBI D257G, was able to catalyse the glucosylation of erythromycin; however, the yields obtained were quite low (14%) [138]. The second active GH3 glycosynthase was derived from a thermostable β-glucosidase from Thermotoga neapolitana [244]. Nucleophile variants of this enzyme, TnBgl3B, were unable to catalyse glycosynthase reactions. However, when an additional mutation, W243F, which had previously been seen to result in increased transglycosylation by another GH3 β-glucosidase [269], was introduced into the original nucleophile variants, competent glycosynthases were created.

Inspired by this double mutation strategy, we hoped that the introduction of a phenylalanine in place of the analogous tryptophan in the C08-GH3, C24-GH3-1, D08-GH3 and O22-GH3 enzymes could result in active glycosynthases. This was a possibility as all four of these enzymes had a conserved tryptophan residue directly C-terminal to the active site nucleophile aspartate, in the same position as in TnBgl3B. Mutagenesis was performed on the nucleophile serine variant of all four genes, resulting in double active site variant genes (O22-GH3 D231S W232F, D08-GH3 D229S W230F, C24-GH3-1 D271S W272F and C08-GH3 D235S W236F). The corresponding proteins were purified as for the wild-type enzymes.
The four double variant proteins were then assayed as previously using 10 mM donor and acceptors. However, unlike the results obtained by Pozzo et al. [244], no glycosynthase activity was observed, suggesting that mutation of the active site tryptophan is not a general strategy for the creation of GH3 glycosynthases.

4.3.5 Product Characterization

To further characterize the glycosynthase products generated by C11 E354S, I performed the reactions on a multi-milligram scale. These reactions were performed with 50 µmol of donor sugar and 250 µmol of acceptor. In total, four different multi-milligram scale reactions were carried out (Table 4.5). C11 E354S was able to catalyse the galactosylation of both pNP α- and β-D-xylopyranoside, and of pNP β-D-glucopyranoside.

Galactosylation of pNP β-D-glucopyranoside resulted in both the 1,3- and 1,2-linked products. GH1 glycosynthase production of 1,3-linked pNP galactosylglucoside has been previously observed for both Abg E358A [188] and Bgl3 E383A [87]; however, the 1,2-linked product has not been observed previously. Galactosylation of the xylosides resulted in 1,2-linkages when either pNP-α-Xyl or pNP-β-Xyl was used. The product containing the α-xyloside is particularly interesting as it could be used as a model substrate for xyloglucan decorations. The galactosyl xylosides produced here have different linkages from those produced by either Abg E358A [188] or Bgl3 E383A [87]. Both of these enzymes are able to catalyse the galactosylation of pNP-β-Xyl; however, in both cases the major regiochemical outcome was the 1,3-linked product.

We were also able to use C11 E354S with the 6-azido-modified α-galactosyl fluoride as donor. The glycosynthase reaction between 6-N3-α-F-Gal and pNP-Glc resulted in three separate products, with similar yields (Table 4.5). The 1,2-, 1,3- and 1,4-linked products were all observed, with the 1,3-glycoside being the major product.
This is somewhat surprising as the 1,4-linked product was not observed when the unmodified galactoside donor was used in a similar reaction. The regiochemical outcome is thus influenced by the presence of the 6-azido functional group. Taken together, these results demonstrate the ability of C11 E354S to glycosylate with azido-modified donors, and future research should focus on the scope of molecules that can act as competent acceptors.

Table 4.5: Stereochemical Outcome and Yield of C11 E354S Glycosynthase Reactions.

Enzyme     Donor         Acceptor   Product                      Yield (%)
C11 E354S  α-F-Gal       pNP-Glc    Gal-(β-1,2)-Glc-β-pNP        15
                                    Gal-(β-1,3)-Glc-β-pNP        20
C11 E354S  α-F-Gal       pNP-α-Xyl  Gal-(β-1,2)-Xyl-α-pNP        50
C11 E354S  α-F-Gal       pNP-β-Xyl  Gal-(β-1,2)-Xyl-β-pNP        60
C11 E354S  6-N3-α-F-Gal  pNP-Glc    6-N3-Gal-(β-1,4)-Glc-β-pNP   23
                                    6-N3-Gal-(β-1,2)-Glc-β-pNP   21
                                    6-N3-Gal-(β-1,3)-Glc-β-pNP   37

4.4 Glycoside Hydrolase Family 1 Library

To further explore the prevalence of promiscuous hydrolase activities we exploited a library of 175 GH1 enzymes synthesized and characterized by Heins et al. [117]. The genes within this library were chosen to maximize sequence diversity and are from eukaryal, archaeal, bacterial and metagenomic sources. The GH1 family contains members with activity on many different glycosides, yet the most abundant activity observed by Heins et al. [117] was the hydrolysis of β-glucosides (59 of 105 expressed and purified enzymes). This abundance of β-glucosidases was the reason we selected this library for interrogation with modified β-glucosides.
Additionally, the GH1 family contains many examples of successful glycosynthases [80, 87, 188, 236, 237, 243, 299], including C11 E354S detailed previously, implying that the hydrolases within this library can be converted to glycosynthases with good success rates.

4.4.1 Screening with Modified Glycosides

The GH1 library was screened, as for the fosmid library, with six substrates bearing a fluorogenic 4-methylumbelliferyl leaving group and either an azido or amino group at the 3, 4, or 6 position. Clones were also screened with MU-Glc to determine the number of β-glucosidases that could be detected from crude lysate. Screening revealed that 115 of the 175 GH1 enzymes cleaved the unmodified substrate MU-Glc (z-score > 9), a much higher number of active enzymes than observed by Heins et al. [117] (Figure 4.6). This increased number of active enzymes likely reflects the increased reaction time (18 hours in this study, 10 minutes for Heins et al. [117]) and an increased enzyme concentration for those that had not been purified in significant concentrations.

Nearly two thirds of the clones tested on modified glucosides cleaved at least one substrate (106 of 175 with a z-score > 9), and 91 cleaved more than one substrate (Figure 4.7). Four of these clones (Genbank ID: BAJ01494.1, ABS04001.1, BAA74959.1, CAL97639.1) cleaved MU-6-N3-Glc, yet had no activity on the parent MU-Glc. All other clones with activity on modified glucosides also cleaved the parent MU-Glc. In general, a greater number of active clones were seen for the amino substituted glucosides (MU-4-NH2-Glc: 99/175, MU-6-NH2-Glc: 83/175, MU-3-NH2-Glc: 64/175) than for the azido substituted glucosides (MU-6-N3-Glc: 91/175, MU-3-N3-Glc: 2/175, MU-4-N3-Glc: 1/175). It is also worth noting that the absolute fluorescence for hits identified with the 3- and 4-substituted azido sugars was extremely low when compared to other hits, with fluorescence values less than 1% of the anticipated maximum.

Figure 4.6: GH1 Enzyme Library β-Glucosidase Activity. Relative fluorescence intensity is represented by bars, with each leaf corresponding to a protein. Fluorescence values are given as the fraction of the maximum expected fluorescence. Genbank IDs are given at the tip of each leaf.

Figure 4.7: Screening Results. Relative fluorescence intensity is represented by bars, with each leaf corresponding to a protein. Fluorescence values are given as the fraction of the maximum expected fluorescence. Panels shown: MU-3-NH2-Glc, MU-4-NH2-Glc, MU-6-NH2-Glc and MU-6-N3-Glc. Results for MU-3-N3-Glc and MU-4-N3-Glc are not shown.

The finding of relatively few enzymes that are capable of cleaving the MU-3-N3-Glc and MU-4-N3-Glc substrates is a reflection both of the considerable steric demand of an azide substituent compared to an amine or the parent hydroxyl, and of the importance of the hydrogen bonds normally formed between the enzyme and the substrate at those positions. The key residues involved in interactions with the 3- and 4-hydroxyls, His123, Gln20, Trp423 and Glu422 (numbering based on Phanerochaete chrysosporium GH1 [219]), are highly conserved (Figure 4.8). The 6-azido substituent, however, is reasonably well tolerated, most likely because of the greater conformational flexibility possible at the 6-position, allowing the substituent to adopt an orientation that minimises steric repulsion. The readier acceptance of an amine substituent than an azide at C3 or C4 likely stems from its small size, as well as its hydrogen bonding potential. Likewise, amine substitution at the 6 position also seems to be broadly tolerated.
The hit rates seen for the amino glucosides, in fact, nicely mirror the specificities determined for the GH1 β-glucosidase Abg by Namchuk and Withers [213] through measurement of kinetic parameters for hydrolysis of a set of mono-deoxyglycoside substrates in which each hydroxyl, individually, had been replaced by hydrogen. From these data, the contributions of interactions with each hydroxyl to transition state stabilization were determined, yielding ΔΔG°‡ values of 7.4, 2.5 and 2.9 kJ/mol for the 3-, 4- and 6-hydroxyls, respectively, very much in line with the lower tolerance seen here for substitution at C3.

Figure 4.8: Pha GH1 in Complex With Gluconolactone and Substrate-Protein Bond Distances. (A) The crystal structure of Pha GH1 (PDB: 2E40) in complex with gluconolactone clearly shows the 6-hydroxyl of the gluconolactone pointed toward the exterior of the tunnel-like active site. The view shown is directed into the active site. Carbon atoms of gluconolactone are shown in green and oxygen atoms are shown in red. (B) Bond distances between the hydroxyls of gluconolactone and the conserved active site residues (Glu-365, Asn-169, His-123, Trp-423, Gln-20 and Glu-422) are shown. Crystal structure was determined by Nijikken and coworkers [219].

4.4.2 Kinetic Characterization of Hydrolases

From the 106 hits so identified we chose 8 enzymes, which between them encompassed all the activities sought, as candidates for transformation into glycosynthases (Table 4.6). As a first step, kinetic parameters were measured for each modified substrate with each of these purified wild-type enzymes (Table 4.7), with the exception of the Lac enzyme, for which activity on MU-3-NH2-Glc, MU-4-N3-Glc and MU-6-N3-Glc was below detectable levels. Since this enzyme is primarily a 6-phospho-β-glucosidase, it is not surprising that it has such sparing activity on non-phosphorylated modified glucosides.
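Specificity constants such as those in Table 4.7 can be estimated without knowing KM, because at substrate concentrations well below KM the Michaelis-Menten rate law reduces to v ≈ (kcat/KM)[E][S]. The following minimal Python sketch illustrates that idea with a least-squares line through the origin. The function name, the enzyme concentration and the data points are hypothetical and are not taken from this work, and the fitting procedure actually used for Table 4.7 may differ.

```python
def specificity_constant(substrate_mM, rates_mM_per_s, enzyme_mM):
    """Estimate kcat/KM (s^-1 mM^-1) from initial rates measured at [S] << KM.

    Under that condition v = (kcat/KM) * [E] * [S], so v/[E] plotted against
    [S] is a straight line through the origin whose slope is kcat/KM.
    """
    if len(substrate_mM) != len(rates_mM_per_s) or not substrate_mM:
        raise ValueError("need equal-length, non-empty data series")
    if enzyme_mM <= 0:
        raise ValueError("enzyme concentration must be positive")
    # Least-squares slope for a line constrained to pass through the origin.
    numerator = sum(s * (v / enzyme_mM) for s, v in zip(substrate_mM, rates_mM_per_s))
    denominator = sum(s * s for s in substrate_mM)
    if denominator == 0:
        raise ValueError("at least one substrate concentration must be non-zero")
    return numerator / denominator


# Hypothetical, noise-free data generated with kcat/KM = 2.5 s^-1 mM^-1.
enzyme = 1e-4                                   # mM (assumed)
substrate = [0.005, 0.010, 0.020]               # mM, assumed to be far below KM
rates = [2.5 * enzyme * s for s in substrate]   # mM s^-1
estimate = specificity_constant(substrate, rates, enzyme)
print(round(estimate, 6))
```

The approximation only holds while every substrate concentration stays well below KM; if the plot of v/[E] against [S] curves, a full Michaelis-Menten fit is needed instead.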
In many cases KM values were too high to be reliably measured, so values of kcat/KM, the specificity constant, were determined instead through measurements at low substrate concentrations. In general, for those enzyme/substrate pairs where cleavage was detected, higher specific activities were seen with 6-modified substrates (Table 4.7). Indeed, in two cases kcat/KM values measured with 6-modified substrates were higher than for the parent glucoside. Also, interestingly, in some cases the 6-azidoglucoside was cleaved more rapidly than the 6-amino, and in others the reverse was seen. Additionally, all enzymes capable of cleaving both 4-amino and 3-amino glucosides displayed higher specific activities towards the 4-amino substrates. This again is very much in agreement with the specificity studies on Abg noted previously.

Table 4.6: Selected GH1 Genes and Their Activities.

                                              Modified Glucans
Genbank ID   Enzyme   Reference  MU-Glc  3-NH2  4-NH2  6-NH2  3-N3  4-N3  6-N3
AAZ81839.1   Ali GH1  [168]      X       X      X      X      -     -     X
ACO44852.1   Dei GH1  -          X       X      X      X      -     -     X
ACQ71106.1   Exi GH1  -          X       X      X      X      X     -     X
CAQ67883.1   Lac GH1  -          X       X      X      -      X     X     -
ABF87202.1   Myx GH1  -          X       X      X      X      -     -     X
BAE87008.1   Pha GH1  [301]      X       -      X      X      -     -     X
ABD82858.1   Sac GH1  -          X       X      X      X      -     -     X
AAF37730.1   The GH1  [282]      X       -      -      -      -     -     X

4.4.3 Acceptor Specificity

To gain insight into the specificity of the +1 subsite, each of the 8 wild-type enzymes was subjected to acceptor specificity screening as was done for the metagenomic hits.
All 8 enzymes were screened against a panel of 83 potential acceptors, including a variety of glycosides, free sugars and alcohols. As can be seen in Figure 4.9, each enzyme displayed a different pattern of acceptor specificity. The extent of reactivation of each enzyme also differed, with the maximum rates of reactivation for Ali GH1, Pha GH1 and The GH1 being fairly modest (0.4%, 2.1% and 3.5% of the uninhibited rate, respectively) when compared to those of Dei GH1, Exi GH1, Lac GH1, Myx GH1 and Sac GH1 (19.7%, 83.8%, 59.4%, 47.2% and 100% of the uninhibited rate, respectively). The majority of the best reactivators were aryl glycosides, with six of the eight enzymes being reactivated fastest by an aryl glucoside (pNP-Glc, pNP-α-Glc or MU-α-Glc). The exceptions to this were Pha GH1, for which cellobiosides were best, suggesting a strong preference for cello-oligosaccharide acceptors, and Sac GH1, which reactivated fastest with pNP β-D-fucopyranoside.

Figure 4.9: GH1 Acceptor Specificity. The top five reactivators and the relative initial rates observed are shown for each of the wild-type enzymes. Enzymes screened are abbreviated as follows: (A) Ali GH1, (D) Dei GH1, (E) Exi GH1, (L) Lac GH1, (M) Myx GH1, (P) Pha GH1, (S) Sac GH1, and (T) The GH1. Rates are scaled to the best reactivator for each given enzyme (scale: percent of maximum rate, 0 to 100%). The control is the activity seen when no reactivator was included. Acceptors shown: pNP β-D-glucuronide, pNP α-D-mannopyranoside, n-octyl β-D-glucopyranoside, MU β-D-xyloside, pNP α-D-galactopyranoside, pNP β-D-cellobioside, MU β-D-glucopyranoside, methyl β-D-xylopyranoside, pNP β-D-galactopyranoside, dithiothreitol, MU β-D-lactoside, glucose, pNP β-D-xylopyranoside, pNP α-D-glucopyranoside, gentiobiose, cellobiose, phenyl β-D-glucopyranoside, pNP β-D-fucopyranoside, MU α-D-glucopyranoside and pNP β-D-glucopyranoside.

Table 4.7: Kinetic Parameters for Selected GH1s

Enzyme   Substrate      kcat (s^-1)      KM (mM)                   kcat/KM (s^-1 mM^-1)
Ali GH1  MU-Glc         28 ± 1           0.13 ± 0.01               220 ± 20
         MU-3-NH2-Glc   -                -                         9 × 10^-4 ± 2 × 10^-4
         MU-4-NH2-Glc   -                -                         2.8 ± 0.6
         MU-6-NH2-Glc   -                -                         1.60 × 10^-1 ± 6 × 10^-3
         MU-6-N3-Glc    0.32 ± 0.04      0.04 ± 0.01               8 ± 3
Dei GH1  MU-Glc         14 ± 1           0.13 ± 0.03               100 ± 30
         MU-3-NH2-Glc   -                -                         7.5 × 10^-2 ± 2 × 10^-3
         MU-4-NH2-Glc   -                -                         3.31 ± 0.05
         MU-6-NH2-Glc   61 ± 2           0.20 ± 0.01               310 ± 20
         MU-6-N3-Glc    -                -                         7.45 × 10^-1 ± 1 × 10^-3
Exi GH1  MU-Glc         46 ± 4           0.018 ± 0.009             3000 ± 1000
         MU-3-NH2-Glc   0.0006 ± 0.0001  0.012 ± 0.001             0.48 ± 0.05
         MU-4-NH2-Glc   -                -                         8.4 ± 0.6
         MU-6-NH2-Glc   117 ± 6          0.07 ± 0.01               1700 ± 300
         MU-3-N3-Glc    -                -                         2.2 × 10^-5 ± 3 × 10^-6
         MU-6-N3-Glc    5.9 ± 0.7        0.24 ± 0.05               25 ± 6
Lac GH1  pNP 6-PO4-Glc  2.04 ± 0.04      1.34 ± 0.05               1.52 ± 0.06
         pNP Glc        -                -                         1.2 × 10^-4 ± 2 × 10^-5
         MU-Glc         -                -                         8.70 × 10^-2 ± 9 × 10^-4
         MU-3-NH2-Glc   -                -                         N.D.
         MU-4-NH2-Glc   -                -                         1.05 × 10^-2 ± 3 × 10^-4
         MU-4-N3-Glc    -                -                         N.D.
         MU-6-N3-Glc    -                -                         N.D.
Myx GH1  MU-Glc         5.1 ± 0.2        2.5 × 10^-3 ± 7 × 10^-4   2000 ± 600
         MU-3-NH2-Glc   -                -                         2 × 10^-3 ± 1 × 10^-3
         MU-4-NH2-Glc   -                -                         1.05 ± 0.09
         MU-6-NH2-Glc   4.5 ± 0.4        0.020 ± 0.007             220 ± 80
         MU-6-N3-Glc    -                -                         N.D.
Pha GH1  MU-Glc         20.2 ± 0.8       0.073 ± 0.009             280 ± 40
         MU-4-NH2-Glc   -                -                         1.24 ± 0.04
         MU-6-NH2-Glc   -                -                         0.95 ± 0.04
         MU-6-N3-Glc    1.40 ± 0.06      2.0 × 10^-3 ± 4 × 10^-4   700 ± 200
Sac GH1  MU-Glc         9 ± 1            0.07 ± 0.02               160 ± 40
         MU-3-NH2-Glc   -                -                         3.0 × 10^-3 ± 9 × 10^-5
         MU-4-NH2-Glc   -                -                         3.2 × 10^-2 ± 2 × 10^-3
         MU-6-NH2-Glc   -                -                         4.6 × 10^-3 ± 2 × 10^-4
         MU-6-N3-Glc    -                -                         0.74 ± 0.03
The GH1  MU-Glc         10 ± 1           0.07 ± 0.02               160 ± 50
         MU-3-NH2-Glc   -                -                         N.D.
         MU-4-NH2-Glc   0.13 ± 0.01      0.29 ± 0.04               0.44 ± 0.07
         MU-3-N3-Glc    -                -                         N.D.
         MU-6-N3-Glc    -                -                         0.68 ± 0.03

4.4.4 Nucleophile Mutant Creation and Glycosynthase Tests

We sought to make glycosynthases from each of the eight hits, with the hope that they would be competent at transferring modified glucosides onto a variety of molecules.
The conserved nucleophilic glutamate residue of each enzyme was mutated to three different amino acids (serine, alanine and glycine) in the hope that one of these would be an active glycosynthase. All 24 variant enzymes were expressed and purified on a 50 mL scale. Initially, enzymes were tested for glycosynthase activity using α-glucosyl fluoride (αF-Glc, 50 mM) as a donor and para-nitrophenyl glucoside (pNP-Glc, 10 mM) as an acceptor. For six (Ali GH1, Dei GH1, Exi GH1, Myx GH1, Sac GH1, The GH1) of the eight enzymes, at least one variant acted as a glycosynthase with this donor/acceptor combination. Mutants of the other two (Lac GH1 and Pha GH1) did not yield products. However, when Pha GH1 nucleophile variants were incubated with αF-Glc and pNP-cellobioside (pNP-C), one of the best reactivators for the wild-type enzyme, products were indeed seen. As Lac GH1 has a much higher specificity constant for the hydrolysis of 6-phospho-glucosides than for glucosides (Table 4.7), we suspected that the nucleophile variants would be competent glycosynthases with 6-phospho-glucosyl donors. Unfortunately, none of the Lac GH1 variants had observable catalytic activity with 6-phospho-α-glucosyl fluoride in conjunction with any of the top five reactivators.

To choose the best glycosynthase variant from the three different variants we performed HPLC analysis of small scale reactions. Reaction mixtures contained an equal amount of donor and acceptor (αF-Glc, pNP-Glc or pNP-C at 5 mM). All glycine variants displayed hydrolytic activity, which we speculate is due to mis-incorporation of the wild-type glutamate, as has been reported previously for a GH1 glycosynthase [236]. In that study, mis-incorporation was seen for the gene containing the GGG codon for glycine, whereas in our case we used GGA; however, in both cases the codon differs in only one base from a codon for glutamate (GAG, GAA).
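The single-base relationship between these glycine and glutamate codons can be checked directly. The short Python sketch below counts the base differences between each glycine codon and its nearest glutamate codon in the standard genetic code; the helper names are illustrative only.

```python
GLU_CODONS = ("GAA", "GAG")                 # glutamate, standard genetic code
GLY_CODONS = ("GGA", "GGC", "GGG", "GGT")   # glycine, standard genetic code


def hamming(a, b):
    """Number of positions at which two equal-length codons differ."""
    if len(a) != len(b):
        raise ValueError("codons must have the same length")
    return sum(x != y for x, y in zip(a, b))


def min_distance_to_glu(codon):
    """Smallest number of base changes separating a codon from any Glu codon."""
    return min(hamming(codon, glu) for glu in GLU_CODONS)


distances = {codon: min_distance_to_glu(codon) for codon in GLY_CODONS}
for codon, d in distances.items():
    print(codon, d)
```

GGA and GGG each sit one base away from a glutamate codon, whereas GGC and GGT require two changes, so a single misread base cannot restore the wild-type residue for the latter pair.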
We suggest that future glycosynthase creation should utilise glycine codons that differ from the glutamate codons at 2 positions (GGC, GGT) if the glycine variant is to be tested. Of the serine and alanine variants, the serine variant had the highest yield of the major product for all enzymes except The GH1, for which the alanine variant was selected. The differences between variants were, in most cases, within 5 percentage points; the exceptions were Exi GH1 and Sac GH1, for which the serine variant gave yields 13 and 21 percentage points higher, respectively, than the alanine variant (Table 4.8).

Enzyme      Major Product Yield %
Ali E354A   64 ± 0.2
Ali E354S   65 ± 0.2
Dei E346A   40 ± 0.1
Dei E346S   45 ± 0.3
Exi E350A   41 ± 0.2
Exi E350S   54 ± 0.3
Myx E357A   47 ± 0.4
Myx E357S   52 ± 0.7
Sac E368A   79 ± 0.9
Sac E368S   100 ± 0.4
The E388A   59 ± 0.7
The E388S   55 ± 0.4
Pha E365A   65 ± 1
Pha E365S   70 ± 1

Table 4.8: Product Yields From Small Scale Glycosynthase Reactions.

4.4.5 Product Characterization

To identify the products of the most efficient glycosynthases, large-scale reactions were performed and products were purified by HPLC. The majority of the NMR experiments were performed by Dr. Feng Liu, and detailed chemical shift assignments are given in Appendix Section C.0.1. Initially, large-scale glycosynthase reactions were carried out with α-F-Glc as the donor and pNP-Glc or pNP-C as the acceptor. These products were then characterized by NMR and mass spectrometry to reveal the glycosidic linkages (Table 4.9). Remarkably, a large set of different glycans was formed by the different glycosynthases despite both the donor and acceptor sugars being the same (except in the case of Pha E365S). Of the seven competent glycosynthases, five selectively transferred a single sugar onto the acceptor, with Ali E354S, Exi E350S and Myx E357S preferentially forming β-1,3-linkages, while Pha E365S and The E388A preferentially formed β-1,4-linkages.
The Sac E368S glycosynthase also formed β-1,4-linkages, but this enzyme was also competent in using the product as an acceptor to transfer an additional glucose, forming a trisaccharide. The final glycosynthase, Dei E346S, also preferentially produced trisaccharides, but in this case the first transfer formed a β-1,3-linkage and the second a β-1,4-linkage, yielding the mixed-linkage trisaccharide Glc-(β-1,4)-Glc-(β-1,3)-Glc-β-pNP. This mixed-linkage product may be useful in dissecting the mechanism of hydrolases that function on mixed-linkage glucans.

Heins and coworkers performed a detailed characterization of the linkage specificity for a selected set of enzymes which included the Ali and The enzymes [117]. Both The GH1 and Ali GH1 had the fastest hydrolysis rates for laminaribiose (β-1,3-linked Glc-Glc), with sophorose (β-1,2-linked Glc-Glc) hydrolysis being second fastest for Ali GH1 and cellobiose (β-1,4-linked Glc-Glc) hydrolysis being second fastest for The GH1. The glycosynthase product for Ali GH1 was consistent with the hydrolysis rates; however, The E388A only synthesized β-1,4-linked products. The explanation for this inconsistency may lie in the presence of a para-nitrophenyl aglycone in the acceptor, which may interact with the +2 subsite, altering the acceptor orientation.
Determining the regiochemical outcome of the reaction in which glucose is used as an acceptor would shed light on whether this is the case.

Enzyme     Donor         Acceptor     Product                                             Yield (%)
Ali E354S  αF-Glc        pNP-Glc      Glc-(β-1,3)-Glc-β-pNP                               65
                                      Glc-(β-1,4)-Glc-β-pNP                               8
Dei E346S  αF-Glc        pNP-Glc      Glc-(β-1,4)-Glc-(β-1,3)-Glc-β-pNP                   45
                                      Glc-(β-1,4)-Glc-β-pNP                               12
Exi E350S  αF-Glc        pNP-Glc      Glc-(β-1,3)-Glc-β-pNP                               54
                                      Glc-(β-1,4)-Glc-β-pNP                               3
Myx E357S  αF-Glc        pNP-Glc      Glc-(β-1,3)-Glc-β-pNP                               52
                                      Glc-(β-1,4)-Glc-β-pNP                               14
Sac E368S  αF-Glc        pNP-Glc      Glc-(β-1,4)-Glc-(β-1,4)-Glc-β-pNP                   100
Pha E365S  αF-Glc        pNP-C        Glc-(β-1,4)-Glc-(β-1,4)-Glc-β-pNP                   70
The E388A  αF-Glc        pNP-Glc      Glc-(β-1,4)-Glc-β-pNP                               59
                                      Glc-(β-1,4)-Glc-(β-1,4)-Glc-β-pNP                   28
Sac E368S  αF-3-NH2-Glc  pNP-Glc      3-NH2-Glc-(β-1,4)-Glc-β-pNP                         74
Exi E350S  αF-3-NH2-Glc  pNP-Glc      3-NH2-Glc-(β-1,3)-Glc-β-pNP                         16
Sac E368S  αF-4-NH2-Glc  pNP-Glc      4-NH2-Glc-(β-1,4)-Glc-β-pNP                         63
The E388A  αF-4-NH2-Glc  pNP-Glc      4-NH2-Glc-(β-1,4)-Glc-β-pNP                         64
Sac E368S  αF-6-NH2-Glc  pNP-Glc      6-NH2-Glc-(β-1,4)-Glc-β-pNP                         37
Sac E368S  αF-6-N3-Glc   pNP-Glc      6-N3-Glc-(β-1,4)-Glc-β-pNP                          84
Exi E350S  αF-6-N3-Glc   pNP-Glc      6-N3-Glc-(β-1,3)-Glc-β-pNP                          42
                                      6-N3-Glc-(β-1,4)-Glc-β-pNP                          8
Pha E365S  αF-6-N3-Glc   pNP-C        6-N3-Glc-(β-1,4)-Glc-(β-1,4)-Glc-β-pNP              27
Dei E346S  αF-Glc        pNP-Xyl      Glc-(β-1,4)-Glc-(β-1,3)-Xyl-β-pNP                   65
                                      Glc-(β-1,3)-Xyl-β-pNP                               24
Exi E350S  αF-Glc        pNP-Xyl      Glc-(β-1,3)-Glc-(β-1,3)-Xyl-β-pNP                   50
                                      Glc-(β-1,3)-Xyl-β-pNP                               49
Dei E346S  αF-Glc        n-Octyl-Glc  Glc-(β-1,4)-Glc-(β-1,3)-Glc-β-Octyl                 11
Sac E368S  αF-Glc        DNP 2F-Glc   Glc-(β-1,4)-Glc-(β-1,4)-2F-Glc-β-DNP                28
                                      Glc-(β-1,4)-Glc-(β-1,4)-Glc-(β-1,4)-2F-Glc-β-DNP    21
Sac E368S  αF-4-NH2-Glc  DNP 2F-Glc   4-NH2-Glc-(β-1,4)-2F-Glc-β-DNP                      74

Table 4.9: Characterized GH1 Glycosynthase Products.

Enzyme     αF-3-NH2-Glc  αF-4-NH2-Glc  αF-6-NH2-Glc  αF-6-N3-Glc
Ali E354S  -             X             X             -
Dei E346S  X             X             X             -
Exi E350S  X             X             X             X
Myx E357S  X             X             X             -
Pha E365S  X             X             X             X
Sac E368S  X             X             X             X
The E388A  -             X             -             -

Table 4.10: Glycosynthase Activity with Azido and Amino Donor Sugars.

Having probed the general utility of the set of glycosynthases, we turned our attention towards the modified glycosynthase donors. A range of α-glycosyl fluorides containing the same 3-, 4- and 6-azido and amino modifications as those used to screen for hydrolase activity were synthesized by Dr. Hongming Chen to test as glycosynthase donors. Each of the seven competent glycosynthases was tested with each of the modified donor substrates, using either pNP-Glc or pNP-C as acceptor: all seven functioned with at least one modified donor (Table 4.10). Consistent with what had been seen in screening modified substrate activity, the 4-aminoglucosyl fluoride was accepted as a donor by all seven of these glycosynthases. The 3- and 6-aminoglucosyl fluorides were also accepted by many of the glycosynthases (5 of 7 and 6 of 7, respectively), corresponding fairly well with the wild-type enzyme results on modified substrates. Three of the seven glycosynthases (Pha E365S, Sac E368S and Exi E350S) were also able to transfer from the 6-azidoglucosyl fluoride donor onto pNP-Glc or pNP-C, each producing a different azido-modified glucan (Table 4.9). Finally, none of the variants were able to carry out glycosyl transfers using the 3- or 4-azidoglucosyl fluorides as donors. This is not so surprising given the relatively low activities of the wild-type enzymes on the corresponding substrates, along with the considerably lower activities of non-evolved glycosynthases relative to the wild-type parents carrying out the normal reaction.

Having established the donor and acceptor specificities of these glycosynthases, we tested the ability of selected glycosynthases to generate useful conjugates of other glycans and non-sugar acceptors.
Oligosaccharides of mixed sugar composition could be assembled, as demonstrated by the ability of Dei E346S and Exi E350S to transfer to pNP-xyloside, generating glucosyl-β-xylosides with linkages that mirrored those seen when using pNP-Glc as acceptor (Table 4.9). Likewise, n-octyl β-glucoside served as an acceptor for Dei E346S, generating octyl oligosaccharides with potential as detergents; addition of terminal aminosugars would allow simple assembly of cationic versions of these detergents. Within the sugar series, a particularly useful set of products are the mechanism-based inactivators generated by glycosynthase-catalyzed glycosylation of simple glucoside-based reactive entities such as 2,4-dinitrophenyl 2-deoxy-2-fluoro-β-glucoside (DNP 2-F-Glc). The Sac E368S enzyme, which had the highest transfer yields, was selected for transfer onto this inhibitor, resulting in the disaccharide version. More importantly, Sac E368S was able to transfer a 4-aminoglucosyl moiety to create a disaccharide inhibitor bearing a functionalisable amine on the non-reducing end sugar (Table 4.9). This now allows facile and mild derivatization of mechanism-based inhibitors and affinity labels via amide formation, allowing the attachment of fluorophores for detection, or of biotin for capture. Further, the amine could serve as the point of attachment of diverse substituents as a means of introducing novel specificity elements.

4.5 Discussion and Future Directions

The two clone libraries investigated in this study allowed for the interrogation of enzymatic promiscuity and the creation of glycosynthases. The characteristics of the libraries allowed us to interrogate promiscuity both across a range of different hydrolase-containing gene cassettes and in fine detail within one specific family.
The identification of promiscuous genes in turn allowed for the creation of a panel of glycosynthases which can incorporate modified glucosides and galactosides.

The power of fosmid libraries lies in the ability to identify lengthy clusters of genes which may function synergistically. This, however, can also complicate the identification of the specific gene responsible for activity, especially when several potentially active genes are present. There are typically three ways forward to identify the genes responsible for activity: creation of small-insert libraries, creation of knock-out libraries, or sub-cloning of the CAZymes. Small-insert libraries are created by first shearing the purified DNA, cloning these fragments into an expression vector and then transforming this library into a suitable host. The re-screening of these small-insert libraries and subsequent sequencing should then reveal the gene(s) responsible for activity. Creating knock-out libraries involves integrating a selection marker randomly within a fosmid, then screening a library of clones with the integration for a loss of activity and finally sequencing the selected clones. These methods are feasible for small numbers of hits, but become intractable when tens or hundreds of fosmid clones are identified.

The ideal solution would be to create a library containing each of the potentially active genes from each fosmid in a high-expression vector. Creation of such a library would, without a doubt, be limited by the time-consuming process of sub-cloning. However, technological advances may soon make this dream a reality. Development of laboratory automation equipment, such as the digital-to-biological converter [32], could allow for the rapid, automated sub-cloning of genes. Gene-synthesis automation, coupled with decreasing costs, may soon allow for the realistic creation of libraries containing thousands of such clones at the click of a button.
Technological advances should also enable the rapid creation of sets of phylogenetically diverse enzymes from families other than GH1, allowing for a similar fine-detail characterization of promiscuous activity.

We were able to generate several new glycosynthases from promiscuous hydrolases; however, several questions remain to be investigated. Are there additional mutations which will allow the creation of glycosynthases from GH3 enzymes? How can we incorporate sterically bulky substituents at the 3- and 4-positions? How can we determine a priori whether a GH is a good candidate for transformation into a glycosynthase?

The first of these questions may be solved by directed evolution. This may be accomplished by using a screen similar to that performed by Kim et al. [153], where detection of glycosynthase activity was coupled to the activity of an enzyme (Cel5A) that was able to cleave the glycosynthase product, but not the reactants. Subjecting the metagenomically identified promiscuous GH3s to this process should enable the evolution of glycosynthases and the identification of mutations which support this catalysis. Directed evolution of multiple GH3s in parallel may reveal mutations that universally support glycosynthase transformation for all GH3s.

We were able to use glycosynthases to incorporate amino modifications at the 3-, 4- and 6-positions of donor sugars and azido modifications at the 6-position. However, the creation of a glycosynthase capable of using either 3- or 4-azido or 3- or 4-methoxy sugars as donors remains elusive. Previous work by Shim et al. [272] produced a glycosynthase capable of transferring a 3-O-methyl-glucosyl moiety by means of directed evolution of an existing glycosynthase (Abg2F6) [153]. Their strategy involved saturation mutagenesis at key primary protein interaction sites around the 3-hydroxyl group within the hydrolase and plate-based activity screens.
This strategy may also be successful for the incorporation of azido-modified sugars. Another approach could be to target specific β-glucosidase families known to have relaxed interactions at either the 3- or 4-hydroxyl positions. The GH5 family, which typically has endo-activity but contains members with β-glucosidase activity, could be a useful starting point.

The question of what makes a good candidate for a glycosynthase has been addressed previously by Ducros et al. [81]. Within this paper the authors suggested that measuring the reactivation rates of a 2-fluoro-glycosyl-enzyme intermediate (a proxy for the glycosynthase-bound α-glycosyl fluoride) with acceptors, and comparison of this rate to the reactivation rate with water, could be useful metrics for determining whether a hydrolase will yield an efficient glycosynthase. They found that hydrolases with reactivation rates (k_trans) > 10^-2 min^-1 and high selectivity for transfer to acceptor over water (k_trans/k_H2O > 20) acted as efficient glycosynthases.

Table 4.11: Comparison of Enzyme Reactivation.

                                              Enzyme Reactivated (%)   Ratio          Active
Enzyme      Family  Acceptor                  H2O       Acceptor       Acceptor/H2O   Glycosynthase?
Ali         GH1     pNP-Glc                    0.06      0.31           5.6           Y
Dei         GH1     pNP-Glc                    0.69     19.00          27.4           Y
Exi         GH1     pNP-Glc                    9.15     74.68           8.2           Y
Lac         GH1     pNP-α-Glc                  4.30     55.12          12.8           N
Myx         GH1     pNP-Glc                    0.56      7.85          14.1           Y
Pha         GH1     pNP-C                      0.13      1.04           8.2           Y
Sac         GH1     pNP-Glc                    4.47     75.72          16.9           Y
The         GH1     pNP-Glc                    0.17      0.71           4.1           Y
C08-GH3     GH3     pNP-Glc                    1.65     12.69           7.7           N
O22-GH3     GH3     phenyl-Glc                34.44     35.87           1.0           N
C24-GH3-1   GH3     Galactal                  11.89     43.34           3.6           N
D08-GH3     GH3     pNP-Glc                   11.25     12.76           1.1           N
C11-GH1     GH1     pNP-Glc                    0.45     16.75          37.3           Y
O03-GH42    GH42    3-Mercapto-1-propanol     52.66     24.70           0.5           N

Although I did not directly measure such rates, the results of the acceptor specificity tests may give some insight into the selectivity for transfer (see Table 4.11).
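As a rough check on Table 4.11, the acceptor-to-water ratios can be recomputed from the tabulated percentages. The Python sketch below is illustrative only: the function names are hypothetical, the cutoff of 5 is simply the value quoted in this discussion rather than a validated threshold, and because the tabulated percentages are rounded, recomputed ratios can differ slightly from the printed ones (for example, 0.31/0.06 gives about 5.2 rather than 5.6).

```python
# Rows transcribed from Table 4.11:
# (enzyme, % reactivated by water, % reactivated by acceptor, active glycosynthase?)
TABLE_4_11 = [
    ("Ali", 0.06, 0.31, True),
    ("Dei", 0.69, 19.00, True),
    ("Exi", 9.15, 74.68, True),
    ("Lac", 4.30, 55.12, False),
    ("Myx", 0.56, 7.85, True),
    ("Pha", 0.13, 1.04, True),
    ("Sac", 4.47, 75.72, True),
    ("The", 0.17, 0.71, True),
    ("C08-GH3", 1.65, 12.69, False),
    ("O22-GH3", 34.44, 35.87, False),
    ("C24-GH3-1", 11.89, 43.34, False),
    ("D08-GH3", 11.25, 12.76, False),
    ("C11-GH1", 0.45, 16.75, True),
    ("O03-GH42", 52.66, 24.70, False),
]


def reactivation_ratio(water_pct, acceptor_pct):
    """Acceptor reactivation divided by water reactivation."""
    return acceptor_pct / water_pct


def disagreements(rows, cutoff=5.0):
    """Enzymes whose ratio-based prediction differs from the observed outcome.

    The cutoff is an assumption for illustration, not a fitted value.
    """
    return [
        name
        for name, water, acceptor, active in rows
        if (reactivation_ratio(water, acceptor) > cutoff) != active
    ]


for name, water, acceptor, active in TABLE_4_11:
    ratio = reactivation_ratio(water, acceptor)
    print(f"{name:10s} ratio={ratio:5.1f} active={active}")
print("Disagreements at cutoff 5:", disagreements(TABLE_4_11))
```

With this simple cutoff, Lac, The and C08-GH3 are the three enzymes whose ratio does not match the observed outcome, which suggests that the ratio alone is not a sufficient predictor.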
The enzymes that were successfully transformed into glycosynthases all had higher percentages of acceptor reactivation (total reactivation with acceptor minus reactivation due to water) than water reactivation. Most had ratios of reactivation (acceptor reactivation / water reactivation) greater than 5. The hydrolases which could not be transformed into successful glycosynthases had fairly low ratios of the % enzyme reactivated (Table 4.11), with O03-GH42 even having a higher rate of hydrolysis than transfer to any acceptor. C08-GH3 was exceptional in that it had a ratio (7.7) comparable to enzymes that could be transformed into active glycosynthases; however, it may be that other factors, such as a low transfer rate, are limiting this enzyme from becoming an effective glycosynthase.

4.6 Conclusions

The variety of different bonds formed by this panel of glycosynthases truly speaks to the power of harnessing the diversity of enzymes present in nature. By exploring a wide variety of enzymes through hydrolase screening, we were able to rapidly identify enzymes with promiscuous hydrolase activity. Coupling this process to acceptor specificity screening enabled the identification of ideal substrates to use with each glycosynthase. Eight wild-type enzymes were transformed into competent glycosynthases, which were able to catalyse a variety of glycosylations. This set of glycosynthases was able to generate disaccharides, trisaccharides, glycolipids and inhibitors containing azido or amino functional handles. The ability to synthesize glycans containing modified glucans will, going forward, enable the rapid diversification of molecules, to include a variety of functionalities such as fluorophores or specificity elements.

Chapter 5

Conclusions

In this thesis I harnessed high-throughput functional metagenomic screening to identify novel genes involved in carbohydrate degradation throughout oceanic, soil, coal-bed and man-made bioreactor environments.
In addition I harnessed this technology to detail microbial mechanisms of carbohydrate degradation within the beaver digestive tract, and reveal new synergistic modes of degradation. Libraries of identified metagenomic clones and synthesized genes were then profiled to reveal promiscuous enzymes which, in turn, were developed into new synthetic tools.

The aim of this chapter is to analyse and integrate the research within this thesis in light of current research in the field. In addition, the limitations and strengths of my approaches are analysed, as are possible future directions for investigation.

5.1 Relevant Research

Discovery of New Glycoside Hydrolases

Functional metagenomic screening offers a valuable method to discover biomass-degrading enzymes, to complement more traditional methods for enzyme discovery, including activity-based screening of isolates or isolate libraries, and genetic analysis of known carbohydrate-degrading organisms. Of the ten most recently discovered GH families (see Table 5.1), seven have been identified through genetic analysis of PULs, reflecting both the current interest in these gene clusters and the catalytic power of the human symbiont B. thetaiotaomicron. Two of the other three most recently identified families (GH144 and GH149) were identified through isolate activity screening, followed by protein purification [2, 158]. The last GH family (GH148) was identified through functional metagenomic screening of a fosmid-harbouring E. coli library. The DNA sample used to construct this library originated from a volcanic crater which had both high temperature (67 °C) and pH (9.3). This library was screened with both MU-C and carboxymethyl cellulose (CMC), and the clone of interest had low activity against CMC, but higher activity on β-glucans [10].

Table 5.1: The Ten Most Recently Defined Glycoside Hydrolase Families

Family  Activity                      Discovery Method           Substrate        Reference
GH139   α-2-O-methyl-L-fucosidase     PUL genetic analysis       RG-II            Ndeh et al. [215]
GH140   endo-apiosidase               PUL genetic analysis       RG-II            Ndeh et al. [215]
GH141   β-L-arabinofuranosidase       PUL genetic analysis       RG-II            Ndeh et al. [215]
GH142   α-L-fucosidase; xylanase      PUL genetic analysis       RG-II            Ndeh et al. [215]
GH143   DHAase                        PUL genetic analysis       RG-II            Ndeh et al. [215]
GH144   endo-β-1,2-glucanase          Isolate activity screening β-1,2-glucan     Abe et al. [2]
GH145   α-L-rhamnosidase              PUL genetic analysis       AGP              Munoz-Munoz et al. [209]
GH146   β-L-arabinofuranosidase       PUL genetic analysis       RG-I             Luis et al. [183]
GH147   β-galactosidase               PUL genetic analysis       RG-I             Luis et al. [183]
GH148   β-1,3/β-1,4-glucanase         Metagenomic screening      CMC, β-glucan    Angelov et al. [10]
GH149   β-1,3-glucan phosphorylase    Isolate activity screening laminaribiose    Kuhaudomlarp et al. [158]

CMC: Carboxymethyl-Cellulose, AGP: Arabinogalactan Protein, DHAase: 2-keto-3-deoxy-D-lyxo-heptulosaric acid hydrolase

Further investigation of the most recently described GH families can give us insight into successful strategies for discovery. Most of the new families were identified with either complex, difficult-to-purify substrates (GH139-143 and GH145-147, which are active on RG-I, RG-II or AGP) or, in the case of GH144, which is active on the uncommon β-1,2-glucan, substrates which have recently been made accessible through new synthetic schemes. The discovery of GH149 hinged on the mechanistic details of this family, as activity assays were based on looking for reverse phosphorolysis products rather than degradation products.
The discovery of GH148 is an outlier from this set, as it relied neither on new substrates (CMC was first employed in 1986 [262]) nor on mechanistic details, but rather was made possible by screening an extreme environment and investigating hits with low activity. Incorporating these successful strategies (complex natural substrates, probes based on mechanism, extreme environments and investigation of low activities) into functional metagenomic screens should allow for the continued discovery of new hydrolase families.

Glycoside hydrolase enzymes with activities previously unobserved within specific families have also been recently identified. The methods used to identify these enzymes have mirrored those used to identify new families. PUL genetic analysis [215], isolate screening [266, 307] and transcriptomics [143] have all been used to identify new activities within known families. Additionally, phylogenetic analysis has been used to identify enzymes or enzyme subfamilies which have low sequence similarity, or are deeply branching. One such example employs a novel bioinformatic pipeline (SACCHARIS) to identify uncharacterized subfamilies [144]. Application of this pipeline to the GH43 subfamily enabled the identification of a GH43 (Bacteroides dorei DSM 17855 [BdGH43b]) which is able to degrade α-D-glucans, a surprising activity as this family has hitherto only been known to degrade either β-D or α-L substrates [144].

Modified Glycan Synthesis

The glycosynthases generated in Chapter 4 enable facile incorporation of modified sugars bearing both azido and amino functional handles. Thus far only one other glycosynthase has been developed to incorporate azido functional handles. The HiCel7B E197A glycosynthase was used to synthesize a modified cellulose with 6-azido groups present at every second position [62]. This was accomplished by using a donor cellobioside possessing a single 6-azido modification on the reducing end glucose.
This resulted in polymers with a degree of polymerization up to 34, which could be subsequently modified with click chemistry. Additionally, transglycosylation has been used to incorporate modified N-glycans bearing 6-azido functionalities. Ochiai and coworkers were able to use an oxazoline pentasaccharide donor sugar containing 6-azido mannose to remodel the N-glycan of a small natural glycoprotein [225].

Another successful avenue for the generation of glycosynthases that act on modified sugars is the use of directed evolution. As mentioned within Chapter 4, other members from the Withers group have had success subjecting Abg glycosynthase to directed evolution, and specifically screening for the incorporation of non-natural substrates with modified substituents at the C3-position [272]. To achieve this enhanced activity, a variant library of wild-type enzymes was first screened for hydrolytic activity with a 3-O-methyl-β-D-galactopyranoside. The mutations identified from this directed evolution were then introduced into the Abg 2F6 glycosynthase scaffold. This resulted in a 39-fold increase in glycosynthase activity when 3-O-methyl glucopyranosyl fluoride was used as the donor sugar. One could envisage a hybrid strategy employing both metagenomic discovery, to first identify candidate GHs, and directed evolution to improve or modify their activities before conversion to a glycosynthase.

Glycosyl transferases have also been used to incorporate modified glycosides. Using a method they have termed glycorandomization, Jon Thorson and colleagues have been able to repurpose GTs to transfer amino and azido glycosides [182]. This technique employs a promiscuous nucleotidyl transferase to first synthesize modified nucleotide diphospho-sugars, which are then used as substrates for GTs. A number of different glucose analogues have been glycosylated using this method, including 3-, 4- and 6-amino glucose [97, 182].
Additionally, 3-, 4- and 6-azidoglucosyl moieties could be attached to erythromycin analogues [337] or vancomycin analogues [97].

5.2 Limitations and Future Directions

5.2.1 Diverse Searching

The functional metagenomic screening methods used in this thesis have enabled the identification of hundreds of new GH genes from diverse environments. This process is, however, susceptible to false negatives. Not every gene can be expressed in E. coli and not every glycoside hydrolase will be detected with our substrates. This invites the obvious question: how can we improve our functional screening to give ourselves a better chance of finding diverse plant biomass-degrading genes from an environment?

One problem affecting our ability to uncover biomass-degrading genes, discussed in Chapter 3, is under-sampling. This leads to poor representation of the rare taxa within the resulting library, decreasing the diversity of recovered hits. One solution to this problem is to simply create larger and larger libraries. This, however, leads to wasted resources as the most abundant hits are found over and over again. Also, there are technical limitations on the size of library that can be screened via plate-based methods within reasonable time frames and costs. Microfluidic technologies offer an alternative to plate-based screens and are able to more rapidly screen clone libraries [211]. Another potential solution to this problem could be to employ fluorescence activated cell sorting (FACS), which can be used to rapidly isolate sub-populations based on size, morphology or binding to specific probes. Alternatively, stable isotope probing could be used to isolate the DNA from sub-populations. This technique relies on isotopic labelling of substrates that, when metabolized by bacteria, are incorporated into their genomic DNA.
This isotopically labelled DNA can then be separated via ultracentrifugation, and used to create metagenomic libraries.

The improvement of heterologous expression is another avenue which promises to improve functional metagenomic screening. The screens performed in this thesis were conducted entirely in the E. coli EPI300 strain. This strain has several benefits: it grows rapidly, is genetically tractable and allows for copy number induction of fosmids. However, the sequence space that can be explored by this host is certainly limited. E. coli are limited to mesophilic growth, hindering access to thermophilic or psychrotrophic enzymes, have comparatively few σ-factors and have biased codon usage [179, 302]. This has led to an exploration of alternative expression hosts and has spurred research into the creation of multiple host vectors. The Proteobacteria have been a particular area of interest, with E. coli, Agrobacterium tumefaciens [69], Caulobacter vibrioides [69], Rhizobium leguminosarum [178], Ralstonia metallidurans [69], Pseudomonas fluorescens [1], Pseudomonas putida [53, 69], Xanthomonas campestris [1], Burkholderia graminis [69] and Sinorhizobium meliloti [263] all being used as screening hosts. Other bacterial hosts include Thermus thermophilus [170], belonging to the Deinococcus-Thermus phylum, the Firmicute Bacillus subtilis [26] and the Actinobacterium Streptomyces albus [136]. Although the number of hosts appears extensive, they lack phylogenetic diversity. Only 4 of the 30 accepted bacterial phyla have had a representative used in a functional metagenomic screen. Future advancement of functional metagenomic screening should lie in the development of new expression systems in hitherto under-utilized phyla.

Conspicuous by their absence from the list of functional metagenomic hosts are the Bacteroidetes.
This phylum, as noted previously, has colonized virtually all types of environments and is well known for its ability to degrade complex carbohydrates [204]. Although I have been able to identify many fosmids with Bacteroidetes origins, expression of metagenomic DNA within a member of this phylum should increase our ability to detect carbohydrate-degrading genes and PULs. One potential complication with using a member of the Bacteroidetes as a host may be the presence of endogenous genes. Careful selection of a host with low background hydrolysis rates, or creation of knock-out strains tuned to the specific screen, should enable the use of a strain from this phylum.

In addition to developing hosts throughout the tree of life, it will be important to develop hosts that allow access to specific chemistries. Two recent publications have identified the role of hydrogen peroxide (H2O2) in the oxidative cleavage of carbohydrates [25, 161]. Detection of this activity in culture would likely require the presence of H2O2; however, many bacteria (including E. coli) possess the enzyme catalase, which functions to degrade hydrogen peroxide. The implementation of functional screening in hosts known to produce hydrogen peroxide, such as Lactobacillus acidophilus [119], could enable the detection of such enzymes. Furthermore, these strains may also be useful for the discovery of enzymes, such as lignin-peroxidases [322], which require reactive oxygen species.

It may also be beneficial to screen in hosts that are able to use uncommon amino acids. Both selenocysteine and pyrrolysine, thought of as the 21st and 22nd proteinogenic amino acids, are essential to catalysis in certain enzymes [30, 258]. However, the synthesis and use of these amino acids does not occur in every branch of the tree of life. E.
coli possess the machinery to incorporate selenocysteine, and they express several selenoproteins, yet they are unable to incorporate pyrrolysine. Desulfitobacterium hafniense, an anaerobic Firmicute, is perhaps the only isolated bacterium known to use both selenocysteine and pyrrolysine [152, 283]. Use of D. hafniense as a metagenomic host would enable the detection of proteins incorporating these amino acids, which would otherwise go unseen. Although the development of metagenomic systems in new hosts may be technically difficult, it offers the potential to provide access to unexplored sequence space and new biocatalysts.

Another route forward, which circumvents the need for heterologous expression, is direct functional screening of environmental cells. This tactic also frees the researcher from the need to first purify and then insert metagenomic DNA into a vector and host strain. Two studies employing rapid screening technologies have made progress towards such direct functional screening of environmental cells [211, 250]. The first of these studies screened cells from a wheat stubble field in the North of France [211]. They used a microfluidic system to encapsulate cells in 20 pL droplets with a fluorogenic reporter (6,8-difluoro-7-hydroxycoumarin-4-methanesulfonate cellobioside) for cellobiosidase activity. Fluorescent droplets were then sorted on chip at a rate of over 100,000 bacteria in less than 20 min. DNA from the resulting cells was then used for 16S sequencing to reveal the phylogeny of the hits, and the sorted population was grown on agar. The second study screened surface water from Damariscotta Lake in the North-Eastern United States using a FACS-based method [250]. In this study fluoresceinamine-labeled laminarin was incubated with environmental cells, then FACS was used to sort those cells which bound to this substrate.
The resulting hits were then subjected to single-cell whole-genome amplification, a powerful technique for revealing the genetic potential of environmental cells without prior culturing. This revealed 121 laminarin-binding single amplified genomes (SAGs), five of which were sequenced. The SAG with the highest coverage (SAG AAA168-F10) contained 58 putative glycoside hydrolases and a host of other carbohydrate-modifying enzymes.

The future of direct functional metagenomic screening should incorporate methodologies from both these studies. The use of droplet-based microfluidic screening with a reporter substrate offers the ability to directly detect a functional activity, which is a superior method to the FACS-based screen. This is because it offers a direct connection between a cell and its activity, unlike the FACS-based screen, which relied on the assumption that cells bound to substrates could also contain the enzymes to hydrolyse them. Although the authors of the microfluidics-based study were able to sequence the 16S of the resulting hits, they failed to identify which genes were responsible for activity. Coupling microfluidics-based screening and sorting to single-cell whole-genome amplification and sequencing will allow both rapid screening for activities and the identification of responsible genes.

5.2.2 Enzyme Profiling

Many of the hits identified throughout this thesis were subjected to plate-based assays to reveal the pH-dependence, thermal stability and substrate range of activity. This information is limited to relative values of initial rates, as it is difficult to determine the exact concentration of an enzyme in crude lysate. To determine kinetic constants (kcat and KM) the concentration of the active enzyme must be known.
Active site titrating reagents, such as chromogenic or fluorogenic 2-fluoro sugars [82, 101], offer one possible avenue towards determining enzyme concentration in crude lysate; however, these substrates can only be effectively employed with retaining enzymes. The future of high-throughput enzyme characterization may therefore hinge on the rapid sub-cloning, expression and purification of proteins. As discussed in Chapter 4, robotic automation of gene synthesis, protein expression and purification has been accomplished with a standalone system [32]. However, for this technology to be routinely employed, throughput must be expanded and costs decreased.

The use of activated fluorogenic probes, such as the chlorocoumarin glycosides employed in this thesis, enables rapid and sensitive screening. These substrates contain a fluorogenic molecule which is thought to occupy the +1 subsite of the active enzyme. However, positive subsite interactions are undoubtedly important for catalysis. Furthermore, the identified enzymes likely have specificity towards different linkages (β-1,2, β-1,3, β-1,4 etc.) and this information is not conveyed when the substrate contains a reporter molecule. As natural carbohydrates do not contain leaving groups that can be detected with the sensitivity of fluorophores, less sensitive and more time-consuming methods (such as TLC, reducing sugar assays and HPAEC) must be used to characterize their activity. The use of such technologies becomes unreasonable as the number of enzymes being assayed approaches the hundreds or thousands.

The development of rapid, sensitive assays that utilize more natural substrates, and that maintain +1 subsite interactions, will be an area of future research. Technologies based on Förster resonance energy transfer (FRET), biosensors, mass spectrometry and capillary electrophoresis show potential for rapid high-throughput characterization of activities.
For example, a study by Yang and coworkers used FRET probes, containing two fluorophores installed on either side of a ganglioside, to interrogate endo-active hydrolases [329]. Generation of a suite of FRET-based probes which incorporate plant-biopolymer oligomers would allow for the rapid profiling of endo-acting enzymes. The development of biosensors has also been a topic of recent research [123, 339], particularly in the context of metabolic engineering [338]. Recently, a biosensor has been developed for the detection of cellulases [162]. This work uses a genetic circuit that responds to the presence of cellobiose. When present, cellobiose derepresses the transcription of a fluorescent protein, resulting in a detectable readout. One could imagine biosensors being able to detect any molecule, including the metabolites generated from plant biomass degradation. Future biosensor development will involve expanding the range of substrates that can be detected and improving the dynamic range of biosensors.

Another high-throughput characterization method is the use of nanostructure-initiator mass spectrometry (NIMS) to rapidly profile enzyme activity [108]. This method employs glycans containing fluorous tags with varying mass attached to the reducing end that are used to assay enzyme activity [222]. Acoustic deposition is then used to transfer small volumes (1 nL) of the reaction mixtures onto a chip containing a fluorous initiator. Matrix-assisted laser desorption/ionization (MALDI) is then used to detect the fluorous glycans deposited onto this chip. This method has been used by Heins and coworkers [117] to detail the activity of the same panel of GH1 glycoside hydrolases used in Chapter 4 of this thesis. This work used a NIMS chip and acoustic deposition to examine 10,080 experimental conditions with 4 different substrates. This revealed substrate preferences and temperature dependences for each of the 105 active enzymes that they assayed.
Capillary electrophoresis has also been recently used to characterize enzymatically released oligosaccharides [177]. This method re-purposed a DNA sequencer which can analyse 96 samples simultaneously to rapidly quantify sugars. This allowed the detection of sugars released from the action of xylanases on wheat flour arabinoxylan down to femtomolar ranges while differentiating between the activities of GH10 and GH11 xylanases. Future development of these methods promises to allow the rapid characterization of thousands of enzyme hits derived from metagenomic screening.

5.3 Closing

Functional metagenomic screening has the power to reveal those active genes within a microbial community that are used to shape their chemical landscape. In this thesis functional metagenomic screening has enabled the cataloging of new genes identified from diverse environments that degrade plant matter. It has also given us insight into complex carbohydrate metabolism within the beaver gut and feces. The diversity of genes discovered can serve as a starting point for both the profiling of enzyme promiscuity and the development of new catalysts. In this respect I have created 8 new competent glycosynthases from both metagenomic and synthetic gene libraries. Further refinement of these discovered and engineered catalysts will expand our carbohydrate synthesis and degradation toolkit. This, in turn, promises to open doors to more efficient degradation of plant biomass and the creation of complex molecular probes and inhibitors for carbohydrate-active enzymes.

Chapter 6

Methods

6.1 General Methods

All buffers and reagents were from Sigma-Aldrich Chemical Company unless otherwise stated. Custom DNA oligos used for sub-cloning and mutagenesis were synthesized by Integrated DNA Technologies.
Sequence verification by means of targeted Sanger sequencing of mutants and sub-cloned genes was performed by Genewiz.

6.2 Data Accessioning

Nucleotide sequences for fosmids described in Chapter 2 have been deposited in GenBank (Accession ID: MH105917 - MH106139). Beaver feces data has been deposited in the NCBI BioProject portal (BioProject ID: PRJNA261082), for assembled metagenomic reads (BioSample ID: SAMN04122864), unassembled metagenomic reads (BioSample ID: SAMN03389401), functionally identified fosmids (BioSample ID: SAMN03389402) and pyrotags (BioSample ID: SAMN03389403). Functionally identified beaver gut fosmids have been deposited in GenBank (Accession ID: MH106140 - MH106387).

6.3 Chapter 2 Experimental

6.3.1 Sampling

Soil

Soil samples were collected by Dr. Marcus Taupp from the long-term soil productivity site at Skulow Lake, British Columbia. Soil from the organic layer, mineral layer of eluviation, mineral transition layer and mineral layer of accumulation at both undisturbed (Libraries: NO, NA, NB and NR) and harvested sites (Libraries: CO, CA, CB and SCR) were used to create fosmid libraries [114].

Ocean

Ocean water from the North-Eastern sub-Arctic Pacific Ocean was collected by Dr. Jody Wright. Water from Line P stations 4 and 12 was collected in February 2010 at depths between 10 and 2000 meters, and used to create fosmid libraries [326].

Coal Bed

Sampling of hydrocarbon resource environments was a result of the Hydrocarbon Metagenomics Project ( Four separate samples were collected and used to create fosmid libraries. Two samples were derived from cuttings of coal bed cores sourced from Rockyford Standard (CO182) and Basal (CO183) coal zones in Alberta. Another two samples (CG23A and PWCG7) were collected from co-produced water from coal bed methane well heads located in the San Juan Basin, New Mexico [8].

Bioreactors

Three additional samples were derived from bioreactors.
The first, a methanogenic naphtha-degrading community (NapDC), was initially inoculated with mature fine tailings from the Syncrude Mildred Lake Settling Basin (Alberta, Canada). This culture was enriched for naphtha-degrading consortia by growing the culture with 0.2 % (v/v) hydrocarbon mixture naphtha as a sole carbon source [288]. The second enrichment culture was a methanogenic toluene-degrading culture (TolDC) derived from a shallow gas condensate-contaminated aquifer located beneath a natural gas production site in Weld County (Colorado, USA). This culture was enriched for toluene-degrading consortia by propagating the culture with 0.01 % toluene (v/v) as a sole carbon source, prior to DNA isolation [93, 104, 288]. The final sampled bioreactor (FOS62) was designed for remediation of metal-contaminated effluent from smelting operations. The bioreactor is located in Trail, British Columbia and contains a mixture of limestone, quartz sand and Celgar biosolids, a by-product of the pulp and paper industry. The biosolids were used in an anaerobic digester by the Zellstoff Celgar mill company and therefore include bacterial biomass and partially degraded and composted cellulose and hemicellulose [4]. A homogenized core sample was collected, as described by Mewis et al. [201].

6.3.2 Library Creation

Fosmid libraries were created by Dr. Sangwon Lee. Once environmental DNA had been isolated and purified, fosmid libraries were generated. Fosmid library creation was performed as previously described using the CopyControl Fosmid Library Production Kit with pCC1FOS Vector Kit (Epicentre) [292]. Briefly, the DNA was end repaired to create 5′-phosphorylated blunt ends and then subjected to pulsed-field gel electrophoresis (PFGE) to size-select 35-60 kb DNA fragments. The DNA was recovered by gel extraction and ligated into the pCC1 vector. Linear concatemers of pCC1 and insert DNA were packaged into a phage and transduced into phage-resistant E.
coli EPI300 cells. The successfully transduced clones were recovered on LB agar plates containing chloramphenicol (12.5 µg/mL) and picked into 384-well plates, containing 100 µL of LB chloramphenicol (12.5 µg/mL) and 10 % glycerol, with an automated colony-picking robot (Qpix2, GENETIX). Clones were grown overnight at 37 °C then stored at -80 °C. In total, fosmid library construction produced 309,504 individual clones from a diverse set of environments (Table 2.1).

6.3.3 Fosmid End-Sequencing

Bi-directional Sanger end-sequencing was performed on a subset of the libraries using the ABI BigDye kit (Applied Biosystems, Carlsbad, CA) on all clones at Canada's Michael Smith Genome Sciences Centre, Vancouver, B.C., Canada. The primers used were pCC1 sequencing primers (forward: GGATGTGCTGCAAGGCGATTAAGTTGG, reverse: CTCGTATGTTGTGTGGAATTGTGAGC).

6.3.4 Annotation of End-Sequences

Open reading frames (ORFs) from fosmid end-sequences were predicted using Prodigal [134] implemented in the MetaPathways pipeline [155]. The 352,994 end-sequences yielded 400,561 ORFs >180 nucleotides in length, which were annotated using LAST [150] implemented in the MetaPathways pipeline based on queries of the CAZy database [181] (retrieved 2014-09-04).

6.3.5 Functional Screening

Screening was performed generally according to procedures by Mewis et al. [201] with modifications. 384-well master plates were thawed at 37 °C for 20 minutes, after which they were replicated into 384-well plates (Corning 3680) containing 40 µL per well of LB chloramphenicol (12.5 µg/mL) with arabinose (100 µg/mL). Replicated plates were then incubated in a humid chamber at 37 °C for 16-18 hours. Plates were removed from the incubation chamber and 40 µL of assay mix (1 % Triton X-100, 1 mM 4-methylumbelliferyl cellobioside, 100 mM potassium acetate, pH 5.5) was then added to each well. These plates were incubated at 37 °C for a further 16-18 hours in a humid chamber.
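Hit calling for a screen of this kind, using the cutoff applied in this section (signal more than 3 standard deviations above the mean), can be sketched in Python as follows. This is a minimal illustration, not the analysis code used for the thesis: the function name is hypothetical, and the use of the sample standard deviation and of a single plate as the reference population are assumptions.

```python
from statistics import mean, stdev


def call_hits(fluorescence, n_sd=3.0):
    """Return indices of wells whose signal exceeds mean + n_sd * SD.

    `fluorescence` is a flat list of per-well readings. The sample standard
    deviation is used here as an assumption; the source does not say which
    estimator was applied.
    """
    mu = mean(fluorescence)
    sd = stdev(fluorescence)
    threshold = mu + n_sd * sd
    return [i for i, value in enumerate(fluorescence) if value > threshold]


# Hypothetical 384-well plate: uniform background with one bright well.
plate = [100.0] * 383 + [5000.0]
print(call_hits(plate))
```

A single strong well inflates both the mean and the standard deviation, so on plates with many positives a robust estimator (for example, median-based) may be worth considering; that is a design suggestion, not something described in the source.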
Fluorescence was subsequently measured using a Varioskan (ThermoFisher) plate reader with an excitation wavelength of 365 nm and an emission wavelength of 450 nm. Fosmids chosen for sequencing (>3 standard deviations above the mean for each substrate) were validated by rescreening each clone in triplicate. These clones were rearrayed using an automated colony-picking robot (Qpix2, Molecular Devices) into a 384-well plate (Corning 3680) containing 80 µL of LB chloramphenicol (12.5 µg/mL) and 10 % glycerol. This master plate was incubated overnight at 37 °C and then stored at -80 °C.

6.3.6 Fosmid DNA Isolation and Sequencing

The 384-well master plate was used to streak cultures onto LB chloramphenicol (12.5 µg/mL) agar plates. Individual colonies were inoculated into 5 mL of terrific broth (TB) medium and incubated with shaking for 18 hours at 37 °C. Fosmids were purified with a QIAprep Spin Miniprep Kit (QIAGEN), treated with PlasmidSafe ATP-dependent DNase (Epicentre) and quantified using a Qubit fluorometer (ThermoFisher). Purified DNA was prepared for sequencing on the Illumina MiSeq platform using the Nextera XT library preparation kit and the 96-sample Nextera V1 index kit. Bead-based normalization was used before pooling samples, and samples were sequenced using paired-end 150 bp reads (2 x 150 bp mode). Fastq sequences were obtained from the sequencer and quality was assessed using FastQC. Raw sequences were trimmed to Q30 quality, and residual contaminating E. coli genomic DNA was removed by alignment to the E. coli K12 reference genome using the bwa aligner [174]. Trimmed reads were assembled at a range of k-mer values (64 to 160) using ABySS [275], and the k-mer value that produced the fewest contigs of appropriate size (25-40 kb) was selected. The presence of pCC1 vector sequence at the ends of a contig signalled the proper contig to select.
Wells that did not produce contigs with pCC1 vector present were end-sequenced and compared to all contigs produced from that well to identify the correct sequence. Assembly and quality control were done in part by Dr. Keith Mewis and Connor Morgan-Lang.

6.3.7 Fosmid Annotation

Open reading frames (ORFs) were predicted using Prodigal [134] implemented in the MetaPathways pipeline [155]. The 188 assembled fosmids yielded 4,969 ORFs >180 nucleotides in length, which were annotated using LAST [150] implemented in the MetaPathways pipeline based on queries of the CAZy [181] (retrieved 2014-09-04), COG [291] (retrieved 2016-10-20), KEGG [146] (retrieved 2011-06-18) and refseq-nr [246] (retrieved 2014-01-18) databases.

6.3.8 GH Family Trees

All characterized protein sequences from GH1, GH3, GH5, GH8 and GH9 were downloaded from the CAZy database [295] in May of 2017 (Table 6.1). Sequences were clustered at 95 % similarity with UCLUST [84]. Fosmid-encoded proteins from the same families were compiled and clustered in the same way as the characterized CAZy proteins. Representative sequences from both sets of proteins were aligned with COBALT [230], and poorly aligned regions were removed with trimAl [45] using a gap threshold of 0.95 and a conservation threshold of 0.8. The trimmed multiple sequence alignments were then used to generate phylogenetic trees based on maximum likelihood analysis using RAxML [284]. One hundred bootstrap cycles were performed using the Whelan and Goldman substitution model [316] and a Γ distribution of rate heterogeneity as parameters. Trees were rerooted at the tree midpoint and visualised with the phytools [251] package in R.
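
The gap-based trimming step of section 6.3.8 can be illustrated with a minimal sketch. It assumes that the gap threshold is read as the minimum fraction of sequences that must carry a residue in a column for that column to be kept; the conservation threshold is not modelled. The function name and the toy alignment are hypothetical, and the sketch is a simplified illustration rather than a reproduction of trimAl itself.

```python
def trim_gappy_columns(alignment, gap_threshold=0.95, gap_chars="-."):
    """Drop alignment columns in which too few sequences carry a residue.

    alignment     -- list of equal-length aligned sequences (strings)
    gap_threshold -- minimum fraction of non-gap characters required to
                     keep a column (assumed reading of the threshold)
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    n_seqs = len(alignment)
    kept_columns = []
    for col in range(length):
        # Count sequences that have a residue (not a gap) at this column.
        residues = sum(1 for seq in alignment if seq[col] not in gap_chars)
        if residues / n_seqs >= gap_threshold:
            kept_columns.append(col)

    # Rebuild every sequence from the retained columns only.
    return ["".join(seq[col] for col in kept_columns) for seq in alignment]


# Hypothetical four-sequence alignment; columns 2 and 4 each contain one gap.
toy_alignment = ["ACD-F", "ACDEF", "A-DEF", "ACDEF"]
trimmed = trim_gappy_columns(toy_alignment, gap_threshold=0.95)
```

With only four sequences a single gap already drops a column below 0.95, so a threshold this strict removes every column that contains any gap; on the larger alignments used for the trees it tolerates a gap in up to 5 % of the sequences.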
Table 6.1: Sequences Used To Generate Trees

              CAZy Characterized             Fosmid Encoded
CAZy Family   Sequences   Clusters (95%)     Sequences   Clusters (95%)
GH1           277         247                28          18
GH3           286         256                129         74
GH5           536         451                50          29
GH8           73          50                 7           2
GH9           167         143                6           4

6.3.9 Fosmid-Encoded Activity Characterization

Colonies that were selected for characterization were inoculated into a 96 deep-well plate (Costar 3960) containing 800 µL of LB with chloramphenicol (12.5 µg/mL) and arabinose (100 µg/mL). After 18 hours of growth at 37 °C with shaking, cells were harvested by centrifugation at 3,200 x g for 20 min. The supernatant was decanted, cell pellets were re-suspended in 100 µL of buffer (20 mM NaOAc, 10 mM NaCl, pH 6.0) and OD600 was recorded. The cell suspension was added to 100 µL of 2x lysis buffer (20 mM NaOAc, 10 mM NaCl, 2 % Triton X-100, 0.5 mg/mL lysozyme, cOmplete Protease Inhibitor-EDTA free (Sigma-Aldrich), pH 6.0) and incubated for 2 hours at 20 °C.

General Assay Procedure

To initiate reactions, 20 µL of lysate was added into a 96-well plate containing 100 µL of buffer with substrate (20 mM sodium acetate, 10 mM NaCl, 240 µM substrate, pH 6.0). Reactions were performed using a Beckman Coulter Biomek FX workstation and run in triplicate at 20 °C. Samples (10 µL) were taken after set intervals and quenched by addition to 100 µL of stop buffer (1 M glycine, pH 10.4). The fluorescence of quenched reactions was determined with a Beckman Coulter DTX-880 Multimode Detector (λex = 365 nm, λem = 465 nm). Initial rates, measured below 10 % substrate consumption, were used to quantify enzyme activity.

Substrate Preference

To assess substrate preference, assays were set up according to the general protocol with eight 4-methylumbelliferyl (MU) glycoside substrates (MU cellobioside, MU lactoside, MU β-D-glucopyranoside,
MU β-D-galactopyranoside, MU β-D-xyloside, MU α-L-arabinoside, MU β-D-mannopyranoside and MU β-D-N-acetylglucosaminide).

pH Dependence

pH dependence assays with the optimal substrate, as determined by the substrate specificity assay, were conducted using the general protocol with the buffer replaced by one of a set of eight citrate-phosphate buffers (50 mM sodium phosphate, 25 mM sodium citrate, 10 mM NaCl) at a pH between 4 and 7.7, and repeated in glycylglycine buffers (20 mM) at pH 7-9.8 when necessary. Assays were conducted at 20 °C with sampling after 1, 4 and 16 hours or, when appropriate, they were incubated at 37 °C for 2.5 hours. The optimum pH was recorded as the pH where the maximum velocity was observed.

Thermal Stability

The lysates were aliquoted and pre-incubated for 10 minutes at different temperatures using a gradient incubation protocol on a 96-well MyCycler thermal cycler (Biorad), at 37, 39, 42, 46, 52, 57, 60 and 62 °C, and repeated at 37, 42, 52, 60, 65, 70, 80.4 and 90 °C if found necessary. Assays with the best substrate, as determined by the substrate specificity assay, were then set up according to the general protocol and conducted at 20 °C with sampling after 1, 4 and 16 hours or, when appropriate, they were incubated at 37 °C for 2.5 hours. Data were fit to the van't Hoff equation (6.1) to deduce the denaturation midpoint temperature (Tm).

\ln K_D = \frac{\Delta S^\circ_D}{R} - \frac{\Delta H^\circ_D}{RT} \qquad (6.1)

Where K_D is the equilibrium constant of denaturation, ΔS°_D is the standard-state entropy of denaturation, ΔH°_D is the standard-state enthalpy of denaturation, R is the ideal gas constant and T is absolute temperature. At the midpoint K_D = 1, so ln K_D = 0 and Tm = ΔH°_D/ΔS°_D.

6.4 Chapter 3 Experimental

6.4.1 Sample Collection

Fecal samples were collected by Dr. Kevin Mehr and myself on April 24th, 2012 from two beavers that were being cared for at the Critter Care Wildlife Society located in Langley, British Columbia, Canada.
Animals were fed branches from a variety of woody plant species native to the Pacific Northwest. Due to the difficulty of obtaining fresh fecal matter, as beavers defecate underwater, samples were collected from material that had accumulated at the enclosure water outflow grating, within 12 hours of cleaning. The enclosure was open to the environment and not heated. The temperature fluctuated between 7 and 15 °C in the time before sample collection. As both beavers shared the same enclosure, it was not possible to identify which animal the samples came from. Samples were frozen in a slurry of dry ice and ethanol, transported to the laboratory on dry ice and stored at -80 °C.

Intestinal samples were collected by Dr. Keith Mewis and myself from beavers freshly trapped (< 24 hours) by Allan Starkey in Maple Ridge, British Columbia, Canada between January 18th, 2014 and April 14th, 2014. Six beavers were dissected in Maple Ridge to remove the entire digestive tract (esophagus to rectum), which was transported on ice to UBC. Beaver 1 had suffered trauma that resulted in rupture of the stomach and possible contamination from other gut sites. Digestive tracts were dissected and chyme or feces were collected from five locations (stomach, small intestine, cecum, proximal colon, and rectum). Collected samples were frozen in a slurry of dry ice and ethanol and stored at -80 °C.

6.4.2 DNA Extraction

High molecular weight DNA was extracted as described previously [169]. Four grams of beaver feces or chyme was thawed and extracted in two-gram duplicates. The samples were ground by mortar and pestle under liquid nitrogen, and extracted three times with extraction buffer (100 mM sodium phosphate pH 7.0, 100 mM Tris-HCl, 100 mM EDTA, 0.5 M NaCl, 1 % hexadecyltrimethylammonium bromide, 2 % sodium dodecyl sulfate) at 65 °C with rotation. The resulting supernatant was washed with chloroform-isoamyl alcohol. Finally, DNA was purified by isopropanol precipitation
and quantified using the PicoGreen assay (Invitrogen).

6.4.3 PCR Amplification of Ribosomal SSU Gene Sequences

Following DNA isolation, the V6-V8 region of the small subunit ribosomal RNA (rRNA) gene was PCR amplified with the universal three-domain primers 926F (5′-AAA CTY AAA KGA ATT GAC GG-3′) and 1392R (5′-ACG GGC GGT GTG TRC-3′). Reverse primer sequences were modified to include the 454 adaptor sequence and a 5 base-pair (bp) barcode for multiplexing during sequencing. Reactions were run in duplicate under the following PCR conditions: initial denaturation cycle at 95 °C for 3 minutes; 25 cycles of 95 °C for 30 seconds, primer annealing at 55 °C for 45 seconds, and extension at 72 °C for 90 seconds; and a final extension cycle at 72 °C for 10 minutes. Each 50 µL reaction contained: 1-10 ng template DNA, 0.6 µL Taq polymerase (Bioshop Canada Inc., 5 U/µL), 5 µL 10X reaction buffer, 4 mM MgCl2, 0.4 mM of each dNTP (Invitrogen), 200 nM each of forward and bar-coded reverse primers, and 33.4 µL nuclease-free water (Fisher). Duplicate reactions were pooled and purified using a QIAquick PCR Purification Kit (Qiagen) and quantified using the PicoGreen assay (Invitrogen). Samples were diluted to 10 ng/µL and pooled in equal concentrations.

6.4.4 Sequencing and Assembly

Amplicon pools from both the fecal and intestinal samples were sent to The McGill University and Génome Québec Innovation Centre for 454 pyrosequencing using the Roche 454 GS FLX Titanium (454 Life Sciences, Branford, CT, USA) technology according to the manufacturer's instructions. Metagenomic DNA from the beaver feces was sent to the same facility and was sequenced with the same platform. This resulted in 616,811 reads of average length 761 bp, and total length of 469.2 Mbp. Sequences were trimmed by Dr.
Keith Mewis to a Q30 quality score using prinseq lite+ [265] and assembled using MIRA [54], resulting in 75,523 contigs with an N50 of 1787 bp and 130.5 Mbp of consensus sequence.

Metagenomic DNA from intestinal samples was sequenced at the UBC Pharmaceutical Sciences Sequencing Center on an Illumina MiSeq using paired-end, 300 bp Nextera XT chemistry (Illumina, San Diego, CA). Fifteen samples (five sites from three different beavers) were indexed and pooled for sequencing. DNA sequences from intestinal samples were trimmed by Dr. Keith Mewis to a Q30 quality score using prinseq lite+ [265]. Dr. Keith Mewis and Connor Morgan-Lang attempted to assemble these data with ABySS [275], IDBA-UD [235] and SPAdes [18]; however, all assemblies generated contained less than 5 % of the unassembled data. Paired-end reads were merged by Dr. Keith Mewis using FLASH [189], with parameters specifying a minimum 20 base pair overlap with 95 % similarity, to generate high-quality reads of up to 580 bp. Assembly attempts with these combined reads remained poor, with no samples showing N50 values (a weighted median statistic such that 50 % of the entire assembly is contained in contigs equal to or larger than this value) above 1000 bp.

6.4.5 Analysis of Pyrotag Data

The software package Quantitative Insights Into Microbial Ecology (QIIME) [46] was used to analyze both the fecal and gut pyrotag sequences. As a quality control step, sequences with quality scores less than Q25, those containing ambiguous bases or homopolymer runs, chimeric sequences, and sequences shorter than 200 bp were removed. The remaining high-quality sequences (see Table 6.2 for breakdown by sample) were clustered at the 97 % identity threshold with a maximum e-value cut-off of 1 x 10^-10 using UCLUST, implemented in QIIME software [46].
Singletons were omitted from downstream analyses, leaving a total of 1,044 operational taxonomic units (OTUs) made up of 154,028 pyrotag sequences from both fecal and gut samples. Taxonomic assignment for each OTU cluster was performed using the Basic Local Alignment Search Tool (BLAST) [7] and the SILVA database version 111 [248] with a confidence level of 0.8 and a maximum e-value cut-off of 1 x 10^-3. OTU abundance was normalized to the total number of reads recovered, and expressed as a normalized percentage for analysis.

6.4.6 Analysis of Metagenomic Sequences

Genes from the beaver fecal metagenome were predicted from both assembled and unassembled data using Prodigal [134] within the MetaPathways software package. The assembled metagenome yielded 151,180 open reading frames (ORFs) larger than 60 amino acids. Using the LAST algorithm [150] within MetaPathways [155], these ORFs were compared to the KEGG [146], COG [291], RefSeq [246], and MetaCyc [49] databases. The unassembled intestinal metagenomes were queried in an identical fashion, revealing a total of 4,910,871 predicted ORFs larger than 60 amino acids (Table 6.3).

6.4.7 Fosmid Library Creation

For large insert library construction, DNA was further purified by cesium chloride gradient ultracentrifugation [325]. The large insert libraries were constructed as described previously [292] using the CopyControl Fosmid Library Production Kit with pCC1FOS Vector Kit (Epicentre). A library of 12 x 384-well plates of clones (4,608 individual clones) was generated for the fecal library. Additionally, DNA from the intestinal tract of beaver 2 was used to make libraries derived from the cecum (17 x 384-well plates, 6,528 clones), the proximal colon (39 x 384-well plates, 14,976 clones) and the rectum (58 x 384-well plates, 22,272 clones).
I also attempted to make libraries from the stomach and small intestine DNA from beaver 2; however, this DNA was highly fragmented and I was unsuccessful.

Clones were picked with an automated colony-picking robot (Qpix2, Molecular Devices) and inoculated into plates containing 100 µL of LB chloramphenicol (12.5 µg/mL) and 10 % glycerol. These plates were incubated overnight at 37 °C then stored at -80 °C.

6.4.8 Functional Screening

Screening was performed generally according to procedures by Mewis et al. [201] with modifications. Screening was carried out in phosphate buffer (final concentration 25 mM sodium phosphate, pH 6.0), with 100 µM each of three fluorogenic substrates (6-chloro-4-methylumbelliferyl cellobioside, 6-chloro-4-methylumbelliferyl xylobioside, and 6-chloro-4-methylumbelliferyl β-D-xylopyranoside) pooled to screen for multiple activities simultaneously. Screening was performed at a temperature of 37 °C, which is the body temperature of Castor canadensis [83]. Wells with fluorescence above a specific threshold (z-score > 3 for the fecal library, and robust z-score > 40 for the gut libraries) were selected for validation, and these clones were re-screened in triplicate. Fosmid-containing clones chosen for sequencing (validated with a z-score > 3 for each substrate) were rearrayed using an automated colony-picking robot (Qpix2, Molecular Devices) into a 96-well plate (Costar 3370) containing 200 µL of LB chloramphenicol (12.5 µg/mL) and 10 % glycerol. This master plate was incubated overnight at 37 °C and then stored at -80 °C.

Screening was performed similarly for the beaver intestinal fosmids, except for the use of 6-chloro-4-methylumbelliferyl β-D-mannoside in place of 6-chloro-4-methylumbelliferyl β-D-xylopyranoside.

6.4.9 Fosmid Preparation and Sequencing

The 96-well master plate was used to inoculate a 96 deep-well plate (Costar) containing 1.65 mL LB with chloramphenicol (12.5 µg/mL) and arabinose (100 µg/mL).
This deep-well plate was incubated with shaking (37 °C, 320 rpm) for 20 hours, after which the plate was centrifuged at 1500 × g for 10 minutes and the supernatant was decanted. Fosmids were purified from the pelleted cells using a Montage Plasmid MiniprepHTS 96 Kit (Millipore), treated with PlasmidSafe ATP-dependent DNase (Epicentre) and quantified using the PicoGreen assay (Invitrogen).

Purified DNA was prepared for sequencing on the Illumina MiSeq platform using the Nextera XT library preparation kit and the 96-sample Nextera V1 index kit. Bead-based normalization was used before pooling samples, and samples were sequenced using paired-end 150 bp reads (2 x 150 bp mode). FastQ sequences were obtained from the sequencer and quality was assessed using FastQC. Raw sequences were trimmed to Q30 quality, and residual contaminating E. coli genomic DNA was removed by alignment to the E. coli K12 reference genome using the bwa aligner [174]. Trimmed reads were assembled at a range of k-mer values (64 to 160) using ABySS [275], and the k-mer value that produced the fewest contigs of appropriate size (25-40 kb) was selected. The presence of pCC1 vector sequence at the ends of a contig signalled the proper contig to select. Wells that did not produce contigs with pCC1 vector present were end-sequenced and compared to all contigs produced from that well to identify the correct sequence.

6.4.10 Fosmid Annotation

Open reading frames (ORFs) were predicted using Prodigal [134] implemented in the MetaPathways pipeline [155]. The assembled metagenome yielded 151,180 ORFs >180 nucleotides in length, which were annotated using LAST [150] implemented in the MetaPathways pipeline based on queries of the CAZy [181] (retrieved 2014-09-04), COG [291] (retrieved 2016-10-20), KEGG [146] (retrieved 2011-06-18) and refseq-nr [246] (retrieved 2014-01-18) databases.

6.4.11 Fosmid Encoded Enzyme Specificities

Fosmid hits from the fecal library were further characterized once chosen for sequencing.
The frozen master plate was used to inoculate a deep-well plate (Costar) containing 0.8 mL of LB containing chloramphenicol (12.5 µg/mL) and arabinose (100 µg/mL). This expression plate was incubated at 37 °C for 18 hours with shaking at 225 rpm. Cells were harvested by centrifugation at 3220 x g for 20 min. After the supernatant was decanted, cell pellets were re-suspended in 200 µL of buffer (50 mM sodium phosphate, 10 mM NaCl, pH 6.0) and OD600 was recorded. This cell suspension was added to the same volume of lysis buffer (50 mM sodium phosphate, 10 mM NaCl, 2 % Triton, 0.5 mg/mL lysozyme, cOmplete Protease Inhibitor-EDTA free (Roche), pH 6.0) and incubated for 1 h at 20 °C.

Activity assays were performed in 96-well plates (Costar) which contained 40 mM sodium phosphate, 200 µM substrate and 20 µL of cell lysate. Substrates assayed for activity were: 6-chloro-4-methylumbelliferyl cellobioside, 6-chloro-4-methylumbelliferyl xylobioside, 6-chloro-4-methylumbelliferyl β-D-xylopyranoside, methylumbelliferyl cellobioside, methylumbelliferyl β-D-glucopyranoside, methylumbelliferyl xylobioside, methylumbelliferyl β-D-xylopyranoside, methylumbelliferyl lactopyranoside, methylumbelliferyl β-D-galactopyranoside, methylumbelliferyl β-D-mannopyranoside, methylumbelliferyl α-L-arabinofuranoside and methylumbelliferyl N-acetyl-β-D-glucosaminide. Reactions were set up on a Beckman Coulter Biomek FX workstation and run in triplicate at 20 °C. Samples (10 µL) were taken after 1, 2, 4 and 6 hours, quenched with stop buffer (1 M glycine, pH 10.4) and analyzed by fluorescence spectroscopy on a Beckman Coulter DTX-880 Multimode Detector (λex = 365 nm, bandwidth 25 nm; λem = 465 nm, bandwidth 35 nm).

6.4.12 Sub-Cloning of Genes

To further investigate the GH43 genes belonging to uncharacterized subfamilies present on the beaver fecal fosmids, I sub-cloned and expressed the protein products.
One GH43 gene from each of the uncharacterized subfamilies 2, 7 and 28 was chosen for cloning, expression and characterization. The three genes (12 H03-12, 12 H03-13, 12 J03-18) were inserted into a pET28 vector by use of the Polymerase Incomplete Primer Extension method [154]. Purified fosmids were used as a template for insert amplifications, while purified pET28 was used as the vector template. Each PCR reaction contained 10 µL of Phusion reaction buffer, 1.5 µL of dNTPs (10 mM), 1 µL forward primer (10 µM), 1 µL reverse primer (10 µM), 2 µL of template DNA (5 ng/µL), 0.5 µL Phusion polymerase and 34 µL of water. The insert PCR was performed with the following parameters: initial denaturation at 95 °C for 2 minutes followed by 25 cycles of denaturation at 95 °C (30 s), annealing between 57 °C and 70 °C (30 s) and extension at 72 °C (1 min). Vector PCR was performed as above, except the annealing temperature was 55 °C and the extension time was 3.5 minutes. The primers used are detailed in Table 6.4. PCR products were mixed and transformed into DH5α cells; plasmids were sequence verified, then transformed into BL21(DE3) cells for expression.

6.4.13 Mutagenesis

The variant enzymes H03-13 E507A and H03-13 E209A were produced by means of modified QuikChange mutagenesis [180]. PCR was first performed for 12 cycles with either the sense or the anti-sense primer; these two reactions were subsequently pooled and an additional 16 cycles of PCR were performed. Each PCR reaction contained 10 µL of Phusion GC reaction buffer, 2.5 µL of dNTPs (10 mM), 2.5 µL sense or anti-sense primer (10 µM), 10 µL of template DNA (5 ng/µL), 1 µL Phusion polymerase and 24 µL of water. The PCR cycling parameters were: initial denaturation at 98 °C for 30 s followed by cycles of denaturation at 98 °C (10 s), annealing between 60 °C and 70 °C (30 s) and extension at 72 °C (3 min and 15 seconds). The primers used are detailed in Table 6.5.
PCR reactions were digested with the endonuclease DpnI (ThermoFisher) for 1 hour at 37 °C. This digestion reaction was subsequently cleaned up with a GeneJet PCR purification kit (ThermoFisher) and DNA was eluted into water. The cleaned-up DNA (10 µL) was then used to transform DH5α cells; plasmids were sequence verified, then transformed into BL21(DE3) cells for expression.

6.4.14 Protein Expression and Purification

Proteins were purified with use of polyhistidine tags and Ni-NTA resin columns. Cultures of 50 mL LBE-5052 [285], containing 50 µg/L of kanamycin, were inoculated with the expression host and cells were grown for 18 hours at 37 °C (12 H03-12 and 12 J03-18) or 30 °C (12 H03-13, H03-13 E507A and H03-13 E209A) with shaking. Cultures were centrifuged (3,200 x g, 4 °C, 20 min), the supernatant was removed and cell pellets were stored at -80 °C until purification. To purify proteins, 2.5 mL of lysis mix (1 x BugBuster [Novagen], 20 mM HEPES, 300 mM NaCl, 20 mM imidazole, pH 7.0) was used to resuspend thawed cell pellets. This suspension was incubated at 20 °C for 20 minutes, after which the lysate was clarified by centrifugation (3220 x g, 4 °C, 20 min) and loaded onto columns containing 1 mL of HisPur resin (ThermoScientific). Columns were washed with 20 mL of Buffer A (20 mM HEPES, 300 mM NaCl, 20 mM imidazole, pH 7.0) and protein was eluted with 4 mL of Buffer B (20 mM HEPES, 300 mM NaCl, 500 mM imidazole, pH 7.0). Proteins were buffer-exchanged into storage buffer (20 mM HEPES, 300 mM NaCl, pH 7.0) with Amicon 30 kDa filter columns and stored at 4 °C.
Protein concentrations were determined based on absorbance at 280 nm.

6.4.15 Protein Characterization

Each purified enzyme was tested for activity on the following model substrates: p-nitrophenyl β-D-xylopyranoside, p-nitrophenyl α-L-arabinofuranoside, 4-methylumbelliferyl β-D-xylopyranoside, 6-chloro-4-methylumbelliferyl β-D-xylopyranoside and 4-methylumbelliferyl α-L-arabinofuranoside. Purified enzyme was added (final concentration of 200 nM) to a solution of 100 µM substrate, 50 mM HEPES, 50 mM NaCl, pH 7.0. These assays were incubated at 37 °C for 18 hours, after which absorbance (λ = 400 nm) and fluorescence (λex = 365 nm, λem = 450 nm) were detected with a BioTek Synergy H1 plate reader.

Kinetic parameters were determined using 6-chloro-4-methylumbelliferyl β-D-xylopyranoside. Assays were performed in 96-well plates (Corning 3370) containing the substrate (2.5 µM - 100 µM), buffer (50 mM HEPES, 50 mM NaCl, pH 7.0) and purified enzyme. Reactions were performed at 30 °C and fluorescence (λex = 365 nm and λem = 450 nm, gain = 65) was monitored using a Synergy H1 plate reader (BioTek). The quantity of fluorophore generated was determined by means of a calibration curve of 6-chlorocoumarin within an identical buffer system. All reactions were performed in triplicate. Rate measurements were used to calculate kinetic parameters with the GraFit 7.0 software.

Enzyme activity was also determined on the oligosaccharides 3²-α-L-arabinofuranosyl-xylobiose (A3X), 2³-α-L-arabinofuranosyl-xylotriose (A2XX) and a mixture of both 2³-α-L-arabinofuranosyl-xylotetraose and 3³-α-L-arabinofuranosyl-xylotetraose (XA3XX/XA2XX). Purified enzymes (final concentration of 0.5 µM per enzyme) were added to a solution of 4 mM substrate in HEPES buffer (50 mM HEPES, 50 mM NaCl, pH 7.0). Assays were incubated at 25 °C for 18 hours, then subsequently boiled for 10 min to inactivate the enzymes.
Products were analyzed by high-performance anion-exchange chromatography with pulsed amperometric detection (HPAEC-PAD). This system was equipped with a CarboPac PA-200 analytical anion-exchange column (Dionex). The elution conditions were: 0-4 min, 20 mM NaOH; 4-13 min, 20 mM NaOH with a 0-84 mM sodium acetate gradient; 13-14 min, 20 mM NaOH with an 84-120 mM sodium acetate gradient; 14-16 min, 20 mM NaOH and 120 mM sodium acetate. The standards used to identify the chromatographic peaks were arabinose, xylose, xylobiose, and xylotetraose.

Table 6.2: Beaver Pyrotag Counts.

                           Number of Sequences
Beaver  Site               High-quality   Singletons Removed   OTUs*
0       Feces              12,250         11,575               355
1       Stomach            3,874          3,659                431
        Small Intestine    2,167          2,115                299
        Cecum              2,784          2,593                370
        Proximal Colon     5,584          5,222                478
        Rectum             15,708         15,015               588
2       Stomach            4,904          3,594                288
        Small Intestine    5,332          5,178                357
        Cecum              251            241                  105
        Proximal Colon     17,078         16,367               512
        Rectum             3,889          3,690                344
3       Stomach            3,595          3,580                53
        Small Intestine    7,899          7,718                272
        Cecum              1,352          1,264                178
        Proximal Colon     3,180          3,163                165
        Rectum             4,081          4,033                271
4       Stomach            3,574          3,564                36
        Small Intestine    4,351          4,339                27
        Cecum              4,879          4,818                281
        Proximal Colon     4,408          4,300                280
        Rectum             4,215          4,130                294
5       Stomach            5,425          4,482                120
        Small Intestine    2,626          2,620                21
        Cecum              5,272          5,050                392
        Proximal Colon     4,558          4,290                372
        Rectum             5,532          5,412                373
6       Stomach            5,055          3,927                104
        Small Intestine    8,087          8,066                31
        Cecum              2,077          2,028                276
        Proximal Colon     4,341          4,156                370
        Rectum             3,966          3,839                319
Total                      162,294        154,028              1,044

*singletons and mitochondria/chloroplasts removed from OTUs
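
The normalization of OTU abundance described in section 6.4.5 can be illustrated with a minimal sketch. It assumes that "the total number of reads recovered" refers to the per-sample read total after singleton removal; the function name, the OTU labels and the counts are hypothetical and are not taken from Table 6.2.

```python
def normalize_otu_abundance(otu_counts):
    """Express per-OTU read counts as a percentage of the sample total.

    otu_counts -- dict mapping an OTU identifier to its read count in
                  one sample (singletons assumed to be removed already)
    """
    total_reads = sum(otu_counts.values())
    if total_reads == 0:
        raise ValueError("sample contains no reads")
    # Percentage of the sample's reads assigned to each OTU.
    return {otu: 100.0 * count / total_reads
            for otu, count in otu_counts.items()}


# Hypothetical counts for a single sample.
example_sample = {"OTU_1": 50, "OTU_2": 30, "OTU_3": 20}
example_percent = normalize_otu_abundance(example_sample)
```

Expressing abundance as a percentage makes samples with very different sequencing depth, such as the beaver 2 cecum and proximal colon samples in Table 6.2, directly comparable.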
Table 6.3: Beaver Intestinal Metagenomes

Beaver  Site               File Size (Mbp)   Predicted ORFs
1       Stomach            101.8             1,203,77
        Small Intestine    69.5              47,018
        Cecum              385.4             689,640
        Proximal Colon     295.0             545,684
        Rectum             360.4             597,864
2       Stomach            47.1              23,566
        Small Intestine    282.2             178,664
        Cecum              390.4             592,572
        Proximal Colon     240.8             377,942
        Rectum             334.2             338,381
3       Stomach            393.1             179,079
        Small Intestine    333.0             170,004
        Cecum              243.8             325,306
        Proximal Colon     278.4             335,192
        Rectum             374.8             509,959
Total                      4,129.9           4,910,871

Table 6.4: Sub-Cloning Primers

Primer               Sequence
H03 12 ∆1-23 IPF     CTTTAAGAAGGAGATATACCATGCAGGTGGGGCAACCCTGGAT
H03 12 ∆1-23 IPR     GATCTCAATGGTGATGGTGATGGTGAGGTTCCCTCCTCATCCTCC
H03 12 ∆1-23 VPF     AGGATGAGGAGGGAACCTCACCATCACCATCACCAT
H03 12 ∆1-23 VPR     AATCCAGGGTTGCCCCACCTGCATGGTATATCTCCTTCTTAAAG
H03 13 ∆1-21 IPF     CTTTAAGAAGGAGATATACCATGCAAAACCCGCTCATCCACTC
H03 13 ∆1-21 IPR     GATCTCAATGGTGATGGTGATGGTGTTTTACATCCACAGTGATATTCC
H03 13 ∆1-21 VPF     ATCACTGTGGATGTAAAACACCATCACCATCACCAT
H03 13 ∆1-21 VPR     CGAGTGGATGAGCGGGTTTTGCATGGTATATCTCCTTCTTAAAG
12 J03 IPF           CTTTAAGAAGGAGATATACCATGAAAACCTACTGCAACCCG
12 J03 IPR           GATCTCAATGGTGATGGTGATGGTGGCCCTCCATCTTTACAATTTC
12 J03 VPF           ATTGTAAAGATGGAGGGCCACCATCACCATCACCAT
12 J03 VPR           GCGGGTTGCAGTAGGTTTTCATGGTATATCTCCTTCTTAAAG

Table 6.5: Mutagenesis Primers

Primer               Sequence
H03 13 E507A F       GATGTGCGCACCGCCGGAATGTCATAC
H03 13 E507A R       GTATGACATTCCGGCGGTGCGCACATC
H03 13 E209A F       CGAAGGCTTCAAGGCAGGGCCCTTCGCCTTC
H03 13 E209A R       GAAGGCGAAGGGCCCTGCCTTGAAGCCTTCG

6.5 Chapter 4 Experimental

6.5.1 Screening: Metagenomic Hit Library

Screening was performed using master plates generated from the screening of numerous libraries (including all of those detailed in Chapter 2 and the fecal library generated in Chapter 3). Screening methods followed the procedures detailed in section 6.3.5, with no modifications except for the substrate used. Instead of the substrates used to originally identify the clones, a panel of azido-, amino- and methoxy-glycosides was used.
The fluorogenic substrates used were the amino-glycosides: 4-methylumbelliferyl 3-amino-3-deoxy-β-D-glucopyranoside (MU-3-NH2-Glc), 4-methylumbelliferyl 4-amino-4-deoxy-β-D-glucopyranoside (MU-4-NH2-Glc), and 4-methylumbelliferyl 6-amino-6-deoxy-β-D-glucopyranoside (MU-6-NH2-Glc); the azido-glycosides: 4-methylumbelliferyl 3-azido-3-deoxy-β-D-glucopyranoside (MU-3-N3-Glc), 4-methylumbelliferyl 4-azido-4-deoxy-β-D-glucopyranoside (MU-4-N3-Glc), 4-methylumbelliferyl 6-azido-6-deoxy-β-D-glucopyranoside (MU-6-N3-Glc), and 4-methylumbelliferyl 6-azido-6-deoxy-β-D-galactopyranoside (MU-6-N3-Gal); and the methoxy-glycosides: 4-methylumbelliferyl 3-methoxy-β-D-galactopyranoside (MU-3-O-Me-Gal) and 4-methylumbelliferyl 3-methoxy-β-D-glucopyranoside (MU-3-O-Me-Glc).

6.5.2 Sub-Cloning of Genes

The genes selected for further investigation were inserted into a pET28 vector with a C-terminal His-tag by use of the Polymerase Incomplete Primer Extension method [154]. Signal sequences were predicted using SignalP [218] and primers were designed to exclude these amino acids. Purified fosmids were used as a template for insert amplifications, while purified pET28 was used as the vector template. Each PCR reaction contained 10 µL of Phusion reaction buffer, 1.5 µL of dNTPs (10 mM), 1 µL forward primer (10 µM), 1 µL reverse primer (10 µM), 2 µL of template DNA (5 ng/µL), 0.5 µL Phusion polymerase and 34 µL of water. The insert PCR was performed with the following parameters: initial denaturation at 95 °C for 2 minutes followed by 25 cycles of denaturation at 95 °C (30 s), annealing between 57 °C and 70 °C (30 s) and extension at 72 °C (1 min). Vector PCR was performed as above, except the annealing temperature was 55 °C and the extension time was 3.5 minutes. The primers used are detailed in Table 6.6. PCR products
Chapter 4 Experimentalwere mixed and transformed into DH5α cells, plasmids were sequence verified, then transformedinto BL21(DE3) cells for expression.Table 6.6: Primers Used for Sub-Cloning Fosmid Derived GenesPrimer SequenceC11 GH1 pET28 fwd GTACCATATGGTGGCTTTTTCGGATAAATTTTTGTGC11 GH1 pET28 rev GTACCTCGAGTTACAGATTTTTTCCGTTCCTGCTGBeaver 09 O03 GH42 IPF CTTTAAGAAGGAGATATACCATGTACGAAAAAGTATGGAAACAGGBeaver 09 O03 GH42 IPR GATCTCAATGGTGATGGTGATGGTGTATCGCCGTCTTCACGATCGBeaver 09 O03 GH42 VPF ATCGTGAAGACGGCGATACACCATCACCATCACCATBeaver 09 O03 GH42 VPR GTTTCCATACTTTTTCGTACATGGTATATCTCCTTCTTAAAGFOS62 40 O22-25 IPF CTTTAAGAAGGAGATATACCATGAAACACAACATTGAAGAAATCFOS62 40 O22-25 IPR GATCTCAATGGTGATGGTGATGGTGATTACTGAGTCCCAAAGAFOS62 40 O22-25 VPF TCTTTGGGACTCAGTAATCACCATCACCATCACCATFOS62 40 O22-25 VPR TTCTTCAATGTTGTGTTTCATGGTATATCTCCTTCTTAAAGNapDC 14 D08-33 IPF CTTTAAGAAGGAGATATACCATGTCCGATTCTGTGCTATCCANapDC 14 D08-33 IPR GATCTCAATGGTGATGGTGATGGTGAGCCTGGCTGTGCACCTGNapDC 14 D08-33 VPF CAGGTGCACAGCCAGGCTCACCATCACCATCACCATNapDC 14 D08-33 VPR GGATAGCACAGAATCGGACATGGTATATCTCCTTCTTAAAGTolDC 15 C08-23 IPF CTTTAAGAAGGAGATATACCATGCTTCATTACCTTTCCCGCTolDC 15 C08-23 IPR GATCTCAATGGTGATGGTGATGGTGCATCTCCAAGCGCAGGCTTolDC 15 C08-23 VPF AGCCTGCGCTTGGAGATGCACCATCACCATCACCATTolDC 15 C08-23 VPR GCGGGAAAGGTAATGAAGCATGGTATATCTCCTTCTTAAAGC24 GH3-1 ∆40 IPF CTTTAAGAAGGAGATATACCATGAAGTTTGCACATGATTTTCC24 GH3-1 ∆40 IPR GATCTCAATGGTGATGGTGATGGTGGAGATCTTCCCCACGATTC24 GH3-1 ∆40 VPF AATCGTGGGGAAGATCTCCACCATCACCATCACCATC24 GH3-1 ∆40 VPR AAAATCATGTGCAAACTTCATGGTATATCTCCTTCTTAAAGC24 GH3-2 ∆34 IPF CTTTAAGAAGGAGATATACCATGGAACACGATGAAAAGCC24 GH3-2 ∆34 IPR GATCTCAATGGTGATGGTGATGGTGTTTCCCGTTGATTAGAATC24 GH3-2 ∆34 VPF ATTCTAATCAACGGGAAACACCATCACCATCACCATC24 GH3-2 ∆34 VPR CTGCTTTTCATCGTGTTCCATGGTATATCTCCTTCTTAAAGC24 GH3-3 ∆31 IPF CTTTAAGAAGGAGATATACCATGAGCGCGGCTTCTTTTGC24 GH3-3 ∆31 IPR GATCTCAATGGTGATGGTGATGGTGGGTGAATTCCAGGTAATCGAGC24 GH3-3 ∆31 VPF GATTACCTGGAATTCACCCACCATCACCATCACCATC24 GH3-3 ∆31 VPR 
TGCAAAAGAAGCCGCGCTCATGGTATATCTCCTTCTTAAAGC24 GH3-4 ∆29 IPF CTTTAAGAAGGAGATATACCATGAAACACAACATTGAAGAAATCC24 GH3-4 ∆29 IPR GATCTCAATGGTGATGGTGATGGTGATTACTGAGTCCCAAAGAC24 GH3-4 ∆29 VPF TCTTTGGGACTCAGTAATCACCATCACCATCACCATC24 GH3-4 ∆29 VPR TTCTTCAATGTTGTGTTTCATGGTATATCTCCTTCTTAAAG6.5.3 Protein Expression and Purification: Metagenome Hit LibraryProteins were purified with use of polyhistidine tags and Ni-NTA resin columns. Cultures of 50 mLLBE-5052 [285], containing 50 µg/L of kanamycin were inoculated with the expression host andcells were grown for either 18 hours at 37 ◦C (O03 GH42 His6, C11 GH1 His6, D08 GH3 His6) orat 30 ◦C for 6 hours followed by 48 hours at 18 ◦C (O22 GH3 His6, C08 GH3 His6, C24 GH3-1 ∆1-40 His6, C24 GH3-3 ∆1-31 His6, C24 GH3-4 ∆1-29 His6) with shaking. Cultures were centrifuged(3,200 x g, 4 ◦C, 20 min), the supernatant was removed and cell pellets were stored at -80 ◦C until1606.5. Chapter 4 Experimentalpurification. To purify proteins 2.5 mL of lysis mix (1 x BugBuster [Novagen], 20 mM HEPES, 300mM NaCl, 50 mM Imidazole, pH 7.0) was used to resuspend thawed cell pellets. This suspensionwas incubated at 20 ◦C for 20 minutes, after which the lysate was clarified by centrifugation (3220x g, 4 ◦C, 20 min) and loaded onto columns containing 1 mL of HisPur resin (ThermoScientific).Columns were washed with 20 mL of Buffer A (20 mM HEPES, 300 mM NaCl, 50 mM Imidazole,pH 7.0) and protein was eluted with 4 mL of Buffer B (20 mM HEPES, 300 mM NaCl, 500 mMImidazole, pH 7.0). Proteins were buffer-exchanged into storage buffer (20 mM HEPES, 300 mMNaCl, pH 7.0) with Amicon 30 kDa filter columns and stored at 4 ◦C. Protein concentrations weredetermined based on absorbance at 280 nm. 
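The A280-based concentration determination follows the Beer-Lambert law, c = A / (ε·l). The sketch below is illustrative only: the absorbance reading, the 1 cm path length and the 10-fold dilution are assumed values rather than data from this work, and the function name is hypothetical. The extinction coefficient is the value reported for C11 GH1 in this section.

```python
def protein_concentration_uM(a280, epsilon_M_cm, path_cm=1.0, dilution=1.0):
    """Protein concentration in µM from A280 via Beer-Lambert: c = A / (epsilon * l)."""
    if epsilon_M_cm <= 0 or path_cm <= 0:
        raise ValueError("extinction coefficient and path length must be positive")
    molar = a280 / (epsilon_M_cm * path_cm)  # mol/L in the measured (diluted) sample
    return molar * dilution * 1e6            # correct for dilution, convert M to µM


# Assumed example: A280 = 0.642 measured on a 10-fold diluted sample,
# using the C11 GH1 extinction coefficient (128,480 M^-1 cm^-1).
conc = protein_concentration_uM(0.642, 128_480, path_cm=1.0, dilution=10)
print(f"{conc:.1f} µM")
```

Readings well above an absorbance of 1 are usually re-measured after dilution, which is why the dilution factor is carried as an explicit argument.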
The extinction coefficients used were: C11 GH1 ε = 128,480 M−1 cm−1, C08 GH3 ε = 134,105 M−1 cm−1, C24 GH3-1 ∆1-40 ε = 113,680 M−1 cm−1, C24 GH3-2 ∆1-34 ε = 84,925 M−1 cm−1, C24 GH3-4 ∆1-29 ε = 89,395 M−1 cm−1, D08 GH3 ε = 82,655 M−1 cm−1, O03 GH42 ε = 170,225 M−1 cm−1, O22 GH3 ε = 79,355 M−1 cm−1. All variant enzymes were expressed and purified as for the wild-type enzymes.

Both E. coli ATP-dependent glucokinase (EcGlk) [203] and Klebsiella pneumoniae β-glucoside kinase (BglK) [296] were expressed on a 50 mL scale and purified using the same His-tag/Ni-NTA procedure as above.

6.5.4 Wild-Type Enzyme Kinetics: Metagenomic Hit Library

Kinetic parameters for wild-type enzymes were determined using the fluorogenic screening substrates. Assays were performed in 96-well plates (Corning 3370) containing the fluorogenic glycoside (0.5 µM - 1 mM), buffer (20 mM HEPES, 300 mM NaCl, pH 7.0) and purified enzyme. Assays for I01-GH1 were performed both with and without EcGlk [203], ATP (10 mM) and MgSO4 (10 mM). Reactions were performed at 37 °C and fluorescence (λex = 365 nm and λem = 450 nm) was monitored using a Synergy H1 plate reader (BioTek). The quantity of fluorophore generated was determined by means of a calibration curve of 4-methylumbelliferyl alcohol within an identical buffer system. All reactions were performed in triplicate. Rate measurements were used to calculate kinetic parameters with GraFit 7.0 software.

6.5.5 Production of Mutants: Metagenomic Hits

All mutants were generated using a modified QuikChange mutagenesis protocol [180].
For each variant generated, PCR was first performed for 12 cycles in two separate reactions, one containing the sense primer and the other the anti-sense primer. These two reactions were subsequently pooled and an additional 16 cycles of PCR were performed. Each PCR reaction contained 10 µL of Phusion GC reaction buffer, 2.5 µL of dNTPs (10 mM), 2.5 µL of sense or anti-sense primer (10 µM), 10 µL of template DNA (5 ng/µL), 1 µL of Phusion polymerase and 24 µL of water. The PCR cycling parameters were: initial denaturation at 98 °C for 30 s followed by cycles of denaturation at 98 °C (10 s), annealing between 60 °C and 70 °C (30 s) and extension at 72 °C (3 min 15 s). Wild-type plasmids were used as the template to generate all nucleophile variants. The double mutants (O22 GH3 D231S W232F, D08 GH3 D229S W230F, C24 GH3-1 D271S W272F and C08 GH3 D235S W236F) were generated using the serine nucleophile mutant as the PCR template. The primers used for PCR are detailed in Table 6.7. After PCR, reactions were digested with the endonuclease DpnI (ThermoFisher) for 1 hour at 37 °C. This digestion reaction was subsequently purified with a GeneJet PCR purification kit (ThermoFisher) and DNA was eluted into water. The purified DNA (10 µL) was then used to transform DH5α cells, plasmids were sequence verified, then transformed into BL21(DE3) cells for expression.

6.5.6 Acceptor Specificity: Metagenomic Hits

Acceptor specificity screening generally followed the procedures detailed by Blanchard and Withers [29]. First, between 0.5 and 3 nanomoles of purified wild-type enzyme were incubated with 1 mM of 2,4-dinitrophenyl 2-deoxy-2-fluoro-β-D-glucopyranoside (DNP 2F-Glc). In the case of O03-GH42, the inactivator 2,4-dinitrophenyl 2-deoxy-2-fluoro-β-D-galactopyranoside was used, as this enzyme is not inactivated by the glucoside. These reactions were incubated at 20 °C until the wild-type enzyme displayed greater than 95 % inhibition (typically 30 min).
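Progress of the inactivation can be judged by comparing the rate of the treated enzyme with that of an untreated control. The helper below is a minimal sketch of that comparison; the function names, the example rates and the default 95 % threshold argument are illustrative assumptions, not part of the published procedure.

```python
def percent_inhibition(rate_treated, rate_control):
    """Percent loss of activity relative to an untreated control."""
    if rate_control <= 0:
        raise ValueError("control rate must be positive")
    return 100.0 * (1.0 - rate_treated / rate_control)


def is_inactivated(rate_treated, rate_control, threshold=95.0):
    """True once inhibition meets or exceeds the chosen threshold (percent)."""
    return percent_inhibition(rate_treated, rate_control) >= threshold


# Hypothetical rates in arbitrary units per minute.
print(percent_inhibition(0.4, 10.0))
print(is_inactivated(0.4, 10.0))
```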
The inactivated enzyme was then washed three times with storage buffer to remove excess inactivator using a Vivaspin 500 (10k) centrifugal filter unit (Vivaproducts). An aliquot of the inactivated enzyme was then transferred to a 96-well plate containing an array of potential reactivators at concentrations of 20 mM or 40 % of a saturated solution. This reactivation plate was then incubated for 1 hour at 25 °C. Reactivation rates were then assessed by adding para-nitrophenyl glucoside (pNP-Glc) to a final concentration of 1 mM to each reaction and monitoring absorbance at 400 nm using a Synergy H1 plate reader (BioTek). For the O03-GH42 enzyme, MU-3-O-Me-Gal was used instead of pNP-Glc to determine reactivation rates and the resulting fluorescence (λex = 365 nm, λem = 450 nm) was detected using a Synergy H1 plate reader (BioTek). The acceptor specificity assay was not performed for I01-GH1, as this enzyme was not inhibited by DNP 2F-Glc.

The acceptors used were: 1-adamantanemethanol, α,α-D-trehalose, α-lactose, α-L-rhamnose, β-gentiobiose, 1,3-propanediol, 1,5-anhydro-D-glucitol, 1-butanol, 1-ethynylcyclohexanol, 1-hexanol, 1-naphthol, 1-octanol, 1-pentanol, 1-propanol, 1-pyrenemethanol, 2-mercaptoethanol, 2-methoxyethanol, 2-naphthol, 2-propanol, 3-mercapto-1-propanol, 4-(hexyloxy)phenol, 4-methylumbelliferyl cellobioside, 4-methylumbelliferyl β-D-galactopyranoside, 4-methylumbelliferyl β-D-glucopyranoside, 4-methylumbelliferyl β-D-xylopyranoside, 4-vinylphenol, 5-hexyn-1-ol, asparagine, caffeic acid, cyclohexanol, D/L-threitol, D-allose, D-arabitol, D-cellobiose, D-fructose, D-galactose, D-lyxose, D-mannitol, D-mannose, D-ribose, D-tagatose, D-xylose, ethanediol, ethanol, galactal, galactitol, gallic acid, glucal, glucose, inositol, L-arabinose, L-arabitol, L-arginine, L-cysteine, L-erythritol, L-fucose, L-serine, L-sorbose, L-threonine, L-tyrosine, maltose, maltotriose, methanol, o-phenylphenol, p-nitrophenyl α-D-galactopyranoside, p-nitrophenyl α-D-mannopyranoside, p-nitrophenyl α-D-xylopyranoside, p-nitrophenyl α-L-arabinopyranoside, p-nitrophenyl β-D-cellobioside, p-nitrophenyl β-D-fucopyranoside, p-nitrophenyl β-D-galactopyranoside, p-nitrophenyl β-D-glucopyranoside, p-nitrophenyl β-D-glucuronide, p-nitrophenyl β-D-lactopyranoside, p-nitrophenyl β-D-mannopyranoside, p-nitrophenyl β-D-xylopyranoside, phenethyl alcohol, phenol, phenyl β-D-galactopyranoside, phenyl β-D-glucopyranoside, phloroglucinol, p-methoxyphenol, p-phenylphenol, raffinose, resorcinol, sorbitol, and sucrose.

6.5.7 Glycosynthase Reactions: Metagenomic Hits

Glycosynthase activity for each of the generated nucleophile variants was first assessed with the unmodified α-glycosyl fluoride as a donor. For the nucleophile variants derived from C08-GH3, C24-GH3-1, D08-GH3 and O22-GH3 the donor used was α-D-glucopyranosyl fluoride (αF-Glc), while the donor used for C11-GH1 and O03-GH42 was α-D-galactopyranosyl fluoride (αF-Gal). The donor sugar used for I01-GH1, 6-phospho-α-D-glucopyranosyl fluoride (6-PO4-αF-Glc), was generated in situ by Escherichia coli ATP-dependent glucokinase (EcGlk) [203] in the presence of ATP (10 mM) and MgSO4 (10 mM). The acceptors used in the assay were the top three hits from the acceptor specificity assay. Reactions were performed on a 100 µL scale with 10 mM donor sugar and 10 mM acceptor in reaction buffer (100 mM HEPES, 100 mM NaCl, pH 7.0). Reactions were incubated at 37 °C for 18 hours, after which they were analysed by thin-layer chromatography (TLC). TLC was performed on aluminum-backed sheets of Silica Gel 60F254 (E. Merck) of thickness 0.2 mm. The plates were visualised using UV light (254 nm) and/or by exposure to 10% ammonium molybdate (2 M in H2SO4) followed by charring. Reactions displaying product spots were sent for mass spectrometry analysis and selected for multi-milligram scale reactions.
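Setting up a reaction such as the 100 µL glycosynthase assay above amounts to a C1V1 = C2V2 calculation for each component. The sketch below is illustrative: the function name and all stock concentrations (100 mM donor, 50 mM acceptor, 10× buffer) are assumptions that do not come from the text, and the enzyme volume is left out for brevity.

```python
def component_volumes(total_uL, finals, stocks):
    """Volume (µL) of each stock needed for the requested final concentrations.

    `finals` and `stocks` map component name -> concentration; each component
    must use the same unit in both dictionaries. The remainder is water.
    """
    volumes = {}
    for name, final in finals.items():
        stock = stocks[name]
        if final > stock:
            raise ValueError(f"{name}: final concentration exceeds stock")
        volumes[name] = total_uL * final / stock  # C1V1 = C2V2
    used = sum(volumes.values())
    if used > total_uL:
        raise ValueError("stocks are too dilute for this reaction volume")
    volumes["water"] = total_uL - used
    return volumes


# Assumed stocks: donor 100 mM, acceptor 50 mM, reaction buffer as a 10x stock.
finals = {"donor": 10, "acceptor": 10, "buffer": 1}
stocks = {"donor": 100, "acceptor": 50, "buffer": 10}
mix = component_volumes(100, finals, stocks)
print(mix)
```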
Enzymes that displayed activity were then assayed with the appropriate amino-, azido- or methoxy-α-fluorosugar as a donor.

6.5.8 Multi-milligram Scale Reactions: Metagenomic Hits

Large-scale reactions contained 1.5 µM of the applicable glycosynthase, 2 mM donor sugar (α-D-glucopyranosyl fluoride (αF-Glc), 6-azido-6-deoxy-α-D-galactopyranosyl fluoride (αF-6-N3-Gal) or 6-azido-6-deoxy-α-D-glucopyranosyl fluoride (αF-6-N3-Glc)), 10 mM acceptor molecule (pNP β-D-glucopyranoside (pNP-Glc) or pNP α-D-xylopyranoside (pNP-α-xyl)) and buffer (100 mM HEPES, 100 mM NaCl, pH 7.0). These reactions were set up on a 25 mL scale and incubated at 25 °C for 18 hours with gentle agitation. Reactions were terminated by boiling for 10 minutes, centrifuged to remove precipitated protein and stored at -80 °C. Reactions were then lyophilized. Solid products were suspended in 500 µL of 5 % acetonitrile in water, passed through a Millipore Ultrafree MC centrifugal column (PVDF, 0.22 µm), then loaded on a C-18 column (ZORBAX Eclipse XDB-C18, 9.4 mm x 250 mm, Agilent). A gradient of 5-10% acetonitrile in water was used to elute the product. Absorbance at 300 nm was monitored and fractions corresponding to major products were pooled. Products were lyophilized, then prepared appropriately for mass spectrometry and NMR.

6.5.9 Screening: GH1 Library

Two 96 deep-well plates containing 800 µL of LBE-5052 [285] (50 µg/L of carbenicillin) were inoculated (5 µL per well) with the library of GH1 enzymes described in Heins et al. [117]. Plates were incubated at 37 °C for 18 hours with shaking at 225 rpm. Cells were harvested by centrifugation at 3,220 × g for 30 minutes. The supernatant was then removed and 300 µL of lysis buffer (0.3 mg/mL lysozyme, 1 % Triton X-100, cOmplete protease inhibitor [1 tablet/50 mL] and benzonase) was added to each well. The plates were incubated with shaking for 2 hours at 25 °C.
Lysate from each well (20 µL) was added to 96-well plates containing 280 µL of reaction buffer (20 mM sodium acetate, pH 6.0, and 107 µM of a fluorogenic substrate). The fluorogenic substrates used were: MU-3-NH2-Glc, MU-4-NH2-Glc, MU-6-NH2-Glc, MU-3-N3-Glc, MU-4-N3-Glc, MU-6-N3-Glc, and MU-Glc. After 18 hours, reactions were terminated by diluting 20 µL with 100 µL of stop buffer (1 M glycine, pH 10.0). Fluorescence (λex = 360 nm and λem = 465 nm) was then measured with a Synergy H1 plate reader (BioTek).

6.5.10 Protein Purification: GH1 Library

Both variant and wild-type enzymes were purified using polyhistidine tags and Ni-NTA resin columns. Cultures of 50 mL LBE-5052 [285] containing 50 µg/L of carbenicillin were inoculated with 5 µL of the expression host. Cultures were incubated with shaking (250 rpm) at 37 °C for 18 hours. Cultures were centrifuged (3,220 × g, 4 °C, 20 min), the supernatant was removed and cell pellets were stored at -80 °C until purification. To purify proteins, thawed cell pellets were resuspended in 2.5 mL of lysis mix (1 × BugBuster [Novagen], 20 mM HEPES, 300 mM NaCl, 20 mM imidazole, pH 7.0). This suspension was incubated at 20 °C for 20 minutes, after which the lysate was clarified by centrifugation (3,220 × g, 4 °C, 20 min) and loaded onto columns containing 1 mL of HisPur resin (ThermoScientific). Columns were washed with 20 mL of Buffer A (20 mM HEPES, 300 mM NaCl, 20 mM imidazole, pH 7.0) and protein was eluted with 4 mL of Buffer B (20 mM HEPES, 300 mM NaCl, 500 mM imidazole, pH 7.0). Proteins were buffer-exchanged into storage buffer (20 mM HEPES, 300 mM NaCl, pH 7.0) with Amicon 30 kDa filter columns and stored at 4 °C. Protein concentrations were determined based on absorbance at 280 nm. Extinction coefficients are as follows: Ali GH1: ε = 121,365 M−1 cm−1, Dei GH1: ε = 110,365 M−1 cm−1, Exi GH1: ε = 113,135 M−1 cm−1, Lac GH1: ε = 113,470 M−1 cm−1, Myx GH1: ε = 108,540 M−1 cm−1, Pha GH1: ε = 125,375 M−1 cm−1, Sac GH1: ε = 123,675 M−1 cm−1, The GH1: ε = 108,540 M−1 cm−1.

6.5.11 Acceptor Specificity Screening: GH1 Library

Acceptor specificity screening generally followed the procedures detailed by Blanchard and Withers [29]. First, between 0.5 and 3 nanomoles of purified wild-type enzyme were incubated with 1 mM of 2,4-dinitrophenyl 2-deoxy-2-fluoro-β-D-glucopyranoside (DNP 2F-Glc). This reaction was incubated at 20 °C until the wild-type enzyme displayed greater than 95 % inhibition. The inactivated enzyme was then washed with storage buffer to remove excess inactivator using a Vivaspin 500 (10k) centrifugal filter unit (Vivaproducts). An aliquot of the inactivated enzyme was then transferred to a 96-well plate containing an array of potential reactivators at concentrations of 20 mM or 40 % of a saturated solution. This reactivation plate was then incubated for 1 hour at 25 °C.
Reactivation rates were then assessed by adding para-nitrophenyl glucoside (pNP-Glc) to a final concentration of 1 mM to each reaction and monitoring absorbance at 400 nm using a Synergy H1 plate reader (BioTek).

The acceptors used were: 1-propanol, 2-naphthol, 2-propanol, 4-hydroxycoumarin, 4-methylumbelliferyl β-D-xyloside, 4-methylumbelliferyl α-D-glucopyranoside, 4-methylumbelliferyl α-L-arabinopyranoside, 4-methylumbelliferyl β-D-cellobiopyranoside, 4-methylumbelliferyl β-D-galactopyranoside, 4-methylumbelliferyl β-D-glucopyranoside, 4-methylumbelliferyl β-D-glucuronide dihydrate, 4-methylumbelliferyl lactoside, 4-methylumbelliferyl N-acetyl-glucosaminide, 8-hydroxy-quinoline, α-L-fucose, α-L-rhamnose, β-gentiobiose, caffeic acid, citric acid, cyclohexanol, D-araboascorbic acid, D-galactose, D-glucosamine, D-maltose, D-trehalose, D-xylose, D-arabitol, cellobiose, D-galactosamine, D-fructose, D-fructose 1,6-diphosphate, D-galactal, D-galacturonic acid, D-glucoheptose, D-gluconic acid, D-gluconic acid lactone, D-glucose, D-glucose 6-phosphate, D-glucuronic acid, D-gulonic-γ-lactone, D-lyxose, D-mannitol, D-mannose, D-mannose 6-phosphate, D-tagatose, dithiothreitol, gallic acid, geraniol, glycine, inositol, L-fucose, L-arabinose, L-ascorbic acid, lactose, L-cysteine, levulinic acid, L-xylose, maltotriose, methyl α-D-mannopyranoside, methyl α-L-rhamnoside, methyl β-D-galactopyranoside, methyl β-D-xylopyranoside, N-acetyl-D-glucosamine, N-acetyl-mannosamine, N-acetylneuraminic acid, octyl-β-D-glucopyranoside, palatinose, phenethyl alcohol, phenyl β-D-galactoside, phenyl β-D-glucopyranoside, p-nitrophenyl β-D-glucuronide, p-nitrophenyl α-D-mannopyranoside, p-nitrophenyl β-D-fucopyranoside, p-nitrophenyl N-acetyl-β-D-glucosaminide, p-nitrophenyl α-L-arabinopyranoside, p-nitrophenyl β-D-glucopyranoside, p-nitrophenyl β-D-xylopyranoside, p-nitrophenyl β-cellobioside, p-nitrophenyl β-D-galactopyranoside, p-nitrophenyl α-D-galactopyranoside, p-nitrophenyl α-D-glucopyranoside, quercetin, raffinose hydrate, sialic acid, and sodium azide.

6.5.12 Wild-Type Enzyme Kinetics: GH1 Library

Kinetic parameters for wild-type enzymes were determined using fluorogenic or chromogenic substrates. Assays with fluorogenic substrates were performed in 96-well plates (Corning 3370) containing the 4-methylumbelliferyl glycoside (0.5 µM - 1 mM), buffer (20 mM HEPES, 300 mM NaCl, pH 7.0) and purified enzyme. Reactions were performed at 30 °C and fluorescence (λex = 360 nm and λem = 465 nm, gain = 75) was monitored using a Synergy H1 plate reader (BioTek). The quantity of fluorophore generated was determined by means of a calibration curve of 4-methylumbelliferyl alcohol within an identical buffer system. Rate measurements for chromogenic reagents (para-nitrophenyl 6-deoxy-6-phospho-β-D-glucopyranoside, para-nitrophenyl β-D-glucopyranoside) were performed using a Cary3000 spectrophotometer (Agilent). The reaction buffer was the same as for the fluorogenic substrates and the temperature was also maintained at 30 °C. Reactions were monitored at 400 nm (ε = 9.42 mM−1 cm−1). All reactions were performed in triplicate.
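As a rough illustration of how initial rates of this kind reduce to kinetic parameters, the sketch below fits the Michaelis-Menten equation, v = Vmax[S] / (Km + [S]). The thesis used GraFit 7.0 for this step; the code swaps in a simpler Hanes-Woolf linearisation ([S]/v against [S]) in plain Python. The function name is hypothetical and the rates are synthetic, noise-free values generated from assumed parameters, not measurements from this work.

```python
def fit_michaelis_menten(substrate, rates):
    """Estimate (Vmax, Km) by least squares on the Hanes-Woolf form:
    [S]/v = [S]/Vmax + Km/Vmax."""
    if len(substrate) != len(rates) or len(substrate) < 2:
        raise ValueError("need at least two matched ([S], v) points")
    x = [float(s) for s in substrate]
    y = [s / v for s, v in zip(x, rates)]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx                     # equals 1 / Vmax
    intercept = mean_y - slope * mean_x   # equals Km / Vmax
    return 1.0 / slope, intercept / slope


# Synthetic rates from assumed Vmax = 2.0 (arbitrary units) and Km = 40 µM,
# sampled across the 0.5 µM - 1 mM substrate range used in the assays.
substrate_uM = [5, 10, 25, 50, 100, 250, 500, 1000]
rates = [2.0 * s / (40.0 + s) for s in substrate_uM]
vmax, km = fit_michaelis_menten(substrate_uM, rates)
print(f"Vmax = {vmax:.3f}, Km = {km:.1f} µM")
```

Linearised fits weight experimental error unevenly, so direct non-linear regression on the untransformed rates is generally preferred for real, noisy data.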
Rate measurements for both chromogenic and fluorogenic substrates were used to calculate kinetic parameters with GraFit 7.0 software.

6.5.13 Production of Mutants: GH1 Library

Nucleophile mutants were generated using the same protocol used for the generation of mutants from metagenome-sourced hydrolases. Table 6.8 details the primers used for QuikChange mutagenesis.

6.5.14 Glycosynthase Reactions: GH1 Library

The inhibitor 2,4-dinitrophenyl 2-deoxy-2-fluoro-glucopyranoside was synthesized as previously described [175, 320]. All modified and unmodified α-glycosyl fluorides were synthesized by Dr. Hong-Ming Chen.

To determine the best nucleophile variant for each enzyme, glycosynthase reactions were performed in triplicate and analysed by HPLC. Glycosynthase reactions (50 µL scale) contained 20 µM enzyme, 5 mM α-glucosyl fluoride, and 5 mM pNP-glucopyranoside or 5 mM pNP cellobioside in reaction buffer (100 mM HEPES, 100 mM NaCl, pH 7.0). Reactions were incubated at 25 °C for 18 hours, after which they were diluted with 500 µL of ethanol and centrifuged to remove precipitated protein. An aliquot (1 µL) of the reaction was loaded on a C-18 column (Poroshell 120 EC-C18, 4.6 mm x 50 mm, Agilent). A gradient of 0-10% acetonitrile in water was used to elute the product; the absorbance at 300 nm was monitored and peak area was quantified. Activity with amino and azido donor sugars was assessed on a 25 µL scale with 20 µM enzyme, 50 mM donor sugar, and 10 mM pNP-glucopyranoside or 10 mM pNP cellobioside in reaction buffer (100 mM HEPES, 100 mM NaCl, pH 7.0). Reactions were incubated at 25 °C for 18 hours, after which they were analysed by thin-layer chromatography (TLC). TLC was performed on aluminium-backed sheets of Silica Gel 60F254 (E. Merck) of thickness 0.2 mm. The plates were visualised using UV light (254 nm) and/or by exposure to 10% ammonium molybdate (2 M in H2SO4) followed by charring.
Reactions displaying product spots were sent for mass spectrometry analysis and selected for multi-milligram scale reactions. Reactions with alternate acceptors were performed as above.

6.5.15 Multi-milligram Scale Reactions: GH1 Library

Large-scale reactions contained 20 µM of the applicable glycosynthase, 10 mM donor sugar (α-D-glucopyranosyl fluoride (αF-Glc), 3-amino-3-deoxy-α-D-glucopyranosyl fluoride (αF-3-NH2-Glc), 4-amino-4-deoxy-α-D-glucopyranosyl fluoride (αF-4-NH2-Glc), 6-amino-6-deoxy-α-D-glucopyranosyl fluoride (αF-6-NH2-Glc), 3-azido-3-deoxy-α-D-glucopyranosyl fluoride (αF-3-N3-Glc), 4-azido-4-deoxy-α-D-glucopyranosyl fluoride (αF-4-N3-Glc) or 6-azido-6-deoxy-α-D-glucopyranosyl fluoride (αF-6-N3-Glc)), 10 mM acceptor molecule (pNP glucopyranoside (pNP-Glc), pNP cellobioside (pNP-C), pNP xylopyranoside (pNP-xyl), n-octyl glucoside or DNP 2-deoxy-2-fluoro-glucopyranoside (DNP 2F-Glc)) and buffer (100 mM HEPES, 100 mM NaCl, pH 7.0). These reactions were set up on a 3.2 mL scale and incubated at 25 °C for 18 hours with gentle agitation. Reactions were terminated by boiling for 5 minutes, centrifuged to remove precipitated protein and stored at -80 °C. Reactions were then lyophilized. Solid products were suspended in 500 µL of 5 % acetonitrile in water, passed through a Millipore Ultrafree MC centrifugal column (PVDF, 0.22 µm), then loaded on a C-18 column (ZORBAX Eclipse XDB-C18, 9.4 mm x 250 mm, Agilent). A gradient of 5-10% acetonitrile in water was used to elute the product. Absorbance at 300 nm was monitored and fractions corresponding to major products were pooled. Products were lyophilized, then prepared appropriately for mass spectrometry and NMR.

6.5.16 Mass Spectrometry and NMR Spectroscopy of Products

Proton and carbon NMR spectra were recorded on Bruker Avance 400inv, 400dir and 300 Fourier Transform spectrometers fitted with a 5 mm BBI-Z probe.
All spectra were recorded using an internal deuterium lock and are referenced internally using the residual solvent peak. Carbon and proton chemical shifts are quoted in parts per million (ppm) downfield of tetramethylsilane. Coupling constants (J) are given in Hertz (Hz). Carbon NMR spectra were acquired with broadband proton decoupling and were recorded with DEPT. Mass spectra were measured on a Waters/Micromass LCT using electrospray ionisation (ESI) with methanol as solvent.

Table 6.7: Primers Used for Mutagenesis of Metagenome Sourced Hydrolases

Primer                    Sequence
B09O03 GH42 E309A F       GTTTCTGCTCATGGCGCAGACGCCGAGC
B09O03 GH42 E309A R       GCTCGGCGTCTGCGCCATGAGCAGAAAC
B09O03 GH42 E309S F       CGTTTCTGCTCATGAGCCAGACGCCGAGCGTG
B09O03 GH42 E309S R       CACGCTCGGCGTCTGGCTCATGAGCAGAAACG
B09O03 GH42 E309G F       GTTTCTGCTCATGGGCCAGACGCCGAGCG
B09O03 GH42 E309G R       CGCTCGGCGTCTGGCCCATGAGCAGAAAC
FOS6240O22 GH3 D231A F    GTTACGTGATGACGGCGTGGGGCGCAATGAAC
FOS6240O22 GH3 D231A R    GTTCATTGCGCCCCACGCCGTCATCACGTAAC
FOS6240O22 GH3 D231S F    GTTACGTGATGACGAGCTGGGGCGCAATG
FOS6240O22 GH3 D231S R    CATTGCGCCCCAGCTCGTCATCACGTAAC
FOS6240O22 GH3 D231G F    GTTACGTGATGACGGGCTGGGGCGCAATGAAC
FOS6240O22 GH3 D231G R    GTTCATTGCGCCCCAGCCCGTCATCACGTAAC
FOS6241C11 GH1 E354A F    CCTGCCGCTTATTATTACCGCAAACGGGATGGCGGACAACGAC
FOS6241C11 GH1 E354A R    GTCGTTGTCCGCCATCCCGTTTGCGGTAATAATAAGCGGCAGG
FOS6241C11 GH1 E354S F    CCTGCCGCTTATTATTACCTCAAACGGGATGGCGGACAACGAC
FOS6241C11 GH1 E354S R    GTCGTTGTCCGCCATCCCGTTTGAGGTAATAATAAGCGGCAGG
FOS6241C11 GH1 E354G F    CCTGCCGCTTATTATTACCGGAAACGGGATGGCGGACAACGAC
FOS6241C11 GH1 E354G R    GTCGTTGTCCGCCATCCCGTTTCCGGTAATAATAAGCGGCAGG
NapDC14D08 GH3 D229A F    GTTTTGTGGTTTCTGCGTGGGGAGCTGTGCATG
NapDC14D08 GH3 D229A R    CATGCACAGCTCCCCACGCAGAAACCACAAAAC
NapDC14D08 GH3 D229S F    GAAGGTTTTGTGGTTTCTAGCTGGGGAGCTGTGCATGAC
NapDC14D08 GH3 D229S R    GTCATGCACAGCTCCCCAGCTAGAAACCACAAAACCTTC
NapDC14D08 GH3 D229G F    GTTTTGTGGTTTCTGGCTGGGGAGCTGTGCATG
NapDC14D08 GH3 D229G R    CATGCACAGCTCCCCAGCCAGAAACCACAAAAC
TolDC15C08 GH3 D235A F    CGGTCGTCTCCGCGTGGTTTGCGAC
TolDC15C08 GH3 D235A R    GTCGCAAACCACGCGGAGACGACCG
TolDC15C08 GH3 D235S F    CGCGGTCGTCTCCAGCTGGTTTGCGAC
TolDC15C08 GH3 D235S R    GTCGCAAACCAGCTGGAGACGACCGCG
TolDC15C08 GH3 D235G F    CGGTCGTCTCCGGCTGGTTTGCGAC
TolDC15C08 GH3 D235G R    GTCGCAAACCAGCCGGAGACGACCG
CA23302C24 GH3 1 D271A F  CGTCATGATGTCCGCGTGGTTTGCGACTTAC
CA23302C24 GH3 1 D271S F  GGCGTCATGATGTCCAGCTGGTTTGCGACTTAC
CA23302C24 GH3 1 D271G F  GTCATGATGTCCGGCTGGTTTGCGAC
CA23302C24 GH3 1 D271A R  GTAAGTCGCAAACCACGCGGACATCATGACG
CA23302C24 GH3 1 D271S R  GTAAGTCGCAAACCAGCTGGACATCATGACGCC
CA23302C24 GH3 1 D271G R  GTCGCAAACCAGCCGGACATCATGAC
CG23A23I01 GH1 E374A F    CAAGATTTATATTACCGAGCGTGGTCTTGGTGATGAAGATC
CG23A23I01 GH1 E374A R    GATCTTCATCACCAAGACCACGCTCGGTAATATAAATCTTG
CG23A23I01 GH1 E374S F    GTCAAGATTTATATTACCGAAGCTGGTCTTGGTGATGAAGATCC
CG23A23I01 GH1 E374S R    GGATCTTCATCACCAAGACCAGCTTCGGTAATATAAATCTTGAC
CG23A23I01 GH1 E374G F    CAAGATTTATATTACCGAGGCTGGTCTTGGTGATGAAGATC
CG23A23I01 GH1 E374G R    GATCTTCATCACCAAGACCAGCCTCGGTAATATAAATCTTG
O22 GH3 D231S W232F F     CGTGATGACGAGCTTTGGCGCAATGAACAAC
O22 GH3 D231S W232F R     GTTGTTCATTGCGCCAAAGCTCGTCATCACG
D08 GH3 D229S W230F F     GTTTTGTGGTTTCTAGCTTTGGAGCTGTGCATGACAG
D08 GH3 D229S W230F R     CTGTCATGCACAGCTCCAAAGCTAGAAACCACAAAAC
C24 GH3 1 D271S W272F F   GCGTCATGATGTCCAGCTTCTTTGCGACTTACGACGGTG
C24 GH3 1 D271S W272F R   CACCGTCGTAAGTCGCAAAGAAGCTGGACATCATGACGC
C08 GH3 D235S W236F F     GCGGTCGTCTCCAGCTTCTTTGCGACCCATTCCAC
C08 GH3 D235S W236F R     GTGGAATGGGTCGCAAAGAAGCTGGAGACGACCGC
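The forward and reverse mutagenic primers in Table 6.7 are expected to be exact reverse complements of one another, which gives a quick way to check a transcribed pair. The sketch below is an illustrative helper (the function names are not from the text) applied to the B09O03 GH42 E309A pair listed above.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    if set(seq) - set("ACGT"):
        raise ValueError("sequence contains non-ACGT characters")
    return seq.translate(COMPLEMENT)[::-1]


def is_complementary_pair(forward, reverse):
    """True when the reverse primer is the exact reverse complement of the forward primer."""
    return reverse_complement(forward) == reverse


fwd = "GTTTCTGCTCATGGCGCAGACGCCGAGC"  # B09O03 GH42 E309A F
rev = "GCTCGGCGTCTGCGCCATGAGCAGAAAC"  # B09O03 GH42 E309A R
print(is_complementary_pair(fwd, rev))  # prints True
```

The same check does not apply to the sub-cloning primers in Table 6.6, where insert and vector primers overlap only partially.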
Table 6.8: Primers Used in QuikChange Mutagenesis of Selected GH1 Enzymes

Primer Name     Sequence
AliGH1 E354A F  CATTCCGATCTACATTACTGCAAATGGCGCAGCCTTTGATG
AliGH1 E354A R  CATCAAAGGCTGCGCCATTTGCAGTAATGTAGATCGGAATG
AliGH1 E354G F  CATTCCGATCTACATTACTGGAAATGGCGCAGCCTTTGATG
AliGH1 E354G R  CATCAAAGGCTGCGCCATTTCCAGTAATGTAGATCGGAATG
AliGH1 E354S F  CATTCCGATCTACATTACTTCAAATGGCGCAGCCTTTGATG
AliGH1 E354S R  CATCAAAGGCTGCGCCATTTGAAGTAATGTAGATCGGAATG
DeiGH1 E346A F  CACCGATGTACATTACCGCAAATGGTGCAGCCTATC
DeiGH1 E346A R  GATAGGCTGCACCATTTGCGGTAATGTACATCGGTG
DeiGH1 E346G F  CACCGATGTACATTACCGGAAATGGTGCAGCCTATC
DeiGH1 E346G R  GATAGGCTGCACCATTTCCGGTAATGTACATCGGTG
DeiGH1 E346S F  CACCGATGTACATTACCTCAAATGGTGCAGCCTATC
DeiGH1 E346S R  GATAGGCTGCACCATTTGAGGTAATGTACATCGGTG
MyxGH1 E357A F  CCCTTTGTACATTACAGCAAATGGTTGCGCCTATG
MyxGH1 E357A R  CATAGGCGCAACCATTTGCTGTAATGTACAAAGGG
MyxGH1 E357G F  CCCTTTGTACATTACAGGAAATGGTTGCGCCTATG
MyxGH1 E357G R  CATAGGCGCAACCATTTCCTGTAATGTACAAAGGG
MyxGH1 E357S F  GCCCTTTGTACATTACATCAAATGGTTGCGCCTATGC
MyxGH1 E357S R  GCATAGGCGCAACCATTTGATGTAATGTACAAAGGGC
PhaGH1 E365A F  CAGTTTATGTGACAGCAAATGGCTTCCCTG
PhaGH1 E365A R  CAGGGAAGCCATTTGCTGTCACATAAACTG
PhaGH1 E365G F  CAGTTTATGTGACAGGAAATGGCTTCCCTG
PhaGH1 E365G R  CAGGGAAGCCATTTCCTGTCACATAAACTG
PhaGH1 E365S F  GATAAGCCAGTTTATGTGACATCAAATGGCTTCCCTGTTAAAGG
PhaGH1 E365S R  CCTTTAACAGGGAAGCCATTTGATGTCACATAAACTGGCTTATC
SacGH1 E368A F  CTGATATCTATATCACTGCAAACGGTTGCGCCCTGC
SacGH1 E368A R  GCAGGGCGCAACCGTTTGCAGTGATATAGATATCAG
SacGH1 E368G F  CTGATATCTATATCACTGGAAACGGTTGCGCCCTGC
SacGH1 E368G R  GCAGGGCGCAACCGTTTCCAGTGATATAGATATCAG
SacGH1 E368S F  CTGATATCTATATCACTTCAAACGGTTGCGCCCTG
SacGH1 E368S R  CAGGGCGCAACCGTTTGAAGTGATATAGATATCAG
TheGH1 E388A F  GTTGTACATCACCGCAAACGGTGCAGCCTTCGAAG
TheGH1 E388A R  CTTCGAAGGCTGCACCGTTTGCGGTGATGTACAAC
TheGH1 E388G F  CCGTTGTACATCACCGGAAACGGTGCAGCCTTCGAAG
TheGH1 E388G R  CTTCGAAGGCTGCACCGTTTCCGGTGATGTACAACGG
TheGH1 E388S F  CTTACCGTTGTACATCACCTCAAACGGTGCAGCCTTCGAAG
TheGH1 E388S R  CTTCGAAGGCTGCACCGTTTGAGGTGATGTACAACGGTAAG
ExiGH1 E350A F  CTATCTATATCACTGCAAACGGTGCCGCGTTC
ExiGH1 E350A R  GAACGCGGCACCGTTTGCAGTGATATAGATAG
ExiGH1 E350G F  CTATCTATATCACTGGAAACGGTGCCGCGTTC
ExiGH1 E350G R  GAACGCGGCACCGTTTCCAGTGATATAGATAG
ExiGH1 E350S F  GCCTATCTATATCACTAGCAACGGTGCCGCGTTCG
ExiGH1 E350S R  CGAACGCGGCACCGTTGCTAGTGATATAGATAGGC
LacGH1 E366A F  CATGGTTTGTTGCCGCAAATGGTATTGGCG
LacGH1 E366A R  CGCCAATACCATTTGCGGCAACAAACCATG
LacGH1 E366G F  CATGGTTTGTTGCCGGAAATGGTATTGGCG
LacGH1 E366G R  CGCCAATACCATTTCCGGCAACAAACCATG
LacGH1 E366S F  GCCATGGTTTGTTGCCAGCAATGGTATTGGCGTGG
LacGH1 E366S R  CCACGCCAATACCATTGCTGGCAACAAACCATGGC

Bibliography

[1] T. Aakvik, K. F. Degnes, R. Dahlsrud, F. Schmidt, R. Dam, L. Yu, U. Völker, T. E. Ellingsen, and S. Valla. A plasmid RK2-based broad-host-range cloning vector useful for transfer of metagenomic libraries to a variety of bacterial species. FEMS Microbiology Letters, 296(2):149–158, 2009.
[2] K. Abe, M. Nakajima, T. Yamashita, H. Matsunaga, S. Kamisuki, T. Nihira, Y. Takahashi, N. Sugimoto, A. Miyanaga, H. Nakai, T. Arakawa, S. Fushinobu, and H. Taguchi. Biochemical and structural analyses of a bacterial endo-β-1,2-glucanase reveal a new glycoside hydrolase family. Journal of Biological Chemistry, 292(18):7487–7506, 2017.
[3] K. E. Achyuthan, A. M. Achyuthan, P. D. Adams, S. M. Dirk, J. C. Harper, B. A. Simmons, and A. K. Singh. Supramolecular self-assembled chaos: polyphenolic lignin’s barrier to cost-effective lignocellulosic biofuels. Molecules, 15:8641–8688, 2010.
[4] M. Al, L. J. Evans, W. Douglas Gould, W. F. A. Duncan, and S. Glasauer. The long term operation of a biologically based treatment system that removes As, S and Zn from industrial (smelter operation) landfill seepage. Applied Geochemistry, 26(11):1886–1896, 2011.
[5] M. Aleksiuk. The Seasonal Food Regime of Arctic Beavers. Ecology, 51(2):264–270, 1970.
[6] E. Allerdings, J. Ralph, H. Steinhart, and M. Bunzel. Isolation and structural identification of complex feruloylated heteroxylan side-chains from maize bran.
Phytochemistry, 67(12):1276–1286, 2006.
[7] S. F. Altschul, W. Gish, W. Miller, E. W. Myers, and D. J. Lipman. Basic local alignment search tool. Journal of Molecular Biology, 215(3):403–410, 1990.
[8] D. An, S. M. Caffrey, J. Soh, A. Agrawal, D. Brown, K. Budwill, X. Dong, P. F. Dunfield, J. Foght, L. M. Gieg, S. J. Hallam, N. W. Hanson, Z. He, T. R. Jack, J. Klassen, K. M. Konwar, E. Kuatsjah, C. Li, S. Larter, V. Leopatra, C. L. Nesbo, T. Oldenburg, A. P. Page, E. Ramos-Padron, F. F. Rochman, A. Saidi-Mehrabad, C. W. Sensen, P. Sipahimalani, Y. C. Song, S. Wilson, G. Wolbring, M. L. Wong, and G. Voordouw. Metagenomics of hydrocarbon resource environments indicates aerobic taxa and genes to be unexpectedly common. Environmental Science & Technology, 47(18):10708–17, 2013.
[9] S. Anders and W. Huber. Differential expression analysis for sequence count data. Genome Biology, 11(10):R106, 2010.
[10] A. Angelov, V. T. T. Pham, M. Übelacker, S. Brady, B. Leis, N. Pill, J. Brolle, M. Mechelke, M. Moerch, B. Henrissat, and W. Liebl. A metagenome-derived thermostable β-glucanase with an unusual module architecture which defines the new glycoside hydrolase family GH148. Scientific Reports, 7(1), 2017.
[11] M. M. Appeldoorn, M. A. Kabel, D. Van Eylen, H. Gruppen, and H. A. Schols. Characterization of Oligomeric Xylan Structures from Corn Fiber Resistant to Pretreatment and Simultaneous Saccharification and Fermentation. Journal of Agricultural and Food Chemistry, 58(21):11294–11301, 2010.
[12] Z. Armstrong and S. G. Withers. Synthesis of Glycans and Glycopolymers Through Engineered Enzymes. Biopolymers, 99(10):666–674, 2013.
[13] Z. Armstrong, K. Mewis, C. Strachan, and S. J. Hallam. Biocatalysts for biomass deconstruction from environmental genomics. Current Opinion in Chemical Biology, 29:18–25, 2015.
[14] M. Arumugam, J. Raes, E. Pelletier, D. Le Paslier, T. Yamada, D. R. Mende, G. R. Fernandes, J. Tap, T. Bruls, J.-M. Batto, M. Bertalan, N. Borruel, F. Casellas, L. Fernandez, L.
Gautier, T. Hansen, M. Hattori, T. Hayashi, M. Kleerebezem, K. Kurokawa, M. Leclerc, F. Levenez, C. Manichanh, H. B. Nielsen, T. Nielsen, N. Pons, J. Poulain, J. Qin, T. Sicheritz-Ponten, S. Tims, D. Torrents, E. Ugarte, E. G. Zoetendal, J. Wang, F. Guarner, O. Pedersen, W. M. de Vos, S. Brunak, J. Doré, M. Antolín, F. Artiguenave, H. M. Blottiere, M. Almeida, C. Brechot, C. Cara, C. Chervaux, A. Cultrone, C. Delorme, G. Denariaz, R. Dervyn, K. U. Foerstner, C. Friss, M. van de Guchte, E. Guedon, F. Haimet, W. Huber, J. van Hylckama-Vlieg, A. Jamet, C. Juste, G. Kaci, J. Knol, O. Lakhdari, S. Layec, K. Le Roux, E. Maguin, A. Mérieux, R. Melo Minardi, C. M’Rini, J. Muller, R. Oozeer, J. Parkhill, P. Renault, M. Rescigno, N. Sanchez, S. Sunagawa, A. Torrejon, K. Turner, G. Vandemeulebrouck, E. Varela, Y. Winogradsky, G. Zeller, J. Weissenbach, S. D. Ehrlich, and P. Bork. Enterotypes of the Human Gut Microbiome. Nature, 473(7346):174–180, 2011.
[15] H. Aspeborg, P. M. Coutinho, Y. Wang, H. Brumer, and B. Henrissat. Evolution, substrate specificity and subfamily classification of glycoside hydrolase family 5 (GH5). BMC Evolutionary Biology, 12(1):186, 2012.
[16] V. Bågenholm, S. K. Reddy, H. Bouraoui, J. Morrill, E. Kulcinskaja, C. M. Bahr, O. Aurelius, T. Rogers, Y. Xiao, D. T. Logan, E. C. Martens, N. M. Koropatkin, and H. Stålbrand. Galactomannan Catabolism Conferred by a Polysaccharide Utilization Locus of Bacteroides ovatus. Journal of Biological Chemistry, 292(1):229–243, 2017.
[17] S. G. Ball and M. K. Morell. FROM BACTERIAL GLYCOGEN TO STARCH: Understanding the Biogenesis of the Plant Starch Granule. Annual Review of Plant Biology, 54(1):207–233, 2003.
[18] A. Bankevich, S. Nurk, D. Antipov, A. A. Gurevich, M. Dvorkin, A. S. Kulikov, V. M. Lesin, S. I. Nikolenko, S. Pham, A. D. Prjibelski, et al. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. Journal of Computational Biology, 19(5):455–477, 2012.
[19] G. Bastien, G.
Arnal, S. Bozonnet, S. Laguerre, F. Ferreira, R. Faure, B. Henrissat, F. Lefevre, P. Robe, O. Bouchez, C. Noirot, C. Dumon, and M. O’Donohue. Mining for hemicellulases in the fungus-growing termite Pseudacanthotermes militaris using functional metagenomics. Biotechnology for Biofuels, 6(1):78, 2013.
[20] M. W. Bauer, L. E. Driskill, W. Callen, M. A. Snead, E. J. Mathur, and R. M. Kelly. An endoglucanase, EglA, from the hyperthermophilic archaeon Pyrococcus furiosus hydrolyzes beta-1,4 bonds in mixed-linkage (1→3),(1→4)-beta-D-glucans and cellulose. Journal of Bacteriology, 181(1):284–90, 1999.
[21] N. T. Baxter, J. J. Wan, A. M. Schubert, M. L. Jenior, P. Myers, P. D. Schloss, and K. E. Wommack. Intra- and Interindividual Variations Mask Interspecies Variation in the Microbiota of Sympatric Peromyscus Populations. Applied and Environmental Microbiology, 81(1):396–404, 2015.
[22] E. W. Beals. Bray-Curtis Ordination: An Effective Strategy for Analysis of Multivariate Ecological Data. Advances in Ecological Research, 14:1–55, 1984.
[23] C. Bera, V. Broussolle, E. Forano, and G. Gaudet. Gene sequence analysis and properties of EGC, a family E (9) endoglucanase from Fibrobacter succinogenes BL2. FEMS Microbiology Letters, 136(1):79–84, 1996.
[24] A. Bhat, S. Riyaz-Ul-Hassan, N. Ahmad, N. Srivastava, and S. Johri. Isolation of cold-active, acidic endocellulase from Ladakh soil by functional metagenomics. Extremophiles, 17(2):229–239, 2013.
[25] B. Bissaro, Å. K. Røhr, G. Müller, P. Chylenski, M. Skaugen, Z. Forsberg, S. J. Horn, G. Vaaje-Kolstad, and V. G. H. Eijsink. Oxidative cleavage of polysaccharides by monocopper enzymes depends on H2O2. Nature Chemical Biology, 13(10):1123–1128, 2017.
[26] S. Biver, S. Steels, D. Portetelle, and M. Vandenbol. Bacillus subtilis as a tool for screening soil metagenomic libraries for antimicrobial activities. Journal of Microbiology and Biotechnology, 23(6), 2013.
[27] S. Biver, A. Stroobants, D. Portetelle, and M. Vandenbol.
Two promising alkaline beta-glucosidases isolated by functional metagenomics from agricultural soil, including one showing high tolerance towards harsh detergents, oxidants and glucose. Journal of Industrial Microbiology & Biotechnology, 41(3):479–488, 2014.
[28] M. K. Bjursell, E. C. Martens, and J. I. Gordon. Functional Genomic and Metabolic Studies of the Adaptations of a Prominent Adult Human Gut Symbiont, Bacteroides thetaiotaomicron, to the Suckling Period. Journal of Biological Chemistry, 281(47):36269–36279, 2006.
[29] J. E. Blanchard and S. G. Withers. Rapid screening of the aglycone specificity of glycosidases: Applications to enzymatic synthesis of oligosaccharides. Chemistry & Biology, 8(7):627–633, 2001.
[30] A. Bock, K. Forchhammer, J. Heider, W. Leinfelder, G. Sawers, B. Veprek, and F. Zinoni. Selenocysteine: the 21st amino acid. Molecular Microbiology, 5(3):515–20, 1991.
[31] W. Boerjan, J. Ralph, and M. Baucher. Lignin biosynthesis. Annual Review of Plant Biology, 54:519–46, 2003.
[32] K. S. Boles, K. Kannan, J. Gill, M. Felderman, H. Gouvis, B. Hubby, K. I. Kamrud, J. C. Venter, and D. G. Gibson. Digital-to-biological converter for on-demand production of biologics. Nature Biotechnology, 35(7):672–675, 2017.
[33] U. T. Bornscheuer and R. J. Kazlauskas. Catalytic Promiscuity in Biocatalysis: Using Old Enzymes to Form New Bonds and Follow New Pathways. Angewandte Chemie International Edition, 43(45):6032–6040, 2004.
[34] E. Bouhajja, M. McGuire, M. R. Liles, G. Bataille, S. N. Agathos, and I. F. George. Identification of novel toluene monooxygenase genes in a hydrocarbon-polluted sediment using sequence- and function-based screening of metagenomic libraries. Applied Microbiology and Biotechnology, 101(2):797–808, 2017.
[35] F. J. Brenner. Foods Consumed by Beavers in Crawford County, Pennsylvania. The Journal of Wildlife Management, 26(1):104, 1962.
[36] R. Brunecky, M. Alahuhta, Q. Xu, B. S. Donohoe, M. F. Crowley, I. A. Kataeva, S. J. Yang, M. G.
Resch, M. W. Adams, V. V. Lunin, M. E. Himmel, and Y. J. Bomble. Revealing nature’s cellulase diversity: the digestion mechanism of Caldicellulosiruptor bescii CelA. Science, 342(6165):1513–6, 2013.
[37] M. S. Buckeridge, H. Pessoa dos Santos, and M. A. S. Tiné. Mobilisation of storage cell wall polysaccharides in seeds. Plant Physiology and Biochemistry, 38(1-2):141–156, 2000.
[38] R. R. Buech. Ontogeny and diurnal cycle of fecal reingestion in the North American beaver (Castor canadensis). Journal of Mammalogy, 65(2):347–350, 1984.
[39] L. Burhenne, J. Messmer, T. Aicher, and M.-P. Laborie. The effect of the biomass components lignin, cellulose and hemicellulose on TGA and fixed bed pyrolysis. Journal of Analytical and Applied Pyrolysis, 101:177–184, 2013.
[40] W. F. Burkholder, R. M. Weiner, L. E. Taylor, B. Henrissat, L. Hauser, M. Land, P. M. Coutinho, C. Rancurel, E. H. Saunders, A. G. Longmire, H. Zhang, E. A. Bayer, H. J. Gilbert, F. Larimer, I. B. Zhulin, N. A. Ekborg, R. Lamed, P. M. Richardson, I. Borovok, and S. Hutcheson. Complete Genome Sequence of the Complex Carbohydrate-Degrading Marine Bacterium, Saccharophagus degradans Strain 2-40T. PLOS Genetics, 4(5):e1000087, 2008.
[41] P. E. Busher. Food Caching Behavior of Beavers (Castor canadensis): Selection and Use of Woody Species. American Midland Naturalist, 135(2):343–348, 1996.
[42] M. Busse-Wicher, A. Li, R. L. Silveira, C. S. Pereira, T. Tryfona, T. C. F. Gomes, M. S. Skaf, and P. Dupree. Evolution of xylan substitution patterns in gymnosperms and angiosperms: implications for xylan interaction with cellulose. Plant Physiology, page pp.00539.2016, 2016.
[43] K. H. Caffall and D. Mohnen. The structure, function, and biosynthesis of plant cell wall pectic polysaccharides. Carbohydrate Research, 344(14):1879–1900, 2009.
[44] P. Capek, J. Alföldi, and D. Lišková. An acetylated galactoglucomannan from Picea abies L. Karst. Carbohydrate Research, 337(11):1033–1037, 2002.
[45] S. Capella-Gutierrez, J. M.
Silla-Martinez, and T. Gabaldon. trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics, 25(15):1972–3, 2009.
[46] J. G. Caporaso, J. Kuczynski, J. Stombaugh, K. Bittinger, F. D. Bushman, E. K. Costello, N. Fierer, A. G. Pena, J. K. Goodrich, J. I. Gordon, et al. QIIME allows analysis of high-throughput community sequencing data. Nature Methods, 7(5):335–336, 2010.
[47] A. Carroll and C. Somerville. Cellulosic Biofuels. Annual Review of Plant Biology, 60(1):165–182, 2009.
[48] A. Cartmell, E. C. Lowe, A. Baslé, S. J. Firbank, D. A. Ndeh, H. Murray, N. Terrapon, V. Lombard, B. Henrissat, J. E. Turnbull, M. Czjzek, H. J. Gilbert, and D. N. Bolam. How members of the human gut microbiota overcome the sulfation problem posed by glycosaminoglycans. Proc Natl Acad Sci, 114(27):7037–7042, 2017.
[49] R. Caspi, T. Altman, K. Dreher, C. A. Fulcher, P. Subhraveti, I. M. Keseler, A. Kothari, M. Krummenacker, M. Latendresse, L. A. Mueller, Q. Ong, S. Paley, A. Pujar, A. G. Shearer, M. Travers, D. Weerasinghe, P. Zhang, and P. D. Karp. The MetaCyc database of metabolic pathways and enzymes and the BioCyc collection of pathway/genome databases. Nucleic Acids Research, 40(Database issue):D742–53, 2012.
[50] P. Chang, X. Chen, C. Smyrniotis, A. Xenakis, T. Hu, C. Bertozzi, and P. Wu. Metabolic Labeling of Sialic Acids in Living Animals with Alkynyl Sugars. Angewandte Chemie International Edition, 48(22):4030–4033, 2009.
[51] P. V. Chang, D. H. Dube, E. M. Sletten, and C. R. Bertozzi. A Strategy for the Selective Imaging of Glycans Using Caged Metabolic Precursors. Journal of the American Chemical Society, 132(28):9516–9518, 2010.
[52] H. M. Chen, Z. Armstrong, S. J. Hallam, and S. G. Withers. Synthesis and evaluation of a series of 6-chloro-4-methylumbelliferyl glycosides as fluorogenic reagents for screening metagenomic libraries for glycosidase activity. Carbohydrate Research, 421:33–9, 2016.
[53] J. Cheng and T. C. Charles.
Novel polyhydroxyalkanoate copolymers produced in Pseudomonas putida by metagenomic polyhydroxyalkanoate synthases. Applied Microbiology and Biotechnology, 100(17):7611–7627, 2016.
[54] B. Chevreux, T. Pfisterer, B. Drescher, A. J. Driesel, W. E. Müller, T. Wetter, and S. Suhai. Using the miraEST assembler for reliable and automated mRNA transcript assembly and SNP detection in sequenced ESTs. Genome Research, 14(6):1147–1159, 2004.
[55] S. R. Chhabra and R. M. Kelly. Biochemical characterization of Thermotoga maritima endoglucanase Cel74 with and without a carbohydrate binding module (CBM). FEBS Letters, 531(2):375–80, 2002.
[56] C. Choi. Tierra del Fuego: the beavers must die. Nature, 453(7198):968–968, 2008.
[57] S. P. S. Chundawat, G. T. Beckham, M. E. Himmel, and B. E. Dale. Deconstruction of Lignocellulosic Biomass to Fuels and Chemicals. Annual Review of Chemical and Biomolecular Engineering, 2(1):121–145, 2011.
[58] L. Clarke and J. Carbon. A colony bank containing synthetic Col E1 hybrid plasmids representative of the entire E. coli genome. Cell, 9(1):91–99, 1976.
[59] J. B. Clayton, P. Vangay, H. Huang, T. Ward, B. M. Hillmann, G. A. Al-Ghalith, D. A. Travis, H. T. Long, B. V. Tuan, V. V. Minh, F. Cabana, T. Nadler, B. Toddes, T. Murphy, K. E. Glander, T. J. Johnson, and D. Knights. Captivity humanizes the primate microbiome. Proc Natl Acad Sci, 113(37):10376–10381, 2016.
[60] B. Cobucci-Ponzano and M. Moracci. Glycosynthases as tools for the production of glycan analogs of natural products. Natural Product Reports, 29(6):697–709, 2012.
[61] B. Cobucci-Ponzano, A. Strazzulli, M. Rossi, and M. Moracci. Glycosynthases in Biocatalysis. Advanced Synthesis & Catalysis, 353(13):2284–2300, 2011.
[62] V. Codera, K. J. Edgar, M. Faijes, and A. Planas. Functionalized Celluloses with Regular Substitution Pattern by Glycosynthase-Catalyzed Polymerization. Biomacromolecules, 17(4):1272–1279, 2016.
[63] P.-Y. Colin, B. Kintses, F. Gielen, C. M. Miton, G. Fischer, M. F.
Mohamed, M. Hyvönen, D. P. Morgavi, D. B. Janssen, and F. Hollfelder. Ultrahigh-throughput discovery of promiscuous enzymes by picodroplet functional metagenomics. Nature Communications, 6:10008, 2015.
[64] S. Comtet-Marre, N. Parisot, P. Lepercq, F. Chaucheyras-Durand, P. Mosoni, E. Peyretaillade, A. R. Bayat, K. J. Shingfield, P. Peyret, and E. Forano. Metatranscriptomics Reveals the Active Bacterial and Eukaryotic Fibrolytic Communities in the Rumen of Dairy Cow Fed a Mixed Diet. Frontiers in Microbiology, 8, 2017.
[65] D. J. Cosgrove. Growth of the plant cell wall. Nature Reviews Molecular Cell Biology, 6(11):850–61, 2005.
[66] D. J. Cosgrove and M. C. Jarvis. Comparative structure and biomechanics of plant primary and secondary cell walls. Frontiers in Plant Science, 3, 2012.
[67] E. K. Costello, C. L. Lauber, M. Hamady, N. Fierer, J. I. Gordon, and R. Knight. Bacterial Community Variation in Human Body Habitats Across Space and Time. Science, 326(5960):1694–1697, 2009.
[68] M. Cotta and R. Forster. The Family Lachnospiraceae, Including the Genera Butyrivibrio, Lachnospira and Roseburia. Springer, 2006.
[69] J. W. Craig, F. Y. Chang, J. H. Kim, S. C. Obiajulu, and S. F. Brady. Expanding Small-Molecule Functional Metagenomics through Parallel Screening of Broad-Host-Range Cosmid Environmental DNA Libraries in Diverse Proteobacteria. Applied and Environmental Microbiology, 76(5):1633–1641, 2010.
[70] A. Currier, W. D. Kitts, and C. I. Cellulose Digestion in The Beaver (Castor canadensis). Canadian Journal of Zoology, 38:1109–1116, 1960.
[71] F. Cuskin, E. C. Lowe, M. J. Temple, Y. Zhu, E. A. Cameron, N. A. Pudlo, N. T. Porter, K. Urs, A. J. Thompson, A. Cartmell, A. Rogowski, B. S. Hamilton, R. Chen, T. J. Tolbert, K. Piens, D. Bracke, W. Vervecken, Z. Hakki, G. Speciale, J. L. Muñoz-Muñoz, A. Day, M. J. Peña, R. McLean, M. D. Suits, A. B. Boraston, T. Atherly, C. J. Ziemer, S. J. Williams, G. J. Davies, D. W. Abbott, E. C. Martens, and H. J. Gilbert.
Human gut Bacteroidetes can utilize yeast mannan through a selfish mechanism. Nature, 517(7533):165–169, 2015.
[72] P. M. Danby and S. G. Withers. Advances in Enzymatic Glycoside Synthesis. ACS Chemical Biology, 11(7):1784–1794, 2016.
[73] R. Daniel. The metagenomics of soil. Nature Reviews Microbiology, 3(6):470–478, 2005.
[74] L. A. David, C. F. Maurice, R. N. Carmody, D. B. Gootenberg, J. E. Button, B. E. Wolfe, A. V. Ling, A. S. Devlin, Y. Varma, M. A. Fischbach, S. B. Biddinger, R. J. Dutton, and P. J. Turnbaugh. Diet rapidly and reproducibly alters the human gut microbiome. Nature, 505(7484):559–563, 2013.
[75] M. M. de O. Buanafina. Feruloylation in Grasses: Current and Future Perspectives. Molecular Plant, 2(5):861–872, 2009.
[76] P. M. de Souza and P. de Oliveira Magalhaes. Application of microbial alpha-amylase in industry - A review. Brazilian Journal of Microbiology, 41(4):850–61, 2010.
[77] T. C. Delport, M. L. Power, R. G. Harcourt, K. N. Webster, S. G. Tetu, and H. Goodrich-Blair. Colony Location and Captivity Influence the Gut Microbial Community Composition of the Australian Sea Lion (Neophoca cinerea). Applied and Environmental Microbiology, 82(12):3440–3449, 2016.
[78] A. Demirbaş. Calculation of higher heating values of biomass fuels. Fuel, 76(5):431–434, 1997.
[79] D. S. Domozych, M. Ciancia, J. U. Fangel, M. D. Mikkelsen, P. Ulvskov, and W. G. T. Willats. The Cell Walls of Green Algae: A Journey through Evolution and Diversity. Frontiers in Plant Science, 3, 2012.
[80] J. Drone, H.-y. Feng, C. Tellier, L. Hoffmann, V. Tran, C. Rabiller, and M. Dion. Thermus thermophilus Glycosynthases for the Efficient Synthesis of Galactosyl and Glucosyl β-(1→3)-Glycosides. European Journal of Organic Chemistry, 2005(10):1977–1983, 2005.
[81] V. Ducros, C. Tarling, D. Zechel, A. M. Brzozowski, T. P. Frandsen, I. von Ossowski, M. Schlein, S. G. Withers, and G. J. Davies. Anatomy of Glycosynthesis. Chemistry & Biology, 10(7):619–628, 2003.
[82] T. Duo, E. D.
Goddard-Borger, and S. G. Withers. Fluoro-glycosyl acridinones are ultra-sensitive active site titrating agents for retaining β-glycosidases. Chemical Communications, 50(66):9379–9382, 2014.
[83] A. P. Dyck and R. A. MacArthur. Seasonal patterns of body temperature and activity in free-ranging beaver (Castor canadensis). Canadian Journal of Zoology, 70(9):1668–1672, 1992.
[84] R. C. Edgar. Search and clustering orders of magnitude faster than BLAST. Bioinformatics, 26(19):2460–1, 2010.
[85] L. Eichinger, J. A. Pachebat, G. Glöckner, M. A. Rajandream, R. Sucgang, M. Berriman, J. Song, R. Olsen, K. Szafranski, Q. Xu, B. Tunggal, S. Kummerfeld, M. Madera, B. A. Konfortov, F. Rivero, A. T. Bankier, R. Lehmann, N. Hamlin, R. Davies, P. Gaudet, P. Fey, K. Pilcher, G. Chen, D. Saunders, E. Sodergren, P. Davis, A. Kerhornou, X. Nie, N. Hall, C. Anjard, L. Hemphill, N. Bason, P. Farbrother, B. Desany, E. Just, T. Morio, R. Rost, C. Churcher, J. Cooper, S. Haydock, N. van Driessche, A. Cronin, I. Goodhead, D. Muzny, T. Mourier, A. Pain, M. Lu, D. Harper, R. Lindsay, H. Hauser, K. James, M. Quiles, M. Madan Babu, T. Saito, C. Buchrieser, A. Wardroper, M. Felder, M. Thangavelu, D. Johnson, A. Knights, H. Loulseged, K. Mungall, K. Oliver, C. Price, M. A. Quail, H. Urushihara, J. Hernandez, E. Rabbinowitsch, D. Steffen, M. Sanders, J. Ma, Y. Kohara, S. Sharp, M. Simmonds, S. Spiegler, A. Tivey, S. Sugano, B. White, D. Walker, J. Woodward, T. Winckler, Y. Tanaka, G. Shaulsky, M. Schleicher, G. Weinstock, A. Rosenthal, E. C. Cox, R. L. Chisholm, R. Gibbs, W. F. Loomis, M. Platzer, R. R. Kay, J. Williams, P. H. Dear, A. A. Noegel, B. Barrell, and A. Kuspa. The genome of the social amoeba Dictyostelium discoideum. Nature, 435(7038):43–57, 2005.
[86] A. Escalante, A. Gonçalves, A. Bodin, A. Stepan, C. Sandström, G. Toriz, and P. Gatenholm. Flexible oxygen barrier films from spruce xylan. Carbohydrate Polymers, 87(4):2381–2387, 2012.
[87] M. Faijes, M. Saura-Valls, X. Pérez, M.
Conti, and A. Planas. Acceptor-dependent regioselectivity of glycosynthase reactions by Streptomyces E383A β-glucosidase. Carbohydrate Research, 341(12):2055–2065, 2006.
[88] Y. Feng, C.-J. Duan, H. Pang, X.-C. Mo, C.-F. Wu, Y. Yu, Y.-L. Hu, J. Wei, J.-L. Tang, and J.-X. Feng. Cloning and identification of novel cellulase genes from uncultured microorganisms in rabbit cecum and characterization of the expressed cellulases. Applied Microbiology and Biotechnology, 75(2):319–328, 2007.
[89] B. Fernández-Gómez, M. Richter, M. Schüler, J. Pinhassi, S. G. Acinas, J. M. González, and C. Pedrós-Alió. Ecology of marine Bacteroidetes: a comparative genomics approach. The ISME Journal, 7(5):1026–1037, 2013.
[90] M. H. Foley, D. W. Cockburn, and N. M. Koropatkin. The Sus operon: a model system for starch uptake by the human gut Bacteroidetes. Cellular and Molecular Life Sciences, 73(14):2603–2617, 2016.
[91] C. M. G. A. Fontes and H. J. Gilbert. Cellulosomes: Highly Efficient Nanomachines Designed to Deconstruct Plant Cell Wall Complex Carbohydrates. Annual Review of Biochemistry, 79(1):655–681, 2010.
[92] R. J. Forster, A. Salgado-Flores, L. H. Hagen, S. L. Ishaq, M. Zamanzadeh, A.-D. G. Wright, P. B. Pope, and M. A. Sundset. Rumen and Cecum Microbiomes in Reindeer (Rangifer tarandus tarandus) Are Changed in Response to a Lichen Diet and May Affect Enteric Methane Emissions. PLOS ONE, 11(5):e0155213, 2016.
[93] S. J. Fowler, X. Dong, C. W. Sensen, J. M. Suflita, and L. M. Gieg. Methanogenic toluene metabolism: community structure and intermediates. Environmental Microbiology, 14(3):754–64, 2012.
[94] K. E. H. Frandsen, T. J. Simmons, P. Dupree, J.-C. N. Poulsen, G. R. Hemsworth, L. Ciano, E. M. Johnston, M. Tovborg, K. S. Johansen, P. von Freiesleben, L. Marmuse, S. Fort, S. Cottaz, H. Driguez, B. Henrissat, N. Lenfant, F. Tuna, A. Baldansuren, G. J. Davies, L. Lo Leggio, and P. H. Walton. The molecular basis of polysaccharide cleavage by lytic polysaccharide monooxygenases.
Nature Chemical Biology, 12(4):298–303, 2016.
[95] E. A. Franzosa, X. C. Morgan, N. Segata, L. Waldron, J. Reyes, A. M. Earl, G. Giannoukos, M. R. Boylan, D. Ciulla, D. Gevers, J. Izard, W. S. Garrett, A. T. Chan, and C. Huttenhower. Relating the metatranscriptome and metagenome of the human gut. Proc Natl Acad Sci, 111(22):E2329–E2338, 2014.
[96] S. C. Fry, B. H. W. A. Nesselrode, J. G. Miller, and B. R. Mewburn. Mixed-linkage (1→3,1→4)-β-D-glucan is a major hemicellulose of Equisetum (horsetail) cell walls. New Phytologist, 179(1):104–115, 2008.
[97] X. Fu, C. Albermann, J. Jiang, J. Liao, C. Zhang, and J. S. Thorson. Antibiotic optimization via in vitro glycorandomization. Nature Biotechnology, 21(12):1467–1469, 2003.
[98] M. Fujita, S. Shoda, K. Haneda, T. Inazu, K. Takegawa, and K. Yamamoto. A novel disaccharide substrate having 1,2-oxazoline moiety for detection of transglycosylating activity of endoglycosidases. Biochimica et Biophysica Acta, 1528(1):9–14, 2001.
[99] E. M. Gabor, W. B. L. Alkema, and D. B. Janssen. Quantifying the accessibility of the metagenome by random expression cloning techniques. Environmental Microbiology, 6(9):879–886, 2004.
[100] P. Gallezot. Conversion of biomass to selected chemical products. Chemical Society Reviews, 41(4):1538–1558, 2012.
[101] Z. Gao, M. Niikura, and S. G. Withers. Ultrasensitive Fluorogenic Reagents for Neuraminidase Titration. Angewandte Chemie International Edition, 56(22):6112–6116, 2017.
[102] V. Garcia-Campayo and P. Beguin. Synergism between the cellulosome-integrating protein CipA and endoglucanase CelD of Clostridium thermocellum. Journal of Biotechnology, 57(1-3):39–47, 1997.
[103] J. P. Giddens, J. V. Lomino, M. N. Amin, and L. X. Wang. Endo-F3 glycosynthase mutants enable chemoenzymatic synthesis of core fucosylated tri-antennary complex-type glycopeptides and glycoproteins. The Journal of Biological Chemistry, 2016.
[104] L. M. Gieg, R. V. Kolhatkar, M. J. McInerney, R. S. Tanner, S. H. Harris, K.
L. Sublette, and J. M. Suflita. Intrinsic Bioremediation of Petroleum Hydrocarbons in a Gas Condensate-Contaminated Aquifer. Environmental Science & Technology, 33(15):2550–2560, 1999.
[105] H. J. Gilbert. The Biochemistry and Structural Biology of Plant Cell Wall Deconstruction. Plant Physiology, 153(2):444–455, 2010.
[106] E. D. Goddard-Borger, B. Fiege, E. M. Kwan, and S. G. Withers. Glycosynthase-mediated assembly of xylanase substrates and inhibitors. ChemBioChem, 12(11):1703–11, 2011.
[107] J. Greenblatt and R. Schleif. Arabinose C protein: regulation of the arabinose operon in vitro. Nature New Biology, 233(40):166–70, 1971.
[108] M. Greving, X. Cheng, W. Reindl, B. Bowen, K. Deng, K. Louie, M. Nyman, J. Cohen, A. Singh, B. Simmons, P. Adams, G. Siuzdak, and T. Northen. Acoustic deposition with NIMS as a high-throughput enzyme activity assay. Analytical and Bioanalytical Chemistry, 403(3):707–711, 2012.
[109] R. J. Gruninger, T. A. McAllister, and R. J. Forster. Bacterial and Archaeal Diversity in the Gastrointestinal Tract of the North American Beaver (Castor canadensis). PLOS ONE, 11(5):e0156457, 2016.
[110] F. Gullfot, F. M. Ibatullin, G. Sundqvist, G. J. Davies, and H. Brumer. Functional characterization of xyloglucan glycosynthases from GH7, GH12, and GH16 scaffolds. Biomacromolecules, 10(7):1782–8, 2009.
[111] H. S. Hahm, M. Hurevich, and P. H. Seeberger. Automated assembly of oligosaccharides containing multiple cis-glycosidic linkages. Nature Communications, 7:12482, 2016.
[112] J. Handelsman. Metagenomics: Application of Genomics to Uncultured Microorganisms. Microbiology and Molecular Biology Reviews, 68(4):669–685, 2004.
[113] J. Handelsman, M. R. Rondon, S. F. Brady, J. Clardy, and R. M. Goodman. Molecular biological access to the chemistry of unknown soil microbes: a new frontier for natural products. Chemistry & Biology, 5(10):R245–R249, 1998.
[114] M. Hartmann, S. Lee, S. J. Hallam, and W. W. Mohn.
Bacterial, archaeal and eukaryal community structures throughout soil horizons of harvested and naturally disturbed forest stands. Environmental Microbiology, 11(12):3045–62, 2009.
[115] J. J. Hatton, T. J. Stevenson, C. L. Buck, and K. N. Duddleston. Diet affects arctic ground squirrel gut microbial metatranscriptome independent of community structure. Environmental Microbiology, 19(4):1518–1535, 2017.
[116] T. Hattori, M. Ogata, Y. Kameshima, K. Totani, M. Nikaido, T. Nakamura, H. Koshino, and T. Usui. Enzymatic synthesis of cellulose II-like substance via cellulolytic enzyme-mediated transglycosylation in an aqueous medium. Carbohydrate Research, 353:22–6, 2012.
[117] R. A. Heins, X. Cheng, S. Nath, K. Deng, B. P. Bowen, D. C. Chivian, S. Datta, G. D. Friedland, P. D’Haeseleer, D. Wu, M. Tran-Gyamfi, C. S. Scullin, S. Singh, W. Shi, M. G. Hamilton, M. L. Bendall, A. Sczyrba, J. Thompson, T. Feldman, J. M. Guenther, J. M. Gladden, J.-F. Cheng, P. D. Adams, E. M. Rubin, B. A. Simmons, K. L. Sale, T. R. Northen, and S. Deutsch. Phylogenomically Guided Identification of Industrially Relevant GH1 β-Glucosidases through DNA Synthesis and Nanostructure-Initiator Mass Spectrometry. ACS Chemical Biology, 9(9):2082–2091, 2014.
[118] W. Helbert, J. Sugiyama, M. Ishihara, and S. Yamanaka. Characterization of native crystalline cellulose in the cell walls of Oomycota. Journal of Biotechnology, 57(1-3):29–37, 1997.
[119] R. Hertzberger, J. Arents, H. L. Dekker, R. D. Pridmore, C. Gysler, M. Kleerebezem, M. J. Mattos, and G. T. Macfarlane. H2O2 Production in Species of the Lactobacillus acidophilus Group: a Central Role for a Novel NADH-Dependent Flavin Reductase. Applied and Environmental Microbiology, 80(7):2229–2239, 2014.
[120] M. Hess, A. Sczyrba, R. Egan, T. W. Kim, H. Chokhawala, G. Schroth, S. Luo, D. S. Clark, F. Chen, T. Zhang, R. I. Mackie, L. A. Pennacchio, S. G. Tringe, A. Visel, T. Woyke, Z. Wang, and E. M. Rubin.
Metagenomic discovery of biomass-degrading genes and genomes from cow rumen. Science, 331(6016):463–7, 2011.
[121] M. E. Himmel. Biomass Recalcitrance: deconstructing the plant cell wall for bioenergy. Blackwell Publishing, Oxford, 2008. ISBN 1405163607.
[122] M. E. Himmel, S. Y. Ding, D. K. Johnson, W. S. Adney, M. R. Nimlos, J. W. Brady, and T. D. Foust. Biomass recalcitrance: engineering plants and enzymes for biofuels production. Science, 315(5813):804–807, 2007.
[123] J. C. H. Ho, S. V. Pawar, S. J. Hallam, and V. G. Yadav. An Improved Whole-Cell Biosensor for the Discovery of Lignin-Transforming Enzymes in Functional Metagenomic Screens. ACS Synthetic Biology, 2017.
[124] Y. Honda and M. Kitaoka. A Family 8 Glycoside Hydrolase from Bacillus halodurans C-125 (BH2105) Is a Reducing End Xylose-releasing Exo-oligoxylanase. Journal of Biological Chemistry, 279(53):55097–55103, 2004.
[125] S. Horn, W. Durka, R. Wolf, A. Ermala, A. Stubbe, M. Stubbe, and M. Hofreiter. Mitochondrial genomes reveal slow rates of molecular evolution and the timing of speciation in beavers (Castor), one of the largest rodent species. PLOS ONE, 6(1):e14622, 2011.
[126] S. J. Horn, G. Vaaje-Kolstad, B. Westereng, and V. G. Eijsink. Novel enzymes for the degradation of cellulose. Biotechnology for Biofuels, 5(1):45, 2012.
[127] M. Hosokawa, Y. Hoshino, Y. Nishikawa, T. Hirose, D. H. Yoon, T. Mori, T. Sekiguchi, S. Shoji, and H. Takeyama. Droplet-based microfluidics for high-throughput screening of a metagenomic library for isolation of microbial enzymes. Biosensors and Bioelectronics, 67:379–85, 2015.
[128] M. Hrmova, J. N. Varghese, R. De Gori, B. J. Smith, H. Driguez, and G. B. Fincher. Catalytic Mechanisms and Reaction Intermediates along the Hydrolytic Pathway of a Plant β-D-glucan Glucohydrolase. Structure, 9(11):1005–1016, 2001.
[129] Y. S. Y. Hsieh and P. J. Harris. Xyloglucans of Monocotyledons Have Diverse Structures. Molecular Plant, 2(5):943–965, 2009.
[130] Y. S. Y. Hsieh and P. J.
Harris. Structures of xyloglucans in primary cell walls of gymnosperms, monilophytes (ferns sensu lato) and lycophytes. Phytochemistry, 79:87–101, 2012.
[131] T. L. Hsu, S. R. Hanson, K. Kishikawa, S. K. Wang, M. Sawa, and C. H. Wong. Alkynyl sugar analogs for the labeling and visualization of glycoconjugates in cells. Proc Natl Acad Sci, 104(8):2614–2619, 2007.
[132] Y. Huang, G. Krauss, S. Cottaz, H. Driguez, and G. Lipps. A highly acid-stable and thermostable endo-β-glucanase from the thermoacidophilic archaeon Sulfolobus solfataricus. Biochemical Journal, 385(Pt 2):581–8, 2005.
[133] L. A. Hug, B. J. Baker, K. Anantharaman, C. T. Brown, A. J. Probst, C. J. Castelle, C. N. Butterfield, A. W. Hernsdorf, Y. Amano, K. Ise, Y. Suzuki, N. Dudek, D. A. Relman, K. M. Finstad, R. Amundson, B. C. Thomas, and J. F. Banfield. A new view of the tree of life. Nature Microbiology, 1(5), 2016.
[134] D. Hyatt, G.-L. Chen, P. F. LoCascio, M. L. Land, F. W. Larimer, and L. J. Hauser. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics, 11(1):119, 2010.
[135] K. Ininbergs, B. Bergman, J. Larsson, and M. Ekman. Microbial metagenomics in the Baltic Sea: Recent advancements and prospects for environmental monitoring. Ambio, 44(S3):439–450, 2015.
[136] H. Iqbal, L. Low-Beinart, J. Obiajulu, and S. Brady. Natural Product Discovery through Improved Functional Metagenomics in Streptomyces. Journal of the American Chemical Society, 138(30):9341–9344, 2016.
[137] F. Jacob and J. Monod. Genetic regulatory mechanisms in the synthesis of proteins. Journal of Molecular Biology, 3(3):318–356, 1961.
[138] D. L. Jakeman and A. Sadeghi-Khomami. A β-(1,2)-Glycosynthase and an Attempted Selection Method for the Directed Evolution of Glycosynthases. Biochemistry, 50(47):10359–10366, 2011.
[139] M. C. Jarvis and D. C. Apperley. Chain conformation in concentrated pectic gels: evidence from 13C NMR. Carbohydrate Research, 275(1):131–145, 1995.
[140] H.
Jiang, B. P. English, R. B. Hazan, P. Wu, and B. Ovryn. Tracking Surface Glycans on Live Cancer Cells with Single-Molecule Sensitivity. Angewandte Chemie International Edition, 54(6):1765–1769, 2015.
[141] L. Johansson, L. Virkki, S. Maunu, M. Lehto, P. Ekholm, and P. Varo. Structural characterization of water soluble β-glucan of oat bran. Carbohydrate Polymers, 42(2):143–148, 2000.
[142] M. H. Johansson and O. Samuelson. Reducing end groups in birch xylan and their alkaline degradation. Wood Science and Technology, 11(4):251–263, 1977.
[143] D. R. Jones, M. S. Uddin, R. J. Gruninger, T. T. M. Pham, D. Thomas, A. B. Boraston, J. Briggs, B. Pluvinage, T. A. McAllister, R. J. Forster, A. Tsang, L. B. Selinger, and D. W. Abbott. Discovery and characterization of family 39 glycoside hydrolases from rumen anaerobic fungi with polyspecific activity on rare arabinosyl substrates. Journal of Biological Chemistry, 292(30):12606–12620, 2017.
[144] D. R. Jones, D. Thomas, N. Alger, A. Ghavidel, G. D. Inglis, and D. W. Abbott. SACCHARIS: an automated pipeline to streamline discovery of carbohydrate active enzyme activities within polyspecific families and de novo sequence datasets. Biotechnology for Biofuels, 11(1), 2018.
[145] A. Kabisch, A. Otto, S. König, D. Becher, D. Albrecht, M. Schüler, H. Teeling, R. I. Amann, and T. Schweder. Functional characterization of polysaccharide utilization loci in the marine Bacteroidetes Gramella forsetii KT0803. The ISME Journal, 8(7):1492–1502, 2014.
[146] M. Kanehisa and S. Goto. KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Research, 28(1):27–30, 2000.
[147] P. Kanokratana, L. Eurwilaichitr, K. Pootanakit, and V. Champreda. Identification of glycosyl hydrolases from a metagenomic library of microflora in sugarcane bagasse collection site and their cooperative action on cellulose degradation. Journal of Bioscience and Bioengineering, 2014.
[148] A. E. Kaoutari, F. Armougom, J. I. Gordon, D. Raoult, and B. Henrissat.
The abundance and variety of carbohydrate-active enzymes in the human gut microbiota. Nature Reviews Microbiology, 11(7):497–504, 2013.
[149] O. Khersonsky and D. S. Tawfik. Enzyme Promiscuity: A Mechanistic and Evolutionary Perspective. Annual Review of Biochemistry, 79(1):471–505, 2010.
[150] S. M. Kielbasa, R. Wan, K. Sato, P. Horton, and M. C. Frith. Adaptive seeds tame genomic sequence comparison. Genome Research, 21(3):487–93, 2011.
[151] J. H. Kim, R. Resende, T. Wennekes, H. M. Chen, N. Bance, S. Buchini, A. G. Watts, P. Pilling, V. A. Streltsov, M. Petric, R. Liggins, S. Barrett, J. L. McKimm-Breschkin, M. Niikura, and S. G. Withers. Mechanism-based covalent neuraminidase inhibitors with broad-spectrum influenza antiviral activity. Science, 340(6128):71–5, 2013.
[152] S.-H. Kim, C. Harzman, J. K. Davis, R. Hutcheson, J. B. Broderick, T. L. Marsh, and J. M. Tiedje. Genome sequence of Desulfitobacterium hafniense DCB-2, a Gram-positive anaerobe capable of dehalogenation and metal reduction. BMC Microbiology, 12(1):21, 2012.
[153] Y.-W. Kim, S. S. Lee, R. A. J. Warren, and S. G. Withers. Directed Evolution of a Glycosynthase from Agrobacterium sp. Increases Its Catalytic Activity Dramatically and Expands Its Substrate Repertoire. Journal of Biological Chemistry, 279(41):42787–42793, 2004.
[154] H. E. Klock and S. A. Lesley. The Polymerase Incomplete Primer Extension (PIPE) Method Applied to High-Throughput Cloning and Site-Directed Mutagenesis, volume 498. Humana Press, 2009.
[155] K. M. Konwar, N. W. Hanson, M. P. Bhatia, D. Kim, S. J. Wu, A. S. Hahn, C. Morgan-Lang, H. K. Cheung, and S. J. Hallam. MetaPathways v2.5: quantitative functional, taxonomic and usability improvements. Bioinformatics, 31(20):3345–7, 2015.
[156] N. M. Koropatkin, E. A. Cameron, and E. C. Martens. How glycan metabolism shapes the human gut microbiota. Nature Reviews Microbiology, 2012.
[157] S. K. Kračun, J. Schückel, B. Westereng, L. G. Thygesen, R. N. Monrad, V. G. H. Eijsink, and W. G. T.
Willats. A new generation of versatile chromogenic substrates for high-throughput analysis of biomass-degrading enzymes. Biotechnology for Biofuels, 8(1), 2015.
[158] S. Kuhaudomlarp, N. J. Patron, B. Henrissat, M. Rejzek, G. Saalbach, and R. A. Field. Identification of Euglena gracilis β-1,3-glucan phosphorylase and establishment of a new glycoside hydrolase (GH) family GH149. Journal of Biological Chemistry, page jbc.RA117.000936, 2018.
[159] P. S. Kumar, M. R. Brooker, S. E. Dowd, and T. Camerlengo. Target region selection is a critical determinant of community fingerprints generated by 16S pyrosequencing. PLOS ONE, 6(6):e20956, 2011.
[160] M. Kurogochi, M. Mori, K. Osumi, M. Tojino, S. Sugawara, S. Takashima, Y. Hirose, W. Tsukimura, M. Mizuno, J. Amano, A. Matsuda, M. Tomita, A. Takayanagi, S. Shoda, and T. Shirai. Glycoengineered Monoclonal Antibodies with Homogeneous Glycan (M3, G0, G2, and A2) Using a Chemoenzymatic Approach Have Different Affinities for Fc gamma RIIIa and Variable Antibody-Dependent Cellular Cytotoxicity Activities. PLOS ONE, 10(7):e0132848, 2015.
[161] S. Kuusk, B. Bissaro, P. Kuusk, Z. Forsberg, V. G. H. Eijsink, M. Sørlie, and P. Väljamäe. Kinetics of H2O2-driven degradation of chitin by a bacterial lytic polysaccharide monooxygenase. Journal of Biological Chemistry, 293(2):523–531, 2018.
[162] K. K. Kwon, S.-J. Yeom, D.-H. Lee, K. J. Jeong, and S.-G. Lee. Development of a novel cellulase biosensor that detects crystalline cellulose hydrolysis using a transcriptional regulator. Biochemical and Biophysical Research Communications, 495(1):1328–1334, 2018.
[163] S. Lagaert, S. Van Campenhout, A. Pollet, T. M. Bourgois, J. A. Delcour, C. M. Courtin, and G. Volckaert. Recombinant Expression and Characterization of a Reducing-End Xylose-Releasing Exo-Oligoxylanase from Bifidobacterium adolescentis. Applied and Environmental Microbiology, 73(16):5374–5377, 2007.
[164] M. Lang, T. Kamrat, and B. Nidetzky.
Influence of ionic liquid cosolvent on transgalacto-sylation reactions catalyzed by thermostable beta-glycosyl hydrolase CelB from PyrococcusFuriosus. Biotechnology and Bioengineering, 95(6):1093–100, 2006.[165] J. Larsbrink, T. E. Rogers, G. R. Hemsworth, L. S. McKee, A. S. Tauzin, O. Spadiut,S. Klinter, N. A. Pudlo, K. Urs, N. M. Koropatkin, A. L. Creagh, C. A. Haynes, A. G.Kelly, S. N. Cederholm, G. J. Davies, E. C. Martens, and H. Brumer. A discrete geneticlocus confers xyloglucan metabolism in select human gut Bacteroidetes. Nature, 506(7489):498–502, 2014.[166] C. L. Lauber, M. Hamady, R. Knight, and N. Fierer. Pyrosequencing-Based Assessmentof Soil pH as a Predictor of Soil Bacterial Community Structure at the Continental Scale.Applied and Environmental Microbiology, 75(15):5111–5120, 2009.[167] S. T. Laughlin and C. R. Bertozzi. Metabolic labeling of glycans with azido sugars andsubsequent glycan-profiling and visualization via Staudinger ligation. Nature Protocols, 2(11):2930–2944, 2007.[168] B. D. Lauro, M. Rossi, and M. Moracci. Characterization of a β-glycosidase from the ther-moacidophilic bacterium Alicyclobacillus acidocaldarius. Extremophiles, 10(4):301–310, 2006.[169] S. Lee and S. J. Hallam. Extraction of high molecular weight genomic DNA from soils andsediments. Journal of Visualized Experiments, (33), 2009.192Bibliography[170] B. Leis, A. Angelov, M. Mientus, H. Li, V. T. Pham, B. Lauinger, P. Bongen, J. Pietruszka,L. G. Goncalves, H. Santos, and W. Liebl. Identification of novel esterase-active enzymesfrom hot environments by use of the host bacterium Thermus thermophilus. Frontiers inMicrobiology, 6:275, 2015.[171] A. Levasseur, E. Drula, V. Lombard, P. M. Coutinho, and B. Henrissat. Expansion of the en-zymatic repertoire of the CAZy database to integrate auxiliary redox enzymes. Biotechnologyfor Biofuels, 6(1):41, 2013.[172] A. Levasseur, E. Drula, V. Lombard, P. M. Coutinho, and B. Henrissat. 
Expansion of the en-zymatic repertoire of the CAZy database to integrate auxiliary redox enzymes. Biotechnologyfor Biofuels, 6(1):41, 2013.[173] R. E. Ley, M. Hamady, C. Lozupone, P. J. Turnbaugh, R. R. Ramey, J. S. Bircher, M. L.Schlegel, T. A. Tucker, M. D. Schrenzel, R. Knight, and J. I. Gordon. Evolution of mammalsand their gut microbes. Science, 320(5883):1647–51, 2008.[174] H. Li and R. Durbin. Fast and accurate short read alignment with Burrows–Wheeler trans-form. Bioinformatics, 25(14):1754–1760, 2009.[175] K.-Y. Li, J. Jiang, M. D. Witte, W. W. Kallemeijn, H. van den Elst, C.-S. Wong, S. D.Chander, S. Hoogendoorn, T. J. M. Beenakker, J. D. C. Code, J. M. F. G. Aerts, G. A.van der Marel, and H. S. Overkleeft. Synthesis of Cyclophellitol, Cyclophellitol Aziridine,and Their Tagged Derivatives. European Journal of Organic Chemistry, 2014(27):6030–6043,2014.[176] L. L. Li, S. Taghavi, S. M. McCorkle, Y. B. Zhang, M. G. Blewitt, R. Brunecky, W. S. Adney,M. E. Himmel, P. Brumm, C. Drinkwater, D. A. Mead, S. G. Tringe, and D. Lelie. Bio-prospecting metagenomics of decaying wood: mining for new glycoside hydrolases. Biotech-nology for Biofuels, 4(1):23, 2011.[177] X. Li, P. Jackson, D. V. Rubtsov, N. Faria-Blanc, J. C. Mortimer, S. R. Turner, K. B. Krogh,K. S. Johansen, and P. Dupree. Development and application of a high throughput carbo-193Bibliographyhydrate profiling technique for analyzing plant cell wall polysaccharides and carbohydrateactive enzymes. Biotechnology for Biofuels, 6(1):94, 2013.[178] Y. Li, M. Wexler, D. J. Richardson, P. L. Bond, and A. W. B. Johnston. Screening a widehost-range, waste-water metagenomic library in tryptophan auxotrophs of Rhizobium legumi-nosarum and of Escherichia coli reveals different classes of cloned trp genes. EnvironmentalMicrobiology, 7(12):1927–1936, 2005.[179] W. Liebl, A. Angelov, J. Juergensen, J. Chow, A. Loeschcke, T. Drepper, T. Classen,J. Pietruzska, A. Ehrenreich, W. R. Streit, and K.-E. Jaeger. 
Alternative hosts for func-tional (meta)genome analysis. Applied Microbiology and Biotechnology, 98(19):8099–8109,2014.[180] H. Liu and J. H. Naismith. An efficient one-step site-directed deletion, insertion, single andmultiple-site plasmid mutagenesis protocol. BMC Biotechnology, 8(1):91, 2008.[181] V. Lombard, H. Golaconda Ramulu, E. Drula, P. M. Coutinho, and B. Henrissat. Thecarbohydrate-active enzymes database (CAZy) in 2013. Nucleic Acids Research, 42(Databaseissue):D490–5, 2014.[182] H. C. Losey, J. Jiang, J. B. Biggins, M. Oberthr, X.-Y. Ye, S. D. Dong, D. Kahne, J. S.Thorson, and C. T. Walsh. Incorporation of Glucose Analogs by GtfE and GtfD from theVancomycin Biosynthetic Pathway to Generate Variant Glycopeptides. Chemistry & Biology,9(12):1305–1314, 2002.[183] A. S. Luis, J. Briggs, X. Zhang, B. Farnell, D. Ndeh, A. Labourel, A. Basl, A. Cartmell,N. Terrapon, K. Stott, E. C. Lowe, R. McLean, K. Shearer, J. Schckel, I. Venditto, M.-C.Ralet, B. Henrissat, E. C. Martens, S. C. Mosimann, D. W. Abbott, and H. J. Gilbert. Dietarypectic glycans are degraded by coordinated enzyme pathways in human colonic Bacteroides.Nature Microbiology, 2017.[184] L. R. Lynd, P. J. Weimer, W. H. van Zyl, and I. S. Pretorius. Microbial cellulose utilization:fundamentals and biotechnology. Microbiology and Molecular Biology Reviews, 66(3):506–77,table of contents, 2002.194Bibliography[185] S. Mabeau and B. Kloareg. Isolation and Analysis of the Cell Walls of Brown Algae: Fu-cus spiralis, F. ceranoides, F. vesiculosus, F. serratus, Bifurcaria bifurcata and Laminariadigitata.[186] M. S. Macauley, G. E. Whitworth, A. W. Debowski, D. Chin, and D. J. Vocadlo. O-GlcNAcase uses substrate-assisted catalysis: kinetic analysis and development of highly se-lective mechanism-inspired inhibitors. Journal of Biological Chemistry, 280(27):25313–22,2005.[187] A. K. Mackenzie, A. E. Naas, S. K. Kracun, J. Schckel, J. U. Fangel, J. W. Agger, W. G. T.Willats, V. G. H. Eijsink, P. B. Pope, and H. L. 
Drake. A Polysaccharide Utilization Locusfrom an Uncultured Bacteroidetes Phylotype Suggests Ecological Adaptation and SubstrateVersatility. Applied and Environmental Microbiology, 81(1):187–195, 2015.[188] L. F. Mackenzie, Q. Wang, R. A. J. Warren, and S. G. Withers. Glycosynthases: MutantGlycosidases for Oligosaccharide Synthesis. Journal of the American Chemical Society, 120(22):5583–5584, 1998.[189] T. Magocˇ and S. L. Salzberg. FLASH: fast length adjustment of short reads to improvegenome assemblies. Bioinformatics, 27(21):2957–2963, 2011.[190] J. E. Maldonado, G. T. Bergmann, J. M. Craine, M. S. Robeson, and N. Fierer. SeasonalShifts in Diet and Gut Microbiota of the American Bison (Bison bison). PLOS ONE, 10(11):e0142409, 2015.[191] E. C. Martens, N. M. Koropatkin, T. J. Smith, and J. I. Gordon. Complex Glycan Catabolismby the Human Gut Microbiota: The Bacteroidetes Sus-like Paradigm. Journal of BiologicalChemistry, 284(37):24673–24677, 2009.[192] E. C. Martens, A. G. Kelly, A. S. Tauzin, and H. Brumer. The devil lies in the details:how variations in polysaccharide fine-structure impact the physiology and evolution of gutmicrobes. Journal of Molecular Biology, 426(23):3851–65, 2014.195Bibliography[193] M. Martin, S. Biver, S. Steels, T. Barbeyron, M. Jam, D. Portetelle, G. Michel, and M. Van-denbol. Identification and characterization of a halotolerant, cold-active marine endo-beta-1,4-glucanase by using functional metagenomics of seaweed-associated microbiota. Appliedand Environmental Microbiology, 80(16):4958–4967, 2014.[194] C. F. Maurice, S. Cl Knowles, J. Ladau, K. S. Pollard, A. Fenton, A. B. Pedersen, and P. J.Turnbaugh. Marked seasonal variation in the wild mouse gut microbiota. The ISME Journal,9(11):2423–2434, 2015.[195] B. V. McCleary and R. Codd. Measurement of (1→ 3), (1→4)−β-D-glucan in barley andoats: A streamlined enzymic procedure. Journal of the Science of Food and Agriculture, 55(2):303–312, 1991.[196] L. S. McKee, H. Sunner, G. E. 
Anasontzis, G. Toriz, P. Gatenholm, V. Bulone, F. Vilaplana,and L. Olsson. A GH115 alpha-glucuronidase from Schizophyllum commune contributes tothe synergistic enzymatic deconstruction of softwood glucuronoarabinoxylan. Biotechnologyfor Biofuels, 9:2, 2016.[197] V. J. McKenzie, S. J. Song, F. Delsuc, T. L. Prest, A. M. Oliverio, T. M. Korpita, A. Alexiev,K. R. Amato, J. L. Metcalf, M. Kowalewski, N. L. Avenant, A. Link, A. Di Fiore, A. Seguin-Orlando, C. Feh, L. Orlando, J. R. Mendelson, J. Sanders, and R. Knight. The Effects ofCaptivity on the Mammalian Gut Microbiome. Integrative and Comparative Biology, 57(4):690–704, 2017.[198] N. D. Meadow, D. K. Fox, and S. Roseman. The Bacterial Phosphoenol-Pyruvate: GlycosePhosphotransferase System. Annual Review of Biochemistry, 59(1):497–542, 1990.[199] V. Menon and M. Rao. Trends in bioconversion of lignocellulose: Biofuels, platform chemicalsbiorefinery concept. Progress in Energy and Combustion Science, 38(4):522–550, 2012.[200] K. Mewis, M. Taupp, and S. J. Hallam. A high throughput screen for biomining cellulaseactivity from metagenomic libraries. Journal of Visualized Experiments, (48), 2011.196Bibliography[201] K. Mewis, Z. Armstrong, Y. C. Song, S. A. Baldwin, S. G. Withers, and S. J. Hallam.Biomining active cellulases from a mining bioremediation system. Journal of Biotechnology,167(4):462–471, 2013.[202] K. Mewis, N. Lenfant, V. Lombard, and B. Henrissat. Dividing the Large Glycoside HydrolaseFamily 43 into Subfamilies: a Motivation for Detailed Enzyme Characterization. Applied andEnvironmental Microbiology, 82(6):1686–92, 2016.[203] D. Meyer, C. Schneider-Fresenius, R. Horlacher, R. Peist, and W. Boos. Molecular character-ization of glucokinase from Escherichia coli K-12. Journal of Bacteriology, 179(4):1298–306,1997.[204] G. Michel, M. Czjzek, E. Rebuffet, J.-H. Hehemann, and F. Thomas. Environmental andGut Bacteroidetes: The Food Connection. Frontiers in Microbiology, 2, 2011.[205] D. Mohnen. 
Pectin structure and biosynthesis. Current Opinion in Plant Biology, 11(3):266–277, 2008.[206] S. Moras, Y. B. David, L. Bensoussan, S. H. Duncan, N. M. Koropatkin, E. C. Martens, H. J.Flint, and E. A. Bayer. Enzymatic profiling of cellulosomal enzymes from the human gutbacterium, R uminococcus champanellensis , reveals a fine-tuned system for cohesin-dockerinrecognition. Environmental Microbiology, 18(2):542–556, 2016.[207] L. R. S. Moreira and E. X. F. Filho. An overview of mannan structure and mannan-degradingenzyme systems. Applied Microbiology and Biotechnology, 79(2):165–178, 2008.[208] B. D. Muegge, J. Kuczynski, D. Knights, J. C. Clemente, A. Gonzlez, L. Fontana, B. Hen-rissat, R. Knight, and J. I. Gordon. Diet Drives Convergence in Gut Microbiome FunctionsAcross Mammalian Phylogeny and Within Humans. Science, 332(6032):970–974, 2011.[209] J. Munoz-Munoz, A. Cartmell, N. Terrapon, B. Henrissat, and H. J. Gilbert. Unusual activesite location and catalytic apparatus in a glycoside hydrolase family. Proc Natl Acad Sci, 114(19):4936–4941, 2017.197Bibliography[210] T. Nagy, K. Emami, C. M. Fontes, L. M. Ferreira, D. R. Humphry, and H. J. Gilbert. Themembrane-bound alpha-glucuronidase from Pseudomonas cellulosa hydrolyzes 4-O-methyl-D-glucuronoxylooligosaccharides but not 4-O-methyl-D-glucuronoxylan. Journal of Bacteri-ology, 184(17):4925–9, 2002.[211] M. Najah, R. Calbrix, I. Mahendra-Wijaya, T. Beneyton, A. Griffiths, and A. Drevelle.Droplet-Based Microfluidics Platform for Ultra-High-Throughput Bioprospecting of Cellu-lolytic Microorganisms. Chemistry & Biology, 21(12):1722–1732, 2014.[212] K. Nakashima, L. Yamada, Y. Satou, J.-i. Azuma, and N. Satoh. The evolutionary origin ofanimal cellulose synthase. Development Genes and Evolution, 214(2):81–88, 2004.[213] M. N. Namchuk and S. G. Withers. Mechanism of Agrobacterium beta-glucosidase: ki-netic analysis of the role of noncovalent enzyme/substrate interactions. Biochemistry, 34(49):16194–202, 1995.[214] K. 
Naresh, F. Schumacher, H. S. Hahm, and P. H. Seeberger. Pushing the limits of automatedglycan assembly: synthesis of a 50mer polymannoside. Chemical Communications, 53(65):9085–9088, 2017.[215] D. Ndeh, A. Rogowski, A. Cartmell, A. S. Luis, A. Basle, J. Gray, I. Venditto, J. Briggs,X. Zhang, A. Labourel, N. Terrapon, F. Buffetto, S. Nepogodiev, Y. Xiao, R. A. Field,Y. Zhu, M. A. O’Neill, B. R. Urbanowicz, W. S. York, G. J. Davies, D. W. Abbott, M. C.Ralet, E. C. Martens, B. Henrissat, and H. J. Gilbert. Complex pectin metabolism by gutbacteria reveals novel catalytic functions. Nature, 544(7648):65–70, 2017.[216] C. E. Nelson, A. Rogowski, C. Morland, J. A. Wilhide, H. J. Gilbert, and J. G. Gardner.Systems analysis in Cellvibrio japonicus resolves predicted redundancy of β-glucosidases anddetermines essential physiological functions. Molecular Microbiology, 104(2):294–305, 2017.[217] J. K. Nicholson, E. Holmes, J. Kinross, R. Burcelin, G. Gibson, W. Jia, and S. Pettersson.Host-gut microbiota metabolic interactions. Science, 336(6086):1262–7, 2012.[218] H. Nielsen. Predicting Secretory Proteins with SignalP, volume 1611. Humana Press, 2017.198Bibliography[219] Y. Nijikken, T. Tsukada, K. Igarashi, M. Samejima, T. Wakagi, H. Shoun, and S. Fushi-nobu. Crystal structure of intracellular family 1 β-glucosidase BGL1A from the BasidiomycetePhanerochaete chrysosporium. FEBS Letters, 581(7):1514–1520, 2007.[220] I. Nobeli, A. D. Favia, and J. M. Thornton. Protein promiscuity and its implications forbiotechnology. Nature Biotechnology, 27(2):157–167, 2009.[221] D. R. Nobles, D. K. Romanovicz, and R. M. Brown. Cellulose in Cyanobacteria. Origin ofVascular Plant Cellulose Synthase? Plant Physiology, 127(2):529–542, 2001.[222] T. R. Northen, J. C. Lee, L. Hoang, J. Raymond, D. R. Hwang, S. M. Yannone, C. H. Wong,and G. Siuzdak. A nanostructure-initiator mass spectrometry-based enzyme activity assay.Proc Natl Acad Sci, 105(10):3678–3683, 2008.[223] M. Nyyssonen, H. M. Tran, U. 
Karaoz, C. Weihe, M. Z. Hadi, J. B. Martiny, A. C. Martiny,and E. L. Brodie. Coupled high-throughput functional screening and next generation se-quencing for identification of plant polymer decomposing enzymes in metagenomic libraries.Frontiers in Microbiology, 4:282, 2013.[224] P. J. O’Brien and D. Herschlag. Catalytic promiscuity and the evolution of new enzymaticactivities. Chemistry & Biology, 6(4):R91–R105, 1999.[225] H. Ochiai, W. Huang, and L.-X. Wang. Expeditious Chemoenzymatic Synthesis of Homoge-neous N-Glycoproteins Carrying Defined Oligosaccharide Ligands. Journal of the AmericanChemical Society, 130(41):13790–13803, 2008.[226] R. M. O’Connor, J. M. Fung, K. H. Sharp, J. S. Benner, C. McClung, S. Cushing, E. R.Lamkin, A. I. Fomenkov, B. Henrissat, Y. Y. Londer, M. B. Scholz, J. Posfai, S. Malfatti,S. G. Tringe, T. Woyke, R. R. Malmstrom, D. Coleman-Derr, M. A. Altamia, S. Dedrick,S. T. Kaluziak, M. G. Haygood, and D. L. Distel. Gill bacteria enable a novel digestivestrategy in a wood-feeding mollusk. Proc Natl Acad Sci, 111(47):E5096–E5104, 2014.[227] T. Ohnuma, T. Fukuda, S. Dozen, Y. Honda, M. Kitaoka, and T. Fukamizo. A glycosynthase199Bibliographyderived from an inverting GH19 chitinase from the moss Bryum coronatum. BiochemicalJournal, 444(3):437–43, 2012.[228] M. A. O’Neill, T. Ishii, P. Albersheim, and A. G. Darvill. RHAMNOGALACTURONAN II:Structure and Function of a Borate Cross-Linked Cell Wall Pectic Polysaccharide. AnnualReview of Plant Biology, 55(1):109–139, 2004.[229] H. Pang, P. Zhang, C. J. Duan, X. C. Mo, J. L. Tang, and J. X. Feng. Identification ofcellulase genes from the metagenomes of compost soils and functional characterization of onenovel endoglucanase. Current Microbiology, 58(4):404–8, 2009.[230] J. S. Papadopoulos and R. Agarwala. COBALT: constraint-based alignment tool for multipleprotein sequences. Bioinformatics, 23(9):1073–1079, 2007.[231] H. Parker, P. Nummi, G. Hartman, and F. Rosell. 
Invasive North American beaver Castorcanadensis in Eurasia: a review of potential consequences and a strategy for eradication.Wildlife Biology, 18(4):354–365, 2012.[232] L. Paulova, P. Patakova, B. Branska, M. Rychtera, and K. Melzoch. Lignocellulosic ethanol:Technology design and its impact on process efficiency. Biotechnology Advances, 33(6):1091–1107, 2015.[233] M. Pauly and K. Keegstra. Biosynthesis of the Plant Cell Wall Matrix Polysaccharide Xy-loglucan. Annual Review of Plant Biology, 67(1):235–259, 2016.[234] M. J. Pea, A. R. Kulkarni, J. Backe, M. Boyd, M. A. ONeill, and W. S. York. Structuraldiversity of xylans in the cell walls of monocots. Planta, 244(3):589–606, 2016.[235] Y. Peng, H. C. Leung, S.-M. Yiu, and F. Y. Chin. IDBA-UD: a de novo assembler for single-cell and metagenomic sequencing data with highly uneven depth. Bioinformatics, 28(11):1420–1428, 2012.[236] S. Pengthaisong, C.-F. Chen, S. G. Withers, B. Kuaprasert, and J. R. Ketudat Cairns. RiceBGlu1 glycosynthase and wild type transglycosylation activities distinguished by cyclophel-litol inhibition. Carbohydrate Research, 352:51–59, 2012.200Bibliography[237] G. Perugino, A. Trincone, A. Giordano, J. van der Oost, T. Kaper, M. Rossi, and M. Moracci.Activity of Hyperthermophilic Glycosynthases Is Significantly Enhanced at Acidic pH. Bio-chemistry, 42(28):8484–8493, 2003.[238] D. R. Plichta, A. S. Juncker, M. Bertalan, E. Rettedal, L. Gautier, E. Varela, C. Manichanh,C. Fouqueray, F. Levenez, T. Nielsen, J. Dor, A. M. D. Machado, M. C. R. de Evgrafov,T. Hansen, T. Jrgensen, P. Bork, F. Guarner, O. Pedersen, M. O. A. Sommer, S. D. Ehrlich,T. Sicheritz-Pontn, S. Brunak, and H. B. Nielsen. Transcriptional interactions suggest nichesegregation among microorganisms in the human gut. Nature Microbiology, 1(11):16152, 2016.[239] M. L. Polizeli, A. C. Rizzatti, R. Monti, H. F. Terenzi, J. A. Jorge, and D. S. Amorim. Xy-lanases from fungi: properties and industrial applications. 
Applied Microbiology and Biotech-nology, 67(5):577–91, 2005.[240] P. B. Pope, S. E. Denman, M. Jones, S. G. Tringe, K. Barry, S. A. Malfatti, A. C. McHardy,J.-F. Cheng, P. Hugenholtz, C. S. McSweeney, and M. Morrison. Adaptation to herbivory bythe Tammar wallaby includes bacterial and glycoside hydrolase profiles different from otherherbivores. Proc Natl Acad Sci, 107(33):14793–14798, 2010.[241] P. B. Pope, A. K. Mackenzie, I. Gregor, W. Smith, M. A. Sundset, A. C. McHardy, M. Mor-rison, and V. G. H. Eijsink. Metagenomics of the Svalbard Reindeer Rumen MicrobiomeReveals Abundance of Polysaccharide Utilization Loci. PLOS ONE, 7(6):e38571, 2012.[242] T. Pozzo, J. L. Pasten, E. N. Karlsson, and D. T. Logan. Structural and Functional Analysesof β-Glucosidase 3B from Thermotoga neapolitana: A Thermostable Three-Domain Repre-sentative of Glycoside Hydrolase 3. Journal of Molecular Biology, 397(3):724–739, 2010.[243] T. Pozzo, M. Plaza, J. Romero-Garca, M. Faijes, E. N. Karlsson, and A. Planas. Glycosyn-thases from Thermotoga neapolitana β-glucosidase 1A: A comparison of α-glucosyl fluorideand in situ-generated α-glycosyl formate donors. Journal of Molecular Catalysis B: Enzy-matic, 107:132–139, 2014.[244] T. Pozzo, J. Romero-Garca, M. Faijes, A. Planas, and E. Nordberg Karlsson. Rational de-201Bibliographysign of a thermostable glycoside hydrolase from family 3 introduces β-glycosynthase activity.Glycobiology, 27(2):165–175, 2017.[245] H. Prade, L. F. MacKenzie, and S. G. Withers. Enzymatic synthesis of disaccharides usingAgrobacterium sp. beta-glucosidase. Carbohydrate Research, 305(3-4):371–381, 1997.[246] K. D. Pruitt and D. R. Maglott. RefSeq and LocusLink: NCBI gene-centered resources.Nucleic Acids Research, 29(1):137–40, 2001.[247] M. Qi, H. S. Jun, and C. W. Forsberg. Characterization and Synergistic Interactions ofFibrobacter succinogenes Glycoside Hydrolases. Applied and Environmental Microbiology, 73(19):6098–6105, 2007.[248] C. Quast, E. Pruesse, P. Yilmaz, J. 
Gerken, T. Schweer, P. Yarza, J. Peplies, and F. O.Glockner. The SILVA ribosomal RNA gene database project: improved data processing andweb-based tools. Nucleic Acids Research, 41(D1):D590–D596, 2012.[249] J. Ravachol, R. Borne, C. Tardif, P. de Philip, and H.-P. Fierobe. Characterization ofAll Family-9 Glycoside Hydrolases Synthesized by the Cellulosome-producing BacteriumClostridium cellulolyticum. Journal of Biological Chemistry, 289(11):7335–7348, 2014.[250] J. Ravel, M. Martinez-Garcia, D. M. Brazel, B. K. Swan, C. Arnosti, P. S. G. Chain, K. G.Reitenga, G. Xie, N. J. Poulton, M. L. Gomez, D. E. D. Masland, B. Thompson, W. K. Bel-lows, K. Ziervogel, C.-C. Lo, S. Ahmed, C. D. Gleasner, C. J. Detter, and R. Stepanauskas.Capturing Single Cell Genomes of Active Polysaccharide Degraders: An Unexpected Contri-bution of Verrucomicrobia. PLOS ONE, 7(4):e35314, 2012.[251] L. J. Revell. phytools: an R package for phylogenetic comparative biology (and other things).Methods in Ecology and Evolution, 3(2):217–223, 2012.[252] C. Rhn. Chemical composition and gross calorific value of the above-ground biomass compo-nents of young Picea abies. Scandinavian Journal of Forest Research, 19(1):72–81, 2004.[253] J. R. Rich and S. G. Withers. A chemoenzymatic total synthesis of the neurogenic starfish202Bibliographyganglioside LLG-3 using an engineered and evolved synthase. Angewandte Chemie Interna-tional Edition, 51(34):8640–3, 2012.[254] P. M. Richardson, W. Shi, S. Xie, X. Chen, S. Sun, X. Zhou, L. Liu, P. Gao, N. C. Kyrpides,E.-G. No, and J. S. Yuan. Comparative Genomic Analysis of the Endosymbionts of Herbiv-orous Insects Reveals Eco-Environmental Adaptations: Biotechnology Applications. PLOSGenetics, 9(1):e1003131, 2013.[255] C. S. Riesenfeld, P. D. Schloss, and J. Handelsman. Metagenomics: Genomic Analysis ofMicrobial Communities. Annual Review of Genetics, 38(1):525–552, 2004.[256] C. Rinke, P. Schwientek, A. Sczyrba, N. N. Ivanova, I. J. Anderson, J. F. Cheng, A. 
Darling,S. Malfatti, B. K. Swan, E. A. Gies, J. A. Dodsworth, B. P. Hedlund, G. Tsiamis, S. M.Sievert, W. T. Liu, J. A. Eisen, S. J. Hallam, N. C. Kyrpides, R. Stepanauskas, E. M. Rubin,P. Hugenholtz, and T. Woyke. Insights into the phylogeny and coding potential of microbialdark matter. Nature, 499(7459):431–437, 2013.[257] A. Rogowski, J. A. Briggs, J. C. Mortimer, T. Tryfona, N. Terrapon, E. C. Lowe, A. Basle,C. Morland, A. M. Day, H. Zheng, T. E. Rogers, P. Thompson, A. R. Hawkins, M. P. Yadav,B. Henrissat, E. C. Martens, P. Dupree, H. J. Gilbert, and D. N. Bolam. Glycan complexitydictates microbial resource allocation in the large intestine. Nature Communications, 6:7481,2015.[258] M. Rother and J. A. Krzycki. Selenocysteine, Pyrrolysine, and the Unique Energy Metabolismof Methanogenic Archaea. Archaea, 2010:1–14, 2010.[259] E. M. Rubin. Genomics of Cellulosic Biofuels. Nature, 454(7206):841–845, 2008.[260] T. L. Ruegg, E.-M. Kim, B. A. Simmons, J. D. Keasling, S. W. Singer, T. Soon Lee, andM. P. Thelen. An auto-inducible mechanism for ionic liquid resistance in microbial biofuelproduction. Nature Communications, 5, 2014.[261] J. G. Sanders, A. C. Beichman, J. Roman, J. J. Scott, D. Emerson, J. J. McCarthy, and P. R.203BibliographyGirguis. Baleen whales host a unique gut microbiome with similarities to both carnivores andherbivores. Nature Communications, 6:8285, 2015.[262] A. Sazci, K. Erenler, and A. Radford. Detection of cellulolytic fungi by using Congo red asan indicator: a comparative study with the dinitrosalicyclic acid reagent method. Journal ofApplied Bacteriology, 61(6):559–562, 1986.[263] M. Schallmey, A. Ly, C. Wang, G. Meglei, S. Voget, W. R. Streit, B. T. Driscoll, and T. C.Charles. Harvesting of novel polyhydroxyalkanaote (PHA) synthase encoding genes from asoil metagenome library using phenotypic screening. FEMS Microbiology Letters, 321(2):150–156, 2011.[264] H. V. Scheller and P. Ulvskov. Hemicelluloses. 
Annual Review of Plant Biology, 61:263–89,2010.[265] R. Schmieder and R. Edwards. Quality control and preprocessing of metagenomic datasets.Bioinformatics, 27(6):863–864, 2011.[266] C. Schro¨der, S. Blank, and G. Antranikian. First Glycoside Hydrolase Family 2 Enzymesfrom Thermus antranikianii and Thermus brockianus with β-Glucosidase Activity. Frontiersin Bioengineering and Biotechnology, 3, 2015.[267] A. Schuster and M. Schmoll. Biology and biotechnology of Trichoderma. Applied Microbiologyand Biotechnology, 87(3):787–99, 2010.[268] E. D. Scully, S. M. Geib, K. Hoover, M. Tien, S. G. Tringe, K. W. Barry, T. Glavina delRio, M. Chovatia, J. R. Herr, and J. E. Carlson. Metagenomic profiling reveals lignocellulosedegrading system in a microbial community associated with a wood-feeding beetle. PLOSONE, 8(9):e73827, 2013.[269] H. F. Seidle, K. McKenzie, I. Marten, O. Shoseyov, and R. E. Huber. Trp-262 is a keyresidue for the hydrolytic and transglucosidic reactivity of the Aspergillus niger family 3 β-glucosidase: Substitution results in enzymes with mainly transglucosidic activity. Archivesof Biochemistry and Biophysics, 444(1):66–75, 2005.204Bibliography[270] R. Sender, S. Fuchs, and R. Milo. Revised Estimates for the Number of Human and BacteriaCells in the Body. PLOS Biology, 14(8):e1002533, 2016.[271] Q. She, R. K. Singh, F. Confalonieri, Y. Zivanovic, G. Allard, M. J. Awayez, C. C. Y.Chan-Weiher, I. G. Clausen, B. A. Curtis, A. De Moors, G. Erauso, C. Fletcher, P. M. K.Gordon, I. Heikamp-de Jong, A. C. Jeffries, C. J. Kozera, N. Medina, X. Peng, H. P. Thi-Ngoc, P. Redder, M. E. Schenk, C. Theriault, N. Tolstrup, R. L. Charlebois, W. F. Doolittle,M. Duguet, T. Gaasterland, R. A. Garrett, M. A. Ragan, C. W. Sensen, and J. Van der Oost.The complete genome of the crenarchaeon Sulfolobus solfataricus P2. Proc Natl Acad Sci, 98(14):7835–7840, 2001.[272] J. H. Shim, H. M. Chen, J. R. Rich, E. D. Goddard-Borger, and S. G. Withers. 
Directed evo-lution of a -glycosidase from Agrobacterium sp. to enhance its glycosynthase activity towardC3-modified donor sugars. Protein Engineering Design and Selection, 25(9):465–472, 2012.[273] T. Shirai, H. Ishida, J. Noda, T. Yamane, K. Ozaki, Y. Hakamada, and S. Ito. Crystalstructure of alkaline cellulase K: insight into the alkaline adaptation of an industrial enzyme.Journal of Molecular Biology, 310(5):1079–87, 2001.[274] I. Silman, M. Nakajima, R. Yoshida, A. Miyanaga, K. Abe, Y. Takahashi, N. Sugimoto,H. Toyoizumi, H. Nakai, M. Kitaoka, and H. Taguchi. Functional and Structural Analysis ofa β-Glucosidase Involved in β-1,2-Glucan Metabolism in Listeria innocua. PLOS ONE, 11(2):e0148870, 2016.[275] J. T. Simpson, K. Wong, S. D. Jackman, J. E. Schein, S. J. Jones, and I. Birol. ABySS: aparallel assembler for short read sequence data. Genome Research, 19(6):1117–1123, 2009.[276] H. Smidt, S. J. Noel, G. T. Attwood, J. Rakonjac, C. D. Moon, G. C. Waghorn, and P. H.Janssen. Seasonal changes in the digesta-adherent rumen bacterial communities of dairycattle grazing pasture. PLOS ONE, 12(3):e0173819, 2017.[277] S. A. Smits, J. Leach, E. D. Sonnenburg, C. G. Gonzalez, J. S. Lichtman, G. Reid, R. Knight,A. Manjurano, J. Changalucha, J. E. Elias, M. G. Dominguez-Bello, and J. L. Sonnenburg.205BibliographySeasonal cycling in the gut microbiome of the Hadza hunter-gatherers of Tanzania. Science,357(6353):802–806, 2017.[278] R. R. Sokal and C. D. Michener. A statistical method for evaluating systematic relationships.Multivariate Statistical Methods, Among-Groups Covariation, page 269, 1975.[279] C. Somerville, S. Bauer, G. Brininstool, M. Facette, T. Hamann, J. Milne, E. Osborne,A. Paredez, S. Persson, T. Raab, S. Vorwerk, and H. Youngs. Toward a systems approach tounderstanding plant cell walls. Science, 306(5705):2206–2211, 2004.[280] S. J. Song, C. Lauber, E. K. Costello, C. A. Lozupone, G. Humphrey, D. Berg-Lyons, J. G.Caporaso, D. Knights, J. C. Clemente, S. 
Nakielny, J. I. Gordon, N. Fierer, and R. Knight.Cohabiting family members share microbiota with one another and with their dogs. eLife, 2,2013.[281] J. L. Sonnenburg and F. Bckhed. Dietmicrobiota interactions as moderators of humanmetabolism. Nature, 535(7610):56–64, 2016.[282] N. A. Spiridonov and D. B. Wilson. Cloning and biochemical characterization of BglC, a beta-glucosidase from the cellulolytic Actinomycete Thermobifida fusca. Current Microbiology, 42(4):295–301, 2001.[283] G. Srinivasan. Pyrrolysine Encoded by UAG in Archaea: Charging of a UAG-DecodingSpecialized tRNA. Science, 296(5572):1459–1462, 2002.[284] A. Stamatakis. RAxML version 8: a tool for phylogenetic analysis and post-analysis of largephylogenies. Bioinformatics, 30(9):1312–3, 2014.[285] F. W. Studier. Protein production by auto-induction in high density shaking cultures. ProteinExpression and Purification, 41(1):207–34, 2005.[286] K. S. Swanson, S. E. Dowd, J. S. Suchodolski, I. S. Middelbos, B. M. Vester, K. A. Barry,K. E. Nelson, M. Torralba, B. Henrissat, P. M. Coutinho, I. K. O. Cann, B. A. White, andG. C. Fahey. Phylogenetic and gene-centric metagenomics of the canine intestinal microbiomereveals similarities with humans and mice. The ISME Journal, 5(4):639–649, 2010.206Bibliography[287] K. Tamura, G. R. Hemsworth, G. Djean, T. E. Rogers, N. A. Pudlo, K. Urs, N. Jain, G. J.Davies, E. C. Martens, and H. Brumer. Molecular Mechanism by which Prominent Hu-man Gut Bacteroidetes Utilize Mixed-Linkage Beta-Glucans, Major Health-Promoting CerealPolysaccharides. Cell Reports, 21(2):417–430, 2017.[288] B. Tan, S. J. Fowler, N. Abu Laban, X. Dong, C. W. Sensen, J. Foght, and L. M. Gieg.Comparative analysis of metagenomes from three methanogenic hydrocarbon-degrading en-richment cultures with 41 environmental samples. The ISME Journal, 9(9):2028–45, 2015.[289] E. Tancula, M. J. Feldhaus, L. A. Bedzyk, and A. A. Salyers. 
Location and characterizationof genes involved in binding of starch to the surface of Bacteroides thetaiotaomicron. Journalof Bacteriology, 174(17):5609–16, 1992.[290] X. Tang, G. Xie, K. Shao, J. Dai, Y. Chen, Q. Xu, and G. Gao. Bacterial Community Com-position in Oligosaline Lake Bosten: Low Overlap of ¡Betaproteobacteria and Bacteroideteswith Freshwater Ecosystems. Microbes and Environments, 30(2):180–188, 2015.[291] R. L. Tatusov, D. A. Natale, I. V. Garkavtsev, T. A. Tatusova, U. T. Shankavaram, B. S.Rao, B. Kiryutin, M. Y. Galperin, N. D. Fedorova, and E. V. Koonin. The COG database:new developments in phylogenetic classification of proteins from complete genomes. NucleicAcids Research, 29(1):22–8, 2001.[292] M. Taupp, S. Lee, A. Hawley, J. Yang, and S. J. Hallam. Large insert environmental genomiclibrary production. Journal of Visualized Experiments, (31), 2009.[293] M. Taupp, K. Mewis, and S. J. Hallam. The art and design of functional metagenomic screens.Current Opinion in Biotechnology, 22(3):465–72, 2011.[294] M. J. Temple, F. Cuskin, A. Basl, N. Hickey, G. Speciale, S. J. Williams, H. J. Gilbert, andE. C. Lowe. A Bacteroidetes locus dedicated to fungal 1,6-β-glucan degradation: Unique sub-strate conformation drives specificity of the key endo-1,6-β-glucanase. Journal of BiologicalChemistry, 292(25):10639–10650, 2017.207Bibliography[295] N. Terrapon, V. Lombard, H. J. Gilbert, and B. Henrissat. Automatic prediction of polysac-charide utilization loci in Bacteroidetes species. Bioinformatics, 2014.[296] J. Thompson, F. W. Lichtenthaler, S. Peters, and A. Pikis. β-Glucoside Kinase (BglK) fromKlebsiella pneumoniae. Journal of Biological Chemistry, 277(37):34310–34321, 2002.[297] R. Toffanin, S. H. Knutsen, C. Bertocchi, R. Rizzo, and E. Murano. Detection of cellulosein the cell wall of some red algae by 13C NMR spectroscopy. Carbohydrate Research, 262(1):167–171, 1994.[298] H. Togashi, A. Kato, and K. Shimizu. 
Enzymatically derived aldouronic acids from Eucalyptusglobulus glucuronoxylan. Carbohydrate Polymers, 78(2):247–252, 2009.[299] A. Trincone, G. Perugino, M. Rossi, and M. Moracci. A novel thermophilic Glycosynthasethat effects branching glycosylation. Bioorganic & Medicinal Chemistry Letters, 10(4):365–368, 2000.[300] S. G. Tringe and E. M. Rubin. Metagenomics: DNA sequencing of environmental samples.Nature Reviews Genetics, 6(11):805–14, 2005.[301] T. Tsukada, K. Igarashi, M. Yoshida, and M. Samejima. Molecular cloning and characteriza-tion of two intracellular beta-glucosidases belonging to glycoside hydrolase family 1 from thebasidiomycete Phanerochaete chrysosporium. Applied Microbiology and Biotechnology, 73(4):807–14, 2006.[302] T. Uchiyama and K. Miyazaki. Functional metagenomics for enzyme discovery: challengesto efficient screening. Current Opinion in Biotechnology, 20(6):616–622, 2009.[303] M. Umekawa, W. Huang, B. Li, K. Fujita, H. Ashida, L. X. Wang, and K. Yamamoto. Mutantsof Mucor hiemalis endo-beta-N-acetylglucosaminidase show enhanced transglycosylation andglycosynthase-like activities. Journal of Biological Chemistry, 283(8):4469–79, 2008.[304] R. Vanholme, B. Demedts, K. Morreel, J. Ralph, and W. Boerjan. Lignin biosynthesis andstructure. Plant Physiology, 153(3):895–905, 2010.208Bibliography[305] M. Vega-Sanchez, Y. Verhertbruggen, H. V. Scheller, and P. Ronald. Abundance of mixedlinkage glucan in mature tissues and secondary cell walls of grasses. Plant Signaling & Be-havior, 8(2):e23143, 2014.[306] J. K. Vester, M. A. Glaring, and P. Stougaard. Discovery of novel enzymes with industrialpotential from a cold and alkaline environment by a combination of functional metagenomicsand culturing. Microbial Cell Factories, 13:72, 2014.[307] A. H. Viborg, T. Katayama, T. Arakawa, M. Abou Hachem, L. Lo Leggio, M. Kitaoka,B. Svensson, and S. Fushinobu. 
Discovery of α-L-arabinopyranosidases from human gut microbiome expands the diversity within glycoside hydrolase family 42. Journal of Biological Chemistry, 292(51):21092–21101, 2017.
[308] C. Vispo and I. D. Hume. The digestive tract and digestive function in the North American porcupine and beaver. Canadian Journal of Zoology, 73(5):967–974, 1995.
[309] J. Vogel. Unique aspects of the grass cell wall. Current Opinion in Plant Biology, 11(3):301–307, 2008.
[310] H. Wang, R. Wang, K. Cai, H. He, Y. Liu, J. Yen, Z. Wang, M. Xu, Y. Sun, X. Zhou, Q. Yin, L. Tang, I. T. Dobrucki, L. W. Dobrucki, E. J. Chaney, S. A. Boppart, T. M. Fan, S. Lezmi, X. Chen, L. Yin, and J. Cheng. Selective in vivo metabolic cell-labeling-mediated cancer targeting. Nature Chemical Biology, 13(4):415–424, 2017.
[311] K. Wang, G. V. Pereira, J. J. V. Cavalcante, M. Zhang, R. Mackie, and I. Cann. Bacteroides intestinalis DSM 17393, a member of the human colonic microbiome, upregulates multiple endoxylanases during growth on xylan. Scientific Reports, 6(1), 2016.
[312] Q. Wang and S. G. Withers. Substrate-assisted catalysis in glycosidases. Journal of the American Chemical Society, 117(40):10137–10138, 1995.
[313] Q. Wang, R. W. Graham, D. Trimbur, R. A. J. Warren, and S. G. Withers. Changing Enzymic Reaction Mechanisms by Mutagenesis: Conversion of a Retaining Glucosidase to an Inverting Enzyme. Journal of the American Chemical Society, 116(25):11594–11595, 1994.
[314] B. B. Ward. How many species of prokaryotes are there? Proc Natl Acad Sci, 99(16):10234–10236, 2002.
[315] F. Warnecke, P. Luginbuhl, N. Ivanova, M. Ghassemian, T. H. Richardson, J. T. Stege, M. Cayouette, A. C. McHardy, G. Djordjevic, N. Aboushadi, R. Sorek, S. G. Tringe, M. Podar, H. G. Martin, V. Kunin, D. Dalevi, J. Madejska, E. Kirton, D. Platt, E. Szeto, A. Salamov, K. Barry, N. Mikhailova, N. C. Kyrpides, E. G. Matson, E. A. Ottesen, X. Zhang, M. Hernandez, C. Murillo, L. G. Acosta, I. Rigoutsos, G. Tamayo, B. D. Green, C.
Chang, E. M. Rubin, E. J. Mathur, D. E. Robertson, P. Hugenholtz, and J. R. Leadbetter. Metagenomic and functional analysis of hindgut microbiota of a wood-feeding higher termite. Nature, 450(7169):560–565, 2007.
[316] S. Whelan and N. Goldman. A general empirical model of protein evolution derived from multiple protein families using a maximum-likelihood approach. Molecular Biology and Evolution, 18(5):691–9, 2001.
[317] W. B. Whitman, D. C. Coleman, and W. J. Wiebe. Prokaryotes: the unseen majority. Proc Natl Acad Sci, 95(12):6578–83, 1998.
[318] A. Wierzbicka-Wos, P. Bartasun, H. Cieslinski, and J. Kur. Cloning and characterization of a novel cold-active glycoside hydrolase family 1 enzyme with beta-glucosidase, beta-fucosidase and beta-galactosidase activities. BMC Biotechnology, 13:22, 2013.
[319] S. Willför, A. Sundberg, A. Pranovich, and B. Holmbom. Polysaccharides in some industrially important hardwood species. Wood Science and Technology, 39(8):601–617, 2005.
[320] S. G. Withers, I. P. Street, P. Bird, and D. H. Dolphin. 2-Deoxy-2-fluoroglucosides: a novel class of mechanism-based glucosidase inhibitors. Journal of the American Chemical Society, 109(24):7530–7531, 1987.
[321] C. R. Woese. Bacterial evolution. Microbiological Reviews, 51(2):221–71, 1987.
[322] D. W. Wong. Structure and action mechanism of ligninolytic enzymes. Applied Biochemistry and Biotechnology, 157(2):174–209, 2009.
[323] M. T. Wong, W. Wang, M. Lacourt, M. Couturier, E. A. Edwards, and E. R. Master. Substrate-Driven Convergence of the Microbial Community in Lignocellulose-Amended Enrichments of Gut Microflora from the Canadian Beaver (Castor canadensis) and North American Moose (Alces americanus). Frontiers in Microbiology, 7, 2016.
[324] M. T. Wong, W. Wang, M. Couturier, F. M. Razeq, V. Lombard, P. Lapebie, E. A. Edwards, N. Terrapon, B. Henrissat, and E. R. Master.
Comparative Metagenomics of Cellulose- and Poplar Hydrolysate-Degrading Microcosms from Gut Microflora of the Canadian Beaver (Castor canadensis) and North American Moose (Alces americanus) after Long-Term Enrichment. Frontiers in Microbiology, 8, 2017.
[325] J. J. Wright, S. Lee, E. Zaikova, D. A. Walsh, and S. J. Hallam. DNA Extraction from 0.22 µM Sterivex Filters and Cesium Chloride Density Gradient Centrifugation. Journal of Visualized Experiments, (31), 2009.
[326] J. J. Wright, K. Mewis, N. W. Hanson, K. M. Konwar, K. R. Maas, and S. J. Hallam. Genomic properties of Marine Group A bacteria indicate a role in the marine sulfur cycle. The ISME Journal, 8(2):455–468, 2013.
[327] Y. Q. Xu, C. J. Duan, Q. N. Zhou, J. L. Tang, and J. X. Feng. Cloning and identification of cellulase genes from uncultured microorganisms in pulp sediments from paper mill effluent. Wei Sheng Wu Xue Bao, 46(5):783–8, 2006.
[328] C. Yang, Y. Niu, C. Li, D. Zhu, W. Wang, X. Liu, B. Cheng, C. Ma, and P. Xu. Characterization of a novel metagenome-derived 6-phospho-beta-glucosidase from black liquor sediment. Applied and Environmental Microbiology, 79(7):2121–2127, 2013.
[329] G.-Y. Yang, C. Li, M. Fischer, C. W. Cairo, Y. Feng, and S. G. Withers. A FRET Probe for Cell-Based Imaging of Ganglioside-Processing Enzyme Activity and High-Throughput Screening. Angewandte Chemie, 127(18):5479–5483, 2015.
[330] M. Yang, S. M. Luoh, A. Goddard, D. Reilly, W. Henzel, and S. Bass. The bglX gene located at 47.8 min on the Escherichia coli chromosome encodes a periplasmic β-glucosidase. Microbiology, 142(7):1659–1665, 1996.
[331] P. Yarza, P. Yilmaz, E. Pruesse, F. O. Glöckner, W. Ludwig, K.-H. Schleifer, W. B. Whitman, J. Euzéby, R. Amann, and R. Rosselló-Móra. Uniting the classification of cultured and uncultured bacteria and archaea using 16S rRNA gene sequences. Nature Reviews Microbiology, 12(9):635–645, 2014.
[332] Y. F. Yeh, S. C. Chang, H. W. Kuo, C. G. Tong, S. M. Yu, and T. H. Ho.
A metagenomic approach for the identification and cloning of an endoglucanase from rice straw compost. Gene, 519(2):360–366, 2013.
[333] V. L. Y. Yip, A. Varrot, G. J. Davies, S. S. Rajan, X. Yang, J. Thompson, W. F. Anderson, and S. G. Withers. An Unusual Mechanism of Glycoside Hydrolysis Involving Redox and Elimination Steps by a Family 4 β-Glycosidase from Thermotoga maritima. Journal of the American Chemical Society, 126(27):8354–8355, 2004.
[334] W. S. York. The composition and structure of plant primary cell walls. The Plant Cell Wall, pages 1–54, 2003.
[335] W. S. York, A. G. Darvill, and P. Albersheim. Inhibition of 2,4-dichlorophenoxyacetic Acid-stimulated elongation of pea stem segments by a xyloglucan oligosaccharide. Plant Physiology, 75(2):295–7, 1984.
[336] D. L. Zechel and S. G. Withers. Glycosidase mechanisms: anatomy of a finely tuned catalyst. Accounts of Chemical Research, 33(1):11–8, 2000.
[337] C. Zhang, Q. Fu, C. Albermann, L. Li, and J. S. Thorson. The In Vitro Characterization of the Erythronolide Mycarosyltransferase EryBV and Its Utility in Macrolide Diversification. ChemBioChem, 8(4):385–390, 2007.
[338] F. Zhang and J. Keasling. Biosensors and their applications in microbial metabolic engineering. Trends in Microbiology, 19(7):323–329, 2011.
[339] F. Zhang, J. M. Carothers, and J. D. Keasling. Design of a dynamic sensor-regulator system for production of chemicals and fuels derived from fatty acids. Nature Biotechnology, 30(4):354–359, 2012.
[340] M. Zhang, N. Liu, C. Qian, Q. Wang, Q. Wang, Y. Long, Y. Huang, Z. Zhou, and X. Yan. Phylogenetic and Functional Analysis of Gut Microbiota of a Fungus-Growing Higher Termite: Bacteroidetes from Higher Termites Are a Rich Source of β-Glucosidase Genes. Microbial Ecology, 68(2):416–425, 2014.
[341] X. Zhang, D. E. Green, V. L. Schultz, L. Lin, X. Han, R. Wang, A. Yaksic, S. Y. Kim, P. L. DeAngelis, and R. J. Linhardt. Synthesis of 4-Azido-N-acetylhexosamine Uridine Diphosphate Donors: Clickable Glycosaminoglycans.
The Journal of Organic Chemistry, 82(18):9910–9915, 2017.
[342] L. Zhu, Q. Wu, J. Dai, S. Zhang, and F. Wei. Evidence of cellulose metabolism by the giant panda gut microbiome. Proc Natl Acad Sci, 108(43):17714–17719, 2011.
[343] Y. Zhu and X. Chen. Expanding the Scope of Metabolic Glycan Labeling in Arabidopsis thaliana. ChemBioChem, 18(13):1286–1296, 2017.

Appendix A
Chapter 2 Supplemental Material

A.1 Supplemental Tables

Table A.1: Relative Initial Rates of Hydrolysis by Fosmid Clones. Rates are given as a percentage of the maximum rate on MU cellobioside (MU-C), MU lactoside (MU-Lac), MU β-D-mannoside (MU-Man), MU β-D-galactoside (MU-Gal), MU β-D-xyloside (MU-X), MU β-D-glucoside (MU-Glc), MU N-acetyl-β-D-glucosaminide (MU-GlcNAc) or MU α-L-arabinoside (MU-Ara).

Fosmid  MU-C  MU-Lac  MU-Man  MU-Gal  MU-X  MU-Glc  MU-GlcNAc  MU-Ara
12200 16 F10  100  67  0  6  21  3  0  9
12500 09 F02  100  71  0  1  0  0  0  1
40500 12 L11  55  14  6  0  11  17  1  100
CB002 04 H07  5  3  0  35  0  100  1  0
CB003 08 B11  11  3  0  2  1  100  0  3
CB004 07 C21  2  0  100  3  3  15  18  8
CB004 10 B20  8  0  1  1  16  100  58  5
CB005 08 O01  9  3  0  1  2  3  100  6
CB006 04 L11  50  8  0  28  25  100  56  26
CB006 08 D19  2  2  0  18  5  100  2  3
CG23A 01 C20  21  0  1  0  1  100  4  3
CG23A 09 O05  27  18  0  12  0  100  0  11
CG23A 23 H23  9  12  0  16  0  14  100  7
CO002 07 L07  100  86  0  7  0  0  0  0
CO003 01 D22  100  68  0  0  1  1  0  1
CO003 10 H14  65  100  1  6  0  84  0  5
CO004 05 B17  2  2  0  53  100  26  0  2
CO004 10 P05  12  0  1  30  7  100  14  7
CO182 11 I14  16  1  0  5  12  51  0  100
CO182 24 J12  1  1  0  100  1  6  0  4
CO182 36 O01  12  0  0  7  2  100  6  3
CO182 36 O04  100  61  7  80  12  46  0  0
CO183 09 B08  100  56  0  4  0  1  0  0
CO183 11 O01  83  85  0  59  0  100  0  19
FOS62 08 C22  2  1  0  2  100  3  6  0
FOS62 08 D12  100  93  1  12  64  7  0  6
FOS62 08 G04  14  18  0  4  0  100  1  9
FOS62 08 J18  24  38  0  2  0  100  10  24
FOS62 10 O15  0  0  0  0  0  11  0  100
FOS62 10 P15  10  7  0  0  2  100  2  97
FOS62 21 B24  3  2  0  100  1  6  1  6
FOS62 21 D16  5  3  0  28  0  2  16  100
FOS62 21 J05  11  0  0  6  2  100  7  3
FOS62 22 C08  18  0  0  71  36  53  0  100
FOS62 23 B24  19  0  0  46  9  100  9  5
FOS62 23 F03  100  75  0  0  0  37  0  22
FOS62 23 J07  100  88  0  0  0  0  0  0
FOS62 24 J23  100  89  0  0  0  0  0  0
FOS62 24 L18  50  31  0  4  22  100  19  1
FOS62 24 P09  22  4  0  1  16  100  1  0
FOS62 25 H06  38  9  0  10  10  100  7  25
FOS62 25 L08  18  0  0  33  10  100  0  1
FOS62 25 O06  2  2  1  100  0  3  9  28
FOS62 26 C23  15  5  2  100  1  33  1  17
FOS62 26 C24  14  1  0  14  62  74  100  31
FOS62 26 K06  100  55  1  2  0  5  1  0
FOS62 26 K16  100  46  2  1  1  45  0  0
FOS62 26 L14  8  0  0  10  65  100  0  0
FOS62 26 M02  100  73  0  0  0  86  7  53
FOS62 27 M17  44  0  0  38  21  100  61  47
FOS62 27 N22  12  13  0  32  0  100  8  9
FOS62 27 P24  100  95  0  0  0  0  0  0
FOS62 28 A14  100  90  0  0  0  0  0  0
FOS62 28 K23  17  1  0  2  1  100  1  88
FOS62 29 C04  5  2  0  15  0  100  1  2
FOS62 29 F15  8  0  0  0  0  17  0  100
FOS62 30 E15  69  74  0  19  0  100  68  52
FOS62 30 E20  6  0  0  13  85  100  0  0
FOS62 30 H03  30  0  0  23  21  100  47  16
FOS62 30 J11  100  73  1  13  0  1  0  2
FOS62 30 L24  100  78  2  0  2  5  0  0
FOS62 30 N01  2  0  0  1  1  7  100  1
FOS62 34 D13  100  48  6  7  3  6  0  0
FOS62 34 J06  19  0  0  65  2  100  5  2
FOS62 34 K14  4  0  0  2  1  100  7  4
FOS62 34 O23  100  79  0  7  0  68  38  30
FOS62 35 C14  11  1  0  28  1  100  1  2
FOS62 36 J17  73  29  0  0  67  100  90  0
FOS62 36 K01  100  89  0  0  0  1  0  6
FOS62 37 C18  18  10  3  75  1  3  25  100
FOS62 37 N04  21  0  0  68  3  100  17  2
FOS62 37 N12  100  43  6  2  3  5  2  0
FOS62 38 A06  3  0  0  0  0  7  0  100
FOS62 38 C16  5  5  0  20  79  100  0  3
FOS62 38 D22  54  23  0  1  46  100  46  0
FOS62 38 G18  100  47  14  0  2  11  0  0
FOS62 38 N16  11  0  0  26  0  100  3  1
FOS62 40 E07  100  82  0  0  0  1  0  0
FOS62 40 G22  0  0  0  0  0  3  0  100
FOS62 41 A23  1  0  0  1  0  10  0  100
FOS62 41 C11  0  0  0  0  0  100  0  0
FOS62 41 D24  13  2  0  28  34  100  0  1
FOS62 41 I01  45  100  0  1  0  1  0  0
FOS62 41 K10  100  99  0  0  0  0  0  0
FOS62 41 K19  4  0  0  12  42  100  0  0
FOS62 41 L01  66  100  3  81  11  22  1  0
FOS62 41 N11  11  2  1  1  100  1  3  0
FOS62 42 D11  100  65  0  0  0  0  0  0
FOS62 42 K13  95  58  0  1  61  100  63  0
FOS62 43 C07  100  63  6  11  1  10  0  0
FOS62 43 F03  32  0  0  3  100  19  3  0
FOS62 43 J20  100  39  2  2  2  2  7  2
FOS62 43 J23  100  92  0  3  0  0  0  0
FOS62 43 O18  100  83  0  0  0  0  0  0
FOS62 44 A15  100  87  0  5  0  0  0  11
FOS62 44 E09  100  81  0  0  0  0  0  0
FOS62 44 F23  13  0  0  30  5  100  0  1
FOS62 44 J10  51  22  0  100  0  87  4  15
FOS62 45 J16  100  77  0  1  0  41  18  25
FOS62 46 D05  0  2  0  100  0  3  0  5
FOS62 46 E02  8  2  1  6  100  1  3  0
FOS62 46 L17  100  32  10  2  7  11  24  37
FOS62 47 B05  100  76  0  2  3  9  0  0
FOS62 47 F04  100  50  2  1  1  2  0  0
FOS62 47 H05  2  1  0  100  6  87  0  1
FOS62 47 J09  17  0  0  54  1  100  2  2
FOS62 47 P19  11  0  0  6  6  100  63  19
NA001 01 P12  1  1  0  1  1  11  0  100
NA001 02 B17  2  3  0  26  0  100  0  1
NA001 07 E13  100  67  0  1  0  6  0  0
NA001 07 F24  0  0  0  0  0  0  0  100
NA001 11 K24  4  1  0  22  1  100  0  0
NA001 16 B03  12  7  0  10  12  39  25  100
NA002 01 B04  28  7  0  41  37  85  60  100
NA004 04 B18  100  49  19  2  4  21  0  0
NapDC 20 D21  100  10  1  1  2  6  0  0
NapDC 21 E17  7  3  0  6  2  26  20  100
NapDC 52 E10  14  2  0  91  34  100  0  1
NapDC 53 D04  18  11  0  17  10  100  10  7
NB001 03 I24  5  3  0  2  1  7  1  100
NB001 12 A01  100  55  0  35  13  75  0  12
NB001 13 B14  100  48  1  32  13  42  0  6
NB001 14 K20  1  1  0  10  2  100  0  2
NB001 23 D20  4  2  0  1  5  8  0  100
NO001 01 G23  82  0  0  8  53  100  0  0
NO001 01 I19  38  32  40  94  40  100  3  0
NO001 03 P09  16  0  1  3  11  100  4  0
NO001 04 B04  0  0  0  100  0  1  0  3
NO001 06 D04  100  71  0  0  0  77  0  9
NO001 07 A13  39  18  38  1  32  50  2  100
NO001 08 K19  36  12  3  2  15  100  25  0
NO001 08 N01  42  0  0  3  23  100  0  0
NO001 10 L12  77  58  0  6  3  10  100  1
NO001 13 N07  100  73  0  17  2  1  5  2
NO002 01 J07  0  1  0  54  0  100  0  0
NO002 04 P09  4  0  0  0  10  13  6  100
NO002 07 F01  37  24  0  9  24  100  51  6
NO002 11 N21  56  52  0  90  0  100  10  19
NR003 03 D21  67  16  0  17  73  100  96  0
NR003 09 O07  100  55  0  0  19  36  14  0
NR003 36 K13  100  54  0  1  0  17  0  0
PWCG7 19 I21  2  1  0  2  1  100  1  1
PWCG7 19 J20  2  2  0  7  0  100  0  1
PWCG7 33 K24  37  6  6  0  100  5  2  0
PWCG7 49 G20  100  66  0  0  0  0  1  1
SCR03 04 B15  6  0  3  1  4  100  1  0
SCR03 01 L21  6  5  0  100  0  4  0  4
TolDC 06 L02  7  0  0  0  2  47  0  100
TolDC 08 I17  2  0  0  0  2  12  0  100
TolDC 10 A11  12  0  0  1  8  57  1  100
TolDC 13 D14  74  69  0  8  0  100  3  58
TolDC 15 C08  21  1  0  70  14  100  0  1
TolDC 15 D05  5  0  14  12  9  100  46  0
TolDC 15 E19  9  0  0  0  3  24  0  100
TolDC 15 G15  24  2  0  59  11  100  0  1
TolDC 20 J14  31  15  0  0  100  0  1  0
TolDC 22 A01  0  1  0  1  0  11  1  100
TolDC 22 J01  23  0  0  58  15  100  0  1
TolDC 25 I24  0  0  0  5  7  100  0  0
TolDC 30 A19  49  12  6  0  4  100  0  9
TolDC 30 J10  4  0  0  1  4  31  0  100
TolDC 31 E21  24  11  1  0  100  7  1  0
TolDC 31 L02  56  60  0  0  0  100  6  7
TolDC 32 D22  2  1  0  0  0  2  0  100
TolDC 35 I03  19  0  1  26  0  3  21  100
TolDC 38 E11  18  1  0  62  12  100  0  3
TolDC 39 M03  0  0  0  0  0  43  17  100
TolDC 41 A17  53  29  0  15  20  100  34  26
TolDC 46 B16  20  1  0  2  15  81  0  100
TolDC 50 B06  16  4  0  2  13  23  0  100
TolDC 50 P08  18  9  0  4  100  10  0  2
TolDC 55 H19  73  33  0  27  100  66  2  0
TolDC 56 H11  16  0  0  0  1  72  100  2
TolDC 56 L15  9  8  0  53  0  19  23  100
TolDC 59 E21  100  64  0  46  4  96  4  3
TolDC 59 J01  9  3  0  3  4  9  0  100
TolDC 59 J06  8  2  0  1  15  56  0  100
TolDC 59 K14  100  62  0  2  0  0  0  1

Appendix B
Chapter 3 Supplemental Material

B.1 Supplemental Figures

[Figure B.1 image: heatmap of CAZyme family z-scores (colour scale −6 to 6). Hosts shown: Polar Bear, Black Bear, Giant Panda, Echidna, Spectacled Bear, Armadillo, Squirrel, Callimicos, Black Lemur, Bush Dog, Hyena, Lion, Dog, Gorilla, Orangutan, Baboon, Urial, Springbok, Bighorn Sheep, Okapi, Gazelle, Zebra, African Elephant, Horse, Kangaroo, Ringtailed Lemur, Colobus, Giraffe, Chimpanzee, Rabbit, Wallaby, Beaver, Visayan Warty Pig, Saki, Rock Hyrax, Black Rhino, Capybara, Reindeer. Host categories: Hindgut Fermenter, Foregut Fermenter, Omnivore, Carnivore. Labelled gene clusters: Plant Cell Wall Acting (Cellulose, Hemicellulose, Pectin); Animal Polysaccharide Acting (Glycosaminoglycan); Xylan Acting; Bacterial Cell Wall Modifying (peptidoglycan, glycoproteins).]

Figure B.1: Unabridged Comparison of Beaver Fecal Metagenome with Other Sequenced Mammal Microbiomes. Heatmap shows enrichment (blue) or depletion (red) of all families of CAZymes for each mammal. Clustering of mammals shows CAZyme abundance correlates with host digestive strategy.
Clusters of genes enriched in herbivores include: 1) families active on plant polysaccharides including cellulose, hemicellulose and pectin; 2) families active on xylan. Clusters of genes enriched in carnivores include: 3) families active on animal polysaccharides such as glycosaminoglycans. Figure generated and analysed by Dr. Keith Mewis. Metagenomic data from previous studies [208, 241, 286, 342] was downloaded from the RAST online database and used for comparison. Counts of each GH family were normalized for library size using a variance stabilizing transformation provided in the DESeq2 R package [9], and the z-score for each GH family was calculated on a per sample basis. Samples and GH families were independently clustered using the Manhattan distance metric, and z-scores were plotted as a heatmap.

Appendix C
Chapter 4 Supplemental Material

C.0.1 NMR Assignments of Glycosynthase Products

Glc-β-1,3-Glc-β-pNP (pNP Laminaribiose)
1H NMR (400 MHz, Deuterium Oxide) δ 8.28 (d, J = 9.3 Hz, 2H, 2x pNP -O-C-C-H), 7.26 (d, J = 9.3 Hz, 2H, 2x pNP -O2N-C-C-H), 5.31 (d, J1,2 = 7.6 Hz, 1H, H-1), 4.81 (d, J1′,2′ = 8.2 Hz, 1H, H-1), 3.97 (dd, J5,6a′ = 2 Hz, J6a,6b = 12.3, 1H, H-6a), 3.95 (dd, J5′,6′a = 2.1 Hz, J6′a,6′b = 12.4 Hz, 1H, H-6a), 3.91 (dd, J2,3′ = 9.3 Hz, J3,4 = 8.5 Hz, 1H, H-3), 3.85 (dd, J1,2 = 7.6 Hz, J2,3′ = 9.3 Hz, 1H, H-2), 3.80 (dd, J5,6a′ = 5.3 Hz, J6a,6b = 12.3, 1H, H-6b), 3.74 (dd, J5′,6b′ = 5.9 Hz, J6′a,6′b = 12.3 Hz, 1H, H-6b), 3.76-3.73 (m, 1H, H-5), 3.65 (dd, J3,4 = 8.5 Hz, J4,5 = 9.7 Hz, 1H, H-4), 3.55 (dd, J2′,3′ = 9.4 Hz, J3′,4′ = 8.9 Hz, 1H, H-3), 3.51 (ddd, J4′,5′ = 9.7 Hz, J5′,6′a = 2.2 Hz, J5′,6b′ = 6.0 Hz, 1H, H-5), 3.43 (dd, J3′,4′ = 8.9 Hz, J4′,5′ = 9.8 Hz, 1H, H-4), 3.40 (dd, J1′,2′ = 7.9 Hz, J2′,3′ = 9.4 Hz, 1H, H-2).
13C NMR (101 MHz, Deuterium Oxide) δ 161.78 (pNP-C-1), 142.75 (pNP-C-4), 126.21 (2C, pNP-C-3 and C-5), 116.60 (2C, pNP-C-2 and C-6), 102.92 (C-1), 99.33 (C-1), 83.95 (C-3), 76.13 (C-5), 76.01 (C-5), 75.65 (C-3), 73.56 (C-2), 72.64 (C-2), 69.70 (C-4), 67.93 (C-4), 60.82 (C-6),
60.51 (C-6).
13C NMR (101 MHz, MeOD) δ 163.78, 143.95, 126.61 (2xC), 117.75 (2xC), 105.24, 101.21, 87.48, 78.22, 78.04, 77.83, 75.52, 74.08, 71.57, 69.64, 62.64, 62.30.

Glc-β-1,4-Glc-β-pNP (pNP cellobioside)
1H NMR (400 MHz, Deuterium Oxide) δ 8.29 (d, J = 9.2 Hz, 2H, 2x pNP -O-C-C-H), 7.27 (d, J = 9.2 Hz, 2H, 2x pNP -O2N-C-C-H), 5.32 (d, J1,2 = 7.8 Hz, 1H, H-1), 4.56 (d, J1′,2′ = 7.9 Hz, 1H, H-1), 4.02 (dd, J6a,6b = 10.4, J5,6a′ = 3.6 Hz, 1H, H-6a), 3.95 (dd, J6′a,6′b = 12.5 Hz, J5′,6′a = 2.1 Hz, 1H, H-6a), 3.87-3.81 (m, 2H, H-5 and H-6b), 3.82-3.74 (m, 3H, H-3, H-4, H-6b), 3.70 (dd, J1,2 = 7.8 Hz, J2,3′ = 9.3 Hz, 1H, H-2), 3.54 (dd, J2′,3′ = 9.2 Hz, J3′,4′ = 9.2 Hz, 1H, H-3), 3.54-3.50 (m, 1H, H-5), 3.44 (dd, J3′,4′ = 9.2 Hz, J4′,5′ = 9.2 Hz, 1H, H-4), 3.35 (dd, J1′,2′ = 8.0 Hz, J2′,3′ = 9.2 Hz, 1H, H-2).
13C NMR (101 MHz, Deuterium Oxide) δ 161.64 (pNP-C-1), 142.63 (pNP-C-4), 126.09 (2C, pNP-C-3 and C-5), 116.46 (2C, pNP-C-2 and C-6), 102.56 (C-1), 99.22 (C-1), 78.14 (C-4), 75.99 (C-5), 75.48 (C-3), 75.08 (C-5), 73.97 (C-3), 73.15 (C-2), 72.51 (C-2), 69.44 (C-4), 60.56 (C-6), 59.73 (C-6).

Gal-β-1,2-Glc-β-pNP
1H NMR (400 MHz, Deuterium Oxide) δ 8.33–8.26 (m, 2H), 7.32–7.23 (m, 2H), 5.49 (d, J = 7.5 Hz, 1H, H-1), 4.80 (d, J = 7.5 Hz, 1H, H-1), 3.98–3.89 (m, 3H, H-6, H-4, H-2), 3.84 (pt, J = 9.1 Hz, 1H, H-3), 3.80–3.68 (m, 3H, H-6, H-5, H-3), 3.66–3.61 (m, 1H, H-5), 3.60–3.53 (m, 3H, H-5, H-4, H-6a), 3.25 (dd, J = 11.2, 6.4 Hz, 1H, H-6b).
Linkage was determined by 1H-13C HMBC experiment (Heteronuclear Multiple Bond Correlation) showing correlation between H-1 (4.80 ppm) and C-2 (81.19 ppm), COSY (homonuclear COrrelation SpectroscopY) experiment showing correlation between H-1 (5.49 ppm) and H-2 (3.93 ppm), as well as 1H-13C HSQC experiment (Heteronuclear Single Quantum Correlation) showing correlation between H-2 (3.93 ppm) and C-2 (81.19 ppm).

Gal-β-1,3-Glc-β-pNP
1H NMR was shown to be identical to the data previously recorded by Faijes et al
[87].

Glc-β-1,4-Glc-β-1,3-Glc-β-pNP
1H NMR (400 MHz, Deuterium Oxide)1,2 δ 8.29 (d, J = 9.3 Hz, 2H, 2x pNP -O-C-C-H), 7.26 (d, J = 9.3 Hz, 2H, 2x pNP -O2N-C-C-H), 5.32 (d, J1,2 = 7.7 Hz, 1H, H-1), 4.84 (d, J1′,2′ = 8.1 Hz, 1H, H-1), 4.53 (d, J1′′,2′′ = 8.0 Hz, 1H, H-1), 4.02 (dd, J5,6a′ = 2 Hz, J6a,6b = 12.3, 1H, H-6a), 3.95 (dd, J5′,6′a = 1.8 Hz, J6′a,6′b = 12.5 Hz, 1H, H-6a), 3.94 (dd, J5′′,6′′a = 2.1 Hz, J6′′′a,6′′′b = 12.4 Hz, 1H, H-6a), 3.92 (dd, J2,3′ = 9.2 Hz, J3,4 = 8.5 Hz, 1H, H-3), 3.87 (dd, J1,2 = 7.7 Hz, J2,3′ = 9.1 Hz, 1H, H-2), 3.84 (dd, J5′,6′b = 5.2 Hz, J6a,6b = 12.5 Hz, 1H, H-6b), 3.80 (dd, J5′,6b′ = 5.4 Hz, J6′a,6′b = 12.4 Hz, 1H, H-6b), 3.78-3.72 (m, 2H, H-5 and H-6b), 3.72-3.61 (m, 4H, H-4, H-3, H-4 and H-5), 3.57-3.47 (m, 2H, H-3 and H-5), 3.44 (dd, J1′,2′ = 7.8 Hz, J2′,3′ = 9.5 Hz, 1H, H-2), 3.43 (dd, J3′′,4′′ = 8.9 Hz, J4′′,5′′ = 9.8 Hz, 1H, H-4), 3.33 (dd, J1′′,2′′ = 7.9 Hz, J2′′,3′′ = 9.4 Hz, 1H, H-2).
13C NMR (101 MHz, Deuterium Oxide) δ 161.78 (pNP-C-1), 142.76 (pNP-C-4), 126.21 (2C, pNP-C-3 and C-5), 116.60 (2C, pNP-C-2 and C-6), 102.66 and 102.51 (C-1 and C-1), 99.35 (C-1), 83.76 (C-3), 78.69 (C-4), 76.09 (C-5), 76.01 (C-5), 75.59 (C-5), 74.95 (C-3), 74.23 (C-3), 73.35 (C-2), 73.25 (C-2), 72.67 (C-2), 69.55 (C-4), 67.88 (C-4), 60.67 (C-6), 60.50 (C-6), 60.12 (C-6).

Glc-β-1,4-Glc-β-1,4-Glc-β-pNP (pNP cellotriose)
1H NMR (400 MHz, Deuterium Oxide) δ 8.27 (d, J = 9.3 Hz, 2H, 2x pNP -O-C-C-H), 7.25 (d, J = 9.3 Hz, 2H, 2x pNP -O2N-C-C-H), 5.30 (d, J1,2 = 7.8 Hz, 1H, H-1), 4.58 (d, J1′,2′ = 8.0 Hz, 1H, H-1), 4.53 (d, J1′′,2′′ = 7.9 Hz, 1H, H-1), 4.05-3.97 (m, 2H, H-6a and H-6a), 3.93 (dd, J5′′,6′′a = 2.1 Hz, J6′′′a,6′′′b = 12.4 Hz, 1H, H-6a), 3.90-3.81 (m, 3H, H-5, H-6b and H-6b), 3.80-3.75 (m, 2H, H-3 and H-4), 3.72 (dd, J5′′,6′′b = 6.2 Hz, J6′′′a,6′′′b = 12.4 Hz, 1H, H-6b), 3.70-3.61 (m, 4H, H-2, H-3, H-4 and H-5), 3.55-3.48 (m, 2H, H-3 and H-5), 3.47 (dd, J3′′,4′′ = 8.8 Hz, J4′′,5′′ = 9.7 Hz, 1H, H-4), 3.40 (dd, J1′,2′ = 8.1 Hz, J2′,3′ = 9.1 Hz, 1H, H-2),
3.33 (dd, J1′′,2′′ = 8.1 Hz, J2′′,3′′ = 9.2 Hz, 1H, H-2).
13C NMR (101 MHz, Deuterium Oxide) δ 161.76 (pNP-C-1), 142.72 (pNP-C-4), 126.21 (2C, pNP-C-3 and C-5), 116.58 (2C, pNP-C-2 and C-6), 102.67 (C-1), 102.47 (C-1), 99.37 (C-1), 78.51 (C-4), 78.16 (C-4), 76.09 (C-5), 75.58 (C-3), 75.20 (C-5), 74.94 (C-5), 74.16 (C-3), 74.05 (C-3), 73.25 (C-2), 73.04 (C-2), 72.63 (C-2), 69.56 (C-4), 60.68 (C-6), 60.01 (C-6), 59.82 (C-6).

Glc-β-1,3-Xyl-β-pNP
1H NMR (400 MHz, Deuterium Oxide) δ 8.29 (d, J = 9.3 Hz, 2H, 2x pNP -O-C-C-H), 7.27 (d, J = 9.3 Hz, 2H, 2x pNP -O2N-C-C-H), 5.32-5.24 (m, 1H, H-1), 4.81 (d, J1′,2′ = 8.7 Hz, 1H, H-1), 4.14-4.09 (m, H-5a), 3.95 (dd, J5′,6′a = 2.2 Hz, J6′a,6′b = 12.3 Hz, 1H, H-6a), 3.90-3.82 (m, 3H, H-2, H-3, H-4), 3.74 (dd, J5′,6b′ = 6.1 Hz, J6′a,6′b = 12.3 Hz, 1H, H-6b), 3.64-3.47 (m, 3H, H-5b, H-3 and H-5), 3.43 (dd, J3′,4′ = 9.1 Hz, J4′,5′ = 9.9 Hz, 1H, H-4), 3.39 (dd, J1′,2′ = 8.0 Hz, J2′,3′ = 9.3 Hz, 1H, H-2).
13C NMR (101 MHz, Deuterium Oxide) δ 161.51 (pNP-C-1), 142.67 (pNP-C-4), 126.09 (2C, pNP-C-3 and C-5), 116.47 (2C, pNP-C-2 and C-6), 102.68 (C-1), 99.78 (C-1), 83.34 (C-3), 75.99 (C-5), 75.54 (C-3), 73.42 (C-2), 72.26 (C-2), 69.62 (C-4), 67.59 (C-4), 64.93 (C-5), 60.73 (C-6).
13C NMR (101 MHz, Methanol-d4) δ 163.63 (pNP-C-1), 143.96 (pNP-C-4), 126.61 (2C, pNP-C-3 and C-5), 117.66 (2C, pNP-C-2 and C-6), 105.07 (C-1), 101.76 (C-1), 86.87 (C-3), 78.20 (C-5), 77.84 (C-3), 75.49 (C-2), 73.76 (C-2), 71.62 (C-4), 69.69 (C-4), 66.65 (C-5), 62.67 (C-6).

Glc-β-1,2-Xyl-α-pNP
1H NMR (400 MHz, Deuterium Oxide) δ 8.35–8.20 (m, 2H), 7.35–7.24 (m, 2H), 6.09 (d, J = 3.5 Hz, 1H, H-1), 4.63 (d, J = 7.7 Hz, 1H, H-1), 4.05 (t, J = 9.3 Hz, 1H), 3.91 (dd, J = 3.6 Hz, 9.1 Hz, 1H, H-2), 3.91–3.88 (m, 1H), 3.84–3.73 (m, 2H), 3.70–3.63 (m, 2H), 3.61–3.53 (m, 3H), 3.45 (dd, J = 11.4, 7.3 Hz, 1H, H-6b).
Linkage was determined by 1H-13C HMBC experiment showing correlation between H-1 (4.63 ppm) and C-2 (80.3 ppm), COSY experiment showing correlation between H-1 (6.09 ppm) and H-2 (3.91 ppm), as well as 1H-13C HSQC experiment showing correlation between H-2 (3.91 ppm) and C-2 (80.3 ppm).

Glc-β-1,2-Xyl-β-pNP
1H NMR (400 MHz, Deuterium Oxide) δ 8.35–8.28 (m, 2H), 7.33–7.25 (m, 2H), 5.51 (d, J = 6.8 Hz, 1H, H-1), 4.79 (d, J = 7.8 Hz, 1H, H-1), 4.10 (dd, J = 11.6, 4.0 Hz, 1H), 3.99–3.91 (m, 2H), 3.87–3.79 (m, 2H), 3.72 (dd, J = 10.0, 3.4 Hz, 1H), 3.67–3.56 (m, 3H), 3.29 (dd, J = 11.2, 6.4 Hz, 1H, H-6b).
Linkage was determined by 1H-13C HMBC experiment showing correlation between H-1 (4.79 ppm) and C-2 (80.72 ppm), COSY experiment showing correlation between H-1 (5.51 ppm) and H-2 (3.94 ppm), as well as 1H-13C HSQC experiment showing correlation between H-2 (3.94 ppm) and C-2 (80.72 ppm).

Glc-β-1,3-Glc-β-1,3-Xyl-β-pNP
1H NMR (400 MHz, Deuterium Oxide)1,2 δ 8.29 (d, J = 9.3 Hz, 2H, 2x pNP -O-C-C-H), 7.27 (d, J = 9.3 Hz, 2H, 2x pNP -O2N-C-C-H), 5.30-5.26 (m, 1H, H-1), 4.86 (d, J1′,2′ = 8.0 Hz, 1H, H-1), 4.77 (d, J1′,2′ = 7.8 Hz, 1H, H-1), 4.15-4.09 (m, 1H, H-5a), 3.95 (dd, J5,6a = 1.9 Hz, J6′a,6′b = 12.5, 1H, H-6a), 3.93 (dd, J5′′,6′′a = 2.1 Hz, J6′′′a,6′′′b = 12.4 Hz, 1H, H-6a), 3.89-3.83 (m, 3H, H-2, H-3 and
H-4), 3.80 (dd, J2′,3′ = 9.1 Hz, J3′,4′ = 9.1 Hz, 1H, H-3), 3.76 (dd, J5,6b = 5.6 Hz, J6′a,6′b = 12.5, 1H, H-6b), 3.73 (dd, J5′′,6′′b = 6.0 Hz, J6′′′a,6′′′b = 12.2 Hz, 1H, H-6b), 3.63-3.56 (m, 1H, H-5b), 3.59 (dd, J1′,2′ = 7.7 Hz, J2′,3′ = 9.3 Hz, 1H, H-2), 3.57-3.53 (m, 1H, H-4), 3.54 (dd, J2′′,3′′ = 9.4 Hz, J3′′,4′′ = 8.9 Hz, 1H, H-3), 3.57-3.47 (m, 2H, H-5 and H-5), 3.42 (dd, J3′′,4′′ = 8.9 Hz, J4′′,5′′ = 7.5 Hz, 1H, H-4), 3.38 (dd, J1′′,2′′ = 7.9 Hz, J2′′,3′′ = 9.4 Hz, 1H, H-2).
13C NMR (101 MHz, Methanol-d4) δ 161.73 (pNP-C-1), 142.91 (pNP-C-4), 126.31 (2C, pNP-C-3 and C-5), 116.70 (2C, pNP-C-2 and C-6), 103.01 (C-1), 102.59 (C-1), 100.01 (C-1), 84.43 (C-3), 83.35 (C-3), 76.21 (C-5), 75.82 (C-3), 75.76 (C-5), 73.65 (C-2), 73.44 (C-2), 72.53 (C-2), 69.78 (C-4), 68.38 (C-4), 67.77 (C-4), 66.15 (C-5), 60.95 (C-6), 60.90 (C-6).

Glc-β-1,4-Glc-β-1,3-Xyl-β-pNP
1H NMR (400 MHz, Deuterium Oxide)1,2 δ 8.27 (d, J = 9.3 Hz, 2H, 2x pNP -O-C-C-H), 7.25 (d, J = 9.3 Hz, 2H, 2x pNP -O2N-C-C-H), 5.28-5.25 (m, 1H, H-1), 4.84 (d, J1′,2′ = 7.9 Hz, 1H, H-1), 4.52 (d, J1′′,2′′ = 7.9 Hz, 1H, H-1), 4.14-4.09 (m, 1H, H-5a), 4.02 (dd, J5′,6′a = 2.0 Hz, J6′a,6′b = 12.3, 1H, H-6a), 3.93 (dd, J5′′,6′′a = 2.1 Hz, J6′′′a,6′′′b = 12.4 Hz, 1H, H-6a), 3.88-3.83 (m, 3H, H-2, H-3 and H-4), 3.83 (dd, J5′,6b′ = 4.9 Hz, J6′a,6′b = 12.4, 1H, H-6b), 3.75 (dd, J5′′,6′′b = 6.9 Hz, J6′′′a,6′′′b = 12.4 Hz, 1H, H-6b), 3.71-3.66 (m, 2H, H-3 and H-4), 3.66-3.55 (m, 2H, H-5b and H-5), 3.52 (dd, J2′′,3′′ = 9.1 Hz, J3′′,4′′ = 9.1 Hz, 1H, H-3), 3.53-3.47 (m, 1H, H-5), 3.44 (dd, J1′,2′ = 7.8 Hz, J2′,3′ = 9.5 Hz, 1H, H-2), 3.43 (dd, J3′′,4′′ = 9.0 Hz, J4′′,5′′ = 9.7 Hz, 1H, H-4), 3.33 (dd, J1′′,2′′ = 7.9 Hz, J2′′,3′′ = 9.3 Hz, 1H, H-2).
13C NMR (101 MHz, Methanol-d4) δ 161.50 (pNP-C-1), 142.63 (pNP-C-4), 126.06 (2C, pNP-C-3 and C-5), 116.43 (2C, pNP-C-2 and C-6), 102.55 (C-1), 102.41 (C-1), 99.77 (C-1), 83.12 (C-3), 78.59 (C-4), 75.94, 75.44, 74.80, 74.09, 73.19, 73.11, 72.28, 69.41 (C-4), 67.52 (C-4), 64.91 (C-5), 60.53 (C-6),
60.00 (C-6).

3-NH2-Glc-β-1,3-Glc-β-pNP
1H NMR (400 MHz, Deuterium Oxide) δ 8.30 (d, J = 9.3 Hz, 2H, 2x pNP -O-C-C-H), 7.27 (d, J = 9.3 Hz, 2H, 2x pNP -O2N-C-C-H), 5.31 (d, J1,2 = 7.6 Hz, 1H, H-1), 4.82 (d, J1′,2′ = 8.0 Hz, 1H, H-1), 3.97 (dd, J5,6a′ = 2.2 Hz, J6a,6b = 9.6, 1H, H-6a), 3.94 (dd, J5′,6′a = 2.3 Hz, J6′a,6′b = 9.7 Hz, 1H, H-6a), 3.91 (dd, J2,3′ = 8.9 Hz, J3,4 = 8.9 Hz, 1H, H-3), 3.85 (dd, J1,2 = 7.8 Hz, J2,3′ = 9.1 Hz, 1H, H-2), 3.80 (dd, J5,6a′ = 5.3 Hz, J6a,6b = 12.3, 1H, H-6b), 3.78-3.71 (m, 2H, H-5 and H-6b), 3.65 (dd, J3,4 = 8.6 Hz, J4,5 = 9.7 Hz, 1H, H-4), 3.55 (ddd, J4′,5′ = 9.8 Hz, J5′,6′a = 2.1 Hz, J5′,6b′ = 6 Hz, 1H, H-5), 3.37 (dd, J2′,3′ = 9.7 Hz, J3′,4′ = 9.7 Hz, 1H, H-3), 3.33 (dd, J1′,2′ = 7.7 Hz, J2′,3′ = 9.8 Hz, 1H, H-2), 2.90 (dd, J3′,4′ = 9.8 Hz, J4′,5′ = 9.8 Hz, 1H, H-4).

3-NH2-Glc-β-1,4-Glc-β-pNP
1H NMR (400 MHz, Deuterium Oxide) δ 8.29 (d, J = 9.2 Hz, 2H, 2x pNP -O-C-C-H), 7.27 (d, J = 9.2 Hz, 2H, 2x pNP -O2N-C-C-H), 5.32 (d, J1,2 = 7.8 Hz, 1H, H-1), 4.57 (d, J1′,2′ = 7.9 Hz, 1H, H-1), 4.05-3.99 (m, 1H, H-6a), 3.94 (dd, J6′a,6′b = 12.0 Hz, J5′,6′a = 1.9 Hz, 1H, H-6a), 3.90-3.84 (m, 2H, H-5, H-6b), 3.82-3.77 (m, 2H, H-3 and H-4), 3.76 (dd, J6′a,6′b = 12.7 Hz, J5′,6b′ = 5.9 Hz, 1H, H-6b), 3.70 (dd, J1,2 = 7.8 Hz, J2,3′ = 9.5 Hz, 1H, H-2), 3.55 (ddd, J4′,5′ = 9.8 Hz, J5′,6′a = 2.3 Hz, J5′,6b′ = 5.9 Hz, 1H, H-5), 3.36 (dd, J2′,3′ = 9.7 Hz, J3′,4′ = 9.7 Hz, 1H, H-4), 3.27 (dd, J1′,2′ = 7.8 Hz, J2′,3′ = 10 Hz, 1H, H-2), 2.85 (dd, J3′,4′ = 9.8 Hz, J4′,5′ = 9.8 Hz, 1H, H-3).
13C NMR (101 MHz, Deuterium Oxide) δ 161.73 (pNP-C-1), 142.74 (pNP-C-4), 126.18 (2C, pNP-C-3 and C-5), 116.55 (2C, pNP-C-2 and C-6), 103.11 (C-1), 99.30 (C-1), 78.33 (C-4), 77.26 (C-5), 75.19 (C-5), 74.08 (C-3), 72.91 (C-2), 72.59 (C-2), 69.31 (C-4), 60.71 (C-6), 59.82 (C-6), 57.58 (C-3).

4-NH2-Glc-β-1,4-Glc-β-pNP
1H NMR (400 MHz, Deuterium Oxide) δ 8.29 (d, J = 9.3 Hz, 2H, 2x pNP -O-C-C-H), 7.27 (d, J = 9.3 Hz, 2H, 2x pNP -O2N-C-C-H), 5.32 (d, J1,2 = 7.8 Hz, 1H, H-1), 4.54 (d, J1′,2′ = 7.5 Hz, 1H, H-1),
4.06-3.99 (m, 1H, H-6a), 3.93 (dd, J6′a,6′b = 12.5 Hz, J5′,6′a = 2.5 Hz, 1H, H-6a), 3.90-3.84 (m, 2H, H-5 and H-6b), 3.82-3.78 (m, 2H, H-3 and H-4), 3.75 (dd, J6′a,6′b = 12.6 Hz, J5′,6b′ = 6.0 Hz, 1H, H-6b), 3.69 (dd, J1,2 = 7.8 Hz, J2,3′ = 9.5 Hz, 1H, H-2), 3.55 (ddd, J4′,5′ = 9.9 Hz, J5′,6′a = 2.4 Hz, J5′,6b′ = 5.8 Hz, 1H, H-5), 3.40 (dd, J2′,3′ = 9.3 Hz, J3′,4′ = 9.3 Hz, 1H, H-3), 3.35 (dd, J1′,2′ = 7.5 Hz, J2′,3′ = 9.3 Hz, 1H, H-2), 2.75 (dd, J3′,4′ = 9.7 Hz, J4′,5′ = 9.7 Hz, 1H, H-4).
13C NMR (101 MHz, Deuterium Oxide) δ 161.73 (pNP-C-1), 142.73 (pNP-C-4), 126.19 (2C, pNP-C-3 and C-5), 116.55 (2C, pNP-C-2 and C-6), 102.82 (C-1), 99.30 (C-1), 78.35 (C-4), 76.78 (C-5), 75.70 (C-3), 75.15 (C-5), 74.10 (C-3), 73.65 (C-2), 72.59 (C-2), 60.92 (C-6), 59.85 (C-6), 52.46 (C-4).

6-NH2-Glc-β-1,4-Glc-β-pNP
1H NMR (400 MHz, Deuterium Oxide) δ 8.30 (d, J = 9.3 Hz, 2H, 2x pNP -O-C-C-H), 7.27 (d, J = 9.3 Hz, 2H, 2x pNP -O2N-C-C-H), 5.31 (d, J1,2 = 7.7 Hz, 1H, H-1), 4.58 (d, J1′,2′ = 7.9 Hz, 1H, H-1), 4.03 (dd, J6a,6b = 12.2 Hz, 1H, H-6a), 3.89 (dd, J5′,6′b = 3.5 Hz, J6a,6b = 12.4 Hz, 1H, H-6b), 3.86-3.75 (m, 4H, H-3, H-4 and H-5), 3.69 (dd, J1,2 = 8.0 Hz, J2,3′ = 8.9 Hz, 1H, H-2), 3.59-3.5 (m, 2H, H-3 and H-5), 3.40-3.33 (m, 2H, H-2 and H-4), 3.30 (dd, J5′,6′a = 1.7 Hz, J6′a,6′b = 13.2 Hz, 1H, H-6a), 3.01 (dd, J5′,6b′ = 8.4 Hz, J6′a,6′b = 13.4 Hz, 1H, H-6b).
13C NMR (101 MHz, Deuterium Oxide) δ 163.29 (pNP-C-1), 144.29 (pNP-C-4), 127.75 (2C, pNP-C-3 and C-5), 118.11 (2C, pNP-C-2 and C-6), 103.98 (C-1), 100.95 (C-1), 78.68 (C-4), 76.94 (C-3), 76.85 (C-5), 75.70 (C-5), 75.50 (C-3), 74.86 (C-2), 74.30 (C-2), 72.64 (C-4), 61.20 (C-6), 42.54 (C-6).

6-N3-Glc-β-1,3-Glc-β-pNP
1H NMR (600 MHz, Deuterium Oxide) δ 8.28 (d, J = 9.3 Hz, 1H), 7.26 (d, J = 9.3 Hz, 1H), 5.30 (d, J = 7.7 Hz, 1H, H-1), 4.82 (d, J = 8.0 Hz, 1H, H-1), 3.95 (dd, J = 12.4, 2.2 Hz, 1H), 3.90 (pt, J = 9.0 Hz, 1H), 3.85 (dd, J = 9.3, 7.7 Hz, 1H, H-2), 3.78 (dd, J = 12.4, 5.5 Hz, 1H), 3.77–3.72 (m, 2H), 3.66–3.59 (m, 2H), 3.56–3.51 (m, 2H),
3.46 (pt, J = 9.4 Hz, 1H, H-4), 3.40 (dd, J = 9.3, 8.0 Hz, 1H, H-2).
Linkage was determined by 1H-13C HMBC experiment showing correlation between H-1 (4.82 ppm) and C-3 (83.74 ppm), H-2 (3.85 ppm) and C-1 (99.32 ppm), as well as H-2 (3.85 ppm) and C-3 (83.74 ppm).

6-N3-Glc-β-1,4-Glc-β-pNP
1H NMR (600 MHz, Deuterium Oxide) δ 8.34–8.24 (m, 1H), 7.30–7.21 (m, 2H), 5.30 (d, J = 7.9 Hz, 1H, H-1), 4.56 (d, J = 7.9 Hz, 1H, H-1), 4.04–3.98 (m, 1H, H-6a), 3.88–3.83 (m, 2H, H-6b and H-5), 3.82–3.76 (m, 3H, H-6a, H-3, H-4), 3.72–3.65 (m, 1H, H-2), 3.61 (ddd, J = 9.4, 5.6, 2.5 Hz, 1H, H-5), 3.55 (dd, J = 13.4, 5.5 Hz, 1H, H-6b), 3.51 (pt, J = 9.2 Hz, 1H, H-3), 3.47 (pt, J = 9.2 Hz, 1H, H-4), 3.36 (dd, J = 9.1, 8.0 Hz, 1H, H-2).
Linkage was determined by 1H-13C HMBC experiment showing correlation between H-1 (4.56 ppm) and C-4 (78.05 ppm), 1H-13C HMBC correlation of H-4 (3.78 ppm) and C-2 (72.54 ppm), 1H-13C HSQC experiment showing correlation between H-2 (3.69 ppm) and C-2 (72.54 ppm), as well as HSQC experiment showing correlation between H-4 (3.78 ppm) and C-2 (78.05 ppm).

6-N3-Gal-β-1,2-Glc-β-pNP
1H NMR (400 MHz, Deuterium Oxide) δ 8.34–8.27 (m, 2H), 7.33–7.27 (m, 2H), 5.50 (d, J = 7.5 Hz, 1H, H-1), 4.85 (d, J = 7.9 Hz, 1H, H-1), 3.98–3.90 (m, 2H), 3.88–3.87 (m, 1H), 3.84 (t, J = 9.2 Hz, 1H), 3.80–3.68 (m, 3H), 3.61–3.54 (m, 2H), 3.25–3.21 (m, 2H, H-6a and H-6b).
Linkage was determined by 1H-13C HMBC experiment showing correlation between H-1 (4.85 ppm) and C-2 (81.24 ppm), COSY experiment showing correlation between H-1 (5.50 ppm) and H-2 (3.92 ppm), as well as 1H-13C HSQC experiment showing correlation between H-2 (3.92 ppm) and C-2 (81.24 ppm).

6-N3-Gal-β-1,3-Glc-β-pNP
1H NMR (400 MHz, Deuterium Oxide) δ 8.32–8.24 (m, 2H), 7.31–7.22 (m, 2H), 5.31 (d, J = 7.7 Hz, 1H), 4.77 (s, 1H), 3.99–3.83 (m, 5H), 3.83–3.77 (m, 1H), 3.77–3.60 (m, 5H), 3.51 (dd, J = 13.1, 4.0 Hz, 1H, H-6b).
Linkage was determined by 1H-13C HMBC experiment showing correlation between H-1 (4.77 ppm)
and C-2 (83.94 ppm), H-2 (3.85 ppm) and C-1 (99.37 ppm), as well as H-2 (3.85 ppm) andC-3 (83.92 ppm).6-N3-Gal-β-1,4-Glc-β-pNP1H NMR (600 MHz, Deuterium Oxide) δ 8.33 8.23 (m, 2H), 7.26 (d,J= 9.1 Hz, 2H), 5.31 (d,J=7.8 Hz, 1H, H-1), 4.52 (d,J= 7.8 Hz, 1H, H-1), 4.04 3.99 (m, 1H, H-6a), 3.93 (d,J= 3.4 Hz, 1H,H-4,), 3.88 3.84 (m, 3H), 3.82 3.79 (m, 2H), 3.72 3.67 (m, 2H), 3.63 (dd,J= 13.1, 8.5 Hz, 1H,H-6a), 3.59 3.55 (m, 2H).Linkage was determined by 1H-13C HMBC experiment showing correlation between H-1 (4.52ppm) and C-4 (78.2 ppm), H-6a (4.00 ppm) and C-4 (78.2 ppm), as well as H-6b (3.86 ppm) andC-4 (78.2 ppm).229Appendix C. Chapter 4 Supplemental Material6-N3-Glc-β-1,4-Glc-β-1,4-Glc-β-pNP1H NMR (400 MHz, Deuterium Oxide) δ 8.28 (d,J= 9.3 Hz, 2H, 2x pNP -O-C-C-H), 7.26 (d,J=9.3 Hz, 2H, 2x pNP -O2N-C-C-H), 5.31 (d, J1,2 = 7.8 Hz, 1H, H-1), 4.58 (d, J1′,2′ = 7.9 Hz, 1H,H-1), 4.55 (d, J1,2 = 7.9 Hz, 1H, H-1), 4.03 (dd, J5,6a′= 1 Hz, J6a,6b= 11.8, 1H, H-6a), 4.02 (dd,J5′,6′a= 1.7 Hz, J6′a,6′b= 12.1 Hz, 1H, H-6a), 3.91-3.84 (m, 3H, H-5, H-6b, H-6b), 3.84-3.78 (m, 3H,H-6a, H-3 and H-4), 3.72-3.65 (m, 3H, H-2, H-3, H-4 and H-5), 3.62 (ddd, J4′,5′= 8.9 Hz, J5′′,6′′a=2.3 Hz, J5′′,6′′b=5.7 Hz, 1H, H-5), 3.55 (dd, J5′′,6′′b= 5.7 Hz, J6′′′a,6′′′b= 13.4 Hz, 1H, H-6b), 3.52(dd, J2′′,3′′= 8.9 Hz, J3′′,4′′= 9.0 Hz, 1H, H-3), 3.47 (dd, J3′′,4′′= 9.0 Hz, J4′′,5′′= 9.0 Hz, 1H, H-4),3.41 (dd, J1′,2′ = 7.8 Hz, J2′,3′= 9.1 Hz, 1H, H-2), 3.35 (dd, J1′′,2′′ = 7.9 Hz, J2′′,3′′= 9.0 Hz, 1H,H-2).13C NMR (101 MHz, Deuterium Oxide) δ 161.64 (pNP-C-1), 142.62 (pNP-C-4), 126.09 (2C,pNP-C-3 and C-5), 116.46 (2C, pNP-C-2 and C-6), 102.50 (C-1), 102.39 (C-1), 99.24 (C-1), 78.27(C-4), 78.05 (C-4), 75.22, 75.09, 74.79, 74.20, 73.94, 73.93, 73.10, 72.92, 72.52, 70.03 (C-4), 59.85(C-6), 59.69 (C-6), 50.85 (C-6).4-NH2-Glc-β-1,4-2FGlc-β-DNP1H NMR (400 MHz, Deuterium Oxide) δ 8.93 (d,J= 2.8 Hz, 1H, DNP-H-3), 8.58 (dd,J= 9.3, 2.8Hz, 1H, DNP-H-5), 7.66 (d,J= 9.4 Hz, 1H, DNP-H-6), 5.78 (dd, J1,2 = 7.6 Hz, 
J1,F = 2.9 Hz, 1H,H-1), 4.64 (ddd, J1,2 = 7.6 Hz, J2,F = 51.2 Hz, J2,3′ = 9.1Hz, 1H, H-2), 4.55 (d, J1′,2′ = 7.6 Hz,1H, H-1), 4.11 (ddd, J2,3′= 8.8 Hz, J3,4= 8.8 Hz, J3,F = 15.7 Hz, 1H, H-3), 4.04 (dd, J6a,6b= 12.2Hz, J5,6a′= 1.4 Hz, 1H, H-6a), 3.96-3.84 (m, 4H, H-4, H-5, H-6b and H-6a), 3.75 (dd, J6′a,6′b= 12.4Hz, J5′,6b′= 5.9 Hz, 1H, H-6b), 3.48 (ddd, J4′,5′= 9.9 Hz, J5′,6′a= 2.5 Hz, J5′,6b′= 5.8 Hz, 1H, H-5),3.41 (dd, J2′,3′= 9.7 Hz, J3′,4′= 9.7 Hz, 1H, H-3), 3.35 (dd, J1′,2′ = 7.6 Hz, J2′,3′= 9.3 Hz, 1H, H-2),2.76 (dd, J3′,4′= 9.7 Hz, J4′,5′= 9.7 Hz, 1H, H-4)13C NMR (101 MHz, D2O+DMSO-d6) δ 155.35 (DNP C-1), 143.35(DNP C-2) 140.63(DNPC-4), 131.41(DNP C-5), 123.71(DNP C-6), 119.45(DNP C-3), 104.24(C-1), 99.20(d, C-1), 92.54 (d,C-2), 78.87 (d, C-4), 78.15 (C-5), 77.06, 77.01(C-3 and C-5), 75.08(C-2), 74.09(d, C-3), 62.45(C-6),61.04(C-6), 54.03 (C-4).230Appendix C. Chapter 4 Supplemental MaterialGlc-β-1,4-Glc-β-1,4-2F-Glc-β-DNP1H NMR (400 MHz, Deuterium Oxide) δ 8.92 (d,J= 2.8 Hz, 1H, DNP-H-3), 8.56 (dd,J= 9.4, 2.8Hz, 1H, DNP-H-5), 7.63 (d,J= 9.4 Hz, 1H, DNP-H-6), 5.78 (dd, J1,2 = 7.6 Hz, J1,F = 3.0 Hz, 1H,H-1), 4.64 (ddd, J1,2 = 7.6 Hz, J2,F = 51.1 Hz, J2,3′ = 9.0Hz, 1H, H-2), 4.59 (d, J1′,2′ = 7.9 Hz,1H, H-1), 4.53 (d, J1′′,2′′ = 7.9 Hz, 1H, H-1), 4.11 (ddd, J2,3′= 8.6 Hz, J3,4= 8.6 Hz, J3,F = 15.8Hz, 1H, H-3), 4.06-3.99 (m, 2H, H-6a and H-6a), 3.96-3.87 (m, 4H, H-4, H-5, H-6b and H-6a), 3.85(dd, J6′a,6′b= 12.3 Hz, J5′,6b′= 4.7 Hz, 1H, H-6b), 3.75 (dd, J6′′′a,6′′′b= 12.3 Hz, J5′′,6′′b= 5.7 Hz, 1H,H-6b), 3.72-3.62 (m, 3H, H-3, H-4 and H-5), 3.53 (dd, J2′′,3′′= 9.1 Hz, J3′′,4′′= 9.1 Hz, 1H, H-3),3.52-3.48 (m, 1H, H-5), 3.43 (dd, J3′′,4′′= 9.0 Hz, J4′′,5′′= 9.7 Hz, 1H, H-4), 3.40 (dd, J1′,2′ = 7.9Hz, J2′,3′= 8.9 Hz, 1H, H-2), 3.33 (dd, J1′′,2′′ = 7.9 Hz, J2′′,3′′= 9.3 Hz, 1H, H-2)13C NMR (101 MHz, D2O+DMSO-d6) δ 154.28(DNP C-1), 142.18(DNP C-2) 139.43(DNPC-4), 130.25(DNP C-5), 122.63(DNP C-6), 118.29(DNP C-3), 102.99(C-1), 102.69(C-1), 98.07(d,C-1), 91.37 (d, C-2), 
78.83 (C-4), 77.40(d, C-4), 76.40(C-5), 75.90, 75.88(C-3 and C-5), 75.28(C-5),74.45(C-3), 73.57(C-2), 73.32(C-2), 72.88 (d, C-3), 69.87(C-4), 60.99(C-6), 60.34(C-6), 59.84 (C-6).Glc-β-1,4-Glc-β-1,4-Glc-β-1,4-2F-Glc-β-DNP1H NMR (400 MHz, Deuterium Oxide) δ 8.93 (d,J= 2.8 Hz, 1H, DNP-H-3), 8.57 (dd,J= 9.4, 2.8Hz, 1H, DNP-H-5), 7.64 (d,J= 9.4 Hz, 1H, DNP-H-6), 5.79 (dd, J1,2 = 7.5 Hz, J1,F = 2.9 Hz,1H, H-1), 4.64 (ddd, J1,2 = 7.6 Hz, J2,F = 51.1 Hz, J2,3′ = 9.0Hz, 1H, H-2), 4.60 (d, J1′,2′ = 8.0Hz, 1H, H-1), 4.56 (d, J1′′,2′′ = 7.9 Hz, 1H, H-1), 4.53 (d, J1′′′,2′′′ = 7.9 Hz, 1H, H-1), 4.12 (ddd,J2,3′= 8.5 Hz, J3,4= 8.5 Hz, J3,F = 16.2 Hz, 1H, H-3), 4.07-3.97 (m, 3H, H-6a, H-6a and H-6a),3.97-3.88 (m, 4H, H-4, H-5, H-6b, H-6a), 3.86 (dd, J6′a,6′b= 12.3 Hz, J5′,6b′= 4.5 Hz, 1H, H-6b),3.85 (dd, J6′′′a,6′′′b= 12.5 Hz, J5′′,6′′b= 5.7 Hz, 1H, H-6b), 3.75 (dd, J6′′′a,6′′′b= 12.5 Hz, J5′′′,6′′′b=5.7 Hz, 1H, H-6b), 3.73-3.61 (m, 6H, H-3, H-4, H-5, H-3, H-4 and H-5), 3.56-3.48 (m, 2H, H-3 andH-5), 3.46-3.36 (m, 3H, H-2, H-2, H-4), 3.33 (dd, J1′′′,2′′′ = 7.9 Hz, J2′′′,3′′′= 9.3 Hz, 1H, H-2).231Appendix C. Chapter 4 Supplemental MaterialGlc-β-1,4-Glc-β-1,3-Glc-β-octyl1H NMR (400 MHz, Deuterium Oxide) δ 4.77 (d, J1′,2′ = 7.6 Hz, 1H, H-1), 4.52 (d, J1′′,2′′ =7.9 Hz, 1H, H-1), 4.49 (d, J1,2 = 8.1 Hz, 1H, H-1), 4.00 (dd, J5,6a′= 2.1 Hz, J6a,6b= 12.3, 1H,H-6a), 3.96-3.89 (m, 3H, Octyl-H-1a, H-6a and H-6a), 3.82 (dd, J5′,6′b= 5.0 Hz, J6a,6b= 12.3, 1H,H-6b),3.78-3.69 (m, 3H, H-3, H-6b and H-6b), 3.69-3.60 (m, 5H, Octyl-H-1b, H-5, H-3, H-4 andH-5), 3.55-3.49 (m, 3H, H-4, H-3 and H-5), 3.49-3.38 (m, 3H, H-2, H-2 and H-4), 3.33 (dd, J1′′,2′′= 7.8 Hz, J2′′,3′′= 9.2 Hz, 1H, H-2), 1.69-1.58 (m, 2H, Octyl-H-2), 1.41-1.25 (m, 10H, Octyl H-3,H-4, H-5, H-6 and H-7), 0.91-0.84 (m, 3H, Octyl-H-8).232

