UBC Theses and Dissertations

Functional metagenomics and consolidated bioprocessing for valorization of pulp and paper mill sludge Sharan, Anupama Achal 2018

FUNCTIONAL METAGENOMICS AND CONSOLIDATED BIOPROCESSING FOR VALORIZATION OF PULP AND PAPER MILL SLUDGE

by

Anupama Achal Sharan

B.E., Birla Institute of Technology, Mesra, Ranchi, 2015

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

MASTER OF APPLIED SCIENCE

in

THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES
(Chemical and Biological Engineering)

THE UNIVERSITY OF BRITISH COLUMBIA
(Vancouver)

April 2018

© Anupama Achal Sharan, 2018

Abstract

Biocatalyst discovery is integral to bioeconomy development, enabling design of scalable bioprocesses that can compete with the resource-intensive petrochemical industry. Uncultivated microbial communities within natural and engineered ecosystems provide a near-infinite reservoir of genomic diversity and metabolic potential that can be harnessed for this purpose. To bridge the cultivation gap, functional metagenomic screens have been developed to recover active genes directly from environmental samples. In this thesis, a pipeline for recovery of biomass-deconstructing biocatalysts sourced from the pulp and paper mill sludge (PPS) metagenome is described. This environment is targeted given its high cellulose content, which is hypothesized to direct enrichment of enzymes capable of hydrolysing it. The resulting oligosaccharides represent platform molecules that can be fed to downstream applications using consolidated process design for converting biological waste streams into value-added products. High-molecular weight DNA was extracted from sludge and used to construct a fosmid library containing 15,000 clones using the copy control system in EPI300™-T1R E. coli. Extracted DNA was also used in whole genome shotgun sequencing to compare the metabolic potential of the sludge community with fosmid screening outcomes, as well as with other waste biomass environments, using the MetaPathways v2.5 software pipeline, with specific emphasis on carbohydrate-active enzymes (CAZymes).
Metagenome assembly, open reading frame (ORF) prediction, binning and taxonomic assignment approaches were also used to identify correlations between function and taxonomy. In total, 32,232 ORFs predicted to encode glycoside hydrolases, glycosyl transferases, and carbohydrate binding module families were mapped to the CAZy database. The fosmid library was screened for glycoside hydrolase activities using a pool of sensitive fluorogenic glycosides of 6-chloro-4-methylumbelliferone (CMU). A total of 744 clones capable of converting pooled substrates were recovered, indicating an extremely high hit rate (1 hit per 43 clones). Following fosmid sequencing and annotation, two of the most promising hits with defined single GH family loci were sub-cloned and overexpressed in the E. coli BL21(DE3) strain to conduct basic biochemical characterization. Activity of the purified enzymes was demonstrated on model lignocellulosic substrates to evaluate the potential of implementing the proposed circular bioprocess with waste PPS as both the feedstock and the source of enriched biocatalysts.

Lay Summary

In an era of growing global consciousness for bioeconomy development, it is ironic that the pulp and paper industry, which is one of the most prominent biomass-based industries in terms of revenue generation, is declining in Canada. The strategic importance of this industry within the bioeconomy, and the opportunity to reduce environmental burden by not only remediating but also valorising pulp and paper mill waste, form the primary motivation behind this work.

This study presents an approach to add value to the solid waste stream from this industry, paper sludge, by unravelling its environmental genome to discover novel genes that produce enzymes capable of breaking down biomass. This was done using both experimental techniques and data analysis, involving high-throughput robotic functional screening and bioinformatics approaches.
The findings of this study point to definitive cost-reduction approaches in industrial bioprocessing using cheap, waste biomass feedstocks for reciprocal enzyme discovery and enhanced bioconversion.

Preface

Several sections of this work are being used for composing manuscripts for publication in peer-reviewed journals. The research was conducted under the co-supervision of Dr. Vikramaditya G. Yadav (Chemical and Biological Engineering) and Dr. Steven J. Hallam (Microbiology and Immunology). I conducted the literature review, defined research questions and methodologies, designed and conducted experiments, analysed data, and compiled and interpreted results under their guidance, with assistance from members of both the Hallam and Yadav lab groups. The thesis was written by me and contains original, unpublished work with inputs from both Dr. Hallam and Dr. Yadav.

Several bioinformatics-based workflows included in the thesis as presented in chapters 2-3, including shotgun metagenome DNA QC, trimming and assembly, binning, EMIRGE and Phylosift taxonomy assignment, and assembly of positive fosmid clone sequences with the FabFos pipeline, were implemented in collaboration with Connor Morgan-Lang. Small subunit ribosomal RNA (SSU rRNA or 16S) gene pyrotag sequencing and operational taxonomic unit (OTU) data analysis for the metagenomic DNA as presented in chapter 2 was done in collaboration with Ashley Arnold. Annotation of the pulp and paper sludge (PPS) metagenome and other metagenomes included in the comparative analysis presented in chapter 2 was done using Metapathways 2.5, a metagenomic DNA annotation pipeline developed in the Hallam lab (Hanson et al. 2014), with suitable in-house updates. The same pipeline was also used for positive fosmid clone annotations as presented in chapter 3. The fosmid sequence ORF figure was produced by Kateryna Ievdokymenko and edited by me.
The fluorogenic substrates for fosmid library screening as presented in chapter 3 were kindly provided by Zach Armstrong from the Withers lab group at UBC, and experiment design for fosmid library screening, biochemical assays and characterization was guided by him. Dr. Aria Hahn and Dr. Keith Mewis also guided several aspects of metagenomic annotation and interpretation of results. Compositional analysis of the paper mill sludge as presented in chapter 4 was guided by Dr. Jinguang Hu, Saddler lab group (Forestry), and done in collaboration with Daniela Vargas Figueroa, MASc student in the same group. The Saddler group also provided the Celluclast enzyme used as positive control in enzyme assays. The mutant Chit-O enzyme used for cellulase assaying was kindly provided by Alessandro R. Ferrari from the University of Groningen, Netherlands. Dr. Sandip Pawar and Benson Chang from the Yadav group helped with the sub-cloning work presented in chapter 4.

Table of Contents

Abstract
Lay Summary
Preface
Table of Contents
List of Tables
List of Figures
List of Abbreviations
Acknowledgements
Dedication
Chapter 1: Introduction
1.1 Sustainability and the Circular Economy
1.2 The Bioeconomy
1.2.1 Industrial Biocatalysis
1.3 Rejuvenating the Paper Industry by Valorising Solid Waste Stream – Pulp and Paper Mill Sludge (PPS)
1.4 Research Overview
Chapter 2: In-silico Analysis of the Paper Mill Sludge Microbiome
2.1 Background
2.1.1 Metagenomics – Unearthing Nature’s Biocatalytic Potential
2.1.1.1 Lignocellulose Hydrolysis and Valorization
2.1.2 In-silico Approaches in Metagenomics – what all can the microbiome map tell us?
2.1.2.1 Bioinformatics Tools – making sense in the noise
2.1.2.2 Standardizing Data and Streamlining Analysis
2.2 Materials and Methods
2.2.1 Sample Collection and Processing
2.2.2 Whole-metagenome Sequencing, Binning and Assembly
2.2.2.1 High Molecular Weight Genomic DNA Extraction
2.2.2.2 Sequencing
2.2.2.3 Metagenome Assembly and Binning
2.2.3 Taxonomic Analysis
2.2.3.1 454 16S rRNA gene Pyrotag Sequencing
2.2.3.2 Expectation Maximization Iterative Reconstruction of Genes from the Environment (EMIRGE) based 16S Prediction
2.2.3.3 Phylosift Annotation
2.2.4 Functional annotation of metagenome
2.2.5 Mapping Reads to Assembly and Generating Normalized Abundance Values for Open Reading Frames (ORFs)
2.2.6 Qualitative comparison of Carbohydrate Active Enzyme (CAZyme) abundance across different environments
2.3 Results
2.3.1 Metagenome Assembly and Annotation
2.3.2 CAZy Families Relevant to Plant Polysaccharide Degradation
2.3.3 Binning the PPS metagenome
2.3.4 The Paper Sludge Microbiome and Metabolic Potential
2.3.4.1 Taxonomic Distribution
2.3.4.2 CAZy Distribution
2.3.5 Comparison to Other Environmental Microbiomes
2.4 Conclusion
Chapter 3: High-Throughput Biocatalyst Discovery from Paper Sludge Metagenome by Functional Metagenomics
3.1 Background
3.1.1 Functional Metagenomics - discovering novel industrial biocatalysts
3.1.2 What goes into metagenomic functional screen design?
3.1.3 Carbohydrate Active Enzymes (CAZy) Database: Glycoside Hydrolase (GH) families and Polysaccharide utilisation loci (PULs)
3.2 Materials and Methods
3.2.1 Fosmid Library Construction
3.2.2 Functional Screening
3.2.3 Fosmid Purification and Sequencing
3.2.4 Fosmid Assembly and Annotation
3.3 Results and Discussion
3.3.1 Metagenomic Library Construction and Host Selection
3.3.2 High-throughput Functional Screening
3.3.3 Fosmid Sequencing and Annotations
3.4 Conclusion
Chapter 4: Function to Application – Consolidated Bioprocessing
4.1 Background
4.1.1 Coming Full Circle - design and implementation of consolidated bioprocessing
4.1.2 Biochemical Assays for Cellulose Hydrolysis Kinetics
4.1.3 Matching the Scales of Discovery and Application - what does functional metagenomics need to go all the way?
4.2 Materials and Methods
4.2.1 Biomass Compositional Analysis
4.2.2 Detection of Hydrolytic Activity using Colorimetric Assay
4.2.3 Different Systems for Testing Cellulolytic Activity
4.2.4 Bench-scale Bioprocess Development
4.3 Results
4.3.1 Compositional Analysis
4.3.2 Colorimetric Detection of Cellulolytic Activity
4.3.2.1 Whole-cell Lysates
4.3.2.2 Fosmid Whole Cell Lysate and 50X Concentrated Culture Supernatant Protein Fractions
4.3.2.3 Sub-cloned GH genes from fosmids over-expressed in E. coli BL21(DE3) strain
4.3.3 Bench-scale Hydrolysis
4.4 Conclusion
Chapter 5: Thesis Conclusion and Future Directions of Work
5.1 Concluding Discussion
5.2 Future Perspectives
5.2.1 Microbiome Metabolic Potential
5.2.2 Functional Metagenomic Screening
5.2.3 Consolidated Bioprocess Development
Bibliography
Appendix A: Chapter 4 - Sub-cloning Details

List of Tables

Table 1.1: Bioeconomy development strategies
Table 1.2: Approximate content of cellulose, hemicellulose and lignin in different types of waste lignocellulosic materials (Prasetyo and Park 2013)
Table 2.1: The most prominent enzyme activities needed to completely hydrolyse lignocellulosic biomass (adapted from Salehi Jouzani and Taherzadeh 2015)
Table 2.2: Genomic Standards Consortium recommendations for metadata generation for metagenome assembled genomes (MIGS – Minimum information about a Genomic Sequence) (Bowers et al. 2017)
Table 2.3: Metagenome assembled genome (MAG) metadata requirements (Bowers et al.
2017)
Table 2.4: General Assembly Statistics
Table 2.5: Summary of phylogenetic (Phylosift) and functional (Metapathways) information of the medium-high quality draft genomes as assembled through binning of the PPS metagenome (*≤ 1 gene count)
Table 2.6: Taxonomic assessment comparison pipeline across the major phyla depicted in Figure 2.9
Table 2.7: ORF statistics and metadata for metagenomes in comparative analysis
Table 3.1: Hit rate of different functional metagenomic libraries screened for glycoside hydrolase genes using a soluble, chromogenic model compound, 2,4-dinitrophenyl cellobioside (DNP-C) (data from Mewis 2016)
Table 3.2: Fosmid assembly statistics – generated using FabFos pipeline and Quast online tool
Table 3.3: Taxonomic assignment of ORFs across fosmids: ORF taxonomic annotation was done through the LCA algorithm implemented using NCBI taxonomy tree in Metapathways pipeline (in cases of multiple GH loci from individual fosmid annotations – the annotation for ≥50% of instances was reported)

List of Figures

Figure 1.1: Types of sustainability and meaningful sustainable development at their intersection (adapted from Rehmann 2010)
Figure 1.2: The bioeconomy circular by nature (BioVale 2015)
Figure 1.3: Flowchart showing pathways to products from biomass that are conventionally produced from petroleum based feedstocks (U.S. DOE. 2015)
Figure 1.4: Major current industry statistics for the Pulp and Paper industry in Canada (Source: IBISWorld open access content)
Figure 1.5: Research Overview
Figure 2.1: Dot plot to show abundance of enzymes discovered through functional screening in metagenomic libraries (Taupp et al. 2011) Copyright © 2011 Elsevier Ltd.
Figure 2.2: Structure of lignocellulosic biomass containing cellulose (composed of a β-1,4-linked chain of glucose molecules), hemicellulose (composed of various 5- and 6-carbon sugars) and lignin (composed of three major phenolic components) (Rubin 2008) Copyright © 2008, Springer Nature
Figure 2.3: Quality assessment pipeline for single amplified genomes (SAGs) and genomes from metagenomes (MAGs) (Bowers et al. 2017)
Figure 2.4: Phylosift workflow schematic (Darling et al. 2014)
Figure 2.5: Nx curve for Megahit assembled Pulp and Paper sludge (PPS) metagenome (Figure in collaboration with Connor Morgan-Lang)
Figure 2.6: Manhattan distance hierarchical clustering of CAZymes used in this study
Figure 2.7: Completeness and contamination estimation of the bins of PPS metagenome
Figure 2.8: Left to right (a) Taxonomic assignment distribution at phylum level for the bins as determined through Phylosift (b) Cumulative percentage distribution of CAZymes across the annotated bins using Metapathways
Figure 2.9: Qualitative assessment of taxonomy distribution of major phyla through different pipelines
Figure 2.10: Overall CAZyme family distribution in the PPS metagenome as annotated using Metapathways
Figure 2.11: Phylum level distribution of relevant Glycoside Hydrolase (GH) genes in pulp and paper sludge (PPS) metagenome (only phyla constituting > 90% of the taxonomy as annotated through Metapathways included)
Figure 2.12: Hierarchical clustering of the different microbiomes in this study based on relevant CAZy gene counts (tree distances calculated using Manhattan method)
Figure 2.13: Heat map showing differential abundance of CAZyme families across different metagenomic environments (The color coding represents the conversion of VST GH count values to an enrichment-depletion scale based on calculated z-scores for each family across different environments)
Figure 3.1: Production and functional screening of metagenomic libraries (Taupp et al. 2011) Copyright © 2011 Elsevier Ltd.
Figure 3.2: Metagenomic library and functional screening schematic for pulp and paper sludge (PPS) metagenome
Figure 3.3: The β-1,4-glycoside substrates of 6-chloro-4-methylumbelliferyl (CMU) used for functional screening of the pulp and paper sludge (PPS) metagenomic library clones (A) CMU-cellobioside (B) CMU-xylobioside (C) CMU-Mannoside (figures by Zach Armstrong)
Figure 3.4: FabFos pipeline schematic (https://github.com/hallamlab/FabFos)
Figure 3.5: Co-occurrence and co-localization of glycoside hydrolase (GH) genes as presented in literature (A) Heat map showing frequencies of co-occurrence of GH43 subfamily domains with major noncatalytic modules including CBM, carbohydrate binding module; DOC, cellulosomal dockerin domain; X19, conserved noncatalytic module with subfamilies clustered as per respective HMM profiles (Mewis et al. 2016). (B) Schematic representation of gram-positive polysaccharide utilization loci (gpPULs) concerned with xylan, pectin and arabinogalactan utilization (Harris et al. 2016)
Figure 3.6: Schematic of testing for different cellulolytic activities using the 6-chloro-4-methylumbelliferyl (CMU) glycoside of cellobiose and resultant products from enzymatic breakdown that results in fluorescent signal detection
Figure 3.7: Initial functional screening results with all clones in the pulp and paper sludge (PPS) metagenomic library
Figure 3.8: Second round of screening – validation of top-128 hits in triplicate and deconvolution of activity on CMU-cellobioside
Figure 3.9: Reproducibility test of top-29 hits using CMU-cellobiose alongside background control ePCC1FOS and positive control Celluclast enzyme cocktail; inset shows ePCC1FOS values on the two runs (error bars represent 5% error)
Figure 3.10: Percentage breakdown of Metapathways annotations of fosmid ORFs with focus on CAZy annotations
Figure 3.11: Fosmid linked genomic map - each line represents a fosmid clone with some fosmids represented by multiple contigs. Each predicted gene is represented by an arrow showing the direction of transcription. Grey links connect protein homologues with e-value ≤ 1e-10 (Figure in collaboration with Kateryna Ievdokymenko)
Figure 3.12: Taxonomic distribution across sequenced fosmids (Taxonomy assigned based on LCA assignment of taxonomy at phylum level represented in ≥50% of ORFs for each fosmid assembly)
Figure 4.1: Schematic of proposed circular, consolidated process using paper sludge feedstock as direct (smaller circle) and indirect (bigger circle) applications to the paper industry
Figure 4.2: Different bioprocessing strategies available for the conversion of lignocellulosic biomass to bioalcohols. Abbreviations: SHF, separate hydrolysis and fermentation; SHCF, separate hydrolysis and co-fermentation; SSF, simultaneous saccharification and fermentation; SSCF, simultaneous saccharification and co-fermentation; CBP, consolidated bioprocessing (Salehi Jouzani and Taherzadeh 2015)
Figure 4.3: Pulp and paper sludge (PPS) feedstock (left-right) (a) Wet PPS cakes obtained after filtration of water content (b) Dry PPS (constant weight) (c) Dried, milled and sieved PPS
Figure 4.4: Schematic of the colorimetric assay used for cellulolytic activity detection
Figure 4.5: Experimental set-up for bio-hydrolysis
Figure 4.6: Percentage composition of dried, milled PPS (left-right) (a) Klason method (b) CHN elemental analysis (5% error)
Figure 4.7: Titration of Celluclast FPU in the chito-oligosaccharide oxidase assay (net absorbance values after subtracting assay mixture blank)
Figure 4.8: Colorimetric assay - Columns 1-3 titration of Celluclast at different FPU loading; other wells show supernatant from hydrolysis of PPS at different solid loadings (fixed Celluclast loading 500mU) showing color development in contrast to blanks (D-F 10-12)
Figure 4.9: (Left-right) T0 and T0+24 hours incubation of fosmid whole cell lysates with filter paper substrate. Only wells spiked with Celluclast show activity and colour change
Figure 4.10: Absorbance readings at specific time intervals during incubation – representative results for two fosmid clone reactions - the cellulolytic activity signal is resulting only from Celluclast enzyme action with no contribution from fosmid whole cell lysates (‘+cel’ refers to spiking of reaction with 10mU of Celluclast, 5% error)
Figure 4.11: Protein content estimation of fosmid whole-cell lysate and supernatant fractions using BCA assay (50mL cultures; 5% error)
Figure 4.12: SDS PAGE visualisation of whole cell lysate and secreted protein fractions
Figure 4.13: Measurement of colorimetric signal after incubation with filter paper substrate for 72 hours (left-right) (a) colour development and (b) absorbance values at end of incubation period
Figure 4.14: Supplementing Celluclast enzyme mixture with fosmid protein fractions (left-right) and application to filter paper substrate (a) Replacement (1:1) with total protein content fixed at 35mg/g cellulose (b) Addition of protein fractions to give net double increase in total protein content (Celluclast + fosmid protein)
Figure 4.15: SDS-PAGE results of sub-cloned BL21 DE3 cell lysate and supernatant fraction with genes from fosmids P04P08 and P14I01 respectively (sup - supernatant fraction; CL - cell lysate)
Figure 4.16: Activity testing of sub-cloned cell lysate and protein fraction using CMU substrates (CMU-C2: Cellobioside; CMU-X2: Xyloside; CMU-Man: Mannoside; CMU-3X: mixture of all three substrates; readings at end of 4-hour incubation period with 5% error and inset shows deconvolution tests for P14I01 supernatant fraction)
Figure 4.17: Percentage conversion of glucan in PPS to glucose during the hydrolysis experiment
Figure 5.1: Material and energy-based revenue flow streams around the paper mill using a biorefinery for valorization of pulp and paper mill sludge
Figure A.1: Plasmid pET-21 a(+) circular map (Addgene database)

List of Abbreviations

AAI - Amino acid identity
AA - Auxiliary activities
BACs - Bacterial artificial chromosomes
BCA - Bicinchoninic acid
BSA - Bovine serum albumin
CAZyme - Carbohydrate active enzyme
CBM - Carbohydrate binding module
CBP - Consolidated bioprocessing
CE - Carbohydrate esterase
CFU - Colony forming unit
CMU - 6-chloro-4-methylumbelliferyl
COG - Clusters of orthologous groups
DNS - 3,5-dinitrosalicylic acid
EMIRGE - Expectation maximization iterative reconstruction of genes from the environment
ePGDB - Environmental pathway/genome database
GH - Glycoside hydrolase
GHK - Glucose hexokinase
GO - Glucose oxidase
GT - Glycosyl transferase
HMM - Hidden Markov model
HPLC - High performance liquid chromatography
IMG-M - Integrated Microbial Genomes & Microbiomes
JGI - Joint Genome Institute
KEGG - Kyoto Encyclopedia of Genes and Genomes
MIMAG - Minimum information about metagenome-assembled genome
MIMS - Minimum information about a metagenomic sequence
MIxS - Minimum information about any (x) sequence
MP - Metapathways
NCBI - National Center for Biotechnology Information
NGS - Next generation sequencing
nr - non-redundant
ORF - Open reading frame
OTU - Operational taxonomic unit
PCR - Polymerase chain reaction
PFGE - Pulsed-field gel electrophoresis
PL - Polysaccharide lyase
PMSF - Phenylmethane sulfonyl fluoride
PPS - Pulp and paper mill sludge
PULs -
Polysaccharide utilization loci
QIIME - Quantitative Insights into Microbial Ecology
RPKM - Reads per kilobase per million mapped reads
SDS PAGE - Sodium dodecyl sulphate polyacrylamide gel electrophoresis
SIP - Stable isotope probing
SIGEX - Substrate induced gene expression
TMP - Thermo-mechanical pulping
VST - Variance stabilizing transformation
XyGULs - Xyloglucan utilization loci

Acknowledgements

I would like to thank my co-supervisors, Dr. Steven Hallam and Dr. Vikramaditya Yadav, for their support and guidance in conducting this work. Both of them encouraged me to think creatively and undertake novel approaches to answer my research questions. Without their mentorship, this amazing journey at UBC and participation in the ECOSCOPE NSERC-CREATE training program would not have been possible. This experience has been extremely beneficial to me both personally and professionally. I am also very grateful to them and Dr. Heather Trajano for providing inputs on the thesis and directing me to relevant literature resources for reading. Thanks also to Dr. Dhanesh Kannangara for being a great mentor and for her constant support during my program here. I would also like to acknowledge Brian Houle, Vaughan Blackman and Prashanth Krishnamoorthy for their help with sampling of pulp and paper mill sludge. Big thanks to Dr. Hubert Timmenga for his constant mentorship throughout the project. A big part of my work was done in collaboration with the members of both the Hallam lab and Yadav lab (Biofoundry). Special thanks to Zach Armstrong, Connor Morgan-Lang, Ashley Arnold and Kateryna Ievdokymenko for their constant advice on experiments and help with data analysis and bioinformatics. I would also like to thank Dr. Aria Hahn for her guidance and mentorship and Dr. Keith Mewis for providing insights from his work. Thanks to Joe Ho and Jewel Ocampo for their constant support. Big thanks to Biofoundry lab members, Sonal, Roza, Julia, Dr. Protiva, Dr.
Sandip, Carmen and Benson for their advice and support. Thanks also to Andrew Wieczorek for his guidance during his time in the Hallam lab. I am also grateful to Dr. Cara Haney and Dr. Ranil Waliwitya (and the team at Active Agri Science) for coordinating my amazing internship experience.

Last but not least, I would like to thank all my friends in Vancouver and Canada for being my constant support system. They have made this place a home away from home. My biggest motivation is my parents and grandparents, who have always encouraged me to push the limits and try to excel in all my endeavours. I thank you all!

To my parents and grandparents!

Chapter 1: Introduction

1.1 Sustainability and the Circular Economy

There is a worldwide movement towards sustainability in almost all manufacturing and process industries, and it is hence no surprise that several emerging research areas within the field of Chemical Engineering are dedicated to improving, modelling and assessing their sustainability potential. Sustainability appears simple and straightforward in one of its most widely adopted definitions, “meeting the needs of the present without compromising the ability of future generations to meet their own needs” (United Nations General Assembly 1987), yet it is quite ambiguous in how it is approached. Often, sustainability is used interchangeably with environmental protection. While preservation of the environment is paramount, if it becomes the sole basis for revamping a manufacturing process to conform to sustainability norms, it can produce undesirable consequences. This has been observed as a major factor in the decline of the paper industry within Canada, where the increasing costs of environmental regulations levied on the industry are being passed on to customers and have decreased the industry’s global competitiveness (Bogdanski 2014).
True sustainable development can only be efficiently achieved at the intersections of economic, environmental and socio-political sustainability (Rehmann 2010) (Figure 1.1).

Figure 1.1: Types of sustainability and meaningful sustainable development at their intersection (adapted from Rehmann 2010)
[Figure labels: Environmental sustainability; Socio-political sustainability; Economic sustainability; Sustainable development]

To this end, circularization of the economy is being put forward as a potential framework for approaching sustainable development in industrial manufacturing. Within the circular economy model, the resource demands of global economic growth and the scarcity of environmental resources are reconciled by focusing on product reuse, remanufacture, and resource recycling. This is postulated to extract the maximum value from the current, linear economy through an approach combining industrial reworking, policy incentives and technological innovation. Based on the different definitions (Geng and Doberstein 2008; The Ellen MacArthur Foundation 2015; Bocken et al. 2016) and theoretical influences for the circular economy (McDonough and Braungart 2002; Benyus 2009; Stahel 2010) presented in the scientific literature, the following definition is adopted for this study:

“The circular economy is the material, energy and financial flow system that minimizes the resource input from the environment, prioritizes sustainable development, and minimizes the disposal of unused outputs from processes by employing recycle streams to valorize waste and maximize value addition to the global economic market.
The ultimate intention is to maximize economic efficiency of the linear economy by the reallocation of scarce environmental resources, utilizing all product streams by integration of each product flow into an alternate input stream to totality.”

Circularization requires coordination between industry, consumers, and government, and path dependency makes it difficult to implement theoretical process improvements. Due to the lack of global governance, the coordination of industries on a global scale is unlikely to occur anytime soon (Korhonen et al. 2018), placing an upper limit on the capabilities of circularization. In traditional consumption markets, the bulk of recycling and reuse responsibility is placed on the consumer, internalizing profits and externalizing negative effects like pollution. The European Union and the United Nations are attempting to implement the circular economy using a top-down approach, in which policy is expected to foment a culture of sustainability by forcing national governments to adhere to international objectives. According to Lieder and Rashid, a bottom-up approach, in which industry takes responsibility for its product life cycles, must also be applied simultaneously to achieve circularization (Lieder and Rashid 2016). This has indeed been reflected in the most recent circularization initiatives launched at the World Economic Forum (2018), which are increasingly centred on business and global supply chains.

There are, however, several technological bottlenecks that need to be overcome before circularization of manufacturing processes can be achieved. The technologies needed for efficient recycling are lagging behind circularization demands (Stahel 2016; Skene 2017). Integrated resource recovery is the disruptive paradigm shift at the core of the closed-loop, circular future.
However, splitting heterogeneous waste process streams so that they can be used viably as feedstocks is an expensive endeavor, and there is too little research in the chemical sciences to facilitate this (Stahel 2016).

The bioeconomy is the collection of industries that use renewable biomass as the primary feedstock and in which biotechnology and bioprocessing are major contributors to economic productivity (Bueso and Tangney 2017). Its ability to valorize renewable resources into a variety of products, and its technological flexibility compared to other manufacturing industries, allow it to be a catalyst in the success of the global circular economy.

1.2 The Bioeconomy

While the global circular economy community is actively searching for case studies to evaluate the economic feasibility and social impact of cyclic workflows, suitable models can potentially be provided by the bioeconomy (Carrez and Van Leeuwen 2015). Bioprocesses making up the bioeconomy are inherently circular and often operate in closed, feedback-driven loops, and the waste products used to supplement other process inputs are typically inexpensive and simple to collect (Figure 1.2). The scientific rationale for pursuing development of bioprocesses is formed not just by the global resource and energy paucity arguments but also by the ability of these processes to restore environmental damage and regenerate value from waste.

Figure 1.2: The bioeconomy, circular by nature (BioVale 2015)

The prohibitive energy requirements that offset the profit margin for other industries in the conventional circular economy are drastically reduced, as the labor is performed by microbial communities breaking down wastes into valorized products through metabolic pathways. While traditional recycling is energy intensive, the energy requirements of recycling through bioprocess engineering can be provided by renewable feedstocks.
Where many markets require extensive research to facilitate the circularization shift, the bioeconomy has experienced significant technology advancements, especially in synthetic biology and industrial biotechnology, that feed directly into making closed-loop bioprocesses possible. Biorefining, as defined by the International Energy Agency (IEA Bioenergy Task 42 – Biorefineries), is the “sustainable processing of biomass into a spectrum of bio-based products (food, feed, chemicals, and/or materials) and bioenergy (biofuels, power, and/or heat)” (Ree and Jong 2017). Theoretically, biorefining can result in production of all the consumer products that are currently made from petrochemicals (Figure 1.3). However, it must be acknowledged that the differences between biomass and crude oil as the central feedstock put an upper limit on the efficiency with which biomass can be converted into the targeted end-products. Fluctuations in global petroleum prices, together with the development of technologies to access shale/natural gas reserves more cheaply, make it difficult to justify the high investments required to develop technologies that refine or valorize waste or lignocellulosic biomass (Chen 2012). A mismatch in the required end-product yields, in particular, makes complete replacement with bio-based products economically unfeasible and not truly sustainable. In conclusion, a “one-dimensional departure from the fossil economy” to the bioeconomy is not possible (Schütte 2017).

Figure 1.3: Flowchart showing pathways to products from biomass that are conventionally produced from petroleum-based feedstocks (U.S. DOE 2015)

The scale of operation, feedstock constraints (including supply chain networks) and intensive water usage of bioprocesses are some of the key hindrances that deter their inclusion as strong models leading the transition to circularity in the global economy.
Several strategies have been proposed to overcome these hindrances in interdisciplinary literature across ecological economics, industrial biotechnology, chemical engineering and socio-politics. These are summarized in Table 1.1.

Table 1.1: Bioeconomy development strategies

| Strategy discipline | Proposed strategy | References |
|---|---|---|
| Industrial Biotechnology and Microbiology | Large-scale biocatalyst discovery and optimization of industrial hosts | (Pellis et al. 2018; Bio-TIC 2015) |
| Bioengineering and Bioprocessing | Continuous fermentations and whole-cell/immobilized biocatalysis development and scale-up | (Committee on Industrialization of Biology 2015; Sheldon and Woodley 2018) |
| Synthetic Biology | Metabolic engineering and protein engineering | (Bueso and Tangney 2017) |
| Engineering process design | Consolidated/closed-loop bioprocessing; effective utilisation of mixed waste feedstocks; modularization of unit processes | (Brown 2013; Lamers et al. 2016) |
| Engineering process economics | Focus on waste, abundant feedstocks, growing biomass on marginalised lands, resource recovery to bring down manufacturing costs | (Venkata Mohan et al. 2016) |
| Education and human resource development | Knowledge mobilization through open access of biological data, course development across disciplines to foster bioeconomy leadership | (European Commission 2017; El-Chichakli et al. 2016) |
| Policy and governance | Tax credit incentivization systems for renewable/waste feedstock usage and support to emerging research and industrial technology development; promoting technology transfer | (Lange et al. 2016; Pellerin and Taylor 2008) |
| Economics and Marketing | Targeting niche market segments for fine chemicals and compounds difficult to synthesize chemically | (Browne et al. 2011; Rabaçal et al. 2017) |
| Ecological Economics/Industrial Ecology | Industrial symbiosis networks/integrated biorefineries | (Sillanpää and Ncibi 2017; Philp and Winickoff 2017) |
| Social economics | Focus on small communities in rural agricultural areas with strong biomass supply network for biorefinery development and rural economy revitalization | (Lopolito et al. 2011; Owen 2018) |

Within the scope of this study, the focus will be on the technological aspect of biocatalysis, which has been identified as one of the most important means to take bioeconomy development forward.

1.2.1 Industrial biocatalysis

A strong push for industrial biocatalysis development arises from the advantages of biocatalysts: their high specificities (particularly enantiomeric, as with transaminases), the improvement in process economics owing to their low price in comparison to chemical catalysts, and their requirement of mild operating conditions, which leads to savings in energy costs (Sheldon and Woodley 2018). This has been especially beneficial for industries needing fine chemical synthesis, such as pharmaceuticals, and has accelerated the growth of industries that utilize specific enzymes in their production processes, viz. food and beverage processing (hydrolases), detergents and surfactants (proteases) and polymer synthesis (hydratases, peroxidases), among others (Patel et al. 2017). Some key areas of focus in industrial biocatalysis development have been immobilized enzyme systems (for ease of recovery and enzyme regeneration) and whole cell biocatalysis, which eliminates the cell lysis and enzyme recovery unit operations from production processes. However, there remains a vast expanse of biological information that is untapped for biocatalyst production.
Advances in synthetic biology, bioinformatics and data analytics have now made it possible to cost-effectively mine this large dataset for putative enzyme candidates with interesting properties such as thermostability, broad-range pH tolerance and promiscuous activities that can be applied to the conversion of several different types of substrates. Moreover, nature’s directed evolution, in which a unique environmental composition enriches particular microbial enzyme activities, can be translated to process applications for utilizing difficult-to-degrade substrates. This has been observed for lignocellulosic biomass hydrolysis through fungal or bacterial enzymes (Prasetyo et al. 2011; Kharayat and Thakur 2012), bioremediation applications for mine tailings (Nancucheo et al. 2017) and oil sands process water treatment using enrichment cultures derived from these environments (Rochman et al. 2017). These promising applications of biocatalysis directly feed into bioeconomy development. In addition to biochemical and synthetic biology tools, metagenomics is arguably the single most powerful recent area of research, having made possible high-throughput exploration and development of biocatalysts from environmental genomic information.

In the following sections of this study, a combined approach of metagenomics and consolidated bioprocessing is presented within the context of valorizing the solid waste stream from the pulp and paper industry, one of the largest and more prominent sectors of the bioeconomy, in a closed-loop, sustainable process. This is motivated by the identification of bottlenecks in the growth of the bioeconomy, by promising technological tools that can potentially be used to overcome them, and by the decline of the paper industry within Canada, which demands an economically promising repositioning solution.
The findings presented herein are relevant and applicable to the broader development of the global circular economy.

1.3 Rejuvenating the Paper Industry by Valorising Solid Waste Stream – Pulp and Paper Mill Sludge (PPS)

As per the nominal GDP data on the Canadian economy, the pulp and paper industry contributed only 0.45% of the overall GDP in 2016 (sourced from National Research Council Canada statistics data). From being the world’s leader in pulp and paper production, Canada is currently ranked 8th. IBISWorld key statistics for the industry (depicted in Figure 1.4) clearly indicate a decline phase, with the decline forecast to plateau in the next five years with negligible revival. The industry has taken this hard hit due to the diminishing demand for printed paper products (newspapers, magazines, directories) owing to the digital media revolution. Industry mills have steadily consolidated in recent years due to rising competitive pressures from digital media and foreign manufacturers of paper, packaging materials, hygiene products and other paper products. Pulp, especially Northern Bleached Softwood Kraft (NBSK), however, remains a competitive product of this industry, both domestically and in exported products.

Falling demand for newsprint and intensifying import competition have made it difficult for smaller mills to remain profitable in this industry, causing many operators to shut down entirely over the past five years and greatly affecting employment in small communities supported by paper mills. Globally, there is an annual decline of 2.4% for Canadian paper exports. At the same time, this industry is heavily regulated environmentally, which leads to compliance costs being transferred to consumers and limits the industry’s international competitiveness. There is hence a dire need to look at ways to strategically reposition this industry in a manner that can capture an emerging market.
Bio-refining to valorise its solid waste stream, pulp and paper sludge (PPS), presents one such compelling solution.

PPS is a solid by-product of the pulping and paper-making process, which is produced abundantly in quantities of about 300–350 million tons every year (Ioelovich 2014). Landfill disposal of PPS accounts for a significant share, around 60%, of the total wastewater treatment plant operating costs, creating both economic and environmental problems (Chen et al. 2014). Even in plants that incinerate PPS to generate heat energy from the organic content, the process becomes a net cost because the extremely high moisture content (>80%) makes drying of PPS prior to burning an energy- and cost-intensive process.

Figure 1.4: Major current industry statistics for the Pulp and Paper industry in Canada (Source: IBISWorld open access content)

The high cellulosic content of paper mill sludge (45-50%), when compared to other waste lignocellulosic biomass sources (Table 1.2), makes it uniquely suited for conversion to bio-products of value. Moreover, due to the thermo-mechanical nature of paper pulp processing, the cellulose fibres within paper sludge are more readily accessible for hydrolytic conversion to oligosaccharides than those in other typical lignocellulosic materials, fostering confidence in the economic viability of such conversions (Gurram et al. 2015). Several studies have already investigated the feasibility of converting paper sludge to the biofuel bio-ethanol (Marques et al. 2008; Kang et al. 2010) and other bio-products.
Table 1.2: Approximate content of cellulose, hemicellulose and lignin in different types of waste lignocellulosic materials (Prasetyo and Park 2013)

1.4 Research Overview

The goal of this research thesis is to examine pulp and paper mill sludge (PPS), an important waste lignocellulosic biomass material, both as a source of potentially novel glycoside hydrolase (GH) enzymes and as a feedstock in a consolidated bioprocess that generates a bioproduct and thereby valorises its cellulosic content. This investigation is undertaken with the broader objective of establishing proof-of-concept of a hypothetical closed-loop bioprocess that can potentially be retrofitted to the pulp and paper making industry and add value to its waste stream upon proper optimization and scale-up. The biocatalyst discovery is done using a novel metagenomic approach, in contrast to the traditional microbiological culturing reported previously in the literature (Maki et al. 2011). The proof-of-concept of the closed-loop bioprocess is demonstrated by hydrolysis of PPS, assessment of the hydrolysate and application of the hydrolysate to generate a bioproduct. Throughout this study the term paper sludge will refer to cellulose fibre rejects from the paper making process, sampled from the primary unit of the wastewater treatment operation. The specific research objectives are discussed in the thesis chapters as follows (Figure 1.5):
1. In Chapter 2 the overall metabolic potential of the pulp and paper metagenome is discussed with a focus on carbohydrate-active enzyme (CAZyme) activity using in silico methods. This chapter introduces metagenomics with a focus on the bioinformatics methods applicable to this study.
2. In Chapter 3 the functional metagenomics part of the study is detailed. The experimental design to mine biocatalysts with GH function from the PPS metagenome is presented.
3. Chapter 4 takes the metagenomic discoveries to bioprocess application. The feasibility of using PPS as a bioprocess platform feedstock is presented, and findings from tests assessing the hydrolysis potential of whole cell fosmid lysates are also discussed.

Figure 1.5: Research overview

There is an inherent, synergistic link between all the modules of research in this study. It reflects the overall motivation behind this work: to contribute towards development of a sustainable and circular bioeconomy. In silico findings from the microbiome are important for assessing the potential of an environment for mining biocatalysts. The functional metagenomic activity discovery in turn must be corroborated and supported by preliminary assessments using bioinformatics. Further, biochemical testing and experimental validation of the function of biocatalysts using substrates that closely model the real biomass feedstock are important both for bioprocess application and for annotation of novel enzyme families, which adds to the expansion of existing knowledge databases.

CHAPTER 2 In silico Analysis of the Paper Mill Sludge Microbiome

2.1 Background

2.1.1 Metagenomics – Unearthing Nature’s Biocatalytic Potential

Metagenomics is the isolate-independent study and analysis of microbial communities and the metabolic potential contained in the collection of their genomes across different environments (Handelsman 2004; Council 2007b; Thomas et al. 2012). It combines the methods of genomic sequencing, high-performance computing and classical microbiology to unravel microbial community structure and metabolism with genome scale resolution (Hawley et al. 2017). Given that the vast majority of the bacteria and archaea dwelling in nature have not been isolated under laboratory conditions, the metagenomics approach has enabled access to their genomic potential and gene expression patterns (Schloss and Handelsman 2005).
This, together with genetic engineering and synthetic biology, is leading to the transformation of tractable industrial strains with novel environmental genes to produce biocatalysts (Madhavan et al. 2017) with improved properties or novel metabolites through pathway engineering (Bao et al. 2017; Cuadrat et al. 2018). The findings generated through metagenomics have contributed significantly to the body of knowledge about the elegant mechanisms through which the “unseen majority” (Whitman et al. 1998) of life directs and guides several biogeochemical processes that are crucial for the survival of the “visible” life on earth (Falkowski et al. 2008).

2.1.1.1 Lignocellulose Hydrolysis and Valorization

One of the biggest areas of research in biocatalysis is lignocellulose breakdown. This is proportionately reflected in the number of metagenomic studies that have aimed to discover enzyme families and complexes for this objective. In fact, glycoside hydrolases account for the greatest number of enzyme families discovered through metagenomics in the past decade, as depicted in Figure 2.1.

Figure 2.1: Dot plot showing the abundance of enzymes discovered through functional screening in metagenomic libraries (Taupp et al. 2011) Copyright © 2011 Elsevier Ltd.

Lignocellulose is the most abundant natural renewable material on earth, while cellulose is the most abundant naturally occurring polymer (Salehi Jouzani and Taherzadeh 2015; Cai et al. 2017). It is currently the target feedstock for several biorefining processes that are increasingly looking to valorise second generation biomass, including harvest and forestry residues, underutilised crops, food and municipal waste, and paper and sawmill residues.
These processes seek to capitalize on the low cost of these feedstocks and their lack of competition with starch-based food resources, but suffer from a lack of technological readiness for economically feasible and efficient conversion of the fractions within the biomass: cellulose, hemicellulose and lignin. The structure of lignocellulose is quite complex: cellulose, which is the primary source of C6 monomers, is encapsulated within a reinforced structure of lignin and cross-linked through hemicellulose chains (depicted in Figure 2.2). Also, hydrolysis of biomass after lignin removal produces a mixture of C5 and C6 sugars, which are not very readily fermentable by industrial strains.

Figure 2.2: Structure of lignocellulosic biomass containing cellulose (composed of a β-1,4-linked chain of glucose molecules), hemicellulose (composed of various 5- and 6-carbon sugars) and lignin (composed of three major phenolic components) (Rubin 2008) Copyright © 2008, Springer Nature

The low cost coupled with the high cellulosic content of these biomass sources presents both an intellectual and an industrial incentive to discover enzyme mixtures capable of hydrolysing them and funnelling the carbon obtained into bioproducts of high value such as biofuels, biopolymers and other fine chemicals. In their comprehensive review on consolidated bioprocesses (CBP) for butanol production from lignocellulosic biomass, Salehi Jouzani and Taherzadeh (2015) describe almost twenty-five different activities that are needed to completely hydrolyse any given lignocellulosic biomass without any pre-treatment (Table 2.1). Engineering a single “biorefining organism” that encodes all these activities while also metabolizing the sugars and producing the desired bioproduct would impose an extremely heavy metabolic load on any single biological system. It is hence no surprise that thermo-mechanical or chemical processes for pre-treatment of the biomass are needed prior to hydrolysis.
These energy-intensive processes can potentially be replaced by a microbial consortium displaying all these activities in the following sequential steps (including fermentation to a bioproduct):
Step 1 - Secreting several glycoside hydrolase enzymes
Step 2 - Hydrolyzing both cellulose and hemicellulose to soluble sugars
Step 3 - Metabolizing soluble sugars
Step 4 - Producing bioproducts
Step 5 - Being highly tolerant of lignin-derived compounds and of the bioproduct produced.

Table 2.1: The most prominent enzyme activities needed to completely hydrolyse lignocellulosic biomass (adapted from Salehi Jouzani and Taherzadeh 2015)

| Cellulases | Hemi-cellulases | Ligninases | Pectinolytic and cell wall loosening enzymes |
|---|---|---|---|
| cellobiohydrolase | endoxylanase | lignin peroxidase | expansin |
| endoglucanase | β-xylosidase | aryl-alcohol oxidase | swollenin |
| β-glucosidase | acetyl xylan esterase | laccase | loosinin |
| phospho-β-glucosidase | glucuronyl esterase | glyoxal oxidase | cellulose induced protein |
| | arabinofuranosidase | cellobiose dehydrogenase | |
| | galactosidase | | |
| | glucuronidase | | |
| | mannanase | | |
| | xyloglucan hydrolase | | |

There is hence an immense opportunity to use metagenomics to find environmental genes with the desired properties, leading to the design of a “one-pot” consolidated bioprocess that can not only break down but also valorise lignocellulosic biomass.

2.1.2 In silico Approaches in Metagenomics - what all can the microbiome map tell us?

The fast-paced development of bioinformatics tools and pipelines specifically for metagenomic data, coupled with a great reduction in the cost of next-generation DNA sequencing methods, has led to an enormous wealth of information that is available for in silico analysis before functional screening approaches are used for biocatalyst discovery or biochemical assays are used to validate enzyme function.

There is a highly interdependent relationship between the functional gene diversity, microbial community structure and metabolic potential of any environment (Wang et al. 2017) that needs to be carefully navigated before making any interpretations of “who’s doing what?” in a given environment. Given the complex and mixed nature of metagenomic environmental DNA, bioinformatics tools become very important for making sense of potentially confounding noise and for better informing functional hypotheses and experimental design.

2.1.2.1 Bioinformatics Tools – making sense of the noise

The development of next generation sequencing platforms has made possible enough sequencing depth to confidently analyse complex microbiomes. This is done through either second generation (short reads of 150-400 bp; Illumina (Illumina Inc. 2015) and Ion Torrent (Ansorge et al. 2017)) or third generation (longer reads of 6-20 kb; PacBio (Rhoads and Au 2015) and Nanopore technologies (Bainomugisa et al. 2018)) technologies. These reads, however, must then be assembled or recruited into the genomes that comprise the metagenome to enable reconstruction of open reading frames (ORFs) on single or multiple gene loci, which eventually leads to analysis of the metabolic potential through pathway reconstruction (MetaCyc or Pathway Tools) or annotation of genes.
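To make the notion of an open reading frame concrete, the following minimal Python sketch scans the three forward reading frames of a sequence for stretches that begin with ATG and end at a stop codon. It is an illustration only and not the gene-prediction method used in this study: the function name `find_orfs`, the `min_codons` cutoff and the example sequence are hypothetical, and real gene finders also consider the reverse strand, alternative start codons and partial genes at contig ends.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=2):
    """Return (start, end, frame) tuples for forward-strand ORFs.

    An ORF here is a run of codons starting at ATG and ending at the first
    in-frame stop codon. `start` is the 0-based index of the A in ATG, `end`
    is the index just past the stop codon, and `frame` is 0, 1 or 2.
    `min_codons` is the minimum number of codons before the stop,
    counting the start codon (an illustrative cutoff).
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Walk the sequence codon by codon in this reading frame.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs


if __name__ == "__main__":
    # Toy sequence with one ORF (ATG AAA TTT TAG) in the third frame.
    print(find_orfs("CCATGAAATTTTAGGG"))
```

On fragmented short reads many genes lack an in-frame start or stop within a single read, which is one reason the text stresses assembly before ORF reconstruction.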
Metagenomic assembly is essentially the assembly of many different genomes at once and is hence more complex than single genome assembly (which is itself complicated by repetitive elements within genomes that make it difficult to assign reads to individual contiguous sequences). Several de novo assembly pipelines exist for this purpose, and most of them use an iterative approach in which k-mers (short sequences of fixed length) are used to construct de Bruijn graphs, which are then used to systematically construct larger contiguous sequences (contigs). The quality of an assembly is judged using statistics such as N50 and L50 (defined and reviewed elsewhere (Mäkinen et al. 2012)), the largest contig length, or the Nx curve, which depicts the proportion of the genome contained within specific contig lengths. These can be readily computed using QUAST, a quality assessment tool for genome assemblies (Gurevich et al. 2013). Some common pipelines are MEGAHIT, Minia, Meraga, A*, Ray Meta and Velour, which produce reproducible results (Sczyrba et al. 2017).

The metagenomic data can also be binned, or grouped into clusters of data arising from related genomic or taxonomic sequences, either prior to or after assembly (Roumpeka et al. 2017). This is done to better understand the phylogenetic distribution of the functional/metabolic clusters within the metagenome. The quality of the bins is determined by a suitable measurement of the contamination, or presence of reads from other taxonomic groups, within the bin (calculated as % contamination and strain heterogeneity index). Binning can be done in either a supervised (using a reference genome) or an unsupervised (without a reference genome) manner. The latter is better suited to metagenomic data, in which many different genomes might be present. Pipelines that include both genome and taxonomic binners include MaxBin, MetaBAT, MetaWatt, CONCOCT and Kraken, among others (Sczyrba et al. 2017).
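The assembly statistics and k-mer graphs described earlier in this section can be illustrated with a short Python sketch. It shows a helper that computes Nx and Lx (N50 and L50 when x = 50) from a list of contig lengths, and a toy construction of de Bruijn graph edges from the k-mers of a set of reads. Both functions, their names and the example values are assumptions made for illustration; they are not taken from QUAST or from any of the assemblers cited, which additionally handle sequencing errors, coverage and graph simplification.

```python
from collections import defaultdict


def nx_lx(contig_lengths, x=50):
    """Return (Nx, Lx) for a list of contig lengths.

    Contigs are sorted from longest to shortest and accumulated until they
    cover at least x% of the total assembly length. Nx is the length of the
    last contig added; Lx is the number of contigs needed.
    """
    if not contig_lengths:
        raise ValueError("no contigs supplied")
    if not 0 < x <= 100:
        raise ValueError("x must be in (0, 100]")
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)
    running = 0
    for count, length in enumerate(lengths, start=1):
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= total * x:
            return length, count


def de_bruijn_edges(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it.

    Every k-mer in every read contributes one edge from its prefix
    (first k-1 bases) to its suffix (last k-1 bases).
    """
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


if __name__ == "__main__":
    # Hypothetical contig lengths (bp) for a toy assembly.
    print(nx_lx([80, 70, 50, 40, 30, 20, 10]))
    # Two overlapping toy reads broken into 3-mers.
    print(de_bruijn_edges(["ATGGC", "GGCAT"], k=3))
```

Walking paths through such a graph is what allows overlapping reads to be merged into contigs; a higher N50 with a lower L50 indicates a more contiguous assembly, although neither statistic says anything about correctness.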
The distribution of taxa within a metagenome is also not easy to assess. Often, multiple-marker strategies, which might also include lowest common ancestor (LCA) tree construction using PhyloSift (Darling et al. 2014), taxator-tk (Dröge et al. 2015) or MEGAN (Huson et al. 2016), are used for determining taxonomic IDs rather than SSU or 16S rRNA recovery strategies alone, such as SILVA database alignment, EMIRGE (Miller et al. 2011) or amplicon sequencing (Pilloni et al. 2012).

Finally, gene annotation for the metagenomic data is done and, as with the other analyses, several approaches exist for this purpose. Some pipelines can predict genes from fragmented or short read data, such as MetaGeneAnnotator (Noguchi et al. 2008), Glimmer-MG (Kelley et al. 2012) and FragGeneScan (Rho et al. 2010), among others. However, for prediction of multiple gene loci, especially those contained within large-insert fosmid/cosmid metagenomic libraries or relevant to activities like cellulose hydrolysis that involve more than one catalytic domain family, it is important to assemble the data prior to annotation. This might also be very important for PCR or gene synthesis strategies for experimental validation of function, where recovery of complete gene sequences is necessary. There are also pipelines that combine gene prediction with annotation using databases of interest and that are tailored for metagenomic assemblies or short-read sequence data. MetaPathways (Hanson et al. 2014) is one such pipeline and has been used in this study.

2.1.2.2 Standardizing Data and Streamlining Analysis

The relative newness of metagenomics, together with the explosion of different pipelines to analyse environmental sequence information (as noted in the previous section), has resulted in considerable disarray in the forms of outputs and in how published results affect subsequent interpretations of the function of discovered gene products.
This has also partly contributed to inhibition of industry-wide adoption of metagenomically discovered biocatalysts given the time-scale for biocatalyst optimization under process conditions. This time-scale can be reduced at the 24  upstream R&D end through standardization of the way the metadata of metagenome is organized and how the recovered genomes are generated and annotated. Specifically, with respect to industrial biocatalysis and biorefining, standardization can help with: 1. Better comparison of functional and/or taxonomic enrichment across different environments as it relates to environmental metadata 2.  Standardization of metadata would enable stronger linkages of functional discoveries with taxonomic composition 3. It can also potentially inform synthetic biology or genetic engineering-based optimization of industrial hosts with environmental genes by optimizing codon usage or using native promoters.   To facilitate this, the Genome Standards Consortium, in continuation of their Minimum Information about Any (x) Sequence (MIxS) standards, have established Minimum Information about a Metagenomic Sequence (MIMS) and Metagenome-Assembled Genome (MIMAG) of bacteria and archaea standards (Bowers et al. 2017). These are recommendations to provide statistics for assembly quality, genome completeness, and a measure of contamination to assess genome quality prior to further downstream analysis and annotation. The detailed criteria and qualification for high, medium and low-quality genome drafts are provided in Tables 2.2 and 2.3 and depicted in Figure 2.3. 25   Figure 2.3: Quality assessment pipeline for single amplified genomes (SAGs) and genomes from metagenomes (MAGs) (Bowers et al. 2017) Table 2.2:  Genome Standards Consortium recommendations for metadata generation for metagenome assembled genomes (MIGS – Minimum information about a Genomic Sequence) (Bowers et al. 
2017)

Requirement | Field | Permitted values

General genome metadata currently not in MIGS
mandatory | analysis project type | single amplified genome (SAG); metagenome-assembled genome (MAG)
mandatory | taxa id | 16S rRNA gene; multi marker approach; other
mandatory | assembly software | tool used for assembly
optional | annotation | tool or pipeline used for annotation

Genome quality
mandatory | assembly quality | see the four categories below
    Finished: Single, validated, contiguous sequence per replicon without gaps or ambiguities with a consensus error rate equivalent to Q50 or better. Assembly statistics*.
    High Quality Draft: Multiple fragments where gaps span repetitive regions. Assembly statistics*. Presence of the 23S, 16S and 5S rRNA genes and at least 18 tRNAs.
    Medium Quality Draft: Many fragments with little to no review of assembly other than reporting of standard assembly statistics*.
    Low Quality Draft: Many fragments with little to no review of assembly other than reporting of standard assembly statistics*.
mandatory | completeness score | High Quality Draft: >90%; Medium Quality Draft: >50%; Low Quality Draft: <50%
mandatory | contamination score | High Quality Draft: <5%; Medium Quality Draft: <10%; Low Quality Draft: <10%
mandatory | completeness software | name of software
optional | number of contigs | value
optional | 16S recovered | yes/no
optional | 16S recovery software | name of software
optional | number of standard tRNAs extracted | value 0-21
optional | tRNA extraction software | name of software
optional | completeness approach | marker gene based; reference genome based; other
optional | contamination screen input | reads; contigs
optional | contamination screen parameters | ref db; kmer; coverage; ref db+kmer; ref db+coverage; ref db+kmer+coverage; kmer+coverage; other
optional | decontamination software | checkm/refinem; anvi'o; prodege; bbtools:decontaminate.sh; acdc; combination; other

Table 2.3: Metagenome assembled genome (MAG) metadata requirements (Bowers et al. 
2017)

Requirement | Field | Permitted values

MAG metadata
mandatory | bin parameters | homology search; kmer; coverage; codon usage; other combinations
mandatory | binning software | metabat; maxbin; anvi'o; concoct; groopm; esom; metawatt; combination; other
optional | reassembly post binning | yes/no
optional | mag coverage software | bwa; bbmap; bowtie; other

2.2 Materials and Methods

2.2.1 Sample Collection and Processing

The paper sludge was sampled from a BC coastal paper mill (48° 52' N, 123° 39' W) on 19 February 2016. The pulping process used at this facility is a combination of thermomechanical pulping (TMP) and the kraft process, and the major product is Northern Bleached Softwood Kraft pulp (NBSK). The sample in this study, however, was taken from the primary wastewater treatment reactor prior to mixing of the TMP and kraft process rejects. The sampling was done using three methods:

1. 8 × 15 mL cryo-temperature resistant falcon tubes were filled with liquid sludge, frozen, transported over dry ice and stored at -80 °C for metagenomic DNA extraction and fosmid library construction.
2. 3 × 5 gal (19 L) plastic jugs were filled with liquid sludge sampled from the primary wastewater reactor on site at the mill. The samples were chilled at 4 °C overnight prior to transport at ambient temperature and then stored at 4 °C.
3. Approximately 1 kg total weight of dewatered sludge cakes was sampled by filtering the liquid sludge using a vacuum-based Buchner funnel apparatus and stored at 4 °C.

The dewatered sludge cakes were dried to a constant weight by controlled drying for 72 hours at 45 °C in an incubator (redline, Binder) in accordance with standard procedures for preparation of wet biomass for compositional analysis (Hames et al. 2008). Detailed methods for compositional analysis are covered in chapter 4. 
2.2.2 Whole-metagenome Sequencing, Binning and Assembly

2.2.2.1 High Molecular Weight Genomic DNA Extraction

High molecular weight genomic DNA was extracted from sludge following a previously developed protocol for DNA extraction from forest soils and sediments (Lee and Hallam 2009), modified suitably for paper sludge. Briefly, the sludge was dewatered by centrifugation (3000 g, 5 min) and approximately 5 g was used for each DNA extraction. It was ground in liquid N2 to a powdery consistency with regular addition of denaturing solution. Hybridisation based extraction was carried out at 65 °C followed by centrifugation to remove solids. Chloroform-isoamyl alcohol extraction was performed repeatedly on the supernatant. Precipitates were removed from the collected aqueous phase, which was then concentrated using 10 kDa cut-off Amicon filters (Millipore) and buffer-exchanged into 1X Tris-EDTA (TE) buffer to a final volume of 250-500 µL. The DNA was then precipitated using 0.6 volumes of isopropyl alcohol, and the air-dried pellet was solubilised in 100 µL 1X TE through an overnight incubation at 4 °C. The integrity of the DNA was checked on a 0.8% agarose gel using λ / HindIII (Thermo Fisher) and 1 kb+ ladder (New England Biolabs, NEB) with 23 kb as the lower cut-off. The extracted samples were quantified for double-stranded (ds) DNA content using the Quant-iT Picogreen dsDNA kit (Thermo Fisher) protocol.

2.2.2.2 Sequencing

DNA library preparation and HiSeq sequencing were done at GENEWIZ (NJ, USA). For QC, the DNA sample was quantified using a Qubit 2.0 Fluorometer (Life Technologies, Carlsbad, CA, USA) and the DNA integrity was checked on a 0.6% agarose gel with 50-60 ng of sample loaded in the well. The NEBNext Ultra DNA Library Preparation kit was used following the manufacturer’s recommendations (Illumina). Briefly, the genomic DNA was fragmented by acoustic shearing with a Covaris S220 instrument. The DNA was end repaired and adenylated. 
Adapters were ligated after adenylation of the 3’ ends. Adapter-ligated DNA was indexed and enriched by limited cycle PCR. The DNA library was validated using TapeStation (Agilent Technologies) and quantified using a Qubit 2.0 Fluorometer. The sequencing libraries were multiplexed and clustered on one lane of a flowcell. After clustering, the flowcell was loaded on the Illumina HiSeq 4000 instrument according to the manufacturer’s instructions. The samples were sequenced using a 2x150 Paired End (PE) configuration. Image analysis and base calling were conducted by the HiSeq Control Software (HCS) on the HiSeq instrument. Raw sequence data (.bcl files) generated from the Illumina HiSeq were converted into fastq files and de-multiplexed using the Illumina bcl2fastq v. 2.17 program. One mismatch was allowed for index sequence identification.

2.2.2.3 Metagenome Assembly and Binning

Metagenomic assembly and binning were done in collaboration with Connor Morgan-Lang. Prior to assembly, the reads were quality filtered and trimmed to remove Illumina TruSeq paired-end sequencing adaptors using Trimmomatic-0.35 (Bolger et al. 2014), which resulted in removal of 0.23% of the total reads.

PrefixPair used: 'TACACTCTTTCCCTACACGACGCTCTTCCGATCT' and 'GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT'

The resulting reads were stored as interleaved paired-end read files and used for assembly with Megahit v1.0.6 (Li et al. 2015), using a minimum k-mer size of 27 increased in steps of 10 to a maximum k-mer size (kmax) of 147. Binning was done post-assembly using MaxBin 2.0 (Wu et al. 2016) with a minimum contig length of 2000 bp and a probability threshold of 0.95. The bins were further assessed for quality using the CheckM tool (Parks et al. 2015), which gives values for % completeness and % contamination of the bins. 
The former is calculated based on the number of marker genes present in the bin (from the set of 104 markers used for bacteria) and the latter is calculated using the number of copies of each marker gene (>1 copy implies contamination). The pipeline also generates a strain heterogeneity index, which assesses the amino acid identity (AAI) between multicopy marker genes to give an index value for how many of these genes arise from closely related strains above an AAI threshold, and so further informs strain-level contamination of the bins generated. The scale is from 0 (no strain heterogeneity) to 100 (all markers present >1 appear to be from closely related organisms).

2.2.3 Taxonomic Analysis

Both in-silico and experimental approaches were taken for the taxonomic analysis of the PPS microbiome, with the objective of comparing the differing pipelines qualitatively. Metapathways annotation of taxonomy using RefSeq database gene detection is discussed in section 2.2.4.

2.2.3.1 454 16S rRNA gene Pyrotag Sequencing

The sample preparation and analysis were done in collaboration with Ashely Arnold from the Hallam lab group. Briefly, the V6-V8 region of the 16S rRNA gene was amplified from the DNA template using the universal primer pair 926F (5′-AAACTYAAAKGAATTGRCGG-3′) and 1392R (5′-ACGGGCGGTGTGTRC-3′) (Engelbrektson et al. 2010) to generate molecular barcodes of microbial community diversity. Primers were modified to include 454 adaptor sequences, and barcodes were added to the reverse primers to allow multiplexing during sample sequencing. PCR amplification was performed in 50 μL reaction volumes in duplicate to minimize PCR bias. Each reaction well contained 5 μL Bishop 10X PCR buffer, 3 μL 25 mM MgCl2, 4 μL 2 mM dNTPs, 2 μL of forward and reverse primers, 2 μL 10 mg/mL BSA, 0.6 μL Bishop Taq DNA Polymerase (5 U μL−1), 33.4 μL nuclease-free water and 1-10 ng of the DNA or cDNA template. 
Negative controls were included for each sample reaction. The thermal cycler protocol started with denaturation at 95 °C for 3 minutes, followed by 25 cycles of denaturation at 95 °C for 30 seconds, annealing at 55 °C for 45 seconds, and extension at 72 °C for 90 seconds. A final extension of 10 minutes at 72 °C completed the amplification process.

Successful PCR products were pooled, cleaned using the NucleoSpin Gel and PCR Clean-up kit (Macherey-Nagel), eluted in 30 μL of 5 mM Tris buffer at pH 8.5, and quantified following the Quant-iT Picogreen dsDNA kit (Thermo Fisher) protocol. Each 16S SSU rRNA amplicon was pooled at 100 ng DNA prior to sequencing. Emulsion PCR and sequencing were performed at Genome Quebec (Montreal, Canada) on the Roche 454 GS FLX Titanium platform (454 Life Sciences) according to the manufacturer’s protocol.

A total of 6,137 SSU rDNA pyrotag sequences were processed using the Quantitative Insights Into Microbial Ecology (QIIME) version 1.9.0 software package (Caporaso et al. 2010). Sequences with fewer than 200 bases, ambiguous ‘N’ bases, or homopolymer runs were removed before a quality filtering step using the usearch quality filtering pipeline developed by Robert Edgar and implemented in QIIME. De novo and reference based chimeric sequences were identified via UCHIME and removed prior to taxonomic assignment. Non-chimeric sequences were clustered at 97% into operational taxonomic units (OTUs) with UCLUST, and representative sequences from each cluster were queried against the SILVA 128 ribosomal RNA database (Quast et al. 2013) using the Ribosomal Database Project (RDP) classifier to assign taxonomy. Singleton OTUs (represented by one read only) were omitted from downstream analysis to reduce over-prediction of rare OTUs (Kunin et al. 2010).
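The pre-clustering read filters and the singleton removal described above can be sketched in a few lines. This is a minimal illustration under stated assumptions and not the QIIME implementation: the function names, the homopolymer cutoff of 8 bases and the read-to-OTU mapping structure are hypothetical and introduced here. Only the 200-base minimum, the rejection of ambiguous 'N' bases and the omission of singleton OTUs are taken from the text.

```python
import re
from collections import Counter

MIN_LENGTH = 200      # from the text: reads with fewer than 200 bases are removed
MAX_HOMOPOLYMER = 8   # assumption: the text does not state the homopolymer cutoff


def passes_filters(seq, min_length=MIN_LENGTH, max_homopolymer=MAX_HOMOPOLYMER):
    """Return True if a read survives the length, ambiguity and homopolymer filters."""
    seq = seq.upper()
    if len(seq) < min_length:
        return False
    if "N" in seq:
        return False
    # Reject any run of a single base that is longer than max_homopolymer.
    pattern = r"(.)\1{%d,}" % max_homopolymer
    return re.search(pattern, seq) is None


def drop_singletons(otu_assignments):
    """Count reads per OTU and keep only OTUs represented by more than one read.

    otu_assignments maps read ID -> OTU ID (a hypothetical structure).
    """
    counts = Counter(otu_assignments.values())
    return {otu: n for otu, n in counts.items() if n > 1}
```

For example, `drop_singletons({"r1": "otu1", "r2": "otu1", "r3": "otu2"})` keeps only `otu1`, because `otu2` is supported by a single read.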
2.2.3.2 Expectation Maximization Iterative Reconstruction of Genes from the Environment (EMIRGE) based 16S Prediction

The raw reads of the PPS metagenome, obtained using the methods described in section 2.2.2.2, were analyzed for 16S rRNA gene reads using the EMIRGE software for reconstructing full-length ribosomal genes from short-read sequencing data (Miller et al. 2011), in collaboration with Connor Morgan-Lang in the Hallam lab. Briefly, this pipeline uses a probabilistic method based on the expectation maximization algorithm in an iterative manner to determine the abundance of taxa in the short-read dataset. The FASTQ input files used Phred+33 quality encoding and the SILVA 132 SSU rRNA database was used as the candidate SSU database. Maximum input read length was specified as 150, and the insert mean and insert standard deviation parameters were set to a value of 500 to allow all 16S rRNA gene read pairs to map.

2.2.3.3 Phylosift Annotation

Both the whole metagenome dataset and the bins were analyzed for taxonomy through a multiple marker gene detection approach using the Phylosift pipeline (Darling et al. 2014). This pipeline incorporates the LAST alignment algorithm, hidden Markov model profiling (HMMER) and the pplacer tool to automate phylogenetic prediction through identification of protein coding regions and RNA sequences. The outputs are presented in the form of dynamic Krona plots. The workflow is presented in Figure 2.4 below; the rRNA workflow shown is the one relevant to the objective of this study.

Figure 2.4: Phylosift workflow schematic (Darling et al. 2014)

2.2.4 Functional annotation of metagenome

Metapathways (MP) (v 2.5 with in-house updates) (Hanson et al. 2014), a modular annotation and analysis pipeline developed in the Hallam lab for environmental sequence information analysis, was used for the functional annotation of the assembled metagenome and bins. 
Gene prediction and annotation information generated from the pipeline is then used to construct an Environmental Pathway/Genome Database (ePGDB) comprising MetaCyc pathways using the Pathway Tools software and its pathway prediction algorithm “PathoLogic”. The results could be viewed interactively in the graphical user interface of the pipeline, and the ePGDB data structure of sequences, genes, pathways, and literature annotations was used for integrative interpretation.

The metagenome contigs were compared to the latest curated and annotated versions of the KEGG (Kanehisa et al. 2016), COG (Tatusov et al. 2003), RefSeq (O’Leary et al. 2016), MetaCyc (Caspi et al. 2016), and CAZy (Terrapon et al. 2017) databases using the LAST algorithm. The parameters used for open reading frame (ORF) prediction with Prodigal (Hyatt et al. 2010) were: minimum length of 60 bp, minimum bitscore of 20, minimum (B)LAST score ratio (BSR) of 0.4, and maximum e-value of 1 × 10^-6.

2.2.5 Mapping Reads to Assembly and Generating Normalized Abundance Values for Open Reading Frames (ORFs)

The Burrows-Wheeler Aligner (BWA), a software package for mapping low-divergent sequences (such as Illumina short reads) against a large reference genome, was used to map the PPS metagenome raw reads to the assembly (Li and Durbin 2010). This was done to get an estimate of the unmapped reads and to correct for abundance measures lost during assembly. The MEM algorithm was used because it supports longer sequences, ranging from 70 bp to 1 Mbp, and split alignment, which suited the PPS reads. The developers also recommend it for high-quality queries as it is faster and more accurate.
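To make the "estimate of the unmapped reads" concrete, the sketch below counts mapped and unmapped reads from alignment records in SAM format, the text format that bwa mem writes. It is an illustration under stated assumptions rather than the procedure used in this study: the function name and any input records are hypothetical. Secondary and supplementary records (SAM flag bits 0x100 and 0x800) are skipped so that each read is counted once, which matters here because split alignments are reported as additional records.

```python
UNMAPPED = 0x4          # SAM flag bit: read is unmapped
SECONDARY = 0x100       # SAM flag bit: secondary alignment
SUPPLEMENTARY = 0x800   # SAM flag bit: supplementary (split) alignment


def mapped_fraction(sam_lines):
    """Return the fraction of primary read records that are mapped.

    sam_lines is any iterable of SAM text lines (header lines start with '@').
    """
    mapped = 0
    total = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        flag = int(line.split("\t")[1])  # FLAG is the second SAM column
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # count each read only once
        total += 1
        if not flag & UNMAPPED:
            mapped += 1
    return mapped / total if total else 0.0
```

In practice the same number is usually read from an alignment summary tool; the point of the sketch is only to show which records enter the numerator and denominator.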
Reads Per Kilobase per Million mapped reads (RPKM) values were calculated for all predicted ORFs to generate abundance values normalised for sequencing depth and gene length. This was done through a custom script developed in the Hallam lab, run with the “multi-read” mode flag to account for the multi-reads generated by the bwa alignment. The results were fed into MP, and the GUI was used to view the ORF RPKM relative abundance interactively across different databases.

2.2.6 Qualitative comparison of Carbohydrate Active Enzyme (CAZyme) abundance across different environments

To qualitatively assess where the PPS microbiome CAZy profile lies in the spectrum of CAZy abundance across different waste-biomass environments, an in-silico comparison of CAZy family counts was done. Datasets were obtained from public repositories (NCBI and IMG-M/JGI) as well as from metagenomic data generated by members of the Hallam lab for different environments. The metadata and related information for these datasets are given in Table 2.7. This comparison was qualitative given the different pipelines through which assemblies were generated for the microbiomes before being fed into MP. Also, because of variations in sequencing depth between samples, a variance stabilizing transformation (VST) was applied to the data using the DESeq2 package (Love et al. 2014) in R. A hierarchical clustering based on Manhattan distances of z-scores was used. The Manhattan distance metric weights each additional gene equally regardless of abundance, whereas a Euclidean distance metric places lesser importance on additional genes; the Manhattan metric was therefore chosen.
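Two of the calculations named in sections 2.2.5 and 2.2.6 are simple enough to state as code. The sketch below gives the standard RPKM formula and contrasts Manhattan with Euclidean distance on a pair of profiles. It is not the Hallam lab script and does not reproduce the DESeq2 transformation or the clustering done in R: the function names and all numbers are made up for illustration, and the "multi-read" handling is not modelled.

```python
import math


def rpkm(reads_mapped, orf_length_bp, total_mapped_reads):
    """Reads Per Kilobase per Million mapped reads for a single ORF."""
    return reads_mapped * 1e9 / (orf_length_bp * total_mapped_reads)


def manhattan(profile_a, profile_b):
    """Sum of absolute differences: every family contributes linearly."""
    return sum(abs(a - b) for a, b in zip(profile_a, profile_b))


def euclidean(profile_a, profile_b):
    """Root of summed squared differences: large single differences dominate."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(profile_a, profile_b)))


# Toy ORF (made-up numbers): 500 reads on a 1,000 bp ORF, 10 million mapped reads.
example_rpkm = rpkm(500, 1000, 10_000_000)

# Two hypothetical environments scored on two CAZy families.
env_a, env_b = [0.0, 0.0], [3.0, 4.0]
example_manhattan = manhattan(env_a, env_b)
example_euclidean = euclidean(env_a, env_b)
```

With these toy profiles the Manhattan distance is the plain sum of the per-family differences, while the Euclidean distance is smaller because differences are combined through squares.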
Next, the relative abundance of each family was calculated, and a z-score (the number of standard deviations above or below the mean count of a CAZy family) was determined for each CAZy family in the different environments. These values were represented in a heat map implemented in R, with the mean of each GH family assigned 0 and z-scores on a scale of +4 to -4 used to colour code enrichment and depletion respectively.

2.3 Results

2.3.1 Metagenome Assembly and Annotation

The assembly statistics of the PPS metagenome were satisfactory, with 93.73% of reads mapping to the assembly. The average RPKM of the contigs across the sample was 0.71. The Nx curve for the assembly is shown in Figure 2.5 below and Table 2.4 gives the general assembly statistics. These values give confidence in further functional annotation of the pulp and paper sludge (PPS) metagenome.

Figure 2.5: Nx curve for Megahit assembled Pulp and Paper sludge (PPS) metagenome (Figure in collaboration with Connor Morgan-Lang)

Table 2.4: General Assembly Statistics

Parameter | Value
Total number of contigs | 807686
Total cumulative length | 839622602 bp
Minimum contig length | 200 bp
Maximum contig length | 626031 bp
Average contig length | 1040 bp
N50 | 1652 bp

2.3.2 CAZy Families Relevant to Plant Polysaccharide Degradation

The primary objective of this study was to find the most relevant enzymes that can hydrolyse the polysaccharide content of the waste biomass feedstock PPS. The majority of the solid content of PPS consists of rejects from the pulping process during paper making, i.e. the fibres that are deemed too short to be processed further. The pulp in turn is derived from the woody tree species used at the respective mills. The sample of PPS used in this study was derived from a coastal BC mill and is composed of a mixture of softwood tree species, typical of mills in this area, with only small adjustments from year to year. 
The major tree species comprising the source of pulp include Hemlock, Douglas-Fir, Cedar, Cypress and an SPF (Spruce-Pine-Fir) mixture (data from mill).

It is interesting to note the differences in the major polysaccharide components of the starting woody material and the sludge produced after the pulping treatment. The composition of softwood polysaccharides has been reviewed by Willför et al., whose analysis of the major tree species found the most prevalent components to be galactoglucomannans (mannans) and arabinoglucuronoxylans (xylans), both components of hemicellulose, followed by glucans (derived mostly from cellulose). However, given their existence within the amorphous hemicellulose fractions and their undesirability for the paper making process, the mannans and xylans are systematically removed during pulping, which leaves the cellulose-related polymers (glucans) as the major components of process pulp and consequently of sludge. This is reflected in the composition of PPS in this study (Figure 4.6, section 4.3.1) as well as in sludge used in other studies involving NBSK pulp rejects (Jackson and Line 1997; Mabee 2001; Rangu 2018).

Using this information about the target feedstock and environment, the CAZy families most relevant to this study were determined. These families have been shown to have activities on the substrates arabinogalactan, arabinan, cellulose (or glucan), glucomannan, glucuronoxylan, homogalacturonan backbone, rhamnogalacturonan backbone, xyloglucan and pectin. This list also represents some of the most well-characterized enzyme activities found in bacterial and fungal systems across different microbiomes, and these enzymes often act in synergy to carry out the hydrolysis of polysaccharides in woody biomass (van den Brink and de Vries 2011; Simmons et al. 2014; López-Mondéjar et al. 2016; Ndeh et al. 2017; Berlemont 2017).
Figure 2.6 shows a hierarchical clustering of the selected enzymes (using Manhattan distance and clustering based on average counts) derived from their variance stabilization transformed counts across the different microbiomes used in the comparative assessment in this study (section 2.3.4).

Figure 2.6: Manhattan distance hierarchical clustering of CAZymes used in this study

2.3.3 Binning the PPS metagenome

Binning the metagenome resulted in around 87 bins. The total size of the sample that was assembled into bins was 246 Mb, which represents around 30% of the total assembly. These bins were then checked for quality, and the bins for annotation of CAZy functions were selected based on their completeness and contamination scores. Bins with >90% completeness were considered for further analysis. Six of these bins had contamination >20%, which put them in the “medium-quality” draft standard under the MIMAG standards mentioned before (Bowers et al. 2017). The assembled bin draft quality ranges are indicated in Figure 2.7.

Figure 2.7: Completeness and contamination estimation of the bins of PPS metagenome

The legend in the figure represents taxonomy assignment through the CheckM tool. Given the nature of the metagenomic data, several bins were assigned taxonomy only at the “bacteria” domain level because the marker genes used by this tool to assign taxonomy were absent. Where these bins were included for further analysis, their taxonomy was determined using Phylosift, as shown in Table 2.5. In total these 19 high-medium quality bins represent only 9% of the assembled metagenome. 
Despite good closure (80-100%) of the bins in assignment to a dominant phylum (except in 3 cases), this low coverage of the metagenome by binning makes it difficult to use the approach to link taxonomic distribution to function, because there is a high probability that several CAZy families remain undetected by chance owing to insufficient coverage or completeness.

Table 2.5: Summary of phylogenetic (Phylosift) and functional (Metapathways) information of the medium-high quality draft genomes as assembled through binning of the PPS metagenome (*≤ 1 gene count)

Bin | Size (Mb) | Phylosift assigned phylum (%) | Est. Completeness (%) | Est. Contamination (%) | Strain Heterogeneity Index (0-100) | Major GH family annotations (% of total CAZy ORF annotations) | Function relevant to plant polysaccharide degradation
4 | 3.5 | Proteobacteria (88%) | 95 | 3 | 6 | GH 13 (8.3); GH 23 (7.4) | α-amylase; peptidoglycan lyase
5 | 5.2 | Proteobacteria (86%) | 98.6 | 7.7 | 2 | GH 13 (5.7); GH 23 (4.9) | α-amylase; peptidoglycan lyase
6 | 2.4 | Firmicutes (96%) | 91.9 | 2.6 | 0 | GH 1 (21); GH 13 (15); GH 65 (6) | β-glucosidase; α-amylase; phosphorylase
7 | 4.1 | Bacteroidetes/Chlorobi group (92%) | 97.8 | 2.4 | 0 | GH 13 (5.9); GH 2/3 (5) | α-amylase; β-galactosidase; β-glucosidase
16 | 5.5 | Bacteroidetes/Chlorobi group (100%) | 97.5 | 0.9 | 0 | GH 13 (8.2); GH 3 (4.1) | α-amylase; β-glucosidase
26 | 5.9 | Bacteroidetes/Chlorobi group (100%) | 96.3 | 23.6 | 14 | GH 13 (6.6); GH 3/5 (3.3) | α-amylase; β-glucosidase; endo-β-1,4-glucanase / cellulase
31 | 5.5 | Bacteroidetes/Chlorobi group (100%) | 99 | 1.7 | 20 | GH 13 (5.3); GH 29 (4); GH 3/5 (2.67) | α-amylase; α-L-fucosidase; β-glucosidase; endo-β-1,4-glucanase / cellulase
32 | 4.1 | Proteobacteria (74%) | 94.3 | 0.98 | 0 | GH 23 (10.5); GH 3/13/57 (3.5) | peptidoglycan lyase; β-glucosidase; α-amylase
34 | 3.3 | Bacteroidetes/Chlorobi group (100%) | 95.6 | 0.86 | 66.7 | GH 43 (8); GH 3/13 (5.3) | β-xylosidase; β-glucosidase; α-amylase
35 | 3.3 | Proteobacteria (100%) | 88.1 | 0.76 | 33.3 | GH 23 (3.1); GH 6/10/43 (2.1) | Chitinase; endoglucanase; cellobiohydrolase; endo-β-1,4-xylanase; β-xylosidase
38 | 3.3 | Chlamydiae/Verrucomicrobia group (57%) | 95.4 | 2.1 | 0 | GH 13 (8.5); GH 57 (4.2) | α-amylase
40 | 2.2 | Firmicutes (100%) | 94.2 | 1.45 | 66.7 | GH 9 (11.3); GH 5/13/43 (7) | Endoglucanase; endo-β-1,4-glucanase; α-amylase; β-xylosidase
42 | 3.3 | Bacteroidetes/Chlorobi group (100%) | 91.2 | 6.1 | 70.6 | GH 3 (6); GH 13 (5.1) | β-glucosidase; α-amylase
48 | 2.7 | Firmicutes (90%) | 96.8 | 6.45 | 0 | GH 13 (22); GH 3/30 (4.9) | α-amylase; β-glucosidase; endo-β-1,4-xylanase
49 | 3.4 | Bacteroidetes/Chlorobi group (100%) | 94.6 | 1.7 | 50 | No significant hits* |
50 | 4.5 | Bacteroidetes/Chlorobi group (100%) | 95 | 2 | 20 | GH 1 (5.1); GH 92 (3.4) | β-glucosidase
51 | 2.2 | Firmicutes (100%) | 93.6 | 2.1 | 100 | GH 13 (13.2); GH 3 (10.5) | α-amylase; β-glucosidase
53 | 3.8 | Planctomycetes (77%) | 95.6 | 17.4 | 4.8 | GH 31/15/29 (6.8) | α-glucosidase; glucoamylase; α-L-fucosidase
57 | 5.1 | Chlamydiae/Verrucomicrobia group (100%) | 89.7 | 2.3 | 25 | GH 43 (15.4); GH 2 (6.7) | β-galactosidase; β-xylosidase

Most of the bins were assigned to the phylum Bacteroidetes (44%) (Figure 2.8 (a)). These bacteria are known for their organic polysaccharide degradation activities and are considered model organisms for polysaccharide utilization loci (PULs) studies and annotations. Firmicutes represent another phylum well known for organic matter degradation and accounted for 16.7% of the total bin phylogeny. It is therefore not surprising that CAZy annotations representing the majority of the most relevant hydrolytic activities towards cellulose and hemicellulose degradation, including β-glucosidase (GH3, GH1, GH92), endo-β-1,4-glucanase (GH5, GH9), β-xylosidase (GH43) and endo-β-1,4-xylanase (GH30), are present in the bins assigned to these phyla. This indicates a phylogenetic correlation with these functions and validates previous observations from the literature (Terrapon et al. 2015; Ransom-Jones et al. 2017). 
GH13 was observed to be the CAZy family with the highest % abundance across ORFs annotated through CAZy, which is expected given the high abundance of this CAZy family within bacteria. However, this activity is not very relevant to cellulose or hemicellulose degradation. Overall, CAZy families with cellulolytic and hemicellulolytic activities dominated the CAZyme distribution across the bins, representing a cumulative total of 54% (Figure 2.8 (b)).

Figure 2.8: Left to right (a) Taxonomic assignment distribution at phylum level for the bins as determined through Phylosift (b) Cumulative percentage distribution of CAZymes across the annotated bins using Metapathways

2.3.4 The Paper Sludge Microbiome and Metabolic Potential

2.3.4.1 Taxonomic Distribution

Taxonomic assessment of metagenomic data or derived assemblies presents many challenges (Sedlar et al. 2017). These arise from a combination of sampling techniques, the nature of the short-read data produced by next-generation platforms, and the downstream bioinformatics processing pipelines used to extract marker genes to assign taxonomy (together with the lack of consensus for benchmarking these techniques) (Sczyrba et al. 2017). In this study, I sought to make a qualitative assessment of the different means through which one can arrive at an estimate of the taxonomic composition of a metagenome, and of how they differ from each other. Taxonomic binning results have been presented previously, but because only a small % of the total microbiome is represented in high quality bins, it is not possible to conclude the taxonomic distribution with confidence using bin distribution alone.

Therefore, the whole metagenome taxonomic distribution was assessed using different bioinformatics pipelines. The results were qualitatively compared based on the distribution of the major phyla (those that comprised >90% of the metagenome). For 454 pyrotag sequencing, 421 unique OTUs were identified through the QIIME (Caporaso et al. 
2010) pipeline using 5,818 QC reads clustered at 97%. Closer agreement was observed between the EMIRGE and 454 pyrotag sequencing results, and between the Metapathways and Phylosift results (Figure 2.9). However, it is intriguing to note that these pipelines differed in which phylum they identified as the major one. This disparity of taxonomic assignment across amplicon-based sequencing and NGS/shotgun sequencing pipelines has been noted before, and Tessler et al. claim that amplicon sequencing results might be more robust (Tessler et al. 2017). However, it is also well known that amplicon-based sequencing suffers from over-estimation of phyla with bigger genome sizes, and that PCR bias might skew detection and consequently the phylogenetic distribution. Amplicon-based methods can also miss rare taxa that other bioinformatics pipelines might capture, which could otherwise result in the discovery of new candidate phyla (Gonzalez et al. 2012).

Table 2.6: Taxonomic assessment comparison pipeline across the major phyla depicted in Figure 2.9

Pipeline | Input data | Reference | % assigned to other phyla (non-major)
454 Pyrotag sequencing and QIIME assessment | 16S pyrotag samples from PPS metagenome | (Caporaso et al. 2010; Pilloni et al. 2012) | 4
EMIRGE | Short-read sequence data from Illumina sequencing of whole PPS metagenomic DNA | (Miller et al. 2011) | 0.5
PhyloSift | PPS metagenome assembly | (Darling et al. 2014) | 10
Metapathways (LCA using NCBI taxonomy and RefSeq tree) | PPS metagenome assembly | (Hanson et al. 2014) | 2

Figure 2.9: Qualitative assessment of taxonomy distribution of major phyla through different pipelines

Data on the taxonomic profile of the PPS metagenome are not available in the literature. Some studies give data on isolated strains from these samples (Maki et al. 2011; Ghribi et al. 2016). 
However, investigations across similar environments, namely activated sludge from a municipal wastewater reactor (Yu and Zhang 2012) and a thermophilic cellulose-degrading microbial sludge consortium (Xia et al. 2013), have shown similar trends in the distribution of major phyla. The databases used for annotation in these pipelines can also explain the trend observed. Phylosift incorporates a multi-marker approach using HMMs, which is similar to the LCA algorithm implemented in Metapathways using information from the NCBI non-redundant (nr) protein dataset. Similarly, EMIRGE and QIIME both rely on SILVA SSU databases for taxonomic assessments. It is therefore important to be wary of these differences in methods in order to select the tool that best answers the question most relevant to this study. Increasing the sample size to include PPS from different coastal mills in the geographic region is needed to determine with confidence which taxonomic assessment is the most representative of this environment.

2.3.4.2 CAZy Distribution

The CAZyme annotation of the PPS metagenome was done using the Metapathways pipeline as explained before. A total of 32,232 unique CAZy ORFs were detected, representing around 2.69% of the total ORFs meeting QC thresholds in the pipeline. The overall distribution of the CAZy families is given below.

Figure 2.10: Overall CAZyme family distribution in the PPS metagenome as annotated using Metapathways

Glycoside hydrolases form the biggest fraction of the CAZy families for the PPS sample. It is interesting to note that within the GH fraction, the largest share of activity is related to hemicellulose and structural biomass (viz. pectin) degradation, while cellulosic biomass degradation forms only 11% of this fraction. It should also be noted that other CAZy families needed for the synergistic breakdown of hemicellulose in this environment are also present in the overall annotations. 
These results suggest interesting targets for functional screening for other enzyme activities (apart from GHs) that might be important for total biomass deconstruction. The taxonomic enrichment and distribution of the CAZy-GH families most relevant to plant polysaccharide degradation was plotted (Figure 2.11). This figure was generated using LCA taxonomy assignment of GH genes through Metapathways, including only those gene counts that could be assigned at phylum level. As noted before, GH 13 remained the most predominant family and was found across all major phyla. It is very interesting to note that the phylum Bacteroidetes has the greatest abundance of the most relevant GH families involved in cellulose and hemicellulose degradation. These include all 7 β-glucosidase families (GH1, GH3, GH5, GH9, GH30, GH39 and GH116), 6 of 14 cellulase families (GH5, GH6, GH9, GH10, GH12, GH26), β-xylosidase families (GH2, GH43) and endo-β-1,4-xylanase families (GH10, GH30). The same trend was also observed through binning and assignment of CAZy functionality to the taxonomy. The estimate of the Bacteroidetes phylum in the PPS metagenome is also the one on which the different pipelines agree most closely, which lends credence to the hypothesis that this phylum is the major GH activity provider. This observation might also indicate that the majority of the CAZy functionality of this metagenome is carried out by a taxonomic group that is not the most abundant (based on mapped reads), and that other taxa might depend on this group for their metabolism. The genes belonging to this phylum might be potential candidates for sub-cloning and optimization of expression of both cellulolytic and hemicellulolytic functions. This is also corroborated by findings from the literature that show a strong correlation between members of the Bacteroidetes phylum and GH activity. 
Berlemont and Martiny carried out a comprehensive study linking GH family profiles (functionally and taxonomically) across around 1,934 annotated metagenomes from 13 broadly defined ecosystems. They found that Bacteroidetes are degraders present in almost all types of ecosystems (Berlemont and Martiny 2016) and are also strongly associated with xylan degradation, which is the predominant type of GH activity found in the PPS environment.

Figure 2.11: Phylum level distribution of relevant Glycoside Hydrolase (GH) genes in pulp and paper sludge (PPS) metagenome (only phyla constituting > 90% of the taxonomy as annotated through Metapathways included)

2.3.5 Comparison to Other Environmental Microbiomes

In order to better understand the relevance of the CAZy functionality of the PPS metagenome in the broader context of environmental metagenomes, I did a comparative analysis of the distribution of the CAZy families relevant to plant polysaccharide degradation across them. The metadata for these environments are given in Table 2.7 and should be considered while analysing the CAZyme distribution.

Table 2.7: ORF statistics and metadata for metagenomes in comparative analysis

Metagenome ID | Metagenome description (metadata) | No. of ORFs predicted | No. of CAZy hits (% of total ORFs) | Reference
WL-0 | LFH (organic) harvested soil horizon, 0.3 cm depth, pH 5.87 | 160378 | 16113 (2.85) | (Wilhelm et al. 2017)
WL-1 | Mineral (Ae) harvested soil horizon, 6.2 cm depth, pH 6.13 | 714203 | 18627 (2.61) | (Wilhelm et al. 2017)
WL-2 | Mineral (AB) harvested soil horizon, 33.3 cm depth, pH 6.5 | 313767 | 25681 (2.37) | (Wilhelm et al. 2017)
WL-3 | Mineral (Bt) harvested soil horizon, 52.8 cm depth, pH 6.9 | 653135 | 52910 (2.34) | (Wilhelm et al. 2017)
WL-4 | LFH (organic) harvested soil horizon, 0.2 cm depth, pH 6.18 | 320397 | 8497 (2.65) | (Wilhelm et al. 2017)
WL-5 | Mineral (Ae) harvested soil horizon, 6.7 cm depth, pH 6.05 | 921308 | 86655 (2.62) | (Wilhelm et al. 2017)
WL-6 | Mineral (AB) harvested soil horizon, 21.7 cm depth, pH 6.27 | 477359 | 48868 (2.9) | (Wilhelm et al. 2017)
WL-7 | Mineral (Bt) harvested soil horizon, 59.7 cm depth, pH 6.76 | 868411 | 75413 (2.45) | (Wilhelm et al. 2017)
WL-8 | LFH (organic) harvested soil horizon, 0.2 cm depth, pH 6.03 | 532576 | 12552 (2.36) | (Wilhelm et al. 2017)
WL-9 | Mineral (Ae) harvested soil horizon, 4.2 cm depth, pH 5.16 | 153718 | 3379 (2.2) | (Wilhelm et al. 2017)
BEAVER | Beaver feces metagenome | 179831 | 6375 (3.54) | (Mewis 2016)
WWT-SLUDGE | Activated sludge wastewater microbial communities from Ann Arbor, Michigan, USA | 1104030 | 113964 (2.61) | (Stadler et al. 2017)
COALBED-1 | Coalbed methane (CBM) samples, sampled as pieces of core obtained by rotary drilling (cuttings from less than 1000 mbs); 182 m depth; 10-20 °C | 1370665 | 30088 (2.2) | (An et al. 2013)
COALBED-2 | Coalbed methane (CBM) samples, sampled as pieces of core obtained by rotary drilling (cuttings from less than 1000 mbs); 182 m depth; 10-20 °C | 987759 | 23766 (2.41) | (An et al. 2013)
SAKINAW_LAKE | Permanently stratified meromictic Sakinaw Lake | 5721366 | 82269 (1.44) | (Gies et al. 2014)
COMPOST | Rice straw-adapted microbial consortia enriched from compost ecosystems (Day 30, 29.2 °C) | 159941 | 16317 (2.74) | (Wang et al. 2016)
PPS | Primary wastewater reactor sludge sampled from BC coastal paper mill (19-20 °C, pH = 7.0 ± 0.02) | 1200162 | 32031 (2.67) | This study (Sharan et al. 2017)
ZOO_COMPOST | Time series study of thermophilic zoo composting facility (sampled on Day 67, 70.5 ± 4.8 °C, pH = 7.4) | 371820 | 10452 (2.81) | (Antunes et al. 2016)

All WL samples were from O'Connor Lake, BC (Wilhelm et al. 2017).

Figure 2.12 shows a hierarchical clustering of the environments on the basis of the variance-stabilized GH gene counts. As expected, the beaver metagenome assembly clusters away from all other environmental metagenomes owing to the fundamental differences between this system (mammalian xylotroph) and the other environments. This is also reflected in the CAZyme enrichment profile (Figure 2.13). Given that the beaver's diet comprises mostly woody biomass, which drives a dedicated microbial flora to assist with digestion, it is not surprising to see an enrichment of most major cellulolytic and hemicellulolytic activities (as described before). This environment also has the highest percentage of ORFs with CAZy annotations. The other environments have broadly similar overall percentages, and the PPS metagenome falls within this range (2-3%). It is encouraging to see the PPS metagenome clustering with the compost microbial enrichment and the zoo compost. Given the lack of literature on the metagenome of this environment, this gives further confidence in choosing these environments to explain the CAZy taxonomy trends observed for PPS earlier. However, it is surprising to see activated sludge clustering with the coalbed samples. The anaerobic nature of this sludge, leading to methanogenesis, might explain a potential correlation with the dominant phyla found in the solid coalbed methane core samples. The soil metagenomes (WL 0-9) cluster together with each other, with unexpected overlaps between organic and mineral horizons (which might be attributed to error in sampling methods), and the stratified meromictic lake sample also clusters away from most environments, as expected.

Figure 2.12: Hierarchical clustering of the different microbiomes in this study based on relevant CAZy gene counts (tree distances calculated using Manhattan method)

The overall profiles of CAZyme abundance of these environments corroborate well with the metadata. 
In fact, the gene functionality helps explain why particular environments cluster together. For instance, the unexpected overlaps between the mineral horizons (WL 1, 7, 9) and the organic horizons (WL 0, 4, 8) are explained by the enrichment of similar CAZyme activities. The other mineral horizons, as expected, lack these activities. Harvesting of soil and intermittent mixing of horizons during sampling might be driving these profiles. The clustering of the coalbed samples with the activated sludge microbiome is explained similarly. These metagenomes, along with the mineral soil horizons (WL 2, 3, 5, 6), represent some of the environments with the poorest CAZy enrichment profiles and therefore do not seem to be good candidates for functional profiling of CAZy activities. It is quite surprising to find that this particular activated sludge environment is not enriched in CAZy activities, given that this type of environment is regularly prospected for lignocellulolytic activity in literature.

A closer look at the CAZyme heat map across all these environments shows that the PPS metagenome is not exceptionally rich in the CAZy activities most relevant to plant polysaccharide degradation. However, it is interesting to note that the enzyme families GH43, CE7, GH30, GH115, GH113, GH52, GH27, GH35, GH51, GH106 and GH127, which represent most of the differentially abundant enzyme categories in this environment, are related to degradation of hemicellulose and structural compounds instead of cellulose. This reflects the CAZy profile as annotated through Metapathways and further emphasizes the need to revisit which enzyme activities should be targeted when mining this environment.
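The two calculations underlying the clustering tree and the heat map, Manhattan distances between environments and per-family z-scores across environments, can be illustrated with a minimal sketch. It assumes the counts have already been variance-stabilized, uses the sample standard deviation for the z-scores (an assumption, since the thesis does not state which was used), and runs on invented toy values with hypothetical environment names rather than data from this study. It is not the software used to produce Figures 2.12 and 2.13.

```python
from itertools import combinations
from statistics import mean, stdev

# Hypothetical variance-stabilized counts: one row per CAZy family,
# one column per environment. Invented values, NOT data from this study.
ENVIRONMENTS = ["ENV_A", "ENV_B", "ENV_C"]
VST_COUNTS = {
    "GH43": [8.0, 6.0, 4.0],
    "GH5": [5.0, 5.5, 9.0],
    "CE7": [7.0, 6.5, 3.0],
}


def family_z_scores(values):
    """Scale one family's counts across environments (enrichment/depletion)."""
    mu = mean(values)
    sd = stdev(values)  # sample SD; assumed, not confirmed by the source
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]


def manhattan(a, b):
    """Sum of absolute differences between two environment profiles."""
    return sum(abs(x - y) for x, y in zip(a, b))


def pairwise_distances(counts, names):
    """Manhattan distance for every pair of environments."""
    # Transpose rows (families) into one profile per environment.
    profiles = dict(zip(names, zip(*counts.values())))
    return {
        (a, b): manhattan(profiles[a], profiles[b])
        for a, b in combinations(names, 2)
    }


if __name__ == "__main__":
    for family, values in VST_COUNTS.items():
        print(family, [round(z, 2) for z in family_z_scores(values)])
    distances = pairwise_distances(VST_COUNTS, ENVIRONMENTS)
    # The closest pair is the first one an agglomerative method would merge.
    print(min(distances, key=distances.get), distances)
```

The distance dictionary is what a hierarchical clustering routine would consume, and the z-scored rows correspond to the colour scale of a heat map.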
Figure 2.13: Heat map showing differential abundance of CAZyme families across different metagenomic environments (the color coding represents the conversion of VST GH count values to an enrichment-depletion scale based on z-scores calculated for each family across the different environments)

The presence of hemicellulolytic genes specific for polysaccharides and oligosaccharides derived from softwood species is intriguing, given that sludge is not compositionally enriched in the hemicellulolytic fraction. However, on an "as received" basis, sludge is 80% moisture and 20% solids. These microbial communities are naturally present in this high-moisture environment. It is possible that these activities become enriched in the microbial communities because hemicellulose-derived sugars are present in dissolved form in the environment. Because cellulose is crystalline, its hydrolysis and uptake are metabolically demanding and do not represent an energetically favourable pathway. Its persistence in PPS solids after the harsh pulping process is proof of its recalcitrance. This can potentially explain the observation and suggests hemicellulolytic gene mining as the targeted approach for the PPS microbiome. However, an analysis of the liquid fraction of sludge is needed to validate this reasoning and should be included in further studies.

2.4 Conclusion

In-silico analysis of microbiomes prior to functional analysis represents a powerful pre-validation step to maximize efficiency in downstream experimental design and use of wet-lab resources. It might also be needed for quick testing of hypotheses around the functionality and taxonomy of metagenomes. However, the lack of benchmarking of bioinformatic tools and the variability in outputs across pipelines make informed interpretation of such results difficult. 
As much as possible, and especially for comparative analysis across microbiomes, it must be ensured that the draft assemblies of metagenomes conform to MIMAG standards and are processed equivalently. This can make these analyses more quantitative and less qualitative. Supplementing shotgun metagenomic DNA sequencing with metatranscriptomic sequencing can also lead to new insights about the actual functional distribution of genes and the phyla that are most active in driving the activities of interest to any study.

CHAPTER 3 High-throughput Biocatalyst Discovery from Paper Sludge Metagenome by Functional Metagenomics

3.1 Background

3.1.1 Functional Metagenomics - discovering novel industrial biocatalysts

Functional metagenomics, or the expression of mixed environmental microbial DNA in heterologous host systems (Lam et al. 2015), is the experimental complement to sequence-based metagenomics. Sequence-based metagenomics concerns itself with in-silico annotation of environmental DNA using known nucleotide or protein databases and can potentially lead to the discovery of novel gene families through clustering approaches. However, it is only through functional metagenomic methods that we can truly link a desired phenotype or function to novel gene discoveries and potentially generate tractable systems for biocatalyst or metabolite production (Cheng et al. 2017). This is done by cloning and expressing the environmental DNA of interest in a suitable host system to generate what is called a "metagenomic library", i.e. several thousand clones, each harbouring a fragment of the environmental DNA. This library is then screened for the desired function using different assay approaches that include, but are not limited to, fluorescence detection, colorimetric or chromogenic methods, or even selection for growth or inhibition of growth, preferably in high throughput (Streit and Daniel (Eds.) 2010). 
Functional metagenomic findings are very important for closing the gap between knowledge-based discovery and application to bioprocess engineering and industrial biocatalysis. Functional metagenomics also helps answer key questions in environmental microbial ecology by linking function to taxonomy, since function can be assessed independently of the ability to cultivate the microbes in a laboratory setting (Chistoserdova 2010). Detection of desired phenotypes in a functional metagenomic library requires a strong, hypothesis-driven approach. Correct screen design is arguably the single most important step in the process of discovering a gene or biocatalytic activity of interest and will be discussed in detail in section 3.1.2. However, prior to screen design it is also important to consider other factors that influence hypothesis formation around the functions that one can hope to discover from an environment. The sampling methodology and environmental metadata such as temperature, pH and the presence of target substrate molecules are very important in guiding the choice of screening targets, substrates and even host systems (Taupp et al. 2011; Thies et al. 2016). For instance, several glycoside hydrolases have been discovered in ruminant gut and faecal microbiomes, as the main dietary components of these animals are feedstocks rich in cellulose (Hess et al. 2011; Geng et al. 2012; Ilmberger et al. 2014). Similarly, biosurfactant-producing genes have been recovered from the microbiota of oil-contaminated environments, which are enriched for microbes that can break down oil molecules, and these genes have immense application in bioremediation approaches (Oliveira et al. 2015).

Apart from the metadata, as evidenced from chapter 2 of this study, in-silico metagenome analysis also plays an important part in guiding screen design as it provides information about the pathways and genes that might exist in the metagenome. 
Sequence-based metagenomics and functional metagenomics are almost mutually dependent with regard to experimentation and validation.

3.1.2 What goes into metagenomic functional screen design?

Screen design is the most important part of functional metagenomics. A consistent observation across both culture-dependent (enrichment cultures, directed evolution) and culture-independent (metagenomic screening) phenotypic screening experiments is that we get what we screen for. This places even more emphasis on the careful considerations that must go into screen design. The design principles for metagenomic functional screens usually follow the general guidelines used for high-throughput screening (HTS) applications (Acker and Auld 2014), as the preliminary screen of the whole metagenomic library often involves several thousand reactions that need to be conducted simultaneously and reproducibly. The following are some important factors that affect both screen design and outcomes and have been reviewed previously (Taupp et al. 2011; Armstrong et al. 2015) (Figure 3.1):

1. Environmental DNA: The quality and quantity of the isolated genomic DNA affect several aspects of the screening process. The quantity of DNA that is captured in the metagenomic library determines the coverage of the environmental genome in the library and can therefore bias the types of function that might be observed. Strategies like substrate-based enrichment of mixed environmental cultures (Wang et al. 2016; Arshad et al. 2017) [in some cases with the use of special growth chambers (Nichols et al. 2010)] and substrate-induced gene expression (SIGEX) (Simon and Daniel 2011) have been used to enhance the probability of discovering targeted gene products. Stable isotope probing (SIP), a common technique applied in microbial ecology studies to link taxonomy to metabolic function (Radajewski et al. 2000), is now being increasingly integrated with metagenomic studies to fractionate the environmental DNA that needs to be targeted for screening (Grob et al. 2015; Ziels et al. 2018). The DNA can then be size-fractionated to construct libraries harbouring different genes of interest. For low-biomass environments, especially extreme environments that are targeted for thermostable or broad-pH-range activities, it is challenging to obtain sufficient DNA for library construction. In these cases, techniques like whole genome (multiple displacement) amplification (Binga et al. 2008), linker-amplified shotgun libraries (LASLs) or expressed-LASL can be used to amplify extracted DNA with negligible bias and have been reviewed for phage metagenomic studies (Henn et al. 2010).

2. Vector and expression system: Plasmids can usually harbour inserts up to 15 kb in size, and while their high transformation efficiency greatly increases the number of available clones for screening, only single-locus gene functions can be tested. Other vectors such as fosmids and cosmids (≤40 kb) and bacterial artificial chromosomes (BACs) (>40 kb) can support screens for activities that depend on linked genes and multi-locus traits, and also enable better taxonomic annotation. Some systems like pCC1FOS™ (Jendrisak et al. 2002) and conditionally amplifiable BACs (Wild et al. 2002) also support inducible copy control, which is useful for obtaining high yields of cloned DNA of interest while maintaining a stable clonal population and minimizing expression of toxic genes (Simon and Daniel 2011). Selection of the expression system or host in which the environmental DNA is cloned is very important to ensure optimum translation of desired genes. 
E. coli strains remain the most frequent host. This choice is supported by in-silico analysis of transcriptional, translational and post-translational controls, together with promoter recognition and initiation factors, which suggests expression of approximately 40% of genes within a subset of 32 taxonomically diverse genomes, with wide-ranging variation in expression potential between genomes (7-73%) (Gabor et al. 2004). However, depending on the codon usage bias, which varies between species, other hosts like Bacillus (Steels et al. 2013), Caulobacter or Pseudomonas spp. (Craig et al. 2010) have been found to be better hosts for genes linked to other taxonomic groups. It is therefore important to consider the taxonomic linkages of target genes and to comparatively assess expression of vectors in different hosts (Lam et al. 2015) to obtain the clone with the best-optimized expression system, which would reduce downstream optimization of biocatalyst production.

3. Activity/function: Biochemically, there are several nuances to this factor. Enzymatic functions often belong to different classes, so it is misleading to use generic names like "metagenomic screen for cellulases": this is a very broad term covering different enzyme activity sub-classes that together are responsible for complete degradation of the cellulose polymer (Wilson 2009). There are dedicated databases for several important large enzyme families (covering specific activities or collections of different activities), and they are used for annotation and identification of potential novel enzymes in conjunction with functional metagenomics (Schomburg and Schomburg 2010). This also informs the structure of systematic screening approaches, in which a repertoire of different activities is first captured step-wise, followed by deconvolution of clones with very specific activities applicable to target substrates (Chen et al. 2016), as is presented in this study.

4. Substrate, co-factors and co-enzymes: Substrate specificity is one of the fundamental characteristics of enzymes. Substrate selection for the metagenomic screen should therefore be done very carefully to avoid confounding effects of substrate analogues or the presence of other reactive groups that might lead to promiscuous activity and discovery of false positives. Macdonald et al. have recently described a novel high-throughput assay for discovering broad-substrate specificity glycoside phosphorylases using inorganic phosphate in the enzyme assay medium (Macdonald et al. 2018). Several enzymes might also need co-factors or co-enzymes to be functional. Although many of these co-factors might be naturally present in the cellular medium, the specific requirements of the targeted enzymatic reactions should be assessed prior to screening (Bisswanger 2014). This can be done through an in-silico survey of known enzyme families with similar activities, and the screening medium or assay mixture should be supplemented as required.

5. General considerations for good functional screen design: Given the anticipated issues with effective expression of environmental genes in single host systems, sensitivity of activity detection is very important and can be achieved through substrate design. Related factors include the need for a high dynamic range, a broad pKa to allow screening at different environmental pH values, and a high signal-to-noise ratio during screening. Insensitivity to cellular contents and compatibility with typical cell lysis reagents are also important. For example, Chen et al. have demonstrated the synthesis of substrates based on fluorescent phenols of pKa < 7, such as halogenated coumarins, modified to be stable so that they can be used even at extended assay times without generation of significant background signal. These substrates have been used for screening in this work (Chen et al. 2016).
Figure 3.1: Production and functional screening of metagenomic libraries (Taupp et al. 2011) Copyright © 2011 Elsevier Ltd.

3.1.3 Carbohydrate Active Enzymes (CAZy) Database: Glycoside Hydrolase (GH) families and Polysaccharide utilisation loci (PULs)

The Carbohydrate Active enZymes (CAZy) database (http://www.cazy.org/) has been developed as a curated and annotated reference resource for determining CAZy activities and is increasingly being applied to metagenomic data (Armstrong et al. 2015). The polysaccharide degradation genes are classified as glycoside hydrolases (GH), polysaccharide lyases (PL), carbohydrate esterases (CE), carbohydrate-binding modules (CBM) and auxiliary activities (AA) (Terrapon et al. 2017). This database allows targeted searches alongside gene expression studies from functional screening of genomes and metagenomes, thereby allowing ready functional validation.

Polysaccharide utilisation loci (PULs) are minimally defined as a SusC/SusD gene pairing in close proximity to genes that encode carbohydrate-active enzymes (Grondin et al. 2017). Hypothetical genes predicted within a PUL have the potential to provide functional clues and inform downstream expression studies, especially those related to lignocellulosic bioprocessing. An automated Bacteroidetes PUL prediction pipeline and web interface, based on genomic context information and domain annotations from the CAZy database, has been developed by Terrapon et al. (Terrapon et al. 2015) using sequence information from the Bacteroidetes group, where PULs occur most commonly. It presents a powerful tool to investigate the occurrence of such orchestrated cellulolytic gene cassettes within metagenomes.

3.2 Materials and Methods

3.2.1 Fosmid Library Construction

The fosmid library was created following the protocols established previously (Taupp et al. 2009) using the Epicentre (now Lucigen) CopyControl™ Fosmid Library Production Kit with pCC1FOS™ Vector. 
The high molecular weight DNA to be used in fosmid library production was extracted from pulp and paper mill sludge (PPS) sampled for in-situ analysis, as described in sections 2.2.1 and 2.2.2.1. The DNA was purified from commonly present contaminants using CsCl density-gradient ultracentrifugation (Wright et al. 2009). The desired pure genomic DNA band was extracted using a syringe, and the ethidium bromide (EtBr) dye was removed using water-saturated butanol extraction. The DNA was further purified by concentration using a YM-30 Microcon (Millipore) unit. The insert DNA was then end-repaired using reagents supplied in the library production kit and size-separated on low-melting-point agarose by pulsed-field gel electrophoresis (PFGE). The gel was stained using SYBR Gold (Molecular Probes, Life Technologies), and a thin band corresponding to the size range for fosmid library creation (23-40 kb), as determined using control DNA, a mid-range PFGE marker and a λ/HindIII ladder, was excised. The gel was melted and treated with GELase (Epicentre) enzyme. The DNA was purified and concentrated to a volume of 14 µL using Amicon and Microcon filters in succession. The DNA was then ligated overnight with pCC1FOS vector and packaged using MaxPlax Lambda Packaging Extract. After incubation, an actively growing culture of the host strain E. coli EPI300-T1R was infected with the phage particles. Following an hour-long incubation at 37 °C, the cells were centrifuged, resuspended in 1 mL Luria-Bertani (LB) medium with 10% glycerol and stored as glycerol stocks at -80 °C. A titre test of phage infection efficiency (using the control DNA reaction) and a colony forming unit (CFU) count of the culture were also done simultaneously. 
The following formulae were used to determine the packaging efficiency (1), the CFU/µL of the glycerol stock (2) and the number of clones needed to make the pulp metagenome library (3):

\[ \text{packaging efficiency (pfu/}\mu\text{g DNA)} = \frac{(\text{no. of plaques}) \times (\text{dilution factor}) \times (\text{total reaction volume})}{(\text{volume of dilution plated}) \times (\text{amount of DNA packaged})} \tag{1} \]

\[ \text{CFU/}\mu\text{L of the glycerol stock} = \frac{\text{number of colonies counted}}{\text{volume of culture plated}} \times \frac{\text{volume of culture}}{\text{volume of the glycerol stock}} \tag{2} \]

\[ N = \frac{\ln(1-P)}{\ln(1-f)} \tag{3} \]

where P = desired probability fraction; f = proportion of genome contained in insert; N = no. of clones. For equation 3, in the case of a mixed environmental metagenome, f is indeterminate. The clones were picked and transferred into 384-well plates containing LB with 12.5 µg/mL chloramphenicol and 10% glycerol using a QPix2 robot (Genetix) and stored at -80 °C prior to screening. Figure 3.2 depicts the general workflow for metagenomic library construction and screening as adopted in this work.

Figure 3.2: Metagenomic library and functional screening schematic for pulp and paper sludge (PPS) metagenome

3.2.2 Functional Screening

The fosmid library was screened for cellulolytic activity using a mixture of model fluorogenic substrates developed previously (Chen et al. 2016) containing the functional group 6-chloro-4-methylumbelliferyl (CMU). The substrates were synthesized as described in the publication and were provided by Zach Armstrong from the Withers lab group at UBC. In the first round of screening, the entire library (viz. 15,000 clones) was screened with a mixture of the substrates CMU-cellobioside (CMU-C), CMU-D-mannoside (CMU-Man) and CMU-xylobioside (CMU-X2) (Figure 3.3) to capture the repertoire of catalytic activities of interest from the library. 
In the subsequent screening rounds, the cellulolytic activity was deconvoluted by using only the CMU-C substrate to screen and validate clones with specificity towards cellobiose (and/or cellulose, for which the substrate served as the model proxy).

Figure 3.3: The β-1,4-glycoside substrates of 6-chloro-4-methylumbelliferyl (CMU) used for functional screening of the pulp and paper sludge (PPS) metagenomic library clones (A) CMU-cellobioside (B) CMU-xylobioside (C) CMU-mannoside (figures by Zach Armstrong)

The clones from the working glycerol stock library were replicated and grown for 24 hours at 37 °C in supplemented LB media with 12.5 µg/mL chloramphenicol in a volume of 45 µL per well. Following OD600 measurement using a VarioSkan multi-mode plate reader coupled with a RapidStak device (Thermo Fisher Scientific), the lysis mixture containing the substrates was added to each well using a high-throughput liquid handler, giving a final volume ratio of 1:1. The lysis mixture was prepared in 50 mM potassium acetate buffer adjusted to pH 7.02, with 1% Triton X-100 as the lysing agent and 200 µM of each substrate. The plates were incubated overnight (16-18 hours), after which fluorescence was measured on the VarioSkan as before using the following parameters: excitation 365 nm, emission 450 nm, gain auto.

Clones whose relative fluorescence unit (RFU) values, expressed as robust z-scores, exceeded the specified σ-value cut-off in each round of screening were designated as positive clones. The σ-value cut-off varied with the dataset and was chosen based on a significant inflection in the data values or on the baseline value observed with the background control (EPI300 strain with empty pCC1FOS™ vector). The data analysis and visualisations were done using suitable packages in R version 3.4.3.
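The hit-calling step described above can be illustrated with a minimal sketch. The text does not define the robust z-score, so the sketch assumes the common median/MAD form, z = (x - median) / (1.4826 × MAD), with the scaling constant 1.4826 making the MAD comparable to a standard deviation for normally distributed data. The well names, RFU values and function names are hypothetical, and the thesis analysis itself was carried out in R, not Python.

```python
from statistics import median

# Assumed scaling so that MAD approximates the SD for normal data.
MAD_SCALE = 1.4826


def robust_z_scores(values):
    """Robust z-scores from the median and median absolute deviation."""
    values = list(values)
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        raise ValueError("MAD is zero; robust z-score is undefined")
    return [(v - med) / (MAD_SCALE * mad) for v in values]


def call_hits(rfu_by_clone, cutoff):
    """Return clones whose robust z-score meets or exceeds the cut-off."""
    clones = list(rfu_by_clone)
    scores = robust_z_scores(rfu_by_clone[c] for c in clones)
    return {c: s for c, s in zip(clones, scores) if s >= cutoff}


if __name__ == "__main__":
    # Hypothetical RFU readings for six wells; one clearly active clone.
    plate = {"A1": 100, "A2": 102, "A3": 98, "A4": 101, "A5": 99, "A6": 500}
    for clone, score in call_hits(plate, cutoff=10).items():
        print(clone, round(score, 1))
```

Using the median and MAD rather than the mean and standard deviation keeps a few very bright wells from inflating the spread estimate, which is why this form is often preferred for plate-based screens.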
3.2.3 Fosmid Purification and Sequencing

Following identification of positive clones after several rounds of screening and deconvolution, the fosmid DNA was extracted from the clones using the GeneJet Plasmid Miniprep kit (Thermo Fisher Scientific). E. coli (EPI300) genomic DNA was removed using Plasmid-Safe DNase (Epicentre). Plasmid DNA concentrations were determined using the Quant-iT PicoGreen Assay (Invitrogen). DNA was sent to seqWell Inc. (Beverly, MA) for next-generation sequencing (NGS) library preparation and short-read data generation using their plexWell Pro™ service platform, and sequencing was done on an Illumina MiSeq.

3.2.4 Fosmid Assembly and Annotation

The short-read DNA sequence data obtained from seqWell was checked for quality using FastQC version 0.11.6 (Andrews 2017). The assembly was done using the FabFos pipeline, developed by Connor Morgan-Lang and freely available from the Hallam Lab GitHub account (https://github.com/hallamlab/FabFos). The pipeline involves sequential QC of reads and trimming of the vector backbone (pCC1FOS™) using Trimmomatic (v 0.35) (Bolger et al. 2014), assembly of the trimmed fasta files into contigs using Megahit (v 1.0.6) (Li et al. 2015) and (optional) mapping of end-sequences to assembled contigs using BLAST (blastn) (Figure 3.4). It outputs the assembly statistics, including the major Nx statistics for the assembly, in a tsv file, using input supplied by the user in a minimum information for fosmid environmental data (MIFFED) file. Fosmid sequences were annotated for gene content using MetaPathways v2.5 (Hanson et al. 2014) and compared to the latest curated and annotated versions of the KEGG (Kanehisa et al. 2016), COG (Tatusov et al. 2003), RefSeq (O'Leary et al. 2016), MetaCyc (Caspi et al. 2016) and CAZy (Terrapon et al. 2017) databases using the LAST algorithm. Prodigal (Hyatt et al. 
2010) parameters used for open reading frame (ORF) prediction were: minimum length of 60 bp, minimum bitscore of 20, minimum (B)LAST score ratio (BSR) of 0.4, and maximum e-value of 1 × 10⁻⁶. Fosmid gene content was annotated using the results files generated by MetaPathways (MP). Taxonomic assignment was done using the lowest common ancestor (LCA) algorithm as implemented in MP with the NCBI tree, and the assignments of all ORFs found on a fosmid served as a marker for fosmid taxonomy.

Figure 3.4: FabFos pipeline schematic (https://github.com/hallamlab/FabFos)

3.3 Results and Discussion

3.3.1 Metagenomic Library Construction and Host Selection

A functional metagenomic library based on a fosmid-vector system was chosen to allow expression of the larger gene clusters required for more complex metabolic pathways (Martínez and Osburne 2013). Although the gene cluster capacity is limited to an upper threshold of ~40 kb by the λ phage packaging system (Epicentre 2010), it is a good compromise between ease of construction and capture of complex gene cluster expression. Phage T1-Resistant TransforMax™ EPI300™-T1R Electrocompetent E. coli was used as the host strain in this study. It has been optimized for heterologous gene expression, and the copy-control pCC1FOS™ system allows tight, inducible control over fosmid copy number, which is relevant to gene expression (Lucigen 2016).

Complex, multi-gene cluster expression systems are also important for the major objective of this study: discovery of glycoside hydrolase (GH) families. It has previously been shown through co-occurrence analysis in literature that some GH family genes (for instance GH43) are co-localised with carbohydrate-binding module (CBM) genes (Mewis et al. 2016) (Figure 3.5 (A)). Co-occurrence of multiple GH families within the same operon has also commonly been observed in previous fosmid-based functional screening works. 
It is therefore important to capture these genes together for correct interpretation of the activity of a gene locus. Well-characterized systems such as the PULs observed in the cellulolytic bacterial phylum Bacteroidetes, including Xyloglucan Utilization Loci (XyGULs) (Attia et al. 2018), also occur as co-localized gene clusters that encode the enzymes and protein ensembles required for the saccharification of complex carbohydrates (Grondin et al. 2017) (Figure 3.5 (B)). The same is true for bacterial cellulosome systems, which occur as multienzyme complexes comprising structural (scaffoldin) and enzymatic subunits (Artzi et al. 2017). These represent some of the most potent cellulolytic systems in the prokaryotic domain and are the prime targets for the functional screen in this study, thereby necessitating the use of fosmid-based libraries.

Figure 3.5: Co-occurrence and co-localization of glycoside hydrolase (GH) genes as presented in literature (A) Heat map showing frequencies of co-occurrence of GH43 subfamily domains with major noncatalytic modules including CBM, carbohydrate binding module; DOC, cellulosomal dockerin domain; X19, conserved noncatalytic module with subfamilies clustered as per respective HMM profiles (Mewis et al. 2016). (B) Schematic representation of gram-positive polysaccharide utilization loci (gpPULs) concerned with xylan, pectin and arabinogalactan utilization © (Harris et al. 2016)

In this study, high molecular weight DNA (23-40 kb), purified using caesium chloride density-gradient centrifugation, was subjected to random shearing and end-repair. It was then size-separated by pulsed-field gel electrophoresis, and the correct size fraction was excised, digested and ligated to the pCC1FOS™ vector. The ligation mixture was phage packaged, and EPI300™-T1R cells were transfected and plated for metagenomic library creation. 
Around 15,000 clones were generated, assumed to give sufficient coverage of the pulp and paper sludge metagenome, and stored as a master copy and a working stock. The working stock was subsequently used for high-throughput functional screening.

3.3.2 High-throughput Functional Screening

The fluorogenic substrates used in this study were selected for their high sensitivity, relatively stable half-lives and emission spectra, which allow rapid, high-throughput screening of environmental metagenomic libraries (Chen et al. 2016). This is especially important for increasing the rate of positive clone recovery from functional screening of metagenomic libraries. Pooling substrates also allows several activities to be surveyed in a single screening run (depicted in Figure 3.6 for the substrates used in this study), increasing efficiency and reducing materials cost.

Figure 3.6: Schematic of testing for different cellulolytic activities using the 6-chloro-4-methylumbelliferyl (CMU) glycoside of cellobiose, showing the products of enzymatic breakdown that result in fluorescent signal detection

To this end, we chose to pool three distinct substrates: the 6-chloro-4-methylumbelliferyl β-glycosides of cellobiose (CMU-C), xylobiose (CMU-X2) and D-mannose (CMU-Man). It had previously been established that the screening host (E. coli EPI300™) does not catalyse significant turnover of these substrates. This allowed simultaneous testing for endo-acting cellulases, xylanases, or the sequential action of exo-acting β-glucosidases. To the best of my knowledge, this is the only attempt so far at generating a metagenomic library for functional screening of GH family genes from the pulp and paper primary sludge metagenome. GH activity is one of the most sought-after activities in functional screening of metagenomic libraries, given its direct applicability to industrial lignocellulose biocatalysis (Armstrong et al. 
2015). The initial screening of 15,000 clones yielded around 384 hits with a robust z-score ≥10 (also referred to as the sigma value). This represents an extremely high hit rate of almost 1 in 39 clones, which is much higher than values reported for other microbiomes investigated for GH genes, such as an anaerobic bioreactor system (1 in 410) (Mewis et al. 2013), and those observed in functional screening of metagenomic libraries constructed in-house in the Hallam lab group using a previously developed high-throughput screen (Mewis et al. 2011) (Table 3.1). Even at the more conservative cut-off of 40 sigma (where the values show a significant inflection from the other robust z-scores, which cluster together and might confound results), the hit rate of 1 in 517 is well within the range typically observed for metagenomic environments.

Figure 3.7: Initial functional screening results with all clones in the pulp and paper sludge (PPS) metagenomic library

Table 3.1: Hit rates of different functional metagenomic libraries screened for glycoside hydrolase genes using a soluble, chromogenic model compound, 2,4-dinitrophenyl cellobioside (DNP-C) (data from Mewis 2016)

Library | Clones | Hits | Hit rate (1 per x clones)
Forest soils | 115584 | 194 | 596
Hydrocarbon | 121728 | 193 | 631
Marine | 53760 | 30 | 1792
Beaver | 44928 | 184 | 244

From the initial screen, the top 128 clones were picked for validation in triplicate in a 384-well plate format and for deconvolution of activity on CMU-cellobiose, which is the closest model proxy for cellulose. Given method limitations such as plate-edge effects and batch-to-batch variation between screening runs, together with the requirement for specific activity on cellobiose, this second round of screening led to a marked decrease in the number of significant hits (evaluated at sigma value ≥10), giving a subset of ~29 clones that were investigated further (Figure 3.8).
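The hit-rate arithmetic above can be reproduced with a short sketch. The robust z-score function assumes the common median/MAD definition with the 1.4826 consistency constant; the exact formula used in the screening analysis is not stated in this section, so it should be read as illustrative only. The function names are hypothetical, and the clone and hit counts are those quoted in the text and in Table 3.1.

```python
from statistics import median


def robust_z_scores(values):
    """Robust z-scores, assuming z = (x - median) / (1.4826 * MAD)."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        raise ValueError("MAD is zero; robust z-score is undefined")
    return [(v - med) / (1.4826 * mad) for v in values]


def clones_per_hit(clones, hits):
    """Hit rate expressed as 'one hit per x clones'."""
    return clones / hits


# Clone and hit counts quoted in the text and in Table 3.1.
screens = {
    "PPS (sigma >= 10)": (15000, 384),
    "Forest soils": (115584, 194),
    "Hydrocarbon": (121728, 193),
    "Marine": (53760, 30),
    "Beaver": (44928, 184),
}

for name, (clones, hits) in screens.items():
    print(f"{name}: 1 hit per {clones_per_hit(clones, hits):.0f} clones")
```

With these counts the PPS library works out to roughly one hit per 39 clones, against one hit per 244 to 1792 clones for the libraries in Table 3.1.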
Figure 3.8: Second round of screening: validation of top-128 hits in triplicate and deconvolution of activity on CMU-cellobioside

These top-29 clones were then re-screened alongside the background control strain EPI300™ containing the empty pCC1FOS™ vector, and the commercial Celluclast (Novozymes) enzyme cocktail as a positive control, to check potency of activity on CMU-cellobiose. The controls define the spectrum of activity on a relative scale, with the background strain at the low (insignificant) end and Celluclast (Hu et al. 2011) at the high end. Testing was done at two distinct time points to assess reproducibility, and it was gratifying to observe that, within a 5% error, all but one of the 13 significant hits were recovered. It was also intriguing to see that, at the microlitre scale, several clones outperformed Celluclast at 0.5 and 1 mU of enzyme loading.

Figure 3.9: Reproducibility test of top-29 hits using CMU-cellobiose alongside background control ePCC1FOS and positive control Celluclast enzyme cocktail; inset shows ePCC1FOS values on the two runs (error bars represent 5% error)

This step-by-step approach of eliminating hits to select clones was taken to ensure rigour in choosing clones for direct application to cellulose hydrolysis and downstream bioprocess development for valorization of PPS, the ultimate objective of this work. The screen used is reproducible (within the limits of variation in dynamic biological systems) and sensitive, enabling a high recovery rate in GH gene discovery (Figure 3.9).

3.3.3 Fosmid Sequencing and Annotations

The top-13 clones recovered in the section above were sent for next-generation sequencing (NGS) library preparation and sequencing on the Illumina MiSeq platform. Of the 13 fosmids sent, 11 had sufficient coverage to give confidence in assembly and further analysis. The assembly statistics generated using the FabFos pipeline are indicated in the table below.
Fosmid ID is designated as the library plate serial number followed by the well serial number; for example, P04P08 is the fosmid in well location P08 of PPS library plate (PPSLIBM) ID 04. The assemblies were largely successful in producing complete contigs and retaining the major portion of the sequence on the largest contig (with the possible exception of sample ID P13D18), giving confidence in using the assembled contigs for functional annotation and taxonomic assignment of the genes in the fosmid insert.

Table 3.2: Fosmid assembly statistics, generated using the FabFos pipeline and the Quast online tool

Fosmid ID | Coverage | Length of largest contig (bp) | No. of contigs >1000 bp | Cumulative length (bp) | N50
P04P08 | 230 | 28534 | 2 | 37020 | 28534
P08H17 | 535 | 23747 | 2 | 35719 | 23747
P13D18 | 488 | 10810 | 7 | 25611 | 10810
P14I01 | 435 | 35484 | 3 | 41268 | 39228
P14K01 | 389 | 33549 | 1 | 33549 | 33549
P22O04 | 643 | 35319 | 3 | 43400 | 43400
P28E08 | 440 | 32830 | 1 | 32830 | 32830
P29G23 | 528 | 40754 | 1 | 40754 | 40754
P31H11 | 300 | 31614 | 1 | 31614 | 31614
P37P01 | 591 | 23101 | 2 | 31207 | 23101
P38I02 | 244 | 21955 | 3 | 43708 | 21955

MetaPathways v2.5 (MP) (Hanson et al. 2014) with in-house updates was used to annotate the concatenated input fasta file comprising all fosmid contigs against curated and annotated databases, as described in materials and methods. The focus was to assess CAZy annotations to validate the empirical observations of functional activity. For taxonomic assessment of genes, all fosmid fasta files were also run separately on MP, including RefSeq annotation for LCA taxonomic tree construction of annotated genes. The results presented in Table 3.3 therefore differ from Figure 3.11 because gene annotation frequencies change between the different runs (concatenated vs separate). A total of 327 translated ORFs were predicted across all fosmids following QC, with 39 hits in the CAZy database (8%). All of them, save one hit in a carbohydrate esterase (CE) family, belonged to glycoside hydrolase (GH) families (Figure 3.10).
It was also observed that the majority of hits across the COG and KEGG databases (which yielded the most ORF annotations) were related to carbohydrate transport and metabolism. A comprehensive interpretation of all these database annotations is beyond the scope of this study. However, these annotations foster confidence in the possibility of discovering genes that would not only enable hydrolysis of lignocellulosic polymers but also help with heterologous uptake of these polymers, specifically carbohydrate ABC transporter membrane proteins, sugar transporter permeases, glycosylation-related proteins (sugar kinases) and accessory activities such as sugar isomerases. Lignocellulose hydrolysis requires synergy between different activities, as discussed before, and these genes can be used to engineer a biological system that can lead to consolidated bioprocessing.

Figure 3.10: Percentage breakdown of MetaPathways annotations of fosmid ORFs with focus on CAZy annotations

Figure 3.11: Fosmid linked genomic map. Each line represents a fosmid clone, with some fosmids represented by multiple contigs. Each predicted gene is represented by an arrow showing the direction of transcription. Grey links connect protein homologs with e-value ≤1e-10 (Figure in collaboration with Kateryna Ievdokymenko)

The MetaPathways output was then fed into a custom Python script developed by Ivan Minevskiy in collaboration with Kateryna Ievdokymenko (https://github.com/minevskiy/bioinformatics/tree/master/genomic-map-with-links) to visualise ORFs and determine the protein homology between the gene sequences of the different fosmid clones using BLASTP 2.2.2.28 (e-value ≤1.00e-10). The output was retouched and coloured using Adobe Illustrator CS 6 (Figure 3.11) to depict the different glycoside hydrolase family genes. Each fosmid gene cassette (shown in Figure 3.11) harbours at least one GH locus, which explains and validates the functional screening.
The fosmid gene cassettes were quite distinct from each other. For fosmids where only one or two significant GH ORFs are present, it is simple to draw conclusions about the GH family responsible for the observed activity. This is the case for fosmids P04P08, P14I01 and P08H17, and interestingly the first two were the hits with the highest RFU signal during screening (Figure 3.9). For the other fosmids, however, interpreting which gene locus or loci are active is more convoluted, given the co-occurrence and co-localization of several GH families with each other. GH 3, GH 43 and GH 10 are, equally, the most abundant GH families on the fosmids. No endo-glucanase groups were detected on these fosmids, potentially indicating that the observed activity was a result of exo-acting glucanases or β-glucosidases. However, endo-xylanase groups were detected (GH 43, GH 10, GH 30) and were often found to be co-localized. These families have commonly been observed to co-occur on PULs, especially XyGULs (Grondin et al. 2017; Attia et al. 2018), across different groups of bacteria. For example, several groups within the phyla Bacteroidetes (the model group for PUL studies) (Larsbrink et al. 2014) and Firmicutes (Harris et al. 2016), which are well-defined organic matter degraders, contain these XyGULs and can degrade the xylan polymers in hemicellulose. From the perspective of biological hydrolysis, this is an important function: it makes cellulose more accessible and generates reducing C5 and C6 sugars that might be used further downstream as carbon sources. For these loci, synteny was also observed with the GH 3 and GH 62 families (Figure 3.11), and the former might explain the activity observed on the CMU-C substrate.

GH3 is a large family that includes exo-acting β-D-glucosidases and β-D-glucan glucohydrolases, α-L-arabinofuranosidases, β-D-xylopyranosidases, N-acetyl-β-D-glucosaminidases, and N-acetyl-β-D-glucosaminide phosphorylases.
The enzyme activities span cellulosic biomass degradation, plant and bacterial cell wall remodelling, energy metabolism and defence mechanisms. The enzymes are also known to be promiscuous in action and have dual or broad substrate specificities with respect to monosaccharide residues, linkage position and chain length of the substrate (Fincher et al. 2017).

The taxonomic assignment of the fosmid genes is presented in Table 3.3. Although the fosmid insert ideally contains sufficient taxonomic resolution to constrain the taxonomy of donor genotypes, functional screening often recovers active clones with mixed heritage, consistent with horizontal gene transfer in the environment. Especially for large groups like Proteobacteria, confidently assigning taxonomy to the fosmid is difficult, as observed in this study. Some of the genes of interest that could not be assigned through the LCA algorithm, which uses RefSeq and NCBI taxonomic tree information within MP, are indicated below. These need further investigation and are candidates for novel phylogenetic placements or candidate phylum discovery. This can be done through MEGAN (Huson et al. 2016), MLTreeMap (Stark et al. 2010) or other suitable software tailored for metagenomic sequence information (including both raw sequence and assembled reads). The individual protein sequences can also be searched for homology-based taxonomic assignment in protein-specific databases such as Pfam (Bateman et al. 2004) or the Universal Protein Resource (UniProt) (Wu et al. 2006). The increased presence of co-localised XyGULs on the fosmid sequences with ambiguous taxonomic results may also point to assembly artefacts that hinder taxonomic assignment through missing sequence information. These loci could be used as a composite query in the PUL database (Terrapon et al. 2015) to predict the taxonomy of these conserved regions.
The ambiguity also presents an opportunity to investigate in detail the GH families that are missing taxonomic assignments and to place them in enzyme-family-specific trees created using RAxML (Stamatakis 2014) or other suitable software.

Table 3.3: Taxonomic assignment of ORFs across fosmids. ORF taxonomic annotation was done through the LCA algorithm implemented using the NCBI taxonomy tree in the MetaPathways pipeline (in cases of multiple GH loci from individual fosmid annotations, the annotation for ≥50% of instances was reported)

Fosmid ID | Taxonomic group represented in ≥50% of ORFs | Taxonomic rank and phylum information | MP taxonomic assignment of GH gene(s) | Notes
P04P08 | Chloroflexia | Class; Chloroflexi | GH 3 - Prokaryotes | GH 3 gene not assigned
P08H17 | Caldilinea aerophila strain STL-6-01 | Strain; Chloroflexi | GH 1 - Caldilinea aerophila strain STL-6-01 |
P13D18 | Microbacteriaceae | Family; Actinobacteria | GH 3 - Microbacteriaceae; CE 7 - Microbacteriaceae; GH 20 - Microbacteriaceae |
P14I01 | Caldilinea aerophila strain STL-6-01 | Strain; Chloroflexi | GH 1 - Prokaryotes; GH 13 - Caldilinea aerophila strain STL-6-01 | GH 1 gene not assigned
P14K01 | Betaproteobacteria | Class; Proteobacteria | GH 30 - Gammaproteobacteria; CE 1 - Burkholderiales; GH 62 - Cellvibrionaceae; GH 10 - Uliginosibacterium strain 5YN10-9; GH 43 - Betaproteobacteria | Equal assignments to orders Rhodocyclales and Burkholderiales
P22O04 | Caldilinea aerophila strain STL-6-01 | Strain; Chloroflexi | GH 92 and GH 1 - Caldilinea aerophila strain STL-6-01 |
P28E08 | Bacteroidales | Order; Bacteroidetes | GH 3 - Porphyromonadaceae; GH 10 - Bacteroidales; GH 43 - Bacteroides; GH 67 - Porphyromonadaceae |
P29G23 | Caldilinea aerophila strain STL-6-01 | Strain; Chloroflexi | GH 3 - Chloroflexi; GH 29 - Caldilinea aerophila strain STL-6-01 |
P31H11 | Betaproteobacteria | Class; Proteobacteria | GH 30 - Xanthomonas; GH 43 - Betaproteobacteria; GH 10 - Uliginosibacterium; GH 62 - Gammaproteobacteria | Presence of both beta and gamma sub-groups of Proteobacteria
P37P01 | Prokaryotes | Domain | GH 2 - Ktedonobacter; GH 3 - Proteobacteria; GH 88 - Proteobacteria | The fosmid shows different phyla; one GH 3 ORF not specifically assigned
P38I02 | Rhodocyclales | Order; Proteobacteria | GH 10 - Uliginosibacterium strain 5YN10-9; GH 30 and GH 43 - Proteobacteria; GH 62 - gamma Proteobacteria | Presence of Burkholderiales group

Figure 3.12: Taxonomic distribution across sequenced fosmids, shown as percentage assignment across fosmid ORFs for Chloroflexi, Bacteroidetes, Actinobacteria and Proteobacteria (taxonomy assigned based on LCA assignment at phylum level represented in ≥50% of ORFs for each fosmid assembly)

Comparing Figure 3.12 with Figure 2.9 (taxonomic distribution of the whole metagenome) makes very evident the disparity between the functional gene abundance of active clones obtained through functional screening and whole-metagenome taxonomic assessments. As with the taxonomic distribution of the binning results (Figure 2.8), this may point to the importance of certain phyla that are not very abundant in the environment but play a crucial role as the primary degraders of organic matter. Interestingly, the species of the phylum Chloroflexi that had the highest number of ORFs assigned to it, Caldilinea aerophila (Sekiguchi et al. 2003), is a thermophile commonly found in anaerobic granular sludge and wastewater treatment bioreactors, so its presence in the PPS environment is not surprising. However, it does not have the highest representation in the taxonomy of the GH ORFs (Table 3.3). Members of the phylum Chloroflexi are present in environments similar to PPS, but they are not usually the phylum with the most predominant GH activities. These activities are attributed more to Bacteroidetes, Firmicutes and Actinobacteria (Wang et al. 2016; Berlemont and Martiny 2016; Berlemont 2017). There is, however, some evidence for organic matter degradation by Chloroflexi in marine environments (Jessen et al. 
2017; Landry et al. 2017) and a patent on an undisclosed thermophilic Chloroflexi-like organism capable of degrading cellulose (Stott et al. 2009). In light of the findings from this study and the literature, it is difficult to draw definite conclusions about the disparity between in silico and functional taxonomic distributions. The disparity may point to a need to evaluate biases in environmental DNA capture for metagenomic library preparation and in substrate choice for functional screening, and potentially to conduct metatranscriptomic studies on the environment to determine which taxa are functionally most active for GH activity.

3.4 Conclusion

The fosmid clones obtained through functional metagenomic screening mostly show exo-acting β-glucosidase activities. There is also an interesting enrichment of XyGULs and endo-xylanase activities in the screened clones, which might be attributed to the promiscuity of the GH families present within these groups. Transposon mutagenesis can be used to assign specific GH loci to the observed function. Taxonomically, more investigation is needed to confidently assign the fosmid genes and map them back to the original environment. The findings presented are within the boundaries of the functional screening paradigm set up in this study and therefore suffer from the general limitations of functional metagenomic screen design, including choice of substrate, expression host and batch-to-batch variation. Upstream of library preparation, techniques such as SIP-based DNA enrichment that focus on cellulose hydrolysis function can also be used to narrow down screening efforts. High-throughput biochemical characterization of the products of functional clones needs to be coupled to screening to truly translate these findings into bioprocessing applications with lignocellulosic substrates.
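The fosmid-level taxonomy discussed in Section 3.3.3 rests on a lowest common ancestor (LCA) rule combined with a ≥50% majority criterion. The following is a minimal sketch of both ideas, not the MetaPathways implementation: the function names and the example lineages are hypothetical and chosen only for illustration.

```python
from collections import Counter


def lowest_common_ancestor(lineages):
    """Deepest taxon shared by all lineages (each a root-to-leaf list)."""
    shared = []
    for ranks in zip(*lineages):
        if len(set(ranks)) != 1:
            break  # lineages diverge at this rank
        shared.append(ranks[0])
    return shared[-1] if shared else None


def majority_taxon(orf_taxa, threshold=0.5):
    """Taxon carried by at least `threshold` of the ORFs, else None."""
    if not orf_taxa:
        return None
    taxon, count = Counter(orf_taxa).most_common(1)[0]
    return taxon if count / len(orf_taxa) >= threshold else None


# Hypothetical lineages for two hits that disagree at order level.
hits = [
    ["Bacteria", "Proteobacteria", "Betaproteobacteria", "Rhodocyclales"],
    ["Bacteria", "Proteobacteria", "Betaproteobacteria", "Burkholderiales"],
]
print(lowest_common_ancestor(hits))

# Hypothetical phylum-level assignments for the ORFs on one fosmid.
print(majority_taxon(["Chloroflexi", "Chloroflexi", "Proteobacteria"]))
```

When hits disagree at a given rank, the LCA falls back to the deepest rank they share, which is why a fosmid with equal order-level assignments can only be resolved to class level.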
CHAPTER 4 Function to Application – Consolidated Bioprocessing

4.1 Background

4.1.1 Coming Full Circle - design and implementation of consolidated bioprocessing

The term consolidated bioprocessing (CBP) is often used in the literature as though it were specific to lignocellulosic biomass processing for biofuel production. However, a closer analysis reveals it to be a more general process design concept, concerned with the consolidation or merging of different unit processes into “one-pot” processing. This design makes the process modular and easy to retrofit to other existing bioprocesses. If the objective of process integration, or of any similar brown-field operation, is to utilise waste bio-streams for making value-added products, then this process design can be implemented with ease to close the loop around the bioprocesses.

A closed-loop, circular bioprocess schematic for utilization of the feedstock in this study, pulp and paper sludge (PPS), is depicted in Figure 4.1. The smaller circle represents direct application of PPS hydrolysis products to the paper industry as strength additives in pulp making. The bigger circle shows the possibility of consolidation with other engineered bioprocesses that can produce bioproducts and biofuels, which can potentially be applied in the paper industry or represent alternate revenue streams for the industry through integration of on-site bioproduction facilities.

Figure 4.1: Schematic of proposed circular, consolidated processes using paper sludge feedstock as direct (smaller circle) and indirect (bigger circle) applications to the paper industry

Consolidation of unit processes is especially beneficial for processes concerned with lignocellulosic biomass refining, because the complex nature of the biomass entails several separate pre-treatment and processing steps.
The consolidation of several steps in comparison to other fermentation strategies for bioprocessing is depicted in Figure 4.2. As per an estimate by Bayer et al., the costs of feedstock, enzyme, and pre-treatment account for about two-thirds of the total production cost, of which the enzyme cost is the largest (Bayer et al. 2007). This is where CBP design has the most impact, through both microbial platform or organism engineering (Parisutham et al. 2014) and in-situ production of enzymes.

Figure 4.2: Different bioprocessing strategies available for the conversion of lignocellulosic biomass to bioalcohols. Abbreviations: SHF, separate hydrolysis and fermentation; SHCF, separate hydrolysis and co-fermentation; SSF, simultaneous saccharification and fermentation; SSCF, simultaneous saccharification and co-fermentation; CBP, consolidated bioprocessing (Salehi Jouzani and Taherzadeh 2015)

CBP design for lignocellulosic degradation is done through two major approaches: native and recombinant (Olson et al. 2012; Kricka et al. 2014).

1. Native strategy: using organisms that are naturally cellulolytic to generate the desired bioproduct. These can include fungi and bacteria (including both free-enzyme systems and cellulosomes).
2. Recombinant strategy: genetically engineering all processes from scratch in one organism or a consortium. It can prove beneficial, and almost necessary, if the objective is to utilise both C5 and C6 sugars, which cannot be done through native systems.

That no industrial-scale demonstration of a CBP process has been done to date may reflect a need to improve the biological host design. Precise genetic circuitry control that would allow temporal gene expression and help control the release of enzymes according to the stage of the process (Committee on Industrialization of Biology et al. 2015) is needed.
The metagenomic functional genes that are integrated into bioprocess development should also be optimized within this framework. Functional metagenomics has led to the discovery of several functional genes (especially glycoside hydrolases) that can potentially be used to engineer CBP organisms (Maruthamuthu et al. 2016; Tiwari et al. 2018). Sommer et al. have shown an application for expansion of the synthetic biology toolbox in the context of increased tolerance to inhibitory compounds arising from lignin degradation in biomass (Sommer et al. 2010).

4.1.2 Biochemical Assays for Cellulose Hydrolysis Kinetics

Biochemical assay design requires the use of pure substrates to establish activity values and other kinetic parameters such as kcat (turnover number), Km (Michaelis constant), and the optimum temperature and pH of enzyme activity. It is therefore challenging to screen enzymes directly on complex substrates such as waste lignocellulosic biomass on an “as-received basis”, where there is interference from other components. Signal must also be detected in a turbid medium, and potential substrate inaccessibility hampers activity detection. This is at odds with the need to develop enzymes that act on these complex substrates. Dashtban et al. have reviewed the different assays that are traditionally used for cellulase activity detection (Dashtban et al. 2010). Broadly, cellulase activity can be detected using three major approaches:

1. Assays in which the accumulation of products after hydrolysis is measured (e.g. assays in which the reducing sugar content is measured, such as the 3,5-dinitrosalicylic acid (DNS), glucose oxidase (GO) or glucose hexokinase (GHK) assays).
2. Assays in which the reduction in substrate quantity is monitored (this can range from simple mass difference measurements to sophisticated bio-analytical techniques using size-exclusion chromatography to determine the quantity of oligosaccharides released).
3. Assays in which the change in the physical properties of the substrate is measured (microscopic analysis of surface characteristics using scanning electron microscopy and/or fibre staining).

4.1.3 Matching the Scales of Discovery and Application - what does functional metagenomics need to go all the way?

The power of discovery can be best realized through application. Despite its potency in unveiling unique biocatalysts and novel metabolic pathways from different environmental microbiomes, questions have recently been raised as to how far functional metagenomics has fulfilled its promise of delivering biocatalysts (Bergholz et al. 2014; Ferrer et al. 2016) that would make biochemical processing arguably more sustainable and environmentally benign. This is difficult to assess quantitatively, in part because of the lack of application-based studies in metagenomics combined with the reluctance of industries to share information about proprietary strains and bioprocesses. The terms bioprospecting and biorefining are often used in the experimental metagenomics literature (Strachan et al. 2014; Zhang et al. 2016a). However, the commercialization connotation inherent in them (Timmermans 2001) has been left unaddressed by many of these studies.

A closer look at the history of the field and its discovery-focused structure might answer this question. In 2006, the National Academic Research Press published the key challenges faced by metagenomics as a new field, as outlined by the Global Microbiome Initiative (Council 2007a). These challenges, however, stop at optimising the discovery of functional genes coding for putative biocatalytic functions. The logical next step is the optimization of the discovered functional clones for bioprocess applications, to allow true “translation” of biocatalytic products into industry or commercial ventures.
The 7th challenge in this sequence would be to integrate metagenomics with bioprocessing.

A major challenge faced by functional discovery and application is the inability of biochemical characterization to keep pace with the generation and analysis of metagenomic sequence data. Recently, an in silico approach using iterative hidden Markov models (HMMs) to reduce the number of functional hits from a metagenomic library has been proposed, to arrive at a reasonable number of protein candidates for experimental characterization and validation of function without significant loss of information (Kusnezowa and Leichert 2017). However, there is as yet no literature on an experimental high-throughput biochemical characterization or assay platform that can be coupled directly with functionally discovered metagenomic clones, especially one applicable to detecting glycoside hydrolase (GH) activities. Microfluidics and microarray approaches can be used to partially overcome these technical hurdles (Abot et al. 2016).

Downstream biochemical characterization of enzymes and optimization of clones should be informed by the real process conditions of the intended application. Some aspects that should be given importance when designing these experiments are (Prather 2004):

1. Genetic or genomic engineering to maximize substrate flux and come as close as possible to the maximum theoretical yield of the enzyme product.
2. Using reference enzyme homolog models to guide protein engineering applications to include synthesis of co-factors or co-enzymes that might be missing from the environmental catalytic code captured through metagenomics.
3. Stoichiometric analysis of the feedstock-to-product process to determine the yield needed for an economically viable process, which should set the parameters for enzyme kinetic performance.

4.2 Materials and Methods

4.2.1 Biomass Compositional Analysis

Sampling of biomass has been previously described in section 2.2.1.
The pre-treatment of biomass prior to compositional analysis (on a dry basis) was done in accordance with the National Renewable Energy Laboratory (NREL) Laboratory Analytical Procedure (LAP) for preparation of biomass for compositional analysis (Hames et al. 2008). The biomass was dried to a constant weight through incubation for around 72 hours at 45 °C in an incubator. The moisture content was determined after the dry PPS reached a constant weight. It was then ground using a Wiley laboratory knife mill with a 40-mesh sieve. The dry ground PPS was also fractionated using a sieve shaker and the +20/-80 fraction was retained for analysis (Figure 4.3). This was done to remove excess ash from the sludge, which has previously been shown to inhibit hydrolysis by changing the pH of the hydrolysate (Gurram et al. 2015).

Figure 4.3: Pulp and paper sludge (PPS) feedstock (left to right): (a) wet PPS cakes obtained after filtration of water content; (b) dry PPS (constant weight); (c) dried, milled and sieved PPS

The extractive content of the biomass was also determined prior to compositional analysis, and the biomass was subsequently analysed on an “extractives-free” basis. The extraction was done using a Dionex ASE 350 Accelerated Solvent Extractor and the % extractives in the biomass was determined according to the NREL LAP “Determination of Extractives in Biomass” (Sluiter et al. 2008). The biomass analysis was conducted using a modified Klason method based on the Technical Association of the Pulp and Paper Industry (TAPPI) standard compositional analysis protocol T222om-88 (TAPPI 2006). Briefly, in this method the composition of pre-treated (dried and milled) woody biomass is determined using acid hydrolysis. The structural carbohydrates are determined through hydrolysis and solubilization by sulfuric acid; acid-soluble and acid-insoluble lignin and the ash content are determined simultaneously.
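The drying step above can be illustrated with a small calculation. This is a minimal sketch that assumes moisture content is reported on a wet basis and that "constant weight" means two successive weighings agree within a chosen tolerance; the function names, the tolerance and the example masses are all hypothetical and are not taken from the thesis or the NREL procedure.

```python
def moisture_content_percent(wet_mass_g, dry_mass_g):
    """Moisture content on a wet basis: 100 * (wet - dry) / wet."""
    if wet_mass_g <= 0 or not 0 <= dry_mass_g <= wet_mass_g:
        raise ValueError("expected 0 <= dry mass <= wet mass, wet mass > 0")
    return 100.0 * (wet_mass_g - dry_mass_g) / wet_mass_g


def reached_constant_weight(masses_g, tolerance_g=0.01):
    """True if the last two weighings differ by no more than the tolerance."""
    if len(masses_g) < 2:
        return False
    return abs(masses_g[-1] - masses_g[-2]) <= tolerance_g


# Hypothetical weighings (g) of one sample over successive drying intervals.
weighings = [10.0, 5.2, 4.1, 4.005, 4.0]

if reached_constant_weight(weighings):
    moisture = moisture_content_percent(weighings[0], weighings[-1])
    print(f"Moisture content: {moisture:.1f}% (wet basis)")
```

Reporting on a wet basis keeps the value between 0 and 100%; a dry-basis figure would instead divide the water mass by the dry mass.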
4.2.2 Detection of Hydrolytic Activity using Colorimetric Assay

The colorimetric assay used for the detection of cellulolytic activity of positive fosmid clone lysates was developed by Ferrari et al. (Ferrari et al. 2014). The assay uses a mutant oxidase (chito-oligosaccharide oxidase; designated as ChitO-Q268R), engineered and produced in E. coli (E. coli ORIGAMI2 DE3), that releases hydrogen peroxide upon the oxidation of cellulase-produced hydrolytic products (oligomers and monomers). The hydrogen peroxide (H2O2) produced is then monitored using a horseradish peroxidase (HRP) mediated reaction in which HRP uses the released H2O2 to convert 4-aminoantipyrine (AAP) and 3,5-dichloro-2-hydroxybenzenesulfonic acid (DCHBS) into a stable pink compound (Figure 4.4).

Figure 4.4: Schematic of the colorimetric assay used for cellulolytic activity detection

This assay was chosen because it is a fast, sensitive method for detecting cellulolytic activity without the acid hydrolysis or boiling used in the 3,5-dinitrosalicylic acid (DNS) method. It has been tested previously on complex lignocellulosic substrates (such as wheat straw) (Ferrari et al. 2014) and, being colorimetric, allows rapid detection of cellulolytic activity even in coloured or turbid media. The assay is also not restricted to detection of a single reducing sugar (i.e. only glucose), as the mutant enzyme is capable of oxidizing the different oligosaccharides that result from the breakdown of cellulose (glucose, cellobiose, cellotriose, cellotetraose). The chito-oligosaccharide oxidase used in this study was kindly provided by Dr. Marco W Fraaije and Dr. Alessandro Ferrari from the Molecular Enzymology Group, University of Groningen, Netherlands. It was in the form of a lyophilized powder stored at -20 °C. HRP (Sigma, 179.2 U/mg), 4-AAP (Acros Organics, 98%) and DCHBS (Alfa Aesar, 99%) were purchased from different vendors.
For detection of cellulolytic activity in a high-throughput format, filter paper disks (Whatman filter paper grade 1; 0.5 cm in diameter) were punched out, deposited at the bottom of a 96-well plate and used as the substrate (containing 95% crystalline α-cellulose). The disks were incubated overnight with 200 µL of the enzyme mixture (either fosmid lysates or purified proteins, as described in the next section). The plate was briefly centrifuged (2000 × g, 5 min) and 100 µL of the supernatant was transferred to another well and supplemented with 100 µL of the reaction mixture. The plate was immediately set for continuous read measurement at 515 nm using a Bio-Tek Synergy H1 Hybrid Multi-Mode Reader and readings were taken for up to 72 hours. Development of the coloured product was monitored by plotting absorbance values against time and was correlated to cellulolytic activity by comparison with the positive control, the Celluclast enzyme cocktail, which was kindly provided by the Saddler lab group (Hu et al. 2011). The EPI300 strain with the empty vector pCC1FOS™ (“ePCC1FOS”) was used as the background control, BSA as the protein additive control (in addition/replacement tests with Celluclast) and the assay mixture as the blank.

4.2.3 Different Systems for Testing Cellulolytic Activity

Tests for cellulolytic activity of the positive fosmid clones noted in Table 3.3, section 3.3.3, were done using three different protein sources (please note that “fosmids” from this point forward refers to the positive clones). The systems are listed in order of ease of implementation of the cellulolytic assay, with the objective of reducing processing steps at scale-up. The protein lysis methods used are derived from established protein purification protocols (Burgess and Deutscher 2009):

1.
OD600 normalized fosmid whole cell lysates: 5 mL cultures of the fosmids and ePCC1FOS were grown for 24 hours following induction with arabinose at inoculation. Following OD600 measurements, 1 mL of each culture was normalized by diluting with LB to the lowest OD value across all samples. 100 µL of culture was added to each well of a 96-well plate with 100 µL of lysis mixture (as specified in section 3.2.2) and incubated with filter paper overnight. The plate was centrifuged and 100 µL of the supernatant was used for the assay the next day.

2. Fosmid whole cell lysate and 50X concentrated culture supernatant protein fractions: 50 mL fosmid cell cultures were grown for 24 hours following induction with arabinose at inoculation. The whole cell lysate protein fraction was obtained by a combination of enzymatic and freeze-thaw based lysis. Briefly, cells were harvested by pelleting at 3000 × g (20 min) and the supernatant was stored separately at 4 °C prior to processing. 3 mL of lysis buffer (50 mM Tris pH 8.0, 0.1% Triton X-100, 10% glycerol, 300 mM NaCl) supplemented with 100 µL lysozyme and 25 µL PMSF was used to resuspend the pellets from the 50 mL culture. This was followed by incubation at 37 °C for 30 min. The suspensions were then flash frozen in liquid nitrogen for 3 min, thawed at 42 °C in a water bath and pulse vortexed. This cycle was repeated 3 times. Finally, the cell debris was separated by spinning at 15,000 rpm at 4 °C for 20 min. DNase and RNase (Molecular Probes, Invitrogen) were added to the supernatant if it was too viscous. The lysate fraction was concentrated 3X using an Amicon filter (Merck, Millipore) with a 10 kDa cut-off. The supernatant fraction was concentrated 10X and buffer exchanged into lysis buffer using an Amicon filter, as with the lysate. The protein concentration of the fractions was measured using the Pierce™ bicinchoninic acid (BCA) protein assay kit (ThermoFisher Scientific) and the proteins were then visualised using Coomassie Blue staining after SDS-PAGE.

3.
Sub-cloned GH genes from fosmids over-expressed in E. coli BL21(DE3) strain: Following sequencing of the fosmids and gene annotation in MetaPathways, the top two hits from the CMU-C2 tests (section 3.3.2) harbouring a GH locus on their contigs were selected for sub-cloning (P04P08 – GH3, P14I01 – GH1). Details of the gene sequences, primer design and sub-cloning protocols are given in Appendix A. The sub-cloned fosmid genes were expressed in the E. coli BL21(DE3) strain transformed with the pET-21a(+) vector and induced using IPTG at OD600 = 0.6. Following induction, the cultures were grown at 30 °C. Untransformed BL21(DE3) cells were included as negative controls. The cells were lysed to obtain proteins using a combination of enzymatic (lysozyme) and mechanical (probe sonication) lysis. Briefly, 3 mL of the lysis buffer (same composition as before) was used to resuspend the cells. This was followed by incubation on ice for 30 min. The suspensions were then subjected to probe sonication by pulsing for 15 s followed by ice incubation for 30 s (10 cycles in total per sample). The proteins were then concentrated, quantified and visualised as explained before. They were tested for activity using CMU substrates as explained in the chapter 3 methods.

4.2.4 Bench-scale Bioprocess Development

The fosmid clones were tested for their hydrolytic potential on PPS in 100 mL reactions in Erlenmeyer flasks and monitored for 72 hours, with 2 mL samples withdrawn at defined intervals. The flasks were supplemented with 10% inoculum at T0, and whole-cell hydrolysis was carried out without addition of any lysing reagent in order to test the feasibility of whole-cell biocatalysis and the use of PPS as a growth medium to induce cellulase production. Each flask had a 2.5% solids loading (dried and milled sludge) and was supplemented with 5 g/L peptone as N-source (given the negligible N-content of PPS). The pH was measured before inoculation and adjusted if necessary to be near the natural pH of PPS (7.1-7.3 ±0.08).
The media was then autoclaved and inoculated with 10% (v/v) inoculum from a 10 mL seed culture grown for 24 hours prior to the start of biohydrolysis. One blank control was included along with positive Celluclast controls, and the incubation was done at 37 °C at 250 rpm. Each reaction was set up in duplicate (Figure 4.5). pH measurements were made after stopping the incubation at T0 + 72 hours.

Figure 4.5: Experimental set-up for bio-hydrolysis

The collected samples were stored at 4 °C prior to processing. They were centrifuged at high speed in a bench top centrifuge at 4 °C to pellet the solid fraction, and the clear supernatant was passed through a 0.2 µm filter and stored for sugar analysis. The samples were first analysed for crude glucose content using glucose oxidase membrane-based detection (YSI 2300 STAT Plus Glucose Lactate Analyzer, Marshall Scientific) to select samples for further analysis by HPLC. The sugar content of the selected samples was then analysed by HPLC (Dionex DX-3000, Sunnyvale, CA) using a Dionex PA1 column equipped with a pulsed amperometric detector and autosampler (Dionex). The column was equilibrated with 0.25 mM NaOH and eluted with pure water at a flowrate of 0.8 mL/min. L-arabinose, D-galactose, D-glucose, D-xylose and D-mannose were used as calibration standards and fucose as an internal standard. The standards were prepared at five different concentrations for each sugar to cover the estimated range in the samples. Each sample was injected in triplicate using the autosampler.

4.3 Results

4.3.1 Compositional Analysis

The compositional analysis was conducted on the biomass both with and without extractives, and the “extractives-free” basis gave a much better mass closure (~99%), as depicted in Figure 4.6 (a) below.
The sugars are reported as % polymers of the detected sugar monomers; the cellulose content is considered equal to the glucan (C6) content and hemicellulose as the sum of the xylan, mannan, arabinan and galactan (C5) content. The ash content of the PPS was quite low, given that the source is mostly thermo-mechanically pulped fibre rejects. For PPS with a high ash content, milling and sieving of the dry material has previously been shown to remove the ash effectively (Gurram et al. 2015) and to reduce the undesirable buffering effects that basic components might have on hydrolysis of PPS (Kang et al. 2010). The CHN elemental composition results are also included (Figure 4.6 (b)).

Figure 4.6: Percentage composition of dried, milled PPS (left-right) (a) Klason method (b) CHN elemental analysis (5% error)

From the compositional analysis, the sampled PPS shows promise as a lignocellulosic feedstock for hydrolysis and could potentially support whole-cell catalysis if supplied with nitrogen sources. This also presents an interesting opportunity for supplementation with other local agricultural waste streams rich in available nitrogen, such as used growth medium from mushroom farming and chicken manure (data from Timmenga & Associates Inc., Vancouver, BC).

4.3.2 Colorimetric Detection of Cellulolytic Activity

The mutant chito-oligosaccharide oxidase enzyme was first titrated against different activity levels of the positive control enzyme mixture, Celluclast, used in this study. The lowest visibly detectable coloured signal was found at 0.5 mU (FPU). Figure 4.7 below shows the range of linear signal measurement using this assay. The results were readily reproducible and in accordance with the original results reported in the literature (Ferrari et al. 2014), with a linear range of FPU detection of 6-100 mU.
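The titration described above amounts to a blank-subtracted linear calibration that is only trusted inside the linear range. A minimal sketch of that idea follows, assuming an ordinary least-squares fit restricted to standards within the 6-100 mU range reported above. The function names and all absorbance values are hypothetical and do not reproduce the data behind Figure 4.7.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def calibrate(activities_mU, absorbances, blank, linear_range=(6.0, 100.0)):
    """Fit net absorbance (reading minus blank) against enzyme activity,
    using only the standards that fall inside the linear range."""
    low, high = linear_range
    points = [(a, r - blank) for a, r in zip(activities_mU, absorbances)
              if low <= a <= high]
    return fit_line([p[0] for p in points], [p[1] for p in points])


def estimate_activity_mU(absorbance, blank, slope, intercept,
                         linear_range=(6.0, 100.0)):
    """Back-calculate activity from a reading; return None when the
    estimate falls outside the range where the fit is valid."""
    estimate = ((absorbance - blank) - intercept) / slope
    low, high = linear_range
    return estimate if low <= estimate <= high else None


if __name__ == "__main__":
    # Hypothetical standards; the 500 mU point is excluded from the fit.
    standards_mU = [6.0, 25.0, 50.0, 100.0, 500.0]
    readings = [0.056, 0.075, 0.100, 0.150, 0.200]
    slope, intercept = calibrate(standards_mU, readings, blank=0.05)
    print(estimate_activity_mU(0.09, 0.05, slope, intercept))
    print(estimate_activity_mU(0.30, 0.05, slope, intercept))
```

Returning None for out-of-range estimates, rather than extrapolating, is a design choice of this sketch; samples above the range would normally be diluted and re-read.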
Figure 4.7: Titration of Celluclast FPU in the chito-oligosaccharide oxidase assay (net absorbance values after subtracting assay mixture blank)
[Figure 4.7 axes: x, Cellulase mU (Celluclast, Novozymes, from T. reesei); y, Net absorbance at 515 nm (absorbance units)]

The assay was also able to distinguish between different solid loadings at the same enzyme loading and to yield a detectable colorimetric signal even with turbid media or whole cell lysates (Figure 4.8, 4.9). The sequencing results of the positive fosmid clone hits were obtained after these assays were done; hence the interpretations from functional gene annotations were not applied to the choice of assay substrate, and only filter paper was used as a model cellulose proxy.

Figure 4.8: Colorimetric assay - Columns 1-3 titration of Celluclast at different FPU loading; other wells show supernatant from hydrolysis of PPS at different solid loadings (fixed Celluclast loading 500 mU) showing colour development in contrast to blanks (D-F 10-12)

4.3.2.1 Whole-cell Lysates

The top 10 positive fosmid clones obtained after deconvolution with CMU-C2 in section 3.3.2 were cultured and OD600 normalized as described. Lysis mixture was added and the lysate was incubated on filter paper overnight. Each clone culture was tested in triplicate, and for each clone culture there was another identical set that was spiked with 10 mU of the Celluclast enzyme cocktail. It is clear from both visual observation (Figure 4.9) and absorbance readings (Figure 4.10) that the cell lysates did not yield any significant activity that could be detected by the colorimetric assay.

Figure 4.9: (Left-right) T0 and T0+24 hours incubation of fosmid whole cell lysates with filter paper substrate.
Only wells spiked with Celluclast show activity and colour change.

Figure 4.10: Absorbance readings at specific time intervals during incubation – representative results for two fosmid clone reactions - the cellulolytic activity signal results only from Celluclast enzyme action, with no contribution from fosmid whole cell lysates (‘+cel’ refers to spiking of the reaction with 10 mU of Celluclast, 5% error)

Several optimization steps could help explain why no cellulolytic activity was detected with a sensitive assay. Given the low volume of the reaction, the cellulolytic proteins in the fosmid lysates might not be concentrated enough to catalyse substrate breakdown. Also, since these are environmental genes in a non-native host, improper protein folding or truncation might occur. Different, more rigorous methods of cell lysis could also be used. To address the concentration issue, tests were done with whole-cell crude protein lysates as well as the secreted fractions of fosmid clones (50 mL volume). This was to check whether activity could be detected by increasing the protein quantity of the fosmid whole cell lysates, and to test the secreted fraction for activity (several bacterial GH enzymes or cellulosomes are secreted) (Gladden et al. 2011; Rashamuse et al. 2016).

4.3.2.2 Fosmid Whole Cell Lysate and 50X Concentrated Culture Supernatant Protein Fractions

The total protein fractions from the fosmid hits (both the cell lysate and the supernatant fractions) were obtained, and quantification showed that the cell lysates of all the clones had much higher protein contents than the control (ePCC1FOS) strain. The supernatant fraction, however, did not seem to be significantly enriched in protein content when compared to the control strain (Figure #). However, when these proteins were run on SDS-PAGE, significant overexpression was seen in almost all the hits in the size range of 20-25 kDa.
Some hits also showed overexpression in the size range 30-75 kDa, which can be corroborated with the expected molecular weights predicted by the ExPASy pI/Mw tool (https://web.expasy.org/compute_pi/) from the amino acid sequences of the respective ORFs. For most of the clones, however, the size of the overexpressed protein bands (20-25 kDa) is less readily relatable to the theoretical calculations from the sequence information (Figure 4.12). These proteins might still show promise and be targeted for sub-cloning, as there are cellulolytic proteins belonging to glycoside hydrolase (GH) families in the literature that fall within this size range and might be catalysing breakdown of the model substrates. GH12, which is the most well-known GH family in this size range, includes endoglucanases. Proteins in this family are known to be induced in fungal hosts after xylanase expression when exposed to hydrolysates of lignocellulose polysaccharides (Xing et al. 2013), are among the smallest of the GH families (around 20 kDa) (Karlsson et al. 2002), typically lack a carbohydrate-binding module (CBM) and are multifunctional, showing both endoglucanase and endoxylanase activities (Zhang et al. 2016b). Given the overall enrichment of endoxylanase genes in the fosmids (section 3.3.3), it is possible that a gene sequence from this family is being annotated as an endoxylanase (for example in P38I02). Rigorous testing of this hypothesis would entail construction of a phylogenetic tree of the endoxylanase-annotated enzymes and structural homology studies between the observed endoxylanase clusters and GH12 sequences in curated databases. This is less likely, however, since the annotation pipelines use structural and sequence similarity signatures and allocate the best possible match when annotating a gene as a specific GH family.
But given the environmental source of these genes, there could also be other low-scoring hits for the same gene that might help explain these observations. There is also a recent report of the discovery of a thermostable alkaline cellulase of compost microbial origin that falls within this size range and shows CMCase (CMC, carboxymethyl cellulose) activity (De Marco et al. 2017).

Figure 4.11: Protein content estimation of fosmid whole-cell lysate and supernatant fractions using BCA assay (50 mL cultures; 5% error)

The colorimetric assay of the lysate and supernatant protein fractions was not very promising, as neither fraction showed significant activity towards the model substrate, filter paper. Even after an extended incubation of 72 hours, the values were not significantly greater than those of the control strain (cell lysates performed marginally better). Representative results for fosmids P38I02 and P13D18 are depicted in Figure 4.13.

Figure 4.12: SDS PAGE visualisation of whole cell lysate and secreted protein fractions

Figure 4.13: Measurement of colorimetric signal after incubation with filter paper substrate for 72 hours (left-right) (a) colour development and (b) absorbance values at end of incubation period

Tests were also done to assess the efficiency of the protein fractions as accessory enzymes to existing cellulolytic enzyme mixtures (Figure 4.14). This was done using two methods:

1. Replacing a part of the cellulolytic enzyme mixture with the protein fraction
2. Adding a specific amount of the protein fraction to the cellulolytic enzyme mixture

No marked increase in hydrolysis of filter paper was observed with either of these methods. Fosmid sequence information, as depicted in Figure 3.11 (section 3.3.3), suggests that fosmid P38I02 has a xyloglucan utilization locus (XyGUL) and might encode xylanase genes.
Similarly, P13D18 might encode glucuronic hydrolase (GH88) (breaking down glucuronic acid, a component of hemicellulose) and β-glucosidase (GH3) activities. Supplementation of the same cellulolytic mixture (Celluclast) used in this study with xylanase enzymes has previously been shown to increase hydrolysis when the xylanase replaced part of the mixture, rather than being added to it. The latter was observed to decrease the saccharolytic output (Hu et al. 2011). However, these results were not readily observable for the fosmid protein fractions. On the contrary, a very minor increase in cellulolytic signal was seen for the additive tests with P38I02. This might be related to the difference in the type of substrate used for the assay. Hu et al. used lignocellulosic substrates with a major hemicellulose component that was hydrolysed by xylanases, which led to an increase in cellulolytic activity, whereas the substrate used in this assay has no hemicellulose content. Use of these lysates on filter paper might therefore be redundant, and other substrates such as xylans, mannans or even Avicel (for testing exo-acting glycosidases) should be used to best assess their hydrolytic potential.

Figure 4.14: Supplementing Celluclast enzyme mixture with fosmid protein fractions (left-right) and application to filter paper substrate (a) Replacement (1:1) with total protein content fixed at 35 mg/g cellulose (b) Addition of protein fractions to give a net doubling of total protein content (Celluclast + fosmid protein)

4.3.2.3 Sub-cloned GH genes from fosmids over-expressed in E. coli BL21(DE3) strain

SDS-PAGE was used to visualise the gene products from the P04P08 and P14I01 fosmids, with BL21(DE3) as the control. The first three lanes in the figure depict the supernatant fraction. Both P04P08 and P14I01 show a band of ~25 kDa that is absent from the control strain in both the cell lysate and the supernatant (secreted) fraction, similar to what was observed before in fosmid whole cell lysates (Figure 4.12).
However, the size of the observed overexpressed protein fraction is not readily explained, since the ExPASy tool (https://web.expasy.org/compute_pi/) prediction of the molecular weights of the translated amino acid sequences of the genes puts the expected sizes at around 77 and 50 kDa respectively (Figure 4.15).

Figure 4.15: SDS-PAGE results of sub-cloned BL21(DE3) cell lysate and supernatant fraction with genes from fosmids P04P08 and P14I01 respectively (sup – supernatant fraction; CL – cell lysate)

Some initial tests for functionality of the proteins contained within both the lysates and the supernatant fractions were done with the CMU substrates used for screening the metagenomic libraries, following the same methods as chapter 3, to check whether the proteins within these fractions are active. After incubation for 4 hours, it was observed that only the cell lysate and supernatant fraction of P14I01 had activity on both CMU-C2 (cellobioside) and CMU-3X (mixture of cellobioside, xyloside and mannoside) (Figure 4.16). While the cell lysate fraction showed the expected increase in activity for the mixture (3X), it was surprising to see that the supernatant fraction showed a marked decrease in activity for the mixture versus cellobioside only. To check for substrate inhibition, the P14I01 supernatant fraction was tested further on each substrate individually (Figure 4.16 inset). Given the low activities on both xyloside and mannoside, their presence could potentially cause some inhibition or competition for enzyme activity (potentially with mannoside). GH1 family enzymes (the domain annotation for the P14I01 gene sequence) are known to contain all these activities (β-glucosidase/xylosidase/mannosidase), so these results are not surprising. However, to understand the kinetics of inhibition better, further combinatorial experiments with the substrates are needed. Surprisingly, P04P08 did not show any activity in either the supernatant or the lysate fraction.
This might be attributed to incorrect expression of the gene leading to truncated proteins, improper folding or other post-translational modifications resulting in a lack of activity. To correctly establish the activity of enzymes within these fractions, his-tag purification of the proteins is required. The purified proteins should then be tested for activity on CMU substrates along with cellobiose and Avicel (which are the more relevant compounds for optimizing these biocatalysts for biomass degradation).

Figure 4.16: Activity testing of sub-cloned cell lysate and protein fraction using CMU substrates (CMU-C2: Cellobioside; CMU-X2: Xyloside; CMU-Man: Mannoside; CMU-3X: mixture of all three substrates; readings at end of 4-hour incubation period with 5% error and inset shows deconvolution tests for P14I01 supernatant fraction)

4.3.3 Bench-scale Hydrolysis

The hydrolysis tests using whole fosmid cultures showed a maximum glucan conversion of around 2.7% for two clone cultures, P22O04 and P04P08 (Figure 4.17). This was very low compared to the Celluclast control (< 70%). However, given that these clones were not biologically optimized or engineered to overproduce or secrete enzymes, they were selected for further investigation using HPLC analysis of the reduced sugar profile. This was also done to validate the results from the crude analysis using glucose oxidase membrane-based detection, which had a lower detection threshold of 25 mg/L.

Figure 4.17: Percentage conversion of glucan in PPS to glucose during the hydrolysis experiment

HPLC, however, revealed that these results were not reliable, as none of the fosmid clone cultures yielded quantifiable amounts of glucose within the selected standard range (50-750 mg/L, with the lower value representing a minimum conversion of 5%).
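The percentage conversion of glucan to glucose reported above can be expressed as a short calculation. The sketch below assumes the conventional anhydro correction of 0.9 (162/180) for converting glucose mass to glucan equivalents and takes the 2.5% solids loading (25 g/L) from the methods. The function names and the glucan fraction used in the example are hypothetical and are not the measured composition of the PPS in this study.

```python
ANHYDRO_CORRECTION = 0.9  # 162/180: mass of glucan per mass of glucose released


def glucan_conversion_percent(glucose_g_per_L, solids_loading_g_per_L,
                              glucan_fraction):
    """Percent of the initial glucan recovered as glucose in the hydrolysate."""
    initial_glucan_g_per_L = solids_loading_g_per_L * glucan_fraction
    if initial_glucan_g_per_L <= 0:
        raise ValueError("initial glucan concentration must be positive")
    return glucose_g_per_L * ANHYDRO_CORRECTION / initial_glucan_g_per_L * 100.0


def glucose_for_conversion_g_per_L(conversion_percent, solids_loading_g_per_L,
                                   glucan_fraction):
    """Glucose concentration that corresponds to a given percent conversion."""
    initial_glucan_g_per_L = solids_loading_g_per_L * glucan_fraction
    return conversion_percent / 100.0 * initial_glucan_g_per_L / ANHYDRO_CORRECTION


if __name__ == "__main__":
    # 25 g/L corresponds to the 2.5% (w/v) solids loading used in the flasks.
    # The glucan fraction of 0.5 is a placeholder, not a measured value.
    print(glucan_conversion_percent(glucose_g_per_L=1.0,
                                    solids_loading_g_per_L=25.0,
                                    glucan_fraction=0.5))
    print(glucose_for_conversion_g_per_L(conversion_percent=50.0,
                                         solids_loading_g_per_L=25.0,
                                         glucan_fraction=0.5))
```

The inverse function is useful for checking whether a target conversion would even fall inside the calibrated range of the sugar analysis before running the experiment.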
The theoretical minimum conversion needed to completely replace current cellulolytic mixtures is around 50-70%, and the fosmid whole cell cultures do not meet that criterion. This is not surprising given the low copy number of fosmids and the need to improve and optimize the expression of GH genes specifically. Celluclast produced around 50-60% glucose from the initial glucan content, as expected.

4.4 Conclusion

Biocatalysts are an integral part of bioeconomy development, and functional metagenomics has the potential to unearth not only new biocatalysts but also whole new metabolic pathways that might already be adapted to the degradation of complex organic matter or biomass. However, to truly fulfil its promise of enhancing cost-effectiveness in bioprocess development, there is a pressing need to biochemically optimize the clones discovered through the functional metagenomics pipeline. These biochemical kinetic assays need to be done in high-throughput and preferably using real biomass substrates, to generate activity values as “lignocellulose units (glucan/xylan)” instead of approximating “filter paper units” or “CMC units”. Substrate choice is crucial, as it affects the screening process by enriching for very specific activities; given the complex nature of biomass, detection of PULs rather than single GH genes is beneficial for biocatalytic clone design and development. The clones obtained in this study show high potential for application as hemi-cellulolytic and exo-acting cellulolytic catalysts. This has important implications for lignocellulosic biomass degradation, and engineering a consortium of these clones in tandem with other metagenomically discovered endocellulolytic clones will enable the economic bioconversion of PPS.
As such, compositional analysis and conversion using conventional cellulolytic enzyme mixtures demonstrate that PPS represents a low cost, easily available biomass resource that can be biologically converted to valuable downstream products through the production of readily utilisable C6 sugars.

Chapter 5 Thesis Conclusion and Future Directions of Work

5.1 Concluding Discussion

The paper sludge microbiome presents a promising source of plant polysaccharide degradation genes that can be tapped for engineering optimized biological systems for biocatalyst production. The microbiome seems uniquely enriched in hemi-cellulose and structural biomass degradation genes, and these present interesting targets for developing enzymatic cocktails for lignocellulose degradation or even for enzymatic pre-treatment of lignocellulosic biomass to selectively hydrolyse and remove its hemicellulose content. This is also reflected in the functional genes uncovered through high-throughput screening with model fluorogenic substrates. The positive clones uncovered show enrichment for exo-acting β-glucosidase activities and xyloglucan utilization loci. This might reflect the specific substrates used during the screening, along with the close association of several glycoside hydrolase genes in the polysaccharide utilization loci commonly observed in prokaryotic systems, which bring about degradation of organic matter through the synergistic action of different enzymes. Finally, consolidated bioprocess development using the fosmid proteins remains to be thoroughly investigated. It has been challenging to translate functional metagenomic findings to bioprocess applications because of the time lag between obtaining fosmid sequencing information and selecting proper substrates for designing kinetics experiments.
Given the abundance of xylan hydrolysing loci in the fosmids, experiments in which concentrated proteins from fosmid hits are used as xylanase supplements to current cellulolytic enzyme mixtures should be carried out to assess their feasibility of application. Biochemical and kinetic characterizations should be done using hemi-cellulose polymers, and cellobiose or Avicel for exo-acting β-glucosidases. To demonstrate proof-of-concept of a closed-loop biomass hydrolysis process, further steps would need to be informed by an interdisciplinary approach. There has to be an alliance between microbial ecology, metagenomics, synthetic biology and bioprocess engineering to truly advance progress towards bioeconomy development.

5.2 Future Perspectives

5.2.1 Microbiome metabolic potential

The paper sludge microbiome represents an interesting environment for bioprospecting of plant polysaccharide degradation genes. To better understand the microbial ecological metabolic networks driving the functions of interest, and to better assign function to taxonomy, representative sampling of similar environments is needed. The findings from this study provide initial insights into the potential linkages between function and taxonomy. However, to establish these linkages confidently, it is imperative to assess several different metagenomes from mills in a similar geographic location that use the same kind of plant biomass for pulp making. Metagenomic bioinformatic annotation pipelines are sensitive not only to the nature of the input data but also to how it has been processed, i.e. assembled or binned (with tools that themselves lack benchmarking). This, together with differences in methodologies and in the reference databases used, creates the variations in taxonomic annotations that were observed. Therefore, results from multiple samples will allow a more confident assessment of the taxonomic profile.
Experimental validation of the functional genes can also be done using metatranscriptomic studies and construction of cDNA libraries, instead of relying only on annotations from pipelines. This is also applicable to the comparative analysis of microbiomes for differential expression of genes of interest. The findings presented in this study are qualitative at best and only support inferences suitable for hypothesis design. In order to validate these findings quantitatively, the metagenomic data should be processed using the same pipelines for all samples prior to annotation, and enrichment data from rRNA profiles of these environments should potentially also be presented.

5.2.2 Functional Metagenomic Screening

The findings from functional metagenomic screening in this study are subject to the constraints of the specific screening experimental design. The number of clones discovered is proportional to the number of metagenomic genes that can be expressed within the host system E. coli EPI300™. Choosing a vector with broad host-range expression could potentially uncover other genes with similar activities when expressed using different promoters in other host systems.

The effect of the substrate on the screening outcomes is very important. When the objective is to discover genes that can produce biocatalysts with activities improved over or equivalent to industrial cellulolytic mixtures, it is important to include complex substrates that represent lignocellulosic biomass. To reach a compromise between the inability to design high-throughput screens with complex biomass (space and volume considerations in 96/384 well formats) and the need to recover clones with the required functionalities, substrates with a mixture of lignocellulose monomers should ideally be used. This will enable a focus on clones that are most relevant to bioprocess development with targeted lignocelluloses. This presents a challenge in substrate design that could potentially be addressed by synthetic chemistry.
Another strategy that can be explored is targeted screening of enrichment cultures. Inoculum from PPS can be used to grow cultures on different recalcitrant biomass and, following metagenomic DNA extraction, functional screening can be done to recover clones that represent a powerful consortium with hydrolytic properties towards biomass. This will also allow temperature and pH parameters to be used to uncover thermostable and/or broad-range pH stable enzymes.

5.2.3 Consolidated Bioprocess Development

Complete scale-up of the process, with paper sludge feedstock as the biomass for hydrolysis and on-site production of cellulase enzymes from the PPS microbiome, involves several steps beyond proof of concept. Assaying the biochemical activity of the screened clones needs to be optimized in alignment with bioprocess development. A high titre of biocatalyst production can be achieved in expression strains using proper induction and growth conditions. The enzymes should be tested for functionality step-wise on model and real substrates to guide optimization and to rule out recombinant enzyme folding or post-translational modification issues.

It is also important to justify the repurposing of PPS from an economic perspective. To ensure that the proposed closed-loop bioprocess is truly sustainable and adds value to the industrial waste stream, a techno-economic analysis along with risk-assessment studies should be done on the proposed system boundary (Figure 5.1). These metrics are important for quantifying the biocatalyst units needed, the C5 and C6 sugar titres from PPS hydrolysis needed to sustain the biomass and, finally, the downstream product yield needed to ensure a profitable return on investment. These findings will bring the financial and business perspectives needed to attract industrial investment and partnerships for demonstration of the bioprocess at pilot scale and ultimate conversion to industrial scale production.
Such an analysis has been done previously for bioethanol production from PPS (Venditti 2014).

Figure 5.1: Material and energy-based revenue flow streams around the paper mill using a biorefinery for valorization of pulp and paper mill sludge

Bibliography

Abot A, Arnal G, Auer L, et al (2016) CAZyChip: dynamic assessment of exploration of glycoside hydrolases in microbial ecosystems. BMC Genomics 17:671. doi: 10.1186/s12864-016-2988-4
Acker MG, Auld DS (2014) Considerations for the design and reporting of enzyme assays in high-throughput screening applications. Perspect Sci 1:56–73. doi: 10.1016/j.pisc.2013.12.001
An D, Caffrey SM, Soh J, et al (2013) Metagenomics of hydrocarbon resource environments indicates aerobic taxa and genes to be unexpectedly common. Environ Sci Technol 47:10708–17. doi: 10.1021/es4020184
Andrews S (2017) FastQC: A Quality Control tool for High Throughput Sequence Data.
Ansorge WJ, Katsila T, Patrinos GP (2017) Perspectives for Future DNA Sequencing Techniques and Applications. In: Molecular Diagnostics. Elsevier, pp 141–153
Antunes LP, Martins LF, Pereira RV, et al (2016) Microbial community structure and dynamics in thermophilic composting viewed through metagenomics and metatranscriptomics. Sci Rep 6:38915. doi: 10.1038/srep38915
Armstrong Z, Mewis K, Strachan C, Hallam SJ (2015) Biocatalysts for biomass deconstruction from environmental genomics. Curr Opin Chem Biol 29:18–25. doi: 10.1016/j.cbpa.2015.06.032
Arshad A, Dalcin Martins P, Frank J, et al (2017) Mimicking microbial interactions under nitrate-reducing conditions in an anoxic bioreactor: enrichment of novel Nitrospirae bacteria distantly related to Thermodesulfovibrio. Environ Microbiol 19:4965–4977. doi: 10.1111/1462-2920.13977
Artzi L, Bayer EA, Moraïs S (2017) Cellulosomes: bacterial nanomachines for dismantling plant polysaccharides. Nat Rev Microbiol 15:83–95.
doi: 10.1038/nrmicro.2016.164
Attia MA, Nelson CE, Offen WA, et al (2018) In vitro and in vivo characterization of three Cellvibrio japonicus glycoside hydrolase family 5 members reveals potent xyloglucan backbone-cleaving functions. Biotechnol Biofuels. doi: 10.14288/1.0363937
Bainomugisa A, Duarte T, Lavu E, et al (2018) A complete nanopore-only assembly of an XDR Mycobacterium tuberculosis Beijing lineage strain identifies novel genetic variation in repetitive PE/PPE gene regions. bioRxiv 256719. doi: 10.1101/256719
Bao Y-J, Xu Z, Li Y, et al (2017) High-throughput metagenomic analysis of petroleum-contaminated soil microbiome reveals the versatility in xenobiotic aromatics metabolism. J Environ Sci 56:25–35. doi: 10.1016/J.JES.2016.08.022
Bateman A, Coin L, Durbin R, et al (2004) The Pfam protein families database. Nucleic Acids Res 32:138D–141. doi: 10.1093/nar/gkh121
Bayer EA, Lamed R, Himmel ME (2007) The potential of cellulases and cellulosomes for cellulosic waste management. Curr Opin Biotechnol 18:237–245. doi: 10.1016/J.COPBIO.2007.04.004
Benyus JM (2009) Biomimicry: innovation inspired by nature. HarperCollins e-books
Bergholz TM, Moreno Switt AI, Wiedmann M (2014) Omics approaches in food safety: fulfilling the promise? Trends Microbiol 22:275–81. doi: 10.1016/j.tim.2014.01.006
Berlemont R (2017) Distribution and diversity of enzymes for polysaccharide degradation in fungi. Sci Rep 7:222. doi: 10.1038/s41598-017-00258-w
Berlemont R, Martiny AC (2016) Glycoside Hydrolases across Environmental Microbial Communities. PLoS Comput Biol 12:e1005300. doi: 10.1371/journal.pcbi.1005300
Binga EK, Lasken RS, Neufeld JD (2008) Something from (almost) nothing: the impact of multiple displacement amplification on microbial ecology. ISME J 2:233–241. doi: 10.1038/ismej.2008.10
Bio-TIC (2015) The bioeconomy enabled - A roadmap to a thriving industrial biotechnology sector in Europe.
BioVale (2015) BioVale: A strategy for a bioeconomy innovation cluster across Yorkshire and Humber.
Bisswanger H (2014) Enzyme assays. Perspect Sci 1:41–55. doi: 10.1016/J.PISC.2014.02.005
Bocken NMP, de Pauw I, Bakker C, van der Grinten B (2016) Product design and business model strategies for a circular economy. J Ind Prod Eng 33:308–320. doi: 10.1080/21681015.2016.1172124
Bogdanski BEC (2014) The rise and fall of the Canadian pulp and paper sector. For Chron 90:785–793.
Bolger AM, Lohse M, Usadel B (2014) Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30:2114–2120. doi: 10.1093/bioinformatics/btu170
Bowers RM, Kyrpides NC, Stepanauskas R, et al (2017) Minimum information about a single amplified genome (MISAG) and a metagenome-assembled genome (MIMAG) of bacteria and archaea. Nat Biotechnol 35:725–731. doi: 10.1038/nbt.3893
Brown RC (2013) Distributed Production of Biobased Products with Biomass Processing Modules. Ames
Browne T, Gilsenan R, Singbeil D, Paleologou M (2011) Bio-energy and Bio-chemicals Synthesis Report.
Bueso YF, Tangney M (2017) Synthetic Biology in the Driving Seat of the Bioeconomy. Trends Biotechnol 35:373–378. doi: 10.1016/j.tibtech.2017.02.002
Burgess RR, Deutscher MP (2009) Guide to protein purification, 2nd edn. Elsevier/Academic Press
Cai J, He Y, Yu X, et al (2017) Review of physicochemical properties and analytical characterization of lignocellulosic biomass. Renew Sustain Energy Rev 76:309–322. doi: 10.1016/J.RSER.2017.03.072
Caporaso JG, Kuczynski J, Stombaugh J, et al (2010) QIIME allows analysis of high-throughput community sequencing data. Nat Methods 7:335–336. doi: 10.1038/nmeth.f.303
Carrez D, Van Leeuwen P (2015) Bioeconomy: circular by nature. Eur Files 34–35.
Caspi R, Billington R, Ferrer L, et al (2016) The MetaCyc database of metabolic pathways and enzymes and the BioCyc collection of pathway/genome databases. Nucleic Acids Res 44:D471-80.
doi: 10.1093/nar/gkv1164
Chen G-Q (2012) New challenges and opportunities for industrial biotechnology. Microb Cell Fact 11:111. doi: 10.1186/1475-2859-11-111
Chen H, Armstrong Z, Hallam S, Withers S (2016) Synthesis and evaluation of a series of 6-chloro-4-methylumbelliferyl glycosides as fluorogenic reagents for screening metagenomic libraries for glycosidase activity.
Chen H, Han Q, Daniel K, et al (2014) Conversion of Industrial Paper Sludge to Ethanol: Fractionation of Sludge and Its Impact. Appl Biochem Biotechnol 174:2096–2113. doi: 10.1007/s12010-014-1083-z
Cheng J, Romantsov T, Engel K, et al (2017) Functional metagenomics reveals novel β-galactosidases not predictable from gene sequences. PLoS One 12:1–20. doi: 10.1371/journal.pone.0172545
Chistoserdova L (2010) Functional metagenomics: recent advances and future challenges. Biotechnol Genet Eng Rev 26:335–52.
Committee on Industrialization of Biology, Board on Chemical Sciences and Technology, Board on Life Sciences, Division on Earth and Life Studies (2015) Industrialization of Biology: A Roadmap to Accelerate the Advanced Manufacturing of Chemicals.
Council C on MC and FNR (2007a) A Balanced Portfolio: Multi-Scale Projects in the “Global Metagenomics Initiative.” In: The New Science of Metagenomics: Revealing the Secrets of Our Microbial Planet. National Academies Press, pp 107–123
Council NR (2007b) The New Science of Metagenomics: Revealing the Secrets of Our Microbial Planet. National Academies Press, Washington, D.C.
Craig JW, Chang F-Y, Kim JH, et al (2010) Expanding Small-Molecule Functional Metagenomics through Parallel Screening of Broad-Host-Range Cosmid Environmental DNA Libraries in Diverse Proteobacteria. Appl Environ Microbiol 76:1633–1641. doi: 10.1128/AEM.02169-09
Cuadrat RRC, Ionescu D, Dávila AMR, Grossart H-P (2018) Recovering Genomics Clusters of Secondary Metabolites from Lakes Using Genome-Resolved Metagenomics. Front Microbiol 9:251.
doi: 10.3389/fmicb.2018.00251
Darling AE, Jospin G, Lowe E, et al (2014) PhyloSift: phylogenetic analysis of genomes and metagenomes. PeerJ 2:e243. doi: 10.7717/peerj.243
Dashtban M, Maki M, Leung KT, et al (2010) Cellulase activities in biomass conversion: Measurement methods and comparison. Crit Rev Biotechnol 30:302–309. doi: 10.3109/07388551.2010.490938
De Marco ÉG, Heck K, Martos ET, Van Der Sand ST (2017) Purification and characterization of a thermostable alkaline cellulase produced by Bacillus licheniformis 380 isolated from compost. Ann Brazilian Acad Sci 8933:2359–2370. doi: 10.1590/0001-3765201720170408
DOE. U (2015) Bioenergy Workshop.
Dröge J, Gregor I, McHardy AC (2015) Taxator-tk: precise taxonomic assignment of metagenomes by fast approximation of evolutionary neighborhoods. Bioinformatics 31:817–24. doi: 10.1093/bioinformatics/btu745
El-Chichakli B, von Braun J, Lang C, et al (2016) Policy: Five cornerstones of a global bioeconomy. Nature 535:221–223. doi: 10.1038/535221a
Engelbrektson A, Kunin V, Wrighton KC, et al (2010) Experimental factors affecting PCR-based estimates of microbial species richness and evenness. ISME J 4:642–647. doi: 10.1038/ismej.2009.153
Epicentre (2010) CopyControl™ Fosmid Library Production Kit with pCC1FOS™ Vector; CopyControl™ HTP Fosmid Library Production Kit with pCC2FOS™ Vector. Control 1–28. doi: CCFOSS110
European Commission (2017) Review of the 2012 European Bioeconomy Strategy.
Falkowski PG, Fenchel T, Delong EF (2008) The Microbial Engines That Drive Earth’s Biogeochemical Cycles. Science 320:1034–1039. doi: 10.1126/science.1153213
Ferrari AR, Gaber Y, Fraaije MW (2014) A fast, sensitive and easy colorimetric assay for chitinase and cellulase activity detection. Biotechnol Biofuels 7:37. doi: 10.1186/1754-6834-7-37
Ferrer M, Martínez-Martínez M, Bargiela R, et al (2016) Estimating the success of enzyme bioprospecting through metagenomics: current status and future trends.
Microb Biotechnol 9:22–34. doi: 10.1111/1751-7915.12309
Fincher G, Mark B, Brumer H (2017) Glycoside Hydrolase Family 3. In: CAZYpedia. //www.cazypedia.org/index.php?title=Glycoside_Hydrolase_Family_3&oldid=11467. Accessed 20 Mar 2018
Gabor EM, Alkema WBL, Janssen DB (2004) Quantifying the accessibility of the metagenome by random expression cloning techniques. Environ Microbiol 6:879–886. doi: 10.1111/j.1462-2920.2004.00640.x
General Assembly UN (1987) Our Common Future: Report of the World Commission on Environment and Development. Oslo
Geng A, Zou G, Yan X, et al (2012) Expression and characterization of a novel metagenome-derived cellulase Exo2b and its application to improve cellulase activity in Trichoderma reesei. Appl Microbiol Biotechnol 96:951–962. doi: 10.1007/s00253-012-3873-y
Geng Y, Doberstein B (2008) Developing the circular economy in China: Challenges and opportunities for achieving “leapfrog development.” Int J Sustain Dev World Ecol 15:231–239. doi: 10.3843/SusDev.15.3:6
Ghribi M, Meddeb-Mouelhi F, Beauregard M (2016) Microbial diversity in various types of paper mill sludge: identification of enzyme activities with potential industrial applications. Springerplus. doi: 10.1186/s40064-016-3147-8
Gies EA, Konwar KM, Beatty JT, Hallam SJ (2014) Illuminating microbial dark matter in meromictic Sakinaw Lake. Appl Environ Microbiol 80:6807–18. doi: 10.1128/AEM.01774-14
Gladden JM, Allgaier M, Miller CS, et al (2011) Glycoside hydrolase activities of thermophilic bacterial consortia adapted to switchgrass. Appl Environ Microbiol 77:5804–12. doi: 10.1128/AEM.00032-11
Gonzalez JM, Portillo MC, Belda-Ferre P, Mira A (2012) Amplification by PCR artificially reduces the proportion of the rare biosphere in microbial communities. PLoS One 7:e29973.
doi: 10.1371/journal.pone.0029973
Grob C, Taubert M, Howat AM, et al (2015) Combining metagenomics with metaproteomics and stable isotope probing reveals metabolic pathways used by a naturally occurring marine methylotroph. Environ Microbiol 17:4007–4018. doi: 10.1111/1462-2920.12935
Grondin JM, Tamura K, Déjean G, et al (2017) Polysaccharide Utilization Loci: Fueling Microbial Communities. J Bacteriol 199:JB.00860-16. doi: 10.1128/JB.00860-16
Gurevich A, Saveliev V, Vyahhi N, Tesler G (2013) QUAST: quality assessment tool for genome assemblies. Bioinformatics 29:1072–5. doi: 10.1093/bioinformatics/btt086
Gurram RN, Al-Shannag M, Lecher NJ, et al (2015) Bioconversion of paper mill sludge to bioethanol in the presence of accelerants or hydrogen peroxide pretreatment. Bioresour Technol 192:529–539. doi: 10.1016/j.biortech.2015.06.010
Hames B, Ruiz R, Scarlata C, et al (2008) Preparation of Samples for Compositional Analysis: Laboratory Analytical Procedure (LAP); Issue Date 08/08/2008.
Handelsman J (2004) Metagenomics: Application of Genomics to Uncultured Microorganisms. Microbiol Mol Biol Rev 68:669–685. doi: 10.1128/MMBR.68.4.669-685.2004
Hanson NW, Konwar KM, Wu S-J, Hallam SJ (2014) MetaPathways v2.0: A master-worker model for environmental Pathway/Genome Database construction on grids and clouds. In: 2014 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology. IEEE, pp 1–7
Harris HMB, Duncan SH, P. Scott K, et al (2016) Polysaccharide utilization loci and nutritional specialization in a dominant group of butyrate-producing human colonic Firmicutes. Microb Genomics 2:e000043. doi: 10.1099/mgen.0.000043
Hawley AK, Nobu MK, Wright JJ, et al (2017) Diverse Marinimicrobia bacteria may mediate coupled biogeochemical cycles along eco-thermodynamic gradients. Nat Commun 8:1507.
doi: 10.1038/s41467-017-01376-9
Henn MR, Sullivan MB, Stange-Thomann N, et al (2010) Analysis of high-throughput sequencing and annotation strategies for phage genomes. PLoS One 5:e9083. doi: 10.1371/journal.pone.0009083
Hess M, Sczyrba A, Egan R, et al (2011) Metagenomic discovery of biomass-degrading genes and genomes from cow rumen. Science 331:463–7. doi: 10.1126/science.1200387
Hu J, Arantes V, Saddler JN (2011) The enhancement of enzymatic hydrolysis of lignocellulosic substrates by the addition of accessory enzymes such as xylanase: is it an additive or synergistic effect? Biotechnol Biofuels 4:36. doi: 10.1186/1754-6834-4-36
Huson DH, Beier S, Flade I, et al (2016) MEGAN Community Edition - Interactive Exploration and Analysis of Large-Scale Microbiome Sequencing Data. PLOS Comput Biol 12:e1004957. doi: 10.1371/journal.pcbi.1004957
Hyatt D, Chen G-L, Locascio PF, et al (2010) Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics 11:119. doi: 10.1186/1471-2105-11-119
Illumina Inc. (2015) An introduction to Next-Generation Sequencing Technology.
Ilmberger N, Güllert S, Dannenberg J, et al (2014) A Comparative Metagenome Survey of the Fecal Microbiota of a Breast- and a Plant-Fed Asian Elephant Reveals an Unexpectedly High Diversity of Glycoside Hydrolase Family Enzymes. PLoS One 9:e106707. doi: 10.1371/journal.pone.0106707
Ioelovich M (2014) Waste Paper as Promising Feedstock for Production of Biofuel.
Jackson MA, Line MA (1997) Organic Composition of a Pulp and Paper Mill Sludge Determined by FTIR, 13C CP MAS NMR, and Chemical Extraction Techniques. doi: 10.1021/JF960946L
Jendrisak JJ, Hoffman LM, Fiandt MJ, Haskins D (2002) Methods and compositions for amplifying DNA clone copy number. 14.
Jessen GL, Lichtschlag A, Ramette A, et al (2017) Hypoxia causes preservation of labile organic matter and changes seafloor microbial community composition (Black Sea). Sci Adv 3:e1601897.
doi: 10.1126/sciadv.1601897
Kanehisa M, Sato Y, Kawashima M, et al (2016) KEGG as a reference resource for gene and protein annotation. Nucleic Acids Res 44:D457–D462. doi: 10.1093/nar/gkv1070
Kang L, Wang W, Lee YY (2010) Bioconversion of kraft paper mill sludges to ethanol by SSF and SSCF. Appl Biochem Biotechnol 161:53–66. doi: 10.1007/s12010-009-8893-4
Karlsson J, Siika-aho M, Tenkanen M, Tjerneld F (2002) Enzymatic properties of the low molecular mass endoglucanases Cel12A (EG III) and Cel45A (EG V) of Trichoderma reesei. J Biotechnol 99:63–78. doi: 10.1016/S0168-1656(02)00156-6
Kelley DR, Liu B, Delcher AL, et al (2012) Gene prediction with Glimmer for metagenomic sequences augmented by classification and clustering. Nucleic Acids Res 40:e9–e9. doi: 10.1093/nar/gkr1067
Kharayat Y, Thakur IS (2012) Isolation of bacterial strain from sediment core of Pulp and Paper Mill industries for production and purification of lignin peroxidase (LiP) enzyme. Bioremediat J 16:125–130. doi: 10.1080/10889868.2012.665964
Korhonen J, Honkasalo A, Seppälä J (2018) Circular Economy: The Concept and its Limitations. Ecol Econ 143:37–46.
Kricka W, Fitzpatrick J, Bond U (2014) Metabolic engineering of yeasts by heterologous enzyme production for degradation of cellulose and hemicellulose from biomass: a perspective. Front Microbiol 5:174. doi: 10.3389/fmicb.2014.00174
Kunin V, Engelbrektson A, Ochman H, Hugenholtz P (2010) Wrinkles in the rare biosphere: pyrosequencing errors can lead to artificial inflation of diversity estimates. Environ Microbiol 12:118–123. doi: 10.1111/j.1462-2920.2009.02051.x
Kusnezowa A, Leichert LI (2017) In silico approach to designing rational metagenomic libraries for functional studies. BMC Bioinformatics 18:267. doi: 10.1186/s12859-017-1668-y
Lam KN, Cheng J, Engel K, et al (2015) Current and future resources for functional metagenomics. Front Microbiol 6:1–8.
doi: 10.3389/fmicb.2015.01196
Lamers P, Searcy E, Hess JR, Stichnothe H (2016) Developing the global bioeconomy: technical, market, and environmental lessons from bioenergy.
Landry Z, Swan BK, Herndl GJ, et al (2017) SAR202 Genomes from the Dark Ocean Predict Pathways for the Oxidation of Recalcitrant Dissolved Organic Matter. MBio 8:e00413-17. doi: 10.1128/mBio.00413-17
Lange L, Hreggviðsson GÓ, Björnsdóttir B, et al (2016) Development of the Nordic bioeconomy. Rosendahls-SchultzGrafisk
Larsbrink J, Rogers TE, Hemsworth GR, et al (2014) A discrete genetic locus confers xyloglucan metabolism in select human gut Bacteroidetes. Nature 506:498–502. doi: 10.1038/nature12907
Lee S, Hallam SJ (2009) Extraction of High Molecular Weight Genomic DNA from Soils and Sediments. J Vis Exp 2–5. doi: 10.3791/1569
Li D, Liu C-M, Luo R, et al (2015) MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics 31:1674–1676. doi: 10.1093/bioinformatics/btv033
Li H, Durbin R (2010) Fast and accurate long-read alignment with Burrows–Wheeler transform. Bioinformatics 26:589–595. doi: 10.1093/bioinformatics/btp698
Lieder M, Rashid A (2016) Towards circular economy implementation: A comprehensive review in context of manufacturing industry. J Clean Prod 115:36–51. doi: 10.1016/j.jclepro.2015.12.042
López-Mondéjar R, Zühlke D, Becher D, et al (2016) Cellulose and hemicellulose decomposition by forest soil bacteria proceeds by the action of structurally variable enzymatic systems. Nat Publ Gr. doi: 10.1038/srep25279
Lopolito A, Nardone G, Prosperi M, et al (2011) Modeling the bio-refinery industry in rural areas: A participatory approach for policy options comparison. Ecol Econ 72:18–27. doi: 10.1016/j.ecolecon.2011.09.010
Love MI, Huber W, Anders S (2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol 15:550.
doi: 10.1186/s13059-014-0550-8
Lucigen (2016) Phage T1-Resistant TransforMax™ EPI300™-T1.
Mabee W (2001) Study of woody fibre in papermill sludge. University of Toronto
Macdonald SS, Patel A, Larmour VLC, et al (2018) Structural and mechanistic analysis of a β-glycoside phosphorylase identified by screening a metagenomic library. J Biol Chem jbc.RA117.000948. doi: 10.1074/jbc.RA117.000948
Madhavan A, Sindhu R, Parameswaran B, et al (2017) Metagenome Analysis: a Powerful Tool for Enzyme Bioprospecting. Appl Biochem Biotechnol 183:636–651. doi: 10.1007/s12010-017-2568-3
Maki ML, Broere M, Leung KT, Qin W (2011) Characterization of some efficient cellulase producing bacteria isolated from paper mill sludges and organic fertilizers. Int J Biochem Mol Biol 2:146–154.
Mäkinen V, Salmela L, Ylinen J (2012) Normalized N50 assembly metric using gap-restricted co-linear chaining. BMC Bioinformatics 13:255. doi: 10.1186/1471-2105-13-255
Marques S, Alves L, Roseiro JC, Gírio FM (2008) Conversion of recycled paper sludge to ethanol by SHF and SSF using Pichia stipitis. Biomass and Bioenergy 32:400–406. doi: 10.1016/j.biombioe.2007.10.011
Martínez A, Osburne MS (2013) Preparation of Fosmid Libraries and Functional Metagenomic Analysis of Microbial Community DNA. Methods Enzymol 531:123–142. doi: 10.1016/B978-0-12-407863-5.00007-1
Maruthamuthu M, Jiménez DJ, Stevens P, Dirk Van Elsas J (2016) A multi-substrate approach for functional metagenomics-based screening for (hemi)cellulases in two wheat straw-degrading microbial consortia unveils novel thermoalkaliphilic enzymes. BMC Genomics. doi: 10.1186/s12864-016-2404-0
McDonough W, Braungart M (2002) Cradle to cradle: remaking the way we make things. North Point Press
Mewis K (2016) Functional Metagenomic Screening for Glycoside Hydrolases. University of British Columbia
Mewis K, Armstrong Z, Song YC, et al (2013) Biomining active cellulases from a mining bioremediation system. J Biotechnol 167:462–471.
doi: 10.1016/j.jbiotec.2013.07.015
Mewis K, Lenfant N, Lombard V, Henrissat B (2016) Dividing the Large Glycoside Hydrolase Family 43 into Subfamilies: a Motivation for Detailed Enzyme Characterization. Appl Environ Microbiol 82:1686–92. doi: 10.1128/AEM.03453-15
Mewis K, Taupp M, Hallam S (2011) A high throughput screen for biomining cellulase activity from metagenomic libraries.
Miller CS, Baker BJ, Thomas BC, et al (2011) EMIRGE: reconstruction of full-length ribosomal genes from microbial community short read sequencing data. Genome Biol 12:R44. doi: 10.1186/gb-2011-12-5-r44
Nancucheo I, Bitencourt JAP, Sahoo PK, et al (2017) Recent Developments for Remediating Acidic Mine Waters Using Sulfidogenic Bacteria. Biomed Res Int 2017:7256582. doi: 10.1155/2017/7256582
Ndeh D, Rogowski A, Cartmell A, et al (2017) Complex pectin metabolism by gut bacteria reveals novel catalytic functions. Nature 544:65–70. doi: 10.1038/nature21725
Nichols D, Cahoon N, Trakhtenberg EM, et al (2010) Use of ichip for high-throughput in situ cultivation of "uncultivable" microbial species. Appl Environ Microbiol 76:2445–2450. doi: 10.1128/AEM.01754-09
Noguchi H, Taniguchi T, Itoh T (2008) MetaGeneAnnotator: Detecting Species-Specific Patterns of Ribosomal Binding Site for Precise Gene Prediction in Anonymous Prokaryotic and Phage Genomes. DNA Res 15:387–396. doi: 10.1093/dnares/dsn027
O’Leary NA, Wright MW, Brister JR, et al (2016) Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation. Nucleic Acids Res 44:D733-45. doi: 10.1093/nar/gkv1189
Oliveira JS, Araújo W, Lopes Sales AI, et al (2015) BioSurfDB: knowledge and algorithms to support biosurfactants and biodegradation studies. Database (Oxford). doi: 10.1093/database/bav033
Olson DG, McBride JE, Joe Shaw A, Lynd LR (2012) Recent progress in consolidated bioprocessing. Curr Opin Biotechnol 23:396–405.
doi: 10.1016/J.COPBIO.2011.11.026
Owen PW (2018) Special Report Renewable energy for sustainable rural development: significant potential synergies, but mostly unrealised.
Parisutham V, Kim TH, Lee SK (2014) Feasibilities of consolidated bioprocessing microbes: From pretreatment to biofuel production. Bioresour Technol 161:431–440. doi: 10.1016/J.BIORTECH.2014.03.114
Parks DH, Imelfort M, Skennerton CT, et al (2015) CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res 25:1043–55. doi: 10.1101/gr.186072.114
Patel AK, Singhania RR, Pandey A (2017) Production, Purification, and Application of Microbial Enzymes. In: Biotechnology of Microbial Enzymes. Elsevier, pp 13–41
Pellerin W, Taylor DW (2008) Measuring the biobased economy: A Canadian perspective. Ind Biotechnol 4:363–366. doi: 10.1089/ind.2008.4.363
Pellis A, Cantone S, Ebert C, Gardossi L (2018) Evolving biocatalysis to meet bioeconomy challenges and opportunities. N Biotechnol 40:154–169. doi: 10.1016/J.NBT.2017.07.005
Philp J, Winickoff DE (2017) Clusters in Industrial Biotechnology and Bioeconomy: The Roles of the Public Sector. Trends Biotechnol 35:682–686. doi: 10.1016/j.tibtech.2017.04.001
Pilloni G, Granitsiotis MS, Engel M, Lueders T (2012) Testing the Limits of 454 Pyrotag Sequencing: Reproducibility, Quantitative Assessment and Comparison to T-RFLP Fingerprinting of Aquifer Microbes. PLoS One 7:e40467. doi: 10.1371/journal.pone.0040467
Prasetyo J, Naruse K, Kato T, et al (2011) Bioconversion of paper sludge to biofuel by simultaneous saccharification and fermentation using a cellulase of paper sludge origin and thermotolerant Saccharomyces cerevisiae TJ14. Biotechnol Biofuels 4:35. doi: 10.1186/1754-6834-4-35
Prasetyo J, Park EY (2013) Waste paper sludge as a potential biomass for bio-ethanol production. Korean J Chem Eng 30:253–261. doi: 10.1007/s11814-013-0003-1
Prather KLJ (2004) Integrated Chemical Engineering Topics I.
In: MIT OpenCourseWare. Massachusetts Institute of Technology: MIT OpenCourseWare
Quast C, Pruesse E, Yilmaz P, et al (2013) The SILVA ribosomal RNA gene database project: improved data processing and web-based tools. Nucleic Acids Res 41:D590-6. doi: 10.1093/nar/gks1219
Rabaçal M, Ferreira AF, Silva CAM da, Costa M (2017) Biorefineries: targeting energy, high value products and waste valorisation.
Radajewski S, Ineson P, Parekh NR, Murrell JC (2000) Stable-isotope probing as a tool in microbial ecology. Nature 403:646–649. doi: 10.1038/35001054
Rangu V (2018) Fractionation of pulp mill waste to produce hemicellulose oligomers for adsorption onto NBSK pulp. University of British Columbia
Ransom-Jones E, McCarthy AJ, Haldenby S, et al (2017) Lignocellulose-Degrading Microbial Communities in Landfill Sites Represent a Repository of Unexplored Biomass-Degrading Diversity. mSphere. doi: 10.1128/mSphere.00300-17
Rashamuse K, Sanyika Tendai W, Mathiba K, et al (2016) Metagenomic mining of glycoside hydrolases from the hindgut bacterial symbionts of a termite (Trinervitermes trinervoides) and the characterization of a multimodular β-1,4-xylanase (GH11). Biotechnol Appl Biochem. doi: 10.1002/bab.1480
Ree R van, Jong E de (2017) Biorefining in a future BioEconomy. http://www.ieabioenergy.com/task/biorefining-sustainable-processing-of-biomass-into-a-spectrum-of-marketable-biobased-products-and-bioenergy/. Accessed 11 Mar 2018
Rehmann M (2010) Overview of Sustainability Concepts. In: International Forum on Sustainable Operations for Uranium Production. International Atomic Energy Agency, pp 1–38
Rho M, Tang H, Ye Y (2010) FragGeneScan: predicting genes in short and error-prone reads. Nucleic Acids Res 38:e191–e191. doi: 10.1093/nar/gkq747
Rhoads A, Au KF (2015) PacBio Sequencing and Its Applications. Genomics Proteomics Bioinformatics 13:278–289.
doi: 10.1016/J.GPB.2015.08.002
Rochman FF, Sheremet A, Tamas I, et al (2017) Benzene and Naphthalene Degrading Bacterial Communities in an Oil Sands Tailings Pond. Front Microbiol 8:1845. doi: 10.3389/fmicb.2017.01845
Roumpeka DD, Wallace RJ, Escalettes F, et al (2017) A Review of Bioinformatics Tools for Bio-Prospecting from Metagenomic Sequence Data. Front Genet 8:23. doi: 10.3389/fgene.2017.00023
Rubin EM (2008) Genomics of cellulosic biofuels. Nature 454:841–845. doi: 10.1038/nature07190
Salehi Jouzani G, Taherzadeh MJ (2015) Advances in consolidated bioprocessing systems for bioethanol and butanol production from biomass: a comprehensive review. Biofuel Res J 5:152–195. doi: 10.18331/BRJ2015.2.1.4
Schloss PD, Handelsman J (2005) Metagenomics for studying unculturable microorganisms: cutting the Gordian knot. Genome Biol 6:229. doi: 10.1186/gb-2005-6-8-229
Schomburg D, Schomburg I (2010) Enzyme Databases. Humana Press, pp 113–128
Schütte G (2017) What kind of innovation policy does the bioeconomy need? N Biotechnol 3–7. doi: 10.1016/j.nbt.2017.04.003
Sczyrba A, Hofmann P, Belmann P, et al (2017) Critical Assessment of Metagenome Interpretation – a comprehensive benchmark of computational metagenomics software. bioRxiv 1–33. doi: 10.1101/0991277
Sedlar K, Kupkova K, Provaznik I (2017) Bioinformatics strategies for taxonomy independent binning and visualization of sequences in shotgun metagenomics. Comput Struct Biotechnol J 15:48–55. doi: 10.1016/j.csbj.2016.11.005
Sekiguchi Y, Yamada T, Hanada S, et al (2003) Anaerolinea thermophila gen. nov., sp. nov. and Caldilinea aerophila gen. nov., sp. nov., novel filamentous thermophiles that represent a previously uncultured lineage of the domain Bacteria at the subphylum level. Int J Syst Evol Microbiol 53:1843–1851.
doi: 10.1099/ijs.0.02699-0
Sharan AA, Yadav VG, Hallam SJ (2017) Deep Device Mining for Carbohydrate-Active Enzymes in Pulp and Paper Mill Sludge Metagenome and Applications to Bioprocess Development. In: 2017 Synthetic Biology: Engineering, Evolution & Design (SEED). Vancouver
Sheldon RA, Woodley JM (2018) Role of Biocatalysis in Sustainable Chemistry. Chem Rev 118:801–838. doi: 10.1021/acs.chemrev.7b00203
Sillanpää M, Ncibi C (2017) A sustainable bioeconomy: The green industrial revolution.
Simmons CW, Reddy AP, D’haeseleer P, et al (2014) Metatranscriptomic analysis of lignocellulolytic microbial communities involved in high-solids decomposition of rice straw. Biotechnol Biofuels 7:495. doi: 10.1186/s13068-014-0180-0
Simon C, Daniel R (2011) Metagenomic analyses: past and future trends. Appl Environ Microbiol 77:1153–61. doi: 10.1128/AEM.02345-10
Skene KR (2017) Circles, spirals, pyramids and cubes: why the circular economy cannot work. Sustain Sci. doi: 10.1007/s11625-017-0443-3
Sluiter A, Ruiz R, Scarlata C, et al (2008) Determination of Extractives in Biomass: Laboratory Analytical Procedure (LAP).
Sommer MOA, Church GM, Dantas G (2010) A functional metagenomic approach for expanding the synthetic biology toolbox for biomass conversion. Mol Syst Biol 6:360. doi: 10.1038/msb.2010.16
Stadler LB, Delgado Vela J, Jain S, et al (2017) Elucidating the impact of microbial community biodiversity on pharmaceutical biotransformation during wastewater treatment. Microb Biotechnol. doi: 10.1111/1751-7915.12870
Stahel WR (2016) Circular economy. 6–9.
Stahel WR (2010) The Performance Economy. Palgrave Macmillan UK, London
Stamatakis A (2014) RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30:1312–3.
doi: 10.1093/bioinformatics/btu033
Stark M, Berger SA, Stamatakis A, von Mering C (2010) MLTreeMap - accurate Maximum Likelihood placement of environmental DNA sequences into taxonomic and functional reference phylogenies. BMC Genomics 11:461. doi: 10.1186/1471-2164-11-461
Steels S, Portetelle D, Vandenbol M (2013) Bacillus subtilis as a Tool for Screening Soil Metagenomic Libraries for Antimicrobial Activities. J Microbiol Biotechnol 23:850–855. doi: 10.4014/jmb.1212.12008
Stott MB, Dunfield PF, Crowe MA (2009) Class of Chloroflexi-like thermophilic cellulose degrading bacteria. 40.
Strachan CR, Singh R, Vaninsberghe D, et al (2014) Metagenomic scaffolds enable combinatorial lignin transformation. Proc Natl Acad Sci USA 111:10143–10148.
Streit WR, Daniel R (2010) Metagenomics: methods and protocols. Humana Press
TAPPI (2006) Acid-insoluble lignin in wood and pulp (Reaffirmation of T 222 om-02).
Tatusov RL, Fedorova ND, Jackson JD, et al (2003) The COG database: an updated version includes eukaryotes. BMC Bioinformatics 4:41. doi: 10.1186/1471-2105-4-41
Taupp M, Lee S, Hawley A, et al (2009) Large Insert Environmental Genomic Library Production. J Vis Exp 2–7. doi: 10.3791/1387
Taupp M, Mewis K, Hallam SJ (2011) The art and design of functional metagenomic screens. Curr Opin Biotechnol 22:465–472. doi: 10.1016/j.copbio.2011.02.010
Terrapon N, Lombard V, Drula E, et al (2017) The CAZy Database/the Carbohydrate-Active Enzyme (CAZy) Database: Principles and Usage Guidelines. In: A Practical Guide to Using Glycomics Databases. Springer Japan, Tokyo, pp 117–131
Terrapon N, Lombard V, Gilbert HJ, Henrissat B (2015) Automatic prediction of polysaccharide utilization loci in Bacteroidetes species. Bioinformatics 31:647–655. doi: 10.1093/bioinformatics/btu716
Tessler M, Neumann JS, Afshinnekoo E, et al (2017) Large-scale differences in microbial biodiversity discovery between 16S amplicon and shotgun sequencing. Sci Rep 7:6589.
doi: 10.1038/s41598-017-06665-3 The Ellen MacArthur Foundation (2015) Towards a Circular Economy - Economic and Business Rationale for an Accelerated Transition. Greener Manag Int 97. doi: 2012-04-03 Thies S, Rausch SC, Kovacic F, et al (2016) Metagenomic discovery of novel enzymes and biosurfactants in a slaughterhouse biofilm microbial community. Sci Rep 6:27035. doi: 146  10.1038/srep27035 Thomas T, Gilbert J, Meyer F (2012) Metagenomics - a guide from sampling to data analysis. Microb Inform Exp 2:3. doi: 10.1186/2042-5783-2-3 Timmermans K (2001) Trips, CBD and Traditional Medicines: Concepts and Questions. Report of an ASEAN Workshop on the TRIPS Agreement and Traditional Medicine, Jakarta, February 2001: II. CONTEXT: 2.3 Bioprospecting. Jakarta Tiwari R, Nain L, Labrou NE, Shukla P (2018) Bioprospecting of functional cellulases from metagenome for second generation biofuel production: a review. Crit Rev Microbiol 44:244–257. doi: 10.1080/1040841X.2017.1337713 van den Brink J, de Vries RP (2011) Fungal enzyme sets for plant polysaccharide degradation. Appl Microbiol Biotechnol 91:1477–92. doi: 10.1007/s00253-011-3473-2 Venditti RA (2014) Selected Topics in Lignocellulosics for Biofuels: Sludge to Ethanol. http://www4.ncsu.edu/~richardv/ethanol.html. Accessed 30 Mar 2018 Venkata Mohan S, Nikhil GN, Chiranjeevi P, et al (2016) Waste biorefinery models towards sustainable circular bioeconomy: Critical review and future perspectives. Bioresour Technol 215:2–12. doi: 10.1016/j.biortech.2016.03.130 Wang C, Dong D, Wang H, et al (2016) Metagenomic analysis of microbial consortia enriched from compost: new insights into the role of Actinobacteria in lignocellulose decomposition. Biotechnol Biofuels 9:22. doi: 10.1186/s13068-016-0440-2 Wang Y, Zhang R, He Z, et al (2017) Functional Gene Diversity and Metabolic Potential of the 147  Microbial Community in an Estuary-Shelf Environment. Front Microbiol 8:1153. 
doi: 10.3389/fmicb.2017.01153 Whitman WB, Coleman DC, Wiebe WJ (1998) Prokaryotes: The unseen majority. 95:6578–6583. Wild J, Hradecna Z, Szybalski W (2002) Conditionally amplifiable BACs: switching from single-copy to high-copy vectors and genomic clones. Genome Res 12:1434–44. doi: 10.1101/gr.130502 Wilhelm RC, Cardenas E, Leung H, et al (2017) Data Descriptor: A metagenomic survey of forest soil microbial communities more than a decade after timber harvesting. Sci Data. doi: 10.1038/sdata.2017.92 Wilson DB (2009) Cellulases. In: Encyclopedia of Microbiology. Elsevier, pp 252–258 Wright JJ, Lee S, Zaikova E, et al (2009) DNA Extraction from 0.22 &amp;mu;M Sterivex Filters and Cesium Chloride Density Gradient Centrifugation. J Vis Exp. doi: 10.3791/1352 Wu CH, Apweiler R, Bairoch A, et al (2006) The Universal Protein Resource (UniProt): an expanding universe of protein information. Nucleic Acids Res 34:D187-91. doi: 10.1093/nar/gkj161 Wu Y-W, Simmons BA, Singer SW (2016) MaxBin 2.0: an automated binning algorithm to recover genomes from multiple metagenomic datasets. Bioinformatics 32:605–607. doi: 10.1093/bioinformatics/btv638 Xia Y, Ju F, Fang HHP, Zhang T (2013) Mining of Novel Thermo-Stable Cellulolytic Genes from a Thermophilic Cellulose-Degrading Consortium by Metagenomics. PLoS One 8:e53779. doi: 148  10.1371/journal.pone.0053779 Xing S, Li G, Sun X, et al (2013) Dynamic Changes in Xylanases and β-1,4-Endoglucanases Secreted by Aspergillus niger An-76 in Response to Hydrolysates of Lignocellulose Polysaccharide. Appl Biochem Biotechnol 171:832–846. doi: 10.1007/s12010-013-0402-0 Yu K, Zhang T (2012) Metagenomic and Metatranscriptomic Analysis of Microbial Community Structure and Gene Expression of Activated Sludge. PLoS One 7:e38183. doi: 10.1371/journal.pone.0038183 Zhang G, Liu P, Zhang L, et al (2016a) Bioprospecting metagenomics of a microbial community on cotton degradation: Mining for new glycoside hydrolases. J Biotechnol 234:35–42. 
doi: 10.1016/J.JBIOTEC.2016.07.017 Zhang X, Wang S, Wu X, et al (2016b) Subsite-specific contributions of different aromatic residues in the active site architecture of glycoside hydrolase family 12. Sci Rep 5:18357. doi: 10.1038/srep18357 Ziels RM, Sousa DZ, Stensel HD, Beck DAC (2018) DNA-SIP based genome-centric metagenomics identifies key long-chain fatty acid-degrading populations in anaerobic digesters with different feeding frequencies. ISME J 12:112–123. doi: 10.1038/ismej.2017.143    149  Appendix A Chapter 4- Sub-cloning Details  1. P04P08  Glycoside hydrolase (GH) 3 ORF length: 2150 bp Contig location: 3 Strand orientation: “ – “ LCA taxonomy: Prokaryotes  Start position: 14083 bp End position: 16233 bp ORF_ID: 2_9  Nucleotide Sequence: >PPSLIBM-04-P08_2_9 ATGAGCCTGGAAGAGAAGGTCGCGCAACTGGCGCAGATCAGCGGAGGCGACTTTATGCCAGGGCCAAAGGCCGCCGACATCATCCGCAAGAGCGGGGCTGGCTCTGTGCTGTGGCTGAACGACACCAGGCGGTTCAACGAATTGCAGAAGATCGCCGTGGAGGAAAGCCCGTCCGGCATCCCCGTGCTGTTTGCGCTGGATGTGATTCACGGCTACCGCACGATCTTCCCCGTGCCGCTGGCGATGGCTTCTTCATGGGACCCCGCTGTGGCGGAACAGGCGCAGACGGTGGCAGCGCGCGAAACCCGCGCCGCCGGGCCGCATTGGACGTTTGGCCCGATGCTGGACATCGCGCGCGATGCGCGCTGGGGGCGAATCGTGGAAGGGGCGGGCGAAGATCCCTATCTCGGGGCGGCGATGGCAGCCGCACAGGTGCGCGGATTCCAGGGCGCCGACCTGTCGGACCCGGAGCGCGTGCTGGCGTGCGCCAAGCACTTTGCCGGCTATGGCGCGGCGGAAGGCGGCCGTGACTACGACGAGGTGCACCTGTCGGAGACGGAGCTGCGCAACACGTACTTTCCGCCCTTCGAGGCGGCGGTGAAAGCCGGCGTGGGTTCCTTCATGGGCGCCTATATGGATTTGAACCATGTCCCGGCCAGCGCCAACCGCTGGCTGCTGCGCGACATGCTGCGCAGCGAATTTGGCTTTGAAGGGTTTGTGGTCAGCGATGCCCTGGCGATTGGCAACCTGGTCATCCAGGGCCACGCGCGCGACAAGCGCGATGCTGCGCTGCGGGCGCTCAAGGCCGGCATGAACATGGACATGGCTTCCGGTTCGTACCTGGAAAACCTGGCCGACCTGGTAAAGGATGGCTCCCTTTCGGCAGAGCAGATCGATGAGATGGTGCGGCCAATCCTGGCGATCAAGTTCAAGATGGGGCTGTTCGAGAACCCTTACGTGGAAGAAGGACTGCTGGAGAAGGTGGCCGCCAGGCCCGACCACCGTGAGTTGGCGCGCTGGGCGGCGCAACGCTCGATGGTGCTGCTCAAGAACGAAGGCGGGCTGCTGCCGCTTGCCAAGAGCCTGCAGAAGGTTGCCGTGCTCGGCCCACTGGCCGACTCGATGGCGGCCACCGAAGGATCGTGGATGGTCTTCGGCCATCAACCGTCTGCCGTGACCGTGCTGCAAGGCATTCGGGCCAAGCTGCCCGACGCCAATGTGC
AGTACGCCCCCGGGCCGGATATCCGGCGCGATTTCCCCTCGTTCTTTGACGAACTCTTCTCGGAAGCCAAGA150  AACCCGTCCAAACGCCCGCAGAGGCCGACGCAGCCTTGGCAACGGCCGTAGCAACTGCGCAGGCTGCCGACCTGGTCGTGATGGTGCTGGGCGAAGATGCCAACATGGCCGGCGAGTACGCCAGCCGCGGCTCGCTGGACTTGCCGGGCCGGCAGGAAGAACTGCTCAAGGCGGTCTGTGCGCTGGGCAAACCGGTGGTGCTGGTGCTGCTGAACGGCCGCCCGCTGAGCATCAACTGGGCAGCCGAACACGTGCCCGCCATTCTCGAAGCGTGGGAACCCGGCACGGAGGGCGGCAATGCGGTGGCCGACATTCTGTTCGGCGATGTCAACCCAGGCGGCAAGCTGCCTGTTACCTTTCCGCGCAGCGGCAGCCACGCGCCCATGTATTACGCGCACACGCTCAGCCACCAGCCCGAGGGCCACCCGCAGTACACGTCACGCTACTGGGACAGCCCAACCTCGCCATTGTTCCCGTTTGGCTTTGGCCTCAGCTACACCAGCTTTGCGTTTAGCAACCTCACGCTGTCGGCCCCGCAGGTCAAGCTGGGCGCATCCCTCAGCGTGAACGCCGACGTGACCAACACCGGCCCGGTTGCCGGCGACGAGGTGGTGCAGTTGTACATCCACCAACGCTGGGGAACTGACACGCGCCCGATCCGCGAGTTGAAGGGTTTCCAGCGCATTACCCTGCAGCCGGGAGAAACCAAGACGGTCAGCTTCCCGCTGGGGCCGGAGGAACTGCGCTACTGGAGCACGAATGCCGGCGCGTGGATTCAAGATGCCACAACTTTCGACGTGTGGGTTGGCAGTGACTCGCAGGCCACCCTGCACGCTGAATTTGAGGTGACTGCCTAG Amino acid sequence: PPSLIBM-04-P08_2_9 MSLEEKVAQLAQISGGDFMPGPKAADIIRKSGAGSVLWLNDTRRFNELQKIAVEESPSGIPVLFALDVIHGYRTIFPVPLAMASSWDPAVAEQAQTVAARETRAAGPHWTFGPMLDIARDARWGRIVEGAGEDPYLGAAMAAAQVRGFQGADLSDPERVLACAKHFAGYGAAEGGRDYDEVHLSETELRNTYFPPFEAAVKAGVGSFMGAYMDLNHVPASANRWLLRDMLRSEFGFEGFVVSDALAIGNLVIQGHARDKRDAALRALKAGMNMDMASGSYLENLADLVKDGSLSAEQIDEMVRPILAIKFKMGLFENPYVEEGLLEKVAARPDHRELARWAAQRSMVLLKNEGGLLPLAKSLQKVAVLGPLADSMAATEGSWMVFGHQPSAVTVLQGIRAKLPDANVQYAPGPDIRRDFPSFFDELFSEAKKPVQTPAEADAALATAVATAQAADLVVMVLGEDANMAGEYASRGSLDLPGRQEELLKAVCALGKPVVLVLLNGRPLSINWAAEHVPAILEAWEPGTEGGNAVADILFGDVNPGGKLPVTFPRSGSHAPMYYAHTLSHQPEGHPQYTSRYWDSPTSPLFPFGFGLSYTSFAFSNLTLSAPQVKLGASLSVNADVTNTGPVAGDEVVQLYIHQRWGTDTRPIRELKGFQRITLQPGETKTVSFPLGPEELRYWSTNAGAWIQDATTFDVWVGSDSQATLHAEFEVTA Fw Primer - EcoR1 HF: GTTACTTCGAATTCATGAGCCTGGAAGAGAAGG  (tm 63 deg) Re Primer- Hind III HF: GTTACTTCAAGCTTCTAGGCAGTCACCTCAAATT  (tm 61 deg)       151  2. 
P14I01 Glycoside hydrolase (GH) 1 ORF length: 1358 bp Contig location: 3 Strand orientation: “+” LCA taxonomy: Prokaryotes  Start position: 552 bp End position: 1910 bp ORF_ID: 2_1  Nucleotide Sequence: >PPSLIBM-14-I01_2_1 ATGCCCAGCTTTAACTTCCCGGCAGGCTTTCTATGGGGTTCTGCCACTGCTTCTTACCAGATTGAAGGCGCCGTCAACGAAGATGGTCGCAGCGAATCGATCTGGGACCGCTTCTCGCACACGCCCGGCAAGGTTCTTAACGGAGACACCGGCGACGTTGCGTGCGACCATTACCACCGCTGGCGCGACGACGTAGCGCTGATGAAGTCGCTGGGCCTCAAAGCCTACCGCTTCTCGGTCGCGTGGCCGCGCATCTTGCCCAACGGCGCCGGCGAGGTCAACCAGAAGGGGCTGGACTTCTACAGCGCGCTGGTGGACGAGCTGCTGGCGGCGGGGATTACGCCGTTCGTCACCTTGTATCACTGGGATTTGCCGCAGGTGTTGCAGGATGCCGGCGGCTGGCCCGAGCGCGCCACCTGCGCCGCCTTTGTGGAGTATGCCGACGTGGTCAGCCGCCACTTGGGCGATCGTGTCAAGAACTGGATCACGCACAACGAGCCGTGGTGTGTCAGCTTCCTCAGCCATCAGATTGGCGAGCACGCGCCGGGGTGGAAGGATGACTGGATGGCGGCCTTCCGCGCCGCCCATCACGTGCTGCTGTCACACGGCCAGGCTGTGCCGGTGATCCGCGCCAACAGCGCCGGGGCCGAGGTCGGCATCGCGCTCAACTTCAGTTGGGTGGAAGCCGCTTCCTCCGCCGCCGCCGACCAAATGGCTGCGCGCTGGGCTGACGGCTATTCCAACCGCTGGTTCATCGACCCGGTGTATGGGCGGCGCTACCCGGCGGACATGGTGGAGGCGTTCACCACAGCCGGGCTGTTGCCCAACGGGTTGGACTTTGTGCAGCCGGGCGACATGGATGTGATCGCCACGCAGACGGACTTCTTGGGCGTCAACTACTACACGCGCGATGTGGTCAAGGCGCGAAGTGCGGAGACGCCGCTGCCCGAGCCGGCGCGCGAGGTTGCCACGTTGCCGCGCACCGAGATGGACTGGGAGGTCTACCCGGATGGGCTGTACAAGCTCTTGTGCCGCCTGTATTTTGACTATGACATTCCGAAGCTGTATGTGACGGAGAACGGCTGCAGCTACGGCGATGGGCCGGGGGCCGACGGGGCCGTGCACGACAGGCGGCGCACCGAGTACCTGCGCAGCCACTTCCTGGCGGTGCATCGCGCCATGCTGGCGGGCGCGCCGGTGCAGGGGTATTTCGTGTGGTCGCTGCTGGACAACTTCGAATGGGCCAAGGGGTATACGCAACGCTTTGGGATCGTGTGGGTGGACTACAACACGCAGCAGCGCATTCCCAAGGACAGCGCGCTGTGGGTCAAGCAAGTGATCGCCAATAACGGTTTCTA  152  Amino acid sequence: >PPSLIBM-14-I01_2_1 
MPSFNFPAGFLWGSATASYQIEGAVNEDGRSESIWDRFSHTPGKVLNGDTGDVACDHYHRWRDDVALMKSLGLKAYRFSVAWPRILPNGAGEVNQKGLDFYSALVDELLAAGITPFVTLYHWDLPQVLQDAGGWPERATCAAFVEYADVVSRHLGDRVKNWITHNEPWCVSFLSHQIGEHAPGWKDDWMAAFRAAHHVLLSHGQAVPVIRANSAGAEVGIALNFSWVEAASSAAADQMAARWADGYSNRWFIDPVYGRRYPADMVEAFTTAGLLPNGLDFVQPGDMDVIATQTDFLGVNYYTRDVVKARSAETPLPEPAREVATLPRTEMDWEVYPDGLYKLLCRLYFDYDIPKLYVTENGCSYGDGPGADGAVHDRRRTEYLRSHFLAVHRAMLAGAPVQGYFVWSLLDNFEWAKGYTQRFGIVWVDYNTQQRIPKDSALWVKQVIANNGF Fw Primer - EcoR1 HF: GTTACTTCGAATTCATGCCCAGCTTTAACTTCC  (tm 63 deg) Re Primer- Hind III HF: GTTACTTCAAGCTTCTAGAAACCGTTATTGGCG  (tm 62 deg)  Plasmid Information Plasmid Name: pET-21 a(+)  Figure A.1: Plasmid pET-21 a(+) circular map (Addgene database)  153  Plasmid sequence (with selected EcoR I and Hind III cut sites highlighted): ATCCGGATATAGTTCCTCCTTTCAGCAAAAAACCCCTCAAGACCCGTTTAGAGGCCCCAAGGGGTTATGCTAGTTATTGCTCAGCGGTGGCAGCAGCCAACTCAGCTTCCTTTCGGGCTTTGTTAGCAGCCGGATCTCAGTGGTGGTGGTGGTGGTGCTCGAGTGCGGCCGCAAGCTTGTCGACGGAGCTCGAATTCGGATCCGCGACCCATTTGCTGTCCACCAGTCATGCTAGCCATATGTATATCTCCTTCTTAAAGTTAAACAAAATTATTTCTAGAGGGGAATTGTTATCCGCTCACAATTCCCCTATAGTGAGTCGTATTAATTTCGCGGGATCGAGATCTCGATCCTCTACGCCGGACGCATCGTGGCCGGCATCACCGGCGCCACAGGTGCGGTTGCTGGCGCCTATATCGCCGACATCACCGATGGGGAAGATCGGGCTCGCCACTTCGGGCTCATGAGCGCTTGTTTCGGCGTGGGTATGGTGGCAGGCCCCGTGGCCGGGGGACTGTTGGGCGCCATCTCCTTGCATGCACCATTCCTTGCGGCGGCGGTGCTCAACGGCCTCAACCTACTACTGGGCTGCTTCCTAATGCAGGAGTCGCATAAGGGAGAGCGTCGAGATCCCGGACACCATCGAATGGCGCAAAACCTTTCGCGGTATGGCATGATAGCGCCCGGAAGAGAGTCAATTCAGGGTGGTGAATGTGAAACCAGTAACGTTATACGATGTCGCAGAGTATGCCGGTGTCTCTTATCAGACCGTTTCCCGCGTGGTGAACCAGGCCAGCCACGTTTCTGCGAAAACGCGGGAAAAAGTGGAAGCGGCGATGGCGGAGCTGAATTACATTCCCAACCGCGTGGCACAACAACTGGCGGGCAAACAGTCGTTGCTGATTGGCGTTGCCACCTCCAGTCTGGCCCTGCACGCGCCGTCGCAAATTGTCGCGGCGATTAAATCTCGCGCCGATCAACTGGGTGCCAGCGTGGTGGTGTCGATGGTAGAACGAAGCGGCGTCGAAGCCTGTAAAGCGGCGGTGCACAATCTTCTCGCGCAACGCGTCAGTGGGCTGATCATTAACTATCCGCTGGATGACCAGGATGCCATTGCTGTGGAAGCTGCCTGCACTAATGTTCCGGCGTTATTTCTTGATGTCTCTGACCAGACACCCATCAACAGTATTATTTTCTCCCATGAAGACGGTAC
GCGACTGGGCGTGGAGCATCTGGTCGCATTGGGTCACCAGCAAATCGCGCTGTTAGCGGGCCCATTAAGTTCTGTCTCGGCGCGTCTGCGTCTGGCTGGCTGGCATAAATATCTCACTCGCAATCAAATTCAGCCGATAGCGGAACGGGAAGGCGACTGGAGTGCCATGTCCGGTTTTCAACAAACCATGCAAATGCTGAATGAGGGCATCGTTCCCACTGCGATGCTGGTTGCCAACGATCAGATGGCGCTGGGCGCAATGCGCGCCATTACCGAGTCCGGGCTGCGCGTTGGTGCGGATATCTCGGTAGTGGGATACGACGATACCGAAGACAGCTCATGTTATATCCCGCCGTTAACCACCATCAAACAGGATTTTCGCCTGCTGGGGCAAACCAGCGTGGACCGCTTGCTGCAACTCTCTCAGGGCCAGGCGGTGAAGGGCAATCAGCTGTTGCCCGTCTCACTGGTGAAAAGAAAAACCACCCTGGCGCCCAATACGCAAACCGCCTCTCCCCGCGCGTTGGCCGATTCATTAATGCAGCTGGCACGACAGGTTTCCCGACTGGAAAGCGGGCAGTGAGCGCAACGCAATTAATGTAAGTTAGCTCACTCATTAGGCACCGGGATCTCGACCGATGCCCTTGAGAGCCTTCAACCCAGTCAGCTCCTTCCGGTGGGCGCGGGGCATGACTATCGTCGCCGCACTTATGACTGTCTTCTTTATCATGCAACTCGTAGGACAGGTGCCGGCAGCGCTCTGGGTCATTTTCGGCGAGGACCGCTTTCGCTGGAGCGCGACGATGATCGGCCTGTCGCTTGCGGTATTCGGAATCTTGCACGCCCTCGCTCAAGCCTTCGTCACTGGTCCCGCCACCAAACGTTTCGGCGAGAAGCAGGCCATTATCGCCGGCATGGCGGCCCCACGGGTGCGCATGATCGTGCTCCTGTCGTTGAGGACCCGGCTAGGCTGGCGGGGTTGCCTTACTGGTTAGCAGAATGAATCACCGATACGCGAGCGAACGTGAAGCGACTGCTGCTGCAAAACGTCTGCGACCTGAGCAACAACATGAATGGTCTTCGGTTTCCGTGTTTCGTAAAGTCTGGAAACGCGGAAGTCAGCGCCCTGCACCATTATGTTCCGGATCTGCATCGCAGGATGCTGCTGGCTACCCTGTGGAACACCTACATCTGTATTAACGAAGCGCTGGCATTGACCCTGAGTGATTTTTCTCTGGTCCCGCCGCATCCATACCGCCAGTTGTTTACCCTCACAACGTTCCAGTAACCGGGCATGTTCATCATCAGTAACCCGTATCGTGAGCATCCTCTCTCGTTTCATCGGTATCATTACCCCCATGAACAGAAATCCCCCTTACACGGAGGCATCAGTGACCAAACAGGAAAAAACCGCCCTTAACATGGCCCGCTTTATCAGAAGCCAGACATTAACGCTTCTGGAGAAACTCAACGAGCTGGACGCGGA154  
TGAACAGGCAGACATCTGTGAATCGCTTCACGACCACGCTGATGAGCTTTACCGCAGCTGCCTCGCGCGTTTCGGTGATGACGGTGAAAACCTCTGACACATGCAGCTCCCGGAGACGGTCACAGCTTGTCTGTAAGCGGATGCCGGGAGCAGACAAGCCCGTCAGGGCGCGTCAGCGGGTGTTGGCGGGTGTCGGGGCGCAGCCATGACCCAGTCACGTAGCGATAGCGGAGTGTATACTGGCTTAACTATGCGGCATCAGAGCAGATTGTACTGAGAGTGCACCATATATGCGGTGTGAAATACCGCACAGATGCGTAAGGAGAAAATACCGCATCAGGCGCTCTTCCGCTTCCTCGCTCACTGACTCGCTGCGCTCGGTCGTTCGGCTGCGGCGAGCGGTATCAGCTCACTCAAAGGCGGTAATACGGTTATCCACAGAATCAGGGGATAACGCAGGAAAGAACATGTGAGCAAAAGGCCAGCAAAAGGCCAGGAACCGTAAAAAGGCCGCGTTGCTGGCGTTTTTCCATAGGCTCCGCCCCCCTGACGAGCATCACAAAAATCGACGCTCAAGTCAGAGGTGGCGAAACCCGACAGGACTATAAAGATACCAGGCGTTTCCCCCTGGAAGCTCCCTCGTGCGCTCTCCTGTTCCGACCCTGCCGCTTACCGGATACCTGTCCGCCTTTCTCCCTTCGGGAAGCGTGGCGCTTTCTCATAGCTCACGCTGTAGGTATCTCAGTTCGGTGTAGGTCGTTCGCTCCAAGCTGGGCTGTGTGCACGAACCCCCCGTTCAGCCCGACCGCTGCGCCTTATCCGGTAACTATCGTCTTGAGTCCAACCCGGTAAGACACGACTTATCGCCACTGGCAGCAGCCACTGGTAACAGGATTAGCAGAGCGAGGTATGTAGGCGGTGCTACAGAGTTCTTGAAGTGGTGGCCTAACTACGGCTACACTAGAAGGACAGTATTTGGTATCTGCGCTCTGCTGAAGCCAGTTACCTTCGGAAAAAGAGTTGGTAGCTCTTGATCCGGCAAACAAACCACCGCTGGTAGCGGTGGTTTTTTTGTTTGCAAGCAGCAGATTACGCGCAGAAAAAAAGGATCTCAAGAAGATCCTTTGATCTTTTCTACGGGGTCTGACGCTCAGTGGAACGAAAACTCACGTTAAGGGATTTTGGTCATGAGATTATCAAAAAGGATCTTCACCTAGATCCTTTTAAATTAAAAATGAAGTTTTAAATCAATCTAAAGTATATATGAGTAAACTTGGTCTGACAGTTACCAATGCTTAATCAGTGAGGCACCTATCTCAGCGATCTGTCTATTTCGTTCATCCATAGTTGCCTGACTCCCCGTCGTGTAGATAACTACGATACGGGAGGGCTTACCATCTGGCCCCAGTGCTGCAATGATACCGCGAGACCCACGCTCACCGGCTCCAGATTTATCAGCAATAAACCAGCCAGCCGGAAGGGCCGAGCGCAGAAGTGGTCCTGCAACTTTATCCGCCTCCATCCAGTCTATTAATTGTTGCCGGGAAGCTAGAGTAAGTAGTTCGCCAGTTAATAGTTTGCGCAACGTTGTTGCCATTGCTGCAGGCATCGTGGTGTCACGCTCGTCGTTTGGTATGGCTTCATTCAGCTCCGGTTCCCAACGATCAAGGCGAGTTACATGATCCCCCATGTTGTGCAAAAAAGCGGTTAGCTCCTTCGGTCCTCCGATCGTTGTCAGAAGTAAGTTGGCCGCAGTGTTATCACTCATGGTTATGGCAGCACTGCATAATTCTCTTACTGTCATGCCATCCGTAAGATGCTTTTCTGTGACTGGTGAGTACTCAACCAAGTCATTCTGAGAATAGTGTATGCGGCGACCGAGTTGCTCTTGCCCGGCGTCAATACGGGATAATACCGCGCCACATAGCAGAACTTTAAAAGTGCTCATCATTGGAAAACGTTCTTCGGGGCGAAAACTCTCAAGGATCTTACCGCTGTTG
AGATCCAGTTCGATGTAACCCACTCGTGCACCCAACTGATCTTCAGCATCTTTTACTTTCACCAGCGTTTCTGGGTGAGCAAAAACAGGAAGGCAAAATGCCGCAAAAAAGGGAATAAGGGCGACACGGAAATGTTGAATACTCATACTCTTCCTTTTTCAATATTATTGAAGCATTTATCAGGGTTATTGTCTCATGAGCGGATACATATTTGAATGTATTTAGAAAAATAAACAAATAGGGGTTCCGCGCACATTTCCCCGAAAAGTGCCACCTGAAATTGTAAACGTTAATATTTTGTTAAAATTCGCGTTAAATTTTTGTTAAATCAGCTCATTTTTTAACCAATAGGCCGAAATCGGCAAAATCCCTTATAAATCAAAAGAATAGACCGAGATAGGGTTGAGTGTTGTTCCAGTTTGGAACAAGAGTCCACTATTAAAGAACGTGGACTCCAACGTCAAAGGGCGAAAAACCGTCTATCAGGGCGATGGCCCACTACGTGAACCATCACCCTAATCAAGTTTTTTGGGGTCGAGGTGCCGTAAAGCACTAAATCGGAACCCTAAAGGGAGCCCCCGATTTAGAGCTTGACGGGGAAAGCCGGCGAACGTGGCGAGAAAGGAAGGGAAGAAAGCGAAAGGAGCGGGCGCTAGGGCGCTGGCAAGTGTAGCGGTCACGCTGCGCGTAACCACCACACCCGCCGCGCTTAATGCGCCGCTACAGGGCGCGTCCCATTCGCCA 
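The sub-cloning primers listed in this appendix appear to share one layout: an 8-nt 5' overhang (GTTACTTC), a restriction site (EcoRI, GAATTC, on the forward primer; HindIII, AAGCTT, on the reverse primer), and an annealing region taken from the ORF terminus. The following Python sketch illustrates that layout by rebuilding the two P04P08 primers from the first 19 nt and last 20 nt of the GH3 ORF. The function and constant names are hypothetical, and the reading of the overhang and annealing lengths is inferred from the primer sequences rather than stated in the text. The last lines check that the reported ORF lengths equal the difference between the listed end and start positions.

```python
# Illustrative sketch only: names are hypothetical and the primer layout
# (overhang + restriction site + ORF terminus) is inferred from the listed primers.

LEADER = "GTTACTTC"   # 5' overhang shared by the listed primers (assumed role)
ECORI = "GAATTC"      # EcoRI recognition site, used on forward primers
HINDIII = "AAGCTT"    # HindIII recognition site, used on reverse primers

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def forward_primer(orf_start: str) -> str:
    """Overhang + EcoRI site + the 5' end of the ORF (sense strand)."""
    return LEADER + ECORI + orf_start.upper()


def reverse_primer(orf_end: str) -> str:
    """Overhang + HindIII site + reverse complement of the 3' end of the ORF."""
    return LEADER + HINDIII + reverse_complement(orf_end)


# ORF termini copied from the P04P08 (GH3) nucleotide sequence in this appendix.
P04P08_START = "ATGAGCCTGGAAGAGAAGG"    # first 19 nt, begins with the ATG start codon
P04P08_END = "AATTTGAGGTGACTGCCTAG"     # last 20 nt, ends with the TAG stop codon

fw = forward_primer(P04P08_START)
rv = reverse_primer(P04P08_END)
print("Fw:", fw)
print("Re:", rv)

# Reported ORF lengths versus the listed contig coordinates (end - start).
gh3_length = 16233 - 14083   # P04P08
gh1_length = 1910 - 552      # P14I01
print("GH3:", gh3_length, "bp; GH1:", gh1_length, "bp")
```

Because the reverse primer is built from the reverse complement of the ORF's 3' end, the TAG stop codon appears as CTA immediately after the HindIII site. The melting temperatures quoted beside each primer are not recomputed here, since the method used to estimate them is not stated.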