Open Collections

UBC Theses and Dissertations

Functional analysis of the Escherichia coli hemolysin signal sequence by random oligonucleotide mutagenesis. Hui, David. 1999

Functional Analysis of the Escherichia coli Hemolysin Signal Sequence by Random Oligonucleotide Mutagenesis

by DAVID HUI
B.Sc., The University of British Columbia, 1997

A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE in THE FACULTY OF GRADUATE STUDIES (Department of Biochemistry and Molecular Biology)

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
April 1999
© David Hui, 1999

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Biochemistry and Molecular Biology
The University of British Columbia
Vancouver, Canada
Date: [illegible]

ABSTRACT

The E. coli hemolysin is an RTX toxin that is secreted by an ATP-binding cassette transporter complex consisting of HlyB, HlyD, and TolC. Translocation of this protein depends on a C-terminal signal sequence located within the last 60 amino acids. Previous studies demonstrated that the signal sequence of a similar toxin, P. haemolytica leukotoxin, could replace that of hemolysin and was efficiently transported. While the two signal sequences share little primary sequence similarity, circular dichroism and ¹⁵N nuclear magnetic resonance revealed a conserved helix-strand-helix motif, leading to the hypothesis that this motif is important for efficient secretion.
A previous study using helical variants generated by a combinatorial approach suggested that any one of a large number of sequences that yield a predicted amphiphilic helix in the upstream helical region is sufficient for transport (Morden, 1998). To further elucidate the structural requirements of the signal sequence, the helix-strand-helix motif was divided into three regions (α1-linker-α2) and each was subjected to random oligonucleotide mutagenesis analysis. The last 16 residues of the extreme C-terminus were also analyzed using two contiguous random libraries (Cterm1 and Cterm2). This approach replaced the target region with a random sequence, and the presence or absence of any critical element(s) within each region was deduced by measuring the level of secretion of the mutant population. Based on the results from the five random libraries and the α1 helical library, a two-domain functional model of the hemolysin signal sequence was proposed. The first domain consists of the α1 and the linker regions. An amphiphilic helical structure in the α1 region appears to be both sufficient and required for transport. The second domain involves the last 8 residues of the signal sequence, and hydrophobicity in this region appears to be a major determinant of efficient transport. Connecting the two is a stretch of 19 residues with no specific requirements. This is surprising, since this stretch contains the second helix of the helix-strand-helix motif. Nevertheless, this study supports the model that the important features of the signal sequence are secondary structure (i.e. amphiphilic helix) and general biophysical properties (i.e. hydrophobicity) rather than primary sequence.

LIST OF TABLES

Table 1. Sequence of Oligonucleotides Used to Assemble Random Cassettes
Table 2. Hemolytic Zone Assignments of Cterm1 Random Variants Taken by Two Separate Observers
Table 3. Hemolytic Zone Assignments Taken on Separate Days by Same Observer
Table 4. Nucleotide Frequencies of All Random Libraries
Table 5. Amino Acid Frequencies of All Random Libraries
Table 6. Genotype and Phenotype of α1 Random Variants
Table 7. Calculated Biophysical Properties of α1 Random Full-length Variants
Table 8. Genotype and Phenotype of Linker Random Variants
Table 9. Calculated Biophysical Properties of Linker Random Full-length Variants
Table 10. Genotype and Phenotype of α2 Random Variants
Table 11. Calculated Biophysical Properties of α2 Random Full-length Variants
Table 12. Genotype and Phenotype of Cterm1 Random Variants
Table 13. Calculated Biophysical Properties of Cterm1 Random Full-length Variants
Table 14. Genotype and Phenotype of Cterm2 Random Variants
Table 15. Calculated Biophysical Properties of Cterm2 Random Full-length Variants

LIST OF FIGURES

Figure 1. Model of Hemolysin Transport
Figure 2. Domain Organization of E. coli Hemolysin
Figure 3. Design of Combinatorial Signal Sequence Variants
Figure 4. Hemolytic Zone Size
Figure 5. Western Blot Analysis of the Amount of Endogenous Hemolysin
Figure 6. Distribution of Random Signal Sequence Variants
Figure 7. Effect of Changing the Length and Amino Acid Composition of the C-terminus on Secretion as Demonstrated by α2 and Cterm1 Stop Mutants
Figure 8. Comparison of the Secretion Levels between α1 Helical and Random Variants
Figure 9. Functional Model of Hemolysin Signal Sequence

TABLE OF CONTENTS

Abstract
List of Tables
List of Figures
Acknowledgments

I Introduction
1.1 ATP-Binding Cassette Transporter Superfamily
1.1.1 General Properties of ABC Transporters
1.1.2 Eukaryotic ABC Transporters
1.1.3 Prokaryotic ABC Transporters
1.2 RTX Toxins and E. coli Hemolysin
1.2.1 RTX Toxins As Substrates of ABC Transporters
1.2.2 Functional Domains of Hemolysin
1.3 Hemolysin Signal Sequence
1.3.1 Studies Defining the Hemolysin Signal Sequence
1.3.2 Previous Studies Investigating the Hemolysin Signal Sequence
1.3.3 Common Observations
1.4 Hemolysin Transport Model
1.4.1 Reasons for Using the Hemolysin System As a Model for Transport
1.4.2 Three-Step Model of Hemolysin Transport
1.5 Thesis Research
1.5.1 Objectives and Approach
1.5.2 Advantages of Random Oligonucleotide Mutagenesis

II Materials and Methods
2.1 Bacterial Strains and Plasmids
2.2 Site-Directed Mutagenesis to Create pUCAC494BN
2.3 Cloning of Random Mutants
2.4 Isolation of Variants by Colony PCR
2.5 DNA Sequencing
2.6 Blood Agar Plate Assay
2.7 Data Analysis
2.8 SDS Polyacrylamide Gel Electrophoresis and Western Blotting

III Results
3.1 Optimization of Cloning Procedure
3.2 Blood Agar Plate Assay as an Indicator of Secretion Level
3.3 Objectivity and Reproducibility of the Blood Agar Plate Assay
3.4 Nucleotide and Amino Acid Distribution of Random Sequences
3.5 α1 Random Library
3.5.1 Analysis of α1 Full-length Mutants
3.5.2 Analysis of α1 Stop Mutants
3.6 Linker Random Library
3.6.1 Analysis of Linker Full-length Mutants
3.6.2 Analysis of Linker Stop Mutants
3.7 α2 Random Library
3.7.1 Analysis of α2 Full-length Mutants
3.7.2 Analysis of α2 Stop Mutants
3.8 Cterm1 Random Library
3.8.1 Analysis of Cterm1 Full-length Mutants
3.8.2 Analysis of Cterm1 Stop Mutants
3.9 Cterm2 Random Library
3.9.1 Analysis of Cterm2 Full-length Mutants
3.9.2 Analysis of Cterm2 Stop Mutants

IV Discussion
4.1 The Role of the α1 Amphiphilic Helix and the Linker Region in Hemolysin Transport
4.2 Correlation of the Predicted Hydrophobicity of the C-terminal 8 Residues with Hemolysin Secretion
4.3 Functional Model of the Hemolysin Signal Sequence
4.4 Future Work
4.4.1 Refinement of the Current Model
4.4.2 Understanding How the Functional Domains Contribute to Transport
4.4.3 Searching for Substrates of Eukaryotic ABC Transporters
4.5 Conclusions

Bibliography

Appendix: Perl Scripts and Output
A Batch Analysis Perl Scripts
A.1 Batch-analysis.pl
A.2 Summation.pl
B Searching for Proteins that Contain an α1-like Amphiphilic Helix
B.1 Search Results
B.2 PerlSearch.pl
B.3 Typical Output from PerlSearch.pl

ACKNOWLEDGEMENTS

This project would not have been successful without the many wonderful people who have generously contributed their time and knowledge. I would like to offer my sincere thanks to Dr. Victor Ling, my supervisor, for his guidance and many inspirations. I am grateful to Dr. Fang Zhang and Dr. Sarah Childs for their friendship and excellent mentorship. I am also indebted to Dr. David Baillie, whose enthusiasm for bioinformatics fueled my interest in programming. I would like to acknowledge the following people for their contributions to the project: Ms. Carla Morden for her pioneering work on the helical library, Dr. Jaclyn Hung for her extraordinary dedication as an independent observer for the blood agar assays and for helpful discussions, and Drs. Douglas Hogue and Jonathan Sheps for their critical review of this manuscript.
This study was supported by the Medical Research Council of Canada and the Natural Sciences and Engineering Research Council of Canada. I would also like to take this opportunity to express my appreciation to my parents, Rupert and Ella, for their encouragement and love throughout the years. This thesis is dedicated to my friends at the patient support group at the British Columbia Cancer Agency, whose strength, courage, and wisdom I highly admire.

I. INTRODUCTION

1.1 ATP-Binding Cassette (ABC) Transporter Superfamily

1.1.1 General Properties of ABC Transporters

The transport of substrates across lipid membranes is an essential function of all cells. The ABC transporter superfamily of proteins plays an active role in many important biological processes, including nutrient uptake, protein export, and cellular drug efflux (Childs and Ling, 1994; Dean and Allikmets, 1995; Higgins, 1992). Present in eukaryotes as well as prokaryotes, these membrane proteins use active transport to pump specific molecules across membranes. Most ABC transporters are built from two types of domains: a hydrophobic transmembrane domain (TMD) with multiple membrane-spanning segments, and a cytoplasmic nucleotide-binding domain (NBD). The TMD is believed to be important for substrate specificity, while the NBD is responsible for nucleotide hydrolysis. NBDs of ABC transporters contain three consensus sequences: Walker A (G-X2-G-X-G-K-S/T-T/S-X4-B, where X = any residue and B = a hydrophobic residue), Walker B (R-X-B2-X2-P/T/S/A-X-B4-D-E-A/P/C-T-S/T/A-A/G-B-D), and the signature motif (B-S-X-G-Q-R/K-Q-R-B-X-B-A), which distinguishes an ABC transporter from other nucleotide-binding proteins.

1.1.2 Eukaryotic ABC Transporters

Most eukaryotic ABC transporters fall into two classes based on their domain organization. Full transporters consist of two TMDs and two NBDs arranged in an alternating manner (TMD-NBD-TMD-NBD or NBD-TMD-NBD-TMD).
Half transporters have only one TMD and one NBD (TMD-NBD or NBD-TMD), and are believed to function as dimers. All eukaryotic ABC transporters with known functions are responsible either for cellular efflux of compounds (in the plasma membrane) or for sequestration of substrates into intracellular compartments (in organelle membranes). Eukaryotic ABC transporters play pivotal physiological roles in the translocation of a wide variety of substrates. For example, P-glycoprotein and multi-drug resistance protein actively pump many structurally unrelated chemotherapeutic agents out of cells. Overexpression of these proteins contributes to the multi-drug resistance phenotype in many types of cancer in humans (Chan et al., 1994; Cole et al., 1994). Sister of P-glycoprotein (Childs et al., 1995; Childs et al., 1998) is present predominantly in the liver canalicular membrane and has been demonstrated to transport bile acids (Gerloff et al., 1998). Mutations in this gene have been linked to the childhood liver disease progressive familial intrahepatic cholestasis type 2 (Strautnieks et al., 1998). The cystic fibrosis transmembrane regulator functions as a regulated chloride channel, and defects in this gene lead to cystic fibrosis (Riordan, 1993). The yeast transporter STE6 is responsible for the export of a-mating factor (Kuchler et al., 1989). TAP1 and TAP2 are two eukaryotic half transporters that function as a heterodimer to transport antigenic peptides into the lumen of the endoplasmic reticulum, facilitating antigen presentation via the MHC I complex (Marusina and Monaco, 1996).

1.1.3 Prokaryotic ABC Transporters

Prokaryotic ABC transporters are more diverse in their domain organization. While by definition all ABC transporters contain at least one NBD, in bacteria the TMDs are not always present in the same polypeptide.
For these proteins (NBD, NBD-NBD), the TMDs are usually encoded within the same operon, and interact with the corresponding NBD subunit(s) to provide transport. Prokaryotic ABC proteins that are half transporters (NBD-TMD or TMD-NBD) are believed to function with a partner. In addition to the core complex of two TMDs and two NBDs, many bacterial transporters require other subunits (usually encoded within the same operon) for function. Prokaryotic ABC importers (also known as periplasmic permeases) require periplasmic-binding proteins, which bind the substrate and present it to the import complex at the membrane. ABC exporters in gram-negative bacteria are usually associated with accessory proteins, which connect the inner and outer membranes to facilitate the passage of substrates to the outside of the cell.

The recent completion of the E. coli genome sequence has allowed a comprehensive analysis of its ABC transporter inventory, which occupies a remarkable 4.9% of the genome (Linton and Higgins, 1998). Of the 57 transport systems described, 44 are likely to be importers, while the other 13 are presumed exporters. The best characterized ABC importer is the histidine permease complex, which is comprised of the periplasmic-binding protein, HisJ, and three proteins, HisQ (TMD1), HisM (TMD2), and HisP (NBD). HisP is one of the first ABC NBDs to be crystallized, at 1.5 Å resolution (Hung et al., 1999), a critical step toward further understanding of the transport mechanism. ABC systems of similar structural organization are responsible for the uptake of a wide variety of substrates, including oligopeptides (OppB/C/A), sugars (MalE/G/F), vitamins (BtuC/E), metal ions (NikB/C/A), and phosphate (PstA/C/S).

While the histidine permease is a model system for bacterial importers, the E. coli hemolysin system is a prototype for prokaryotic ABC exporters (Fath and Kolter, 1993).
Hemolysin B (HlyB) is a 708 amino acid half transporter that resides in the inner membrane. This protein forms a complex with hemolysin D (HlyD) and TolC to actively secrete hemolysin (see Sheps et al., 1996 for a review). An illustration of this complex is shown in figure 1. HlyD, a 479 amino acid protein anchored in the inner membrane, has its bulk in the periplasm, while TolC is a common outer membrane protein. The substrate, hemolysin, is a 1024 amino acid repeats-in-toxin (RTX) protein (section 1.2) that is transported directly from the inside of the cell to the outside, and its C-terminal signal sequence (section 1.3) is the focus of this thesis.

Figure 1: Model of Hemolysin Transport. Secretion of HlyA is dependent on the hemolysin transporter complex, consisting of HlyB dimers, HlyD tetramers, and TolC trimers. After translation, the substrate is believed to diffuse into the inner membrane before eventually interacting with the transporter complex, leading to ATP hydrolysis and its subsequent translocation across the inner and outer membranes.

1.2 RTX Toxins and E. coli Hemolysin

1.2.1 RTX Toxins As Substrates of ABC Transporters

One of the classes of protein substrates secreted by ABC transporters in gram-negative bacteria is the RTX toxin family (Coote, 1992). Members of this family share three major characteristics. First, they contain a series of glycine-rich nonapeptide repeats (LXGGBGBBX) near the C-terminus. Second, they are synthesized as inactive precursors and require modification for activation. The genes for toxin synthesis, activation, and secretion (ABC transporters) are usually grouped together in one operon. Third, RTX toxins are exported out of gram-negative bacteria via the type I secretion pathway. This transport pathway differs from the conventional system in that it is sec-independent and does not involve a periplasmic intermediate.
The signal sequences of these secreted proteins are located at their extreme C-termini (which ensures that secretion is a post-translational event), and are not cleaved after translocation.

Members of the RTX toxin family are responsible for virulence in a number of bacterial species. The best studied is E. coli hemolysin, which forms a cation-selective pore in the plasma membrane of erythrocytes and leukocytes (Menestrina et al., 1994). This allows small osmoticants to move through the membrane, leading to a net influx of water and eventual cell lysis. Hemolysin is believed to contribute to the pathogenesis of a number of extraintestinal diseases, including urinary tract infections, peritonitis, meningitis, and septicemia (Goebel et al., 1988).

1.2.2 Functional Domains of Hemolysin

The 107 kDa hemolysin can be divided into two functional domains (figure 2A). The N-terminal domain contributes to hemolytic activity, and the C-terminal domain (which contains the signal sequence) is responsible for secretion. The N- and C-terminal domains of hemolysin are functionally independent: truncation of the C-terminal domain abolishes secretion but retains hemolytic activity, while peptides representing the C-terminal portion are not lytic but can be secreted (Jarchau et al., 1994).

The hemolytic domain can be further subdivided into three important regions. The first, residues 130 to 450, is predicted to form eight to ten contiguous amphiphilic helices that are likely to be involved in channel formation (Menestrina et al., 1994). The second region stretches from residues 600 to 900. Hemolysin C (HlyC)-dependent acylation of this region is essential for targeting of hemolysin to the mammalian cell membrane (Hughes et al., 1992). The current model suggests that HlyC dimers interact with acyl-carrier proteins, then dock at the HlyC recognition sequence within this region and transfer the fatty acid group to nearby acyl modification sites.
The third region constitutes a series of glycine-rich repeats downstream of the acylation sites. Calcium binding at this region has been shown to be required for activation of hemolytic activity (Ostolaza and Goni, 1995). Separating the N-terminal domain and the signal sequence at the C-terminus, these repeats have also been proposed to assist in the proper folding of the upstream region, allowing the signal sequence to be exposed.

Figure 2: Domain Organization of E. coli Hemolysin. (A) Hemolysin (1024 amino acids) can be divided into two functional domains. The N-terminal hemolytic domain is responsible for pore formation in the eukaryotic cell membrane, while the C-terminal signal sequence domain is required for secretion of this molecule. Within the hemolytic domain are eight to ten predicted amphiphilic helices, the HlyC activation domain, and the glycine-rich repeats where Ca²⁺ binding occurs. (B) The helix-strand-helix motif of the hemolysin signal sequence as determined by ¹⁵N nuclear magnetic resonance. The last 60 residues, ending at residue 1024, are divided into the α1 helix, linker, α2 helix, Cterm1, and Cterm2 regions. Residues of the two helices are underlined in the original figure.
STYGSQDNLNPLINEISKIISAAGNFDVKEERSAASLLQLSGNASDFSYGRNSITLTASA
(C) The leukotoxin signal sequence contains a similar structural motif. Note the amino acid similarity between the upstream helices of the two signal sequences (i.e. NEISKII vs. DELSKVV).
GNGKITQDELSKVVDNYELLKHSKNVTNSLDKLISSVSAFTSSNDSRNVLVAPTSMLDQSLSSLQFARAA

For hemolysin to be secreted, it must be recognized by the HlyB/HlyD/TolC transporter complex. The signal sequence of hemolysin (section 1.3) is positioned within its last 60 amino acids, and interaction of the signal sequence with the transporter is both required and sufficient for secretion.
The focus of this thesis is to define and understand features within this signal sequence that contribute to transport.

1.3 The Hemolysin Signal Sequence

1.3.1 Studies Defining the Hemolysin Signal Sequence

In 1986, the hemolysin signal sequence was found to be located in a 23 kDa fragment at the C-terminus (Nicaud et al., 1986). Since then, a number of studies have been carried out to define its exact location. Deletion of the last 27 amino acids abolished transport, suggesting that the signal sequence is positioned at the extreme C-terminus (Gray et al., 1986). Subsequent fusion protein studies with PhoA (Hess et al., 1990) and CTP (Koronakis et al., 1989) revealed that the last 60 residues are sufficient for efficient secretion. To demonstrate that the signal sequence by itself could be recognized by the transporter complex, 5' exonuclease studies were carried out and confirmed that the last 62 amino acids contain all the necessary information for substrate recognition (Jarchau et al., 1994). Since the "secret" to secretion is located within a well-defined region, a number of sequence comparison and mutational experiments have been done to elucidate features within this signal sequence that are important for transport.

1.3.2 Previous Studies Investigating the Hemolysin Signal Sequence

One of the first of these studies compared the signal sequence of E. coli hemolysin with those of similar RTX toxins such as P. haemolytica leukotoxin, and revealed two common features: the presence of an aspartate box composed largely of small amino acids, and a generally uncharged stretch in the last few residues (Gray et al., 1989). By isolating and sequencing homologous hemolysins from different species (E. 
coli, Proteus vulgaris, and Morganella morganii), another group highlighted the conservation of three contiguous sequences within the extreme 53 amino acids: a potential amphiphilic helix, a cluster of charged residues, and a weakly hydrophobic terminal sequence rich in hydroxylated residues (Koronakis et al., 1989). Internal deletions and systematic truncation studies supported the notion that all three regions are important for transport. A collection of single and multiple mutations was analyzed, providing further refinement to this tripartite model (Stanley et al., 1991).

A separate saturation mutagenesis study provided evidence contradicting the above model. A number of mutants designed to alter the proposed features were found to have little effect on transport (Kenny et al., 1992). Furthermore, several other missense mutations reduced secretion by more than 50%, with multiple replacements having a synergistic effect (Kenny et al., 1994). These observations led to the proposal that a few critical residues contribute to transport.

Previous studies performed in our laboratory demonstrated that the C-terminal signal sequence of leukotoxin can replace that of hemolysin and achieve up to 100% transport efficiency (Zhang et al., 1993a). This suggested that there must be some conserved features that are important for function. Comparison between the two signal sequences (figures 2B, 2C) revealed little primary sequence similarity, but both appeared to have a common helix-strand-helix motif (α1-linker-α2). Circular dichroism analysis showed that both signal peptides appear mainly unstructured in an aqueous environment but assume a helical secondary structure in a membrane-mimetic environment, implying that the hemolysin signal sequence may interact with the inner membrane before it is recognized by the transporter complex (Zhang et al., 1995).
Further characterization of the two signal sequences with ¹⁵N nuclear magnetic resonance analyses supported the presence of two weak amphiphilic helices (Yin et al., 1995). This secondary structure conservation led to the proposal that the helix-strand-helix structural motif within the C-terminal signal sequence plays a critical role in transport. This model formed the basis for further structure-based mutational studies.

To test the hypothesis that an amphiphilic helix in the α1 region is sufficient for transport, a combinatorial library approach (figure 3) was used (Morden, 1998). α1 helical variants were generated in such a way that the primary sequence of the α1 region was lost, but the length and secondary structure were retained.

Figure 3: Design of Combinatorial Signal Sequence Variants. (A) Using the α1 region as an example, the design of two types of variants is illustrated here.

Wild-type
DNA:        tta att aat gaa atc agc aaa atc att tca gct gca
Amino acid: L   I   N   E   I   S   K   I   I   S   A   A
Helical variants
DNA:        NtN NtN HaN HaN NtN aHN HaN NtN NtN aHN Dct gBB
Amino acid: U   U   Z   Z   U   Z   Z   U   U   Z   Z   U
Random variants
DNA:        NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN
Amino acid: X   X   X   X   X   X   X   X   X   X   X   X

For helical variants, the amino acids have to be ordered in such a way that their properties resemble the amphiphilic helical arrangement of the wild-type sequence. In this case, the degenerate codes NtN and gBB represent non-polar residues (U), while the codes aHN, HaN, and Dct represent polar residues (Z) (where N = a/g/c/t; B = c/t/g; H = a/g/c; and D = a/t/g). For random variants, the sequence coding for the target region is simply replaced by a string of N of the same length. (B) Helical wheel representation of the α1 region demonstrating polar residues (bold) on one side and non-polar residues on the other for wild-type (right) and helical variants (left).
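The degenerate codes in the Figure 3 caption can be expanded mechanically to see which residues each one can encode. The following is a minimal sketch, not the procedure used in the thesis: it uses the letter definitions exactly as given in the caption (note that H = a/g/c there, which differs from the usual IUPAC meaning of H) together with the standard genetic code. The function name `encoded_residues` is a hypothetical helper introduced only for illustration.

```python
from itertools import product

# Degenerate base letters exactly as defined in the Figure 3 caption.
# Assumption: we follow the caption (H = a/g/c), not the IUPAC definition of H.
DEGENERATE = {
    "N": "agct", "B": "ctg", "H": "agc", "D": "atg",
    "a": "a", "c": "c", "g": "g", "t": "t",
}

# Standard genetic code, built from the usual TCAG-ordered table ('*' = stop).
BASES = "tcag"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

def encoded_residues(code):
    """Return the sorted list of residues a degenerate codon can encode."""
    choices = [DEGENERATE[ch] for ch in code]
    return sorted({CODON_TABLE["".join(c)] for c in product(*choices)})

for code in ["NtN", "gBB", "aHN", "HaN", "Dct", "NNN"]:
    print(code, "".join(encoded_residues(code)))
```

Under these definitions none of the five helical-variant codes can produce a stop codon, whereas NNN can, which is consistent with random libraries yielding truncated variants in addition to full-length ones.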
On the one hand, if critical residues were important for transport, then most helical variants would be transported at low efficiency. On the other hand, if an amphiphilic helix were responsible for function, then most variants with a helical design would be secreted at wild-type level. Twenty-two such variants were generated, and twenty of them were secreted at close to wild-type efficiency, suggesting that an amphiphilic helix in this region is sufficient for transport.

1.3.3 Common Observations

Depending on the experimental approach and how the data were interpreted, many different important features have been proposed in the hemolysin signal sequence. Despite the complexity of published results, a few observations are consistent. First, there is a substantial amount of evidence supporting the presence of an amphiphilic helix stretching from around L976 to A987. Second, no null mutants have been obtained despite extensive mutation of the signal sequence. Drastic deletions are required before significant reductions in secretion can be observed. Third, the hemolysin signal sequence can tolerate a wide range of point mutations, demonstrating the broad substrate specificity common among ABC transporters.

1.4 Model of Hemolysin Transport

1.4.1 Reasons for Using the Hemolysin System As a Model for Transport

One of the fundamental questions in the field of ABC proteins is how substrates are recognized and transported. The mechanism of transport is believed to be similar for members of this superfamily because of the high degree of homology among ABC transporters from bacteria to man. The hemolysin system provides an excellent model for the following reasons: (1) both the transporter complex and the substrate have been characterized extensively, (2) the bacterial system allows powerful use of classical and recombinant genetics, with a rapid generation time, and (3) the substrate (hemolysin) is a protein, allowing easy genetic manipulation.
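As a side illustration of the amphiphilic α1 helix (L976 to A987) referred to in sections 1.3.2 and 1.3.3, the polar/non-polar pattern given for this region in Figure 3 (U U Z Z U Z Z U U Z Z U) can be placed on a helical wheel numerically. This is a minimal sketch assuming an ideal α-helix with 100° of rotation per residue (3.6 residues per turn); the helper names `face_summary` and `angular_separation` are hypothetical and this is not an analysis performed in the thesis.

```python
import math

# U/Z pattern of the alpha-1 region as given in Figure 3 (U = non-polar, Z = polar).
PATTERN = "UUZZUZZUUZZU"
# Assumption: ideal alpha-helix geometry, 3.6 residues per turn.
DEGREES_PER_RESIDUE = 100.0

def face_summary(pattern, kind):
    """Mean direction (degrees) and resultant length (0..1) of one residue class."""
    angles = [
        math.radians(i * DEGREES_PER_RESIDUE)
        for i, p in enumerate(pattern)
        if p == kind
    ]
    x = sum(math.cos(a) for a in angles) / len(angles)
    y = sum(math.sin(a) for a in angles) / len(angles)
    return math.degrees(math.atan2(y, x)) % 360.0, math.hypot(x, y)

def angular_separation(a, b):
    """Smallest angle (degrees) between two directions on the wheel."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

nonpolar_dir, nonpolar_r = face_summary(PATTERN, "U")
polar_dir, polar_r = face_summary(PATTERN, "Z")
print(f"non-polar face: direction {nonpolar_dir:.0f} deg, clustering {nonpolar_r:.2f}")
print(f"polar face:     direction {polar_dir:.0f} deg, clustering {polar_r:.2f}")
print(f"separation:     {angular_separation(nonpolar_dir, polar_dir):.0f} deg")
```

A resultant length near 1 means the residues of that class sit close together on the wheel; a separation near 180° means the two classes occupy opposite faces, which is the arrangement an amphiphilic helix requires.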
1.4.2 Three-Step Model of Hemolysin Transport

The following three-step model has been proposed for hemolysin transport (Zhang et al., 1995). The first step involves interaction of the signal sequence with the inner membrane (figure 1), possibly through the conserved α1 amphiphilic helix. This allows hemolysin to diffuse in a two-dimensional space (as opposed to three-dimensional space in the cytoplasm), and to locate the transporter complex much more efficiently. The second step involves recognition of hemolysin by the HlyB/HlyD/TolC complex, followed by docking of the signal sequence in the recognition site. Complementation experiments provide genetic evidence that the hemolysin signal sequence interacts with HlyB at sites close to the cytoplasmic leaflet of the inner membrane (Sheps et al., 1995; Zhang et al., 1993b). A number of critical residues identified in the signal sequence by Kenny et al. may be involved in binding to the multiple contact sites within the transporter. Finally, interaction of the substrate with the transporter triggers a conformational change within the NBD of HlyB, leading to ATP hydrolysis. This would provide the energy for subsequent changes in the structure of the TMD, facilitating the translocation of hemolysin across both the inner and outer membranes to the outside of the cell.

It is interesting to note that a similar model has been proposed for P-glycoprotein. For example, substrates of P-glycoprotein are generally amphiphilic and are believed to partition into the plasma membrane (Shapiro and Ling, 1998). A recent study from our laboratory has also suggested that there are multiple substrate binding sites in P-glycoprotein (Shapiro and Ling, 1997), in parallel to the multiple contact sites in the hemolysin model.

1.5 Thesis Research

1.5.1 Objectives and Approach

The importance of higher-order structure in substrate recognition and transport has been clearly established by results from the α1 helical library (Morden, 1998).
Building upon this work in the α1 region, we sought to understand the structural requirements of the hemolysin signal sequence. To test the hypothesis that the conserved helix-strand-helix motif is required for function, a random library was generated in each of the α1, linker, and α2 regions. This was done by replacing the targeted coding segment with a string of degenerate nucleotides (G, A, T, C). The resulting variants had a random combination of amino acids in the targeted region. If any primary or secondary structural element were required in this region, then the majority of random mutants would be secreted at low levels because very few combinations would possess this element by chance; however, if no specific element were required in this region, then most random mutants would be secreted at wild-type level. Following this rationale, the tail region just downstream of α2 was also examined to determine the presence or absence of any elements within the extreme C-terminus of the signal sequence that were critical for transport. A combinatorial approach was utilized to generate two contiguous 8 amino acid random libraries (Cterm1 and Cterm2) within this region. In summary, random oligonucleotide mutagenesis was applied to five regions within the hemolysin signal sequence (figure 2B): α1 (L976 to A987), linker (G988 to S997), α2 (A998 to A1008), Cterm1 (S1009 to N1016), and Cterm2 (S1017 to A1024).

The objective of this study was to understand features within the hemolysin signal sequence that contribute to transport. Knowledge in this area would facilitate more detailed refinements of the hemolysin transport model, which might eventually lead to the elucidation of the general ABC transport mechanism. In particular, understanding how the signal sequence interacts with the transporter complex could offer some insight into the widely observed phenomenon of multiple-substrate recognition in P-glycoprotein and other ABC transporters.
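The five target regions can be located in the signal sequence by simple indexing, and the same lengths give a rough feel for how often a fully random insert would be free of stop codons. The sketch below takes the 60-residue sequence from Figure 2B and the region boundaries listed above; the stop-free estimate assumes all four bases are equally likely at every position, which is an idealization (the libraries' measured nucleotide frequencies are reported separately in the thesis). The names `region_sequence` and `full_length_fraction` are hypothetical helpers.

```python
# Last 60 residues of hemolysin, ending at residue 1024, as printed in Figure 2B.
SIGNAL_SEQUENCE = "STYGSQDNLNPLINEISKIISAAGNFDVKEERSAASLLQLSGNASDFSYGRNSITLTASA"
LAST_RESIDUE = 1024
FIRST_RESIDUE = LAST_RESIDUE - len(SIGNAL_SEQUENCE) + 1

# Region boundaries (inclusive residue numbers) as listed in section 1.5.1.
REGIONS = {
    "alpha1": (976, 987),
    "linker": (988, 997),
    "alpha2": (998, 1008),
    "Cterm1": (1009, 1016),
    "Cterm2": (1017, 1024),
}

def region_sequence(name):
    """Wild-type residues of the named region."""
    start, end = REGIONS[name]
    return SIGNAL_SEQUENCE[start - FIRST_RESIDUE : end - FIRST_RESIDUE + 1]

def full_length_fraction(n_codons):
    """Expected fraction of n random NNN codons containing no stop codon.

    Assumption: uniform base frequencies, so 61 of 64 codons are sense codons.
    """
    return (61 / 64) ** n_codons

for name in REGIONS:
    seq = region_sequence(name)
    frac = full_length_fraction(len(seq))
    print(f"{name:7s} {seq:12s} length={len(seq):2d} stop-free fraction={frac:.2f}")
```

Under the uniform-base assumption, a noticeable share of each library is expected to carry a premature stop codon, so stop mutants arise as a by-product of the method rather than as a rare accident.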
1.5.2 Advantages of Random Oligonucleotide Mutagenesis

The method of random oligonucleotide mutagenesis was adopted for a number of reasons. First, a large, well-defined region can be completely altered. In most cases, the structure would be destroyed rather than merely disturbed (as in point mutants). This would provide a clear-cut answer as to whether any required features are present within the altered region. Mutating regions containing critical transport elements would result in a low level of secretion, and vice versa. Second, this technique allows for the generation of variants with drastic alterations while leaving the length of the signal sequence unchanged. Third, a large number of variants can be generated, allowing for a greater degree of confidence in the interpretation of results. Fourth, in addition to the highly informative in-frame full-length mutants (i.e. random without a premature stop codon), this method also generates stop mutants (truncation mutants with a random tail). These truncation mutants allow further analysis of the role of the C-terminal tail in secretion.

II. MATERIALS AND METHODS

2.1 Bacterial Strains and Plasmids

The E. coli strain Top10F' (F'{lacIq, Tn10(TetR)} mcrA Δ(mrr-hsdRMS-mcrBC) Φ80 lacZΔM15 ΔlacX74 deoR recA1 araD139 Δ(ara-leu)7697 galU galK rpsL(StrR) endA1 nupG) (Invitrogen) was used for the construction of random mutations in hemolysin, which is encoded by pUCAC494/pUCAC494BN. pUCAC494 was used for the generation of the α1, α2, and Cterm1 random libraries. pUCAC494BN, a derivative of pUCAC494, was generated by site-directed mutagenesis (section 2.2) and was used for the construction of the linker and Cterm2 random libraries. All pUCAC494 and pUCAC494BN variants were selected with ampicillin at 50µg/ml. E. coli JM83 (X' ara Δ(pro-lac) rpsL thi Φ80 dlacZΔM15 X') transformed with pLGBCD and mutated versions of pUCAC494 was assayed on blood agar plates.
pLGBCD, a pACYC derivative containing hlyB, hlyD, and hlyC, was selected with chloramphenicol at 34µg/ml.

2.2 Site-Directed Mutagenesis to Create pUCAC494BN

Random oligonucleotide mutagenesis involved insertion of a cassette containing random sequence into the coding region of the signal sequence at two flanking restriction sites. For practical reasons, we limited ourselves to assembling cassettes of 185bp or less (the maximum length of each oligonucleotide synthesized by GIBCO BRL is 100bp, so two oligonucleotides less the 15bp of overlap for annealing give 185bp; the distance between the two restriction sites must therefore be less than 185bp). To meet this requirement in the construction of the linker and Cterm2 random libraries, two sites were engineered within pUCAC494 to create pUCAC494BN. Construction of pUCAC494BN from pUCAC494 consisted of two steps. The first step involved the creation of a BstBI restriction site in the α1 region of the hemolysin signal sequence. The polymerase chain reaction (PCR) was used to amplify DNA, and reactions contained the following reagents unless otherwise stated: template DNA at 10ng/µl, two primers at 20ng/µl, dATP, dGTP, dTTP, and dCTP at 100µM, 0.01U/µl Taq polymerase, as well as 10mM Tris, pH 8.0, 50mM KCl, 1mM MgCl2, and 0.05% Triton X-100. Primers ASalC494 (5'-TCGATCGTGAACACCTTGGAAGGTAACGCG-3') and HlyA-BstBI-R (5'-CAGCTGAAATGATTTTCGAAATTTCATTAATTAATG-3', underlined region represents nucleotides changed) were used to generate a 97bp fragment from pUCAC494 (PCR conditions: 94°C for 5 minutes, followed by 30 cycles of 94°C for 30s, 55°C for 30s and 72°C for 30s, then held at 72°C for 10 minutes). This PCR product was used directly as a primer (40µl in a final volume of 100µl), in conjunction with M13R1 (5'-AAAACGACGGCCAGTGAATTC-3'), to amplify a 319bp fragment from pUCAC494 (reaction conditions: 94°C for 5 minutes, followed by 30 cycles of 94°C for 30s, 60°C for 15s and 72°C for 30s).
The resulting reaction product was purified with a QIAquick PCR purification kit (Qiagen, Inc.), then digested with KpnI and SalI, and finally inserted into pUCAC494 at these two restriction sites to generate pUCAC494B. The second step involved creation of an NheI site in the α2 region of pUCAC494B. Primers HlyA-NheI-F (5'-GAAAGATCTGCCGCTAGCTTATTGCAGTTGTCC-3') and M13R1 were used to amplify a 198bp fragment from pUCAC494 (reaction conditions: 94°C for 5 minutes, followed by 30 cycles of 94°C for 30s, 55°C for 30s and 72°C for 30s). This reaction product was purified with a QIAquick PCR purification kit, and ligated into the KpnI and BglII sites of pUCAC494B. The resulting vector, pUCAC494BN, differed from pUCAC494 in that it contained two additional unique restriction sites, BstBI (in the α1 region) and NheI (in the α2 region). These sites were designed such that the original amino acid sequence remained unchanged. The hemolytic zones of pUCAC494B and pUCAC494BN were determined to be identical to that of pUCAC494.

2.3 Cloning of Random Mutants

A pair of desalted oligonucleotides (GIBCO BRL) was used to form a cassette for each of the five random libraries. Annealing of these oligonucleotides was facilitated by the presence of a 15 base pair complementary sequence in each pair. Each cassette contained the mutated target sequence and flanking regions, and was used to replace the wild-type sequence. Insertion of each cassette into pUCAC494 or pUCAC494BN was facilitated by the presence of two unique restriction sites located at the ends of the cassette. Oligonucleotides were dissolved in 10mM Tris, pH 8.5 to a concentration of 100pmol/µl. 10µl of each was then mixed with its partner, heated to 80°C for 5 minutes, cooled to 25°C (ramp time 1 hour), and incubated at 25°C for 1 hour. The annealed oligonucleotides were filled in with Klenow (GIBCO BRL), and purified using a QIAquick PCR purification kit, before restriction digestion.
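The annealing and fill-in step can be sketched in a few lines. This is an illustration only: it assumes the two oligonucleotides pair through complementary 3' ends and that the fill-in extends both strands to full length. The example oligonucleotides are hypothetical 100-mers, not the sequences used in this work, and `fill_in` is a name chosen here for illustration.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq):
    """Reverse complement of a DNA sequence (N stays N)."""
    return seq.translate(COMPLEMENT)[::-1]

def fill_in(forward, reverse, min_overlap=15):
    """Top strand of the duplex formed when the 3' ends of two oligos
    anneal and both strands are extended to full length."""
    bottom_as_top = reverse_complement(reverse)
    longest = min(len(forward), len(bottom_as_top))
    # Look for the longest 3' overlap of at least min_overlap bases.
    for k in range(longest, min_overlap - 1, -1):
        if forward[-k:] == bottom_as_top[:k]:
            return forward + bottom_as_top[k:]
    raise ValueError("no complementary 3' overlap of the required length")

if __name__ == "__main__":
    overlap = "GACAATCTTAATCCA"                    # 15 nt shared region
    fwd = "A" * 85 + overlap                       # hypothetical 100-mer
    rev = reverse_complement(overlap + "C" * 85)   # hypothetical 100-mer
    cassette = fill_in(fwd, rev)
    print(len(fwd), len(rev), len(cassette))
```

With two 100-nucleotide oligonucleotides and a 15-nucleotide overlap, the filled-in product is 100 + 100 - 15 = 185bp long, which is the upper limit quoted in section 2.2.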
The cloning plasmid, restriction sites, and oligonucleotide sequences for each library are listed in table 1. The ligation product was transformed into CaCl2-treated Top10F' competent cells and the transformants were screened for the proper insertion of the random cassette using one of two colony PCR methods (section 2.4).

2.4 Isolation of Variants by Colony PCR

Colonies transformed with mutated versions of pUCAC494/pUCAC494BN were picked and resuspended in 20µl of ddH2O, from which 5µl was used for testing. In addition to template DNA from each colony, all reactions contained the following reagents: two primers at 20ng/µl, dATP, dGTP, dTTP, and dCTP at 100µM, 0.01U/µl Taq polymerase, as well as 10mM Tris, pH 8.0, 50mM KCl, 1mM MgCl2, and 0.05% Triton X-100. All reactions had a final volume of 25µl and were performed under the same cycling conditions (94°C for 5 minutes, followed by 30 cycles of 94°C for 30s, 55°C for 30s and 72°C for 30s, then held at 72°C for 10 minutes). Two methods of colony PCR were used to isolate transformants. The first method was based on the idea that random oligonucleotide mutagenesis would likely destroy many of the previously mapped restriction sites within the targeted region (for example, there is only a 1/1024 chance for a specific 6 nucleotide restriction site to appear at the same position). Therefore, amplification of a PCR fragment and subsequent digestion with a selected restriction enzyme would allow the selection of clones with a PCR product that could not be cleaved, and thus likely contained a random sequence. For the α1, linker, α2, and Cterm1 random libraries, a 579bp fragment was amplified from each colony using primers HA24 (5'-GATTTCCGGGACGTTGCC-3') and M13R1, and then subjected to digestion with PstI, BglII, HpaII, and NdeI, respectively. The Cterm2 region did not contain any convenient restriction sites and thus a second method was adopted.
Primers HA24 and Cterm-test-R (5'-GCTGATGCTGTCAAAGTTATTG-3') were used to amplify a 470bp fragment. Cterm-test-R was designed to anneal to the wild-type Cterm2 region only. In this case, absence of a band would indicate a positive candidate. Although this procedure did not require digestion, a follow-up PCR reaction using HA24 and M13R1 was used to eliminate any candidates with abnormal band size or no band at all.

Table 1: Sequence of Oligonucleotides Used to Assemble Random Cassettes (underlined region represents complementary sequence where annealing occurs)

α1 cassette: inserted into pUCAC494 at the SalI and BglII sites
  HlyA-α1-F      5'-GGAAGGTAACGCGTCGACTTATGGGAGCCAGGACAATCTTAATCCA-3'
  HlyA-α1-R      5'-GGCAGATCTTTCCTCCTTAACATCGAAGTT^ NNNNNNNNNNNNNNNNNNNNNTGGATTAAGATTGTC GTC-3'

Linker cassette: inserted into pUCAC494BN at the BstBI and NheI sites
  HlyA-linker-F  5'-CCATTAATTAATGAAATTTCGAAAATCATTTCAGCTGCA-3'
  HlyA-linker-R  5'-GGACAACTGCAATAAGCTAGCGGCNNNNNN^ NNNNNNNNTGCAGCTGAAATGATTTTCGAAATTTCATTAATTAATGG-3'

α2 cassette: inserted into pUCAC494 at the BglII and KpnI sites
  HlyA-α2-F      5'-GAGGAAAGATCTNNNNNNNNNNNNNNN>JNNNNN^ TGAlTrTrCATATGGACGGAACTCAATAACTTTGACAGCATCAGCAT-3'
  HlyA-α2-R      5'-AGCTCGGTACCATTATGACTCCAAAAAAATAGCAATCTTATGTGGCACAGCCAGTAAGATTGCTATCATTTAAATAATATATTATGCTGATGCTGTCA-3'

Cterm1 cassette: inserted into pUCAC494 at the BglII and KpnI sites
  HlyA-Cterm1-F  5'-GAAAGATCTGCCGCTAGCTTATTGCAGTTGTCCGGTAATGCCNNNNNNNNNNNNNNNNNNNNNNNNTCAATAACTTTGACAGCATCAGCATAATATA-3'
  HlyA-Cterm1-R  5'-GAGCTCGGTACCATTATGACTCCAAAAAAATAGCAATCTTATGTGGCACAGCCCAGTAAGATTGCTATCATTTAAATTAATATATTATGCTGATGCTGTC-3'

Cterm2 cassette: inserted into pUCAC494BN at the NheI and KpnI sites
  HlyA-Cterm2-F  5'-CTGCCGCTAGCTTATTGCAGTTGTCCGGTAATGCCAGTGAT/TTTTCATAT GGACGGAACNWT>mNN>^ 3'
  HlyA-Cterm2-R  5'-GAGCTCGGTACCATTATGACTCCAAAAAAATAGCAATCTTATGTGGCACAGCCCAGTAAGATTGCTATCATTTAAATTAATATATTA-3'
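The rationale behind the first colony PCR method (loss of a mapped restriction site after randomization) can be sketched numerically. The sketch assumes each randomized position independently has a 1/4 chance of matching the original base; the function name is illustrative only. Under this model, the 1/1024 figure quoted in section 2.4 corresponds to five randomized positions (4^5 = 1024), whereas a site with all six positions randomized would give 1/4096. Which case applies depends on how the site overlaps the random region.

```python
from fractions import Fraction

def p_site_retained(randomized_positions, p_match=Fraction(1, 4)):
    """Chance that every randomized position of a restriction site
    happens to match the original base (uniform-base assumption)."""
    return p_match ** randomized_positions

def p_site_destroyed(randomized_positions):
    """Chance that a random insert has lost the site at that position."""
    return 1 - p_site_retained(randomized_positions)

if __name__ == "__main__":
    for k in (5, 6):
        print(f"{k} randomized positions: "
              f"retained {p_site_retained(k)}, destroyed {p_site_destroyed(k)}")
```

Either way, almost every clone carrying a random cassette is expected to resist digestion, which is what makes an uncut PCR product a useful screen.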
Random variants (listed in tables 6 to 15) were named according to the following scheme: a one-letter code represented the region mutated (A=α1, L=linker, B=α2, C=Cterm1, D=Cterm2), the first digit represented the ligation number, the second digit represented the transformation number of the specific ligation, and the last two to three digits represented the colony PCR assignment. All positive colonies were grown overnight and plasmid DNA of each variant was prepared with the Quantum miniprep kit (BioRad) following the manufacturer's instructions. DNA sequencing was performed on each variant to obtain the genotype.

2.5 DNA Sequencing

Plasmid DNA was quantitated with an SSF-600 solid state fluorimeter (Tyler Research Instruments Corporation) by the ethidium bromide fluorescence assay described in the instruction manual. DNA samples were prepared with an ABI PRISM BigDye terminator cycle sequencing ready reaction kit according to the manufacturer's instructions, and run on a 310 Genetic Analyzer (PE Biosystems). The forward sequence was obtained for all variants using aSEQ (5'-GACGGCAGGGTAATCACACC-3') as primer. The reverse sequence was obtained for selected variants using M13R1.

2.6 Blood Agar Plate Assay

The hemolysin secretion level of each variant was determined using a blood agar plate assay. Five different plating conditions were tested: 20ml of LB agar with 1%, 2%, or 5% defibrinated sheep blood (PML Microbiologicals), or two-layered plates with 10ml plain LB bottom agar and 10ml 2% or 5% blood LB top agar. In general, high secretors could be resolved better on high percent blood plates, and vice versa. Plates with double layers were found to provide a better contrast. The optimal condition was established to be 10ml LB bottom agar with 10ml 5% blood LB top agar, and these plates were used to determine the phenotype of all clones.
DNA isolated from each variant was transformed into CaCl2 competent JM83 containing pLGBCD and spread on blood agar plates in triplicate. After an incubation period of 19 hours at 37°C, each variant was assigned a zone rank from 0 (no hemolysin) to 6 (wild-type) by comparing to a set of standards (pUCAC494=6, pUCAC494BN=6, B23265=5, L2329=4, L3345=3, L3240=2, L2106=1). As a control for cell lysis, cells transformed with pUCAC494 and pLGCD (i.e. without HlyB) were plated; these had a zone size of zero (no secretion), and were identical to cells transformed with pLGBCD only (i.e. without HlyA). Attributes such as hemolytic zone size, brightness, and colony size were all taken into account. All plates were examined twice separately. Altogether, six readings were taken for each variant and an average was calculated. All variants within one library were assayed on the same day to minimize any inconsistencies.

2.7 Data Analysis

DNA sequences were translated and analyzed with the Wisconsin Package™ Version 9.1, Genetics Computer Group (GCG), Madison, Wisconsin. Additional Perl scripts were written by Dr. Eric Cabot (appendix A.1) and David Hui (appendix A.2) to facilitate bulk analysis. Peptool Version 1.1 (Biotools Inc., Edmonton) was used for helix hydrophobic moment determination.

2.8 SDS Polyacrylamide Gel Electrophoresis and Western Blotting

The amount of endogenous hemolysin in a number of signal sequence variants was determined by Western analysis. JM83 cells transformed with plasmids encoding the hemolysin variants were harvested at OD600 = 0.85±0.05. Following centrifugation (6000xg for 15 minutes), the cell pellets were resuspended in STE buffer (10mM Tris, pH 8.0, 150mM NaCl, 1mM EDTA) supplemented with various protease inhibitors (10µg/ml phenylmethylsulphonyl fluoride, 5µg/ml pepstatin, 2.5µg/ml leupeptin, and 5µg/ml aprotinin). For each sample, an equivalent of 200µl of cells was boiled for 3 minutes, and run on an SDS polyacrylamide gel (10% separating) under reducing conditions.
Whole cell proteins were transferred onto a nitrocellulose membrane (Mandel Scientific), and then blocked with TBS (100mM Tris, pH 7.4, 150mM NaCl) containing 10% skim milk. Following a rinsing step with ST buffer (TBS with 1% skim milk and 0.1% Tween-20), the membrane was incubated at room temperature for 2 hours with rabbit antiserum raised against hemolysin (gift from Drs. I.B. Holland and M. Blight), diluted 20,000x with ST buffer. The membrane was rinsed again and then incubated at room temperature for 1 hour with peroxidase-conjugated goat anti-rabbit immunoglobulin (Jackson ImmunoResearch Laboratories, Inc.), diluted 10,000x with ST buffer. After a final rinsing step, the amount of endogenous hemolysin was determined using the enhanced chemiluminescence Western blotting detection reagents (Amersham Pharmacia Biotech) according to the manufacturer's instructions.

III. RESULTS

3.1 Optimization of Cloning Procedure

The isolation of positive candidates using colony PCR (section 2.4) is the rate-limiting step in random oligonucleotide mutagenesis. At a low cloning frequency (such as 2%), one would need to screen approximately 5000 colonies to isolate 100 positive clones. To achieve a high proportion of positive clones, a high insert-to-vector ratio in the ligation step was essential. Several different insert-to-vector ratios were employed during the ligation steps, and it was found that increasing the amount of insert usually enhanced the cloning efficiency (up to 80%, a 40x increase). In order to obtain an adequate amount of insert for efficient cloning, the annealing step was critical: the oligonucleotides should come together slowly to ensure maximum yield. In fact, a low insert-to-vector ratio not only yielded a smaller number of positive candidates, but also a higher frequency of clones containing unwanted mutations.
DNA alterations ranging from deletions (single nucleotide, large region), to insertions (single nucleotide, large region), to duplications (large region), to single base pair substitutions were found in the targeted region as well as flanking regions in the signal sequence. Interestingly, all of these mutations were located within the cassettes for each of the five random libraries, implying that these changes were most likely a result of imperfect cassette formation, rather than low fidelity in the bacterial replication machinery or biological selection. Why more unintended mutants showed up when the insert-to-vector ratio was low remains unknown.

3.2 Blood Agar Plate Assay as an Indicator of Secretion Level

In the blood agar plate assay, the size of the hemolytic zone surrounding a colony depends on both the quantity and the activity of the secreted hemolysin. It is known that the hemolytic domain and signal sequence are functionally independent (section 1.2.2). Furthermore, enzyme-linked immunosorbent assay (ELISA) has been used to correlate the amount of secreted protein with the hemolytic zone size for a number of α1 helical variants (figure 4), confirming that signal sequence mutations do not affect the hemolytic activity (Morden, 1998). Thus the zone size can be used as an indirect measurement of the amount of hemolysin secreted into the medium, which is dependent on two important factors: (1) the amount of endogenous hemolysin, and (2) the transport efficiency.

[Figure 4 image: example plates showing Rank 0, Rank 1, Rank 5, and Rank 6 hemolytic zones]

Figure 4: Hemolytic Zone Size. E. coli JM83 harbouring pLGBCD were transformed with plasmid DNA from the five random libraries. Transformants were plated on blood agar plates in triplicate and incubated at 37°C for 19 hours. Each plate was read twice independently by comparing to a set of standards and assigned a rank from 0 (no transporter) to 6 (wild-type activity). A total of 6 readings for each variant was averaged. Based on ELISA, the percentage of secretion was roughly approximated for each zone size: rank 6=100%, rank 5=90%, rank 4=50%, rank 3=30%, rank 2=10%, rank 1=2%, rank 0=0% (Morden, 1998).

To establish the blood agar plate assay as an indicator of transport efficiency, it was important to rule out the possibility that mutations within the signal sequence affect the amount of endogenous hemolysin (due to low transcription level, high RNA instability, low translational level, or rapid protein degradation). Out of a total of 357 full-length and stop variants from the five random libraries, 28 were selected. The amount of endogenous hemolysin for each of these clones was determined (in cells lacking the transporter complex) by Western analysis. Twenty-seven out of 28 of these variants were present at levels similar to wild-type (figure 5), suggesting that hemolytic zone size can be used to estimate the secretion level. It should be noted that hemolysin could not be detected for the lowest secreting α2 full-length variant, B21321, implying that some mutation (either the specific random sequence within the α2 region, or some aberration elsewhere in the gene) prevented the protein from being expressed at normal levels. Sub-cloning of this random sequence would allow one to distinguish between the two possibilities. This observation did not come as a surprise since most of the other α2 full-length variants were produced and secreted at wild-type levels (results of α2 full-length variants will be discussed in section 3.7.1). Rather, the fact that the cytoplasmic hemolysin level was undetectable explained why this particular variant had an extremely small hemolytic zone.

3.3 Objectivity and Reproducibility of the Blood Agar Plate Assay

A number of steps were taken to ensure that readings from the blood agar plate assay were as objective and reproducible as possible. Six independent readings were taken for each variant.
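How six zone readings might be summarized can be sketched as follows. The readings in the example are hypothetical. The rank-to-percentage values are the ELISA-based approximations from the figure 4 legend; the linear interpolation between whole ranks is an assumption made here purely for illustration, since the rank scale is not linear in secretion.

```python
from statistics import mean, stdev

# Approximate secretion (% of wild-type) for each whole rank (figure 4 legend).
RANK_TO_PERCENT = {0: 0, 1: 2, 2: 10, 3: 30, 4: 50, 5: 90, 6: 100}

def summarize(readings):
    """Mean and sample standard deviation of the zone readings."""
    return mean(readings), stdev(readings)

def approx_percent(rank):
    """Rough % secretion for an averaged rank, interpolating linearly
    between neighbouring whole ranks (illustrative assumption)."""
    if not 0 <= rank <= 6:
        raise ValueError("rank must lie between 0 and 6")
    lower = int(rank)
    if lower == 6:
        return float(RANK_TO_PERCENT[6])
    step = RANK_TO_PERCENT[lower + 1] - RANK_TO_PERCENT[lower]
    return RANK_TO_PERCENT[lower] + (rank - lower) * step

if __name__ == "__main__":
    readings = [6, 6, 6, 6, 6, 5]  # hypothetical: 3 plates, each read twice
    avg, sd = summarize(readings)
    print(f"zone {avg:.2f} +/- {sd:.2f}, roughly {approx_percent(avg):.0f}% secretion")
```

Five readings of 6 and one of 5 give a mean of 5.83 and a sample standard deviation of 0.41.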
The average and standard deviation were calculated to provide an indication of the secretion level and variability, respectively. Assignments (figure 4) were most consistent at the two extremes (0, 1, and 6), with the variability greatest for clones secreting at levels 3 and 4. It should be emphasized that the scale for zone assignment (rank 0 to 6) was non-linear, and that the numbers obtained were only semi-quantitative.

[Figure 5 image: Western blot with molecular weight markers at 175kD, 83kD, and 62kD. Lane annotations as extracted: zone size 6, 6, 4.2, 0, 6, 2, 2.3, 2, 1, 1, 1; type WT, FL, S, (-), WT, S, FL, FL, S, S, FL]

Figure 5: Western Blot Analysis of the Amount of Endogenous Hemolysin. Mutant plasmids were chosen from the five random libraries and transformed into JM83. The quantity of endogenous hemolysin for each clone was determined by Western analysis using rabbit anti-hemolysin antiserum as primary antibody (section 2.8). The arrow indicates the position of hemolysin (107kD). The identity of each sample (200µl of JM83) is shown above the lanes, while the corresponding hemolytic zone size, as well as the type of mutant (WT=wild-type, FL=full-length random, S=stop mutant, and (-)=negative), is shown below each lane.

Since an average of one hundred variants were assayed for each library at the same time, the blood agar plate assay proved to be the only feasible method for phenotype determination because it is quick and convenient. However, this procedure has a major shortcoming in that the hemolytic zone assignments were determined by the naked eye and might be subject to personal bias. To determine the objectivity of this assay, a second person was asked to rank all Cterm1 random variants (in comparison to 7 standards) independently of the primary observer. All readings were compiled and the two sets of data were analyzed for differences (table 2).
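One way to compare two sets of averaged zone readings is to bin the absolute differences and compute a Pearson product-moment correlation coefficient. The sketch below is an illustration, not the analysis script used in this work. The six value pairs are taken from a handful of rows of table 2, so the coefficient it prints describes only that subset.

```python
from math import sqrt

def pearson(x, y):
    """Pearson product-moment correlation coefficient of two samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of equal length (at least 2)")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def bin_differences(x, y):
    """Count pairs whose absolute difference is 0, below 1, or at least 1."""
    diffs = [round(abs(a - b), 2) for a, b in zip(x, y)]
    return {
        "0": sum(d == 0 for d in diffs),
        "<1": sum(0 < d < 1 for d in diffs),
        ">=1": sum(d >= 1 for d in diffs),
    }

if __name__ == "__main__":
    # Averaged zone sizes for C1101, C1126, C11139, C12184, C12207, C1219.
    first = [6.00, 2.33, 1.00, 3.17, 0.00, 3.17]
    second = [6.00, 2.50, 1.00, 4.83, 0.00, 4.17]
    print(bin_differences(first, second))
    print(f"r = {pearson(first, second):.2f}")
```

Differences are rounded to two decimals before binning so that values such as 1.00 are not misclassified through floating-point error.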
Out of a total of 98 variants, 21 were given the same assignments as before. Seventy mutants had averages that differed by less than one, and seven other variants had average zone differences greater than one. The Pearson product moment correlation coefficient between the two sets of readings was 0.95. This level of correlation was considered acceptable.

To determine the reproducibility of this assay, about 15 variants were selected from each of the five random libraries and plated a second time. For any given set, the level of consistency between readings from the same observer on two different days was similar to that of two separate persons reading the same set of plates (table 3). Out of a total of 84 variants, 16 were given the same assignments as before. Sixty-one mutants had averages that differed by less than one, and seven other variants had averages that differed by more than one. The Pearson product moment correlation coefficient between the two sets of measurements was 0.95. This second assay also provided a comprehensive comparison among mutants from different libraries. In fact, 22 α1 helical mutants isolated from a previous study (Morden, 1998) were also plated here such that they could be compared directly with the α1 random mutants (section 4.1).

3.4 Nucleotide and Amino Acid Distribution of Random Sequences

Each variant was classified into one of three classes upon examination of the DNA sequence representing the mutated signal sequence: those of the proper design (full-length mutants), those that had at least one stop codon (stop mutants), and those that had unanticipated mutations in the signal sequence (unintended mutants). In this thesis, all analyses were performed on full-length mutants and stop mutants only.

Table 2: Hemolytic Zone Assignments of Cterm1 Random Variants Taken by Two Separate Observers

Designation  David Hui: Zone Size, Std. Dev.  Dr. Jaclyn Hung: Zone Size, Std. Dev.  Difference
C1101    6.00  0.00  6.00  0.00  0.00
C1104    5.83  0.41  6.00  0.00  0.17
C1115    6.00  0.00  5.83  0.41  0.17
C1116    6.00  0.00  6.00  0.00  0.00
C1120    5.17  0.98  5.50  0.55  0.33
C1121    6.00  0.00  6.00  0.00  0.00
C1126    2.33  0.52  2.50  0.55  0.17
C1130    2.00  0.00  2.33  0.52  0.33
C1132    5.33  0.52  5.17  0.75  0.17
C11103   2.00  0.63  2.00  0.00  0.00
C11104   6.00  0.00  5.67  0.52  0.33
C11106   6.00  0.00  5.33  0.52  0.67
C11129   5.00  0.89  5.00  0.63  0.00
C11139   1.00  0.00  1.00  0.00  0.00
C11140   6.00  0.00  5.67  0.52  0.33
C11159   5.00  0.63  4.67  0.82  0.33
C11161   2.67  0.52  3.50  0.84  0.83
C11166   5.67  0.82  5.17  1.33  0.50
C11187   3.17  0.41  4.33  1.21  1.17
C11205   5.67  0.52  5.33  1.03  0.33
C11206   6.00  0.00  6.00  0.00  0.00
C11214   3.50  0.55  4.83  0.41  1.33
C11229   6.00  0.00  6.00  0.00  0.00
C11242   6.00  0.00  6.00  0.00  0.00
C11261   3.00  0.63  4.00  0.00  1.00
C11266   6.00  0.00  6.00  0.00  0.00
C11274   2.33  0.52  2.83  0.41  0.50
C1206    6.00  0.00  5.50  0.55  0.50
C1215    5.83  0.41  5.67  0.52  0.17
C1219    3.17  0.41  4.17  0.75  1.00
C1231    3.50  0.55  4.50  0.55  1.00
C1234    6.00  0.00  5.83  0.41  0.17
C1237    6.00  0.00  5.67  0.52  0.33
C1244    6.00  0.00  5.83  0.41  0.17
C1247    2.83  0.41  3.67  0.52  0.83
C1265    6.00  0.00  5.67  0.52  0.33
C1267    5.17  0.41  5.00  0.89  0.17
C1270    5.67  0.52  5.67  0.52  0.00
C1272    3.83  0.75  4.50  0.84  0.67
C1280    5.83  0.41  5.50  0.84  0.33
C1282    5.50  0.55  5.67  0.52  0.17
C1285    2.17  0.41  3.67  0.82  1.50
C12102   6.00  0.00  6.00  0.00  0.00
C12108   5.83  0.41  5.83  0.41  0.00
C12110   6.00  0.00  6.00  0.00  0.00
C12115   2.83  0.41  3.67  0.52  0.83
C12117   5.83  0.41  5.83  0.41  0.00
C12121   4.50  1.05  5.50  0.84  1.00
C12122   6.00  0.00  5.83  0.41  0.17
C12123   6.00  0.00  6.00  0.00  0.00
C12126   5.83  0.41  5.67  0.52  0.17
C12133   6.00  0.00  5.67  0.52  0.33
C12135   5.83  0.41  5.83  0.41  0.00
C12136   2.33  0.52  3.17  0.75  0.83
C12162   6.00  0.00  5.33  1.03  0.67
C12165   2.33  0.52  2.67  0.52  0.33
C12167   6.00  0.00  5.83  0.41  0.17
C12168   6.00  0.00  5.67  0.52  0.33
C12172   5.67  0.82  5.50  0.84  0.17
C12178   3.00  0.00  3.67  1.03  0.67
C12181   5.50  0.55  5.67  0.52  0.17
C12182   5.67  0.52  6.00  0.00  0.33
C12184   3.17  0.41  4.83  0.41  1.67
C12186   5.17  0.98  4.33  1.21  0.83
C12190   5.83  0.41  6.00  0.00  0.17
C12207   0.00  0.00  0.00  0.00  0.00
C12209   5.00  0.89  5.33  0.52  0.33
C12213   1.50  0.55  2.00  0.00  0.50
C12221   5.67  0.52  5.83  0.41  0.17
C12226   6.00  0.00  6.00  0.00  0.00
C12227   6.00  0.00  6.00  0.00  0.00
C12230   1.00  0.00  1.00  0.00  0.00
C12237   5.83  0.41  5.83  0.41  0.00
C12240   2.50  0.55  4.00  0.63  1.50
C12242   6.00  0.00  6.00  0.00  0.00
C12247   6.00  0.00  5.33  0.82  0.67
C12250   5.50  0.55  6.00  0.00  0.50
C12254   5.17  0.75  5.50  0.55  0.33
C12263   5.83  0.41  5.17  0.41  0.67
C12274   4.50  0.55  5.17  0.41  0.67
C12282   5.67  0.52  5.33  0.52  0.33
C12312   5.83  0.41  5.67  0.52  0.17
C12313   2.83  0.75  3.33  0.52  0.50
C12318   6.00  0.00  5.33  0.82  0.67
C12320   5.17  0.75  5.17  0.98  0.00
C12331   2.83  0.41  3.17  0.75  0.33
C12332   5.50  0.55  5.17  0.75  0.33
C12345   6.00  0.00  5.67  0.52  0.33
C12348   2.83  0.41  3.83  0.41  1.00
C12351   2.50  0.55  3.33  1.03  0.83
C12352   2.67  0.52  3.67  0.82  1.00
C12362   5.50  0.84  5.33  0.52  0.17
C12363   3.67  1.03  4.17  0.98  0.50
C12375   5.17  0.98  5.33  0.52  0.17
C12376   6.00  0.00  6.00  0.00  0.00
C12377   2.33  0.52  3.17  0.75  0.83
C12381   2.67  0.52  4.33  0.82  1.67
C12388   5.50  0.55  5.67  0.52  0.17
Average  6.76  0.49  4.98  0.46  0.37

Statistics for Cterm1 Variants
Difference in Average  Number of clones  Number of clones not taken into account by standard error
0        21   -
<1       70   2
>=1      7    3
total    98   5

Table 3: Hemolytic Zone Assignments Taken on Separate Days by Same Observer

Designation  First Reading: Zone Size, Std. Dev.  Second Reading: Zone Size, Std. Dev.  Difference
A1108    3.00  0.00  2.67  0.52  0.33
A1112    2.50  0.55  2.50  0.55  0.00
A1113    3.33  0.52  3.83  0.75  0.50
A1119    1.50  0.55  2.00  0.00  0.50
A1120    2.00  0.00  2.50  0.55  0.50
A1122    2.00  0.00  2.17  0.41  0.17
A1126    2.83  0.41  3.00  0.00  0.17
A1131    3.50  0.55  3.67  1.03  0.17
A11113   1.33  0.52  1.17  0.41  0.17
A11123   2.00  0.00  1.83  0.41  0.17
A11131   2.67  0.52  3.67  0.52  1.00
A11132   2.33  0.52  3.00  0.63  0.67
A11136   2.17  0.75  3.50  0.55  1.33
A11226   1.00  0.00  0.67  0.52  0.33
A11244   1.50  0.55  1.67  0.52  0.17
A11246   1.67  0.52  2.00  0.00  0.33
A11284   1.83  0.41  2.00  0.00  0.17
A1205    2.00  0.00  1.67  0.52  0.33
A1232    3.17  0.41  3.33  0.52  0.17
A1236    2.00  0.00  2.00  0.00  0.00
A1270    1.33  0.52  0.83  0.41  0.50
A12119   3.00  0.00  3.00  0.00  0.00
A12145   1.67  0.52  1.50  0.55  0.17
A12169   3.50  0.55  3.50  0.55  0.00
L2106    1.00  0.00  1.17  0.41  0.17
L2108    1.00  0.00  1.33  0.52  0.33
L2114    2.83  1.17  2.83  0.41  0.00
L2124    2.50  0.55  2.33  0.52  0.17
L2310    3.33  0.52  3.00  0.00  0.33
L2329    4.00  0.00  4.50  0.84  0.50
L3235    2.00  0.89  2.00  0.00  0.00
L3247    2.50  0.55  2.67  0.52  0.17
L3290    3.50  0.55  3.33  0.52  0.17
L3309    2.67  0.52  2.67  0.52  0.00
L3336    2.17  0.75  2.33  0.52  0.17
L3337    2.67  0.52  3.00  0.63  0.33
L3345    3.00  0.00  3.33  0.52  0.33
L3364    2.00  0.63  2.00  0.63  0.00
L3387    2.33  0.52  3.33  0.52  1.00
B2149    6.00  0.00  6.00  0.00  0.00
B21136   1.00  0.00  0.67  0.52  0.33
B21303   1.00  0.00  0.67  0.52  0.33
B21312   1.50  0.55  1.50  0.55  0.00
B21321   1.00  0.00  0.00  0.00  1.00
B21336   5.50  0.55  5.83  0.41  0.33
B2378    4.00  0.63  3.67  0.52  0.33
B23151   6.00  0.00  6.00  0.00  0.00
B3144    1.00  0.00  1.00  0.00  0.00
B3216    5.00  0.00  5.67  0.52  0.67
B3278    4.50  0.55  5.50  0.84  1.00
B3279    5.83  0.41  6.00  0.00  0.17
B32101   3.33  0.82  3.00  0.00  0.33
B32185   1.00  0.00  1.00  0.00  0.00
B32259   2.83  0.75  3.17  0.41  0.33
C1101    6.00  0.00  5.83  0.41  0.17
C1130    2.00  0.00  2.00  0.00  0.00
C11129   5.00  0.89  4.50  1.64  0.50
C11261   3.00  0.63  3.50  1.22  0.50
C1231    3.50  0.55  2.67  0.52  0.83
C1285    2.17  0.41  2.33  0.52  0.17
C12121   4.50  1.05  3.83  0.41  0.67
C12122   6.00  0.00  5.67  0.52  0.33
C12126   5.83  0.41  3.75  0.96  2.08
C12135   5.83  0.41  6.00  0.00  0.17
C12178   3.00  0.00  2.50  0.55  0.50
C12181   5.50  0.55  5.17  0.75  0.33
C12182   5.67  0.52  4.50  1.97  1.17
C12313   2.83  0.75  3.67  1.51  0.83
D1206    6.00  0.00  5.50  0.84  0.50
D1231    4.00  0.63  3.50  1.05  0.50
D12334   2.00  0.00  2.00  0.00  0.00
D3101    2.00  0.00  2.00  0.00  0.00
D3115    6.00  0.00  6.00  0.00  0.00
D3130    6.00  0.00  5.83  0.41  0.17
D3146    4.17  0.75  4.50  0.55  0.33
D3209    5.33  0.82  5.83  0.41  0.50
D3224    1.00  0.00  1.17  0.41  0.17
D3235    6.00  0.00  6.00  0.00  0.00
D3242    6.00  0.00  6.00  0.00  0.00
D3257    5.83  0.41  6.00  0.00  0.17
D3268    5.00  0.63  5.50  0.84  0.50
D3286    1.17  0.41  2.00  0.00  0.83
D3287    3.17  0.41  3.50  0.55  0.33
Average  3.20  0.37  3.24  0.45  0.35

Statistics for Table 3 (Difference in Average: Number of clones / Number of clones not taken into account by standard error)
Library   0        <1       >=1     total
Alpha1    4 / -    18 / 0   2 / 1   24 / 1
Linker    2 / -    13 / 0   1 / 0   15 / 0
Alpha2    5 / -    8 / 1    2 / 1   15 / 2
Cterm1    1 / -    12 / 0   2 / 1   15 / 1
Cterm2    5 / -    10 / 1   0 / 0   15 / 1
All       16 / -   61 / 2   7 / 3   84 / 5

The nucleotide frequency within the random region was determined for all full-length and stop mutants of each library (table 4). Although the random regions were designed such that all four nucleotides had an equal chance of being incorporated at each position (i.e. 25%), the actual proportion deviated dramatically.
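Whether a set of base counts is compatible with the designed 25% per nucleotide can be checked with a chi-square goodness-of-fit test. The sketch below is illustrative only. The counts are hypothetical, and the p-value uses the closed-form upper tail of the chi-square distribution with 3 degrees of freedom (four categories minus one), which avoids any dependency outside the standard library.

```python
from math import erfc, exp, pi, sqrt

def chi_square_uniform(counts):
    """Chi-square statistic against equal expected counts per category."""
    total = sum(counts.values())
    expected = total / len(counts)
    return sum((obs - expected) ** 2 / expected for obs in counts.values())

def p_value_df3(stat):
    """Upper-tail probability of a chi-square variable with 3 degrees of freedom."""
    return erfc(sqrt(stat / 2)) + sqrt(2 * stat / pi) * exp(-stat / 2)

if __name__ == "__main__":
    # Hypothetical base counts for one library (not data from this work).
    counts = {"T": 180, "C": 250, "A": 360, "G": 210}
    stat = chi_square_uniform(counts)
    print(f"chi-square = {stat:.1f}, p = {p_value_df3(stat):.2e}")
```

A very small p-value indicates that the observed composition is unlikely under equal incorporation of the four bases. It does not by itself say whether the skew arose during synthesis, cloning, or selection.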
In fact, all five random libraries had different nucleotide distributions. Based on χ² statistics, it was determined that these frequencies could not have arisen by chance. Whether this skewing was a result of manipulations during the cloning procedure (oligonucleotide synthesis, annealing, Klenow filling, or ligation) or of biological selection remains unclear. Analysis of the amino acid distribution may provide a partial answer to the above observation. The frequencies of the twenty amino acids as well as stop codons were tabulated for each random library. None of these agreed with the ratio expected for equal nucleotide frequencies. However, when the expected amino acid distribution was re-calculated based on the corresponding skewed nucleotide frequency, the corrected prediction fitted very well with the observed amino acid frequency, with the exception of the Cterm1 library (table 5). To a certain extent, this observation supported the idea that biological selection was a less likely explanation: if selection was at the amino acid level (i.e. biological), then the nucleotide frequency might not have matched so well because of the intrinsic degeneracy in the genetic code.

3.5 α1 Random Library

3.5.1 Analysis of α1 Full-length Mutants

A total of 88 α1 random variants were generated by replacing the 12 amino acid coding region with 36 degenerate nucleotides. Of these variants, 35 were full-length mutants, 34 were stop mutants, and 19 were unintended mutants. Blood agar plate assays were performed on 34 of the full-length mutants, all of the stop mutants, and 7 of the unintended mutants. If the α1 amphiphilic helix played a critical role in transport, most α1 full-length mutants would be expected to be secreted at low levels since they would not contain this specific structural feature. The hemolytic zones for α1 full-length mutants ranged from 2 to 3.5 (table 6).
The population was relatively homogenous, with a mean and a median of 2.8, suggesting that secretion was greatly hampered (figure

Table 4: Nucleotide Frequencies of All Random Libraries
Nucleotide  Alpha1 (%)  Linker (%)  Alpha2 (%)  Cterm1 (%)  Cterm2 (%)
T  18%  13%  39%  30%  12%
C  25%  35%  13%  19%  26%
A  36%  22%  28%  28%  14%
G  21%  30%  20%  23%  48%
P-value  2.11E-40  8.50E-42  2.17E-78  1.38E-12  1.47E-130

Table 5: Amino acid Frequencies of All Random Libraries
(for each library: Predicted %, Observed %)
Amino acid  Alpha1  Linker  Alpha2  Cterm1  Cterm2
G  4.27 4.41  9.27 10.17  3.89 2.82  5.17 4.22  23.47 23.00
A  5.15 4.90  10.49 10.51  2.56 2.56  4.37 4.37  12.38 14.50
V  3.76 3.19  3.92 2.54  7.76 7.04  6.85 6.02  6.00 5.00
L  6.41 6.00  5.31 6.61  12.49 12.55  10.39 8.43  4.12 5.67
I  5.23 4.29  1.99 1.86  8.82 8.07  6.50 5.27  0.87 0.83
P  6.21 6.99  11.88 11.36  1.69 1.92  3.70 2.86  6.53 5.67
F  1.42 1.23  0.79 2.03  8.11 8.96  4.48 6.33  0.58 0.17
W  0.78 0.86  1.19 1.69  1.53 2.18  1.56 0.90  2.91 4.17
Y  2.84 2.82  1.35 1.86  5.75 4.48  4.15 2.86  0.64 1.00
T  9.03 8.95  7.65 7.29  3.63 4.10  5.37 6.48  3.48 3.83
S  7.76 9.68  7.64 7.46  8.00 6.91  8.92 11.60  5.67 5.33
C  1.62 1.84  1.86 1.02  4.06 5.89  3.38 5.72  2.28 2.17
M  1.36 1.59  0.87 0.85  2.17 2.94  1.91 1.81  0.82 1.00
Q  5.14 5.51  4.03 4.07  1.73 1.92  2.72 2.26  2.16 1.83
N  5.66 5.88  2.33 2.20  4.08 3.46  3.84 4.82  0.70 0.17
D  3.23 2.45  3.20 2.54  2.88 3.46  3.13 2.56  2.50 2.33
E  4.26 2.82  3.56 4.24  2.62 2.43  3.22 1.66  4.09 2.83
K  7.47 8.21  2.60 2.88  3.72 5.12  3.95 6.02  1.15 1.50
R  9.41 8.46  14.05 12.54  5.19 3.46  7.59 7.83  16.47 16.00
H  3.89 3.31  3.62 3.73  1.90 1.92  2.65 1.51  1.32 1.17
*  5.11 6.62  2.38 2.54  7.40 7.81  6.17 6.48  1.86 1.83
P-value  0.49  0.21  0.08  8.66E-05  0.44

Table 6: Genotype and Phenotype of Alpha1 Random Variants
Designation Nucleotide sequence Amino acid seq. Zone Size Std. Dev. Stop?
Where AU31 TTCCGCAAGGCCAATCAGCACAATCAGTTGGCGTGG FRKANQHNQLAW 3.50 0.55 0 0 A12169 AATGCGTTAGTTAAAGTACAGAAAATTCATTTAGAG NALVKVQKIHLE 3.50 0.55 0 0 A1113 AACGCTACGTTGCACCAAAGCCCTTCAGAGTTGTTA NATLHQSPSELL 3.33 0.52 0 0 A12123 ATGCCCCCGAGGTCCATCGATGGACTTTGCACCGCC MPPRSIDGLCTA 3.33 0.52 0 0 A12170 AGAAAAGCCAGCTCAACAGGTTCGCAAAATGAATGG RKASSTGSQNEW 3.33 0.52 0 0 A1232 AGCAGAAAATTCGGTAATTATACATCACCAACCGTA SRKFGNYTSPTV 3.17 0.41 0 0 A1248 TGTAGGCAAAGAATTCCTCAAACGGTATTAGGAGTA CRQRIPQTVLGV 3.17 0.41 0 0 A12119 AAGCCCCTCAAGCGCCGCTCGGGCAGATTAGATACA KPLKRRSGRLDT 3.00 0.00 0 0 A12149 GAGGAGAGTTGTACCGTACACATATCAATACTTATC EESCTVHISILI 3.00 0.00 0 0 A12174 AATCAGTACACGTACAGGCGCTTATCCGCATGCAAG NQYTYRRLSACK 3.00 0.00 0 0 A12176 ATGGGTAGGTACCACACAGAAAGTGACCCCAGGCCA MGRYHTESDPRP 3.00 0.00 0 0 A1126 AAACAACTCATCCCTGAACACAACCCTACTCATAAA KQLIPEHNPTHK 2.83 0.41 0 0 A1130 AATGCCATAAAAGCAGCAGACTCGCACGCAATTTTC NAIKAADSHATF 2.83 0.41 0 0 A l l 102 TTTCCCGACTGGAAAGCATTATCTAGGACCCAGCCT FPDWKALSRTQP 2.83 0.41 0 0 A11285 CGTTCTATTTCTAAACCAGATCCGGAACCACTCAAA RSISKPDPEPLK 2.83 0.41 0 0 A11289 TATCCTAAAACGAGCACACAGCAGAATTTAAGGGGG YPKTSTQQNLRG 2.83 0.41 0 0 A12171 AACGGAAGCAAGACTCACGAGCTACCTACACCAAAA NGSKTHELPTPK 2.83 0.41 0 0 A11207 CTCCCCACGAAGAGCTTACAGTGTTCCTTTAAAACA LPTKSLQCSFKT 2.83 0.75 0 0 A11131 ACCTCTTGCTTACACAGGAGTCTAAAAAGACAAGTT TSCLHRSLKRQV 2.67 0.52 0 0 A11145 AGTGCCAGCACAATGGAGCCTAGCGCGAGACACCAT SASTMEPSARHH 2.67 0.52 0 0 A11275 AACCTCTCCACAAAGGGTAGGCATGGATACAAAACG NLSTKGRHGYKT 2.67 0.52 0 0 A1275 AAGCACAGCAGAATGCCAAATATAAACGCAAAAAGA KHSRMPNINAK.R 2.67 0.52 0 0 A1254 TCTTCTTTGAGGAGTAGCCAGCCCAACTTAGGCAAA SSLRSSQPNLGK 2.67 0.82 0 0 A l l 12 AAGGCTGGAGGCCAGCAATCGAAGCTGGGCAGCAAG KAGGQQSKLGSK 2.50 0.55 0 0 A11253 GTCGCAGCAATAGACTGGACTGCTCGGGACCCCATC VAAIDWTARDPI 2.50 0.55 0 0 A1203 AACGCCCAGTATAGCAGAGTTCCGACAGAATCTAAT NAQYSRVPTESN 2.50 0.55 0 0 A1250 GACAGCAGCAGATTATACACCAACAAGCACAAATCC DSSRLYTNKHKS 2.50 0.55 0 0 A1279 AGCTCACCGAGTACATACAAAGTACCCGGGCAAAAA SSPSTYKVPGQK 2.50 0.55 0 0 A l l 132 
AACCCGACGAAGAAGTCCCAATCCTCGCCAAGACTG NPTKKSQSSPRL 2.33 0.52 0 0 A11136 TGGCGCTTGATAACCACCAAGCGGGACTCCACAACT WRLITTKRDSTT 2.17 0.75 0 0 A1122 AATTCAAGAAAAATGCAAAAGAGTATCCAGCAAAGA NSRKMQKSIQQR 2.00 0.00 0 0 A11219 AAGCACGTATGGGACAATCTTAAAGAATGTTCATAA LPTKSLQCSFKT 2.00 0.00 0 0 A11237 AATGCCCGCGAAAAGCCATATCACACGCGAAAAAAG NAREKPYHTRKK 2.00 0.00 0 0 A1236 GCAGAGAAGAGCACTAGTAGGCAACGGTCACGAAAG AEKSTSRQRSRK 2.00 0.00 0 0 A l l 29 TAAGGGGGGAACAGCGTATGCTCAGTCAGTTGACAG *GGNSVCSVS*Q 2.00 0.00 2 1 A11242 TAGGGCACGAGGTGAGACGAGTGCGAAATCCTAACC •GTR'DECEILT 2.00 0.00 2 1 A1283 TAAAGCACCTAACATACATAGAGACACGGTGTGTTA •ST'HT'RHGVL 2.00 0.00 3 1 A11284 TAGACTGCAAACACCTGAATACGAAGACGCATACAC *TANT*IRRRIH 1.83 0.41 2 1 A1230 TAAAGCCAGGGTAGGAGATAATAAGAAGAACCTACT *SQGRR"EEPT 1.67 0.52 3 1 A l l 123 AATTAACCCCAATAACTCCAATAAGTTGGAATGCTT N'PQ'LQ'VGML 2.00 0.00 3 2 A11214 ACGCCCTGAGTCACCCCAAATGTAGACCGCTGAACG TP'VTPNVDR'T 2.00 0.00 2 3 A11246 AAACCCTAGTACAACCAAGACCCTGAGAACGAAAGT KP'YNQDPENES 1.67 0.52 1 3 A11153 A T A G A T T A G G C A T A C C C C A G C G A C T A G G A A C T C G C A ID'AYPSD'ELA 1.50 0.55 2 3 A11254 CTCCCAGCCTAATGCAAGGTCCAAAAGAAATAAATA LPA'CKVQKK'I 2.17 0.41 2 4 A1208 AACCCTAGGTGATACCCGAAGTAGGTCAACTTAATA NPR*YPK»VNLI 2.00 0.00 2 4 A1273 GACTGGCCCTGAACACTCGGAACAGTGGACCTTGCA DWP'TLGTVDLA 2.00 0.00 1 4 A11101 AAGCCTACGTAATTATCATGCTAAGCGATATATAGC KPT'LSC'AIYS 1.83 0.41 4 A11251 TACACACAGTAATCTGACTAATATAATATCTACTGA YTQ*SD*YNIY* 1.83 0.41 4 A1266 GTTACACCGTAAAGAAAATCCAAACAAGGGTCAACA VTP'RKSKQGST 1.83 0.41 1 4 A11203 GGGTCGAAGTAAAATAAAAGCGGAATTAAAAGCTGA GSK'NKSGIKS* 1.50 0.55 4 A1120 CAATATCTCTTCTGAGCTAATACACTAGGAACTATG Q YLF* ANTLGTM 2.00 0.00 1 5 A11147 AACAGTGCTGGATAAAATAGGCATTCTACTGGGAAC NSAG'NRHSTGN 2.00 0.00 1 5 A1219 CAGGTTAACACATAGCCCGTAGCAGTCAAACCACAA QVNT'PVAVKPQ 1.83 0.41 1 5 A11210 GGTCTCTCCTGTTAGGCCCTAATTACGCTAACTAAA GLSC'ALITLTK 1.67 0.52 1 5 A11150 AACACTCGATACCAGTAACCGCGTAGTGACCGTTCT NTRYQ'PRSDRS 2.00 0.00 1 6 A11262 ACCGTGCGATATACGTAGCCGACAGCGTAACAACCA TVRYT'PTA'QP 2.00 
0.00 6
A11224 CGTCGTGGGGTACACATATAACCCGCTAACTGGATC RRGVHI*PANWI 1.83 0.41 1 7
A12145 AACAGAAAAATTAATTGCTAAATCACAAGAGCGAAG NRKINC*ITRAK 1.67 0.52 1 7
A1119 AAGCACATGGCTGGACCGTGACGAATTTAATGAGGA KHMAGP*RI**G 1.50 0.55 7
A11143 AACGAAAGTAGCTTCCAGTAACTGAGACAAACAGAG NESSFQ*LRQTE 1.50 0.55 1 7
A11142 GCGTTCAGCTGCCTCATGACGTGACCGCACAGGATA AFSCLMT*PHRI 2.83 0.41 1 8
A11105 ATACACGCAAAAACGTTCCGTTGAACGGGTCGCCGC IHAKTFR*TGRR 2.00 0.00 1 8
A11244 AAGCCGGGTAACATCGGAAGGTAACCAATGGTACAT KPGNIGR*PMVH 1.50 0.55 1 8
A11268 CGGAACCAGGCCTCCCTACAAATTTGACCTCATAAT RNQASLQI*PHN 2.00 0.00 1 9
A1270 AATGAACAAAATAAAACGCGGAAATAACGTGCACCC NEQNKTRK*RAP 1.33 0.52 1 9
A11226 AACCGAACCAGGAGAACCAAATGTTAACAACCACCC NRTRRTKC*QPP 1.00 0.00 1 9
A1108 TACACAGATGGCATGACAATCTATTACATGTAAAAC YTDGMTIYYM*N 3.00 0.00 1 11
A11113 CACACGACATCCAAGTCCCGAAAACTACAGATGTAA HTTSKSRKLQM* 1.33 0.52 1 12

6A). Since the rest of the signal sequence was intact, any changes in hemolysin transport could be attributed to modifications in the α1 region. More specifically, the dramatic reduction in secretion indicated that some feature(s) essential for efficient transport must be located in this region.

It was proposed that an amphiphilic helix in the α1 region is required for transport (Morden, 1998). In order to test this hypothesis, the helix hydrophobic moment (which is an indicator of amphiphilic helix formation (Eisenberg et al., 1984)) was determined for each α1 full-length mutant. To facilitate analysis, these variants were divided into three classes based on their hemolytic rankings: two secreting at 3.5, twenty-six secreting between 2.5 and 3.4, and six secreting below 2.5 (table 6). The respective average helix hydrophobic moment for each class was 0.17, 0.14, and 0.13, in contrast with wild-type, which had a value of 0.23.
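For readers unfamiliar with the helix hydrophobic moment, a minimal sketch of the calculation follows. It assumes the Eisenberg normalized consensus hydrophobicity scale (values as commonly tabulated) and an ideal α-helix with 100° between successive residues. The thesis does not state which scale, window, or software settings were used, so values from this sketch need not reproduce those in table 7. The function name is hypothetical.

```python
import math

# Eisenberg normalized consensus hydrophobicity scale.
# Assumption: the thesis may have used a different scale or normalization.
EISENBERG = {
    "A": 0.62, "R": -2.53, "N": -0.78, "D": -0.90, "C": 0.29,
    "Q": -0.85, "E": -0.74, "G": 0.48, "H": -0.40, "I": 1.38,
    "L": 1.06, "K": -1.50, "M": 0.64, "F": 1.19, "P": 0.12,
    "S": -0.18, "T": -0.05, "W": 0.81, "Y": 0.26, "V": 1.08,
}


def helix_hydrophobic_moment(seq, delta_deg=100.0):
    """Mean hydrophobic moment per residue for a sequence modelled as a helix.

    Each residue contributes a vector of length H_n pointing at angle
    n * delta around the helix axis; the moment is the length of the
    vector sum divided by the number of residues.
    """
    delta = math.radians(delta_deg)
    sin_sum = sum(EISENBERG[aa] * math.sin(i * delta) for i, aa in enumerate(seq))
    cos_sum = sum(EISENBERG[aa] * math.cos(i * delta) for i, aa in enumerate(seq))
    return math.hypot(sin_sum, cos_sum) / len(seq)


# One of the two highest-secreting α1 full-length mutants listed in table 6.
print(round(helix_hydrophobic_moment("FRKANQHNQLAW"), 2))
```

A sequence whose hydrophobic residues cluster on one face of the helix gives a large moment, whereas a uniformly hydrophobic or uniformly polar stretch gives a moment near zero, which is why the quantity is used as an indicator of amphiphilicity.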
The finding that the average value of helix hydrophobic moment for each class increased with the ability to be secreted supports our hypothesis that an amphiphilic helix is required for efficient transport.

A number of other biophysical properties were also calculated for the twelve amino acid region of each full-length mutant, including the predicted secondary structure, hydropathy value, and hydrophilicity. It appears that higher secretors have a slightly higher tendency to form α-helices as predicted by the Chou-Fasman (CF) method (Chou and Fasman, 1978), as well as the Garnier-Osguthorpe-Robson (GOR) method (Garnier et al., 1978). Furthermore, an increase in zone size is associated with decreasing hydrophilicity according to the Kyte-Doolittle method (Kyte and Doolittle, 1982) (table 7).

3.5.2 Analysis of α1 Stop Mutants

The zone sizes for α1 stop mutants ranged from 1 to 3. Again, the population was relatively homogeneous, with a mean of 1.9 and a median of 1.8 (table 6). In these variants, the last 49 amino acids were replaced by anywhere from zero to 11 random amino acids. Since the majority of the signal sequence was removed, it was not surprising to see the dramatic reduction in hemolysin secretion (figure 6A).
It should be noted that most α1 stop mutants were transported at lower levels than α1 full-length mutants, suggesting that the downstream region of the signal sequence also

[Figure 6, panels A to E: histograms of full-length and stop mutants for the Alpha1, Linker, Alpha2, Cterm1 and Cterm2 random libraries; x-axis: hemolytic zone size]

Figure 6: Distribution of Random Signal Sequence Variants
(A) The secretion levels of α1 random variants (34 full-length and 34 stop) are plotted in the histogram above. Each secretion category has a 0.5 margin (e.g. a hemolytic zone size of 5 represents mutants secreting between and including 4.5 and 5.495).
(B) The secretion pattern of linker random variants (45 full-length and 15 stop).
(C) The secretion pattern of α2 random variants (32 full-length and 39 stop).
(D) The secretion pattern of Cterm1 random variants (48 full-length and 35 stop).
(E) The secretion pattern of Cterm2 random variants (65 full-length and 10 stop).
32 Table 7: Calculated Biophysical Properties of Alphal Random Full-length Variants PepPlot (Chow and Fasman Predictions] PeptideStructure Prediction Helix Average Average Average Hydropathy Hydrophilicity CF-Pred GOR-Pred hydrophobic Designation Zone Size Alph i Beta Turn t T b B h H T B H moment Wild-type 6 1.10 1.07 0.38 0.11 -0.17 0 0 0 0 9 0 0 0 12 0.23 Al 131 3.50 1.08 1.02 0.27 0.34 -0.22 0 0 0 0 12 0 5 0 0 0.17 A12169 3.50 1.14 1.10 0.28 0.10 -0.05 0 0 0 0 0 11 0 0 12 0.17 Average 3.50 1.11 1.06 0.27 0.22 -0.13 0.0 0.0 0.0 0.0 6.0 5.5 2.5 0.0 6.0 0.17 Al 113 3.33 1.02 0.93 0.56 0.53 -0.18 3 2 0 0 0 0 0 0 0 0.17 A12123 3.33 0.93 0.94 0.25 0.61 -0.01 3 2 0 0 0 0 6 4 0 0.18 A12170 3.33 0.97 0.87 0.52 0.80 0.41 4 3 0 0 0 0 5 0 0 0.08 A1232 3.17 0.84 1.02 1.46 0.57 -0.12 9 0 0 0 0 0 6 0 0 0.15 A1248 3.17 0.94 1.17 0.27 0.19 -0 18 0 0 0 12 0 0 4 8 0 0.21 A12119 3.00 0.95 0.89 0.53 0.52 1.16 1 2 0 0 0 0 6 0 0 0.16 A12149 3.00 1.05 111 0.29 0.20 -0.55 0 0 0 9 0 0 0 4 8 0.12 A12174 3.00 0.93 1.07 0.66 0.46 0.07 2 0 7 0 0 0 7 5 0 0.13 A12176 3.00 0.91 0.83 0.80 0.66 0.71 2 2 0 0 0 0 4 0 0 0.13 A1126 2.83 0.99 0.90 0.43 0.38 0.23 1 2 0 0 0 0 3 0 0 0.16 A1130 2.83 1.13 0.97 0.31 0.26 -0.16 0 0 0 0 0 11 0 0 12 0.15 AU102 2.83 0.99 0.94 0.42 0.69 0.14 0 2 0 0 8 0 2 0 0 0.16 A11285 2.83 0.95 0.78 0.65 0.55 0.92 0 4 0 0 0 0 5 0 0 0.13 Al 1289 2.83 0.88 1.00 0.25 0.51 0.21 3 0 6 0 0 0 7 0 0 0.13 A12171 2.83 0.90 0.82 0.39 0.41 0.53 3 2 0 0 0 0 4 0 0 0.11 A11207 2.83 0.95 1.02 0.16 0.49 -0.03 4 0 5 0 0 0 11 0 0 0.15 Al 1131 2.67 0.98 1.06 0.44 0.36 0.33 0 0 0 5 0 0 11 0 0 0.13 A l l 145 2.67 1.04 0.81 0.29 0.36 0.32 2 0 0 0 8 0 0 0 7 0.11 A11275 2.67 0.87 0.96 0.56 0.68 0.35 3 4 0 0 0 0 9 0 0 0.14 A1275 2.67 0.99 0.90 0.39 0.45 0.66 0 2 0 0 0 0 2 0 0 0.15 A1254 2.67 0.88 0.88 0.58 0.65 0.30 3 2 0 0 0 0 4 0 0 0.13 A1112 2.50 0.97 0.86 0.31 0.49 0.54 7 0 0 0 0 0 3 0 0 0.11 A11253 2.50 1.08 1.04 0.69 0.23 0.04 2 0 0 0 0 8 4 0 0 0.17 A1203 2.50 0.92 0.95 0.69 0.42 0.26 2 0 6 0 0 
0 4 0 0 0.13
A1250 2.50 0.92 0.91 0.66 0.67 0.64 0 4 0 5 0 0 8 0 0 0.10
A1279 2.50 0.84 0.92 0.82 0.92 0.21 2 4 0 0 0 0 7 0 0 0.08
Average 2.85 0.95 0.94 0.51 0.50 0.26 2.2 1.4 0.9 1.2 0.6 0.7 4.7 0.8 1.0 0.14
A11132 2.33 0.88 0.85 0.41 0.91 0.75 4 6 0 0 0 0 5 0 0 0.11
A11136 2.17 0.97 1.08 0.66 0.41 0.39 5 0 0 5 0 0 3 0 0 0.11
A1122 2.00 1.03 0.97 0.21 0.36 0.84 0 2 0 0 9 0 4 6 0 0.18
A11219 2.00 1.03 0.96 0.64 0.31 0.34 2 0 0 0 0 0 4 0 7 0.15
A11237 2.00 1.01 0.85 0.38 0.46 1.32 2 0 0 0 0 0 5 0 0 0.13
A1236 2.00 1.04 0.83 0.37 0.57 1.52 6 0 0 0 0 0 5 0 7 0.10
Average 2.08 0.99 0.92 0.44 0.50 0.86 3.2 1.3 0.0 0.8 1.5 0.0 4.3 1.0 2.3 0.13

played a significant role in transport. When the hemolytic zone of these variants was plotted against the position of the stop codons, no obvious trend was observed.

3.6 Linker Random Library

3.6.1 Analysis of Linker Full-length Mutants

A total of 68 linker random variants were generated by replacing the 10 amino acid linker coding region with 30 degenerate nucleotides. Upon sequencing of these clones, 45 were full-length mutants, 15 were stop mutants, and 19 were unintended mutants. Blood agar plate assays were performed on all of these variants. If the only role of this region was to connect the two conserved helices (α1 and α2) as proposed, then the linker should be able to tolerate many different combinations of amino acids, and most linker full-length mutants would be expected to be secreted at a level similar to wild-type. However, the hemolytic zones for linker full-length mutants ranged from 1.3 to 4, with a mean of 2.4 and a median of 2.3 (table 8). Secretion in these linker variants was clearly reduced despite the presence of an intact α1 helix (figure 6B). The dramatic decrease in transport upon random mutagenesis of this region suggested that the linker, like the α1 region, also contains some important element(s) that may be required for efficient transport.
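The full-length versus stop classification, together with the "Stop?" and "Where" columns of the genotype tables, follows directly from translating each degenerate region. The sketch below is an illustration of that bookkeeping rather than the software used in the thesis, and the function names are hypothetical. It is checked against three linker variants from table 8 (L2329, L3309 and L3378).

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks a stop codon.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate an in-frame DNA string, keeping stop codons as '*'."""
    usable = len(dna) - len(dna) % 3  # ignore any trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def classify(dna):
    """Return (peptide, number of stop codons, 1-based position of first stop).

    A full-length mutant has no stop codon and is reported as (peptide, 0, 0),
    matching the 'Stop?' and 'Where' columns of the genotype tables.
    """
    peptide = translate(dna)
    return peptide, peptide.count("*"), peptide.find("*") + 1


for name, dna in [
    ("L2329", "AGTTCCATGGAGCTGCCAACAACCCTCACG"),
    ("L3309", "TGATCCATTCATGCGCCCCCACTAACAACA"),
    ("L3378", "CTCGACGGCTCCGGGCGACACGTAAAGTGA"),
]:
    print(name, *classify(dna))
```

In a stop mutant the region downstream of the first stop codon is not translated, so the "Where" value also gives the length of the random tail plus one.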
In an attempt to elucidate features within the linker region that contribute to transport, a number of biophysical properties (secondary structure predicted by the CF and GOR methods, hydropathy, hydrophilicity, and helix hydrophobic moment) were calculated (table 9). No obvious relationships were identified. Analysis of the primary sequence revealed that a net negative charge within this region was more favorable for secretion. In addition, both the wild-type sequence and the highest-secreting linker mutant (L2329) contained a negatively charged residue at position four. While no conclusions could be drawn based on these empirical observations, it was likely that multiple features were involved.

3.6.2 Analysis of Linker Stop Mutants

The zone sizes for linker stop mutants ranged from 1 to 2.7, with a mean of 1.7 and a median of 1.6 (table 8). In these mutants, the last 37 amino acids were replaced by a

Table 8: Genotype and Phenotype of Linker Random Variants
Designation Nucleotide sequence Amino acid seq. Zone Size Std. Dev. Stop?
Where L2329 AGTTCCATGGAGCTGCCAACAACCCTCACG SSMELPTTLT 4.00 0.00 0 0 L2102 CCCAACCAGGCGCACTACTTCAGAGCAGGT PNQAHYFRAG 3.50 0.55 0 0 L3290 CACCAAGACTTCGGCGCAACAGACGAACCA HQDFGATDEP 3.50 0.55 0 0 L2301 CGGCCGTGGGCCGCCGAATCCCCGCCACGG RPWAAESPPR 3.33 0.52 0 0 L2310 CATTGGTTCCATATCACTCCTACCCTAGAA HWFHIHPTLE 3.33 0.52 0 0 L2406 ACCCCGTGCGCTGACCTGCTCCCCGAATCC TPCADLLPES 3.17 0.41 0 0 L3233 AGCCTAGTGAGGCCGACAGACGCAGAACCA SLVRPTDAEP 3.17 0.41 0 0 L3345 CCGTCCCCCTACCCGGCGGCAGGAGGTCTG PSPYPAAGGL 3.00 0.00 0 0 L3241 AAGCCATTCGACGCACCCCCAACACCCAGC KPFDAPPTPS 2.83 0.41 0 0 L3329 GACGCTCGGGACCAGATGGGGTGGCAAGGA DARDQMGWQG 2.83 0.41 0 0 L3365 CCCGACCGCGGGGGACCAGAGGGGCGGCAA PDRGGPEGRQ 2.83 0.41 0 0 L2114 CGGCGGAACGGATCACAAGGCTCACCGGCC RRNGSQGSPA 2.83 1.17 0 0 L2318 CGCTCCAGCGAATTCCTCCCACAGATCCAG RSSEFLPQIQ 2.67 0.52 0 0 L3282 GGCCGGGCCGTACGCAACTCTGAACACAGC GRAVRNSEHS 2.67 0.52 0 0 L3337 AGCTCCGGGGAGCGCAGACACCACCCTAAT SSGERRHHPN 2.67 0.52 0 0 L3377 GGCCCGGGCGCCCGAGCGCACTCGCAAGCC GPGARAHSQA 2.67 0.52 0 0 L2116 CACCCCTACAGCACCCCGAGAGAGAACAAT HPYSTPRDNN 2.67 1.03 0 0 L2124 TGGTGGTTTGCACAGAGCGGGACTGCGGGA WWFAQSGTAG 2.50 0.55 0 0 L2307 GGGCTCGGCTCCCCCGCATGCCGCAGTCCT GLGSPACRSP 2.50 0.55 0 0 L2402 TTAGGTGCCTACGCTGGTCCGCCCGGGGGC LGAYAGPPGG 2.50 0.55 0 0 L22120 CCGACCCACACACCCCCTCTGCAGCGAAAG PTHTPPLORK 2.33 0.52 0 0 L22123 CCGGACCGGCGCGCCCACGCCTGTAAGACG PDRRAHACKT 2.33 0.52 0 0 L2306 CTGGCCGGTTCGACGAAAGGCACACCGGTA LAGSTKGTPV 2.33 0.52 0 0 L3323 CCAGCGCACCATGACAGCAGGCTCCGTGCA PAHHDSRLRA 2.33 0.52 0 0 L3387 CCCCCGGCTGCGCTCATCCGGCCCAGCCCC PPAALIRPSP 2.33 0.52 0 0 L2118 CGCAGCCCCACAGCGCCGCGCCGCACACAT RSPTAPRRTH 2.17 0.41 0 0 L3259 GGTAGGCGCCAAGGCGGCCAGGGAAACAAG GRRQGGQGNK 2.17 0.41 0 0 L3280 CCCGAGTGCGCGCGGGGATTTCGGACGGGA PECARGFRTG 2.17 0.41 0 0 L3336 GCCATGAGCACGCCCTTCCGCCTGAGCATC AMSTPFRLSI 2.17 0.75 0 0 L3240 ATCTCATGTCCTAGGTTGGGACCTACCCAG ISCPRLGPTQ 2.00 0.00 0 0 L3385 G G C A G G C G C C G A G G G G C G G G C T C A C A T G A G GRRRGAGSHE 2.00 0.00 0 0 L3361 GGGAGGCGACCAGCAAGCCATGTCATGCAG 
GRRPASHVMQ 2.00 0.63 0 0 L3235 CTCGGGTTCCAGACGACCCCCAGCAAGCGA LGFQTTPSKR 2.00 0.89 0 0 L3348 GCGAGGGGGGCCGCAACGCTCGAAGTTGTT ARGAATLEW 1.83 0.41 0 0 L2316 AAGTACCCGAGCACACCGCCGCGCCGCGCG KYPSTPPRRA 1.83 0.75 0 0 L3267 ACAGAGACTGCCAAGTACGGCCTGCAGACG TETAKYGLQT 1.67 0.52 0 0 L3275 TTGGCTCGGAGCCTCCCATACGGGCACGTG LARSLPYGHV 1.67 0.52 0 0 L3346 CTGGGCGGTATCGCTCTCCAGAAAAAGGGA LGGIALQKKG 1.67 0.52 0 0 L3281 GCCGCATATAGACGGAGGGTGTCGGCAGAG AAYRRRVSAE 1.50 0.55 0 0 L3315 CGCGCGTGGCGGCAACGCACCAGCGGCCCT RAWRQRTSGP 1.50 0.55 0 0 L3362 CTATCCCGGGCCACTCACAGCTTCCCCCGT LSRATHSFPR 1.50 0.55 0 0 L2313 ACAGGGAAGCTCCTCTGGGAACAGCTAAAG TGKLLWEQLK. 1.33 0.52 0 0 L3308 TGGGTGAGGCCTGGGACGTCAAGAACGAAA WVRPGTSRTK 1.33 0.52 0 0 L3333 TTTCGCTTCTATCGCAATCCCACCGAAATA FRFYRNPTEI 1.33 0.52 0 0 L3375 GACCGCCTCCAGGCCTGGGCCCTCACGCGT DRLQAWALTR 1.33 0.82 0 0 L3309 TGATCCATTCATGCGCCCCCACTAACAACA •SIHAPPLTT 2.67 0.52 1 1 L3247 CTGTGACTCGGCCGGCTTGGGTCTCCTCTC L*LGRLGSPL 2.50 0.55 1 2 L3263 AGGCTCCGGGCATGAAATGGTCCAGCTACA RLRA'NGPAT 1.50 0.55 1 5 L3269 CGCCCAGCACGGGGCTGAGCATACAGATTA RPARG'AYRL 1.17 0.41 1 6 L2304 GGTCAGAATGTCGCCTATTAGTCTCCATTA GQNVAY*SPL 2.33 0.52 1 7 L3364 GCGGCAGGGGACCACGCAGGATAAGGATCA AAGDHAG*GS 2.00 0.63 1 8 L2115 GCCGAACTAGTCACCGGGTGCTAAATGCAG AELVTGC*MQ 1.83 0.75 1 8 L3338 CGGTGGACGCCCCCACGCCCCTGAACAGAG RWTPPRP*TE 1.67 0.52 1 8 L2325 GCGGTAGCAAACCGGCCCCGGTAGTGAGAG AVANRPR**E 1.50 0.55 8 L2106 CGGCCAGGGCGTCGGGACAAATAGGCGACG RPGRRDK*AT 1.00 0.00 1 8 L3276 CGTGTGCGCGCAACCATCCGCATATGAGAA RVRATIRI*E 1.17 0.41 1 9 L2108 GCGGTCAAGCAGAAGACAAGCGAATGAGAG AVKQKTSE*E 1.00 0.00 1 9 L3378 CTCGACGGCTCCGGGCGACACGTAAAGTGA LDGSGRHVK* 1.83 0.41 1 10 L3253 GGCAACGAAAACATTCGGGAGTCGGGGTAA GNENIRESG* 1.17 0.41 1 10 35 Table 9: Calculated Biophysical Properties of Linker Random Full-length Variants PepPlot (Chow and Fasman Predictions] PeprJdeStructure Prediction Helii Average Average Average Hydropathy Hydrophilicity CF-Pred GOR-Pred hydrophobic Designation Zone Size Alph a Beta Turn t T b B h H T B H moment 
Wild-type 6 1.04 0.84 0.47 0.25 1.35 0.0 0.0 0.0 0.0 8.0 0.0 0.0 0.0 6.0 0.12 L2329 4.00 1.00 0.96 0.30 0.25 -0.27 0 0 0 0 0 0 0 0 0 0.16 L2102 3.50 0.96 0.96 0.46 0.24 -0.26 0 0 0 0 7 0 2 5 0 0.07 L3290 3.50 1.02 0.81 0.44 0.49 0.61 0 0 0 0 9 0 2 0 0 0.11 Average 3.67 0.99 0.91 0.40 0.32 0.03 0.0 0.0 0.0 0.0 5.3 0.0 1.3 1.7 0.0 0.11 L2301 3.33 0.99 0.77 0.43 0.28 0.38 2 0 0 0 7 0 4 0 0 0.09 L2310 3.33 1.02 1.07 0.81 0.20 -0.79 0 0 0 6 0 0 2 0 0 0.15 L2406 3.17 0.98 0.86 0.44 6.51 0.11 2 2 0 0 0 0 5 0 0 0.23 f L3233 3.17 0.99 0.87 1.02 0.35 0.63 3 0 0 0 0 0 0 4 0 0.11 L3345 3.00 0.84 0.83 0.26 0.64 -0.51 2 2 0 0 0 0 0 0 0 0.12 L3241 2.83 0.86 0.76 0.62 0.40 0.21 4 0 0 0 0 0 3 0 0 0.12 L3329 2.83 1.03 0.90 0.43 0.49 0.24 0 0 0 0 0 6 6 0 0 0.15 L3365 2.83 0.84 0.72 0.46 1.22 1.21 2 5 0 0 0 0 2 0 0 0.11 L2114 2.83 0.84 0.82 0.71 0.90 0.48 0 6 0 0 0 0 5 0 0 0.09 L2318 2.67 1.02 0.98 0.17 0.26 -0.03 3 0 0 5 0 0 2 0 0 0.20 L3282 2.67 0.97 0.88 0.76 0.40 0.74 4 0 0 0 0 0 0 0 0 0.07 L3337 2.67 0.88 0.77 0.32 0.43 0.90 0 2 0 0 0 0 3 0 0 0.07 L3377 2.67 0.98 0.82 0.19 0.55 0.16 0 2 0 0 0 0 0 0 0 0.07 L2116 2.67 0.83 0.85 0.56 0.97 0.50 1 4 0 0 0 0 4 0 0 0.12 L2124 2.50 1.00 1.03 0.76 0.40 -0.82 0 2 0 5 0 0 0 0 0 0.11 L2307 2.50 0.81 0.84 0.30 0.53 0.15 3 2 0 0 0 0 5 0 0 0.12 L2402 2.50 0.82 0.85 1.48 1.23 -0.47 2 3 0 0 0 0 4 0 0 0.10 Average 2.83 0.92 0.86 0.57 0.57 0.18 1.6 1.8 0.0 0.9 0.4 0.4 2.6 0.2 0.0 0.12 L22120 2.33 0.88 0.90 0.21 0.37 0.32 4 0 0 0 0 0 7 0 0 0.13 L22123 2.33 1.01 0.86 0.13 0.38 0 86 2 0 0 0 0 0 10 0 0 0.14 L2306 2.33 0.90 0.98 0.43 0.56 -0.04 6 0 0 0 0 0 2 0 0 0.14 L3323 2.33 1.04 0.84 0.31 0.54 0.59 2 2 0 0 0 0 0 0 10 0.12 L3387 2.33 0.92 0.84 1.07 0.30 -0.11 0 2 0 0 7 0 0 5 0 0.10 L2118 2.17 0.89 0.87 0.18 0.47 0.74 3 2 0 0 0 0 3 0 0 0.10 L3259 2.17 0.83 0.87 0.45 0.80 0.87 4 4 0 0 0 0 6 0 0 0.12 L3280 2.17 0.93 0.89 0.36 0.43 0.46 0 2 0 0 0 0 8 0 0 0.14 L3336 2.17 1.02 1.03 0.17 0.33 -0.44 0 0 0 5 0 0 0 0 0 0.12 L3240 2.00 0.84 0.99 0.51 0.67 -0.08 
0 4 0 0 0 0 4 0 0 0.14
L3385 2.00 0.94 0.79 0.51 0.59 1.07 6 2 0 0 0 0 5 0 0 0.05
L3361 2.00 0.99 0.95 0.20 0.28 0.12 0 0 0 0 9 0 0 4 0 0.12
L3235 2.00 0.92 0.99 0.63 0.54 0.25 1 2 5 0 0 0 3 0 0 0.10
L3348 1.83 1.15 1.04 0.11 0.15 -0.07 0 0 0 0 0 7 0 0 10 0.12
L2316 1.83 0.85 0.85 0.50 0.75 0.67 0 4 0 0 0 0 6 0 0 0.10
L3267 1.67 1.02 1.01 0.24 0.30 -0.06 0 0 0 0 9 0 5 0 0 0.12
L3275 1.67 0.95 1.05 0.60 0.56 -0.47 0 2 0 0 0 0 3 0 0 0.14
L3346 1.67 1.01 0.99 0.21 0.22 0.19 0 0 0 0 0 6 3 0 7 0.13
L3281 1.50 1.12 0.96 0.54 0.25 0.77 0 0 0 0 0 0 0 4 0 0.09
L3315 1.50 0.93 0.93 0.61 0.47 0.48 1 2 0 0 6 0 6 0 0 0.09
L3362 1.50 0.97 0.95 0.29 0.30 0.05 2 0 0 0 0 0 5 0 0 0.11
Average 1.98 0.96 0.93 0.39 0.44 0.29 1.5 1.3 0.2 0.2 1.5 0.6 3.6 0.6 1.3 0.11
L2313 1.33 1.11 1.02 0.31 0.12 -0.03 0 0 0 0 0 8 0 0 7 0.18
L3308 1.33 0.88 1.01 0.95 0.72 0.58 4 2 0 0 0 0 5 0 0 0.12
L3333 1.33 0.96 1.07 1.74 0.49 0.03 0 2 0 6 0 0 4 0 0 0.10
L3375 1.33 1.13 1.03 0.27 0.14 -0.18 0 0 0 0 10 0 0 0 8 0.15
Average 1.33 1.02 1.03 0.82 0.37 0.10 1.0 1.0 0.0 1.5 2.5 2.0 2.3 0.0 3.8 0.14

random combination of zero to 9 amino acids. The average secretion level for linker stop mutants was lower than that of linker full-length mutants (figure 6B), implying that, in addition to the α1 and linker regions, the downstream sequence was also required for transport. When the hemolytic zone of these variants was plotted against the position of the stop codons, no obvious trend was observed. Observations regarding the linker stop mutants should be interpreted cautiously because of the small number of variants involved.

3.7 α2 Random Library

3.7.1 Analysis of α2 Full-length Mutants

A total of 113 α2 random variants were generated by replacing the 11 amino acid α2 coding region with 33 degenerate nucleotides. Upon sequencing of these clones, 32 were full-length mutants, 53 were stop mutants, and 28 were unintended mutants.
Blood agar plate assays were performed on all full-length mutants, 39 of the stop mutants, and 13 of the unintended mutants. If the second amphiphilic helix played a critical role in transport, then most α2 full-length mutants would be expected to be secreted at low levels similar to those observed for α1 random variants. The hemolytic zones for α2 full-length mutants ranged from 1 to 6, with a mean of 5.1 and a median of 5.5 (table 10). Twenty-four of the 32 full-length mutants secreted at 5 or higher, and only one had a hemolytic zone size lower than 2.8 (figure 6C). Interestingly, the protein product of this variant was not detectable in the cytoplasm (section 3.2), providing an explanation for the low hemolytic zone size of this outlier. The surprising distribution of α2 full-length variants provided strong evidence that the α2 region can tolerate almost any combination of amino acids without a major effect on transport.

When the helix hydrophobic moment was calculated for the mutated region of all α2 full-length mutants, no relationship was observed between the average value and secretion level (table 11). This supported the idea that an amphiphilic helix in this region is not a prerequisite for efficient transport. Other parameters were also analyzed; however, no obvious conclusions could be drawn.

Table 10: Genotype and Phenotype of Alpha2 Random Variants
Designation Nucleotide sequence Amino acid seq. Zone Size Std. Dev. Stop?
Where B2149 GGTTTTTGTCAGGGCTTCATTGATGCTTTGACT GFCQGFIDALT 6.00 0.00 0 0 B2U11 TATAATTTTCAAAGTTTTGGACCTACTATGTGC YNFQSFGPTMC 6.00 0.00 0 0 B21133 GTGATGGATGCACTTTTTAACCTAGAATGTGCT VMDALFNLECA 6.00 0.00 0 0 B23151 TTTGTAATATGTTGTTTGGTTACACAAGATGTT FVICCLVTQDV 6.00 0.00 0 0 B23176 GATGATTACTCATTATACGAACACGTGCTAGCT DDYSLYEHVLA 6.00 0.00 0 0 B3137 TCAAAGATTGGGTTGCATTCCATTTCGTTAGAG SKIGLHSISLE 6.00 0.00 0 0 B3150 GTAATTTTGATTTCACACCAGGGATTAAAATTG VILISHQGLKL 6.00 0.00 0 0 B31130 TCGAGCAGTGTAATAGTTAGTATTGTATTTCTT SSSVIVSIVFL 6.00 0.00 0 0 B3206 CCAGATGTGAGCTTTAGAACTGAAAAAGTCATG PDVSFRTEKVM 6.00 0.00 0 0 B23210 TCGCGCTATGATCCAGTTGGGCTCCTGTGCTTG SRYDPVGLLCL 5.83 0.41 0 0 B3178 AACAGTAATCCGTTTTTTTTAGAATGCAGTGAA NSNPFFLECSE 5.83 0.41 0 0 B3279 TGTTATGTTCTATGTGTTAAGGGAGGTATGCCA CYVLCVKGGMP 5.83 0.41 0 0 B21203 TATTTTGAATTTAAAAGAAACGTCAGTAGTACA YFEFKRNVSST 5.67 0.52 0 0 B21302 TCGAGATTTGATTATCAATACATGTGTTACATG SRFDYQYMCYM 5.67 0.52 0 0 B21U3 ATCAACTGGATGTTATTGTGCAACGTAGCTTTG INWMLLCNVAL 5.50 0.55 0 0 B21119 TGTTTCATTTTTCCTCTTAGAGTCGCGTGCACC CFIFPLRVACT 5.50 0.55 0 0 B21336 TTTACTTCTATAACGCTCCTGTTGGACCTAGCG FTSITLLLDLA 5.50 0.55 0 0 B3258 AAAGTTATCACTGCGAGCACGAAACTATTGCAG KVITASTKLLQ 5.33 0.52 0 0 B32268 AATAAAATTGTATGTGAATACTGTAAAATGCCA NKIVCEYCKMP 5.17 0.41 0 0 B32288 TTCTGTAATTTGTTTTCTGTCTATCTTTGCAAT FCNLFSVYLCN 5.17 0.75 0 0 B23265 TCTAGTTATCGAGTGCAGCAAGTTATGTTTGTA SSYRVQQVMFV 5.00 0.00 0 0 B3257 AGAGTTACAATTAGTAAGTCAGCCTTGTTTATG RVTISKSALFM 5.00 0.00 0 0 B3216 TCAATAAATGCGTTCTATTTTGGATTATGGCGA SINAFYFGLWR 5.00 0.00 0 0 B3240 GCA1 I'll I'GATTTGTGATATTTGGTTTTTGTTG AFLICDIWFIX 5.00 0.63 0 0 B2133 AGGCTATTCATTATAACTCCGCTAGCTTTCAAA RLFIITPLAFK 4.50 0.55 0 0 B23204 CATTGGTATTATTTTTGTTGCTCAGCGTTTGAC HWYYFCCSAFD 4.50 0.55 0 0 B3278 TTGTATCCCTTGTTTGAAACCACTGTATGTGAT LYPLFETTVCD 4.50 0.55 0 0 B2358 TGTATCAGACAGTTATTGTGGAAAACTTTATAT C1RQLLWKTLY 3.67 0.52 0 0 B23115 CCCATTTCATGCAGATTGTTTCAAGGAAAATGT PISCRLFQGKC 3.67 1.21 0 0 B21372 TATTTGATTCTGTCACGCGAAAAAATTTTTAAA YLILSREKIFK 3.00 
1.10 0 0 B32259 CTTTGCTTCCCGTGGTGTTATGTGTGTTGTTTA LCFPWCYVCCL 2.83 0.75 0 0 B21321 CATGTTGTAATGTTTGAAAATGCATCTTGGCAT HWMFENASWH 1.00 0.00 0 0 B1130 TGATGGATTGTTCCTTTGTGCTGTTTGTTGTAA *WIVPLCCLL» 1.17 0.41 2 1 B3132 TAGTGGATGCAGTTGATTAAAATACATGTTCTT •WMQLIKIHVL 1.00 0.00 1 1 B31120 TAAGAGGTATGGCACGTATTTATAATTAACGGC •EVWHVFIING 1.00 0.00 1 1 B31144 TAAGTTATTAAAATGCTAGTAATTGCCTGTTGG •V1KMLVTACW 1.00 0.00 1 1 B32180 TAATTGACTATGTATGTAAAGATTGTTATCTTA •LTMYVKIVIL 1.00 0.00 1 1 B32185 TGATTTCTTCACTTTACGTTAGTTACAAAGATG •FLHFTLVTKM 1.00 0.00 1 1 B32240 TGAAATAACTTGCACGATAAGATTGTTTTACTA •NNLHDKIVLL 1.00 0.00 1 1 B3122 TAATAAAATTTTGAACTTTGTCGTTTCCTTTTG ••NFELCRFLL 1.00 0.00 2 1 B3127 TGATGGATTGTTCCTTTGTGCTGTTTGTTGTAA •WIVPLCCLL* 1.00 0.00 2 1 B3155 TGATATGACACGATTGGTGGCAGGTTAAGTTGA •YDTIGGRLS* 1.00 0.00 2 1 B3U22 TGAGAGTCTTAGGAATTAGATTATAGACTTATA •ES'ELDYRLI 1.00 0.00 2 1 B32109 TAATGAGTGTTTTACAACTGGAAGCTATGTATT "VFYNWKLCI 1.00 0.00 2 1 B32205 TAATAACATGTTGATTCTTAGGATAAGTTTGAA **HVDS*DKFE 1.00 0.00 3 1 B21312 CTCTGATGAATGTGCAATATTGTAATCAGTAGA L«MCNIVISR 1.50 0.55 2 2 B3233 TCATAGATATTCTACTACGTCTTGATTATGCCG S»IFYYVLIMP 1.00 0.00 1 2 B21104 TCCTAATTGCTTGTGCTTTCTCTTTAACTAACT S*IXVLSL*LT 1.00 0.00 2 2 B32225 AAGTGAATTCTATTTTGTAAATAACGTTGTTGA K*ILFCK»RC* 1.00 0.00 3 2 B32250 AAGTAATAAATTCTAGGCAATTGAGCAAGTTTT K**ILGN*ASF 1.00 0.00 3 2 B3180 TTTGTCTAATTCATGGTAAGCATTGGTTACACA FV'FMVSIGYT 1.33 0.52 1 3 B23142 AATATGTTTTAGTAGCTGTAACAATAGCTTACT NMF**L*Q*LT 2.00 0.00 3 4 B21136 TCTAGTAAATAAGTTATTCCAGATTGCTTTAGA SSK'VIPDCFR 1.00 0.00 1 4 B21552 GTGTCAGAATAATGATGGCGGAATGGATCAGGA VSE"WRNGSG 1.00 0.00 2 4 B32178 AAAAAATTTTAGTGATAGTTTGACACGGATTTC KKF***FDTDF 1.00 o.oo 3 4 B32145 AAAATTTTGTATTAGACATGCATGTGTCTTCAT KILY»TCMCLH 1.67 0.52 1 5 B2119 TTCATGATTTGTACCCATTAAGATTAGCTATAT •FMICTH*D*LY 2.17 0.41 2 7 B21525 TATTTTAAAAGTTTCTTCTAAATCGTTAACCGA YFKSFF»IVNR 2.00 0.00 1 7 B23153 CTCCACTTTATTTTGAAATAAGACGGGGCTGTC LHFILK»DGAV 1.83 0.41 1 7 B23107 TATTCATTTTATAGCAGGTAGCTGTGGCTGCGT YSFYSR*LWLR 1.50 
0.55 1 7 B3242 TTTGTGATTAGTTTCTTTCAGTAAGACAGGAGT FVISFFQ'DRS 2.83 0.75 1 8 B3123 GCTTTTTTAGGATTCCAGAAATAATTATAAAGA AFLGFQK*L*R 2.00 0.00 2 8 B1105 AAGATCATA1 T l 1 Tl"1 "1GTTCTAATGTATTACT KIlFFLF'ClT 1.83 0.41 1 8 B32101 ACAAATrTTTTTTCCCTTTTTTATTAAAAGTTG TNFFSLFY'KL 3.33 0.82 1 9 B21130 ATTAAAGATAATTCTGCTGAGTACTAATTGATA IKDNSAEY*LI 1.00 0.00 1 9 B21229 AATATTCGTACCATTGATAACAATTAGACTTGT NIRTIDNN*TC 1.00 0.00 1 9 B21303 AAGAAATTTATAACATTGAGAAGTTAATGTCAC KKFITLRS'CH 1.00 0.00 1 9 B32253 TATATTCAATGCGAGATCGACGTTTTCTAAAGA YIQCEIDVF'R 2.33 0.52 1 10 B3268 AGTATTGATGGTTCATCAACTCTATTGTAAACC SIDGSSTLL*T 2.17 0.41 1 10 B3144 AGTATGTTAAAAAAGGCAAAGGTGTGGTAGTTT SMLKKAKVW*F 1.00 0.00 1 10 B2378 ATCTTATGGATCGTTGTAGGATATGGGATGTGA ILWIWGYGM* 4.00 0.63 1 11 38 Table 11: Calculated Biophysical Properties of Alpha2 Random Full-length Variants PepPlot (Chow and Fasman Predictions] PeptideStructure Prediction Heli i Average Average Average Hydropathy Hydrophilidty CF-Pred GOR-Pred hydrophobic Designation Zone Size Alph a Beta Turn t T b B h H T B H moment Wild-type 6.00 1.07 0.97 0.60 0.36 -0.53 3.0 0.0 0.0 0.0 0.0 7.0 0.0 5.0 0.0 0.15 B2149 6.00 0.98 1.09 0.31 0.31 -0.61 0 0 9 0 0 0 4 0 0 0.23 B21111 6.00 0.87 1.06 0.75 0.39 -0.77 2 0 0 6 0 0 10 0 0 0.18 B21133 6.00 1.16 1.03 0.26 0.20 -0.32 0 0 0 0 0 11 0 0 11 0.17 B23151 6.00 1.00 1.33 0.60 0.18 -0.74 0 0 0 11 0 0 0 11 0 0.14 B23176 6.00 1.05 1.01 0.47 0.37 -0.33 1 2 0 6 0 0 3 0 8 0.16 B3137 6.00 1.01 0.98 0.41 0.16 -0.18 0 0 0 0 0 0 2 0 0 0.09 B31S0 6.00 1.04 1.18 0.71 0.29 -0.61 0 0 0 11 0 0 0 5 0 0.12 B3U30 6.00 0.98 1.27 0.37 0.17 -1.13 2 0 0 8 0 0 0 9 0 0.11 B3206 6.00 1 05 0.99 0.43 0.26 0.51 0 0 0 0 11 0 0 0 9 0.14 B23210 5.83 0.91 1.07 0.50 0.43 -0.45 2 0 0 6 0 0 5 5 0 0.15 B3178 5.83 0.97 0.89 0.31 0.57 -0.10 0 4 0 0 6 0 0 0 6 0.11 B3279 5.83 0.89 1.13 1.42 0.36 -0.61 0 2 0 6 0 0 2 6 0 0.10 B21203 5.67 0.97 1.05 0.85 0.34 0.15 2 0 0 0 6 0 3 4 0 0.11 B21302 5.67 0.97 1.13 0.53 0.39 -0.69 0 0 0 6 0 0 7 4 0 0.12 B21113 5.50 
1.07 1.22 0.29 0.12 -1.30 0 0 0 11 0 0 3 5 0 0.10
B21119 5.50 0.98 1.20 0.53 0.14 -0.87 0 0 0 10 0 0 6 5 0 0.10
B21336 5.50 1.08 1.15 0.28 0.12 -0.75 0 0 0 5 0 0 0 9 0 0.19
Average 5.84 1.00 1.11 0.53 0.28 -0.52 0.5 0.5 0.5 5.1 1.4 0.6 2.6 3.7 2.0 0.14
B3258 5.33 1.08 1.13 0.56 0.22 -0.26 0 0 0 10 0 0 0 0 11 0.21
B32268 5.17 0.98 1.04 0.43 0.25 0.00 0 0 0 6 0 0 5 4 0 0.15
B32288 5.17 0.90 1.22 0.52 0.21 -1.25 0 0 0 7 0 0 11 0 0 0.13
B23265 5.00 1.02 1.24 0.26 0.26 -0.65 0 0 0 9 0 0 2 7 0 0.14
B3257 5.00 1.08 1.11 0.50 0.22 -0.45 0 0 0 0 11 0 0 4 0 0.11
B3216 5.00 0.98 1.15 1.07 0.27 -1.12 0 0 0 10 0 0 4 6 0 0.11
B3240 5.00 1.11 1.25 0.44 0.11 -1.51 0 0 0 11 0 0 0 0 10 0.13
B2133 4.50 1.07 1.16 0.69 0.11 -0.80 0 0 0 6 0 0 2 6 0 0.13
B23204 4.50 0.94 1.13 0.61 0.32 -1.14 0 0 0 5 0 0 11 0 0 0.10
B3278 4.50 0.98 1.11 0.25 0.19 -0.51 0 0 9 0 0 0 3 4 0 0.19
Average 4.92 1.01 1.16 0.53 0.21 -0.77 0.0 0.0 0.9 6.4 1.1 0.0 3.8 3.1 2.1 0.14
B2358 3.67 1.02 1.23 0.44 0.15 -0.68 0 0 0 11 0 0 6 0 0 0.23
B23115 3.67 0.91 1.04 0.37 0.35 -0.08 2 0 0 6 0 0 11 0 0 0.13
Average 3.67 0.97 1.14 0.40 0.25 -0.38 1.0 0.0 0.0 8.5 0.0 0.0 8.5 0.0 0.0 0.18
B21372 3.00 1.09 1.11 0.68 0.18 0.09 0 0 0 0 10 0 0 0 11 0.13
B32259 2.83 0.89 1.26 0.46 0.33 -1.54 0 0 0 11 0 0 10 0 0 0.12
Average 2.92 0.99 1.18 0.57 0.26 -0.73 0.0 0.0 0.0 5.5 5.0 0.0 5.0 0.0 5.5 0.13
B21321 1.00 1.10 1.07 0.72 0.36 -0.74 0.0 2.0 0.0 5.0 0.0 0.0 0.0 0.0 10.0 0.09

3.7.2 Analysis of α2 Stop Mutants

The zone size for α2 stop mutants ranged from 1 to 3, with a mean of 1.45 and a median of 1 (table 10). In these mutants, the last 27 amino acids were replaced by a random combination of zero to 10 amino acids. The low level of secretion of α2 stop mutants (figure 6C) suggested that the residues downstream of the linker must contain some important element(s) for efficient transport.
The α2 region itself could be excluded, since it tolerated almost any amino acid; the required element(s) must therefore lie in the region after the second helix. Indeed, the average secretion level of α2 stop mutants was not only much lower than that of α2 full-length mutants, but also lower than those of the linker and even the α1 stop mutants. This observation was unexplained at this point but might involve the nature of the new C-terminus in the α2 stop variants.

When the hemolytic zone of these stop mutants was plotted against the position of the stop codon, the chance of secretion appeared to increase with length (figure 7A). This implied that length might be important for transport. It should be pointed out that 13 out of 53 stop mutants had a stop codon at position 1 of the α2 region, and thus share the same genotype. Phenotype determination by blood agar plate assay was consistent: they all had an assignment of 1, with only a single exception (i.e. 1.2). Interestingly, when mutants of the same length were analyzed, those that contained positively charged residues between positions -2 and -8 were consistently transported at lower levels than their counterparts. Taken together, the results suggest that the length and hydrophobicity of the last few amino acids could be important. This observation was intriguing and encouraged further investigation of the C-terminal tail using two contiguous random libraries (sections 3.8, 3.9).

[Figure 7, two panels: (A) Alpha2 Stop Mutants: Position vs. Secretion; (B) Cterm1 Stop Mutants: Position vs. Secretion. Horizontal axis: position of stop codon; vertical axis: hemolytic zone size.]

Figure 7: Effect of Changing the Length and Amino Acid Composition of the C-terminus on Secretion as Demonstrated by α2 and Cterm1 Stop Mutants. (A) The position of the stop codon of 39 α2 stop mutants is plotted against hemolytic zone size. Note that the random region becomes the extreme C-terminus in stop mutants. For instance, α2 variants that have a stop codon at position 11 contain a ten amino acid random tail. (B) The position of the stop codon of 35 Cterm1 stop mutants is plotted against hemolytic zone size. After position 3, the chances of obtaining optimal secretion seemed to be saturated. Note the bimodal distribution of Cterm1 stop mutants, which is clearly illustrated here as most variants fall into either the top portion or the bottom portion.

3.8 Cterm1 Random Library

3.8.1 Analysis of Cterm1 Full-length Mutants

A total of 95 Cterm1 random variants were generated by replacing the 8 amino acid Cterm1 coding region with 24 degenerate nucleotides. Upon sequencing of these clones, 48 were full-length mutants, 35 were stop mutants, and 11 were unintended mutants. Blood agar plate assays were performed on all of these variants.

The hemolytic zones for Cterm1 full-length mutants ranged from 2.8 to 6, with a mean of 5.6 and a median of 6 (table 12). Forty-four of the 48 full-length mutants were secreted at or close to wild-type level. This clearly showed that the Cterm1 region could tolerate almost any combination of amino acids (figure 6D). When the predicted secondary structure, hydropathy, and hydrophilicity were determined for the mutated region of each variant, no trend was observed between any of the biophysical properties and the secretion level (table 13). This was not surprising, since almost any combination of amino acids in this region allowed wild-type transport. Since removal of the last 27 amino acids resulted in a dramatic reduction in transport, and neither the α2 nor the Cterm1 region (-27 to -9) is particularly critical, by elimination the extreme C-terminus is likely to be a region of great importance.
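The genotype columns of table 12 (amino acid sequence, number of stop codons, position of the first stop) follow mechanically from each 24-nucleotide insert. The sketch below is only an illustration of that bookkeeping, assuming the standard genetic code; the function names `translate` and `classify` are hypothetical and are not taken from the thesis.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate a DNA string codon by codon; '*' marks a stop codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore any incomplete trailing codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def classify(dna):
    """Return (peptide, number of stop codons, 1-based position of first stop).

    The position is 0 for a full-length variant, matching the
    'Stop?' and 'Where' columns of table 12.
    """
    peptide = translate(dna)
    n_stops = peptide.count("*")
    first_stop = peptide.index("*") + 1 if n_stops else 0
    return peptide, n_stops, first_stop


if __name__ == "__main__":
    for name, seq in [("C1116", "TCACTTAAGCTTAGTCTCACTACT"),
                      ("C12178", "TAGGTGTGCGAAACAAGTATAACG"),
                      ("C12115", "GGTTAACTGGAGTGTCGCTAATTG")]:
        print(name, *classify(seq))
```

Applied to the three table 12 entries in the example, the function separates the full-length variant C1116 from the stop mutants C12178 and C12115.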
3.8.2 Analysis of Cterm1 Stop Mutants

The zone size for Cterm1 stop mutants ranged from 2 to 5.8, with a mean of 3.7 and a median of 3.2 (table 12). In these mutants, the last 16 amino acids were replaced by a random combination of zero to 7 amino acids. Unlike stop mutants from other libraries, a sizable proportion of Cterm1 stop mutants were not affected by truncation, leading to a bimodal distribution (figure 6D): about half were secreted at or close to wild-type level (top curve of figure 7B), and the other half were secreted at below 50% efficiency (bottom curve of figure 7B). Stop mutants that were secreted at high levels appeared to have relatively hydrophobic tails (last 8 amino acids), and vice versa. Taken together, results from this library supported the idea that hydrophobicity in the extreme C-terminus is important.

When the hemolytic zone of these stop mutants was plotted against the position of the stop codon, the chance of secretion appeared to increase with length only up to the third residue (figure 7B). After this position, the transport level seemed to be saturated. Compared to most α2 stop mutants, Cterm1 stop mutants were transported at higher levels. Since the two populations differed mainly in the length of the tail, and a positive trend was observed as length increased, it was tempting to suggest that, in addition to the hydrophobicity of the extreme C-terminus, the number of residues after the linker is also important.

Table 12: Genotype and Phenotype of Cterm1 Random Variants
Designation Nucleotide sequence Amino acid seq. Zone Size Std. Dev. Stop? Where
C1101 TTTGTCGGATCCCGGTCACGATTT FVGSRSRF 6.00 0.00 0 0
C1116 TCACTTAAGCTTAGTCTCACTACT SLKLSLTT 6.00 0.00 0 0
C1121 TCTAGAATGGAATTGTCAACCTGC SRMELSTC 6.00 0.00 0 0
C11104 t l 111111GGGCAGTCCCCGCGAG LFLGSPRE 6.00 0.00 0 0
C1206 CGTAAAATTACTACAGGTTGTCAG RKITTGCQ 6.00 0.00 0 0
C1234 ATAACCACAGGGCGCCATGCCTTT ITTGRHAF 6.00 0.00 0 0
C1237 TTGAGGTGTGAATCGGGATCCTCA LRCESGSS 6.00 0.00 0 0
C1244 TTTGTTGTCATTGGAGCCCGGCAC FVVIGARH 6.00 0.00 0 0
C1265 GATTTTGACGAATGTGCTTTTAGT DFDECAFS 6.00 0.00 0 0
C12102 AGTATGGTGTATAATTCAGCGGCT SMVYNSAA 6.00 0.00 0 0
C12110 AAATTTAATGATTGCATTTTCTCC KFNDCIFS 6.00 0.00 0 0
C12123 CAGATTAACTCGGGTGTTGTATTG QINSGVVL 6.00 0.00 0 0
C12133 TCCAATATGGTGGCACGTTTTAAA SNMVARFK 6.00 0.00 0 0
C12162 GATTTTCAAAATGCGTTTTGTTTT DFQNAFCF 6.00 0.00 0 0
C12168 GTTTACTCTCATAGGTCAACGAAC VYSHRSTN 6.00 0.00 0 0
C12227 TCCCCATTCTACGCGAGTTTTTTA SPFYASFL 6.00 0.00 0 0
C12242 ATGTCGGGGAGTCCTTGCACAGTG MSGSPCTV 6.00 0.00 0 0
C12247 GCAATGCCGAATAGCTATCAGTTC AMPNSYQF 6.00 0.00 0 0
C12318 GAGACTATTACTATATCGCACATA ETITISHI 6.00 0.00 0 0
C12345 TCGAATCCGGATTATACCTTAAAC SNPDYTLN 6.00 0.00 0 0
C12376 AAGTTCTTGATCCAGTCCACACCG KFLIQSTP 6.00 0.00 0 0
C11266 ATAGGAAAATGTTGTAGCCGTTTT IGKCCSRF 6.00 0.00 0 0
C12122 AGGATATACTTAGTTAGCTTACCG RIYLVSLP 6.00 0.00 0 0
C12167 TTGTTAGTACTCAGCCCAACCGGC LLVLSPTG 6.00 0.00 0 0
C12226 ATAACTGGGGTGAATAAAGATATG ITGVNKDM 6.00 0.00 0 0
C1104 TGTCTGACCACATGTAATAGTGAT CLTTCNSD 5.83 0.41 0 0
C1280 TGCGTATATGATAGTACGGACGTG CVYDSTDV 5.83 0.41 0 0
C12108 AAAAACATCTCGAGAACTCTCGGG KNISRTLG 5.83 0.41 0 0
C12135 GCCGACGAGAACCGATGTAAATCT ADENRCKS 5.83 0.41 0 0
C12237 AATAATCAGTCAGGGAGTAAATTG NNQSGSKL 5.83 0.41 0 0
C12312 GTATTTCCGTCTCATCGATCGTCC VFPSHRSS 5.83 0.41 0 0
C11205 AGTAGAAAGTGTGGATCTCTTGGG SRKCGSLG 5.67 0.52 0 0
C1270 TCGAAGTTTTATACAACGAAAATG SKFYTTKM 5.67 0.52 0 0
C12182 AAAAGTACATTTACGTTTAGGAGC KSTFTFRS 5.67 0.52 0 0
C12221 CGTATTTTGATAGCCTCACTGAGG RILIASLR 5.67 0.52 0 0
C12282 GTTTGCAACCTTCGCTGTTGCGGC VCNLRCCG 5.67 0.52 0 0
C11166 AACGATCTCATCATCTCCCTGAGT NDLIISLS 5.67 0.82 0 0
C12172 AGTAAGTGCTGTTGTGCCAGCGCG SKCCCASA 5.67 0.82 0 0
C1282 TGGCATTTCATGGCCACTGTCTAC WHFMATVY 5.50 0.55 0 0
C12181 TCCCGTCAGAAGAATAGCATAACT SRQKNSIT 5.50 0.55 0 0
C12250 GTCCCTGGAGTATGTTATAAACGC VPGVCYKR 5.50 0.55 0 0
C12362 GCAGTGTGCCGCAGGAGTAATCAG AVCRRSNQ 5.50 0.84 0 0
C1120 GCCAGTCGATGCGTTAACAAGCAT ASRCVNKH 5.17 0.98 0 0
C12209 AATGTTTTGTTTTGTTTCTTTGTC NVLFCFFV 5.00 0.89 0 0
C1272 TTCCGAATGGTGCGATGTAAGACC FRMVRCKT 3.83 0.75 0 0
C12363 ACAAATAAGGAAAGATACAGATTG TNKERYRL 3.67 1.03 0 0
C11261 TTAAAAACCTGGCGATGGTTATAC LKTWRWLY 3.00 0.63 0 0
C12313 ATCCGTAACTTCCGTCGAGTTACA IRNFRRVT 2.83 0.75 0 0
C12178 TAGGTGTGCGAAACAAGTATAACG *VCETSIT 3.00 0.00 1 1
C12381 TGATTTAGAGCAAGGGGCAATATA *FRARGNI 2.67 0.52 1 1
C12240 TGAGTTGTACAGCGGACTCTAGCC *VVQRTLA 2.50 0.55 1 1
C12351 TGATTTTTAAGAGCAGTCTTTAAG *FLRAVFK 2.50 0.55 1 1
C12136 TAGTATTTGTTGTGCTGCCTCAAT *YLLCCLN 2.33 0.52 1 1
C12121 ATATAAATGAATCTTCTTTCTGAC I*MNLLSD 4.50 1.05 1 2
C12115 GGTTAACTGGAGTGTCGCTAATTG G*LECR*L 2.83 0.41 2 2
C12331 ACATAAATCTAAGGACATGTGAAG T*I*GHVK 2.83 0.41 2 2
C1285 GGCTAAAAGAAATGTGGGTACAGG G*KKCGYR 2.17 0.41 1 2
C12117 ATATTGTAGTGTTGACCCATTGTT IL*C*PIV 5.83 0.41 2 3
C1267 TATTCTTAAGCAGGATAAACGTGC YS*AG*TC 5.17 0.41 2 3
C12348 TCTCAGTAGAGCTCAGCATTCTGT SQ*SSAFC 2.83 0.41 1 3
C1126 CGTACGTAACCATTGTGAATAAAC RT*PL*IN 2.33 0.52 2 3
C11274 AGGCAGTAGGATTCGCTAAGTTTT RQ*DSLSF 2.33 0.52 1 3
C12263 TGCTTAGCATAAACGATTATTGAG CLA*TIIE 5.83 0.41 1 4
C11214 GTCTCAGGCTAAAGTTCTGATCCG VSG*SSDP 3.50 0.55 1 4
C12184 TCGACGAGCTAACGACAATGTGTC STS*RQCV 3.17 0.41 1 4
C1247 GAGTCTAGGTGAAGCGTAAACTGG ESR*SVNW 2.83 0.41 1 4
C12352 AATCATAACTAGTCACGACTAAGC NHN*SRLS 2.67 0.52 1 4
C1130 GCTAAAGATTGAGTATATAAGCCA AKD*VYKP 2.00 0.00 1 4
C11103 TTTAAAAAGTGAGACTGAAAGGGG FKK*D*KG 2.00 0.63 2 4
C11129 TTTTTCATCTGTTAGACTAAACCA FFIC*TKP 5.00 0.89 1 5
C11161 AGATCCGTCCAGTGAATAGGATTA RSVQ*IGL 2.67 0.52 1 5
C1215 TTCCCTAAGTTCCATTAGTAAAAA FPKFH**K 5.83 0.41 2 6
C12254 GTGATATTTTTAAAATAAAAACAG VIFLK*KQ 5.17 0.75 1 6
C12320 GCAATTAGCCTATGGTGAAACCGC AISLW*NR 5.17 0.75 1 6
C1219 ATGCAAACGGCACGATAATCACTC MQTAR*SL 3.17 0.41 1 6
C12377 TCGAAAAAGGTCTTATGAAATTGA SKKVL*N* 2.33 0.52 2 6
C12126 TTCGGGCGTGCCTGTCCGTAACGA FGRACP*R 5.83 0.41 1 7
C12388 TGGAGTTACTATTTTTATTGATTG WSYYFY*L 5.50 0.55 1 7
C1132 GCACGTGTTCTCTTATCGTAATTC ARVLLS*F 5.33 0.52 1 7
C11159 CGGTGCGGGACCCCGTTGTGACTG RCGTPL*L 5.00 0.63 1 7
C12332 CTGAAACTTACGATCGCCATTTAA LKLTIAI* 5.50 0.55 1 8
C12375 GCCAAGATGGTCTCTACCTTGTAG AKMVSTL* 5.17 0.98 1 8
C1231 TCGGACACCATCAGACCCAAGTAA SDTIRPK* 3.50 0.55 1 8

Table 13: Calculated Biophysical Properties of Cterm1 Random Full-length Variants
Sources: PepPlot (Chou and Fasman Predictions); PeptideStructure Prediction
Columns: Designation, Zone Size, Average Alpha, Average Beta, Average Turn, Hydropathy, Hydrophilicity, CF-Pred (t T b B h H), GOR-Pred (T B H)
Wild-type 6.00 0.82 0.93 1.12 0.65 0.18 5.0 0.0 0.0 0.0 0.0 0.0 8.0 0.0 0.0
C1101 6.00 0.92 1.07 0.84 0.58 0.30 3 0 0 0 0 0 2 0 0
C1116 6.00 1.00 1.07 0.28 0.12 -0.40 0 0 0 0 7 0 0 7 0
C1121 6.00 1.03 0.94 0.20 0.18 0.21 0 0 0 0 0 0 7 0 0
C11104 6.00 0.99 0.92 1.89 0.45 0.22 0 2 0 0 0 0 2 0 0
C1206 6.00 0.91 1.09 0.42 0.47 0.08 0 0 0 5 0 0 8 0 0
C1234 6.00 0.98 1.09 0.22 0.38 -0.24 0 2 0 0 0 0 0 0 0
C1237 6.00 0.91 0.85 0.82 0.68 0.59 1 3 0 0 0 0 6 0 0
C1244 6.00 1.04 1.22 1.27 0.11 -0.50 0 0 0 5 0 0 0 5 0
C1265 6.00 1.09 0.87 0.09 0.35 0.24 0 0 0 0 8 0 0 0 8
C12102 6.00 1.03 1.03 1.24 0.31 -0.67 0 0 0 0 0 0 0 4 0
C12110 6.00 0.96 1.06 0.14 0.38 -0.30 3 0 0 0 0 0 7 0 0
C12123 6.00 0.94 1.22 0.31 0.46 -0.76 0 0 0 8 0 0 0 4 0
C12133 6.00 1.08 1.03 0.20 0.13 0.09 0 0 0 0 0 6 0 0 8
C12162 6.00 1.04 1.09 0.07 0.27 -0.87 0 0 0 7 0 0 8 0 0
C12168 6.00 0.85 1.07 0.99 0.48 0.07 4 0 0 0 0 0 5 0 0
C12227 6.00 0.96 1.05 0.24 0.53 -1.19 0 0 0 6 0 0 0 5 0
C12242 6.00 0.84 0.99 0.16 0.87 -0.42 1 4 0 0 0 0 6 0 0
C12247 6.00 0.98 1.00 0.14 1.04 -0.73 0 4 0 0 0 0 2 0 0
C12318 6.00 1.02 1.15 0.27 0.09 -0.57 0 0 0 5 0 0 0 4 0
C12345 6.00 0.80 0.95 0.17 1.61 -0.15 1 2 0 0 0 0 3 0 0
C12376 6.00 0.98 1.08 0.94 0.13 -0.46 0 0 0 0 7 0 0 5 0
C11266 6.00 0.89 1.07 0.18 0.58 0.10 2 2 0 0 0 0 8 0 0
C12122 6.00 0.95 1.20 1.20 0.10 -0.90 0 0 0 5 0 0 0 6 0
C12167 6.00 0.93 1.11 2.33 0.51 -0.76 0 2 0 5 0 0 0 4 0
C12226 6.00 0.98 1.06 0.39 0.46 0.38 0 2 0 5 0 0 0 0 0
C1104 5.83 0.84 1.03 0.35 0.49 -0.07 2 0 0 5 0 0 8 0 0
C1280 5.83 0.89 1.14 2.11 0.58 0.15 5 0 0 0 0 0 7 0 0
C12108 5.83 0.91 1.02 0.53 0.26 0.20 0 0 0 5 0 0 2 4 0
C12135 5.83 1.03 0.78 0.11 0.58 1.37 2 2 0 0 0 0 8 0 0
C12237 5.83 0.87 0.90 0.64 0.94 0.36 1 3 0 0 0 0 2 0 0
C12312 5.83 0.88 0.96 0.37 0.62 0.17 3 2 0 0 0 0 3 0 0
C11205 5.67 0.84 0.90 0.34 0.64 0.33 6 0 0 0 0 0 6 0 0
C1270 5.67 1.00 1.06 0.67 0.31 -0.17 0 0 0 5 0 0 7 0 0
C12182 5.67 0.95 1.04 0.15 0.23 -0.15 0 0 0 5 0 0 0 4 0
C12221 5.67 1.09 1.16 0.45 0.09 -0.34 0 0 0 5 0 0 0 6 0
C12282 5.67 0.82 1.14 0.24 0.37 -0.32 0 0 6 0 0 0 8 0 0
C11166 5.67 0.98 1.09 0.53 0.10 -0.59 0 0 0 5 0 0 0 7 0
C12172 5.67 0.96 0.93 0.22 0.52 -0.19 0 2 0 0 0 0 8 0 0
C1282 5.50 1.08 1.23 0.32 0.09 -1.41 0 0 0 8 0 0 0 0 0
C12181 5.50 0.92 0.99 0.20 0.40 0.55 0 2 0 0 0 0 4 0 0
C12250 5.50 0.85 1.13 0.21 0.48 -0.07 0 0 0 5 0 0 0 6 0
C12362 5.50 0.96 1.04 0.85 0.61 0.66 0 4 0 0 0 0 8 0 0
Average 5.87 0.95 1.04 0.55 0.44 -0.15 0.8 0.9 0.1 2.2 0.5 0.1 3.2 1.7 0.4
C1120 5.17 0.97 0.99 0.28 0.43 0.38 2 0 0 0 0 0 8 0 0
C12209 5.00 1.01 1.37 0.11 0.11 -1.78 0 0 0 8 0 0 0 8 0
Average 5.08 0.99 1.18 0.19 0.27 -0.70 1.0 0.0 0.0 4.0 0.0 0.0 4.0 4.0 0.0
C1272 3.83 1.04 1.14 0.24 0.19 0.37 0 0 0 6 0 0 4 4 0
C12363 3.67 1.00 0.98 0.17 0.36 1.04 0 0 0 0 0 0 5 0 0
Average 3.75 1.02 1.06 0.21 0.27 0.71 0.0 0.0 0.0 3.0 0.0 0.0 4.5 2.0 0.0
C11261 3.00 1.03 1.21 0.41 0.21 -1.02 0 0 0 5 0 0 5 0 0
C12313 2.83 0.96 1.19 0.37 0.24 0.48 0 0 0 8 0 0 6 0 0
Average 2.92 1.00 1.20 0.39 0.23 -0.27 0.0 0.0 0.0 6.5 0.0 0.0 5.5 0.0 0.0

3.9 Cterm2 Random Library

3.9.1 Analysis of Cterm2 Full-length Mutants

A total of 96 Cterm2 random variants were generated by replacing the 8 amino acid Cterm2 coding region with 24 degenerate nucleotides. Upon sequencing of these clones, 65 were full-length mutants, 10 were stop mutants, and 21 were unintended mutants. Blood agar plate assays were performed on all of these and the phenotype was correlated to the genotype.

Unlike in other regions of the signal sequence, random mutagenesis of the Cterm2 region directly modified the extreme C-terminus. Based on results from the α2 and Cterm1 stop mutants, this tail region was believed to be important. Furthermore, Cterm2 variants were predicted to display a bimodal distribution similar to that of Cterm1 stop mutants, which was indeed observed (figure 6E). The hemolytic zones of Cterm2 full-length mutants ranged from 1 to 6, with a mean of 4.2 and a median of 4.5 (table 14). Thirty-two of the 65 full-length mutants satisfied the required element(s) at the C-terminus and were secreted at or close to wild-type level. While none of the Cterm2 high secretors contained more than one positively charged residue (R or K), almost all mutants secreting at 3 or lower had at least two positively charged residues, and the lowest secretor, which contained 4 arginine residues, secreted at 1. A number of biophysical properties were calculated, and the average hydrophilicity of the last 8 residues decreased with increasing secretion (table 15). This provided strong evidence that the hydrophobicity of the extreme C-terminus is critical for efficient transport.

3.9.2 Analysis of Cterm2 Stop Mutants

The zone size for Cterm2 stop mutants ranged from 2 to 6, with a mean of 5.2 and a median of 5.7 (table 14, figure 6E). Since the number of mutants in this category was small (ten only), no extensive analysis was done.

Table 14: Genotype and Phenotype of Cterm2 Random Variants
Designation Nucleotide sequence Amino acid seq.
Zone Size Std. Dev. Stop? Where
D1206 GCGGAACCGCCGTGGGCCGGGCCA AEPPWAGP 6.00 0.00 0 0
D12144 TGGGTCAGCGATTCCGGCGGAGCG WVSDSGGA 6.00 0.00 0 0
D12151 GGCGTGGGGGGAGGCCGGTGGGCA GVGGGRWA 6.00 0.00 0 0
D12152 CTGGGGAGCGCGGGGTGGTCGGGG LGSAGWSG 6.00 0.00 0 0
D3109 GGGAGAGGCTACGCGGCGGGTGCG GRGYAAGA 6.00 0.00 0 0
D3113 CTGTCAGGAGTCGAGCCGGGGGGG LSGVEPGG 6.00 0.00 0 0
D3117 GGTGGGCCCCTGCGGGCCGACAGT GGPLRADS 6.00 0.00 0 0
D3132 CTAACAATGTGGGGCGCGGGCAGG LTMWGAGR 6.00 0.00 0 0
D3138 CCGTTGGTCGGAGATGGGGGGACC PLVGDGGT 6.00 0.00 0 0
D3150 TCACGGGAAGTAGCATTATGGGCG SREVALWA 6.00 0.00 0 0
D3218 TGTGCGTACGGCCGCGCGAATGAT CAYGRAND 6.00 0.00 0 0
D3242 ACGGCGGCGCTCGCTACCGGCGGC TAALATGG 6.00 0.00 0 0
D3243 CCGGCGCTTCGGCTGGCTGCCCCG PALRLAAP 6.00 0.00 0 0
D3288 GAAGGCGGCGAGTGGGCGGTGGCA EGGEWAVA 6.00 0.00 0 0
D3130 ACGGTGGCCCGGCTGAGCGGGGCC TVARLSGA 6.00 0.00 0 0
D3159 CCATGGTACACCGGGGCCATGGGA PWYTGAMG 6.00 0.00 0 0
D3108 TCACTGGGGGCGGTGGCGGCGTCG SLGAVAAS 5.83 0.41 0 0
D3120 CCAACGGTGGGGGGGCACGGAGCG PTVGGHGA 5.83 0.41 0 0
D3165 CTGGCCGGGCGCGGGGGCGAGTGC LAGRGGEC 5.83 0.41 0 0
D3257 GTGGGGACGGAGGTGGCGGGCGGG VGTEVAGG 5.83 0.41 0 0
D1166 ATAGGCTCTCGGGGCCAGGCCTGG IGSRGQAW 5.67 0.52 0 0
D3217 TTTGCTCACCGGCCGTGGTGGCCC FAHRPWWP 5.67 0.52 0 0
D3228 GGCGGGGCTGGCGAGGCGCCGGCC GGAGEAPA 5.67 0.52 0 0
D3269 TGTGGACCCGTGGGGCAGGTCCAC CGPVGQVH 5.67 0.52 0 0
D3278 CTTGAGTGCGCTGGGCGTACTGCT LECAGRTA 5.67 0.52 0 0
D3170 GATCCGGCCCGGAGCGCGGTGACA DPARSAVT 5.67 0.52 0 0
D1363 TCCATCAAATGGCCGTCGGGTCAA SIKWPSGQ 5.50 0.55 0 0
D12101 TTGCGCGGGGCGCCGCCCACTGTC LRGAPPTV 5.33 0.52 0 0
D3104 GGACCGCGGGGTTGGGCAGGGGTG GPRGWAGV 5.33 1.03 0 0
D3268 CCGCCCGCCTCGGGTACGACACCC PPASGTTP 5.00 0.63 0 0
D3137 GCCCGTGGCGCGTGTGCGGGACCG ARGACAGP 5.00 0.89 0 0
D3238 TACATCCTAGGGTGGGGGCGTCGC YILGWGRR 5.00 0.89 0 0
D3102 CGATCTGTAATGTGCAAAGCTATC RSVMCKAI 4.50 0.55 0 0
D3149 CGCGGTTCGCACCCCGCGGGGGTT RGSHPAGV 4.17 0.41 0 0
D1231 TCGGACGCCCGGCGCCCGGGTCTG SDARRPGL 4.00 0.63 0 0
D3105 AAGTCAGGGGCTTGCCCCGCGCAG KSGACPAQ 3.83 0.41 0 0
D3140 CGGGCGCGCGGCCTGGGGGGCTTG RARGLGGL 3.83 0.41 0 0
D3245 GGGGACGGCGGTGGTGAGGGGGGC GDGGGEGG 3.83 0.41 0 0
D3279 GGCGCGGGGCAGCGGGCAATACAG GAGQRAIQ 3.83 0.41 0 0
D3157 ATGACCGCGCGCTTGGTAGGGAGA MTARLVGR 3.50 0.84 0 0
D3287 CGGGTGAAGCCGTGGGGGGGAGCC RVKPWGGA 3.17 0.41 0 0
D3252 TGGCTTGGTAGGGGTCGACGGACG WLGRGRRT 2.83 0.41 0 0
D3203 CGCGAACTGGGCTGGAGGCGGTGG RELGWRRW 2.83 0.75 0 0
D1306 CTCCGGGGGCGAAGGCGGCTCGCC LRGRRRLA 2.67 0.52 0 0
D1368 TTGCTGAGGGTGCGGCTGCTGGGA LLRVRLLG 2.67 0.52 0 0
D1230 GAGAGGGGGCGGGGCGCGGGGCCC ERGRGAGP 2.50 0.55 0 0
D3201 CCAGCTGCGGGGGCCAGGATGCGG PAAGARMR 2.50 0.55 0 0
D3219 CGCGAACTGGGCTGGAGGCGGTGG RELGWRRW 2.50 0.55 0 0
D3241 GGGAGCGCCAAGGCGGCGGGGAGC GSAKAAGS 2.50 0.55 0 0
D3275 TGCGGCAGACCGCTGCGGGGGGGG CGRPLRGG 2.50 0.55 0 0
D3204 CGCGAACTGGGCTGGAGGCGGTGG RELGWRRW 2.33 0.52 0 0
D1347 CAACGTGGGGGCGGTAGATACGAC QRGGGRYD 2.17 0.41 0 0
D3265 GGCCTCCGGGGGCGGACCGGGGCC GLRGRTGA 2.17 0.41 0 0
D3101 CGAACGGGGGCGAGAGGGGCGGGG RTGARGAG 2.00 0.00 0 0
D3141 GGATCAAGGGCGCGTGCGTCGGGG GSRARASG 2.00 0.00 0 0
D3144 AGAGGAGATAGCGCCACACGGGCA RGDSATRA 2.00 0.00 0 0
D3239 GTTAAAGTAACCGCCGTGCGGAGA VKVTAVRR 2.00 0.00 0 0
D3263 AAGGCTCGCGGGGGGGTGGCAGGA KARGGVAG 2.00 0.00 0 0
D3124 CGCAGGGATGATGGTGGGGAGACA RRDDGGET 2.00 0.00 0 0
D3122 GGGGGAGAGGTGGGTCGGCGACGG GGEVGRRR 1.83 0.41 0 0
D3282 AGGCAGGGAAGGCGCGGCGGGGCC RQGRRGGA 1.83 0.41 0 0
D3261 ACCGCCGGAGCCCGGCGGTCGCAC TAGARRSH 1.67 0.52 0 0
D3220 TCCGGGGCCGGCCGAAAGCGGGGG SGAGRKRG 1.17 0.41 0 0
D3286 GCCAAAGCGAGGCGACGGGGACCT AKARRRGP 1.17 0.41 0 0
D3224 AGCAGGGCTCGGCGGGAGCGCCAT SRARRERH 1.00 0.00 0 0
D3209 TGACGGCGGTACTGTCCTGCGTTA *RRYCPAL 5.33 0.82 1 1
D3214 TAACAGGTGCGCGTGGGAGGGAGC *QVRVGGS 5.33 1.03 1 1
D3235 CCATAGAGGTAAGGGCTTGCGTGC P*R*GLAC 6.00 0.00 2 2
D13116 GATTAGTGGAGCACACAACGAGCG D*WSTQRA 5.67 0.52 1 2
D3126 CTATAGCGGGCGGATCGCTGTACG L*RADRCT 5.67 0.52 1 2
D3115 TGTCATGGGTAGGGGGGCGGCGGG CHG*GGGG 6.00 0.00 1 4
D3234 CCGGTCGGATGAGGGGGGGTAGGC PVG*GGVG 5.67 0.52 1 4
D3146 GGATGCAGGTAGTGGGGCGGGAGA GCR*WGGR 4.17 0.75 1 4
D12334 TCGCAACGCAGTCGTGACTGAATG SQRSRD*M 2.00 0.00 1 7
D2148 TCGGCCTGGGGGGGGGCCCGGTAG SAWGGAR* 5.67 0.52 1 8

Table 15: Calculated Biophysical Properties of Cterm2 Random Full-length Variants
Sources: PepPlot (Chou and Fasman Predictions); PeptideStructure Prediction
Columns: Designation, Zone Size, Average Alpha, Average Beta, Average Turn, Hydropathy, Hydrophilicity, CF-Pred (t T b B h H), GOR-Pred (T B H)
Wild-type 6.00 1.17 1.18 0.43 0.17 -0.61 0.0 0.0 0.0 0.0 0.0 0.0 0.0 5.0 0.0
D1206 6.00 0.96 0.73 0.11 0.38 -0.31 5 0 0 0 0 0 2 0 0
D12144 6.00 0.91 0.93 1.47 1.44 0.04 1 4 0 0 0 0 0 0 0
D12151 6.00 0.85 0.98 0.77 0.77 -0.20 3 2 0 0 0 0 0 0 0
D12152 6.00 0.87 0.91 0.62 0.58 -0.68 1 2 0 0 0 0 2 0 0
D3109 6.00 0.96 0.89 0.31 0.49 -0.25 0 2 0 0 0 0 2 0 0
D3113 6.00 0.85 0.87 0.83 0.79 0.11 0 2 0 0 0 0 2 0 0
D3117 6.00 0.89 0.80 0.15 0.31 0.55 4 0 0 0 0 0 0 0 0
D3132 6.00 1.01 1.02 0.58 0.32 -0.55 0 0 0 0 6 0 0 0 0
D3138 6.00 0.80 0.94 1.67 0.89 0.03 1 3 0 0 0 0 2 0 0
D3150 6.00 1.18 1.01 0.15 0.13 -0.44 0 0 0 0 0 8 0 0 8
D3218 6.00 0.93 0.93 0.57 0.52 0.34 2 2 0 0 0 0 8 0 0
D3242 6.00 1.03 0.96 0.45 0.33 -0.53 0 0 0 0 0 6 2 4 0
D3243 6.00 1.10 0.89 0.22 0.08 -0.24 0 0 0 0 0 7 0 0 8
D3288 6.00 1.14 0.87 0.08 0.25 -0.18 0 0 0 0 0 6 0 0 0
D3130 6.00 1.03 1.04 0.54 0.24 -0.08 2 0 0 5 0 0 0 5 0
D3159 6.00 0.90 1.00 0.54 0.24 -0.93 2 0 0 0 0 0 2 0 0
D3108 5.83 1.08 0.97 0.16 0.10 -0.56 0 0 0 0 8 0 0 0 7
D3120 5.83 0.82 0.92 0.67 0.50 -0.36 4 0 0 0 0 0 0 0 0
D3165 5.83 0.94 0.86 0.46 0.78 0.54 3 2 0 0 0 0 6 0 0
D3257 5.83 0.95 1.01 0.27 0.28 -0.03 0 0 0 0 6 0 0 5 0
D1166 5.67 0.95 1.01 0.26 0.70 -0.11 0 3 0 0 0 0 0 0 0
D3217 5.67 0.98 0.98 0.86 0.46 -0.89 1 2 0 0 0 0 2 0 0
D3228 5.67 1.01 0.71 0.14 0.23 0.23 2 0 0 0 0 0 0 0 0
D3269 5.67 0.83 1.08 0.11 0.34 -0.53 0 0 0 5 0 0 2 4 0
D3278 5.67 1.08 0.92 0.31 0.38 0.33 0 2 0 0 0 0 0 0 8
D3170 5.67 1.01 0.92 0.09 0.37 0.31 3 0 0 0 0 0 0 0 0
D1363 5.50 0.89 0.95 1.11 0.97 -0.21 0 3 0 0 0 0 6 0 0
Average 5.88 0.96 0.93 0.50 0.48 -0.17 1.3 1.1 0.0 0.4 0.7 1.0 1.4 0.7 1.1
D12101 5.33 0.90 0.98 0.46 0.32 -0.15 2 0 0 0 0 0 3 0 0
D3104 5.33 0.85 0.95 0.21 1.12 -0.39 0 3 0 0 0 0 3 0 0
D3268 5.00 0.77 0.80 0.24 0.61 -0.14 4 0 0 0 0 0 2 0 0
D3137 5.00 0.96 0.83 0.22 0.39 -0.04 2 0 0 0 0 0 5 0 0
D3238 5.00 0.90 1.14 2.59 0.22 -0.28 0 0 0 5 0 0 6 0 0
D3102 4.50 1.08 1.10 0.19 0.16 -0.05 0 0 0 6 0 0 0 0 8
Average 5.03 0.91 0.97 0.65 0.47 -0.18 1.3 0.5 0.0 1.8 0.0 0.0 3.2 0.0 1.3
D3149 4.17 0.87 0.89 0.24 0.55 -0.06 3 2 0 0 0 0 2 0 0
D1231 4.00 0.94 0.82 0.46 0.62 0.91 0 2 0 0 0 0 4 0 0
D3105 3.83 0.97 0.84 0.15 0.44 0.00 2 2 0 0 0 0 3 0 0
D3140 3.83 0.94 0.94 0.41 0.43 0.07 2 0 0 0 0 0 4 0 0
D3245 3.83 0.74 0.68 0.40 1.34 0.78 1 5 0 0 0 0 0 0 0
D3279 3.83 1.03 0.99 0.07 0.30 0.10 0 0 0 0 7 0 0 4 0
D3157 3.50 1.06 1.09 0.17 0.17 0.09 0 0 0 6 0 0 0 5 0
Average 3.86 0.94 0.89 0.27 0.55 0.27 1.1 1.6 0.0 0.9 1.0 0.0 1.9 1.3 0.0
D3287 3.17 0.93 0.95 1.06 0.41 -0.11 0 2 0 0 0 0 2 0 0
D3252 2.83 0.90 1.02 0.53 0.52 0.85 3 2 0 0 0 0 6 0 0
D3203 2.83 1.05 0.99 0.58 0.30 0.32 0 0 0 0 8 0 6 0 0
D1306 2.67 1.04 0.99 0.21 0.39 1.19 4 0 0 0 0 0 5 0 0
D1368 2.67 1.05 1.19 0.27 0.09 -0.28 0 0 0 7 0 0 0 7 0
D1230 2.50 0.90 0.73 0.35 0.55 0.88 6 0 0 0 0 0 5 0 0
D3201 2.50 1.10 0.84 0.12 0.20 0.48 0 0 0 0 8 0 0 0 8
D3219 2.50 1.05 0.99 0.58 0.30 0.32 0 0 0 0 8 0 6 0 0
D3241 2.50 1.01 0.78 0.17 0.27 0.28 0 0 0 0 6 0 0 0 0
D3275 2.50 0.77 0.89 0.30 0.50 0.50 2 2 0 0 0 0 6 0 0
Average 2.67 0.98 0.94 0.42 0.35 0.44 1.5 0.6 0.0 0.7 3.0 0.0 3.6 0.7 0.8
D3204 2.33 1.05 0.99 0.58 0.30 0.32 0 0 0 0 8 0 6 0 0
D1347 2.17 0.81 0.90 0.34 0.93 0.83 3 3 0 0 0 0 4 0 0
D3265 2.17 0.89 0.93 0.54 0.47 0.49 5 0 0 0 0 0 2 0 0
D3101 2.00 0.92 0.87 0.28 0.41 0.48 4 0 0 0 0 0 0 0 0
D3141 2.00 0.94 0.82 0.26 0.34 0.71 4 0 0 0 0 0 2 0 0
D3144 2.00 1.00 0.84 0.20 0.42 0.85 3 0 0 0 0 0 3 0 0
D3239 2.00 1.07 1.22 0.29 0.10 0.41 0 0 0 8 0 0 0 8 0
D3263 2.00 0.97 0.91 0.20 0.43 0.23 0 2 0 0 0 0 2 0 0
D3124 2.00 0.93 0.75 0.63 1.24 1.68 1 4 0 0 0 0 4 0 0
D3122 1.83 0.90 0.89 0.31 0.30 1.40 2 0 0 0 0 0 3 0 0
D3282 1.83 0.90 0.87 0.42 0.76 1.05 2 3 0 0 0 0 6 0 0
D3261 1.67 1.00 0.89 0.22 0.34 0.74 0 0 0 0 0 0 3 0 0
Average 2.00 0.95 0.91 0.35 0.50 0.77 2.0 1.0 0.0 0.7 0.7 0.0 2.9 0.7 0.0
D3220 1.17 0.88 0.80 0.22 0.52 1.30 4 0 0 0 0 0 4 0 0
D3286 1.17 1.01 0.81 0.40 0.40 1.54 0 0 0 0 0 6 4 0 0
D3224 1.00 1.08 0.82 0.20 0.24 1.96 0 0 0 0 0 0 5 0 0
Average 1.11 0.99 0.81 0.27 0.39 1.60 1.3 0.0 0.0 0.0 0.0 2.0 4.3 0.0 0.0

IV. DISCUSSION

4.1 The Role of the α1 Amphiphilic Helix and the Linker Region in Hemolysin Transport

In an attempt to elucidate the structural requirements of the hemolysin signal sequence, the conserved helix-strand-helix motif was divided into three regions (α1-linker-α2) and subjected to random oligonucleotide mutagenesis. This method was adopted because a large number of variants could be generated, providing a clear picture of the presence or absence of critical element(s) within the targeted region.

α1 helical variants from a previous study (Morden, 1998) were assayed side by side with α1 random variants to facilitate comparison of the mutants' ability to support transport. Figure 8 clearly demonstrates that most α1 helical variants were secreted at higher levels than random variants. While results from the helical library indicated that an amphiphilic helix in the α1 region is sufficient for transport, results from the α1 random library further suggested that this structural element is actually required. The observation that the average helix hydrophobic moment of α1 random variants increased with the ability to be secreted supported the role of an amphiphilic helix in transport. Nevertheless, these results did not exclude the possibility that other factors within the same region might also be involved in the substrate recognition process.
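The helix hydrophobic moment referred to above measures how strongly hydrophobic residues cluster on one face of a helix. A minimal sketch of the usual calculation is given here: each residue's hydrophobicity is treated as a vector rotated by 100 degrees per residue (the α-helical periodicity), and the length of the vector sum is divided by the number of residues. The Kyte-Doolittle scale is used purely as an assumption, since the scale behind the tabulated values is not stated in this section, so absolute numbers will not match tables 11 to 15. The function name and the two test peptides are hypothetical illustrations, not thesis data.

```python
import math

# Kyte-Doolittle hydropathy values (assumed scale, see lead-in).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydrophobic_moment(peptide, angle_deg=100.0, scale=KYTE_DOOLITTLE):
    """Mean hydrophobic moment per residue for a given helical periodicity."""
    delta = math.radians(angle_deg)
    sin_sum = sum(scale[aa] * math.sin(i * delta) for i, aa in enumerate(peptide))
    cos_sum = sum(scale[aa] * math.cos(i * delta) for i, aa in enumerate(peptide))
    return math.hypot(sin_sum, cos_sum) / len(peptide)


if __name__ == "__main__":
    amphiphilic = "LKKLLKLLKKLLKL"   # leucines fall on one helical face
    alternating = "LKLKLKLKLKLKLK"   # leucines spread around the helix
    print(round(mean_hydrophobic_moment(amphiphilic), 2))
    print(round(mean_hydrophobic_moment(alternating), 2))
```

With this definition a peptide whose hydrophobic residues line up on one helical face scores higher than one with the same residue types spread evenly around the helix, which is the property the α1 comparison relies on.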
The median of secretion for α1 random full-length mutants was approximately 20%; in contrast, the median of secretion for α2 random full-length variants was close to 100%. Comparison of these two random libraries (α1 vs. α2) nicely established the essentiality of the α1 region (figure 6A vs. 6C). It was striking to learn that the α2 region could tolerate many different combinations of amino acids. While both the α1 and α2 helices seemed to be conserved (in at least three RTX toxins) (Zhang et al., 1995), only the first of the two helices was required for function. Why is α2 conserved if it is not important for transport? Two possible explanations exist. First, α2 is a relatively weak amphiphilic helix, which is a common secondary structure among protein sequences (appendix B.1). In other words, the appearance of the downstream helix could be just a coincidence. Second, since the blood agar plate assay system was only designed to measure hemolytic activity, the α2 helix could be conserved for some as yet unidentified function. The fact that at least 11 out of 60 amino acids of the hemolysin signal sequence (i.e. the α2 region) could be replaced with almost any residue while retaining wild-type activity clearly demonstrated the versatility of the hemolysin transporter system in accommodating many different variations of the substrate.

[Figure 8 plot: Secretion Pattern of Alpha1 Helical vs. Random Variants. Histogram; horizontal axis: hemolytic zone size (0 to 6); series: random mutants, helical mutants.]

Figure 8: Comparison of the Secretion Levels between α1 Helical and Random Variants. α1 helical variants from a previous study (Morden, 1998) were assayed at the same time as α1 random variants to facilitate comparison of the mutants' ability to support transport. The contrast in secretion levels between the helical and random variants is shown in the above plot. While the majority of the 34 random variants secreted at 3, most of the 22 helical variants transported hemolysin at much higher levels, suggesting that an amphiphilic helix in the α1 region is both required and sufficient for efficient transport.

In addition to the two helices, the linker region was also investigated. If the only role of this region were to connect the two helices, the linker should be able to tolerate many different combinations of amino acids, and most linker full-length mutants would be expected to be secreted at a level similar to wild-type. Instead, it was found that linker full-length mutants were secreted at levels similar to those of the α1 library, suggesting that the linker also contains some element(s) important for efficient transport. No specific functional features have been identified in this region yet, partly due to the relatively small population of linker full-length mutants in this library, and also because of the lack of pre-existing clues (such as those available for the α1 "helical" region). It would be interesting to determine whether the linker is an extension of the α1 helix or a completely different entity.

The analysis of full-length mutants in the α1, linker, and α2 libraries allows for the construction of a functional model based on the observed structural features. Results of the α1 random library, when combined with those of the helical library, provided strong evidence for the importance of an amphiphilic helix in this region. Moreover, the linker region was found to be critical for efficient transport, although further investigations are required to identify the exact features involved. Despite the finding that the α2 helix appears to be conserved across phylogeny, our results clearly demonstrated that no specific primary or secondary structural elements are required in this region.
4.2 Correlation of the Predicted Hydrophobicity of the C-terminal 8 Residues with Hemolysin Secretion

The last 16 amino acids of the signal sequence must contain some crucial element(s), since deletion of this region resulted in a dramatic reduction in transport. To identify these features, two contiguous 8 amino acid random libraries (Cterm1 and Cterm2) were created downstream of α2. Essentially all Cterm1 full-length mutants secreted at wild-type level, providing strong evidence that this region does not contain any important elements. By elimination, the last 8 amino acids (Cterm2 region) must be critical for optimal secretion.

While full-length mutants allowed the identification of important regions within the signal sequence, stop mutants provided information regarding the extreme C-terminus. These variants contained a stop codon in the targeted region, and thus the random sequence became the C-terminus. A comparison among stop mutants of the same length from the α2 and Cterm1 libraries revealed that variants containing a relatively hydrophobic tail were consistently secreted at higher levels, while clones with more than one positively charged residue within the last 8 amino acids were secreted at lower levels. Analysis of these stop mutants provided clues that the hydrophobicity of the last few residues affects the level of secretion.

The most convincing evidence for the above proposal came from analysis of the 65 Cterm2 full-length mutants. In these variants, only the last 8 amino acids were altered, leaving the rest of the signal sequence intact. About 50% of these variants secreted at wild-type level. Since this functional requirement can be met easily, it is unlikely to involve a very specific arrangement of amino acids. When mutants were sorted according to their secretion levels, the average hydrophilicity of each class decreased with increasing ability to be transported.
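The two tail properties discussed here, the number of positively charged residues and the average hydropathy of the last 8 residues, can be tabulated with a short script. This is a sketch under stated assumptions: the Kyte-Doolittle scale stands in for the PepPlot values reported in table 15 (so the numbers will differ from that table), the helper names are hypothetical, and the six rows are a hand-picked sample copied from table 14, not the full data set.

```python
# Kyte-Doolittle hydropathy values (assumed proxy for the thesis's PepPlot output).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def positive_count(tail):
    """Number of positively charged residues (R or K) in a peptide."""
    return sum(tail.count(aa) for aa in "RK")


def mean_hydropathy(tail, scale=KYTE_DOOLITTLE):
    """Average hydropathy of a peptide; higher means more hydrophobic."""
    return sum(scale[aa] for aa in tail) / len(tail)


# (designation, Cterm2 tail, hemolytic zone size), sampled from table 14.
SAMPLE = [
    ("D1206", "AEPPWAGP", 6.00),
    ("D3109", "GRGYAAGA", 6.00),
    ("D3132", "LTMWGAGR", 6.00),
    ("D3252", "WLGRGRRT", 2.83),
    ("D1306", "LRGRRRLA", 2.67),
    ("D3224", "SRARRERH", 1.00),
]

if __name__ == "__main__":
    for name, tail, zone in SAMPLE:
        print(f"{name:7s} {tail} zone={zone:.2f} "
              f"R+K={positive_count(tail)} "
              f"hydropathy={mean_hydropathy(tail):+.2f}")
```

In this small sample the high secretors carry at most one R or K and the low secretors three or four, in line with the trend described for the full library.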
All existing data suggest that hydrophobicity is likely the primary factor involved in the extreme C-terminus.

When the hemolytic zone for α2 stop mutants was plotted against the position of the stop codon, it was observed that the chance of secretion increased with length (figure 7A). This implied that the number of residues after the linker region is important for optimal secretion. Interestingly, a similar trend was also observed for the first two residues of the Cterm1 region, after which the length requirement seemed to be saturated, in that mutants secreting at wild-type level started to appear (figure 7B). Taken together, it was proposed that a minimum of 14 residues (11 from α2 and 3 from Cterm1) is required after the linker region to support wild-type secretion.

4.3 Functional Model of the Hemolysin Signal Sequence

Analysis of the hemolysin signal sequence variants from six different combinatorial libraries (five random libraries from this study and one helical library from a previous study (Morden, 1998)) led us to conclude that there are two critical functional domains within the signal sequence (figure 9). The first domain is 22 residues long and consists of the α1 helix as well as the linker region. An amphiphilic helical structure in the α1 region is both sufficient and required for transport. The second domain covers the last 8 residues of the signal sequence, and hydrophobicity in this region is a major determinant of efficient transport. Connecting the two is a stretch of 19 residues (α2 and Cterm1 regions) that can be replaced by almost any combination of amino acids and still retain wild-type activity. It is believed that at least 14 amino acids are required after the first functional domain for optimal transport; since the second domain accounts for 8 of these, a length of 6 residues would be the minimum requirement in this connector region.

It should be noted that a similar model has been proposed (Koronakis et al., 1991).
By comparing the C-terminal sequences of three hemolysin homologues, it was suggested that there were three functional domains within the signal sequence: an amphiphilic helix stretching from L973 to F990, a 7 amino acid charged cluster immediately downstream, and an 8 amino acid hydroxylated tail at the extreme C-terminus. This tripartite model was supported by results from a number of mutational experiments. By analyzing a large number of combinatorial mutants, we highlighted two important functional domains that overlap almost perfectly with those from the other model. Our first domain stretches from L976 to S997, which covers the amphiphilic helix in the tripartite model. Our second domain extends from S1017 to A1024, and is identical to the third region proposed in the other model.

Our functional model also offers some important insights into a number of common observations from previous studies (section 1.3.3). It verifies the importance of the α1 amphiphilic helical structure in transport. Furthermore, the ability of the signal sequence to tolerate many drastic mutations can be explained by the fact that multiple functional domains are involved: inactivation of one domain would still allow a reasonable level of transport. Most importantly, we were able to conclusively demonstrate that the hemolysin transport system can recognize and transport a wide range of hemolysin variants that share little primary sequence similarity but contain some common structural/biophysical features. This observation is fundamental to explaining the broad substrate specificity of the hemolysin transporter complex.

Figure 9: Functional Model of Hemolysin Signal Sequence

STYGSQDNLNPLINEISKIISAAGNFDVKEERSAASLLQLSGNASDFSYGRNSITLTASA

    Region (* = critical region)    Residues        Sequence
    (upstream)                      S965 to P975    STYGSQDNLNP
    α1 helix *                      L976 to A987    LINEISKIISAA
    linker *                        G988 to S997    GNFDVKEERS
    α2 helix                        A998 to A1008   AASLLQLSGNA
    Cterm1                          S1009 to N1016  SDFSYGRN
    Cterm2 *                        S1017 to A1024  SITLTASA

    Functional Domain I: α1 helix and linker. Functional Domain II: Cterm2.

Based on results from the five random libraries and the α1 helical library, a two-domain functional model is proposed (important regions are underlined). Functional domain I consists of an amphiphilic helix (α1), and the 10 amino acids immediately downstream (linker). Functional domain II is located at the extreme C-terminus and consists of a stretch of eight hydrophobic amino acids (Cterm2). The 19 residue region (α2 and Cterm1) connecting the two domains can be replaced by almost any sequence without having a dramatic effect on transport.

Results from the α2 and Cterm1 random libraries suggested that no specific primary or secondary structural elements are required in the connector region (A998 to N1016), providing evidence contradictory to the alleged importance of a number of proposed features within this region. Among those were the 13 uncharged amino acid region (S997 to S1009) (Stanley et al., 1991), and the conserved aspartate box (E994 to D1010) (Kenny et al., 1992).

Although the emphasis of our functional model is on the helical structure and hydrophobicity, it does not exclude the possibility that multiple "contact" residues also facilitate transport (Chervaux and Holland, 1996). In fact, the observation that a number of α1 helical variants were secreted at sub-optimal levels implied that other features could also be involved in this region. Furthermore, the absence of "critical" residues in most linker variants may offer a testable hypothesis as to why these mutants were secreted at low levels.

4.4 Future Work

4.4.1 Refinement of the Current Model

Future work can be divided into three areas: (1) to refine the current model, (2) to understand how the functional domains facilitate transport by interacting with the membrane/transporter, and (3) to search for proteins that are potential substrates of ABC transporters based on the principles highlighted in this study.
While it has been shown that an amphiphilic helix and a hydrophobic tail are essential for efficient transport, we have not been able to identify any specific element(s) in the linker region. Site-directed mutagenesis could be used to reintroduce certain critical residues into a number of linker random variants. If these residues were sufficient to support efficient transport, wild-type secretion would be restored. Revertants could also be isolated from a number of linker random mutants to help determine elements that promote transport. To test the other hypothesis, that the linker is an extension of the α1 amphiphilic helix, a combinatorial library with a design principle similar to that of the α1 helical library could be generated.

Conclusions regarding the length requirement of the connector region should be interpreted with caution, since they were based on results from stop mutants that had both the length and the identity of residues at the extreme C-terminus altered simultaneously. To properly address the minimum length requirement of this region, systematic internal deletions (e.g. in multiples of three amino acids) starting from either side of this 19 amino acid region (α2 and Cterm1) should be carried out. The upper length limit could also be tested by inserting different numbers of alanine residues into this region.

Further experiments should try to determine if mutations of the two functional domains would have a synergistic effect on secretion, and if the two domains actually interact with each other.

4.4.2 Understanding How the Functional Domains Contribute to Transport

Since our ultimate goal is to understand the transport mechanism, studies focusing on the signal sequence alone are not sufficient. To determine the orientation of the amphiphilic helical domain in the lipid bilayer, biophysical studies such as Fourier transform infrared spectroscopy could be carried out on synthetic peptides representing wild-type α1 helices.
Peptides representing a number of α1 random variants could also be studied to understand if mutations in this helix could affect the nature of insertion into the membrane.

With the many α1 and Cterm2 random variants that were defective in transport, it would be interesting to isolate complementary point mutants in HlyB that could reverse the mutant phenotype. This powerful genetic approach has been used to define sites of interaction between hemolysin deletion mutants and HlyB (Sheps et al., 1995; Zhang et al., 1993b); it may provide more specific information with these combinatorial variants since the mutations are more clearly defined. Another way to define interactions would be in a system in which a mutant HlyB secretes wild-type hemolysin at low levels. A screen for hemolysin combinatorial variants that can be transported at higher levels could also reveal important elements of interaction.

4.4.3 Searching for Substrates of Eukaryotic ABC Transporters

Bacterial RTX toxins like hemolysin all have their own dedicated ABC exporter systems; however, no eukaryotic proteins are known to be transported via a similar mechanism. Since many eukaryotic secreted proteins do not contain a traditional N-terminal signal sequence, and ABC transporters form one of the largest families of transporters in all organisms, it is possible that some of these proteins are translocated by ABC proteins. In this case, defining what constitutes a signal sequence would not only enhance our understanding of the transport mechanism, but also facilitate the search for eukaryotic protein substrates that utilize a similar transport pathway. Following this rationale, a pilot search project has been initiated (appendix B.1).

4.5 Conclusions

We have performed a comprehensive mutagenesis of the last 49 amino acids of the hemolysin signal sequence using a combinatorial approach.
Building on the helix-strand-helix structural model, we proposed a two-domain functional model that emphasizes the importance of secondary structure (i.e. amphiphilic helix) and general biophysical property (i.e. hydrophobicity) over primary sequence in transport. This model can explain the ability of the signal sequence to tolerate many drastic mutations, as well as the broad substrate specificity of the hemolysin transporter complex. It also forms a good basis for our continuing investigation to understand how features within the hemolysin signal sequence interact with the transporter complex to facilitate transport, with relevance to the transport mechanism of a family of transporters highly implicated in human diseases.

BIBLIOGRAPHY

Chan, H.S., DeBoer, G., Thorner, P.S., Haddad, G., Gallie, B.L. and Ling, V. (1994) Multidrug resistance. Clinical opportunities in diagnosis and circumvention. Hematol Oncol Clin North Am, 8, 383-410.

Chervaux, C. and Holland, I.B. (1996) Random and directed mutagenesis to elucidate the functional importance of helix II and F-989 in the C-terminal secretion signal of Escherichia coli hemolysin. Journal of Bacteriology, 178, 1232-6.

Childs, S. and Ling, V. (1994) The MDR superfamily of genes and its biological implications. Important Adv Oncol, 21-36.

Childs, S., Yeh, R.L., Georges, E. and Ling, V. (1995) Identification of a sister gene to P-glycoprotein. Cancer Res, 55, 2029-34.

Childs, S., Yeh, R.L., Hui, D. and Ling, V. (1998) Taxol resistance mediated by transfection of the liver-specific sister gene of P-glycoprotein. Cancer Res, 58, 4160-7.

Chou, P.Y. and Fasman, G.D. (1978) Prediction of the secondary structure of proteins from their amino acid sequence. Adv Enzymol Relat Areas Mol Biol, 47, 45-148.

Cole, S.P., Sparks, K.E., Fraser, K., Loe, D.W., Grant, C.E., Wilson, G.M. and Deeley, R.G. (1994) Pharmacological characterization of multidrug resistant MRP-transfected human tumor cells. Cancer Res, 54, 5902-10.

Coote, J.G. (1992) Structural and functional relationships among the RTX toxin determinants of gram-negative bacteria. FEMS Microbiol Rev, 8, 137-61.

Dean, M. and Allikmets, R. (1995) Evolution of ATP-binding cassette transporter genes. Curr Opin Genet Dev, 5, 779-85.

Eisenberg, D., Weiss, R.M. and Terwilliger, T.C. (1984) The hydrophobic moment detects periodicity in protein hydrophobicity. Proc Natl Acad Sci USA, 81, 140-4.

Fath, M.J. and Kolter, R. (1993) ABC transporters: bacterial exporters. Microbiol Rev, 57, 995-1017.

Garnier, J., Osguthorpe, D.J. and Robson, B. (1978) Analysis of the accuracy and implications of simple methods for predicting the secondary structure of globular proteins. J Mol Biol, 120, 97-120.

Gerloff, T., Stieger, B., Hagenbuch, B., Madon, J., Landmann, L., Roth, J., Hofmann, A.F. and Meier, P.J. (1998) The sister of P-glycoprotein represents the canalicular bile salt export pump of mammalian liver. J Biol Chem, 273, 10046-50.

Goebel, W., Chakraborty, T. and Kreft, J. (1988) Bacterial hemolysins as virulence factors. Antonie van Leeuwenhoek, 54, 453-63.

Gray, L., Baker, K., Kenny, B., Mackman, N., Haigh, R. and Holland, I.B. (1989) A novel C-terminal signal sequence targets Escherichia coli haemolysin directly to the medium. Journal of Cell Science - Supplement, 11, 45-57.

Gray, L., Mackman, N., Nicaud, J.M. and Holland, I.B. (1986) The carboxy-terminal region of haemolysin 2001 is required for secretion of the toxin from Escherichia coli. Mol Gen Genet, 205, 127-33.

Hess, J., Gentschev, I., Goebel, W. and Jarchau, T. (1990) Analysis of the haemolysin secretion system by PhoA-HlyA fusion proteins. Molecular & General Genetics, 224, 201-8.

Higgins, C.F. (1992) ABC transporters: from microorganisms to man. Annu Rev Cell Biol, 8, 67-113.

Hughes, C., Stanley, P. and Koronakis, V. (1992) E. coli hemolysin interactions with prokaryotic and eukaryotic cell membranes. Bioessays, 14, 519-25.

Hung, L., Wang, I.X., Nikaido, K., Liu, P., Ames, G. and Kim, S. (1999) Crystal Structure of the ATP-binding Subunit of an ABC Transporter, the Histidine Permease of Salmonella typhimurium at 1.5 Å resolution. 2nd FEBS Advanced Lecture Course "ATP-Binding Cassette Transporters: From Multidrug Resistance to Genetic Disease", Gosau, Austria, p. 6.

Jarchau, T., Chakraborty, T., Garcia, F. and Goebel, W. (1994) Selection for transport competence of C-terminal polypeptides derived from Escherichia coli hemolysin: the shortest peptide capable of autonomous HlyB/HlyD-dependent secretion comprises the C-terminal 62 amino acids of HlyA. Molecular & General Genetics, 245, 53-60.

Kenny, B., Chervaux, C. and Holland, I.B. (1994) Evidence that residues -15 to -46 of the haemolysin secretion signal are involved in early steps in secretion, leading to recognition of the translocator. Molecular Microbiology, 11, 99-109.

Kenny, B., Taylor, S. and Holland, I.B. (1992) Identification of individual amino acids required for secretion within the haemolysin (HlyA) C-terminal targeting region. Molecular Microbiology, 6, 1477-89.

Koronakis, V., Koronakis, E. and Hughes, C. (1989) Isolation and analysis of the C-terminal signal directing export of Escherichia coli hemolysin protein across both bacterial membranes. EMBO Journal, 8, 595-605.

Kuchler, K., Sterne, R.E. and Thorner, J. (1989) Saccharomyces cerevisiae STE6 gene product: a novel pathway for protein export in eukaryotic cells. EMBO J, 8, 3973-84.

Kyte, J. and Doolittle, R.F. (1982) A simple method for displaying the hydropathic character of a protein. J Mol Biol, 157, 105-32.

Linton, K.J. and Higgins, C.F. (1998) The Escherichia coli ATP-binding cassette (ABC) proteins. Mol Microbiol, 28, 5-13.

Marusina, K. and Monaco, J.J. (1996) Peptide transport in antigen presentation. Curr Opin Hematol, 3, 19-26.

Menestrina, G., Moser, C., Pellet, S. and Welch, R. (1994) Pore-formation by Escherichia coli hemolysin (HlyA) and other members of the RTX toxins family. Toxicology, 87, 249-67.

Morden, C. (1998) The Role of the α1-Helical Domain of the Signal Sequence in Hemolysin Recognition and Transport. Department of Biochemistry and Molecular Biology. The University of British Columbia, Vancouver.

Nicaud, J.M., Mackman, N., Gray, L. and Holland, I.B. (1986) The C-terminal, 23 kDa peptide of E. coli haemolysin 2001 contains all the information necessary for its secretion by the haemolysin (Hly) export machinery. FEBS Letters, 204, 331-5.

Ostolaza, H. and Goni, F.M. (1995) Interaction of the bacterial protein toxin alpha-haemolysin with model membranes: protein binding does not always lead to lytic activity. FEBS Lett, 371, 303-6.

Riordan, J.R. (1993) The cystic fibrosis transmembrane conductance regulator. Annu Rev Physiol, 55, 609-30.

Shapiro, A.B. and Ling, V. (1997) Positively cooperative sites for drug transport by P-glycoprotein with distinct drug specificities. Eur J Biochem, 250, 130-7.

Shapiro, A.B. and Ling, V. (1998) Transport of LDS-751 from the cytoplasmic leaflet of the plasma membrane by the rhodamine-123-selective site of P-glycoprotein. Eur J Biochem, 254, 181-8.

Sheps, J., Zhang, F. and Ling, V. (1996) Bacterial Toxin Transport: The Hemolysin System. Membrane Protein Transport. JAI Press Inc., Vol. 3, pp. 81-118.

Sheps, J.A., Cheung, I. and Ling, V. (1995) Hemolysin transport in Escherichia coli. Point mutants in HlyB compensate for a deletion in the predicted amphiphilic helix region of the HlyA signal. Journal of Biological Chemistry, 270, 14829-34.

Stanley, P., Koronakis, V. and Hughes, C. (1991) Mutational analysis supports a role for multiple structural features in the C-terminal secretion signal of Escherichia coli haemolysin. Molecular Microbiology, 5, 2391-403.

Strautnieks, S.S., Bull, L.N., Knisely, A.S., Kocoshis, S.A., Dahl, N., Arnell, H., Sokal, E., Dahan, K., Childs, S., Ling, V., Tanner, M.S., Kagalwalla, A.F., Nemeth, A., Pawlowska, J., Baker, A., Mieli-Vergani, G., Freimer, N.B., Gardiner, R.M. and Thompson, R.J. (1998) A gene encoding a liver-specific ABC transporter is mutated in progressive familial intrahepatic cholestasis. Nat Genet, 20, 233-8.

Yin, Y., Zhang, F., Ling, V. and Arrowsmith, C.H. (1995) Structural analysis and comparison of the C-terminal transport signal domains of hemolysin A and leukotoxin A. FEBS Letters, 366, 1-5.

Zhang, F., Greig, D.I. and Ling, V. (1993a) Functional replacement of the hemolysin A transport signal by a different primary sequence. Proceedings of the National Academy of Sciences of the United States of America, 90, 4211-5.

Zhang, F., Sheps, J.A. and Ling, V. (1993b) Complementation of transport-deficient mutants of Escherichia coli alpha-hemolysin by second-site mutations in the transporter hemolysin B. Journal of Biological Chemistry, 268, 19889-95.

Zhang, F., Yin, Y., Arrowsmith, C.H. and Ling, V. (1995) Secretion and circular dichroism analysis of the C-terminal signal peptides of HlyA and LktA. Biochemistry, 34, 4193-201.

Appendix A. Batch Analysis Perl Scripts

A.1 Batch-analysis.pl

# This program takes in a FASTA file containing multiple DNA sequences, and
# transforms it into GCG format. All sequences are then translated and analyzed
# with the two GCG programs, PEPPLOT and PEPTIDESTRUCTURE. The output resembles
# a table with the names of sequences on the first column, followed by a number
# of biophysical properties (separated by commas).
# In order of presentation:
#   PEPPLOT: Chou and Fasman alpha propensity
#   PEPPLOT: Chou and Fasman beta propensity
#   PEPPLOT: Chou and Fasman turns
#   PEPPLOT: Kyte and Doolittle hydropathy
#   PEPTIDESTRUCTURE: Kyte and Doolittle hydrophilicity
#   PEPTIDESTRUCTURE: Secondary structure according to Chou and Fasman method (t)
#   PEPTIDESTRUCTURE: Secondary structure according to Chou and Fasman method (T)
#   PEPTIDESTRUCTURE: Secondary structure according to Chou and Fasman method (b)
#   PEPTIDESTRUCTURE: Secondary structure according to Chou and Fasman method (B)
#   PEPTIDESTRUCTURE: Secondary structure according to Chou and Fasman method (h)
#   PEPTIDESTRUCTURE: Secondary structure according to Chou and Fasman method (H)
#   PEPTIDESTRUCTURE: Secondary structure according to Garnier-Osguthorpe-Robson method (T)
#   PEPTIDESTRUCTURE: Secondary structure according to Garnier-Osguthorpe-Robson method (B)
#   PEPTIDESTRUCTURE: Secondary structure according to Garnier-Osguthorpe-Robson method (H)
# Notice that calculations are done for each amino acid position (i.e. a sequence
# with 10 amino acids would get 10 rows of data). These raw data are then filtered
# through another program, Summation.pl, to get a summary for that particular sequence.
# Command line: perl Batch-analysis.pl <input filename> <output filename> &
# Written by Dr. Eric Cobat, modified by David Hui, 1998

$iname = shift;
$oname = shift;
open(OFILE, ">$oname") or die "Cannot open $oname: $!";

# split the FASTA file into individual GCG sequence files plus a list file
$listfile = "fromfasta.list";
system("fromfasta -d -nomon $iname -list=$listfile > /dev/null");
open(LISTFILE, $listfile) or die "Cannot open $listfile: $!";

# skip the list file header, which ends with the ".." divider
while (<LISTFILE>) {
    last if /\.{2}$/;
}

while (<LISTFILE>) {
    next if (/^$/ || /^!/);           # skip blank lines and comment lines
    chomp;
    ($seqname) = split;
    $seqname =~ /^.*\/(.*)\./;        # entry name = file name without path and extension
    $entryname = $1;
    print `cat $seqname > temp.pep`;
    unlink($seqname);

    # PEPPLOT: Chou and Fasman propensities and Kyte and Doolittle hydropathy
    system("pepplot temp.pep -d -cff=temp.cho -noplot >/dev/null");
    open(DATA, "<temp.cho") or die "Cannot open temp.cho: $!";
    while (<DATA>) {
        last if (/\.\./);
    }
    $index = 0;
    while (<DATA>) {
        ($pos,$res,$astat,$aave,$bstat,$bave,$arm,$aoh,$brm,$boh,$turn,$hpob) = split;
        $line[$index] .= "$entryname,$aave,$bave,$turn,$hpob,";
        $index++;
    }
    close(DATA);
    unlink("temp.cho");

    # PEPTIDESTRUCTURE: hydrophilicity and secondary structure predictions
    system("peptidestructure -d temp.pep -menu=h -out=temp.p2s >/dev/null");
    open(DATA, "temp.p2s") or die "Cannot open temp.p2s: $!";
    while (<DATA>) {
        last if (/\.\./);
    }
    $index = 0;
    while (<DATA>) {
        if (!/^$/) {
            ($pos,$aa,$gly,$hyph,$surf,$flex,$cfpred,$gorpred,$ai) = split;
            $line[$index] .= "$hyph,$cfpred,$gorpred";
            $index++;
        }
    }
    close(DATA);
    unlink("temp.p2s");
    unlink("temp.pep");

    foreach $line (@line) {
        print OFILE ("$line\n");
    }
    @line = ();    # empty the line array
}
close(LISTFILE);
unlink($listfile);
close(OFILE);
system("echo Script Batch.pl finished at `date` ");
system("echo");

A.2 Summation.pl

# This program takes in a file (data) generated from Batch-analysis.pl, and
# essentially reduces the data for each sequence to one line by adding up all
# the entries. The output (called summary) resembles a table with the names of
# sequences on the first column, followed by a number of biophysical properties
# (separated by commas). Please refer to Batch-analysis.pl for a detailed
# description of these entries.
# Command line: perl Summation.pl
# Written by David Hui, 1999

print `cp data temp`;
print `echo The End >>temp`;    # sentinel line so that the last sequence is written out
open(DATAFILE, "<temp") or die "Cannot open temp: $!";
open(SUMFILE, ">summary") or die "Cannot open summary: $!";
$mutant = "Summation Program";

# initialize every running total (a bare comma list would only zero the last variable)
($aavesum, $bavesum, $turnsum, $hpobsum, $hyphsum,
 $cfpredtcount, $cfpredTcount, $cfpredhcount, $cfpredHcount,
 $cfpredbcount, $cfpredBcount,
 $gorpredTcount, $gorpredHcount, $gorpredBcount) = (0) x 14;

while (<DATAFILE>) {
    chomp;
    $line = $_;
    ($name,$aave,$bave,$turn,$hpob,$hyph,$cfpred,$gorpred) = split(/,/,$line);
    if ($name eq $mutant) {
        # same sequence as the previous row: keep accumulating
        # (exact name comparison avoids regex metacharacters and prefix matches)
        $aavesum += $aave;
        $bavesum += $bave;
        $turnsum += $turn;
        $hpobsum += $hpob;
        $hyphsum += $hyph;
        if    ($cfpred eq "t") { $cfpredtcount++; }
        elsif ($cfpred eq "T") { $cfpredTcount++; }
        elsif ($cfpred eq "h") { $cfpredhcount++; }
        elsif ($cfpred eq "H") { $cfpredHcount++; }
        elsif ($cfpred eq "b") { $cfpredbcount++; }
        elsif ($cfpred eq "B") { $cfpredBcount++; }
        if    ($gorpred eq "T") { $gorpredTcount++; }
        elsif ($gorpred eq "H") { $gorpredHcount++; }
        elsif ($gorpred eq "B") { $gorpredBcount++; }
    } else {
        # new sequence: write out the totals of the previous one, then restart the totals
        print SUMFILE "$mutant,$aavesum,$bavesum,$turnsum,$hpobsum,$hyphsum,$cfpredtcount,$cfpredTcount,$cfpredbcount,$cfpredBcount,$cfpredhcount,$cfpredHcount,$gorpredTcount,$gorpredBcount,$gorpredHcount\n";
        $mutant = $name;
        $aavesum = $aave;
        $bavesum = $bave;
        $turnsum = $turn;
        $hpobsum = $hpob;
        $hyphsum = $hyph;
        $cfpredtcount  = ($cfpred eq "t") ? 1 : 0;
        $cfpredTcount  = ($cfpred eq "T") ? 1 : 0;
        $cfpredbcount  = ($cfpred eq "b") ? 1 : 0;
        $cfpredBcount  = ($cfpred eq "B") ? 1 : 0;
        $cfpredhcount  = ($cfpred eq "h") ? 1 : 0;
        $cfpredHcount  = ($cfpred eq "H") ? 1 : 0;
        $gorpredTcount = ($gorpred eq "T") ? 1 : 0;
        $gorpredBcount = ($gorpred eq "B") ? 1 : 0;
        $gorpredHcount = ($gorpred eq "H") ? 1 : 0;
    }
}
close(DATAFILE);
close(SUMFILE);
print `rm temp`;

Appendix B. Searching for Proteins that Contain an α1-like Amphiphilic Helix

B.1 Search Results

Two criteria were used to search for proteins that may contain a C-terminal signal sequence. First, the proteins must contain a structure similar to the α1 amphiphilic helix. Second, such a motif should occur within the last hundred amino acids. Applying a design principle similar to that used for α1 helical variants, we came up with a degenerate sequence in which all 2.3 x 10^11 combinations were predicted to form an α1-like amphiphilic helix:

[ALVIMFW][ALVIMFW][SCTNQDEKHRY][SCTNQDEKHRY][ALVIMFW][SCTNQDEKHRY][SCTNQDEKHRY][ALVIMFW][ALVIMFW][SCTNQDEKHRY][ASCTNQDEKHRY][ALVIMFW]

A Perl script (appendix B.2) was written to perform this search in the NR (non-redundant) database at the National Center for Biotechnology Information. This database collection contains 3.2 x 10^5 proteins from many different species. If the degenerate sequence occurred by chance, only 15 proteins would be expected (taking into account the biological frequency of each amino acid). Interestingly, 9354 proteins were found with such a motif, suggesting that an amphiphilic helix is very common, and probably plays significant biological roles. Twenty-eight percent of these proteins (2634 out of the 9354) contained this structural motif within the last 100 amino acids. From this list, 126 were human proteins (appendix B.3), many of which were not secreted. A few of the interesting candidates included interleukin 10, cardiotrophin-1, FALL-39 peptide antibiotic, gonadotropin releasing hormone and prolactin release-inhibiting factor precursor, which play important physiological roles, and do not belong to any known secretion pathways. To test if these candidate proteins could be secreted by ABC transporters, their C-termini could be used to replace the hemolysin signal sequence and the hybrid proteins could be assayed for transport.
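The two search criteria can be sketched compactly. The following Python example is an illustration only and is not the PerlSearch.pl script of appendix B.2. The names `MOTIF_CLASSES`, `count_combinations` and `motif_in_c_terminus` are hypothetical, and the labels "hydrophobic" and "hydrophilic" for the two residue sets are assumed descriptions rather than terms defined in this appendix. The input is the wild-type hemolysin signal sequence from Figure 9.

```python
import re
from math import prod

# Residue sets of the degenerate sequence (labels are assumptions for readability).
HYDROPHOBIC_SET = "ALVIMFW"
HYDROPHILIC_SET = "SCTNQDEKHRY"
HYDROPHILIC_OR_ALA_SET = "ASCTNQDEKHRY"

# The 12 positions of the degenerate sequence, in order.
MOTIF_CLASSES = [
    HYDROPHOBIC_SET, HYDROPHOBIC_SET, HYDROPHILIC_SET, HYDROPHILIC_SET,
    HYDROPHOBIC_SET, HYDROPHILIC_SET, HYDROPHILIC_SET, HYDROPHOBIC_SET,
    HYDROPHOBIC_SET, HYDROPHILIC_SET, HYDROPHILIC_OR_ALA_SET, HYDROPHOBIC_SET,
]
MOTIF = re.compile("".join(f"[{residues}]" for residues in MOTIF_CLASSES))

# Wild-type C-terminal 60 residues of hemolysin, as given in Figure 9.
WILD_TYPE = "STYGSQDNLNPLINEISKIISAAGNFDVKEERSAASLLQLSGNASDFSYGRNSITLTASA"


def count_combinations(classes):
    """Number of distinct sequences the degenerate motif can encode."""
    return prod(len(residues) for residues in classes)


def motif_in_c_terminus(sequence, window=100):
    """Return the first motif match within the last `window` residues, or None."""
    match = MOTIF.search(sequence[-window:])
    return match.group(0) if match else None
```

With six positions drawn from 7 residues, five from 11 and one from 12, `count_combinations(MOTIF_CLASSES)` gives 7^6 x 11^5 x 12 = 227,369,869,188, which agrees with the 2.3 x 10^11 combinations quoted above. Applied to `WILD_TYPE`, `motif_in_c_terminus` returns the α1 segment LINEISKIISAA, so the wild-type signal sequence itself satisfies both criteria.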
While this search effort represented only the beginning of a much more complicated project, it has established the common occurrence of an amphiphilic helix in protein sequences. Searches that are much more specific would be possible with more refined definitions of the C-terminal signal sequence in the future.

B.2 PerlSearch.pl

print "PerlSearch version 1.5 (03/10/98)\n";
print "This program reads a string of amino acids in regular expression format, then searches through a formatted protein database file for any exact matches. In particular, it limits the search to the C-terminus of each entry (the length is user-defined). The output includes the following:\n (1) sequences of proteins containing the string, \n (2) sequences of proteins containing the string at the C-terminus,\n (3) names of human proteins from (2), and\n (4) a summary report.\n\n";
# Command line: perl PerlSearch.pl
# Written by David Hui, 1999

print "Please enter the search sequence: ";
chomp($answer1 = <STDIN>);
if ($answer1 ne "") {
    $theseq = qq/$answer1/;
} else {
    $theseq = "[ALVIMFW][ALVIMFW][SCTNQDEKHRY][SCTNQDEKHRY][ALVIMFW][SCTNQDEKHRY][SCTNQDEKHRY][ALVIMFW][ALVIMFW][SCTNQDEKHRY][ASCTNQDEKHRY][ALVIMFW]";
}
print "Please enter the file that contains the protein sequences:";
chomp($answer5 = <STDIN>);
if ($answer5 ne "") {
    $temp = qq/$answer5/;
} else {
    $temp = "temp";
}
print "Please enter the number of amino acids (C-terminal end) to limit the search:";
chomp($answer4 = <STDIN>);
if ($answer4 ne "") {
    $Cterm = qq/$answer4/;
} else {
    $Cterm = 100;
}
print "Please enter a filename for data output:";
chomp($answer2 = <STDIN>);
if ($answer2 ne "") {
    $targets = qq/$answer2.seq/;
    $report = qq/$answer2.rpt/;
    $human = qq/$answer2.hum/;
} else {
    $targets = "default.seq";
    $report = "default.rpt";
    $human = "default.hum";
}

# find proteins with search target (from $temp to hits)
open(TEMPDATA, "<$temp") or die "Cannot open $temp: $!";
open(HITDATA, ">hits") or die "Cannot open hits: $!";
while (<TEMPDATA>) {
    chomp;
    $line = $_;
    if ($line =~ /$theseq/) {
        print HITDATA $line, "\n";
    }
}
close(TEMPDATA);
close(HITDATA);

# find proteins with search target in C-term (from hits to $targets)
open(HITDATA, "<hits") or die "Cannot open hits: $!";
open(TARGETDATA, ">$targets") or die "Cannot open $targets: $!";
$targetnumber = 0;
while (<HITDATA>) {
    chomp;
    $line = $_;
    # take the last $Cterm characters; substr avoids reusing a stale $1
    # when an entry is shorter than $Cterm
    $last100 = (length($line) > $Cterm) ? substr($line, -$Cterm) : $line;
    if ($last100 =~ /$theseq/) {
        $line =~ s/($theseq)/\{$1\}/g;    # mark each match with braces
        print TARGETDATA $line, "\n";
        $targetnumber++;
    }
}
close(HITDATA);
close(TARGETDATA);

# create summary report (from $targets to $report)
open(TARGETDATA, "<$targets") or die "Cannot open $targets: $!";
open(REPORTDATA, ">$report") or die "Cannot open $report: $!";
print REPORTDATA "\n********** PerlSearch SUMMARY REPORT **********\n\n";
while (<TARGETDATA>) {
    chomp;
    $line = $_;
    ($name, $sequence) = split(/]/, $line);
    print REPORTDATA $name, "]\n";
}
close(TARGETDATA);
close(REPORTDATA);

# search for human proteins (from $report to $human)
open(REPORTDATA, "<$report") or die "Cannot open $report: $!";
open(HUMANDATA, ">$human") or die "Cannot open $human: $!";
$humannumber = 0;
while (<REPORTDATA>) {
    chomp;
    $line = $_;
    if ($line =~ /Homo sapiens/) {
        print HUMANDATA $line, "\n";
        $humannumber++;
    }
}
close(REPORTDATA);
close(HUMANDATA);

# continue to create summary report
open(REPORTDATA, ">>$report") or die "Cannot open $report: $!";
print REPORTDATA "\nThe search sequence is:", $theseq, "\n";
print REPORTDATA "# of proteins containing search sequence: ", `cat hits | wc -l`;
print REPORTDATA "# of proteins containing search sequence in the last ", $Cterm, " amino acids:", $targetnumber, "\n";
print REPORTDATA "Out of these ", $targetnumber, " proteins, ", $humannumber, " are from humans.\n\n";
print REPORTDATA "Entries containing search sequence are stored in file: hits\n";
print REPORTDATA "Entries containing search sequence in C-terminus are stored in file:", $targets, "\n";
print REPORTDATA "Human proteins are stored in file:", $human, "\n";
print REPORTDATA "Report is stored in file:", $report, "\n";
print REPORTDATA "\nReport created:", `date`;
close(REPORTDATA);

# print summary report
open(REPORTDATA, "<$report") or die "Cannot open $report: $!";
while ($line = <REPORTDATA>) {
    print $line;
}
close(REPORTDATA);

B.3 Typical Output from PerlSearch.pl

The search sequence is: [ALVIMFW][ALVIMFW][SCTNQDEKHRY][SCTNQDEKHRY][ALVIMFW][SCTNQDEKHRY][SCTNQDEKHRY][ALVIMFW][ALVIMFW][SCTNQDEKHRY][ASCTNQDEKHRY][ALVIMFW]
# of proteins containing search sequence: 9354
# of proteins containing search sequence in last 100 nucleic/amino acids: 2634
Out of these 2634 proteins, 126 are from humans.
Entries containing search sequence are stored in file: hits
Entries containing search sequence in C-terminal are stored in file: 2c.seq
Human proteins stored in file: 2c.hum
Report stored in file: 2c.rpt
Report created: Fri Oct 2 09:18:22 PDT 1998

gi|121069|sp|P04150|GCR_HUMAN GLUCOCORTICOID RECEPTOR (GR) gi|72116|pir||QRHUGA glucocorticoid receptor, alpha splice form - human gi|31680 (X03225) alpha-glucocorticoid receptor [Homo sapiens]
gi|126885|sp|P08235|MCR_HUMAN MINERALOCORTICOID RECEPTOR (MR) gi|88157|pir||A29513 mineralocorticoid receptor - human gi|307166 (M16801) mineralocorticoid receptor [Homo sapiens]
gi|119561|sp|P11475|ERR2_HUMAN STEROID HORMONE RECEPTOR ERR2 (ESTROGEN-RELATED RECEPTOR, BETA) (ESTROGEN RECEPTOR-LIKE 2) gi|88645|pir||B29345 steroid hormone receptor ERR2 precursor - human gi|36611 (X51417) hormone receptor hERR2 (AA 1-443) [Homo sapiens]
gi|34357 (X01059) pot.
prepro LHRH [Homo sapiens] 69 Appendix gi|345751|pir|IS29027 aspartate transaminase (EC 2.6.1.1) (clone H10B1) - humandgi|179067 (M37400) aspartate aminotransferase [Homo sapiens] g i I 284043Ipir| |A41937 choroideremia-linked p r o t e i n - humandgi|339023 (M83773) TCD p r o t e i n [Homo sapiens] gi11082728|pir| |S51316 pr o s t a g l a n d i n E receptor, subtype EP3 s p l i c e form C - humandgi|2135987|pir||S68996 p r o s t a g l a n d i n E receptor, subtype EP3C - humandgi|440310 (L27488) prostanoid EP3-II receptor [Homo sapiens] g i I 345858 I p i r| |S29657 g l u t a t h i o n e t r a n s f e r a s e (EC 2.5.1.18) omega-1 chain - human (fragments)dgi|444770|prf||1908206A g l u t a t h i o n e S-transferase:ISOTYPE=omega [Homo sapiens] gi134986 (X53559) neuronal n i c o t i n i c a c e t y l c h o l i n e receptor alpha-3 subunit [Homo sapiens] gi1458657 (U01351) g l u c o c o r t i c o i d receptor alpha-2 [Homo sapiens] gi1494225 (M38116) neurofibromatosis protein type 1 [Homo sapiens] gi|2136002|pir||138750 p r o t a g l a n d i n receptor EP3D - humandgi|53274 4 (U13217) p r o t a g l a n d i n receptor EP3D [Homo sapiens] gi|1173267|sp|P4 6782|RS5_HUMAN 40S RIBOSOMAL PROTEIN S5dgi|1362935Ipir||S55916 ribosomal p r o t e i n S5 - humandgi|550021 (U14970) ribosomal p r o t e i n S5 [Homo- sapiens] gi|1317 92|sp|P20338|RB4A_HUMAN RAS-RELATED PROTEIN RAB-4Adgi|106188Ipir||E34323 GTP-binding p r o t e i n Rab4 - humandgi|550068 (M28211) GTP-binding p r o t e i n [Homo sapiens] g i I 88264|pir| |A37040 n i c o t i n i c a c e t y l c h o l i n e receptor alpha-3 chain precursor, neuronal (version 2) - humandgi|35090 (X52239) n i c o t i n i c receptor alpha-3 subunit [Homo sapiens] g i I 399022|sp|P24588|AK79_HUMAN A-KINASE ANCHOR PROTEIN 79 (AKAP 79) (CAMP-DEPENDENT PROTEIN KINASE REGULATORY SUBUNIT I I HIGH AFFINITY BINDING PROTEIN)dgi|283959Ipir||A43453 A-kinase anchor p r o t e i n 79 -humandgi|178324 (M90359) p r o t e i n kinase [Homo sapiens] 
gi|586210|sp|P38606|VATA_HUMAN VACUOLAR ATP SYNTHASE CATALYTIC SUBUNIT A, UBIQUITOUS ISOFORM (V-ATPASE 69 KD SUBUNIT) (ISOFORM VA68) / gi|291868 (L09235) ATPase [Homo sapiens]
gi|179924 (M94151) ORF1 [Homo sapiens]
gi|117609|sp|P26232|CTN2_HUMAN ALPHA-2 CATENIN (ALPHA-CATENIN RELATED PROTEIN) (ALPHA N-CATENIN) / gi|345780|pir||A45011 CAP-R protein - human / gi|179925 (M94151) cadherin-associated protein-related [Homo sapiens]
gi|118447|sp|P27707|DCK_HUMAN DEOXYCYTIDINE KINASE (DCK) / gi|105829|pir||A38585 deoxycytidine kinase (EC 2.7.1.74) - human / gi|181510 (M60527) deoxycytidine kinase [Homo sapiens]
gi|121522|sp|P01148|GONL_HUMAN GONADOLIBERIN PRECURSOR (LHRH) (LUTEINIZING HORMONE RELEASING HORMONE) (GONADOTROPIN RELEASING HORMONE) (GNRH) (LULIBERIN) / gi|1070543|pir||RHHUG gonadoliberin precursor - human / gi|31956 (X15215) gonadotropin-releasing hormone [Homo sapiens]
gi|124292|sp|P22301|IL10_HUMAN INTERLEUKIN-10 PRECURSOR (IL-10) (CYTOKINE SYNTHESIS INHIBITORY FACTOR) (CSIF) / gi|106805|pir||A38580 interleukin-10 precursor - human / gi|186271 (M57627) interleukin 10 [Homo sapiens]
gi|115442|sp|P05109|S108_HUMAN CALGRANULIN A (MIGRATION INHIBITORY FACTOR-RELATED PROTEIN 8) (MRP-8) (CYSTIC FIBROSIS ANTIGEN) (CFAG) (P8) (LEUKOCYTE L1 COMPLEX LIGHT CHAIN) (S100 CALCIUM-BINDING PROTEIN A8) / gi|625310|pir||BCHUCF calcium-binding protein MRP-8 - human / gi|34773 (X06234) MRP-8 (AA 1-93) [Homo sapiens]
gi|627555|pir||A53956 nicotinic acetylcholine receptor alpha-3 chain precursor, neuronal - human / gi|189253 (M37981) neuronal nicotinic acetylcholine receptor [Homo sapiens]
gi|189935 (M15716) progesterone receptor [Homo sapiens]
gi|553648 (M11049) erythrocyte alpha-spectrin [Homo sapiens]
gi|138547|sp|P18206|VINC_HUMAN VINCULIN / gi|340237 (M33308) vinculin [Homo sapiens]
gi|1082362|pir||A55596 FALL-39 peptide antibiotic precursor - human / gi|558379 (Z38026) FALL-39 peptide antibiotic [Homo sapiens]
gi|1168300|sp|P11230|ACHB_HUMAN ACETYLCHOLINE RECEPTOR PROTEIN, BETA CHAIN PRECURSOR / gi|560155 (X14830) acetylcholine receptor beta-subunit preprotein [Homo sapiens]
gi|2118397|pir||I39462 angiotensinogen - human (fragment) / gi|553181 (M69110) angiotensinogen [Homo sapiens]
gi|416836|sp|P23786|CPT2_HUMAN MITOCHONDRIAL CARNITINE PALMITOYLTRANSFERASE II PRECURSOR (CPT II) / gi|2134871|pir||A39018 carnitine O-palmitoyltransferase (EC 2.3.1.21) II precursor - human / gi|180989 (M58581) carnitine palmitoyltransferase [Homo sapiens]
gi|124238|sp|P27352|IF_HUMAN INTRINSIC FACTOR PRECURSOR (IF) (INF) (GASTRIC INTRINSIC FACTOR) / gi|106836|pir||A39904 gastric intrinsic factor precursor - human / gi|183186 (M63154) intrinsic factor [Homo sapiens]
gi|2144875|pir||ACHUA7 nicotinic acetylcholine receptor alpha-7 chain precursor, neuronal - human / gi|496607 (X70297) neuronal nicotinic acetylocholine receptor alpha-7 subunit [Homo sapiens]
gi|29888 (Y00278) CFAg (AA 1-94) [Homo sapiens]
gi|116365|sp|P26374|RAE2_HUMAN RAB PROTEINS GERANYLGERANYLTRANSFERASE COMPONENT A 2 (RAB ESCORT PROTEIN 2) (REP-2) (CHOROIDERAEMIA-LIKE PROTEIN) / gi|481520|pir||S38787 CHML-protein - human / gi|29943 (X64728) hCHML gene product [Homo sapiens]
gi|457737 (Z23141) cholinergic receptor, nicotinic, alpha polypeptide 7 [Homo sapiens]
gi|585593|sp|P37198|NU62_HUMAN NUCLEAR PORE GLYCOPROTEIN P62 (NUCLEOPORIN P62) / gi|432654 (X58521) nucleoporin p62 [Homo sapiens]
gi|130894|sp|P06401|PRGR_HUMAN PROGESTERONE RECEPTOR (PR) / gi|35652 (X51730) progesterone receptor (AA 1-933) [Homo sapiens]
gi|132559|sp|P04843|RIB1_HUMAN DOLICHYL-DIPHOSPHOOLIGOSACCHARIDE--PROTEIN GLYCOSYLTRANSFERASE 67 KD SUBUNIT PRECURSOR (RIBOPHORIN I) / gi|88566|pir||A26168 ribophorin I precursor - human / gi|36053 (Y00281) precursor [Homo sapiens]
gi|472993 (X76562) intrinsic factor [Homo sapiens]
gi|106337|pir||S11660 hypothetical protein - human / gi|36747 (X57637) choroideramia(tapetochoroidal dystrophy) gene [Homo sapiens]
gi|586141|sp|P37286|UBCH_HUMAN UBIQUITIN-CONJUGATING ENZYME E2-21 KD (UBIQUITIN-PROTEIN LIGASE) (UBIQUITIN CARRIER PROTEIN) (UBCH2) (E2-20K) / gi|631492|pir||A53516 ubiquitin-conjugating enzyme UbcH2 - human / gi|1363983|pir||JC4308 ubiquitin--protein ligase (EC 6.3.2.19) - mouse / gi|474827 (Z29328) Ubiquitin-conjugating enzyme UbcH2 [Homo sapiens]
gi|224609|prf||1109236A luliberin precursor [Homo sapiens]
gi|225541|prf||1305349A cystic fibrosis antigen [Homo sapiens]
gi|228505|prf||1805212A vinculin [Homo sapiens]
gi|1173455|sp|P43330|SMD2_HUMAN SMALL NUCLEAR RIBONUCLEOPROTEIN SM D2 (SNRNP CORE PROTEIN D2) (SM-D2) / gi|2136168|pir||I38861 small nuclear ribonucleoprotein chain D2 - human / gi|600748 (U15008) Sm D2 [Homo sapiens]
gi|1706745|sp|P49913|FA39_HUMAN ANTIBACTERIAL PROTEIN FALL-39 PRECURSOR (FALL-39 PEPTIDE ANTIBIOTIC) (ANTIMICROBIAL PROTEIN CAP-18) (LL-37) / gi|2118420|pir||S58023 CAP-18 protein precursor - human / gi|2134863|pir||S66281 CAP-18 protein precursor - human / gi|643477 (U19970) CAP18 precursor [Homo sapiens]
gi|1170596|sp|P45985|MPK4_HUMAN JNK ACTIVATING KINASE 1 (C-JUN N-TERMINAL KINASE KINASE 1) (JNKK) (MAP KINASE KINASE 4) / gi|2135523|pir||I38901 JNK-activating protein kinase - human / gi|685176 (L36870) MAP kinase kinase 4 [Homo sapiens]
gi|1705750|sp|P53567|CEBG_HUMAN CCAAT/ENHANCER BINDING PROTEIN GAMMA (C/EBP GAMMA) / gi|1363931|pir||JC4243 transcription CCAAT enhancer binding protein-gamma - human / gi|727294 (U20240) C/EBP gamma [Homo sapiens]
gi|1345629|sp|P49335|BRN4_HUMAN BRAIN-SPECIFIC HOMEOBOX/POU DOMAIN PROTEIN 4 / gi|732756 (X82324) Brain 4 [Homo sapiens]
gi|741337|prf||2007246A nicotinic acetylcholine receptor alpha7 [Homo sapiens]
gi|1710117|sp|P98171|RGC1_HUMAN RHO-GAP HEMATOPOIETIC PROTEIN C1 (P115) (KIAA0131) / gi|840786 (X78817) p115 [Homo sapiens]
gi|1082258|pir||A56716 biphenyl hydrolase-related protein (EC 3.4.21.-) - human / gi|984663 (X81372) biphenyl hydrolase-related protein [Homo sapiens]
gi|2833244|sp|Q13304|GPRH_HUMAN PUTATIVE G PROTEIN-COUPLED RECEPTOR GPR17 / gi|992700 (U33447) putative G-protein-coupled receptor [Homo sapiens]
gi|987948 (Z46973) phosphatidylinositol 3-kinase [Homo sapiens]
gi|1706663|sp|P54762|EPB1_HUMAN EPHRIN TYPE-B RECEPTOR 1 PRECURSOR (TYROSINE-PROTEIN KINASE RECEPTOR EPH-2) (NET) / gi|1100112 (L40636) protein tyrosine kinase [Homo sapiens]
gi|1107696 (X86691) Mi-2 protein [Homo sapiens]
gi|1125077 (U40583) alpha 7 neuronal nicotinic acetylcholine receptor [Homo sapiens]
gi|2498265|sp|Q16619|CTF1_HUMAN CARDIOTROPHIN-1 (CT-1) / gi|1151148 (U43033) cardiotrophin-1 [Homo sapiens]
gi|1717975|sp|P52758|UK14_HUMAN 14.5 KD TRANSLATIONAL INHIBITOR PROTEIN (P14.5) (UK114 ANTIGEN HOMOLOG) / gi|1177435|gnl|PID|e1240168 (X95384) 14.5 kDa translational inhibitor protein, p14.5 [Homo sapiens]
gi|1239957 (L38487) estrogen receptor-related protein [Homo sapiens]
gi|1279457|gnl|PID|e223411 (X95715) anthracycline resistance associated protein [Homo sapiens]
gi|1302660 (U52112) C1 p115 [Homo sapiens]
gi|2506350|sp|P30038|PUT2_HUMAN DELTA-1-PYRROLINE-5-CARBOXYLATE DEHYDROGENASE PRECURSOR (P5C DEHYDROGENASE) / gi|1353248 (U24266) pyrroline-5-carboxylate dehydrogenase [Homo sapiens]
gi|1419374 (U05572) alpha-mannosidase [Homo sapiens]
gi|1458112 (U62432) nicotinic acetylcholine receptor alpha3 subunit precursor [Homo sapiens]
gi|2506127|sp|P36544|ACH7_HUMAN NEURONAL ACETYLCHOLINE RECEPTOR PROTEIN, ALPHA-7 CHAIN PRECURSOR / gi|1458120 (U62436) nicotinic acetylcholine receptor alpha7 subunit precursor [Homo sapiens]
gi|1469185|gnl|PID|d1010122 (D50921) The KIAA0131 gene product is novel. [Homo sapiens]
gi|2833282|sp|Q16854|DGK_HUMAN DEOXYGUANOSINE KINASE PRECURSOR (DGUOK) / gi|1477482 (U41668) deoxyguanosine kinase [Homo sapiens]
gi|1480198|gnl|PID|e242999 (X97386) deoxyguanosine kinase [Homo sapiens]
gi|1504022|gnl|PID|d1013899 (D86974) KIAA0220 [Homo sapiens]
gi|1531665 (U67934) 44.9 kDa protein C18B11 homolog [Homo sapiens]
gi|1658374 (U68567) lysosomal acid alpha-mannosidase [Homo sapiens]
gi|1663700|gnl|PID|d1014125 (D87682) similar to a C.elegans protein encoded in cosmid T26A5. [Homo sapiens]
gi|1665785|gnl|PID|d1014079 (D87448) Similar to S.pombe rad4+/cut5+product (A40727) [Homo sapiens]
gi|2506125|sp|P32297|ACH3_HUMAN NEURONAL ACETYLCHOLINE RECEPTOR PROTEIN, ALPHA-3 CHAIN PRECURSOR / gi|1702908|gnl|PID|e274570 (Y08418) nicotinic acetylcholine receptor alpha3 subunit precursor [Homo sapiens]
gi|1770741|gnl|PID|e274386 (Y07759) mysoin heavy chain 12 [Homo sapiens]
gi|1770742|gnl|PID|e274344 (Y07759) mysoin heavy chain 12 [Homo sapiens]
gi|1813544 (U70426) A28-RGS14p [Homo sapiens]
gi|1827521|pdb|2ILK| Crystal Structure Of Human Interleukin-10 At 1.6 Angstroms Resolution Cytokine Mol_id: 1; Molecule: Interleukin-10; Chain: Null; Engineered: Yes / gi|1827645|pdb|1INR| Cytokine Synthesis Cytokine Mol_id: 1; Molecule: Interleukin-10; Chain: Null; Synonym: Cytokine Synthesis Inhibitory Factor, Csif; Engineered: Yes / gi|2905618 (AF043333) interleukin 10 [Homo sapiens]
gi|1854035 (U86753) pombe Cdc5-related protein [Homo sapiens]
gi|1914775|gnl|PID|e310108 (Y11999) inositol 1,4,5-trisphosphate 3-kinase [Homo sapiens]
gi|1929901 (L77889) lysosomal protein [Homo sapiens]
gi|2077999|gnl|PID|e315281 (Y13056) single-chain Fv fragment [Homo sapiens]
gi|2114188|gnl|PID|d1020739 (D86096) alternative splicing; pEPR-Ia [Homo sapiens]
gi|2114191|gnl|PID|d1020742 (D86096) alternative splicing [Homo sapiens]
gi|2114192|gnl|PID|d1020743 (D86096) alternative splicing [Homo sapiens]
gi|2114193|gnl|PID|d1020744 (D86096) alternative splicing [Homo sapiens]
gi|2114195|gnl|PID|d1020746 (D86096) EP3-III [Homo sapiens]
gi|2114196|gnl|PID|d1020747 (D86096) EP3-IV [Homo sapiens]
gi|3122374|sp|O00754|MA2B_HUMAN LYSOSOMAL ALPHA-MANNOSIDASE PRECURSOR (MANNOSIDASE, ALPHA B) (LYSOSOMAL ACID ALPHA-MANNOSIDASE) / gi|2209015 (U60899) lysosomal alpha-mannosidase [Homo sapiens]
gi|2245630 (AF005887) ATF6 [Homo sapiens]
gi|2257762 (U87589) polymerase [Homo sapiens]
gi|2257764 (U87590) polymerase [Homo sapiens]
gi|2257766 (U87591) polymerase [Homo sapiens]
gi|2257768 (U87592) polymerase [Homo sapiens]
gi|2257770 (U87593) polymerase [Homo sapiens]
gi|2257773 (U87595) polymerase [Homo sapiens]
gi|2465729 (AF022385) TFAR15 [Homo sapiens]
gi|2564247|gnl|PID|e1154174 (Y08685) serine palmitoyltransferase, subunit I [Homo sapiens]
gi|2605640 (U94829) retinally abundant regulator of G-protein signaling hRGS-r [Homo sapiens]
gi|2887417|gnl|PID|d1025773 (AB007881) KIAA0421 [Homo sapiens]
gi|2887435|gnl|PID|d1025784 (AB007892) KIAA0432 [Homo sapiens]
gi|2662167|gnl|PID|d1024620 (AB007903) KIAA0443 [Homo sapiens]
gi|2687819|gnl|PID|e322207 (Y12546) P2Y-like G-protein coupled receptor [Homo sapiens]
gi|2695876|gnl|PID|e312307 (Z94155) P2Y-like G-protein coupled receptor [Homo sapiens]
gi|2739208 (AF037333) Eph-like receptor tyrosine kinase hEphB1c [Homo sapiens]
gi|2754698 (U63630) DNA-PKcs [Homo sapiens]
gi|2920582 (AF034956) RAD51D [Homo sapiens]
gi|2950156|gnl|PID|e1256374 (X78121) choroidermia, Rab geranylgeranyltransferase component A (REP-1) [Homo sapiens]
gi|2960069|gnl|PID|e321296 (Y12777) acyl-CoA synthetase-like protein [Homo sapiens]
gi|3059061|gnl|PID|e1286841 (AL022401) dJ93L7.1 (RAB Escort protein 1 (REP-1, RAB proteins geranylgeranyltransferase component A 1, Choroideraemia protein, Tapetochoroidal Dystrophy (TCD) protein) [Homo sapiens]
gi|3088336|gnl|PID|d1026749 (AB007149) ribosomal protein S5 [Homo sapiens]
gi|3107925|gnl|PID|d1026854 (AB013341) Trad [Homo sapiens]
gi|3158351 (AF030555) acyl-CoA synthetase 4 [Homo sapiens]
gi|3176762 (AF030339) VESPR [Homo sapiens]
gi|3292970|gnl|PID|e1310002 (Y15572) R51H3 [Homo sapiens]
gi|3327890|gnl|PID|d1032710 (AB016225) TRAD-d3 [Homo sapiens]
gi|3335518 (AF058291) estrogen-related receptor gamma [Homo sapiens]
gi|3342402 (AF068623) mineralocorticoid receptor [Homo sapiens]
gi|3413850|gnl|PID|d1033251 (AB007913) KIAA0444 protein [Homo sapiens]
gi|3413888|gnl|PID|d1033270 (AB007932) KIAA0463 protein [Homo sapiens]
gi|3417295 (AC003007) KIAA0220 [Homo sapiens]
gi|3522867 (U60266) lysosomal alpha-mannosidase [Homo sapiens]
gi|3659504 (AC005084) guanylate kinase; similar to Q64250 (PID:g2497499) [Homo sapiens]
gi|3659901 (AF092124) F1F0-type ATP synthase subunit g [Homo sapiens]
