MOLECULAR GENETICS OF BLOOD COAGULATION FACTOR X

by

MARION R. FUNG

A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

in

THE FACULTY OF GRADUATE STUDIES

Department of Biochemistry

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA

January, 1988

© Marion R. Fung, 1988

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

The University of British Columbia
1956 Main Mall
Vancouver, Canada V6T 1Y3

ABSTRACT

Thirty thousand colonies of a bovine liver cDNA library were screened with a mixture of synthetic oligodeoxyribonucleotides coding for bovine factor X. Five positive colonies were identified, and plasmid DNA was isolated. Cleavage with restriction endonucleases showed that these plasmids (designated pBX1-5) contained inserts of 1530 bp, 770 bp, 700 bp, 1100 bp, and 930 bp. DNA sequence analysis of the plasmid with the largest insert (pBX1) confirmed that bovine factor X cDNAs had been cloned. The cDNA sequence predicts that factor X is synthesized as a single-chain precursor in which the light and heavy chains of plasma factor X are linked by the dipeptide Arg-Arg. The cDNA sequence also predicts that factor X is synthesized with a preproleader peptide. It is proposed that at least five specific proteolytic events occur during the conversion of preprofactor X to plasma factor Xa.
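The probe mixture used in this screen is shown in Figure 4 and is not reproduced here. As a general illustration of why a mixture, rather than a single oligonucleotide, is needed to detect a cDNA predicted only from protein sequence, the following Python sketch counts how many DNA sequences could encode a short peptide under the standard genetic code, and scans a protein for the stretch of lowest degeneracy. It is not the procedure used in this work; the function names and peptides are hypothetical.

```python
# Sketch: size of a fully degenerate oligonucleotide mixture covering every
# DNA sequence that could encode a peptide. Codon counts are those of the
# standard genetic code (61 sense codons). Names here are hypothetical.
CODONS_PER_RESIDUE = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2,
    "G": 4, "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2,
    "P": 4, "S": 6, "T": 4, "W": 1, "Y": 2, "V": 4,
}


def probe_degeneracy(peptide: str) -> int:
    """Number of distinct coding sequences for `peptide` (one-letter codes)."""
    total = 1
    for residue in peptide.upper():
        total *= CODONS_PER_RESIDUE[residue]
    return total


def least_degenerate_window(protein: str, length: int) -> tuple[int, str, int]:
    """Return (start, window, degeneracy) for the stretch of `length`
    residues whose reverse translation needs the smallest probe mixture."""
    if not 0 < length <= len(protein):
        raise ValueError("window length must be between 1 and len(protein)")
    hits = (
        (start, protein[start:start + length],
         probe_degeneracy(protein[start:start + length]))
        for start in range(len(protein) - length + 1)
    )
    # min() returns the first window among ties.
    return min(hits, key=lambda hit: hit[2])


if __name__ == "__main__":
    # Hypothetical peptides, not factor X sequences.
    print(probe_degeneracy("MWDKFY"))             # 1*1*2*2*2*2 = 16
    print(least_degenerate_window("LLMWKLL", 3))  # (2, 'MWK', 2)
```

The sketch treats every codon as equally likely and ignores codon usage bias, GC content, and oligonucleotide length, all of which also bear on practical probe design.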
A human liver cDNA library was screened by colony hybridization with a bovine factor X cDNA probe. Three of the positive plasmids contained overlapping DNA that coded for most of human factor X mRNA. A second human liver cDNA library was screened by in situ hybridization with 32P-labeled human factor X cDNA clones obtained from the first screen. Several clones containing longer inserts were isolated. DNA sequence analysis of these clones allowed the prediction of the amino acid sequence of the precursor form of human plasma factor X. From these studies, it is predicted that human factor X is synthesized as a single polypeptide chain precursor in which the light and heavy chains of plasma factor X are linked by the tripeptide Arg-Lys-Arg. The cDNA sequence also predicts that human factor X is synthesized as a preproprotein having an amino-terminal leader peptide of 40 amino acid residues. A comparison of the amino acid sequences of human and bovine factor X shows high sequence identity around the calcium-binding and catalytic regions but low sequence identity in the nonfunctional regions. A human genomic phage library was screened with a human factor X cDNA as a hybridization probe. Thirty-two overlapping phage clones were isolated. Characterization of six of these clones indicates that over 32 Kbp of contiguous sequence is represented. DNA sequence and restriction map analysis shows that the factor X gene is composed of at least 8 exons and 7 introns. No clones representing the 5' untranslated region and the prepeptide of the leader sequence were identified. Two further genomic phage libraries and two libraries specific for the 5' region of the factor X gene were screened, but no 5' end clones were obtained. Restriction enzyme mapping and Southern blot analysis indicate that the portion of the human factor X gene characterized thus far spans 24 Kbp of the human genome.
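The comparison of human and bovine factor X above is expressed in terms of sequence identity. A minimal sketch of that calculation, assuming the two sequences have already been aligned with gaps written as "-", is given below in Python; a sliding-window profile shows how identity can be high in some regions and low in others. Programs differ in the denominator they use (alignment length, length of the shorter sequence, or columns not gapped in both sequences); this sketch uses the last convention, which is an assumption rather than the method of this thesis. Function names and test strings are hypothetical.

```python
# Sketch: percent identity of two pre-aligned sequences, and a
# sliding-window identity profile. Gaps are written as "-".

def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Identical positions as a percentage of aligned columns.
    Columns gapped in both sequences are skipped; a residue aligned
    against a gap counts as a mismatch."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap and b == gap:
            continue
        compared += 1
        if a == b:  # both-gap columns were skipped, so a match is a residue
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


def identity_profile(seq_a: str, seq_b: str, window: int) -> list[float]:
    """Percent identity in each run of `window` consecutive aligned columns."""
    return [
        percent_identity(seq_a[i:i + window], seq_b[i:i + window])
        for i in range(len(seq_a) - window + 1)
    ]


if __name__ == "__main__":
    # Hypothetical toy alignment, not human or bovine factor X.
    a, b = "ACDEFGHIKL", "ACDEYGHVKL"
    print(percent_identity(a, b))     # 80.0
    print(identity_profile(a, b, 5))  # [80.0, 80.0, 80.0, 60.0, 60.0, 80.0]
```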
Comparison of the factor X gene with other vitamin K-dependent blood coagulation factor genes reveals homologous exon organization. Among the blood coagulation serine proteases, factor X, factor IX, factor VII, and protein C form a closely related gene family.

TABLE OF CONTENTS

Abstract
Table of Contents
List of Tables
List of Figures
List of Abbreviations
Acknowledgements
I. Introduction
  A. Process of Blood Coagulation
    1. Role in Haemostasis
    2. Pathway of Blood Coagulation
    3. Initiation Processes
    4. Fibrin Clot Formation
    5. Regulatory Pathways
    6. Fibrinolysis
  B. Biochemistry of Factor X
    1. Isolation and Activation
    2. Enzymatic Functions
    3. Regulation of Activity
  C. Structure of Factor X
    1. Plasma Factor X
    2. Posttranslational Modifications
    3. Functional Domains
    4. Three Dimensional Structure
    5. Single Chain Factor X
  D. Family of Blood Coagulation Factors
    1. Trypsin-Like Family of Serine Proteases
    2. Functional Protein Homologies
  E. Eukaryotic Gene Structure
    1. Promoters
    2. Exons and Introns
    3. Transcription and Processing
  F. Evolution of Protein and Gene Sequences
    1. Molecular Clock
  G. Mechanisms of Gene Evolution
    1. Gene Duplication
    2. Gene Fusion
    3. Exon Shuffling
    4. Intron Insertion and Sliding
  H. Genetics of the Serine Proteases
    1. Genes
    2. Genetics
    3. Variation in Gene Structure
  I. Evolution of the Serine Protease Genes
II. Materials and Methods
  A. Materials
  B. Strains, Vectors, and Media
    1. Bacterial Strains
    2. Vectors
    3. Media
  C. Gel Electrophoresis
    1. Agarose Gels
    2. Polyacrylamide Gels
  D. Isolation of DNA
    1. Isolation of Plasmid DNA
    2. Isolation of Phage Lambda DNA
    3. Isolation of Single-Stranded Phage DNA
  E. Isolation of Poly A+ RNA
  F. Oligonucleotide Synthesis and Purification
  G. Labeling of DNA
    1. Nick Translation of DNA
    2. Klenow Labeling of DNA
    3. Klenow Labeling of Single-Stranded DNA
    4. End Labeling of Oligonucleotides
  H. Screening Plasmid Libraries
  I. Screening Phage Lambda Libraries
    1. In Situ Hybridization Screening
    2. Recombinant Screening
  J. Construction of Specific Genomic Phage Lambda Libraries
    1. Isolation of Genomic DNA
    2. Modification of Genomic DNA Fragments
    3. Ligation and Packaging of Genomic DNA
  K. DNA Subcloning
    1. Production of DNA Fragments For Ligation
    2. Ligation of DNA Into pUC13, M13, and πAN7 Vectors
    3. Transformation of DNA Into Bacteria
  L. Blot Hybridization
    1. Southern Blot Analysis
    2. Northern Blot Analysis
  M. DNA Sequence Analysis
    1. Production of M13 Clones
    2. DNA Sequence Analysis
    3. Computer Analysis of DNA Sequence Data
III. Results
  A. Characterization of the Bovine Factor X cDNA
    1. Isolation and Characterization of the Bovine Factor X cDNA Clones
    2. DNA Sequence Analysis of Bovine Factor X cDNA Clones
  B. Characterization of the Human Factor X cDNA
    1. Isolation and Characterization of the Human Factor X cDNA Clones
    2. DNA Sequence Analysis of Human Factor X cDNA Clones
    3. 5' End of the Human Factor X cDNA
    4. Northern Blot Analysis of the Human Factor X mRNA
  C. Characterization of the Human Factor X Gene
    1. Southern Blot Analysis of the Human Factor X
    2. Isolation and Characterization of the Factor X Gene Genomic Clones
    3. Localization of the Intron/Exon Junctions
    4. Nucleotide Sequence of the Human Factor X Gene
    5. Screening For the 5' End of the Factor X Gene
      i. Screening General Genomic Libraries
      ii. Southern Blot Analysis of the 5' End of the Factor X Gene
      iii. Construction of Specific Genomic Libraries
      iv. Summary of Screening Conditions and Genomic Clone Analysis
  D. Human Factor X Genetics
    1. Chromosomal Localization
    2. Restriction Fragment Length Polymorphisms
IV. Discussion
  A. Characterization of the Bovine Factor X cDNA
    1. Characterization of the Bovine Factor X cDNA Clones
    2. Predicted Amino Acid Sequence of Bovine Factor X
  B. Characterization of the Human Factor X cDNA
    1. Characterization of the Human Factor X cDNA Clones
    2. Size Analysis of the Human Factor X mRNA
    3. Predicted Amino Acid Sequence of Human Factor X
  C. Precursor Form of Factor X
    1. Factor X Is Synthesized as a Single Chain Precursor
    2. Preprofactor X
    3. Processing of the Factor X Precursor
    4. Homologous Structural Domains
  D. Comparison of Bovine and Human Factor X
  E. Characterization of the Human Factor X Gene
    1. DNA Sequence Analysis of the Human Factor X Gene
    2. 5' Region of the Factor X Gene
    3. Factor X Gene Structure
  F. Genetics of Human Factor X
    1. Analysis of Chromosome Loci
    2. RFLP Studies
  G. Evolution of the Factor X Gene
  H. Comparison of the Genes For the Vitamin K-Dependent Blood Coagulation Factor Genes
    1. Organization of the Genes For the Vitamin K-Dependent Clotting Factors
    2. Leader and Gla Regions
    3. Epidermal Growth Factor Homologous Domains
    4. Serine Protease Domain
  I. Evolution of the Vitamin K-Dependent Coagulation Factors
V. Literature Cited
VI. Appendix I

LIST OF TABLES

I. Colony and In Situ Hybridization Conditions
II. DNA Sequencing Mixes
III. Nucleotide Sequence of Intron/Exon Junctions in the Factor X Gene
IV. Frequencies of Nucleotides at Intron/Exon Junctions
V. Size of Exons and Introns in the Human Factor X Gene
VI. Summary of Screen Results For the 5' End of the Factor X Gene
VII. Segregation of Human Sequences Homologous to the cDNA Encoding Factor 10 in Human-Hamster Hybrids
VIII. Restriction Fragment Length Polymorphisms in the Human Factor X Gene

LIST OF FIGURES

1. The Blood Coagulation Cascade
2. Schematic Representation of Bovine Factor X
3. Amino Acid Sequence Homologies in Coagulation Factor Zymogens
4. Synthetic Oligonucleotide Mixture For Bovine Factor X
5. Autoradiograph of Bovine Liver cDNA Library Screened With a Mixture of Synthetic Oligodeoxyribonucleotides Coding For Bovine Factor X
6. Restriction Map and Sequencing Strategy For Bovine Factor X cDNA
7. Nucleotide Sequence of Bovine Factor X cDNA
8. Restriction Map and Sequencing Strategy For Human Factor X cDNA
9. Nucleotide Sequence of Human Factor X cDNA
10. Nucleotide Sequence of the Leader Peptide and Partial 5' Untranslated Region of the Human Factor X cDNA
11. Northern Blot Analysis of Human Factor X mRNA
12. Southern Blot Analysis of the Human Factor X Gene
13. Partial Restriction Map and Intron/Exon Organization of the Factor X Gene
14. DNA Sequencing Strategy For the Human Factor X Gene
15. Partial DNA Sequence of the Human Factor X Gene
16. Probes Used in Rescreening For the 5' End of the Factor X Gene
17. Southern Blot Analysis of the 5' Region of the Factor X Gene
18. In Situ Hybridization Studies of the Factor X Gene
19. Precursor Form of Human Factor X
20. Comparison of Basic Amino Acid Linkages in Precursor Molecules
21. Comparison of the Leader Sequence of the Vitamin K-Dependent Blood Coagulation Factors
22. Comparison of the Amino Acid Sequences of Bovine and Human Factor X
23. Intron Sites in the Human Factor X Molecule
24. Comparison of the Genes Coding For the Vitamin K-Dependent Clotting Factors
25. Comparison of the Leader Peptide and Gla Exons of the Vitamin K-Dependent Genes
26. Comparison of the Epidermal Growth Factor Homologous Domains of the Blood Coagulation Protease Genes
27. Comparison of the Exon Organization of the Serine Protease Domain
28. A Model For the Evolution of the Vitamin K-Dependent Coagulation Factors
29. Screen of Phage Library Using Recombination With the Vector πAN7

LIST OF ABBREVIATIONS

A       Adenosine
ATP     Adenosine Triphosphate
bp      Base Pair(s)
BSA     Bovine Serum Albumin
Ca++    Calcium Ions
Ci      Curies
cpm     Counts Per Minute
CsCl    Cesium Chloride
dNTP    Deoxyribonucleoside Triphosphate
ddNTP   Dideoxyribonucleoside Triphosphate
DNA     Deoxyribonucleic Acid
DNase   Deoxyribonuclease
DTT     Dithiothreitol
EDTA    Ethylenediaminetetraacetic Acid
EGF     Epidermal Growth Factor
EtBr    Ethidium Bromide
G       Guanosine
Gla     Gamma-Carboxyglutamic Acid
GuHCl   Guanidine Hydrochloride
IPTG    Isopropyl-Beta-D-Thiogalactopyranoside
Kbp     Kilobase Pair(s)
Krpm    Thousand Revolutions Per Minute
LB      Luria Broth
mA      Milliamps
mRNA    Messenger RNA
N       Any Nucleoside (G, A, T, or C)
OD      Optical Density
PEG     Polyethylene Glycol
pfu     Plaque Forming Unit
R       Purine (A or G)
RF      Replicative Form
RFLP    Restriction Fragment Length Polymorphism
RNA     Ribonucleic Acid
RNase   Ribonuclease
SAM     S-Adenosylmethionine
SDS     Sodium Dodecyl Sulphate
T       Thymidine
TEMED   N,N,N',N'-Tetramethylethylenediamine
TLC     Thin Layer Chromatography
Tris    Tris(hydroxymethyl)aminomethane
tRNA    Transfer RNA
u       Unit(s)
U       Uridine
V       Volts
W       Watts
X-Gal   5-Bromo-4-Chloro-3-Indolyl-Beta-D-Galactopyranoside
Y       Pyrimidine (T or C)

ACKNOWLEDGEMENTS

I would like to thank the members of my supervisory committee, Drs. Grant Mauk and Peter Candido, and Dr. Rob McMaster for their continued support over the years. To the members of the lab: I appreciate the generosity of advice and laughter. Family and friends, I thank for their understanding and support. And to Ross... thank you for the inspiration, guidance, encouragement, friendship, humor, coffee, and especially for the deep appreciation and enjoyment of science that you inspire in others.

I. INTRODUCTION

A. PROCESS OF BLOOD COAGULATION

1. ROLE IN HAEMOSTASIS

One of the remarkable features of the circulatory system is its ability to respond rapidly to trauma.
Haemostasis is the physiological process by which blood flow is maintained and blood loss is prevented (Guyton, 1981). Haemostasis is achieved through the interaction of the endothelial cells lining the blood vessels, the blood platelets, and a group of plasma proteins called the clotting factors. First, immediately upon rupture of a blood vessel, the vascular wall contracts to minimize fluid loss. Second, platelets aggregate at the damaged tissue to form an initial physical plug that closes the vascular lesion. If the injury is minor, this platelet plug is sufficient to stop haemorrhaging. If the trauma is severe, activator substances released from the disrupted endothelial cells and the attached platelets trigger the blood clotting process. The fibrin threads produced adhere to the loose platelet plug, forming a tight, congealed seal. Finally, growth of fibrous tissue into the blood clot and dissolution of the fibrin plug complete the repair of the endothelial cells and close the vascular wall permanently (Guyton, 1981).

2. PATHWAY OF BLOOD COAGULATION

Of the mechanisms involved in haemostasis, the most extensively studied and best understood is the blood coagulation process (Jackson and Nemerson, 1980; Davie et al., 1979). The system is complex: at least 15 proteins interact in a series of discrete, simpler stages. The model of the blood coagulation process presented here (Figure 1) is a modification of the classical cascade mechanism originally proposed by MacFarlane (1964) and Davie and Ratnoff (1964). The advantages of this mechanism are twofold: 1) it results in a rapid amplification of the initial activating signal, and 2) the multi-stage system inherently possesses numerous points for feedback regulation (MacFarlane, 1964; Zur and Nemerson, 1981). The blood coagulation cascade consists of two pathways, the extrinsic and intrinsic systems, which converge on a common pathway at the activation of factor X (Figure 1).
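The first advantage, amplification, can be made concrete with an idealized calculation: if each active enzyme molecule at one stage activates g molecules of the next zymogen, the overall amplification after several stages is the product of the per-stage gains. The Python sketch below assumes fixed, hypothetical gains and ignores inhibitors, cofactor requirements, and zymogen depletion, so it illustrates the arithmetic of a cascade rather than the kinetics of blood coagulation.

```python
# Idealized multi-stage enzyme cascade. Assumption: each active molecule
# at one stage activates a fixed number ("gain") of zymogen molecules at
# the next stage; inhibitors, cofactors, and zymogen depletion are ignored.
from math import prod


def cascade_amplification(gains: list[float], initial: float = 1.0) -> list[float]:
    """Active molecules present after each stage, starting from `initial`."""
    counts = [initial]
    for gain in gains:
        counts.append(counts[-1] * gain)
    return counts


if __name__ == "__main__":
    hypothetical_gains = [10, 10, 10, 10]  # illustrative only, not measured values
    print(cascade_amplification(hypothetical_gains))  # [1.0, 10.0, 100.0, 1000.0, 10000.0]
    print(prod(hypothetical_gains))                   # overall gain: 10000
```

Because the stage gains multiply, even modest gains per step yield a large overall response, and each step is also a separate point at which the output can be damped.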
The extrinsic pathway responds more rapidly to trauma (Jackson and Nemerson, 1980), but the components of both pathways must be functional for normal haemostasis, suggesting that the two initiating mechanisms are interdependent (Zur and Nemerson, 1981). However, individuals lacking the plasma proteins that participate in the initial phases of the intrinsic pathway, such as factor XII or prekallikrein, are often asymptomatic (Bloom, 1981). Each enzymatic step of the blood coagulation cascade involves the conversion of an inactive precursor, or zymogen, to its active serine protease form by limited proteolysis of one or more specific peptide bonds (Davie et al., 1979). Factors VIIa, IXa, Xa, XIa, XIIa, and thrombin represent the active protease forms of the enzymes. Factor VIII and factor V act as cofactors, accelerating the specific enzymatic reactions catalyzed by factor IXa and factor Xa, respectively (Figure 1). Both factor V (Nesheim and Mann, 1979) and factor VIII (Hultin and Nemerson, 1978) require activation prior to function. The other cofactors, high molecular weight (HMW) kininogen and tissue factor, play central roles in 'contact activation' (Griffin, 1981) and in initiation of the extrinsic pathway (Zur and Nemerson, 1981), respectively (Figure 1). The reactions catalyzed by the vitamin K-dependent clotting factors (factors VIIa, IXa, and Xa and thrombin, as well as activated protein C and protein S) require calcium ions and phospholipids. Mediated by calcium ions, these coagulation factors interact with the phospholipid membrane released by the platelets, thus concentrating the plasma proteins at the site of injury (Jackson and Nemerson, 1980).

FIGURE 1: THE BLOOD COAGULATION CASCADE

Outline of the mammalian blood coagulation cascade, with the intrinsic pathway (left) and the extrinsic pathway (right) converging at the activation of factor X to factor Xa, and ending with the formation of the insoluble fibrin clot.
Anticoagulant protein C acts in a regulatory role as shown (bottom left). Bars represent the polypeptide chains (drawn proportional to polypeptide chain length), with molecular weights indicated below. Interchain disulphide bridges are indicated by lines between the two polypeptides. Cross-linked fibrin represents the cross-linked fibrin clot formed by the action of factor XIIIa; TM represents thrombomodulin (modified from Neurath, 1984).

[Figure 1 diagram: surface, HMW kininogen, kallikrein, activated protein C, prothrombin, factor Va with Ca2+ and phospholipid, thrombin, fibrinogen, fibrin, and cross-linked fibrin]

3. INITIATION PROCESSES

Fibrin clot formation is initiated by one of two mechanisms. Following vascular injury, the extrinsic pathway is stimulated by the expression of tissue factor on damaged monocytes and endothelial cells and by exposure of the subendothelial layer. Tissue factor, an integral membrane protein (Spicer et al., 1987), accelerates the activation of factor X by factor VIIa 16,000-fold (Jackson and Nemerson, 1980). A 1:1 stoichiometric complex is formed between factor VIIa and tissue factor (Bach et al., 1987; Ploplis et al., 1987). Factor X/Xa binds to the complex transiently as a substrate, or more avidly in the presence of the extrinsic pathway inhibitor (Broze and Miletich, 1987; Rao and Rapaport, 1987). The close proximity of the three proteins is responsible for the greatly enhanced rate of activation. Factor VIIa is the only protease of the extrinsic pathway. Zymogen factor VII exhibits minimal activity, but its proteolytically generated derivative, factor VIIa, has up to 85-fold greater coagulant activity (Radcliffe and Nemerson, 1975).
Several clotting factors activate factor VII by specific proteolysis, including factor XIIa and kallikrein (Kisiel et al., 1977; Radcliffe et al., 1977), thrombin (Radcliffe and Nemerson, 1975), factor Xa (Radcliffe and Nemerson, 1975), and factor IXa (Masys et al., 1982). In the intrinsic pathway, the initiation process is not well established. 'Contact activation' was originally discovered through the exposure of plasma to negatively charged surfaces such as kaolin or glass (Margolis, 1957). In vivo, charged surfaces released from disrupted tissues provide the contact system. Initiation involves factor XII, factor XI, prekallikrein, and HMW kininogen, but the sequence of activation is unresolved (see Griffin, 1981). Surface binding of factor XII induces conformational changes in the protein, rendering it susceptible to cleavage by kallikrein in the presence of HMW kininogen (Griffin, 1978). Alternatively, factor XII is slowly cleaved by activated factor XII present in plasma (see Burgess and Esnouf, 1985). Mandle and Kaplan (1977) noted that the major activator of prekallikrein is factor XIIa, which led to the proposal of a 'reciprocal proteolysis loop' that greatly accelerates the activation process (Griffin and Cochrane, 1976; Griffin, 1981). Activated factor XII then cleaves HMW kininogen-bound factor XI, initiating the intrinsic cascade (Griffin, 1981).

4. FIBRIN CLOT FORMATION

Polymerization of fibrin monomers produces the foundation of the fibrin clot (Doolittle, 1981a,b). The precursor, fibrinogen, consists of three pairs of polypeptide chains, (Aα)₂, (Bβ)₂, and (γ)₂, linked together by 29 disulfide bridges (Doolittle, 1981b). Electron microscopy revealed rod-like structures consisting of nodular terminal and central domains (Hall and Slayter, 1959). Each half of the molecule is made up of one Aα, one Bβ, and one γ chain forming a triple helix. Thrombin releases fibrinopeptides A and B from the Aα and Bβ chains by specific proteolysis.
Binding sites are thereby exposed within the central nodule, and non-covalent bonds form with a region of the terminal domains of adjacent fibrin molecules (Doolittle, 1981b). The resulting polymerization is spontaneous. Intermediate polymers are then interwoven laterally to form a fully developed fibrin clot (Doolittle, 1981a,b). The network is further stabilized by γ-chain cross-linking catalyzed by the transglutaminase factor XIIIa (Curtis, 1981).

5. REGULATORY PATHWAYS

The multi-stage blood coagulation cascade provides many opportunities for negative regulation (Jackson and Nemerson, 1980; Davie et al., 1979). The normal balance of the coagulation system is maintained by two major mechanisms: 1) neutralization of activated proteases by plasma protein inhibitors (Collen, 1981) and 2) proteolytic degradation of active species (Esmon, 1987). The major protease inhibitor of blood coagulation is antithrombin III, a protein of broad specificity that inhibits all the coagulation serine proteases with the exception of factor VIIa (Collen, 1981; Jesty, 1978). The inhibitor forms a tight, inactive, 1:1 stoichiometric complex with the active site of the enzymes in a reaction that is markedly accelerated by heparin (Jesty, 1978). In a similar manner, other plasma protease inhibitors form inhibitory complexes with the blood clotting proteases (Collen, 1981). Alpha2-macroglobulin binds thrombin, kallikrein, and plasmin, as well as pancreatic enzymes such as trypsin. Alpha1-inactivator is the major regulator of the proteases involved in 'contact activation', including factor XIIa, factor XIa, kallikrein, and the fibrinolytic enzyme plasmin. Finally, alpha1-antitrypsin may play a role in factor XIa inactivation. Protein C and protein S regulate thrombin formation by inactivating two of the accessory proteins, factor V and factor VIII (Esmon, 1987). Protein C circulates in plasma as the zymogen of a serine protease.
In plasma, protein C is a poor substrate for thrombin; in the presence of the endothelial cell surface protein thrombomodulin, however, protein C is rapidly cleaved by thrombin to form activated protein C (Owen and Esmon, 1981). Once activated, protein C interacts with protein S on membrane surfaces and rapidly inactivates factor Va by limited proteolysis (Esmon, 1987). Because factor Va is required for efficient conversion of prothrombin to thrombin, the activated protein C-protein S complex acts as an anticoagulant. It has been proposed that membrane-bound thrombomodulin interacts with protein C through calcium bridges (Esmon, 1987).

6. FIBRINOLYSIS

To aid vascular repair and to avoid obstruction of normal blood flow, fibrin clots are dissolved by fibrinolysis. Three pathways initiate clot dissolution: 1) the intrinsic system, comprising the 'contact phase' proteins factor XII, prekallikrein, and HMW kininogen; 2) the extrinsic system, stimulated by trauma-induced release of plasminogen activator from tissue or the vascular wall; and 3) the exogenous system, involving the therapeutic agent urokinase (Collen, 1981; Gaffney, 1981). Each of these pathways converges at the activation of plasminogen by specific hydrolysis to give plasmin. Plasmin binds fibrin with high affinity and cleaves 50-60 specific lysyl and arginyl bonds within the fibrin molecule (Collen, 1981). Structurally, most of these peptide bonds lie in the rod-like, coiled segments of the fibrin polymer (Doolittle, 1981a,b). Proteolysis generates soluble fragments composed of the three cross-linked nodular domains. The structure of the fibrin network thus permits easy access and rapid dissolution of the fibrin clot while requiring a minimal number of proteolytic events (Doolittle, 1981a,b).

B. BIOCHEMISTRY OF FACTOR X

1. ISOLATION AND ACTIVATION

The intrinsic and extrinsic clotting pathways converge at the activation of factor X (see Figure 1).
In the intrinsic pathway, factor IXa in the presence of factor VIIIa, calcium ions, and phospholipid cleaves a single peptide bond in factor X (Fujikawa et al., 1975). This limited proteolysis converts factor X from an inactive zymogen to the active serine protease factor Xa (Figure 2). In the extrinsic pathway, the same peptide bond in factor X is cleaved by factor VIIa in the presence of tissue factor, calcium ions, and phospholipid (Fujikawa et al., 1975). Factor X is also activated by various other proteases, including trypsin and a protease from Russell's viper venom (Fujikawa et al., 1972b; Jesty and Esnouf, 1973; Jesty and Nemerson, 1973). Factor X (8 µg/ml in plasma; Bajaj and Mann, 1973), also known as the Stuart (Hougie et al., 1957) or Prower (Telfer et al., 1956) factor, was first described in individuals with bleeding disorders. The plasma protein has since been highly purified and characterized from bovine (Jackson and Hanahan, 1968; Fujikawa et al., 1972a; Jackson, 1972; Esnouf et al., 1973), pig (Dupe and Howell, 1973), rat (Graves et al., 1982; Willingham and Matschiner, 1984), and human (Aronson et al., 1969; Rosenberg et al., 1975; Kosow, 1976; Vician and Tischkoff, 1976; DiScipio et al., 1977) plasma.

FIGURE 2: SCHEMATIC REPRESENTATION OF BOVINE FACTOR X

A schematic representation of the structure of bovine factor X (Titani et al., 1975; Enfield et al., 1980) and its mechanism of activation (Fujikawa et al., 1975). The large arrows refer to the major activation pathway. The cleavage sites are as indicated. The three residues His-93, Asp-138, and Ser-235 constitute the active-site catalytic triad. The molecular weights of the protein species are given in parentheses. Gamma-carboxyglutamic acid residues are represented by Y, glycosylated residues are denoted by diamonds, the factor IXa and factor VIIa cleavage site is indicated by the large open arrow, and the autocatalytic processing site is represented by the small open arrow.

[Figure 2 diagram: factor Xaα (45,300)]
Fractionation of bovine factor X by barium sulfate adsorption and ion exchange chromatography yields two forms of the protein, designated X1 and X2 (Jackson and Hanahan, 1968; Fujikawa et al., 1972a). Both species possess similar chemical and biological properties. The only distinguishing feature is a sulfated tyrosine residue near the amino-terminal end of the heavy chain of the factor X2 variant (Morita and Jackson, 1979). As discussed above, factor X is activated to the serine protease factor Xa by cleavage of a single peptide bond. In a second, slower reaction, factor Xa (also designated factor Xaα) is converted to factor Xaβ by hydrolysis of an Arg-Gly peptide bond in the carboxy-terminal region of the protein (Fujikawa et al., 1975) (Figure 2). This autocatalytic reaction requires calcium ions and phospholipids but is unrelated to the activation process (Jesty et al., 1974). The degradation product exhibits no loss of coagulant activity (Fujikawa et al., 1975); thus the carboxy-terminal 17 amino acids of bovine factor X are not essential for physiological function.

2. ENZYMATIC FUNCTIONS

The primary enzymatic function of factor Xa is the conversion of prothrombin to thrombin, in the presence of the cofactors calcium, phospholipids, and factor Va, via two specific proteolytic events (Jackson, 1981). The initial cleavage releases a 33.5 KD glycopeptide intermediate possessing no coagulant activity (Esmon and Jackson, 1974). The second cleavage generates two-chain thrombin with full activity but without further peptide loss (Jesty and Esnouf, 1973). The proteolytic action of factor Xa is not limited to prothrombin; factor Xa also activates factor VII through a positive feedback loop (Radcliffe and Nemerson, 1975, 1976).
Initial hydrolysis of factor VII by factor Xa generates factor VIIa, which activates the extrinsic pathway in vitro; however, prolonged exposure to factor Xa results in the formation of a three-chain molecule with esterase activity but no proteolytic activity (Radcliffe and Nemerson, 1975, 1976). Therefore, the activation of factor X via the extrinsic pathway may be viewed as self-terminating. In vitro, factor V, factor VIII, and factor Xa itself act as substrates for factor Xa (Davie et al., 1979; Jackson, 1984).

3. REGULATION OF ACTIVITY

Regulation of the coagulation reactions in which factor X participates is complex. Three mechanisms control the proteolytic activity of factor Xa: 1) the interrelationship between the reactants in activation-complex formation, including cofactor and surface membrane binding (see Jackson, 1984); 2) inactivation of cofactor proteins by activated protein C, both at the stage of factor X activation and at the stage of prothrombin activation by factor Xa (see Esmon, 1987); and 3) association with an irreversible protease inhibitor (see Jackson and Nemerson, 1980). Regulation is further complicated by the positive feedback of factor Xa on factor VII. The third mechanism may act more as a 'sink' for dissociated proteases than as a regulator. Antithrombin III is a serine protease inhibitor of several of the blood coagulation enzymes, including factor Xa (Jesty, 1978; Kurachi et al., 1976). Although antithrombin III appears to neutralize free protease effectively, the accessibility of factor Xa to the inhibitor is severely restricted when the enzyme is bound either to factor Va and phospholipids (Marciniak, 1973) or to platelets (Miletich et al., 1978). Thus, antithrombin III may inhibit only free enzymes in plasma, removing 'discarded' proteins, rather than regulating their action in complex form (see Jackson and Nemerson, 1980).
In vitro, soybean trypsin inhibitor and trypsin are commonly used inhibitory agents of factor Xa (Jackson and Hanahan, 1968; Fujikawa et al., 1972b).

C. STRUCTURE OF FACTOR X

1. PLASMA FACTOR X

Plasma factor X has been fully characterized in both the bovine (Fujikawa et al., 1972a; Jackson and Hanahan, 1968; Jackson, 1972; Esnouf et al., 1973) and human (Aronson et al., 1969; Rosenberg et al., 1975; Kosow, 1976; Vician and Tiskoff, 1976; DiScipio et al., 1977) species. Circulating zymogen factor X occurs in plasma as a two-chain protein (Figure 2). This feature is unique among the blood coagulation proteases but is shared by the regulatory enzyme protein C (see Esmon, 1987). Bovine factor X is a glycoprotein with a molecular weight of 55 KD, consisting of a light chain (16.5 KD) and a heavy chain (39.3 KD) linked by a disulfide bond (Fujikawa et al., 1972a; Jackson, 1972). The human counterpart has a somewhat higher molecular weight (59 KD), which is largely attributable to a higher carbohydrate content and an amino-terminal extension of the heavy chain (DiScipio et al., 1977). In both species, activation occurs at an Arg-Ile bond, releasing a 10-14 KD fragment from the amino terminus of the heavy chain (Fujikawa et al., 1972a, 1974; Jesty et al., 1974; DiScipio et al., 1977) (Figure 2). The complete protein sequence of bovine factor X has been reported (Enfield et al., 1980; Titani et al., 1975), as has the amino acid sequence of the light chain of human factor X (McMullen et al., 1983a). Comparison of the light chains shows an overall homology of 70% between the two species (McMullen et al., 1983a).

2. POSTTRANSLATIONAL MODIFICATIONS

After synthesis and prior to secretion from liver cells, the blood coagulation factors are extensively modified to facilitate their specialized functions (Burgess and Esnouf, 1985). In the case of factor X, three modifications are required.
Glycosylation is associated exclusively with the heavy chain (DiScipio et al., 1977; Jackson and Hanahan, 1968) and occurs at Asn-36 (N-linked) and Thr-300 (O-linked) in the bovine molecule (Titani et al., 1975). Both carbohydrate residues are removed following activation and autocatalytic cleavages (Fujikawa et al., 1975). In the amino-terminal region, several glutamic acid residues of factor X are converted to gamma-carboxyglutamic acid (Gla) residues (Burgess and Esnouf, 1985; Jackson and Nemerson, 1980). Gla residues were first identified in prothrombin (Dam et al., 1936; Stenflo et al., 1974) and subsequently in the other vitamin K-dependent plasma proteins: factors VII, IX, and X and proteins C, S, and Z (Suttie, 1985). Gla modification follows glycosylation (Swanson and Suttie, 1985) and is catalyzed by a vitamin K-dependent carboxylase located in the endoplasmic reticulum of liver cells (Suttie, 1985). Gla residues are necessary for the biological activity of these clotting factors. Carboxylation may be impaired either by the absence of the essential vitamin or by inhibition of the carboxylation process by coumarin drugs (e.g., warfarin). Under such conditions, Gla-less factor X is activated at a minimal rate (Burgess and Esnouf, 1985). Other vitamin K-dependent proteins have been discovered in various tissues, most notably in bone (Pan and Price, 1985; Pan et al., 1985). A similar carboxylase appears to be present in both liver and bone cells, and vitamin K-dependent proteins of unknown function are produced in most tissues (Suttie, 1985). In each of the vitamin K-dependent proteins except prothrombin, an aspartic acid residue is converted to beta-hydroxyaspartic acid in a posttranslational reaction (Asp-63 in the case of factor X) (Fernlund and Stenflo, 1983; McMullen et al., 1983b). The catalytic mechanism involved is unknown (McMullen et al., 1983a,b).
Conserved arginine positions and a single tyrosine/phenylalanine residue (Stenflo et al., 1987) may form a specific recognition site for this posttranslational hydroxylation (McMullen et al., 1983a,b; Esmon et al., 1983). The functional significance of the modification is not clear, but recent speculation suggests that the beta-hydroxyaspartic acid residue may bind calcium independently of the Gla domain (McMullen et al., 1983b; Esmon et al., 1983; Cook et al., 1987). Studies of Gla-less factor X correlate the presence of a single beta-hydroxyaspartic acid with the binding of a single calcium ion (Sugo et al., 1984). Adjacent aspartic acid residues may combine with the modified residue to form a calcium-binding site (Cook et al., 1987).

3. FUNCTIONAL DOMAINS

Factor X is a complex protein composed of multiple structural domains, each of which is essential for the physiological action of the protein. The N-terminal 40 amino acids of the light chain constitute the Gla domain of factor X (McMullen et al., 1983a; Enfield et al., 1980). Gla residues are required for the calcium-dependent interaction between the protease and a negatively charged phospholipid surface, which is essential for activation of the zymogen (Suttie, 1985). The mechanism of calcium action remains controversial. Some postulate that the Gla residues bind calcium ions, thus forming an ionic bridge between factor X and the phospholipid membranes at the site of injury (Stenflo and Suttie, 1977). Others propose that calcium binding induces one or possibly two conformational changes in the protein that favor dimerization or binding to acidic lipid vesicles, but that the primary calcium-binding site is located elsewhere on the molecule (Borowski et al., 1986). Still others propose multiple classes of binding sites exhibiting both positive and negative cooperativity (Jackson and Brenkle, 1980).
The Gla domain is followed by two repeats of approximately 40 amino acids each that are homologous to epidermal growth factor (EGF) (Doolittle, 1984). The EGF-like domains contain three conserved glycine residues as well as six cysteines, which form three structurally distinctive disulfide bridges. Similar regions exist in the clotting proteins factors VII, IX, and XII, protein C, and protein S, and in plasminogen activator and urokinase (Doolittle et al., 1984; Cool et al., 1985; Stenflo et al., 1987). The posttranslationally modified beta-hydroxyaspartate residue has been identified in the first EGF element of factors VII, IX, and X and proteins C and S (aspartate residues 63, 64, 63, 71, and 95, respectively) (Fernlund and Stenflo, 1983; McMullen et al., 1983b; Stenflo et al., 1987). In addition, EGF homologies have recently been found in such diverse proteins as the 19 KD protein from vaccinia virus (Blomquist et al., 1984; Reisner, 1985), the LDL receptor (Sudhoff et al., 1985a,b), thrombomodulin, and complement protein C1r (see Stenflo et al., 1987). Whether this structural unit has any binding affinity similar to that of epidermal growth factor is unclear (Doolittle et al., 1984). However, immuno-isolation of the EGF region of protein C identified a Gla-independent, high-affinity calcium-binding site, possibly associated with the beta-hydroxyaspartic acid in the first EGF homology (Stenflo et al., 1987; Ohlin and Stenflo, 1987). The heavy chain of factor X contains the serine protease region essential for proteolytic activity. The His, Asp, and Ser residues of the charge relay network found in trypsinogen, chymotrypsin, and proelastase occupy similar positions in factor X and the other blood coagulation proteases, indicating a common evolutionary ancestor (Jackson and Nemerson, 1980; Stroud, 1974). In factor X, the catalytic triad consists of His-93, Asp-138, and Ser-235 (Titani et al., 1985).
In each case except factor VII, enzymatic activity is triggered by cleavage of an Arg-X bond and loss of a peptide fragment from the N-terminus of the zymogen (Davie et al., 1979). No peptide is lost during the proteolysis of factor VII. The resulting conformational change induces activity. The highest degree of sequence similarity is observed in the activation recognition region and around the active-site serine (Davie et al., 1979). The active-site region, Gly-Asp-Ser-Gly-Gly-Pro, is conserved in each protease (see Jackson and Nemerson, 1980). However, in contrast to the digestive serine proteases, factor X and the other blood clotting proteases display a more limited substrate specificity.

4. THREE DIMENSIONAL STRUCTURE

To date, factor X has not been crystallized in a form suitable for X-ray diffraction studies. The structure of part of fragment 1 of prothrombin has been elucidated (Park and Tulinsky, 1986). It is postulated that several stacked aromatic residues in the C-terminal part of the Gla domain may form a recognition site for the carboxylase. By inference, the corresponding amino acids in factor X may have a similar function. Attempts to determine the three-dimensional structure of epidermal growth factor by X-ray crystallography have been unsuccessful. However, solution structures of human EGF have been analyzed by NMR and extrapolated to the first, more homologous, EGF-like domain of factors IX and X (Cook et al., 1987). The model indicates two distinct domains within the homologous region: the N-terminal domain contains the first two disulfide bridges and the C-terminal domain contains the third. These units adopt a triple-stranded beta-sheet conformation that places the aspartic acid and beta-hydroxyaspartic acid residues close together on one face of the sheet, where they could readily form a calcium-binding site (Cook et al., 1987).
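The conserved active-site segment Gly-Asp-Ser-Gly-Gly-Pro noted above can be located in any protease sequence with a simple motif scan. The Python sketch below assumes one-letter amino acid codes (GDSGGP). The function name and the example fragment are hypothetical, chosen only for illustration; the fragment is not a real factor X sequence. Positions are counted from the first residue of whatever sequence is supplied, so they will match a published residue number (such as Ser-235 of the factor X heavy chain) only if the same numbering convention is used.

```python
import re
from typing import List

# Conserved active-site segment Gly-Asp-Ser-Gly-Gly-Pro in one-letter code.
ACTIVE_SITE_MOTIF = "GDSGGP"


def find_active_site_serines(protein: str) -> List[int]:
    """Return the 1-based position of the motif serine for each GDSGGP occurrence."""
    positions = []
    # A zero-width lookahead reports every match start, including overlapping ones.
    for match in re.finditer(f"(?={ACTIVE_SITE_MOTIF})", protein.upper()):
        # Ser is the third residue of the motif: 0-based index start + 2, i.e. 1-based start + 3.
        positions.append(match.start() + 3)
    return positions


# Hypothetical fragment for illustration only (not a real factor X sequence).
print(find_active_site_serines("CAGYKGDSGGPHVT"))  # [8]
```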
As discussed, the catalytic heavy chain of factor X, like those of the other clotting proteases, shares sequence homology with trypsin (Jackson and Nemerson, 1980; Davie et al., 1979). This homology to the pancreatic enzymes has allowed the development of computer models of the three-dimensional structures of factor IXa, factor Xa, thrombin (Furie et al., 1982), and factor XIIa (Cool et al., 1985), based upon the known tertiary structures of chymotrypsin and trypsin (see Stroud (1974) for a review). The models depict each serine protease as an enzyme with a highly individual, charged surface enveloping a strictly conserved active-site core. A conserved catalytic mechanism between the pancreatic and coagulation proteases is implied by the invariant three-dimensional structure of the His-Asp-Ser catalytic triad, whereas the unique substrate specificity of the clotting factors is thought to be defined by structural differences in the substrate-binding pockets and on the surfaces of the proteins (Furie et al., 1982).

5. SINGLE CHAIN FACTOR X

Factor X and protein C circulate in plasma as two-chained molecules (Jackson and Hanahan, 1968; Fujikawa et al., 1972a). However, prothrombin and the other homologous serine proteases are found in plasma as single polypeptide chains. This has led several investigators to propose that factor X is synthesized as a single-chain precursor comprising both the light and heavy chains (see Jackson and Nemerson, 1980). Studies from several laboratories have shown that factor X is synthesized by rat (Graves et al., 1982; Willingham and Matschiner, 1984) and human (Rosenberg et al., 1975; Fair and Bahnak, 1984) hepatoma cells as a single-chain precursor (molecular weight 63 KD in the rat). After secretion into the tissue culture medium, the single-chain form is converted to the two-chain form found in plasma, but the nature of this conversion was not established in these studies.
However, rapid purification from bovine plasma yielded an increased proportion of the single-chain component (Mattock and Esnouf, 1973), suggesting that the proteolytic enzyme(s) required for the conversion are present in plasma. Recent evidence indicates that approximately 10% of the total factor X antigen in normal rat plasma is in the single-chain form (Willingham and Matschiner, 1984), raising the possibility of a physiological role for the factor X precursor.

D. FAMILY OF BLOOD COAGULATION FACTORS

1. TRYPSIN-LIKE FAMILY OF SERINE PROTEASES

The family of serine proteases encompasses a wide range of proteins, including not only those involved in digestion and coagulation but also those required for active immunity, ovum fertilization, neuropeptide processing, and fibrinolysis (Neurath and Walsh, 1976). The evolution and diversification of the serine proteases raise intriguing questions. Their functions are extremely diverse, yet it is commonly accepted that they descend from a single ancestral gene (Stroud, 1974). With the exception of the bacterial subtilisins, the serine proteases studied in greatest detail show similarities in amino acid sequence and, where known, in three-dimensional structure that indicate homology (Neurath, 1984). Through both protein and cDNA analysis, the amino acid sequences of the majority of the blood coagulation proteases have been determined, permitting comparison of their sequences and structural homologies, as illustrated in Figure 3 (Young et al., 1978; Hewett-Emmett et al., 1981). The catalytic regions of the blood coagulation factors share approximately 40% amino acid identity with trypsinogen and with each other (Hewett-Emmett et al., 1981) (Figure 3). This sequence homology suggests that the zymogens are activated in a manner similar to the pancreatic zymogens (Kraut, 1977).
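Identity figures such as the 70% between the bovine and human light chains and the roughly 40% between catalytic regions are percentages of identical residues over an alignment. The sketch below assumes that the two sequences have already been aligned, with gaps written as '-', and that gapped columns are left out of the denominator. This is only one of several conventions and is not necessarily the one used in the studies cited. The function name and the example sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences ('-' marks a gap).

    Identical residues are counted over columns where neither sequence has a gap;
    other conventions (e.g. dividing by the shorter sequence length) give other values.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # skip gapped columns
        compared += 1
        if x == y:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# Toy alignment: 4 ungapped columns, 3 of them identical.
print(percent_identity("IV-GG", "IVAGA"))  # 75.0
```

Excluding gapped columns keeps the value from depending on alignment length. It also means that two sequences can appear highly identical over a short overlap, so the number of compared positions is worth reporting together with the percentage.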
In addition to the common catalytic region, all the clotting serine proteases appear to have acquired unique amino-terminal extensions (Jackson and Nemerson, 1980), which have important roles in the regulation and activation of the proteases.

FIGURE 3: AMINO ACID SEQUENCE HOMOLOGIES IN COAGULATION FACTOR ZYMOGENS

Comparison of the structures of coagulation and fibrinolytic zymogens to trypsinogen. The solid bar represents the catalytic region in the proteases, the cross-hatched region represents the Gla region, K represents the kringles, E represents regions homologous to the epidermal growth factor precursor, I and II represent regions homologous to the type I and type II homologies of fibronectin, and A represents the homologous regions found in factor XI and prekallikrein. The lengths of the bars are approximately proportional to the lengths of the polypeptide chains. Arrows indicate the locations of peptide bonds that are cleaved during activation of the zymogens. Solid lines below the proteins represent disulfide bridges and do not necessarily indicate their true locations.

[Figure 3 diagram: domain maps of prothrombin, factor VII, factor IX, factor X, protein C, factor XI, prekallikrein, factor XII, plasminogen, tissue-type plasminogen activator, urokinase, trypsinogen, and protein S]

2. FUNCTIONAL PROTEIN HOMOLOGIES

Comparison of the N-terminal extensions of the blood coagulation proteins shows several regions of homology (Figure 3). The seven vitamin K-dependent coagulation proteins owe their name to the Gla domain at their amino termini, whose formation requires vitamin K.
The calcium-binding region of 10-13 Gla residues is present in factor X (McMullen et al., 1983; Enfield et al., 1980; Titani et al., 1975), prothrombin (Butkowski et al., 1977; Waltz et al., 1977; Magnusson et al., 1975; MacGillivray and Davie, 1984; Degen et al., 1983), factor IX (Katayama et al., 1979; Kurachi and Davie, 1982; Jaye et al., 1983; McGraw et al., 1985), factor VII (Hagen et al., 1986), protein C (Fernlund and Stenflo, 1982; Stenflo and Fernlund, 1982; Long et al., 1984; Foster and Davie, 1984; Beckmann et al., 1985), and the cofactor protein S (Dahlback et al., 1986; Lundwall et al., 1986; Hoskin et al., 1987). The seventh protein, protein Z (Hojrup et al., 1985), has been isolated from plasma but as yet has no defined function. Its structure is not shown in Figure 3, but its homology with the blood coagulation proteases extends to the Gla domain, the two EGF units, a single beta-hydroxyaspartic acid residue, and the serine protease domain. However, like the hemoglobin-binding protein haptoglobin (Maeda et al., 1984), protein Z has no proteolytic activity, because two of the three essential catalytic residues have been substituted (Hojrup et al., 1985). In prothrombin, the Gla domain is followed by two homologous regions of approximately 80 amino acids, named kringles by Magnusson et al. (1975). Each of these structures contains three characteristic disulfide bonds; they are represented by K in Figure 3. Although the three-dimensional structure of one kringle was reported recently (Park and Tulinsky, 1986), the function of the kringles in prothrombin is unclear. Kringle units have also been identified in the fibrinolytic proteases plasminogen (Sottrup-Jensen et al., 1978), tissue-type plasminogen activator (Pennica et al., 1983), and urokinase plasminogen activator (Verde et al., 1984), and in the coagulation protein factor XII (Cool et al., 1985; McMullen and Fujikawa, 1985).
In tissue plasminogen activator, kringle 2 has been reported to bind fibrin in vitro (van Zonneveld et al., 1986). With the exception of prothrombin, each of the vitamin K-dependent clotting factors contains multiple copies of the region homologous to epidermal growth factor (region E, Figure 3). In each case, an aspartic acid residue in one of the EGF regions is converted to beta-hydroxyaspartic acid in a posttranslational reaction (Fernlund and Stenflo, 1983; McMullen et al., 1983b). In addition, the last three EGF-like domains of protein S contain beta-hydroxyasparagine residues (Stenflo et al., 1987). The amino acid sequence of the carboxy-terminal region of protein S is unique among the blood coagulation proteins (Figure 3) but is homologous to sex hormone-binding globulin (Edenbrand et al., 1987; Dahlback et al., 1987). The absence of a protease region in protein S is consistent with its role as a binding protein. Type I and type II domains found in the adhesive plasma protein fibronectin (Peterson et al., 1983; Kornblihtt et al., 1985) have been identified in factor XII (Cool et al., 1985). Tissue-type plasminogen activator possesses only the type II homology (Pennica et al., 1983). Given the homology with fibronectin, the type I and type II domains may have fibrin-binding capacity (Peterson et al., 1983). Finally, factor XI (Fujikawa et al., 1986) and prekallikrein (Chung et al., 1986) share a common structural repeat unit (A in Figure 3) that is unique among the serine proteases; its function is unknown.

E. EUKARYOTIC GENE STRUCTURE

1. PROMOTERS

Eukaryotic genes transcribed by RNA polymerase II are regulated by sequences located predominantly 5' to the coding sequence (Breathnach and Chambon, 1981). The upstream control elements of these genes fall into three classes. The 'TATA' box fixes the start site of transcription approximately 30 bp downstream of its own position (Corden et al., 1980).
A second regulatory region, the upstream element, comprises a broad class of sequences located at variable distances from the transcription start site; these appear to be important in determining the level of transcription. They include G-C rich elements and 'CCAAT' boxes (McKnight and Kingsbury, 1982). Finally, enhancer elements can stimulate transcription from either upstream or downstream of the initiation site, at considerable distances, and in either orientation (Reudelhuber, 1984; Charnay et al., 1984; Wright et al., 1984). Enhancers have been postulated to be tissue-specific modulators of transcription for some mammalian genes (Reudelhuber, 1984).

2. EXONS AND INTRONS

A distinctive feature of many eukaryotic genes is the mosaic arrangement of exons and introns from which the primary transcript is formed (Breathnach and Chambon, 1981). Exons contain the sequences that are retained in the mature mRNA (Gilbert, 1978). Before the mRNA can be translated, intronic sequences must be spliced out of the primary transcript (Breathnach and Chambon, 1981). Both intron and exon lengths vary greatly, but total intron size is reported to be a function of total exon size (Naora and Deacon, 1982). At the nucleotide sequence level, all introns in protein-coding genes have well-defined conserved sequences at their 5' and 3' boundaries (Sharp, 1981; Mount, 1982). These sequences are recognized by small nuclear RNAs (Keller and Noon, 1984; Cech, 1983), splicing intermediates are formed (Grabowski et al., 1984), and the introns are accurately excised by splicing proteins (Breathnach and Chambon, 1981). No internal intronic signals are required for correct excision (van Santen and Spritz, 1985), although a minimum intron size of about 80 bp is necessary, possibly because of constraints imposed by the splicing mechanism (Wieringa et al., 1984).

3. TRANSCRIPTION AND PROCESSING

In prokaryotes, messenger RNAs generally are primary transcripts and thus are exact copies of chromosomal DNA sequences (Perry, 1976). In eukaryotes, however, a primary transcript becomes an mRNA only after a series of modifications, including: 1) 5' capping, 2) cleavage to form a new 3' terminus followed by polyadenylation, 3) splicing out of intronic sequences, 4) base methylation, and 5) transport of the mRNA from the nucleus to the cytoplasm (Nevins, 1983). Translation is initiated at an AUG triplet, although conserved sequences on its 5' side may be required for its recognition (Kozak, 1984). The 5' cap marks the initiation site of transcription (Nevins, 1983), helps stabilize the transcript, markedly enhances translation (Shatkin, 1985), and plays a significant role in efficient splicing of the message (Grabowski et al., 1985). Transcriptional termination in eukaryotes is heterogeneous and ill-defined (Birnstiel et al., 1985). Mature 3' end formation and subsequent polyadenylation occur 11-30 bp downstream of a conserved AAUAAA sequence (Proudfoot and Brownlee, 1976). This sequence and its flanking sequences (Gil and Proudfoot, 1984; Benoit et al., 1980; Berget, 1984) are required for 3' end formation, but not for polyadenylation (Montell et al., 1983; Wickens and Stephenson, 1984). Base methylation occurs early in mRNA processing, and the methylated bases are retained throughout processing, suggesting that these modifications are important for message formation (Nevins, 1983). Finally, the fully processed mRNA is selectively transported from the nucleus to the cytoplasm, where translation occurs (Nevins, 1983).

F. EVOLUTION OF PROTEIN AND GENE SEQUENCES

1. MOLECULAR CLOCK

A phylogeny of species can be established by determining the rate of evolution of a conserved protein family in different species (Zuckerkandl and Pauling, 1965; Wilson et al., 1977; Li et al., 1985).
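Under a molecular clock, the fraction of amino acid sites that differ between two species can be converted into an approximate divergence time. The Python sketch below rests on textbook assumptions that are not taken from this thesis. Substitutions are treated as a Poisson process, with the correction d = -ln(1 - p) for multiple hits at the same site. The rate is assumed to be constant and equal in both lineages, and rate variation among sites is ignored. The function names and the rate value are hypothetical, and the constraint-dependent rate differences discussed in this section mean that any such estimate is only approximate.

```python
import math


def poisson_distance(p: float) -> float:
    """Poisson-corrected substitutions per site, given the observed fraction p of differing sites."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must lie in [0, 1)")
    # Corrects for multiple substitutions hitting the same site.
    return -math.log(1.0 - p)


def divergence_time(differences: int, sites: int, rate_per_site_per_year: float) -> float:
    """Approximate years since two species diverged, assuming a constant rate in both lineages."""
    d = poisson_distance(differences / sites)
    # Substitutions accumulate independently along both lineages, hence the factor of 2.
    return d / (2.0 * rate_per_site_per_year)


# Hypothetical numbers: 50 differences in 100 aligned sites, assumed rate 1e-9 per site per year.
print(f"{divergence_time(50, 100, 1e-9):.3g}")  # 3.47e+08
```

The factor of 2 appears because both species have been accumulating substitutions since the split. If the rate had been calibrated on a single lineage, the same distance would correspond to twice the time.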
Comparative studies of protein structure established a relationship between the number of amino acid substitutions and the time elapsed since two species diverged (Zuckerkandl and Pauling, 1965; Wilson et al., 1977). Proteins behave like approximate evolutionary clocks: amino acid replacements accumulate at fairly steady rates over long periods of evolutionary time (Zuckerkandl and Pauling, 1965; Wilson et al., 1977). This 'molecular clock' reflects the steady and random mutation rate of DNA (Wilson, 1985; Li et al., 1985). However, the rate of evolution along a DNA molecule is not entirely uniform. Changes accumulate slowly where the protein is under strong functional constraints and faster where there are few (Wilson et al., 1977; Wilson, 1985; Li et al., 1985). Furthermore, differences in environmental pressures alter the rate of evolution (Wilson et al., 1977). Once the function of part or all of a protein changes, the selective pressures change and the evolutionary rate may change again (Wilson et al., 1977). Identifying regions of a protein that are well conserved between species therefore allows functionally important domains within the molecule to be detected.

G. MECHANISMS OF GENE EVOLUTION

1. GENE DUPLICATION

Gene duplication events create families of structurally and functionally similar proteins (Doolittle, 1985; Edged et al., 1983). During evolution, functions have diverged in some ancient gene families, such as the lysozyme-lactalbumin (Hall et al., 1982) and immunoglobulin (Hood et al., 1985) genes. In other families, such as the serine protease genes (Young et al., 1978; Hewett-Emmett et al., 1981), the central function has been maintained. Structurally, the serine protease gene family has diversified and expanded in size (Hewett-Emmett et al., 1981; Patthy, 1985; Doolittle, 1985).
Internal or partial gene duplication events generate larger proteins (Li, 1983) through repeated insertions of protein domains with similar amino acid sequences and three-dimensional structures (McLachlan, 1979). For example, the five kringles of plasminogen (Figure 3) may have arisen through internal duplications of the original fourth kringle (Kurosky et al., 1980).

2. GENE FUSION

Proteins are often mosaics composed of parts fused together from several different sources (Doolittle, 1985). Gene fusions act synergistically with duplication events to add complexity and length to simpler gene structures (Doolittle, 1985). Examples include the complement protein C1q and the cathepsin-like protease found in chicken (Doolittle, 1985).

3. EXON SHUFFLING

Gilbert (1978) hypothesized that exons correspond to functional domains of protein structures. In genomic DNA, the exons are separated by intronic sequences. Recombination within introns could reassort these units, thereby giving rise to novel proteins built from segments of existing, unrelated molecules (Gilbert, 1985; Blake, 1983a,b). Through this exon-shuffling mechanism, families of related genes can acquire new and independent functions (Rogers, 1985), as observed in the immunoglobulin (Hood et al., 1979) and serine protease genes (Neurath, 1984, 1985).

4. INTRON INSERTION AND SLIDING

Variations in protein structure often coincide with the positions of intron/exon junctions (Craik et al., 1983). Small insertions or deletions of 2 to 17 amino acids are attributed to intron sliding. This occurs when a new splice donor or acceptor sequence is created within an exon or intron and replaces the pre-existing site. No deletion or insertion is observed in the gene; however, a different mRNA product is created. In this process, only movements that maintain the translational reading frame can be tolerated.
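The reading-frame constraint on intron sliding follows from simple arithmetic. Moving a splice junction by n nucleotides changes the mRNA by n nucleotides, and only multiples of three preserve the frame downstream. A shift of 6 to 51 nucleotides therefore corresponds to the 2 to 17 amino acid changes quoted above. The Python sketch below is a simplification under stated assumptions. The function name is hypothetical, and the sketch ignores effects such as a new stop codon, or a changed codon spanning the new junction, which an in-frame shift may also produce.

```python
def splice_shift_effect(shift_nt: int) -> str:
    """Effect on the encoded protein of moving a splice junction by shift_nt nucleotides.

    Positive values add sequence to the mRNA; negative values remove it.
    Simplification: ignores new stop codons and changes to the codon spanning the junction.
    """
    if shift_nt == 0:
        return "no change"
    if shift_nt % 3 != 0:
        return "frameshift"  # every downstream codon would be read out of frame
    residues = abs(shift_nt) // 3
    kind = "insertion" if shift_nt > 0 else "deletion"
    return f"in-frame {kind} of {residues} amino acid(s)"


print(splice_shift_effect(6))    # in-frame insertion of 2 amino acid(s)
print(splice_shift_effect(-51))  # in-frame deletion of 17 amino acid(s)
print(splice_shift_effect(4))    # frameshift
```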
Other variations in gene structure, in which a codon triplet is interrupted, may arise from intron insertions or deletions (Craik et al., 1983). These mechanisms are not subject to the reading-frame constraint and can therefore produce either large or small changes (Craik et al., 1983). Such modifications frequently map to the surfaces of proteins, where structural alterations are least disruptive (Craik et al., 1982).

H. GENETICS OF THE SERINE PROTEASES

1. GENES

Generally, the sizes of the genes encoding homologous proteins increase during evolution as functions become more complex (Neurath, 1985). The family of serine proteases illustrates this well. With the advent of molecular biology techniques, the structures of the cDNAs and genes encoding most of the serine proteases have been partially or completely elucidated (Young et al., 1978; Hewett-Emmett et al., 1981; Walz et al., 1986). Comparison of the amino acid sequences predicted from the cDNA sequences allowed the homologous structural and functional domains of the serine proteases to be identified, as described above. Analysis of protein structure gave a better understanding of how these proteases are related to each other and to other proteins (Patthy, 1985). A similar study of the genes may yield insight into how and when these families of related proteins evolved. During the past five years, most of the genes encoding the serine proteases have been isolated and characterized. Genes for all of the pancreatic enzymes, including trypsinogen (Craik et al., 1984), chymotrypsinogen (Bell et al., 1984), and proelastase (Swift et al., 1984), have been reported. Several of the fibrinolytic protease genes, such as those for tissue-type plasminogen activator (Ny et al., 1984; Fisher et al., 1985; Freizner-Degen et al., 1985), urokinase plasminogen activator (Nagamine et al., 1984; Riccio et al., 1985), and plasminogen (Malinowski et al., 1984; Sadler et al., 1985), have also been studied.
The structure of the factor XII gene (Cool and MacGillivray, 1987) has also been elucidated. With the exception of the protein S gene, the structures of all the vitamin K-dependent clotting factor genes have been determined. These include prothrombin (Irwin et al., 1985; Irwin, 1986; Davie et al., 1983; Degen et al., 1985), factor IX (Anson et al., 1984; Yoshitake et al., 1985), protein C (Foster et al., 1985; Plutzky et al., 1986), and factor VII (O'Hara et al., 1987).

2. GENETICS

It has long been established that Christmas disease (Haemophilia B) and Classical Haemophilia (Haemophilia A) are recessively inherited, X-linked bleeding disorders caused by defects in the factor IX and factor VIII genes, respectively (Bloom, 1981). The chromosomal locations of the genes responsible for the other, autosomally inherited, haemophilias have not been as well defined. Some loci have been identified by associating chromosomal abnormalities with specific coagulation deficiencies. Others, owing to progress in genetic engineering techniques, have been identified by the direct use of cloned cDNA and genomic sequences in somatic cell hybridization studies. The serine protease genes for trypsinogen, chymotrypsinogen B, and proelastase have been assigned to chromosomes 7q22-qter, 16, and 12, respectively, by cytogenetic studies (Honey et al., 1984). Of the blood coagulation serine protease genes, factor IX (Christmas disease) has been assigned to chromosome Xq27 (Camerino et al., 1984), prothrombin to chromosome 11p11-p12 (Royle et al., 1987), and protein C to chromosome 6 (Rocchi et al., 1985). The factor XII gene, originally localized to chromosome 6q23 (Pearson et al., 1982), has been reassigned to chromosome 5 by in situ hybridization (D. Cool, personal communication). In addition, reports of chromosomal abnormalities indicate that the factor X and factor VII genes segregate with chromosome 13q34 (Pfeiffer et al., 1982; Ott and Pfeiffer, 1984; de Grouchy et al., 1984).
The gene for factor VIII has been localized to chromosome Xq28 (Gritschnier et al., 1985). The protein S gene has been assigned to chromosome 3 (Ploos van Amstel et al., 1987). Finally, the fibrinolytic protease genes for plasminogen, tissue-type plasminogen activator, and urokinase have been mapped to chromosomes 6q25-qter (Sakata and Aoki, 1980), 8p12 (Yang-Feng et al., 1986; Rajput et al., 1983), and 10 (Rajput et al., 1983), respectively.

3. VARIATION IN GENE STRUCTURE

Haemophilias A and B result from a wide variety of genetic defects. Many of these defects have been characterized at the molecular level and include partial gene deletions, point mutations, aberrant mRNA splicing, and amino acid substitutions (see Bertina and van der Linden, 1982; Thompson, 1986; Lawn, 1985; Lawn and Vehar, 1986 for recent reviews). Determining the genetic defect in every affected family is not feasible. Instead, restriction fragment length polymorphisms (RFLPs) within the gene in question allow the inheritance of specific alleles to be followed through linkage to these polymorphisms. Of the bleeding disorders, haemophilias A (factor VIII) and B (factor IX) have been studied most extensively because of their medical significance. Haemophilia A represents ~85% and Haemophilia B ~14% of all haemophilias (Lawn, 1985), whereas Stuart factor deficiency (factor X) occurs at an incidence of one in one million (Bloom, 1981). Several RFLPs have been identified in the factor IX gene, including Taq I (Giannelli et al., 1984; Camerino et al., 1984), Msp I (Camerino et al., 1985), Dde I/Hinf I, Xmn I (Winship et al., 1984), and Bam HI (Hay et al., 1986) polymorphisms. The cDNA (Vehar et al., 1984; Toole et al., 1984) and gene (Gritschnier et al., 1984; Wood et al., 1984) encoding the human blood clotting cofactor factor VIII have been isolated. Although the gene spans 186 Kbp of genomic sequence, surprisingly few polymorphisms have been observed.
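The principle behind RFLP analysis can be sketched in a few lines. A point mutation that destroys a restriction site merges two fragments into one, so the two alleles give different fragment patterns on a Southern blot. The Python sketch below uses the Taq I site, TCGA, cleaved after the T. It assumes a palindromic recognition site (TCGA is its own reverse complement), so scanning one strand finds every site. The function name and both allele sequences are hypothetical.

```python
from typing import List


def digest(sequence: str, site: str, cut_offset: int) -> List[int]:
    """Fragment lengths from cutting a linear DNA sequence at every occurrence of site.

    Assumes a palindromic site, so scanning one strand finds every site; cut_offset is the
    cleavage position within the site on that strand (1 for Taq I, T^CGA).
    """
    seq = sequence.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    boundaries = [0] + cuts + [len(seq)]
    return [end - begin for begin, end in zip(boundaries, boundaries[1:])]


# Two hypothetical alleles; allele_2 carries a G->A change that destroys the second TCGA site.
allele_1 = "GGGGTCGACCCCCCTCGAGG"
allele_2 = "GGGGTCGACCCCCCTCAAGG"
print(digest(allele_1, "TCGA", 1))  # [5, 10, 5]
print(digest(allele_2, "TCGA", 1))  # [5, 15]
```

In a family study, a probe covering this region would detect the 10 and 15 unit fragments as distinct bands. The band pattern can then be used to follow whichever chromosome carries the defective gene, even though the polymorphism itself does not cause the disease.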
The factor VIII polymorphisms reported so far include Bcl I (Gritschnier et al., 1985) and Xba I (Wion et al., 1986), both of which are unlinked and are thus informative for diagnostic studies.

I. EVOLUTION OF THE SERINE PROTEASE GENES

The molecular genetics of the blood coagulation factors has been studied extensively in the last few years (see previous sections; Walz et al., 1986; Neurath, 1985). Many contributions have been made towards understanding the clotting factors as a gene family, and these have in turn advanced understanding of the evolution of related proteins in general. Analysis of cDNA sequences has facilitated the comparison of homologous protein domains, so that regions of functional importance can be discerned. Furthermore, cDNA characterization has yielded insight into the structure of the precursor forms of the proteins, information that cannot be obtained by standard protein chemistry techniques, and from it the proteolytic processing steps that produce the mature plasma form of each protein can be inferred. Elucidation of gene structure has allowed prediction of the evolutionary pathways that gave rise to the complex family of serine proteases observed today.

Concurrent with the studies on the factor X cDNA and gene sequences reported in this thesis, partial cDNA sequences for factor X have been reported by others (Leytus et al., 1984; Kaul et al., 1986; Bahnak et al., 1987). Subsequently, a full-length cDNA clone and the partial structure of the factor X gene have been described (Leytus et al., 1986).

MATERIALS AND METHODS

A. MATERIALS

Agarose, acrylamide, bisacrylamide, urea, ammonium persulfate, Sephadex G-25, TEMED (N,N,N',N'-tetramethylethylenediamine), and Biogel A50M were from Bio-Rad Laboratories. Yeast extract, casamino acids, bacto-tryptone, and bacto-agar were Difco grade from the Grand Island Biological Company. NZ-amine type A was from Humko Sheffield Chemical Co.
Oligo (dT) cellulose (type 7) and dextran sulfate were from P.L. Biochemicals. Nitrocellulose sheets and circles (82 and 132 mm) of 0.45 um pore size were from Millipore or Schleicher and Schuell. 32P-labeled nucleotides were from New England Nuclear or Amersham. Phenol was obtained from British Drug Houses Ltd. and was redistilled before use; the fraction boiling at 179°C was collected and frozen in aliquots at -20°C. Deoxy- and dideoxyribonucleotides, random hexadeoxyribonucleotides (p(dN)6), and the M13 sequencing primer (heptadecanucleotide) were from PL Pharmacia. Isopropyl-beta-D-thiogalactopyranoside (IPTG), 5-bromo-4-chloro-3-indolyl-beta-D-galactopyranoside (X-Gal), ethidium bromide, yeast transfer RNA (tRNA), ampicillin, tetracycline, chloramphenicol, kanamycin, ribonuclease A, deoxyribonuclease I, and proteinase K were from the Sigma Chemical Company. S-adenosylmethionine was from Calbiochem. Cesium chloride was from Cabot Berylco Ltd. Ultrogel AcA54 was from LKB. Geneclean was from Bio 101, Inc. An oligodeoxyribonucleotide mixture was purchased from Applied Biosystems Inc. All other oligodeoxyribonucleotides, including some of the M13 sequencing primer used, were synthesized on an Applied Biosystems 380A DNA Synthesizer (by Tom Atkinson, Dept. of Biochemistry) and purified by denaturing polyacrylamide gel electrophoresis before use (Atkinson and Smith, 1984). All other chemicals were of reagent grade or better and were purchased from Sigma Chemical Co., Fisher Scientific, or British Drug Houses Ltd. Restriction endonucleases, T4 DNA ligase, T4 DNA polymerase, T4 polynucleotide kinase, Eco RI methylase, Eco RI linkers, and bovine serum albumin (nuclease free) were from New England Biolabs, Bethesda Research Laboratories, or PL Pharmacia. DNA polymerase I and DNA polymerase I Klenow fragment were from Boehringer Mannheim or PL Pharmacia. Lambda Dash and lambda gt11 phage vectors, E.
coli strains P2 392 and RY1088, and Gigapack Plus extracts were obtained from Stratagene Cloning Systems. Adult human liver was obtained from kidney donor patients. Liver samples were cut into 0.5 cm slices, rinsed in sterile PBS (10 mM sodium phosphate pH 7.5, 150 mM NaCl), frozen in liquid nitrogen, and stored at -70°C. Human genomic DNA was isolated from white blood cells (by Katherine Robertson and Heather Kirk, Dept. of Biochemistry).

B. STRAINS, VECTORS, AND MEDIA

1. BACTERIAL STRAINS

E. coli strain RR1 (F-, hsdS20 (rB-, mB-), recA-, ara-14, proA2, lacY1, galK2, rpsL20 (Smr), xyl-5, mtl-1, supE44, lambda-) (Maniatis et al., 1982) was the host for the bovine liver cDNA library cloned into pBR322 (MacGillivray and Davie, 1984). E. coli strain MC1061 (araD139, Δ(ara,leu)7697, ΔlacX74, galU-, galK-, hsr-, hsm+, strA) (Casadaban and Cohen, 1980) was the host for the human liver cDNA library constructed in the plasmid pKT218 (Prochownik et al., 1983). E. coli strain C600 (F-, thi-1, thr-1, leuB6, lacY1, tonA21, supE44, lambda-) (Maniatis et al., 1982) and E. coli strain K802 (hsdR+, hsdM+, gal-, met-, supE) (Maniatis et al., 1982) were hosts for the screening of human genomic DNA clones in lambda Charon 28 (Blattner et al., 1977). E. coli strain MC1061/P3 (same as MC1061, with kanR, amp(am), tet(am)) (from H.V. Huang, Washington University, St. Louis, Missouri) (Casadaban and Cohen, 1980) was the host for the screening of human genomic DNA clones in the lambda Charon 4A vector (Blattner et al., 1977). E. coli strain P2 392 (P2 lysogen of LE392) and E. coli strain LE392 (F-, hsdR514 (rK-, mK+), supE44, supF58, lacY1, Δ(lacIZY)6, galK2, galT22, metB1, trpR55, lambda-) (Maniatis et al., 1982) were hosts for human genomic libraries constructed in the vectors EMBL3 and lambda Dash (Frischauf et al., 1983). E.
coli strain RY1088 (ΔlacU169, supE, supF, hsdR-, hsdM+, metB, trpR, tonA21, proC::Tn5 (pMC9); pMC9 is pBR322-lacIq) (Young and Davis, 1983a,b) was the host for human genomic libraries constructed in the lambda gt11 vector (Young and Davis, 1983a,b). E. coli strain JM83 (ara, Δ(lac-pro), strA, thi-, φ80 lacZΔM15) (Vieira and Messing, 1982) was the host for transformation with the pUC13 vector (Vieira and Messing, 1982). E. coli strains JM101 (Δ(lac-pro), supE, thi-, F' traD36, proAB, lacIq, lacZΔM15) and JM103 (Δ(lac-pro), supE, thi-, strA, sbcB15, endA, hsdR-, F' traD36, proAB, lacIq, lacZΔM15) (Messing, 1983) were hosts for transformation with the M13mp8, mp9, mp18, and mp19 vectors (Messing, 1983).

2. VECTORS

The plasmid vector piAN7 (from H.V. Huang) was used for recombinant screening. The cloning vectors lambda Dash (Frischauf et al., 1983) and lambda gt11 (Young and Davis, 1983a,b) (obtained from Stratagene) were used to construct specific human genomic libraries. The M13 vectors mp8, mp9, mp18, and mp19 (Messing, 1983) were used for DNA sequence analysis. DNA fragments were subcloned into pUC13 (Vieira and Messing, 1982; pUC13 was obtained from Dr. Mark Zoller, Dept. of Biochemistry, University of British Columbia) for restriction enzyme mapping and hybridization probe construction.

3. MEDIA

E. coli hosts for phage lambda clones were grown in NZYCM medium (10 g NZ-amine type A, 2 g magnesium chloride, 5 g yeast extract, and 1 g casamino acids per litre, adjusted to pH 7.5 with NaOH) (Maniatis et al., 1982) supplemented with 0.2% maltose (filter-sterilized). Phage lambda libraries were plated on NZYCM agar plates (1.5% w/v) with an overlay of either NZYCM agarose (0.75% w/v) for screening or NZYCM agar (0.75% w/v) for titering phage libraries or stocks. For recombinant screening, E. coli was infected with phage lambda on plates supplemented with ampicillin (12.5 ug/ml) and tetracycline (7.5 ug/ml). Recombinant phage clones were selected with kanamycin (50 ug/ml).
Phage lambda libraries and stocks were diluted and stored in SM buffer (5.8 g NaCl, 2 g magnesium sulphate, 50 ml 1 M Tris-HCl pH 7.5, and 5 ml 2% gelatin per litre). Luria broth (LB; 5 g yeast extract, 10 g bacto-tryptone, and 10 g NaCl per litre) (Maniatis et al., 1982) was used for the growth of E. coli strains transformed with plasmids. The liver cDNA libraries in E. coli strain RR1 (pBR322 vector) and E. coli strain MC1061 (pKT218 vector) were screened on LB agar plates (1.5% w/v) supplemented with tetracycline (12.5 ug/ml) for selection. E. coli strain JM83 carrying pUC13 plasmid derivatives was selected with ampicillin (50 ug/ml). E. coli transformed with M13 subclones was grown in YT medium (5 g yeast extract, 8 g bacto-tryptone, and 5 g NaCl per litre) (Maniatis et al., 1982). Phage M13 transformants were plated on YT agar plates (1.5% w/v) overlaid with YT agar (0.75% w/v). E. coli strains JM101 and JM103, hosts for the M13 vectors, were maintained on minimal medium plates (Messing, 1983) prepared as follows: 3 g of agar in 160 ml water (sterilized and cooled to 55°C) was mixed with 40 ml 5X salts (2.1 g potassium phosphate (dibasic), 0.9 g potassium phosphate (monobasic), 0.2 g ammonium sulphate, and 0.1 g sodium citrate per 40 ml), 2 ml 20% glucose, 0.2 ml 20% magnesium sulphate, and 0.1 ml 10 mg/ml thiamine. All solutions were autoclaved except thiamine, which was filter-sterilized. Bacterial cultures for large-scale pUC13 plasmid preparation were grown to mid-log phase in the presence of ampicillin. Chloramphenicol (250 ug/ml) was then added for plasmid amplification.
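The minimal-plate recipe is a set of dilutions of concentrated stocks into one final volume. The sketch below works through that arithmetic (C1V1 = C2V2), assuming the component volumes are simply additive (about 202 ml in total) and ignoring the volume contributed by the dissolved agar; the helper name `final_concentration` is illustrative only.

```python
"""Working concentrations in the M13 host minimal plates (dilution arithmetic)."""


def final_concentration(stock_conc: float, volume_added_ml: float, total_volume_ml: float) -> float:
    """Concentration after diluting a stock (C1*V1 = C2*V2), in the stock's units."""
    return stock_conc * volume_added_ml / total_volume_ml


# Component volumes (ml) taken from the recipe; assumes volumes are additive.
volumes_ml = {
    "water_with_agar": 160.0,
    "5X_salts": 40.0,
    "20%_glucose": 2.0,
    "20%_MgSO4": 0.2,
    "10_mg_per_ml_thiamine": 0.1,
}
total_ml = sum(volumes_ml.values())  # about 202.3 ml

salts_x = final_concentration(5.0, volumes_ml["5X_salts"], total_ml)           # fold strength
glucose_pct = final_concentration(20.0, volumes_ml["20%_glucose"], total_ml)   # % (w/v)
mgso4_pct = final_concentration(20.0, volumes_ml["20%_MgSO4"], total_ml)       # % (w/v)
# 10 mg/ml thiamine = 10,000 ug/ml, so the result is in ug/ml
thiamine_ug_ml = final_concentration(10_000.0, volumes_ml["10_mg_per_ml_thiamine"], total_ml)
agar_pct = 3.0 / total_ml * 100  # 3 g agar in the final volume, % (w/v)

for name, value, unit in [
    ("salts", salts_x, "X"),
    ("glucose", glucose_pct, "% w/v"),
    ("MgSO4", mgso4_pct, "% w/v"),
    ("thiamine", thiamine_ug_ml, "ug/ml"),
    ("agar", agar_pct, "% w/v"),
]:
    print(f"{name:9s} {value:.3g} {unit}")
```

On these assumptions the plates come out at just under 1X salts, about 0.2% glucose, about 5 ug/ml thiamine, and about 1.5% agar.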
Cultures for all other large-scale plasmid preparations were grown in M9 medium (Maniatis et al., 1982), prepared as follows: 840 ml water, 100 ml 10X salts (7 g disodium hydrogen orthophosphate, 3 g potassium phosphate, 0.5 g NaCl, and 1 g ammonium chloride per 100 ml), 10 ml magnesium sulphate, 20 ml 20% glucose, 10 ml 0.01 M calcium chloride, 20 ml 20% casamino acids, 0.2 ml 10 mg/ml thiamine, and 0.2 g uridine. Again, all solutions were autoclaved except thiamine and uridine, which were filter-sterilized.

C. GEL ELECTROPHORESIS

1. AGAROSE GELS

DNA and RNA fragments larger than 300 bp were sized by agarose gel electrophoresis. For DNA size analysis, gels were prepared at the agarose concentration required for optimal separation (0.7% to 1.2%), and ethidium bromide was added to a final concentration of 1 ug/ml for DNA detection. DNA samples were loaded in 3% Ficoll (10X stock: 30% Ficoll, 0.2% xylene cyanol, 0.2% bromophenol blue), and electrophoresis was performed in 1 X TAE buffer (50 X TAE: 2 M Tris base, 1 M glacial acetic acid, 0.1 M EDTA) (Maniatis et al., 1982) at 1-3 volts/cm. The DNA was visualized by irradiation with UV light at 260 nm. Denaturing agarose gels were used for RNA size analysis. Before use, all buffers were sterilized to inactivate contaminating ribonucleases. Gels were prepared with 0.6% agarose, 1 X gel buffer (5 X gel buffer: 0.2 M morpholinopropanesulfonic acid (MOPS) pH 7.0, 50 mM sodium acetate, 5 mM EDTA pH 8.0), and 2.2 M formaldehyde (pH > 4.0) as the denaturant. Before loading, RNA samples were incubated at 55°C for 15 minutes with 2 ul 5 X gel buffer, 3.5 ul formaldehyde, and 10 ul formamide. Up to 27 ug of RNA was used in a final sample volume of 20 ul. Electrophoresis was carried out in 1 X gel buffer at 100 V for 2 hours. RNA fragments were subsequently detected by northern blot analysis.

2.
POLYACRYLAMIDE GELS

Polyacrylamide gel electrophoresis was used to separate DNA fragments ranging from 10 bp to 500 bp in length. Polyacrylamide gels are either denaturing or non-denaturing, depending on the presence or absence of urea. Preparative polyacrylamide gels were made from 30% (w/v) acrylamide (29:1 acrylamide:bisacrylamide) and 10 X TBE buffer (0.89 M Tris base, 0.89 M boric acid, 25 mM EDTA, pH 8.3) (Maniatis et al., 1982), mixed to give the gel concentration required for size selection in 1 X TBE; 1 X TBE was also the running buffer. Gel solutions were then degassed with a water aspirator to facilitate polymerization. Polymerization was initiated by adding ammonium persulphate and TEMED to final concentrations of 0.066% (w/v) and 0.04% (w/v), respectively. Following electrophoresis, gels were stained for 15 minutes in distilled water containing ethidium bromide (2 ug/ml), and the DNA was visualized by UV irradiation at 260 nm. Denaturing polyacrylamide gels (Sanger and Coulson, 1978) consisted of 8.3 M urea, 1 X TBE, and the appropriate volume of 40% acrylamide stock (38:2 acrylamide:bisacrylamide). Gel solutions were degassed and polymerization was initiated as described above. For DNA sequencing gels, the labeled nucleic acid fragments were visualized by autoradiography: gels were dried under vacuum on a Bio-Rad gel drier at 80°C for 20-30 minutes and exposed to Kodak XK-1 film, with or without intensifying screens (DuPont Lightning Plus). For oligodeoxyribonucleotide purification, single-stranded DNA was visualized by UV shadowing: the gel was irradiated (260 nm) over a thin layer chromatography (TLC) plate containing a fluorescent indicator, and the DNA was seen where it quenched the fluorescence.

D. ISOLATION OF DNA

1. ISOLATION OF PLASMID DNA

Small-scale preparations of plasmid DNA were made by a modification of the alkaline lysis method of Birnboim and Doly (1979) as described by Maniatis et al. (1982). A small volume (5 ml) of Luria or YT broth containing the appropriate antibiotic was inoculated with E.
coli and incubated at 37°C overnight. An aliquot (1.5 ml) was transferred to an Eppendorf tube and centrifuged for one minute in an Eppendorf microfuge. The supernatant was discarded and the bacterial pellet was resuspended in 100 ul of a fresh, ice-cold lysozyme solution (50 mM glucose, 10 mM EDTA, 25 mM Tris-HCl (pH 8.0), 4 mg/ml lysozyme). After incubation at room temperature for 5 minutes, the cells were disrupted by the addition of 200 ul of an ice-cold solution of 0.2 N NaOH, 1% SDS. The contents were mixed gently by inversion and incubated at 4°C for 5 minutes. SDS, together with denatured protein and chromosomal DNA, was then precipitated by a further 5 minute incubation at 4°C with 150 ul of an ice-cold potassium acetate solution (60 ml 5 M potassium acetate, 11.5 ml glacial acetic acid, and 28.5 ml water, pH 4.8). Cellular debris was removed by centrifugation for 10 minutes at 4°C. The supernatant was transferred to a fresh tube and extracted with an equal volume of phenol:chloroform (1:1 v/v). The aqueous phase was separated by centrifugation (2 minutes at room temperature) and mixed with two volumes of 95% ethanol. After 2 minutes at room temperature, the ethanol precipitate was collected by centrifugation for 5 minutes at room temperature. The plasmid DNA pellet was washed with 70% ethanol, air dried, and resuspended in 50-100 ul TE buffer (10 mM Tris-HCl pH 8.0, 1 mM EDTA). Two alternative procedures were used for large-scale purification of plasmid DNA. The Triton lysis procedure (Katz et al., 1973, 1977) was used initially to isolate plasmid DNA from clones constructed in the vector pBR322. A small culture of the E. coli was grown overnight and 5 ml was used to inoculate 1 L of M9 medium. The large culture was grown at 37°C with shaking at 200 rpm until the OD600nm reached 0.6-0.7. Chloramphenicol (250 mg) was added and incubation was continued for 12-16 hours. The cells were collected by centrifugation at 9 Krpm for 10 minutes at 4°C in a GS-3 rotor (Sorvall).
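Centrifugation speeds in these protocols are given in rpm (Krpm), but the force on a pellet also depends on the rotor radius. A commonly used conversion is RCF = 1.118 × 10^-5 × r × rpm², with r the radius in cm. The sketch below applies it; the 10 cm radius in the example is a hypothetical placeholder, not the actual radius of the GS-3 (or any other) rotor, which would have to be taken from the manufacturer's specifications.

```python
def rcf_from_rpm(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (x g) from rotor speed and radius.

    Uses the standard relation RCF = 1.118e-5 * r(cm) * rpm**2.
    """
    return 1.118e-5 * radius_cm * rpm ** 2


def rpm_from_rcf(rcf: float, radius_cm: float) -> float:
    """Inverse of rcf_from_rpm: speed needed to reach a given RCF at that radius."""
    return (rcf / (1.118e-5 * radius_cm)) ** 0.5


# Hypothetical 10 cm radius; substitute the real rotor radius before use.
print(round(rcf_from_rpm(9_000, 10.0)))  # 9056 (x g) at the assumed radius
```

Quoting both rpm and rotor, as the text does, is enough to recover the g-force once the radius is known.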
All residual supernatant was removed and the cells were frozen at -20°C for 2 hours. The weakened cells were resuspended in 6.25 ml of lysis solution (25% (w/v) sucrose, 50 mM Tris-HCl pH 8.0), and 1.5 ml of lysozyme (10 mg/ml in lysis solution) was added. The suspension was mixed continuously by swirling on ice for 5 minutes. EDTA (1.25 ml of 0.5 M) was added and the contents were swirled for an additional 5 minutes. Finally, 10 ml of Triton solution (10 ml 10% (w/v) Triton X-100, 125 ml 0.5 M EDTA pH 8.0, 50 ml 1 M Tris-HCl pH 8.0, 800 ml water) was added, and lysis was completed with a further 15 minutes of swirling. Cellular debris was removed by centrifugation at 19 Krpm for 30 minutes at 4°C in an SS-34 rotor (Sorvall). Plasmid DNA was then purified by cesium chloride gradient centrifugation. To 3.8 ml of supernatant, 3.9 g CsCl (0.95 g/ml) and 0.3 ml 10 mg/ml ethidium bromide were added. The CsCl/EtBr solution was transferred to Beckman heat-sealable tubes and centrifuged in a VTi65 rotor (Beckman) at 20°C, either at 50 Krpm for 20 hours or at 65 Krpm for 4 hours. When the Ti70.1 rotor (Beckman) was used, centrifugation was carried out at 50 Krpm for 20 hours at 20°C. After centrifugation, the tubes were punctured and the plasmid DNA band was withdrawn with a syringe needle. EtBr was removed by several extractions with isopropanol. The DNA was diluted and precipitated with 0.1 volume 3 M NaOAc and 2 volumes 95% ethanol at -20°C overnight. The next day, the precipitate was collected by centrifugation at 10 Krpm for 30 minutes at 4°C and resuspended in 0.5-1 ml TE buffer. Alternatively, a simple, rapid method of large-scale plasmid DNA isolation was adapted from the alkaline lysis procedure (Maniatis et al., 1982). Volumes were scaled to accommodate 500 ml cultures (pellets were initially resuspended in 10 ml lysozyme solution). After incubation with potassium acetate, debris was removed by centrifugation in a Ti60 rotor (Beckman) at 40 Krpm for 30 minutes at 4°C.
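The CsCl/ethidium bromide gradient step above can be checked arithmetically: 3.9 g of CsCl added to 3.8 ml of cleared lysate plus 0.3 ml of ethidium bromide solution corresponds to the stated 0.95 g of CsCl per ml, provided the ratio is taken over the aqueous volume measured before the salt is dissolved (an assumption about how the figure was calculated). A minimal sketch, with illustrative function names:

```python
def cscl_ratio(cscl_g: float, aqueous_volumes_ml: list[float]) -> float:
    """Grams of CsCl per ml of aqueous solution (volume measured before adding salt)."""
    return cscl_g / sum(aqueous_volumes_ml)


def cscl_needed(target_g_per_ml: float, aqueous_volumes_ml: list[float]) -> float:
    """Grams of CsCl to add to reach target_g_per_ml for the given aqueous volumes."""
    return target_g_per_ml * sum(aqueous_volumes_ml)


# 3.8 ml cleared lysate + 0.3 ml ethidium bromide (10 mg/ml), 3.9 g CsCl
ratio = cscl_ratio(3.9, [3.8, 0.3])
print(f"{ratio:.2f} g/ml")  # 0.95 g/ml

# CsCl for a doubled preparation (7.6 ml lysate + 0.6 ml EtBr) at the same ratio
print(f"{cscl_needed(0.95, [7.6, 0.6]):.2f} g")
```

The same helper applies to the phage purification later in this section, where CsCl is added to 0.75 g/ml.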
Isopropanol (0.6 volume) was added and nucleic acids were precipitated at room temperature for 15 minutes. The precipitate was recovered by centrifugation in an HB-4 rotor (Sorvall) at 9 Krpm for 30 minutes at room temperature. To remove excess RNA, the pellet was resuspended in 8 ml TE buffer (pH 8.0) and digested with 300 ul RNase A (5 mg/ml) at 37°C for one hour. Plasmid DNA was subsequently purified by cesium chloride gradient centrifugation as described above. Large-scale preparations of the double-stranded replicative form (RF) of M13 DNA were isolated as described by Messing (1983). YT broth (1 ml) was inoculated with 10 ul of an overnight culture of JM101 or JM103 and a single M13 phage plaque (or 5 ul of phage supernatant). A second, 10 ml culture of JM101 or JM103 was started simultaneously. Both cultures were grown at 37°C for 6 hours. An aliquot of each (100 ul of the 1 ml culture and 1 ml of the 10 ml culture) was added to 1 L of YT medium and incubated at 37°C for 6 hours. RF DNA was prepared by the alkaline lysis method outlined earlier.

2. ISOLATION OF PHAGE LAMBDA DNA

For small-scale purification of phage lambda DNA, 20 ml NZYCM medium was inoculated with a single phage lambda plaque and 100 ul of a fresh overnight culture of host bacteria. Cultures were shaken vigorously (300-400 rpm) at 37°C until bacterial lysis was apparent (4-7 hours). To complete lysis, 200 ul chloroform was added and incubation was continued for an additional 5-10 minutes. Cellular debris was pelleted by centrifugation in an HB-4 rotor at 10 Krpm for 10 minutes, and the supernatant was transferred carefully to a 50 ml polyethylene centrifuge tube containing 6 ml 50% PEG (polyethylene glycol 6000, Carbowax 8000) and 3 ml 5 M NaCl. The contents were mixed gently and the phage particles were precipitated at 4°C overnight (6 hours was sufficient).
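The phage precipitation depends on the final PEG and NaCl concentrations reached after mixing. As a rough check, assuming about 20 ml of cleared lysate is recovered (the culture volume; the real volume may be slightly smaller) and that the 50% PEG stock is w/v, the small-scale lambda step gives roughly 10% PEG and 0.5 M NaCl. The same arithmetic for the M13 step described below (1.5 ml supernatant plus 300 ul of 20% PEG, 2.5 M NaCl) gives a lower PEG concentration. The function name is illustrative.

```python
def mix_final(components: list[tuple[float, float]]) -> list[float]:
    """Final concentration of each component after mixing.

    components: (volume_ml, stock_concentration) pairs. A stock concentration
    of 0 marks the diluent (e.g. the lysate itself). Volumes are assumed additive.
    """
    total = sum(volume for volume, _ in components)
    return [volume * conc / total for volume, conc in components]


# Lambda: ~20 ml lysate (assumed), 6 ml 50% (w/v, assumed) PEG, 3 ml 5 M NaCl
_, peg_pct, nacl_m = mix_final([(20.0, 0.0), (6.0, 50.0), (3.0, 5.0)])
print(f"lambda: {peg_pct:.1f}% PEG, {nacl_m:.2f} M NaCl")  # 10.3% PEG, 0.52 M NaCl

# M13: 1.5 ml supernatant + 0.3 ml of one stock containing 20% PEG and 2.5 M NaCl
peg_m13 = mix_final([(1.5, 0.0), (0.3, 20.0)])[1]
nacl_m13 = mix_final([(1.5, 0.0), (0.3, 2.5)])[1]
print(f"M13: {peg_m13:.1f}% PEG, {nacl_m13:.2f} M NaCl")  # 3.3% PEG, 0.42 M NaCl
```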
The phage precipitate, collected by centrifugation (10 Krpm for 10 minutes at 4°C), was resuspended in 0.5 ml DNase I buffer (50 mM Tris-HCl pH 7.5, 5 mM magnesium chloride, 0.5 mM calcium chloride) containing 5 ug DNase I (5 ul of 1 mg/ml) and 50 ug RNase A (10 ul of 5 mg/ml, preboiled for 10 minutes). The reaction mixture was incubated at 37°C for 30 minutes and residual debris was removed by centrifugation in an Eppendorf centrifuge for 5 minutes. Phage particles were then lysed with 20 ul 5 mg/ml proteinase K in the presence of 50 ul 10% (w/v) SDS and 5 ul 0.5 M EDTA at 68°C for one hour. The DNA solution was extracted twice with an equal volume of phenol:chloroform (1:1 v/v) and twice with chloroform. Sodium acetate (3 M) was added to 0.1 volume and 95% ethanol to 2 volumes, and the DNA was precipitated either at -70°C for 20 minutes or at room temperature for 2 minutes. DNA was collected by centrifugation and resuspended in 100 ul of TE buffer. For large-scale preparations of phage lambda DNA (Maniatis et al., 1982), 10^10 host cells (1 OD600nm = 8 X 10^10 cells/ml) were infected with 5 X 10^8 to 5 X 10^9 phage particles at 37°C for 15 minutes. The inoculum was added to 500 ml of prewarmed NZYCM medium and shaken vigorously at 37°C until lysis was observed. From this point, the protocol followed a scaled-up version of the small-scale preparation, with some modifications. Following DNase I and RNase A digestion, the phage particles were purified by CsCl gradient centrifugation. CsCl was added to 0.75 g/ml and centrifugation was carried out in a Ti70.1 rotor for 16-20 hours at 50 Krpm, 20°C. The phage was visualized as a bluish band, either in diffuse light or, if required, with a direct light source.
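The large-scale infection above combines 10^10 host cells with 5 × 10^8 to 5 × 10^9 phage particles, i.e. a multiplicity of infection (MOI, phage per cell) of 0.05 to 0.5. The sketch below computes that ratio and, as an added illustration that is not part of the original protocol, a Poisson estimate of the fraction of cells receiving at least one phage at the start (1 − e^−MOI), which assumes phage attach to cells independently and at random.

```python
import math


def multiplicity_of_infection(phage: float, cells: float) -> float:
    """Phage particles added per host cell."""
    return phage / cells


def fraction_infected(moi: float) -> float:
    """Poisson estimate (assumed random attachment) of cells receiving >= 1 phage."""
    return 1.0 - math.exp(-moi)


cells = 1e10
for phage in (5e8, 5e9):
    moi = multiplicity_of_infection(phage, cells)
    print(f"{phage:.0e} phage: MOI = {moi:.2f}, initially infected ~{fraction_infected(moi):.0%}")
```

At either end of the range most cells are uninfected at the outset, so lysis of the culture requires further rounds of phage growth.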
To remove CsCl, the phage particles were dialyzed for 4 hours against each of 2 changes of a 1,000-fold excess (v/v) of buffer (10 mM NaCl, 50 mM Tris-HCl pH 8.0, 10 mM magnesium chloride, 0.01% gelatin); the dialysis tubing was pretreated with 0.01% gelatin to prevent phage adsorption. Phage DNA was extracted as described previously.

3. ISOLATION OF SINGLE-STRANDED PHAGE DNA

Single-stranded M13 phage DNA was purified as described by Messing (1983). An inoculum consisting of one phage plaque and 20 ul of overnight host cells was added to 2 ml YT broth (in 15 ml Falcon 2059 tubes). Cultures were grown with vigorous shaking at 37°C for 6-16 hours. To precipitate the single-stranded phage particles, the bacterial cells were removed by centrifugation in an Eppendorf centrifuge (5 minutes), and 1.5 ml of the supernatant was added to 300 ul 20% (w/v) PEG, 2.5 M NaCl. The contents were mixed and incubated at room temperature for 15 minutes. The M13 phage precipitate was recovered by centrifugation and resuspended in 200 ul 1 X lo Tris buffer (50 mM NaCl, 10 mM Tris-HCl pH 7.5, 1 mM EDTA). The phage suspension was extracted once with phenol, followed by two extractions with phenol:chloroform (1:1 v/v). DNA was then precipitated by the addition of 0.1 volume 3 M NaOAc and 2 volumes 95% ethanol, and the purified single-stranded DNA pellet was resuspended in 30-50 ul 1 X lo Tris buffer.

E. ISOLATION OF POLY A+ RNA

All glassware, pipets, plasticware, and solutions were autoclaved before use. Human liver RNA was isolated by the method of Chirgwin et al. (1979). Frozen (-70°C) human liver was added directly to buffer (10 ml/g tissue) composed of 7.5 M guanidine hydrochloride (GuHCl) pH 7.5, 25 mM sodium citrate pH 7.0, and 0.1 M dithiothreitol (DTT). The liver tissue suspension was homogenized with a Polytron homogenizer. N-lauryl sarcosine (10% w/v) was added to 0.5% (w/v), and insoluble debris was removed by centrifugation at 5 Krpm for 30 minutes at 4°C.
Ethanol was added to 33% and the RNA was precipitated overnight at -20°C. The RNA precipitate was recovered by centrifugation at 10 Krpm for 15 minutes at 4°C, and the pellet was resuspended (by vigorous vortexing) in half the starting volume of GuHCl buffer. This cycle of centrifugation, precipitation, and resuspension in a half volume of GuHCl buffer was repeated twice. The RNA pellet was then dissolved in a small amount of distilled water to give an opaque, viscous solution. Carbohydrate was removed by centrifugation at 9 Krpm for 30 minutes at 4°C, and the RNA concentration in the supernatant was determined by absorbance at 260 nm (20 OD260nm = 1 mg/ml RNA). The OD260nm/OD280nm ratio was 2.0. RNA was stored in distilled water in small aliquots at -70°C. Poly A+ RNA was isolated by affinity chromatography on oligo (dT) cellulose (Edmonds et al., 1971; Aviv and Leder, 1972). Total RNA (in a buffer containing 0.4 M NaOAc pH 7.5, 0.1% SDS, 5 mM EDTA) was applied to an oligo (dT) cellulose column prepared with 0.1 N NaOH, 5 mM EDTA and equilibrated with column buffer (0.4 M NaOAc pH 7.5, 1 mM EDTA, 0.1% SDS). The unbound RNA fraction was reapplied to the column twice. The column was washed with intermediate column buffer (0.2 M NaOAc pH 7.5, 1 mM EDTA, 0.1% SDS) until the OD260nm of the eluate was less than 0.05. Bound poly A+ RNA was eluted with 1 mM EDTA pH 7.5, 0.1% SDS. RNA-containing fractions were identified by OD260nm, pooled, and precipitated with 0.1 volume 3 M NaOAc pH 4.8 and 2 volumes 95% ethanol. Poly A+ RNA was subsequently resuspended in distilled water at high concentration (6 mg/ml), aliquoted, and stored at -70°C.

F. OLIGONUCLEOTIDE SYNTHESIS AND PURIFICATION

The oligonucleotide mixture: A 5' - d( A T ^ T T ^ T C G ATCC A T J T ) - 3' C G -p C coding for residues 274-279 of the heavy chain of bovine factor X (Titani et al., 1975) was supplied by Applied Biosystems Inc.
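Because most amino acids are encoded by more than one codon, a probe designed from a protein sequence is made as a mixture of all codon combinations. The sketch below counts the distinct sequences in such a mixture using the standard genetic code. It also handles a probe that stops two bases into its final codon, as a 17-mer spanning six codons would, by counting only the distinct first-two-base prefixes of the last codon. The peptide "WMDHKE" is hypothetical and is not residues 274-279 of bovine factor X; the function names are illustrative.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... (bases T, C, A, G).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codons_for(aa: str) -> list[str]:
    """All codons for a one-letter amino-acid code."""
    return [codon for codon, residue in CODON_TABLE.items() if residue == aa]


def probe_degeneracy(peptide: str, bases_in_last_codon: int = 3) -> int:
    """Number of distinct oligonucleotides that could encode `peptide`.

    If the probe ends partway into the last codon, only the first
    `bases_in_last_codon` positions of that codon are counted.
    """
    total = 1
    for i, aa in enumerate(peptide):
        codons = codons_for(aa)
        if i == len(peptide) - 1:
            # Distinct prefixes of the final (possibly truncated) codon
            codons = {c[:bases_in_last_codon] for c in codons}
        total *= len(codons)
    return total


# Hypothetical six-residue peptide
print(probe_degeneracy("WMDHKE", bases_in_last_codon=2))  # 8 (17-mer)
print(probe_degeneracy("WMDHKE"))                         # 16 (18-mer)
```

Dropping the third position of the final codon halves the mixture in this example, because the wobble base is where most of the codon redundancy lies.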
All other oligonucleotides were synthesized on an Applied Biosystems 380A DNA synthesizer (by Tom Atkinson; Atkinson and Smith, 1984). Six oligonucleotides were designed to match specific regions of the human factor X cDNA sequence (Fung et al., 1985): 1) Oligo 1, 5'-d(CTTTGTATTTATTCCAGAATTCATT)-3', a 25 mer corresponding to amino acid residues 38 to 46 (non-coding strand); 2) Oligo 2, 5'-d(ACTTTCCCCGAGCAGCAGGAGGCC)-3', a 24 mer corresponding to residues -24 to -18 (non-coding strand); 3) Oligo 3, 5'-d(GACTTTCCCCGAGCAGCAGGAGGCCAGCCAGGGAGGCACTGAG)-3', a 43 mer corresponding to residues -31 to -17 (non-coding strand); 4) Oligo 4, 5'-d(CTCAGTGCCTCC)-3', a 12 mer encoding amino acids -31 to -28 (coding strand); 5) Oligo 5, 5'-d(AGTACTCGGCCACACCATGGGGCGCCCACTGCACCTCGTCCTG)-3', a 43 mer corresponding to nucleotides 10 to 53 (Leytus et al., 1986) (coding strand); 6) Oligo 6, 5'-d(CAGGACGAGGTG)-3', a 12 mer corresponding to nucleotides 42 to 53 (Leytus et al., 1986) (non-coding strand). The synthesized oligonucleotides were purified by polyacrylamide gel electrophoresis. The crude lyophilized pellets were dissolved in 50 to 100 ul distilled water. Aliquots (10 ul) were combined with 20 ul deionized formamide, heated at 90°C for 3 minutes, and separated by denaturing polyacrylamide gel electrophoresis. The 12 mers and the 24 and 25 mers were purified on 20% polyacrylamide-8.3 M urea gels (42 cm) at 1500 V for 3.5 and 8 hours, respectively. The 43 mers were electrophoresed on 12% polyacrylamide-8.3 M urea gels at 1350 V for 2 hours. Using dye markers (xylene cyanol and bromophenol blue in deionized formamide) as size standards, the full-length oligonucleotides were visualized by UV shadowing (260 nm) over a fluorescent thin layer chromatography (TLC) plate. The DNA bands were excised and eluted in 1.5 ml of buffer (0.5 M ammonium acetate, 10 mM magnesium acetate) at 37°C overnight.
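In the long-oligonucleotide labeling described in section G, each 43 mer is annealed to a complementary 12 mer, which is then extended. From the sequences listed above, Oligo 4 should be the reverse complement of the 3' end of Oligo 3, and Oligo 6 of the 3' end of Oligo 5. The sketch below checks this directly; the function names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def primes_at_3prime_end(template: str, primer: str) -> bool:
    """True if `primer` pairs perfectly with the 3'-terminal bases of `template`."""
    return template.endswith(reverse_complement(primer))


# Sequences as listed in the text (5'->3')
oligo3 = "GACTTTCCCCGAGCAGCAGGAGGCCAGCCAGGGAGGCACTGAG"
oligo4 = "CTCAGTGCCTCC"
oligo5 = "AGTACTCGGCCACACCATGGGGCGCCCACTGCACCTCGTCCTG"
oligo6 = "CAGGACGAGGTG"

for template, primer in ((oligo3, oligo4), (oligo5, oligo6)):
    print(len(template), len(primer), primes_at_3prime_end(template, primer))
```

Both pairs match over all 12 bases, so each 12 mer sits at the 3' terminus of its 43 mer, and extension by Klenow fragment copies the remaining 31 bases of the template.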
The eluates were first passed through small filtering units to remove any polyacrylamide debris and then purified by reverse-phase chromatography. Before use, Sep-Pak columns were equilibrated with 10 ml HPLC grade acetonitrile followed by 10 ml distilled water. The samples were applied and washed with five 1 ml aliquots of distilled water. Bound single-stranded DNA was then eluted with three 1 ml fractions of 20% (v/v) acetonitrile or 60% (v/v) methanol. Recovery was determined by absorbance at 260 nm (20 OD260nm = 1 mg/ml). The appropriate fractions were pooled and lyophilized under vacuum in a Speed Vac centrifuge (Sorvall) at 80°C for 3-4 hours. The purified oligonucleotide pellets were then dissolved in distilled water and stored at -20°C.

G. LABELING OF DNA

1. NICK TRANSLATION OF DNA

DNA hybridization probes were prepared by nick translation (Maniatis et al., 1982). Approximately 500 ng of DNA was labeled in 50 ul of 50 mM Tris-HCl pH 7.5, 5 mM magnesium chloride, 0.05 mg/ml BSA, 10 mM beta-mercaptoethanol, 20 uM dGTP, 20 uM dTTP, 1.4 uM dATP, 1.4 uM dCTP, 700 uCi alpha-32P dATP (3000 Ci/mmole), 700 uCi alpha-32P dCTP (3000 Ci/mmole), 0.2 mM calcium chloride, 50 pg DNase I, and 20 u E. coli DNA polymerase I (Kornberg). The reaction mixture was incubated at 15°C for 60-120 minutes and terminated by the addition of three volumes of stop buffer (1% SDS, 10 mM EDTA, 25 ug tRNA) and heating at 68°C for 5 minutes. Unincorporated nucleotides were removed by chromatography on a column of Ultrogel AcA54; the DNA sample was loaded and eluted in 10 mM Tris-HCl pH 7.5, 200 mM NaCl, 0.25 mM EDTA. The specific activities of the labeled DNA ranged between 10 to 10 million cpm/ug. The nick-translated DNA probes were denatured at 100°C for 10 minutes before use.

2. KLENOW LABELING OF DNA

DNA was also labeled by the method of Feinberg and Vogelstein (1983). DNA (50-300 ng) was heated to 100°C for 3 minutes, then incubated at 37°C for 10-30 minutes.
In a final volume of 50 ul, the denatured DNA was labeled in 50 mM Tris-HCl pH 8.0, 10 mM magnesium chloride, 10 mM beta-mercaptoethanol, 20 uM dCTP, 20 uM dGTP, 20 uM dTTP, 50 uCi alpha-32P dATP (3000 Ci/mmole), 200 mM HEPES pH 6.6, 60 OD260nm/ml p(dN)6, 0.4 mg/ml BSA, and 5 u E. coli DNA polymerase I Klenow fragment. The mixture was incubated at 37°C for 4-16 hours, and the reaction was terminated as outlined above. Unincorporated nucleotides were removed either by column chromatography (as described) or by spun column chromatography. Spun columns were prepared with Sephadex G-25 (stored in TE buffer, pH 7.5) by water aspiration, and the gel beads were packed by centrifugation in a table-top centrifuge for 2 minutes. Labeled DNA was then applied and separated from unincorporated nucleotides by a further 2 minute centrifugation. The specific activity of a DNA probe obtained by Klenow labeling was 1-2X10 cpm/ug. Probes were denatured at 100°C before use.

3. KLENOW LABELING OF SINGLE-STRANDED DNA

Single-stranded M13 phage DNA was labeled as described by Hu and Messing (1982) and Brown et al. (1982). M13 single-stranded DNA template (2.5 ul of a 50 ul preparation) was annealed to 2.5 ul M13 sequencing primer (1.5 ug/ml) in 15 ul 1 X Hin buffer (10 X Hin: 100 mM Tris-HCl pH 7.4, 500 mM NaCl, 50 mM magnesium chloride, 10 mM DTT). The mixture was heated at 68°C for 10 minutes, then cooled slowly to room temperature. In a 25 ul reaction volume containing 50 uCi alpha-32P dATP (3000 Ci/mmole), 0.05 mM dCTP, 0.05 mM dGTP, 0.05 mM dTTP, and 2 u E. coli DNA polymerase I Klenow fragment, the annealed template-primer was extended at 17°C for one hour. The reaction was terminated and unincorporated nucleotides were removed as noted above. The specific activity obtained was 2X10 cpm/ug. Heat denaturation was required before the probes were used. Long oligonucleotides were labeled as described above with the following modifications.
An aliquot (12 pmole) of the 43 mer and 40 pmole of the complementary 12 mer were annealed in a small volume (6 ul). After cooling, the annealed oligonucleotides were diluted with 12 ul distilled water, and 2 ul of the diluted template-primer was used for extension with alpha-32P dCTP in a 10 ul reaction volume. After a 10 minute incubation at 37°C, the reaction was chased with 0.05 mM dCTP for 15 minutes and terminated by the addition of 0.1 volume 5 mM EDTA and heating at 68°C for 5 minutes. Unincorporated nucleotides were removed by either column or spun column chromatography over a Sephadex G-25 bead matrix; the equilibration buffer used for column chromatography was 5 mM EDTA (pH 7.5). The specific activity of the oligonucleotide probes was 5-10X10 cpm/pmole. Probes were heat denatured before use.

4. END LABELING OF OLIGONUCLEOTIDES

The other oligonucleotides were 5' end-labeled with T4 polynucleotide kinase (Chaconas and van de Sande, 1980) in a 20 ul reaction containing 50 uCi gamma-32P ATP (3000 Ci/mmole), 0.1 M Tris-HCl pH 8.0, 0.01 M magnesium chloride, 0.005 M DTT, and 4 u T4 polynucleotide kinase. The reaction was incubated at 37°C for 45 minutes and terminated by the addition of 0.1 volume 0.5 M EDTA followed by heating at 68°C for 5 minutes. Unincorporated nucleotides were removed as described for Klenow-labeled oligonucleotides. The specific activity of the labeled oligonucleotides was approximately 10^ cpm/pmole. 32P-end-labeled oligonucleotide probes were added directly to the hybridization mixture.

H. SCREENING PLASMID LIBRARIES

Two liver cDNA libraries were screened by colony hybridization. A bovine liver cDNA library had recently been constructed (MacGillivray and Davie, 1984). The library consists of 90,000 independent recombinants, each containing a cDNA insert greater than 1,000 ± 200 bp, cloned into the Pst I site of pBR322 by homopolymeric dG:dC tailing.
Thirty thousand bacterial colonies of the cDNA library were spread on six cellulose nitrate filters (Millipore HATF 82 mm filters, 5,000 colonies per filter) and grown on LB-tetracycline (12.5 ug/ml) plates until the colonies were 1-2 mm in diameter. Two replica filters were prepared (Hanahan and Meselson, 1980); the filters were placed on fresh tetracycline plates and incubated at 37°C until the colonies were 2-3 mm in diameter. At this point, the master filters were stored at 4°C, while the replica filters were placed on LB-chloramphenicol (250 ug/ml) plates and incubated at 37°C for 16 hours. The replica filters were then prepared for hybridization (Grunstein and Hogness, 1975). DNA was denatured and bound to the filters by placing the filters twice (20 minutes each) on Whatman 3MM paper saturated with 0.5 M NaOH, 1.5 M NaCl. The filters were neutralized in a similar manner with 1 M Tris-HCl pH 7.5, followed by 0.5 M Tris-HCl pH 7.5, 1.5 M NaCl. After air drying, the filters were baked at 68°C for 4-16 hours. Before hybridization, cellular debris was removed by gentle rubbing in 2 X SSC (1 X SSC: 0.015 M sodium citrate, 0.15 M sodium chloride, pH 7). Prehybridization was for 16 hours at 68°C in a sealed bag containing 6 X SSC and 2 X Denhardt's solution (1 X Denhardt's solution: 0.02% each of BSA, Ficoll, and polyvinylpyrrolidone) (see Table I). The prehybridization solution was then replaced with hybridization solution containing 6 X SSC, 2 X Denhardt's solution, 0.5% SDS, and the 32P-labeled mixture of oligonucleotides at a concentration of 6.6 mM. After hybridization at 37°C for 18 hours, the filters were rinsed briefly in 6 X SSC at room temperature. The filters were then washed at 35°C (four times, 5 minutes per wash), and the damp filters were exposed to X-ray film with intensifying screens at room temperature for 68 hours. Areas of the master filters corresponding to positive colonies were cut out and resuspended in LB-tetracycline.
The colonies were replated at low colony density and rescreened as before. Positive colonies from the second screen were streaked out to give single colonies, and plasmid DNA was isolated.

An adult human liver cDNA library (Prochownick et al., 1983) was generously provided by S.H. Orkin (Children's Hospital Medical Center, Boston). This library consists of human liver cDNAs greater than 500 bp in size inserted into the Pst I site of pKT218 by homopolymeric dG:dC tailing. Bacterial colonies (240,000) of the human cDNA library were screened at high colony density. Conditions for hybridization and washing were as described by Degen et al. (1983) to allow for possible mismatches between the bovine and human sequences (see Table I). Filters were hybridized in 6 X SSC, 2 X Denhardt's solution, 0.5% SDS at 60°C for 18 hours, washed at 60°C in 6 X SSC, 0.5% SDS (three times, 30 minutes each), and autoradiographed with intensifying screens at -70°C for 18 hours (Laskey and Mills, 1977; Swanstrom and Shank, 1978). Positive clones were rescreened at lower colony density until all colonies plated were positive. The library was later rescreened using a 350 bp Pst I fragment of pcHX5 cloned into M13mp8 as the probe (Fung et al., 1984). Because this probe was derived from human factor X cDNA, more stringent hybridization and washing conditions were used. Filters were hybridized in 6 X SSC, 2 X Denhardt's solution, 0.5% SDS, 1 mM EDTA at 68°C for 18 hours (see Table I) and washed three times at 68°C for 30 minutes in 1 X SSC, 0.5% SDS.

TABLE I. COLONY AND IN SITU HYBRIDIZATION CONDITIONS
Summary of hybridization and washing conditions used in screening plasmid and phage lambda libraries. Autoradiography varied greatly depending upon the density of plating and the number of previous screens. For plasmid or phage lambda Southern blot analyses, conditions used were identical to those given for screening. Details are given in the text.

Probe: Oligonucleotide (17-24 mers)
  Prehybridization: 6 X SSC, 2 X Denhardt's solution; 68°C, 1 hour-overnight
  Hybridization: 6 X SSC, 2 X Denhardt's solution, 0.5% SDS; 37°C, overnight
  Washes: 6 X SSC, 3 X 5 minutes; 37°C (17 mer) or 52°C (24 mer)
  Autoradiography: intensifier, -70°C, overnight-3 days
Probe: cDNA (same species)
  Prehybridization: 6 X SSC, 2 X Denhardt's solution; 68°C, 1 hour-overnight
  Hybridization: 6 X SSC, 2 X Denhardt's solution, 0.5% SDS, 1 mM EDTA; 68°C, overnight
  Washes: 1 X SSC, 0.5% SDS, 3 X 30 minutes; 68°C
  Autoradiography: intensifier, -70°C, 4 hours-overnight
Probe: cDNA (cross species, bovine to human)
  Prehybridization: 6 X SSC, 2 X Denhardt's solution; 68°C, 1 hour-overnight
  Hybridization: 6 X SSC, 2 X Denhardt's solution, 0.5% SDS; 60°C, overnight
  Washes: 6 X SSC, 0.5% SDS, 3 X 30 minutes; 60°C
  Autoradiography: intensifier, -70°C, 4 hours-overnight
Probe: Long oligonucleotide (43 mers) (screens/Southerns)
  Prehybridization: 6 X SSC, 50 mM sodium phosphate, 5 X Denhardt's solution, 20% deionized formamide; 42°C, 1-2 hours
  Hybridization: 6 X SSC, 50 mM sodium phosphate, 5 X Denhardt's solution, 20% formamide, 10% dextran sulphate; 42°C, overnight
  Washes: 1 X SSC, 0.1% SDS; rinse 3 X 30 minutes at room temperature, then wash 3 X 30 minutes at 60°C
  Autoradiography: intensifier, -70°C, 4 hours-3 days
Probe: cDNA (genomic Southern and Northern blot analysis)
  Prehybridization: same solution as for hybridization; 37°C, 2 hours-overnight
  Hybridization: 50% formamide, 6 X SSC, 1 mM EDTA, 0.1% SDS, 0.05% NaPPi, 100 ug/ml herring sperm DNA, 25 ug poly (A); 37°C, 24-48 hours
  Washes: rinse in 2 X SSC, 1 X Denhardt's solution, 2 X 30 minutes at room temperature/50°C; wash in 0.1 X SSC, 0.1% SDS (Southerns: 2 X 90 minutes; Northerns: 2 X 60 minutes)
  Autoradiography: intensifier, -70°C, overnight-7 days

I. SCREENING PHAGE LAMBDA LIBRARIES 1.
IN SITU HYBRIDIZATION SCREENING Genomic and cDNA libraries in a variety of lambda vectors were screened by the procedure of Benton and Davis (1977). For initial screening, up to one million recombinant phage were plated at a high density of 50,000 per 150 mm petri dish. Phage were preadsorbed to host cells at 37°C for 15 minutes prior to plating. Once the top agar had solidified, plates were incubated at 37°C for 4-6 hours, or until phage plaques were just visible. The phage were transferred to duplicate nitrocellulose filters and grown overnight (37°C) on fresh NZYCM plates; this amplification step was omitted for subsequent low density screens. Master plates were stored at 4°C. The filters were submerged twice for 5 minutes in denaturation solution (0.5 M NaOH, 1.5 M NaCl) and neutralized for 5 minutes each, first in 1 M Tris-HCl pH 7.5, then in 0.5 M Tris-HCl pH 7.5, 1.5 M NaCl. Cellular debris was removed in 3 X SSC; the filters were air dried and baked at 68°C for 4-16 hours. Hybridization and washing conditions varied depending on the DNA probe used (Table I). For long oligonucleotide probes, the conditions were as described by Ullrich et al. (1984) and McLean et al. (1984). The hybridization solution consisted of 6 X SSC, 50 mM sodium phosphate pH 6.8, 5 X Denhardt's solution, 20% formamide, and 10% dextran sulfate. Incubation was at 42°C for 16 hours. The filters were then washed three times in 1 X SSC, 0.1% SDS at 45-60°C for 30 minutes. Conditions for autoradiography varied with the conditions of hybridization and washing, the vector used in library construction, and the type and specific activity of the probe.

2. RECOMBINANT SCREENING A human genomic DNA library constructed in lambda Charon 4A (Lawn et al., 1978) was screened by recombination (Seed, 1983). This procedure is described in detail in Appendix I.
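The figure of up to one million phage screened can be related to library coverage with the Clarke and Carbon estimate, N = ln(1 - P) / ln(1 - f), where P is the desired probability that a given sequence is present and f is the insert size as a fraction of the genome. The inputs below are assumptions for illustration (a haploid human genome of about 3 x 10^9 bp and an average insert of 15 kbp), not values reported in this thesis.

```python
import math


def clones_needed(probability: float, insert_bp: float, genome_bp: float) -> int:
    """Clarke-Carbon estimate of the number of recombinants to screen so that
    any particular sequence is represented at least once with `probability`."""
    if not 0 < probability < 1:
        raise ValueError("probability must lie strictly between 0 and 1")
    f = insert_bp / genome_bp
    # log1p keeps precision when f is very small
    return math.ceil(math.log(1 - probability) / math.log1p(-f))


n99 = clones_needed(0.99, insert_bp=15_000, genome_bp=3e9)
print(n99)  # on the order of 9 x 10^5
```

Under these assumptions about 9.2 x 10^5 recombinants give a 99% chance that a given sequence is present; the number scales roughly inversely with insert size.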
The DNA sequence used as a probe (a 500 bp fragment from the 5' region of the human factor X gene) was cloned into the small plasmid piAN7 as described elsewhere. The bacterial host strain MC1061/P3 was transformed with the recombinant plasmid. An aliquot (250 ul) of an overnight culture of the recombinant piAN7 transformant was preadsorbed with 2 million phage at 37°C for 10 minutes and plated on NZYCM-ampicillin/tetracycline (12.5 ug/ml and 7.5 ug/ml, respectively) agar. As a control, an equivalent number of phage were plated on the nonrecombinant host, MC1061/P3/piAN7. Plates were incubated at 37°C overnight and overlaid with 5 ml SM. The phage were harvested at 4°C overnight and titered on two bacterial host strains: K802 (sup+) infected with the eluted phage (as a negative control) was grown on NZYCM plates, and MC1061/P3 (sup0) infected with the same phage stock was plated on NZYCM-kanamycin (50 ug/ml) agar. Recombinant phage clones were isolated from the sup0 host and subjected to a second round of screening by Southern blot analysis, with 32P-end-labeled Oligo 2 as the hybridization probe.

J. CONSTRUCTION OF SPECIFIC GENOMIC PHAGE LAMBDA LIBRARIES 1. ISOLATION OF GENOMIC DNA Two specific human genomic DNA libraries were cloned in phage lambda vectors. Genomic DNA digested with the restriction endonuclease Hind III was ligated into the lambda Dash vector (Frischauf et al., 1983). Likewise, genomic DNA cleaved with Bam HI was end-modified and inserted into the lambda gt11 vector (Young and Davis, 1983a,b). Aliquots (50-100 ug) of human genomic DNA were incubated with either Hind III or Bam HI under conditions recommended by the suppliers. After digestion at 37°C for 4-12 hours, DNA fragments were separated by electrophoresis in 0.9% agarose at 25 mA for 10-12 hours. Three gel slices were excised from each of the 10 Kbp region of the Hind III digest and the 4 Kbp region of the Bam HI digest.
The DNA was recovered either by low melting point agarose isolation or by Geneclean extraction (Vogelstein and Gillespie, 1979), resuspended in a small volume of TE buffer, and stored at -20°C. To identify the correct fractions for library construction, one-tenth of each DNA isolate was subjected to Southern blot analysis using 32P-labeled Oligos 3-4 and 5-6 as hybridization probes, as described elsewhere. The blots were exposed to X-ray film with intensifying screens for 7 days at -70°C.

2. MODIFICATION OF GENOMIC DNA FRAGMENTS The 4 Kbp Bam HI DNA fragments were modified to Eco RI ends as follows. The DNA fragments (approximately 3 ug) were made blunt-ended in a 55 ul solution consisting of 5.5 ul 10 X Klenow buffer (0.5 M Tris-HCl pH 7.5, 0.1 M magnesium sulphate, 1 mM DTT, 500 ug/ml BSA), 5.5 ul 10 X dNTP (0.8 mM dCTP, 0.8 mM dTTP, 0.8 mM dGTP, 0.5 mM dATP), 5 ul alpha-32P dATP (10 uCi/ul), and 7.5 u E. coli DNA polymerase I Klenow fragment. The reaction was incubated at room temperature for 15 minutes, and the unincorporated nucleotides were removed by Geneclean extraction. To protect internal Eco RI restriction sites, the DNA was methylated (37°C, 30 minutes) in a 50 ul reaction containing 5 ul 10 X buffer (0.5 M Tris-HCl pH 7.5, 0.1 M EDTA), 2.5 ul 0.1 M DTT, 4 ul 1.5 mM S-adenosylmethionine (SAM), and 50 u Eco RI methylase. The DNA was again purified by glassmilk adsorption and eluted in 14 ul TE buffer. Eco RI linkers (1 ul of 1 ug/ul) were ligated to the blunt-ended, methylated DNA in a 15 ul reaction containing ligation buffer (66 mM Tris-HCl pH 7.5, 5 mM magnesium chloride, 5 mM DTT, 1 mM ATP) and 1 u T4 DNA ligase. The reaction was sealed in a 10 ul capillary tube and incubated at 4°C for 48 hours. The linkered DNA was subsequently digested with Eco RI and prepared for column chromatography.
The DNA (30 ul) was mixed with 4 ul of 50% glycerol, 1 mM EDTA, 0.02% bromophenol blue and applied to a 1 ml Biogel A50M column (43 cm) that had been equilibrated for 2 days with 0.1 M NaCl, 0.02 M Tris-HCl pH 7.5, 1 mM EDTA. Fractions (50 ul) containing 32P-labeled DNA were identified by Cerenkov counting, pooled, and precipitated with ethanol. The DNA pellet was then resuspended in 5 ul TE buffer.

3. LIGATION AND PACKAGING OF GENOMIC DNA The genomic DNA libraries were constructed by insertion into phage vectors followed by in vitro packaging. Lambda Dash vector DNA (5 ug) was cleaved with Hind III, and the 'stuffer' fragment was removed by digestion with Xho I followed by selective precipitation with ethanol. The isolated 10 Kbp Hind III DNA fragments (1 ug) were ligated to the lambda Dash vector (1 ug) in a 5 ul reaction (for buffer conditions, see above). The ligation mixture was heat-sealed in a capillary tube and incubated at 4°C for 48 hours. Likewise, the linkered 4 Kbp Bam HI DNA fraction (1-2 ug) was inserted into Eco RI-predigested lambda gt11 (1 ug). The ligation reactions were then packaged with Gigapack Plus under conditions recommended by Stratagene Cloning Systems. Packaging extracts were thawed quickly from -70°C. The ligation mixture, followed by 15 ul of the yellow tube solution, was added to the red tube. The contents were gently mixed and incubated at room temperature for 2 hours. SM buffer (500 ul) was added, and the resulting phage stock was stored at 4°C. The lambda Dash and lambda gt11 phage libraries were titered and plated on host strains P2392 and RY1088, respectively. The hybridization and washing conditions used were as described above for screening phage lambda libraries (Table I).

K. DNA SUBCLONING 1. PRODUCTION OF DNA FRAGMENTS FOR LIGATION DNA fragments to be subcloned into pUC13, M13, or piAN7 vectors were produced either by restriction endonuclease digestion or by random sonication (Deininger, 1983).
To generate specific fragments, DNA was digested with one or more restriction endonucleases under conditions suggested by the manufacturers. Restriction endonuclease fragments were either used directly or purified by gel electrophoresis prior to ligation. Fragments were recovered from agarose gels by low melting point isolation, electroelution (Maniatis et al., 1982), or Geneclean extraction (Vogelstein and Gillespie, 1979). Random DNA fragments were generated by sonication using a Heat Systems Sonifier at output level 2. DNA (20-50 ug) in 500 ul of 0.5 M NaCl, 0.1 M Tris-HCl pH 7.5, 10 mM EDTA was sheared by seven 5 second pulses; the DNA solution was mixed and cooled on ice between pulses. The resulting DNA fragments were separated by electrophoresis in a 5% preparative polyacrylamide gel. Fragments of 300-500 bp were recovered by electroelution, and their ends were repaired with T4 DNA polymerase (in a 50 ul reaction of 33 mM Tris acetate pH 7.8, 66 mM potassium acetate, 10 mM magnesium acetate, 100 ug/ml BSA, 0.2 mM of each dNTP, and 6 u T4 DNA polymerase). The random DNA fragments were then extracted with phenol, precipitated with ethanol, and resuspended at 10 ug/ml in TE buffer.

2. LIGATION OF DNA INTO pUC13, M13, AND piAN7 VECTORS DNA fragments were ligated into pUC13, M13, and piAN7 vectors in 10-15 ul of buffer containing 66 mM Tris-HCl pH 7.5, 5 mM magnesium chloride, 5 mM DTT, 0.4-1.0 mM ATP, and T4 DNA ligase (1 unit for blunt-ended ligations and 0.2 unit for sticky-ended ligations; Maniatis et al., 1982). For pUC13 ligations, 50-100 ng of vector was ligated to a 3-fold molar excess of insert DNA, while for M13 ligations, 10-20 ng of vector was ligated to a 1- to 3-fold molar excess of insert DNA. The piAN7 probe was constructed with 30 ng of vector and a 3-fold molar excess of insert DNA. Ligation proceeded at 15-17°C for 4 hours for sticky-ended DNA or overnight for blunt-ended DNA. The ligated DNA was used immediately or stored at -20°C. 3.
TRANSFORMATION OF DNA INTO BACTERIA Competent host bacterial cells were prepared by calcium chloride treatment (Messing, 1983). The host cells (50 ml) were grown at 37°C to mid-log phase (OD600 = 0.5-0.7), then centrifuged at 4,000 rpm for 4 minutes at 4°C. The pellet was resuspended in 25 ml ice-cold 50 mM calcium chloride and placed on ice for 25 minutes to 1 hour. The treated cells were recentrifuged as before and resuspended gently in 5 ml cold 50 mM calcium chloride. At this point, the cells were used immediately or incubated at 4°C for 16 hours to increase transformation efficiency (Dagert and Ehrlich, 1979). Ligated DNA (3 ul) was added to 0.3 ml competent cells and incubated on ice for 40 minutes. The suspension was heat shocked at 42°C for 2 minutes. M13 transformants were plated immediately with 10 ul 100 mM IPTG, 50 ul X-Gal (2% w/v in dimethylformamide), 0.2 ml fresh host cells, and 4 ml soft YT agar (42°C). pUC13-transformed cells were rescued by the addition of 0.7 ml Luria broth followed by incubation at 37°C for 30 minutes to 1 hour. Cells (100 ul) were then spread on LB-ampicillin (50 ug/ml) plates saturated with X-Gal (50 ul) and incubated at 37°C overnight. With this selection procedure, recombinant clones appeared clear whereas nonrecombinants were blue. piAN7-transformed cells were rescued as described for pUC13 transformants, plated on LB-ampicillin/tetracycline (50 ug/ml and 12.5 ug/ml, respectively) agar, and incubated at 37°C overnight. Recombinant clones were detected by colony hybridization.

L. BLOT HYBRIDIZATION 1. SOUTHERN BLOT ANALYSIS Southern blots were prepared as described by Southern (1975). DNA (1-10 ug) was digested with restriction enzymes and electrophoresed in an agarose gel. To nick the DNA prior to transfer, the agarose gel was exposed to UV irradiation (260 nm) for 30 seconds.
The gel was denatured for 40 minutes in 0.5 N NaOH, 0.6 M NaCl and neutralized in two changes of 1 M Tris-HCl pH 7.5, 0.6 M NaCl for 30 minutes each. Transfer was performed by one of two methods. The aqueous method used Whatman 3MM paper wicks to maintain a continuous flow of buffer (10 X SSC) through the gel; transfer of DNA was complete in 16-30 hours. For the ammonium acetate method, the gel, the nitrocellulose filter, and the Whatman 3MM paper were presoaked in 1 M ammonium acetate for 40 minutes. Transfer was then carried out in the absence of buffer, in the direction opposite to that of the aqueous method, and efficient DNA transfer was achieved in 4 hours. The Southern blots were then rinsed in 3 X SSC, air dried, and baked either at 68°C for 6-16 hours or at 80°C for 1 hour. Hybridization of 32P-labeled DNA to genomic Southern blots was performed according to Kan and Dozy (1978). The nitrocellulose blots were prehybridized and hybridized in solutions consisting of 50% formamide, 6 X SSC, 1 mM EDTA, 0.1% SDS, 10 mM Tris-HCl pH 7.5, 10 X Denhardt's solution, 0.05% sodium pyrophosphate, 100 ug/ml denatured herring sperm DNA, 25 ug poly (A), and 50 ug/ml tRNA, with 32P-labeled DNA added for hybridization. Prehybridization and hybridization were carried out at 37°C for 1-16 hours and 30-48 hours, respectively. The blots were washed once in 2 X SSC, 1 X Denhardt's solution (room temperature, 1 hour) and twice in 0.1 X SSC, 0.1% SDS (90 minutes, 50°C). The blots were then rinsed twice at room temperature in 0.1 X SSC, 0.1% SDS, followed by four rinses at room temperature in 0.1 X SSC. Autoradiography was performed at -70°C for 1-7 days with intensifying screens. Conditions used for genomic Southern blots hybridized with 32P-labeled long oligonucleotides, and for plasmid/phage lambda Southern blots hybridized with specific 32P-labeled DNA, were as described under screening phage lambda libraries.
Depending on the type of blot analysis, film exposure times varied from 1 hour to 7 days.

2. NORTHERN BLOT ANALYSIS RNA was separated by electrophoresis in formaldehyde agarose gels as described above (Maniatis et al., 1982). After electrophoresis, the gel was rinsed in several changes of distilled water (5 minutes each), denatured in 50 mM NaOH, 10 mM NaCl (45 minutes), and then neutralized in 0.1 M Tris-HCl pH 7.5 (45 minutes). Prior to transfer, the gel was soaked in 10 X SSC for 1 hour. RNA was transferred to nitrocellulose by the aqueous wick method (see above). Specific mRNA species were detected by hybridization and autoradiography as described for genomic Southern blots.

M. DNA SEQUENCE ANALYSIS 1. PRODUCTION OF M13 CLONES DNA fragments to be sequenced by the chain termination method (Sanger et al., 1977) were ligated into the M13 vectors mp8, mp9, mp18, and mp19 (Messing et al., 1981; Messing, 1983). To identify recombinant clones, M13 phage plaques were screened by in situ hybridization (Benton and Davis, 1977). Selected clones were purified by replating at low plaque density. Phage were subsequently grown, and single-stranded M13 DNA was prepared. All techniques used are described elsewhere. 2. DNA SEQUENCE ANALYSIS The nucleotide sequence of the M13 DNA clones was determined by the chain termination method developed by Sanger et al. (1977) and modified by Messing et al. (1981). The concentrations of the dideoxy- and deoxyribonucleotides used are shown in Table II. Under these conditions, some M13 templates gave ambiguous sequence data, presumably because of secondary structure that remained undenatured in the DNA fragments. In these cases, the sequence analysis was repeated with deoxyinosine triphosphate (dITP) substituted for dGTP in each reaction (Barnes et al., 1983) (also see Table II).
The dITP reactions were chased with 0.5 mM dGTP and 0.5 mM dATP, and an additional, regular dC/ddC reaction was included to eliminate irregularities in the dC/ddC (dITP) reaction.

TABLE II. DNA SEQUENCING MIXES
The concentrations (uM) of the dideoxy- and deoxyribonucleotide triphosphates used in the sequencing mixes for M13 DNA sequencing. Values were determined empirically by Dr. Joan McPherson, Dept. of Plant Sciences, University of British Columbia. In the case of ambiguous results, deoxyinosine triphosphate (dITP) was substituted for dGTP at a ratio of .4:1 (dITP:dGTP) and the concentration of ddG was lowered 10-fold.

Nucleotide   d/ddG   d/ddA   d/ddT   d/ddC
dG             7.9   109.4   158.7   157.9
dT           157.6   109.4     7.9   157.9
dC           157.6   109.4   158.7    10.5
ddG          157.4       -       -       -
ddA              -   116.7       -       -
ddT              -       -   550.3       -
ddC              -       -       -   191.6

The M13 template DNA (4 ul) was added to 1 ul primer (1.5 ug/ml; 17 mer with the sequence 5'-d(GTAAAACGACGGCCAGT)-3'), 2 ul 10 X Hin buffer (600 mM NaCl, 100 mM Tris-HCl pH 7.5, 70 mM magnesium chloride), and 1 ul distilled water. The mixture was heated at 68°C for 10 minutes, then cooled slowly to room temperature (20-30 minutes). The annealed template-primer was then combined with 1 ul 15 uM dATP, 1.5 ul alpha-32P dATP, and 5 ul diluted DNA polymerase I Klenow fragment (0.4 u/ul). To initiate the reactions, 3 ul of the template/primer mixture was mixed with 1.5 ul of each of the four dideoxy/deoxyribonucleotide solutions. The extension was performed at room temperature for 15 minutes, followed by a 20 minute chase with 1 ul 0.5 mM dATP. Dye mix (5 ul of 98% formamide, 10 mM EDTA pH 8.0, 0.02% xylene cyanol, 0.02% bromophenol blue) was added, and the reactions were terminated by heating at 90°C for 3 minutes. The extended products (1-2 ul) were then separated on 6% and 8% thin (0.35 mm) polyacrylamide-8.3 M urea gels (50 cm long) at 52 W.
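Because each mix in Table II terminates at one base, the ratio of dideoxy- to deoxynucleotide for that base governs how often chains stop; a higher ratio gives shorter average fragments. The sketch below simply computes those ratios from the tabulated values for the G, T and C mixes. The A mix is left out because its dATP is supplied separately with the labeled nucleotide and is not listed in Table II; the names used are illustrative.

```python
# Concentrations (uM) from Table II: the chain-terminating ddNTP and the
# competing dNTP of the same base in each base-specific mix.
TABLE_II = {
    "G": {"dNTP": 7.9, "ddNTP": 157.4},
    "T": {"dNTP": 7.9, "ddNTP": 550.3},
    "C": {"dNTP": 10.5, "ddNTP": 191.6},
}


def dd_to_d_ratios(table: dict[str, dict[str, float]]) -> dict[str, float]:
    """ddNTP:dNTP ratio for each mix; larger ratios terminate chains more often."""
    return {base: mix["ddNTP"] / mix["dNTP"] for base, mix in table.items()}


ratios = dd_to_d_ratios(TABLE_II)
for base, ratio in ratios.items():
    print(f"d/dd{base} mix: ddNTP:dNTP = {ratio:.1f}")
```

For these mixes the d/ddT mix carries by far the highest ratio (about 70:1), with the G and C mixes near 20:1 and 18:1.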
After electrophoresis, the gels were dried and visualized by autoradiography (overnight exposure at room temperature). In some instances where M13 subclones were not obtained, the chemical cleavage method of DNA sequence analysis was employed as described by Maxam and Gilbert (1980). 3. COMPUTER ANALYSIS OF DNA SEQUENCE DATA All sequence data were analyzed with the computer programs of Staden (1982) and Delaney (1982).

III. RESULTS A. CHARACTERIZATION OF THE BOVINE FACTOR X cDNA 1. ISOLATION AND CHARACTERIZATION OF THE BOVINE FACTOR X cDNA CLONES A bovine liver cDNA library was previously constructed (MacGillivray and Davie, 1984). The library consists of 90,000 independent recombinants, each containing a cDNA insert greater than 1,000 ± 200 bp cloned into the Pst I site of pBR322 by homopolymeric dG:dC tailing. Thirty thousand bacterial colonies of the library were screened at high colony density (Hanahan and Meselson, 1980) with a mixture of 24 synthetic oligonucleotides coding for residues 274-279 of the heavy chain of bovine factor X (Titani et al., 1975) (Figure 4). A mixture of oligonucleotides was necessary because of the degeneracy of the genetic code. After hybridization with the 32P-labeled oligonucleotides, nine positive colonies were detected in corresponding positions on duplicate filters. An example of a positive colony is shown in Figure 5A. Because single colonies could not be identified in this high density screen, positive colonies were replated at low density and rescreened as before. An example of the second screen is shown in Figure 5B-D. After hybridization with the 32P-labeled oligonucleotides, excess probe was removed by several short washes at 23°C (Figure 5B). A single positive colony was detected on both replica filters, but other colonies on the filters were also weakly positive. To test the specificity of the hybridization, the filters were washed at 38°C (Figure 5C).
Although this washing step removed some of the background hybridization, a further wash at 46°C was required to remove all of it, leaving a single positive colony that hybridized specifically to the mixture of synthetic oligonucleotides (Figure 5D). Five of the nine positives from the first screen were positive in the low density second screen. Plasmid DNA was isolated from each of the five positives, designated pBX1-pBX5.

FIGURE 4: SYNTHETIC OLIGONUCLEOTIDE MIXTURE FOR BOVINE FACTOR X. Prediction of the oligonucleotide mixture sequence used as a hybridization probe in screening the bovine cDNA library. The mixture encodes residues 274-279 of the heavy chain of bovine factor X as determined by Titani et al. (1975).

Residue:               274       275   276         277       278       279
Amino acid sequence:   Lys       Trp   Ile         Asp       Lys       Ile
mRNA sequence:      5' AA(A/G)   UGG   AU(U/C/A)   GA(U/C)   AA(A/G)   AU 3'
cDNA sequence:      3' TT(T/C)   ACC   TA(A/G/T)   CT(A/G)   TT(T/C)   TA 5'

The probe was a mixture of 24 different 17 mers.

FIGURE 5: AUTORADIOGRAPH OF BOVINE LIVER cDNA LIBRARY SCREENED WITH A MIXTURE OF SYNTHETIC OLIGODEOXYRIBONUCLEOTIDES CODING FOR BOVINE FACTOR X. For each of panels A-D, duplicate filters are shown. X marks were used to orient the filters, and the arrows point to colonies that were positive on both replica filters. PANEL A: High density screen of 5,000 colonies. Hybridization and washing conditions are described in Materials and Methods. PANEL B: The positive colony from panel A was rescreened at low colony density. After hybridization, the filters were washed four times in 6 X SSC at 23°C and exposed to X-ray film for 18 hours. PANEL C: The filters from panel B were washed four times in 6 X SSC at 38°C and re-exposed to X-ray film for 22 hours. PANEL D: The filters from panel C were washed four times in 6 X SSC at 46°C and re-exposed to X-ray film for 18 hours.
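The 24-fold degeneracy of the Figure 4 probe follows directly from the codon table: two codons for each lysine, three for isoleucine 276, two for aspartate and one for tryptophan, while the final isoleucine contributes only its first two bases. The sketch below enumerates the mixture. As an added illustration that is not part of the thesis, it also estimates each 17 mer's melting temperature with the Wallace rule of thumb, Tm = 2(A+T) + 4(G+C) °C, which is only roughly applicable to short oligonucleotides in high-salt buffers such as 6 X SSC. The spread of estimates gives a sense of why a single washing temperature discriminates unevenly across the mixture.

```python
from itertools import product

# Sense-strand codons (DNA alphabet, T for U) for the residues in Figure 4.
CODONS = {
    "Lys": ["AAA", "AAG"],
    "Trp": ["TGG"],
    "Ile": ["ATT", "ATC", "ATA"],
    "Asp": ["GAT", "GAC"],
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def probe_mixture(peptide: list[str], last_codon_bases: int = 2) -> list[str]:
    """All antisense oligonucleotides (written 5'->3') complementary to the
    possible mRNAs for `peptide`; the last codon is truncated to its first bases."""
    choices = [CODONS[aa] for aa in peptide]
    choices[-1] = sorted({codon[:last_codon_bases] for codon in choices[-1]})
    sense = {"".join(parts) for parts in product(*choices)}
    return sorted(seq.translate(COMPLEMENT)[::-1] for seq in sense)


def wallace_tm(seq: str) -> int:
    """Wallace rule of thumb: Tm (degrees C) = 2(A+T) + 4(G+C)."""
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


probes = probe_mixture(["Lys", "Trp", "Ile", "Asp", "Lys", "Ile"])
tms = [wallace_tm(p) for p in probes]
print(len(probes), "probes of", len(probes[0]), "nt; Wallace Tm", min(tms), "-", max(tms), "C")
```

The 24 sequences have Wallace estimates between 40 and 48°C.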
Digestion with Pst I showed that these plasmids contained bovine cDNA inserts of 1530 bp, 770 bp, 700 bp, 1100 bp, and 980 bp. Preliminary restriction mapping showed that the inserts contained overlapping DNA (Figure 6). The plasmid containing the largest cDNA insert (pBX1) was chosen for further study. 2. DNA SEQUENCE ANALYSIS OF BOVINE FACTOR X cDNA CLONES The complete nucleotide sequence of the cDNA insert of pBX1 was determined using the strategy shown in Figure 6. Restriction endonuclease fragments of pBX1 were subcloned into phage M13 and sequenced by the chain termination method (Figure 6, light arrows), which yielded 90% of the sequence of the pBX1 insert. The sequence analysis was completed using the chemical cleavage method (Figure 6, heavy arrows). The complete DNA sequence of the pBX1 insert is shown in Figure 7. Nucleotides 196-615 of pBX1 code for the light chain of factor X (Enfield et al., 1980), and nucleotides 622-1537 code for most of the heavy chain of factor X (Titani et al., 1975). The cDNA sequence predicts that factor X mRNA encodes a single chain polypeptide precursor in which the light and heavy chains are linked by the dipeptide Arg-Arg (encoded by nucleotides 616-621 in Figure 7). None of the factor X clones contains DNA complementary to the 3' end of factor X mRNA. Comparison with the amino acid sequence of the heavy chain shows that pBX1 probably lacks 14 nucleotides of coding sequence, a presumed stop codon, the 3' noncoding region, and the poly (A) tail. Because cDNA synthesis was primed with oligo (dT), this loss of 3' sequences may have resulted from subsequent exonuclease activity or from incomplete second strand synthesis during construction of the cDNA library. Analysis of the 5' end of factor X cDNA reveals the presence of a leader sequence (Figure 7).
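The nucleotide coordinates given for pBX1 can be checked for reading-frame consistency with simple codon arithmetic. The sketch below uses only the coordinates stated in this section and in the Figure 7 legend (taken as 1-based and inclusive), and tests whether each coding region starts in frame with the ATG at nucleotide 76; the function name `describe` is hypothetical.

```python
# 1-based, inclusive nucleotide coordinates in the pBX1 insert (text and Figure 7 legend).
START_CODON = 76
REGIONS = [
    ("5' untranslated region", 1, 75),
    ("leader peptide", 76, 195),
    ("light chain", 196, 615),
    ("Arg-Arg linker", 616, 621),
    ("heavy chain (incomplete)", 622, 1537),
]


def describe(regions, start_codon):
    """Length, frame check, and whole codons (plus leftover bases) for each region."""
    rows = []
    for name, first, last in regions:
        length = last - first + 1
        if first >= start_codon:
            in_frame = (first - start_codon) % 3 == 0
            codons, extra = divmod(length, 3)
        else:
            in_frame, codons, extra = None, 0, length  # untranslated region
        rows.append((name, length, in_frame, codons, extra))
    return rows


for name, length, in_frame, codons, extra in describe(REGIONS, START_CODON):
    print(f"{name:26s} {length:4d} nt  in frame: {in_frame}  codons: {codons} (+{extra} nt)")
```

The leader region comes out at exactly 40 codons and the light chain at 140, each in frame with the ATG at nucleotide 76; the heavy-chain region ends one base into a further codon, consistent with pBX1 lacking the remainder of the coding sequence.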
A single ATG codon (nucleotides 76-78) occurs in the same reading frame that codes for the single chain factor X. This suggests that factor X is synthesized as a precursor having an N-terminal leader peptide of 40 amino acid residues. In that case, pBX1 contains a 5' untranslated region of 75 nucleotides.

FIGURE 6: RESTRICTION MAP AND SEQUENCING STRATEGY FOR BOVINE FACTOR X cDNA. The bars below the restriction map represent the clones pBX1-pBX5. The 5' noncoding region is represented by a dotted bar, the region coding for the leader peptide by the slashed bar, the region coding for the light chain of plasma factor X by the solid bar, and the region coding for the heavy chain of plasma factor X by the open bar. The extent of sequencing of pBX1 is indicated by the length of each arrow. DNA sequence determined on the coding strand is indicated by arrows pointing right; sequence determined on the noncoding strand is indicated by arrows pointing left. DNA sequence determined by the chain termination method (Sanger et al., 1977) is indicated by the light arrows; DNA sequence determined by the chemical cleavage method (Maxam and Gilbert, 1980) is indicated by the heavy arrows. The scale at the bottom represents nucleotides in kilobases.
[Figure 6 panel: restriction map with clones pBX1-pBX5 shown as bars beneath it; scale: nucleotides (kb), 0.2-1.6]

FIGURE 7: NUCLEOTIDE SEQUENCE OF BOVINE FACTOR X cDNA. The sequence was determined by analysis of the overlapping clones shown in Figure 6. The predicted amino acid sequence of bovine preprofactor X is shown above the DNA sequence. Nucleotides 76-195 code for a leader sequence, nucleotides 196-615 code for the light chain of plasma factor X (Enfield et al., 1980), and nucleotides 622-1537 code for most of the heavy chain of plasma factor X (Titani et al., 1975). The single chain bovine factor X is numbered from the site of cleavage that gives rise to the light and heavy chains of factor X. The cDNA sequence predicts that the light and heavy chains of factor X are joined by the dipeptide Arg-Arg (encoded by nucleotides 616-621). Putative cleavages to form two chain factor X are shown by the heavy arrows and the factor IXa cleavage site (Fujikawa et al., 1974) by the light arrow; the attachment sites for carbohydrate (Titani et al., 1975) on Asn-173 and Thr-445 are indicated by solid diamonds. The boxed region (nucleotides 1450-1466) is complementary to one of the synthetic oligonucleotides used as a hybridization probe in the isolation of factor X cDNA clones.
[Figure 7 panel: nucleotide sequence 1-1537 of the pBX1 insert with the predicted amino acid sequence, residues -40 to 447]

The region of factor X mRNA encoded by pBX1 is G/C rich (65% G/C, 35% A/T). This reflects the codon usage, in which 88% of the bases in the third codon position are G or C.
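The composition figures quoted for pBX1 (overall G+C content, G+C at the third codon position, and sense codons never used) can all be computed from an in-frame coding sequence. The sketch below is a generic illustration; the short example sequence is hypothetical and is not the factor X sequence, and codons are written in the DNA alphabet, so T stands for U.

```python
from collections import Counter
from itertools import product

STOP_CODONS = {"TAA", "TAG", "TGA"}
SENSE_CODONS = [
    "".join(bases) for bases in product("TCAG", repeat=3)
    if "".join(bases) not in STOP_CODONS
]


def split_codons(cds: str) -> list[str]:
    """Split an in-frame coding sequence into codons, dropping a trailing partial codon."""
    usable = len(cds) - len(cds) % 3
    return [cds[i:i + 3] for i in range(0, usable, 3)]


def composition(cds: str) -> tuple[float, float, list[str]]:
    """Return (G+C fraction, G+C fraction at codon position 3, unused sense codons)."""
    cds = cds.upper()
    codons = split_codons(cds)
    if not codons:
        raise ValueError("sequence shorter than one codon")
    gc = sum(base in "GC" for base in cds) / len(cds)
    gc3 = sum(codon[2] in "GC" for codon in codons) / len(codons)
    counts = Counter(codons)
    unused = [c for c in SENSE_CODONS if counts[c] == 0]
    return gc, gc3, unused


gc, gc3, unused = composition("ATGGCCGCGCTGAAG")  # hypothetical example sequence
print(f"G+C {gc:.0%}, third-position G+C {gc3:.0%}, {len(unused)} sense codons unused")
```

For the hypothetical 15-base example it reports 67% G+C, 100% third-position G+C and 56 unused sense codons; applied to a full coding region, the same function yields figures of the kind quoted above.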
Codon usage is nonrandom: nine sense codons are not used at all (UUA, UCU, UAU, CAU, CAA, AGA, GUU, GUA, GGU). Eleven of the twelve gamma-carboxyglutamic acid residues are encoded by GAG. This resembles prothrombin (MacGillivray and Davie, 1984; Degen et al., 1983), protein C (Long et al., 1984; Beckmann et al., 1985), and factor VII (Hagen et al., 1986), in which all gamma-carboxyglutamic acid residues are encoded by GAG. In contrast, most of the gamma-carboxyglutamic acid residues in human factor IX (Jaye et al., 1983; Kurachi and Davie, 1982) and in human and bovine protein S (Hoskins et al., 1987; Dahlback et al., 1986; Lundwall et al., 1986) are encoded by GAA.

B. CHARACTERIZATION OF THE HUMAN FACTOR X cDNA

1. ISOLATION AND CHARACTERIZATION OF HUMAN FACTOR X cDNA CLONES

A human liver cDNA library was generously provided by Dr. Stuart Orkin at Children's Hospital, Harvard University (Prochownik et al., 1983). A total of 240,000 bacterial colonies from this library were screened at high colony density, using the 770 bp Pst I fragment of the bovine factor X cDNA pBX2 (Fung et al., 1984) as the probe. Nine colonies hybridized specifically with the probe and were rescreened at lower colony density. Two positive clones from the second screen (designated pcHX5 and pcHX8) were studied further. Plasmid DNA was prepared from each clone and cleaved with Pst I. The resulting fragments were analyzed by Southern blotting, using the Pst I fragments of pcHX8 as hybridization probes. The analysis showed that the plasmids contained overlapping cDNA inserts (Figure 8). Subsequent sequence analysis showed that although pcHX8 extended to the poly (A) tail of factor X mRNA, pcHX5 lacked the extreme 5' end of the coding region. Therefore, the human cDNA library was rescreened using as a hybridization probe the 350 bp Pst I fragment of pcHX5, subcloned into the vector M13mp8.
A longer clone was isolated (pcHX14; see Figure 8); however, pcHX14 still lacked the extreme 5' end of factor X mRNA (see below). We therefore conclude that either the cDNA library used does not contain a full-length factor X cDNA clone, or such a clone is underrepresented in the library relative to other factor X clones.

2. DNA SEQUENCE ANALYSIS OF HUMAN FACTOR X cDNA CLONES

Most of the sequence analysis was performed with pcHX5 and pcHX8; pcHX14 was also used to determine the sequences of the 5' and 3' ends. Plasmid DNA from pcHX5 and pcHX8 was randomly sheared and ligated into the Sma I site of M13mp9. Subclones containing factor X cDNA inserts were identified by plaque hybridization (Messing, 1983), using the Pst I inserts of pcHX5 and pcHX8 as probes; a total of 35 different M13 templates were isolated and sequenced. This allowed reconstruction of most of the factor X cDNA sequence (Figure 8, thick arrows). The sequence was completed by the chemical cleavage method (Maxam and Gilbert, 1980) (Figure 8, thin arrows). The complete nucleotide sequence of human factor X cDNA and the predicted amino acid sequence of the protein are shown in Figure 9. Each nucleotide position was determined an average of 4.9 times, and 84% of the sequence was determined on both strands. Much of the cDNA sequence was determined for both pcHX5 and pcHX8, and only a single nucleotide difference was found between these two cDNAs: the codon for amino acid residue 344 was TTC (phenylalanine) in pcHX5 and TAC (tyrosine) in pcHX8.

FIGURE 8: RESTRICTION MAP AND SEQUENCING STRATEGY FOR HUMAN FACTOR X cDNA. The bars below the restriction map represent the clones pcHX5, pcHX8, and pcHX14 and include regions coding for the leader peptide (hatched bar), the light chain of plasma factor X (solid bar), the heavy chain (open bar), and the 3' untranslated sequence (dotted bar).
The region encoding the linker tripeptide (Arg-Lys-Arg) is demarcated at the left of each open bar. The extent of sequencing is shown by the length of the arrows. DNA sequence determined on the coding strand is shown by an arrow pointing right; sequence determined on the noncoding strand is shown by an arrow pointing left. See text for details. The scale at the bottom represents nucleotides in kilobase pairs (kb).

[Figure 8: restriction map and sequencing arrows for pcHX8, pcHX5, and pcHX14; scale 0 to 1.6 kb.]

FIGURE 9: NUCLEOTIDE SEQUENCE OF HUMAN FACTOR X cDNA. The sequence was determined by analysis of the overlapping clones shown in Figure 8. The predicted amino acid sequence of human prepro factor X is shown above the DNA sequence. Putative cleavages to form two-chain factor X are shown by the solid arrows, the bond cleaved by factor IXa is shown by the open arrow, and potential attachment sites for carbohydrate are indicated by solid diamonds. See text for details.

[Figure 9: nucleotides 1 to 1441 of the human factor X cDNA, with the predicted amino acid sequence from residue -28 to residue 448, followed by the TGA stop codon.]

The clone described by Leytus et al. (1984) contained the TAC codon at this position. This difference represents either a cloning artifact or a polymorphism in the factor X alleles of the individual whose liver mRNA was used to construct the cDNA library. Plasmids pcHX5 and pcHX14 also contain a region coding for an amino-terminal leader peptide of 28 residues. This region does not contain a methionine codon in the same reading frame as the factor X protein sequence, suggesting that these two clones lack part of the leader peptide and the 5' untranslated region of factor X mRNA. As also reported by Leytus et al. (1984), the cDNA sequence predicts that the heavy chain sequence is followed by a TGA stop codon (nucleotides 1429-1431 in Figure 9), a 3' untranslated region of 10 nucleotides (nucleotides 1432-1441), and a poly (A) tail. The putative polyadenylation signal, ATTAAA (nucleotides 1422-1427) (Proudfoot and Brownlee, 1976), located 15 bp upstream of the poly (A) tail, therefore lies within the coding region of the factor X mRNA. Subsequent to the commencement of this study, Leytus et al. (1984) reported the characterization of a partial cDNA coding for human factor X.
This clone codes for part of the light chain of factor X, a linking tripeptide Arg-Lys-Arg, the complete heavy chain, a short 3' untranslated region, and a poly (A) region. Where the two sequences overlap, our human factor X cDNA sequence agrees with that reported by Leytus et al. (1984) with three exceptions. Leytus et al. reported that nucleotides 450 and 973 (equivalent to positions 756 and 1288 in Figure 9) were C and G, whereas our sequence contains T and A at these positions, respectively. These differences could be the result of cloning artifacts or of polymorphisms in the factor X alleles studied. The third difference occurs at position 817 in Figure 9, where both pcHX5 and pcHX8 contain the sequence A-A-G-G-T-G-A-G-G-G-T, whereas Leytus et al. report only -G-A- (nucleotides 511-512 in their sequence). The extra nine nucleotides are required to maintain the alignment between the human and bovine factor X sequences (see Figure 22), suggesting that the clone isolated by Leytus et al. may have undergone a small deletion during construction and amplification of the cDNA library.

3. 5' END OF THE HUMAN FACTOR X cDNA

In an effort to obtain a clone encoding the complete leader sequence and extending into the 5' untranslated region of the human factor X mRNA, a second human liver cDNA library was screened by in situ hybridization. This oligo (dT)-primed library, consisting of 1.5 million individual clones of 500-2000 bp ligated into the Eco RI site of the vector lambda gt11, was prepared by Enriqueta Guinto. A total of 750,000 phage from the unamplified library were screened using the 32P-labeled long oligonucleotide probes Oligo 3-4 and Oligo 5-6, which correspond to sequences at the 5' end of the factor X cDNA (see Figure 16 and section II.F). Eight positive clones were isolated, but only four of the phage recombinants hybridized with both oligonucleotide probes.
Restriction endonuclease mapping indicated that three clones might extend 5' to pcHX14. The cDNA clones lambda-cHX1, lambda-cHX3, and lambda-cHX4 were characterized further by DNA sequence analysis. Lambda-cHX4 codes for most of the leader sequence of human factor X (Figure 10); lambda-cHX1 and lambda-cHX3 encode the entire 40 amino acid leader peptide, including the initiator methionine residue, as well as a small portion (four nucleotides) of the 5' untranslated region of the mRNA (Figure 10). However, none of the isolated clones is full-length; the factor X mRNA contains at least 25 bp of 5' flanking sequence (Leytus et al., 1986). In addition, a randomly primed human liver cDNA library (prepared by Walter Funk) was screened, but no longer clones were isolated.

4. NORTHERN BLOT ANALYSIS OF THE HUMAN FACTOR X mRNA

To determine the size of the factor X mRNA transcript, human liver poly A+ RNA was denatured with formamide, separated by formaldehyde agarose gel electrophoresis, and transferred to nitrocellulose. The blot was hybridized with 32P-labeled pcHX14 in the presence of formamide (see Table I for conditions). Autoradiography detected a discrete band of 1800 ± 100 nucleotides (Figure 11). The combined length of the human factor X cDNA clones lambda-cHX1 and pcHX14 is 1481 bp. As poly (A) tails are usually 180-200 nucleotides long (Perry, 1976), only about 100 nucleotides appear to be absent from the phage clone lambda-cHX1 (120-140 nucleotides at the central estimate of 1800 nucleotides, and fewer within the ±100-nucleotide error of that estimate). Thus the latter cDNA clone isolated by Leytus et al. (1986) may be almost full-length.

FIGURE 10: NUCLEOTIDE SEQUENCE OF THE LEADER PEPTIDE AND PARTIAL 5' UNTRANSLATED REGION OF THE HUMAN FACTOR X cDNA. The open bars below the nucleotide sequence represent the cDNA clones pcHX14, lambda-cHX4, lambda-cHX3, and lambda-cHX1 encoding the 5' end of human factor X. The phage clones extend further 3', as indicated by the direction of the arrows. The nucleotide sequence of 4 bp of the 5' untranslated region and of the complete prepro leader peptide was determined on both strands. Numbers correspond to the cDNA sequence given in Figure 9. Plasma factor X is generated by cleavage at the site indicated by the arrow. The direction of transcription is 5' to 3', as shown.

[Figure 10: four 5' untranslated nucleotides followed by the 40-codon prepro leader (Met -40 to Arg -1) and Ala +1; clone bars for pcHX14, lambda-cHX4, lambda-cHX3, and lambda-cHX1.]

C. CHARACTERIZATION OF THE HUMAN FACTOR X GENE

1. SOUTHERN BLOT ANALYSIS OF THE HUMAN FACTOR X GENE

As an initial step in its characterization, the human factor X gene was examined by Southern blot analysis, performed by Colin Hay. Human genomic DNA was digested with the restriction enzymes Hind III, Eco RI, Pst I, Bam HI, or BstE II, electrophoresed in an agarose gel, and transferred to a nitrocellulose filter. The blot was hybridized with the 32P-labeled human factor X cDNA clone pcHX14. On autoradiography, several restriction fragments were apparent in each digest (Figure 12A). From these digests, minimal and maximal size estimates of 16 Kbp (Eco RI) and 30 Kbp (Bam HI), respectively, were obtained for the human factor X gene (Figure 12A). When a 450 bp Ava I/Pst I fragment from the 3' end of pcHX14 was used as the hybridization probe, a single band was detected in each digest (Figure 12B). Thus the human genome appears to contain a single gene encoding factor X.
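Sizes such as the 1800 ± 100 nucleotide transcript and the 16 and 30 Kbp gene estimates are read against the lambda-Hind III markers. The text does not say how values were interpolated; a common approach fits log10(size) as a straight line against migration distance. A minimal Python sketch of that approach, in which the marker sizes are those listed for Figure 11 but the migration distances are synthetic placeholders generated from an assumed linear relation, not measurements from the blot; the function names are hypothetical.

```python
import math


def fit_semilog(sizes_kb, distances_mm):
    """Least-squares fit of log10(size) = a + b * distance; returns (a, b)."""
    ys = [math.log10(s) for s in sizes_kb]
    n = len(ys)
    mean_x = sum(distances_mm) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in distances_mm)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(distances_mm, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def estimate_size(distance_mm, a, b):
    """Interpolate a band size (kb) from its migration distance."""
    return 10 ** (a + b * distance_mm)


# Marker sizes (kb) from the Figure 11 legend; distances are SYNTHETIC,
# generated from a hypothetical line log10(size) = log10(20) - distance / 30.
MARKERS_KB = [9.50, 6.66, 4.26, 2.25, 1.96, 0.59]
synthetic_mm = [30 * (math.log10(20) - math.log10(s)) for s in MARKERS_KB]

a, b = fit_semilog(MARKERS_KB, synthetic_mm)
band_mm = 30 * (math.log10(20) - math.log10(1.8))  # a band placed at 1.8 kb
print(round(estimate_size(band_mm, a, b), 2))      # 1.8
```

Real gels follow the log-linear relation only approximately, so estimates are most reliable for bands lying within the range spanned by the markers.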
FIGURE 11: NORTHERN BLOT ANALYSIS OF HUMAN FACTOR X mRNA. Human liver poly A+ RNA (27 µg) was electrophoresed in a denaturing agarose-formaldehyde gel and transferred to nitrocellulose. The blot was hybridized with 32P-labeled pcHX14 and autoradiographed for 4 days at -70°C. Molecular weight markers represent the positions of lambda-Hind III DNA fragments in kilobases.

[Figure 11: autoradiograph; marker positions 9.50, 6.66, 4.26, 2.25, 1.96, and 0.59 kb.]

FIGURE 12: SOUTHERN BLOT ANALYSIS OF THE HUMAN FACTOR X GENE. High molecular weight human liver DNA (10 µg) was digested with various restriction endonucleases and electrophoresed in a 0.9% agarose gel. After denaturation, the DNA was transferred to nitrocellulose and hybridized to 32P-labeled fragments from pcHX14. The blots were exposed to X-ray film. Panel A, filter hybridized with 32P-labeled pcHX14; panel B, filter hybridized with the 32P-labeled 450 bp Ava I/Pst I fragment encoding the 3' end of pcHX14. In each panel, lane 1 contains 32P-labeled lambda-Hind III DNA fragments used as molecular weight markers. Human genomic DNA was digested with Hind III (lane 2), Eco RI (lane 3), Pst I (lane 4), Bam HI (lane 5), and BstE II (lane 6).

[Figure 12: autoradiographs, panels A and B.]

2. ISOLATION AND CHARACTERIZATION OF THE FACTOR X GENOMIC CLONES

A human genomic DNA library was provided by Dr. P. Leder. The library was constructed from partial Sau 3A fragments (10-20 Kbp in length) ligated into the Bam HI site of lambda Charon 28 (Prochownik et al., 1983). One million recombinant phage were screened by in situ hybridization (Benton and Davis, 1977), with 32P-labeled pcHX14 as the probe. A total of 32 positive clones were isolated, and DNA was prepared from six randomly chosen phage (lambda-X46, lambda-X51, lambda-X52, lambda-X55, lambda-X56, lambda-X63) for further analysis.
Restriction endonuclease mapping and Southern blotting revealed overlapping phage clones representing more than 32 Kbp of contiguous human genomic DNA (Figure 13). Southern blot analysis with probes specific for the 5' and 3' ends of the human factor X cDNA initially indicated that the entire gene was probably represented in the six genomic clones. However, DNA sequence analysis showed that the 5' region of the factor X gene was absent. The other 26 positive recombinant phage clones were therefore characterized. Dot blot and Southern blot analysis with 32P-labeled pcHX14 and Oligo 2 (see Figure 16 and section II.F) as hybridization probes showed that a number of the clones extended 3' to the factor X gene, but no 5' sequence was detected. Either the 5' region of the factor X gene was underrepresented in the genomic phage library (the library was amplified prior to screening), or the cDNA probe used was biased towards the 3' end of the gene. A partial restriction enzyme map was constructed for the existing clones (Figure 13). Thus far, the factor X gene maps to 20 Kbp of the human genome.

FIGURE 13: PARTIAL RESTRICTION MAP AND INTRON/EXON ORGANIZATION OF THE FACTOR X GENE. A partial restriction enzyme map of the factor X gene is shown. Genomic clones lambda-X46, lambda-X51, lambda-X52, lambda-X55, lambda-X56, and lambda-X63 are shown below as open bars; the 3' regions of lambda-X55, lambda-X56, and lambda-X63 have been omitted. Exons (black boxes) are numbered 2 through 8, as no clones representing 'exon 1' were isolated. Abbreviations: B, Bam HI; Bs, BstE II; H, Hind III; E, Eco RI; P, Pst I. The scale represents nucleotides in kilobases.

[Figure 13: restriction map (scale 0 to 22 kb) with exons 2 to 8 and clones lambda-X46, lambda-X51, lambda-X52, lambda-X55, lambda-X56, and lambda-X63.]

3. LOCALIZATION OF THE INTRON/EXON JUNCTIONS

To characterize the structure of the factor X gene, sonicated genomic DNA fragments from lambda-X51, lambda-X56, and lambda-X63 were subcloned into M13 vectors and transformed into E. coli JM103. Exon-containing sequences were identified by screening phage plaques with 32P-labeled pcHX14. To detect the small 25 bp exon (see Figure 15), 32P-end-labeled Oligo 1 was used as a hybridization probe. The DNA sequences of the intron/exon junctions were determined by the chain termination method and are given in Table III. All junctions were verified on both strands, except for the sequences flanking exon 7, which were determined twice on the same strand. The factor X gene consists of at least 8 exons interrupted by 7 introns. All intron positions and junction sequences agree well with those reported by Leytus et al. (1986), with one difference: where our 3' splice acceptor of intron D reads g-c-a-g-T-C-A-C (Table III), Leytus et al. (1986) report g-g-c-a-g-T-C-A-C. Nucleotide frequencies at the donor and acceptor sites are given in Table IV. There are no deviations from the GT/AG rule (Breathnach and Chambon, 1981) at the splice junctions, and the sequences flanking the junctions conform to the consensus found in eukaryotic structural genes (Mount, 1982) (Tables III and IV). The positions of the exons and the sizes of the introns were determined by restriction enzyme mapping of pUC13 subclones generated from the genomic phage clones lambda-X51, lambda-X56, and lambda-X63 (Figure 13; Table V). Within experimental error, all intron sizes agree with the lengths estimated by Leytus et al. (1986). The greatest difference was observed for intron B, for which Leytus et al. (1986) reported a length of 7400 bp, compared with our estimate of 8100 bp (Table V). As no 5' genomic clone was obtained, the length of intron A is unknown, but it is greater than 120 bp.
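The Table V sizes can be cross-checked against the 20 Kbp genomic map and against the cDNA. A minimal Python tally, assuming the Table V values as listed, that exon 8 ends at the poly (A) addition site, and that the 1481 bp combined length of lambda-cHX1 and pcHX14 (Northern blot section) excludes the poly (A) tail; the variable names are illustrative.

```python
# Lengths in bp from Table V; exon 1 and intron A were not determined.
EXON_BP = {2: 161, 3: 25, 4: 114, 5: 132, 6: 245, 7: 118, 8: 612}
INTRON_BP = {"B": 8100, "C": 900, "D": 1400, "E": 3000, "F": 3300, "G": 1400}
CDNA_BP = 1481  # combined length of lambda-cHX1 and pcHX14, without poly (A)

exon_total = sum(EXON_BP.values())
intron_total = sum(INTRON_BP.values())
span_exon2_to_polyA = exon_total + intron_total
mean_exon = exon_total / len(EXON_BP)          # mean of the seven determined exons
cdna_in_exon1 = CDNA_BP - exon_total           # cDNA nucleotides left for 'exon 1'

print(exon_total, intron_total, span_exon2_to_polyA)  # 1407 18100 19507
print(round(mean_exon), cdna_in_exon1)                # 201 74
```

The exon 2 to poly (A) span of about 19.5 kb is in line with the 20 Kbp map. The 74 leftover cDNA nucleotides would be consistent with the four 5' untranslated nucleotides of lambda-cHX1 plus the leader codons upstream of the intron A junction, which (phase I) falls after the first base of the codon for residue -17.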
Exon lengths vary widely, from 25 bp to 612 bp; however, the average length of the seven exons whose sizes were determined (201 bp) is consistent with data previously collected on eukaryotic exon lengths (Naora and Deacon, 1982).

4. NUCLEOTIDE SEQUENCE OF THE HUMAN FACTOR X GENE

The factor X gene was partially characterized by DNA sequence analysis. The sequencing strategy is given in Figure 14. A total of 4600 bp of DNA sequence data was obtained. Each nucleotide was determined an average of 4.1 times, and 91% of the sequence was confirmed on both strands. Comparison with the human factor X cDNA sequence allowed the identification of intron and exon sequences as described above (Figure 15). The nucleotide sequence includes

TABLE III. NUCLEOTIDE SEQUENCE OF INTRON/EXON JUNCTIONS IN THE FACTOR X GENE

Exon   5' splice donor            Intron   3' splice acceptor          Codon phase
1      -                          A        ctgtcctccctgccttccagTGTT    I
2      GACGgtaagggctggggatagcct   B        aatctcttttttccttttagAATG    0
3      AAAGgtcagtattttttctgtttt   C        tcgaaatcctctctttgcagATGG    I
4      TTATgtaggttcctctgcttggta   D        aacgtgcctctcctttgcagTCAC    I
5      ACAGgtaggaggcacgttgggcca   E        ggccgtcctctttctttcagGGCC    I
6      GCAGgtaacagtaggatgtcccct   F        gcctgtcacgtctgtcacagGCCC    0
7      GTAGgtaagtgaccaacagccccc   G        gtcccactcgtctgtcccagGGGA    I
8      3' end

Exon sequence is shown in upper case; intron sequence in lower case. The codon phase refers to the position of the intron in the codon triplet: 0, intron occurs between codons; I, intron occurs after the first nucleotide of the codon.
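The donor and acceptor tallies reported in Table IV can be regenerated from the Table III junction sequences. A minimal Python sketch, assuming the sequences exactly as transcribed in Table III (upper case exon, lower case intron), a donor window of four exon and six intron bases, and an acceptor window of 20 intron and four exon bases; it checks the GT/AG rule and counts bases per position with `collections.Counter`. The function names are hypothetical.

```python
from collections import Counter

# Table III junctions. Donors: last 4 exon bases + 20 intron bases (exons 2-7).
DONORS = [
    "GACGgtaagggctggggatagcct", "AAAGgtcagtattttttctgtttt",
    "TTATgtaggttcctctgcttggta", "ACAGgtaggaggcacgttgggcca",
    "GCAGgtaacagtaggatgtcccct", "GTAGgtaagtgaccaacagccccc",
]
# Acceptors: last 20 intron bases + 4 exon bases (introns A-G).
ACCEPTORS = [
    "ctgtcctccctgccttccagTGTT", "aatctcttttttccttttagAATG",
    "tcgaaatcctctctttgcagATGG", "aacgtgcctctcctttgcagTCAC",
    "ggccgtcctctttctttcagGGCC", "gcctgtcacgtctgtcacagGCCC",
    "gtcccactcgtctgtcccagGGGA",
]


def follows_gt_ag(donors, acceptors):
    """True if every intron starts with gt and ends with ag."""
    return (all(d[4:6] == "gt" for d in donors)
            and all(a[18:20] == "ag" for a in acceptors))


def position_counts(seqs, start, stop):
    """Per-position base counts (upper-cased) over aligned columns start..stop-1."""
    return [Counter(s[i].upper() for s in seqs) for i in range(start, stop)]


# Donor window: 4 exon bases + first 6 intron bases (the 10 donor columns of Table IV).
donor_counts = position_counts(DONORS, 0, 10)
# Acceptor window: index 0 is intron position -20, index 20 is exon position +1.
acceptor_counts = position_counts(ACCEPTORS, 0, 24)

print(follows_gt_ag(DONORS, ACCEPTORS))  # True
for column in donor_counts:
    print({base: column[base] for base in "GATC"})
```

Each printed dictionary corresponds to one donor column of Table IV, read left to right.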
TABLE IV. FREQUENCIES OF NUCLEOTIDES AT INTRON/EXON JUNCTIONS

Donor frequencies (6 junctions)
        +4  +3  +2  +1 |  -1  -2  -3  -4  -5  -6
G        3   0   0   5 |   6   0   0   2   5   1
A        2   2   5   0 |   0   0   5   4   0   2
T        1   2   0   1 |   0   6   0   0   0   3
C        0   2   1   0 |   0   0   1   0   1   0
CON      N A/C   A   G |   G   T   R   A   G   T

Acceptor frequencies (7 junctions), positions -20 to -6
       -20 -19 -18 -17 -16 -15 -14 -13 -12 -11 -10  -9  -8  -7  -6
G        3   1   2   1   2   1   0   0   0   2   0   1   0   2   0
A        2   2   0   1   1   2   0   1   0   0   0   0   0   0   0
T        1   2   1   2   2   2   3   2   3   2   6   3   3   2   7
C        1   2   4   3   2   2   4   4   4   3   1   3   4   3   0
CON      Y   Y   Y   Y   Y   Y   Y   Y   Y   Y   Y   Y   Y   Y   Y

Acceptor frequencies (7 junctions), positions -5 to +4
        -5  -4  -3  -2  -1 |  +1  +2  +3  +4
G        0   2   0   0   7 |   3   3   2   2
A        0   1   0   7   0 |   2   1   1   1
T        5   2   1   0   0 |   2   1   2   1
C        2   2   6   0   0 |   0   2   2   3
CON      Y   N   Y   A   G |   G   N   N   N

The frequencies of the different nucleotides at the intron/exon junctions of the human factor X gene are compared with the consensus (CON) of Mount (1982). Splice junctions lie between -1 and +1; exon positions are numbered positively and intron positions negatively. R, A or G; Y, C or T; N, any nucleotide.

TABLE V. SIZES OF EXONS AND INTRONS IN THE HUMAN FACTOR X GENE

Exon   Length (bp)      Intron   Length (bp)
1      undetermined     A        undetermined
2      161              B        8100
3      25               C        900
4      114              D        1400
5      132              E        3000
6      245              F        3300
7      118              G        1400
8      612

Intron sizes were estimated by restriction enzyme analysis (Figure 13). Lengths are given in base pairs.

FIGURE 14: DNA SEQUENCING STRATEGY FOR THE HUMAN FACTOR X GENE. The gene structure is represented according to the direction of transcription (5' to 3'). Exons are indicated as black boxes (2 to 8) and introns as single lines linking the exons (B to G). The arrows below represent the orientation and frequency of DNA sequence obtained from independent M13 clones. The scale is given in kilobases.

FIGURE 15: PARTIAL DNA SEQUENCE OF THE HUMAN FACTOR X GENE. The sequence was determined by analysis of the M13 clones indicated in Figure 14. The predicted amino acid sequence of human factor X is given above the nucleotide sequence. Intron/exon junctions are denoted by vertical arrows. The sizes of the introns were determined by restriction enzyme map analysis (Figure 13).
The sequence of the first intron is incomplete, and no sequence corresponding to 'exon 1' was determined (see text). The putative polyadenylation signal ATTAAA (nucleotides 19608-19613) is boxed. Elements corresponding to a second polyadenylation recognition sequence (consensus CAYTG) (nucleotides 19623-19626, 19632-19635, 19646-19651, 19655-19659, 19660-19664, and 19675-19679) are underlined. In the protein coding region, distinct arrow symbols mark the cleavage site that gives rise to secreted factor X, the processing sites that generate two-chain mature factor X, and the site of activation of factor X by factor IXa.

[Figure 15: partial nucleotide sequence of the factor X gene with the predicted amino acid sequence above it; unsequenced parts of introns are indicated by their lengths in bp.]

[Figure 16: oligonucleotide probes, including Oligo 5 (43-mer) and Oligo 6; scale bar 10 bp.]

2, Southern blot analysis was performed using 32P-labeled Oligo 2 as the hybridization probe (Figure 16B). Autoradiography revealed one hybridizing clone, lambda-MX6. However, DNA sequence analysis showed that this phage did not contain 'exon 1' sequence; the hybridizing region matched the probe at only 17 of 23 nucleotides (Table VI). No 5' end sequence was identified. As the genomic DNA library had been amplified numerous times prior to screening, it may not be representative of the 5' region of the factor X gene. A second genomic DNA library was therefore screened by in situ hybridization: one million phage from a library of one million independent clones in the vector EMBL3 (prepared by Val Geddes) were screened with Oligo 3-4 (Figure 16B). Four positive phage clones were isolated. Subsequent DNA sequence analysis, using Oligo 3 as the primer, identified a hybridizing region in which only 10 of 12 nucleotides matched the probe (Table VI). Again, no 'exon 1' sequence was obtained.

ii) SOUTHERN BLOT ANALYSIS OF THE 5' END OF THE FACTOR X GENE

To characterize the 5' region of the factor X gene, Southern blot analysis was performed. Genomic DNA was digested with Hind III, Bam HI, and Eco RI. Blots were hybridized separately with 32P-labeled Oligo 3-4 and Oligo 5-6 (Figure 16B). Autoradiography showed that each oligonucleotide probe hybridized to multiple bands in each digest, but only a single band per digest hybridized with both probes: a 10.5 Kbp Hind III, a 4 Kbp Bam HI, and a 22 Kbp Eco RI fragment (Figure 17A). Thus 'exon 1' appears to span the entire leader peptide encoding region of the factor X gene.
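Figures such as 17 matching nucleotides out of 23, or 10 of 12, summarize an alignment between a probe and the sequenced hybridizing region. A minimal Python sketch of an ungapped best-match scan over both strands, assuming plain A/C/G/T strings; the target and probes below are made-up sequences, not Oligo 2 or Oligo 3-4, and the function names are hypothetical. It counts matching bases only and does not model duplex stability, which also depends on where mismatches fall and on hybridization and washing stringency.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def best_match(probe: str, target: str):
    """Best ungapped placement of probe on either strand of target.

    Returns (matching_bases, offset, strand); offsets on the '-' strand are
    counted along the reverse complement of target. Ties keep the first hit.
    """
    probe = probe.upper()
    if len(probe) > len(target):
        raise ValueError("probe is longer than target")
    best = (-1, -1, "+")
    for strand, seq in (("+", target.upper()), ("-", reverse_complement(target))):
        for offset in range(len(seq) - len(probe) + 1):
            window = seq[offset:offset + len(probe)]
            matches = sum(p == t for p, t in zip(probe, window))
            if matches > best[0]:
                best = (matches, offset, strand)
    return best


TARGET = "TTTTACGTACGGATCCAAAA"  # made-up sequence, not factor X DNA
print(best_match("ACGGATCC", TARGET))  # (8, 8, '+')
print(best_match("ACGCATGC", TARGET))  # (6, 8, '+')
print(best_match("CCGTACGT", TARGET))  # (8, 8, '-')
```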
FIGURE 17: SOUTHERN BLOT ANALYSIS OF THE 5' REGION OF THE FACTOR X GENE
Genomic DNA was digested with Hind III, Bam HI, Eco RI, and Pst I, electrophoresed in a 0.9% agarose gel, denatured, transferred to nitrocellulose, and hybridized to 32P-labeled probes. Fragments were detected by autoradiography. Molecular weight markers are lambda-Hind III fragments; sizes are given in kilobases. PANEL A: 32P-labeled Oligos 3 and 4 (lanes A) and 32P-labeled Oligos 5 and 6 (lanes B) were used as hybridization probes; autoradiographic exposure varied between 4 and 7 days. PANEL B: The 32P-labeled 200 bp Eco RI fragment from lambda-cHX1 was used as the probe; exposure was for 24 hours.

Further Southern blot analysis with a 32P-labeled 200 bp Eco RI fragment from the cDNA clone lambda-cHX1 (containing 'exon 1' and exon 2 sequences) identified 10.5 Kbp Hind III, 21 Kbp Bam HI, and 3 Kbp Eco RI fragments (Figure 17B). Only the hybridizing band from the Hind III digest coincided with the band detected by the oligonucleotide probes. Earlier restriction enzyme mapping of factor X genomic clones had indicated that, excluding the 5' end, the human factor X gene spans about 20 Kbp (see previous discussion). The 10.5 Kbp Hind III fragment encodes both 'exon 1' and exon 2; however, as a Bam HI site lies just upstream of exon 2, the minimum estimated length of 'exon 1' plus intron A corresponds to the 4 Kbp Bam HI fragment. The combined data suggest that the structural portion of the factor X gene may be encoded by about 24 Kbp (20 Kbp plus 4 Kbp) of contiguous DNA. The stringency of the hybridization and washing conditions prevented hybridization of the short 'exon 1' sequences with both the lambda-cHX1 probe (Figure 17B) and the pcHX14 probe (Figure 12A).
Apparently, therefore, the 5' exon was not detected in these earlier blots.

iii) CONSTRUCTION OF SPECIFIC GENOMIC LIBRARIES

Because of the difficulty in isolating the 5' end of the factor X gene from general genomic DNA libraries, specific libraries were constructed on the basis of the Southern blot data. Two specific genomic libraries were prepared, one from the 10.5 Kbp region of the Hind III digest and the other from the 4 Kbp region of the Bam HI digest. Genomic DNA from the 9.5-11.5 Kbp region of a Hind III digest was size-fractionated by agarose gel electrophoresis, excised, inserted into the Hind III site of lambda DASH, and packaged in vitro. One hundred thousand independent clones were obtained and screened with the 32P-labeled oligonucleotides Oligo 3-4 and Oligo 5-6. On successive screens, positive plaques were identified with Oligo 3-4 but not with Oligo 5-6. As these results were ambiguous, the clones were not characterized further. The second specific genomic DNA library was prepared from the 3-5 Kbp region of a Bam HI digest. The DNA fragments were end-modified and ligated into the Eco RI site of lambda gt11. A total of 630,000 recombinant phage were screened by in situ hybridization with both 32P-labeled 43-mers; however, no plaques that hybridized to both oligonucleotide probes were detected.

iv) SUMMARY OF SCREENING CONDITIONS AND GENOMIC CLONE ANALYSIS

Screening results for the 5' region of the factor X gene are summarized in Table VI. Several approaches were attempted: a total of five genomic DNA libraries were screened, but no clone containing the 5' end was identified. During early screens, several false positives were isolated; however, modification of both the hybridization and the washing procedures eliminated clones that hybridized through mismatches with the oligonucleotide probes.
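How many independent clones a size-selected library needs can be estimated with the standard Clarke-Carbon relation, N = ln(1 - P) / ln(1 - f), where f is the fraction of the library represented by one target insert and P is the desired probability that the target is present. The Python sketch below is illustrative only: it assumes a haploid human genome of 3 x 10^9 bp, treats the 10-fold enrichment from size fractionation as a simple multiplier on f, and assumes every fragment is equally clonable, which the recombination-dependent deletions discussed in this section would violate. The function names are hypothetical.

```python
import math


def clones_required(insert_bp: float, genome_bp: float, enrichment: float = 1.0,
                    p_found: float = 0.99) -> int:
    """Clarke-Carbon estimate: independent clones needed so that a target fragment
    of insert_bp is present with probability p_found.

    f, the fraction of the library represented by the target, is approximated as
    enrichment * insert_bp / genome_bp (size fractionation as a simple multiplier).
    """
    f = enrichment * insert_bp / genome_bp
    if not 0.0 < f < 1.0:
        raise ValueError("f must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - p_found) / math.log(1.0 - f))


def probability_present(n_clones: int, insert_bp: float, genome_bp: float,
                        enrichment: float = 1.0) -> float:
    """Probability that at least one of n_clones independent clones carries the target."""
    f = enrichment * insert_bp / genome_bp
    return 1.0 - (1.0 - f) ** n_clones


if __name__ == "__main__":
    GENOME_BP = 3e9  # assumed haploid human genome size
    print(clones_required(4e3, GENOME_BP, enrichment=10))      # 4 Kbp Bam HI fragment
    print(clones_required(10.5e3, GENOME_BP, enrichment=10))   # 10.5 Kbp Hind III fragment
    print(round(probability_present(100_000, 10.5e3, GENOME_BP, enrichment=10), 3))
```

Under these assumptions the 4 Kbp Bam HI library would need roughly 3.5 x 10^5 independent recombinants for a 99% chance of containing the target, and 100,000 clones of the 10.5 Kbp Hind III library would give roughly a 97% chance. Any underrepresentation of the region lowers the effective f and raises N accordingly.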
Theoretically, the specific libraries should have provided a 10-fold enrichment of the 5' DNA fragment, but the number of independent clones required may have been underestimated. Overall, the 5' region of the factor X gene appears to be underrepresented in genomic phage libraries. It has been reported that a proportion (>8.9%) of the human genome cannot be cloned in standard rec+ E. coli hosts because of recombination-dependent deletion, although such sequences can be cloned in recB, recC, sbcB hosts (Wyman et al., 1985). However, infection of JC8111 (recB, recC, sbcB) (Boissy and Astell, 1985) with the lambda gt11 phage vector (4 Kbp Bam HI specific library) resulted in a 20-fold decrease in titre, and the number of phage obtained was not sufficient for screening.

D. HUMAN FACTOR X GENETICS

1. CHROMOSOMAL LOCALIZATION

The human factor X cDNA pcHX8 was used by Dr. Nicola Royle (Dept. of Human Genetics, University of Manitoba) to determine the chromosomal location of the factor X gene by human-hamster hybrid analysis (Table VII) and by in situ hybridization to metaphase chromosomes (Figure 18). The factor X gene was assigned to chromosome 13, and the in situ hybridization studies localized it specifically to 13q32-qter (Royle et al., 1986).

TABLE VI: SUMMARY OF SCREENING RESULTS FOR THE 5' END OF THE FACTOR X GENE
The results from the screening of five genomic DNA libraries for the 5' region of the factor X gene are summarized. The lambda Charon4A genomic phage library was supplied by Lawn et al. (1978), and the EMBL3 genomic phage library was prepared by Val Geddes. The probe-encoding sequence is given on top; the corresponding sequence from the isolated phage clone is given below it. Nucleotide matches are designated by *. See text for details.
Library | Screening procedure | Probe | Washing conditions | Results
Non-specific: lambda Charon4A | Recombination | 'exon 2' in piAN7 | 52°C | Rescreened with Oligo 2; sequence below
Non-specific: EMBL3 | In situ hybridization | Oligo 3-4 (43-mer) | 50°C | Sequence below
5' specific: 10 Kbp Hind III | In situ hybridization | Oligo 3-4 / Oligo 5-6 | 55°C | Clones hybridized only with Oligo 3-4
5' specific: 4 Kbp Bam HI | In situ hybridization | Oligo 3-4 / Oligo 5-6 | 60°C | No hybridizing clones

Oligo 2 region (top) and lambda-MX6 (bottom):
5'-ACTTTCCCGAGCAGCAGGAGGCC-3'
   *  ** ***********   ***
   ATGTTTCCGAGCAGCAGAGAGCC

Oligo 3 region (top) and EMBL3 clone (bottom):
5'-GGAGGCACTGAG-3'
   ** ** ******
   GGTGGTACTGAG

TABLE VII: SEGREGATION OF HUMAN SEQUENCES HOMOLOGOUS TO THE cDNA ENCODING FACTOR X IN HUMAN-HAMSTER HYBRIDS
Cytogenetic and isozyme analysis was carried out on the hybrid cell lines by Nicola Royle. Chromosomes are indicated as present (+) or absent (-) based on the cytogenetic analysis of 20 cells.

Response to the factor X probe: positive (+) in hybrid lines 41.06, 76.14, 85.16a, 102.05b, 112.10a, 120.33, 120.35, and 134.02a; negative (-) in 45.01, 45.43, 76.31, 79.05b, 80.14c, 80.17a, 82.82a, 100.02b, 103.04, 111.02a, 120.05, and 133.05; not scored (?) in 89.27.
[Per-chromosome scores (+/-) for each hybrid line not reproduced.]

% Concordancy with the factor X response, by human chromosome:
1: 65; 2: 65; 3: 70; 4: 60; 5: 80; 6: 65; 7: 60; 8: 75; 9: 50; 10: 65; 11: 70; 12: 65; 13: 95; 14: 60; 15: 60; 16: 65; 17: 80; 18: 55; 19: 60; 20: 60; 21: 45; 22: 55; X: 65; Y: 60

FIGURE 18: IN SITU HYBRIDIZATION STUDIES OF THE FACTOR X GENE
A representation of the distribution of labeled sites over the normal human karyotype. The 3H-labeled human cDNA clone pcHX8 was used as the hybridization probe. The horizontal axis shows the metaphase chromosome number and the vertical axis shows the number of labeled sites, as determined by autoradiography. The peak at 13q32-qter represents 12.5% of the total number of labeled sites scored.

2. RESTRICTION FRAGMENT LENGTH POLYMORPHISMS

The human factor X cDNA pcHX14 was used as a probe by Colin Hay and Katherine Robertson to detect restriction fragment length polymorphisms (RFLPs) in the factor X gene. Polymorphisms were originally identified in Eco RI and Pst I digests (Table VIII) (Hay et al., 1986), and a third polymorphism, for Hind III, was identified subsequently (Table VIII). All detected RFLPs appear to be unlinked. As previously mentioned, the human factor X gene has been localized to the distal long arm of chromosome 13 (Royle et al., 1986). It had been proposed that the gene responsible for cystic fibrosis (CF) lies on the terminal portion of the long arm of chromosome 13 (Edwards et al., 1984). The human factor X cDNA clone pcHX14 was therefore used as a probe to study the segregation of CF with the long arm of chromosome 13 (Scambler et al., 1986). The results showed that CF is not linked to factor X. However, two RFLPs for Bcl I were identified in the factor X gene (Table VIII).
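The % Concordancy row of Table VII scores, for each chromosome, how often the presence of human factor X sequences in a hybrid line matches the presence of that chromosome. A minimal Python sketch of that calculation follows. It assumes that hybrid lines with an unscored entry (such as the '?' for line 89.27) are simply left out; the function names and the toy data are hypothetical and are not taken from the table.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Score = Optional[bool]  # True = present (+), False = absent (-), None = not scored


def percent_concordancy(marker: Sequence[Score], chromosome: Sequence[Score]) -> float:
    """Percentage of scored hybrid lines in which the marker and the chromosome are
    both present or both absent; lines unscored in either column are skipped."""
    if len(marker) != len(chromosome):
        raise ValueError("one entry per hybrid line is required in both columns")
    scored = [(m, c) for m, c in zip(marker, chromosome) if m is not None and c is not None]
    if not scored:
        raise ValueError("no scored hybrid lines")
    concordant = sum(1 for m, c in scored if m == c)
    return 100.0 * concordant / len(scored)


def rank_chromosomes(marker: Sequence[Score],
                     chromosomes: Dict[str, Sequence[Score]]) -> List[Tuple[str, float]]:
    """Chromosomes sorted from highest to lowest concordancy with the marker."""
    ranked = [(name, percent_concordancy(marker, col)) for name, col in chromosomes.items()]
    return sorted(ranked, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical scores for five hybrid lines and two chromosomes "A" and "B".
    factor_x = [True, False, True, None, False]
    chromosomes = {
        "A": [True, False, True, True, False],
        "B": [False, False, True, False, True],
    }
    for name, pct in rank_chromosomes(factor_x, chromosomes):
        print(f"chromosome {name}: {pct:.0f}% concordant")
```

Applied to every chromosome column, the assignment rests on the highest value, which in Table VII is the 95% for chromosome 13. Concordance well below 100% for all other chromosomes is what makes the assignment informative.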
The CF locus has subsequently been localized to chromosome 7q21-q31 by genetic linkage to specific DNA markers (Knowlton et al., 1985; White et al., 1985; Wainwright et al., 1985).

TABLE VIII: RESTRICTION FRAGMENT LENGTH POLYMORPHISMS IN THE HUMAN FACTOR X GENE
Polymorphic fragment lengths and their allele frequencies in the factor X gene. The Eco RI, Pst I, and Hind III RFLPs were determined by Colin Hay and Katherine Robertson. The Bcl I polymorphisms were determined by Scambler et al. (1986).

Enzyme | Polymorphic fragment | Allele frequency
Eco RI | 7.8 Kbp | 0.90
Eco RI | 7.4 Kbp | 0.10
Pst I | 2.9 Kbp | 0.87
Pst I | 2.5 Kbp | 0.13
Hind III | 8.2 Kbp | 0.16
Hind III | 7.? Kbp | 0.84
Bcl I 'A' | 4.7 Kbp | 0.88
Bcl I 'A' | 4.2 Kbp | 0.12
Bcl I 'B' | 3.5 Kbp | 0.13
Bcl I 'B' | 3.1 Kbp | 0.87

V. DISCUSSION

A. CHARACTERIZATION OF THE BOVINE FACTOR X cDNA

1. CHARACTERIZATION OF THE BOVINE FACTOR X cDNA CLONES

The factor X precursor was initially characterized through the isolation of bovine cDNA clones. A bovine liver cDNA library was screened with a mixture of oligonucleotides, and nine overlapping clones were identified. The clone containing the longest cDNA insert, pBX1, coded for 1537 nucleotides of the bovine mRNA, including 75 bp of the 5' untranslated region, the putative initiating methionine, the 40 amino acid leader sequence, the light and heavy chains of the mature protein, and the basic dipeptide that links them in the precursor (Figure 7). Sequences encoding the C-terminal five amino acids and the 3' untranslated region were absent from pBX1. The cDNA sequence confirmed the activation site cleaved by factor IXa: proteolysis follows Arg-193 (Arg-51 of the heavy chain), resulting in the removal of a 51 amino acid activation peptide.
Cleavage to form factor Xa-beta is predicted at the Arg-Gly peptide bond of residues 435-436 (Figure 7), releasing a 17 amino acid glycopeptide from the C-terminus of the protein, as determined by previous studies (Fujikawa et al., 1972a; Titani et al., 1975). Nucleotides 673-675 (Figure 7) encode the tyrosine or sulfated tyrosine residue responsible for the different chromatographic properties of bovine factors X1 and X2 (Jackson and Hanahan, 1968; Fujikawa et al., 1972a). The two glycosylated residues are positioned as determined by protein sequence analysis (Asn-178 and Thr-445 in Figure 7) (Titani et al., 1975).

2. PREDICTED AMINO ACID SEQUENCE OF BOVINE FACTOR X

The amino acid sequences of the two chains of factor X predicted from the cDNA sequence agree well with those determined by protein chemistry techniques (Titani et al., 1975; Enfield et al., 1980); however, four differences were noted. First, residue 63 of the light chain was reported to be asparagine, but the cDNA sequence predicts aspartic acid at this position. Second, residues 111-115 of the heavy chain were reported to be Gln-Glu-Gly-Asp-Glu, whereas the cDNA sequence predicts that this region consists of six residues in the sequence Glu-Gln-Glu-Glu-Gly-Asn (residues 253-258 in Figure 7). Third, Titani et al. (1975) reported that their positioning of Asn-152 after Arg-151 (corresponding to Arg-294 and Asn-296 in Figure 7) was tentative, presumably because they were unable to fully characterize a peptide derived from this region; the cDNA sequence predicts an extra arginyl residue in this region, giving the sequence Arg-Arg-Asn. Lastly, residues 165-170 of the heavy chain were reported to be Ala-Glu-Thr-Leu-Gln-Thr, but the cDNA sequence predicts the sequence Glu-Ala-Thr-Leu-Met-Thr-Gln (residues 309-315 in Figure 7) for this region. This last difference was unexpected, as Titani et al. (1975) based their sequence determination on the structures of overlapping CNBr fragments.
However, Met-Thr bonds are known to resist cleavage with CNBr (Schroeder et al., 1969), so the unexpected presence of a methionyl residue in this region may have been overlooked during the sequence analysis.

B. CHARACTERIZATION OF THE HUMAN FACTOR X cDNA

1. CHARACTERIZATION OF THE HUMAN FACTOR X cDNA CLONES

Human factor X cDNA clones were isolated from a human liver cDNA library by cross-hybridization with a bovine factor X cDNA. A total of nine clones were isolated. As no 5' clone was initially identified, the library was rescreened and a longer clone, pcHX14, was obtained; however, it was not full-length. A second human cDNA library was screened, and three clones, lambda-X1, lambda-X3, and lambda-X4, encoding the 5' region of factor X as well as 4 nucleotides of the 5' untranslated region of the mRNA, were characterized. Together, the overlapping human factor X cDNA clones represent 1481 nucleotides of the mRNA. The cDNAs encode the entire precursor protein, including the leader peptide (40 amino acids in length), the light chain, the basic tripeptide that joins the chains, and the heavy chain, as well as a portion of the 5' untranslated region and the complete 3' untranslated region of the factor X transcript, followed by a poly(A) tail (Figures 9 and 19). Nucleotides 271-273 (Figure 9) encode an aspartic acid residue that undergoes posttranslational modification to form the beta-hydroxyaspartic acid residue found in plasma factor X (McMullen et al., 1983b; Fernlund and Stenflo, 1983). This is similar to the cDNAs for bovine factor X (Fung et al., 1984), factor IX (Jaye et al., 1983; Kurachi and Davie, 1982), protein C (Long et al., 1984; Beckmann et al., 1985), protein S (Dahlback et al., 1986; Lundwall et al., 1986; Hoskin et al., 1987), and factor VII (Hagen et al., 1986), in which the beta-hydroxyaspartic acid residue is also encoded by an aspartic acid codon.
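Because each residue occupies three consecutive nucleotides, residue numbers and nucleotide positions in Figures 7 and 9 can be interconverted with simple arithmetic. The Python sketch below assumes continuous, gap-free numbering and derives its offsets from statements in this chapter: nucleotides 85-501 encode the human light chain, so residue 1 starts at nucleotide 85 in Figure 9, and Arg-Arg-Ala is encoded by nucleotides 190-198 of Figure 7 with Ala as residue 1, so bovine residue 1 starts at nucleotide 196. The constant and function names are hypothetical, and only mature-protein residues are handled.

```python
from typing import Tuple

HUMAN_FIG9 = 85    # first nucleotide of the residue-1 codon in Figure 9 (light chain = nt 85-501)
BOVINE_FIG7 = 196  # first nucleotide of the residue-1 codon in Figure 7 (Arg-Arg-Ala = nt 190-198)


def codon_span(residue: int, first_nt: int) -> Tuple[int, int]:
    """1-based, inclusive nucleotide positions of the codon for mature residue `residue`."""
    if residue < 1:
        raise ValueError("only mature-protein residues (1, 2, ...) are handled here")
    start = first_nt + 3 * (residue - 1)
    return start, start + 2


def residue_at(nt: int, first_nt: int) -> int:
    """Mature-protein residue whose codon contains nucleotide `nt`."""
    if nt < first_nt:
        raise ValueError("nucleotide lies upstream of residue 1 (leader or 5' UTR)")
    return (nt - first_nt) // 3 + 1


if __name__ == "__main__":
    print(codon_span(63, HUMAN_FIG9))    # (271, 273): the Asp codon discussed above
    print(codon_span(140, HUMAN_FIG9)[0], codon_span(142, HUMAN_FIG9)[1])  # 502 510: Arg-Lys-Arg
    print(residue_at(673, BOVINE_FIG7))  # 160: bovine tyrosine / sulfated tyrosine codon
```

As a consistency check, the same offset puts human residue 448 at nucleotides 1426-1428, the end of the heavy chain coding region given below, and places Ser-379 inside the active-site segment encoded by nucleotides 1171-1245.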
FIGURE 19: PRECURSOR FORM OF HUMAN FACTOR X
The model was derived from the cDNA sequence of human factor X. Amino acid residues are indicated by the single-letter code. The prepro peptide is numbered backwards from the site of cleavage that produces plasma factor X; numbering of plasma single-chain factor X is continuous from the N-terminus. Arrows denote processing sites (N-terminal to C-terminal) that give rise to plasma factor X, two-chain factor X, and factor Xa, respectively. The light and heavy chains are linked by the basic tripeptide Arg-Lys-Arg (residues 140-142). Diamonds represent carbohydrate residues, γ represents Gla residues, and β represents the beta-hydroxyaspartic acid residue. Disulfide bridges are placed according to amino acid homologies with other proteins. The catalytic residues are His-236, Asp-282, and Ser-379.

During the conversion of factor X to factor Xa, a glycopeptide of 52 amino acid residues (residues 143-194, Figure 9) is released (DiScipio et al., 1977). There are two potential N-glycosylation sites in the activation peptide, at positions 181 and 191 (Figures 9 and 19). By homology with other serine proteases (see Jackson and Nemerson, 1980), the catalytic triad in human factor Xa probably consists of His-236, Asp-282, and Ser-379 (Figures 9 and 19). The heavy chain sequence is followed by a TGA stop codon, a 3' untranslated region of 10 nucleotides, and a poly(A) tail (Figure 9). The putative polyadenylation signal (Proudfoot and Brownlee, 1976), ATTAAA (nucleotides 1422-1427), is located 15 nucleotides upstream of the poly(A) tail. Because the 3' untranslated region is unusually short, the polyadenylation signal lies within the coding region of the factor X mRNA. The mRNAs coding for the beta subunit of human chorionic gonadotropin (Fiddes and Goodman, 1980) and for the abnormal alpha-globin Constant Spring (Proudfoot and Longley, 1976) also have short 3' untranslated regions (16 nucleotides).
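Locating a polyadenylation signal relative to the poly(A) tail and the stop codon is a simple string search. The Python sketch below looks for the canonical AATAAA hexamer and the ATTAAA variant found in factor X, returns the 1-based position of the occurrence closest to the tail, and checks whether that occurrence overlaps the coding region. The function names are hypothetical, only these two hexamers are considered, and the test transcript is invented.

```python
from typing import Optional, Tuple

POLYA_SIGNALS = ("AATAAA", "ATTAAA")  # canonical hexamer and the variant found in factor X


def nearest_polya_signal(mrna: str, polya_start: int) -> Optional[Tuple[int, int]]:
    """1-based, inclusive (start, end) of the signal hexamer closest upstream of the
    poly(A) tail, or None if neither hexamer occurs.

    polya_start is the 1-based position of the first A of the tail; the tail itself is
    excluded from the search so that it cannot contribute to a hit.
    """
    seq = mrna.upper().replace("U", "T")
    best = -1
    for motif in POLYA_SIGNALS:
        # str.rfind returns the highest 0-based index at which the motif fits entirely
        # within seq[0:polya_start - 1], or -1 if it is absent.
        best = max(best, seq.rfind(motif, 0, polya_start - 1))
    if best < 0:
        return None
    return best + 1, best + 6


def overlaps_cds(signal: Tuple[int, int], cds_end: int) -> bool:
    """True if the signal starts at or before the last nucleotide of the stop codon."""
    return signal[0] <= cds_end


if __name__ == "__main__":
    # Invented transcript: ATG, three Ala codons, TAA stop (nt 13-15), short 3' UTR, poly(A).
    toy = "ATG" + "GCT" * 3 + "TAA" + "CCAATAAAGG" + "A" * 10
    signal = nearest_polya_signal(toy, polya_start=26)
    print(signal, overlaps_cds(signal, cds_end=15))
```

With the positions given for human factor X (signal at nucleotides 1422-1427, heavy chain ending at nucleotide 1428 so that the TGA stop codon presumably occupies 1429-1431), `overlaps_cds((1422, 1427), 1431)` returns True, in line with the observation that the signal lies within the coding region.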
In the mRNAs for the beta subunit of human chorionic gonadotropin and for alpha-globin Constant Spring, the polyadenylation signal is located 16 nucleotides upstream of the poly(A) tail and contains the UAA codon that is used as the stop codon.

2. SIZE ANALYSIS OF THE HUMAN FACTOR X mRNA

The size of the mRNA encoding human factor X was determined by northern blot analysis (Figure 11). The analysis detected a single human factor X mRNA species of 1800 ± 100 nucleotides in liver tissue, suggesting that fewer than 100 nucleotides of the 5' untranslated region of the mRNA are absent from the human factor X cDNA clones. The detection of a single species of factor X mRNA agrees with data published by Bahnak et al. (1987). The cDNA clone reported by Leytus et al. (1986) contained 25 bp of the 5' flanking sequence and may encode the entire factor X mRNA. By comparison, the 5' untranslated regions of factor IX and protein C are 29 bp and 75 bp, respectively, as determined by S1 nuclease mapping and primer extension analysis of the corresponding genes (Anson et al., 1984; Plutzky et al., 1986).

3. PREDICTED AMINO ACID SEQUENCE OF HUMAN FACTOR X

As reported by Leytus et al. (1984), the cDNA sequence predicts that the light and heavy chains of human factor X are joined by the tripeptide Arg-Lys-Arg (encoded by nucleotides 502-510 in Figure 9). The sequence agrees well with those regions of factor X that had been sequenced directly by protein chemistry techniques. Nucleotides 85-501 encode the complete light chain of factor X, and the predicted amino acid sequence is in complete agreement with that determined by McMullen et al. (1983). Nucleotides 511-1428 encode the heavy chain of factor X, including three regions whose amino acid sequences had been determined previously. Nucleotides 511-558 code for the amino-terminal sequence of the heavy chain of factor X reported by DiScipio et al. (1977), except that the cDNA sequence predicts serine residues at positions 150 and 157 (Figure 9), whereas DiScipio et al.
reported an unidentified residue and a threonine residue, respectively, at these two positions. Nucleotides 667-717 encode the amino-terminal sequence of the heavy chain of factor Xa reported by DiScipio et al. (1977), except that the cDNA sequence predicts that residue 208 is a tryptophan rather than a threonine. The reason for these differences is unclear, but they may result from reverse transcriptase errors during cDNA synthesis, from polymorphisms, or from incorrect amino acid assignments during the later stages of the automated Sequenator analyses. Nucleotides 1171-1245 encode the active site region of factor Xa; the predicted sequence agrees with the amino acid sequence reported by DiScipio et al. (1977).

C. PRECURSOR FORM OF FACTOR X

1. FACTOR X IS SYNTHESIZED AS A SINGLE CHAIN PRECURSOR

From the cDNA sequence, it is clear that bovine factor X is synthesized as a single-chain precursor. Single-chain species have been reported for bovine (Mattock and Esnouf, 1973), human (Rosenberg et al., 1975; Fair and Bahnak, 1984), and rat (Graves et al., 1982; Willingham and Matschiner, 1984) factor X, although these preparations have not been well characterized. The cDNA sequence predicts that conversion to the two-chain form involves the removal of an Arg-Arg dipeptide. In the case of human factor X, McMullen et al. (1983a) reported that the carboxyterminal sequence of the light chain was Leu-Glu-Arg, whereas the amino-terminal sequence of the heavy chain of plasma factor X is Ser-Val-Ala (DiScipio et al., 1977a). Thus, the basic tripeptide must be eliminated during the conversion from the single-chain to the two-chain form of human factor X. Similar basic peptide linkages have been found in other blood clotting protein precursors, including bovine (Long et al., 1984) and human protein C (Foster and Davie, 1984; Beckmann et al., 1985) (Figure 20).
This type of processing is analogous to the processing of some hormone precursors, including proinsulin (Steiner et al., 1980). After transport to the Golgi region of the pancreatic beta cell, proinsulin appears to be processed initially by a trypsin-like protease, followed by removal of basic amino acid residues by a carboxypeptidase B-like enzyme (Steiner et al., 1980). A similar processing mechanism is observed in two components of the complement system, C3 (de Bruijn and Fey, 1985) and C4 (Belt et al., 1984), which are synthesized as single-chain precursors. Prior to secretion, a basic tetrapeptide is excised from between the beta and alpha chains of C3, and similar basic peptides are eliminated from between the beta and alpha chains and between the alpha and gamma chains of C4, releasing the two- and three-chain mature forms of the respective proteins (de Bruijn and Fey, 1985; Belt et al., 1984) (see Figure 20). In contrast to the complement proteins, the protease(s) required for the conversion of single-chain factor X to two-chain factor X may be located in plasma. Graves et al. (1982) showed that the rat hepatoma cell line Ff-35 synthesized and secreted a single-chain factor X. Using rapid immunochemical isolation techniques, these authors showed that 40% of rat factor X was in the single-chain form, suggesting that extracellular processing may occur. Mattock and Esnouf (1973) also showed that yields of bovine single-chain factor X were increased by more rapid processing of the blood after collection. However, the conversion to two-chain factor X must be very efficient, as Enfield et al. (1980) did not report any heterogeneity at the C-terminal end of the light chain during their sequence analysis. It is not known whether these cleavages are significant for the function of the protein; activated single-chain factor X has been shown to have coagulant activity (Willingham and Matschiner, 1984).
The identity of the protease(s) responsible for the cleavage to form two-chain factor X is unknown. However, a unique yeast endopeptidase with substrate specificity for paired basic residues has been purified from cell lysates (Mizuno and Matsuo, 1984). The enzyme is similar to, but distinct from, the trypsin-like proteases, and it has been proposed that it may be involved in prohormone processing in yeast (Mizuno and Matsuo, 1984). However, an analogous enzyme has yet to be identified in mammalian cells or plasma.

FIGURE 20: COMPARISON OF BASIC AMINO ACID LINKAGES IN PRECURSOR MOLECULES
Basic amino acid residues linking the polypeptide chains in the precursor forms of the blood coagulation proteins bovine factor X, human factor X, bovine protein C (Long et al., 1984), and human protein C (Beckmann et al., 1985; Foster et al., 1984), and of the complement components C3 (de Bruijn and Fey, 1985) and C4 (Belt et al., 1984). The linking peptides are given in square brackets. Flanking residues represent the C-terminus and the N-terminus, respectively, of the processed chains in the mature forms of the plasma proteins. For complement C3 and C4, the order of the chains within the precursor is given in parentheses.

BOVINE FACTOR X: Gly Arg Ser [Arg Arg] Trp Ala Ile
HUMAN FACTOR X: Leu Glu Arg [Arg Lys Arg] Ser Val Ala
BOVINE PROTEIN C: Lys Thr Leu [Lys Arg] Asp Thr Asn
HUMAN PROTEIN C: Ser His Leu [Lys Arg] Asp Thr Glu
COMPLEMENT C3 (beta-alpha): Pro Ala Ala [Arg Arg Arg Arg] Ser Val Gln
COMPLEMENT C4 (beta-alpha): Lys Thr Thr [Arg Lys Lys Arg] Asn Val Asn
COMPLEMENT C4 (alpha-gamma): Arg Arg Asn [Arg Arg Arg Arg] Glu Arg Pro

2. PREPROFACTOR X

The cDNA sequence also predicts that bovine factor X is synthesized as a precursor with a leader peptide of 40 amino acid residues.
The N-terminal part of this leader peptide consists of a highly hydrophobic region (residues -36 to -22 in Figure 7) followed by a region containing glycine, alanine, and proline. The sequence of the leader peptide of human factor X is homologous to that of the bovine factor X precursor, and both share identity with the leader sequences of the other vitamin K-dependent clotting factors, including prothrombin (MacGillivray and Davie, 1984; Degen et al., 1983; MacGillivray et al., 1986), factor IX (Jaye et al., 1983; Kurachi and Davie, 1982), and protein C (Long et al., 1984; Beckmann et al., 1985), as shown in Figure 21. Other vitamin K-dependent proteins such as protein S (Dahlback et al., 1986; Lundwall et al., 1986; Hoskin et al., 1987) and factor VII (Hagen et al., 1986) also possess similar precursor sequences. The leader sequences vary between 38 and 46 amino acids in length. An alternative leader peptide of 60 amino acids has been identified for factor VII (Hagen et al., 1986), with a 22 amino acid insertion between residues -18 and -17 of the 38 amino acid leader. However, the leader region of the shorter cDNA more closely resembles the leader peptides of the other vitamin K-dependent proteins in both size and hydrophobicity pattern, and it corresponds to the major mRNA transcript found in liver (Berkner et al., 1986).

FIGURE 21: COMPARISON OF THE LEADER SEQUENCES OF THE VITAMIN K-DEPENDENT BLOOD COAGULATION FACTORS
Comparison of the leader sequences of bovine factor X, human factor X, bovine prothrombin (MacGillivray and Davie, 1984), human prothrombin (Degen et al., 1983; MacGillivray et al., 1986), human factor IX (Jaye et al., 1983), bovine protein C (Long et al., 1984), and human protein C (Beckmann et al., 1985) as predicted from the cDNA sequences. Identical residues in corresponding positions in two or more of the protein sequences are boxed. The sequences are numbered backwards from the cleavage site that gives rise to the mature protein found in plasma. For bovine factor X, human factor X, and human factor IX, the 5'-most ATG codon has been assumed to code for the initiator methionyl residue. The leader sequence of bovine protein C is incomplete, as it does not encode a possible initiator methionyl residue.
[Alignment of the seven leader sequences, positions -45 to +1, not reproduced.]

The amino-terminal region of the leader sequence corresponds to the signal peptide found in many secreted proteins (von Heijne, 1983, 1985; Watson, 1984). However, the conversion to bovine plasma factor X occurs by cleavage of a bond in the sequence Arg-Arg-Ala (encoded
by nucleotides 190-198 in Figure 7), where Ala represents the N-terminus of single-chain bovine factor X. Several other proteins are processed by proteolysis C-terminal to double basic residues, including albumin (Strauss et al., 1978) and apolipoprotein A-II (Gordon et al., 1983). In both of these proteins, the co-translational (signal peptidase) cleavage has been shown to occur upstream of the double basic residues (Redman et al., 1983; Gordon et al., 1984). Presumably, conversion of the proprotein to the plasma form of the protein involves proteolysis on the C-terminal side of the double basic residues by an unknown protease. From the predicted leader sequence, it appears that factor X is also synthesized as a preproprotein, like other plasma proteins including albumin (Strauss et al., 1978), apolipoprotein A-II (Gordon et al., 1983), prothrombin (MacGillivray and Davie, 1984; Degen et al., 1983), factor IX (Jaye et al., 1983; Kurachi and Davie, 1982), and protein C (Long et al., 1984; Beckmann et al., 1985). Comparison of the leader peptides of four of the vitamin K-dependent proteins (Figure 21) shows little sequence identity between residues -40 and -18 (only one of the 23 corresponding positions is identical in all four proteins). The amino-terminal regions contain many hydrophobic residues and probably constitute the signal sequence necessary for translocation of the nascent polypeptide chain across the membrane of the rough endoplasmic reticulum (Blobel et al., 1979). This region is followed by a more hydrophilic region (residues -18 to -1 in Figure 21) that shares greater sequence homology than the signal peptide region (7 of the 18 corresponding positions are identical). Conversion of the precursors to the forms found in plasma occurs by cleavage of a bond that is carboxyterminal to a basic residue. The nature and location of the protease that converts the proprotein to the plasma form of the protein are unknown.
However, human factor X differs from the other vitamin K-dependent proteins in that the proprotein protease cleaves a bond carboxyterminal to a Thr-Arg sequence rather than a double basic sequence. 130 The location of the signal peptidase cleavage site in bovine and human factor X is unknown. In factor IX and in protein C the pro region is 18 (Diuguid et al., 1986; Bentley et al., 1986) and 24 (Foster et al., 1987) amino acids in length, respectively. Analyses of defective factor IX mutants indicate that arginine -1 (Diuguid et al., 1986) and arginine -4 (Bentley et al., 1986) are essential for accurate processing of the factor IX propeptide. Prepeptide processing does not occur at homologous positions as might be expected. However, the signal peptidase cleavage site in both factor IX and protein C is situated at the approximate region of sequence divergence in the leader peptides, suggesting an analogous pro segment of perhaps 17 to 25 amino acids in length for factor X. Bentley et al. (1986) predict a possible processing site at residue -19. The carboxyterminal regions of the leader peptides also show amino acid sequence homology with the leader peptide of osteocalcin, a bone Gla protein (Pan and Price, 1985; Pan et al., 1985). The proximity of the pro and Gla domains in the otherwise structurally and functionally unrelated proteins suggests an inter-related function for the two adjacent regions. There is now considerable evidence to show that the pro peptide comprises at least part of the recognition sequence for the vitamin K-dependent carboxylase. Mutations in the pro segment of the blood clotting precursors led to secretion of uncarboxylated proteins with reduced biological activity (Jorgensen et al., 1987; Suttie et al., 1987; Foster et al., 1987; Busby et al., 1987). 3. 
PROCESSING OF THE FACTOR X PRECURSOR From the sequences of the bovine and human cDNAs, it is predicted that factor X is synthesized as a single chain precursor having a prepro leader sequence. This precursor appears to be cleaved specifically in at least five different steps during its conversion to factor Xa (see Figure 19). Initially, signal peptidase cleaves preprofactor X to produce profactor X. A trypsin-like protease then converts profactor X to single chain factor X in a manner analogous to the conversion of proalbumin to albumin. Perhaps after secretion into the blood, another trypsin-like protease cleaves the single chain factor X at a position C-terminal to Arg-142. A carboxypeptidase B-like enzyme then releases either the two arginyl residues (bovine factor X) or the Arg-Lys-Arg residues (human factor X) from the light chain, resulting in the formation of two-chain factor X. Finally, activation of factor X to factor Xa occurs by cleavage of the bond C-terminal to Arg-193 (corresponding to Arg-51 of the heavy chain of bovine factor X (Titani et al., 1975)) or Arg-194 (in human factor X, Figure 19). The specificity of these cleavages during the formation of factor Xa is quite remarkable. 4. HOMOLOGOUS STRUCTURAL DOMAINS Comparison of the amino acid sequences of the vitamin K-dependent clotting factors revealed several structurally and functionally homologous domains (Neurath, 1984). The homology extends to posttranslational modifications (other than predicted carbohydrate attachment sites) and half cysteinyl residues. Comparison of the six glycoproteins revealed three family groups, comprising prothrombin, protein S, and the factor X-like proteases (factors VII, IX, X, and protein C). Among the factor X-like proteases, the protein structural domains have been well conserved: the four proteases share ~40% amino acid sequence identity. The structural homologies found in human factor X are illustrated in Figure 19.
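The cleavages that follow removal of the prepro leader can be tracked as residue ranges. The sketch below is bookkeeping under stated assumptions, not a kinetic model: it uses the human numbering given in the text (single chain factor X as residues 1-448, cleavage after Arg-142, release of Arg-Lys-Arg, taken here as residues 140-142, and activation after Arg-194). The class and function names are illustrative only. The resulting fragments can be checked against the activation peptide (143-194) and catalytic domain (195-448) boundaries given for human factor X.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chain:
    """A polypeptide segment given as an inclusive residue range."""
    name: str
    start: int
    end: int

    def __len__(self) -> int:
        return self.end - self.start + 1


def cleave_after(chain: Chain, residue: int, left: str, right: str) -> tuple[Chain, Chain]:
    """Cut the peptide bond C-terminal to `residue`."""
    if not chain.start <= residue < chain.end:
        raise ValueError(f"residue {residue} is not an internal position of {chain.name}")
    return Chain(left, chain.start, residue), Chain(right, residue + 1, chain.end)


def trim_c_terminus(chain: Chain, n: int) -> Chain:
    """Remove n C-terminal residues (the carboxypeptidase B-like step)."""
    return Chain(chain.name, chain.start, chain.end - n)


# Signal peptidase and propeptide removal are summarized by starting from
# single chain factor X; the signal peptidase site itself is not known.
single_chain = Chain("single chain factor X", 1, 448)

# Cleavage C-terminal to Arg-142.
light, heavy = cleave_after(single_chain, 142, "light chain", "heavy chain")

# Release of Arg-Lys-Arg (assumed residues 140-142) from the light chain.
light = trim_c_terminus(light, 3)

# Activation: cleavage C-terminal to Arg-194.
activation_peptide, heavy_xa = cleave_after(heavy, 194, "activation peptide", "factor Xa heavy chain")

for chain in (light, activation_peptide, heavy_xa):
    print(f"{chain.name}: residues {chain.start}-{chain.end} ({len(chain)} residues)")
```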
Amino acid residues 1-43 represent the N-terminal Gla domain required for calcium binding and efficient activation of the protein (Suttie, 1985; Burgess and Esnouf, 1985). This is followed by the two EGF-like elements (Doolittle et al., 1984) comprising residues 44-83 and 84-126, respectively. Their function is as yet undefined. The heavy chain consists of the activation peptide (residues 143-194), which is released by factor IXa and factor VIIa (DiScipio et al., 1977), and the catalytic serine protease domain (residues 195-448), which comprises the C-terminal region of the protein. D. COMPARISON OF BOVINE AND HUMAN FACTOR X A comparison of the amino acid sequences of bovine and human preprofactor X is shown in Figure 22. Overall, the two sequences display 65% sequence identity when a single gap is inserted in the bovine activation peptide sequence (between residues 189 and 190, Figure 22) to maximize the homology. In general, it appears that factor X is under fewer evolutionary constraints than prothrombin, factor IX, or protein C, which demonstrate 81%, 84%, and 72% identity, respectively, between the bovine and human sequences (Walz et al., 1986). The leader peptides of factor X exhibit only 39% sequence identity at the amino acid level but 63% identity at the nucleotide level. The light chains exhibit 70% identity at the amino acid level, and the amino acid identity is 84% for residues 194-429 of the heavy chain. Presumably, this identity reflects the functional importance of these two regions of factor X. The greatest identity in the noncatalytic chain is found in the first EGF-like domain (residues 44 to 83), which is 80% identical. In contrast, the activation peptides (residues 143-194 in the human sequence) and the carboxyterminal regions (residues 430-448 of the human sequence) exhibit only 14% and 5% sequence identity, respectively, presumably reflecting a lack of functional constraint on these regions.
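Percent identity values like those above depend on how gaps and the denominator are handled. The following sketch assumes one common convention (gapped columns are skipped, and identical residues are divided by the number of compared columns); the text does not state which convention was used for Figure 22, so the function and the toy alignment are illustrative only.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences.

    Columns where either sequence has a gap are skipped; identity is the
    number of identical residues divided by the number of compared columns.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue
        compared += 1
        identical += a == b  # a bool counts as 0 or 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# Hypothetical toy alignment with one gap in the first sequence.
print(percent_identity("MAG-LK", "MAGQLR"))  # 4 identical of 5 compared columns -> 80.0
# The leader pro-region figure quoted earlier: 7 identical of 18 positions.
print(round(100 * 7 / 18, 1))  # 38.9
```

Under a different convention, for example dividing by the full alignment length including gaps, the same alignment would give a lower value, which is one reason identities reported by different authors are not always directly comparable.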
Indeed, a carboxyterminal peptide can be removed from the heavy chain of bovine factor Xa without altering its activity (Fujikawa et al., 1975). Carbohydrate attachment is predicted at a homologous Asn site in the bovine (residue 178) and human (residue 181) activation peptide; however, a second oligosaccharide moiety is positioned independently (see Figures 7 and 9). The comparison shown in Figure 22 differs from that reported by Leytus et al. (1984). This is mainly the result of differences between the bovine factor X amino acid sequence determined by protein chemistry techniques (McMullen et al., 1983; DiScipio et al., 1977), which was used by Leytus et al. (1984), and the sequence predicted from the cDNA (Fung et al., 1984), which is used in Figure 22. In every case, however, the sequence predicted from the bovine cDNA shares greater sequence identity with the human factor X sequence. FIGURE 22: COMPARISON OF THE AMINO ACID SEQUENCES OF BOVINE AND HUMAN FACTOR X Protein sequences of bovine and human factor X are predicted from the cDNA sequences. Identical amino acids in corresponding positions are boxed. A single gap has been inserted in the bovine sequence (between residues 189 and 190) to maximize the identity. The carboxyterminal 5 residues of the bovine sequence were not encoded in the cDNA and have been taken from Titani et al. (1975). [Aligned bovine and human preprofactor X sequences, numbered from residue -40.] E. CHARACTERIZATION OF THE HUMAN FACTOR X GENE 1.
DNA SEQUENCE ANALYSIS OF THE HUMAN FACTOR X GENE Preliminary characterization of the human factor X gene by Southern blot analysis showed a single copy of the factor X gene in the human genome, spanning more than 16 Kbp of genomic DNA. A total of 32 overlapping phage clones encoding the factor X gene were isolated from a human genomic library. Restriction enzyme mapping demonstrated that over 32 Kbp of contiguous sequence was represented by six random clones. The factor X gene is composed of at least 8 exons and 7 introns, as reported by Leytus et al. (1986) (Figure 23). DNA sequence analysis indicated that the 5' untranslated region and the prepeptide-encoding sequences were absent from the existing clones. All exon sequences were localized by Southern blot analysis; thus far, the factor X gene maps to 20 Kbp of the human genome. All splice junction sequences in the factor X gene agree with the consensus sequences reported by Mount (1984). The GT-AG consensus sequence is thought to be essential for the accurate excision of introns from the primary transcript (Breathnach and Chambon, 1981). However, other junction sequences may also direct precise processing; the rare splice donor dinucleotide GC has been reported in several eukaryotic structural genes that are properly spliced (Wieringa et al., 1984; Dush et al., 1985; Irwin, 1986; Cool and MacGillivray, 1987). Exon flanking sequences also agree well with the consensus sequence of Mount (1982). Comparison with the junctions reported by Leytus et al. (1986) showed only one difference: a G nucleotide is missing from the acceptor sequence of intron D. This difference may represent a cloning or sequencing artifact. The ATTAAA polyadenylation signal (Proudfoot and Brownlee, 1976) is found within the coding region, 16 nucleotides 5' to the poly(A) tail, as described for the cDNA.
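The GT-AG rule and the rarer GC donor variant can be expressed as a check on the first and last two nucleotides of an intron. This is a minimal sketch assuming that the intron sequence (sense strand, 5' to 3') is already delimited; it does not score the longer consensus sequences of Mount, and the function name and example sequences are hypothetical.

```python
def classify_splice_junction(intron: str) -> str:
    """Classify an intron by its terminal dinucleotides.

    GT...AG is the canonical (GT-AG) class; GC...AG is the rarer GC donor
    variant. Anything else is reported as non-canonical.
    """
    seq = intron.upper()
    if len(seq) < 4:
        raise ValueError("intron too short to carry both donor and acceptor dinucleotides")
    donor, acceptor = seq[:2], seq[-2:]
    if acceptor == "AG" and donor in ("GT", "GC"):
        return f"{donor}-AG"
    return "non-canonical"


# Hypothetical intron fragments (not factor X sequence).
for intron in ("GTAAGTCTTTTCCCCAG", "gcaagtttctttcag", "ATAAGTTTTTAC"):
    print(classify_splice_junction(intron))  # GT-AG, then GC-AG, then non-canonical
```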
Several tentative sequences similar to the consensus CAYTG, which may represent another recognition sequence, are found at positions 19623, 19632, 19646, 19655, 19660, and 19675, immediately 3' to the site of polyadenylation (Figure 14). FIGURE 23: INTRON SITES IN THE HUMAN FACTOR X MOLECULE The relative positions of introns A to G within the factor X amino acid sequence are indicated. Termination of transcription occurs beyond the consensus elements at an unknown site, and the mRNA is subsequently processed to the site of the poly(A) tail at nucleotide 19627 (Birnstiel et al., 1985). With the exception of the amino acid residue at position 402 (serine is replaced by glycine in the genomic sequence), no heterogeneity is observed in the coding sequence with respect to the cDNA clones. The difference probably reflects a polymorphic site in the factor X gene. As the glycine variant is also found in the cDNAs isolated by Leytus et al. (1986), it may be the more frequently expressed allele. 2. 5' REGION OF THE FACTOR X GENE Clones encoding the prepeptide region of the leader sequence and the 5' untranslated region of the factor X gene were not obtained. In total, three general and two specific genomic phage libraries were screened in efforts to isolate the 5' region of the gene. The initial cDNA probes used in the in situ hybridizations may have been biased towards the 3' end of the gene because of the unusually large 612 bp exon encoding the C-terminus of the protein; however, even with specific 5' end probes, no hybridizing sequences were identified. Because the phage libraries were amplified prior to screening, underrepresented sequences may have been lost. Subsequent construction and screening of libraries specific for the 5' end region did not yield a positive clone.
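Locating degenerate elements such as the CAYTG-like sequences listed at the start of this passage (Y = C or T) is a pattern search over IUPAC nucleotide codes. The sketch below assumes a plain DNA string on the sense strand and supports only the IUPAC codes listed; the function name and example fragment are hypothetical, and the `offset` argument stands in for converting hits into genomic coordinates such as those quoted above.

```python
import re

# Subset of IUPAC nucleotide codes; Y = pyrimidine (C or T), R = purine (A or G).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "N": "[ACGT]",
}


def motif_positions(sequence: str, motif: str, offset: int = 0) -> list[int]:
    """Return 1-based start positions of `motif` on the given strand.

    `offset` is added to every position so that hits in a subsequence can be
    reported in the coordinates of a larger sequence. A zero-width lookahead
    is used so that overlapping hits are all reported.
    """
    pattern = "".join(IUPAC[base] for base in motif.upper())
    return [m.start() + 1 + offset for m in re.finditer(f"(?={pattern})", sequence.upper())]


# Hypothetical fragment, not factor X sequence.
print(motif_positions("GGCATTGAACACTGA", "CAYTG"))              # [3, 10]
print(motif_positions("GGCATTGAACACTGA", "CAYTG", offset=100))  # [103, 110]
```

Only the given strand is searched; a scan of the opposite strand would require searching the reverse complement as well.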
Possibly, the enrichment was not sufficient for the isolation of the specific sequence; alternatively, some human genomic sequences are known to be deleted from phage grown in rec+ hosts (Wyman et al., 1985). Hypervariable regions 3' to the alpha-globin locus have resisted strenuous efforts at cloning (see Wyman et al., 1985), as has the junction fragment from a translocation between chromosomes 6 and 10 in a murine plasmacytoma (Perlmutter et al., 1984). Discrepancies between cloned and genomic sequences have occurred during cloning of mammalian DNA (Wyman et al., 1985; Perlmutter et al., 1984; Taub et al., 1983; Nikaido et al., 1981). Leach and Stahl (1983) and Wyman et al. (1985) have demonstrated that inverted repeats prevent survival of internal sequences in phage lambda vectors unless mutant hosts are used. Unclonable sequences at the 5' end of the factor X gene may therefore indicate the presence of repeat elements. This explanation is plausible, as the genomic clones isolated by Leytus et al. (1986) also terminated immediately 5' to exon 2. In conclusion, the genomic phage libraries did not contain clones representing the 5' end of the factor X gene. Southern blot analysis confirmed the presence of at least one other exon encoding the pre segment of the leader peptide, analogous to the other vitamin K-dependent clotting factor genes including prothrombin (Irwin et al., 1985; Davie et al., 1983; Degen et al., 1985), factor VII (O'Hara et al., 1987), factor IX (Anson et al., 1984; Yoshitake et al., 1985), and protein C (Foster et al., 1985; Plutzky et al., 1986). Data from analyses of the factor X genomic clones (20 Kbp) and the 5' region of the factor X gene (~4 Kbp) indicate that the structural and 3' regions of the factor X gene encompass 24 Kbp of the human genome. Whether an intron interrupts the 5' flanking sequence is unknown.
Regulatory elements were not identified for the factor X gene, as the majority reside 5' to the structural portion of the gene (Breathnach and Chambon, 1981; Reudelhuber, 1984). Clearly defined 'TATA' and 'CCAAT'-like elements have not been identified in many liver-specific genes (Ciliberto et al., 1985; Gorski et al., 1986), including several of the blood clotting factor genes: factor XII (Cool and MacGillivray, 1987), prothrombin (Irwin, 1986), factor IX (Yoshitake et al., 1985), and protein C (Plutzky et al., 1986). As alternate upstream liver-specific promoters have been identified in other genes (Ciliberto et al., 1985; Gorski et al., 1986), it has been proposed that similar control sequences recognized by liver-specific factors may be found in the blood coagulation factor genes (Cool and MacGillivray, 1987). Genes entirely lacking 'TATA' consensus signals may transcribe mRNA species with heterogeneous 5' ends, although many genes are initiated precisely in the absence of these regulatory elements (see Cool and MacGillivray, 1987 for a full discussion). Identification of possible enhancer elements awaits tissue culture expression assays of the blood coagulation protein genes. Whether the 5' flanking sequence of factor X possesses all or any of these control regions remains to be seen. 3. FACTOR X GENE STRUCTURE It has been postulated that intron positions mark discrete exon units of functional protein domains (Gilbert, 1978; Blake, 1978). According to this view, proteins have arisen during evolution as collections of exons brought together by recombination within the intervening sequences. Thus, new proteins may be generated by rearrangement between existing genes (Gilbert, 1978, 1979, 1985; Blake, 1978, 1983a, b). The organization of the factor X gene is represented schematically in Figure 23. The intron sites appear to define protein domains, many of which are found in other proteins.
The first intron interrupts the prepro leader peptide at residue -17, near the signal peptidase cleavage site reported for factor IX (at residue -18; Diuguid et al., 1986; Bentley et al., 1986) and protein C (at residue -24; Foster et al., 1987). Apparently, the pre and pro peptides are located on separate exons. The Gla domain contains 11 modified residues located between amino acids 6 and 39. One intron interrupts this region, segregating the final Gla residue. The 5' exon includes the majority of the Gla region as well as the pro region, supporting the concept of inter-related functions for the two domains (Jorgensen et al., 1987; Suttie et al., 1987; Foster et al., 1987; Busby et al., 1987). The small 25 bp exon encoding the isolated Gla residue is followed by the two EGF homologies, each of which is represented by a single exon. The linking basic tripeptide and the activation peptide are flanked by introns E and F; both segments are removed during processing of the protease. In the heavy chain, one intron divides the serine protease domain of factor X. His-236 is encoded by a separate exon preceding the common exon that encodes both the Asp-282 and Ser-379 catalytic residues. Throughout the length of the coding sequence, the intron positions clearly demarcate structural domains of the protein. Only the Gla and catalytic regions are further divided by introns (the Gla domain by intron B and the catalytic domain by intron G). Whether these domains display autonomous functions is less well defined. However, calcium binding capacity has been attributed to both the Gla region (Suttie, 1985) and the EGF region (Ohlin and Stenflo, 1987; Stenflo et al., 1987). F. GENETICS OF HUMAN FACTOR X 1. ANALYSIS OF CHROMOSOME LOCI Human-hamster somatic cell hybridization analysis and in situ hybridization studies assigned the factor X gene to chromosome 13q32-qter (Royle et al., 1986).
Using pcHX14 as the hybridization probe, similar results were obtained by DNA dosage analysis of an individual monosomic for 13q34 (Scambler and Williamson, 1985). Reports indicated that factor X and factor VII map to the same region of the human genome (Pfeiffer et al., 1982; Otto and Pfeiffer, 1984; de Grouchy et al., 1984), suggesting that a tandem duplication event may have generated the two protease genes at a late evolutionary date. Characterization of the factor VII gene shows an exon organization homologous to that of factor X (O'Hara et al., 1987), supporting a recent divergence from a common ancestral gene. However, tandem duplications do not in themselves indicate more closely related genes; genetic cross-over events can give rise to families of related genes, many of which are positioned on different chromosomes. This is exemplified by the other homologous vitamin K-dependent protein genes, factor IX and protein C, which have been mapped to chromosome Xq27 (Camerino et al., 1984) and chromosome 6 (Rocchi et al., 1985), respectively. The intervening sequences of the factor IX (Anson et al., 1984) and protein C (Plutzky et al., 1986) genes contain Alu elements, whereas the factor VII gene (O'Hara et al., 1987) possesses five imperfect repeats analogous to hypervariable minisatellite DNA. These different sequence patterns may reflect the distribution of ancestral exons of the vitamin K-dependent proteins to different genomic regions (O'Hara et al., 1987). 2. RFLP STUDIES Several restriction fragment length polymorphisms have been identified for the factor X gene, including Eco RI, Pst I, Hind III (Hay et al., 1986), and Bcl I 'A' and 'B' (Scambler et al., 1986), using pcHX14 as the hybridization probe (see Table VIII). Independent authors reported an additional Taq I polymorphism (1.62 or 1.25 Kbp alleles with frequencies of 0.77 and 0.23, respectively) (Jaye et al., 1985).
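How useful a two-allele RFLP is for carrier diagnosis depends on how often a parent is heterozygous, and therefore informative, for the marker. A minimal sketch, assuming Hardy-Weinberg equilibrium (an assumption not made in the text), computes the expected heterozygosity from the Taq I allele frequencies reported by Jaye et al. (1985); the function name is hypothetical.

```python
import math


def expected_heterozygosity(allele_frequencies: list[float]) -> float:
    """Expected heterozygote frequency, 1 - sum(p_i^2), under Hardy-Weinberg equilibrium."""
    if not math.isclose(sum(allele_frequencies), 1.0, abs_tol=1e-9):
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in allele_frequencies)


# Taq I alleles (1.62 and 1.25 Kbp) reported with frequencies 0.77 and 0.23.
h = expected_heterozygosity([0.77, 0.23])
print(f"expected heterozygosity: {h:.4f}")  # about 0.354
```

Under this assumption roughly one individual in three would be heterozygous for the Taq I site, which gives one rough measure of the marker's informativeness in a family study.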
None of the RFLPs reside within the protein-encoding portion of the gene, as the polymorphic sites were not detected in the cDNA sequence. No linkage disequilibrium has been associated with any of the reported RFLPs; therefore, despite their low frequencies, the polymorphisms will be useful in carrier diagnosis in Stuart factor-deficient families. G. EVOLUTION OF THE FACTOR X GENE Species-specific changes are observed at the nucleotide level for a number of serine protease genes (Walz et al., 1986); however, the number and positions of exons and introns have been conserved (Rogers, 1985; Irwin, 1986). Proteins with common structural and functional features share similar gene organizations (Patthy, 1985; Rogers, 1985). This relationship may have arisen through a series of small tandem duplication events and subsequent divergence of the various domains (Doolittle, 1979) or, alternatively, through gene fusion of different functional domains from unrelated proteins (Gilbert, 1978, 1979; Patthy, 1985). The second proposal is more widely accepted, as homologous domains are observed in a diversity of proteins. Within the blood coagulation proteins, factor X belongs to the vitamin K-dependent proteases, which form a closely related family of serine protease genes (Patthy, 1985; Neurath, 1984, 1985; Hewett-Emmett et al., 1981). Most of the cDNA sequences of the vitamin K-dependent blood clotting factors have been determined, including prothrombin (Degen et al., 1983; MacGillivray and Davie, 1984), factor IX (Kurachi and Davie, 1982; Jaye et al., 1983), factor VII (Hagen et al., 1986), protein C (Long et al., 1984; Foster and Davie, 1984; Beckmann et al., 1985), protein S (Dahlback et al., 1986; Lundwall et al., 1986; Hoskin et al., 1986), and factor X (Figures 7 and 9; Leytus et al., 1984). All of the proteases demonstrate functionally homologous domains.
Corresponding gene organizations of bovine (Irwin et al., 1985; Irwin, 1986) and human prothrombin (Degen et al., 1983, 1985; Davie et al., 1983), human factor IX (Anson et al., 1984; Yoshitake et al., 1985), human protein C (Foster et al., 1985; Plutzky et al., 1986), and human factor X (Figure 15; Leytus et al., 1986) have been reported, thus permitting comparison and understanding of the evolution of this gene family. Conservation of gene structure is apparent throughout the protein-encoding regions (Figure 24). H. COMPARISON OF THE VITAMIN K-DEPENDENT BLOOD COAGULATION FACTOR GENES 1. ORGANIZATION OF THE GENES FOR THE VITAMIN K-DEPENDENT CLOTTING FACTORS With the exception of the protein S and Z genes, the structures of all of the vitamin K-dependent clotting factor genes have been characterized by DNA sequence analysis. The organization of the genes coding for prothrombin, factor IX, protein C, and factor X is shown in Figure 24. FIGURE 24: COMPARISON OF THE GENES CODING FOR THE VITAMIN K-DEPENDENT CLOTTING FACTORS Schematic representation of the genes for the vitamin K-dependent clotting factors. Exons are horizontal; introns are vertical. Lengths are given in Kbp. The 5' and 3' untranslated regions are denoted by slashed bars, the leader-encoding sequences by shaded bars, and the plasma protein-encoding sequences by open bars. The Gla domain is indicated by γ, the epidermal growth factor homology by E, and the kringles by KI and KII. The introns are given by the solid vertical bars. Introns that have not been characterized fully are marked by ?. The codons for the active site residues are represented by H (histidine), D (aspartic acid), and S (serine). The direction of transcription is 5' to 3'. [Panels: prothrombin (human and bovine), factor VII, factor X, factor IX, and protein C; scale 0.5 to 2.0 Kbp.] The prothrombin gene structure has been determined in both the bovine (Irwin et al., 1985; Irwin, 1986) and human (Davie et al., 1983; Degen et al., 1983, 1985) species. Partial DNA sequence analysis and heteroduplex analysis show that the bovine prothrombin gene is composed of 14 exons and 13 introns and spans a total distance of 15.4 Kbp. The exon lengths range from 25 to 315 bp. In overall structure, the human prothrombin gene is very similar: all exon sizes have been conserved, and all introns are at identical positions in both genes. Therefore, the greater size of the human gene (24 Kbp) is attributed to differences in intron length. Irwin et al. (1985) suggested that this size difference may reflect differences in the length of the repetitive elements in the two species and/or may result from deletions within the bovine intronic sequences. Genomic clones have been isolated and characterized for human factor VII (O'Hara et al., 1987), factor IX (Anson et al., 1984; Yoshitake et al., 1985), protein C (Foster et al., 1985; Plutzky et al., 1986), and factor X (Figure 15; Leytus et al., 1986). These studies have shown that the genes coding for the four proteins are closely related (Figure 24). The factor VII, factor IX, and protein C genes encompass 12.8 Kbp, 33.5 Kbp, and 11.2 Kbp of the human genome, respectively. The difference in length is attributed to variation in intronic sequences. As mentioned earlier, the intervening sequences of the factor IX and protein C genes contain Alu elements, whereas the factor VII gene possesses imperfect tandem repeats analogous to hypervariable minisatellite DNA.
Only the length of intron E (intron F in protein C) has been conserved among three of the four genes: factor IX, factor X, and protein C (Figure 24). The sequence of the strikingly short intron between exons 3 and 4 of factor VII and exons 4 and 5 of protein C shares the same degree of homology as the flanking exons, possibly because of constraints enforced by the splicing mechanism (Wieringa et al., 1984). As discussed earlier, the 5' region of the factor X gene has not yet been characterized; thus, the exact length of the gene is unknown but is greater than the minimal estimate of 24 Kbp. The organizations of the factor VII, factor IX, protein C, and factor X genes are homologous with the exception of the 5' end. The protein-encoding and 3' untranslated regions of the four genes are encoded by 8 exons and 7 introns. However, in contrast to the factor IX gene, the 5' untranslated region of the protein C gene is interrupted by an intron 21 bp upstream of the initiator methionine codon (Plutzky et al., 1986). It will be interesting to determine whether the factor X gene has a similar intron. 2. LEADER AND GLA REGIONS A comparison of the prothrombin, factor VII, factor IX, protein C, and factor X genes (Figure 25) indicates a similarity between exons 2 to 4 of the protein C gene and exons 1 to 3 of the other genes. In the prothrombin, factor VII, factor IX, and factor X genes, the positions of the first three introns are invariant. In the protein C gene, the first conserved intron is shifted 6 bp upstream with respect to the other genes, probably through creation of a new splice acceptor site within the intervening sequence, 5' to the original site. This intron sliding event does not disrupt the translational reading frame (see Figure 25); hence, protein function is maintained (Craik et al., 1983). As noted previously, the protein C gene contains an additional intron in its 5' untranslated region.
Characterization of the 5' flanking regions of the factor X and factor VII genes may clarify whether this difference reflects the unique regulatory role of protein C; in general, however, the 5' untranslated region of a gene exhibits greater variation than the coding sequence (Rogers, 1985). This is exemplified by the fibrinolytic protease genes: an intron is observed in the 5' flanking region of the plasminogen activator genes but not in that of the factor XII gene (Cool and MacGillivray, 1987). The 5' portion of the vitamin K-dependent protein genes encodes the prepro leader peptide and the Gla regions of the four precursor proteins. The genomic sequence reveals that an optional exon (nucleotides 1133-1192), located between exons 1 and 2, encodes the 22-amino-acid insertion that gives rise to the longer leader peptide of factor VII (see Figure 25). The pro segment of the leader sequence is located on the same exon as most of the Gla domain, further supporting an interrelated function for the two polypeptide regions. As both the leader peptide and Gla domains are shared by all of the vitamin K-dependent protein genes, these two regions most likely evolved from a common ancestral gene; however, the nature of the ancestral gene is unknown. In contrast to the numerous differences in intron positions within the serine protease domain, the lack of intron movement in the leader-Gla region indicates a recent evolutionary event (Irwin, 1986). Exon shuffling is a mechanism whereby genes can acquire new functions (Gilbert, 1978, 1979). By this process, the leader-Gla region may have been duplicated and distributed between the factor X-like and prothrombin genes at a later date (Irwin, 1986).

FIGURE 25: COMPARISON OF THE LEADER PEPTIDE AND GLA EXONS OF THE VITAMIN K-DEPENDENT GENES
Comparison of the organization of the leader peptide and Gla exons of the factor X (Figure 15; Leytus et al., 1986), factor VII (O'Hara et al., 1987), factor IX (Anson et al., 1984; Yoshitake et al., 1985), protein C (Foster et al., 1985; Plutzky et al., 1986), and prothrombin (Irwin et al., 1985; Irwin, 1986; Davie et al., 1983; Degen et al., 1983, 1985) genes. Exons are represented by open bars; 5' untranslated regions are represented by slashed bars. The optional exon in the leader-encoding sequence of the factor VII gene is indicated by the cross-hatched bar. The 5'-most exon of the factor X gene and the 5' boundary of the factor VII gene have not been characterized. The Gla domains are indicated by γ. Intron phases are as in Table III. Exon sizes are indicated by the scale bar representing 100 bp; intron sizes are not to scale. The direction of transcription is 5' to 3'.
[Figure 25 schematic: leader peptide and Gla exon maps for prothrombin, factor VII, factor IX, factor X, and protein C; scale bar, 100 bp.]

3. EPIDERMAL GROWTH FACTOR-LIKE DOMAINS

Downstream of the Gla exon, the structures of the vitamin K-dependent clotting factor genes diverge into two families: (i) the factor X, factor VII, factor IX, and protein C genes, and (ii) the prothrombin gene (Figure 24). As discussed earlier, the factor X-like proteins each contain two regions homologous to epidermal growth factor (EGF). Similar domains are found in urokinase (Nagamine et al., 1984), tissue-type plasminogen activator (Pennica et al., 1983), and factor XII (Cool et al., 1985) (Figure 26). With the exception of the first EGF domain of protein C, which contains four disulfide bridges (Long et al., 1984; Foster and Davie, 1984; Beckmann et al., 1985), all of these domains retain the three disulfide bonds observed in epidermal growth factor. In contrast to the fibrinolytic proteases, the factor X-like proteins contain a beta-hydroxyaspartic acid residue in the N-terminal EGF homology. In each gene, these EGF homologies are encoded by separate exons (Figure 26).
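Two points in this section turn on intron phase: the 6-bp upstream slide of the first conserved intron in the protein C gene leaves the reading frame intact, and whether an exon such as an EGF module can be inserted or duplicated without a frameshift depends on the phases of the introns that flank it. A minimal Python sketch of that arithmetic follows. It assumes the conventional phase definition (0 = intron between codons, 1 = after the first nucleotide of a codon, 2 = after the second); the function names and the exon lengths in the example are hypothetical illustrations, not data from the genes discussed here.

```python
from typing import List


def intron_phases(coding_exon_lengths: List[int]) -> List[int]:
    """Phase of each intron separating consecutive coding exons.

    Assumed convention: phase 0 = intron lies between two codons,
    phase 1 = after the first nucleotide of a codon, phase 2 = after the second.
    The intron following exon k has phase (coding nt in exons 0..k) mod 3.
    """
    phases: List[int] = []
    coding_nt = 0
    for length in coding_exon_lengths[:-1]:  # no intron follows the last exon
        coding_nt += length
        phases.append(coding_nt % 3)
    return phases


def is_symmetric(exon_length: int) -> bool:
    """An internal exon has equal-phase flanking introns exactly when its coding
    length is a multiple of 3; inserting, deleting, or duplicating such an exon
    leaves the downstream reading frame (and downstream intron phases) unchanged."""
    return exon_length % 3 == 0


def slide_keeps_frame(shift_nt: int) -> bool:
    """Moving a splice site by shift_nt nucleotides changes the coding length of the
    adjacent exon by shift_nt; the downstream frame is kept only if that is a multiple of 3."""
    return shift_nt % 3 == 0


if __name__ == "__main__":
    # Hypothetical coding-exon lengths (nt), chosen only to illustrate phase-1 introns
    # around two EGF-like exons of 114 and 126 nt; not measured data.
    exons = [100, 114, 126, 200]
    print(intron_phases(exons))                      # [1, 1, 1]
    # Duplicating the 114-nt exon keeps every downstream intron in phase 1.
    print(intron_phases([100, 114, 114, 126, 200]))  # [1, 1, 1, 1]
    print(is_symmetric(114), slide_keeps_frame(6))   # True True
```

With these hypothetical lengths every intron is phase 1, the pattern reported in this section for the EGF exons, and duplicating a symmetric exon leaves all downstream phases unchanged. A 6-nt slide, as in the protein C gene, adds two codons without shifting the frame.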
The exon lengths are conserved, and all of the flanking introns interrupt the reading frame in the same phase, after the first nucleotide of a codon. This conservation of intron phase permits the sequential placement of new protein domains without disruption of the reading frame (Irwin, 1986).

FIGURE 26: COMPARISON OF THE EPIDERMAL GROWTH FACTOR HOMOLOGOUS DOMAINS OF THE BLOOD COAGULATION PROTEASE GENES
Comparison of the exon organization of the EGF homologies in the factor X, factor VII, factor IX, protein C, factor XII (Cool and MacGillivray, 1987), tissue-type plasminogen activator (Ny et al., 1984), and urokinase plasminogen activator (Riccio et al., 1985) genes. Exons are represented by open bars. Triplet codon phase is designated as in Table III. Intron sizes are not to scale. The vertical lines below the open bars denote the positions of cysteine codons in the exons. The beta-hydroxyaspartic acid residue is indicated by β. The epidermal growth factor homology is designated EGF, and the fibronectin type I homology is designated FNI. The scale bar represents 50 bp. The direction of transcription is 5' to 3'.
[Figure 26 schematic: EGF-domain exon maps for factor X, factor VII, factor IX, protein C, factor XII, tissue-type plasminogen activator, and urokinase; scale bar, 50 bp.]

Whether the acquisition of these growth factor domains resulted from an internal gene duplication or from two separate insertional events is not clear. Patthy (1985) postulated that two individual duplication events occurred. Before the divergence of the fibrinolytic and factor X-like enzymes, a growth factor domain closely related to the j unit of the EGF precursor, but containing beta-hydroxyaspartic acid, was inserted into the ancestral protease gene.
The second EGF module (homologous to the e unit of the EGF precursor) was inserted into the factor X-like proteases at a later date (Patthy, 1985). However, two independent duplications from a common donor protein are highly unlikely; instead, mispairing between the ancestral factor X-like gene and the EGF precursor gene may have facilitated the transfer of the neighboring e unit into the serine protease gene by a shuffling event (Patthy, 1985). The prothrombin gene acquired two kringle structures in place of the EGF homologies. Kringle 1 (KI in Figure 24) is encoded by two exons, whereas kringle 2 (KII in Figure 24) is encoded by a single exon. It has been proposed that the dissimilarity in the gene and protein structures of the two kringles indicates that two distinct insertional events occurred (Kurosky et al., 1980). As with the leader-Gla region, the EGF and kringle domains are each encoded by one or two exons (Figure 24). The presence of such discrete functional units permits the shuffling of these domains between and within genes.

4. SERINE PROTEASE DOMAIN

A comparison of the exon organization of the catalytic regions of the serine protease genes indicates five different groups (Rogers, 1985; Irwin, 1986; Hewett-Emmett et al., 1981) (Figure 27). The groupings are based on similarities in the number and placement of introns and are consistent with groupings based on amino acid sequence homology. The factor X, factor IX, factor VII, and protein C genes comprise one of these groups, and prothrombin a second.
A third family consists of the digestive protease zymogens trypsinogen, chymotrypsinogen, and proelastase, together with kallikrein, the alpha and gamma subunits of nerve growth factor, tissue-type plasminogen activator, urokinase, and factor XII. Haptoglobin, a hemoglobin-binding protein, is closely related to the blood coagulation factors, but during evolution it has lost the active site histidine and serine residues as well as all intron sequences within the region homologous to the serine proteases (Hewett-Emmett et al., 1981; Rogers, 1985). It represents the fourth group. The final family consists of the complement factor B zymogen, which displays the most complex intron/exon organization of the serine protease genes (Blake, 1983a, b; Rogers, 1985; Irwin, 1986).

FIGURE 27: COMPARISON OF THE EXON ORGANIZATION OF THE SERINE PROTEASE DOMAIN
Comparison of the serine protease exon organization in the haptoglobin (Maeda et al., 1984), trypsinogen (Craik et al., 1984), chymotrypsinogen (Bell et al., 1984), proelastase (Swift et al., 1984), kallikrein (Mason et al., 1983; van Leeuwen et al., 1986), alpha and gamma subunits of nerve growth factor (Evans and Richards, 1985), tissue-type plasminogen activator (Ny et al., 1984; Degen et al., 1986), urokinase (Nagamine et al., 1984; Riccio et al., 1985), complement factor B (Campbell and Porter, 1983; Campbell et al., 1984), factor IX (Anson et al., 1984; Yoshitake et al., 1985), protein C (Foster et al., 1985; Plutzky et al., 1986), factor VII (O'Hara et al., 1987), prothrombin (Irwin, 1986), and factor X genes. Intron phases are as in Figures 25 and 26. Codons for the residues at the activation sites of the zymogens are denoted by vertical arrows; complement factor B and the gamma subunit of nerve growth factor are not activated in this way. The codons for the active site residues histidine, aspartate, and serine are denoted by H, D, and S, respectively; in haptoglobin, however, the corresponding codons code for lysine (K), aspartate (D), and alanine (A) residues. The 3'-most exons of factor IX, tissue-type plasminogen activator, and urokinase have been abbreviated; they are 1935 bp, 914 bp, and 1119 bp in size, respectively. Exons coding for 5' untranslated regions are indicated by dotted boxes, and 3' untranslated regions by slashed bars. A unique coding region of complement factor B is indicated by the solid box. The scale bar represents 100 bp (modified from Irwin, 1986).
[Figure 27 schematic: serine protease domain exon maps for haptoglobin, trypsinogen, chymotrypsinogen, proelastase, kallikrein, αNGF, γNGF, tPA, urokinase, factor XII, factor B, factor IX, and prothrombin; scale bar, 100 bp.]

In contrast to the other serine protease gene groups, the serine protease domain of the factor X-like genes is encoded by three exons: the codons for the active site aspartic acid and serine residues are located on a single large exon, and the codon for the active site histidine lies on the adjacent upstream exon (Figure 27). The 5'-most of these three exons encodes the activation peptide and the cleavage sites that give rise to the two-chain forms of factor X and protein C; variation in the size of this exon among the genes reflects differences in the activation region. Prothrombin is unique among the serine protease genes (Irwin, 1986; Rogers, 1985). As in two of the other serine protease gene groups, the catalytic residues of the prothrombin gene are encoded by individual exons (Figure 27).
However, none of the five intron positions in the prothrombin gene corresponds to those present in the other genes. Differences in the organization of the five serine protease gene families suggest an ancient gene duplication (Hewett-Emmett et al., 1981; Young et al., 1978). Consistent with the marked differences in the organization of the prothrombin and factor X-like genes (Figure 27), these two lineages diverged over 600 million years ago (Young et al., 1978). Subsequent independent intron invasion and intron sliding events then gave rise to the heterogeneous serine protease groups observed today (Rogers, 1985; Irwin, 1986; Doolittle, 1985). Alternatively, others argue that the smaller number of introns present in invertebrate species (Gilbert, 1985) is due to intron loss (Blake, 1983a, b). In this view, introns were present prior to gene duplication and were subsequently lost to generate the older gene families, as exemplified by haptoglobin. In the trypsinogen-like family, concurrent duplication and intron invasion processes may have resulted in the conservation of some intron sites and the divergence of others (Irwin, 1986) (Figure 27).

I. EVOLUTION OF THE VITAMIN K-DEPENDENT COAGULATION FACTORS

A model for the evolution of the vitamin K-dependent coagulation factors has been proposed (Irwin, 1986) and is shown in Figure 28. From amino acid homology data (Walz et al., 1986) and common active site codon usage (Irwin, 1986), it is evident that the vitamin K-dependent coagulation factors belong to a separate lineage within the serine protease family. Based on amino acid identity and gene organization, the blood coagulation proteases arose by duplication from the ancestor of the pancreatic protease zymogens more than one billion years ago (Young et al., 1978). The two groups comprising the factor X-like genes and prothrombin appear to have diverged over 600 million years ago (Young et al., 1978).
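Divergence times such as the 600-million-year and one-billion-year estimates above are inferred from the fraction of aligned amino acid positions that differ between two sequences. A minimal sketch of a generic molecular-clock calculation follows. It assumes a Poisson correction for multiple substitutions at one site, d = -ln(1 - p), and a constant rate r of substitutions per site per year acting on both lineages, so that T = d / (2r). This textbook approximation is offered for illustration only; it is not necessarily the procedure used by Young et al. (1978), and the function names and input values are hypothetical.

```python
import math


def poisson_distance(p_diff: float) -> float:
    """Poisson-corrected substitutions per site from the observed proportion
    of differing aligned amino acid positions (0 <= p_diff < 1)."""
    if not 0.0 <= p_diff < 1.0:
        raise ValueError("p_diff must be in [0, 1)")
    return -math.log(1.0 - p_diff)


def divergence_time_years(p_diff: float, rate_per_site_per_year: float) -> float:
    """Time since two sequences split, assuming the same constant rate on both
    lineages (hence the factor of 2)."""
    return poisson_distance(p_diff) / (2.0 * rate_per_site_per_year)


if __name__ == "__main__":
    # Hypothetical inputs: 60% of aligned positions differ; rate 1e-9 per site per year.
    t = divergence_time_years(0.60, 1e-9)
    print(f"{t / 1e6:.0f} million years")
```

With these hypothetical inputs the script prints `458 million years`. Because d is divided by the assumed rate, the estimate scales inversely with r, so the choice of calibration dominates the result.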
Analysis of the active site serine codon indicates that a minimum of two point mutations were required to generate the present-day vitamin K-dependent serine proteases (Figure 28) (Irwin, 1986). In the vitamin K-dependent serine protease genes, the active site serine is encoded by AGY, compared with the TCN codon used in the trypsinogen-like genes. Branching of the nonproteolytic haptoglobin may have occurred after the first mutation (see Figure 28). A second mutation restored catalytic activity and gave rise to the ancestral vitamin K-dependent protease gene. As no mutations altered the other essential regions of the protease domain, Irwin suggested that the two mutations in the codon for the active site serine must have occurred within a relatively short period of time. Distribution of the structural domains shared by the factor X-like genes may have been facilitated by exon shuffling (Gilbert, 1978, 1979) of discrete functional exon units (Neurath, 1985). All of the vitamin K-dependent coagulation factors contain prepro leaders and Gla regions (Figure 21). The interdependent function of these regions (Jorgenson et al., 1987; Suttie et al., 1987) and the exon organization argue that the two domains were inherited as one unit, although the pre peptide may predate the acquisition of the pro-Gla region (Irwin, 1986). A prothrombin molecule containing a Gla domain has been identified in lamprey; therefore, the duplication of the Gla-protease gene that gave rise to the factor X-like and prothrombin genes probably occurred more than 450 million years ago (see Irwin, 1986). Subsequently, the prothrombin gene acquired the two kringles, and the factor X-like genes acquired the epidermal growth factor homologies. In each case, two distinct insertional events probably occurred (Patthy, 1985). In prothrombin, kringle 2 was acquired first. Homology to the third kringle of plasminogen suggests that a duplication of this plasminogen kringle gave rise to kringle 1 in prothrombin (Kurosky et al., 1980). Differences in intron placement within the kringle structures can then be attributed to intron sliding (Craik et al., 1983) following the duplication events. In the factor X-like genes, homology to the analogous domain in the fibrinolytic proteases factor XII, urokinase, and tissue-type plasminogen activator suggests that the beta-hydroxyaspartic acid-containing EGF domain was inserted first (Patthy, 1985). Mispairing and gene conversion may have been responsible for the separate insertion of the second growth factor module (Patthy, 1985). Finally, subsequent amino acid substitution, insertion, and deletion events generated the present-day prothrombin gene (Irwin, 1986).

FIGURE 28: A MODEL FOR THE EVOLUTION OF THE VITAMIN K-DEPENDENT COAGULATION FACTORS
Rectangles represent the serine protease domain, with S for an active serine protease and X for an altered active site serine residue. Triangles with γ represent the leader-Gla domain. Squares represent the kringles and are numbered as in mammalian prothrombins. Circles with E represent the epidermal growth factor homologies. The model was taken from Irwin (1986).
[Figure 28 schematic: evolutionary tree of duplication, point mutation, and gene fusion events leading to trypsinogen and related proteases, haptoglobin, prothrombin, factor VII, factor IX, factor X, protein C, and protein Z; time markers >1 × 10⁹ years and >250 × 10⁶ years.]

Within the factor X-like gene family, the evolutionary pathways are not as well defined.
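Returning to the active site serine codon argument made earlier in this section (AGY in the vitamin K-dependent genes versus TCN in the trypsinogen-like genes), the minimum number of substitutions can be checked directly against the standard genetic code. The following is a minimal Python sketch; the helper names (`hamming`, `min_substitutions`, `single_step_intermediates`) are hypothetical, and it assumes simple point substitutions with no insertions or deletions.

```python
from itertools import product
from typing import Dict, List

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); codons enumerated in the
# order TTT, TTC, TTA, TTG, TCT, ... so each letter below matches one codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE: Dict[str, str] = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

TCN = [c for c in CODE if c.startswith("TC")]                   # serine codons TCN
AGY = [c for c in CODE if c.startswith("AG") and c[2] in "TC"]  # serine codons AGY


def hamming(a: str, b: str) -> int:
    """Number of positions at which two codons differ."""
    return sum(x != y for x, y in zip(a, b))


def min_substitutions(group_a: List[str], group_b: List[str]) -> int:
    """Fewest point substitutions linking any codon in group_a to any in group_b."""
    return min(hamming(a, b) for a in group_a for b in group_b)


def single_step_intermediates(group_a: List[str], group_b: List[str]) -> Dict[str, str]:
    """Codons lying on a two-step path between the groups, mapped to their amino acid."""
    result: Dict[str, str] = {}
    for a in group_a:
        for b in group_b:
            if hamming(a, b) != 2:
                continue
            for c in CODE:
                if hamming(a, c) == 1 and hamming(c, b) == 1:
                    result[c] = CODE[c]
    return result


if __name__ == "__main__":
    print(min_substitutions(TCN, AGY))
    print(sorted(single_step_intermediates(TCN, AGY).items()))
```

Run as a script, it prints `2` and then `[('ACC', 'T'), ('ACT', 'T'), ('TGC', 'C'), ('TGT', 'C')]`: every shortest path between the two serine codon families passes through a threonine (ACY) or cysteine (TGY) codon. This is consistent with, though not proof of, the proposal that the intermediate lacked the active site serine and that catalytic activity was later restored.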
The factor X-like gene family diverged relatively recently, as reflected both in exon organization and in amino acid sequence homology. Factor VII, factor IX, and factor X are present in both birds (Didisheim et al., 1959; Walz et al., 1974) and mammals (Jackson and Nemerson, 1980). Therefore, at least some of the duplication events were completed over 250 million years ago (Figure 28). Further studies in other species will be useful in identifying the evolutionary steps within this gene family.

V. LITERATURE CITED

Anson, D.S., Choo, K.H., Rees, D.J.G., Giannelli, F., Gould, K., Huddleston, J.A., and Brownlee, G.G. (1984). The Gene Structure of Human Anti-Haemophilic Factor IX. EMBO J. 3; 1053-1060.
Aronson, D.L., Mustafa, A.J., and Mushinski, J.F. (1969). Purification of Human Factor X and Comparison of Peptide Maps of Human Factor X and Prothrombin. Biochim. Biophys. Acta 188; 25-30.
Atkinson, T. and Smith, M. (1984). Solid Phase Synthesis of Oligodeoxyribonucleotides By the Phosphite-Triester Method, in Oligonucleotide Synthesis: A Practical Approach (Gait, M.J. Ed.), IRL Press Ltd., Oxford, pp. 35-81.
Aviv, H. and Leder, P. (1972). Purification of Biologically Active Globin Messenger RNA By Chromatography on Oligothymidylic Acid-Cellulose. Proc. Natl. Acad. Sci. USA 69; 1408-1412.
Bach, R., Gentry, R., and Nemerson, Y. (1986). Factor VII Binding to Tissue Factor in Reconstituted Phospholipid Vesicles: Induction of Cooperativity by Phosphatidyl Serine. Biochemistry 25; 4007-4020.
Bahnak, B.R., Howk, R., Morrissey, J.H., Ricca, G.A., Edgington, T.S., Jaye, M.C., Drohan, W.W., and Fair, D.S. (1987). Steady State Levels of Factor X mRNA in Liver and Hep G2 Cells. Blood 69; 224-230.
Bajaj, S.P. and Mann, K.G. (1973). Simultaneous Purification of Bovine Prothrombin and Factor X. J. Biol. Chem. 248; 7729-7741.
Barnes, W.M., Bevan, M., and Son, P.H. (1983). Kilo-Sequencing and Creation of an Ordered Nest of Asymmetric Deletions Across a Large Target Sequence Carried on Phage M13. Meth. Enzymol. 101; 98-122.
Beckmann, R.J., Schmidt, R.J., Santerre, R.F., Plutzky, J., Crabtree, G.R., and Long, G.L. (1985). Structure and Evolution of a 461 Amino Acid Human Protein C Precursor and Its Messenger RNA Based Upon the DNA Sequence of Cloned Liver cDNA. Nucl. Acids Res. 13; 5233-5247.
Bell, G.I., Quinto, C., Quiroga, M., Valenzuela, P., Craik, C.S., and Rutter, W.J. (1984). Isolation and Sequence of a Rat Chymotrypsinogen B Gene. J. Biol. Chem. 259; 14265-14270.
Belt, K.T., Carroll, M.C., and Porter, R.R. (1984). The Structural Basis of the Multiple Forms of Human Complement Component C4. Cell 36; 907-914.
Benoist, C., O'Hare, K., Breathnach, R., and Chambon, P. (1980). The Ovalbumin Gene: Sequence of Putative Control Regions. Nucl. Acids Res. 8; 127-142.
Bentley, A.K., Rees, D.J.G., Rizza, C., and Brownlee, G.G. (1986). Defective Propeptide Processing of Blood Clotting Factor IX Caused By Mutation of Arginine to Glutamine at Position -4. Cell 45; 343-348.
Benton, W.D. and Davis, R.W. (1977). Screening Lambda gt Recombinant Clones By Hybridization In Situ. Science 196; 180-182.
Berget, S.M. (1984). Are U4 Small Nuclear Ribonucleoproteins Involved in Polyadenylation? Nature 309; 179-182.
Berkner, K., Busby, S., Davie, E., Hart, C., Insley, M., Kisiel, W., Kumar, A., Murray, M., O'Hara, P., Woodbury, R., and Hagen, F. (1986). Isolation and Expression of cDNAs Encoding Human Factor VII. Cold Spring Harbor Symp. Quant. Biol. 51; 531-541.
Bertina, R.M. and van der Linden, I.K. (1984). Detection and Classification of Molecular Variants of Factor IX, in The Hemophilias. Methods in Hematology (Bloom, A.L. Ed.), Churchill Livingstone, New York, pp. 151-155.
Birnboim, H.C. and Doly, J. (1979). A Rapid Extraction Procedure For Screening Recombinant Plasmid DNA. Nucl. Acids Res. 7; 1513-1523.
Birnstiel, M.L., Busslinger, M., and Strub, K. (1985). Transcription Termination and 3' Processing: The End is in Site. Cell 41; 349-359.
Blake, C.C.F. (1978). Do Genes-In-Pieces Imply Proteins-In-Pieces? Nature 273; 267.
Blake, C. (1983a). Exons - Present From the Beginning? Nature 306; 535-537.
Blake, C. (1983b). Exons and the Evolution of Proteins. Trends Biochem. Sci. 8; 11-13.
Blattner, F.R., Williams, B.G., Blechl, A.E., Denniston-Thompson, K., Farber, H.E., Furlong, L.A., Grunwald, D.J., Kiefer, D.O., Moore, D.D., Schumm, J.W., Sheldon, E.L., and Smithies, O. (1977). Charon Phages: Safer Derivatives of Bacteriophage Lambda For DNA Cloning. Science 196; 161-169.
Blobel, G., Walter, P., Chang, C.N., Goldman, B.M., Erickson, A.H., and Lingappa, R. (1979). Translocation of Proteins Across Membranes: The Signal Hypothesis and Beyond, in Secretory Mechanisms (Hopkin, C.R. and Duncan, C.J. Eds.), vol. 33, Cambridge Univ. Press, London, pp. 9-36.
Blomquist, M.C., Hunt, L.T., and Barker, W.C. (1984). Vaccinia Virus 19-Kilodalton Protein: Relationship to Several Mammalian Proteins, Including Two Growth Factors. Proc. Natl. Acad. Sci. USA 81; 7363-7367.
Bloom, A.L. (1981). Inherited Disorders of Blood Coagulation, in Haemostasis and Thrombosis (Bloom, A.L. and Thomas, D.P. Eds.), Churchill Livingstone, Edinburgh, pp. 321-370.
Boissy, R. and Astell, C.R. (1985). An Escherichia Coli RecBC sbcB RecF Host Permits The Deletion-Resistant Propagation of Plasmid Clones Containing the 5'-Terminal Palindrome of Minute Virus of Mice. Gene 35; 179-185.
Borowski, M., Furie, B.C., Bauminger, S., and Furie, B. (1986). Prothrombin Requires Two Sequential Metal-Dependent Conformational Transitions to Bind Phospholipid. J. Biol. Chem. 261; 14969-14975.
Breathnach, R. and Chambon, P. (1981). Organization and Expression of Eukaryotic Split Genes Coding For Proteins. Ann. Rev. Biochem. 50; 349-383.
Brown, D.M., Frampton, J., Goelet, P., and Karn, J. (1982). Sensitive Detection of RNA Using Strand-Specific M13 Probes. Gene 20; 139-144.
Broze, G.J. and Miletich, J.P. (1987). Characterization of the Inhibition of Tissue Factor in Serum. Blood 69; 150-155.
Burgess, A.I. and Esnouf, M.P. (1985). Post-Translational Modifications in the Blood Clotting Systems, in The Enzymology of Post-Translational Modification of Proteins (Freedman, R.B. and Hawkins, H.C. Eds.), Vol. 2, Academic Press, London, pp. 299-337.
Busby, S., Berkner, K., Halfpap, L., Gambee, J., and Kumar, A. (1987). Alteration of Propeptide Sequence Impairs Biological Activity of Human Factor VII. Thrombosis and Haemostasis 58; 269.
Butkowski, R.J., Elion, J., Downing, M.R., and Mann, K.G. (1977). Primary Structure of Human Prethrombin 2 and Alpha-Thrombin. J. Biol. Chem. 252; 4942-4957.
Camerino, G., Grzeschik, K.H., Jaye, M., de la Salle, H., Tolstoshev, P., Lecoq, J.P., Heilig, R., and Mandel, J.L. (1984). Regional Localization on the Human X Chromosome and Polymorphism of the Coagulation Factor IX Gene (Haemophilia B Locus). Proc. Natl. Acad. Sci. USA 81; 498-502.
Camerino, G., Oberle, I., Drayna, D., and Mandel, J.-L. (1985). A New Msp I Restriction Length Polymorphism in the Hemophilia B Locus. Hum. Genet. 71; 79-81.
Campbell, R.D., Bentley, D.R., and Morley, B.J. (1984). The Factor B and C2 Genes. Phil. Trans. R. Soc. Lond. B 306; 367-378.
Campbell, R.D. and Porter, R.R. (1983). Molecular Cloning and Characterization of the Gene Coding For Human Complement Protein Factor B. Proc. Natl. Acad. Sci. USA 80; 4464-4468.
Casadaban, M.J. and Cohen, S.N. (1980). Analysis of Gene Control Signals By DNA Fusion and Cloning in Escherichia Coli. J. Mol. Biol. 138; 179-207.
Cech, T.R. (1983). RNA Splicing: Three Themes With Variations. Cell 34; 713-716.
Chaconas, G. and van de Sande, J.H. (1980). 5'-³²P Labeling of RNA and DNA Restriction Enzyme Fragments. Meth. Enzymol. 65; 75-85.
Charnay, P., Treisman, R., Mellon, P., Chao, M., Axel, R., and Maniatis, T. (1984). Differences in Human Alpha and Beta Globin Gene Expression in Mouse Erythroleukemia Cells: The Role of Intragenic Sequences. Cell 38; 251-263.
Chirgwin, J.M., Przybyla, A.E., MacDonald, R.J., and Rutter, W.J. (1979). Isolation of Biologically Active Ribonucleic Acid From Sources Enriched in Ribonuclease. Biochemistry 18; 5294-5299.
Chung, D.W., Fujikawa, K., McMullen, B.A., and Davie, E.W. (1986). Human Plasma Prekallikrein, a Zymogen To a Serine Protease That Contains Four Tandem Repeats. Biochemistry 25; 2410-2417.
Ciliberto, G., Dente, L., and Cortese, R. (1985). Cell-Specific Expression of a Transfected Human Alpha 1-Antitrypsin Gene. Cell 41; 531-540.
Collen, D. (1981). Natural Inhibitors of Haemostasis With Particular Reference to Fibrinolysis, in Haemostasis and Thrombosis (Bloom, A.L. and Thomas, D.P. Eds.), Churchill Livingstone, Edinburgh, pp. 225-235.
Cooke, R.M., Wilkinson, A.J., Baron, M., Pastore, A., Tappin, M.J., Campbell, I.D., Gregory, H., and Sheard, B. (1987). The Solution Structure of Human Epidermal Growth Factor. Nature 327; 339-341.
Cool, D.E., Edgell, C.-J.S., Louie, G.V., Zoller, M.J., Brayer, G.D., and MacGillivray, R.T.A. (1985). Characterization of Human Blood Coagulation Factor XII cDNA. J. Biol. Chem. 260; 12666-13676.
Cool, D.E. and MacGillivray, R.T.A. (1987). Characterization of the Human Blood Coagulation Factor XII Gene. Intron/Exon Gene Organization and Analysis of the 5' Flanking Region. J. Biol. Chem. 262; 13662-13673.
Corden, J., Wasylyk, B., Buchwalder, A., Sassone-Corsi, P., Kedinger, C., and Chambon, P. (1980). Promoter Sequences of Eukaryotic Protein-Coding Genes. Science 209; 1406-1414.
Craik, C.S., Choo, Q.-L., Swift, G.H., Quinto, C., MacDonald, R.J., and Rutter, W.J. (1984). Structure of Two Related Rat Pancreatic Trypsin Genes. J. Biol. Chem. 259; 14255-14264.
Craik, C.S., Rutter, W.J., and Fletterick, R. (1983). Splice Junctions: Association With Variation in Protein Structure. Science 220; 1125-1129.
Craik, C.S., Sprang, S., Fletterick, R., and Rutter, W.J. (1982). Intron-Exon Splice Junctions Map at Protein Surfaces. Nature 299; 180-182.
Curtis, C.G. (1981). Plasma Factor XIII, in Haemostasis and Thrombosis (Bloom, A.L. and Thomas, D.P. Eds.), Churchill Livingstone, Edinburgh, pp. 192-197.
Dagert, M. and Ehrlich, S.D. (1979). Prolonged Incubation in Calcium Chloride Improves the Competence of Escherichia Coli Cells. Gene 6; 23-28.
Dahlback, B., Lundwall, A., Hillarp, A., Malm, J., and Stenflo, J. (1987). Structure and Function of Vitamin K-Dependent Protein S. Thrombosis and Haemostasis 58; 48.
Dahlback, B., Lundwall, A., and Stenflo, J. (1986). Primary Structure of Bovine Vitamin K-Dependent Protein S. Proc. Natl. Acad. Sci. USA 83; 4199-4203.
Dam, H., Schonheyder, F., and Tage-Hansen, E. (1936). Studies on the Mode of Action of Vitamin K. Biochem. J. 30; 1075-1079.
Davie, E.W., Degen, S.J.F., Yoshitake, S., and Kurachi, K. (1983). Cloning of Vitamin K-Dependent Clotting Factors. Dev. Biochem. 25; 45-52.
Davie, E.W., Fujikawa, K., Kurachi, K., and Kisiel, W. (1979). The Role of Serine Proteases in the Blood Coagulation Cascade. Adv. Enzymol. 48; 277-318.
Davie, E.W. and Ratnoff, O.D. (1964). Waterfall Sequence for Intrinsic Blood Clotting. Science 145; 1310-1312.
de Bruijn, M.H.L. and Fey, G.H. (1985). Human Complement Component C3: cDNA Coding Sequence and Derived Primary Structure. Proc. Natl. Acad. Sci. USA 82; 708-712.
Degen, S.J.F., MacGillivray, R.T.A., and Davie, E.W. (1983). Characterization of the Complementary Deoxyribonucleic Acid and Gene Coding For Human Prothrombin. Biochemistry 22; 2087-2097.
Degen, S.J.F., Rajput, B., Reich, E., and Davie, E.W. (1985). Coagulation and Fibrinolysis: Characterization of the Human Prothrombin and Tissue Plasminogen Activator Genes, in Protides of the Biological Fluids (Peeters, H. Ed.), Vol. 33, Pergamon Press, Oxford, pp. 47-50.
de Grouchy, J., Dautzenberg, M.D., Turleau, C., Beguin, S., and Chavin-Colin, F. (1984). Regional Mapping of Clotting Factors VII and X to 13q34: Expression of Factor VII Through Chromosome 8. Hum. Genet. 66; 230-233.
Deininger, P.L. (1983). Random Subcloning of Sonicated DNA: Application to Shotgun DNA Sequence Analysis. Anal. Biochem. 129; 216-223.
Delany, A.D. (1982). A DNA Sequence Handling Program. Nucl. Acids Res. 10; 61-67.
Didisheim, P., Hattori, K., and Lewis, J.H. (1959). Hematologic Coagulation Studies in Various Animal Species. J. Lab. Clin. Med. 53; 866-875.
Diuguid, D.L., Rabiet, M.J., Furie, B.C., Liebman, H.A., and Furie, B. (1986). Molecular Basis of Hemophilia B: A Defective Enzyme Due to an Unprocessed Propeptide is Caused By a Point Mutation In the Factor IX Precursor. Proc. Natl. Acad. Sci. USA 83; 5803-5807.
DiScipio, R.G., Hermodson, M.A., Yates, S.G., and Davie, E.W. (1977). A Comparison of Human Prothrombin, Factor IX (Christmas Factor), Factor X (Stuart Factor), and Protein S. Biochemistry 16; 698-706.
Doolittle, R.F. (1979). Protein Evolution, in The Proteins (Neurath, H. and Hill, R.L. Eds.), Vol. 4, Academic Press, New York, pp. 1-118.
Doolittle, R.F. (1981a). Fibrinogen and Fibrin. Scientific American 245; 126-135.
Doolittle, R.F. (1981b). Fibrinogen and Fibrin, in Haemostasis and Thrombosis (Bloom, A.L. and Thomas, D.P. Eds.), Churchill Livingstone, Edinburgh, pp. 163-191.
Doolittle, R.F. (1985). The Genealogy of Some Recently Evolved Vertebrate Proteins. Trends Biochem. Sci. 10; 233-237.
Doolittle, R.F., Feng, D.F., and Johnson, M.S. (1984). Computer-Based Characterization of Epidermal Growth Factor Precursor. Nature 307; 558-560.
Dupe, R. and Howell, R. (1973). The Purification and Properties of Factor X From Pig Serum and Its Role in Hypercoagulability In Vivo. Biochem. J. 133; 311-321.
Dush, M.K., Sikela, J.M., Kahn, S.A., Tischfield, J.A., and Stambrook, P.J. (1985). Nucleotide Sequence and Organization of the Mouse Adenine Phosphoribosyltransferase Gene: Presence of a Coding Region Common to Animal and Bacterial Phosphoribosyltransferases That Has a Variable Intron/Exon Arrangement. Proc. Natl. Acad. Sci. USA 82; 2731-2735.
Edenbrandt, C.M., Gershagen, S., Fernlund, R., Wydro, R., Stenflo, J., and Lundwall, A. (1987). Gene Structure of Vitamin K-Dependent Protein S; A Region Homologous to Sex Hormone Binding Globulin (SHBG) Replaces the Serine Protease Region of Factors IX, X, and Protein C. Thrombosis and Haemostasis 58; 497.
Edgell, M.H., Hardies, S.C., Brown, B., Voliva, C., Hill, A., Phillips, S., Comer, M., Burton, F., Weaver, S., and Hutchison III, C.A. (1983). Evolution of the Mouse Gamma Globin Complex Loci, in Evolution of Genes and Proteins (Nei, M. and Koehn, R.K. Eds.), Sinauer Associates Inc., Sunderland, Mass., pp. 1-13.
Edmonds, M., Vaughn, M.H., and Nakazato, H. (1971). Polyadenylic Acid Sequences in the Heterogeneous Nuclear RNA and Rapidly-Labeled Polyribosomal RNA of HeLa Cells: Possible Evidence For a Precursor Relationship. Proc. Natl. Acad. Sci. USA 68; 1336-1340.
Edwards, J.H., Jonasson, J.A., and Blackwell, N.L. (1984). Locus For Cystic Fibrosis. Lancet 1; 1020.
Enfield, D.L., Ericsson, L.H., Fujikawa, K., Walsh, K.A., Neurath, H., and Titani, K. (1980). Amino Acid Sequence of the Light Chain of Bovine Factor X1 (Stuart Factor). Biochemistry 19; 659-667.
Esmon, C.T. (1987). The Regulation of Natural Anticoagulant Pathways. Science 235; 1348-1352.
Esmon, N.L., DeBault, L.E., and Esmon, C.T. (1983). Proteolytic Formation and Properties of Gamma-Carboxyglutamic Acid-Domainless Protein C. J. Biol. Chem. 258; 5548-5553.
Esmon, C.T., Owen, W.G., and Jackson, C.M. (1974). A Plausible Mechanism For Prothrombin Activation By Factor Xa, Factor Va, Phospholipid, and Calcium Ions. J. Biol. Chem. 249; 8045-8047.
Esnouf, M.P., Lloyd, P.H., and Jesty, J. (1973). A Method For the Simultaneous Isolation of Factor X and Prothrombin. Biochem. J. 131; 781-789.
Evans, B.A. and Richards, R.I. (1985). The Genes For the Alpha and Gamma Subunits of Mouse Nerve Growth Factor Are Contiguous. EMBO J. 4; 133-138.
Fair, D.S. and Bahnak, B.R. (1984). Human Hepatoma Cells Secrete Single Chain Factor X, Prothrombin, and Antithrombin III. Blood 64; 194-204.
Fernlund, P. and Stenflo, J. (1982). Amino Acid Sequence of the Light Chain of Bovine Protein C. J. Biol. Chem. 257; 12170-12179.
Fernlund, P. and Stenflo, J. (1983). Beta-Hydroxyaspartic Acid in Vitamin K-Dependent Proteins. J. Biol. Chem. 258; 12509-12512.
Feinberg, A.P. and Vogelstein, B. (1983). A Technique For Radiolabeling DNA Restriction Endonuclease Fragments to High Specific Activity. Anal. Biochem. 132; 6-13.
Fiddes, J.C. and Goodman, H.M. (1980). The cDNA For the Beta-Subunit of Human Chorionic Gonadotropin Suggests Evolution of a Gene By Readthrough Into the 3'-Untranslated Region. Nature 286; 684-687.
Fisher, R., Waller, E.K., Grossi, G., Thompson, D., Tizard, R., and Schleuning, W.-D. (1985). Isolation and Characterization of the Tissue-Type Plasminogen Activator Structural Gene Including Its 5' Flanking Region. J. Biol. Chem. 260; 11223-11230.
Foster, D.C. and Davie, E.W. (1984). Characterization of a cDNA Coding For Human Protein C. Proc. Natl. Acad. Sci. USA 81; 4766-4770.
Foster, D., Schach, B., Rudinsky, M., Berkner, K., Kumar, A., Sprecher, C., Hagen, F., and Davie, E.W. (1987). The Effect of Changes in the Leader Sequence of Human Protein C on Biosynthetic Processing and Gamma-Carboxylation. Thrombosis and Haemostasis 58; 230, 330.
Foster, D.C., Yoshitake, S., and Davie, E.W. (1985). The Nucleotide Sequence of the Gene For Human Protein C. Proc. Natl. Acad. Sci. USA 82; 4673-4677.
Friezner-Degen, S.J., Rajput, B., and Reich, E. (1986). The Human Tissue Plasminogen Activator Gene. J. Biol. Chem. 261; 6972-6985.
Frischauf, A.M., Lehrach, H., Poustka, A., and Murray, N. (1983). Lambda Replacement Vectors Carrying Polylinker Sequences. J. Mol. Biol. 170; 827-842.
Fujikawa, K., Chung, D.W., Hendrickson, L.E., and Davie, E.W. (1986). Amino Acid Sequence of Human Factor XI, a Blood Coagulation Factor With Four Tandem Repeats That Are Highly Homologous With Plasma Prekallikrein. Biochemistry 25; 2417-2424.
Fujikawa, K., Coan, M.H., Legaz, M.E., and Davie, E.W. (1974). The Mechanism of Activation of Bovine Factor X (Stuart Factor) By Intrinsic and Extrinsic Pathways. Biochemistry 13; 5290-5299.
Fujikawa, K., Legaz, M.E., and Davie, E.W. (1972a). Bovine Factors X1 and X2 (Stuart Factor): Isolation and Characterization. Biochemistry 11; 4882-4891.
Fujikawa, K., Legaz, M.E., and Davie, E.W. (1972b). Bovine Factor X1 (Stuart Factor): Mechanism of Activation By a Protein From Russell's Viper Venom. Biochemistry 11; 4892-4899.
Fujikawa, K., Titani, K., and Davie, E.W. (1975). Activation of Bovine Factor X (Stuart Factor): Conversion of Factor Xa Alpha to Factor Xa Beta. Proc. Natl. Acad. Sci. USA 72; 3359-3363.
Fung, M.R., Campbell, R.M., and MacGillivray, R.T.A. (1984). Blood Coagulation Factor X mRNA Encodes a Single Polypeptide Containing a Pre-Pro Leader Sequence. Nucl. Acids Res. 12; 4481-4492.
Fung, M.R., Hay, C.W., and MacGillivray, R.T.A. (1985). Characterization of an Almost Full-Length cDNA For Human Blood Coagulation Factor X. Proc. Natl. Acad. Sci. USA 82; 3591-3595.
Furie, B., Bing, D.H., Feldmann, R.J., Robison, D.J., Burnier, J.P., and Furie, B.C. (1982). Computer-Generated Models of Blood Coagulation Factor Xa, Factor IXa, and Thrombin Based Upon Structural Homology With Other Serine Proteases. J. Biol. Chem. 257; 3875-3882.
Gaffney, P.J. (1981). The Fibrinolytic System, in Haemostasis and Thrombosis (Bloom, A.L. and Thomas, D.P. Eds.), Churchill Livingstone, Edinburgh, pp. 198-224.
Giannelli, F., Choo, K.H., Winship, P.R., Rizza, C.R., Anson, D.S., Rees, D.J.G., Forrari, N., and Brownlee, G.G. (1984). Characterization and Use of an Intragenic Polymorphic Marker For Detection of Carriers of Haemophilia (Factor IX Deficiency). Lancet 1; 239-241.
Gil, A. and Proudfoot, N.J. (1984). A Sequence Downstream of AAUAAA is Required For Rabbit Beta-Globin mRNA 3' End Formation. Nature 312; 473-474.
Gilbert, W. (1978). Why Genes in Pieces? Nature 271; 501.
Gilbert, W. (1979). Introns and Exons: Playgrounds of Evolution, in Eukaryotic Gene Regulation (Axel, R., Maniatis, T., and Fox, C.F. Eds.), Academic Press, New York, pp. 1-12.
Gilbert, W. (1985). Genes-In-Pieces Revisited. Science 228; 823-824.
Gitschier, J., Drayna, D., Tuddenham, E.G.D., White, R.L., and Lawn, R.M. (1985). Genetic Mapping and Diagnosis of Haemophilia A Achieved Through a Bcl I Polymorphism in the Factor VIII Gene. Nature 314; 736-740.
Gitschier, J.G., Wood, W.I., Goralka, T.M., Wion, K.L., Chen, E.Y., Eaton, D.H., Vehar, G.A., Capon, D.J., and Lawn, R.M. (1984). Characterization of the Human Factor VIII Gene. Nature 312; 326-330.
Gordon, J.I., Budelier, K.A., Sims, H.F., Edelstein, C., Scanu, A.M., and Strauss, A.W. (1983). Biosynthesis of Human Preproapolipoprotein A-II. J. Biol. Chem. 258; 14054-14059.
Gordon, J.I., Sims, H.F., Edelstein, C., Scanu, A.M., and Strauss, A.W. (1984). Human Proapolipoprotein A-II is Cleaved Following Secretion By Hep G2 Cells By a Thiol Protease. J. Biol. Chem. 259; 15556-15563.
Gorski, K., Carneiro, M., and Schibler, U. (1986). Tissue-Specific In Vitro Transcription From the Mouse Albumin Promoter. Cell 47; 767-776.
Grabowski, P.J., Padgett, R.A., and Sharp, P.A. (1984). Messenger RNA Splicing In Vitro: An Excised Intervening Sequence and a Potential Intermediate. Cell 37; 415-427.
Grabowski, P.J., Seiler, S.R., and Sharp, P.A. (1985). A Multicomponent Complex is Involved in the Splicing of Messenger RNA Precursors. Cell 42; 345-353.
Graves, C.B., Munns, T.W., Willingham, A.K., and Strauss, A.W. (1982). Rat Factor X is Synthesized as a Single Chain Precursor Inducible By Prothrombin Fragments. J. Biol. Chem. 257; 13108-13113.
Griffin, J.H. (1978). The Role of Surface in the Surface-Dependent Activation of Hageman Factor (Factor XII). Proc. Natl. Acad. Sci. USA 75; 1998-2002.
Griffin, J.H. (1981). The Contact Phase of Blood Coagulation, in Haemostasis and Thrombosis (Bloom, A.L. and Thomas, D.P. Eds.), Churchill Livingstone, Edinburgh, pp. 84-97.
Griffin, J.H. and Cochrane, C.G. (1976). Mechanisms For Involvement of High Molecular Weight Kininogen in Surface-Dependent Reactions of Hageman Factor (Coagulation Factor XII). Proc. Natl. Acad. Sci. USA 73; 2554-2558.
Grunstein, M. and Hogness, D.S. (1975). Colony Hybridization: A Method For the Isolation of Cloned DNAs That Contain a Specific Gene. Proc. Natl. Acad. Sci. USA 72; 3961-3965.
Guyton, A.C. (1981). Basic Human Physiology: Normal Function and Mechanisms of Disease, 6th Edn., W.B. Saunders, Philadelphia.
Hagen, F.S., Gray, C.L., O'Hara, P., Grant, F.J., Saari, G.C., Woodbury, R.G., Hart, C.E., Insley, M., Kisiel, W., Kurachi, K., and Davie, E.W. (1986). Characterization of a cDNA Coding For Human Factor VII. Proc. Natl. Acad. Sci. USA 83; 2412-2416.
Hall, C.E. and Slayter, H.S. (1959). The Fibrinogen Molecule: Its Size, Shape, and Mode of Polymerization. J. Biophys. Biochem. Cytol. 5; 11-15.
Hall, L., Craig, R.K., Edbrooke, M.R., and Campbell, P.N. (1982). Comparison of the Nucleotide Sequence of Cloned Human and Guinea-Pig Pre-Alpha-Lactalbumin cDNA With That of Chicken Pre-Lysozyme cDNA Suggests Evolution From a Common Ancestral Gene. Nucl. Acids Res. 10; 3503-3515.
Hanahan, D. and Meselson, M. (1980). Plasmid Screening at High Colony Density. Gene 10; 63-67.
Hay, C.W., Robertson, K.A., Fung, M.R., and MacGillivray, R.T.A. (1986). RFLPs For Pst I and Eco RI in the Blood Clotting Factor X Gene. Nucl. Acids Res. 14; 5118.
Hay, C.W., Robertson, K.A., Yong, S.-L., Thompson, A.R., Growe, G.H., and MacGillivray, R.T.A. (1985). Use of a Bam HI Polymorphism in the Factor IX Gene For the Determination of Hemophilia B Carrier Status. Blood 67; 1508-1511.
Hewett-Emmett, D., Czelusniak, J., and Goodman, M. (1981). The Evolutionary Relationships of the Enzymes in Blood Coagulation and Haemostasis. Ann. N.Y. Acad. Sci. 370; 511-527.
Hojrup, P., Jensen, M.S., and Petersen, T.E. (1985). Amino Acid Sequence of Bovine Protein Z: A Vitamin K-Dependent Serine Protease Homolog. FEBS Lett. 184; 333-338.
Honey, N.K., Sakaguchi, A.K., MacDonald, R.J., Bell, G.I., Craik, C., Rutter, W.J., and Naylor, S.L. (1984). Chromosomal Assignments of Human Genes For Serine Proteases Trypsin, Chymotrypsin, and Elastase. Somat. Cell Molec. Genet. 10; 369-376.
Hood, L., Kronenberg, M., and Hunkapiller, T. (1985). T Cell Antigen Receptor and Immunoglobulin Supergene Family. Cell 40; 225-229.
Hoskins, J., Norman, D.K., Beckmann, R.J., and Long, G.L. (1987). Cloning and Characterization of Human Liver cDNA Encoding a Protein S Precursor. Proc. Natl. Acad. Sci. USA 84; 349-353.
Hougie, C., Barrow, E.M., and Graham, J.B. (1957). Stuart Clotting Defect I: Segregation of a Hereditary Hemorrhagic State From the Heterogeneous Group Heretofore Called 'Stable Factor' (SPCA, Proconvertin, Factor VII) Deficiency. J. Clin. Invest. 36; 485-496.
Hu, N.T. and Messing, J. (1982). The Making of Strand-Specific M13 Probes. Gene 17; 271-277.
Hultin, M.B. and Nemerson, Y. (1978). Activation of Factor X by Factors IXa and VIII: A Specific Assay for Factor IXa in the Presence of Thrombin-Activated Factor VIII. Blood 52; 928-940.
Irwin, D.M. (1986). The Structure and Evolution of the Bovine Prothrombin Gene. Ph.D. Diss. University of British Columbia, Vancouver.
Irwin, D.M., Ahern, K.G., Pearson, G.D., and MacGillivray, R.T.A. (1985). Characterization of the Bovine Prothrombin Gene. Biochemistry 24; 6854-6861.
Jackson, C.M. (1972). Characterization of Two Glycoprotein Variants of Bovine Factor X and Demonstration That the Factor X Zymogen Contains Two Polypeptide Chains. Biochemistry 11; 4873-4882.
Jackson, C.M. (1981). Biochemistry of Prothrombin Activation, in Haemostasis and Thrombosis (Bloom, A.L. and Thomas, D.P. Eds.), Churchill Livingstone, Edinburgh, pp. 140-162.
Jackson, C.M. (1984). Factor X, in Progress in Hemostasis and Thrombosis (Spaet, T.H. Ed.), Vol. 7, Grune and Stratton, Orlando, pp. 55-109.
Jackson, C.M. and Brenkle, G.M. (1980). Divalent Ion Binding to Bovine Prothrombin Fragment 1 and Its Consequences, in The Regulation of Coagulation (Mann, K.G. and Taylor, F.B. Eds.), Elsevier/North-Holland, Amsterdam, pp. 11-18.
Jackson, C.M. and Hanahan, D.J. (1968). Studies on Bovine Factor X. II. Observations on Some Alterations in Zone Electrophoretic and Chromatographic Behaviour Occurring During Purification. Biochemistry 7; 4506-4517.
Jackson, C.M. and Nemerson, Y. (1980). Blood Coagulation. Ann. Rev. Biochem. 49; 765-811.
Jaye, M., de la Salle, H., Schamber, F., Balland, A., Kohli, V., Findeli, A., Tolstoshev, P., and Lecocq, J.P. (1983). Isolation of Anti-Haemophilic Factor IX cDNA Using a Unique 52-Base Synthetic Oligonucleotide Probe Deduced From the Amino Acid Sequence of Bovine Factor IX. Nucl. Acids Res. 11; 2325-2335.
Jaye, M., Ricca, G., Kaplan, R., Howk, R., Mudd, R., Ngo, K.Y., Fair, D., and Drohan, W. (1985). Polymorphism Associated With the Human Coagulation Factor X (F10) Gene. Nucl. Acids Res. 13; 8286.
Jesty, J. (1978). The Inhibition of Activated Bovine Coagulation Factors X and VII By Antithrombin III. Arch. Biochem. Biophys. 185; 165-173.
Jesty, J. and Esnouf, M.P. (1973). The Preparation of Activated Factor X and Its Action on Prothrombin. Biochem. J. 131; 791-799.
Jesty, J. and Nemerson, Y. (1974). Purification of Factor VII From Bovine Plasma. Reaction With Tissue Factor and Activation of Factor X. J. Biol. Chem. 249; 509-515.
Jesty, J., Spencer, A.K., and Nemerson, Y. (1974). The Mechanism of Activation of Factor X. J. Biol. Chem. 249; 5614-5622.
Jorgensen, M.J., Cantor, M.A., Furie, B.C., Brown, C.L., Shoemaker, C.B., and Furie, B. (1987). Recognition Site Directing Vitamin K-Dependent Gamma-Carboxylation Resides on the Propeptide of Factor IX. Cell 48; 185-191.
Katayama, K., Ericsson, L.H., Enfield, D.L., Walsh, K.A., Neurath, H., Davie, E.W., and Titani, K. (1979). Comparison of Amino Acid Sequence of Bovine Coagulation Factor IX (Christmas Factor) With That of Other Vitamin K-Dependent Plasma Proteins. Proc. Natl. Acad. Sci. USA 76; 4990-4994.
Katz, L., Kingsbury, D.T., and Helinski, D.R. (1973). Stimulation By Cyclic Adenosine Monophosphate of Plasmid Deoxyribonucleic Acid Replication and Catabolite Repression of the Plasmid Deoxyribonucleic Acid-Protein Relaxation Complex. J. Bacteriol. 114; 577-591.
Katz, L., Williams, P.H., Sato, S., Leavitt, R.W., and Helinski, D.R. (1977). Purification and Characterization of Covalently Closed Replicative Intermediates of ColE1 DNA From Escherichia coli. Biochemistry 16; 1677-1683.
Kaul, R.K., Hildebrand, B., Roberts, S., and Jagadeeswaran, P. (1986). Isolation and Characterization of Human Blood-Coagulation Factor X cDNA. Gene 41; 311-314.
Keller, E.B. and Noon, W.A. (1984). Intron Splicing: A Conserved Internal Signal in Introns of Animal Pre-mRNAs. Proc. Natl. Acad. Sci. USA 81; 7417-7420.
Kisiel, W., Fujikawa, K., and Davie, E.W. (1977). Activation of Bovine Factor VII (Proconvertin) by Factor XIIa (Activated Hageman Factor). Biochemistry 16; 4189-4194.
Knowlton, R.G., Cohen-Haguenauer, O., van Cong, N., Frezal, J., Brown, V.A., Barker, D., Braman, J.C., Schumm, J.W., Tsui, L.-C., Buchwald, M., and Donis-Keller, H. (1985). A Polymorphic DNA Marker Linked to Cystic Fibrosis is Located on Chromosome 7. Nature 318; 380-382.
Kornblihtt, A.R., Vibe-Pedersen, K., and Baralle, F.E. (1983). Isolation and Characterization of cDNA Clones For Human and Bovine Fibronectins. Proc. Natl. Acad. Sci. USA 80; 3218-3222.
Kosow, D.P. (1976). Purification and Activation of Human Factor X: Cooperative Effect of Ca++ on the Activation Reaction. Thromb. Res. 9; 565-573.
Kozak, M. (1984). Compilation and Analysis of Sequences Upstream From the Translational Start Site in Eukaryotic mRNA. Nucl. Acids Res. 12; 857-872.
Kraut, J. (1977). Serine Proteases: Structure and Mechanism of Catalysis. Ann. Rev. Biochem. 46; 331-358.
Kurachi, K. and Davie, E.W. (1982). Isolation and Characterization of a cDNA Coding For Human Factor IX. Proc. Natl. Acad. Sci. USA 79; 6461-6464.
Kurachi, K., Fujikawa, K., Schmer, G., and Davie, E.W. (1976). Inhibition of Bovine Factor IXa and Factor Xabeta By Antithrombin III. Biochemistry 15; 373-377.
Kurosky, A., Barnett, D.R., Lee, T.-H., Touchstone, B., Hay, R.E., Arnott, M.S., Bowman, B.H., and Fitch, W.M. (1980). Covalent Structure of Human Haptoglobin: A Serine Protease Homolog. Proc. Natl. Acad. Sci. USA 77; 3388-3392.
Laskey, R.A. and Mills, A.D. (1977). Enhanced Autoradiographic Detection of 32P and 125I Using Intensifying Screens and Hypersensitized Film. FEBS Lett. 82; 314-316.
Lawn, R.M. (1985). The Molecular Genetics of Hemophilia: Blood Clotting Factors VIII and IX. Cell 42; 405-406.
Lawn, R.M., Fritsch, E.F., Parker, R.C., Blake, G., and Maniatis, T. (1978). The Isolation and Characterization of Linked Delta- and Beta-Globin Genes From a Cloned Library of Human DNA. Cell 15; 1157-1174.
Lawn, R.M. and Vehar, G.A. (1986). The Molecular Genetics of Haemophilia. Scientific American 254; 48-54.
Leach, D.R.F. and Stahl, F. (1983). Viability of Lambda Phages Carrying a Perfect Palindrome in the Absence of Recombination Nucleases. Nature 305; 448-451.
Leytus, S.P., Chung, D.W., Kisiel, W., Kurachi, K., and Davie, E.W. (1984). Characterization of a cDNA Coding For Human Factor X. Proc. Natl. Acad. Sci. USA 81; 3699-3702.
Leytus, S.P., Foster, D.C., Kurachi, K., and Davie, E.W. (1986). Gene For Factor X: A Blood Coagulation Factor Whose Gene Organization is Essentially Identical With That of Factor IX and Protein C. Biochemistry 25; 5098-5102.
Li, W.-H. (1983). Evolution of Duplicated Genes and Pseudogenes, in Evolution of Genes and Proteins (MacIntyre, R.J. Ed.), Plenum Press, New York, pp. 1-94.
Li, W.-H., Luo, C.-C., and Wu, C.-I. (1985). Evolution of DNA Sequence, in Molecular Evolutionary Genetics (MacIntyre, R.J. Ed.), Plenum Press, New York, pp. 1-94.
Long, G.L., Belagaje, R.M., and MacGillivray, R.T.A. (1984). Cloning and Sequencing of Liver cDNA Coding For Bovine Protein C. Proc. Natl. Acad. Sci. USA 81; 5653-5656.
Lundwall, A., Dackowski, W., Cohen, E., Shaffer, M., Mahr, A., Dahlback, B., Stenflo, J., and Wydro, R. (1986). Isolation and Sequence of the cDNA For Human Protein S, a Regulator of Blood Coagulation. Proc. Natl. Acad. Sci. USA 83; 6716-6720.
MacFarlane, R.G. (1964). An Enzyme Cascade in the Blood Clotting Mechanism and Its Function as a Biological Amplifier. Nature 202; 498-499.
MacGillivray, R.T.A. and Davie, E.W. (1984). Characterization of Bovine Prothrombin mRNA and Its Translation Product. Biochemistry 23; 1626-1634.
MacGillivray, R.T.A., Irwin, D.M., Guinto, E.R., and Stone, J.C. (1986). Recombinant Genetic Approaches to Functional Mapping of Thrombin. Ann. N.Y. Acad. Sci. 485; 73-79.
Maeda, N., Yang, F., Barnett, D.R., Bowman, B.H., and Smithies, O. (1984). Duplication Within the Haptoglobin Hp2 Gene. Nature 309; 131-135.
Magnusson, S., Petersen, T.E., Sottrup-Jensen, L., and Claeys, H. (1975). Complete Primary Structure of Prothrombin: Isolation, Structure, and Reactivity of Ten Carboxylated Glutamic Acid Residues and Regulation of Prothrombin Activation By Thrombin, in Proteases and Biological Control (Reich, E., Rifkin, D.B., and Shaw, E. Eds.), Cold Spring Harbor Laboratories, Cold Spring Harbor, New York, pp. 123-149.
Malinowski, D.P., Sadler, J.E., and Davie, E.W. (1984). Characterization of a Complementary Deoxyribonucleic Acid Coding For Human and Bovine Plasminogen. Biochemistry 23; 4243-4250.
Mandle, R. and Kaplan, A.P. (1977). Hageman Factor Substrates. Human Plasma Prekallikrein: Mechanism of Activation By Hageman Factor and Participation in Hageman Factor-Dependent Fibrinolysis. J. Biol. Chem. 252; 6097-6104.
Maniatis, T., Fritsch, E.F., and Sambrook, J. (1982). Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratories, Cold Spring Harbor, New York.
Maniatis, T., Jeffrey, A., and Kleid, D.G. (1975). Nucleotide Sequence of the Rightward Operator of Phage Lambda. Proc. Natl. Acad. Sci. USA 72; 1184-1188.
Marciniak, E. (1973). Factor Xa Inactivation By Antithrombin III: Evidence For Biological Stabilization of Factor Xa By Factor V-Phospholipid Complex. Br. J. Haematol. 24; 391-400.
Margolis, J. (1957). Initiation of Blood Coagulation By Glass and Related Surfaces. J. Physiol. 137; 95-109.
Mason, A.J., Evans, B.A., Cox, D.R., Shine, J., and Richards, R.I. (1983). Structure of Mouse Kallikrein Gene Family Suggests a Role in Specific Processing of Biologically Active Peptides. Nature 303; 300-307.
Masys, D.R., Bajaj, S.P., and Rapaport, S.I. (1982). Activation of Human Factor VII by Activated Factors IX and X. Blood 60; 1143-1150.
Mattock, P. and Esnouf, M.P. (1973). A Form of Bovine Factor X With a Single Polypeptide Chain. Nature New Biol. 242; 90-92.
Maxam, A. and Gilbert, W. (1980). Sequencing End-Labeled DNA With Base-Specific Chemical Cleavages. Meth. Enzymol. 65; 499-560.
McGraw, R.A., Davis, L.M., Noyes, C.M., Lundblad, R.L., Roberts, H.R., Graham, J.B., and Stafford, D.W. (1985). Evidence For a Prevalent Dimorphism in the Activation Peptide of Human Coagulation Factor IX. Proc. Natl. Acad. Sci. USA 82; 2847-2851.
McKnight, S.L. and Kingsbury, R. (1982). Transcriptional Control Signals of a Eukaryotic Protein-Coding Gene. Science 217; 316-324.
McLachlan, A.D. (1979). Gene Duplication in the Structural Evolution of Chymotrypsin. J. Mol. Biol. 128; 49-79.
McLean, J., Fielding, C., Drayna, D., Dieplinger, H., Baer, B., Kohr, W., Henzel, W., and Lawn, R. (1986). Cloning and Expression of Human Lecithin-Cholesterol Acyltransferase cDNA. Proc. Natl. Acad. Sci. USA 83; 2335-2339.
McMullen, B.A. and Fujikawa, K. (1985). Amino Acid Sequence of the Heavy Chain of Human Alpha-Factor XIIa (Activated Hageman Factor). J. Biol. Chem. 260; 5328-5341.
McMullen, B.A., Fujikawa, K., Kisiel, W., Sasagawa, T., Howald, W.N., Kwa, E.Y., and Weinstein, B. (1983a). Complete Amino Acid Sequence of the Light Chain of Human Blood Coagulation Factor X: Evidence for Identification of Residue 63 as Beta-Hydroxyaspartic Acid. Biochemistry 22; 2875-2884.
McMullen, B.A., Fujikawa, K., and Kisiel, W. (1983b). The Occurrence of Beta-Hydroxyaspartic Acid in the Vitamin K-Dependent Blood Coagulation Zymogens. Biochem. Biophys. Res. Commun. 115; 8-14.
Messing, J. (1983). New M13 Vectors For Cloning. Meth. Enzymol. 101; 20-78.
Messing, J., Crea, R., and Seeburg, P.H. (1981). A System For Shotgun DNA Sequencing. Nucl. Acids Res. 9; 309-321.
Miletich, J.P., Jackson, C.M., and Majerus, P.W. (1978). Properties of the Factor Xa Binding Site on Human Platelets. J. Biol. Chem. 253; 6908-6916.
Mizuno, K. and Matsuo, H. (1984). A Novel Protease From Yeast With Specificity Towards Paired Basic Residues. Nature 309; 558-560.
Montell, C., Fisher, E.E., Caruthers, M.H., and Berk, A.J. (1983). Inhibition of RNA Cleavage But Not Polyadenylation by a Point Mutation in mRNA Consensus Sequence AAUAAA. Nature 305; 600-608.
Morita, T. and Jackson, C.M. (1979). Bovine Factor X1 and X2: Activation Peptide Based Chromatography Differences, in Vitamin K Metabolism and Vitamin K Dependent Proteins (Suttie, J.W. Ed.), Univ. Park Press, Baltimore, Maryland, pp. 120-133.
Mount, S.M. (1982). A Catalogue of Splice Junction Sequences. Nucl. Acids Res. 10; 459-472.
Nagamine, Y., Pearson, D., and Grattan, M. (1984). Exon-Intron Boundary Sliding in the Generation of Two mRNAs Coding For Porcine Urokinase-Like Plasminogen Activator. Biochem. Biophys. Res. Commun. 132; 563-569.
Naora, H. and Deacon, N.J. (1982). Relationship Between the Total Size of Exons and Introns in Protein-Coding Genes of Higher Eukaryotes. Proc. Natl. Acad. Sci. USA 79; 6196-6200.
Nemerson, Y., Zur, M., Bach, R., and Gentry, R.D. (1980). The Mechanism of Activation of Tissue Factor: A Provisional Model, in The Regulation of Coagulation (Mann, K.G. and Taylor, F.B. Eds.), Elsevier/North-Holland, New York, pp. 193-203.
Nesheim, M.E. and Mann, K.G. (1979). Thrombin-Catalyzed Activation of Single Chain Factor V. J. Biol. Chem. 254; 1326-1334.
Neurath, H. (1984). Evolution of Proteolytic Enzymes. Science 224; 350-357.
Neurath, H. (1985). Proteolytic Enzymes, Past and Present. Fed. Proc. 44; 2907-2913.
Neurath, H. and Walsh, K.A. (1976). The Role of Proteases in Biological Regulation, in Proteolysis and Physiological Regulation (Ribbons, D.W. and Brew, K. Eds.), Academic Press, New York, pp. 29-42.
Nevins, J.R. (1983). The Pathway of Eukaryotic mRNA Formation. Ann. Rev. Biochem. 52; 441-466.
Nikaido, T., Nakai, S., and Honjo, T. (1981). Switch Region of Immunoglobulin Cμ Gene Is Composed of Simple Tandem Repetitive Sequences. Nature 292; 845-848.
Ny, T., Elgh, F., and Lund, B. (1984). The Structure of the Human Tissue-Type Plasminogen Activator Gene: Correlation of Intron and Exon Structures to Functional and Structural Domains. Proc. Natl. Acad. Sci. USA 81; 5355-5359.
O'Hara, P.J., Grant, F.J., Haldeman, B.A., Gray, C.L., Insley, M.Y., and Murray, M.J. (1987). Nucleotide Sequence of the Gene Coding For Human Factor VII, a Vitamin K-Dependent Protein Participating in Blood Coagulation. Proc. Natl. Acad. Sci. USA 84; 5158-5162.
Ohlin, A.-K. and Stenflo, J. (1987). High Affinity Calcium Binding to Domains of Protein C That Are Homologous to the Epidermal Growth Factor. Thrombosis and Haemostasis 58; 230.
Ott, R. and Pfeiffer, R.A. (1984). Evidence That Activities of Coagulation Factors VII and X Are Linked to Chromosome 13(q34). Hum. Hered. 34; 123-136.
Owen, W.G. and Esmon, C.T. (1981). Functional Properties of an Endothelial Cell Cofactor For Thrombin-Catalyzed Activation of Protein C. J. Biol. Chem. 256; 5532-5535.
Pan, L.C. and Price, P.A. (1985). The Propeptide of Rat Bone Gamma-Carboxyglutamic Acid Protein Shares Homology With Other Vitamin K-Dependent Protein Precursors. Proc. Natl. Acad. Sci. USA 82; 6109-6113.
Pan, L.C., Williamson, M.K., and Price, P.A. (1985). Sequence of the Precursor to Rat Bone Gamma-Carboxyglutamic Acid Protein That Accumulates in Warfarin-Treated Osteosarcoma Cells. J. Biol. Chem. 260; 13398-13401.
Park, C.H. and Tulinsky, A. (1986). Three-Dimensional Structure of the Kringle Sequence: Structure of Prothrombin Fragment 1. Biochemistry 25; 3977-3982.
Patthy, L. (1985). Evolution of the Proteases of Blood Coagulation and Fibrinolysis By Assembly From Modules. Cell 41; 657-663.
Pearson, P.L., van der Kamp, J., and velt Kamp, J. (1982). Reduced Hageman Factor Level in 6p Patient. Cytogenet. Cell Genet. 32; 309.
Pennica, D., Holmes, W.E., Kohr, W.J., Harkins, R.N., Vehar, G.A., Ward, C.A., Bennett, W.F., Yelverton, E., Seeburg, P.H., Heyneker, H.L., Goeddel, D.V., and Collen, D. (1983). Cloning and Expression of Human Tissue-Type Plasminogen Activator cDNA in E. coli. Nature 301; 214-221.
Perlmutter, R.M., Ram, D., and Hood, L. (1984). Chromosomal Translocations and Lymphoid Neoplasia, in Genes and Cancer, UCLA Symposium on Molecular and Cellular Biology, New Series (Bishop, J.M., Rowley, J.D., and Greaves, M. Eds.), Vol. 1, Liss, New York, pp. 489-499.
Perry, R.P. (1976). Processing of RNA. Ann. Rev. Biochem. 45; 605-629.
Petersen, T.E., Thogersen, H.C., Skorstengaard, K., Vibe-Pedersen, K., Sahl, P., Sottrup-Jensen, L., and Magnusson, S. (1983). Partial Primary Structure of Bovine Plasma Fibronectin: Three Types of Internal Homology. Proc. Natl. Acad. Sci. USA 80; 137-141.
Pfeiffer, R.A., Ott, R., Gilgenkrantz, S., and Alexandre, P. (1982). Deficiency of Coagulation Factors VII and X Associated With Deletions of a Chromosome 13(q34). Evidence From Two Cases With 46, XY, t(13;Y)(q11;q34). Hum. Genet. 62; 358-360.
Ploos van Amstel, J.K., van der Zanden, A.L., Bakker, E., Reitsma, P.H., and Bertina, R.M. (1987). Independent Isolation of Human Protein S and the Assignment of the Gene to Chromosome 3. Thrombosis and Haemostasis 58; 497.
Ploplis, V.A., Edgington, T.S., and Fair, D.S. (1987). Initiation of the Extrinsic Pathway of Coagulation. J. Biol. Chem. 262; 9503-9508.
Plutzky, J., Hoskins, J.A., Long, G.L., and Crabtree, G.R. (1986). Evolution and Organization of the Human Protein C Gene. Proc. Natl. Acad. Sci. USA 83; 546-550.
Prochownik, E.V., Markham, A.F., and Orkin, S.H. (1983). Isolation of a cDNA Clone For Human Antithrombin III. J. Biol. Chem. 258; 8389-8394.
Proudfoot, N.J. and Brownlee, G.G. (1976). 3' Non Coding Region Sequences in Eukaryotic Messenger RNA. Nature 263; 211-214.
Proudfoot, N.J. and Longley, J.I. (1976). The 3' Terminal Sequences of Human Alpha and Beta Globin Messenger RNAs: Comparison With Rabbit Globin Messenger RNA. Cell 9; 733-746.
Radcliffe, R., Bagdasarian, A., Colman, R., and Nemerson, Y. (1977). Activation of Bovine Factor VII by Hageman Factor Fragments. Blood 50; 611-617.
Radcliffe, R. and Nemerson, Y. (1975). Activation and Control of Factor VII by Activated Factor X and Thrombin. J. Biol. Chem. 250; 368-395.
Radcliffe, R. and Nemerson, Y. (1976). Mechanism of Activation of Bovine Factor VII. J. Biol. Chem. 251; 4797-4802.
Rajput, B., Degen, S.J.F., Reich, E., Waller, E.K., Axelrod, J., Eddy, R.L., and Shows, T.B. (1985). Chromosomal Localizations of Human Tissue Plasminogen Activator and Urokinase Genes. Science 230; 672-674.
Rao, L.V.M. and Rapaport, S.I. (1987). Studies of a Mechanism Inhibiting the Initiation of the Extrinsic Pathway of Coagulation. Blood 69; 645-651.
Redman, C.M., Avellino, G., and Yu, S. (1983). Secretion of Proalbumin By Canavanine-Treated Hep G2 Cells. J. Biol. Chem. 258; 3446-3452.
Reisner, A.H. (1985). Similarity Between the Vaccinia Virus 19K Early Protein and Epidermal Growth Factor. Nature 313; 801-803.
Reudelhuber, T. (1984). Upstream and Downstream Control of Eukaryotic Genes. Nature 312; 700-701.
Riccio, A., Grimaldi, G., Verde, P., Sebastio, G., Boast, S., and Blasi, F. (1985). The Human Urokinase-Plasminogen Activator Gene and Its Promoter. Nucl. Acids Res. 13; 2759-2771.
Rocchi, M., Roncuzzi, L., Santamaria, R., Sparra, D., Mochi, M., Archidiacono, N., Covone, A., Cortese, R., and Romeo, G. (1985). Mapping of Coagulation Factor Protein C and Factor X on Chromosomes 2 and 13, Respectively. Cytogenet. Cell Genet. 40; 734-735.
Rogers, J. (1985). Exon Shuffling and Intron Invasion in Serine Protease Genes. Nature 315; 458-459.
Rosenberg, J.S., Beeler, D.L., and Rosenberg, R.D. (1975). Activation of Human Prothrombin By Highly Purified Human Factors V and Xa in the Presence of Human Antithrombin. J. Biol. Chem. 250; 1607-1617.
Royle, N.J., Fung, M.R., MacGillivray, R.T.A., and Hamerton, J.L. (1986). The Gene For Clotting Factor 10 is Mapped to 13q32-qter. Cytogenet. Cell Genet. 41; 185-188.
Royle, N.J., Irwin, D.M., Koschinsky, M., MacGillivray, R.T.A., and Hamerton, J.L. (1987). Human Genes Encoding Prothrombin and Ceruloplasmin Mapped to 11p11-p12 and 3q21-q24, Respectively. Somat. Cell. Mol. Genet. 13; 285-292.
Sadler, J.E., Malinowski, D.P., and Davie, E.W. (1985). Cloning and Structural Characterization of the Gene For Human Plasminogen, in Progress in Fibrinolysis (Davidson, J.F., Donati, M.B., and Coccheri, S. Eds.), Vol. VII, Churchill Livingstone, Edinburgh, pp. 201-204.
Sakata, Y. and Aoki, N. (1980). Molecular Abnormality of Plasminogen. J. Biol. Chem. 255; 5442-5447.
Sanger, F. and Coulson, A.R. (1978). The Use of Thin Acrylamide Gels for DNA Sequencing. FEBS Lett. 87; 107-110.
Sanger, F., Nicklen, S., and Coulson, A.R. (1977). DNA Sequencing With Chain-Terminating Inhibitors. Proc. Natl. Acad. Sci. USA 74; 5463-5467.
Scambler, P.J., Wainwright, B.J., MacGillivray, R.T.A., Fung, M.R., and Williamson, R. (1986). Exclusion of Human Chromosome 13q34 as the Site of the Cystic Fibrosis Mutation. Am. J. Hum. Genet. 38; 567-572.
Scambler, P.J. and Williamson, R. (1985). The Structural Gene For Human Coagulation Factor X is Located on Chromosome 13q34. Cytogenet. Cell Genet. 39; 231-233.
Schroeder, W.A., Shelton, J.B., and Shelton, J.R. (1969). An Examination of Conditions For the Cleavage of Polypeptide Chains With Cyanogen Bromide: Application to Catalase. Arch. Biochem. Biophys. 130; 551-556.
Seed, B. (1983). Purification of Genomic Sequences From Bacteriophage Libraries By Recombination and Selection In Vivo. Nucl. Acids Res. 11; 2427-2445.
Sharp, P.A. (1981). Speculations on RNA Splicing. Cell 23; 643-646.
Shatkin, A.J. (1985). mRNA Cap Binding Proteins: Essential Factors For Initiating Translation. Cell 40; 223-224.
Sottrup-Jensen, L., Claeys, H., Zajdel, M., Petersen, T.E., and Magnusson, S. (1978). The Primary Structure of Human Plasminogen: Isolation of Two Lysine-Binding Fragments and One 'Mini' Plasminogen (MW 38,000) By Elastase-Catalyzed Specific Limited Proteolysis, in Progress in Chemical Fibrinolysis and Thrombolysis (Davidson, J.F., Rowan, R.M., Samama, M.M., and Desnoyers, P.C. Eds.), Vol. 3, Raven Press, New York, pp. 191-209.
Southern, E.M. (1975). Detection of a Specific Sequence Among DNA Fragments Separated By Gel Electrophoresis. J. Mol. Biol. 98; 503-517.
Spicer, E.K., Horton, R., Bloom, L., Bach, R., Williams, K.R., Guha, A., Kraus, J., Lin, T.-C., Nemerson, Y., and Konigsberg, W.H. (1987). Isolation of cDNA Clones Coding For Human Tissue Factor: Primary Structure of the Protein and cDNA. Proc. Natl. Acad. Sci. USA 84; 5148-5152.
Staden, R. (1982). Automation of the Computer Handling of Gel Reading Data Produced By the Shotgun Method of DNA Sequencing. Nucl. Acids Res. 10; 4731-4751.
Steiner, D.F., Quinn, P.S., Chan, S.J., Marsh, J., and Tager, H.S. (1980). Processing Mechanism in the Biosynthesis of Proteins. Ann. N.Y. Acad. Sci. 343; 1-16.
Stenflo, J. and Fernlund, P. (1982). Amino Acid Sequence of the Heavy Chain of Bovine Protein C. J. Biol. Chem. 257; 12180-12190.
Stenflo, J., Fernlund, P., Egan, W., and Roepstorff, P. (1974). Vitamin K-Dependent Modifications of Glutamic Acid Residues in Prothrombin. Proc. Natl. Acad. Sci. USA 71; 2730-2733.
Stenflo, J., Lundwall, A., and Dahlback, B. (1987). Beta-Hydroxyasparagine in Domains Homologous to the Epidermal Growth Factor Precursor in Vitamin K-Dependent Protein S. Proc. Natl. Acad. Sci. USA 84; 368-372.
Stenflo, J., Ohlin, A.-K., Lundwall, A., and Dahlback, B. (1987). Beta-Hydroxyaspartic Acid and Beta-Hydroxyasparagine in the EGF-Homology Regions of Protein C and Protein S. Thrombosis and Haemostasis 58; 331.
Stenflo, J. and Suttie, J.W. (1977). Vitamin K-Dependent Formation of Gamma-Carboxyglutamic Acid. Ann. Rev. Biochem. 46; 157-172.
Strauss, A.W., Bennett, C.A., Donohue, A.M., Rodkey, J.A., Boime, I., and Alberts, A.W. (1978). Conversion of Rat Pre-Proalbumin to Proalbumin In Vitro By Ascites Membranes. J. Biol. Chem. 253; 6270-6274.
Stroud, R.M. (1974). A Family of Protein-Cutting Proteins. Scientific American 231; 74-88.
Sudhof, T.C., Goldstein, J.L., Brown, M.S., and Russell, D.W. (1985a). The LDL Receptor Gene: A Mosaic of Exons Shared With Different Proteins. Science 228; 815-822.
Sudhof, T.C., Russell, D.W., Goldstein, J.L., Brown, M.S., Sanchez-Pescador, R., and Bell, G.I. (1985b). Cassette of Eight Exons Shared By Genes For LDL Receptor and EGF Precursor. Science 228; 893-895.
Sugo, T., Bjork, I., Holmgren, A., and Stenflo, J. (1984). Calcium-Binding Properties of Bovine Factor X Lacking the Gamma-Carboxyglutamic Acid-Containing Region. J. Biol. Chem. 259; 5705-5710.
Suttie, J.W. (1985). Vitamin K-Dependent Carboxylase. Ann. Rev. Biochem. 54; 459-477.
Suttie, J.W., Hoskins, J.A., Engelke, J., Hopfgartner, A., Ehrlich, H., Bang, N.U., Belagaje, R.M., Schoner, B., and Long, G.L. (1987). Vitamin K-Dependent Carboxylase: Possible Role of the Substrate 'Propeptide' As an Intracellular Recognition Site. Proc. Natl. Acad. Sci. USA 84; 634-637.
Swanson, C.J. and Suttie, J.W. (1985). Prothrombin Biosynthesis: Characterization of Processing Events in Rat Liver Microsomes. Biochemistry 24; 3890-3897.
Swanstrom, R. and Shank, P.R. (1978). X-Ray Intensifying Screens Greatly Enhance the Detection By Autoradiography of the Radioactive Isotopes 32P and 125I. Anal. Biochem. 86; 184-192.
Swift, G.H., Craik, C.S., Stary, S.J., Quinto, C., Lahaie, R.G., Rutter, W.J., and MacDonald, R.J. (1984). Structure of the Two Related Elastase Genes Expressed in the Rat Pancreas. J. Biol. Chem. 259; 14271-14278.
Taub, R.A., Hollis, G.F., Hieter, P.A., Korsmeyer, S., Waldmann, T.A., and Leder, P. (1983). Variable Amplification of Immunoglobulin Lambda Light-Chain Genes in Human Populations. Nature 304; 172-174.
Telfer, T.P., Denson, K.W., and Wright, D.R. (1956). A 'New' Coagulation Defect. Brit. J. Haemat. 2; 308-316.
Thompson, A.R. (1986). Structure, Function, and Molecular Defects of Factor IX. Blood 67; 565-572.
Titani, K., Fujikawa, K., Enfield, D.L., Ericsson, L.H., Walsh, K.A., and Neurath, H. (1975). Bovine Factor X1 (Stuart Factor): Amino-Acid Sequence of Heavy Chain. Proc. Natl. Acad. Sci. USA 72; 3082-3086.
Toole, J.J., Knopf, J.L., Wozney, J.M., Sultzman, L.A., Buecker, J.L., Pittman, D.D., Kaufman, R.J., Brown, E., Shoemaker, C., Orr, E.C., Amphlett, G.W., Foster, W.B., Coe, M.L., Knutson, G.J., Fass, D.N., and Hewick, R.M. (1984). Molecular Cloning of a cDNA Encoding Human Antihaemophilic Factor. Nature 312; 342-347.
Ullrich, A., Berman, C.H., Dull, T.J., Gray, A., and Lee, J.M. (1984). Isolation of the Human Insulin-Like Growth Factor I Gene Using a Single Synthetic DNA Probe. EMBO J. 3; 361-364.
van Leeuwen, B.H., Evans, B.A., Tregear, G.W., and Richards, R.I. (1986). Mouse Glandular Kallikrein Genes: Identification, Structure, and Expression of the Renal Kallikrein Gene. J. Biol. Chem. 261; 5529-5535.
van Santen, V.L. and Spritz, R.A. (1985). mRNA Precursor Splicing In Vivo: Sequence Requirements Determined By Deletion Analysis of an Intervening Sequence. Proc. Natl. Acad. Sci. USA 82; 2885-2889.
van Zonneveld, A.-J., Veerman, H., and Pannekoek, H. (1986). Autonomous Functions of Structural Domains on Human Tissue-Type Plasminogen Activator. Proc. Natl. Acad. Sci. USA 83; 4670-4674.
Vehar, G.A., Keyt, B., Eaton, D., Rodriguez, H., O'Brian, D.P., Rotblat, F., Oppermann, H., Keck, R., Wood, W.I., Harkins, R.N., Tuddenham, E.G.D., Lawn, R.M., and Capon, D.J. (1984). Structure of Human Factor VIII. Nature 312; 337-342.
Verde, P., Stoppelli, M.P., Galeffi, P., Di Nocera, P., and Blasi, F. (1984). Identification and Primary Sequence of an Unspliced Human Urokinase Poly (A)+ RNA. Proc. Natl. Acad. Sci. USA 81; 4727-4731.
Vician, L. and Tishkoff, G.H. (1976). Purification of Human Blood Clotting Factor X By Blue Dextran Agarose Affinity Chromatography. Biochim. Biophys. Acta 434; 199-208.
Vieira, J. and Messing, J. (1982). The pUC Plasmids, an M13mp7 Derived System For Insertion Mutagenesis and Sequencing With Synthetic Universal Primers. Gene 19; 259-268.
Vogelstein, B. and Gillespie, D. (1979). Preparative and Analytical Purification of DNA From Agarose. Proc. Natl. Acad. Sci. USA 76; 615-619.
von Heijne, G. (1983). Patterns of Amino Acids Near Signal-Sequence Cleavage Sites. Eur. J. Biochem. 133; 17-21.
von Heijne, G. (1985). Signal Sequences: The Limits of Variation. J. Mol. Biol. 184; 99-105.
Wainwright, B.J., Scambler, P.J., Schmidtke, J., Watson, E.A., Law, H.-Y., Farrall, M., Cooke, H.J., Eiberg, H., and Williamson, R. (1985). Localization of Cystic Fibrosis Locus to Human Chromosome 7cen-q22. Nature 318; 384-385.
Walz, D.A., Hewett-Emmett, D., and Guillin, M.-C. (1986). Amino Acid Sequence and Molecular Homology of the Vitamin K-Dependent Clotting Factors, in Prothrombin and Other Vitamin K Proteins (Seegers, W.H. and Walz, D.A. Eds.), Vol. 1, CRC Press, Boca Raton, Florida, pp. 125-160.
Walz, D.A., Hewett-Emmett, D., and Seegers, W.H. (1977). Amino Acid Sequence of Human Prothrombin Fragments 1 and 2. Proc. Natl. Acad. Sci. USA 74; 1969-1972.
Walz, D.A., Kipfer, R.K., Jones, J.P., and Olsen, R.E. (1974). Purification and Properties of Chicken Prothrombin. Arch. Biochem. Biophys. 164; 527-535.
Watson, M.E.E. (1984). Compilation of Published Signal Sequences. Nucl. Acids Res. 12; 5145-5164.
White, R., Woodward, S., Leppert, M., O'Connell, P., Hoff, M., Herbst, J., Lalouel, J.-M., Dean, M., and Vande Woude, G. (1985). A Closely Linked Genetic Marker For Cystic Fibrosis. Nature 318; 382-384.
Wickens, M. and Stephenson, P. (1984). Role of the Conserved AAUAAA Sequence: Four AAUAAA Point Mutants Prevent Messenger RNA 3' End Formation. Science 226; 1045-1051.
Wieringa, B., Hofer, E., and Weissmann, C. (1984). A Minimal Intron Length But No Specific Internal Sequence is Required For Splicing the Large Rabbit Beta-Globin Intron. Cell 37; 915-925.
Willingham, A.K. and Matschiner, J.T. (1984). Functional Characterization of a Single-Chain Factor X From Rat Liver. Arch. Biochem. Biophys. 230; 543-552.
Wilson, A.C. (1985). The Molecular Basis of Evolution. Scientific American 253; 164-173.
Wilson, A.C., Carlson, S.S., and White, T.J. (1977). Biochemical Evolution. Ann. Rev. Biochem. 46; 573-639.
Winship, P.R., Anson, D.S., Rizza, C.R., and Brownlee, G.G. (1984). Carrier Detection in Hemophilia B Using Two Further Intragenic Restriction Fragment Length Polymorphisms. Nucl. Acids Res. 12; 8861-8872.
Wion, K.L., Tuddenham, E.G.D., and Lawn, R.M. (1986). A New Polymorphism in the Factor VIII Gene For Prenatal Diagnosis of Haemophilia A. Nucl. Acids Res. 14; 4535-4542.
Wood, W.I., Capon, D.J., Simonsen, C.C., Eaton, D.L., Gitschier, J., Keyt, B., Seeburg, P.H., Smith, D.H., Hollingshead, P., Wion, K.L., Delwart, E., Tuddenham, E.G.D., Vehar, G.A., and Lawn, R.M. (1984). Expression of Active Human Factor VIII From Recombinant DNA Clones. Nature 312; 330-337.
Wright, S., Rosenthal, A., Flavell, R., and Grosveld, F. (1984). DNA Sequences Required For Regulated Expression of Beta-Globin Genes in Murine Erythroleukemia Cells. Cell 38; 265-273.
Wyman, A.R., Wolfe, L.B., and Botstein, D. (1985). Propagation of Some Human DNA Sequences in Bacteriophage Lambda Vectors Requires Mutant Escherichia coli Hosts. Proc. Natl. Acad. Sci. USA 82; 2880-2884.
Yang-Feng, T.L., Opdenakker, G., Volckaert, G., and Francke, U. (1986). Mapping of the Human Tissue-Type Plasminogen Activator (PLAT) Gene to Chromosome 8 Region 8p11.2-p21. Somat. Cell Molec. Genet. 12; 95-100.
Yoshitake, S., Schach, B.G., Foster, D.C., Davie, E.W., and Kurachi, K. (1985). Nucleotide Sequence of the Gene For Human Factor IX (Antihemophilic Factor B). Biochemistry 24; 3736-3750.
Young, C.L., Barker, W.C., Tomaselli, C.M., and Dayhoff, M.O. (1978). Serine Proteases, in Atlas of Protein Sequence and Structure (Dayhoff, M.O. Ed.), Vol. 5 (Suppl. 3), National Biomedical Research Foundation, Silver Spring, Maryland, pp. 73-93.
Young, R.A. and Davis, R.W. (1983a). Efficient Isolation of Genes By Using Antibody Probes. Proc. Natl. Acad. Sci. USA 80; 1194-1198.
Young, R.A. and Davis, R.W. (1983b). Yeast RNA Polymerase II Genes: Isolation With Antibody Probes. Science 222; 778-782.
Zuckerhandl, E. and Pauling, L. (1965).
Evolutionary Divergence and Convergence in Plasma Proteins, in Evolving Genes and Proteins (Bryson, V. and Vogel, H.J. Eds.), Academic Press, New York, pp. 97-166.
Zur, M. and Nemerson, Y. (1981). Tissue Factor Pathways of Blood Coagulation, in Haemostasis and Thrombosis (Bloom, A.L. and Thomas, D.P. Eds.), Churchill Livingstone, Edinburgh, pp. 124-139.

VI. APPENDIX I

SCREENING BY RECOMBINATION: PRINCIPLE FOR SELECTION

In this procedure, the DNA sequence to be used as a probe (the 5' region of the factor X cDNA) is cloned into a small plasmid, piAN7, that carries a selectable marker, the tyrosine tRNA amber-suppressor gene (supF) (see Figure 29; Seed, 1983). Bacteria of strain MC1061/P3, which carry no suppressor (Sup°) and contain the plasmid P3 (Kan^r, Tet^am, Amp^am), are transformed with the recombinant plasmid. Once transformed, piAN7 is maintained in MC1061/P3 by selecting for suppression of the amber mutations in both the ampicillin and the tetracycline resistance genes of P3. The recombinant piAN7 transformants (Kan^r, Tet^r, Amp^r) are then infected with a library of recombinant lambda phage constructed in a vector that carries amber mutations in at least two of its essential genes; for example, libraries constructed in lambda Charon 4A (which has amber mutations in its A and B genes) can be screened by this method. If the DNA cloned into an infecting phage shares sequence homology with the factor X cDNA fragment inserted into piAN7, homologous recombination can occur, inserting the entire piAN7 plasmid into the phage genome. Unlike the parent phage, the resulting recombinant phage can grow in Sup° cells because of the supF gene contributed by piAN7. Recombination frequencies greater than 10"^ are above reversion frequencies and are attributed to homologous recombination (Seed, 1983).
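The selection principle above reduces to a small truth table: a phage with amber mutations in essential genes grows only when an amber suppressor is available, and MC1061/P3 expresses its tetracycline and ampicillin resistance only when supF is present. The Python sketch below is an illustrative simplification, not part of Seed's (1983) procedure or of the experiments in this thesis. It assumes amber mutations are completely lethal without suppression, ignores reversion and recombination kinetics, and uses hypothetical names throughout (Phage, plates_on, p3_host_resistances, recombination_frequency, exceeds_reversion). The recombination frequency is estimated here, as an assumption, as plaques on Sup° cells divided by total plaques, and the reversion frequency is left as a user-supplied parameter because no value is assumed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phage:
    """Toy genotype of a library phage (hypothetical model, not a real API)."""
    amber_essential_genes: frozenset  # e.g. frozenset({"A", "B"}) for Charon 4A
    carries_supF: bool                # True once a piAN7 copy has recombined in


def plates_on(phage: Phage, host_has_suppressor: bool) -> bool:
    """Amber mutations in essential genes block growth unless an amber
    suppressor is supplied by the host or carried on the phage itself."""
    if not phage.amber_essential_genes:
        return True
    return host_has_suppressor or phage.carries_supF


def p3_host_resistances(has_piAN7: bool) -> set:
    """MC1061/P3: kanamycin resistance is always expressed; the tet and amp
    genes on P3 carry amber mutations and need supF (from piAN7) to work."""
    resistances = {"kan"}
    if has_piAN7:
        resistances |= {"tet", "amp"}
    return resistances


def recombination_frequency(plaques_on_sup0: int, total_plaques: int) -> float:
    """Assumed estimate: fraction of phage that can plate on Sup° cells."""
    if total_plaques <= 0:
        raise ValueError("total_plaques must be positive")
    return plaques_on_sup0 / total_plaques


def exceeds_reversion(frequency: float, reversion_frequency: float) -> bool:
    """Frequencies above the background reversion frequency are attributed
    to homologous recombination; the threshold must be supplied."""
    return frequency > reversion_frequency


if __name__ == "__main__":
    parent = Phage(frozenset({"A", "B"}), carries_supF=False)
    recombinant = Phage(frozenset({"A", "B"}), carries_supF=True)
    print(plates_on(parent, host_has_suppressor=False))       # False
    print(plates_on(recombinant, host_has_suppressor=False))  # True
    print(sorted(p3_host_resistances(has_piAN7=True)))        # ['amp', 'kan', 'tet']
```

In this model the parent library phage fails to plate on Sup° cells, whereas a phage that has picked up supF by recombination with piAN7 succeeds; that asymmetry is what makes the screen a positive selection rather than a hybridization search.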
FIGURE 29: SCREEN OF PHAGE LIBRARY USING RECOMBINATION WITH THE VECTOR piAN7
A schematic representation of the principle for selection by recombination (modified from Maniatis et al., 1982).
[Figure 29 schematic: a factor X DNA sequence cloned in piAN7 is introduced into MC1061/P3 (Kan^r Tet^s Amp^s); a library constructed in a bacteriophage lambda vector carrying amber mutations in the A and B genes is used for infection; recombination between homologous eukaryotic DNA sequences cloned in piAN7 and in the bacteriophage vector yields recombinant lambda phage carrying Aam, Bam, supF and human DNA sequences homologous to those cloned in piAN7, which grow on Sup° cells; library phage carrying only Aam and Bam do not grow on Sup° cells.]