Open Collections

UBC Theses and Dissertations

The structure and evolution of the bovine prothrombin gene Irwin, David Michael 1986

Full Text

THE STRUCTURE AND EVOLUTION OF THE BOVINE PROTHROMBIN GENE
by
DAVID MICHAEL IRWIN
B.Sc.(Hons.), University of Guelph, 1982

A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY
in
THE FACULTY OF GRADUATE STUDIES
Department of Biochemistry (Genetics Programme)

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
December 1986
© David Michael Irwin, 1986

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the Head of my Department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Biochemistry
The University of British Columbia
2075 Wesbrook Place
Vancouver, Canada V6T 1W5

Date: 16 December 1986

Abstract

The gene for bovine prothrombin is 15.6 Kbp in length and encodes an mRNA of 2025 nucleotides plus a poly(A) tail. The prothrombin gene is composed of 14 exons separated by 13 introns, all of which vary in size. The positions of the introns found within the prothrombin gene provide some insight into the evolution of prothrombin and provide evidence on the origin of introns. Within the activation peptide and leader sequence of precursor prothrombin, some of the introns appear to separate structural and functional protein domains. Introns are found to separate certain domains, including the pre-peptide, the pro-peptide and Gla region, and each of the kringles.
This organization of exons may reflect the evolution of the prothrombin gene as the result of the fusion of exon(s) containing protein domains by exon shuffling. The activation peptide appears to be constructed from four domains: a pre-peptide, a pro-peptide and Gla region, and two kringles. On comparison of the exon organization of the serine protease domain of prothrombin and other serine protease genes, it was found that none of the introns of the prothrombin gene are shared with any of the other serine protease genes. This absence of shared introns is in contrast to the shared introns found for the shared domains of the activation peptide and leader. The positions of the introns of the serine protease domain of serine protease genes do not appear to reflect the evolution of the serine protease from protein domains, but are rather the result of intron insertion into the serine protease coding regions. Intron insertion would also explain the origin of the few introns of the activation peptide that do not appear to separate protein domains. In conclusion, the organization of the exons and introns of the gene for prothrombin reflects both the origin of introns by insertion events and the use of introns in exon shuffling. The insertion of introns, and the subsequent possibility of exon shuffling, appear to have been essential for the evolution of multidomain proteins, such as prothrombin, which are essential for vertebrate life.

Table of Contents

Abstract
Table of Contents
List of Tables
List of Figures
Acknowledgement
List of Abbreviations
Introduction
  A. Physiology of Blood Coagulation
    1. Hemostasis
    2. Discovery of Coagulation Factors
    3. Post-Translational Modifications
    4. Initiation of Blood Coagulation
    5. Non-Enzymatic Functions of Thrombin
  B. Biochemistry of Blood Coagulation
    1. Fibrin Clot Formation
    2. Coagulation Factors as Zymogens
    3. Two Pathways of Blood Coagulation
  C. Structure of the Prothrombin Molecule
    1. Structure of Plasma Prothrombin
    2. Post-Translational Modifications
    3. Precursor Prothrombin
    4. Gamma Carboxyglutamic Acid Domain
    5. Kringle Domain
    6. Thrombin Domain
    7. Three Dimensional Structure
  D. Functions of Prothrombin
    1. Action on Fibrinogen
    2. Other Enzymatic Functions
    3. Non-Enzymatic Functions
  E. Blood Coagulation in Non-Mammals
    1. Blood Coagulation in the Vertebrates
    2. Blood Coagulation in the Invertebrates
  F. Structure of Serine Proteases
    1. Three Dimensional Structure
    2. Limited Substrate Specificity of the Coagulation Factors
  G. Homologies Within Serine Protease Zymogens
    1. Families of Serine Proteases
    2. Roles of Serine Proteases in Physiology
    3. Homologous Domains Within the Activation Peptide of Prothrombin
    4. Homologous Domains Found in Serine Protease Zymogens Other than Prothrombin
  H. Structure of Eukaryotic Structural Genes
    1. The Gene
    2. Exons
    3. Introns
    4. Promoters
    5. Transcription and Processing
  I. Evolution of Amino Acid and DNA Sequence
    1. Molecular Clock
    2. Gene Duplications
  J. Evolution of the Structure of Proteins and Genes
    1. Internal Duplications Within a Gene
    2. Gene Fusions
  K. Function of Introns
    1. Distribution
    2. Position of Introns Within Genes and Proteins
    3. Intron Sliding
  L. Origin of Introns
    1. Metabolic Enzymes
    2. The Triose-Phosphate Isomerase Gene
    3. Intron Mobility
    4. Models of Intron Origin
  M. Serine Protease Genes
    1. Sequence of Serine Proteases
    2. Genes for Serine Proteases
  N. The Evolution of the Serine Protease Genes
Materials and Methods
  A. Materials
  B. Strains, Vectors, and Media
    1. Bacterial Strains
    2. Vectors
    3. Media
  C. Basic Molecular Biology Techniques
  D. Isolation of DNA
    1. Isolation of Plasmid DNA
    2. Isolation of Phage DNA
    3. Genomic DNA Isolation
  E. DNA Subcloning
    1. Production of DNA Fragments for Ligation
    2. Ligation of DNA into pUC13 or M13 Vectors
    3. Transformation of DNA into Bacteria
  F. Isolation of RNA
    1. Isolation of Total Cellular RNA
    2. Isolation of Poly A+ RNA
  G. Labeling of DNA
    1. Nick Translation
    2. Klenow Labeling
  H. Blot Hybridization
    1. Genomic Southern Blot Analysis
    2. Southern Blot Analysis to Detect Repetitive DNA
    3. Northern Blot Analysis
  I. DNA Sequence Analysis
    1. Construction of M13 Clones
    2. Screening of M13 Clones
    3. M13 DNA Isolation
    4. DNA Sequencing
    5. Computer Analysis of DNA Sequence Data
  J. Heteroduplex Analysis
  K. Screening Phage Libraries
    1. Plating Phage Libraries
    2. Screening of Phage Filters
  L. Screening Plasmid Libraries
  M. Mapping the End of a mRNA Transcript
    1. Nuclease S1 Mapping
    2. Primer Extension
Results
  A. Isolation of the Bovine Prothrombin Gene
    1. Southern Blot Analysis of the Bovine Prothrombin Gene
    2. Cloning of the Bovine Prothrombin Gene
    3. Analysis of the Size of the Bovine Prothrombin mRNA
  B. Heteroduplex Mapping
    1. Method
    2. Exons and Introns
    3. Repetitive DNA
  C. DNA Sequence Analysis of the Bovine Prothrombin Gene
  D. Mapping the Site of mRNA Initiation
    1. Nuclease S1 Analysis
    2. Primer Extension
  E. Mapping Repetitive DNA
  F. Isolation of a Human Prothrombin cDNA
  G. Partial DNA Sequence of pIIH13
  H. Isolation of the Human Prothrombin Gene
    1. Isolation of Genomic Clones
    2. Partial DNA Sequence Analysis of the Human Prothrombin Gene
  I. Isolation of cDNA Clones for Chicken Prothrombin
    1. Conditions of Screening
    2. DNA Sequence of pCIII
  J. Isolation of Longer Chicken Prothrombin cDNAs
  K. Size Analysis of Chicken Prothrombin mRNA
Discussion
  A. Characterization of the Bovine Prothrombin Gene
    1. Isolation of the Bovine Prothrombin Gene
    2. Size Analysis of the Bovine Prothrombin mRNA
    3. Sequence of the Bovine Prothrombin Gene
    4. Site of mRNA Initiation
    5. Intron Positions in the Coding Region
  B. Characterization of a Human Prothrombin cDNA
  C. Characterization of cDNAs for Chicken Prothrombin
    1. Sequence of the Chicken Prothrombin cDNAs
    2. Alternative Sites of Polyadenylylation
  D. Comparison of Prothrombin Sequences
    1. Conserved Sequences
    2. Deletions/Insertions
    3. mRNA Structure
  E. Comparison of the Bovine and Human Prothrombin Genes
  F. Comparison of Serine Protease Genes
    1. Leader and Gla Region
    2. Kringle Region
    3. Serine Protease Region
  G. Origin of Introns and Exon Shuffling
    1. Origin of Introns
    2. Exon Shuffling
  H. Evolution of the Active Site Serine Codon
  I. Model of the Evolution of the Vitamin K-Dependent Coagulation Factors
  J. Evolution of the Blood Coagulation System
Literature Cited

List of Tables

I. DNA Sequencing Mixes
II. A Comparison of the Sizes of Exons Determined Both by DNA Sequence Analysis and Heteroduplex Analysis
III. A Comparison of the Sizes of Introns Determined Both by DNA Sequence Analysis and Heteroduplex Analysis
IV. Length and Location of Inverted Repeat Sequences Observed Within the Introns of the Bovine Prothrombin Gene
V. Nucleotide Sequences at the Intron-Exon Junctions of the Bovine Prothrombin Gene
VI. Frequencies of Nucleotides at Intron-Exon Junctions

List of Figures

1. The Blood Coagulation Cascade
2. The Prothrombin Molecule
3. Homologies in Coagulation Factor Zymogens
4. Southern Blot Analysis of the Bovine Prothrombin Gene
5. Restriction Map of the Bovine Prothrombin Gene
6. Northern Blot Analysis of Bovine Prothrombin mRNA
7. Heteroduplex Analysis of the Bovine Prothrombin Gene
8. Partial Restriction Map and Sequencing Strategy for the Bovine Prothrombin Gene
9. Partial DNA Sequence of the Bovine Prothrombin Gene
10. Nuclease S1 Mapping of the Prothrombin mRNA
11. Primer Extension Analysis of Prothrombin mRNA
12. Southern Blot Analysis of Repetitive DNA Within the Bovine Prothrombin Gene
13. Map of Repetitive DNA in the Bovine Prothrombin Gene
14. Restriction Endonuclease Map of the Human Prothrombin cDNAs
15. Nucleotide Sequence of the 5' End of pIIH13
16. Restriction Map of the Human Prothrombin Gene
17. Southern Blot Analysis of the Human Prothrombin Gene
18. DNA Sequence of Chicken Prothrombin cDNAs
19. Restriction Map of Chicken Prothrombin cDNAs
20. Northern Blot Analysis of Chicken Prothrombin mRNA
21. Introns in the Prothrombin Molecule
22. Alignment of the Bovine and Human Prothrombin mRNA Sequences
23. Homologies in the Prothrombin Sequences
24. Comparison of the Organization of the Exons of the Leader Peptide and Gla Domain
25. Comparison of the Organization of the Exons of the Kringle Domain
26. Comparison of the Organization of the Exons of the Serine Protease Domain
27. A Model for the Evolution of the Vitamin K-Dependent Coagulation Factors

Acknowledgement

I would like to thank my supervisor, Dr. Ross MacGillivray, for providing the space and opportunity for me to do this work. I also thank the members of my supervisory committee, Drs. Caroline Astell, Tom Grigliatti, Rob McMaster, and Mike Smith, for their helpful comments and suggestions. I thank Drs. Kevin Ahern and George Pearson of Oregon State University for the heteroduplex analysis of the bovine prothrombin gene, which aided my work with the sequencing of the gene. I thank all the members of the lab, especially Enriqueta Guinto, Marion Fung, Debbie Cool, and Colin Hay, for the many helpful suggestions, comments, methods, and materials. Thank you also to all the members of the Biochemistry department, especially Jeff Leung and Craig Newton, who have made my stay here very enjoyable. I would like to thank Drs. T. Maniatis, F. Rottman, S. Orkin, and T. Kirshgessner for providing genomic and cDNA libraries used to isolate some of the clones described in this thesis. I would like to acknowledge NSERC and the University Graduate Fellowship committee for their financial support.

List of Abbreviations

A       Adenosine
ATP     Adenosine triphosphate
bp      Base Pair(s)
BSA     Bovine Serum Albumin
C       Cytidine
Ca²⁺    Calcium ions
dNTP    Deoxyribonucleoside triphosphate
ddNTP   Dideoxyribonucleoside triphosphate
DNA     Deoxyribonucleic Acid
DNase   Deoxyribonuclease
DTT     Dithiothreitol
EDTA    Ethylenediaminetetraacetic Acid
EtBr    Ethidium Bromide
G       Guanosine
Gla     γ-Carboxyglutamic Acid
GuHCl   Guanidine Hydrochloride
hnRNA   Heterogeneous Nuclear RNA
IPTG    Isopropyl-β-D-Thiogalactopyranoside
Kbp     Kilobase Pair(s)
Krpm    Thousand Revolutions Per Minute
LB      Luria Broth
mA      Milliamps
min     Minute(s)
mRNA    Messenger RNA
N       Any Nucleoside (G, A, T, or C)
OD      Optical Density
pfu     Plaque forming unit
R       Purine (A or G)
RNA     Ribonucleic Acid
RNase   Ribonuclease
rRNA    Ribosomal RNA
TEMED   N,N,N',N'-Tetramethylethylenediamine
Tris    Tris(hydroxymethyl)aminomethane
tRNA    Transfer RNA
U       Uridine
UV      Ultraviolet
V       Volts
T       Thymidine
W       Watts
X-Gal   5-Bromo-4-Chloro-3-Indolyl-β-D-Galactopyranoside
Y       Pyrimidine (T or C)

INTRODUCTION

A. PHYSIOLOGY OF BLOOD COAGULATION

1. Hemostasis

In the vertebrates, a closed circulatory system is essential for nutrient transport, waste removal, hormonal regulation, immune response, and other physiological functions. This closed system of blood vessels (arteries, veins, and capillaries) is prone to injuries which lead to loss of blood fluid. Several interacting physiological mechanisms or systems exist to maintain blood volume and flow, a process known as hemostasis. In mammals, four systems interact to stop blood loss and repair damage in response to injury (Guyton, 1977).
These four systems or mechanisms are: (1) vascular contraction, which upon injury reduces blood flow in the damaged vessel and thus limits fluid loss; (2) platelet aggregation, which results in the formation of a platelet plug that acts as a physical blockage to fluid loss (in non-mammalian vertebrates, a nucleated blood cell replaces the mammalian platelet (Engle and Woods, 1960)); this platelet plug is often enough to prevent fluid loss from small blood vessels; (3) blood coagulation, which results in the formation of a fibrin blood clot that acts as a mechanical block to fluid loss; and (4) invasion of the blood clot by fibrous tissue and dissolution of the fibrin clot during cell and vessel wall repair (Guyton, 1977).

2. Discovery of Coagulation Factors

Blood coagulation proteins represent only one component of hemostasis (Jackson and Nemerson, 1980). The complete hemostatic mechanism is far from understood, with the blood coagulation system perhaps the best, but still incompletely, understood process (Davie et al., 1979; Jackson and Nemerson, 1980). Elucidation of the process of blood coagulation has been slow and complicated (MacFarlane, 1960; Ratnoff, 1977; Zur and Nemerson, 1981). It was found in the mid 19th century that an extract from tissue (especially brain) was a potent activator of blood coagulation (see Ratnoff, 1977; Zur and Nemerson, 1981 for historical reviews). These early experiments led to the first model for blood coagulation, in which a tissue factor would convert prothrombin to thrombin in the presence of calcium ions (Ca²⁺). Thrombin could then convert fibrinogen to fibrin. Almost immediately this model was shown to be inadequate, as it could not explain many of the known bleeding disorders.
Indeed, the majority of the blood coagulation factors were identified by description of their absence in patients with bleeding tendencies (Bloom, 1981). These deficiencies led to the discovery of factor V (Quick, 1943), factor VII (Owen and Bollman, 1948), factor VIII (Patek and Taylor, 1937; Brinkhous, 1947; Quick, 1947), factor IX (Biggs et al., 1952), factor X (Telfer et al., 1956; Hougie et al., 1957), factor XI (Rosenthal et al., 1953), and factor XII (Ratnoff and Colopy, 1955). With the discovery of these factors, a cascade, or waterfall, model of coagulation was developed (MacFarlane, 1964; Davie and Ratnoff, 1964) (see Fig. 1). This coagulation cascade has had further modifications (see below) due to the discovery of additional factors. Some of these proteins were initially characterized biochemically, and subsequently found to be associated with specific hematological disorders, e.g. protein C (Griffin et al., 1981) and protein S (Comp et al., 1984; Schwarz et al., 1984).

3. Post-Translational Modifications

Nutritional studies in the chicken led to the discovery of another aspect of the blood coagulation system. Specific defined diets fed to chicks led to a bleeding tendency, and vitamin K was postulated to be the missing essential vitamin (Dam, 1935). The bleeding tendency was shown to be due to the production of an abnormal prothrombin (Dam et al., 1936). Subsequently, it has been shown that vitamin K is essential in both the mammals and the birds (Suttie, 1985), and for the formation of normal prothrombin as well as factors VII, IX, X, and proteins C, S, and Z (Suttie, 1985). It has been demonstrated that vitamin K is a necessary cofactor in the formation of γ-carboxyglutamic acid (Gla) residues found at the amino-terminal regions of the vitamin K-dependent coagulation factors (Suttie, 1985).
The Gla residues are formed by the carboxylation of specific glutamic acid residues by a vitamin K-dependent carboxylase (Suttie, 1985). Coumarin drugs, e.g. warfarin, inhibit the carboxylation reaction and thus impair blood coagulation.

Figure 1: The Blood Coagulation Cascade. Outline of the mammalian blood coagulation cascade with the intrinsic pathway (left) and extrinsic pathway (right) converging at the activation of factor X to factor Xa, and ending with the formation of the insoluble fibrin clot. Bars represent the polypeptide chains (proportional to polypeptide chain length) with molecular weights indicated below. Intramolecular disulphide bridges are indicated by lines between the two chains. X-linked fibrin represents the cross-linked fibrin clot formed by the action of factor XIIIa. (From Neurath, 1984).

The γ-carboxyglutamic acid residues allow the vitamin K-dependent coagulation factors to form Ca²⁺ bridges to phospholipid membranes (Suttie, 1985). These phospholipid membranes are probably provided by the platelets (Suttie and Jackson, 1977), providing an example of the interaction between the different physiological processes in hemostasis. The interaction of the vitamin K-dependent coagulation factors with phospholipid in the presence of Ca²⁺ was also found to be dependent on protein cofactors (factors V and VIII, and tissue factor).
An absence of these factors would also impair coagulation (Bloom, 1981).

4. Initiation of Blood Coagulation

Initially, tissue factor was thought to be essential for the initiation of blood coagulation (see Ratnoff, 1977; Zur and Nemerson, 1981 for historical reviews). However, it was later observed that coagulation could be initiated without an extrinsic tissue factor (Ratnoff and Colopy, 1955). An intrinsic initiation system appeared to exist, which led to the development of the idea of two pathways of initiation of coagulation: the intrinsic and the extrinsic (as discussed later). The intrinsic initiation system is still not completely understood but does require factor XII, prekallikrein, high molecular weight kininogen and a negatively charged surface (Griffin, 1981).

5. Non-Enzymatic Functions of Thrombin

Prothrombin was found to have functions other than the conversion of soluble fibrinogen to insoluble fibrin (see next section). It was discovered that thrombin (activated prothrombin) interacted with platelets and endothelial cells, resulting in the formation of activated platelets and inducing wound repair, thus aiding hemostasis (Fenton, 1981; Fenton and Bing, 1986). Thrombin is also a chemotactic agent attracting some cells of the immune system, e.g. neutrophils (Fenton, 1981; Fenton and Bing, 1986), which may function to prevent entry of foreign material by way of the injured blood vessel. The mechanisms of many of these additional functions of thrombin are not completely understood (see Fenton, 1981 for a review).

B. BIOCHEMISTRY OF BLOOD COAGULATION

1. Fibrin Clot Formation

Formation of the fibrin blood clot requires the participation of at least 14 plasma proteins, a tissue protein, phospholipid membranes, Ca²⁺, and platelets (Davie et al., 1979; Jackson and Nemerson, 1980). It is the formation of the fibrin blood clot that is the best characterized and understood process of hemostasis (Davie et al., 1979; Jackson and Nemerson, 1980). The blood clot is formed by the polymerization of fibrin monomers into a network which incorporates the platelet plug, thrombin and other proteins and cells into a mechanical plug to prevent fluid loss (Doolittle, 1984). Fibrin is formed by limited proteolysis of fibrinogen, as indicated in Fig. 1 (Doolittle, 1984). Fibrinogen is a plasma protein of 340,000 molecular weight comprising 6 polypeptide chains: 2 Aα, 2 Bβ, and 2 γ chains (Jackson and Nemerson, 1980; Doolittle, 1984). Thrombin cleaves four peptide bonds in each fibrinogen monomer, one in each of the Aα and Bβ chains, releasing 2 fibrinopeptides A, 2 fibrinopeptides B, and fibrin monomer (Doolittle, 1984). The fibrin monomers can then polymerize spontaneously to form insoluble fibrin polymers (Doolittle, 1984). The fibrin network is further strengthened by the formation of covalent cross-links between monomers by the transglutaminase factor XIIIa (see Fig. 1) (Curtis, 1981). Factor XIII is found in plasma as an inactive protein that is activated to factor XIIIa by thrombin (Davie et al., 1979; Jackson and Nemerson, 1980).

2. Coagulation Factors as Zymogens

Many of the enzymatic steps of the blood coagulation cascade consist of the conversion of inactive zymogens to active serine proteases, such as the activation of prothrombin to thrombin by factor Xa (see Fig. 1) (Davie et al., 1979; Jackson and Nemerson, 1980).
As shown in Fig. 1, the zymogen forms of the coagulation factors VII, IX, X, XI, XII, and prothrombin are activated to the corresponding serine proteases (factors VIIa, IXa, Xa, XIa, XIIa, and thrombin, respectively) by limited proteolysis (Davie et al., 1979; Jackson and Nemerson, 1980). Many of these proteolytic reactions require a protein cofactor such as factor V, factor VIII, high molecular weight kininogen, or tissue factor (Davie et al., 1979; Jackson and Nemerson, 1980). In addition to the protein cofactors, the vitamin K-dependent coagulation proteins (factors VII, IX, X, and prothrombin in Fig. 1) also require phospholipid and Ca²⁺ (Davie et al., 1979; Jackson and Nemerson, 1980). As discussed earlier, the vitamin K-dependent coagulation factors interact with phospholipid through Ca²⁺ bridges with γ-carboxyglutamic acid residues found at the amino-terminal regions of these proteins (Suttie, 1985). In the vitamin K-dependent coagulation factors, all glutamate residues in the first 45 residues from the amino terminus of these proteins are γ-carboxylated (Jackson and Nemerson, 1980).

3. Two Pathways of Blood Coagulation

Blood coagulation is initiated by either or both of the two pathways shown in Fig. 1 (Davie et al., 1979; Jackson and Nemerson, 1980). The extrinsic pathway is initiated by the release of tissue factor (the extrinsic factor) from damaged tissue (Davie et al., 1979; Jackson and Nemerson, 1980). Tissue factor, as a protein cofactor, accelerates the activation of factor X by factor VIIa (or VII) (see Fig. 1). Factor VII can be activated by many of the coagulation factors, including factors XIIa, Xa, and thrombin (Jackson and Nemerson, 1980). Factor VII appears to have partial proteolytic activity without activation, but is unable to initiate blood coagulation in the absence of tissue factor (Jackson and Nemerson, 1980).
Upon injury, release of tissue factor will initiate blood coagulation; however, the production of factor VIIa will increase and sustain the coagulation response (Jackson and Nemerson, 1980). The intrinsic pathway (see Fig. 1) differs in that the protease responsible for the first proteolytic cleavage necessary for the initiation of coagulation has not been identified (Jackson and Nemerson, 1980; Griffin, 1981). Factors XII and XI, prekallikrein and high molecular weight kininogen participate in the initial events, but their individual roles are not completely understood (Davie et al., 1979; Jackson and Nemerson, 1980; Griffin, 1981). Initiation is induced by the contact of a plasma factor(s) (intrinsic) with a negatively charged surface created by injury to the vessel wall (Griffin, 1981). Once initiated, the cascade (Fig. 1) can proceed to fibrin formation to cover the exposed surface (Davie et al., 1979; Jackson and Nemerson, 1980). In the past twenty-five years, most of the coagulation factors have been purified from plasma, allowing characterization of their structures and functions (Davie et al., 1979; Jackson and Nemerson, 1980; for comparison see MacFarlane, 1960). Recently, the amino acid sequences of the plasma and precursor forms of the coagulation factors have become available due to advances in molecular biology techniques. Two important features of the blood coagulation cascade are illustrated by Fig. 1. First, the existence of a cascade allows rapid amplification of the response to injury (MacFarlane, 1964; Davie and Ratnoff, 1964), because each activated zymogen is able to activate catalytically a large number of zymogens in the next step of the cascade (see Fig. 1) (Davie et al., 1979; Jackson and Nemerson, 1980).
This amplification allows the rapid response to injury essential for hemostasis (Jackson and Nemerson, 1980). Secondly, because a large number of different protease inhibitors are found in plasma (Jackson and Nemerson, 1980), the multiple steps provide a large number of opportunities to regulate the cascade (Davie et al., 1979; Jackson and Nemerson, 1980). This prevents coagulation beyond the site of injury and allows termination of coagulation once the mechanical plug preventing fluid loss is in place.

C. STRUCTURE OF THE PROTHROMBIN MOLECULE

1. Structure of Plasma Prothrombin

Prothrombin is the circulating zymogen of thrombin, the serine protease responsible for the limited proteolysis of fibrinogen to produce fibrin (Davie et al., 1979; Jackson and Nemerson, 1980). Both bovine and human plasma prothrombin are glycoproteins of approximately 70,000 molecular weight (Davie et al., 1979; Jackson and Nemerson, 1980). Prothrombin has a similar molecular weight in other mammalian species (Walz et al., 1974). The complete amino acid sequences of both bovine (Magnusson et al., 1975) and human (Walz et al., 1977; Butkowski et al., 1977) prothrombin have been determined. Prothrombin from the chicken has also been partially characterized, and was shown to have both a similar molecular weight and a similar amino acid composition (Walz et al., 1974) to the mammalian prothrombins. The N-terminal amino acid sequence of chicken prothrombin has been determined (Walz, 1978). Based on molecular weight, amino acid composition and partial amino acid sequence, it has been concluded that avian and mammalian prothrombins are probably similar in structure and in function (Walz, 1978).

2. Post-Translational Modification

Prothrombin, which is synthesized in the liver as are many of the blood coagulation factors (Anderson and Barnhart, 1964), undergoes glycosylation and γ-carboxylation during its biosynthesis (Swanson and Suttie, 1985). These biosynthetic processes are many and complex. Several precursors of plasma prothrombin have been identified in liver tissue, though their structures have not been characterized (Graves et al., 1980a,b; Swanson and Suttie, 1985). Bovine and human cDNA copies of the mRNA for prothrombin have been isolated from liver cDNA libraries (MacGillivray et al., 1980; Degen et al., 1983; MacGillivray and Davie, 1984) and have allowed the prediction of the complete amino acid sequence of the precursor of prothrombin. The amino acid sequence of the bovine prothrombin precursor is shown in Figure 2 (MacGillivray and Davie, 1984). The precursor to bovine prothrombin contains an amino-terminal extension of 43 amino acid residues (MacGillivray and Davie, 1984), while the human prothrombin precursor has an extension of at least 36 residues (Degen et al., 1983).

Figure 2: The Prothrombin Molecule. Schematic representation of the structure of bovine prothrombin as predicted from cDNA sequence (MacGillivray and Davie, 1984). Amino acid residues are indicated by the single letter code. The prepro-peptide is numbered backwards from the site of cleavage that produces plasma prothrombin. Disulphide bridges are placed according to Magnusson et al. (1975). The three residues His-366, Asp-422, and Ser-528 constitute the active site catalytic triad. [Figure legend: symbols mark the putative site of signal peptidase cleavage, the putative site of propeptidase cleavage, the γ-carboxyglutamic acid residues (Y), and the glycosylated residues; the kringles are labelled.]

3. Precursor Prothrombin

The leader peptide (43 amino acids) of both bovine and human prothrombin is cleaved at an Arg-Ala bond prior to secretion from the liver (Magnusson et al., 1975; Walz et al., 1977; Degen et al., 1983; MacGillivray and Davie, 1984) (Fig. 2). Signal peptidase, the proteolytic enzyme which removes signal (pre-) peptides from secreted proteins, typically cleaves after small aliphatic amino acid side chains (e.g. alanine) (von Heinji, 1983, 1985), and not after large basic residues such as arginine. The site cleaved to produce mature plasma prothrombin (Fig. 2) is more similar to pro-peptide cleavage sequences, such as that of prepro-albumin (Steiner et al., 1980), than to signal peptidase cleavage sequences. It has been suggested that prothrombin is synthesized as a prepro-protein and contains both a pre- (signal) and a pro-peptide in the prepro-leader sequence (Degen et al., 1983; MacGillivray and Davie, 1984). Similar prepro-leader peptides have been found in other vitamin K-dependent coagulation factors (Kurachi and Davie, 1982; Jaye et al., 1983; Fung et al., 1984, 1985; Long et al., 1984; Beckman et al., 1985; Hagen et al., 1986). In factor IX, the site of signal peptidase cleavage probably precedes amino acid residue Thr⁻¹⁸, producing a 21 (or 25) residue pre-peptide and an 18 residue pro-peptide (Bently et al., 1986).
Based on this site in factor IX and the cleavage specificity of signal peptidase, it has been suggested that the site of signal peptidase cleavage in prothrombin is between amino acid residues His⁻²⁰ and Gln⁻¹⁹ (see Fig.2) (Bently et al.,1986), producing a 24 residue pre-peptide and a 19 residue pro-peptide. While the function of the pro-peptide is unknown, this pro-peptide has high homology with the pro-peptides of other vitamin K-dependent coagulation factors (Fung et al.,1984) and the pro-peptide of the vitamin K-dependent bone protein osteocalcin (Pan and Price,1985; Pan et al.,1985). Because of this homology, it has been suggested that the pro-peptide may have a role in the γ-carboxylation of the vitamin K-dependent proteins (Fung et al.,1984,1985; Pan and Price,1985; Pan et al.,1985).

4. Gamma Carboxyglutamic Acid Domain

The N-terminal 47 amino acid residues of plasma prothrombin, the Gla region (see Fig.2) (Magnusson et al.,1975), contain all of the Gla residues (see above) which allow the formation of Ca²⁺ bridges to phospholipid membranes. These interactions are essential for the efficient activation of prothrombin (Jackson,1981). Descarboxyprothrombin, found in the plasma of vitamin K deficient cows and humans, is poorly activated as a result of the absence of the γ-carboxyglutamic acid residues (Suttie and Jackson,1977).

5. Kringle Domain

Following the Gla region are the structures known as kringles (Magnusson et al.,1975). Kringles are composed of about 80 amino acid residues containing six invariant cysteine residues which form three internal disulphide bridges (Magnusson et al.,1975) (see Fig.2). The function(s) of the kringles are not clear, but the second kringle of prothrombin has been reported to bind to factor Va (Esmon and Jackson,1974), which is the essential protein cofactor in the prothrombin activation complex.

6. Thrombin Domain

The C-terminal half of the prothrombin molecule contains the serine protease catalytic region (Magnusson et al.,1975). Factor Xa cleaves the polypeptide chain in two places (see Fig.2), releasing the amino-terminal activation peptide (with Gla and kringle domains) from the two chain thrombin molecule (Magnusson et al.,1975). Bovine thrombin consists of an A chain (50 amino acid residues) linked to the B chain (259 amino acid residues) by a disulphide bridge (see Fig.2). The function of the A chain of thrombin is unknown (Jackson,1981). The B chain of thrombin shares amino acid sequence homology with many serine proteases, including the invariant histidine³⁶⁶, aspartate⁴²², and serine⁵²⁸ residues that comprise the catalytic triad (see Fig.2) (Magnusson et al.,1975). Homologies to trypsin at the amino-terminus of the B chain and around Asp⁵²⁷ suggest that the mechanism of activation of prothrombin is similar to that of trypsinogen (see below) (Jackson and Nemerson,1980). Upon alignment of the prothrombin and trypsinogen sequences, homology is also observed at the substrate binding pockets, with Asp⁵⁰⁶ giving thrombin a trypsin-like specificity (see section F-2 for further discussions). However, thrombin has a more limited substrate specificity than the pancreatic serine proteases (see section F-2).

7. Three Dimensional Structure

Three dimensional structures of thrombin or prothrombin have not been elucidated; however, the three dimensional structure of one of the kringles of bovine prothrombin has been determined (Tulinsky et al.,1985; Park and Tulinsky,1986). This structure was obtained from the proteolytic fragment 1 of bovine prothrombin (amino acid residues 1 to 156, Fig.2) (Tulinsky et al.,1985; Park and Tulinsky,1986).
The disulphide bridges between Cys⁸⁷ to Cys¹²⁷ and Cys¹¹⁵ to Cys¹³⁹ (Fig.2) are found near the middle of the folded structure, with the loops of the kringle sequence surrounding this nucleus in a disc-like manner (Park and Tulinsky,1986). The Gla region is also contained in prothrombin fragment 1 (see Fig.2) but the structure of the first 35 amino acid residues could not be resolved, due to lack of uniform structure (Park and Tulinsky,1986). It was suggested that some flexibility in the Gla region may be required for membrane binding (Park and Tulinsky,1986). The sequence from Ser³⁶ to Ala⁴⁶ of the Gla region could be resolved, and contains some stacked aromatic residues suggesting a possible function as a receptor recognition site (Park and Tulinsky,1986). Although the three dimensional structure of the thrombin domain of prothrombin is unknown, the amino acid sequence of thrombin shares considerable homology with trypsin (Furie et al.,1982). This sequence homology has allowed the development of a three dimensional model for thrombin based on the known crystal structure of trypsin (Furie et al.,1982; see section

D. FUNCTIONS OF PROTHROMBIN

1. Action On Fibrinogen

As outlined above, prothrombin is the circulating zymogen form of the protease thrombin. The primary function of thrombin in the coagulation cascade is the conversion of fibrinogen to fibrin (see Fig.1) (Davie et al.,1979; Jackson and Nemerson,1980). Fibrinogen is converted to fibrin monomer by limited proteolysis in which thrombin cleaves fibrinogen in each of the two Aα and two Bβ chains to release two of each of the fibrinopeptides, A and B (Doolittle,1984). Fibrin monomers can then spontaneously polymerize to form insoluble fibrin polymers that form the basis of the blood clot (Doolittle,1984).
Only one peptide bond in each of the Aα and Bβ chains of fibrinogen is susceptible to the action of thrombin, demonstrating the limited substrate specificity of this enzyme (Doolittle,1984). Impairment of thrombin, the physiological cause of fibrin formation, thus directly impairs blood clot formation (Fenton,1981; Fenton and Bing,1986).

2. Other Enzymatic Functions

Thrombin is also able to cleave a limited number of peptide bonds in a few other plasma proteins, with important physiological consequences. Thrombin can activate both factors V and VIII, producing factors Va and VIIIa (Davie et al.,1979; Jackson and Nemerson,1980). These proteins are essential cofactors in the activation complexes of prothrombin and factor X, respectively (Davie et al.,1979; Jackson and Nemerson,1980). In the presence of the endothelial membrane protein thrombomodulin, thrombin will activate protein C (Stenflo,1976; Esmon,1983). The resulting activated protein C (APC) inactivates factors Va and VIIIa, thereby repressing the coagulation cascade (Stenflo,1976; Esmon,1983). Thus, thrombin has roles in both the initiation and termination of the coagulation cascade, and as such is an important regulatory protease (Fenton,1981; Fenton and Bing,1986). Thrombin also acts to activate factor XIII by limited proteolysis, producing factor XIIIa (see Fig.1). Factor XIIIa is a transglutaminase which catalyzes the formation of covalent cross links between glutamine and lysine residues in the γ chains of adjacent fibrin monomers (Davie et al.,1979; Jackson and Nemerson,1980). This cross linking strengthens the blood clot to assist in the formation of an insoluble mechanical blockage to fluid loss (Davie et al.,1979; Jackson and Nemerson,1980). Thrombin has been implicated as the activator of other blood coagulation factors, e.g.
factor VII as discussed above, but these roles may not be important in vivo (Zur and Nemerson,1981).

3. Non-Enzymatic Functions

As mentioned above, thrombin also interacts with other components of the hemostatic response to injury. Thrombin, by incompletely understood mechanisms, will stimulate many different cell types, leading to mitogenesis, arachidonic acid metabolism and the secretion of proteins (see Fenton and Bing,1986 for a review). Although reactivity may vary, all mammalian tissue or cell types (except erythrocytes) are responsive to thrombin, especially endothelial cells, nerve cells, smooth muscle cells, leucocytes, and cultured fibroblast cells (Fenton and Bing,1986). Thrombin action on platelets is well studied and, upon activation, involves a change in cell shape and secretion of proteins into plasma (Milis,1981). Thrombin has a hormone-like action upon cells of the immune system (Fenton,1981), and thus may assist in the prevention of invasion of the body by foreign agents by way of injured blood vessels.

E. BLOOD COAGULATION IN NON-MAMMALS

1. Blood Coagulation In The Vertebrates

Blood coagulation appears to occur in all vertebrates, but has been best characterized within the mammals (see above) (Davie et al.,1979; Jackson and Nemerson,1980). The blood coagulation cascade as shown in Fig.1 was developed for the bovine and human systems, but has been found to be similar in other mammalian species (Davie et al.,1979; Jackson and Nemerson,1980), whereas coagulation systems in non-mammalian vertebrates have been less well characterized. Conversion of fibrinogen to fibrin by a thrombin-like enzyme is the basis of blood clot formation in all vertebrates (Doolittle,1984). In many of the vertebrate classes, the existence of other coagulation factors has not been investigated in detail.
The chicken appears to have the best characterized coagulation system in non-mammals (Didisheim et al.,1959; Walz et al.,1975). In the chicken, most of the mammalian coagulation factors, including the partially characterized prothrombin (see section C-1), have been identified (Didisheim et al.,1959; Walz et al.,1974,1975). Prothrombin has also been partially purified from lamprey (Doolittle et al.,1962; Doolittle,1965). Activated lamprey prothrombin is able to coagulate bovine fibrinogen (Doolittle et al.,1962). Lamprey plasma contains at least one protein which contains γ-carboxyglutamic acid (Zytkovicz and Nelsestuen,1976). Lamprey prothrombin, like Gla containing proteins (Zytkovicz and Nelsestuen,1976), can be adsorbed to barium salts (Doolittle et al.,1962; Doolittle,1965). Thus, lamprey prothrombin is most likely a Gla containing protein. Other structural information about lamprey prothrombin is not known. The remainder of the coagulation factors have been less well characterized (Didisheim et al.,1959; Doolittle and Surgenor,1962). Attempts to demonstrate the existence of surface activation of coagulation in birds and fish failed to conclusively identify this process (Engle and Woods,1960; Doolittle and Surgenor,1962), while extrinsic initiation has been observed in all vertebrates examined (Didisheim et al.,1959; Doolittle and Surgenor,1962), suggesting that intrinsic initiation of coagulation may be a mammalian adaptation to the extrinsic system of blood coagulation.

2. Blood Coagulation In The Invertebrates

Blood coagulation is not limited to vertebrates (Engle and Woods,1960).
Hemostasis of some type has been observed in many other phyla including Coelenterata, Annelida, Mollusca, Arthropoda, and Echinodermata (Engle and Woods,1960; MacFarlane,1960). In many of these invertebrate species, hemostasis is simply the result of aggregation of blood cells at the site of injury (Engle and Woods,1960; MacFarlane,1960), which may be analogous to the formation of a platelet plug in mammals (MacFarlane,1960). There are fewer cases of plasma proteins being involved in a coagulation scheme (MacFarlane,1960). The best characterized invertebrate coagulation protein is the fibrinogen molecule from the spiny lobster (Fuller and Doolittle,1971a,b). In this animal, clot formation is caused by the polymerization of a plasma fibrinogen (which is unlike vertebrate fibrinogen; Fuller and Doolittle,1971a) by a Ca²⁺ dependent transglutaminase (Engle and Woods,1960; Fuller and Doolittle,1971b). In the horseshoe crab, a second coagulation scheme exists where a coagulogen is polymerized after limited proteolysis (Solum,1973; Cheng et al.,1986). The complete amino acid sequence of the precursor of coagulogen has been determined (Cheng et al.,1986), and has no similarity to either vertebrate or spiny lobster fibrinogens (Cheng et al.,1986). The clotting enzyme responsible for the limited proteolysis has been purified and partially characterized (Seid and Liu,1980; Liang and Liu,1982). The clotting enzyme is Ca²⁺ dependent, and appears to be activated by endotoxins (Seid and Liu,1980; Liang and Liu,1982).

F. STRUCTURE OF SERINE PROTEASES

1. Three Dimensional Structure

Many of the activated blood coagulation factors, including prothrombin, are serine proteases (Davie et al.,1979; Jackson and Nemerson,1980).
The most obvious function of these coagulation factors is as proteases (see Fig.1) for either the activation or inactivation of other coagulation factors or plasma proteins (Jackson and Nemerson,1980). Three dimensional structures of the coagulation factor serine proteases (or zymogens) have not been determined, but due to their homology to the digestive serine proteases, models of the structures of several of the coagulation factors have been proposed (Furie et al.,1982; Cool et al.,1985). These models assume that the coagulation factor serine proteases have similar three dimensional structures to the digestive serine proteases and function with similar catalytic mechanisms (Furie et al.,1982; Cool et al.,1985). All of the coagulation factor serine proteases contain the catalytically important histidine, aspartate and serine residues in homologous locations (Davie et al.,1979; Jackson and Nemerson,1980). Each of the coagulation factors also contains an aspartate residue in a homologous location to the aspartate of the substrate binding pocket of trypsin (Kraut,1977; Stryer,1981), which may account for the (limited) trypsin-like specificity of the coagulation factors (Davie et al.,1979; Jackson and Nemerson,1980). Trypsinogen is activated to trypsin by limited proteolysis removing an amino-terminal activation peptide and creating a new amino-terminus. This new amino-terminal isoleucine then forms a new salt bridge with the aspartate residue adjacent to the active site serine, resulting in a conformational change (Stroud et al.,1977; Stryer,1981). In the coagulation factors, a homologous cleavage in a conserved activation sequence (Jackson and Nemerson,1980) may cause a similar conformational change resulting in serine protease activity (Davie et al.,1979; Jackson and Nemerson,1980).

2. Limited Substrate Specificity Of The Coagulation Factors

The mechanism for the limited proteolytic action of the coagulation factors to specific substrates is not completely understood (Furie et al.,1982). Studies of the structure of trypsin have allowed a greater understanding of the catalytic mechanism, together with a basis for the substrate specificity (e.g. see Craik et al.,1985), which may by analogy help explain the mechanism of the coagulation factor serine proteases. The extreme substrate specificity in the coagulation factors may be due in part to changes surrounding the substrate binding pocket and the influence of the additional polypeptide chain present in many of the coagulation factors (Furie et al.,1982). This limited substrate specificity is essential for the amplification of the coagulation cascade (see Davie et al.,1979; Jackson and Nemerson,1980).

G. HOMOLOGIES WITHIN SERINE PROTEASE ZYMOGENS

1. Families Of Serine Proteases

The development of the catalytic mechanism of the serine proteases has occurred at least twice during the evolution of life on earth (Neurath,1984). Two families of serine proteases have been identified which share a similar mechanism (Neurath,1984). The subtilisin type family, although it shares the same catalytic mechanism (including the catalytic triad of residues), does not share amino acid sequence or three dimensional structural homology with the trypsin-like serine proteases (Neurath,1984). The trypsin-like family appears to be a larger family and more widespread in Nature. Trypsin-like serine proteases are found in both eukaryotes and prokaryotes (Delbaere et al.,1975), while the subtilisins are found only within the Bacilli (Kraut,1977).
Existence of the trypsin-like serine proteases in both prokaryotes and eukaryotes is an indication of the age of these proteins. They must have been in existence since early in the evolution of life (i.e. >1 × 10⁹ years ago) (Neurath,1984).

2. Roles Of Serine Proteases In Physiology

Serine proteases have a role in a large number of essential physiological processes (Neurath and Walsh,1976; Neurath,1984), including blood coagulation and digestion as well as such diverse processes as the complement cascade, neuropeptide processing, fibrinolysis, and fertilization of germ cells (Neurath and Walsh,1976). All of these serine proteases share amino acid sequence homology. Fig.3 illustrates some of the amino acid homologies within the coagulation and fibrinolytic serine protease zymogens (Young et al.,1978; Hewett-Emmett et al.,1981). The catalytic regions of the blood coagulation factors share approximately 40% amino acid identity with trypsinogen and also with each other (Katayama et al.,1979; Hewett-Emmett et al.,1981) in their serine protease domain regions (see Fig.3). The blood coagulation factors, and many of the other serine proteases (e.g. complement factor B), differ from trypsinogen and the other digestive serine proteases in possessing long amino-terminal activation peptides (see Fig.3) (Jackson and Nemerson,1980). The activation peptide in trypsinogen is only 6 amino acid residues long, while in prothrombin and plasminogen the activation peptide is longer than the catalytic region (see Fig.3) (Jackson and Nemerson,1980). All of these serine proteases appear to have acquired unique (but see below) amino-terminal extensions in addition to a common serine protease domain (Jackson and Nemerson,1980).
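As an illustration of how a figure such as the approximately 40% amino acid identity quoted above can be obtained from a pairwise alignment, the following is a minimal Python sketch. The function name `percent_identity`, the choice to skip gapped positions, and the two toy fragments are assumptions made for illustration; they are not real zymogen sequences and this is not necessarily the procedure used in the cited studies.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of ungapped aligned positions carrying the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # positions with a gap in either sequence are not counted
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return 100.0 * identical / compared


# Toy aligned fragments, invented for illustration only.
toy_a = "IVGG-YTCGA"
toy_b = "IVEGQDAC-A"

print(percent_identity(toy_a, toy_b))
```

Other conventions exist (for example, dividing by the length of the shorter sequence), so identity values are only comparable when the same convention is used throughout.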
The amino-terminal extensions have important roles in the regulation and activation of the serine proteases, and may have roles independent of the serine protease enzymatic function (Jackson and Nemerson,1980).

Figure 3: Amino Acid Sequence Homologies in Coagulation Factor Zymogens. Comparison of the structures of coagulation and fibrinolytic zymogens to trypsinogen. The solid bar represents the catalytic region in the proteases, the cross hatched region represents the Gla region, K represents the kringles, E represents regions homologous to epidermal growth factor precursor, 1 and 2 represent regions homologous to the type I and type II homologies of fibronectin, and A represents the homologous regions found in factor XI and prekallikrein. The lengths of the bars are approximately proportional to the lengths of the polypeptide chains. Arrows represent the locations of peptide bonds that are cleaved during activation of the zymogens. Solid lines below the proteins represent disulphide bridges and do not necessarily represent their true locations. (See text for details). Proteins shown: prothrombin, factor VII, factor IX, factor X, protein C, factor XI, prekallikrein, factor XII, plasminogen, tissue-type plasminogen activator, urokinase, and trypsinogen.

3. Homologous Domains Within The Activation Peptide Of Prothrombin

When the amino-terminal extensions of many serine proteases are compared, several homologous domains are observed (see Fig.3) (Jackson and Nemerson,1980; Zur and Nemerson,1981; Doolittle,1985). As mentioned previously, prothrombin contains two kringle structures that are 80 amino acid residues long (K in Fig.3) (Magnusson et al.,1975).
Kringles have also been identified within factor XII (Cool et al.,1985; McMullen and Fujikawa,1985), and the fibrinolytic zymogens plasminogen (Sottrup-Jensen et al.,1978), tissue-type plasminogen activator (Pennica et al.,1983) and urokinase-type plasminogen activator (Verde et al.,1984). Also shown in Fig.3 is the Gla domain (cross hatched). As mentioned previously, this region is found in other vitamin K-dependent coagulation proteins including factor VII (Hagen et al.,1986), factor IX (Kurachi and Davie,1982; Jaye et al.,1983), factor X (Fung et al.,1984,1985; Leytus et al.,1984), and protein C (Long et al.,1984; Foster and Davie,1984; Beckmann et al.,1985), all of which also contain a prepro leader peptide (see Fung et al.,1985). Not shown in Fig.3 are protein S, which contains both the Gla region and prepro leader (Dahlback et al.,1986), and protein Z, which contains at least the Gla region (Hojrup et al.,1985).

4. Homologous Domains Found In Serine Protease Zymogens Other Than Prothrombin

Additional domains are found in other protease zymogens (Fig.3) which are not present in prothrombin. One of these, as noted by Doolittle et al. (1984) and Bloomquist et al. (1984), is a region of homology to epidermal growth factor (EGF) which has been identified in factor VII (Hagen et al.,1986), factor IX (Kurachi and Davie,1982; Jaye et al.,1983), factor X (Fung et al.,1984,1985; Leytus et al.,1984), protein C (Long et al.,1984; Foster and Davie,1984; Beckmann et al.,1985), protein S (Dahlback et al.,1986), protein Z (Hojrup et al.,1985), factor XII (Cool et al.,1985; McMullen and Fujikawa,1985), and tissue-type plasminogen activator (Pennica et al.,1983). These EGF-like domains are found not only in serine proteases but also in other proteins such as the LDL receptor (Sudhoff et al.,1985a,b).
In addition, type I and type II homologies of fibronectin (Peterson et al.,1983) are found in factor XII (Cool et al.,1985; McMullen and Fujikawa,1985), with a type II homology also found in tissue-type plasminogen activator (Pennica et al.,1983). If the amino-terminal extensions of other serine proteases are compared to other protein sequences, more homologous domains are found, including an homologous domain in complement factor B (Morley and Campbell,1984) and the interleukin-2 receptor (Leonard et al.,1985), and the four repeats (A in Fig.3) shared by factor XI (Fujikawa et al.,1986) and prekallikrein (Chung et al.,1986). Thus the serine protease family illustrates several different modes of protein evolution. Not only are there changes in the amino acid sequences of the catalytic regions, but also there are gains and/or losses of additional protein domains, and in some cases duplication of these domains within a protein.

H. STRUCTURE OF EUKARYOTIC STRUCTURAL GENES

1. The Gene

Genes for proteins found in the vertebrates are of a complex structure (Breathnach and Chambon,1981). Typically, the genes are composed of a split structure of exons and introns. Transcription initiation sequences, including promoters and other regulatory sequences, are found in the 5' flanking sequence and transcription termination sequences are found in the 3' flanking sequence (Breathnach and Chambon,1981; Nevins,1983). Transcription of these genes requires a large number of different processing steps (see below) to produce a mature mRNA capable of being translated into a protein (Breathnach and Chambon,1981; Nevins,1983).

2. Exons

Since their discovery, introns have been found in almost all vertebrate protein coding genes, i.e. those transcribed by RNA polymerase II (Breathnach and Chambon,1981; Gilbert,1985).
Introns separate the exons which are spliced together to form the translatable mRNAs. Exons have been found to vary greatly in size, although specific size classes appear to be preferred (Naora and Deacon,1982). A relationship between mRNA transcript length (coding size) and number of exons has been observed (Blake,1983a,b). The average exon size is about 140 bp, which corresponds to the most abundant of the observed size classes (Naora and Deacon,1982).

3. Introns

Introns separate the exons of a gene and must be removed to produce a mRNA transcript (Breathnach and Chambon,1981). Introns, like exons, vary greatly in size (Naora and Deacon,1982). At the 5' and 3' ends of introns, specific conserved sequences can be found (Mount,1982; Keller and Noon,1984) which appear to be essential for the removal of introns (Wieringa et al.,1984). Within the introns an additional conserved sequence was found (Keller and Noon,1984). Deletion of this sequence, though, has no consequence for intron splicing (Wieringa et al.,1984). Subsequently, it has been shown that this sequence is involved in branch formation during intron splicing, and can be replaced with other intronic sequences (Keller,1984). The minimum size of introns appears to be about 80 bp, which may be due to constraints caused by the intron splicing mechanism (Wieringa et al.,1984).

4. Promoters

Upstream of the site of mRNA initiation, promoter sequences can be found (Breathnach and Chambon,1981; Nevins,1983). Comparison of DNA sequences of these regions shows the presence of several conserved sequences, including the "TATA" and "CAAT" sequences (Breathnach and Chambon,1981). The "TATA" sequence (Goldberg-Hogness box) is usually found about 30 bp 5' to the site of mRNA initiation, and is essential for the precision of mRNA initiation (Breathnach and Chambon,1981; McKnight and Kingsbury,1982).
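The positional relationship described above, a "TATA" sequence lying about 30 bp 5' to the site of mRNA initiation, can be checked on a stretch of 5' flanking sequence with a short Python sketch. The function name `tata_offsets`, the use of an exact string match for the motif, and the toy flanking sequence are all assumptions made for illustration; real promoters vary around the consensus, so this is a rough check rather than a promoter prediction method.

```python
def tata_offsets(upstream, motif="TATA"):
    """Distances (in bp) from each motif occurrence to the initiation site.

    `upstream` is 5' flanking sequence written 5' to 3'; its last base is
    assumed to lie immediately before the site of mRNA initiation.
    """
    upstream = upstream.upper()
    offsets = []
    start = upstream.find(motif)
    while start != -1:
        offsets.append(len(upstream) - start)
        start = upstream.find(motif, start + 1)
    return offsets


# Toy 50 bp flanking sequence, invented for illustration only.
toy_flank = "GC" * 10 + "TATAAA" + "GC" * 12

print(tata_offsets(toy_flank))
```

The same function can be called with a different `motif` argument to look for other short conserved sequences at their expected distances.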
Approximately 80 bp 5' to the site of mRNA initiation, a second conserved sequence, the "CAAT" sequence, is usually found (Breathnach and Chambon,1981). The function of the "CAAT" sequence is unknown (Breathnach and Chambon,1981), but often this "CAAT" sequence is flanked by G/C rich inverted repeats (McKnight and Kingsbury,1982). The G/C rich inverted repeats appear to be essential for efficient promoter activity, but not for precision of initiation (McKnight and Kingsbury,1982). Other DNA sequences flanking the site of mRNA initiation also affect the regulation of promoter activity (Nevins,1983). Some of these sequences are orientation and distance specific, while others, such as enhancers, function independently of distance or orientation (Gluzman,1985). Enhancers and promoter elements overlap both physically and functionally, such that the distinction between them is becoming blurred (Gluzman,1985).

5. Transcription And Processing

Expression of a gene to produce a protein product involves many processes including transcription of the gene, capping, polyadenylylation and splicing of the heterogeneous nuclear RNA, transport of the RNA to the cytoplasm, and finally translation (Nevins,1983). Capping of the 5' end of the RNA is essential for both efficient splicing (Grabowski et al.,1985) and translation (Shatkin,1985). Splicing of the introns from the RNA is essential to produce the contiguous translatable mRNA as discussed above. The site and mechanism of termination of RNA transcription is unknown (Birnstiel et al.,1985). Most of the RNAs transcribed by RNA polymerase II are polyadenylylated (Perry,1976; Nevins,1983). Poly(A) is added to the RNA after removal of the 3' end of the transcript by a nuclease (Breathnach and Chambon,1981).
This cleavage event occurs approximately 20 bp 3' to a conserved AAUAAA sequence found in the mRNAs (Proudfoot and Brownlee,1976). This AAUAAA sequence is essential for the cleavage reaction, but not for the poly(A) addition reaction (Montell et al.,1983). The mechanism of poly(A) addition is not completely understood (McDevitt et al.,1984). After the capping, splicing, and polyadenylylation (not necessarily in that order) of the RNA transcript, the mature RNA is transported from the nucleus to the cytoplasm where it is translated into protein (Nevins,1983).

I. EVOLUTION OF AMINO ACID AND DNA SEQUENCE

1. Molecular Clock

If a protein is isolated from several different species, differences in the amino acid sequence are usually found (Zuckerkandl and Pauling,1965). A greater difference is usually found between sequences from species which have a more ancient common ancestor (Zuckerkandl and Pauling,1965; Wilson et al.,1977). It appears as if there is constant change in the sequence of a protein through time (Zuckerkandl and Pauling,1965; Wilson et al.,1977). There is some evidence that most of the changes have occurred at a nearly uniform rate over time, such that the changes in sequence act as a molecular clock. This can be a useful tool in the resolution of the phylogeny of species (Wilson et al.,1977; Li et al.,1985b). Once the function of a protein changes (as can occur to one product of a gene duplication), the rate of evolution of a protein is likely to change (Wilson et al.,1977), as the protein will now be under a different collection of selective pressures. To complicate matters, the rate of the evolution of a protein, even though it maintains the same function, can change due to changes in the organism's environment, or even the cellular or molecular environment (Wilson et al.,1977).
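The molecular clock idea outlined above can be made concrete with a small Python sketch: the fraction of differing positions between two aligned sequences is divided by twice the time since their common ancestor, because both lineages have been changing independently. The function names, the toy sequences and the divergence time are assumptions made for illustration, and no correction for multiple changes at the same site is applied, so the sketch is only reasonable for closely related sequences.

```python
def p_distance(seq_a, seq_b):
    """Fraction of aligned positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    differences = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return differences / len(seq_a)


def rate_per_site_per_year(distance, years_since_divergence):
    """Average change per site per year along each lineage.

    The factor of 2 reflects that both lineages have accumulated
    changes since their common ancestor.
    """
    return distance / (2.0 * years_since_divergence)


# Toy sequences and divergence time, invented for illustration only.
toy_species_a = "MAHVRGLQLP"
toy_species_b = "MAHIRGLQVP"
toy_divergence_years = 100e6

d = p_distance(toy_species_a, toy_species_b)
rate = rate_per_site_per_year(d, toy_divergence_years)
print(f"p-distance = {d:.2f}, rate = {rate:.1e} changes per site per year")
```

Run in the other direction, a rate calibrated on one pair of species with a known divergence time can be used to estimate the divergence time of another pair, which is the sense in which sequence change acts as a clock.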
The apparent reason for the often near uniform rate of evolution of a protein is that the mutation rate of DNA has been nearly uniform (Wilson et al.,1977; Li et al.,1985b). The difference in evolutionary rates between different proteins is not due to differing rates of mutation of DNA, but primarily due to selection and the ability of a protein to tolerate change (Wilson et al.,1977). Even within a protein, different regions can evolve at different rates, as in the insulin molecule (Wilson et al.,1977; Li et al.,1985b). Thus, comparison of the sequence of the same protein from different species may demonstrate functionally important regions by their reduced rate of change (Wilson et al.,1977).

2. Gene Duplications

There are many gene families of structurally and functionally similar proteins such as the globins (Edgell et al.,1983). These families represent proteins which function in a similar fashion and often complement each other (Edgell et al.,1983). Members of other families, such as the lysozyme-lactalbumin family (Hall et al.,1982) or to a lesser extent the immunoglobulin superfamily (Hood et al.,1985), function very differently. In either case, gene duplication events were essential for the formation of these different proteins (Li,1983). Often the gene duplication events have occurred several times, e.g. the globins (Edgell et al.,1983; Hardies et al.,1984), immunoglobulins (Hood et al.,1985), or the fibrinogen genes (Crabtree et al.,1985). Within the serine protease gene family similar gene duplications have been responsible for the expansion of this family (Young et al.,1978; Hewett-Emmett et al.,1981). The serine proteases differ from the example of the globins in that they have also altered the structure and size of their proteins greatly (Hewett-Emmett et al.,1981; Patthy,1985).

J. EVOLUTION OF THE STRUCTURE OF PROTEINS AND GENES

1.
Internal Duplications Within A Gene

Proteins have not only changed in sequence but also in size (Doolittle,1985). Many proteins have increased greatly in size compared to their ancestral forms (Doolittle,1985). Different mechanisms appear to function to increase the size of a protein, the most obvious of which is the duplication of all or just part of a protein (Li,1983). It is easy to imagine that the five kringles of plasminogen (see Fig.3) are the result of such internal duplications (Kurosky et al.,1980). In other cases nearly the entire molecule is duplicated, as in the case of streptokinase (Neurath,1984). Internal duplications not only result in homologous amino acid sequence, but also a homologous three dimensional structure (McLachlan,1979). In trypsinogen, it has been observed that by rotating the molecule 180°, it is possible to produce a similar three dimensional structure of the molecule (McLachlan,1979). This has been interpreted to imply that trypsinogen, and thus all other serine proteases, have been formed by duplication events to result in four similar structural domains making up the serine protease domain (McLachlan,1979). Today no amino acid homology is visible from these ancient duplication events (McLachlan,1979).

2. Gene Fusions

Gene duplications cannot completely explain the evolution of some of the larger proteins found today, such as the blood coagulation factors (Doolittle,1985). In these proteins, it appears that protein domains from several different sources have been combined to create new proteins by some gene fusion type event (see next section for possible mechanisms) (Doolittle,1985). In some cases, the gene fusions have been very complicated, such as with the large number of different protein domains found in factor XII (Cool et al.,1985; Neurath,1985).
Duplication events appear to occur together with these gene fusion events, similar to transposition of repetitive DNA elements (Calos and Miller,1980), retaining the protein domain in the donor protein.

K. FUNCTION OF INTRONS

1. Distribution

With the discovery of introns (Berget et al.,1977; Chow et al.,1977), the paradox of the number of genes and genome size was partially if not completely resolved (Gilbert,1979). The size of genes was found to be unrelated to the size of the protein, and thus the size of the genome is unrelated to the number of genes within the genome (Cavalier-Smith,1978; Gilbert,1979). Introns are regions of DNA which are not found in the functional RNA product (mRNA, rRNA, or tRNA) of a gene (Breathnach and Chambon,1981). Removal of introns from hnRNA by RNA splicing (Cech,1983) joins the exons of a gene to form a functional RNA product. Introns are found in nuclear and organellar genes of eukaryotes, some genes of archaebacteria, and some viral genes of prokaryotes (Darnell and Doolittle,1986). If the mechanisms of splicing of introns found in various genes and species are compared, at least three different types of RNA splicing are observed (Cech,1983; Sharp,1985), suggesting the possibility of multiple origins of introns (see below).

2. Position Of Introns Within Genes And Proteins

When the positions of introns in genes were mapped to positions in the translated protein products, it was observed that many of the introns separated protein domains (Artymiuk et al.,1981; Blake,1978,1983a,b,1985). Subsequently it was demonstrated that in some genes, the introns separated domains of three dimensional structure (which may not necessarily be functional domains) (Go,1981,1983). An additional observation was that the position of introns often mapped to the surface of a protein (Craik et al.,1982a,b,1983).
From the early observations, Gilbert(1978,1979) postulated that introns allowed the shuffling of protein domains, a mechanism he called exon shuffling. It should be noted that exon shuffling is not an explanation of the function of introns, but explains how they have been used during evolution (Blake,1985; Gilbert,1985; Rogers,1985).

3. Intron Sliding

The discovery of introns also provided a possible explanation for the many insertions and deletions observed between related proteins. These insertions or deletions between proteins were often observed at or near intron-exon junctions of at least one gene within a gene family. Thus, changes in the site of RNA splicing could create mRNAs containing insertions and/or deletions of a few amino acid residues (Craik et al.,1982a,b,1983). Only changes of 3 bp or multiples of 3 bp would result in such observations, as any other type of change would alter the reading frame of the mRNA, thus completely altering the sequence of the protein.

L. ORIGIN OF INTRONS

1. Metabolic Enzymes

Many of the metabolic enzymes bind nucleotides as cofactors and probably constitute one of the most ancient gene families (Rogers,1985; Gilbert,1985). This gene family diverged into the different metabolic enzymes prior to the eukaryotic-prokaryotic-archaebacterial divergence, and thus occurred in the progenote (Gilbert,1985; Marchionni and Gilbert,1986). Hence, it may be possible to determine if introns were present in the genes of the progenote by comparing the gene organization of the different members of the metabolic enzyme family (Gilbert,1985). Genes for several members of the metabolic enzyme family have been isolated and characterized, including alcohol dehydrogenase (Benyajati et al.,1981,1983; Dennis et al.,1984,1985; Duester et al.
,1986), glyceraldehyde phosphate dehydrogenase (Stone et al.,1985a,b), lactate dehydrogenase (Li et al.,1985a), pyruvate kinase (Lonberg and Gilbert,1985), and triose-phosphate isomerase (Brown et al.,1985; Straus and Gilbert,1985; Marchionni and Gilbert,1986; McKnight et al.,1986). Comparison of the organization of some of these genes (Cornish-Bowden,1985; Duester et al.,1986) shows that the introns do tend to cluster in similar locations. Duester et al.(1986) concluded that introns were present before these genes duplicated, suggesting the existence of introns since the beginning of life. When the sequences of these genes are compared, it is found that none of the introns in the different genes is shared (Cornish-Bowden,1985; Straus and Gilbert,1985); indeed, it has been acknowledged that it would be difficult to move introns by the fractions of a codon often required to align their positions in different genes (Straus and Gilbert,1985; see intron sliding above). No clear example of intron sliding by a fraction of a codon has been demonstrated. Thus clustering of the introns does not suggest that there was an intron from the beginning; it is possible or even probable that these introns are due to independent insertions (Rogers,1985).

2. The Triose-Phosphate Isomerase Gene

The gene for triose-phosphate isomerase has been characterized from a number of species, including E. coli (Pichersky et al.,1984), Saccharomyces cerevisiae (Alber and Kawasaki,1982), Schizosaccharomyces pombe (Russell,1985), chicken (Straus and Gilbert,1985), man (Brown et al.,1985), maize (Marchionni and Gilbert,1986), and Aspergillus nidulans (McKnight et al.,1986). The genes in E. coli, S. cerevisiae, and S. pombe do not contain introns, while the remainder contain up to eight introns (McKnight et al.,1986; Gilbert et al.,1986).
Comparison of these genes in the four species with introns shows that only one of the introns is shared by all species (McKnight et al.,1986; Gilbert et al.,1986). Of the five introns found in Aspergillus, only intron B is found in the other species. Introns A and E are displaced by a non-integer number of codons (and therefore cannot be easily explained by intron sliding), and introns C and D are unique (McKnight et al.,1986). The intron organization of the human and chicken genes is identical (Gilbert et al.,1986), and six of the eight introns of maize are shared with the chicken (Marchionni and Gilbert,1986; Gilbert et al.,1986). Gilbert et al.(1986) concluded that these observations are best explained by introns being present in the progenitor species (and therefore since the beginning of life). Unfortunately, no known mechanism will move an intron a fraction of a codon (Rogers,1985; Straus and Gilbert,1985); therefore, introns which interrupt the sequence in different phases do not necessarily have a common origin. Intron invasion at preferred sites cannot be excluded as an alternative explanation for the observations within the triose-phosphate isomerase genes, and indeed may be the more probable explanation.

3. Intron Mobility

Data on the triose-phosphate isomerase genes show that little change in number of introns has occurred since the divergence of plants and animals one billion or more years ago (Marchionni and Gilbert,1986; Gilbert et al.,1986). Marchionni and Gilbert(1986) concluded that the difference in the number of introns in the maize and chicken genes for triose-phosphate isomerase was due to intron loss in animals. However, it is not possible to exclude the possibility of intron insertion in plants with these data. Similar observations with the globin genes have been made (Darnell and Doolittle,1986).
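The reading-frame argument used in the preceding sections (an intron can only "slide" by 3 bp or a multiple of 3 bp without altering the reading frame, so introns displaced by a non-integer number of codons are not easily related by sliding) can be sketched as a short calculation. The Python below is a hypothetical illustration only: the function names and the example offsets are assumptions and do not correspond to the actual intron positions of any triose-phosphate isomerase gene.

```python
def intron_phase(coding_offset):
    """Phase of an intron, given the number of coding nucleotides 5' to it.

    Phase 0: the intron lies between two codons.
    Phase 1: the intron follows the first nucleotide of a codon.
    Phase 2: the intron follows the second nucleotide of a codon.
    """
    if coding_offset < 0:
        raise ValueError("coding_offset must be non-negative")
    return coding_offset % 3


def displaced_by_whole_codons(offset_a, offset_b):
    """True if two intron positions differ by an integer number of codons.

    Only such pairs could be related by a change in splice site that
    leaves the reading frame intact.
    """
    return (offset_a - offset_b) % 3 == 0


if __name__ == "__main__":
    # Hypothetical intron positions in two aligned homologous genes.
    gene_1_intron = 114
    gene_2_intron = 120  # displaced by 6 nt = 2 codons
    gene_3_intron = 119  # displaced by 5 nt, a non-integer number of codons
    print(intron_phase(gene_1_intron), intron_phase(gene_3_intron))
    print(displaced_by_whole_codons(gene_1_intron, gene_2_intron))
    print(displaced_by_whole_codons(gene_1_intron, gene_3_intron))
```

Two introns that pass this check share the same phase, which is a necessary condition for a sliding explanation but does not by itself establish a common origin.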
The structure of the triose-phosphate isomerase gene for Aspergillus indicated that some introns are at least 1.2 billion years old (McKnight et al.,1986; Gilbert et al.,1986). Differences observed between the plants and animals and Aspergillus may easily be due to intron insertions between 1.2 and 1 billion years ago. Gene structures of the metabolic enzymes do not demonstrate clearly the presence of introns prior to their duplication in the progenote because intron insertion cannot be ruled out (Cornish-Bowden,1985). McKnight et al.(1986) observed that intron insertion occurs at preferred sites and an explanation for this is not apparent. One proposed source of the preferred sites of insertion is based on chromatin structure (Cavalier-Smith,1985). Data from the metabolic enzyme genes appear to support intron insertion rather than introns being present from the beginning of life. As discussed above, the organization of the genes for triose-phosphate isomerase implies that intron insertion was taking place at least 1.2 billion years ago with little further activity in the last 1 billion years, especially in animals (McKnight et al.,1986; Gilbert et al.,1986).

4. Models Of Intron Origin

The usefulness of introns is clear (Gilbert,1985; Blake,1985) but their origin is not (Rogers,1985). Splicing of RNA, and thus introns, has been proposed to have existed since early in the evolution of life (Darnell and Doolittle,1986), and this splicing was necessary for the evolution of the larger proteins essential for present day life (Doolittle,1978; Darnell and Doolittle,1986). Subsequently, prokaryotes and unicellular eukaryotes lost most (if not all) introns by selection for smaller genome size (Doolittle,1978; Gilbert,1985; Darnell and Doolittle,1986).
Multicellular eukaryotes are postulated not to have this selection pressure for smaller genome size (Gilbert,1985). If this is true, intron loss has been a major force in gene evolution, although this does not account for the mechanisms of RNA splicing associated with intron removal (Cech,1983). A second possibility is that introns have invaded the genome of the eukaryotes (Cavalier-Smith,1978,1985; Crick,1979). Intron RNA can be self-splicing, without the requirement for proteins, as a ribozyme (Kruger et al.,1982; Zaug et al.,1986), possibly implying that introns were mobile virus-like elements that may have invaded genes (Sharp,1985). The three splicing mechanisms (Cech,1983; Sharp,1985) thus could be the result of the conversion of three different invading viral-like elements into the three types of introns found today. Additional support for a recent invasion of genes by introns is found when the family of flavin-containing metabolic enzymes are compared (see above). The preference for introns to separate protein domains (Go,1981,1983; Blake,1978,1983a,b,1985), and their location corresponding to the surface of proteins (Craik et al.,1982a,b,1983) has not been explained. It is clear that if introns inserted they were not mutagenic (Rogers,1985; Duester et al.,1986) so the obvious selective pressure for clustering of introns is not valid. Mechanisms such as the insertion of introns at the junction of linker sequences and nucleosome core particles can account for the observed size classes of introns (Cavalier-Smith,1985), but as yet there is no evidence to support this mechanism. An as yet unknown mechanism may explain the site preference of introns. Investigation of other gene families may provide insight into both the origin and function of introns.
The serine protease gene family is an excellent family for such an investigation, as the gene duplication events are scattered throughout the evolution of the eukaryotes (Young et al.,1978).

M. SERINE PROTEASE GENES

1. Sequence Of Serine Proteases

The amino acid sequences of a large number of serine protease zymogens have been determined (see Young et al.,1978; Hewett-Emmett et al.,1981), and a large number of others have been partially sequenced (Hewett-Emmett et al.,1981). With the advent of molecular biological techniques, isolation of cDNAs has allowed the prediction of the amino acid sequences of many more serine protease zymogens, together with their precursor sequences. For the coagulation proteins, the complete amino acid sequences of prothrombin (MacGillivray et al.,1980; Degen et al.,1983; MacGillivray and Davie,1984), factor VII (Hagen et al.,1986), factor IX (Kurachi and Davie,1982; Jaye et al.,1983), factor X (Fung et al.,1984,1985; Leytus et al.,1984), factor XI (Fujikawa et al.,1986), factor XII (Cool et al.,1985), prekallikrein (Chung et al.,1986), protein C (Long et al.,1984; Foster and Davie,1984; Beckman et al.,1985), and protein S (Dahlback et al.,1986) have been determined from the corresponding cDNA sequences. In addition, cDNAs for many of the fibrinolytic zymogens including plasminogen (Malinowski and Davie,1983; Malinowski et al.,1984), tissue-type plasminogen activator (Pennica et al.,1983), and urokinase (Verde et al.,1984) have been isolated and characterized. These sequences have allowed a better understanding of how these proteins are related to each other and other proteins (see Patthy,1985).

2.
Genes For Serine Proteases

Genes for many of the serine proteases have also been characterized, including trypsinogen (Craik et al.,1984), chymotrypsinogen (Bell et al.,1984), proelastase (Swift et al.,1984), nerve growth factor subunits α and γ (Evans and Richards,1985), peptide processing kallikreins of the maxillary gland (Mason et al.,1983) and kidney (van Leeuwen et al.,1986), complement factor B (Campbell and Porter,1983; Campbell et al.,1984), the fibrinolytic zymogens tissue-type plasminogen activator (Ny et al.,1984; Fisher et al.,1985; Degen et al.,1986) and urokinase (Nagamine et al.,1984; Riccio et al.,1985), the blood coagulation proteins factor IX (Anson et al.,1984; Yoshitake et al.,1985) and protein C (Foster et al.,1985; Plutzky et al.,1986), and the plasma protein haptoglobin (a non-serine protease homologue) (Maeda et al.,1984). Partial gene structures of plasminogen (Malinowski et al.,1984; Sadler et al.,1985), and prothrombin (Degen et al.,1983,1985; Davie et al.,1983) have also been reported. The structure of a serine protease gene from the invertebrate Drosophila melanogaster (Davis et al.,1985) has also been reported.

N. THE EVOLUTION OF THE SERINE PROTEASE GENES

By characterizing the gene for bovine prothrombin and comparing the gene structure to the structures of the genes listed in the previous section, it may be possible to obtain insight into the origin of the different structural domains found within the prothrombin molecule. Such a study of the evolution of one member of the family of serine proteases may also shed light on the evolutionary history of introns, and possibly their function.
Finally, comparison of the sequence of prothrombin from a number of different species may help to identify regions of functional importance for the shared functions of prothrombin within these species.

MATERIALS AND METHODS

A. MATERIALS

Yeast extract, casamino acids, bacto-tryptone, and bacto-agar were Difco grade from the Grand Island Biological Company. NZ-amine type A was from Humko Sheffield Chemical Co. Agarose, acrylamide, bisacrylamide, urea, ammonium persulphate, and TEMED (N,N,N',N'-tetramethylethylenediamine) were from Bio-Rad Laboratories. Nitrocellulose sheets and circles (82 and 132 mm) were 0.45 µm pore size from Millipore or Schleicher and Schuell. ³²P-labeled nucleotides were from New England Nuclear or Amersham. Phenol was from British Drug Houses Ltd. and was redistilled before use. The fraction distilled at 179°C was collected and frozen in aliquots at -20°C. Deoxy-, dideoxyribonucleotides, and random hexadeoxyribonucleotides (p(dN)6) were from PL-Pharmacia. Isopropyl-β-D-thiogalactopyranoside (IPTG), 5-bromo-4-chloro-3-indolyl-β-D-galactopyranoside (X-Gal), ethidium bromide (EtBr), dimethylsulphoxide (DMSO), 3-(N-morpholino)propanesulphonic acid (MOPS), yeast transfer RNA (tRNA), ampicillin, tetracycline, chloramphenicol and ribonuclease A were from Sigma. Cesium chloride was from Cabot Berylco Ltd. Ultrogel AcA54 was from LKB. Oligodeoxyribonucleotides were synthesized on an Applied Biosystems 890A DNA Synthesizer (by Tom Atkinson, Dept. of Biochemistry) and purified by denaturing polyacrylamide gel electrophoresis prior to use (Atkinson and Smith,1984).
All other chemicals were of reagent grade or better and were purchased from either Sigma Chemical Co., Fisher Scientific, or British Drug Houses Ltd. Restriction endonucleases, T4 DNA ligase, T4 DNA polymerase, T4 polynucleotide kinase and BSA (nuclease free) were from New England Biolabs, Bethesda Research Laboratories, or PL-Pharmacia. Nuclease S1 and deoxyribonuclease I were from Boehringer-Mannheim. Avian myeloblastosis virus reverse transcriptase was from Life Sciences Inc. or New England Nuclear. DNA polymerase I and DNA polymerase I Klenow fragment were from Boehringer-Mannheim or PL-Pharmacia. Day old chicks were obtained from Western Hatcheries, Abbotsford. Adult chicken livers were obtained from Dr. P. March, Dept. of Poultry Science, UBC. Bovine liver was obtained from Intercontinental Packers, Vancouver.

B. STRAINS, VECTORS, AND MEDIA

1. Bacterial Strains

E. coli K802 (hsdR⁺, hsdM⁺, gal⁻, met⁻, supE) (Maniatis et al.,1982) was host for screening and isolation of DNA of clones in the λCh4A vector (Blattner et al.,1977). E. coli Q359 (hsdR⁻, hsdM⁺, supF, φ80) (Karn et al.,1980) was host for screening and isolation of DNA from clones in the λ1059 vector (Karn et al.,1980). E. coli JM83 (ara, Δlacpro, strA, thi⁻, φ80, lacZΔM15) (Vieira and Messing,1982) was host for transformation and DNA isolation from clones in the pUC13 vector (Vieira and Messing,1982). E. coli JM101 (Δlacpro, supE, thi⁻, F', traD36, proAB, lacIq, lacZΔM15) and JM103 (Δlacpro, supE, thi⁻, strA, sbcB15, endA, hsdR⁻, F', traD36, proAB, lacIq, lacZΔM15) (Messing,1983) were hosts for transformation and DNA isolation of clones in M13 mp7, 8, 9, 10, and 11 vectors (Messing,1983).
E. coli RY1088 (ΔlacU169, supE, supF, hsdR⁻, hsdM⁺, metB, trpR, tonA21, proC::Tn5(pMC9); pMC9 is pBR322-lacIq) (Young and Davis,1983a,b) was host for screening and isolation of DNA from clones in the λgt11 vector (Young and Davis,1983a,b).

2. Vectors

For DNA sequence analysis the M13 vectors mp7, 8, 9, 10, and 11 (Messing,1983) were used as cloning vectors. DNA for restriction endonuclease mapping and DNA sequencing was initially subcloned in pUC13 (Vieira and Messing,1982) (pUC13 was obtained from Dr. Mark Zoller, Dept. of Biochemistry, UBC).

3. Media

The medium for growth and screening of λ clones and hosts was NZYC (Maniatis et al.,1982) (10g NZ-amine type A, 2g MgCl2, 5g NaCl, 5g Yeast Extract, 1g Casamino Acids per liter, and pH7.5 with NaOH). For screening phage λ libraries, the phage were plated on NZYC-agar (1.5%,w/v) plates with an overlay of NZYC-agarose (0.75%,w/v). For titering of phage λ stocks, the overlay consisted of NZYC-agar in place of the NZYC-agarose. The medium for the transformation and growth of bacteria containing pUC plasmid derivatives was Luria broth (Maniatis et al.,1982) (5g Yeast Extract, 10g Bacto-Tryptone, and 10g NaCl per liter). For the selection of pUC-containing bacteria, clones were plated on LB-agar (1.5%,w/v) plates supplemented with 50µg/ml ampicillin. This same medium was used for screening the human cDNA library in pKT218 except that tetracycline (12.5µg/ml) replaced the ampicillin. Bacteria containing M13 clones were grown in YT medium (Maniatis et al.,1982) (5g Yeast Extract, 8g Bacto-Tryptone, and 5g NaCl per liter). Phage M13 transformants were plated on YT-agar (1.5%,w/v) plates overlayed with YT containing 0.75% agar.
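The per-liter recipes above scale linearly with volume. As a convenience only, the following Python sketch (not part of the original methods) scales the NZYC, Luria broth and YT recipes listed above to an arbitrary batch volume and computes the mass of agar or agarose needed for a given percentage (w/v). The dictionary layout and function names are assumptions made for illustration; pH adjustment and antibiotic supplements are not modelled.

```python
# Grams per liter, transcribed from the recipes in the text above.
RECIPES_G_PER_L = {
    "NZYC": {
        "NZ-amine type A": 10,
        "MgCl2": 2,
        "NaCl": 5,
        "Yeast Extract": 5,
        "Casamino Acids": 1,
    },
    "Luria broth": {
        "Yeast Extract": 5,
        "Bacto-Tryptone": 10,
        "NaCl": 10,
    },
    "YT": {
        "Yeast Extract": 5,
        "Bacto-Tryptone": 8,
        "NaCl": 5,
    },
}


def scale_recipe(medium, volume_ml):
    """Return grams of each component needed for volume_ml of the medium."""
    if volume_ml <= 0:
        raise ValueError("volume_ml must be positive")
    per_liter = RECIPES_G_PER_L[medium]
    return {name: grams * volume_ml / 1000 for name, grams in per_liter.items()}


def agar_grams(percent_w_v, volume_ml):
    """Grams of agar (or agarose) for a given % (w/v), i.e. g per 100 ml."""
    return volume_ml * percent_w_v / 100


if __name__ == "__main__":
    # Example: 500 ml of NZYC, plus agar for 1.5% (w/v) plates.
    for component, grams in scale_recipe("NZYC", 500).items():
        print(component, grams, "g")
    print("agar", agar_grams(1.5, 500), "g")
```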
E. coli JM101 and JM103, hosts for M13 vectors, were maintained on minimal medium plates (Messing,1983), which were made up as follows: 3g of agar in 160ml H2O was autoclaved, cooled to 55°C, and mixed with 40ml 5X Salts (2.1g K2HPO4, 0.9g KH2PO4, 0.2g (NH4)2SO4, 0.1g Na citrate·7H2O per 40ml), 2ml 20% glucose, 0.2ml 20% MgSO4·7H2O, and 0.1ml 10mg/ml thiamine. Each of these solutions was sterilized by autoclaving except the thiamine, which was filter-sterilized. Bacteria for large scale plasmid preparations were grown in M9 minimal medium (Maniatis et al.,1982), which was made up as 840ml H2O, 100ml 10X Salts (7g Na2HPO4, 3g KH2PO4, 0.5g NaCl, 1g NH4Cl per 100ml), 10ml MgSO4·7H2O, 20ml 20% glucose, 10ml 0.01M CaCl2, 20ml 20% Casamino Acids, 0.2ml 10mg/ml thiamine and 0.2g uridine. Each of the solutions was autoclaved separately except the thiamine and uridine, which were filter-sterilized.

C. BASIC MOLECULAR BIOLOGY TECHNIQUES

DNA fragments were separated according to size by electrophoresis in agarose or polyacrylamide gels. The buffer for agarose gel electrophoresis was 1XTAE (50XTAE buffer is 2M Tris base, 1M Glacial Acetic Acid, 0.1M EDTA) (Maniatis et al.,1982). DNA fragments in these gels were visualized either by UV fluorescence or autoradiography. For detection of DNA by UV fluorescence, agarose gels were prepared containing 10µg/ml EtBr, and the DNA was visualized by irradiation under UV light (260nm). If the DNA fragments were visualized by autoradiography, the gels were dried under vacuum using a Bio-Rad gel drier at 60°C for one hour. The dried gel was then exposed to Kodak XK-1 film, with or without an intensifying screen (Lightning Plus, Dupont). If intensifying screens were used, the films were exposed at -20°C or -70°C; otherwise, the film was exposed at room temperature.
Polyacrylamide gels were used with 1XTBE buffer (10XTBE buffer is 0.89M Tris base, 0.89M Boric Acid, 25mM EDTA, pH 8.3) (Maniatis et al.,1982). Polyacrylamide gels were either denaturing or nondenaturing, depending on the presence or absence of urea as a denaturant. For nondenaturing gels, acrylamide (added to the appropriate concentration from a stock of 29:1 acrylamide:bisacrylamide) and buffer were mixed with the appropriate volume of water, and degassed using a water aspirator. Polymerization was initiated by the addition of ammonium persulphate and TEMED to final concentrations of 0.066%(w/v) and 0.04%(w/v), respectively. DNA fragments in these gels were visualized by staining the gels with 10µg/ml EtBr in water for 10 minutes, followed by irradiation under UV light (260nm). For denaturing polyacrylamide gels in TBE buffer, urea (8.3M), acrylamide (added to the appropriate concentration from a stock of 38:2 acrylamide:bisacrylamide) and buffer were mixed with the appropriate volume of water, and degassed using a water aspirator. Polymerization was initiated by the addition of ammonium persulphate and TEMED to final concentrations of 0.066%(w/v) and 0.024%(w/v), respectively. DNA in denaturing gels was visualized by autoradiography after drying under vacuum in a Bio-Rad gel drier at 80°C for 20-30 minutes, and exposing to Kodak XK-1 film, with or without intensifying screens.

D. ISOLATION OF DNA

1. Isolation Of Plasmid DNA

Small amounts of plasmid DNA were prepared by a modification of the alkaline lysis method of Birnboim and Doly(1979) (Maniatis et al.,1982). An aliquot (1.5ml) of an overnight culture of the clone of interest was placed in a microfuge tube (Eppendorf), and the bacteria were collected by centrifugation for 1 minute in an Eppendorf microfuge.
The pellet was resuspended in 100µl of an ice cold solution containing 50mM glucose, 10mM EDTA, 25mM Tris-HCl pH8.0, and 4mg/ml lysozyme. The suspension was incubated for 5 minutes at room temperature, and 200µl of a solution containing 0.2N NaOH-1% SDS was added. The mixture was incubated at 4°C for 5 minutes, and 150µl of potassium acetate solution pH4.8 (60ml 5M KOAc, 11.5ml Glacial Acetic Acid, 28.5ml H2O) was added. After mixing by vortexing, the suspension was incubated at 4°C for 5 minutes. Cellular debris was removed by centrifugation in an Eppendorf centrifuge for 5 minutes at 4°C. The supernatant was removed and extracted with an equal volume of phenol:chloroform (1:1,v/v). Nucleic acids were precipitated by the addition of 2 volumes of ethanol at room temperature. After centrifugation in an Eppendorf centrifuge for 5 minutes, the supernatant was discarded and the nucleic acid pellet was washed with 1ml of 70% ethanol. The pellet was air dried and resuspended in 50µl TE buffer (10mM Tris-HCl pH8.0, 1mM EDTA).

Two different procedures were used for large scale plasmid isolation. The Triton lysis procedure (Katz et al.,1973,1977) was used for large preparations of plasmid in either the pBR322 or the pKT218 cloning vectors. An aliquot (5ml) of an overnight culture of bacteria was used to inoculate 1L of M9 medium at 37°C, with shaking at approximately 200 rpm. When the OD600nm of the culture was 0.6-0.7, 250mg chloramphenicol was added, and the culture was shaken at 37°C for 12-16 hours. Cells were collected by centrifugation at 6Krpm in a GS-3 rotor for 10 minutes and frozen at -20°C for at least two hours. The cells were then resuspended at 4°C in 6.25ml of a solution containing 25%(w/v) sucrose, and 50mM Tris-HCl pH8.0.
Lysozyme (1.5ml of a 10mg/ml solution in 25% sucrose-50mM Tris-HCl pH8.0) was added, and the solution was continuously mixed by swirling on ice for 5 minutes. EDTA (1.25ml of a 0.5M solution, pH8.0) was added and mixed on ice by swirling for an additional 5 minutes. Triton solution (10ml of a solution comprising 10ml 10%(w/v) Triton X-100, 125ml 0.5M EDTA pH8.0, 50ml 1M Tris-HCl pH8.0, 800ml H2O) was added, and mixed for an additional 5 minutes. Debris was removed by centrifugation at 19Krpm in an SS-34 rotor for 30 minutes at 4°C. Plasmid DNA was separated from chromosomal DNA and RNA by isopycnic centrifugation using cesium chloride gradients. CsCl/EtBr solutions were produced by the direct addition of 3.9g CsCl and 0.3ml EtBr (10mg/ml) to 3.8ml of the supernatant. These volumes were scaled up for tubes for the larger rotors. Centrifugation times varied with the rotor used. With the vTi65 rotor, centrifugation was either 4 hours at 65Krpm or 20 hours at 50Krpm. For the Ti70.1 rotor, centrifugation was for 20 hours at 50Krpm at 20°C.

The large scale alkali lysis procedure (Maniatis et al.,1982) was scaled up from the small scale preparation described above with the following modifications. After addition of potassium acetate, the debris was removed by centrifugation at 35Krpm in a Ti60 rotor for 30 minutes at 4°C. The supernatant was immediately mixed with 0.6 volumes of isopropanol, and was incubated at room temperature for 15 minutes. Nucleic acids were collected by centrifugation at 9Krpm in an HB-4 rotor for 30 minutes at room temperature. Plasmid DNA was purified by isopycnic centrifugation as described above.

Double stranded M13 DNA (replicative form) was isolated as described by Messing(1983). A single plaque was mixed with 10µl of overnight culture of uninfected host cells (JM101 or JM103) in 1ml YT for 6 hours.
Concurrently, a colony of host bacteria from a minimal medium plate was grown up in 10 ml YT medium for 6 hours. The two cultures were then added to 1 L of YT medium at 37°C and grown for 4 hours. DNA was isolated from these cells by the alkali lysis procedure, as described above.

All DNA used as vectors for cloning experiments was subjected to two rounds of purification through CsCl/EtBr density gradients.

2. Isolation Of Phage DNA

For large scale preparations of phage λ DNA (Maniatis et al., 1982), 10¹⁰ host bacterial cells were collected by centrifugation and resuspended in 3 ml of SM buffer (5.8 g NaCl, 2 g MgSO4·7H2O, 50 ml 1 M Tris-HCl pH 7.5, 5 ml 2% gelatin per L). Phage λ (5 × 10⁷ to 5 × 10⁸ pfu) were added to the cells, and the phage were allowed to attach to the cells by incubation at 37°C for 10 minutes. This mixture was used to inoculate 0.5 L of prewarmed NZYC medium and the culture was incubated at 37°C until lysis. Chloroform (10 ml) was added and incubation at 37°C continued for 10 minutes in order to lyse the remainder of the cells. Bacterial debris was removed by centrifugation at 7 Krpm in a GSA or GS-3 rotor for 10 minutes. Phage particles were precipitated by the addition of 0.3 volumes of 50% polyethylene glycol 6000 (Carbowax 8000) and 0.15 volumes of 5 M NaCl, and incubation at 4°C overnight. Phage particles were collected by centrifugation at 7 Krpm in a GSA or GS-3 rotor for 15 minutes at 4°C. After removal of all the PEG/NaCl solution, the phage particles were gently resuspended in 10 ml DNase I buffer (50 mM Tris-HCl pH 7.5, 5 mM MgCl2, 0.5 mM CaCl2) to which 100 µl of 1 mg/ml DNase I and 200 µl RNase A were added, and the solution was incubated at 37°C for 30 minutes. Debris was removed by centrifugation at 10 Krpm in an SS-34 rotor for 5 minutes. Phage were purified using CsCl gradients.
Gradients were made by the addition of 0.75 g CsCl per ml of phage solution. Centrifugation was for 16-20 hours at 20°C in a Ti70.1 rotor at 50 Krpm. Phage were removed from the gradient after localization with a light source (e.g. with a flashlight, the phage appear as a blue band), and CsCl was removed by dialysis against DNase I buffer (see above) for at least one hour at 4°C. SDS was added to 1% (w/v), EDTA to 5 mM, and proteinase K to 50 µg/ml and the solution was incubated at 68°C for 1 hour. DNA was purified by extraction with phenol:chloroform (1:1, v/v) followed by 3 extractions with chloroform. Phage DNA was precipitated by the addition of 0.1 volume of 3 M NaOAc pH 4.8 and 2 volumes of ethanol.

Small scale λ preparations were scaled down from the large preparation described above (Maniatis et al., 1982). Eluted phage from one phage plaque (or 3 × 10⁶ pfu from phage stock) were attached to 100 µl of host cells at 37°C for 10 minutes and used to inoculate 20 ml of NZYC medium. DNA isolation was as above with the omission of the CsCl gradient. Phage were digested with proteinase K immediately after DNase I and RNase A digestion. DNA was precipitated by the addition of one volume of isopropanol instead of ethanol and the pellet was resuspended in 100 µl of TE buffer.

3. Genomic DNA Isolation

Bovine genomic DNA was prepared by Ross MacGillivray by the method of Blin and Stafford (1976), which was the same method used for the purification of DNA from human livers. Liver tissue was ground to a fine powder in liquid nitrogen, either with a Waring blendor or with a mortar and pestle. Liver powder was dissolved in a buffer (10 ml/g tissue) consisting of 0.5 M EDTA pH 8.0, 0.5% SDS, and 100 µg/ml proteinase K, and was digested overnight at 50°C.
The solution was gently extracted three times with equal volumes of phenol and dialyzed against buffer (50 mM Tris-HCl pH 8.0, 10 mM NaCl, 10 mM EDTA) until the OD270nm of the dialysate was below 0.05. RNase (DNase free) was added to a concentration of 100 µg/ml and the solution was incubated at 37°C for one hour. The DNA solution was extracted gently three times with equal volumes of phenol:chloroform (1:1, v/v), and then dialyzed against TE buffer. Insoluble material was removed by centrifugation at 14 Krpm in an SS-34 rotor at 4°C for 10 minutes. DNA was precipitated by the addition of Gilbert Salts (5X Salts is 2.5 M NH4OAc, 100 mM MgCl2, and 1 mM EDTA) to 1X followed by the addition of two volumes of ethanol. After collection by centrifugation, the DNA was allowed to rehydrate for at least two days. Insoluble material was removed by centrifugation as described above. The final genomic DNA pellet was resuspended at a concentration of approximately 0.5 mg/ml in TE buffer.

E. DNA SUBCLONING

1. Production Of DNA Fragments For Ligation

DNA fragments for ligation into either pUC13 or M13 vectors were produced by several methods, including sonication (Deininger, 1983) and restriction endonuclease digestion. Fragments produced by restriction endonuclease digestion were digested under the conditions suggested by the manufacturer of the enzyme. Both mixtures and gel-purified restriction endonuclease DNA fragments were ligated into vectors. If mixtures of fragments were to be ligated, the restriction endonuclease digestion mixture was heated at 68°C for 10 minutes to inactivate the enzymes and then extracted with phenol before ligation.
Purified restriction endonuclease fragments were isolated from agarose or polyacrylamide gels by electroelution (Maniatis et al., 1982).

Random DNA fragments were produced by sonication (Deininger, 1983), using a Heat Systems Sonifier at output level 2. DNA (10-20 µg in 500 µl of 0.5 M NaCl, 0.1 M Tris-HCl pH 7.4, 10 mM EDTA) was sonicated by five pulses of 5 seconds. The DNA solution was cooled on ice, and mixed between pulses. The resulting DNA fragments were made blunt-ended by incubation with 33 mM Tris-OAc pH 7.8, 66 mM KOAc, 10 mM MgOAc, 100 mg/ml BSA, 0.2 mM of each deoxynucleotide triphosphate in 50 µl and 6 u T4 DNA polymerase. DNA fragments of 300-600 bp were separated by electrophoresis in a 5% non-denaturing polyacrylamide gel followed by electroelution. The ends of the DNA were again made blunt-ended as above, phenol extracted, precipitated with ethanol and resuspended at about 10 µg/µl in TE buffer (10 mM Tris-HCl pH 8.0, 1 mM EDTA).

2. Ligation Of DNA Into pUC13 Or M13 Vectors

DNA fragments were ligated to vector DNA in small volumes (10-15 µl) of a buffer consisting of 66 mM Tris-HCl pH 7.5, 5 mM MgCl2, 5 mM DTT, and 0.4-1.0 mM ATP.
For pUC13 ligations, approximately 100 ng vector was ligated to a threefold molar excess of insert DNA, while for M13 ligations 10-20 ng vector DNA was ligated to a 1-5 fold molar excess of insert DNA. T4 DNA ligase was added (1 unit for blunt-ended ligations and 0.1 u for sticky-ended ligations; Maniatis et al., 1982), and ligation was allowed to proceed overnight at 15°C. If not used immediately, ligation mixtures were stored at -20°C until used.

3. Transformation Of DNA Into Bacteria

Host bacteria for pUC13 and M13 transformations were made competent by treatment with calcium chloride (Messing, 1983). Fifty milliliters of YT (for JM101 or JM103) or L broth (for JM83) were inoculated with host cells and incubated at 37°C with shaking until the OD600nm of the culture was 0.5-0.6. Cells were collected by centrifugation (2.5 Krpm in an HB-4 rotor, 4°C, 5 minutes) and gently resuspended in one half of the starting volume of ice-cold 50 mM CaCl2. Cells were incubated on ice for 30-60 minutes and were again collected by centrifugation (2.5 Krpm in an HB-4 rotor, 4°C, 5 minutes). Bacteria were gently resuspended in one tenth of the starting volume of ice-cold 50 mM CaCl2. Highest transformation efficiency was typically seen if these competent cells were stored at 4°C for 24 hours (Dagert and Ehrlich, 1979). However, cells were normally used without this 24 hour storage. Aliquots (0.3 ml) of competent cells were typically transformed with 2-3 µl of ligated DNA (see previous section). Cells were incubated with DNA in 13 × 100 mm glass tubes at 4°C for 40-60 minutes and then heat shocked at 42°C for 2 minutes.
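The molar ratios quoted for the ligations above become insert masses once fragment lengths are known, because for double-stranded DNA the number of moles is proportional to mass divided by length. The following is a minimal sketch of that arithmetic and is not part of the original methods. The vector length (2700 bp for pUC13) and the insert length (450 bp, the midpoint of the 300-600 bp sonicated fragments) are illustrative assumptions.

```python
def insert_mass_ng(vector_ng, vector_bp, insert_bp, molar_excess):
    """Insert mass (ng) giving `molar_excess` moles of insert per mole of vector.

    Moles of double-stranded DNA scale as mass / length, so the mass ratio
    equals the molar ratio multiplied by the length ratio.
    """
    if min(vector_ng, vector_bp, insert_bp, molar_excess) <= 0:
        raise ValueError("all arguments must be positive")
    return vector_ng * insert_bp * molar_excess / vector_bp


# Assumed lengths for illustration only (not stated in the text).
PUC13_BP = 2700
INSERT_BP = 450

# 100 ng of pUC13 with a threefold molar excess of insert, as in the text.
puc13_insert_ng = insert_mass_ng(100, PUC13_BP, INSERT_BP, 3)
print(f"Insert needed for 100 ng pUC13: {puc13_insert_ng:.1f} ng")
```

Under these assumptions, a short insert needs far less mass than the vector to reach a threefold molar excess, which is why the ratio is specified in moles and not in nanograms.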
M13 DNA transformed cells were mixed with 10 µl 100 mM IPTG, 35-50 µl X-Gal (10 mg/ml in dimethylformamide), 0.2 ml host cells, and 3-5 ml soft YT agar (42°C), and poured onto YT plates. Heat shocked pUC13 transformants were rescued with the addition of 0.7 ml of L broth, followed by incubation at 37°C for one hour. Rescued cells (100 µl) were spread with 50 µl X-Gal on LB plates supplemented with ampicillin (50 µg/ml). All plates with transformed cells were incubated overnight at 37°C, and recombinants with all vectors were detected as colourless colonies or clear plaques in the presence of X-Gal (Messing, 1983).

F. ISOLATION OF RNA

1. Isolation Of Total Cellular RNA

a) Bovine RNA

All glassware, pipets and solutions were autoclaved to destroy endogenous ribonucleases. Bovine RNA was isolated by the method of Chirgwin et al. (1979). Powdered bovine liver tissue was added to a buffer (10 ml/g tissue) consisting of 7.5 M guanidine hydrochloride (GuHCl) pH 7.5, 25 mM sodium citrate pH 7.0, and 0.1 M DTT. The liver tissue suspension was disrupted by using a polytron homogenizer. N-lauryl sarcosine was added to 0.5% (w/v) and the insoluble matter was removed by centrifugation (5 Krpm for 30 minutes, 4°C, HB-4 rotor). RNA was precipitated by the addition of ethanol to 33% followed by incubation overnight at -20°C. RNA was collected by centrifugation (5 Krpm in an SS-34 rotor, 4°C, 30 minutes). The RNA pellet was resuspended in half of the starting volume of GuHCl buffer. Insoluble material was removed as before. RNA was precipitated as before, and resuspended in one fourth of the starting volume of GuHCl buffer. Insoluble material was removed, and RNA was precipitated as before.

Small RNAs (e.g. tRNA and 5S rRNA) and DNA were removed by selective precipitation of large RNAs with LiCl (Barlow et al., 1963).
The RNA pellet was resuspended in 4 ml of 0.1 M NaOAc pH 7.0, and an equal volume of 4 M LiCl, 0.1 M NaOAc pH 7.0 was added. The mixture was incubated at -20°C for 30 minutes followed by incubation at 0°C for an additional 30 minutes. RNA was collected by centrifugation in an SS-34 rotor at 5 Krpm for 30 minutes at 4°C. The RNA pellet was washed twice by resuspending in 8 ml of 2 M LiCl, 0.1 M NaOAc pH 7.0 and collected by centrifugation in an HB-4 rotor at 9 Krpm for 12 minutes at 4°C. RNA was then dissolved in 5 ml of 0.1 M NaOAc pH 5.0 and insoluble material was removed by centrifugation at 9 Krpm in an HB-4 rotor at 4°C for 5 minutes. RNA was precipitated by the addition of two volumes of ethanol and incubation overnight at -20°C. RNA was collected by centrifugation as above and dissolved in a small volume of H2O. The concentration of RNA was determined by assuming that a 1 mg/ml solution had an OD260nm of 20. RNA was stored as ethanol precipitates in small aliquots at -20°C.

b) Chicken RNA

RNA yields with the GuHCl method from chicken livers were very low, so a second RNA isolation procedure using SDS and phenol (Lizardi, 1983) was used with this tissue. Livers from day old chicks were homogenized in 30 volumes of SET buffer (10 mM Tris-HCl pH 7.5, 5 mM EDTA, 1% SDS). Proteinase K was added to 50 µg/ml and the homogenate was incubated at 50°C for one hour. After digestion, Triton X-100 and sodium deoxycholate were each added to 1% (v/v and w/v, respectively), and NaCl to 0.1 M. The homogenate was extracted three times with equal volumes of phenol:chloroform (1:1, v/v). RNA was precipitated by the addition of two volumes of ethanol, followed by incubation at -20°C. RNA was collected by centrifugation in an SS-34 rotor at 10 Krpm for 10 minutes at 4°C. The RNA was washed in 66% ethanol, 0.1 M NaOAc pH 5.0 and collected by centrifugation as above.
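The OD260nm convention stated above for bovine RNA (a 1 mg/ml solution reads 20) can be applied as a one-line conversion. The sketch below is illustrative only. The example reading, the dilution factor and the helper names are hypothetical, and the conversion constant is the one given in the text, not an independently verified extinction value.

```python
# Convention taken from the text: a 1 mg/ml RNA solution has an OD260nm of 20.
OD260_OF_1_MG_PER_ML = 20.0


def rna_conc_mg_per_ml(od260, dilution_factor=1.0):
    """RNA concentration of the undiluted sample from a reading on a dilution."""
    if od260 < 0 or dilution_factor <= 0:
        raise ValueError("od260 must be >= 0 and dilution_factor must be > 0")
    return od260 * dilution_factor / OD260_OF_1_MG_PER_ML


def volume_for_mass_ul(mass_ug, conc_mg_per_ml):
    """Volume (µl) holding `mass_ug` of RNA; 1 mg/ml is the same as 1 µg/µl."""
    if conc_mg_per_ml <= 0:
        raise ValueError("concentration must be positive")
    return mass_ug / conc_mg_per_ml


# Hypothetical reading: OD260nm of 0.40 measured on a 1:100 dilution.
conc = rna_conc_mg_per_ml(0.40, dilution_factor=100)
print(f"RNA concentration: {conc:.2f} mg/ml")
print(f"Volume for 20 µg of RNA: {volume_for_mass_ul(20, conc):.1f} µl")
```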
Small RNAs and DNA were removed by precipitation with LiCl as previously described.

2. Isolation Of Poly A+ RNA

Poly A+ RNA was isolated by chromatography on a column of oligo-dT cellulose (Edmonds et al., 1971; Aviv and Leder, 1972). Total chicken RNA in a small volume of 0.4 M NaOAc pH 7.5, 0.1% SDS was applied to the column. The unbound RNA fraction was reapplied to the column three times. The column was then washed with 0.4 M NaOAc pH 7.5, 1 mM EDTA and 0.1% SDS until the OD260nm of the eluate was below 0.05. Poly A+ RNA was eluted from the column with 1 mM EDTA, 0.1% SDS. Fractions containing RNA were identified by their OD260nm and were pooled. RNA was precipitated by the addition of 0.1 volumes of 3 M NaOAc pH 4.8 and two volumes of ethanol. RNA was resuspended in H2O at a concentration of 2 mg/ml and stored at -70°C.

G. LABELING OF DNA

1. Nick Translation

DNA for use as hybridization probes was labeled by nick translation (Maniatis et al., 1975). Typically, 500 ng of DNA was labeled in 50 µl of 50 mM Tris-HCl pH 7.5, 5 mM MgCl2, 0.05 mg/ml BSA, 10 mM β-mercaptoethanol, 20 µM dGTP, 20 µM dTTP, 1.4 µM dATP, 1.4 µM dCTP, 1.4 µCi/µl α-³²P dATP (3000 Ci/mMole), 1.4 µCi/µl α-³²P dCTP (3000 Ci/mMole), 0.2 mM CaCl2, 1 pg/µl DNase I, and 0.4 u/µl E. coli DNA polymerase I (Kornberg). The reaction mixture was incubated for 60-120 minutes at 15°C. The reaction was terminated by the addition of three volumes of 1% SDS-10 mM EDTA, containing 25 µg tRNA, followed by heating to 68°C for 10 minutes. After allowing the reaction mixture to cool to room temperature, the unincorporated labeled nucleotides were removed by chromatography on an Ultrogel AcA54 column. Labeled DNA was eluted from the column with 10 mM Tris-HCl pH 7.5, 200 mM NaCl, 0.25 mM EDTA. Typically, labeled DNA had a specific activity of 0.5-1.0 × 10⁸ cpm/µg.
Labeled DNA was denatured by boiling for 10 minutes immediately before use.

2. Klenow Labeling

DNA was also labeled by the method of Feinberg and Vogelstein (1983). Typically, a reaction mixture contained 200-300 ng of DNA in a volume of 50 µl. DNA in 20 µl was denatured by boiling for three minutes, and was cooled to 37°C for 15-30 minutes. Labeling occurred in a final volume of 50 µl of 50 mM Tris-HCl pH 8.0, 10 mM MgCl2, 10 mM β-mercaptoethanol, 20 µM dCTP, 20 µM dGTP, 20 µM dTTP, 1 µCi/µl α-³²P dATP (3000 Ci/mMole), 200 mM HEPES pH 6.6, 60 OD260nm/ml p(dN9), 0.4 mg/ml BSA, and 0.1 u/µl E. coli DNA polymerase I Klenow fragment. Extension was allowed to occur overnight at 37°C. The reaction was terminated and labeled DNA was separated from unincorporated labeled nucleotides as described for nick translation (see above). Typically the specific activity of a Klenow labeled probe DNA was 2 × 10⁸ cpm/µg. Labeled DNA was denatured as above prior to use.

H. BLOT HYBRIDIZATIONS

1. Genomic Southern Blot Analysis

Genomic DNA for Southern blots was transferred to nitrocellulose essentially as described by Southern (1975), and blots were hybridized and washed as described by Kan and Dozy (1978). Genomic DNA (10 µg) was digested with restriction endonucleases (20-30 u) in a volume of 40 µl under conditions recommended by the enzyme manufacturers. DNA was separated by electrophoresis for 16-24 hours at 20-25 mA in submerged agarose gels. DNA in the gels was denatured for 30 minutes in 0.5 N NaOH, 0.6 M NaCl and was then neutralized by twice treating for 45 minutes with 1 M Tris-HCl pH 7.5, 0.6 M NaCl. DNA was transferred to nitrocellulose membranes with 10X SSC (1X SSC is 0.15 M NaCl, 0.015 M NaCitrate pH 7) for 36-48 hours. After transfer, the nitrocellulose filter was washed in 3X SSC to remove any agarose, air dried, and then baked at 68°C for 6 hours.
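Since every SSC strength used in these blots is a multiple of the 1X definition given above (0.15 M NaCl, 0.015 M sodium citrate), solute concentrations and dilution volumes follow by simple proportion. The sketch below illustrates this. The function names and the choice of a 20X stock for the dilution example are assumptions made for illustration, not details of the original procedure.

```python
# 1X SSC as defined in the text (molar concentrations).
SSC_1X_MOLAR = {"NaCl": 0.15, "sodium citrate": 0.015}


def ssc_composition(strength):
    """Molar concentration of each solute in SSC of the given strength (e.g. 10 for 10X)."""
    if strength < 0:
        raise ValueError("strength must be non-negative")
    return {solute: molar * strength for solute, molar in SSC_1X_MOLAR.items()}


def stock_volume_ml(stock_strength, final_strength, final_volume_ml):
    """Volume of concentrated stock to dilute with water to the final volume."""
    if stock_strength <= 0 or final_strength > stock_strength:
        raise ValueError("stock must be positive and at least as strong as the final solution")
    return final_volume_ml * final_strength / stock_strength


# Transfer buffer (10X) and filter wash (3X) from the Southern blot procedure.
for strength in (10, 3):
    parts = ", ".join(f"{c:.3f} M {s}" for s, c in ssc_composition(strength).items())
    print(f"{strength}X SSC: {parts}")

# Assumed example: 500 ml of 3X SSC made from a 20X stock.
print(f"20X stock needed: {stock_volume_ml(20, 3, 500):.0f} ml")
```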
DNA fragments were detected by hybridization to ³²P-labeled probes. The nitrocellulose filter was first wetted with 3X SSC and then prehybridized for 1-16 hours in a solution containing 50% formamide, 6X SSC, 1 mM EDTA, 0.1% SDS, 10 mM Tris-HCl pH 7.5, 10X Denhardt's solution (1X Denhardt's solution is 0.02% BSA, 0.02% Ficoll, 0.02% polyvinylpyrrolidone), 0.05% sodium pyrophosphate, 100 µg/ml denatured herring sperm DNA, and 25 µg poly(A). Hybridizations were carried out in the same buffer with the addition of denatured labeled probe to at least 1 × 10⁶ cpm/ml. Hybridization was for 36-48 hours at 37°C. After hybridization, blots were washed for one hour at room temperature in 2X SSC, 1X Denhardt's, and then washed twice for 90 minutes at 50°C in 0.1X SSC, 0.1% SDS. Blots were then rinsed twice at room temperature in 0.1X SSC, 0.1% SDS, followed by 4 rinses at room temperature in 0.1X SSC. After air drying, blots were exposed to Kodak XK-1 film with intensifying screens for 1-7 days at -70°C.

2. Southern Blot Analysis To Detect Repetitive DNA

Blots to detect the presence of repetitive DNA were performed in a similar way to the genomic Southern blots. Cloned genomic DNA fragments were separated on agarose gels and transferred to nitrocellulose as described above. These blots were probed with nick translated genomic DNA instead of specific DNA probes. Blots were washed as before and exposed to Kodak XK-1 film for 1-3 hours without intensifying screens.

3. Northern Blot Analysis

Two methods were used to determine the size of mRNAs, using either glyoxal (Thomas, 1980) or formaldehyde (Maniatis et al., 1982) as the denaturing agent. All buffers for Northern blot analysis were autoclaved to destroy endogenous ribonucleases.
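For the Southern hybridizations above, the minimum probe concentration is given in cpm/ml, while the labeling sections report probe quality as specific activity in cpm/µg. Dividing one by the other gives the mass of probe DNA per ml of hybridization solution. The sketch below shows that conversion using the specific activities quoted for nick translated and Klenow labeled probes. The function name is hypothetical and the result is a rough guide, not a value reported in the thesis.

```python
def probe_ng_per_ml(target_cpm_per_ml, specific_activity_cpm_per_ug):
    """Mass of labeled probe (ng) per ml needed to reach the target cpm/ml."""
    if target_cpm_per_ml < 0 or specific_activity_cpm_per_ug <= 0:
        raise ValueError("target must be >= 0 and specific activity must be > 0")
    micrograms = target_cpm_per_ml / specific_activity_cpm_per_ug
    return micrograms * 1000.0  # convert µg to ng


TARGET_CPM_PER_ML = 1e6  # minimum probe concentration stated in the text

# Specific activities quoted in the labeling sections (cpm/µg).
examples = {
    "nick translation, low end": 0.5e8,
    "nick translation, high end": 1.0e8,
    "Klenow labeling": 2.0e8,
}

for label, activity in examples.items():
    ng = probe_ng_per_ml(TARGET_CPM_PER_ML, activity)
    print(f"{label}: {ng:.0f} ng probe per ml")
```

The higher the specific activity, the less probe DNA is needed to reach the same signal, which is one practical reason for reporting specific activity alongside the labeling method.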
For glyoxal gels, RNA (up to 20 µg) was denatured at 50°C for 60 minutes in a total volume of 16 µl with 2.7 µl 6 M glyoxal, 8.0 µl DMSO, and 1.6 µl 0.1 M NaH2PO4 pH 7.0. Denatured RNA was separated by electrophoresis on 1% agarose gels for 6 hours at 100 V using a 10 mM NaH2PO4 pH 7.0 buffer. RNA was then transferred to nitrocellulose in 20X SSC buffer for 16 hours. After transfer, the nitrocellulose blot was air dried and baked at 80°C in a vacuum oven for 3-4 hours. Specific mRNA species were detected by hybridization to specific labeled DNA probes as described for Southern blots (see above).

For Northern blots using formaldehyde as the denaturing agent (Lehrach et al., 1977; Goldberg, 1980), RNA was denatured in a total volume of 20 µl with 2 µl 5X Gel buffer (0.2 M MOPS pH 7.0, 50 mM NaOAc, 5 mM EDTA), 3.5 µl formaldehyde, 10 µl formamide, and up to 20 µg RNA at 55°C for 15 minutes. RNA was separated by electrophoresis in agarose gels containing 1X Gel buffer (40 mM MOPS pH 7.0, 10 mM NaOAc, 1 mM EDTA) and 2.2 M formaldehyde, at 100 V for 4-6 hours. Prior to transfer, the gels were washed with H2O for 5 minutes, denatured with 50 mM NaOH, 10 mM NaCl for 45 minutes, neutralized for 45 minutes with 0.1 M Tris-HCl pH 7.5 and soaked in 20X SSC for 60 minutes. RNA was then transferred to a nitrocellulose filter overnight (16-24 hours) in 20X SSC. After transfer, blots were washed with 3X SSC and baked for 6 hours at 68°C. Specific mRNA species were detected by hybridization and washing as described for Southern blots (see above).

I. DNA SEQUENCE ANALYSIS

1. Construction Of M13 Clones

DNA was sequenced by the chain termination method (Sanger et al., 1977) using M13 sequencing vectors (Messing et al., 1981; Messing, 1983). DNA to be cloned into M13 vectors for sequencing was produced by restriction endonuclease digestion (Messing, 1983), or by sonication and end repair (Deininger, 1983) (see above).
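The RNA denaturation mixes given in the Northern blot section above are specified as volumes of stock solutions, so the final concentration of each component follows from stock concentration times the volume fraction. The sketch below works through three components whose stock strength is stated or unambiguous. It assumes that DMSO and formamide were added undiluted (100%), which the text does not state, and the function name is hypothetical.

```python
def final_concentration(stock_conc, stock_volume_ul, total_volume_ul):
    """Final concentration after adding `stock_volume_ul` of stock to a mix
    with the given total volume. The result is in the units of `stock_conc`."""
    if total_volume_ul <= 0 or not 0 <= stock_volume_ul <= total_volume_ul:
        raise ValueError("volumes must satisfy 0 <= stock volume <= total volume")
    return stock_conc * stock_volume_ul / total_volume_ul


# Glyoxal mix (16 µl total): 1.6 µl of 0.1 M NaH2PO4 and 8.0 µl of DMSO.
phosphate_mM = final_concentration(100.0, 1.6, 16.0)   # stock in mM
dmso_percent = final_concentration(100.0, 8.0, 16.0)   # assumes neat DMSO

# Formaldehyde mix (20 µl total): 2 µl of 5X gel buffer and 10 µl of formamide.
gel_buffer_x = final_concentration(5.0, 2.0, 20.0)
formamide_percent = final_concentration(100.0, 10.0, 20.0)  # assumes neat formamide

print(f"Phosphate in glyoxal mix: {phosphate_mM:.0f} mM")
print(f"DMSO in glyoxal mix: {dmso_percent:.0f}%")
print(f"Gel buffer in formaldehyde mix: {gel_buffer_x:.1f}X")
print(f"Formamide in formaldehyde mix: {formamide_percent:.0f}%")
```

The phosphate concentration in the glyoxal mix works out to the same 10 mM used for the running buffer, so the sample and the gel are at matching ionic strength when loaded.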
2. Screening Of M13 Clones

Typically, mixtures of DNA fragments were cloned into M13 vectors. To identify recombinant M13 clones containing exon encoding sequences, the M13 plaques were screened by plaque hybridization (Benton and Davis, 1977). Replicas of the plaques were transferred to nitrocellulose filters, and the DNA was denatured by treatment with 0.5 N NaOH, 1.5 M NaCl for 5 minutes. The nitrocellulose filters were neutralized by treatment with 1 M Tris-HCl pH 7.5 for 5 minutes followed by treatment with 0.5 M Tris-HCl pH 7.5, 1.5 M NaCl for 5 minutes. After air drying, the filters were baked at 68°C for two hours.

Recombinant M13 phage of interest were detected by hybridization to labeled probes and autoradiography. Prior to hybridization, filters were washed with 6X SSC, and then prehybridized in 6X SSC, 2X Denhardt's solution at 68°C for 1-4 hours. Filters were then hybridized overnight at 68°C in 6X SSC, 2X Denhardt's, 1 mM EDTA, 0.5% SDS, and denatured labeled probe (at least 1 × 10⁶ cpm/ml, specific activity >0.5 × 10⁸ cpm/µg). After hybridization, filters were washed twice at room temperature in 2X SSC followed by three washes at 68°C in 1X SSC, 0.5% SDS for 30-40 minutes, and finally rinsed in 1X SSC at room temperature. After air drying, the filters were exposed to Kodak XK-1 film overnight at -70°C with intensifying screens.

3. M13 DNA Isolation

DNA from clones of interest (see previous section) was prepared as described by Messing (1983). M13 clones were grown as 2 ml cultures in YT medium in 15 ml Falcon 2059 tubes using one plaque and 20 µl of host bacteria (JM101 or JM103) as inoculum. The cultures were incubated at 37°C for 6-16 hours (clones known to contain large inserts were grown for the shorter time period). Host cells were removed by centrifugation in a 1.5 ml microfuge tube (Eppendorf).
Phage particles in 1.3 ml of supernatant were precipitated by the addition of 0.3 ml of 20% PEG, 2.5 M NaCl, and incubation at room temperature for 15 minutes. M13 phage were collected by centrifugation in an Eppendorf centrifuge for 5 minutes. After removal of all the supernatant, the phage particles were resuspended in 200 µl of low Tris buffer (50 mM NaCl, 10 mM Tris-HCl pH 7.5, 1 mM EDTA). DNA was purified by successive extractions with phenol, phenol:chloroform (1:1, v/v), and chloroform. DNA was precipitated twice by the addition of 0.1 volume of 3 M NaOAc and 2 volumes of ethanol. The final DNA pellet was washed in 70% ethanol and resuspended in 50 µl of low Tris buffer.

4. DNA Sequencing

DNA in M13 clones was sequenced by the chain termination method (Sanger et al., 1977) as modified for phage M13 templates (Messing et al., 1981). Sequencing reactions were carried out using the dideoxy- and deoxyribonucleotide concentrations shown in Table I. Sequencing was performed by hybridizing 4 µl of template (from above) with 1 µl primer (0.03 OD260nm/ml, 17-mer: 5'-GTAAAACGACGGCCAG-3'), 1 µl H2O, and 2 µl 10X Hin buffer (600 mM NaCl, 100 mM Tris-HCl pH 7.5, 70 mM MgCl2) at 68°C for 10 minutes. The hybridization mix was allowed to cool to room temperature (20-30 minutes), and 1 µl of 15 µM dATP, 1.0-1.5 µl of α-³²P dATP (10 µCi/µl, 3000 Ci/mMole) and 2 µl of 1 u/µl DNA polymerase I Klenow fragment were added. An aliquot (2.5 µl) of this template/primer mix was added to 1.5 µl of the appropriate deoxy/dideoxy mix (see Table I). After 15-20 minutes of incubation at room temperature, 1 µl of 0.5 mM dATP was added. After a further 15-20 minutes of incubation at room temperature, 5 µl of stop-dye mix (98% formamide, 10 mM EDTA pH 8.0, 0.02% Xylene Cyanole, 0.02% Bromphenol Blue) was added.
The extended products were denatured by heating to 92°C for three minutes and 1-2 µl of these products were analyzed on 6% and 8% thin (0.35 mm), denaturing polyacrylamide gels (50 cm long) at 52 W in 1X TBE. After electrophoresis, the gels were dried at 80°C with a Bio-Rad gel drier for 20-30 minutes, and exposed to Kodak XK-1 film overnight at room temperature.

Table I: Sequencing Mixes

Nucleotide    d/ddG    d/ddA    d/ddT    d/ddC
dG              7.9    109.4    158.7    157.9
dT            157.6    109.4      7.9    157.9
dC            157.6    109.4    158.7     10.5
ddG           157.4        -        -        -
ddA               -    116.7        -        -
ddT               -        -    550.3        -
ddC               -        -        -    191.6

The concentrations of the dideoxy- and deoxy-ribonucleotide triphosphates used in the sequencing mixes for M13 DNA sequencing. Concentrations are µM. Concentrations were determined empirically by Dr. Joan McPherson, Dept. of Botany, UBC.

5. Computer Analysis Of DNA Sequence Data

The DNA sequences deduced from the sequencing gels (see above) were analyzed using the computer programs of Staden (1982) and Delaney (1982).

J. HETERODUPLEX ANALYSIS

To assist in determining the size and position of exons and introns in the bovine prothrombin gene, heteroduplex analysis was conducted by Dr. Kevin Ahern and Dr. George Pearson, Oregon State University. Heteroduplexes were formed between EcoRI and PstI cut bovine prothrombin cDNAs (pBII111 or pBII102; MacGillivray and Davie, 1984) and DNA either from the λ clones containing bovine genomic sequences (λBII1, λBII2, or λBII3) or from appropriately cleaved subclones of the bovine genomic sequences. Aliquots (100 ng) of each DNA to be analyzed by heteroduplex analysis were denatured together in 10 µl of 80% formamide by heating to 70°C for 10 minutes. Hybridization occurred at 37°C for one hour in a reaction mixture volume of 20 µl of 50% formamide, 200 mM NaCl. DNA spreading conditions were essentially as described by Chow and Broker (1981).
The entire duplex mixture was spread as hyperphase in a volume of 40 µl of 50% formamide, 100 mM NaCl, 5 mM EDTA, 100 ng of DNA length standard and cytochrome c (40 µg/ml). The DNA protein film was adsorbed to a parlodion coated grid, stained with uranyl acetate, and rotary shadowed with platinum-palladium. Grids were examined with a Zeiss EM-10A electron microscope operating at 60 kV. Molecular lengths were measured using a Videoplan II image analysis system. Single stranded DNA measurements were converted to double stranded lengths using the factor 1.16 to correct for compression during spreading.

K. SCREENING PHAGE LIBRARIES

1. Plating Phage Libraries

Genomic and cDNA libraries in a variety of different λ vectors were screened by the procedure of Benton and Davis (1977). These libraries were initially screened at a high density of 10⁴ plaques per 100 mm petri dish or 5 × 10⁴ per 150 mm petri dish. Appropriate dilutions of phage were incubated with host cells at 37°C for 10 minutes (to allow attachment of the phage) and then plated on NZYC plates with addition of soft NZYC agarose. Plates were incubated at 37°C until the phage plaques were visible but not touching each other, and the plates were placed at 4°C for one hour. Replicas of the plaques were transferred to nitrocellulose circles and incubated inverted on fresh NZYC plates at 37°C overnight to allow amplification of phage plaques. Master plates were stored at 4°C. For screens other than the first high density screen, this amplification step was omitted. DNA on the nitrocellulose filters was denatured, neutralized and baked as described for M13 screens (see above).

2. Screening Of Phage Filters

Different stringencies for hybridization and washing of filters were used depending on the homology of the probe to the desired sequences within the library. When the probe and the library were from the same species, the filters were hybridized and washed at high stringency, as described for screening M13 filters (see above). Cross hybridization between species required conditions of reduced stringency for hybridization and washing. Reduced stringency was obtained by reducing the temperature of the hybridization, increasing the NaCl concentration, and/or reducing the temperature of the washes. Cross hybridization between human and chicken DNA fragments was obtained by hybridization at 50°C and washing in 6X SSC at 45°C.

Conditions for autoradiography varied due to conditions of hybridization and washing, λ vector, type of library, and specific activity of the probe. The conditions varied from 4 hours at -20°C with intensifying screens to 3 days at -70°C with intensifying screens.

L. SCREENING PLASMID LIBRARIES

A human liver cDNA library was screened by the method of Benton and Davis (1977). The human cDNA library in pKT218 (Prochownik et al., 1983) was plated by Marion Fung. Approximately 10⁴ clones per 100 mm petri dish were spread on LB plates supplemented with tetracycline. Plates were incubated at 37°C until colonies were 1-2 mm in diameter. At this time, replicas were made on to nitrocellulose filters. The master plates were stored at 4°C, while the replica filters were grown on LB tetracycline plates until the colonies were 3-4 mm in diameter. The nitrocellulose filters were then transferred to LB plates supplemented with chloramphenicol (25 µg/ml) and incubated overnight at 37°C.
Colonies were lysed and the DNA was denatured by treating the nitrocellulose replica filters with 0.5 N NaOH, 1.5 M NaCl twice for 20 minutes. Nitrocellulose replicas were neutralized by treating with 1 M Tris-HCl pH 7.5 for 20 minutes followed by treatment with 0.5 M Tris-HCl pH 7.5, 1.5 M NaCl for 20 minutes. After air drying, the filters were baked at 68°C for two hours.

The human cDNA library was screened with a bovine cDNA probe, so conditions of reduced stringency were needed to detect the corresponding human cDNA. Prior to hybridization, the nitrocellulose filters were washed three times in 6X SSC to remove cell debris and prehybridized in 6X SSC, 2X Denhardt's at 68°C for two hours. Filters were hybridized and washed as described for screening M13 clones except that hybridization was at 60°C and washes were at 60°C and in 6X SSC. Positive clones were detected by autoradiography.

M. MAPPING THE END OF A mRNA TRANSCRIPT

1. Nuclease S1 Mapping

Uniformly labeled single stranded DNA probes for S1 analysis were produced as described by Nasmyth (1983). Oligodeoxyribonucleotide primers, either the M13 sequencing primer (see above) or a primer complementary to the prothrombin mRNA (5'-CCTCGGACGCGCGCCAT-3'), were used to prime DNA synthesis to produce single stranded probes complementary to the bovine prothrombin mRNA. Primer DNA (2.5 µl of 0.03 OD260nm/ml) was mixed with 2.5 µl of appropriate M13 clone template, 1.25 µl 10X Hin buffer (as above), and 1.25 µl of H2O, and was incubated at 68°C for 10 minutes and allowed to cool to room temperature (20-30 minutes). Nucleotides (1.25 µl containing 0.5 mM dCTP, 0.5 mM dGTP, and 0.5 mM dTTP), 2.5 µl α-³²P dATP (10 µCi/µl, 3000 Ci/mMole), and 1.25 µl (0.625 u) DNA polymerase I Klenow fragment were added and the mixture was incubated at 15°C for 60 minutes. The reaction was stopped by heating to 68°C for 10 minutes.
DNA of a specific size was produced by digestion with the restriction endonuclease EcoRI for 60 minutes. After digestion, the reaction was stopped by the addition of an equal volume of sequencing stop-dye mix (see above) and denatured by heating to 92°C for 5 minutes. The probe fragment was separated on a denaturing 6% polyacrylamide gel. The fragment was recovered by electroelution (Maniatis et al.,1982), phenol extracted, and precipitated with ethanol.

Approximately 10⁵ cpm of labeled probe was mixed with 100 µg total bovine liver RNA in 30 µl of 80% formamide, 40mM PIPES pH6.4, 400mM NaCl, 1mM EDTA. The mixture was incubated at 85°C for 5 minutes, followed by incubation at 42°C overnight. Nuclease S1 digestion was performed by the addition of 300 µl of nuclease S1 buffer (0.28M NaCl, 50mM NaOAc pH4.8, 4.5mM ZnSO4, 20 µg/ml denatured herring sperm DNA) containing 2000u/ml nuclease S1. The reaction was incubated at 37°C for 60 minutes, followed by phenol extraction. Nuclease S1 protected DNA fragments were recovered by addition of NH4OAc to 0.7M, 10 µg tRNA and an equal volume of isopropanol. The precipitate was recovered by centrifugation and redissolved in a small volume of sequencing stop-dye mix (see above). Products were separated on an 8% denaturing polyacrylamide gel, after denaturing the DNA at 92°C for 3 minutes. Protected DNA fragments were detected by autoradiography on the dried gel.

2. Primer Extension

Primer extension was performed essentially as described by Law and Brewer (1984). Six picomoles of 5' end labeled oligodeoxyribonucleotide (same oligo as used above for nuclease S1 mapping, specific activity was 3x10⁶ cpm/pmole) were resuspended with 5 µg total bovine liver RNA in 5 µl TE pH7.4 buffer (10mM Tris-HCl pH7.4, 1mM EDTA). The mixture was denatured by boiling for 3 minutes, and cooled in ice water.
In a total volume of 10 µl, KCl and Tris-HCl pH8.3 were added to 200mM and 10mM, respectively, and the mixture was kept on ice for 10 minutes. Each deoxyribonucleotide triphosphate was added to 1mM, Tris-HCl pH8.3 to 50mM, KCl to 50mM, MgCl2 to 10mM, actinomycin D to 40 µg/ml, and β-mercaptoethanol to 30mM in a total volume of 40 µl. Avian reverse transcriptase (50u) was added and the reaction was incubated at 37°C for 90 minutes. The reaction was terminated by the addition of 3 µl of 0.5M EDTA pH8.0, and 3 µl was mixed with 3 µl of sequencing stop-dye mix (see above). After denaturation at 92°C for 3 minutes, the products were separated on an 8% denaturing polyacrylamide gel. Products were detected by autoradiography of the dried gel using Kodak XK-1 film.

RESULTS

A. ISOLATION OF THE BOVINE PROTHROMBIN GENE

1. Southern Blot Analysis Of The Bovine Prothrombin Gene

As an initial step toward the characterization of the bovine prothrombin gene, bovine liver DNA was digested with several restriction endonucleases, and the resulting fragments were separated by agarose gel electrophoresis. After denaturation, the DNA fragments were transferred to nitrocellulose and analyzed with ³²P-labeled hybridization probes derived from cloned bovine prothrombin cDNAs. Several bovine prothrombin cDNA clones have been described (MacGillivray and Davie,1984) including pBII111 (that contains DNA coding for 5 bp of 5'-untranslated sequence and DNA coding for residues -43 to 579 of prothrombin) and pBII102 (that contains DNA coding for residues 69 to 582, a stop codon, 119 nucleotides of 3' untranslated sequence, and a poly(A) tail). When the Southern blots of bovine genomic DNA were analyzed with the cDNA inserts of both pBII111 and pBII102 as hybridization probes, several fragments were detected with each of the restriction enzymes used (Fig.4A).
The intensities of bands were similar to those found when pBII111 DNA was included in the blot at a concentration equivalent to a single copy gene (data not shown). When the 5' or 3' ends of the cDNA were used as hybridization probes, single restriction fragments were detected with many of the enzymes used (Fig.4B, 4C), suggesting that the bovine genome contains a single gene coding for prothrombin. From these blots, it was estimated that the prothrombin gene was at least 10 Kbp in length.

Fig.4: Southern Blot Analysis of the Bovine Prothrombin Gene
High molecular weight bovine liver DNA was digested with various restriction endonucleases and electrophoresed in a 0.7% agarose gel. After denaturation, the DNA was transferred to nitrocellulose and hybridized to prothrombin cDNA as indicated in part D. In each blot, lane M represents ³²P-labeled size markers (λ DNA cleaved with HindIII).
Blot A: Bovine DNA cleaved with BamHI (lane 1), EcoRI (lane 2), HindIII (lane 3), PstI (lane 4), BglII (lane 5), SstI (lane 6). The complete cDNA inserts of pBII111 and pBII102 (MacGillivray and Davie,1984) were used as hybridization probes.
Blot B: Bovine DNA was cleaved with BamHI (lane 1), HindIII (lane 2), EcoRI (lane 3), SstI (lane 4), BglII (lane 5), PstI (lane 6). The PstI-XhoI fragment of pBII111 was used as a hybridization probe.
Blot C: Bovine DNA was cleaved with HindIII (lane 1), EcoRI (lane 2), BamHI (lane 3). The BamHI-PstI fragment of pBII102 was used as a hybridization probe.
D: The restriction map of the cDNA clones pBII102 and pBII111 with 5' and 3' probes indicated (MacGillivray and Davie,1984). cDNA clones are flanked by PstI restriction sites.

2. Cloning Of The Bovine Prothrombin Gene

To study the bovine prothrombin gene more thoroughly, a bovine genomic phage library was constructed by Ross MacGillivray using bovine liver DNA cloned into the BamHI site of λ1059. One million phage from this library were screened by Ross MacGillivray by using the cDNA insert of pBII102 as a hybridization probe. Two independent positives were isolated, λBII1 and λBII2. Restriction endonuclease mapping and Southern blot analysis showed that these phage contained overlapping DNA and represented 25 Kbp of contiguous bovine genomic DNA (Fig.5). Southern blot analysis showed that these phage contained most of the prothrombin gene but lacked the 3' region. The λ1059 library was subsequently rescreened using the 3' BamHI-PstI fragment of pBII102 as a hybridization probe, but these screens only resulted in the reisolation of λBII1.

To isolate the 3' end of the prothrombin gene, 10⁶ phage of a second bovine liver genomic library (in λCharon 28 from Dr. Fritz Rottman, Case Western Reserve University) were screened by using the BamHI-PstI fragment of pBII102 as a probe. Three different clones, λBII3, λBII4, and λBII5, were identified and plaque purified. Restriction enzyme mapping showed that these phage clones overlapped λBII1 and λBII2 at positions that were 3' to the mapped prothrombin gene (Fig.5). λBII3 and λBII4 contained restriction fragments that were consistent with those detected in the genomic Southern blots with the 3' probe. λBII5 did not contain the 3'-most exons (see Fig.5) but was isolated because it contained exon 12, a part of which is contained in the BamHI-PstI fragment used as a probe.

A total of 42.4 Kbp of contiguous genomic DNA was represented by the five phage (λBII1-λBII5). This region contained all restriction enzyme fragments detected in the genomic Southern blot analysis (see Fig.4). The prothrombin gene maps to 15 Kbp in the middle of this cloned DNA (see sections B and C).

Fig.5: Restriction Map of the Bovine Prothrombin Gene
The restriction map was determined by analysis of the five recombinant phage λBII1-5 and subclones derived from them. The location of the prothrombin gene within this region is indicated (see section B). Exons are represented by black boxes and have been numbered from the 5' end of the gene. The scale at the top represents nucleotides in kilobase pairs.
[Fig.5 map rows: scale (kb, 0 to 42); exons 1-14; restriction sites for BamHI, EcoRI, HindIII, SstI, BglII, XhoI, XbaI, SalI and KpnI; clones λBII1 to λBII5.]

3. Analysis Of The Size Of The Bovine Prothrombin mRNA

To determine the size of the mRNA for bovine prothrombin, total bovine liver RNA was denatured with glyoxal and separated by size on an agarose gel (Thomas,1980). After transfer to nitrocellulose, the mRNA for bovine prothrombin was detected by hybridization to the ³²P-labeled cDNA insert of pBII111 as shown in Fig.6. Autoradiography of the blot revealed a single band which was 2150 ± 100 nucleotides in size (see Fig.6). The prothrombin cDNAs pBII111 and pBII102 contain 1998 nucleotides of coding sequence plus 3' untranslated sequence (MacGillivray and Davie,1984). As poly(A) tails are usually 180-200 nucleotides in length (Perry,1976), this indicated that <50 nucleotides of prothrombin mRNA 5' flanking sequences were absent from the cloned cDNAs.
Thus the 5' end of pBII111 must be very near to the site of mRNA initiation.

Fig.6: Northern Blot Analysis of Bovine Prothrombin mRNA
The size of the mRNA of bovine prothrombin was determined after denaturing 20 µg bovine liver RNA with glyoxal and electrophoresis on an agarose gel. The RNA was transferred to nitrocellulose and was hybridized to ³²P-labeled pBII111. The molecular weight markers represent the position of λ-HindIII DNA fragments. KB refers to kilobase pairs.
[Fig.6 size markers (kb): 9.96, 6.67, 4.25, 2.25, 1.96]

B. HETERODUPLEX MAPPING

1. Method

Heteroduplex analysis of the cloned bovine prothrombin gene was undertaken by Dr. Kevin Ahern in Dr. George Pearson's laboratory at Oregon State University. These heteroduplex data were useful in determining the sizes of the introns and exons, as well as indicating the possible presence of repetitive DNA elements. Examples of the heteroduplexes of prothrombin cDNAs (pBII111 and pBII102) to genomic clones (λBII1, λBII2, or λBII3, or subclones) are shown in Fig.7.

2. Exons And Introns

The sizes of the exons and introns determined by heteroduplex analysis, and a comparison of these data to the sizes determined by DNA sequence data, are shown in Tables II and III. The sizes of all exons were determined both by heteroduplex analysis and by DNA sequence analysis, and were found to be in excellent agreement with each other (Table II). The size of all but two introns could be determined by heteroduplex analysis (Table III). Two of the introns (G and M) are too short to be accurately measured, but were visible (Fig.7) (Irwin et al.,1985). The possibility of other small introns in the gene could not be discounted from the heteroduplex data (Irwin et al.,1985); however, DNA sequence data demonstrated that all introns were detected by heteroduplex analysis.
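The agreement between the two sets of exon sizes can be expressed numerically. The sketch below, in Python, takes the exon sizes as read from Table II and reports, for each exon, how far the DNA sequence value lies from the heteroduplex mean in units of the heteroduplex standard deviation. The one-standard-deviation cut-off and all variable names are choices made for this illustration only; they are not a criterion used in the original analysis.

```python
# Minimal sketch, assuming the values below are read correctly from Table II.
# Each row: (exon, size from DNA sequence, heteroduplex mean, heteroduplex SD),
# all in base pairs. Exon 1 is measured to the 5' end of pBII111.
TABLE_II = [
    (1, 94, 98, 14), (2, 164, 168, 18), (3, 25, 28, 8), (4, 51, 53, 13),
    (5, 106, 103, 13), (6, 137, 139, 15), (7, 315, 317, 26),
    (8, 135, 137, 15), (9, 127, 117, 16), (10, 168, 170, 19),
    (11, 174, 159, 19), (12, 182, 160, 17), (13, 71, 65, 10),
    (14, 266, 227, 17),
]


def deviations(rows):
    """Map exon number to |sequence size - heteroduplex mean| / SD."""
    return {exon: abs(seq - het) / sd for exon, seq, het, sd in rows}


dev = deviations(TABLE_II)

# Exons whose two size estimates differ by more than one standard deviation.
outside_one_sd = sorted(exon for exon, z in dev.items() if z > 1.0)

for exon, z in dev.items():
    print(f"exon {exon:2d}: {z:.2f} SD")
print("more than 1 SD apart:", outside_one_sd)
```

On these numbers, twelve of the fourteen exons fall within one standard deviation of the heteroduplex mean; the largest differences are at exons 12 and 14, toward the 3' end of the gene.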
As shown in Table III, there were some differences for those introns which were sized both by heteroduplex analysis and DNA sequence analysis. In general the shorter introns (see Table III) were overestimated in size by the heteroduplex analysis, possibly due to difficulties in measuring the short intron loops (see Fig.7). Sizes of the larger introns were in good agreement with sizes predicted from the restriction endonuclease map. The total size of the bovine prothrombin gene was estimated by heteroduplex analysis as 14.9 Kbp (Irwin et al.,1985). This is in close agreement with the size of 15.6 Kbp indicated by DNA sequencing and restriction endonuclease mapping (see section C).

Fig.7: Heteroduplex Analysis of the Bovine Prothrombin Gene
Electron micrographs of heteroduplexes formed between cloned bovine genomic DNA (λBII3) and cloned prothrombin cDNA (pBII111). Three representative heteroduplexes are shown together with interpretive drawings below each photograph. The thin line is single stranded DNA, the thick line is double stranded DNA. The bar in each panel represents 1 Kbp. Introns are lettered A through M starting at the 5' end of the gene where intron A is flanked by exons 1 and 2 (see Fig.8). Stem refers to an inverted repeat sequence found in intron F. IR indicates an inverted repeat sequence shared by introns I and L, where a-d locate the position of the IR within each intron (see Table IV). (From Irwin et al.,1985).

Table II: A Comparison of the Sizes of Exons Determined Both by DNA Sequence Analysis and Heteroduplex Analysis

EXON   SIZE FROM            SIZE FROM             REGION³
       DNA SEQUENCE (bp)    HETERODUPLEX (bp)²
1      94¹                  98(14)                -43 to -17
2      164                  168(18)               -17 to 38
3      25                   28(8)                 39 to 47
4      51                   53(13)                47 to 64
5      106                  103(13)               64 to 99
6      137                  139(15)               99 to 145
7      315                  317(26)               145 to 250
8      135                  137(15)               250 to 295
9      127                  117(16)               295 to 337
10     168                  170(19)               337 to 393
11     174                  159(19)               393 to 451
12     182                  160(17)               451 to 511
13     71                   65(10)                511 to 535
14     266                  227(17)               536 to 582

1, Exon 1 is measured to the 5' end of pBII111 to allow comparison.
2, For the heteroduplex analysis, the mean length of each exon in base pairs is listed with the standard deviation in parentheses.
3, Region represents the amino acid residues of prothrombin encoded by each exon.
Heteroduplex analysis data are taken from Irwin et al.(1985).

Table III: A Comparison of the Sizes of Introns Determined Both by DNA Sequence Analysis and Heteroduplex Analysis

INTRON   SIZE FROM            SIZE FROM             LOCATION¹
         DNA SEQUENCE (bp)    HETERODUPLEX (bp)³
A        342                  261(46)               -17
B        ND²                  601(62)               38-39
C        227                  170(39)               47
D        ND                   1504(73)              64
E        98                   112(19)               99
F        ND                   1381(99)              145
G        293                  235(23)               250
H        75                   <100                  295
I        ND                   1055(94)              337
J        ND                   397(46)               393
K        242                  216(29)               451
L        ND                   6940(255)             516
M        135                  <100                  535-536

1, Location is the amino acid residue(s) at the intron-exon junction.
2, ND, not determined.
3, Mean length with standard deviation in parentheses of the introns in base pairs is listed.
Heteroduplex analysis data are taken from Irwin et al.(1985).

3. Repetitive DNA

Heteroduplex analysis detected the presence of repeated DNA sequences within the genomic clones (see Fig.7). These repeated sequences were mapped to within introns F, I, and L. As shown in Table IV, the sizes and positions of some of these repeated sequences could be determined. One such element was found as an inverted repeat sequence within intron F, and a second was found as a homologous sequence in introns I and L (Fig.7). The presence of two homologous DNA sequences within the same genomic clone implies that these may be a type or types of repetitive DNA elements.

Table IV: Length and Location of Inverted Repeat Sequences Observed Within the Introns of the Bovine Prothrombin Gene

FEATURE¹   LENGTH² (bp)
stem       119(26)
ir         378(27)
a          586(57)
b          129(23)
c          4456(186)
d          2117(109)
loop³      5692(234)

1, Features are from Fig.7.
2, Lengths of DNA expressed as mean with standard deviation in parentheses.
3, Separation between ir sequences.

C. DNA SEQUENCE ANALYSIS OF THE BOVINE PROTHROMBIN GENE

To characterize the gene at the nucleotide level, small fragments of λBII1, λBII2, and λBII4 (or appropriate subclones) were cloned into M13 vectors, and exon-containing M13 phage were identified by plaque hybridization using prothrombin cDNA fragments as hybridization probes. DNA sequences of these exon-containing M13 phage were determined by the chain termination method. The nucleotide sequence of a total of 6.6 Kbp of genomic DNA was determined (Figs.8 and 9). Comparison of this sequence with the prothrombin cDNA sequence (MacGillivray and Davie,1984) allowed the identification of intron and exon sequences, as shown in Fig.9. The 5' end of the mRNA was mapped to nucleotide 1 in Fig.9 (see section D). The nucleotide sequence of 583 bp of 5' flanking sequence was determined in addition to the sequence of each of the 14 exons, and 145 bp of 3' flanking sequence (Figs.8 and 9). The complete nucleotide sequences of 7 of the 13 introns were determined, although the nucleotide sequence of only the intron/exon boundaries of the larger introns was analyzed. A total of 20 Kbp of DNA sequence data was obtained with the sequence of each nucleotide determined an average of 3 times. All intron-exon junctions were obtained using at least two different M13 clones. All exon sequence was determined at least twice except for a short portion of exon 7. Parts of the intron sequences, however, were determined only once.
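The figures quoted for the sequencing effort can be tied together with simple arithmetic. The Python sketch below is a rough budget, not a reconstruction of the author's bookkeeping: it adds up the flanking sequence, the exons and the seven completely sequenced introns, and compares the total with the 6.6 Kbp of unique sequence and the 20 Kbp of raw data. The intron sizes are those listed with a DNA sequence value in Table III; the exon sizes are taken from Table II, except that exon 1 is counted from nucleotide 1 of the mRNA (103 bp, from the junction position in Table V), which is an assumption made here.

```python
# Rough budget of the sequenced DNA, under the assumptions stated above.
FLANK_5_BP = 583           # 5' flanking sequence
FLANK_3_BP = 145           # 3' flanking sequence
UNIQUE_SEQUENCE_BP = 6600  # "6.6 Kbp" of genomic sequence (rounded in the text)
RAW_SEQUENCE_BP = 20000    # "20 Kbp" of sequence data (rounded in the text)

# Exons 1-14; exon 1 counted from mRNA nucleotide 1 (assumption, see lead-in).
EXON_SIZES_BP = [103, 164, 25, 51, 106, 137, 315, 135, 127, 168, 174, 182, 71, 266]

# Introns with a size determined from the DNA sequence (Table III).
COMPLETE_INTRONS_BP = {"A": 342, "C": 227, "E": 98, "G": 293,
                       "H": 75, "K": 242, "M": 135}

exon_total_bp = sum(EXON_SIZES_BP)
intron_total_bp = sum(COMPLETE_INTRONS_BP.values())
accounted_bp = FLANK_5_BP + exon_total_bp + intron_total_bp + FLANK_3_BP

# What is left must come from the boundaries of the six larger introns.
partial_intron_bp = UNIQUE_SEQUENCE_BP - accounted_bp

# Average number of times each nucleotide was read.
coverage = RAW_SEQUENCE_BP / UNIQUE_SEQUENCE_BP

print("complete introns:", len(COMPLETE_INTRONS_BP), "totalling", intron_total_bp, "bp")
print("accounted for:", accounted_bp, "bp")
print("left for partial introns: about", partial_intron_bp, "bp")
print(f"average coverage: {coverage:.1f}x")
```

Because 6.6 Kbp and 20 Kbp are rounded values, the remainder attributed to the partially sequenced introns (B, D, F, I, J and L) is only approximate, roughly 2.4 Kbp; the coverage of about 3 matches the average stated in the text.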
The DNA sequence confirmed earlier heteroduplex results (see section B) on the number and sizes of exons and introns, as shown in Tables II and III. The positions of the exons in the genomic clones and the sizes of the larger introns were confirmed by the presence of restriction enzyme sites in the DNA sequence that matched the previously determined restriction map (Fig.5). From the DNA sequence data shown in Fig.9 and the sizes of the larger introns as determined by heteroduplex analysis, the total size of the prothrombin gene is approximately 15.6 Kbp. Within experimental error, this value is in excellent agreement with the size of the gene determined by heteroduplex analysis (14.9 Kbp).

Fig.8: Partial Restriction Map and Sequencing Strategy for the Bovine Prothrombin Gene
Abbreviations used are: B - BamHI; Bg - BglII; E - EcoRI; H - HindIII; K - KpnI; P - PstI; X - XhoI; Xm - XmaI. Exons are shown as black boxes (1-14) under the restriction map. Introns are shown as single lines joining the exons, and are lettered A-M. The direction of transcription is indicated 5' to 3'. The arrows below the gene indicate the orientation and amount of DNA sequence obtained from independent M13 clones. The scale represents kilobase pairs.

Fig.9: Partial DNA Sequence of the Bovine Prothrombin Gene
The sequence was determined by analysis of the M13 clones indicated in Fig.8. The predicted amino acid sequence of bovine prothrombin is given above the nucleotide sequence. The site of transcription initiation is given as nucleotide 1 (G); the 5' flanking sequence is numbered backwards from this point. Possible promoter elements in the 5' flanking sequence include an inverted repeat (arrows), a CCAT sequence (boxed) and an ATTAA sequence (boxed) - see text for details. Intron/exon junctions are denoted by vertical arrows. The sizes of the larger introns have been taken from the heteroduplex analysis (Fig.7, and Table III). The putative polyadenylylation signals AATAAA (nucleotides 15,563-15,568) and CAGTG (nucleotides 15,599-15,603) are boxed, and the two polyadenylylation sites are denoted by the solid diamonds. In the protein coding region, the cleavage site giving rise to plasma prothrombin and the two sites of activation of prothrombin by factor Xa are marked by symbols.
[Fig.9: nucleotide sequence of the bovine prothrombin gene with the predicted amino acid sequence]

The sequences found at the intron-exon junctions are given in Table V, and the frequency of occurrence of nucleotides at each position around the junctions is given in Table VI. The sequences agree well with the splice junction consensus sequence found in other genes transcribed by RNA polymerase II (Mount,1982). All introns follow the GT/AG rule of Breathnach and Chambon (1981) except for the donor sequence of intron L that has the sequence GC. The sequence of this region of intron L was determined on two separate alleles of the bovine prothrombin gene (cloned from the two different phage libraries as described in section A). Both alleles gave an identical sequence except that nucleotide 8288 (Fig.9) in the intron was T in one allele and C in the other.

D. MAPPING THE SITE OF mRNA INITIATION

1.
Nuclease S1 Mapping The mRNA i n i t i a t i o n s i t e in the f i r s t exon was determined by nuclease S1 mapping using a probe that contained part of the 5' flanking sequence, the entire f i r s t exon, and part of the f i r s t intron. This analysis showed that the size of the f i r s t exon was about 100 nucleotides (data not shown). To determine the precise s i t e of mRNA i n i t i a t i o n , a more s p e c i f i c probe was made using a synthetic oligonucleotide to prime DNA synthesis from a genomic DNA fragment cloned into Ml 3. The 3 2 P - l a b e l e d 1 04 Table V: Nucleotide Sequences at the Intron-Exon Junctions  of the Bovine Prothrombin Gene Upper case l e t t e r s are exon sequence, lower case are intron sequence. Codon phase refers to the position within codons interupted by introns: 0 - between codons, I - after the f i r s t nucleotide of a codon, II - after the second nucleotide of a codon. Numbers at the intron-exon junctions indicate the p o s i t i o n of the intron in the mRNA sequence. 
1 05 EXON NUMBER 5' SPLICE DONOR INTRON 3' SPLICE ACCEPTOR CODON PHASE 1 CATGgtaagg 1 03 A cagcctcctcccccctgcagTGTT 1 04 I 2 CACGgtgagg 267 B tactcagccttgtttttcagGATG 268 O 3 ACAGgtgaac 292 C gtgtctctgggtctttctagCTTG 293 I 4 GAAGgtgagg 343 D ctggactggggtctccgcagGAAA 344 I 5 CAGAgtgagt 449 E tgagatgctttctattccagAATC 450 II 6 TGCGgtgaga 586 F tctctctcctcacccaccagGCCA 587 I 7 TGCGgtgaga 901 G ccgtgtctgggtccctgcagAGGA 902 I 8 GCCGgtaagg 1 036 H cggtcccgcttgccccttagACTG 1 037 I 9 CCTGgtgcgt 1 1 63 I cttccggcttcccgcctcagGCAG 1 1 64 II 1 0 CCAGgtcgga 1 331 J ctctgctggggtctgcacagGTAT 1 332 II 1 1 CCAAgttggg 1 505 K tccttctcccttccccaaagGCTG 1 506 II 1 2 GCCGgcaagt 1 687 L cactgcggttctctctcaagGTTA 1 688 I 1 3 GAAGgtaagc 1 758 M accccatatttcttcctcagAGCC 1 759 0 1 06 Table VI: Frequencies of Nucleotides at Intron-Exon Junctions DONOR FREQUENCIES -4 -3 -2 -1 +1 +2 +3 +4 +5 +6 G 4 2 1 11 13 0 7 2 12 5 A 1 5 5 2 0 0 4 10 1 2 T 2 0 2 0 0 12 1 0 0 3 C 6 6 5 0 0 1 1 1 0 3 CON N A A G G T R A G T C ACCEPTOR FREQUENCIES -20-19-18-17-16-15-14-13-12-11-10 -9 -8 -7 -6 -5 -4 -3 -2 -1 +1 +2 +3 +4 G 1 2 5 2 3 1 4 4 4 4 4 2 0 1 1 0 3 0 0 13 7 3 1 5 A 1 3 1 0 2 2 0 1 0 0 0 1 0 1 0 1 2 2 13 0 4 3 3 4 T 4 4 2 7 2 4 5 2 5 7 5 6 3 6 3 6 4 2 0 0 1 3 "7 2 C 7 4 5 4 6 6 4 6 4 2 4 4 10 5 9 6 4 9 0 0 1 4 2, ^  2 CON Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y N Y A G G N N N The frequencies of the d i f f e r e n t nucleotides at the intron-exon junctions of the bovine prothrombin gene are compared to the consensus (CON) of Mount(l982). Splice junctions are between -1 and +1. 1 07 probe DNA was released from the M13 DNA with r e s t r i c t i o n endonucleases, and was isolated by denaturing polyacrylamide gel electrophoresis. The probe consisted of nucleotides -212 to 41 of the bovine prothrombin gene (Fig.9). The probe DNA was hybridized to bovine l i v e r mRNA, and then treated with nuclease S1 . The size of the nuclease S1-resistant DNA was analyzed by denaturing polyacrylamide gel electrophoresis (Fig.10). 
A major DNA fragment was observed together with several minor fragments that were larger than the major fragment. This type of pattern has been observed by others, and may be the result of steric hindrance of the nuclease S1 by the mRNA cap structure (see Weaver and Weissmann,1979). The size of the major band was estimated by comparing its mobility to a chain termination sequencing ladder (Fig.10). This ladder was generated by DNA sequence analysis of the same M13 clone/oligonucleotide that was used to construct the nuclease S1 resistant probe. The major band from the nuclease S1 analysis corresponds to the G at position 1 in Fig.9.

2. Primer Extension

As an alternative method of analyzing the 5' end of the bovine prothrombin gene, bovine liver RNA was reverse transcribed using the synthetic oligonucleotide (same as above) as a primer. The primer extension products were then analyzed by denaturing polyacrylamide gel electrophoresis. Two DNA fragments were observed: a major band corresponding to nucleotide 10 in Fig.9 and a minor band corresponding to nucleotide 2 in Fig.9 (Fig.11). Nucleotide 1 (Fig.9) corresponds to a consensus mRNA initiation site (a purine flanked by pyrimidines; Breathnach and Chambon,1981), suggesting that this is the true start site of prothrombin mRNA. In that case, the size of prothrombin mRNA would be 2025 nucleotides which, with a poly(A) tail, agrees well with the size of the mRNA determined by Northern blot analysis (2150 ± 100 nucleotides, section A-3). The primer extension product terminating at position 10 may be the result of stalling of the reverse transcriptase due to secondary structure in the mRNA.

Fig.10: Nuclease S1 Mapping of the Prothrombin mRNA
Autoradiograph of protected DNA fragments separated by electrophoresis after digestion of a labeled single stranded DNA probe complementary to nucleotides -212 to 41 (Fig.9) by nuclease S1. The DNA sequence of the probe is shown beside the protected DNA; the sequence is complementary to that in Fig.9. The major band corresponds to mRNA initiation at nucleotide 1.

Fig.11: Primer Extension Analysis of Prothrombin mRNA
Autoradiograph of the products of extension on bovine prothrombin mRNA by avian reverse transcriptase, primed with an oligodeoxyribonucleotide complementary to nucleotides 25 to 41 (Fig.9). The DNA sequence of the 5' end of the bovine prothrombin gene (see Fig.9) is shown in parallel with the extended products (lanes G, A, T, C). The DNA sequence is complementary to that in Fig.9. The major termination site with avian reverse transcriptase was at nucleotide 10, with a minor site at nucleotide 2.

E. MAPPING REPETITIVE DNA

The presence of repetitive DNA within the genomic clones was detected by hybridization of labeled genomic DNA to the cloned DNA. The hybridization signal detected by autoradiography for each cloned restriction endonuclease fragment will then be proportional to the number of copies of that sequence found within the bovine genome; therefore, fragments containing repetitive DNA sequences will be detectable upon the shortest exposure of the autoradiogram. Figure 12 shows one of these blots, together with the corresponding gel stained with ethidium bromide. With these blots (Fig.12), repetitive DNA elements could be mapped to several locations within and flanking the bovine prothrombin gene (Fig.13). As indicated in the previous section, repetitive DNA elements were identified within some of the genomic clones by heteroduplex analysis (Fig.7, Table IV) (Irwin et al.,1985). These inverted repeats (Table IV) were also detected by Southern blot analysis (Fig.13), confirming the presence of repetitive DNA.
Fig.12: Southern Blot Analysis of Repetitive DNA Within the Bovine Prothrombin Gene
DNA (lanes 1-5, λBII2; lanes 6-10, λBII3; lanes 11-15, λBII4) was cut with various restriction endonucleases and separated on an agarose gel. A, ethidium bromide stained agarose gel. B, autoradiograph (100 minutes) of the DNA from A after hybridizing to nick translated bovine genomic DNA (1 x 10^8 cpm/µg). Lanes: 1, EcoRI; 2, HindIII; 3, EcoRI-HindIII; 4, EcoRI-BamHI; 5, SstI-BamHI; 6, HindIII; 7, SstI-BamHI; 8, XbaI-BamHI; 9, EcoRI-BamHI; 10, HindIII-BamHI; 11, SstI; 12, BglII; 13, EcoRI; 14, XbaI; 15, XbaI-HindIII; M, marker, λ DNA digested with HindIII.
[Fig.12, panels A and B: lane order 1-5, M, 6-10, M, 11-15; size marker labels 23.4, 9.96, 6.67, 4.25, 2.25, 1.96, 0.59]

Fig.13: Map of Repetitive DNA in the Bovine Prothrombin Gene
The restriction map from Fig.5 is shown with areas containing repetitive DNA sequences indicated above the restriction endonuclease cut sites as solid bars.
[Fig.13: scale (Kb) 0 to 42; tracks for exons 1-14, the gene, BamHI, EcoRI, HindIII, SstI, BglII, XhoI, XbaI, SalI and KpnI sites, and the clones (including λBII1 and λBII2)]

Nucleotides 6,390-6,700 in Fig.9 represent the approximate location of one of the repeated DNAs from heteroduplex analysis (Table IV). This sequence, by hybridization analysis, was shown to contain a repetitive DNA element. Comparison of the DNA sequence in Fig.9 (especially nucleotides 6,390-6,900) to known bovine repetitive DNA elements (Watanabe et al.,1982; Richardson et al.,1986) failed to find any homology. Thus, the location and identity of the repetitive DNA elements within the bovine prothrombin gene are unknown.

F.
ISOLATION OF A HUMAN PROTHROMBIN cDNA

Degen et al. (1983) used the bovine prothrombin cDNA as a hybridization probe to isolate human prothrombin cDNAs. The hybridizations were performed under conditions of reduced stringency to allow for mismatches between the bovine and human sequences. Three of the positives were characterized. The longest clone, pHII3, contained DNA coding for part of a leader peptide of 36 amino acids as well as the entire coding region of the plasma protein, a 3' untranslated sequence of 97 bp and a poly(A) tail (Fig.14). To isolate a human prothrombin cDNA clone for the remainder of the leader peptide, a different cDNA library (Prochownik et al.,1983) was screened using pBII111 as a hybridization probe. One hundred and twenty thousand colonies were screened by colony hybridization (Benton and Davis,1977) using the same conditions as Degen et al. (1983). Eight of the positives were characterized further. By restriction endonuclease mapping, pIIH13 appeared to be a full-length cDNA for human prothrombin.

Fig.14: Restriction Endonuclease Map of the Human Prothrombin cDNAs
Restriction endonuclease map of the human prothrombin cDNAs pIIH13 and pHII-3 (Degen et al.,1983). cDNA inserts are flanked by PstI sites by the cloning procedure. Open bars correspond to plasma prothrombin coding region, solid bars correspond to the prepro-leader, and hatched bars correspond to the 5' and 3' untranslated sequences. Arrows below pIIH13 refer to M13 clones used for DNA sequence analysis.
[Fig.14: restriction map; the site labels are not fully recoverable from the scan (XhoI, PstI and BamHI are legible)]

G. PARTIAL DNA SEQUENCE OF pIIH13

DNA sequence of the 5' region of pIIH13 was determined as shown in Fig.15 on both strands using the chain termination method (Sanger et al.,1977).
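The reading-frame analysis applied to the 5' sequence of pIIH13 can be sketched as follows. This is a minimal illustration, assuming the 327 nucleotides are exactly as transcribed from Fig.15 and that plasma prothrombin begins at nucleotide 157; the helper names (`translate`, `upstream_in_frame_stop`) are hypothetical and are not part of the original analysis.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

# 5' end of pIIH13 as transcribed from Fig.15 (one string per figure row).
FIG15_ROWS = [
    "CCC TAG TGA CCC AGG AGC TGA CAC ACT ATG GCC CGC ATC CGA GGC TTG CAG CTG CCT GGC TGC CTG GCC CTG GCT",
    "GCC CTG TGT AGC CTT GTG CAC AGC CAG CAT GTG TTC CTG GCT CCT CAG CAA GCA CGG TCG CTG CTC CAG CGG GTC",
    "CGG CGA GCC AAC ACC TTC TTG GAG GAG GTG CGC AAG GGC AAC CTG GAG CGA GAG TGC GTG GAG GAG ACG TGC AGC",
    "TAC GAG GAG GCC TTC GAG GCT CTG GAG TCC TCC ACG GCT ACG GAT GTG TTC TGG GCC AAG TAC ACA GCT TGT GAG",
    "ACA GCG AGG ACG CCT CGA GAT AAG CTT",
]
SEQ = "".join(FIG15_ROWS).replace(" ", "")


def translate(seq, start):
    """Translate from 1-based position `start` until a stop codon or the end of `seq`."""
    protein = []
    for i in range(start - 1, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def upstream_in_frame_stop(seq, start):
    """Return the 1-based position of the nearest in-frame stop codon 5' of `start`, or None."""
    for i in range(start - 4, -1, -3):
        if CODON_TABLE[seq[i:i + 3]] == "*":
            return i + 1
    return None


MATURE_START = 157                      # first nucleotide of plasma prothrombin (from the text)
atg = SEQ.find("ATG") + 1               # 1-based position of the first ATG
stop = upstream_in_frame_stop(SEQ, atg)
protein = translate(SEQ, atg)
leader_length = (MATURE_START - atg) // 3

print("first ATG at nucleotide", atg)
print("nearest in-frame stop upstream at nucleotide", stop)
print("leader peptide length:", leader_length)
print("leader:", protein[:leader_length])
print("start of plasma protein:", protein[leader_length:leader_length + 10])
```

The in-frame stop codon upstream of the first ATG is what rules out a longer leader peptide being encoded by this cDNA.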
Translation of the cDNA sequence using the standard genetic code showed that pIIH13 did indeed contain DNA coding for human prothrombin. Nucleotides 157-327 (Fig.15) encode amino acid residues 1-57 of plasma prothrombin. Nucleotides 49-156 (Fig.15) encode the part of the leader sequence in pHII3 as reported by Degen et al. (1983). Upstream of nucleotide 49 is an ATG codon (nucleotides 28-30, Fig.15) that is in the same position as the initiator methionine found in bovine prothrombin (MacGillivray and Davie,1984). Six nucleotides upstream of this ATG codon is an in-frame TGA stop codon (nucleotides 19-21, Fig.15), strongly suggesting that the ATG at nucleotides 28-30 encodes the initiator methionine for human prothrombin mRNA. In that case, human prothrombin is synthesized as a precursor containing a leader peptide of 43 amino acid residues. This is the same length as the bovine prothrombin leader peptide.

Fig.15: Nucleotide Sequence of the 5' End of pIIH13
The predicted amino acid sequence of human prepro-prothrombin is shown above the cDNA sequence. The leader peptide has been numbered backwards from the site of cleavage that gives rise to plasma prothrombin.

Nucleotides 1-75 (residues -43 to -28):
                                    Met Ala Arg Ile Arg Gly Leu Gln Leu Pro Gly Cys Leu Ala Leu Ala
CCC TAG TGA CCC AGG AGC TGA CAC ACT ATG GCC CGC ATC CGA GGC TTG CAG CTG CCT GGC TGC CTG GCC CTG GCT

Nucleotides 76-150 (residues -27 to -3):
Ala Leu Cys Ser Leu Val His Ser Gln His Val Phe Leu Ala Pro Gln Gln Ala Arg Ser Leu Leu Gln Arg Val
GCC CTG TGT AGC CTT GTG CAC AGC CAG CAT GTG TTC CTG GCT CCT CAG CAA GCA CGG TCG CTG CTC CAG CGG GTC

Nucleotides 151-225 (residues -2 to 23):
Arg Arg Ala Asn Thr Phe Leu Glu Glu Val Arg Lys Gly Asn Leu Glu Arg Glu Cys Val Glu Glu Thr Cys Ser
CGG CGA GCC AAC ACC TTC TTG GAG GAG GTG CGC AAG GGC AAC CTG GAG CGA GAG TGC GTG GAG GAG ACG TGC AGC

Nucleotides 226-300 (residues 24 to 48):
Tyr Glu Glu Ala Phe Glu Ala Leu Glu Ser Ser Thr Ala Thr Asp Val Phe Trp Ala Lys Tyr Thr Ala Cys Glu
TAC GAG GAG GCC TTC GAG GCT CTG GAG TCC TCC ACG GCT ACG GAT GTG TTC TGG GCC AAG TAC ACA GCT TGT GAG

Nucleotides 301-327 (residues 49 to 57):
Thr Ala Arg Thr Pro Arg Asp Lys Leu
ACA GCG AGG ACG CCT CGA GAT AAG CTT

H. ISOLATION OF THE HUMAN PROTHROMBIN GENE

1. Isolation Of Genomic Clones

To isolate DNA coding for the human prothrombin gene, approximately 10^6 clones of the partial HaeIII/AluI fetal human liver genomic library in λCh4A (Lawn et al.,1977) were screened using 32P-labeled pIIH13 as a hybridization probe. Three different λ clones were identified and plaque purified. The DNA contained in these clones was characterized by restriction endonuclease mapping (Fig.16). One of these clones (λHII1) contained a 5.0 Kbp insert and was identical to the previously isolated genomic clone λ10 (Degen et al.,1983). The other two clones (λHII2, λHII3) overlapped this sequence and contained a total of 23 Kbp of human genomic DNA (Fig.16). Part of the human prothrombin gene has been located in this region by Degen et al. (1983) as shown in Fig.16. The gene has been estimated to be greater than 20 Kbp in size (unpublished results quoted in Nagamine et al.,1984). In that case, the cloned DNA shown in Fig.16 does not contain the complete human prothrombin gene. In addition, the restriction map of the genomic clones shown in Fig.16 failed to account for all the restriction endonuclease fragments detected by genomic Southern blot analysis (Fig.17).
Thus, it appears that these genomic clones do not contain the 3' end of the human prothrombin gene.

2. Partial DNA Sequence Analysis Of The Human Prothrombin Gene

To prove that the genomic clones isolated contained the gene for human prothrombin, partial DNA sequence analysis of a 1.0 Kbp HindIII-EcoRI restriction endonuclease fragment of λHII1 was undertaken (see Fig.16). This fragment was found to contain exons 10 and 11 of the human prothrombin gene, as was expected from the restriction endonuclease map of Degen et al. (1983) (see Fig.16).

Fig.16: Restriction Map of the Human Prothrombin Gene
The restriction map was derived from the three clones λHII1, λHII2, and λHII3. Genomic DNA fragments are flanked by EcoRI restriction sites (E). The exons are indicated as solid boxes, and introns as the thin line; both exons and introns have been placed using data from Degen et al. (1983,1985) and Davie et al. (1983).
[Fig.16: restriction map; legible site labels include EcoRI, BamHI and HindIII]

Fig.17: Southern Blot Analysis of the Human Prothrombin Gene
Human genomic DNA (10 µg) was digested with various restriction endonucleases and electrophoresed in an agarose gel. After denaturation, the DNA was transferred to nitrocellulose and hybridized to 32P-labeled pIIH13. Lane M represents 32P-labeled size markers comprised of λ DNA cleaved with HindIII. Human DNA was cleaved with HindIII (lane 1), BamHI (lane 2), EcoRI (lane 3), SstI (lane 4), BglII (lane 5), and PstI (lane 6).
[Fig.17: size marker labels 23.4, 9.96, 6.67, 4.25, 2.25, 1.96, 0.59]

I. ISOLATION OF cDNA CLONES FOR CHICKEN PROTHROMBIN

1. Conditions Of Screening

To initiate studies of the prothrombin gene in other species, a chicken liver cDNA library (generously provided by Dr.
Todd Kirshgessner, UCLA) was screened at low stringency using a 32P-labeled human prothrombin cDNA (pIIH13) as a hybridization probe. Low stringency was used in an attempt to detect any weak cross-hybridization signal between the human and chicken sequences, and the library was screened on duplicate filters because the high background made positive clones difficult to identify on a single filter. From the initial 30,000 recombinant clones screened, 10 positives were identified, two of which were studied further. One of these, pCII1, contained a 950 bp insert.

2. DNA Sequence Of pCII1

The entire DNA sequence of pCII1 was determined (nucleotides 650 to 1569, Fig.18). One of the potential translation products of this DNA sequence was found to have approximately 70% amino acid sequence identity with both bovine and human prothrombin in the serine protease domain. This high amino acid identity suggested that this was chicken prothrombin. Amino acid sequence data (generously provided by Dr. Dan Walz, Wayne State Univ.) confirmed that the sequence corresponded to the chicken prothrombin gene. Amino acid sequence data were available for two regions of chicken thrombin: the amino-terminal 27 amino acid residues of the B chain, and a 29 amino acid residue long section within the B chain of thrombin (383 to

Fig.18: DNA Sequence of Chicken Prothrombin cDNAs
The predicted amino acid sequence is shown above the DNA sequence. The two polyadenylation sites are indicated by the triangles, with the AATAAA polyadenylation signals underlined. 0 indicates the catalytic triad residues His350, Asp406, and Ser511.
S o l i d a r r o w s i n d i c a t e t h e t w o f a c t o r X a c l e a v a g e s i t e s , a n d t h e o p e n a r r o w t h e s i t e o f c l e a v a g e b y t h r o m b i n . 130 100 HQ 120 Lys Tyr Pro His He Pro Lys Phe Asn Ala Ser He Tyr Pro Asp Leu Thr Glu Asn Tyr Cys Arg Asn Pro Asp Asn Asn Ser Glu Gly Pro Trp Cys Tyr Thr AAA TAT CCA CAT ATA CCT AAA TTT AAT GCC TCC ATT TAT CCT GAC CTC ACT GAG AAC TAC TGC AGG AAC CCA GAC AAC AAC TCA GAA GGT CCA TGG TGC TAC ACA IS 30 45 60 75 y\ 90 105 130 140 150 f> 160 Arg Asp Pro Thr Val Glu Arg Glu Glu Cys Pro He Pro Val Cys Gly Gin Glu Arg Thr Thr Val Glu Phe Thr Pro Arg Val Lys Pro Ser Thr Thr Gly Gin CGA GAC CCA ACA GTG GAA CGG GAA GAG TGC CCC ATT CCA GTA TCT GGT CAA GAA AGG ACA ACA GTT GAG TTC ACT CCG CGG GTC AAA CCA TCA ACC ACA GGG CAG 120 . 135 150 165 180 195 210 170 180 190 Pro Cys Glu Ser Glu. Lys Gly Met Leu Tyr Thr Gly Thr Leu Ser Val Thr Val Ser Gly Ala Arg Cys Leu Pro Trp Ala Ser Glu Lys Ala Lys Ala Leu Leu CCT TGT GAA TCA GAG AAA GGA ATG CTT TAT ACA GGG ACG CTT TCA GTC ACT GTA TCT GGG GCT AGG TGC CTG CCA TGG GCC TCA GAG AAG GCC AAA GCA TTG CTC 225 240 255 270 285 300 315 200 210 220 230 Gin Asp Lys Thr He Asn Pro Glu Val Lys Leu Leu Glu Asn Tyr Cys Arg Asn Pro Asp Ala Asp Asp Glu Gly Val Trp Cys Val He Asp Glu Pro Pro Tyr CAA GAC AAA ACC ATT AAC CCA GAA GTG AAG CTG CTG GAG AAT TAC TGT CGG AAC CCT GAT GCA GAT GAT GAG GGT GTC TGG TGT GTA ATA GAT GAA CCA CCA TAC 330 345 360 375 390 405 420 240 250 \f 260 Phe Glu Tyr Cys Asp Leu His Tyr Cys Asp Ser Ser Leu Glu Asp Glu Asn Glu Gin Val Glu Glu He Ala Gly Arg^hr He Phe Gin Glu Phe Lys Thr Phe TTT GAA TAC TGT GAC CTG CAT TAC TGC GAC AGC TCG CTC GAG GAT GAG AAT GAA CAG GTG GAG GAA ATA GCG GGA CGT ACC ATC TTT CAA GAG TTC AAA ACC TTC 435 450 465 480 495 S10 525 270 280 290 300 Phe Asp Glu Lys Thr Phe Gly Glu Gly Glu Ala Asp Cys Gly Thr Arg Pro Leu Phe Glu Lys Lys Gin He Thr Asp Gin Ser Glu Lys Glu Leu Met Asp Ser TTC GAT GAA AAA ACT TTT GGT GAA GGT GAA GCA GAC TGT GGA 
ACT CGC CCT TTA TTC GAA AAG AAA CAG ATA ACA GAC CAA AGT GAG AAG GAG CTG ATG GAC TCC 540 555 570 585 600 615 630 310 320 330 Tyr Met Gly Gly Arg'Val Val His Gly Asn Asp Ala Glu Val Gly Ser Ala Pro Trp Gin Val Met Leu Tyr Lys Lys Ser Pro Gin Glu Leu Leu Cys Gly Ala TAC ATG GGA GGC AGA GTT GTA CAC GGG AAC GAT GCA GAA GTT GGA AGC GCC CCC TGG CAG GTG ATG CTC TAC AAA AAG AGT CCT CAA GAG CTG CTG TGT GGT GCC > 645 660 675 690 705 720 735 0 340 350 360 370 Ser Leu He Ser Asn Ser Trp He Leu Thr Ala Ala His Cys Leu Leu Tyr Pro Pro Trp Asp Lys Asn Leu Thr Thr Asn Asp He Leu Val Arg Met Gly Leu AGC CTC ATC AGT AAC AGC TGG ATC CTC ACT GCT GCT CAT TGC CTT CTT TAT CCA CCC TGG GAC AAG AAC TTA ACT ACA AAT GAC ATC TTG GTG CGG ATC GGC TTG 750 765 780 795 810 825 840 380 390 400 0 His Phe Arg Ala Lys Tyr Glu Arg Asn Lys Glu Lys He Val Leu Leu Asp Lys Val He He His Pro Lys Tyr Asn Trp Lys Glu Asn Met Asp Arg Asp He CAT TTC AGC GCA AAA TAC GAA AGG AAT AAA GAG AAA ATT GTT CTG TTG GAT AAA GTC ATC ATC CAT CCT AAG TAC AAC TGG AAA GAG AAC ATG GAC CGA GAT ATT 855 670 885 900 915 930 945 410 420 430 440 Ala Leu Leu His Leu Lys Arg Pro Val He Phe Ser Asp Tyr He His Pro Val Cys Leu Pro Thr Lys Glu Leu Val Cln Arg Leu Met Leu Ala Gly Phe Lys GCA CTC CTG CAC CTG AAG CGA CCG GTC ATC TTC AGC GAC TAC ATC CAT CCT GTC TGC TTG CCT ACC AAG GAG CTT GTG CAG AGG CTG ATG CTG GCA GGT TTT AAA 960 975 990 1 1 005 1 020 1 035 1 050 450 460 470 Gly Arg Val Thr Gly Trp Gly Asn Leu Lys Clu Thr Trp Ala Thr Thr Pro Glu Asn Leu Pro Thr Val Leu Gin Gin Leu Asn Leu Pro He Val Asp Gin Asn GGG CGG GTA ACT GGC TGG GGA AAT CTG AAA GAA ACG TGG GCC ACT ACC CCA GAA AAC CTG CCA ACA GTT CTG CAA CAG CTC AAT CTG CCC ATT GTA GAC CAA AAC 1 065 1 080 1 095 1 110 1 125 1 140 1 155 480 490 500 510 0 Thr Cys Lys Ala Ser Thr Arg Val Lys Val Thr Asp Asn Met Phe Cys Ala Gly Tyr Ser Pro Glu Asp Ser Lys Arg Gly Asp Ala Cys Glu Gly Asp Ser Gly ACC TGC AAG GCA TCC ACC AGG GTT AAA GTC ACA GAC AAT ATG TTC TGT GCT GGT TAC ACT CCT GAA GAC TCA AAG 
AGA GGA GAT GCT TGT GAA GGG GAC AGT CGG 1 170 1 165 1 200 1 2L5 1 230 1 245 1 260 520 530 540 Gly Pro Phe Val Met Lys Asn Pro Asp Asp Asn Arg Trp Tyr Gin Val Gly He Val Ser Trp Gly Glu Gly Cys Asp Arg Asp Gly Lys Tyr Gly Phe Tyr Thr GGG CCT TTT GTA ATG AAG AAC CCA GAT GAC AAC CGC TGG TAT CAA GTG GGA ATA GTT TCA TGG CGA GAA GGC TGT GAC CGA GAT GGC AAA TAT GGA TTT TAC ACT 1 275 1 290 1 305 1 320 1 335 1 350 1 365 550 560 564 - His Val Phe Arg Leu Lys Lys Trp Met Arg Lys Thr He Glu Lys Gin Gly STOP CAC GTA TTC CGC CTG AAA AAA TGG ATG CGA AAA ACC ATT GAA AAA CAA GGA TAG AAG AGA GCT TCC CTT GCT TGT TCT CAG TTC TGC TAC AAT ACT CCA CTT CTT 1 380 1 395 1 410 1 425 1 440 1 455 1 470 V AAA AAC ATA CAC ATT GAA CAA ATC TTG AAG TGG AAG TTA AAT CCC TGC AAC TTG ACA AAG GAA CGT GTT CCT CCT TGA AAA TAA AAG TTC TCA ACC ATC TTC CTC 1 485 1 500 1 515 1 530 1 545 1 560 1 575 CTT GTG TTC ATG CTA AGC TGA ACA CCA CCT GAA TCC ATG CCA TCA CAA TAG CTA GCA GCA CCA ACA CAA CAG CAC CTG CAG TAC TGC TAG TTA AGA TGC TGC CCT 1 590 1 605 1 620 1 635 1 650 1 665 1 660 TCA AGT GTT CTC CTC TAC TCT ATC AGC AGT AAC AAT CAA CAG ATT TTA GAC TTC AGA TGA TGG ACT TCA GTC ACA GTA AGC AAG ACG TCC CTT GGA CAC TGT CCA 1 695 1 710 1 725 1 740 1 755 1 770 1 785 TTC CCC CCT TCA ACT AAA TTC ATT TTC TGT TCT AGA AAT CTG AAA GGA TAA CAA GCT GGA GAT ACC TAC CCA CCT TAC AAG AAC TGT AGC ATT ATT CAA AAT GCC 1 800 1 815 1 630 1 645 1 860 1 675 1 690 ACA TCA AGA CTA AAG CAA CTA TAG CCT TTG TTG ATA AGA CAG ACA TTG TTC TCA GCC ACA ACA GCA GCA ACA AAA TAC CAT CTG TGC TTC TTA CAA AGT TAG TGT 1 905 1 920 1 935 1 950 1 965 1 980 1 995 CTT AAG TTA CAG ATG TCA TCT ATG TGC AAC TTA ATG AGG TAC AGA AAT AGG GGG TTT GAA TAG ATG AAG TAA CAC ACG CAT TTC TGC ATA GCA GTA ACT TTC TAT 2 010 2 025 2 040 2 055 2 070 2 065 2 100 ATG GCC AAG TAC TGC TGG GAC TTG AAA GTA TAT TTT CCA CTG GCA TAA CTA GAT TCA GAA GGA AGC ACT TCG TAC ACA CAA TTT TCA AAG GTC TTC CAA AGG GCA 2 115 2 130 2 145 2 160 2 175 2 190 2 205 GCA TCC GTC ACT GTA CCT ATT TTG TTC TTA 
TAA AAC TGT TTA GGA TTC ACC CTT AAA AGA AGC CCC ACT TCT TTC ATG AAC TCT TCA GCA AAG ACA CAG AAG TAC
2220 2235 2250 2265 2280 2295 2310

AAT ACT ATT ATA TAG ACT GGC CAA TCT GTT CAG ACC AGT TTT CTC TCA AAC TAA AGA GGG ATT TGG AAG CTA TCT TTG CTC CCC AAA ACA TCA TTC TCA AAT CCC
2325 2340 2355 2370 2385 2400 2415

TCA TCC CTC ACA GTG CCA TCA ACT TAC AGA AAC AAC CAA TAG ACA AAA GTT CTT CCT CCT TAA ATG GAG TAT TAA AGG ACA ATC- ACT TCA AAA AAG ATG CTA CAG
2430 2445 2460 2475 2490 2505 2520 V

AGA ACT ATC CAA AAT TTG TTG GAA, ATn, flAC AGT TAT TAA TC
2529 2538 2547 2556

411, Fig.18). Of these 56 residues, two differences were observed between the protein sequence predicted by the cDNA and that determined by Walz. Position 310 (Fig.18) was assigned as glutamate by amino acid sequence analysis and histidine by DNA sequence analysis. Position 326 was a phenylalanine by amino acid sequence analysis, while the DNA sequence indicated that it was a tyrosine. Overall, it is clear that this cDNA does code for chicken prothrombin.

J. ISOLATION OF LONGER CHICKEN PROTHROMBIN cDNAS

In an attempt to characterize the entire mRNA for chicken prothrombin, 250,000 recombinants of the chicken liver cDNA library were screened with 32P-labeled pCII1 as a hybridization probe. A total of twenty additional chicken prothrombin cDNAs were identified and plaque purified. This low number of prothrombin cDNA clones detected in the cDNA library suggests that the mRNA for prothrombin is less abundant in the chicken liver than in the bovine liver (0.01% of the mRNA in chicken versus 1% of the mRNA in bovine) (see next section). All cDNA clones appeared to include a poly(A) tail, indicating that they had been primed from the 3' end by oligo(dT).
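The abundance estimate quoted above can be reproduced with simple arithmetic. The sketch below assumes that the fraction of positive clones in an oligo(dT)-primed library approximates the fraction of prothrombin mRNA in the liver mRNA population; the function name is illustrative only, and the 1% bovine figure is taken directly from the text rather than recomputed.

```python
def clone_frequency_percent(positives: int, screened: int) -> float:
    """Positive clones as a percentage of all recombinants screened."""
    return 100.0 * positives / screened


# Chicken liver cDNA library screened with pCII1: 20 positives in 250,000 recombinants.
chicken_percent = clone_frequency_percent(20, 250_000)

# Bovine value as quoted in the text (assumed, not derived here).
BOVINE_PERCENT = 1.0
fold_difference = BOVINE_PERCENT / chicken_percent

print(f"chicken prothrombin clones: {chicken_percent:.3f}% of recombinants")
print(f"bovine / chicken abundance ratio: about {fold_difference:.0f}-fold")
```

On this estimate the chicken value rounds to the 0.01% given in the text, roughly two orders of magnitude below the bovine figure.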
cDNA clones greater than 1.0 Kbp in length were mapped for restriction endonuclease sites (see Fig.19), and those shown in Fig.19 were used for further DNA sequence analysis (Fig.18). Two of the clones appeared to have a different 3' end (pCII203, Fig.19, and a similar clone pCII205, not shown). These two cDNA clones contained approximately 1000 extra nucleotides of 3' untranslated sequence (nucleotides 1570 to 2561, Fig.18); this suggests that an alternative polyadenylylation site is used by the chicken prothrombin gene (Fig.18). None of the cDNAs contained a full length copy of the prothrombin mRNA, with pCII201 extending the most 5' (see Fig.19). The three cDNAs provided a total of 2565 bp of cDNA sequence (Fig.18). The cDNA sequence allowed the prediction of the sequence of 471 amino acid residues of chicken prothrombin. Based on Northern blot analysis (see next section) and analogy to the mammalian prothrombin mRNAs, it appears that about 450 nucleotides of chicken prothrombin mRNA are not represented by these cDNAs (see Fig.19). A second liver cDNA library was constructed, and screened with a 5' chicken prothrombin cDNA probe. None of the 320,000 randomly primed recombinant clones contained the missing 5' end of the chicken prothrombin sequence.

Fig.19: Restriction Map of Chicken Prothrombin cDNAs
cDNA inserts are flanked with EcoRI restriction sites from the cloning procedure. Protein coding region is shown as solid bar, indicating the approximate length of 5' end sequences. All cDNA clones end with poly(A) tails.

K. SIZE ANALYSIS OF CHICKEN PROTHROMBIN mRNA

The size of the chicken mRNA for prothrombin was determined by denaturing chicken liver poly A+ RNA with formaldehyde, separating it on formaldehyde-agarose gels, and transferring the denatured RNA to nitrocellulose.
When these blots were hybridized with 32P-labeled chicken prothrombin cDNA (pCII1), two mRNAs were detected (Fig.20). These mRNAs were about 2200 and 3200 nucleotides in length (Fig.20). This supports the suggestion that two different polyadenylylation signals are used in the chicken liver (see Figs.18 and 19), creating two different 3' ends. Greater than 90% of the mRNA for chicken prothrombin appears to use the first polyadenylylation signal (see Fig.20), as suggested by the isolation of 20 of the 22 cDNAs with this poly(A) tail. Chicken prothrombin mRNA could not be easily detected with total liver RNA, in contrast to bovine prothrombin mRNA (see Fig.6). This suggests that prothrombin mRNA in the chicken liver is much less abundant than in either the bovine or human liver, where total RNA could be used in Northern blot analysis (see Fig.6).

Fig.20: Northern Blot Analysis of Chicken Prothrombin mRNA
Chicken liver poly A+ RNA (20 µg) was denatured with formaldehyde, separated by electrophoresis, and blotted onto nitrocellulose. The blot was hybridized to the chicken prothrombin cDNA pCII1. The two mRNAs for chicken prothrombin are indicated by the arrows, and are approximately 3200 and 2200 nucleotides in length.
[Fig.20: lane M size marker labels 6.67, 4.25, 2.25, 1.96]

DISCUSSION

A. CHARACTERIZATION OF THE BOVINE PROTHROMBIN GENE

1. Isolation Of The Bovine Prothrombin Gene

Preliminary characterization of the bovine prothrombin gene by Southern blot analysis using cloned bovine prothrombin cDNAs as hybridization probes demonstrated that there is probably a single gene for prothrombin in the bovine genome, and that this gene is at least 10 Kbp in length (Fig.4). When the cDNAs were used as hybridization probes to screen bovine genomic λ libraries, a total of five different λ clones were isolated (Fig.5).
The DNA in these five clones overlapped each other and represented a total of 42.4 Kbp of contiguous bovine genomic DNA (Fig.5). These clones contained genomic DNA from only one location, again suggesting that there is only a single gene for prothrombin in the bovine genome. Southern blotting experiments indicated that the bovine prothrombin gene resided in approximately 15 Kbp in the middle of the cloned genomic DNA (Fig.5).

2. Size Analysis Of The Bovine Prothrombin mRNA

The size of the mRNA for bovine prothrombin was determined by Northern blot analysis (Fig.6). Prothrombin mRNA was detected by hybridization to labeled bovine prothrombin cDNA, pBII111. These blots demonstrated the presence of a single bovine prothrombin mRNA species of 2150 ± 100 nucleotides in length in liver tissue. This size of the mRNA indicated that the bovine prothrombin cDNAs isolated by MacGillivray and Davie (1984) included nearly the entire mRNA sequence, but were probably lacking about 50 bp at the 5' end of the mRNA.

3. Sequence Of The Bovine Prothrombin Gene

Further characterization of the bovine prothrombin gene was undertaken by partial DNA sequence analysis. Comparison of the DNA sequence presented in Fig.9 to the cDNA sequence of bovine prothrombin (MacGillivray and Davie,1984) demonstrates that the bovine prothrombin gene is made up of 14 exons separated by 13 introns. The gene covers approximately 15.6 Kbp of the bovine genome, and is processed into a mRNA of 2025 nucleotides plus poly(A) tail. As shown in Tables V and VI, all DNA sequences at the intron-exon junctions match the consensus sequence of Mount (1982) except the splice donor of intron L. The splice donor of intron L has GC (nucleotides 8170-71, Fig.9) instead of the consensus GT at its intron-exon junction. This rare variant has also been observed at splice junctions in a few other genes (e.g. Wieringa et al.,1984; Dush et al.,1985).
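The junction data in Table V lend themselves to a simple consistency check. The sketch below is illustrative only: it assumes the junction strings and positions are as transcribed from Table V, checks each intron against the GT/AG rule, and re-derives the codon phases from the spacing between introns. Because the position of the initiator codon in the mRNA is not restated here, the phase of intron A is taken from the table as the reference point; all function names are hypothetical.

```python
# (intron, donor, last exon nucleotide in the mRNA, acceptor, codon phase)
# Upper case = exon, lower case = intron; phases 0, I, II are written 0, 1, 2.
TABLE_V = [
    ("A", "CATGgtaagg", 103, "cagcctcctcccccctgcagTGTT", 1),
    ("B", "CACGgtgagg", 267, "tactcagccttgtttttcagGATG", 0),
    ("C", "ACAGgtgaac", 292, "gtgtctctgggtctttctagCTTG", 1),
    ("D", "GAAGgtgagg", 343, "ctggactggggtctccgcagGAAA", 1),
    ("E", "CAGAgtgagt", 449, "tgagatgctttctattccagAATC", 2),
    ("F", "TGCGgtgaga", 586, "tctctctcctcacccaccagGCCA", 1),
    ("G", "TGCGgtgaga", 901, "ccgtgtctgggtccctgcagAGGA", 1),
    ("H", "GCCGgtaagg", 1036, "cggtcccgcttgccccttagACTG", 1),
    ("I", "CCTGgtgcgt", 1163, "cttccggcttcccgcctcagGCAG", 2),
    ("J", "CCAGgtcgga", 1331, "ctctgctggggtctgcacagGTAT", 2),
    ("K", "CCAAgttggg", 1505, "tccttctcccttccccaaagGCTG", 2),
    ("L", "GCCGgcaagt", 1687, "cactgcggttctctctcaagGTTA", 1),
    ("M", "GAAGgtaagc", 1758, "accccatatttcttcctcagAGCC", 0),
]


def intron_part(junction: str) -> str:
    """Keep only the lower-case (intron) bases of a junction string."""
    return "".join(base for base in junction if base.islower())


def gt_ag_exceptions(table):
    """Introns whose first two bases are not gt or whose last two bases are not ag."""
    return [
        name
        for name, donor, _, acceptor, _ in table
        if intron_part(donor)[:2] != "gt" or intron_part(acceptor)[-2:] != "ag"
    ]


def phases_from_spacing(table):
    """Derive every codon phase from intron A's phase and the distances between introns."""
    ref_pos, ref_phase = table[0][2], table[0][4]
    return [(ref_phase + pos - ref_pos) % 3 for _, _, pos, _, _ in table]


exceptions = gt_ag_exceptions(TABLE_V)
derived = phases_from_spacing(TABLE_V)
tabulated = [row[4] for row in TABLE_V]

print("introns violating GT/AG:", exceptions)
print("derived phases agree with Table V:", derived == tabulated)
```

With the data as transcribed, the only GT/AG exception is intron L, and the phases implied by the intron spacing agree with those tabulated.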
This GC sequence has been observed in two different alleles of the bovine prothrombin gene (isolated from the two different genomic phage libraries). It is therefore unlikely to be a cloning or sequencing artifact, suggesting that this rare splice signal is functional in the bovine prothrombin gene.

Comparison of the DNA sequence of the exons of the prothrombin gene to that of the previously isolated cDNAs for bovine prothrombin (MacGillivray et al.,1980; MacGillivray and Davie,1984) shows a total of 7 nucleotide differences. One of the differences is a deletion of an A residue in the 3' untranslated region of the genomic sequence in comparison to the cDNA sequence (between positions 15,482 and 15,484, Fig.9). Of the remaining six differences, four are changes in the third position of the codons for amino acid residues 157, 180, 182, and 281. None of these results in a change in amino acid residue, and they are probably functionally silent polymorphisms of the DNA sequence resulting in (presumably) neutral changes. The other two differences in the DNA sequence are in the codon for amino acid residue 188 (see Fig.9), and result in the change from the cDNA determined residue histidine (CAC) to the genomic coding sequence for serine (AGC). This residue is one of the amino acid differences between the amino acid sequence predicted by cDNA sequence analysis (MacGillivray and Davie,1984) and that determined by amino acid sequence analysis (Magnusson et al.,1975). While the genomic sequence for this residue confirms the amino acid sequence analysis result, the difference at residue 188 may represent an amino acid polymorphism, as the human prothrombin amino acid sequence (Degen et al.,1983) has a histidine at this position, which is the same as the bovine prothrombin cDNA sequence (MacGillivray and Davie,1984).
Thus the histidine residue may represent the ancestral residue at this position, which has changed to a serine residue in some cattle.

Heterogeneity also occurs at the 3' end of the bovine prothrombin mRNAs, where there are at least two sites of polyadenylylation. These sites were detected by the comparison of the DNA sequences of several independent bovine prothrombin cDNA clones (MacGillivray et al.,1980; MacGillivray and Davie,1984). The consensus polyadenylylation sequence AATAAA (Proudfoot and Brownlee,1976) is found at positions 15,563-15,568 (Fig.9) of the bovine prothrombin gene. This AATAAA sequence is 16 and 18 bp 5' to the sites of polyadenylylation, a distance similar to that found in other eukaryotic genes (Proudfoot and Brownlee,1976; Birnstiel et al.,1985). A second possible sequence, CAYTG, which may be involved in polyadenylylation, has been observed 3' to the site of polyadenylylation of some genes (Berget,1984). A similar sequence, CAGTG, is found 13 and 15 bp 3' of the sites of polyadenylylation in the bovine prothrombin gene (nucleotides 15,599-15,603, Fig.9). Thus, the 3' end of the prothrombin mRNA is at nucleotide 15,584 or 15,586, although termination of transcription probably occurs further 3' at an unknown site.

4. Site Of mRNA Initiation

Nuclease S1 and primer extension analysis (Figs.10 and 11) both indicate that the 5' end of the bovine prothrombin mRNA is located at or near nucleotide position 1 in Fig.9. The DNA sequence of this site of mRNA initiation corresponds to the consensus start site of a purine flanked by pyrimidines that is found in many genes transcribed by RNA polymerase II (Breathnach and Chambon,1981). Therefore, this is the most probable mRNA initiation site, although alternate mRNA initiation sites cannot be discounted.
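The spacing quoted in section 3 between the AATAAA signal, the two sites of polyadenylylation, and the CAGTG sequence follows directly from the Fig.9 coordinates given there. As a rough check, the following minimal Python sketch recomputes those distances; the variable and function names are illustrative only and are not part of the original analysis.

```python
# Coordinates (Fig.9 numbering) quoted in the text for the 3' end of the gene.
AATAAA = (15563, 15568)        # consensus polyadenylylation signal
POLY_A_SITES = (15584, 15586)  # the two observed 3' ends of the mRNA
CAGTG = (15599, 15603)         # CAYTG-like sequence 3' to the sites


def bp_from_signal_to_site(signal, site):
    """Distance from the last nucleotide of the signal to the poly(A) site."""
    return site - signal[1]


def bp_from_site_to_motif(site, motif):
    """Distance from the poly(A) site to the first nucleotide of the motif."""
    return motif[0] - site


upstream = [bp_from_signal_to_site(AATAAA, s) for s in POLY_A_SITES]
downstream = [bp_from_site_to_motif(s, CAGTG) for s in POLY_A_SITES]
print(upstream)    # [16, 18]
print(downstream)  # [15, 13]
```

Under this counting convention the results agree with the 16 and 18 bp (5') and 13 and 15 bp (3') distances stated in the text.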
An intron in the 5' flanking untranslated region is unlikely, as there is no consensus splice acceptor sequence (Mount,1982) in or near the 5' flanking sequences. No obvious "TATA" sequence can be seen immediately 5' to the site of mRNA initiation, but an AT-rich sequence, ATTAA, is found at the expected distance for a "TATA" sequence (nucleotides -28 to -24, Fig.9), and may function as the "TATA" sequence. Often a "CAAT" sequence is found approximately 100 bp 5' to the site of mRNA initiation (Breathnach and Chambon,1981). In the prothrombin gene, the sequence CCAT is found at nucleotides -100 to -97. Like the "CAAT" sequence, the CCAT sequence is flanked by an inverted repeat (Kingsbury and McKnight,1982) which is G/C rich (nucleotides -121 to -104 and -81 to -63). Thus promoter-like sequences can be found at the appropriate distances from the site of mRNA initiation in the 5' flanking sequence of the bovine prothrombin gene. However, further experiments must be performed to identify the region(s) of the 5' flanking sequence that are involved in the regulation and expression of the bovine prothrombin gene.

5. Intron Positions In The Coding Region

It has been observed in a number of genes that introns are positioned between protein domains (Blake,1978,1983a,b; Gilbert,1978,1979; Go,1981,1983). When the introns of the bovine prothrombin gene are mapped to the amino acid sequence of the protein molecule as shown in Fig.21, some of the introns appear to separate protein domains, especially within the activation peptide. The site of signal peptidase cleavage in precursor prothrombin has not yet been determined, but has been postulated to occur at Gln-19 (Bentley et al.,1986). Intron A (Figs.9 and 21) interrupts the sequence of the prepro-peptide at residue -17, appearing to separate the pre- and pro-peptides. The Gla domain has been identified as extending from amino acid residue 1 to 47 (Jackson and Nemerson,1980; Patthy,1985), and as such is flanked at residue 47 by intron C (Figs.9 and 21). No intron separates the Gla domain and the pro-peptide, further linking the pro-peptide to a functional role in the formation of the Gla region (see Fung et al.,1985; Pan et al.,1985). The two kringles of prothrombin are flanked by introns D, F, and G (Figs.9 and 21), which separate the kringles from each other and from the remainder of the protein molecule. It appears that the N-terminal activation peptide has been constructed of exon domains for a signal peptide, a pro-peptide and Gla region, and two separate kringle domains. Some of these domains are further interrupted by introns (the Gla domain by intron B, and the first kringle domain by intron E (Figs.9 and 21)), which do not appear to separate obvious structural or functional domains.

Fig.21: Introns in the Prothrombin Molecule. The relative positions of introns within the bovine prothrombin amino acid sequence are indicated.

The definition of protein domains, either structural or functional, within the catalytic region of prothrombin is not as clear as for the activation peptide. As shown in Fig.21, introns separate the catalytically important His365, Asp422, and Ser528 residues, as well as separating the two factor Xa cleavage sites from each other and the remainder of the protein. These may represent some form of functional domains. The three dimensional structure of thrombin is unknown, but a proposed model of the structure exists (Furie et al.,1982). Using this model, the introns of the thrombin domain are found to map to the surface of the molecule, as has been observed in other proteins (Craik et al.,1982a,b,1983).

B. CHARACTERIZATION OF A HUMAN PROTHROMBIN cDNA

The amino acid sequence of human prothrombin has been determined (Butkowski et al.,1977; Walz et al.,1977). Characterization of cDNA clones confirmed this sequence, and demonstrated that precursor prothrombin has a leader sequence of at least 36 amino acid residues (Degen et al.,1983). With the isolation of a new human prothrombin cDNA, pIIH13, the complete sequence of precursor prothrombin has been determined (Fig.14). The sequence of this cDNA shows that precursor human prothrombin, like the bovine precursor, has a prepro-peptide of 43 amino acid residues. Stop codons are observed 5' to Met-43 in the same reading frame (Fig.14), suggesting that Met-43 is the initiating methionine. Optimal alignment of the human cDNA sequence with the bovine genomic sequence (Fig.22) places the first nucleotide of pIIH13 (Fig.14) at the initiating nucleotide of the bovine gene. This indicates that pIIH13 may be a full length cDNA initiating near the first nucleotide of the human prothrombin gene. Comparison of the sequence of pIIH13 with previously isolated cDNA clones (Degen et al.,1983) shows only one nucleotide difference. The codon for residue 13 was CTG compared to CTA (Degen et al.,1983). This represents a silent mutation.

Fig.22: Alignment of the Bovine and Human Prothrombin mRNA Sequences. The nucleotide sequence of the 5' untranslated sequence and prepro-leader sequence of the bovine prothrombin gene (B, nucleotides 1 to 153 of mRNA sequence, Fig.9) is aligned to the cDNA sequence of pIIH13 (H, nucleotides 1 to 156, Fig.16). Gaps (-) are placed to maximize homology of the two DNA sequences. Numbering is from pIIH13 (Fig.16); stars indicate identical nucleotides. The initiator methionine (residue -43) is encoded by nucleotides 28-30, and Arg-1 of the leader sequence by nucleotides 154-156.
B : GCAGAGTG--CC-GGAGCGGATACACCATGGCGCGCGTCCGAGGCCCGCGGCTGCCTGGC
     *  ****  ** ***** ** **** ***** *** ********  ** **********
H : CCCTAGTGACCCAGGAGCTGACACACTATGGCCCGCATCCGAGGCTTGCAGCTGCCTGGC
    1            15             30             45             60

B : TGCCTGGCCCTGGCTGCCCTGTTCAGCCTCGTGCACAGCCAGCATGTGTTCCTGGCCCAT
    **********************  ***** ************************** * *
H : TGCCTGGCCCTGGCTGCCCTGTGTAGCCTTGTGCACAGCCAGCATGTGTTCCTGGCTCCT
                 75             90            105            120

B : CAGCAAGCATCCTCGCTGCTCCAGAGGGCCCGCCGT
    *********   ************ *** *** **
H : CAGCAAGCACGGTCGCTGCTCCAGCGGGTCCGGCGA
                135            150

C. CHARACTERIZATION OF cDNAS FOR CHICKEN PROTHROMBIN

1. Sequence Of The Chicken Prothrombin cDNAs

A total of 22 prothrombin cDNA clones were isolated from a chicken liver cDNA library. Three of these cDNA clones provided 2561 nucleotides of sequence of chicken prothrombin mRNA (Fig.19). From this DNA sequence, the amino acid sequence of 472 residues of chicken prothrombin could be predicted (Fig.18), with approximately 92 of the N-terminal amino acid residues missing (see below). Three portions of the predicted amino acid sequence of chicken prothrombin could be aligned with amino acid sequence data (D. Walz, unpublished results), at positions 155-185, 308-334, and 381-409 (Fig.18). Differences in amino acid assignment were found at positions 168, 310, and 326, with Lys, Glu, and Phe in the cDNA and Gly, His, and Tyr in the amino acid sequence analysis, respectively (see Fig.18). The differences found at positions 310 and 326 were observed in at least two of the cDNA clones, and are therefore unlikely to be cloning artifacts. The difference at position 168 was observed in only one cDNA clone and may be a cloning artifact.
These three differences may represent polymorphisms within the chicken prothrombin sequence. The amino acid sequence data clearly demonstrate that the cloned cDNAs code for chicken prothrombin, or an extremely closely related protein, such as a recently duplicated gene product.

The sequence shown in Fig.18 indicates that chicken prothrombin has a very similar structure to that of the mammalian prothrombins. DNA sequence data demonstrate that the chicken prothrombin molecule is probably made up of a two-chain thrombin, and contains two kringles in the activation peptide. The existence of a Gla domain in chicken prothrombin had been demonstrated previously by amino acid sequence analysis (Walz,1978). The structure of the leader peptide is unknown at present.

2. Alternative Sites Of Polyadenylylation

Northern blot analysis (Fig.20) of chicken liver mRNA demonstrated the existence of two mRNA species for chicken prothrombin. DNA sequence analysis of cDNA clones for chicken prothrombin demonstrated that the difference between these two mRNAs (Figs.18 and 19) is due to the use of two different polyadenylylation signals. The two sites of polyadenylylation were approximately 1000 nucleotides apart (Fig.18), accounting for the difference in size of the mRNAs (Fig.20). The use of these two sites of polyadenylylation does not alter the protein coding region of the mRNAs, but only changes the length of the 3' untranslated sequences. The poly(A) tails of most mRNAs are 180-200 nucleotides in length (Perry,1976). Thus, excluding the poly(A) tails, the chicken prothrombin mRNAs are about 3000 and 2000 nucleotides long. To date, 2561 bp of chicken prothrombin cDNA sequence have been determined, indicating that about 450 bp of sequence are absent from the isolated cDNA clones (Fig.19).
Approximately 92 amino acid residues of the amino acid sequence of plasma prothrombin are absent from the chicken prothrombin cDNA sequence (see below), together with the leader sequence and 5' untranslated sequences. After accounting for the missing 92 amino acid residues, there are about 170 nucleotides of mRNA sequence remaining, which would be adequate to encode a prepro-peptide similar to that of the mammalian prothrombins (43 amino acid residues correspond to 132 nucleotides in addition to 40 nucleotides of 5' untranslated sequences). Thus, it appears that there may be only minimal differences between chicken prothrombin and the mammalian prothrombins.

D. COMPARISON OF PROTHROMBIN SEQUENCES

1. Conserved Sequences

An alignment of the amino acid sequences of bovine prothrombin (MacGillivray and Davie,1984), human prothrombin (Degen et al.,1983; Fig.15) and chicken prothrombin (Walz,1978; Walz, unpublished results; Fig.18) is shown in Fig.23. Gaps and insertions have been placed to allow for maximum homology with the minimum of deletions and/or insertions but with retention of common structural features (see Fig.23). There is 87% amino acid identity between the precursor forms of bovine and human prothrombin, and 68% and 65% identity between bovine and chicken, and human and chicken prothrombins, respectively. The most conserved regions between these prothrombins are the Gla region and the thrombin domain. However, the A chain of thrombin is much less conserved than the B chain. In addition, the kringles are much less conserved, at about 60% identity between chicken and the mammals.
The least conserved regions are the regions connecting the Gla and kringles, the region connecting the kringles, and the region connecting the kringle and the thrombin domain (see Fig.23). This homology implies that the Gla and thrombin B chain are the regions most essential for the common function of the chicken and mammalian prothrombins. The kringles play a somewhat less essential role, and the connecting regions may only function to separate the different domains.

Fig.23: Homologies in Prothrombin Sequences. An alignment of bovine prothrombin (B; MacGillivray and Davie,1984), human prothrombin (H; residues -43 to -37 from Fig.16, residues -36 to 579 from Degen et al.,1983) and chicken prothrombin (C; residues 1 to 45 from Walz,1978, residues 56 to 90 from Walz, unpublished results, residues 93 to 564 from Fig.18). Sequences are aligned to give a minimum of insertions and/or deletions. Symbols in the figure indicate the sites of factor Xa cleavage, the site of thrombin cleavage, and the active site residues; --- represent gaps in the amino acid sequence to allow maximum homology between the sequences; ??? represent uncharacterized amino acid residues which are predicted to exist by analogy to the mammalian prothrombins (deletions and/or insertions may exist).
[Aligned amino acid sequences of B, H, and C, residues -43 to 582, not reproduced here.]

2. Deletions/Insertions

A number of deletions and/or insertions are required for maximal alignment of the prothrombin sequences for chicken, human, and bovine (Fig.23). Like the regions of low amino acid conservation, many of these deletions and/or insertions are also found in the connecting regions (Fig.23). Other deletions are found throughout the prothrombin molecule. In the human prothrombin sequence, a deletion exists at amino acid residue 4 (Fig.23).
This same deletion is found in some of the other vitamin K-dependent coagulation factors (Jackson and Nemerson,1980), but its significance is unknown. In the kringle regions, deletions of two and one amino acid residues are observed in the chicken sequence (positions 107 and 240, Fig.23). Deletions have been observed in other kringles, and their influence on function is unknown (Jackson and Nemerson,1980). Two single amino acid deletions occur within the thrombin domain of chicken prothrombin, at positions 475 and 582 (Fig.23). The deletion at position 582 removes the C-terminal amino acid residue; however, the length of this C-terminal region is not conserved between serine proteases (Jackson and Nemerson,1980). The second deletion, at position 475, also occurs at a position of length variability in coagulation factors (Jackson and Nemerson,1980), as well as being found on the surface of the three dimensional model of thrombin (Furie et al.,1982). These two deletions probably have little effect on the structure and/or function of thrombin. Other deletions, as mentioned above, occur in the connecting regions: a deletion of two residues in the human sequence at position 266, and deletions of 5, 7 and 1 residues at positions 164, 255, and 270 in the chicken sequence.

None of the deletions/insertions found between the three prothrombin sequences occurs at intron-exon junctions in the bovine (Fig.9) or human (Degen et al.,1983,1985; Davie et al.,1983) prothrombin genes. Therefore, it appears that none of these insertions and/or deletions were produced by intron sliding (Craik et al.,1982a,b,1983, see section I). These deletions and/or insertions were probably produced by deletion and/or insertion of short pieces of DNA sequence.

3. mRNA Structure

Prothrombin from bovine, human, and chicken can be encoded by a mRNA transcript of about 2200 nucleotides (Fig.6; Degen et al.,1983; Fig.20), of which 2000 nucleotides are of coding sequence. As discussed above, the mRNAs from the three species probably have similar lengths of 5' untranslated sequences. As the three prothrombin polypeptide chains are of similar lengths (see Fig.23), the length of protein coding region in each of the mRNA transcripts must be similar. However, the lengths of the 3' untranslated regions do differ. Chicken prothrombin mRNA differs from that of the other two species by using two different sites of polyadenylylation, each having a separate polyadenylylation signal. In the chicken, the 5' polyadenylylation signal corresponding to the shorter 3' untranslated sequence appears to be equivalent to the polyadenylylation sites of the mammalian prothrombins. Comparison of these three 3' untranslated sequences demonstrates a great deal of length variation: 97 nucleotides in human prothrombin (Degen et al.,1983), 122 nucleotides in bovine prothrombin (Fig.9; MacGillivray and Davie,1984), and 150 nucleotides in chicken prothrombin (Fig.18). To account for this length variation, a large number of deletions and insertions appear to have occurred. These deletions and insertions complicate the comparison of the sequences of the 3' untranslated regions; indeed, only the AATAAA polyadenylylation signal is clearly conserved. It appears that the 3' untranslated region has no other role in the prothrombin transcripts.

E. COMPARISON OF THE BOVINE AND HUMAN PROTHROMBIN GENES

The gene for human prothrombin has been isolated and partially characterized (Degen et al.,1983,1985; Davie et al.,1983; Fig.16). It is therefore possible to make some comparisons of the structure and organization of the bovine and human prothrombin genes.
The gene for human prothrombin has been reported as >20 Kbp in length (unpublished results quoted in Nagamine et al.,1984), while the bovine gene is only 15.6 Kbp (Fig.9). The increase in size of the human prothrombin gene is visible in the increase in the size of some of the restriction fragments of the human prothrombin gene (for possibly conserved restriction sites, see Figs.5 and 16). The difference in size of restriction endonuclease fragments is also observed in genomic Southern blots (Figs.4 and 17). The number and size of the exons of the human gene for prothrombin (Degen et al.,1983,1985; Davie et al.,1983) are the same as for the bovine gene, with all intron-exon junctions at identical locations. The difference in the size of the two genes is due to the presence of larger introns within the human prothrombin gene. Not all of the introns of the human gene are larger. For example, introns E, G, and H (Figs.9 and 16) are of similar length in both genes. In general, it appears that only the larger introns differ in length between the two species.

Many of the large introns of the bovine (Fig.13) and human (Degen et al.,1983; Davie et al.,1983) prothrombin genes contain repetitive DNA elements. Alu elements, which are typically 300 bp in length (Jelinek and Schmid,1982), have been identified within the introns of the human prothrombin gene (Degen et al.,1983; Davie et al.,1983). The major repetitive DNA of the bovine genome is only 120 bp in length (Watanabe et al.,1982). Therefore, if all bovine repetitive DNA elements have been replaced with Alu elements, there would be an increase in the size of the introns between the bovine and human prothrombin genes. Another possible mechanism to increase the size of introns would be to change the number of repetitive DNA elements found within introns.
Insertion and deletion of unique DNA sequences could also change the size of introns.

In general, it appears that the gene for prothrombin has evolved both in DNA and in amino acid sequence in the 80 million years since mammalian radiation (Culbert,1980). The number and positions of exons and introns have been stable for this 80 million year period, as has been observed in the organization of the porcine and human genes for the urokinase-type plasminogen activator (Nagamine et al.,1985). Thus, any differences found in the organization of serine protease genes within mammals probably reflect changes that occurred during the evolution of the gene rather than the evolution of the species. As a large number of serine protease genes have been characterized, they can be compared to understand the evolution of this gene family.

F. COMPARISON OF SERINE PROTEASE GENES

1. Leader And Gla Region

Several of the coagulation factors (prothrombin, factor IX, factor X, factor VII, protein C, protein S, and protein Z) require vitamin K for their biosynthesis. These proteins undergo a post-translational modification at several glutamic acid residues by a membrane-bound, vitamin K-dependent carboxylase. The resulting carboxylated protein binds calcium ions, which facilitate the anchoring of the proteins to membranes at the site of injury (see Suttie,1985 for a recent review). The cDNA sequences of prothrombin (Degen et al.,1983; MacGillivray and Davie,1984), factor X (Fung et al.,1984,1985; Leytus et al.,1984), factor IX (Kurachi and Davie,1982; Jaye et al.,1983), factor VII (Hagen et al.,1986), protein C (Long et al.,1984; Foster and Davie,1984; Beckmann et al.,1985), and protein S (Dahlback et al.,1986) have shown that each of these proteins is synthesized as a precursor containing a prepro-leader sequence.
As the vitamin K-dependent bone protein osteocalcin is synthesized with a prepro-leader peptide that is homologous to those of the coagulation factors, it has been suggested that this region may be involved in the carboxylation process (Pan and Price,1985; Pan et al.,1985). The organization of this region of the bovine prothrombin gene (Fig.9), human factor IX gene (Anson et al.,1984; Yoshitake et al.,1985), and the human protein C gene (Foster et al.,1985; Plutzky et al.,1986) is shown in Fig.24. In the prothrombin and factor IX genes the first three introns are at precisely the same locations (to the same nucleotide), while only the location of the second intron of protein C (corresponding to the first intron of the factor IX and prothrombin genes, see Fig.24) differs, by being shifted upstream (5') by 6 bp, probably by intron sliding (see Fig.24).

Intron sliding is a process whereby an insertion or a deletion of coding sequence occurs because of a change in the site of mRNA splicing (Craik et al.,1982a,b,1983). This is caused by the formation or utilization of an alternate splice donor or acceptor sequence within an intron or an exon, which replaces the pre-existing site. This process does not involve the deletion or insertion of DNA sequence within a gene, but does result in a length difference of the final mRNA and protein product. In the protein C gene it appears that a new splice acceptor site was produced 6 bp upstream of the original site (the probable pre-existing splice acceptor AG is still present in the genomic DNA sequence and now is part of the coding sequence 6 bp 3' to the present splice acceptor site) (Foster et al.,1985; Plutzky et al.,1986).

Fig.24: Comparison of the Organization of Exons in the Leader Peptide and Gla Domain. The organization of the leader peptide and Gla exons of the factor IX, protein C, and prothrombin genes. Exons are represented by open bars; 5' untranslated regions are represented by the slashed bars. Codons for the residues at the site of cleavage giving rise to the plasma proteins are denoted by the vertical arrow. Codons for γ-carboxyglutamic acid residues are denoted by the inverted solid triangles. Intron phases are: 0, intron between the codons; I, intron after the first nucleotide of the codon; II, intron after the second nucleotide of a codon. The sizes of the exons are indicated by the scale representing 50 bp. The sizes of the introns are not to scale. The direction of transcription is 5' to 3'.

Another example of intron sliding is observed within the family of serine proteases. In the porcine gene for urokinase, two different splice donor sites are used for one intron (Nagamine et al.,1985), only one of which is used in the human gene (Nagamine et al.,1985; Riccio et al.,1985), resulting in a 9 amino acid residue (27 bp) insertion. This may represent an intermediate in intron sliding, where the choice between the two different splice sites has not been made yet. Mutations similar to these changes in splice site in the globin genes account for some of the thalassemias (Busslinger et al.,1981). Often these are caused by frame shifts due to the new splice site. The protein C and urokinase mutations maintain the reading frame of the spliced mRNAs. Note that it is not necessary for every successful intron sliding event to maintain the reading frame, although if the reading frame is changed, the new protein coding region C-terminal to this change will have no homology to the pre-existing protein.
This change in reading frame would be similar to the results of some differential splicing, for example at the 3' end of the γ fibrinogen gene, which produces γ and γ' fibrinogens with different C-terminal sequences (Crabtree and Kant, 1982). As mentioned above, these changes in reading frame are often deleterious, as in some thalassemias (Busslinger et al., 1981). The mutations in the protein C and urokinase genes presumably do not interfere with the folding or functions of these proteins.

The three exons containing amino acid coding sequences encode the prepro-leader peptide and the entire Gla region. Bentley et al. (1986) have characterized an abnormal factor IX gene that results in defective pro-peptide processing. Amino acid sequence analysis of the pro-factor IX that accumulates in the plasma of such individuals showed that signal peptidase cleaves the factor IX prepro-leader peptide between amino acid residues -19 and -18. By analogy, Bentley et al. (1986) suggested that signal peptidase cleaves the prothrombin prepro-leader peptide between residues -20 and -19, and in a similar position in protein C. In that case, the signal peptide is encoded by a single exon in the prothrombin, factor IX and protein C genes. Interestingly, most of the pro-region and Gla region is encoded by the next exon. Differences exist between factor IX, protein C and prothrombin in the length of the first exon, including the presence of an intron in the protein C gene, and the location of the (presumed) initiator methionine residue. These differences in the first exon are not unexpected, as signal peptides often have little homology even if they have a common ancestor (Rogers, 1985). Overall, the leader and Gla regions of the three genes appear to have evolved from a common ancestor.
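The reading-frame argument made for the intron sliding events described earlier in this section (the 6 bp acceptor shift in protein C and the 27 bp insertion in porcine urokinase) reduces to simple arithmetic: a moved splice site keeps the frame only when the shift is a multiple of three. The following is a minimal sketch of that check, not a method from this work; the function name `sliding_effect` and the 4 bp counter-example are hypothetical.

```python
def sliding_effect(shift_bp):
    """Describe the effect of moving a splice site by shift_bp nucleotides.

    Returns (frame_preserved, codons_changed). codons_changed is the number
    of whole codons gained or lost when the frame is preserved, else None.
    Illustrative helper only.
    """
    if shift_bp % 3 == 0:
        return True, shift_bp // 3
    return False, None


# Shifts quoted in the text
protein_c = sliding_effect(6)    # splice acceptor moved 6 bp upstream
urokinase = sliding_effect(27)   # two donor sites 27 bp apart

# Hypothetical shift that is not a multiple of three (assumed for contrast)
frameshift = sliding_effect(4)

print(protein_c, urokinase, frameshift)
```

Under these assumptions the two shifts quoted in the text correspond to 2 and 9 whole codons, in line with the 9 amino acid residue insertion reported for urokinase, whereas a shift such as 4 bp would alter every codon downstream of the splice site.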
The Gla region is not a recent addition to prothrombin; this region appears to exist in lamprey prothrombin, as this protein can be adsorbed to barium salts (Doolittle et al., 1962; Zytkovicz and Nelsestuen, 1976). These observations indicate that the Gla region is at least 450 million years old, suggesting that vitamin K-dependent carboxylation of the glutamate residues of the protein predates the differences found in the remainder of the protein. Some type of correction event (e.g. gene conversion) may be responsible for maintaining the organization of the leader-Gla region (see section I).

2. Kringle Region

The protein structures known as kringles have been found in several proteins including prothrombin (Magnusson et al., 1975), plasminogen (Sottrup-Jensen et al., 1978), tissue-type plasminogen activator (Pennica et al., 1983), urokinase-type plasminogen activator (Verde et al., 1984), and factor XII (McMullen and Fujikawa, 1985; Cool et al., 1985). Genes for several of these proteins have been isolated and characterized, allowing a comparison of the organization of the kringle regions (Fig. 25). In each case, the kringles are separated from each other and from the remainder of the protein molecule by introns. All of the introns that separate the kringles from the remainder of the protein or from each other interrupt the reading frame of the mRNAs in the same phase (see Fig. 25); in all cases, the intron occurs after the first nucleotide of a codon. One consequence of this is that by duplicating the exon(s) encoding a kringle, duplication of the protein domain occurs because the new spliced product maintains the reading frame. Although this is not common, it is found in some exon-encoded domains such as the epidermal growth factor homologies found in the genes for factor IX (Anson et al., 1984; Yoshitake et al., 1985), protein C (Foster and Davie, 1985; Plutzky et al., 1986), and such non-proteases as the LDL receptor (Sudhoff et al., 1985a,b).

Fig. 25: Comparison of the Organization of Exons in the Kringle Domain. The organization of the kringle exons in the tissue-type plasminogen activator, urokinase, plasminogen, and prothrombin genes. Details are as in Fig. 24 except that the six invariant cysteine residues are denoted by a C above the exons.

[Figure 25: exon diagrams for tPA #1, tPA #2, Urokinase, Plasminogen #4, Plasminogen #5, Prothrombin #1, and Prothrombin #2.]

In the prothrombin, tissue-type plasminogen activator, and urokinase-type plasminogen activator genes, the intron found at the C-terminus of the kringles occurs at about the same nucleotide (see Figs. 9 and 25). The small differences observed in the positions of these flanking introns are probably due to an intron sliding process, as discussed above for the protein C leader sequence. Many of the kringles have internal introns (Fig. 25) and the position of this intron varies. The second kringle in prothrombin lacks an internal intron, while the first kringle contains a single intron (see Fig. 9). The kringles found in the tissue-type and the urokinase-type plasminogen activators have an intron at exactly the same location (Ny et al., 1984; Degen et al., 1986; Nagamine et al., 1984; Riccio et al., 1985), which differs from the location of the intron in the prothrombin gene (see Fig. 25). Part of the plasminogen gene has been characterized (Malinowski et al., 1984; Sadler et al., 1985), including parts of the fourth and fifth kringles (Fig. 25). The organization of each of these plasminogen kringles differs from other kringles and from each other (Fig. 25). These differences in location of introns cannot be accounted for by an intron sliding process, as the differences in intron location are not associated with insertion or deletion of coding sequences.
These differences in intron location can be explained either by the loss of introns from an original gene that contained at least four introns per kringle or, alternatively, by the insertion of introns into kringle-encoding genes. The second possibility, intron insertion, appears more likely, because if at least four introns were originally present in the kringle gene, this would result in some extremely small exons (e.g. 6 bp). In addition, no characterized kringle contains more than one of the four introns. This proposal of intron invasion is also supported by data from the serine protease domain (see next section).

It has been noted previously that the first kringle of prothrombin is more homologous to the third kringle of plasminogen than to the second kringle of prothrombin (Kurosky et al., 1980). Because of this homology between the kringles it has been proposed that prothrombin acquired the first kringle from the ancestor to the third kringle of plasminogen (Kurosky et al., 1980), rather than by a duplication of the kringle domain within the prothrombin gene. As discussed previously, the position of the intron of the first kringle of prothrombin differs from those of all other kringle-containing genes (Fig. 25). Thus it may be possible to use this intron as a marker to follow the evolution and movement of this kringle. The gene structures shown in Fig. 25 suggest that the exons coding for kringles have a common ancestor as a single exon. This exon duplicated several times to form the ancestral exon for the plasminogen activators, plasminogen, and the second kringle of prothrombin (Young et al., 1978; Kurosky et al., 1980; Patthy, 1985).
After multiple duplication events to form the five kringles of plasminogen, a copy of the third kringle of plasminogen was inserted into the prothrombin gene to become the first kringle found in prothrombin today (Kurosky et al., 1980; Patthy, 1985). This is consistent with the proposal that introns have invaded some of the kringle exons after the initial duplications but, in some cases, prior to the final duplications.

3. Serine Protease Region

A comparison of the exon organization of the catalytic regions of the prothrombin gene, several serine protease genes, and the haptoglobin gene is shown in Fig. 26. As in most serine proteases, the catalytic triad residues His366, Asp422, and Ser528 of prothrombin are located on separate exons (see Fig. 26). However, none of the introns in the prothrombin gene are in positions similar to those in any other gene reported (see Fig. 26). The serine protease genes can be divided into five different types based on the intron positions shown in Fig. 26. The first group consists of the haptoglobin gene, where no introns interrupt the catalytic region. The second group comprises the genes for the pancreatic protease zymogens trypsinogen, chymotrypsinogen, and proelastase, the maxillary gland and kidney kallikreins, the α and γ subunits of nerve growth factor, and the tissue-type and urokinase-type plasminogen activators. Although there are also differences (see Fig. 26), each of these genes contains (i) an intron just 3' of the codon for the active site histidine, (ii) an intron 3' to the codon for the active site aspartate, and (iii) an intron 5' to the codon for the active site serine. All of these introns interrupt the coding sequences at identical locations, in the same phase, in each of the genes (Fig. 26). The third group consists of the complement factor B gene, which contains 7 introns within the catalytic region (Fig. 26).

Fig. 26: Comparison of the Organization of Exons in the Serine Protease Domain. The organization of the serine protease exons in the haptoglobin, trypsinogen, chymotrypsinogen, proelastase, kallikrein, α and γ subunits of nerve growth factor, tissue-type plasminogen activator, urokinase, complement factor B, factor IX, protein C, and prothrombin genes. Intron phases are as in Fig. 24. The scale represents 100 bp. Codons for the residues at the site of activation of the zymogens are denoted by the vertical arrows; complement factor B and the γ subunit of nerve growth factor are not activated in this way. The codons for the active site residues histidine, aspartate, and serine are denoted by H, D, and S respectively; in haptoglobin, however, the corresponding codons code for lysine (K), aspartate (D), and alanine (A) residues. The 3' end of the haptoglobin gene has not been characterized. The 3'-most exons of factor IX, tissue-type plasminogen activator, and urokinase have been abbreviated; they are 1935 bp, 914 bp, and 1119 bp in size respectively. The exons coding 5' untranslated regions are indicated by the dotted boxes, and 3' untranslated regions by the slashed bars. A unique coding region of complement factor B is indicated by the solid box.

[Figure 26: exon diagrams for Haptoglobin, Trypsinogen, Chymotrypsinogen, Proelastase, Kallikrein, αNGF, γNGF, tPA, Urokinase, Factor B, Factor IX, Protein C, and Prothrombin; scale bar 100 bp; transcription 5' to 3'.]
The fourth group consists of the factor IX and protein C genes, which have two introns, resulting in a large exon that contains both the active site aspartate and serine residues. Lastly, the prothrombin gene constitutes the fifth group, as it is different from all the other genes discussed (Fig. 26). This grouping is not only representative of the similar gene organizations but is also consistent with amino acid sequence homologies (Young et al., 1978; Hewett-Emmett et al., 1981), suggesting that the ancestral genes for each of these five types duplicated early in the evolution of the serine proteases. The ancestral gene probably duplicated early in the evolution of the eukaryote (Young et al., 1978), and certainly prior to the emergence of the first vertebrates 600 million years ago. Therefore, either enough time has passed to hide the ancestral gene organization by movement of introns, or introns have entered these genes after their divergence and are therefore found at different locations in the different genes.

As in the kringle domain, differences in the intron positions are most likely due to intron insertion. Many introns are located in similar regions of the genes, but often in different reading frames (see the trypsinogen and complement factor B genes, Fig. 26). As discussed previously, these differences are probably not due to intron sliding but are more likely the result of independent intron insertions. Some of the introns seem to be shared between genes from the different groups, e.g. the first intron of factor IX and the second intron of proelastase, or the second intron of factor IX and the second intron of complement factor B (Fig. 26).
This could be explained by the retention of ancestral introns by these gene pairs, but it may also be due to horizontal transfer of the intron between the genes after duplication and divergence by a mechanism such as gene conversion (Sharp, 1985). It is also possible that these introns were both inserted by chance in very similar locations. In total, the evidence points to intron insertion as the explanation for the observed differences, although some intron loss may have occurred after some of the introns were inserted.

Genes from invertebrate species generally have fewer introns (Gilbert, 1985; Gilbert et al., 1986). This could be accounted for either by loss of introns in the invertebrate species, or by less insertion of introns within these species. A gene for a serine protease homologous to trypsinogen has been isolated from the invertebrate Drosophila melanogaster (Davis et al., 1985). This gene lacks introns and therefore may represent a copy of the ancestral, early eukaryote intron-less serine protease gene. Subsequent invasion of introns after duplication to form the five families of serine protease genes provided the distinctive organizations seen today (Fig. 26). Duplications during the intron invasion process would result in genes sharing some introns but differing in others, as is observed in the trypsinogen-like genes (Fig. 26).

G. ORIGIN OF INTRONS AND EXON SHUFFLING

1. Origin Of Introns

It has been proposed that introns have been present since the beginnings of life (Blake, 1978; Doolittle, 1978; Darnell and Doolittle, 1986; Gilbert et al., 1986), but present evidence from the flavin-containing enzymes does not support this (Longby and Gilbert, 1985; Stone et al., 1985; Rogers, 1985; Duester et al., 1985; McKnight et al., 1986; see Introduction).
Here, additional evidence is presented indicating that at least the majority of introns may have become inserted into the genes for serine proteases well after the origin of life. The presence of introns in the distantly separated branches of life (eukaryotes, prokaryotes, and archaebacteria; Darnell and Doolittle, 1986) may possibly be due to multiple origins of introns and/or the transfer of infective, ancestral introns between the kingdoms of life.

Despite the uncertainty of the origin of introns, it is clear that they have been invasive and mobile. The invasion process started early in eukaryotic evolution, as shown by the common intron found in the fungal, plant and animal genes for triose-phosphate isomerase (McKnight et al., 1986; Gilbert et al., 1986). This process appears to have been completed at least 450 million years ago, perhaps because of loss of mobility. Evidence for the loss of mobility comes from comparison of genes which are known to have duplicated in the last several hundred million years, for example the globin genes (Edgell et al., 1983; Darnell and Doolittle, 1986) and the insulin genes (Perler et al., 1980). Both intron sliding and intron loss have occurred (Perler et al., 1980), but these events can be explained by mechanisms unrelated to the mobilization of introns (intron insertion). Indeed, little change is observed in the organization of the triose-phosphate isomerase gene in plants and animals, a divergence of at least one billion years (Marchionni and Gilbert, 1986; Gilbert et al., 1986). No gain of introns has been clearly demonstrated to have occurred during the last 450 million years in the vertebrates.
The differences between the triose-phosphate isomerase genes of the vertebrates and plants (Marchionni and Gilbert, 1986) could be due to intron insertion in the plant lineage or to intron loss in the vertebrate lineage; present evidence cannot distinguish between these two possibilities.

The flavin-containing enzymes, which duplicated prior to the divergence of the eukaryote, prokaryote, and archaebacteria lineages, do not share any introns, though many appear in similar locations (Duester et al., 1986; see Introduction). Other gene families duplicated later in the evolution of life, but early in the evolution of the eukaryote. These gene families, such as the fibrinogen genes (Crabtree et al., 1985) and the serine protease genes (see above), show varying degrees of intron sharing, which is related to the time since the genes duplicated and diverged. The organization of the genes of these different gene families can best be explained by the invasion of introns into these genes over time rather than by the movement and loss of introns. The period of intron invasion probably began before the divergence of the filamentous fungi from plants and animals, as indicated by the shared intron of the triose-phosphate isomerase gene of Aspergillus, maize, and chicken (McKnight et al., 1986). This divergence occurred more than 1.2 billion years ago (Gilbert et al., 1986), and intron invasion was completed prior to the divergence of the vertebrates, which occurred at least 450 million years ago.

2. Exon Shuffling

Exon shuffling as proposed by Gilbert (1978, 1979) provides a role for introns in the evolution of genes, but not a role for introns themselves (Crick, 1979; Rogers, 1985; Cavalier-Smith, 1985). Today, the processes of intron splicing are much better understood (Keller, 1984; Ruskin and Green, 1985), yet we still do not know the function of introns.
Despite this, it is clear that introns have had a role in the evolution of many genes (Sudhoff et al., 1985b; Gilbert, 1985; Gilbert et al., 1986). As discussed previously, various parts of the prothrombin molecule have homology to other proteins in both amino acid sequence and gene organization. This homology cannot be accounted for by gene duplication events alone, as only some, but not all, protein domains are shared by other individual proteins. Shuffling of exons would account for the observed patterns, especially as seen for the kringle structures (see above). The first kringle of prothrombin shares the highest amino acid homology not with the second kringle of prothrombin, but with the third kringle of plasminogen (Kurosky et al., 1980). This implies that prothrombin acquired the first kringle from plasminogen rather than from itself. The best mechanism for this is an exon shuffling event which copied the third kringle of plasminogen and inserted it as the first kringle of prothrombin.

The Gla region appears to have a common ancestor for all the vitamin K-dependent coagulation factors, and its initial source is unknown. It appears to have been gained by an exon shuffling type event, with acquisition of the pro-peptide and Gla as one event, and possibly acquisition of the pre-peptide as an additional event (or both together). Gene correction events appear to have had a role in maintaining the organization of the leader and Gla region in the face of intron insertion after the duplication of the prothrombin ancestor and the factor IX-like gene ancestor. This would then explain the identical gene structures, in contrast to the differing organization of the serine protease domain (see Figs. 24 and 26).

H.
EVOLUTION OF THE ACTIVE SITE SERINE CODON

Amino acid sequence, DNA sequence, and gene structure data can be combined in an attempt to explain the evolutionary relationships within the family of serine proteases, and within the subfamily of vitamin K-dependent coagulation factors in particular. Amino acid sequence comparisons have produced several evolutionary trees of the serine proteases (Young et al., 1978; Hewett-Emmett et al., 1980; Patthy, 1985). One common feature of these relationships is that the vitamin K-dependent coagulation factors are more closely related to each other than to other serine proteases. It has been suggested that the ancestor of the vitamin K-dependent serine proteases diverged from the digestive serine proteases very early in the history of the family of serine proteases (Young et al., 1978).

Serine proteases contain a conserved active site sequence of Gly-Asp-Ser-Gly-Gly, with the Ser being the active site serine residue. Serine has six possible codons: TCG, TCA, TCT, TCC, AGT, and AGC. These can be separated into two types: TCN, where N is G, A, T, or C, and AGY, where Y is T or C. Serine is unique in the genetic code in that it is not possible to go from one codon to all other codons by single base pair changes whilst still retaining the ability to code for the same amino acid residue. To change from the TCN type codon to an AGY type codon, at least two nucleotide changes are required, and if this occurs as single base pair changes, then an intermediate sequence will have to exist which does not code for serine. If such a change occurs at the active site of a serine protease, the protease would lose its catalytic activity due to the absence of the active site serine residue. The activity would be restored when the serine residue was restored.
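The codon argument above can be checked by enumeration. The sketch below, which is an illustration and not part of the original analysis, builds the standard genetic code, separates the six serine codons into the TCN and AGY types, and lists the codons that lie one substitution away from both a TCN codon and an AGY codon. All names in it (`CODON_TABLE`, `hamming`, `intermediates`) are assumed for illustration.

```python
from itertools import product

# Standard genetic code, bases ordered T, C, A, G at each codon position
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# The two types of serine codon described in the text
TCN = [c for c, aa in CODON_TABLE.items() if aa == "S" and c.startswith("TC")]
AGY = [c for c, aa in CODON_TABLE.items() if aa == "S" and c.startswith("AG")]


def hamming(a, b):
    """Number of positions at which two codons differ."""
    return sum(x != y for x, y in zip(a, b))


def intermediates(start, end):
    """Codons exactly one substitution away from both start and end."""
    return [c for c in CODON_TABLE
            if hamming(start, c) == 1 and hamming(c, end) == 1]


# Fewest substitutions needed to pass from any TCN codon to any AGY codon
min_changes = min(hamming(a, b) for a in TCN for b in AGY)

# Amino acids encoded by the intermediates on every shortest two-step path
via = {
    CODON_TABLE[c]
    for a in TCN for b in AGY if hamming(a, b) == min_changes
    for c in intermediates(a, b)
}

print(min_changes, sorted(via))
```

Under the standard code the minimum is two substitutions, and the only single-step intermediates encode threonine (ACT, ACC) or cysteine (TGT, TGC), so no two-step path between the codon types retains serine throughout.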
Both TCN and AGY types of codons for the active site serine exist within the family of serine proteases, as determined by cDNA and gene sequence analysis. The AGY type codon is found in a small number of serine proteases, including the vitamin K-dependent coagulation factor cDNAs characterized to date. These include factor VII (Hagen et al., 1986), factor IX (Kurachi and Davie, 1982; Jaye et al., 1983), factor X (Fung et al., 1984, 1985; Leytus et al., 1984), protein C (Long et al., 1984; Foster and Davie, 1984; Beckmann et al., 1985), and prothrombin (MacGillivray et al., 1980; Degen et al., 1983; MacGillivray and Davie, 1984; Fig. 18). The only other serine protease known to have the AGY type codon at its active site is plasminogen (Malinowski et al., 1984). All of the other serine proteases have the TCN type active site codon, including the digestive zymogens (Craik et al., 1985; Bell et al., 1985; Swift et al., 1985), fibrinolytic zymogens (Pennica et al., 1983; Verde et al., 1984), complement factors (Campbell et al., 1983), the protein processing proteases of the kallikrein family (Mason et al., 1984; Evans and Richards, 1985; Ashley and MacDonald, 1985; van Leeuwen et al., 1986), cytolytic proteases (Gershenfeld and Weissman, 1986; Lobe et al., 1986), and the non-vitamin K-dependent coagulation factors XII and XI and prekallikrein (Cool et al., 1985; Fujikawa et al., 1986; Chung et al., 1986). A gene for a digestive serine protease in Drosophila melanogaster has been isolated (Davis et al., 1985), and this gene also has the TCN type serine codon. The distribution of the types of serine codons suggests that the ancestral serine codon for the serine protease gene was of the TCN type, which is now found in both vertebrates and invertebrates.
If this is true, then during the evolution of the vitamin K-dependent coagulation factors, the codon for serine changed from the TCN type to the AGY type, and if this occurred as two separate base pair changes (which appears to be more likely than a simultaneous double mutation), then an intermediate protein existed which had no serine protease activity. It is interesting to note that haptoglobin is homologous to serine proteases (Kurosky et al., 1980) but is inactive, at least in part because of mutations in its active site. It is possible that haptoglobin is a descendant of the non-serine protease coagulation factor intermediate. This would be possible if the gene for the non-serine protease intermediate was duplicated prior to the second point mutation (the one to restore serine protease function), with one product becoming the coagulation factors and the second product becoming haptoglobin.

The reason that plasminogen also has the AGY type serine codon is not clear. Amino acid sequence homology indicates that plasminogen is only distantly related to the vitamin K-dependent coagulation factors (Hewett-Emmett et al., 1980), and is the result of a separate duplication from the one giving rise to the digestive zymogen ancestor. This implies that the serine codon in plasminogen changed independently of the serine codon of the vitamin K-dependent coagulation factors. The rates of evolution of the amino acid sequence of most of the serine proteases are unknown, and therefore amino acid sequence homology may not provide the best description of the evolutionary relatedness of these proteins. Unfortunately, the gene structure of plasminogen is not completely known (Sadler et al., 1985), and cannot be used at this time to aid in resolving its relationship to the vitamin K-dependent coagulation factors.

I.
MODEL OF THE EVOLUTION OF THE VITAMIN K-DEPENDENT COAGULATION FACTORS

A model for the evolution of the vitamin K-dependent coagulation factors is shown in Fig. 27. In this model, amino acid sequence homologies, change(s) in active site serine codon, and gene structural organization are all used in an attempt to describe the pathway of evolution of the coagulation factors. It is clear that the vitamin K-dependent coagulation factors are a separate branch of the family of serine proteases, as shown by their amino acid sequence homology (Hewett-Emmett et al., 1980) and their common active site serine codon (see above). Based on amino acid sequence homologies, the ancestor to the vitamin K-dependent coagulation factors diverged from the digestive protease zymogens, probably after a gene duplication. This occurred early in eukaryotic evolution and probably more than one billion years ago (Young et al., 1978). It is also clear that the prothrombin and factor IX-like genes diverged early in eukaryotic evolution, more than 600 million years ago (Young et al., 1978), as is evident from the differences found in gene organization of the prothrombin and factor IX-like genes (see Fig. 26). If the haptoglobin gene is also derived from the vitamin K-dependent coagulation factor ancestor, this would give this branch of the serine protease family a third type of gene organization, indicating the great age of this branch. The sharing of intron position between the genes for triose-phosphate isomerase of plants and animals (Marchionni and Gilbert, 1986; Gilbert et al., 1986) suggests that the different branches of the serine protease family diverged from each other more than one billion years ago. This ancient age of the duplications may explain the different gene organizations due to intron insertions. Amino acid homology (Young et al., 1978; Hewett-Emmett et al., 1980) would not disagree with the dates of these duplications.

Fig. 27: A Model for the Evolution of the Vitamin K-Dependent Coagulation Factors. Rectangles represent the serine protease domain, with S for active serine protease, and X for altered active site serine residue. Triangles with γ represent the leader-Gla domain. Squares represent the kringles, and are numbered as in mammalian prothrombins. Circles with E represent the epidermal growth factor homologies (see text for details).

[Figure 27: tree diagram with branches labelled Trypsinogen etc., Haptoglobin, Prothrombin, Factor VII, Factor IX, Factor X, Protein C, and Protein Z; events labelled duplication, point mutation, and gene fusion.]

Fig. 27 demonstrates the early gene duplication separating the coagulation factor ancestor from the digestive protease zymogen (e.g. trypsinogen) gene, probably more than one billion years ago. To change the active site serine codon, at least two point mutations are required. The first mutation preceded the duplication that led to the separation of haptoglobin and the coagulation factors. The second mutation restored serine protease function, but occurred in only one of the two products of the gene duplication.
These two point mutations probably occurred close together in time, so that no other point mutations could alter essential parts of the protease domain and prevent possible function as a serine protease once the serine codon was restored.

All vitamin K-dependent coagulation factors have a Gla region (Jackson and Nemerson,1980), so acquisition of this region probably predates other changes in the molecules (though later exon shuffling events may also be involved with acquisition of this region in some genes). All Gla containing genes characterized to date have prepro-leaders (Fung et al.,1985; Pan et al.,1985), implying that the prepro-leader was acquired together with the Gla domain. Exon organization of this region (Fig.24) supports this proposal, though the pre-peptide may have been acquired at a separate time (the pre-peptide may have been part of the original protease gene to allow secretion, e.g. as in trypsinogen). Exon shuffling may account for the acquisition of this domain; this requires introns to be present, so the intron invasion process must already have started (but not finished, see below). Duplication of the Gla containing protease gene would then allow the formation of prothrombin and the factor IX-like genes. The Gla domain appears to be found in all prothrombin molecules isolated to date, including the lamprey (Doolittle et al.,1962); thus the Gla region must have been acquired at least 450 million years ago. After duplication of the Gla containing protease gene, intron invasion continued to produce the distinctive organizations of the serine protease domains (see Fig.26).
The Gla region retained its particular organization while intron invasion occurred. This implies that a homogenization process of the Gla region may have been involved (e.g. gene conversion) to retain this organization, similar to the processes often seen with repeated DNA sequences (Dover,1982).

Additional protein domains were acquired by both the prothrombin and the factor IX-like genes (prothrombin acquired the two kringles, and the factor IX-like genes acquired the epidermal growth factor homologies). In both genes these domains are found as discrete units made up of one or two exons. These domains are organized such that insertion of the exon(s) would not create frame shifts, but would result in a larger mRNA using the same reading frame (Fig.25). Exon shuffling appears to be the mechanism by which one copy of each of these domains was inserted into the respective gene. In both the prothrombin and factor IX-like genes, this new domain is found twice, and in both cases it does not appear that the two copies of the domain are the result of a partial gene duplication (Patthy,1985). It appears that in both genes, a second copy of the same domain was inserted independently. Two independent insertions of the same sequence into the same gene seem extremely unlikely. A possible mechanism for this occurrence is a partial gene duplication of this repeated domain followed at a later time by a gene conversion type event with an unrelated gene. This would mask the internal gene duplication event and increase the probability of an exon shuffling event. In prothrombin it appears that the first kringle acquired was kringle 2, followed by kringle 1, which was acquired from kringle 3 of plasminogen (Kurosky et al.,1980). In the factor IX-like genes, the order of acquisition of the EGF homologies, or whether both were acquired at the same time, is unknown.
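The condition that an inserted domain must not create frame shifts can be stated in terms of intron phase, i.e. the number of nucleotides of an interrupted codon that lie upstream of the intron (0, 1 or 2). The sketch below is an illustrative assumption rather than the analysis behind Fig.25; the function names and the exon lengths in the usage note are hypothetical. It checks that a module of one or more exons can be inserted into a host intron without shifting the reading frame only when both introns flanking the module share the phase of the host intron, which in turn requires the total coding length of the module to be a multiple of three.

```python
def downstream_phase(upstream_phase, exon_lengths):
    """Phase of the intron that follows a module of exons.

    `upstream_phase` is the phase (0, 1 or 2) of the intron preceding the
    module; `exon_lengths` lists the lengths, in nucleotides, of its exons.
    """
    return (upstream_phase + sum(exon_lengths)) % 3


def insertion_preserves_frame(host_phase, upstream_phase, exon_lengths):
    """True if inserting the module into an intron of `host_phase` keeps the frame.

    Both introns flanking the module must have the same phase as the host
    intron, so that codons split by the new splice junctions are completed
    exactly as before.
    """
    return (upstream_phase == host_phase
            and downstream_phase(upstream_phase, exon_lengths) == host_phase)
```

With made-up lengths, a two-exon module of 105 and 138 nucleotides flanked by phase 1 introns passes the check for a phase 1 host intron, whereas a single 100-nucleotide exon in the same position does not.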
Further amino acid substitutions and, to a lesser extent, insertions and deletions (see Fig.23) resulted in the prothrombin genes found today. As demonstrated by the conserved protein structure of prothrombin in mammals and birds, the structure of prothrombin found today was completed at least 250 million years ago. The factor IX-like gene has undergone many gene duplication events to produce the family of factors VII, IX, X, protein C, and protein Z. As factors VII, IX, and X are found in both chickens (Didisheim et al.,1959; Walz et al.,1974) and mammals (Jackson and Nemerson,1980), it appears that the factor IX-like structure was completed at least 250 million years ago, as were at least some of the gene duplication events. Further characterization of coagulation factor genes in the other classes of vertebrates, and attempts to identify these genes in the non-vertebrate chordates, would assist in clarifying the pathways of evolution of the vitamin K-dependent coagulation factors, and would date the steps in their evolution more precisely.

J. EVOLUTION OF THE BLOOD COAGULATION SYSTEM

It seems to be possible to trace the evolutionary histories of individual coagulation factors (see above and Fig.27), but the evolution of the coagulation system is more clouded.
It is difficult to imagine a modern vertebrate without a blood coagulation system (the loss of a single blood coagulation factor, as in hemophilia, is damaging enough). Clearly, coagulation of some type must have evolved prior to or with the emergence of the vertebrates 600 million years ago. The pre-vertebrate life form is unknown, which makes it difficult to follow the origin and development of the blood coagulation system. It has been proposed that the coagulation factors evolved from proteins which previously existed in plasma (Doolittle,1961) and which had functions unrelated to hemostasis. The first role of fibrinogen may have been to increase the viscosity of blood (Doolittle,1961). Thrombin (or prothrombin) may have evolved from another plasma protease zymogen, after the acquisition of its ability to produce insoluble fibrin from fibrinogen. All blood coagulation systems yet described are much more complicated than this simple system; indeed, it may no longer exist today.

All the vitamin K-dependent coagulation factors are related to each other, probably as a result of gene duplications and other events (see above). Thus, the expansion of the blood coagulation cascade may be the direct result of these gene duplications, with subsequent modification of the substrate specificity to produce the stepwise cascade of reactions. Accessory proteins are also required for efficient blood coagulation, and it appears that at least factors V and VIII are related to each other (Fass et al.,1985).
In fact, the enzyme complexes for prothrombin and factor X activation are very similar (see Fig.2) and could easily have arisen by duplication of the genes for the entire complex.

The duplication of a prothrombin ancestor cannot account for all the serine proteases found in the mammalian coagulation cascade. The serine proteases involved in intrinsic coagulation initiation are not closely related to prothrombin, as indicated by the presence of the TCN type serine codon at their active site instead of AGY (see above). Factor XII appears more closely related to the fibrinolytic enzymes tissue-type and urokinase-type plasminogen activators (Cool et al.,1985; Neurath,1985). Evidence for intrinsic blood coagulation in the chicken and fish is absent (Didisheim et al.,1959; MacFarlane,1960; Doolittle et al.,1962), indicating that intrinsic initiation may be absent in these species. The factor XI and prekallikrein amino acid and nucleotide sequences are homologous (Chung et al.,1986; Fujikawa et al.,1986). It has been proposed that the genes for these two coagulation factors are the result of a recent gene duplication event that occurred approximately 250 million years ago (Chung et al.,1986). This duplication event may have occurred only in mammals, as the mammalian lineage diverged from the reptilian and avian lineages also about 250 million years ago (Culbert,1980). Thus, this gene duplication in mammals may have provided the necessary proteases to allow the evolution of an intrinsic blood coagulation cascade. The absence of intrinsic coagulation in the non-mammalian vertebrates can therefore be reconciled with the possible existence of additional plasma proteases in the mammals.
The development and evolution of the mammalian blood coagulation system has thus involved many different types of gene evolution events. As shown in Fig.27, gene fusion events (mediated by exon shuffling) have been responsible for the construction of the various blood coagulation proteins. Gene duplications have been involved in the supply of new proteases to allow the expansion of the cascade (the vitamin K-dependent proteins, see Fig.27). Gene duplications of distantly related proteases, which possibly had no role in coagulation, allowed the evolution of a variant of the blood coagulation cascade (the intrinsic pathway). Investigation of the structure of the genes of the mammalian blood coagulation proteins has helped in providing a clearer picture of the mechanisms which have been involved in the formation of this essential physiological process.

LITERATURE CITED

1. Alber, T., and Kawasaki, G. (1982). Nucleotide Sequence of the Triose Phosphate Isomerase Gene of Saccharomyces cerevisiae. J. Mol. Appl. Genet. 1; 419-434.
2. Anderson, G. F., and Barnhart, M. I. (1964). Intracellular Localization of Prothrombin. Proc. Soc. Exp. Biol. Med. 116; 1-16.
3. Anson, D. S., Choo, K. H., Rees, D. J. G., Giannelli, F., Gould, K., Huddleston, J. A., and Brownlee, G. G. (1984). The Gene Structure of Human Anti-Haemophilic Factor IX. EMBO J. 3; 1053-1060.
4. Artymiuk, P. J., Blake, C. C. F., and Sippel, A. E. (1981). Genes Pieced Together - Exons Delineate Homologous Structures of Diverged Lysozymes. Nature 290; 287-288.
5. Ashley, P. L., and MacDonald, R. J. (1985). Kallikrein-Related mRNAs of the Rat Submaxillary Gland: Nucleotide Sequence of Four Distinct Types Including Tonin. Biochemistry 24; 4512-4520.
6. Atkinson, T. and Smith, M. (1984).
Solid Phase Synthesis of Oligodeoxyribonucleotides by the Phosphite-Triester Method, in Oligonucleotide Synthesis: A Practical Approach (Gait, M. J. Ed.), IRL Press, Oxford, pp. 35-81.
7. Aviv, H., and Leder, P. (1972). Purification of Biologically Active Globin Messenger RNA by Chromatography on Oligothymidylic Acid-Cellulose. Proc. Natl. Acad. Sci. USA 69; 1408-1412.
8. Barlow, J. J., Mathias, A. P., and Williamson, R. (1963). A Simple Method for the Quantitative Isolation of Undegraded High Molecular Weight Ribonucleic Acid. Biochem. Biophys. Res. Commun. 7; 61-66.
9. Beckmann, R. J., Schmidt, R. J., Santerre, R. F., Plutzky, J., Crabtree, G. R., and Long, G. L. (1985). The Structure and Evolution of a 461 Amino Acid Human Protein C Precursor and Its Messenger RNA Based Upon the DNA Sequence of Cloned Liver cDNA. Nucleic Acids Res. 13; 5233-5247.
10. Bell, G. I., Quinto, C., Quiroga, M., Valenzuela, P., Craik, C. S., and Rutter, W. J. (1984). Isolation and Sequence of a Rat Chymotrypsinogen B Gene. J. Biol. Chem. 259; 14265-14270.
11. Bentley, A. K., Rees, D. J. G., Rizza, C., and Brownlee, G. G. (1986). Defective Propeptide Processing of Blood Clotting Factor IX Caused by a Mutation of Arginine to Glutamine at Position -4. Cell 45; 343-348.
12. Benton, W. D., and Davis, R. W. (1977). Screening λgt Recombinant Clones by Hybridization in situ. Science 196; 180-182.
13. Benyajati, C., Place, A. R., Powers, D. A., and Sofer, W. (1981). Alcohol Dehydrogenase Gene of Drosophila melanogaster: Relationship of Intervening Sequences to Functional Domains of the Protein. Proc. Natl. Acad. Sci. USA 78; 2717-2721.
14. Benyajati, C., Spoerel, N., Haymerle, H., and Ashburner, M. (1983). The Messenger RNA for Alcohol Dehydrogenase in Drosophila melanogaster Differs in Its 5' End in Different Developmental Stages. Cell 33; 125-133.
15. Berget, S. M. (1984).
Are U4 Small Nuclear Ribonucleoproteins Involved in Polyadenylation? Nature 309; 179-182.
16. Berget, S. M., Moore, C., and Sharp, P. A. (1977). Spliced Segments at the 5' Terminus of Adenovirus 2 Late mRNA. Proc. Natl. Acad. Sci. USA 74; 1371-1375.
17. Biggs, R., Douglas, A. S., MacFarlane, R. G., Dacie, J. V., Pitney, W. R., Merskey, C., and O'Brien, J. R. (1952). Christmas Disease: A Condition Previously Mistaken for Haemophilia. Brit. Med. J. 2; 1378-1382.
18. Birnboim, H. C., and Doly, J. (1979). A Rapid Extraction Procedure for Screening Recombinant Plasmid DNA. Nucleic Acids Res. 7; 1513-1523.
19. Birnstiel, M. L., Busslinger, M., and Strub, K. (1985). Transcription Termination and 3' Processing: The End is in Site. Cell 349-359.
20. Blake, C. C. F. (1978). Do Genes-In-Pieces Imply Proteins-In-Pieces? Nature 273; 267.
21. Blake, C. (1983a). Exons - Present From the Beginning? Nature 306; 535-537.
22. Blake, C. (1983b). Exons and the Evolution of Proteins. Trends Biochem. Sci. 8; 11-13.
23. Blake, C. C. F. (1985). Exons and the Evolution of Proteins. Int. Rev. Cytol. 93; 149-185.
24. Blattner, F. R., Williams, B. G., Blechl, A. E., Denniston-Thompson, K., Farber, H. E., Furlong, L. -A., Grunwald, D. J., Kiefer, D. O., Moore, D. D., Schamm, J. W., Sheldon, E. L., and Smithies, O. (1977). Charon Phages: Safer Derivatives of Bacteriophage Lambda for DNA Cloning. Science 196; 161-169.
25. Blin, N., and Stafford, D. W. (1976). A General Method for Isolation of High Molecular Weight DNA from Eukaryotes. Nucleic Acids Res. 3; 2303-2308.
26. Bloom, A. L. (1981). Inherited Disorders of Blood Coagulation, in Haemostasis and Thrombosis (Bloom, A. L., and Thomas, D. P. Eds.), Churchill Livingstone, Edinburgh, pp. 321-370.
27. Bloomquist, M. C., Hunt, L. T., and Barker, W. C. (1984). Vaccinia Virus 19-Kilodalton Protein: Relationship to Several Mammalian Proteins Including Two Growth Factors. Proc. Natl. Acad. Sci.
USA 81; 7363-7367.
28. Breathnach, R., and Chambon, P. (1981). Organization and Expression of Eukaryotic Split Genes Coding for Proteins. Ann. Rev. Biochem. 50; 349-383.
29. Brinkhous, K. M. (1947). Clotting Deficiency in Haemophilia: Deficiency in a Plasma Factor Required for Platelet Utilization. Proc. Soc. Exp. Biol. Med. 66; 117-120.
30. Brown, J. R., Daar, I. O., Krug, J. R., and Maquat, L. E. (1985). Characterization of the Functional Gene and Several Processed Pseudogenes in the Human Triosephosphate Isomerase Gene Family. Mol. Cell. Biol. 5; 1694-1706.
31. Busslinger, M., Moschonas, N., and Flavell, R. A. (1981). β+ Thalassemia: Aberrant Splicing Results from a Single Point Mutation in an Intron. Cell 27; 289-298.
32. Butkowski, R. J., Elion, J., Downing, M. R., and Mann, K. G. (1977). Primary Structure of Human Prethrombin 2 and α-Thrombin. J. Biol. Chem. 252; 4942-4957.
33. Calos, M. P., and Miller, J. H. (1980). Transposable Elements. Cell 20; 579-595.
34. Campbell, R. D., and Porter, R. R. (1983). Molecular Cloning and Characterization of the Gene Coding for Human Complement Protein Factor B. Proc. Natl. Acad. Sci. USA 80; 4464-4468.
35. Campbell, R. D., Bentley, D. R., and Morley, B. J. (1984). The Factor B and C2 Genes. Phil. Trans. R. Soc. Lond. B. 306; 367-378.
36. Cavalier-Smith, T. (1978). Nuclear Volume Control by Nucleoskeletal DNA, Selection for Cell Volume and Cell Growth Rate, and the Solution of the DNA C-Value Paradox. J. Cell. Sci. 34; 247-278.
37. Cavalier-Smith, T. (1985). Selfish DNA and the Origin of Introns. Nature 315; 283-284.
38. Cech, T. R. (1983). RNA Splicing: Three Themes with Variation. Cell 34; 713-716.
39. Cheng, S. -M., Suzuki, A., Zon, G. and Liu, T. -Y. (1986). Characterization of a Complementary Deoxyribonucleic Acid for the Coagulogen of Limulus polyphemus. Biochim. Biophys. Acta 868; 1-8.
40. Chirgwin, J.
M., Przybyla, A. E., MacDonald, R. J., and Rutter, W. J. (1979). Isolation of Biologically Active Ribonucleic Acid from Sources Enriched in Ribonuclease. Biochemistry 18; 5294-5299.
41. Chow, L. T., Gelinas, R., Broker, T. R., and Roberts, R. J. (1977). An Amazing Sequence Arrangement at the 5' Ends of Adenovirus 2 Messenger RNA. Cell 12; 1-8.
42. Chow, L. T., and Broker, T. R. (1981). Mapping RNA:DNA Heteroduplexes by Electron Microscopy, in Electron Microscopy in Biology (Griffith, J. D. Ed.), vol. 1, Wiley, New York, pp. 139-188.
43. Chung, D. W., Fujikawa, K., McMullen, B. A., and Davie, E. W. (1986). Human Plasma Prekallikrein, A Zymogen to a Serine Protease that Contains Four Tandem Repeats. Biochemistry 25; 2410-2417.
44. Comp, P. C., Nixon, R. R., Cooper, M. R., and Esmon, C. T. (1984). Familial Protein S Deficiency is Associated with Recurrent Thrombosis. J. Clin. Invest. 74; 2082-2088.
45. Cool, D. E., Edgell, C. -J. S., Louie, G. V., Zoller, M. J., Brayer, G. D., and MacGillivray, R. T. A. (1985). Characterization of Human Blood Coagulation Factor XII cDNA: Prediction of the Primary Structure of Factor XII and the Tertiary Structure of β-Factor XIIa. J. Biol. Chem. 260; 13666-13676.
46. Cornish-Bowden, A. (1985). Are Introns Structural Elements or Evolutionary Debris? Nature 313; 434-435.
47. Crabtree, G. R., and Kant, J. A. (1982). Organization of the Rat γ-Fibrinogen Gene: Alternate mRNA Splice Patterns Produce the γA and γB (γ') Chains of Fibrinogen. Cell 31; 159-166.
48. Crabtree, G. R., Comeau, C. M., Fowkes, D. M., Fornace, A. J., Malley, J. D., and Kant, J. A. (1985). Evolution and Structure of the Fibrinogen Genes: Random Insertion of Introns or Selective Loss? J. Mol. Biol. 185; 1-19.
49. Craik, C. S., Sprang, S., Fletterick, R., and Rutter, W. J. (1982a). Intron-Exon Splice Junctions Map at Protein Surfaces. Nature 299; 180-182.
50.
Craik, C. S., Laub, O., Bell, G. I., Sprang, S., Fletterick, R., and Rutter, W. J. (1982b). The Relationship of Gene Structure to Protein Structure, in Gene Regulation (O'Malley, B., and Fox, C. F. Eds.), Academic Press, New York, pp. 35-54.
51. Craik, C. S., Rutter, W. J., and Fletterick, R. (1983). Splice Junctions: Association with Variation in Protein Structure. Science 220; 1125-1129.
52. Craik, C. S., Choo, Q. -L., Swift, G. H., Quinto, C., MacDonald, R. J., and Rutter, W. J. (1984). Structure of Two Related Rat Pancreatic Trypsin Genes. J. Biol. Chem. 259; 14255-14264.
53. Craik, C. S., Largman, C., Fletcher, T., Roczniak, S., Barr, P. J., Fletterick, R., and Rutter, W. J. (1985). Redesigning Trypsin: Alteration of Substrate Specificity. Science 228; 291-297.
54. Crick, F. (1979). Split Genes and RNA Splicing. Science 204; 264-271.
55. Culbert, E. M. (1980). Evolution of the Vertebrates, John Wiley and Sons, New York.
56. Curtis, C. G. (1981). Plasma Factor XIII, in Haemostasis and Thrombosis (Bloom, A. L., and Thomas, D. P. Eds.), Churchill Livingstone, Edinburgh, pp. 192-197.
57. Dagert, M., and Ehrlich, S. D. (1979). Prolonged Incubation in Calcium Chloride Improves the Competence of Escherichia coli Cells. Gene 6; 23-28.
58. Dahlback, B., Lundwall, A., and Stenflo, J. (1986). Primary Structure of Bovine Vitamin K-Dependent Protein S. Proc. Natl. Acad. Sci. USA 83; 4199-4203.
59. Dam, H. (1935). The Antihaemorrhagic Vitamin of the Chick. Biochem. J. 29; 1273-1285.
60. Dam, H., Schonheyder, F., and Tage-Hansen, E. (1936). Studies on the Mode of Action of Vitamin K. Biochem. J. 30; 1075-1079.
61. Darnell, J. E., and Doolittle, W. F. (1986). Speculations on the Early Course of Evolution. Proc. Natl. Acad. Sci. USA 83; 1271-1275.
62. Davie, E. W., and Ratnoff, O. D. (1964). Waterfall Sequence for Intrinsic Blood Clotting.
Science 145; 1310-1312.
63. Davie, E. W., Fujikawa, K., Kurachi, K., and Kisiel, W. (1979). The Role of Serine Proteases in the Blood Coagulation Cascade. Adv. Enzymol. 48; 277-318.
64. Davie, E. W., Degen, S. J. F., Yoshitake, S., and Kurachi, K. (1983). Cloning of Vitamin K-Dependent Clotting Factors. Dev. Biochem. 25; 45-52.
65. Davis, C. A., Riddell, D. C., Higgins, M. J., Holden, J. J. A., and White, B. N. (1985). A Gene Family in Drosophila melanogaster Coding for Trypsin-Like Enzymes. Nucleic Acids Res. 13; 6605-6619.
66. Degen, S. J. F., MacGillivray, R. T. A., and Davie, E. W. (1983). Characterization of the Complementary Deoxyribonucleic Acid and Gene Coding for Human Prothrombin. Biochemistry 22; 2087-2097.
67. Degen, S. J. F., Rajput, B., Reich, E., and Davie, E. W. (1985). Coagulation and Fibrinolysis: Characterization of the Human Prothrombin and Tissue Plasminogen Activator Genes, in Protides of the Biological Fluids (Peeters, H. Ed.), vol. 33, Pergamon Press, Oxford, pp. 47-50.
68. Degen, S. J. F., Rajput, B., and Reich, E. (1986). The Human Tissue Plasminogen Activator Gene. J. Biol. Chem. 261; 6972-6985.
69. Deininger, P. L. (1983). Random Subcloning of Sonicated DNA: Application to Shotgun DNA Sequence Analysis. Anal. Biochem. 129; 216-223.
70. Delaney, A. D. (1982). A DNA Sequence Handling Program. Nucleic Acids Res. 10; 61-67.
71. Delbaere, L. T. J., Hutcheon, W. L. B., James, M. N. G., and Thiessen, W. E. (1975). Tertiary Structural Differences Between Microbial Serine Proteases and Pancreatic Serine Proteases. Nature 257; 758-763.
72. Dennis, E. S., Gerlach, W. L., Pryor, A. J., Bennetzen, J. L., Inglis, A., Llewellyn, D., Sachs, M. M., Ferl, R. J., and Peacock, W. J. (1984). Molecular Analysis of the Alcohol Dehydrogenase (ADH1) Gene of Maize. Nucleic Acids Res. 12; 3983-4000.
73. Dennis, E. S., Sachs, M. M., Gerlach, W. L., Finnegan, E. J., and Peacock, W. J.
(1985). Molecular Analysis of the Alcohol Dehydrogenase 2 (ADH2) Gene of Maize. Nucleic Acids Res. 13; 727-743.
74. Didisheim, P., Hattori, K., and Lewis, J. H. (1959). Hematologic Coagulation Studies in Various Animal Species. J. Lab. Clin. Med. 53; 866-875.
75. Doolittle, R. F. (1961). The Comparative Biochemistry of Blood Coagulation, Ph. D. Thesis, Harvard Univ.
76. Doolittle, R. F. (1965). Differences in the Clotting of Lamprey Fibrinogen by Lamprey and Bovine Thrombin. Biochem. J. 94; 735-741.
77. Doolittle, R. F. (1984). Fibrinogen and Fibrin. Ann. Rev. Biochem. 53; 195-229.
78. Doolittle, R. F. (1985). The Genealogy of Some Recently Evolved Vertebrate Proteins. Trends Biochem. Sci. 10; 233-237.
79. Doolittle, R. F., and Surgenor, D. M. (1962). Blood Coagulation in Fish. Amer. J. Physiol. 203; 964-970.
80. Doolittle, R. F., Oncley, J. L., and Surgenor, D. M. (1962). Species Differences in the Interaction of Thrombin and Fibrinogen. J. Biol. Chem. 237; 3123-3127.
81. Doolittle, R. F., Feng, D. F., and Johnson, M. S. (1984). Computer-Based Characterization of Epidermal Growth Factor Precursor. Nature 307; 558-560.
82. Doolittle, W. F. (1978). Genes in Pieces: Were They Ever Together? Nature 272; 581-582.
83. Dover, G. (1982). Molecular Drive: A Cohesive Mode of Species Evolution. Nature 299; 111-117.
84. Duester, G., Jornvall, H., and Hatfield, G. W. (1986). Intron-Dependent Evolution of the Nucleotide Binding Domains Within Alcohol Dehydrogenase and Related Enzymes. Nucleic Acids Res. 14; 1931-1941.
85. Dush, M. K., Sikela, J. M., Kahn, S. A., Tischfield, J. A., and Stambrook, P. J. (1985). Nucleotide Sequence and Organization of the Mouse Adenine Phosphoribosyltransferase Gene: Presence of a Coding Region Common to Animal and Bacterial Phosphoribosyltransferases that has a Variable Intron/Exon Arrangement. Proc. Natl. Acad.
Sci. USA 82; 2731-2735.
86. Edgell, M. H., Hardies, S. C., Brown, B., Voliva, C., Hill, A., Phillips, S., Comer, M., Burton, F., Weaver, S., and Hutchison III, C. A. (1983). Evolution of the Mouse β Globin Complex Loci, in Evolution of Genes and Proteins (Nei, M., and Koehn, R. K. Eds.), Sinauer Associates Inc., Sunderland, Mass., pp. 1-13.
87. Edmonds, M., Vaughn, M. H., and Nakazato, H. (1971). Polyadenylic Acid Sequences in the Heterogeneous Nuclear RNA and Rapidly-Labeled Polyribosomal RNA of HeLa Cells: Possible Evidence for a Precursor Relationship. Proc. Natl. Acad. Sci. USA 68; 1336-1340.
88. Engle, R. L., and Woods, K. R. (1960). Comparative Biochemistry and Embryology, in The Plasma Proteins (Putnam, F. W. Ed.), vol. 2, Academic Press, New York, pp. 184-266.
89. Esmon, C. T., and Jackson, C. M. (1974). The Conversion of Prothrombin to Thrombin IV: The Function of Fragment 2 Region During Activation in the Presence of Factor V. J. Biol. Chem. 249; 7791-7797.
90. Esmon, C. T. (1983). Protein-C: Biochemistry, Physiology, and Clinical Implications. Blood 62; 1155-1158.
91. Evans, B. A., and Richards, R. I. (1985). The Genes for the α and γ Subunits of Mouse Nerve Growth Factor are Contiguous. EMBO J. 4; 133-138.
92. Fass, D. N., Hewick, R. M., Knutson, G. J., Nesheim, M. E., and Mann, K. G. (1985). Internal Duplication and Sequence Homology in Factor V and VIII. Proc. Natl. Acad. Sci. USA 82; 1688-1691.
93. Feinberg, A. P., and Vogelstein, B. (1983). A Technique for Radiolabeling DNA Restriction Endonuclease Fragments to High Specific Activity. Anal. Biochem. 132; 6-13.
94. Fenton II, J. W. (1981). Thrombin Specificity. Ann. N. Y. Acad. Sci. 370; 468-495.
95. Fenton II, J. W., and Bing, D. H. (1986). Thrombin Active-Site Regions. Semin. Thromb. Hemost. 12; 200-208.
96. Fisher, R., Waller, E. K., Grossi, G., Thompson, D., Tizard, R., and Schleuning, W. -D.
(1985). Isolation and Characterization of the Tissue-Type Plasminogen Activator Structural Gene Including Its 5' Flanking Region. J. Biol. Chem. 260; 11223-11230.
97. Foster, D. C., and Davie, E. W. (1984). Characterization of a cDNA Coding for Human Protein C. Proc. Natl. Acad. Sci. USA 81; 4766-4770.
98. Foster, D. C., Yoshitake, S., and Davie, E. W. (1985). The Nucleotide Sequence of the Gene for Human Protein C. Proc. Natl. Acad. Sci. USA 82; 4673-4677.
99. Fujikawa, K., Chung, D. W., Hendrickson, L. E., and Davie, E. W. (1986). Amino Acid Sequence of Human Factor XI, A Blood Coagulation Factor with Four Tandem Repeats That Are Highly Homologous with Plasma Prekallikrein. Biochemistry 25; 2417-2424.
100. Fuller, G. M. and Doolittle, R. F. (1971a). Studies of Invertebrate Fibrinogen I: Purification and Characterization of Fibrinogen from the Spiny Lobster. Biochemistry 10; 1305-1311.
101. Fuller, G. M. and Doolittle, R. F. (1971b). Studies of Invertebrate Fibrinogen II: Transformation of Lobster Fibrinogen to Fibrin. Biochemistry 10; 1311-1315.
102. Fung, M. R., Campbell, R. M., and MacGillivray, R. T. A. (1984). Blood Coagulation Factor X mRNA Encodes a Single Polypeptide Containing a Pre-Pro Leader Sequence. Nucleic Acids Res. 12; 4481-4492.
103. Fung, M. R., Hay, C. W., and MacGillivray, R. T. A. (1985). Characterization of an Almost Full-Length cDNA Coding for Human Blood Coagulation Factor X. Proc. Natl. Acad. Sci. USA 82; 3591-3595.
104. Furie, B., Bing, D. H., Feldmann, R. J., Robison, D. J., Burnier, J. P., and Furie, B. C. (1982). Computer-Generated Models of Blood Coagulation Factor Xa, Factor IXa, and Thrombin Based on Structural Homology with Other Serine Proteases. J. Biol. Chem. 257; 3875-3882.
105. Gershenfeld, H. K., and Weissman, I. L. (1986). Cloning of a cDNA for a T-Cell-Specific Serine Protease from a Cytotoxic T Lymphocyte.
Science 232; 854-858.
106. Gilbert, W. (1978). Why Genes in Pieces? Nature 271; 501.
107. Gilbert, W. (1979). Introns and Exons: Playgrounds of Evolution, in Eukaryotic Gene Regulation (Axel, R., Maniatis, T., and Fox, C. F. Eds.), Academic Press, New York, pp. 1-12.
108. Gilbert, W. (1985). Genes-In-Pieces Revisited. Science 228; 823-824.
109. Gilbert, W., Marchionni, M., and McKnight, G. (1986). On the Antiquity of Introns. Cell 46; 151-154.
110. Gluzman, Y. (1985). Eukaryotic Transcription: The Role of cis- and trans-Acting Elements in Initiation, Cold Spring Harbor Publications, Cold Spring Harbor.
111. Go, M. (1981). Correlation of DNA Exonic Regions with Protein Structural Units in Haemoglobin. Nature 291; 90-92.
112. Go, M. (1983). Modular Structural Units, Exons, and Function in Chicken Lysozyme. Proc. Natl. Acad. Sci. USA 80; 1964-1968.
113. Goldberg, D. A. (1980). Isolation and Partial Characterization of the Drosophila Alcohol Dehydrogenase Gene. Proc. Natl. Acad. Sci. USA 77; 5794-5798.
114. Grabowski, P. J., Seiler, S. R., and Sharp, P. A. (1985). A Multicomponent Complex is Involved in the Splicing of Messenger RNA Precursors. Cell 42; 345-353.
115. Graves, C. B., Grabau, G. G., Olsen, R. E., and Munns, T. W. (1980a). Immunochemical Isolation and Electrophoretic Characterization of Precursor Prothrombins in H-35 Rat Hepatoma Cells. Biochemistry 19; 266-272.
116. Graves, C. B., Grabau, G. G., and Munns, T. W. (1980b). Biosynthesis and Processing of Precursor Prothrombins, in Vitamin K Metabolism and Vitamin K-Dependent Proteins (Suttie, J. W. Ed.), University Park Press, Baltimore, pp. 529-541.
117. Griffin, J. H. (1981). The Contact Phase of Blood Coagulation, in Haemostasis and Thrombosis (Bloom, A. L., and Thomas, D. P. Eds.), Churchill Livingstone, Edinburgh, pp. 84-97.
118. Griffin, J. H., Evatt, B., Zimmerman, T.
S., and Kleiss, A. J. (1981). Deficiency of Protein C in Congenital Thrombotic Disease. J. Clin. Invest. 68; 1370-1373.
119. Guyton, A. C. (1977). Basic Human Physiology: Normal Function and Mechanisms of Disease, Second Edn., W. B. Saunders, Philadelphia.
120. Hagen, F. S., Gray, C. L., O'Hara, P., Grant, F. J., Saari, G. C., Woodbury, R. G., Hart, C. E., Insley, M., Kisiel, W., Kurachi, K., and Davie, E. W. (1986). Characterization of a cDNA Coding for Human Factor VII. Proc. Natl. Acad. Sci. USA 83; 2412-2416.
121. Hall, L., Craig, R. K., Edbrooke, M. R., and Campbell, P. N. (1982). Comparison of the Nucleotide Sequence of Cloned Human and Guinea-Pig Pre-α-Lactalbumin cDNA With That of Chicken Pre-Lysozyme cDNA Suggests Evolution From a Common Ancestral Gene. Nucleic Acids Res. 10; 3503-3515.
122. Hardies, S. C., Edgell, M. H., and Hutchison III, C. A. (1984). Evolution of the Mammalian β-Globin Gene Cluster. J. Biol. Chem. 259; 3748-3756.
123. Hewett-Emmett, D., Czelusniak, J., and Goodman, M. (1981). The Evolutionary Relationships of the Enzymes in Blood Coagulation and Haemostasis. Ann. N. Y. Acad. Sci. 370; 511-527.
124. Hood, L., Kronenberg, M., and Hunkapiller, T. (1985). T Cell Antigen Receptor and Immunoglobulin Supergene Family. Cell 40; 225-229.
125. Hougie, C., Barrow, E. M., and Graham, J. B. (1957). Stuart Clotting Defect I: Segregation of a Hereditary Hemorrhagic State from the Heterogenous Group Heretofore Called "Stable Factor" (SPCA, Proconvertin, Factor VII) Deficiency. J. Clin. Invest. 36; 485-496.
126. Hojrup, P., Jensen, M. S., and Petersen, T. E. (1985). Amino Acid Sequence of Bovine Protein Z: A Vitamin K-Dependent Serine Protease Homolog. F. E. B. S. Lett. 184; 333-338.
127. Irwin, D. M., Ahern, K. G., Pearson, G. D., and MacGillivray, R. T. A. (1985). Characterization of the Bovine Prothrombin Gene. Biochemistry 24; 6854-6861.
128. Jackson, C. M. (1981).
Biochemistry of Prothrombin Activation, in Haemostasis and Thrombosis (Bloom, A. L. , and Thomas, D. P~. Eds. ) , Chu r c h i l l Livingstone, Edinburgh, pp. 140-162. 129. Jackson, C. M., and Nemerson, Y. (1980). Blood Coagulation. Ann. Rev. Biochem. 49; 765-811. 130. Jaye, M., de la S a l l e , H., Schamber, F., Balland, A., Kohli, V., F i n d e l i , A., Tolstoshev, P., and Lecocq, J. P. (1983). Isolation of Anti-Haemophi1ic Factor IX cDNA Using a Unique 52-Base Synthetic Oligonucleotide Probe Deduced from the Amino Acid Sequence 202 of Bovine Factor IX. Nucleic Acids Res. 11; 2325-2335. 131. Jelinek, W. R., and Schmid, C. W. (1982). Repetitive Sequences in Eukaryotic DNA and Their Expression. Ann. Rev. Biochem. 51; 813-844. 132. Kan, Y. W., and Dozy, A. M. (1978). Polymorphism of DNA Sequence Adjacent, to Human 7-Globin Structural Gene: Relationship to S i c k l e Mutation. Proc. Natl. Acad. S c i . USA 75j_ 5631-5635. 133. Karn, J . , Brenner, S., Barnett, L., and Cesareni, G. (1980). Novel Bacteriophage X Cloning Vector. Proc. N a t l . Acad. S c i . USA 77j_ 5172-5176. 134. Katayama, K., Ericsson, L. H., En f i e l d , D. L., Walsh, K., Neurath, H., Davie, E. W., and T i t a n i , K. (1979). Comparison of Amino Acid Sequence of Bovine Coagulation Factor IX (Christmas Factor) with That of Other Vitamin K-Dependent Plasma Proteins. Proc. Natl. Acad. S c i . USA 76; 4990-4994. 135. Katz, L., Kingsbury, D. T., and Helinski, D. R. (1973). Stimulation By C y c l i c Adenosine Monophosphate of Plasmid Deoxyribonucleic Acid Replication and Catabolic Repression of the Plasmid Deoxyribonucleic Acid-Protein Relaxation Complex. J . B a c t e r i o l . 114; 577-591. 136. Katz, L., Williams, P. H., Sato, S., Laevitt, R. W., and Hel i n s k i , D. R. (1977). P u r i f i c a t i o n and Characterization of Covalently Closed Replicative Intermediats of ColE1 DNA From Escherichia c o l i . Biochemistry 16; 1677-1683. 137. K e l l e r , E. B., and Noon, W. A. (1984). 
Intron S p l i c i n g : A Conserved Internal Signal in Introns of Animal Pre-mRNA's. Proc. Natl. Acad. S c i . USA 8_l_£_ 7417-7420. 138. K e l l e r , W. (1984). The RNA La r i a t : A New Ring to the Spl i c i n g of mRNA Precursors. C e l l 34; 423-425. 139. Kraut, J. (1977). Serine Proteases: Structure and Mechanism of C a t a l y s i s . Ann. Rev. Biochem. 46; 331-358. 140. Kruger, K., Grabowski, P. J., Zaug, A. J., Sands, J . , Gottschling, D. E., and Cech, T. R. (1982). S e l f - S p l i c i n g RNA: Autoexcession and Autocyclization of the Ribosomal RNA Intervening Sequence of Tetrahymena. C e l l 31 ; 147-1 57. 141. Kurachi, K., and Davie, E. W. (1982). Isolation and Characterization of a cDNA Coding for Human Factor IX. Proc. Natl. Acad. S c i . USA 79j_ 6461-6464. 203 142. Kurosky, A., Barnett, D. R., Lee, T. -H., Touchstone, B., Hay, R. E., Arnott, M. S., Bowman, B. H., and Fitc h , W. M. (1980). Covalent Structure of Human Haptoglobin: A Serine Protease Homolog. Proc. N a t l . Acad. S c i . USA 77j_ 3388-3392. 143. Law, S. W., and Brewer, H. B. (1984) . Nucleotide Sequence and the Encoded Amino Acids of Human Apolipoprotein A-I mRNA. Proc. Natl. Acad. S c i . USA 8Jj_ 66-70 . 144. Lawn, R. M. , F r i t s c h , E. F. , Parker, R. C , Blake, G., and Maniatis, T. (1978) . The Isolation and Characterization of Linked 5- and 7-Globin Genes From a Cloned Library of Human DNA. C e l l 1157-1174. 145; Lehrach, H., Diamond, D., Wozney, J . R., and Boedtker, H. (1977). RNA Molecular Weight Determination by Gel Electrophoresis under Denaturing Conditions, A C r i t i c a l Reexamination. Biochemistry 16; 4743-4751. 146. Leonard, W. J. , Depper, J. M., Kanehisa, M., Kronke, M. , Peffer, N. J., S v e t l i k , P. B., Su l l i v a n , M. , and Greene, W. C. (1985). Structure of the Human Interleukin-2 Receptor Gene. Science 230; 633-639. 147. Leytus, S. P., Chung, D. W., K i s i e l , W., Kurachi, K. , and Davie, E. W. (1984). 
Characterization of a cDNA Coding for Human Factor X. Proc. Natl. Acad. S c i . USA 81; 3699-3702. 148. L i , S. S., Tiano, H. F., Fukasawa, K. M., Yagi, K., Shimizu, M., Sharief, S., Nakashima, Y., and Pan, Y. E. (1985). Protein Structure and Gene Organization of Mouse Lactate Dehydrogenase-A Isozyme. Eur. J. Biochem. 149; 215-225. 149. L i , W-. H. (1983). Evolution of Duplicate Genes and Pseudogenes, in Evolution of Genes and proteins (Nei, M., and Koehn, R. K. Eds.), Sinauer Associated Inc., Sunderland, Mass., pp. 14-37. 150. L i , W-. H., Luo, C-. C , and Wu. C-I. (1985). Evolution of DNA Sequence, in Molecular Evolutionary Genetics (Maclntyre, R. J. Ed.), Plenum Press, New York, pp. 1-94. 151. Liang, S. -M., and Li u , T. -Y. (1982). Studies on the Limulus Coagulation System: Inhibit i o n of Act i v a t i o n of the Proclotting Enzyme by Dimethyl Sulfoxide. Bioc. Bioph. Res. Comm. 105; 553-559. 152. L i z a r d i , P. M. (1983). Methods for the Preparation of Messenger RNA. Meth. Enzymol. 96; 24-38. 204 153. Lobe, C. G., Finlay, B. B., Paranchych, W., Paetkau, V. H., and Bleachley, R. C. (1986). Novel serine Proteases Encoded by Two Cytotoxic T Lymphocyte-Specific Genes. Science 232; 858-861. 154. Lonberg, N., and G i l b e r t , W. (1985). Intron/Exon Structure of the Chicken Pyruvate Kinase Gene. C e l l 40; 81-90. 155. Long, G. L., Balagaje, R. M., and MacGillivray, R. T. A. (1984). Cloning and Sequencing of Liver cDNA Coding for Bovine Protein C. Proc. Natl. Acad. S c i . USA 8_lj_ 5653-5656. 156. MacFarlane, R. G. (1960). The Blood Coagulation System, in The Plasma Proteins (Putnam, F. W. Ed.), v o l . 2, Academic Press, New York, pp. 137-181. 157. MacFarlane, R. G. (1964). An Enzyme Cascade in the Blood C l o t t i n g Mechanism and Its Function as a B i o l o g i c a l Amplifier. Nature 202; 498-499. 158. MacGillivray, R. T. A., Degen, S. J. F., Chandra, T., Woo. S. L. C , and Davie, E. W. (1980). 
Cloning and Analysis of a cDNA Coding for Bovine Prothrombin. Proc. Natl. Acad. S c i . USA 77j_ 5153-5157. 159. MacGillivray, R. T. A., and Davie, E. W. (1984). Characterization of Bovine Prothrombin mRNA and Its Translation Product. Biochemistry 23; 1626-1634. 160. Maeda, N., Yang, F. , Barnett, D. R., Bowman, B. H., and Smithies, O. (1984). Duplication Within the Haptoglobin Hp 2 Gene. Nature 309; 131-135. 161. Magnusson, S., Petersen, T. E., Sottrup-Jensen, L., and Claeys, H. (1975). Complete Primary Structure of Prothrombin: I s o l a t i o n , Structure and Reactivity of Ten Carboxylated Glutamic Acid Residues and Regulation of Prothrombin Activation by Thrombin, in Proteases and  B i o l o g i c a l Control (Reich, E., R i f k i n , B. D., and Shaw, E. Eds.), Cold Spring Harbor Laboratories, Cold Spring Harbor, pp. 123-149. 162. Malinowski, D. P., Sadler, J. E., and Davie, E. W. (1984). Characterization of a Complementary Deoxyribonucleic Acid Coding for Human and Bovine Plasminogen. Biochemistry 23; 4243-4250. 163. Maniatis, T., J e f f r e y , A., and Kleid, D. G. (1975). Nucleotide Sequence of the Rightward Operator of Phage X. Proc. Natl. Acad. S c i . USA 72j_ 1184-1188. 164. Maniatis, T., F r i t s c h , E. F., and Sambrook, J. (1982). 205 Molecular Cloning: A Laboratory Manual , Cold Spring Harbor Laboratories, Cold Spring Harbor. 165. Marchionni, M., and G i l b e r t , W. (1986). The Triosphosphate Isomerase Gene From Maize: Introns Antedate the Plant Animal Divergence. C e l l 46; 133-141. 166. Mason, A. J . , Evans, B. A., Cox, D. R., Shine, J. and Richards, R. I. (1983). Structure of Mouse K a l l i k r e i n Gene Family Suggests a Role in S p e c i f i c Processing of B i o l o g i c a l l y Active Peptides. Nature 303; 300-307 167. McDevitt, M. A., Imperiale, M. J . , A l i , H., and Nevins, J. R. (1984). Requirement of a Downstream Sequence for Generation of a Poly(A) Addition S i t e . C e l l 37; 993-999. 168. McKnight, S. 
L., and Kingsbury, R. (1982). Transcriptional Control Signals of a Eukaryotic Protein-Coding Gene. Science 217; 316-324. 169. McKnight, G. L., O'Hara, P. J . , and Parker, M. L. (1986). Nucleotide Sequence of the Triosephosphate Isomerase Gene from Aspergillus nidulans: Implications for a D i f f e r e n t i a l Loss of Introns. C e l l 46; 143-147. 170. McLachlan, A. D. (1979). Gene Duplication in the Structural Evolution of Chymtrypsinogen. J . Mol. B i o l . 128; 49-79. 171. McMullen, B. A., and Fujikawa, K. (1985). Amino Acid Sequence of the Heavy Chain of Human a-Factor XIIa (Activated Hageman Factor). J . B i o l . Chem. 260; 5328-5341 . 172. Messing, J. (1983). New M13 Vectors for Cloning. Meth. Enzymol. 101; 20-78. 173. Messing, J., Crea, R., and Seeburg, P. H. (1981). A System for Shotgun DNA Sequencing. Nucleic Acids Res. 9; 309-321. 174. M i l l s , D. C. B. (1981). The Basic Biochemistry of the Pl a t e l e t , in Haemostasis and Thrombosis (Bloom, A. L., and Thomas, D. P. Eds.), C h u r c h i l l Livingstone, Edinburgh, pp. 50-60. 175. Montell, C , Fisher, E. E., Caruthers, M. H., and Berk, A. J. (1983). Inhibit i o n of RNA Cleavage But not Polyadenylation by a Point Mutation in mRNA Concencus Sequence AAUAAA. Nature 305; 600-608. 176. Morley, B. J., and Campbell, R. D. (1984). Internal Homologies of the Ba Fragment of Human Complement 206 Component Factor B, A Class III MHC Antigen. EMBO J. 3j_ 153-157. 177. Mount, S. M. (1982). A Catalogue of Splice Junction Sequences. Nucleic Acids Res. 10; 459-472. 178. Nagamine, Y. , Pearson, D., Atlus, M. S., and Reich, E. (1984). cDNA and Gene Sequence of Porcine Plasminogen Activator. Nucleic Acids Res. 12; 9525-9541 . 179. Nagamine, Y., Pearson, D., and Grattan, M. (1985). Exon-Intron Boundary Sl i d i n g in the Generation of Two mRNA's Coding For Porcine Urokinase-Like Plasminogen Activator. Biochem. Biophys. Res. Commun 132; 563-569. 180. Naora, H. , and Deacon, N. J. (1982). 
Relationship Between the Total Size of Exons and Introns in Protein-Coding Genes of Higher Eukaryotes. Proc. Natl. Acad. S c i . USA T 9 ± 6196-6200. 181. Nasmyth, K. (1983). Molecular Analysis of a C e l l Lineage. Nature 302; 670-676. 182. Neurath, H. (1984). Evolution of Proteolytic Enzymes. Science 224; 350-357. 183. Neurath, H. (1985). Proteolytic Enzymes, Past and Present. Fed. Proc. 44; 2907-2913. 184. Neurath, H., and Walsh, K. A. (1976). The Role of Proteases in B i o l o g i c a l Regulation, in Proteolysis and  Physi o l o g i c a l Regulation (Robbins, D. W., and Brew, K. Eds.), Academic Press, New York, pp. 29-42. 185. Nevins, J . R. (1983). The Pathway of Eukaryotic mRNA Formation. Ann. Rev. Biochem. 52; 441-466. 186. Ny, T., Elgh, F., and Lund, B. (1984). The Structure of the Human Tissue-Type Plasminogen Activator Gene: Correlation of Intron and Exon Structures to Functional and St r u c t u r a l Domains. Proc. Natl. Acad. S c i . USA 81; 5355-5359. 187. Owen, C. A., and Bollman, J. L. (1948). Prothrombin Conversion Factor of Diacumarol Plasma. Proc. Soc. Exp. B i o l . Med. 67j_ 231-234. 188. Pan, L. C , and Price, P. A. (1985). The Propeptide of Rat Bone 7-Carboxyglutamic Acid Protein Shares Homology With Other Vitamin K-Dependent Protein Precursors. Proc. Natl. Acad. S c i . USA 82j_ 6109-6113. 189. Pan, L. C , Williamson, M. K. , and Price, P. A. (1985). 207 Sequence of the Precursor to Rat Bone 7-Carboxyglutamic Acid Protein That Accumulates in Warfarin Treated Osteosarcoma C e l l s . J. B i o l . Chem. 260; 13398-13401. 190. Park, C. H., and Tulinsky, A. (1986). Three-Dimensional Structure of the Kringle Sequence: Structure of Prothrombin Fragment 1. Biochemistry 25; 3977-3982. 191. Patek, A. J . , and Taylor, F. H. L. (1937). Hemophilia II: Some Properties of a Substrate Obtained From Normal Plasma E f f e c t i v e in Accelerating the Coagulation of Hemophilic Blood. J. C l i n . Invest. 16; 113-124. 192. Patthy, L. (1985). 
Evolution of the Proteases of Blood Coagulation and F i b r i n o l y s i s by Assembly From Modules. C e l l 41; 657-663. 193. Pennica, D., Holmes, W. E., Kohr, W. J . , Harkins, R. N., Vehar, G. A., Ward, C. A., Bennett, W. F., Yelverton, E., Seeburg, P. H., Heyneker, H. L., Goeddel, D. V., and Collen, D. (1983). Cloning and Expression of Human Tissue-Type Plasminogen Activator cDNA in E. c o l i . Nature 301; 214-221. 194. Perler, F., E f s t r a t i a d i s , A., Lomedico, P., G i l b e r t , W., Kolodner, R., and Dodgson, J . (1980). The Evolution of Genes: The Chicken Preproinsulin Gene. C e l l 20; 555-566. 195. Perry, R. P. (1976). Processing of RNA. Ann. Rev. Biochem. 45; 605-629. 196. Petersen, T. E., Thogersen, H. C., Shorstengaard, K., Vibe-Pedersen, K., Sahl, P., Sottrup-Jensen, L., and Magnusson, S. (1983). P a r t i a l Primary Structure of Bovine Plasma Fibronectin: Three Types of Internal Homology. Proc. Natl. Acad. S c i . USA 80; 137-141. 197. Pichersky, E., Got t l i e b , L. D., and Hess, J. F. (1984). Nucleotide Sequence of the Triose Phosphate Isomerase Gene of E. c o l i . Mol. Gen. Genet. 195; 314-320. 198. Plutzky, J., Hoskins, J. A., Long. G. L., and Crabtree, G. R. (1986). Evolution and Organization of the Human Protein C Gene. Proc. Natl . Acad. S c i . USA 83; 546-550. 199. Prochownik, E. V., Markham, A. F., and Orkin, S. H. (1983). I s o l a t i o n of a cDNA Clone for Human Antithrombin I I I . J . B i o l . Chem. 258; 8389-8394. 200. Proudfoot, N. J., and Brownlee, G. G. (1976). 3' Non-Coding Region Sequences in Eukaryotic Messenger RNA. Nature 263; 211-214. 208 2 0 1 . Quick, A. J. ( 1 9 4 3 ) . On the Constitution of Prothrombin. Amer. J. Physiol. 140; 2 1 2 - 2 2 0 . 2 0 2 . Quick, A. J. ( 1 9 4 7 ) . Studies on the Enigma of the Hemostatic Dysfunction of Hemophilia. Amer. J. Med. S c i . 2 1 4 ; 2 7 2 - 2 8 0 . 2 0 3 . Ratnoff, 0 . D. ( 1 9 7 7 ) . 
Blood C l o t t i n g Mechanisms: An Overview, in Haemostasis: Biochemistry, Physiology and  Pathyology (Ogston, D., and Bennett, B. Eds.), John Wiley and Sons., London, pp. 1 -24 . 2 0 4 . Ratnoff, 0 . D., and Colopy, J. H. ( 1 9 5 5 ) . A F a m i l i a l Hemorrhagic T r a i t Associated With a Deficiency of a Clot-Promoting Fraction of Plasma. J. C l i n . Invest. 3 4 ; 6 0 2 - 6 1 3 . 2 0 5 . R i c c i o , A., Grimaldi, G., Verde, P., Sebastue, G., Boast, S., and B l a s i , F. ( 1 9 8 5 ) . The Human Urokinase-Plasminogen Activator Gene and Its Promoter. Nucleic Acids Res. 13; 2 7 5 9 - 2 7 7 1 . 2 0 6 . Richardson, K. K., Crosby, R. M., Good, P. J . , Rosen, N. L., and Mayfield, J. E. ( 1 9 8 6 ) . Bovine DNA Contains a Single Major Family of Interspersed Repetitive Sequences. Eur. J. Biochem. 154; 3 4 9 - 3 5 4 . 2 0 7 . Rogers, J . ( 1 9 8 5 ) . Exon Shuffling and Intron Invasion in Serine Protease Genes. Nature 3 1 5 ; 4 5 8 - 4 5 9 . 2 0 8 . Rosenthal, R. L., Dreskin, 0 . H., and Rosenthal, M. ( 1 9 5 3 ) . New Hemophilia-Like Disease Caused by Deficiency of a Third Plasma Thromboplastin Factor. Proc. Soc. Exp. B i o l . Med. 8 2 ; 1 7 1 - 1 7 4 . 2 0 9 . Ruskin, B., and Green, M. R. ( 1 9 8 5 ) . S p e c i f i c and Stable Intron-Factor Interactions Are Established Early During In V i t r o Pre-mRNA Sp l i c i n g . C e l l 4 3 ; 1 3 1 - 1 4 2 . 2 1 0 . Russel, P. R. ( 1 9 8 5 ) . Transcription of the Triose-Phosphate Isomerase Gene of Shizosacchromyces pombe I n i t i a t e s from a Start Point Different From That in Sacchromyces cerevisiae. Gene 4 0 ; 1 2 5 - 1 3 0 . 2 1 1 . Sadler, J . E. , Malinowski, D. P., and Davie, E. W. ( 1 9 8 5 ) . Cloning and Structural Characterization of the Gene for Human Plasminogen, in Progress in F i b r i n o l y s i s (Davidson, J. F., Donati, M. B., and Coccheri, S. Eds.), v o l . VII, Churchill Livingstone, Edinburgh, pp. 2 01— 2 0 4 . 2 1 2 . Sanger, F., Nicklen, S., and Coulsen, A. R. ( 1 9 7 7 ) . 
DNA Sequencing With Chain-Terminating Inhib i t o r s . Proc. Natl. Acad. S c i . USA 74j_ 5 4 6 3 - 5 4 6 7 . 209 2 1 3 . S c h w a r z , H . P . , F i s c h e r , M . , H o p m e i e r , P . , B a t a r d , M . A . , a n d G r i f f i n , J . H . ( 1 9 8 4 ) . P l a s m a P r o t e i n S D e f i c i e n c y i n F a m i l i a l T h r o m b o t i c D i s e a s e . B l o o d 6 4 ; 1 2 9 7 - 1 3 0 0 . 2 1 4 . S e i d , R . C , a n d L i u , T . - Y . ( 1 9 8 0 ) . P u r i f i c a t i o n a n d P r o p e r t i e s o f t h e L i m u l u s C l o t t i n g E n z y m e . D e v . B i o c h e m . 1 0 ; 4 8 1 - 4 9 3 . 2 1 5 . S h a r p , P . A . ( 1 9 8 5 ) . On t h e O r i g i n o f S p l i c i n g a n d I n t r o n s . C e l l 42 j_ 3 9 7 - 4 0 0 . 2 1 6 . S h a t k i n , A . J . ( 1 9 8 5 ) . mRNA C a p B i n d i n g P r o t e i n s : E s s e n t i a l F a c t o r s f o r I n i t i a t i n g T r a n s l a t i o n . C e l l 4 0 ; 2 2 3 - 2 2 4 . 2 1 7 . S o l u m , N . 0 . ( 1 9 7 3 ) . T h e C o a g u l o g e n o f L i m u l u s p o l y p h e m u s H e m o c y t e s : A C o m p a r i s o n o f t h e C l o t t e d a n d N o n - C l o t t e d F o r m s o f t h e M o l e c u l e . T h r o m b o s i s R e s . 2j_ 5 5 - 7 0 . 2 1 8 . S o t t r u p - J e n s e n , L . , C l a e y s , H . , Z a j d e l , M . , P e t e r s e n , T . E . , a n d M a g n u s s o n , S . ( 1 9 7 8 ) . T h e P r i m a r y S t r u c t u r e o f H u m a n P l a s m i n o g e n : I s o l a t i o n o f Two L y s i n e -B i n d i n g F r a g m e n t s a n s O n e " M i n i - " P l a s m i n o g e n (MW, 3 8 , 0 0 0 ) b y E l a s t a s e - C a t a l y z e d S p e c i f i c L i m i t e d P r o t e o l y s i s , i n P r o g r e s s i n C h e m i c a l F i b r i n o l y s i s a n d  T h r o m b o l y s i s ( D a v i d s o n , J . F . , R o w a n , R . M . , S a m a n a , M . M , a n d D e s n o y e r , P . C . E d s . ) , v o l . 3 , R a v e n P r e s s , New Y o r k , p p . 1 9 1 - 2 0 9 . 2 1 9 . 
S o u t h e r n , E . M . ( 1 9 7 5 ) . D e t e c t i o n o f a S p e c i f i c S e q u e n c e A m o n g DNA F r a g m e n t s S e p a r a t e d b y G e l E l e c t r o p h o r e s i s . J . M o l . B i o l . 9 8 ; 5 0 3 - 5 1 7 . 2 2 0 . S t a d e n , R . ( 1 9 8 2 ) . A u t o m a t i o n o f t h e C o m p u t e r H a n d l i n g o f G e l R e a d i n g D a t a P r o d u c e d b y t h e S h o t g u n M e t h o d o f DNA S e q u e n c i n g . N u c l e i c A c i d s R e s . 1 0 ; 4 7 3 1 - 4 7 5 1 . 2 2 1 . S t e i n e r , D . F . , Q u i n n , P . S . , C h a n , S . J . , M a r s h , J . , a n d T a g e r , H . S . ( 1 9 8 0 ) . P r o c e s s i n g M e c h a n i s i m s i n t h e B i o s y n t h e s i s o f P r o t e i n s . A n n . N . Y . A c a d . S c i . 3 4 3 ; 1 - 1 6 . 2 2 2 . S t e n f l o , J . ( 1 9 7 6 ) . A New V i t a m i n K - D e p e n d e n t P r o t e i n : P u r i f i c a t i o n F r o m B o v i n e P l a s m a a n d P r e l i m i n a r y C h a r a c t e r i z a t i o n . J . B i o l . C h e m . 2 5 1 ; 3 5 5 - 3 6 3 . 2 2 3 . S t o n e , E . M . , R o t h b l u m . K . N . , a n d S c h w a r t z , R . J . ( 1 9 8 5 a ) . I n t r o n - D e p e n d e n t E v o l u t i o n o f C h i c k e n G l y c e r a l d e h y d e P h o s p h a t e D e h y d r o g e n a s e G e n e . N a t u r e 3 1 3 ; 4 9 8 - 5 0 0 . 2 2 4 - S t o n e , E . M . , R o t h b l u m , K . N . , A l e v y , M . C , K u o , T . M . , a n d S c h w a r t z , R . J . ( 1 9 8 5 ) . C o m p l e t e S e q u e n c e o f t h e 210 Chicken Glyceraldehyde-3-Phosphate Dehydrogenase Gene. Proc. Natl. Acad. S c i . USA 82j_ 1628-1632. 225. Straus, D., and Gi l b e r t , W. (1985). Genetic Engineering in the Precambrian: Structure of the Chicken Triosephosphate Isomerase Gene. Mol. C e l l . B i o l . 5; 3497-3506. 226. Stroud, R. M., Kossiakoff, A. A., and Chambers, J. L. (1977). Mechanisims of Zymogen Activation. Ann. Rev. Biophys. Bioeng. 6j_ 177-193. 227. Stryer, L. (1981). 
Biochemistry , Freman Press, San Franc i sco. 228. Sudhoff, T. C. Goldstein, J . L., Brown, M. S., and Russell. D. W. (1985a). The LDL Receptor Gene: A Mosaic of Exons Shared With Different Proteins. Science 228; 815-822. 229. Sudhoff, T. C , Russell, D. W., Goldstein, J. L., Brown, M. S., Sanchez-Pescador, R., and B e l l , G. I. (1985b). Cassette of Eight Exons Shared by Genes for LDL Receptor and EGF Precursor. Science 228; 893-895. 230. Suttie, J. W. (1985). Vitamin K-Dependent Carboxylase. Ann. Rev. Biochem. 54; 459-477. 231. Suttie, J. W., and Jackson, C. M. (1977). Prothrombin Structure, Activation and Biosynthesis. Physiol. Rev. 57; 1-70. 232. Swanson, J. C , and Suttie, J. W. (1985). Prothrombin Biosynthesis: Characterization of Processing Events in Rat Liver Microsomes. Biochemistry 24; 3890-3897. 233. Swift, G. H., Craik, C. S., Stary, S. J., Quinto, C , Lahaie, R. G., Rutter, W. J., and MacDonald, R. J. (1984). Structure of the Two Related Elastase Genes Expressed in the Rat Pancreas. J. B i o l . Chem. 2 59; 14271-14278. 234. T e l f e r , T. P., Denson, K. W., and Wright, D. R. (1956). A 'New' Coagulation Defect. B r i t . J. Haemat. 2j_ 308-316. 235. Thomas, P. S. (1980). Hybridization of Denatured RNA and Small DNA Fragments Transferred to N i t r o c e l l u l o s e . Proc. Natl. Acad. S c i . USA 77j_ 5201-5205. 236. Tulinsky, A., Park, C. H., and Kydel, T. J. (1985). The Structure of Prothrombin Fragment 1 at 3. 5 A 0 Resolution. J. B i o l . Chem. 260; 10771-10778. 21 1 237. van Leeuwen, B. H., Evans, B. A., Tregear, G. W., and Richards, R. I. (1986). Mouse Glandular K a l l i k r e i n n Genes: I d e n t i f i c a t i o n , Structure, and Expression of the Renal K a l l i k r e i n Gene. J . B i o l . Chem. 261; 5529-5535. 238. Verde, P., S t o p p e l l i , M. P., G a l e f f i , P., Di Nocera, P., and B l a s i , F. (1984). I d e n t i f i c a t i o n and Primary Sequence of an Unspliced Human Urokinase Poly(A) + RNA. Proc. Natl. Acad. 
S c i . USA 8J_^ 4727-4731. 239. V i e i r a , J ., and Messing, J, (1982), The pUC Plasmids, an M13mp7 Derived System for Insertion Mutagenisis and Sequencing With Synthetic Universal Primers. Gene 1 9; 259-268. 240. von Heijne, G. (1983). Patterns of Amino Acids Near Signal-Sequence Cleavage S i t e s . Eur. J. Biochem. 133; 17-21. 241. von Heijne, G. (1985). Signal Sequences: The Limits of Vari a t i o n . J. Mol. B i o l . 184; 99-105. 242. Walz, D. A. (1978). Comparitive Aspects of Prothrombin Acti v a t i o n . Biblo. Haemat. 44; 8-14. 243. Walz, D. A., Kipfer*, R. K. , Jones, J. P., and Olsen, R. E. (1974). P u r i f i c a t i o n and Properties of Chicken prothrombin. Arch. Biochem. Biophys. 164; 527-535. 244. Walz, D. A., Kipfer, R. K., and Olsen, R. E. (1975). Effect of Vitamin K Deficiency, Warfarin, and Inhibitors of Protein Synthesis Upon the Plasma Levels of Vitamin K-Dependent C l o t t i n g Factors in the Chick. J. Nutr. 105; 972-981. 245. Walz, D. A., Hewett-Emmett, D., and Seegers, W. H. (1977). Amino Acid Sequence of Human Prothrombin Fragments 1 and 2. Proc. Natl. Acad. S c i . USA 74j_ 1969-1972. 246. Watanabe, Y., Tsukada, T., Notake, M., Nakanishi, S, and Numa, S. (1982). St r u c t u r a l Analysis of Repetitive DNA Sequences in the Bovine Corticotropin-/3-Lipotropin Precursor Gene Region. Nucleic Acids Res. 10; 1459-1469. 247. Weaver, R. F., and Weissmann, C. (1979). Mapping of RNA by a Modification of the Berk-Sharp Procedure: The 5' termini of 15S /3-Globin mRNA Precursor and Mature 10S 0-Globin mRNA Have Ide n t i c a l Map Coordinates. Nucleic Acids Res. 7j_ 1 175-1193. 248. Wieringa, B., Hofer, E., and Weissmann, C. (1984). A 212 Minimal Intron Length But No Spe c i f i c Internal Sequence i s Required For Sp l i c i n g the Large Rabbit 0-Globin Intron. C e l l 37j_ 915-925. 249. Wilson, A. C , Carlson, S. S., and White, T. J . (1977). Biochemical Evolution. Ann. Rev. Biochem. 46; 573-639. 250. Yoshitake, S., Schach, B. 
G., Foster, D. C , Davie, E. W. and Kurachi, K. (1985). Nucleotide Sequence of the Gene for Human Factor IX (Antihemphilic Factor B). Biochemistry 24; 3736-3750. 251. Young, C. L., Barker, W. C , Tomaselli, C. M. , and Dayhoff, M. 0. (1978). Serine Proteases, in Atlas of  Protein Structure (Dayhoff, M. 0. Ed.), v o l . 5 (suppl. 3), National Biomedical Research Foundation, S i l v e r Spring, Maryland, pp. 73-93. 252. Young, R. A., and Davis, R. W. (1983a). E f f i c i e n t Isolation of Genes by Using Antibody Probes. Proc. N a t l . Acad. S c i . USA 80j_ 1194-1198. 253. Young, R. A., and Davis, R. W. (1983b). Yeast RNA polymerase II Genes: Isolation With Antibody Probes. Science 222; 778-782. 254. Zaug, A. J., and Cech, T. R. (1986). The Intervening Sequence RNA of Tetrahymena i s an Enzyme. Science 231; 470-475. 255. Zuckerkandl, E., and Pauling, L. (1965). Evolutionary Divergence and Convergence in Plasma Proteins, in Evolving  Genes and Proteins (Bryson, V., and Vogel, H. J . Eds.), Academic Press, New York, pp. 97-166. 256. Zur, M., and Nemerson, Y. (1981). Tissue Factor Pathways of Blood Coagulation, in Haemostasis and Thrombosis (Bloom, A. L., and Thomas, D. P^  Eds.), Churchi11 Livingstone, Edinburgh, pp. 124-139. 257. Zytkovicz, T. H., and Nelsestuen, G. L. (1976). 7-Carboxyglutamic Acid D i s t r i b u t i o n . Biochem. Biophys. Acta 444; 344-348. 
