UBC Theses and Dissertations

Characterization of the human ceruloplasmin cDNA and gene Koschinsky, Marlys Laverne 1988

CHARACTERIZATION OF THE HUMAN CERULOPLASMIN cDNA AND GENE

By

Marlys Laverne Koschinsky

B.Sc. (Hons), The University of Winnipeg, 1982

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

in

THE FACULTY OF GRADUATE STUDIES
(Department of Biochemistry)

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA

March 1988

© Marlys Laverne Koschinsky, 1988

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Biochemistry
The University of British Columbia
1956 Main Mall
Vancouver, Canada V6T 1Y3

ABSTRACT

A cDNA for human ceruloplasmin was identified in a human liver cDNA library by screening with mixtures of synthetic oligonucleotides complementary to two regions of the ceruloplasmin mRNA. The resulting clone (phCP-1) contained DNA coding for amino acid residues 202 - 1046 of the protein, followed by a 3' untranslated region of 123 bp and a poly(A) tail. To isolate additional clones extending in a 5' direction, two randomly-primed human liver cDNA libraries were constructed in the bacteriophage vectors λgt10 and λgt11. From the former library, a clone was isolated (λhCP-1) that contained DNA coding for a putative signal peptide consisting of 19 amino acid residues, followed by DNA encoding residues 1 - 380 of plasma ceruloplasmin.
From the λgt11 library, six ceruloplasmin cDNA clones were purified, two of which were shown to contain 10 and 38 bp of non-coding sequence extending 5' to λhCP-1. Blot hybridization analysis using cDNA probes showed that ceruloplasmin mRNA from the human hepatoma cell line HepG2 is 3700 nucleotides in size, while human liver RNA contained an additional hybridizing species of 4500 nucleotides in size.

Ceruloplasmin genomic DNA clones (spanning a region of approximately 45 Kbp) were obtained by the screening of several human genomic phage libraries using cDNA probes. These clones were initially characterized by restriction endonuclease mapping. Using DNA sequence analysis, the positions of intron/exon boundaries were determined. To date, 14 exons (average size of 183 bp) have been identified in the ceruloplasmin gene, corresponding to nucleotide residues 1 - 2565 of the coding sequence. The majority of the 14 introns localized within this region were located in analogous positions in the factor VIII gene, thereby suggesting that these two proteins have evolved from a common ancestral gene. At least 4 exons have been localized within the 5' untranslated region of the human ceruloplasmin gene, although typical eukaryotic promoter elements have not yet been identified. The significance of this novel organization remains unclear at present.

In addition to the wild-type gene, a processed pseudogene for human ceruloplasmin was identified and contained DNA corresponding to the functional gene sequence encoding the carboxy-terminal 563 amino acid residues and the 3' untranslated region.
The pseudogene appears to have arisen from a processed RNA species, since intervening sequences coincident with those of the functional gene have been removed with the exception of a short segment of intronic sequence which denotes the 5' boundary of the pseudogene. Based on genomic Southern blot analysis performed under high stringency conditions, the pseudogene seems to comprise the only sequence in the human genome that is closely related to the wild-type gene. Using somatic cell hybridization, the pseudogene was localized to human chromosome 8; this differs from the location of the wild-type ceruloplasmin gene, which has been mapped to chromosome 3.

TABLE OF CONTENTS

Abstract
List of Tables
List of Figures
Abbreviations
Acknowledgements
Dedication

I. INTRODUCTION
  I.A PERSPECTIVES
  I.B PROPERTIES OF CERULOPLASMIN
    I.B.1 Structure of the Protein
    I.B.2 Sites of Ceruloplasmin Biosynthesis
    I.B.3 Functions of Ceruloplasmin
      I.B.3.1 Ferroxidase activity
      I.B.3.2 Serum antioxidant activity
      I.B.3.3 Amine oxidase activity
      I.B.3.4 Role of ceruloplasmin in copper transport
    I.B.4 Regulation of Ceruloplasmin Expression
      I.B.4.1 Hormonal regulation of ceruloplasmin synthesis
      I.B.4.2 Copper induction of ceruloplasmin expression
      I.B.4.3 Regulation of ceruloplasmin synthesis during inflammation
  I.C ABNORMALITIES IN COPPER HOMEOSTASIS - WILSON'S DISEASE
  I.D CHARACTERIZATION OF RAT CERULOPLASMIN
  I.E THE RELATIONSHIP OF CERULOPLASMIN TO OTHER COPPER-CONTAINING PROTEINS
  I.F THE RELATIONSHIP BETWEEN CERULOPLASMIN AND PROTEINS INVOLVED IN BLOOD COAGULATION
  I.G CHARACTERIZATION OF THE HUMAN FACTOR VIII GENE
    I.G.1 Historical Perspectives
    I.G.2 Organization of the Factor VIII Gene
    I.G.3 Evolutionary Aspects of Intron Positions Within the Factor VIII Gene
  I.H DYNAMICS OF PROTEIN AND GENE EVOLUTION
    I.H.1 The Molecular Clock
  I.I MECHANISMS OF GENE EVOLUTION
    I.I.1 Gene Duplication
    I.I.2 Gene Fusion
    I.I.3 Exon Shuffling
    I.I.4 Intron Insertion and Intron Sliding
  I.J PSEUDOGENES
    I.J.1 Non-Processed Pseudogenes
    I.J.2 Processed Pseudogenes
  I.K The Present Study

II. MATERIALS AND METHODS
  II.A BACTERIAL HOSTS AND MEDIA
  II.B HYBRIDIZATION PROBES
    II.B.1 Purification and Labelling of Oligodeoxyribonucleotides
    II.B.2 Nick Translation
    II.B.3 Klenow Labelling
    II.B.4 Preparation of M13 Probes
  II.C IDENTIFICATION OF cDNAS FOR HUMAN CERULOPLASMIN
    II.C.1 Screening of a Human Liver cDNA Library
    II.C.2 Preparation and Screening of Randomly-Primed Human Liver cDNA Libraries
  II.D SCREENING OF HUMAN GENOMIC LIBRARIES
  II.E ISOLATION OF NUCLEIC ACIDS
    II.E.1 Purification of Plasmid DNA
    II.E.2 Isolation of Bacteriophage DNA
    II.E.3 Preparation of Human Genomic DNA
    II.E.4 Isolation of RNA
      II.E.4.1 Preparation of human liver poly(A)+ RNA
      II.E.4.2 Preparation of total RNA from HepG2 cells
  II.F BASIC DNA TECHNIQUES
    II.F.1 Restriction Enzyme Digestion
    II.F.2 End-Labelling of DNA Fragments
    II.F.3 Electrophoresis of DNA
      II.F.3.1 Agarose gel electrophoresis
      II.F.3.2 Polyacrylamide gel electrophoresis
    II.F.4 Southern Transfers
  II.G DNA CLONING
    II.G.1 Fragment Production
    II.G.2 Ligation of DNA into pUC or M13 Vectors
    II.G.3 Transformations
  II.H DNA SEQUENCE ANALYSIS
    II.H.1 Screening of M13 Clones
    II.H.2 Isolation of M13 Template DNA
    II.H.3 DNA Sequence Analysis
  II.I RNA ANALYSIS
    II.I.1 Northern Blot Analysis
    II.I.2 RNA Dot Blots
    II.I.3 Nuclease S1 Mapping
  II.J CHROMOSOME MAPPING
  II.K SUMMARY OF HYBRIDIZATION/WASHING CONDITIONS
    II.K.1 Genomic Southern Blot Analysis
    II.K.2 Hybridization Conditions Other Than For Genomic Southern Blots

III. RESULTS
  III.A CHARACTERIZATION OF THE HUMAN PRECERULOPLASMIN cDNA
    III.A.1 Initial Screening of a Human Liver cDNA Library
    III.A.2 Isolation of cDNA Clones Encoding the 5' End of Ceruloplasmin
    III.A.3 DNA Sequence Analysis of Human Preceruloplasmin
    III.A.4 Ceruloplasmin Transcript Analysis
  III.B CHARACTERIZATION OF THE WILD-TYPE HUMAN CERULOPLASMIN GENE
    III.B.1 Isolation and Restriction Endonuclease Mapping of Genomic Clones
    III.B.2 Localization of Intron/Exon Junctions Corresponding to the Ceruloplasmin Coding Region
    III.B.3 Partial Nucleotide Sequence Analysis of the Human Ceruloplasmin Gene
    III.B.4 Organization of the 5' End of the Human Ceruloplasmin Gene
      III.B.4.1 Comparison of genomic and cDNA sequence data
      III.B.4.2 S1 mapping analysis of exon 1
      III.B.4.3 Southern blot analysis of exon L4
      III.B.4.4 Northern blot analysis of the 5' end of the human ceruloplasmin gene
      III.B.4.5 RNA dot blot analysis
  III.C ISOLATION AND COMPLETE CHARACTERIZATION OF A PSEUDOGENE FOR HUMAN CERULOPLASMIN
    III.C.1 Isolation of Genomic Clones Containing the Human Ceruloplasmin Pseudogene
    III.C.2 DNA Sequence of the Human Ceruloplasmin Pseudogene
    III.C.3 Nuclease S1 Analysis of the Human Ceruloplasmin Pseudogene
    III.C.4 Chromosome Localization of the Human Ceruloplasmin Pseudogene
  III.D GENOMIC SOUTHERN BLOT ANALYSIS OF THE HUMAN CERULOPLASMIN GENE

IV. DISCUSSION
  IV.A CHARACTERIZATION OF THE HUMAN PRECERULOPLASMIN cDNA
    IV.A.1 DNA Sequence Analysis of the Human Preceruloplasmin cDNA
    IV.A.2 Internal Homology Within the Ceruloplasmin cDNA Sequence
    IV.A.3 Analysis of the Human Ceruloplasmin Transcript
  IV.B CHARACTERIZATION OF THE WILD-TYPE HUMAN CERULOPLASMIN GENE
    IV.B.1 Ceruloplasmin Gene Organization Corresponding to the Coding Sequence
    IV.B.2 DNA Sequence Analysis of the Wild-Type Ceruloplasmin Gene
    IV.B.3 Intron Positions Within the Triplicated A Domain of Human Ceruloplasmin
    IV.B.4 Comparison of the Gene Organization of Ceruloplasmin and Factor VIII
    IV.B.5 Characterization of the 5' End of the Human Ceruloplasmin Gene
  IV.C CHARACTERIZATION OF A PSEUDOGENE FOR HUMAN CERULOPLASMIN
    IV.C.1 DNA Sequence Analysis of the Human Ceruloplasmin Pseudogene
    IV.C.2 Chromosome Localization of the Human Ceruloplasmin Pseudogene
    IV.C.3 Speculations on the Evolutionary Origin of the Human Ceruloplasmin Pseudogene
  IV.D A MODEL FOR THE EVOLUTION OF CERULOPLASMIN, FACTOR V AND FACTOR VIII
  IV.E CONCLUDING REMARKS

V. REFERENCES

LIST OF TABLES

I. Comparison of A Domains in Factor V, Factor VIII and Ceruloplasmin
II. Summary of the Genotypes of Bacterial Hosts Used in the Present Study
III. Subcloning Strategy for the Wild-Type Human Ceruloplasmin Gene
IV. Compositions of M13 DNA Sequencing Mixes
V. Nucleotide Sequence of Intron/Exon Junctions in the Human Ceruloplasmin Gene
VI. Frequency of Nucleotides at Intron/Exon Junctions
VII. Sizes and Positions of Introns and Exons Within the Ceruloplasmin Gene
VIII. Segregation of the Human Ceruloplasmin Pseudogene with Human Chromosomes in Human-Hamster Somatic Hybrids

LIST OF FIGURES

1. Structural Model of the Human Ceruloplasmin Molecule
2. Relationship between Ceruloplasmin and Other Copper-Containing Proteins
3. Comparison of the Structural Organization of Ceruloplasmin, Factor V and Factor VIII
4. Intron Positions within the A and C Domains of Human Factor VIII
5. Schematic Summary of the Cloning of the Human Preceruloplasmin cDNA
6. Sequencing Strategy for the Human Preceruloplasmin cDNA
7. Nucleotide Sequence of Human Preceruloplasmin cDNA
8. Comparative Analysis of cDNA Clones λhCP-2 to λhCP-6
9. Blot Hybridization Analysis of Human Ceruloplasmin mRNA
10. Partial Restriction Map and Intron/Exon Organization of the Human Ceruloplasmin Gene
11. Comparison of the Sequence of λhCP-6 with Overlapping Genomic Sequence Derived from λWT2
12. Nuclease S1 Mapping of Exon 1
13. Southern Blot Analysis of Exon L4
14. Northern Blot Analysis of the 5' End of the Ceruloplasmin Gene
15. RNA Dot Blot Analysis of the 5' Untranslated Region of the Ceruloplasmin Gene
16. Partial Restriction Map and Sequencing Strategy for the Human Ceruloplasmin Pseudogene
17. Nucleotide Sequence of the Human Ceruloplasmin Pseudogene and Comparison with the Corresponding Region of the Ceruloplasmin cDNA Sequence
18. Nuclease S1 Mapping Analysis of the Ceruloplasmin Pseudogene
19. Chromosome Mapping of the Human Ceruloplasmin Pseudogene Using Somatic Cell Hybrid Analysis
20. Genomic Southern Blot Analysis of the Human Ceruloplasmin Gene
21. Genomic Southern Analysis of the Human Ceruloplasmin Pseudogene and Related Sequences
22. Genomic Southern Blot Analysis of the 3' End of the Human Ceruloplasmin Gene
23. Intron Positions Within the Three Repeated Units of Human Ceruloplasmin
24. Positions of Introns in the A Domains of Ceruloplasmin and Factor VIII
25. Comparative Positions of Introns in Ceruloplasmin with Corresponding Introns in Factor VIII
26. A Proposed Model for the Evolution of Ceruloplasmin, Factor V and Factor VIII

ABBREVIATIONS

A          adenosine
ATP        adenosine-5'-triphosphate
bp         base pairs
C          cytosine
cDNA       complementary DNA
cpm        counts per minute
dNTP       deoxyribonucleotidetriphosphate
ddNTP      dideoxyribonucleotidetriphosphate
DNA        deoxyribonucleic acid
DTT        dithiothreitol
EDTA       ethylenediamine tetraacetic acid
G          guanosine
HEPES      N-2-hydroxyethylpiperazine-N-2-ethanesulfonic acid
Kb, Kbp    Kilobases, Kilobase pairs
Kda        Kilodaltons
Krpm       thousand revolutions per minute
mRNA       messenger RNA
Mr         relative molecular weight
N          adenine, cytosine, guanine or thymine
OD         optical density
PIPES      piperazine-N,N'-bis (2-ethanesulfonic acid)
poly(A)+   polyadenylated
RNA        ribonucleic acid
SDS        sodium dodecyl sulfate
T          thymidine
TEMED      N,N,N',N'-Tetramethylethylenediamine
Tris       Tris(hydroxymethyl)aminomethane
tRNA       transfer RNA
U          uridine
UV         ultraviolet
W          watts

ACKNOWLEDGEMENTS

I would like to thank my supervisor, Ross MacGillivray, for his enthusiastic support of my career ("great"), and his willingness to allow each member of the laboratory to develop an independent approach to scientific research. Ross has generated an atmosphere of warmth and friendship within his research group that has filled my past five years with many happy memories. I would also like to thank Walter (Waltman) Funk for the special camaraderie that we have shared over the years. In addition, I would like to acknowledge Heather Kirk, for technical advice during several aspects of this project, and Susan Heming, for her patience in the preparation of the manuscript. A very special thanks to Roland Russnak for his relentless optimism - it might be contagious!

DEDICATION

To Mother and Father for their tireless love and belief in me that has given me the strength to keep trying.
I. INTRODUCTION

A. PERSPECTIVES

Copper is required for the function of a number of metalloenzymes and metalloproteins present in both prokaryotic and eukaryotic cells. Therefore, strict maintenance of copper homeostasis is essential for many vital processes. On this basis, it is not surprising that since its initial discovery by Holmberg in 1944, ceruloplasmin (the principal copper transport protein in vertebrate plasma) has been the focus of intensive biochemical study. This, in turn, has resulted in the generation of several thousand papers dealing with the properties of this large multicopper oxidase. In addition to its multifunctional nature which renders it an interesting protein for study, ceruloplasmin has also received attention in analyses tracing the evolution of metal-containing enzymes and proteins in aerobic cells (e.g. Frieden, 1974).

B. PROPERTIES OF HUMAN CERULOPLASMIN

B.1 Structure of the Protein

Ceruloplasmin is a blue, crystallizable (Morell, 1969; Nakagawa, 1972) α-2 glycoprotein that binds 90 - 95% of plasma copper in vertebrates. The remaining copper exists in complexes of amino acids (Harris and Sass-Kortsak, 1967), serum albumin (Sarkar and Wigfield, 1968) and a tripeptide composed of glycine-histidine-lysine (Pickart et al., 1980), all of which may provide auxiliary transport mechanisms (Frieden, 1981). Multiple enzymatic functions have been ascribed to ceruloplasmin (see Section I.B.3; for reviews dealing with the functions of ceruloplasmin, see Frieden, 1981; Owens, 1982; Cousins, 1985), all of which likely involve the presence of intrinsically-bound cupric ions.
Each ceruloplasmin molecule (Mr = 132 Kda) contains at least 6 or 7 copper-binding sites (Ryden and Bjork, 1976), which can be categorized based upon the physical properties of light absorbance and electromagnetic behavior. Although the stoichiometry of the different copper types of ceruloplasmin is not well-established, the following composition has been proposed: two type I and one type II [based on quantitative electron paramagnetic resonance (EPR) measurements of Deinum and Vanngard, 1973], and three (or four) type III (Ryden and Bjork, 1976).

Type I copper is present in small blue electron-transfer proteins and blue copper oxidases of which ceruloplasmin is the only mammalian representative. Type I copper centres are characterized by strong absorbance around 600 nm (red light), and typically result in narrow EPR hyperfine splitting. Based on circular dichroism (CD) and magnetic circular dichroism (MCD) studies (Dawson et al., 1979), type I sites in ceruloplasmin are proposed to be coordinated by a cysteine, methionine, and two histidine ligands, in a distorted tetrahedral geometry. Type I copper has a redox potential appreciably higher than that of the Cu (II)/Cu (I) couple in aqueous solution (Fee, 1975). This may be due to the distortion of the copper geometry which results in decreased activation energy required for electron transfer (Williams, 1971).

Type II copper absorbs weakly at 600 nm, and features an EPR hyperfine structure similar to that found in tetragonal copper complexes. Type II copper is present in large blue multicopper oxidases, such as laccase, cytochrome oxidase, ascorbate oxidase and ceruloplasmin, as well as in non-blue oxidases, such as monoamine oxidase and galactose oxidase.
Studies of the coordination environment of type II copper in ceruloplasmin (Dawson et al., 1978) and galactose oxidase (Bereman and Kosman, 1977) provide evidence for histidine coordination of the copper in these type II centres.

The type III copper centre is binuclear, composed of two copper (II) ions antiferromagnetically coupled and therefore EPR nondetectable. These copper atoms are associated with an intense absorption band at 330 nm. Type III copper is present in copper oxidases that catalyze the reduction of dioxygen to two water molecules, such as ceruloplasmin, ascorbate oxidase and laccase. Very little is known about the coordination geometry or specific ligand groups for type III centres (Urbach, 1981). An EPR signal has been detected in the latter two proteins which differs from the signals resulting from type I or type II Cu (II) ions (Reinhammar et al., 1980). This new signal has been attributed to one of the pair of Cu (II) ions existing in the binuclear centre, and is comparable to signals observed in half-met hemocyanin (Himmelwright et al., 1978) and superoxide dismutase (Fielden et al., 1974). Both of the latter proteins have been shown to contain copper-binding sites in binuclear centres.

Despite persistent study, the assignment of copper-binding sites to specific regions of the ceruloplasmin polypeptide chain remains largely undetermined. However, it has been recently demonstrated by Raju (1983) that 50% of the non-blue copper-binding occurs in an 11 Kda fragment derived from the carboxyl terminus of ceruloplasmin (corresponding to amino acid residues 885 - 1046). Additionally, based on sequence similarity with known blue and non-blue copper binding sites, several putative copper-binding centres have been identified in ceruloplasmin (see Section I.E).
Early studies suggested that ceruloplasmin had a subunit structure consisting of 2 - 8 polypeptide chains (Freeman and Daniel, 1973; Poulik and Weiss, 1975; McCombs and Bowman, 1976). Ryden (1972) demonstrated, however, that the observed subunits corresponded to proteolytic fragments that could be eliminated when fresh plasma, supplemented with protease inhibitors, was used for the isolation of ceruloplasmin. This led to a proposed single chain structure for the molecule, which was later confirmed by amino acid sequence analysis (Takahashi et al., 1984).

In vitro and possibly in vivo, spontaneous proteolytic cleavage occurs rapidly at two highly sensitive sites following basic amino acid residues, producing fragments with molecular weights of 67 Kda, 50 Kda, and 19 Kda, corresponding to the amino-terminal, middle, and carboxy-terminal portions of the protein, respectively (Kingston et al., 1980; Dwulet and Putnam, 1981a; see Figure 1). Limited tryptic digestion in vitro results in cleavage at the two labile sites described above, and additionally degrades the 50 Kda fragment, producing 25- and 26 Kda fragments, and slowly cleaves the 67 Kda fragment, yielding 49- and 18 Kda fragments (Takahashi et al., 1983; see Figure 1). The physiological significance of the limited proteolytic cleavage is unclear, but it may function in protein regulation, as is proposed for the deactivation of anaphylatoxins and bradykinin (see below).

The complete amino acid sequence of human ceruloplasmin, consisting of 1046 residues, was determined by analysis of the 19 Kda (Kingston et al., 1979) and 50 Kda (Dwulet and Putnam, 1981a) proteolytic fragments; the sequence of the 67 Kda fragment and overlapping peptides was reported by Takahashi et al. (1984). The latter strategy revealed the presence of an additional amino acid residue (Arg 481) between the 67- and 50 Kda fragments, and a single residue (Lys 667) connecting the 50- and 19 Kda fragments. It is assumed that during preparation, these two basic residues are excised by an enzyme with carboxypeptidase-like specificity. A similar mechanism is involved in the carboxypeptidase-B catalyzed removal of carboxy-terminal arginine or lysine residues that results in inactivation of kinins, and the C3a and C5a anaphylatoxins (Putnam, 1984).

Figure 1. Structural model of the human ceruloplasmin molecule based on proteolytic cleavage sites and internal amino acid sequence identity (modified from Ortel et al., 1984; see text for details). The polypeptide chain is cleaved autolytically into the 67, 50 and 19 Kda fragments as shown. In the intact polypeptide chain, these fragments are connected by single amino acid residues arginine (R) and lysine (K). Tryptic cleavage sites are indicated by vertical arrows, with broad arrows identifying major sites of cleavage. The sizes of proteolytic fragments are given in Kilodaltons (Kda). The positions of glucosamine oligosaccharide attachment sites are indicated by solid diamonds; the carbohydrate moiety that is missing in the type II ceruloplasmin variant is starred. The extent of the triplicated units A1, A2 and A3 is also shown.

[Figure 1 graphic not reproduced. Recoverable labels: Proteolytic Fragments (Kda): 67, 50, 19; Triplicated Units (amino acids): A1, A2, A3, with positions 350, 710 and 1046 marked.]
Ceruloplasmin possesses attachment sites for four glucosamine-linked (GlcN) oligosaccharides (Tetaert et al., 1982; Takahashi et al., 1984). The location of these carbohydrate attachment points was determined by separation of GlcN-containing peptides using reverse-phase high pressure liquid chromatography, followed by amino acid sequence analysis. Three sites are located in the 67 Kda fragment, while one resides in the 50 Kda proteolytic fragment (see Figure 1). All four GlcN oligosaccharides are linked in the obligatory tripeptide acceptor sequence Asn-X-Ser/Thr (Clamp, 1975). Ceruloplasmin also has three GlcN acceptor sequences that are not glycosylated. This is probably the result of their location in buried, hydrophobic regions of the protein (Ortel et al., 1984).

Two variants of ceruloplasmin have been identified based on carbohydrate composition. In type I ceruloplasmin (predominant form), all four oligosaccharides are present, while the second oligosaccharide (i.e. Asn-339) is missing in the less abundant type II form (Takahashi et al., 1984) (see Figure 1). The physiological significance of these two variant forms is unknown at present.

Based on computer analysis of the amino acid sequence, the entire human ceruloplasmin molecule has been shown to exhibit an internal 3-fold homology, with each homology unit consisting of approximately 350 amino acids (Takahashi et al., 1984; see Figure 1). These units (arbitrarily designated A1, A2 and A3 from amino- to carboxy-terminus) share nearly 40% sequence identity when compared pairwise (see Table 1), including a high degree of conservation of the four least frequent amino acids in proteins: methionine, histidine, tryptophan, and cysteine (Ortel et al., 1984).
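As a rough illustration of how a pairwise figure such as the "nearly 40% sequence identity" quoted above can be computed, the following Python sketch counts identical residues over the ungapped columns of two pre-aligned sequences. The function name, the gap symbol and the toy sequences are assumptions made for this example only; they are not taken from the ceruloplasmin sequence, and the alignment procedure actually used in the cited analyses is not described here.

```python
from itertools import combinations


def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity over aligned columns in which neither sequence has a gap.

    Both inputs must already be aligned (equal length, gaps marked by `gap`).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # skip columns containing a gap in either sequence
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Hypothetical aligned fragments, invented for illustration; NOT real
# ceruloplasmin A1/A2/A3 sequence.
toy_units = {
    "A1": "MHCW-KLTHE",
    "A2": "MHCWAKVSHD",
    "A3": "MHAW-RLSHE",
}

for (name_a, a), (name_b, b) in combinations(toy_units.items(), 2):
    print(f"{name_a} vs {name_b}: {percent_identity(a, b):.1f}% identity")
```

Different conventions for the denominator (alignment length, shorter sequence length, ungapped columns) give different percentages, so a reported identity is only comparable with others computed the same way.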
The amino acid sequence conservation has been interpreted to suggest a high level of structural conservation between these related segments (Ortel et al., 1984). The proteolytic cleavage pattern of non-denatured ceruloplasmin suggests that each of these three regions is subdivided into 2 or 3 domains (see Figure 1). The boundaries between the individual homology units do not correspond to the sites of proteolytic cleavage described previously, which generate the 67-, 50- and 19 Kda proteolytic fragments. However, trypsin cleaves at a site between the A1 and A3 domains. Due to additional sites of cleavage, these 2 units are further divided into subdomains, as is detailed in Figure 1. Differential sensitivity to proteolytic cleavage at homologous sites between proposed domains has been attributed to observed differences in both primary and secondary structures at interdomain boundaries (Ortel et al., 1984).

Local secondary structure within the ceruloplasmin polypeptide chain was determined based on measurements of CD (Noyer and Putnam, 1981) as well as the calculation of parameters predictive of secondary structure (Ortel et al., 1984). On this basis, it has been proposed that ceruloplasmin consists of 33% β sheet organization, 33% β turns and 20% α-helices. Coupled with a calculated hydropathy profile for the protein (Ortel et al., 1984), these data are in accordance with the domain model presented in Figure 1. It is attractive to speculate that different domains within the ceruloplasmin molecule may correspond to its various biological activities (see Section I.B.3).
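The Asn-X-Ser/Thr acceptor sequence mentioned earlier in this section lends itself to a simple pattern search. The sketch below is a minimal illustration, assuming one-letter amino acid codes and treating X as any residue, as stated in the text; the function name and the toy peptide are hypothetical and are not ceruloplasmin sequence. A match marks only a potential attachment site: as noted above, three such acceptor sequences in ceruloplasmin are not glycosylated.

```python
import re

# Asn-X-Ser/Thr in one-letter code: N, any residue, then S or T.
# The pattern is wrapped in a lookahead so that overlapping candidates
# (for example "NNST", which contains both NNS and NST) are all reported.
SEQUON = re.compile(r"(?=(N[A-Z][ST]))")


def find_sequons(protein):
    """Return (1-based Asn position, tripeptide) for each Asn-X-Ser/Thr match."""
    return [(m.start() + 1, m.group(1)) for m in SEQUON.finditer(protein.upper())]


# Hypothetical peptide, invented for illustration only.
toy_peptide = "MKNVSAANNSTQLNPTW"
print(find_sequons(toy_peptide))
# [(3, 'NVS'), (8, 'NNS'), (9, 'NST'), (14, 'NPT')]
```

Positions are reported 1-based to match the residue numbering style used in the text (e.g. Asn-339).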
B.2 Sites of Ceruloplasmin Biosynthesis

Originally, ceruloplasmin biosynthesis was thought to occur exclusively within the parenchymal cells of the liver (Neifakh et al., 1969). However, it has since been demonstrated that the choroid plexus, yolk sac, placenta and testis represent extrahepatic sites of ceruloplasmin synthesis in the rat (Aldred et al., 1987). It has been proposed that ceruloplasmin expression by these tissues may be important in the transport of copper across natural barriers existing between these compartments (i.e. blood/cerebrospinal fluid, maternal/fetal circulation, and blood/testis barriers) (Aldred et al., 1987). Synthesis of the transport proteins transferrin and transthyretin in these tissues (Dickson et al., 1985; Schreiber, 1987) further implicates the importance of transport proteins at the interface between extracellular compartments. Furthermore, the synthesis of transferrin in these tissues, coupled with the known ferroxidase activity of ceruloplasmin (see Section I.B.3) may be important in the transport of iron across compartment barriers. Similarly, the ability of ceruloplasmin to oxidize serotonin and various catecholamines (see Section I.B.3) may be of physiological significance in the regulation of cerebrospinal fluid composition.

Using the technique of in situ histohybridization, Yang et al. (1986) have also observed ceruloplasmin mRNA within circulating macrophages and lymphocytes. Interestingly, gene expression of transferrin has also been shown in lymphocytes (Lum et al., 1987), again suggesting coordinated functional roles for these two genetically-linked (Weitkamp, 1983; Yang et al., 1984) plasma proteins.
B.3 Functions of Ceruloplasmin

Ceruloplasmin possesses a number of enzymatic activities which cannot be attributed to a subunit organization, since there is conclusive evidence that ceruloplasmin is synthesized as a single polypeptide chain (see Section I.B.1). Rather, the multifunctional nature of the protein has been ascribed to the catalytic activities of the bound cupric ions (Frieden, 1981; see Section I.B.1). The various functions of ceruloplasmin are detailed below:

B.3.1 Ferroxidase activity (Curzon and O'Reilly, 1960; Scheinberg and Morell, 1973; Frieden and Hsieh, 1976). Although ceruloplasmin is capable of oxidizing a variety of substrates in vitro, ferrous [Fe (II)] iron has been proposed as its principal physiological substrate. In this capacity, ceruloplasmin oxidizes Fe (II) released from ferritin to the Fe (III) form, for subsequent binding to apotransferrin [i.e. Fe (II)-ferritin --(ferroxidase)--> Fe (III)-transferrin]. Thus, ceruloplasmin is directly involved in the regulation of hepatic iron mobilization.

B.3.2 Serum antioxidant activity (Al-Timimi and Dormandy, 1977; Goldstein et al., 1979). It is well established that ceruloplasmin can serve as a scavenger of free radicals and superoxide ions. Frieden (1981) has estimated that the collective radical-scavenging potential of ceruloplasmin in serum is less than that of superoxide dismutase. However, since ceruloplasmin is extracellular in location while superoxide dismutase is primarily intracellular, ceruloplasmin acts as the major scavenger in plasma, particularly during the acute phase response (see below) when ceruloplasmin levels are characteristically increased.
B.3.3 Amine oxidase activity (Peisach and Levine, 1963; Frieden, 1981). Ceruloplasmin possesses significant oxidase activity toward numerous aromatic amines and phenols in vitro. Physiologically, this activity may be important in the regulation of levels of biogenic amines such as serotonin and various catecholamines.

B.3.4 The role of ceruloplasmin in copper transport. The proposal that ceruloplasmin is a copper transport protein is strengthened by experiments demonstrating that a ceruloplasmin molecule will reversibly bind up to ten cupric ions, in addition to the intrinsically-bound, catalytically active copper atoms (McKee and Frieden, 1971). This is consistent with proposals that ceruloplasmin is the primary source of copper for intracellular metalloenzymes present in various extrahepatic tissues (Owen, 1965; Linder and Moor, 1977; Campbell et al., 1981). In this regard, there is evidence that the copper ions of ceruloplasmin are a prerequisite for copper utilization in the biosynthesis of cytochrome c oxidase (Marceau and Aspin, 1973a,b; Hsieh and Frieden, 1975). It has been proposed that ceruloplasmin Cu (II) is reduced at cell membrane receptors and that Cu (I) is subsequently transferred to an unidentified intracellular acceptor(s) (Frieden, 1981). Alternatively, ceruloplasmin may be taken up by endocytosis, and Cu (I) may then be released by proteolysis, accompanied by recycling of the protein to the plasma membrane for release (Cousins, 1985). This latter model obviates the need for an acceptor for intracellular transport.
The net result of either route of copper entry into the cells is that the labile Cu (I) form would be oxidatively transferred to intracellular apoenzymes, where it could then be fixed with the aid of oxygen into holoenzyme-Cu (II) form. Recently, evidence has been presented for the presence of a specific ceruloplasmin receptor in membranes from chicken aorta and cardiac tissues (Stevens et al., 1984). In this study, membrane fragments derived from the latter tissues bound [125I]-labelled chicken ceruloplasmin with a dissociation constant (Kd) of approximately 10^-8 M. This is consistent with studies showing that the activation of aortic lysyl oxidase is correlated with elevated plasma ceruloplasmin levels (Harris and DiSilvestro, 1981).

B.4 Regulation of Ceruloplasmin Expression

B.4.1 Hormonal regulation of ceruloplasmin synthesis. Hormonal factors have been shown to influence ceruloplasmin production by the liver (see Cousins, 1985 for a recent review). Meyer et al. (1958) have shown that epinephrine and estradiol increase serum ceruloplasmin levels in the rat, while both adrenocorticotrophic hormone (ACTH) and hydrocortisone have been shown to increase ceruloplasmin levels in chickens (Starcher and Hill, 1965). Based on these observations, it has been proposed that any stress-related change in ceruloplasmin involves adrenal steroids. Although adrenal hormones have historically received particular attention regarding their effect on ceruloplasmin levels, it has been demonstrated more recently that leukocyte endogenous mediator (interleukin 1) can also elevate serum ceruloplasmin levels (Wannemacher et al., 1975).
The stimulatory effects of leukocyte endogenous mediator on ceruloplasmin and several other plasma proteins have been proposed to play a role in the regulation of the acute phase inflammatory response (see below).

B.4.2 Copper induction of ceruloplasmin expression. Linder et al. (1979) have reported that copper directly controls the plasma concentration of ceruloplasmin in diet-induced copper-deficient rats by regulation of its level of synthesis. In these studies, the effects of oral administration of copper to copper-deficient rats were assessed by monitoring the incorporation of a two hour pulse dose of [3H]-leucine into plasma proteins. No effect of copper administration was observed on synthesis of plasma proteins in general. However, a marked effect on ceruloplasmin synthesis was observed, resulting in a significant enhancement (nearly three-fold) after 6 - 8 hours, thereby resembling the effect of iron on ferritin synthesis (Drysdale et al., 1966). It has been proposed that the sudden influx of copper associated with large administered doses may be sufficient to bypass normal control mechanisms, and may either activate ceruloplasmin gene transcription specifically, or may alter some aspect of translational regulation. These data are consistent with the observations of Weiner and Cousins (1980) using rat parenchymal cells. In the latter study, incubation with 50 µM copper for 12 hours or more significantly increased [3H]-ceruloplasmin secretion by the cells, suggesting that when extracellular copper content was sufficiently high, ceruloplasmin gene expression may have been enhanced.

B.4.3 Regulation of ceruloplasmin synthesis during inflammation - the acute phase response.
The acute phase reactants comprise a group of mainly glycoproteins which show characteristically altered rates of synthesis in the liver (Schreiber et al., 1982), resulting in changes in their plasma concentrations in response to a wide variety of inflammatory stimuli [see Koj (1974) for a review]. Ceruloplasmin is an acute phase reactant (Larson, 1974) and as such, its serum levels can become increased by 2 to 3-fold (from the normal level of 15 - 60 mg/dl serum; Owen, 1982) in response to inflammation. The property of ceruloplasmin as a serum antioxidant is attractive in this respect, since increased ceruloplasmin levels would be useful in the subsequent neutralization of lipid peroxidation products released into the serum upon tissue damage (Bonta, 1978). The observed stimulatory effect of leukocyte endogenous mediator (LEM) on hepatocytes has been proposed to result in increased ceruloplasmin synthesis during the inflammatory response (Frieden, 1981) (see above). This protein, which is released by leukocytes, stimulates the uptake of iron, zinc, and amino acids by liver cells and also enhances the synthesis and release of acute phase reactants including ceruloplasmin. The mechanism underlying the stimulatory effects of LEM is not understood. Ceruloplasmin expression during the acute phase response has recently been studied by analysis of rat liver samples following induction of inflammation by injection of the animals with turpentine (Aldred et al., 1987). Ceruloplasmin mRNA levels increased to a peak corresponding to 350% of the normal value by 36 hours. By 60 hours post-inflammation, ceruloplasmin mRNA had decreased to normal levels.
As has been the case for other acute phase reactants studied (transferrin, α2-macroglobulin, β chain of fibrinogen, α1-acid glycoprotein, and metallothionein-I) (Schreiber et al., 1986), the regulation of ceruloplasmin synthesis during the inflammatory response appears to occur at the mRNA level, affecting the rate of transcription and/or mRNA stability as opposed to the rate of protein translation.

C. ABNORMALITIES IN COPPER HOMEOSTASIS - WILSON'S DISEASE

In this autosomal recessive disorder (also referred to as hepatolenticular degeneration), pathogenesis is related to abnormal copper deposition in body tissues, especially the brain and liver (Cumings, 1948). Ceruloplasmin levels are characteristically decreased in Wilson's disease. This was initially demonstrated by Scheinberg and Gitlin (1952), who isolated ceruloplasmin from normal subjects and Wilson's disease patients and quantified its levels both immunochemically and by monitoring the decrease in absorbance (610 nm) of ceruloplasmin following reduction. The observed correlation between decreased ceruloplasmin levels and Wilson's disease has since been interpreted to suggest a defect in the rate of ceruloplasmin biosynthesis (Poulik and Weiss, 1975). Additionally, there have been reports that the disease is associated with structural anomalies of the ceruloplasmin protein. In this regard, Verbina and Puchkova (1985) have reported the isolation of ceruloplasmin from a Wilson's disease patient that differs from normal ceruloplasmin in physicochemical, immunological, and catalytic properties. They have postulated that this anomalous ceruloplasmin may be the result of incorrect post-translational modification of the protein. Gaitskhoki et al.
(1975) reported comparative immunochemical analysis (using [125I]-ceruloplasmin antibodies) of ceruloplasmin-synthesizing polysomes in liver biopsies obtained from control subjects and Wilson's disease patients. This study clearly demonstrated that the amount of ceruloplasmin-forming polysomes in patients affected with Wilson's disease was 10 - 20 times lower than that determined for normal controls. On this basis, it was proposed that a decreased level of translatable mRNA is the likely cause of the genetic block in ceruloplasmin synthesis that is characteristic of Wilson's disease. This hypothesis is consistent with the recent findings of Czaja et al. (1987). The latter study showed that ceruloplasmin mRNA levels in five Wilson's disease patients were decreased to 33% of control mRNA levels. In contrast, levels of albumin mRNA (also synthesized in the liver) were elevated to 161% in Wilson's disease patients compared with normal levels, suggesting that liver function in the former group was not compromised due to the disease state. Using nuclear run-on assays to analyze transcriptional rates, it was found that the amount of ceruloplasmin gene transcription in the Wilson's disease patients was decreased to 44% of control levels. This has been interpreted to indicate that in at least some cases of Wilson's disease, the observed reduction of plasma ceruloplasmin levels may be due to decreased ceruloplasmin gene transcription, as opposed to a defect in the rate of protein synthesis.

The causal relationship between ceruloplasmin and Wilson's disease has become complicated by the recent studies of Frydman et al. (1985), showing linkage between the gene for Wilson's disease and the esterase D locus on chromosome 13.
These data have been further substantiated by the more recent assignment of the Wilson's disease locus to chromosome 13q (Yuzbasiyan-Gurkan et al., 1987). Since the wild-type ceruloplasmin gene has been unequivocally mapped to chromosome 3 using both somatic cell and in situ hybridization techniques (Yang et al., 1986; Royle et al., 1987), it seems likely that reduced ceruloplasmin-specific transcription observed in Wilson's disease patients studied by Czaja et al. (1987) is not the result of defect(s) within the structural gene, but may involve a trans-acting factor(s) mapping to chromosome 13.

D. CHARACTERIZATION OF RAT CERULOPLASMIN

Gaitskhoki et al. (1980) have previously reported the isolation of highly-purified rat ceruloplasmin mRNA by indirect immunoprecipitation of ceruloplasmin-synthesizing rat liver polysomes. Ceruloplasmin mRNA was then translated in a heterologous wheat germ system; products of cell-free translation were immunoprecipitated and subjected to SDS-polyacrylamide gel electrophoresis. The results indicated that the primary translation product was 84 Kda. Addition of rat liver membranes to the translation system resulted in the visualization of two polypeptides (80 Kda and 65 Kda) following immunoprecipitation. The 65 Kda product was identical to the secreted form of ceruloplasmin isolated from the Golgi complex following in vivo pulse-labelling and subcellular fractionation (Neifakh et al., 1979). On this basis it was proposed that a maturation step, resulting in the appearance of the 130 Kda plasma form of ceruloplasmin, involved ligation of two 65 Kda polypeptides (Puchkova et al., 1981). In support of this theory, Prozorovski et al.
(1982) have emphasized the existence of sequence homology within each half of human holoceruloplasmin based on structural studies using [125I]-peptide mapping. However, Takahashi et al. (1984) have conclusively demonstrated a triplicated structure for the protein (see Section I.B.1). More recently, it has been shown that a single mRNA species of approximately 3.8 Kb corresponds to the ceruloplasmin transcript from rat liver, which is more than sufficient in size to encode the entire protein (Aldred et al., 1987). Aldred et al. (1987) have also reported the isolation of a rat ceruloplasmin cDNA clone. Using a partial human ceruloplasmin cDNA clone to screen a rat liver cDNA library, a clone was isolated that contained DNA coding for the equivalent of residues 194 - 879 of the human ceruloplasmin amino acid sequence (Takahashi et al., 1984). The predicted amino acid sequence derived from the rat cDNA shows close sequence identity (nearly 75%) with the determined amino acid sequence of human ceruloplasmin from amino acid residues 194 - 276 (i.e. within the A1 domain). There is striking (approximately 98%) amino acid sequence identity in the carboxy-terminal region of this latter sequence when compared with the human ceruloplasmin protein sequence, showing complete conservation from residues 227 - 276 with the exception of one conservative amino acid change occurring at residue 243. The predicted rat ceruloplasmin amino acid sequence also closely matches that determined for human ceruloplasmin between amino acids 810 - 879 (corresponding to the A3 domain), showing 83% sequence identity in this region.

E. THE RELATIONSHIP OF CERULOPLASMIN TO OTHER COPPER-CONTAINING PROTEINS

The amino acid sequence of human ceruloplasmin (Takahashi et al.
, 1984) was analyzed for segments similar to known type I copper-binding sites in small blue electron transfer proteins (e.g. azurin and plastocyanin), and in some large multicopper oxidases whose crystallographic structures and/or amino acid sequences have been determined. The three-dimensional structure of azurin (a bacterial electron transport protein with Mr = 14 Kda) has been determined from a 3.0 Å resolution electron density map using X-ray diffraction methods (Adman et al., 1978). Based on these data, the single type I copper ion present in azurin is thought to be coordinated to cys-112, met-121, his-46 and his-117 residues. This is consistent with a distorted tetrahedral geometry for that type I site that has been proposed based on near-infrared absorption, CD, and MCD studies (Solomon et al., 1976). Like azurin, plastocyanin (Mr = 10,500) also belongs to the group of small blue electron transfer proteins and possesses a single type I blue Cu (II) ion. Plastocyanin has been identified as a component in the photosynthetic chain of a number of green plants and algae. Its X-ray crystal structure at 2.7 Å resolution has been reported by Colman et al. (1978). As is the case for azurin, the coordination geometry of the type I Cu (II) is consistent with a distorted tetrahedron, liganded by a cysteine thiol group (cys-84), a methionine thioether group (met-92) and histidine imidazole groups (his-37 and his-87). When compared to the type I centres proposed for azurin and plastocyanin, the type I copper present in stellacyanin (a small blue protein with proposed electron transfer function) differs with respect to at least one coordination position. This is based on the reported amino acid sequence of stellacyanin (Bergman et al.
, 1977) which is devoid of methionine residues. This may explain observed differences in the type I centres in stellacyanin and plastocyanin as determined by EPR analysis (Peisach et al., 1967) and a comparison of redox potentials. On the basis of homology with the identified ligands to the single type I copper in azurin and plastocyanin (see above), it has been suggested that a type I copper centre is present in the carboxy terminal 19 Kda proteolytic fragment of ceruloplasmin (Kingston et al., 1979; see Figures 2A and 2B). Three of the proposed ligands are the clustered cys-1021, his-1026, and met-1031 residues; the fourth type I ligand suggested by Kingston et al. (1979) is his-956, which is 65 residues amino terminal to the cys-1021 residue. The predicted cysteine, histidine and methionine ligands are in agreement with the three dimensional model of the copper active site in human ceruloplasmin presented by Ryden (1982). However, the latter study indicates that the fourth ligand is his-975 which is more likely, since this position corresponds to the location of type I histidine ligands in both plastocyanin and azurin (see Figure 2A). It has been postulated further that a second type I copper binding site may be located in the 50 Kda proteolytic fragment (Dwulet and Putnam, 1981b), where the ligands cys-680, his-685 and met-690 are located in homologous positions to the proposed type I site in the 19 Kda fragment. The fourth residue comprising the type I site in this fragment is proposed to be his-637 (Dwulet and Putnam, 1981b), which also corresponds to the position of type I histidine ligands identified in azurin and plastocyanin (see Figure 2A).

Figure 2. The relationship between human ceruloplasmin (Cp), Neurospora crassa laccase (Lac), Pseudomonas aeruginosa azurin (Azn), poplar plastocyanin (Pl), and bovine superoxide dismutase (B.SOD) in two regions (Figures 2A and 2B) containing proposed copper ligands (taken from Dwulet and Putnam, 1981b; Germann and Lerch, 1986). Identical amino acid residues are boxed, and potential ligands to the three types of copper centers are indicated by *1, *2 and *3 respectively. Arrows identify known type I ligands in azurin and plastocyanin (see text for details). Histidine ligands implicated in copper-binding in bovine superoxide dismutase are circled. The fourth potential type I ligand in laccase (i.e. Met-169) is enclosed in a diamond. Numbers on the left of each sequence identify positions within the proteins of the first residues given. For human ceruloplasmin, the corresponding homology unit is given in brackets.

[Aligned sequences not reproduced. Panel 2A begins at Cp 276 (A1), Cp 637 (A2), Cp 975 (A3), Lac 99, B.SOD 39, Azn 46 and Pl 37; panel 2B begins at Cp 312 (A1), Cp 673 (A2), Cp 1014 (A3), Lac 164, Azn 104 and Pl 77.]
Although the 67 Kda fragment, representing the amino terminal portion of ceruloplasmin, has cysteine and histidine residues in identical positions to those present in the 19 and 50 Kda fragments, there is no corresponding methionine residue (see Figure 2B), which may preclude the binding of a type I copper ion in this region (Takahashi et al., 1983). However, the type I copper centers in ceruloplasmin have been differentiated into two subtypes, based on the kinetics of reoxidation (Herve et al., 1978). In this regard, it has been shown more recently that the type I "fast" copper center does not require methionine as a coordination ligand, while the type I "slow" center does (Herve et al., 1981). Thus, the 67 Kda proteolytic fragment also possesses a putative type I copper binding site, which may resemble the blue copper center in stellacyanin, which also lacks methionine as a ligand (see above).

A second region within the 19 Kda proteolytic fragment of ceruloplasmin is homologous to a known non-blue copper binding site in bovine and human superoxide dismutase (Richardson et al., 1975; Jabusch et al., 1980) and human cytochrome c oxidase (Barrell et al., 1979), all of which contain copper in binuclear centres. The X-ray crystal structure of bovine superoxide dismutase has been determined at 3 Å resolution. The two copper ions on opposite subunits within the dimer are 34 Å apart. The copper and zinc in each subunit are approximately 6 Å apart, and both are liganded by the imidazole ring of his-61. The protein ligands to the copper are proposed to be his-44, his-46, his-61 and his-118 (i.e. his-X-his motif), arranged in a slightly distorted square plane and thereby resembling a type II centre.
The 19 Kda fragment of ceruloplasmin has a histidine-rich sequence element (see Figure 2A) that is homologous to the non-blue copper centre described in bovine and human superoxide dismutase (Ortel et al., 1984). Based on a three-dimensional model of the copper active site of the human ceruloplasmin 19 Kda proteolytic fragment (Ryden, 1982), it has been proposed that these histidine residues (i.e. his-980 and his-982, in addition to his-978 and his-1020) (see Figures 2A and 2B) may function as type III ligands in ceruloplasmin. The latter model identifies human ceruloplasmin residues his-1022 and his-1028 as potential type II ligands (see Figure 2B). A corresponding histidine-rich cluster resembling that identified in the 19 Kda fragment of ceruloplasmin is absent from the 50 Kda proteolytic peptide (Ortel et al., 1984) (see Figure 2A). As a general observation, there is an unusually large number of his-X-his sequences in ceruloplasmin, some of which may be implicated in non-blue copper binding.

Fungal laccase (Mr = 62 Kda) is a blue oxidase containing four copper ions and a unique cysteine residue. Initially, peptides containing the single sulfhydryl group were isolated and characterized (Briving et al., 1980). Additional amino acid sequence information has been provided from partial nucleotide sequence analysis of the laccase gene (Germann and Lerch, 1986). Comparison of the available amino acid sequence for laccase with that of human ceruloplasmin has revealed the presence of several highly-conserved sequence elements in these two multicopper oxidases (see Figures 2A and 2B). Because its position is conserved with that of the proposed type I cysteine ligands in ceruloplasmin, azurin, and plastocyanin (see Figure 2B), the unique cysteine residue in laccase is thought to coordinate type I copper.
Additionally, laccase contains two histidine residues in similar positions to the proposed type I histidine ligands in the above copper-containing proteins (see Figures 2A and 2B). However, the methionine ligand which is conserved in the latter proteins is absent in laccase, as is the case for stellacyanin (see above) (Germann and Lerch, 1986). It has been proposed (Germann and Lerch, 1986) that met-169 in laccase (see Figure 2B) may be involved in type I copper coordination. In addition to similarity observed with respect to type I ligands, histidine residues have been identified in laccase that occur in identical positions to proposed non-blue copper ligands in ceruloplasmin (see Figures 2A and 2B). Based on earlier studies (Richardson et al., 1975; Ryden, 1982), it has been suggested that the conserved sequence elements in ceruloplasmin and laccase may coordinate the binding of type II and/or type III copper, in addition to their proposed involvement in type I copper centres (Briving et al., 1980; Germann and Lerch, 1986). Thus, these sequences may form a link between type I, II and III copper centres present in multicopper oxidases.

Taken together, the above data suggest that the type I copper binding site is similar in both the small blue electron transfer proteins and the large multicopper oxidases (e.g. laccase and ceruloplasmin). It also seems that in addition to the presence of a putative type I copper centre, the 19 Kda fragment of ceruloplasmin is structurally related to non-blue copper binding sites identified in other copper oxidases.
It is likely, therefore, that these proteins share a common evolutionary origin, perhaps deriving from a primordial gene encoding a small blue protein possessing either electron transfer or oxidase function.

There are several sites within the ceruloplasmin molecule that are characterized by the positioning of histidines adjacent to basic amino acid residues. These sites might be involved in copper binding as is the case for serum albumin, or the plasma tripeptide gly-his-lys (Pickart et al., 1980). The first two residues of the latter motif are involved in the binding of copper, while the side chain of the lysyl residue is proposed to be necessary for the recognition by cell surface receptors. This is analogous to plasma albumin or α-fetoprotein, in which the histidine residue that binds copper is immediately adjacent to either lysine or arginine residues (Aoyagi et al., 1980). Thus, the positioning of a histidine residue next to a basic residue may be a biologically active structure for copper uptake. The presence of such sequences in ceruloplasmin may correspond to observed sites of reversible copper binding (McKee and Frieden, 1971) which may in turn play an essential role in copper transport (see Section I.B.3.4).

F. THE RELATIONSHIP BETWEEN CERULOPLASMIN AND PROTEINS INVOLVED IN BLOOD COAGULATION

An interesting structural relationship has been shown to exist between ceruloplasmin and the blood coagulation factors V and VIII. These latter two proteins (both with Mr > 300 Kda) share a high degree of structural and functional similarity (Suzuki et al., 1982; Nesheim et al., 1984).
Both proteins function in the intrinsic blood clotting cascade (Jackson and Nemerson, 1980) in conjunction with an activated, vitamin K-dependent clotting factor (factors IXa and Xa for factors VIII and V, respectively). Both complexes require a phospholipid surface and calcium ions, and subsequently result in the specific activation of a second vitamin K-dependent coagulation protein (factor X and prothrombin for factors VIII and V, respectively). Analysis of the complete amino acid sequences of human factor V (Jenny et al., 1987) and human factor VIII (Wood et al., 1984; Toole et al., 1984) predicted from the corresponding cDNAs revealed the existence of 3 types of domains within the two proteins: a triplicated "A" domain (approximately 320 - 380 amino acid residues), a unique "B" domain (925 and 886 amino acid residues in factors VIII and V, respectively), and a duplicated "C" domain consisting of approximately 100 - 150 amino acid residues (see Figure 3). Organization of these units from amino- to carboxyl-terminal within the proteins is as follows: A1 - A2 - B - A3 - C1 - C2. The "A" domains of factors V and VIII show a high level of similarity with the triplicated units in the human ceruloplasmin molecule, sharing approximately 30 - 40% sequence identity when compared pairwise (see Table I). Of particular note is the clustering of cysteine residues at similar positions within the triplicated "A" domains of factor VIII (Vehar et al., 1984) and ceruloplasmin (Takahashi et al., 1983), indicating a high degree of structural conservation between these repeated units.
The duplicated "C" domain present in factors V and VIII is unrelated to ceruloplasmin, but shows approximately 20% sequence identity when compared with discoidins, which are phospholipid-binding lectins from Dictyostelium discoideum (Poole et al., 1981). The "B" domains in factors V and VIII are each structurally unique when compared with the rest of the molecule, and in both cases are removed upon activation of the proteins.

Figure 3. Comparison of the structural organization of ceruloplasmin, factor V and factor VIII. The triplicated A domain (designated A1, A2 and A3 in an amino- to carboxyl-terminal direction) is identified in the 3 molecules by cross-hatched bars. The B domain (present in factors V and VIII only) is represented by an open bar. The duplicated C domain (also present only in factors V and VIII) is shown by stippled bars. Sizes of the domains correspond to amino acid residues.

[Diagram not reproduced. As labelled in the original figure, each A domain spans 350 amino acids, the B domain 1000, and each C domain 150.]

Table I. Comparison of "A" domains in factor V, factor VIII and ceruloplasmin (from Jenny et al., 1987).

Matches/Length (%)

Domain    V-A2  V-A3  VIII-A1  VIII-A2  VIII-A3  Cer-A1  Cer-A2  Cer-A3
V-A1       31    32     36       30       27       31      32      21
V-A2        -    29     29       44       31       35      34      35
V-A3        -     -     33       31       39       32      34      39
VIII-A1     -     -      -       30       32       34      32      33
VIII-A2     -     -      -        -       33       38      35      36
VIII-A3     -     -      -        -        -       28      34      36
Cer-A1      -     -      -        -        -        -      37      39
Cer-A2      -     -      -        -        -        -       -      40

Values (expressed as percentages) represent total identical amino acid matches divided by overlapping lengths (including gaps).
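The percentages in Table I are described as identical matches divided by the overlapping length, gaps included. A minimal Python sketch of that calculation is given here for illustration only. The function name, the "-" gap symbol, the decision to skip columns that are gaps in both sequences, and the toy sequences are all assumptions made for this example; they are not taken from Jenny et al. (1987), and the alignment itself is assumed to have been produced beforehand.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences.

    Identical residue pairs are divided by the overlapping length, where a
    column with a gap in only one sequence still counts toward the length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    matches = 0
    overlap = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap and res_b == gap:
            continue  # assumption: a column that is a gap in both is ignored
        overlap += 1
        if res_a == res_b:
            matches += 1  # neither is a gap here, so this is a true identity

    return 100.0 * matches / overlap if overlap else 0.0


# Hypothetical six-column alignment: 4 identities over 6 columns.
print(round(percent_identity("HCH-GM", "HCHAGL"), 1))  # 66.7
```

Counting gap columns in the denominator lowers the score for alignments that need many insertions, which is one reason pairwise values computed this way can differ from identities quoted over ungapped segments.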
The functional relationship between ceruloplasmin and these two seemingly unrelated blood clotting factors is unclear. However, it has been proposed previously that factor V is a metalloprotein (Greenquist and Colman, 1975) requiring calcium and a variety of metal ions for expression of activity (Esmon, 1979; Hibbard and Mann, 1980). Recently, using both atomic emission and atomic absorption spectroscopy, factor V has been shown to contain copper in the ratio of 1 copper ion per mol of factor V (Mann et al., 1984). Since factor V exhibits an absorption spectrum with no maximum at either 310 nm (type III copper) or 610 nm (type I copper), the copper binding is assumed to be the equivalent of the type II copper center present in ceruloplasmin. Although factor VIII has not yet been assessed with respect to its metal-binding content, the peptide chains are non-covalently associated in a process that is EDTA sensitive (Fass et al., 1982). Thus, the association of the peptide chains in the factor VIII molecule may be dependent upon bound metal and/or calcium ions. The four amino acids proposed as ligands for type I copper binding in the ceruloplasmin molecule (see Section I.E) are located in analogous positions in the A1 and A3 units of factor VIII (Vehar et al., 1984). Conservation of these copper ligand residues may imply similar metal binding characteristics for both factor VIII and ceruloplasmin. In terms of physiological significance, the demonstrated structural relationship between factors V and VIII and ceruloplasmin suggests the possible involvement of copper and/or other metal ions, in addition to calcium, in the function of factors V and VIII.
The possible role of metal binding in this process is implied by studies showing the binding capacity of γ-carboxyglutamic acid residues for lanthanide ions (Sperling et al., 1978). These modified glutamic acid residues (present in blood-clotting factors II, VII, IX, X, protein Z, protein S and protein C) are thought to be involved in calcium binding, which is in turn proposed to mediate the interaction of these proteins with platelet surfaces in vivo. It is also conceivable that potential metal ligands present in factor V and possibly factor VIII can interact with a different metal ion jointly bound by the γ-carboxyglutamic acid regions of factors IX and X, thereby promoting complex formation. Ceruloplasmin has a number of enzymatic functions ascribed to it in addition to its role in copper transport and homeostasis (see Section I.B.3). It is unknown at this time whether any of these catalytic activities are also associated with factors V and VIII.

G. CHARACTERIZATION OF THE HUMAN FACTOR VIII GENE

G.1 Historical Perspective

In the majority of cases, the bleeding disorder hemophilia A (or classic hemophilia) results from mutations (either single base changes or gross rearrangements) within the structural gene coding for factor VIII. Since the factor VIII gene is located on the X chromosome, there is a high frequency of hemophilia A relative to autosomal clotting disorders (Haldane, 1935). Historically, studies addressing the nature of hemophilia A have been seriously hampered by difficulties encountered in the purification of factor VIII since it is an unusually large (Mr = 330 kDa), unstable protein, present in low concentrations in plasma (100 - 200 ng/ml) (Wood et al., 1984).
Studies involving the characterization of the factor VIII gene were initiated to facilitate an understanding of the molecular basis of hemophilia A. In addition, expression of the cloned gene in vitro would provide a virus-free preparation of recombinant factor VIII for treatment of hemophiliacs.

G.2 Organization of the Factor VIII Gene

The complete organization of the 186 Kbp human factor VIII gene was reported by Gitschier et al. (1984). Initially, factor VIII clones were isolated from a genomic library constructed using a lymphoblast cell line which was derived from an individual with four X chromosomes (karyotype 49,XXXXY). This library was screened using a unique 36-base synthetic oligonucleotide probe constructed on the basis of a previously characterized factor VIII tryptic peptide (Vehar et al., 1984). Using chromosome walking to extend the initially identified clones, 200 Kbp (nearly 0.1%) of the X chromosome was characterized that encompassed the complete factor VIII gene. DNA sequence analysis of intron/exon boundaries revealed the presence of 26 exons in the factor VIII gene. Although most exon sizes were consistent with reported distributions (Naora and Deacon, 1982), two of the exons are unusually large. The largest exon is 3106 bp in length, and corresponds to the 100 kDa connecting peptide (i.e. the B domain; see Figure 3) joining amino- and carboxy-terminal fragments of 90 and 80 kDa, respectively. This exon corresponds to a physiological unit, since the 100 kDa peptide is proteolytically excised upon thrombin activation (Fulcher et al., 1983). The other large exon is 1958 bp long, 1805 bp of which correspond to the 3' untranslated region of the gene.
Intron sizes in the factor VIII gene were found to be highly variable (Naora and Deacon, 1982), with the largest intron spanning 32.4 Kbp. Overall, the factor VIII gene consists of 9 Kbp of exon sequence interrupted by 177 Kbp of intervening sequence, suggesting a lack of selective pressure to decrease intron size. RNase protection experiments using mRNA derived from either the AL-7 T cell hybridoma line or human liver indicate that the transcription initiation site in the factor VIII gene is positioned at -170 or -172, respectively (+1 denotes the position of the initiator methionine residue). Located 30 bp 5' to the predicted mRNA start site is the sequence "GATAAA", which closely resembles the Goldberg-Hogness consensus sequence (i.e. the "TATA" box), proposed to be required for precise initiation of transcription by eukaryotic RNA polymerase II (Goldberg, 1979; Breathnach and Chambon, 1981). No "CAT" sequence element (Breathnach and Chambon, 1981) was observed upstream of the "ATA" sequence in the factor VIII gene. Following the 5' untranslated region is a typical 19 amino acid signal peptide (von Heijne, 1982) containing two charged residues flanking a core of hydrophobic amino acids. This secretory signal precedes the mature protein sequence of 2332 amino acid residues, which is followed by a "TGA" stop codon and a subsequent 3' untranslated region of 1802 bp. The conserved polyadenylation signal "AATAAA" (Proudfoot and Brownlee, 1976) is contained in the latter sequence, occurring 19 bp prior to the position of poly(A) addition.
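The proportions implied by the figures quoted above (9 Kbp of exon, 177 Kbp of intron, 186 Kbp in total, 26 exons) can be checked with a few lines of arithmetic. The Python sketch below uses only those quoted values. The variable and function names are illustrative, and the mean intron size assumes that 26 exons are separated by 25 introns.

```python
# Values quoted in the text for the human factor VIII gene.
EXON_KBP = 9.0
INTRON_KBP = 177.0
GENE_KBP = 186.0
EXON_COUNT = 26


def share_percent(part: float, whole: float) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# The quoted exon and intron totals should add up to the quoted gene size.
assert EXON_KBP + INTRON_KBP == GENE_KBP

exon_share = share_percent(EXON_KBP, GENE_KBP)
intron_share = share_percent(INTRON_KBP, GENE_KBP)
# Assumption: 26 exons are separated by 25 introns.
mean_intron_kbp = INTRON_KBP / (EXON_COUNT - 1)

print(f"exon sequence:   {exon_share:.1f}% of the gene")
print(f"intron sequence: {intron_share:.1f}% of the gene")
print(f"mean intron size: {mean_intron_kbp:.1f} Kbp (largest reported: 32.4 Kbp)")
```

On these numbers, coding and untranslated exon sequence together make up under 5% of the gene, and the largest intron is several times the mean intron size.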
G.3 Evolutionary Aspects of Intron Positions within the Factor VIII Gene

As discussed previously (see Section I.F), the factor VIII protein is composed of three different domains; the order of the domains in the protein is A1 - A2 - B - A3 - C1 - C2 (see Figures 3 and 4). If tandem gene duplication events (see Section I.I.1) have occurred in the evolution of the factor VIII gene, as is strongly suggested by the repeated domain organization, conservation of intron boundaries within the A and C repeats would be predicted (Doolittle, 1985). For the C duplication, intron/exon boundaries occur precisely at the borders of the C1/C2 repeat units, as would be expected if a gene duplication event has occurred (see Figure 4). Again, there is an intron at the boundary of the A3 and C1 units (see Figure 4), thus also supporting a mechanism involving intron joining. However, the A1/A2 and A2/A3 junctions are each contained on one exon (see Figure 4). Within the A and C repeated units, only some of the intron boundaries are conserved, suggesting that these introns were present in the ancestral gene prior to duplication (see Figure 4). The differing number of exons within each of the repeats is reflective of either intron loss or intron insertion following the initial duplication events.

The origin of the unique B domain is highly speculative. This region is contained almost entirely within a 3106 bp exon as described previously, where the end of the A2 repeat and the beginning of the A3 repeat are also found. Due to its anomalous size, it has been postulated that the B domain may have arisen by insertion of a processed gene (mRNA-derived; see Section I.J) into a short exon containing the A2/A3 boundary. Although the exon containing the B domain corresponds to a functional unit, which is excised upon thrombin activation of factor VIII, correspondence of other exons to functional units of the protein is unclear at this time.

Figure 4. Location of introns (vertical lines) within the triplicated A and duplicated C domains of human factor VIII (from Gitschier et al., 1984). For the A and C repeated units, numbers identify the position of the first amino acid residue in each line. The location and extent of the B domain is indicated; numbers represent amino acid residues. The exons are numbered consecutively.
[Figure: exons 1 - 26 drawn against the A and C repeat units; the B domain is marked at residues 712 - 1649; scale bar, 100 amino acids.]

H. THE DYNAMICS OF PROTEIN AND GENE EVOLUTION

H.1 The Molecular Clock

Initial studies involving the comparison of protein sequences from species whose times of evolutionary divergence are established have allowed an estimate of the rate at which mutations have been accumulating in genomes. On this basis, it was suggested that random mutation of DNA occurs at a nearly constant rate, which is in turn manifested as a constant rate of amino acid substitution (Zuckerkandl and Pauling, 1965; Wilson et al., 1977). Thus, protein sequences can serve as approximate molecular clocks. This empirical finding has been useful in the construction of phylogenetic trees (Wilson et al., 1977; Li et al., 1985), allowing major contributions to our knowledge of evolutionary relationships among organisms.
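The idea of a constant substitution rate can be turned into a rough dating calculation. The Python sketch below is an illustration only: the Poisson correction for multiple substitutions at a site and the relation t = K / (2r) are standard textbook forms brought in here as assumptions rather than formulas taken from the works cited above, and the input numbers are invented.

```python
import math


def poisson_distance(fraction_different: float) -> float:
    """Estimated substitutions per site from the observed fraction of differing sites.

    Assumption: substitutions follow a Poisson process, so repeated changes
    at one site are corrected with K = -ln(1 - p).
    """
    if not 0.0 <= fraction_different < 1.0:
        raise ValueError("fraction_different must lie in [0, 1)")
    return -math.log(1.0 - fraction_different)


def divergence_time(distance: float, rate_per_site_per_year: float) -> float:
    """Years since two lineages split, assuming both change at the same constant rate."""
    if rate_per_site_per_year <= 0.0:
        raise ValueError("rate must be positive")
    # Each lineage contributes half of the observed distance.
    return distance / (2.0 * rate_per_site_per_year)


# Invented example: 20% of aligned residues differ, at a hypothetical
# rate of 1e-9 substitutions per site per year.
k = poisson_distance(0.20)
print(f"K = {k:.3f} substitutions per site")
print(f"t = {divergence_time(k, 1e-9):.3g} years")
```

The factor of 2 reflects that the two lineages diverge independently from their common ancestor. The estimate is only as good as the assumption that the rate is the same in both.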
However, if the function of a protein is modified (as can occur with independent products of gene duplication events), its evolutionary rate will likely change due to the different set of selective pressures. Even if a protein maintains a constant function, the rate of evolution may still be subject to change in response to variation within the organism's cellular environment. Observed differences in the evolutionary rates of different proteins likely depend upon differences in the probability that mutations will retain protein function (Anfinsen, 1959; Ohno, 1970). This concept of "functional constraint" also illustrates why the rate of evolution is variable at different sites within a particular protein, depending upon which areas can tolerate variation without resulting in loss of function (Wilson et al., 1977). Thus, the identification of highly conserved regions within a protein allows the detection of potential functional domains within a molecule.

I. MECHANISMS OF GENE EVOLUTION

I.1 Gene Duplication

The process of gene duplication has been used to explain the occurrence of multigene families either having similar functions, thereby allowing them to act synergistically (e.g. the globin gene family; Edgell et al., 1983), or differing in function (e.g. the lysozyme-lactalbumin family; Hall et al., 1982). Increased protein size compared with ancestral forms can be the result of internal gene duplication (Doolittle, 1985; Li, 1983). In such events, the entire ancestral molecule can be duplicated or triplicated, as is the case for transferrin (MacGillivray et al., 1982) or ceruloplasmin (Takahashi et al., 1983), respectively. Alternatively, portions of proteins can be duplicated to generate larger, more complex forms (Doolittle, 1985). Internal gene duplications are reflected by homologous amino acid sequences and/or similar three-dimensional structures between duplicated regions (McLachlan, 1979).

I.2 Gene Fusion

By the process of gene fusion, new proteins are created by the joining of protein domains derived from different sources (Doolittle, 1985). Coupled with gene duplications, gene fusion events have been proposed in the formation of complex molecules, such as the coagulation factors (Doolittle, 1985).

I.3 Exon Shuffling

It has been proposed that exons are structural motifs which can be assorted by recombination within intervening sequences to yield novel proteins with different functions (Gilbert, 1978; Blake, 1983a,b; Rogers, 1985). This phenomenon of modular unit shuffling is thought to have been a dominant force in the evolution of the serine protease gene superfamily (Rogers, 1985; Neurath, 1985). For example, the gene for tissue plasminogen activator provides the first example of exon transfer between otherwise unrelated genes (Ny et al., 1984).

I.4 Intron Insertion and Intron Sliding

Intron insertion has been commonly observed between related genes, such as within the protease domain of the trypsin gene family (Rogers, 1985). Some regularity with respect to the insertion of introns has been shown. For example, it seems that new insertions tend to occur near the middle of pre-existing exons, thus allowing evolution toward a consistency with respect to exon size (Naora and Deacon, 1982; Lonberg and Gilbert, 1985).
Secondly, it has been suggested that small observed length variations between members of protein families can be the result of substitution of alternative intron splice sites, thereby permitting extension or contraction of exons at intron junctions (Craik et al., 1982a,b; Craik et al., 1983). Resulting small insertions (usually 2 to 17 amino acid residues) often map to the surface of molecules, where resultant structural modifications are the least disruptive to the overall tertiary structure (Craik et al., 1982a,b; Craik et al., 1983). In this process of intron/exon sliding, it is expected that only those variations which result in maintenance of the translational reading frame can be tolerated. The mechanism of intron sliding has been postulated to explain small variations in the length of related gene products in both the dihydrofolate reductase (DHFR) and serine protease gene families (Craik et al., 1983).

J. PSEUDOGENES

Higher eukaryotic genes (including human ceruloplasmin) commonly exist in multigene families that contain both functional genes and closely related sequences that have lost the ability to produce a functional product due to mutational changes. These latter sequences have been termed "pseudogenes" (Jacq et al., 1977; Proudfoot, 1980). Pseudogenes fall into two categories: 1) non-processed pseudogenes and 2) processed pseudogenes; the characteristics of each category are summarized below.

1. Non-Processed Pseudogenes

This group (mainly composed of the globin pseudogenes from a variety of species; Vanin, 1983) includes those pseudogenes that have retained the intervening sequences found in their functional counterparts.
In the majority of cases, the chromosomal location of these pseudogenes is adjacent to the respective wild-type gene, suggesting that the pseudogene sequences have arisen from gene duplication events (Vanin, 1983).

2. Processed Pseudogenes

This second, more abundant category of pseudogenes is represented in many different gene families from a number of mammalian species (see Vanin, 1985 for a recent review). Most processed pseudogenes contain genetic lesions that preclude the production of a functional gene product. These lesions include: 1) the presence of in-frame termination codons as a result of single base substitutions and 2) the insertion or deletion of non-integral nucleotide triplets that cause frameshift mutations, thereby resulting in premature termination of translation. There are four examples, however, of processed pseudogenes that contain no deleterious mutations: the human metallothionein II pseudogene (Varshney and Gedamu, 1984), the rat RC9 cytochrome c pseudogene (Scarpulla, 1984), the mouse L32 ribosomal protein pseudogene rpL32-4A (Dudov and Perry, 1984), and the DHFRψ1 pseudogene (Chen et al., 1982).

In addition to various genetic mutations, processed pseudogenes possess a number of characteristic features, the most distinguishing of which is the loss of introns relative to their functional counterparts. The absence of introns is usually precise, i.e. sequences 5' and 3' to the introns are contiguous. Processed pseudogenes are also characterized by the observed divergence of sequence homology with the wild-type counterparts at points corresponding to the beginning and end of the functional genes. Notable exceptions to this include the human immunoglobulin ε (Ueda et al., 1982) and λψ1 pseudogenes (Hollis et al., 1982) and the mouse corticotropin β-lipotropin precursor pseudogene (Notake et al., 1983), which appear to be DNA copies of only a portion of the wild-type mRNA transcripts, as well as the mouse ψα3 pseudogenes, which contain additional sequences compared to the functional transcripts (Vanin et al., 1980). Many processed pseudogenes have a poly(A) tract located immediately 3' to the point at which homology between the pseudogene and wild-type gene ceases, and are often characterized by short (7 - 17 bp) direct repeats flanking the pseudogene sequence. Lastly, almost without exception, processed pseudogenes do not share the same chromosomal location as the corresponding functional genes (Battey et al., 1982; Czosnek et al., 1984).

A number of mechanisms were originally proposed for the origin of processed pseudogenes (Vanin, 1985). However, based on the characteristics summarized above for a number of compiled processed pseudogene sequences, it is now generally accepted that processed pseudogenes have arisen from reverse transcription of mature mRNA species. In this model, cDNA copies of corresponding mRNAs are then randomly integrated into the genome, as has been postulated for the dispersion of the human snRNA pseudogenes (Denison et al., 1982; Van Arsdell et al., 1981) and the human AluI family of repetitive sequences (Jagadeeswaran et al., 1981).

K. THE PRESENT STUDY

The analysis of the structure of a number of representatives of different gene families has led to an enhanced understanding of the nature of protein and gene evolution [e.g. the globin gene family (Edgell et al., 1983), the serine protease supergene family (Rogers, 1985), and the immunoglobulin supergene family (Hood et al., 1985)].
Based on their shared structural similarities as demonstrated by amino acid sequence determination (see Section I.F), it has been proposed that ceruloplasmin and blood clotting factors V and VIII constitute a gene family (Doolittle, 1985). In 1984, the present investigation was initiated in order to characterize the human ceruloplasmin cDNA and gene, thereby facilitating a comparison of the gene organization of this multicopper oxidase to that reported for human factor VIII (Gitschier et al., 1984; see Section I.G). Since 1984, several partial human ceruloplasmin cDNA clones have been reported by other groups (Mercer and Grimes, 1986; Yang et al., 1986). Details concerning the relevance of these latter studies will be addressed in the context of subsequent sections.

II. MATERIALS AND METHODS

A. BACTERIAL HOSTS AND MEDIA

The medium used for both growth of appropriate bacterial hosts and screening of λ phage clones was NZYC (Maniatis et al., 1982) (10 g NZ amine type A, 2 g MgCl2, 5 g NaCl, 5 g Yeast Extract, 1 g Casamino Acids per litre, adjusted to pH 7.5 by NaOH addition). Phage libraries were plated on NZYC-agar (1.5% w/v) plates, with an overlay of NZYC-agarose (0.7% w/v). The medium used for the growth of bacteria transformed with pUC plasmids was Luria broth (LB) (Maniatis et al., 1982) (5 g Yeast Extract, 10 g Bactotryptone and 10 g NaCl per litre). For the selection of bacteria containing pUC plasmids, clones were plated on LB agar (1.5% w/v) plates supplemented with 50 - 100 µg/ml ampicillin. LB medium was also used for screening the human liver cDNA library constructed in the pKT218 vector, except that tetracycline (12.5 µg/ml) replaced ampicillin as the antibiotic.
Bacteria containing phage M13 clones were grown in YT medium (Maniatis et al., 1982) (5 g Yeast Extract, 8 g Bactotryptone, 5 g NaCl per litre); M13 transformants were plated on YT agar (1.5% w/v) plates overlaid with YT containing 0.75% (w/v) agar. E. coli strains JM101 and JM103 were maintained on minimal media plates, prepared as follows: 3 g of agar in a total of 160 ml dH2O was autoclaved, cooled to 55°C, and mixed with 40 ml of 5X salts [2.1 g K2HPO4, 0.9 g KH2PO4, 0.2 g (NH4)2SO4, 0.1 g Na citrate·7H2O per 40 ml], 2 ml 20% glucose, 0.2 ml 20% MgSO4·7H2O, and 0.1 ml 10 mg/ml thiamine. Bacteria for large-scale plasmid preparations were grown either in LB medium, supplemented with the appropriate antibiotic, or in M9 minimal media (Maniatis et al., 1982) containing 840 ml dH2O, 100 ml 10X salts (7 g Na2HPO4, 3 g KH2PO4, 0.5 g NaCl, 1 g NH4Cl per 100 ml), 10 ml MgSO4·7H2O, 20 ml 20% glucose, 10 ml 0.01 M CaCl2, 20 ml 20% Casamino Acids, 0.2 ml 10 mg/ml thiamine and 0.2 g uridine. A summary of the genotypes of various bacterial strains used in this study is given in Table II.

Table II. Summary of the genotypes of bacterial hosts used in the present study. Vectors utilized in conjunction with each host are also given.

a) E. coli MC1061
   Genotype: araD139, Δ(ara, leu)7697, ΔlacX74, galU-, galK-, hsr-, hsm+, strA
   Reference: Casadaban and Cohen, 1980
   Compatible vector system: human liver cDNA library in pKT218 (Prochownik et al., 1983)

b) E. coli K802
   Genotype: hsdR+, hsdM+, gal-, met-, supE
   Reference: Maniatis et al., 1982
   Compatible vector system: human genomic library in Charon 4A (Lawn et al., 1978)

c) E. coli LE392
   Genotype: F-, hsdR514 (rk-, mk-), supE44, supF58, lacY1, Δ(lacIZY)6, galK2, galT22, metB1, trpR55, λ-
   Reference: Maniatis et al., 1982
   Compatible vector system: human genomic library in EMBL 3 (Frischauf et al., 1983)

d) E. coli P2 392
   Genotype: P2 lysogen of LE392
   Reference: Maniatis et al., 1982
   Compatible vector system: human genomic library in EMBL 3 (Frischauf et al., 1983)

e) E. coli RY1088
   Genotype: ΔlacU169, supE, supF, hsdR-, hsdM+, metB, trpR, tonA21, proC::Tn5 (pMC9)
   Reference: Young and Davis, 1983a,b
   Compatible vector system: human liver cDNA library in λgt11 (Young and Davis, 1983a,b)

f) E. coli C600 Hfl+
   Genotype: F-, thi-1, thr-1, leuB6, lacY1, tonA21, supE44, λ-, Hfl+
   Reference: Appleyard, 1954; Winnacker, 1987
   Compatible vector system: human liver cDNA library in λgt10

g) E. coli JM101
   Genotype: Δlacpro, supE, thi-, F', traD36, proAB, lacIq, lacZΔM15
   Reference: Messing, 1983
   Compatible vector system: pUC vectors (Vieira and Messing, 1982); M13 vectors (Messing, 1983)

h) E. coli JM103
   Genotype: Δlacpro, supE, thi-, strA, sbcB15, endA, hsdR-, F', traD36, proAB, lacIq, lacZΔM15
   Reference: Messing, 1983
   Compatible vector system: pUC vectors (Vieira and Messing, 1982); M13 vectors (Messing, 1983)

B. HYBRIDIZATION PROBES

B.1 Purification and Labelling of Oligodeoxyribonucleotides

Oligodeoxyribonucleotide mixtures were synthesized by Tom Atkinson in the laboratory of Dr. M. Smith, University of British Columbia, using an Applied Biosystems 380A DNA synthesizer. Crude oligonucleotide preparations were purified through 20% denaturing gels (containing 8.3 M urea) and isolated by reverse-phase chromatography using a C18 SEP-PAK (Millipore) column as described by Atkinson and Smith (1984). The oligonucleotides were labelled using [γ-32P]-ATP and T4 polynucleotide kinase (Chaconas and van de Sande, 1980), and unincorporated ATP was subsequently removed by chromatography on G25 Sephadex.
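The probe pools used for screening, given in this section, are written with the degeneracy codes R (G or A), Y (T or C) and N (any base), so each pool is a mixture of every sequence those codes allow. The Python sketch below counts and enumerates the members of each pool from the sequences as printed. The dictionary and function names are illustrative choices and are not part of the original methods.

```python
from itertools import product

# Degeneracy codes as defined in the text: R = G or A, Y = T or C, N = any base.
CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "GA", "Y": "TC", "N": "GATC",
}

# Pool sequences as printed in this section (5' to 3').
POOLS = {
    "I": "TARTARTGYTTYTCYTT",
    "II": "ATNGCRTGCATYTTRTT",
    "III": "CCCATNARRTACCARTT",
}


def degeneracy(pool: str) -> int:
    """Number of distinct oligonucleotides in a degenerate pool."""
    count = 1
    for symbol in pool:
        count *= len(CODES[symbol])
    return count


def expand(pool: str) -> list[str]:
    """Every individual sequence represented by a degenerate pool."""
    return ["".join(bases) for bases in product(*(CODES[s] for s in pool))]


for name, sequence in POOLS.items():
    members = expand(sequence)
    print(f"Pool {name}: {len(sequence)}-mer, {degeneracy(sequence)} sequences, "
          f"first member {members[0]}")
```

Enumerating a pool this way is practical only because the degeneracy is small. For pools with many N positions, counting with `degeneracy` alone avoids building the full list.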
Three pools of heptadecadeoxyribonucleotides were used as hybridization probes for the screening of human liver cDNA libraries:

POOL I:   5'd(TARTARTGYTTYTCYTT)3'
POOL II:  5'd(ATNGCRTGCATYTTRTT)3'
POOL III: 5'd(CCCATNARRTACCARTT)3'

where "R" represents G or A, "Y" represents T or C, and "N" represents G, A, T or C. The three nucleotide pools are complementary to the mRNA encoding amino acid residues 1 - 6, 937 - 942, and 962 - 967 of ceruloplasmin, respectively, as predicted from the amino acid sequence (Takahashi et al., 1984).

B.2 Nick Translation

Purified DNA fragments or entire plasmids were labelled with 32P by nick-translation as described by Maniatis et al. (1975). Approximately 200 - 500 ng of DNA was labelled in 50 µl of reaction mixture, containing 50 mM Tris-HCl pH 7.5, 5 mM MgCl2, 0.05 mg/ml BSA, 10 mM β-mercaptoethanol, 20 µM dGTP, 20 µM dTTP, 1.4 mM dATP, 1.4 mM dCTP, 1.4 µCi/µl [α-32P]-dATP (3000 Ci/mMole), 1.4 µCi/µl [α-32P]-dCTP (3000 Ci/mMole), 0.2 mM CaCl2, 1 pg/µl DNase I (Sigma) and 0.4 U/µl E. coli DNA polymerase I (Kornberg fragment) (Boehringer-Mannheim). The above reaction was incubated for 60 - 120 min at 15°C, and was subsequently terminated by the addition of 3 volumes of 1% SDS/10 mM EDTA containing 25 µg tRNA, followed by heating for 10 min at 68°C. Free triphosphates were separated from the labelled strands by chromatography on 1.0 ml spun columns of Sephadex G-50 (Maniatis et al., 1982). Specific activities of resultant probes ranged from 0.5 - 1.0 x 10^8 cpm/µg.

B.3 Klenow Labelling

DNA was also labelled using the method described by Feinberg and Vogelstein (1983). Reaction mixtures typically contained 50 - 200 ng of DNA (either purified restriction fragments or entire plasmids) in a total volume of 50 µl.
The DNA sample (in 30 µl of dH2O) was denatured by boiling for 3 minutes followed by rapid cooling on ice for 5 minutes. The labelling reaction was subsequently carried out in 50 µl total volume, containing 50 mM Tris-HCl pH 8.0, 10 mM MgCl2, 10 mM β-mercaptoethanol, 20 µM dCTP, 20 µM dGTP, 20 µM dTTP, 1 µCi/µl [α-32P]-dATP (3000 Ci/mMole), 200 mM Hepes pH 6.6, 60 OD260/ml random hexadeoxyribonucleotides [p(dN6)] (P-L Biochemicals), 0.4 mg/ml BSA and 0.1 U/µl E. coli DNA polymerase I (Klenow fragment) [Bethesda Research Laboratories (BRL) or P-L Biochemicals]. Extension was allowed to continue at 37°C for either 3 - 4 hours or overnight. The reaction was terminated by heating the sample at 68°C for 10 minutes, and unincorporated nucleotides were separated as described for nick-translation (see above). Typically, specific activities of resultant probes ranged from 2 - 5 x 10^8 cpm/µg.

B.4 Preparation of M13 Probes

M13 templates containing ceruloplasmin DNA fragments were also used to generate probes with high specific activities (Russnak and Candido, 1985). An annealing mixture containing 3.0 µl template (0.5 - 1 µg), 2.0 µl universal primer (P-L Biochemicals; 0.03 A260 units/ml) and 2.0 µl of 10X annealing buffer (100 mM Tris-HCl pH 7.5, 600 mM NaCl, 70 mM MgCl2) was incubated at 65°C for 15 minutes in a 1.5 ml microfuge tube. After cooling to room temperature, 1.0 µl of 20 mM DTT, 2.0 µl of 0.5 mM dGTP, 2.0 µl of 0.5 mM dTTP, 2.5 µl each of [α-32P]-dATP and [α-32P]-dCTP (25 µCi; 3000 Ci/mMol) and 0.5 U of E. coli DNA polymerase I (Klenow fragment) were added.
The reaction was allowed to proceed for 10 minutes at room temperature, and was then followed by a 5 minute chase initiated by the addition of 2.0 µl of 0.5 mM dGTP and 2.0 µl of 0.5 mM dCTP before termination of the reaction by heating at 68°C for 10 minutes. Unincorporated dNTPs were separated from labelled strands as described for nick translation (see above).

C. IDENTIFICATION OF cDNAS FOR HUMAN CERULOPLASMIN

C.1 Screening of a Human Liver cDNA Library

An adult liver cDNA library (Prochownik et al., 1983) was kindly provided by Dr. S.H. Orkin (Children's Hospital Medical Center, Boston). This library contains cDNA inserts of > 500 bp inserted into the PstI site of pKT218 by homopolymeric dG·dC tailing. The cDNA library was screened using the colony hybridization method of Grunstein and Hogness (1975). Ampicillin-resistant colonies (approximately 5,000 per 100 x 15 mm petri plate) were transferred to nitrocellulose filters (82 mm; BA-85, Schleicher and Schuell). A second set of replicas was prepared from the original filter lifts, and bacteria on both sets of filters were allowed to grow at 37°C on LB plates containing tetracycline until colonies were 1 - 2 mm in diameter. Filters were subsequently transferred to LB plates containing 170 µg/ml chloramphenicol, and the plasmids were allowed to amplify overnight at 37°C. Cell lysis was carried out by placing filters onto Whatman 3MM paper soaked with 0.5 N NaOH, followed by incubation at room temperature for 20 minutes. In a similar manner, filters were subsequently denatured again with NaOH for 20 minutes, neutralized with 1 M Tris-HCl pH 7.5 for 20 minutes, and finally treated with 0.5 M Tris-HCl pH 7.5/1.5 M NaCl for 20 minutes.
The filters were then air-dried and baked at 68°C overnight. Prior to hybridization, filters were washed 3 times in 2X SSC buffer (1X SSC is 0.15 M NaCl, 0.015 M Na citrate pH 7) in order to remove cellular debris. The replica filters were then screened by using Pool II and Pool III oligonucleotide mixtures (see Section II.B.1) as hybridization probes. Hybridization and washing conditions were essentially those described by Fung et al. (1984) and are summarized in Section II.K. Putative positive clones were purified from the master plates, and the recombinant plasmids were analyzed (see Section II.E.1). In order to obtain additional ceruloplasmin clones, the library was rescreened using a nick-translated restriction fragment as a probe. Hybridization and washing conditions for the latter screen were essentially as described by Maniatis et al. (1982), and are summarized in Section II.K.

C.2 Preparation and Screening of Randomly-Primed Human Liver cDNA Libraries

To isolate cDNAs encoding the 5' end of the ceruloplasmin transcript, several randomly-primed human liver cDNA libraries were constructed in the vectors λgt10 or λgt11 (Huynh et al., 1984) by Walter Funk (Biochemistry Department, University of British Columbia). Briefly, human liver poly(A)+ RNA (for isolation procedure, see Section II.E.4) was used as a template for first strand cDNA synthesis by reverse transcriptase. DNase I-digested rat thymus DNA (average length 20 nucleotides) was used as a primer (Goelet and Karn, 1984). Second strand synthesis was performed as described by Gubler and Hoffman (1983), using ribonuclease H (BRL), DNA polymerase I (BRL) and E. coli DNA ligase (P-L Biochemicals).
After S1 nuclease treatment to generate blunt ends, the resultant double-stranded cDNA was methylated using EcoRI methylase and S-adenosylmethionine (BRL), EcoRI linkers (P-L Biochemicals) were ligated to the ends, and the linkers were subsequently digested with EcoRI. The cDNA was then chromatographed on a column (30 x 0.2 cm) of Bio-Gel A-50m (Bio-Rad), equilibrated with 0.01 M Tris-HCl pH 7.5/0.3 M NaCl/0.001 M EDTA. Fractions comprising the leading edge of the cDNA peak (corresponding to cDNA fragments > 1 Kb) were pooled and the DNA (~ 50 ng) was ligated with 1 µg of EcoRI-digested, dephosphorylated λgt10 or λgt11 DNA (Vector Cloning Systems). Half of the resulting DNA was packaged into phage particles in vitro using a Gigapak (Vector Cloning Systems). The libraries constructed in λgt10 and λgt11 contained 400,000 and 200,000 independent recombinant clones respectively, and were subsequently plated at high density for screening (4 x 10^4 or 2 x 10^4 plaques per 150 mm petri plate for λgt10 and λgt11 libraries, respectively). For plating, appropriate dilutions of phage prepared in SM buffer (5.8 g NaCl, 2 g MgSO4, 50 ml 1 M Tris-HCl pH 7.5, 5 ml 2% gelatin per litre) were incubated with overnight cultures of appropriate host cells at 37°C for 10 minutes in order to allow phage attachment. Phage were then plated on NZYC agarose. Plates were subsequently incubated at 37°C until plaques were visible but not confluent. Replicas (2 sets) of the plaques were then transferred to nitrocellulose circles (132 mm; Schleicher and Schuell). DNA was denatured by treatment of the nitrocellulose filters with 0.5 N NaOH/1.5 M NaCl for 5 minutes.
The filters were then neutralized by treatment with 1 M Tris-HCl pH 7.5 for 5 minutes, followed by treatment with 0.5 M Tris-HCl pH 7.5/1.5 M NaCl for 5 minutes. Ceruloplasmin cDNA clones were identified by plaque hybridization (Benton and Davis, 1977) to appropriate restriction fragments labelled with 32P by either nick-translation or Klenow extension (see above). Recombinant clones of interest were detected by autoradiography and purified to homogeneity by successive rounds of replating and rescreening at decreased phage densities. For the λgt10 library, purified phage clones were further screened using the Pool I oligonucleotide mixture (see Section II.B.1) in order to identify cDNAs extending furthest in a 5' direction. Hybridization and washing conditions for the above screens are summarized in Section II.K.

D. SCREENING OF HUMAN GENOMIC LIBRARIES

Two different human genomic DNA libraries were used in this study. One was a partial AluI/HaeIII digest of human genomic DNA, constructed in the λ Charon 4A vector (Lawn et al., 1978). This library was generously provided by Dr. T. Maniatis and amplified as described (Maniatis et al., 1982). The other library used was a partial Sau3A digest of human lymphocyte genomic DNA, cloned into the BamHI site of the λ derivative EMBL 3. This library was constructed by Val Geddes (1987) (Department of Biochemistry, University of British Columbia) and was screened prior to amplification. For both libraries, initial screening was carried out on 5 x 10^5 - 1 x 10^6 plaques, representing 2.5 - 5 genome equivalents, based on the conservative estimate that each clone contained approximately 10 Kbp of genomic DNA.
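The genome-equivalent figure quoted above follows from simple arithmetic, sketched here in Python. The plaque counts (read as 5 x 10^5 and 1 x 10^6) and the 10 Kbp insert estimate are taken from the text. The haploid genome size of 2 x 10^9 bp is an assumption: it is not stated in this section, and is used only because it reproduces the quoted range of 2.5 - 5 genome equivalents.

```python
# Genome equivalents screened = (plaques screened x insert size) / genome size.

INSERT_BP = 10_000   # conservative per-clone insert estimate quoted in the text
GENOME_BP = 2.0e9    # ASSUMPTION: haploid genome size; not stated in the text


def genome_equivalents(n_plaques: float,
                       insert_bp: float = INSERT_BP,
                       genome_bp: float = GENOME_BP) -> float:
    """Return the cloned DNA screened as a multiple of the genome size."""
    return n_plaques * insert_bp / genome_bp


# Lower and upper bounds of the initial high-density screen.
for n_plaques in (5e5, 1e6):
    coverage = genome_equivalents(n_plaques)
    print(f"{n_plaques:.0e} plaques -> {coverage:.1f} genome equivalents")
```

A larger assumed genome size would lower the estimated coverage in direct proportion.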
Phage were plated at a density of 3 - 5 x 10^4 plaques per 150 mm petri dish. The plating and screening procedures were essentially performed as previously described for the λgt10 and λgt11 cDNA libraries (Section II.C.2) with the exception that replicas of the plaques were transferred to nitrocellulose filters and then incubated on fresh NZYC plates at 37°C to allow amplification of the phage (Woo, 1980). For screens other than the initial high density screen, this amplification step was omitted.

E. ISOLATION OF NUCLEIC ACIDS

E.1 Purification of Plasmid DNA

Small amounts of plasmid DNA were routinely purified using a modified alkaline lysis procedure of Birnboim and Doly (1979), as described by Maniatis et al. (1982). Briefly, a 1.5 ml aliquot of an overnight bacterial culture containing a recombinant plasmid of interest was placed in a microfuge tube, and bacteria were harvested by centrifugation in an Eppendorf centrifuge for 2.5 minutes. The pellet was resuspended in 100 µl of ice-cold solution containing 50 mM glucose, 10 mM EDTA, 25 mM Tris-HCl pH 8.0 and 4 mg/ml lysozyme (Sigma). This mixture was incubated for 5 minutes at room temperature, followed by the addition of 200 µl of a solution containing 0.2 N NaOH/1% SDS. This solution was incubated at 4°C for 5 minutes, and then 150 µl of potassium acetate solution (60 ml 5 M potassium acetate, 11.5 ml glacial acetic acid, 28.5 ml dH2O; pH 4.8) was added. This suspension was incubated on ice for 5 minutes, following which the precipitate was removed by centrifugation at 4°C for 5 minutes. The resultant supernatant was removed and extracted with an equal volume of phenol:chloroform (1:1 v/v).
Nucleic acids were precipitated by the addition of 2 volumes of ethanol (room temperature) and were recovered by centrifugation for 5 minutes. The pellet was washed with 70% ethanol, air-dried, and resuspended in 50 µl of TE buffer (10 mM Tris-HCl pH 8.0, 1 mM EDTA).

Two different procedures were used for large-scale plasmid isolations. The Triton lysis procedure (Katz et al., 1973, 1977) was used for the large-scale preparation of plasmids in the pKT218 cloning vector. An aliquot (5 ml) of an overnight bacterial culture was used to inoculate 1 litre of M9 medium at 37°C. When the OD of the culture reached 0.6 - 0.7, 250 mg of chloramphenicol was added and the culture was incubated a further 12 - 16 hours at 37°C. Cells were harvested by centrifugation at 6 K RPM in a GS-3 rotor for 20 minutes, followed by freezing of the pellets at -20°C for 2 hours. The cells were then resuspended in 6.25 ml of solution containing 25% w/v sucrose and 50 mM Tris-HCl pH 8.0. Lysozyme (1.5 ml of 10 mg/ml solution) was added, and the solution was mixed by swirling on ice for 5 minutes. EDTA (1.25 ml of a 0.5 M solution, pH 8.0) was added and mixed on ice by swirling for an additional 5 minutes. Triton solution [10 ml of a solution containing 10 ml of 10% (w/v) Triton X-100, 125 ml 0.5 M EDTA pH 8.0, 50 ml 1 M Tris-HCl pH 8.0, 800 ml dH2O] was then added, and the solution was mixed again for 5 minutes on ice. Following lysis, cell debris was removed by centrifugation at 19 K RPM in an SS-34 rotor for 30 minutes (4°C). Plasmid DNA was subsequently isolated by isopycnic centrifugation (Ti 70.1 rotor/20 hours/20°C) using CsCl-ethidium bromide density gradients.
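The Triton solution above is specified as a recipe of stock volumes, not as final concentrations. As a minimal sketch, the final concentrations can be estimated with the dilution relation C1·V1 = C2·V2. The function name and data layout are illustrative only, and the calculation assumes that the component volumes are simply additive.

```python
# Final concentration of each component = stock concentration x volume / total volume.
# ASSUMPTION: volumes are additive on mixing.

# (component, stock concentration, volume in ml); None marks the diluent.
TRITON_SOLUTION = [
    ("Triton X-100 (% w/v)", 10.0, 10.0),
    ("EDTA (M)", 0.5, 125.0),
    ("Tris-HCl (M)", 1.0, 50.0),
    ("dH2O", None, 800.0),
]


def final_concentrations(components):
    """Return (total volume in ml, {component: final concentration})."""
    total_ml = sum(volume for _, _, volume in components)
    concentrations = {
        name: stock * volume / total_ml
        for name, stock, volume in components
        if stock is not None
    }
    return total_ml, concentrations


total_ml, concentrations = final_concentrations(TRITON_SOLUTION)
print(f"total volume: {total_ml:.0f} ml")
for name, value in concentrations.items():
    print(f"{name}: {value:.4f}")
```

Under these assumptions the recipe corresponds to roughly 0.1% Triton X-100, 63 mM EDTA and 51 mM Tris-HCl.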
The large-scale alkaline lysis procedure of Birnboim and Doly (1979) as described by Maniatis et al. (1982) was used for the isolation of recombinant pUC plasmids and M13 replicative form (RF) DNA (Messing, 1983). In the former case, 500 ml aliquots of L broth were inoculated with 10 ml of the appropriate bacterial culture (grown in the presence of a selective antibiotic) and incubated at 37°C for 12 - 16 hours without chloramphenicol amplification. For the large-scale isolation of M13 RF DNA, 5.0 ml of an exponentially-growing JM103 or JM101 culture and 300 µl of an M13 infectious phage supernatant (see Section II.H.2) were added to 500 ml of YT broth and incubated at 37°C for 8 - 10 hours. In both cases, the purification procedure was a large-scale version of that described for the isolation of small amounts of plasmid DNA (see above), with several modifications. Following addition of the potassium acetate solution, cell debris was removed by centrifugation at 19 K RPM in an SS-34 rotor for 30 minutes. The supernatant was mixed directly with 0.6 volumes of isopropanol, and incubated at room temperature for 15 minutes. Nucleic acids were pelleted by centrifugation for 20 minutes at 10 K RPM in an SS-34 rotor. The pellets were resuspended in 11.0 ml TE, to which 11 g CsCl and 600 µl ethidium bromide (10 mg/ml in H2O) were added. This mixture was incubated on ice for 60 minutes, followed by low-speed centrifugation (4 K RPM for 5 minutes in an HB-4 rotor) in order to remove the flocculant precipitate. The supernatant was loaded into 13 ml Quick-Seal tubes (Beckman), and the DNA was banded by isopycnic density gradient centrifugation under the conditions described above.
E.2 Isolation of Bacteriophage DNA

Phage DNA was purified routinely from 20 ml lysates. For Charon 4A phage isolates, consistent lytic infections were obtained using the following conditions: 200 µl of SM buffer containing 3.0 - 4.0 x 10^6 phage was incubated at 37°C for 20 minutes with 100 µl of an overnight host bacterial culture. This mixture was added to 20 ml of appropriate growth medium in a 125 ml Erlenmeyer flask, and incubated at 37°C with vigorous shaking. Lysis was usually observed within 4.5 - 7 hours post-inoculation. For phage isolates from either the EMBL 3 genomic library or the λgt11 and λgt10 cDNA libraries, a single plaque was added directly to a solution containing 100 µl of stationary-phase bacterial host cells and 100 µl of SM buffer, and incubated at 37°C for 20 minutes to allow attachment. The mixture (containing the phage plug) was then added to 20 ml of appropriate growth medium in a 125 ml Erlenmeyer flask, and incubated at 37°C with vigorous shaking. Under these conditions, lysis was usually observed within 3 - 4 hours.

At the time of cell lysis, chloroform (3 ml) was added to the culture, which was then left shaking slowly for a further 5 - 10 minutes. At that time, the contents were carefully transferred to a 30 ml Corex tube, such that most of the chloroform was left behind, and the sample was centrifuged at 10 K RPM for 10 minutes in an SS-34 rotor. To the resulting supernatant, 3 ml of 5 M NaCl and 3 g of polyethylene glycol (PEG) 8000 (average molecular weight 7000 - 9000) were added. The contents were mixed, and left at 4°C overnight in order to precipitate phage particles.
Following centrifugation of the suspension for 10 minutes at 10 K RPM in an HB-4 rotor, the phage pellet was resuspended in 500 µl of DNase I buffer (50 mM HEPES pH 7.5, 5.0 mM MgCl2, and 0.5 mM CaCl2), and transferred to a microfuge tube. RNase A (Sigma) (10 µl of a 5 mg/ml stock) and DNase I (Boehringer Mannheim) (5 µl of a 10 mg/ml stock) were added, and the solution was incubated at 37°C for 1 hour. Following digestion, the solution was centrifuged for 5 minutes in a microfuge in order to remove any remaining cellular debris. To the supernatant, 50 µl of 10X SET buffer (0.1 M Tris-HCl pH 7.5, 0.2 M EDTA and 5% SDS) was added prior to digestion with 4 µl of proteinase K (40 mg/ml stock solution) (Boehringer Mannheim) for 30 minutes at 68°C. The solution was extracted twice with an equal volume of phenol:chloroform (3:1 v/v followed by 1:1 v/v) and once with an equal volume of chloroform. The DNA was precipitated from the aqueous phase by addition of 2 volumes of 95% ethanol at room temperature for 2 minutes, and collected by centrifugation. The DNA pellet was washed with 1.0 ml of 70% ethanol, air-dried, and resuspended in 50 µl of TE buffer.

E.3 Preparation of Human Genomic DNA

Genomic DNA from human liver was prepared essentially as described by Blin and Stafford (1976). Liver tissue was ground to a fine powder in liquid nitrogen in a Waring blender. The resulting powder was dissolved in a buffer (10 ml/g tissue) consisting of 0.5 M EDTA pH 8.0, 0.5% SDS, and 100 µg/ml proteinase K, and was incubated for 12 - 16 hours at 50°C.
The solution was extracted 3 times with equal volumes of phenol and then dialyzed against buffer (50 mM Tris-HCl pH 8.0, 10 mM NaCl, 10 mM EDTA) until the A270 of the dialysate was < 0.05. RNase A was added to a final concentration of 100 µg/ml and the solution was incubated at 37°C for 60 minutes. The DNA solution was extracted 3 times with equal volumes of phenol:chloroform (1:1 v/v) and then dialyzed against TE. Insoluble material was removed by centrifugation at 10 K RPM in an SS-34 rotor at 4°C for 15 minutes. Genomic DNA was precipitated by the addition of Gilbert Salts (5X Salts is 2.5 M NH4OAc, 100 mM MgCl2 and 1 mM EDTA) to a final concentration of 1X, followed by the addition of 2 volumes of 95% ethanol. Following collection by centrifugation, the DNA pellet was allowed to rehydrate for several days. At this time, any remaining insoluble material was removed by centrifugation as previously described. The final DNA pellet was resuspended in TE, at a concentration of 0.5 - 1.0 mg/ml.

Human genomic DNA was also prepared from white blood cells by Heather Kirk (Department of Biochemistry, University of British Columbia) according to a modified procedure of Kunkel et al. (1979), and was generously supplied by Ms. Kirk for various aspects of this study.

E.4 Isolation of RNA

E.4.1 Preparation of human liver poly(A)+ RNA. All glassware, utensils, and solutions were autoclaved prior to use in order to inactivate contaminating ribonucleases. RNA was isolated from human liver by the guanidine hydrochloride method (Chirgwin et al., 1979).
Powdered human liver (obtained from brain-dead donors and immediately frozen in liquid nitrogen) was added to buffer (10 ml/10 mg of tissue) consisting of 7.5 M guanidine hydrochloride (GuHCl) pH 7.5, 25 mM sodium citrate pH 7.0 and 0.1 M DTT. The suspension was disrupted using a Polytron homogenizer. N-lauryl sarcosine was added to 0.5% (w/v) and the insoluble material was removed by centrifugation (10 K RPM for 15 minutes at 4°C in an SS-34 rotor). Following addition of ethanol to a final concentration of 33%, RNA was precipitated overnight at -20°C. The precipitate was collected by centrifugation under the conditions described above, and dissolved in one half of the starting volume with the GuHCl buffer. Again, insoluble material was removed by centrifugation. RNA was precipitated as described above, resuspended in one quarter the starting volume with the GuHCl buffer, and reprecipitated. The final RNA pellet was resuspended in sterile dH2O, and any precipitate removed by centrifugation (10 K RPM for 15 minutes at 4°C in the SS-34 rotor).

Human poly(A)+ RNA was isolated by chromatography on a column of oligo d(T) cellulose (Sigma) (Edmonds et al., 1971; Aviv and Leder, 1972). Total human or bovine RNA samples (in buffer containing 10 mM Tris-HCl pH 7.5, 4 mM EDTA, 0.5 M NaCl) were applied to the column. The unbound RNA fraction was reapplied to the column twice, and the column was washed with the buffer described above until the OD260 of the eluate was less than 0.05. Poly(A)+ RNA was subsequently eluted from the column with sterile dH2O; fractions containing RNA were identified spectrophotometrically and pooled.
RNA was precipitated by the addition of 0.1 volume of 3 M NaOAc pH 4.8 and 2 volumes of ethanol. Poly(A)+ RNA was resuspended in sterile dH2O at a concentration of 0.6 - 1.0 mg/ml and stored in aliquots at -70°C.

E.4.2 Preparation of total RNA from HepG2 cells. Total cellular RNA from HepG2 cells (Knowles, 1980) was purified as previously described by van Oost et al. (1985) for RNA isolation from the hybrid endothelial cell line EA.hy926. Cells grown to confluency on a total area of 5225 cm^2 were dissolved in 135 ml of GuHCl buffer, and total RNA was purified from this solution by successive ethanol precipitations as described above.

F. BASIC DNA TECHNIQUES

F.1 Restriction Enzyme Digestion

With the exception of genomic digests, DNA (usually 0.5 - 2 µg) was routinely digested in a total volume of 20 µl, using the buffer system described by Maniatis et al. (1982). BSA (BRL) was added to a final concentration of 100 µg/ml. In most cases, 1 - 5 units of restriction enzyme were used per reaction. For digests of small-scale plasmid and phage DNA preparations, 5 µg of RNase A was included. Genomic DNA (5 - 10 µg) was usually digested in a total volume of 30 µl using 30 - 50 units of appropriate restriction enzyme. Enzymes were purchased from BRL, New England Biolabs, Boehringer Mannheim, and P-L Biochemicals. Restriction enzyme digestion mixtures were analyzed by electrophoresis in agarose or polyacrylamide gels (see below), following the addition of 0.1 volumes of loading buffer (0.25% bromophenol blue, 0.25% xylene cyanol and 25% ficoll).
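The digest conditions above reduce to a few proportional calculations, sketched below in Python. The reaction volumes, the BSA final concentration, the enzyme and DNA ranges, and the 0.1-volume loading buffer addition are taken from the text. The BSA stock concentration of 1 mg/ml is a hypothetical value chosen for illustration, and the helper names are not from the source.

```python
# Reaction set-up arithmetic for the restriction digests described in Section F.1.

BSA_FINAL_UG_PER_ML = 100.0   # final BSA concentration (from the text)
BSA_STOCK_UG_PER_ML = 1000.0  # ASSUMPTION: hypothetical 1 mg/ml BSA stock


def stock_volume_ul(final_conc: float, stock_conc: float, total_volume_ul: float) -> float:
    """Volume of stock needed to reach final_conc in total_volume_ul (C1*V1 = C2*V2)."""
    return final_conc * total_volume_ul / stock_conc


def loading_buffer_ul(reaction_volume_ul: float, fraction: float = 0.1) -> float:
    """Loading buffer added as a fraction (0.1 volumes in the text) of the reaction."""
    return fraction * reaction_volume_ul


def units_per_ug_range(units_lo, units_hi, dna_ug_lo, dna_ug_hi):
    """Lowest and highest enzyme:DNA ratio implied by the quoted ranges."""
    return units_lo / dna_ug_hi, units_hi / dna_ug_lo


for label, volume_ul, units, dna_ug in (
    ("plasmid/phage digest", 20.0, (1, 5), (0.5, 2.0)),
    ("genomic digest", 30.0, (30, 50), (5.0, 10.0)),
):
    bsa = stock_volume_ul(BSA_FINAL_UG_PER_ML, BSA_STOCK_UG_PER_ML, volume_ul)
    dye = loading_buffer_ul(volume_ul)
    lo, hi = units_per_ug_range(*units, *dna_ug)
    print(f"{label}: {bsa:.1f} ul BSA stock, {dye:.1f} ul loading buffer, "
          f"{lo:.1f} - {hi:.1f} units/ug")
```

The ratio calculation shows that the quoted ranges span a wide enzyme-to-DNA ratio, so the numbers are best read as working ranges, not as a fixed stoichiometry.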
F.2 End-labelling of DNA Fragments

Where necessary, ceruloplasmin cDNA inserts, cloned into the EcoRI site in λgt10 or λgt11 vectors (see Section II.C.2), were visualized by labelling the 5' overhang generated by EcoRI digestion. These reactions were carried out according to Maniatis et al. (1982), using [α-32P]-dATP and the Klenow fragment of E. coli DNA polymerase I. Reaction products were analyzed by polyacrylamide gel electrophoresis followed by autoradiography, as described below (see Section F.3.1).

F.3 Electrophoresis of DNA

F.3.1 Agarose gel electrophoresis. DNA samples of restriction endonuclease digestion mixtures were analyzed on 0.7 - 1.5% agarose gels, which were poured and run in either 1X TBE buffer (89 mM Tris-borate pH 8.3, 89 mM borate and 2.0 mM EDTA) (Maniatis et al., 1982) or 1X TAE buffer (50X TAE is 2 M Tris base, 1 M glacial acetic acid) (Maniatis et al., 1982), containing 0.5 µg/ml ethidium bromide. DNA fragments were visualized under UV light (260 nm) and photographs were taken with a Polaroid camera using Type 57 film.

F.3.2 Polyacrylamide gel electrophoresis. Non-denaturing polyacrylamide gels (5 - 10%, prepared from a stock of 29:1 acrylamide:bis-acrylamide) were poured and run in 1X TBE buffer. Polymerization was initiated by the addition of ammonium persulfate and TEMED to final concentrations of 0.066% (w/v) and 0.04% (w/v) respectively. DNA fragments were visualized either by staining of the gels with 10 µg/ml ethidium bromide in 0.5X TBE followed by UV irradiation, or by autoradiography if the fragments were end-labelled. In the latter case, gels were dried under vacuum at 80°C for 40 minutes using a Bio-Rad gel drier and exposed to X-ray film (Kodak XK-1). Where required, intensifying screens (Lightning Plus, Dupont) were used at -70°C.
For denaturing polyacrylamide gels (6 - 20%, prepared from a stock of 38:2 acrylamide:bis-acrylamide), urea (final concentration of 8.3 M) was added as a denaturant, and the gels were poured and run in 1X TBE buffer. Polymerization of the gels was initiated by the addition of ammonium persulfate and TEMED to final concentrations of 0.066% (w/v) and 0.024% (w/v) respectively. Gels were dried, and the DNA was visualized by autoradiography as described above.

F.4 Southern Transfers

DNA separated by electrophoresis in agarose gels was transferred to nitrocellulose (Schleicher and Schuell), Nytran (Schleicher and Schuell) or Zetaprobe (Bio-Rad) essentially as described by Southern (1975), except that the acid depurination step was routinely omitted. DNA fragments in the gels were denatured for 45 - 60 minutes in 0.5 N NaOH, 1.5 M NaCl, and then neutralized by treating twice for 20 minutes each with 0.5 M Tris-HCl pH 7.5/1.5 M NaCl. DNA transfer to various membranes was carried out overnight in either 10X SSC or in 1.0 M ammonium acetate (pH 7), the latter being more efficient. Filters were then air-dried and baked at 68°C for 4 - 16 hours prior to hybridization.

G. DNA CLONING

G.1 Fragment Production

DNA fragments for ligation into either pUC or M13 vectors were produced by several methods, including sonication (Deininger, 1983) or by restriction enzyme digestions and subsequent fragment isolation. In the latter case, DNA fragments were recovered from agarose gels by electroelution into dialysis tubing containing either 0.5X TBE or 0.5X TAE buffer. The DNA was then purified by chromatography through NACS PREPAC cartridges (BRL). In this procedure, DNA was loaded and washed in TE containing 0.2 M NaCl.
The DNA was eluted in 500 µl of TE containing 2.0 M NaCl. After addition of 0.1 volume of 3.0 M Na acetate (pH 4.8) the fragment was precipitated with 2 volumes of 95% ethanol and resuspended in a small volume of sterile TE buffer.

Random DNA fragments were produced by sonication (Deininger, 1983) using a Heat Systems Sonifier. Plasmid DNA (10 - 20 µg in 500 µl of 0.5 M NaCl, 0.1 M Tris-HCl pH 7.4, 10 mM EDTA) was sheared by 5 power bursts of 5 seconds each. Following ethanol precipitation, DNA fragments of 300 - 600 bp were separated by electrophoresis in a 5% non-denaturing polyacrylamide gel and isolated by electroelution. The DNA fragments were made blunt-ended by incubation at 37°C for 90 - 120 minutes with 33 mM Tris-acetate pH 7.8, 66 mM potassium acetate, 10 mM Mg acetate, 100 mg/ml BSA, and 0.2 mM of each deoxyribonucleotide triphosphate in 50 µl total volume, containing 6 units of T4 DNA polymerase (BRL). Following phenol/chloroform (1:1 v/v) extraction, the DNA was precipitated with ethanol and resuspended at a concentration of 10 ng/µl of TE buffer.

G.2 Ligation of DNA into pUC or M13 Vectors

For restriction endonuclease mapping analysis or sonication, DNA fragments were subcloned into pUC12 or pUC13 vectors (Vieira and Messing, 1982) (pUC plasmids were kindly provided by Dr. Roland Russnak, Department of Biochemistry, University of British Columbia). For DNA sequence analysis, the M13 vectors mp8, mp9, mp18 and mp19 were used. M13 ligations were usually carried out with 10 ng of vector DNA and 20 - 60 ng insert DNA in 20 µl of 50 mM Tris-HCl pH 7.4, 10 mM MgCl2, 10 mM DTT, 1.0 mM spermidine, 1.0 mM ATP and 100 µg/ml BSA.
For ligations using pUC vectors, 100 ng of vector DNA was ligated to a 3-fold molar excess of insert DNA, using the ligation buffer described above. Phage T4 DNA ligase (BRL or P-L Biochemicals) was added (0.5 or 1 unit for cohesive or blunt-ended ligations respectively) and the reaction was allowed to proceed at 15°C overnight for blunt-ended ligations or at 4°C overnight for cohesive-ended ligations.

G.3 Transformations

E. coli strains JM103 or JM101 were used as bacterial hosts for all pUC and M13 transformations (see Table II). Bacterial cells were made competent by treatment with 50 mM calcium chloride (Messing, 1983). Aliquots of competent cells (0.3 ml) were typically transformed with 3 - 5 µl of ligation mixture (see preceding section). Cells were incubated with DNA at 4°C for 40 - 60 minutes and then heat-shocked at 42°C for 3 minutes prior to plating. Bacteria infected with recombinant M13 phage were assayed for their inability to cleave 5-bromo-4-chloro-3-indolyl-galactoside (X-Gal) as described by Messing (1983), resulting in the appearance of clear plaques. The same colour assay was used to detect bacterial colonies containing pUC plasmids.

For clarity, a summary of the different subclones used in analysis of the wild-type human ceruloplasmin gene is presented in Table III, and will be referred to in subsequent sections.

H. DNA SEQUENCE ANALYSIS

H.1 Screening of M13 Clones

In the case of sonication, mixtures of randomly-sheared DNA fragments were cloned into M13 vectors (see Section II.G.1).
In order to identify recombinant clones carrying either cDNA or exon-encoding sequences, M13 plaques were screened by plaque hybridization as previously described (Benton and Davis, 1977) (see Section II.C.2). Hybridization and washing conditions varied, depending on the nature of the probe (see Section II.K for a summary of hybridization and washing conditions used in the present study).

H.2 Isolation of M13 Template DNA

Single-stranded M13 phage DNA from clones of interest was prepared as described by Messing (1983). Aliquots (2 ml) of YT containing 20 µl of host bacteria (JM101 or JM103) were each inoculated with a single plaque, and incubated at 37°C for 8 - 10 hours. Bacterial cells were pelleted by centrifugation in a 1.5 ml microfuge tube. Phage particles in 1.3 ml of supernatant were precipitated by the addition of 0.3 ml of a solution containing 20% PEG, 2.5 M NaCl, followed by incubation at room temperature for 15 minutes. M13 phage were collected by centrifugation for 5 minutes. The phage pellet was then resuspended in 200 µl of low tris buffer (50 mM NaCl, 10 mM Tris-HCl pH 7.5, 1 mM EDTA). DNA was purified by extraction with phenol/chloroform (1:1 v/v), followed by a 1:1 extraction with chloroform. The DNA was then precipitated by the addition of 0.1 volumes of 3 M Na acetate and 2 volumes of ethanol. Final M13 phage pellets were resuspended in 50 µl of low tris buffer.

Table III. Cloning Strategy for the Wild-Type Human Ceruloplasmin Gene. Various restriction fragments containing exon sequences were subcloned from ceruloplasmin genomic clones (Figure 10) into appropriate pUC and/or M13 vectors. Asterisks (*) follow those pUC subclones that were analyzed by sonication. Crosses (†) identify subclones that were used in Northern blot analysis of the 5' end of the gene. L1 - L4 designate exons present in the 5' untranslated region of the gene. Indented entries marked "->" are fragments subcloned from the fragment listed above them.

RESTRICTION FRAGMENT | CORRESPONDING EXON(S) | PHAGE CLONE DERIVED FROM
2.6 Kbp EcoRI (pUC13) | L1, L2 | λWT7
1.5 Kbp EcoRI† (pUC13) | L1, L2 | λWT2
  -> 700 bp XbaI/EcoRI (pUC13)
  -> 800 bp XbaI/EcoRI (pUC13)
  -> 85 bp Sau3A/EcoRI (mp18)
  -> 190 bp Sau3A (mp18)
  -> 70 bp HinfI/EcoRI (mp18)
  -> 220 bp HinfI (mp18)
420 bp EcoRI† (pUC13; mp18, mp19) | L2 | λWT7
4.0 Kbp EcoRI (pUC13)† | L3 | λWT7
  -> 2.7 Kbp HindIII/EcoRI (pUC13)†
  -> [illegible] (pUC13)
  -> 160 bp DdeI (mp18)
  -> 450 bp DdeI (mp18)
1.4 Kbp EcoRI (pUC13)† | - | λWT7
3.1 Kbp EcoRI (pUC12)*† | L4, 1 | λWT2
4.6 Kbp EcoRI (pUC12)* | 2 | λWT2
800 bp EcoRI/SalI (mp18, mp19) | 3 | λWT6
1.8 Kbp EcoRI (pUC12) | 4 | λWT5
  -> 350 bp XbaI/EcoRI (mp18)
  -> 950 bp XbaI (mp18)
3.2 Kbp EcoRI/SalI (pUC12) | 5, 6 | λWT5
  -> 250 bp KpnI/EcoRI (mp19)
  -> 800 bp KpnI (mp19)
  -> 1.9 Kbp KpnI/SalI (pUC18)*
2.6 Kbp HindIII (pUC12)* | 7 | λWT1
1.6 Kbp BamHI/HindIII (pUC12)* | 8, 9 | λWT1
670 bp BamHI/SstI (mp18) | 9 | λWT1
450 bp EcoRI (pUC12, mp18, mp19) | 10 | λWT1
1.2 Kbp EcoRI (pUC12) | 10 | λWT3
  -> 490 bp BamHI/EcoRI (mp19)
4.8 Kbp EcoRI (pUC13)* | 10, 11, 12, 13 | λWT3
2.1 Kbp EcoRI/SalI (pUC13*, mp19) | [illegible], 14 | λWT3

H.3 DNA Sequence Analysis

All DNA sequence was determined using the dideoxynucleotide chain termination technique of Sanger et al. (1977) as modified for M13 phage templates (Messing et al., 1981). Sequencing reactions were performed using the dideoxy/deoxyribonucleotide concentrations given in Table IV.
DNA sequence analysis was carried out by hybridizing 4 µl of M13 template DNA with 1 µl of universal sequencing primer (17mer; 0.03 OD260 units/ml), 1 µl dH2O, and 2 µl 10X Hin buffer (600 mM NaCl, 100 mM Tris-HCl pH 7.5, 70 mM MgCl2) at 68°C for 10 minutes. Following slow cooling to room temperature, 1 µl of 15 µM dATP and 1.5 µl of [α-32P]-dATP (10 µCi/µl; 3000 Ci/mmole) were added; 2 µl of this mixture was then added to 1.5 µl of the appropriate dideoxy/deoxyribonucleotide mix (see Table IV). To initiate the extension reaction, 0.4 units of DNA polymerase I Klenow fragment (BRL, Pharmacia) was added to each tube. Following incubation for 15 minutes at room temperature, 1 µl of 0.5 mM dATP was added to each reaction, and the cold "chase" was allowed to proceed for an additional 15 minutes at room temperature. At this time, 5 µl of dye mix (98% formamide, 10 mM EDTA pH 8.0, 0.02% xylene cyanol, 0.02% bromophenol blue) was added. Reaction products were denatured by heating at 90°C for 3 minutes, and 2 µl of each reaction was analyzed on 6% denaturing polyacrylamide gels (see Section II.F.3.2). After electrophoresis (0.9 W/cm), gels were dried and exposed overnight to Kodak XK-1 film at room temperature.

Table IV. Compositions of M13 DNA Sequencing Mixes.

              Composition of Sequencing Mixes
Nucleotide    d/ddG    d/ddA    d/ddT    d/ddC
dG              7.9    109.4    158.7    157.9
dT            157.6    109.4      7.9    157.9
dC            157.4    109.4    158.7     10.5
ddG           157.4      -        -        -
ddA             -      116.7      -        -
ddT             -        -      550.3      -
ddC             -        -        -      191.6

The numbers refer to concentrations (µM) of dideoxy- and deoxyribonucleotide triphosphates used for the preparation of M13 DNA sequencing mixes. The values given were determined empirically by Dr. Joan McPherson, Department of Plant Sciences, University of British Columbia.

In order to avoid redundancy, most M13 clones generated by sonication were first analyzed using only ddTTP reactions (Sanger et al.
, 1980). Where necessary, fragments cloned into M13 in opposite orientations were identified according to their ability to form figure 8-like configurations, which migrate more slowly in agarose gels (Messing, 1983).

I. RNA ANALYSIS

I.1 Northern Blot Analysis

RNA samples (10 µg poly(A)+ human liver RNA or 20 µg HepG2 total cellular RNA) were denatured by addition of loading buffer containing formaldehyde, and subsequently separated by electrophoresis in formaldehyde-containing agarose gels (Lehrach et al., 1977; Maniatis et al., 1982). Prior to transfer, gels were treated with 50 mM NaOH, 10 mM NaCl for 45 minutes, neutralized by treatment for 45 minutes with 0.1 M Tris-HCl pH 7.5, and finally soaked in 20XSSC for 60 minutes. RNA was transferred to nitrocellulose in 20XSSC buffer, by the method of Southern (1975). Following transfer, blots were air-dried and baked for 4 - 16 hours at 68°C. Ceruloplasmin mRNA species were detected using hybridization and washing conditions described for double-stranded DNA probes (see Section II.K).

I.2 RNA Dot Blots

Dot blots for rapid analyses were prepared by spotting 7.5 - 10 µg of human liver mRNA in sterile 10XSSC onto nitrocellulose filters. Following air drying, filters were baked at 68°C for 4 - 16 hours. The dot blots were probed subsequently with labelled M13 templates containing ceruloplasmin genomic DNA fragments (see Section II.K for hybridization and washing conditions).

I.3 Nuclease S1 Mapping

Single-stranded probes were prepared from recombinant M13 templates as described previously (see Section II.B.4). For nuclease S1 protection assays (as described by Kay et al.
, 1986), 100,000 - 150,000 cpm of single-stranded DNA probe (specific activity = 10^8 cpm/µg) was mixed with 0.35 - 1.0 µg of human liver poly(A)+ RNA in a final volume of 30 µl hybridization buffer (50% formamide, 10 mM PIPES pH 6.9, 400 mM NaCl, 1 mM EDTA). The hybridization mixture was then heated to 80°C for 15 minutes and subsequently incubated at 42°C for 12 hours. Following annealing, a solution (200 µl) was added that contained 70 mM Na acetate, 600 mM NaCl, 2.5 mM ZnSO4 and 150 - 200 U of nuclease S1. After digestion (60 minutes at 37°C), intact probe DNA was precipitated by the addition of 30 µl of a solution containing 100 mM EDTA, 4 M ammonium acetate and 100 µg of tRNA per ml, followed by 230 µl isopropanol. Reaction products were separated by electrophoresis on a 6% denaturing polyacrylamide gel (see Section II.F.3.2) and were visualized by exposure to Kodak XK-1 film overnight at -70°C with an intensifying screen.

J. CHROMOSOME MAPPING

Chromosome localization studies for the human ceruloplasmin pseudogene were performed in collaboration with Dr. John Hamerton (Department of Human Genetics, University of Manitoba) using human-hamster somatic cell hybrids which had been characterized previously by cytogenetic and isozyme analysis (Donald et al., 1983; Riddell et al., 1985). DNA from cultured cell lines was isolated as described by Riddell et al. (1986). DNA (5 µg) from each hybrid line, as well as control human placental and hamster DNA, was digested with EcoRI, electrophoresed on 1% agarose gels and transferred to Zetaprobe (Bio-Rad) according to the manufacturer's specifications. Blots were probed with a 32P-labeled restriction fragment specific for the human ceruloplasmin pseudogene (see Section II.K for hybridization and washing conditions).

K. SUMMARY OF HYBRIDIZATION/WASHING CONDITIONS

K.1 Genomic Southern Blot Hybridization

DNA fragments were detected by hybridization to 32P-labelled DNA probes as described by Kan and Dozy (1978). Membranes were first wetted with 3XSSC and then prehybridized for 2 - 4 hours at 37°C in a solution containing 50% formamide, 6XSSC, 1 mM EDTA, 0.1% SDS, 10 mM Tris-HCl pH 7.5, 10X Denhardt's (DH) solution (1XDH is 0.02% BSA, 0.02% ficoll, 0.02% polyvinylpyrrolidone), 0.05% sodium pyrophosphate, 100 µg/ml denatured herring sperm DNA, and 25 µg/ml poly(A)+ RNA. Hybridizations were carried out in the above buffer, with the addition of denatured probe. After hybridization, which was allowed to proceed for 24 - 36 hours at 37°C, blots were washed in 2XSSC, 1XDH, and then washed twice for 45 minutes each at 50°C in 0.1XSSC, 0.1% SDS. Following a final room temperature rinse in 0.1XSSC, filters were air dried and exposed to film (see below).

For the chromosome localization study, human-hamster hybrid panels were prehybridized overnight at 42°C in 50% formamide, 3XSSPE (1XSSPE is 0.1 mM EDTA, 10 mM NaH2PO4 pH 7.0 and 0.18 M NaCl), 1% SDS, 0.5% non-fat powdered milk, 10% dextran sulfate and 200 µg/ml salmon sperm DNA. The hybridization reaction was carried out overnight in the same buffer with the addition of nick-translated probe at a concentration of 20 ng/ml. Following hybridization, blots were washed twice in 2XSSC at room temperature for a total of 10 minutes.
Filters were then washed for 15 minutes in 0.2XSSC, 0.1% SDS at 55°C, and finally for 15 minutes in 0.2XSSC, 0.1% SDS at 55°C.

K.2 Hybridization Conditions Other Than for Genomic Southern Blots

For all other hybridizations using double-stranded DNA probes, filters were prehybridized in a solution containing 6XSSC, 2XDH solution at 68°C for 1 - 4 hours. Filters were then hybridized overnight at 68°C in 6XSSC, 2XDH, 1 mM EDTA, 0.5% SDS, and the denatured probe DNA (at least 1 x 10^6 cpm/ml, with specific activity > 0.5 x 10^8 cpm/µg). Following hybridization, filters were rinsed once at room temperature in 2XSSC, and then washed three times at 68°C in 1XSSC, 0.5% SDS for 30 - 40 minutes each, and finally rinsed in 2XSSC at room temperature.

For 5'-labelled oligodeoxyribonucleotide probes, prehybridization was carried out at 37°C for 2 - 16 hours in 6XSSC, 2XDH, and 0.2% SDS. Filters were then hybridized at 68°C with the addition of 5'-end labeled oligonucleotide (at least 1 x 10^ cpm/ml, with specific activity ^ 10^ cpm/pmole). All washes contained 6XSSC, and were usually carried out at room temperature for 15 minutes, and then twice for 15 minutes each at 37°C. This was followed by higher temperature washings at 40 - 55°C, depending on the base composition of the oligonucleotide.

After washing, nitrocellulose filters from all hybridizations described above were air-dried and exposed overnight to either Kodak XK-1 or Kodak X-Omat AR film (the latter being approximately 5-fold more sensitive) at -70°C with intensifying screens.

III. RESULTS

A.
CHARACTERIZATION OF THE HUMAN PRECERULOPLASMIN cDNA

A.1 Initial Screening of a Human Liver cDNA Library

Two hundred thousand recombinant clones from a human liver cDNA library (provided by Dr. Stuart Orkin at Children's Hospital, Harvard University) were screened at high density by using pool II and pool III oligonucleotide mixtures (corresponding to amino acid residues 937 - 942 and 962 - 967 of the ceruloplasmin protein sequence, respectively; see Figures 5 and 7) as hybridization probes. One recombinant clone (designated phCP-1) hybridized specifically to both oligonucleotide mixtures. Restriction endonuclease mapping of the purified plasmid showed that phCP-1 contained an insert of approximately 2.7 Kbp cloned into the PstI site of the pKT218 vector. Subsequent DNA sequence analysis (see Section III.A.3) showed that the phCP-1 insert contained DNA encoding amino acid residues 202 - 1046 of plasma ceruloplasmin (Takahashi et al., 1984), in addition to a 123 bp 3' untranslated region and a poly(A) tract (see Figures 5 and 6).

In order to isolate a clone(s) coding for the 5' region of the ceruloplasmin mRNA, the cDNA library was rescreened using a 322 bp HaeIII-PstI fragment as hybridization probe (Probe A; see Figure 6). This fragment was derived from the 5' end of the phCP-1 insert. A single clone was identified and found to contain DNA corresponding to amino acid residues 202 - 432 of ceruloplasmin. Therefore, the latter clone extended no further 5' than phCP-1.

Figure 5. Schematic summary of the cloning of the human preceruloplasmin cDNA. The phCP-1 clone (derived from the Stuart Orkin human liver cDNA library) encodes amino acid residues 202 - 1046 of ceruloplasmin, followed by a 123 bp 3' untranslated region (cross-hatched bar) and a poly(A) tail. The λhCP-1 clone (isolated from a randomly-primed human liver cDNA library) was found to contain sequence corresponding to amino acid residues 1 - 380 of plasma ceruloplasmin, preceded by a 19 amino acid long signal peptide (solid bar). An arrow identifies the site of signal peptide cleavage. Stars indicate the positions of synthetic oligonucleotide probes used to screen the respective cDNA libraries.

[Figure 5 diagram: random primer used in first strand cDNA synthesis, double-stranded cDNA cloned into λgt10 (λhCP-1); oligo dT primer used in first strand cDNA synthesis, double-stranded cDNA cloned into pKT218 plasmid (phCP-1); scales in nucleotides (-57 to 3321) and amino acids (-19 to 1046)]

A.2 Isolation of cDNA Clones Encoding the 5' End of Human Ceruloplasmin

For the isolation of a clone(s) containing cDNA sequence corresponding to the remainder of the ceruloplasmin mRNA, two randomly-primed cDNA libraries were constructed (see Section II.C.2). The first of these, cloned into the EcoRI site of the phage vector λgt10, was screened by plaque hybridization using a 1071 bp PstI-EcoRI fragment isolated from phCP-1 (Probe B, Figure 6) as a probe. Of the 16 positive clones that were identified, Southern blot analysis showed that 13 hybridized to the 322 bp HaeIII-PstI fragment derived from the 5' end of the phCP-1 cDNA insert (Probe A, Figure 6). Of these latter clones, only one was found to hybridize specifically to the pool I oligonucleotide mixture, which corresponds to the amino-terminal 6 amino acids of plasma ceruloplasmin (see Figures 4 and 6).
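Screening with a mixed oligonucleotide pool relies on the pool covering every codon combination that could encode the target peptide. The sketch below estimates how many distinct sequences such a pool would need for a six-residue stretch. It is illustrative only: it assumes that the amino-terminal residues read Lys-Glu-Lys-His-Tyr-Tyr (as read from Figure 7), that all six codons are represented in full, and that the standard genetic code applies. The actual composition of pool I is given in Figure 4 and may differ; `pool_degeneracy` is a hypothetical helper name.

```python
from math import prod

# Number of codons per amino acid in the standard genetic code
CODONS_PER_RESIDUE = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def pool_degeneracy(peptide):
    """Number of distinct oligonucleotides needed to cover every codon choice."""
    return prod(CODONS_PER_RESIDUE[aa] for aa in peptide)


# Assumed amino-terminal hexapeptide (one-letter code), for illustration only
n_terminal = "KEKHYY"
print(pool_degeneracy(n_terminal))
```

Peptides rich in residues with only one or two codons (Met, Trp, Lys, Tyr and so on) keep the pool small, which is the usual reason such stretches are chosen for probe design.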
This clone (designated λhCP-1; see Figures 5 and 6) was characterized by restriction endonuclease mapping and DNA sequence analysis (see below) and was found to contain an EcoRI insert of approximately 1.2 Kbp, corresponding to amino acid residues 1 - 380 of the mature ceruloplasmin protein. This sequence was preceded by a signal peptide of 19 amino acid residues beginning with a putative initiator methionine (see Figures 5 and 7).

A second randomly-primed human liver cDNA library was constructed in the EcoRI site of the phage vector λgt11 and screened in order to identify a clone(s) containing the nucleotide sequence of the 5' untranslated region of the ceruloplasmin mRNA. The probe used for library screening was a 150 bp EcoRI-HindII fragment obtained from the 5' end of the λhCP-1 clone (designated Probe C, Figure 6). Five positive cDNA clones were identified (designated λhCP-2 to λhCP-6) and were subsequently purified to homogeneity. From restriction endonuclease analysis, it was determined that these clones contained EcoRI inserts of 780 bp, 920 bp, 1.1 Kbp, 730 bp, and 1.0 Kbp, for λhCP-2 to λhCP-6 respectively. Subsequent DNA sequence analysis (see below) showed that λhCP-2 and λhCP-6 extended an additional 19 and 38 bp respectively, 5' to the previously characterized λhCP-1 clone (see Figure 8).

Figure 6. Restriction Map and Sequencing Strategy for Human Preceruloplasmin cDNA Clones. The longer bars below the restriction map represent the clones phCP-1 and λhCP-1 that together include regions coding for the leader peptide (hatched bar), the plasma protein (open bar), and the 3' untranslated sequence (solid bar). Arrows indicate the extent and direction of nucleotide sequence obtained from various M13 clones. Restriction fragment probes A, B, and C, which were used in library screening (see text for details) are indicated directly below the restriction map (solid bars). The PstI and EcoRI sites in parentheses result from the cloning procedures used in the construction of the cDNA libraries. Kb, Kilobases.

[Figure 6 diagram: restriction map with (EcoRI), HindII, (PstI), HaeIII and EcoRI sites, Probes A, B and C, and clones λhCP-1 and phCP-1; scale in nucleotides, 0 - 3.6 kb]

A.3 DNA Sequence Analysis of Human Preceruloplasmin

The complete nucleotide sequence of the phCP-1 and λhCP-1 cDNA inserts was determined using the strategy shown in Figure 6. The majority of the sequence of phCP-1 was obtained by analysis of randomly-sheared fragments cloned into M13. The remainder of the phCP-1 sequence, as well as the entire λhCP-1 nucleotide sequence, was determined by analysis of specific restriction endonuclease fragments cloned into M13 vectors. The complete nucleotide sequence of these two cDNA clones that together encode human preceruloplasmin is shown in Figure 7. Each nucleotide position was determined an average of 3.4 times, and 62% of the sequence was obtained on both strands. In the region where they overlap (nucleotide residues 648 to 1197), the sequences of phCP-1 and λhCP-1 were found to be identical.

Nucleotide residues 1 - 57 code for an amino terminal leader sequence, which is removed prior to the appearance of ceruloplasmin in plasma (Takahashi et al., 1984). Nucleotides 58 - 3195 of the cDNA sequence encode the plasma form of ceruloplasmin (Takahashi et al., 1984). The open reading frame is followed by a 'TGA' stop codon (encoded by nucleotides 3196 - 3198), a 3' untranslated region of 123 bp (nucleotide residues 3199 - 3321) and a poly(A) tail.
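The nucleotide coordinates quoted above can be cross-checked against the stated protein lengths with simple arithmetic. The sketch below does this; the coordinates are those given in the text, while the variable and function names are hypothetical.

```python
# Feature coordinates (1-based, inclusive) as quoted in the text
FEATURES = [
    ("leader peptide", 1, 57),
    ("plasma ceruloplasmin", 58, 3195),
    ("stop codon", 3196, 3198),
    ("3' untranslated region", 3199, 3321),
]


def span_length(start, end):
    """Length of an inclusive 1-based interval."""
    return end - start + 1


def is_contiguous(features):
    """True if each feature starts immediately after the previous one ends."""
    return all(prev[2] + 1 == nxt[1] for prev, nxt in zip(features, features[1:]))


lengths = {name: span_length(start, end) for name, start, end in FEATURES}
leader_codons = lengths["leader peptide"] // 3
mature_codons = lengths["plasma ceruloplasmin"] // 3
overlap_nt = span_length(648, 1197)  # region shared by phCP-1 and the lambda clone

print(leader_codons, mature_codons, lengths["3' untranslated region"], overlap_nt)
print(is_contiguous(FEATURES))
```

The intervals are contiguous and give 19 codons for the leader, 1046 codons for the plasma protein and 123 nucleotides for the 3' untranslated region, in agreement with the figures stated in the text.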
The 3' untranslated sequence contains a putative polyadenylation signal, 'ATTAAA', which is situated 14 nucleotides upstream of the poly(A) tract.

Figure 7. Nucleotide Sequence of Human Preceruloplasmin cDNA. The sequence was determined by analysis of the overlapping clones shown in Figure 6 (see text for details). The predicted amino acid sequence of human preceruloplasmin is indicated above the DNA sequence. The putative signal peptidase cleavage site is shown by a solid arrow. Potential carbohydrate sites (Takahashi et al., 1984) are represented by solid diamonds. Boxed sequences are complementary to oligonucleotide probes used to screen cDNA libraries. The polyadenylation signal ATTAAA is underlined.

[Figure 7 sequence: nucleotide sequence of the preceruloplasmin cDNA with the predicted amino acid sequence (ending at residue 1046, followed by the stop codon) shown above it]

Figure 8. Comparative analysis of cDNA clones λhCP-2 to λhCP-6. EcoRI inserts isolated from the above clones were subcloned into M13 and characterized by T-tracking analysis. Their positions relative to the previously characterized λhCP-1 cDNA clone are shown. The location of Probe C (an EcoRI-HindII restriction fragment used as a hybridization probe for library screening) is identified by a solid bar. Arrowheads indicate that the clones extend 3' to Probe C (see text for details). λhCP-2 and λhCP-6 were found to extend 19 and 38 bp respectively 5' to λhCP-1. The additional nucleotide sequence contained in these two clones is given.

[Figure 8 diagram: positions of λhCP-2 to λhCP-6 relative to λhCP-1 and Probe C, numbered from -38 to +1]

EcoRI inserts isolated from λhCP-2, λhCP-3, λhCP-4, λhCP-5, and λhCP-6 clones were subcloned into M13 and initially characterized by T-tracking analysis (Sanger et al., 1980) in order to determine their positions relative to the λhCP-1 cDNA clone (see Figure 8). Two of these clones (λhCP-2 and λhCP-6) were found to extend 19 and 38 nucleotides respectively further 5' than λhCP-1. The additional 5' sequence contained in these two clones is also presented in Figure 8.

A.4 Ceruloplasmin Transcript Analysis

A Northern blot prepared using samples of both poly(A)+ RNA from human liver and total RNA purified from HepG2 cells was hybridized to the 32P-labelled cDNA insert isolated from phCP-1. In both liver and HepG2 RNA samples, the cDNA hybridized to an mRNA species that is 3700 ± 200 nucleotides in size (Figure 9).
The cDNA hybridized to an additional species of 4500 ± 250 nucleotides in the human liver poly(A)+ RNA sample (Figure 9, lane 1). Given that eukaryotic mRNAs usually contain poly(A) tails of 180 - 200 nucleotides (Perry, 1976), both of the ceruloplasmin transcripts are much larger in size than that required to encode the entire coding region of the ceruloplasmin polypeptide chain (Takahashi et al., 1984). A Northern blot containing samples of bovine liver poly(A)+ RNA was probed at high stringency using the 32P-labelled insert isolated from phCP-1 (results not shown). Interestingly, only a single hybridizing species was detected, which was 3800 ± 200 nucleotides in size.

Figure 9. Blot hybridization analysis of human ceruloplasmin mRNA. RNA was separated by electrophoresis in a denaturing agarose/formaldehyde gel and transferred to nitrocellulose. The filter was hybridized with 32P-labelled phCP-1. The filter was exposed to X-ray film for 18 hours at -70°C with intensifying screens. Lane 1, 10 µg of human liver poly(A)+ RNA; lane 2, 20 µg of total HepG2 cell RNA. The positions of HindIII fragments of λ phage DNA used as size markers are shown. Sizes are given in Kilobases (Kb).

[Figure 9 image: size markers at 9.5, 6.7, 4.3, 2.2, 2.0 and 0.59 Kb]

B. CHARACTERIZATION OF THE WILD-TYPE HUMAN CERULOPLASMIN GENE

B.1 Isolation and Restriction Endonuclease Mapping of Genomic Clones

Initially, five genomic equivalents [1 x 10^6 plaque-forming units (pfu)] of the Maniatis human genomic phage library, constructed in the phage lambda vector Charon 4A, were screened by using the phCP-1 cDNA clone as a hybridization probe.
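The relationship between the number of plaques screened and the number of genome equivalents can be sketched numerically. The example below is illustrative only: the average insert size (15 kb) and the haploid genome size (3 x 10^9 bp) are assumptions that are not stated in the text, and the function names are hypothetical. The probability estimate uses the standard Clarke and Carbon expression P = 1 - (1 - f)^N, where f is the fraction of the genome carried by one clone and N is the number of clones screened.

```python
def genome_equivalents(n_clones, insert_bp, genome_bp):
    """Total cloned DNA screened, expressed in multiples of the genome size."""
    return n_clones * insert_bp / genome_bp


def probability_represented(n_clones, insert_bp, genome_bp):
    """Chance that a given single-copy sequence is present among the clones.

    Uses P = 1 - (1 - f)**N with f = insert_bp / genome_bp.
    """
    f = insert_bp / genome_bp
    return 1.0 - (1.0 - f) ** n_clones


N_CLONES = 1_000_000          # plaques screened, as stated in the text
INSERT_BP = 15_000            # assumed average insert size
GENOME_BP = 3_000_000_000     # assumed haploid human genome size

print(genome_equivalents(N_CLONES, INSERT_BP, GENOME_BP))
print(round(probability_represented(N_CLONES, INSERT_BP, GENOME_BP), 4))
```

With these assumed values, 1 x 10^6 plaques correspond to five genome equivalents, and a given single-copy sequence would be expected in the screened population with a probability of about 0.99.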
As described previously, this cDNA encodes amino acid residues 202 - 1046 of ceruloplasmin, and also contains a 3' untranslated region of 123 bp, terminating with a poly(A) tail. The fourteen positive clones identified from this screen were purified to homogeneity, and analyzed by restriction endonuclease mapping. On this basis, the clones were found to represent only two independently-derived genomic clones. One of these clones (designated 3^10) corresponded to a pseudogene for human ceruloplasmin (see Section III.C). The other clone (designated λWT1; see Figure 10) corresponded to the wild-type ceruloplasmin gene, as was determined initially by restriction endonuclease mapping and Southern blot analysis, and subsequently confirmed by using DNA sequence analysis (see below).

In order to obtain phage clones corresponding to the 5' end of the ceruloplasmin gene, the Maniatis library was subsequently rescreened using the 1.2 Kbp EcoRI insert from λhCP-1 (encoding the signal peptide and amino acids 1 - 380 of the mature protein) as a hybridization probe. From this screen, one positive clone (designated λWT2; see Figure 10) was identified.

To isolate additional ceruloplasmin genomic clones, 1 x 10^ pfu from a second genomic library (constructed in the lambda phage vector EMBL 3) were screened, using 32P-labelled cDNA inserts from both phCP-1 and λhCP-1 as hybridization probes. From this screen, 10 different recombinant clones were obtained. Seven of these were found to correspond to the wild-type ceruloplasmin locus (designated λWT3 to λWT9; see Figure 10), while the remaining three clones (designated 3iJ;21, 34*29, and 3»|/9) were identified as human ceruloplasmin pseudogene clones (see Section III.C).

Figure 10. Partial restriction map and intron/exon organization of the human ceruloplasmin gene. A complete map of the 45 Kbp region using the enzymes BamHI (B), SalI (S), HindIII (H), and EcoRI (E) is shown; an incomplete map for AccI (A), XbaI (X), KpnI (K), SstI (T), and BalI (L) is also given. Exons located within the coding region are shown as black boxes in a bar above the restriction map and are numbered 1 - 14. Corresponding introns are also labelled (A - N). Approximate positions of introns (LA - LD) and exons (L1 - L4) present in the 5' untranslated region of the gene are indicated; lengths of these exons and introns are undetermined. Genomic phage clones λWT1 - λWT9 are shown below the restriction map. Open circles at the ends of EMBL 3 phage clones represent Sau3A sites, while solid diamonds at the ends of clones isolated from the Maniatis library represent EcoRI linkers. The relative positions of the 4.6 Kbp and 2.4 Kbp EcoRI fragments identified from genomic Southern analysis are also shown, corresponding to the 3' end of the gene (see text for details). The scale represents 1 Kilobase pair.

DNA was prepared from small scale lysates of wild-type phage clones, and analyzed by multiple restriction enzyme digestions and Southern blot analyses using appropriate hybridization probes derived from cDNA clones. On this basis, a partial restriction enzyme map of cloned ceruloplasmin genomic DNA was constructed (Figure 10). The nine wild-type ceruloplasmin clones identified (Figure 10) span a region of approximately 45 Kbp of genomic DNA. Southern blot analysis using M13 probes derived from the 3' end of the ceruloplasmin cDNA (i.e.
, containing sequences derived from nucleotide residues 2565 - 3321, Figure 7) indicated that the 3' end of the ceruloplasmin gene was not represented i n the above phage clones. B.2 L o c a l i z a t i o n of Intron/Exon Junctions Corresponding to the  Ceruloplasmin Coding Region In order to i d e n t i f y exon-containing DNA sequences corresponding to the cDNA sequence of human preceruloplasmin, e i t h e r s p e c i f i c r e s t r i c t i o n endonuclease fragments h y b r i d i z i n g to cDNA-derived probes or fragments generated by sonication of appropriate genomic clones were l i g a t e d into M13 vectors f o r analysis (see Table 3, section II.G.2). In the l a t t e r case, coding sequences were i d e n t i f i e d by screening M13 32 subclones with P-labelled cDNA fragments as h y b r i d i z a t i o n probes. The intron/exon boundary sequences for exons 1 - 1 4 are presented i n Table V. A l l introns characterized to date are s p l i c e d according to the GT..AG rule T a b l e V. N u c l e o t i d e Sequence of I n t r o n / E x o n J u n c t i o n s i n t h e Human C e r u l o p l a s m i n Gene. 5' SPLICE DONOR 3' SPLICE ACCEPTOR CODON EXON U n o t d e t e r m i n e d INTRON LD c g c t t t c t c c c t t c 8 8 a a a 8 A A G G PHASE 1 A C A C 8 t a a g a A t 8 t c t t 8 t t t t t c t t t g c a g G G A A I I 2 G A G G g t a a g t B c a t t 8 c a t 8 t t g c t t c c t a g G G G C I 3 A A A G 8 t a c a t C a a t a 8 t a a c t t t a a c t c c a g A T T C I 4 C T C C 8 t a a g a D a c t c t g c t c t t g a c t t a c a g C T G T I 5 A A A G 8 t a 8 g a E c t c t t t c t g t t t c a t t t c a 8 C C G G I 6 G A A G 8 t a a t t F a a c a c t t t t t t c c c c c t 8 a g T G A C I I 7 C T G G g t g a g t G a a t t c c t c t t t g t t c c c a 8 G T C C I 8 A G A A g t g a g t H t a a t 8 t 8 a c c t t t c t c a c a 8 G T G T I 9 A C A G 8 t a a 8 t I t c a t t t t 8 t t t t a t t t '•. 
a c a g G A T G 0 10 C A C T g t a a g t J t t a t t t c c c a a c t t t t t a c a 8 C C A T I 11 not : d e t e r m i n e d K t t t t c t t a t t t c c a c t c c a g G G A C I I 12 A G A A 8 t a a t t L not d e t e r m i n e d I 13 C T A G 8 t a t g t M t t c c t c t t c a c t t t t g c c a 8 G T C C I 14 C C A G g t a c t c N Exon sequence i s shown i n upper c a s e , w h i l e i n t r o n sequence i s g i v e n i n l o w e r c a s e . The codon phase r e f e r s t o t h e p o s i t i o n o f the i n t r o n r e l a t i v e t o the codon t r i p l e t ( S h a r p , 1981), i . e . 0 : i n t r o n o c c u r s between c o d o n s , I : i n t r o n o c c u r s a f t e r t h e f i r s t n u c l e o t i d e of t h e codon, and I I : i n t r o n o c c u r s f o l l o w i n g t h e second n u c l e o t i d e of t h e codon. "L" d e s i g n a t e s i n t r o n s o r exons o c c u r r i n g w i t h i n t he 5' u n t r a n s l a t e d r e g i o n . Table VI. Frequencies of Nucleotides at Intron/Exon Junctions. DONOR FREQUENCIES +4 +3 +2 +1 -1 -2 -3 -4 -5 -6 G 2 2 2 8 13 0 2 1 9 0 A 6 5 9 2 0 0 11 8 1 3 T 0 3 0 1 0 13 0 0 3 9 C 5 3 2 2 0 0 0 2 0 1 CON N A A G G T R A G T C ACCEPTOR FREQUENCIES -20 -19 -18 -17 -16 -15 -14 -13 -12 -11 -10 -9 -8 -7 -6 -5 -4 -3 -2 -1 +1 +2 +3 +4 G 0 2 0 0 3 1 2 1 2 0 0 2 1 0 1 2 1 1 0 13 7 4 5 3 A 4 5 3 2 0 0 2 3 0 2 1 0 3 3 0 0 5 1 13 0 2 2 4 1 T 6 4 6 8 7 10 4 8 4 9 10 8 4 6 9 7 2 1 0 0 1 5 2 3 C 3 2 4 3 3 2 5 1 7 2 2 3 5 3 3 4 5 10 0 0 3 2 2 6 CON Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y N Y A G G N N N The frequencies of nucleotides occurring at intron/extron boundaries i n the human ceruloplasmin gene are compared to the consensus sequence (CON) of Mount (1982). "N" represents G, A, T or C, while "Y" denotes pyrimidine residues ( i . e . C or T). Splice junctions are located between -1 and +1. 96 (Breathnach and Chambon, 1981; Cech, 1983). 
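As a cross-check on Table VI, the following sketch recomputes the most frequent nucleotide at each donor-site position from the tabulated counts and lists the positions at which a single nucleotide accounts for every junction. The counts are transcribed from the donor half of Table VI; the function and variable names are illustrative only and are not part of the original analysis.

```python
# Donor-site nucleotide counts transcribed from Table VI.
# Positive positions lie in the exon, negative positions in the intron.
POSITIONS = ["+4", "+3", "+2", "+1", "-1", "-2", "-3", "-4", "-5", "-6"]
DONOR_COUNTS = {
    "G": [2, 2, 2, 8, 13, 0, 2, 1, 9, 0],
    "A": [6, 5, 9, 2, 0, 0, 11, 8, 1, 3],
    "T": [0, 3, 0, 1, 0, 13, 0, 0, 3, 9],
    "C": [5, 3, 2, 2, 0, 0, 0, 2, 0, 1],
}


def most_frequent_bases(counts):
    """Return the most frequent base at each position as a string.

    Ties (none occur in this table) are resolved in dictionary order.
    """
    n_positions = len(next(iter(counts.values())))
    bases = []
    for i in range(n_positions):
        bases.append(max(counts, key=lambda base: counts[base][i]))
    return "".join(bases)


def invariant_positions(counts, labels):
    """Return labels of positions where one base accounts for all counts."""
    invariant = []
    for i, label in enumerate(labels):
        column = [counts[base][i] for base in counts]
        if max(column) == sum(column):
            invariant.append(label)
    return invariant


if __name__ == "__main__":
    print(most_frequent_bases(DONOR_COUNTS))
    print(invariant_positions(DONOR_COUNTS, POSITIONS))
```

The only invariant donor positions are -1 and -2, that is, the GT dinucleotide required by the GT..AG rule.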
The consensus sequences surrounding splice junctions in eukaryotic RNA polymerase II-transcribed genes (Breathnach and Chambon, 1981; Mount, 1982) were also found to be in agreement with those characterized in the human ceruloplasmin gene (see Table VI). Although the splice donor and acceptor sequences were not determined for introns K and L respectively, intron positions in both cases could be assigned unequivocally from sequences obtained on one side. Introns present in the coding sequence corresponding to the 3' end of the gene (i.e. the region containing nucleotide residues 2656 - 3321 of the cDNA sequence) were not determined, since phage clones containing this area were not obtained from genomic screening (see Figure 10).

B.3 Partial Nucleotide Sequence Analysis of the Human Ceruloplasmin Gene

Partial DNA sequence of the ceruloplasmin gene corresponding to the coding region was determined by analysis of M13 subclones. Approximately 40% of the sequence data was determined on both strands, and DNA sequence obtained on one strand only was determined at least twice. Sizes and relative positions of introns and exons in the gene are summarized in Table VII and shown schematically in Figure 10. Intron sizes were determined in all cases by restriction endonuclease mapping, and therefore represent close approximations to actual sizes. Exons range in size from 129 - 255 bp, with a calculated average length of 183 bp. Introns typically were found to be variable in size, ranging from approximately 800 bp to 9.5 Kbp. The partial amino acid sequence determined from the characterization of exons 1 - 14 was identical to that determined for the cDNA sequence corresponding to nucleotide residues 1 - 2555.

Table VII. Sizes and positions of introns and exons within the ceruloplasmin gene.

Exon   Nucleotide Position   Length (bp)   Intron   Nucleotide Position   Length (bp)*
1        -12 -   146            158          A         159 -  2459           2300
2       2460 -  2708            248          B        2709 -  4759           2050
3       4760 -  4973            213          C        4974 -  5774            800
4       5775 -  5949            174          D        5950 -  7400           1450
5       7401 -  7656            255          E        7657 -  8957           1300
6       8958 -  9130            172          F        9131 - 12231           3100
7      12232 - 12372            140          G       12373 - 14273           1900
8      14274 - 14427            153          H       14428 - 15578           1150
9      15579 - 15791            212          I       15792 - 25192           9400
10     25193 - 25344            151          J       25345 - 26645           1300
11     26646 - 26859            213          K       26860 - 28110           1250
12     28111 - 28319            208          L       28320 - 29920           1600
13     29921 - 30061            140          M       30062 - 31311           1250
14     31312 - 31441            129          N       31442 - >32242           800+

*Sizes of all introns were estimated from restriction enzyme analysis.

B.4 Organization of the 5' End of the Human Ceruloplasmin Gene

B.4.1 Comparison of genomic and cDNA sequence data. Initially, the nucleotide sequence of the λhCP-6 cDNA clone (Figure 8) was compared with overlapping genomic DNA sequence (corresponding to nucleotide residues -288 - +412) derived from sonication of a 3.1 Kbp EcoRI fragment (see Table III). Alignment of the nucleotide sequences is presented in Figure 11. The observed divergence of the genomic sequence from that of the cDNA sequence correlated with the presence of a 3' consensus splice acceptor site at nucleotide position -13 in the 5' untranslated region of the human ceruloplasmin gene. Location of this splice site was subsequently confirmed using nuclease S1 mapping analysis (see below).

B.4.2 Nuclease S1 mapping analysis of exon 1. The presence of a splice acceptor site at -13 was confirmed using the 700 bp genomic clone described above as a hybridization probe for nuclease S1 mapping analysis (see Figure 12).
The observed protected fragment of 158 bp indicates the presence of an intron splice site at -13 bp, since a splice junction had been identified following nucleotide residue 146 in the coding region. This result is in agreement with DNA sequence data (Figure 11), and confirms that the λhCP-6 cDNA clone contains 26 bp corresponding to an exon in the 5' untranslated region.

B.4.3 Southern blot analysis. The location of an exon in the 5' untranslated region (designated L4; see Figures 8 and 10) was determined by Southern blot analysis of an AccI/EcoRI digest of the 3.1 Kbp EcoRI genomic fragment (Figure 13; see above), using λhCP-6 cloned into M13 to generate a hybridization probe. The hybridization and washing conditions used were similar to those employed for oligonucleotide probes (see Section II.B.1). Hybridizing DNA fragments of 1.7 Kbp and 0.85 Kbp were detected, the latter of which contains exon 1 (see Figure 10) based on previous Southern analysis (data not shown). This indicates that the 3.1 Kbp EcoRI genomic fragment contains at least two exons (see Figure 10).

Figure 11. Comparison of the sequence of the λhCP-6 cDNA clone with overlapping genomic DNA sequence derived from the phage clone λWT2. +1 indicates the first nucleotide of the initiator methionine codon. The point of sequence divergence between genomic and cDNA sequences coincides with the location of a 3' splice acceptor site in the genomic sequence (vertical arrow). Sequences corresponding to exon L4 (from the cDNA) or intron LD (from the genomic clone) are indicated.

    <-- Intron LD
    GTCCGCCGCTTTCTCCCTTCGGAAAG | AAGGGGAAAAAAA   GENOMIC
    CACTTCATTTCTTCTCAGGCTCCAAG | AAGGGGAAAAAAA   cDNA
    <-- Exon L4

Figure 12. Nuclease S1 mapping analysis of exon 1. A 700 bp fragment (containing nucleotide residues -288 - +412 of the genomic DNA sequence) was used as a probe for S1 nuclease protection analysis. Following hybridization to 0.5 µg of human liver poly(A)+ RNA and nuclease S1 digestion, S1-resistant products were separated on a denaturing polyacrylamide gel and visualized by autoradiography. A protected band of 158 bp was detected (lane 2), corresponding to exon 1. A HinfI digest of pBR322 was used as markers (lane 1). The sizes of resulting fragments are given in base pairs (bp).

[Figure 12 image: marker positions at 1631, 506/517, 396, 344, 298, 220/221 and 154 bp.]

Figure 13. Southern blot analysis localizing exon L4 in the 5' untranslated region of human ceruloplasmin. AccI/EcoRI (panel A, lane 1) or AccI (panel A, lane 2) digests of the 3.1 Kbp genomic EcoRI fragment containing exon 1 were transferred to nitrocellulose and probed with the λhCP-6 cDNA clone. In addition to the pUC vector band (designated "P"), hybridizing species corresponding to 1.7 Kbp and 0.85 Kbp AccI fragments were detected. Positions of HindIII fragments of phage λ DNA used as size markers are shown. Fragment sizes are given in Kilobase pairs (Kbp). Restriction analysis of the 3.1 Kbp EcoRI (E) genomic fragment using AccI (A) is shown in panel B. The relative positions of exon L4 and exon 1 are indicated; the latter exon is designated by a solid bar. The precise location of exon L4 is uncertain, as indicated by the dashed lines. An arrow identifies the direction of transcription.

B.4.4 Northern blot analysis of the 5' end of the human ceruloplasmin gene.
For further analysis of the 5' end gene organization, EcoRI fragments derived from λWT7 and λWT2 phage clones were subcloned (see Table III) and used as hybridization probes for Northern blot analysis of poly(A)+ RNA. Hybridization to both 3.7 Kb and 4.5 Kb ceruloplasmin-specific transcripts (see Section II.A.4) was detected with all fragments tested (see Fig. 14) except the 1.4 Kbp EcoRI fragment. The hybridization signal obtained using the 420 bp EcoRI probe was observed to be very weak following a 5-day exposure, suggesting the presence of a short exon sequence within this fragment. For more precise assignment of mRNA-encoding sequences in the 5' untranslated segment, the 4.0 Kbp EcoRI fragment was subcloned further utilizing an internal HindIII site (see Table III and Figure 10), and the resulting two fragments were subsequently used as hybridization probes for Northern blot analysis. As shown in Figure 14A, hybridization was detected with only the 1.3 Kbp HindIII/EcoRI fragment.

Figure 14. Northern blot analysis of the 5' end of the ceruloplasmin gene. Samples of human liver poly(A)+ RNA (10 µg each) were electrophoresed in a denaturing agarose-formaldehyde gel and transferred to nitrocellulose. Filters were hybridized with 32P-labelled EcoRI (E) and HindIII/EcoRI (HE) restriction fragments derived from the 5' end of the gene (panel A). Fragment sizes of probes (Kilobases) are shown above each blot; bracketed numbers represent lengths of exposure times in days. Sizes of the two hybridizing RNA species detected are given in Kilobases, based on positions of 32P-labelled HindIII fragments of λ DNA. "C" designates a positive control, since this 3.1 Kbp genomic fragment contains exon 1 (see text for details). Blots are shown sequentially in a 5' to 3' direction from left to right. For clarity, a restriction map in EcoRI (E) and HindIII (H) corresponding to this region is shown in panel B; fragment sizes are given in Kilobase pairs (Kbp). The bracketed E identifies an EcoRI linker. An arrow indicates the direction of transcription.

Figure 15. RNA dot blot analysis of the 5' untranslated region of the ceruloplasmin gene. Human liver poly(A)+ RNA (7.5 µg) was spotted onto nitrocellulose and probed with M13 clones derived from appropriate restriction fragments (see Table III). A partial map of the region is shown; abbreviations used are E - EcoRI; S - Sau3A; F - HinfI; D - DdeI; H - HindIII. The bracketed E represents an EcoRI linker. The extent and direction of sequence analysis of the M13 clones used for dot blot hybridizations are shown by thin arrows below the restriction map. The thick arrow designates the direction of transcription. Corresponding exons L1, L2, and L3 identified by dot blot analysis are shown. The scale represents 0.1 Kilobase pairs (Kbp).

B.4.5 RNA dot blot analysis. In order to localize further the exons within the 5' untranslated region of the human ceruloplasmin gene, EcoRI genomic fragments in pUC13 were subcloned into M13 vectors (see Table III). M13 clones were analyzed initially by DNA sequence analysis and were subsequently used to generate double-stranded probes for dot blot hybridization analysis (Figure 15).
The results indicate the presence of at least 4 exons in the 5' untranslated sequence of the gene, with a minimum of 2 separate exons within the 1.5 Kbp EcoRI fragment (designated L1 and L2; see Figure 10), one exon within the 1.3 Kbp HindIII/EcoRI fragment (designated L3; see Figure 10) and one exon (L4) determined previously within the 1.7 Kbp AccI fragment (see above).

C. ISOLATION AND COMPLETE CHARACTERIZATION OF A PSEUDOGENE FOR HUMAN CERULOPLASMIN

C.1 Isolation of Genomic DNA Clones Containing the Human Ceruloplasmin Pseudogene

As described previously (see Section III.B.1), one ceruloplasmin pseudogene clone (designated 3ψ10) was obtained from the screening of a human genomic library constructed in the Charon 4A vector, using the phCP-1 cDNA clone as a hybridization probe. The identity of this initial clone as a pseudogene for human ceruloplasmin was based on restriction endonuclease mapping and Southern blot analysis (results not shown). This was subsequently confirmed using DNA sequence analysis and nuclease S1 analysis (see following sections). Three additional pseudogene clones (designated 3ψ9, 3ψ21, and 3ψ29) were isolated from a human genomic library in the phage vector λ EMBL 3, using both phCP-1 and λhCP-1 cDNA clones as hybridization probes. Restriction enzyme mapping analysis indicated that the latter three phage clones overlapped the 3ψ10 clone (Figure 16). A total of nearly 21 Kbp of contiguous genomic DNA is represented by these four recombinant clones, with the human ceruloplasmin pseudogene mapping to approximately 1.7 Kbp within this region (see Figure 16).

Figure 16. Partial Restriction Map and Sequencing Strategy for the Human Ceruloplasmin Pseudogene. The complete restriction map for EcoRI (E) is shown. The lines above the restriction map represent the four overlapping phage clones 3ψ9, 3ψ21, 3ψ29 (isolated from the Geddes library), and 3ψ10 (from the Maniatis library). The solid bar represents the region of the ceruloplasmin pseudogene that is homologous to the ceruloplasmin cDNA sequence. Solid circles at the ends of EMBL 3 phage clones represent Sau3A sites, while solid squares at the end of the 3ψ10 phage clone isolated from the Maniatis library represent EcoRI linkers. The locations of BamHI (B) and HindIII (H) sites used for subcloning and sequence analysis are shown within the 3ψ10 clone; the remainder of the map is incomplete for these two enzymes. The region containing the BamHI/HindIII fragment has been expanded below the restriction map. Arrows below this region indicate the direction and extent of nucleotide sequence obtained from various M13 clones. Probe A was used as a hybridization probe for chromosomal localization studies (see text for details). The scale represents 1 Kbp.

C.2 DNA Sequence Analysis of the Human Ceruloplasmin Pseudogene

The nucleotide sequence of the ceruloplasmin pseudogene (Figure 17) was determined by using the strategy outlined in Figure 16. Each nucleotide was determined an average of 3.4 times, and 54% of the sequence was obtained on both strands. Nucleotides 53 - 1644 of the pseudogene sequence are very similar to residues 1502 - 3318 of the ceruloplasmin cDNA (nearly 97% identical; see Figure 17), extending through the 123 bp segment corresponding to the 3' untranslated region of the phCP-1 cDNA clone (see Section III.A). The pseudogene, however, is not characterized by a poly(A) tract at the expected polyadenylation site.
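A percent-identity figure such as the one quoted above depends on how gapped columns are counted. The sketch below shows one possible convention, in which only columns with a nucleotide in both sequences are compared. The thesis does not state which convention was used, so this choice is an assumption. The two short sequences are hypothetical toy data, not taken from Figure 17.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity over ungapped columns of a pairwise alignment.

    ASSUMPTION: columns containing a gap in either sequence are skipped
    rather than counted as mismatches.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # skip insertion/deletion columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Hypothetical toy alignment: one mismatch and one single-base gap.
TOY_PSEUDOGENE = "ACGTTCG-AC"
TOY_CDNA = "ACGTACGTAC"

if __name__ == "__main__":
    print(round(percent_identity(TOY_PSEUDOGENE, TOY_CDNA), 1))
```

Counting gapped columns as mismatches instead would lower the value, which matters when the alignment contains long deletions.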
DNA sequence analysis in a 3' direction revealed the presence of an unusual 54 bp segment, composed primarily of repeated CT dinucleotides (nucleotide residues 1867 - 1920; see Figure 17). Following the sequence corresponding to the 3' untranslated region of phCP-1, the next 43 bp of pseudogene sequence (nucleotides 1645 - 1687) correspond to the 3' untranslated sequence of a ceruloplasmin cDNA clone (designated Cp-1) characterized by Yang et al. (1986). This clone differs from phCP-1 in that the 3' untranslated region extends for an additional 120 bp prior to the point of poly(A) addition. Following this 43 bp segment, the remainder of the pseudogene sequence prior to the poly(CT) segment shares little similarity with the 3' untranslated region from the cDNA clone described by Yang et al. (1986).

Figure 17. Nucleotide sequence of the human ceruloplasmin pseudogene and comparison with the corresponding region of the ceruloplasmin cDNA sequence. The pseudogene sequence was determined by analysis of the overlapping clones shown in Figure 16. I/E denotes positions of intron-exon boundaries in the wild-type gene (see Section III.B). The position of the AGCT insertion that results in a frameshift mutation is enclosed in a box. The sizes and positions of deletions (Δ) relative to the ceruloplasmin cDNA sequence are also shown; the region of the cDNA corresponding to the 213 bp deletion has been omitted. The places (5' and 3') where the pseudogene sequence diverges from the wild-type sequence are indicated by arrowheads (see text for details). The cDNA sequence represents nucleotides 1502 - 1864 and 2078 - 3318 of phCP-1. Asterisks indicate identical nucleotides in corresponding positions in the two sequences. Dashes were introduced at points of insertion or deletion in order to maximize homology.
[Figure 17 sequence panel: alignment of the pseudogene sequence (nucleotides 1 - 1920) with nucleotides 1502 - 1864 and 2078 - 3318 of the phCP-1 cDNA. Annotated features: intron-exon boundaries (I/E), the boxed AGCT insertion, the 213 bp and 17 bp deletions (Δ), and the 5' and 3' points of divergence.]

The 5' end of the pseudogene sequence (nucleotide residues 10 - 52; see Figure 17) corresponds to the sequence of the 3' end of an intron present in the wild-type gene (data not shown). A consensus 3' splice acceptor sequence (Breathnach and Chambon, 1981; Cech, 1983) is located immediately prior to nucleotide residue 53. This splice junction is in the corresponding position to a splice site present in the wild-type gene (see Table V). On the basis of DNA sequence analysis, nucleotide residue 9 marks the point of divergence from the wild-type gene.
Southern blot analysis confirms that the 2.6 Kbp EcoRI fragment located directly 5' to the 0.8 Kbp EcoRI fragment (see Figure 16) does not hybridize to the corresponding area of the wild-type gene (results not shown). In contrast to most processed pseudogenes characterized to date (Vanin, 1985), the points of divergence at the ceruloplasmin pseudogene are not flanked by short direct repeats.

A comparison of the nucleotide sequences of the pseudogene and ceruloplasmin cDNA is also shown in Figure 17. A deletion of 213 bp was observed in the pseudogene sequence, corresponding to nucleotides 1865 - 2077 of the human ceruloplasmin cDNA sequence. Prior to this deletion, the pseudogene sequence contains an open reading frame corresponding to amino acid residues 483 - 602 of the ceruloplasmin coding sequence. Occasional base changes within this open reading frame result in the occurrence of amino acid substitutions (Figure 17). The 213 bp deletion in the pseudogene sequence maintains this open reading frame, which then continues for an additional 9 amino acid residues. At this point, an insertion of 4 duplicated nucleotide residues (AGCT) (nucleotides 445 - 448; see Figure 17) causes a frameshift mutation, such that a TGA termination codon occurs immediately following this insertion. The remainder of the pseudogene sequence is similar to the phCP-1 cDNA clone and the cDNA clones reported by Yang et al. (1986) but contains a number of base substitutions compared with the cDNA sequence. There is also a small deletion of 17 bp corresponding to nucleotide residues 2942 - 2958 of the cDNA (see Figure 17). However, this deletion does not result in the resumption of an open reading frame in the pseudogene sequence.
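The reading-frame consequences described above follow from simple arithmetic on the lengths of the insertion and deletions, taken modulo 3. The sketch below illustrates this. The three length changes are taken from the text, while the function name and the signed-length representation are illustrative choices. It tracks only the frame offset relative to the cDNA and does not model the TGA codon itself.

```python
def frame_offset(length_changes):
    """Net reading-frame offset (0, 1 or 2) after a series of indels.

    Each entry is a signed length in nucleotides: negative for a deletion,
    positive for an insertion. An offset of 0 means the original frame
    is preserved.
    """
    return sum(length_changes) % 3


DELETION_213 = -213   # cDNA nucleotides 1865 - 2077 absent from the pseudogene
INSERTION_AGCT = 4    # duplicated AGCT at pseudogene nucleotides 445 - 448
DELETION_17 = -17     # cDNA nucleotides 2942 - 2958 absent from the pseudogene

if __name__ == "__main__":
    # 213 is a multiple of 3, so the open reading frame is maintained.
    print(frame_offset([DELETION_213]))
    # The 4-nucleotide insertion shifts the frame.
    print(frame_offset([DELETION_213, INSERTION_AGCT]))
    # The 17 bp deletion does not bring the net change back to a multiple of 3.
    print(frame_offset([DELETION_213, INSERTION_AGCT, DELETION_17]))
```

Only the first case gives an offset of zero, consistent with the statement that the 17 bp deletion does not restore the open reading frame.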
C.3 Nuclease S1 Analysis of the Human Ceruloplasmin Pseudogene

The presence of the 213 bp deletion in the pseudogene sequence as compared to the cDNA sequence was utilized to test for the presence of pseudogene-specific transcripts. A single-stranded DNA fragment (286 nucleotides in length) derived from the human ceruloplasmin cDNA sequence was used as a hybridization probe for nuclease S1 analysis. This probe corresponds to a region spanning nucleotide residues 1927 - 2213 of the cDNA sequence, thereby containing 150 bp within the deleted region identified in the pseudogene sequence (see Figure 18B). Protection of human liver poly(A)+ RNA with this probe resulted in a single protected band of 286 bp, corresponding to the wild-type transcript (Figure 18A). Additionally, a band corresponding to the full-length probe was observed (Figure 18A) which contains M13 sequences in addition to ceruloplasmin sequences. If the pseudogene was transcribed in liver, and assuming that nuclease S1 cleaves all single-base mismatches between the probe and the putative transcript, protected DNA fragments of 47 bp and 55 bp would be expected (corresponding to nucleotides 453 - 499 and 501 - 555 respectively; see Figure 17). However, no protected fragment of a smaller size, which would correspond to an RNA species containing the 213 bp deletion, was observed at either of the RNA concentrations used (Figure 18A).

Figure 18. Nuclease S1 transcript analysis for the ceruloplasmin pseudogene. Panel b shows the location of the 286 bp probe used for S1 nuclease protection analysis relative to the 213 bp deletion observed in the pseudogene sequence. Part of this probe (150 bp) is within the deleted region (corresponding to nucleotides 1865 - 2077 of the cDNA), while 136 bp of the probe are 3' to the deletion. Following hybridization to 1 µg (lane A) or 0.35 µg (lane B) of poly(A)+ mRNA and nuclease S1 digestion, nuclease-resistant products were separated on a denaturing polyacrylamide gel and visualized by autoradiography (panel a). A band corresponding to the size of the undigested full-length probe (FLP) was observed, as well as a protected band of 286 bp corresponding to the wild-type transcript. A HinfI digest of pBR322 was used for markers. The sizes of resulting fragments are given in base pairs.

[Figure 18 image. Panel a: lanes A and B, markers at 506/517, 396, 344, 298 and 220/221 bp, bands labelled FLP and 286 bp. Panel b: the 286 bp probe (150 bp + 136 bp) drawn against the 213 bp deletion (cDNA nucleotides 1865 - 2077), with the wild-type transcript and the putative pseudogene transcript.]

Figure 19. Chromosome Mapping of the Human Ceruloplasmin Pseudogene Using Somatic Cell Hybrid Analysis. A 1.1 Kbp pseudogene-specific probe (Probe A; see Figure 16) was hybridized to EcoRI-digested DNA from human-hamster hybrid cell lines (lanes 1 - 22). Numbering of the cell lines corresponds to that shown in Table VIII. EcoRI-digested hamster DNA (lane H) and human placental DNA (lane HP) were included as controls. The positions of 32P-labelled HindIII-EcoRI fragments of λ DNA used as size markers are shown. Fragment sizes are given in Kilobase pairs (Kbp).

Table VIII. Segregation of the Human Ceruloplasmin Pseudogene with Human Chromosomes in Human-Hamster Somatic Hybrids

Column A: response to Probe A (b). Columns 1 - 22, X and T: identifiable, intact human chromosomes (a).

No.  Cell line  A   1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22  X  T
  1  41.06      -   -  -  +  -  +  -  -  -  -  +  -  -  +  +  +  +  +  +  -  +  -  -  +  -
  2  45.01      -   -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  +  -  -
  3  45.43      -   +  -  -  -  -  -  -  -  +  +  +  -  -  -  -  +  +  +  +  +  +  -  +  -
  4  76.14      -   -  -  +  -  +  +  -  -  -  -  +  -  +  +  -  +  +  +  -  +  +  -  +  +
  5  76.31      -   -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  +
  6  76.33      +   +  -  +  +  +  -  -  +  -  +  -  -  +  -  +  +  -  +  +  -  +  -  +  -
  7  79.05b     -   -  -  +  -  -  -  -  -  -  -  -  -  -  +  +  +  +  +  +  -  +  -  +  -
  8  80.05d     -   -  -  -  -  -  +  +  -  -  -  -  +  -  -  -  +  -  -  -  +  -  -  +  +
  9  80.14c     -   -  -  -  +  +  +  +  -  -  +  -  -  -  -  -  -  -  -  -  +  +  -  +  -
 10  80.17a     -   -  -  -  +  +  -  -  -  +  -  +  -  -  -  +  +  -  +  +  +  +  +  +  -
 11  82.82a     -   -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  +  -
 12  85.16a     +   -  +  +  +  +  -  -  +  -  +  +  -  +  -  +  -  +  -  -  -  +  -  +  -
 13  89.27a     -   -  -  -  -  +  -  -  -  +  -  -  +  -  -  +  +  +  +  -  +  -  +  -  -
 14  100.02b    -   -  -  +  -  -  -  -  -  -  -  -  -  -  +  -  +  -  +  -  -  +  -  -  -
 15  102.05b    -   +  -  -  -  +  +  -  -  -  -  -  -  +  -  -  +  +  -  -  +  -  +  -  -
 16  103.04     -   -  -  -  +  +  -  -  -  -  -  -  +  -  -  +  +  -  +  -  +  +  -  -  -
 17  111.02a    -   +  -  -  -  +  +  -  -  -  -  -  +  +  +  +  -  +  -  +  -  +  +  -  -
 18  112.10a    +   +  -  -  +  +  -  -  +  -  -  +  +  +  -  +  +  -  -  +  +  -  -  +  -
 19  120.33     -   -  -  -  -  +  +  -  -  -  -  -  +  +  -  -  +  +  -  +  -  +  +  +  -
 20  120.35     +   -  -  +  -  +  +  -  +  -  -  -  +  +  +  -  -  +  +  +  -  -  +  +  -
 21  133.05     +   -  -  -  -  -  +  -  +  -  -  -  -  -  -  -  -  -  -  -  +  -  +  +  -
 22  134.02a    +   +  -  -  +  +  -  +  +  -  +  +  -  +  -  +  +  +  +  +  +  +  -  +  -
% Discordancy      27 18 32 23 45 45 32  0 41 27 27 41 27 45 36 64 50 50 32 55 55 45 41 41

(a) Presence (+) or absence (-) of human chromosomes as determined by cytogenetic analysis and confirmed by isozyme analysis (Donald et al., 1983; Riddell et al., 1985).
(b) Presence (+) or absence (-) of human EcoRI-digested sequences homologous to the human ceruloplasmin pseudogene probe (Probe A).

C.4 Chromosomal Localization of the Human Ceruloplasmin Pseudogene

The chromosomal location of the ceruloplasmin pseudogene was determined by somatic cell hybrid analysis.
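The scoring that underlies this type of analysis, and the per cent discordancy row of Table VIII, can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function name `percent_discordancy` and the five-line data set are hypothetical and are not taken from Table VIII. A hybrid line is counted as discordant for a given chromosome when the probe signal is present and the chromosome is absent, or vice versa.

```python
def percent_discordancy(probe_calls, chromosome_calls):
    """Percentage of hybrid lines in which probe and chromosome calls disagree."""
    if len(probe_calls) != len(chromosome_calls):
        raise ValueError("both call lists must cover the same hybrid lines")
    mismatches = sum(p != c for p, c in zip(probe_calls, chromosome_calls))
    return 100.0 * mismatches / len(probe_calls)


# Hypothetical presence (True) / absence (False) calls for five hybrid lines.
probe = [True, False, True, False, False]
chromosomes = {
    "chromosome A": [True, False, True, False, False],   # always agrees with the probe
    "chromosome B": [True, True, False, False, False],   # disagrees in two lines
}

scores = {name: percent_discordancy(probe, calls) for name, calls in chromosomes.items()}
best = min(scores, key=scores.get)  # candidate location: lowest discordancy
print(scores, best)
```

In an assignment of this kind, the chromosome with 0% discordancy across all hybrid lines is the candidate location of the probe sequence.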
For this analysis, an isolated 1.1 Kbp EcoRI restriction fragment (Probe A, Figure 16), located 3' to the pseudogene, was used as a hybridization probe. Southern blot analysis of EcoRI-digested human-hamster somatic cell hybrid DNA (Figure 19) showed that all 22 hybrid cell lines tested (lanes 1 - 22) were concordant for the presence or absence of the 1.1 Kbp band and chromosome 8 (Table VIII). Control lanes containing either EcoRI-digested human placental DNA (lane HP, Figure 19) or hamster DNA (lane H, Figure 19) were included in the Southern blot analysis. No cross-hybridization was detected in the lane containing hamster DNA, while the expected 1.1 Kbp EcoRI band corresponding to Probe A was observed in the human placental DNA.

D. GENOMIC SOUTHERN BLOT ANALYSIS OF THE HUMAN CERULOPLASMIN GENE

Initially, human genomic DNA isolated from liver tissue was digested with several restriction endonucleases, and the resulting fragments were separated by agarose gel electrophoresis. Following transfer to nitrocellulose, DNA fragments were analyzed with 32P-labelled hybridization probes derived from the previously characterized human ceruloplasmin cDNA clones phCP-1 and λhCP-1 (see Section III.A). Using the λhCP-1 hybridization probe (Figure 20B), a large single band (> 23 Kbp) was detected with a BamHI digest; this is predicted based on restriction endonuclease mapping of the wild-type gene (see Figure 10) and suggests that a single gene is being detected with this probe. Other hybridizing fragments present in EcoRI (15 Kbp, 4.6 Kbp, 3.1 Kbp, 1.8 Kbp), PstI (9.5 Kbp) and HindIII (5.7 Kbp, 4.5 Kbp) genomic digests all correspond to fragments identified in the partial restriction enzyme map of the wild-type gene (see Figure 10).

Using the phCP-1 cDNA clone as a hybridization probe for EcoRI, BamHI, HindIII and PstI human genomic digests (Figure 20A), the resulting pattern is more complex. Multiple bands were detected with each restriction endonuclease used. EcoRI bands which could not be identified in the restriction map of the wild-type gene (see Figure 10) were assigned subsequently to either a pseudogene for human ceruloplasmin (corresponding to nucleotide residues 1502 - 3318 of the cDNA sequence; see Section III.C) or to the 3' end of the wild-type gene (see below). In the former case, the 0.45 Kbp, 0.8 Kbp, and 1.5 Kbp EcoRI fragments (indicated by arrows in Figure 21) correspond to the pseudogene locus (see Section III.C and Figure 16). The remainder of the EcoRI fragments in Figure 20A, as well as all of those detected using the λhCP-1 hybridization probe (Figure 20B), have been assigned unambiguously to the wild-type ceruloplasmin gene (see Figures 10 and 22).

Figure 20. Genomic Southern Blot Analysis of the Human Ceruloplasmin Gene. Human liver DNA (10 µg) was digested with various restriction enzymes (BamHI, EcoRI, PstI, and HindIII) and electrophoresed on 1.0% agarose gels. Following transfer to nitrocellulose, the blot was hybridized using 32P-labelled phCP-1 (panel A) or λhCP-1 (panel B) as probes. Positions of 32P-labelled HindIII fragments of phage λ DNA used as size markers are given in Kilobase pairs (Kbp).
The 3' end of the ceruloplasmin gene was characterized by Southern blot analysis of EcoRI-digested human genomic DNA (Figure 22A) using two specific hybridization probes derived from the cDNA sequence (Figure 22B). Two bands were detected with each hybridization probe, one of which in each case corresponds to the pseudogene locus (see Figure 21). The remaining two fragments (4.6 Kbp and 2.4 Kbp) were thus unequivocally assigned to the wild-type gene (see Figure 10), with the 2.4 Kbp EcoRI hybridizing species representing the 3'-most fragment.

Figure 21. Genomic Southern Analysis of the Human Ceruloplasmin Pseudogene and Related Sequences. EcoRI-digested human liver genomic DNA was hybridized with either the phCP-1 (panel A) or λhCP-1 (panel B) ceruloplasmin cDNA clones, under conditions of high stringency. Arrows indicate bands that have been assigned to the human ceruloplasmin pseudogene (see Figure 16). The remaining bands correspond to the wild-type locus (see text for details). Positions of 32P-labelled HindIII fragments of phage λ DNA used as size markers are shown. Fragment sizes are given in Kilobase pairs (Kbp).

Figure 22. Genomic Southern Blot Analysis of the 3' End of the Human Ceruloplasmin Gene. EcoRI-digested human lymphocyte genomic DNA was hybridized using either Probe 1 or Probe 2 (panel A), derived from the 3' end of the human ceruloplasmin cDNA sequence (panel B). Arrows indicate bands that have been assigned to the ceruloplasmin pseudogene locus (see text for details); the remaining bands correspond to the 3' end of the wild-type ceruloplasmin gene. Positions of 32P-labelled HindIII fragments of phage λ DNA used as size markers are shown. Fragment sizes are given in Kilobase pairs (Kbp).

[Figure 22 image. Panel A: size markers at 23.7, 9.5, 6.7, 4.3, 2.2, 2.0 and 0.59 Kbp. Panel B: schematic of the 3' end of the cDNA showing positions 2417, 2851 and 3321, sites marked E, the poly(A) tract, and Probes 1 and 2.]

IV. DISCUSSION

A. Characterization of the Human Preceruloplasmin cDNA

A.1 DNA Sequence Analysis of the Human Preceruloplasmin cDNA

Initially, a human ceruloplasmin cDNA clone of 2.7 Kbp (phCP-1) was isolated from a cDNA library prepared from human liver mRNA, by using mixtures of synthetic oligonucleotides as hybridization probes. This cDNA clone was found to encode amino acid residues 202 - 1046 of the ceruloplasmin protein sequence, followed by 123 bp of 3' untranslated sequence (preceded by a TGA stop codon) and terminating with a poly(A) tract. Two randomly-primed human liver cDNA libraries were subsequently screened using appropriate 32P-labelled restriction fragments. Two clones of 1.2 Kbp and 1.0 Kbp (designated λhCP-1 and λhCP-6, respectively) were identified that together contained cDNA sequence encoding amino acid residues 1 - 380 of the mature protein preceded by a putative 19 amino acid long signal peptide and 38 bp of 5' untranslated sequence.

Nucleotides 58 - 3195 of the cDNA sequence code for the plasma form of ceruloplasmin; the predicted amino acid sequence agrees completely with that reported previously by Takahashi et al. (1984), who used protein chemistry techniques. Overall, the base composition of ceruloplasmin mRNA is somewhat A + U rich (33% A, 26% U, 22% G, 19% C), reflective of the coding region in which 60% of the codons end in either A or U. This observation is in contrast to the codon usage in other liver mRNAs, such as those for prothrombin (MacGillivray and Davie, 1984), factor X (Fung et al., 1985), or factor XII (Cool et al., 1985), in which approximately 90% of the codons end in G or C. One codon is not used in the coding region of the ceruloplasmin mRNA (CGC for arginine) and others are used rarely (2 of 51 alanine residues are encoded by GCG). The 3' untranslated region (spanning nucleotide residues 3199 - 3321) contains a putative polyadenylation signal ATTAAA (Proudfoot and Brownlee, 1976) that is located 14 nucleotides upstream of the poly(A) tail. This polyadenylation signal is observed in 12% of such 3' terminal sequences from vertebrates (Wickens and Stephenson, 1984), and is a variant of the more commonly observed signal AATAAA (Birnstiel et al., 1985).

Nucleotide residues 1 - 57 code for an amino terminal extension of 19 amino acids that is removed prior to the appearance of ceruloplasmin in plasma. A methionine residue occurs at residue -19 in this peptide sequence and may function as the initiator methionine. The leader peptide is rich in hydrophobic amino acids, and thus resembles a typical signal peptide (von Heijne, 1982; Watson, 1984). Such sequences function in the initiation of export of nascent polypeptide chains across the rough endoplasmic reticulum (Blobel et al., 1979). The ceruloplasmin cDNA sequence predicts that an Ala-Lys bond (encoded by nucleotides 55 - 60; see Figure 7) is cleaved during removal of this leader peptide. This is consistent with the specificity of signal peptidase cleavage (von Heijne, 1983) which typically occurs following basic residues that are preceded by small hydrophobic amino acids. This suggests that ceruloplasmin is synthesized in liver as a typical preprotein, containing a signal peptide sequence of at least 19 amino acids.
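The base composition and third-position codon figures quoted earlier in this section are simple tallies over the mRNA sequence. The Python sketch below shows how such tallies might be computed; the function names and the short toy sequence are hypothetical and are not the ceruloplasmin sequence.

```python
from collections import Counter


def base_composition(mrna):
    """Return the percentage of each base (A, C, G, U) in an mRNA string."""
    counts = Counter(mrna)
    return {base: 100.0 * counts[base] / len(mrna) for base in "ACGU"}


def fraction_codons_ending_in(coding, bases):
    """Fraction of complete codons whose third position is one of `bases`."""
    usable = len(coding) - len(coding) % 3  # ignore any trailing partial codon
    codons = [coding[i:i + 3] for i in range(0, usable, 3)]
    return sum(codon[2] in bases for codon in codons) / len(codons)


# Hypothetical 6-codon reading frame used only to exercise the functions.
toy = "AUGAAAGAUGGCUUAUGA"

composition = base_composition(toy)
au_third = fraction_codons_ending_in(toy, "AU")
print(composition, round(au_third, 2))
```

Applied to a full coding sequence, the second function gives the proportion of codons ending in A or U, which is the quantity contrasted above with the G or C preference of other liver mRNAs.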
Mercer and Grimes (1986) have reported the characterization of a partial human ceruloplasmin cDNA clone that was isolated from a human liver cDNA library constructed in the phage vector λgt10. DNA sequence analysis showed that the clone extended from the amino terminal leader sequence to 114 amino acids short of the carboxy-terminus. The proposed signal peptide sequence obtained from DNA sequence analysis of λhCP-1 (see Figure 7) was found to agree completely with that reported by Mercer and Grimes (1986). However, the 38 bp of 5' untranslated sequence determined by analysis of both λhCP-2 and λhCP-6 cDNA clones (see Figure 8) differs completely from the 14 bp of 5' untranslated sequence determined by Mercer and Grimes (1986). The sequence divergence occurs immediately 5' to the proposed initiator methionine residue and does not appear to occur as the result of a frameshift mutation. The reason for this discrepancy is unclear at present, since the nucleotide sequences of λhCP-2 and λhCP-6 are in complete agreement with that determined from the analysis of the corresponding genomic region (see Figure 11). The possibility that the observed difference is the result of the utilization of alternative splicing patterns is unlikely, since the point of sequence divergence (i.e. immediately 5' to the initiator methionine) between the two cDNA sequences does not correspond to the location of an intron in the wild-type gene; an intron/exon junction was identified at nucleotide position -13 on the basis of comparative DNA sequence analysis and nuclease S1 mapping (see Section III.B.4.2). More probably, the sequence reported by Mercer and Grimes (1986) represents a cloning artifact.
The nucleotide sequences of two partial cDNA clones for human ceruloplasmin have been reported independently by Yang et al. (1986). These clones (isolated from the Stuart Orkin human liver cDNA library) are nearly identical to the previously described 2.7 Kbp phCP-1 clone (see Section III.A). However, one of the clones analyzed by Yang et al. (1986) (designated CP-1) contains an insertion of four amino acid residues (occurring between residues 1041 and 1042 of the ceruloplasmin protein sequence) as well as an additional 121 bp (extending 3' to phCP-1) in the 3' untranslated region. The commonly used polyadenylation signal AATAAA is located 16 bp upstream from the poly(A) tract in the CP-1 clone. The origin of these different cDNA clones is uncertain at present, although these data clearly suggest some heterogeneity in ceruloplasmin transcripts with respect to the point of polyadenylation.

Although the corresponding region of the wild-type ceruloplasmin gene has not yet been characterized (see Section III.B), several explanations for the observed amino acid insertion in the CP-1 clone described by Yang et al. (1986) can be postulated. Examination of the factor VIII genomic sequence (Gitschier et al., 1986) reveals the presence of an intron between exons 19 and 20 (i.e., corresponding to the ceruloplasmin cDNA sequence encoding amino acid residues 1041 - 1042, Vehar et al., 1984). Additional amino acids present in the CP-1 clone may have arisen from alternative splicing of an intron in some liver cells during RNA processing. Other plausible explanations could include the presence of two ceruloplasmin alleles in the donor's genome, or the existence of two ceruloplasmin loci.
The latter assumption is somewhat doubtful, since a second wild-type gene was not detectable by either genomic Southern blot analysis (see Section III.D) or chromosome mapping (Yang et al., 1986; Royle et al., 1987).

A.2 Internal Homology Within the Ceruloplasmin cDNA Sequence

Extensive amino acid sequence identity has been reported previously within the three repeated A domains of ceruloplasmin (Takahashi et al., 1984). As expected, this homology extends to the nucleotide sequences of the repeated units when they are aligned. The three domains of ceruloplasmin exhibit approximately 46 - 51% identity when compared pairwise (Yang et al., 1986; Koschinsky et al., 1986). Within these three domains, specific regions show higher levels of sequence conservation (Yang et al., 1986). A comparison of the nucleotide sequence encoding amino acid residues 204 - 276 with that encoding residues 903 - 975 reveals 66.2% identity, while residues 204 - 276 compared with residues 565 - 637 show 61.2% identity, and residues 565 - 637 aligned with residues 903 - 975 show 59.4% nucleotide identity (Yang et al., 1986). Interestingly, a similar region from the derived amino acid sequence of a partial rat ceruloplasmin cDNA clone (Aldred et al., 1987) (see Section I.D) shows remarkable similarity with the published human amino acid sequence from residues 194 - 276 (75% identity), with almost complete conservation (98%) from residues 227 - 276, except for a conservative amino acid change at residue 243. This highly conserved region, present in each of the triplicated units in the human ceruloplasmin molecule, may represent a region subjected to functional constraint. Interestingly, these regions are not coincident with proposed copper-binding centers in ceruloplasmin (see Section I.E).
These highly conserved, structurally-related areas may thus be indicative of the evolutionary maintenance of an alternative ceruloplasmin function (Yang et al., 1986).

A.3 Analysis of the Human Ceruloplasmin Transcript

Northern blot analysis of the human ceruloplasmin transcript revealed the presence of two hybridizing species in human liver poly(A)+ RNA (3.7 Kb ± 200 nucleotides and 4.5 Kb ± 250 nucleotides), using the phCP-1 cDNA clone as a hybridization probe. In contrast, total RNA isolated from the human hepatoma cell line HepG2 (Knowles et al., 1980) (also probed with phCP-1) contained only the smaller mRNA species. Ceruloplasmin is synthesized and secreted by HepG2 cells (Knowles et al., 1980), suggesting that the 3.7 Kb RNA species encodes a functional ceruloplasmin transcript. The ceruloplasmin cDNA clones previously described in this study contain a total of 3359 bp of sequence in addition to a poly(A) tract, which is usually 180 - 200 bp long in eukaryotic mRNAs (Perry, 1976). On this basis, both species appear to be larger than that required to encode the preceruloplasmin mRNA. These transcripts may reflect the presence of long 5' or 3' untranslated segments which are differentially processed, resulting in the generation of two RNA species. The possibility of alternative splicing patterns occurring at the 3' end of the ceruloplasmin transcript has been suggested by Yang et al. (1986) to explain an insertion of 4 amino acids in the protein coding sequence (see Section IV.A.1).
The extra sequence identified in the latter clone when compared to phCP-1 (including the additional 121 bp observed in the 3' untranslated segment) does not, however, account for the size difference (approximately 800 nucleotides) observed between the two hybridizing RNA species detected using Northern blot analysis. It is still possible that an as yet unidentified cDNA clone may contain additional sequence in the 3' untranslated segment, thereby differing from previously characterized cDNA clones with respect to the point of polyadenylation. This proposal is feasible, since heterogeneity with respect to the position of polyadenylation in the phCP-1 clone compared to the cDNA clone described by Yang et al. (1986) has already been demonstrated. However, based on the results presented in this study, it is most likely that the two different ceruloplasmin mRNA species arise from differential processing and/or promoter function at the 5' end of the gene (see Section IV.B.5).

In contrast to the results described above using human liver poly(A)+ RNA, it is interesting to note that a single ceruloplasmin transcript of 3.8 kb is detected using rat liver mRNA, and that comparable results have been obtained using RNA derived from rat testis, yolk sac, placenta, and choroid plexus (Aldred et al., 1987). Similarly, only one ceruloplasmin mRNA species is detected using bovine poly(A)+ RNA. This reinforces the proposal that the 3.7 kb human ceruloplasmin RNA species is likely sufficient to encode a functional ceruloplasmin protein.

B. CHARACTERIZATION OF THE WILD-TYPE HUMAN CERULOPLASMIN GENE

B.1 Ceruloplasmin Gene Organization Corresponding to the Coding Sequence

Seven recombinant phage containing wild-type human ceruloplasmin genomic sequences were isolated from human genomic λ phage libraries. From restriction endonuclease mapping analysis coupled with Southern blot analysis, the ceruloplasmin gene was found to span at least 50 kbp. The location of 14 exons was determined (corresponding to nucleotides 1 - 2555 of the cDNA sequence); the 3' end of the gene (i.e. the region corresponding to nucleotide residues 2556 - 3321 of the cDNA sequence) was not represented in the phage clones obtained from library screening. However, genomic Southern blot analysis allowed the identification of two genomic EcoRI fragments (4.6 kbp and 2.4 kbp) encompassing this region.

B.2 DNA Sequence Analysis of the Wild-Type Ceruloplasmin Gene

All splice donor/acceptor sequences conform to the GT..AG rule for nucleotides immediately flanking exon boundaries (Breathnach and Chambon, 1981; Cech, 1983). Further flanking sequences are in general accordance with compiled nucleotide frequencies (Breathnach and Chambon, 1981; Mount, 1982). The sequences of the exons were found to be identical to the previously determined cDNA sequence. The sizes of characterized exons range from 129 bp - 255 bp, with a calculated median size of 183 bp. This is consistent with published exon size distributions (Naora and Deacon, 1982). As found in other eukaryotic genes, intron sizes in the ceruloplasmin gene are highly variable (0.8 kbp - approximately 9.5 kbp).
The total size of introns determined to date is approximately 30 kbp, which is already in excess of that predicted by Naora and Deacon (1982), according to the observed variability of total intron size as a function of total exon length.

B.3 Intron Positions Within the Triplicated A Domains of Human Ceruloplasmin

As has been described previously (see Section I.A), the entire human ceruloplasmin molecule possesses an internal triplicated structure, which likely represents the product of successive gene duplication events (Dwulet and Putnam, 1981b; Doolittle, 1984; see Section IV.D). If segmental DNA duplication has played a role in the evolution of the ceruloplasmin gene, one may expect the occurrence of intron boundaries between the A domains within the ceruloplasmin gene (Doolittle, 1985). The comparative alignment of intron boundaries within the repeated A units of ceruloplasmin (Figure 23) is facilitated by the relatively high degree of similarity between the related segments.

Figure 23. Intron positions within the three repeated units of human ceruloplasmin. The top, middle and bottom lines correspond to the amino acid sequence of the A1, A2 and A3 triplicated regions respectively. Except for gaps introduced in order to maximize homology, the complete amino acid sequence is given continuously in single letter code. Residues boxed at a given position are identical; non-identical residues in the middle sequence are enclosed in dashed boxes. Triangles indicate determined intron positions. Carbohydrate attachment sites are designated by asterisks. Putative type I copper ligands are identified by solid circles. Homologous pairs of cysteine residues are shaded and vertical arrows identify four sites of tryptic cleavage (modified from Ortel et al., 1984).
As is clearly illustrated in Figure 21, the boundaries of the A1/A2 and A2/A3 repeats are each contained on one exon (exons 6 and 12, respectively), which is inconsistent with a DNA duplication mechanism where the boundaries of repeated units would predictably fall within introns. This is reminiscent of the factor VIII gene organization (see Figure 4, Section I.G.3) in which the A1/A2 and A2/A3 junctions are each contained on one exon. This observation is likely the result of intron loss within these two genes subsequent to their initial formation, such that the present-day gene organizations are no longer reflective of the original gene duplication events.

As is also the case for the factor VIII gene (see Figure 4, Section I.G), only some of the intron boundaries within the A repeats are conserved in the ceruloplasmin gene (see Figure 23). Furthermore, each of the three A domains within the factor VIII and ceruloplasmin genes contains a different number of exons (see Figure 24). The precise alignment of some intron boundaries within the repeated units of ceruloplasmin suggests that these introns may have arisen following the successive gene duplication events, either by loss of one member of a duplicated intron, or by insertion of new introns. These latter two events are essentially indistinguishable.

Figure 24. Positions of introns (t) in the A domains of ceruloplasmin and factor VIII. The numbers identifying the extent of the triplicated A units represent nucleotide residues. Codon phases (i.e., I, II, or 0 indicating the position of each intron relative to the codon triplet) are shown. Ceruloplasmin introns are labelled A - N, while introns in factor VIII are designated a - r. Dashed lines designate areas in which introns have not yet been localized in the ceruloplasmin gene. The relative position of the B domain in factor VIII is indicated.

[Figure 24 schematic: ceruloplasmin repeat units A1, A2 and A3 (nucleotide boundaries 57, 1053, 2133 and 3138), with rows for intron positions in ceruloplasmin (CP) and in factor VIII, and the factor VIII B domain.]

In an analogous situation, α-fetoprotein (Eiferman et al., 1981) and albumin (Sargent et al., 1981) each have a triplicated gene structure (i.e. composed of 3 similar sets of 4 exons). On this basis, it has been proposed that these genes have arisen through the duplication of an ancestral gene approximately 300 - 500 million years ago (Eiferman et al., 1981). Unlike ceruloplasmin or factor VIII, all intron boundaries within the repeated units in α-fetoprotein and albumin are almost precisely conserved. Since it has been observed that in a number of genes, introns separate protein domains (Blake, 1983a, b; Gilbert, 1978; Go, 1981, 1983), the occurrence of introns in α-fetoprotein and albumin at the borders of the three genetic domains may reflect divisions between functional units. Indeed, although active binding sites in mammalian α-fetoproteins are poorly characterized, specific functions have been assigned to the 3 genetic units in albumin (Peters and Reid, 1977) (Domain 1: long-chain fatty acid binding, Domain 2: bilirubin binding, and Domain 3: indole binding). However, for both the ceruloplasmin and factor VIII genes, correlations between exons and functional domains are unclear at present.
Thus, the conservation of some intron boundaries within the repeated units of these latter two genes cannot be correlated with the incidence of functional units.

B.4 Comparison of the Gene Organizations of Ceruloplasmin and Factor VIII

Comparative analysis of the gene organizations of ceruloplasmin and blood coagulation factor VIII (Gitschier et al., 1984) is useful in tracing the evolutionary history of these two proteins. As can be clearly seen, introns A - L in the ceruloplasmin gene correspond closely in position to introns present in the factor VIII gene and, except for intron H, interrupt the reading frame in the same phase (Figure 24). When the protein sequences of ceruloplasmin and factor VIII are aligned to maximize identity, some of these introns interrupt either the same amino acid in the two proteins (e.g., ceruloplasmin introns B, I, J, and K) or amino acids in identical positions (e.g., ceruloplasmin introns C and E) (see Figure 25). This strongly suggests that these introns were present in the progenitor gene, prior to duplication. In two cases (introns A and L in the ceruloplasmin gene), intron positions vary in the factor VIII and ceruloplasmin genes by one amino acid (3 nucleotide residues). These introns also may have been present in the ancestral gene (i.e. prior to duplication), and differ slightly in position in the present-day genes due to insertion or deletion of single amino acids within the coding sequences of either factor VIII or ceruloplasmin.
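The phase comparison used in this section can be made concrete with a short sketch. It assumes the common convention that the phase of an intron is the number of nucleotides of the interrupted codon that lie 5' to the intron (0, I or II), which matches the 0/I/II labels of Figure 24. The function names and the example coordinate are hypothetical and are not taken from the ceruloplasmin or factor VIII sequences. The sketch illustrates why an intron displaced by a whole number of amino acids (a multiple of 3 nucleotides) keeps the same phase, whereas a displacement of 1 or 2 nucleotides changes it.

```python
# Illustrative sketch only: names and coordinates are hypothetical.

PHASE_LABELS = {0: "0", 1: "I", 2: "II"}


def intron_phase(coding_nt_before_intron: int) -> int:
    """Return the phase of an intron from the number of coding
    nucleotides (counted from the first base of an in-frame codon)
    that lie 5' to it.

    0 -> intron falls between two codons
    1 -> intron follows the first base of a codon (phase I)
    2 -> intron follows the second base of a codon (phase II)
    """
    if coding_nt_before_intron < 0:
        raise ValueError("nucleotide count must be non-negative")
    return coding_nt_before_intron % 3


def same_phase(pos_a: int, pos_b: int) -> bool:
    """True when two intron positions interrupt the codon in the same phase."""
    return intron_phase(pos_a) == intron_phase(pos_b)


if __name__ == "__main__":
    reference = 301  # hypothetical position: 100 full codons plus one base
    # Shifts by whole amino acids (3 nt each) leave the phase unchanged.
    for shift_aa in (0, 1, 7, 10, 12):
        shifted = reference + 3 * shift_aa
        print(shift_aa, PHASE_LABELS[intron_phase(shifted)])
    # A shift of a single nucleotide changes the phase.
    print(same_phase(reference, reference + 1))
```

Conservation of phase between two genes is therefore a weaker observation than conservation of position: it is satisfied by any displacement of a whole number of codons.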
It is also possible that these small variations may be the result of intron sliding (see below), as has been postulated to explain similar differences in the organizations of the triosephosphate isomerase genes from Aspergillus nidulans and maize (McKnight et al., 1986).

Introns D, F, and G in the ceruloplasmin gene differ in position by 10, 7, and 12 amino acids, respectively, from the corresponding introns in the factor VIII gene (see Figure 25), although the phases of these three introns in the two genes are the same (see Figure 24). For introns D and G, a model involving intron sliding seems unlikely, since no corresponding insertions or deletions are observed at these positions in the two genes (Figure 25). This suggests that independent intron insertion has occurred at these locations in the ceruloplasmin and factor VIII genes and that these introns have coincidentally interrupted amino acids in the same codon phase. Alternatively, introns may have originally been present in identical positions in factor VIII and ceruloplasmin, and may have been subsequently lost. This model, however, dictates the presence of very small exons (10 and 7 bp, respectively) in the ancestral gene, thereby making the former theory more attractive.

Figure 25. Comparative positions of introns in ceruloplasmin with corresponding introns in factor VIII. The factor VIII amino acid sequence is aligned with that of ceruloplasmin in the region of the A domain. The consensus line designates residues which are identical in the two proteins. The numbering above the sequences corresponds to that of factor VIII. The numbers preceding the ceruloplasmin lines represent the number of the first amino acid in the line. Introns are labelled as described in Figure 24.

[Figure 25 alignment (factor VIII, ceruloplasmin and consensus rows) is not recoverable from the extracted text.]

The difference in the position of intron h in the factor VIII gene corresponding to intron F in ceruloplasmin may be the result of intron sliding. Amino acid sequence comparison in the region of these intron junctions in factor VIII and ceruloplasmin shows a low degree of similarity, with a deletion observed in the factor VIII protein sequence relative to that of ceruloplasmin (Figure 25).
Since insertions or deletions are a consequence of the intron sliding mechanism, it is possible that junctional sliding (due to the formation and utilization of alternative splice donor/acceptor sites within exons or introns) may account for this observed variation in intron position, while maintaining the coding phase of the intron.

The remainder of the introns identified in the ceruloplasmin gene to date (i.e., introns M and N, contained within the A3 repeated unit; see Figure 24) do not seem to correlate with the positions of introns in the factor VIII gene. It is thus likely that introns present in this region of the two genes have resulted from independent intron insertion or deletion events. Figure 24 clearly shows that in several corresponding regions, the factor VIII gene possesses more introns than does the ceruloplasmin gene. As has been stated previously, however, it is not possible to distinguish whether this reflects intron insertion in the factor VIII gene, or corresponding intron loss in the ceruloplasmin gene.

B.5 Characterization of the 5' End of the Human Ceruloplasmin Gene

The interruption of the 5' untranslated region by at least four introns renders the organization of the 5' end of the human ceruloplasmin gene somewhat unusual. Although not a common occurrence, the presence of one intron in the 5' untranslated sequence of a number of genes from diverse organisms has been reported [e.g., the human and chicken insulin genes (Perler et al., 1980; Bell et al., 1980), the heat shock protein 83 gene from Drosophila (Hackett and Lis, 1983), and the nuclear COX4 gene of yeast (Schneider and Guarente, 1987)]. However, it has been reported (Irminger et al., 1987) that the single-copy human insulin-like growth factor II (IGF-II) gene possesses at least 3 exons (designated 1, 2, and 3, in a 5' to 3' direction) in the 5' untranslated region. These exons encode alternative 5' untranslated regions in a tissue-specific manner, resulting in IGF-II mRNA species of variable lengths. Brain, placenta, and adrenal gland contain a 6.0 kbp IGF-II transcript, utilizing exon 3 in the 5' untranslated region, while liver contains a 5.3 kbp IGF-II transcript, with the 5' untranslated segment derived from exons 1 and 2. The novel organization of these transcripts was first detected when it was found that the 5' untranslated regions of characterized cDNA clones differed from the 5' sequence reported in the genomic clone (Dull et al., 1984). It has not been established whether the heterogeneous transcripts are due to the existence of two tissue-specific promoters (i.e. one upstream of exon 1 and one upstream of exon 3), as is the case for the mouse α-amylase gene (Young et al., 1981; Schibler et al., 1983) (see below), or whether different transcripts arise from alternative splicing patterns in various tissues. The 5' extent of the IGF-II gene has not yet been determined, leaving open the possibility of additional exons in the 5' untranslated segment.

The α-amylase transcripts accumulate in two different tissues of the mouse: the salivary gland and the liver. The nucleotide sequences of the two mRNA species are identical, with the exception of the 5' non-translated regions. The 5' terminal 158 nucleotides of the major liver species are separated from the first exon in the coding sequence by a 4.5 kbp intron.
This leader sequence is unrelated to the 5' terminal 47 nucleotides present in the salivary gland counterpart and is separated from the first exon in the coding sequence by an additional intron of 2.6 kbp. The two distinct mRNA species arise due to the presence of a specific promoter for each transcript (Schibler et al., 1983); the relative activity of these two promoters is tissue-specific. Interestingly, the use of alternative promoters has also been described for the generation of two transcripts encoding intracellular and secreted forms of yeast invertase (Carlson and Botstein, 1982).

It is interesting to speculate as to the biological significance of the heterogeneous 5' non-translated leaders observed in the IGF-II and α-amylase genes. Perhaps variable transcripts result in different levels of translational efficiency, thereby selectively regulating protein concentrations in different tissues. It has been reported that RNA species containing more than two hundred residues in the 5' non-coding region are translated less efficiently by eukaryotic ribosomes in vitro than are those with shorter leader sequences (Young et al., 1981). Thus, the length of the 5' non-translated segment may regulate the efficiency of translation or perhaps affect mRNA stability.

The significance of multiple exons in the 5' untranslated region of the ceruloplasmin gene is unclear at present, since both the 3.7 kb and 4.5 kb mRNA species are detectable with the 5' end probes tested to date. This would not be expected if alternative splicing patterns give rise to these two transcripts.
Utilization of two different promoters (as has been described for the mouse α-amylase gene and the yeast invertase gene) cannot be ruled out at present. It is possible, however, that this unusual organization may be implicated in the regulation of tissue-specific expression of the ceruloplasmin gene, as described above for the α-amylase and IGF-II genes. Unfortunately, data have not yet been obtained regarding the synthesis of ceruloplasmin in extrahepatic tissues. Alternatively, it is possible that the novel organization of the ceruloplasmin gene may be required for the regulation of ceruloplasmin levels by exogenous copper, or during the acute phase inflammatory response (see Section I.B.4). With reference to the latter proposal, the complete gene organizations of several acute phase reactants have been determined [e.g. transferrin (Lucero et al., 1986) and fibrinogen (Crabtree et al., 1985)]. On the basis of DNA sequence determinations, these latter genes do not possess unusual organizations at their respective 5' ends.

Further analysis of the 5' end of the ceruloplasmin mRNA using primer extension and nuclease S1 mapping techniques to estimate exon sizes should allow determination of the size of the 5' untranslated leader, and identification of the transcriptional start site. These results, coupled with analysis of extrahepatic human ceruloplasmin biosynthesis and complete characterization of the 5' end of the gene, may account for the existence of the two ceruloplasmin mRNA species detectable with Northern blot analysis. Thus far, DNA sequence analysis of the 5' untranslated region has not revealed the presence of any typical eukaryotic promoter elements, strongly suggesting that the 5'-most end of the ceruloplasmin transcript has not yet been identified.
C. CHARACTERIZATION OF A PSEUDOGENE FOR HUMAN CERULOPLASMIN

C.1 DNA Sequence Analysis of the Human Ceruloplasmin Pseudogene

Using previously characterized cDNA clones for human ceruloplasmin as hybridization probes for human genomic libraries, four overlapping recombinant phage clones encoding approximately 21 kbp of contiguous genomic DNA were obtained. Within this region, a pseudogene for human ceruloplasmin was identified (nearly 2 kbp in length), corresponding to nucleotide residues 1502 - 3198 of the human ceruloplasmin cDNA sequence. Additionally, the pseudogene extends through the 123 bp of the 3' untranslated sequence that is present in the phCP-1 cDNA clone, and continues for a further 40 bp, the sequence of which corresponds to the 3' untranslated sequence of a ceruloplasmin cDNA clone described by Yang et al. (1986). On this basis, it appears that the pseudogene has been derived from an mRNA species corresponding to the cDNA clone described by Yang et al. (1986), as opposed to the shorter mRNA species represented by phCP-1. The derivation of the ceruloplasmin pseudogene from this particular mRNA species is interesting, since it has been shown that for different rat cytochrome c mRNA species occurring at the same intracellular concentration, multiple pseudogenes have arisen preferentially from one mRNA (Scarpulla, 1984). It has been postulated that this may be due to local secondary structure in the 3' end of the mRNA, which facilitates binding of enzymes involved in reverse transcription or subsequent integration into the genome (Scarpulla, 1984).

As is characteristic of processed pseudogenes, complete DNA sequence analysis revealed that the human ceruloplasmin pseudogene lacks introns present in the wild-type gene.
In contrast to numerous other examples of processed pseudogenes in which the intervening sequences are precisely removed (e.g., Vanin et al., 1980), the ceruloplasmin pseudogene contains a 213 bp deletion (corresponding to nucleotides 1865 - 2077 of the cDNA sequence) that occurs exactly at the boundaries of exon 11 in the wild-type gene. There is also a small 17 bp deletion in the pseudogene sequence, beginning at nucleotide 2943 of the cDNA sequence. It has not been determined whether this deletion also corresponds to the location of intron/exon boundaries in the wild-type gene. It is unclear whether the 213 bp deletion observed in the pseudogene occurred at the time of intron processing, or whether the deletion is the result of a subsequent mutation event.

The 5' boundary of the pseudogene is characterized by a short sequence that is homologous to the 3' end of intron H in the wild-type gene. The presence of this intronic segment is expected, since the pseudogene diverges from the wild-type gene prior to the 5' end of this intron. Therefore, the appropriate 5' splice donor site (Breathnach and Chambon, 1981; Cech, 1983) required for intron removal is absent.

Although the pseudogene appears to have been derived from a processed RNA species (i.e. lacking intervening sequences), there is no poly(A) tract present in the sequence. While the majority of processed pseudogenes have a poly(A) tail, several exceptions have been reported (Vanin et al., 1980; Notake et al., 1983). It is possible that the absence of a poly(A) tract in the human ceruloplasmin pseudogene sequence may be the result of the mechanism of its formation, possibly involving base pairing between the poly(A) tail of the mRNA and a U-rich region in the 3' untranslated region.
In this case, the six T residues observed following the 3' untranslated segment corresponding to that described by Yang et al. (1986) may represent a site of mRNA self-priming. This explanation may also account for the divergence that occurs at this point between the pseudogene sequence and the remainder of the 3' untranslated region reported by Yang et al. (1986).

The presence of a repeated CT dinucleotide segment at the 3' end of the ceruloplasmin pseudogene sequence corresponding to the coding strand is interesting. A 116 bp segment, composed mainly of repeated GA dinucleotides corresponding to the coding strand, has been reported at the 3' end of the mouse corticotropin-β-lipotropin precursor pseudogene (Notake et al., 1983). In the latter case, this repeated segment occurs immediately following the point at which the pseudogene diverges from the wild-type gene. In the human ceruloplasmin pseudogene, the repeated TC region occurs 172 bp 3' to the point at which the pseudogene diverges from the wild-type gene. The rat metallothionein pseudogene 14b, which has been characterized by Andersen et al. (1986), contains a 42 bp poly(CA) tract, located approximately 300 bp 3' to the site of polyadenylation. While stretches of repeating CA residues have been found in eukaryotic DNA at the site of recombination events such as gene conversion (Shen et al., 1981), and are thought to induce Z-DNA conformational changes (Nordheim and Rich, 1983), the function of significantly long purine or pyrimidine stretches remains unclear. Tracts of d(GA)n·d(TC)n have been found at many sites in eukaryotic genomes [e.g. human U1 RNA genes (Htun et al., 1984) and the murine immunoglobulin μ-δ heavy chain gene (Richards et al., 1983)]. Interestingly, a d(GA)27·d(TC)27 tract has been demonstrated in a polyomavirus-transformed cell line near the end of a host DNA segment that is responsible for arrest of the viral replication process (Baran et al., 1987). In the latter case, it has been postulated that this repeated sequence, in conjunction with an inverted repeat, may serve as an arrest site for chromatin replication in vivo. In pseudogenes, it is attractive to speculate that repeated pyrimidine or purine stretches may be involved in the process of pseudogene integration into the genome, since very little is known about the mechanism(s) mediating this event (Vanin, 1985).

C.2 Chromosomal Location of the Human Ceruloplasmin Pseudogene

All processed pseudogenes studied to date are located on different chromosomes from their functional counterparts. This is in contrast to non-processed pseudogenes, which have likely arisen from gene duplication events and are therefore on the same chromosome as the respective wild-type gene (Vanin, 1985). Using previously characterized human-hamster hybrid cell lines (Donald et al., 1983), the human ceruloplasmin pseudogene has been assigned to chromosome 8. This result has been recently verified, using the technique of in situ hybridization (Wang et al., 1987). This differs from the unequivocal assignment of the functional ceruloplasmin locus to chromosome 3q25 (Yang et al., 1986; Royle et al., 1987). It has been reported previously by Yang et al. (1986) that a 0.8 kbp EcoRI fragment that can be identified in genomic Southern blots probed with the ceruloplasmin cDNA segregates with human chromosome 11. However, the present study suggests that this 0.8 kbp EcoRI fragment is part of the human ceruloplasmin pseudogene, and maps to chromosome 8 (see Section III.C).
The reason for this discrepancy is unclear at present, since genomic Southern blot analysis indicates that there is only one pseudogene for human ceruloplasmin. However, since the mapping analysis reported by Yang et al. (1986) was performed using the ceruloplasmin cDNA to probe human-mouse hybrid cell lines, the 0.8 kbp band detected may represent a cross-reacting species in the mouse genome. This is in agreement with previous difficulties encountered in chromosome mapping when using cDNA fragments as hybridization probes (J. Hamerton, personal communication).

C.3 Speculations on the Evolutionary Origin of the Human Ceruloplasmin Pseudogene

The human ceruloplasmin pseudogene shares approximately 97% nucleotide sequence identity with the wild-type ceruloplasmin coding sequence, suggesting that it was formed relatively recently in evolutionary time. This is characteristic of processed pseudogenes analyzed to date (e.g. Li et al., 1981; Freytag et al., 1984), all of which have arisen following the mammalian radiation (approximately 80 million years ago) (Vanin, 1985). Interestingly, the existence of a processed pseudogene for rat ceruloplasmin has been suggested recently by Shvartsman et al. (1985) based on preliminary restriction endonuclease mapping analysis of rat ceruloplasmin genomic clones. This suggests that the formation of the ceruloplasmin pseudogene occurred prior to the divergence of rats and humans. This is consistent with the notion that processed pseudogenes have arisen in genomes following the widespread appearance of mammals on earth.

Presumably pseudogenes are not subject to functional constraints. This has been interpreted to suggest that the pattern of nucleotide substitutions in pseudogenes should reflect patterns of intrinsic mutation (Li, 1983).
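The distinction between transitions and transversions that underlies this analysis can be sketched in a few lines. Assuming that all twelve possible single-base changes are equally likely (the random-mutation case), the sketch enumerates and classifies them, and then computes the transition fraction for the counts reported for the human ceruloplasmin pseudogene (34 transitions among 44 substitutions). The function names are illustrative and do not correspond to any software used in this study.

```python
from itertools import permutations

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify(ref: str, alt: str) -> str:
    """Classify a single-base substitution as a transition
    (purine<->purine or pyrimidine<->pyrimidine) or a transversion
    (purine<->pyrimidine)."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or {ref, alt} - (PURINES | PYRIMIDINES):
        raise ValueError("expected two different bases from A, C, G, T")
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def count_classes() -> dict:
    """Count transitions and transversions among all 12 ordered
    base changes, i.e. the expectation if every change were equally likely."""
    counts = {"transition": 0, "transversion": 0}
    for ref, alt in permutations("ACGT", 2):
        counts[classify(ref, alt)] += 1
    return counts


def transition_fraction(transitions: int, total: int) -> float:
    """Fraction of observed substitutions that are transitions."""
    if total <= 0 or not 0 <= transitions <= total:
        raise ValueError("counts must satisfy 0 <= transitions <= total, total > 0")
    return transitions / total


if __name__ == "__main__":
    print(count_classes())
    # Counts reported in the text for the ceruloplasmin pseudogene.
    print(round(100 * transition_fraction(34, 44)))
```

Under the equal-probability assumption the enumeration gives 4 transitions and 8 transversions, which is the 2:1 expectation for random mutation; the observed transition fraction for the pseudogene rounds to 77%.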
If mutations occur randomly, transversions (i.e. purine-pyrimidine or pyrimidine-purine base substitutions) should therefore occur twice as frequently as transitions (i.e. purine-purine or pyrimidine-pyrimidine base changes) (Li, 1983). Compared with the nucleotide substitution pattern at the first and second positions of codons in functional genes, it has been shown for pseudogenes that the relative frequency of transitions is much greater than that of transversions (Gojobori, 1982). This suggests that a sequence under no functional constraint will become A-T rich, due to the spontaneous deamination of cytosine and 5-methylcytosine. Of the 44 nucleotide substitutions observed in the human ceruloplasmin pseudogene sequence, 34 (77%) are transition mutations.

Genomic Southern blot analysis indicates that there are no human ceruloplasmin pseudogene sequences corresponding to the 5' end of the wild-type gene. This suggests that the pseudogene may have arisen from an aberrant transcript, as the result of initiation within the gene. Such a model has been proposed for the mouse corticotropin-β-lipotropin pseudogene (Notake et al., 1983). This pseudogene is similar to the human ceruloplasmin pseudogene, in that it is only a partial copy of the functional gene, encoding the carboxy-terminal 143 amino acid residues and the 3' untranslated region (Notake et al., 1983). In the case of these two pseudogenes, RNA polymerase III initiation could have occurred within the respective genes. In vitro expression studies suggest that aberrant initiation of transcription of the human corticotropin-β-lipotropin precursor can occur at a site within the gene corresponding to the 5' end of the homologous pseudogene sequence (Mishina et al., 1982). Additionally, it has been shown that RNA polymerase III activity can result in transcripts initiating upstream of the human β-globin gene, which is normally transcribed by RNA polymerase II (Carlson and Ross, 1983). Furthermore, these latter aberrant transcripts have been shown to be both polyadenylated and spliced (Carlson and Ross, 1983). Therefore, the possible generation of the human ceruloplasmin pseudogene from an aberrant RNA polymerase III event does not account for incorrect splicing or the observed lack of polyadenylation.

D. A MODEL FOR THE EVOLUTION OF CERULOPLASMIN, FACTOR V AND FACTOR VIII

Based on its internal 3-fold repeated structure (Section I.B.1) and comparison to known copper-binding centers in several blue copper proteins (Section I.E), Dwulet and Putnam (1981b) have proposed a model for the evolution of the ceruloplasmin gene. This model suggests that ceruloplasmin has evolved from a small primordial copper-binding protein (approximately 350 amino acids in length). Tandem duplications of this ancestral gene could then have given rise to the present-day gene for ceruloplasmin (see Figure 26). Because the pairwise comparison of the three repeated units in the human ceruloplasmin molecule shows very similar values (see Table 1), the order of the triplication cannot be determined; two consecutive elongations likely occurred close together on an evolutionary time scale (Doolittle, 1984). Without comparative amino acid sequence data and/or molecular weights for ceruloplasmin from more primitive species, the time of the first duplication event is difficult to estimate. Interestingly, the degree of similarity between the three repeated units in ceruloplasmin is approximately the same as that observed for the duplicated halves of transferrin (Doolittle, 1984). Thus, if the rate of change of ceruloplasmin was similar to that of transferrin, both sets of duplications may have occurred within the same time frame (Doolittle, 1984). Since lampreys have a full-sized transferrin, the latter event has been placed at > 400 million years ago (Doolittle, 1984). On this basis, it would seem reasonable that ceruloplasmin has grown by duplication at the same time. This is in agreement with the proposal of Dwulet and Putnam (1981b), who suggested that the triplication event resulting in the formation of ceruloplasmin was coincident with the appearance of vertebrate animals (i.e. 500 million years ago), possessing a closed vascular system and a urogenital system. The latter anatomical developments necessitated that plasma proteins have molecular weights of > 60 kDa to avoid renal excretion.

Figure 26. A proposed model for the evolution of ceruloplasmin (CP), factor V (FV) and factor VIII (FVIII) (see text for details). The triplicated A domain is shown by cross-hatched bars while the B and C domains (present in factors V and VIII) are identified by open and stippled bars, respectively. The ceruloplasmin pseudogene is represented by a solid bar. MY designates million years.

[Figure 26 diagram not recoverable from the extracted text. Labels: ancestral gene; triplication (400 MY); duplication; formation (<80 MY); present CP; fusion; duplication; FV; FVIII.]
As has been previously discussed, factor VIII was shown to be structurally related to human ceruloplasmin on the basis of amino acid similarity (Wood et al., 1984; Vehar et al., 1984). Although initially indicated on the basis of limited amino acid sequence derived from bovine factor V (Church et al., 1984), the complete amino acid sequence of human factor V (Jenny et al., 1987) has shown clearly that this protein is structurally related to both factor VIII and ceruloplasmin. It has therefore been proposed that ceruloplasmin, factor V and factor VIII form a gene family derived from a common ancestral gene (see Figure 26). In this scheme, the entire triplicated gene for human ceruloplasmin would have undergone duplication, thereby forming the progenitor species for factors V and VIII. As is depicted schematically in Figure 26, this event likely occurred relatively close in evolutionary time to the formation of the initial triplicated structure, since when compared pairwise, the A repeated units in ceruloplasmin, factor V and factor VIII share very similar levels of amino acid identity (Table 1), implying that they have been evolving for a similar length of time. Prior to the third gene duplication event leading to the formation of factors V and VIII, the B and C domains were likely inserted by independent gene fusion events (Figure 26), thereby creating the larger and more complex factor V and VIII molecules. The evolution of ceruloplasmin and factors V and VIII provides an example of divergent evolution in which factors V and VIII have adopted diverse functions compared to ceruloplasmin.

E. CONCLUDING REMARKS

Comparative analyses of the gene organizations of structurally related proteins contribute greatly to our knowledge of gene evolution, aiding in the unambiguous classification of genes into various multigene families. For example, although the sequence of angiotensinogen is only 20% identical to that of α1-antitrypsin, the distribution of introns in these two genes is precisely conserved (Tanaka et al., 1984), thus allowing their unequivocal assignment to the same gene family (Doolittle, 1985). Comparison of the gene organization of ceruloplasmin with that previously reported for factor VIII (Gitschier et al., 1984) confirms that these genes are members of the same gene family, and as such have likely arisen from a common ancestral gene. Characterization of the factor V gene (which has also been proposed as a member of the latter multigene family, based on shared structural properties) will prove interesting, since one might predict that it will exhibit an organization more similar to that of factor VIII than does ceruloplasmin, based on similar functional constraints shared by the two blood coagulation factors. As has been discussed previously, development of a more complete understanding of the evolutionary history of ceruloplasmin itself will require comparative phylogenetic data, involving the isolation and characterization of ceruloplasmin from primitive vertebrates such as hagfish or lampreys.

Just as ceruloplasmin has been rendered a fascinating protein for chemical study, the molecular biology of this multicopper oxidase is proving to be equally intriguing.
As has been shown in the various aspects of this study, the ceruloplasmin gene is highly complex, possessing an unusual organization at the 5' end, with multiple introns interrupting the 5' untranslated leader segment, and heterogeneity with respect to the site of polyadenylation at the 3' end. Therefore, complete knowledge of the organization of this gene will enhance our understanding of gene regulation in eukaryotic systems. Finally, complete characterization of the human preceruloplasmin cDNA sequence as detailed in this study should facilitate studies involving site-directed mutagenesis of various regions of the ceruloplasmin molecule, in order to identify specific copper-binding centers within the protein.

V. REFERENCES

Adman, E.T., Stenkamp, R.E., Sieker, L.C., and Jensen, L.H. (1978). J. Mol. Biol. 123, 35-47.
Anfinsen, C.B. (1975). The Molecular Basis of Evolution. Wiley, New York.
Aldred, A.R., Grimes, A., Schreiber, G., and Mercer, J.F.B. (1987). J. Biol. Chem. 262, 2875-2878.
Al-Timimi, D.H., and Dormandy, T.L. (1977). Biochem. J. 168, 283-288.
Andersen, R.D., Birren, B.W., Taplitz, S.J., and Herschman, H.R. (1986). Mol. Cell. Biol. 6, 302-314.
Aoyagi, T., Ikenaka, T., and Ichida, F. (1978). Cancer Res. 38, 3483-3486.
Appleyard, R.K. (1954). Genetics 39, 440-452.
Atkinson, T., and Smith, M. (1984) in Oligonucleotide Synthesis, A Practical Approach (Gait, M.J., Ed.), IRL Press Ltd., Oxford, pp. 35-81.
Aviv, H., and Leder, P. (1972). Proc. Natl. Acad. Sci. U.S.A. 69, 1408-1412.
Baran, N., Lapidot, A., and Manor, H. (1987). Mol. Cell. Biol. 7, 2636-2640.
Barrell, B.G., Bankier, A.T., and Drouin, J. (1979). Nature (London) 282, 189-194.
Battey, J., Max, E.E., McBride, O.W., Swan, D., and Leder, P. (1982). Proc. Natl. Acad. Sci. U.S.A. 79, 5956-5960.
Bell, G.I., Pictet, R.L., Rutter, W.J., Cordell, B., Tischer, E., and Goodman, H.M. (1980). Nature (London) 284, 26-32.
Benton, W.D., and Davis, R.W. (1977). Science 196, 180-182.
Benyajati, C., Spoerel, H.H., and Ashburner, M. (1983). Cell 33, 125-133.
Bereman, R.D., and Kosman, D.J. (1977). J. Am. Chem. Soc. 99, 7322-7325.
Bergman, C., Gandvik, E.-K., Nyman, P.O., and Strid, L. (1977). Biochem. Biophys. Res. Commun. 77, 1052-1059.
Birnboim, H.C., and Doly, J. (1979). Nucleic Acids Res. 7, 1513-1523.
Birnstiel, M.L., Busslinger, M., and Strub, K. (1985). Cell 41, 349-359.
Blake, C. (1983a). Nature (London) 306, 535-537.
Blake, C. (1983b). Trends Biochem. Sci. 8, 11-13.
Blin, N., and Stafford, D.W. (1976). Nucleic Acids Res. 3, 2303-2308.
Blobel, G., Walter, P., Chang, C.N., Goldman, B.M., Erickson, A.H., and Lingappa, V.R. (1979) in Secretory Mechanisms, Vol. 33 (Hopkins, C.R., and Duncan, C.J., Eds.), Cambridge University Press, London, pp. 9-36.
Bonta, I.L. (1978) in The Handbook of Experimental Pharmacology (Born, G.V. et al., Eds.), Springer-Verlag, New York, pp. 523-567.
Breathnach, R., and Chambon, P. (1981). Ann. Rev. Biochem. 50, 349-383.
Briving, C., Gandvik, E.K., and Nyman, P.O. (1980). Biochem. Biophys. Res. Commun. 93, 454-461.
Campbell, C.H., Brown, R., and Linder, M.C. (1981). Biochim. Biophys. Acta 678, 27-38.
Carlson, D.P., and Ross, J. (1983). Cell 34, 857-864.
Carlson, M., and Botstein, D. (1982). Cell 28, 145-154.
Casadaban, M.J., and Cohen, S.N. (1980). J. Mol. Biol. 138, 179-207.
Cech, T.R. (1983). Cell 34, 713-716.
Chaconas, G., and van de Sande, J.H. (1980). Methods Enzymol. 65, 75-85.
Chen, M.-J., Shimada, T., Moulton, A.D., Harrison, M., and Nienhuis, A.W. (1982). Proc. Natl. Acad. Sci. U.S.A. 79, 7435-7439.
Chirgwin, J., Przybyla, A.E., MacDonald, R.J., and Rutter, W.J. (1979). Biochemistry 18, 5294-5299.
Church, W.R., Jernigan, R.L., Toole, J., Hewick, R.M., Knopf, J., Knutson, G.J., Nesheim, M.E., Mann, K.G., and Fass, D.N. (1984). Proc. Natl. Acad. Sci. U.S.A. 81, 6934-6937.
Clamp, J.R. (1975) in The Plasma Proteins, 2nd Ed., Vol. 2 (Putnam, F.W., Ed.), Academic Press, New York, pp. 163-211.
Colman, P.M., Freeman, H.C., Guss, J.M., Murata, M., Norris, V.A., Ramshaw, J.A.M., and Venkatappa, M.P. (1978). Nature (London) 272, 319-324.
Cool, D.E., Edgell, C.-J.S., Louie, G.V., Zoller, M.J., Brayer, G.D., and MacGillivray, R.T.A. (1985). J. Biol. Chem. 260, 13666-13676.
Cousins, R.J. (1985). Physiol. Rev. 65, 238-309.
Crabtree, G.R., Comeau, C.M., Fowlkes, D.M., Fornace, A.J., Jr., Malley, J.D., and Kant, J.A. (1985). J. Mol. Biol. 185, 1-19.
Craik, C.S., Sprang, S., Fletterick, R., and Rutter, W.J. (1982a). Nature (London) 299, 180-182.
Craik, C.S., Laub, O., Bell, G.I., Sprang, S., Fletterick, R., and Rutter, W.J. (1982b). The Relationship of Gene Structure to Protein Structure in Gene Regulation (O'Malley, B., and Fox, C.F., Eds.), Academic Press, New York, pp. 35-54.
Craik, C.S., Rutter, W.J., and Fletterick, R. (1983). Science 220, 1125-1129.
Cumings, J.N. (1948). Brain 71, 410.
Curzon, G., and O'Reilly, S. (1960). Biochem. Biophys. Res. Commun. 2, 284-286.
Czaja, M.J., Weiner, F.R., Schwartzenberg, S.J., Sternlieb, I., Scheinberg, I.H., van Thiel, D.H., LaRusso, N.F., Giambrone, M.-A., Kirschner, R., Koschinsky, M.L., MacGillivray, R.T.A., and Zern, M.A. (1987). J. Clin. Invest. 80, 1200-1204.
Czosnek, H., Bienz, B., Givol, D., Zakut-Houri, R., Pravtcheva, D.D., Ruddle, F.H., and Oren, M. (1984). Mol. Cell. Biol. 4, 1638-1640.
Dawson, J.H., Dooley, D.M., and Gray, H.B. (1978). Proc. Natl. Acad. Sci. U.S.A. 75, 4078-4081.
Dawson, J.H., Dooley, D.M., Clark, R., Stephens, P.J., and Gray, H.B. (1979). J. Am. Chem. Soc. 101, 5046-5053.
Deininger, P.L. (1983). Anal. Biochem. 129, 216-223.
Deinum, J., and Vanngard, T. (1973). Biochim. Biophys. Acta 310, 321-330.
Denison, R.A., and Weiner, A.M. (1982). Mol. Cell. Biol. 2, 815-828.
Dickson, P.W., Aldred, A.R., Marley, P.D., Guo-Fen, T., Howlett, G.J., and Schreiber, G. (1985). Biochem. Biophys. Res. Commun. 127, 890-895.
Donald, L.J., Wang, H.S., Kamali, V., Liebert, L., Rudner-Marion, M., Cameron, E.S., Riddell, D.C., Vust, A., Wrogemann, K., and Hamerton, J.L. (1983). Cytogenet. Cell Genet. 37, 453A.
Doolittle, R.F. (1984) in The Plasma Proteins, 2nd Ed., Vol. 4 (Putnam, F.W., Ed.), Academic Press, New York, pp. 317-360.
Doolittle, R.F. (1985). Trends Biochem. Sci. 10, 233-237.
Drysdale, J.W., and Munro, H.N. (1966). J. Biol. Chem. 241, 3630-3637.
Dudov, K.P., and Perry, R.P. (1984). Cell 37, 457-468.
Dull, I.J., Gray, A., Hayflick, J.S., and Ullrich, A. (1984). Nature (London) 310, 775-777.
Dwulet, F.E., and Putnam, F.W. (1981a). Proc. Natl. Acad. Sci. U.S.A. 78, 790-794.
Dwulet, F.E., and Putnam, F.W. (1981b). Proc. Natl. Acad. Sci. U.S.A. 78, 2805-2809.
Edgell, M.H., Hardies, S.C., Brown, B., Voliva, C., Hill, A., Phillips, S., Comer, M., Burton, F., Weaver, S., and Hutchinson, III, C.A. (1983) in Evolution of Genes and Proteins (Nei, M., and Koehn, R.K., Eds.), Sinauer Associates, Inc., Sunderland, Mass., pp. 1-13.
Edmonds, M., Vaughn, M.H., and Nakazato, H. (1971). Proc. Natl. Acad. Sci. U.S.A. 68, 1336-1340.
Eiferman, F.A., Young, P.R., Scott, R.W., and Tilghman, S.M. (1981). Nature (London) 294, 713-718.
Esmon, C.T. (1979). J. Biol. Chem. 254, 964-973.
Fass, D.N., Knutson, G.J., and Katzmann, J.A. (1982). Blood 59, 594-600.
Fee, J.A. (1975). Struct. Bonding (Berlin) 23, 1-60.
Feinberg, A.P., and Vogelstein, B. (1983). Anal. Biochem. 132, 6-13.
Fielden, E.M., Roberts, P.B., Bray, R.C., Lowe, D.J., Mautner, G.N., Rotilio, C., and Calabrese, L. (1974). Biochem. J. 139, 49-60.
Freeman, S., and Daniel, E. (1973). Biochemistry 12, 4806-4810.
Freytag, S.O., Bock, H.G.O., Beaudet, A.L., and O'Brien, W.E. (1984). J. Biol. Chem. 259, 3160-3166.
Frieden, E. (1974) in Protein-Metal Interactions (Friedman, M., Ed.), Plenum Press, New York, pp. 1-31.
Frieden, E. (1981) in Metal Ions in Biological Systems, Vol. 13 (Sigel, H., Ed.), Marcel Dekker, Inc., New York, pp. 117-142.
Frieden, E., and Hsieh, H.S. (1976). Adv. Exp. Med. Biol. 74, 505-529.
Frischauf, A.M., Lehrach, H., Poustka, A., and Murray, N. (1983). J. Mol. Biol. 170, 827-842.
Frydman, M., Bonne-Tamir, B., Farrer, L.A., Conneally, P.M., Magazanik, A., Ashbel, S., and Goldwitch, Z. (1985). Proc. Natl. Acad. Sci. U.S.A. 82, 1819-1821.
Fulcher, C.A., Roberts, J.R., and Zimmerman, T.S. (1983). Blood 61, 807-811.
Fung, M.R., Campbell, R.M., and MacGillivray, R.T.A. (1984). Nucleic Acids Res. 12, 4481-4492.
Fung, M.R., Hay, C.W., and MacGillivray, R.T.A. (1985). Proc. Natl. Acad. Sci. U.S.A. 82, 3591-3595.
Gaitskhoki, V., Kisselev, O.I., Moshkov, K.A., Puchkova, L.V., Shavlovski, M.M., Shulman, V.S., Vacharlovski, V.G., and Neifakh, S.A. (1975). Biochem. Genet. 13, 533-550.
Gaitskhoki, V., L'vov, M., Monakhov, N.K., Puchkova, L.V., Schwartzman, A.L., Frolova, L.J., Skobeleva, N.A., Zagorski, W., and Neifakh, S.A. (1981). Eur. J. Biochem. 115, 39-44.
Geddes, V.A. (1987). M.Sc. Thesis, University of British Columbia.
Germann, U.A., and Lerch, K. (1986). Proc. Natl. Acad. Sci. U.S.A. 83, 8854-8858.
Gilbert, W. (1978). Nature (London) 271, 501.
Gitschier, J., Wood, W.I., Goralka, T.M., Wion, K.L., Chen, E.Y., Eaton, D.H., Vehar, G.A., Capon, D.J., and Lawn, R.M. (1984). Nature (London) 312, 326-330.
Go, M. (1981). Nature (London) 291, 90-92.
Go, M. (1983). Proc. Natl. Acad. Sci. U.S.A. 80, 1964-1968.
Goelet, P., and Karn, J. (1984). Gene 29, 331-342.
Gojobori, T., Li, W.-H., and Graur, D. (1982). J. Mol. Evol. 18, 360-369.
Goldberg, M.L. (1979). Ph.D. Thesis, Stanford University.
Goldstein, I.M., Kaplan, H.B., Edelson, H.S., and Weissmann, G. (1979). J. Biol. Chem. 254, 4040-4045.
Greenquist, A.C., and Coleman, R.W. (1975). Blood 46, 769-782.
Grunstein, M., and Hogness, D.S. (1975). Proc. Natl. Acad. Sci. U.S.A. 79, 3961-3965.
Gubler, U., and Hoffman, B. (1983). Gene 25, 263-269.
Hackett, R.W., and Lis, J.T. (1983). Nucleic Acids Res. 11, 7011-7030.
Haldane, J.B.S. (1935). J. Genet. 31, 317-326.
Hall, L., Craig, R.K., Edbrooke, M.R., and Campbell, P.N. (1982). Nucleic Acids Res. 10, 3503-3515.
Harris, D.I.M., and Sass-Kortsak, A. (1967). J. Clin. Invest. 46, 659-667.
Harris, E.D., and DiSilvestro, R.A. (1981). Proc. Soc. Exp. Biol. Med. 166, 528-531.
Herve, M., Garnier, A., Tosi, L., and Steinbuch, M. (1978). Biochem. Biophys. Res. Commun. 80, 797-804.
Herve, M., Garnier, A., Tosi, L., and Steinbuch, M. (1981). Eur. J. Biochem. 116, 177-183.
Hibbard, L.S., and Mann, K.G. (1980). J. Biol. Chem. 255, 638-645.
Himmelwright, R.S., Eickman, N.C., and Solomon, E.I. (1978). Biochem. Biophys. Res. Commun. 81, 243-247.
Hollis, G.F., Hieter, P.A., McBride, O.W., Swan, D., and Leder, P. (1982). Nature (London) 296, 321-325.
Holmberg, C.G. (1944). Acta Physiol. Scand. 8, 227-229.
Hood, L., Kronenberg, M., and Hunkapiller, T. (1985). Cell 40, 225-229.
Hsieh, H.S., and Frieden, E. (1975). Biochem. Biophys. Res. Commun. 67, 1326-1331.
Htun, H., Lund, E., and Dahlberg, J.E. (1984). Proc. Natl. Acad. Sci. U.S.A. 81, 7288-7292.
Huynh, T., Young, R.A., and Davis, R.W. (1986) in DNA Cloning: A Practical Approach (Glover, D., Ed.), IRL Press Ltd., Oxford, pp. 49-78.
Irminger, J.-C., Rosen, K.M., Humbel, R.E., and Villa-Komaroff, L. (1987). Proc. Natl. Acad. Sci. U.S.A. 84, 6330-6334.
Jabusch, J.R., Farb, D.L., Kerschensteiner, D.A., and Deutsch, H.F. (1980). Biochemistry 19, 2310-2316.
Jackson, C.M., and Nemerson, Y. (1980). Ann. Rev. Biochem. 49, 765-811.
Jacq, C., Miller, J.R., and Brownlee, G.G. (1977). Cell 13, 109-120.
Jagadeeswaran, P., Forget, B.G., and Weissman, S.M. (1981). Cell 26, 141-142.
Jenny, R.J., Pittman, D.D., Toole, J.J., Kriz, R.W., Aldape, R.A., Hewick, R.M., Kaufman, R.J., and Mann, K.G. (1987). Proc. Natl. Acad. Sci. U.S.A. 84, 4846-4850.
Kan, Y.W., and Dozy, A.M. (1978). Proc. Natl. Acad. Sci. U.S.A. 75, 5631-5635.
Katz, L., Kingsbury, D.T., and Helinski, D.R. (1973). J. Bacteriol. 114, 577-591.
Katz, L., Williams, P.H., Sato, S., Leavitt, R.W., and Helinski, D.R. (1977). Biochemistry 16, 1677-1683.
Kay, R.J., Boissy, R.J., Russnak, R.H., and Candido, E.P.M. (1986). Mol. Cell. Biol. 6, 3134-3143.
Kingston, I.B., Kingston, B.L., and Putnam, F.W. (1979). Proc. Natl. Acad. Sci. U.S.A. 76, 1668-1672.
Kingston, I.B., Kingston, B.L., and Putnam, F.W. (1980). J. Biol. Chem. 255, 2878-2896.
Knowles, B.B., Howe, C.C., and Aden, D.P. (1980). Science 209, 497-499.
Koj, A. (1974) in Structure and Function of Plasma Proteins (Allison, A.C., Ed.), Plenum Press, London, pp. 72-131.
Koschinsky, M.L., Funk, W.D., van Oost, B.A., and MacGillivray, R.T.A. (1986). Proc. Natl. Acad. Sci. U.S.A. 83, 5086-5090.
Kunkel, L.M., Smith, K.D., Boyer, S.H., Borgaonkar, D.S., Wachtel, S.S., Miller, O.J., Breg, W.R., Jones, H.W., Jr., and Rary, J.M. (1977). Proc. Natl. Acad. Sci. U.S.A. 74, 1245-1249.
Larson, P.H. (1974). Hum. Pathol. 5, 629-640.
Lawn, R.M., Fritsch, E.F., Parker, R.C., Blake, G., and Maniatis, T. (1978). Cell 15, 1157-1174.
Lehrach, H., Diamond, D., Wozney, R.R., and Boedtker, H. (1977). Biochemistry 16, 4743-4751.
Li, W.-H. (1983) in Evolution of Genes and Proteins (Nei, M., and Koehn, R.K., Eds.), Sinauer Associates, Inc., Sunderland, Mass., pp. 14-37.
Li, W.-H., Luo, C.-C., and Wu, C.-I. (1985) in Molecular Evolutionary Genetics (MacIntyre, R.J., Ed.), Plenum Press, New York, pp. 1-94.
Linder, M.C., and Moor, J.R. (1977). Biochim. Biophys. Acta 499, 329-336.
Linder, M.C., Houle, P.A., Isaacs, E., Moor, J.R., and Scott, L.E. (1979). Enzyme 24, 23-35.
Lonberg, N., and Gilbert, W. (1985). Cell 40, 81-90.
Lucero, M.A., Schaeffer, E., Cohen, G.N., and Zakin, M.M. (1986). Nucleic Acids Res. 14, 8692.
Lum, J.B., Infante, A.J., Makker, D.M., Yang, F., and Bowman, B.H. (1986). J. Clin. Invest. 77, 841-849.
MacGillivray, R.T.A., Mendez, E., Sinha, S.K., Sutton, M.R., Lineback-Zins, J., and Brew, K. (1982). Proc. Natl. Acad. Sci. U.S.A. 79, 2504-2508.
MacGillivray, R.T.A., and Davie, E.W. (1984). Biochemistry 23, 1626-1634.
Maniatis, T., Jeffrey, A., and Kleid, D.G. (1975). Proc. Natl. Acad. Sci. U.S.A. 72, 1184-1188.
Maniatis, T., Fritsch, E.F., and Sambrook, J. (1982). Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratories, Cold Spring Harbor.
Mann, K.G., Lawler, C.M., Vehar, G.A., and Church, W.R. (1984). J. Biol. Chem. 259, 12949-12951.
Marceau, N., and Aspin, N. (1973a). Biochim. Biophys. Acta 328, 338-350.
Marceau, N., and Aspin, N. (1973b). Biochim. Biophys. Acta 328, 351-358.
McCombs, M.L., and Bowman, B.H. (1976). Biochim. Biophys. Acta 434, 452-461.
McKee, D.H., and Frieden, E. (1971). Biochemistry 10, 3880-3883.
McKnight, C.L., O'Hara, P.J., and Parker, M.L. (1986). Cell 46, 143-147.
McLachlan, A.D. (1979). J. Mol. Biol. 128, 49-79.
Mercer, J.F.B., and Grimes, A. (1986). FEBS Lett. 203, 185-190.
Messing, J., Crea, R., and Seeburg, P.H. (1981). Nucleic Acids Res. 9, 309-321.
Messing, J. (1983). Methods Enzymol. 101, 20-78.
Meyer, B.J., Meyer, A.C., and Horwitt, M.K. (1958). Amer. J. Physiol. 194, 581-584.
Mishina, M., Kurosaki, T., Yamamoto, T., Notake, M., Masu, M., and Numa, S. (1982). EMBO J. 1, 1533-1538.
Morell, A.G., Van den Hamer, C.J.A., and Scheinberg, I.H. (1969). J. Biol. Chem. 244, 1461-1467.
Mount, S.M. (1982). Nucleic Acids Res. 10, 459-472.
Nakagawa, O. (1972). Int. J. Pept. Protein Res. 4, 385-394.
Naora, H., and Deacon, N.J. (1982). Proc. Natl. Acad. Sci. U.S.A. 79, 6196-6200.
Neifakh, S.A., Monakhov, N.K., Shaposhnikov, A.M., and Zunzitski, J.U.N. (1969). Experientia (Basel) 25, 337-345.
Neifakh, S.A., Aleynikova, T.D., Gaitskhoki, V., Monakhov, N.K., and Puchkova, L.V. (1979). Doklady Academii Nauk SSSR (Russ) 244, 238-240.
Nesheim, M.E., Foster, W.B., Hewick, R., and Mann, K.G. (1984). J. Biol. Chem. 259, 3187-3196.
Neurath, H. (1985). Fed. Proc. 44, 2907-2913.
Nordheim, A., and Rich, A. (1983). Proc. Natl. Acad. Sci. U.S.A. 80, 1821-1825.
Notake, M., Tobimatsu, T., Watanabe, Y., Takahashi, H., Mishina, M., and Numa, S. (1983). FEBS Lett. 156, 67-71.
Noyer, M., and Putnam, F.W. (1981). Biochemistry 20, 3536-3542.
Ny, T., Elgh, F., and Lund, B. (1984). Proc. Natl. Acad. Sci. U.S.A. 81, 5355-5359.
Ohno, S. (1970). Evolution by Gene Duplication. Springer-Verlag, New York.
Ortel, T.L., Takahashi, N., and Putnam, F.W. (1984). Proc. Natl. Acad. Sci. U.S.A. 81, 4761-4765.
Owen, C.A., Jr. (1965). Amer. J. Physiol. 209, 900-904.
Owen, C.A., Jr. (1982) in Biochemical Aspects of Copper: Copper Proteins, Ceruloplasmin and Copper Protein Binding. Noyes Publications, New Jersey, pp. 128-176.
Peisach, J., and Levine, W.G. (1963). Biochim. Biophys. Acta 77, 615-628.
Peisach, J., Levine, W.G., and Blumberg, W.E. (1967). J. Biol. Chem. 242, 2847-2858.
Perler, F., Efstratiadis, A., Lomedico, P., Gilbert, W., Kolodner, R., and Dodgson, M. (1980). Cell 20, 555-566.
Perry, R.P. (1976). Ann. Rev. Biochem. 45, 605-629.
Peters, T., and Reed, R.G. (1977). Proc. FEBS 11th Meet. 50, 11-20.
Pickart, L., Freedman, J.H., Loker, W.J., Peisach, J., Perkins, C.M., Stenkamp, R.E., and Weinstein, B. (1980). Nature (London) 288, 715-717.
Poole, S., Firtel, R.A., Lamar, E., and Rowekamp, W. (1981). J. Mol. Biol. 153, 273-289.
Poulik, M.D., and Weiss, M.L. (1975) in The Plasma Proteins, 2nd Ed., Vol. 2 (Putnam, F.W., Ed.), Academic Press, New York, pp. 51-108.
Prochownik, E.V., Markham, A.F., and Orkin, S.H. (1983). J. Biol. Chem. 258, 8389-8394.
Proudfoot, J.M., and Brownlee, C.G. (1976). Nature (London) 265, 211-214.
Proudfoot, N. (1980). Nature (London) 286, 840-841.
Prozorovski, V.N., Rashkovetski, L.C., Shavlovski, M.M., Vasiliev, V.B., and Neifakh, S.A. (1982). Int. J. Pept. Protein Res. 10, 40-53.
Puchkova, L.V., Gaitskhoki, V.S., Monakhov, N.K., Timchenko, L.T., and Neifakh, S.A. (1981). Mol. Cell. Biochem. 35, 159-169.
Putnam, F.W. (1984) in The Plasma Proteins, 2nd Ed., Vol. 4 (Putnam, F.W., Ed.), Academic Press, New York, pp. 45-166.
Raju, K.S. (1983). Mol. Cell. Biochem. 56, 81-88.
Reinhammar, B., Malkin, R., Jensen, P., Karlsson, B., Andreasson, L.-E., Aasa, R., Vanngard, T., and Malmstrom, B.G. (1980). J. Biol. Chem. 255, 5000-5003.
Richards, J.E., Gilliam, A.C., Shen, A., Tucker, P.W., and Blattner, F.R. (1983). Nature (London) 306, 483-487.
Richardson, J.S., Thomas, K.A., Rubin, B.H., and Richardson, D.C. (1975). Proc. Natl. Acad. Sci. U.S.A. 72, 1349-1353.
Riddell, D.C., Mallonee, R., Phillips, J.A., Parks, J.S., Sexton, L.A., and Hamerton, J.L. (1985). Somatic Cell Mol. Genet. 11, 189-196.
Riddell, D.C., Wang, H.S., Beckett, H., Holden, J.J.A., Mulligan, L., Phillips, A., Simpson, N., Wrogemann, K., Hamerton, J.L., and White, B.N. (1986). Cytogenet. Cell Genet. 42, 123-128.
Rogers, J. (1985). Nature (London) 315, 458-459.
Royle, N.J., Irwin, D.M., Koschinsky, M.L., MacGillivray, R.T.A., and Hamerton, J.L. (1987). Somatic Cell Mol. Genet. 13, 285-292.
Russnak, R., and Candido, E.P.M. (1985). Mol. Cell. Biol. 5, 1268-1278.
Ryden, L. (1972). Eur. J. Biochem. 26, 380-386.
Ryden, L. (1982). Proc. Natl. Acad. Sci. U.S.A. 79, 6767-6771.
Ryden, L., and Bjork, I. (1976). Biochemistry 15, 3411-3417.
Sanger, F., Nicklen, S., and Coulson, A.R. (1977). Proc. Natl. Acad. Sci. U.S.A. 74, 5463-5467.
Sanger, F., Coulson, A.R., Barrell, B.G., Smith, A.J.H., and Roe, B.A. (1980). J. Mol. Biol. 143, 161-178.
Sargent, T.D., Zagodzinski, L.L., Yang, M., and Bonner, J. (1981). Mol. Cell. Biol. 1, 871-883.
Sarkar, B., and Wigfield, Y. (1968). Can. J. Biochem. 46, 601-607.
Scarpulla, R.C. (1984). Mol. Cell. Biol. 4, 2279-2288.
Scheinberg, I.H., and Gitlin, D. (1952). Science 116, 484-485.
Scheinberg, I.H., and Morell, A.G. (1973). Inorg. Biochem. 1, 306-309.
Schibler, U., Hagenbüchle, O., Wellauer, P.K., and Pittet, A.C. (1983). Cell 33, 501-508.
Schneider, J.C., and Guarente, L. (1987). Nucleic Acids Res. 15, 3515-3529.
Schreiber, C., Howlett, G., Nagashima, M., Millership, A., Martin, H., Urban, J., and Kotler, L. (1982). J. Biol. Chem. 257, 10271-10277.
Schreiber, G., Aldred, A.R., Thomas, T., Birch, H.E., Dickson, P.W., Guo-Fen, T., Heinrich, P.C., Northemann, W., Howlett, G.H., De Jong, F.A., and Mitchell, A. (1986). Inflammation 10, 59-66.
Schreiber, G. (1987) in The Plasma Proteins: Structure, Function and Genetic Control, 2nd Ed., Vol. V (Putnam, F.W., Ed.), Academic Press, Orlando, FL, pp. 293-363.
Sharp, P.A. (1981). Cell 23, 643-646.
Shen, S.H., Slightom, J.L., and Smithies, O. (1981). Cell 26, 191-203.
Shvartsman, A.L., Voronina, O.V., Gaitskhoki, V., Nadorshina, O.K., Timchenko, L.T., and Neifakh, S.A. (1985). Molecular Biology (translated from Russian) 19, 410-417.
Solomon, E.I., Hare, J.W., and Gray, H.B. (1976). Proc. Natl. Acad. Sci. U.S.A. 73, 1389-1393.
Southern, E.M. (1975). J. Mol. Biol. 98, 503-517.
Sperling, R., Furie, B.C., Blumenstein, M., Keyt, B., and Furie, B. (1978). J. Biol. Chem. 253, 3898-3906.
Staden, R. (1982). Nucleic Acids Res. 10, 4731-4751.
Starcher, B.C., and Hill, C.H. (1965). Comp. Biochem. Physiol. 15, 429-434.
Stevens, M.D., DiSilvestro, R.A., and Harris, E.D. (1984). Biochemistry 23, 261-266.
Suzuki, K., Dahlback, B., and Stenflo, J. (1982). J. Biol. Chem. 257, 6556-6564.
Takahashi, N., Bauman, R.A., Ortel, T.L., Dwulet, F.E., Wang, C.-C., and Putnam, F.W. (1983). Proc. Natl. Acad. Sci. U.S.A. 80, 115-119.
Takahashi, N., Ortel, T.L., and Putnam, F.W. (1984). Proc. Natl. Acad. Sci. U.S.A. 81, 390-394.
Tanaka, T., Ohkubo, H., and Nakanishi, S. (1984). J. Biol. Chem. 259, 8063-8065.
Tetaert, D., Takahashi, N., and Putnam, F.W. (1982). Anal. Biochem. 123, 430-437.
Toole, J.J., Knopf, J.L., Wozney, J.M., Sultzman, L.A., Buecker, J.L., Pittman, D.D., Kaufman, R.J., Brown, E., Shoemaker, C., Orr, E.C., Amphlett, G.W., Foster, W.B., Coe, M.L., Knutson, G.J., Fass, D.N., and Hewick, R.M. (1984). Nature (London) 312, 342-347.
Urbach, F.L. (1981) in Metal Ions in Biological Systems, Vol. 13 (Sigel, H., Ed.), Marcel Dekker, Inc., pp. 73-115.
Ueda, S., Nakai, S., Nishida, Y., Hisajima, H., and Honjo, T. (1982). EMBO J. 1, 1539-1544.
Van Arsdell, S.W., Denison, R.A., Bernstein, L.B., Weiner, A.M., Manser, T., and Gesteland, R.F. (1981). Cell 26, 11-17.
van Oost, B.A., Edgell, C.-J.S., Hay, C., and MacGillivray, R.T.A. (1986). Biochem. Cell Biol. 64, 699-705.
Vanin, E.F., Goldberg, G.I., Tucker, P.W., and Smithies, O. (1980). Nature (London) 286, 222-226.
Vanin, E.F. (1983) in Regulation of Hemoglobin Biosynthesis (Goldwasser, E., Ed.), Elsevier Biomedical, New York, pp. 51-68.
Vanin, E.F. (1985). Ann. Rev. Genet. 19, 253-272.
Varshney, U., and Gedamu, L. (1984). Gene 31, 135-145.
Vehar, G.A., Keyt, B., Eaton, D., Rodriguez, H., O'Brien, D.P., Rotblat, F., Oppermann, H., Keck, R., Wood, W.I., Harkins, R.N., Tuddenham, E.G.D., Lawn, R.M., and Capon, D.J. (1984). Nature (London) 312, 337-341.
Verbina, I.A., and Puchkova, L.V. (1985). Doklady Academii Nauk SSSR 283, 478-480.
Vieira, H., and Messing, J. (1982). Gene 19, 259-268.
von Heijne, G. (1982). J. Mol. Biol. 159, 537-541.
von Heijne, G. (1983). Eur. J. Biochem. 133, 17-21.
Wang, H., Koschinsky, M., and Hamerton, J.L. (1987). Cytogenet. Cell Genet., in press.
Wannemacher, R.W., Jr., Pekarek, R.S., Thompson, W.L., Curnow, R.T., Beall, F.A., Zenser, T.V., Derubertis, F.R., and Beisel, W.R. (1975). Endocrinology 96, 651-661.
Watson, M.E.E. (1984). Nucleic Acids Res. 12, 5145-5164.
Weiner, A.L., and Cousins, R.J. (1980). Biochim. Biophys. Acta 629, 113-125.
Weitkamp, L.R. (1983). Ann. Hum. Genet. 47, 293-297.
Wickens, M., and Stephenson, P. (1984). Science 226, 1045-1051.
Williams, R.J.P. (1971). Inorg. Chim. Acta Rev. 5, 137-155.
Wilson, A.C., Carlson, S.S., and White, T.J. (1977). Ann. Rev. Biochem. 46, 573-639.
Winnacker, E.-L. (1987) in From Genes to Clones: Introduction to Gene Technology (translated by Horst Ibelgaufts), VCH Publishers, New York, p. 151.
Woo, S.L.C. (1980). Methods Enzymol. 68, 389-395.
Wood, W.I., Capon, D.H., Simonsen, C.C., Eaton, D.L., Gitschier, J., Keyt, B., Seeburg, P.H., Smith, D.H., Hollingshead, P., Wion, K.L., Delwart, E., Tuddenham, E.G.D., Vehar, G.A., and Lawn, R.M. (1984). Nature (London) 312, 330-337.
Yang, F., Lum, J.B., McGill, J.R., Moore, C.M., Naylor, S.L., Van Bragt, P.H., Baldwin, W.D., and Bowman, B.H. (1984). Proc. Natl. Acad. Sci. U.S.A. 81, 2752-2756.
Yang, F., Naylor, S.L., Lum, J.B., Cutshaw, S., McCoombs, J.L., Naberhaus, K.H., McGill, J.R., Adrian, G.S., Moore, C.M., Barnett, D.R., and Bowman, B.H. (1986). Proc. Natl. Acad. Sci. U.S.A. 83, 3257-3261.
Young, R.A., and Davis, R.W. (1983a). Proc. Natl. Acad. Sci. U.S.A. 80, 1194-1198.
Young, R.A., and Davis, R.W. (1983b). Science 222, 278-282.
Young, R.A., Hagenbüchle, O., and Schibler, U. (1981). Cell 23, 451-458.
Yuzbasiyan-Gurkan, V.A., Brewer, G.J., and Boerwinkle, E. (1987). Supplement to Amer. J. Hum. Genet. 41, A193.
Zuckerkandl, E., and Pauling, L. (1965) in Evolving Genes and Proteins (Bryson, V., and Vogel, H.J., Eds.), Academic Press, New York, pp. 97-166.
