Open Collections

UBC Theses and Dissertations

Sequencing adjacent to BssHII sites on human chromosome 8 : a method of gene identification Debella, Leah Rae 1997

Full Text

SEQUENCING ADJACENT TO BssHII SITES ON HUMAN CHROMOSOME 8: A METHOD OF GENE IDENTIFICATION

by

LEAH RAE DEBELLA

BA, Sonoma State University, 1995

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE in THE FACULTY OF GRADUATE STUDIES, Department of Genetics

THE UNIVERSITY OF BRITISH COLUMBIA

August 1997

© Leah Rae DeBella, 1997

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

The University of British Columbia, Vancouver, Canada

Abstract

Considerable research is focused on the identification of genes in the human genome. Recently, the Human Genome Project has named gene identification as one of its goals, to be accomplished in tandem with mapping and sequencing of the entire genome. As a result, a large program centered on the creation of expressed sequence tags (ESTs) is currently underway. Although this method of gene identification has vastly increased the number of expressed sequences in the database, it is unlikely that all genes encoded in the genome will be detected through the sequencing of cDNAs. An alternative method of gene identification, using genomic DNA, is the sequencing of regions adjacent to rare cutting enzyme sites. Recognition sites of rare cutting enzymes are found infrequently in genomic DNA but cluster in regions known as CpG islands.
CpG islands are associated with most ubiquitously expressed genes and 40% of tissue specific genes. Therefore, sequencing of these regions offers a complementary method for recognizing novel genes. The objective of this project was to identify novel genes present in cloned DNA from human chromosome 8 using BssHII, a rare cutting enzyme. BssHII is an excellent choice for this purpose, as 80% of its sites are located in CpG islands. DNA adjacent to fourteen BssHII sites, originating from eleven cosmids, was sequenced. From this set the following were identified: two human ESTs; two open reading frames with no apparent homologues; two CpG islands, one of which contains a translation initiation signal and is 5' of the EST H88121; and two novel human genes. These results validate the hypothesis that this method can complement other techniques in the identification of novel genes in the human genome.

Table of Contents

Abstract
List of Tables
List of Figures
List of Abbreviations
Acknowledgments
Chapter I: Introduction
1.1 Features of GC Rich Regions in the Human Genome
1.2 Characteristics of CpG Islands
1.3 Rare Cutting Enzymes
1.4 Using CpG Islands to Identify Genes
1.5 Gene Identification
1.6 Project Proposal
Chapter II: Methods and Materials
2.1 Isolation of Cloned Human Genomic DNA - Minipreparation of Cosmid and Plasmid DNA
2.2 Restriction Digests
2.3 Electrophoresis
2.3.1 Agarose Gel
2.3.2 Polyacrylamide Gel
2.4 Exclusion of Cosmids Containing Alu Sequence
2.4.1 Southern Blot Hybridization
2.4.1.1 Synthesis of Radiolabeled Alu Probe by Primer Extension
2.4.1.2 Hybridization
2.4.2 Screening by Hybridization
2.4.2.1 3' Termini Labeling of T3 Probe with T4 Kinase
2.5 Construction of Deleted Cosmids
2.6 DNA Extraction from Agarose Gel
2.7 Subcloning of Cosmid Fragments
2.7.1 Preparation of pBluescript Vector
2.7.2 Ligation Reaction
2.8 Transformation of Competent E. coli
2.9 DNA Purification for Sequencing
2.10 Sequencing
2.10.1 Manual Sequencing
2.10.2 Automated DNA Sequencing
2.11 DNA Sequence Analysis
2.11.1 BLAST
2.11.2 GRAIL
2.11.3 PROSITE
2.12 Sequence Tagged Sites
2.12.1 GOR Gene
2.12.2 5-oxo-prolinase Gene
2.12.3 Human EST W90101
Chapter III: Results
3.1 Sequence Data
3.1.1 Cosmid 93C11ΔBssHII
3.1.2 Subclone from Cosmid 13E3
3.1.3 Subclones from Cosmid 40G1
3.1.4 Subclones from Cosmid 46F4
3.1.5 Subclone from Cosmid 46F11
3.1.6 Subclone from Cosmid 77C1
3.1.7 Subclone from Cosmid 141A6
3.1.8 Subclone from Cosmid 156G7
3.1.9 Subclone from Cosmid 166H7
3.1.10 Subclone from Cosmid 175G8
3.1.11 Subclone from Cosmid 176F5
3.2 Localization by STS Mapping
3.2.1 Localization of the Human GOR Gene
3.2.2 Localization of Human EST W90101
Chapter IV: Discussion
4.1 Gene Identification
4.2 DNA Adjacent to BssHII Sites
4.2.1 Repetitive DNA
4.2.1.1 Alu Elements
4.2.1.2 LINE1 Elements
4.2.1.3 VNTR and MER Elements
4.2.2 Identification of ESTs
4.2.3 Identification of Open Reading Frames
4.2.4 Identification of CpG Islands
4.2.5 Identification of Novel Human Genes
4.2.5.1 GOR Gene
4.2.5.2 5-Oxo-L-Prolinase Gene
4.3 Conclusions
References
Appendix 1: Sequence Obtained from Cosmids
Appendix 2: Web Site Addresses
Appendix 3: Article Submitted for Publication Prior to Completion of Thesis

List of Tables

Table 3.1 GenBank Non-Redundant Database Searches
Table 3.2 GenBank Sequence Tagged Sites (STSs) and Expressed Sequence Tag (ESTs) Database Search
Table 3.3 GenBank Expressed Sequence Tag (ESTs) Database Search with the GOR Gene
Table 3.4 GenBank Expressed Sequence Tag (ESTs) Database Search with the OPLAH Gene
Table 3.5 PROSITE Database Searches with the GOR Gene
Table 3.6 Protein Coding Potential of BssHII-EcoRI Subclones as Analyzed by GRAIL
Table 3.7 GC Content of BssHII-EcoRI Subclones from Cosmids
Table 4.1 RNA Splice Junction Sequences
Table 4.2 Potential TATA and CCAAT Boxes and Sp1 Binding Sites
Table 4.3 Summary of Data

List of Figures

Figure 2.1 Diagram of sCos1 vector
Figure 2.2 Digestion of cosmids with restriction enzymes
Figure 2.3 Detection of Alu repetitive sequence: Southern blot
Figure 2.4 Detection of Alu repetitive sequence: colony blot
Figure 2.5 Apparatus for the capillary transfer method of Southern blotting
Figure 2.6 Diagram of modified pBluescript vector pKSIIAsc
Figure 2.7 Creation of a deleted cosmid
Figure 2.8 Subcloning using the modified pKSIIAsc vector
Figure 2.9 Detection of vector
Figure 2.10 Restriction digests of subclones
Figure 3.1 Sequence comparison between genomic sequence from cosmid 166H7 and sequence from cDNA clone 220630
Figure 3.2 Sequence comparison between genomic sequence from cosmid 176F5 and sequence from cDNA clone 418116
Figure 3.3 Chromosomal localization of the human EST W90101 by PCR against somatic cell hybrids
Figure 3.4 GOR sequence comparison between human and chimpanzee for the region amplified by PCR primers
Figure 3.5 Chromosomal localization of the human GOR gene by PCR against somatic cell hybrids
Figure 4.1 Translations of ORFs predicted by GRAIL flanked by 3' and 5' splice sequences
Figure 4.2 CpG islands with TATA boxes and transcription start sequence highlighted in italics
Figure 4.3 GOR EcoRI fragment comparison between human and chimpanzee
Figure 4.4a GOR sequence comparison between human and chimpanzee beginning at the 5' EcoRI site
Figure 4.4b GOR sequence comparison between human and chimpanzee beginning at the 5' BssHII site
Figure 4.4c GOR sequence comparison between human and chimpanzee beginning at the second BssHII site
Figure 4.5 5-oxo-L-prolinase comparison between human and rat
Figure 4.6a 5-oxo-L-prolinase sequence comparison between human and rat for the first exon identified in the human sequence
Figure 4.6b 5-oxo-L-prolinase sequence comparison between human and rat for the second exon identified in the human sequence
Figure 4.6c 5-oxo-L-prolinase sequence comparison between human and rat for the third exon identified in the human sequence
Figure 4.6d 5-oxo-L-prolinase sequence comparison between human and rat for the fourth exon identified in the human sequence

List of Abbreviations

Amp^r: ampicillin resistant
B: BssHII
BLAST: basic local alignment search tool
bp: base pair
cDNA: complementary DNA
ColE1: plasmid origin of replication
cR: centiRay
dATP: deoxyriboadenosine 5'-triphosphate
dCTP: deoxyribocytidine 5'-triphosphate
ddATP: dideoxyriboadenosine 5'-triphosphate
ddCTP: dideoxyribocytidine 5'-triphosphate
ddGTP: dideoxyriboguanosine 5'-triphosphate
ddNTP: dideoxyribonucleotide 5'-triphosphate
ddTTP: dideoxyribothymidine 5'-triphosphate
dGTP: deoxyriboguanosine 5'-triphosphate
dH2O: deionized water
dNTP: deoxynucleotide 5'-triphosphate
dTTP: deoxyribothymidine 5'-triphosphate
E: EcoRI
ESTs: expressed sequence tags
g: gram
IPTG: isopropylthiogalactoside
kb: kilobase
L: liter
lacZ: β-galactosidase gene
M: molar
Mb: megabase
µCi: microcurie
MCS: multiple cloning site
MER: medium reiteration frequency sequence
µg: microgram
mg: milligram
µl: microliter
ml: milliliter
µM: micromolar
mM: millimolar
mRNA: messenger RNA
ng: nanogram
nm: nanometer
NTSB: nick translation stop buffer
OPLAH: 5-oxo-prolinase
ORF: open reading frame
PCR: polymerase chain reaction
pmol: picomole
rpm: revolutions per minute
SINE: short interspersed sequence retroposon
STSs: sequence tagged sites
U: unit
V: volts
VNTR: variable number tandem repeat
X-gal: 5-bromo-4-chloro-3-indolyl-β-D-galactoside
YAC: yeast artificial chromosome

Acknowledgments

To begin, I would like to thank Dr. Stephen Wood for this opportunity to pursue my interest in genetics. His support and guidance over the past two years have made my first step into this field an extremely rewarding one. I am also grateful for the time and valuable suggestions provided by Dr. Diana Juriloff and Dr. Ann Rose throughout my study. My gratitude goes to Mike Schertzer for his time, knowledge, and patience; to the rest of my lab, a special thank you for the encouragement and friendship they gave; and to Alex Parker for his continual reassurance and the interest he showed in my project. Lastly, I thank my family for the unconditional love and understanding that they have always provided.

Chapter I: Introduction

1.1 Features of GC Rich Regions in the Human Genome

Even before the Human Genome Project incorporated the identification of genes into its goals (Collins et al., 1993), genetic research was centered on the search for coding sequences. In humans this can be a difficult task because of the enormous size of our genome, only a small portion of which encodes genes. Estimates of the number of genes in the human genome, based on information from genomic sequencing and expressed sequence tag (EST) analysis (Fields et al., 1994), range from 50,000 to 100,000. Encoded sequences are not distributed evenly throughout the genome. Some chromosomes are rich in genes, whereas others contain very few transcribed sequences. As well, particular regions within chromosomes are richer in genes than others.
Chromosome banding techniques have shown chromosomes to be composed of two distinct types of domains, R bands and G bands. R bands coincide with early replicating, GC rich regions and contain 80% of identified genes, whereas G bands are AT rich and replicate later. A subset of R bands, the T bands, comprises the most GC rich regions. T bands represent 15% of the genome and contain approximately 65% of identified genes (Holmquist, 1992). Thus, GC rich regions possess a higher gene density than GC poor regions. According to Fields et al. (1994), GC rich areas contain an average of one gene every 23.4 kilobases. Gene poor regions have about a tenth of this density of coding DNA, or one gene every several hundred kilobases.

Distinct sequences, called CpG islands, are also found in GC rich regions. As with coding sequences, these islands are not distributed evenly throughout the genome. R bands contain the majority, with the highest concentration of CpG islands found within T bands. Not surprisingly, the distribution of CpG islands between and within chromosomes varies. This pattern was effectively demonstrated by Craig and Bickmore (1994) using CpG island fragments as fluorescence in situ hybridization probes against a complete set of metaphase chromosomes.

Associations between CpG islands and coding DNA had been hypothesized previously, owing to their common characteristics and distribution patterns within mammalian genomes. In 1984, several genes associated with CpG islands were identified in chickens and mice (Tykocinski and Max, 1984). An extensive examination of vertebrate coding sequences conducted three years later analyzed the abundance and location of CpG islands in relation to genes (Gardiner-Garden and Frommer, 1987). It was concluded, from the genes available at that time, that all ubiquitously expressed genes and many tissue-specific genes had CpG islands.
Islands located at the 5' end of a gene usually began upstream of the transcription start site, at distances ranging from less than 100 base pairs to over 2 kilobases. Four years later, the location of CpG islands in relation to genes in the human genome was examined (Aissani and Bernardi, 1991). In this study, over 80% of the CpG islands associated with genes were located at the 5' end of the gene. Most of these islands extended into coding DNA, with a few reaching as far as the 3' flanking sequences. Smaller fractions lay elsewhere: 8% of the islands lay entirely within a gene, 4% extended from within the coding sequence to the 3' end, and 2% were present in the 3' flanking DNA only. In 1992, Larsen et al. confirmed the findings of the previous two studies by analyzing over 400 sequenced human genes. As in the earlier studies, all ubiquitously expressed genes and 40% of tissue specific genes were associated with CpG islands. To date there is no established case in which a CpG island is not associated with a gene. Thus, from information collected in these studies and elsewhere, it can be inferred that CpG islands are excellent markers for genes. Moreover, because they are primarily located at the 5' end of coding sequences, CpG islands could act as landmarks in the identification of transcription start sites.

1.2 Characteristics of CpG Islands

HTF (HpaII tiny fragment) islands, or CpG islands, account for approximately 2% of the genome and are estimated to be present 45,000 times within it (Antequera and Bird, 1993). They vary in size from 500 to 2,000 base pairs, and all possess a number of distinctive properties distinguishing them from bulk genomic DNA. Throughout most of the genome, CpG dinucleotides are methylated, but they remain unmethylated in CpG islands.
As mentioned previously, CpG islands lie in GC rich regions and are themselves GC rich (60-70% G+C) (Bird, 1986), compared with bulk genomic DNA, in which only 40% of the nucleotides are guanine or cytosine. Most notably, the dinucleotide CpG occurs at only about one fifth of its expected frequency throughout most of the human genome (in DNA that is 40% G+C, the expected frequency is 0.2 × 0.2 = 0.04 per dinucleotide position), whereas within CpG islands CpG occurs at close to the frequency predicted from base composition (Cross and Bird, 1995).

To explain the drastic reduction of CpG dinucleotides in bulk genomic DNA, regions with high mutation rates were investigated. Coulondre et al. (1978) studied base substitution mutations in lacI, a highly mutable locus of Escherichia coli, and found that its mutational hotspots coincided with 5-methylcytosine residues. Cytosine deaminates spontaneously to uracil, and the uracil residue is rapidly excised by the enzyme uracil-DNA glycosylase, restoring proper base pairing. However, deamination of 5-methylcytosine yields thymine (5-methyluracil). The resulting G:T base pair is subject to normal mismatch repair but is not removed by uracil-DNA glycosylase. Because these G:T mismatches escape excision, 5-methylcytosine sites become hotspots of mutation compared with other sites in the genome (Coulondre et al., 1978).

If the transition of 5-methylcytosine to thymine has been occurring throughout the evolution of the human genome, it could explain the reduction of CpG dinucleotides in areas where cytosine is methylated. If so, a CpG deficiency caused by transitions to TpG and CpA (a C to T change on one strand converts CpG to TpG, which reads as CpA on the complementary strand) should be matched by an equivalent excess of these two dinucleotides in regions where CpG is depleted. To test this theory, Bird (1980) compared highly methylated vertebrate genomes with poorly methylated insect genomes. Vertebrate genomes show a greater deficiency of CpG dinucleotides than insect genomes.
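The depletion argument above rests on observed/expected (o/e) dinucleotide ratios, in which the expected count of a dinucleotide XpY is the count of X multiplied by the count of Y and divided by the sequence length (the formulation used by Gardiner-Garden and Frommer, 1987). The Python sketch below computes GC content and single-strand o/e ratios, so that the CpG deficit and the compensating TpG and CpA excess can be measured on any sequence. The function names are illustrative, and the example sequences are made up rather than taken from this study.

```python
from collections import Counter

# Expected CpG frequency per dinucleotide position in bulk DNA that is 40% G+C:
# P(C) * P(G) = 0.20 * 0.20 = 0.04
EXPECTED_CPG_BULK = 0.20 * 0.20


def gc_content(seq: str) -> float:
    """Fraction of G + C in a DNA sequence (case-insensitive)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def dinucleotide_oe(seq: str, dinuc: str) -> float:
    """Observed/expected ratio for a dinucleotide such as 'CG', 'TG' or 'CA'.

    expected = count(first base) * count(second base) / sequence length,
    counted on one strand only.
    """
    seq = seq.upper()
    n = len(seq)
    bases = Counter(seq)
    # Count overlapping occurrences of the dinucleotide.
    observed = sum(1 for i in range(n - 1) if seq[i:i + 2] == dinuc)
    expected = bases[dinuc[0]] * bases[dinuc[1]] / n
    return observed / expected if expected else 0.0


for dinuc in ("CG", "TG", "CA"):
    print(dinuc, round(dinucleotide_oe("ATATTGCATG", dinuc), 2))
```

Applied to long stretches of bulk human DNA, the CpG ratio comes out near the one-fifth figure quoted above, whereas island sequences score much closer to 1.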
Bird's comparison revealed that vertebrate genomes have an excess of TpG and CpA dinucleotides approximately equal to their depletion in CpG. In contrast, insect genomes have normal frequencies of these dinucleotides. These results support the hypothesis that deamination of 5-methylcytosine has led to decreased levels of CpG dinucleotides in the human genome.

1.3 Rare Cutting Enzymes

Because CpG dinucleotides are depleted in bulk genomic DNA, rare cutting enzymes can be used to locate regions possessing the unmethylated form of this dinucleotide. Enzymes of this type are called rare cutting because their recognition sequences, which include at least one CpG dinucleotide, are found infrequently within the genome. Many rare cutting enzymes can be used to identify CpG islands, but some are more effective than others (Bird, 1989). Because of the high GC content of island regions, rare cutting enzymes whose recognition sequence consists entirely of guanine and cytosine are more likely to cut within an island. Moreover, at least 75% of the sites for enzymes possessing two CpG dinucleotides are situated within CpG islands (Lindsay and Bird, 1987). Together these data suggest that rare cutting enzymes with recognition sequences consisting only of guanine and cytosine, and including at least two CpG dinucleotides, are the best choice for locating CpG islands.

With only a small portion of the human genome coding for genes, many of which are associated with CpG islands, an easy method for locating these landmark sequences would be invaluable. In chromosomal DNA, methylation blocks cleavage at CpG-containing enzyme sites outside CpG islands, assisting in the identification of islands by rare cutting enzymes. However, this advantage is lost in cloned DNA, where CpG is not methylated.
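The selection criteria for rare cutting enzymes (a recognition sequence made only of G and C that contains at least two CpG steps) are easy to check mechanically. The Python sketch below applies them to recognition sequences quoted in this chapter; the function names and the `min_cpg` threshold parameter are illustrative, EcoRI (GAATTC) is added only as an ordinary six-base cutter for contrast, and meeting the criteria does not by itself guarantee a high island fraction, which has to be measured (Larsen et al., 1992).

```python
def cpg_count(site: str) -> int:
    """Number of CpG steps (a C immediately followed by a G) in a site."""
    site = site.upper()
    return sum(1 for i in range(len(site) - 1) if site[i:i + 2] == "CG")


def favours_islands(site: str, min_cpg: int = 2) -> bool:
    """True if a recognition site is all G/C and has >= min_cpg CpG steps."""
    site = site.upper()
    return set(site) <= {"G", "C"} and cpg_count(site) >= min_cpg


SITES = {
    "BssHII": "GCGCGC",
    "SacII": "CCGCGG",
    "NotI": "GCGGCCGC",
    "AscI": "GGCGCGCC",
    "EcoRI": "GAATTC",  # ordinary six-base cutter, included only for contrast
}

for name, site in SITES.items():
    print(f"{name:7s} {site:9s} CpG={cpg_count(site)} "
          f"favours islands: {favours_islands(site)}")
```

All four rare cutters named in this chapter carry two CpG steps and no A or T, whereas the EcoRI site fails both tests.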
Lindsay and Bird (1987) used SacII, a rare cutting enzyme with the recognition sequence CCGCGG, to evaluate how effectively such an enzyme locates islands within cloned DNA. Four cosmids from a human X chromosome library were chosen because they contained a SacII site. To confirm that these sites were within CpG islands, DNA from germline and somatic tissues was analyzed for clustering of other rare cutting enzyme sites, as well as for methylation status. Each cosmid fragment contained a cluster of restriction sites and was found to be unmethylated in both tissue types. These fragments were also used as probes in Northern blots, three of which identified transcripts. Thus, SacII successfully identified CpG islands within cloned DNA, a result that can be extrapolated to rare cutting enzymes in general. Rare cutting enzymes can therefore be used to localize CpG islands in cloned DNA.

Five years later, Lindsay and Bird's (1987) results were verified through the analysis of human DNA sequences in the EMBL database (Larsen et al., 1992). Sequence data were examined for the presence of a number of rare cutting enzyme sites, and for each enzyme, estimates were made of the fraction of its sites located within CpG islands and the percentage of islands expected to contain its site. Of the twenty-six rare cutting enzymes examined, BssHII stands out as one of the most effective. BssHII, with the recognition sequence GCGCGC, had 83% of its sites in CpG islands and a site in 58% of the islands examined. The BssHII recognition sequence is estimated to be present 17,400 times in genomic DNA, which amounts to a site roughly every 170 kilobases. However, the average fragment size in this study was 500 kilobases, due to loss of small fragments.
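The figure of one site roughly every 170 kilobases comes from dividing genome size by the estimated number of sites. The Python sketch below makes that arithmetic explicit, assuming a haploid genome of about 3.0 × 10^9 base pairs (a round figure that the text does not state), and adds a simple scanner that locates a recognition site in a sequence. Because GCGCGC is its own reverse complement, scanning one strand finds every BssHII site; the function names and the toy sequence are illustrative.

```python
import re

GENOME_BP = 3.0e9        # assumed haploid human genome size (round figure)
BSSHII_SITES = 17_400    # estimated number of BssHII sites (Larsen et al., 1992)


def mean_spacing_kb(genome_bp: float, n_sites: int) -> float:
    """Average distance between sites, in kilobases, if sites were evenly spread."""
    return genome_bp / n_sites / 1_000


def find_sites(seq: str, site: str) -> list[int]:
    """0-based start positions of every (possibly overlapping) match of site."""
    pattern = f"(?={re.escape(site.upper())})"  # lookahead allows overlaps
    return [m.start() for m in re.finditer(pattern, seq.upper())]


spacing = mean_spacing_kb(GENOME_BP, BSSHII_SITES)
print(f"Expected spacing: {spacing:.0f} kb")  # about 172 kb
print(find_sites("ATGCGCGCGCTT", "GCGCGC"))   # two overlapping sites
```

The even-spacing value is only a baseline: clustering of sites in GC rich DNA makes real fragment sizes far more variable.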
This discrepancy between the predicted and observed fragment sizes indicates that BssHII sites are not evenly distributed over the entire genome, but instead are clustered in GC rich regions, resulting in a wide range of fragment sizes when the enzyme is used to cut large regions of DNA.

1.4 Using CpG Islands to Identify Genes

CpG islands have been shown to be associated with approximately 57% of identified genes (Larsen et al., 1992), most often near the 5' end. Thus, by identifying a CpG island, an associated gene may also be uncovered. In fact, DNA adjacent to rare cutting sites throughout the genome has been cloned and used to identify novel genes in GC rich regions. Molecular searches for genes within a 2 megabase (Mb) YAC contig spanning Xp22.3 began with the identification of several CpG islands (Lee et al., 1994). Two were sequenced, and both contained novel human genes. Transcripts of one of these genes (GS2) are found in all human tissues, typical of most CpG island associated genes. GS2 extends over 26 kilobases, with the CpG island located within the first of its seven exons.

A study of chromosome 21 sequenced regions adjacent to 16 NotI sites (Zhu et al., 1993). NotI is a rare cutting enzyme that recognizes the sequence GCGGCCGC, with approximately 90% of its sites found within CpG islands (Lindsay and Bird, 1987). Using GRAIL (Uberbacher et al., 1991), a program that predicts protein coding regions, five of the clones were found to have high potential as coding regions, with seven others having lower probabilities. However, none of the sequences adjacent to the NotI sites had extensive homology with sequences present in the database. A difficulty with using NotI to identify CpG islands is that, although most of its sites are located within CpG islands, NotI recognition sequences are found so infrequently that only 16% of islands contain a site (Larsen et al., 1992). As a result, many CpG islands remain unidentified, as do their associated genes.
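GRAIL scores coding potential with a trained pattern-recognition model, which is not reproduced here. A much simpler and cruder signal of coding potential is an open reading frame: a stretch that starts with ATG and runs, codon by codon, to an in-frame stop codon. The Python sketch below scans the three forward reading frames of one strand; the function name, the `min_codons` cut-off and the example sequences are illustrative assumptions, and a complete scan would repeat the search on the reverse complement.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 30) -> list[tuple[int, int]]:
    """Return (start, end) of ATG-initiated ORFs on the forward strand.

    end is exclusive and includes the stop codon. Only ORFs with at least
    min_codons codons (stop codon included) are reported.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens a reading frame
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # look for the next ATG in this frame
    return sorted(orfs)


print(find_orfs("CCATGTTTTGACC", min_codons=2))
```

Unlike GRAIL, this scan ignores codon usage, splice signals and exon structure, so in genomic DNA it flags many frames that are not coding and misses exons that lack an ATG.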
The limited coverage of NotI was overcome by using the rare cutting enzymes EagI and SacII in a search for genes on chromosome 4 (John et al., 1994). Over 76% of the restriction sites for these two enzymes are located within CpG islands, and more than 57% of CpG islands contain an EagI or SacII site (Larsen et al., 1992). Forty-two regions adjacent to a rare cutting enzyme site were identified and cloned from a cosmid contig spanning one million base pairs in the region containing the Huntington disease gene. Seventeen of these clones were found to encode genes. In addition, computer searches using BLAST (Altschul et al., 1990) and GRAIL (Uberbacher et al., 1991) identified nine other clones as potentially containing coding sequences, and confirmed fifteen of the seventeen previously identified. Thus, cloning fragments adjacent to a single rare cutting enzyme site is an effective method of identifying genes within cloned DNA.

1.5 Gene Identification

Many techniques are available to identify genes in the human genome. One method being undertaken as part of the Human Genome Project is the sequencing of expressed sequence tags (Adams et al., 1991; Hillier et al., 1996). Expressed sequence tags, or ESTs, are segments of sequence from cDNA clones reverse transcribed from mRNA. Libraries of cDNA clones have been created from many different tissue types and stages of development in an attempt to obtain representatives of each gene. Normalized libraries, in which the frequencies of all clones fall within a narrow range, have also been created (Bonaldo et al., 1996), decreasing the redundancy of cDNAs for a given transcript. Currently, over 709,530 human-derived ESTs are present in the dbEST database at the US National Center for Biotechnology Information (NCBI). This is largely a result of the Merck IMAGE consortium's effort to create these sequences and make them available (Hillier et al., 1997), although ESTs from other sources are also present.
Despite the tremendous amount of sequence being generated from coding DNA, transcripts expressed at low levels, or only in particular tissues or stages of development, are selected against in the sequencing of ESTs. Several other approaches are also being used to locate expressed sequences. One method, cDNA selection, involves the hybridization of an amplified cDNA library to YAC or cosmid clones immobilized on nylon membranes (Parimoo et al., 1991). cDNA inserts that hybridize to the cloned genomic DNA are eluted, amplified, cloned, and sequenced. This strategy is useful as a method of enrichment for expressed sequences from a genomic region of interest; however, it is not a practical technique for locating novel genes throughout the genome. Another approach, exon amplification, uses the functional sequences required for RNA splicing to isolate expressed sequences from genomic DNA (Duyk et al., 1990; Buckler et al., 1991). Cloned genomic DNA fragments are ligated into a plasmid vector containing splicing sequences. Ideally, only inserted genomic DNA containing an exon will be spliced properly into mature mRNA when present in mammalian cells in culture. However, genomic fragments with cryptic splice acceptor sites can also be identified through this method.

Many of the strategies currently being used to identify genes in the human genome involve the sequencing of expressed sequences. As a result, information from intronic and intergenic regions, such as control and regulatory sequences, is not obtained. Identification of genes in genomic DNA through the use of rare cutting enzymes has the advantage of locating genes and their associated sequences independently of the time and location of expression.

1.6 Project Proposal

The majority of genes in the human genome are associated with distinctive sequences known as CpG islands. These regions of GC rich, unmethylated DNA are distributed unevenly throughout the genome.
Fluorescence in situ hybridization studies have localized a large proportion of these landmark sequences to the R bands of chromosomes (Craig and Bickmore, 1994). Chromosome 8 has GC rich R bands distributed from one end to the other, interspersed with G bands (Holmquist, 1992). Additionally, the telomeric region of the long arm of chromosome 8 consists of T bands, which correlate with gene rich regions of the genome. Therefore, high GC content, one of the characteristics of coding DNA, indicates that human chromosome 8 contains many gene rich regions.

Another feature of CpG islands is that they can be identified by rare cutting enzymes. Initially, a cluster of rare cutting sites was taken to signify a CpG island. However, it is now recognized that a large percentage of islands can be identified with a single rare cutting enzyme (Larsen et al., 1992). Rare cutting enzymes vary in their effectiveness at locating CpG islands. Recognition sequences that consist entirely of guanine and cytosine are more often found within islands than those containing adenine and thymine. In addition, enzymes with two or more CpG dinucleotides have over 80% of their recognition sites within CpG islands (Larsen et al., 1992). BssHII is a rare cutting enzyme with a six base pair recognition sequence that possesses both of these traits. BssHII has an advantage over other restriction enzymes in this category, such as NotI and AscI, as it cuts more frequently and so is present in a larger percentage of CpG islands. Rare cutting enzymes such as NotI, SacII and EagI have been used successfully in cloned DNA to identify coding sequences (Zhu et al., 1993; Lee et al., 1994; John et al., 1994). The purpose of this project is to test whether the rare cutting enzyme BssHII can be used to identify novel genes in cosmids from a chromosome 8 library (Wood et al., 1992). For this project, cosmids possessing a BssHII site within their human insert sequence were chosen for analysis. The regions flanking the restriction site were subcloned into a plasmid vector. This was accomplished by digesting each cosmid with EcoRI and BssHII. EcoRI restriction sites flank the human sequence embedded in the cosmid vector DNA, so digestion with BssHII and EcoRI converts the DNA adjacent to each BssHII site into either a BssHII-EcoRI fragment or a BssHII-BssHII fragment. These fragments of DNA were subcloned and sequenced. The sequence was analyzed using BLAST (Altschul et al., 1990), a program that searches for sequence identity matches against sequences present in the GenBank database. Additional analysis was conducted using GRAIL (Uberbacher et al., 1991), a program that recognizes features unique to coding sequences.

The subset of cosmids used for this project was selected from the LA08NC01 chromosome 8 library (Wood et al., 1992). This library was constructed from a human x hamster cell line retaining human chromosomes 4, 8, and 21. Chromosome 8 was isolated by fluorescence-activated flow sorting. Fragments of human DNA, averaging 36.5 kilobases, were cloned into the sCos-1 vector and used to transfect E. coli cells. To minimize the possible coligation of non-contiguous sequences in the genome, the flow-sorted insert DNA was dephosphorylated. Of the clones in the LA08NC01 library, 85% are human and 9% hamster, with the remaining 6% originating from neither species. At this time no chimeric cosmids have been detected. Cosmids used in this project were selected with a probe for the recognition sequence of AscI (GGCGCGCC), another rare cutting enzyme used to identify CpG islands. Because the cosmids were not randomly selected, predictions cannot be made as to the percentage expected to contain a BssHII site.
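The BssHII/EcoRI double digest described above can be modelled on a linear insert: cut at every BssHII site (GCGCGC) and at the EcoRI sites that bound the insert, then label each fragment by the enzymes that produced its two ends. The Python sketch below does this for made-up sequences and rests on several stated simplifications: the EcoRI recognition sequence (GAATTC) is supplied here rather than taken from the text, each cut is placed at the start of its recognition site rather than at the enzyme's true cleavage point, and sequence outside the outermost cuts stands in for vector DNA and is ignored.

```python
import re

SITES = {"BssHII": "GCGCGC", "EcoRI": "GAATTC"}


def cut_positions(seq: str) -> list[tuple[int, str]]:
    """(position, enzyme) for every recognition site, sorted by position.

    Simplification: the cut is placed at the start of the site, not at the
    enzyme's true cleavage point within it.
    """
    seq = seq.upper()
    cuts = [(m.start(), name)
            for name, site in SITES.items()
            for m in re.finditer(f"(?={site})", seq)]
    return sorted(cuts)


def classify_fragments(seq: str) -> list[tuple[str, int]]:
    """Label each fragment between adjacent cuts by its end types and length."""
    cuts = cut_positions(seq)
    fragments = []
    for (left, e1), (right, e2) in zip(cuts, cuts[1:]):
        label = "-".join(sorted((e1, e2)))  # e.g. 'BssHII-EcoRI'
        fragments.append((label, right - left))
    return fragments


insert = "GAATTC" + "A" * 10 + "GCGCGC" + "T" * 10 + "GAATTC"
print(classify_fragments(insert))
```

Fragments labelled BssHII-EcoRI or BssHII-BssHII carry sequence adjacent to a BssHII site and are the ones worth subcloning; an insert yielding only an EcoRI-EcoRI fragment has no BssHII site.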
Because approximately 80% of BssHII restriction sites lie within CpG islands, approximately 80% of the BssHII sites identified in these cosmids are expected to lie within CpG islands. Therefore, it is predicted that this method will be efficient for identifying regions of DNA associated with genes.

Chapter II: Methods and Materials

Prior to the start of this research project, cosmids were selected from the LA08NC01 chromosome 8 library (Wood et al., 1992) for the presence of an AscI rare cutting enzyme site. This was accomplished by probing high density filters containing each cosmid with a radiolabeled probe, NNNNGGCGCGCC, comprising the AscI recognition sequence. The cosmids examined in this project were a subset of the selected cosmids that did not contain an AscI site.

2.1 Isolation of Cloned Human Genomic DNA - Minipreparation of Cosmid and Plasmid DNA

Cosmids from the LA08NC01 chromosome 8 library (Wood et al., 1992) were used to inoculate 5 ml of LB medium containing the antibiotic ampicillin (0.25 mg/ml). Each culture was incubated overnight at 37°C with vigorous shaking. DNA was prepared using a modification of the protocols described by Birnboim and Doly (1979) and Ish-Horowicz and Burke (1981). To begin, 1.5 ml of the culture was transferred into an Eppendorf tube and centrifuged (13,000 rpm) for 5 minutes at 4°C. The liquid medium was removed by aspiration, leaving a pellet of bacteria containing genomic DNA. The pellet was completely resuspended in 100 µl of ice cold Solution I containing lysozyme (4 ng/ml), added just before use; the lysozyme in this solution digests the bacterial cell wall. To this mixture, 200 µl of freshly prepared Solution II was added and mixed thoroughly, and the tube was stored on ice. Then 150 µl of ice cold Solution III was added and mixed completely, and the tube was returned to ice for 5 minutes.
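The Solution III recipe given at the end of this section (60 ml 5 M potassium acetate, 11.5 ml glacial acetic acid, 28.5 ml water) is stated to be 3 M with respect to potassium and 5 M with respect to acetate. The Python sketch below checks that arithmetic. It assumes that glacial acetic acid is about 17.4 M, a standard handbook value that the text does not give, and, as the stated 5 M figure does, it counts the added acetic acid toward the acetate total.

```python
KOAC_STOCK_M = 5.0        # potassium acetate stock, mol/L (from the recipe)
GLACIAL_ACETIC_M = 17.4   # assumed molarity of glacial acetic acid


def solution_iii(koac_ml=60.0, acetic_ml=11.5, water_ml=28.5):
    """Return (total volume in ml, [K+] in M, total acetate in M)."""
    total_ml = koac_ml + acetic_ml + water_ml
    k_mmol = KOAC_STOCK_M * koac_ml                       # mol/L x ml = mmol
    acetate_mmol = k_mmol + GLACIAL_ACETIC_M * acetic_ml  # acetate + acetic acid
    return total_ml, k_mmol / total_ml, acetate_mmol / total_ml


total, potassium, acetate = solution_iii()
print(f"{total:.0f} ml: {potassium:.1f} M K+, {acetate:.1f} M acetate")
```

The 300 mmol of potassium acetate sets the 3 M potassium concentration in 100 ml, and the roughly 200 mmol of acetic acid brings total acetate to about 5 M.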
These solutions remove bacterial cell wall debris and denature the protein-polysaccharide complex, while leaving the nucleic acids in solution. Each tube was then centrifuged for 5 minutes at 4°C, and the supernatant was transferred to a clean tube, leaving behind the protein-polysaccharide complex. To precipitate the DNA, an equal volume of isopropanol was added for cosmids, or 2 volumes of 95% ethanol for plasmids. The nucleic acids were collected by centrifugation, the supernatant removed, and the pellet rinsed with 1 ml of ice-cold 70% ethanol. Each tube was centrifuged briefly, the supernatant was again removed, and the pellet was dried. Each pellet of nucleic acids was redissolved in 50 µl of TE (pH 8.0) containing RNase (20 µg/ml) and incubated at 37°C for 30 minutes to allow the RNase to digest any RNA present in the DNA minipreparation.

LB Medium
400 ml dH2O in 1 L Erlenmeyer flask
4 g peptone (tryptone)
2 g yeast extract
2 g NaCl
0.4 g D-glucose
Dispense 5 ml of medium into 15 ml tubes and autoclave for 20 minutes.

Solution I
50 mM glucose
25 mM Tris-Cl (pH 8.0)
10 mM EDTA (pH 8.0)
Add dH2O to bring the volume up to 200 ml, autoclave for 15 minutes, and store at 4°C.

Solution II
0.2 N NaOH, freshly diluted from a 10 N stock with dH2O
1% SDS

Solution III
5 M potassium acetate 60 ml
glacial acetic acid 11.5 ml
dH2O 28.5 ml
The resulting solution is 3 M with respect to potassium and 5 M with respect to acetate.

TE
10 mM Tris-HCl, pH 8.0
1 mM EDTA

2.2 Restriction Digests

Restriction digests were used to determine whether a cosmid contained a BssHII site within the human insert. This was accomplished by digesting each selected cosmid with BssHII. The cosmid vector itself contains a BssHII site (Figure 2.1).
If the human insert also contains a site, at least two bands of DNA will be detected on the agarose gel (described in section 2.3.1). A number of cosmids were found to lack an insert fragment or to have no BssHII site within the insert sequence (as seen with cosmid 150F9, Figure 2.2). Cosmids of this type were presumably selected because of the BssHII site contained within the vector DNA, and were excluded from the set of cosmids to be used. After confirming that a BssHII site was present within the human insert sequence, the cosmid was digested with BssHII and EcoRI.

Each of these digests was performed using the following protocol. Approximately 500 ng of DNA, purified using the minipreparation protocol described in section 2.1, was placed in an Eppendorf tube. To each DNA sample, 2 µl of 10X BSA, 2 µl of the appropriate buffer, and dH2O were added, bringing the total volume to 20 µl. Half a unit of the appropriate restriction enzyme was added, and the reaction was mixed well and incubated at the appropriate temperature for 1.5 hours. After digestion, 5 µl of gel loading buffer was added to each Eppendorf tube, and the DNA fragments were analyzed on a 0.7% agarose gel (as described in section 2.3).

10X BSA
bovine serum albumin fraction V, 1 mg/ml

Restriction Enzymes
BssHII (BioLabs; 4 U/µl); NEBuffer for BssHII, used for BssHII alone and for the BssHII/EcoRI double digest. Each reaction was incubated at 50°C.
EcoRI (GibcoBRL; 10 U/µl); 10X REact 3 (GibcoBRL). The reaction was incubated at 37°C.

Gel Loading Buffer (6X)
0.25% xylene cyanol
0.25% bromophenol blue
40% sucrose in dH2O
60 mM EDTA

2.3 Electrophoresis

Electrophoresis is an effective method for separating DNA fragments according to their molecular size.
In this project, agarose gels were used to separate the fragments produced by restriction enzyme digests, to identify and purify fragments, and as the first step in preparing Southern blots. Polyacrylamide gels can separate fragments differing in size by as little as one base pair, and were used in this project to visualize the DNA fragments produced by manual sequencing.

2.3.1 Agarose Gel

0.7% agarose gels were prepared by adding powdered agarose (GibcoBRL) to 1X TBE buffer. At this agarose concentration, linear DNA molecules ranging in size from 800 base pairs to 10 kilobases are best resolved. After the agarose had dissolved in the boiling buffer and cooled, ethidium bromide was added to a final concentration of 0.5 mg/ml. The gel was poured into a mold, allowed to set, and placed in the electrophoresis apparatus, which was filled with 1X TBE buffer. Each sample, containing 5 µl of gel loading buffer, was loaded into a well of the gel. 5 µl (100 ng/µl) of λ DNA marker was electrophoresed simultaneously in an adjacent lane, providing a ladder of known fragment sizes. Voltage was applied across the gel until the marker dyes had migrated the desired distance, after which the DNA fragments were visualized under ultraviolet light (wavelength 302 nm).

1X TBE
89 mM Tris-borate
2 mM EDTA (pH 8.0)

λ DNA Marker
λ DNA (500 ng/µl) 40 µl
10X BSA 16 µl
10X REact 2 16 µl
MilliQ H2O 88 µl
HindIII 5 µl
SstII 5 µl
The digestion reaction proceeded for 1.5 hours, after which 40 µl of gel loading buffer was added. The Eppendorf tube was stored at 4°C.

2.3.2 Polyacrylamide Gel

5% polyacrylamide gels were cast using 30% acrylamide, dH2O, 5X TBE, and 10% ammonium persulfate. This solution was poured between two glass plates separated by a thin spacer. A comb was inserted into the gel, and the gel was left to polymerize for 45 minutes.
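The casting step names the stock solutions for the 5% gel but not their volumes. As a minimal sketch of the dilution arithmetic (C1V1 = C2V2), the volumes can be worked out as below. The 60 ml gel volume, the assumption that the 5X TBE stock is brought to 1X in the gel, and the ammonium persulfate volume passed in by the caller are illustrative assumptions, not values from this protocol; the function names are hypothetical.

```python
def stock_volume(final_conc: float, stock_conc: float, final_volume: float) -> float:
    """C1 * V1 = C2 * V2, solved for the stock volume V1."""
    return final_volume * final_conc / stock_conc


def gel_recipe(gel_ml: float, acrylamide_pct: float = 5.0,
               aps_ml: float = 0.0) -> dict[str, float]:
    """Volumes (ml) for a polyacrylamide gel cast from a 30% acrylamide stock.

    Assumes the 5X TBE stock is brought to 1X in the gel; the 10% ammonium
    persulfate volume is supplied by the caller rather than taken from the protocol.
    """
    acrylamide_ml = stock_volume(acrylamide_pct, 30.0, gel_ml)
    tbe_ml = stock_volume(1.0, 5.0, gel_ml)
    water_ml = gel_ml - acrylamide_ml - tbe_ml - aps_ml
    if water_ml < 0:
        raise ValueError("components exceed the requested gel volume")
    return {"30% acrylamide": acrylamide_ml, "5X TBE": tbe_ml,
            "10% ammonium persulfate": aps_ml, "dH2O": water_ml}


recipe = gel_recipe(60.0)  # hypothetical 60 ml gel
for component, ml in recipe.items():
    print(f"{component}: {ml:.1f} ml")
```

For a 60 ml gel this gives 10 ml of 30% acrylamide and 12 ml of 5X TBE, with water making up the remainder once the persulfate volume is chosen.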
After the acrylamide had polymerized, the gel was placed in a vertical electrophoresis tank filled with 1X TBE buffer. 4 µl from each of the four sequencing reactions, prepared as described in section 2.10.1, were loaded into the wells in the order T, G, C, A. The gel was run at 50 V/cm until the marker dyes had migrated to the bottom of the gel, approximately 1.5 hours. After the apparatus was dismantled, Whatman 3MM paper was laid on the sequencing gel so that the polyacrylamide gel could be lifted from the glass plate. The gel was covered with a second piece of paper and dried for approximately 20 minutes on a vacuum gel dryer at 80°C. The fragments produced by the sequencing reactions were visualized by exposing the radioactive DNA bands to film.

30% Acrylamide
29 g acrylamide
1 g N,N'-methylenebisacrylamide
H2O to 100 ml
Heat the solution to 37°C to dissolve the chemicals.

10% Ammonium Persulfate
1 g ammonium persulfate
H2O to 10 ml
Store the solution at 4°C.

2.4 Exclusion of Cosmids Containing Alu Sequence

93C1ΔBssHII, the first cosmid to be successfully sequenced, was found to contain an Alu element. Alu is a family of repetitive DNA present in more than 500,000 copies in the human genome (Deininger et al., 1981). A BssHII site is present in the consensus sequence of the progenitor Alu gene, as well as within the subfamilies Alu-J, Alu-Sc, and Alu-Sa (Labuda and Striker, 1989). Two methods were used to avoid further sequencing of these repetitive regions. The first used the transfer technique first described by Southern (1975) (section 2.4.1), with Alu DNA as a probe on blots of restriction digests (Figure 2.3). The second allowed a large number of bacterial colonies containing a constructed deleted cosmid (section 2.4.2) to be screened for Alu sequence (Figure 2.4).
BssHII-EcoRI and BssHII-BssHII fragments found to contain an Alu element were excluded from the set of DNA fragments to be sequenced.

2.4.1 Southern Blot

In this technique, DNA from each cosmid was digested with restriction enzymes and separated according to size by electrophoresis on an agarose gel. The DNA was then denatured and transferred to a solid support; in this project, nylon transfer membranes were used. DNA attached to the membrane was then hybridized to radiolabeled Alu sequence, identifying any bands complementary to this sequence. After electrophoresis, the gel was transferred to a glass baking dish containing several volumes of 1.5 M NaCl, 0.5 N NaOH and soaked in this solution for 30 minutes to denature the DNA. It was then rinsed in deionized water (dH2O) and neutralized for 30 minutes by soaking in 1 M Tris (pH 7.4), 1.5 M NaCl. To transfer the DNA to the nylon membrane, it must be eluted from the agarose gel and deposited on the membrane. This was accomplished using the capillary transfer method (described by Sambrook and Maniatis, 1989), in which a flow of liquid, drawn by capillary action established and maintained by a stack of dry paper towels, carries the DNA fragments from the gel to the membrane (Figure 2.5). The apparatus was assembled with the membrane (Hybond-N+ nucleic acid transfer membrane) placed on top of the gel, and was left overnight to allow complete transfer of the DNA to the solid support. Once the transfer was complete, the membrane was baked for 2 hours at 80°C in a vacuum oven to fix the DNA to the membrane.

2.4.1.1 Synthesis of Radiolabeled Alu Probe by Primer Extension

A fragment of Alu was obtained for use as a probe by digesting plasmid p51A8EBa8 with EcoRI, using the method described in section 2.2.
The digest was electrophoresed on a 0.7% agarose gel for 1.5 hours at 60 volts. The 800 base pair insert fragment containing an Alu element was excised from the gel and placed in the well of a low melting point agarose gel (Sigma Type VII: Low Gelling Temperature Agarose). After the DNA had entered the gel, the band was excised again and placed in an Eppendorf tube, to which 200 µl of TE was added. The solution was boiled for 5 minutes and stored at 4°C until needed. 15 µl of the purified linear Alu fragment was pipetted into an Eppendorf tube, boiled for 5 minutes to denature the DNA, and placed on ice for 5 minutes. To this were added 2.5 µl of 10X BSA, 5.0 µl OLB-A, 2.5 µl [α-³²P]dATP (10 µCi/µl), and 1 µl (1 U/µl) of Klenow fragment (Pharmacia Biotech) (Sambrook et al., 1989). The tube was centrifuged briefly and left to incubate overnight at room temperature. The reaction was stopped by adding an equal volume of Nick Translation Stop Buffer (NTSB). The probe was then boiled for 5 minutes and placed on ice for 5 minutes.

5X OLB-A
solution O: 1.25 M Tris-Cl pH 8.0 and 0.125 M MgCl2
solution A: 1 ml solution O, 18 µl β-mercaptoethanol, 5 µl of each dNTP (G/C/T); each dNTP is 0.1 M in 3 mM Tris pH 7.0, 0.2 mM EDTA
solution B: 2 M HEPES titrated to pH 6.6 with 4 M NaOH
solution C: hexadeoxyribonucleotides (50 units, Pharmacia 27-2166-01) suspended in 550 µl TE to 90 OD units/ml
Solutions A:B:C are mixed in the ratio 100:250:150.

NTSB
50 mM EDTA
20 mM NaCl
0.1% SDS
500 µg/ml salmon sperm DNA

2.4.1.2 Hybridization

After the nucleic acids had been fixed to the membrane, as described in section 2.4.1, each membrane was placed in a heat-sealable bag, and approximately 4 ml of warm (65°C) prehybridization solution was added. The bag was sealed and incubated for 1 hour submerged in a 65°C water bath.
After an hour of prehybridization, during which blocking agents bind to the membrane and suppress background hybridization signals, the denatured probe (prepared as described in section 2.4.1.1) was added to the prehybridization solution. The bag was then resealed twice, to avoid radioactive contamination of the water bath, and resubmerged in 65°C water. After hybridization was complete, each membrane was removed from the bag and immersed in 1X SSC, 0.1% SDS for 15 minutes, then washed in 0.2X SSC, 0.1% SDS for 45 minutes. Each washed membrane was wrapped in plastic film and exposed to X-ray film. The film was developed, and bands of DNA hybridizing to the probe were identified.

Prehybridization Solution
6X SSC
5X Denhardt's reagent
0.5% SDS
100 µg/ml salmon sperm DNA

2.4.2 Screening by Hybridization

A method for lysing bacterial colonies on transfer membranes and attaching their DNA to the membranes was first described by Grunstein and Hogness (1975). Once fixed, these nucleic acids can be hybridized to a radiolabeled probe, a technique first described by Gillespie and Spiegelman (1965). This method allows a large number of bacterial colonies to be screened for cosmid or plasmid DNA containing a sequence of interest. In this project, it was used to identify cosmids containing Alu sequence, and to identify bacterial colonies containing fragments subcloned into the pKSIIAsc vector (discussed in section 2.7), thereby eliminating false positives. A transfer membrane (NEF-978 Colony/Plaque Screen Hybridization Transfer Membrane) was placed on an agar plate containing ampicillin. Sterile toothpicks were used to transfer each bacterial colony onto the filter and onto a master agar plate containing ampicillin. The colonies were numbered and placed in identical positions on each plate, allowing them to be easily identified after hybridization.
Both plates were inverted and incubated overnight at 37°C, after which the master plate was sealed with Parafilm and stored at 4°C. Two pieces of Whatman 3MM paper were cut, placed in the bottoms of trays, and each saturated with one of two solutions: the first with denaturing solution (1.5 M NaCl, 0.5 N NaOH), the second with neutralizing solution (1.5 M NaCl, 0.5 M Tris-Cl [pH 7.4]). Each transfer membrane was placed on the denaturing paper for 8 minutes and then on the neutralizing paper for 8 minutes, lysing the bacterial colonies growing on the membrane. Lastly, each filter was placed in dH2O and wiped gently with wet tissue paper to remove excess cell debris. DNA was fixed onto the membranes by baking for 2 hours at 80°C in a vacuum oven. Hybridization of radiolabeled probes to these membranes is discussed in section 2.4.2.1; Alu probes were prepared as described in section 2.4.1.1, and pKSIIAsc probes as described in section 2.4.2.1. Once colonies were found to contain Alu or pKSIIAsc sequence, the appropriate colonies were picked from the master plate and used to inoculate LB medium.

Agar Plates Containing Ampicillin
400 ml dH2O in 1 L Erlenmeyer flask
4 g BactoPeptone (DIFCO laboratories 0118-17-0)
2 g Yeast Extract (DIFCO laboratories 0127-17-9)
2 g NaCl (BDH Inc.)
0.4 g Dextrose (FisherChemical D16-500)
4.8 g agar (BDH Inc.)
Autoclave for 20 minutes. Allow the solution to cool, add 20 mg ampicillin, pour plates, and store at 4°C.

2.4.2.1 5' Terminal Labeling of T3 Probe with T4 Kinase

Fragments adjacent to BssHII restriction sites were subcloned into the pKSIIAsc vector (Figure 2.6), which contains the T3 promoter, allowing annealing of the T3 primer (5'-ATTAACCCTCACTAAAG-3'). To identify bacterial colonies containing a pKSIIAsc vector, a radiolabeled T3 probe was hybridized to transfer membranes prepared as described in section 2.4.2.
The probe was prepared by pipetting 4.5 µl of T3 primer (10 µM), 2.0 µl 5X kinase buffer (GibcoBRL), 1.0 µl T4 kinase (10 U/ml; GibcoBRL), and [γ-³²P]ATP (10 µCi/µl) into an Eppendorf tube. The reaction was incubated for 1 hour at 37°C and then heated to 95°C for 5 minutes to stop the reaction. Approximately 4 ml of hybridization solution was added to a heat-sealable bag containing the transfer membranes, and the labeled T3 probe was added. Hybridization proceeded overnight at 42°C, after which each filter was washed for 2 minutes in 0.2X SSC at the same temperature. Each filter was dried briefly, covered in plastic wrap, and exposed to X-ray film for approximately 30 minutes.

2.5 Construction of Deleted Cosmids

Deleted cosmids were created for use as templates for sequencing the DNA adjacent to BssHII sites. Selected cosmids were digested with BssHII, as described in section 2.2, and the resulting fragments were religated (Figure 2.7). In an Eppendorf tube, 2.5 µl of the digested cosmid DNA, 4 µl of 5X ligase buffer (GibcoBRL), 1 µl of T4 DNA ligase (GibcoBRL; 1 U/µl), and dH2O were combined to a final volume of 20 µl. Each reaction was left overnight at 14°C and used to transform competent bacteria, using the protocol described in section 2.8. Religation of these BssHII fragments can create three types of ligated products, any of which may be taken up by the competent bacteria; however, plating the bacteria on medium containing ampicillin selects for those that carry the ampicillin resistance gene. Deleted cosmids obtained in this manner were sequenced using an ABI automated sequencer, as described in section 2.10.2.
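The logic behind the BssHII/EcoRI double digest (section 2.2) and the BssHII-only digest used to build deleted cosmids can be illustrated with a short sketch that locates recognition sites in a known sequence and labels each fragment by the enzymes at its two ends. It is a sketch under stated assumptions: it treats the input as a linear stretch of insert sequence, ignores the cosmid vector arms (anything before the first cut or after the last), and uses a made-up toy sequence; the function names are hypothetical. Both BssHII (G^CGCGC) and EcoRI (G^AATTC) cut after the first base of a palindromic site, which is why only one strand is scanned and one base is added to each match position.

```python
import re

# Recognition sequences; both are palindromic, so scanning one strand finds every site.
SITES = {"BssHII": "GCGCGC", "EcoRI": "GAATTC"}
# Both enzymes cut after the first base of the site (G^CGCGC, G^AATTC) on the top strand.
CUT_OFFSET = 1


def cut_positions(seq: str) -> list[tuple[int, str]]:
    """Sorted (top-strand cut position, enzyme) pairs for every recognition site in seq."""
    seq = seq.upper()
    cuts = []
    for enzyme, site in SITES.items():
        # The zero-width lookahead also reports overlapping sites such as GCGCGCGC.
        for match in re.finditer(f"(?={site})", seq):
            cuts.append((match.start() + CUT_OFFSET, enzyme))
    return sorted(cuts)


def classify_fragments(seq: str) -> list[tuple[str, int, int]]:
    """Label each fragment lying between two adjacent cuts by the enzymes at its ends."""
    cuts = cut_positions(seq)
    fragments = []
    for (start, left_enzyme), (end, right_enzyme) in zip(cuts, cuts[1:]):
        label = "-".join(sorted((left_enzyme, right_enzyme)))
        fragments.append((label, start, end))
    return fragments


toy = "AAGAATTCAAAAGCGCGCTTTTGAATTCAA"  # hypothetical insert: EcoRI ... BssHII ... EcoRI
for label, start, end in classify_fragments(toy):
    print(f"{label}: {start}-{end} ({end - start} bp)")
```

With a single BssHII site between two EcoRI sites, the digest yields two BssHII-EcoRI fragments; a second BssHII site in the insert would add a BssHII-BssHII fragment between them.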
2.6 DNA Extraction from Agarose Gel

To subclone the regions adjacent to BssHII recognition sequences, cosmids containing a BssHII site in the human insert DNA were digested with BssHII and EcoRI, using the method described in section 2.2. The resulting fragments were separated on an agarose gel by electrophoresis, as described in section 2.3.1, and the BssHII-EcoRI and BssHII-BssHII bands were excised. DNA was extracted from these bands using a QIAEX II Agarose Gel Extraction kit (QIAGEN). Each excised band was placed in an Eppendorf tube, Buffer QX1 and 10 µl of QIAEX II were added, and the mixture was incubated at 50°C for 10 minutes. Buffer QX1 solubilizes the agarose and creates a high-salt environment; QIAEX II consists of silica-gel particles that adsorb nucleic acids in the presence of high salt. Each tube was centrifuged for 30 seconds and the supernatant removed. The pellet was washed once with 500 µl of Buffer QX1 to remove residual agarose, and twice with Buffer PE to remove salt contamination. The pellet was then air-dried, resuspended in 20 µl of dH2O, and incubated for 5 minutes; the low salt concentration in this step releases the DNA from the QIAEX II silica-gel particles. Each tube was centrifuged, and the supernatant containing the DNA fragment was transferred to a clean Eppendorf tube.

2.7 Subcloning of Cosmid Fragments

A modified pBluescript (Stratagene) vector, shown in Figure 2.6, was used in this project to subclone fragments flanking BssHII sites. The modification of pBluescript II KS+ replaced the SpeI and XbaI restriction sites with an AscI restriction site, as described in DeBella et al. (1997).
pBluescript contains two selectable markers: an ampicillin resistance gene, and the regulatory sequences and coding information for the amino-terminal region of the β-galactosidase gene (lacZ). Within the coding region of lacZ is a polycloning site that does not disrupt the reading frame unless a fragment of DNA is cloned into it. When this vector is present in a host bacterial cell that encodes the carboxy-terminal portion of β-galactosidase, the two fragments associate to form an enzymatically active protein (Ullmann et al., 1967). Bacteria carrying pBluescript without an insert form blue colonies when grown in the presence of 5-bromo-4-chloro-3-indolyl-β-D-galactoside (X-gal). Insertion of a fragment into the cloning site disrupts the amino-terminal segment, resulting in white colonies.

2.7.1 Preparation of pBluescript Vector

The pBluescript vector (pKSIIAsc) was cut with one or two restriction enzymes to produce protruding ends complementary to those of the DNA fragments to be subcloned. Each digest contained 10 µl of pKSIIAsc (800 ng/µl), 5 µl 10X BSA, 5 µl of the appropriate restriction enzyme buffer, 28 µl dH2O, and 2 µl of the appropriate restriction enzyme, and was allowed to proceed for 1.5 hours. Once digestion was complete, 200 µl of 95% ethanol was added, the Eppendorf tube was centrifuged for 5 minutes, and the supernatant was removed. The pellet was washed with 400 µl of ice-cold 70% ethanol, centrifuged briefly, the supernatant removed, and the pellet allowed to air dry. The purified vector DNA was then resuspended in 80 µl TE. 10 µl of the digested vector was run on an agarose gel to confirm that the digestion was complete, and the vector was stored at -20°C.
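The vector amounts in section 2.7.1 and the ligation amounts in section 2.7.2 can be checked with a little arithmetic. The sketch below assumes 100% recovery through the ethanol precipitation, takes the vector size to be that of unmodified pBluescript II KS(+) (2961 bp; the AscI modification may change this slightly), and uses a hypothetical 2 kb insert, since insert sizes varied; the function names are illustrative only. For double-stranded DNA, the number of moles is proportional to mass divided by length, which is the basis of the molar ratio.

```python
def concentration_after_resuspension(stock_ng_per_ul: float, volume_used_ul: float,
                                     resuspension_ul: float) -> float:
    """ng/ul after digestion, precipitation and resuspension, assuming full recovery."""
    return stock_ng_per_ul * volume_used_ul / resuspension_ul


def insert_to_vector_ratio(insert_ng: float, insert_bp: int,
                           vector_ng: float, vector_bp: int) -> float:
    """Insert:vector molar ratio; for double-stranded DNA, moles are proportional to ng / bp."""
    return (insert_ng / insert_bp) / (vector_ng / vector_bp)


# Section 2.7.1: 10 ul of pKSIIAsc at 800 ng/ul, resuspended in 80 ul TE.
vector_stock = concentration_after_resuspension(800, 10, 80)
dilution = vector_stock / 10  # fold dilution to the 10 ng/ul used for ligation (section 2.7.2)

# Section 2.7.2: 10 ul of insert at ~10 ng/ul and 2 ul of vector at 10 ng/ul.
VECTOR_BP = 2961  # assumption: size of unmodified pBluescript II KS(+)
INSERT_BP = 2000  # hypothetical insert size
ratio = insert_to_vector_ratio(10 * 10, INSERT_BP, 2 * 10, VECTOR_BP)
print(f"vector stock {vector_stock:.0f} ng/ul; dilute {dilution:.0f}-fold for ligation")
print(f"insert:vector molar ratio about {ratio:.1f}:1")
```

Under these assumptions the resuspended vector is about 100 ng/µl, so a 10-fold dilution gives the 10 ng/µl used for ligation, and a 2 kb insert would be present at roughly a 7:1 molar excess over the vector.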
Restriction Enzymes
AscI (10 U/µl; BioLabs): used to cut pKSIIAsc when cloning BssHII fragments; 10X NEBuffer 4
EcoRI (10 U/µl; GibcoBRL): used to cut pKSIIAsc when cloning EcoRI fragments; 10X REact 3 buffer (GibcoBRL)
AscI and EcoRI: used to cut pKSIIAsc when cloning BssHII-EcoRI fragments; 10X NEBuffer 4

2.7.2 Ligation Reaction

Ligation reactions were set up in Eppendorf tubes, each containing 10 µl of DNA (approximately 10 ng/µl) purified using the method described in section 2.6, 5.6 µl dH2O, 2 µl T4 DNA ligase buffer (GibcoBRL), 2 µl pKSIIAsc (10 ng/µl) cut with the appropriate enzyme(s), and 0.4 µl T4 ligase (1 U/µl; GibcoBRL). With each set of ligations, a positive control was also set up, containing a pBluescript vector capable of religation and no foreign DNA. The reaction mixtures were incubated at 14°C overnight. Figure 2.8 diagrams the steps of this process.

2.8 Transformation of Competent E. coli

Competent DH5α Escherichia coli cells (Inoue et al., 1990), stored at -70°C, were thawed on ice, and 50 µl was pipetted into a pre-chilled 15 ml tube. 10 µl of a ligation reaction (prepared as described in sections 2.5 and 2.7) was added to each tube, swirled to mix, and kept on ice for 30 minutes. Each tube was then heat-shocked for 45 seconds at 42°C and quickly returned to ice for 2 minutes. 400 µl of LB medium was added to each tube, and the tubes were incubated for 45 minutes at 37°C. 120 µl of the transformed cells was spread over the surface of a plate and incubated overnight at 37°C. Ampicillin plates were used for deleted cosmids, and XIA plates for subcloned fragments. Only bacteria containing deleted cosmids were able to grow on the selective medium. These colonies were used to inoculate LB medium and grown overnight at 37°C.
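Only part of each transformation is spread on a plate, so the amount of vector DNA behind the colonies on that plate is a fraction of what went into the ligation. The sketch below works this out from the volumes in sections 2.7.2 and 2.8 and converts a colony count into an apparent efficiency (colonies per µg of vector plated). The colony count is hypothetical, and because ligated DNA transforms less efficiently than intact plasmid, the result is a rough apparent figure, not a true competence measurement; the function names are illustrative only.

```python
def vector_ng_plated(vector_in_ligation_ng: float, ligation_ul: float, ligation_added_ul: float,
                     cells_ul: float, medium_ul: float, plated_ul: float) -> float:
    """ng of vector DNA represented in the volume spread on one plate."""
    added_ng = vector_in_ligation_ng * ligation_added_ul / ligation_ul
    total_ul = cells_ul + ligation_added_ul + medium_ul
    return added_ng * plated_ul / total_ul


def transformants_per_ug(colonies: int, plated_ng: float) -> float:
    """Apparent transformation efficiency (colonies per microgram of vector plated)."""
    return colonies / (plated_ng / 1000.0)


# 20 ul ligation with 20 ng vector (section 2.7.2); 10 ul added to 50 ul of cells,
# 400 ul LB added, 120 ul plated (section 2.8).
plated = vector_ng_plated(20, 20, 10, 50, 400, 120)
colonies = 100  # hypothetical colony count
print(f"{plated:.2f} ng plated, {transformants_per_ug(colonies, plated):.0f} transformants/ug")
```

With these volumes about 2.6 ng of vector ends up on each plate, so 100 colonies would correspond to roughly 3.8 × 10⁴ transformants per µg.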
Bacteria transformed with a pKSIIAsc vector containing an insert of foreign DNA appeared as white colonies on XIA plates, because the disrupted lacZ fragment cannot α-complement (Ullmann et al., 1967). These colonies were used to inoculate LB medium and grown at 37°C overnight. DNA from these cultures was prepared using the minipreparation protocol (section 2.1), digested with the appropriate enzyme (section 2.2) to confirm that DNA of the correct size was present, and run on a 0.7% agarose gel (section 2.3.1).

XIA Plates
400 ml dH2O in 1 L Erlenmeyer flask
4 g BactoPeptone (DIFCO laboratories 0118-17-0)
2 g Yeast Extract (DIFCO laboratories 0127-17-9)
2 g NaCl (BDH Inc.)
0.4 g Dextrose (FisherChemical D16-500)
4.8 g agar (BDH Inc.)
Autoclave for 20 minutes. Allow the solution to cool, add 20 mg ampicillin, 25 mg X-gal, and 60 mg IPTG, pour plates, and store in the dark at 4°C.

2.9 DNA Purification for Sequencing

Fragments adjacent to BssHII sites from ten cosmids were subcloned into the pKSIIAsc vector and used to transform competent bacteria. Bacterial colonies containing a subcloned fragment were identified either visually, as white colonies on an XIA plate, or by hybridization of transfer membranes with a T3 probe, which anneals to the T3 primer site present in the pKSIIAsc vector (Figure 2.9). Before template DNA was purified for sequencing, restriction digests were used to confirm that human insert DNA of the expected size was present (Figure 2.10). Once a fragment was confirmed to have been cloned into the pKSIIAsc vector, plasmid DNA was purified for sequencing using a Plasmid Mini Kit (QIAGEN). This kit uses a modified alkaline lysis protocol, followed by binding of the nucleic acids to an anion-exchange resin in a column (QIAGEN-tip). DNA binds to the resin under low-salt conditions, after which impurities such as RNA and proteins are removed by a medium-salt wash.
The DNA is then eluted in a high-salt buffer at pH 7.0, concentrated, and desalted by isopropanol precipitation. 1.5 ml of an overnight culture, inoculated from a single bacterial colony confirmed to contain a subcloned fragment, was centrifuged and the pellet resuspended in 300 µl of chilled Buffer P1. 300 µl of Buffer P2 was added, mixed, and incubated for 5 minutes at room temperature to lyse the bacterial cells. 300 µl of ice-cold Buffer P3 was added, mixed, and incubated on ice for 5 minutes. Each Eppendorf tube was then centrifuged for 10 minutes, and the supernatant was removed and applied to a QIAGEN-tip that had first been equilibrated with 1 ml of Buffer QBT. After the supernatant had passed through the column, 4 ml of Buffer QC was added to remove any remaining contaminants. 800 µl of Buffer QF was then applied to each column, raising the salt concentration so that the nucleic acids bound to the resin were eluted into a clean Eppendorf tube. DNA was precipitated by adding 560 µl of isopropanol and centrifuging for 30 minutes. Each pellet was washed with 1 ml of 70% ethanol, centrifuged briefly, and air dried, and the purified DNA was resuspended in 15 µl of dH2O. 1 µl of each plasmid was digested for 1.5 hours with EcoRI and run on a gel, both to confirm that the plasmid had been purified and to determine the DNA concentration.
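The text does not say how the DNA concentration was read from the gel. One common approach, assumed here, is to compare the brightness of a sample band with marker bands of known mass: in a complete digest, each marker band carries a share of the loaded mass proportional to its length. The sketch uses the 5 µl of ~100 ng/µl marker from section 2.3.1 (about 500 ng) and λ HindIII fragment sizes; the marker in this thesis was also cut with SstII, which splits some of these bands, so the sizes are illustrative only. The intensity match in the example is hypothetical, and the function names are illustrative.

```python
LAMBDA_BP = 48_502
# HindIII fragment sizes of lambda DNA (bp). The thesis marker was also cut with
# SstII, which splits some of these bands, so these sizes are illustrative only.
LAMBDA_HINDIII_BP = [23130, 9416, 6557, 4361, 2322, 2027, 564, 125]


def band_masses(total_ng: float, fragments_bp: list[int],
                genome_bp: int = LAMBDA_BP) -> dict[int, float]:
    """Mass (ng) in each marker band: proportional to the fragment's share of the genome."""
    return {bp: total_ng * bp / genome_bp for bp in fragments_bp}


def concentration_from_band(matched_band_ng: float, volume_loaded_ul: float) -> float:
    """ng/ul of a sample whose band intensity matched a marker band of known mass."""
    return matched_band_ng / volume_loaded_ul


masses = band_masses(5 * 100, LAMBDA_HINDIII_BP)  # 5 ul of marker at ~100 ng/ul
for bp, ng in masses.items():
    print(f"{bp:>6} bp  {ng:6.1f} ng")
# Hypothetical: the 1 ul plasmid digest gives a band about as bright as the 4361 bp band.
print(f"~{concentration_from_band(masses[4361], 1.0):.0f} ng/ul")
```

Matching by eye is only approximate, and ethidium bromide staining is proportional to mass, so a small band and a large band of equal mass should look about equally bright.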
Buffer P1 (Resuspension Buffer)
50 mM Tris-HCl, pH 8.0
10 mM EDTA
100 µg/ml RNase A

Buffer P2 (Lysis Buffer)
200 mM NaOH, 1% SDS

Buffer P3 (Neutralization Buffer)
3.0 M potassium acetate, pH 5.5

Buffer QBT (Equilibration Buffer)
750 mM NaCl
50 mM MOPS, pH 7.0
15% isopropanol
0.15% Triton X-100

Buffer QC (Wash Buffer)
1.0 M NaCl
50 mM MOPS, pH 7.0
15% isopropanol

Buffer QF (Elution Buffer)
1.25 M NaCl
50 mM Tris-HCl, pH 8.5
15% isopropanol

2.10 Sequencing

After the template was purified, as described in section 2.9, it was sequenced either manually or with an automated sequencer. Sequencing was performed using either the M13 (5'-GTAAAACGACGGCCAGT-3') or the T3 (5'-ATTAACCCTCACTAAAG-3') primer, whose binding sites flank the cloning sites of the SuperCos1 and pBluescript vectors.

2.10.1 Manual Sequencing

Manual sequencing of a subcloned fragment was performed by the Sanger dideoxy chain-termination method (Sanger et al., 1977) using Sequenase 2.0. Sequenase 2.0 lacks 3'→5' exonuclease activity, resulting in an extremely stable enzyme with a higher specific activity than other versions of this enzyme. Sequencing reactions with Sequenase 2.0 occur in two steps. The first is a polymerization reaction with limiting concentrations of dNTPs, including [α-³⁵S]dATP, which extends the primer by approximately 25 nucleotides. This is followed by a second set of four reactions in which the DNA chains are rapidly extended and then terminated by incorporation of a ddNTP. After the reactions were complete, the four samples were loaded onto a polyacrylamide sequencing gel, as described in section 2.3.2.

For the subclone being sequenced, 2 µl of 2 M NaOH, 2 mM EDTA and 18 µl of DNA template (200-400 ng/µl) were pipetted into a 0.5 ml Eppendorf tube. After 5 minutes, 2 µl of 2 M ammonium acetate, pH 5.4, was added to terminate the reaction. DNA was precipitated by adding 50 µl of 95% ethanol and centrifuging the sample for 15 minutes.
The DNA pellet was washed by adding 200 µl of 70% ethanol to the Eppendorf tube and centrifuging for 5 minutes. The pellet was dried and resuspended in 6.5 µl of water. To each DNA sample, 0.5 µl of 10 µM primer (T3) and 1 µl of DMSO were added. The tube was boiled for 3 minutes and then cooled at -196°C in liquid nitrogen for 5 minutes. The tube was quickly thawed, 2 µl of Sequenase buffer mix was added, and the tube was kept at room temperature for 5 minutes. To each of four reaction tubes, 2.5 µl of one of the four dideoxynucleotides (ddATP, ddTTP, ddCTP, ddGTP) was added. In another Eppendorf tube, 1.0 µl of DTT, 2.0 µl of diluted GTP mix, 0.5 µl DMSO, 1.8 µl Sequenase dilution buffer, 1.0 µl ³⁵S label, and 0.25 µl Sequenase enzyme were mixed. 6.3 µl of this mixture was added to the tube containing the DNA template, and the reaction was left for 4 minutes at room temperature to pre-elongate. 3.5 µl of the template mixture was then added to each of the four tubes containing a dideoxynucleotide, and the tubes were incubated for 4 minutes at 37°C. 4 µl of gel loading buffer was added to each tube, the tubes were mixed, and 4 µl of each reaction was loaded onto a pre-heated sequencing gel.

2.10.2 Automated Sequencing

All but one of the subcloned fragments were sequenced using an automated sequencer (Applied Biosystems Model 373 Stretch). Sequence was obtained from 500 ng of QIAGEN-purified double-stranded DNA template using ABI's AmpliTaq FS DyeDeoxy™ Terminator Cycle Sequencing. Unlike the manual method described in section 2.10.1, all four base-specific reactions take place in one Eppendorf tube on a thermocycler. Sequence obtained by this method can be 98.0% accurate over more than 650 base pairs, whereas manual sequencing generally provides 300 to 400 base pairs of reliable sequence.
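In both methods the sequence is read from a ladder of chain-terminated products ordered by length: shorter products migrate further, so reading from the bottom of the gel upward (or in order of arrival at the detector) gives the newly synthesized strand 5'→3', and the template is its reverse complement. The sketch below reconstructs a read from hypothetical band positions, expressed as lengths in nucleotides beyond the primer in each of the four termination lanes (loaded T, G, C, A as in section 2.3.2). It is an idealized model with no band compressions or ambiguous calls; the function names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def read_ladder(bands: dict[str, list[int]]) -> str:
    """Reconstruct the newly synthesized strand (5'->3') from an idealized dideoxy ladder.

    `bands` maps each termination lane ('A', 'C', 'G', 'T') to the lengths, in
    nucleotides beyond the primer, of the terminated products seen in that lane.
    """
    calls = sorted((length, base) for base, lengths in bands.items() for length in lengths)
    observed = [length for length, _ in calls]
    if len(set(observed)) != len(observed):
        raise ValueError("more than one lane has a band at the same position")
    if observed != list(range(1, len(observed) + 1)):
        raise ValueError("ladder has gaps; some positions were not called")
    return "".join(base for _, base in calls)


def template_strand(read: str) -> str:
    """The template is complementary and antiparallel: its 5'->3' sequence is the reverse complement."""
    return read.translate(COMPLEMENT)[::-1]


lanes = {"T": [1, 4], "G": [2], "C": [3], "A": [5]}  # hypothetical band positions
read = read_ladder(lanes)
print(read, template_strand(read))
```

The two checks mirror what a person reading a film looks for: exactly one band per position and no skipped positions.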
In a 0.6 ml Eppendorf tube, 8.0 µl of terminator premix, 500 ng of template, 3.2 pmol of primer (M13 or T3), and dH2O were combined to a final volume of 20 µl, and the reaction was overlaid with a drop of mineral oil. Each tube was placed in a thermocycler preheated to 96°C, where 25 cycles of the following program were completed: 96°C for 30 seconds; 50°C for 15 seconds; 60°C for 4 minutes. The reactions were then held at 4°C until removed for purification. Each reaction was transferred to a 1.5 ml Eppendorf tube containing 2 µl of 3 M sodium acetate, pH 4.6, and 50 µl of 95% ethanol; ethanol precipitation of the sequencing reaction removed excess dye terminators. The tube was centrifuged for 20 minutes, the ethanol removed, and the pellet dried and stored at -20°C until it was run on the sequencing gel.

2.11 DNA Analysis

After sequence was obtained from a subcloned fragment, it was analyzed using BLAST, GRAIL, and PROSITE. BLAST is available at the NCBI web site (www.ncbi.nlm.nih.gov). GRAIL and PROSITE can be found at the Baylor College of Medicine search launcher (kiwi.imgen.bcm.tmc.edu:8088).

2.11.1 BLAST

BLAST (Basic Local Alignment Search Tool) was used to compare sequence obtained from the subcloned fragments with sequence in the GenBank databases. BLAST uses a rapid database search algorithm that identifies local similarities between sequences and then extends these alignments according to defined match and mismatch scores, without introducing gaps to improve the alignment (Altschul et al., 1990). Similarity searching begins by finding similar segments between the query sequence and a database sequence; the statistical significance of each match is then evaluated, and matches found to be significant are reported.
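The seed-and-extend idea described above can be sketched in a few lines: find short exact word matches between query and subject, then extend each one in both directions without gaps, keeping the best-scoring segment pair. This is a simplified illustration, not the BLAST implementation: the word length (4), match/mismatch scores (+1/-3) and X-drop cutoff (5) are illustrative choices rather than BLAST's defaults, only one subject sequence is searched, and no significance statistics (which in BLAST come from Karlin-Altschul theory) are computed. The function names are hypothetical.

```python
def word_hits(query: str, subject: str, w: int) -> list[tuple[int, int]]:
    """All (query_pos, subject_pos) pairs where both sequences share an exact word of length w."""
    index: dict[str, list[int]] = {}
    for j in range(len(subject) - w + 1):
        index.setdefault(subject[j:j + w], []).append(j)
    return [(i, j) for i in range(len(query) - w + 1)
            for j in index.get(query[i:i + w], [])]


def extend_one_way(pairs, match: int, mismatch: int, drop: int) -> tuple[int, int]:
    """Walk outward over aligned (query, subject) bases without gaps.

    Returns the best score gained and the number of bases needed to reach it; stops once
    the running score falls `drop` below the best seen so far (an X-drop rule).
    """
    best = best_len = running = 0
    for length, (a, b) in enumerate(pairs, start=1):
        running += match if a == b else mismatch
        if running > best:
            best, best_len = running, length
        elif best - running >= drop:
            break
    return best, best_len


def best_ungapped_hit(query: str, subject: str, w: int = 4, match: int = 1,
                      mismatch: int = -3, drop: int = 5):
    """Highest-scoring ungapped segment pair as (score, q_start, q_end, s_start, s_end), or None."""
    best = None
    for i, j in word_hits(query, subject, w):
        right, r_len = extend_one_way(zip(query[i + w:], subject[j + w:]), match, mismatch, drop)
        left, l_len = extend_one_way(zip(reversed(query[:i]), reversed(subject[:j])),
                                     match, mismatch, drop)
        hit = (w * match + left + right, i - l_len, i + w + r_len, j - l_len, j + w + r_len)
        if best is None or hit[0] > best[0]:
            best = hit
    return best


print(best_ungapped_hit("ACGTTGCA", "TTACGTTGCATT"))
```

Here the whole 8-base query matches subject positions 2 to 10, so every seed extends to the same full-length segment pair with a score of 8.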
Matches reported in this thesis had a P value of 2.8e-18 or less, meaning that the probability of such a match arising as a random alignment is at most 2.8 × 10^-18. GenBank is the National Institutes of Health sequence database, maintained at the National Center for Biotechnology Information, and contains publicly available nucleotide and protein sequences. The sequences in GenBank come from numerous organisms, such as Homo sapiens, C. elegans, S. cerevisiae, and Mus musculus. Nucleotide sequence data are organized into non-redundant, expressed sequence tag (EST), and sequence tagged site (STS) databases. Because new sequence is constantly added, GenBank regularly releases updated versions of the database. In this project GenBank release 100 was searched using BLASTN, a program that compares a nucleotide query sequence with a database of nucleotide sequences. The searches covered the non-redundant database (310,264 sequences), EST database release 051697 (1,024,937 entries), STS database release 051597 (45,808 entries), and an Alu database, REPBASE (Claverie and Makalowski, 1994), containing Alu repetitive sequence. BLASTP was used to compare amino acid sequences with proteins in the SwissProt database, which contains 59,576 non-redundant protein sequences.

2.11.2 GRAIL
GRAIL 1.3 was used to locate protein-coding regions by applying a set of seven sensor algorithms designed to assess the coding potential of a region of sequence (Uberbacher and Mural, 1991). The first of these algorithms, a frame bias matrix, identifies potential coding regions and the favored reading frame. If a region codes for a protein, one of its reading frames should correlate significantly better with the matrix than the other two. The algorithm calculates the correlation coefficient between the matrix and each reading frame.
The difference between the best and worst coefficients is used as an indicator of coding potential. Fickett, the second sensor, examines properties of coding sequences: the occurrence of each of the four bases in the sequence is compared with the recurrence of those bases in coding DNA, and the overall base composition of the test DNA is then compared with the known compositions of coding and noncoding DNA. The third sensor, dinucleotide fractal dimension, examines the occurrence of dinucleotides. Certain dinucleotides, such as AA and TC, are common, and their occurrence is compared with that of the dinucleotide CG. Because the frequencies of these dinucleotides differ between coding and noncoding DNA, this sensor can compare the sequence of interest with intronic and coding DNA. Coding 6-tuple word preferences is a sensor that examines the frequency of nucleotide 'words' in a length of sequence. DNA from different regions of the genome, intronic and coding, has distinct distributions of word occurrences, so the frequencies of these words in coding compared with noncoding human DNA provide an indicator of a gene. Another GRAIL sensor compares in-frame 6-tuples from the test DNA with in-frame 6-tuples from coding DNA. Word commonality, the sixth sensor, considers the overall frequency of a given 6-tuple in bulk DNA: intronic regions use extremely common words, whereas exons use relatively rare words, and this algorithm takes that difference into account to help detect coding DNA. The final sensor, repetitive 6-tuple word preferences, examines the test sequence for several classes of repetitive DNA using 6-tuple statistics, exploiting the fact that highly repetitive DNA rarely encodes a protein. The outputs of these sensor algorithms are integrated by a neural network that predicts the location of a coding region within the sequence.
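Several of the GRAIL sensors rest on one idea: hexamer (6-tuple) frequencies differ between coding and noncoding DNA, and in coding DNA they also depend on reading frame. The sketch below is a generic in-frame hexamer log-likelihood score, not GRAIL's actual sensor or neural network. The frequency tables `coding` and `noncoding` are hypothetical inputs that would have to be estimated from training sequences, and `floor` is an assumed pseudo-frequency for unseen words. Like the frame bias sensor, it reports the spread between the best and worst frame.

```python
import math

def inframe_hexamer_scores(seq, coding, noncoding, floor=1e-6):
    """Per-frame sum of log(P_coding / P_noncoding) over hexamers starting at codon boundaries."""
    seq = seq.upper()
    scores = []
    for frame in range(3):
        total = 0.0
        # Hexamers start at frame, frame + 3, frame + 6, ... (two adjacent codons each).
        for i in range(frame, len(seq) - 5, 3):
            word = seq[i:i + 6]
            total += math.log(coding.get(word, floor) / noncoding.get(word, floor))
        scores.append(total)
    return scores

def frame_spread(scores):
    """Best minus worst frame score, used here as a crude coding-potential indicator."""
    return max(scores) - min(scores)

# Toy, hypothetical frequency tables: one hexamer four times more common in coding DNA.
coding = {"ATGGCC": 0.02}
noncoding = {"ATGGCC": 0.005}
scores = inframe_hexamer_scores("ATGGCCATGGCC", coding, noncoding)
best_frame = scores.index(max(scores))
print(scores, best_frame, frame_spread(scores))
```

In the toy sequence the favored hexamer occurs twice in frame 0 and never in frames 1 or 2, so frame 0 scores 2·ln 4 and the other frames score zero.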
An updated version of this program, GRAIL 1a, was used in this project. After locating a potential open reading frame, the program evaluates a number of candidate sequences using information from the 60 base pairs flanking either side of the open reading frame. Tests by the programmers showed that this version of GRAIL was better at finding true exons, eliminating false positives, and locating the boundaries of coding regions. GRAIL 1a recognized 82% of exons in a test set, with a false positive rate of 11%. Of the exons over 100 base pairs in length, 95% were recognized, and the correct frame was assigned to 98% of those identified.

2.11.3 PROSITE
PROSITE is a database of protein motifs that was used to suggest the potential function of a protein (Bairoch, 1993). This database was searched for motifs shared by distantly related proteins that may not be detected by BLAST sequence-alignment searches, since examining specific regions with conserved structure and sequence can reveal biological function. The PROSITE version used in this project contained 1143 protein patterns.

2.12 Sequence tagged sites
A novel STS was developed for each of the following segments of DNA by designing PCR primers to be used against a cell hybrid panel specific to chromosome 8 (Wagner et al., 1991; Wood et al., 1986) and the G3 radiation hybrid panel (Stewart et al., 1997). Results from the G3 panel were submitted to the Stanford RH server (wwwshgc.stanford.edu/rhserver2/rhserver_form.html), which provides a map location for the STS.

2.12.1 GOR gene
The polymerase chain reaction (PCR) was used to localize the novel GOR gene on chromosome 8. PCR primers oligo320 (5'-AGGTTGCCCCAAGTCCAAGC-3') and oligo321 (5'-GCTGTCTGACCTTCCACATC-3') were designed to flank a 286 base pair region within this gene.
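A common first check on a primer pair such as oligo320 and oligo321 is GC content and an approximate melting temperature. The sketch below uses the Wallace rule, Tm ≈ 2(A+T) + 4(G+C) °C, which is only a rough estimate for short oligonucleotides. The thesis does not state how the primers or the annealing temperature were chosen, so this is an illustration, not the design procedure used.

```python
def gc_percent(primer):
    """Percentage of G and C bases in an oligonucleotide."""
    p = primer.upper()
    return 100.0 * (p.count("G") + p.count("C")) / len(p)

def wallace_tm(primer):
    """Wallace rule, Tm = 2(A+T) + 4(G+C) in degrees C; a rough guide for short oligos only."""
    p = primer.upper()
    return 2 * (p.count("A") + p.count("T")) + 4 * (p.count("G") + p.count("C"))

primers = {
    "oligo320": "AGGTTGCCCCAAGTCCAAGC",
    "oligo321": "GCTGTCTGACCTTCCACATC",
}
for name, seq in primers.items():
    print(f"{name}: {len(seq)} nt, GC {gc_percent(seq):.0f}%, Wallace Tm {wallace_tm(seq)} C")
```

For these two primers the rule gives 64°C (60% GC) and 62°C (55% GC), a few degrees above the 58°C annealing step used in the PCR. That is a typical relationship, but it should not be read as the author's stated design criterion.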
A set of PCR reactions was set up using oligo320 and oligo321 as primers against a set of cell hybrids spanning chromosome 8 (Wagner et al., 1991; Wood et al., 1986) and the G3 radiation hybrid panel (Stewart et al., 1997). Each 25 µl amplification reaction contained 25 ng of DNA, 50 mM Tris-Cl (pH 8.3), 0.05% Tween-20, 0.05% Nonidet P-40, 2.5 mM MgCl2, 0.2 mM of each deoxynucleotide triphosphate (dTTP, dGTP, dCTP, dATP), 0.5 µl of Taq polymerase (4 U/µl; BRL), and 1.0 µl of each primer (10 ng/µl). Each reaction tube was placed in a thermocycler for 40 cycles of a 2 minute denaturing step at 94°C, 30 seconds at 58°C to allow the primers to anneal, and a 1 minute extension at 72°C. In the final cycle the 72°C extension step was lengthened to 7 minutes, followed by rapid cooling to 4°C.

2.12.2 5-oxo-L-prolinase gene
PCR will be used against the G3 radiation hybrid panel (Stewart et al., 1997) to localize the novel OPLAH gene on chromosome 8. PCR primers oligo322 (5'-TTCCAAAGGCACGCAACATG-3') and oligo319 (5'-AGGGCCATCCTGCAGGTG-3') were designed to flank a 96 base pair region within this gene.

2.12.3 Human EST W90101
PCR was used to localize the human EST W90101 on chromosome 8. PCR primers oligo115 (5'-TTCTCCTCTCCGCCTGGCTG-3') and oligo131 (5'-GAGGGACAAGTATCCAGTCC-3') were designed to flank a 300 base pair region within this EST.

A set of PCR reactions was set up using oligo115 and oligo131 as primers against a set of cell hybrids spanning chromosome 8 (Wagner et al., 1991; Wood et al., 1986).
Each 25 µl amplification reaction contained 25 ng of DNA, 50 mM Tris-Cl (pH 8.3), 0.05% Tween-20, 0.05% Nonidet P-40, 2.5 mM MgCl2, 0.2 mM of each deoxynucleotide triphosphate (dTTP, dGTP, dCTP, dATP), 0.5 µl of Taq polymerase (4 U/µl; BRL), and 1.0 µl of each primer (10 ng/µl). Each reaction tube was placed in a thermocycler for 40 cycles of a 2 minute denaturing step at 94°C, 30 seconds at 58°C to allow the primers to anneal, and a 1 minute extension at 72°C. In the final cycle the 72°C extension step was lengthened to 7 minutes, followed by rapid cooling to 4°C.

PCR products from all reactions were examined by adding 5 µl of gel-loading buffer to each reaction tube, loading the products onto a 2.0% agarose gel, and electrophoresing at 8 V/cm for approximately 2 hours. Five microlitres (100 ng/µl) of φX174 marker (HaeIII-digested φX174 phage DNA) was electrophoresed in an adjacent lane so that the sizes of the PCR products could be determined by comparing band migration.

Figure 2.1 Diagram of the sCos1 vector, showing the BssHII site. The SuperCos1 vector has a cloning capacity of 30-42 kb. It contains an ampicillin-resistance gene to allow selection. The cloning region is flanked by T3 and T7 primers, which can be used to sequence the human insert DNA (Wahl et al., 1987).

Figure 2.2 Digestion of cosmids 150F9 and 147D8 with restriction enzymes. Cosmid 150F9 has one BssHII site in the vector DNA but no site in the inserted fragment. Cosmid 147D8 has a BssHII site in both the vector and the insert DNA, and was included in the set of cosmids investigated in this thesis.

Figure 2.3 Detection of Alu repetitive sequence: Southern blot. A Southern blot of cosmid restriction digests (shown on left) probed with Alu DNA.
Bands containing Alu sequence appear dark in the autoradiograph (shown on right). E: EcoRI; E/B: EcoRI/BssHII double digest.

Figure 2.4 Detection of Alu repetitive sequence: Colony blot. Autoradiograph of bacterial colonies containing constructed deleted cosmids, probed with Alu DNA. Colonies 1-4 are cosmid 46F4ΔBssHII; 5-12 are cosmid 95C11ΔBssHII; 13-20 are cosmid 124H12ΔBssHII; 21-28 are cosmid 141F3ΔBssHII; 29-36 are cosmid 156G7ΔBssHII; 37-42 are cosmid 166H7ΔBssHII. Colonies containing Alu appear as dark dots.¹

¹ Cosmid 95C11ΔBssHII was excluded from the set of cosmids to be examined because it was positive with the Alu probe.

Figure 2.5 Apparatus for the capillary transfer method of Southern blotting. Capillary transfer of DNA from an agarose gel. The buffer is drawn from below through the gel into the stack of paper towels. DNA is eluted from the gel and deposited on the nylon transfer membrane. The weight at the top ensures tight contact between the layers of the transfer system.

Figure 2.6 Diagram of the modified pBluescript vector. The ampicillin-resistance gene allows antibiotic selection of the vector. The ColE1 origin is a plasmid origin of replication that allows the plasmid to replicate. Only a portion of the lacZ gene is present, enabling α-complementation for blue/white color selection of recombinant plasmids. The inducible lac promoter upstream of the lacZ gene increases α-peptide expression. MCS is a multiple cloning site flanked by the T3 and M13 primer sites.
Figure 2.7 Creation of a deleted cosmid. Three types of fragments could be created during the restriction digest. Each of these religates when ligase is added, creating three forms of DNA able to transform competent bacteria. Only bacteria carrying fragments with the origin of replication and the ampicillin-resistance gene are able to grow on ampicillin plates. E: EcoRI; B: BssHII.

Figure 2.8 Subcloning using the modified pKSIIAsc vector. Creation of a subcloned fragment by ligation of a human DNA insert and a modified pBluescript vector. In this figure a BssHII/EcoRI fragment is being subcloned into a vector with complementary AscI/EcoRI ends. In this thesis a BssHII/BssHII fragment was also subcloned into a vector digested with AscI.

Figure 2.9 Detection of the pKSIIAsc vector. Autoradiograph of bacterial colonies probed with the T3 primer present in the pKSIIAsc vector. This was used as a method to isolate colonies containing a subcloned fragment. Colonies 1, 5, and 6 contained subclone 13E3 0.6; colonies 7-11 did not contain subclone 46G4 4.0; colonies 12-15 did not contain subclone 141E8 8.0; colony 16 contained subclone 176F5 1.8; colonies 23 and 28 contained subclone 175G8 3.3.

Figure 2.10 Restriction digests of subclones 13E3 0.6, 176F5 1.8, and 175G8 3.3. Bacterial colonies containing subcloned fragments were digested with BssHII and EcoRI to confirm that a fragment of the correct size was present. Each lane contains a 3 kb band representing the pKSIIAsc vector and a second band representing the subcloned fragment.
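PCR product sizes in section 2.12 were determined by comparing migration with the φX174 HaeIII marker. A common way to make that comparison quantitative is to fit log10(fragment size) as a linear function of migration distance over the marker bands and read off unknowns. The sketch below does this with ordinary least squares. The HaeIII fragment sizes are the standard values for this digest, but the migration distances in the example are synthetic, and the log-linear relationship only holds over a limited size range on a given gel.

```python
import math

# Standard fragment sizes (bp) of the phiX174 HaeIII digest used as the size marker.
PHIX174_HAEIII_BP = [1353, 1078, 872, 603, 310, 281, 271, 234, 194, 118, 72]

def fit_semilog(distances, sizes_bp):
    """Least-squares fit of log10(size) = intercept + slope * distance."""
    ys = [math.log10(s) for s in sizes_bp]
    n = len(distances)
    mean_x = sum(distances) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in distances)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(distances, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def estimate_size(distance, slope, intercept):
    """Interpolated fragment size (bp) for a band at the given migration distance."""
    return 10 ** (intercept + slope * distance)

# Synthetic marker distances (mm) generated from log10(size) = 4 - 0.05 * distance.
marker_mm = [(4 - math.log10(s)) / 0.05 for s in PHIX174_HAEIII_BP]
slope, intercept = fit_semilog(marker_mm, PHIX174_HAEIII_BP)
unknown_mm = (4 - math.log10(286)) / 0.05  # where a 286 bp product would run
print(round(estimate_size(unknown_mm, slope, intercept)))
```

Real gels give scatter around the line, so an estimate is best read against the two marker bands that bracket the unknown band.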
Chapter III: Results

3.1 Sequence Data
Initially, deleted cosmids were created as templates for sequencing. However, because of their large size (> 9 kilobases) it was difficult to obtain data from them, and only one was sequenced successfully. The remaining sequence data presented in this thesis were obtained from subcloned fragments with a BssHII site at one or both ends. These smaller subcloned fragments alleviated the problem. Single-run sequence data were obtained from thirteen segments of human DNA using an automated sequencer (ABI Model 373), and a final segment was sequenced manually. During this project, sequence was acquired from the BssHII site using the M13 primer and from the EcoRI site using the T3 primer. The sequence obtained during this project is given in Appendix 1.

3.1.1 Cosmid 93C11ΔBssHII
Sequence through the BssHII site within cosmid 93C11ΔBssHII was identified as Alu-Sc repetitive DNA. Alu-Sc elements make up a subfamily of Alu repeats distributed throughout primate genomes (Labuda and Striker, 1989). BLAST analysis revealed that the first 383 nucleotides of sequence shared high sequence identity with the Alu-Sc consensus sequence (Table 3.1). The BssHII site for which this cosmid was selected is located 148 base pairs from the 5' end of an Alu-Sc element (GenBank U14571). No database matches were found for the remaining 267 base pairs of non-repetitive sequence. However, GRAIL scored the probability of an open reading frame extending from base pair 492 to 596 as excellent (Table 3.6). A search of the SwissProt database with the hypothetical protein encoded within this open reading frame identified no matches. At present, no known sequence shows homology with the open reading frame identified in cosmid 93C11.
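GRAIL reports open reading frames as base-pair coordinates (for example 492 to 596 in cosmid 93C11), and the hypothetical protein used for the SwissProt search is the conceptual translation of such a region. A minimal sketch of that translation step with the standard genetic code follows. It assumes 1-based inclusive coordinates on the sequenced strand and a region that starts in frame; how GRAIL coordinates were actually converted in this work is not stated.

```python
BASES = "TCAG"
# Standard genetic code, with codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(dna):
    """Translate codon by codon; '*' marks a stop, 'X' any codon containing N or another ambiguity."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))

def translate_region(seq, start, end):
    """Translate 1-based inclusive coordinates start..end (a trailing partial codon is ignored)."""
    return translate(seq[start - 1:end])

print(translate("ATGGCCTAA"))  # -> MA*
```

For the 93C11 frame, 596 - 492 + 1 = 105 bp, which is exactly 35 codons.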
3.1.2 Subclone from Cosmid 13E3
A 600 base pair BssHII-EcoRI fragment from cosmid 13E3 was subcloned into a pKSIIAsc vector and sequenced through the BssHII site (Appendix 1). BLAST searches revealed no significant matches with data in GenBank's non-redundant, EST, or STS databases. GRAIL indicated a good probability of an open reading frame beginning at nucleotide 228 and ending at 333 (Table 3.6). An additional search of the SwissProt protein database did not uncover any significant homology between the hypothetical protein encoded within this open reading frame and those in the database. At present the open reading frame identified in cosmid 13E3 does not match any known sequence.

3.1.3 Subclones from Cosmid 40G1
Two BssHII-EcoRI fragments from cosmid 40G1, contained within a 3.3 kilobase EcoRI fragment, were subcloned into a pKSIIAsc vector and sequenced. The two subcloned fragments, 1.4 and 1.6 kilobases in size, were sequenced through the BssHII and EcoRI sites (Appendix 1). Searches of GenBank's non-redundant database identified one gene, the chimpanzee GOR gene (GenBank D10017), sharing high sequence identity with each sequence read (Table 3.1). The PROSITE database of protein patterns was searched with sequence from the GOR gene in order to identify consensus protein sequences indicating possible gene function. This gene was found to contain putative phosphorylation, amidation, N-myristoylation, and N-glycosylation sites (Table 3.5). The numerous phosphorylation sites imply that the encoded protein may be phosphorylated. Amidation sites are features indicative of peptide hormones. N-myristoylation sites are involved in associations between proteins and membranes, and N-glycosylation sites suggest that the protein is involved in interactions with other cells or molecules (Creighton, 1993).
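PROSITE patterns are written in a compact syntax (x for any residue, [..] for allowed residues, {..} for excluded residues, (n) or (n,m) for repeats, < and > for N- and C-terminal anchors) that maps directly onto regular expressions. The sketch below converts a pattern and scans a protein for overlapping matches. It is a simplified converter that does not handle every corner of the syntax (for example '>' inside brackets), and it is not the search service used in this work. The N-glycosylation pattern in the example, N-{P}-[ST]-{P}, is the familiar PROSITE consensus; the protein sequence is made up.

```python
import re

def prosite_to_regex(pattern):
    """Convert a simple PROSITE pattern such as 'N-{P}-[ST]-{P}.' to a Python regex."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):
            prefix, element = "^", element[1:]
        if element.endswith(">"):
            suffix, element = "$", element[:-1]
        repeat = ""
        m = re.fullmatch(r"(.+?)\((\d+(?:,\d+)?)\)", element)
        if m:  # e.g. x(2) or x(2,4)
            element, repeat = m.group(1), "{" + m.group(2) + "}"
        if element == "x":
            core = "."
        elif element.startswith("{"):
            core = "[^" + element[1:-1] + "]"
        else:
            core = element  # a single residue or a [..] set means the same in a regex
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)

def find_motifs(protein, pattern):
    """1-based start positions of all (possibly overlapping) matches of a PROSITE pattern."""
    regex = re.compile("(?=" + prosite_to_regex(pattern) + ")")
    return [m.start() + 1 for m in regex.finditer(protein.upper())]

print(find_motifs("MNGTAKNPSA", "N-{P}-[ST]-{P}"))
```

Only the first asparagine qualifies; the second is followed by proline. Such short motifs occur often by chance, which is one reason matches like those in Table 3.5 are treated as putative sites.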
Although this analysis does not establish the function of the protein, it provides useful information for future research. Queries of the sequence tagged site (STS) database identified an STS on chromosome 8 (GenBank L18713) with regions that matched sequence obtained from subclone 40G1 1.4 using the M13 primer (Table 3.2). This STS is reported to contain LINE1 sequence (Gerken et al., 1993), a family of repetitive DNA discussed further in Chapter IV. Sequence obtained from the 1.4 kb BssHII-EcoRI fragment also contains LINE1 repetitive DNA, which accounts for this match. Sequence from the BssHII site of subclone 40G1 1.6 matched four mouse expressed sequence tags (W81932; AA034629; AA097346; AA120740) in GenBank's EST database (Table 3.2). Searches of the EST database with the partially sequenced chimpanzee GOR gene identified two additional mouse ESTs (AA030569; AA110525) (Table 3.3). Thus, sequencing adjacent to BssHII sites in cosmid 40G1 identified a novel human gene, previously identified in chimpanzee, with homology to a number of expressed mouse sequences.

3.1.4 Subclones from Cosmid 46F4
Two BssHII-EcoRI fragments, 1.6 and 1.8 kilobases in size, were subcloned from cosmid 46F4. Sequence obtained using the M13 primer identified a significant match with one gene, 5-oxo-L-prolinase (GenBank U70825), previously sequenced from Rattus norvegicus mRNA (Table 3.1). Searches of the EST database with the rat 5-oxo-L-prolinase (OPLAH) gene identified four mouse ESTs (AA27144; W29895; AA271446; AA097778) with significant regions of homology (Table 3.4).

GRAIL analysis recognized open reading frames in both fragments (Table 3.6). In subclone 46F4 1.6 it identified two potential open reading frames, one extending from base pair 422 to 569 and the other from base pair 84 to 278. Subclone 46F4 1.8 was identified as containing one open reading frame, extending from base pair 314 to 616.
GRAIL failed to identify the 3' exon sequenced in the 1.8 kilobase fragment. However, when the analysis was repeated with base 460 changed from C to T, GRAIL predicted an exon extending from base pair 308 to 465. Thus, sequencing adjacent to BssHII sites in cosmid 46F4 successfully identified a novel human gene previously detected in rats.

3.1.5 Subclone from Cosmid 46F11
BLAST analysis of sequence from a 2.5 kilobase BssHII-EcoRI fragment revealed that this subclone contains a variable number tandem repeat (VNTR) associated with a short interspersed sequence retroposon, as well as a medium reiteration frequency sequence (MER) (Table 3.1). The first 107 nucleotides from the BssHII site share high sequence identity with the 3' region of the short interspersed sequence retroposon that they matched (SINE-R.C2). The remaining 543 base pairs did not match any sequence in the database significantly. Sequence identity between the MER11A consensus sequence (Kaplan et al., 1991) and the sequence obtained through the EcoRI site of this subclone was high (Table 3.1). The GC content of the 650 base pairs from the EcoRI site was 37%, typical of A+T-rich MER elements. Thus, sequence obtained from this cosmid identified a number of repetitive elements, with no indication that expressed sequences are present in the subcloned fragment.

3.1.6 Subclone from Cosmid 77C1
Sequence from subclone 77C1 2.4 (Appendix 1), obtained through the BssHII site, was used to search GenBank's non-redundant database. Analysis revealed that the 600 base pairs extending from the BssHII site of the 2.4 kilobase BssHII-EcoRI fragment are LINE1 repetitive sequence (Table 3.1). Within the L1.2 element that this sequence matched, a BssHII site is located 153 base pairs from the 5' end. Thus, sequence obtained from cosmid 77C1 was identified entirely as a member of the L1 repetitive element family.
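BssHII recognizes GCGCGC, and the GC content quoted for the MER11A-containing fragment in section 3.1.5 is a simple base count. The sketch below locates BssHII sites, computes GC fraction, and computes the CpG observed/expected ratio commonly used to characterize CpG islands (CpG count × length / (C count × G count)). The island thresholds usually quoted with that ratio (GC above 50% and observed/expected above 0.6 over at least 200 bp, after Gardiner-Garden and Frommer, 1987) are a literature convention, not a criterion applied in this thesis.

```python
BSSHII_SITE = "GCGCGC"  # BssHII recognition sequence (cuts G^CGCGC)

def find_sites(seq, site=BSSHII_SITE):
    """1-based start positions of every occurrence of site, overlaps included."""
    s = seq.upper()
    n = len(site)
    return [i + 1 for i in range(len(s) - n + 1) if s[i:i + n] == site]

def gc_fraction(seq):
    """(G + C) / (A + C + G + T); ambiguous bases such as N are left out."""
    s = seq.upper()
    acgt = sum(s.count(b) for b in "ACGT")
    return (s.count("G") + s.count("C")) / acgt if acgt else 0.0

def cpg_obs_exp(seq):
    """Observed/expected CpG ratio: CpG count * length / (C count * G count)."""
    s = seq.upper()
    c, g = s.count("C"), s.count("G")
    cpg = sum(1 for i in range(len(s) - 1) if s[i:i + 2] == "CG")
    return cpg * len(s) / (c * g) if c and g else 0.0

example = "TTAGCGCGCTAACGTTAGC"  # made-up sequence containing one BssHII site
print(find_sites(example), round(gc_fraction(example), 2), round(cpg_obs_exp(example), 2))
```

Because `cpg_obs_exp` uses the full length including any N, heavily ambiguous sequence would need cleaning first.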
3.1.7 Subclone from Cosmid 141A6
A 1.7 kilobase BssHII-EcoRI fragment from cosmid 141A6 was subcloned and sequenced through the BssHII site (Appendix 1). The first 49 base pairs of sequence are Alu-Sc repetitive DNA (Table 3.1), which was not detected by probing with Alu. Searches of the databases with the non-repetitive region identified no significant homologies, and GRAIL identified no open reading frame. Currently no sequence in the GenBank database has significant homology with the 600 base pairs of non-repetitive sequence from subclone 141A6 1.7.

3.1.8 Subclone from Cosmid 156G7
A 2.3 kilobase BssHII fragment was subcloned from cosmid 156G7. This region could not be sequenced using the automated sequencer and Taq polymerase. However, 135 base pairs of sequence were obtained by manual sequencing with Sequenase, which is better able to sequence difficult areas, such as GC-rich regions that tend to form secondary structures. Once sequenced, subclone 156G7 2.3 was found to have a higher GC content than any of the other fragments sequenced for this project (Table 3.7).

BLAST similarity searches of the GenBank databases identified no significant regions of homology with previously sequenced DNA. GRAIL did not detect an open reading frame; one would have been unlikely to be detected within 135 nucleotides of sequence, since GRAIL is most reliable for coding exons of 100 base pairs or more. Therefore, sequence obtained from subclone 156G7 2.3 has no homology with sequence in the database and no detectable coding features.

3.1.9 Subclones from Cosmid 166H7
Two BssHII-EcoRI fragments from cosmid 166H7 were subcloned and sequenced through their BssHII sites (Appendix 1). The first fragment, 166H7 5.0, showed no significant homology with sequence in GenBank's databases. The second fragment, 166H7 9.2, matched human EST H88121 (Table 3.2).
Of the 221 nucleotides within the region of homology (Figure 3.1), 26 differed (88% sequence identity). Human EST H88121 originated from a retinal cDNA library and was sequenced at Washington University School of Medicine as part of the EST project (Hillier et al., 1995). This EST had been placed on chromosome 8 at position 731.5 cR on the WICGR radiation hybrid map (746.2 cR), between markers WI-15870 (729.9 cR) and WI-12784 (731.5 cR). Thus, sequence from cosmid 166H7 contains a human EST previously placed on chromosome 8.

3.1.10 Subclone from Cosmid 175G8
A 3.3 kilobase BssHII-EcoRI fragment from cosmid 175G8 was subcloned and sequenced through the BssHII site (Appendix 1). Searches of the GenBank databases identified no matches, and GRAIL recognized no open reading frames. Therefore, at this time sequence obtained from this cosmid has no significant homology with sequence in the GenBank database.

3.1.11 Subclone from Cosmid 176F5
Sequence through the BssHII site of a 1.8 kilobase BssHII-EcoRI fragment from cosmid 176F5 (Appendix 1) was compared with data at GenBank. Searches of the non-redundant database identified the first 109 nucleotides as Alu-J sequence (Table 3.1), a subfamily of Alu repetitive elements (Jurka and Milosavljevic, 1991). A BLAST similarity search of GenBank's EST database with the non-repetitive region identified human EST W90101 (Table 3.2). The terminal 382 base pairs of sequence obtained from this subclone share 92% sequence identity with the 5' region of this EST (Figure 3.2). Human EST W90101 was sequenced from a fetal liver spleen cDNA library at Washington University School of Medicine as part of the EST project (Hillier et al., 1995). Thus, cosmid 176F5 contains human EST W90101, which localizes this EST to chromosome 8.
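Percent identity figures such as the 88% in section 3.1.9 follow from a simple ratio: 221 aligned nucleotides with 26 differences gives (221 - 26)/221 ≈ 88.2%. The sketch below computes identity over the columns of an existing alignment, skipping any column with a gap ('-') or an ambiguous N in either sequence. That skipping rule is an assumption; BLAST and other tools count gaps and ambiguities in their own ways, so values need not agree exactly with the tables.

```python
def percent_identity(a, b):
    """Percent identity over aligned columns, ignoring gap ('-') and N columns."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "-N" or y in "-N":
            continue  # uninformative column
        compared += 1
        matches += x == y
    return 100.0 * matches / compared if compared else 0.0

# The arithmetic behind the 166H7 / H88121 comparison (221 columns, 26 differences).
print(round(100 * (221 - 26) / 221, 1))
print(percent_identity("ACGTTGCA", "ACGATG-A"))
```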
3.2 Localization by STS mapping

3.2.1 Localization of the Human GOR gene
STS mapping was used to position the human GOR gene, identified in cosmid 40G1, by designing primers to amplify a 286 bp product at the BssHII end of the 1.6 kb fragment. The forward primer, 5'-AGGTTGCCCCAAGTCCAAGC-3', is shown underlined at the top of Figure 3.4, while the reverse primer, 5'-GCTGTCTGACCTTCCACATC-3', is complementary to the 20 nucleotides shown underlined at the bottom of Figure 3.4, adjacent to the GCGCGC BssHII site. These primers were used against a cell hybrid panel for chromosome 8 (Wagner et al., 1991; Wood et al., 1986). Only XVIII-23Ha produced an amplified product of 286 base pairs, localizing the human GOR gene to chromosome 8q13-q22.2 (Figure 3.5). This STS was also typed in the G3 radiation hybrid panel (Stewart et al., 1997) and the results were submitted to the Stanford RH server. The highest two-point LOD score, 3.87, was reported at a distance of 52.9 cR10000 from SHGC-37027, which lies within chromosome 8 bin 67. The Genethon marker D8S1757, also located within bin 67, maps to 8q13-q22.1 on the cytogenetic map (Leach et al., 1996), confirming the previous localization.

3.2.2 Localization of the Human EST W90101
STS mapping was used to position human EST W90101, identified in cosmid 176F5, by designing primers to amplify a 300 bp product near the BssHII site of the 1.8 kb fragment. The forward primer, 5'-TTCTCCTCTCCGCCTGGCTG-3', is shown underlined at the top of Figure 3.2, while the reverse primer, 5'-GAGGGACAAGTATCCAGTCC-3', is complementary to the 20 nucleotides shown underlined at the bottom of Figure 3.2, adjacent to the GCGCGC BssHII site. These primers were used against a cell hybrid panel for chromosome 8 (Wagner et al., 1991; Wood et al., 1986).
Both MGV 270 and MGV 271 produced amplified products of 300 base pairs, localizing EST W90101 to chromosome 8q24.1-qter (Figure 3.3).

Figure 3.1 Sequence comparison between genomic sequence from cosmid 166H7 and sequence from cDNA clone 220630. The genomic human sequence is numbered from the BssHII site in the 9.2 kb BssHII-EcoRI fragment. The cDNA sequence is numbered as reported by Hillier et al. (1995).

Figure 3.2 Sequence comparison between genomic sequence from cosmid 176F5 and sequence from cDNA clone 418116. The genomic human sequence is numbered from the BssHII site in the 1.8 kb BssHII-EcoRI fragment. The cDNA sequence is numbered as reported by Hillier et al. (1995).

Figure 3.3 Chromosomal localization of the human EST W90101 by PCR against somatic cell hybrids. Lane 1 contains a marker digest, φX174 phage DNA digested with HaeIII. The cell hybrid panel consists of: XVIII-23Ha, 8pter-8q22, lane 2; VTGHL 19, 8pter-8q13, lane 3; 1SHL 3, 8pter-q11, lane 4; 20xPO435-2, 8p23-q11, lane 5; MGV 270, 8q24.1-qter, lane 6; MGV 271, 8q22.1-qter, lane 7. Human DNA, lane 8, is the positive control, and lane 9 is a negative control.
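In sections 3.2.1 and 3.2.2 each reverse primer is described as complementary to 20 nucleotides at the far end of the amplified region; in other words, the genomic top strand carries the reverse complement of the reverse primer. The sketch below shows that relationship and how an expected product length follows from the primer positions on a template. The template in the example is synthetic, and the helper names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A<->T, C<->G, N kept as N)."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def amplicon_length(template, forward, reverse):
    """Expected PCR product length on the template's top strand, or None if a primer site is missing."""
    t = template.upper()
    start = t.find(forward.upper())
    if start < 0:
        return None
    site = reverse_complement(reverse)  # how the reverse primer's target reads on the top strand
    stop = t.find(site, start)
    if stop < 0:
        return None
    return stop + len(site) - start

fwd = "AGGTTGCCCCAAGTCCAAGC"  # oligo320
rev = "GCTGTCTGACCTTCCACATC"  # oligo321
print(reverse_complement(rev))

# Synthetic template: forward primer, 10 filler bases, then the reverse primer's target.
template = "TT" + fwd + "A" * 10 + reverse_complement(rev) + "GG"
print(amplicon_length(template, fwd, rev))  # 20 + 10 + 20 = 50
```

With the real genomic sequence in place of the synthetic template, the same calculation should return the 286 bp and 300 bp products described above.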
Figure 3.4 GOR sequence comparison between human and chimpanzee for the region amplified by the PCR primers. The human sequence is numbered from the BssHII site in the 1.6 kb BssHII-EcoRI genomic fragment. The chimpanzee sequence is numbered from the EcoRI site of the partial cDNA as reported by Mishiro et al. (1990).

Figure 3.5 Chromosomal localization of the human GOR gene by PCR against somatic cell hybrids. Lane 1 contains a marker digest, φX174 phage DNA digested with HaeIII. The cell hybrid panel consists of: XVIII-23Ha, 8pter-8q22, lane 2; VTGHL 19, 8pter-8q13, lane 3; 1SHL 3, 8pter-q11, lane 4; 20xPO435-2, 8p23-q11, lane 5; MGV 270, 8q24.1-qter, lane 6; MGV 271, 8q22.1-qter, lane 7. Human DNA, lane 8, is the positive control, and lane 9 is a negative control.
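The localizations read from Figures 3.3 and 3.5 follow a simple rule: the marker must lie in the chromosome 8 region retained by every hybrid that amplifies it, and outside the regions retained by hybrids that do not. The sketch below applies that rule to half-open intervals on an arbitrary ordered coordinate. Mapping cytogenetic bands such as 8q22.1 onto numbers, and allowing for uncertain breakpoints or partial retention, are left out and would be assumptions in any real use.

```python
def subtract(intervals, cut):
    """Remove the half-open interval cut = (c, d) from a list of half-open intervals."""
    c, d = cut
    result = []
    for a, b in intervals:
        for lo, hi in ((a, min(b, c)), (max(a, d), b)):
            if lo < hi:
                result.append((lo, hi))
    return result

def localize(positive, negative):
    """Region present in every positive hybrid and absent from every negative hybrid.

    Each hybrid is a half-open (start, end) interval of retained chromosome on an
    arbitrary ordered coordinate, a hypothetical stand-in for band positions.
    """
    if not positive:
        return []
    lo = max(a for a, _ in positive)
    hi = min(b for _, b in positive)
    region = [(lo, hi)] if lo < hi else []
    for cut in negative:
        region = subtract(region, cut)
    return region

# Toy example in the spirit of Figure 3.3: two positive hybrids overlapping at 7-10,
# negative hybrids covering 0-5 and 0-3.
print(localize([(7, 10), (5, 10)], [(0, 5), (0, 3)]))  # -> [(7, 10)]
```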
Table 3.1: GenBank Non-Redundant Database Search
Subclone name and description: 40G1 1.4, Chimpanzee GOR gene, many matches with L1 repetitive element; 40G1 1.4*, Chimpanzee GOR gene, many matches with L1 repetitive element; 40G1 1.6, Chimpanzee GOR gene; 40G1 1.6*, Chimpanzee GOR gene; 46F4 1.6, Rattus norvegicus 5-oxo-L-prolinase mRNA; 46F4 1.8, Rattus norvegicus 5-oxo-L-prolinase mRNA; 46F11 2.5, VNTR; 46F11 2.5*, MER11A repetitive element; 77C1 2.4, L1.2 repetitive element; 93C11ΔBssHII, Alu-Sc repetitive element; 141A6 1.7, Alu-Sc element; 176F5 1.8, Alu-J element
Subclone region of match: 14-87, 86-332, 456-586, 1-48, 48-411, 411-433, 424-524, 555-578, 1-334, 328-394, 395-431, 421-465, 487-525, 517-633, 81-112, 113-232, 306-361, 362-429, 16-165, 320-459, 1-54, 53-107, 1-27, 42-198, 234-348, 342-368, 367-386, 386-445, 432-506, 1-78, 67-103, 103-138, 130-328, 400-499, 497-542, 542-601, 1-82, 82-152, 293-393, 1-49, 1-17, 46-109
Length (bp) match: 72, 247, 131, 48, 364, 23, 101, 24, 334, 67, 37, 45, 39, 117, 32, 120, 56, 68, 150, 140, 54, 55, 27, 157, 115, 27, 20, 60, 75, 78, 37, 36, 199, 100, 46, 60, 82, 71, 101, 49, 17, 64
% identity: 98, 92, 83, 100, 89, 86, 91, 87, 93, 92, 83, 84, 89, 76, 84, 83, 83, 91, 78, 72, 98, 92, 92, 92, 90, 96, 95, 68, 80, 87, 91, 91, 83, 91, 82, 73, 91, 83, 73, 94, 82
GenBank accession number: Many, including D10017; Many, including D10017; Z11740; S36048; S36051; Z72522; D10017; D10017; U70825; U70825; M80343; U14571; U14571; U14567
Only significant similarities with the GenBank database are listed.
* Sequenced from the EcoRI site.

Table 3.2 GenBank Sequence Tagged Site (STS) and Expressed Sequence Tag (EST) Database Searches
Subclone | Description | Region of match (subclone sequence) | Length (bp) | % Identity | GenBank accession
40G1 1.4 † | Human chromosome 8 STS UT1959 | 427-451; 456-535 | 25; 80 | 88; 83 | L18713
40G1 1.6 ‡ | me93c12.r1 Soares mouse embryo NbME13.5 14.5 Mus musculus cDNA clone | 7-334; 330-390 | 328; 61 | 74; 83 | W81932
40G1 1.6 ‡ | mi53h07.r1 Soares mouse embryo NbME13.5 14.5 Mus musculus cDNA clone | 62-334; 330-390 | 273; 69 | 74; 83 | AA034629
40G1 1.6 ‡ | mm36e01.r1 Stratagene mouse skin (#937313) Mus musculus cDNA clone | 314-390 | 77 | 76 | AA097346
40G1 1.6 ‡ | ml56g02.r1 Stratagene mouse testis (#937308) Mus musculus cDNA clone | 330-390 | 61 | 83 | AA120740
166H7 9.2 | ys75c12.s1 Homo sapiens cDNA clone 220630 | 1-110; 103-168; 189-221 | 110; 66; 33 | 70; 84; 75 | H88121
176F5 1.8 | zh77e07.s1 Soares fetal liver spleen 1NFLS S1 Homo sapiens cDNA clone 418116 | 269-507; 508-649 | 239; 142 | 96; 78 | W90101
Only significant similarities with the GenBank database are listed.
† Matched with STS. ‡ Matched with EST.

Table 3.3 GenBank Expressed Sequence Tag (EST) Database Search with the GOR Gene
Gene searched | Description | Region of match | Length (bp) | % Identity | GenBank accession
chimpanzee GOR | ml56g02.r1 Stratagene mouse testis (#937308) Mus musculus cDNA clone 516050 | 352-605; 607-887 | 254; 281 | 77; 77 | AA120740
chimpanzee GOR | me93c12.r1 Soares mouse embryo NbME13.5 14.5 Mus musculus cDNA clone 403126 | 734-1190 | 457 | 79 | W81932
chimpanzee GOR | mi53h07.r1 Soares mouse embryo NbME13.5 14.5 Mus musculus cDNA clone 467293 | 734-1135 | 402 | 79 | AA034629
chimpanzee GOR | mm36e01.r1 Stratagene mouse skin (#937313) Mus musculus cDNA clone 523608 | 744-867 | 121 | 81 | AA097346
chimpanzee GOR | mi26g10.r1 Soares mouse embryo NbME13.5 14.5 Mus musculus cDNA clone 464706 | 8-158; 195-299 | 151; 105 | 72; 80 | AA030569
chimpanzee GOR | mo54f10.r1 Life Tech mouse embryo 10 5dpc 10665016 Mus musculus cDNA clone 557419 | 450-551 | 102 | 82 | AA110524
Only significant similarities with the GenBank database are listed.

Table 3.4 GenBank Expressed Sequence Tag (EST) Database Search with the OPLAH Gene
Gene searched | Description | Region of match | Length (bp) | % Identity | GenBank accession
rat OPLAH | va86h07.r1 Soares mouse NML Mus musculus cDNA clone 738301 | 3406-3441; 3454-3708; 3700-3876; 3851-3896 | 36; 255; 177; 46 | 100; 92; 94; 65 | AA271445
rat OPLAH | mc06d08.r1 Soares mouse p3NMF19.5 Mus musculus cDNA clone 338127 | 3340-3449; 3451-3730; 3724-3754 | 110; 280; 31 | 96; 92; 93 | W29895
rat OPLAH | va86h08.r1 Soares mouse NML Mus musculus cDNA clone 738303 | 3409-3440; 3449-3825 | 32; 377 | 96; 88 | AA271446
rat OPLAH | mk17c05.r1 Soares mouse p3NMF19.5 Mus musculus cDNA clone 493160 | 3340-3441; 3441-3489 | 102; 49 | 97; 85 | AA097778
Only significant similarities with the GenBank database are listed.

Table 3.5 PROSITE Database Searches with the GOR Gene
Gene | Annotated domain | Location and sequence | Total matches
GOR | N-glycosylation site | 379: NSSE | 1
GOR | cAMP-dependent protein kinase phosphorylation site | 13: RRPS; 219: RKDS | 2
GOR | Protein kinase C phosphorylation site | 7: SSK; 8: SKR; 19: SLK; 87: TLK; 151: SCR; 167: SGR; 229: TFK; 358: SLR | 8
GOR | Casein kinase II phosphorylation site | 55: SKQE; 262: TVVD; 293: TEAD; 339: TVLD; 380: SSED | 5
GOR | Tyrosine kinase phosphorylation site | 169: RCVRDQLCY; 277: KPDNEIVDY | 2
GOR | N-myristoylation site | 93: GLTPSS; 101: GLSRAA; 190: GGRVSQ; 204: GSVGCQ; 207: GCQVAK; 255: GLELTR; 291: GVTEAD; 443: GAVDGR; 473: GLSPSL | 9
GOR | Amidation site | 217: DGRK; 446: DGRR | 2

Table 3.6 Protein Coding Potential of BssHII-EcoRI Subclones as Analyzed by GRAIL
Subclone | Frame | ORF start | ORF end | Score | Quality
13E3 0.6 | 2 | 228 | 333 | 55.00 | good
46F4 1.6 | -1 | 422 | 569 | 90.00 | excellent
46F4 1.6 | -2 | 84 | 278 | 80.00 | excellent
46F4 1.8 | 1 | 314 | 616 | 65.00 | good
93C11ΔBssHII | -2 | 492 | 596 | 85.00 | excellent

Table 3.7 GC Content of BssHII-EcoRI Subclones
Subclone | Length (bp) | G+C (%) | Expected CpG | Observed CpG | Observed GpC | Observed TpG | Obs/Exp CpG§
13E3 0.6 | 650 | 56 | 49 | 28 | 51 | 46 | 0.57
40G1 1.4 | 650 | 47 | 33 | 18 | 30 | 27 | 0.54
40G1 1.6 | 650 | 57 | 48 | 31 | 47 | 55 | 0.64
40G1 1.6* | 650 | 58 | 46 | 29 | 44 | 31 | 0.63
46F4 1.6 | 650 | 64 | 64 | 30 | 60 | 49 | 0.47
46F4 1.8 | 650 | 66 | 68 | 43 | 63 | 46 | 0.63
46F11 2.5* | 649 | 37 | 32 | 5 | 21 | 45 | 0.15
77C1 2.5 | 650 | 53 | 48 | 22 | 45 | 36 | 0.46
93C11ΔBssHII | 650 | 48 | 31 | 18 | 34 | 34 | 0.59
141A6 1.7 | 650 | 33 | 18 | 6 | 18 | 41 | 0.34
156G7 2.3 | 135 | 68 | 14 | 9 | 15 | 11 | 0.64
166H7 5.0 | 650 | 56 | 46 | 31 | 46 | 40 | 0.68
166H7 9.2 | 650 | 59 | 40 | 31 | 33 | 26 | 0.77
175G8 3.3 | 650 | 49 | 38 | 8 | 40 | 44 | 0.21
176F5 1.8 | 650 | 51 | 42 | 16 | 37 | 52 | 0.38
* Sequenced from the EcoRI site.
§ Values over 0.60 are considered indicators of CpG islands. Bulk genomic DNA has frequencies of 0.20.

Chapter IV: Discussion

4.1 Gene Identification
An important focus of research in human genetics is the identification of genes. Eventually this will be accomplished by sequencing the entire genome, predicted to be completed in 2005, followed by identification and localization of genes on genetic maps. For a number of organisms this process is well under way. Saccharomyces cerevisiae is the first eukaryote to have its entire genome sequenced and all of its genes identified (Goffeau et al., 1996). Current research on its genome organization and gene function is pointing the way for future work on the human genome. Sequencing of the Caenorhabditis elegans genome is nearing completion, with roughly 30% remaining. Similarity searches against the database, as well as programs that identify coding features, are being used to uncover the genes within this sequence. Functional analysis of genes identified in these genomes will assist in the analysis of homologous human genes. Even now, over half of the cloned genes known to cause human heritable diseases are similar to a gene in either the yeast or the C. elegans genome (Goffeau et al., 1996; Riddle et al., 1997). Sequence comparison between organisms enables important features such as genes to be detected, providing an indispensable tool for analysis of the human genome. As large-scale sequencing of the human genome begins, researchers are looking ahead to the next steps in characterizing the genome (Lander, 1996).
Resequencing of selected regions has been suggested as a way to identify allelic variation in populations, as well as variation in coding regions underlying predisposition and susceptibility to disease. As genes are identified and placed on genetic maps, more information about genome organization will become available, providing new insights into the human genome.

While sequencing of the human genome awaits completion, efforts are under way to sequence the estimated 50,000 to 100,000 genes encoded within it. Numerous techniques, such as cDNA selection and exon trapping, are currently used to isolate genes within small regions (about 1 Mb) of DNA (reviewed in Monaco, 1994). Large-scale sequencing of expressed sequence tags (ESTs) as part of the Human Genome Project (Adams et al., 1991; Hillier et al., 1997) has identified 709,530 ESTs. However, it is difficult to construct cDNA libraries that represent genes transcribed at low levels, only at particular stages of growth, or only in certain tissues. Genes of these types are therefore likely to be missed. In addition, other genes are represented by many ESTs in the database. To reduce this redundancy, ESTs were compared with one another and with the 3' ends of known genes (Schuler et al., 1996). This identified approximately 50,000 unique sequences (www.ncbi.nlm.nih.gov/UniGene/), more than 16,000 of which have been positioned on genetic maps spanning the human genome. As more genes are identified, they too will be placed on genetic maps, providing a framework to which further information can be added.

Searches for genes in the human genome revealed that regions with a high GC content also contain clusters of coding sequences (Fields et al., 1994). Larsen et al. (1992) found that all ubiquitously expressed genes present in the EMBL DataBank (release 28) were associated with regions known as CpG islands. The same was true of 40% of the tissue-specific genes.
CpG islands are readily identified with rare-cutting enzymes because CpG dinucleotides occur at a much higher frequency in island DNA than in bulk genomic DNA (Cross and Bird, 1995). BssHII is a rare-cutting enzyme with 80% of its sites in CpG islands (Larsen et al., 1992), which makes it particularly useful for locating these sequences. Because CpG islands can be discerned within cloned genomic DNA (Lindsay and Bird, 1987), their identification offers an alternative method of gene detection. The intent of this project was to identify novel human genes by sequencing adjacent to BssHII sites on chromosome 8. This method identified two ESTs, two open reading frames, two CpG islands and two novel human genes, demonstrating the effectiveness of this approach.

Sequencing of expressed sequences has identified a large number of genes, but approximately half of those predicted to be in the human genome remain to be found. Sequencing of the entire genome will provide the sequence of all of the genes, but the task of recognizing them will remain. Programs such as GRAIL assist in gene detection by searching sequence for features of coding DNA. Identification of CpG islands has the advantage of being useful both before and after sequencing of the human genome is complete. Sequencing DNA adjacent to a rare-cutting enzyme site has proven to be a successful method of gene detection in this project, as well as in others (Zhu et al., 1993; Lee et al., 1994; John et al., 1994). Moreover, once sequencing of the genome is complete, CpG islands can be identified by searching for rare-cutting enzyme recognition sequences. Thus, identification of genes through the use of a rare-cutting enzyme is a method that can be used throughout the sequencing of the human genome.
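Searching a finished sequence for rare-cutter recognition sites, as described above, amounts to a simple pattern scan. The Python sketch below is minimal and assumes only the BssHII recognition sequence GCGCGC, which is cut between the first G and the C. `find_sites` and `reverse_complement` are hypothetical helper names. GCGCGC is palindromic, so scanning one strand also finds every site on the other strand.

```python
import re

# BssHII recognizes the 6 bp palindrome GCGCGC and cuts G^CGCGC.
BSSHII_SITE = "GCGCGC"


def find_sites(seq: str, site: str = BSSHII_SITE) -> list[int]:
    """Return 0-based start positions of every (possibly overlapping) match of site."""
    seq = seq.upper()
    # A zero-width lookahead lets overlapping sites such as GCGCGCGC be reported separately.
    return [m.start() for m in re.finditer(f"(?={site})", seq)]


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string; N is left as N."""
    return seq.upper().translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


if __name__ == "__main__":
    # First 55 bases of subclone 156G7 2.3 (Figure 4.2), which begins at a BssHII site.
    fragment = "GCGCGCCCTCCTCGGATATTTATATCCCCTGGGCCCCTGCCCACTGCTCCCCTCC"
    print(find_sites(fragment))
    # The site is its own reverse complement, so one strand suffices.
    print(reverse_complement(BSSHII_SITE) == BSSHII_SITE)
```

A real genome scan would run the same search over each chromosome sequence. The lookahead matters because a run such as GCGCGCGC contains two overlapping recognition sites.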
4.2 DNA Adjacent to BssHII Sites
Sequence was obtained from regions adjacent to BssHII sites in cosmids selected from the LA08NC01 chromosome 8 library, and its coding DNA features were examined. The sequence obtained was compared with GenBank's databases and searched for open reading frames. This identified two ESTs, two open reading frames, and two novel human genes. In addition, two CpG islands were detected. One of these is believed to be the 5' island for human EST H88121, which was partially sequenced during this project. The hypothetical proteins encoded by the open reading frames were compared with protein data in the SwissProt database. The aim was to find similarities with better-understood genes in other organisms and thereby gain insight into possible functions for these genes. Unfortunately, no significant homologies were found. A search of the PROSITE database of protein motifs identified a number of protein consensus sites within the amino acid sequence of the GOR gene. These sites suggest possible post-translational modifications of the protein. Thus, sequence obtained during this project identified a number of genes and gene-associated regions, supporting the use of the rare-cutting enzyme BssHII as a method of gene detection.

4.2.1 Repetitive DNA
4.2.1.1 Alu elements
Some of the cosmids selected for this project were found to contain repetitive DNA. The Alu family of repetitive DNA, present in excess of 500,000 copies in the human genome (Deininger et al., 1981), is known to contain a BssHII site. After cosmid 93C11ΔBssHII was found to include Alu sequence, Alu repetitive DNA was used as a probe against cosmid DNA. The purpose was to eliminate fragments containing an Alu element from the set to be sequenced. Two subcloned cosmid fragments, 141A6 1.7 and 176F5 1.8, were nevertheless found to contain a small region of Alu sequence.
The amount of Alu sequence in these fragments is modest (49 and 109 base pairs, respectively), which probably explains why initial screening did not produce an identifiable signal. An individual Alu element is approximately 300 base pairs long. It is composed of two tandemly arranged halves separated by an A-rich region (Batzer et al., 1990). Each element possesses a 3' oligo-dA tail of variable length and is flanked by short direct repeats. The formation and evolution of Alu elements are believed to have involved reverse transcription of an RNA intermediate and integration of the resulting DNA copy at a new site in the genome (Deininger and Daniels, 1986). Ancestrally, Alu sequences are believed to have originated from the 7SL RNA gene, because both share sequence and secondary structure domains. Divergence from the Alu progenitor gene has resulted in two major subfamilies, Alu-J and Alu-S (Jurka and Milosavljevic, 1991). Alu-S is further subdivided into at least five distinct subfamilies. Folding patterns predicted for RNA derived from each Alu subfamily are the same despite deviations in sequence (Labuda and Striker, 1989). This conservation suggests an important, as yet undiscovered function for these elements.

4.2.1.2 LINE1 elements
Another type of repetitive DNA, LINE1, was detected throughout subclone 77C1 2.4. Long interspersed repetitive sequences (LINE1) comprise a family of repeated DNA accounting for approximately 5% of the human genome (Dombroski et al., 1991). As was proposed for Alu elements, LINE1s appear to be formed through retroposition (Deininger and Daniels, 1986). A complete, transpositionally active L1 element is 6 to 7 kilobases long. It has a 5' untranslated region, two open reading frames (one of which contains a region similar to reverse transcriptase), and a 3' untranslated region followed by an A-rich tail (Smit et al., 1995; Jurka, 1989).
These elements are present approximately 100,000 times in the human genome; however, only about 3,500 are complete (Dombroski et al., 1991). Most members of the L1 family are truncated at the 5' end, so that only part of the 3' untranslated region remains. As with Alu elements, comparisons between L1 sequences allow subfamilies to be identified on the basis of nucleotide variation. L1.2 matched sequence from subclone 77C1 2.4 and has been identified as a full-length L1 element containing both open reading frames (Dombroski et al., 1991).

4.2.1.3 VNTR and MER elements
Representatives of two families of repetitive elements were found within subclone 46F11 2.5. The first of these, variable number tandem repeats (VNTRs), are segments of sequence repeated in tandem a varying number of times, which gives these markers alleles of different sizes. The VNTR present in this BssHII-EcoRI fragment is associated with a short interspersed sequence retroposon (SINE-R.C2). This retroposon is predicted to be present 4,000 to 5,000 times in the human genome (Zhu et al., 1992). Sequence obtained from the EcoRI site was identified as a MER11A element. MER repeats are found hundreds to thousands of times in the human genome, making them one of the more frequent types of repeat (Jurka et al., 1993). The origin of these sequences is unknown. They are presumed to be pseudogenes that arose from an active gene by a process not yet determined. Sequences in this family vary in length from 150 to 650 base pairs (Jurka, 1989) and are predominantly A+T rich. Within the set of cosmids sequenced, two were found to include a significant proportion of repetitive DNA. The remaining nine cosmids provided non-repetitive sequence for analysis and were used towards the goal of this project.
4.2.2 Identification of ESTs
Examination of the expressed sequence tag (EST) database identified two human ESTs with significant homology to sequence obtained during this project. The first of these, H88121, was identified in a 9.2 kilobase BssHII-EcoRI fragment from cosmid 166H7. This EST (Hillier et al., 1995) had previously been placed near the telomere of the long arm of chromosome 8, at position 731.5 cR on the WICGR radiation hybrid map. A second human EST, W90101, was identified in a 1.8 kilobase BssHII-EcoRI fragment from cosmid 176F5. This EST was localized by STS mapping using primers designed from DNA adjacent to the BssHII site (Figure 3.2). The panel used consists of cell hybrids spanning chromosome 8 (Wager et al., 1991; Wood et al., 1986). Both MGV 270 and MGV 271 produced a product, localizing the EST to 8q24.1-qter (Figure 3.3). Although the goal of this project was to identify novel genes, sequencing DNA adjacent to BssHII sites also recognized two previously identified human ESTs.

4.2.3 Identification of Open Reading Frames
GRAIL predicts open reading frames in DNA adjacent to BssHII sites within subclone 13E3 0.6 and within the non-repetitive region of 93C11ΔBssHII. In 13E3 0.6, the predicted open reading frame is flanked by consensus splice sequences (Shapiro and Senapathy, 1987): TGGTGGTGCTGGAG/G at the 3' junction and AG/GTGCTCCCTGCT at the 5' junction (Table 4.1). The open reading frame in 93C11ΔBssHII is also flanked by consensus splice sequences: AANANANTGCNCAG/G at the 3' junction and AG/GAGATNGAGAC at the 5' junction (Table 4.1). Translation yields putative proteins of 32 and 34 amino acids, encoded by 13E3 0.6 and 93C11ΔBssHII respectively. Examination of the SwissProt database uncovered no significant matches, suggesting that homologous proteins have not yet been identified.
Thus, sequencing uncovered open reading frames within two coding regions that have no homologous sequence identifiable in GenBank's databases.

4.2.4 Identification of CpG islands
Previous work on CpG islands (Aissani and Bernardi, 1991; Gardiner-Garden and Frommer, 1987; Larsen et al., 1992) concluded that 80% are located at the 5' end of a gene. Most of these islands begin upstream of the associated gene's transcription start site, at distances ranging from under 100 base pairs to over 2 kilobases. CpG islands range in size from 500 to 2,000 base pairs (Antequera and Bird, 1993), and the BssHII site may lie anywhere within this sequence. It is therefore not surprising that sequence from four subcloned fragments (141A6 1.7, 156G7 2.3, 166H7 5.0, 175G8 3.3) neither matched genes in the databases nor contained an open reading frame. Presumably these represent CpG islands located upstream of the first exon. In that case, fewer than one hundred base pairs of coding DNA would lie within 650 base pairs of the BssHII site.

If these BssHII sites do identify CpG islands, the sequence should possess certain features. CpG islands have a higher GC content than bulk genomic DNA (Bird, 1986). This was true for three of the four subclones (156G7 2.3, 166H7 5.0, and 175G8 3.3), as shown in Table 3.7. In addition, the dinucleotide CpG is found at a higher frequency in island DNA than in the rest of the genome (Cross and Bird, 1995). Two of the subclones, 156G7 2.3 and 166H7 5.0, had a ratio of observed to expected CpG dinucleotides greater than 0.60. This value has been used as the lower limit indicative of a CpG island (Gardiner-Garden and Frommer, 1987; Larsen et al., 1992); bulk genomic DNA has a value of approximately 0.20. Thus, sequence analysis supports the assertion that sequencing adjacent to BssHII sites identifies CpG islands, each of which is likely to be associated with a gene.
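The two island criteria discussed above are GC content and the ratio of observed to expected CpG. Both can be computed directly from a sequence, as in the minimal Python sketch below. It assumes the usual Gardiner-Garden and Frommer (1987) convention, in which expected CpG = (number of C x number of G) / sequence length. The thesis does not state its exact formula, so this is an assumption. `cpg_stats` and `looks_like_cpg_island` are hypothetical names, and only the 0.60 cut-off quoted in the text is applied.

```python
def cpg_stats(seq: str) -> dict[str, float]:
    """GC content and observed/expected CpG ratio for one sequence.

    Expected CpG is taken as (#C x #G) / length (assumed convention).
    Ambiguous bases such as N count toward the length only.
    """
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    observed = sum(1 for i in range(n - 1) if seq[i:i + 2] == "CG")
    expected = c * g / n if n else 0.0
    return {
        "gc_percent": 100.0 * (c + g) / n if n else 0.0,
        "observed_cpg": float(observed),
        "expected_cpg": expected,
        # Define the ratio as 0 when no CpG is expected (no C or no G present).
        "obs_exp": observed / expected if expected else 0.0,
    }


def looks_like_cpg_island(seq: str, threshold: float = 0.60) -> bool:
    """Apply only the obs/exp cut-off used in the text (values over 0.60)."""
    return cpg_stats(seq)["obs_exp"] > threshold


if __name__ == "__main__":
    print(cpg_stats("CGCGTTCGAA"))
```

The ratio can also be checked against the counts in Table 3.7. For 13E3 0.6, 28 observed over 49 expected CpG gives about 0.57, just under the cut-off. For 166H7 9.2, 31 observed over 40 expected gives 0.775.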
As mentioned previously, most CpG islands are located upstream of the transcription start site. In genes transcribed by RNA polymerase II, this start site is preceded by two consensus sequences known as TATA and CCAAT elements. Sp1 transcription factor binding sites (GGGCGG) may also be located a short distance upstream of the transcription start site. These consensus sequences were searched for within the sequence from the four subclones. TATA consensus sequences are present in three of them (Table 4.2); however, no CCAAT elements or Sp1 binding-site consensus sequences were found. Translation initiation typically begins a short distance downstream of a TATA element. Translation begins at an ATG start codon flanked by sequence favorable to initiation (Kozak, 1996). The optimal context for initiation in vertebrates is GCCACCatgG; however, variations such as YNNatgG are also found (Y represents a pyrimidine). A reasonable initiation sequence (TGAatgG) was identified 8 base pairs downstream of the TATA element in subclone 166H7 5.0 (Figure 4.2). Thus, it appears that this method identified a gene and its associated CpG island. Primers designed during this project are currently being used to localize this CpG island on chromosome 8.

4.2.5 Identification of Novel Human Genes
4.2.5.1 GOR Gene
Investigation of the non-redundant database identified a novel human gene within cosmid 40G1. Mishiro et al. (1990) had previously sequenced 2859 base pairs of this cDNA in chimpanzee as an EcoRI fragment. The corresponding region of the human homologue, identified in this thesis, lies within a 3.3 kilobase EcoRI fragment. This fragment contains two BssHII sites, which yield two BssHII-EcoRI fragments of 1.4 and 1.6 kilobases and a 300 base pair BssHII fragment (Figure 4.3). Few sequence differences were found between the human and chimpanzee genes (Figures 4.4a-c).
Sequence obtained from the EcoRI site of the 1.6 kilobase fragment (Figure 4.4a) shows more variation between human and chimpanzee (61% identity). This is due in part to the lower quality of the sequence obtained for the last 200 base pairs. Among the 76 amino acids compared there are 9 amino acid differences and 19 triplets that could not be translated. The amino acid differences were due to substitutions at codon position 1 in thirteen cases, position 2 in seven cases, and position 3 in eight cases. Three additional silent substitutions were found at position 3. Among the 168 amino acids compared in Figure 4.4b there are 30 differences (82% identity). Substitutions at codon position 1 account for sixteen of the differences, position 2 for six, and position 3 for eight. An additional ten silent substitutions were observed, nine at position 3 and one at position 1. Among the 20 amino acids compared in Figure 4.4c, 2 amino acid differences were identified (90% identity), due to one substitution at position 1 and one at position 2.

This novel human gene was localized by STS mapping. The primers, shown underlined in Figure 4.4b, were designed from DNA adjacent to the BssHII site in the 1.6 kilobase fragment. Of the cell hybrid panel spanning chromosome 8 (Wager et al., 1991; Wood et al., 1986), only XVIII-23Ha produced a product, localizing GOR to 8q13-q22.2 (Figure 3.5). STS typing in the G3 radiation hybrid panel confirmed this location by linking the gene to Genethon marker D8S1757, which is assigned to the same region of chromosome 8. The 5' chimpanzee EcoRI recognition site is not present in the human DNA sequenced. As a result, the human EcoRI fragment extends 417 bases 5' of the chimpanzee fragment, to the preceding EcoRI site. A potential 3' splice junction, AAGACGCGCAACAG/G, was identified in the human sequence, beginning 398 base pairs into the 1.6 kilobase fragment.
No intronic sequence is present downstream of human base pair 421, where the chimpanzee sequence begins. It therefore appears that the 3' exon of a novel human gene has been identified, extending approximately 3 kilobases. In both human and chimpanzee, this 3' exon contains a truncated L1 element starting at base pair 1807 of the chimpanzee sequence. Mishiro et al. (1990) first reported the isolation of the chimpanzee cDNA clone GOR-47-1, which contains the GOR epitope (GRRGQKAKSNPNRPL). The protein encoded by GOR was identified because of its high levels in the blood plasma of chimpanzees infected with hepatitis C virus. The GOR epitope within this protein was found to cross-react with antibodies from patients infected with hepatitis C. It was therefore suggested as a means of detecting low-level virus infection in blood (Mishiro et al., 1990).

Searches of the EST database with the human and chimpanzee sequences identified six mouse ESTs (W81932; AA034629; AA097346; AA120740; AA030569; AA110524). These ESTs originated from several different mouse cDNA libraries (embryo, skin and testis), suggesting that the homologous mouse gene is widely expressed. Although the function of the GOR gene product is unknown, its high level of conservation between mammalian genomes suggests that it has an important function.

4.2.5.2 5-Oxo-L-Prolinase Gene
Searches of the non-redundant database with two subcloned fragments from cosmid 46F4 recognized a novel human gene previously identified in rat. The gene (OPLAH) encodes an enzyme, 5-oxo-L-prolinase, that catalyzes the cleavage of 5-oxo-L-proline to form L-glutamate. The 5-oxo-L-prolinase cDNA was isolated from a rat kidney library and sequenced in an effort to characterize this enzyme in rats (Guo-jie et al., 1996). Sequence obtained from the two subcloned fragments matched the rat OPLAH cDNA between base pairs 2857 and 3395.
The region subcloned in the human homologue consists of two BssHII-EcoRI fragments contained within a 4.2 kilobase EcoRI fragment (Figure 4.5). Four exons were detected within the sequence obtained, each flanked on either side by introns. The sequences identified as 3' and 5' splice junctions (Shapiro and Senapathy, 1987) are listed in Table 4.1. A comparison between the rat enzyme and a hypothetical yeast protein, YKL215C, encoded by a homologous gene (Guo-jie et al., 1996), found them to be almost 50% identical. Regions of the rat protein were also compared with the four homologous human exons, with the following results:
Exon a: among 40 amino acids there were 7 differences (83% identity) (Figure 4.6a). These were due to substitutions at codon position 2 in five cases and position 3 in two cases. Five silent substitutions were identified, one at position 1 and four at position 3.
Exon b: among 49 amino acids there were 3 differences (94% identity) (Figure 4.6b). Two of these were due to a substitution at position 1 and the third to a substitution at position 2. There were also nineteen silent substitutions, sixteen at position 3 and three at position 1.
Exon c: among 45 amino acids there were 5 differences (89% identity) (Figure 4.6c). These were due to substitutions at position 1 in four cases and position 2 in one case. An additional seventeen silent substitutions were identified, fifteen at position 3 and two at position 1.
Exon d: among 46 amino acids there were 10 differences (78% identity) (Figure 4.6d), all due to substitutions at codon position 1. An additional eighteen silent substitutions were recognized, seventeen at codon position 3 and one at position 1.
The high level of conservation of this gene throughout eukaryotic evolution demonstrates the importance of this enzyme's function.
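The codon-by-codon bookkeeping used above for the OPLAH exons, and earlier for GOR, can be sketched in Python. It counts amino acid differences, computes percent identity, and tallies substitutions by codon position as replacement or silent. This is one plausible reading of how such counts can be made, not the thesis's documented procedure. It assumes gap-free, in-frame alignments and the standard genetic code, and `translate` and `compare_codons` are hypothetical names. Codons containing N are skipped, in the spirit of the triplets that "could not be translated". The checks below use a 12-codon stretch of the human and chimpanzee GOR alignment from Figure 4.4b.

```python
from itertools import product

# Standard genetic code with codons enumerated in TCAG order ('*' = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(cds: str) -> str:
    """Translate in frame 1; codons containing N (or any other ambiguity) become X."""
    cds = cds.upper()
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, len(cds) - 2, 3))


def compare_codons(a: str, b: str) -> dict:
    """Codon-by-codon comparison of two gap-free, in-frame coding sequences."""
    a, b = a.upper(), b.upper()
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = untranslatable = 0
    replacement = [0, 0, 0]  # substitutions in codons whose amino acid changed
    silent = [0, 0, 0]  # substitutions in codons whose amino acid is unchanged
    for i in range(0, len(a) - 2, 3):
        ca, cb = a[i:i + 3], b[i:i + 3]
        aa_a, aa_b = CODON_TABLE.get(ca), CODON_TABLE.get(cb)
        if aa_a is None or aa_b is None:
            untranslatable += 1  # e.g. triplets containing N
            continue
        compared += 1
        changed = aa_a != aa_b
        differences += int(changed)
        for pos in range(3):
            if ca[pos] != cb[pos]:
                (replacement if changed else silent)[pos] += 1
    identity = 100.0 * (compared - differences) / compared if compared else 0.0
    return {
        "compared": compared,
        "differences": differences,
        "untranslatable": untranslatable,
        "identity_percent": identity,
        "replacement_by_position": replacement,  # index 0 = codon position 1
        "silent_by_position": silent,
    }


if __name__ == "__main__":
    human = "GAGAGCGACCTGATGGCCCTCAAGCTCATCCACAGC"
    chimp = "GAGAGCGATCTGCTGGCCCTGAAGCTCATCCACAGC"
    print(translate(human), translate(chimp))
    print(compare_codons(human, chimp))
```

On this stretch the sketch reports one amino acid difference among 12 codons, a position 1 replacement (ATG to CTG, M to L). It also reports two silent position 3 substitutions (GAC to GAT and CTC to CTG).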
Subsequent searches of the EST database with sequence from the rat OPLAH cDNA identified four mouse ESTs (AA271445; W29895; AA271446; AA097778). Thus, sequencing adjacent to BssHII sites in cosmid 46F4 detected a novel human gene, with homologous genes identified in yeast, mouse and rat. Primers designed for this project are currently being used to localize the OPLAH gene on chromosome 8.

4.3 Conclusions
The goal of this project was to identify previously undetected human genes by sequencing DNA adjacent to a rare-cutting enzyme site. DNA adjacent to BssHII sites in eleven cosmids from the LA08NC01 chromosome 8 library was sequenced. This method identified:
two ESTs;
two open reading frames with no apparent homologues;
two CpG islands, one of which contains a translation initiation signal and is 5' of human EST H88121;
two novel human genes (Table 4.3).
The identification of novel human genes and of open reading frames with no homologues is notable. Many expressed sequences are already present in the database (709,530 human; 182,549 mouse; 30,196 nematode; 9,206 fruit fly; 6,650 rat; 3,042 baker's yeast). As predicted, approximately 80% of regions adjacent to BssHII sites were found to be associated with genes. In conclusion, this thesis demonstrates that sequencing DNA adjacent to BssHII sites is an effective approach for the identification of novel genes.
13E3 0.6
                    R   P   G   R   A   G   S   L   L
TGGTGGTGCTGGAG/G  CGG CCT GGA CGG GCC GGA AGC CTT CTG   262
 P   G   I   P   G   A   A   S   G   H   R   L   S
CCG GGA ATC CCT GGT GCA GCT TCA GGG CAC CGT CTC AGC   301
 P   A   A   G   A   Q   M   C   V   P
CCT GCT GCT GGA GCC CAG ATG TGC GTC CCA G/GTGCTCCCTGCT   344

93C11ΔBssHII
                    K   X   K   S   G   X   R   X   G
AANANANTGCNCAG/G  AAG NAC AAA AGT GGA TNG CGA NNA GGG   568
 G   X   X   R   L   X   X   X   E   T   R   Q   V
GGG GAN CNG AGG CTA NNA AAN NNG GAA ACC CGG CAA GTT   529
 R   G   G   R   L   R   G   X   I   X   G   A
AGA GGG GGA AGG AAG CGA GGC AGN ATC NAN GGG GCA G   492
GAGATNGAGACCCA   478

Figure 4.1 Translations of ORFs predicted by GRAIL, flanked by 3' and 5' splice sequences (Shapiro and Senapathy, 1987).

156G7 2.3
GCGCGCCCTCCTCGGATATTTATATCCCCTGGGCCCCTGCCCACTGCTCCCCTCC   55
CCCACAAGCTGCTGCTGACAGCACGACGGCNGTNCCTCCTCCACCCGACGNTGCG   110
CCAGTGGTTGTGCTCCTGCGAGGGG   135

166H7 5.0
GTCATATAAGGGCTGAATGGGGGGAATGGCCACGGGNCTCTCCGAAANGGAANGC   330
CTGCTCTTNTCCANACNGGCACGGCCACCGACAAACTCAGGCCNTCCATGGGACA   385
GGAACANCNCCTGGCCCCATTTGGGGATCTTGAAGATGCNNCTTTGAGAAAGGCC   440
TCTGGGCATCTCTNANCTGGTAACTNTTNGGGCATTTGTNAAGAANTCTTCAATC   495
NCTAATTNACAAACCCCAACTCAAANCGAACTTGCACGAAATAAAAAATTTNTTA   550
ATGGAAGTTTNGAAGGGAACCACCTTCCTGCATTGCATTTGATCCAAAATCCCCC   605
TTNCAACCTTTTTTTNTGGNNCCCCNCAATTTTCCCCCCTTCCGC   650

Figure 4.2 CpG islands with TATA boxes and potential translation start sequence (Kozak, 1996) highlighted in italics.

[Figure 4.3: EcoRI (E) and BssHII (B) maps of the human genomic fragment (1.6 kb and 1.4 kb BssHII-EcoRI fragments) and the chimpanzee cDNA (B at 1195 and 1516; E at 2859)]

Figure 4.3 GOR EcoRI fragment comparison between human and chimpanzee. The human and chimpanzee EcoRI fragments are shown to scale. The human genomic fragment extends 421 base pairs 5' of the chimpanzee fragment. The location of the PCR product is shown on the 1.6 kb fragment. E: EcoRI; B: BssHII.
Figure 4.4a GOR sequence comparison between human and chimpanzee beginning at the 5' EcoRI site. The human sequence is numbered from the EcoRI site in the 1.6 kb EcoRI-BssHII genomic fragment. The chimpanzee sequence is numbered from the EcoRI site of the partial cDNA as reported by Mishiro et al. (1990).

Figure 4.4b GOR sequence comparison between human and chimpanzee beginning at the 5' BssHII site. The human sequence is numbered from the BssHII site in the 1.6 kb EcoRI-BssHII genomic fragment. The chimpanzee sequence is numbered from the EcoRI site of the partial cDNA as reported by Mishiro et al. (1990).
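Figures 4.4a and 4.4b show the human and chimpanzee GOR sequences codon by codon with one-letter translations, writing X where a codon contains an unresolved base. The Python sketch below shows that translation step. It assumes the standard genetic code, and the names `CODON_TABLE` and `translate` are hypothetical.

```python
"""Translate a DNA reading frame into one-letter amino acid codes.

Sketch assuming the standard genetic code. Codons containing an
ambiguous base (e.g. N) are rendered as X; '*' marks a stop codon.
"""
from itertools import product

BASES = "TCAG"
# Amino acids for all 64 codons enumerated in TCAG order (TTT, TTC, TTA, ...).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq: str, frame: int = 0) -> str:
    """Translate seq starting at offset `frame` (0, 1 or 2).

    Trailing bases that do not complete a codon are ignored.
    """
    seq = seq.upper().replace("U", "T")
    protein = []
    for i in range(frame, len(seq) - 2, 3):
        protein.append(CODON_TABLE.get(seq[i:i + 3], "X"))
    return "".join(protein)
```

For example, the human codon run GAG AGC GAC CTG ATG GCC CTC AAG CTC ATC CAC AGC from the GOR comparison translates to `ESDLMALKLIHS`, and a codon such as ATN becomes X.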
Figure 4.4c GOR sequence comparison between human and chimpanzee beginning at the second BssHII site. The human sequence is numbered from the BssHII site in the 1.4 kb BssHII-EcoRI genomic fragment. The chimpanzee sequence is numbered from the EcoRI site of the partial cDNA as reported by Mishiro et al. (1990).

Figure 4.5 5-oxo-L-prolinase comparison between human and rat. The 650 base pairs of sequence obtained from the 1.6 and 1.8 kb BssHII-EcoRI subclones are shown as a line where the sequence is intronic and as dark boxes, labeled a-d, where it is exonic. The locations of these exons in the rat cDNA are shown as white boxes. B: BssHII. Panels: human genomic; rat cDNA.

Figure 4.6a 5-oxo-L-prolinase sequence comparison between human and rat for the first exon identified in the human sequence. The human sequence is numbered from the BssHII site in the 1.6 kb EcoRI-BssHII genomic fragment. The rat sequence is numbered from the complete cDNA as reported by Guo-jie et al. (1996).
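Where two sequences are aligned position by position without gaps, as in the human-rat exon comparisons of Figure 4.6, percent identity is a simple way to summarize similarity. The helper below is a hypothetical sketch, not the method used in the thesis. It assumes the sequences are already aligned to equal length with no gaps, and it skips positions where either sequence has an N instead of scoring them as mismatches.

```python
"""Percent identity of two gap-free, equal-length aligned sequences.

Hypothetical helper: positions with an ambiguous base in either
sequence are excluded from the comparison.
"""


def percent_identity(a: str, b: str, ambiguous: str = "N") -> float:
    """Return 100 * matches / compared over unambiguous positions."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in ambiguous or y in ambiguous:
            continue  # unresolved base call: do not score this position
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no unambiguous positions to compare")
    return 100.0 * matches / compared
```

For example, `percent_identity("ACGTACGTAC", "ACGTTCGTAC")` gives 90.0. Skipping N positions, rather than counting them as mismatches, avoids understating identity in reads with unresolved bases. The helper does not handle the gapped stretches seen in some of the untranslated comparisons.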
Figure 4.6b 5-oxo-L-prolinase sequence comparison between human and rat for the second exon identified in the human sequence. The human sequence is numbered from the BssHII site in the 1.6 kb EcoRI-BssHII genomic fragment. The rat sequence is numbered from the complete cDNA as reported by Guo-jie et al. (1996).

Figure 4.6c 5-oxo-L-prolinase sequence comparison between human and rat for the third exon identified in the human sequence. The human sequence is numbered from the BssHII site in the 1.8 kb BssHII-EcoRI genomic fragment. The rat sequence is numbered from the complete cDNA as reported by Guo-jie et al. (1996).

Figure 4.6d 5-oxo-L-prolinase sequence comparison between human and rat for the fourth exon identified in the human sequence. The human sequence is numbered from the BssHII site in the 1.8 kb BssHII-EcoRI genomic fragment. The rat sequence is numbered from the complete cDNA as reported by Guo-jie et al. (1996).

Table 4.1 RNA Splice Junction Sequences

Subclone          3' splice junction     5' splice junction
13E3 0.6          TGGTGGTGCTGGAG/G       AG/GTGCTCCCTGCT
93C11ΔBssHII      AANANANTGCNCAG/G       AG/GAGATNGAGAC
46F4: exon a      CCAGAAACCTGCAG/G       AG/GTGGGCCTGGG
46F4: exon b      CTGNTCCCTGCCAG/G       AG/GTGGGCCGTGG
46F4: exon c      GCCTAGGCGCGCAG/G       AG/GTTCGCAGGGG
46F4: exon d      CCCGCCCCGCACAG/G

Splice site consensus sequences from each intron/exon boundary identified in the human 5-oxo-L-prolinase gene, listed as exons a-d, and from two open reading frames.
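Table 4.1 writes each junction with the intron/exon boundary marked by a slash. The canonical GT-AG rule expects introns to begin with GT and end with AG, and a pyrimidine-rich tract usually precedes the 3' AG. The helpers below are a hypothetical sketch that checks those features for entries written in the table's notation. The 10-base window used for the pyrimidine tract is an arbitrary assumption.

```python
"""Check splice junctions written in Table 4.1 notation.

3' junctions are written intron/exon (e.g. 'CCAGAAACCTGCAG/G');
5' junctions are written exon/intron (e.g. 'AG/GTGGGCCTGGG').
Function names and the pyrimidine window size are assumptions.
"""


def acceptor_has_ag(junction: str) -> bool:
    """True if the intron side of a 3' junction ends in AG."""
    intron, _exon = junction.upper().split("/")
    return intron.endswith("AG")


def donor_has_gt(junction: str) -> bool:
    """True if the intron side of a 5' junction starts with GT."""
    _exon, intron = junction.upper().split("/")
    return intron.startswith("GT")


def pyrimidine_fraction(junction: str, window: int = 10) -> float:
    """Fraction of C/T in the last `window` intron bases before the AG."""
    intron, _exon = junction.upper().split("/")
    tract = intron[:-2][-window:]
    if not tract:
        return 0.0
    return sum(base in "CT" for base in tract) / len(tract)
```

For example, `donor_has_gt("AG/GTGGGCCTGGG")` is `True` and `acceptor_has_ag("CCAGAAACCTGCAG/G")` is `True`. The checks only test string features of the listed junctions and do not, on their own, establish that a site is used.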
Table 4.2 Potential TATA and CCAAT Boxes and Sp1 Binding Sites

Subclone     TATA box   CCAAT box   Sp1 binding site
141A6 1.7    5          0           0
156G7 3.3    1          0           0
166H7 5.0    1          0           0
175G8 3.3    0          0           0

Table 4.3 Summary of Data from Cosmids

COSMID: 13E3, 40G1, 40G1, 46F4, 46F4, 46F11, 77C1, 93C11, 141A6, 156G7, 166H7, 166H7, 175G8, 176F5
Fragment Isolated (kb): 0.6, 1.4, 1.6, 1.6, 1.8, 2.5, 2.4, 1.7, 2.3, 5.0, 9.2, 3.3, 1.8
Match in Data Base: + + + + + + - + - +
Information from Sequence: ORF; GOR gene; GOR gene; OPLAH gene; OPLAH gene; repetitive DNA; L1 sequence; ORF; CpG island; CpG island; human EST; human EST

References

Adams, M. D., Kelley, J. M., Gocayne, J. D., Dubnick, M., Polymeropoulos, M. H., Xiao, H., Merril, C. R., Wu, A., Olde, B., Moreno, R. F., Kerlavage, A. R., McCombie, W. R., and Venter, J. C. (1991). Complementary DNA Sequencing: Expressed Sequence Tags and Human Genome Project. Science 252: 1651-1656.

Aissani, B., and Bernardi, G. (1991). CpG islands, genes and isochores in the genomes of vertebrates. Gene 106: 185-195.

Altschul, S. F., Gish, W., Miller, W., Myers, E. W., and Lipman, D. J. (1990). Basic Local Alignment Search Tool. Journal of Molecular Biology 215: 403-410.

Antequera, F., and Bird, A. (1993). Number of CpG islands and genes in human and mouse. Proceedings of the National Academy of Sciences USA 90: 11995-11999.

Bairoch, A. (1993). The PROSITE dictionary of sites and patterns in proteins, its current status. Nucleic Acids Research 21(13): 3097-3103.

Batzer, M. A., Kilroy, G. E., Richard, P. E., Shaikh, T. H., Desselle, T. D., Hoppens, C. L., and Deininger, P. L. (1990). Structure and variability of recently inserted Alu family members. Nucleic Acids Research 18(23): 6793-6798.

Bickmore, W. A., and Sumner, A. T. (1989). Mammalian chromosome banding: an expression of genome organization. Trends in Genetics 5: 144-148.

Bird, A. (1980). DNA methylation and the frequency of CpG in animal DNA. Nucleic Acids Research 8(7): 1499-1504.

Bird, A. P. (1986). CpG-rich islands and the function of DNA methylation. Nature 321: 209-213.

Bird, A. P. (1989). Two classes of observed frequency for rare-cutter sites in CpG islands. Nucleic Acids Research 17(22): 9485-9486.

Birnboim, H. C., and Doly, J. (1979). A rapid alkaline extraction procedure for screening recombinant plasmid DNA. Nucleic Acids Research 7: 1513.

Bonaldo, M., Lennon, G., and Soares, M. B. (1996). Normalization and Subtraction: Two Approaches to Facilitate Gene Discovery. Genome Research 6: 791-806.

Buckler, A. J., Chang, D. D., Graw, S. L., Brook, J. D., Haber, D. A., Sharp, P. A., and Housman, D. E. (1991). Exon Amplification: a Strategy to Isolate Mammalian Genes Based on RNA Splicing. Proceedings of the National Academy of Sciences USA 88: 4005-4009.

Claverie, J., and Makalowski, W. (1994). Alu alert. Nature 371: 752.

Collins, F., and Galas, D. (1993). A New Five-Year Plan for the US Human Genome Project. Science 262: 43-46.

Coulondre, C., Miller, J. H., Farabaugh, P. J., and Gilbert, W. (1978). Molecular basis of base substitution hotspots in Escherichia coli. Nature 274: 775-780.

Craig, J. M., and Bickmore, W. A. (1994). The distribution of CpG islands in mammalian chromosomes. Nature Genetics 7: 376-382.

Creighton, T. E. (1993). "Proteins: Structures and Molecular Properties", Second Edition, W. H. Freeman and Company, NY.

Cross, S. H., and Bird, A. P. (1995). CpG islands and genes. Current Opinion in Genetics and Development 5: 309-314.

DeBella, L., Schertzer, M., and Wood, S. Identification of a Novel Human Gene (GOR) Localized to 8q13-8q22. (in preparation).

Deininger, P. L., and Daniels, G. R. (1986). The recent evolution of mammalian repetitive DNA elements. Trends in Genetics 2: 76-80.

Deininger, P. L., Jolly, D. J., Rubin, C. M., Friedmann, T., and Schmid, C. W. (1981). Base Sequence Studies of 300 Nucleotide Renatured Repeated Human DNA Clones. Journal of Molecular Biology 151: 17-33.

Dombroski, B. A., Mathias, S. L., Nanthakumar, E., Scott, A. F., and Kazazian, H. H., Jr. (1991). Isolation of an Active Human Transposable Element. Science 254: 1805-1807.

Duyk, G. M., Kim, S., Myers, R. M., and Cox, D. R. (1990). Exon Trapping: A Genetic Screen to Identify Candidate Transcribed Sequences in Cloned Mammalian Genomic DNA. Proceedings of the National Academy of Sciences USA 87: 8995-8999.

Fields, C., Adams, M. D., White, O., and Venter, J. C. (1994). How many genes in the human genome? Nature Genetics 7: 345-346.

Gardiner-Garden, M., and Frommer, M. (1987). CpG Islands in Vertebrate Genomes. Journal of Molecular Biology 196: 261-282.

Gerken, S. C., Matsunami, N., Lawrence, E., Carlson, M., Moore, M., Ballard, L., Melis, R., Robertson, M., Bradley, P., Eisner, T., Tingey, A., Rodriguez, P., Albertsen, H., Lalouel, J. M., and White, R. (1993). Genetic and physical mapping of simple sequence repeat containing sequence tagged sites from the human genome (unpublished).

Gillespie, D., and Spiegelman, S. (1965). A quantitative assay for DNA-RNA hybrids with DNA immobilized on a membrane. Journal of Molecular Biology 12: 829.

Goffeau, A., Barrell, B. G., Bussey, H., Davis, R. W., Dujon, B., Feldmann, H., Galibert, F., Hoheisel, J. D., Jacq, C., Johnston, M., Louis, E. J., Mewes, H. W., Murakami, Y., Philippsen, P., Tettelin, H., and Oliver, S. G. (1996). Life with 6000 Genes. Science 274: 546-567.

Grunstein, M., and Hogness, D. S. (1975). Colony hybridization: A method for the isolation of cloned DNAs that contain a specific gene. Proceedings of the National Academy of Sciences USA 72: 3961.

Guo-jie, G. E., Breslow, E. B., and Meister, A. (1996). The Amino Acid Sequence of Rat Kidney 5-Oxo-L-Prolinase Determined by cDNA Cloning. The Journal of Biological Chemistry 271(50): 32293-32300.

Hillier, L., Clark, N., Dubuque, T., Elliston, K., Hawkins, M., Holman, M., Hultman, M., Kucaba, T., Le, M., Lennon, G., Marra, M., Parsons, J., Rifkin, L., Rohlfing, T., Soares, M., Tan, F., Trevaskis, E., Waterston, R., Williamson, A., Wohldmann, P., and Wilson, R. (1995). The WashU-Merck EST Project (unpublished).

Hillier, L., Lennon, G., Becker, M., Bonaldo, M. F., Chiapelli, B., Chissoe, S., Dietrich, N., DuBuque, T., Favello, A., Gish, W., Hawkins, M., Hultman, M., Kucaba, T., Lacy, M., Le, M., Le, N., Mardis, E., Moore, B., Morris, M., Parsons, J., Prange, C., Rifkin, L., Rohlfing, T., Schellenberg, K., Soares, M. B., Tan, F., Thierry-Mieg, J., Trevaskis, E., Underwood, K., Wohldman, P., Waterston, R., Wilson, R., and Marra, M. (1997). Generation and Analysis of 280,000 Human Expressed Sequence Tags. Genome Research 6: 807-828.

Holmquist, G. P. (1992). Review Article: Chromosome Bands, Their Chromatin Flavors, and Their Functional Features. American Journal of Human Genetics 51: 17-37.

Inoue, H., Nojima, H., and Okayama, H. (1990). High efficiency transformation of Escherichia coli with plasmids. Gene 96: 23-28.

Ish-Horowicz, D., and Burke, J. F. (1981). Rapid and efficient cosmid cloning. Nucleic Acids Research 9: 2989.

John, R. M., Robbins, C. A., and Myers, R. M. (1994). Identification of genes within CpG-enriched DNA from human chromosome 4p16.3. Human Molecular Genetics 3(9): 1611-1616.

Jurka, J. (1989). Novel families of interspersed repetitive elements from the human genome. Nucleic Acids Research 18(1): 137-141.

Jurka, J. (1989). Subfamily Structure and Evolution of the Human L1 Family of Repetitive Sequences. Journal of Molecular Evolution 29: 496-503.

Jurka, J., Kaplan, D. J., Duncan, C. H., Walichiewicz, J., Milosavljevic, A., Murali, F., and Solus, J. F. (1993). Identification and characterization of new human medium reiteration frequency repeats. Nucleic Acids Research 21(5): 1273-1279.

Jurka, J., and Milosavljevic, A. (1991). Reconstruction and Analysis of Human Alu Genes. Molecular Evolution 32: 105-121.

Kaplan, D. J., Jurka, J., Solus, J. F., and Duncan, C. H. (1991). Medium reiteration frequency repetitive sequences in the human genome. Nucleic Acids Research 19(17): 4731-4738.

Kozak, M. (1996). Interpreting cDNA sequences: some insights from studies on translation. Mammalian Genome 7: 563-574.

Labuda, D., and Striker, G. (1989). Sequence conservation in Alu evolution. Nucleic Acids Research 17(7): 2477-2491.

Larsen, F., Gundersen, G., Lopez, R., and Prydz, H. (1992). CpG Islands as Gene Markers in the Human Genome. Genomics 13: 1095-1107.

Larsen, F., Gundersen, G., and Prydz, H. (1992). Choice of Enzymes for Mapping Based on CpG Islands in the Human Genome. GATA 9(3): 80-85.

Leach, R. J., Banga, S. S., Ben-Othame, K., Chughtai, S., Clarke, R., Daiger, S. P., Kolehmainen, J., Kumar, S., Kuo, M., Macoska, J., Mada, N., Naylor, S. L., Nunes, M., O'Connell, P., Pebusque, M.-J., Pekkel, V., Porter, C. J., Simons, C. T., Sohocki, M. M., Trapman, J., Wells, D., Westbrook, C., and Wood, S. (1996). Report of the third international workshop on human chromosome 8 mapping 1996. Cytogenetics and Cell Genetics 75: 71-84.

Lee, W., Salido, E., and Yen, P. H. (1994). Isolation of a New Gene GS2 (DXS1283E) from a CpG Island between STS and KAL1 on Xp22.3. Genomics 22: 372-376.

Lindsay, S., and Bird, A. P. (1987). Use of restriction enzymes to detect potential gene sequences in mammalian DNA. Nature 327: 336-338.

Mishiro, S., Hoshi, Y., Takeda, K., Yoshikawa, A., Gotanda, T., Takahashi, K., Akahane, Y., Yoshizawa, H., Okamoto, H., Tsuda, F., Peterson, D. A., and Muchmore, E. (1990). Non-A, Non-B hepatitis specific antibodies directed at host-derived epitope: implication for an autoimmune process. Lancet 336: 1400-1403.

Monaco, A. P. (1993). Isolation of genes from cloned DNA. Current Opinion in Genetics and Development 4: 360-365.

Parimoo, S., Patanjali, S. R., Shukla, H., Chaplin, D. D., and Weissman, S. M. cDNA Selection: Efficient PCR Approach for the Selection of cDNAs Encoded in Large Chromosomal DNA Fragments. Proceedings of the National Academy of Sciences USA 88: 9623-9627.

Riddle, D. L., Blumenthal, T., Meyer, B. J., and Priess, J. R. (1997). "C. elegans II", Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.

Sambrook, J., Fritsch, E. F., and Maniatis, T. (1989). "Molecular Cloning: A Laboratory Manual", Second Edition, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.

Sanger, F., Nicklen, S., and Coulson, A. R. (1977). DNA sequencing with chain-termination inhibitors. Proceedings of the National Academy of Sciences USA 74: 5463.

Schuler, G. D., Boguske, M. S., Stewart, E. A., Stein, L. D., Gyapay, G., Rice, K., White, R. E., Rodriguez-Tome, P., Aggarwal, A., Bajorek, E., Bentolila, S., Birren, B. B., Butler, A., Castle, A. B., Chiannilkulchai, N., Chu, A., Clee, C., Cowles, S., Day, P. J. R., Dibling, T., Drouot, M., Dunham, I., Duprat, S., East, C., Edwards, C., Fan, J.-B., Fang, N., Fizames, C., Garrett, C., Green, L., Hadley, D., Harris, M., Harrison, P., Brady, S., Hicks, A., Holloway, E., Hui, L., Hussain, S., Louis-Dit-Sully, C., Ma, J., MacGilvery, A., Mader, C., Maratukulam, A., Matise, T. C., McKusick, K. B., Morissette, J., Mungall, A., Muselet, D., Nusbaum, H. C., Page, D. C., Peck, A., Perkins, S., Piercy, M., Qin, F., Quackenbush, J., Ranby, S., Reif, T., Rozen, S., Sanders, C., She, X., Silva, J., Slonim, D. K., Soderlund, C., Sun, W.-L., Tabar, P., Thangarajah, T., Vega-Czarny, N., Vollrath, D., Voyticky, S., Wilmer, T., Wu, X., Adams, M. D., Auffray, C., Walter, N. A. R., Brandon, R., Deheji, A., Goodfellow, P. N., Houlgatte, R., Hudson Jr., J. R., Ide, S. E., Iorio, K. R., Lee, W. Y., Seki, N., Nagase, T., Ishikawa, K., Nomura, N., Phillips, C., Polymeropoulos, M. H., Sandusky, M., Schmitt, K., Berry, R., Swanson, K., Torres, R., Venter, J. C., Sikela, J. M., Bechmann, J. S., Weissenbach, J., Myers, R. M., Cox, D. R., James, M. R., Bentley, D., Deloukas, P., Lander, E. S., and Hudson, T. J. (1996). A Gene Map of the Human Genome. Science 274: 540-546.

Shapiro, M. B., and Senapathy, P. (1987). RNA splice junctions of different classes of eukaryotes: sequence statistics and functional implications in gene expression. Nucleic Acids Research 15(17): 7155-7173.

Smit, A. F. A., Toth, G., Riggs, A. D., and Jurka, J. (1995). Ancestral, Mammalian-wide Subfamilies of LINE-1 Repetitive Sequences. Journal of Molecular Biology 246: 401-417.

Southern, E. M. (1975). Detection of specific sequences among DNA fragments separated by gel electrophoresis. Journal of Molecular Biology 98: 503.

Stewart, E. A., McKusick, K. B., Aggarwal, A., Bajorek, E., Brady, S., Chu, A., Fang, N., Hadley, D., Haris, M., Hussain, S., Lee, R., Maratukulam, A., O'Connor, K., Perkins, S., Piercy, M., Qin, F., Reif, T., Sanders, C., She, X., Sun, W., Tabar, P., Voyticky, S., Cowles, S., Fan, J., Mader, C., Quachenbush, J., Myers, R. M., and Cox, D. R. (1997). An STS-Based Radiation Hybrid Map of the Human Genome. Genome Research 7: 422-433.

Tykocinski, M. L., and Max, E. E. (1984). CG dinucleotide clusters in MHC genes and in 5' demethylated genes. Nucleic Acids Research 12: 4385-4396.

Uberbacher, E. C., and Mural, R. J. (1991). Locating protein-coding regions in human DNA sequences by a multiple sensor-neural network approach. Proceedings of the National Academy of Sciences USA 88: 11261-11265.

Ullmann, A., Jacob, F., and Monod, J. (1967). Characterization by in vitro complementation of a peptide corresponding to an operator-proximal segment of the β-galactosidase structural gene of Escherichia coli. Journal of Molecular Biology 24: 339.

Wagner, M. J., Ge, Y., Siciliano, M., and Wells, D. E. (1991). A Hybrid Cell Mapping Panel for Regional Localization of Probes to Human Chromosome 8. Genomics 10: 114-125.

Wahl, G. M., Lewis, K. A., Ruiz, J. D., Rothenberg, B., Zhao, J., and Evans, G. A. (1987). Cosmid vectors for rapid genomic walking, restriction mapping, and gene transfer. Proceedings of the National Academy of Sciences USA 84: 2160-2164.

Wood, S., Poon, R., Riddell, D. C., Royle, N. J., and Hamerton, J. L. (1986). A DNA marker for human chromosome 8 that detects alleles of differing sizes. Cytogenetics and Cell Genetics 42: 113-118.

Wood, S., Schertzer, M., Durbkin, H., Patterson, D., Longmire, J. L., and Deaven, L. L. (1992). Characterization of a human chromosome 8 cosmid library constructed from flow-sorted chromosomes. Cytogenetics and Cell Genetics 59: 243-247.

Zhu, Y., Cantor, C. R., and Smith, C. L. (1993). DNA Sequence Analysis of Human Chromosome 21 NotI Linking Clones. Genomics 18: 199-205.

Zhu, Z. B., Hsieh, S., Bentley, D. R., Campbell, R. D., and Volanakis, J. E. (1992). A Variable Number of Tandem Repeats Locus within the Human Complement C2 Gene is Associated with a Retroposon Derived from a Human Endogenous Retrovirus. Journal of Experimental Medicine 175: 1783-1787.
97  Appendix 13E3 0.6 (M13) GCGCGCCATCTCAGAGGGTCTGGTGCAGAGAAGCTTCGGAAGACGGCAAACGTCT CCACCAACGAGACAGCACGATCGTGAGGCACGTCCTGGATGAGGTGACAACGATT CCATTTTCACAGCCATGGGCACCAACGCCACTCGCAGCATGTTCGAGGACGCCAG ACTGGCAGCTGAGCTCCGTTTCTGCCAGGGCTCCTGGAAGCCGGGACGAAATGCA TGGTGGTGCTGGAGGCGGCCTGGACGGGCCGGAAGCCTTCTGCCGGGAATCCCTG GTGCAGCTTCAGGGCACCGTCTCAGCCCTGCTGCTGGAGCCCAGATGTGCGTCCC AGGTGCTCCCTGCTCCCTANAAACTCTACGGGAACTTTCCCTGCAGCGGTGCTCT GTCCCTGTCCCTCACTTTGGTATATTGTCTTTTCTCAAGTACACACAGAAGCCAN GGCACCATGCTCACAGCTGGGAAACTCCCCAGGGTGACNGGCCTCANNAACCATT GCGAACGTGTCTCTCANAACCCCGTTAGGCTGGTCCCTCCCTGCCTCCCCTGTCA TTTGTTCCTTTTGCTCTTCNAATAACCTAAAATTTATAAAGTTNCACTTTACCAA AACCAGCATGGTCTAGGGGAAGGTGGGCAATTCAAAAAAAACAAA  55 110 165 220 275 330 3 85 440 495 550 605 650  40G1 1.4 (M13) GCGCGCCCTCGCGCCTGTCCACATCCCTCTGCCCATCCGAGACCTCTGTCCTTACA CCACTAGCCACCCCACGTGGGGACTTCCATGGCTTCTGAGTACAAGGCCAGCCCC CCGGCCCACCAGCTTTCGGAATGCCCGCTTACCTCTTTTTCTGTAGAGGCACCAC AGGGAGGTGGGTGAAGCACTTCGGCTCTGGAGTTACAGATCTGGGTTCAAGGCCA AATTCCACCACTTACTAGGTTTGTAATATTGGACAGATAACGTCTTTGCGCTTCT ACCTTTTGGTCTTTAATGTGTGATCAAAAAAGACTTAGACACCCACATAATAATA ATAATAATAATGGCAAACTTAACACCCCACTGTCAACATTAGACACACCAACGAA GACAGAAAAGTTAAAAAAGGATATCCCGGGGAAATTGAGCTCAGCTCTGCACCAA AGCGGGACCTAGGANACATCTACAGAACGCTCCACCCCAAATCAACAGAATATAC ATCTTCTCCAGCACCACATCACACTTATTTCCACATTGACCACATTATTGGGAAT TAAACACCCTCNGTTAAAAGTTAAAATAACAGAAATTATTACAAACCGTCTCTCC AACCACATTGCAAACCNCACTAAAACTCCTGGANTGAAAAAACTC  55 110 165 220 275 330 385 440 495 550 605 650  40G1 1.4 (T3) GAATTCGGCTGTGAATCCGTCTGGTCCTGGAGTTTTATTGCTTGATAGCTATTAA TTATTGCCTCAATTTCAGAGCCTGTTATTGGTCTATTCAGGCATTCAACTTCTTC CTGGTTTACTCTGGGGAGGTTGCATGTGTCCAGGAATTTATTCATTTCTTCTAGA TTTCCGAGTTTGTTTGCCTAGAGGTGTTGACAGTATTCTCTCATGGTAGTTTGTA CTTCTGTGGGATCAGTGGTGATATCCTCTTTATCATTTTTTATTGCATCTGCTTG ATTCTTCTTTCTTTCATTCTTTAATAGTCTTGCTAGTGGTCTATCAATTTTGTTG ATGGTTTCAAAAAACCCGCTCCTGGATTCATTGATTTTTTGAAGGGTTTTTTGGG TCTCTATCTCCTTCAGTTCTGCTCGGNATCTTAGTTATTTCTTGGCCTTCTGCTA GCTTTTGAATGTGTTTGCTCTTGCTTCTCTCATCCTTTTAATGGTGATGTTANGG 
TATGCATTTTTGATCTTTCCTGCTTTCCCCTTGTGGGCATTTAATGCCAATAAAT TTCCCCTCCTACACACTGCNTTAAATGTGTTCCCANAAANTCCGGTAAGTTGTGT CCCTGTTCCCCATGCCTTCCANAAAAATATCTTTATTTCTGCCNT  55 110 165 220 275 330 385 440 495 550 605 650  98  Appendix 40G1 1.6 (M13) GCGCGCTGTCTGACCTTCCACATCACCATCTGCAGGCAGGCGTTTGCNTCCTCGC TGGANTTGTGGCCGTCCTGGCTGTCCTGNATGATCTGTGCCAGNTAGTCTGCCGC GANATNCCTGAGGGAGCGCTTGTAGGGGAAACCCATGTATTGCGGGAAGAGCACG GCCGTGTCCACCACGGTGCTGTNGATGAGCTTCAGGGCCATCAGGTCNCTCTCCA CGCTGTGCCCNATGANGATGGTTTGGGCGCTTAAAAANCTCAGCANGATGGCTTG GACTTGGGGCAACCTNATGCTCNTCTTGGCGACATCGGCCTCGGTGACTCNAGAA AAACCTGGTGTTGTAGTCCACAATCTCGTTGTCGGGCTTGACNAAAGTGTCGTAC ACCACTCNCCATGTCNGCGTCCACCACGGTGAACCCGGCCANCTCTNANGCCATG CGTNGTGTTACCACNTCTCACANTCCAAGGCGTTAAATTCCCTGAATAAACTCTC TGGACAANTCTTTCTTGAAAGTCTCNCCCAAATCCNTTGANGCTCTCCCTTTGCG GGCCTCCCCCACGTTCTTGCTTTGCCACCTTGCAANCCCCCATAACCCAGTAACC NNCTGCCCCACAGGTTTTTTTNGGTTAACCCGGCCTCCNACCCNC  5 5 110 165 220 275 330 385 440 495 550 605 650  40G1 1.6 (T3) GAATTCCCACTGCCCTCTCNTCANCCTGCCCANAGCTTCGGGCTCTGGGTGCCCC AGATGCACNANCAGGCCTCATCATTTGTGGACATCCAGNCGGANCCCCANAACNG GGGTCCGGCGGTGCCCCCAACGTGGCCCAATATGGTGACGGAGTCGTGCTACTTC CCTGCGCACANGGGATCGGCCTGCCGCTTGCCAACCACCCCAAGGCTGACAGACA GGCCCTCGGGTAGTCCGCATCTCAACCCCCAGTGAAGAAGGAAGACGATCGCCCA CTCTTCCACCCCTTGCNTGGTCNCATGTTACACAGATNCCNNGNANAACCCGGGT GGCCAGCANCAGCCAACGCTCCCGTGGCTCCANGGTCNGCANACAGCCANGGGAA GACGCGCAACAGGTCAGGGGATGGCATNCAAGAANCACCGCCACCANNNGCTCTA AGCNAATCNTCCGTCGTCCATCCCTTACCGANTTTGAATTTGAATAAAACCCATT ATCCTCCGAANCTCTGGGTGCCAAATCCCCNACCGTCCTCCGCCGAAGCTATCTC CAACTGTTCACCNAAAAATTTCTCAANTTCTGCGCCTCCAANCANGAAGGCCNAG GANAAGGGCTCAACGAAGAAAANGTGCCTAGAATGCNCCCCACNA  5 5 110 165 220 275 33 0 3 85 440 495 550 605 650  4 6 F 4 1.6 (M13) GCGCGCCCCGAGGGNAAGGGAGAGGCTGTCAGCGGCCGCACCGTGCCCCTGCCGC CTAAAGGCACTCTCCCCACGGCCCACCTGACTCAGGCTGATCTGCACACGGAGGC GGGATGGGGGAACCGTCGTCCATGTGGTCTTCCGAGGACACCTCCAGGGGCAGGC CCCGGGCCTGCCGGGAGGTTCCAAAGGCACGCAACATGTCTCGCACGGCCAGCTC AGCGTTTGCCTGGCAGGGANCAGGATCAGTGGTGGCCAGGTCACCTGCAGGATGG 
CCCTGCCGCCCACACTCCTGTGCCCAGGCCCACCTGAATATGGCCCATGTAGGCC TGCACCACGTCCAGGCCGTACTGCCCAATGAAGCTCCCCCACCAGCTGGATGCCC TTCTGGTTGGCTGCCACCTGGGCACGGAAGTCCGACAGGTTGTCCNTGCAGGTTT CTGGTTCCGCTGCAGTTGGGGAACTTGCCTGGCGCCCGCANGGCTCCGTCACCGC TGGATGGACAGTGTTCNTCACTGAACCCCTTGGCCAACCTGTCTGCACCCCCCAA ACCAAGTTTTCCCANATTCAAAACAAGGTCTCTTGGCCCCTAAAAAAACACCCAA GCCCGGCCTGGGAATTGCTCAACACCTCAGGAAACCCCTCCTCCT  55 110 165 220 275 330 385 440 495 550 605 650  99  Appendix 4 6 F 4 1.8 (M13) GCGCGCCTAGGCGCGCAGGGCAGCGCTGTGTTTGACTTCAGCGGCACTGGGCCGG AGGTGTTTGGTAATCTCAACGCACCGCGGGCCGTAACCCTGTCCGCCCTCATCTA CTGCCTGCGCTGTCTGGTGGGCCGCNACATCCCACTCAACCAGGTTCGCAGGGGT GTGTGCGCGAGGGGGGGCCCCANCCCGAGGTCACCCTGAGTGCGCCTCCTGGGCC CAGCCACGGTCTGACCCCACGGGTGCTGGGGCGGGGGGTGGGGTGCTGTTTCTCC CGGAAGCCCACCCCAAAANCTCCTGAAAAGCCGCCCGCCCCGCACAGGGCTGCCT GGCGCCAGTGCGCGTGGTCATTCCCCGAGGCTCCATCCTGNACCCGTCCCCCGAA GCGGCGGTGGTGGGCGGCAACNTTCTCACNTCCAACGCGTGGTGGATGTTCATCC TGGGGGCCTTTGGGGCCTGCCCCCCTCCCAAANTTTCCGGGGCCGGGTTTGGCCC AACTCCGGGGCCGAATGGGTTGGGCAGGCTGGAATTAGAAACCGGAAGGCAAAGT TGGGGAACCCCTTGCCCAACCCCAACCCAACGAACAATTCCCCTCAACCANGGCT GCATNAACAACTTAACCCTGGCAACCCCCAACATNGGCTACTAAC  5 5 110 165 220 275 330 385 440 495 550 605 650  46F11 2.5 (M13) GCGCGCCTGCAATCGCANGCACTCGGCAAGCTGAGGCAGGAGAATCAGGCAGGGT ANGTTGCAGTGAGCCGAGATGGCAGCANTNCCGTCCAGCTTCGGCTCGGCATCAG AGGGAGACCGTGGNAAAGAGAGGGAGAGGGAGACCGTGGGGAGAGGGAGAGGGAG AGGGAGAGCCCAAATTGTTACCTTTGAAAAAGTGAATATAGTATTCCTTGCTTTT CACATTTAATATATGACATGATTTTATTGGTTTGAGAAGGGGTCTTGCTATGTTG TGAGGCTGGACTCAAACTCCTGGGCTTAAGTGTGATCCTTCCACCTCAGGCTCCC AAGTNGCTGTAGATTACAGGCTCACACCATTGGTACCCAGATCACGTTTAATACA TTCCTTGAAATTGCTGGGAAAAGATGGCAGAATANGCNCAGCTCCATCCTGTATC TCCCAGCAAGACCCAATGCAGAAGGCAGGTGATTTCCTATATTTCCCACTGAAGT ACCCTCCTTCATCTCATTGGGANTGGTTTANGCTATGGGATGCCCCCCCNTGGAA GGCGANCNNAANCAGGGTTGGGGTTTTCCCCCTTGCCCCANGANNTTGCCNNGNN CCCGGGT  55 110 165 220 275 33 0 385 440 495 550 605 612  46F11 2.5 (T3) GAATTCAGCGATATTTCTCCTACTTGCACTTCCATTTATAGGCTCCCTGCAAGAC AAAAAATATGGCTCTATTCTGCCCAACCCCACAGGCAGTCAGACCTTATGGTTAT 
CTTCCCTTGTTCCCTGAAAATTGCTGTTATTCTGTTCTTTTTCAGGGTGTACTGA  165
TTTCATATTGTGCAAACACATATTTTTTACAATTAGATTTCATATTGTTCAAACA  220
CATGTTCTACAATCAGTTTGTCAATAGTGGTCCTGAGGCAACATATGTTCTCAGC  275
TTACGAAGATAACAGGATTAAGAGATTAAAGTAAAGACAGGCATAAGAAATTATA  330
AGAGTATTGATTGGGGGAAGTGATAAATGTCCATGAAAATCTTCACAATTTGTGT  385
TTCAGAGATTGCAGTAAAGACAGGTGTTNAGAAATTATAAAAGTATTAATTTTGG  440
GAATGGGATAAAAGTCCATGAAATCTTTACAATTTATGTTCTTCTGCCTCAGCTC  495
CAGCCCGGTCCCTCCTATTCGGGGTCCCTGAATTCCTGCAACACAGAAGAATTTT  550
GTTATTNACCATCCTCCTGCAACCTAATTCTGTCAATTTGGTCNAACTCATTCCT  605
CTGTCCAATTTTTATGCCCTTGGCTGGGAAAAAAATTGGCGAAA             649

77C1 2.4 (M13)
GCGCGCACCATGCACGAGCCGAAGCAGGGGGAGGCATTGCCTCACTTGGGAAGCG  55
CAAGGGGTCAGGGGANTTCCCTTTCTGANTCAAAGAAAGGCGTGACGGATGGCAC  110
CTGGAAAATCGGGTCACTCCCACCCGGAATACCGCACTTTTCCGACGGGCTTAAA  165
AAACGGTGCACCANNAAATAATATCCCGCACCTGGCTCGGAGGGTACTACNCCCA  220
CAGANTCTTGCTGATTGCTAGCANAGCAGTCTGANATCAAACTGCAAGGCGGCAC  275
CAAGGCTGGGGGAAGGGCGCCCGCCATCGCCCANGCTTGATTAGGGTAAACAAAG  330
CAGCCTGGAAGCTCNAAACTGGGTTGGNANCCCACCACACCTCAAGGAAGCCTTG  385
CCTGCCTCTGTTNAGCTCCACCTCTGGGGGCACGGCACACACCAACAAAAAGACA  440
GCAGTAACCTCTGCANACTTAAATGTCCCTGTCTGACAGCTTTGAANAAAACANT  495
GGTTTCTCCCAACACCCANCTGGANATCTGANAACGGGCAGACTGCCCCCTCCAN  550
TNGGTTCCTGAACCCCTGACCCCCAACCANCCTAACTGGGAAGCACCCTCCCNCC  605
NNCNGGAACACTGAACACTCNCACCGGTANGGTTTTTCCAACNNA            650

93C11 (M13)
GCGCGCCACCACGCCCAACTAATTTTTGTATTTTTATTGTAGACGGGGTTTCACC  55
ACGTTGGCCAGGATGGCCTTGATCTCTGACTTCGTGATCTGCCAGCCTAGGCTTC  110
CCANAGTGCTGGGATTACAGGTGTGGGCAACCGTGTCCAGCCTAAATCTCTATTT  165
TTTAAAGTTAAAAGAAGAATAAATTGAACATATATTTTACTTATTCACCATTGAA  220
CAGTTATTTTTGTCCATTAATTTTCGTTTTATATATTTATTTTATTTTATTTTAT  275
TTTTTTGAGACAGAGTCTCGTGCTGTCACCCAGGCTGGGTTTGCANTGGCATGAT  330
CTCAGCTCACNGCNNGCTCCGCCTCCTGCATTCACTCNATTCTCCTGCCTCAGCC  385
TCCCGAGTTNNCTGGGACTACCGGGGCCCNCCACCTTGCCCGGCTACTCTNTTGT  440
TCTCCTNCANTAGANACNGGGTTTCNCCNNGTTCCTCCACGATGGGTCTCNATCT  495
CCTGCCCCNTNGATNCTGCCTCGCTTCCTTCCCCCTCTAACTTGCCGGGNTTTCC  550
NNNTTTNNTAGCCTCNGNTCCCCCCCTNNTCGCNATCCNACTTTTGTNCTTCNCT  605
GNGCANTNTNTTTNNCGCCTNTNTCCCCCCCTNNACTCNCCNTCN            650

141A6 1.7 (M13)
GCGCGCCACTGCACTCCAGCTGGGCGACAGAGCAAGACTCCGTCTCAAAAAAAAA  55
AAAAAAAAAACCTGTCAAGAAATGTGTAAATAATGAAGGTGGTGGTTGTGTCAGG  110
GGNAAAAAGGACAGGGCGATATTAATGGCGAAACACATCAGGAAGGATTTTGGAA  165
AAAAACTATAAAAAAAAAATGGTAACTTTACAGTGGAGGAGCATGGCAAATACCA  220
CATTAAAAATTTGATCAGCAAAATTAATATTACAAATATTAATGTCTACACCAGG  275
TGACTGTGATATGATGCACTGAAAAAAGCACACCATCAGGTAAAAACACATAGCT  330
TCAATCTAATCATGAAAACCACATCATACAAACACTAATTGAGGGACAATCTAAA  385
AAAAAAACTGGCTTCCATCATTTCAAAAGTGTTGAAGGTCCTGGAAAAAAGAAGG  440
ATAGATGACNTTTCCATATATCAAAAGATATTAAAAATTTGACAGTATAATACAA  495
TTAGTAATCCTGGCTTGAATCCTAAACTAAAAAAAGGACNTTATTGGACATTATT  550
GGAAAACCTCCAAATCCACAATGTTGTTAATTGGTTAAAATTCTTATATCTATAA  605
AATTTTTGGTTCCCCNATTTTACTGTCTTTGCATAAAATTTTATT            650

156G7 2.3 (T3)
GCGCGCCCTCCTCGGATATTTATATCCCCTGGGCCCCTGCCCACTGCTCCCCTCC  55
CCCACAAGCTGCTGCTGACAGCACGACGGCNGTNCCTCCTCCACCCGACGNTGCG  110
CCAGTGGTTGTGCTCCTGCGAGGGG                                135

166H7 5.0 (M13)
GCGCGCGGGGNAGTTATGGGTCTCNGCGTCTGGCCGGAANCTGCGGAGGTTGGAA  55
ACTGANCGCTGTGATGTCCCGAGCCATGGGGACCGATGCCCGGCGCGATTTGGCC  110
GCCTGCCGCGTCCACACTCGCTGGAAACGTCCATGTAGTCTCAGCTAGGTCGGCC  165
ACTGCTTCCCANTGGGCGCTGGCCACCCCTAANCTNGTCAGCCGGNCCTGGCTTC  220
TCATGGGCCCTGAGGGTCTGANGCCCGCGCCCCGGCTCCTCTACTAGGCTNTCGG  275
GTCATATAAGGGCTGAATGGGGGGAATGGCCACGGGNCTCTCCGAAANGGAANGC  330
CTGCTCTTNTCCANACNGGCACGGCCACCGACAAACTCAGGCCNTCCATGGGACA  385
GGAACANCNCCTGGCCCCATTTGGGGATCTTGAAGATGCNNCTTTGAGAAAGGCC  440
TCTGGGCATCTCTNANCTGGTAACTNTTNGGGCATTTGTNAAGAANTCTTCAATC  495
NCTAATTNACAAACCCCAACTCAAANCGAACTTGCACGAAATAAAAAATTTNTTA  550
ATGGAAGTTTNGAAGGGAACCACCTTCCTGCATTGCATTTGATCCAAAATCCCCC  605
TTNCAACCTTTTTTTNTGGNNCCCCNCAATTTTCCCCCCTTCCGC            650

166H7 9.2 (M13)
GCGCGCCCCTGCACTACCTAGCGCCGCTGCTCTCCAACCTCAATCAACGCCCTGC  55
GGCGCGGGCCTTCCTACTGGACCCCGACNTGTGAACCCCAGGGCGCCCGCGCGGG  110
CGGGGTAGGGGACGGAANTTAGGGGGGACGGCCCTAANTGACATCTCCCAACATG  165
TGCCCTGGTCCNNCGGCTGCTGCCCCTTACCCANTTACCCCCACTCCTCTGTACT  220
CCACGGGTCNGGGTGGTCGGGACCCTGCCGGTAATTGCTGCCTCTCCANCACCNT  275
NCAGTTCGTNCGTNTNNCCGGGGNTTCCTNTCCCNNGCCTGCTGGGNNACNGGNA  330
CCCCNNTGGGGCTTGCTCCTGGGAACCTTCTTTTTCNATTNCTNNGTCTCCCTNT  385
TCCCTCCNTCCNATCCCCCTCNATGGACANTCCCCCGAANTGGGTTCCNTCNNGN  440
NANCCTNNANCGGTCGGNANNACTCCCTCCCCCTTTTTTTGGCTNNCTCTCCCCN  495
CTCTNGGTCTNGNGTCCTTCAANGAATTTCCTCGCNNANNCGAACTNANANACGC  550
GNTNItfNTCTAANTT^ NCANGGAANGGGAATNNCCTANT^TAATNNNAAACTAATAATNN

175G8 3.3 (M13)
GCGCGCCAACTGCTCTTGGTTTGCGCACCACCTGGAAAGGGGTGTGGCTTTCACT  55
CATTCTGGTACCAGGCACAGCATGAACAAATATTCAACAAGCAAACTGGCAGCAT  110
TCCTGGATATCTTAGCTGCCTTGTTTGTCTTCCCAGAGCATGAGGTAGCCCTGTT  165
TGGAACCTGACACCAGCAAAACAGGACGGCGCATGGAACAGACAAACATGCCCCG  220
TGGTTCAGCCACAGTTTTTGGGCACAGACTCCAGAGCACTTCCTGGCCCAGGAAA  275
GCAGCAAGCCACTCCAACCTGGTCAGCTGCCAGGCTATGGTTCACATTTTAGGAT  330
TGAGACTTGGTTTCTGCTTCACAGACAGAACTAGAGGCNGAAAAGAAGAAAAAAA  385
TCNATACAGGCACCCAGAGATGGGCTTACCTAGGGTCCTGGAAGATGCGAGAAGC  440
CACAGTGCAGAAGACAGAATGATGAACACACAGAGAAATCACTTCTGCAAGAACC  495
CACTGATTTTCATGGTCTGATCAAAATGTTCCACAATCACCCTCCTGGAACAATT  550
GGTCTGTTGGGATGACCCCTTAGCTTACGGGGAAGAAAACNGGAATTNGAAGGAA  605
AAAGGAAAAGAAAAGGAAGGGTTAGGAGAAAATAAGAAGCCAAGG            650

176F5 1.8 (M13)
GCGCGCCTGTAGTCTCAAGCTACCCGTGGGGCTGAGGTTGGAGGATGAGCCCGGG  55
AAGTTGANGCTGCAGTGAGCTGTGATTGTACCACGGCCCTCCAGCATGGGTGACC  110
CTGGCTTANGAAAATTTGTAATAATAACATCCCATCTCTCAGTTTACATGTAGAA  165
ATCTTACACATCTTTCCTAAATATCATGTAAGTCAGTCCCAGGTGTGTTATGTCC  220
TCTGAGTCTATTCTAAATGGTTTATTTTGGATTTTTTTTTCCTTTTCTGTCAGAT  275
GCTTTAGGTACTGATTTTCTCCTCTCCGCCTGGCTGGAACATCCGTCTGCATCGG  330
ACGGGCTTCCTTCTGCACCTGTGACCTCACAAGGGCTCTCCTTGGGGANGCAGGA  385
CCAGAGCAGCCCTGAGTAACCGGGCACAGCCGAAGAACAGGCCCCANGGAAGCCA  440
GGAACCGGGTCCTCACGCCTGCGGCCACAGAAGCACCGTGTCCCTGCAGCTGTGA  495
ACCACCAAGCGTCTGGGTGGGTGGTTACTCAGCAAATGCCAATAAAAAAACTGGG  550
TCCCAAAATTNGGGTTGCTGTGGACTGGATACTTGTCCCTCACCTCCAATTTCAA  605
TGTTNAAGTCCCCACCCCCANCAAAAAATGTTNGAAGTGGGGCCT            650

Appendix 1. Sequences obtained from cosmids, beginning at a rare-cutting enzyme site. Sequences beginning at a BssHII site (GCGCGC) were read with the M13 primer, and those beginning at an EcoRI site (GAATTC) with the T3 primer.

National Center for Biotechnology Information (NCBI): http://www.ncbi.nlm.nih.gov
EST database (dbEST): http://www.ncbi.nlm.nih.gov/dbEST/index.html
Baylor College of Medicine search launcher: http://kiwi.imgen.bcm.tmc.edu:8088
Stanford RH server: http://www-shgc.stanford.edu/rhserver2/rhserver_form.html
UniGene at NCBI: http://www.ncbi.nlm.nih.gov/UniGene/

Appendix 2. Web site addresses.

Identification of a Novel Human Gene (GOR) Localized to 8q13-8q22

Leah R. DeBella, Michael Schertzer and Stephen Wood

Department of Medical Genetics, University of British Columbia, Vancouver, British Columbia, Canada, V6T 1Z3

Subject category: Gene discovery and disease genes
Short Communication

Appendix 3. Article submitted for publication prior to completion of the thesis.

Transcribed sequences are not dispersed evenly between and within chromosomes but appear to be concentrated in particular regions. Gene-rich areas have a higher GC content than gene-poor regions and may contain landmark sequences called HTF islands (Lindsay and Bird, 1987) or CpG islands (Aissani and Bernardi, 1991). These islands are associated with 40% of tissue-specific genes and with most ubiquitously expressed sequences (Larsen et al., 1992). Every CpG island investigated so far has been located near or within a gene. Rare-cutting enzymes are a class of restriction enzymes whose recognition sequences occur infrequently in genomic DNA overall but frequently in regions of high GC and gene content. Although the recognition sequences of these enzymes vary, each contains at least one CpG dinucleotide.
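The CpG content of rare-cutter sites can be illustrated with a short Python sketch. It counts CpG dinucleotides and the GC fraction in a few recognition sequences: BssHII and EcoRI as given in Appendix 1, AscI and NotI as labelled in Figure 1, plus BamHI and XbaI, which we added only as common six-base cutters for comparison. The helper names `cpg_count` and `gc_fraction` are hypothetical, and this is an illustration, not part of the analysis reported here.

```python
# Standard recognition sequences, written 5'->3'.
RECOGNITION_SITES = {
    "BssHII": "GCGCGC",    # rare cutter used in this work
    "AscI": "GGCGCGCC",    # rare cutter; site introduced into pKSIIAsc
    "NotI": "GCGGCCGC",    # rare cutter; present in the vector polylinker
    "EcoRI": "GAATTC",     # common six-base cutter, for comparison
    "BamHI": "GGATCC",     # common six-base cutter, for comparison
    "XbaI": "TCTAGA",      # common six-base cutter, for comparison
}


def cpg_count(seq: str) -> int:
    """Count CpG dinucleotides: a C immediately followed (3') by a G."""
    seq = seq.upper()
    return sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == "CG")


def gc_fraction(seq: str) -> float:
    """Fraction of bases in seq that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


if __name__ == "__main__":
    for name, site in RECOGNITION_SITES.items():
        print(f"{name:<7} {site:<9} CpG={cpg_count(site)}  GC={gc_fraction(site):.2f}")
```

With these inputs, each of the three rare-cutter sites contains two CpG dinucleotides and is 100% GC, while the three comparison sites contain none.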
Lindsay and Bird (1987) demonstrated that a single rare-cutting enzyme could be used to identify CpG islands in cloned DNA, suggesting a method for locating gene-rich regions. One such enzyme, BssHII, is well suited to this use because 80% of its sites are located in GC-rich regions (Larsen et al., 1992).

Cosmid 40G1, selected from the LA08NC01 flow-sorted chromosome 8 library (Wood et al., 1992), contains two BssHII sites within a 3.3 kb EcoRI fragment. Both EcoRI/BssHII fragments, 1.6 and 1.4 kb in size, were subcloned into a modified pBluescript (Stratagene) vector. Because the multiple cloning site of the pBluescript vectors lies within a BssHII fragment, these vectors are unsuitable for cloning double-digested fragments with a BssHII end. The pBluescript II KS(+) vector was therefore modified by cleavage with XbaI and BamHI and then religated in the presence of two oligonucleotides, 5'-GATCCACTAGTT-3' and 5'-CTAGGCGCGCCG-3'. The modified vector, pKSIIAsc, lacks the XbaI and SpeI sites but contains an AscI site (Figure 1). The AscI recognition sequence (GGCGCGCC) contains the BssHII site (GCGCGC), and both enzymes leave the same 5'-CGCG overhang, so BssHII ends ligate to the AscI-cut ends of the modified vector. Each fragment was then sequenced from both ends using an automated sequencer (Applied Biosystems Model 373), and the sequence obtained was compared with genes in the GenBank database using BLAST (Altschul et al., 1990).

Only one gene, GOR (GenBank D10017), originally identified in chimpanzees, shared high sequence identity with the 40G1 cosmid sequences. Few sequence differences were found between the human and chimpanzee genes: of the 95 amino acids compared in Figure 2, 9 differ (91% identity). These 9 amino acid differences result from substitutions at codon position 1 in six cases, position 2 in one case and position 3 in two cases.
An additional five silent substitutions were observed, one at codon position 1 and four at position 3.

A novel STS was developed by designing PCR primers to amplify a 286 bp product at the BssHII end of the 1.6 kb fragment. The forward primer, 5'-AGGTTGCCCCAAGTCCAAGC-3', is shown underlined at the top of Figure 2, while the reverse primer, 5'-GCTGTCTGACCTTCCACATC-3', complements the 20 nucleotides shown underlined at the bottom of Figure 2, adjacent to the GCGCGC BssHII site. These primers were used to localize the human GOR gene on chromosome 8 using a somatic cell hybrid panel (Wagner et al., 1991; Wood et al., 1986). Only XVIII-23Ha produced a product, localizing GOR to 8q13-q22 (Figure 3). This STS was also typed, in duplicate, on the G3 radiation hybrid panel (Stewart et al., 1997), and the results were submitted to the Stanford RH server (http://www-shgc.stanford.edu/rhserver2/rhserver_form.html). The highest two-point LOD score, 3.87, was reported at a distance of 52.9 cR10000 from SHGC-37027, which lies within chromosome 8 bin 67. We then ran the multipoint mapping program RHMAP 3.0 (Boehnke et al., 1991), available from http://www.sph.umich.edu/group/statgen/software/, locally and found that GOR could not be ordered within bin 67. The Genethon marker D8S1757, also located within bin 67, maps to position 94.5 cM on the 165.8 cM sex-averaged Genethon linkage map (Dib et al., 1996), consistent with the cytogenetic localization of GOR. Thus GOR appears to map close to autotaxin (ATX) and most likely proximal to the carbonic anhydrase cluster (Leach et al., 1996).

A 2859 bp EcoRI partial cDNA clone containing the GOR epitope (GRRGQKAKSNPNRPL) was isolated and sequenced by Mishiro et al. (1990). The corresponding human genomic EcoRI fragment is 3.3 kb in size. The human homologue has lost the EcoRI recognition site corresponding to the 5' site of the chimpanzee fragment.
Consequently, the human EcoRI fragment extends 417 bases 5' of the chimpanzee fragment, to the preceding EcoRI site. We have identified a potential 3' splice junction, tgttacacagA, at bp 311 of the human EcoRI fragment. No intronic sequences were found in the region corresponding to the chimpanzee cDNA; thus we have identified the 3' exon of the human GOR gene. In both humans and chimpanzees this 3' exon contains a truncated L1 element starting at bp 1807 of the chimpanzee sequence, downstream of the UAG stop codon at bp 1576-1578.

The protein encoded by GOR was identified because of its high levels in the blood plasma of chimpanzees infected with hepatitis C virus. The GOR epitope within this protein was found to cross-react with antibodies from patients infected with hepatitis C and was suggested as a complementary means of detecting low-level virus infection in blood (Mishiro et al., 1990). Additional BLAST searches of the EST database identified six mouse ESTs (W81932, AA034629, AA097346, AA120740, AA030569, AA110524) sharing high sequence identity with the chimpanzee and/or human GOR genes. Interestingly, no human ESTs for GOR were present in the database, which supports the value of the rare-cutter CpG island approach as a complementary method for gene identification. Although the function of the GOR gene product is unknown, its high level of conservation between mammalian genomes suggests that it has an important function.

ACKNOWLEDGEMENTS
This work was supported by grant GO-12753 from the Canadian Genome Analysis and Technology Programme.

REFERENCES
Aissani, B., and Bernardi, G. (1991). CpG islands: features and distribution in the genomes of vertebrates. Gene 106: 173-183.
Altschul, S. F., Gish, W., Miller, W., Myers, E. W., and Lipman, D. J. (1990). Basic local alignment search tool. J. Mol. Biol. 215: 403-410.
Boehnke, M., Lange, K., and Cox, D. R. (1991). Statistical methods for multipoint radiation hybrid mapping. Am. J. Hum. Genet. 49: 1174-1188.
Dib, C., Faure, S., Fizames, C., Samson, D., Drouot, N., Vignal, A., Millasseau, P., et al. (1996). A comprehensive genetic map of the human genome based on 5,264 microsatellites. Nature 380: 152-154.
Larsen, F., Gundersen, G., Lopez, R., and Prydz, H. (1992). CpG islands as gene markers in the human genome. Genomics 13: 1095-1107.
Larsen, F., Gundersen, G., and Prydz, H. (1992). Choice of enzymes for mapping based on CpG islands in the human genome. Genetic Analysis, Techniques & Applications 9: 80-85.
Leach, R. J., Banga, S. S., Ben-Othame, K., Chughtai, S., Clarke, R., Daiger, S. P., Kolehmainen, J., et al. (1996). Report of the third international workshop on human chromosome 8 mapping. Cytogenet. Cell Genet. 75: 71-84.
Lindsay, S., and Bird, A. P. (1987). Use of restriction enzymes to detect potential gene sequences in mammalian DNA. Nature 327: 336-338.
Mishiro, S., Hoshi, Y., Takeda, K., Yoshikawa, A., Gotanda, T., Takahashi, K., Akahane, Y., et al. (1990). Non-A, non-B hepatitis specific antibodies directed at host-derived epitope: implication for an autoimmune process. Lancet 336: 1400-1403.
Stewart, E. A., McKusick, K. B., Aggarwal, A., Bajorek, E., Brady, S., Chu, A., Fang, N., et al. (1997). An STS-based radiation hybrid map of the human genome. Genome Res. 7: 422-433.
Wagner, M. J., Ge, Y., Siciliano, M., and Wells, D. E. (1991). A hybrid cell mapping panel for regional localization of probes to human chromosome 8. Genomics 10: 114-125.
Wood, S., Poon, R., Riddell, D. C., Royle, N. J., and Hamerton, J. L. (1986). A DNA marker for human chromosome 8 that detects alleles of differing sizes. Cytogenet. Cell Genet. 42: 113-118.
Wood, S., Schertzer, M., Durbkin, H., Patterson, D., Longmire, J. L., and Deaven, L. L. (1992). Characterization of a human chromosome 8 cosmid library constructed from flow-sorted chromosomes. Cytogenet. Cell Genet. 59: 243-247.

pBluescript II KS+
    PstI  SmaI  BamHI SpeI  XbaI  NotI
5'..CTGCAGCCCGGGGGATCCACTAGTTCTAGAGCGGCCGC..3'
3'..GACGTCGGGCCCCCTAGGTGATCAAGATCTCGCCGGCG..5'

pKSIIAsc
    PstI  SmaI  BamHI AscI        NotI
5'..CTGCAGCCCGGGGGATCCGGCGCGCCTAGAGCGGCCGC..3'
3'..GACGTCGGGCCCCCTAGGCCGCGCGGATCTCGCCGGCG..5'

Figure 1 Construction of the modified Bluescript plasmid pKSIIAsc. The relevant region of the multiple cloning site is shown. The region in bold in pBluescript II KS+ has been excised by restriction digestion and replaced in pKSIIAsc by the region shown in bold.

[Figure 2 map: human genomic EcoRI (E) fragment with BssHII (B) sites defining the 1.6 kb and 1.4 kb EcoRI/BssHII fragments, aligned with the 2859 bp chimpanzee cDNA (positions 1195 and 1516 marked)]

Figure 2 GOR sequence comparison between human and chimpanzee for the region amplified by the human GOR STS. The human sequence is numbered from the BssHII site in the 1.6 kb EcoRI/BssHII genomic fragment. The chimpanzee sequence is numbered from the EcoRI site of the partial cDNA as reported by Mishiro et al. (1990).

Figure 3 Chromosomal localization of the human GOR gene by STS typing of somatic cell hybrids. Lane 1 contains a size marker, φX174 phage DNA digested with HaeIII. The cell hybrid panel consists of: XVIII-23Ha, 8pter-8q22, lane 2; VTGHL 19, 8pter-8q13, lane 3; 1SHL 3, 8pter-q11, lane 4; 20xPO435-2, 8p23-q11, lane 5; MGV 270, 8q24.1-qter, lane 6; MGV 271, 8q22.1-qter, lane 7. Human DNA, lane 8, is the positive control, and lane 9 is a negative control.
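The site positions drawn in Figure 1 and the 286 bp STS product described in the article can both be checked by exact string matching. The Python sketch below is an illustration under stated assumptions: it scans only the top strands shown in Figure 1, uses the primer sequences given in the text, and takes the first 330 bases of the 40G1 1.6 (M13) read from Appendix 1 as the template. Ambiguous bases (N) are treated as ordinary characters, no mismatches are tolerated, and the helper names (`site_map`, `amplicon_length`, `revcomp`) are hypothetical.

```python
from __future__ import annotations

# Standard recognition sequences (5'->3'). All are palindromic, so scanning
# one strand finds every site.
SITES = {
    "PstI": "CTGCAG",
    "SmaI": "CCCGGG",
    "BamHI": "GGATCC",
    "SpeI": "ACTAGT",
    "XbaI": "TCTAGA",
    "AscI": "GGCGCGCC",
    "BssHII": "GCGCGC",
    "NotI": "GCGGCCGC",
}

# Top strands of the multiple cloning site region as drawn in Figure 1.
KS_MCS = "CTGCAGCCCGGGGGATCCACTAGTTCTAGAGCGGCCGC"   # pBluescript II KS+
ASC_MCS = "CTGCAGCCCGGGGGATCCGGCGCGCCTAGAGCGGCCGC"  # pKSIIAsc

# First 330 bases of the 40G1 1.6 (M13) read in Appendix 1.
READ_40G1_M13 = (
    "GCGCGCTGTCTGACCTTCCACATCACCATCTGCAGGCAGGCGTTTGCNTCCTCGC"
    "TGGANTTGTGGCCGTCCTGGCTGTCCTGNATGATCTGTGCCAGNTAGTCTGCCGC"
    "GANATNCCTGAGGGAGCGCTTGTAGGGGAAACCCATGTATTGCGGGAAGAGCACG"
    "GCCGTGTCCACCACGGTGCTGTNGATGAGCTTCAGGGCCATCAGGTCNCTCTCCA"
    "CGCTGTGCCCNATGANGATGGTTTGGGCGCTTAAAAANCTCAGCANGATGGCTTG"
    "GACTTGGGGCAACCTNATGCTCNTCTTGGCGACATCGGCCTCGGTGACTCNAGAA"
)

FORWARD = "AGGTTGCCCCAAGTCCAAGC"  # STS forward primer (from the text)
REVERSE = "GCTGTCTGACCTTCCACATC"  # STS reverse primer (from the text)

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of seq; N stays N."""
    return seq.translate(_COMPLEMENT)[::-1]


def site_map(seq: str) -> dict[str, list[int]]:
    """0-based start of every exact match to each recognition sequence."""
    hits: dict[str, list[int]] = {}
    for name, site in SITES.items():
        starts = [i for i in range(len(seq) - len(site) + 1)
                  if seq.startswith(site, i)]
        if starts:
            hits[name] = starts
    return hits


def amplicon_length(template: str, primer_on_template: str, other_primer: str) -> int | None:
    """Product length when primer_on_template occurs as written on the template
    strand and the reverse complement of other_primer occurs downstream of it.
    Returns None if either match is missing or they are in the wrong order."""
    start = template.find(primer_on_template)
    rc = revcomp(other_primer)
    end = template.find(rc)
    if start < 0 or end < 0 or end < start:
        return None
    return end + len(rc) - start


if __name__ == "__main__":
    print("pBluescript II KS+:", site_map(KS_MCS))
    print("pKSIIAsc:          ", site_map(ASC_MCS))
    print("STS product (bp):  ", amplicon_length(READ_40G1_M13, REVERSE, FORWARD))
```

With these inputs the reverse primer matches the M13 read at bases 5-24 (1-based), and the reverse complement of the forward primer matches bases 271-290, giving the 286 bp product. In pKSIIAsc the AscI site, with the BssHII site nested inside it, occupies the positions where pBluescript II KS+ has its SpeI and XbaI sites.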