UBC Theses and Dissertations

Mapping of D8S136 to the short arm of human chromosome 8 Mitchell, Heather Katherine 1992

MAPPING OF D8S136 TO THE SHORT ARM OF HUMAN CHROMOSOME 8

by

HEATHER KATHERINE MITCHELL

B.Sc., University of Puget Sound, 1989

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE

in

THE FACULTY OF GRADUATE STUDIES
GENETICS PROGRAM

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA

SEPTEMBER, 1992

© Heather K. Mitchell, 1992

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of [illegible]
The University of British Columbia
Vancouver, Canada
Date [illegible]

ABSTRACT

Cosmid clones containing NotI sites and (GT)n repeats were randomly isolated from the flow-sorted human chromosome 8 library, LA08NC01, by hybridization with nick-translated poly(dC-dA)·(dG-dT) and subsequent EcoRI/NotI and NotI digestion of GT-hybridizing clones. An anonymous DNA sequence, D8S136, containing three (GT)n repeats and two NotI sites was isolated. Three sequence tagged sites (STSs), unique sequences identifying D8S136, were generated and the physical distances separating the sites were determined. Two of the (GT)n repeats within this segment were shown to be highly polymorphic in DNA from members of the Centre d'Etude du Polymorphisme Humain (CEPH) panel of 40 reference families. These polymorphisms were designated D8S136P1 and D8S136P2.
Southern blot analysis and amplification of DNA from somatic cell hybrid lines using the polymerase chain reaction (PCR) mapped D8S136 to 8p, interval 8p21→cen, and linkage analysis localized this marker to an approximately 9.9 centimorgan (cM) region between flanking marker loci D8S133 and D8S5. Analysis of allelic association between D8S136P1 and D8S136P2 indicated minimal disequilibrium. The disequilibrium did not appear to measurably decrease the amount of information gained by typing individuals for both D8S136P1 and D8S136P2. The PIC values for D8S136P1 and D8S136P2 were 0.82 and 0.69, respectively. The PIC of the haplotype was 0.95. D8S136 is an extremely polymorphic locus which will be useful for physical and linkage mapping studies.

TABLE OF CONTENTS

Abstract
Table of Contents
List of Tables
List of Figures
Acknowledgements
Chapter 1. Introduction
  1.0 Introduction
  1.1 Human chromosome 8
  1.2 Clone isolation
  1.3 Physical localization and sequence tagged sites
  1.4 Linkage mapping
    1.4.1 DNA sequence polymorphisms
    1.4.2 Linkage analysis
  1.5 Hardy-Weinberg equilibrium
  1.6 Investigation of whether closely linked (GT)n repeat polymorphisms are useful for the generation of extremely informative haplotypes
    1.6.1 Analysis of allelic association
  1.7 Organization of the thesis
Chapter 2. Materials and Methods
  2.1 Materials
    2.1.1 Flow-sorted Human chromosome 8 Library
    2.1.2 Somatic cell hybrids
  2.2 Methods
    2.2.1 Plasmid and cosmid DNA isolation
      2.2.1.1 Small scale isolation
      2.2.1.2 Large scale isolation
    2.2.2 Restriction enzyme digestion
    2.2.3 Agarose gel electrophoresis
    2.2.4 Transfer of DNA to nylon membranes
      2.2.4.1 Southern transfer
      2.2.4.2 Production of filters for colony hybridization
    2.2.5 Preparation of radioactively labeled probes
      2.2.5.1 End-labeling
      2.2.5.2 Nick translation
      2.2.5.3 Oligolabeling
      2.2.5.4 Preannealing with human DNA
    2.2.6 Prehybridization, hybridization, washing and autoradiography
      2.2.6.1 End-labeled probe
      2.2.6.2 Oligolabeled probe
      2.2.6.3 Stripping filters
    2.2.7 (GT)n repeat detection
    2.2.8 Random isolation of cosmids containing both NotI sites and (GT)n repeats
    2.2.9 Ligation
    2.2.10 Transformation
    2.2.11 Sequencing
      2.2.11.1 Preparation of the template
      2.2.11.2 Sequencing primers
      2.2.11.3 Sequencing reactions
      2.2.11.4 Denaturing polyacrylamide gel electrophoresis
      2.2.11.5 Storage and analysis of sequence data
    2.2.12 Polymerase Chain Reaction
      2.2.12.1 Primer design and synthesis
      2.2.12.2 Primer purification and end-labeling
      2.2.12.3 Amplification
      2.2.12.4 Electrophoresis of radiolabeled PCR products
      2.2.12.5 Polymorphism typing
    2.2.13 Statistical analysis
      2.2.13.1 Linkage analysis
      2.2.13.2 Calculation of allele and haplotype frequencies
      2.2.13.3 Calculation of heterozygosity and PIC
      2.2.13.4 Analysis of observed genotype frequencies at D8S136P1 and D8S136P2
      2.2.13.5 Analysis of allelic association of alleles at D8S136P1 and D8S136P2
Chapter 3. Results
  3.1 Clone isolation
  3.2 Physical mapping
    3.2.1 Restriction Map of cl40D4
    3.2.2 Physical Localization of cl40D4
      3.2.2.1 Primary localization-genomic localization blots
      3.2.2.2 Secondary localization-PCR
  3.3 Polymorphisms
    3.3.1 Polymorphism screening and analysis
    3.3.2 D8S136P1
    3.3.3 D8S136P2
  3.4 Sequence tagged sites
    3.4.1 Sequence flanking a NotI site
    3.4.2 The D8S136P1 and D8S136P2 (GT)n repeats and their flanking sequences
  3.5 Linkage analysis
    3.5.1 Two-point linkage analysis
    3.5.2 Multipoint linkage analysis
  3.6 Analysis of the observed genotype frequencies at D8S136P1 and D8S136P2
  3.7 Analysis of association of alleles at D8S136P1 and D8S136P2
Chapter 4. Discussion and Conclusion
  4.1 Clone isolation
  4.2 Polymorphism screening and linkage analysis
  4.3 Distribution of alleles at D8S136P1 and D8S136P2
  4.4 Analysis of association of alleles at D8S136P1 and D8S136P2
  4.5 Conclusions
  4.6 Proposals for future research
  4.7 Summary
References
Appendix 1

LIST OF TABLES

Table 1. Human chromosome 8 constitution of somatic cell hybrids
Table 2. Sizes and frequencies of alleles at D8S136P1
Table 3. Sizes and frequencies of alleles at D8S136P2
Table 4. Two-point meiotic linkage analysis for selected 8p markers and D8S136P1
Table 5. Two-point meiotic linkage analysis for selected 8p markers and D8S136P2
Table 6. Supporting data for marker order
Table 7. Distribution of 246 D8S136 chromosome haplotypes in the CEPH panel
Table 8. Allelic association between D8S136P1 and D8S136P2, expected numbers in parentheses
Table 9. D8S136 chromosome haplotypes observed at least five times in the CEPH panel

LIST OF FIGURES

Figure 1. Restriction map of cl40D4
Figure 2. Primary physical localization of cl40D4
Figure 3. Secondary physical localization of D8S136
Figure 4. Sequence of D8S136P2 (GT)n repeat
Figure 5. D8S136P1 (GT)n repeat and flanking sequence
Figure 6. D8S136P1 (GT)n repeat polymorphism
Figure 7. D8S136P2 (GT)n repeat and flanking sequence
Figure 8. D8S136P2 (GT)n repeat polymorphism
Figure 9. Sequence of the subcloned NotI site and flanking sequence
Figure 10. Family with a recombination event between D8S136P1
Figure 11. Summary of chromosome 8p multipoint linkage analysis
Figure 12. Distribution of alleles at D8S136P1
Figure 13. Distribution of alleles at D8S136P2

ACKNOWLEDGEMENTS

I would like to thank my supervisor, Dr. Stephen Wood, for his guidance and support. I would like to thank the members of my advisory and examining committees for their time and helpful comments. Lynn Bernard, Mike Schertzer and Craig Kreklywich all provided suggestions and assistance during the data-gathering stage of my research, and I would like to thank all of them. I would like to thank my comrade of the last three years, Karen Henderson, for her empathy and sense of fun. I would like to thank Mike Hayden for his assistance and encouragement. Finally, a special thanks to my mother, Anne Mitchell. She has patiently supported me, listened to me, loved me and provided me with the confidence necessary to pursue my dreams.

Chapter 1
Introduction

1.0 Introduction

A five year goal of the Human Genome Project is the construction of physical maps of each chromosome, with sequence tagged sites (STSs) spaced approximately 100 kilobases (kb) apart, and the generation of linkage maps with an average resolution of 2 centimorgans (cM), the ultimate goal being the generation of linkage maps with 1 cM resolution. These goals were outlined in the interest of advancing our understanding of human genetics and inherited disease. One of the interests of our lab is the isolation of polymorphic markers from human chromosome 8 for use in linkage analysis and diagnosis of disease (Wood, 1988; Wood et al., 1992).
In addition, our lab conducts similar research on human chromosome 5, much of which is described in Bernard (1992). Studies of both chromosomes involve determining the degree of polymorphism of markers which have been isolated and obtaining information on their physical and genetic locations.

The objectives of this study were both to contribute to the linkage and physical maps of the short arm of chromosome 8 and to investigate the practical value of simultaneously analyzing two closely linked (GT)n repeat polymorphisms. The first objective was achieved by the isolation of NotI sites from a human chromosome 8 specific library and the physical and genetic localization of these sites. Physical localization was performed using a panel of somatic cell hybrids, while genetic localization was achieved using two-point and multipoint linkage analysis of two (GT)n repeat polymorphisms which mapped in the vicinity of the sites. The latter objective was carried out in two steps. In the first step, each member of the CEPH panel of reference families was typed for the two closely linked polymorphisms, allowing the degree of polymorphism of these markers and of the haplotype to be determined. In the second step, the amount of non-random allelic association between the two markers, in the population defined by the CEPH panel, was estimated. The motivation for this study is presented in the remainder of this chapter along with a discussion of the necessary background information.

1.1 Human chromosome 8

When work on this thesis was begun, interest in chromosome 8 was largely due to the fact that the number of mapped loci on this chromosome was relatively small. Interest in chromosome 8 has been further stimulated by the reports that the genes for a dominant form of retinitis pigmentosa, RP1 (Blanton et al., 1991), Werner's syndrome (Goto et al., 1992), and a possible prostate tumor suppressor gene (Bergerheim et al., 1991) have been localized to the short arm of chromosome 8.
Human chromosome 8 comprises 5% of the human genome. Consequently, it is expected to contain 5% of the human gene loci, or 2,500 genes out of an estimated 50,000 genes in the genome (see references in McKusick and Ruddle, 1977). Compared to many other human chromosomes, chromosome 8 is sparsely covered with cloned and characterized DNA probes. The Human Genome Mapping 11 (HGM11) database lists approximately 9,000 loci, including cloned genes, localized genes and cloned DNA segments, for the entire genome (Williamson et al., 1991). If 5% of these loci were derived from chromosome 8, the number of loci assigned to this chromosome would be approximately 450. In fact, chromosome 8 has been assigned only 251 loci, comprising 64 genes, 6 fragile sites and 181 anonymous DNA segments (Donis-Keller and Buckle, 1991).

The altered genes responsible for Langer-Giedion syndrome (Buhler et al., 1980), atypical vitelliform macular dystrophy (Ferrell et al., 1983), RP1 (Blanton et al., 1991) and Werner's syndrome (Goto et al., 1992), among others (see Wood, 1988; Tsui et al., 1989; Donis-Keller and Buckle, 1990; Donis-Keller and Buckle, 1991), have been mapped to chromosome 8. Genes on chromosome 8 are also thought to be involved in Burkitt's lymphoma and possibly prostate cancer. Burkitt's lymphoma is a B cell malignancy which is commonly associated with an 8;14 translocation and rarely with 2;8 or 8;22 translocations (see references in Wood, 1988). These translocations all involve breakpoints in 8q24, the region where the cellular protooncogene homologue of the avian myelocytomatosis viral oncogene (MYC) has been localized (Neel et al., 1982). In addition, a possible tumor suppressor gene involved in the oncogenesis of prostatic carcinoma has been mapped to chromosome 8 by deletion mapping studies of chromosomes from patients with prostate cancer (Bergerheim et al., 1991).
DNA clones have been obtained for several genes localized to chromosome 8: the MYC protooncogene, the tissue plasminogen activator (PLAT) gene, the genes for carbonic anhydrases I, II and III, the lipoprotein lipase (LPL) gene, and the ankyrin 1 (ANK1) gene, along with the region deleted in Langer-Giedion syndrome (Donis-Keller and Buckle, 1991).

1.2 Clone isolation

Since the human genome is large and complex, the development of chromosome-specific libraries has simplified the task of isolating DNA probes from a particular chromosome. Large insert libraries, constructed from partial digests of flow-sorted DNA, can be used to isolate markers for genetic mapping. These libraries are particularly useful for physical mapping as they allow the isolation of large, overlapping segments of DNA and increase the probability of isolating intact genes. A large insert library, LA08NC01, constructed from a partial digest of flow-sorted human chromosome 8 DNA, was used to randomly isolate clones containing NotI sites and (GT)n repeats.

Clones containing NotI sites were of interest for three reasons. First, NotI sites are often associated with CpG islands and therefore, as discussed below, with potential genes. Secondly, they are useful for the construction of long-range physical maps of genomic DNA, because NotI cuts infrequently. Finally, placement of physically localized NotI sites on the linkage map by means of (GT)n repeat polymorphisms in their vicinities would aid in the generation of a high resolution linkage map and the integration of physical and linkage maps of chromosome 8.

Approximately 1% of the vertebrate genome is non-methylated, as detected by cleavage of the DNA into tiny fragments with the methylation-sensitive restriction enzyme HpaII (Cooper et al., 1983). In this fraction of the genome, CpG dinucleotides are unmethylated and are observed at the frequency expected from base composition (Bird et al., 1985).
In methylated sequences CpG dinucleotides occur at less than one quarter of their expected frequency (Schwartz et al., 1962). The rarity of CpG dinucleotides in bulk DNA is probably due to the deamination of 5-methylcytosine to give thymine (Coulondre et al., 1978; Bird, 1980). Sequences rich in non-methylated CpGs occur as discrete islands, usually 500-2,000 base pairs (bp) long (Bird, 1986), and approximately 30,000 of these islands are distributed randomly throughout the haploid mammalian genome (Bird et al., 1985). These islands, commonly referred to as CpG islands (Gardiner-Garden, 1987), are often associated with vertebrate genes and frequently surround the transcription start site and 5' end of the transcribed sequence (Bird, 1985, 1986; Gardiner-Garden, 1987). It is therefore of interest to map and clone island sequences.

CpG islands can be detected in chromosomal DNA as clustered sites for C-G enzymes. C-G enzymes are restriction enzymes with recognition sites that are G+C rich and contain one or more CpGs (Brown and Bird, 1986). Identification of islands in chromosomal DNA is aided by methylation of non-island CpGs, which prevents digestion by C-G enzymes. This advantage is lost in cloned DNA, where CpG methylation is absent. However, even in cloned DNA most sites for C-G enzymes with recognition sites comprised of only G and C and containing two CpGs should occur in CpG islands (Lindsay and Bird, 1987). For example, in mammalian DNA, 89 percent of NotI sites were calculated to occur in CpG islands (Lindsay and Bird, 1987). The selective isolation of NotI sites, and therefore most likely CpG islands, should provide a useful method for cloning and mapping potential genes.

The restriction enzyme NotI is also useful for the construction of long-range physical maps of genomic DNA because it cuts infrequently. The recognition site for NotI (GCGGCCGC) contains two CpG dinucleotides.
As discussed previously, CpG dinucleotides occur at less than one fourth of the expected frequency in inter-island DNA. Lindsay and Bird (1987) estimated that there are approximately 4,100 NotI sites in the human genome. An estimate of the average distance between NotI restriction sites can be calculated by dividing the approximate size of the human genome by the number of NotI sites in the genome (3 × 10^9 bp/4,100), giving approximately 730 kb. Chromosome 8 is approximately 5% of the genome, or 1.5 × 10^8 bp; therefore there are expected to be about 205 NotI sites on chromosome 8 (1.5 × 10^8 bp/730 kb). A physical map of chromosome 8 could be generated by using NotI sites as landmarks and measuring the distance from each site to flanking sites. In addition, integration of the physical and linkage maps could be achieved by ordering NotI sites on the linkage map by means of polymorphisms in their vicinities. The current sex-average linkage map of chromosome 8 has an average resolution of 5.66 cM (Tomfohrde et al., in press). It has been estimated that a physical distance of 1,000 kb, or 1 megabase (Mb), corresponds on average to a genetic distance of 1 cM (Donis-Keller et al., 1987). Since the average distance between NotI sites is 730 kb, the isolation and ordering of all the NotI sites on chromosome 8 would generate a map with a resolution of approximately 1 cM.

1.3 Physical localization and sequence tagged sites

Physical localization of a clone isolated from a chromosome-specific library to subchromosomal regions requires the generation of a DNA probe from the clone and then either in-situ hybridization to banded chromosomes or Southern blot or polymerase chain reaction (PCR) analysis of somatic cell hybrids. The production and uses of somatic cell hybrids are discussed in Thompson and Thompson (1986). Somatic cell hybrids are produced by fusing mouse and human cells. The resulting hybrids are unstable and tend to randomly lose only the human chromosomes.
Eventually stable cell lines are produced which contain a full set of mouse chromosomes and one or a few human chromosomes. Somatic cell hybrids containing fragments of human chromosomes can be produced by fusing mouse cells with human cells containing a chromosomal rearrangement or chromosome fragment. A number of such hybrids, each retaining a different region of a particular human chromosome, could be collected to create a panel of hybrids which would divide the chromosome into defined intervals. Examples of somatic cell hybrid panels are shown in Figures 2 and 3 in Chapter 3. An unmapped clone can be assigned to an interval by determining if the sequence represented by the clone is present or absent in the chromosomal region retained by each hybrid. This can be done using Southern blotting and hybridization or using PCR and unique primers designed from the partial sequence of the unmapped clone. Information about the physical location of the anonymous locus characterized for this thesis was obtained using the somatic cell hybrid panels described in section 2.1.2.

It has been suggested that a common landmark for all physical maps would facilitate ordering of loci on the chromosomes and the integration of information from different kinds of physical maps. Olson et al. (1989) originally proposed that these landmarks consist of 200-5,000 bp of unique sequence which has been chromosomally localized and can be specifically detected using a PCR assay. These sequences are referred to as sequence tagged sites (STSs). More recently, with the increase in the number of polymorphic markers based on hypervariable microsatellite loci, this definition has been expanded to include the unique primer pair flanking a polymorphic microsatellite site (Williamson et al., 1991). The anonymous locus characterized for this thesis is identified by three STSs separated by known physical distances.
These sites can be used to isolate overlapping cosmid clones from the LA08NC01 library or to identify overlapping clones isolated by other methods.

1.4 Linkage mapping

Human geneticists are particularly interested in mapping genes responsible for hereditary disease. A linkage map is a tool used for the localization and isolation of genes that can only be recognized by phenotypic effects and consists of a collection of polymorphic markers ordered by linkage analysis. The unit of measurement for linkage maps, the map unit or centimorgan (cM), is based on recombination frequencies. Both the density of markers and the degree of marker polymorphism affect the probability of detecting linkage between an unmapped gene and a marker. Once a fine resolution genetic map (1 cM) is constructed for all the chromosomes it should be possible to localize any disease gene to an approximately 1 Mb region between two markers by linkage analysis. This would involve typing markers in disease families and identifying the coinheritance of the disease phenotype with particular alleles of a marker or markers in a significant proportion of the families. Other techniques could then be used to specifically locate and clone the gene.

1.4.1 DNA sequence polymorphisms

For a mating to be potentially informative for linkage analysis, at least one of the parents has to be doubly heterozygous at the marker locus or loci to be tested. The expected frequency of heterozygotes is directly related to the number of common alleles in the population; therefore the more polymorphic a marker is, the greater the probability that an individual will be heterozygous at the marker locus. In human genetics, two measures of a marker's degree of polymorphism (informativeness) are commonly used. One is the heterozygosity (H),

$$H = 1 - \sum_{i} p_i^2$$

where $p_i$ represents the frequency of the $i$th allele.
H is the probability that a random individual is heterozygous for any two alleles at a gene locus with allele frequencies $p_i$, as discussed in Ott (1991). The other measure of the degree of polymorphism is the polymorphism information content (PIC) (Botstein et al., 1980). The PIC value for a marker is the probability, based on allele frequencies, that a given meiotic event will be informative for that marker and is calculated by subtracting the frequency of uninformative meiotic events from 1, or

$$\mathrm{PIC} = 1 - \sum_{i=1}^{n} p_i^2 - \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} 2 p_i^2 p_j^2$$

where $n$ represents the number of alleles and $p_i$ represents the allele frequency of the $i$th allele. If the PIC for a marker is 0.8 then approximately 80% of the meiotic events scored for that marker will be informative. A highly informative marker, defined as a marker with a PIC of 0.7 or greater, reduces the number of families required to obtain significant support for cosegregation of a marker with a disease trait or another marker.

The majority of the markers on the human linkage map are defined on the basis of DNA sequence polymorphisms, rather than classical phenotypic polymorphisms (Donis-Keller et al., 1987). Many types of sequence polymorphisms are present in the genome and can be employed in linkage analysis. The polymorphic marker traditionally used to generate genetic maps is the RFLP or restriction fragment length polymorphism (Botstein et al., 1980). RFLPs are caused by single nucleotide changes resulting in the presence or absence of a restriction enzyme site, and thus a change in the length of a DNA restriction fragment. RFLPs are detected by agarose gel electrophoresis and Southern blot analysis. Marker loci based on these single variants will have only two alleles and are therefore limited in their informativeness. The maximum possible heterozygosity (probability that an individual is heterozygous, or expected frequency of heterozygotes) for an RFLP locus is 0.5.
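The two measures defined above can be computed directly from a list of allele frequencies. The following is a minimal sketch in Python, not the procedure used in the thesis; the function names and the example frequencies are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def check_frequencies(freqs, tol=1e-6):
    """Raise ValueError unless the frequencies are non-negative and sum to 1."""
    if any(p < 0 for p in freqs) or abs(sum(freqs) - 1.0) > tol:
        raise ValueError("allele frequencies must be non-negative and sum to 1")


def heterozygosity(freqs):
    """H = 1 - sum of p_i^2 over all alleles."""
    check_frequencies(freqs)
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs):
    """PIC = 1 - sum(p_i^2) - sum over pairs i < j of 2 * p_i^2 * p_j^2."""
    check_frequencies(freqs)
    homozygous = sum(p * p for p in freqs)
    uninformative_pairs = sum(
        2.0 * p * p * q * q for p, q in combinations(freqs, 2)
    )
    return 1.0 - homozygous - uninformative_pairs


# Hypothetical allele frequencies, for illustration only.
two_equal_alleles = [0.5, 0.5]   # best case for a two-allele RFLP
ten_equal_alleles = [0.1] * 10   # an idealized multi-allele repeat marker

print(round(heterozygosity(two_equal_alleles), 3))  # 0.5
print(round(pic(two_equal_alleles), 3))             # 0.375
print(round(heterozygosity(ten_equal_alleles), 3))  # 0.9
print(round(pic(ten_equal_alleles), 3))             # 0.891
```

The two-allele case reproduces the ceiling of 0.5 on heterozygosity stated for RFLPs, and shows that PIC is always somewhat lower than H for the same frequencies.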
Another type of polymorphism results from a variable number of tandem repeats of a relatively short oligonucleotide sequence. Tandemly reiterated sequences provide more informative polymorphisms than RFLPs because there are usually several alleles at each locus, corresponding to different numbers of repeated units. Two major types of DNA polymorphisms stem from variations in the number of short, tandemly repeated sequences: VNTRs or variable number tandem repeats (Jeffreys et al., 1985a; Nakamura et al., 1987) and dinucleotide repeats (Litt and Luty, 1989; Weber and May, 1989).

VNTRs are highly polymorphic, with an average heterozygosity of 0.7 (Nakamura et al., 1987), and are widely distributed throughout the human genome (Jeffreys et al., 1985a; Nakamura et al., 1987). They can be detected by agarose gel electrophoresis and Southern blot analysis and also using the PCR (Jeffreys et al., 1988). The DNA base sequences of these hypervariable loci indicate the alleles contain a set of tandem repeats of a short (11-60 bp) oligonucleotide sequence (Nakamura et al., 1987). The length of an allele is a function of the number of copies of the tandem repeat present within the restriction fragment or amplified region. Polymorphism is revealed as variation in the lengths of restriction fragments produced in similar patterns by a number of different enzymes or by size variation in PCR products.

Dinucleotide repeats are tandemly repeated units of dinucleotides, such as (dG-dA)n·(dC-dT)n and (dC-dA)n·(dG-dT)n. The dinucleotide repeat most often used as a polymorphic marker is (dC-dA)n·(dG-dT)n, hereafter designated as (GT)n. In the time since the first reports, by Litt and Luty (1989) and Weber and May (1989), of the potential of (GT)n repeats as polymorphic markers, many research groups have exploited the variability of these simple repeats.
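To make the relationship between repeat number and allele length concrete, the sketch below scans a sequence for the longest uninterrupted run of GT (or CA, the same repeat read on the other strand) and builds toy alleles that differ by one repeat unit. It is an illustration only: the thesis detected repeats by hybridization rather than by sequence scanning, and the flanking sequences and function names here are invented for the example.

```python
import re

# An uninterrupted run of at least two GT units, or of CA units
# (the same repeat read on the complementary strand).
_PERFECT_REPEAT = re.compile(r"(?:GT){2,}|(?:CA){2,}")


def longest_gt_repeat(seq):
    """Return (unit, n_units, start) for the longest perfect run, or None.

    Runs are counted in the GT or CA reading frame as written, so a
    trailing single base is not counted as part of a unit.
    """
    best = None
    for match in _PERFECT_REPEAT.finditer(seq.upper()):
        units = len(match.group()) // 2
        if best is None or units > best[1]:
            best = (match.group()[:2], units, match.start())
    return best


# Hypothetical flanking sequences, invented for this example.
LEFT_FLANK = "AGCTTGA"
RIGHT_FLANK = "TTCAGGC"


def make_allele(n_units):
    """Build a toy amplified region carrying a (GT)n repeat of n_units."""
    return LEFT_FLANK + "GT" * n_units + RIGHT_FLANK


allele_16 = make_allele(16)
allele_17 = make_allele(17)

print(longest_gt_repeat(allele_16))      # ('GT', 16, 7)
print(len(allele_17) - len(allele_16))   # 2
```

Alleles that differ by a single repeat unit differ in length by only 2 bp, which is why such alleles need to be resolved on a high-resolution gel rather than by restriction digestion and Southern blotting.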
Weber (1990b) investigated the relationship between the informativeness of perfect (GT)n repeats (alternating, tandem GT dinucleotides without interruption and without adjacent repeats of another sequence) and the number of repeated units. His proposed rules for informativeness of (GT)n repeats are as follows. In general, informativeness for repeats of 10 or fewer units was found to be low or zero, informativeness for repeats of 11-15 units was variable with PIC values ranging from zero to approximately 0.65, and repeats of 16 or more units were always moderately to highly informative, with PIC values ranging from approximately 0.5 to 0.8 (Weber, 1990b).

Although, on average, (GT)n repeat-containing sequences are not as polymorphic as VNTRs, they are much more common. (GT)n repeats are widely interspersed in the genomes of all eukaryotes which have been examined, including yeast, fish, amphibians, insects, and mammals (Meisfield et al., 1981; Hamada and Kakunaga, 1982; Hamada et al., 1982b; Tautz and Renz, 1984; Gross and Garrard, 1986; Fries et al., 1990; Love et al., 1990; Winterø et al., 1992). The number of copies of (GT)n repetitive sequences in different eukaryotic species varies from 100 copies in yeast to 200,000 copies in salmon (Hamada et al., 1982b). In the human genome there are roughly 50,000-100,000 interspersed (GT)n repeats with the range of 'n' roughly 12-30 (Hamada and Kakunaga, 1982; Hamada et al., 1982b, 1984a). On average, (GT)n repeats occur every 30 kb in the human genome as estimated from the analysis of human cosmid clones and GenBank genomic sequences (Stallings et al., 1991).

The function of (GT)n repeats is unknown. It has been demonstrated that (GT)n repeats have the potential to adopt the Z-DNA conformation within negatively supercoiled plasmids in experimental conditions which approximate physiological conditions (Haniford and Pulleyblank, 1983; Nordheim and Rich, 1983; Hamada et al., 1984a).
There have been reports of experiments which indicated that (GT)n repeats could enhance or repress the expression of adjacent genes (Hamada et al., 1984b; Naylor and Clark, 1990). Additionally, (GT)n repeats have been found in association with several genes in eukaryotes including the beta globin cluster (Slightom et al., 1980; Meisfield et al., 1981), the cardiac actin gene (Hamada et al., 1982a), the somatostatin gene (Shen and Rutter, 1984), histone genes (Hentschel, 1982) and the amylase gene cluster (Samuelson et al., 1990). These observations, among others, led to the hypotheses that (GT)n repeats are involved in genetic recombination, gene conversion (Slightom et al., 1980; Tautz and Renz, 1984; Treco and Arnheim, 1986; Pardue et al., 1987; Wahls et al., 1990) and the regulation of gene expression (Hamada et al., 1984b; Naylor and Clark, 1990). However, no convincing evidence has yet been found to indicate that (GT)n repeats are involved in any of these cellular processes.

The number of GT dinucleotides found at a given site can vary, generating alleles differing in length by only a few base pairs. These alleles are therefore not detectable by restriction digestion and Southern blot analysis. However, polymorphisms can be detected by amplifying the segment of DNA containing the repeat using the polymerase chain reaction (PCR), a method which rapidly and exponentially amplifies the specific target sequences located between two single-copy oligonucleotide primers (Saiki et al., 1985, 1988). The amplified fragments can then be analysed by denaturing polyacrylamide gel electrophoresis (Litt and Luty, 1989; Weber and May, 1989). PCR-based analysis of these polymorphisms requires knowledge of the sequence flanking the repeat; there are, however, methods for obtaining the flanking sequence without subcloning the repeat region (Yuille et al., 1991).

Large numbers of (GT)n repeats are distributed throughout the human genome.
They are potentially polymorphic markers which are easily analysed using the PCR. For these reasons, they are particularly useful for linkage analysis, and were therefore used to place isolated NotI sites on the chromosome 8 linkage map.

1.4.2 Linkage analysis

In classical genetics, loci are placed on a linkage map by carrying out carefully designed test crosses. The numbers of recombinant and nonrecombinant progeny are counted among the offspring and the total number of recombinants is divided by the total opportunities for recombination to obtain an estimate of the recombination fraction. The genetic distance between the loci involved in the test cross is then estimated in relation to the recombination fraction. In humans, of course, test crosses cannot be set up to suit the investigator and instead, human geneticists must rely on available pedigrees. Unfortunately, most pedigrees are small and it is often not possible to count the number of recombinant progeny, as the phase of the double heterozygote is often unknown. Therefore, sophisticated statistical methods which allow data from independent families to be combined must be used to estimate the recombination fraction between two loci in humans.

The method generally used to assess linkage in humans is a likelihood ratio test, based on the method of Morton (1955). This test calculates the overall likelihood of obtaining an observed segregation of two markers within a pedigree based on two alternative assumptions: that the two loci are linked at a particular recombination fraction, and that they are unlinked. The probability that the observed family data are the result of two linked loci under a specific recombination fraction and the probability that the family data are the result of two unlinked loci are calculated. The ratio of these two probabilities (linked/unlinked) is the likelihood ratio and expresses the odds that the loci are, in fact, linked. A likelihood ratio is calculated for each family in the data set.
The logarithm to the base 10 of the likelihood ratio is the lod score (Morton, 1955). Since the likelihood of two sets of independent data is the likelihood of the first multiplied by the likelihood of the second, and multiplying likelihoods corresponds to adding logarithms, the problem of small families can be overcome by adding the lod scores obtained from individual families to get an overall lod score for the data set. A lod score of zero means that linkage and non-linkage are equally likely. A positive lod score suggests the loci are linked at the recombination fraction tested. A negative lod score suggests the two loci are unlinked. Conventionally, when the lod score reaches or exceeds 3 (odds in favor of linkage of 1,000:1), the data are said to convey significant evidence for linkage (Ott, 1991). The overall lod score for a data set is calculated for various recombination fractions, and the maximum likelihood estimate of the recombination fraction between the two loci is the recombination fraction with the highest associated lod score.

The relationship between the genetic distance between two loci and the recombination fraction is fundamental to linkage analysis. Genetic distance is measured in map units, or centimorgans (cM). For small recombination fractions the percentage of recombinants is directly proportional to the genetic distance between two genes, so that 1% recombination equals 1 cM. However, for large recombination fractions the relationship between genetic distance and the recombination fraction must be adjusted for the occurrence of double crossovers. Haldane's (1919) mapping function accounts for double crossovers. However, the assumptions of this mapping function, that crossovers are randomly distributed and are mutually independent, are not always true; the frequency of crossovers varies in different chromosomal regions and crossovers do not occur independently of each other.
In fact, the occurrence of one crossover tends to inhibit the formation of other crossovers in its neighborhood. This phenomenon is referred to as interference. Kosambi's (1944) mapping function, $x = \frac{1}{4} \ln \frac{1 + 2\theta}{1 - 2\theta}$, whose inverse is $\theta = \frac{1}{2} \tanh(2x)$, assumes interference is proportional to the recombination fraction between two loci. Here $x$ is the map distance in Morgans and $\theta$ is the recombination fraction. The Haldane and Kosambi mapping functions are both commonly used by human geneticists. The Kosambi mapping function was used for conversions of recombination fraction to map units in this thesis.

Two recent developments facilitate the concerted effort of numerous research groups towards the goal of constructing a high resolution human genetic linkage map: a common panel of reference families and computer programs for linkage analysis. In order to perform linkage analysis for two markers, they must be typed in the same panel of families. The Centre d'Etude du Polymorphisme Humain (CEPH) organization has established cell lines from a panel of 40 reference families with the philosophy that collaborative research is the most efficient means of constructing the human linkage map (Dausset et al., 1990). The families in the CEPH panel have living parents, four or fewer living grandparents and a mean sibship size of 8.3. CEPH provides interested researchers with DNA and a database containing genotype data for all markers typed in the CEPH panel. In return, each research group must provide CEPH with genotype data, for at least the 80 parents, for all markers typed by the group in the CEPH panel.

The development of computer programs for linkage analysis allows rapid analysis of numerous pairs of markers and of sets of greater than two markers. Linkage calculations for this thesis were performed using version 4.7 of the LINKAGE package of programs (Lathrop et al., 1984, 1985; Lathrop and Lalouel, 1988).
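The lod score arithmetic and the map function conversions described in this section can be illustrated with a short Python sketch. It is not the LINKAGE package and does not reproduce its algorithms. It assumes the simplest possible case, phase-known and fully informative meioses, so that the likelihood of observing k recombinants among n meioses at recombination fraction θ is θ^k (1-θ)^(n-k). The function names and the family counts are hypothetical and chosen only for illustration.

```python
import math

def lod_score(recombinants, meioses, theta):
    """Lod score for one family of phase-known meioses:
    log10 of L(theta) / L(0.5)."""
    nonrecombinants = meioses - recombinants
    if theta == 0.0:
        # theta = 0 is excluded as soon as one recombinant is observed
        if recombinants > 0:
            return -math.inf
        return meioses * math.log10(2.0)
    log_linked = (recombinants * math.log10(theta)
                  + nonrecombinants * math.log10(1.0 - theta))
    log_unlinked = meioses * math.log10(0.5)
    return log_linked - log_unlinked

def total_lod(families, theta):
    """Sum the lod scores of independent families at one theta."""
    return sum(lod_score(k, n, theta) for k, n in families)

def max_lod(families):
    """Scan theta = 0.01 ... 0.50 and return (theta, lod) at the maximum."""
    thetas = [i / 100 for i in range(1, 51)]
    best = max(thetas, key=lambda t: total_lod(families, t))
    return best, total_lod(families, best)

def kosambi_cM(theta):
    """Kosambi map distance in cM for a recombination fraction below 0.5."""
    return 100.0 * 0.25 * math.log((1 + 2 * theta) / (1 - 2 * theta))

def kosambi_theta(cM):
    """Inverse Kosambi function: recombination fraction from cM."""
    return 0.5 * math.tanh(2 * cM / 100.0)

def haldane_cM(theta):
    """Haldane map distance in cM (no interference), for comparison."""
    return -50.0 * math.log(1 - 2 * theta)

# Hypothetical data: (recombinants, informative meioses) per family
families = [(1, 10), (0, 8), (2, 12)]
theta_hat, z_max = max_lod(families)
print(theta_hat, round(z_max, 2), round(kosambi_cM(theta_hat), 1))
```

For small recombination fractions the Kosambi and Haldane distances are both close to 100θ cM, and they diverge as θ approaches 0.5. Real pedigree data, where phase is often unknown, require the full likelihood calculation performed by dedicated linkage programs.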
1.5 Hardy-Weinberg equilibrium

Under conditions of random mating and the absence of disturbing evolutionary forces such as mutation, migration and natural selection at the locus in question, a population is said to be in Hardy-Weinberg equilibrium (HWE) (Hardy, 1908; Weinberg, 1908). This means that the relationship between allele and genotype frequencies in the population is in accordance with the Hardy-Weinberg principle. This principle states that for a locus with two alleles, the genotype frequencies of AA, Aa and aa are $p^2$, $2pq$ and $q^2$, respectively, where $p$ equals the frequency of allele A and $q$ equals the frequency of allele a. Two polymorphisms were typed in the CEPH panel as part of the work for this thesis. The observed genotype frequencies at each marker, in the population defined by the CEPH panel, were compared to the distribution expected under HWE using standard analysis.

1.6 Investigation of whether closely linked (GT)n repeat polymorphisms are useful for the generation of highly informative haplotypes

Highly informative markers are not only useful for mapping disease traits by classic segregation analysis but are also useful for alternative strategies used to map complex traits. The analysis of affected relative pairs (Bishop and Williamson, 1990; Risch, 1990; Ott, 1990) is an example of an alternative strategy which was used recently to map the gene for insulin dependent diabetes mellitus (Julier et al., 1991). This method, which involves the analysis of identity by state or identity by descent, requires highly informative marker loci for success. The informativeness of a locus can be increased by the simultaneous analysis of multiple nearby polymorphic markers, thereby generating an extremely informative haplotype. However, this approach is constrained by linkage disequilibrium.
In the absence of statistically significant linkage disequilibrium, the probability that a meiotic event is informative for two markers is one minus the product of the probabilities that the meiotic event is uninformative for each marker, or $(1 - \mathrm{PIC}_A)(1 - \mathrm{PIC}_B) = 1 - \mathrm{PIC}_{AB}$, where $\mathrm{PIC}_A$, $\mathrm{PIC}_B$ and $\mathrm{PIC}_{AB}$ are the PIC values for marker A, marker B and haplotype AB. This equation defines the increase in the number of informative meioses obtained by typing individuals for two markers in linkage equilibrium.

The work done for this thesis investigates whether it is useful to simultaneously analyse two closely linked (GT)n repeat polymorphisms. The decay of linkage disequilibrium per generation is a function of the recombination fraction between the two loci. While mutation is rare and does not contribute significantly to the decay of disequilibrium between two RFLPs, the mutation rate at (GT)n repeat loci is estimated to be 0.00045 per locus per gamete (Kwiatkowski, 1992) and would contribute significantly to the disruption of an existing disequilibrium between two (GT)n repeat polymorphisms. This suggests that two closely linked (GT)n repeat polymorphisms are more likely to be in linkage equilibrium than two closely linked RFLP polymorphisms, and that typing individuals for two closely linked (GT)n repeat polymorphisms would therefore be warranted. To investigate this possibility, allelic association between two closely linked (GT)n repeat polymorphisms was analysed, as described in section 1.6.1, to determine whether the markers were in linkage disequilibrium in the population defined by the CEPH panel. In addition, members of the CEPH panel were typed for both markers to determine whether typing individuals for the second polymorphism increased the number of informative meioses.

1.6.1 Analysis of allelic association

A state of random gametic association between alleles at different loci is called linkage equilibrium, as defined in Hartl and Clark (1989).
When two loci are in linkage equilibrium the observed haplotype frequencies are equal to the product of their component allele frequencies. However, in gametes, the alleles at one locus may not be in random association with the alleles at another locus. When the observed haplotype frequencies deviate significantly from those expected for random combinations of alleles, the loci are said to be in linkage disequilibrium. Linkage disequilibrium can result from close linkage, population admixture, and sample clustering, among other causes, as described in Hartl and Clark (1989) and discussed in section 4.4. Therefore, Edwards (1980) proposed the use of the term "allelic association" to allow discussions of deviation from random association, without implications of mechanism. Among human geneticists the term linkage disequilibrium is often used to refer to association among alleles due to close linkage (Ott, 1991).

Standard analysis can be used to determine if observed haplotype frequencies deviate significantly from those expected for random association of alleles in gametes. The test for random allelic association is commonly performed in haplotype contingency tables, the simplest example being a 2 x 2 contingency table for two loci, each with two alleles. For loci with multiple alleles, the test for random allelic association can be performed in contingency tables in which the rows and columns can be combined to eliminate small cell values (Weir and Cockerham, 1978). For example, an observed haplotype distribution can be collapsed to an artificial 2 x 2 contingency table for each haplotype with a satisfactory number of observations. The analysis can then be performed as it would be for a two locus, four allele system. If allelic association is detected, the extent to which each haplotype frequency departs from the product of its component allele frequencies can be estimated by calculating the linkage disequilibrium parameter (D), as described in Hartl and Clark (1989).
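The 2 x 2 analysis described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the procedure used in the thesis: the text specifies only "standard analysis", so the Pearson chi-square statistic (1 degree of freedom, no continuity correction) is an assumed choice here, and the function name and haplotype counts are hypothetical. The sketch computes D for haplotype AB from four haplotype counts, and also the normalized value D', which divides D by its maximum possible magnitude given the allele frequencies.

```python
def allelic_association(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, D_prime, chi_square) for a 2 x 2 table of haplotype counts.

    Rows are alleles A / a at the first locus, columns are alleles
    B / b at the second locus.
    """
    n = n_AB + n_Ab + n_aB + n_ab
    p_A = (n_AB + n_Ab) / n          # frequency of allele A
    p_B = (n_AB + n_aB) / n          # frequency of allele B

    # D: observed AB haplotype frequency minus the product of its
    # component allele frequencies
    D = n_AB / n - p_A * p_B

    # Largest magnitude D can take for these allele frequencies
    if D >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = D / d_max if d_max > 0 else 0.0

    # For a 2 x 2 table the Pearson chi-square equals
    # n * D^2 / (p_A * q_A * p_B * q_B)
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    chi_square = n * D * D / denom if denom > 0 else 0.0
    return D, d_prime, chi_square

# Hypothetical haplotype counts, for illustration only
D, d_prime, chi_square = allelic_association(40, 10, 10, 40)
print(round(D, 3), round(d_prime, 3), round(chi_square, 2))
```

With 1 degree of freedom, a chi-square value above 3.84 corresponds to significance at the 0.05 level. For multi-allele markers, the same function could be applied to each collapsed 2 x 2 table, treating one haplotype's alleles as A and B and pooling all other alleles as a and b.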
As D is constrained by allele frequencies, it can be normalized (D') by dividing by the maximum possible value of disequilibrium, given the observed allele frequencies.

1.7 Organization of the thesis

The remainder of the thesis is organized as follows. Chapter 2 contains a discussion of the materials and experimental techniques used to carry out this work. The results of the study are presented in Chapter 3. Finally, in Chapter 4, these results are discussed and evaluated. Several suggestions for future experiments are also made.

Chapter 2 Materials and Methods

This chapter is restricted to a discussion of the materials and basic experimental techniques used in this study. The overall strategies used to meet the objectives of this work are presented in the following chapter, along with the experimental results.

2.1 Materials

2.1.1 Flow-sorted human chromosome 8 library

The human chromosome 8 library, designated LA08NC01, was constructed at Los Alamos (Wood et al., 1992). The human x hamster hybrid cell line UV20HL21-27, which retains human chromosomes 4, 8, and 21 (Fuscoe et al., 1986), was used as a cloning source. Chromosome 8 was isolated, intact, from this cell line by fluorescence-activated flow sorting (Deaven et al., 1986). In brief, the library was constructed by partial Sau3AI digestion of chromosome 8 DNA followed by dephosphorylation, ligation into the BamHI cloning site of cosmid vector sCos-1 (Evans et al., 1989) and transfection into E. coli DH5αMCR cells (Sambrook et al., 1989). The library was arrayed by transfer of individual kanamycin-resistant colonies to individual wells of microtiter plates containing LB broth. The colonies were grown overnight at 37°C, glycerol was added to a final concentration of 40% and the plates were stored at -70°C. Filter replicas of individual microtiter plates were produced using a protocol based on the method of Grunstein and Hogness (1975). The specific protocol is described in Wood et al. (1992).
Characterization of the library indicated it is 85% human, representative of the whole of chromosome 8 and that, on average, each chromosome 8 sequence is represented four times (Wood et al., 1992).

2.1.2 Somatic cell hybrids

Two panels, consisting of cell hybrids that retain different regions of human chromosome 8 in a Chinese hamster background, were used to localize cosmid isolates. The first panel comprises five cell hybrids that divide chromosome 8 into four contiguous regions spanning the entire chromosome. These hybrids are described in Wood et al. (1992). Cell hybrids 13blS816-10-3 and 21-8/Ab5/C123a contain, respectively, the der(8) and der(21) chromosomes (Drabkin et al., 1985b; Sacchi et al., 1986) from the t(8;21) (q22; q22) translocation associated with acute myeloid leukemia. Cell hybrid 706B6-C117 is reported to contain an intact chromosome 8 (Jones et al., 1981; Dalla-Favera et al., 1982). Using the BrdU-visible light segregation method of Puck and Kao (1967), another cell hybrid, 706B6-C117-S12, was derived from 706B6-C117 (Wood et al., 1992). Cell hybrid TL/UC2/12-8 retains the der(3) chromosome from a constitutional t(3;8) (p14.2; q24.13) translocation associated with a familial renal cell carcinoma (Drabkin et al., 1985a).

The second panel, developed and characterized by Wagner et al. (1991), consists of 11 cell hybrids. It divides chromosome 8 into nine contiguous or overlapping intervals, designated A-I, spanning the entire chromosome. The panel includes two hybrids from the previously described panel: 13blS816-10-3 and 21-8/Ab5/C123a. Additionally, it includes XVIII-23Ha, a hybrid derived from normal human fibroblasts but containing 8pter-8q21 translocated to a hamster chromosome (Francke, 1984; Floyd-Smith et al., 1986).
The remaining eight hybrids were selected from a series constructed, using clastogenic selective agents, to map human genes complementing DNA repair mutations in hamster cells (Thompson et al., 1985, 1987, 1989; Siciliano et al., 1986). These hybrids were designated 1HL 35, 9HL 10, VTGHL 19, 1HL 33, 1SHL 3, 1HL 12, 1SHL 27, and 20XPO435-2. Table 1 lists the chromosome 8 constitution of each of the hybrids in the two panels.

Table 1. Human chromosome 8 constitution of somatic cell hybrids

Hybrid name         Chromosome 8 constitution
13blS816-10-3       8pter→q22.2
21-8/Ab5/C123a      8q??1→8qter
706B6-C117          8
706B6-C117-S12      8pter→8cen
TL/UC2/12-8         8q24.13→8qter
XVIII-23Ha          8pter→8q22.3
1HL 35              8pter→8q22.1
9HL 10              8p21→8q22.1
VTGHL 19            8pter→8q12
1HL 33              8pter→8q12
1SHL 3              8pter→8q11
1HL 12              8pter→8cen
1SHL 27             8pter→8cen
20XPO435-2          8p22→8cen

2.2 Methods

2.2.1 Plasmid and cosmid DNA isolation

2.2.1.1 Small scale isolation

Small quantities of plasmid and cosmid DNA were isolated using an alkaline lysis mini-prep protocol (Birnboim and Doly, 1979; Ish-Horowicz and Burke, 1981). A colony transformed with the plasmid or cosmid of interest was used to inoculate 5 milliliters (ml) of LB broth containing ampicillin or kanamycin at 50 micrograms (µg)/ml. Cultures were incubated at 37°C for 18 hours. Following incubation, 500 microliters (µl) of the culture was combined with an equal volume of glycerol and stored at -70°C, as a stock. The cells in the remainder of the culture were pelleted by centrifugation at 14,000 rpm for 5 minutes in a microfuge. The cells were resuspended in 100 µl of solution I and incubated at room temperature for 5 minutes. 200 µl of freshly prepared solution II and 150 µl of solution III were added individually and mixed. The addition of each solution was followed by incubation for 5 minutes on ice. The cell debris was pelleted by centrifugation at 14,000 rpm for 10 minutes.
Enzymes and proteins were extracted by the addition of 450 µl of phenol followed by centrifugation at 14,000 rpm for 5 minutes. The aqueous layer was transferred to a new tube, 400 µl of Sevag (24:1 v/v chloroform/isoamyl alcohol) was added and the mixture was centrifuged at 14,000 rpm for 5 minutes. The aqueous layer was transferred to a new tube and the DNA was precipitated with 0.1 volume of 7.5 M ammonium acetate (NH4OAc), pH 7.5 and 2 volumes of 95% ethanol (EtOH). The DNA was pelleted by centrifugation at 14,000 rpm for 10 minutes. The pellet was washed with 1 ml of 70% EtOH, desiccated in a SpeedVac Concentrator and resuspended in 32-40 µl of 1X TE containing 0.1 µg/µl RNase A. The sample was incubated for 30 minutes at 37°C to allow degradation of contaminating RNA. The sample was stored at -20°C.

Alkaline lysis solution I
  50 mM glucose
  10 mM disodium ethylenediaminetetra-acetic acid (EDTA)
  25 mM tris(hydroxymethyl)aminomethane (Tris), pH 8.0
  4 milligram (mg)/ml lysozyme added just prior to use

Alkaline lysis solution II
  0.2 N sodium hydroxide (NaOH)
  1% (w/v) sodium dodecyl sulfate (SDS)

Alkaline lysis solution III
  60 ml 5 M potassium acetate (KOAc)
  11.5 ml glacial acetic acid
  28.5 ml distilled water (dH2O)

LB Broth
  5 grams (g) yeast extract
  10 g tryptone
  5 g sodium chloride (NaCl)
  1 g D-glucose
  dH2O to 1 liter (l)
  Adjust to pH 7.2-7.4 with 300 µl 10 N NaOH and autoclave 20 minutes at 15 pounds (lbs) pressure.

1X TE Solution, pH 8.0
  10 mM Tris, pH 8.0
  1 mM EDTA, pH 8.0

2.2.1.2 Large scale isolation

Large quantities of cosmid or plasmid DNA were isolated using a large scale alkaline lysis protocol (Birnboim and Doly, 1979; Ish-Horowicz and Burke, 1981). Solutions I, II and III are identical to those used in the small scale alkaline lysis protocol outlined above. A 5 ml culture (section 2.2.1.1) was transferred to 500 ml of LB broth containing antibiotic selection at 50 µg/ml.
Following incubation at 37°C for 5 hours, 1 ml of chloramphenicol (80 mg/ml) was added and the culture was incubated for an additional 12-16 hours. The cells were pelleted by centrifugation at 4,000 rpm and 4°C for 10 minutes. The cells were resuspended in 35 ml STE, transferred to a 50 ml tube, and centrifuged at 4,000 rpm and 4°C for 10 minutes. The pellet was resuspended in 8 ml of solution I and incubated at room temperature for 5 minutes. 16 ml of solution II and 12 ml of solution III were added individually and mixed. The addition of each solution was followed by incubation for 10 minutes on ice. The cell debris was removed by centrifugation at 15,000 rpm and 4°C for 25 minutes. The supernatant was transferred to a new tube and the DNA was precipitated by the addition of 0.6 volume of isopropanol, followed by incubation at room temperature for 15 minutes. The DNA was pelleted by centrifugation at 2,500 rpm for 10 minutes at room temperature. The pellet was air dried and dissolved in 10 ml 1X TE, pH 8.0. 1 g/ml cesium chloride (CsCl) and 0.8 ml of 10 mg/ml ethidium bromide (EtBr) were added and the mixture was transferred to a 12 ml heat sealable tube. A gradient was formed by centrifugation at 60,000 rpm for 18-20 hours in an ultracentrifuge at 20°C. DNA bands were detected with long wavelength ultraviolet (UV) radiation and the lower band containing supercoiled DNA was removed from the gradient using a 20 gauge needle and 3 ml syringe. EtBr was removed by repeated extractions with an equal volume of water-saturated butanol. The DNA was precipitated by the addition of 2 volumes of dH2O and 6 volumes of 95% EtOH and incubation at -20°C for 1-24 hours. The DNA was collected by centrifugation at 2,500 rpm for 10 minutes at room temperature. The pellet was washed with 70% EtOH, centrifuged at 2,000 rpm, air dried, resuspended in 0.5 to 1 ml of 1X TE, and stored at 4°C.
STE
  10 mM Tris, pH 8.0
  100 mM NaCl
  1 mM EDTA, pH 8.0

2.2.2 Restriction enzyme digestion

Cosmid or plasmid DNA was digested with restriction enzymes by incubation at 37°C in the presence of 1-2 units (u)/µg DNA of restriction enzyme. A standard reaction contained 100 ng to 3 µg of DNA, 1X restriction digest buffer (BRL), and 0.1 mg/ml bovine serum albumin (BSA). If necessary, dH2O was added to bring the total reaction volume to 20 µl. The reaction was terminated after 1-1.5 hours by the addition of 0.25 volume stop buffer or by incubation at 65°C for 10 minutes. Reactions utilizing enzymes which were not inactivated by incubation at 65°C were terminated by precipitating the DNA with 1/10 volume of 7.5 M ammonium acetate, pH 7.5 and 2 volumes of 95% EtOH.

Stop buffer
  0.25% Bromophenol Blue
  0.25% Xylene Cyanol FF
  40% (w/v) sucrose in dH2O
  10 mM EDTA

2.2.3 Agarose gel electrophoresis

Restriction digested cosmid or plasmid DNA was size fractionated by agarose gel electrophoresis. Agarose gels were prepared by dissolving agarose (0.7%-1% w/v) in 1X Tris-borate (TBE). EtBr was added to 1 µg/ml to allow visualization of DNA on a short wave UV transilluminator. The liquid agarose was poured into a casting tray and allowed to solidify at room temperature. The gel was placed in an electrophoresis tank and covered with 1X TBE buffer. The samples were loaded and a constant voltage was applied. Cosmid DNA was electrophoresed overnight on a 0.7, 0.8 or 1.0% gel at 30 volts (V). Plasmid DNA was electrophoresed on a 0.8% gel at 10-100 V for 1 hour to overnight. Lambda DNA digested with HindIII and SacII was used as a size standard. Following electrophoresis, the gel was photographed using an MP4 camera and Polaroid or negative film.
10X TBE buffer
  54 g Tris base
  27.5 g boric acid
  20 ml 0.5 M EDTA, pH 8.0
  dH2O to 1 l

2.2.4 Transfer of DNA to nylon membranes

2.2.4.1 Southern transfer

Restriction digested plasmid or cosmid DNA (5-10 µg) was transferred to a nylon membrane using the method of Southern (1975). The DNA was size fractionated on an agarose gel. A fluorescent ruler was placed on the gel while it was photographed, to allow identification of each band by the distance it had migrated. Excess agarose was trimmed from the gel and the top right corner removed to allow orientation. The DNA was nicked by soaking the gel in 0.25 M hydrochloric acid (HCl) at room temperature for 15 minutes. The gel was rinsed briefly in tap water. The DNA was denatured by soaking the gel in 1.5 M NaCl, 0.5 M NaOH for 30 minutes. The gel was rinsed briefly in tap water, neutralized by soaking in 1 M Tris, 1.5 M NaCl for 30 minutes and then transferred to a 3MM Whatman wick, supported by a glass plate, and soaking in 10X SSC. A piece of Gene Screen (New England Nuclear) membrane and two pieces of 3MM Whatman paper were cut to the dimensions of the gel. The membrane was placed on top of the gel and covered with one piece of the Whatman paper soaked in water, the dry piece of Whatman paper and approximately 15 cm of paper towel. The gel was surrounded by plastic wrap, to prevent the buffer from flowing directly from the reservoir to the paper towels. The DNA was allowed to transfer for 18 hours. The gel was restained with EtBr to allow determination of transfer efficiency. Average transfer efficiency was 90%. The membrane was baked at 80°C for 2 hours to fix the DNA onto the membrane and was then wrapped in plastic wrap and stored at room temperature.

20X SSC solution
  175.3 g NaCl
  88.2 g sodium citrate
  dH2O to 1 liter

2.2.4.2 Production of filters for colony hybridization

Filters for colony hybridization were produced using the method of Grunstein and Hogness (1975).
A circular nitrocellulose filter was placed on an XIA plate. E. coli DH5αMCR colonies, transformed with recombinant plasmids, were transferred to the filter and to a corresponding location on a stock XIA plate. Both plates were incubated at 37°C overnight. The stock plates were stored at 4°C. The cells were lysed and the DNA denatured by incubating the filters in 0.5 M NaOH, 1.5 M NaCl for 5 minutes. The filters were neutralized by incubation in 1 M Tris, pH 7.5, 1.5 M NaCl for 5 minutes. The filters were briefly rinsed in 2X SSC, air dried, baked in a vacuum desiccator at 80°C for 2 hours, wrapped in plastic wrap and stored at room temperature. Colonies transformed with recombinant plasmids containing a (GT)n repeat were identified by hybridizing colony filters with nick translated poly(dC-dA)-(dG-dT) (section 2.2.7).

XIA Plates
  1 l sterile LB broth
  12 g agar
  40 µg/ml 5-bromo-4-chloro-3-indolyl-β-D-galactoside (X-gal)
  120 µg/ml isopropyl-β-thiogalactopyranoside (IPTG)
  50 µg/ml ampicillin

2.2.5 Preparation of radioactively labeled probes

2.2.5.1 End-labeling

The universal primers, T3 and T7 (Stratagene), were radioactively end-labeled using T4 polynucleotide kinase (PNK) (Sambrook et al., 1989). 50 pmoles of DNA were combined with 1X PNK buffer, 70 µCi dATP (γ-32P), 9.5 µl dH2O and 0.67 u T4 kinase. The total reaction volume was 20 µl. The reaction was incubated at 37°C for 45 minutes and terminated by incubation at 68°C for 10 minutes.

10X PNK buffer
  50 mM dithiothreitol (DTT)
  100 mM magnesium chloride (MgCl2)
  500 mM Tris, pH 7.5
  1 mM spermidine

2.2.5.2 Nick translation

Poly(dC-dA)-(dG-dT) (Pharmacia) was radioactively labeled using DNA Polymerase I/DNase I (Rigby et al., 1977), dATP (α-32P) and a BRL nick translation kit. The reactions were carried out precisely as specified by the manufacturer.
2.2.5.3 Oligolabeling

Total human DNA (THD), Chinese hamster ovary DNA (CHOD) or total cosmid DNA was radioactively labeled using the random primer method (Feinberg and Vogelstein, 1984a, b). 10-30 nanograms (ng) of DNA was diluted in 1X TE to a concentration of 1 ng/µl. The DNA was denatured by boiling for 10 minutes, followed by immediate cooling on ice. A standard reaction combined 30 ng of DNA with 10 µl of OLB-A, 0.1 mg/ml BSA, 50 µCi of dATP (α-32P) and 1 u of Klenow. If a smaller amount of DNA was used, dH2O was added to bring the total reaction volume to 50 µl. The mixture was incubated at room temperature for 3-18 hours.

To determine the fraction of label incorporated, an assay utilizing trichloroacetic acid (TCA) was performed. 1 µl of the reaction volume was added to 100 µl of nick translation stop buffer and mixed thoroughly. 40 µl aliquots were transferred to two tubes labeled "incorporated" and "total". 1 ml of 10% (w/v) TCA was added to "incorporated", 1 ml of dH2O was added to "total" and both tubes were centrifuged for 2 minutes at 1,000 revolutions per minute (rpm). The supernatant was removed from "incorporated" and 1 ml of dH2O was added. The counts per minute (cpm) for each tube were determined using a liquid scintillation counter. Reactions with a percent incorporation greater than 30% were terminated by the addition of 1 volume of nick translation stop buffer (NTSB). Reactions with a lower incorporation were re-incubated in the presence of an additional 1 u of Klenow.

Unincorporated nucleotides were separated from the labeled probe by exclusion chromatography on a 1 ml column packed with Sephadex G-25 beads. The labeling reaction was added to the column and centrifuged for 2 minutes at 1,000 rpm. The eluted fraction was transferred to a microfuge tube. The column was washed with 100 µl of dH2O and the resulting eluate was combined with the previous fraction.
Immediately prior to use, the purified probe was denatured by boiling for 10 minutes, followed by incubation on ice.

Solution O
  1.25 M Tris, pH 8.0
  0.125 M MgCl2

Solution A
  1 ml Solution O
  18 µl β-mercaptoethanol
  5 µl each of 100 mM dCTP, dGTP, and dTTP

Solution B
  2 M Hepes, pH 6.6

Solution C
  Hexadeoxyribonucleotides dissolved in 1X TE at 90 OD/ml

Oligolabeling Buffer-A (OLB-A)
  Mix solutions A:B:C in a ratio of 100:250:150

NTSB
  20 mM EDTA, pH 8.0
  2 mg/ml salmon sperm DNA
  0.2% (w/v) SDS

2.2.5.4 Preannealing with human DNA

Repetitive probes were preannealed using an excess of sheared non-radioactive human DNA (Litt and White, 1985) to prevent hybridization of repetitive sequences. 30 ng of oligolabeled probe was combined with 200 µg of THD and 2X SSC in a total volume of 270 µl. The sample was boiled for 5 minutes to denature the DNA and incubated at 65°C for 30 minutes.

2.2.6 Prehybridization, hybridization, washing and autoradiography

2.2.6.1 End-labeled probe

Southern blots were prehybridized, to block non-specific probe binding, in 10-15 ml of hybridization solution (5X SSC, 5X Denhardt's) at 42°C for 1 hour. The probe was added to the hybridization solution and hybridized overnight at 42°C. Following hybridization, the filter was washed in 1X SSC for 10 minutes at room temperature and air dried. The filter was sealed in a hybridization bag and exposed to Kodak XRP-1 film at room temperature, using Dupont Lightning Plus intensifying screens, for 4-24 hours.

100X Denhardt's Solution
  4% Ficoll type 400
  4% polyvinyl pyrrolidone
  4% (w/v) BSA

2.2.6.2 Oligolabeled probe

Southern blots were prehybridized in hybridization solution (6X SSC, 5X Denhardt's, 0.3% (w/v) SDS, 100 µg/µl salmon sperm DNA) at 65°C for 1 hour. Single copy probes were incubated at 95°C for 5 minutes to denature the DNA and added to the hybridization solution. Repetitive probes were preannealed (section 2.2.5.4) before addition to the hybridization solution.
In both cases, the probe was hybridized for 18 hours at 65°C. The filters were washed twice in 1X SSC, 0.1% SDS at 65°C for 15 minutes and twice in 0.2X SSC, 0.1% SDS at 65°C for 30 minutes. The two 30 minute washes were repeated if a significant amount of background signal was detected. The filters were air dried, sealed in a hybridization bag and exposed to Kodak XRP-1 film at -70°C, using Dupont Lightning Plus intensifying screens, for 6-11 days.

2.2.6.3 Stripping filters

Filters were stripped by incubation in 0.4 N NaOH at 43°C for 30 minutes, neutralized by incubation at 43°C for 30 minutes in 0.2 M Tris, pH 7.5, 0.2X SSC, 0.2% (w/v) SDS and air dried. Filters were re-used for hybridization a maximum of three times.

2.2.7 (GT)n repeat detection

(GT)n repeats were detected by hybridization to nick translated poly(dC-dA)-(dG-dT) (section 2.2.5.2). Southern blots or colony filters were prehybridized in hybridization solution (section 2.2.6.2) lacking competitor DNA at 55°C for 1 hour. The probe was added to the hybridization solution and hybridized overnight at 55°C. LA08NC01 replica filters were prehybridized and hybridized at 65°C. Following hybridization, blots and colony filters were washed in 1X SSC, 0.1% SDS for 45 minutes at 65°C. LA08NC01 replica filters were washed twice for 30 minutes at 55°C. The filters were air dried and sealed in a hybridization bag. The filters were exposed to Kodak XRP-1 film at room temperature, using Dupont Lightning Plus intensifying screens, for 2-48 hours.

2.2.8 Random isolation of cosmids containing both NotI sites and (GT)n repeats

Cosmids containing (GT)n repeats were isolated randomly by screening LA08NC01 replica filters with nick translated poly(dC-dA)-(dG-dT). DNA was isolated (section 2.2.1.1) from cosmids which hybridized to poly(dC-dA)-(dG-dT) and digested (section 2.2.2) with EcoRI, NotI and EcoRI/NotI to determine if the cosmid inserts contained a NotI site.
The restriction digested DNA fragments were transferred to a nylon membrane (section 2.2.4.1). The blots were hybridized with oligolabeled THD and CHOD (section 2.2.5.3) to establish the origin of the inserts and to determine the copy number of the various restriction fragments. In addition, the blots were hybridized with nick translated poly(dC-dA)-(dG-dT). This served three purposes: confirmation of (GT)n repeats identified by replica filter hybridization, determination of the number of (GT)n repeats present in the insert and identification of the restriction fragment in which each (GT)n repeat mapped.

2.2.9 Ligation

Cosmid or plasmid DNA and pBluescript II KS +/- (Stratagene) DNA were restriction digested to produce complementary ends. The vector was ligated to fragments of cosmid or plasmid DNA using T4 ligase. The ratio of vector to insert in each ligation reaction was 1:10. This ratio was achieved by combining 2 µl of 25 ng/µl pBluescript II KS +/- DNA with 5 µl of 100 ng/µl recombinant plasmid or cosmid DNA. Ligations were carried out in a 500 µl tube containing the DNA, 1X ligation buffer and 1 µl of dH2O to bring the total reaction volume to 10 µl. 1 u of T4 ligase was added and the reactions were incubated at 14°C for 18 hours.

10X ligation buffer
  100 mM MgCl2
  500 mM Tris
  100 mM DTT
  10 mM ATP
  1 mg/ml BSA

2.2.10 Transformation

E. coli DH5αMCR cells were made competent using the method of Hanahan (1983). Ligation reactions were transformed into the competent cells using the following procedure. 1-250 ng of DNA was added to 100 µl of competent cells, and the mixture was incubated on ice for 30 minutes. The cells were heat-shocked at 42°C for 45 seconds followed by incubation on ice for 2 minutes. 400 µl of LB broth was added and the mixture was incubated at 37°C for 45 minutes. All or one half of the transformation mixture was plated in 50 µl aliquots onto XIA plates.
The plates were incubated at 37°C for 18 hours and stored at 4°C. The average transformation efficiency was 2 X 10^ transformants per µg of supercoiled plasmid DNA.

2.2.11 Sequencing

2.2.11.1 Preparation of the template

Supercoiled recombinant plasmid DNA was isolated using the alkaline lysis miniprep protocol and denatured using an alkaline denaturation protocol (Promega). A volume containing approximately 2 µg of DNA was transferred to an Eppendorf tube along with dH2O to 18 µl. The DNA was denatured by incubation at room temperature for 5 minutes in the presence of 2 µl of 2 N NaOH, 2 mM EDTA. The reaction was neutralized by the addition of 2 µl of 2 M NH4OAc, pH 4.6. The DNA was precipitated by the addition of 75 µl of 95% EtOH, followed by incubation at -70°C for 10 minutes. The DNA was collected by centrifugation at 14,000 rpm for 10 minutes. The pellet was washed with 200 µl of cold 70% EtOH, centrifuged at 14,000 rpm for 1 minute, and desiccated for 5 minutes in a SpeedVac Concentrator.

2.2.11.2 Sequencing primers

Plasmid DNA was sequenced using pBluescript II KS +/- sequencing primers T3, T7, KS and M13 (-20). Primer sequences are as follows.

T3 primer: 5' ATTAACCCTCACTAAAG 3'
T7 primer: 5' AATACGACTCACTATAG 3'
KS primer: 5' CGAGGTCGACGGTATCG 3'
M13 (-20) primer: 5' GTAAAACGACGGCCAGT 3'

Six anchored primers, designed by Yuille et al. (1991) for determination of sequences flanking microsatellites, were also used. Three of these primers were of the form (5'dG-dT3')7 dX, where X = A, C, or T. The other three were of the form (5'dT-dG3')7 dY, where Y = A, C, or G. These primers were synthesized by the University of British Columbia Oligosynthesis Laboratory using an Applied Biosystems 380B Oligonucleotide Synthesizer. In addition, unique primers designed from sequence flanking two (GT)n repeats were used. The sequences of the unique primers are listed in section 2.2.12.3.
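The six anchored primers can be written out explicitly from the two forms given above. The following is a minimal Python sketch; the function name `anchored_primers` is an illustrative assumption and is not taken from Yuille et al. (1991). In each set the excluded 3' base is the one that would simply extend the repeat (G after GT, T after TG), so the final base of every primer has to pair with a non-repeat nucleotide at the edge of the microsatellite.

```python
def anchored_primers(repeat_units=7):
    """Return the six anchored primer sequences (5'->3') described in the text.

    repeat_units is the number of dinucleotide repeats in each primer
    (7 in the text); the function name itself is hypothetical.
    """
    primers = []
    # Form (GT)7 X, with the 3' anchor base X = A, C or T
    for anchor in "ACT":
        primers.append("GT" * repeat_units + anchor)
    # Form (TG)7 Y, with the 3' anchor base Y = A, C or G
    for anchor in "ACG":
        primers.append("TG" * repeat_units + anchor)
    return primers


if __name__ == "__main__":
    for seq in anchored_primers():
        print(f"5' {seq} 3'  ({len(seq)} nt)")
```

Each primer is 15 nucleotides long: seven dinucleotide repeats plus the single anchoring base.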
2.2.11.3 Sequencing reactions

Denatured plasmid DNA was sequenced using the dideoxy chain termination method (Sanger et al., 1977). The reagents were obtained from a sequencing kit (Sequenase version 2.0, U.S. Biochemicals). However, the protocol was modified to maximize sequence information obtained from regions of high secondary structure. The denatured DNA was resuspended in 6.5 µl of dH2O, and mixed thoroughly with 5 pmoles of sequencing primer and 1 µl dimethylsulfoxide (DMSO). The sample was incubated at 95°C for 3 minutes, followed by immediate cooling in liquid nitrogen for 5 minutes. 2 µl of 5X Sequenase buffer was added and the template and primer were annealed by incubation for 5 minutes at room temperature. The primer was labeled by combining the previous mixture with 6.3 µl of labeling mix, and incubating at room temperature for 4 minutes. Four termination mixes were supplied, differing only in the ddNTP each contained. Four tubes, each containing 2.5 µl of ddTTP, ddATP, ddCTP or ddGTP termination mix, were warmed to 37°C. A 3.5 µl aliquot of the labeling reaction was transferred into each of the four tubes. The labeling reaction was terminated by incubating the four tubes at 37°C for 4 minutes. The sequencing reactions were completed by the addition of 4 µl of stop buffer, and stored at -20°C.

Labeling mix
1 µl (10 µCi) dATP (α-35S)
0.525 µl DMSO
1.025 µl 100 mM DTT
1.8 µl enzyme dilution buffer
2.05 µl 1.5X labeling mix
0.25 µl Sequenase

Stop buffer
95% formamide
20 mM EDTA
0.05% Bromophenol Blue
0.05% Xylene Cyanol FF

2.2.11.4 Denaturing polyacrylamide gel electrophoresis

Sequenced plasmid DNA was size fractionated by denaturing polyacrylamide gel electrophoresis. A 6% Hydrolink gel was made by combining 25.2 g of urea, 3.6 ml 10X TBE, 6 ml 50% Hydrolink Long Ranger solution (AT Biochemicals) and 30 ml dH2O, in the order listed. Two 21 cm x 50 cm sequencing plates were cleaned and one was siliconized.
The plates were separated by a 0.4 mm spacer and clamped together to create a chamber for the gel solution. The chamber was sealed at the bottom by placing the plates in a gel sealing unit containing 13 ml gel solution, 52 µl of tetramethylethylenediamine (TEMED) and 208 µl of 25% (w/v) ammonium persulfate (APS). Polymerization of the remaining 47 ml of gel solution was initiated by the addition of 26 µl of TEMED and 100 µl of 25% APS. The gel solution was transferred to the gel chamber using a 25 ml pipette and allowed to polymerize at room temperature. Once polymerization was complete, the gel was prerun in 0.6X TBE buffer at 50 watts (2,500 V, 50 milliamps (mA)) constant power until it reached 50°C (45 minutes to 1 hour). Immediately before electrophoresis, sequencing reactions were denatured by boiling for 5 minutes, followed by incubation on ice. 3 µl of each reaction was loaded onto the gel and electrophoresed for 1-4 hours using conditions identical to those used to prerun the gel. The time required to achieve resolution of the sequence of interest was estimated by comparison to the migration of marker dyes. Bromophenol blue runs approximately 25 bp behind the primer and xylene cyanol runs approximately 50 bp behind bromophenol blue, on a 6% gel. Approximately 300 bp of sequence could be resolved with a single load. Following electrophoresis, the gel was transferred to a piece of 3MM Whatman paper, wrapped in plastic wrap and dried for 20 minutes at 80°C on a gel drier. The plastic wrap was removed and the gel was exposed to Kodak XRP-1 film for 2-4 days at room temperature.

2.2.11.5 Storage and analysis of sequence data

Sequence data was stored and analysed using version 1.09c of the ESEE sequence editor (Cabot and Beckenbach, 1989).

2.2.12 Polymerase Chain Reaction

2.2.12.1 Primer design and synthesis

Oligonucleotide primers were designed from sequence data based on the following recommendations.
- A primer should be approximately 20 bp in length.
- The 6 nucleotides at the 3'-end of the primer should be unique to the region to be amplified.
- A primer should not contain stretches of polypurines or polypyrimidines.
- The degree of self-complementarity should be negligible.
- Homology between primer pairs should be negligible.

Primers were designed so that their annealing temperatures, calculated using the formula TA (°C) = 4 (G-C bp) + 2 (A-T bp) - 5°C, approximated 53°C. Primer sequences were searched for homologies to primate sequences, stored in the European Molecular Biology Laboratory (EMBL) data base of published sequences (version 27), to determine if the primer sequences were unique. The searches were performed by version 1.2 of the computer program FASTA (Pearson and Lipman, 1988). Version 2.0 of the computer program NAR (Rychlik and Rhoads, 1989) was used to estimate the degree of self-complementarity and primer dimer formation. Primers were synthesized by the University of British Columbia Oligosynthesis Laboratory using an Applied Biosystems 380B Oligonucleotide Synthesizer.

2.2.12.2 Primer purification and end-labeling

Primers were purified using a C18 SEP-PAK cartridge (Waters/Millipore). The SEP-PAK behaves as a small reverse phase chromatography column. The cartridge was prepared by washing it with 10 ml of HPLC grade acetonitrile, followed by 10 ml of dH2O. The crude oligonucleotide was dissolved in 1.5 ml of 0.5 M NH4OAc and passed through the column. The contaminating aqueous salts were eluted with the solvent and the oligonucleotide was retained by adsorption on the column's matrix. The cartridge was washed with 10 ml of dH2O. The purified oligonucleotide was eluted with three 1 ml volumes of 20% acetonitrile. The first fraction contained 80-90% of the purified oligonucleotide. A 50 µl aliquot of this fraction was diluted to 1 ml and the absorbance at 260 nm was measured using a UV spectrophotometer.
The amount (µg) of oligonucleotide present in the remainder of the fraction was calculated assuming 1 OD260 = 33 µg. The fraction was evaporated to dryness using a SpeedVac Concentrator and resuspended in 1X TE to a final concentration of 100 µM.

PCR primers were radioactively end-labeled using T4 PNK. A standard reaction combined 55 pmoles of primer DNA with 1X PNK buffer, 25 µCi dATP (γ-32P) and 0.67 u T4 PNK. The total reaction volume was 10 µl. Larger quantities of end-labeled primer were produced by increasing the amount of each reactant by the same factor. The reaction was incubated at 37°C for 45 minutes and terminated by incubation at 65°C for 10 minutes.

2.2.12.3 Amplification

Two polymorphic (GT)n repeats, designated polymorphism 1 (D8S136P1) and polymorphism 2 (D8S136P2), were amplified simultaneously, using modified standard PCR and cycling conditions (Saiki et al., 1985; Mullis et al., 1986; Mullis and Faloona, 1987), Taq DNA polymerase (Saiki et al., 1988) and four unique primers.

PCR primers
D8S136P1-CA strand 5' GCCCAAAGAGGAGAATAAA 3'
D8S136P1-GT strand 5' CTGTTTCCACACCGAAGC 3'
D8S136P2-CA strand 5' AGGAGCAGTATCAAGCTCA 3'
D8S136P2-GT strand 5' AGCAAAACAATAAGCCAAGG 3'

10 ng of plasmid DNA or 40 ng of genomic or somatic cell hybrid DNA were amplified in a reaction containing 0.2 mM each dATP, dCTP, dGTP and dTTP, 50 mM Tris pH 8.0, 0.05% Tween 20 and NP40, 2.25 mM MgCl2, 1.1 pmoles (0.5 µCi) each of end-labeled D8S136P1-CA and D8S136P2-CA primers (section 2.2.12.2), 10 pmoles each of the four cold primers and 0.625 units of Taq DNA polymerase. dH2O was added to bring the total reaction volume to 25 µl. Samples were overlaid with mineral oil and processed through 30 cycles of denaturation at 94°C (1 min.), annealing at 53°C (30 sec.) and extension at 72°C (1 min.) in a DNA thermal cycler from Perkin Elmer-Cetus (Richmond, CA).
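The annealing temperature rule quoted in section 2.2.12.1 can be applied directly to the four PCR primers listed above. This is a minimal Python sketch, not the procedure actually used in the thesis; the function name `annealing_temp` and the `PRIMERS` dictionary are illustrative assumptions, and the sequences are copied from the primer list in this section.

```python
def annealing_temp(primer):
    """Estimate TA (deg C) as 4*(G+C) + 2*(A+T) - 5.

    This is the rule of thumb quoted in section 2.2.12.1; the function
    name is hypothetical and chosen only for this example.
    """
    seq = primer.upper().replace(" ", "")
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at != len(seq):
        raise ValueError("primer contains characters other than A, C, G, T")
    return 4 * gc + 2 * at - 5


# Primer sequences as listed in section 2.2.12.3 (5'->3')
PRIMERS = {
    "D8S136P1-CA": "GCCCAAAGAGGAGAATAAA",
    "D8S136P1-GT": "CTGTTTCCACACCGAAGC",
    "D8S136P2-CA": "AGGAGCAGTATCAAGCTCA",
    "D8S136P2-GT": "AGCAAAACAATAAGCCAAGG",
}

if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(f"{name}: {len(seq)} nt, estimated TA = {annealing_temp(seq)} C")
```

With the sequences as transcribed here, the rule gives estimates of 49 to 51°C, a few degrees below the 53°C target, so the shared 53°C annealing step is best read as an approximation that allowed both primer pairs to be cycled together.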
Following cycling, the samples were held at 72°C for 10 minutes. The samples were incubated at 4°C until they were removed from the thermal cycler, at which time 8 µl of sequencing stop buffer was added. The samples were stored at -20°C.

2.2.12.4 Electrophoresis of radiolabeled PCR products

Prior to electrophoresis the samples were denatured by heating at 95°C for 5 minutes, followed by incubation on ice. 7 µl of each sample was loaded on a prerun 6% Hydrolink gel (section 2.2.11.4) and electrophoresed for 75 minutes at 50 watts (2,500 V, 50 mA) in 0.6X TBE buffer. Product sizes were determined by comparison to an M13 DNA sequencing reaction and to products of known size. Following electrophoresis, the gel was transferred to a piece of 3MM Whatman paper, wrapped in plastic wrap and dried for 15 minutes at 80°C on a gel drier. Gels were exposed to Kodak XRP-1 film for 12-48 hours at room temperature or -70°C.

2.2.12.5 Polymorphism typing

D8S136P1 and D8S136P2 were typed for 516 individuals from the CEPH panel of 40 families (Dausset et al., 1990). The data was entered in the CEPH database, version 5.

2.2.13 Statistical analysis

2.2.13.1 Linkage analysis

Two-point linkage analysis was performed using version 4.7 of the LINKAGE package of programs (Lathrop et al., 1984, 1985; Lathrop and Lalouel, 1988) and multipoint linkage analysis was performed using CRIMAP (Lander and Green, 1987). Genotypic information for LPL (GZ-14,15 polymorphism), D8S133, D8S5, D8S137, D8S131, D8S87, ANK1 and PLAT (Tomfohrde et al., in press) was obtained from published data included in the CEPH database, version 5. The equivalent number of meioses and the equivalent number of recombinants for each locus pair were calculated using the program EQUIV (Ott, 1991), which is based on the method of Edwards (1976). This is a method of determining the numbers of recombinants and non-recombinants in phase known families which would give the same lod score as that observed (Edwards, 1976).
These values were termed effective numbers of recombinants and non-recombinants and their sum is the effective number of meioses (Edwards, 1976). Recombination fractions were converted to cM using the Kosambi mapping function option of the program MAPFUN (Ott, 1991).

2.2.13.2 Calculation of allele and haplotype frequencies

The CEPH panel consists of two and three generation families. Allele frequencies were calculated from genotype data for 123 CEPH individuals. In each family, data was taken only from the oldest existing generation (be it the grandparental or parental generation). In this manner a total data set of 246 independent (i.e. from persons unrelated except by marriage) chromosomes was obtained. For each chromosome, the D8S136P1-D8S136P2 haplotype was determined by observing the segregation of alleles within the appropriate pedigree. That is, the haplotypes which resulted in the fewest recombinants were chosen. As it was possible to assign a haplotype to each chromosome in the data set, haplotype frequencies were calculated from 246 haplotypes.

2.2.13.3 Calculation of H and PIC

H and PIC values for D8S136P1 and D8S136P2 were calculated from allele frequencies and H and PIC values for the haplotype were calculated from haplotype frequencies, using the program PIC (Ott, 1991).

2.2.13.4 Analysis of observed genotype frequencies at D8S136P1 and D8S136P2

The observed genotype frequencies at D8S136P1 and D8S136P2 in the sample population were compared to the distribution expected under HWE using standard χ² analysis. Genotype frequencies were calculated from the set of 123 CEPH individuals used to calculate allele and haplotype frequencies. The expected and observed genotype distributions were collapsed in a random fashion until all cell values in the resulting table were equal to or greater than five. The analysis was performed on the collapsed tables.
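The calculations in sections 2.2.13.3 and 2.2.13.4 both start from allele frequencies. As a hedged illustration only, the Python sketch below uses the standard textbook definitions H = 1 - Σp_i² and PIC = 1 - Σp_i² - ΣΣ 2p_i²p_j² (summed over i less than j), together with the Hardy-Weinberg expectations n·p_i² for homozygotes and 2n·p_i·p_j for heterozygotes. Whether the program PIC (Ott, 1991) evaluates exactly these expressions is an assumption, and the function names are hypothetical. The frequencies are those reported for D8S136P1 in Table 2.

```python
from itertools import combinations


def heterozygosity(freqs):
    """Expected heterozygosity, H = 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs):
    """Polymorphism information content (standard textbook definition)."""
    cross = sum(2.0 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return heterozygosity(freqs) - cross


def hwe_expected_counts(freqs, n_individuals):
    """Expected genotype counts under HWE, keyed by 0-based allele index pairs."""
    expected = {}
    for i, p in enumerate(freqs):
        expected[(i, i)] = n_individuals * p * p  # homozygotes
        for j in range(i + 1, len(freqs)):
            expected[(i, j)] = 2.0 * n_individuals * p * freqs[j]  # heterozygotes
    return expected


# Allele frequencies A1..A10 at D8S136P1, copied from Table 2
P1_FREQS = [0.0203, 0.0122, 0.0569, 0.0366, 0.1829,
            0.0935, 0.0285, 0.2114, 0.1504, 0.2073]

if __name__ == "__main__":
    print(f"H   = {heterozygosity(P1_FREQS):.2f}")
    print(f"PIC = {pic(P1_FREQS):.2f}")
    counts = hwe_expected_counts(P1_FREQS, 123)
    small = sum(1 for c in counts.values() if 5 > c)
    print(f"{small} of {len(counts)} expected genotype cells fall below five")
```

Under these definitions the Table 2 frequencies give H = 0.84 and PIC = 0.82, matching the values reported for D8S136P1. The last line of output shows why the genotype tables had to be collapsed before the χ² test: with ten alleles and 123 individuals, most expected cell counts are below five.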
2.2.13.5 Analysis of allelic association of alleles at D8S136P1 and D8S136P2

The null hypothesis of random association between alleles at D8S136P1 and D8S136P2 was tested by standard χ² analysis of haplotype contingency tables. The data set of 246 marker haplotypes was collapsed to 5 X 4 and 2 X 2 contingency tables and the tables were analysed using version 2.5 of the program CONTING (Ott, 1991). For haplotypes for which highly significant disequilibrium was detected (95% confidence level), the extent of the disequilibrium was measured by calculating the disequilibrium parameter, $D = P_{11} - p_1 q_1$, where $P_{11}$ is the observed haplotype frequency and $p_1 q_1$ is the product of the component allele frequencies. D is bounded by $D_{min} = \max(-p_1 q_1, -p_2 q_2)$ and $D_{max} = \min(p_1 q_2, p_2 q_1)$ and is therefore constrained by allele frequencies. The value for D was normalized to the maximum value it could have by calculating the proportion $D' = D/D_{max}$ or $D' = D/D_{min}$. These formulas are described by Hartl and Clark (1989).

Chapter 3 Results

3.1 Clone isolation

Cosmids containing NotI site(s) and (GT)n repeat(s) were isolated by hybridizing the replica filter of a randomly selected plate, designated 8-1-140R, with nick translated poly(dC-dA)-(dG-dT) and subsequent digestion of positive cosmids with EcoRI, NotI and EcoRI/NotI. (GT)n repeats were detected in 53 of the 96 cosmids represented on the filter. Two of the 53 cosmids contained a NotI site: one contained one NotI site and the other contained two NotI sites. These cosmids have respective array positions of B11 and D4 and were designated c140B11 and c140D4. Neither cosmid insert hybridized Chinese hamster ovary DNA (CHOD), so both are likely of human origin. Rescreening with nick translated poly(dC-dA)-(dG-dT) indicated that c140B11 and c140D4 contained two and three (GT)n repeats, respectively. As c140D4 contained two NotI sites and three (GT)n repeats, it was selected for further analysis.
3.2 Physical mapping

3.2.1 Restriction map of c140D4

A restriction map of c140D4 was generated from a series of single and double enzyme digests, gel electrophoresis, Southern blotting and hybridization. c140D4 and its subclones were digested with one or more restriction enzymes. The digested DNA was transferred to a nylon membrane using the method of Southern (1975) and hybridized with end-labeled T3 or T7 or with nick-translated poly(dC-dA)-(dG-dT). The consensus restriction map of c140D4 is presented in Figure 1.

3.2.2 Physical localization of c140D4

3.2.2.1 Primary localization-genomic localization blots

Primary physical localization of c140D4 was achieved by hybridization of oligolabeled and preannealed c140D4 total DNA to genomic DNA from the five hybrid cell panel, CEPH individual 1329208 and the hamster cell line HHW1064. As expected, c140D4 hybridized one band in 1329208 DNA and did not hybridize hamster DNA. c140D4 hybridized all of the cell hybrid DNA with the exception of TL/UC2/12-8 and 21-8/Ab5/C123a. These results localized c140D4 to the short arm of chromosome 8 (Figure 2). Once c140D4 was localized, it was assigned the locus symbol D8S136.

3.2.2.2 Secondary localization-PCR

A more precise physical localization of c140D4 was achieved by PCR amplification of DNA from the 11 hybrid cell panel using the primers flanking D8S136P2. Amplification of each cell hybrid line was demonstrated, with the exception of 21-8/Ab5/C123a. These results localized D8S136 to region C of the short arm of chromosome 8 (Figure 3). Region C is defined approximately as the interval 8p21→cen (Wagner et al., 1991).

Figure 1.
Restriction map of cosmid 140D4

[Restriction map diagram showing the positions of D8S136P2, the unsubcloned (GT)n repeat and D8S136P1. Fragment sizes labelled on the map: 3.7, 11.1, 3.4, 6.7, 0.7, 3.5, 2.5, (2.2, 6.7), 1.0, (4.2, 4.7), 3.4. Scale: 1 cm = 1.43 kb.]

* = subcloned NotI site
K = KpnI
N = NotI
S = Sau3AI, used for cloning
E = EcoRI, used for subcloning NotI site
D8S136P1 = EcoRI fragment containing (GT)n repeat
D8S136P2 = EcoRI fragment containing (GT)n repeat
(GT)n = (GT)n repeat not subcloned

Figure 2. Schematic representation of the chromosome 8 sequences present (shown by lines) in each member of the five hybrid cell panel. c140D4 hybridized all of the cell hybrid DNA with the exception of TL/UC2/12-8 and 21-8/Ab5/C123a.

Figure 3. Secondary physical localization of D8S136

Schematic representation of the chromosome 8 sequences present (shown by lines) in each member of the eleven somatic cell hybrid panel. Amplification of each cell hybrid line was demonstrated, with the exception of 21-8/Ab5/C123a (21q+). The figure was obtained from Wagner et al. (1991).

3.3 Polymorphisms

3.3.1 Polymorphism screening and analysis

The (GT)n repeats on which D8S136P1 and D8S136P2 are based were subcloned into pBluescript II KS +/- and sequenced. A sequence autoradiograph for the (GT)n repeat of D8S136P2 is presented in Figure 4. An equivalent autoradiograph was obtained for the (GT)n repeat of D8S136P1. Primers flanking the repeats were used to amplify the repeats to screen for polymorphisms and both were found to be polymorphic when tested in the CEPH panel parents. Families for which one or both parents were heterozygous were typed. The H and PIC values for each polymorphism were calculated from allele frequencies and the H and PIC values for the haplotype were calculated from haplotype frequencies. Mendelian codominant segregation of alleles was observed in all CEPH pedigrees studied, for both polymorphisms.

3.3.2 D8S136P1

The (GT)n repeat of D8S136P1 was subcloned as a 1.9 kb EcoRI fragment into EcoRI digested pBluescript II KS +/-, producing the plasmid p140D4E2.
A 0.8 kb Sau3A fragment from p140D4E2 was subcloned into BamHI digested pBluescript II KS +/-, producing the plasmid p140D4S1. The sequence of the D8S136P1-CA primer was selected from the sequence obtained from sequencing p140D4S1 with the KS sequencing primer. The sequence of the (GT)n repeat and the D8S136P1-GT primer was selected from sequence obtained from sequencing p140D4S1 with the D8S136P1-CA primer. The sequence of the (GT)n repeat and its flanking sequences is presented in Figure 5. The sequence of the cloned repeat unit was (GT)15 and the predicted length of the amplified product was 71 bp. The observed allele sizes and frequencies are shown in Table 2. The H and PIC values of D8S136P1 are 0.84 and 0.82, respectively.

It has been suggested that investigators who are developing new microsatellite polymorphisms report the genotypes from individuals 133101 and 133102 to allow them to be used as size standards (Weber, 1990a). The reference genotypes are 133101 = A6, A9 and 133102 = A6, A8. An example of segregation of the 73, 75, 79 and 81 bp alleles in CEPH family 1423 is shown in Figure 6. Family 1423 was chosen to demonstrate segregation of alleles at both D8S136P1 and D8S136P2 because the autoradiographs generated for this family are easily interpreted.

Figure 4. Sequence of D8S136P2 (GT)n repeat

Autoradiograph of a polyacrylamide gel (lanes A, C, G, T) showing the sequence of the 236 bp allele at D8S136P2, obtained by sequencing p140D4E3 with the D8S136P2-CA primer.

3.3.3 D8S136P2

The (GT)n repeat of D8S136P2 was subcloned as a 3.5 kb EcoRI fragment into EcoRI digested pBluescript II KS +/-, producing the plasmid p140D4E3. The sequence of the D8S136P2-CA primer was selected from the sequence obtained from sequencing p140D4E3 with an anchored sequencing primer of the form (5'dG-dT3')7 dT. The sequence of the (GT)n repeat and the D8S136P2-GT primer was selected from the sequence obtained from sequencing p140D4E3 with the D8S136P2-CA primer.
The sequence of the (GT)n repeat and its flanking sequences is presented in Figure 7. The sequence of the cloned repeat unit was (GT)20 and the predicted length of the amplified product was 236 bp. The observed allele sizes and frequencies are shown in Table 3.

TTTTTGATAATCTCATGACCAAAATCCCTTNACGTGAGTTTTCGTTCCACTGAGCGTCAG
AAAAACTATTAGAGTACTGGTTTTAGGGAANTGCACTCAAAAGCAAGGTGACTCGCAGTC

ACCCCGTAGAAAAGNTCGNGCCACTGCNCTCCAGCCTGGGCGAAAGAATGAGACCCCATC
TGGGGCATCTTTTCNAGCNCGGTGACGNGAGGTCGGACCCGCTTTCTTACTCTGGGGTAG

TCACACAGAAAAAAAACAACAAAAAACAAACAAACAAACAAAACACCCTTGTTCAAGAGG
AGTGTGTCTTTTTTTTGTTGTTTTTTGTTTGTTTGTTTGTTTTGTGGGAACAAGTTCTCC

AGTTTTGGCTGGCTGTGCTTGACCACTCCCAGCTTCCCGCAAAGGATTGCTCAAAAGCCT
TCAAAACCGACCGACACGAACTGGTGAGGGTCGAAGGGCGTTTCCTAACGAGTTTTCGGA

GAGCCCAAAGAGGAGAATAAACACACACACACACACACACACACACACACAAAGGGCTTC
CTCGGGTTTCTCCTCTTATTTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTTTCCCGAAG

GGTGTGGAAACAGAGCACACAGAAGGAAAACAGCAGAATAGTTAGAAATGTCATCTAGCC
CCACACCTTTGTCTCGTGTGTCTTCCTTTTGTCGTCTTATCAATCTTTACAGTAGATCGG

CAGCAACCCCACGGTGGCAGCTCCAGAATCAAAACAGTCTGCAAACAAGGCCACATTCAG
GTCGTTGGGGTGCCACCGTCGAGGTCTTAGTTTTGTCAGACGTTTGTTCCGGTGTAAGTC

TCCCCCATCTCTGGTGTCTCCCTGCTACCAGGGCCTGGCACTGTTCATTCTCCAG
AGGGGGTAGAGACCACAGAGGGACGATGGTCCCGGACCGTGACAAGTAAGAGGTC

Figure 5. Sequence of D8S136P1 (GT)n repeat and flanking sequence. Primer sequences are underlined. The top strand of the sequence is oriented 5'->3'.

Table 2. Sizes and frequencies of alleles at D8S136P1 in the CEPH panel

Allele   Repeat size   PCR product size (bp)   Frequency
A1       (GT)24        89                      0.0203
A2       (GT)23        87                      0.0122
A3       (GT)22        85                      0.0569
A4       (GT)21        83                      0.0366
A5       (GT)20        81                      0.1829
A6       (GT)19        79                      0.0935
A7       (GT)18        77                      0.0285
A8       (GT)17        75                      0.2114
A9       (GT)16        73                      0.1504
*A10     (GT)15        71                      0.2073

* Sequenced allele

Figure 6. Autoradiograph of a polyacrylamide gel showing segregation of the 73, 75, 79, 81 and 85 bp alleles at D8S136P1 in CEPH family 1423. Each major band represents an allele and allele sizes are indicated on the right side of the figure.
Shadow bands are an in vitro artifact, as they were also seen for amplifications of plasmid DNA (Litt and Luty, 1989).

GTTTAGAGCAAAAGCAAAACAATAAGCCAAGGGGGTGTGTGTGTGTGTGTGTGTGTGTGT
CAAATCTCGTTTTCGTTTTGTTATTCGGTTCCCCCACACACACACACACACACACACACA

GTGTGTGTGTGTGTTCTTTAATTAGAATGATGATAAAACAAAAATAGCCTTCCGTTTTCC
CACACACACACACAAGAAATTAATCTTACTACTATTTTGTTTTTATCGGAAGGCAAAAGG

TTAAACTCTCAAGGTGACACTTTGGTTACAAGTAGAGGCATTGCTGATTTGCTGGTCTTT
AATTTGAGAGTTCCACTGTGAAACCAATGTTCATCTCCGTAACGACTAAACGACCAGAAA

TGTGACTGGATATTGATTGGCAAGGTAAGAATTGCTTATCACACTCTGATGAGCTTGATA
ACACTGACCTATAACTAACCGTTCCATTCTTAACGAATAGTGTGAGACTACTCGAACTAT

CTGCTCCTTGCCTAGGAGTAAAAGACTCAGCCTCTCCAGGAACATCAGT
GACGAGGAACGGATCCTCATTTTCTGAGTCGGAGAGGTCCTTGTAGTCA

Figure 7. Sequence of D8S136P2 (GT)n repeat and flanking sequence. Primer sequences are underlined. The top strand of the sequence is oriented 5'->3'.

Table 3. Sizes and frequencies of alleles at D8S136P2 in the CEPH panel

Allele   Repeat size   PCR product size (bp)   Frequency
A1       (GT)24        244                     0.0081
A2       (GT)23        242                     0.0122
A3       (GT)22        240                     0.1179
A4       (GT)21        238                     0.2154
*A5      (GT)20        236                     0.2642
A6       (GT)19        234                     0.3669
A7       (GT)18        232                     0.0041
A8       (GT)17        230                     0.0041
A9       (GT)14        224                     0.0041

* Sequenced allele

The H and PIC values of D8S136P2 are 0.73 and 0.69, respectively. The H and PIC values of the haplotype are 0.95 and 0.95, respectively. The reference genotypes are 133101 = A3, A4 and 133102 = A4, A5. An example of segregation of the 234, 236, 238 and 240 bp alleles in CEPH family 1423 is shown in Figure 8.

3.4 Sequence tagged sites

3.4.1 Sequence flanking a NotI site

An STS was generated by subcloning a NotI site and determining its flanking sequences. The NotI site identified by an asterisk (*) in Figure 1 was subcloned as two EcoRI/NotI fragments of 6.5 kb and 10 kb. These fragments were ligated into EcoRI/NotI digested pBluescript II KS +/-, producing the plasmids p140D4EN6 and p140D4EN10, respectively.
154 bp of sequence flanking the NotI site was determined by sequencing p140D4EN6 and p140D4EN10 with the M13 (-20) sequencing primer. The sequence of this NotI site and its flanking sequences are presented in Figure 9.

3.4.2 The D8S136P1 and D8S136P2 (GT)n repeats and their flanking sequences

The sequences of the (GT)n repeats of D8S136P1 and D8S136P2 and their flanking sequences do not meet the original definition of an STS, as the PCR products are repetitive and cannot be used as hybridization probes. However, the flanking oligonucleotide primers are unique sequences and are suitable for direct library screening. Therefore, the sequence information obtained for the (GT)n repeats of D8S136P1 and D8S136P2 (Figures 5 and 7, respectively) qualifies as "quasi" sequence tagged sites.

Figure 8. Autoradiograph of a polyacrylamide gel showing segregation of the 234, 236, 238, and 240 bp alleles at D8S136P2 in CEPH family 1423. Allele sizes are indicated on the right side of the figure.

Figure 9. Sequence of the subcloned NotI site and flanking sequence

AGCGGGAGTGCCTGGGCCGTGCGGGGCCGCCCTAACCGCNTGNCCCCGGCGGCCGCGACC
TCGCCCTCACGGACCCGGCACGCCCCGGCGGGATTGGCGNACNGGGGCCGCCGGCGCTGG

CCGGTCCTCTGCTAGGAGTTAGCTGCCCTCGGTGGCGCGGGAACGGCGCCCGCAGAGAGT
GGCCAGGAGACGATCCTCAATCGACGGGAGCCACCGCGCCCTTGCCGCGGGCGTCTCTCA

GGCGCTGCGGTCCCTCCCGGACACGGCTCTGCGGTTCCTTCCT
CCGCGACGCCAGGGAGGGCCTGTGCCGAGACGCCAAGGAAGGA

Sequence of the subcloned NotI site and flanking sequence. The sequence of the NotI site is underlined. The top strand of the sequence is oriented 5'->3'.

3.5 Linkage analysis

D8S136 was positioned on the most recent chromosome 8 linkage map (Tomfohrde et al., in press) using two-point and multipoint linkage analysis. The maximum likelihood value for the recombination fraction between D8S136P1 and D8S136P2 was 0.003, with a corresponding lod score of 103 and an equivalent number of meioses equal to 353 (Table 4). A single recombination event was observed between D8S136P1 and D8S136P2.
This observation was confirmed by retyping the individual in question. The possibility of a sample misidentification cannot be excluded without examining a new sample. However, this individual has been tested for numerous marker systems and spurious results have not been observed; therefore this explanation is unlikely. The family with the recombination event between D8S136P1 and D8S136P2 is shown in Figure 10. The recombination event occurred in the maternal chromosomes, generating a recombinant chromosome in which the alleles at markers telomeric to D8S136P1 and D8S136P2 are inherited with the allele at D8S136P2 and alleles at markers centromeric to the two markers are inherited with the allele at D8S136P1. The observed segregation of maternal alleles in the recombinant individual suggests that the order of D8S136P1 and D8S136P2 is 8pter-D8S136P2-D8S136P1-8cen.

3.5.1 Two-point linkage analysis

An approximate distance for D8S136, from LPL (GZ-14,15), D8S133, D8S5, D8S137, D8S131, D8S87, D8S135, ANK1 and PLAT, was obtained by two-point linkage analysis, of each marker against D8S136P1 and D8S136P2, using the

Figure 10. Family with a recombination event between D8S136P1 and D8S136P2

[Pedigree diagram: alleles at GZ-14,15, LPL3GT, D8S136P2, D8S136P1, D8S137 and ANK1 are listed beneath each individual; the recombinant individual is marked.]

The pedigree has been analysed to show haplotypes. Alleles at chromosome 8p markers, LPL (GZ-14,15 and LPL3GT), D8S136P1, D8S136P2, D8S137 and ANK1, segregating in each individual are indicated under the symbols for individuals. For example, the father is segregating alleles 9 and 10 (73 bp and 71 bp alleles) at D8S136P1 and alleles 5 and 6 (236 bp and 234 bp alleles) at D8S136P2.
Data for markers other than D8S136P1 and D8S136P2 was obtained from the CEPH database version 5.0.

MLINK program from the LINKAGE package of programs. All of these markers have been positioned on a chromosome 8 linkage map (Tomfohrde et al., in press) and all, with the exception of LPL (GZ-14,15), have been physically localized to region C on the short arm of chromosome 8 (Tomfohrde et al., in press). LPL (GZ-14,15) has been physically localized to region B. The results of two-point linkage analyses between D8S136P1 and D8S136P2 and these markers are shown in Tables 4 and 5. For each pair of loci tested, the table lists lod scores for various recombination fractions, maximum likelihood values for recombination fractions, maximal lod scores, equivalent number of meioses and equivalent number of recombinants. Significant linkage was detected between D8S136P1 and D8S136P2 and all markers tested. The results of two-point linkage analyses suggest that D8S136 maps in the region between LPL (GZ-14,15) and D8S131.

3.5.2 Multipoint linkage analysis

D8S136 was positioned between two flanking markers, in the region between LPL (GZ-14,15) and D8S131, by multipoint linkage analysis using the program CRIMAP (Lander and Green, 1987). The position of D8S136 was varied among the markers described in section 3.5.1 and the most likely flanking markers for D8S136 were D8S133 and D8S5. With the exception of D8S136P1 and D8S136P2, computed odds against inversion of adjacent markers were all better than 1,000:1 (Table 6). A summary of published marker map distances for the short arm of chromosome 8 (Tomfohrde et al., in press) along with a new linkage map incorporating the linkage information for D8S136 is shown in Figure 11.
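The quantities reported for the two-point analysis (lod scores, effective numbers of recombinants and meioses, and map distances in cM) are linked by two standard formulas. The sketch below is illustrative Python only and is not a reimplementation of MLINK, EQUIV or MAPFUN. `lod_phase_known` evaluates the lod score for k recombinants among n phase-known meioses, which is the situation that Edwards' equivalent numbers are defined to reproduce, and `kosambi_cm` converts a recombination fraction to centimorgans with the Kosambi mapping function. Both function names are assumptions made for this example.

```python
import math


def lod_phase_known(k, n, theta):
    """Lod score for k recombinants in n phase-known meioses.

    Z(theta) = log10[ theta^k * (1 - theta)^(n - k) / 0.5^n ]
    """
    if theta == 0.0:
        # Any observed recombinant rules out theta = 0 (lod of minus infinity)
        return float("-inf") if k > 0 else n * math.log10(2.0)
    return (k * math.log10(theta)
            + (n - k) * math.log10(1.0 - theta)
            + n * math.log10(2.0))


def kosambi_cm(theta):
    """Kosambi map distance in cM for a recombination fraction theta."""
    if theta >= 0.5 or 0.0 > theta:
        raise ValueError("theta must satisfy 0 <= theta < 0.5")
    return 25.0 * math.log((1.0 + 2.0 * theta) / (1.0 - 2.0 * theta))


if __name__ == "__main__":
    # Effective numbers reported for D8S136P1 versus D8S136P2
    k, n, theta_hat = 1, 353, 0.003
    print(f"lod at theta = {theta_hat}: {lod_phase_known(k, n, theta_hat):.1f}")
    print(f"Kosambi distance: {kosambi_cm(theta_hat):.2f} cM")
```

For D8S136P1 versus D8S136P2 (k = 1, n = 353) the phase-known lod at θ = 0.003 comes out near 103, in line with the reported maximum; small differences are expected because the effective numbers are rounded. A recombination fraction of 0.003 corresponds to roughly 0.3 cM.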
Table 4. Two-point meiotic linkage analysis of selected 8p markers and D8S136P1

                       Recombination fraction (θ)
Locus      0.00    0.05    0.10    0.15    0.20    0.25    0.30    0.35    θ̂       Z       k     n
LPL        -∞      58.67   58.85   52.52   46.89   40.31   32.94   24.86   0.055   58.70   15    281
D8S133     -∞      89.84   84.28   76.70   67.94   58.22   47.60   36.11   0.030   90.73   11    374
D8S5       -∞      22.72   21.53   19.57   17.21   14.57   11.68   8.59    0.042   22.76   4     101
D8S137     -∞      14.45   15.10   14.27   12.77   10.87   8.70    6.36    0.089   15.14   8     89
D8S131     -∞      10.86   11.68   11.27   10.28   8.92    7.26    5.38    0.101   11.68   7     74

θ̂ = maximum likelihood estimate for the recombination fraction
Z = maximum lod score
k = effective number of recombinants
n = effective number of informative meioses

Table 4 con't.

                       Recombination fraction (θ)
Locus      0.00    0.05    0.10    0.15    0.20    0.25    0.30    0.35    θ̂       Z       k     n
D8S87      -∞      17.74   25.67   27.68   27.07   24.83   21.41   17.00   0.160   27.72   40    252
D8S135     -∞      -1.09   3.29    5.07    5.73    5.71    5.20    4.31    0.223   5.79    18    82
ANK1       -∞      -11.89  2.07    8.08    10.68   11.30   10.55   8.76    0.245   11.31   47    191
PLAT       -∞      -2.13   5.42    8.31    9.18    8.85    7.71    5.96    0.209   9.19    25    117
D8S136P2   -∞      96.58   88.06   78.93   69.23   58.91   47.94   36.31   0.003   103.00  1     353

θ̂ = maximum likelihood estimate for the recombination fraction
Z = maximum lod score
k = effective number of recombinants
n = effective number of informative meioses

Table 5. Two-point meiotic linkage analysis of selected 8p markers and D8S136P2

                       Recombination fraction (θ)
Locus      0.00    0.05    0.10    0.15    0.20    0.25    0.30    0.35    θ̂       Z       k     n
LPL        -∞      43.10   45.42   43.63   39.91   34.92   28.94   22.15   0.096   45.44   27    278
D8S133     -∞      61.45   61.36   57.60   52.06   45.80   37.49   28.81   0.072   62.13   24    329
D8S5       -∞      17.75   17.60   16.36   14.61   12.49   10.08   7.45    0.067   17.91   6     92
D8S137     -∞      14.79   15.98   15.41   14.03   12.15   9.90    7.38    0.101   15.98   10    101
D8S131     -∞      9.13    9.78    9.35    8.43    7.20    5.78    4.24    0.098   9.78    6     60

θ̂ = maximum likelihood estimate for the recombination fraction
Z = maximum lod score
k = effective number of recombinants
n = effective number of informative meioses

Table 5 con't.
Two-point meiotic linkage analysis of selected 8p markers and D8S136P2 Recombination fraction (0) Locus 0.00 0.05 0.10 0.15 0.20 0.25 0.30 0.35 0 Z k n D8S87 -oo 25.71 31.97 32.80 31.18 28.04 23.77 18.58 0.137 32.88 35 258 D8S135 -oo 0.12 4.01 5.57 6.11 6.02 5.51 4.62 0.217 6.14 18 83 A N K l -oo -5.97 5.77 10.53 12.30 12.29 11.06 8.90 0.224 12.47 40 178 PLAT -oo -7.11 2.64 6.61 8.11 8.18 7.30 5.75 0.227 8.29 28 121 0 = maximum likelihood estimate for the recombination fraction Z = maximum lod score k = effective number of recombinants n = effective number of informative méioses. Table 6. Supporting data for marker order Locus Odds against flipping adjacent markers LPL 2.46 X1043 D8S133 1.12 X1029 D8S136P2 245.47 D8S136P1 2.63 X 109 D8S5 1.78X103 D8S137 1.70 X 103 D8S131 3.39 X10» D8S87 1.15X105 D8S135 2.5 X103 A N K l Figure 11. Summary of chromosome 8p multipoint linkage analysis Published map (Tomforde et al., in press) 8pter -D8S201 -D8S7 2.3 31.7 - L P L -D8S133 -D8S5 -D8S137 —D8S131 -D8S87 —D8S135 - A N K l - P L A T 8cen 3.9 2.9 5.6 5.0 6.2 3.4 4.0 2.4 New map 8pter •D8S201 —D8S7 - L P L -D8S133 -P8S136 -D8S5 -D8S137 -D8S131 -D8S87 -D8S135 - A N K l - P L A T 8cen 2.3 32.0 3.6 4.2 5.7 3.8 4.4 6.3 2.9 2.5 2.1 Map distances in cM, calculated using the Kosambi mapping function, indicated to the right of each map. 3.6 Analysis of the observed genotype frequencies at D8S136P1 and D8S136P2 The observed genotype frequencies, at D8S136P1 and D8S136P2, in the C E P H panel, were compared to the distribution expected under H W E using standard analysis. The data set used for the analysis was the 123 C E P H panel individuals. The numbers of individuals with each genotype at each locus are presented in Appendix 1. The expected and observed genotype distributions were collapsed in a random fashion until all cell values in the resulting table were equal to or greater than five. The analysis was then performed on the collapsed table. 
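The comparison described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis actually run for the thesis: the function names and the genotype counts are hypothetical, the random pooling of cells with small expected values is left out, and no degrees of freedom or p value are computed.

```python
from collections import Counter
from itertools import combinations_with_replacement


def _key(a, b):
    """Store each unordered genotype under a single sorted key."""
    return tuple(sorted((a, b)))


def hwe_expected_counts(observed):
    """Return (observed, expected) genotype counts under Hardy-Weinberg equilibrium.

    observed maps (allele, allele) pairs to numbers of individuals.
    Allele frequencies are estimated by gene counting from the same sample.
    """
    obs = Counter()
    for (a, b), n in observed.items():
        obs[_key(a, b)] += n
    individuals = sum(obs.values())

    allele_counts = Counter()
    for (a, b), n in obs.items():
        allele_counts[a] += n  # a homozygote contributes two copies
        allele_counts[b] += n
    freq = {a: c / (2 * individuals) for a, c in allele_counts.items()}

    expected = {}
    for a, b in combinations_with_replacement(sorted(freq), 2):
        prob = freq[a] ** 2 if a == b else 2 * freq[a] * freq[b]
        expected[(a, b)] = prob * individuals
    return obs, expected


def chi_square(obs, expected):
    """Pearson chi-square statistic summed over all genotype classes."""
    return sum((obs.get(g, 0) - e) ** 2 / e for g, e in expected.items())


if __name__ == "__main__":
    # Hypothetical two-allele example, not CEPH data.
    counts = {("A1", "A1"): 30, ("A1", "A2"): 40, ("A2", "A2"): 30}
    obs, exp = hwe_expected_counts(counts)
    print(chi_square(obs, exp))
```

With many alleles, as at D8S136P1 and D8S136P2, most genotype classes have small expected counts, which is why cells are pooled before the statistic is interpreted.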
This procedure was repeated several times for each locus, using different random patterns of collapsing. In none of these instances did the analysis indicate a significant deviation from HWE for either D8S136P1 or D8S136P2 (representative values were χ² = 1.73, 7 df, p = 0.973 and χ² = 0.589, 5 df, p = 0.989 for D8S136P1 and D8S136P2, respectively).

3.7 Analysis of association of alleles at D8S136P1 and D8S136P2

Standard χ² analysis was used to test the null hypothesis of random association between alleles at D8S136P1 and D8S136P2 in haplotype contingency tables. The analysis was performed using version 2.5 of the program CONTING (Ott, 1991). The distribution of 246 independent marker haplotypes in the CEPH panel is shown in Table 7. This table was collapsed to a smaller 5 × 4 table by combining columns and rows until all cell values were greater than or equal to five (Table 8). Standard χ² analysis of the 5 × 4 table yielded highly significant evidence for rejection of the null hypothesis of random allelic association (χ² = 32.24, 12 df, p = 0.0013).

Table 7. Distribution of 246 D8S136 chromosome haplotypes in the CEPH panel

Alleles at               Alleles at D8S136P2
D8S136P1     A1    A2    A3    A4    A5    A6    A7    A8    A9
A1            1     0     0     3     1     0     0     0     0
A2            0     1     0     0     1     1     0     0     0
A3            0     1     2     3     6     2     0     0     0
A4            0     0     1     0     2     6     0     0     0
A5            0     1     7     7    21     8     1     0     0
A6            0     0     0    10     5     8     0     0     0
A7            0     0     0     1     1     5     0     0     0
A8            0     0     7    16    15    13     0     1     0
A9            1     0     5     4     2    24     0     0     1
A10           0     0     7     9    11    24     0     0     0

Table 8. Allelic association between D8S136P1 and D8S136P2, expected numbers in parentheses

Alleles at                          Alleles at D8S136P2
D8S136P1         A (A1,2,3)    B (A4)        C (A5)        D (A6,7,8,9)    Total
A (A1,2,3,4)      6 (4.28)      6 (6.68)     10 (8.19)      9 (11.85)        31
B (A5)            8 (6.22)      7 (9.70)     21 (11.89)     9 (17.20)        45
C (A6,7,9)        6 (9.26)     15 (14.43)     8 (17.70)    38 (25.60)        67
D (A8)            7 (7.19)     16 (11.20)    15 (13.73)    14 (19.87)        52
E (A10)           7 (7.05)      9 (10.99)    11 (13.48)    24 (19.49)        51
Total            34            53            65            94               246

χ² = 32.24; p value = 0.0013

Note: the alleles designated A-E and A-D under the headings "alleles at D8S136P1" and "alleles at D8S136P2" indicate combinations of alleles shown in Table 7. The alleles pooled to create the combination alleles are listed in parentheses. For example, the A allele at D8S136P2 in Table 8 was created by pooling the data for alleles A1, A2 and A3 shown in Table 7.

The twenty haplotypes observed five or more times were analysed individually for random allelic association. These haplotypes are listed in Table 9 under the heading "haplotype ab", where a and b are the alleles at D8S136P1 and D8S136P2, respectively. For each haplotype in Table 9, the 246 marker haplotypes were collapsed to a 2 × 2 contingency table by dividing the data into four categories designated ab, aB, Ab and AB. A represents any allele at D8S136P1 excluding the allele designated a. B represents any allele at D8S136P2 excluding the allele designated b. Standard χ² analysis, using Yates' correction for small cell values (Yates, 1934, 1984), was performed for each of the twenty haplotypes. Highly significant evidence for rejection of the null hypothesis of random allelic association was obtained for the haplotypes 55, 56, 64, and 96 (Table 9). D and D' values were calculated for each of these haplotypes and are listed in Table 9. Positive disequilibrium was observed for the marker haplotypes 55, 64, and 96.
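The 2 × 2 analysis described above can be sketched as follows. The function name is hypothetical and the code is only an illustration of the standard definitions (D as the difference between observed and expected haplotype frequency, D' as D divided by its maximum attainable magnitude given the allele frequencies, and a Yates-corrected χ² with one degree of freedom). It is not the CONTING program. The example counts are those listed for haplotypes 55 and 56 in Table 9.

```python
def ld_statistics(n_ab, n_aB, n_Ab, n_AB):
    """Return (D, D', Yates-corrected chi-square) for a 2 x 2 haplotype table.

    n_ab: haplotypes with allele a at locus 1 and allele b at locus 2
    n_aB: allele a at locus 1, any other allele at locus 2
    n_Ab: any other allele at locus 1, allele b at locus 2
    n_AB: neither allele a nor allele b
    """
    n = n_ab + n_aB + n_Ab + n_AB
    p_a = (n_ab + n_aB) / n
    p_b = (n_ab + n_Ab) / n

    # Disequilibrium parameter: observed minus expected haplotype frequency.
    d = n_ab / n - p_a * p_b

    # Largest magnitude D can reach for these allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0

    # Chi-square with Yates' continuity correction, 1 df.
    margins = (n_ab + n_aB) * (n_Ab + n_AB) * (n_ab + n_Ab) * (n_aB + n_AB)
    if margins == 0:
        raise ValueError("every row and column of the table must be non-empty")
    corrected = max(abs(n_ab * n_AB - n_aB * n_Ab) - n / 2, 0.0)
    chi2 = n * corrected ** 2 / margins
    return d, d_prime, chi2


if __name__ == "__main__":
    for label, counts in {"55": (21, 24, 44, 157), "56": (8, 37, 85, 116)}.items():
        d, d_prime, chi2 = ld_statistics(*counts)
        print(f"haplotype {label}: D = {d:.3f}, D' = {d_prime:.3f}, chi2 = {chi2:.2f}")
```

A positive D means the haplotype is observed more often than expected from the allele frequencies, and a negative D means it is observed less often. D' is reported here as a magnitude, so the sign is carried by D alone.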
The observed values of D were 27%, 28%, and 44% of the maximum possible value for positive disequilibrium, respectively. Negative disequilibrium was observed for the marker haplotype 56, and the observed value of D was 53% of the maximum possible value for negative disequilibrium.

Table 9. D8S136 chromosome haplotypes observed at least five times in the CEPH panel

               Number of observations out of 246
Haplotype ab    AB     Ab     aB     ab      χ²     p-value      D        D'
35             173     59      8      6     1.26    0.2610
46             150     87      3      6     2.16    0.1418
53             179     22     38      7     0.37    0.5411
54             155     46     38      7     0.78    0.3786
55             157     44     24     21    10.37    0.0013     0.037    0.274
56             116     85     37      8     8.38    0.0038    -0.037    0.530
64             180     43     13     10     5.86    0.0155     0.021    0.280
65             163     60     18      5     0.08    0.7743
66             138     85     15      8     0.01    0.9298
76             151     88      2      5     2.15    0.1427
83             172     22     45      7     0.03    0.8578
84             157     37     36     16     2.66    0.1027
85             144     50     37     15     0.07    0.7878
86             114     80     39     13     3.93    0.0474
93             184     24     33      5     0.00    0.9911
96             141     68     13     24    12.69    0.0004     0.041    0.439
103            172     22     45      7     0.03    0.8578
104            150     44     43      9     0.42    0.5177
105            140     54     41     11     0.63    0.4276
106            127     68     27     24     2.07    0.1502

Note:
Haplotype ab = allele "a" at D8S136P1 and allele "b" at D8S136P2
A = any allele at D8S136P1 excluding the allele designated "a"
B = any allele at D8S136P2 excluding the allele designated "b"
D = disequilibrium parameter
D' = disequilibrium parameter divided by the maximum possible disequilibrium value

Discussion and Conclusion

4.1 Clone isolation

Cosmid clones containing both NotI sites and (GT)n repeats were isolated in the interest of using (GT)n repeat polymorphisms to place isolated NotI sites on the chromosome 8 linkage map. In humans, (GT)n repeats occur every 30 kb, on average, in euchromatic DNA, but they are under-represented in centromeric heterochromatin (Stallings et al., 1991). The average insert size of the flow-sorted chromosome 8 cosmid library, LA08NC01, is 36.5 kb (Wood et al., 1992). If each cosmid insert contained one (GT)n repeat, almost every cosmid in the library should hybridize to poly(dC-dA)·(dG-dT). (GT)n repeats were detected in 53 of the 96 cosmids represented on the filter 8-1-140R. This result was not unexpected, as the distribution of (GT)n repeats among the cosmid inserts was expected to be random. This expectation was supported by the observation that c140B11 contained one (GT)n repeat and c140D4 contained three. Similar observations have been reported by Stallings et al. (1991), who observed that cosmid clones from a chromosome 16-specific cosmid library contained from 0 to 11 (GT)n repeats.
The distance between NotI sites in the human genome is estimated to be approximately 730 kb, and the average insert size of the cosmid clones in the LA08NC01 library is 36.5 kb. Therefore, approximately one in every twenty cosmid clones should contain a NotI site. As expected, two out of the 53 GT-positive cosmid clones contained a NotI site and one contained two NotI sites.

4.2 Polymorphism screening and linkage analysis

(GT)n repeats are potentially polymorphic markers that are easily analysed using the PCR, and large numbers of them are distributed throughout the human genome. For these reasons, (GT)n repeats are particularly useful for linkage analysis, and were therefore used to position isolated NotI sites on the chromosome 8 linkage map. Poly(dC-dA)·(dG-dT) was hybridized to library replica filter 8-1-140R under stringent conditions (65°C) in order to detect (GT)n repeats with roughly 12 or more repeats (Tautz and Renz, 1984). Hybridization was carried out under stringent conditions because the informativeness of (GT)n repeats with 10 or fewer repeats has been reported to be very low, if not zero, whereas for (GT)n repeats with 11 or more dinucleotide units, marker informativeness generally increases as the average number of units increases (Weber, 1990b).

A cosmid containing two NotI sites and three (GT)n repeats was isolated, and two polymorphic (GT)n repeats, D8S136P1 and D8S136P2, were subcloned and typed in the CEPH panel of families. D8S136P1 and D8S136P2 had uninterrupted GT dinucleotide runs of 15 and 20 and were highly informative, with PIC values of 0.82 and 0.69, respectively. These two repeats follow the general rules for informativeness proposed by Weber (1990b), which are outlined above and in section 1.4.1. Linkage mapping positioned D8S136 in an approximately 9.9 cM interval between markers D8S133 and D8S5 (Figure 11). A recombination event between D8S136P1 and D8S136P2 suggests that the order of these two markers is 8pter-D8S136P2-D8S136P1-8cen.
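For readers who wish to see how a PIC value follows from allele frequencies, the sketch below applies the polymorphism information content formula of Botstein et al. (1980), PIC = 1 - Σ p_i² - Σ_{i<j} 2 p_i² p_j². The function name is hypothetical. The allele counts are the row and column totals of Table 7 (246 haplotypes), used here on the assumption that they approximate the allele frequencies in the panel.

```python
def pic(allele_counts):
    """Polymorphism information content from a list of allele counts."""
    total = sum(allele_counts)
    if total <= 0:
        raise ValueError("at least one allele must have been observed")
    freqs = [c / total for c in allele_counts]

    # Probability that a random individual is homozygous.
    homozygosity = sum(p * p for p in freqs)

    # Correction for matings between identical heterozygotes, in which
    # half of the offspring are uninformative.
    correction = sum(
        2 * (p * q) ** 2
        for i, p in enumerate(freqs)
        for q in freqs[i + 1:]
    )
    return 1.0 - homozygosity - correction


# Marginal totals of Table 7: alleles A1-A10 at D8S136P1, A1-A9 at D8S136P2.
D8S136P1_COUNTS = [5, 3, 14, 9, 45, 23, 7, 52, 37, 51]
D8S136P2_COUNTS = [2, 3, 29, 53, 65, 91, 1, 1, 1]

if __name__ == "__main__":
    print(f"D8S136P1: PIC = {pic(D8S136P1_COUNTS):.2f}")
    print(f"D8S136P2: PIC = {pic(D8S136P2_COUNTS):.2f}")
```

A marker with a few common alleles of similar frequency scores higher than one dominated by a single allele, which is why D8S136P1 is the more informative of the two repeats.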
4.3 Distribution of alleles at D8S136P1 and D8S136P2

Several different mechanisms for the generation of germ-line variability in the number of GT dinucleotides at a given (GT)n repeat site have been proposed. One possible explanation, also invoked for the generation of new alleles at VNTR loci, is unequal exchange, either between homologous chromosomes or sister chromatids (Tautz and Renz, 1984; Jeffreys et al., 1985b; Kwiatkowski et al., 1992). Kwiatkowski et al. (1992) recently reported evidence which suggests that unequal exchange between chromosome homologs is not the major mechanism of mutation for (GT)n repeats. These researchers identified two spontaneous mutations, one in each of two individuals, for (GT)n repeat markers on chromosome 9q. The mutations did not appear to be the result of an unequal exchange between homologs, as phase analysis indicated the absence of crossover events on 9q in the two individuals. The possibility that the mutations were generated by unequal sister chromatid exchange or gene conversion, without exchange of flanking markers, could not be excluded. Similar observations have been reported for the generation of new alleles at VNTR loci (Wolff et al., 1988; Wolff et al., 1989).

The explanation most often invoked for the generation of new alleles at (GT)n loci is slipped strand mispairing within the repeat sequence during recombination, replication or repair (Streisinger and Owen, 1985; Levinson and Gutman, 1987a; Wolff et al., 1989; Coggins and O'Prey, 1989). The rate of strand slippage has been shown to increase with increasing lengths of blocks of repeats for (GT)n sequences in M13 (Levinson and Gutman, 1987b). The observed dependence of marker informativeness on the number of repeats (Weber, 1990b) is consistent with mutation of human (GT)n sequences via strand slippage.
Regardless of the mechanism, one would expect that mutations of a particular allele would randomly generate new alleles with both smaller and larger numbers of GT dinucleotides. The distribution of allele sizes at D8S136P1, shown in Figure 12, seems to be consistent with this description of mutation. The distribution of allele sizes at D8S136P2 (Figure 13), on the other hand, could be interpreted as being somewhat different. That is, if A6 ((GT)19) were the archetypal allele, it would seem that most mutation occurred towards larger allele sizes. The observed distribution of alleles at D8S136P2 could also be explained by the mutation of the archetypal allele to both larger and smaller alleles and subsequent selection against smaller alleles.

Figure 12. Distribution of alleles at D8S136P1 in the CEPH panel (horizontal axis: number of GT dinucleotides)

Figure 13. Distribution of alleles at D8S136P2 in the CEPH panel

4.4 Analysis of association of alleles at D8S136P1 and D8S136P2

(GT)n repeats provide a convenient source of highly informative single site polymorphisms. These markers, which generally have PIC values in the range of 0.4-0.8 (Weber, 1990b), could be used to generate highly informative haplotypes if multiple sites were simultaneously analysed. This method of increasing the informativeness of a locus is limited by linkage disequilibrium. The practical value of simultaneously typing two closely linked (GT)n polymorphisms was investigated in two steps. In the first step, each member of the CEPH panel of reference families was typed for both markers, which allowed determination of whether typing individuals for the second polymorphism increased the number of informative meioses. In the second step, allelic association between D8S136P1 and D8S136P2 was analysed to determine whether the markers were in linkage disequilibrium in the population defined by the CEPH panel. Significant disequilibrium was observed between D8S136P1 and D8S136P2.
However, the disequilibrium seemed to be minimal and did not appear to measurably decrease the amount of information gained by typing individuals for both markers.

The distribution of 246 independent marker haplotypes in the CEPH panel (Table 7) was collapsed to a smaller 5 × 4 table by combining columns and rows until all cell values were greater than or equal to five (Table 8). Standard χ² analysis for random allelic association in the 5 × 4 contingency table yielded highly significant evidence in favor of disequilibrium between D8S136P1 and D8S136P2 (χ² = 33.82, 12 df, p = 0.0007). Twenty-one different methods of collapsing the original data were tried to determine whether various patterns of combining columns and rows differed in their ability to detect disequilibrium. Two patterns resulted in a failure to detect linkage disequilibrium. This is most likely because those patterns pooled observations of positive and negative disequilibrium, such that they canceled each other.

The allelic association between D8S136P1 and D8S136P2 was further investigated by individually analysing the twenty haplotypes observed five or more times. These haplotypes were tested for random allelic association by standard χ² analyses in 2 × 2 contingency tables. Significant evidence in favor of disequilibrium was obtained for only four of the twenty marker haplotypes tested: 55, 56, 64, and 96 (Table 9). Furthermore, the strength of disequilibrium at these four haplotypes was at most 53% of the maximum possible value of negative disequilibrium, for haplotype 56, and 44% of the maximum possible value of positive disequilibrium, for haplotype 96.
Finally, typing individuals in the CEPH panel for markers D8S136P1 and D8S136P2 reduced the number of uninformative meioses from 18% for the most informative marker to 5% for the haplotype, in accordance with the equation $(1 - \mathrm{PIC}_A)(1 - \mathrm{PIC}_B) = 1 - \mathrm{PIC}_{AB}$. Although disequilibrium was observed between D8S136P1 and D8S136P2, it does not appear to measurably decrease the amount of information gained by typing individuals for both markers. This indicates that the simultaneous analysis of two closely linked (GT)n repeat polymorphisms can be used to increase considerably the informativeness of a locus.

However, this is not the case for all closely linked (GT)n repeat markers. Shortly before the completion of this thesis, Sherrington et al. (1991) published the results of an analysis of the association of alleles at an adjacent pair of (GT)n markers, isolated from D5S76 and separated by approximately 7 kb. Standard χ² analysis for random allelic association was performed in a 4 × 3 contingency table constructed for 86 haplotypes, and significant disequilibrium was detected between the two markers (χ² = 67, 9 df, p < 0.001). Six haplotypes were observed five or more times. These haplotypes were tested for random allelic association by standard χ² analysis in 2 × 2 contingency tables, and significant disequilibrium was detected for all but one haplotype. Typing individuals for both markers reduced the number of uninformative meioses from 26% for the most informative marker to 16% for the haplotype, only one half of the reduction expected if the two markers were in linkage equilibrium. The degree of disequilibrium observed between D8S136P1 and D8S136P2 was considerably lower than the degree of disequilibrium observed between the two (GT)n repeats studied by Sherrington et al. (1991). Consequently, typing individuals for D8S136P1 and D8S136P2 yielded a greater increase in information than typing individuals for the Sherrington et al. (1991) repeats.
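The equilibrium expectation used in this comparison, (1 - PIC_A)(1 - PIC_B) = 1 - PIC_AB, can be written out as a short calculation. The function names are hypothetical, and the relation holds only on the assumption that the two markers are in linkage equilibrium; the values passed in are the PIC values reported for D8S136P1 and D8S136P2.

```python
def expected_uninformative_fraction(pic_a, pic_b):
    """Fraction of meioses expected to be uninformative for a two-marker
    haplotype, assuming the markers are in linkage equilibrium."""
    for value in (pic_a, pic_b):
        if not 0.0 <= value <= 1.0:
            raise ValueError("PIC values must lie between 0 and 1")
    return (1.0 - pic_a) * (1.0 - pic_b)


def expected_haplotype_pic(pic_a, pic_b):
    """Haplotype PIC implied by the same equilibrium assumption."""
    return 1.0 - expected_uninformative_fraction(pic_a, pic_b)


if __name__ == "__main__":
    fraction = expected_uninformative_fraction(0.82, 0.69)
    print(f"expected uninformative fraction: {fraction:.3f}")
    print(f"expected haplotype PIC: {expected_haplotype_pic(0.82, 0.69):.3f}")
```

Disequilibrium makes the second marker partly redundant with the first, so the observed uninformative fraction for a pair of markers in strong disequilibrium stays above this expectation.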
This suggests that the amount of information gained by typing individuals for two repeats in disequilibrium is dependent, in part, on the degree of disequilibrium between the repeats. The results presented in this thesis suggest that in some cases the informativeness of a locus could potentially be increased by typing individuals for closely linked (GT)n repeat polymorphisms.

Several different factors can result in the observation of disequilibrium between two marker loci: clustered sampling, natural selection in favor of certain heterozygous genotypes, random genetic drift to higher or lower frequencies for some chromosomes, population admixture, or tight linkage. Although it is difficult to determine conclusively why disequilibrium was observed between D8S136P1 and D8S136P2, some causes are more likely than others.

Clustered sampling is not a likely explanation for the observation of disequilibrium between D8S136P1 and D8S136P2. Clustered sampling is a statistical artifact which can cause markers to appear to be in disequilibrium. For investigations of two-locus allelic association in contingency tables, data must be collected from individuals who are unrelated except by marriage, as using blood relatives violates the assumption of independent observations and tends to inflate chi-squared (Cohen, 1976). Haplotype data for D8S136P1 and D8S136P2 were obtained from unrelated individuals (section 2.2.13.2) to eliminate the possibility of clustered sampling.

It is possible that the disequilibrium between D8S136P1 and D8S136P2 is a statistical artifact of population admixture. When two Hardy-Weinberg populations differing in two-locus gametic frequencies are combined, the resulting population is likely to manifest linkage disequilibrium (Nei and Li, 1973).
The model population used for the analysis of allelic association between D8S136P1 and D8S136P2 was the CEPH panel, which is composed of families with diverse geographic origins: 27 Utah Mormon families, which represent Northern Europe, 10 French families, 2 Venezuelan families and 1 Pennsylvanian Old Order Amish family (Dausset, 1990). The CEPH panel is, therefore, a mixture of several populations. The Venezuelan and Pennsylvanian families make up a small fraction of the model population; therefore, their contribution to the effect of population admixture is most likely negligible. However, if the two-locus gametic frequencies were significantly different in the French and Utah families, disequilibrium between the two loci could be observed in the model population. Highly significant disequilibrium was observed for four haplotypes, 55, 56, 64 and 96, in the CEPH panel (Table 9). When D8S136P1 and D8S136P2 were analysed for allelic association in the Utah families, a data set composed of 182 independent haplotypes, highly significant disequilibrium was observed for haplotypes 55 and 96 (χ² = 7.95, 1 df, p = 0.0048 and χ² = 8.93, 1 df, p = 0.0028, respectively). There are two possible explanations for the absence of highly significant disequilibrium for haplotypes 56 and 64 in the Utah data set. Either the sample size was too small to allow detection of disequilibrium for haplotypes 56 and 64, or the disequilibrium observed for these haplotypes in the CEPH panel is the result of population admixture. Since highly significant disequilibrium was observed for haplotypes 56 and 64 when the observations for the Utah data set were artificially doubled (χ² = 6.42, 1 df, p = 0.0113 and χ² = 8.76, 1 df, p = 0.0031, respectively), the most likely explanation is that the small size of the data set decreased the power of the test for disequilibrium.
The results of the analysis of D8S136P1 and D8S136P2 for allelic association in the Utah data set suggest that population admixture is not the most likely explanation for the observed disequilibrium between these two markers.

The most likely explanation for the observed disequilibrium between D8S136P1 and D8S136P2 is that it was caused by a mutation to a new allele in the founding population and has been maintained by tight linkage. An existing disequilibrium (D) between two loci is reduced by a factor of 1 - r in each generation, where r is the recombination fraction between the two loci (Hartl and Clark, 1989). The decay of linkage disequilibrium per generation is generally expressed as $D_1 = (1 - r)D_0$ (Hartl and Clark, 1989), so that after t generations $D_t = (1 - r)^t D_0$. Decay is most rapid when two loci are unlinked, that is, when r = 0.5. For tightly linked loci, linkage disequilibrium can take many generations to decay. Linked loci tend to show allelic association when the recombination fraction between them is less than 0.02 (Ott, 1991).

An illustration of the effect recombination would have on the decay of disequilibrium between D8S136P1 and D8S136P2 follows. The normalized disequilibrium parameter (D') for the haplotype defined by A5 at the D8S136P1 locus and A5 at the D8S136P2 locus was estimated to be 0.278 (Table 9). Two-point linkage analysis yielded 0.003 as a maximum likelihood estimate for the recombination fraction between D8S136P1 and D8S136P2, with a corresponding lod score of 103. If the contribution of mutation to the decay of disequilibrium is ignored and r = 0.003, the observed disequilibrium of 0.278 would decay to 0.01, a value for D' which is unlikely to be detected in a sample of the size studied, in approximately 1,100 generations. In contrast, if the markers were unlinked (r = 0.5), the observed disequilibrium would decay to 0.01 in approximately 5 generations.
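The decay calculation above can be reproduced with a few lines of code. This is a minimal sketch with hypothetical function names. It assumes the geometric decay D_t = (1 - r)^t D_0 described by Hartl and Clark (1989), constant allele frequencies (so that D' decays at the same rate as D), and no contribution from mutation, selection or drift.

```python
import math


def disequilibrium_after(d0, r, generations):
    """Disequilibrium remaining after a number of generations of recombination."""
    return d0 * (1.0 - r) ** generations


def generations_to_decay(d0, d_target, r):
    """Generations needed for disequilibrium d0 to decay to d_target.

    Solves d_target = d0 * (1 - r)**t for t.
    """
    if not 0.0 < r <= 0.5:
        raise ValueError("r must be greater than 0 and at most 0.5")
    if not 0.0 < d_target <= d0:
        raise ValueError("d_target must be positive and no larger than d0")
    return math.log(d_target / d0) / math.log(1.0 - r)


if __name__ == "__main__":
    for r in (0.003, 0.5):
        t = generations_to_decay(d0=0.278, d_target=0.01, r=r)
        print(f"r = {r}: about {t:.0f} generations")
```

Because the number of generations scales roughly as 1/r for small r, disequilibrium between markers a fraction of a centimorgan apart can persist for a very long time once it has arisen.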
Allele 5 (A5) at D8S136P1 is in positive disequilibrium with A5 at D8S136P2 and is also in negative disequilibrium with A6 at D8S136P2. The observation of positive disequilibrium could be explained by the generation of A5 at D8S136P1 by the mutation of an allele at the D8S136P1 locus on a chromosome with A5 at the D8S136P2 locus. It follows that A5 at D8S136P1 would initially be in negative disequilibrium with all other alleles at the D8S136P2 locus. It is possible that 53% of the maximum possible negative disequilibrium value exists between A5 and A6 (Table 9) because A6 is the most common allele at the D8S136P2 locus and not enough time has elapsed for the disequilibrium to decay completely. The observed disequilibrium between D8S136P1 and D8S136P2 is most likely generated by new mutations and maintained by close linkage.

4.5 Conclusions

I have characterized the anonymous DNA locus D8S136, which maps to interval 8p21→cen on the physical map of chromosome 8 and between flanking markers D8S133 and D8S5 on the linkage map. The informativeness of the locus was significantly increased by simultaneously analyzing two closely linked (GT)n repeat polymorphisms, despite the fact that significant disequilibrium was observed. The PIC value for the haplotype is 0.95. This value is comparable to PIC values reported for VNTR loci (Nakamura et al., 1987) and is greater than PIC values reported to date for other polymorphic chromosome 8 loci (Williamson et al., 1990). D8S136 is a highly informative locus which will be useful for mapping disease traits located in the vicinity of the marker, either by classic segregation analysis or by strategies used to map complex traits, such as the analysis of affected relative pairs. Knowledge of the order of D8S136P1 and D8S136P2 would allow the initiation of a chromosome walk in the direction of a disease gene exhibiting tight linkage to D8S136 and to markers telomeric or centromeric to D8S136.
The three STSs generated at this locus will facilitate physical mapping of D8S136 with respect to other chromosome 8 loci and will be useful for the isolation of neighboring NotI sites. In addition, since D8S136 has been mapped both physically and genetically, it facilitates the integration of the physical and linkage maps of chromosome 8.

4.6 Proposals for future research

Below is a list of experiments which could be carried out to further investigate the work in this thesis.

1. Hybridize the PCR product from the STS for the subcloned NotI site to a chromosome 8 NotI jumping library to isolate adjacent NotI sites.
2. Type populations of various ethnic origins for D8S136P1 and D8S136P2 and compare allele frequencies and results of analyses of allelic association to those obtained for the CEPH panel.
3. Subclone the third (GT)n repeat and analyse the allelic association between the three (GT)n repeats at D8S136.

4.7 Summary

1. A clone containing three (GT)n repeats and two NotI sites was isolated by hybridization of poly(dC-dA)·(dG-dT) to a flow-sorted chromosome 8 cosmid library and subsequent digestion of positive clone DNA with EcoRI, NotI, and EcoRI/NotI.
2. The clone was physically localized using somatic cell hybrids to chromosome 8p, interval 8p21→cen, and assigned the locus symbol D8S136.
3. Three STSs were generated and positioned on a restriction map of the cosmid clone.
4. A highly informative haplotype (PIC = 0.95) was generated at D8S136 by the simultaneous analysis of two adjacent (GT)n polymorphisms designated D8S136P1 and D8S136P2, respectively.
5. D8S136 was positioned on the current chromosome 8 linkage map in an approximately 9.9 cM region between flanking markers D8S133 and D8S5. In addition, a recombination event was observed between D8S136P1 and D8S136P2, which suggests that the order of these two markers is 8pter-D8S136P2-D8S136P1-8cen.
6. Allelic association between D8S136P1 and D8S136P2 was analysed and significant, but minimal, disequilibrium was observed. The observed disequilibrium did not appear to measurably decrease the information obtained by typing individuals for both polymorphisms.

References

Bergerheim, U. S., Kunimi, K., Collins, V. P., and Ekman, P. (1991). Deletion mapping of chromosomes 8, 10 and 16 in human prostatic carcinoma. Genes Chromosomes Cancer 3: 215-220.
Bernard, L. E. (1992). Isolation and mapping of clones from human chromosome 5. Ph.D. Thesis, University of British Columbia, Vancouver, Canada.
Bird, A. P. (1980). DNA methylation and the frequency of CpG in animal DNA. Nucleic Acids Res. 8: 1499-1504.
Bird, A. P. (1986). CpG-rich islands and the function of DNA methylation. Nature 321: 209-213.
Bird, A. P., Taggart, M., Frommer, M., Miller, O. J., and Macleod, D. (1985). A fraction of the mouse genome that is derived from islands of nonmethylated, CpG-rich DNA. Cell 40: 91-99.
Birnboim, H. C., and Doly, J. (1979). A rapid alkaline extraction procedure for screening recombinant plasmid DNA. Nucleic Acids Res. 7: 1513-1523.
Bishop, T. D., and Williamson, J. A. (1990). The power of identity-by-state methods for linkage analysis. Am. J. Hum. Genet. 46: 254-265.
Blanton, S. H., Heckenlively, J. R., Cottingham, A. W., Friedman, J., Sadler, L. A., Wagner, M., Friedman, L. H., and Daiger, S. P. (1991). Linkage mapping of autosomal dominant retinitis pigmentosa (RP1) to the pericentric region of human chromosome 8. Genomics 11: 857-869.
Botstein, D., White, R. L., Skolnick, M., and Davis, R. W. (1980). Construction of a genetic linkage map in man using restriction fragment length polymorphisms. Am. J. Hum. Genet. 32: 314-331.
Brown, W. R. A., and Bird, A. P. (1986). Long-range restriction site mapping of mammalian genomic DNA. Nature 322: 477-481.
Buhler, E. M., Buhler, U. K., Stadler, G. R., Jani, L., and Jurik, L. P. (1980).
Chromosome deletion and multiple cartilaginous exostoses. Eur. J. Pediatr. 133: 163-166.
Cabot, E. L., and Beckenbach, A. T. (1989). Simultaneous editing of multiple nucleic acids and protein sequences with ESEE. Comput. Applic. Biosci. 5: 233-234.
Cohen, J. E. (1976). The distribution of the chi-squared statistic under clustered sampling from contingency tables. J. Am. Statist. Assoc. 71: 665-70.
Coggins, L. W., and O'Prey, M. (1989). DNA tertiary structures formed in vitro by misaligned hybridization of multiple tandem repeat sequences. Nucleic Acids Res. 18: 7417-7427.
Cooper, D. N., Taggart, M. H., and Bird, A. P. (1983). Unmethylated domains in vertebrate DNA. Nucleic Acids Res. 11: 647-658.
Coulondre, C., Miller, J. H., Farabaugh, P. J., and Gilbert, W. (1978). Molecular basis of base substitution hotspots in Escherichia coli. Nature 274: 775-780.
Dalla-Favera, R., Bregni, M., Erikson, J., Patterson, D., Gallo, R. C., and Croce, C. M. (1982). Human c-myc onc gene is located on the region of chromosome 8 that is translocated in Burkitt lymphoma cells. Proc. Natl. Acad. Sci. USA 79: 7824-7827.
Dausset, J., Cann, H., Cohen, D., Lathrop, M., Lalouel, J. M., and White, R. (1990). Centre d'Etude du Polymorphisme Humain (CEPH): Collaborative genetic mapping of the human genome. Genomics 6: 575-77.
Donis-Keller, H., Green, P., Helms, C., Cartinhour, S., Weiffenbach, B., Stephens, K., Keith, T. P., Bowden, D. W., Smith, D. R., Lander, E. S., Botstein, D., Akots, G., Rediker, K. S., Gravius, T., Brown, V. A., Rising, M. B., Parker, C., Powers, J. A., Watt, D. E., Kauffman, E. R., Bricker, A., Phipps, P., Muller-Kahle, H., Fulton, T. R., Ng, S., Schumm, J. W., Braman, J. C., Knowlton, R. G., Barker, D. F., Crooks, S. M., Lincoln, S. E., Daly, M. J., and Abrahamson, J. (1987). A genetic linkage map of the human genome. Cell 51: 319-337.
Donis-Keller, H., and Buckle, V. (1990). Report of the committee on the genetic constitution of chromosome 8.
Cytogenet. Cell Genet. 55: 128-135.
Donis-Keller, H., and Buckle, V. (1991). Report of the committee on the genetic constitution of chromosome 8. Cytogenet. Cell Genet. 58: 382-402.
Drabkin, H. A., Bradley, C., Hart, I., Bleskan, J., Li, F. P., and Patterson, D. (1985a). Translocation of c-myc in the hereditary renal cell carcinoma associated with a t(3;8)(p14.2;q24.13) chromosomal translocation. Proc. Natl. Acad. Sci. USA 82: 6980-6984.
Drabkin, H. A., Diaz, M., Bradley, C. M., Le Beau, M. M., Rowley, J. D., and Patterson, D. (1985b). Isolation and analysis of the 21q+ chromosome in the acute myelogenous leukemia 8;21 translocation: evidence that c-mos is not translocated. Proc. Natl. Acad. Sci. USA 82: 464-468.
Edwards, J. H. (1976). The interpretation of lods in linkage analysis. Cytogenet. Cell Genet. 16: 289-293.
Edwards, J. H. (1980). Allelic association in man. In Population structure and genetic disorders. New York, New York: Academic Press.
Efstratiadis, A., Posakony, J. W., Maniatis, T., Lawn, R. M., O'Connell, C., Spritz, A. R., DeRiel, J. K., Forget, B. G., Weissman, S. M., Slightom, J. L., Blechl, A. E., Smithies, O., Baralle, F. E., Shoulders, C. C., and Proudfoot, N. J. (1980). The structure and evolution of the human β-globin gene family. Cell 21: 653-668.
Feinberg, A. P., and Vogelstein, B. (1984a). A technique for radiolabeling DNA restriction endonuclease fragments to high specific activity. Anal. Biochem. 132: 6-13.
Feinberg, A. P., and Vogelstein, B. (1984b). Addendum: A technique for radiolabeling DNA restriction endonuclease fragments to high specific activity. Anal. Biochem. 137: 266-267.
Ferrell, R. E., Hitner, H. M., and Antoszyk, J. H. (1983). Linkage of atypical vitelliform macular dystrophy (VMD-1) to the soluble glutamate pyruvate transaminase (GPT1) locus. Am. J. Hum. Genet. 35: 78-84.
Floyd-Smith, G., Martinville, B. D., and Francke, U. (1986).
An expressed β-tubulin gene, TUBB, is located on the short arm of human chromosome 6 and two related sequences are dispersed on chromosomes 8 and 13. Exp. Cell. Res. 163: 539-548.

Francke, U. (1984). Random X inactivation resulting in mosaic nullisomy of region Xp21.1-p21.3 associated with heterozygosity for ornithine transcarbamylase deficiency and for chronic granulomatous disease. Cytogenet. Cell Genet. 38: 298-307.

Fries, R., Eggen, A., and Stranzinger, G. (1990). The bovine genome contains polymorphic microsatellites. Genomics 8: 403-406.

Gardiner-Garden, M., and Frommer, M. (1987). CpG islands in vertebrate genomes. J. Mol. Biol. 196: 261-282.

Goto, M., Rubenstein, M., Weber, J., Woods, K., and Drayna, D. (1992). Genetic linkage of the Werner's syndrome gene to five markers on chromosome 8. Nature 355: 735-738.

Gross, D. S., and Garrard, W. T. (1986). The ubiquitous potential Z-forming sequence of eukaryotes, (dT-dG)n·(dC-dA)n, is not detectable in the genomes of eubacteria, archaebacteria, or mitochondria. Mol. Cell. Biol. 6: 3010-3013.

Gross, D. S., Huang, S. Y., and Garrard, W. T. (1985). Chromatin structure of the potential Z-forming sequence (dT-dG)n·(dC-dA)n, evidence for an "alternating-B" conformation. J. Mol. Biol. 183: 251-265.

Grunstein, M., and Hogness, D. S. (1975). Colony hybridization: a method for the isolation of cloned DNAs that contain a specific gene. Proc. Natl. Acad. Sci. USA 72: 3961-3965.

Haldane, J. B. S. (1919). The combination of linkage values, and the calculation of distances between the loci of linked factors. J. Genet. 8: 299-309.

Hamada, H., and Kakunaga, T. (1982). Potential Z-DNA forming sequences are highly dispersed in the human genome. Nature 298: 396-398.

Hamada, H., Petrino, M. G., and Kakunaga, T. (1982a). Molecular structure and evolutionary origin of human cardiac muscle actin gene. Proc. Natl. Acad. Sci. USA 79: 5901-5905.

Hamada, H., Petrino, M. G., and Kakunaga, T. (1982b).
A novel repeated element with Z-DNA forming potential is widely found in evolutionary diverse eukaryotic genomes. Proc. Natl. Acad. Sci. USA 79: 6465-6469.

Hamada, H., Petrino, M. G., Kakunaga, T., Seidman, M., and Stollar, B. D. (1984a). Characterization of genomic Poly(dT-dG)n·(dC-dA)n sequences; structure, organization and conformation. Mol. Cell. Biol. 4: 2610-2621.

Hamada, H., Seidman, M., Howard, B. H., and Gorman, C. M. (1984b). Enhanced gene expression by the Poly(dT-dG)n·(dC-dA)n sequence. Mol. Cell. Biol. 4: 2622-2630.

Hanahan, D. (1983). Studies on transformation of Escherichia coli with plasmids. J. Mol. Biol. 166: 557-580.

Haniford, D. B., and Pulleyblank, D. E. (1983). Facile transition of poly [d(TG)-d(CA)] into a left-handed helix in physiological conditions. Nature 302: 632-634.

Hardy, G. H. (1908). Mendelian proportions in a mixed population. Science 28: 41-50.

Hartl, D. L., and Clark, A. G. (1989). Principles of population genetics, 2nd ed. Sunderland, Mass.: Sinauer Associates Inc.

Hentschel, C. C. (1981). Homocopolymer sequences in the spacer of a sea urchin histone gene repeat are sensitive to S1 nuclease. Nature 295: 714-716.

Ish-Horowicz, D., and Burke, J. F. (1981). Rapid and efficient cosmid cloning. Nucl. Acids Res. 9: 2989-2998.

Jeffreys, A. J., Wilson, V., and Thein, S. L. (1985a). Hypervariable 'minisatellite' regions in human DNA. Nature 314: 67-73.

Jeffreys, A. J., Royle, N. J., Wilson, V., and Wong, Z. (1985b). Spontaneous mutation rates to new length alleles at tandem-repetitive hypervariable loci in human DNA. Nature 332: 278-281.

Jeffreys, A. J., Wilson, V., Neumann, R., and Keyte, J. (1988). Amplification of human minisatellites by the polymerase chain reaction: towards DNA fingerprinting of single cells. Nucleic Acids Res. 16: 10953-10971.

Jones, C., Patterson, D., and Kao, F. T. (1981). Assignment of the gene coding for phosphoribosylglycineamide formyl-transferase to human chromosome 14.
Somatic Cell Genet. 7: 399-409.

Julier, C., Hyer, R. N., Davies, J., Merlin, F., Soularue, P., Briant, L., Cathelineau, G., Deschamps, I., Rotter, J. I., Froguel, P., Boitard, C., Bell, J. I., and Lathrop, G. M. (1991). Insulin-IGF2 region on chromosome 11p encodes a gene implicated in HLA-DR4-dependent diabetes susceptibility. Nature 354: 155-159.

Kwiatkowski, D. J., Henske, E. P., Weimer, K., Ozelius, L., Gusella, J. F., and Haines, J. (1992). Construction of a GT polymorphism map of human 9q. Genomics 12: 229-240.

Kosambi, D. D. (1944). The estimation of map distances from recombination values. Ann. Eugen. 12: 172-175.

Lander, E. S., and Green, P. (1987). Construction of multilocus genetic linkage maps in humans. Proc. Natl. Acad. Sci. USA 84: 2363-2367.

Lathrop, G. M., Lalouel, J. M., Julier, C., and Ott, J. (1984). Strategies for multilocus linkage analysis in humans. Proc. Natl. Acad. Sci. USA 81: 3443-3446.

Lathrop, G. M., Lalouel, J. M., Julier, C., and Ott, J. (1985). Multilocus linkage analysis in humans: detection of linkage and estimation of recombination. Am. J. Hum. Genet. 37: 482-498.

Lathrop, G. M., and Lalouel, J. M. (1988). Efficient computations in multilocus linkage analysis. Am. J. Hum. Genet. 42: 498-505.

Levinson, G., and Gutman, G. A. (1987a). Slipped-strand mispairing: a major mechanism for DNA sequence evolution. Mol. Biol. Evol. 4: 203-221.

Levinson, G., and Gutman, G. A. (1987b). High frequencies of short frameshifts in poly-CA/TG tandem repeats borne by bacteriophage M13 in Escherichia coli K-12. Nucleic Acids Res. 15: 5323-5338.

Lindsay, S., and Bird, A. P. (1987). Use of restriction enzymes to detect potential gene sequences in mammalian DNA. Nature 327: 336-338.

Litt, M., and Luty, J. A. (1989). A hypervariable microsatellite revealed by in vitro amplification of a dinucleotide repeat within the cardiac muscle actin gene. Am. J. Hum. Genet. 44: 397-401.

Litt, M., and White, R. L. (1985).
A highly polymorphic locus in human DNA revealed by cosmid-derived probes. Proc. Natl. Acad. Sci. USA 82: 6206-6210.

Love, J. M., Knight, A. M., McAleer, M. A., and Todd, J. A. (1990). Towards construction of a high resolution map of the mouse genome using PCR-analysed microsatellites. Nucleic Acids Res. 18: 4123-4130.

McKusick, V. A., and Ruddle, F. H. (1977). The status of the gene map of the human chromosome. Science 196: 390-405.

Meisfield, R., Krystal, M., and Arnheim, M. (1981). A member of a new repeated sequence family which is conserved throughout eukaryotic evolution is found between the human δ and β globin genes. Nucleic Acids Res. 9: 5931-5947.

Morton, N. E. (1955). Sequential tests for the detection of linkage. Am. J. Hum. Genet. 7: 277-318.

Mullis, K., Faloona, F., Scharf, F., Saiki, R., Horn, G., and Erlich, H. (1986). Specific enzymatic amplification of DNA in vitro: the polymerase chain reaction. Cold Spring Harbor Symposia on Quantitative Biology 51: 263-273.

Mullis, K. B., and Faloona, F. A. (1987). Specific synthesis of DNA in vitro via a polymerase-catalyzed chain reaction. Methods in Enzymology 155: 335-350.

Nakamura, Y., Leppert, M., O'Connell, P., Wolff, R., Holm, T., Culver, M., Martin, C., Fujimoto, E., Hoff, M., Kumlin, E., and White, R. (1987). Variable number of tandem repeat (VNTR) markers for human gene mapping. Science 235: 1616-1622.

Naylor, L. H., and Clark, E. M. (1990). d(TG)n·d(CA)n sequences upstream of the rat prolactin gene form Z-DNA and inhibit gene transcription. Nucleic Acids Res. 18: 1595-1601.

Neel, B. G., Jhanwar, S. C., Chaganti, R. S. K., and Hayward, W. S. (1982). Two human c-onc genes are located on the long arm of chromosome 8. Proc. Natl. Acad. Sci. USA 79: 7842-6.

Nei, M., and Li, W. H. (1973). Linkage disequilibrium in subdivided populations. Genetics 75: 213-219.

Nordheim, A., and Rich, A. (1983).
The sequence (dC-dA)n·(dG-dT)n forms left handed Z-DNA in negatively supercoiled plasmids. Proc. Natl. Acad. Sci. USA 80: 1821-1825.

Olson, M., Hood, L., Cantor, C., and Botstein, D. (1989). A common language for physical mapping of the human genome. Science 245: 1434-1435.

Ott, J. (1985). A chi-square test to distinguish allelic association from other causes of phenotypic association between two loci. Genetic Epidemiology 2: 79-84.

Ott, J. (1990). Invited editorial: cutting a gordian knot in the linkage analysis of complex human traits. Am. J. Hum. Genet. 46: 219-221.

Ott, J. (1991). Analysis of human genetic linkage, revised ed. Baltimore and London: Johns Hopkins University Press.

Pardue, M. L., Lowenhaupt, K., Rich, A., and Nordheim, A. (1987). (dC-dA)n·(dG-dT)n sequences have evolutionarily conserved chromosomal locations in Drosophila with implications for roles in chromosome structure and function. EMBO J. 6: 1781-1789.

Puck, T. T., and Kao, F. T. (1967). Genetics of somatic mammalian cells. V. Treatment with 5-bromodeoxyuridine and visible light for isolation of nutritionally deficient mutants. Proc. Natl. Acad. Sci. USA 58: 1227-1234.

Rigby, P. W. J., Dieckmann, M., Rhodes, C., and Berg, P. (1977). Labeling deoxyribonucleic acid to high specific activity in vitro by nick translation with DNA polymerase I. J. Mol. Biol. 113: 237-251.

Risch, N. (1990). Linkage strategies for genetically complex traits. III. The effect of marker polymorphism on analysis of affected relative pairs. Am. J. Hum. Genet. 46: 242-253.

Rychlik, W., and Rhoads, R. E. (1989). A computer program for choosing optimal oligonucleotides for filter hybridization, sequencing and in vitro amplification of DNA. Nucl. Acids Res. 17: 8543-8551.

Sacchi, N., Watson, D. K., Geurts van Kessel, A. H. M., Hagemeijer, A., Kersey, J., Drabkin, H., Patterson, D., and Pappas, T. S. (1986).
Hu-ets-1 and Hu-ets-2 genes are transposed in acute leukemias with (4;11) and (8;21) translocations. Science 231: 379-382.

Saiki, R. K., Scharf, S., Faloona, F., Mullis, K. B., Horn, G. T., Erlich, H. A., and Arnheim, N. (1985). Enzymatic amplification of β-globin genomic sequences and restriction site analysis for diagnosis of sickle cell anemia. Science 230: 1350-1354.

Saiki, R. K., Gelfand, D. H., Stoffel, S., Scharf, S. J., Higuchi, R., Horn, G. T., Mullis, K. B., and Ehrlich, H. A. (1988). Primer-directed enzymatic amplification of DNA with a thermostable DNA polymerase. Science 239: 487-491.

Sambrook, J., Fritsch, F. F., and Maniatis, T. (1989). Molecular cloning: A laboratory manual, 2nd ed. Cold Spring Harbor, New York: Cold Spring Harbor Laboratory Press.

Samuelson, L. C., Wiebauer, K., Snow, C. M., and Meisler, M. H. (1990). Retroviral and pseudogene insertion sites reveal the lineage of human salivary and pancreatic amylase genes from a single gene during primate evolution. Mol. Cell. Biol. 10: 2513-2520.

Sanger, F., Nicklen, S., and Coulson, A. R. (1977). DNA sequencing with chain-terminating inhibitors. Proc. Natl. Acad. Sci. USA 74: 5463-5467.

Schwartz, M. N., Trautner, T. A., and Kornberg, A. J. (1962). Enzymatic synthesis of deoxyribonucleic acid. II. Further studies on nearest neighbor base sequences in deoxyribonucleic acids. J. Biol. Chem. 237: 1961-1967.

Shen, L. P., and Rutter, W. J. (1984). Sequence of the human somatostatin I gene. Science 224: 168-170.

Sherman, S. L. (1991). Combining genetic and physical maps. Cytogenet. Cell Genet. 58: 1842-1843.

Sherrington, R., Melmer, G., Dixon, M., Curtis, D., Mankoo, B., Kalsi, G., and Gurling, H. (1991). Linkage disequilibrium between two highly polymorphic microsatellites. Am. J. Hum. Genet. 49: 966-971.

Siciliano, M. J., Carrano, A. V., and Thompson, L. H. (1986). Assignment of a human repair gene associated with sister-chromatid exchange to chromosome 19. Mutat. Res.
174: 303-308.

Slightom, J. L., Blechl, A. E., and Smithies, O. (1980). Human fetal Gγ- and Aγ-globin genes: complete nucleotide sequences suggest that DNA can be exchanged between these duplicated genes. Cell 21: 627-638.

Southern, E. M. (1975). Detection of specific sequences among DNA fragments separated by gel electrophoresis. J. Mol. Biol. 98: 503-517.

Stallings, R. L., Ford, A. F., Nelson, D., Torney, D. C., Hildebrand, C. E., and Moyzis, R. K. (1991). Evolution and distribution of (GT)n repetitive sequences in mammalian genomes. Genomics 10: 807-815.

Streisinger, G., and Owen, J. E. (1985). Mechanisms of spontaneous and induced frameshift mutation in bacteriophage T4. Genetics 109: 633-659.

Tautz, D., and Renz, M. (1984). Simple sequences are ubiquitous repetitive components of eukaryotic genomes. Nucleic Acids Res. 12: 4127-4138.

Thompson, L. H., Mooney, C. L., Burkhart-Schultz, K., Carrano, A. V., and Siciliano, M. J. (1985). Correction of a nucleotide-excision repair mutation by human chromosome 19 in human-hamster hybrid cells. Somatic Cell Mol. Genet. 11: 87-92.

Thompson, L. H., Carrano, A. V., Sato, K., Salazar, E. P., White, B. F., Stewart, S. A., Minkler, J. L., and Siciliano, M. J. (1987). Identification of nucleotide-excision repair genes on human chromosomes 2 and 13 by functional complementation in hamster-human hybrids. Somatic Cell Mol. Genet. 13: 539-551.

Thompson, L. H., Bachinski, L. L., Stallings, R. L., Dolf, G., Weber, C. A., Westerveld, A., and Siciliano, M. J. (1989). Complementation of repair gene mutations on the hemizygous chromosome 9 in CHO: a third repair gene on human chromosome 19. Genomics 5: 670-679.

Thompson, J. S., and Thompson, M. W. (1986). Genetics in Medicine, 4th ed. Philadelphia, PA: W. B. Saunders Company.

Tomfohrde, J., Wood, S., Schertzer, M., Wagner, J. M., Wells, D. E., Parrish, J., Sadler, L. A., Blanton, S. H., Daiger, S. P., Wang, Z., Wilkie, P. J., and Weber, J. L. (1992).
Human chromosome 8 linkage map based on short tandem repeat polymorphisms: effect of genotyping errors. Genomics, in press.

Treco, D., and Arnheim, N. (1986). The evolutionarily conserved repetitive sequence d(TG·AC)n promotes reciprocal exchange and generates unusual recombinant tetrads during yeast meiosis. Mol. Cell. Biol. 6: 3934-3947.

Tsui, L. C., Farrall, M., and Donis-Keller, H. (1989). Report of the committee on the genetic constitution of chromosomes 7 and 8. Cytogenet. Cell Genet. 51: 166-201.

Wagner, M. J., GE, Y., Siciliano, M., and Wells, D. E. (1991). A hybrid cell mapping panel for regional localization of probes to human chromosome 8. Genomics 10: 114-125.

Wahls, W. P., Wallace, L. J., and Moore, P. D. (1990). The Z-DNA motif (dTG)30 promotes reception of information during gene conversion events while stimulating homologous recombination in human cells in culture. Mol. Cell. Biol. 10: 785-793.

Weber, J. L. (1990a). Human DNA polymorphisms based on length variations in simple-sequence tandem repeats. In Genome analysis volume I: Genetic and physical mapping. New York, New York: Cold Spring Harbor Laboratory Press.

Weber, J. L. (1990b). Informativeness of (dC-dA)n·(dG-dT)n polymorphisms. Genomics 7: 524-530.

Weber, J. L., and May, P. E. (1989). Abundant class of human DNA polymorphisms which can be typed using the polymerase chain reaction. Am. J. Hum. Genet. 44: 388-396.

Weinberg, W. (1908). On the demonstration of heredity in man. Translated by S. H. Boyer. 1963. In Papers on human genetics. Englewood Cliffs, New Jersey: Prentice-Hall.

Weir, B. S., and Cockerham, C. C. (1978). Testing hypotheses about linkage disequilibrium with multiple alleles. Genetics 88: 633-642.

White, R., and Lalouel, J-M. (1988). Sets of linked genetic markers for human chromosomes. Annu. Rev. Genet. 22: 259-79.

Williamson, R., Bowcock, A., Kidd, K., Pearson, P., Schmidtke, J., Ceverha, P., Chipperfield, M., Cooper, D. N.
, Coutelle, C., Hewitt, J., Klinger, K., Langley, K., Beckmann, J., Tolley, M., and Maidak, B. (1991). Report of the DNA committee and catalogues of cloned and mapped genes, markers formatted for PCR and DNA polymorphisms. Cytogenet. Cell Genet. 58: 1190-1832.

Wintero, A. K., Fredholm, M., and Thomsen, P. D. (1992). Variable (dG-dT)n·(dC-dA)n sequences in the porcine genome. Genomics 12: 281-288.

Wolff, R. K., Nakamura, Y., and White, R. (1988). Molecular characterization of a spontaneously generated new allele at a VNTR locus: no exchange of flanking DNA sequence. Genomics 3: 347-351.

Wolff, R. K., Plaetke, R., Jeffreys, A. J., and White, R. (1989). Unequal crossingover between homologous chromosomes is not the major mechanism involved in the generation of new alleles at VNTR loci. Genomics 5: 382-384.

Wood, S. (1988). Human chromosome 8. J. Med. Genet. 25: 721-731.

Wood, S., Schertzer, M., Drabkin, H., Patterson, D., Longmire, J. L., and Deaven, L. L. (1992). Characterization of a human chromosome 8 cosmid library constructed from flow-sorted chromosomes. Cytogenet. Cell Genet. 59: 243-247.

Yates, F. (1934). Contingency tables involving small numbers and the χ² test. J. Roy. Stat. Soc. Suppl. 1: 217-235.

Yates, F. (1984). Tests of significance for 2 X 2 contingency tables. J. Roy. Stat. Soc. A 147: 426-463.

Yuille, M. A. R., Goudie, D. R., Affara, N. A., and Ferguson-Smith, M. A. (1991). Rapid determination of sequences flanking microsatellites. Nucleic Acids Res. 19: 1950.
APPENDIX 1

Theoretical and observed distributions of 123 genotypes at D8S136P1 in the CEPH panel

Genotype                          1/1     1/2     2/2     1/3     2/3     3/3     1/4     2/4
Number of individuals expected    0.051   0.061   0.018   0.284   0.171   0.398   0.183   0.110
Number of individuals observed    0       0       0       0       0       0       0       0

Genotype                          3/4     4/4     1/5     2/5     3/5     4/5     5/5     1/6
Number of individuals expected    0.512   0.165   0.913   0.549   2.560   1.647   4.115   0.467
Number of individuals observed    2       0       1       1       2       1       4       1

Genotype                          2/6     3/6     4/6     5/6     6/6     1/7     2/7     3/7
Number of individuals expected    0.281   1.309   0.842   4.207   1.075   0.142   0.086   0.399
Number of individuals observed    0       2       1       3       1       0       0       1

Genotype                          4/7     5/7     6/7     7/7     1/8     2/8     3/8     4/8
Number of individuals expected    0.257   1.282   0.657   0.100   1.057   0.634   2.959   1.903
Number of individuals observed    0       0       2       0       1       1       1       2

Genotype                          5/8     6/8     7/8     8/8     1/9     2/9     3/9     4/9
Number of individuals expected    9.512   4.862   1.482   5.497   0.751   0.451   2.105   1.354
Number of individuals observed    13      5       2       6       0       1       4       0

Genotype                          5/9     6/9     7/9     8/9     9/9     1/10    2/10    3/10
Number of individuals expected    6.767   3.459   1.054   7.821   2.782   1.035   0.622   2.902
Number of individuals observed    5       3       2       4       5       2       0       2

Genotype                          4/10    5/10    6/10    7/10    8/10    9/10    10/10
Number of individuals expected    1.866   9.327   4.768   1.453   10.78   7.670   5.286
Number of individuals observed    3       11      4       0       11      8       5

Note: the notation a/b was used where "a" indicates the allele at D8S136P1 on one chromosome and "b" indicates the allele at D8S136P1 on the other chromosome. For example, four individuals were observed with the genotype 5/5, where 5 corresponds to A5 in Table 2.

Theoretical and observed distributions of 123 genotypes at D8S136P2 in the CEPH panel

Genotype                          1/1     1/2     2/2     1/3     2/3     3/3     1/4     2/4     3/4
Number of individuals expected    0.008   0.024   0.018   0.235   0.354   1.710   0.429   0.646   6.247
Number of individuals observed    0       0       0       1       0       1       0       1       2

Genotype                          4/4     1/5     2/5     3/5     4/5     5/5     1/6     2/6     3/6
Number of individuals expected    5.707   0.526   0.793   7.663   14.00   8.586   0.731   1.101   10.64
Number of individuals observed    8       0       0       11      13      10      1       1       12

Genotype                          4/6     5/6     6/6     1/7     2/7     3/7     4/7     5/7     6/7
Number of individuals expected    19.44   23.85   16.56   0.008   0.012   0.119   0.217   0.266   0.370
Number of individuals observed    21      20      18      0       0       0       0       1       0

Genotype                          7/7     1/8     2/8     3/8     4/8     5/8     6/8     7/8     8/8
Number of individuals expected    0.002   0.008   0.012   0.119   0.217   0.266   0.370   0.004   0.002
Number of individuals observed    0       0       0       1       0       0       0       0       0

Genotype                          1/9     2/9     3/9     4/9     5/9     6/9     7/9     8/9     9/9
Number of individuals expected    0.008   0.012   0.119   0.217   0.266   0.370   0.004   0.004   0.002
Number of individuals observed    0       1       0       0       0       0       0       0       0

Note: the notation a/b was used where "a" indicates the allele at D8S136P2 on one chromosome and "b" indicates the allele at D8S136P2 on the other chromosome. For example, 8 individuals were observed with the genotype 4/4, where 4 corresponds to A4 in Table 3.
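The expected counts in this appendix appear consistent with Hardy-Weinberg proportions: n·p_a² for a homozygote a/a and 2n·p_a·p_b for a heterozygote a/b, with n = 123 individuals. The following Python sketch illustrates that relationship under stated assumptions. The allele frequencies are not taken from Table 2 (which is not reproduced here); they are back-calculated from the tabulated homozygote expectations, and the function names are hypothetical.

```python
import math

N_INDIVIDUALS = 123  # number of genotyped individuals, from the appendix headings


def allele_freq_from_homozygote(expected_count, n=N_INDIVIDUALS):
    """Back-calculate an allele frequency p from an expected a/a count n * p**2.

    Assumption: the tabulated expectation was derived under Hardy-Weinberg.
    """
    return math.sqrt(expected_count / n)


def expected_genotype_count(p_a, p_b, homozygous, n=N_INDIVIDUALS):
    """Hardy-Weinberg expected count: n*p_a*p_b if a == b, else 2*n*p_a*p_b."""
    return n * p_a * p_b if homozygous else 2 * n * p_a * p_b


# D8S136P1: expected counts for 5/5 and 8/8 are 4.115 and 5.497 in the table.
p5 = allele_freq_from_homozygote(4.115)
p8 = allele_freq_from_homozygote(5.497)

# The heterozygote 5/8 is tabulated with an expected count of 9.512.
expected_5_8 = expected_genotype_count(p5, p8, homozygous=False)
print(f"p5={p5:.4f} p8={p8:.4f} expected 5/8={expected_5_8:.3f}")
```

The same check applied to alleles 9 and 10 (homozygote expectations 2.782 and 5.286) gives a value close to the tabulated 7.670 for genotype 9/10, which supports, but does not prove, the assumption that the expectations were computed this way.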
