UBC Theses and Dissertations

Genetic analysis of Huntington disease. Andrew, Susan E. 1994

GENETIC ANALYSIS OF HUNTINGTON DISEASE

by

SUSAN E. ANDREW

B.Sc., The University of Toronto, 1987
M.Sc., Simon Fraser University, 1989

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

in

THE FACULTY OF GRADUATE STUDIES
Genetics Programme

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
April 1994

© Susan E. Andrew, 1994

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Genetics
The University of British Columbia
Vancouver, Canada

ABSTRACT

Huntington disease (HD) is an autosomal dominant neurodegenerative disease characterized by progressive dementia and chorea. The initial aim of this thesis was to identify candidate regions for the HD gene. Markers separated by 3 Mb were found to be in strong allelic association with HD, thus identifying two mutually exclusive candidate regions.

A screen for genomic rearrangements in affected individuals using exonic clones from the proximal candidate region was undertaken. One auspicious clone showed a genomic rearrangement, cosegregating with HD in two families. Subsequent cloning and sequencing of this region demonstrated an Alu retrotransposition event associated with Huntington disease in the two families.

During this work, the mutation causing Huntington disease was identified by the HD Collaborative Research Group as an expanded CAG trinucleotide repeat in a novel gene.
Analysis of new polymorphic markers in the gene permitted a retrospective analysis of linkage disequilibrium in a 300 kb region harbouring the CAG repeat. Analysis of HD haplotypes showed that multiple haplotypes underlie CAG expansion. Mean CAG length on control chromosomes with haplotypes identical to those most frequently observed in HD was significantly larger than CAG length on other control chromosomes, consistent with the hypothesis that these chromosomes are a reservoir for further expansions, resulting in HD chromosomes.

The nature of the dynamic trinucleotide repeat mutation resolved several previously confusing issues of HD. There is a highly significant correlation between the age of onset of Huntington disease and CAG repeat length, which accounts for approximately 50% of the variation in the age of onset. The instability of the repeat and its tendency to expand further over the generations account for the occurrence of new mutations and the observed anticipation. CAG expansion is the basis of the worldwide occurrence of HD and was shown to be a sensitive and specific marker for inheritance of the HD gene. A small proportion of clinically affected patients tested lacked CAG expansion, representing misdiagnosis, sample mix-up, clerical error or possible genetic heterogeneity. CAG analysis also allowed for the resolution of recombinant HD chromosomes that previously had suggested the gene was located in the distal candidate region.

TABLE OF CONTENTS

ABSTRACT
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS
ACKNOWLEDGEMENTS

1. HUNTINGTON DISEASE
  1.1 INTRODUCTION
  1.2 HISTORY OF HD
  1.3 CLINICAL FEATURES
  1.4 GENETICS
    1.4.1 Inheritance
    1.4.2 Penetrance and expressivity
    1.4.3 Anticipation
    1.4.4 Juvenile onset
    1.4.5 Epidemiology
    1.4.6 Mutation rate
  1.5 NEUROPATHOLOGY
  1.6 POSITIONAL CLONING
    1.6.1 Positional cloning
    1.6.2 Chromosomal localization
    1.6.3 Establishment of a candidate region
    1.6.4 Identification of candidate genes
    1.6.5 Mutation analysis
  1.7 THE SEARCH FOR THE HD GENE
    1.7.1 Mapping the gene to 4p16.3
    1.7.2 Establishing the candidate region
    1.7.3 The HD mutation
  1.8 OBJECTIVE
  1.9 REFERENCES

2. MATERIALS AND METHODS
  2.1 GENETIC ANALYSIS
  2.2 DNA ISOLATION AND SOUTHERN BLOT
  2.3 DNA PROBES
  2.4 PREPARATION OF HYBRIDIZATION PROBES
  2.5 PCR PRIMERS
  2.6 SEQUENCING
    2.6.1 Double strand sequencing
    2.6.2 Single strand PCR products
  2.7 PREPARATION OF cDNA TEMPLATE
  2.8 STATISTICAL ANALYSIS OF ASSOCIATION
  2.9 STATISTICAL ANALYSIS OF CAG ANALYSIS
  2.10 REFERENCES

GENETIC ANALYSIS

3. NONRANDOM ALLELIC ASSOCIATION
  3.1 INTRODUCTION
    3.1.1 Allelic association
    3.1.2 Allelic association in the HD candidate region
  3.2 RESULTS
    3.2.1 Identification of new distal polymorphic markers
    3.2.2 Nonrandom allelic association across 6 Mb
    3.2.3 Haplotype analysis
    3.2.4 Analysis of homogeneous populations
  3.3 DISCUSSION
  3.4 REFERENCES

4. DNA ANALYSIS OF DISTINCT POPULATIONS
  4.1 INTRODUCTION
  4.2 RESULTS
    4.2.1 Assessment of nonrandom association
    4.2.2 DNA haplotype analysis of affected chromosomes
    4.2.3 DNA haplotype analysis of control chromosomes
    4.2.4 Haplotype comparisons
  4.3 DISCUSSION
  4.4 REFERENCES

5. PATTERNS OF ASSOCIATION AROUND THE HD GENE
  5.1 INTRODUCTION
  5.2 RESULTS
    5.2.1 DNA markers
    5.2.2 Statistical analysis
    5.2.3 Gene frequencies and allelic association
    5.2.4 Haplotype analysis
    5.2.5 Comparison of CAG length between haplotypes
    5.2.6 Haplotypes of sporadic HD patients
  5.3 DISCUSSION
  5.4 REFERENCES

SEARCH FOR GENOMIC REARRANGEMENTS

6. IDENTIFICATION OF AN ALU RETROTRANSPOSITION EVENT
  6.1 INTRODUCTION
    6.1.1 Gene Tracking
    6.1.2 GT clone analysis
  6.2 RESULTS
    6.2.1 GT48 genomic rearrangement
    6.2.2 Genomic cloning of Alu retrotransposition event
  6.3 DISCUSSION
  6.4 REFERENCES

MOLECULAR GENETIC ANALYSIS OF HUNTINGTON DISEASE

7. CAG EXPANSION IN HUNTINGTON DISEASE
  7.1 INTRODUCTION
  7.2 RESULTS
    7.2.1 Development of a PCR assay
    7.2.2 Association between CAG length and age of onset
    7.2.3 Correlation between clinical features and CAG length
    7.2.4 Variation in CAG repeat length in juvenile onset
    7.2.5 Predictive value of CAG repeat length
    7.2.6 Precision of CAG repeat assessment
    7.2.7 Parent-child correlations
    7.2.8 Sib-sib correlations
  7.3 DISCUSSION
  7.4 REFERENCES

8. CAG SENSITIVITY AND SPECIFICITY
  8.1 INTRODUCTION
  8.2 RESULTS
    8.2.1 CAG repeat sizes in HD and other disorders
    8.2.2 CAG repeat sizes in control chromosomes
  8.3 DISCUSSION
  8.4 REFERENCES

9. HUNTINGTON DISEASE WITHOUT CAG EXPANSION
  9.1 INTRODUCTION
  9.2 RESULTS
    9.2.1 Errors in assignment
    9.2.2 Misdiagnosis
    9.2.3 Absence of CAG expansion in HD families
    9.2.4 Genetic heterogeneity of HD?
  9.3 DISCUSSION
  9.4 REFERENCES

10. NEW MUTATIONS FOR HUNTINGTON DISEASE
  10.1 INTRODUCTION
  10.2 RESULTS
    10.2.1 Identification of a premutation
    10.2.2 Parental sex of origin
    10.2.3 New mutations without CAG repeat expansion
    10.2.4 Sporadic HD is transmitted to offspring
  10.3 DISCUSSION
  10.4 REFERENCES

11. A CCG REPEAT ADJACENT TO THE CAG REPEAT
  11.1 INTRODUCTION
  11.2 RESULTS
  11.3 DISCUSSION
  11.4 REFERENCES

12. DISCUSSION
  12.1 SUMMARY OF RESULTS
    12.1.1 Allelic association
    12.1.2 Allelic association around the CAG repeat
    12.1.3 Genomic rearrangement associated with HD
    12.1.4 CAG repeat analysis
  12.3 FURTHER INVESTIGATIONS
  12.4 REFERENCES

LIST OF TABLES

2-1 List of probes used
3-1 Summary of previous association studies
3-2 New polymorphisms
3-3 List of probes used in association analysis
3-4 Allele frequencies for markers from the HD candidate region
3-5 Methods for determining association with multiple alleles
3-6 Haplotypes of HD and control chromosomes
3-7 Haplotypes between markers in association with HD
3-8 Allele frequencies for RFLPs on HD and canonical chromosomes
3-9 Allele frequencies for RFLPs in a UK population
4-1 Polymorphic markers
4-2 Allele frequencies on HD and control chromosomes
4-3 HD haplotypes
4-4 Control haplotypes
5-1 Polymorphic markers used in analysis
5-2a Allele frequencies on HD and control chromosomes
5-2b Allele frequencies of CAG repeat on HD and control chromosomes
5-3a Haplotypes of HD and control chromosomes
5-3b CAG lengths of HD and control haplotypes
5-4 HD haplotypes from sporadic cases of HD
6-1 Summary of GT clones
6-2 Allele frequencies of 1.2 kb HindIII polymorphism
7-1 Demographics of cohort
7-2 CAG length
7-3 CAG lengths by sex of parent/grandparent
8-1 Distribution of HD allele sizes by origin
8-2 Distribution of CAG lengths in other neuropsychiatric disorders
8-3 CAG size distribution for control chromosomes by origin
9-1 Reasons for lack of CAG expansion
9-2 Possible phenocopies
9-3 Errors in assignment
10-1a New mutations with intermediate sized alleles
10-1b New mutations without IAs
11-1 Frequency of CCG alleles

LIST OF FIGURES

1-1 HD pedigree
1-2 Diagram of 4p16.3 showing markers
1-3 Candidate regions for HD
2-1 Schematic map of markers from 4p16.3
3-1 Map of candidate region showing location of probes
4-1 Schematic diagram of HD haplotypes
5-1 Map of markers flanking the HD gene
6-1 Gene tracking methodology
6-2 Mapping of cDNAs to BINs
6-3 Southern blot of rearrangement seen in 2 HD families
  a) Family A
  b) Family B
6-4 a) Haplotype of Family A with genomic rearrangement
  b) Haplotype of Family B with genomic rearrangement
6-5 a) Localization of GT48
  b) Restriction map of λGT48
  c) Mapping of GT44 demonstrating HindIII polymorphism
  d) Restriction digests of genomic rearrangement
6-6 a) PCR assay of Alu insertion
  b) Sequence of Alu element
6-7 Pedigrees of families A and B showing CAG repeat sizes
6-8 Mapping of GT48 and the Huntington disease gene
7-1 Primers for amplification of the CAG repeat
7-2 Autoradiograph of the PCR products across the CAG repeat
7-3 Relationship of CAG repeat length and age of onset
7-4 Confidence intervals for predicted age of onset
7-5 Parent-child correlations of CAG repeat length
7-6 Sib-sib correlations of CAG repeat length
8-1 a) CAG repeat lengths of 995 HD patients
  b) CAG repeat lengths of 600 control chromosomes
  c) CAG repeat lengths of 995 normal chromosomes from HD patients
9-1 Phenocopy pedigrees 13-19
9-2 a) Pedigree of phenocopy patients 1 and 2
  b) Pedigree of phenocopy patients 3 and 4
  c) Pedigree of phenocopy patient 5
  d) Pedigree of phenocopy patient 6
  e) Pedigrees of phenocopy patients 7-10
10-1 Autoradiographs of families demonstrating intermediate sized alleles
  a) Family 1
  b) Family 2
  c) Family 3
  d) Family 4
  e) Family 5
11-1 Schematic of CAG and CCG repeats and position of PCR primers
11-2 Autoradiograph showing CCG polymorphism

LIST OF ABBREVIATIONS

AO      age of onset
bp      basepair
CF      cystic fibrosis
cM      centiMorgan
DM      myotonic dystrophy
DRPLA   dentato-rubro-pallido-luysian atrophy
FRAXA   fragile X
FRAXE   fragile XE mental retardation
HD      Huntington disease
kb      kilobase
Mb      Megabase
RFLP    restriction fragment length polymorphism
SBMA    spinal bulbar muscular atrophy
SCA     spinocerebellar ataxia

All pedigrees presented in this thesis have been altered to protect the patients.

ACKNOWLEDGEMENTS

It was a privilege to work with Dr. Michael Hayden and I am grateful for the guidance, encouragement and support which he consistently provided. I would also like to thank members of my supervisory committee, Dr. Diana Juriloff, Dr. Peter Candido, Dr. Lorne Clark, and Dr. Paul Goodfellow for all their advice during my training.

I would like to thank the patients with Huntington disease and their families, for without their support and participation, this work could not have been done.

There are many friends and colleagues that contributed greatly to this thesis, and I would like to thank them for their support and friendship. I would especially like to thank Dr. Paul Goldberg for all his help and total support both in and outside of the lab. Special thanks to Jane Theilmann who has been a part of this thesis from beginning to end. Thank you to Dr. Johanna Rommens, Dr. Berry Kremer, Dr. Hakan Telenius, Dr. Bernhard Weber, Dr. Colin Collins, Dr. Gordon Hutchinson, Dr. Nando Squitieri, Jutta Zeisler, and Amy Hedrick who all contributed to this thesis.
And thank you to other friends in the Hayden lab for making the past four years so enjoyable.

Enormous thanks to my parents John and Catherine, and John S., always, for their love and constant support.

This thesis was supported in part by scholarships from the Huntington Society of Canada and UBC.

CHAPTER 1
HUNTINGTON DISEASE

1.1 INTRODUCTION

Approximately 5% of the population suffer from known genetic diseases [1], and if diseases with a suspected genetic component are included, such as schizophrenia, diabetes and coronary heart disease, at least 60% of the population suffer from disease with some genetic basis [2]. At present, over 6000 disease genes have been mapped to a specific chromosomal region, and more than 200 genes with mutations underlying disease have been identified [3]. Haldane clearly stated the newly realized goals of human geneticists in 1948: “The final aim...should be the enumeration and location of all the genes found in normal human beings, the function of each being deduced from the variations occurring when the said gene is altered by mutation, or when several allelomorphs of it exist in normal men and women...” [4]

One of the more recent mutations to be identified is the mutation associated with Huntington disease (HD), described as “one of the most dreadful diseases that man is liable to” [5].

1.2 HISTORY OF HD

Historically, the clinical features of HD have been recorded as early as the 1500s, when Paracelsus went against the popular belief of his time and stated that the dancing mania was physiological and not due to supernatural causes. Paracelsus introduced the term chorea, from the Latin choreus, or dancing, and the Greek choros meaning chorus [6]. In 1872, in his paper “On Chorea”, Dr. George Huntington so clearly and completely described this form of chorea that the disease now carries his name [7]. As the presence and degree of chorea is variable and is only one feature, albeit characteristic, of this disorder, it is now more commonly referred to as Huntington disease, rather than Huntington’s chorea.

1.3 CLINICAL FEATURES

The onset of HD is insidious, characterized by involuntary choreic movements, dementia and other psychiatric disturbances. While the age of onset is variable, ranging from 2 to 90 years, the first symptoms appear on average by the age of 38 [8]. Cognitive impairment precedes any movement disorder in over half of the patients afflicted with HD [9]. Often, subtle mental changes such as irritability, depression, and impulsiveness precede the diagnosis, and mild incoordination or jerkiness is often the first neurological symptom [6]. The severity of the chorea and cognitive impairment worsens, however, as the disease progresses. Dysarthria, presenting as a slowness and disorganization of speech, and difficulty in swallowing are common [6].

HD is ineluctable: progressive worsening of clinical features results in death, usually 15-25 years after onset of symptoms, from heart attack, complications from aspiration (bronchopneumonia or choking), hematomas, or suicide [9]. Post-mortem examination of the brain showing caudate atrophy provides definitive diagnosis of HD. Positive family history is therefore an important consideration for the diagnosis of HD.

Choreic movements of the arms and legs, and disturbance of gait, are the most striking features of HD. However, five percent of patients, often those with juvenile onset, do not develop chorea but show a progressive slowing of movements towards a rigid state [6].

1.4 GENETICS

1.4.1 INHERITANCE

HD is inherited as an autosomal dominant disorder in which, because of adult onset, symptoms usually appear only after the gene has been transmitted to offspring.
With no known cure, this disease is not only devastating for the patient, but also places each of the patient’s offspring at 50% risk of inheriting a similar destiny.

Matings of two affected individuals are rare; however, two well documented individuals have been shown to be homozygous for the mutation associated with HD [10,11]. Interestingly, they do not differ in their clinical manifestation from those who have inherited one HD chromosome. Thus, HD represents a true autosomal dominant disorder, with homozygotes demonstrating a similar phenotype to heterozygotes. Although there are many well documented cases of completely dominant disorders in other organisms, such as Drosophila, of the 3600 or so autosomal dominant disorders known in humans [3], HD is the only one thought to be completely dominant [9].

1.4.2 PENETRANCE AND EXPRESSIVITY

HD demonstrates complete penetrance but displays variable expressivity. The wide spectrum of phenotype does not suggest genetic heterogeneity, however, as patients within one family can demonstrate a wide range of phenotypes. Intrafamilial variability can be considerable, as demonstrated by a family with ages of onset ranging from 20 to 50 in one generation and, in the most recent generation, onset of HD in one individual at age 24 (Figure 1-1).

1.4.3 ANTICIPATION

Anticipation is a term used to describe the phenomenon of disease presentation over successive generations with progressively earlier ages of onset, or with increasing severity [12]. This has been observed in several genetic disorders, such as myotonic dystrophy (DM) [13], fragile X syndrome (FRAXA) [14], bipolar affective disorder [15] and HD.

Anticipation was often discounted as a bias of ascertainment [16]. However, the recent cloning of the genes for DM [17-19], FRAXA [20-22] and now HD demonstrates a molecular mechanism for this phenomenon, which will be discussed further in Chapter 7.

Figure 1-1. Pedigree affected with Huntington disease, demonstrating autosomal dominant inheritance, variable age of onset and anticipation. AO = age of onset.

1.4.4 JUVENILE ONSET

In approximately 6% of HD patients the age of onset is before age 20, and is termed juvenile onset HD [6]. This group can be further divided into those who have childhood onset before the age of 10, and those who present with adolescent onset, from 10 to 20 years of age. Juvenile HD is often characterized by rigidity and tremor, and epileptic seizures occur in approximately 30% to 50% of affected cases [6]. The rate of progression of juvenile HD is much faster than that of the later onset cases (duration of disease is 9 years on average for juveniles as compared to 14.65 years for adult onset cases) [6].

In deviation from Mendelian inheritance, the majority of juvenile cases have inherited the defective gene from their father (64% with onset from age 10-20, 100% with onset before age 10) [23]. This is consistent with observations of anticipation in some non-juvenile HD families where the gene is also inherited through the paternal line.

1.4.5 EPIDEMIOLOGY

HD has a prevalence of 4-10/100,000 in Northern Europe [6]. The disease is seen in all major racial groups, although with different frequencies. Several areas demonstrate a high prevalence of the disease gene, thought to be due to a founder effect. For example, at Lake Maracaibo in Venezuela there is a prevalence of 7000 per million, from one founder individual [6]. Also, many Afrikaner patients of South Africa can trace their ancestry back to 1652, when an affected Dutchman arrived in Cape Town [6]. Genealogical investigations in other countries such as Venezuela, Australia, Canada and the United States show north western Europe as the source of many HD chromosomes [6]. Several populations, such as the Japanese and American and African blacks, have a very low prevalence of the HD gene (3.8, 15 and 0.6 per million respectively) [6].

1.4.6 MUTATION RATE

Several studies have determined the mutation rate for HD, and it was accepted that the mutation rate for HD was the lowest of all known genetic disorders. Mutation rates range from 0.13 mutants/10^6 gametes to 9.6 mutants/10^6 gametes [6]. One difficulty in determining the rate of mutation stems from the difficulty in confirming new mutations. Stevens and Parsonage defined criteria for new mutations as: the parents must be free of all symptoms of disease and have lived at least 70 years, paternity must be proven, and the disease must then be seen to be passed on to the proband’s offspring [26]. However, these strict requirements meant that few confirmed cases were reported that fit all the criteria. Therefore, the mutation rate was, in all likelihood, underestimated. The identification of the HD mutation has allowed for direct confirmation of the number of new mutations arising from unaffected parents. Analysis of potential new mutation individuals that did not fit all of the above criteria confirms that the number of new mutations was previously underestimated.

1.5 NEUROPATHOLOGY

The biochemical basis for HD is unknown. Pathologically, HD is characterized by selective cell death in the basal ganglia, primarily within the striatum (caudate nucleus and putamen) [27,28]. Neuronal degeneration occurs with regional progression, starting in the tail of the caudate [29]. The striatum is divided into distinguishable regions called patches and matrix, and it is the cells in the matrix which die first as the cell death progresses through the striatum [30,31]. Certain specific cell types are more prone to cell death than others, and it is the medium-sized spiny neurons, the largest proportion of striatal neurons, that are most affected [32]. Although the neuronal loss in the basal ganglia is the most characteristic pathological feature of HD, as the disease progresses there is also cell loss in the cortex, the external segment of the globus pallidus, and eventually the hypothalamus; one notable exception is the cerebellum [9].

1.6 POSITIONAL CLONING

1.6.1 POSITIONAL CLONING

Positional cloning refers to the isolation of a gene on the basis of its map position alone, with no knowledge of the defective gene product [33]. This approach has been successful in the identification of genes for several human disorders including chronic granulomatous disease [34], Duchenne muscular dystrophy [35], retinoblastoma [36], and cystic fibrosis [37]. Localization of a gene proceeds from initial identification of the chromosomal region associated with the disease, through subsequent refinement of the candidate region and identification of candidate genes within the region of interest, to assessment of these genes for potential disease causing mutations.

Each gene search is different, and the characteristics of each particular disease and chromosomal location will dictate the appropriate technique, or combination of techniques, to be used to identify the gene and ascertain the disease-causing mutations.

1.6.2 CHROMOSOMAL LOCALIZATION

In order to locate a disease gene, linkage between disease and a particular chromosomal marker must be found. To search for linkage, families harbouring the disease gene are screened with a series of polymorphic markers spanning the genome, and the likelihood ratio of linkage between the disease and each marker is determined. A logarithm of the odds ratio, or lod score, of 3 means the odds of linkage at a particular recombination distance θ are 1000:1 and is accepted as evidence of linkage, whereas a lod score of -2 is indicative of non-linkage [38,39]. Localization of the marker to a specific chromosomal region by in situ hybridization or somatic cell mapping can allow for further refinement of the disease gene localization by multipoint linkage analysis with additional adjacent markers.

1.6.3 ESTABLISHMENT OF CANDIDATE REGION

Further mapping can refine the candidate region by making use of gross chromosomal rearrangements. Cytogenetically observed rearrangements have been critical in the isolation of genes such as those for Duchenne muscular dystrophy, Wilms' tumour, and neurofibromatosis [41]. Recombination events can also be invaluable in defining the candidate region. However, the extent to which the candidate region can be narrowed in this manner is limited, usually to a range of 1-2 Mb, depending on the number of recombinant individuals, the number of informative markers, and the presence, if any, of DNA rearrangements.

Nonrandom allelic association between a marker and disease is another method of defining a candidate region for a disease gene, exemplified by the localization of the cystic fibrosis gene [42]. Allelic association studies involve comparing allele frequencies in a group of affected patients to those of a normal control population. The method is based on the principle that loci located closest to the site of mutation undergo recombination less frequently than those at a greater distance, and therefore are likely to exhibit a higher degree of nonrandom allelic association or linkage disequilibrium. An increase in the measure of association between disease and markers along the chromosome is theoretically indicative of increasing proximity to the disease gene. Allelic association analysis assumes that new mutations are rare, and that affected chromosomes are derived from one or few ancestral mutation events.

1.6.4 IDENTIFICATION OF CANDIDATE GENES

Establishment of the minimal candidate region allows for the construction of a physical map and cloning of the defined region of interest.
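The allelic association comparison described in section 1.6.3 can be sketched as a test on a 2x2 table of allele counts from disease and control chromosomes. The Python sketch below is only an illustration under stated assumptions: the function name `chi_square_2x2` is hypothetical, the counts are invented and are not data from this thesis, and the statistical methods actually used in the thesis (Chapter 2) may differ.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (1 degree of freedom) on a 2x2 table.

    a, b: counts of allele 1 and allele 2 on disease chromosomes
    c, d: counts of allele 1 and allele 2 on control chromosomes
    Returns (chi-square statistic, p-value).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one observation")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, P(X > chi2) equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: allele 1 seen on 60 of 100 disease chromosomes
# and on 40 of 100 control chromosomes.
chi2, p = chi_square_2x2(60, 40, 40, 60)
print(f"chi-square = {chi2:.2f}, p = {p:.4f}")
```

A small p-value suggests that allele frequencies differ between disease and control chromosomes, which is the signal of nonrandom association; it does not by itself indicate how close the marker lies to the disease gene.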
Physical mapping and cloning of a candidate region have been made easier by the ability to clone large regions of DNA in YACs [43] and P1 vectors [44] as well as in cosmids [45] and phage.

Several strategies are now established for identifying all the transcribed sequences present within the region of interest. Previous methods of finding transcribed sequences focused on the identification of CpG islands, conservation of DNA sequences on zoo blots, Northern blot analysis and classical screening of cDNA libraries. These have been superseded by techniques such as hybrid selection [47-49], exon trapping [50,51] and computational approaches [52,53] that allow for a more rapid analysis of larger regions of the genome by focusing on direct selection of transcribed sequences.

1.6.5 MUTATION ANALYSIS

Further analysis is necessary to evaluate candidate genes for alterations that could be causative for a particular disorder. Demonstration of an alteration cosegregating with disease, together with a negative finding in control individuals, suggests the alteration may be responsible for the disease phenotype. Many rearrangements associated with disease are detected by Southern blot analysis. However, a variety of techniques have recently been developed for characterizing intragenic changes, including RNase A protection assay [54], denaturing gradient gel electrophoresis [55], chemical mismatch cleavage [56], and single-strand conformational polymorphism (SSCP) analysis [57]. A definitive method to detect mutations is comparison of sequenced PCR products between patients and normals [58].

1.7 THE SEARCH FOR THE HUNTINGTON DISEASE GENE

1.7.1 MAPPING THE GENE TO 4p16.3

In the years following Huntington’s classic paper, progress towards a better understanding of the disease was limited; however, the development of molecular techniques initiated the search for the gene by a positional cloning approach.

Unlike other genetic disorders characterized by gross chromosome rearrangements, such as the large deletions and translocations on the X chromosome that aided isolation of the Duchenne muscular dystrophy gene [59], no outstanding feature indicated a chromosomal location for the HD gene. In 1983, the use of polymorphic markers (RFLPs) to determine linkage of disease to specific chromosomes [60] was successfully illustrated by the discovery of linkage between HD and the 4th marker to be tested, named G8, subsequently mapped to 4p16.3 [61]. HD was shown to cosegregate with G8 (D4S10) in a large Venezuelan pedigree and an American family of German origin [61]. Multipoint linkage analysis using both families gave a combined lod score of 8.53 at θ=0, with a 99% confidence interval of 10 cM. Although the idea of a linkage map based on frequency of recombination was first suggested in 1913 by Sturtevant [62], the HD gene was the first human disease gene to be linked to an autosomal chromosome by a DNA marker without any previous information as to its chromosomal location.

The localization of G8 was further refined to the tip of the short arm of chromosome 4 by in situ hybridization [63,64] and somatic cell hybrid panels [65]. Multipoint linkage analysis with centromeric markers showed HD was more tightly linked to D4S10, therefore placing the disease gene between D4S10 and the telomere, a region estimated to contain 6 Mb of DNA [66] (Figure 1-2).

Examination of 63 HD families for linkage between HD and G8 gave a maximum lod score of 87.69 at a θ of 0.04 [67]. The 4% recombination between G8 and HD gave only an approximate location for the HD gene, because of the unknown relationship between recombination and physical distance in this telomeric region.
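As a rough illustration of how a lod score of the kind quoted above relates to counts of recombinant and non-recombinant meioses, the sketch below computes a two-point lod score in Python. It assumes fully informative, phase-known meioses, which is a simplification: the published scores came from pedigree-based analysis, not from this formula. The function name `lod_score` and the example counts are hypothetical.

```python
import math


def lod_score(recombinants, meioses, theta):
    """Two-point lod score for phase-known, fully informative meioses.

    Compares the likelihood of the observed meioses at recombination
    fraction theta with the likelihood under no linkage (theta = 0.5).
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie between 0 and meioses")
    non_recombinants = meioses - recombinants
    likelihood_linked = theta ** recombinants * (1.0 - theta) ** non_recombinants
    likelihood_unlinked = 0.5 ** meioses
    if likelihood_linked == 0.0:
        # Any recombinant is impossible at theta = 0.
        return float("-inf")
    return math.log10(likelihood_linked / likelihood_unlinked)


# Hypothetical example: 10 informative meioses, none recombinant.
print(round(lod_score(0, 10, 0.0), 2))
```

Under these assumptions, ten informative meioses with no recombinants give a lod score just above 3 at θ = 0, the conventional threshold for accepting linkage.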
The combined data from several groups found no evidence for locus heterogeneity after examination of several populations [67-72], but a second locus not linked to 4p and causing the HD phenotype in some isolated families could not be completely excluded.

Figure 1-2. Schematic map of 4p16.3 showing relative locations of markers. Compiled from references 79 and 92.

The serendipitous finding of linkage with G8 (D4S10) allowed for identification of other polymorphic markers closely linked to HD, such as D4S95 [73], that permitted presymptomatic diagnosis for families and was the first step towards cloning of the gene.

1.7.2 ESTABLISHING THE CANDIDATE REGION

Analysis of families with recombination events between the HD gene and linked DNA markers has resulted in mutually exclusive candidate regions for the HD gene [74,75] (Figure 1-3). Many recombinant breakpoints were shown to occur at a hot spot for recombination lying centromeric to D4S10 [76-78], thus reducing the number of recombinants with breakpoints telomeric to D4S10 in the HD candidate region that contributed to the refinement of the candidate region.

Several recombination events placed the HD gene proximal to D4S111 in a 2.2 Mb region between D4S10 and D4S98 [79-82]. In contrast, other recombinant HD chromosomes suggested a distal location for the HD gene, either distal to D4S111 [76,83], distal to D4S227 [75] or within a 325 kb region telomeric to a more distal segment, D4S90 [84]. The possibility of an extreme telomeric site for the gene, distal to D4S90, led to the cloning of this 300 kb telomeric region [85], and subsequently, several lines of evidence resulted in the exclusion of the telomere.
For example, new markers distal to D4S90 indicated that one recombinant chromosome previously thought to recombine distal to D4S90 maintained the unaffected haplotype throughout the telomeric region [86]. In addition, the lack of genes at the telomere, the lack of DNA rearrangements associated with HD in this telomeric region [87], and the discovery that the telomeric sequence of 4p is homologous to other telomeric regions [88] excluded a telomeric location for the HD gene.

However, despite exclusion of the telomere, two mutually exclusive candidate regions still existed. One important recombinant HD chromosome added reason not to exclude the distal candidate region in the search for the HD gene [75,89]. This recombinant individual was from a large family of German origin, and developed chorea and psychiatric problems at age 36. Both his parents were healthy at ages greater than 70 years. There was no history of neurologic disorders in the family, and all 11 sibs of this patient were healthy. Analysis of the chromosomal haplotypes in this family revealed that a recombination event had occurred in the proband in the distal region of 4p16.3, between D4S111 and D4S141. It was hypothesized that the recombination event in the parental meiosis and the new mutation causing HD were unlikely to have coincided by chance alone, given the assumed low mutation rate and the decreased rate of recombination at the telomere, and therefore that the recombination event itself might underlie the cause of HD in at least this patient.

Figure 1-3. Candidate regions on 4p16.3 for Huntington disease in 1990. Compiled from references 74-89. (Diagrammatic representation of the candidate regions on 4p16.3 for HD. Markers labelled: D4S10, D4S125, D4S95, D4S98, D4S96, D4S111, D4S228 and D4S90; scale bar 1 Mb. Legend: localization of the recombination breakpoint in a sporadic case of HD, references 75 and 89; candidate region determined by recombinant chromosomes, references 74-76, 79-84, 86 and 89.)

The mutually exclusive candidate regions led to the development of genetic and physical maps of both candidate regions and the cloning of each region in YAC and cosmid clones [76,80,90-92]. Sequencing of a cosmid contig around D4S98 hinted that this is a gene rich region, with a gene encoded every 20 kb [93]. In addition, the identification of many genes from 4p16.3 while the search for the elusive HD gene was underway supported this observation. For example, the β-subunit of phosphodiesterase and the myosin light chain gene were both identified in the distal candidate region [94,95] and excluded as candidate genes for HD [96].

The development of improved methods to isolate coding sequences resulted in the identification of additional coding sequences from the proximal candidate region. One strategy, outlined in Chapter 6, generated 53 coding sequences that represent at least 9 transcription units, two of which were subsequently identified as sequence coding for the HD gene [97].

1.7.3 THE HUNTINGTON DISEASE MUTATION

The mutation associated with HD was identified by the HD Collaborative Research Group in March 1993 [98]. A polymorphic trinucleotide repeat (CAG) in the 5’ end of a novel gene located between D4S180 and D4S127 expands beyond the normal range of 10-36 repeats to over 100 repeats in patients with HD [99]. Two alternatively polyadenylated transcripts of 10.3 kb and 13.7 kb are derived from the gene associated with the CAG repeat [98,100]. The gene codes for a predicted 348 kd protein that is widely expressed but has no known function [98].

HD is the seventh disease now known to be caused by a dynamic mutation, joining fragile X syndrome (FRAXA) [101,102], fragile XE mental retardation (FRAXE) [103], myotonic dystrophy (DM) [104-106], spinal and bulbar muscular atrophy (SBMA) [107], spinocerebellar ataxia type 1 (SCA1) [108], and dentatorubral-pallidoluysian atrophy (DRPLA) [109,110].

1.8 OBJECTIVE

The objective of this thesis was to further refine the candidate region for the HD gene within the 6 Mb candidate region present at the start of this work. Allelic association was used to delineate the most likely candidate region. Two regions with markers in allelic association with HD were identified, separated by 3 Mb. Haplotype analysis of homogeneous populations suggested that multiple haplotypes underlie HD, even within a homogeneous population of affected individuals.

A search for genomic rearrangements was undertaken by screening patients with exonic clones mapping to the proximal candidate region, close to the marker in strong linkage disequilibrium with HD. One auspicious clone identified a genomic rearrangement cosegregating with HD in two families, and subsequent mapping, cloning and sequencing identified an Alu retrotransposition event associated with HD in these two families, representing a possible cause of disease in them.

The mutation associated with HD was identified by the HD Collaborative Research Group during the course of this thesis. Identification of the mutation thus redirected the aim of this thesis towards understanding the nature of the dynamic mutation. Towards this objective, genetic analysis of the CAG repeat in HD families resolved several previously confusing clinical features of HD and resolved the confusion of mutually exclusive candidate regions suggested by recombinant individuals.
In addition, the cloning of the gene allowed for a retrospective analysis of the positional cloning approaches utilized and an assessment of the use of linkage disequilibrium in gene mapping. Towards this end, new polymorphic markers in the gene, in addition to previously used neighbouring markers, permitted a retrospective analysis of linkage disequilibrium in this region, and construction of haplotypes provided unique insights into the history of the gene for HD.

The work presented in this thesis could not possibly have been done by one person alone. I have been fortunate to receive the help of several colleagues, especially Jane Theilmann and Amy Hedrick in generating sufficient data for linkage disequilibrium analyses. Much of the work in the Molecular Analysis section of this thesis involved teamwork and cooperation amongst members of the laboratory. My contribution is discussed in the Results sections of each chapter, with work done by colleagues discussed in the Introduction only to provide the setting in which my work was done. Analysis of the CAG repeat also required a team effort by Dr. Paul Goldberg, Jane Theilmann, Jutta Zeisler and Dr. Nando Squitieri for generation of such a large amount of data.

1.9 REFERENCES

1. Thompson MW, McInnes RR, Willard HF (1991). Genetics in Medicine 5th ed. WB Saunders Co, Philadelphia, pp 10.
2. King RA, Rotter JI, Motulsky AG (1992). The Genetic Basis of Common Diseases. Oxford University Press, New York, pp 4.
3. McKusick VA (1989). Mendelian Inheritance in Man, 9th edition. Johns Hopkins University Press, Baltimore.
4. Haldane JBS (1948). The formal genetics of man. Proc Roy Soc London 135:147-170.
5. Davenport CB and Muncey EB (1916). Huntington’s chorea in relation to heredity and eugenics. Am J Insan 73:195-222.
6. Hayden MR (1981). Huntington’s chorea. Springer-Verlag, New York.
7. Huntington G (1872). On chorea. Med Surg Rep 26:317-255.
8. Conneally PM (1984).
Huntington’s disease: genetics and epidemiology. Am J Hum Genet 36:506-526.
9. Wexler NS, Rose EA, Housman DE (1991). Molecular approaches to hereditary diseases of the nervous system: Huntington disease as a paradigm. Ann Rev Neurosci 14:503-529.
10. Wexler NS, Young AB, Tanzi R, Travers H, Starosta-Rubenstein S, Penney JB, Snodgrass SR, Shoulson I, Gomez F, Arrayo MAR, Penchaszadeh GK, Moreno H, Gibbons K, Faryniarz A, Hobbs W, Anderson MA, Bonilla E, Conneally PM, Gusella JF (1987). Homozygotes for Huntington’s disease. Nature 326:194-197.
11. Myers R, Leavitt J, Farrer LA, Jagadeesh J, McFarlane H, Mastromauro CA, Mark RJ, Gusella JF (1989). Homozygotes for Huntington disease. Am J Hum Genet 45:615-618.
12. Howeler CJ, Busch HFM, Geraedts JPM, Niermeijer MF, Staal A (1989). Anticipation in myotonic dystrophy: fact or fiction? Brain 112:10-16.
13. Harper PS, Harley HG, Reardon W, Shaw DJ (1992). Anticipation in myotonic dystrophy: new light on an old problem. Am J Hum Genet 51:10-16.
14. Sherman SL, Jacobs PA, Morton NE, Froster-Iskenius U, Howard-Peebles PN, Nielsen KB, Partington MW (1985). Further segregation analysis of the fragile X syndrome with special reference to transmitting males. Hum Genet 69:289-299.
15. McInnis MG, McMahon FJ, Chase GA, Simpson SO, Ross CA, DePaulo JR (1993). Anticipation in bipolar affective disorder. Am J Hum Genet 53:385-390.
16. Penrose LS (1948). The problem of anticipation in pedigrees of dystrophia myotonica. Ann Eugenics 14:125-132.
17. Brook JD, McCurrach ME, Harley HG, Buckler AJ, Church D, Aburatani H, Hunter K, Stanton VP, Thirion JP, Hudson T, Sohn R, Zemelman B, Snell RG, Rundle SA, Crow S, Davies J, Shelbourne P, Buxton J, Jones C, Juvonen V, Johnson K, Harper PS, Shaw DJ, Housman DE (1992). Molecular basis of myotonic dystrophy: Expansion of a trinucleotide (CTG) repeat at the 3’ end of a transcript encoding a protein kinase family member. Cell 68:799-808.
18. Fu Y-H, Pizzuti A, Fenwick RG Jr,
King J, Rajnarayan S, Dunne PW, Dubel J, Nasser GA, Ashizawa T, de Jong P, Wieringa B, Korneluk R, Perryman MB, Epstein HF, Caskey CT (1992). An unstable triplet repeat in a gene related to myotonic muscular dystrophy. Science 255:1256-1258.
19. Mahadevan M, Tsilfidis C, Sabourin L, Shutler G, Amemiya C, Jansen G, Neville C, Narang M, Barcelo J, O’Hoy K, Leblond S, Earle-MacDonald J, de Jong PJ, Wieringa B, Korneluk RG (1992). Myotonic dystrophy mutation: an unstable CTG repeat in the 3’ untranslated region of a candidate gene. Science 255:1253-1255.
20. Oberle I, Rousseau F, Heitz D, Kretz C, Devys D, Hanauer A, Boue J, Bertheas MF, Mandel JL (1991). Instability of a 550-base pair DNA segment and abnormal methylation in fragile X syndrome. Science 252:1097-1102.
21. Yu S, Pritchard M, Kremer E, Lynch M, Nancarrow J, Baker E, Holman K, Mulley JC, Warren ST, Schlessinger D, Sutherland GR, Richards RI (1991). Fragile X genotype characterized by an unstable region of DNA. Science 252:1179-1181.
22. Fu Y-H, Kuhl DPA, Pizzuti A, Pieretti M, Sutcliffe JS, Richards S, Verkerk AJMH, Holden JJA, Fenwick RG Jr, Warren ST, Oostra BA, Nelson DL, Caskey CT (1991). Variation of the CGG repeat at the fragile X site results in genetic instability: Resolution of the Sherman paradox. Cell 67:1-20.
23. Telenius H, Kremer HPH, Theilmann J, Andrew SE, Almqvist E, Anvret M, Greenberg C, Greenberg J, Lucotte G, Squitieri F, Starr E, Goldberg YP, Hayden MR (1993). Molecular analysis of juvenile Huntington disease: the major influence on CAG repeat length is the sex of the affected parent. Hum Mol Genet 2:1535-1540.
24. Ridley RM, Frith CD, Crow TJ, Conneally PM (1988). Anticipation in Huntington’s disease is inherited through the male line but may originate in the female. J Med Genet 25:589-595.
25. Farrer LA, Cupples LA, Kiely DK, Conneally PM and Myers RH (1992). Inverse relationship between age of onset disease and paternal age suggests involvement of genetic imprinting.
Am J Hum Genet 50:528-535.
26. Stevens D and Parsonage MJ (1969). Mutation in Huntington’s chorea. J Neurol Neurosurg Psychiatry 32:140-143.
27. Denny-Brown D (1962). The Basal Ganglia: Their Relation to Disorders of Movement. Oxford University Press, London.
28. Bird ED (1978). The brain in Huntington’s chorea. Psychol Med 8:357-360.
29. Vonsattel JP, Myers RH, Stevens TJ, Ferrante RJ, Bird ED (1985). Neuropathological classification of Huntington’s disease. J Neurol Exp Neuropathol 44:549-557.
30. Graybiel AM and Ragsdale CW (1978). Histochemically distinct compartments in striatum of human, monkey and cat demonstrated by acetylthiocholinesterase staining. Proc Natl Acad Sci USA 75:5723-26.
31. Reiner A, Albin RL, Anderson KD, D’amato CJ, Penney JB, Young AB (1988). Differential loss of striatal projection neurons in Huntington’s disease. Proc Natl Acad Sci USA 85:5733-5737.
32. Tobin AJ (1989). Huntington’s disease. In Neurobiology of disease, RC Collins, AL Pearlman, eds. Oxford University Press, New York.
33. Collins FS (1992). Positional cloning: let’s not call it reverse anymore. Nature Genet 1:3-6.
34. Royer-Pokora B, Kunkel LM, Monaco AP, Goff SC, Newburger PE, Baehner RL, Cole FS, Curnutte JT, Orkin SH (1986). Cloning the gene for an inherited human disorder - chronic granulomatous disease - on the basis of its chromosomal location. Nature 322:32-38.
35. Monaco AP, Neve RL, Colletti-Feener C, Bertelson CJ, Kurnit DM, Kunkel LM (1986). Isolation of candidate cDNAs for portions of the Duchenne muscular dystrophy gene. Nature 323:646-650.
36. Friend SH, Bernards R, Rogelj S, Weinberg RA, Rapaport JM, Albert DM, Dryja TP (1986). A human DNA segment with properties of the gene that predisposes to retinoblastoma and osteosarcoma. Nature 323:643-646.
37. Rommens JM, Iannuzzi MC, Kerem BS, Drumm ML, Melmer G, Dean M, Rozmahel R (1989). Identification of the cystic fibrosis gene: Chromosome walking and jumping. Science 245:1059-1065.
38. Morton NE (1955).
Sequential tests for the detection of linkage. Am J Hum Genet 7:277-318.
39. Conneally PM and Rivas M (1980). Linkage analysis in man. In Advances in Human Genetics Vol 10, H Harris and H Hirschhorn, eds. Plenum Press, New York, pp 209-266.
40. Lathrop GM, Lalouel M, Julier C, Ott J (1985). Multilocus linkage analysis in humans: detection of linkage and estimation of recombination. Am J Hum Genet 37:482-498.
41. Tsui L-C and Estivil X (1991). Identification of disease genes on the basis of chromosomal localization. In Genome analysis vol 3: Genes and Phenotypes. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, pp 1-36.
42. Kerem B-S, Rommens JM, Buchanan JA, Markiewicz D, Cox TK, Chakravarti A, Buchwald M, Tsui L-C (1989). Identification of the cystic fibrosis gene: Genetic analysis. Science 245:1073-1080.
43. Burke DT, Carle GF, Olson MV (1987). Cloning of large segments of exogenous DNA in yeast by means of artificial chromosome vectors. Science 236:806-811.
44. Sternberg N (1990). Bacteriophage P1 cloning system for isolation, amplification and recovery of DNA fragments as large as 100 kilobase pairs. Proc Natl Acad Sci USA 87:103-107.
45. Collins J and Hohn B (1978). Cosmids: a type of plasmid gene-cloning vector that is packageable in vitro. Proc Natl Acad Sci USA 75:4242-4250.
46. Parrish JE and Nelson DL (1993). Methods for finding genes: A major rate-limiting step in positional cloning. GATA 10:29-41.
47. Lovett M, Kere J, Hinton LM (1991). Direct selection: A method for the isolation of cDNAs encoded by large genomic regions. Proc Natl Acad Sci USA 88:9628-9632.
48. Rommens J, Lin B, Hutchinson GB, Andrew S, Goldberg YP, Glaves ML, Graham R, Lai V, McArthur J, Nasir J, Theilmann J, McDonald H, Kalchman M, Clarke LA, Schappert K, Hayden MR (1993). A transcription map of the region containing the Huntington disease gene. Hum Mol Genet 2:901-907.
49. Parimoo S, Patanjali SR, Shukla H, Chaplin DD, Weissman SM (1991).
cDNA selection: Efficient PCR approach for the selection of cDNAs encoded in large chromosomal DNA fragments. Proc Natl Acad Sci USA 88:9623-9627.
50. Duyk GM, Kim S, Myers RM and Cox DR (1990). Exon trapping: A genetic screen to identify candidate transcribed sequences in cloned mammalian genomic DNA. Proc Natl Acad Sci USA 87:8995-8999.
51. Buckler AJ, Chang DD, Graw SL, Brook JD, Haber DA, Sharp PA, Housman DE (1991). Exon amplification: A strategy to isolate mammalian genes based on RNA splicing. Proc Natl Acad Sci USA 88:4005-4009.
52. Uberbacher EC, Mural RJ (1992). Locating protein-coding regions in human DNA sequences by multiple sensor-neural network approach. Proc Natl Acad Sci USA 88:11261-11265.
53. Hutchinson G and Hayden MR (1992). The prediction of exons through an analysis of spliceable open reading frames. Nucl Acids Res 20:3453-3462.
54. Myers RM, Lumelsky N, Lerman LS, Maniatis T (1985). Detection of single base substitutions in total genomic DNA. Nature 313:495-498.
55. Myers RM, Maniatis T, Lerman LS (1987). Detection and localization of single base changes by denaturing gel electrophoresis. Methods Enzymol 155:501-527.
56. Saleeba JA, Ramus SJ, Cotton RGH (1992). Complete mutation detection using unlabelled chemical cleavage. Human Mutation 1:63-69.
57. Orita M, Suzuki Y, Seikiya T, Hayashi K (1989). Rapid and sensitive detection of point mutations and DNA polymorphisms using the polymerase chain reaction. Genomics 5:874-879.
58. Wong C, Dowling CE, Saiki RK, Higuchi RG, Erlich HA, Kazazian Jr HH (1988). Characterization of beta-thalassemia mutation using direct genomic sequencing of amplified single copy DNA. Nature 330:384-387.
59. Kunkel LM, Monaco AP, Middlesworth W, Ochs HD, Latt SA (1985). Specific cloning of DNA fragments absent from the DNA of a male patient with an X chromosome deletion. Proc Natl Acad Sci USA 82:4778-4782.
60. Botstein D, White RL, Skolnick M, Davis RW (1980).
Construction of a genetic linkage map in man using restriction fragment length polymorphisms. Am J Hum Genet 32:314-331.
61. Gusella JF, Wexler NS, Conneally PM, Naylor S, Anderson RE, Tanzi RE, Watkins K, Ottina M, Wallace A, Sakaguchi A, Young I, Shoulson E, Bonilla E, Martin JB (1983). A polymorphic marker genetically linked to Huntington’s disease. Nature 306:234-238.
62. Sturtevant AH (1913). The linear arrangement of six sex-linked factors in Drosophila, as shown by their mode of association. J Exp Zool 14:43-49.
63. Landegent JE, Jansen in de Wal N, Fisser-Groen YM, Baker E, van der Ploeg M, Pearson PL (1986). Fine mapping of the Huntington disease linked D4S10 locus by nonradioactive in situ hybridization. Hum Genet 73:354-357.
64. Wang HS, Greenberg CR, Hewitt J, Kalousek D, Hayden MR (1986). Subregional assignment of the linked marker G8 (D4S10) for Huntington disease to chromosome 4p16.3. Am J Hum Genet 39:392-396.
65. MacDonald ME, Anderson MA, Gilliam TC, Tranebjaerg L, Carpenter NJ, Magenis E, Hayden MR, Healey ST, Bonner TI, Gusella JF (1987). A somatic cell hybrid panel for localizing DNA segments near the Huntington disease gene. Genomics 1:29-34.
66. Gilliam TC, Tanzi RE, Haines JL, Bonner TI, Faryniarz AG, Hobbs WJ, MacDonald ME, Cheng SV, Folstein SE, Conneally PM, Wexler NS, Gusella JF (1987). Localization of the Huntington’s disease gene to a small segment of chromosome 4 flanked by D4S10 and the telomere. Cell 50:565-571.
67. Conneally PM, Haines JL, Tanzi RE, Wexler NS, Penchaszadeh GK, Harper PS, Folstein SE, Cassiman JJ, Myers RH, Young AB, Hayden MR, Falek A, Tolosa ES, Crespi S, Di Maio L, Holmgren, Anvret M, Kanazawa I, Gusella JF (1989). Huntington’s disease: No evidence for locus heterogeneity. Genomics 5:304-308.
68. Youngman S, Sarfarazi M, Quarrell OWJ, Conneally PM, Gibbons K, Harper PS, Shaw DJ, Tanzi RE, Wallace MR, Gusella JF (1986). Studies of a DNA marker (G8) genetically linked to Huntington disease in British families.
Hum Genet 73:333-339.
69. Greenberg LJ, Martell RW, Theilmann J, Hayden MR, Joubert J (1991). Genetic linkage between Huntington disease and the D4S10 locus in South African families: further evidence against non-allelic heterogeneity. Hum Genet 87:701-708.
70. Frontali M, Malaspina P, Rossi C, Jacopini AG, Vivona G, Pergola MS, Palena A, Novelletto A (1990). Epidemiological and linkage studies on Huntington’s disease in Italy. Hum Genet 85:165-170.
71. Kanazawa I, Kondo I, Ikeda JE, Ikeda T, Shizu Y, Yoshida M, Narabayashi H, Kuroda S, Tsunoda H, Mizuta E, Okuno Y, Sugawara K, Murata M, Takahashi M, Gusella JF (1990). Studies on DNA markers (D4S10 and D4S43/S127) genetically linked to Huntington’s disease in Japanese families. Hum Genet 85:257-260.
72. Ajmar F, Mandich P, Bellone E, Abbruzzese G (1991). Genetic analysis of Huntington disease in Italy. Am J Med Genet 39:211-214.
73. Wasmuth JJ, Hewitt J, Smith B, Allard D, Haines JL, Skarecky D, Partlow, Hayden MR (1988). A highly polymorphic locus very tightly linked to the Huntington disease gene. Nature 332:734-736.
74. Pritchard CA, Cox DR, Myers RM (1991). Invited Editorial: The end in sight for Huntington disease. Am J Hum Genet 49:1-6.
75. Weber B, Riess O, Wolff G, Andrew S, Collins C, Graham R, Theilmann J, Hayden MR (1992). Delineation of a 50kb DNA segment containing the recombination site in a sporadic case of Huntington’s disease. Nature Genet 2:216-222.
76. Skraastad MI, Bakker E, de Lange LF, Vegter van der Vlis M, Klein-Breteler EG, van Ommen GJB, Pearson PL (1989). Mapping of recombinants near the Huntington disease locus by using G8 (D4S10) and newly isolated markers in the D4S10 region. Am J Hum Genet 44:560-566.
77. Allitto BA, MacDonald ME, Bucan M, Richards J, Romano D, Whaley WL, Falcone B, Ianazzi J, Wexler NS, Wasmuth JJ, Collins FS, Lehrach H, Haines JL, Gusella JF (1991). Increased recombination adjacent to the Huntington disease-linked D4S10 marker. Genomics 9:104-112.
78.
Richards JE, Gilliam TC, Cole JL, Drumm ML, Wasmuth JJ, Gusella JF, Collins FS (1988). Chromosome jumping from D4S10 (G8) toward the Huntington disease gene. Proc Natl Acad Sci USA 85:6437-6441.
79. Bates GP, MacDonald ME, Baxendale S, Youngman S, Lin C, Whaley L, Wasmuth JJ, Gusella JF, Lehrach H (1991). Defined physical limits of the Huntington disease gene candidate region. Am J Hum Genet 49:7-16.
80. Whaley WL, Bates GP, Novelletto A, Sedlacek Z, Cheng S, Romano D, Ormondroyd E, Allitto B, Lin C, Youngman S, Baxendale S, Bucan M, Altherr M, Wasmuth J, Wexler NS, Frontali M, Frischauf A-M, Lehrach H, MacDonald ME, Gusella JF (1991). Mapping of cosmid clones in Huntington’s disease region of chromosome 4. Som Cell and Mol Genet 17:83-91.
81. Snell RG, Thompson LM, Tagle DA, Holloway TL, Barnes G, Harley HG, Sandkuijl LA, MacDonald ME, Collins FS, Gusella JF, Harper PS, Shaw DJ (1992). A recombination event that redefines the Huntington disease region. Am J Hum Genet 51:357-362.
82. Barron L, Curtis A, Shrimpton AE, Holloway S, May H, Snell RG, Brock DJH (1991). Linkage disequilibrium and recombination make a telomeric site for the Huntington disease gene unlikely. J Med Genet 28:520-522.
83. MacDonald ME, Haines JL, Zimmer M, Cheng SV, Youngman S, Whaley WL, Wexler N, Bucan M, Allitto BA, Smith B, Leavitt J, Poustka A, Harper P, Lehrach H, Wasmuth JJ, Frischauf AM, Gusella JF (1989). Recombination events suggest potential sites for the Huntington disease gene. Neuron 3:183-190.
84. Robbins C, Theilmann J, Youngman S, Haines J, Altherr MJ, Harper PS, Payne C, Junker A, Wasmuth J, Hayden MR (1989). Evidence from family studies that the gene causing Huntington disease is telomeric to D4S95 and D4S90. Am J Hum Genet 44:422-425.
85. Bates GP, MacDonald ME, Baxendale S, Sedlacek Z, Youngman S, Romano D, Whaley WL, Allitto BA, Poustka A, Gusella JF, Lehrach H (1990).
A yeast artificial chromosome telomere clone spanning a possible location of the Huntington disease-linked D4S10 marker. Genomics 9:104-112.
86. Pritchard C, Zhu N, Zuo J, Bull L, Pericak-Vance MA, Vance JM, Roses A, Milatovich A, Francke U, Cox DR, Myers RM (1992). Recombination of 4p16 DNA markers in an unusual family with Huntington disease. Am J Hum Genet 50:1218-1230.
87. Pritchard C, Casher D, Bull L, Cox DR, Myers RM (1990). A cloned DNA segment from the telomeric region of human chromosome 4p is not detectably rearranged in Huntington disease patients. Proc Natl Acad Sci USA 87:7309-7313.
88. Youngman S, Bates GP, Williams S, McClatchey AI, Baxendale S, Sedlacek Z, Altherr M, Wasmuth JJ, MacDonald ME, Gusella JF, Sheer D, Lehrach H (1992). The telomeric 60 kb of chromosome arm 4p is homologous to telomeric regions on 13p, 15p, 21p, and 22p. Genomics 14:350-356.
89. Wolff G, Deuschl G, Wienker TF, Hummel K, Bender K, Lucking C, Schumacher M, Hammer J, Oepen G (1989). New mutation to Huntington’s disease. J Med Genet 26:18-27.
90. Buetow KH, Shiang R, Yang P, Nakamura Y, Lathrop GM, White R, Wasmuth JJ, Wood S, Berdahl LD, Leysens NJ, Ritty TM, Wise ME, Murray JC (1991). A detailed multipoint map of human chromosome 4 provides evidence for linkage heterogeneity and position-specific recombination rates. Am J Hum Genet 48:911-925.
91. Zuo J, Robbins C, Baharloo S, Cox DR, Myers R (1993). Construction of cosmid contigs and high-resolution restriction mapping of the Huntington disease region of human chromosome 4. Hum Mol Genet 2:889-899.
92. Bucan M, Zimmer M, Whaley WL, Poustka A, Youngman S, Allitto BA, Ormondroyd E, Smith B, Pohl TM, MacDonald ME, Bates GP, Richards J, Volinia S, Gilliam TC, Sedlacek Z, Collins FS, Wasmuth JJ, Shaw DJ, Gusella JF, Frischauf AM, Lehrach H (1990). Physical maps of 4p16.3, the area expected to contain the Huntington disease mutation. Genomics 6:1-15.
93.
McCombie WR, Martin-Gallardo A, Gocayne JD, Fitzgerald M, Dubnick M, Kelley JM, Castilla L, Liu LI, Wallace S, Trapp S, Tagle D, Whaley WL, Cheng S, Gusella J, Frischauf A-M, Poustka A, Lehrach H, Collins FS, Kerlavage AR, Fields C, Venter JC (1992). Expressed genes, Alu repeats and polymorphisms in cosmids sequenced from chromosome 4p16.3. Nature Genet 1:348-353.
94. Weber B, Riess O, Hutchinson G, Collins C, Lin B, Kowbel D, Andrew S, Schappert K, Hayden MR (1992). Genomic organization and complete sequence of the human gene encoding the β-subunit of the cGMP phosphodiesterase and its localisation to 4p16.3. Nucl Acids Res 19:6263-6268.
95. Collins C, Schappert K, Hayden MR (1992). The genomic organization of a novel regulatory myosin light chain (MYL5) that maps to chromosome 4p16.3 and shows different patterns of expression between primates. Hum Mol Genet 1:727-733.
96. Riess O, Noerremoelle A, Collins C, Mah D, Weber B, Hayden MR (1992). Exclusion of DNA changes in the β-subunit of the cGMP phosphodiesterase gene as the cause for Huntington disease. Nature Genet 1:104-108.
97. Rommens J, Lin B, Hutchinson GB, Andrew SE, Goldberg YP, Glaves ML, Graham R, Lai V, McArthur J, Theilmann J, McDonald H, Kalchman M, Clarke LA, Schappert K, Hayden MR (1993). A transcription map of the region containing the Huntington disease gene. Hum Mol Genet 2:901-907.
98. Huntington Disease Collaborative Research Group (1993). A novel gene containing a trinucleotide repeat that is expanded and unstable on Huntington’s disease chromosomes. Cell 72:971-983.
99. Andrew SE, Goldberg YP, Kremer B, Telenius H, Theilmann J, Adam S, Starr E, Squitieri F, Lin B, Kalchman MA, Graham RK, Hayden MR (1993). The relationship between trinucleotide (CAG) repeat length and clinical features of Huntington’s disease. Nature Genet 4:398-403.
100. Lin B, Rommens JM, Graham RK, Kalchman M, McDonald H, Nasir J, Delaney A, Goldberg YP, Hayden MR (1994).
Differential 3’ polyadenylation of the Huntington disease gene results in two mRNA species with variable tissue expression. Hum Mol Genet 2:1541-1545.
101. Yu S, Pritchard M, Kremer E, Lynch M, Nancarrow J, Baker E, Holman K, Mulley JC, Warren ST, Schlessinger D, Sutherland GR, Richards RI (1991). Fragile X genotype characterized by an unstable region of DNA. Science 252:1179-1181.
102. Kremer EJ, Pritchard M, Lynch M, Yu S, Holman K, Baker E, Warren ST, Schlessinger D, Sutherland GR, Richards RI (1991). Mapping of DNA instability at the fragile X to a trinucleotide repeat sequence p(CCG)n. Science 252:1711-1714.
103. Knight SJL, Flannery AV, Hirst MC, Campbell L, Christodoulou Z, Phelps SR, Pointon J, Middleton-Price HR, Barnicoat A, Pembrey ME, Holland J, Oostra BA, Bobrow M, Davies KE (1993). Trinucleotide repeat amplification and hypermethylation of a CpG island in FRAXE mental retardation. Cell 74:127-134.
104. Fu Y-H, Pizzuti A, Fenwick RG Jr, King J, Rajnarayan S, Dunne PW, Dubel J, Nasser GA, Ashizawa T, de Jong P, Wieringa B, Korneluk R, Perryman MB, Epstein HF, Caskey CT (1992). An unstable triplet repeat in a gene related to myotonic muscular dystrophy. Science 255:1256-1258.
105. Mahadevan M, Tsilfidis C, Sabourin L, Shutler G, Amemiya C, Jansen G, Neville C, Narang M, Barcelo J, O’Hoy K, Leblond S, Earle-MacDonald J, de Jong PJ, Wieringa B, Korneluk RG (1992). Myotonic dystrophy mutation: an unstable CTG repeat in the 3’ untranslated region of the gene. Science 255:1253-1255.
106. Brook JD, McCurrach ME, Harley HG, Buckler AJ, Church D, Aburatani H, Hunter K, Stanton VP, Thirion JP, Hudson T, Sohn R, Zemelman B, Snell RG, Rundle SA, Crow S, Davies J, Shelbourne P, Buxton J, Jones C, Juvonen V, Johnson K, Harper PS, Shaw DJ, Housman DE (1992). Molecular basis of myotonic dystrophy: expansion of a trinucleotide (CTG) repeat at the 3’ end of a transcript encoding a protein kinase family member. Cell 68:799-808.
107.
La Spada AR, Wilson EM, Lubahn DB, Harding AE, Fischbeck KH (1991). Androgen receptor gene mutations in X-linked spinal and bulbar muscular atrophy. Nature 352:77-79.
108. Orr HT, Chung M, Banfi S, Kwiatowski TJ Jr, Servadio A, Beadet AL, McCall AE, Duvick LA, Ranum LPW, Zoghbi HY (1993). Expansion of an unstable trinucleotide CAG repeat in spinocerebellar ataxia type 1. Nature Genet 4:221-226.
109. Nagafuchi S, Yanagisawa H, Sato K, Shirayama T, Ohsaki E, Bundo M, Takeda T, Tadokoro K, Kondo I, Murayama N, Tanaka Y, Kikushima H, Umino K, Kurosawa H, Furukawa T, Nihei K, Inoue T, Sano A, Komure O, Takahashi M, Yoshizawa T, Kanazawa I, Yamada M (1994). Expansion of an unstable CAG trinucleotide on chromosome 12p in dentatorubral pallidoluysian atrophy. Nature Genet 6:14-18.
110. Koide R, Ikeuchi T, Onodera O, Tanaka H, Igarashi S, Endo K, Takahashi H, Kondo R, Ishikawa A, Hayashi T, Saito M, Tomoda A, Miike T, Naito H, Ikuta F, Tsuji S (1994). Unstable expansion of CAG repeat in hereditary dentatorubral-pallidoluysian atrophy (DRPLA). Nature Genet 6:9-13.

CHAPTER 2 MATERIALS AND METHODS

2.1 GENETIC ANALYSIS

All family DNA used was part of the Canadian Predictive Testing program for HD or was contributed separately by HD families for research purposes. The clinical diagnosis of HD was made by a neurologist or geneticist. Clinical details of patients were obtained from extensive records, documented neurological examination, and special investigations such as computerized positron emission tomography (PET) and autopsy records.

Approximately 80% of the DNA samples banked in Vancouver are of UK or Western European descent. For family studies, alleles in unrelated family members and the canonical non-Huntington allele of affected persons were used as control chromosomes.
For linkage disequilibrium studies it was assumed that an unaffected spouse was of similar ancestry to that of the affected patient, and that allele frequencies determined from control chromosomes were an accurate estimate of the allele frequencies in the population from which the HD patients are drawn. However, the measure of disequilibrium is highly dependent on allele frequencies, and the possibility of inaccurate matching of controls to HD patients is a limitation of this type of analysis. To overcome this, homogeneous populations with confirmed ancestral similarity between controls and affecteds were analyzed in Chapter 4.

For allelic association studies, one affected chromosome was counted per pedigree, together with as many unique control chromosomes as could be determined. The number of affected individuals and controls used for analysis with each probe varied depending on the availability of DNA and informativeness with each probe. For haplotype analysis, affected and control haplotypes were constructed based on family data where phase was unequivocally determined.

2.2 DNA ISOLATION AND SOUTHERN BLOT ANALYSIS

DNA was extracted from leukocytes by standard extraction procedures [1]. For Southern blot analysis, 5 µg of genomic DNA was digested to completion with the appropriate restriction enzyme (BRL). DNA was fractionated on an agarose gel (0.7 to 1.2%) by electrophoresis, then transferred to Hybond N membranes (Amersham) [2].

Southern blots were prehybridized for 1 hour and hybridized overnight in 0.5 M sodium phosphate buffer, pH 7.2, 7% SDS and 1 mM EDTA [3] at 65°C. Blots were washed twice for 20 minutes each in 0.5X SSC, 0.1% SDS at 65°C, followed by a final stringent wash in 0.1X SSC, 0.1% SDS at 65°C for 5 minutes. Autoradiography was from overnight to 4 days.

2.3 DNA PROBES

All probes used are listed in Table 2-1. The order of the DNA markers has been previously determined and is shown in Figure 2-1 [16,17].

2.4 PREPARATION OF HYBRIDIZATION PROBES
Cloned DNA inserts were prepared as probes for hybridization as follows. DNA inserts were isolated from vector sequence by digestion with the appropriate restriction enzyme (BRL) and purification from low melting point agarose gel (BRL) slices. All DNA probes were labeled by oligolabeling [18] and purified on a G-25 spin column [19]. Probes were blocked for repetitive elements prior to hybridization by boiling for 10 minutes together with 300 µg of sonicated total human DNA, followed by preannealing at 65°C for 1 hour.

2.5 PCR PRIMERS
Oligonucleotides were synthesized on a PCR-Mate 391 DNA synthesizer (Applied Biosystems) and purified by reverse phase chromatography (Sep-Pak C18, Waters) according to Atkinson and Smith [20].

Table 2-1. List of previously published probes used in this thesis.

PROBE               LOCUS     REFERENCE NUMBER
P8                  D4S62     4
G8                  D4S10     5
pBS674E-D (674)     D4S95     6
127CA               D4S127    12
S7                  D4S43     7
pBS731B-C (731)     D4S98     8
pBS678 (678)        D4S96     8
p157-9 (157)        D4S111    9
Rs3                 D4S227    10
Ac2                 D4S227    10
BS1                 D4S133    11
cl6Dp/E2Rep (E2)    D4S228    10
cl6Dp/M4.2 (4.2)    D4S228    10
2R3                 D4S141    13

Figure 2-1. A schematic map of 4p16.3 showing approximate distances between markers (compiled from references 4 to 13). Scale: 0 to 6 Mb, from 4cen to 4pter. Markers shown: D4S62 (P8), D4S10 (G8), D4S180, D4S95 (674), D4S182, D4S127 (BJ56), D4S43 (S7), D4S98 (731), D4S168, D4S96 (678), D4S133 (BS1), D4S228 (E2 and 4.2), D4S227, D4S111 (157), D4S141 (2R3), D4S90 (D5), D4S125.

2.6 DNA SEQUENCING

2.6.1 DOUBLE STRAND SEQUENCING OF CLONED PRODUCTS
Clones were sequenced manually according to the dideoxy sequencing method (Sequenase Kit, USB).

2.6.2 SEQUENCING OF SINGLE STRAND PCR PRODUCTS
Asymmetric PCR was used to generate single-strand DNA templates from PCR products according to Sambrook [19]. Double strand PCR product was obtained after initial amplification of first strand cDNA.
The PCR fragments were then purified using Gene Clean (Bio Can 101 Inc.) and used as a template for generating single strand PCR products using a primer ratio of 100:1 for the two oligonucleotide primers for 45 cycles. Single strand asymmetric PCR products were purified by centrifugal filtration (30000 NMWL filter, Millipore) and finally sequenced using 1 pmole of the limiting primer by the dideoxy sequencing method (Sequenase Kit, USB).

2.7 PREPARATION OF cDNA TEMPLATE
First strand cDNA was prepared from previously isolated RNA from control and affected tissue. 5 µg of RNA was reverse transcribed according to the Superscript preamplification system (Amersham) with 0.5 pmole random hexamers and 0.5 pmole oligo(dT), 0.1 mM dNTPs, 10 mM DTT, 36 units RNasin, 200 units reverse transcriptase, and RT buffer (20 mM Tris-HCl (pH 8.3), 50 mM KCl, 2.5 mM MgCl2, 0.1 mg/ml BSA). First strand cDNA was diluted 1:100 and used as PCR template.

2.8 STATISTICAL ANALYSIS OF NONRANDOM ALLELIC ASSOCIATION
To determine the extent of nonrandom allelic association or linkage disequilibrium, a correlation coefficient, r, was estimated as previously defined [21,22], in which $r = D/(p_1 p_2 q_1 q_2)^{1/2}$, where $p_1$ and $p_2$ are the frequencies of the alleles of locus A, and $q_1$ and $q_2$ are the frequencies of the alleles of locus B. D is the linkage disequilibrium parameter defined as $D = P_{11} - p_1 q_1$, where $P_{11}$ is the frequency of the $A_1B_1$ haplotype. A positive value of D indicates that the two most common alleles at each locus are in association, whereas a negative value indicates that a common allele at one locus is associated with a rare allele at the other. One constraint of D as a measure of allelic association is its dependence on allele frequencies. The r value varies between +1.0 and -1.0. A chi-square test of the null hypothesis was given by $\chi^2 = Nr^2$, where N is the total number of gametes in the sample. The chi-square statistic has (m-1) x (n-1) degrees of freedom, where m and n are the numbers of alleles at the two loci.
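The definitions of D, r and the chi-square statistic above can be illustrated with a short calculation. The following is a minimal sketch in Python, assuming haplotype counts for two biallelic loci are already available; the function name, argument names and the example counts are hypothetical and are not taken from the thesis data.

```python
import math

def pairwise_ld(n11, n12, n21, n22):
    """Return (D, r, chi2) for two biallelic loci from haplotype counts.

    n11 = A1B1, n12 = A1B2, n21 = A2B1, n22 = A2B2 (hypothetical names).
    Both loci must be polymorphic, otherwise r is undefined.
    """
    n = n11 + n12 + n21 + n22            # N, the total number of gametes
    p1 = (n11 + n12) / n                 # frequency of allele A1
    q1 = (n11 + n21) / n                 # frequency of allele B1
    p2, q2 = 1.0 - p1, 1.0 - q1          # frequencies of A2 and B2
    d = n11 / n - p1 * q1                # D = P11 - p1*q1
    r = d / math.sqrt(p1 * p2 * q1 * q2) # r = D / (p1 p2 q1 q2)^(1/2)
    chi2 = n * r * r                     # chi-square = N r^2 (1 df for a 2 x 2 table)
    return d, r, chi2

# Illustrative counts only: 100 gametes with an excess of A1B1 and A2B2.
d, r, chi2 = pairwise_ld(40, 10, 10, 40)
print(d, r, chi2)  # approximately 0.15, 0.6 and 36
```

With these made-up counts D is positive, which in the terms used above means the two common alleles are found together more often than expected.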
For comparisons involving loci with more than two alleles, the most common allele at this locus was defined as one allele and the remaining alleles were pooled to form a second. Even though this approach has been taken in prior studies of linkage disequilibrium [21], it can alter the statistical power of the test. Further disequilibrium analysis was therefore undertaken, using a chi-square test for linkage disequilibrium between multiallelic loci which had previously been defined [23,24] as:

$\chi^2 = N \sum_{i=1}^{m} \sum_{j=1}^{n} D_{ij}^2 / (p_i q_j)$

In this summation approach, locus A has m alleles $A_i$, i=1,...,m, locus B has n alleles $B_j$, j=1,...,n, population frequencies of alleles $A_i$ and $B_j$ are written as $p_i$ and $q_j$, and N is defined as the total number of gametes in the sample [23,24]. $D_{ij}$ is given by $D_{ij} = P_{ij} - p_i q_j$, where $P_{ij}$ is the frequency of the $A_iB_j$ haplotype. The chi-square statistic has (m-1) x (n-1) degrees of freedom.

The Yule's association coefficient (|Q|) is another measure of the degree of allelic association that is less dependent than the D statistic on allele frequencies. This was calculated by $|Q| = |(ad-bc)/(ad+bc)|$, where a = the number of control chromosomes with allele A, b = the number of HD chromosomes with allele A, c = the number of control chromosomes with allele B and d = the number of HD chromosomes with allele B. For multi-allelic RFLPs the Yule's coefficient was calculated by using the most common allele versus pooled remaining alleles. |Q| ranges from 0 to 1, with a |Q| of 1 representing maximum allelic association. Although this coefficient dates to 1968, its use in successfully defining the minimal candidate region for CF and pinpointing the region for the CF mutation confirmed its appropriateness and reinstituted its use in allelic association studies [37].

Pairwise haplotype analysis for control chromosomes was performed by comparing the number of observed haplotypes to the number expected based on allele frequencies determined from controls.
Pairwise haplotype analysis for HD chromosomes was performed by comparing the number of observed haplotypes on HD chromosomes to those seen in controls.

2.9 STATISTICAL ANALYSIS OF CAG ANALYSIS
To examine the relationship between age of onset and CAG length, linear regression was used, with logarithmic transformation of the age of onset, allowing the treatment of an exponential function as an intrinsically linear model. The use of untransformed values yielded a regression line that crossed the x-axis at a CAG length of about 85, implying that lengths beyond that value are associated with negative age of onset, clearly supporting the use of transformed values in this analysis. Furthermore, visual inspection of the plots and examination of the residuals all pointed to the exponential model being superior to the linear model. Data on all other clinical features such as age of onset of chorea, or age of death, were similarly log-transformed. Confidence limits for prediction were calculated from the log transformed data (using standard formulas [26]) and then back-transformed to absolute ages of onset. All statistical analysis was done using Systat.

2.10 REFERENCES
1. Kunkel LM, Smith KD, Boyer SH, Borgaonkar DS, Wachtel SS, Miller OJ, Breg WR, Jones HW Jr, Rary JM (1977). Analysis of human Y chromosome specific reiterated DNA in chromosome variants. Proc Natl Acad Sci USA 74:1245-1249.
2. Southern EM (1975). Detection of specific sequences among DNA fragments separated by gel electrophoresis. J Mol Biol 98:503-517.
3. Church GM and Gilbert W (1984). Genomic sequencing. Proc Natl Acad Sci USA 81:1991-1995.
4. Hayden MR, Hewitt J, Wasmuth JJ, Kastelein JJ, Langlois S, Conneally M, Haines J, Smith B, Hubert C, Allard D (1988). A polymorphic DNA marker that represents a conserved expressed sequence in the region of the Huntington disease gene. Am J Hum Genet 42:125-131.
5. Gusella JF, Wexler NS, Conneally PM, Naylor S, Anderson RE, Tanzi RE, Watkins K, Ottina K, Wallace A, Sakaguchi A, Young I, Shoulson E, Bonilla E, Martin JB (1983). A polymorphic marker genetically linked to Huntington's disease. Nature 306:234-238.
6. Wasmuth JJ, Hewitt J, Smith B, Allard D, Haines JL, Skarecky D, Parlow E, Hayden MR (1988). A highly polymorphic locus very tightly linked to the Huntington disease gene. Nature 332:734-736.
7. Gilliam TC, Bucan M, MacDonald ME, Zimmer M, Haines JL, Cheng SV, Pohl TM (1987). A DNA segment encoding two genes very tightly linked to Huntington's disease. Science 238:950-952.
8. Smith B, Skarecky D, Bengtsson U, Magenis RE, Carpenter N, Wasmuth JJ (1988). Isolation of DNA markers in the direction of the Huntington disease gene from the G8 locus. Am J Hum Genet 42:335-344.
9. Pohl M, Zimmer M, MacDonald ME, Smith B, Bucan M, Poustka A, Volinia S (1988). Construction of a NotI library and isolation of new markers close to the Huntington disease gene. Nucl Acids Res 16:9165-9198.
10. Weber B, Hedrick A, Andrew S, Riess O, Collins C, Kowbel D, Hayden MR (1992). Isolation and characterization of new highly polymorphic DNA markers from a candidate region for the Huntington disease gene. Am J Hum Genet 50:382-393.
11. Pritchard CA, Casher D, Uglum E, Cox DR, Myers RM (1989). Isolation and field inversion gel electrophoresis analysis of DNA markers located close to the Huntington disease gene. Genomics 4:408-418.
12. Taylor SAM, Barnes GT, MacDonald ME, Gusella JF (1992). A dinucleotide repeat polymorphism at the D4S127 locus. Nucl Acids Res 1:142.
13. Snell RG, Youngman S, Lehrach H, Sarafarazi M, Harper PS, Shaw DJ (1989). A new probe (2R3) in the region of Huntington's disease. Cytogenet Cell Genet 51:1083.
14. Whaley WL, Michiels F, MacDonald ME, Romano D, Zimmer M, Smith B, Leavitt J, Bucan M, Haines JL, Gilliam TC, Zehetner G, Smith C, Cantor CR, Frischauf AM, Wasmuth JJ, Lehrach H, Gusella JF (1988). Mapping of D4S98/S114/S113 confines the Huntington's defect to a reduced physical region at the telomere of chromosome 4. Nucl Acids Res 16:11769-11780.
15. Robbins C, Theilmann J, Youngman S, Haines J, Altherr MJ, Harper PS, Payne C, Junker A, Wasmuth J, Hayden MR (1989). Evidence from family studies that the gene causing Huntington disease is telomeric to D4S95 and D4S90. Am J Hum Genet 44:422-425.
16. MacDonald ME, Lin C, Srinidhi L, Bates G, Altherr M, Youngman S, Whaley WL, Wexler N, Bucan M, Allitto BA, Smith B, Leavitt J, Poustka A, Harper P, Lehrach H, Wasmuth JJ, Frischauf AM, Gusella JF (1989). Recombination events suggest potential sites for the Huntington disease gene. Neuron 3:183-190.
17. Weber B, Collins C, Kowbel D, Riess O, and Hayden MR (1991). Identification of multiple CpG-islands and associated conserved sequences in a candidate region for the Huntington disease gene. Genomics 11:1113-1124.
18. Feinberg AP and Vogelstein B (1983). A technique for radiolabelling DNA restriction endonuclease fragments to high specific activity. Anal Biochem 132:6-13.
19. Sambrook J, Fritsch EF, Maniatis T (1989). Molecular cloning. A laboratory manual, 2nd ed. Cold Spring Harbor Press, Cold Spring Harbor, NY.
20. Atkinson T and Smith M (1984). Solid phase synthesis of oligodeoxyribonucleotides by the phosphite-triester method. In Oligonucleotide synthesis: A practical approach. Gait MJ ed. IRL Press, Oxford, pp 35-81.
21. Litt M and Jorde LB (1986). Linkage disequilibrium between pairs of loci within a highly polymorphic region of chromosome 2q. Am J Hum Genet 39:166-178.
22. Hill WG and Robertson A (1968). Linkage disequilibrium in finite populations. Theor Appl Genet 38:226-231.
23. Hill WG (1975). Linkage disequilibrium among multiple neutral alleles provided by mutation in finite populations. Theor Popul Biol 8:117-126.
24. Weir BS and Cockerham CC (1978). Testing hypothesis about linkage disequilibrium with multiple alleles. Genetics 88:633-642.
25. Yule GU and Kendall MG (1968). An introduction to the theory of statistics, 14th ed. Charles Griffin, London.
26. Draper N and Smith H (1981). Applied Regression Analysis, 2nd ed. Wiley, New York.

CHAPTER 3 NONRANDOM ALLELIC ASSOCIATION

The work presented in this chapter has contributed to two publications.

Andrew SE, Theilmann J, Hedrick A, Mah D, Weber B, Hayden MR (1992). Nonrandom allele association between Huntington disease and two loci separated by about 3Mb on 4p16.3. Genomics 13:301-311.

Weber B, Hedrick A, Andrew SE, Riess O, Collins C, Kowbel D, Hayden MR (1992). Isolation and characterization of new highly polymorphic DNA markers from a candidate region for the Huntington disease gene. Am J Hum Genet 50:382-393.

3.1 INTRODUCTION

3.1.1 ALLELIC ASSOCIATION
Nonrandom allelic association or linkage disequilibrium is the identification of nonrandom association of an allele at one locus with an allele at another locus [1]. As a measure of the deviation of allele frequencies of pairs of markers at two loci from that expected, linkage disequilibrium measures the degree of departure from equilibrium [1] and reflects disturbing forces including selection, migration, or mutation [2].

This principle has been adapted from population biology and applied to the search for disease genes, by using the disease locus as one locus and linked markers as the second locus. In these studies, nonrandom allelic association between a marker and a disease locus has been assessed by comparing allele frequencies in a group of affected patients to those of a normal control population. Since loci located closest to the site of mutation undergo recombination less frequently than those at a greater distance, they are likely to exhibit a higher degree of nonrandom allelic association. Therefore, linkage disequilibrium reflects, and exploits, the effects of recombination over many previous generations [3]. Since association is evolutionarily related to recombination, and thus distance, an increase in the measure of association as one moves along a chromosome is indicative of movement towards the disease gene.

Kan and Dozy were the first to demonstrate that an RFLP close to the β-globin gene was in strong nonrandom allelic association in American blacks with sickle cell anemia [4]. More recently, the use of linkage disequilibrium mapping has led to the refinement of the candidate regions and subsequent cloning of the genes for Friedreich ataxia [5], myotonic dystrophy [6], Wilson's disease [7], cystic fibrosis (CF) [8] and others. A classic example of the role of association data in localization of a disease gene was in CF, where increasing measures of nonrandom allelic association were used to localize the CF gene to an 800-kb region, allowing for subsequent identification of the CFTR gene from the region of the highest values of association [8]. Linkage disequilibrium was also used to reduce the likely candidate region for the diastrophic dysplasia gene to 60 kb, from a previously determined candidate region of 1.6 Mb, which had been limited by a lack of informative meioses [9].

There are several requirements for appropriate nonrandom allelic association analyses. Firstly, locus heterogeneity must not be present. Mutations in more than one gene would weaken any chance of detecting association between the disease and any one particular marker allele. Secondly, the disease must have a low mutation rate, ideally with the majority of affected chromosomes descended from one mutation on one ancestral haplotype. Multiple mutations within the same gene may have occurred on distinct chromosomal backgrounds, making it difficult to detect association.
Another important consideration is that the disease be prevalent enough to allow for collection of a sufficient number of affected individuals, in order to establish significant results.

The genetic features of HD fit the above criteria, thus allowing for the appropriate use of association studies to aid in the refinement of the candidate region. Conneally et al. showed evidence for lack of locus heterogeneity in HD [10], and the mutation rate for HD was estimated to be one of the lowest of all known human genetic diseases [11]. In addition, the collection in Vancouver of over 1000 affected individuals, from more than 500 families with HD, provided an invaluable resource for analysis.

The relationship between physical distance between markers and the amount of linkage disequilibrium is not consistent. Theoretically, disequilibrium and distance have an inverse relationship of $r^2 = 1/(1 + 4N_e c)$, where r is a standard measure of disequilibrium, $N_e$ is effective population size and c is the recombination fraction [12,13]. However, there are several reasons why this is not always the case. This only holds if $4N_e c$ is large, and therefore this may not be true in small genomic regions. In addition, the measure of disequilibrium is dependent on the rate of recombination, which is not constant across the genome [14], and the measures of disequilibrium reflect this unequal rate of recombination along a chromosome.

An examination of disequilibrium between pairs of loci on chromosome 2q demonstrated that “although a regular relationship between disequilibrium and physical distance may occur in some small chromosomal regions, it cannot necessarily be expected to exist” [15]. A reasonably uniform relationship between physical distance and disequilibrium across small genomic regions has been observed in some cases [16-22], whereas in other instances [15,23-34] no consistent relationship is maintained.
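The theoretical relationship quoted above can be made concrete with a few numbers. This is a minimal sketch in Python; the effective population size of 10,000 and the recombination fractions are illustrative assumptions chosen only to show the shape of the curve, not values used in this thesis, and the function name is hypothetical.

```python
def expected_r2(ne, c):
    """Expected r^2 = 1 / (1 + 4 * Ne * c) for effective size ne and recombination fraction c."""
    return 1.0 / (1.0 + 4.0 * ne * c)

# Assumed effective population size, for illustration only.
NE = 10_000

for c in (0.0001, 0.001, 0.01):
    # Expected disequilibrium falls as the recombination fraction grows.
    print(c, round(expected_r2(NE, c), 4))
```

Under these assumed values the expected r² drops from 0.2 at c = 0.0001 to roughly 0.0025 at c = 0.01, which illustrates why disequilibrium is expected to be detectable mainly between closely linked markers, subject to the caveats listed above.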
For comparisons involving loci with more than two alleles, testing for nonrandom allelic association can be carried out, with more than one degree of freedom, on the complete set of data. One disadvantage of multi-allelic markers is the decreased number of individuals in each category for analysis; pooling alleles is therefore one method of countering the problem of small numbers. The most common allele at one locus is defined as one allele and the remaining alleles are pooled to form a second. However, the relationship between pooling alleles and detection of nonrandom association is not clear, and pooling can alter the statistical power of the test, so that the chance of detection may or may not be decreased [35]. Another method, the summation approach, provides an additional measure of disequilibrium for multiallelic systems [35,36].

3.1.2 ALLELIC ASSOCIATION IN THE HUNTINGTON DISEASE CANDIDATE REGION
The small numbers of informative recombinant families, the discrepancies in the recombinant data, and the resulting enormity of the problem of analyzing hundreds of candidate genes argued that an alternate approach, such as the use of nonrandom allelic association, was necessary to refine the localization of the HD gene.

Early studies demonstrate that there are conflicting and confusing deviations in allele frequencies seen in different populations (Table 3-1) (Figure 3-1). Five previous studies measuring nonrandom allelic association have supported a proximal location for the HD gene [37-41]. Strong linkage disequilibrium between HD and D4S95 (AccI and MboI polymorphisms) was observed in all populations tested. Significant association was seen with other markers in this region, namely D4S180, D4S127, D4S182 and D4S43 [38].

Linkage disequilibrium with D4S98 (SstI polymorphism) was only seen in some populations, particularly those of English and Scottish descent [37,38,40]. Interestingly, association initially seen in one population was no longer observed when the sample size was increased fourfold [39,41], emphasizing the importance of a large sample size.

Analysis in an Italian population showed association between HD and proximal markers D4S10, D4S127 and D4S43, although it is possible that these results are spurious due to the small number of individuals in both cohorts [42].

Many other markers had been tested in the previous studies, such as the proximal locus D4S62 and the more telomeric loci D4S141 and D4S90; however, no association was seen in the primarily European data sets. Failure to detect nonrandom allelic association, however, does not always imply its absence, particularly when allele frequencies are unequal and HD segregates with the frequent allele, or when the sample size is small.

Thus, the patterns of linkage disequilibrium across the 4p16.3 region were complex, showing no continuous trend towards a peak measure of disequilibrium, nor a sharp increase suggestive of the region of the gene. In addition, at the time of this study, the values of association were relatively weak, suggesting that the markers used were still some distance from the site of mutation, and factors such as a significant recombination rate clouded the analysis.

Table 3-1. Published data of markers showing nonrandom allelic associations with Huntington disease.

MARKER (LOCUS): pk082(D4S10)-HindIII; (D4S180)-BamHI; BJ56(D4S127)-PvuII; (D4S182)-EcoT22; S7(D4S43)-HindII; S1.5(D4S43)-TaqI; 731(D4S98)-SstI; 674(D4S95)-MboI; BJ56(D4S127)-StuI; 674(D4S95)-AccI
POPULATION: Italian; W. European; W. European; Italian; W. European; UK; multiple origins; multiple origins; W. European; UK; multiple origins; multiple origins; Scottish; W. European; W. European; Italian; UK; multiple origins; multiple origins; Scottish; W. European
No. CONTROL CHROMOSOMES, No. HD CHROMOSOMES:
39 56 58 19 55 41 67 104 51 51 51 97 53 53 59 21 53 24 106 56 72
174 117 128 129 122 97 275 475 112 90 188 370 278 111 112 121 102 92 398 260 153
SIGNIFICANCE (P VALUE): 0.009 0.0077 0.00037 0.014 0.017 0.0018 0.005 0.0016 0.032 0.0011 0.007 0.00000 0.00095 0.012 0.034 0.021 0.00029 0.026 0.158 0.0043 0.11
REFERENCE: 42 38 38 42 38 37 41 39 38 37 41 39 40 38 38 42 37 41 39 40 38

Figure 3-1. A schematic map of 4p16.3 showing approximate distances between markers (compiled from Chapter 2, references 4 to 13). Scale: 0 to 6 Mb, from 4cen to 4pter.

The small sample sizes used in some of the previous studies necessitated re-examination in a larger data set to either confirm or refute the previous findings. The aim of this study was to analyze a large number of markers spanning the 6 Mb candidate region between D4S62 and D4S90, using a large cohort of affected individuals and controls, most of UK descent, to refine the most likely candidate regions. Furthermore, newly identified distal markers were included in this study to add to the understanding of the pattern of association across the region and aid in the localization of the HD gene.

3.2 RESULTS

3.2.1 IDENTIFICATION OF NEW DISTAL POLYMORPHIC PROBES
A 460 kb segment of contiguously overlapping DNA had been previously constructed across the distal candidate region from D4S111 to D4S228 [43]. Random fragments from one distal cosmid containing the recombination breakpoint in the sporadic patient previously mentioned were isolated and hybridized to panels of six unrelated individuals whose DNA had been digested with at least 10 restriction enzymes. Testing of approximately 10 random fragments identified 3 unique and polymorphic probes from within the distal candidate region for the HD gene (Table 3-2). BS1, a single copy 3.3 kb subclone, detects an MboI polymorphism at the previously identified D4S133 locus.
The allele frequencies have been calculated by assessment of unaffected chromosomes from families with HD as determined for nonrandom allelic association analysis. On the basis of the allele frequencies, the heterozygote frequency and the PIC were calculated for each RFLP [45] (Table 3-2). All are slightly informative, with PIC values of 0.22. E2 is likely to be an insertion/deletion polymorphism, as it is detected by several other restriction enzymes. The enzyme used for analysis in this chapter is SstI. BS1 and 4.2 appear to be due to a single base-pair substitution at their respective restriction recognition sites.

Table 3-2. Heterozygosity of three new probes for allelic association studies.

LOCUS    PROBE          NAME   ENZYME   ALLELE   SIZE (kb)   NO. OF CHROMOSOMES   ALLELE FREQUENCY   HETEROZYGOTE FREQUENCY   PIC*
D4S228   cl6Dp/E2Rep    E2     SstI     0        5.0         4                    0.006              0.24                     0.22
                                        1        4.8         571                  0.865
                                        2        4.5         17                   0.026
                                        3        3.8         68                   0.103
D4S228   cl6Dp/M4.2     4.2    SstI     A        8.0         80                   0.15               0.26                     0.22
                                        B        6.5         444                  0.85
D4S133   cl6Dp/BS3.3    BS1    MboI     A        0.8         65                   0.16               0.26                     0.22
                                        B        0.7         345                  0.84

*Source: Botstein D, White RL, Skolnick M, Davis RW (1980). Construction of a genetic linkage map in man using restriction fragment length polymorphisms. Am J Hum Genet 32:314-331.

3.2.2 NONRANDOM ALLELIC ASSOCIATION ACROSS 6 MB
A total of 134 families with HD were used for the analysis, providing 860 chromosomes from 134 affected and 313 unaffected individuals.

Control chromosomes were obtained from unaffected spouses, and from the canonical unaffected chromosome from affected patients. For this analysis it was assumed that unaffected spouses were of similar ancestry to their affected partner.

A total of 17 RFLPs detected by 13 probes were examined. All polymorphisms used were detected by probes previously characterized and are summarized in Table 3-3. Probes used for re-analysis of nonrandom association include: 2R3 (D4S141), 678 (D4S96), 731 (D4S98), 674 (D4S95), G8 (D4S10) and P8 (D4S62).
157 (D4S111) and S7 (D4S43) had not been tested on this cohort before. Probes Rs3 (D4S227) and Ac2 (D4S227), BS 1 (D4S 133), E2 (D4S228), and 4.2 (D4S228) are newly identified probes not previously examined for association with HD. A physical map showing markers and approximate distances between them is shown in Figure 3-1.  The allele frequencies for RFLPs on HD chromosomes and control chromosomes and measures of allelic association for all 17 RFLPs analyzed are shown in Table 3-4. The . The adjusted 46 Bonferroni procedure was used to adjust for the multiple comparisons significance level is z = , 17 therefore a p value of 0.003 is considered significant / 1 1-(1-cC) in this analysis.  47 Table 3-3. Polymorphic markers used in association analysis. PROBE  LOCUS  ENZYME  P8  D4S62  Hind  G8  D4S1O  HindUl  EcoRI Bgll pBS674E-D (674)  D4S95  AccI  TaqI MboI S7  D4S43  TaqI  pBS731B-C (731)  04898  SstI  pBS678 (678)  D4S96  MspI  p157-9 (157)  D4S111  PstI  Rs3  D4S227  MspI  Ac2  D4S227  AccI  BS1  D4S133  MboI  cl6Dp/E2Rep (E2)  D4S228  SstI  cl6Dp/M4.2 (4.2)  D4S228  SstI  2R3  D4S141  HindJII  ALLELE A B C D A B C 0 A B A B A B A B A B A B A B C A B A B C 0 A B 1 2 3 A B 0 1 2 3 A B A B  SIZE (kb) 3.7 4.6 4.9 5.3 13.0/3.7 13.0/4.9 11.0/3.7 11.0/4.9 11.0 9.0 2.2 2.1 6.8 1.5 2.3 1.75 1.3 0.7/0.6 3.4 2.3 2.8 2.5 10.0 1.4 1.0 1.9 2.1 2.2 2.3 0.75 0.9 2.0 1.8 1.7 0.8 0.7 5.0 4.8 4.5 3.8 8.0 6.5 2.5 1.6  ENZYME  ALLELE  AccI  Bgll  EcoRI  HindIll  A B  A B  A B  A B  A B C D  Hincil  TaqI A B  73.6 15.4 11.0 0.0 100.0 75.6 6.1 14.6 3.7 100.0 38.3 61.7 100.0 56.6 43.4 100.0 81.2 18.8 100.0 34.5 65.5 100.0 80.9 19.1 100.0 10.1 89.9 100.0 18.1 81.9 0.0 100.0  HD CHROMOSOMES % No  329 62 39 3 433 276 38 70 8 392 182 193 375 248 146 394 348 166 514 189 336 525 319 180 499 67 331 398 73 431 4 508  76.0 14.3 9.0 0.7 100.0 70.4 9.7 17.9 2.0 100.0 48.5 51.5 100.0 62.9 37.1 100.0 67.7 32.3 100.0 36.0 64.0 100.0 63.9 36.1 100.0 16.8 83.2 100.0 14.4 84.8 0.8 100.0  NON-HD 
CHROMOSOMES % No  Table 3-4. Allele frequencies for RFLPs on HD and non-HI) chromosomes. MARKER (LOCUS)  P8(D4S62)  A B C  MboI  A B  D  TaqI  G8(D4S1O)  S7(D4S43)  SstI  674(D4S95)  731(D4S98)  A B C  67 14 10 0 91 62 5 12 3 82 31 50 81 47 36 83 91 21 112 38 72 110 93 22 115 7 62 69 21 95 0 116  .079  .044  .021  1.05  2.85  .92  .23  .0043  .31  .091  .34  .63  P  .35  .13  .21  .13  .06  IQI  r  .047  8.14  .41  .11  .00053  .28  .03 12.0  .18  .11  .78 .14  1.80  .42  .077  .062  .64  .011  .032  ENZYME  Table 3-4. Continued.  MARKER (LOCUS)  ALLELE  NON-HD HD CHROMOSOMES CHROMOSOMES %  r  2 x  .11 .013 55.2 62 283 A 55.9 678(D4S96) MspI 44.8 44.1 230 49 B 100.0 513 111 100.0 .14 .018 12.3 10 12.7 45 A 157(D4S111) PstI 221 60.4 49 B 62.0 26.5 24.0 97 19 C 2 0.5 1 1.3 D 366 100.0 79 100.0 .0062 .0085 64.3 45 10 62.5 MspI A Rs3(D4S227) 35.7 25 37.5 6 B 70 100.0 16 100.0 .82 .082 56.1 55 16 AccI 1 66.7 Ac2(D4S227) 35.7 29.2 35 2 7 8.2 1 4.1 8 3 24 100.0 98 100.0 9.0 .13 15.9 4 65 4.1 A BS1(D4S133) MboI 84.1 345 94 B 95.9 410 100.0 98 100.0 .12 8.58 0.6 3 0.0 0 SstI 0 E2(D4S228) 84.7 447 105 1 95.5 2.8 15 2 1.8 2 11.9 2.7 63 3 3 528 100.0 110 100.0 .14 10.3 17.4 74 6 5.5 SstI A 4.2(D4S228) 82.6 350 94.5 103 B 424 100.0 109 100.0 .37 .038 40.8 11 95 Hindlil A 34.4 2R3(D4S141) 59.2 138 21 B 65.6 233 100.0 32 100.0 All P values calculated with 1 degree of freedom (ldf). Calculations for multi-allelic markers were done with the major allele vs. pooled minor alleles.  .54  .0013  .0034  .0027  .37  .94  .71  .74  P  .14  .57  .58  .63  .22  .04  .03  .01  IQI  50 The allele frequencies for the non-I-ID chromosomes concur with previously reported . 3 37 results 4 ’ 9 The previously reported nonrandom association between the I-ID gene and 1 the RFLPs detected with D4S95 (AccI and MboI) is confirmed in this analysis . 
4 ’ 39 1 Three newly tested probes [BS1 (D4S133), E2 and 4.2 (D4S228)] detect RFLPs that demonstrate statistically significant nonrandom association with the HD gene, as shown in Table 3-4. All three markers are tightly linked within a 30 kb region (D4S133/D4S228). All other markers tested show random association with HD (Table 3-4). The measure of disequilibrium, as determined by the significance of the r value, is consistent with Yule’s coefficient, with a greater significance level corresponding to a greater Yule’s coefficient.

For multi-allelic markers, calculations were done with the major allele vs. pooled minor alleles [15]. Pooling of alleles for linkage disequilibrium analysis may alter the power of detecting linkage disequilibrium. A second approach was therefore taken to determine linkage disequilibrium between multiallelic loci using a summation approach [35]. In this particular analysis, the power of detecting linkage disequilibrium showed some differences depending on the method used (Table 3-5). For some markers p values were increased and for others decreased. This calls for caution in interpreting data of only marginal significance with either method.

3.2.3 HAPLOTYPE ANALYSIS

Another significant concern in the interpretation of the data in Table 3-4 is that 17 tests were used, which increases the type 1 error. Taking that into account, the appropriate p value chosen for significance would be 0.003 (approximately 0.05/17). In an effort to further analyze the level of significance of these results, those regions that showed significant or marginally significant results in Table 3-4 were further analysed using haplotype analysis. In this instance, the haplotypes were analyzed for allelic association and the results are shown in Table 3-6. The haplotypes constructed by a combination of the AccI and MboI alleles detected by marker D4S95, and the haplotypes resulting from the combination of the BS1 (D4S133), E2 and 4.2 (D4S228) alleles, both show significant nonrandom association with HD, confirming the findings in Table 3-4.

Table 3-5. Comparison of 2 methods for determination of nonrandom allelic association for multi-allelic RFLPs.

                                          A) POOLED ALLELE METHOD      B) SUMMATION METHOD
MARKER (LOCUS)   ENZYME    NO. OF ALLELES   r      χ²     df   P         χ²      df   P
P8 (D4S62)       HincII    4                .021   .23    1    .63       1.18    3    .76
G8 (D4S10)       HindIII   4                .044   .92    1    .34       2.21    3    .53
731 (D4S98)      SstI      3                .032   .64    1    .42       1.89    2    .39
157 (D4S111)     PstI      4                .018   .14    1    .71       0.62    3    .89
Ac2 (D4S227)     AccI      3                .082   .82    1    .37       1.04    2    .59
E2 (D4S228)      SstI      4                .120   8.58   1    .0034     10.79   3    .013

df = degree of freedom

Table 3-6. Haplotypes of HD chromosomes and control chromosomes

674/Acci A A B B  E2 1 1 2 2 3 3 0  o  4..2 A A B B  43.3 24.4 20.6 11.7 100.0  294 33 13 140 480 85.0 1 5.0 100.0  61.3 6.9 2.7 29.2 100.0  2 84 0 0 0 0 0 0 86  84 2 86  87 1 2 18 108  2.3 97.7 0.0 0.0 0.0 0.0 0.0 0.0 100.0  97.7 2.3 100.0  80.5 0.9 1.9 16.7 100.0  2 = 22.8 x df=1 P = 0.00000  2 x  = 19.9 1 0.00001  2 = 41.0 x df =1 P = 0.00000  2 = 30.2 x df=1 P = 0.00000  df P  df P  =  =  = 13.6 df=1 P = 0.00023  B  208 117 99 56 480 210 37 247 0.4 86.8 1.1 1.4 9.3 0.4 0.0 0.6 100.0  1.2 1 0.0 0 1.2 1 84 98.6 86 100.0  2 x  4.2  4.LZ  58.7 41 .3 100.0 1 243 3 4 26 1 0 2 280  9.2 1.8 2.2 86.8 100.0  1.0 1 93 94.9 1.0 1 0.0 0 3.1 3 0.0 0 0.0 0 0.0 0 98 100.0  674/Mbol A B A B

HO VS EXPECTED VS OBSERVED OBSERVED EXPECTED OBSERVED OBSERVED HO CONTROL CONTROL HAPLOTYPE LOCUS CONTROL CONTROL CHROMOSOMES CHROMOSOMES CHROMOSOMES % % # % D4S95  .l 145 1 02 247 13.6 71.2 0.4 2.1 1.9 10.0 0.1 0.5 100.0 25 5 6 236 272  1.0 85.0 1.0 1.0 9.3 2.1 0.0 0.5 100.0  df P  =  = 24.1 1 0.00000 =  =  dt P  =  =  = 5.9 1 .015  = 7.1 df=1 P = 0.0078  =  = 7.1 1 0.0078  =  = 8.7 I 0.0032  38 199 3 7 5 28 0 1 280 2.8 14.6 13.1 69.5 100.0  4 329 4 4 36 8 0 2 387  =  A B A B A B A B  8 40 36 189 272  14.7 70.0 0.5 2.3 2.1 9.8 0.1 0.5 0.0 A B A B A B A B  df P  A B A B  57 271 2 9 8 38 0 2 387  1 B other haplotypes  D4S228  D4S228  D4S228  D4S228 1 1 2 2 3 3 0 0

calculated with the major haplotype vs. other haplotypes pooled

In addition, to investigate nonrandom association between markers within each cluster, pairwise haplotypes were analyzed (Table 3-6). The analysis of haplotypes demonstrates that the AccI and MboI RFLPs at D4S95 are in very tight association with each other on HD and non-HD chromosomes, as might be expected given their physical proximity to one another. Additionally, the three RFLPs clustered within 30 kb around D4S228 and D4S133 are also in strong association with each other on HD and non-HD chromosomes.

In order to determine whether the two loci demonstrating association with HD are also in association with each other, haplotypes of the two loci were analyzed from both HD and control chromosomes (Table 3-7). It is noteworthy that 674 (D4S95, AccI and MboI) is in strong nonrandom association with the RFLPs detected at loci D4S228 and D4S133 on HD chromosomes but not on control chromosomes.
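The observed-versus-expected comparisons for control chromosomes rest on the usual expectation that, in the absence of association, a two-locus haplotype occurs at the product of its allele frequencies. The Python sketch below illustrates that calculation only. The function name and the allele frequencies are hypothetical and are not values taken from Tables 3-6 or 3-7, and the exact procedure used to derive the tabulated expected counts is assumed rather than stated in the text.

```python
from itertools import product


def expected_haplotype_counts(freqs_locus1, freqs_locus2, n_chromosomes):
    """Expected two-locus haplotype counts if alleles at the two loci associate at random.

    freqs_locus1, freqs_locus2: dicts mapping allele name -> allele frequency.
    n_chromosomes: number of chromosomes typed for both loci.
    """
    expected = {}
    for (allele1, p1), (allele2, p2) in product(freqs_locus1.items(), freqs_locus2.items()):
        # Under random association the haplotype frequency is the product p1 * p2.
        expected[(allele1, allele2)] = n_chromosomes * p1 * p2
    return expected


# Hypothetical allele frequencies for two biallelic RFLPs (not thesis data).
locus1 = {"A": 0.7, "B": 0.3}
locus2 = {"A": 0.2, "B": 0.8}
expected = expected_haplotype_counts(locus1, locus2, n_chromosomes=200)

for haplotype, count in sorted(expected.items()):
    print(haplotype, round(count, 1))
```

Observed haplotype counts that depart markedly from these expectations, for example an excess of one haplotype on HD chromosomes, are what the χ² comparisons in the tables are testing.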
3.2.4 ANALYSIS OF HOMOGENEOUS CHROMOSOMES

The families in this study represent many different ancestries, and measures of nonrandom association may be affected by the potential for multiple origins of the HD mutation on chromosomes with different haplotypes. Allelic counts using only canonical non-HD chromosomes for controls were done to attempt to adjust for effects from different populations. For markers tested with a sufficient number of individuals for analysis in this manner, the measures of nonrandom allelic association are shown in Table 3-8. Although the significance measured by the p value has decreased overall, owing to the smaller data set, significant nonrandom association is still present for the RFLP at D4S228 (E2, SstI). The significance of the association (p value) and the measure of association as calculated by Yule’s coefficient (|Q|) are both shown, confirming the presence of two physically distinct regions in nonrandom association with the HD gene.

Table 3-7. Haplotypes between RFLPs at D4S95 and D4S228/D4S133 for HD and Non-HD chromosomes.

BS1 B B A A 4.2 B B A A E2 1 1 3 3 2 2 0 0 BS1 B B A A  61 16 3 1 81  OBSERVED HD CHROMOSOMES %  58.8 28.8 10.9 1.5 100.0  OBSERVED CONTROL CHROMOSOME % 161 79 30 4 274  OBSERVED VS EXPECTED (CONTROLS)  HD VS OBSERVED CONTROLS  2 = 6.6 x df= 1 P = 0.010 = 0.12 df= 1 P = 0.73  56.9 27.2 10.8 5.1 100.0 70 16 3 0 89  75.3 20.8 3.7 1.2 100.0  156 74 30 14 274 58.0 29.2 10.9 1.9 100.0  76.8 19.2 2.0 1.0 1.0 0.0 0.0 0.0 100.0  = 12.2 df=1 P=0.00048  218 110 41 7 376  76 19 2 1 1 0 0 0 99  = 0.27 df=1 P=0.61  55.9 26.7 11.8 5.6 100.0  55.5 29.5 9.2 2.4 1.8 1.1 0.5 0.0 100.0  72.6 20.2 3.6 1.2 100.0  77.5 17.9 3.4 1.1 100.0  210 101 44 21 376 211 112 35 9 7 4 2 0 380  61 17 3 1 84  = 13.9 2 x df= 1 P = 0.00019  57.3 27.4 8.1 3.8 1.9 0.9 0.4 0.2 100.0  55.6 32.6 9.8 2.2 100.0  O19 2 df= 1 P = 0.66  218 104 31 14 7 3 2 1 380  155 91 27 6 279  49 df=1 P = 0.0077  53.7 30.4 10.2 5.7 100.0  = 0.12 df=1 P = 0.73  150 85 28 16 279  EXPECTED CONTROL CHROMOSOMES % #

HAPLOTYPE 674/AccI A B A B 674/AccI A B A B 674/AccI A B A B A B A B 674fMboI A B A B

Table 3-7 Continued.

OBSERVED OBSERVED EXPECTED HD CONTROL HAPLOTYPE CONTROL CHROMOSOMES CHROMOSOMES CHROMOSOMES % % % 4.2 674/MboI 75.3 B 197 54.5 204 70 A 52.8 20.4 19 111 B 124 33.2 29.8 B 42 3.2 11.1 9.1 A 34 3 A 1 1.1 24 3.2 A 12 B 6.3 100.0 374 100.0 374 93 100.0 E2 674fMboI 74.0 54.1 71 A 1 203 209 52.6 1 126 21.9 21 B 118 30.6 32.6 A 29 2.1 3 30 2 7.6 7.8 18 14 4.7 B 1.0 3 1 3.6 2 1 1.0 1.8 A 7 6 1.5 1.0 B 4 0.0 2 5 0 1.3 A 1 0.0 0 0.0 0 0 0.3 1 B 1 0.0 0 0 0.2 0.3 100.0 96 100.0 100.0 386 386 df = degree of freedom calculated with major haplotype vs. pooled other haplotypes  2 x  = 12.4 df=1 P = 0.00044  HD VS. OBSERVED CONTROLS  0.19 df=1 P = 0.66  = 13.5 df=1 P = 0.00024  OBSERVED VS. EXPECTED CONTROLS  = 0.13 df=1 P = 0.72

Table 3-8. Allele frequencies for RFLPs on HD and canonical non-HD chromosomes.

MARKER P8 (D4S62)  08 (D4SIO)  674 (D4S95)  731 (D4S98)  TaqI  AccI  BgII  EcoRI  Hindlil  HincIl  A B  A B  A B  A B  A B  A B C D  A B C D  73.6 15.4 11.0 0.0 100.0 75.6 6.1 14.6 3.7 100.0 38.3 61.7 100.0 56.6 43.4 100.0 81.2 18.8 100.0 34.5 65.5 100.0 80.9 19.1 100.0 18.1 81.9 0.0 100.0  72 9 10 0 91 53 9 16 0 78 42 41 83 48 35 83 87 28 115 42 66 108 78 36 114 13 99 2 114  # 79.1 9.9 11.0 0.0 100.0 68.0 11.5 20.5 0.0 100.0 50.6 49.4 100.0 57.8 42.2 100.0 75.6 24.4 100.0 38.8 61.1 100.0 68.4 31.6 100.0 11.4 86.8 1.8 100.0  %  CANONICAL HD NON-HD CHROMOSOMES CHROMOSOMES 67 14 10 0 91 62 5 12 3 82 31 50 81 47 36 83 91 21 112 38 72 110 93 22 115 21 95 0 116  0.13  0.06  0.42  0.02  0.12  0.080  0.056  1.5  3.9  0.81  4.0  0.07  2.4  1.02  0.57  0.22  0.05  0.37  0.046  0.79  0.12  0.31  0.45  P  .19  .32  .09  .17  .03  .25  .19  0.15  IQI  r  0.08  MboI  ENZYME ALLELE  SstI  A B C

Table 3-8. Continued.

CANONICAL P r NON-ND ND ENZYME ALLELE MARKER CHROMOSOMES CHROMOSOMES % # % # 0.24 1.4 0.08 46.7 50 55.9 62 A MspI 678 (D4S96) 53.3 57 44.1 49 B 107 100.0 111 100.0 .61 0.26 0.04 12 15.6 10 12.7 A 157 (D4SIII) PstI 46 59.7 49 62.0 B 24.7 19 24.0 19 C 0.0 0 1.3 1 D 77 100.0 79 100.0 0.34 4.5 0.15 12 12.0 4.1 4 A BS1 (D4S133) MboI 88.0 88 94 B 95.9 100 100.0 98 100.0 0.0027 9.0 0.20 0.0 0 0.0 A 0 SstI E2 (D4S228) 85.2 98 105 95.5 B 1 0.9 1.8 2 C 13.9 16 2.7 D 3 115 100.0 110 100.0 0.027 4.9 0.15 15.0 16 A 5.5 6 SstI 4.2 (D4S228) 91 85.0 103 94.5 B 107 100.0 109 100.0  IQI  .18  .05  .52  .57  .50

All P values calculated with one degree of freedom. Calculations for multi-allelic markers were done with the major allele vs. pooled minor alleles.

Another means of examining the strength of nonrandom allelic association within a more homogeneous population was to study families of United Kingdom (UK) origin separately. Markers demonstrating significant nonrandom association [674 (D4S95), BS1 (D4S133), E2 and 4.2 (D4S228)] were re-examined after categorization of the origin of HD in each pedigree, and these data are presented in Table 3-9. Pedigrees where the ancestry of the HD chromosome was unknown were not included in the analysis. Although the statistical significance of the data is lost, likely owing to the smaller sample sizes, the measure of nonrandom association as determined by Yule’s coefficient is greater for this more homogeneous population. The maximum Yule’s coefficient determined in this study was obtained with the UK group tested with 4.2 (D4S228) (|Q| = .78).

3.3 DISCUSSION

Three polymorphic probes from D4S228 and D4S133 were identified and cloned, which allowed allelic association analysis in a previously untested region of 4p16.3.
The focus on 4p16.3 has resulted in the identification of genes such as the α-L-iduronidase gene [47], the β-subunit of the cyclic GMP phosphodiesterase gene [48], and the myosin light chain [49], and these markers may prove useful in future linkage studies for other diseases. In addition, in presymptomatic testing, the informativeness of an analysis depends on the frequency of heterozygosity of the DNA markers tested as well as on the family structure, and these markers may be useful for predictive testing for diseases caused by genes in 4p16.3.

The most important finding of this study is the discovery of significant nonrandom association between alleles for BS1 (D4S133), E2 (D4S228), 4.2 (D4S228) and HD, which are separated by approximately 3 Mb from the site of previously identified nonrandom association detected by D4S95 (Fig. 3-1). In addition, previously reported association with alleles for AccI and MboI polymorphisms at D4S95 was confirmed with this larger data set. All other marker alleles tested were shown to be in random association with the HD gene. Therefore, two physically distinct regions (D4S133/D4S228 and D4S95), both containing markers in significant nonrandom association with the HD gene, have been identified.

Table 3-9. Allele frequencies and percentages for RFLPs on HD and non-HD chromosomes based on ancestry of the HD chromosome.

HD NON-HD MARKER ENZYME ANCESTRY ALLELE CHROMOSOMES CHROMOSOMES r (LOCUS) % AccI 81.3 148 0.12 674 (D4S95) UK A 39 68.5 B 18.7 31.5 9 68 100.0 216 48 100.0 Non-UK A 25 102 65.0 0.08 73.5 B 26.5 9 55 35.0 34 100.0 157 100.0 0.17 Mbol UK A 82.2 126 37 60.6 B 17.8 82 8 39.4 45 100.0 208 100.0 Non-UK A 25 98 0.044 73.5 68.5 26.5 45 31.5 B 9 100.0 34 143 100.0 2.8 BS1 (D4S133) MboI UK A 1 32 15.2 0.13 97.2 178 84.8 B 35 100.0 210 36 100.0 Non-UK A 2 6.2 16.5 0.10 33 167 B 30 93.8 83.5 32 100.0 200 100.0 SstI 2 E2 (D4S228) UK A 0 0.0 0.75 0.12 230 B 43 97.7 85.8 2 0 0.0 0.75 C D 1 2.3 34 12.7 44 100.0 268 100.0 Non-UK A 0.0 1 0.4 0.09 0 B 29 93.6 217 83.5 1 3.2 13 C 5.0 29 11.1 D 1 3.2 260 100.0 31 100.0 4.2 (D4S228) SstI 1 2.3 31 16.4 0.15 UK A B 42 158 97.7 83.6 100.0 189 100.0 43 43 18.3 Non-UK A 3 8.8 0.088 B 31 91.2 192 81.7 100.0 235 100.0 34

2.2  4.5  2.2  3.8  0.34  7.3  1.2  3.8  0.022  0.14  0.034  0.14  0.051  0.56  0.0069  0.27  0.051  .40  .78  .48  .75  .50  .73  .12  .50  .20  .33  IQI  5.2  0.15  P  2.1

All p values calculated with one degree of freedom. Calculations for multi-allelic markers were done with the major allele vs. pooled minor alleles.

It should be noted that although reasonable steps to perform a rigorous statistical analysis have been taken, these statistics may be altered by varying frequencies of alleles in controls from different populations. In order to examine a more homogeneous population, families with an HD allele of UK origin were analyzed separately, confirming the presence of two regions in linkage disequilibrium with HD. In addition, analysis using HD chromosomes, with the patient’s canonical chromosome from the unaffected parent as control, also showed two genomic regions in disequilibrium with HD. However, the assumption that controls are of similar ancestry to that of their affected partner may not be accurate, and measures of the differences in allele frequencies between HD and controls may therefore not reflect the true state. Thus, the inability to accurately control for the ancestry of the control chromosomes may be reflected in the measures of disequilibrium.

Nonrandom association between two loci, separated by markers which do not show significant association, has been previously reported. However, the distance between these loci has been relatively small.
For example, RFLPs from the Apo AI-CIII-AIV gene cluster, which are between 4 and 23 kb apart, are in strong linkage disequilibrium with each other but in equilibrium with RFLPs between them [23,29]. Similar findings were reported for the β-globin gene cluster, which contains two clusters of RFLPs separated by 9 kb, with disequilibrium between RFLPs within each cluster but with no significant disequilibrium between the two clusters in a normal population [50]. The findings of this manuscript reveal two loci, separated by 3 Mb of DNA, each showing nonrandom association with a third locus, in this case the HD gene. In addition, haplotype analysis demonstrated that the two clusters of markers are in significant allelic association with each other on HD but not control chromosomes.

The measures of association between D4S95 and D4S133/D4S228 show that alleles at the two loci, separated by 3 Mb, are in strong association with HD, and in association with each other on affected but not control chromosomes. If these results are not a statistical artifact, the reasons for two regions of disequilibrium separated by a large genomic distance containing additional markers not in disequilibrium with the HD gene are unknown. In addition, the significance of markers in the two distinct regions being in disequilibrium with each other on HD chromosomes, but not normal chromosomes, also remains unclear. These questions remain unanswered despite the identification and localization of the CAG trinucleotide repeat whose expansion is associated with HD.

The localization of the mutation associated with HD has allowed retrospective analysis of these data and reassessment of previous association data. The strong linkage disequilibrium between HD and D4S95 (AccI and MboI polymorphisms) observed in all populations tested is not surprising given its close proximity to the HD mutation (120 kb).
However, the marker D4S127 demonstrated slightly weaker association with HD [38] despite being only 30 kb from the CAG repeat in this region. This may be due to the different nature of the polymorphism, with a multi-allelic system such as D4S127 having a higher mutation rate that makes association more difficult to identify, or may be a reflection of the population tested. The weak association seen with markers D4S180 and D4S182 [38] is in keeping with their location 250 kb centromeric and 300 kb respectively to the HD mutation. The associations observed between D4S43, D4S98 and D4S133/D4S228 and HD in several different populations are still unexplained, as these markers are located 900 kb, 1400 kb and 3 Mb, respectively, from the CAG repeat.

It is possible that expansion of the CAG repeat to the Huntington range is a relatively recent event, and that disequilibrium between two distal regions is the result of infrequent recombination between these two loci on Huntington chromosomes compared to normal chromosomes. This is consistent with a prior finding of a low recombination rate in this region on affected but not control chromosomes [51]. However, the tracing of families with HD even to the 17th century [11] is more in keeping with older origins for the HD gene. Other as yet undefined selective pressures might also be influencing these findings. There may be some unknown slight selective advantage of association of alleles for all chromosomes, which is seen more strongly on affected chromosomes than control chromosomes because of a small number of ancestral Huntington chromosomes.

Specific sequences play a role in expansion of the CAG repeat, and there may be some association between distal sequences and repeat expansion. For example, in SCA, sequence specificity, in particular the homogeneity of the repeat sequence, is associated with expansion to disease state. The occurrence of specific DM and FRAXA haplotypes also suggests that sequence differences may be very important [52]. If sequences distally located are involved in the instability of the CAG repeat on HD chromosomes, this may account for the associations observed distally on HD chromosomes.

One hypothesis prior to the cloning of the HD gene was that the causative gene may be extremely large, with exons spanning 3 Mb. The gene associated with the CAG repeat expanded on HD chromosomes is large, spanning over 200 kb; however, it is localized between D4S180 and D4S127. The possibility, however, that an important regulatory sequence on the HD chromosome, physically separated from the HD gene by an extensive distance, could account for the distal region of disequilibrium has yet to be definitively excluded.

Another hypothesis explaining two sites of association with HD is that the disease is the result of mutations in two independent but functionally related genes 3 Mb apart, with mutations at each location responsible for manifestation of the disease. This is unlikely, however, as CAG expansion occurs in 99% of affected individuals and linkage disequilibrium with a second locus would be undetectable with the small number of individuals not demonstrating CAG expansion.

The finding of linkage disequilibrium between two regions separated by 3 Mb, together with the absence of linkage disequilibrium between other markers in closer proximity to these markers at D4S95, D4S228 and D4S133, again raises concerns about the use of linkage disequilibrium in determining locus order. It has previously been suggested that loci located more distantly from one another may undergo recombination more frequently and therefore would be expected to demonstrate less linkage disequilibrium than more closely linked loci [12]. However, the findings of this study would suggest that no definite relationship is predictably found between physical distance and modest measures of linkage disequilibrium.
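The textbook expectation behind the suggestion that more distant loci should show less disequilibrium is that, under random mating and with no other forces acting, the disequilibrium coefficient D shrinks by a factor of (1 - θ) per generation, where θ is the recombination fraction between the loci. The sketch below is illustrative only. The function name is hypothetical, and the recombination fractions assume a rough rule of thumb of 1 cM per Mb, which is an assumption introduced here and one that may not hold in this region of chromosome 4.

```python
def ld_after_generations(d0, theta, generations):
    """Expected disequilibrium D after a number of generations of random mating.

    d0: initial disequilibrium coefficient.
    theta: recombination fraction between the two loci (0 to 0.5).
    Drift, mutation, selection and admixture are all ignored.
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("recombination fraction must lie between 0 and 0.5")
    return d0 * (1.0 - theta) ** generations


# Hypothetical recombination fractions, assuming roughly 1 cM per Mb.
examples = {"120 kb": 0.0012, "3 Mb": 0.03}
for label, theta in examples.items():
    remaining = ld_after_generations(1.0, theta, generations=20)
    print(f"{label}: fraction of initial D remaining after 20 generations = {remaining:.2f}")
```

Under these simplified assumptions a marker 3 Mb away would retain noticeably less of its initial disequilibrium than one 120 kb away, which is the expectation the observed pattern fails to follow.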
It has previously been demonstrated that this region of chromosome 4 might have decreased recombination [51], and lack of a constant recombination rate across the region may interfere with the relationship between distance and measure of allelic association. Other factors such as mutation, admixture and drift may also disturb the expected relationship between physical distance and degree of linkage disequilibrium. These findings suggest that a linear relationship between disequilibrium and physical distance may not exist in this chromosomal region in the population tested.

In conclusion, it has previously been cautioned that measures of disequilibrium cannot be correlated with physical distance between loci [30]. The two regions of nonrandom association shown in this study made it difficult to define precisely a location for the HD gene. Since the identification of the expanding CAG trinucleotide repeat in a novel gene, 120 kb from D4S95, the two regions of disequilibrium are as yet unexplained. Unless the results are an artifact due to an inaccurate assumption that the control chromosomes reflect the population from which the HD chromosomes were derived, the most likely hypothesis to account for the findings in this manuscript is that other factors, including selection and mutation, are acting in an as yet unknown fashion.

3.4 REFERENCES

1. Lewontin RC and Kojima KI (1960). The evolutionary dynamics of complex polymorphisms. Evolution 14:458-472.
2. Weir BS (1990). Genetic Data Analysis. Sinauer, Sunderland, MA.
3. Lewontin RC (1974). The genetic basis of evolutionary change. Columbia University Press, New York.
4. Kan YW and Dozy AM (1978). Polymorphism of DNA sequence adjacent to human beta globin structural gene: relationship to sickle mutation. Proc Natl Acad Sci USA 75:5631-5635.
5. Fujita R, Hanauer A, Sirugo G, Heilig R, Mandel JL (1990).
Additional polymorphisms at marker loci D9S5 and D9S15 generate extended haplotypes in linkage disequilibrium with Friedreich ataxia. Proc Natl Acad Sci USA 87:1796-1800.
6. Harley HG, Brook JD, Floyd J, Rundle SA, Crow S, Walsh KV, Thibault MC, Harper PS, Shaw DJ (1991). Detection of linkage disequilibrium between the myotonic dystrophy locus and a new polymorphic DNA marker. Am J Hum Genet 49:68-75.
7. Thomas GR, Roberts EA, Rosales TO, Moroz SP, Lambert MA, Wong LTK, Cox DW (1993). Allelic association and linkage studies in Wilson disease. Hum Mol Genet 2:1401-1405.
8. Kerem BS, Rommens JM, Buchanan JA, Markiewicz D, Cox TK, Chakravarti A, Buchwald M, Tsui L-C (1989). Identification of the cystic fibrosis gene: genetic analysis. Science 245:1073-1080.
9. Hastbacka J, de la Chapelle A, Kaitila I, Sistonen P, Weaver A, Lander E (1992). Linkage disequilibrium mapping in isolated founder populations: diastrophic dysplasia in Finland. Nature Genet 2:204-211.
10. Conneally PM, Haines JL, Tanzi RE, Wexler NS, Penchaszadeh GK, Harper PS, Folstein SE, Cassiman JJ, Myers RH, Young AB, Hayden MR, Falek A, Tolosa ES, Crespi S, Maio L, Holmgren G, Anvret M, Kanazawa I, Gusella JF (1989). Huntington disease: No evidence for locus heterogeneity. Genomics 5:304-308.
11. Hayden MR (1981). Huntington’s chorea. Springer-Verlag, New York.
12. Hill WG and Robertson A (1968). Linkage disequilibrium in finite populations. Theor Appl Genet 38:226-231.
13. Sved JA (1971). Linkage disequilibrium and homozygosity of chromosome segments in finite populations. Theor Popul Biol 2:125-141.
14. Steinmetz M, Uematsu Y, Lindahl KF (1987). Hotspots of homologous recombination in mammalian genomes. Trends Genet 3:7-10.
15. Litt M and Jorde B (1986). Linkage disequilibrium between pairs of loci within a highly polymorphic region of chromosome 2q. Am J Hum Genet 39:166-178.
16.
Bech-Hansen NT, Linsley PS, Cox DW (1983). Restriction fragment length polymorphisms associated with immunoglobulin Cγ genes reveal linkage disequilibrium and genomic organization. Proc Natl Acad Sci USA 80:6952-6956.
17. Aschbacher A, Buetow K, Chung D, Walsh S, Murray J (1985). Linkage disequilibrium of RFLP’s associated with the α, β, and γ fibrinogen predict gene order on chromosome 4. Am J Hum Genet Suppl 37:A186.
18. Chakravarti A, Elbein SC, Permutt MA (1986). Evidence for increased recombination near the human insulin gene: implications for disease association studies. Proc Natl Acad Sci USA 83:1045-1049.
19. Chakraborty R, Lidsky AS, Daiger SP, Guttler F, Sullivan S, DiLella AG, Woo SLC (1987). Polymorphic DNA haplotypes at the human phenylalanine hydroxylase locus and their relationship with phenylketonuria. Hum Genet 76:40-46.
20. Daiger SP, Chakraborty R, Reed L, Fekete G, Schuler D, Berenssi G, Nasz I, Brclicka R, Kamaryt J, Pijackova A, Moore S, Sullivan S, Woo SLC (1989). Polymorphic DNA haplotypes at the phenylalanine hydroxylase (PAH) locus in European families with phenylketonuria (PKU). Am J Hum Genet 45:310-318.
21. Leitersdorf E, Chakravarti A, Hobbs HH (1989). Polymorphic DNA haplotypes at the LDL receptor locus. Am J Hum Genet 44:409-421.
22. Elbein SC (1992). Linkage disequilibrium among RFLPs at the insulin-receptor locus despite intervening Alu repeat sequences. Am J Hum Genet 51:1103-1110.
23. Thompson EA, Deeb S, Walker D, Motulsky AG (1988). The detection of linkage disequilibrium between closely linked markers: RFLPs at the AI-CIII apolipoprotein genes. Am J Hum Genet 42:113-124.
24. Barker D, Holm T, White R (1984). A locus on chromosome 11p with multiple restriction site polymorphisms. Am J Hum Genet 36:1159-1171.
25. Chakravarti A, Phillips JA, Mellits KH, Buetow KH, Seeburg PH (1984). Patterns of polymorphism and linkage disequilibrium suggest independent origins of the human growth hormone gene cluster. Proc Natl Acad Sci USA 81:6085-6089.
26. Borrensen AL, Moller P, Berg K (1988). Linkage disequilibrium analyses and restriction mapping of four RFLPs at the proα2(I) collagen locus: lack of correlation between disequilibrium and physical distance. Hum Genet 78:216-221.
27. Tzall S, Ellenbogen A, Eng F, Hirschhorn R (1989). Identification and characterization of nine RFLPs at the adenosine deaminase (ADA) locus. Am J Hum Genet 44:864-875.
28. Hegele RA, Plaetke R, Lalouel JM (1990). Linkage disequilibrium between DNA markers at the low-density lipoprotein receptor gene. Genet Epidemiol 7:69-81.
29. Benlian P, Boileau C, Loux N, Pastier D, Masliah J, Coulson M, Nigou M, Ragab A, Guimard J, Ruidavets JB, Bonaiti-Pellie C, Fruchart JC, Douste-Blazy P, Bereziat G, Junien C (1991). Extended haplotypes and linkage disequilibrium between 11 markers at the APOA1-C3-A4 gene cluster on chromosome 11. Am J Hum Genet 48:903-910.
30. Haviland MB, Kessling AM, Davignon J, Sing CF (1991). Estimation of Hardy-Weinberg and pairwise disequilibrium in the apolipoprotein AI-CIII-AIV gene cluster. Am J Hum Genet 49:350-365.
31. Walter MA and Cox DW (1991). Nonuniform linkage disequilibrium within a 1,500-kb region of the human immunoglobulin heavy-chain complex. Am J Hum Genet 49:917-931.
32. Zerba KE, Kessling AM, Davignon J, Sing CF (1991). Genetic structure and the search for genotype-phenotype relationships: an example from disequilibrium in the Apo B gene region. Genetics 129:525-533.
33. Miserez AR, Schuster H, Chiodetti N, Keller U (1993). Polymorphic haplotypes and recombination rates at the LDL receptor gene locus in subjects with and without familial hypercholesteremia who are from different populations. Am J Hum Genet 52:808-826.
34. Jorde LB, Watkins WS, Viskochil D, O’Connell P, Ward K (1993). Linkage disequilibrium in the neurofibromatosis 1 (NF1) region: Implications for gene mapping. Am J Hum Genet 53:1038-1050.
35. Weir BS and Cockerham CC (1978).
Testing hypotheses about linkage disequilibrium with multiple alleles. Genetics 88:633-642.
36. Hill WG (1975). Linkage disequilibrium among multiple neutral alleles provided by mutation in finite populations. Theor Popul Biol 8:117-126.
37. Snell RG, Larazou L, Youngman S, Quarrell OWJ, Wasmuth JJ, Shaw DJ, Harper PS (1989). Linkage disequilibrium in Huntington’s disease: an improved localization for the gene. J Med Genet 42:673-675.
38. MacDonald ME, Lin C, Srinidhi L, Bates G, Altherr M, Whaley WL, Lehrach H, Wasmuth J, Gusella JF (1991). Complex patterns of linkage disequilibrium in the Huntington disease region. Am J Hum Genet 49:723-734.
39. Adam S, Theilmann J, Buetow K, Hedrick A, Collins C, Weber B, Huggins M, Hayden M (1991). Linkage disequilibrium and modification of risk for Huntington disease. Am J Hum Genet 48:595-603.
40. Barron L, Curtis A, Shrimpton AE, Holloway S, May H, Snell RG, Brock DJH (1991). Linkage disequilibrium and recombination make a telomeric site for the Huntington’s disease gene unlikely. J Med Genet 28:520-522.
41. Theilmann J, Kanani S, Shiang R, Robbins C, Quarrell O, Huggins M, Hedrick A, Weber B, Collins C, Wasmuth JJ, Buetow KH, Murray JC, Hayden MR (1989). Nonrandom allelic association between alleles detected at D4S95 and D4S98 and the Huntington disease gene. J Med Genet 26:676-681.
42. Novelletto A, Mandich P, Bellone E, Malaspina P, Vivona G, Ajmar F, Frontali M (1991). Non-random association between DNA markers and Huntington disease locus in the Italian population. Am J Med Genet 40:374-376.
43. Weber B, Collins C, Kowbel D, Riess O, Hayden MR (1991). Identification of multiple CpG-islands and associated conserved sequences in a candidate region for the Huntington disease gene. Genomics 11:1113-1124.
44. Pritchard CA, Casher D, Uglum E, Cox DR, Myers RM (1989). Isolation and field-inversion gel electrophoresis analysis of DNA markers close to the Huntington disease gene. Genomics 4:408-418.
45.
Botstein D, White RL, Skolnick M, Davis RW (1980). Construction of a genetic linkage map in man using restriction fragment length polymorphisms. Am J Hum Genet 32:314-331.
46. Weir BS (1990). Genetic Data Analysis. Sinauer, Sunderland, MA.
47. Scott HS, Ashton LJ, Eyre HJ, Baker E, Brooks DA, Callen DF, Sutherland GR, Morris CP, Hopwood JJ (1990). Chromosomal localization of the human α-L-iduronidase gene (IDUA) to 4p16.3. Am J Hum Genet 47:802-807.
48. Weber B, Riess O, Hutchinson G, Collins C, Lin B, Kowbel D, Andrew S, Shappert K, Hayden MR (1991). Genomic organization and complete sequence of the human gene encoding the β-subunit of the cGMP phosphodiesterase and its location to 4p16.3. Nucl Acids Res 19:6263-6268.
49. Collins C, Shappert K, Hayden MR (1992). The genomic organization of a novel regulatory myosin light chain (MYL5) that maps to chromosome 4p16.3 and shows different patterns of expression between primates. Hum Mol Genet 1:727-733.
50. Chakravarti A, Buetow KH, Antonarakis SE, Waber PG, Boehm CD, Kazazian HH (1984). Nonuniform recombination within the human β-globin gene cluster. Am J Hum Genet 36:1239-1258.
51. Buetow KH, Shiang R, Yang P, Nakamura Y, Lathrop GM, White R, Wasmuth JJ, Wood S, Berdahl LD, Leysens NJ, Ritty TM, Wise M, Murray JC (1991). A detailed multipoint map of human chromosome 4 provides evidence for linkage heterogeneity and position-specific recombination rates. Am J Hum Genet 48:911-925.
52. Chakravarti A (1992). Fragile X founder effect? Nature Genet 1:237-238.

CHAPTER 4 DNA ANALYSIS OF DISTINCT POPULATIONS

The work presented in this chapter has contributed to two publications.

Andrew SE, Theilmann J, Almqvist E, Norremolle A, Lucotte G, Anvret M, Sorenson SA, Turpin JC, Hayden MR (1993). DNA analysis of distinct populations suggests multiple origins for the mutation causing Huntington disease. Clin Genet 43:286-294.
Almqvist E, Andrew SE, Theilmann J, Goldberg P, Zeisler J, Drugge U, Grandell U, Tapper-Persson M, Winblad B, Hayden MR, Anvret M. Geographical distribution of haplotypes in Swedish families with Huntington disease (In press, Human Genetics).

4.1 INTRODUCTION

In an effort to refine the localization of the HD gene, studies of linkage disequilibrium have been undertaken in populations of various ancestries with DNA markers from 4p16.3. A total of seven allelic association studies have been reported, revealing significant associations between various markers and HD in some populations and not in others (Table 3-1). Although some markers appear to be nonrandomly associated in most populations (i.e. D4S95) [1-5], conflicting results supporting and refuting nonrandom association between D4S10, D4S43, D4S98 and the HD gene have been reported [1-6]. In Chapter 3, using a population of mixed ancestry and a UK population, a distal region of disequilibrium was identified at D4S133/D4S228, about 3 Mb distal to D4S95.

Results of association studies can be significantly biased if the ancestry of the control population is not similar to that of the affected population. In addition to considering ethnic-specific allele frequencies, if a disease is caused by more than one mutation at the same locus, different mutations associated with different DNA haplotypes might predominate in various populations. Associations due to identity by descent would be stronger when analysing descendants of a few common ancestors than when pooling all persons with different mutations [7]. This argues for studies of nonrandom association in selected distinct populations where controls and affected individuals are more likely to be of similar ancestry and where there might be only a few or perhaps a single mutation underlying HD in that region.
For disequilibrium mapping, a judicious choice of isolated, well-defined populations with well characterized histories may be essential to decrease the influence of evolutionary forces such as drift and admixture [8]. This is successfully demonstrated by the search for the mutation causing diastrophic dysplasia (DTD), where linkage disequilibrium in the isolated Finnish population of DTD families, all descended from one founder chromosome, was used to narrow the likely location of the DTD gene to a region of approximately 60 kb, from a previously estimated region of 1.6 Mb [9].

Another method of analysis of presumed homogeneous populations is to establish haplotypes for affected chromosomes. Development of extended HD haplotypes can lead to estimates of the minimum number of original mutations in a particular geographical region or population.

To date, there have been few HD association studies in populations of clearly defined ancestry. One association study was based on 46 pedigrees of Italian ancestry [6]. Two studies have been based on HD families of UK descent [2,4]. In these studies, however, unaffected family members and others were included as controls without stringently determining their ancestral background. Three association studies have been based on North American populations [1,3,5], where it is especially inappropriate to assume that the spouse of the affected person is of similar ancestry. Therefore, these studies are likely to have included controls with diverse ancestral backgrounds, which might have confounded the results.

In this analysis, nonrandom association between HD and several markers previously shown to be in association with HD is investigated in three distinct populations of French, Swedish and Danish ancestry, and results have been compared to a similar sized group of HD families of UK descent.
Haplotypes of HD chromosomes were constructed within each population to determine the number of independent mutations for the HD gene in the French, Swedish and Danish populations.

4.2 RESULTS

4.2.1 ASSESSMENT OF NONRANDOM ASSOCIATION

For analysis of linkage disequilibrium in homogeneous populations, DNA was obtained from a total of 276 individuals living in France, Denmark and Sweden. A total of 149 HD samples and 127 non-HD samples representing 75 unrelated families were collected. Ninety-eight samples from 26 HD pedigrees were obtained from Sweden, 121 samples from 18 pedigrees were Danish in origin, and 57 samples from 31 pedigrees were from the Reims region of France. Samples were carefully selected in each of these countries such that all individuals within each population were of the same ancestral background, with the controls of the same background as the affected individuals.

In addition, HD and non-HD samples of a UK population, with autochthonic control chromosomes, were used for comparison to the other populations. This sample was obtained by selecting HD and non-HD individuals randomly from the UK cohort of Chapter 3 until 20 HD individuals were obtained, resulting in a group with a similar number of HD and control chromosomes to that of the other populations tested in this study. Affected and non-HD individuals from a population of multiple different ancestries were also assessed for comparison.

Four DNA markers demonstrated in Chapter 3 to be in nonrandom allelic association with HD were used for analysis of the homogeneous Danish, Swedish, and French populations in this Chapter (Table 4-1).

The presence of significant allelic association was tested using a χ² test of homogeneity. For the multiallelic marker E2 (D4S228), alleles beyond the most common allele were pooled to form one class. The χ² statistic is dependent on sample size; therefore the Yule coefficient was also calculated as another measure of association.

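As a minimal sketch of the two measures just described, the Python below computes a χ² statistic for a 2x2 table of allele counts and the absolute Yule coefficient |Q|. The function names are illustrative. The continuity (Yates) correction is an assumption: the text does not state which form of the χ² test was used, although the corrected statistic reproduces the values tabulated for D4S95 in the UK sample (Table 4-2; χ² of about 4.46, |Q| of about 0.63).

```python
import math

def chi2_yates(a, b, c, d):
    """Chi-square (1 df) for the 2x2 table [[a, b], [c, d]].

    Rows are HD and non-HD chromosomes, columns are the two alleles.
    Yates' continuity correction is applied (an assumption, see lead-in).
    """
    n = a + b + c + d
    diff = max(abs(a * d - b * c) - n / 2, 0.0)
    return n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

def p_value_1df(chi2):
    """Upper-tail probability of a chi-square variate with 1 df."""
    return math.erfc(math.sqrt(chi2 / 2))

def yule_q(a, b, c, d):
    """Absolute Yule coefficient |Q| = |ad - bc| / (ad + bc)."""
    return abs(a * d - b * c) / (a * d + b * c)

# UK sample, D4S95 (674 MboI), counts taken from Table 4-2:
# HD chromosomes A=17, B=3; non-HD chromosomes A=45, B=35.
uk = (17, 3, 45, 35)
chi2 = chi2_yates(*uk)
print(f"chi2 = {chi2:.2f}, P = {p_value_1df(chi2):.3f}, |Q| = {yule_q(*uk):.2f}")

# The same allele proportions in a sample four times larger:
# |Q| is unchanged while chi-square grows with the sample size.
bigger = tuple(4 * x for x in uk)
print(f"chi2 = {chi2_yates(*bigger):.2f}, |Q| = {yule_q(*bigger):.2f}")
```

The second call illustrates why a size-independent coefficient is reported alongside χ²: multiplying every count by four leaves |Q| where it was but inflates the test statistic.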
The allele frequencies for RFLPs on HD and control chromosomes and the measures of allelic association are shown in Table 4-2. The statistically significant nonrandom association previously observed between the HD gene and D4S95 (674 MboI polymorphism) [1-5] or between HD and D4S133/D4S228 (BS1 MboI polymorphism, E2 SstI polymorphism and 4.2 SstI polymorphism) is not observed in this study. None of the P values obtained with these markers are significant at P < 0.05 within the French, Swedish and Danish populations tested (Table 4-2).

Table 4-1. Polymorphic markers used in association analysis.

PROBE              LOCUS    ENZYME   ALLELE   SIZE (kb)
pBS674E-D (674)    D4S95    MboI     A        1.3
                                     B        0.7/0.6
BS1                D4S133   MboI     A        0.8
                                     B        0.7
cl6Dp/E2Rep (E2)   D4S228   SstI     0        5.0
                                     1        4.8
                                     2        4.5
                                     3        3.8
cl6Dp/M4.2 (4.2)   D4S228   SstI     A        8.0
                                     B        6.5

Table 4-2. Allele frequencies for RFLPs on HD and non-HD chromosomes. For E2, the first line of statistics uses all observed alleles and the second uses allele 1 against all other alleles pooled.

MARKER (LOCUS)  ENZYME  ANCESTRY           ALLELE   HD #   HD %    NON-HD #   NON-HD %   χ²        df   P         |Q|
674 (D4S95)     MboI    French             A        15     88.2    39         67.2       1.93      1    0.17      0.57
                                           B        2      11.8    19         32.8
                                           Total    17     100.0   58         100.0
                        Swedish            A        13     72.2    47         62.7       0.24      1    0.63      0.22
                                           B        5      27.8    28         37.3
                                           Total    18     100.0   75         100.0
                        Danish             A        9      52.9    88         60.7       0.13      1    0.72      0.16
                                           B        8      47.1    57         39.3
                                           Total    17     100.0   145        100.0
                        U.K.               A        17     85.0    45         56.2       4.46      1    0.035     0.63
                                           B        3      15.0    35         43.8
                                           Total    20     100.0   80         100.0
                        Multiple ancestry  A        93     80.9    319        63.9       11.40     1    0.00074   0.41
                                           B        22     19.1    180        36.1
                                           Total    115    100.0   499        100.0
BS1 (D4S133)    MboI    French             A        2      10.0    12         21.4       0.63      1    0.43      0.42
                                           B        18     90.0    44         78.6
                                           Total    20     100.0   56         100.0
                        Swedish            A        4      16.7    9          11.7       0.082     1    0.77      0.20
                                           B        20     83.3    68         88.3
                                           Total    24     100.0   77         100.0
                        Danish             A        1      7.1     23         18.3       0.45      1    0.50      0.49
                                           B        13     92.9    103        81.7
                                           Total    14     100.0   126        100.0
                        U.K.               A        0      0.0     12         19.7       3.19      1    0.074
                                           B        20     100.0   49         80.3
                                           Total    20     100.0   61         100.0
                        Multiple ancestry  A        4      4.1     65         15.9       8.36      1    0.0038    0.63
                                           B        94     95.9    345        84.1
                                           Total    98     100.0   410        100.0
E2 (D4S228)     SstI    French             0        0      0.0     1          1.9        1.18      2    0.55
                                           1        17     94.4    45         84.9       0.41      1    0.52      0.50
                                           2        0      0.0     0          0.0
                                           3        1      5.6     7          13.2
                                           Total    18     100.0   53         100.0
                        Swedish            0        0      0.0     1          1.3        1.63      3    0.65
                                           1        21     91.3    70         88.6       0.00022   1    0.99      0.15
                                           2        1      4.3     1          1.3
                                           3        1      4.3     7          8.9
                                           Total    23     100.0   79         100.0
                        Danish             0        0      0.0     0          0.0        1.93      2    0.38
                                           1        15     93.8    105        79.6       1.07      1    0.30      0.59
                                           2        0      0.0     4          3.0
                                           3        1      6.2     23         17.4
                                           Total    16     100.0   132        100.0
                        U.K.               0        0      0.0     0          0.0        3.54      2    0.17
                                           1        20     100.0   71         84.5       2.26      1    0.13
                                           2        0      0.0     2          2.4
                                           3        0      0.0     11         13.1
                                           Total    20     100.0   84         100.0
                        Multiple ancestry  0        0      0.0     3          0.6        9.66      3    0.022
                                           1        105    95.5    447        84.7       8.19      1    0.0042    0.58
                                           2        2      1.8     15         2.8
                                           3        3      2.7     63         11.9
                                           Total    110    100.0   528        100.0
4.2 (D4S228)    SstI    French             A        1      6.3     5          12.8       0.05      1    0.81      0.38
                                           B        15     93.7    34         87.2
                                           Total    16     100.0   39         100.0
                        Swedish            A        2      8.7     8          10.3       0.03      1    0.86      0.09
                                           B        21     91.3    70         89.7
                                           Total    23     100.0   78         100.0
                        Danish             A        1      6.2     28         21.2       1.19      1    0.28      0.60
                                           B        15     93.8    104        78.8
                                           Total    16     100.0   132        100.0
                        U.K.               A        0      0.0     14         21.5       3.71      1    0.05
                                           B        20     100.0   51         78.5
                                           Total    20     100.0   65         100.0
                        Multiple ancestry  A        6      5.5     74         17.4       8.79      1    0.0030    0.57
                                           B        103    94.5    350        82.6
                                           Total    109    100.0   424        100.0

For comparison, a UK population of similar size to the other populations was also analyzed, and significant association was observed between both 674(D4S95) and 4.2(D4S228) and HD, with P values of 0.035 and 0.050 respectively (Table 4-2).

The control allele frequencies within each of these populations for each of the four probes tested are shown in Table 4-2. The significance of the differences in allele frequencies was determined by a χ² analysis.
The numbers of chromosomes counted for each allele were analyzed between populations in a pairwise fashion, with all possible pairwise combinations tested (data not shown). No significant differences (P < 0.05) were observed between any populations.

To investigate underlying differences between HD populations of different ancestral backgrounds, allele frequencies of these DNA markers on HD chromosomes from France, Sweden, Denmark, the UK and the population of multiple different ancestries (Table 4-2) were compared. Alleles on HD chromosomes were counted and analyzed between populations in a pairwise fashion, with all possible pairwise combinations tested. Significant differences were seen when allele frequencies for HD chromosomes were compared between the Danish population and the multiple ancestry population at 674(D4S95). A χ² analysis of the allele frequencies, with one degree of freedom, gave a χ² value of 5.08 with a P value of 0.024.

4.2.2 DNA HAPLOTYPE ANALYSIS OF HD CHROMOSOMES

The differences in allele frequencies for 674(D4S95) on HD chromosomes between Danish and UK families raised the possibility of multiple different origins for the HD gene, at least in these two countries. To investigate this further, haplotypes for the HD chromosomes were constructed with alleles at 674(D4S95), BS1(D4S133), E2(D4S228) and 4.2(D4S228), from pedigrees in the Swedish, Danish, French, and UK populations as well as in the combined group of multiple ancestries (Table 4-3, Figure 4-1).

Despite the relative homogeneity of these populations, more than one HD haplotype was observed within each population (Table 4-3). One identical major haplotype (haplotype 1) was observed in all the populations tested. Haplotype 2 was the second most frequent haplotype observed but was restricted to the Danish, Swedish and multiple ancestry populations. Haplotype 2 differed from haplotype 1 at only one marker (D4S95), suggesting the two haplotypes could be related.
However, the subsequent localization of the HD mutation demonstrated that this marker is the closest to the HD mutation, and thus haplotype 2 more likely represents a distinct haplotype than a derivation of haplotype 1. Haplotype 3 is incomplete at 674(D4S95) due to the inability to determine phase in some families because of limitations in pedigree structure. Resolution of the alleles at 674(D4S95) in haplotype 3 would significantly increase the number of haplotypes 1 or 2 observed (Figure 4-1).

Two distinct HD haplotypes (haplotypes 1 and 5) were observed in the French population (Table 4-3). Six pedigrees contained haplotype 1, and one pedigree contained another haplotype that differed at two of the four markers tested (haplotype 5). Phase could not be established in six French pedigrees at 674 (D4S95).

Table 4-3. Haplotypes for HD chromosomes from Swedish, Danish, French, UK and combined ancestry populations. A dash marks an allele for which phase could not be determined.

HAPLOTYPE  674      BS1       E2        4.2        FRENCH       DANISH       SWEDISH      U.K.         MULTIPLE ANCESTRY   TOTAL
           (D4S95)  (D4S133)  (D4S228)  (D4S228)   #    %       #    %       #    %       #    %       #    %              #     %
1          A        B         1         B          6    46.2    6    42.9    10   43.5    9    81.8    58   66.7           89    60.1
2          B        B         1         B          0    0.0     6    42.9    5    21.7    0    0.0     16   18.4           27    18.2
3          -        B         1         B          6    46.2    0    0.0     5    21.7    2    18.2    8    9.2            21    14.2
4          A        A         1         B          0    0.0     1    7.1     1    4.4     0    0.0     1    1.1            3     2.0
5          A        -         3         A          1    7.6     1    7.1     0    0.0     0    0.0     1    1.1            3     2.0
6          A        A         3         A          0    0.0     0    0.0     1    4.4     0    0.0     0    0.0            1     0.7
7          B        -         3         A          0    0.0     0    0.0     0    0.0     0    0.0     1    1.1            1     0.7
8          A        A         1         A          0    0.0     0    0.0     0    0.0     0    0.0     1    1.1            1     0.7
9          A        -         2         A          0    0.0     0    0.0     0    0.0     0    0.0     1    1.1            1     0.7
10         -        A         2         A          0    0.0     0    0.0     1    4.4     0    0.0     0    0.0            1     0.7
TOTAL                                              13   100.0   14   100.0   23   100.0   11   100.0   87   100.0          148   100.0

Figure 4-1. Schematic of haplotypes seen in French, Danish, Swedish and UK populations. Each box represents a marker used in haplotype analysis: D4S95(674), D4S133(BS1), D4S228(E2) and D4S228(4.2).
Haplotypes are numbered according to Table 4-3. Arrows between haplotypes suggest possible evolutionary relationships between haplotypes.

In the Danish population, there were four distinctly different haplotypes (Table 4-3). Haplotypes 1 and 2 were both observed in six pedigrees. Haplotype 5 was observed in one pedigree and haplotype 4 was observed once in the population tested.

A minimum of five different haplotypes were observed in the Swedish population (Table 4-3). Haplotype 1 was observed on ten chromosomes while haplotypes 4, 6 and 10 occurred on only one chromosome each. Haplotype 2 was the second most frequent haplotype in the Swedish population and was observed in four HD families. Five haplotypes were incomplete at the 674(D4S95) locus due to pedigree structure.

HD haplotypes were also constructed for the UK population of similar size to the other populations tested (Table 4-3). Only one major haplotype was observed (haplotype 1), as well as two pedigrees with the incomplete haplotype 3.

For comparison, HD haplotypes were also constructed for a larger population consisting of 87 pedigrees of multiple origins (Table 4-3). One major (haplotype 1) and seven other minor frequency haplotypes were observed when HD chromosomes of multiple origins were examined (Table 4-3). Haplotype 2 was the second most frequent haplotype (18.4%), and each of the six other minor haplotypes was represented once (1.1%).

Finally, in the absence of double recombination or gene conversion events at more than one site, these analyses have shown there is a minimum of two HD mutations in the French population, four in the Danish population and five in the Swedish population studied (Table 4-3).

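The minimum counts quoted above can be reproduced with a short sketch. The Python below is an illustration only, not the procedure used in the thesis: the haplotype strings are a reading of Table 4-3 (marker order D4S95, D4S133, E2, 4.2, with "-" for an allele of undetermined phase), and the function names and the greedy merging rule are assumptions. An incomplete haplotype is merged into an already counted haplotype whenever the two agree at every marker where both are typed.

```python
# Haplotype definitions as read from Table 4-3 (assumed reconstruction).
HAPLOTYPES = {
    1: "AB1B", 2: "BB1B", 3: "-B1B", 4: "AA1B", 5: "A-3A",
    6: "AA3A", 7: "B-3A", 8: "AA1A", 9: "A-2A", 10: "-A2A",
}

# Haplotypes observed at least once on HD chromosomes in each population.
OBSERVED = {
    "French": [1, 3, 5],
    "Danish": [1, 2, 4, 5],
    "Swedish": [1, 2, 3, 4, 6, 10],
    "UK": [1, 3],
}

def compatible(h1, h2):
    """True if two haplotypes agree wherever both alleles are known."""
    return all(x == y or "-" in (x, y) for x, y in zip(h1, h2))

def minimum_distinct(ids):
    """Greedy lower bound on the number of distinct haplotypes.

    Complete haplotypes are considered first; an incomplete haplotype adds
    to the count only if it is incompatible with every haplotype kept so far.
    """
    ordered = sorted(ids, key=lambda i: HAPLOTYPES[i].count("-"))
    kept = []
    for i in ordered:
        if not any(compatible(HAPLOTYPES[i], HAPLOTYPES[k]) for k in kept):
            kept.append(i)
    return len(kept)

for population, ids in OBSERVED.items():
    print(population, minimum_distinct(ids))
```

Under these assumptions the incomplete haplotype 3 is absorbed by haplotype 1 in every population, which is why it does not add to the minimum number of HD mutations.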
4.2.3 DNA HAPLOTYPE ANALYSIS OF CONTROL CHROMOSOMES

To compare haplotypes of HD chromosomes to those of control chromosomes, DNA haplotypes were also constructed for control chromosomes, using the unaffected chromosome of the patients and the chromosomes of the unaffected spouses from the French, Danish, Swedish and UK populations as well as the multiple ancestry group (Table 4-4). Of 451 control chromosomes with complete haplotypes, 17 out of 32 possible haplotypes (53.1%) were observed. The observed distribution of haplotypes was significantly different from that expected as calculated from allele frequencies of the polymorphisms (χ² = 194.02, P < 0.00001, with 23 df), due to significant linkage disequilibrium between these markers on normal chromosomes.

4.2.4 HAPLOTYPE COMPARISONS

Two of the rare HD haplotypes (haplotypes 8 and 9) were not observed in the control chromosomes. Haplotype 8 is complete, but haplotype 9 could be identical to haplotype 10, which was rarely seen in HD (0.7%) and control (0.2%) populations. Thirteen control haplotypes (haplotypes 11 to 23), accounting for 9.3% of normal chromosomes, were not observed on HD chromosomes (Tables 4-3 and 4-4).

The numbers of haplotypes observed for the HD group and the control group for each country were analyzed by χ² analysis (data not shown). There were no significant haplotype differences observed in the Swedish, Danish, French or UK populations. In contrast, however, when control haplotypes were compared to HD haplotypes from the HD patients of multiple ancestries, significant association of the major haplotype (haplotype 1) with HD was observed (χ² = 11.65, P = 0.00064 with df = 1). In addition, when all the populations were combined and the total number of control chromosomes was compared to the total number of HD chromosomes, significant differences were observed (χ² = 41.6, P = 0.0071 with 22 df; χ² = 8.87, P = 0.0029 with 1 df).

Table 4-4. Haplotypes for control chromosomes from Swedish, Danish, French, U.K. and combined ancestry populations. A dash marks an allele for which phase could not be determined.

HAPLOTYPE  674      BS1       E2        4.2        FRENCH       DANISH        SWEDISH      U.K.         MULTIPLE ANCESTRY   TOTAL
           (D4S95)  (D4S133)  (D4S228)  (D4S228)   #    %       #     %       #    %       #    %       #     %             #     %
1          A        B         1         B          18   56.3    42    42.0    38   48.7    23   47.9    114   44.7          235   45.8
2          B        B         1         B          5    15.6    30    30.0    21   26.9    13   27.1    71    27.8          140   27.3
3          -        B         1         B          6    18.8    10    10.0    9    11.5    1    2.1     22    8.6           48    9.4
4          A        A         1         B          0    0.0     0     0.0     1    1.3     1    2.1     1     0.4           3     0.6
5          A        -         3         A          0    0.0     1     0.0     0    0.0     0    0.0     5     2.0           6     1.2
6          A        A         3         A          1    3.1     8     8.0     3    3.8     6    12.5    16    6.3           34    6.6
7          B        -         3         A          0    0.0     0     0.0     0    0.0     0    0.0     7     2.7           7     1.4
8          A        A         1         A          0    0.0     0     0.0     0    0.0     0    0.0     0     0.0           0     0.0
9          A        -         2         A          0    0.0     0     0.0     0    0.0     0    0.0     0     0.0           0     0.0
10         -        A         2         A          0    0.0     1     1.0     0    0.0     0    0.0     0     0.0           1     0.2
11         B        A         3         A          1    3.1     3     3.0     2    2.6     1    2.1     2     0.8           9     1.7
12         A        B         1         A          0    0.0     2     2.0     0    0.0     1    2.1     2     0.8           5     1.0
13         B        A         1         B          0    0.0     2     2.0     1    1.3     0    0.0     0     0.0           3     0.6
14         A        B         3         A          0    0.0     0     0.0     1    1.3     1    2.1     1     0.4           3     0.6
15         A        A         3         B          0    0.0     0     0.0     1    1.3     1    2.1     5     2.0           7     1.4
16         A        A         2         A          0    0.0     1     1.0     0    0.0     0    0.0     1     0.8           2     0.4
17         A        B         0         B          1    3.1     0     0.0     0    0.0     0    0.0     1     0.8           2     0.4
18         B        B         0         B          0    0.0     0     0.0     1    1.3     0    0.0     1     0.8           2     0.4
19         B        A         2         A          0    0.0     0     0.0     0    0.0     0    0.0     1     0.4           1     0.2
20         A        B         2         B          0    0.0     0     0.0     0    0.0     0    0.0     2     0.8           2     0.4
21         B        B         2         B          0    0.0     0     0.0     0    0.0     0    0.0     1     0.4           1     0.2
22         A        B         2         A          0    0.0     0     0.0     0    0.0     0    0.0     1     0.4           1     0.2
23         B        B         1         A          0    0.0     0     0.0     0    0.0     0    0.0     1     0.4           1     0.2
TOTAL                                              32   100.0   100   100.0   78   100.0   48   100.0   255   100.0         513   100.0

4.3 DISCUSSION

In this study, Danish, Swedish and French populations were examined for nonrandom allelic association with markers previously shown to be in nonrandom allelic association with the HD gene. These countries of origin were chosen on the basis that available samples were from autochthonous families. Thus, the controls used in this study were of similar ancestry to the HD individuals. In contrast to results of previous studies [1-5], no significant associations between any of the markers tested and HD were observed (Table 4-2).

What could account for these differences in reports of nonrandom allelic association between certain markers and the HD mutation? The failure to detect nonrandom association does not always imply its absence, as association is dependent on allele frequencies, sample size, and whether the disequilibrium is positive or negative (minor or major allele associated with the disease respectively) [10].

Large sample sizes are often required to clearly demonstrate disequilibrium, particularly if the major allele at one locus is in disequilibrium with the disease locus. For this reason, in some instances it may be advantageous to combine all families regardless of ancestry to test markers initially for association, as the increase in sample size could outweigh the disadvantage of using unmatched controls to determine control allele frequencies. This can result in evidence for disequilibrium for loci that appeared to be in equilibrium within each specific population [10,11]. Alternatively, spurious association seen with small numbers may become less significant as the sample size increases. For example, nonrandom association initially detected between 731(D4S98) and the HD gene [1] was not upheld in a sample four times larger [3]. However, with pooling of data from different populations, disequilibrium statistics may be significantly altered by unrecognized ethnic differences in allele frequencies of different markers [12]. Therefore, one method to confirm previously demonstrated disequilibrium is to ensure that the controls are of similar ancestry to the HD individuals by using a clearly homogeneous population for analysis. The selection of the controls is an important part of the analyses, and although ancestry was stringently assessed in this study, it may still have been biased by the fact that control chromosomes were from individuals who had married into families with HD.

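The caution about pooling can be made concrete with a small numerical sketch. The counts below are hypothetical and chosen only for illustration, and the function name is an assumption. Each population on its own shows no association (Yule coefficient of zero), yet pooling them produces a strong apparent association, because allele frequencies and the ratio of HD to control chromosomes both differ between the two groups.

```python
def yule_q(hd_a, hd_b, ctl_a, ctl_b):
    """Signed Yule coefficient Q = (ad - bc) / (ad + bc) for allele counts."""
    ad = hd_a * ctl_b
    bc = hd_b * ctl_a
    return (ad - bc) / (ad + bc)

# Hypothetical counts: (HD allele A, HD allele B, control allele A, control allele B)
pop1 = (40, 10, 8, 2)    # allele A common; mostly HD chromosomes sampled
pop2 = (2, 8, 10, 40)    # allele A rare; mostly control chromosomes sampled
pooled = tuple(x + y for x, y in zip(pop1, pop2))

print("population 1:", yule_q(*pop1))
print("population 2:", yule_q(*pop2))
print("pooled:      ", round(yule_q(*pooled), 2))
```

In this constructed case the pooled coefficient is about 0.69 although neither population shows any association, which is the kind of artifact that matched, homogeneous controls are meant to exclude.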
In this study, the gene for HD segregated with the more common allele within each population, and it is likely that the sample sizes in this study were too small, lacking the power to detect nonrandom association. However, analysis of a UK population of comparable size did show significant association between both 674 (D4S95) and 4.2 (D4S228) and HD (Table 4-2). This suggests that factors other than sample size alone were responsible for the absence of nonrandom allelic association in this study.

The variability in results for nonrandom association between HD and marker alleles in different reports has also been observed in association studies with other diseases. For example, significant association between RFLPs within the insulin receptor gene and non-insulin dependent diabetes mellitus has been reported in a Scandinavian population [13] and a Mexican-American population [14,15] but not in two Japanese populations [16,17]. This could be explained as spurious results due to a small sample size, as one of the Japanese studies not showing association was based on a sample size half as large as that of the prior studies. Alternatively, the differences in association might be due to the populations being genetically different, with differences in allele frequencies for the markers tested between the populations studied.

There were no statistically significant differences in allele frequencies for control chromosomes between the four populations reported in this study (Danish, French, Swedish and UK) (Table 4-2), suggesting the observed lack of association in the non-UK populations was due to a factor(s) other than altered allele frequencies in controls.

A possible explanation for the lack of observed association between markers and HD in this study is that the populations studied may not be homogeneous with respect to a single founder HD chromosome.
One method of observing differences between HD chromosomes is to compare the allele frequencies for different markers on affected chromosomes between populations. The difference in D4S95 allele frequencies between HD chromosomes in the Danish population and HD chromosomes in the multiple ancestry population suggested that different chromosomes in the different populations underlie HD.

The possibility of more than one origin for the mutation causing HD in the populations tested, which would account for the lack of linkage disequilibrium observed, was further explored by construction of DNA haplotypes (Table 4-3). Development of extended HD haplotypes can lead to estimates of the minimum number of original mutations in a particular geographical region or population. In this study, one major haplotype (haplotype 1) could account for all patients with HD in the UK population, but at least four other additional and distinct haplotypes (haplotypes 2, 4, 5/6 and 10) were seen in the French, Danish and Swedish populations, suggesting at least four additional HD mutations. The absence of nonrandom association in these populations is therefore, at least in part, due to the presence of a minimum of two different haplotypes within each population, compared to only one definite haplotype in the UK population studied. Therefore, it is the multiple HD haplotypes within each population other than the UK that make it impossible to detect any allelic association with HD.

A scenario demonstrating possible evolutionary relationships between the haplotypes based on similarity is shown in Figure 4-2. Haplotypes 1 and 2, although identical at distal markers (D4S133 and D4S228), differ at the marker closest to the HD mutation (D4S95) and likely represent distinct HD haplotypes. The minor frequency haplotypes differ from haplotypes 1 and 2 in at least two markers.
Therefore, more than one recombination, or a recombination event and a gene conversion, would have had to occur to derive haplotypes 5 to 10 from haplotype 1. This scenario seems unlikely given the decreased rate of recombination in this region [18]. Haplotype 4, seen in HD patients of Denmark and Sweden, differs from haplotype 1 at only one marker, and could be derived from haplotype 1 by gene conversion or mutation at the MboI site detected by BS1(D4S133). Alternatively, haplotype 4 could represent a third ancestral HD chromosome.

MacDonald et al., using 8 DNA markers spanning D4S126 to D4S98, studied 78 HD and 168 control chromosomes and reported that about one third of the HD chromosomes were derived from one primordial haplotype in a mixed population of Western European descent [19]. Using four markers, including D4S95 and three markers from a more distal region previously demonstrated to be in association with HD, analysis of 148 HD and 513 control chromosomes has shown that approximately 60% of HD chromosomes from all ancestries examined have an identical haplotype and might therefore have a common ancestral chromosome. The difference in numbers of predicted primordial haplotypes in all likelihood reflects the varying markers from different regions. However, what both studies clearly support is multiple origins for HD chromosomes with expanded CAG repeats.

These results support the hypothesis that multiple occurrences of CAG expansion have occurred on different chromosomal backgrounds within these populations. Thus several different haplotypes with the tendency for CAG expansion led to multiple HD chromosomes. The alternative hypothesis is that one ancient ancestral HD mutation has evolved dramatically in each population, resulting in distinctly different haplotypes due to the length of time for mutation to occur.
Recent evidence, however, supports the first hypothesis, as haplotype analysis using markers close to the CAG repeat in the homogeneous Swedish population demonstrates that at least three chromosomal haplotypes underlie HD in this homogeneous population [20].

The multiple HD haplotypes in each population may explain the two clusters of allelic association observed in Chapter 3. The possibility of multiple haplotypes with the same allele at a distal marker by chance alone could account for the disequilibrium at markers 3 Mb from the HD gene.

In summary, these findings show no significant association between HD and the DNA markers tested in the French, Swedish, and Danish populations, whereas significant results were demonstrated in a UK population of similar size, suggesting that the absence of association was not predominantly a consequence of allele frequencies or sample size. To further investigate the number of potential HD chromosomes, DNA haplotypes were constructed for the Danish, French, Swedish and UK populations. The minimum of two HD haplotypes observed in each of the French, Danish and Swedish populations, compared to the one haplotype in the UK population of a similar size, is an important factor accounting for the absence of association between HD and the DNA markers in these populations. Furthermore, haplotype analysis of HD chromosomes within each population demonstrated multiple haplotypes within each population, providing support for multiple independent origins for the HD chromosome in the French, Swedish and Danish populations.

4.4 REFERENCES

1. Theilmann J, Kanani S, Shiang R, Robbins C, Quarrell O, Huggins M, Hedrick A, Weber B, Collins C, Wasmuth JJ, Buetow KH, Murray JC, Hayden MR (1989). Nonrandom association between alleles detected at D4S95 and D4S98 and the Huntington's disease gene. J Med Genet 26:676-681.
2. Snell RG, Lazarou L, Youngman S, Quarrell OWJ, Wasmuth JJ, Shaw DJ, Harper PS (1989). Linkage disequilibrium in Huntington's disease: An improved localization for the gene. J Med Genet 26:673-675.
3. Adam S, Theilmann J, Buetow K, Hedrick A, Collins C, Weber B, Huggins M, Hayden M (1991). Linkage disequilibrium and modification of risk for Huntington disease. Am J Hum Genet 48:595-603.
4. Barron L, Curtis A, Shrimpton AE, Holloway S, May H, Snell RG, Brock DJH (1991). Linkage disequilibrium and recombination make a telomeric site for the Huntington's disease gene unlikely. J Med Genet 28:520-522.
5. MacDonald M, Lin C, Srinidhi L, Bates G, Altherr M, Whaley WL, Lehrach H, Wasmuth JJ, Gusella JF (1991). Complex patterns of linkage disequilibrium in the Huntington disease region. Am J Hum Genet 49:723-734.
6. Novelletto A, Mandich P, Bellone E, Malaspina P, Vivona G, Ajmar F, Frontali M (1991). Non-random association between DNA markers and Huntington disease locus in the Italian population. Am J Med Genet 40:374-376.
7. Lander ES (1988). Mapping complex genetic traits in humans, in "Genome analysis: A practical approach". Davies KE, ed. IRL Press, Oxford.
8. Jorde LB, Watkins WS, Viskochil D, O'Connell P, Ward K (1993). Linkage disequilibrium in the Neurofibromatosis 1 (NF1) region: Implications for gene mapping. Am J Hum Genet 53:1038-1050.
9. Hastbacka J, de la Chapelle A, Kaitila I, Sistonen P, Weaver A, Lander E (1992). Linkage disequilibrium mapping in isolated founder populations: diastrophic dysplasia in Finland. Nature Genet 2:204-211.
10. Thompson EA, Deeb S, Walker D, Motulsky AG (1988). The detection of linkage disequilibrium between closely linked markers: RFLPs at the AI-CIII Apolipoprotein genes. Am J Hum Genet 42:113-124.
11. Nei M, Li WH (1973). Linkage disequilibrium in subdivided populations. Genetics 75:213-219.
12. Haviland MB (1991). Estimation of Hardy-Weinberg and pairwise disequilibrium in the Apolipoprotein AI-CIII-AIV gene cluster. Am J Hum Genet 49:350-365.
13. Sten-Linder M, Olsson M, Iselius L, Efendic S, Luthman H (1991). DNA haplotype analysis suggests linkage disequilibrium in the human insulin receptor gene. Hum Genet 87:469-474.
14. McClain DA, Henry RR, Ullrich A, Olefsky JM (1988). Restriction fragment length polymorphism in insulin receptor gene and insulin resistance in NIDDM. Diabetes 37:1071-1075.
15. Raboudi SM, Mitchell BD, Stern MP, Eifler CW, Haffner SM, Hazuda HP, Frazier ML (1989). Type II diabetes mellitus and polymorphism of insulin receptor gene in Mexican Americans. Diabetes 38:975-980.
16. Takeda J, Seino Y, Yoshimasa Y, Fukumoto H, Koh G, Kuzuya H, Imura H, Seino S (1986). Restriction length polymorphism (RFLP) of the human insulin receptor gene in Japanese: its possible usefulness as a genetic marker. Diabetes 29:667-669.
17. Li SR, Oelbaum RS, Stocks J, Galton DJ (1988). DNA polymorphisms of the insulin receptor gene in Japanese subjects with non-insulin-dependent diabetes mellitus. Hum Hered 38:273-276.
18. Buetow KH, Shiang R, Yang P, Nakamura Y, Lathrop GM, White R, Wasmuth JJ, Wood S, Berdahl LD, Leysens NJ, Ritty TM, Wise ME, Murray JC (1991). A detailed multipoint map of human chromosome 4 provides evidence for linkage heterogeneity and position-specific recombination rates. Am J Hum Genet 48:911-925.
19. MacDonald ME, Novelletto A, Lin C, Tagle D, Barnes G, Bates G, Taylor S, Allitto B, Altherr M, Myers R, Lehrach H, Collins FS, Wasmuth JJ, Frontali M, Gusella JF (1992). The Huntington's disease candidate region exhibits many different haplotypes. Nature Genet 1:99-103.
20. Almqvist E, Andrew S, Theilmann J, Goldberg P, Zeisler J, Drugge U, Grandell U, Tapper-Persson M, Winblad B, Hayden M, Anvret M. Geographical distribution of haplotypes in Swedish families with Huntington disease (Human Genet, In press).

CHAPTER 5

NONRANDOM ALLELIC ASSOCIATION AND HAPLOTYPE ANALYSIS USING MARKERS FLANKING THE CAG REPEAT

5.1 INTRODUCTION

The theoretical inverse relationship between disequilibrium and physical distance may not necessarily exist [1]. Unequal recombination rates across the genome as well as factors such as drift, mutation, admixture and gene conversion may all affect methods of measuring disequilibrium [1]. For these reasons, the extent of the usefulness of linkage disequilibrium in gene mapping has been questioned [2]. Further understanding of patterns of allelic association in different regions of the genome will aid in the mapping of other disease genes.

The use of linkage disequilibrium was an important tool in localization of the gene for cystic fibrosis (CF). Analysis across the candidate region revealed a gradient of increasing association as the CFTR mutation was approached [3]. The strongest values of disequilibrium were obtained with markers in a 200 kb region centromeric to the most common CF mutation, ΔF508, with the highest degree of allelic association detected between RFLPs only 30 kb from the mutation site [5]. In contrast, during the search for the HD gene, linkage disequilibrium results suggested no definitive region in which to concentrate the search. However, the large extent of the candidate region and the irregular distribution of polymorphic markers throughout this genomic region may have been responsible for the difficulty in refining the candidate region. Several other features of 4p16.3 suggested that linkage disequilibrium in this region may be more complex than elsewhere in the genome. For example, the overall level of recombination decreases from D4S10 to the telomeric marker D4S90 [4]. In addition, a recombination hotspot at D4S10 makes the recombination rate across the candidate region unpredictable [5]. With the localization of the HD mutation, it became possible to reassess linkage disequilibrium with markers flanking the HD gene to determine if a similar pattern to that for CF exists for HD or if the pattern was more complex, as analysis across the 2.5 Mb candidate region initially suggested [6,7].

Analysis of linkage disequilibrium and haplotypes of other diseases caused by dynamic mutation, such as Fragile X (FRAXA) and myotonic dystrophy (DM), has led to hypotheses as to the number of founder chromosomes, adding further insight into the mechanism of expansion. Dynamic mutations tend to continue to expand, as demonstrated by further expansion of the CAG repeat associated with HD resulting in a repeat size associated with juvenile onset. The observed anticipation, resulting in onset of disease before reproductive age, results in the loss of that HD chromosome. Dynamic mutations are therefore continuously being eliminated from the population; however, the frequency of the mutant alleles is somehow maintained. This can be accomplished by increased reproductive fitness of those with slightly higher numbers of repeats or by the occurrence of new mutations. However, the strong measures of linkage disequilibrium observed between dynamic mutations such as those associated with FRAX and DM and flanking markers are inconsistent with the concept of recurring new mutations.

Recent haplotype analysis of DM and FRAX chromosomes has suggested a model for the development of dynamic mutations that is in keeping with the observation of allelic association demonstrating a founder chromosome effect. In DM, the mutation is in complete disequilibrium with a nearby two-allele insertion/deletion polymorphism in multiple populations, suggesting a single origin for the predisposing mutation [8]. A proposed multistep model suggests the basis of the linkage disequilibrium may have been a rare ancestral mutation on a chromosome with an insertion allele that generated an allele with 19-30 repeats. Subsequently, a second mutational mechanism results in the further expansion of these alleles to the premutation size range (30-50 repeats), which is inherently unstable, and expands in subsequent generations to the disease range (>50 repeats) [8]. Thus the DM mutation is always associated with the insertion polymorphism haplotype, yet multiple mutation events occurring on this reservoir of unstable chromosomes prone to expansion generate fully expanded DM mutations, and maintain the frequency of disease alleles in the population.

In FRAXA, linkage disequilibrium with flanking markers was not as strong as in DM, which initially suggested multiple initial mutation events [9,10]. Recent analysis of one of the flanking microsatellite markers initially used for allelic association studies demonstrated a previously unknown complex pattern. The polymorphism actually consists of 3 variable regions of DNA which, when analyzed separately, are in strong association with the FRAXA mutation, and haplotype data suggest the mechanism of expansion from a reservoir of large alleles is very similar to that of DM [11]. MacPherson et al. demonstrate by haplotype analysis that FRAXA is caused by more than one initial expansion event [12]. Thus, the FRAX mutation is also the result of a multistep mutational process, however, with likely multiple origins for the CGG expansion.

Analysis of association between markers flanking the CAG repeat and HD will reveal if the model proposed for FRAXA and DM is common to other dynamic mutation disorders such as HD.

5.2 RESULTS

5.2.1 DNA MARKERS

Six polymorphic markers distributed across a region of approximately 250 kb (Figure 5-1) were typed in 100 unrelated HD pedigrees primarily of Western European origin (Table 5-1).
The unaffected allele of HD patients and chromosomes from unaffected spouses were used as control chromosomes. The ancestries of the spouses were assumed to be similar to those of their affected partners.

Figure 5-1. Map of markers flanking the Huntington disease mutation used in analysis of linkage disequilibrium and haplotypes. (Labels on the map: 4cen, glutamic acid polymorphism, GT70, HD gene (IT15) with 5' (CAG)n and (CCG)n repeats, D4S127, D4S95 (674), 4tel; scale bar 50 kb.)

Table 5-1. Polymorphic markers used in analysis

Locus            Alleles                   Reference (number in text)
Glutamic acid    A, B                      MacDonald et al. (1993) [21]
GT70             A, B                      Rommens et al. (1993) [13] and unpublished data
CCG              7, 8, 9, 10, 11, 12       Andrew et al. (1994) [17]
D4S127           1, 2, 3, 4, 5, 6, 7, 8    Taylor et al. (1992) [16]
D4S95 AccI       A, B                      Wasmuth et al. (1988) [15]

The probe GT70 has been previously described [13]; however, allele frequencies for the AccI polymorphism it detects have not been published. The polymorphic CCG repeat adjacent to the CAG repeat, used as one of the markers in this analysis, is discussed further in Chapter 11.

5.2.2 STATISTICAL ANALYSIS

All loci were analyzed for association with HD using a χ² test of homogeneity and the Yule association coefficient. With multiple comparisons, it is likely that one value would be significant at 0.05 by chance alone. Therefore, the Bonferroni procedure [14] was used to adjust for multiple comparisons, and the corrected required significance level is 0.01.

5.2.3 GENE FREQUENCIES AND ALLELIC ASSOCIATION

The allele frequencies for the 6 polymorphic markers in 100 HD pedigrees are given in Table 5-2A and Table 5-2b. Allele frequencies for D4S95, D4S127, CCG and the CAG repeat are similar to those previously published. Allele frequencies are provided for the AccI polymorphism detected by the previously described probe GT70 [13].

Alleles at D4S127 and D4S95, 50 and 120 kb respectively telomeric to the HD mutation, are in strong association with the disease chromosome.
Of the markers located within the HD gene, the polymorphic CCG repeat directly adjacent to the HD mutation is in very strong allelic association with disease. The glutamic acid deletion polymorphism approximately 150 kb proximal to the HD mutation is also in strong disequilibrium with the mutation. In contrast, the intronic GT70 polymorphism, located between the glutamic acid polymorphism and the HD mutation, is not in linkage disequilibrium with HD.

Table 5-2A. Allele frequencies for RFLPs on HD and control chromosomes.

                          HD chromosomes     Control chromosomes
Locus           Allele    No.      %         No.      %           χ²      d.f.   p         Yule's coefficient
Glutamic acid   A          56     68.29       95     89.62        11.99   1      0.00053   0.6
                B          26     31.71       11     10.38
                Total      82    100.00      106    100.00
GT70            A          69     77.53      199     84.68         1.84   1      0.17      0.23
                B          20     22.47       36     15.32
                Total      89    100.00      235    100.00
CCG             7          88     95.65      115     60.21        38.6    3      0.00000   0.87
                9           0      0.00       11      5.76        36.73   1      0.00000
                10          4      4.35       64     33.51
                11          0      0.00        1      0.52
                Total      92    100.00      191    100.00
D4S127          1           2      2.38        3      1.59        25.98   4      0.00003   0.55
                2          44     52.38       46     24.34        19.44   1      0.00001
                3          15     17.86       29     15.34
                4           7      8.33       31     16.40
                5          16     19.05       80     42.33
                Total      84    100.00      189    100.00
D4S95 AccI      A          75     85.23      149     60.32        17.06   1      0.00004   0.58
                B          13     14.77       98     39.68
                Total      88    100.00      247    100.00

Table 5-2b. Allele frequencies of CAG sizes

                       HD chromosomes     Control chromosomes
CAG length (repeats)   No.      %         No.      %
10-15                    0      0.00       22     10.68
16-20                    0      0.00      147     71.36
21-25                    0      0.00       31     15.05
26-30                    0      0.00        6      2.91
31-35                    0      0.00        0      0.00
36-40                   17     17.00        0      0.00
41-45                   50     50.00        0      0.00
46-50                   21     21.00        0      0.00
51-55                    8      8.00        0      0.00
56-60                    2      2.00        0      0.00
61-65                    2      2.00        0      0.00
Total                  100    100.00      206    100.00

5.2.4 HAPLOTYPE ANALYSIS

Complete haplotypes were constructed from HD pedigrees and counted, and their frequencies were compared between normal and affected populations (Table 5-3a). The method of grouping haplotypes is critical to the conclusions reached.
Haplotypes are arranged in Table 5-3a according to their degree of disequilibrium with HD. Core haplotypes, consisting of the intragenic markers that demonstrate linkage disequilibrium with HD (glutamic acid polymorphism and CCG repeat), are numbered and listed with their frequency in the population tested. Distinct haplotypes within each core haplotype, and their frequency in the population tested, are listed below the core haplotype.

Twenty-five distinct HD haplotypes were observed after analysis of 67 HD chromosomes (Table 5-3a). Haplotype 1 is the most common haplotype in affected individuals and controls (55.22% and 46.43% respectively of all chromosomes), with no significant difference in haplotype frequency between the two populations. Haplotypes 1c and 1l, major haplotypes within this core haplotype, each account for 7.46% of all HD chromosomes analyzed (Table 5-3a).

Haplotype 2 is the second most frequent affected haplotype, observed on 35.82% of the HD chromosomes and on only 10.71% of control chromosomes (Table 5-3a). This difference in haplotype frequency between the two populations is significant (p = 0.00029). One subhaplotype (haplotype 2d) alone accounts for 22.39% of all HD chromosomes and is the single most frequent HD haplotype observed.

Haplotype 3 is common in the control population but underrepresented in the HD population: it was seen significantly more frequently on normal chromosomes than on affected chromosomes (p = 0.00003) (Table 5-3a). Two subhaplotypes (haplotypes 3d and 3h) represent 10.71% and 15.48% of all control chromosomes respectively, yet less than 3% of all HD chromosomes.

Table 5-3a. Haplotypes of HD and control chromosomes.

                                                 HD chromosomes   Control chromosomes
Haplotype   G   GT70   CCG   D4S127   D4S95     No.     %        No.     %             χ²      P value
1           A   -      7     -        -          37    55.2       39    46.4            0.83   0.36
1a          A   A      7     0        A           0     0.0        2     2.4
1b          A   A      7     1        A           1     1.5        2     2.4
1c          A   A      7     2        A           5     7.5        5     6.0
1d          A   B      7     2        A           3     4.5        2     2.4
1e          A   A      7     3        A           4     6.0        2     2.4
1f          A   B      7     3        A           4     6.0        6     7.1
1g          A   A      7     4        A           0     0.0        1     1.2
1h          A   B      7     4        A           4     6.0        2     2.4
1i          A   A      7     5        A           4     6.0        3     3.6
1j          A   B      7     0        A           3     4.5        1     1.2
1k          A   A      7     2        B           2     3.0        2     2.4
1l          A   A      7     5        B           5     7.5        7     8.3
1m          A   B      7     3        B           1     1.5        1     1.2
1n          A   B      7     4        B           0     0.0        1     1.2
1o          A   A      7     0        B           1     1.5        2     2.4
2           B   -      7     -        -          24    35.8        9    10.7           13.1    0.00029
2a          B   A      7     0        A           1     1.5        0     0.0
2b          B   B      7     0        A           1     1.5        0     0.0
2c          B   A      7     1        A           1     1.5        0     0.0
2d          B   A      7     2        A          15    22.4        5     6.0
2e          B   A      7     3        A           1     1.5        1     1.2
2f          B   A      7     4        A           1     1.5        0     0.0
2g          B   A      7     5        A           2     3.0        0     0.0
2h          B   A      7     2        B           2     3.0        1     1.2
2i          B   A      7     5        B           0     0.0        2     2.4
3           A   -      10    -        -           4     6.0       30    35.7           17.2    0.00003
3a          A   A      10    0        A           1     1.5        2     2.4
3b          A   A      10    2        A           1     1.5        1     1.2
3c          A   A      10    4        A           0     0.0        1     1.2
3d          A   A      10    5        A           1     1.5        9    10.7
3e          A   A      10    0        B           0     0.0        1     1.2
3f          A   A      10    2        B           0     0.0        1     1.2
3g          A   A      10    4        B           0     0.0        2     2.4
3h          A   A      10    5        B           1     1.5       13    15.5
4           B   A      10    5        B           0     0.0        1     1.2            0.01   0.91
5           A   A      9     4        A           0     0.0        1     1.2            0.01   0.91
6           A   A      0     0        A           2     3.0        3     3.6            0.07   0.79
7           A   A      0     0        B           0     0.0        1     1.2            0.01   0.91
TOTAL                                            67   100.0       84   100.0

Four haplotypes (haplotypes 4-7) are minor haplotypes, together representing less than 3% of all HD chromosomes and 7% of control chromosomes.

Analysis of core haplotypes based on the intragenic polymorphisms demonstrating linkage disequilibrium with HD, i.e. alleles at the glutamic acid polymorphism and the CCG repeat (Table 5-3a), suggests that at least two major haplotypes, haplotypes 1 and 2, underlie HD. Conversely, one major haplotype, haplotype 3, is seen on over 35% of normal chromosomes but rarely on HD chromosomes (p = 0.00003).
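
As a cross-check, the 1 d.f. χ² values and Yule coefficients reported in Tables 5-2A and 5-3a can be recomputed from the chromosome counts. The following is a minimal sketch in Python, not the software used for the original analysis. It assumes that the χ² test was applied with Yates' continuity correction; this is an inference from the fact that the corrected statistic reproduces the tabulated values, not something stated in the text. The function names are hypothetical, and the use of five comparisons for the Bonferroni adjustment is likewise an assumption chosen to match the stated corrected level of 0.01.

```python
import math

def yates_chi2(a, b, c, d):
    """Chi-square (1 d.f.) for the 2 x 2 table [[a, b], [c, d]],
    with Yates' continuity correction (assumed, see lead-in)."""
    n = a + b + c + d
    excess = max(0.0, abs(a * d - b * c) - n / 2)
    return n * excess ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

def p_value_1df(chi2):
    """Upper-tail probability of a chi-square variate with 1 d.f."""
    return math.erfc(math.sqrt(chi2 / 2))

def yule_q(a, b, c, d):
    """Yule association coefficient Q = (ad - bc) / (ad + bc)."""
    return (a * d - b * c) / (a * d + b * c)

def bonferroni_alpha(alpha, n_tests):
    """Per-test significance level after Bonferroni adjustment."""
    return alpha / n_tests

# Rows are (HD with allele, HD without, control with, control without).
tables = {
    "Glutamic acid, allele A": (56, 26, 95, 11),
    "GT70, allele A": (69, 20, 199, 36),
    "D4S95 AccI, allele A": (75, 13, 149, 98),
    "Haplotype 3 vs all other haplotypes": (4, 63, 30, 54),
}

for name, counts in tables.items():
    chi2 = yates_chi2(*counts)
    print(f"{name}: chi2 = {chi2:.2f}, p = {p_value_1df(chi2):.5f}, "
          f"Q = {yule_q(*counts):+.2f}")

# Five comparisons is an assumption (see lead-in).
print("Bonferroni-adjusted level:", bonferroni_alpha(0.05, 5))
```

The sign of Q depends only on which allele is placed in the first column; the tables report its magnitude.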
5.2.5 COMPARISON OF MEAN CAG LENGTH BETWEEN HAPLOTYPES

Analysis of the CAG repeat on 600 control chromosomes (Chapter 9) demonstrated that the mean CAG size is 18 repeats, with a range of 10 to 39 repeats. The distribution was, however, bimodal, with two peaks at 17 and 19 repeats. In this analysis, the mean CAG length was calculated on control chromosomes for the 3 major haplotypes (Table 5-3b). Of interest, the mean CAG length was higher for haplotypes 1 and 2 (18.8 and 21.1 repeats respectively), which represent 61% of control chromosomes yet 94% of affected chromosomes. In contrast, the mean CAG length for haplotype 3, which is underrepresented on affected chromosomes compared to controls, was lower (16.9 repeats).

Table 5-3b. Haplotypes of HD and control chromosomes with mean CAG length.

                        HD chromosomes    Control chromosomes   Mean CAG length (± standard error)
Haplotype   G   CCG     No.     %         No.     %             Control chromosomes   HD chromosomes
1           A   7        37    56.92       39    50.00          18.8 (± 0.5)*         44.7 (± 0.9)
2           B   7        24    36.92        9    11.54          21.1 (± 1.4)*         43.9 (± 0.8)
3           A   10        4     6.15       30    38.46          16.9 (± 0.4)          46.3 (± 2.2)
TOTAL                    65   100.00       78   100.00

* The mean CAG lengths of haplotypes 1 and 2 are significantly different from that of haplotype 3 (p = 0.00043 and p = 0.016 respectively).

5.2.6 ANALYSIS OF NEW MUTATION HAPLOTYPES

Haplotypes were constructed for 7 HD chromosomes from families with new mutations for HD (Table 5-4). These chromosomes had a CAG repeat within the intermediate range in the unaffected parent that expanded in the proband to a repeat length within the HD range.

Table 5-4. Haplotypes of HD chromosomes from new mutation families

                                                HD chromosomes   Mean CAG length
Haplotype   G   GT70   CCG   D4S127   D4S95    No.              IA (parent)     HD chromosome   Normal chromosomes
1           A   -      7     -        -        7                34.6 (± 1.2)    44.7 (± 0.9)    19.0 (± 0.5)
1a          A   A      7     2        A        2                34.5            48.5            16.6
1b          A   B      7     2        A        1                30              43              21
1c          A   A      7     3        A        2                34.0            46              19
1d          A   B      7     3        A        1                39              52              20.6
1e          A   B      7     4        A        1                36              42              21

IA (parent) = intermediate allele chromosome of the unaffected parent.

Five different haplotypes were seen, although all haplotypes had identical alleles at the intragenic glutamic acid polymorphism and the CCG repeat (Table 5-4). The intermediate size of the CAG repeat in the unaffected parent, the expanded size in the proband, and the mean CAG size of that particular haplotype in the normal population are given in Table 5-4. All HD haplotypes observed are derived from haplotypes seen in the normal population (Table 5-3a). It is notable that four of the new mutation haplotypes have larger than average CAG repeat sizes in the normal population. The core haplotype common to all the chromosomes (the glutamic acid polymorphism and the CCG repeat) has a mean CAG size on control chromosomes of 19 repeats, higher than the general population average.

Although all new mutation haplotypes share identical alleles at the glutamic acid polymorphism and the CCG repeat, there are various extended haplotypes that are prone to expansion from an intermediate CAG size to that of HD.

5.3 DISCUSSION

Repetitive sequence polymorphisms such as CA repeats or VNTRs have been thought to have higher mutation rates [22] than simple polymorphisms such as the A/B presence or absence of a restriction enzyme site. Therefore, some reports have suggested that the power to detect linkage disequilibrium over time is best with simple A/B polymorphisms [23,24]. However, others argue against this concept, showing strong linkage disequilibrium with CA repeats [25]. The allelic associations seen between HD and both D4S127, a multi-allelic CA repeat, and D4S95, a single site polymorphism (Table 5-2A), demonstrate that on 4p16.3 both types of polymorphism close to the HD mutation can show a significant measure of allelic association. The strong allelic association seen with D4S127 is consistent with its proximity to the CAG repeat.
The marker not showing association is a single site polymorphism, suggesting that the lack of association is not due to the nature of the polymorphism.

In view of the association with markers 150 kb telomeric to the CAG mutation and 120 kb centromeric to the CAG repeat, the lack of linkage disequilibrium between HD and GT70, located approximately 50 kb centromeric to the CAG repeat, was somewhat surprising. This pattern of association is in contrast to that of the CF gene, where markers flanking the predominant ΔF508 mutation demonstrate decreased association with increasing distance from the mutation [4]. However, there is precedent for more irregular patterns of association across a genomic region. For example, in the NF1 gene, 6 markers spanning 280 kb are in complete linkage disequilibrium with one another; however, disequilibrium decreases substantially with a marker only 68 kb distal to this set. Similarly, in the insulin receptor gene, a polymorphism in the last exon of the gene shows association only with the adjacent polymorphism and not with any of the upstream polymorphisms tested [21].

There are several possible explanations for the lack of allelic association seen between HD and the close marker, GT70, in this analysis. Firstly, a high mutation rate at this particular locus would quickly dissolve any allelic association originally present on affected chromosomes. It is likely that the AccI site polymorphism detected by the exonic probe GT70 is intronic, and therefore not under the same evolutionary constraints as an exonic polymorphism. Thus, the allelic association between HD and the GT70 allele of the founding chromosomes may no longer be detectable due to subsequent frequent mutation events at that locus. Alternatively, and most difficult to assess, selection, drift, admixture, gene conversion and mutation may each influence the measured association. These effects cannot necessarily be overcome by analyzing a larger sample set [26]. Replication of these results in different populations with different evolutionary histories is one method to confirm the significance of these findings.

Initial reports suggested one origin for the HD mutation [27,28]. Genealogical analyses suggested that north western Europe, either France, Germany or Holland, was the source of the original HD mutation, as affected individuals in countries such as Canada, South Africa, Venezuela and Australia all traced their roots back to these European countries [28]. Haplotype analysis prior to the identification of the HD gene demonstrated a core haplotype underlying two-thirds of HD chromosomes, and this was presented as evidence for more than one, but a limited number of, initial founder HD chromosomes [29]. However, subsequent haplotype analysis of HD chromosomes from several homogeneous populations demonstrated multiple HD haplotypes within each homogeneous population, and it was suggested that multiple mutation events underlie the disorder [30,31].

In this cohort of HD individuals, analysis of HD haplotypes clearly demonstrates that there is more than one founder chromosome underlying HD (Table 5-3a). Haplotypes 1 and 2 represent 90% of all HD chromosomes and 57% of control chromosomes. The allelic differences between the most frequent affected core haplotype (haplotype 1) and the next most frequent haplotype (haplotype 2) suggest that there are at least 2 different chromosomal backgrounds on which expansion of the CAG repeat could have occurred. This is consistent with previous analyses suggesting that there are multiple origins for HD chromosomes [6]. The strong measures of linkage disequilibrium between HD and alleles at the glutamic acid site, the CCG repeat, D4S127 and D4S95 AccI do, however, suggest that the number of different ancestral mutations underlying HD is still small enough that linkage disequilibrium can be detected.
Development of other intragenic polymorphisms will enable further resolution of HD founder chromosomes.

The difference in frequency of haplotype 2 between the control and affected populations suggests that this haplotype was prone to expansion of the CAG repeat, leading to HD. It is possible that the difficulty in matching control chromosomes accounts for the observed significance. However, the higher mean CAG size of control individuals with this haplotype suggests that, similar to FRAXA and DM, it is the reservoir of chromosomes with higher than average repeat sizes that is prone to further expansion, eventually leading to the disease state. These chromosomes therefore represent a frequent haplotype in the HD population. This is consistent with the multistep model, in which a subgroup of unstable normal alleles with a particular haplotype and high range CAG size is liable to expand, resulting in a chromosome with a CAG repeat expanded to the HD range.

Similarly, haplotype 1 is more common on HD than on control chromosomes, and although this difference is not significant, the significantly greater length of the CAG repeats on normal chromosomes with this haplotype suggests that the length of the CAG repeat is a factor in the instability leading to disease.

Haplotype 3 represents 35.71% of control chromosomes, yet is rarely seen on HD chromosomes (Table 5-3a). It is of interest that haplotypes with this core have a 10 allele at the CCG repeat, suggesting that these haplotypes are somehow more stable than those with 7 CCG repeats and do not expand to become HD chromosomes. The average CAG length of haplotype 3 is 16.9 repeats, and this relatively small CAG size may be a factor in the stability of these chromosomes.

Analysis of chromosomes from families with new mutations for HD provides the opportunity to observe the rise of an HD chromosome from an intermediate size CAG repeat.
The 7 HD alleles derived from intermediate alleles in this analysis are associated with 5 different haplotypes; however, all have the same core haplotype (glutamic acid polymorphism and CCG repeat). Therefore, there are multiple chromosomal backgrounds, perhaps derivations of one ancestral chromosome, that are prone to expansion resulting in HD chromosomes. The CAG sizes of four of these haplotypes on normal chromosomes are higher than average (19 to 21 repeats), further supporting the hypothesis that CAG length is a factor in the instability of the CAG repeat, leading to expansion and HD.

The results of this analysis suggest that the mechanism of repeat expansion is a common one, occurring on several different chromosomal backgrounds. CAG length is one factor, although likely not the only one, contributing to the instability of the repeat. The data presented here suggest that chromosomes with CAG repeats within the high end of the normal range can undergo small expansions to the range of intermediate alleles (IAs), which are then prone to further expansion into the range associated with disease.

The identification of additional polymorphic markers within the HD gene would aid in further refinement of the patterns of association within the HD gene. Furthermore, the development of additional markers more closely flanking the CAG repeat associated with HD will help in determining more accurately the number and basis of ancestral founder HD chromosomes.

5.4 REFERENCES

1. Hill WG and Robertson A (1968). Linkage disequilibrium in finite populations. Theor Appl Genet 38:226-231.
2. Jorde LB, Watkins WS, Viskochil D, O’Connell P, Ward K (1993). Linkage disequilibrium in the Neurofibromatosis 1 (NF1) region: Implications for gene mapping. Am J Hum Genet 53:1038-1050.
3. Kerem B-S, Rommens JM, Buchanan JA, Markiewicz D, Cox TK, Chakravarti A, Buchwald M, Tsui L-C (1989). Identification of the cystic fibrosis gene: Genetic analysis. Science 245:1073-1080.
4. Buetow KH, Shiang R, Yang P, Nakamura Y, Lathrop GM, White R, Wasmuth JJ, Wood S, Berdahl LD, Leysens NJ, Ritty TM, Wise M, Murray JC (1991). A detailed multipoint map of human chromosome 4 provides evidence for linkage heterogeneity and position-specific recombination rates. Am J Hum Genet 48:911-925.
5. Allitto BA, MacDonald ME, Bucan M, Richards J, Romano D, Whaley WL, Falcone B, Ianazzi J, Wexler NS, Wasmuth JJ, Collins FS, Lehrach H, Haines JL, Gusella JF (1991). Increased recombination adjacent to the Huntington disease-linked D4S10 marker. Genomics 9:104-112.
6. Andrew SE, Theilmann J, Hedrick A, Mah D, Weber B, Hayden MR (1992). Nonrandom association between Huntington disease and two loci separated by about 3 Mb on 4p16.3. Genomics 13:301-311.
7. MacDonald ME, Lin C, Srinidhi L, Bates G, Altherr M, Whaley WL, Lehrach H, Wasmuth J, Gusella JF (1991). Complex patterns of linkage disequilibrium in the Huntington disease region. Am J Hum Genet 49:723-734.
8. Imbert G, Kretz C, Johnson K, Mandel JL (1993). Origin of the expansion mutation in myotonic dystrophy. Nature Genet 4:72-76.
9. Richards RI, Holman K, Friend K, Kremer E, Hillen D, Staples A, Brown WT, Goonewardena P, Tarleton J, Schwartz C, Sutherland GR (1992). Evidence of founder chromosomes in fragile X syndrome. Nature Genet 1:257-260.
10. Oudet C, Mornet E, Serre JL, Thomas F, Lentes-Zengerling S, Kretz C, Deluchat C, Tejada I, Boue J, Boue A, Mandel JL (1993). Linkage disequilibrium between the fragile X mutation and two closely linked CA repeats suggests that fragile X chromosomes are derived from a small number of founder chromosomes. Am J Hum Genet 52:297-304.
11. Zhong N, Dobkin C, Brown WT (1993). A complex mutable polymorphism located within the fragile X gene. Nature Genet 5:248-253.
12. MacPherson JN, Bullman H, Youings SA, Jacobs PA (1994). Insert size and flanking haplotype in fragile X and normal populations: possible multiple origins for the fragile X mutation. Hum Mol Genet 3:399-405.
13. Rommens JM, Lin B, Hutchinson GB, Andrew SE, Goldberg YP, Glaves ML, Graham R, Lai V, McArthur J, Nasir J, Theilmann J, McDonald H, Kalchman M, Clarke LA, Schappert K, Hayden MR (1993). A transcription map of the region containing the Huntington disease gene. Hum Mol Genet 2:901-907.
14. Weir BS (1990). Genetic data analysis. Sinauer, Sunderland, MA.
15. Wasmuth JJ, Hewitt J, Smith B, Allard D, Haines JL, Skarecky D, Parlow E, Hayden MR (1988). A highly polymorphic locus very tightly linked to the Huntington disease gene. Nature 332:734-736.
16. Taylor SAM, Barnes GT, MacDonald ME, Gusella JF (1992). A dinucleotide repeat polymorphism at the D4S127 locus. Hum Mol Genet 1:142.
17. Andrew SE, Goldberg YP, Theilmann J, Zeisler J, Hayden MR (1994). A CCG repeat polymorphism adjacent to the CAG repeat in the Huntington disease gene: Implications for diagnostic accuracy and predictive testing. Hum Mol Genet (in press).
18. Andrew SE, Goldberg YP, Kremer B, Telenius H, Theilmann J, Adam S, Starr E, Squitieri F, Lin B, Kalchman MA, Graham RK, Hayden MR (1993). The relationship between trinucleotide (CAG) repeat length and clinical features of Huntington disease. Nature Genet 4:398-403.
19. Duyao M, Ambrose C, Myers R, Noveletto A, Persichetti F, Frontali M, Folstein S, Ross C, Franz M, Abbott M, Gray J, Conneally P, Young A, Penney J, Hollingsworth Z, Shoulson I, Lazzarini A, Falek A, Koroshetz W, Sax D, Bird E, Vonsattel J, Bonilla E, Alvir J, Bickman Conde J, Cha J-H, Dure L, Gomez F, Ramos M, Sanchez-Ramos J, Snodgrass S, deYoung M, Wexler N, Moscowitz C, Penchaszadeh G, MacFarlane H, Anderson M, Jenkins B, Srinidhi J, Barnes G, Gusella J, MacDonald M (1993). Trinucleotide repeat length instability and age of onset in Huntington’s disease. Nature Genet 4:387-392.
20. Snell R, MacMillan JC, Cheadle JP, Fenton I, Lazarou LP, Davies P, MacDonald ME, Gusella JF, Harper PS, Shaw DJ (1993). Relationship between trinucleotide repeat expansion and phenotypic variation in Huntington disease. Nature Genet 4:393-397.
21. MacDonald ME, Duyao M, Ambrose C, Barnes G, Srinidhi J, Myers R, Gusella J (1993). A codon deletion in the Huntington’s disease gene is associated with the major HD chromosome haplotype. Am J Hum Genet 53:A80.
22. Jeffreys AJ, Royle NJ, Wilson V, Wong Z (1988). Spontaneous mutation rates to new length alleles at tandem-repetitive hypervariable loci in human DNA. Nature 332:278-281.
23. Elbein SC (1992). Linkage disequilibrium among RFLPs at the insulin-receptor locus despite intervening Alu repeat sequences. Am J Hum Genet 51:1103-1110.
24. Hastbacka J, Chappelle A, Kaitila I, Sistonen P, Weaver A, Lander E (1992). Linkage disequilibrium mapping in isolated founder populations: diastrophic dysplasia in Finland. Nature Genet 2:204-211.
25. Pandolfo M, Sirugo G, Antonelli A, Weitnauer L, Ferretti L, Leone M, Dones I, Cerino A, Fujita R, Nanauer A, Mandel JL, Di Donato S (1990). Friedreich ataxia in Italian families: Genetic homogeneity and linkage disequilibrium with the marker loci D9S5 and D9S15. Am J Hum Genet 47:228-235.
26. Kaplan N and Weir BS (1992). Expected behaviour of conditional linkage disequilibrium. Am J Hum Genet 51:333-343.
27. Conneally PM, Haines JL, Tanzi RE, Wexler NS, Penchaszadeh GA, Harper PS, Folstein SE, Cassiman JJ, Myers RH, Young AB, Hayden MR, Falek A, Tolosa ES, Crespi S, Di Maio L, Holmgren G, Anvret M, Kanazawa I, Gusella JF (1989). Huntington disease: No evidence for locus heterogeneity. Genomics 5:304-308.
28. Hayden MR (1981). Huntington’s disease. Springer Verlag, New York.
29. MacDonald ME, Lin C, Srinidhi L, Bates G, Altherr M, Whaley WL, Lehrach H, Wasmuth J, Gusella JF (1991). Complex patterns of linkage disequilibrium in the Huntington disease region. Am J Hum Genet 49:723-734.
30. Andrew SE, Theilmann J, Almqvist E, Norremolle A, Lucotte G, Anvret M, Sorenson SA, Turpin JC, Hayden MR (1993). DNA analysis of distinct populations suggests multiple origins for the mutation causing Huntington disease. Clin Genet 43:286-294.
31. Almqvist E, Andrew SE, Theilmann J, Goldberg P, Zeisler J, Drugge U, Grandell U, Tapper-Persson M, Winblad B, Hayden MR, Anvret M. Geographical distribution of haplotypes in Swedish families with Huntington disease (In press, Human Genetics).

CHAPTER 6 IDENTIFICATION OF AN ALU RETROTRANSPOSITION EVENT

The work presented in this chapter has contributed to three publications.

Goldberg YP, Rommens JM, Andrew SE, Hutchinson GB, Lin B, Theilmann J, Graham R, Glaves M, Starr E, McDonald H, Nasir J, Schappert K, Kalchman M, Clarke LA, Hayden MR (1993). Identification of an Alu retrotransposition event in close proximity to a strong candidate gene for Huntington’s disease. Nature 362:370-373.

Hutchinson GB, Andrew SE, McDonald H, Goldberg YP, Graham R, Rommens JM, Hayden MR (1993). An Alu element insertion in two families with HD defines a new active Alu subfamily. Nucl Acids Res 21:3379-3383.

Rommens JM, Lin B, Hutchinson GB, Andrew SE, Goldberg YP, Glaves ML, Graham R, Lai V, McArthur J, Nasir J, Theilmann J, McDonald H, Kalchman M, Clarke LA, Schappert K, Hayden MR (1993). A transcription map of the region containing the Huntington disease gene. Hum Mol Genet 2:901-907.

6.1 INTRODUCTION

After mapping a disease locus to a chromosomal location, the region of the genome requiring further study to identify the genes present may still be relatively large (1-2 Mb or more). Systematic and reliable identification of coding regions within extensive genomic regions is difficult, as genes are irregularly dispersed and may contain many exons. Selective expression of a particular gene with respect to type of tissue or stage of development may also complicate retrieval of cDNA.
Previous methods of searching for genes, through the identification of CpG rich sequences and demonstration of phylogenetic conservation over the region of interest, are labour intensive and require subsequent experimentation to obtain cDNAs for sequence analysis. The more recent approaches to identifying genes, including exon trapping [1,2] and cDNA selection procedures [3,4], are more sensitive and greatly expedite cloning strategies.

6.1.1 GENE TRACKING

To identify transcribed sequences within the proximal candidate region, a collaboration with Dr. J. Rommens was established. We identified and cloned transcribed segments contained within a 1 Mb region localized around D4S95, using a modified direct cDNA selection scheme termed Gene Tracking (Figure 6-1) [5,6]. The Gene Tracking method involves preparation of cDNA from an appropriate tissue using primers that can subsequently be used in the polymerase chain reaction (PCR) to amplify and clone specific cDNAs. The cDNA pool is blocked for repetitive sequences and hybridized exhaustively with immobilized and purified YAC or cosmid DNA from the candidate region. After two rounds of prolonged hybridization, the eluted cDNAs were amplified by PCR and cloned to yield a library of selected cDNAs for each YAC. The clones of these libraries were arrayed and screened for the presence of repetitive sequence, and the remaining clones were then individually hybridized to EcoRI digestions of human DNA, of human-hamster hybrids that contain human chromosome 4, and of YAC clone DNAs in order to confirm their origin. The clones were then hybridized to each other to test uniqueness, hybridized to a series of additional overlapping YACs for physical mapping, hybridized to RNAs of tissues or cell lines, and finally characterized by sequencing.

Figure 6-1. Gene Tracking. (Flow chart. Candidate region: 4p map showing D4S10, Region 1 and Region 2, scale 1000 kb. Gene Tracking: immobilized YACs from Region 1 hybridized with brain and tissue mix cDNAs; elution and preparation of minilibraries; selection of clones. Mapping: chromosome 4 somatic cell hybrid; YAC mapping and assignment to BINS. Transcription unit assembly: Northern blot, cDNA screening, sequencing. Sequence analysis: BLAST, SORFIND, PYTHIA, GRAIL.)

In this collaboration, Gene Tracking was performed with cDNA produced from tissue samples of frontal cortex and with pools of cDNAs from four tissue sources: fetal brain, frontal cortex, bone marrow and liver. A high proportion (between 50% and 90%) of clones were found by hybridization to originate from chromosome 4 and from the original chromosome 4 YAC. The cDNA clones were termed "GT clones" as an abbreviation for "gene tracked". In this manner a total of 53 GT clones was isolated. The structural integrity of the human DNA within the YAC was confirmed by comparing the hybridization patterns observed for each clone with YAC DNA and with human and human-hamster hybrid DNAs. For each GT clone, the hybridizing EcoRI fragments of the YAC DNAs corresponded to those observed with human total genomic and chromosome 4 DNA.

The series of overlapping YACs spanning the region was used to delineate physical intervals into non-overlapping BINS, each containing a particular stretch of DNA across the 1 Mb region, as depicted in Figure 6-2. BIN 3, which contains D4S127, D4S95 and D4S182, was subdivided into 3 compartments, BIN 3A, 3B and 3C, by using additional overlapping YACs. Refined positioning of each cDNA was deduced from its hybridization pattern to this array of YACs.
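
The BIN assignment described above amounts to a lookup from the set of YACs to which a clone hybridizes onto a physical interval. A minimal sketch in Python follows. The YAC names are those given in the legend of Figure 6-2, but the mapping from hybridization pattern to BIN number assumes a simple tiling path in which 353G6, 70D11 and 2A11 overlap pairwise in that order. That ordering, the function name and the example clone patterns are illustrative assumptions only, and the subdivision of BIN 3 with the additional YACs is omitted.

```python
# Hypothetical pattern-to-BIN table: three YACs overlapping pairwise along a
# tiling path define five intervals (A only, A+B, B only, B+C, C only).
# The actual overlaps are those drawn in Figure 6-2, not this table.
BIN_BY_PATTERN = {
    frozenset({"353G6"}): "BIN 1",
    frozenset({"353G6", "70D11"}): "BIN 2",
    frozenset({"70D11"}): "BIN 3",
    frozenset({"70D11", "2A11"}): "BIN 4",
    frozenset({"2A11"}): "BIN 5",
}

def assign_bin(positive_yacs):
    """Return the BIN for a clone, given the YACs it hybridizes to.

    Patterns that are impossible on the assumed tiling path (for example a
    clone positive on the two outer YACs only) are reported as unassigned.
    """
    return BIN_BY_PATTERN.get(frozenset(positive_yacs), "unassigned")

# Made-up hybridization results for three hypothetical clones.
example_clones = {
    "clone_x": ["70D11"],
    "clone_y": ["353G6", "70D11"],
    "clone_z": ["353G6", "2A11"],
}

for clone, yacs in example_clones.items():
    print(clone, "->", assign_bin(yacs))
```

Keying the table on the full set of positive YACs, rather than testing YACs one at a time, makes inconsistent hybridization patterns visible instead of silently forcing them into a BIN.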
The clones were categorized into transcription units by refined physical mapping and cross-hybridization to each other and to RNA from a variety of tissues, including those initially used to select the cDNA fragments. Direct sequence analysis of the clones revealed that several contained open reading frames; however, there were also clones for which open reading frames could not be detected. The combined information from RNA hybridization and physical mapping clearly indicated that some of the GT clones were portions of the same transcription units. Based on their expression pattern and size, a total of nine different mRNAs were detected. The GT clones, their localization according to BIN, transcript size if observed, and sequence analysis of the clones are given in Table 6-1.

Figure 6-2. Mapping of transcriptional units within the proximal candidate region. Overlapping regions within YACs 353G6, 70D11 and 2A11 were used to define 5 separate BINS. YACs A187G12 and D102A10 were used to further refine BIN 3 into three separate regions: A, B and C. GT clones were mapped by hybridization to digested YAC DNA and assigned to BINS accordingly.

6.1.2 GT CLONE ANALYSIS

The large number of cDNA clones (GT clones) isolated made it necessary to rank them as to their potential for candidacy for further analysis.
Refined physical mapping of these clones, by hybridization to different YACs and to a cosmid-phage contig from this region as well as long-range mapping by pulsed field electrophoresis, identified a subset of clones which mapped to the proximal candidate region between D4S127 and D4S182 (BIN 3). This interval contains D4S95 and also encompasses the DNA markers that form the core haplotype present on about one third of disease chromosomes [7]. A total of 20 cDNA clones mapped to BIN 3 (Table 6-1).

GT clones from BIN 3 were tested for coding potential on Northern blots to determine transcript size and any difference in expression between control and affected individuals. GT clones showing coding potential with multiple bands on Southern blot hybridization, representing multiple exons, were also of interest. In addition, GT clones with excellent coding potential according to computer analysis of sequence by GRAIL (Gene Recognition Analysis Internet Link) were also treated as good candidates. For example, the transcription unit detected by GT70 had excellent coding potential, detected several genomic fragments, hybridized to two distinct RNA species and also detected DNA polymorphisms, making it a strong candidate for the HD gene.

BIN 2  BIN 1A BIN lB  BiN  *70 *63 87 66 1 65 54 72 1 89  168 67 71 65 69 68 1 67 86 166 88 *149  GT  757 600 536 644 600,550 757 764 695,578  650 600 912 207 976 573 500 600 550 600 584  Clone size(bp)  14.0,7.5 13.7 14.0  9.0,8.5,1.2 9.0,1.2 8.5 10.0 11.5,4.2 2.7 2.8 11.0,6.0  7.0 12.0 12.0 2.9 3.8 9.5 9.5 9.5 12.0 6.0 6.0,5.0  EcoRi size (kb)  11.0,13.0;L,F,C,W,B absent absent absent absent absent absent absent  absent absent absent  Sequence analysis  DB search neg. MIR repeat present Same as GT23 Same as GT23 Same as GT23 Same as GT23 Same as GT23 DBsearchneg. DB search neg. DBsearchneg.  DB search neg. Coding Potential Excellent SameasGT7O DB search neg. DB search neg. 
DB search neg, sequence overlaps with GT66 DB search neg., Alu and MER repeats present DB search neg. DB search neg.,composite clone  DB similarity, HUMPHPLA2(phospholipase A2) 5.5 ; W,Fi,C,B,Co DB similarity, MITGTRN6(yeast mitochondrion) absent Same as GT67 absent DB search neg. absent DB search neg. absent DB search neg. Li and MER12 repeats present absent absent DB search neg., sequence overlaps with GT 68 DB search neg., sequence overlaps with GT 68 Not sequenced 4.5, similar to GT88 4.5, similar to GT166 Not sequenced 11.0,13.0;K,Co,Fi,L,W,C DB search neg. Coding Potential Excellent  RNA hybridization size(kb),distribution  Table 6-1. Summary of characterization of 53 retrieved cDNA fragments  BIN 3B  551 592 532 595 589 597 500 646 550  14.0 14.0 14.0 14.0  23 49 90 93 129 130 136 44 48  BIN3C  BIN 3B  BIN  24  45 34b 1 72 157 30 1 38 169 127 170 53  GT  480 422 443,250 439 400 495 480  667  516 500 560 500 490 458 600 550 550 550  Clone size(bp)  6.5  5.0 6.4 13.0 13.0 8.0,14.0 16.0,14.0,7.5 14.0 16.0  9.0  EcoRi size (kb) 3.8;W,L,F,C,Co  RNA hybridization size(kb),distribution  DB search neg., 2 ORFs, Alu repeat present  search neg. Alu repeat present similarity HSIL1AG, Alu repeat present search neg. Alu repeat search neg. Sequence overlaps GT169  Sequence overlaps with GT45 DB DB DB DB  DB search neg. Similarity to Li repeat  DB search neg. Li repeat present 12.0  DB search neg. ORF good coding potential Sequence overlaps with GT123 DB search neg. DB search neg. DB search neg. i79bp ORF,good coding potential 08 match HUMXTO1O95(EST) is identical  search neg. similarity TFD:P0136, 422bp ORF search neg. 443bp ORF,excellent coding potential search neg. 439bp ORF,excellent coding potential search neg. Coding potential good search neg. search neg. absent 5.5 5.5 5.5  absent absent absent 3.8, Fi,L,W  3.6,wide distribution 1.8,3.6, Fi,L,W,C,Co  absent absent  DB DB DB DB DB DB DB  absent  Sequence analysis  Table 6-1. 
Summary of characterization of 53 retrieved cDNA fragments  BIN 4  128 131 1 64 I 54 1 59 43 1 33  absent  BIN5B  450 447 352 662 500 500 349  14.0,9.0 14.0,9.0,4.1 14.0,9.0 7.5 9.0 4.2 9.0 98 123 126 125 137 1 60 161

The clones are listed by GT number according to their BIN assignments. The clone size and the EcoRI genomic fragments detected with these clones are also listed. Sizes of mRNAs detected in the tissues are given in kb: K=kidney, Co=Cos cells, F=fibroblasts, L=lymphoblast, W=HL60 cells, C=Caco-2 cells, B=bone marrow, F=frontal cortex, FB=fetal brain. Groups of clones that are shown bracketed indicate those that partially overlap as determined by cross-hybridization or sequence analysis. Database (DB) searches were carried out against non-redundant nucleic acid and protein databases of GenBank as well as the dbEST and Transcription Factor databases.

Many gene mutations involve genomic rearrangements that are detectable by Southern blot. A screen of DNA from a battery of 250 unrelated patients, digested with two different enzymes, was undertaken in the laboratory by Jane Theilmann, beginning with the GT clones that map to BIN 3.

6.2 RESULTS

6.2.1 GT48 GENOMIC REARRANGEMENT

One GT clone, GT48, located 120 kb from D4S95, detected a genomic rearrangement in 2 out of 250 patients on Southern blots of MspI-digested DNA. This altered band (a 1.7 kb MspI fragment) segregated with HD in both families (Figure 6-3a, 6-3b).

Interestingly, in one of these families (Figure 6-4a) a recombination event places the HD gene distal to D4S125. This recombination event, in an affected individual from a family with a clearly established diagnosis of HD, reduced the candidate region by indicating that the defective gene must be distal to D4S125, thus redefining the proximal boundary for the gene (Figure 6-4a).
The rearrangement occurred on the same haplotype in both families (Figure 6-4a, 6-4b) and this haplotype was unique among 140 HD families, suggesting a common origin for this rearrangement. Both families were of Scottish origin, with their ancestors living 50 km apart.

6.2.2 GENOMIC CLONING OF ALU RETROTRANSPOSITION

In order to map and localize the genomic rearrangement further, a λ phage was isolated by Dr. Rommens using GT48 as a probe (Figure 6-5a). Detailed restriction mapping of λGT48 in control DNA, determined from single and double digestions using HindIII, TaqI, MboI, MspI, PstI and XbaI and hybridizations with GT48, is shown in Figure 6-5b. A polymorphic HindIII site, detected by GT48 and an adjacent GT clone, GT44, was identified as shown (Figure 6-5b, 6-5c).

Figure 6-3a. Genomic rearrangement in one of two families with Huntington disease. Southern blot analysis of MspI-digested genomic DNA probed with GT48 revealed an altered 1.7 kb MspI fragment in affected individuals from this family.

Figure 6-3b. Genomic rearrangement co-segregating with HD in the second family. Southern blot analysis of MspI-digested genomic DNA probed with GT48 shows an altered 1.7 kb MspI fragment co-segregating with HD in all affected individuals and in individuals predicted to be at high risk for having inherited the HD gene.

Figure 6-4a. Recombination within an HD family refining the proximal boundary of the HD candidate region. The affected haplotype in this family is designated within the boxed region. Recombination between markers D4S180 and D4S127 in individual II-6 shifts the proximal boundary at least 200 kb towards the telomere, from D4S125 (YNZ32) to D4S180.

Figure 6-4b. Haplotype analysis of the second family demonstrating the insertion detected by GT48. The affected haplotype in this family is designated within the boxed region. Individual II-2 has inherited the HD chromosome but does not yet appear affected.

Figure 6-5a. Mapping of GT clones. GT44, GT48 and GT49 mapped to both A187G12 and D102A10 as well as 70D11, indicating their position in BIN 3C. All three clones are contained within a 15 kb lambda phage, λGT48, isolated using GT48 as a probe. λGT48 (*) contains a HindIII polymorphism detected by both GT44 and GT48.
Analysis of restriction mapping patterns from affected individuals with the insertion demonstrated that the rearrangement was detected with GT48 but not with GT49, thus localizing this rearrangement to the 1.2 kb polymorphic HindIII fragment (Figure 6-5b).

The genomic rearrangement was seen in genomic DNA digested with multiple enzymes (HincII, HindIII, MboI, TaqI) and hybridized with GT48 (Figure 6-5d). Using GT48 as a probe, the rearrangement was not observed on Southern blots of PstI-digested DNA, but was detected with MspI-digested DNA. Thus, the site of rearrangement was further localized within the 1.2 kb HindIII fragment to within the 200 bp MspI-PstI fragment (Figure 6-5b). Fine restriction mapping of the genomic region around GT48 and the altered restriction fragment sizes in the affected individuals strongly suggested that the genomic rearrangement was an insertion.

The 1.2 kb HindIII fragment from λGT48 was subcloned and sequenced. Primers flanking the insertion site were designed and a PCR assay to detect the inserted element was established (Figure 6-6a). PCR primers are as follows: primer A: 5’ATGTAKI9(Ifl’CAGGACATGTGGC3’; primer B: 5’AAATAACATCCAGAATCflCAGAT3’. PCR conditions were 3 mM MgCl2, 50 mM KCl, 20 mM Tris pH 8.4, 200 µM of each dNTP, 0.5 µM of each primer and 1.25 U of Taq DNA polymerase per 25 µl reaction. Thermal cycling conditions were 95 °C for 45 s, followed by 35 cycles of 94 °C for 1 min, 54 °C for 30 s and 72 °C for 1 min, with a final extension at 72 °C for 10 min. PCR products were resolved on 1% agarose gels, in 1X TBE, at 100 V for 1 hour.

Figure 6-5b. Restriction map of λGT48, a 15 kb clone containing the site of Alu insertion detected by GT48 in two Huntington disease families. The locations of GT48 and the adjacent GT clones GT49 and GT44 are shown. The Alu insertion site was localized to a 1.2 kb HindIII fragment, shown boxed.
Using GT48 as a probe, the rearrangement was not detected with PstI, yet was detected with MspI, thus further localizing the insertion to the MspI-PstI fragment within the 1.2 kb HindIII fragment. P=PstI, E=EcoRI, X=XbaI, M=MspI, T=TaqI, Mb=MboI, H=HindIII. The polymorphic HindIII site is marked (*).

Figure 6-5c. Hybridization of GT44 to YAC DNAs. Southern blot of DNA from YACs 353G6, 70D11, A187G12, D102A10 and 2A11 digested with HindIII localizes GT44 to BIN 3B and detects the HindIII polymorphism.

Figure 6-5d. Genomic DNA digested with multiple restriction enzymes (HincII, HindIII, MboI, TaqI) and probed with GT48 resulted in altered bands of equal size in the affected individuals from each family (lane 1, family a; lane 2, family b). A control is shown in lane C.

Figure 6-6a. Primers flanking the insertion site were designed. These primers generate a 118 bp fragment in normal individuals (lanes 6-9) and, in addition, a 460 bp product in five affected individuals from both families (lanes 1-5). The PCR product was subcloned and sequenced.

AATTTCTTCTTGTTTAAGAGTATGCTGGCCGGGCGCGGTGGCTCACGCCT
GTAATCCCGCACTTTGGGAGGCCGGGCGGGTGGATCATGGGTCAGGA
ACAAAAAATTACCGGGCGCGGTGGCGGGCGCCTGTAGTCCCAGCTACTC
GGGAGGCTGAGGCAGGAGAATGGCGTGAkCCCGGGAAGCGGAGCTTTCG
TGAGCCGAGkTTGCGCCACTGCAGTCCGCAGTCCGGCCTGGGCGCAGG
CAA.GA.CTCCGTCTCAAAAAAAAAAAAAAAAAAAAAAAAAAA
GTATGCTGATTGATATTTGTTCATCATGGG

Figure 6-6b. The 1.2 kb HindIII fragment was subcloned and sequenced, and primers flanking the site of insertion were derived. PCR products from affected individuals with the insertion were cloned and sequenced. The inserted sequence shown here represents a full-length Alu element (bold) and the insertion site is flanked by a 9 bp direct repeat (underlined).
These primers generated a 118 base pair fragment in normal individuals and a 460 base pair product in five affected individuals from both representative families. The 460 base pair product was subcloned using TA cloning (according to the Invitrogen protocol) and sequenced (Sequenase).

Cloning and sequence analysis of the rearrangement in both families demonstrated an insertion element of 331 base pairs between the MspI and PstI sites of the 1.2 kb HindIII fragment. The insertion element is a member of the Alu family of mobile repetitive elements (Figure 6-6b). The Alu was identical in both families and had inserted at the identical nucleotide position in both families. These observations, and the tracing of both genealogies to the same area of Scotland, suggested that a single retrotransposition event had occurred and is now seen in two branches of the same family.

The core chromosomal haplotype seen in both families with the Alu insertion, extending for about 1 Mb and including alleles at D4S95 and D4S98, is seen in 2% (14/687 chromosomes) of control DNA banked in Vancouver. The PCR assay designed to detect this insertion demonstrated that none of the 14 individuals from our cohort with this rare haplotype had this insertion. Furthermore, 30 affected individuals of Scottish descent did not have this rearrangement. In addition, screening of 1,000 control chromosomes of multiple ancestries with GT48 showed that no insertions had occurred.

The Alu element is flanked by a perfect 9 base pair duplication of the target sequence at the site of insertion (Figure 6-6b), characteristic of mobile element insertion by retrotransposition. Creation of staggered single-stranded nicks on both strands of DNA, followed by repair synthesis, results in flanking direct repeats during the insertion event [8]. Furthermore, the sequence surrounding the insertion is AT rich, which is consistent with the hypothesis that Alu elements preferentially integrate into AT-rich regions [8].

If the Alu was inserted at the HD gene, alleles at polymorphisms in this region would be expected to be in strong nonrandom allelic association with HD. The 1.2 kb HindIII polymorphism was investigated for such allelic association with HD. Allele frequencies differed slightly between the HD and control populations; however, this difference was not significant (Table 6-2).

Table 6-2. Allele frequencies for the 1.2 kb HindIII polymorphism

ALLELE    HD No.    HD %     CONTROL No.    CONTROL %
A            7      15.9          68           30.8
B           37      84.1         153           69.2
TOTAL       44     100.0         221          100.0

χ² = 3.29, df = 1, p = 0.07

Sequence analysis of the 1.2 kb HindIII fragment containing GT48 did not reveal significant coding potential. The question of whether the Alu was causative of HD in these two families was addressed after identification of a CAG repeat, in a gene termed IT15, that is expanded in HD patients, and development of an assay for CAG analysis (Chapter 7). CAG analysis of individuals in these two families showed that all affected individuals manifesting the Alu insertion also demonstrate CAG expansion in the range associated with HD (Figure 6-7).

Figure 6-7. CAG repeat sizes for the two families with the Alu retrotransposition. CAG sizes greater than 36 repeats are associated with Huntington disease.

6.3 DISCUSSION

The goal of positional cloning strategies is the identification of genes of biological importance. The use of cDNA selection procedures to yield a transcription map for a genomic region provides significant advantages over conventional positional cloning approaches, as numerous candidate genes or gene fragments are directly and quickly available for assessment.

Gene Tracking, a cDNA selection scheme, was used to identify coding sequences from the 1 Mb candidate region around D4S95, from D4S180 to D4S183. The advantage of Gene Tracking over other methods of transcribed sequence identification is that it enables detection of transcribed sequences regardless of genomic organization, whereas exon trapping is unable to detect intronless genes, as it is dependent on splice junctions for identification [1,2]. In addition, Gene Tracking is extremely sensitive, able to detect rare transcripts present at only a few copies per cell (J. Rommens, personal communication).

After publication of the identification of the GT clones, a novel gene containing a trinucleotide repeat that is expanded and unstable on Huntington’s disease chromosomes was reported [9]. Sequence comparison with the GT clones revealed that GT70 and GT149 (BIN 1A and BIN 2) are portions of this gene. GT70 corresponds to nucleotides 835 to 1601, which is 406 nucleotides 3’ to the trinucleotide repeat. GT149 corresponds to nucleotides 5322 to 5930. Cross-hybridization and hybridization to genomic DNA indicated that GT63 overlaps with GT70.

At the time of this analysis, the proximal candidate region for HD spanned 2.2 Mb, with the distal boundary defined by a crossover occurring between D4S98 and D4S43 [10], and the proximal boundary defined by recombination between D4S10 and D4S125 [11]. In one family, identification of a recombination event between markers D4S180 and D4S127 in individual II-6 (Figure 6-4a) indicated that the mutation causing HD was distal to the marker D4S125. This reduced the candidate region by moving the proximal boundary at least 200 kb towards the telomere.

One cDNA clone, GT48, isolated by a direct cDNA selection procedure, detected a DNA rearrangement in two families with HD. The genomic rearrangement was caused by Alu retrotransposition and occurred in a gene-rich area 5’ to the HD gene, approximately 190 kb from its putative start site (Figure 6-8).
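The test statistic reported with Table 6-2 can be reproduced from the allele counts alone. The following is a minimal sketch, not the analysis actually run for this thesis: it assumes a 2 x 2 chi-square test with Yates continuity correction (an inference from the fact that the corrected statistic matches the reported 3.29; the text does not name the test), and the function name is illustrative.

```python
import math


def chi_square_2x2(a, b, c, d, yates=True):
    """Chi-square test (1 degree of freedom) for the 2 x 2 table [[a, b], [c, d]].

    Returns (statistic, p_value). With yates=True the continuity correction
    subtracts 0.5 from each |observed - expected| before squaring.
    """
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            diff = abs(observed[i][j] - expected)
            if yates:
                diff = max(diff - 0.5, 0.0)
            stat += diff ** 2 / expected
    # For 1 degree of freedom the chi-square survival function is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Allele counts from Table 6-2: rows are HD and control, columns are alleles A and B.
stat, p = chi_square_2x2(7, 37, 68, 153)
print(f"chi-square = {stat:.2f}, df = 1, p = {p:.2f}")
```

Run on the counts in Table 6-2, this gives the reported values (3.29, p = 0.07). Without the continuity correction the same counts give a statistic of about 3.99, so the choice of correction matters at this sample size.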
cDNAs for GT48 and the two adjacent clones, GT44 and GT49, were not detected despite extensive library screening by others in the laboratory, suggesting that the Alu has not inserted within a coding sequence. However, it is possible that there is a relationship between Alu transposition and HD, as the Alu retrotransposition segregates with HD in these two families and was not seen in 1,000 control chromosomes. Furthermore, the rearrangement occurs on a core chromosomal haplotype which occurs in 2% of the general population, where it is not associated with any rearrangement. The same Alu insertion has now been observed on the HD chromosome in 4 additional Scottish families, and has not been observed on 1,000 control chromosomes of Scottish ancestry, supporting specificity of the Alu retrotransposition event to HD chromosomes (J. Warner, personal communication).

Figure 6-8. Schematic map showing the location of the HD gene and GT48 with respect to 4p16.3 markers and the YAC BINS used in mapping.

The relationship between the insertion event and HD in these two families is not clear. Affected individuals with the Alu insertion demonstrate CAG repeat lengths consistent with HD and show no unusual clinical features attributable to the Alu retrotransposition event. Alu transposition has previously been shown to cause disease by interruption of exonic sequence, as for the factor VIII gene 5 and the cholinesterase gene [12]. Alternatively, the positioning of an Alu element within an intron of a gene can affect processing of the primary transcript, as seen in the NF1 gene 6. The Alu element identified here is 5’ to the HD gene, 190 kb from the putative initiation codon. It is possible that an Alu insertion could interfere with upstream regulatory sequences, such as enhancers, and affect expression of the gene.
However, the demonstration of CAG expansion in these two families, identical to that of other HD families without Alu transposition, suggests that the Alu retrotransposition event is not causative of disease.

Dr. G. Hutchinson analyzed the specific nucleotide variations in each Alu element that allow their classification into subfamilies based on the extent of divergence from the Alu consensus sequence [13]. The Alu element in the two HD families has the internal 7 bp duplication at bp 250, as well as the 7 other single nucleotide changes diagnostic for the Sb2 subfamily [13]. The locations of five other Sb2 Alu elements are known; however, only one is known to disrupt gene activity, by insertion into and inactivation of the cholinesterase gene [12]. The Alu element identified in this chapter does not have the 5 diagnostic nucleotide substitutions found in the further subdivided PV subfamily, members of which have been associated with de novo mutations in the factor IX gene [14] and the neurofibromatosis type 1 gene [15].

Sequencing of a 58 kb contig on 4p16.3, around the marker D4S98, demonstrated a large number of Alu repeats, with an average Alu density of 1.0 Alu per kb [16]. The presence of this Alu element may be a result of preferential Alu insertion in Alu-rich areas. However, whether the Alu is a contributing cause of CAG instability, or is an effect of CAG instability resulting in broad chromosomal instability and activation of transposing elements, or whether it is located there by chance alone has yet to be determined.

6.4 REFERENCES

1. Duyk GM, Kim S, Myers RM, Cox DR (1990). Exon trapping: A genetic screen to identify candidate transcribed sequences in cloned mammalian genomic DNA. Proc Natl Acad Sci USA 87:8995-8999.
2. Buckler AJ, Chang DD, Graw SL, Brook JD, Haber DA, Sharp PA, Housman DE (1991). Exon amplification: A strategy to isolate mammalian genes based on RNA splicing. Proc Natl Acad Sci USA 88:4005-4009.
3. Lovett M, Kere J, Hinton LM (1991). Direct selection: A method for the isolation of cDNAs encoded by large genomic regions. Proc Natl Acad Sci USA 88:9628-9632.
4. Parimoo S, Patanjali SR, Shukla H, Chaplin DD, Weissman SM (1991). cDNA selection: Efficient PCR approach for the selection of cDNAs encoded in large chromosomal DNA fragments. Proc Natl Acad Sci USA 88:9623-9627.
5. Rommens J, Lin B, Hutchinson GB, Andrew SE, Goldberg YP, Glaves ML, Graham R, Lai V, McArthur J, Nasir J, Theilmann J, McDonald H, Kalchman M, Clarke LA, Shappert K, Hayden MR (1993). A transcription map of the region containing the Huntington disease gene. Hum Mol Genet 2:901-907.
6. Goldberg YP, Lin B-Y, Andrew SE, Nasir J, Graham R, Glaves ML, Hutchinson G, Theilmann J, Ginzinger DG, Shappert K, Clarke L, Rommens JM, Hayden MR (1992). Cloning and assessment of the α-adducin gene close to D4S95 and assessment of its relationship to Huntington disease. Hum Mol Genet 1:669-675.
7. MacDonald ME, Novelletto A, Lin C, Tagle D, Barnes G, Bates G, Taylor S, Allitto B, Altherr M, Myers R, Lehrach H, Collins FS, Wasmuth JJ, Frontali M, Gusella JF (1992). The Huntington’s disease candidate region exhibits many different haplotypes. Nature Genet 1:99-103.
8. Daniels GR, Deininger PL (1985). Integration site preferences of the Alu family and similar repetitive DNA sequences. Nucl Acids Res 13:8939-8954.
9. Huntington Disease Collaborative Research Group (1993). A novel gene containing a trinucleotide repeat that is expanded and unstable on Huntington’s disease chromosomes. Cell 72:971-983.
10. Snell RG, Thompson LM, Tagle DA, Holloway TL, Barnes G, Harley HG, Sandkuijl LA, MacDonald ME, Collins FS, Gusella JF, Harper PS, Shaw DJ (1992). A recombination that redefines the Huntington disease region. Am J Hum Genet 51:357-362.
11. Bates GP, MacDonald ME, Baxendale S, Youngman S, Lin C, Whaley WL, Wasmuth JJ, Gusella JF, Lehrach H (1991). Defined physical limits of the Huntington disease gene candidate region. Am J Hum Genet 49:7-16.
12. Muratani K, Hada T, Yamamoto Y, Kaneko T, Shigeto Y, Ohue T, Furuyama J, Higashino K (1991). Inactivation of the cholinesterase gene by Alu insertion: Possible mechanism for human gene transposition. Proc Natl Acad Sci USA 88:11315-11319.
13. Hutchinson GB, Andrew SE, MacDonald H, Goldberg YP, Graham R, Rommens JM, Hayden MR (1993). An Alu element retrotransposition in two families with Huntington disease defines a new active Alu subfamily. Nucl Acids Res 21:3379-3383.
14. Vidaud D, Vidaud M, Bahnak BR, Siguret V, Gispert Sanchez S, Laurian Y, Meyer D, Goosens M, Lavergne JM (1993). Haemophilia B due to a de novo insertion of a human-specific Alu subfamily member within the coding region of the factor IX gene. Eur J Hum Genet 1:30-36.
15. Wallace MR, Anderson LB, Saulino AM, Gregory PE, Glover TW, Collins FS (1991). A de novo Alu insertion results in neurofibromatosis type 1. Nature 353:864-866.
16. McCombie WR, Martin-Gallardo A, Gocayne JD, FitzGerald M, Dubnick M, Kelley JM, Castilla L, Liu L, Wallace S, Trapp S, Tagle D, Whaley WL, Cheng S, Gusella J, Frischauf A-M, Poustka A, Lehrach H, Collins FS, Kerlavage AR, Fields C, Venter JC (1992). Expressed genes, Alu repeats and polymorphisms from chromosome 4. Nature Genet 1:348-353.

CHAPTER 7 THE RELATIONSHIP BETWEEN TRINUCLEOTIDE (CAG) REPEAT LENGTH AND CLINICAL FEATURES OF HUNTINGTON DISEASE

The work in this chapter has contributed to two publications.

Andrew SE, Goldberg YP, Kremer B, Telenius H, Theilmann J, Adam S, Starr E, Squitieri F, Lin B, Kalchman MA, Graham RK, Hayden MR (1993). The relationship between trinucleotide (CAG) repeat length and clinical features of Huntington’s disease. Nature Genet 4:398-403.

Goldberg YP, Andrew SE, Clarke LA, Hayden MR (1993). A PCR method for accurate assessment of trinucleotide repeat expansion in Huntington disease. Hum Mol Genet 6:635-636.
7.1 INTRODUCTION

The Huntington Disease Collaborative Research Group isolated a novel gene containing a trinucleotide repeat (CAG) that is expanded on HD chromosomes. The CAG repeat is located at the 5' end within the putative coding region. The gene is located between D4S180 and D4S127, and produces two transcripts of 10 and 13 kb encoding a protein of 348 kd with no homology to any known protein. In the initial report, this highly polymorphic CAG repeat was shown to range from 11 to 34 copies on normal chromosomes, while in the affected individuals analyzed, the CAG repeat had expanded to beyond 42 repeats, with the largest expansion observed being 100 trinucleotides [1].

The description of this trinucleotide expansion demonstrates that dynamic mutations are now associated with at least seven human genetic diseases including fragile X syndrome (FRAXA) [2,3], X-linked mental retardation (FRAXE) [4], myotonic dystrophy (DM) [5-7], X-linked spinal and bulbar muscular atrophy (SBMA) [8], spinocerebellar ataxia (SCA) [9], dentato-rubro-pallido-luysian atrophy (DRPLA) [10,11] and HD (reviewed in [12-16]).

Prior studies of these diseases have demonstrated a strong relationship between trinucleotide expansion and clinical severity in offspring. For example, in DM there is an association between an increase in repeat length and earlier clinical onset of disease [17-19]. Furthermore, trinucleotide expansion has also provided a molecular basis for the anticipation seen in the transmission of DM [17,19]. This is also seen in fragile X, where CCG repeat length correlates with the risk of expansion associated with mental retardation [12]. In SBMA, the age of onset and the age of clinical milestones, such as difficulty with stair climbing, decreased with increasing lengths of the CAG repeat [20]. Although the correlation between disease severity and CAG repeat length was demonstrated, factors other than the trinucleotide repeat appear to contribute significantly to the SBMA disease phenotype [20].

Genetic and environmental factors have both been invoked to account for the variation in age of onset of HD in different families. Studies of monozygous twins, however, have clearly shown by the high concordance of age of onset that genetic factors play a major role [21,22]. The purpose of this analysis was to assess the relationship between CAG repeat length and the clinical features of HD. In this chapter the expanded allele of the HD patient is referred to as the "upper allele" and the normal sized allele is referred to as the "lower allele".

7.2 RESULTS

7.2.1 DEVELOPMENT OF A PCR ASSAY FOR ASSESSMENT OF THE CAG REPEAT

In the initial report of the CAG repeat associated with HD, the Huntington Disease Collaborative Research Group detected CAG trinucleotide repeat expansion using primers HD1 and HD2, which span the CAG repeat as well as two separate adjacent CCG trinucleotide repeats (Figure 7-1). This highly repetitive DNA, together with its high GC content, renders this region difficult for PCR analysis. In order to reduce the GC content of the PCR product and to produce a smaller product for more accurate sizing of the CAG expansion, PCR conditions were developed using a new primer set (HD344 and HD5), eliminating one of the CCG repeats from the amplification product (Figure 7-1). This amplification allowed accurate determination of the length of CAG repeat expansion on both the normal and the disease alleles. PCR was established with primers HD344 (5' CCTTCGAGTCCCTCAAGTCCTTC 3') and HD5 (5' CGGCTGAGGCAGCAGCGG 3').

PCR conditions were 3 mM MgCl2, 50 mM KCl, 20 mM Tris pH 8.4, 2% formamide, 100 µM dATP, 100 µM dCTP, 100 µM dTTP, 75 µM 7-deaza-dGTP, 25 µM dGTP, 0.5 µM of each primer, 0.3 pmol of γ-32P end-labeled primer and 1.25 U of Taq DNA polymerase per 25 µl reaction. Thermal cycling conditions were 95 °C for 5 min, followed by 30 cycles of 94 °C for 1 min, 63 °C for 1 min, 72 °C for 1 min, with a final extension at 72 °C for 7 min. PCR products were resolved on 6% polyacrylamide gels and scored against an M13 sequencing ladder. The CAG trinucleotide length was determined by subtracting 110 bp of non-CAG containing DNA from the PCR size.

[Figure 7-1 schematic: positions of primers HD344, HD1, HD5, HD482 and HD2 relative to the (CAG) and (CCG) repeats]

Figure 7-1. Primers for amplification of the trinucleotide repeat. Primers HD1 and HD2 are as previously published*. HD1 = 5' ATGAAGGCCTTCGAGTCCCTCAAGTCCTC 3'. HD2 = 5' AAACTCACGGTCGGTGCAGCGGCTCCTCAG 3'. Primers HD344 and HD5 were synthesized according to the published sequence: HD344 = 5' CCTTCGAGTCCCTCAAGTCCTC 3'; HD5 = 5' CGGCTGAGGCAGCAGCGGCGT 3'. PCR was further modified by using HD344 together with HD482. HD482 = 5' GGCTGAGGAAGCTGAGGAG 3'. *Huntington Disease Collaborative Research Group (1993). Cell 72:971-983.

PCR across the CAG repeat was subsequently further optimized by design of a new primer (HD482, 5' GGCTGAGGAAGCTGAGGAG 3') used in combination with primer HD344 (Figure 7-1). Formamide (2%) and glycerol (15%) were also found to be essential for improved PCR across this GC rich region to achieve both specificity and good product yield [23]. Numerous other parameters were tested to improve the PCR, including 0-10% DMSO, TMAC, gelatin, Triton X-100 and BSA, without any improvement in the product. PCR conditions were 2 mM MgCl2, 50 mM KCl, 20 mM Tris pH 8.4, 3.5% formamide, 15% glycerol, 200 µM of each dNTP, 0.5 µM of each primer, 0.3 pmol of γ-32P end-labeled primer and 1.25 U of Taq DNA polymerase per 25 µl reaction. Thermal cycling conditions were 95 °C for 3 min, followed by 30 cycles of 94 °C for 1 min, 63 °C for 1 min, 72 °C for 1 min, with a final extension at 72 °C for 7 min.
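
The sizing rule used with both primer sets reduces to simple arithmetic: subtract the non-CAG flanking sequence from the PCR product size and divide by three. The following is a minimal sketch in Python, assuming the flank lengths quoted in this section (110 bp for HD344/HD5 and 76 bp for HD344/HD482). The names FLANK_BP and cag_repeats are illustrative only and are not part of the original protocol.

```python
# Flanking (non-CAG) sequence in bp for each primer pair, as quoted in the text.
# FLANK_BP and cag_repeats are hypothetical names chosen for this illustration.
FLANK_BP = {
    ("HD344", "HD5"): 110,
    ("HD344", "HD482"): 76,
}


def cag_repeats(pcr_size_bp, primers=("HD344", "HD5")):
    """Return the CAG repeat count implied by a PCR product size in bp."""
    repeat_bp = pcr_size_bp - FLANK_BP[primers]
    if repeat_bp < 0:
        raise ValueError("product is shorter than the flanking sequence")
    # Each CAG unit is 3 bp; a non-integer result points to a sizing problem.
    return repeat_bp / 3


# A 242 bp HD344/HD5 product and a 208 bp HD344/HD482 product both
# correspond to 132 bp of repeat, i.e. 44 CAG units.
print(cag_repeats(242))                      # 44.0
print(cag_repeats(208, ("HD344", "HD482")))  # 44.0
```

Returning a float rather than an integer is a deliberate choice in this sketch: a fractional value makes it visible when the scored band does not fall on a whole number of repeats.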
The CAG trinucleotide length was determined by subtracting 76 bp of non-CAG containing DNA from the PCR size.

Six percent denaturing polyacrylamide gels were found to be best for resolving and accurately sizing the trinucleotide repeat in HD patients (Figure 7-2). Fractionation of PCR products on non-denaturing polyacrylamide gels gave good resolution but did not allow for accurate sizing. To detect PCR products, γ-32P end-labeling of HD5 resulted in less background compared to α-32P dCTP incorporation during thermal cycling. PCR across repetitive regions of DNA often results in multiple band patterns (stutter) of variable intensity, creating difficulties in accurately sizing the repeat expansion. The pattern of the stutter is, however, consistent from gel to gel. To size the polymorphic alleles, the darkest intensity band (generally the second largest band of the stutter) was scored against an M13 sequencing ladder (Figure 7-2). The CAG trinucleotide length was determined by subtracting 76 bp or 110 bp of non-CAG containing DNA from the PCR size, depending on the primer set used. Failure to detect an expanded allele necessitates Southern blot analysis to distinguish between a normal individual homozygous for a normal allele and an affected patient with an expansion too large for analysis by PCR.

[Figure 7-2 autoradiograph: lanes labeled by TNR, sized against a ladder of 150 to 270 bp]

Figure 7-2. PCR amplification of DNA from HD patients using primers HD344 and HD5. PCR products were resolved on a 6% denaturing polyacrylamide gel, dried and subjected to autoradiography overnight. TNR = number of trinucleotide repeats. TNR = (PCR size - 110) / 3.

7.2.2 THE ASSOCIATION OF TRINUCLEOTIDE EXPANSION AND AGE OF ONSET

A total of 366 persons (259 unrelated families) constituted the cohort for this study (Table 7-1).
Age of onset of HD was known for all persons, while information concerning the clinical presentation, including involuntary movement disorder, psychiatric and cognitive dysfunction, was available in a subset. In this cohort, age of onset has been defined as the age at which the first clearly defined abnormality was apparent, including involuntary movements, psychiatric or cognitive abnormalities, or inability to perform complex hand movements as manifested by clumsiness.

Using both PCR methods described, both HD and normal alleles were detected in 98.4% of the cohort (360/366) (Table 7-2). Failure to detect an expanded allele in the remaining 6 persons could reflect technical failure, patients with HD without an upper allele, normal persons homozygous for the lower alleles, persons misdiagnosed as HD, or sample mix-up. These 6 individuals did not have any clinical characteristics that might have suggested that their exclusion would bias the subsequent analysis of the remaining 360 patients, and they were deleted from this analysis.

TABLE 7-1: Demographic and Clinical Data of Study Cohort (N = 360)

Sex distribution:    Male     191
                     Female   169

Age of onset:        Mean     41.5 ± 12.4
                     Range    5-85

Affected parent (by age group):

Age             N      Mother   Father   Unknown
0-20            20     7        13       0
21-40           167    75       82       10
41-60           149    64       68       17
61-80           23     7        9        7
>81             1      0        0        1
Total cohort    360    153      172      35

TABLE 7-2: CAG repeat length in cohort

                           N      Median   Range
Length of lower allele:    360    20       11-37
Length of upper allele:    360    44       38-121

Length of upper allele by age of onset:

Age of onset    N      Median   Range
0-20            20     56.5     46-121
21-40           167    46       38-75
41-60           149    43       39-52
61-80           23     42       40-44
>81             1      43

Group differences and pairwise comparisons of upper allele length are all significant (p < 0.001), except for the comparison between the 41-60 and 61-80 year groups (N.S.).

The median size of the upper allele, representing the expanded HD chromosome, was 44 repeats with a range of 38-121 repeats, while the range for the non-HD allele was 11-37 repeats with a median of 20. The range of CAG repeat length of 140 control chromosomes from individuals with no relationship to affected persons was 10 to 31, with a median of 17 repeats (Table 7-2).

To examine the relationship between age of onset and CAG length, linear regression was used, with logarithmic transformation of the age of onset, allowing the treatment of an exponential function as an intrinsically linear model. A highly significant correlation (n = 360, r = 0.70, p = 10^-7) was evident for the relationship between length of trinucleotide repeat and age of onset in the whole cohort, accounting for approximately 50% of the total variation in age of onset (r^2 = 0.49) (Figure 7-3). The regression curve was derived according to the formula: ln [age of onset] = 5.3379 - 0.0363 x [trinucleotide size]. When analysed by age of onset in 20 year intervals, the 0-20 and 21-40 groups had significantly greater trinucleotide repeat lengths compared to the older age of onset groups (Table 7-2).

A similar assessment was also made of the association between trinucleotide expansion and age of onset of other clinical features. A highly significant correlation was seen between CAG expansion and age of onset of chorea (n = 122, r = 0.611, p = 10^[?]), clumsiness (n = 37, r = 0.577, p = 10^[?]), dementia (n = 69, r = 0.636, p = 10^[?]), and psychiatric signs (psychosis/depression/severe behavioral problems) (n = 84, r = 0.518, p = 10^-6).

Prior studies have shown earlier onset in offspring of affected males [24-28]. In our cohort, those with an affected father had a lower age of onset (1 year) than those with an affected mother (p < 0.001) (Table 7-3).
The presence of juvenile onset patients may have been responsible for this finding, as their exclusion from this analysis resulted in no difference in onset age between offspring of an affected father compared to an affected mother. This confirms the sex of parent effect long observed in HD families.

[Figure 7-3 scatter plot: age of onset (0-100 years) against CAG-repeat length (35-85) with fitted regression curve; N = 360, r = -.70, r^2 = .49, p = 10^-7]

Figure 7-3. Age of onset by length of CAG repeat. The regression curve was calculated on log transformed age of onset data. One patient with onset at age 5 and a CAG length of 121 repeats is not shown as this was off scale.

TABLE 7-3: Length of upper and lower allele by sex of parent and grandparent

Affected parent/grandparent    N      Length of upper allele    Mean age of onset
                                      Median    Range           (± standard deviation)
Total cohort:
  Mother                       153    44        39-65           41.1 ± 11.3 *
  Father                       172    44        38-121          40.1 ± 12.7 *
Juvenile onset:
  Mother                       7      57        49-61           17.7 ± 3.5
  Father                       13     53        46-121          16.0 ± 4.0
Total cohort:
  Mother/grandmother           46     45        40-58           39.4 ± 11.7
  Mother/grandfather           44     44.5      39-65           39.4 ± 9.6
  Father/grandmother           67     45        40-70           35.2 ± 10.2
  Father/grandfather           32     45        38-121          36.4 ± 12.8
Juvenile onset:
  Mother/grandmother           4      56.5      52-58           19.3 ± 1.0
  Mother/grandfather           1      49                        19
  Father/grandmother           8      54        46-78           16.4 ± 2.6
  Father/grandfather           3      68        48-121          13.7 ± 7.8

* p < 0.001 (determined by t-test). No other differences are significant.

The difference in onset age depending on the sex of the affected parent was not, however, reflected in the size of the upper allele, which did not differ significantly between the two parental groups (Table 7-3). Neither the grandparental origin of the gene, nor the parent/grandparent pattern of inheritance of the mutant allele, was associated with any differences in upper allele length for the group as a whole (Table 7-3). No significant differences were seen in age of onset with maternal or paternal inheritance when analyzed by non-parametric Wilcoxon and Kruskal-Wallis tests (data not shown).

The age of death of persons with HD is a specific point in time and is not subject to the same sources of potential bias as other measurements of the natural history of this disorder, such as age of onset. A significant correlation between the length of trinucleotide expansion and the age of death of the patient with HD was determined (n = 51, r = 0.423, p = 0.01). This parameter may also have been subject to some bias of ascertainment, in that longer survivors in this particular cohort would still be alive and would not be included in this particular analysis.

7.2.3 THE CORRELATION BETWEEN TRINUCLEOTIDE LENGTH AND PRESENTATION OF DIFFERENT CLINICAL FEATURES

An assessment was made of the relationship between the length of trinucleotide expansion and the presentation of the major clinical feature at diagnosis. The cohort was divided into those who had either chorea, psychiatric disturbance (psychosis or depression), dementia, or rigidity as the major presenting feature. In each group there was no association, independent of age of onset, between repeat length and a particular clinical presentation.

7.2.4 THE VARIATION IN TRINUCLEOTIDE EXPANSION IN PERSONS WITH JUVENILE ONSET

Data were available on 20 persons with juvenile onset HD. In this group the size of the expansion ranged from 46 to 121, with a median of 56.5 repeats.
One patient with onset of disease at age 5 had an expansion of 121 trinucleotide repeats. The remainder had repeat sizes of 78 or lower, and of these, six were in the range of patients with adult onset disease. Regression analysis of age of onset on CAG length revealed a significant correlation (r = -0.79; r^2 = 0.62; n = 20; p < 10^[?]). In this small sample, no significant differences in repeat length could be found between those who inherited a mutant allele of paternal compared to maternal origin (Table 7-3). Similarly, parental/grandparental origin of the mutant allele was not correlated with CAG repeat length (Table 7-3).

7.2.5 THE PREDICTIVE VALUE OF TRINUCLEOTIDE EXPANSION IN THE DETERMINATION OF AGE OF ONSET

The significant correlation between the age of onset and the size of the trinucleotide repeat expansion raised the possibility that age of onset could be predicted based on trinucleotide length (Figure 7-4). To test this, a random number generator was used to divide the whole cohort (n = 360) into two smaller groups. The first group (n = 190) was used to recalculate a regression equation and confidence limits for prediction, while the second group (n = 170) was used to test these confidence limits (test sample). The regression equation obtained from the 190 cases was: ln [age of onset] = 5.4053 - 0.0377 x [trinucleotide size], and was similar to the equation obtained using the whole cohort (r = -0.64; r^2 = 0.41; p < 10^-7) (Figure 7-3). The model generated using the first group described the test sample, since 94.1% (160/170) fell within the range predicted by the 95% confidence limits and 97.6% (166/170) fell within the range predicted by the 99% confidence limits.

[Figure 7-4 scatter plot: age of onset (0-100 years) against CAG-repeat length (35-85) with confidence curves]

Figure 7-4. Confidence intervals for predicted age given the size of the CAG repeat. Outer curves delineate the 99% confidence interval while inner curves show the 95% confidence interval.

However, at the lower end of the range of CAG repeats there are very broad confidence limits for age of onset predictions (Figure 7-4). For example, with a trinucleotide repeat length of 45, the expected age of onset would be 41 with 95% confidence limits of between 25 and 66 years. As the trinucleotide expansion increases to over 50, however, these limits narrow. A trinucleotide size of 60 would give an expected age of onset of 23 with 95% confidence limits of 14 and 38 years. However, it should be noted that only a small percentage (4.1%) of the total cohort had repeat lengths of 60 or greater.

7.2.6 THE PRECISION OF ASSESSMENT OF TRINUCLEOTIDE EXPANSION

A critical issue in the development of the predictive model is the accuracy and reproducibility of assessment of trinucleotide expansion using the methods described. A total of 37 persons were chosen randomly and assessed in two separate PCR experiments. Of these 37, 11 had exactly the same size expanded allele in both analyses. The differences ranged, however, from -3 to +3 repeats. The mean of the differences ± SD for the upper allele was 0.6 ± 1.9. The size of the lower allele differed by not more than 2 repeats for 97% (36/37) of subjects, with 19 persons having no difference (mean ± SD = 0.3 ± 1.2). This size difference occurred on comparison of the PCR products from the same DNA samples sized on different gels.

To examine the consequences of this error on the results of the regression analysis, the data on CAG length were varied within the limits of the original dataset in a random manner according to a normal distribution using the appropriate standard deviation (1.9). Repeat regression analysis with the new, randomly displaced CAG lengths resulted in complete agreement with the results of the previous analysis.
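
The point estimates quoted in section 7.2.5 and the scale of the sizing error discussed here can be checked by hand from the regression equation reported for the 190-case subsample. The following is a minimal sketch in Python, assuming that equation and the reported standard deviation of 1.9 repeats. The function names are illustrative, and the confidence limits are not reproduced because they require the raw cohort data.

```python
import math

# Coefficients reported for the 190-case subsample:
# ln(age of onset) = 5.4053 - 0.0377 * CAG length
INTERCEPT = 5.4053
SLOPE = -0.0377
SIZING_SD = 1.9  # SD of repeated sizing of the upper allele, in repeats


def predicted_onset(cag_length):
    """Point estimate of age of onset (years) for a given CAG repeat length."""
    return math.exp(INTERCEPT + SLOPE * cag_length)


def sizing_error_factor(delta_repeats=SIZING_SD):
    """Multiplicative change in predicted age for a sizing error of delta repeats."""
    return math.exp(abs(SLOPE) * delta_repeats)


for n in (45, 60):
    print(n, round(predicted_onset(n), 1))   # prints "45 40.8" then "60 23.2"
print(round(sizing_error_factor(), 3))       # 1.074
```

Under these assumptions, a sizing error of one standard deviation changes the point estimate by about 7%, or roughly three years at 45 repeats, which is small relative to the 95% confidence limits of 25 to 66 years quoted for that repeat length.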
Thus it is apparent that, using this PCR approach, the relatively minor differences in reproducibility of results have no significant impact on the predicted estimates for ages of onset.

7.2.7 PARENT-CHILD CORRELATIONS OF TRINUCLEOTIDE EXPANSION

Having demonstrated a significant association between age of onset and repeat length, it was hypothesized that differences in onset ages within families would be explained by differences in repeat length. Both parent-child and sib pairs were examined.

Within this cohort of 360 people there were 25 affected parent-child pairs, including 4 in which the child had juvenile onset. The correlation between repeat length in affected parent and child was not significant (r = 0.33, p = 0.10, n = 25). No difference in repeat length was seen in 1 parent-child pair, while 16 affected children had an increase of between 1 and 4 repeats. Four offspring had an increase of between 8 and 20, while 1 offspring of juvenile onset had an increase of 74 repeats. In this instance, a father with a trinucleotide expansion length of 42 had a child with onset age 5 and trinucleotide expansion of 121, representing the largest expanded allele in this cohort. In two other parent-juvenile onset pairs analysed, the difference in trinucleotide expansion between the parent with adult onset and the child with juvenile onset was 8 and 13 trinucleotides respectively (Figure 7-5). Two children had a decrease of 1 repeat and 1 affected child had a decrease of 2 repeats compared to their affected parent.

Large differences in age of onset between parent and child, however, did not always parallel differences in trinucleotide repeat length. In this study, 8 parent-child pairs in which both manifested adult onset disease showed anticipation of between 10 and 30 years but did not have differences in trinucleotide repeat expansion greater than three.
Therefore the trinucleotide expansion between parent and adult onset offspring was not apparent even when the difference in age of onset was similar to that seen between the parent and juvenile onset child.

[Figure 7-5 scatter plot: change in repeat length in the child against CAG-repeat length of the parent; symbols distinguish mother affected from father affected]

Figure 7-5. Change in repeat length between 25 affected parent child pairs. One parent-child pair with a difference of 74 repeats is not shown as this was off scale. The sex of the affected parent is noted.

7.2.8 SIB-SIB CORRELATIONS OF TRINUCLEOTIDE EXPANSION

A total of 48 pairs of affected siblings were included in this analysis (Figure 7-6). The correlation between the siblings for trinucleotide expansion was significant (r = 0.66, p < 0.001). No difference in repeat length was seen in 8 sib pairs, while 18 sib pairs differed by only 1 repeat. Sixteen affected sib pairs had a difference of 2 to 6 repeats and the remaining 6 sib pairs had repeat length differences between 7 and 16 (Figure 7-6). Interestingly, in this latter group the father was always the transmitting parent.

Nineteen pairs of siblings had differences of greater than 5 years in the age of onset, varying between 6 and 29 years. In 8 of these pairs of siblings there was no difference, or a difference of only one repeat, in the repeat length. In contrast, five pairs of siblings had ages of onset within 5 years but did have repeat length differences ranging between 4 and 11 (Figure 7-6). It is apparent, therefore, that repeat length alone could not account for the observed differences in ages of onset between siblings.

7.3 DISCUSSION

The analysis of this cohort of 360 affected persons from 259 unrelated families with HD has demonstrated a significant correlation between the number of CAG repeats and the age of onset. This association was present irrespective of the mode of clinical presentation at time of onset.
The number of trinucleotide repeats in the upper allele accounted for approximately 50% of the variation in the age of onset. Repeat length, however, was not indicative of any other particular clinical phenotype, as there was no independent association between the major clinical features of the illness and the number of trinucleotide repeats.

Assessment of the repeat expansion in 25 parent-child pairs revealed no significant correlation. In contrast, the sib-sib correlation of repeat length was significant (p < 0.001). This is consistent with the previously reported observation of aggregation of age of onset amongst siblings. The majority (21/25 parent-child pairs and 42/48 affected sib pairs) showed small differences in CAG repeat length (<6). However, in those six affected sib pairs with greater differences in repeat length, the transmitting parent was always the father. This suggests that expanded CAG repeats inherited through the male germline may be more likely to undergo significant expansion.

[Figure 7-6 scatter plot: change in repeat length between siblings against CAG-repeat length of sib 1 (35-60); symbols distinguish mother affected from father affected]

Figure 7-6. Change in repeat length between 49 pairs of affected siblings. The sex of the affected parent is noted.

In the assessment of three of four parent-child pairs where the child had juvenile onset (anticipation), this was associated with a significant increase in the trinucleotide repeat expansion. An obvious increase in trinucleotide repeat expansion, however, is not always associated with anticipation in terms of age of onset. Eight parent-child pairs were identified where the difference in age of onset between the parent and child was 10 years or greater, but where differences in trinucleotide expansion between parent and child were 2 repeats or less. All of these offspring had onset between the ages of 23 and 54.
In addition, five pairs of siblings demonstrated differences of 4-12 trinucleotide repeat lengths with no differences in age at onset, indicating that moderate changes in repeat length are not always associated with changes in age at onset.

Within families, therefore, repeat length may on occasion show a significant increase without reported changes in age of onset. Conversely, there may also be obvious changes in age of onset with no measurable changes in repeat length. Thus it would appear, particularly in persons with adult onset of HD, that factors other than trinucleotide repeat expansion may also play a significant role in the determination of age of onset.

The identification of trinucleotide repeat expansion in this gene has many implications for predictive testing programs. With regard to the predictability of phenotype based on length of trinucleotide repeat expansion, it is evident that very broad ranges of predicted age of onset can be derived based on the number of trinucleotide repeats. However, it is also apparent that this would only be useful for a minority of persons (4.1%) at risk for HD who have repeat lengths greater than 60.

Curves for ages of onset of offspring have previously been constructed which gave estimations of age of onset in offspring based on age of onset in the parent [29]. As part of prior counseling programs, persons at risk have been informed that there is in general an aggregation of age of onset amongst siblings, with less correlation of the age of onset between parent and child. The specific estimates of age of onset with appropriate confidence limits developed in the predictive model may add benefit to such counseling programs for a small proportion of patients.

The observation that 6 individuals diagnosed with HD do not show expansion of the CAG repeat has several implications for previous analyses that included these individuals.
Further analysis of the individuals lacking expanded repeats is presented in Chapter 9, and the effects of including results from these individuals in previous analyses, such as linkage disequilibrium analyses or analysis of recombinant chromosomes, are discussed.

The highly significant association between trinucleotide repeat length and clinical features of HD has now been confirmed by other studies of different populations, primarily of British and Western European descent [30-35].

7.4 REFERENCES

1. Huntington Disease Collaborative Research Group (1993). A novel gene containing a trinucleotide repeat that is expanded and unstable on Huntington disease chromosomes. Cell 72:971-983.
2. Yu S, Pritchard M, Kremer E, Lynch M, Nancarrow J, Baker E, Holman K, Mulley JC, Warren ST, Schlessinger D, Sutherland GR, Richards RI (1991). Fragile X genotype characterized by an unstable region of DNA. Science 252:1179-1181.
3. Kremer EJ, Pritchard M, Lynch M, Yu S, Holman K, Baker E, Warren ST, Schlessinger D, Sutherland GR, Richards RI (1991). Mapping of DNA instability at the fragile X to a trinucleotide repeat sequence p(CCG)n. Science 252:1711-1714.
4. Knight SJL, Flannery AV, Hirst MC, Campbell L, Christodoulou Z, Phelps SR, Pointon J, Middleton-Price HR, Barnicoat A, Pembrey ME, Holland J, Oostra BA, Bobrow M, Davies KE (1993). Trinucleotide repeat amplification and hypermethylation of a CpG island in FRAXE mental retardation. Cell 74:127-134.
5. Mahadevan M, Tsilfidis C, Sabourin L, Shutler G, Amemiya C, Jansen G, Neville C, Narang M, Barcelo J, O'Hoy K, Leblond S, Earle-MacDonald J, de Jong PJ, Wieringa B, Korneluk RG (1992). Myotonic dystrophy mutation: an unstable CTG repeat in the 3' untranslated region of the gene. Science 255:1253-1255.
6. Fu Y-H, Pizzuti A, Fenwick RG Jr, King J, Rajnarayan S, Dunne PW, Dubel J, Nasser GA, Ashizawa T, de Jong P, Wieringa B, Korneluk R, Perryman MB, Epstein HF, Caskey CT (1992). An unstable triplet repeat in a gene related to myotonic muscular dystrophy. Science 255:1256-1258.
7. Brook JD, McCurrach ME, Harley HG, Buckler AJ, Church D, Aburatani H, Hunter K, Stanton VP, Thirion JP, Hudson T, Sohn R, Zemelman B, Snell RG, Rundle SA, Crow S, Davies J, Shelbourne P, Buxton J, Jones C, Juvonen V, Johnson K, Harper PS, Shaw DJ, Housman DE (1992). Molecular basis of myotonic dystrophy: expansion of a trinucleotide (CTG) repeat at the 3' end of a transcript encoding a protein kinase family member. Cell 68:799-808.
8. La Spada AR, Wilson EM, Lubahn DB, Harding AE, Fischbeck KH (1991). Androgen receptor gene mutations in X-linked spinal and bulbar muscular atrophy. Nature 352:77-79.
9. Orr HT, Chung M, Banfi S, Kwiatkowski TJ Jr, Servadio A, Beaudet AL, McCall AE, Duvick LA, Ranum LPW, Zoghbi HY (1993). Expansion of an unstable trinucleotide CAG repeat in spinocerebellar ataxia type 1. Nature Genet 4:221-226.
10. Nagafuchi S, Yanagisawa H, Sato K, Shirayama T, Ohsaki E, Bundo M, Takeda T, Tadokoro K, Kondo I, Murayama N, Tanaka Y, Kikushima H, Umino K, Kurosawa H, Furukawa T, Nihei K, Inoue T, Sano A, Komure O, Takahashi M, Yoshizawa T, Kanazawa I, Yamada M (1994). Expansion of an unstable CAG trinucleotide on chromosome 12p in dentatorubral pallidoluysian atrophy. Nature Genet 6:14-18.
11. Koide R, Ikeuchi T, Onodera O, Tanaka H, Igarashi S, Endo K, Takahashi H, Kondo R, Ishikawa A, Hayashi T, Saito M, Tomoda A, Miike T, Naito H, Ikuta F, Tsuji S (1994). Unstable expansion of CAG repeat in hereditary dentatorubral-pallidoluysian atrophy (DRPLA). Nature Genet 6:9-13.
12. Fu YH, Kuhl DPA, Pizzuti A, Pieretti M, Sutcliffe JS, Richards S, Verkerk AJMH, Holden JJA, Fenwick RG Jr, Warren ST, Oostra BA, Nelson DL, Caskey CT (1991). Variation of the CCG repeat at the fragile X site results in genetic instability: Resolution of the Sherman paradox. Cell 67:1047-1058.
13. Richards RI and Sutherland GR (1992). Dynamic mutations: A new class of mutations causing human disease. Cell 70:709-712.
14. Hayden MR (1993). On planting alfalfa and growing orchids: The cloning of the gene causing Huntington disease. Clin Genet 43:217-222.
15. Mandel JL (1993). Questions of expansion. Nature Genet 4:8-9.
16. Nelson DL and Warren ST (1993). Trinucleotide repeat instability: when and where. Nature Genet 4:107-108.
17. Redman JB, Fenwick RG Jr, Fu YH, Pizzuti A, Caskey CT (1993). Relationship between parental trinucleotide CTG repeat length and severity of myotonic dystrophy in offspring. JAMA 269:1960-1965.
18. Tsilfidis C, MacKenzie AE, Mettler G, Barcelo J, Korneluk RG (1992). Correlation between CTG trinucleotide repeat length and frequency of severe congenital myotonic dystrophy. Nature Genet 1:192-195.
19. Hunter A, Tsilfidis C, Mettler G, Jacob P, Mahadevan M, Surh L, Korneluk R (1992). The correlation of age of onset with CTG trinucleotide repeat amplification in myotonic dystrophy. J Med Genet 29:774-779.
20. La Spada AR, Roling DB, Harding AE, Warner CL, Spiegel R, Hausmanowa-Petrusewicz I, Yee W-C, Fischbeck KH (1992). Meiotic stability and genotype-phenotype correlation of the trinucleotide repeat in X-linked spinal and bulbar muscular atrophy. Nature Genet 2:301-304.
21. Hayden MR (1981). Huntington's chorea. Springer-Verlag, New York.
22. Harper PS (1991). Huntington's disease. W.B. Saunders, London.
23. Sarkar G, Kapelner S, Sommer SS (1991). Formamide can dramatically improve the specificity of PCR. Nucl Acids Res 18:7465.
24. Merritt AD, Conneally PM, Rahman NF, Drew AL (1969). Juvenile Huntington's chorea. In: Progress in neurogenetics, Barbeau A, Brunette JR, eds. Excerpta Medica, Amsterdam, pp 645-650.
25. Farrer LA and Conneally PM (1985). A genetic model for age of onset in Huntington disease. Am J Hum Genet 37:350-357.
26. Myers RH, Madden JJ, Teague JL, Falek A (1982). Factors related to onset age in Huntington's disease. Am J Hum Genet 34:481-488.
27. Adams P, Falek A, Arnold J (1988). Huntington disease in Georgia: age at onset. Am J Hum Genet 43:695-704.
28. Hayden MR, Soles JA, Ward RH (1985). Age of onset in siblings of persons with juvenile onset Huntington disease. Clin Genet 28:100-105.
29. Stevens DL (1972). The heterozygote frequency for Huntington's chorea. In: Huntington's chorea, 1872-1972, Barbeau A, Chase TN, Paulson GW, eds. Raven Press, New York, pp 191-198.
30. Duyao M, Ambrose C, Myers R, Noveletto A, Persichetti F, Frontali M, Folstein S, Ross C, Franz M, Abbott M, Gray J, Conneally P, Young A, Penney J, Hollingsworth Z, Shoulson I, Lazzarini A, Falek A, Koroshetz W, Sax D, Bird E, Vonsattel J, Bonilla E, Alvir J, Bickman Conde J, Cha J-H, Dure L, Gomez F, Ramos M, Sanchez-Ramos J, Snodgrass S, de Young M, Wexler N, Moscowitz C, Penchaszadeh G, MacFarlane H, Anderson M, Jenkins B, Srinidhi J, Barnes G, Gusella J, MacDonald M (1993). Trinucleotide repeat length instability and age of onset in Huntington's disease. Nature Genet 4:387-392.
31. Snell R, MacMillan JC, Cheadle JP, Fenton I, Lazarou LP, Davies P, MacDonald ME, Gusella JF, Harper PS, Shaw DJ (1993). Relationship between trinucleotide repeat expansion and phenotypic variation in Huntington disease. Nature Genet 4:393-397.
32. Telenius HT, Kremer HPH, Theilmann J, Andrew SE, Almqvist E, Anvret M, Greenberg C, Greenberg J, Lucotte G, Squitieri F, Starr E, Goldberg YP, Hayden MR (1993). Molecular analysis of juvenile Huntington disease: the major influence on (CAG)n repeat length is the sex of the parent. Hum Mol Genet 2:1535-1540.
33. Zuhlke C, Riess O, Schroder K, Siedlaczck I, Epplen JT, Engel W, Thies U (1993). Expansion of the (CAG)n repeat causing Huntington disease in 352 patients of German origin. Hum Mol Genet 2:1467-1469.
34. Norremolle A, Riess O, Epplen JT, Fenger K, Hasholt L, Sorenson SA (1993). Trinucleotide repeat elongation in the Huntingtin gene in Huntington disease patients from 71 Danish families. Hum Mol Genet 2:1475-1476.
35. Stine OC, Pleasant N, Franz ML, Abbott MH, Folstein SE, Ross CA (1993). Correlation between the onset age of Huntington's disease and length of trinucleotide repeat in IT-15. Hum Mol Genet 2:1547-1549.

CHAPTER 8
SENSITIVITY AND SPECIFICITY OF CAG EXPANSION IN HUNTINGTON DISEASE

The work presented in this chapter has contributed to one manuscript.

Kremer B, Goldberg YP, Andrew SE, Theilmann J, Telenius H, Zeisler J, Squitieri F, Lin B, Adam S, Benjamin C, Greenberg J, Lucotte G, Almqvist E, Bird TD, Schellenberg GD, Bassett A, Almqvist E, Bird T, Hayden MR. A worldwide study of the Huntington disease mutation: The sensitivity and specificity of measuring CAG repeats. NEJM 330:1401-1406.

8.1 INTRODUCTION

Prior to the cloning of the gene for HD, the accuracy of predictive testing was limited because of potential recombination between inherited DNA markers and the site of the mutation for the disease. In addition, a significant proportion of persons were excluded from testing because of the unavailability of blood from family members. Prior to a definitive test, diagnosis of HD in a symptomatic person was often complicated by the subtlety of early signs and symptoms that often mimic other disorders. The discovery of the CAG repeat mutation in patients with HD or at risk for the disease adds diagnostic accuracy to the clinical setting.

However, certain limitations of the initial data published should be noted before using these data for diagnostic purposes. Most persons in the initial analyses were of Western European descent [1-5]. Prior to the routine utilization of the CAG marker as a marker for inheritance of HD, it is important to ascertain the sensitivity for detection of affected persons in different countries and of different ancestries. Furthermore, the specificity of CAG expansion for HD must be assessed.

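
The two quantities named above have standard definitions: sensitivity is the fraction of affected persons in whom an expanded allele is detected, and specificity is the fraction of unaffected persons in whom it is not. The following is a minimal sketch in Python. The function names and the counts are hypothetical, chosen only to illustrate the arithmetic, and are not data from this study.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of affected persons in whom the test is positive."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Fraction of unaffected persons in whom the test is negative."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts, for illustration only (not data from this study).
print(round(sensitivity(true_pos=98, false_neg=2), 3))   # 0.98
print(round(specificity(true_neg=95, false_pos=5), 3))   # 0.95
```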
8.2 RESULTS DNA was analyzed from families living in Canada as well as families from many different parts of the world including persons of European, Asian, Black African, Arab and Native Indian descent. The ethnic origins were defined by the country from which the ancestor with HD originated. The racial and geographic origins of these DNA samples from families with HD is shown in Table 8-1.  169 Table 8-1 Distribution of allele sizes according to country of origin and ethnic background Number of individuals  Number of pedigrees  AFRICAN S. Africa-Black  5  2  52  (39-55)  ARAB Syria Egypt Lebanon Saudi Arabia  4 2 2 3  1 2 1 1  42 42,43 41,42 47  (38-45)  (41-50)  ASIAN China Japan Eastlndia  8 3 4  5 2 2  49 46 43  (43-51) (45-51) (41-59)  2 1 2 2 28 117 45 46 74 2 6 43 52 1 1 2 23 8 1 22 91 103 1 0 3 6  2 1 2 2 13 63 31 26 38 2 4 27 33 1 1 2 7 6 1 11 47 46 6 3 2  CAUCASIAN Europe Austria Belgium Czech Republic Denmark Netherlands England France Germany Great Britain Greece Hungary Ireland Italy Latvia Lithuania Malta Norway Poland Romania Russia Scotland Sweden Ukraine Wales Fmr. Yugoslavia  CAG size median (range)  42,61 44 46,52 43,48 43.5 44 44 44 44 40,46 43.5 42 44 41 44 50 46 44 42 41 43 43 45 46,47 43  (37-59) (38-63) (36-100) (40-65) (39-121) (40-49) (39-52) (39-54)  (39-71) (43-48) (37-47) (38-71) (38-88) (40-49) (42-48)  170  Table 8-1 con’t. GAG size median (range)  Number of individuals  Number of pedigrees  35 2  17 2  45 47,53  (36-75)  5 3 1 1 1 11 4 3 1 8  1 2 1 1 1 4 2 1 1 2  41 47 43 48 44 51 48.5 52 50 47  (39-45) (45-49)  NATIVE AMERICAN Native American Metis  13 3  3 1  48 44  (43-58) (41-46)  UNKNOWN ORIGIN  181  131  43  (39-49)  995  565  44  (36-121)  CAUCASIAN (CONT) North America French Canada Mexico South America Chile Ecuador El Savador Venezuela Australia S. Africa-Caucasian S. Africa-Dutch S. Africa-French S. Africa-Indian S. 
Africa-Mixed  TOTAL:  43  (42-57) (47-53) (49-55) (44-57)

8.2.1 CAG SIZES IN HD AND OTHER NEUROPSYCHIATRIC DISORDERS
After review of 1022 patients, 1005 had signs and symptoms compatible with a diagnosis of HD. Of these, 995 persons had CAG repeat lengths ranging from 36 to 121, with a median of 44 repeats. The 995 affected persons were from 565 separate pedigrees and 43 different ancestries. No significant differences (ANOVA) in allele sizes were seen among affected persons of various ancestries, including 5 different racial groups. It is, however, noteworthy that a trend to an increased median CAG size was seen in affected South Africans of black and mixed ancestry as well as in persons with HD of Chinese, Japanese, Saudi Arabian and Native Indian descent. However, due to the small sample size, no conclusions can be drawn. The increased mean CAG size in South Africans of mixed ancestry is consistent with an increased frequency of juvenile and early onset HD in this population⁶.

In 10 persons, CAG repeat lengths were within the normal range (Figure 8-1a). In those persons diagnosed with HD who had CAG repeat length less than 30, reassessment of CAG repeat size confirmed initial results. In all these individuals, additional DNA was requested and repeat measurements of CAG length were performed where possible. In addition, the clinical records of these 10 individuals were re-examined and additional records, including where possible reports of neuropathological assessment, were reviewed. These potential phenocopies are presented in Chapter 10.

In addition, patients with a clear family history of familial Alzheimer’s disease (n=44), schizophrenia (n=47), neuroacanthocytosis (n=2), benign hereditary chorea (n=5) and dentato-rubro-pallidoluysian atrophy (DRPLA) (n=2) were also assessed for CAG expansion. The range of CAG repeat length is shown in Table 8-2. Clearly the range of CAG repeat size in these disorders is similar to that seen on normal human chromosomes and shows no overlap with that seen in HD.

[Figure 8-1a: histogram of allele counts against (CAG)n repeat length. HD alleles: n=995, median=44, range 36-121.]
Figure 8-1a. Distributions of CAG repeat lengths on different chromosomes. Upper allele sizes in 1005 persons with clinical signs and symptoms of Huntington disease. A total of 995 persons have CAG repeat lengths of 36 or greater.

Table 8-2. The distribution of CAG repeat length in Huntington disease and other neuropsychiatric disorders.

Disease                       No. of alleles   CAG repeat   CAG repeat   CAG repeat
                              tested           range        mean size    median size
Huntington disease
  HD alleles                  995              36-121       45.3         44
Alzheimer disease             88               12-24        18.8         19
Schizophrenia                 94               16-25        19.1         19
Benign hereditary chorea      10               16-23        18.7         18
Neuroacanthocytosis           4                17-20        18.5         18.5
DRPLA                         4                19-20        19.5         19.5

Controls                      No. of alleles   CAG repeat   CAG repeat   CAG repeat
                              tested           range        mean size    median size
Non-Huntington alleles        995              10-36        19.1         19
  Intermediate alleles        12               30-35
  Alleles in HD range         1                36
Control alleles               600              10-39        18.4         18

8.2.2 CAG SIZES IN CONTROLS
CAG expansion was also assessed in 300 control subjects of Caucasian, Black African and Chinese descent in an effort to determine the range of CAG expansion in control individuals without a family history of any neuropsychiatric disorder. The normal CAG size, as defined by 600 chromosomes from control subjects, ranged from 10 to 39 (Figure 8-1b, Table 8-3). The distribution appeared to have a bimodal shape, with a CAG size of 18 being relatively underrepresented as compared to peaks at 17 and 19 triplet repeats.

In addition to these 600 control chromosomes, the 995 chromosomes from the affected individuals not containing the CAG repeat expansion were used to study the normal range of CAG repeats (Figure 8-1c).
Of these, 983 (98.8%) had CAG size between 10 and 29, the previously determined normal range⁷. Again, the distribution appeared to be bimodal, with peaks at 17 and 19, and with a CAG repeat size of 18 being underrepresented. Comparisons between CAG lengths in controls of Caucasian, Black and Chinese descent reveal differences in CAG repeat distribution between Caucasian and both other groups (Table 8-3). However, even though these differences are statistically significant, they are small and will not have clinical relevance. In addition, 12 control chromosomes (unaffected alleles in HD persons) of Caucasian descent were detected with CAG size between 30 and 35, which represents a frequency of 0.75% of intermediate-sized alleles (IAs) in this population.

One person with clinical signs and symptoms of HD also had two alleles in the HD range (37 and 43). This person had onset at age 50, with a clinical deterioration over 12 years, consisting of progressive chorea, cognitive decline, dysarthria, dysphagia, and severe cachexia. Post-mortem examination revealed generalized brain atrophy (brain weight 1100 g) and marked caudate atrophy. Her father died unaffected in his forties while her mother manifested with HD in her fifties. This patient, although genetically homozygous for the HD mutation, had features typical of a heterozygote for HD⁶.

One control individual, a spouse of an affected person and herself without signs or symptoms of the disease at age 57, had a CAG repeat length of 39 repeats on one chromosome. There was no history of HD in any of her ancestors. This was rechecked, including analysis of offspring, which confirmed this finding and also revealed the existence of a homozygote for CAG alleles in the range of HD in one offspring. This person, aged 25, is currently asymptomatic and has CAG repeat lengths of 39 and 42 triplets.

[Figure 8-1b: histogram of allele counts against (CAG)n repeat length. Control alleles: n=600, median=18, range 10-39.]
Figure 8-1b. Allele sizes on 600 control chromosomes from 300 individuals. One CAG allele is in the Huntington disease range at 39 repeat lengths.

Table 8-3. CAG size distribution for control chromosomes of different ethnic origins

Origin             No.    Median   (range)
CAUCASIAN          226    19       (10-35) *#
  Australia        2      17,22
  France           2      19,20
  Germany          8      18       (10-26)
  Great Britain    32     18.5     (12-35)
  Ireland          6      20       (16-21)
  Italy            28     19       (15-29)
  Lithuania        2      12,12
  Netherlands      2      11,17
  Norway           10     19.5     (17-27)
  Poland           2      19,29
  Russia           8      18       (15-22)
  Scotland         20     19       (14-25)
  Sweden           102    19       (11-29)
BLACK              112    17       (11-29) *
CHINESE            10     17       (16-20) #
ARAB               2      17,17

* p=0.003   # p=0.012

[Figure 8-1c: histogram of allele counts against (CAG)n repeat length. Non-HD alleles: n=995, median=19, range 10-37.]
Figure 8-1c. Lower allele sizes of the 995 affected persons with expanded upper alleles. Note that one allele, at 37 repeats, is in the Huntington disease range.

8.3 DISCUSSION
The purpose of this analysis was to demonstrate the sensitivity and specificity of the CAG trinucleotide repeat length as a marker for inheritance of the HD gene. A total of 995/1005 (99.0%) of persons who after review were considered to have a clinical diagnosis of HD were shown to have significantly expanded CAG repeat lengths above the range seen on normal human chromosomes. This was observed in affected persons from 43 different countries and 5 different racial groups including persons of Caucasian, Arab, Black African, Chinese, Japanese, and Native Indian descent. These results support the previously reported findings of the sensitivity of CAG expansion in a smaller group of presumed Caucasian patients⁸. In contrast, no CAG expansion was seen in other neuropsychiatric disorders such as familial Alzheimer’s disease, familial schizophrenia, benign hereditary chorea, or DRPLA, with which HD has previously been misdiagnosed.
CAG expansion therefore underlies the worldwide distribution of HD and suggests it is directly related to the causation of HD even though the mechanism by which this occurs is still unknown. The high sensitivity and specificity of CAG expansion for the inheritance of HD have significant implications both for assessment of symptomatic persons and for predictive testing programs.

For persons at risk for HD, a direct test for inheritance of the mutation will allow a more accurate assessment of their genetic risk without the need for DNA from family relatives. However, misdiagnosis and human error remain a source of error, and the opportunity to assess an affected family relative will allow confirmation that CAG expansion is present in other affected relatives. This will facilitate correct interpretation of a normal sized CAG repeat length in someone at risk.

Direct DNA testing will be particularly useful in symptomatic persons for whom the family history of HD is uncertain and for whom the natural course of the illness has not been documented. Clearly the demonstration of an expanded CAG repeat within the HD gene is a highly specific marker for the inheritance of the gene for HD and can be used to differentiate HD from other neuropsychiatric disorders which were commonly misdiagnosed as HD in the past, such as Alzheimer’s disease, schizophrenia, neuroacanthocytosis, benign hereditary chorea and DRPLA. It is of note that for some cases of DRPLA, CAG repeat size may represent the only means of differentiating these two disorders during life⁹.

Two previously unsuspected homozygotes for HD were identified by direct detection of the expanded allele on both chromosomes. Prior reports utilised linked markers for inheritance of the gene and were theoretically subject to error based on recombination between the markers and the mutation¹⁰,¹¹. However, the presence of a clinical phenotype and pathological findings in one affected person similar to a heterozygote for the mutation for HD, and the absence of symptoms in an adult aged 25 who is a homozygote, support the previously reported findings that the phenotype of the homozygote is not more severe than the heterozygote and are consistent with CAG expansion conferring a gain of function in the pathogenesis of HD.

A total of 12 CAG alleles (0.75%) with sizes between 30 and 35 were seen on control chromosomes. Surprisingly, all of these were seen on the non-HD chromosomes of affected persons. However, other studies have shown that these intermediate sized alleles (IAs) exist in the normal population at a low frequency similar to that seen on the non-HD alleles in this analysis¹². New mutations for HD arise from IAs when transmitted through the male germline (Chapter 10). The stability of these IAs on control chromosomes is uncertain, but it is likely that these would represent the pool from which new mutations for HD arise.

CAG expansion underlies the worldwide distribution of HD in persons of various ancestries and racial groups. In addition to being sensitive for indicating inheritance of HD, CAG expansion is also highly specific, not being seen in persons with other neuropsychiatric disorders with which HD is frequently misdiagnosed.

8.4 REFERENCES
1. Duyao M, Ambrose C, Myers R, Noveletto A, Persichetti F, Frontali M, Folstein S, Ross C, Franz M, Abbott M, Gray J, Conneally P, Young A, Penney J, Hollingsworth Z, Shoulson I, Lazzarini A, Falek A, Koroshetz W, Sax D, Bird E, Vonsattel J, Bonilla E, Alvir J, Bickman Conde J, Cha J-H, Dure L, Gomez F, Ramos M, Sanchez-Ramos J, Snodgrass S, deYoung M, Wexler N, Moscowitz C, Penchaszadeh G, MacFarlane H, Anderson M, Jenkins B, Srinidhi J, Barnes G, Gusella J, MacDonald M (1993). Trinucleotide repeat length instability and age of onset in Huntington’s disease.
Nature Genet 4:387-392.
2. Snell R, MacMillan JC, Cheadle JP, Fenton I, Lazarou LP, Davies P, MacDonald ME, Gusella JF, Harper PS, Shaw DJ (1993). Relationship between trinucleotide repeat expansion and phenotypic variation in Huntington’s disease. Nature Genet 4:393-397.
3. Telenius HT, Kremer HPH, Theilmann J, Andrew SE, Almqvist E, Anvret M, Greenberg C, Greenberg J, Lucotte G, Squitieri F, Starr E, Goldberg YP, Hayden MR (1993). Molecular analysis of juvenile Huntington disease: The major influence on (CAG)n repeat length is the sex of the transmitting parent. Hum Mol Genet 2:1535-1540.
4. Zuhlke C, Riess O, Schroder K, Siedlaczck I, Epplen JT, Engel W, Thies U (1993). Expansion of the (CAG)n repeat causing Huntington disease in 352 patients of German origin. Hum Mol Genet 2:1467-1469.
5. Norremolle A, Riess O, Epplen JT, Fenger K, Hasholt L, Sorenson SA (1993). Trinucleotide repeat elongation in the Huntingtin gene in Huntington disease patients from 71 Danish families. Hum Mol Genet 2:1475-1476.
6. Hayden MR (1981). Huntington’s chorea. Springer-Verlag, New York.
7. Andrew SE, Goldberg YP, Kremer B, Telenius H, Theilmann J, Adam S, Starr E, Squitieri F, Lin B, Kalchman MA, Graham RK, Hayden MR (1993). The relationship between trinucleotide (CAG) repeat length and clinical features of Huntington disease. Nature Genet 4:398-403.
8. MacMillan JC, Snell RG, Tyler A, Houlihan GD, Fenton I, Cheadle JP, Lazarou LP, Shaw DJ, Harper PS (1993). Molecular analysis and clinical correlations of the Huntington disease mutation. Lancet 342:954-958.
9. Iizuka R, Hirayama K, Machara K (1984). Dentato-rubro-pallido-luysian atrophy: a clinicopathological study. J Neurol Neurosurg Psychiat 47:1288-98.
10. Wexler NS, Young AB, Tanzi RE, Travers H, Starosta-Rubenstein S, Penney JB, Snodgrass SR, Shoulson I, Gomez F, Arrayo MAR, Penchaszadeh GK, Moreno H, Gibbons K, Faryniarz A, Hobbs W, Anderson MA, Bonilla E, Conneally PM, Gusella JF (1987). Homozygotes for Huntington’s disease. Nature 326:194-197.
11. Myers RH, Leavitt J, Farrett L, Jagadeesh J, McFarlane H, Mastromauro CA, Mark RJ, Gusella J (1989). Homozygote for Huntington’s disease. Am J Hum Genet 45:615-18.
12. Zuhlke C, Riess O, Bockel B, Lange H, Theis U (1993). Mitotic stability and meiotic variability of the CAG repeat in the Huntington disease gene. Hum Mol Genet 2:2063-2067.

CHAPTER 9
HUNTINGTON DISEASE WITHOUT CAG EXPANSION

The work in this chapter has contributed to one manuscript.

Andrew SE, Goldberg YP, Kremer B, Squitieri F, Theilmann J, Zeisler J, Telenius H, Adam S, Almqvist E, Anvret M, Lucotte G, Stoessl AJ, Campanella G, Hayden MR (1994). Huntington disease without CAG expansion: Phenocopies or errors in assignment? Am J Hum Genet 54:852-865.

9.1 INTRODUCTION
The prior reports of the relationship between trinucleotide repeat length and clinical features of HD each described a small number of persons who had been given the diagnosis of HD, but were found not to have CAG repeat sizes in the range seen in affected persons¹⁻⁴. Accurate assessment of the reasons for the failure to demonstrate expanded CAG repeats in those persons diagnosed with HD is critical in determining the sensitivity of CAG repeat length for the diagnosis of HD in symptomatic patients. Furthermore, this is also important for predictive testing programs, where the detection of CAG repeat length in the normal range may be mistaken as absolute proof that the person at risk will not develop signs and symptoms consistent with the phenotype of HD in the future.

A total of 1022 individuals from 573 families of 43 different ancestries diagnosed with HD were assessed for CAG repeat length. Those persons who did not have expansion of the CAG repeat in the affected range were assessed more fully to determine the reasons for the failure to detect expanded CAG repeats in all such presumably affected persons.
It is possible that on very rare occasions, genetic heterogeneity may underlie the presentation of the HD phenotype.

Other disorders associated with dynamic mutations provide support for multiple mutations being responsible for disease. Two patients with a phenotype characteristic of fragile X syndrome, but lacking the cytogenetic expression of FRAXA and CCG expansion, have been reported⁵,⁶. In both cases mutations in the FMR-1 gene other than expansion of the CCG repeat in the 5’ UTR have been shown to be responsible for the fragile X syndrome. One patient was found to contain a de novo point mutation within the FMR-1 gene and the other had a submicroscopic deletion of more than 2 Mb of DNA, encompassing the CCG repeat and the FMR-1 gene. However, although the fragile X syndrome may demonstrate allelic heterogeneity, the fundamental disruption of FMR-1 in all cases described to date makes this disorder genetically homogeneous with respect to locus.

9.2 RESULTS
CAG repeat lengths were in the range of that seen in normal human chromosomes (10 to 30 repeats) in 30 of 1,022 persons who had been given the diagnosis of HD (Tables 9-1, 9-2 and 9-3). Clinical details were based on extensive records documenting neurological examination and special investigations such as computerized tomography, positron emission tomography and autopsy records. In all instances, patient records were reviewed, including collaboration with the referring physician. Repeat PCR assessment was performed for those individuals without expanded CAG alleles, using both the initial DNA sample and a second, independently obtained sample when available. However, a second sample was available in only 7 of the 30 cases, and in the remaining cases it is not possible to determine whether sample mix-up had occurred.
The most common causes for the failure to detect CAG expansion in persons with supposed HD represented errors in assignment (18 persons) including misdiagnosis (10 persons), sample mix-up (6 persons) and clerical error (2 persons) (Table 9-1).  9.2.1 ERRORS IN ASSIGNMENT Human error accounted for 8 of these misclassifications (Tables 9-1 and 9-3). In 2 cases the persons were at-risk for HD but were recorded as affected. In 6 individuals sample mix-up took place prior to assessment of the CAG repeat length. This may have occurred at any point from the time of blood withdrawal to the time of assessment of CAG repeat length. PCR reassessment of additional blood samples revealed 3 of the 6 persons were now found to have CAG repeat sizes in the range of affected persons. In another instance  186 Table 9-1: Reasons for Diagnosis of HD Without Expansion of the CAG Repeat Triplet (<37 Repeats) in 1022 Affected Patients  NO.  %  Family History of Neurological Disease  8  0.8  CAG Expansion in Other Family Members  1  0.1  New Mutations  3  0.3  12  1.2  Misdiagnosis  10  1.0  Sample Mix-up  6  0.6  Clerical Error  2  0.2  18  1.8  UNEXPLAINED LACK OF CAG EXPANSION  Total  ERRORS IN ASSIGNMENT  Total  N  00  7  6  5  4  3  2  1  Patient  19,20  19,20  18,19  17,18  16,19  17,20  16,18  16,18  PCR 1 Allele sizes  17,21  19,20  19,20  18,19  17,18  16,19  17,20  16,18  16,18  Repeat PCR 1 Allele sizes  NA  NA  NA  NA  NA  NA  16,19  17,20  NA  NA  New sample Allele sizes  Phenocopy  Phenocopy  Phenocopy  Phenocopy  Phenocopy  Phenocopy  Phenocopy  Phenocopy  Phenocopy  Classification  Possible  Yes  Yes  Yes  Yes  Yes  Yes  Yes  Yes  Family history of neurological disease  Sibof7  Sib of 8  4 persons with CAG expansion in family ( Robbins et al., 1989; Pritchard et al., 1992; Duyao et a!., 1993)  Recombinant, previously described (Weber et al., 1992b)  Cousin of 3, autopsy of affected parent shows caudate atrophy  Cousin of 4, autopsy of affected parent shows caudate atrophy 
 Sib of 1  Sibof2  Table 9-2. Unexplained lack of CAG expansion  8 17,21 15,21  NA  No No  Comments  9 15,21 17,19  Phenocopy/ New Mutation Phenocopy? New Mutation Phenocopy/ New Mutation  Typical features of HD, confirmed by CT and PET  Typical features of HD, normal PET  10 17,19  16,16  No  11 16,16  Second DNA sample not available  16,16  =  12  NA  PCR 1 Allele sizes  Repeat PCR 1 Allele sizes  New sample  Table 9-3. Errors of assignment  Allele sizes  Family history of  26  25  24  23  22  21  20  19  18  17  16  15  14  13  16,20  19,19  16,17  20,22  17,28  18,21  19,20  19,19  17,28  17,20  18,22  16,20  16,17  17,17  14,21  19,20  16,20  19,19  16,17  20,22  17,28  18,21  19,20  19,19  1 7,28  17,20  18,22  16,20  16,17  17,17  14,2 1  NA  NA  16,43  17,40  19,44  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  Clerical error  Sample mix-up  Sample mix-up  Sample mix-up  Sample mix-up  Sample mix-up  Sample mix-up  Misdiagnosis  Misdiagnosis  Misdiagnosis  Misdiagnosis  Misdiagnosis  Misdiagnosis  Misdiagnosis  Misdiagnosis  Misdiagnosis  Misdjagnosjs  Yes  Yes  Yes  Yes  Yes  Yes  Yes  No  No  No  Yes  Yes  Yes  Yes  Yes  Yes  Yes  Unaffected individual  2 persons with CAG expansion in family, haplotypes show sample mix-up  I person with CAG expansion in family  2 persons with CAG expansion in family  1 person with CAG expansion in family  Mix-up with unaffected sib  Mix-up with at risk” child  Progressive dementia & tardive dyskinesia, previous haloperidol use  No caudate atrophy (CT 14 years after diagnosis)  Alzheimer’s disease, confirmed by autopsy  Dementia, probable Alzheimer’s disease  2 persons with CAG expansion in family, psychiatric disturbance  3 persons with CAG expansion in family, neuropathological misdiagnosis of HD  4 persons with CAG expansion in family  Alcoholic, psychiatric disturbance  2 persons with CAG expansion in family, minor motor abnonnalities  I person with CAG expansion in family,behaviour disturbance misdiagnosed as 
juvenile HD  Comments  27 19,20  ND  Individual”atrisk”  neurological disease  28 11,16  Yes  Classification  29  Clericalerror Not done  17,19 =  17,19  Second DNA sample not available; ND  17,19  =  30  Patient  NA  189 where an additional DNA sample was unavailable, detailed assessment of the family including the parents and children with other highly polymorphic markers in the region, revealed that the blood sample analyzed was highly unlikely to have been derived from this patient. In two cases, DNA from an unaffected family member was mistaken for DNA from an affected individual in that family and represented sample mix-up.  9.2.2 MISDIAGNOSIS (FIGURE 9-1) (TABLE 9-3) Misdiagnosis of HD represented the major cause (10 persons, 1.0%) for the finding of two normal alleles in persons with a presumed phenotype of HD (Tables 9-1 and 9-3). In 7 instances, patients with behaviour, psychiatric or minor motor disturbances from families with HD were incorrectly diagnosed as having HD (patients 13-19). It is noteworthy that in one instance (patient 17) initial neuropathological examination confirmed the diagnosis of HD. However, a second assessment, prior to knowledge of CAG repeat length, clearly indicated that the pathology findings were not consistent with HD. It was noted that the positive family history of HD might have influenced the first neuropathological assessment. In one instance, a patient with dementia as the only presenting feature in the absence of a positive family history of HD was diagnosed as suffering from HD (patient 20). One patient had a tremor and a gait ataxia which was likely due to chronic alcohol intake (patient 15), while in another, dementia associated with tardive dyskinesia was induced by neuroleptic drugs (patient 22).  
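The breakdown of the 30 cases in Table 9-1 can be cross-checked arithmetically. The following is a minimal sketch that uses only the counts given in Table 9-1 and the cohort size of 1022; the dictionary layout and the helper name `percent` are illustrative assumptions, not part of the original analysis.

```python
TOTAL_PATIENTS = 1022  # persons given the diagnosis of HD and assessed

# Counts as listed in Table 9-1.
unexplained = {
    "Family history of neurological disease": 8,
    "CAG expansion in other family members": 1,
    "New mutations": 3,
}
errors_in_assignment = {
    "Misdiagnosis": 10,
    "Sample mix-up": 6,
    "Clerical error": 2,
}


def percent(count: int, total: int = TOTAL_PATIENTS) -> float:
    """Percentage of the cohort, rounded to one decimal place as in the table."""
    return round(100 * count / total, 1)


for title, group in (
    ("Unexplained lack of CAG expansion", unexplained),
    ("Errors in assignment", errors_in_assignment),
):
    print(title)
    for reason, count in group.items():
        print(f"  {reason}: {count} ({percent(count)}%)")
    subtotal = sum(group.values())
    print(f"  Total: {subtotal} ({percent(subtotal)}%)")
```

The two subtotals (12 and 18) together account for all 30 persons without CAG expansion.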
9.2.3 ABSENCE OF CAG EXPANSION IN INDIVIDUALS FROM FAMILIES WITH CAG EXPANSION IN OTHER AFFECTED PERSONS (TABLES 9-2, 9-3)
Ten persons with affected family members with CAG expansion, and who had clinical signs which strongly raised the possibility of the diagnosis of HD, were shown not to have CAG repeat expansion. In 4 of these instances, sample mix-up accounted for these findings (Table 9-3, patients 25, 26, 27 and 28). In one instance (Table 9-2, patient 6), in a family which has been described in detail on three previous occasions¹,⁷,⁸, two CAG alleles within the normal range were detected. In three instances (Table 9-3, patients 13, 14, 16, 17 and 18), persons from a family with proven HD who had minor motor abnormalities or behavioural disturbances were misdiagnosed as having HD.

[Figure 9-1: pedigree diagrams with CAG repeat sizes.]
Figure 9-1. Pedigrees of misdiagnosed HD cases. Patients are marked with arrows and sizes of CAG repeats are shown.

9.2.4 LOCUS HETEROGENEITY OF HD?
Pedigrees of families containing individuals lacking CAG expansion not attributable to misdiagnosis, sample mix-up or clerical error are shown in Figure 9-2. Haplotypes were constructed for families with a history of HD (patients 1, 2, 3 and 4) (Figures 9-2a, 9-2b). Haplotypes for patients 5 and 6 have been previously determined and are shown in Figures 9-2c and 9-2d⁷⁻⁹. Pedigrees of remaining individuals are shown in Figure 9-2e.

A total of 10 persons from 6 different families had a positive family history of a progressive neurological disorder associated with intellectual decline, chorea, and personality change but lacked CAG expansion (Tables 9-1 and 9-2). In two instances, affected individuals were siblings (Figure 9-2a; Figure 9-2e, Pedigree C) while in another instance the 2 affected persons were first cousins (Figure 9-2b).
Haplotype analysis with 6 markers in one family (Figure 9-2b) demonstrated that 2 affected individuals (patients 3 and 4) had not inherited the same chromosome 4 in this region and therefore the cause for the HD phenotype in this family was unlikely to be a gene in this region of chromosome 4. Haplotype analysis in the family of patients 1 and 2 (Figure 9-2a) shows that both affected persons share identical haplotypes. An older sibling with 2 different parental alleles demonstrates possible early signs of HD. If this individual is definitively diagnosed in the future with HD, then 4p16.3 can be excluded as the site responsible for this phenotype in this particular family.

[Figure 9-2a: pedigree with haplotypes at CCG, CAG, D4S127 and D4S227.]
Figure 9-2a. Pedigree for patients 1 and 2 (indicated by arrow) including haplotypes showing each have received identical parental chromosomes. The eldest sibling, with different parental alleles to patients 1 and 2, shows possible early signs of HD (shown hatched), which if confirmed as HD in the future would exclude this region of chromosome 4 as responsible for the phenotype in this family.

[Figure 9-2b: pedigree with haplotypes at CCG, CAG, D4S127, D4S95, D4S227 and D4S133.]
Figure 9-2b. Pedigree for patients 3 and 4 (indicated by arrow) including haplotypes at 6 markers showing that patients 3 and 4 have received a different chromosome 4 from their affected parents, excluding this region of chromosome 4 as responsible for the phenotype in this family.
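The exclusion reasoning behind Figures 9-2a and 9-2b can be sketched in code. The sketch assumes a dominantly inherited mutation, no recombination across the region, and that identical marker haplotypes reflect the same inherited chromosome; as a simplification it compares both chromosomes of each person rather than only the one transmitted by the affected parent. The type names, function names and all allele values are hypothetical and are not read from the figures.

```python
from typing import List, Set, Tuple

Haplotype = Tuple[str, ...]              # alleles at an ordered list of markers
Genotype = Tuple[Haplotype, Haplotype]   # the two chromosome 4 copies of one person


def common_haplotypes(affected: List[Genotype]) -> Set[Haplotype]:
    """Return the haplotypes carried by every affected person in the list."""
    if not affected:
        return set()
    shared = set(affected[0])
    for genotype in affected[1:]:
        shared &= set(genotype)
    return shared


def region_excluded(affected: List[Genotype]) -> bool:
    """True when no single haplotype is common to all affected relatives."""
    return len(common_haplotypes(affected)) == 0


# Hypothetical haplotypes (marker order: CAG, D4S127, D4S95).
cousin_a: Genotype = (("17", "6", "A"), ("20", "3", "B"))
cousin_b: Genotype = (("16", "5", "A"), ("19", "6", "B"))
sib_a: Genotype = (("16", "6", "A"), ("18", "3", "B"))
sib_b: Genotype = (("16", "6", "A"), ("18", "3", "B"))

# Affected cousins with no haplotype in common: the region is excluded.
print(region_excluded([cousin_a, cousin_b]))
# Affected siblings sharing haplotypes: the region cannot be excluded.
print(region_excluded([sib_a, sib_b]))
```

Adding a further affected relative to the list only ever shrinks the set of shared haplotypes, which mirrors the point made for the eldest sibling in Figure 9-2a: a confirmed diagnosis in a sibling with different parental alleles would exclude the region.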
[Figure 9-2c: pedigree with haplotypes at CAG, D4S127, D4S95, D4S227-c3A, D4S227-cl3B, D4S227-E241 and D4S227-E24CA.]
Figure 9-2c. Previously published* pedigree for patient 5 (indicated by arrow) including haplotypes at various 4p16.3 markers. The affected chromosome is shown boxed.
* Weber B, Riess O, Wolff G, Andrew S, Collins C, Graham R, Theilmann J, Hayden MR (1992). Nature Genet 2:216-222.

[Figure 9-2d: pedigree with haplotypes at the 4p16.3 markers D4S144, D4S10, D4S125, CAG, D4S95, D4S115, D4S111, D4S90 and D4S169.]
Figure 9-2d. Previously published*# pedigree for patient 6 (indicated by arrow) including haplotypes at various 4p markers. The Huntington disease chromosome is shown boxed. Haplotype analysis of patient 6 initially suggested the gene was distal to these three markers#; however, subsequent analysis with an additional telomeric marker (D4S169) then excluded the telomere*.
# Robbins C, Theilmann J, Youngman S, Haines J, Altherr MJ, Harper PS, Payne C, Junker A, Wasmuth J, Hayden MR (1989). Am J Hum Genet 44:422-425.
* Pritchard C, Zhu N, Zuo J, Bull L, Pericak-Vance M, Vance JM, Roses AD, Milatovitch A, Francke U, Cox DR, Myers RM (1992). Am J Hum Genet 50:1218-1230.

[Figure 9-2e: pedigree diagrams with CAG repeat sizes.]
Figure 9-2e. Pedigrees of patients with unexplained lack of CAG expansion. Patients are marked with arrows and sizes of CAG repeats are given where DNA was available.

In a previously described patient (patient 5)⁹, DNA marker analysis had indicated that the mutation associated with HD was distal to the region containing the HD gene (Figure 9-2c).
Similarly, patient 6, from a previously described family¹,⁷,⁸, does not share 4p16.3 markers with other affected family members who demonstrate CAG expansion (Figure 9-2d). This would indicate that mutations in other genes outside this region of chromosome 4p16.3 are likely to be associated with a clinical phenotype very similar to HD.

9.3 DISCUSSION
The cardinal finding of this study of 1022 HD affected persons is that 3.0% of persons (n = 30) with the diagnosis of HD initially had CAG repeat sizes within the normal range. After investigation, 18 of these persons were found to represent misclassifications including misdiagnosis, sample mix-up or clerical error, while 12 affected persons represent possible cases of HD-like symptoms not caused by CAG expansion of the HD gene.

The assignment of a case as misdiagnosis could be arbitrary depending on how consistent the symptoms were with the classical diagnosis of HD. In this analysis, however, the distinction was based on whether an alternative diagnosis was made because the patient had a clinical and a neuropathological phenotype more suggestive of other known disorders such as benign hereditary chorea, DRPLA, inherited ataxia, neuroacanthocytosis or Alzheimer’s disease. Those patients with a phenotype similar to HD who do not fulfill criteria for these other known disorders and do not have CAG expansion in the HD gene may therefore represent previously undescribed HD-like disorders.

Clinical misdiagnosis is not rare in HD families, accounting for 10 misclassifications in this cohort (1.0%). Previously, it has been demonstrated that patients with HD may often be misdiagnosed as suffering from other illnesses including schizophrenia and Alzheimer’s disease¹⁰,¹¹. This study shows that misdiagnoses of neurological symptoms as HD in families with a positive family history of HD are a significant source of error.
This reinforces the need for caution before attributing all neuropsychiatric symptoms to HD in a family with a positive history of this illness.

In families where there is a history of neuropsychiatric illness in other relatives consistent with an autosomal dominant mode of inheritance, the absence of CAG expansion in the HD gene does not exclude the possibility that the person at risk will manifest with similar signs and symptoms. This clearly highlights the need, where possible, to include DNA from an affected person in all analyses of at risk persons, which will facilitate more accurate genetic counseling. If the affected relative has CAG expansion and the person at risk does not, this would be reassuring that the person at risk will not develop signs and symptoms of HD. However, if the affected person does not show CAG expansion in the HD gene, this might mean that there is another neurogenetic disorder in this family. Reassurance to the person at risk without CAG expansion that they would not develop signs and symptoms of a similar disorder would not be possible in this situation.

In DRPLA, the phenotype may be similar to HD[12-14]. These two illnesses may, however, be differentiated by neuropathological examination, where the major involvement of the globus pallidus, subthalamic nucleus and dentate nuclei in DRPLA distinguishes it from HD, and now by assessment for CAG expansion in two different novel genes. Recently, expansion of a trinucleotide repeat (CAG) on chromosome 12 was shown to be associated with DRPLA. The assessment of CAG expansion in the DRPLA gene in patients 1-22 demonstrated that all had CAG lengths within the normal range (7-23 repeats), suggesting that none of the individuals in this series lacking CAG expansion at the HD locus can be classified as DRPLA.
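The proportions quoted in this discussion follow directly from the counts given in the text. The sketch below is only an arithmetic check: the counts are those stated in section 9.3, while the labels and the helper name are illustrative and not part of the original analysis. Note that 30/1022 evaluates to 2.9%, whereas the quoted 3.0% equals the sum of the two rounded components (1.8% + 1.2%).

```python
# Counts taken from the text of section 9.3; labels and names are illustrative only.
COHORT_SIZE = 1022

counts = {
    "no CAG expansion at first assessment": 30,
    "misclassified after investigation": 18,
    "unexplained after investigation": 12,
    "clinical misdiagnosis": 10,
}


def percent_of_cohort(n, total=COHORT_SIZE):
    """Return n as a percentage of the cohort, rounded to one decimal place."""
    return round(100.0 * n / total, 1)


percentages = {label: percent_of_cohort(n) for label, n in counts.items()}

for label, pct in percentages.items():
    print(f"{label}: {counts[label]} of {COHORT_SIZE} ({pct}%)")
```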
Human error accounted for 10 misclassifications (1.0%) in this series: sample mix-up accounted for the failure to detect trinucleotide repeat expansion in 6 persons in the series (0.6%), while clerical errors accounted for two misclassifications (0.2%). This stresses the importance, where possible, of obtaining additional samples for reassessment of persons who have signs and symptoms compatible with the diagnosis of HD but who do not manifest CAG repeat expansion, prior to concluding that they do not have repeat expansion.

After taking into account misassignments due to misdiagnosis, sample mix-up and clerical error, only 12 persons from 9 families represented unexplained cases lacking CAG expansion (Table 9-1). Construction of haplotypes within the families confirms whether a particular DNA sample is consistent with other family chromosomes. Therefore, sample mix-up is unlikely to be responsible for those cases in which a second DNA sample is unavailable. Consistent patterns of inheritance, however, do not necessarily exclude sample mix-up between two members of one family. Unfortunately, unavailability of second DNA samples in many instances prevents definitive exclusion of sample mix-up as an explanation for lack of CAG expansion. These cases remain unexplained until more clinical data or additional DNA samples are available.

In 3 families where assessment was possible (patients 3-6), segregation analysis indicated that other mutations in the HD gene (IT15) leading to this phenotype are unlikely and locus heterogeneity probably underlies the HD phenotype. The genetic cause for the phenotype similar to HD in these instances lies, in all likelihood, outside the HD gene. In 8 patients (patients 1, 2, 7-12) from 6 families the possibility that another mutation in the HD gene other than CAG expansion is responsible for the HD phenotype has not been excluded.
It is noteworthy, however, that on retrospective review of 11/12 patients (patients 1 to 11), certain clinical features raised questions about the clinical diagnosis of HD. These included absence of expected progression and failure to see characteristic changes on CT or PET scan after many years of illness. However, due to the lack of an alternative diagnosis in these cases, the diagnosis of HD has not been removed.

In families with demonstrated typical CAG expansion there were initially 10 patients who did not show expansion. Sample mix-up accounted for 4 of these findings (patients 25-28), incorrect diagnosis was assigned in 5 cases (patients 13, 14, 16, 17, 18) and the remaining patient (patient 6) represents an unexplained case. In this one remaining instance, the natural history is not typical for HD and in retrospect the diagnosis in this patient is now in question. Misdiagnosis or sample mix-up are the most likely explanations for these phenomena. Recent studies from our laboratory indicate that while somatic mosaicism is evident in HD, major differences in CAG repeat size do not occur in different tissues except in persons with juvenile onset, and cannot be invoked to account for the presence of normal CAG size in a patient with demonstrated CAG expansion in other family members[16].

The occurrence of individuals with an HD phenotype and lacking CAG expansion questions the hypothesis that the expansion of the CAG repeat is the mutation causing HD. To demonstrate that a DNA alteration is the causative mutation rather than merely a closely associated marker, complete association of the alteration with disease must be observed, together with absence of the alteration in control chromosomes. CAG expansion is associated with 99.0% of all HD individuals, and those lacking expansion have not yet been excluded as misdiagnoses, sample mix-up or some other error.
In addition, expansion into the affected range has not been seen in control individuals; thus it appears that the CAG repeat is causative for the disorder. However, until animal models have demonstrated that the expanded repeat is responsible for disease, the suggestion that the CAG may not be the primary mutation, but a closely associated marker, cannot be entirely excluded.

These findings have significant implications for the understanding of the pathogenesis of HD. They would indicate that other mutational mechanisms besides CAG expansion resulting in alteration of this gene rarely, if ever, lead to this phenotype. A total of 30 persons given the clinical diagnosis of HD in this series of 1022 patients (3.0%) did not have expansion of the CAG repeat. The majority of these (1.8%) were errors of assignment. After investigation, a maximum of 12 persons (1.2%) demonstrate an unexplained lack of CAG expansion. It is likely that in most instances these patients come from families with HD-like disorders. In at least 4 cases from 3 families, family studies excluded mutations within the HD gene (IT15) as responsible for this phenotype. These results suggest that on rare occasions non-allelic genetic heterogeneity may underlie the presentation of an HD-like phenotype.

9.4 REFERENCES

1. Duyao M, Ambrose C, Myers R, Novelletto A, Persichetti F, Frontali M, Folstein S, Ross C, Franz M, Abbott M, Gray J, Conneally P, Young A, Penney J, Hollingsworth Z, Shoulson I, Lazzarini A, Falek A, Koroshetz W, Sax D, Bird E, Vonsattel J, Bonilla E, Alvir J, Bickman Conde J, Cha J-H, Dure L, Gomez F, Ramos M, Sanchez-Ramos J, Snodgrass S, deYoung M, Wexler N, Moscowitz C, Penchaszadeh G, MacFarlane H, Anderson M, Jenkins B, Srinidhi J, Barnes G, Gusella J, MacDonald M (1993). Trinucleotide repeat length instability and age of onset in Huntington’s disease. Nature Genet 4:387-392.
2. Snell R, MacMillan JC, Cheadle JP, Fenton I, Lazarou LP, Davies P, MacDonald ME, Gusella JF, Harper PS, Shaw DJ (1993). Relationship between trinucleotide repeat expansion and phenotypic variation in Huntington’s disease. Nature Genet 4:393-397.
3. Andrew SE, Goldberg YP, Kremer B, Telenius H, Theilmann J, Adam S, Starr E, Squitieri F, Lin B, Kalchman M, Graham R, Hayden MR (1993). The relationship between trinucleotide (CAG) repeat length and clinical features of Huntington disease. Nature Genet 4:398-403.
4. Telenius H, Kremer HPH, Theilmann J, Andrew SE, Almqvist E, Anvret M, Greenberg C, Greenberg J, Lucotte G, Squitieri F, Starr E, Goldberg YP, Hayden MR (1993). Molecular analysis of juvenile Huntington disease: the major influence on (CAG)n repeat length is the sex of the affected parent. Hum Mol Genet 2:1535-1540.
5. De Boulle K, Verkerk AJMH, Reyniers E, Vits L, Hendrickx J, Van Roy B, Van Den Bos F, de Graaff E, Oostra BA, Willems PJ (1993). A point mutation in the FMR-1 gene associated with fragile X mental retardation. Nature Genet 3:31-35.
6. Gedeon AK, Baker E, Robinson H, Partington MW, Gross B, Manca A, Korn B, Poustka A, Yu S, Sutherland GR, Mulley JC (1992). Fragile X syndrome without CCG amplification has a FMR1 deletion. Nature Genet 1:341-344.
7. Pritchard C, Zhu N, Zuo J, Bull L, Pericak-Vance MA, Vance JM, Roses AD, Milatovitch A, Francke U, Cox DR, Myers RM (1992). Recombination of 4p16 DNA markers in an unusual family with Huntington disease. Am J Hum Genet 50:1218-1230.
8. Robbins C, Theilmann J, Youngman S, Haines J, Altherr MJ, Harper PS, Payne C, Junker A, Wasmuth J, Hayden MR (1989). Evidence from family studies that the gene causing Huntington disease is telomeric to D4S95 and D4S90. Am J Hum Genet 44:422-425.
9. Weber B, Riess O, Wolff G, Andrew S, Collins C, Graham R, Theilmann J, Hayden MR (1992). Delineation of a 50 kb DNA segment containing the site of a recombination event in a sporadic case of Huntington disease. Nature Genet 2:216-222.
10. Hayden MR (1981). Huntington’s chorea. Springer-Verlag, New York.
11. Harper PS (1991). Huntington’s disease. WB Saunders, London.
12. Iizuka R, Hirayama K, Maehara K (1984). Dentato-rubro-pallido-luysian atrophy: a clinico-pathological study. J Neurol Neurosurg Psychiatr 47:1288-1298.
13. Iizuka R and Hirayama K (1986). Dentato-rubro-pallido-luysian atrophy. Handbook Clin Neurol 5:437-443.
14. Naito H and Oyanagi S (1982). Familial myoclonus epilepsy and choreoathetosis: Hereditary dentatorubral-pallidoluysian atrophy. Neurology 32:798-807.
15. Huntington Disease Collaborative Research Group (1993). A novel gene containing a trinucleotide repeat that is expanded and unstable on Huntington’s disease chromosomes. Cell 72:971-983.
16. Telenius H, Kremer HPH, Goldberg YP, Theilmann J, Andrew S, Zeisler J, Adam S, Greenberg C, Ives EJ, Clarke LA, Hayden MR (1994). Somatic and gonadal mosaicism in Huntington disease: Instability of the CAG repeat is tissue specific and most prominent in brain and sperm. Nature Genet 6:409-414.

CHAPTER 10
MOLECULAR ANALYSIS OF SPORADIC CASES OF HUNTINGTON DISEASE

The work in this chapter has contributed to two manuscripts.

Goldberg YP, Andrew SE, Theilmann J, Kremer B, Squitieri F, Telenius H, Brown JD, Hayden MR (1993). Familial predisposition to recurrent mutations causing Huntington’s disease: genetic risk to sibs of sporadic cases. J Med Genet 30:987-990.

Goldberg YP, Kremer B, Andrew SE, Theilmann J, Graham RK, Squitieri F, Telenius H, Adam S, Sajoo A, Starr E, Heiberg A, Wolff G, Hayden MR (1993). Molecular analysis of new mutations for Huntington disease: Intermediate alleles and sex of origin effects. Nature Genet 5:174-179.

10.1 INTRODUCTION

New mutations causing HD are exceedingly rare, with the mutation rate amongst the lowest known for human genetic diseases[1-3]. This may reflect a truly low mutation rate,
but may also reflect the fact that proof of a new mutation in HD is difficult: the parents of the sporadic case must have lived beyond the expected age of onset of HD without any manifestations of the disease, paternity of the sporadic case must be confirmed, and ideally the disease should be transmitted to the offspring of the sporadic case[4].

CAG repeat size in 600 control chromosomes of unaffected persons of Caucasian descent ranges from 10 to 29 with a median of 18 repeats. Furthermore, 97.5% of normal chromosomes contain fewer than 28 repeats. The length of the upper allele in a total of 955 patients with HD ranges from 36 to 121 with a median of 44 repeats.

Of the 1050 persons affected with HD (from 650 families), a total of 934 (89%) represented patients in whom there was an established family history of disease. Those patients without an established family history of the disorder (116, 11%) were subjected to further investigation. The goal was to identify persons most likely to represent new mutations for HD. On further investigation of the 116 isolated cases, 56 were excluded either because the family history raised the possibility of HD, such as a first degree relative with a serious neurological disorder, or because HD in the parents could not convincingly be excluded as a parent had died before the age of 60. An additional 3 persons were excluded because they were adopted and 36 were excluded as no additional information concerning family history was available. A total of 21 individuals remained who fulfilled the criteria of clearly having the signs and symptoms of HD and having parents who had lived beyond the expected age of onset (>60). Among prior reports of possible new mutations, only 9 such cases have previously been described in the
literature[5-9].

10.2 RESULTS

10.2.1 IDENTIFICATION OF THE HD PREMUTATION

CAG repeat length was assessed in 21 sporadic cases of HD and their families in order to learn more about the molecular events underlying new mutations for HD. In 18 of these families the sporadic HD patient had an upper allele which was in the size range seen in patients with this disorder (Tables 10-1A and 10-1B). In all 8 families where DNA was available from parents (n=5) or there were sufficient siblings to reconstruct genotypes in the parents (n=3), one parental allele was found to be significantly greater than that seen in the general population (>30), but below the range seen in patients with HD. Alleles in the range of 30-38 repeats have been designated as intermediate alleles (IA).

These IAs are meiotically unstable. In each family the IA expanded in at least one offspring beyond 38 repeats, causing HD (Figure 10-1 A-E). However, a small change in repeat size (1-2 repeats) or no change at all was also seen in some unaffected sibs of sporadic patients (Figure 10-1, Table 10-1). Since we have shown that the IA undergoes expansion to the range seen in patients with HD, it was reasoned that the IA is a premutation allele which alone does not cause HD, but predisposes to further mutation which may eventually lead to HD.

In one of the eight families (Figure 10-1A, Family 1) reconstruction of haplotypes indicates that one parent must have had an allele with approximately 32 repeats, which was then transmitted to 4 unaffected offspring as an allele with 32-33 trinucleotide repeats. One offspring, however, received an allele containing 43 repeats and developed HD. The IAs of the 4 sibs and the expanded allele of the sporadic patient were found by haplotype analysis to be on the same chromosome as that of the parent with the IA, confirming that this IA was unstable and expanded to produce a CAG repeat length in the HD range.

Table 10-1a.
New mutation families with demonstrated HD gene (CAG) repeat length between 30-38 (intermediate alleles).

Family  Patient   Allele size   Age of Transmitting  Paternity**  Clinical status of parents*
                  Lower  Upper  onset  parent
1       Proband   20     43     36     Unknown       Confirmed    Parents alive, unaffected >70
        Sib       13     20
        Sib       18     33
        Sib       13     20
        Sib       13     20
        Sib       13     20
        Sib       13     20
        Sib       13     20
        Sib       20     33
        Sib       20     32
        Sib       18     33
2       Proband   17     44     36     Father        Confirmed    Father died >80, mother alive >80
        Father    27     32
        Mother    16     17
        Spouse    16     16
        Child     16     43
3       Proband   23     49     32     Father        Confirmed    Father alive >80
        Father    20     35
        Sib       17     35
        Sib       23     35
4       Proband   21     43     32     Father        Confirmed    Father alive >70, mother alive >70
        Father    19     30
        Sib       19     30
5       Proband   16     53     35     Father        Confirmed    Father alive >70, mother alive >60
        Father    18     37
        Mother    16     16
        Sib       16     38
        Sib       16     37
        Sib       16     41
        Sib       16     18
        Uncle     17     37
6       Proband   22     52     28     Father        Confirmed    Father died >60, mother alive >60
        Father    19     38
        Mother    19     22
7       Proband   21     41     45     Unknown       Confirmed    Parents died >60; sibs >70s and no HD
        Sib       21     38
        Sib       16     36
        Sib       18     21
        Sib       21     37
8       Proband   26     45     40     Father        Unknown      Father died >60, mother alive >80
        Mother    19     26
        Sib       19     34

*  No parent had any clinical features of HD.
** Paternity was investigated using highly polymorphic DNA markers and blood serology. In all instances results were compatible with paternity as shown.

Table 10-1b.
CAG repeat length and clinical details of new mutation families.

Family  Patient        Allele size   Clinical status of parents*
                       Lower  Upper
9       Proband        16     47     Father died >60, mother alive >80
        Mother         16     18
        Child          16     20
        Child          16     20
10      Proband        17     47     Father died >70, mother alive >70
        Child          16     71
11      Proband        16     43     Father died >60, mother died >80
        Father-in-law  21     26
        Mother-in-law  17     21
        Child          17     45
12      Proband        20     40     Father died >70, mother died >70
        Sib            17     20
13      Proband        23     43     Parents alive >60
        Spouse         17     19
        Child          17     40
        Child          19     41
14      Proband        19     40     Father died >60, mother died >70
        Sib            12     13
15      Proband        11     39     Father died >70, mother died >70
        Spouse         18     22
        Child          15     18
        Child          22     37
16      Proband        16     43     Father died >60, mother died >90
        Child          16     19
17      Proband        15     42     Father died >90, mother died >80
18      Proband        20     40     Father died >60, mother alive >70
19      Proband        18     21     Father died >70, mother died >80
20      Proband        13     26     Father >60, mother >60
        Mother         22     28
        Father         16     18
21      Proband        15     19     Father died >60, mother died >60
        Sib            19     26

* No parent had any clinical features of HD.

Figure 10-1 (a-e). Autoradiograms showing trinucleotide repeat (TNR) length in families with new mutations. In each family, an intermediate allele (IA) is shown, which has expanded in the sporadic case. Pedigrees are shown and in families 1, 3 and 4, haplotypes (a, b, c and d) are depicted.

[Figure 10-1a: autoradiogram and pedigree not reproduced. TNR lengths (lower/upper) and haplotypes in lane order: 13/20 ac, 18/33 bd, 13/20 ac, 13/20 ac, 13/20 ac, 13/20 ac, 13/20 ac, 20/33 ad, 20/32 ad, 20/43 ad/c, 18/33 bd.]

Figure 10-1a. Haplotypes were reconstructed in family 1 for the parental alleles. Meiotic instability is seen as variability of the intermediate allele amongst sibs. The proband has an expanded allele consistent with HD.

[Figure 10-1b: autoradiogram and pedigree not reproduced. TNR lengths (lower/upper) in lane order: 16/43, 16/16, 27/32, 17/44, 16/17.]

Figure 10-1b. In family 2, the intermediate allele of the parent (I-1) expands in the proband (II-2) and is passed on to the next generation (III-1).
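The working allele categories used in this chapter can be expressed as a small classifier. This is a minimal sketch that assumes the cut-offs stated in section 10.2.1 (fewer than 30 repeats as normal, 30-38 as intermediate, more than 38 as the HD range); the function name and category labels are illustrative. The boundary is not sharp in practice, since the upper alleles of affected patients were reported to start at 36 repeats.

```python
# Working repeat-length categories as described in section 10.2.1.
# Assumed cut-offs: <30 normal, 30-38 intermediate allele (IA), >38 HD range.
def classify_allele(cag_repeats):
    """Assign a CAG repeat length to one of the chapter's working categories."""
    if cag_repeats < 30:
        return "normal"
    if cag_repeats <= 38:
        return "intermediate"
    return "HD range"


# Upper alleles for family 2 of Table 10-1a.
family_2 = {"father": 32, "mother": 17, "proband": 44, "spouse": 16, "child": 43}

classified = {member: classify_allele(n) for member, n in family_2.items()}

for member, category in classified.items():
    print(f"{member}: {family_2[member]} repeats -> {category}")
```

Applied to family 2, the father's allele falls in the intermediate category while the alleles of the proband and the child fall in the HD range, which is the pattern of transmission described in the text.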
[Figure 10-1c: autoradiogram and pedigree not reproduced. TNR lengths (lower/upper) and haplotypes in lane order: 20/35 cd, 23/49 ac, 17/35 bc, 23/35 ac.]

Figure 10-1c. Haplotype analysis of family 3 shows that the intermediate alleles and the expanded allele occur on the same chromosome. In contrast to the proband, sibs have inherited the intermediate allele in a stable manner.

[Figure 10-1d: autoradiogram and pedigree not reproduced. Band sizes of 19, 21, 30 and 43 repeats are legible in the extracted text.]

Figure 10-1d. Haplotype analysis of family 4 shows that the intermediate alleles and the expanded allele occur on the same chromosome. In contrast to the proband, sibs have inherited the intermediate allele in a stable manner.

[Figure 10-1e: autoradiogram and pedigree not reproduced. TNR lengths (lower/upper) in lane order: 21/38, 16/36, 18/21, 21/37, 21/41.]

Figure 10-1e. Parents were unavailable for family 5. However, analysis of sibs shows an unstable intermediate allele of 36-38 repeats. These sibs are all alive (>70 years old) without features of HD.

Interestingly, this patient was one of the recombinant individuals discussed in Chapter 1, with the breakpoint between D4S111 and D4S141 that led to the hypothesis that the recombination event might underlie the cause of HD in this patient[10].

In 2 other families with established haplotypes (Figures 10-1C and 10-1D, and Table 10-1A, Families 3 and 4) the expanded CAG repeat of the sporadic patient occurs on the same chromosome as the IA of the sibs and the transmitting parent. In another interesting family, the father with the premutation has passed on an expanded allele to two offspring, only one of whom has already manifested with HD (Table 10-1A, Family 5).

10.2.2 PARENTAL SEX OF ORIGIN

The sex of origin of the premutation was determined by examining DNA from both parents. In 5 instances the father has been identified as the transmitting parent (Table 10-1A). In two families where parental DNA was unavailable, the sex of the transmitting parent could not be ascertained (Table 10-1A).
Furthermore, in two additional families with sporadic HD, the mutation was not inherited from the mother, and therefore paternal origin is implied (Table 10-1B, Family 8). Therefore, in 7/7 families there is preferential origin of the new mutation from the paternally derived allele. The probability of obtaining 7 mutant alleles from the father in 7 independent meioses, if there is no bias towards one parent or the other, is 0.0075. This would suggest that the paternal allele in the premutation range is more likely to undergo significant expansion to a repeat length in the range seen in patients affected with HD. The 5 unaffected fathers with IAs have repeat lengths of their upper allele ranging from 30-38 (mean = 34.2 ± 2.8), which is significantly different from the upper allele in the general population and less than the upper allele seen in HD. However, it should be noted that at the upper range of the IA it may be difficult to distinguish between a high IA and an upper allele in the lower range (38-40) seen in some patients with HD.

In the remaining 12 patients with presumed new mutations causing HD (Table 10-1B), DNA from both parents was unobtainable and thus the parental origin of the expanded allele could not be identified. In 9 of 12 instances, however, the person representing a new mutation for HD was found to have an expanded allele consistent with that seen in other affected persons.

10.2.3 NEW MUTATIONS WITHOUT CAG EXPANSION

In three families no expanded upper allele was detected, although second DNA samples were unavailable to exclude sample mix-up. In these families, the repeat lengths of the sporadic patients were 18/21, 13/26 and 15/19 respectively. All three patients had typical histories of HD and were diagnosed by a neurologist as being affected.
In another suspected new mutation, excluded from this study as the mother died before 60, the patient, with classical signs of HD detected clinically and by positron emission tomography, had 2 alleles of 16 CAG repeats each. This would suggest, therefore, that in a proportion of these patients genetic heterogeneity underlies HD. The underlying cause for an HD-like phenotype remains unknown and merits further investigation. However, this has significant implications for predictive testing programs and clearly indicates the importance of including affected family members in any protocol providing results to at risk persons. The failure to identify an expanded allele in the affected person would indicate that, in this instance, the direct mutation test would not be informative in these offspring.

10.2.4 SPORADIC HD IS TRANSMITTED TO OFFSPRING

Patients with sporadic HD can transmit their expanded CAG repeats to their offspring, who will subsequently develop HD. In family 2 (Figure 10-1B, Table 10-1A), for example, where paternity has been established, the unaffected father transmitted an IA which expanded to 44 repeats in the sporadic affected patient, who in turn passed on an allele of 43 repeats to his offspring, who also developed HD. In another family, there was transmission of HD from the parent to his offspring with massive expansion of the HD allele from 47 to 71 repeats (Table 10-1B, Family 9). In this family paternity has been proven and both the parents lived to an advanced age, beyond 70 and 80 years old respectively, without manifesting signs and symptoms of HD. This family is pertinent because the sporadic case developed HD at 30 years, while his child, who inherited the expanded allele of 71 repeats, manifested at 10 years of age, consistent with the previously developed predictive model for age of onset based on repeat length.
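The sex-of-origin probability quoted in section 10.2.2 can be reproduced with a short binomial calculation. The sketch below assumes, as the text does, that the 7 meioses are independent and that under the null hypothesis either parent is equally likely to transmit the expanding allele; the function name is illustrative. The exact value, (1/2)^7 ≈ 0.0078, is close to the 0.0075 quoted in the text.

```python
from math import comb


def prob_paternal_origin(k, n, p=0.5):
    """Binomial probability that exactly k of n independent new mutations
    arise on the paternal allele when each does so with probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# All 7 informative families showed paternal origin of the new mutation.
p_all_paternal = prob_paternal_origin(7, 7)
print(f"P(7 of 7 paternal | no parental bias) = {p_all_paternal:.4f}")
```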
10.3 DISCUSSION

These findings have major implications for the understanding of the sequential molecular events leading to new mutations and also have clinical relevance. Patients with suspected HD on clinical criteria without a positive family history are not rare and initially represented 11% of this sample, which was biased towards ascertainment of familial cases. In this sample, 18 out of 21 sporadic cases of HD had CAG expansion. It would appear, therefore, that in most instances the diagnosis of HD can be made in the absence of a positive family history.

It has previously been postulated that the mutation rate for HD is amongst the lowest for all human genetic diseases, and very few sporadic cases had been described[3]. The finding of 21 suspected new mutations in 650 families (3%) reflects the fact that the criteria used here for designation of new mutations were less stringent than those used in previous studies. Thus it is apparent from the results of this analysis that new mutations are not as rare as previously thought, and it is likely that the mutation rate for this disorder has been underestimated.

These findings have significant implications for family members of sporadic cases, in particular for siblings and second degree relatives of such affected persons. In the past, there was no appreciation that the offspring of unaffected siblings of a sporadic case of HD might indeed have an increased risk of manifesting with HD in the future, since they may also inherit an expanded allele (Table 10-1A, Family 5). Similarly, children of unaffected siblings with an IA, or male siblings with an IA, are also at increased risk of having children with HD. This latter risk would depend on whether the premutation allele undergoes expansion during transmission through the male germline. Female siblings who carry the premutation allele, however, may have a lower probability of passing on an expanded allele resulting in HD in their offspring.
Thus, the risk for the children of females carrying the premutation for HD would be considerably lower than that seen in the offspring of males.

These findings may explain previously puzzling family histories. For example, two siblings of unaffected parents may manifest with HD, suggesting recessive inheritance. This is consistent with a premutation in a parental allele having undergone expansion in both offspring. Furthermore, apparent skipping of generations may be due to expansion of an inherited premutation in the germline of a sibling of a sporadic affected patient, with consequent clinical manifestations in the niece or nephew. In the past, non-paternity was usually invoked to account for these findings.

The frequency of this expansion of premutation alleles in the male germline is unknown. This small series, which shows 10 expansions out of 24 meioses (42%), clearly represents an overestimate, as there has been a bias of ascertainment with identification only of those individuals in whom the expansion has occurred and who manifest with HD.

To date, only two other HD new mutation families have been analyzed and, consistent with these findings, IAs of 33 and 36 repeats respectively were detected in both families[11]. This analysis demonstrates that there is a premutation allele which is unstable and expands to a range associated with HD. Haplotypes of the premutation alleles with markers flanking the CAG mutation demonstrate multiple origins for the predisposing premutation allele, leading to HD chromosomes with different haplotypes (data not shown). This is consistent with the finding of multiple haplotypes associated with HD[12,13].

In fragile X syndrome, a CCG repeat located in the 5’ untranslated region of the FMR-1 gene ranges in normal persons from 6 to 54 repeats, while expansions greater than 200
repeats are seen in affected individuals[14,15]. Phenotypically normal transmitting males have an intermediate size CCG repeat of 52 to 100 which is meiotically unstable (premutation)[13,14]. However, in contrast to fragile X, where expansion of the premutation only occurs in transmission through the female germline, expansion of the HD premutation has only been demonstrated in the male germline.

In myotonic dystrophy (DM) a CTG repeat in the 3’ untranslated region of the myotonin kinase gene ranges from 5-40 repeats in normal persons, while expansions greater than 100 are seen in affected individuals[16-18]. Similar to HD, in DM a premutation (50-100 repeats) is meiotically unstable and expands to the full mutation, but in contrast to HD the expanded allele may be transmitted from either parent[19].

It has previously been shown in myotonic dystrophy and fragile X syndrome that the length of the repeat is the major source of recurrent DNA mutations once the repeat has reached an intermediate range[13-17]. These data also implicate CAG repeat length as one of the factors contributing to instability once the number of repeats reaches a threshold level (30 repeats).

In this analysis, mutations causing HD are more likely on the paternally derived allele. The selective expansion of the CAG triplet repeat in the offspring of males might reflect a previously recognized higher mutation rate in the male than in the female germline[3]. Errors in DNA replication during germ-cell division might be more likely, as the number of germ-cell divisions per generation is much greater in males than females. It is notable that the fathers who clearly had the IA with germline mutations leading to an expanded allele were of an advanced age (mean 36.7, range 29-55 years) at the time of birth of the affected offspring.
This is similar to the mean age of parents in other autosomal dominant conditions such as achondroplasia and Marfan’s syndrome, where a higher mutation rate with increasing paternal age has been demonstrated[3,20-23]. These findings suggest that in HD, advanced paternal age in some undetermined way influences the susceptibility of the premutation to full expansion.

The sporadic recombinant individual from family 1, with the breakpoint between D4S111 and D4S141 that led to the hypothesis of a distally located HD gene disrupted in the recombinant individual resulting in disease[10], shows the same CAG expansion to the affected range as in other sporadic cases examined. Therefore, the recombination event would appear to be unassociated with the development of disease in this individual.

These results provide convincing evidence for a premutation in HD. Factors affecting the susceptibility of the premutation to full expansion include the sex of the parent as well as paternal age. These findings have significant clinical implications for family members of sporadic patients and will influence counseling practices.

10.4 REFERENCES

1. Hayden MR (1981). Huntington’s Chorea. Springer-Verlag, New York.
2. Harper PS (1991). Huntington’s Disease. WB Saunders, London.
3. Vogel F and Motulsky A (1986). Human Genetics, 2nd ed. Springer-Verlag, New York.
4. Stevens D and Parsonage MJ (1969). Mutation in Huntington’s chorea. J Neurol Neurosurg Psychiatry 32:140-143.
5. Wolff G, Deuschl G, Wienker TF, Hummel K, Bender K, Lucking C, Schumacher M, Hammer J, Oepen G (1989). New mutation to Huntington’s disease. J Med Genet 26:18-27.
6. Baraitser M, Burn J, Fazzone TA (1983). Huntington’s chorea arising as a fresh mutation. J Med Genet 20:459-460.
7. Shaw M and Caro A (1982). The mutation rate to Huntington’s chorea. J Med Genet 19:161-167.
8. Chiu E and Brackenridge CJ (1976). A probable case of mutation in Huntington’s disease. J Med Genet 13:75-77.
9. Stevens D and Parsonage M (1969). Mutation in Huntington’s chorea. J Neurol Neurosurg Psychiatr 32:140-143.
10. Weber B, Riess O, Wolff G, Andrew S, Collins C, Graham R, Theilmann J, Hayden MR (1992). Delineation of a 50 kb DNA segment containing the recombination site in a sporadic case of Huntington’s disease. Nature Genet 2:216-222.
11. Huntington Disease Collaborative Research Group (1993). A novel gene containing a trinucleotide repeat that is expanded and unstable on Huntington’s disease chromosomes. Cell 72:971-983.
12. MacDonald ME, Novelletto A, Lin C, Tagle D, Barnes G, Bates G, Taylor S, Allitto B, Altherr M, Myers R, Lehrach H, Collins FS, Wasmuth JJ, Frontali M, Gusella JF (1992). The Huntington’s disease candidate region exhibits many different haplotypes. Nature Genet 1:99-103.
13. Andrew SE, Theilmann J, Almqvist E, Norremolle A, Lucotte G, Anvret M, Sorenson SA, Turpin JC, Hayden MR (1993). DNA analysis of distinct populations suggests multiple origins for the mutation causing Huntington disease. Clin Genet 43:286-294.
14. Fu Y-H, Kuhl DPA, Pizzuti A, Pieretti M, Sutcliffe JS, Richards S, Verkerk AJMH, Holden JJA, Fenwick RG Jr, Warren ST, Oostra BA, Nelson DL, Caskey CT (1991). Variation of the CGG repeat at the fragile X site results in genetic instability: Resolution of the Sherman paradox. Cell 67:1-20.
15. Kremer EJ, Pritchard M, Lynch M, Yu S, Holman K, Baker E, Warren ST, Schlessinger D, Sutherland GR, Richards RI (1991). Mapping of DNA instability at the fragile X to a trinucleotide repeat sequence p(CCG)n. Science 252:1711-1714.
16. Mahadevan M, Tsilfidis C, Sabourin L, Shutler G, Amemiya C, Jansen G, Neville C, Narang M, Barcelo J, O’Hoy K, Leblond S, Earle-MacDonald J, de Jong PJ, Wieringa B, Korneluk R (1992). Myotonic dystrophy mutation: an unstable CTG repeat in the 3’ untranslated region of a candidate gene. Science 255:1253-1255.
17. Fu Y-H, Pizzuti A, Fenwick RG Jr, King J, Rajnarayan S, Dunne PW, Dubel J, Nasser GA, Ashizawa T, de Jong P, Wieringa B, Korneluk R, Perryman MB, Epstein HF, Caskey CT (1992). An unstable triplet repeat in a gene related to myotonic muscular dystrophy. Science 255:1256-1258.
18. Brook JD, McCurrach ME, Harley HG, Buckler AJ, Church D, Aburatani H, Hunter K, Stanton VP, Thirion JP, Hudson T, Sohn R, Zemelman B, Snell RG, Rundle SA, Crow S, Davies J, Shelbourne P, Buxton J, Jones C, Juvonen V, Johnson K, Harper PS, Shaw DJ, Housman DE (1992). Molecular basis of myotonic dystrophy: Expansion of a trinucleotide (CTG) repeat at the 3’ end of a transcript encoding a protein kinase family member. Cell 68:799-808.
19. Richards RI and Sutherland GR (1992). Dynamic mutations: A new class of mutations causing human disease. Cell 70:709-712.
20. Penrose LS (1955). Parental age and mutation. Lancet II:312.
21. Penrose LS (1957). Parental age in achondroplasia and mongolism. Am J Hum Genet 9:167-169.
22. Murdoch J, Walker BA and McKusick VA (1972). Parental age effects on the occurrence of new mutations for the Marfan syndrome. Ann Hum Genet 35:331-336.
23. Vogel F (1977). A probable sex difference in some mutation rates. Am J Hum Genet 29:312-319.

CHAPTER 11
A POLYMORPHIC CCG REPEAT ADJACENT TO THE CAG REPEAT IN THE HUNTINGTON DISEASE GENE

The work presented in this chapter contributed to one manuscript.

Andrew SE, Goldberg YP, Theilmann J, Zeisler J, Hayden MR (1994). A CCG polymorphism adjacent to the CAG repeat in the Huntington disease gene: Implications for diagnostic accuracy and predictive testing. Hum Mol Genet 3:65-67.

11.1 INTRODUCTION

Since the discovery of the CAG trinucleotide expansion associated with HD, different PCR approaches have been taken to assess CAG length[1-4].
PCR across this repetitive region is complicated by the presence of two adjacent CCG trinucleotide repeats, which in the initial report were included in the PCR used to assess CAG length (1). In the initial report, three clones were sequenced across this region and each was found to contain 7 CCG repeats adjacent to the CAG, suggesting that this CCG repeat was not polymorphic (1). Therefore all reported methods have used primers which not only encompass the CAG repeat, but also flank the adjacent CCG repeat in normal human chromosomes (A1/A2) (Figure 11-1). In this chapter, the possibility that inclusion of the adjacent CCG repeat may affect the measured trinucleotide length was examined.

11.2 RESULTS

CAG lengths reported in this thesis were obtained by using primers which encompass the CAG repeat and the adjacent CCG repeat in normal human chromosomes (A1/A2) (Figure 11-1). Results of such PCR are more accurately “estimates of CAG size”, as these assessments of CAG repeat size have been made assuming that the CCG repeat did not demonstrate any variation.

In an effort to address whether the CCG is polymorphic or not, two additional sets of PCR primers were designed: one set flanks the CAG tract alone (C1/C2), while the other encompasses the CCG repeat alone (B1/B2) (Figure 11-1).

PCR conditions using primers B1 and radiolabeled B2 were 2 mM MgCl2, 50 mM KCl, 20 mM Tris pH 8.4, 3.5% formamide, 15% glycerol, 200 µM dNTPs, 10 pmol of each primer and 2.5 U of Taq polymerase. Thermal cycling conditions were 95°C for 3 min, followed by 30 cycles of 94°C for 1 min, 59°C for 1 min and 72°C for 1 min, with a final extension at 72°C for 7 min. PCR products were resolved on 6% polyacrylamide, and product sizes were determined by comparison to an M13 sequencing ladder. The number of CCG trinucleotides was calculated as follows:

Number of CCG = (PCR product size - 60)/3

as the PCR product contains 60 nucleotides of sequence that are not part of the CCG repeat.

Figure 11-1. 5’ region of the Huntington disease gene showing the polymorphic CAG and CCG repeats. Sequence is shown extending from nucleotide (nt) 335 to 488 according to the published sequence. Primers used for the determination of the estimated CAG (A1/A2), CAG alone (C1/C2) and CCG alone (B1/B2) are indicated by arrows above the corresponding sequence. Those primers with an X have an additional 5’ XhoI site as a tail.

Primer names legible in the figure: HD344, HD344X, HD419X (B1), HD447X (C2), HD482 (A2), HD482X (B2).

nt 335
GATGAAGGCCTTCGAGTCCCTCAAGTCCTTCCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGC
AGCAGCAGCAGCAACAGCCGCCACCGCCGCCGCCGCCGCCGCCGCcTCCTCAGCTTCCTCAGCCGCCGCG
nt 488

The CCG triplet is seen as 7 copies on approximately 67% of normal human chromosomes (Table 11-1). However, this may vary between 7 and 12 repeats, with the second most frequent repeat length being 10 (Table 11-1, Figure 11-2). In contrast, analysis of 114 HD patients indicates that the majority (92%) of persons with HD have 7 CCG repeats associated with the expanded CAG repeat. Three patients, however, were homozygous for a CCG repeat length of 10, indicating that the expanded CAG in these instances was segregating together with a CCG trinucleotide of 10 repeats (Table 11-1). In a further six patients with CCG allele sizes of 7 and 10, the CCG of 10 was found to segregate with the expanded CAG on the HD chromosome.

11.3 DISCUSSION

The finding of a polymorphic CCG repeat has significant implications for the assessment of CAG repeat length in persons with symptoms suggestive of HD and also for candidates participating in predictive testing programs.
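As a minimal sketch of the sizing arithmetic described in Section 11.2, the function below converts the size of a B1/B2 PCR product (in nucleotides) into a CCG repeat count, using the 60 nucleotides of non-repeat sequence stated in the text. The function name and the example product sizes are illustrative assumptions and are not taken from the thesis.

```python
# Sketch of the CCG sizing formula from Section 11.2:
#   number of CCG repeats = (PCR product size - 60) / 3
# The function name and the example sizes below are hypothetical.

NON_REPEAT_NT = 60  # non-CCG nucleotides in the B1/B2 product (from the text)
NT_PER_REPEAT = 3   # one trinucleotide repeat


def ccg_repeats_from_product_size(product_size_nt: int) -> int:
    """Return the CCG repeat count for a B1/B2 PCR product of the given size."""
    repeat_nt = product_size_nt - NON_REPEAT_NT
    # Reject sizes that cannot be 60 nt plus a whole number of triplets.
    if repeat_nt < 0 or repeat_nt % NT_PER_REPEAT != 0:
        raise ValueError(
            f"product size {product_size_nt} nt is not 60 nt plus whole triplets"
        )
    return repeat_nt // NT_PER_REPEAT


if __name__ == "__main__":
    # Hypothetical product sizes: 81 nt and 90 nt give 7 and 10 repeats.
    for size in (81, 90):
        print(size, "nt ->", ccg_repeats_from_product_size(size), "CCG repeats")
```

Rejecting sizes that are not a whole number of triplets is a design choice of this sketch; a gel reading that falls between ladder bands would need to be resolved before applying the formula.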
If laboratories continue to use primers which encompass both the CAG and the CCG repeats (“estimated CAG”), then in persons who have an “estimated CAG” repeat length of between 37 and 42, it will be necessary to distinguish between CAG and CCG repeat length in order to accurately assess the contribution of CAG expansion alone.

Table 11-1. Frequency of CCG alleles in control and HD chromosomes.

CCG allele    Control n    Control %    HD n    HD %
7             137          66.83        105     92.17
9             5            2.44         0       0.00
10            61           29.76        9       7.83
11            1            0.49         0       0.00
12            1            0.49         0       0.00
Total         205          100          115     100

χ² = 26.36, p = 0.00003, df = 4

Figure 11-2. PCR amplification of the CCG repeat showing alleles of 7, 9, 10, 11 and 12 repeats. PCR products have been resolved on 6% polyacrylamide gels and sized against an M13 sequencing ladder.

In the past, using primers A1 and A2 in a patient who, for example, had a total PCR product size of 45 repeats, subtraction of 7 CCG repeats would give an “estimated CAG” size of 38. In this cohort of 1022 affected persons, the smallest HD allele has an expanded CAG repeat length of 36 repeats. A repeat length of 38 is therefore within the range seen in affected persons with HD, implying that this person has an expanded CAG repeat consistent with having inherited the mutation underlying HD. However, if such a person in fact had a CCG repeat length of 12, the actual CAG repeat length would be only 33 and would therefore be below the range seen in affected persons with HD. In this particular instance, the measurement of CCG repeat length is critical in reaching an accurate conclusion in terms of confirmation of diagnosis or provision of an accurate risk in predictive testing.

The CCG polymorphism may complicate assessment of CAG size, although only in a small number of instances.
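The worked example above can be sketched in a few lines. The snippet assumes that the combined A1/A2 result is expressed as a total repeat count and that CCG length has been measured separately with B1/B2. The function names are hypothetical, and the threshold of 36 is simply the smallest HD allele reported for this cohort, not a general diagnostic cutoff.

```python
# Sketch of the "estimated CAG" correction discussed in Section 11.3.
# Function names are hypothetical; the numbers come from the worked example.

ASSUMED_CCG = 7          # CCG length assumed by the original A1/A2 method
SMALLEST_HD_ALLELE = 36  # smallest expanded CAG allele seen in this cohort


def cag_from_combined(total_repeats: int, ccg_repeats: int = ASSUMED_CCG) -> int:
    """Subtract the CCG repeat count from the combined CAG + CCG repeat count."""
    if ccg_repeats < 0 or ccg_repeats > total_repeats:
        raise ValueError("CCG repeat count must lie between 0 and the total")
    return total_repeats - ccg_repeats


def in_affected_range(cag_repeats: int) -> bool:
    """True if the CAG length is at or above the smallest HD allele in the cohort."""
    return cag_repeats >= SMALLEST_HD_ALLELE


if __name__ == "__main__":
    total = 45  # combined repeat count from the A1/A2 product
    for ccg in (7, 12):
        cag = cag_from_combined(total, ccg)
        print(f"CCG = {ccg}: CAG = {cag}, in affected range: {in_affected_range(cag)}")
```

With a combined count of 45, assuming 7 CCG repeats gives a CAG of 38 (within the affected range), whereas a measured CCG of 12 gives a CAG of 33 (below it), which is the discrepancy the text describes.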
Up until now, the approach to determination of trinucleotide expansion was to estimate CAG length by PCR encompassing both the CAG and CCG repeats and to subtract 7 CCG repeats, as the CCG repeat was thought not to be polymorphic. Even though in the vast majority of persons with HD (92%) direct assessment of CCG length will yield a result of 7 CCG repeats, one cannot assume that this is always the case. Moreover, approximately 33% of normal individuals have a CCG of greater than 7 repeats. Therefore, in those instances in which measurement of CCG might influence the estimate of risk (for persons with an “estimated CAG” of 37-42), direct assessment of CCG length becomes imperative in order to give the patient the best estimate of whether or not they have inherited a mutation associated with HD. Thus, in those instances with CAG length estimates of 37-42, CAG and CCG analysis would be performed independently. Alternatively, in all patients an initial PCR across the CAG alone would circumvent the need for additional PCRs. However, at present, this PCR is much less robust than amplification of the “estimated CAG”, and requires further optimization prior to routine use.

Accurate assessment of CAG repeat size is therefore clearly important, both for the correct evaluation of symptoms in persons with an “estimated CAG” size of 37-42, and for persons at risk who have not inherited the HD chromosome but have 10-12 CCG repeats, which might otherwise falsely lead to the interpretation that they have inherited the HD mutation.

The relationship between age of onset and CCG length was also assessed. Age of onset was available for 57 individuals with an HD allele containing 7 CCG repeats (mean = 42.1 years, range 14-65) and for the 9 individuals with an HD allele with 10 CCG repeats (mean = 44.1 years, range 35-64). Comparison between these two groups showed no significant difference in the age of onset of the disease.
Thus, despite evidence that both the CAG and CCG are polymorphic on normal human chromosomes, it is only the CAG that is susceptible to significant expansion, which in some unknown way is associated with the phenotype of HD.

Recently, Rubinsztein et al. (5) have reported similar findings of polymorphism of the CCG repeat, identifying 4 alleles and emphasizing the importance of this repeat in determining accurate CAG repeat length in some HD individuals.

11.4 REFERENCES

1. Huntington Disease Collaborative Research Group (1993). A novel gene containing a trinucleotide repeat that is expanded and unstable on Huntington disease chromosomes. Cell 72:971-983.
2. Goldberg YP, Andrew SE, Clarke LA, Hayden MR (1993). A PCR method for accurate assessment of trinucleotide repeat expansion in Huntington disease. Hum Mol Genet 2:635-636.
3. Riess O, Noerremoelle A, Soerensen SA, Epplen JT (1993). Improved PCR conditions for the stretch of (CAG)n repeats causing Huntington disease. Hum Mol Genet 2:637.
4. Valdes JM, Tagle DA, Elmer LW, Collins FS (1993). A simple non-radioactive method for diagnosis of Huntington disease. Hum Mol Genet 2:633-634.
5. Rubinsztein DC, Barton DE, Davison BC, Ferguson-Smith MA (1993). Analysis of the huntingtin gene reveals a trinucleotide-length polymorphism in the region of the gene that contains two CCG-rich stretches and a correlation between decreased age of onset of Huntington disease and CAG repeat number. Hum Mol Genet 2:1713-1715.

CHAPTER 12
DISCUSSION

12.1 SUMMARY OF RESULTS

The identification of an expanded CAG repeat associated with HD ended a long search for the HD mutation (1). The work presented in this thesis represents refinement of the candidate region by analysing the 6 Mb candidate region for markers demonstrating allelic association with the HD gene, discovery of an Alu retrotransposition event in the proximal candidate region associated with HD in two families, and genetic analysis of the CAG repeat, providing further insights into the nature of dynamic mutations.

12.1.1 ALLELIC ASSOCIATION

Conflicting results from patients with recombinant chromosomes made the search for the HD gene an onerous task. In order to further reduce the candidate region for the HD gene, linkage disequilibrium was examined across the 6 Mb candidate region, and two regions of association, separated by 3 Mb, were identified.

Measures of linkage disequilibrium depend on accurate allele frequencies, and selection of control chromosomes that accurately reflect the allele frequencies of the population from which the HD alleles are sampled is an important consideration. The HD individuals and their unaffected spouses analyzed in Chapter 3 were of mixed ancestry, primarily of UK descent. In order to investigate more homogeneous populations, three homogeneous populations were analyzed in Chapter 4 for allelic association. No association was seen with markers from either candidate region. Haplotype analysis demonstrated several distinct haplotypes within each population, which could account for the inability to measure any allelic association. The existence of multiple haplotypes within each homogeneous population suggests several origins for the HD chromosomes within each population, and thus multiple haplotypes underlie the worldwide occurrence of HD.

The reason for the allelic association observed with distal markers, 3 Mb from the CAG expansion, has yet to be resolved despite the identification of the expanded CAG repeat associated with HD (1).
Allelic association is highly dependent on the allele frequencies of the disease and control populations, and it is possible that the results are a statistical artifact due to controls not being matched rigorously enough to the HD population. Multiple haplotypes within homogeneous populations that are identical at distal markers by chance may also explain the disequilibrium detected between HD and these distal markers. Another factor that could have influenced the measures of allelic association was the inadvertent inclusion of cases that were later determined to be misdiagnoses, sample mix-ups or cases of other HD-like disorders. However, only one individual not demonstrating CAG expansion was included in previous linkage disequilibrium analyses, and removal of results from this individual does not alter the significance of the measures of disequilibrium (data not shown).

12.1.2 PATTERNS OF ALLELIC ASSOCIATION AROUND THE CAG REPEAT

The CAG repeat associated with HD is located within a novel gene situated 120 kb from the marker D4S95 (1). Analysis of association with markers distributed over 200 kb and flanking the HD gene showed a pattern of increasing allelic association measures with respect to genomic distance from the CAG repeat. Haplotype analysis with these markers confirmed that multiple haplotypes underlie HD. The major HD haplotypes have mean CAG lengths larger than expected on normal chromosomes, which is consistent with the hypothesis that the length of the repeat is associated with instability, and that chromosomes with CAG lengths at the upper end of the normal range are prone to expansion leading to HD chromosomes.

12.2.3 GENOMIC REARRANGEMENT ASSOCIATED WITH HD

Prior to the identification of the HD gene, a method of identifying transcribed sequences, developed by Dr. Rommens and termed “Gene Tracking”, identified 53 transcribed clones from the proximal candidate region.
One of the Gene Tracked clones, located close to the marker D4S95 demonstrating strong linkage disequilibrium with HD, identified a genomic rearrangement cosegregating with HD in 2 families. The rearrangement was mapped, cloned and sequenced, and identified as an Alu element retrotransposition event.

After identification of the expanded CAG repeat associated with HD by the Huntington Disease Collaborative Research Group, the Alu insertion was localized 190 kb from the site of the CAG repeat. The affected individuals with the Alu insertion were shown to have expanded CAG repeat lengths similar to those seen in other HD patients. Whether this Alu insertion event is a factor in the instability of the CAG repeat, whether the insertion is an effect of instability of the chromosome as a result of the CAG expansion, or whether the two events were independent remains unknown. Alu elements are known to promote recombination, suggesting that the insertion event described in this study could have triggered the expansion of the CAG repeat. However, both the size of the CAG repeat and the presentation of the disease in the families with the Alu insertion are no different from those seen in other Huntington disease patients, suggesting that expansion in the two families with the Alu insertion is most likely due to the same mechanism as in other Huntington disease families.

12.2.4 CAG REPEAT ANALYSIS

A PCR assay was established that allowed for rapid, reliable analysis of the length of the CAG repeat in HD patients and their families. Analysis of the CAG repeat demonstrated a significant relationship between CAG repeat length and age of onset of disease, with CAG length responsible for approximately 50% of the variation in the age of onset. CAG repeat lengths were examined in patients from 43 different countries and 5 races, and CAG expansion was seen to underlie HD worldwide.
Thirty individuals did not have CAG expansion and represent either genetic heterogeneity of HD or errors of assignment such as misdiagnosis, sample mix-up or clerical error.

Inclusion of alleles from these cases of misdiagnosis, sample mix-up, clerical error and unexplained HD-like phenotypes in allelic association analyses was not a factor affecting the search for the HD gene using measures of disequilibrium. However, CAG analysis of the families with informative recombination events that previously pointed to conflicting locations for the HD gene allowed the discrepancies in the earlier data to be resolved. For example, two of the recombinant chromosomes that suggested a distal location for the HD gene were from individuals lacking CAG expansion (Chapter 9, patients 5 and 6). The reason for the HD-like phenotype in these patients has not yet been determined and their diagnosis is now in question.

New mutations were determined to be derived from an intermediate sized “premutation” allele in an unaffected father. The allele expands in the sporadic case to the range associated with HD. A sporadic patient with a recombinant chromosome (Chapter 10, Family 1), which triggered the hypothesis that the recombination breakpoint would identify the gene, demonstrates a CAG size in the expected affected range. The affected allele has expanded from an intermediate sized “premutation” allele in the unaffected parent, similar to that observed in other sporadic cases, to an allele within the HD range. This suggests that the recombination event was not causative of expansion in this patient.

12.3 FURTHER INVESTIGATIONS

The identification of the CAG trinucleotide repeat associated with Huntington disease has ended the search for the Huntington disease mutation, with future goals being to understand the mechanism of expansion and how the expansion seen in all tissues causes such specific neuronal death, resulting in disease.
The CAG repeat is located in the 5’ coding region of a large gene coding for a protein with no known function. Using the criteria established by Muller to categorize classes of mutant Drosophila alleles (2), the complete dominance of Huntington disease is most likely explained by classifying the Huntington disease mutation as a neomorph. Neomorphic alleles are often called gain of function mutations, as they either result in an altered gene product that is functionally different from the normal product, or produce the normal product at the wrong place or time during development due to an error in regulation. How expansion of the CAG repeat alters the function of the HD protein remains to be determined.

It is not yet known what initiates expansion of the CAG repeat. Linkage disequilibrium and haplotype analysis suggest that a few specific haplotypes are prone to further expansion resulting in Huntington chromosomes. Identification of markers more closely flanking the CAG repeat will provide more accurate haplotype information and will confirm whether a multi-step process, as seen in FRAXA and DM, is occurring in Huntington disease (3,4).

12.4 REFERENCES

1. Huntington Disease Collaborative Research Group (1993). A novel gene containing a trinucleotide repeat that is expanded and unstable on Huntington disease chromosomes. Cell 72:971-983.
2. Muller HJ (1932). Further studies on the nature and causes of gene mutations. In Proceedings of the 6th International Congress of Genetics, vol. 1, pp. 213-255. Ithaca, New York.
3. Richards RI, Holman K, Friend K, Kremer E, Hillen D, Staples A, Brown WT, Goonewardena P, Tarleton J, Schwartz C, Sutherland GR (1992). Evidence of founder chromosomes in fragile X syndrome. Nature Genet 1:257-260.
4. Imbert G, Kretz C, Johnson K, Mandel JL (1993). Origin of the expansion mutation in myotonic dystrophy. Nature Genet 4:72-76.
